typus is a local python typographer

[ASCII art of a toad: "All Glory to the HYPNO TOAD!"]

Found on the Internet.

Hello!
I want to share a small project of mine: a typographer that you can run locally.

Disclaimer

The project is under development and needs to be thoroughly tested.

Features

replacing quotes with «„“» and “‘’” (in the English version). The nesting depth is unlimited: the typographer simply alternates odd and even levels, and the quote characters can be customized
placing inch marks and apostrophes: 4′, 20″
complex symbols: ellipsis, copyright, trademark, arrows, etc.: (c) becomes ©, even when the "c" is typed in Cyrillic
hyphens are replaced with dashes in running text and in numeric ranges
hyphens in phone numbers are replaced with en dashes
alignment of minus signs and multiplication signs
binding numbers to the words that follow them with a non-breaking hyphen, for example 40-…
binding conjunctions and any words of 1-2 characters to the word that follows
separating units of measurement from numbers (I may remove this soon: the chance of a false positive is very high)
non-breaking spaces in abbreviations: an abbreviation written without spaces gets them inserted, and where a regular space is already present it becomes non-breaking
replacing the Cyrillic abbreviations for "ruble" (with or without a trailing period) with the ruble sign (I may remove this too, since it eats the period when the match falls at the end of a sentence)
replacing the fractions 1/2, 1/3, etc. with the existing Unicode characters
removing extra spaces and line breaks, trimming at the beginning and at the end
placing non-breaking spaces in a number of other cases
HTML tags are left untouched, and the contents of (head|iframe|pre|code|script|style) are ignored
you can pass strings that the typographer will ignore
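
Two of the rules above are easy to picture as plain regular expressions. The following is only a sketch under my own assumptions, not code taken from typus: the function names, the fraction table and the exact patterns are hypothetical.

```python
import re

NBSP = '\u00a0'

# Hypothetical table: only a few of the fractions that exist in Unicode.
FRACTIONS = {'1/2': '½', '1/3': '⅓', '1/4': '¼', '3/4': '¾'}


def replace_fractions(text):
    """Swap stand-alone 'digit/digit' fractions for a single character."""
    pattern = re.compile(r'(?<![\d/])(\d/\d)(?![\d/])')
    # Fractions missing from the table are returned unchanged.
    return pattern.sub(lambda m: FRACTIONS.get(m.group(1), m.group(1)), text)


def bind_short_words(text):
    """Glue a 1-2 character word to the next word with a non-breaking space."""
    return re.sub(r'\b(\w{1,2}) (?=\w)', '\\1' + NBSP, text)


print(replace_fractions('add 1/2 cup, not 11/20'))
print(repr(bind_short_words('a longer word')))
```

The lookarounds in `replace_fractions` keep dates and longer numbers such as 11/20 out of the match, which is the kind of false positive the list worries about.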

Example

    from typus import ru_typus

    ru_typus('00" "11 \'22\' 11"? "11 \'22 "33 33?"\' 11" 00 "11 \'22\' 11" 0"')
    '00″ «11 „22“ 11»? «11 „22 «33 33?»“ 11» 00 «11 „22“ 11» 0″'

The numbers show the nesting level. If the first quote mark stood before the zero, it would open one more level; as written, it comes out as an inch mark.
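
The odd/even alternation can be illustrated with a small depth counter. This is a simplified sketch, not the algorithm typus uses: it assumes that a quote preceded by whitespace (or an opening parenthesis) opens a level and any other quote closes one, and it ignores inch marks and single quotes entirely. The name `nest_quotes` is hypothetical.

```python
# Odd levels use « », even levels use „ “ (the Russian pairs from the example).
PAIRS = (('«', '»'), ('„', '“'))


def nest_quotes(text, pairs=PAIRS):
    depth = 0
    out = []
    for i, ch in enumerate(text):
        if ch != '"':
            out.append(ch)
            continue
        prev = text[i - 1] if i else ' '
        if prev.isspace() or prev == '(':
            # The quote starts a word: open a new level.
            out.append(pairs[depth % 2][0])
            depth += 1
        else:
            # The quote ends a word: close the current level.
            depth = max(depth - 1, 0)
            out.append(pairs[depth % 2][1])
    return ''.join(out)


print(nest_quotes('"11 "22 "33" 22" 11"'))  # «11 „22 «33» 22“ 11»
```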

How it works

    class BaseTypus(EnRuExpressions, TypusCore):
        processors = (EscapePhrases, EscapeHtml, TypoQuotes, Expressions)


    class RuTypus(RuQuotes, BaseTypus):
        pass


    ru_typus = RuTypus()

Typus consists of "processors" and "expressions."

Expressions

These are (regex, replace) pairs that are passed to re.sub and executed sequentially (see just below). Almost all of the typographer's work is done by expressions. They are written as methods with the prefix expr_. A method must return a nested sequence, so one "expression" can return a whole train of (regex, replace) pairs:

    class MyTypus(Typus):
        # Register the new expression by name (without the expr_ prefix)
        expressions = Typus.expressions + ('bar',)

        def expr_bar(self):
            expr = (
                (r'\d', '@'),  # replaces every digit with @
            )
            return expr

A third, optional element holds the flags passed to re.compile; the default is re.I | re.U | re.M | re.S.
By the way, replace may be a function, see re.sub.
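
To make the mechanics concrete, here is a minimal sketch of how such tuples could be compiled once and then applied in order. It is an illustration under my own assumptions, not the code in typus: `compile_expressions`, `apply_expressions` and the sample rules are hypothetical names and data.

```python
import re

DEFAULT_FLAGS = re.I | re.U | re.M | re.S  # the defaults named above


def compile_expressions(expressions):
    """Turn (regex, replace[, flags]) tuples into (compiled pattern, replace)."""
    compiled = []
    for expr in expressions:
        regex, replace = expr[0], expr[1]
        flags = expr[2] if len(expr) > 2 else DEFAULT_FLAGS
        compiled.append((re.compile(regex, flags), replace))
    return compiled


def apply_expressions(text, compiled):
    """Run every substitution in order; each one sees the previous result."""
    for pattern, replace in compiled:
        text = pattern.sub(replace, text)
    return text


EXPRESSIONS = compile_expressions((
    (r'\(c\)', '©'),   # also matches "(C)" because of re.I
    (r'\.{3}', '…'),
    (r'(\d)x(\d)', lambda m: m.group(1) + '×' + m.group(2)),  # replace as a function
    (r'KB', 'kB', re.U),  # explicit flags: this rule is case-sensitive
))

print(apply_expressions('(C) 2x3 KB, kb...', EXPRESSIONS))  # © 2×3 kB, kb…
```

Because the rules run sequentially, their order matters: a later rule works on text that earlier rules have already changed.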

The order of execution is set by the typographer's expressions attribute, which holds a list of expression names. You can switch off the ones you do not need:

    from typus import RuTypus

    exclude_expressions = ('ruble', 'math')


    class MyTypus(RuTypus):
        expressions = (e for e in RuTypus.expressions
                       if e not in exclude_expressions)

expressions can be a generator, but if you make it a sequence, you can also do this:

    def expr_bar(self):
        # Pick a different rule set depending on what else is enabled
        if 'some' in self.expressions:
            return baz
        return egg
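
How names in expressions might be resolved to expr_ methods can be sketched with getattr. Everything here is hypothetical (the class, the `collect` method and the sample rules); it only shows the naming convention described above, not the real typus internals.

```python
class SketchTypus:
    # The order of this tuple is the order in which expressions run.
    expressions = ('digits', 'dots')

    def expr_digits(self):
        return ((r'\d', '@'),)

    def expr_dots(self):
        # One expression may return several (regex, replace) pairs.
        return ((r'\.{3}', '…'), (r'…\.', '…'))

    def collect(self):
        """Look up expr_<name> for every name and flatten the results."""
        pairs = []
        for name in self.expressions:
            pairs.extend(getattr(self, 'expr_' + name)())
        return pairs


class NoDigits(SketchTypus):
    # Switch one expression off by filtering the parent's names.
    expressions = tuple(e for e in SketchTypus.expressions if e != 'digits')


print(len(SketchTypus().collect()))  # 3
print(NoDigits().collect())
```

Using a tuple rather than a generator keeps the names reusable, which is what makes a check like `'some' in self.expressions` safe to repeat.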

Only one expression mixin ships out of the box, EnRuExpressions, but it does almost all the work.

The Expressions processor is what makes expressions run.

Processors

Sometimes plain regular expressions are not enough and you have to build a heavier function. A processor is a class that acts as a function decorator: it is instantiated when the typographer is created and then called when text is processed. The typographer instance is passed to the processor instance, so the processor can read the typographer's configuration.

When several processors are used, they decorate one another in order. For example, EscapeHtml hides the HTML tags before the processors it wraps run and puts them back afterwards.

Typus ships with several processors: EscapePhrases, EscapeHtml, TypoQuotes, Expressions.
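
The "processors decorate each other in order" idea can be sketched as follows. This is not the typus implementation: the base class, the two toy processors and the log are assumptions made up for the illustration.

```python
class Processor:
    """Hypothetical base class: one instance per typographer."""

    def __init__(self, typus):
        self.typus = typus  # lets the processor read the typographer's settings

    def __call__(self, func):
        # Used as a decorator: returns a wrapper around the next handler.
        def wrapped(text, **kwargs):
            return self.run(func, text, **kwargs)
        return wrapped

    def run(self, func, text, **kwargs):
        raise NotImplementedError


class Trim(Processor):
    def run(self, func, text, **kwargs):
        self.typus.log.append('Trim in')
        result = func(text.strip(), **kwargs)
        self.typus.log.append('Trim out')
        return result


class Dots(Processor):
    def run(self, func, text, **kwargs):
        self.typus.log.append('Dots in')
        result = func(text, **kwargs).replace('...', '…')
        self.typus.log.append('Dots out')
        return result


class SketchTypus:
    processors = (Trim, Dots)

    def __init__(self):
        self.log = []
        handler = lambda text, **kwargs: text  # innermost step does nothing
        # Wrap from the last processor to the first, so the first is outermost.
        for processor_class in reversed(self.processors):
            handler = processor_class(self)(handler)
        self.handler = handler

    def __call__(self, text, **kwargs):
        return self.handler(text, **kwargs)


sketch = SketchTypus()
print(sketch('  wait...  '))  # wait…
print(sketch.log)  # ['Trim in', 'Dots in', 'Dots out', 'Trim out']
```

The log shows the nesting: the first processor in the tuple sees the text first and the result last.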

EscapePhrases

Sometimes a piece of text must not be processed, or you know in advance that the typographer will stumble on it. In that case you can do this:

    typus('"http://bar 2""', escape_phrases=['2"'])
    '«http://bar 2"»'

Without it, the typographer would treat the inch mark as the closing quote: «http://bar 2»" . Another example:

    typus('  (c)  (c)', escape_phrases=[' (c)'])
    '  (c)  '
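
One common way to implement this kind of escaping is to swap each phrase for a placeholder, run the processing, and swap the originals back. The sketch below assumes that approach; it is not taken from typus, and `process_with_escapes`, the private-use placeholders and the toy `copyright_sign` step are all hypothetical.

```python
import re

PRIVATE_USE_START = 0xE000  # placeholders are private-use code points


def process_with_escapes(process, text, escape_phrases=()):
    """Hide the phrases, run `process`, then put the phrases back."""
    stash = []

    def hide(match):
        stash.append(match.group())
        return chr(PRIVATE_USE_START + len(stash) - 1)

    def restore(match):
        return stash[ord(match.group()) - PRIVATE_USE_START]

    if escape_phrases:
        # Longest phrases first, so overlapping phrases match greedily.
        longest_first = sorted(escape_phrases, key=len, reverse=True)
        pattern = re.compile('|'.join(re.escape(p) for p in longest_first))
        text = pattern.sub(hide, text)
    text = process(text)
    # Assumes the input itself contains no private-use characters.
    return re.sub('[\ue000-\uf8ff]', restore, text)


def copyright_sign(text):
    return text.replace('(c)', '©')


print(process_with_escapes(copyright_sign, 'a (c) b (c)'))             # a © b ©
print(process_with_escapes(copyright_sign, 'a (c) b (c)', ['b (c)']))  # a © b (c)
```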

The escape_phrases argument can be fed from a separate field in your CRUD application (the "admin"), where a content manager lists the phrases separated by a delimiter, and you pass them on to the typographer.

To split that text into phrases, you can use a utility:

    from typus.utils import splinter

    split = splinter(',')
    split('a, b,c ') == ['a', 'b', 'c']
    # An escaped delimiter stays inside the phrase
    split('a, b\\,c') == ['a', 'b,c']

splinter understands escaped delimiters and calls str.strip() on each phrase.
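
A splitter with this behaviour can be approximated with a negative lookbehind. This is a sketch of the idea only, written without looking at typus.utils; `splinter_sketch` is a hypothetical name.

```python
import re


def splinter_sketch(delimiter):
    """Return a function that splits on `delimiter` unless it is backslash-escaped."""
    pattern = re.compile(r'(?<!\\)' + re.escape(delimiter))

    def split(text):
        parts = pattern.split(text)
        # Unescape the delimiter and strip whitespace around every phrase.
        return [p.replace('\\' + delimiter, delimiter).strip() for p in parts]

    return split


split = splinter_sketch(',')
print(split('a, b,c '))    # ['a', 'b', 'c']
print(split('a, b\\,c'))   # ['a', 'b,c']
```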

EscapeHtml

Hides HTML tags from the typographer and puts them back afterwards. Without it, <img src="http://bar"> would turn into <img src=«http://bar»> .
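
The effect is easy to reproduce with a deliberately naive quote replacer. The sketch below is an assumption about the general technique, not the EscapeHtml source: the tag regex is simplistic, and `naive_quotes` and `with_html_escaped` are hypothetical helpers.

```python
import re

TAG = re.compile(r'<[^>]+>')  # simplistic: does not handle ">" inside attributes
PRIVATE_USE_START = 0xE000


def naive_quotes(text):
    """Toy typographer step: turn every "..." pair into «...»."""
    return re.sub(r'"([^"]*)"', r'«\1»', text)


def with_html_escaped(process, text):
    """Replace tags with placeholders, process the rest, restore the tags."""
    tags = []

    def hide(match):
        tags.append(match.group())
        return chr(PRIVATE_USE_START + len(tags) - 1)

    processed = process(TAG.sub(hide, text))
    return re.sub('[\ue000-\uf8ff]',
                  lambda m: tags[ord(m.group()) - PRIVATE_USE_START],
                  processed)


html = '<img src="http://bar"> "hi"'
print(naive_quotes(html))                     # <img src=«http://bar»> «hi»
print(with_html_escaped(naive_quotes, html))  # <img src="http://bar"> «hi»
```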

TypoQuotes

Places the quotes. It expects the typographer to define the attributes loq, roq, leq, req. Example:

    from typus import BaseTypus
    from typus.chars import LAQUO, RAQUO, DLQUO, LDQUO


    class MyTypus(BaseTypus):
        # left odd, right odd, left even, right even
        loq, roq, leq, req = LAQUO, RAQUO, DLQUO, LDQUO

Ready-made EnQuotes and RuQuotes mixins live in the typus.mixins module.

Expressions

Runs the expressions. When the typographer is initialized, all regular expressions are compiled and stored in the processor instance.

About debugging

If you pass debug=True to the typographer, it replaces every non-breaking space with an underscore, which is handy for debugging:

    ru_typus('(c) me', debug=True)
    '©_me'
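
If you need the same view of text that did not go through the typographer, the substitution is a one-liner. `reveal_nbsp` is a hypothetical helper, not part of typus.

```python
NBSP = '\u00a0'


def reveal_nbsp(text, mark='_'):
    """Make non-breaking spaces visible by replacing them with `mark`."""
    return text.replace(NBSP, mark)


print(reveal_nbsp('is' + NBSP + 'better than ugly'))  # is_better than ugly
```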

Demo

Important: the demo runs on a very modest virtual machine and is only meant to demonstrate the features.

I do not store anything (honestly); the source code of the site is on my GitHub.

Installation and use

 pip install -e git://github.com/byashimov/typus.git#egg=typus

Then:

    from typus import en_typus, ru_typus

    en_typus('"Beautiful is better than ugly." (c) Tim Peters.', debug=True)
    '“Beautiful is_better than ugly.” ©_Tim Peters.'  # _ for nbsp
    ru_typus('" ,  ." ()  .')
    '« ,  .»  .'  # cyrillic '' in '()'

Documentation

For now this article serves as the documentation, until I produce a clumsy translation into English.

Compatibility

    Name                  Stmts   Miss  Cover
    -----------------------------------------
    typus/__init__.py         8      0   100%
    typus/chars.py           18      0   100%
    typus/core.py            24      0   100%
    typus/mixins.py          77      0   100%
    typus/processors.py      99      0   100%
    typus/utils.py           30      0   100%
    -----------------------------------------
    TOTAL                   256      0   100%
    ________________ summary ________________
      py25: commands succeeded
      py26: commands succeeded
      py27: commands succeeded
      py33: commands succeeded
      py34: commands succeeded
      py35: commands succeeded
      congratulations :)

Travis CI, which I use, does not support Python 2.5, and I do not always run the checks by hand, so if you still use it (my condolences), run the tests after installing.

Project page.

Plans and ideas

I do not plan to teach the typographer link highlighting or the placement of HTML tags. That is a job for a text processor (markdown, retext, etc.). Besides, everyone has their own use cases.
Nor do I want the typographer to correct mistakes in the text, even where that would cost nothing.
Almost all typographers convert unsafe characters, such as & , into HTML entities. At the moment I do not see why: browsers, search engines and parsers handle such text with ease, and I do not want to burn CPU cycles just to make the markup unreadable. I would be glad to hear a concrete example.
ru_typus will probably cope with Ukrainian and Belarusian texts (and possibly others); if so, I will add that to the project description.

Looks like that's it.

P.S. Highlighting inline code on Habr is a bit of a nightmare.

Source: https://habr.com/ru/post/303608/
