Habra-colorer: a script for coloring code with the tools at hand

When I started writing a post about my experiments with TeX, I realized that I badly lacked decent syntax highlighting. Searching Habr and its surroundings turned up a couple of editors that did not work for me and a description of a formatter written for pygments.
Deciding that I could do no worse, I knocked together a quick Python script that colors the code for me.

The autopsy showed that ~~the patient died of the autopsy,~~ the code found on Habr could not be used without modifications: either it was tailored to an old version of pygments, or something else was off. So I dug into the documentation, and the very first example I came across was an HTML 3.2 formatter very similar to the one I had found earlier.

Comparing the two showed that the formatters are, if not twins, then at least brothers. Under the new rules, defining the format method (which the esteemed barbuza did not use) is mandatory, and the Habr-specific tags had to be corrected as well.

Processing the code

Next, armed with Python and regular expressions, I decided to process the text and replace code blocks, marked up the same way as in Habra Editor, with colored code:

    <code class="python">
    #!/usr/bin/env python
    import sys
    </code>

That is, everything inside a block delimited by code tags with a single class attribute is run through the corresponding pygments lexer.

    import re
    from pygments import highlight
    from pygments.lexers import get_lexer_by_name, guess_lexer
    from pygments.util import ClassNotFound

    # OldHtmlFormatter is the patched HTML 3.2 formatter; see the full source.

    def preparse_text(text, linenos=False, style=None):
        """Extract code blocks from raw text, render via pygments and return as unicode string"""
        R = re.compile(ur'^\s*<code class="(?P<class>.*?)">\s*$(?P<code>.*?)^\s*</code>\s*$',
                       re.I | re.U | re.S | re.M)
        out = []
        prev = 0
        ar = {'linenos': False}
        if linenos:
            ar['linenos'] = 'inline'
        if style:
            ar['style'] = style
        for s in R.finditer(text):
            fmt = OldHtmlFormatter(**ar)
            out.append(text[prev:s.start()])  # text before the block, unchanged
            try:
                lx = get_lexer_by_name(s.group('class'))
            except ClassNotFound:
                # get_lexer_by_name raises on an unknown alias, so guess from the content
                try:
                    lx = guess_lexer(s.group('code'))
                except ClassNotFound:
                    lx = None
            if lx:
                s0 = s.group('code')
                s0 = s0.replace(u' ', u'\u00a0')  # spaces become &nbsp;
                src = highlight(s0, lx, fmt)  # .replace(u'\n', '<br/>\n')  # for preview
            else:
                src = u'<code>%s</code>' % s.group('code')
            out.append(src)
            prev = s.end()
        out.append(text[prev:])  # tail after the last block
        return u''.join(out)
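To see what the regular expression in preparse_text actually captures, here is a small standalone sketch. It uses Python 3 syntax and only the standard library (no pygments), and the names CODE_BLOCK, find_code_blocks and sample are made up for this illustration rather than taken from the original script.

```python
import re

# Same pattern as in preparse_text, written as a Python 3 raw string.
CODE_BLOCK = re.compile(
    r'^\s*<code class="(?P<class>.*?)">\s*$(?P<code>.*?)^\s*</code>\s*$',
    re.I | re.S | re.M,
)


def find_code_blocks(text):
    """Return (language, code) pairs for every <code class="..."> block."""
    return [(m.group('class'), m.group('code').strip('\n'))
            for m in CODE_BLOCK.finditer(text)]


# Hypothetical input: prose with one marked-up block in the middle.
sample = (
    'Some prose before.\n'
    '<code class="python">\n'
    'import sys\n'
    'print(sys.argv)\n'
    '</code>\n'
    'Some prose after.\n'
)

for language, code in find_code_blocks(sample):
    print(language)
    print(code)
```

Note that the opening and closing tags must each sit on their own line: an inline `<code class="c">x</code>` is not picked up by this pattern.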

The output is our text with all such blocks colored according to the chosen pygments style.
The formatter fix had three parts: replacing regular strings with unicode ones, adding the HTML-escaping function from barbuza's blog with a small edit, and adding a "dumb" function that packs color codes from 6 characters down to 3 when the color allows it.
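The last two fixes are easy to picture in isolation. Below is a rough Python 3 sketch, not the code from the script: shorten_color is a hypothetical name for the color-packing helper, and html.escape from the standard library stands in for the escaping function borrowed from barbuza's blog.

```python
import html


def shorten_color(color):
    """Pack 'rrggbb' into 'rgb' when every channel is a doubled hex digit.

    A leading '#' is preserved; anything that cannot be packed is
    returned unchanged.
    """
    prefix = '#' if color.startswith('#') else ''
    digits = color[len(prefix):]
    if len(digits) == 6 and all(digits[i] == digits[i + 1] for i in (0, 2, 4)):
        return prefix + digits[0] + digits[2] + digits[4]
    return color


def escape_html(text):
    """Stand-in for the escaping helper: replace &, <, > and quotes."""
    return html.escape(text, quote=True)


print(shorten_color('#ffcc00'))   # packable: each channel is a doubled digit
print(shorten_color('#f0f0f0'))   # not packable, returned as is
print(escape_html('if a < b and b > c:'))
```

The packing only applies when both hex digits of every channel are equal, which is the "if the color allows it" condition from the text.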

Console utility

All that remains is to write the piece that calls our formatter and passes it the extra parameters.

To make life easier, we use the optparse module to parse the command-line parameters:

    import sys
    from optparse import OptionParser

    # Describe the command-line options
    p = OptionParser(usage='usage: %prog [options] input_file')
    p.add_option('-f', '--file', metavar="FILE", help="Write output to FILE")
    p.add_option('-s', '--style', metavar="STYLE", help="Use color STYLE for formatting")
    p.add_option('--htm', '--html', action="store_true", help="Add extra html headers in output")
    p.add_option('--list-styles', action="store_true", help="Show list of supported styles")
    p.add_option('--list-languages', action="store_true", help="Show list of supported languages")

    # Parse the arguments: options go to 'op', the input file name to 'a[0]'
    op, a = p.parse_args()

    # List the supported styles and exit
    if op.list_styles:
        from pygments.styles import get_all_styles
        print "Supported color styles:"
        for s in get_all_styles():
            print u"\t%s" % (s,)
        sys.exit(0)

    # List the supported languages and exit
    if op.list_languages:
        from pygments.lexers import get_all_lexers
        print "Supported languages and aliases:"
        ss = list(get_all_lexers())
        ss.sort(key=lambda x: x[0].lower())
        for s in ss:
            print s[0]
            if s[1]:
                print "\t", ", ".join(s[1])
        sys.exit(0)

    # Exactly one input file is required
    if len(a) != 1:
        print "No input file specified!"
        sys.exit(1)

That is all. What remains is to read the specified file, color the code in it, and write the result to the output file, or to the screen if no output file is given. It smells of sloppy code, but I am simply too lazy to optimize it. Since I already have ideas for reworking this whole mess, refactoring can wait until later.

    srcfile = a[0]
    dstfile = op.file
    # Read the input, assuming it is UTF-8
    f = unicode(open(srcfile, 'rb').read(), 'utf-8', 'replace')
    s = preparse_text(f, False, op.style).encode('utf-8')
    # Write to the output file, or to stdout if none was given
    if dstfile:
        fn = open(dstfile, 'wb')
    else:
        fn = sys.stdout
    # Optional HTML wrapper
    if op.htm:
        fn.write('<html><head><meta http-equiv="content-type" content="text/html; charset=utf-8"/></head><body>\n')
    fn.write(s)
    if op.htm:
        fn.write('</body></html>\n')
    # Close only a real file; stdout is left open
    if dstfile:
        fn.close()

Summary

As a result, I have a working utility that lets me, with "a flick of the wrist" ©, color the code in articles typed in Far, GEdit or Midnight Commander.
The entire source code (with some garbage) is available at dumpz.org/17521.

Todo

Some directions for further development are already visible:

"Brush up" the coloring part: add the ability to output line numbers and automatic escaping of nested code tags (for now I had to add comments after the code tag in the first example).
I am thinking about a PyQt GUI with a Habr-style preview.
A couple of custom styles could be drawn, since that is quite simple to do.

Materials used

Habrahabr
barbuza's Habr blog, for the idea and the HTML-escaping function
Habra Editor by SoftCoder
Python 2.6
Pygments
Sc.me source code coloring site
The dumpz.org paste site

PS: This text was prepared in GEdit and colored by habra-colorer with the default style.
PPS: While writing, I found out that Habr "does not like" a lonely digit "0" inside font tags. I had to cheat by adding a tricky invisible character with the code &#8204;.
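
The workaround can be sketched roughly as follows, in Python 3. This is a guess at the mechanics rather than the script's actual code: the name protect_lonely_zero and the assumption that a "lonely" zero is a font element containing nothing but 0 are mine.

```python
import re

ZWNJ = '&#8204;'  # zero-width non-joiner, the code named in the postscript

# Assumption: a "lonely" zero is a font element whose entire text is "0".
LONELY_ZERO = re.compile(r'(<font\b[^>]*>)0(</font>)', re.I)


def protect_lonely_zero(markup):
    """Append the invisible character after a zero that stands alone in a font tag."""
    return LONELY_ZERO.sub(r'\g<1>0' + ZWNJ + r'\g<2>', markup)


print(protect_lonely_zero('prev = <font color="#666666">0</font>'))
```

Numbers such as 10 or 0.5 are left alone, because the pattern only matches a font element whose whole content is a single 0.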

Source: https://habr.com/ru/post/86781/
