
When I started writing a post about my perverse experiments with TeX, I realized that I badly needed proper syntax highlighting. Searching Habr and the surrounding sites turned up a couple of editors that did not work for me and a description of a Habr-oriented formatter for pygments.
Deciding "why should I be any worse?", I knocked together a quick Python script that colors code for me.
The autopsy showed that the patient died of the autopsy: the code found on Habr could not be used without modifications, either because it targeted an old version of pygments or for some other reason. So I dug into the documentation, and the very first example I came across was an HTML 3.2 formatter very similar to the one I had found earlier.
Comparing them showed that the two formatters are, if not twins, then at least brothers. Under the new rules, defining the `format` method (which the respected barbuza did not do) is mandatory, and I also had to adjust the Habr-specific tags.
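For reference, here is a formatter in the spirit of that documentation example. It is a minimal sketch for a current pygments on Python 3, not the author's modified class. It builds opening and closing tags for every token type from the style, and it implements `format()`, the method pygments calls with the token stream and an output file. HTML escaping is deliberately left out here.

```python
from pygments import highlight
from pygments.formatter import Formatter
from pygments.lexers import get_lexer_by_name


class OldHtmlFormatter(Formatter):
    """Sketch of an HTML 3.2 style formatter: <font>/<b>/<i>/<u> tags only."""

    def __init__(self, **options):
        Formatter.__init__(self, **options)
        # Precompute (opening tags, closing tags) for every token type in the style.
        self.styles = {}
        for token, style in self.style:
            start = end = ''
            if style['color']:
                start += '<font color="#%s">' % style['color']
                end = '</font>' + end
            if style['bold']:
                start += '<b>'
                end = '</b>' + end
            if style['italic']:
                start += '<i>'
                end = '</i>' + end
            if style['underline']:
                start += '<u>'
                end = '</u>' + end
            self.styles[token] = (start, end)

    def format(self, tokensource, outfile):
        # pygments passes an iterable of (token type, text) pairs and a file-like object.
        lastval = ''
        lasttype = None
        for ttype, value in tokensource:
            # Walk up to a parent token type if this one has no style entry.
            while ttype not in self.styles:
                ttype = ttype.parent
            if ttype == lasttype:
                # Merge consecutive tokens of the same type into one tag pair.
                lastval += value
            else:
                if lastval:
                    start, end = self.styles[lasttype]
                    outfile.write(start + lastval + end)
                lastval = value
                lasttype = ttype
        if lastval:
            start, end = self.styles[lasttype]
            outfile.write(start + lastval + end)
```

Usage: `highlight(code, get_lexer_by_name('python'), OldHtmlFormatter())` returns the colored HTML as a string.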
Processing the code
Next, armed with Python and regular expressions, I decided to process the text and replace code blocks, marked up the same way as in the Habra Editor, with colored code:
```html
<code class="python"> <!-- -->
#!/usr/bin/env python
import sys
</code> <!-- -->
```
That is, everything inside a `code` block with a single `class` attribute is run through the matching pygments lexer.
```python
import re
from pygments import highlight
from pygments.lexers import get_lexer_by_name, guess_lexer
from pygments.util import ClassNotFound

# OldHtmlFormatter is the author's modified formatter (see the full source).

def preparse_text(text, linenos=False, style=None):
    """Extract code blocks from raw text, render via pygments and return as unicode string"""
    R = re.compile(ur'^\s*<code class="(?P<class>.*?)">\s*$(?P<code>.*?)^\s*</code>\s*$',
                   re.I | re.U | re.S | re.M)
    out = []
    prev = 0
    ar = {'linenos': False}
    if linenos:
        ar['linenos'] = 'inline'
    if style:
        ar['style'] = style
    for s in R.finditer(text):
        fmt = OldHtmlFormatter(**ar)
        out.append(text[prev:s.start()])
        # get_lexer_by_name raises ClassNotFound instead of returning None
        try:
            lx = get_lexer_by_name(s.group('class'))
        except ClassNotFound:
            # unknown class name: let pygments guess the language from the code
            try:
                lx = guess_lexer(s.group('code'))
            except ClassNotFound:
                lx = None
        if lx:
            s0 = s.group('code')
            s0 = s0.replace(u' ', u'\u00a0')  # non-breaking spaces so the indentation is not collapsed
            src = highlight(s0, lx, fmt)  # .replace(u'\n', '<br/>\n') # for preview
        else:
            src = u'<code>%s</code>' % s.group('code')
        out.append(src)
        prev = s.end()
    out.append(text[prev:])
    return u''.join(out)
```
The output is our original text with every such block colored according to the chosen pygments style.
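To see which blocks the pattern actually picks up, it can be run on its own. Below is a minimal Python 3 sketch, where the `ur` prefix no longer exists, so the pattern uses `r`. `find_code_blocks` is a hypothetical helper. The sketch also shows why the comments in the first example work: the pattern requires the opening tag to be alone on its line, so any text after it prevents a match.

```python
import re

# Same pattern as in preparse_text, written for Python 3 (str is already unicode).
CODE_BLOCK = re.compile(
    r'^\s*<code class="(?P<class>.*?)">\s*$(?P<code>.*?)^\s*</code>\s*$',
    re.I | re.S | re.M)


def find_code_blocks(text):
    """Return (language, code) pairs for every block the colorer would process."""
    return [(m.group('class'), m.group('code')) for m in CODE_BLOCK.finditer(text)]


sample = (
    'Some prose.\n'
    '<code class="python">\n'
    'import sys\n'
    '</code>\n'
    'A protected example:\n'
    '<code class="python"> <!-- -->\n'   # text after the tag: not matched
    'import os\n'
    '</code> <!-- -->\n'
)
blocks = find_code_blocks(sample)
```

Only the first block is found. Its code group still carries the surrounding newlines, because the pattern matches from the end of the opening-tag line to the start of the closing-tag line.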
Fixing the formatter took three changes: replacing byte strings with unicode ones, adding an HTML-escaping function (taken, with a small edit, from barbuza's blog), and adding a "dumb" function that packs 6-character color codes into 3 characters when the color allows it.
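The author's versions of these helpers are not shown here. The following is a minimal Python 3 sketch of what the escaping and color-packing functions could look like; `escape_html` and `pack_color` are hypothetical names. In Python 3 every `str` is already unicode, so the first fix has no direct counterpart there.

```python
def escape_html(text):
    """Replace the characters that HTML would otherwise treat as markup."""
    # '&' must go first, or the entities produced below would be escaped again.
    return (text.replace('&', '&amp;')
                .replace('<', '&lt;')
                .replace('>', '&gt;')
                .replace('"', '&quot;'))


def pack_color(color):
    """Shorten a 6-digit hex color such as 'aabbcc' to 'abc' when possible."""
    if len(color) == 6 and all(color[i] == color[i + 1] for i in (0, 2, 4)):
        return color[0] + color[2] + color[4]
    # Colors like '123456' (or already short ones) are returned unchanged.
    return color
```

For example, `pack_color('666666')` gives `'666'`, while `pack_color('123456')` stays as it is.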
Console utility
All that remains is a piece that calls our formatter and passes it the extra parameters.
To make life easier, we use the optparse module to parse the command-line parameters:
```python
import sys
from optparse import OptionParser

# command-line options
p = OptionParser(usage='usage: %prog [options] input_file')
p.add_option('-f', '--file', metavar="FILE", help="Write output to FILE")
p.add_option('-s', '--style', metavar="STYLE", help="Use color STYLE for formatting")
p.add_option('--htm', '--html', action="store_true", help="Add extra html headers in output")
p.add_option('--list-styles', action="store_true", help="Show list of supported styles")
p.add_option('--list-languages', action="store_true", help="Show list of supported languages")
# options end up in 'op', positional arguments (the input file) in 'a'
op, a = p.parse_args()
# --list-styles: print the available color styles and exit
if op.list_styles:
    from pygments.styles import get_all_styles
    print "Supported color styles:"
    for s in get_all_styles():
        print u"\t%s" % (s,)
    sys.exit(0)
# --list-languages: print lexer names with their aliases and exit
if op.list_languages:
    from pygments.lexers import get_all_lexers
    print "Supported languages and aliases:"
    ss = list(get_all_lexers())
    ss.sort(key=lambda x: x[0].lower())
    for s in ss:
        print s[0]
        if s[1]:
            print "\t", ", ".join(s[1])
    sys.exit(0)
# exactly one input file is expected
if len(a) != 1:
    print "No input file specified!"
    sys.exit(1)
```
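`parse_args()` also accepts an explicit argument list instead of reading `sys.argv`, which makes it easy to check the options without running the script from a shell. Below is a sketch in Python 3 syntax; `build_parser` is a hypothetical helper wrapping the same options.

```python
from optparse import OptionParser


def build_parser():
    """Return a parser with the same options as the colorer script."""
    p = OptionParser(usage='usage: %prog [options] input_file')
    p.add_option('-f', '--file', metavar='FILE', help='Write output to FILE')
    p.add_option('-s', '--style', metavar='STYLE', help='Use color STYLE for formatting')
    p.add_option('--htm', '--html', action='store_true', help='Add extra html headers in output')
    p.add_option('--list-styles', action='store_true', help='Show list of supported styles')
    p.add_option('--list-languages', action='store_true', help='Show list of supported languages')
    return p


# Equivalent to running: script.py -s emacs --html post.txt
op, a = build_parser().parse_args(['-s', 'emacs', '--html', 'post.txt'])
```

The destination name comes from the first long option, so both `--htm` and `--html` set `op.htm`. Options that were not given stay `None`.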
That's all for the options. What remains is to read the given file, color the code in it, and write the result to the output file, or to the screen if no output file was given. It smacks of sloppy code, but I was simply too lazy to optimize it. Since I already have ideas for reworking this whole mess, refactoring can wait until later.
```python
srcfile = a[0]
dstfile = op.file
f = unicode(open(srcfile, 'rb').read(), 'utf-8', 'replace')  # the input is assumed to be UTF-8
s = preparse_text(f, False, op.style).encode('utf-8')  # colorize, then encode back to UTF-8
if dstfile:
    fn = open(dstfile, 'wb')
else:
    fn = sys.stdout
# optionally wrap the result in a minimal HTML page for previewing
if op.htm:
    fn.write('<html><head><meta http-equiv="content-type" content="text/html; charset=utf-8"/></head><body>\n')
fn.write(s)
if op.htm:
    fn.write('</body></html>\n')
if dstfile:
    fn.close()  # close only the file we opened ourselves; leave stdout alone
```
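For comparison, the same read, colorize and write step can be sketched for Python 3 with `with` blocks. This way the output file is always closed and stdout is never touched. `convert_file` and its `transform` parameter are hypothetical names; `transform` stands in for `preparse_text`.

```python
import io
import sys

HTML_HEAD = ('<html><head><meta http-equiv="content-type" '
             'content="text/html; charset=utf-8"/></head><body>\n')
HTML_TAIL = '</body></html>\n'


def convert_file(srcfile, dstfile, transform, wrap_html=False):
    """Read srcfile as UTF-8, apply transform and write the result.

    dstfile=None means "write to stdout".
    """
    with io.open(srcfile, encoding='utf-8', errors='replace') as f:
        text = transform(f.read())
    if wrap_html:
        text = HTML_HEAD + text + HTML_TAIL
    if dstfile:
        # The with-block closes the file we opened; stdout is left open.
        with io.open(dstfile, 'w', encoding='utf-8') as out:
            out.write(text)
    else:
        sys.stdout.write(text)
```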
Result
As a result, I got a working utility that lets me, "with a flick of the wrist"©, color the code in articles typed in Far, GEdit or Midnight Commander.
The full source code (with some junk) is available at dumpz.org/17521.
Todo
Some directions for further development are already visible:
- Polish the coloring part: add line-number output and automatic escaping of nested code blocks (for now I had to add comments after the code tag in the first example).
- Build a PyQt GUI with a Habr-style preview.
- Design a couple of custom styles, which is fairly simple.
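On custom styles: a pygments style is a class derived from `pygments.style.Style` with a `styles` dict that maps token types to style strings. The sketch below uses a hypothetical `HabrStyle` with arbitrary colors. Since pygments formatters accept a style class as well as a style name, such a class could be passed as the `style` option. pygments stores colors in 6-digit form, so `'#888'` becomes `'888888'`; this is the form the color-packing fix shortens again.

```python
from pygments.style import Style
from pygments.token import Comment, Keyword, Name, Number, String


class HabrStyle(Style):
    """Hypothetical example style: gray italic comments, bold dark-blue keywords."""
    default_style = ''
    styles = {
        Comment: 'italic #888',
        Keyword: 'bold #005',
        Name.Function: '#0a0',
        String: '#a00',
        Number: '#05a',
    }


# style_for_token returns the resolved attributes, with 3-digit colors expanded.
keyword_style = HabrStyle.style_for_token(Keyword)
comment_style = HabrStyle.style_for_token(Comment)
```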
Materials used
- Habrahabr
- Habr user barbuza's blog, for the idea and the HTML-escaping function
- Habra Editor from SoftCoder
- Python 2.6
- Pygments
- Sc.me, a source-code coloring site
- dumpz.org, a code-dump site
PS: This text was prepared in GEdit and colored with habra-colorer using the default style.
PPS: While writing this, I found out that Habr "does not like" a lone digit "0" inside font tags. I had to cheat by adding a tricky invisible character with the code &#8204; (a zero-width non-joiner).
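Adding that character by hand is easy to forget, so a regular-expression pass over the colored HTML could do it instead. Below is a hypothetical sketch (`protect_lone_zero` is not part of the original script). It appends `&#8204;` right after a `0` that is the entire content of a `font` tag.

```python
import re

# A <font ...> tag whose whole content is the single digit 0.
LONE_ZERO = re.compile(r'(<font[^>]*>)0(</font>)')


def protect_lone_zero(html):
    """Insert a zero-width non-joiner (&#8204;) after a lone 0 inside <font> tags."""
    # \g<1> instead of \1: otherwise "\10" would be read as a reference to group 10.
    return LONE_ZERO.sub(r'\g<1>0&#8204;\g<2>', html)
```

Numbers such as `10` or `0.5` are left alone, because the pattern only matches when `0` is the tag's entire content.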