
Jevix library for Python



Good day to all!

Recently I asked on Habr whether there is an analogue of the Jevix library for Python.

A few solutions were suggested, but none of them was enough for me, and Jevix is widely regarded as the most popular library for this kind of HTML processing.



For those hearing about Jevix for the first time


Jevix is a tool for automatically applying typographic rules. It can also unify the markup of HTML/XML documents, control the list of allowed tags and attributes, and prevent possible XSS attacks in document code.
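To make the idea of "controlling the list of allowed tags and attributes" more concrete, here is a minimal sketch built only on Python 3's standard `html.parser`. It is not how Jevix is implemented; the `ALLOWED` table, the `WhitelistFilter` class and the `clean` function are hypothetical names invented for this illustration.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist for illustration only (not Jevix's configuration):
# tag name -> set of attribute names that may be kept.
ALLOWED = {"a": {"href", "title"}, "b": set(), "i": set(), "p": set()}
# Tags that are removed together with everything inside them.
DROP_WITH_CONTENT = {"script", "style"}
SAFE_SCHEMES = ("http://", "https://")


class WhitelistFilter(HTMLParser):
    """Re-emit only whitelisted tags and attributes; escape all text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED:
            return
        kept = []
        for name, value in attrs:
            if name not in ALLOWED[tag] or value is None:
                continue
            # Reject links such as javascript:... by allowing only http(s).
            if name == "href" and not value.lower().startswith(SAFE_SCHEMES):
                continue
            kept.append(' %s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<%s%s>" % (tag, "".join(kept)))

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag in ALLOWED:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def clean(html):
    parser = WhitelistFilter()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(clean('<p onclick="x()">Hi <b>there</b><script>alert(1)</script></p>'))
```

This sketch does not balance unclosed tags or apply typography, which is exactly the extra work a library like Jevix takes on.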



The author of the original PHP library is Habr user ur001; the project page is jevix.ru/project




Background


I really like this library and have used it in my projects more than once. But times change: I started learning Python and Django and ran into a problem. I began a project where I need to tightly control user-submitted HTML and leave an attacker no chance for XSS attacks.



The trouble is that I could not find a Jevix library for Python or a worthy analogue. So, after thinking about it, I decided to port Jevix from PHP to Python.



Let's go


I will say right away that the library is raw and still in beta. I ask all volunteers to test it, so that it does not let anyone down in real use.



What is not yet implemented


  1. Autocorrection of character sequences, such as (r) to ®
  2. Adding padding from specific lines
  3. Automatic br insertion cannot be disabled for specific tags such as object, so for now it is disabled completely
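As an illustration of the first missing feature, replacing sequences such as (r) with ®, here is a minimal Python 3 sketch. The replacement table and the `autocorrect` function are hypothetical; the rules in the PHP library may differ.

```python
import re

# Hypothetical replacement table for illustration; the PHP library's
# actual autocorrection rules may differ.
REPLACEMENTS = {
    "(r)": "\u00ae",   # registered sign
    "(c)": "\u00a9",   # copyright sign
    "(tm)": "\u2122",  # trade mark sign
}

# One alternation pattern built from the escaped keys, case-insensitive.
_PATTERN = re.compile(
    "|".join(re.escape(key) for key in REPLACEMENTS), re.IGNORECASE
)


def autocorrect(text):
    """Replace every known sequence with its typographic character."""
    return _PATTERN.sub(lambda m: REPLACEMENTS[m.group(0).lower()], text)


print(autocorrect("Jevix(R) and (c) 2013"))
```

A real implementation would also have to skip tag attributes and the contents of tags excluded from typography, such as code, which this sketch does not attempt.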




How to use it?


Everything is trivial: I tried to keep everything as it was in the PHP version, down to the names of the variables.

Initialize

    from jevix import Jevix

    p = Jevix()


Configure
    # Tags that are allowed at all
    p.cfgAllowTags(['ls', 'ddcut', 'a', 'img', 'i', 'p', 'b', 'u', 's', 'video',
                    'em', 'strong', 'nobr', 'li', 'ol', 'ul', 'sup', 'abbr', 'sub',
                    'acronym', 'h4', 'h5', 'h6', 'br', 'hr', 'pre', 'code', 'object',
                    'param', 'embed', 'blockquote', 'iframe', 'table', 'th', 'tr', 'td'])
    p.cfgSetTagShort(['br', 'img', 'hr', 'ddcut', 'ls'])
    p.cfgSetTagPreformatted(['pre', 'code', 'video', 'iframe'])

    # Allowed attributes for each tag
    p.cfgAllowTagParams('img', {
        0: 'src', 'alt': '#text', 1: 'title',
        'align': ['right', 'left', 'center', 'middle'],
        'width': '#int', 'height': '#int', 'hspace': '#int', 'vspace': '#int',
        'class': ['image-center', 'image-left', 'image-right'],
    })
    p.cfgAllowTagParams('a', {
        0: 'title', 1: 'href', 'rel': '#text', 'name': '#text',
        'target': ['_blank'],
    })
    p.cfgAllowTagParams('ddcut', {0: 'name'})
    p.cfgAllowTagParams('acronym', {0: 'title'})
    p.cfgAllowTagParams('abbr', {0: 'title'})
    p.cfgAllowTagParams('param', {
        'width': '#int', 'height': '#int',
        'src': {'#domain': ['youtube.com', 'rutube.ru', 'vimeo.com']},
    })
    p.cfgAllowTagParams('iframe', {
        'name': '#text', 'value': '#text', 'height': '#int', 'width': '#int',
        'src': {'#domain': ['youtube.com', 'rutube.ru', 'vimeo.com', 'video.yandex.ru']},
    })
    p.cfgAllowTagParams('ls', {'user': '#text'})
    p.cfgAllowTagParams('td', {
        'colspan': '#int', 'rowspan': '#int',
        'align': ['right', 'left', 'center', 'justify'],
        'height': '#int', 'width': '#int',
    })
    p.cfgAllowTagParams('table', {
        'border': '#int', 'cellpadding': '#int', 'cellspacing': '#int',
        'align': ['right', 'left', 'center'],
        'height': '#int', 'width': '#int',
    })
    p.cfgAllowTagParams('embed', {
        'src': {'#domain': ['youtube.com', 'rutube.ru', 'vimeo.com', 'video.yandex.ru']},
        'type': '#text', 'allowscriptaccess': '#text', 'allowfullscreen': '#text',
        'width': '#int', 'height': '#int', 'flashvars': '#text', 'wmode': '#text',
    })
    p.cfgAllowTagParams('object', {
        'width': '#int', 'height': '#int',
        'data': {'#domain': ['youtube.com', 'rutube.ru', 'vimeo.com', 'video.yandex.ru']},
    })

    # Required attributes
    p.cfgSetTagParamsRequired('img', ['src'])
    p.cfgSetTagParamsRequired('a', ['href'])
    p.cfgSetTagParamsRequired('iframe', ['src'])

    # Tags removed together with their content
    p.cfgSetTagCutWithContent(['script', 'style'])

    # Allowed parent/child nesting
    p.cfgSetTagChilds('ul', ['li'], False, True)
    p.cfgSetTagChilds('ol', ['li'], False, True)
    p.cfgSetTagChilds('object', ['param'], False, True)
    p.cfgSetTagChilds('object', ['embed'], False, False)
    p.cfgSetTagChilds('table', ['tr'], False, True)
    p.cfgSetTagChilds('tr', ['td', 'th'], False, True)

    # Remaining options: empty tags, line breaks, block tags, typography
    p.cfgSetTagIsEmpty(['param', 'embed', 'a', 'iframe'])
    p.cfgSetTagNoAutoBr(['ul', 'ol', 'object', 'table', 'tr'])
    p.cfgSetTagBlockType(['h4', 'h5', 'h6', 'ol', 'ul', 'blockquote', 'pre',
                          'table', 'iframe', 'object'])
    p.cfgSetAutoBrMode(True)
    p.cfgSetTagNoTypography(['code', 'video', 'object', 'iframe'])

    errors = []
    text = u"""<iframe width="560" height="315" src="http://youtube.com/embed/lGnGQXUeaVc" frameborder="0" allowfullscreen></iframe>"""
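The '#domain' rule in the configuration above restricts the src and data attributes to a list of hosts. A check of that kind could look roughly like the Python 3 sketch below. The `domain_allowed` function is a hypothetical helper, not part of the library, and accepting subdomains such as www.youtube.com is an assumption of this sketch, not documented Jevix behavior.

```python
from urllib.parse import urlparse


def domain_allowed(url, allowed_domains):
    """Return True if the URL's host is an allowed domain or a subdomain of one.

    Hypothetical helper for illustration; subdomain matching is an assumption.
    """
    host = (urlparse(url).hostname or "").lower()
    return any(
        host == domain or host.endswith("." + domain)
        for domain in allowed_domains
    )


allowed = ['youtube.com', 'rutube.ru', 'vimeo.com', 'video.yandex.ru']
print(domain_allowed("http://youtube.com/embed/lGnGQXUeaVc", allowed))
print(domain_allowed("http://youtube.com.evil.example/embed/x", allowed))
```

Comparing the parsed host name, and not searching for the domain as a substring of the URL, is what keeps look-alike addresses such as youtube.com.evil.example from passing.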




And run it

    text, errors = p.parse(text)
    print text, errors




References:

Official project: jevix.ru

Online demo of the Python version: jevix.vir-mir.ru

Project on GitHub: github.com/vir-mir/jevix

Source: https://habr.com/ru/post/168439/


