The purpose of this post is to get criticism. I am learning Django a little at a time by building a blog engine, and I ran into the problem of outputting user content, i.e. escaping it. So I wrote my own function. I have some experience posting content on various sites that take different approaches to escaping user data (mostly a negative experience: the sites usually make the user do extra work), and from that experience I developed a philosophy.
Note: my blog always stores the original, completely unprocessed user input in the database. The function's job is to turn that input into HTML suitable for display. In the simplest case, the function is called whenever the content is displayed (in a template or a view). To improve performance, you can add a separate database field that holds the output of the escaping function and is updated whenever the content is updated.
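A minimal sketch of that storage scheme, written in plain Python rather than as a Django model so it runs standalone. The class and field names (`Post`, `raw`, `html`) are hypothetical, and `html.escape` is only a stand-in for the real rendering function.

```python
import html as html_lib
from dataclasses import dataclass, field


def render(raw: str) -> str:
    # Stand-in for the real sanitizing function: escape everything.
    return html_lib.escape(raw)


@dataclass
class Post:
    # The original user input, stored exactly as typed.
    raw: str
    # Cached result of render(raw); never edited directly.
    html: str = field(init=False)

    def __post_init__(self) -> None:
        self.html = render(self.raw)

    def update(self, new_raw: str) -> None:
        # Re-render only when the content changes, not on every page view.
        self.raw = new_raw
        self.html = render(new_raw)


post = Post("1 < 2")
print(post.html)  # 1 &lt; 2
post.update("<b>hi</b>")
print(post.html)  # &lt;b&gt;hi&lt;/b&gt;
```

In a Django model the same idea is usually expressed by overriding `save()` so that the cached field is recomputed before calling `super().save()`.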
So, the function's philosophy:
- The fundamental principle: all user data is considered invalid from the start. Therefore, everything that is not explicitly allowed is escaped.
- Minimum work for the user. If I want to write a construct with angle brackets, I just type angle brackets and do not think about "&lt;" and the like. The function converts the unsafe characters itself.
- However, some tags may be used. Which ones is set by a function parameter; otherwise a default set is used. Each tag has a set of allowed attributes. The function keeps allowed tags with their allowed attributes; disallowed attributes are simply removed from the output, and disallowed tags are escaped.
- Following the first principle (and common sense), unclosed tags are closed automatically, and closing tags without a matching opening tag are escaped.
- The blog is programming-oriented, i.e. posts will contain pieces of code. These pieces must be escaped in their entirety (it is code, any code, even HTML). For this there is the construct <pre><code [class="language"]> ... </code></pre>. I followed the highlight.js syntax-highlighting library, which uses exactly this format for code blocks. Everything inside such a block is escaped in its entirety, and highlighted.
- I love kolobok smileys :) so the function can optionally replace smileys with images.
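The whitelist rules above (escape by default, keep allowed tags and attributes, auto-close open tags, escape stray closing tags, escape code blocks verbatim) can be sketched in Python as follows. This is not the code from the pastebin link; it is a minimal regex-based sketch, and the default whitelist `DEFAULT_ALLOWED`, the name `sanitize`, and the URL-scheme check are assumptions added for illustration. Smiley replacement is left out.

```python
import html
import re

# Hypothetical default whitelist: allowed tag -> allowed attributes.
DEFAULT_ALLOWED = {
    "b": set(), "i": set(), "br": set(),
    "a": {"href", "title"},
    "img": {"src", "alt"},
}
VOID_TAGS = {"br", "img"}  # never pushed on the open-tag stack
URL_ATTRS = {"href", "src"}
SAFE_SCHEMES = {"http", "https", "mailto"}

# <pre><code [class="..."]> ... </code></pre>: the body is escaped verbatim.
CODE_RE = re.compile(r'<pre><code(?: class="([\w-]+)")?>(.*?)</code></pre>', re.S)
# A single start or end tag with optional attributes.
TAG_RE = re.compile(
    r'<(/?)([a-zA-Z][a-zA-Z0-9]*)'
    r'((?:\s+[a-zA-Z-]+(?:\s*=\s*(?:"[^"]*"|\'[^\']*\'|[^\s"\'>]+))?)*)'
    r'\s*(/?)>')
ATTR_RE = re.compile(
    r'([a-zA-Z-]+)(?:\s*=\s*(?:"([^"]*)"|\'([^\']*)\'|([^\s"\'>]+)))?')


def _safe_url(value):
    # Drop spaces and control characters, which browsers may ignore in a scheme.
    compact = "".join(ch for ch in value if ch > " ").lower()
    scheme, sep, _ = compact.partition(":")
    if not sep or any(c in scheme for c in "/?#"):
        return True  # relative URL such as "image.jpg"
    return scheme in SAFE_SCHEMES


def _attrs(raw, allowed_attrs):
    parts = []
    for a in ATTR_RE.finditer(raw):
        name = a.group(1).lower()
        value = a.group(2) or a.group(3) or a.group(4) or ""
        if name not in allowed_attrs:
            continue  # disallowed attribute: silently removed
        if name in URL_ATTRS and not _safe_url(value):
            continue  # e.g. href="javascript:..."
        parts.append(f' {name}="{html.escape(value)}"')
    return "".join(parts)


def _markup(chunk, allowed, stack, out):
    pos = 0
    for m in TAG_RE.finditer(chunk):
        out.append(html.escape(chunk[pos:m.start()], quote=False))
        pos = m.end()
        closing, name = m.group(1), m.group(2).lower()
        if name not in allowed or (closing and name not in stack):
            # Disallowed tag, or closing tag without an opening one: escape it.
            out.append(html.escape(m.group(0), quote=False))
        elif closing:
            # Close tags opened inside this one first, then the tag itself.
            while True:
                top = stack.pop()
                out.append(f"</{top}>")
                if top == name:
                    break
        elif name in VOID_TAGS:
            out.append(f"<{name}{_attrs(m.group(3), allowed[name])}/>")
        else:
            out.append(f"<{name}{_attrs(m.group(3), allowed[name])}>")
            stack.append(name)
    out.append(html.escape(chunk[pos:], quote=False))


def sanitize(text, allowed=None):
    allowed = DEFAULT_ALLOWED if allowed is None else allowed
    out, stack, pos = [], [], 0
    for m in CODE_RE.finditer(text):
        _markup(text[pos:m.start()], allowed, stack, out)
        cls = f' class="{m.group(1)}"' if m.group(1) else ""
        body = html.escape(m.group(2), quote=False)
        out.append(f"<pre><code{cls}>{body}</code></pre>")
        pos = m.end()
    _markup(text[pos:], allowed, stack, out)
    # Auto-close whatever the user left open, innermost first.
    out.extend(f"</{t}>" for t in reversed(stack))
    return "".join(out)
```

A regex tokenizer is workable for a whitelist filter because anything that does not match the tag pattern exactly falls through as text and gets escaped. The weak spot is `href`/`src`: an allowed attribute can still carry a `javascript:` URL, which is why the sketch checks the scheme.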
So for the user everything is simple: write whatever you want, and everything is escaped except the allowed tags.
Why not a ready-made markup such as BBCode or Markdown:
- So that users do not have to learn yet another markup language.
- So that the text of a post is as "standard" (HTML) and portable as possible. Copy the original text of a post, escape the code sections, paste it into another blog, and it displays correctly.
That said, making my own compact tags that expand into bulky HTML tags is a good idea. Most likely I will make it possible to assign a filter to each kind of user input.
Here are some examples taken from the tests:
# New line test
text: 'New' + os.linesep + os.linesep + 'line'
html(text): 'New' + '<br/>' + os.linesep + '<br/>' + os.linesep + 'line'
# Simple smile test
text: ':)'
html(text): '<img src="/files/smile.gif"/>'
# Smile in code block test
text: '<pre><code>there will be no image :)</code></pre>'
html(text): '<pre><code>there will be no image :)</code></pre>'
# Plain text test
text: 'Some simple text without any tags'
html(text): 'Some simple text without any tags'
# Simple tags test
text: 'This is <b>important</b> information'
html(text): 'This is <b>important</b> information'
# Wrong tag test
text: 'This is <b>important</b> information and this is <h5>title</h5>'
html(text): 'This is <b>important</b> information and this is &lt;h5&gt;title&lt;/h5&gt;'
# Wrong attributes test
text: 'This is <b onclick="javascript:alert();">probable XSS</b>'
html(text): 'This is <b>probable XSS</b>'
# Image attributes test
text: 'This is <img src="image.jpg" onclick="javascript:alert();"/>image'
html(text): 'This is <img src="image.jpg"/>image'
# Text without tags
text: 'Sample code: <?php print "Hello World"; ?>'
html(text): 'Sample code: &lt;?php print "Hello World"; ?&gt;'
# Code test
text: 'Sample code: <pre><code><a href="home.html">link</a></code></pre>'
html(text): 'Sample code: <pre><code>&lt;a href="home.html"&gt;link&lt;/a&gt;</code></pre>'
# Closing tag without opening tag
text: '</b>bold text ended'
html(text): '&lt;/b&gt;bold text ended'
# Autoclosing tags test
text: '<b>bold and <i>italic</i> text'
html(text): '<b>bold and <i>italic</i> text</b>'
# Overlapping areas test
text: 'if 1 < 2 <b>only</b> then it is ok, but if 1 < 2 or 2 < 3 then <i>bad</i>'
html(text): 'if 1 &lt; 2 <b>only</b> then it is ok, but if 1 &lt; 2 or 2 &lt; 3 then <i>bad</i>'
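The newline and smiley tests above can be reproduced with a small post-processing pass. This is a hedged sketch, not the author's code: `decorate` and `SMILEYS` are hypothetical names, and it assumes the text has already been escaped, so the `<img>` tag it inserts is not escaped again and code blocks are recognized by their literal `<pre><code>` wrapper.

```python
import os
import re

# Hypothetical smiley table; "/files/smile.gif" is the path used in the tests.
SMILEYS = {":)": '<img src="/files/smile.gif"/>'}
SMILEY_RE = re.compile("|".join(re.escape(s) for s in SMILEYS))
CODE_BLOCK_RE = re.compile(r'<pre><code(?: class="[\w-]+")?>.*?</code></pre>', re.S)


def _decorate(chunk):
    # Each line break becomes "<br/>" followed by the original line separator.
    chunk = chunk.replace(os.linesep, "<br/>" + os.linesep)
    return SMILEY_RE.sub(lambda m: SMILEYS[m.group(0)], chunk)


def decorate(text):
    """Convert newlines and smileys everywhere except inside code blocks."""
    out, pos = [], 0
    for m in CODE_BLOCK_RE.finditer(text):
        out.append(_decorate(text[pos:m.start()]))
        out.append(m.group(0))  # code block left untouched
        pos = m.end()
    out.append(_decorate(text[pos:]))
    return "".join(out)
```

One limitation: a smiley typed inside an allowed attribute value, such as `title`, would be replaced too; a fuller version would also skip over tags.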
Unfortunately, I do not have a server where I could demonstrate the function. The code is here:
http://pastebin.com/f591d71e5
Criticism is welcome, thanks.