
HTML Purifier: Extending Its Capabilities


Only a couple of paragraphs cover how this library interacts with the Yii framework; the rest is fully universal and should interest anyone who uses or plans to use the library.

If you are already familiar with Purifier, feel free to skip the introduction below.

A little bit about HTML Purifier

If you have not heard of the excellent HTML Purifier library (and a search on Habr suggests it is not very popular), I advise you to take a closer look at it, especially if your users generate content in HTML. The author of that content may be an ordinary user, a moderator, or even an administrator.
What does this library do?
Following your configuration, it strips from any HTML code everything that is malicious, invalid, or forbidden by your settings, down to individual attributes.

Fewer words, more code

I think a couple of examples will speak for themselves.
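    // Assumes HTML Purifier is already loaded (for example via its HTMLPurifier.auto.php file)
    $config = HTMLPurifier_Config::createDefault();
    $config->set('Attr.AllowedClasses', array('header')); // Attr.ForbiddenClasses also exists, for banning CSS classes
    $config->set('AutoFormat.AutoParagraph', true); // wrap loose text in <p>
    $config->set('AutoFormat.RemoveEmpty', true); // remove empty tags, with some exceptions *
    $config->set('HTML.Doctype', 'HTML 4.01 Strict'); // this doctype does not allow <strike>
    $purifier = new HTMLPurifier($config);
    $clean_html = $purifier->purify($html);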

* See the exceptions listed in the documentation for AutoFormat.RemoveEmpty.
Source HTML:
    <p invalidAttribute="value">,    <strike></strike>:</p>
    <p>  - <invalidTag></invalidTag>,</p>
    <p class="header error"> - ,</p>
     - !
    <script type="text/javascript">alert("hacked by Alexander Blok");</script>

The result of the purify function:
    <p>,    <span style="text-decoration:line-through;"></span>:</p>
    <p>  - ,</p>
    <p class="header"> - ,</p>
    <p> - !</p>


The number of settings is impressive and lets you get the goodies you need right out of the box.


"Pearl Buttons"

But this post would not exist if, as usual, we had not wanted something special, namely two things:
  1. Replace every link to an external site with a link of the form site.ru/redirect?url=link
  2. Add the target="_blank" attribute to all user links

Neither task seemed too complicated: the documentation has a good article covering the first, and the second looked even easier, since the HTML.TargetBlank directive appeared to do the work for us.

Task 1 - replacing external links

Purifier has a great HTMLPurifier_URIFilter class and equally great examples of how such filters are implemented.
I took the DisableExternalResources file as a basis and quickly rewrote it to fit my needs, namely replacing an external link with an internal one.
Filter file
A short description:
In the prepare function we take our site's host, split it on dots, and reverse the resulting array.
For subdomain.site.ru this yields array('ru', 'site', 'subdomain').
In the filter function we do the same with the host of the user's link and compare the two part by part. If they match, we change nothing and return true; if not, we create a new URI object that points to our own address and carries the user's link in a GET parameter.
Important: the filter method must not return anything other than true or false. Do not try to replace the link by returning the new one; assign it to $uri, which is passed by reference.
    <?php
    class HTMLPurifier_URIFilter_MakeRedirect extends HTMLPurifier_URIFilter
    {
        /**
         * @type string
         */
        public $name = 'MakeRedirect';

        /**
         * @type array
         */
        protected $ourHostParts = false;

        /**
         * @param HTMLPurifier_Config $config
         * @return void
         */
        public function prepare($config)
        {
            // The host comes from the URI definition (the URI.Host or URI.Base directive)
            $our_host = $config->getDefinition('URI')->host;
            if ($our_host !== null) {
                $this->ourHostParts = array_reverse(explode('.', $our_host));
            }
        }

        /**
         * @param HTMLPurifier_URI $uri Reference
         * @param HTMLPurifier_Config $config
         * @param HTMLPurifier_Context $context
         * @return bool
         */
        public function filter(&$uri, $config, $context)
        {
            if (is_null($uri->host)) {
                return true; // relative link, nothing to do
            }
            if ($this->ourHostParts === false) {
                return false; // our own host is unknown
            }
            $host_parts = array_reverse(explode('.', $uri->host));
            foreach ($this->ourHostParts as $i => $x) {
                if (!isset($host_parts[$i]) || $host_parts[$i] != $this->ourHostParts[$i]) {
                    // Yii-specific: the URL manager builds the path to the redirect action;
                    // without Yii, put your own path here
                    $path = Yii::app()->createUrl('site/redirect');
                    $query = 'url=' . urlencode($uri->toString());
                    $uri = new HTMLPurifier_URI(
                        'http',
                        null,
                        Yii::app()->request->getServerName(), // returns $_SERVER['SERVER_NAME']
                        null,
                        $path,
                        $query,
                        null
                    );
                    break;
                }
            }
            return true;
        }
    }
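The host comparison used in filter() can be tried on its own, without the library. The sketch below is a simplified illustration rather than part of the original filter: the function name isExternalHost is hypothetical, and it only reproduces the reversed, part-by-part comparison described above.

```php
<?php
/**
 * Hypothetical helper (not part of HTML Purifier): returns true when
 * $linkHost does not end with the same dot-separated parts as $ourHost.
 */
function isExternalHost($ourHost, $linkHost)
{
    $ourParts  = array_reverse(explode('.', $ourHost));
    $linkParts = array_reverse(explode('.', $linkHost));

    foreach ($ourParts as $i => $part) {
        // A missing or different part means the link points elsewhere.
        if (!isset($linkParts[$i]) || $linkParts[$i] !== $part) {
            return true;
        }
    }
    return false;
}

var_dump(isExternalHost('site.ru', 'habrahabr.ru')); // bool(true)
var_dump(isExternalHost('site.ru', 'blog.site.ru')); // bool(false)
var_dump(isExternalHost('blog.site.ru', 'site.ru')); // bool(true)
```

With this comparison every subdomain of our host counts as internal, while a parent domain of our host counts as external.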


Apply filter

To do this, as the documentation suggests, we need to go through the HTMLPurifier_Config object.
    $config = HTMLPurifier_Config::createDefault();
    $uri = $config->getDefinition('URI');
    $uri->addFilter(new HTMLPurifier_URIFilter_MakeRedirect(), $config);
    $purifier = new HTMLPurifier($config);
    $clean_html = $purifier->purify($html);

A paragraph for happy Yii users

I am one of them (and have no regrets). Yii supports Purifier out of the box, but not everything goes smoothly.
Example from the documentation:
    $p = new CHtmlPurifier(); // Yii's wrapper class
    $p->options = array('URI.AllowedSchemes' => array('http' => true, 'https' => true)); // only http and https links are allowed
    $text = $p->purify($text);

From the same source we learn:
    /**
     * @var mixed the options to be passed to HTML Purifier instance.
     * This can be a HTMLPurifier_Config object, an array of directives (Namespace.Directive => Value)
     * or the filename of an ini file.
     * @see http://htmlpurifier.org/live/configdoc/plain.html
     */
    private $_options=null;

Everything seems fine: we can pass an HTMLPurifier_Config object instead of an array. Let's try:
    $purifier = new CHtmlPurifier();
    $config = HTMLPurifier_Config::createDefault();
    $config->set('AutoFormat.RemoveEmpty', true);
    $uri = $config->getDefinition('URI');
    $uri->addFilter(new HTMLPurifier_URIFilter_MakeRedirect(), $config);
    $purifier->options = $config;
    $clean_html = $purifier->purify($html);

  Warning Base directory /framework/vendors/htmlpurifier/standalone/HTMLPurifier/DefinitionCache/Serializer does not exist, please create or change using %Cache.SerializerPath 

We do not get upset: a look at Google and the CHtmlPurifier manual tells us to set the Cache.SerializerPath directive to Yii::app()->getRuntimePath(), which lets the library use that folder to store its cache.
Let's do that:
    $purifier = new CHtmlPurifier();
    $config = HTMLPurifier_Config::createDefault();
    $config->set('AutoFormat.RemoveEmpty', true);
    $config->set('Cache.SerializerPath', Yii::app()->getRuntimePath()); // <--
    $uri = $config->getDefinition('URI');
    $uri->addFilter(new HTMLPurifier_URIFilter_MakeRedirect(), $config);
    $purifier->options = $config;
    $clean_html = $purifier->purify($html);

 Cannot set directive after finalization invoked on line 127 in file /framework/web/widgets/CHtmlPurifier.php 

Now the library complains that a directive is being set after the config has been finalized. The culprit is CHtmlPurifier itself, which sets Cache.SerializerPath again in its createNewHtmlPurifierInstance() method:
    protected function createNewHtmlPurifierInstance()
    {
        $this->_purifier=new HTMLPurifier($this->getOptions());
        $this->_purifier->config->set('Cache.SerializerPath',Yii::app()->getRuntimePath());
        return $this->_purifier;
    }

Here, I confess, I spent quite a while looking for an elegant solution, but alas: I found nothing better than creating a GHtmlPurifier class that inherits from CHtmlPurifier and overrides the createNewHtmlPurifierInstance() method.
I put the new file in the protected/components/ folder, and the code finally worked.
    $htmlpurifier = new GHtmlPurifier();
    $config = HTMLPurifier_Config::createDefault();
    $config->set('Cache.SerializerPath', Yii::app()->getRuntimePath());
    $uri = $config->getDefinition('URI');
    $uri->addFilter(new HTMLPurifier_URIFilter_MakeRedirect(), $config);
    $htmlpurifier->options = $config;
    return $htmlpurifier->purify($text);

Task 2 - adding target="_blank"

I will not bore you with examples of non-working code and will say right away that HTML.TargetBlank works only on external links, so it no longer helps us. And URI filters have no access to the tag and its attributes.
Already accustomed to the library's good documentation, I turned to the manual, but alas, the Advanced API section I needed was empty apart from the note "Filed under Development".
There was nothing left but to dive into the sources and find out how HTML.TargetBlank is implemented.
Here it is:
HTMLPurifier_AttrTransform_TargetBlank
    /**
     * Adds target="blank" to all outbound links. This transform is
     * only attached if Attr.TargetBlank is TRUE. This works regardless
     * of whether or not Attr.AllowedFrameTargets
     */
    class HTMLPurifier_AttrTransform_TargetBlank extends HTMLPurifier_AttrTransform
    {
        private $parser;

        public function __construct()
        {
            $this->parser = new HTMLPurifier_URIParser();
        }

        public function transform($attr, $config, $context)
        {
            if (!isset($attr['href'])) {
                return $attr;
            }
            // XXX Kind of inefficient
            $url = $this->parser->parse($attr['href']);
            $scheme = $url->getSchemeObj($config, $context);
            if ($scheme->browsable && !$url->isBenign($config, $context)) {
                $attr['target'] = '_blank';
            }
            return $attr;
        }
    }


So I decided to create my own transform, which skips the external-address check and adds target="_blank" to every link it finds.
I think anyone can handle copying the class and deleting a couple of lines in the transform method, so I will not give my listing. Do not forget to rename the class; I called mine HTMLPurifier_AttrTransform_TargetBlankAll and put it in the same protected/components/ folder.
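For reference, such a class might look like the sketch below. This is an assumption about what remains once the external-link check is removed from HTMLPurifier_AttrTransform_TargetBlank, not the author's actual file. The stub base class at the top is only a stand-in so that the snippet runs without HTML Purifier; drop it in a real project.

```php
<?php
// Stand-in for the library's abstract base class, declared only when
// HTML Purifier is not loaded. Remove this block in a real project.
if (!class_exists('HTMLPurifier_AttrTransform')) {
    abstract class HTMLPurifier_AttrTransform
    {
        abstract public function transform($attr, $config, $context);
    }
}

/**
 * Sketch: adds target="_blank" to every link that has an href,
 * without checking whether the link is external.
 */
class HTMLPurifier_AttrTransform_TargetBlankAll extends HTMLPurifier_AttrTransform
{
    public function transform($attr, $config, $context)
    {
        if (!isset($attr['href'])) {
            return $attr; // an <a> without href is left untouched
        }
        $attr['target'] = '_blank';
        return $attr;
    }
}
```

Since no URL parsing is needed any more, the constructor and the parser property of the original class can go as well.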
But this is not enough: the transform is not picked up automatically, so we need a module class that adds it to our configuration. I added a couple of comments to the code to make clear what to change if you want to write your own module.
HTMLPurifier_HTMLModule_TargetBlankAll.php
    class HTMLPurifier_HTMLModule_TargetBlankAll extends HTMLPurifier_HTMLModule
    {
        public $name = 'TargetBlankAll'; // module name; use your own here

        public function setup($config)
        {
            $a = $this->addBlankElement('a'); // the element to process, here the A tag
            $a->attr_transform_post[] = new HTMLPurifier_AttrTransform_TargetBlankAll(); // our transform, run after validation
            // to run a transform before validation, use $a->attr_transform_pre[] instead
        }
    }


I also put this file in the protected/components/ folder.
Now it remains to add the module to our config and enjoy the result. This part is not entirely intuitive: we need to fetch the HTML definition object with the $raw = true parameter, so that it is initialized and the __construct() method of the HTMLPurifier_HTMLDefinition class runs.
The __construct() method initializes the $this->manager property, which we use to register our module.
    $htmlpurifier = new GHtmlPurifier();
    $config = HTMLPurifier_Config::createDefault();
    $config->set('Cache.SerializerPath', Yii::app()->getRuntimePath());
    $uri = $config->getDefinition('URI');
    $uri->addFilter(new HTMLPurifier_URIFilter_MakeRedirect(), $config);
    $html = $config->getHTMLDefinition(true); // raw HTMLPurifier_HTMLDefinition object
    $html->manager->addModule('TargetBlankAll'); // register our module by name
    $htmlpurifier->options = $config;
    return $htmlpurifier->purify($text);

Ta-da! Input:
    <a href="http://site.ru/">http://site.ru</a>
    <a href="http://habrahabr.ru/">http://habrahabr.ru</a>

Output:
    <a href="http://site.ru/" target="_blank">http://site.ru</a>
    <a href="http://site.ru/redirect/?url=http%3A%2F%2Fhabrahabr.ru%2F" target="_blank">http://habrahabr.ru</a>
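The encoded address in the rewritten link is simply the result of urlencode() applied to the original URL. A small standalone sketch, in which buildRedirectUrl is a hypothetical helper name and the host and path are taken from the example output:

```php
<?php
/**
 * Hypothetical helper: builds the redirect address the way the filter does,
 * with the user's link passed in the "url" GET parameter.
 */
function buildRedirectUrl($ourHost, $redirectPath, $link)
{
    return 'http://' . $ourHost . $redirectPath . '?url=' . urlencode($link);
}

echo buildRedirectUrl('site.ru', '/redirect/', 'http://habrahabr.ru/'), "\n";
// http://site.ru/redirect/?url=http%3A%2F%2Fhabrahabr.ru%2F
```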

Both tasks are completed!


I hope this article has introduced you to this wonderful tool and will help make your website more interesting and safer at the same time, giving your users the opportunity to create interesting content with all the features of HTML.

This library is not fast, so do not run it on the fly every time you output data.

Source: https://habr.com/ru/post/202188/

