
Removing attributes from HTML tags

WYSIWYG editors are an integral part of sites with editable content. Their drawback is that they stuff HTML tags with all kinds of attributes. In this article I want to show how you can remove "unnecessary" attributes from a large number of records.

In my case, I needed to move content from Joomla to WordPress. This was done with CakePHP: the content of the required Joomla articles was saved as WP posts. But the tables did not match the new site's design, because all of them carried attributes such as border, width, cellspacing, cellpadding, align and class. All of these had to go. Two options came to mind:
  1. Use regular expressions
  2. Dump the database and run a find-and-replace over the dump.

Neither option was really suitable: the attributes appear in different orders and in different combinations from tag to tag.

PHP's DOMDocument class solved the problem. Here is the basic approach:

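```php
$dom = new DOMDocument;
$dom->loadHTML($html);            // load the HTML
$xpath = new DOMXPath($dom);      // create an XPath object for the document
$tags = $xpath->query('//tag');   // find all <tag> elements ("tag" is a placeholder)
foreach ($tags as $tag) {
    $tag->removeAttribute('attr'); // remove the attribute ("attr" is a placeholder)
}
// get the modified HTML back as a string
$new = $dom->saveHTML();
```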
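The template above can be wrapped in a reusable function. The sketch below is not from the original article: `stripAttributes()` is a hypothetical name, and the `$rules` array (tag name => list of attributes to drop) is an assumed format. It also assumes the input is a UTF-8 fragment, prepends a `<meta>` charset hint instead of calling `mb_convert_encoding()`, and returns only the children of `<body>`, because `saveHTML()` on the whole document adds `<!DOCTYPE>`, `<html>` and `<body>` wrappers to every post.

```php
<?php
// Hypothetical helper (not from the article): removes the listed attributes
// from the listed tags and returns only the contents of <body>.
// $rules maps a tag name to the attribute names to remove, e.g.
// ['table' => ['width', 'border'], 'tr' => ['align']].
function stripAttributes(string $html, array $rules): string
{
    $dom = new DOMDocument();

    // Collect libxml warnings (e.g. about unknown HTML5 tags) instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // Assumption: $html is a UTF-8 fragment. The prepended <meta> tells libxml
    // the charset; the parser places it in <head>, so it is not part of the result.
    $dom->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    foreach ($rules as $tag => $attributes) {
        foreach ($xpath->query('//' . $tag) as $element) {
            foreach ($attributes as $attribute) {
                // removeAttribute() simply returns false if the attribute is absent.
                $element->removeAttribute($attribute);
            }
        }
    }

    // Serialize the children of <body> one by one, so the result has no
    // <!DOCTYPE>, <html> or <body> wrappers.
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    $result = '';
    foreach ($body->childNodes as $child) {
        $result .= $dom->saveHTML($child);
    }
    return $result;
}
```

For example, `stripAttributes($content, ['table' => ['width', 'border'], 'tr' => ['align']])` would strip those attributes while leaving others, such as `class`, in place.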

Here is my final code, which runs inside the loop over the posts:
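```php
$dom = new DOMDocument;
// Convert non-ASCII characters to HTML entities so loadHTML() does not mangle UTF-8 text
// (note: the 'HTML-ENTITIES' encoding is deprecated as of PHP 8.2)
$dom->loadHTML(mb_convert_encoding($content, 'HTML-ENTITIES', 'UTF-8'));
$xpath = new DOMXPath($dom);

// remove attributes from <table>
$tables = $xpath->query('//table');
foreach ($tables as $table) {
    $table->removeAttribute('width');
    $table->removeAttribute('cellspacing');
    $table->removeAttribute('cellpadding');
    $table->removeAttribute('border');
}

// remove attributes from <tr>
$rows = $xpath->query('//tr');
foreach ($rows as $row) {
    $row->removeAttribute('align');
}

// saveHTML() returns a whole document, including <!DOCTYPE>, <html> and <body>
$newContent = $dom->saveHTML();
```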
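Instead of one `query()` call and one loop per tag, XPath can also select attribute nodes directly. The following is a minimal alternative sketch, not the author's code: it removes the same attributes as the final code above with a single union query, and each matched attribute is detached from its owner element. The sample `$html` is made up for illustration.

```php
<?php
// Made-up sample input for illustration.
$html = '<table width="600" border="1" cellspacing="0" cellpadding="2">'
      . '<tr align="center"><td>Cell</td></tr></table>';

$dom = new DOMDocument();
$previous = libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($dom);
// One XPath 1.0 union that selects the unwanted attribute nodes themselves.
$query = '//table/@width | //table/@cellspacing | //table/@cellpadding'
       . ' | //table/@border | //tr/@align';

foreach ($xpath->query($query) as $attribute) {
    // Each result is a DOMAttr; remove it from the element that owns it.
    $attribute->ownerElement->removeAttributeNode($attribute);
}

// As in the original, this still outputs the full document with its wrappers.
echo $dom->saveHTML();
```

A union query keeps the list of unwanted attributes in one string, which may be easier to maintain than separate loops when the list grows.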

So if you need to get rid of attributes in the tags of your posts or articles, a simple PHP script that walks through them will do the job.
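A batch run over many records might look like the sketch below. The `$posts` array is a hypothetical stand-in for rows loaded from the database (the article did this step with CakePHP, which is not shown), and `cleanPost()` is an illustrative helper that removes the attributes from the article's final code, counts how many it removed, and returns only the `<body>` contents. Only posts that actually changed are collected for writing back.

```php
<?php
// Hypothetical stand-in for the migrated posts: ID => post content.
// In a real migration these rows would come from the database.
$posts = [
    101 => '<table border="1" width="500"><tr align="left"><td>Price</td></tr></table>',
    102 => '<p>No tables here.</p>',
];

// Illustrative helper: removes the attributes listed in the article,
// stores the number of removed attributes in $removed, returns <body> contents.
function cleanPost(string $content, ?int &$removed): string
{
    $removed = 0;
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    // Assumption: content is UTF-8; the <meta> hint ends up in <head>.
    $dom->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $content);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $query = '//table/@width | //table/@cellspacing | //table/@cellpadding'
           . ' | //table/@border | //tr/@align';
    foreach ($xpath->query($query) as $attribute) {
        $attribute->ownerElement->removeAttributeNode($attribute);
        $removed++;
    }

    // Keep only the children of <body>, without the wrappers loadHTML() adds.
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    $result = '';
    foreach ($body->childNodes as $child) {
        $result .= $dom->saveHTML($child);
    }
    return $result;
}

$updated = [];
foreach ($posts as $id => $content) {
    $clean = cleanPost($content, $removed);
    if ($removed > 0) {
        // Only changed posts need to be written back to the database.
        $updated[$id] = $clean;
        echo "Post $id: removed $removed attribute(s)\n";
    }
}
```

Skipping unchanged posts keeps the write-back step small, which may matter when the migration covers thousands of records.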

Source: https://habr.com/ru/post/254203/

