
PHP: How to parse a complex XML file and not drown in your own code

Hello!

The XML format is used very widely. Along with CSV, JSON and others, XML is one of the most common ways of presenting data for exchange between different services, programs and sites. One example is the CommerceML format for exchanging goods and orders between 1C "Trade Management" and an online store.

Therefore, almost everyone involved in creating web services has to parse XML documents from time to time. In this post I suggest one way of doing this as clearly and transparently as possible, using XMLReader.

PHP offers several ways to work with the XML format. Without going into details, I will say that in principle they can be divided into two groups:
  1. Loading the entire XML document into memory as an object and working with this object
  2. Step-by-step reading of an XML string at the tag, attribute, and text level
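
To make the two groups concrete, here is a minimal sketch that reads the same tiny document both ways. The sample XML and the variable names are made up for this illustration; SimpleXML stands in for the first group and XMLReader for the second.

```php
<?php
// Hypothetical sample document, used only for this illustration.
$xml = '<items><item price="10"/><item price="25"/></items>';

// Group 1: load the whole document into memory as an object.
$doc = simplexml_load_string($xml);
$total1 = 0;
foreach ($doc->item as $item) {
    $total1 += (int) $item['price'];
}

// Group 2: read the document node by node.
$reader = new XMLReader();
$reader->XML($xml);
$total2 = 0;
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'item') {
        $total2 += (int) $reader->getAttribute('price');
    }
}
$reader->close();

echo $total1, ' ', $total2, "\n"; // prints "35 35"
```

Both loops arrive at the same total; the difference is that the first keeps the whole tree in memory, while the second only ever sees the current node.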

The first method is more intuitive, and the resulting code is more transparent. It is well suited to small files.

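The second method is a lower-level approach, which gives us a number of advantages and at the same time complicates life somewhat. Let us dwell on it in more detail. Its main advantage is that the document is never held in memory as a whole, so even very large files can be processed.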

But we sacrifice the readability of the code. If the goal of our parsing is, say, to sum the values in certain places of an XML file with a simple structure, there is no problem.
However, if the file structure is complex, the handling of the data depends on the full path to that data, and the result has to include many parameters, then we end up with rather muddled code.
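
As a rough illustration of how path-dependent handling gets tangled with bare XMLReader, the sketch below keeps a manual stack of open tags and branches on the joined path. The document and the tag names are invented for this example, and this is only one of several ways to track the path.

```php
<?php
// Hypothetical document: <x> must be treated differently inside <b> and <d>.
$xml = '<root><b><x>one</x></b><d><x>two</x></d></root>';

$reader = new XMLReader();
$reader->XML($xml);

$path = array();     // stack of currently open tag names
$insideB = array();  // texts of root/b/x
$insideD = array();  // texts of root/d/x

while ($reader->read()) {
    switch ($reader->nodeType) {
        case XMLReader::ELEMENT:
            $path[] = $reader->name;
            // Empty elements produce no END_ELEMENT, so pop right away.
            if ($reader->isEmptyElement) {
                array_pop($path);
            }
            break;
        case XMLReader::TEXT:
            // Every rule has to spell out the full path by hand.
            $current = implode('/', $path);
            if ($current === 'root/b/x') {
                $insideB[] = $reader->value;
            } elseif ($current === 'root/d/x') {
                $insideD[] = $reader->value;
            }
            break;
        case XMLReader::END_ELEMENT:
            array_pop($path);
            break;
    }
}
$reader->close();

echo implode(',', $insideB), ' | ', implode(',', $insideD), "\n"; // prints "one | two"
```

With two rules this is still manageable, but every new tag adds another branch, and the structure of the document is nowhere visible in the code.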

So I wrote a class that has since made my life easier. It simplifies writing the rules and greatly improves readability: programs become several times shorter and the code becomes cleaner.

The main idea is as follows: we store our XML schema, and the way we work with it, in a single array that repeats the hierarchy of only the tags we need. For any tag in that array we can also register handler functions for opening the tag, closing it, reading its attributes, reading its text, or all of these. Thus the structure of our XML and the handlers live in one place, and one look at the processing structure is enough to understand what we do with the XML file. I should note that on simple tasks (as in the examples below) the gain in readability is small, but it becomes obvious when working with files of a relatively complex structure, for example the exchange format with 1C.

Now the specifics. Here is our class:

Class XMLReaderStruct:

    class XMLReaderStruct extends XMLReader
    {
        public function xmlStruct($xml, $structure, $encoding = null, $options = 0, $debug = false)
        {
            $this->xml($xml, $encoding, $options);

            $stack = array();      // parent structure nodes, indexed by depth
            $node = &$structure;   // structure node for the current position
            $skipToDepth = false;  // depth of the unknown tag being skipped, or false

            while ($this->read()) {
                switch ($this->nodeType) {
                    case self::ELEMENT:
                        if ($skipToDepth === false) {
                            // We are inside the described structure. If the tag is listed
                            // in the current node, descend into it; otherwise skip it
                            // together with everything nested inside it.
                            if (isset($node[$this->name])) {
                                if ($debug) echo "[ Open ]: ", $this->name, " - found in the structure. Descending.\r\n";
                                $stack[$this->depth] = &$node;
                                $node = &$node[$this->name];
                                if (isset($node["__open"])) {
                                    if ($debug) echo "  Open handler for ", $this->name, " - calling.\r\n";
                                    if (false === $node["__open"]()) return false;
                                }
                                if (isset($node["__attrs"])) {
                                    if ($debug) echo "  Attribute handler for ", $this->name, " - calling.\r\n";
                                    $attrs = array();
                                    if ($this->hasAttributes) {
                                        while ($this->moveToNextAttribute()) {
                                            $attrs[$this->name] = $this->value;
                                        }
                                        // Fix: go back to the element, otherwise name, depth and
                                        // isEmptyElement below would refer to the last attribute.
                                        $this->moveToElement();
                                    }
                                    if (false === $node["__attrs"]($attrs)) return false;
                                }
                                if ($this->isEmptyElement) {
                                    // An empty tag produces no END_ELEMENT, so close it here.
                                    if ($debug) echo "  Tag ", $this->name, " is empty. Returning to the parent.\r\n";
                                    if (isset($node["__close"])) {
                                        if ($debug) echo "  Close handler for ", $this->name, " - calling.\r\n";
                                        if (false === $node["__close"]()) return false;
                                    }
                                    $node = &$stack[$this->depth];
                                }
                            } else {
                                $skipToDepth = $this->depth;
                                if ($debug) echo "[ Open ]: ", $this->name, " - not in the structure. Skipping until depth ", $skipToDepth, " is closed.\r\n";
                            }
                        } else {
                            if ($debug) echo "( Open ): ", $this->name, " - inside a skipped tag.\r\n";
                        }
                        break;

                    case self::TEXT:
                        if ($skipToDepth === false) {
                            if ($debug) echo "[ Text ]: ", $this->value, " - in the structure.\r\n";
                            if (isset($node["__text"])) {
                                if ($debug) echo "  Text handler - calling.\r\n";
                                if (false === $node["__text"]($this->value)) return false;
                            }
                        } else {
                            if ($debug) echo "( Text ): ", $this->value, " - inside a skipped tag.\r\n";
                        }
                        break;

                    case self::END_ELEMENT:
                        if ($skipToDepth === false) {
                            // $skipToDepth is not set, so the tag being closed belongs to
                            // the structure: call its handler and return to the parent.
                            if ($debug) echo "[ Close ]: ", $this->name, " - in the structure. Returning to the parent.\r\n";
                            if (isset($node["__close"])) {
                                if ($debug) echo "  Close handler for ", $this->name, " - calling.\r\n";
                                if (false === $node["__close"]()) return false;
                            }
                            $node = &$stack[$this->depth];
                        } elseif ($this->depth === $skipToDepth) {
                            // The skipped tag itself is being closed: stop skipping.
                            if ($debug) echo "[ Close ]: ", $this->name, " - reached depth ", $skipToDepth, ". Skipping is over.\r\n";
                            $skipToDepth = false;
                        } else {
                            if ($debug) echo "( Close ): ", $this->name, " - inside a skipped tag.\r\n";
                        }
                        break;
                }
            }
            return true;
        }
    }


As you can see, our class extends the capabilities of the standard XMLReader class, to which we have added one method:

 xmlStruct($xml, $structure, $encoding = null, $options = 0, $debug = false) 

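Parameters:

  1. $xml: the XML document as a string
  2. $structure: an associative array that describes the tags to process and their handlers (see below)
  3. $encoding, $options: passed unchanged to XMLReader::xml()
  4. $debug: when true, the method prints a trace of what it does with each node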

The $structure argument

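This is an associative array whose structure repeats the hierarchy of tags in the XML file. In addition, each element of the structure may contain handler functions, defined as fields with the corresponding key:

  1. "__open": called when the tag opens; takes no arguments
  2. "__attrs": called after "__open" with an associative array of the tag's attributes
  3. "__text": called with the text content of the tag
  4. "__close": called when the tag closes (also for an empty tag); takes no arguments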

If any handler returns false, parsing stops and xmlStruct() returns false. The examples below show how to construct the $structure argument:

Example 1: the order in which handlers are called
Suppose we have the following XML file:

    <?xml version="1.0" encoding="UTF-8"?>
    <root>
        <a attr_1="123" attr_2="456">Abc</a>
        <b>
            <x>This is node &lt;x&gt; inside &lt;b&gt;</x>
        </b>
        <c></c>
        <d>
            <x>This is node &lt;x&gt; inside &lt;d&gt;</x>
        </d>
        <e></e>
    </root>

    // $xml is assumed to hold the XML document shown above as a string.
    $structure = array(
        'root' => array(
            'a' => array(
                "__attrs" => function($array) { echo "ATTR ARRAY IS ", json_encode($array), "\r\n"; },
                "__text" => function($text) { echo "TEXT a {$text}\r\n"; }
            ),
            'b' => array(
                "__open" => function() { echo "OPEN b\r\n"; },
                "__close" => function() { echo "CLOSE b\r\n"; },
                'x' => array(
                    "__open" => function() { echo "OPEN x\r\n"; },
                    "__text" => function($text) { echo "TEXT x {$text}\r\n"; },
                    "__close" => function() { echo "CLOSE x\r\n"; }
                )
            )
        )
    );

    $xmlReaderStruct = new XMLReaderStruct();
    $xmlReaderStruct->xmlStruct($xml, $structure);

The handlers will be called in the following order:

  1. attributes of root->a
  2. text of root->a
  3. opening of root->b
  4. opening of root->b->x
  5. text of root->b->x
  6. closing of root->b->x
  7. closing of root->b

The remaining tags will not be processed (in particular, root->d->x is ignored because it is not part of the structure).

Example 2: a simple practical task
Suppose we have the following XML file:

    <?xml version="1.0" encoding="UTF-8"?>
    <shop>
        <record>
            <id>0</id>
            <type>product</type>
            <name>Some product name. ID:0</name>
            <qty>0</qty>
            <price>0</price>
        </record>
        <record>
            <id>1</id>
            <type>service</type>
            <name>Some product name. ID:1</name>
            <qty>1</qty>
            <price>15</price>
        </record>
        <record>
            <id>2</id>
            <type>product</type>
            <name>Some product name. ID:2</name>
            <qty>2</qty>
            <price>30</price>
        </record>
        <record>
            <id>3</id>
            <type>service</type>
            <name>Some product name. ID:3</name>
            <qty>3</qty>
            <price>45</price>
        </record>
        <record>
            <id>4</id>
            <type>product</type>
            <name>Some product name. ID:4</name>
            <qty>4</qty>
            <price>60</price>
        </record>
        <record>
            <id>5</id>
            <type>service</type>
            <name>Some product name. ID:5</name>
            <qty>5</qty>
            <price>75</price>
        </record>
    </shop>

This is a kind of sales receipt with goods and services.

Each receipt entry contains a record ID, a type (“product” or “service”), a name, a quantity and a price.

Task: calculate the receipt total, separately for goods and for services.

    include_once "xmlreaderstruct.class.php";

    $x = new XMLReaderStruct();
    $productsSum = 0;
    $servicesSum = 0;

    $structure = array(
        'shop' => array(
            'record' => array(
                'type' => array(
                    "__text" => function($text) use (&$currentRecord) {
                        $currentRecord['isService'] = $text === 'service';
                    }
                ),
                'qty' => array(
                    "__text" => function($text) use (&$currentRecord) {
                        $currentRecord['qty'] = (int)$text;
                    }
                ),
                'price' => array(
                    "__text" => function($text) use (&$currentRecord) {
                        $currentRecord['price'] = (int)$text;
                    }
                ),
                // Start a fresh record when <record> opens.
                '__open' => function() use (&$currentRecord) {
                    $currentRecord = array();
                },
                // Add the finished record to the matching total when <record> closes.
                '__close' => function() use (&$currentRecord, &$productsSum, &$servicesSum) {
                    $money = $currentRecord['qty'] * $currentRecord['price'];
                    if ($currentRecord['isService']) $servicesSum += $money;
                    else $productsSum += $money;
                }
            )
        )
    );

    $x->xmlStruct(file_get_contents('example.xml'), $structure);
    echo 'Overall products price: ', $productsSum, ', Overall services price: ', $servicesSum;

Source: https://habr.com/ru/post/452648/

