Background

I had decided to run a small service on the hosting I was given, but it turned out that none of the usual XML parsers were available there: neither SimpleXML nor DOM XML, only libxml and xml-rpc. Without thinking twice, I decided to write my own. I only needed to parse simple RSS feeds, so an xml => array class was enough. [1]
But that is clearly not enough for an interesting article, so now we will write our own replacement for SimpleXML and, along the way, go over many interesting features of PHP 5.
Problem statement
Elements will be accessed as properties of the object, for example $xml->element, and attributes of an element will be accessed as array entries, i.e. $xml->element['attr']. We will also support checking whether an attribute exists with isset() and iterating over elements with foreach. So, let's begin.
A little bit of magic?
PHP 5 defines a number of 'magic' methods for classes. Their names begin with a double underscore '__', and they are called automatically when a certain action occurs. [2] We will need the following:
- void __construct([mixed $args [, $...]]) is the best-known magic method; it is called when an object is created with the new operator.
- mixed __get($name) is called when reading a property that is not declared (or not accessible); for example, $obj->element will call __get('element') if element has not been declared as a class field.
- void __set($name, $value) is called, correspondingly, when such a property is assigned; for example, $obj->element = $some_var will call __set('element', $some_var).
- string __toString() is called whenever the object is used as a string, say in echo $obj or strval($obj). We need this method to get the content of an element. Unfortunately, there are no similar methods for returning a non-string value, so to convert an element into a number you have to write intval(strval($obj)).
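To make the list above concrete, here is a minimal sketch of the three overloading methods working together. The class name Element and its $text and $extra fields are hypothetical and exist only for this illustration; they are not the parser class built later in the article.

```php
<?php
// Hypothetical demo class: not the article's parser, just the magic methods in isolation.
class Element
{
    private $text;
    private $extra = array();

    public function __construct($text)
    {
        $this->text = $text;
    }

    // Called when reading a property that is not declared or not accessible.
    public function __get($name)
    {
        return isset($this->extra[$name]) ? $this->extra[$name] : null;
    }

    // Called when writing a property that is not declared or not accessible.
    public function __set($name, $value)
    {
        $this->extra[$name] = $value;
    }

    // Called whenever the object is used as a string.
    public function __toString()
    {
        return $this->text;
    }
}

$price = new Element('42');
$price->currency = 'USD';          // goes through __set('currency', 'USD')
echo $price->currency, "\n";       // goes through __get('currency'); prints USD
echo $price, "\n";                 // goes through __toString(); prints 42
$n = intval(strval($price)) + 1;   // string first, then number: 43
echo $n, "\n";
```

Because __set stores the value in a private array, no real property is created on the object, which is the same trick the parser uses to expose child elements.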
SPL
The Standard PHP Library (SPL), like the STL from the world of C++, was created to provide developers with tools for solving typical problems. [3] We will need to implement the following interfaces:
- ArrayAccess lets an object be accessed as an array, for example $obj['name'] or isset($obj['name']).
- IteratorAggregate lets an object be iterated over with foreach.
- Countable lets count() report the number of descendants of an element.
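As a rough illustration of what these three interfaces require, here is a small stand-alone container. The class name Shelf and its $items array are hypothetical. The sketch assumes PHP 7.1 or newer, because it uses return type declarations that PHP 5 did not have; the #[\ReturnTypeWillChange] line is read as a comment by PHP 7 and only silences a deprecation notice on PHP 8.1+.

```php
<?php
// Hypothetical container used only to show the interface methods.
class Shelf implements ArrayAccess, IteratorAggregate, Countable
{
    private $items = array();

    // isset($shelf['key'])
    public function offsetExists($offset): bool
    {
        return isset($this->items[$offset]);
    }

    // $shelf['key']
    #[\ReturnTypeWillChange]
    public function offsetGet($offset)
    {
        return $this->items[$offset];
    }

    // $shelf['key'] = $value, or $shelf[] = $value (then $offset is null)
    public function offsetSet($offset, $value): void
    {
        if ($offset === null) {
            $this->items[] = $value;
        } else {
            $this->items[$offset] = $value;
        }
    }

    // unset($shelf['key'])
    public function offsetUnset($offset): void
    {
        unset($this->items[$offset]);
    }

    // count($shelf)
    public function count(): int
    {
        return count($this->items);
    }

    // foreach ($shelf as $key => $value)
    public function getIterator(): Iterator
    {
        return new ArrayIterator($this->items);
    }
}

$shelf = new Shelf();
$shelf['a'] = 'first';
$shelf['b'] = 'second';

$seen = array();
foreach ($shelf as $key => $value) {
    $seen[] = "$key=$value";
}
echo implode(', ', $seen), "\n";   // prints a=first, b=second
echo count($shelf), "\n";          // prints 2
```

The parser class in this article implements the same methods, but routes array access to attributes and iteration to sibling elements.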
XML and expat
These are the standard libraries for working with XML and building XML parsers [4], which is exactly what we need to solve our problem. For the sake of interest, you could instead parse the XML file by hand, for example with regular expressions.
The expat functions we are most interested in are the following:
- bool xml_set_element_handler(resource $parser, callback $start_element_handler, callback $end_element_handler) sets the functions called when an opening tag and a closing tag are found, respectively.
- bool xml_set_character_data_handler(resource $parser, callback $handler) sets the function that receives the character content of an element. Note that it is also called for whitespace-only text between tags.
Note: a callback in PHP is either the name of a function passed as a string, or an array of two values, where the first is a class name or an object and the second is the name of a method of that class.
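The event-driven style of these handlers is easiest to see on a tiny input. The sketch below records the order in which the handlers fire. The function name trace_xml_events and the event labels are made up for this illustration, and it passes closures as callbacks, which assumes PHP 5.3 or newer with the xml extension loaded.

```php
<?php
// Hypothetical helper: returns the sequence of parser events for an XML string.
function trace_xml_events($xml)
{
    $events = array();
    $parser = xml_parser_create();
    // Keep tag names as written instead of upper-casing them.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);

    xml_set_element_handler(
        $parser,
        function ($parser, $tag, $attributes) use (&$events) {
            $events[] = "open:$tag";
        },
        function ($parser, $tag) use (&$events) {
            $events[] = "close:$tag";
        }
    );

    xml_set_character_data_handler(
        $parser,
        function ($parser, $text) use (&$events) {
            // Skip whitespace-only chunks between tags.
            if (trim($text) !== '') {
                $events[] = 'text:' . trim($text);
            }
        }
    );

    // The whole document is passed at once, so mark it as the final chunk.
    xml_parse($parser, $xml, true);
    return $events;
}

echo implode(' ', trace_xml_events('<feed><item id="1">Hello</item></feed>')), "\n";
```

The parser in this article relies on exactly this ordering: every open event is followed, eventually, by a matching close event, which is what makes a "current element" pointer sufficient for building the tree.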
Pointers
Pointers (references) in PHP do not work quite as they do in C or C++. [5] The construction $a = &$b only means that $a now points to the same data as $b; it is impossible to change where $b points through $a. You could say that redirection is limited to one level of nesting.
Starting with version 5, PHP passes objects to functions by handle, so the function works with the same object rather than a copy. Other values are copied lazily: memory for the copy is allocated only when the value is changed. In our case, references are useful for pointing to the parent element.
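A short sketch of both behaviours described above. The class Node and the two helper functions are hypothetical and serve only as a demonstration.

```php
<?php
$b = 1;
$a = &$b;            // $a and $b are now two names for the same value
$a = 2;              // writing through $a changes $b as well
$afterWrite = $b;    // 2

$c = 10;
$a = &$c;            // rebinds $a only; $b cannot be redirected through $a
$afterRebind = $b;   // still 2

// Hypothetical class used only for this demonstration.
class Node
{
    public $name;

    public function __construct($name)
    {
        $this->name = $name;
    }
}

// Changes the caller's object, because the parameter holds a handle to it.
function rename_node(Node $node)
{
    $node->name = 'renamed';
}

// Only rebinds the local parameter; the caller's variable is untouched.
function replace_node(Node $node)
{
    $node = new Node('replaced');
}

$node = new Node('original');
rename_node($node);
replace_node($node);
echo $node->name, "\n";   // prints renamed
```

This is why a tree of objects can keep parent links with plain assignment: storing an object in a property stores a handle to the same object, not a copy.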
Coding
With the theory out of the way, we can start writing the parser itself.
Each object will represent a single XML element, so it needs the following properties: the tag name, the attributes, the data, a reference to the parent, and an array of descendants. In addition, we need a variable pointing to the current element. As for methods, we need to implement all the interfaces, add a child, set the reference to the parent, assign the contents of the element, and write the three functions required by the parser: opening a tag, closing a tag, and receiving the contents of an element.
Here is a sketch of the future class:
class XML implements ArrayAccess, IteratorAggregate, Countable {
    private $pointer;
    private $tagName;
    private $attributes = array();
    private $cdata;
    private $parent;
    private $childs = array();
    public function __construct($data) {}
    public function __toString() { return; }
    public function __get($name) { return; }
    public function offsetGet($offset) { return; }
    public function offsetExists($offset) { return; }
    public function offsetSet($offset, $value) { return; }
    public function offsetUnset($offset) { return; }
    public function count() { return; }
    public function getIterator() { return; }
    public function appendChild($tag, $attributes) { return; }
    public function setParent(XML $parent) {}
    public function getParent() { return; }
    public function setCData($cdata) {}
    private function parse($data) {}
    private function tag_open($parser, $tag, $attributes) {}
    private function cdata($parser, $cdata) {}
    private function tag_close($parser, $tag) {}
}
Now let's implement the functions, starting in order with the constructor. In our case it can take two kinds of values: a string (XML) or an array of two elements (element name, attributes). Since PHP has no method overloading by parameter types, the type has to be checked manually.
public function __construct($data) {
    if (is_array($data)) {
        // array(tag name, attributes): create a single element
        list($this->tagName, $this->attributes) = $data;
    } elseif (is_string($data)) {
        // XML source: parse it into a tree
        $this->parse($data);
    }
}
As already mentioned, the __toString() magic method lets the user get the data of an element as a string and then convert it to any type they want. Unfortunately, it is impossible to return the desired type directly, so this is the only way.
At the same time, we will look at the next magic method, __get($name), which gives access to the descendants of the current element. It is quite logical that if there is only one descendant with a given name, it is returned directly, without the need to address index 0 of the array. For example: $xml->rss->channel->item[5]->url instead of $xml->rss[0]->channel[0]->item[5]->url[0], if the elements rss, channel and url each occur only once at their nesting level.
public function __toString() {
    return $this->cdata;
}
public function __get($name) {
    if (isset($this->childs[$name])) {
        // a single child is returned directly, several as an array
        if (count($this->childs[$name]) == 1)
            return $this->childs[$name][0];
        else
            return $this->childs[$name];
    }
    throw new Exception("UFO steals [$name]!");
}
The offsetGet, offsetExists, offsetSet and offsetUnset functions implement the ArrayAccess interface, which lets an object be accessed as an array. We use it to access element attributes. offsetSet and offsetUnset remain stubs for now.
public function offsetGet($offset) {
    if (isset($this->attributes[$offset]))
        return $this->attributes[$offset];
    throw new Exception("Holy cow! There is no [$offset] attribute!");
}
public function offsetExists($offset) {
    return isset($this->attributes[$offset]);
}
And now we run into a problem caused by that recent decision. If we run a foreach loop over an element that occurs only once, the loop will run over the XML object itself! So we have to give up the simple way of iterating over an element's attributes with foreach and provide a getAttributes() method for that instead. The iterator and the count will instead cover the array of same-named elements to which the current element belongs; if the element has no parent, they cover an array containing only the current element. This is how the IteratorAggregate and Countable interfaces are implemented.
public function count() {
    // number of same-named siblings, or 1 for an element without a parent
    if ($this->parent !== null)
        return count($this->parent->childs[$this->tagName]);
    return 1;
}
public function getIterator() {
    if ($this->parent !== null)
        return new ArrayIterator($this->parent->childs[$this->tagName]);
    return new ArrayIterator(array($this));
}
Adding a child is a simple function; the only interesting thing about it is that it returns the newly added element.
public function appendChild($tag, $attributes) {
    $element = new XML(array($tag, $attributes));
    $element->setParent($this);
    $this->childs[$tag][] = $element;
    return $element;
}
Now we implement the parser itself. To build the tree structure we use a pointer to the current element. At the beginning it is set to the object itself; when a tag is opened, it moves to the newly opened element, so that everything the element contains is added to its descendants; when a tag is closed, it moves back to the parent element.
private function parse($data) {
    // Objects are handles in PHP 5, so plain assignment is enough here;
    // "=&" on $this or on a function result is not needed and raises notices.
    $this->pointer = $this;
    $parser = xml_parser_create();
    xml_set_object($parser, $this);
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
    xml_set_element_handler($parser, "tag_open", "tag_close");
    xml_set_character_data_handler($parser, "cdata");
    // the whole document is passed at once, so mark it as the final chunk
    xml_parse($parser, $data, true);
}
private function tag_open($parser, $tag, $attributes) {
    // descend into the newly opened element
    $this->pointer = $this->pointer->appendChild($tag, $attributes);
}
private function cdata($parser, $cdata) {
    $this->pointer->setCData($cdata);
}
private function tag_close($parser, $tag) {
    // climb back to the parent element
    $this->pointer = $this->pointer->getParent();
}
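For readers who want to try the pieces together, here is one possible assembly of the fragments above into a single runnable file. The bodies of setParent(), getParent(), setCData() and getAttributes() are not shown in this article, so the versions below are guesses rather than the author's code. The sketch also assumes PHP 7.1 or newer and passes array callbacks to the parser instead of calling xml_set_object().

```php
<?php
// Sketch only: setParent(), getParent(), setCData() and getAttributes() are assumptions.
class XML implements ArrayAccess, IteratorAggregate, Countable
{
    private $pointer;
    private $tagName;
    private $attributes = array();
    private $cdata = '';
    private $parent;
    private $childs = array();

    public function __construct($data)
    {
        if (is_array($data)) {
            list($this->tagName, $this->attributes) = $data;
        } elseif (is_string($data)) {
            $this->parse($data);
        }
    }

    public function __toString() { return $this->cdata; }

    public function __get($name)
    {
        if (isset($this->childs[$name])) {
            $found = $this->childs[$name];
            return count($found) == 1 ? $found[0] : $found;
        }
        throw new Exception("No child element [$name]");
    }

    public function offsetExists($offset): bool { return isset($this->attributes[$offset]); }

    #[\ReturnTypeWillChange]
    public function offsetGet($offset)
    {
        if (isset($this->attributes[$offset])) {
            return $this->attributes[$offset];
        }
        throw new Exception("No attribute [$offset]");
    }

    public function offsetSet($offset, $value): void {}   // stub, as in the article
    public function offsetUnset($offset): void {}         // stub, as in the article

    public function count(): int
    {
        return $this->parent !== null ? count($this->parent->childs[$this->tagName]) : 1;
    }

    public function getIterator(): Iterator
    {
        $siblings = $this->parent !== null ? $this->parent->childs[$this->tagName] : array($this);
        return new ArrayIterator($siblings);
    }

    public function getAttributes() { return $this->attributes; }

    public function appendChild($tag, $attributes)
    {
        $element = new XML(array($tag, $attributes));
        $element->setParent($this);
        $this->childs[$tag][] = $element;
        return $element;
    }

    public function setParent(XML $parent) { $this->parent = $parent; }
    public function getParent() { return $this->parent; }
    // Appends instead of overwriting, because text may arrive in several chunks.
    public function setCData($cdata) { $this->cdata .= $cdata; }

    private function parse($data)
    {
        $this->pointer = $this;
        $parser = xml_parser_create();
        xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
        xml_set_element_handler($parser, array($this, 'tag_open'), array($this, 'tag_close'));
        xml_set_character_data_handler($parser, array($this, 'cdata'));
        xml_parse($parser, $data, true);
    }

    private function tag_open($parser, $tag, $attributes)
    {
        $this->pointer = $this->pointer->appendChild($tag, $attributes);
    }
    private function cdata($parser, $cdata) { $this->pointer->setCData($cdata); }
    private function tag_close($parser, $tag) { $this->pointer = $this->pointer->getParent(); }
}

$xml = new XML('<rss><channel><title>News</title>'
    . '<item id="1"><url>a</url></item><item id="2"><url>b</url></item></channel></rss>');
echo $xml->rss->channel->title, "\n";           // prints News
echo $xml->rss->channel->item[1]->url, "\n";    // prints b
echo $xml->rss->channel->item[0]['id'], "\n";   // prints 1
```

Note that the object created from a string acts as a container above the document root, which is why the example starts with $xml->rss rather than $xml->channel.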
That's all. The parser is ready to go. In order not to inflate the article even more, I uploaded the entire source code with comments to Google Docs, along with a usage example. [6]
What's next?
This is still not a complete replacement for SimpleXML: our parser does not yet know how to create an XML document from the data it holds. Adding the necessary functions is not difficult, so I will leave that, for those who are interested, as homework :)
Links
1) The first version of the xml => array parser.
2) Documentation on magic methods (eng) (rus).
3) SPL documentation.
4) Description of the xml-parser functions.
5) Documentation on references (eng) (rus).
6) The final version of the parser and a simple example of use.