Hi. I noticed that there are still very few posts devoted to Symfony 2.0. I will try to fix that in the near future with posts and translations about the components of the framework. For now, here is a translation of an article from Fabien Potencier's blog, which is always worth reading. The translation is not always literal, but I tried to convey the meaning clearly. So, let's begin.
HTML and XML documents are the bread and butter of web developers. Day after day, you probably create many HTML documents. And from time to time you certainly have to parse some of them: because you use web services and want to extract some information, because you want to pull data from web pages, or simply because you want to write functional tests for a website. Getting the content is easy enough, but how do you parse it to extract the right information?
PHP already comes with a number of tools for parsing XML documents: for example, SimpleXML, DOM and XMLReader. But as soon as you need to extract information buried deep in the structure of a document, things are not as easy as they should be. Of course, XPath is your best friend when you need to select elements, but its learning curve is very steep. Even expressions that should be simple turn out to be cumbersome. For example, here is an XPath expression that finds all h1 tags with the "foo" class:
h1[contains(concat(' ', normalize-space(@class), ' '), ' foo ')]
The expression is this complicated because a tag can have several classes:
<h1 class="foo">Foo</h1>
<h1 class="foo bar">Foo</h1>
<h1 class="foobar bar">Foo</h1>
The expression must select the first two h1 tags, but not the third.
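This behavior can be checked with PHP's built-in DOM extension. The following is a minimal sketch, not part of the original article: the heading texts are changed to First, Second and Third (an assumption made only so the matches can be told apart), and the naive `contains(@class, 'foo')` expression is added for contrast.

```php
<?php
// The three headings from the example, with distinct texts (illustrative change).
$html = <<<HTML
<html><body>
<h1 class="foo">First</h1>
<h1 class="foo bar">Second</h1>
<h1 class="foobar bar">Third</h1>
</body></html>
HTML;

$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

// Padding the class attribute with spaces lets us match " foo " as a whole word.
$exact = "//h1[contains(concat(' ', normalize-space(@class), ' '), ' foo ')]";
// A naive substring test also matches "foobar", which is not what we want.
$naive = "//h1[contains(@class, 'foo')]";

$matched = [];
foreach ($xpath->query($exact) as $node) {
    $matched[] = $node->textContent;
}

print_r($matched);                                  // First and Second only
echo $xpath->query($naive)->length, " naive matches\n"; // 3 naive matches
```

The space padding is what makes the expression long: it is the only way in XPath 1.0 to test for a whole class name rather than a substring.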
Of course, everyone knows that doing the same thing in CSS is far simpler:
h1.foo
For functional tests in Symfony 2, I was looking for a way to harness the power and expressiveness of CSS selectors using tools that already exist in PHP. The first idea that came to mind was to convert a CSS selector into its XPath equivalent. But is that possible? The answer is yes.
John Resig wrote in his post on almost the same topic: "The most important thing is to understand that CSS selectors are often very short, but extremely inefficient compared to XPath."
Writing a tokenizer, a parser, and a compiler that convert CSS selectors to their XPath equivalents is not a trivial task. So, instead of reinventing the wheel, I looked at existing libraries. Very soon I came across lxml, a Python library. The lxml.cssselect module of the lxml library does exactly what I needed. So I spent some time porting the code from Python to PHP, added some unit tests, and voilà: the CSS Selector component for Symfony 2 was born.
For reference: symfony 1 has an sfDomCssSelector class, but it does not convert CSS selectors to XPath. It does the job well, but it is limited to very simple CSS selectors and cannot be used together with the standard XML tools.
The Symfony 2 CSS Selector component does only one thing, and tries to do it well: convert CSS selectors to XPath expressions. Using it is very simple:
use Symfony\Components\CssSelector\Parser;
$xpath = Parser::cssToXpath('h1.foo');
Now the $xpath variable contains "h1[contains(concat(' ', normalize-space(@class), ' '), ' foo ')]".
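To get a feel for what such a conversion involves, here is a deliberately tiny sketch. The function name `simpleCssToXpath()` is hypothetical, and it is not the component's implementation: it only understands selectors of the form `tag`, `.class` or `tag.class1.class2`, whereas the real component tokenizes and parses the full selector syntax.

```php
<?php
/**
 * Hypothetical toy converter (NOT the CssSelector component):
 * supports only "tag", ".class" and "tag.class1.class2" selectors.
 */
function simpleCssToXpath($selector)
{
    $pattern = '/^([a-zA-Z][a-zA-Z0-9]*)?((?:\.[a-zA-Z_][a-zA-Z0-9_-]*)*)$/';
    if ($selector === '' || !preg_match($pattern, $selector, $m)) {
        throw new InvalidArgumentException("Unsupported selector: $selector");
    }

    // No tag name means "any element".
    $xpath = $m[1] !== '' ? $m[1] : '*';

    // Each class becomes one whole-word test against the class attribute.
    foreach (array_filter(explode('.', $m[2]), 'strlen') as $class) {
        $xpath .= sprintf(
            "[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]",
            $class
        );
    }

    return $xpath;
}

echo simpleCssToXpath('h1.foo'), "\n";
// h1[contains(concat(' ', normalize-space(@class), ' '), ' foo ')]
```

Even this limited case needs a small grammar. Combinators, attribute selectors and pseudo-classes are where a proper tokenizer and parser become necessary.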
Here is an example of how you can use the component. Suppose you want to get the titles and URLs of all the posts on my blog (they are listed at fabien.potencier.org/articles).
use Symfony\Components\CssSelector\Parser;
// Load the page with the DOM extension's HTML parser.
$document = new \DOMDocument();
$document->loadHTMLFile('http://fabien.potencier.org/articles');
$xpath = new \DOMXPath($document);
// Convert the CSS selector to XPath, then print each post title with its URL.
foreach ($xpath->query(Parser::cssToXpath('div.item > h4 > a')) as $node)
{
    printf("%s (%s)\n", $node->nodeValue, $node->getAttribute('href'));
}
The code is very simple: instead of writing an XPath expression by hand, we let the Parser class convert the CSS selector into one for us.
$xpath->query(Parser::cssToXpath('div.item > h4 > a'))
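For comparison, the sketch below runs a hand-written XPath equivalent of `div.item > h4 > a` against an inline document, so it works without network access or the component. Both the markup and the hand-written expression are assumptions made for illustration. The real blog page and the exact string returned by `cssToXpath()` may differ.

```php
<?php
// Hypothetical markup, loosely modelled on a list of blog posts.
$html = <<<HTML
<html><body>
<div class="item"><h4><a href="/first">First post</a></h4></div>
<div class="item featured"><h4><a href="/second">Second post</a></h4></div>
<div class="sidebar"><h4><a href="/about">About</a></h4></div>
<div class="item"><p><a href="/more">Read more</a></p></div>
</body></html>
HTML;

$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

// ">" in CSS is the child combinator, which maps to a single "/" step in XPath.
$expression = "//div[contains(concat(' ', normalize-space(@class), ' '), ' item ')]/h4/a";

$links = [];
foreach ($xpath->query($expression) as $node) {
    $links[$node->getAttribute('href')] = $node->nodeValue;
}

print_r($links); // only /first and /second qualify
```

The sidebar link is skipped because its div lacks the item class, and the "Read more" link is skipped because it is not a child of an h4.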
Remember that if you are working with XML documents, you need to declare the namespaces used. Let's use SimpleXMLElement, which understands only well-formed XML documents:
// The third argument tells SimpleXMLElement that the first one is a URL.
$document = new \SimpleXMLElement('http://fabien.potencier.org/articles', 0, true);
// Bind the "xhtml" prefix to the XHTML namespace before querying.
$document->registerXPathNamespace('xhtml', 'http://www.w3.org/1999/xhtml');
foreach ($document->xpath(Parser::cssToXpath('xhtml|div.item > xhtml|h4 > xhtml|a')) as $node)
{
    printf("%s (%s)\n", $node, $node['href']);
}
As you can see, CSS selectors support namespaces (xhtml|div).
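Why the prefix matters can be seen with plain SimpleXML and hand-written XPath. This is a minimal sketch on an inline XHTML fragment whose markup is invented for illustration. The CSS form `xhtml|div` corresponds to the XPath form `xhtml:div` used here.

```php
<?php
// Hypothetical well-formed XHTML fragment with a default namespace.
$xml = <<<XML
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <div class="item"><h4><a href="/first">First post</a></h4></div>
  </body>
</html>
XML;

$document = new SimpleXMLElement($xml);

// Unprefixed names in XPath 1.0 only match elements in no namespace,
// so this finds nothing in a namespaced document.
$withoutPrefix = $document->xpath('//div/h4/a');

// After binding a prefix to the XHTML namespace, the same path matches.
$document->registerXPathNamespace('xhtml', 'http://www.w3.org/1999/xhtml');
$withPrefix = $document->xpath('//xhtml:div/xhtml:h4/xhtml:a');

printf("without prefix: %d, with prefix: %d\n", count($withoutPrefix), count($withPrefix));
foreach ($withPrefix as $node) {
    printf("%s (%s)\n", $node, $node['href']);
}
```

The href attribute needs no prefix because unprefixed attributes do not inherit the element's default namespace.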
This new CSS Selector component will be used in Symfony 2 for functional tests (but as you will see in the next few weeks, quite differently from symfony 1).
The component code is unit tested with good code coverage, so feel free to use it (the code is on GitHub: github.com/fabpot/symfony, in the Symfony\Components\CssSelector namespace) and leave feedback.