
Parsing sites easily and naturally with phpQuery

Hi Habr! I think some of you have needed to pull some piece of data out of a website. But writing piles of functions just to extract a single phrase is tedious, slow and unproductive.
Let me introduce phpQuery, a PHP port of the well-known JavaScript framework jQuery.
The author did a great job and ported almost everything we need.

Let's see what it can do.

In short, it can do everything its big brother jQuery can.
For a start, we won't go far and will practice on %username%.habrahabr.ru/blog/.

There are quite a few ways to create a document:
phpQuery::newDocument($html, $contentType = null): creates a new document from markup. If $contentType is not specified, it is detected from the markup; if that fails, text/html in utf-8 is assumed.
phpQuery::newDocumentFile($file, $contentType = null): creates a new document from a file. Works like newDocument().
phpQuery::newDocumentHTML($html, $charset = 'utf-8')
phpQuery::newDocumentXHTML($html, $charset = 'utf-8')
phpQuery::newDocumentXML($html, $charset = 'utf-8')
phpQuery::newDocumentPHP($html, $contentType = null): you can read more about it here.
phpQuery::newDocumentFileHTML($file, $charset = 'utf-8')
phpQuery::newDocumentFileXHTML($file, $charset = 'utf-8')
phpQuery::newDocumentFileXML($file, $charset = 'utf-8')
phpQuery::newDocumentFilePHP($file, $contentType): you can read more about it here.
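To make the list a little more concrete, here is a minimal sketch that uses two of these loaders: newDocumentHTML() for a string and newDocumentFileHTML() for a file. It assumes phpQuery has been unpacked into a phpQuery/ directory next to the script; the sample markup, the temporary file and the variable names are hypothetical and serve only as illustration.

```php
<?php
// Assumption: phpQuery is unpacked into ./phpQuery/ next to this script.
require 'phpQuery/phpQuery.php';

// Hypothetical sample markup standing in for a downloaded page.
$html = '<div class="hentry"><h2 class="entry-title">First post</h2></div>'
      . '<div class="hentry"><h2 class="entry-title">Second post</h2></div>';

// Build a document from a string; the charset defaults to utf-8.
$doc = phpQuery::newDocumentHTML($html);

$titles = [];
foreach ($doc->find('div.hentry h2.entry-title') as $h2) {
    // Iteration yields plain DOM nodes, so wrap each one with pq() again.
    $titles[] = pq($h2)->text();
}

// Load the same markup from a (temporary, hypothetical) file instead.
$file = tempnam(sys_get_temp_dir(), 'pq');
file_put_contents($file, $html);
$fromFile = phpQuery::newDocumentFileHTML($file);
$count = count($fromFile->find('div.hentry')); // phpQueryObject is Countable
unlink($file);

print_r($titles);
echo $count, "\n";
```

The string and file variants build the same kind of document, so everything after loading (find(), pq(), text()) works identically regardless of where the markup came from.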

So let's not go far: come on, %username%, let's parse your blog entries. First, download phpQuery. Then create something like index.php:
<?php
require('phpQuery/phpQuery.php');

$habrablog = file_get_contents('http://%username%.habrahabr.ru/blog/');
$document = phpQuery::newDocument($habrablog);
$hentry = $document->find('div.hentry');
foreach ($hentry as $el) {
    $pq = pq($el); // pq() is phpQuery's counterpart of $ in jQuery
    // point the blog link of each entry at your blog and change its text
    $pq->find('h2.entry-title > a.blog')->attr('href', 'http://%username%.habrahabr.ru/blog/')->html('%username%');
    // remove the entry info block
    $pq->find('div.entry-info')->remove();
    // collect the tag links and add text around each of them
    $tags = $pq->find('ul.tags > li > a');
    $tags->append(': ')->prepend(' :');
    // put the tags, followed by a line break, at the start of the post content
    $pq->find('div.content')->prepend('<br />')->prepend($tags);
}
echo $hentry;
?>

This is only a small part of what phpQuery can do.
phpQuery also comes with jQueryServer. In essence, it is the same phpQuery, but driven from the client side with jQuery.
Demo example
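<script type="text/javascript">
// URL of the jQueryServer backend script on your server
jQuery.serverConfig.url = '/phpQuery/jQueryServer/jQueryServer.php';
function demo() {
    // the server loads the current page, runs find('li') on it with phpQuery
    // and sends the matched elements back to the browser as JSON
    $.server({ url: document.location.toString(), dataType: 'json' })
        .find('li')
        .client(function(response){
            // append every received <li> to the local list
            $.each(response, function(k, li){
                $('ul').append(li);
            });
        });
}
$(function(){
    $('ul').append('<li>above LIs will be downloaded and appended below in 2 seconds...</li>');
    setTimeout(demo, 2000);
});
</script>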

This approach is quite practical: it lets you pull content from several sites in a matter of seconds without writing any PHP code yourself.
Further reading

Google code
Official blog
If there is interest, in the next article I'd like to cover parsing sites that are accessible only to logged-in users (without captcha, of course). Yes, phpQuery can do that too, though not without help from the Zend Framework.

Source: https://habr.com/ru/post/69149/

