Building our own rating of popular Russian-language blog entries with the Yandex API, part 1

Once upon a time, Yandex Blog Search included a rating of popular entries, which for many people served as a kind of daily newspaper. Yandex then decided to close it and instead provided an API so that anyone could build their own rating of popular blog entries, which is what we will do today. We will write it in PHP.


So, for a start, let's look at the Blog Search Records Statistics API.

It is simple: we have an RSS feed of blog entry statistics for the last 24 hours, divided into pages of 50 entries.
The first page shows the most recent entries. The page we want is selected with the parameter p=<page number> in the URL used to access the API, for example: http://blogs.yandex.ru/entriesapi?p=1 .
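
As a small sketch of how the page parameter might be used from PHP, the helper below builds the address of a given page. The function name entries_url is made up for this illustration, and numbering pages from 1 is an assumption based on the example URL; only the base address and the p parameter come from the text above.

```php
<?php
// Hypothetical helper (not part of the Yandex API): builds the address
// of one page of the feed from the base URL and the "p" parameter.
function entries_url($page)
{
    // Assumption: pages are numbered from 1, as in the example URL.
    $page = max(1, (int) $page);
    return 'http://blogs.yandex.ru/entriesapi?' . http_build_query(array('p' => $page));
}

echo entries_url(3), "\n"; // http://blogs.yandex.ru/entriesapi?p=3
```

Using http_build_query rather than plain concatenation keeps the value URL-encoded if more parameters are added later.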

Follow the link above and look at the source code of the page (View -> View Source, or Ctrl+U in Firefox / Opera / Chrome). We see a plain XML document, which we will read and parse with PHP. We are interested in the entries, which are described inside the <item> tags. The names of the nested elements speak for themselves; the ones used later in this article are title, link, pubDate, yablogs:comments, yablogs:links and yablogs:visits24.

Well, we have learned to read such entries ourselves; now let's teach our script to read them.

First we need to choose a tool for working with XML. The PHP manual has a huge section on XML manipulation, where you can find many different tools for working with XML documents. One of the simplest, most convenient and at the same time powerful is the SimpleXML extension, which ships with PHP since version 5 and is enabled by default. We will work with it. Believe me, it is a very good tool that makes working with XML documents easy. Let me demonstrate:
$xml = simplexml_load_file('http://blogs.yandex.ru/entriesapi');
echo '<b>Feed title:</b> ',
    $xml->channel->title, '<br>',
    '<b>First entry:</b> ',
    $xml->channel->item[0]->title, '<br>',
    '<b>Published:</b> ',
    $xml->channel->item[0]->pubDate;

I think everything is clear: in the first line we load the first page of the feed of popular entries into the $xml variable (the simplexml_load_file function returns an instance of the SimpleXMLElement class). We can then treat this variable as an object that mirrors our XML document, which is exactly what we do.

When we access an arbitrary property of such an object, SimpleXML looks among the child elements for those whose name matches the property name. If it finds any, it returns an object that is also an instance of SimpleXMLElement; when there are several elements with that name, the object behaves like an array of them.

Thanks to this logic, we can write chains of the form: $xml->someElement->children->childrenOfChildren.

Note that there are many item elements in our XML document, so $xml->channel->item gives access not to one element but to a whole list of them, which can be indexed like an array. In the example, we took the very first item element in the document at index [0] and printed its title and publication date on the screen (in the browser).
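
To make the list-like behaviour concrete, here is a minimal sketch that runs on a hand-written XML string instead of the live feed. The sample document is invented for illustration and is much smaller than the real feed; only the channel / item / title / pubDate structure is taken from the text above.

```php
<?php
// A hand-written sample that mimics the shape of the feed; the real
// feed contains more elements, this is only an illustration.
$sample = <<<XML
<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <item><title>First entry</title><pubDate>Mon, 04 Jan 2010 10:00:00 +0300</pubDate></item>
    <item><title>Second entry</title><pubDate>Mon, 04 Jan 2010 09:00:00 +0300</pubDate></item>
  </channel>
</rss>
XML;

$xml = simplexml_load_string($sample);

// $xml->channel->item can be counted, indexed and iterated.
echo count($xml->channel->item), "\n";      // 2
echo $xml->channel->item[1]->title, "\n";   // Second entry

$titles = array();
foreach ($xml->channel->item as $item) {
    // Cast to string: each field is itself a SimpleXMLElement.
    $titles[] = (string) $item->title;
}
echo implode(', ', $titles), "\n";          // First entry, Second entry
```

The explicit (string) cast matters once the values are stored or compared: without it the array would hold SimpleXMLElement objects rather than text.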

An alternative way to get the desired branch of an XML document in SimpleXML is to use XPath, the query language for the elements of an XML tree. For this, the SimpleXMLElement class has an xpath(string $path) method that returns an array of SimpleXMLElement instances, or FALSE on error.

An example of using XPath:

$items=$xml->xpath('channel/item');

To get a similar set of elements without XPath, you need to run:

$items=$xml->channel->item;

Whether to use XPath or chains of property accesses is a matter of taste. In our case we will use XPath to refer to elements such as yablogs:links, whose names contain a ":" character, because it prevents these names from being written as object properties in PHP.
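
The following sketch shows reading a prefixed element on a hand-written sample. The namespace URI in the sample is a placeholder chosen for this illustration and is not necessarily the one the real feed declares; the children() call is shown only as an alternative that SimpleXML offers, not as something the rest of this article uses.

```php
<?php
// Hand-written sample; the namespace URI below is a placeholder chosen
// for this illustration, not necessarily the one used by the real feed.
$sample = <<<XML
<rss version="2.0" xmlns:yablogs="http://example.com/yablogs">
  <channel>
    <item>
      <title>First entry</title>
      <yablogs:comments>12</yablogs:comments>
      <yablogs:links>3</yablogs:links>
    </item>
  </channel>
</rss>
XML;

$xml  = simplexml_load_string($sample);
$item = $xml->channel->item[0];

// The prefix is declared on the root element, so it can be used in XPath.
$links = $item->xpath('yablogs:links');
echo (int) $links[0], "\n";     // 3

// Alternative without XPath: children() with a prefix.
$comments = (int) $item->children('yablogs', true)->comments;
echo $comments, "\n";           // 12
```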

Well, let's finally build something. For example, a function that fetches information about all entries for the last 24 hours. Here it is:
define('MAX_PAGES', 200);

function load_all()
{
    $all_items = array();
    for ($i = 1; $i <= MAX_PAGES; $i++) {
        $xml = simplexml_load_file('http://blogs.yandex.ru/entriesapi?p=' . $i);
        if ($xml === false) {
            break; // the page could not be downloaded or parsed
        }
        $items = $xml->xpath('channel/item');
        if (empty($items)) {
            break; // no more entries
        }
        $all_items = array_merge($all_items, $items);
    }
    return $all_items;
}


Such a function will, of course, work very slowly and will most likely hit the limit on PHP script execution time (30 seconds by default). The data should therefore be cached, and it should be fetched not when the page is opened but by a task run from the crontab scheduler. We will store the data in a database (MySQL), but that is for the next part; for now let's get back to working with XML.
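
Until the database version exists, the idea of caching can be sketched with a plain file. This is only a stand-in under stated assumptions, not the MySQL approach planned for the next part: the function name cached, the cache file name and the 600-second lifetime are all invented for the example. Note that SimpleXMLElement objects cannot be serialized, so the cached data has to be plain arrays.

```php
<?php
// Hypothetical helper: returns the cached data if the cache file is
// younger than $ttl seconds, otherwise calls $producer and stores
// its result in the file.
function cached($cacheFile, $ttl, $producer)
{
    clearstatcache(true, $cacheFile);
    if (is_file($cacheFile) && time() - filemtime($cacheFile) < $ttl) {
        return unserialize(file_get_contents($cacheFile));
    }
    $data = call_user_func($producer);
    file_put_contents($cacheFile, serialize($data), LOCK_EX);
    return $data;
}

// Demonstration with a throwaway file in the system temp directory.
$file  = sys_get_temp_dir() . '/blograting_demo_' . uniqid() . '.cache';
$calls = 0;
$producer = function () use (&$calls) {
    $calls++;
    // In the real script this would call load_all() and convert the
    // SimpleXMLElement objects to plain arrays before returning them.
    return array(array('title' => 'First entry', 'comments' => 12));
};

$first  = cached($file, 600, $producer); // runs the producer
$second = cached($file, 600, $producer); // served from the file
echo $calls, "\n"; // 1
unlink($file);
```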

In this lesson we will assemble a rating of popular Russian-language blog entries with minimal functionality, and leave caching, extra features, OOP and MVC for later.

So as not to wait long and tediously for load_all() to finish, let's restrict it: we will process only the first 4 pages of the RSS feed provided by Yandex. This is exactly what the MAX_PAGES constant is for: change its value in the first line from 200 to 4.

Our rating should be able to sort the records by the number of comments, the number of links and the number of visits. We already have a function with which we can get a list of records. So the task is to sort this list.

If a script run by the task scheduler stored all the entries in a database, and we read them from the database when showing them to the user, we would sort with SQL. We will do that in the second part of the lesson; for now we will do a quick sort in PHP.

The standard PHP function usort will help us here: it sorts an array using a user-supplied function to compare its elements.
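
As a minimal illustration of how usort uses a comparison function, here is a sketch on plain integers rather than feed entries. The function name by_value_desc and the sample numbers are invented for the example.

```php
<?php
$counts = array(5, 42, 17, 42, 0);

// The callback returns a negative number, zero or a positive number
// when $a should go before $b, is equal to it, or should go after it.
function by_value_desc($a, $b)
{
    if ($a == $b) {
        return 0;
    }
    return ($a > $b) ? -1 : 1; // larger values first
}

usort($counts, 'by_value_desc');
echo implode(' ', $counts), "\n"; // 42 42 17 5 0
```

usort sorts the array in place and reindexes it from 0, which is fine for a list of entries.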

The elements of our array are instances of the SimpleXMLElement class, each wrapping one <item> element of the XML tree.

Let's see how to get, for example, the number of comments on an entry:
$xml = simplexml_load_file('http://blogs.yandex.ru/entriesapi');
$item = $xml->channel->item[12];                  // take some item element, for example the 13th
$comments_arr = $item->xpath('yablogs:comments'); // get an array of objects
$comments_obj = $comments_arr[0];                 // we know there should be exactly one object
$comments = (int) $comments_obj;                  // cast to an integer so that it can be compared


Now we can write a comparison function, and a universal one at that.
$cmp = 'yablogs:comments'; // compare by this element

// usort imposes a requirement on the comparison function:
// it must take exactly two parameters,
// the two array elements being compared,
// so we simply make $cmp a global variable.
function cmp($a, $b)
{
    global $cmp;
    $a = $a->xpath($cmp);
    $b = $b->xpath($cmp);
    $a = (int) $a[0];
    $b = (int) $b[0];
    if ($a == $b) {
        return 0;
    }
    return ($a > $b) ? -1 : 1; // larger values come first
}
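
For comparison, here is a sketch of the same comparison without a global variable, using an anonymous function (available since PHP 5.3) that remembers the element name. This is an alternative to the approach above, not the code the rest of the article uses; the helper name make_cmp, the sample XML and its namespace URI are assumptions made for the illustration.

```php
<?php
// Hypothetical helper: builds a comparison callback for usort() that
// remembers the element name, so no global variable is needed.
function make_cmp($field)
{
    return function ($a, $b) use ($field) {
        $a = $a->xpath($field);
        $b = $b->xpath($field);
        // A missing element counts as 0.
        $a = isset($a[0]) ? (int) $a[0] : 0;
        $b = isset($b[0]) ? (int) $b[0] : 0;
        if ($a == $b) {
            return 0;
        }
        return ($a > $b) ? -1 : 1; // descending order
    };
}

// Hand-written sample; the namespace URI is a placeholder.
$xml = simplexml_load_string(
    '<rss xmlns:yablogs="http://example.com/yablogs"><channel>'
    . '<item><title>A</title><yablogs:comments>3</yablogs:comments></item>'
    . '<item><title>B</title><yablogs:comments>12</yablogs:comments></item>'
    . '<item><title>C</title><yablogs:comments>7</yablogs:comments></item>'
    . '</channel></rss>'
);

$items = $xml->xpath('channel/item');
usort($items, make_cmp('yablogs:comments'));

$order = array();
foreach ($items as $item) {
    $order[] = (string) $item->title;
}
echo implode(' ', $order), "\n"; // B C A
```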


We will also have a sort_by function, to which we pass the comparison criterion (the name of the XML element to sort by) and a reference to the array. sort_by stores the criterion in the global variable $cmp and then calls the standard usort function, passing it the array.
function sort_by($sort_by, array &$items)
{
    global $cmp;
    $cmp = $sort_by;
    usort($items, 'cmp'); // $items is taken by reference, so the caller's array is sorted
}


Almost done. Now we can, for example, display the entries sorted by the number of comments like this:
$items = load_all();
// The array is sorted in place: sort_by() must declare its second
// parameter by reference (&$items) in its signature.
sort_by('yablogs:comments', $items);
foreach ($items as $item) {
    $comments = $item->xpath('yablogs:comments');
    $links = $item->xpath('yablogs:links');
    $visits = $item->xpath('yablogs:visits24');
    echo "<a href='$item->link'>$item->title</a><br>",
        "Comments: $comments[0]<br>",
        "Links: $links[0]<br>",
        "Views: $visits[0]<hr>";
}
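
The loop above prints values from the feed as they are. Since titles and links come from third-party blogs, it may be worth escaping them before output. The sketch below shows one way to do that with htmlspecialchars; the helper name h and the sample item are invented for the illustration.

```php
<?php
// Hypothetical helper: escapes a value for safe use in HTML text or
// in a quoted attribute.
function h($value)
{
    return htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8');
}

// Hand-written sample item with characters that are special in HTML.
$item = simplexml_load_string(
    '<item><title>Tom &amp; Jerry &lt;b&gt;</title>'
    . '<link>http://example.com/?a=1&amp;b=2</link></item>'
);

$html = '<a href="' . h($item->link) . '">' . h($item->title) . '</a>';
echo $html, "\n";
// <a href="http://example.com/?a=1&amp;b=2">Tom &amp; Jerry &lt;b&gt;</a>
```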


How exactly to sort the list will be passed in the sort_by URL parameter, which is available in PHP as $_GET['sort_by']. Let's create three links for the different kinds of sorting. To do this, before the <?php tag that marks the beginning of the PHP code, we write:
<div style="text-align: center;">
<a href="index.php?sort_by=comments">Most Commented</a>
<a href="index.php?sort_by=visits24">Most Visited</a>
<a href="index.php?sort_by=links">Most Cited</a>
</div>


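All that remains is to read the sorting criterion in PHP and sort accordingly. To do this, replace the line that calls sort_by with 'yablogs:comments' with the following (as before, sort_by() must take the array by reference):
// Accept only the known criteria; anything else falls back to comments.
$allowed = array('comments', 'links', 'visits24');
if (isset($_GET['sort_by']) && in_array($_GET['sort_by'], $allowed, true)) {
    $crit = $_GET['sort_by'];
} else {
    $crit = 'comments';
}
sort_by('yablogs:' . $crit, $items);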


And finally, add a bit of styling with CSS to make our rating look better:
<style>
div {padding: 20px; background-color: #EEE;}
hr {border: none; border-bottom: 1px dashed yellow;}
</style>


The source of what we have done can be downloaded here: http://www.nayjest.ru/userfiles/yabdex.blograting.by.nayjest.zip

As you can see, everything is very simple.

In the next lessons I will show how to turn this into a full-fledged web service with object-oriented code, a database, MVC architecture, valid HTML, caching, maybe even AJAX, and generally whatever you want (suggest in the comments!).

I hope it was interesting and useful. So as not to miss the next lessons, follow me on Twitter. Thank you for your attention; I look forward to your comments!

Source: https://habr.com/ru/post/79560/

