How important is the API, or comparing Yandex.XML with the live search results

I have always wondered whether Yandex's search results differ from those of its API (xml.yandex.ru), given that both solve the same task (the official position: Yandex.XML is the ability to send search queries to Yandex and publish the search results on your own site).

It is well known that the data in Yandex.Webmaster always lags far behind and disagrees with reality: information that can be obtained from the search results (the number of indexed pages, links, etc.) shows up in Webmaster only a few days later.

But since Yandex opposes direct parsing of its search results, it offers an alternative: getting the data as XML.

By the way, anyone used to be able to get access to Ya.XML simply by confirming a phone number in the account (if I'm not mistaken, unconfirmed accounts had a limit of 1000 requests), but about a year or two ago Yandex abandoned this policy and introduced its own metric, which correlates strongly with traffic (or, more precisely, with the "number of impressions in the search results").
In general, this is a very interesting metric (for example, the more often a site is shown in the results, the more often the Yandex antivirus bot checks its pages). Last year I obtained it myself while parsing 3 million queries from different groups. That data could be discussed in a separate article. I first heard the term at Yet Another Conference 2013, in the security section.

But back to XML.

The essence of the experiment:

1. 2,778 queries were taken from 4 groups (commerce, women's topics, tourism, informational queries).
2. Parsing of the live search results and of XML was launched almost simultaneously (XML takes longer to parse because of its internal limits).
3. For access to Ya.XML we used our own limits from Ya.Webmaster; for parsing the live results, a private proxy service. For the purity of the experiment the region was set to lr=1 (the proxy IPs are geolocated in RU according to whois, and Moscow was specified in the address field).
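As an illustration of step 3, here is a minimal sketch of how a Ya.XML request URL with a fixed region might be assembled. The endpoint and the parameter names `user`, `key`, `query` and `lr` are written from memory of the public Yandex.XML documentation and should be treated as assumptions; the function name `build_xml_query_url` and the credentials are hypothetical. No request is actually sent.

```python
from urllib.parse import urlencode

# Assumed endpoint; verify it against the current Yandex.XML documentation.
YANDEX_XML_ENDPOINT = "https://yandex.ru/search/xml"


def build_xml_query_url(user, key, query, region=1):
    """Build a search URL for one query (hypothetical helper).

    region is passed as lr; the article uses lr=1 for the experiment.
    The parameter names are assumptions, not confirmed by the article.
    """
    params = {"user": user, "key": key, "query": query, "lr": region}
    return YANDEX_XML_ENDPOINT + "?" + urlencode(params)


# Placeholder credentials, for illustration only.
url = build_xml_query_url("demo-user", "demo-key", "buy tickets")
print(url)
```

Pinning the region in every request is what keeps the XML results comparable with the live results fetched through the proxies.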

The last update of the search base was on January 9 and the data was collected on the 13th, so the post-update turbulence in the results had already settled and the data can be considered reliable.

A little about the downsides of XML:

it does not return the contents of the title, only the snippet
the snippet differs from the snippet in the live results
it does not show whether there are ads in the results (ads are a way to assess competitors and how commercial a query is)
it does not show whether Yandex services appear in the results

(On another project I also check domains for various indicators (indexing, TIC, etc.). When checking the index through XML, Yandex very often changes the numbers; I noticed this long ago. The discrepancy can reach hundreds of pages (plus or minus), and sometimes the index is reported as 0.)

Now the conclusions:

Most of the differences are plus or minus 1 position.
Slightly fewer are plus or minus 5 positions.
Very few are cases where different sites occupy the positions.

And in numbers:

Positions match - 75%
Positions do not match - 25%
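The comparison behind these numbers can be sketched roughly as follows. This is not the script used in the experiment: it assumes that a match is counted per position, that each query yields two ordered lists of URLs, and the bucket names and function names are chosen here only for illustration.

```python
from collections import Counter


def position_shifts(serp_urls, xml_urls):
    """For each URL in the live results, return its shift in the XML results.

    0 means the same position; None means the URL is absent from XML.
    """
    xml_pos = {}
    for i, url in enumerate(xml_urls):
        xml_pos.setdefault(url, i)  # keep the first occurrence of a URL
    return [xml_pos[url] - i if url in xml_pos else None
            for i, url in enumerate(serp_urls)]


def bucket(shift):
    """Map a shift to one of the groups used in the conclusions."""
    if shift is None:
        return "other site"
    if shift == 0:
        return "match"
    if abs(shift) == 1:
        return "within 1"
    if abs(shift) <= 5:
        return "within 5"
    return "more than 5"


def summarize(pairs):
    """Return the percentage of positions per bucket over all queries."""
    counts = Counter()
    for serp_urls, xml_urls in pairs:
        counts.update(bucket(s) for s in position_shifts(serp_urls, xml_urls))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: 100.0 * n / total for name, n in counts.items()}


# Toy data, not taken from the experiment.
live = ["a.ru", "b.ru", "c.ru", "d.ru"]
xml = ["a.ru", "c.ru", "b.ru", "e.ru"]
print(summarize([(live, xml)]))
```

Counting per query instead (a query matches only if every position matches) would give a stricter figure, so the way a match is defined matters when comparing with other experiments.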

I would be glad to have possible errors pointed out and, especially, to compare this with the results of similar experiments.

Random sample with highlighted data: yadi.sk/i/i4imHJ8qmvgTd
All results in csv: yadi.sk/d/X5SYWxl7mvgUe
Database Dump: yadi.sk/d/O5viMlrRmvgKD

The numbers in the results are the query frequencies from Wordstat (broad and exact match); they play no special role, they are simply there.

Source: https://habr.com/ru/post/275197/
