
Working with Yandex.Webmaster API

Anyone who keeps an eye on how Yandex treats their site knows about a useful Yandex service called Yandex.Webmaster, but not everyone knows that this process can now be automated by working with its API.

Since I develop and then promote websites, and there are quite a few sites being promoted, I immediately seized the opportunity to automate fetching statistics from Yandex.Webmaster.
First, we get all the data daily, hourly or even more often, even if we forgot to check it by hand or simply had no time for it.
Second, by working through the API we can build our own interface for viewing the data and make it as convenient for ourselves as we like.

Although Yandex has documentation for this API, and it even comes with examples, I personally could not figure out right away what to do with it. So if this topic interests you too, welcome under the cut.

You need to begin your acquaintance with the Yandex.Webmaster API by registering with Yandex. Once logged in, we can register our "application". Later on we will need it to obtain tokens for the users whose sites we want to monitor. You can register a new application here.
A little about registration.



If you did not check the second option, your application is registered and ready to use.
A list of your applications is available here.
Opening the newly created one, we get the "application id" and "application password" that we will need.

If you decided to get access to link data right away, i.e. you checked the second option as well, you will need to do the following:
Download this document, print it, fill it out, scan it and send it to webmaster-api@yandex-team.ru. My application passed moderation in 4-5 days.

Getting Started.

I will give the examples in PHP, since that is what everything runs on for me.

$client_id = "  Id "; $client_secret = "  "; //        ,        //        state,      ,    if (!isset($_GET["code"])) { Header("Location: https://oauth.yandex.ru/authorize?response_type=code&client_id=".$client_id); die(); } //    ""   ,      // $_Get["code"]      .     . //        ,     . $result=postKeys("https://oauth.yandex.ru/token", array( 'grant_type'=> 'authorization_code', //   'code'=> $_GET["code"], //    'client_id'=>$client_id, 'client_secret'=>$client_secret ), array('Content-type: application/x-www-form-urlencoded') ); //    function postKeys($url,$peremen,$headers) { $post_arr=array(); foreach ($peremen as $key=>$value) { $post_arr[]=$key."=".$value; } $data=implode('&',$post_arr); $handle=curl_init(); curl_setopt($handle, CURLOPT_URL, $url); curl_setopt($handle, CURLOPT_HTTPHEADER, $headers); curl_setopt($handle, CURLOPT_SSL_VERIFYPEER, false); curl_setopt($handle, CURLOPT_SSL_VERIFYHOST, false); curl_setopt($handle, CURLOPT_POST, true); curl_setopt($handle, CURLOPT_RETURNTRANSFER, true); curl_setopt($handle, CURLOPT_POSTFIELDS, $data); $response=curl_exec($handle); $code=curl_getinfo($handle, CURLINFO_HTTP_CODE); return array("code"=>$code,"response"=>$response); } //   ,    200,    ,      if ($result["code"]==200) { $result["response"]=json_decode($result["response"],true); $token=$result["response"]["access_token"]; echo $token; }else{ echo "- ! : ".$result["code"]; } //     ,   , ,        ,   


So, we have a token. Now we can get down to retrieving information about the state of our sites.
I will show this in several steps, as if we had no access to any of the information beforehand.

 $token="  "; // ,   function get_stat($url,$headers) { $handle=curl_init(); curl_setopt($handle, CURLOPT_URL, $url); curl_setopt($handle, CURLOPT_HTTPHEADER, $headers); curl_setopt($handle, CURLOPT_SSL_VERIFYPEER, false); curl_setopt($handle, CURLOPT_SSL_VERIFYHOST, false); curl_setopt($handle, CURLOPT_RETURNTRANSFER, true); $response=curl_exec($handle); $code=curl_getinfo($handle, CURLINFO_HTTP_CODE); return array("code"=>$code,"response"=>$response); } //   ,       $result["code"] //   -  id   . //       https://webmaster.yandex.ru/api/123456789, 123456789 - id  //      ,   id   $result=get_stat('https://webmaster.yandex.ru/api/me',array('Authorization: OAuth '.$token)); $user_id=str_replace('https://webmaster.yandex.ru/api/','',$result["response"]); //      ,          . //   : href="https://webmaster.yandex.ru/api/123456789/hosts, 123456789 - id  //     ,     $result=get_stat('https://webmaster.yandex.ru/api/'.$user_id.'/hosts',array('Authorization: OAuth '.$token)); $xml=new SimpleXMLElement($result["response"]); $hosts_xml=$xml->xpath("host"); $hosts=array(); foreach($hosts_xml as $host) { $hosts[(string)$host->name]= array( "name"=>(string)$host->name, "verification_state"=>(string)$host->verification->attributes()->state, "crawling_state"=>(string)$host->crawling->attributes()->state, "virused"=>(string)$host->virused, "last-access"=>(string)$host->{'last-access'}, "tcy"=>(string)$host->tcy, "url-count"=>(string)$host->{'url-count'}, "index-count"=>(string)$host->{'index-count'}, "href"=>(string)$host->attributes()->href ); } unset($hosts_xml); unset($xml); /*                       ,     Array ( [domen] => Array ( [name] => domen -   [verification_state] => VERIFIED -       [crawling_state] => INDEXED -   [virused] => false -      [last-access] => 2012-11-06T22:54:10 -      [tcy] => 150 -  [url-count] => 7458 -    [index-count] => 6131 -     [href] => https://webmaster.yandex.ru/api/id /hosts/id  -       ) ) */ //       //    xml,    ,   ,     ,     $site_href="https://webmaster.yandex.ru/api/654321/hosts/123456"; // 654321 - user_id, 123456 - site_id $result=get_stat($site_href."/stats",array('Authorization: OAuth '.$token)); $xml=new SimpleXMLElement($result["response"]); $errors=(string)$xml->{'url-errors'}; //     $internal-links=(string)$xml->{'internal-links-count'}; //    $links=(string)$xml->{'links-count'}; //     unset($xml); 


We can also get information on indexed and excluded pages, including their URLs, but unfortunately the data covers only "the past week", which, frankly, is not much.
You can get this data with the following requests:

$result = get_stat($site_href."/indexed", array('Authorization: OAuth '.$token));
/* Sample response:
<host>
    <index-count>238</index-count>
    <last-week-index-urls>
        <url>http://example.com/page1.html</url>
        <url>http://example.com/page2.html</url>
    </last-week-index-urls>
</host>
*/

$result = get_stat($site_href."/excluded", array('Authorization: OAuth '.$token));
/* Sample response:
<host>
    <url-errors count="12">
        <url-errors-with-code code="404">
            <count>12</count>
            <severity>ERROR</severity>
        </url-errors-with-code>
    </url-errors>
</host>
*/
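
The responses above are raw XML; they parse the same way the hosts list did. A minimal sketch for both requests, assuming exactly the structure shown in the samples:

// Pull values out of the /indexed response
$result = get_stat($site_href."/indexed", array('Authorization: OAuth '.$token));
$xml = new SimpleXMLElement($result["response"]);
$index_count = (string)$xml->{'index-count'};
$indexed_urls = array();
foreach ($xml->{'last-week-index-urls'}->url as $url) {
    $indexed_urls[] = (string)$url;
}
unset($xml);

// Pull the error breakdown out of the /excluded response
$result = get_stat($site_href."/excluded", array('Authorization: OAuth '.$token));
$xml = new SimpleXMLElement($result["response"]);
$url_errors = array();
foreach ($xml->{'url-errors'}->{'url-errors-with-code'} as $err) {
    $url_errors[(string)$err->attributes()->code] = array(
        "count"    => (string)$err->count,
        "severity" => (string)$err->severity
    );
}
unset($xml);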

Statuses and explanations of the error codes can be found here.

You can also get data on external links to the site, but it too is available for at most the "one week" period.

$result = get_stat($site_href."/links", array('Authorization: OAuth '.$token));
/* Sample response:
<host>
    <links-count>1436</links-count>
    <last-week-links>
        <url>http://example1.com/page1.html</url>
        <url>http://example2.com/page2.html</url>
    </last-week-links>
</host>
*/
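
And the same trick for the /links response, again assuming the sample structure above:

$result = get_stat($site_href."/links", array('Authorization: OAuth '.$token));
$xml = new SimpleXMLElement($result["response"]);
$links_count = (string)$xml->{'links-count'};
$last_week_links = array();
foreach ($xml->{'last-week-links'}->url as $url) {
    $last_week_links[] = (string)$url;
}
unset($xml);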


Yandex also provides data on "popular queries", but in practice the reported positions very often diverge from the actual ones, and the data is outdated. So I did not even bother with it.

The most important thing I do is collect statistics on all my sites once every 12 hours. All the data is stored in a database, so I can not only see the current state of affairs but also analyze the changes.

One more thing: to avoid requesting all the data about a site every time, it is better to keep the links to it in the database as well. I mean the $site_href links. If something suddenly changes in the API, you can always refresh them.
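
To make this concrete, here is roughly what my 12-hourly collection boils down to. The webmaster_stats table, its columns and the connection details are purely illustrative, not anything the API prescribes; the point is simply to append a row per host on every run so the changes can be analyzed later:

// Hypothetical cron script (e.g. 0 */12 * * * php collect.php).
// $hosts is the array built from the /hosts request shown earlier.
$pdo = new PDO('mysql:host=localhost;dbname=seo', 'user', 'password'); // placeholder credentials
$insert = $pdo->prepare(
    'INSERT INTO webmaster_stats (host, host_href, tcy, url_count, index_count, collected_at)
     VALUES (?, ?, ?, ?, ?, NOW())'
);

foreach ($hosts as $name => $host) {
    $insert->execute(array(
        $name,
        $host["href"],       // keep the API link as well, so it can be refreshed if the API changes
        $host["tcy"],
        $host["url-count"],
        $host["index-count"]
    ));
}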

I hope this information was interesting and useful to you!
Automate your work. The less time we spend on routine, the more time we have for work itself. And for our loved ones!

Source: https://habr.com/ru/post/157753/

