Wishes for Yandex

Over more than a year of running our sites, we have accumulated a list of wishes for Yandex that would make this search engine friendlier to the sites it indexes.
Behind each of these wishes is a separate story that argues for its importance; moreover, many of them would be quite simple to implement.

Use a large enough Crawl-delay by default
One day our site went down: as it turned out, the YandexSomething robot was downloading up to 12 pages per second. Yes, we had no Crawl-delay set, but that should not let a search engine robot make so many requests per second. Setting this parameter to at least 1 second by default would avoid such problems: those who want faster crawling can lower the value themselves, while sites that do not even know YandexSomething exists would not suffer.
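
For reference, Crawl-delay is set per user-agent record in robots.txt, and Yandex honors it. A minimal file that asks every robot to pause at least 1 second between requests (the same value the post argues should be the default) looks like this:

```
User-agent: *
Crawl-delay: 1
```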

Separate the User-agents YandexSomething and Yandex/1.01.001
Our reaction to the robot's DoS attack was to ban it in robots.txt, especially once we found out it was some kind of news bot. That seemed logical, but it turned out to be very stupid: as a result (and this is indirectly indicated in the documentation), Yandex/1.01.001 also stopped visiting us, while Yandex/1.03.003 kept coming regularly. We learned about this a week later from our users, by which time the site had been dropped from Yandex. Traffic falls with a delay of 5-6 days, so we could not detect the misstep earlier. To be fair, the support service acknowledged that this behavior is illogical and promised to correct it. In addition, gray on Twitter suggested that setting a Crawl-delay would be the more correct fix, which I did.
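
Reconstructed from the story above (the post does not show the actual file), the two robots.txt variants look roughly like this; the first got the entire site dropped from the index, the second is the throttle gray suggested:

```
# What we did: ban the news bot outright.
# Per the story (and, indirectly, the documentation), Yandex/1.01.001
# honored this record too and stopped visiting the site.
User-agent: YandexSomething
Disallow: /

# What worked better: keep the bot, just slow it down.
User-agent: YandexSomething
Crawl-delay: 1
```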

Take content delivery time into account, even when Crawl-delay is set
Our site uses, in addition to the main example.net domain, subdomains of the form company.example.net. Even if we set a Crawl-delay, it does not protect us from a robot DoS, because robots.txt, and hence Crawl-delay, is separate for each subdomain (and in our case there are tens of thousands of them); formally, the robot has the right to bring the server down while honoring any delay value, simply by visiting 10,000 of the sites at once. Last night our server rebooted several times for exactly this reason: the number of requests per second was three times what the Crawl-delay allows. I don't know how, but Google gets this right: not only does it avoid overloading the server, it also downloads pages evenly. It seems to take into account how long the content takes to serve and does not request many pages from one IP address simultaneously. Why shouldn't Yandex try doing the same?
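
As a sketch of what this could look like on the crawler side (purely illustrative; no real crawler's internals are implied), the idea is to throttle per server IP rather than per robots.txt host, and to scale the pause with the observed response time:

```python
import time

# Hypothetical politeness policy: one budget per server IP, so 10,000
# subdomains hosted on the same machine cannot be hit simultaneously,
# and slow responses earn the server a longer pause.
class PolitenessPolicy:
    def __init__(self, default_delay=1.0, slowdown_factor=3.0):
        self.default_delay = default_delay      # used when no Crawl-delay is set
        self.slowdown_factor = slowdown_factor  # back off harder on slow servers
        self.next_allowed = {}                  # server IP -> earliest next fetch

    def wait_for(self, ip):
        """Block until this server IP may be contacted again."""
        pause = self.next_allowed.get(ip, 0.0) - time.monotonic()
        if pause > 0:
            time.sleep(pause)

    def record(self, ip, response_seconds, crawl_delay=None):
        """After a fetch, decide when the next request to this IP is allowed."""
        delay = max(crawl_delay or self.default_delay,
                    self.slowdown_factor * response_seconds)
        self.next_allowed[ip] = time.monotonic() + delay
```
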
Download pages evenly
In our case, the robot visits the main domain extremely unevenly. It looks like this: the robot comes to the main domain, indexes it very actively for several hours, and then leaves for the subdomains for 10-30 hours. Since information is constantly added to the site, we have special pages with the latest updates, which link to the new content. Obviously, dropping by only once every 10-30 hours, the robot misses a lot, and this leads to complaints from users whose sites sometimes do not appear in the Yandex index for months. Again, Google discovered these update pages within a couple of months and downloads them regularly; as a result, it very rarely takes more than 3 days before new content is indexed. Granted, a month for Yandex to index a site is no great tragedy, but I think this, too, can be fought. A toy sketch of the kind of scheduling I mean follows below.
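
To make the wish concrete, here is a toy scheduler (intervals and URLs are made up) showing even, hub-aware crawling: frequently changing update pages get a short revisit interval, ordinary pages a long one, and fetches are spread over time instead of arriving in bursts:

```python
import heapq

# Hypothetical revisit scheduler: each page carries its own revisit
# interval, and due times are interleaved so fetches arrive evenly
# rather than in a once-per-30-hours burst.
def schedule(pages, horizon_hours=24):
    """pages: list of (url, revisit_interval_hours); yields (due_hour, url)."""
    heap = [(interval, url, interval) for url, interval in pages]
    heapq.heapify(heap)
    while heap:
        due, url, interval = heapq.heappop(heap)
        if due > horizon_hours:
            break
        yield due, url
        heapq.heappush(heap, (due + interval, url, interval))

# Example: the "latest updates" hub is refetched every 2 hours,
# an ordinary page once a day.
for due, url in schedule([("/updates", 2), ("/some-page", 24)]):
    print(f"t+{due}h fetch {url}")
```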

Be more lenient and predictable toward new sites
Everyone started somewhere, and not everyone gets links from top sites right away. In our case, it took Yandex more than 2 months to start indexing the main domain, and even that happened only after correspondence with the support service (judging by colleagues' experience, we are not alone in this), despite unique content and the presence of external links. Again, Google behaved in a friendlier way: it added us almost immediately and then gradually increased the number of indexed pages and the crawl rate. We did not rank highly in search, but we were there, and everything developed predictably.

The purpose of this list is not to show that Yandex is bad and someone else is good. Yandex is the search leader in Runet and probably its most technologically advanced and successful project, and that means a lot, including the fact that many people consider its search good, not to mention that having an alternative is always better than not having one. I just want Yandex to become even better and more responsive to the sites whose existence largely depends on it. Moreover, that does not seem very difficult to me.

I think many readers have something to add to this list. Perhaps it would be good if Yandex implemented a way to submit feature requests with discussion and voting; that would be better for everyone. For now, it can be done in the comments.

Thanks in advance to Yandex if anything from this list is heard and implemented.

Source: https://habr.com/ru/post/62609/
