
The Yandex robot takes wishes into account

Recently there was a discussion on Habr about site-crawling policy and the incident between the Yandex robot and the uaprom.net and ruprom.net servers.
Thank you all for the tips; we will try to take them into account. As for the uaprom/ruprom case, the reports of our robot's bad behavior are accurate, but they do not give the full picture.

1. The Yandex robot downloaded 19238 pages from 8506 (eight thousand five hundred six) subdomains of uaprom.net and 6896 (six thousand eight hundred ninety-six) subdomains of ruprom.net, not from two hosts, as the posts may suggest.

2. On each host (of roughly 15000), no more than one request was made every 1.1 or 2 seconds (depending on the size of the host).

Note that uaprom.net/robots.txt and ruprom.net/robots.txt now set Crawl-delay to 0.5, which has increased the load on their hosts (by default we use a Crawl-delay of 1.1 or 2 seconds).
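As an illustration of how the directive works, here is a minimal Python sketch (not Yandex's actual code) that reads a fractional Crawl-delay value from a robots.txt body and paces requests to one host accordingly; the host name, paths, and helper function are hypothetical.

```python
import time

DEFAULT_DELAY = 2.0  # the post's per-host default: one request every 1.1-2 s


def crawl_delay(robots_txt: str, default: float = DEFAULT_DELAY) -> float:
    """Return the (possibly fractional) Crawl-delay value from robots.txt text."""
    for line in robots_txt.splitlines():
        name, _, value = line.partition(":")
        if name.strip().lower() == "crawl-delay":
            try:
                return float(value.strip())
            except ValueError:
                break
    return default


# The robots.txt files in this story asked for a shorter interval:
robots_body = "User-agent: *\nCrawl-delay: 0.5\n"
delay = crawl_delay(robots_body)
print(delay)  # -> 0.5

for path in ("/", "/catalog", "/contacts"):  # stand-in for a crawl queue
    # a real crawler would fetch "https://shop.uaprom.net" + path here
    time.sleep(delay)  # no more than one request per `delay` seconds
```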

3. All subdomains of ruprom.net and uaprom.net sit on two IP addresses. Yandex's automatic algorithms classified ruprom.net and uaprom.net as hosting providers (they position themselves as reliable hosting; see ruprom.net/tour-4 and uaprom.net/tour-4).

For reliable hosting providers, whose servers house many sites, we generated a load of no more than 12 requests per second per IP.
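To make the cap concrete, here is a hedged sketch of one common way to enforce such a per-IP budget, a token bucket keyed by IP address; the 12 requests/second figure comes from this post, while the class name, method names, and IP address are invented for illustration.

```python
import time
from collections import defaultdict


class PerIPLimiter:
    """Token bucket per IP address: at most `rate` requests/second per IP."""

    def __init__(self, rate: float = 12.0, burst: float = 12.0):
        self.rate = rate                          # tokens refilled per second
        self.burst = burst                        # bucket capacity
        self.tokens = defaultdict(lambda: burst)  # current tokens per IP
        self.stamp = defaultdict(time.monotonic)  # last refill time per IP

    def allow(self, ip: str) -> bool:
        now = time.monotonic()
        elapsed = now - self.stamp[ip]
        self.stamp[ip] = now
        # refill proportionally to elapsed time, capped at the bucket size
        self.tokens[ip] = min(self.burst, self.tokens[ip] + elapsed * self.rate)
        if self.tokens[ip] >= 1.0:
            self.tokens[ip] -= 1.0
            return True
        return False


limiter = PerIPLimiter(rate=12.0)   # the per-IP cap quoted in the post
if limiter.allow("203.0.113.7"):    # documentation-range IP, purely illustrative
    print("within budget: fetch now")
else:
    print("over budget: requeue the URL")
```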

4. The User-Agent was reported as YandexSomething, and that is our fault. It was not the news robot, but one of our search robots for which we forgot to change the default identification string. The error has been fixed; thank you, the robot has been put in the corner.
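The fix amounts to identifying the crawler explicitly instead of shipping a library default. A minimal sketch of setting such a header follows; the bot name and documentation URL are hypothetical, not a real Yandex identifier.

```python
import urllib.request

# A descriptive User-Agent names the bot, its version, and a page where
# server admins can read about it and contact the owner.
USER_AGENT = "Mozilla/5.0 (compatible; ExampleBot/1.0; +https://example.com/bot.html)"

req = urllib.request.Request(
    "https://uaprom.net/robots.txt",
    headers={"User-Agent": USER_AGENT},
)
with urllib.request.urlopen(req) as resp:
    print(resp.status, resp.headers.get("Content-Type"))
```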

Summary: the load placed on the IPs hosting the ruprom.net and uaprom.net sites did not exceed limits acceptable for crawling most hosting providers. We understand that it may be excessive for small hosting providers, and we will try to differentiate the load between large and small hosting more carefully. We hope that Runet's servers will receive the reformed robot favorably.

Source: https://habr.com/ru/post/62731/
