
YandexBot follows the links that the user follows.

This morning a girl wrote to our dating site's support: she was showing up in men's “she viewed you” lists, although she definitely had not viewed those profiles and had not even been at her computer at the time. The ticket made its way up the chain to me. Here is what I managed to establish.

As a brief digression, a few words about myself. Among other things, I handle the server and admin side of a small, serious dating site. The site is small, and naturally it sends mailings to users (new messages, new users). To head off cries of “spammers”: all mailings are 100% compliant with the COI (confirmed opt-in) model. The user has approved them and can unsubscribe at any time, if a message bounces we automatically block the mailbox, and so on. Several years ago Spamhaus took serious offense at us, and I will remember the experience of dealing with them for the rest of my life. So getting mailings right is a priority for us.

The letters we send to users contain links to other users' profiles on the site. Each such link logs the user in automatically, since few people remember their passwords and our goal is to let the user get onto the site as quickly and easily as possible. Of course, if a user's mailbox is hacked, access to our site leaks as well, but I think that in this case convenience matters more than paranoia.

So, looking at the access log for the girl who contacted us, I saw that today alone (at 8 am) there were 11 visits to the site from IP 178.154.243.78 with the user agent Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots). According to whois, the IP address really does belong to Yandex LLC. I looked at the girl's account: her mail is on yahoo.com. I searched the logs for the girl's own visits and found them. The user agent is Opera/9.80 (Windows NT 6.1; Edition Yx) Presto/2.12.388 Version/12.15, i.e. desktop Opera in the Yandex build (Edition Yx). There are two ways Yandex could have learned about 11 different private links:
1) The girl followed these links and Opera "reported" them to Yandex;
2) She read her Yahoo mail in Opera with data compression enabled, and Yandex, while proxying the traffic (the girl has it turned on), collected all the links it saw for later “use”. Asking the girl confirmed it: Opera from Yandex, data compression, Yahoo mail. But, looking ahead, I will say that I trust the first option more.
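Whois shows who owns the address range. A complementary check, which Yandex's webmaster help describes for its robots, is a reverse DNS lookup followed by a forward lookup. Below is a minimal Python sketch of that check. The function names and the injectable `reverse`/`forward` parameters are illustrative choices, not something from the original investigation, and the suffix list is an assumption that should be checked against Yandex's current documentation.

```python
import socket

# Host name suffixes assumed to be used by Yandex crawlers (verify against
# Yandex's current documentation before relying on this list).
YANDEX_SUFFIXES = (".yandex.ru", ".yandex.net", ".yandex.com")


def _reverse(ip):
    """Reverse DNS lookup: IP address -> primary host name."""
    return socket.gethostbyaddr(ip)[0]


def _forward(host):
    """Forward DNS lookup: host name -> list of IPv4 addresses."""
    return socket.gethostbyname_ex(host)[2]


def is_yandex_bot(ip, reverse=_reverse, forward=_forward):
    """Return True if ip resolves to a Yandex host that resolves back to ip.

    The resolvers are parameters so the logic can be exercised without
    network access.
    """
    try:
        host = reverse(ip).rstrip(".").lower()
        if not host.endswith(YANDEX_SUFFIXES):
            return False
        # The forward lookup guards against someone faking the PTR record.
        return ip in forward(host)
    except OSError:
        # socket.herror and socket.gaierror are subclasses of OSError.
        return False
```

Calling `is_yandex_bot("178.154.243.78")` performs live DNS queries, so the answer depends on the DNS records at the time of the call.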
I kept digging. In the 8 hours covered by the current nginx log (11 am Moscow time; the log starts at 00:00 GMT), there were 350 unique visits from YandexBot. Looking further, all of them turned out to be private links belonging to 15 users. I looked at two at random: both users use plain Yandex.Browser, without proxying. YandexBot visits to accounts began on 04/03/2015.
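A count like this can be reproduced with a few lines of Python. The sketch below groups the unique URLs requested by YandexBot by the value of the `v` query parameter. That `v` identifies the recipient of the letter is an assumption based on the log excerpt in this post, where it stays constant across one user's links. The regular expression and function name are also illustrative.

```python
import re
from collections import defaultdict
from urllib.parse import parse_qs, urlsplit

# Matches the start of a log line in the layout shown in this post:
# vhost, client IP, two dashes, [time], "request", status, size,
# "referer", "user agent". Anything after that is ignored.
LINE_RE = re.compile(
    r'^(?P<host>\S+) (?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def bot_links_by_token(lines, bot_marker="YandexBot"):
    """Map each `v` token (assumed per-user) to the set of URLs the bot hit."""
    by_token = defaultdict(set)
    for line in lines:
        m = LINE_RE.match(line)
        if not m or bot_marker not in m["agent"]:
            continue
        query = parse_qs(urlsplit(m["url"]).query)
        token = query.get("v", ["?"])[0]
        by_token[token].add(m["url"])
    return by_token


def summarize(by_token):
    """Return (number of unique bot URLs, number of distinct tokens)."""
    return sum(len(urls) for urls in by_token.values()), len(by_token)
```

With an open log file `f`, `summarize(bot_links_by_token(f))` gives the two numbers to compare against a manual count.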

I decided to find the "cleanest" possible case, and I found one. The links we send include the date of the letter, so it was easy (grep + awk) to find in the nginx log the requests made from a letter that we sent to the user today.
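The same filtering can be sketched in Python instead of grep + awk. The sketch assumes that the digits before the underscore in the `mail` parameter are a Unix timestamp of the letter. That is an interpretation, not something the post states, but `1435305126` decodes to 26 June 2015, 07:52:06 UTC, which is consistent with the log excerpt. The function names are hypothetical.

```python
from datetime import date, datetime, timezone
from urllib.parse import parse_qs, urlsplit


def letter_date(url):
    """Return the UTC date encoded in the `mail` parameter, or None.

    Assumes the value looks like '<unix timestamp>_<something>'.
    """
    values = parse_qs(urlsplit(url).query).get("mail")
    if not values:
        return None
    stamp = values[0].split("_", 1)[0]
    if not stamp.isdigit():
        return None
    return datetime.fromtimestamp(int(stamp), tz=timezone.utc).date()


def requests_from_letters_sent_on(lines, day):
    """Yield log lines whose request URL comes from a letter sent on `day`."""
    for line in lines:
        # Like `awk -F'"'`: the first quoted field is the request line.
        parts = line.split('"')
        if len(parts) < 2:
            continue
        request = parts[1].split()  # e.g. ['GET', '/path?query', 'HTTP/1.1']
        if len(request) >= 2 and letter_date(request[1]) == day:
            yield line
```

For example, `list(requests_from_letters_sent_on(open("access.log"), date(2015, 6, 26)))` would collect the matching lines; the file name here is a placeholder.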

Here is the nginx log (the user's IP, the site, and the exact links have been changed):

site.ru 1.1.1.1 - - [26/Jun/2015:08:12:18 +0000] "GET /member/detail/111111750?a=1&c=10000080000&v=11ebeedf6eeam4ihkdeb7540037b5ab7&mail=1435305126_60&t=1 HTTP/1.1" 200 6803 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.152 YaBrowser/15.6.2311.4046 Safari/537.36" "-" 0.107 cs=-upstream: 192.168.106.14:7002 answer=200 response=0.107 0.107
site.ru 1.1.1.1 - - [26/Jun/2015:08:12:18 +0000] "GET /member/detail/111111750?a=1&c=10000080000&v=11ebeedf6eeam4ihkdeb7540037b5ab7&mail=1435305126_60&t=1 HTTP/1.1" 200 6803 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.152 YaBrowser/15.6.2311.4046 Safari/537.36" "-" 0.092 cs=-upstream: 192.168.106.4:7002 answer=200 response=0.092 0.092
site.ru 1.1.1.1 - - [26/Jun/2015:08:12:30 +0000] "GET /member/detail/111111708?a=1&c=10000080000&v=11ebeedf6eeam4ihkdeb7540037b5ab7&mail=1435305126_60&t=1 HTTP/1.1" 200 6354 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.152 YaBrowser/15.6.2311.4046 Safari/537.36" "-" 0.049 cs=-upstream: 192.168.106.12:7002 answer=200 response=0.049 0.049
site.ru 1.1.1.1 - - [26/Jun/2015:08:12:30 +0000] "GET /member/detail/111111708?a=1&c=10000080000&v=11ebeedf6eeam4ihkdeb7540037b5ab7&mail=1435305126_60&t=1 HTTP/1.1" 200 6331 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.152 YaBrowser/15.6.2311.4046 Safari/537.36" "-" 0.030 cs=-upstream: 192.168.106.10:7002 answer=200 response=0.030 0.030
site.ru 1.1.1.1 - - [26/Jun/2015:08:12:45 +0000] "GET /member/detail/111111436?a=1&c=10000080000&v=11ebeedf6eeam4ihkdeb7540037b5ab7&mail=1435305126_60&t=1 HTTP/1.1" 200 6293 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.152 YaBrowser/15.6.2311.4046 Safari/537.36" "-" 0.047 cs=-upstream: 192.168.106.18:7002 answer=200 response=0.047 0.047
site.ru 1.1.1.1 - - [26/Jun/2015:08:13:00 +0000] "GET /member/detail/111111053?a=1&c=10000080000&v=11ebeedf6eeam4ihkdeb7540037b5ab7&mail=1435305126_60&t=1 HTTP/1.1" 200 6630 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.152 YaBrowser/15.6.2311.4046 Safari/537.36" "-" 0.030 cs=-upstream: 192.168.106.10:7002 answer=200 response=0.030 0.030
site.ru 1.1.1.1 - - [26/Jun/2015:08:13:08 +0000] "GET /member/detail/111110974?a=1&c=10000080000&v=11ebeedf6eeam4ihkdeb7540037b5ab7&mail=1435305126_60&t=1 HTTP/1.1" 200 6542 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.152 YaBrowser/15.6.2311.4046 Safari/537.36" "-" 0.045 cs=-upstream: 192.168.106.12:7002 answer=200 response=0.045 0.045
site.ru 1.1.1.1 - - [26/Jun/2015:08:13:24 +0000] "GET /member/detail/111110878?a=1&c=10000080000&v=11ebeedf6eeam4ihkdeb7540037b5ab7&mail=1435305126_60&t=1 HTTP/1.1" 200 7651 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.152 YaBrowser/15.6.2311.4046 Safari/537.36" "-" 0.102 cs=-upstream: 192.168.106.12:7002 answer=200 response=0.102 0.102
site.ru 5.255.253.141 - - [26/Jun/2015:08:13:26 +0000] "GET /member/detail/111111053?a=1&c=10000080000&v=11ebeedf6eeam4ihkdeb7540037b5ab7&mail=1435305126_60&t=1 HTTP/1.1" 200 6741 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)" "-" 0.113 cs=-upstream: 192.168.106.4:7002 answer=200 response=0.113 0.113
site.ru 5.255.253.141 - - [26/Jun/2015:08:13:32 +0000] "GET /member/detail/111110974?a=1&c=10000080000&v=11ebeedf6eeam4ihkdeb7540037b5ab7&mail=1435305126_60&t=1 HTTP/1.1" 200 6651 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)" "-" 0.161 cs=-upstream: 192.168.106.6:7002 answer=200 response=0.161 0.161
site.ru 5.255.253.141 - - [26/Jun/2015:08:13:34 +0000] "GET /member/detail/111111436?a=1&c=10000080000&v=11ebeedf6eeam4ihkdeb7540037b5ab7&mail=1435305126_60&t=1 HTTP/1.1" 200 6405 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)" "-" 0.140 cs=-upstream: 192.168.106.10:7002 answer=200 response=0.140 0.140
site.ru 5.255.253.141 - - [26/Jun/2015:08:13:43 +0000] "GET /member/detail/111110878?a=1&c=10000080000&v=11ebeedf6eeam4ihkdeb7540037b5ab7&mail=1435305126_60&t=1 HTTP/1.1" 200 7764 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)" "-" 0.117 cs=-upstream: 192.168.106.18:7002 answer=200 response=0.117 0.117

You can see that the user followed 7 links from the letter, and YandexBot went through 4 of them almost immediately.
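How quickly the bot follows can be measured rather than eyeballed. The Python sketch below is one possible way to do it, assuming the log layout shown in this post with one entry per line. For every URL requested by YandexBot it reports the number of seconds since the most recent non-bot request for the same URL. The function name and regular expression are illustrative.

```python
import re
from datetime import datetime

# Only the fields needed here: [time], request URL, and the user agent
# (the second quoted field after the status and size).
LINE_RE = re.compile(
    r'\[(?P<time>[^\]]+)\] "\S+ (?P<url>\S+) [^"]*" \d{3} \d+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def bot_follow_delays(lines, bot_marker="YandexBot"):
    """Map URL -> seconds between the last user visit and the first bot visit."""
    last_user_visit = {}
    delays = {}
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue
        # nginx time_local format, e.g. 26/Jun/2015:08:13:26 +0000.
        # %b expects English month names, so run under a C/English locale.
        when = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
        url = m["url"]
        if bot_marker in m["agent"]:
            if url in last_user_visit and url not in delays:
                delays[url] = (when - last_user_visit[url]).total_seconds()
        else:
            last_user_visit[url] = when
    return delays
```

Applied to the excerpt above, the timestamps give delays of 19 to 49 seconds between the user's click and the bot's request for the four crawled links.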

I did not find these pages in the search results. What Yandex does with them, only Yandex knows.

My personal conclusion: Yandex.Browser and Opera from Yandex collect the links that the user clicks. The links are analyzed, and YandexBot then crawls some of them. This started on 04/03/2015.

Update on 07/19/2015
I want to confirm what Yandex employees said in the comments: the bot does not follow links from Yandex mail. That is a fact. I checked the logs for several days and found no visits by their robot to links from Yandex mail.

Update on 07/19/2015
The following is taken from kukutz's comment.
In general, this is a very unpleasant mistake.
Here is the press service's comment:
Yandex.Browser collects anonymized statistical information to improve the quality of the Browser, including the addresses of visited pages. This happens only if the person has allowed it in the program settings (by ticking “Send usage statistics to Yandex”).
Due to a technical error, information about some of these pages from the Browser ended up in the list indexed by the Yandex robot. We have already fixed this for the site described on Habr and will soon fix it completely. We are grateful to the Habrahabr user for helping to find this error.


Update on 08/25/2015
40 days have passed. Yandex fixed this “feature/error” for only one domain: the one I sent them. For all the others, everything continues as before.

Source: https://habr.com/ru/post/262695/

