
Googlebot's Math.random() function is completely deterministic

I ran some experiments on how Googlebot parses and renders JavaScript, and came across a few interesting things. The first is that the Math.random() function in Googlebot produces a fully deterministic series of numbers. I wrote a small script that uses this quirk to reliably identify Googlebot:


Source code

The first call to Math.random() inside Googlebot always returns 0.14881141134537756, and the second call always returns 0.19426893815398216. The script linked above simply uses this information to identify Googlebot, although it obfuscates the check a little so that it does not look so arbitrary.
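The check described above can be sketched roughly as follows. This is not the original script, only a minimal illustration: the function name looksLikeGooglebot and the injectable random parameter are assumptions made for this example, and the two constants are the values quoted in this article. The check only makes sense if nothing else on the page has called Math.random() first.

```javascript
// Values this article reports for Googlebot's first two Math.random() calls.
const FIRST_VALUE = 0.14881141134537756;
const SECOND_VALUE = 0.19426893815398216;

// Hypothetical helper: `random` defaults to Math.random and is a parameter
// only so the check can be exercised outside Googlebot.
function looksLikeGooglebot(random = Math.random) {
  const first = random();
  const second = random();
  return first === FIRST_VALUE && second === SECOND_VALUE;
}
```

In a real browser the two values will almost never match, so the function returns false for ordinary visitors.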

Google crawling


Imagine how much work Google has to do to crawl the entire web and still run all the scripts it finds. That is impossible without heavy optimization, and I believe deterministic random numbers were introduced for the following reasons:
  1. Speed.
  2. Better security.
  3. Predictability - Googlebot can be sure that a page renders the same way on every visit.

Speeding up time...


Googlebot also runs JavaScript with an accelerated clock, which makes sense. Why would a bot actually wait 5 seconds? So Google runs timers at a much faster pace. If you write a simple script with a ticker and run it through Fetch & Render in Google Search Console, the script finishes almost instantly, but the result looks like this:



The second date is in the future! Marty McFly would be proud.
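A ticker of the kind described above might look something like this. It is a minimal sketch, not the script used in the experiment: the name runTicker and its parameters are hypothetical, and the schedule and now arguments exist only so the code can be tried with a fake clock.

```javascript
// Hypothetical ticker: reads the clock, waits on a timer, reads the clock again.
// `schedule` and `now` default to the real timer and clock.
function runTicker(delayMs, onDone, schedule = setTimeout, now = Date.now) {
  const startedAt = now();
  schedule(() => {
    const finishedAt = now();
    onDone({
      startedAt: new Date(startedAt).toISOString(),
      finishedAt: new Date(finishedAt).toISOString(),
      elapsedMs: finishedAt - startedAt,
    });
  }, delayMs);
}
```

On a page you would call it with a 5000 ms delay and write the two dates into the document from the onDone callback. If the timer fires early in real time while the reported dates are still a full delay apart, the clock is being accelerated.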

When did it start?


I wondered whether Google's "random" number generator ever gets updated, but a search for the number 0.14881141134537756 returned more than 18,000 results, so the constant seems pretty stable. After finding this out, I googled a bit more and found an old Hacker News comment from KMag:

At some point, someone in the SEO community found out that random() always returned 0.5. I'm not sure whether anyone realized that JavaScript always saw a date from the summer of 2006, but I assume that has changed.

So this behavior has apparently been around for a long time, except that random() used to always return 0.5, whereas now it produces a deterministic series of numbers. The date is indeed fixed at the start, but it can then move into the future. KMag went on to say:

I hope that they now seed the random number generator and the date with a cryptographic hash of all the loaded scripts and the page text, so that the result is deterministic but hard to manipulate.

It seems this did not happen. I am not sure this trick lets you do much that you could not already do with the user agent and IP address. But maybe it lets you do the same things while keeping plausible deniability!
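For illustration, the seeding scheme KMag hoped for could be sketched like this. Nothing here reflects what Googlebot actually does: seedFromPage and seededRandom are hypothetical names, SHA-256 is an arbitrary choice of hash, and the generator is a small mulberry32-style PRNG picked only because it is short. The sketch runs in Node.js.

```javascript
const nodeCrypto = require("node:crypto");

// Hypothetical: derive a 32-bit seed from the page text and its scripts.
function seedFromPage(pageText, scripts) {
  const hash = nodeCrypto.createHash("sha256");
  hash.update(pageText);
  for (const script of scripts) {
    hash.update(script);
  }
  // Use the first four bytes of the digest as an unsigned 32-bit seed.
  return hash.digest().readUInt32BE(0);
}

// Small mulberry32-style generator: the same seed yields the same series.
function seededRandom(seed) {
  let state = seed | 0;
  return function next() {
    state = (state + 0x6d2b79f5) | 0;
    let t = Math.imul(state ^ (state >>> 15), state | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}
```

With this approach every crawl of an unchanged page sees the same series of numbers, but a page author cannot hard-code the expected values, because editing the script changes the hash and therefore the seed.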

Source: https://habr.com/ru/post/348914/

