
Bot protection based on differences in how JavaScript and PHP handle large numbers

Recently I had to deal with a bot protection scheme used on several fairly popular sites.
At first glance it looked like the usual trick of setting cookies via JavaScript, the kind of thing that takes 15 minutes to get around. And indeed, after a little research it became clear what was being done where and which parameters were being passed; all that remained was to port one small function from JavaScript to PHP and the job would be done.
But it was not that easy. Although the protection was broken in the end, it took far more than 15 minutes, and the principle behind it turned out to be new to me and quite interesting.

So, first things first.

Surface inspection


The protection works as follows.
The site's main page script, index.php, expects a cookie in which one of the parameters is a hash computed from the visitor's IP address.
If the cookie is missing, index.php redirects the visitor to another page containing JavaScript code that computes the required parameter, writes it to a cookie and sends the visitor back to the main page.

For an ordinary PHP bot that makes GET and POST requests through cURL to get past this protection, the hash calculation has to be ported from JavaScript to PHP and the resulting cookie added to the request headers.

Autopsy


Now in more detail.
Launch Firefox, disable JavaScript and enable Firebug.
Request the main page, index.php, and look at the request and response headers.

Request:

GET http://example.com

The headers of this request are of no interest to us.
Here are the response headers:

Status: 302 Moved Temporarily

Connection keep-alive
Content-Type text/html
Date XXX GMT
Location http://example.com/govalidateyourself#98765:1234:11.22.33.44:/index.php
Server YTS/1.20.0
Transfer-Encoding chunked


Firefox then automatically follows the Location given in that header and receives the following response headers:

Accept-Ranges bytes
Connection keep-alive
Content-Type text/html; charset=utf-8
Date XXX GMT
Last-Modified YYY GMT
Server YTS/1.20.0
Set-Cookie addr=1234:11.22.33.44; path=/
Transfer-Encoding chunked


Here 11.22.33.44 is my IP address and 1234 is a number whose calculation logic is unknown.

The page itself contains a link to the JavaScript code,
  http://example2.com/validator/va.js 
and the message "No javascript".
Without JavaScript we are not allowed any further.

Once all the requests and responses have been recorded, we enable JavaScript, clear the cookies and repeat the whole sequence.
Now we are interested in what happens after the validation page has been requested.

This time the main page of the site loads, and here are the headers of the last request:

Accept text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Encoding gzip, deflate
Accept-Language ru-ru,ru;q=0.8,en-us;q=0.5,en;q=0.3
Connection keep-alive
Cookie addr=5678:11.22.33.44; urine=aabbccdd; v=1
Host example.com
Referer http://example.com/govalidateyourself
User-Agent (some version of Firefox)


The number 1234 from the last server response has changed to 5678 this time, while the IP address has stayed the same. Apparently this is a request ID assigned by the server and stored in the cookie. It simply has to be saved and sent back unchanged in the cookies of later requests.

The parameter urine=aabbccdd is more interesting. Since it did not come from the server, it must have been generated on our side, and something tells me this is the work of va.js.

It is time to see what is inside. At first glance it is a complete swamp, best not waded into:

if(document.cookie==""){document.write("Cookies error")}else{function poo(a,b){var c=a.length,d=b^c,e=0,f;while(c>=4){f=a.charCodeAt(e)&255|(a.charCodeAt(++e)&255)<<8|(a.charCodeAt(++e)&255)<<16|(a.charCodeAt(++e)&255)<<24;f=(f&65535)*1540483477+(((f>>>16)*1540483477&65535)<<16);f^=f>>>24;f=(f&65535)*1540483477+(((f>>>16)*1540483477&65535)<<16);d=(d&65535)*1540483477+(((d>>>16)*1540483477&65535)<<16)^f;c-=4;++e}switch(c){case 3:d^=(a.charCodeAt(e+2)&255)<<16;case 2:d^=(a.charCodeAt(e+1)&255)<<8;case 1:d^=a.charCodeAt(e)&255;d=(d&65535)*1540483477+(((d>>>16)*1540483477&65535)<<16)}d^=d>>>13;d=(d&65535)*1540483477+(((d>>>16)*1540483477&65535)<<16);d^=d>>>15;return d>>>0}function coo(a){var b=a+"=";var c=document.cookie.split(";");for(var d=0;d<c.length;d++){var e=c[d];while(e.charAt(0)==" ")e=e.substring(1,e.length);if(e.indexOf(b)==0)return e.substring(b.length,e.length)}return null}var dt=new Date,expiryTime=dt.setTime(dt.getTime()+1000e5);var dt2=new Date,expiryTime=dt2.setTime(dt2.getTime()+2e4);var addr=window.location.hash.split(":")[2];var a=poo(addr,47).toString(16);for(var i=0,z="";i<8-a.length;i++)z+="0";a=z+a;a=a.substring(6)+a.substring(4,6)+a.substring(2,4)+a.substring(0,2);var refurl=window.location.hash.split(":")[3];document.cookie="urine="+a+"; expires="+dt.toGMTString()+"; path=/";if(!coo("v")){document.cookie="v=1; expires="+dt2.toGMTString()+"; path=/";setTimeout("window.location = refurl",300)}else if(coo("v")<3){var c=coo("v");c++;document.cookie="v="+c+"; expires="+dt2.toGMTString()+"; path=/";setTimeout("window.location = refurl",300)}else if(coo("v")>=3){document.write("Too many redirects from: "+document.referrer)}} 


But with a little patience and some formatting, everything becomes readable and quite understandable.
There are two functions, coo() and poo(), plus the code that writes the cookie we need and sends us back to index.php.

The coo() function is of no particular interest: it returns the value of the given parameter from the cookie and is easily reproduced in PHP with a simple regular expression.
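
A regular-expression version of coo() might look like the sketch below. It is written in JavaScript so it can sit next to the original script; the name readCookie and the escaping step are illustrative assumptions, not part of va.js.

```javascript
// Sketch: read one value out of a "name=value; name2=value2" cookie string.
// readCookie is a hypothetical name; the original script calls its helper coo().
function readCookie(cookieString, name) {
  // Escape regex metacharacters in the cookie name.
  const escaped = name.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Match the name at the start of the string or right after a semicolon.
  const match = cookieString.match(new RegExp("(?:^|;\\s*)" + escaped + "=([^;]*)"));
  return match ? match[1] : null;
}

const cookies = "addr=5678:11.22.33.44; urine=aabbccdd; v=1";
console.log(readCookie(cookies, "urine")); // aabbccdd
console.log(readCookie(cookies, "v"));     // 1
console.log(readCookie(cookies, "nope"));  // null
```

The same pattern carries over to a PHP preg_match call almost unchanged.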

Here is the poo() function, which computes the urine parameter:

 function poo( a, b ) {
     var c = a.length, d = b ^ c, e = 0, f;
     while( c >= 4 ) {
         f = a.charCodeAt( e ) & 255 | ( a.charCodeAt( ++e ) & 255 ) << 8 | ( a.charCodeAt( ++e ) & 255 ) << 16 | ( a.charCodeAt( ++e ) & 255 ) << 24;
         f = ( f & 65535 ) * 1540483477 + ( ( ( f >>> 16 ) * 1540483477 & 65535 ) << 16 );
         f ^= f >>> 24;
         f = ( f & 65535 ) * 1540483477 + ( ( ( f >>> 16 ) * 1540483477 & 65535 ) << 16 );
         d = ( d & 65535 ) * 1540483477 + ( ( ( d >>> 16 ) * 1540483477 & 65535 ) << 16 ) ^ f;
         c -= 4;
         ++e
     }
     switch( c ) {
         case 3: d ^= ( a.charCodeAt( e + 2 ) & 255 ) << 16;
         case 2: d ^= ( a.charCodeAt( e + 1 ) & 255 ) << 8;
         case 1: d ^= a.charCodeAt( e ) & 255;
             d = ( d & 65535 ) * 1540483477 + ( ( ( d >>> 16 ) * 1540483477 & 65535 ) << 16 )
     }
     d ^= d >>> 13;
     d = ( d & 65535 ) * 1540483477 + ( ( ( d >>> 16 ) * 1540483477 & 65535 ) << 16 );
     d ^= d >>> 15;
     return d >>> 0
 }


It is called with the following parameters:

 var a = poo( addr, 47 ).toString( 16 ); 


a is the nearly finished value of the urine parameter: afterwards it is only padded with leading zeros to 8 characters and has its byte order reversed.
addr is our IP address, 11.22.33.44.
47 is a constant.
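
In va.js the value returned by poo() is zero-padded to 8 hex digits and its byte order is reversed before it goes into the cookie, and both addr and the return URL are taken from the URL fragment. A minimal sketch of those steps follows; toCookieValue is a hypothetical helper name, and the input 0x1a2b3c is an arbitrary stand-in for a real hash.

```javascript
// Sketch of the post-processing va.js applies around poo().
// toCookieValue is a hypothetical helper name; hashValue stands in for poo(addr, 47).
function toCookieValue(hashValue) {
  // Hex string, left-padded with zeros to 8 characters.
  const hex = (hashValue >>> 0).toString(16).padStart(8, "0");
  // Reverse the byte order: "aabbccdd" becomes "ddccbbaa".
  return hex.substring(6) + hex.substring(4, 6) + hex.substring(2, 4) + hex.substring(0, 2);
}

// The fragment from the Location header, split the same way va.js does it.
const hash = "#98765:1234:11.22.33.44:/index.php";
const parts = hash.split(":");
const addr = parts[2];   // "11.22.33.44"
const refurl = parts[3]; // "/index.php"

console.log(addr, refurl);
console.log(toCookieValue(0x1a2b3c)); // 3c2b1a00
```

A PHP port has to reproduce the byte reversal as well, otherwise even a correct hash yields the wrong cookie.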

Now everything looks clear.
A PHP bot that breaks through this protection should work according to the following algorithm.

1. Make a GET request to
  http://example.com/index.php 
Set the option to return the response headers:

 curl_setopt( $ch, CURLOPT_HEADER, 1 ); 


At the same time, turn on automatic following of redirects:

 curl_setopt( $ch, CURLOPT_FOLLOWLOCATION, 1 ); 


With this option cURL follows the redirect to the new location itself, so we do not need to program the second request. We also get the headers of both responses: the first contains the Location header, the second the first cookie with the request ID.

2. Parse the headers to get the request ID and our IP address (if we are using various tricks we may not know it in advance, but here it is kindly handed to us, which is very convenient).
Compute the urine parameter, write it to the cookie and send a new GET request to index.php. The protection is passed.

The cookie is set as follows:

 $headers = array(
     "Cookie: " . $cookie_str, // "addr=5678:11.22.33.44; urine=aabbccdd; v=1"
     /* ... */
 );
 curl_setopt( $ch, CURLOPT_HTTPHEADER, $headers );
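
The header parsing in step 2 can be sketched as follows. The sketch is in JavaScript for brevity, while the bot itself is PHP; parseAddrCookie and buildCookieHeader are hypothetical helper names, and the header text is the sample from this article rather than real server output.

```javascript
// Sketch: pull the request ID and IP out of the Set-Cookie header,
// then assemble the Cookie header for the final request.
// parseAddrCookie and buildCookieHeader are hypothetical helper names.
function parseAddrCookie(rawHeaders) {
  // Expect a line such as "Set-Cookie: addr=1234:11.22.33.44; path=/".
  const match = rawHeaders.match(/^Set-Cookie:\s*addr=(\d+):([\d.]+)/im);
  return match ? { id: match[1], ip: match[2] } : null;
}

function buildCookieHeader(id, ip, urine) {
  return "Cookie: addr=" + id + ":" + ip + "; urine=" + urine + "; v=1";
}

const rawHeaders = [
  "HTTP/1.1 200 OK",
  "Content-Type: text/html; charset=utf-8",
  "Set-Cookie: addr=5678:11.22.33.44; path=/",
].join("\r\n");

const addrCookie = parseAddrCookie(rawHeaders);
// "aabbccdd" is the placeholder value from the article, not a real hash.
console.log(buildCookieHeader(addrCookie.id, addrCookie.ip, "aabbccdd"));
// Cookie: addr=5678:11.22.33.44; urine=aabbccdd; v=1
```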


That leaves the final touch: computing urine.

Pitfalls


All that is needed is to rewrite the poo() function in PHP.
To begin with, let's google a bit and write equivalents for a couple of JavaScript functions and operators that PHP lacks:

 // PHP equivalents of JavaScript functions
 function charCodeAt( $str, $i ) {
     return ord( substr( $str, $i, 1 ) );
 }

 // char at
 function charAt( $str, $i ) {
     return $str[ $i ]; // the $str{ $i } form was removed in PHP 8
 }

 // unsigned shift right (js >>>)
 function zeroFill( $a, $b ) {
     $z = hexdec( '80000000' );
     if( $z & $a ) {
         $a = ( $a >> 1 );
         $a &= ( ~ $z );
         $a |= 0x40000000;
         $a = ( $a >> ( $b - 1 ) );
     } else {
         $a = ( $a >> $b );
     }
     return $a;
 }
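
The zeroFill() helper stands in for JavaScript's >>> operator. A quick JavaScript sketch of the behaviour it has to reproduce, using arbitrary sample values:

```javascript
// What zeroFill() has to emulate: JavaScript's >>> treats its left operand
// as an unsigned 32-bit integer, while >> keeps the sign.
const x = -16; // bit pattern 0xfffffff0

console.log(x >> 4);   // -1          (sign bit is copied in)
console.log(x >>> 4);  // 268435455   (zeros are shifted in: 0x0fffffff)
console.log(x >>> 0);  // 4294967280  (reinterpret as unsigned, no shift)

// Operands wider than 32 bits are reduced modulo 2^32 before shifting.
console.log((2 ** 32 + 5) >>> 0); // 5
```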


Now everything is ready, and you can rewrite poo ():

 // direct port of poo()
 function poo( $a, $b ) {
     $c = strlen( $a );
     $d = $b ^ $c;
     $e = 0;
     $f = '';
     while( $c >= 4 ) {
         $f = charCodeAt( $a, $e ) & 255 | ( charCodeAt( $a, ++$e ) & 255 ) << 8 | ( charCodeAt( $a, ++$e ) & 255 ) << 16 | ( charCodeAt( $a, ++$e ) & 255 ) << 24;
         $f = ( $f & 65535 ) * 1540483477 + ( ( ( zeroFill( $f, 16 ) ) * 1540483477 & 65535 ) << 16 );
         $f ^= zeroFill( $f, 24 );
         $f = ( $f & 65535 ) * 1540483477 + ( ( ( zeroFill( $f, 16 ) ) * 1540483477 & 65535 ) << 16 );
         $d = ( $d & 65535 ) * 1540483477 + ( ( ( zeroFill( $d, 16 ) ) * 1540483477 & 65535 ) << 16 ) ^ $f;
         $c -= 4;
         ++$e;
     }
     switch( $c ) {
         case 3: $d ^= ( charCodeAt( $a, $e + 2 ) & 255 ) << 16;
         case 2: $d ^= ( charCodeAt( $a, $e + 1 ) & 255 ) << 8;
         case 1: $d ^= charCodeAt( $a, $e ) & 255;
             $d = ( $d & 65535 ) * 1540483477 + ( ( ( zeroFill( $d, 16 ) ) * 1540483477 & 65535 ) << 16 );
     }
     $d ^= zeroFill( $d, 13 );
     $d = ( $d & 65535 ) * 1540483477 + ( ( ( zeroFill( $d, 16 ) ) * 1540483477 & 65535 ) << 16 );
     $d ^= zeroFill( $d, 15 );
     return zeroFill( $d, 0 );
 }


Save, run, and hit a wall: the results of the JavaScript and PHP versions do not match.
What is going on?
We add output after every line of the calculation in both the JavaScript and the PHP code and see where they diverge.

It turns out that PHP's ordinary arithmetic operators, unlike JavaScript's, do not cope well with large numbers.

For example, the expression

 ( 18220025198660 & 65535 ) * 1540483477 + ( ( ( 18220025198660 >>> 16 ) * 1540483477 & 65535 ) << 16 ); 


evaluates to 22188624159636 in JavaScript, while the equivalent in PHP

 ( 18220025198660 & 65535 ) * 1540483477 + ( ( ( zeroFill( 18220025198660, 16 ) ) * 1540483477 & 65535 ) << 16 ) 


gives a slightly different number, 22188624159600.
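
To see where the JavaScript value comes from, the expression can be evaluated step by step. The sketch below only splits the expression above into named intermediate values; the variable names are illustrative.

```javascript
// Step-by-step evaluation of the expression above.
const M = 1540483477;       // 0x5bd1e995
const x = 18220025198660;   // wider than 32 bits

const low32 = x >>> 0;                   // 773929028: bitwise operators keep only the low 32 bits
const lo = x & 65535;                    // 14404
const hi = x >>> 16;                     // 11809
const hiTerm = ((hi * M) & 65535) << 16; // -499843072: << yields a signed 32-bit value
const result = lo * M + hiTerm;          // ordinary double arithmetic, exact below 2^53

console.log(low32, lo, hi, hiTerm);
console.log(result); // 22188624159636
```

Any port has to reproduce both the truncation to 32 bits and the sign of the shifted term to arrive at the same number.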

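When several such formulas are evaluated one after another, the error accumulates and the final result is completely different. JavaScript stores numbers as 64-bit floating-point values and reduces the operands of bitwise operators to 32 bits, whereas in some PHP expressions the result is an int by default, which on 32-bit systems limits it to 32 bits (about 4 billion values).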
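
The expression that poo() keeps repeating can be read as a 32-bit multiplication split into 16-bit halves. The constant 1540483477 is 0x5bd1e995, the multiplier known from MurmurHash2, which suggests (an assumption, the site's script does not say so) that poo() is a variant of that hash. The sketch below, with the hypothetical name mulSplit, checks the expression against Math.imul on a few arbitrary inputs.

```javascript
// Sketch: the repeated expression in poo() is a 32-bit multiplication by M,
// split into 16-bit halves so that no intermediate product exceeds 2^53.
const M = 1540483477; // 0x5bd1e995

function mulSplit(x) {
  // Same expression as in poo(); the result may be wider than 32 bits.
  return (x & 65535) * M + ((((x >>> 16) * M) & 65535) << 16);
}

// Once reduced to 32 bits, it agrees with Math.imul (wrapping 32-bit multiply).
for (const x of [0, 1, 47, 65535, 65536, 773929028, 0xffffffff]) {
  console.log(x, (mulSplit(x) | 0) === Math.imul(x, M));
}
```

Seen this way, a port only needs a multiplication that wraps modulo 2^32, however the target language chooses to provide it.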

Perl has similar problems with large numbers.

For exact calculations in PHP you have to use the functions of the BC Math library, and add casts to float along the way.

After some trial and error we arrive at code that gives the same results as JavaScript, though it takes extra time and effort.
The code is not optimal; for clarity, the calculations are performed step by step.

 // poo() rewritten with BC Math
 function poo( $a, $b ) {
     $c = strlen( $a );
     $d = $b ^ $c;
     $e = 0;
     $f = '';
     while( $c >= 4 ) {
         $f = charCodeAt( $a, $e ) & 255 | ( charCodeAt( $a, ++$e ) & 255 ) << 8 | ( charCodeAt( $a, ++$e ) & 255 ) << 16 | ( charCodeAt( $a, ++$e ) & 255 ) << 24;
         $f = bcadd( bcmul( $f & 65535, 1540483477 ), ( floatval( ( bcmul( ( zeroFill( $f, 16 ) ), ( 1540483477 & 65535 ) ) ) ) << 16 ) );
         $xx = zeroFill( $f, 24 );
         $f = floatval( $f ) ^ floatval( $xx );
         // $f = floatval( $f );
         $f1 = bcmul( $f & 65535, 1540483477 );
         $f2 = ( floatval( ( bcmul( ( zeroFill( $f, 16 ) ), ( 1540483477 & 65535 ) ) ) ) << 16 );
         $f = bcadd( $f1, $f2 );
         $d1 = bcmul( $d & 65535, 1540483477 );
         $d2 = ( floatval( ( bcmul( ( zeroFill( $d, 16 ) ), ( 1540483477 & 65535 ) ) ) ) << 16 );
         $d = bcadd( $d1, $d2 );
         $d = floatval( $d ) ^ floatval( $f );
         $c -= 4;
         ++$e;
     }
     switch( $c ) {
         case 3: $d = floatval( $d ) ^ ( ( charCodeAt( $a, $e + 2 ) & 255 ) << 16 );
         case 2: $d = floatval( $d ) ^ ( ( charCodeAt( $a, $e + 1 ) & 255 ) << 8 );
         case 1: $d = floatval( $d ) ^ ( charCodeAt( $a, $e ) & 255 );
             $d1 = bcmul( $d & 65535, 1540483477 );
             $d2 = ( floatval( ( bcmul( ( zeroFill( $d, 16 ) ), ( 1540483477 & 65535 ) ) ) ) << 16 );
             $d = bcadd( $d1, $d2 );
     }
     $d = floatval( $d ) ^ zeroFill( $d, 13 );
     $d1 = bcmul( floatval( floatval( $d ) & 65535 ), 1540483477 );
     $dd21 = zeroFill( $d, 16 );
     $dd22 = floatval( bcmul( $dd21, 1540483477 & 65535 ) );
     $dd23 = floatval( $dd22 << 16 );
     $d2 = $dd23;
     $d = bcadd( $d1, $d2 );
     $d = floatval( $d ) ^ zeroFill( $d, 15 );
     if( $d < 0 ) {
         $res = bindec( decbin( ~0 ) ) - abs( $d ) + 1;
     } else {
         $res = $d;
     }
     return $res;
 }


And at the very beginning of the zeroFill() function we add:

 $a = floatval( $a ); 


Conclusion


My bots have done their job, and you are welcome to use the protection described here for your own purposes. If you modify it, for example by dynamically changing the code that performs the computation, breaking it becomes an even harder task. And as long as nobody sets out to target you seriously, this protection will be enough.

In general, the best protection against bots is a CAPTCHA. Even the most cunning JavaScript can be executed by bots using something like the Mechanize Perl module.

Source: https://habr.com/ru/post/137961/

