In this post I present a small study of Rostelecom's site-blocking mechanism and show ways to bypass it without tunneling to third-party hosts (proxies, VPNs, etc.). This probably applies to some other providers as well.
The result of blocking
For some time now, Rostelecom (RT) has been blocking HTTP sites by URL rather than by IP.
When a request is blocked, a redirect arrives of the form
95.167.13.50/?st=0&dt=<IP>&rs=<URL>, where <IP> is the IP address the browser connected to and <URL> is the URL it requested. If you inspect the traffic, it becomes clear that only the beginning of the server's response is overwritten; the rest is left as it was.
It looks like this:
HTTP/1.1 302 Found
Connection: close
Location: http://95.167.13.50/?st=0&dt=192.237.142.117&rs=grani.ru/

f-8
Transfer-Encoding: chunked
Connection: keep-alive

6d7
<!DOCTYPE HTML>
<html lang="en" xmlns:fb="http://www.facebook.com/2008/fbml">
<head>
<meta charset="utf-8">
<title>
Grani.Ru:
the main thing
</title>
...
Real site response:
HTTP/1.1 200 OK
Server: nginx/1.2.1
Date: Sun, 01 Feb 2015 17:34:03 GMT
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Connection: keep-alive

6d7
<!DOCTYPE HTML>
<html lang="en" xmlns:fb="http://www.facebook.com/2008/fbml">
<head>
<meta charset="utf-8">
<title>
Grani.Ru:
the main thing
</title>
...
That is, if you found a way to restore the headers, you could bypass the block. Obviously, this is not the most accessible approach.
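As a rough illustration of the overwrite described above, the following Python sketch replaces the first bytes of the real response with the redirect. It assumes CRLF line endings, the usual single space after each header colon, and a blank line closing the redirect headers; none of these byte-level details are spelled out in the dumps, and the name `overwrite_prefix` is made up for this example.

```python
def overwrite_prefix(real_response: bytes, injected: bytes) -> bytes:
    """Replace the first len(injected) bytes of the real response."""
    return injected + real_response[len(injected):]


# Beginning of the real response (assumed byte layout, see the note above).
REAL = (
    b"HTTP/1.1 200 OK\r\n"
    b"Server: nginx/1.2.1\r\n"
    b"Date: Sun, 01 Feb 2015 17:34:03 GMT\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"Transfer-Encoding: chunked\r\n"
    b"Connection: keep-alive\r\n"
    b"\r\n"
    b"6d7\r\n"
    b"<!DOCTYPE HTML>\r\n"
)

# The redirect that overwrites the start of the response.
INJECTED = (
    b"HTTP/1.1 302 Found\r\n"
    b"Connection: close\r\n"
    b"Location: http://95.167.13.50/?st=0&dt=192.237.142.117&rs=grani.ru/\r\n"
    b"\r\n"
)

seen = overwrite_prefix(REAL, INJECTED)
print(seen.decode("ascii"))
```

Under these assumptions the redirect is 110 bytes long, which cuts the Content-Type header right before its last three characters and leaves the stray "f-8" line seen in the capture. Because the redirect ends with a blank line, the client treats everything after it as the body of the 302 response.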
What is blocked, and how
Evidently, site blocking at RT is managed manually.
Not all sites from the registry are blocked. At the very least, there are several HTTPS sites that are not blocked at all.
Typically, HTTPS sites are blocked by IP. Sometimes the provider intercepts HTTPS by substituting its own certificate, in which case the blocking is by URL.
Sometimes an HTTPS site from the registry is blocked only over HTTP (by URL rather than by IP, accordingly) and remains freely accessible over HTTPS.
Digging deeper
A series of experiments revealed the following principles of how the blocking works:
- In the first line of the request, the filter looks for the name of an HTTP method, a space, the URL, and then a space, "?" or "/".
It reacts to the methods GET, POST, HEAD, DELETE, OPTIONS and TRACE. The PUT method was apparently forgotten and is let through. Other method names are let through as well, as are method names with altered letter case.
The check is applied only to the first line: if you insert an empty line at the beginning of the request, the request passes.
If the URL is "/", then only the method name is looked for.
If you add an extra space after the method name, the request also passes without problems, provided the URL is not "/".
Apparently, the URL is considered to end at a space, "?" or "/". If you append some other character to the URL, the request passes. This includes appending a newline, i.e. removing "HTTP/1.1" from the request.
- URL encoding (urlencode) does not help get past the filter, whatever the letter case of the encoded characters. Even if you encode the leading slash (%2F), the request is blocked, although the web server no longer understands it.
- Next, the filter looks for the Host header.
It is looked for only within the same packet.
It must match the form "Host: <HOST>" exactly. Any extra character, or a change in the case of the header name (host, HOST), lets the request pass.
Changing the letter case of the domain itself does not help: the block is still triggered.
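The rules above can be condensed into a toy model. The Python sketch below is only a reading of the observations in this post, not the provider's actual logic: `would_block` is a hypothetical function, it inspects a single packet, and it does not model the URL-encoding behaviour.

```python
import re

# Methods the filter was observed to react to (PUT is not among them).
BLOCKED_METHODS = (b"GET", b"POST", b"HEAD", b"DELETE", b"OPTIONS", b"TRACE")


def would_block(packet: bytes, host: bytes, path: bytes) -> bool:
    """Toy model: would one packet trigger a block for host + path?"""
    first_line, _, rest = packet.partition(b"\r\n")

    # Step 1: the first line must start with a known method (exact case)
    # and a single space, followed by the blocked URL.
    matched = False
    for method in BLOCKED_METHODS:
        prefix = method + b" "
        if not first_line.startswith(prefix):
            continue
        if path == b"/":
            matched = True  # for "/" only the method name is checked
        else:
            after_method = first_line[len(prefix):]
            # The URL must be terminated by a space, "?" or "/".
            pattern = re.escape(path) + rb"[ ?/]"
            matched = re.match(pattern, after_method) is not None
        break
    if not matched:
        return False

    # Step 2: an exact "Host: <HOST>" header in the same packet;
    # only the letter case of the domain is ignored.
    for line in rest.split(b"\r\n"):
        if line.startswith(b"Host: ") and line[6:].lower() == host.lower():
            return True
    return False


request = b"GET / HTTP/1.1\r\nHost: grani.ru\r\n\r\n"
print(would_block(request, b"grani.ru", b"/"))                           # True
print(would_block(b"\r\n" + request, b"grani.ru", b"/"))                 # False
print(would_block(request.replace(b"Host:", b"HOST:"), b"grani.ru", b"/"))  # False
```

Each observation from the list maps to one branch of the function, which makes it easy to see which request variations are expected to slip through.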
Workarounds
This leads to the following workarounds:
- Add a blank line at the beginning of the request. Not all web servers accept this; nginx in particular does not.
- Add a space before the URL. Popular web servers understand this. However, problems are possible in rare cases.
- Add some character after the URL. Obviously, it must be a character that the web server ignores but that the censoring device treats as part of the URL. I could not find such a character.
- Remove the protocol name and version ("HTTP/1.1"). In this case the web server treats the request as HTTP/1.0, and that version of the protocol had no Host header, so this will not work with many sites.
- Send the URL and Host in different packets.
You can simply call send for the first line of the request (the HTTP method and URL) and then send the rest of the request in the usual way.
Alternatively, you can add a sufficiently large header between these lines (about 1,530 bytes, to be sure of filling an entire packet).
No problems with web servers were found in these cases.
- Modify the Host header.
You can change the letter case or add spaces before and after the domain.
No problems with web servers were found in these cases.
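The "different packets" workaround from the list above can be sketched in Python roughly as follows. This is a minimal sketch: the function names are invented for the example, and separate send calls do not strictly guarantee separate TCP segments. Setting TCP_NODELAY only makes it more likely that the first line leaves in its own packet.

```python
import socket


def split_request(request: bytes) -> tuple[bytes, bytes]:
    """Split a raw HTTP request right after its first line (method and URL)."""
    first, sep, rest = request.partition(b"\r\n")
    return first + sep, rest


def send_in_two_parts(sock: socket.socket, request: bytes) -> None:
    """Send the request line and the remaining headers with separate calls."""
    head, rest = split_request(request)
    sock.sendall(head)
    sock.sendall(rest)


def fetch(host: str, path: str = "/", port: int = 80) -> bytes:
    """Fetch a page over plain HTTP, sending the first line separately."""
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")
    with socket.create_connection((host, port), timeout=10) as sock:
        # Disable Nagle's algorithm so the two writes are less likely
        # to be coalesced into a single segment.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        send_in_two_parts(sock, request)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling `fetch("grani.ru")` would then return the raw response bytes. Whether the block is actually avoided depends on the request line and the Host header really ending up in different packets on the wire.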
Practical implementation
I chose an implementation based on 3proxy. It includes a plugin that can modify all transmitted data using regular expressions. The proxy is also lightweight and undemanding, so it can be installed on an ordinary router.
Given the above, the most convenient options in practice are adding an extra header before Host and modifying the Host header itself. Obviously, modifying Host is preferable, since it does not increase the size of the request. I use this method regularly so that I can decide for myself what information to consume.
In any case, both options are easy to configure:
Add an extra header:
pcre_rewrite cliheader dunno "Host:" "X-Something: \r\nHost:"
Modify the header:
pcre_rewrite cliheader dunno "Host:" "HOST:"
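To see what these two substitutions do to a request, the same text replacements can be imitated in Python. This is only a sketch of the substitution itself, not of how 3proxy applies it internally; the function names are made up, and replacing only the first match is an assumption.

```python
import re


def modify_host_header(headers: bytes) -> bytes:
    """Imitate the rule that rewrites "Host:" to "HOST:"."""
    return re.sub(rb"Host:", b"HOST:", headers, count=1)


def add_header_before_host(headers: bytes) -> bytes:
    """Imitate the rule that inserts an extra header before Host."""
    return re.sub(rb"Host:", b"X-Something: \r\nHost:", headers, count=1)


request = b"GET / HTTP/1.1\r\nHost: grani.ru\r\n\r\n"
print(modify_host_header(request))
# b'GET / HTTP/1.1\r\nHOST: grani.ru\r\n\r\n'
print(add_header_before_host(request))
# b'GET / HTTP/1.1\r\nX-Something: \r\nHost: grani.ru\r\n\r\n'
```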
Basic config:
# dns servers
nserver 77.88.8.8
nserver 8.8.8.8
# cache dns
nscache 65536
# work in the background
daemon
# load the plugin; the full path should be specified
plugin PCREPlugin.ld.so pcre_plugin
# one of the rules described above
pcre_rewrite ...
# start the proxy; the -a option gets rid of the Forwarded-For and Via headers
proxy -a -p8080
UPD: @ValdikSS made a very interesting remark: You should have looked at the traffic arriving on the interface from Rostelecom. It is likely that the DPI is connected in parallel rather than inline, and only client traffic reaches it. Since the DPI is clearly closer than the website, the packet with Location from the DPI arrives sooner than the real first packet from the site, and the site's packet is then dropped by the OS kernel as a retransmission. So if you use Linux, one iptables line is enough to bypass the blocking:
iptables -A INPUT -p tcp --sport 80 -m string --algo bm --string "http://95.167.13.50/?st" -j DROP
From me: Indeed, there is a retransmission. I did look at the traffic, but obviously not carefully enough.
First comes a packet containing only HTTP 302 and Location, then a packet with the normal response from the site.
However, the system does not discard the second packet; it combines it with the first in a peculiar way.
That is, the following packets arrive:
1:
HTTP/1.1 302 Found
Connection: close
Location: http://95.167.13.50/?st=0&dt=192.237.142.117&rs=grani.ru/

2:
HTTP/1.1 200 OK
Server: nginx/1.2.1
Date: Sun, 01 Feb 2015 17:34:03 GMT
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Connection: keep-alive

6d7
<!DOCTYPE HTML>
...
And the application sees it like this:
HTTP/1.1 302 Found
Connection: close
Location: http://95.167.13.50/?st=0&dt=192.237.142.117&rs=grani.ru/

f-8
Transfer-Encoding: chunked
Connection: keep-alive

6d7
<!DOCTYPE HTML>
...
This is observed on both Windows and Linux.
Nevertheless, the iptables rule above really does solve the problem.
So that is one more workaround.
This method can also be used on a gateway or router. In that case the rule must, of course, be added to the FORWARD chain.
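Why dropping the injected packet is enough can be shown with a small model of stream reassembly. The Python sketch below is a deliberate simplification and not how any particular kernel is implemented: it assumes that bytes already received are kept and that a later overlapping segment contributes only its new tail. The function name and the (offset, payload) representation are invented for the example.

```python
MARKER = b"http://95.167.13.50/?st"  # the string matched by the iptables rule


def reassemble(segments, drop_marker=None):
    """Merge (offset, payload) segments in arrival order.

    Already received bytes win; a later segment only adds the bytes
    beyond what is already buffered. Segments containing drop_marker
    are discarded first, which is what the string match rule does.
    """
    stream = bytearray()
    for offset, payload in segments:
        if drop_marker is not None and drop_marker in payload:
            continue  # packet dropped by the filter rule
        if offset > len(stream):
            continue  # gap in the stream: not handled in this sketch
        stream.extend(payload[len(stream) - offset:])
    return bytes(stream)


dpi_packet = (
    b"HTTP/1.1 302 Found\r\n"
    b"Connection: close\r\n"
    b"Location: http://95.167.13.50/?st=0&dt=192.237.142.117&rs=grani.ru/\r\n"
    b"\r\n"
)
site_packet = (
    b"HTTP/1.1 200 OK\r\n"
    b"Server: nginx/1.2.1\r\n"
    b"Date: Sun, 01 Feb 2015 17:34:03 GMT\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"Transfer-Encoding: chunked\r\n"
    b"Connection: keep-alive\r\n"
    b"\r\n"
)

# Both packets start at the same position in the stream; the DPI one arrives first.
arrivals = [(0, dpi_packet), (0, site_packet)]
print(reassemble(arrivals).decode("ascii"))                      # mixed view
print(reassemble(arrivals, drop_marker=MARKER).decode("ascii"))  # real response
```

Without the filter, the result starts with the redirect and continues with the tail of the real response, as in the capture. With the filter, the redirect never reaches the buffer and the application sees the site's response unchanged.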