
Some time ago, word got around that one of the online games had introduced a rather annoying thing: a captcha. Being pulled away from the game to type in a captcha can end badly in itself, especially if you do not get it right the first time and enemies are harassing you in the meantime. But that is not the point. It is especially bad for those who use local bots. The poor things stumble over the captcha, and the game instantly penalizes them for it with the loss of units and resources. An unpleasant thing, to put it mildly.
So, the task:
The captcha should never have to be entered by hand, whether you are playing yourself or a bot is playing for you while you sleep.
Additional condition: 40 hours of time (while the panic on board lasts).
Desirable condition: an installer for Windows.
Another desirable condition: the result should take up no more than a megabyte.
I will say right away that I am not a gamer; on the contrary, I am something of an opponent of online games, and I took this on in order to introduce some extra entropy into the industry. The project might also have brought some profit in the wake of the captcha's appearance and the panic around it, but for certain reasons it did not.
So what to do?
Attempt 1
Write a system tool that intercepts the HTTP requests and responses of every installed program, picks out the responses that demand a captcha, and enters the captcha itself. Two Belarusian programmers pored over this program for about half a year, switching platforms from C to C#, and then to Java, and resigning themselves to the fact that OpenSSL might have to be installed on the machine. Each time the task became overgrown with unnecessary details. In short, it did not work out.
Attempt 2: all by myself
It was clear enough that there are not many ways to do this, and that the choice is only between a SOCKS proxy and an HTTP proxy. After a while it turned out that far from all user applications support SOCKS proxies, and the choice became obvious.
Platform selection
The choice was not difficult, especially given attempt 1. C and C# were quickly ruled out, since I had no experience with them at all. The following feature-rich platforms were considered:
Java. It is hard to imagine users wanting to install a JVM, weighing I am afraid to say how many tens of megabytes, just to install such a small utility. Java dropped out.
Python. As everyone knows, it runs everywhere and comes with batteries included. It weighs 7 MB. Not much by modern standards, of course, but I still wanted something more compact. There is also the question of how to embed its installer in the installer of my utility. I do not know how Python applications handle this, perhaps it is much simpler, but I once built an installer inside an installer and do not want to do it again.
Ruby. When I started looking, there was no one-step installer for Windows. At all. Now there is one, but it involves installing MinGW, MSYS and other things, which might scare a user during installation. It weighs 7 MB.
The installer-in-an-installer question remains here too.
Lua. A very old language, popular among those who script C++ games. A sluggish community, scattered libraries. A custom VM build with the required libraries weighs only 800 KB. No installer is provided; there is a set of exe files, each of which takes the Lua script to run as a parameter. Everything needed also builds for Windows, Mac OS and Linux, in separate 32-bit and 64-bit versions of each. Just what is needed.
So I set about learning Lua (my New Year's wish came true: I learned a new programming language).
The language has wonderful properties, such as:
- sandboxing (Ruby only had a patch for version 1.8.5): it lets you run third-party code while restricting its environment;
- coroutines (like the fibers in Ruby 1.9): they give you very lightweight cooperative multitasking;
- a very simple data structure (more precisely, a single one: there is only the associative array), which, as it turned out, is enough for most data processing tasks;
... and much more, too much to fit into one post.
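To make the sandboxing point concrete, here is a minimal sketch of running a piece of third-party code with a restricted environment. The helper name run_sandboxed and the whitelist are assumptions made for illustration, not code from the proxy. It uses loadstring with setfenv on Lua 5.1 and the environment argument of load on 5.2 and later.

```lua
-- Sketch: run untrusted code so that it sees only the names in `env`.
-- run_sandboxed is a hypothetical helper, not part of the author's proxy.
local function run_sandboxed(code, env)
  local chunk, err
  if setfenv then
    -- Lua 5.1 / LuaJIT: compile first, then swap the chunk's environment.
    chunk, err = loadstring(code, 'sandbox')
    if chunk then setfenv(chunk, env) end
  else
    -- Lua 5.2+: the environment is passed to load directly ('t' = text only).
    chunk, err = load(code, 'sandbox', 't', env)
  end
  if not chunk then return false, err end
  return pcall(chunk)
end

-- The whitelist exposes a single function; os, io and the rest stay hidden.
local env = {string = {upper = string.upper}}
local ok, result = run_sandboxed('return string.upper("hi")', env)
local ok2, err2 = run_sandboxed('return os.getenv("HOME")', env)
print(ok, result)
print(ok2, err2)
```

The second call fails because os does not exist inside the sandbox. A whitelist like this limits which globals the code can name; it does not by itself limit CPU time or memory.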
The easiest way to build such a system was as an HTTP proxy server that filters requests and responses, so that is what I decided to do (well, why not).
The idea is simple: bring up a TCP server, listen to what the client asks for, parse the HTTP headers, find the Host, remove the “Proxy-Connection” header, send the request to the intended recipient, receive the response, pass it back to the client, and so on.
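The header-handling part of that idea can be sketched in plain Lua without any sockets. The function names below (parse_request, get_header, strip_proxy_headers) are hypothetical and chosen for this illustration; the real proxy may structure this quite differently. Rewriting the absolute URI into a path, which many origin servers expect, is left out.

```lua
-- Sketch only: these helpers are illustrative, not the author's functions.

-- Split a raw HTTP request head into the request line and a header list.
local function parse_request(raw)
  local head = raw:match('^(.-)\r\n\r\n') or raw
  local lines = {}
  for line in (head .. '\r\n'):gmatch('(.-)\r\n') do
    lines[#lines + 1] = line
  end
  local method, uri, version = lines[1]:match('^(%S+)%s+(%S+)%s+(%S+)$')
  local headers = {}
  for i = 2, #lines do
    local name, value = lines[i]:match('^([^:]+):%s*(.*)$')
    if name then
      headers[#headers + 1] = {name = name, value = value}
    end
  end
  return {method = method, uri = uri, version = version, headers = headers}
end

-- Case-insensitive header lookup (header names are case-insensitive in HTTP).
local function get_header(request, wanted)
  for _, h in ipairs(request.headers) do
    if h.name:lower() == wanted:lower() then return h.value end
  end
  return nil
end

-- Rebuild the request for the origin server without Proxy-Connection.
local function strip_proxy_headers(request)
  local out = {request.method .. ' ' .. request.uri .. ' ' .. request.version}
  for _, h in ipairs(request.headers) do
    if h.name:lower() ~= 'proxy-connection' then
      out[#out + 1] = h.name .. ': ' .. h.value
    end
  end
  return table.concat(out, '\r\n') .. '\r\n\r\n'
end

local raw = 'GET http://example.com/index.html HTTP/1.1\r\n' ..
            'Host: example.com\r\n' ..
            'Proxy-Connection: keep-alive\r\n\r\n'
local request = parse_request(raw)
local host = get_header(request, 'host')
local forwarded = strip_proxy_headers(request)
print(host)
```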
The server's response needs to be filtered, which is possible only if the server does not use HTTPS, and fortunately it does not. This turned out to be quite simple: it was enough to write a 190-line Lua analog of mechanize for Ruby, which can do whatever it pleases with the headers and body of a request, so you can write any filters you like for HTTP requests.
In this case, we had to get rid of the pesky reCAPTCHA, for which we only needed to determine:
- whether the original request was for a Travian page (and whether an HTML page was requested):
    string.find(request.uri(), 'travian') and mimetype and string.find(mimetype, 'text/html')
- whether the resulting page contained a captcha instead of “useful” game data:
    local captcha, captcha_key = string.match(response.body(), '<iframe src="(http://api%.recaptcha%.net/noscript%?(k=[%a%d_]+&lang=en))')
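To show what the two checks yield, here is a small sketch that runs them against an invented page. The URI, the MIME type string and the key in the sample HTML are made up for illustration. The pattern escapes the dots and the question mark with `%` so that they match literally.

```lua
-- Sample data is invented for illustration; the key is not a real one.
local uri = 'http://travian.example/dorf1.php'
local mimetype = 'text/html; charset=UTF-8'
local body = '<html><body>' ..
  '<iframe src="http://api.recaptcha.net/noscript?k=6Lc_demoKEY123&lang=en" ' ..
  'height="300"></iframe></body></html>'

-- Check 1: an HTML page of the game was requested (plain substring search).
local is_game_page = string.find(uri, 'travian', 1, true) ~= nil
  and string.find(mimetype, 'text/html', 1, true) ~= nil

-- Check 2: nested captures. The outer one is the whole iframe URL,
-- the inner one is the query string that carries the site key.
local captcha, captcha_key = string.match(body,
  '<iframe src="(http://api%.recaptcha%.net/noscript%?(k=[%a%d_]+&lang=en))')

print(is_game_page, captcha, captcha_key)
```

On a page without the iframe, string.match returns nil, so both variables are nil and the response can be passed through untouched.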
How the captcha itself gets solved (with the help of not very modern or high-performance, but extremely cheap, living people in India) is somewhat outside our purely technical subject, so that is a story for another place.
In the end, the captcha image was downloaded and sent to the solvers (twice, just in case), the answers, which arrived within 5-10 seconds, were compared, and if they matched, the result was sent in an HTTP POST request, in response to which the victim served a page happily announcing that we are, it turns out, human, along with a pile of “useful” game data. We showed this page to the unsuspecting client, which could notice nothing but a pause. In the pessimistic case of a mismatch, the picture was sent to the solvers again, and so on until at least two identical solutions were obtained; the rest were reported to the support service for a refund (no freebies for the solvers, practically).
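The "ask until two answers agree" logic can be sketched as follows. The function solve_with_consensus and its solve callback are hypothetical: solve stands in for whatever call submits the image to the solving service, and here it is replaced by a stub. Unlike the description above, this sketch asks sequentially rather than sending two requests at once.

```lua
-- Hypothetical sketch: `solve` is any function that takes the image and
-- returns the text typed by a human solver (or nil on failure).
local function solve_with_consensus(image, solve, max_attempts)
  local seen = {}     -- answer -> how many times it has been returned
  local answers = {}  -- every answer in order, e.g. to dispute the wrong ones
  for attempt = 1, max_attempts do
    local answer = solve(image)
    if answer then
      answers[#answers + 1] = answer
      seen[answer] = (seen[answer] or 0) + 1
      if seen[answer] >= 2 then
        return answer, answers
      end
    end
  end
  return nil, answers  -- no agreement within the attempt budget
end

-- Stub solver that disagrees once before agreeing.
local replies = {'abc', 'abd', 'abc'}
local calls = 0
local function fake_solve(image)
  calls = calls + 1
  return replies[calls]
end

local answer, all_answers = solve_with_consensus('captcha.jpg', fake_solve, 5)
print(answer, #all_answers)
```

The returned list of all answers is what would let the mismatching ones be reported for a refund.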
So, there it is, the solution.
However, something happened that no one could have foreseen. For some reason users also needed access to other sites, unthinkable as that is! Some of those sites were even accessed over HTTPS, and something had to be done about that without constantly switching the proxy on and off.
Several requests also had to be handled simultaneously. Unpleasantly, Google Analytics sometimes makes a request that lasts a minute, leaving a single-threaded proxy waiting.
For this there were as many as three different libraries for building an asynchronous TCP server. That is, we wait for an incoming connection, receive a piece of data, hand control to the dispatcher, check whether there are more incoming connections or open connections with data available (select / kqueue / epoll), and hand control to each in turn.
Alas, since all such incoming connections originate on the local machine, they are handled almost instantly. It is the outgoing connections that are slow. Hacking the existing libraries (copas, asok), which are designed to multiplex incoming connections, was harder than writing my own. So I wrote mine, a short one (272 lines). Besides making all incoming and outgoing connections asynchronous, it lets you add any other coroutines to the pool that runs in the common loop.
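The idea of a pool of coroutines driven by one common loop can be sketched without any networking. Everything below (spawn, run, worker) is a hypothetical minimal scheduler written for this illustration, not the 272-line library. A real task would yield while waiting on a socket instead of yielding unconditionally.

```lua
-- Hypothetical round-robin scheduler; names are illustrative only.
local pool = {}

local function spawn(fn)
  pool[#pool + 1] = coroutine.create(fn)
end

-- Resume every live coroutine in turn until all of them have finished.
local function run()
  while #pool > 0 do
    local alive = {}
    for _, co in ipairs(pool) do
      local ok, err = coroutine.resume(co)
      if not ok then print('task failed: ' .. tostring(err)) end
      if coroutine.status(co) ~= 'dead' then
        alive[#alive + 1] = co
      end
    end
    pool = alive
  end
end

local log = {}
local function worker(name, steps)
  return function()
    for step = 1, steps do
      log[#log + 1] = name .. step
      coroutine.yield()  -- give the other tasks a turn
    end
  end
end

spawn(worker('a', 2))
spawn(worker('b', 1))
run()
print(table.concat(log, ','))
```

The log shows the two tasks interleaving, which is the whole point: a slow outgoing connection only parks its own coroutine, not the proxy.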
Well, everything began to work in parallel, and in terms of speed it lags only slightly behind working without a proxy.
How great was my surprise when I received from the server a page with (among others) these headers:
    Content-Encoding: gzip
    Transfer-Encoding: chunked
and complete gibberish as the response body.
My first thought was to remove Accept-Encoding from the request so that the server would not try to compress the data, and to downgrade HTTP 1.1 to HTTP 1.0 so that it would not send “chunks”. But I thought about the drop in speed and the increase in traffic, and took pity on the users.
It turned out like this:
    if headers(pipe, target)['Transfer-Encoding'] == 'chunked' then
      target.body = dechunk(target.body)
    end

    -- Reassemble a chunked body. readline and readbytes return the remaining
    -- buffer first and the piece that was read second.
    function dechunk(chunkie)
      local chunk_size
      local chunk
      local chunks = {}
      chunkie, chunk_size = readline(chunkie)
      -- a size line that is not valid hex is treated as the end of the body
      while chunk_size and (tonumber(chunk_size, 16) or 0) > 0 do
        chunkie, chunk = readbytes(chunkie, tonumber(chunk_size, 16))
        table.insert(chunks, chunk)
        chunkie, chunk_size = readline(chunkie)
        if not chunk_size or chunk_size == '' then -- sometimes there's a crlf, sometimes not
          chunkie, chunk_size = readline(chunkie)
        end
      end
      return table.concat(chunks)
    end
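Since the snippet above relies on readline and readbytes helpers that are not shown, here is a self-contained variant for experimenting with the chunked format. It is a sketch under my own assumptions, not the author's code: it walks the body by position instead of re-slicing a buffer, it ignores chunk extensions after the size, and it does not handle trailers.

```lua
-- Self-contained sketch of chunked decoding (dechunk_by_position is a
-- hypothetical name; the body is assumed to be fully buffered in a string).
local function dechunk_by_position(body)
  local chunks, pos = {}, 1
  while true do
    local line_end = string.find(body, '\r\n', pos, true)
    if not line_end then break end  -- truncated input: stop quietly
    -- The size line is hex, optionally followed by ";extension".
    local hex = string.match(string.sub(body, pos, line_end - 1), '^%s*(%x+)')
    local size = hex and tonumber(hex, 16) or 0
    if size == 0 then break end     -- last chunk, or a malformed size line
    local data_start = line_end + 2
    chunks[#chunks + 1] = string.sub(body, data_start, data_start + size - 1)
    pos = data_start + size + 2     -- skip the data and its trailing CRLF
  end
  return table.concat(chunks)
end

local sample = '4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n'
local with_extension = 'a;note=demo\r\n0123456789\r\n0\r\n\r\n'
print(dechunk_by_position(sample))
print(dechunk_by_position(with_extension))
```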
I went off to read the documentation. Thankfully, these things are decently documented.
We glue the “chunks” together and get a gzip file (sometimes deflate, but I have not come across that yet). Then we unpack it (thanks to David Manura for the library).
Unpacking turned out even simpler:
    if headers(pipe, target)['Content-Encoding'] == 'gzip' and #target.body > 0 then
      local decoded = {}
      gzip.gunzip {input=target.body, output=function(byte) table.insert(decoded, string.char(byte)) end}
      target.body = table.concat(decoded)
    end
Only a little was left:
- tunnel HTTPS for HTTPS sites (thankfully, OpenSSL does not need to be bundled; the data is simply passed back and forth transparently):
    if request.method() == 'CONNECT' then
      local sent_to_server, err = client.send("HTTP/1.0 200 Connection established\r\nProxy-agent: BotHQ-Agent/1.2\r\n\r\n")
      print('https transparent connection')
      https(client, server)
      return
    end

    -- Pipe bytes in both directions; when either side closes, close both.
    local function https(client, server)
      local close_callback = function()
        client.close()
        server.close()
      end
      client.receive_subscribe(function(data)
        server.send(data)
      end, close_callback)
      server.receive_subscribe(function(data)
        client.send(data)
      end, close_callback)
    end
- put it all into an installer:
In general, the adventures of running 7zip sfx on Heroku deserve a separate post. The joy of victory outshines any difficult moments of development.
Well, I do not know about you, but I found doing this interesting and exciting. I do not regret the time spent.
The end result:
- The proxy server itself, in 71 lines, here.
- An asynchronous TCP server/client library in 272 lines, here.
- A sort of HTTP client in 190 lines.
- A filter for solving captchas in 150 lines, here.
- An installation file smaller than a megabyte.
I am sure there are many applications for such a thing, ranging from the not-so-good, such as spam bots with “automatic” captcha solving, to useful ones where user traffic has to be filtered flexibly with a script. Here is a simple script that prevents users connected through the proxy from connecting to vk.com:
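    module(..., package.seeall)

    -- Blank out the response body of every matching request.
    function filter(request, response)
      response.set_body('')
    end

    -- Decide whether the filter applies; plain search, so '.' is literal.
    function pre(request, response)
      return string.find(request.uri(), 'vk.com', 1, true)
    end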
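The calling side is not shown in the post, so the following is only a guess at how a proxy might apply such filter modules: pre is treated as a predicate and filter as the action. The request and response objects here are stubs that mimic the dot-call style used above (request.uri(), response.set_body()), and apply_filters is a hypothetical name.

```lua
-- Stub objects in the closure style of the article's API (assumed shape).
local function make_request(uri)
  return {uri = function() return uri end}
end

local function make_response(body)
  local self = {}
  function self.body() return body end
  function self.set_body(new_body) body = new_body end
  return self
end

-- The vk.com filter from above, written as a plain table instead of a module.
local block_vk = {
  pre = function(request, response)
    return string.find(request.uri(), 'vk.com', 1, true)
  end,
  filter = function(request, response)
    response.set_body('')
  end,
}

-- Hypothetical dispatcher: run each filter whose pre() accepts the request.
local function apply_filters(filters, request, response)
  for _, f in ipairs(filters) do
    if f.pre(request, response) then
      f.filter(request, response)
    end
  end
  return response
end

local blocked = apply_filters({block_vk},
  make_request('http://vk.com/feed'), make_response('<html>feed</html>'))
local passed = apply_filters({block_vk},
  make_request('http://example.com/'), make_response('<html>ok</html>'))
print(#blocked.body(), passed.body())
```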
With the upcoming release of Lua 5.2, the restriction on yielding from coroutines inside metamethods will be lifted, and the libraries can be made prettier; for example, methods like http.set_body will go away, and much more.