Not long ago an article about ICQ in Python was published, and it prompted me to develop the theme, albeit in a slightly different direction. A few years ago I had trouble with my home Internet connection: I could reach only the local network, and my only links to the outside world were ICQ and a local Jabber server. Out of this came the idea of tunneling HTTP traffic over XMPP.
Scheme
The scheme is based on three main components:
- bot server: accepts messages containing HTTP requests, executes them, encodes the result and sends it to the client
- bot client: sends the server the HTTP requests to be executed, waits for the result, then processes it and returns it in a form ready for further use
- http-proxy: a proxy server that serves HTTP requests by means of the bot client
The components are placed as follows: the bot server runs on a remote machine with Internet access, while the bot client and the proxy run on localhost. Client applications are configured to use our proxy, for example:
$ http_proxy="localhost:3128" wget ...
For the bot client to communicate with the bot server, a simple XML-based protocol is used.
A request to download the index page of example.com:
<url>http://example.com</url>
Answer:
<answer chunk="2" count="19"><data>encoded_data</data></answer>
The answer arrives in several parts (chunks). Here chunk is the number of this chunk, count is the total number of chunks the response was split into, and encoded_data is the base64-encoded chunk of the response.
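The two message formats above can be produced and read with xml.dom.minidom alone. The following is a minimal Python 3 sketch, not the code from the archive: the helper names build_request, build_answer and parse_answer are chosen here for illustration (the article only shows that the client calls fetcher.parse_answer).

```python
import xml.dom.minidom


def build_request(url):
    """Wrap a URL in the <url> request element."""
    doc = xml.dom.minidom.Document()
    node = doc.createElement("url")
    node.appendChild(doc.createTextNode(url))
    return node.toxml()


def build_answer(chunk, count, payload):
    """Wrap one base64-encoded chunk (a str) in an <answer> element."""
    doc = xml.dom.minidom.Document()
    answer = doc.createElement("answer")
    answer.setAttribute("chunk", str(chunk))
    answer.setAttribute("count", str(count))
    data = doc.createElement("data")
    data.appendChild(doc.createTextNode(payload))
    answer.appendChild(data)
    return answer.toxml()


def parse_answer(message):
    """Return (chunk, count, encoded_data) extracted from an <answer> message."""
    doc = xml.dom.minidom.parseString(message)
    answer = doc.documentElement
    data = answer.getElementsByTagName("data")[0]
    # Concatenate all text nodes inside <data>.
    text = "".join(n.data for n in data.childNodes if n.nodeType == n.TEXT_NODE)
    return int(answer.getAttribute("chunk")), int(answer.getAttribute("count")), text
```

Calling parse_answer on the sample answer above yields the chunk number, the total count and the still-encoded payload as a tuple.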
For greater clarity, I present the scheme graphically:
local
+---------------------------------------------------------+
| http-client (browser, wget) -> http-proxy -> bot-client |
+---------------------------------------------------------+
                            /\
                            ||
                            \/
remote
+---------------------------------------------------------+
|                       bot-server                        |
+---------------------------------------------------------+
Implementation
General information
xmpppy is used to work with XMPP. Nothing fancy is required: we only need to handle incoming messages and send replies. XML is parsed and generated with xml.dom.minidom from the standard library.
Bot server
The server's job is to receive download requests and hand them to a helper library, which works out what to download and returns the result; the server then forwards that result to the client.
In a simplified scheme, server-side message handling looks like this:
import xmpp
from Fetcher import Fetcher

fetcher = None

def message_callback(con, msg):
    # Called by xmpppy for every incoming message.
    global fetcher
    if msg.getBody():
        try:
            ret = fetcher.process_command(msg.getBody())
        except Exception:
            ret = ["failed to process command"]
        # Send each chunk back to the sender as a separate chat message.
        for i in ret:
            reply = xmpp.Message(msg.getFrom(), i)
            reply.setType('chat')
            con.send(reply)

if __name__ == "__main__":
    jid = xmpp.JID("my@server.jid")
    user = jid.getNode()
    server = jid.getDomain()
    password = "secret"

    conn = xmpp.Client(server, debug=[])
    conres = conn.connect()
    authres = conn.auth(user, password, resource="foo")
    conn.RegisterHandler('message', message_callback)
    conn.sendInitPresence()

    fetcher = Fetcher()
    # Process incoming stanzas forever, with a 1-second timeout per pass.
    while True:
        conn.Process(1)
I deliberately removed error handling and hard-coded the values to keep the code compact and easy to read. So what is going on here? We connect to the Jabber server and register the message handler:
conn.RegisterHandler('message', message_callback)
Thus, for each incoming message our message_callback(con, msg) function is called with the connection handle and the message itself as arguments. The function calls the command handler of the Fetcher class, which does all the dirty work and returns a list of chunks to be sent to the client. That is all the server does.
Fetcher
The Fetcher class implements the actual logic of executing HTTP requests and encoding the results. I will not reproduce its code in full (it can be found in the archive attached to the article) and will describe only the main points:
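import base64
import urllib2
import xml.dom.minidom

# Method of the Fetcher class.
def process_command(self, command):
    # Extract the requested URL from the XML command.
    doc = xml.dom.minidom.parseString(command)
    url = self._gettext(doc.getElementsByTagName("url")[0].childNodes)
    try:
        f = urllib2.urlopen(url)
    except Exception as err:
        return ["%s" % str(err)]
    # Base64-encode the whole response body, then cut it into chunks.
    lines = base64.b64encode(f.read())
    ret = []
    chunk_size = 1024
    x = 0
    n = 1
    # Ceiling division: number of chunks needed for the encoded data.
    chunk_count = (len(lines) + chunk_size - 1) // chunk_size
    while x < len(lines):
        ret.append(self._prepare_chunk(n, chunk_count, lines[x:x + chunk_size]))
        x += chunk_size
        n += 1
    return ret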
The process_command function, as you probably remember, is called by our bot server. It parses the XML request, determines which URL to fetch, and fetches it with urllib2. The downloaded data is base64-encoded to avoid surprises with special characters and split into fixed-size parts (the last one may be shorter) so as not to hit the message length limit. Each chunk is then wrapped in XML and handed back to the bot server, which sends it out.
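The encode-then-split step can be tried in isolation, without any network access. Below is a minimal Python 3 sketch under stated assumptions: split_into_chunks is a name invented for this illustration, and it returns plain (number, count, data) tuples instead of the XML that _prepare_chunk builds in the real class.

```python
import base64


def split_into_chunks(body, chunk_size=1024):
    """Base64-encode body (bytes) and cut the result into numbered pieces.

    Returns a list of (chunk_number, chunk_count, data) tuples, numbered from 1.
    """
    encoded = base64.b64encode(body).decode("ascii")
    # Ceiling division, as in process_command.
    count = (len(encoded) + chunk_size - 1) // chunk_size
    return [
        (n + 1, count, encoded[n * chunk_size:(n + 1) * chunk_size])
        for n in range(count)
    ]


body = b"x" * 3000
chunks = split_into_chunks(body)
# Joining the pieces in order and decoding restores the original bytes.
restored = base64.b64decode("".join(data for _, _, data in chunks))
```

Because base64 expands every 3 bytes into 4 characters, a 3000-byte body becomes 4000 characters and therefore needs four messages at a chunk size of 1024.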
Client
The client is, in essence, a single callback that joins the received data and decodes it from base64:
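import base64

def message_callback(con, msg):
    global fetcher, output, result
    if msg.getBody():
        message = msg.getBody()
        # parse_answer returns the chunk number, the total count and the payload.
        chunks, count, data = fetcher.parse_answer(message)
        output.append(data)
        # The last chunk has arrived: join everything and decode.
        if chunks == count:
            result = base64.b64decode(''.join(output))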
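The callback above appends chunks in arrival order and treats the chunk whose number equals count as the last one. If messages could arrive out of order, one possible variation is to key the chunks by number and decode only when all of them are present. The sketch below (Python 3) is such a variation, not the author's code; the ChunkAssembler class and its add method are hypothetical names.

```python
import base64


class ChunkAssembler:
    """Collects numbered chunks and decodes them once all have arrived."""

    def __init__(self):
        self.parts = {}

    def add(self, chunk, count, data):
        """Store one chunk; return the decoded bytes when complete, else None."""
        self.parts[chunk] = data
        if len(self.parts) < count:
            return None
        # All chunks are present: join them by chunk number, not arrival order.
        encoded = "".join(self.parts[n] for n in range(1, count + 1))
        self.parts = {}  # reset state for the next response
        return base64.b64decode(encoded)
```

A callback would call add(chunks, count, data) for every parsed answer and use the return value as soon as it is not None.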
Proxy
To make the tunnel transparent to applications, an HTTP proxy is implemented. The proxy server listens on port 3128/tcp and waits for requests. Incoming requests are passed to the bot server for processing; the result is decoded and returned to the client. From the client applications' point of view, our proxy is no different from an "ordinary" one.
The TCP server is built on SocketServer.StreamRequestHandler from the standard library.
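import SocketServer

class RequestHandler(SocketServer.StreamRequestHandler):
    def handle(self):
        # Read the request (only the first 1024 bytes are considered).
        data = self.request.recv(1024)
        method, url, headers = parse_http_request(data)
        if url is not None:
            # Fetch the URL through the XMPP tunnel and relay the result.
            response = fetch_file(server_jid, client_jid, password, url)
            self.wfile.write(response)
        self.request.close()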
The parse_http_request() function parses an HTTP request, pulling out the URL, headers and HTTP version from it; fetch_file() requests the URL using the bot client.
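The article does not show parse_http_request itself, so the following Python 3 sketch is only a guess at what such a function might look like. It assumes the (method, url, headers) return shape used in the handler above and relies on the fact that a client talking to an HTTP proxy puts the absolute URL in the request line.

```python
def parse_http_request(data):
    """Parse raw request bytes into (method, url, headers).

    Returns (None, None, {}) if the request line is malformed.
    """
    text = data.decode("iso-8859-1")
    # Everything before the first blank line is the request line plus headers.
    head = text.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    parts = lines[0].split()
    if len(parts) != 3:
        return None, None, {}
    method, url, _version = parts
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:
            # Header names are case-insensitive, so store them lowercased.
            headers[name.strip().lower()] = value.strip()
    return method, url, headers
```

Returning None for the URL on a malformed request matches the handler's `if url is not None` check, so bad input is simply dropped.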
Conclusion
The full source code is available here as a shar archive (to unpack it, run the file as a shell script). Of course, this is more of a prototype than a full-fledged application, but the prototype works and at least downloads small files without problems. That should be enough for the main purpose of the article: to demonstrate a "non-interactive" use of an IM bot.
There is plenty to improve in the project, from adding authentication and proper support for the different request types to performance work. It would be very interesting to see what performance such an architecture can achieve; I may look into that soon.