A not-quite-ordinary XMPP bot in Python: tunneling

Not long ago an article about ICQ in Python was published, which prompted me to develop the theme, albeit in a slightly different direction. A few years ago I had trouble with my home Internet connection: I could reach only the local network, and the only links to the outside world were ICQ and a local Jabber server. That gave rise to the idea of tunneling HTTP traffic over XMPP.

Scheme

The scheme is based on three main components:

bot server: accepts messages containing HTTP requests, executes them, encodes the result and sends it to the client
bot client: sends the server the HTTP requests to execute, waits for the result, processes it and returns a response that is ready for further use
http-proxy: a proxy server that handles HTTP requests by means of the bot client

The components are placed as follows: the bot server runs on a remote machine with Internet access, while the bot client and the proxy run on localhost. Client applications are configured to use our proxy, for example:

$ http_proxy="localhost:3128" wget ...

The bot client and the bot server communicate over a simple XML-based protocol.

A request to download the index page of example.com:

 <url>http://example.com</url>

Answer:

 <answer chunk="2" count="19"><data>encoded_data</data></answer>

The answer consists of several parts (chunks). Here chunk is the number of this chunk, and count is the total number of chunks into which the response was split. encoded_data is a base64-encoded chunk of the response.
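
The two message formats above can be built and parsed with xml.dom.minidom alone. The following is a minimal sketch in Python 3 (the article's own code targets Python 2); the function names make_request, make_answer, parse_request and parse_answer are assumptions chosen for illustration and are not taken from the attached archive.

```python
import xml.dom.minidom


def _text(node):
    """Concatenate the text children of an element."""
    return "".join(
        child.data for child in node.childNodes
        if child.nodeType == child.TEXT_NODE
    )


def make_request(url):
    """Build the <url>...</url> request message (hypothetical helper)."""
    doc = xml.dom.minidom.Document()
    node = doc.createElement("url")
    node.appendChild(doc.createTextNode(url))
    return node.toxml()


def make_answer(chunk, count, data):
    """Build one <answer> message carrying a base64-encoded chunk."""
    doc = xml.dom.minidom.Document()
    answer = doc.createElement("answer")
    answer.setAttribute("chunk", str(chunk))
    answer.setAttribute("count", str(count))
    data_node = doc.createElement("data")
    data_node.appendChild(doc.createTextNode(data))
    answer.appendChild(data_node)
    return answer.toxml()


def parse_request(message):
    """Extract the URL from a request message."""
    doc = xml.dom.minidom.parseString(message)
    return _text(doc.getElementsByTagName("url")[0])


def parse_answer(message):
    """Return (chunk, count, data) from an answer message."""
    doc = xml.dom.minidom.parseString(message)
    answer = doc.getElementsByTagName("answer")[0]
    data = _text(answer.getElementsByTagName("data")[0])
    return int(answer.getAttribute("chunk")), int(answer.getAttribute("count")), data
```

Generating the XML through the DOM rather than by string formatting means that characters such as & in a URL are escaped automatically and restored on parsing.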

For clarity, here is the scheme as a diagram:

                              local
 +----------------------------------------------------------+
 | http-client (browser, wget) -> http-proxy -> bot-client  |
 +----------------------------------------------------------+
                               /\
                               ||
                               \/
                              remote
 +----------------------------------------------------------+
 | bot-server                                               |
 +----------------------------------------------------------+

Implementation

General information

The xmpppy library is used to work with XMPP. No advanced features are required: we only need to process incoming messages and send replies. XML is parsed and generated with xml.dom.minidom from the standard library.

Bot server

The server's job is to receive download requests and hand them to a library that works out what to download and returns the result; the server then forwards that result to the client.

In a simplified scheme, server-side message handling looks like this:

    import xmpp
    from Fetcher import Fetcher

    fetcher = None

    def message_callback(con, msg):
        # Called by xmpppy for every incoming message.
        global fetcher
        if msg.getBody():
            try:
                ret = fetcher.process_command(msg.getBody())
            except Exception:
                ret = ["failed to process command"]
            # Send every chunk back to the sender as a separate chat message.
            for i in ret:
                reply = xmpp.Message(msg.getFrom(), i)
                reply.setType('chat')
                con.send(reply)

    if __name__ == "__main__":
        jid = xmpp.JID("my@server.jid")
        user = jid.getNode()
        server = jid.getDomain()
        password = "secret"

        conn = xmpp.Client(server, debug=[])
        conres = conn.connect()
        authres = conn.auth(user, password, resource="foo")
        conn.RegisterHandler('message', message_callback)
        conn.sendInitPresence()

        fetcher = Fetcher()
        # Main loop: process incoming stanzas with a one-second timeout.
        while True:
            conn.Process(1)

To keep the code compact and readable, I deliberately left out error handling and hardcoded the values. So what is going on here? We connect to the Jabber server and register a message handler:

  conn.RegisterHandler('message', message_callback)

From now on, every incoming message triggers our message_callback(con, msg) function, whose arguments are the connection handle and the message itself. The function calls the command handler of the Fetcher class, which does all the dirty work and returns a list of chunks to send to the client. That is all the server does.

Fetcher

The Fetcher class implements the actual logic of executing HTTP requests and encoding the results. I will not quote its code in full (it can be found in the archive attached to the article) and will describe only the main points:

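    # Method of the Fetcher class; it relies on these module-level imports:
    # import base64, urllib2, xml.dom.minidom

    def process_command(self, command):
        # Extract the requested URL from the <url> element.
        doc = xml.dom.minidom.parseString(command)
        url = self._gettext(doc.getElementsByTagName("url")[0].childNodes)
        try:
            f = urllib2.urlopen(url)
        except Exception as err:
            return ["%s" % str(err)]
        # Encode the whole body, then cut it into chunk_size pieces.
        lines = base64.b64encode(f.read())
        ret = []
        chunk_size = 1024
        x = 0
        n = 1
        # Ceiling division: total number of chunks.
        chunk_count = (len(lines) + chunk_size - 1) // chunk_size
        while x < len(lines):
            ret.append(self._prepare_chunk(n, chunk_count, lines[x:x + chunk_size]))
            x += chunk_size
            n += 1
        return ret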

The process_command function, as you probably remember, is called by our bot server. It parses the XML request, determines which URL to fetch, and fetches it with urllib2. The downloaded data is encoded in base64 to avoid surprises with special characters and is split into chunks of at most 1024 characters so as not to run into the limit on message length. Each chunk is then wrapped in XML and returned to the server for sending.
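
The encode-then-split step can be tried in isolation. Below is a minimal Python 3 sketch of the same idea, assuming a hypothetical helper named split_into_chunks; it returns plain (chunk, count, data) tuples rather than the XML messages that the article's _prepare_chunk produces.

```python
import base64


def split_into_chunks(payload, chunk_size=1024):
    """Base64-encode payload (bytes) and cut it into numbered chunks.

    Returns a list of (chunk_number, chunk_count, data) tuples, where
    chunk_number starts at 1 and data is at most chunk_size characters.
    """
    encoded = base64.b64encode(payload).decode("ascii")
    # Ceiling division: how many chunks are needed in total.
    count = (len(encoded) + chunk_size - 1) // chunk_size
    return [
        (n + 1, count, encoded[n * chunk_size:(n + 1) * chunk_size])
        for n in range(count)
    ]
```

For example, a 2000-byte body becomes 2668 base64 characters and therefore three chunks, the last of which is shorter than the others.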

Client

The client is essentially a single callback that joins the received data and decodes it from base64:

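    import base64

    def message_callback(con, msg):
        global fetcher, output, result
        if msg.getBody():
            message = msg.getBody()
            # chunk is the number of this chunk, count is the total.
            chunk, count, data = fetcher.parse_answer(message)
            output.append(data)
            # The last chunk has arrived: join the parts (assumed to have
            # come in order) and decode them.
            if chunk == count:
                result = base64.b64decode(''.join(output))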
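
The callback above appends chunks in the order they arrive. If message order cannot be relied on, the chunk number can be used as a key instead. This is a hedged Python 3 sketch of that variation, not the article's implementation; the class name ChunkAssembler is an assumption.

```python
import base64


class ChunkAssembler:
    """Collect numbered base64 chunks and decode them once complete."""

    def __init__(self):
        self.parts = {}

    def add(self, chunk, count, data):
        """Store one chunk; return the decoded bytes when all have arrived,
        otherwise None."""
        self.parts[chunk] = data
        if len(self.parts) < count:
            return None
        # Join the chunks by number, regardless of arrival order.
        encoded = "".join(self.parts[n] for n in range(1, count + 1))
        self.parts = {}
        return base64.b64decode(encoded)
```

Each parsed (chunk, count, data) triple is passed to add(); the first call that returns something other than None yields the complete response body.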

Proxy

To make the tunnel usable transparently, an HTTP proxy is implemented. The proxy server listens on port 3128/tcp and waits for requests. Incoming requests are passed to the bot server for processing, and the result is decoded and sent back to the client. From the point of view of client applications, our proxy is no different from an “ordinary” one.

The TCP server is built on SocketServer.StreamRequestHandler from the standard library.

    import SocketServer

    class RequestHandler(SocketServer.StreamRequestHandler):
        def handle(self):
            # Only the first 1024 bytes of the request are read.
            data = self.request.recv(1024)
            method, url, headers = parse_http_request(data)
            if url is not None:
                # Fetch the URL through the XMPP tunnel and relay the answer.
                response = fetch_file(server_jid, client_jid, password, url)
                self.wfile.write(response)
            self.request.close()

The parse_http_request() function parses an HTTP request, pulling out the URL, headers and HTTP version; fetch_file() requests the URL through the bot client.
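
The article does not show parse_http_request() itself. A minimal Python 3 sketch of what such a function might look like follows, assuming it returns the (method, url, headers) tuple that the handler unpacks and signals a malformed request with a url of None. The details here are guesses for illustration, not the code from the archive.

```python
def parse_http_request(data):
    """Parse the head of a raw HTTP request (bytes).

    Returns (method, url, headers), or (None, None, {}) if the request
    line is malformed. Header names are lower-cased.
    """
    text = data.decode("iso-8859-1")
    # Everything before the first blank line is the request head.
    head = text.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    parts = lines[0].split()
    if len(parts) != 3:
        return None, None, {}
    method, url, _version = parts
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return method, url, headers
```

A client talking to a proxy puts the absolute URL in the request line (for example GET http://example.com/ HTTP/1.1), so the second field can be passed to the bot client as is.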

Conclusion

The full source code is available here as a shar archive (to unpack it, run the file as a shell script). Of course, this is more of a prototype than a full-fledged application, but the prototype works and at least downloads small files without problems. That should be enough for the main purpose of the article: to demonstrate a "non-interactive" use of an IM bot.

There is plenty to improve in the project, from adding authentication and proper support for different request types to performance work. It would be interesting to see what performance such an architecture can achieve, and I may look into that soon.

Source: https://habr.com/ru/post/111971/
