In this article I will describe my small open-source project, Centrifuge. It is a Python server whose job is to broadcast messages in real time to connected clients (mostly browsers).
It will be a story filled with both personal emotions and a description of the technologies used, illustrated only by a few rough sketches rather than real code. If the topic is close to you, don't pass by - it should be interesting.
To start, please watch the screencast (don't forget to enable subtitles); if your interest survives the viewing, feel free to read on!
The idea of a real-time message server is not new at all; among existing similar projects I can cite Pusher and Pubnub as examples. Here is a quote from the Pusher website:
Pusher is a hosted API for web and mobile apps.
Pubnub's main page tells us something similar:
Thousands of mobile, web, and desktop apps.
Centrifuge's goals are not so global. It is not a ready-made, globally distributed infrastructure for building real-time applications; it is just a server that you install on your own machine and use as a message broker.
I have no doubt that there are quite a few servers like this. Some time ago I happened to write a very similar thing myself - cyclone-sse. It is a Twisted daemon that broadcasts messages to channels in real time using Server-Sent Events (SSE), with a fallback to Long-Polling for older browsers such as IE 7. It turned out to be quite a decent piece of code that we successfully use in production.
However, that daemon does not solve some important problems:
1) No authorization at all. In our case, all projects are closed off from the outside by the company firewall, and we use cyclone-sse only for public data. But to add real-time events to a project accessible to all Internet users, an authorization mechanism is needed.
2) It does not use WebSockets, a protocol that allows two-way data exchange (while SSE is unidirectional: messages flow only from server to client). In addition, WebSockets support cross-domain communication, which is not always true for Server-Sent Events.
One day a colleague at work complained that he had no way to monitor package updates for Java. I thought it was a good idea for a small project - track new packages, perhaps not only for Java but for other programming languages too, and show updates in a web interface in real time - and I got started.
The more I wrote, the further the code drifted from the original idea. From a package update aggregator, the project turned into an aggregator of everything.
That sounds ambitious, but after a while I realized that implementing such a task required more than I had... So I decided to simplify everything: by that time the skeleton for delivering messages to connected clients was already written, so why not develop that part? I realized I could write something similar in purpose and scope to cyclone-sse, but better suited to wider use.
So, you are a Python programmer and you need to write such a server - what would you use? The choice is between Twisted, Gevent, and Tornado. And while Guido van Rossum is busy standardizing the event-loop interface in the Tulip library, we have to choose now.
I chose Tornado. It runs on Python 3, and it is simply great. In many ways this choice, and the desire for the final code to work on Python 3.3, predetermined the choice of the other technologies involved - ZeroMQ (pyzmq), SockJS (sockjs-tornado), MongoDB (motor), and PostgreSQL (momoko). All of these libraries are asynchronous, avoiding blocking when interacting with sockets.
I didn’t come to ZeroMQ right away.
When several application processes sit behind a balancer and clients can, in theory, connect to any of them, you need some way to maintain the integrity of the system's internal state and to communicate between application instances. Initially I used the Redis Pub/Sub mechanism for this. But I stumbled upon a bug in the Tornado-Redis library and started looking at other solutions.
As a result, the choice fell on ZeroMQ - sockets on steroids, a set of patterns for organizing a wide variety of network interactions. The absence of a separate broker is just great. If you have not heard of this library, or have heard of it but never looked into the details, fix that now! Read The Guide, it is worth it. This is my first project using the library, and I hope experienced community members will look at the code and point out possible flaws.
Each Centrifuge process creates a PUB socket bound to a specific address and port. The process also has a SUB socket, which connects to the PUB socket of the current process and to the PUB sockets of the other instances (if any are running). The disadvantage of this scheme is that all the PUB socket addresses must be listed manually when starting a process. Therefore, it is also possible to run an XPUB/XSUB proxy in a separate process and point all Centrifuge processes at that proxy, so that all the interaction is organized through it.
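To make the wiring concrete, here is a minimal pyzmq sketch of this scheme. The addresses are placeholders and the code is an illustration, not Centrifuge's actual implementation:

```python
# Minimal sketch of the PUB/SUB wiring (not Centrifuge's actual code):
# each process binds one PUB socket and subscribes to every known PUB
# address, including its own.
import zmq

ALL_PUB_ADDRESSES = ["tcp://127.0.0.1:5555", "tcp://127.0.0.1:5556"]
MY_PUB_ADDRESS = "tcp://127.0.0.1:5555"  # this instance's own address

context = zmq.Context()

pub = context.socket(zmq.PUB)
pub.bind(MY_PUB_ADDRESS)

sub = context.socket(zmq.SUB)
for address in ALL_PUB_ADDRESSES:
    sub.connect(address)
sub.setsockopt_string(zmq.SUBSCRIBE, "")  # receive all messages

# Anything published here reaches the SUB socket of every instance:
pub.send_string("channel-name message-payload")

# In the XPUB/XSUB variant, a separate proxy process would instead call
# zmq.proxy(xsub_socket, xpub_socket), and every instance would connect
# both of its sockets to the proxy's two fixed addresses.
```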
The last piece of the puzzle is the client. Tornado works with WebSockets out of the box, but I decided to go a little further and also let clients use SockJS. This way everything will work even in browsers without WebSocket support. I would like to separately thank Serge S. Koval (mrjoes) for sockjs-tornado. Support for socket.io is not planned.
So, what is the result? Something like this:
As can be seen in the diagram, MongoDB or PostgreSQL is used as the database. What do we need to store? Projects and their settings, and the categories within those projects. More on this a little below.
It seems to me I still have not managed to clearly explain what I have built. So, let me put it this way:
1) You want to add something real-time to your site - comments, charts, live counters, notifications...
2) However, your site does not run on an asynchronous backend - or it does, but you do not want to write the logic of managing channels, subscriptions, and so on from scratch.
Centrifuge may well suit you in this case.
3) pip install centrifuge. Or, in a little more detail, see the documentation (the documentation is still rough and unclear in places, but I hope to improve it over time).
4) Now you still have to integrate it... Let me emphasize some important points.
First of all, you need to run Centrifuge. Yes, there are all sorts of launch options, but I believe you will manage, and if not, write to me and I will help.
After starting it, go to the administrative web interface and create a new project. A project has several settings; I will not describe them in this article - there is already too much text as it is.
After creating the project, add categories to it. These are essentially namespaces within the project that contain channels. Since channels are created on the fly, categories act as a store of settings for channels, and they also restrict who may subscribe to a particular channel within a category.
Perhaps the most important option of a category is bidirectional. If you tick it, connected clients will be able to send messages to a channel themselves, without involving your application. Otherwise, messages flow only from server to client: your application sends an event to Centrifuge with a POST request containing the project, category, channel, and the message itself. The message is then delivered to all subscribers of the channel.
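As an illustration, publishing from a backend might look roughly like this. The endpoint URL and the payload layout below are assumptions made for the sketch, not Centrifuge's documented API:

```python
# Hedged sketch of server-side publishing to Centrifuge; the URL and the
# payload format are illustrative assumptions, so check the documentation
# for the real API before using anything like this.
import json
import requests

event = {
    "project": "myproject",
    "category": "notifications",
    "channel": "news",
    "data": {"text": "hello, world"},
}
response = requests.post(
    "http://localhost:8000/api",        # hypothetical endpoint
    data={"data": json.dumps(event)},   # hypothetical payload format
)
print(response.status_code)
```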
So, Centrifuge (or several Centrifuges) is spinning its event loop, projects and categories are created, and all that is left is the client side. As I said before, you can use native WebSockets or the SockJS library for communication. There are currently no JavaScript libraries that simplify interaction with Centrifuge; they may appear in the future. For now, to interact with it you need to send JSON messages matching its JSON schemas. Currently there are only four such commands (a rough sketch of the exchange follows the list):
auth - the first message after the connection is established; performs authorization.
subscribe - after successful authorization, subscribe to channels in the categories granted during authorization.
unsubscribe - unsubscribe from a channel.
broadcast - send a message to a channel; works only for channels in a bidirectional category.
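Here is that rough sketch of the client-side exchange. The JSON field names are my assumptions based on the four methods above, not the actual schema, and the connection URL is a placeholder; the sketch uses the third-party websocket-client package for brevity:

```python
# Illustrative client-side message flow; field names and the URL are
# assumptions, not the documented Centrifuge schema.
import json
import websocket  # pip install websocket-client

ws = websocket.create_connection("ws://localhost:8000/connection/websocket")

# 1. Authorize first (token generation is covered below).
ws.send(json.dumps({"method": "auth", "params": {
    "token": "<generated token>", "user": "42", "project": "myproject",
}}))

# 2. Subscribe to a channel within a category.
ws.send(json.dumps({"method": "subscribe", "params": {
    "category": "notifications", "channel": "news",
}}))

# 3. Broadcast works only in bidirectional categories.
ws.send(json.dumps({"method": "broadcast", "params": {
    "category": "chat", "channel": "room1", "data": {"text": "hi"},
}}))
```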
To keep just anyone from connecting to Centrifuge channels, symmetric signing based on a shared secret key is used. The key can be seen in the web interface among the project settings. Your web application must be able to generate a special token (like this) based on this project secret key.
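A sketch of what such token generation could look like, assuming an HMAC over the client's identifying data; the exact fields and digest algorithm Centrifuge expects may differ, so treat this as an illustration and consult the linked example:

```python
# Illustrative token generation: an HMAC keyed with the project secret.
# The specific fields and digest are assumptions; see the linked example
# for the authoritative version.
import hmac
from hashlib import sha256

def generate_token(secret_key: str, project_id: str, user_id: str) -> str:
    signer = hmac.new(secret_key.encode(), digestmod=sha256)
    signer.update(project_id.encode())
    signer.update(user_id.encode())
    return signer.hexdigest()

token = generate_token("project-secret-from-web-interface", "myproject", "42")
```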
When a new client connects, and if you have specified a special address in the project settings, Centrifuge will send a POST request to that address with the connecting client's data. Your application should decide whether this client may access the requested sections and, if access is allowed, return the response Centrifuge expects.
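For illustration, such an authorization endpoint in your application might look like the following Flask sketch; the request fields and the response format are assumptions, since the real contract is defined by Centrifuge itself:

```python
# Hypothetical authorization endpoint called by Centrifuge on new client
# connections; field names and response shape are illustrative assumptions.
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/centrifuge/auth", methods=["POST"])
def centrifuge_auth():
    user = request.form.get("user")
    channel = request.form.get("channel")
    # Your own access logic goes here; this stand-in only checks presence.
    allowed = user is not None and channel is not None
    if allowed:
        return jsonify({"status": "ok"})
    return jsonify({"status": "forbidden"}), 403
```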
I have left out a lot of technical details and deliberately avoided pasting in real code - the project is very young, and who knows what will change in the near future, not least after the comments on this article. Interacting with Centrifuge through the special Cent client, a detailed description of client authorization, the command parameters for interaction from the browser, and the full set of project and category options all stayed behind the scenes. I think covering them here would be too tiring. If there is interest in the project, I will write about them in the future.
The repository on Github contains an example application that uses Centrifuge, and the documentation includes an example Nginx configuration for deployment. The license is BSD. If for some reason you want to use Centrifuge but cannot because of the license, write to me and I will reconsider.
I look forward to your comments and suggestions for improvement.