In the Django world, the Django Channels add-on is gaining popularity. The library is meant to bring to Django the asynchronous network programming we have all been waiting for. At Moscow Python Conf 2017, Artyom Malyshev explained how the first version of the library does this (its author has since written Channels 2), why it does it, and whether it succeeds at all.
First of all, the Zen of Python says there should be one, and preferably only one, obvious way to do it. So in Python there are, naturally, at least three. Asynchronous network frameworks already exist in large numbers:
Twisted;
Eventlet;
Gevent;
Tornado;
Asyncio.
So why write yet another library, and is it needed at all?
About the speaker: Artyom Malyshev is an independent Python developer. He builds distributed systems and speaks at Python conferences. Artyom can be found under the nickname @PROOFIT404 on GitHub and on social networks.
Django is synchronous by definition. Take the ORM: hitting the database synchronously during attribute access, for example when we write post.author.username, happens without a second thought.
In addition, Django is a WSGI framework.
WSGI
WSGI is a synchronous interface for working with web servers.
Its main feature is that we have a function that takes its arguments and immediately returns a value. That is all a web server can expect from us. There is not even a whiff of asynchrony here.
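To make the "one call in, one value out" contract concrete, here is a minimal sketch of a WSGI application using only the standard library. The helper call_app and the response text are illustrative assumptions, not something from the talk.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """A complete WSGI app: called once per request, returns the response."""
    body = b"Hello, 2003!\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    # The request is over once the server has consumed this return value.
    return [body]


def call_app(app, path="/"):
    """Hypothetical helper: call the app the way a server would."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fill in the remaining required keys
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = headers

    captured["body"] = b"".join(app(environ, start_response))
    return captured


if __name__ == "__main__":
    print(call_app(application))
```

Nothing in this interface lets the application keep a connection open and push data to the client later, which is exactly the limitation discussed here.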
This was designed long ago, back in 2003, when the web was simple: users read the news online and visited guest books. It was enough to accept a request, process it, return a response, and forget the user ever existed.
But it is not 2003 anymore, and users want much more from us. They want rich web applications and live content; they want the application to work great on the desktop, on the laptop, on every other kind of "top", and on a watch. Most importantly, users do not want to press F5, because tablets, for example, have no such key.
Web browsers, of course, meet us halfway by adding new protocols and new features. If we were developing only the frontend, we would simply treat the browser as a platform and use its built-in features, since it is ready to provide them.
But for backend programmers everything has changed a great deal. Web sockets, HTTP/2 and the like are a huge architectural pain, because they are long-lived connections with their own state that has to be handled.
This is the problem Django Channels tries to solve. The library is designed to let you handle such connections while leaving the Django core we are used to completely unchanged.
It was created by a wonderful man, Andrew Godwin, who has a fearsome English accent and speaks very quickly. You probably know him from the long-forgotten Django South and from Django Migrations, which arrived in version 1.7. Having fixed migrations for Django, he moved on to fixing web sockets and HTTP/2.
How did he do it? Once upon a time a picture made the rounds on the Internet: empty boxes, arrows, and the caption “Good architecture”. You write your favorite technologies into the boxes and get a site that scales well.
Andrew Godwin filled in those boxes as follows. In front stands a server that accepts any kind of request, be it asynchronous, synchronous, e-mail, whatever. Behind it is a pool of synchronous workers, and between the two sits the so-called Channel Layer, which stores received messages in a format the workers can read. As soon as an asynchronous connection sends us something, the server writes it to the Channel Layer; a synchronous worker then picks it up and processes it synchronously, just like any Django view. As soon as the synchronous code sends a response back to the Channel Layer, the asynchronous server delivers it, streams it, and does whatever else is needed. That is how the abstraction is built.
Several implementations are possible. In production the suggestion is to use Twisted as the asynchronous server that acts as the frontend for Django, and Redis as the communication channel between synchronous Django and asynchronous Twisted.
The good news is that to use Django Channels you do not need to know either Twisted or Redis at all: they are implementation details. Your DevOps engineers will know them, or you will get acquainted while fixing production at three in the morning.
ASGI
The abstraction is a protocol called ASGI. It is a standard interface that sits between any network interface or server, whether its protocol is synchronous or asynchronous, and your application. Its central concept is the channel.
Channel
A channel is an ordered first-in-first-out queue of messages that have a lifetime. These messages can be delivered zero or one time, and can only be received by one Consumer.
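These properties can be shown with a toy in-memory queue. This is only a sketch of the semantics described above, not the Channels implementation; the class name ToyChannel, the expiry parameter and the injectable clock are assumptions made for the illustration.

```python
import time
from collections import deque


class ToyChannel:
    """In-memory stand-in for a single channel (illustration only)."""

    def __init__(self, expiry=60.0, clock=time.monotonic):
        self.expiry = expiry   # seconds a message stays deliverable
        self.clock = clock     # injectable so expiry is easy to demonstrate
        self._queue = deque()  # (deadline, message) pairs, oldest first

    def send(self, message):
        self._queue.append((self.clock() + self.expiry, message))

    def receive(self):
        """Return the oldest live message, or None if nothing is pending."""
        while self._queue:
            deadline, message = self._queue.popleft()
            if deadline > self.clock():
                return message  # popped, so no other consumer can get it
            # Expired messages are dropped: they were delivered zero times.
        return None


if __name__ == "__main__":
    now = [0.0]
    channel = ToyChannel(expiry=10.0, clock=lambda: now[0])
    channel.send({"text": "first"})
    channel.send({"text": "second"})
    print(channel.receive())  # the oldest message comes out first
    now[0] = 11.0             # pretend 11 seconds have passed
    print(channel.receive())  # the remaining message has expired
```

Because receive pops the message from the queue, two consumers polling the same channel can never both get it, which is the "zero or one time" guarantee.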
A consumer is a function that accepts a message and may send several replies, or none at all. It is very similar to a view; the only difference is that there is no return value, so we decide ourselves how many replies the function sends.
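A minimal sketch of such a function is an echo handler written in the Channels 1.x style, where a reply is an explicit send on the message's reply channel. FakeMessage and FakeReplyChannel are hypothetical stand-ins, added only so the example runs without Channels installed.

```python
def ws_message(message):
    # No return value: every reply is an explicit send on the reply
    # channel, so the consumer decides how many go out (here exactly one).
    message.reply_channel.send({"text": message.content["text"]})


class FakeReplyChannel:
    """Hypothetical stand-in that records what would go to the client."""

    def __init__(self):
        self.sent = []

    def send(self, content):
        self.sent.append(content)


class FakeMessage:
    """Hypothetical stand-in with the two attributes the consumer uses."""

    def __init__(self, content):
        self.content = content
        self.reply_channel = FakeReplyChannel()


if __name__ == "__main__":
    message = FakeMessage({"text": "hello"})
    ws_message(message)
    print(message.reply_channel.sent)
```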
We add this function to the routing, for example attaching it to the channel that carries messages received over a web socket.
    from channels.routing import route
    from myapp.consumers import ws_message

    # Map the channel name to the consumer that handles it.
    channel_routing = [
        route('websocket.receive', ws_message),
    ]
We register the channel layer in the Django settings, just as we register a database.
A project can have several Channel Layers, just as it can have several databases. It is very similar to a database router, if you have ever used one.
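For reference, a settings fragment in the Channels 1.x style might look like the sketch below. The Redis host and port and the dotted path myproject.routing.channel_routing are assumptions for illustration; adjust them to your project.

```python
# settings.py (fragment)
CHANNEL_LAYERS = {
    "default": {
        # Redis-backed layer for production use.
        "BACKEND": "asgi_redis.RedisChannelLayer",
        "CONFIG": {"hosts": [("localhost", 6379)]},  # assumed local Redis
        # Assumed dotted path to the routing list defined in the project.
        "ROUTING": "myproject.routing.channel_routing",
    },
}
```

As with DATABASES, the "default" key names one layer; additional layers would be added under other keys.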
Next, we define our ASGI application. It is the common entry point that keeps Twisted and the synchronous workers in step: both are started from this application.
    import os
    from channels.asgi import get_channel_layer

    # Point Django at the settings before the layer is created.
    os.environ.setdefault(
        'DJANGO_SETTINGS_MODULE',
        'myproject.settings',
    )
    channel_layer = get_channel_layer()
After that the code is deployed. We launch gunicorn and serve ordinary HTTP requests synchronously through views, as we are used to. We also start the asynchronous server that stands in front of our synchronous Django, and the workers that process the messages.
As we have seen, a message has a reply channel. Why is it needed?
A channel is unidirectional, so websocket.receive, websocket.connect and websocket.disconnect are system-wide channels for incoming messages, whereas the reply channel is tied strictly to one user's connection. Accordingly, a message has an input channel and an output channel. This pair lets you identify who sent you the message.
Groups
A group is a set of channels. If we send a message to a group, it is automatically sent to all channels of this group. This is convenient because nobody likes to write for loops. Plus, the implementation of groups is usually done using the native functions of the Channel layer, so it works faster than just sending messages one by one.
    from channels import Group

    def ws_connect(message):
        # Subscribe this connection to the chat group.
        Group('chat').add(message.reply_channel)

    def ws_disconnect(message):
        Group('chat').discard(message.reply_channel)

    def ws_message(message):
        # Broadcast to everyone in the group, not just the sender.
        Group('chat').send({
            'text': message.content['text'],
        })
As soon as the reply channel is added to the group, a reply goes to every user connected to our site, not just back to ourselves as an echo.
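What a group does can be sketched with a toy in-memory layer. This is an illustration of the idea only; ToyChannelLayer and the channel names are assumptions, and a real layer would perform the fan-out with its backend's native operations.

```python
class ToyChannelLayer:
    """In-memory stand-in for a channel layer with groups (illustration)."""

    def __init__(self):
        self.channels = {}  # channel name -> list of pending messages
        self.groups = {}    # group name -> set of channel names

    def send(self, channel, message):
        self.channels.setdefault(channel, []).append(message)

    def group_add(self, group, channel):
        self.groups.setdefault(group, set()).add(channel)

    def group_discard(self, group, channel):
        self.groups.get(group, set()).discard(channel)

    def send_group(self, group, message):
        # The fan-out loop lives here once, instead of in every consumer.
        for channel in self.groups.get(group, set()):
            self.send(channel, message)


if __name__ == "__main__":
    layer = ToyChannelLayer()
    layer.group_add("chat", "reply.alice")  # hypothetical reply channels
    layer.group_add("chat", "reply.bob")
    layer.send_group("chat", {"text": "hi all"})
    print(layer.channels)
```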
Generic consumers
What I love Django for is its declarative style. Likewise, there are declarative consumers.
BaseConsumer is the basic one: all it can do is map a channel you have defined to one of its methods and call it.
There are many predefined consumers with deliberately extended behavior, such as WebsocketConsumer, which declares up front that it handles websocket.connect, websocket.receive and websocket.disconnect. You can specify right away which groups the reply channel should be added to, and when you call self.send it works out whether to send to a group or to a single user.
There is also a JSON flavor of WebsocketConsumer: receive gets neither text nor bytes but already parsed JSON, which is convenient.
Such a consumer is added to the routing in the same way, via route_class. route_class takes the channel-to-method mapping defined on the consumer, collects all the channels from it, and routes every channel listed there. That way there is less to write.
Routing
Let's talk in detail about routing and what it provides us.
First, these are filters.
    // app.js
    S = new WebSocket('ws://localhost:8000/chat/')

    # routing.py
    route('websocket.connect', ws_connect, path=r'^/chat/$')
A filter can be the path taken from the URI of the web socket connection, or the HTTP request method. It can be any message field from a channel; for e-mail, for example: text, body, carbon copy, whatever. A route accepts an arbitrary number of keyword arguments.
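The filtering idea can be sketched in a few lines: each keyword argument is a regular expression that the corresponding message field must match. The functions make_route and dispatch are hypothetical names for this illustration, not the Channels routing code.

```python
import re


def make_route(channel, consumer, **filters):
    """Build a route; each keyword is a regex for one message field."""
    compiled = {name: re.compile(rx) for name, rx in filters.items()}

    def match(message_channel, content):
        if message_channel != channel:
            return None
        for name, pattern in compiled.items():
            value = content.get(name)
            # A missing field or a non-matching value rejects the message.
            if value is None or not pattern.match(str(value)):
                return None
        return consumer

    return match


def dispatch(routes, channel, content):
    """Return the first consumer whose route accepts the message."""
    for match in routes:
        consumer = match(channel, content)
        if consumer is not None:
            return consumer
    return None


if __name__ == "__main__":
    def ws_connect(message):
        pass

    routes = [make_route("websocket.connect", ws_connect, path=r"^/chat/$")]
    print(dispatch(routes, "websocket.connect", {"path": "/chat/"}))
    print(dispatch(routes, "websocket.connect", {"path": "/other/"}))
```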
Routing also supports nested routes. If several consumers share some common characteristics, it is convenient to group them and add them to the routing all at once.
    from channels import route, include

    blog_routes = [
        route('websocket.connect', blog, path=r'^/stream/'),
    ]

    # Every route in blog_routes is matched under the /blog prefix.
    routing = [
        include(blog_routes, path=r'^/blog'),
    ]
Multiplexing
If we open several web sockets, each has its own URI, and we can attach a different handler to each. But let's be honest: opening several connections just to keep the back end tidy is hardly an engineering approach.
So it is possible to call several handlers over a single web socket. We define a WebsocketDemultiplexer, which works with the notion of a stream inside a single web socket. Based on the stream, it redirects your message to another channel.
The stream field is added to the message so that the multiplexer can work out where to put it. The payload field contains everything that goes to the target channel after the multiplexer has processed the message.
It is important to note that the message passes through the Channel Layer twice: before the multiplexer and after it. So as soon as you start using a multiplexer, you automatically add latency to your requests.
{ "stream" : "intval", "payload" : { … } }
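The forwarding step can be sketched as follows. This is not the WebsocketDemultiplexer code; the function demultiplex, the channel name binding.intval and the payload contents are assumptions made up for the illustration.

```python
import json


def demultiplex(raw_text, stream_channels, send):
    """Unwrap one web socket frame and forward its payload.

    stream_channels maps a stream name to a channel name; send(channel,
    message) stands in for putting a message on the channel layer.
    """
    frame = json.loads(raw_text)
    stream, payload = frame["stream"], frame["payload"]
    if stream not in stream_channels:
        raise ValueError("unknown stream: %r" % stream)
    # Second trip through the channel layer: the payload is queued again,
    # this time on the channel the stream's own consumer listens to.
    send(stream_channels[stream], payload)


if __name__ == "__main__":
    forwarded = []
    demultiplex(
        '{"stream": "intval", "payload": {"value": 42}}',
        {"intval": "binding.intval"},  # hypothetical channel name
        lambda channel, message: forwarded.append((channel, message)),
    )
    print(forwarded)
```

The extra send call is where the added latency mentioned above comes from.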
Sessions
Each channel has its own session. It is very handy, for example, for keeping state between handler calls. Sessions are keyed by the reply channel, since that is an identifier that belongs to the user. The session is stored in the same engine that stores the ordinary HTTP session. For obvious reasons signed-cookie sessions are not supported: a web socket simply has no cookies to write them to.
During connection you can also get the HTTP session and use it in your consumer. The user's cookies are sent as part of the handshake that sets up the web socket connection. Thanks to that, you can get the user's session and the familiar Django user object, just as if you were working in a view.
    from channels.auth import http_session_user

    @http_session_user
    def ws_connect(message):
        message.http_session['room'] = room
        if message.user.username:
            …
Message order
Channels lets you solve a very important problem. If we establish a web socket connection and immediately send something over it, two events, websocket.connect and websocket.receive, occur very close together in time. It is quite likely that the consumers for them will run in parallel. Debugging that is great fun.
Django Channels lets you introduce two kinds of lock:
Easy lock. Using the session mechanism, we guarantee that until the connect consumer has finished, no other message from that web socket is processed. Once the connection is established, the order is arbitrary and parallel execution is possible.
Hard lock. Only one consumer runs for a given user at any time. This adds synchronization overhead, because it relies on the slow session engine. Still, the option is there.
To use them there are decorators, like the ones for the HTTP session and the channel session that we saw earlier. In a declarative consumer you can simply set attributes, and they are applied automatically to every method of that consumer.
Data binding
Meteor, in its day, became famous for data binding.
Open two browsers on the same page and move the scroll bar in one of them. In the second browser the scroll bar on that page changes its value at the same moment. That's cool.
This is implemented with the hooks provided by Django signals. If a binding is defined for a model, all connections in the group for that model instance are notified of every event. The model was created, changed or deleted: all of that ends up in the notification. Notifications are sent for the specified fields: when the value of such a field changes, a payload is built and sent over the web socket. It is convenient.
It is important to understand that if, in our example, we keep moving the scroll bar, messages keep flowing and the model keeps being saved. This works up to a certain load; after that the database becomes the bottleneck.
Redis layer
Let's talk a little more about how the most popular Channel Layer for production, Redis, is put together.
It is arranged well:
works with synchronous connections at the level of workers;
is very friendly to Twisted and does not slow it down where that matters most, that is, on your front-end server;
uses MessagePack to serialize messages inside Redis, which reduces the footprint of each message;
lets you distribute the load across multiple Redis instances; it is sharded automatically using a consistent hashing algorithm. Thus the single point of failure disappears.
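Consistent hashing in general can be sketched as a ring of hash points. The code below illustrates the generic technique and is not the Redis layer's actual algorithm; the class HashRing, the host names, the replica count and the choice of SHA-256 are all assumptions.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring mapping channel names to Redis hosts."""

    def __init__(self, hosts, replicas=64):
        # Each host gets several points on the ring to even out the load.
        self._ring = sorted(
            (self._hash("%s#%d" % (host, i)), host)
            for host in hosts
            for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)

    def host_for(self, channel):
        # First ring point clockwise from the channel's hash, wrapping round.
        index = bisect.bisect(self._points, self._hash(channel))
        return self._ring[index % len(self._ring)][1]


if __name__ == "__main__":
    ring = HashRing(["redis-a", "redis-b", "redis-c"])  # hypothetical hosts
    for name in ("websocket.receive", "websocket.connect", "http.request"):
        print(name, "->", ring.host_for(name))
```

The useful property is that removing one host only moves the channels that lived on it; channels assigned to the remaining hosts stay where they were.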
A channel is simply a Redis list of ids. Under each id lies the content of a particular message. This is done so that the lifetime of each message and each channel can be controlled separately. In principle, that is logical.
The first problem is a reinvented callback hell. It is important to understand that most of the problems you run into with channels will look like this: a consumer received arguments it did not expect. Where they came from and who put them into Redis is a murky thing to investigate. Debugging distributed systems is, in general, for the strong in spirit. AsyncIO solves this problem.
Celery
People on the Internet write that Django Channels is a replacement for Celery. I have bad news for you: no, it is not.
In channels:
there is no retry, and you cannot delay the execution of a handler;
there is no canvas, only callbacks. Celery provides groups, chains, and my favorite, the chord, which fires one more callback, with synchronization, after a group has run in parallel. Channels has none of this;
there is no way to set the arrival time of messages, and some systems are simply impossible to design without it.
I see the future in official support for using Channels and Celery together, with minimal cost and minimal effort. But Django Channels is not a replacement for Celery.
Django for modern web
Django Channels is Django for the modern web. It is the same Django we are all used to: synchronous, declarative, with lots of batteries included. Django Channels is just one more battery. You should always understand where to use it and whether to use it at all. If a project does not need Django, it does not need Channels either. They are useful only in projects where Django itself is justified.