Some time ago, several events changed the familiar landscape of Python web development: Facebook acquired the FriendFeed service and immediately open-sourced the technology behind the project, the Tornado HTTP server and microframework. At the same time, one of FriendFeed's developers published a blog post giving the reasons why the team had decided to write its own asynchronous web server from scratch.
This article is an excursion into the very heart of this project and its competitor (Twisted.web): their asynchronous event-processing loops.
The developer's note criticized Twisted, a popular framework for building asynchronous applications, as untested and unstable. It also presented the results of comparing the performance of a simple application on Twisted.web (the subset of Twisted that specializes in the HTTP protocol and web development) and on Tornado. Naturally, the latter turned out to be faster in these tests.
One of Twisted's key programmers could not stand aside and gave the reasons why FriendFeed should not have reinvented the wheel and should have used the existing tools. In his next post he pointed to another project, the Orbited Comet server, which had been ported to Twisted for reasons of greater stability and ease of development.
From the web developer's point of view, Tornado and Twisted.web are not very different: both are microframeworks that provide only the most basic tools for working with requests, authorization, and so on, and cannot compare with giants such as Django or, beyond the Python world, Ruby on Rails.
Asynchrony
The heart, the soul, and the main difference between these two applications and their competitors is asynchronous request processing. It improves performance by avoiding the context switches typical of synchronous servers that spawn many processes or threads.
All actions are performed by a single process (thread) in a single loop, the event loop, similar to those found in GUI frameworks.
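The idea of a single loop that waits for ready sockets and calls their handlers can be illustrated with a minimal sketch. It assumes Python 3 and the standard selectors module rather than either framework; run_once and on_readable are hypothetical names invented for this example.

```python
import selectors
import socket

def run_once(sel, timeout=1.0):
    """Run one loop iteration: wait for ready sockets, call their handlers."""
    handled = 0
    for key, mask in sel.select(timeout):
        handler = key.data            # the callback stored at registration time
        handler(key.fileobj, mask)
        handled += 1
    return handled

# Demo: a connected socket pair stands in for a client and a server.
sel = selectors.DefaultSelector()
client, server = socket.socketpair()
server.setblocking(False)
received = []

def on_readable(sock, mask):
    # Hypothetical handler: read whatever is available without blocking.
    received.append(sock.recv(1024))

sel.register(server, selectors.EVENT_READ, on_readable)
client.send(b"ping")
run_once(sel)

sel.unregister(server)
client.close()
server.close()
sel.close()
```

A real server would call run_once in a loop and register many sockets. The point is only that a single thread serves all of them without being blocked by any one connection.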
Performance
As mentioned above, the gain over classical synchronous servers comes from this single processing loop, which makes it possible to avoid kernel context switches.
Such a loop is present in both Tornado (ioloop) and Twisted (various reactor implementations). Let's look at each of them, determine the reasons for the Tornado HTTP server's performance advantage, and evaluate the code and architectural decisions of each asynchronous server.
Tornado (ioloop)
By default, the Tornado ioloop module uses the epoll mechanism to work with non-blocking sockets. If the platform does not provide it (in practice, only Linux with kernel 2.6 or later does), the universal select is used instead.
The implementation of the main loop is extremely simple; it fits in a couple of small files: epoll.c is a wrapper for epoll, and ioloop.py is the implementation of the loop.
epoll.c wraps epoll_create, epoll_ctl, and epoll_wait as Python functions and declares the epoll module. This module is compiled and used if the language's standard module for asynchronous socket I/O (the select module) does not support epoll (does not contain the epoll class).
The event loop itself lives in the start method of the IOLoop class in the ioloop.py module. Below is the body of this method with somewhat expanded comments:
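The fallback from epoll to select can be sketched as follows. This is an illustration in Python 3 using only the standard select module, not Tornado's own code; wait_readable and HAS_EPOLL are hypothetical names.

```python
import select
import socket

# The same check the text describes: does the select module expose epoll?
HAS_EPOLL = hasattr(select, "epoll")

def wait_readable(socks, timeout):
    """Return the sockets from `socks` that become readable within `timeout` seconds."""
    if HAS_EPOLL:
        poller = select.epoll()
        try:
            by_fd = {s.fileno(): s for s in socks}
            for fd in by_fd:
                poller.register(fd, select.EPOLLIN)
            return [by_fd[fd] for fd, _events in poller.poll(timeout)]
        finally:
            poller.close()
    # Portable fallback: plain select() works on every supported platform.
    readable, _, _ = select.select(socks, [], [], timeout)
    return readable

a, b = socket.socketpair()
a.send(b"x")
ready = wait_readable([b], 1.0)
a.close()
b.close()
```

Creating a new epoll object on every call is wasteful and is done here only to keep the sketch short. A real loop creates the poller once and keeps its registrations between iterations.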
def start(self):
    self._running = True
    while True:
        # Default timeout between iterations of the handler loop;
        # a finite value keeps epoll from hanging
        poll_timeout = 0.2
        # Take a snapshot of the list of pending callbacks
        callbacks = list(self._callbacks)
        for callback in callbacks:
            # Remove the callback from the pending list and run it
            if callback in self._callbacks:
                self._callbacks.remove(callback)
                self._run_callback(callback)
        # If callbacks are still pending, do not wait between iterations
        if self._callbacks:
            poll_timeout = 0.0
        # If there are handlers scheduled with a delay and their
        # time has come, run them
        if self._timeouts:
            now = time.time()
            while self._timeouts and self._timeouts[0].deadline <= now:
                timeout = self._timeouts.pop(0)
                self._run_callback(timeout.callback)
            # The next poll waits either for the default timeout or,
            # if a delayed handler is due sooner, until its deadline
            # (despite its name, the value below is in seconds)
            if self._timeouts:
                milliseconds = self._timeouts[0].deadline - now
                poll_timeout = min(milliseconds, poll_timeout)
        # If some handler decided to stop the loop, exit
        if not self._running:
            break
        # Next, poll for events for the computed time
        try:
            event_pairs = self._impl.poll(poll_timeout)
        except Exception, e:
            if e.args == (4, "Interrupted system call"):
                logging.warning("Interrupted system call", exc_info=1)
                continue
            else:
                raise
        # Events are taken one file descriptor (socket) at a time, and the
        # handler registered for that descriptor is called with them
        # (for example, a function that reads data from the socket)
        self._events.update(event_pairs)
        while self._events:
            fd, events = self._events.popitem()
            try:
                self._handlers[fd](fd, events)
            except KeyboardInterrupt:
                raise
            except OSError, e:
                if e[0] == errno.EPIPE:
                    # Happens when the client connection is lost
                    pass
                else:
                    logging.error("Exception in I/O handler for fd %d",
                                  fd, exc_info=True)
            except:
                logging.error("Exception in I/O handler for fd %d",
                              fd, exc_info=True)
That, in general, is all. Deferred calls (scheduled for a certain time or for the next iteration) and handlers of incoming events are invoked in a loop. The handlers do not read or write data all at once but gradually, through buffers.
All other levels of the framework are written in the same simple, uncluttered style: the HTTP server, the request handlers, and the individual connections.
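The way the loop chooses how long to block in poll can be isolated into a small function. This is a sketch in Python 3, not Tornado's code: next_poll_timeout and its parameters are hypothetical, and clamping at zero is an addition for safety (in the loop above, expired deadlines have already been removed by the time this value is computed).

```python
DEFAULT_POLL_TIMEOUT = 0.2  # seconds, the same default as in the loop above

def next_poll_timeout(pending_callbacks, deadlines, now):
    """Return how long the next poll may block, in seconds.

    `deadlines` is assumed to be sorted in ascending order.
    """
    timeout = DEFAULT_POLL_TIMEOUT
    if pending_callbacks:
        # Work is already waiting, so the poll must return immediately.
        timeout = 0.0
    if deadlines:
        # Wake up no later than the earliest delayed handler is due.
        timeout = min(max(deadlines[0] - now, 0.0), timeout)
    return timeout

idle = next_poll_timeout([], [], now=100.0)
busy = next_poll_timeout([print], [], now=100.0)
soon = next_poll_timeout([], [100.05], now=100.0)
late = next_poll_timeout([], [105.0], now=100.0)
```

With no pending work the loop sleeps for the default 0.2 seconds. A pending callback forces a non-blocking poll, and a deadline closer than the default shortens the wait accordingly.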
Twisted (reactor)
The framework's twisted.internet.reactor module is the same kind of event loop: it runs the handlers for events and for any errors that occur.
By default, the web server's reactor (like the framework as a whole) uses the select mechanism to dispatch events on non-blocking sockets. This mechanism is universal across Unix and Win32 platforms, although it is somewhat less efficient than the kqueue (FreeBSD) or epoll (Linux only) reactors.
Let's consider the EPollReactor reactor, the analogue of the main mechanism used in Tornado (ioloop working with epoll).
The reactor contains several dictionaries around which all the logic of the asynchronous loop is concentrated. The dictionaries are declared in the class constructor:
class EPollReactor(posixbase.PosixReactorBase):
    implements(IReactorFDSet)

    def __init__(self):
        self._poller = _epoll.epoll(1024)
        self._reads = {}
        self._writes = {}
        self._selectables = {}
        posixbase.PosixReactorBase.__init__(self)
This is where the epoll object itself is created (_poller), along with the dictionaries _reads and _writes, which map integer file descriptors to arbitrary values. In effect, these are simply sets of descriptors watched for reading (_reads) and writing (_writes). The third dictionary, _selectables, maps each descriptor to the object that handles its events; the polling code below uses it to look up the handler.
What interests us is the asynchronous event-processing loop itself, so we omit the utility methods declared in the reactor class (and its base class).
One iteration of polling for events and processing them looks as follows (the comments are translated and, where possible, expanded):
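How dictionaries can serve as sets of descriptors may be easier to see in a small sketch. FdRegistry and its methods are hypothetical names written for this illustration in Python 3. They are not the reactor's API, and the placeholder value 1 is an arbitrary choice.

```python
class FdRegistry:
    """Toy bookkeeping for descriptors watched for reading and writing."""

    def __init__(self):
        self._reads = {}        # fd -> placeholder; only the keys matter
        self._writes = {}       # fd -> placeholder; only the keys matter
        self._selectables = {}  # fd -> object that handles events on fd

    def add_reader(self, fd, selectable):
        self._reads[fd] = 1
        self._selectables[fd] = selectable

    def add_writer(self, fd, selectable):
        self._writes[fd] = 1
        self._selectables[fd] = selectable

    def remove_reader(self, fd):
        self._reads.pop(fd, None)
        # Forget the handler only when nobody watches the descriptor any more.
        if fd not in self._writes:
            self._selectables.pop(fd, None)

    def wants(self, fd):
        """Return (watched_for_read, watched_for_write) for a descriptor."""
        return (fd in self._reads, fd in self._writes)

reg = FdRegistry()
reg.add_reader(5, "connection")
reg.add_writer(5, "connection")
before = reg.wants(5)
reg.remove_reader(5)
after = reg.wants(5)
```

In modern Python the built-in set type would express the same idea more directly. The dictionary form is kept here only to mirror the constructor shown above.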
def doPoll(self, timeout):
    if timeout is None:
        timeout = 1
    # Convert the iteration delay (event collection time) to milliseconds
    timeout = int(timeout * 1000)
    try:
        # The number of events returned is limited by the number of
        # monitored I/O objects (a heuristically chosen number), and the
        # blocking time by the value passed in by the caller
        l = self._poller.wait(len(self._selectables), timeout)
    except IOError, err:
        if err.errno == errno.EINTR:
            return
        # If the wait was interrupted by a signal, we leave the iteration;
        # in all other cases the error can only be a programming error
        # on our side, so the exception is passed on
        raise
    # If no errors occurred while collecting events, proceed to
    # calling the event handlers for the descriptors
    _drdw = self._doReadOrWrite
    for fd, event in l:
        try:
            selectable = self._selectables[fd]
        except KeyError:
            pass
        else:
            log.callWithLogger(selectable, _drdw, selectable, fd, event)
The reactor's self._doReadOrWrite method (aliased locally as _drdw) is passed the event handler object for the descriptor (if one was found), the descriptor itself, and the event that occurred on it. Let's look at the method itself:
def _doReadOrWrite(self, selectable, fd, event):
    why = None
    inRead = False
    if event & _POLL_DISCONNECTED and not (event & _epoll.IN):
        why = CONNECTION_LOST
    else:
        try:
            if event & _epoll.IN:
                why = selectable.doRead()
                inRead = True
            if not why and event & _epoll.OUT:
                why = selectable.doWrite()
                inRead = False
            if selectable.fileno() != fd:
                why = error.ConnectionFdescWentAway(
                    'Filedescriptor went away')
                inRead = False
        except:
            log.err()
            why = sys.exc_info()[1]
    if why:
        self._disconnectSelectable(selectable, why, inRead)
This is where read and write events on the descriptor are processed and where errors, if any, are handled.
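The bit-mask tests at the start of this method can be tried in isolation. The sketch below is Python 3 and makes two assumptions: the flag values are defined locally (they mirror the usual Linux epoll constants so that the example runs on any platform), and the disconnect mask is taken to be the error and hang-up bits combined. classify is a hypothetical helper, not part of Twisted.

```python
# Assumed flag values, defined locally for portability of the sketch.
IN, OUT, ERR, HUP = 0x001, 0x004, 0x008, 0x010
DISCONNECTED = ERR | HUP  # assumption: what the disconnect mask covers

def classify(event):
    """Return the actions a reactor-like loop would take for an event mask."""
    if event & DISCONNECTED and not (event & IN):
        # The peer is gone and nothing is left to read.
        return ["disconnect"]
    actions = []
    if event & IN:
        actions.append("read")
    if event & OUT:
        actions.append("write")
    return actions

only_read = classify(IN)
read_write = classify(IN | OUT)
hung_up = classify(HUP)
hung_up_with_data = classify(HUP | IN)
```

Note the last case: a hang-up that arrives together with readable data is still treated as a read, so pending bytes are not lost. Unlike the method above, this simplified helper does not skip the write when the read step reports a problem.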
Thus, at the lowest level, Tornado and Twisted are similar; the differences begin at higher levels of abstraction. The FriendFeed team's project adds just a few simple layers on top of the loop (IOStream -> HTTPConnection -> HTTPServer and others). Its loops are based solely on epoll or select.
Twisted is built on special abstractions (such as Deferred), and its reactors are implemented for a wider range of mechanisms: poll, epoll, select, kqueue (Mac OS X and FreeBSD), and a couple of tools for Win32. There are also reactors that integrate with the event loops of GUI frameworks (PyGTK, wxWidgets).
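For readers unfamiliar with the idea behind Deferred, here is a toy illustration in Python 3: an object that collects callbacks and runs them in order once a result arrives. MiniDeferred, add_callback, and fire are hypothetical names. The real Twisted class is far richer (it also has error callbacks, for instance) and uses different method names such as addCallback and callback.

```python
class MiniDeferred:
    """A toy stand-in for a deferred result: callbacks run once a value arrives."""

    def __init__(self):
        self._callbacks = []
        self._fired = False
        self._result = None

    def add_callback(self, fn):
        if self._fired:
            # The result is already known, so run the callback right away.
            self._result = fn(self._result)
        else:
            self._callbacks.append(fn)
        return self  # allow chaining

    def fire(self, result):
        self._fired = True
        self._result = result
        # Each callback receives the value returned by the previous one.
        for fn in self._callbacks:
            self._result = fn(self._result)
        self._callbacks = []

d = MiniDeferred()
seen = []
d.add_callback(lambda body: body.upper()).add_callback(seen.append)
d.fire("hello")
```

The code that starts an operation can return such an object immediately, and the event loop fires it later when the data arrives. This is the style of programming that the higher layers of Twisted are built around.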
Conclusions
Strictly speaking, it is difficult to compare a universal networking framework with a specialized application. The Tornado code is much simpler and more concise overall, and closer to the Pythonic ideal. Only the absence of tests is puzzling, which is considered bad form in modern development.
On the other hand, Twisted is a versatile tool that, for all its truly wide capabilities, preserves its harmony and consistency; in this sense it can be compared to the great Qt (in its original C++ implementation). An HTTP server is just a special case of its use. The code of most of the framework's components is well tested, and it even provides its own testing tool (Trial).
Naturally, Twisted, like any general-purpose system, is inferior in performance to a specialized development.
Another reason why Twisted is less efficient than Tornado and Diesel, another high-performance asynchronous framework, is its more thorough error handling, which adds reliability but eats into the cherished RPS (requests per second).
So, the main advantage of Twisted is universality; that of Tornado is performance.
What to choose? Decide for yourself. Both frameworks give the web programmer a very spartan set of development tools, clearly inferior to the simplicity of Django and the comprehensive completeness of Zope; both win on speed (up to a 20-30 percent gain compared with Apache-based solutions).