As an introduction to asynchronous programming and a very brief overview of the Twisted framework, I am publishing the materials from my talk at HighLoad++ (2009).
Recently, attention in the web world has shifted from “heavy” application servers, which spend hundreds of milliseconds or even seconds processing a request, to more lightweight services that transmit small amounts of data with minimal latency. Instead of generating tens or hundreds of kilobytes of HTML in response to a request, such services send only the changes in the data, packed in JSON and measured in hundreds of bytes. Examples include Gmail, FriendFeed, and Twitter Live Search.
To keep latency low for the user, you must either maintain a persistent connection (for example, Adobe Flash with RTMP) or use HTTP long polling together with keep-alive. Either way, the server ends up holding a large number of simultaneous connections (thousands or tens of thousands), each of which carries relatively little data. This situation is known as the C10k problem.
The architectural choices on the server side are limited: a process per connection, a thread per connection, a combined process/thread scheme, or asynchronous I/O (possibly combined with additional processes or threads). With more than 10 thousand simultaneous connections, creating 10 thousand processes is out of the question in terms of resource consumption, and 10 thousand threads is unlikely to be a reasonable solution either. Moreover, with so many connections the amount of work on each one is relatively small, and most of them sit idle waiting for new data to arrive. Most processes or threads would therefore simply be waiting, wasting system resources.
Asynchronous I/O makes it possible to perform non-blocking network I/O across thousands of open sockets within a single thread of execution (one process). The underlying mechanism differs between operating systems: select(), poll(), epoll(), kqueue(), and so on. Examples of applications that use asynchronous I/O:
- nginx (additional processes are used to service tasks that require more CPU);
- haproxy;
- memcached;
- and others.
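The single-threaded model described above can be sketched with Python's standard `selectors` module, which picks epoll, kqueue, poll, or select depending on the platform. This is only a minimal illustration, not how nginx or Twisted is implemented; the names `make_listener`, `poll_once`, and `run_echo_demo` are made up for this example, and a real server would also buffer partial writes and handle connection errors.

```python
import selectors
import socket


def make_listener(host="127.0.0.1", port=0):
    """Create a non-blocking listening socket (port 0 = any free port)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(128)
    sock.setblocking(False)
    return sock


def poll_once(sel, listener, timeout=0.2):
    """Serve every socket that is ready right now; return the event count."""
    events = sel.select(timeout)
    for key, _mask in events:
        sock = key.fileobj
        if sock is listener:
            # New connection: accept it and start watching it for input.
            conn, _addr = listener.accept()
            conn.setblocking(False)
            sel.register(conn, selectors.EVENT_READ)
        else:
            data = sock.recv(4096)
            if data:
                sock.send(data)  # echo back (assumes the small write fits)
            else:
                # Empty read means the peer closed the connection.
                sel.unregister(sock)
                sock.close()
    return len(events)


def run_echo_demo(n_clients=5):
    """Serve n_clients connections from one thread and return the replies."""
    sel = selectors.DefaultSelector()
    listener = make_listener()
    sel.register(listener, selectors.EVENT_READ)
    clients = [socket.create_connection(listener.getsockname())
               for _ in range(n_clients)]
    for i, client in enumerate(clients):
        client.sendall(b"ping %d" % i)
    # Run the loop until no socket has anything left to do.
    while poll_once(sel, listener):
        pass
    replies = [client.recv(4096) for client in clients]
    # Clean up: clients first, then everything still registered.
    for client in clients:
        client.close()
    for key in list(sel.get_map().values()):
        sel.unregister(key.fileobj)
        key.fileobj.close()
    sel.close()
    return replies
```

Calling `run_echo_demo(3)` opens three client connections and serves all of them from the same thread; no thread or process is created per connection, and idle sockets cost only an entry in the selector.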
However, asynchronous I/O is not a universal solution. For a database server, for example, it would hardly be a good way to handle connections: processing each request requires a large amount of disk I/O and CPU time, which cannot all be done within a single process.
Twisted is an extensive set of classes and modules for building asynchronous network applications. It consists of:
- a core that abstracts all asynchronous I/O operations and uses the appropriate OS-specific mechanism;
- the Deferred concept, which makes it simple to express how a request is served: asynchronous network calls (for example, to a database or memcached) and error handling; Deferred gives the asynchronous programming model an analogue of ordinary sequential programming constructs;
- an extensive set of ready-made network protocol implementations: HTTP, DNS, SMTP, IMAP, memcached, Jabber, ICQ, etc.; even more protocols are available as additional modules;
- additional infrastructure: unit tests with Deferred support, thread pools, process management, etc.;
- a high-quality development process: complete unit-test coverage and strict review of every change.
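To make the Deferred idea from the list above more concrete, here is a toy, standard-library-only stand-in. It is NOT Twisted's implementation or API: the class `MiniDeferred`, its methods `add_callbacks` and `fire`, and the `USERS` table are invented for this sketch (in Twisted itself the corresponding methods are `addCallback`, `addErrback`, and `callback`). It only shows the core idea: a chain of steps in which a normal result flows through callbacks, much like sequential statements, and an exception skips ahead to the next error handler, much like `try`/`except`.

```python
class MiniDeferred:
    """Toy illustration of a callback chain; not Twisted's Deferred."""

    def __init__(self):
        self._chain = []      # (callback, errback) pairs, in order
        self._fired = False
        self.result = None

    def add_callbacks(self, callback=None, errback=None):
        """Append one step; either handler may be None (pass-through)."""
        self._chain.append((callback, errback))
        if self._fired:
            self._run()
        return self

    def fire(self, result):
        """Deliver the initial result and run the chain."""
        self._fired = True
        self.result = result
        self._run()

    def _run(self):
        while self._chain:
            callback, errback = self._chain.pop(0)
            is_error = isinstance(self.result, Exception)
            handler = errback if is_error else callback
            if handler is None:
                continue  # this step does not handle the current state
            try:
                self.result = handler(self.result)
            except Exception as exc:
                # A failure becomes the result and flows to the next errback.
                self.result = exc


USERS = {1: "alice"}  # hypothetical stand-in for a database


def serve(raw_id):
    """Serve one request: parse the id, look up the user, format a reply."""
    d = MiniDeferred()
    d.add_callbacks(int)                              # may raise ValueError
    d.add_callbacks(lambda uid: USERS[uid])           # may raise KeyError
    d.add_callbacks(lambda name: "hello " + name)
    d.add_callbacks(None, lambda err: "error: " + type(err).__name__)
    # In a real server the event loop would fire this when data arrives.
    d.fire(raw_id)
    return d.result
```

With this sketch, `serve("1")` walks through every callback, while `serve("x")` fails in the first step and jumps straight to the error handler at the end of the chain.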
As concrete examples, the talk presents three applications built on Twisted with my participation. For each one it covers the architecture, specific performance figures, optimization techniques, and the advantages and disadvantages of Twisted for the problem at hand:
- PyFMS, an RTMP server: the broadcasting server of the Smotri.Com service (hundreds of broadcasts, tens of thousands of viewers);
- the backend server of the MDC project: storage and processing of users' communication history, storage of settings, etc.;
- Qik Push Engine, a server for immediate delivery of changes to information about videos created by the service's users, including push notifications about newly started live streams (scaling, processing large amounts of information).
Additional Information:
Presentation