These days the word "Node.js" is fashionable on the Internet. In this short article we will try to understand, in plain terms, where it all came from, and how this architecture differs from the familiar architecture with “synchronous” and “blocking” I/O in the application code (a typical PHP + MySQL site) running on an application server that uses a “thread (or process) per request” scheme (the classic Apache Web Server).
About the readability of the article
Since its appearance here, this article has undergone many revisions (including conceptual ones) and additions, thanks to feedback from the readers mentioned at the end of the article. If some part of it is hard to understand, describe it in the comments, and we will rewrite it in clearer language.
About performance
Modern high-load sites like Twitter, VKontakte and Facebook run on stacks of the form PHP + Apache + NoSQL or Ruby on Rails + Unicorn + NoSQL, and do not slow down at all. First, they use NoSQL instead of SQL. Second, they distribute (“balance”) requests across many identical worker servers (this is called “horizontal scaling”). Third, they cache everything they can: whole pages, fragments of pages, data in JSON format for Ajax requests, and so on. Cached data is “static”, and is served immediately by servers like NginX, bypassing the application.
I personally do not know whether a site becomes faster if it is rewritten from Apache + PHP to Node.js. On topical sites you can find both those who consider system threads slower than the “asynchronous event model”, and those who defend the opposite point of view.
When deciding what to write your next project in, you should proceed from its tasks and choose the architecture that maps well onto the goals of the project.
For example, if your program maintains many simultaneous connections and constantly reads from them and writes to them, then you should definitely look in the direction of an “asynchronous event model” (for example, in the direction of Node.js). Node.js is also a perfect fit if you want to move some subsystem to the WebSocket protocol.
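As an illustration, here is a minimal sketch of a WebSocket echo server in Node.js (a sketch only: it assumes the third-party “ws” package and an arbitrary port, neither of which comes from the original article):

// A minimal WebSocket echo server sketch (assumes: npm install ws).
const WebSocket = require('ws');

const wss = new WebSocket.Server({ port: 8080 });

wss.on('connection', (socket) => {
  // One small "reaction" per event: no system thread is created per client.
  socket.on('message', (message) => {
    socket.send('echo: ' + message);
  });
});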
Examples of systems that are well suited to the “asynchronous event model”:
- a taxi dispatching system that monitors the movement of each car, distributes the flow of passengers, calculates the best route, etc.
- a life support system that constantly collects data from a variety of scattered sensors and controls chemical composition, temperature, humidity, etc.
- the human body (the brain is the control logic, the nervous system is the data transmission channel)
- chat
- MMORPG
What is “blocking” and “non-blocking” I/O
Let us examine the types of I/O using the example of a network socket (“socket” — literally, “place of connection”) through which an Internet user has connected to our website and is uploading a picture for an avatar. In this article, we will compare the “asynchronous event model” with the “familiar” architecture, where all the I/O in the application code is “synchronous” and “blocking”. “Familiar” simply because, before all this, nobody bothered with any kind of “locks”: everyone wrote that way, and it was enough for everyone. What is “synchronous” and “blocking” I/O? It is the simplest and most common kind of I/O, the one most sites are written on:
- open the file
- start reading it
- wait until it has been read
- the file has been read
- close the file
- display the read content on the screen
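In Node.js terms, such “synchronous” and “blocking” reading might look like this minimal sketch (the file name is purely illustrative):

// "Synchronous" and "blocking" reading: the thread stands idle inside
// readFileSync() until the whole file has been read from disk.
const fs = require('fs');

const content = fs.readFileSync('avatar.png'); // the "lock": nothing else runs here
console.log('read ' + content.length + ' bytes');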
In the case of our socket, this will be:
- start listening to the socket
- read the first chunk of image data from it
- wait until the second chunk of image data arrives on it
- read the second chunk of image data from it
- wait for the next chunk of image data
- ...
- the picture has been read
- attach the picture to the user's avatar
In this case, a “lock” occurs in the code of our program, during which the thread stands idle, although it could be doing something useful. To solve this problem, “synchronous” and “non-blocking” I/O was invented:
- start listening to the socket
- if there is no new data on it, stop listening to the socket
- if some chunk of the image data has already arrived, read it
- stop listening to the socket
If these steps are performed in a loop until the last chunk of image data has been read, we will end up with the whole picture just the same. With the only difference that in this loop, in addition to reading data from the socket, we can do something else useful instead of standing idle under the “lock”. For example, we could also read data from another socket. Such a loop of “non-blocking” I/O comes up again closer to the middle of the article.
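Here is a conceptual sketch of such a loop in Javascript (the “sockets” are simulated objects, since Node.js deliberately hides raw non-blocking reads from the application developer):

// A conceptual sketch of a "synchronous" but "non-blocking" read loop.
// On each turn we ask every source "any new data?" and, if the answer is
// "no", we move on immediately instead of standing idle under a "lock".
function makeSource(name, chunks) {
  return {
    name,
    read() { return Math.random() < 0.5 ? (chunks.shift() || null) : null; },
    done() { return chunks.length === 0; },
  };
}

const sources = [
  makeSource('socket A', ['a1', 'a2', 'a3']),
  makeSource('socket B', ['b1', 'b2']),
];

while (sources.some((s) => !s.done())) {
  for (const s of sources) {
    const chunk = s.read(); // never blocks: returns null when nothing has arrived
    if (chunk !== null) console.log(s.name + ': got chunk ' + chunk);
    // on every turn we are free to do something else useful here
  }
}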
There is also “asynchronous” I/O. We will not examine it in depth in this article, but in general it is when we hang a “callback function” on the socket from our code, and the operating system calls it every time the next chunk of image data arrives on this socket. After that, we can forget about listening to this socket altogether and go do other things. “Asynchronous” I/O, like “synchronous” I/O, is divided into “blocking” and “non-blocking”. But in this article, by the words “blocking” and “non-blocking” we will mean precisely the “synchronous” kind.
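Although the article does not dwell on “asynchronous” I/O, here is a hedged sketch of what the “callback” style looks like in Node.js itself (the port number is illustrative): we hang callbacks on the socket and the runtime calls them for every arriving chunk:

// We register callbacks on the socket and go about our business; the runtime
// calls them each time a new chunk of the picture arrives.
const net = require('net');

const server = net.createServer((socket) => {
  const chunks = [];
  socket.on('data', (chunk) => chunks.push(chunk)); // a piece of the picture arrived
  socket.on('end', () => {
    const picture = Buffer.concat(chunks);          // the whole picture has been read
    console.log('received ' + picture.length + ' bytes');
  });
});

server.listen(8080);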
Also, in this article we will consider only the “familiar” setup, where the application runs directly on the operating system with its system threads, and not on some “virtual machine” with its “green threads”. Inside a “virtual machine” with “green threads”, various miracles can be performed, such as turning supposedly “synchronous” I/O into “asynchronous” I/O; this is discussed closer to the end of the article, in the section “Alternative way”.
Prerequisites
The whole avalanche of experiments with new application architectures was caused by the fact that the traditional architecture answered the needs of the Internet at the dawn of its development, and, naturally, was not designed for the evolving needs of the “Web 2.0” Internet, in which everything buzzes and moves.
The time-tested PHP + MySQL + Apache combination coped well with “Internet 1.0”. The server launched a new thread (or process, which is almost the same thing from the operating system's point of view) for each user request. This thread went into PHP, from there into the database, selected something there, and returned with an answer that was sent to the user over HTTP, after which the thread terminated itself.
However, for real-time applications this was not enough. Suppose we have the task of “simultaneously maintaining 10,000 connections with users”. We could create 10,000 threads for this. How will they get along with each other? They are reconciled by the system “scheduler”, whose task is to give each thread its share of processor time without depriving anyone. It acts like this: when one thread has worked for a little while, the scheduler kicks in, temporarily stops that thread, and “prepares the site” for launching the next thread (which is already waiting in the queue).
Such “site preparation” is called a “context switch”, and it includes saving the “context” of the suspended thread and restoring the context of the thread that will run next. The “context” includes the processor registers and the process data in the operating system itself (ids, access rights, resources and locks, allocated memory, etc.).
How often the scheduler runs is decided by the operating system. For example, in Linux the default scheduler runs about once every hundredth of a second. The scheduler is also invoked when the process “blocks” itself manually (for example, with the sleep function) or waits for “synchronous” and “blocking” (that is, the simplest and most common) I/O (for example, a user request in a PHP thread waits until the database hands it a monthly sales report).
In general, it is believed that a “context switch” between system threads is not that expensive: it is on the order of a microsecond.
If threads actively read from and write to different areas of RAM, then, as the number of such threads grows, they will stop fitting into the processor's level-2 cache (L2), which is on the order of a megabyte. In that case, they will have to wait each time for data to be delivered over the system bus from RAM to the processor, and for data to be written over the system bus from the processor to RAM. Such access to RAM is orders of magnitude slower than access to the processor's cache: that is exactly why the cache was invented. In these cases, the “context switch” time can go up to 50 microseconds.
On the Internet you can find the opinion that the constant “context switching” of a large number of simultaneous threads can significantly slow down the entire system. However, I have not found unambiguous and detailed numerical evidence for this hypothesis.
Let us consider what imprint the multi-threaded model leaves on the application's RAM consumption. A “stack” is associated with each system thread. If the thread calls some function with arguments, then the arguments of this function are pushed onto the “stack”, together with the current address in the code, called the “return address” (because we will return there when the called function finishes). If this function calls some other function inside itself, the corresponding data is again written onto the “stack”, on top of what was already written there, thus forming a kind of layered coil.
When a system thread is created, the operating system does not allocate the whole “stack” in RAM at once, but in pieces, as it is used. This is called “virtual memory”. That is, each thread is immediately allocated a large piece of “virtual memory” for the “stack”, but this “virtual memory” is split into chunks called “memory pages”, and these “memory pages” are mapped to “real” RAM only when necessary. When a thread touches a “memory page” that has not yet been allocated in “real” RAM (for example, it tries to command the processor to write something there), the processor's “memory management unit” intercepts this action and raises a “page fault” exception in the operating system, to which the latter responds by allocating this “memory page” in “real” RAM.
In Linux, the default stack size is 8 megabytes, and the size of a “memory page” is 4 kilobytes (one or two “memory pages” are allocated to the “stack” right away). For 10,000 simultaneously running threads, we get a requirement of about 80 megabytes of “real” RAM. That seems like little, and there seems to be no cause for concern. But the amount of required memory in this case grows as O(n), which means that with a further increase in load, difficulties with “scalability” may arise: what if tomorrow your site already has to serve 100,000 simultaneous users and maintain 100,000 simultaneous connections? And the day after tomorrow, 1,000,000? And after that, who knows how many...
Single-threaded application servers do not have this drawback and do not require new memory as the number of simultaneous connections grows (this is called O(1)). Take a look at this graph comparing the memory consumption of Apache Web Server and NginX:
[Graph: memory usage of Apache vs NginX as the number of simultaneous connections grows]
Modern web servers (including modern Apache) are not built on a pure “thread per request” architecture, but on a more optimized one: there is a pool of pre-created threads that serve all requests as they arrive. This can be compared to a fairground ride with 10 horses and 100 riders who want to ride: a queue forms, and until the first 10 riders have had their turn, the next 10 riders stand and wait in line. Here the ride is the application server, the horses are the threads from the pool, and the riders are the site's users.
But with such a “pool” of system threads, at any given moment we can serve only as many users as we have threads “in the pool”, that is, not 10,000.
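The horses-and-riders arithmetic can be sketched directly (a toy model, not a real server; the numbers and timings are invented):

// A toy "pool": at most POOL_SIZE requests are served at once; the rest wait
// in a queue, like riders waiting for a free horse.
const POOL_SIZE = 10;
const queue = [];
let busy = 0;

function handle(request) {
  if (busy >= POOL_SIZE) { queue.push(request); return; } // all horses are taken
  busy += 1;
  setTimeout(() => {                                      // simulate serving the request
    console.log('served ' + request);
    busy -= 1;
    if (queue.length > 0) handle(queue.shift());          // the next rider mounts up
  }, 100);
}

for (let i = 1; i <= 100; i += 1) handle('request #' + i);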
The difficulties described in this section, which constantly raise the question of whether a multi-threaded architecture is suitable for serving a very large number of simultaneous connections, have received the collective name “The C10K problem”.
Asynchronous event model
A new architecture was needed for this class of applications, and this is where the “asynchronous event model” came in handy. It is based on the “event loop” and the “reactor” pattern (from the word “react” — to respond).
An “event loop” is an endless loop that polls “event sources” (descriptors) for new “events”. The polling is performed by means of “synchronous” I/O which, in this case, is “non-blocking” (the O_NONBLOCK flag is passed to the system I/O function).
That is, on the next turn of the “event loop”, our system walks through all the descriptors in sequence and tries to read “events” from them: if there are any, they are returned to our system as the result of the read function; if a descriptor has no new events, it will not “block” and wait for an “event” to appear, but will immediately return the answer: “no new events”.
An “event” can be the arrival of the next chunk of data on a network socket, or the reading of a new chunk of data from a hard disk: any I/O in general. For example, when you upload a picture to a hosting site, the data arrives in chunks, each one producing the event “a new chunk of picture data has been received”.
The “event source” in this case will be the “descriptor” (a handle to a data stream) of the TCP socket through which you connected to the site over the network.
The second component of the new architecture, as already mentioned, is the “reactor” pattern. And no, this is not the kind of reactor found at a nuclear power plant. The essence of this pattern is that the server code is written not as one large piece that executes sequentially, but as small blocks, each of which is called (“reacts”) when the event associated with it occurs. Thus, the code is a set of many blocks whose task is to “react” to certain events.
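In Node.js, this style can be sketched with the built-in EventEmitter (the event names are invented for illustration):

// The "reactor" pattern in miniature: the code is a set of small blocks,
// each of which is called ("reacts") when its event occurs.
const { EventEmitter } = require('events');

const reactor = new EventEmitter();
const chunks = [];

reactor.on('chunk-received', (chunk) => chunks.push(chunk));
reactor.on('upload-finished', () => {
  console.log('picture assembled from ' + chunks.length + ' chunks');
});

// In a real server these events would be produced by the event loop as data
// arrives on descriptors; here we emit them by hand:
reactor.emit('chunk-received', 'first piece');
reactor.emit('chunk-received', 'second piece');
reactor.emit('upload-finished');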
This new architecture became “mainstream” after the appearance of Node.js. Node.js is written in C++ and bases its event loop on the C library libev. However, Javascript is not the only option here: if a language has “non-blocking” I/O in its libraries, similar frameworks can be written for it too: Python has Twisted and Tornado, Perl has Perl Object Environment, Ruby has EventMachine (which is already five years old). On these “frameworks” you can write your own servers like Node.js. For example, for Java (on top of java.nio) there are Netty and MINA, and for Ruby (on top of EventMachine) there is Goliath (which also benefits from Fibers).
Advantages and disadvantages
"Asynchronous event model" is well suited where many, many users simultaneously perform some actions that do not load the processor. For example: they receive temperature from the sensors in the “
current time ” mode, receive images from video cameras, transmit the temperature taken from the thermometers attached to them to the server, write new messages in the chat, receive new messages from the chat, etc.
The requirement that actions not load the processor becomes clear when we remember that this entire endless loop runs in one single thread, and if you insert some heavy computation into this loop (say, start solving a differential equation), then all the other users will wait in the queue until this computation finishes.
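This is easy to demonstrate with a sketch: while the invented “heavy computation” below runs, the timer event that should fire every 100 ms simply does not fire (stop the script with Ctrl+C):

// One heavy computation blocks the single thread of the event loop:
// while it runs, everyone else waits in the queue.
setInterval(() => console.log('tick'), 100); // "other users" of the loop

setTimeout(() => {
  console.log('starting a heavy computation...');
  let sum = 0;
  for (let i = 0; i < 2e9; i += 1) sum += i;  // seconds of pure CPU work
  console.log('done: ' + sum);                // only now do the "ticks" resume
}, 300);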
Therefore, servers like Node.js are suitable either for tasks that do not load the processor, or as a “frontend” for a heavyweight backend. They are also suitable as servers for “slow” requests (a narrow communication channel, slow data upload or download, a long response time somewhere inside, and so on). I would assign servers like Node.js the place of an “input/output” intermediary. For example, an intermediary between the “client” and the “server”: the entire visual representation is created and drawn directly in the Internet user's browser, all the necessary data is kept in storage on the server, and Node.js plays the intermediary, giving the “client” the requested data and writing new data to the storage when it arrives from the “client”.
The fact that servers built on the “asynchronous event model” run in a single system thread creates two more obstacles in practice. The first is memory leaks. If Apache creates a system thread for each new request, then, after the response is sent to the user, this system thread self-destructs, and all the memory allocated to it is simply released. In the case of, say, Node.js, the developer must be careful to leave no trace behind when processing the next user request (to remove from memory all evidence that such a request ever arrived), otherwise the process will devour more and more memory with each new request. The second is the handling of program errors. If ordinary Apache creates a separate system thread to process an incoming request, and the PHP processing code throws some “exception”, then only that system thread will quietly “die”, and the user will receive a page like “500. Internal Server Error” in response. In the case of the same Node.js, a single error that occurs while processing a single request will bring down the entire server, because of which it has to be monitored and restarted manually.
Another possible drawback of the “asynchronous event model” is that the application code can sometimes (not always, but it happens, especially when the “asynchronous event model” is used for something it was not intended for) become difficult to understand because of the intertwining of “callbacks”. This is called the “spaghetti code” problem, and is described as: “a callback sits on a callback and drives it with a callback”. People try to fight this; for example, the Seq library was written for Node.js.
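The “spaghetti” looks roughly like this (the file names and the buildPage helper are invented for illustration):

// "A callback sits on a callback and drives it with a callback": each next
// step lives inside the previous callback, drifting further to the right.
const fs = require('fs');

function buildPage(user, avatar) { return '<html>...</html>'; } // illustrative stub

fs.readFile('user.json', (err, user) => {
  if (err) throw err;
  fs.readFile('avatar.png', (err, avatar) => {
    if (err) throw err;
    fs.writeFile('profile.html', buildPage(user, avatar), (err) => {
      if (err) throw err;
      console.log('profile page written');
    });
  });
});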
Another way to get rid of “callbacks” altogether is the so-called continuations (coroutines). They were introduced, for example, in Scala starting with version 2.8, and in Ruby starting with version 1.9 (Fibers). Here is an example of how, using Fibers in Ruby, you can completely eliminate callbacks and write code as if everything happened synchronously.
For Node.js, a similar library, node-fibers, was written. In terms of performance (in synthetic tests, not in real applications), node-fibers still runs about three to four times slower than the usual style with “callbacks”. The author of the library states that this performance difference arises where Javascript crosses into the C++ code of the V8 engine (on which Node.js itself is based), and that the measurements should be interpreted not as “node-fibers is three to four times slower than callbacks”, but as “compared to the other low-level actions in your code (working with byte arrays, connecting to a database or to a service on the Internet), the performance imprint of node-fibers will not be noticed at all”.
In addition to the usual programming style, node-fibers gives us the familiar and convenient try/catch way of handling errors. However, this library will not be merged into the core of Node.js, since Ryan Dahl sees the purpose of his creation in being low-level and not hiding anything from the developer.
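For reference, a minimal sketch of the node-fibers style (it assumes the third-party “fibers” package; the sleep helper is invented for illustration):

// The code reads as if it "blocks", but while the fiber is suspended,
// the event loop keeps serving everyone else.
const Fiber = require('fibers');

function sleep(ms) {                 // illustrative helper
  const fiber = Fiber.current;
  setTimeout(() => fiber.run(), ms); // wake this fiber up later
  Fiber.yield();                     // hand control back to the event loop
}

Fiber(function () {
  try {
    console.log('before');
    sleep(1000);                     // looks synchronous, does not block the loop
    console.log('after one second');
  } catch (e) {                      // the familiar try/catch works across the wait
    console.error(e);
  }
}).run();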
This concludes the main part of the article; to finish, we will briefly look at an alternative way, and at how the “event loop” polls its “event sources” for new data.
Alternative way
In this article we explained why an application that uses “synchronous” and “blocking” I/O cannot sustain a large number of simultaneous connections. As one solution, we proposed moving such an application to the “asynchronous event model” (that is, rewriting it, say, in Node.js). In doing so, we solve the problem by actually (behind the scenes) switching from “synchronous” and “blocking” I/O to “synchronous” and “non-blocking” I/O. But this is not the only solution: we can also resort to “asynchronous” I/O.
Namely, we can use the good old “pool” of system threads (described earlier in this article), which has evolved to a new stage of development. This stage is called “green processes” (correspondingly, there are also “green threads”). These are processes, but not system processes: they are created by the virtual machine of the language in which our code is written. The virtual machine runs the usual “pool” of system threads inside itself (say, one per processor core), and maps its internal “green processes” onto these system threads (hiding this from the developer entirely).
“Green processes” are precisely “processes” and not “threads”, since they have no shared variables with each other and communicate only by sending control “messages” to one another. This model protects against various “deadlocks” and avoids the problems of data sharing, because everything a “green process” has is its internal state and its “messages”.
Each "object" has its own turn of "messages" (for this, a "green process" is created). And any call to the “object” code is sending a “message” to it. Sending “messages” from one “object” to another “object” occurs asynchronously.
In addition to this, the virtual machine creates its own I/O subsystem, which is mapped onto non-blocking system I/O (and again the developer does not suspect a thing). And, of course, the virtual machine also contains its own internal scheduler. As a result, the developer thinks he is writing ordinary code with ordinary I/O, while in fact he gets a very high-performance system. Examples: Erlang, Actors in Scala.
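Node.js has no “green processes”, but the share-nothing, message-passing style itself can be sketched with its worker_threads module (a rough analogy on real system threads, not an equivalent of Erlang):

// The "worker" shares no variables with the main thread and communicates
// only through messages.
const { Worker, isMainThread, parentPort } = require('worker_threads');

if (isMainThread) {
  const worker = new Worker(__filename);
  worker.on('message', (reply) => {
    console.log('main got: ' + reply);
    worker.terminate();
  });
  worker.postMessage('ping');
} else {
  parentPort.on('message', (msg) => {
    parentPort.postMessage(msg + ' -> pong'); // reply via a message, not shared state
  });
}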
How the “event loop” polls the “event sources” for new data
The simplest solution one can think of is to poll all the “descriptors” (open network sockets, files being read or written, ...) for new data in turn. This algorithm is called “poll”. It looks like this:
- you have two open sockets
- you create an array of two structures that describe these sockets
- in each element of this array, you fill in what to watch for on which socket
- then you pass this array to the poll system function, which writes into it a description of the current state of these sockets
- after that you walk through this array again, finding out whether there is new data on these sockets
- if there is, you read it and do something with it
- and then you repeat the whole “event loop” again
As you can see, on every turn of the “event loop” the poll function has to walk through all the descriptors, even those on which nothing new has happened (and usually the overwhelming majority of them, say 95%, have nothing new). With a large number of simultaneous connections (say, 10,000), most of each pass is wasted on examining descriptors that have no new events. That is, the cost of one turn of the “event loop” grows linearly with the number of descriptors. This problem is briefly described as: “poll is O(n)”.
What can be done about it? Operating systems offer more efficient polling mechanisms: epoll on Linux and kqueue on FreeBSD. Windows has its own mechanism, IO Completion Ports, which differs noticeably from epoll; so that Node.js could also run on Windows, the libuv library was written, which wraps libev and IO Completion Ports.
Let us take epoll as an example. Unlike poll, with epoll you register your descriptors with the kernel once, rather than passing the whole array in on every turn of the loop (in early versions this was done through the special /dev/epoll “device” via ioctl calls, with the results read back through mmap; Linux later replaced this with dedicated system calls). On each turn of the “event loop”, the kernel hands back only the descriptors on which something actually happened, so instead of O(n) we get O(1): the cost of one turn no longer depends on the number of idle connections.
P.S. For help in improving the article — comments, corrections, and additions — thanks to the readers: akzhan, erlyvideo, eyeofhell, MagaSoft, Mox, nuit, olegich, reddot, splav_asv, tanenn, Throwable.
Also, thanks to: Goder, @theelephant.
Related Links
- EventMachine
- Scalable network programming
- Kqueue
- Stackless Python Concurrence
- node-sync — pseudo-synchronous programming on nodejs using fibers
- What every programmer should know about memory
- 10 things every Linux programmer should know
- Video: Node.js by Ryan Dahl
- No Callbacks, No Threads & Ruby 1.9