There are two ways to handle parallel requests to a server. Threaded (synchronous) servers use many simultaneously running threads, each of which handles a single request. Evented (asynchronous) servers, by contrast, run a single event loop that processes all requests.
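To make the two models concrete, here is a minimal sketch using only Python's standard library; the echo handler, host, and ports are placeholders for illustration and are not part of the original article:

import asyncio
import socketserver

# Threaded model: each connection gets its own OS thread, so a blocking
# read or database call stalls only that one thread.
class EchoHandler(socketserver.StreamRequestHandler):
    def handle(self):
        self.wfile.write(self.rfile.readline())

threaded_server = socketserver.ThreadingTCPServer(('localhost', 8001), EchoHandler)
# threaded_server.serve_forever() would start handling requests

# Evented model: one event loop serves every connection, so handlers must
# await instead of blocking.
async def echo(reader, writer):
    writer.write(await reader.readline())
    await writer.drain()
    writer.close()

async def run_evented():
    evented_server = await asyncio.start_server(echo, 'localhost', 8002)
    async with evented_server:
        await evented_server.serve_forever()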
To choose between the two approaches, you need to determine the server's load profile.
Suppose each request requires c milliseconds of CPU time and w milliseconds of real (wall-clock) time to process. CPU time is spent on active computation, while real time also includes all waiting on external resources. For example, a request might need c = 5 ms of CPU time and 95 ms of waiting for a database response, for a total of w = 100 ms. Let's also assume that the threaded server can support up to t threads before scheduling and context-switching problems begin.
If each request requires only CPU time to process, the server can respond to at most 1000 / c requests per second. For example, if each request takes 2 milliseconds of CPU time, that gives 1000 / 2 = 500 requests per second.
In general, a multi-threaded server can handle t * 1000 / w requests per second.
The throughput of the threaded server is the minimum of these two expressions (1000 / c and t * 1000 / w). The evented server is limited only by CPU performance (1000 / c), since it uses a single thread. All of the above can be expressed as follows:
def max_request_rate(t, c, w):
    # CPU limit: each request costs c ms of CPU time.
    cpu_bound = 1000.0 / c
    # Thread limit: t threads, each occupied for w ms per request.
    thread_bound = t * 1000.0 / w
    print('threaded: %d\nevented: %d' % (min(cpu_bound, thread_bound), cpu_bound))
Now let's look at different types of servers and see how they behave under each implementation.
For the examples, I will use t = 100.
Let's start with a classic example: an HTTP proxy server. This type of server requires almost no CPU time, so suppose c = 0.1 ms. Suppose also that the servers behind the proxy respond with a delay of, say, w = 50 ms. Then we get:
>>> max_request_rate(100, 0.1, 50)
threaded: 2000
evented: 10000
Our calculations show that the threaded server can process 2,000 requests per second and the evented one 10,000. The higher throughput of the evented server tells us that the number of threads has become the bottleneck for the threaded server.
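To see where the crossover lies, we can compute how many threads the threaded proxy would need before CPU time, rather than the thread limit, becomes the constraint; the helper below is not from the original article, just a small illustration of the formulas above:

def threads_to_saturate_cpu(c, w):
    # The threaded server becomes CPU-bound once t * 1000 / w >= 1000 / c,
    # i.e. once t >= w / c.
    return w / c

print(threads_to_saturate_cpu(0.1, 50))  # 500.0 threads needed in the proxy example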
Another example is a web application server. First, consider an application that does not use any external resources but performs some computation, taking, say, c = 5 ms. Because no blocking calls are made, w will also be 5 ms. Then:
>>> max_request_rate(100, 5, 5)
threaded: 200
evented: 200
In this case, CPU performance is the bottleneck.
Now imagine that the application needs to request data from an external resource: the CPU time c is 0.5 ms, and the total time w = 100 ms.
>>> max_request_rate(100, 0.5, 100)
threaded: 1000
evented: 2000
As these simple calculations show, and as follows directly from the formulas, a synchronous threaded implementation can never achieve higher throughput than an asynchronous evented one.
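To put the three scenarios side by side, we can simply loop over the same numbers using the max_request_rate function defined above:

# (t, c, w) for the proxy, the CPU-bound app, and the app with an external resource
for t, c, w in [(100, 0.1, 50), (100, 5, 5), (100, 0.5, 100)]:
    print('t=%d, c=%.1f ms, w=%.1f ms' % (t, c, w))
    max_request_rate(t, c, w)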
However, keep in mind that what the end user cares about most is real time, the time spent waiting for a response. Problems begin when the application has to do something lengthy: run a large and complex database query, perform heavy computation, process images, call someone else's server, and so on.
As soon as a single-threaded program serving ten thousand clients in its loop has to linger over one of them, say for a whole second, every other client waits out that second as well. If there are many such requests, no client gets a response sooner than the combined processing time of the requests ahead of it. That is cooperative multitasking in action. Threaded solutions, by contrast, rely on preemptive multitasking: the operating system will not let one "heavy" request eat up everyone else's time.
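A toy asyncio experiment illustrates the difference; the one-second time.sleep below stands in for any blocking operation, and the handler names are made up for this example:

import asyncio
import time

async def heavy():
    # Blocking call: the whole event loop is stuck for this second,
    # so every other client waits too.
    time.sleep(1)

async def light():
    # Cooperative call: control returns to the loop while waiting.
    await asyncio.sleep(1)

async def main():
    start = time.monotonic()
    await asyncio.gather(*[light() for _ in range(10)])
    print('10 cooperative requests: %.1f s' % (time.monotonic() - start))  # ~1 s

    start = time.monotonic()
    await asyncio.gather(*[heavy() for _ in range(10)])
    print('10 blocking requests:    %.1f s' % (time.monotonic() - start))  # ~10 s

asyncio.run(main())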
On the one hand, we have production-ready asynchronous solutions whose bottleneck is the difficulty of writing programs, since most existing libraries are written in a blocking style. The programmer also takes on the responsibility of making sure that an error in one request does not affect the others.
On the other hand, we have blocking solutions that we are already used to writing programs for. The limit on the number of supported threads can be raised with specialized tools such as greenlets or Erlang processes. If threaded servers reach the point where the number of threads is no longer the bottleneck, they will look more attractive thanks to their response times and reliability.
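As an illustration of what such tools look like, here is a minimal sketch with gevent, which is built on greenlets; monkey.patch_all, spawn, and joinall are real gevent calls, while the request handler is just a stand-in for real work:

from gevent import monkey
monkey.patch_all()  # make blocking stdlib calls (sockets, sleep) cooperative

import gevent

def handle_request(i):
    # Written in plain blocking style, but each call runs in a
    # cheap greenlet instead of a full OS thread.
    gevent.sleep(0.1)  # stands in for a database or network call
    return i

jobs = [gevent.spawn(handle_request, i) for i in range(10000)]
gevent.joinall(jobs)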