
Introduction to web application development with PSGI/Plack. Part 4. Asynchrony

This article is published with the permission of the author and the editor-in-chief of the PragmaticPerl.com magazine.
The original (Russian) article can be read here.

This continues the series of articles on development for PSGI/Plack; this time we deal with asynchrony.
In the previous articles we looked at the main aspects of PSGI/Plack development, which in principle are sufficient for writing applications of almost any complexity.

We figured out what PSGI is and how Plack works, then looked at how the main Plack components (Plack::Builder, Plack::Request, Plack::Middleware) fit together. Then we examined Starman in detail, a good PSGI server that is ready for production use.

Nuance


Everything considered so far concerned the execution model called synchronous. Now let's look at the asynchronous one.

Synchronization and asynchrony


The synchronous model is simple and straightforward: everything happens one after another, in a fixed order, within a single flow of execution. Consider one interpreter process that runs a loop in which one of the steps is reading user input. The next loop iteration will not start until the previous one, including the wait for the user's input, has completed. This is the synchronous model.
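As a toy illustration (hypothetical, not from the original article), such a synchronous loop can be sketched in Perl. An in-memory filehandle stands in for the real user so the sketch runs anywhere:

```perl
use strict;
use warnings;

# Simulated user input: in a real program this would be *STDIN.
my $input = "beer\nwine\nquit\n";
open my $fh, '<', \$input or die $!;

my @served;
while (my $line = <$fh>) {      # the read blocks until a line is available
    chomp $line;
    last if $line eq 'quit';
    push @served, $line;        # the "useful work" of this iteration
}
# @served now holds ('beer', 'wine')
```

Each iteration is stuck at the read until input arrives; nothing else can happen in the meantime.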

While the user enters nothing, the program waits for input and does nothing useful. This situation is called blocking. In this state a naive program simply wastes processor time. If, however, the program does something else useful while waiting for input, the execution becomes asynchronous, and the situation, accordingly, non-blocking.
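A minimal sketch of the non-blocking variant, using the core IO::Select module and a pipe that stands in for the user (the names and the scenario here are illustrative):

```perl
use strict;
use warnings;
use IO::Handle;
use IO::Select;

# A pipe stands in for the "user": data appears on $reader only after
# something is written to $writer.
pipe(my $reader, my $writer) or die $!;
$writer->autoflush(1);

my $select     = IO::Select->new($reader);
my $other_work = 0;

# Before the user has typed anything, can_read(0) returns false at once,
# so instead of blocking we are free to do something else:
$other_work++ if !$select->can_read(0);

print {$writer} "beer\n";    # the "user" finally types a line

# Now input is ready and can be read without blocking:
my $got;
if ($select->can_read(1)) {
    $got = <$reader>;
    chomp $got;
}
```

The key difference from the blocking loop: the zero-timeout `can_read(0)` check returns immediately, so the time the user spends thinking can be spent on other work.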

We go to the bar


Consider a bar as an example. A simple bar or pub in which customers sit and drink beer. A lot of customers. The bar has two waiters, Bob and Joe, who work in two different ways. Bob approaches a client, takes the order, goes to the bar, orders a glass of beer from the bartender, waits until the bartender pours it, carries it to the customer, and the cycle repeats. Bob works synchronously. Joe works completely differently. He takes an order from a client, goes to the bartender and says: "Hey, pour a glass of %beername%", then goes to take the order from the next client. As soon as the bartender has poured a glass, he calls Joe, who picks it up and carries it to the customer.

In this case Bob works synchronously and Joe, accordingly, asynchronously. Joe's working model is event-driven, which is the most popular model for asynchronous systems. In our case, the wait for input is the time required to fill the glass with beer, the event manager is the bartender, and the event is the bartender's shout "%beername% is poured".

Problem


At this point, readers who have never worked with asynchronous systems probably have a question: "Why do anything synchronously at all, if asynchrony is faster and more convenient?"

This is a popular belief, but a fallacy. Asynchronous solutions have their own set of problems and disadvantages. In many places you can read that asynchronous solutions perform better than synchronous ones. Yes and no.

Let's go back to the waiters. Bob works slowly, telling jokes to the bartender and delivering glasses at a steady pace, while Joe constantly rushes around like mad. The load on Joe is of course higher, because he does much more in the same amount of time. The load on Bob is minimal as long as there are few customers. As soon as there are many customers, they begin loudly demanding their beer and rushing Bob. The pressure from the clients' side grows, but Bob keeps working at the same pace; he does not care, and he is not going to change his scheme of work even if the sky falls.

From this we can conclude that asynchrony is not bad in itself, but it should be understood that an asynchronous system will always be under load. The total load is in principle the same as for a synchronous system, with one difference: the synchronous system is subject to peak loads, while the asynchronous one "smears" the same load over the execution time.

And most importantly, we must not forget that any system can truly execute only as many tasks simultaneously as there are processor cores available to the process.

Asynchronous PSGI / Plack


A classic Plack application (skipping the builder part):

    my $app = sub {
        my $env = shift;
        my $req = Plack::Request->new($env);
        my $res = $req->new_response(200);
        $res->body('body');
        return $res->finalize();
    };


From the code it is clear that the $app scalar holds a reference to a function that returns a valid PSGI response (a reference to an array). Thus it is a reference to a function that returns a reference to an array. You could try to add asynchrony here, but nothing good would come of it, because the executing process would still be blocked.

A PSGI application that is a reference to a function returning a reference to an array must run to completion, and only then does it release the thread of execution.
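This contract can be checked with plain Perl, without any server (a minimal sketch; the empty hash passed in is just a stand-in for the real PSGI environment):

```perl
use strict;
use warnings;

# A synchronous PSGI application: a code reference that returns the
# finished three-element response array reference.
my $app = sub {
    my $env = shift;
    return [ 200, [ 'Content-Type' => 'text/plain' ], [ "body\n" ] ];
};

my $res = $app->({});    # the server calls the app with the environment
# $res->[0] is the status, $res->[1] the headers, $res->[2] the body
```

Only after the sub has returned its array reference does the server get control back, which is exactly why this shape cannot be made non-blocking.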

Naturally, this code will work correctly on any PSGI server, since it is synchronous. Any asynchronous server can execute synchronous code, but a synchronous server cannot execute asynchronous code. The code above is synchronous. In the previous article we mentioned a PSGI server called Twiggy. I recommend installing it if you have not already. This can be done in several ways: with cpan (cpan install Twiggy), with cpanm (cpanm Twiggy), or from GitHub.

Twiggy


Twiggy is an asynchronous server. Twiggy and Starman have the same author, @miyagawa.

@miyagawa describes Twiggy as follows: "PSGI/Plack HTTP server based on AnyEvent."

Twiggy was a supermodel of the 1960s who, many believe, started the fashion for being "thin". Since the server is very "light", "thin", and "small", the name was not chosen by chance.

Delayed response


A PSGI application with a deferred (delayed) response is presented in the documentation as follows:

    my $app = sub {
        my $env = shift;
        return sub {
            my $responder = shift;
            # fetch_content_from_server() and $headers are placeholders
            # from the documentation example
            fetch_content_from_server(sub {
                my $content = shift;
                $responder->([ 200, $headers, [ $content ] ]);
            });
        };
    };


Let's see how this works, in order to understand how to use it and to write our own application with a deferred response.

The application is a reference to a function that returns another function, which will be executed once certain conditions are met (a callback). So the application is a function reference that returns a function reference; that is all you need to understand. If the PSGI environment variable "psgi.streaming" is set, the server will attempt to perform this operation in non-blocking mode, i.e. asynchronously.
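The shape of this contract can be demonstrated without a real server: call the application, then invoke the returned code reference with a responder callback, much as an asynchronous server would (a simplified sketch; the minimal $env hash is illustrative):

```perl
use strict;
use warnings;

# A deferred-response app: a coderef that returns a coderef.
my $app = sub {
    my $env = shift;
    return sub {
        my $responder = shift;
        $responder->([ 200, [ 'Content-Type' => 'text/plain' ], [ "ok\n" ] ]);
    };
};

# What an asynchronous server does under the hood:
my $captured;
my $responder = sub { $captured = shift };    # records the PSGI response
$app->({ 'psgi.streaming' => 1 })->($responder);
# $captured now holds the response: [ 200, [...], [ "ok\n" ] ]
```

The point is that the server does not need the response at the moment the app returns; it only needs the callback, which may fire much later.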

So how does it work?

If you run such an application on Starman, there will be no difference, but when a deferred response runs on an asynchronous server, the server can go on accepting new requests while earlier callbacks are still pending.

If the model were synchronous, the server would not be able to accept a single new request until the previous one had completed.

Let's write an application using the deferred response mechanism. It will look like this:

    use strict;
    use warnings;

    my $app = sub {
        my $env = shift;
        return sub {
            my $responder = shift;
            my $body = "ok\n";
            $responder->([ 200, [], [ $body ] ]);
        };
    };

    $app;

And now let's launch the application both with Starman and with Twiggy.

The launch command for Starman does not change and looks like this:
 starman --port 8080 app.psgi 


To run using Twiggy:
 twiggy --port 8081 app.psgi 


Now let's make a request first to one server, then to the other.

Request to Starman:
    curl localhost:8080/
    ok


Request to Twiggy:
    curl localhost:8081/
    ok


So far there is no difference, and the servers work the same way.

Now let's do a simple experiment with Twiggy and Starman. Imagine we need to write an application that performs some work at the client's request and reports back when the work is done. Since we don't want to hold the client's connection, we simulate that work with AnyEvent->timer() for Twiggy and sleep 5 for Starman. In general, sleep is not the best option here, but we have no other, because code using AnyEvent will not work in Starman.

So, let's implement both options.

Blocking:
    use strict;

    sub {
        my $env = shift;
        return sub {
            my $responder = shift;
            sleep 5;
            warn 'Hi';
            $responder->([ 200, [ 'Content-Type' => 'text/plain' ], [ 'Hi' ] ]);
        };
    };

No matter how we launch it, whether with Starman or with Twiggy, the result will always be the same. Start it first with Starman, using the following command:
 starman --port 8080 --workers=1 app.psgi 


Warning: for the purity of the experiment, Starman should be run with a single worker.
Hitting the server from two different terminals at the same time, we can watch how this application executes. First the worker takes the first request and begins executing it; meanwhile the second request waits in the queue. As soon as the first request completes, the server begins processing the next one.

In total, the two requests take approximately 10 seconds (the second one starts processing only after the first finishes). With three requests the estimated execution time would be about 15 seconds. This situation is called blocking.

Asynchronous code


If you run the previous example with Twiggy, the result will be exactly the same. The question may now arise: why is an asynchronous server needed at all, if it blocks just like Starman?

The point is that for something to work asynchronously, a mechanism is needed that provides the asynchrony, for example an event loop.
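To make the idea concrete, here is a toy timer-only event loop in plain Perl. This is emphatically not how AnyEvent is implemented, just an illustration of the mechanism: callbacks are registered up front, and a loop fires each one when its time comes.

```perl
use strict;
use warnings;
use Time::HiRes qw(time sleep);

my @timers;    # each entry: [ fire_time, callback ]

sub add_timer {
    my ($after, $cb) = @_;
    push @timers, [ time() + $after, $cb ];
}

sub run_loop {
    while (@timers) {
        # Pick the timer that is due soonest.
        @timers = sort { $a->[0] <=> $b->[0] } @timers;
        my $t     = shift @timers;
        my $delay = $t->[0] - time();
        sleep($delay) if $delay > 0;    # wait until it is due
        $t->[1]->();                    # fire the callback
    }
}

my @fired;
add_timer(0.2, sub { push @fired, 'second' });
add_timer(0,   sub { push @fired, 'first' });
run_loop();
# @fired is now ('first', 'second'): callbacks run in due-time order,
# not registration order.
```

A real event loop also watches file descriptors and signals, but the principle is the same: the program's flow is driven by events, not by a fixed sequence of statements.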

Twiggy is built around AnyEvent, whose event loop starts when the server starts, and we can use it immediately afterwards. It is also possible to use Coro, about which there will certainly be a separate article.

Now let's write code that will not work with Starman and get a proper asynchronous application.

Let's put the code in order and make the application asynchronous. As a result, we should have something like this:
    sub {
        my $env = shift;
        return sub {
            my $respond = shift;
            $env->{timer} = AnyEvent->timer(
                after => 5,
                cb    => sub {
                    warn 'Hi' . time() . "\n";
                    $respond->([ 200, [], [ 'Hi' . time() . "\n" ] ]);
                },
            );
        };
    };


It is worth remembering that there will always be blocking somewhere; exactly where depends on how the code is written. The less time the server spends blocked, the better.

How does it work?

The timer is started first. The key point is that inside the return sub {...} you must assign the watcher object (AnyEvent->timer(...)) to a variable declared before the return sub {...}, or use a condvar. Otherwise the timer will never fire, because AnyEvent will decide the watcher is no longer needed and destroy it. When the timer expires, the event fires, the callback executes, and the server returns the result. If you make, say, three requests from different terminals, they will all execute asynchronously, and each will receive its response from its own timer event. Most importantly, there is no blocking, as the STDERR output from three requests made from different terminals shows:

    twiggy --port 8080 app.psgi
    Hi1372613810
    Hi1372613811
    Hi1372613812

The server was launched by the following command:

 twiggy --port 8080 app.psgi 


And requests were executed using curl:
 curl localhost:8080 
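The watcher-retention rule described above follows from guard semantics: an AnyEvent watcher is cancelled when its object is destroyed, so a watcher whose return value is discarded dies immediately. The effect can be illustrated without AnyEvent, using a hypothetical Guard class that runs a callback in its destructor:

```perl
use strict;
use warnings;

package Guard;
sub new     { my ($class, $cb) = @_; bless { on_destroy => $cb }, $class }
sub DESTROY { $_[0]{on_destroy}->() }

package main;

my $cancelled = 0;
{
    # Like calling AnyEvent->timer(...) and discarding the return value:
    Guard->new(sub { $cancelled++ });
    # The object has no holder, so it is destroyed immediately,
    # before any event could ever fire.
}
# $cancelled is now 1: the "watcher" was cancelled right away.
```

Storing the watcher in $env->{timer}, as the application above does, keeps the object alive until the callback has had a chance to run.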


Recall that a preforking server in its classic form is synchronous: concurrent requests are handled by a certain number of workers. That is, if you run the previous synchronous code:

    use strict;

    sub {
        my $env = shift;
        return sub {
            my $responder = shift;
            sleep 5;
            warn 'Hi';
            $responder->([ 200, [ 'Content-Type' => 'text/plain' ], [ 'Hi' ] ]);
        };
    };

with several workers, two requests will indeed execute simultaneously. But that is not asynchrony; it is simply each request being handled by its own worker process. That is how Starman, a preforking PSGI server, operates.

Take the asynchronous example:
    sub {
        my $env = shift;
        return sub {
            my $respond = shift;
            $env->{timer} = AnyEvent->timer(
                after => 5,
                cb    => sub {
                    warn 'Hi' . time() . "\n";
                    $respond->([ 200, [], [ 'Hi' . time() . "\n" ] ]);
                },
            );
        };
    };

Start it with the following command:

 twiggy --port 8080 app.psgi 

and repeat the experiment with two simultaneous requests.

Indeed, Twiggy runs as a single process, but nothing prevents it from doing other useful work while it waits. This is asynchrony.

This example was used solely to demonstrate how a deferred response can be applied. For a better understanding of how Twiggy works, it is recommended to read the AnyEvent articles in previous issues of the magazine ("Everything you wanted to know about AnyEvent but were afraid to ask" and "AnyEvent and fork").

At the moment there is a fairly large number of PSGI servers that support event loops.


Conclusions


Any technology has its nuances. The decision about which approach to use should be made from the facts of each specific task, not by applying the asynchronous approach everywhere just because it is fashionable.
Dmitry Shamatrin

Source: https://habr.com/ru/post/248457/

