Asynchronous programming beats synchronous programming, both in memory consumption and in performance, and we have known this for years. Yet if you look at Django or Ruby on Rails, perhaps the two most promising web frameworks to emerge over the past few years, both are built on a synchronous style. Why, even in 2010, do we still write programs that rely on synchronous programming?
The reason we are stuck with synchronous programming is twofold. First, writing code directly against asynchronous behavior is inconvenient. Second, popular and/or common languages lack the built-in constructs required to implement less direct approaches to asynchronous programming.
Asynchronous programming is too hard
Let's first look at the direct implementation: the event loop. In this approach, we have a single process with a tight infinite loop. Functionality is achieved by quickly executing small tasks inside this loop. One task may read a few bytes from a socket, another may write a few bytes to a file, and yet another may do some computation, for example, XOR the data buffered from the first socket.
The most important property of this loop is that one and only one task runs at any given moment. This means you have to break your logic into small pieces that execute in turn. And if one of those functions blocks, it stalls the whole loop, and nothing else can run in the meantime.
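To make the mechanics concrete, here is a toy event loop in Python. The task functions and the convention of returning a follow-up task are invented for illustration; no real framework works exactly this way:

```python
# A toy event loop: one process, one queue of small tasks.
# Each task does a little work and may schedule a follow-up task.
from collections import deque

def run_event_loop(tasks):
    """Run callables from a queue until it is empty.

    Exactly one task runs at any moment; a task that blocked
    would stall every other task waiting in the queue.
    """
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        next_task = task()          # do one small piece of work
        if next_task is not None:   # the task asked to continue later
            queue.append(next_task)

log = []

def read_some_bytes():
    log.append('read')
    return write_some_bytes         # schedule the next step

def write_some_bytes():
    log.append('write')

def xor_buffers():
    log.append('xor')

run_event_loop([read_some_bytes, xor_buffers])
# log is now ['read', 'xor', 'write']: the tasks interleave in one loop
```

Note how the "read" task cannot simply keep going until the write: it has to hand control back to the loop and get rescheduled, which is exactly the chopping-into-pieces the text describes.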
We have some really good frameworks designed to make working with event loops easier. In Python there is Twisted and, somewhat newer, Tornado. Ruby has EventMachine. Perl has POE. These frameworks help in two ways: they provide constructs that make the event loop easier to work with (such as Deferreds or Promises), and they provide asynchronous implementations of common tasks, for example, HTTP or DNS clients.
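As a rough sketch of the kind of construct these frameworks provide, here is a minimal Deferred-like class in Python. This is an illustration of the idea only, not Twisted's actual implementation:

```python
# A minimal Deferred in the spirit of Twisted's: a placeholder for a
# result that does not exist yet, plus a chain of callbacks to run
# once it does. Hypothetical sketch, not the real Twisted API surface.
class Deferred:
    def __init__(self):
        self._callbacks = []
        self._fired = False
        self._result = None

    def addCallback(self, fn):
        if self._fired:
            fn(self._result)            # result already known: call now
        else:
            self._callbacks.append(fn)  # otherwise remember for later
        return self

    def callback(self, result):
        # Fired by the event loop when the async operation finishes.
        self._fired = True
        self._result = result
        for fn in self._callbacks:
            fn(result)

results = []
d = Deferred()
d.addCallback(results.append)        # register interest in the result
d.callback('<html>post</html>')      # later, the I/O completes
# results is now ['<html>post</html>']
```

The value of the construct is that code can register interest in a result before the I/O has happened; the cost, as the next example shows, is that your logic gets sliced into callbacks.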
But these frameworks are not good enough for asynchronous programming, for two reasons. First, we have to change our coding style. Imagine what it would look like to render a simple blog page with comments. Here is a small piece of JavaScript showing how this works in a synchronous framework:
```javascript
function handleBlogPostRequest(request, response, postSlug) {
    var db = new DBClient();
    var post = db.getBlogPost(postSlug);
    var comments = db.getComments(post.id);
    var html = template.render('blog/post.html',
                               {'post': post, 'comments': comments});
    response.write(html);
    response.close();
}
```
And now a piece of code demonstrating how the same page could be rendered in an asynchronous framework. A few things to note right away: the code is deliberately written so that four levels of nesting are not required, and the callbacks are written inside handleBlogPostRequest to take advantage of closures for access to the request and response objects, the template context, and the database client. Avoiding nesting and managing closures are things we must think about while writing such code, yet they never even come up in the synchronous version.
```javascript
function handleBlogPostRequest(request, response, postSlug) {
    var context = {};
    var db = new DBClient();
    function pageRendered(html) {
        response.write(html);
        response.close();
    }
    function gotComments(comments) {
        context['comments'] = comments;
        template.render('blog/post.html', context).addCallback(pageRendered);
    }
    function gotBlogPost(post) {
        context['post'] = post;
        db.getComments(post.id).addCallback(gotComments);
    }
    db.getBlogPost(postSlug).addCallback(gotBlogPost);
}
```
By the way, I chose JavaScript to make the point. People are very excited about node.js right now, and it is a very cool framework, but it does not hide the complexity inherent in asynchrony. It only hides some implementation details of the event loop.
The second reason these frameworks are not good enough is that not all I/O can be handled properly at the framework level, and in those cases you have to resort to hacks. For example, MySQL does not provide an asynchronous driver, so most well-known frameworks use threads to make that communication work out of the box.
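Here is a sketch of that hack in Python: a blocking query function is pushed onto a worker thread, and a callback receives the result. The function names are invented for illustration, and real frameworks also take care to marshal the callback back onto the event-loop thread, which this sketch omits:

```python
# Faking an asynchronous database client when the driver only offers
# blocking calls: run the query on a worker thread and hand the result
# to a callback. Illustrative sketch, not any framework's real code.
import threading

def async_query(blocking_query, sql, on_result):
    def worker():
        result = blocking_query(sql)   # blocks, but only this thread
        on_result(result)
    t = threading.Thread(target=worker)
    t.start()
    return t

# Stand-in for a blocking driver call.
def fake_blocking_query(sql):
    return [('post', 1)]

rows = []
t = async_query(fake_blocking_query, 'SELECT * FROM posts', rows.extend)
t.join()
# rows is now [('post', 1)]
```

The event loop stays responsive because the blocking call happens elsewhere, but we have quietly reintroduced threads into a supposedly single-threaded model, which is exactly why this counts as a hack.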
The resulting awkward APIs, the added complexity, and the simple fact that most developers will not change their coding style lead us to the conclusion that this type of framework is not the final solution to the problem (although I grant that you can do real work with these frameworks today, as many programmers already do). Which leads us to ask: what other options do we have for asynchronous programming? Coroutines and lightweight processes, which brings us to a new and important problem.
Languages do not support lighter asynchronous paradigms
There are several language constructs that, if correctly implemented in modern languages, can pave the way for alternative methods to write asynchronously, while also avoiding the drawbacks of the event loop. These constructs are coroutines and lightweight processes.
A coroutine is a function that can pause and later resume execution at specific, programmatically defined points. This simple concept makes it possible to turn blocking-looking code into non-blocking code. At several critical points in your I/O library, the low-level functions performing I/O can decide to cooperate: one may pause its execution while another resumes, and so on.
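Python generators already behave like a limited form of coroutine: `yield` marks exactly such a programmatically defined pause point. A small illustration (all the names here are made up):

```python
# A generator-based coroutine pauses at each `yield` and, when resumed
# with next(), continues exactly where it left off, locals intact.
trace = []

def log_reader(name, chunks):
    for chunk in chunks:
        trace.append((name, chunk))
        yield                 # pause here; something else may run now

a = log_reader('google', ['g1', 'g2'])
b = log_reader('yahoo', ['y1'])

next(a)   # google reads 'g1', then pauses
next(b)   # yahoo reads 'y1', then pauses
next(a)   # google resumes mid-loop and reads 'g2'
# trace is now [('google', 'g1'), ('yahoo', 'y1'), ('google', 'g2')]
```

The two readers interleave even though each is written as a plain sequential loop; the pause points are the only visible concession to asynchrony.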
Here is an example (in Python, but I think it is clear):
```python
from urllib2 import urlopen

def download_pages():
    google = urlopen('http://www.google.com/').read()
    yahoo = urlopen('http://www.yahoo.com/').read()
```
This usually works like this: a new socket is opened and connected to Google, the HTTP request is sent, and the full response is read, buffered, and assigned to the google variable. Then the same happens for the yahoo variable.
Ok, now imagine that the underlying socket implementation was built with coroutines that cooperate with each other. As before, a socket is opened and a connection is established with Google, after which the request is sent. But this time, after sending the request, the google line pauses its execution.
Having paused (but not returned a value yet), execution continues from the next line. The same thing happens on the yahoo line: as soon as its request is sent, the yahoo line pauses. But by then there is work to resume elsewhere, for example, some data may be ready to read from the Google socket, so execution returns to that point. It reads some data from the Google socket and pauses again.
Execution jumps back and forth between the two coroutines until one of them completes. Say the Yahoo socket finishes but Google's has not: the google line then keeps reading its socket until completion, since there are no other coroutines left to cooperate with. Once the Google socket is finally done, the function returns the entire buffer. Then the yahoo line returns all of its data.
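The interleaving just described can be sketched with Python generators and a toy round-robin scheduler. Everything here is hypothetical: `fake_download` stands in for a coroutine-aware urlopen, and the little scheduler stands in for the framework's real event loop:

```python
# Two coroutines "downloading" pages ping-pong between each other;
# when one finishes, the other keeps running alone until done.
from collections import deque

events = []

def fake_download(host, chunks):
    buffered = []
    for chunk in chunks:
        events.append((host, chunk))   # "read some data from the socket"
        buffered.append(chunk)
        yield                          # pause; let another coroutine run
    events.append((host, 'done'))
    return ''.join(buffered)           # delivered via StopIteration.value

def run_all(coroutines):
    results = {}
    queue = deque(coroutines.items())
    while queue:
        name, coro = queue.popleft()
        try:
            next(coro)
            queue.append((name, coro))  # not finished: back of the queue
        except StopIteration as stop:
            results[name] = stop.value  # the coroutine's return value
    return results

results = run_all({
    'google': fake_download('google', ['g1', 'g2', 'g3']),
    'yahoo':  fake_download('yahoo', ['y1']),
})
# events shows the interleaving: google and yahoo alternate until
# yahoo is done, then google reads alone; results holds both buffers.
```

As in the prose above, yahoo finishes first while google keeps reading its remaining chunks unopposed, and each "download" still looks like ordinary sequential code.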
We kept the style of our blocking code, yet we got asynchronous execution. Most remarkably, we preserved the original order of our program: the google variable is assigned first, then yahoo. In fact, somewhere underneath there is a clever event loop deciding who runs next, but the use of coroutines hides it from us.
Languages like PHP, Python, Ruby, and Perl simply do not have built-in coroutines fast enough to perform such a transformation behind the scenes. So what about lightweight processes?
Lightweight processes are what Erlang uses as its main concurrency primitive. These processes are implemented almost entirely inside the Erlang VM. Each process carries only about 300 words of overhead, is scheduled by the Erlang VM itself, and shares no state with any other process. In practice, we barely need to think about the cost of creating a process; it is practically free. The trick is that processes can communicate only through message passing.
Implementing lightweight processes at the virtual-machine level eliminates the extra memory consumption, the context switches, and the relatively slow inter-process communication imposed by the operating system. The VM has full access to the stack of every process and can freely move or resize those processes and their stacks. This is something the OS simply cannot do.
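Erlang-style message passing can be roughly approximated in Python with OS threads and queues as mailboxes. This captures the share-nothing, message-only style of communication, though Python threads are vastly heavier than Erlang's 300-word processes, which is precisely the article's point:

```python
# A share-nothing "process" that communicates only via its mailbox.
# Rough approximation of Erlang's model using stdlib threads/queues.
import queue
import threading

def adder_process(mailbox, reply_to):
    # Receives (a, b) messages; touches no state owned by the sender.
    while True:
        msg = mailbox.get()
        if msg == 'stop':
            break
        a, b = msg
        reply_to.put(a + b)   # reply is itself just another message

mailbox = queue.Queue()
replies = queue.Queue()
t = threading.Thread(target=adder_process, args=(mailbox, replies))
t.start()

mailbox.put((1, 2))           # send messages...
mailbox.put((10, 20))
mailbox.put('stop')
t.join()
answers = [replies.get(), replies.get()]
# answers is now [3, 30]
```

Because the two sides share nothing and only exchange messages, neither needs locks; the cost is that each Python "process" here is a full OS thread, exactly the overhead the Erlang VM avoids.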
With this model of lightweight processes, we can return to the familiar model of using separate processes for all our asynchronous needs. The question becomes: can the concept of a lightweight process be implemented in languages other than Erlang? My answer is "I don't know." It seems to me that Erlang relies on certain properties of the language (such as immutable data structures; translator's note: its variables cannot be reassigned) for its implementation of lightweight processes.
Where to go from here
The key problem is that asynchronous, event-loop-based frameworks require developers to think about their code in terms of callbacks and asynchrony. After 10 years, we still see that most developers faced with this demand simply ignore it. They keep using the familiar blocking methodologies of the past.
We should pay attention to alternative implementations such as coroutines and lightweight processes, which can make asynchronous programming as simple as synchronous programming. Only then will we be able to shed our attachment to synchrony.
Translator's note: meanwhile, coroutines are already in active use, at least in Python.