Stackless Python and Concurrence

Before moving on to the capabilities of Stackless and Concurrence, consider the easiest way to write a network application that handles several simultaneous connections:

socket() bind() listen() accept() fork() -> read() write() ... close()

For each new incoming connection, the process creates a copy of itself with fork(). This is an extremely wasteful approach, and it also makes synchronization between processes difficult. In simple cases the problem is solved by creating pipes between the parent and child processes and serializing the data; more complex cases require interprocess synchronization primitives. Then there is the cost of creating, destroying, and switching between processes: these operations are expensive in both memory and CPU time. As a result, handling many simultaneous connections this way is very difficult.
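To make the pipe-and-serialization overhead concrete, here is a minimal sketch in standard Python 3 for Unix-like systems. The helper name run_in_child and the choice of JSON as the serialization format are assumptions made for illustration; the article does not prescribe either.

```python
import json
import os


def run_in_child(func, *args):
    """Run func(*args) in a forked child and return its JSON-serialized result.

    Hypothetical helper: it only illustrates the fork + pipe + serialization
    pattern and does no error reporting beyond the child's exit status.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child process: compute, serialize, write to the pipe, then exit.
        os.close(read_fd)
        status = 1
        try:
            with os.fdopen(write_fd, "w") as writer:
                json.dump(func(*args), writer)
            status = 0
        finally:
            # os._exit skips the parent's cleanup handlers in the child.
            os._exit(status)
    # Parent process: read until the child closes its end of the pipe.
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        payload = reader.read()
    os.waitpid(pid, 0)  # reap the child so it does not linger as a zombie
    return json.loads(payload)
```

Even for a trivial result, the data is copied through the kernel and encoded and decoded once, on top of the cost of the fork itself.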

However, this approach has one important advantage: the code is extremely simple. The logic of the application maps directly onto language constructs. If you need to receive data over the network in a loop, you write a loop statement. If one action must be followed by another, you write two consecutive statements, and so on.
If we create separate threads within one process instead of new processes, some of these problems go away, and exchanging data between threads becomes much easier. Shared objects can be allocated with the ordinary means of the language, references to them can be passed between threads directly, and no resources are wasted on serialization. This saves a lot of CPU time but does not remove the need to explicitly synchronize access to shared objects. In addition, each operating system thread has its own stack, which takes several kilobytes of memory; multiplied by the number of simultaneous connections, this can add up to several hundred megabytes. Even if you can accept the memory cost (memory is cheap), the computational cost of creating and destroying threads, context switching, and synchronization remains quite noticeable. On top of that, Python is burdened by the GIL, which further reduces the efficiency of multi-threaded applications.
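A thread-per-connection server might look like the sketch below, written for standard Python 3. The names handle, serve_threaded, stats, and stats_lock are illustrative assumptions. The shared dictionary is passed by reference with no serialization, but every update still has to be guarded by a lock.

```python
import socket
import threading

# Shared state, visible to every thread without any serialization.
stats = {"bytes": 0}
stats_lock = threading.Lock()


def handle(conn):
    """Echo data back on one connection until the peer closes it."""
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:
                break
            # Explicit synchronization is still required for shared objects.
            with stats_lock:
                stats["bytes"] += len(data)
            conn.sendall(data)


def serve_threaded(listener, max_connections):
    """Accept max_connections clients, starting one OS thread for each."""
    threads = []
    for _ in range(max_connections):
        conn, _addr = listener.accept()
        thread = threading.Thread(target=handle, args=(conn,))
        thread.start()
        threads.append(thread)
    return threads
```

The handler is still plain sequential code, which is the appeal of this model; the price is one operating system thread, with its own stack, per connection.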

The next engineering step toward higher performance was single-threaded applications built around an explicitly specified finite state machine. Each connection is represented by an automaton; in each of its states the input data is processed in some way, which in turn causes further state changes. Because the state of the automaton is only a small data structure, such automata can be created, stored, and destroyed much faster than operating system threads, and they occupy far less memory.

In such applications the main loop polls all open connections for events: data has arrived, an error has occurred, or space has become available in the send buffer. For each event, the corresponding handler of the state machine is called. To optimize this polling (and quickly polling tens of thousands of connections is not an easy task), modern operating systems provide very efficient but mutually incompatible interfaces (kqueue, epoll, and others). Libraries such as libevent were developed for writing portable network applications; they hide the details of connection polling from the programmer.
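The polling loop described above can be sketched with the selectors module from the Python 3 standard library, which plays a role similar to libevent by picking epoll, kqueue, or another mechanism depending on the platform. The function names echo_once and poll_events are assumptions for illustration, and the handler is a bare echo rather than a full state machine.

```python
import selectors
import socket


def echo_once(sel, conn):
    """Event handler: echo whatever arrived, or clean up on end of stream."""
    data = conn.recv(4096)
    if data:
        conn.sendall(data)
    else:
        sel.unregister(conn)
        conn.close()


def poll_events(sel, timeout=0.1):
    """One pass of the main loop: call the handler of every ready connection."""
    handled = 0
    for key, _mask in sel.select(timeout):
        handler = key.data  # the callable stored at registration time
        handler(sel, key.fileobj)
        handled += 1
    return handled
```

A connection is attached with sel.register(conn, selectors.EVENT_READ, echo_once), and a real server would call poll_events in an endless loop.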

However, if the finite state machine is specified explicitly (each state is a separate section of the program), the structure of the application becomes complex and hard to read. In effect, a transition between states resembles a goto statement: whenever the state changes, we have to search the source code again for the place where the handler of the next state lives. Here is an example of the structure of an application that implements a simple network protocol:

    select() -> read_ready -> read(cmd)

    if state == "STATE1":
        if cmd == "CMD1":
            state = "STATE2"
        else:
            invalid_command()
    elif state == "STATE2":
        if cmd == "CMD2":
            state = "STATE1"
        else:
            invalid_command()

No handler ever blocks, and this is how asynchronous data processing is achieved. If an application with this architecture needs to query a database, it must send the request, move to a state in which it waits for the response, and return control to the main loop. When the response arrives from the database, a handler is called to receive and process the data. If the request-response exchange has many phases (in SMTP, for example: call connect, yield control, wait for the connection, wait for data, wait for the server's greeting, send your HELO, yield control, wait for the response, read the response, and so on), specifying the states explicitly becomes a programmer's nightmare.
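As a sketch of the per-connection automaton described above, the two-state protocol can be written as a small Python class. The transition table and the names Connection, on_command, and errors are assumptions made for illustration, and the error counter merely stands in for invalid_command().

```python
# Allowed transitions: (current state, command) -> next state.
TRANSITIONS = {
    ("STATE1", "CMD1"): "STATE2",
    ("STATE2", "CMD2"): "STATE1",
}


class Connection:
    """Explicit state machine for one connection."""

    def __init__(self):
        self.state = "STATE1"
        self.errors = 0

    def on_command(self, cmd):
        """Called by the main loop for each command; returns without blocking."""
        next_state = TRANSITIONS.get((self.state, cmd))
        if next_state is None:
            self.errors += 1  # stands in for invalid_command()
        else:
            self.state = next_state
```

The whole per-connection state is two small attributes, which is why such automata are so cheap. The price is that the protocol flow now lives in a table rather than in the order of the statements.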

For all the complexity of its implementation, this approach has an undeniable advantage: it requires almost no synchronization between connection handlers. Switching between them is cooperative and happens only at the moments when a handler explicitly yields control to the main loop. A handler can never be interrupted, which means that all its operations on global objects are guaranteed to be atomic. We can forget about mutexes, semaphores, and the other machinery that used to consume precious CPU cycles.

Stackless

There is a way to combine the performance of state machines with the simplicity of the first solution, and for this we need Stackless Python. Stackless Python is an enhanced version of the Python interpreter. It lets the programmer enjoy the advantages of multi-threaded programming without losing performance to synchronization primitives and without race-condition problems. Used correctly, the cheap and lightweight Stackless microthreads improve the structure of a program, make the code more readable, and increase programmer productivity. Let's see how it works.

From the programmer's point of view, creating a tasklet (a microthread in Stackless terms) looks much like creating a new operating system thread: stackless.tasklet(your_func)(1, 2, 3). This schedules your_func(1, 2, 3) to run in the context of a new tasklet. The function keeps running until the tasklet explicitly yields control to the scheduler (stackless.schedule()) or blocks while waiting to send or receive data. For example, a tasklet wants to read data from a network socket, but none is available yet. At that point the tasklet is placed in the I/O wait queue, and control passes to the next tasklet in line. When the expected data arrives, the first tasklet regains control and continues processing.
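The cooperative switching can be imitated in standard Python 3 with generators, which makes the idea easy to try without a Stackless interpreter. This is only an analogy, not the Stackless implementation: run_round_robin and worker are hypothetical names, and a bare yield plays the role of stackless.schedule().

```python
from collections import deque


def run_round_robin(tasks):
    """Run generator-based tasks until all finish, switching at every yield."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)  # resume the task until its next yield
        except StopIteration:
            continue  # the task has finished; drop it
        queue.append(task)  # still alive: back to the end of the queue


def worker(name, steps, log):
    """A task that records its progress and yields control after each step."""
    for i in range(steps):
        log.append((name, i))
        yield  # analogous to stackless.schedule()
```

Two workers started together take turns one step at a time, and neither can be interrupted between yields. That is the same guarantee that makes locks unnecessary for tasklets.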

In fact, the same logic was at work in the state machine approach (cooperative multitasking, the need for a socket manager, and no need for synchronization primitives when accessing shared data structures). The main difference is that the task is written as ordinary linear Python code. For example, calls to network services can be written as:

    val = memcached.get("some-object-123")
    if val is None:
        res = list(mysql.query("select val from tbl where id=%d", 123))
        if len(res):
            val = res[0]
            memcached.set("some-object-123", val)

Each network operation (a call to memcached or the database, an HTTP request, sending mail over SMTP, and so on) suspends the tasklet until the result is available. While it waits, other tasklets run.

Tasklets can send data to each other through channels. A channel is an object with two main methods, send() and receive(). If one tasklet sends data to a channel with ch.send(some_object), another can receive it with some_object = ch.receive(). If no tasklet is waiting to receive on the channel, the sender blocks until the data is received. Likewise, if no data is pending in the channel, the receiving tasklet blocks until some appears. One channel can be used by many tasklets, each of which can send or receive. Channels are the main means of synchronization between tasklets. For example, if you want to implement a pool with a limited number of persistent database connections, taking a connection from the pool can look like this:

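    def get(self):
        if len(self._pool):
            # A free connection is available: hand it out immediately.
            return self._pool.pop(0)
        else:
            # No free connections: block until one is released.
            return self._wait_channel.receive()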

If the pool has free connections, one of them is taken. Otherwise the tasklet blocks on the channel and waits until someone releases a connection. Tasklets blocked on channels consume no CPU time. The channel logic automatically puts the tasklet back in the scheduler queue as soon as data is placed in the channel. Returning a connection to the pool looks like this:

    def put(self, conn):
        if self._wait_channel.balance < 0:
            # Someone is waiting: pass the connection straight to them.
            self._wait_channel.send(conn)
        else:
            self._pool.append(conn)

A channel balance below zero means that some tasklets are waiting to receive on this channel. In that case the connection being returned is sent to the channel, where it is immediately picked up by the tasklet that joined the queue first, and that tasklet resumes execution.
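The blocking behaviour of channels and the meaning of the balance can be imitated in standard Python 3 with generators. The sketch below is an approximation for illustration only, not the Stackless implementation: the Channel and Scheduler classes and the ("send", ...) and ("receive", ...) request tuples are assumptions, and the order in which a woken pair is resumed may differ from what Stackless does.

```python
from collections import deque


class Channel:
    """Queues of blocked tasks; balance is senders minus receivers."""

    def __init__(self):
        self.receivers = deque()  # tasks blocked waiting to receive
        self.senders = deque()    # (task, value) pairs blocked in send

    @property
    def balance(self):
        return len(self.senders) - len(self.receivers)


class Scheduler:
    def __init__(self):
        self.ready = deque()  # (task, value to pass in on resume)

    def spawn(self, task):
        self.ready.append((task, None))

    def run(self):
        """Run until every task has finished or is blocked on a channel."""
        while self.ready:
            task, value = self.ready.popleft()
            try:
                kind, channel, payload = task.send(value)
            except StopIteration:
                continue
            if kind == "send" and channel.receivers:
                # A receiver is waiting: hand over the value, wake both.
                self.ready.append((channel.receivers.popleft(), payload))
                self.ready.append((task, None))
            elif kind == "send":
                channel.senders.append((task, payload))
            elif channel.senders:
                # A sender is waiting: take its value, wake both.
                sender, sent = channel.senders.popleft()
                self.ready.append((task, sent))
                self.ready.append((sender, None))
            else:
                channel.receivers.append(task)


def producer(ch, count):
    for i in range(count):
        yield ("send", ch, i)


def consumer(ch, count, log):
    for _ in range(count):
        item = yield ("receive", ch, None)
        log.append(item)
```

If only a consumer is running, the scheduler goes idle with that consumer parked on the channel and a balance of -1. This is the condition that put() checks before sending a connection directly to a waiting tasklet.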

Stackless itself consists of a tasklet context-switching system, a scheduler, the channel mechanism, and tasklet serialization, which allows tasklets to be saved to disk or transferred over the network and then resumed from the point where they were interrupted. There is also the greenlet package, a stripped-down version of Stackless. It implements only the microthreads (the greenlets themselves), while the rest of the logic, including the scheduler, falls on the programmer's shoulders. Because of this, greenlets are slightly slower than Stackless (by 10-25 percent), but they do not require a special version of the interpreter.

Concurrence

Writing real-world network applications requires a library for working with non-blocking sockets: one that includes a socket manager, blocks tasklets on network operations, and resumes them when network events occur. There are several such libraries: simple triviality, Eventlet (Greenlets only), gevent (Greenlets only), and Concurrence (Greenlets and Stackless). The last one is the subject of this section.

Concurrence is built on libevent; its main loop and connection buffering are implemented in C and deliver excellent performance for network operations. Besides the socket manager itself, Concurrence lets you create timers and use functions such as sleep(s), and it implements many popular protocols: HTTP client, HTTP server (WSGI), Memcached, MySQL (yes, this is an asynchronous MySQL client library), and XMPP. The earlier example with calls to Memcached and MySQL was written for Concurrence. Here is how to build a minimal web server with it:

    from concurrence import dispatch
    from concurrence.http import WSGIServer

    def hello_world(environ, start_response):
        start_response("200 OK", [])
        return ["<html>Hello, world!</html>"]

    def main():
        server = WSGIServer(hello_world)
        server.serve(('localhost', 8000))

    dispatch(main)

The dispatch function starts the main Concurrence loop and schedules the very first tasklet, which executes main. WSGIServer then starts and begins accepting connections. For each connection a separate tasklet is started that runs hello_world. That function can be arbitrarily complex and include any asynchronous operations. While the system waits for them to complete, new connections continue to be accepted.
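Because the handler is an ordinary WSGI callable, it can be exercised without starting any server. The sketch below uses only the Python 3 standard library; call_wsgi is a hypothetical helper, and the body is returned as bytes and a Content-Type header is added because Python 3 WSGI expects them, unlike the Python 2 era code above.

```python
from wsgiref.util import setup_testing_defaults


def hello_world(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/html")])
    return [b"<html>Hello, world!</html>"]


def call_wsgi(app, path="/"):
    """Invoke a WSGI app once and return (status, headers, body)."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in the remaining required keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body
```

Calling call_wsgi(hello_world) returns the status line, the header list, and the response body, which is convenient for unit tests of handlers that will later run inside a server's tasklets.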

Now for the fly in the ointment. Unfortunately, Concurrence appears to be abandoned and no longer supported. The author does not respond to email, including bug reports with patches. I have therefore taken the liberty of publishing my own version of Concurrence, with fixes for the bugs I found and several added features, in particular HTTP PUT support for WebDAV, an SMTP client, and Thrift support. The repository is on GitHub.

I invite anyone who plans to use Stackless, Concurrence, or other asynchronous programming technologies in Python to subscribe to the ru-python-async mailing list.

Links

Stackless Python - www.stackless.com
Success Stories - www.stackless.com/wiki/Applications
Concurrence - opensource.hyves.org/concurrence
My version of Concurrence - github.com/JoyTeam/concurrence

Source: https://habr.com/ru/post/107237/
