
Asynchronous Python: The Different Forms of Concurrency

With all the noise around "async" and "concurrency" since the advent of Python 3, it is easy to assume that Python introduced these concepts only recently. It did not: we have been performing asynchronous and concurrent operations for a long time. Beginners may also believe that asyncio is the only or the best way to write asynchronous or parallel code. In this article we will look at the different ways of achieving concurrency, along with their advantages and disadvantages.

Definition of terms:


Before we dive into the technical aspects, it is important to have some basic understanding of the terms often used in this context.

Synchronous and asynchronous:

In synchronous operations, tasks are performed one after another. In asynchronous operations, tasks can start and finish independently of one another: one asynchronous task can be started and keep running while execution moves on to another task. Asynchronous tasks do not block each other (they do not make the caller wait for completion) and are usually executed in the background.
For example, suppose you need to call a travel agency to plan your next vacation, and you also need to e-mail your boss before flying away. In synchronous mode, you first call the travel agency, and if you are put on hold, you wait until someone answers. Only then do you start writing the e-mail to your boss: you perform the tasks one after another (synchronous execution). But if you are smart, then while you are on hold you start writing the e-mail, and when the agent comes back on the line you pause the writing, talk, and then finish the letter afterwards. You could also ask a friend to call the agency while you write the e-mail yourself. This is asynchronous: the tasks do not block each other.

Concurrency and parallelism:

Concurrency means that two tasks make progress together. In the asynchronous example above, we alternated between writing the e-mail and talking to the travel agency, making gradual progress on both. That is concurrency.

When we asked a friend to make the call while we wrote the e-mail ourselves, the tasks ran in parallel.

Parallelism is essentially a form of concurrency, but parallelism depends on the hardware. For example, with only one core in a CPU, two tasks cannot run in parallel; they merely divide the CPU time between them. That is concurrency, but not parallelism. When we have several cores (like the friend in the previous example, who acts as a second core), we can perform several operations at the same time, depending on the number of cores.

Summing up:


Parallelism implies concurrency, but concurrency does not always imply parallelism.

Threads and Processes


Python has supported threads for a very long time. Threads allow you to perform operations concurrently. However, because of the Global Interpreter Lock (GIL), threads cannot provide true parallelism. Nevertheless, with the multiprocessing module, you can use multiple cores from Python.

Threads

Consider a small example. In the following code, the worker function runs asynchronously and concurrently across multiple threads.

import threading
import time
import random

def worker(number):
    sleep = random.randrange(1, 10)
    time.sleep(sleep)
    print("I am Worker {}, I slept for {} seconds".format(number, sleep))

for i in range(5):
    t = threading.Thread(target=worker, args=(i,))
    t.start()

print("All Threads are queued, let's see when they finish!")

Here is an example of the output:

$ python thread_test.py
All Threads are queued, let's see when they finish!
I am Worker 1, I slept for 1 seconds
I am Worker 3, I slept for 4 seconds
I am Worker 4, I slept for 5 seconds
I am Worker 2, I slept for 7 seconds
I am Worker 0, I slept for 9 seconds

So we launched 5 threads to work together, and after starting them (that is, after starting the worker function) the program does not wait for the threads to finish before moving on to the next print statement. This is an asynchronous operation.

In our example, we passed a function to the Thread constructor. If we wanted, we could instead subclass Thread and override its run method (OOP style).
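As a sketch of that OOP style (the class name and the shorter sleep range here are my own choices, not from the original example), the same worker can be written as a Thread subclass that overrides run:

```python
import threading
import time
import random

class WorkerThread(threading.Thread):
    """OOP-style worker: subclass Thread and override run()."""

    def __init__(self, number):
        super().__init__()
        self.number = number

    def run(self):
        # Short sleeps just to keep the demo quick.
        sleep = random.randrange(1, 3)
        time.sleep(sleep)
        print("I am Worker {}, I slept for {} seconds".format(self.number, sleep))

threads = [WorkerThread(i) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for all workers to finish
```

Calling start() still runs the thread's run method asynchronously; join() lets us wait for completion when we need to.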

Further reading:

To learn more about threads, use the link below:


Global Interpreter Lock (GIL)

The GIL was introduced to make CPython's memory management simpler and to allow better integration with C (for example, in extensions). The GIL is a locking mechanism: the Python interpreter runs only one thread at a time. That is, only one thread can execute Python bytecode at any given moment. The GIL effectively prevents multiple threads from running Python code in parallel.

GIL in brief:


Many see the GIL as a weakness. I see it as a blessing, since it made possible libraries such as NumPy and SciPy, which occupy a special, unique position in the scientific community.
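A minimal experiment can make the GIL's effect visible. This snippet (my own illustration, not from the original article) times the same CPU-bound work run twice sequentially and then split across two threads; because only one thread can execute Python bytecode at a time, the threaded version is usually no faster:

```python
import time
import threading

def count(n):
    # Pure CPU-bound work: under the GIL, only one thread
    # can execute this Python bytecode at a time.
    while n > 0:
        n -= 1

N = 5_000_000

# Sequential: run the work twice in one thread.
start = time.perf_counter()
count(N)
count(N)
sequential = time.perf_counter() - start

# Threaded: split the same total work across two threads.
start = time.perf_counter()
t1 = threading.Thread(target=count, args=(N,))
t2 = threading.Thread(target=count, args=(N,))
t1.start(); t2.start()
t1.join(); t2.join()
threaded = time.perf_counter() - start

print("sequential: {:.2f}s, threaded: {:.2f}s".format(sequential, threaded))
# Typically the threaded run is no faster (often slightly slower,
# due to thread-switching overhead). For I/O-bound work, where the
# GIL is released while waiting, threads do help.
```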

Further reading:

These resources will let you dig deeper into the GIL:


Processes

To achieve parallelism, Python has the multiprocessing module, which provides an API that looks very familiar if you have used threading before.

Let's just take the previous example and change it. The modified version uses Process instead of Thread.

import multiprocessing
import time
import random

def worker(number):
    sleep = random.randrange(1, 10)
    time.sleep(sleep)
    print("I am Worker {}, I slept for {} seconds".format(number, sleep))

# The __main__ guard is required on platforms that spawn (rather
# than fork) new processes, such as Windows and macOS.
if __name__ == '__main__':
    for i in range(5):
        t = multiprocessing.Process(target=worker, args=(i,))
        t.start()

    print("All Processes are queued, let's see when they finish!")

What has changed? I just imported multiprocessing instead of threading, and then used a process instead of a thread. That's all! Now, instead of multiple threads, we use processes that can run on different CPU cores (provided, of course, that your processor has several cores).

Using the Pool class, we can also distribute the execution of a single function across several processes for different input values. An example from the official documentation:

from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    p = Pool(5)
    print(p.map(f, [1, 2, 3]))

Here, instead of iterating over the list of values and calling f on them one by one, we actually run the function in different processes: one process computes f(1), another f(2), another f(3). Finally, the results are combined back into a list. This lets us break heavy computations into smaller pieces and run them in parallel for faster overall computation.

Further reading:


Module concurrent.futures

The concurrent.futures module is powerful and makes writing asynchronous code very easy. My favorites are ThreadPoolExecutor and ProcessPoolExecutor. These executors maintain a pool of threads or processes: we submit our tasks to the pool, and it runs each task in an available thread or process. A Future object is returned, which we can use to query the status and retrieve the result once the task completes.

Here is an example of ThreadPoolExecutor:

from concurrent.futures import ThreadPoolExecutor
from time import sleep

def return_after_5_secs(message):
    sleep(5)
    return message

pool = ThreadPoolExecutor(3)

future = pool.submit(return_after_5_secs, "hello")
print(future.done())
sleep(5)
print(future.done())
print(future.result())
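A ProcessPoolExecutor works the same way, but runs tasks in worker processes, sidestepping the GIL for CPU-bound work. As a sketch (the cube function and worker count are my own choices for illustration):

```python
from concurrent.futures import ProcessPoolExecutor

def cube(x):
    return x ** 3

if __name__ == '__main__':
    # map distributes the calls across worker processes and
    # returns the results in the original input order.
    with ProcessPoolExecutor(max_workers=3) as pool:
        results = list(pool.map(cube, [1, 2, 3, 4, 5]))
    print(results)  # [1, 8, 27, 64, 125]
```

The with block shuts the pool down cleanly, and the __main__ guard is needed on platforms that spawn new processes.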

I have an article on concurrent.futures: masnun.com/2016/03/29/python-a-quick-introduction-to-the-concurrent-futures-module.html . It may be useful for a deeper study of this module.

Further reading:


Asyncio - what, how and why?


You probably have the question that many people in the Python community have: what does the new asyncio bring to the table? Why did we need yet another way of doing asynchronous I/O? Didn't we already have threads and processes? Let's see!

Why do we need asyncio?

Processes are very expensive to create in terms of resources. Therefore, threads are usually chosen for I/O operations. We know that I/O depends on external factors: slow disks or nasty network lags make I/O quite unpredictable. Now suppose we are using threads for I/O, and three threads are performing different I/O tasks. The interpreter has to switch between the concurrent threads and give each of them some time in turn. Let's call the threads T1, T2 and T3. The three threads have started their I/O operations. T3 completes first, while T2 and T1 are still waiting on I/O. The Python interpreter switches to T1, but it is still waiting. Fine, the interpreter moves to T2, which is also still waiting, and then to T3, which is ready and executes its code. Do you see the problem here?

T3 was ready, but the interpreter switched through T1 and T2 first, incurring switching costs that we could have avoided if the interpreter had switched to T3 right away.

What is asyncio?

Asyncio gives us an event loop, along with other nice things. The event loop watches for I/O events and switches between tasks that are ready to run and tasks that are waiting on I/O. (An event loop is a programming construct that waits for events or messages and dispatches them within the program.)

The idea is very simple. There is an event loop, and we have functions that perform asynchronous I/O operations. We hand our functions to the event loop and ask it to run them for us. The event loop gives us back a Future object, which is like a promise that we will get something in the future. We hold on to that promise, check from time to time whether it has a value yet (we are very impatient), and finally, once the value is received, we use it in other operations. It is like sending off a request and being handed a ticket right away and told to wait for the result: we check periodically, and as soon as the result is ready, we redeem the ticket and take the value.
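That promise-and-ticket flow can be sketched as follows (the fetch coroutine and its delays are my own illustration, and this uses the modern asyncio.run from Python 3.7+). ensure_future wraps a coroutine in a Task, a Future-like object whose done() status and result we can inspect:

```python
import asyncio

async def fetch(name, delay):
    # Pretend this is a slow I/O call; await yields control to the loop.
    await asyncio.sleep(delay)
    return "{} done".format(name)

async def main():
    # Schedule both coroutines on the event loop; each Task is the
    # "ticket" (a Future) we hold on to.
    task1 = asyncio.ensure_future(fetch("first", 0.2))
    task2 = asyncio.ensure_future(fetch("second", 0.1))
    print(task1.done())   # False: the promise is not fulfilled yet
    results = await asyncio.gather(task1, task2)
    print(task1.done())   # True: the value is ready to redeem
    return results

print(asyncio.run(main()))  # ['first done', 'second done']
```

gather waits for all the scheduled tasks and returns their results in the order the tasks were passed in.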

Asyncio uses generators and coroutines to pause and resume tasks. You can read the details here:


How to use asyncio?

Before we begin, let's take a look at an example:

import asyncio
import datetime
import random

async def my_sleep_func():
    await asyncio.sleep(random.randint(0, 5))

async def display_date(num, loop):
    end_time = loop.time() + 50.0
    while True:
        print("Loop: {} Time: {}".format(num, datetime.datetime.now()))
        if (loop.time() + 1.0) >= end_time:
            break
        await my_sleep_func()

loop = asyncio.get_event_loop()

asyncio.ensure_future(display_date(1, loop))
asyncio.ensure_future(display_date(2, loop))

loop.run_forever()

Please note that the async/await syntax is available only in Python 3.5 and above. Let's walk through the code:


Whenever an await call occurs, asyncio understands that the function will probably take some time, so it pauses its execution, starts monitoring any I/O event associated with it, and lets other tasks run. When asyncio notices that the paused function's I/O is ready, it resumes the function.

Making the right choice


We have just walked through the most popular forms of concurrency. But the question remains: which one should we choose? It depends on the use case. From my experience, I tend to follow this pseudocode:

if io_bound:
    if io_very_slow:
        print("Use Asyncio")
    else:
        print("Use Threads")
else:
    print("Multi Processing")



Source: https://habr.com/ru/post/421625/

