With the advent of Python 3 there has been quite a bit of noise about "asynchrony" and "concurrency", so you might assume that Python only recently introduced these concepts. It did not: we have been using these kinds of operations for a long time. Beginners may also think that asyncio is the only or best way to write asynchronous / parallel code. In this article, we will look at the various ways to achieve concurrency, along with their advantages and disadvantages.
Definition of terms:
Before we dive into the technical aspects, it is important to have some basic understanding of the terms often used in this context.
Synchronous and asynchronous: In synchronous execution, tasks are performed one after another. In asynchronous execution, tasks can be started and completed independently of each other: one asynchronous task can be started and keep running while execution moves on to a new task. Asynchronous tasks do not block (do not make the caller wait for the task to complete) and are usually performed in the background.
For example, suppose you need to contact a travel agent to plan your next vacation, and you also need to send a letter to your boss before flying away. In synchronous mode, you first call the travel agency, and if you are put on hold, you wait until someone answers. Only then do you start writing the letter to your boss; you perform the tasks one after another [synchronous execution, translator's note]. But if you are smart, then while you are on hold [hanging on the phone, translator's note] you start writing the e-mail, and when the agent comes back you pause the writing, talk, and then finish the letter afterwards. You could also ask a friend to call the agency while you write the letter yourself. This is asynchronous: the tasks do not block each other.
Concurrency and parallelism: Concurrency implies that two tasks make progress together. In the asynchronous example above, we gradually made progress either on writing the letter or on the conversation with the travel agency. This is concurrency.
When we asked a friend to call the agency and wrote the letter ourselves, the tasks were performed in parallel.
Parallelism is essentially a form of concurrency, but parallelism depends on the hardware. For example, if a CPU has only one core, two tasks cannot run in parallel; they just share CPU time between themselves. That is concurrency, but not parallelism. When we have several cores [like the friend in the previous example, who acts as a second core, translator's note], we can perform several operations (up to the number of cores) at the same time.
Summing up:
- Synchronous: blocking operations
- Asynchronous: non-blocking operations
- Concurrency: making progress together (joint)
- Parallelism: making progress in parallel
Parallelism implies concurrency, but concurrency does not always imply parallelism.
Threads and Processes
Python has supported threads for a very long time. Threads allow you to perform operations concurrently, but there is a problem: the Global Interpreter Lock (GIL), because of which threads cannot provide true parallelism. Nevertheless, with the advent of multiprocessing, you can use multiple cores from Python.
Threads
Consider a small example. In the following code, the worker function will run asynchronously and concurrently in multiple threads.
```python
import threading
import time
import random


def worker(number):
    sleep = random.randrange(1, 10)
    time.sleep(sleep)
    print("I am Worker {}, I slept for {} seconds".format(number, sleep))


for i in range(5):
    t = threading.Thread(target=worker, args=(i,))
    t.start()

print("All Threads are queued, let's see when they finish!")
```
Here is an example of the output:
```shell
$ python thread_test.py
All Threads are queued, let's see when they finish!
I am Worker 1, I slept for 1 seconds
I am Worker 3, I slept for 4 seconds
I am Worker 4, I slept for 5 seconds
I am Worker 2, I slept for 7 seconds
I am Worker 0, I slept for 9 seconds
```
Thus, we launched 5 threads to work concurrently, and after starting them (that is, after calling start on each thread) the main flow of execution does not wait for the threads to finish before moving on to the next print statement. This is an asynchronous operation.
In our example, we passed a function to the Thread constructor. If we wanted, we could instead subclass Thread and override its run method (OOP style).
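The subclassing style mentioned above might look like this minimal sketch (the Worker class name is just for illustration):

```python
import threading
import time
import random


class Worker(threading.Thread):
    """A thread that sleeps for a random interval, then reports itself."""

    def __init__(self, number):
        super().__init__()
        self.number = number

    def run(self):
        # run() is what the thread executes after start() is called.
        sleep = random.randrange(1, 3)
        time.sleep(sleep)
        print("I am Worker {}, I slept for {} seconds".format(self.number, sleep))


threads = [Worker(i) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # block until each worker has finished
```

Note that we call start(), not run(), to launch the thread; calling run() directly would just execute the function synchronously in the current thread.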
Further reading: To learn more about threads, use the link below:
Global Interpreter Lock (GIL)
The GIL was introduced to make CPython's memory handling easier and to allow better integration with C (for example, extensions). The GIL is a locking mechanism: the Python interpreter runs only one thread at a time, i.e. only one thread can execute Python bytecode at any given moment. The GIL thus ensures that multiple threads are not executed in parallel.
GIL in brief:
- Only one thread can run at a time.
- The Python interpreter switches between threads to achieve concurrency.
- The GIL applies to CPython (the standard implementation). Implementations such as Jython and IronPython do not have a GIL.
- The GIL makes single-threaded programs fast.
- The GIL usually does not interfere with I/O-bound operations.
- The GIL makes it easy to integrate non-thread-safe C libraries; thanks to the GIL we have many high-performance extensions/modules written in C.
- For CPU-bound tasks, the interpreter checks every N ticks and switches threads, so one thread does not block the others indefinitely.
Many see the GIL as a weakness. I see it as a blessing, because it made possible libraries such as NumPy and SciPy, which occupy a special, unique position in the scientific community.
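To see why the GIL matters for CPU-bound work, here is a small sketch (the exact timings will vary by machine): running a pure-Python counting loop in two threads is usually no faster than running it twice sequentially, because only one thread executes bytecode at a time.

```python
import threading
import time


def count(n):
    # A pure-Python CPU-bound loop; threads running this contend for the GIL.
    while n > 0:
        n -= 1


N = 5_000_000

# Sequential: run the function twice in the main thread.
start = time.perf_counter()
count(N)
count(N)
sequential = time.perf_counter() - start

# Threaded: two threads, but the GIL lets only one run bytecode at a time,
# so this typically takes about as long (or longer, due to switching costs).
start = time.perf_counter()
t1 = threading.Thread(target=count, args=(N,))
t2 = threading.Thread(target=count, args=(N,))
t1.start()
t2.start()
t1.join()
t2.join()
threaded = time.perf_counter() - start

print("sequential: {:.2f}s, threaded: {:.2f}s".format(sequential, threaded))
```

If count were replaced with an I/O-bound function (say, time.sleep), the threaded version would win, which is exactly why threads are still useful for I/O despite the GIL.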
Further reading: These resources will let you go deeper into the GIL:
Processes
To achieve parallelism in Python, the multiprocessing module was added. It provides an API that looks very familiar if you have used threading before.
Let's just go and change the previous example. The modified version uses Process instead of Thread.
```python
import multiprocessing
import time
import random


def worker(number):
    sleep = random.randrange(1, 10)
    time.sleep(sleep)
    print("I am Worker {}, I slept for {} seconds".format(number, sleep))


# The __main__ guard is needed so child processes can safely re-import
# this module on platforms that spawn rather than fork.
if __name__ == '__main__':
    for i in range(5):
        t = multiprocessing.Process(target=worker, args=(i,))
        t.start()

    print("All Processes are queued, let's see when they finish!")
```
What has changed? I just imported the multiprocessing module instead of threading, and then used a Process instead of a Thread. That's all! Now, instead of multiple threads, we use processes that run on different CPU cores (provided, of course, that your processor has several cores).
Using the Pool class, we can also distribute the execution of a single function across several processes for different input values. An example from the official documentation:
```python
from multiprocessing import Pool


def f(x):
    return x * x


if __name__ == '__main__':
    p = Pool(5)
    print(p.map(f, [1, 2, 3]))
```
Here, instead of iterating over the list of values and calling f on each one in turn, we actually run the function in different processes: one process computes f(1), another f(2), another f(3). Finally, the results are combined into a list again. This lets us break a heavy computation into smaller parts and run them in parallel for a faster overall calculation.
Further reading:
The concurrent.futures module
The concurrent.futures module is great and makes writing asynchronous code very easy. My favorites are ThreadPoolExecutor and ProcessPoolExecutor. These executors maintain a pool of threads or processes. We submit our tasks to the pool, and it runs each task in an available thread/process. A Future object is returned, which can be used to query the status and retrieve the result once the task completes.
Here is an example of ThreadPoolExecutor:
```python
from concurrent.futures import ThreadPoolExecutor
from time import sleep


def return_after_5_secs(message):
    sleep(5)
    return message


pool = ThreadPoolExecutor(3)
future = pool.submit(return_after_5_secs, "hello")
print(future.done())
sleep(5)
print(future.done())
print(future.result())
```
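When you submit several tasks, the module's as_completed helper yields each Future as soon as it finishes, regardless of submission order. A minimal sketch (the work function is illustrative):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from time import sleep


def work(seconds):
    sleep(seconds)
    return seconds


with ThreadPoolExecutor(max_workers=3) as pool:
    # Submit three tasks; the slowest one is submitted first.
    futures = [pool.submit(work, s) for s in (0.3, 0.2, 0.1)]
    # as_completed yields futures in completion order, not submission order,
    # so the quickest task comes back first.
    finished = [f.result() for f in as_completed(futures)]

print(finished)  # typically [0.1, 0.2, 0.3]
```

The with-statement also waits for all pending tasks and shuts the pool down cleanly, which is usually preferable to keeping a bare executor around as in the example above.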
I have an article about concurrent.futures at masnun.com/2016/03/29/python-a-quick-introduction-to-the-concurrent-futures-module.html. It may be useful for a deeper study of this module.
Further reading:
Asyncio: what, how and why?
You probably have the question that many people in the Python community have: what does the new asyncio bring? Why do we need yet another way to do asynchronous I/O? Don't we already have threads and processes? Let's see!
Why do we need asyncio?
Processes are very expensive [in terms of resource consumption, translator's note] to create, so threads are usually chosen for I/O operations. We know that I/O depends on external things: slow disks or nasty network lags make I/O often unpredictable. Now suppose we use threads for I/O, and 3 threads are performing various I/O tasks. The interpreter has to switch between the concurrent threads and give each of them some time in turn. Let's call the threads T1, T2 and T3. The three threads have started their I/O operations. T3 completes first; T2 and T1 are still waiting for I/O. The Python interpreter switches to T1, but it is still waiting; so the interpreter moves to T2, which is also still waiting, and then to T3, which is ready and executes its code. Do you see the problem?
T3 was ready, but the interpreter switched through T1 and T2 first, incurring switching costs that we could have avoided if the interpreter had switched to T3 first, right?
What is asyncio?
Asyncio gives us an event loop, along with other cool things. The event loop watches for I/O events and switches between tasks that are ready and tasks that are waiting on an I/O operation [an event loop is a software construct that waits for and dispatches events or messages in a program, translator's note].
The idea is very simple. There is an event loop, and we have functions that perform asynchronous I/O operations. We hand our functions to the event loop and ask it to run them for us. The event loop returns a Future object, like a promise that we will get something back in the future. We hold on to the promise, check from time to time whether it has a value (we are very impatient), and finally, when the value arrives, we use it in further operations [i.e. we sent a request and were immediately given a ticket and told to wait for the result; we periodically check, and as soon as the result is ready we take the ticket and get the value from it, translator's note].
Asyncio uses generators and coroutines to pause and resume tasks. You can read the details here:
How to use asyncio?
Before we begin, let's take a look at an example:
```python
import asyncio
import datetime
import random


async def my_sleep_func():
    await asyncio.sleep(random.randint(0, 5))


async def display_date(num, loop):
    end_time = loop.time() + 50.0
    while True:
        print("Loop: {} Time: {}".format(num, datetime.datetime.now()))
        if (loop.time() + 1.0) >= end_time:
            break
        await my_sleep_func()


loop = asyncio.get_event_loop()

asyncio.ensure_future(display_date(1, loop))
asyncio.ensure_future(display_date(2, loop))

loop.run_forever()
```
Note that the async/await syntax is available only in Python 3.5 and above. Let's walk through the code:
- We have an asynchronous function display_date, which takes a number (as an identifier) and the event loop as parameters.
- The function has an infinite loop that stops after 50 seconds, but during that period it repeatedly prints the time and pauses. await can wait for other asynchronous functions (coroutines) to complete.
- We pass the function to the event loop (using ensure_future).
- We start the event loop.
Whenever await is called, asyncio understands that the function will probably need some time, so it pauses its execution, starts monitoring any I/O event associated with it, and lets other tasks run. When asyncio notices that the paused function's I/O is ready, it resumes that function.
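This cooperative switching is easy to observe in a small sketch (the fetch name is illustrative): three coroutines each await a sleep, but because each one yields to the event loop while sleeping, the total time is close to the longest single sleep rather than the sum.

```python
import asyncio
import time


async def fetch(delay):
    # Pretend to do slow I/O; await hands control back to the event loop,
    # which can run the other coroutines in the meantime.
    await asyncio.sleep(delay)
    return delay


async def main():
    # gather runs the coroutines concurrently on one event loop and
    # returns their results in argument order.
    return await asyncio.gather(fetch(0.3), fetch(0.2), fetch(0.1))


start = time.perf_counter()
results = asyncio.run(main())  # asyncio.run is available in Python 3.7+
elapsed = time.perf_counter() - start

print(results)  # [0.3, 0.2, 0.1] -- gather preserves argument order
print("elapsed ~{:.2f}s (close to 0.3, not 0.6)".format(elapsed))
```

Contrast this with threads: here there is only one thread, and the switching points are explicit (every await), not decided by the interpreter.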
Making the right choice
We have just walked through the most popular forms of concurrency. But the question remains: what should I choose? It depends on the use case. From my experience, I tend to follow this pseudocode:
```python
if io_bound:
    if io_very_slow:
        print("Use Asyncio")
    else:
        print("Use Threads")
else:
    print("Multi Processing")
```
- CPU Bound => Multiprocessing
- I/O Bound, Fast I/O, Limited Number of Connections => Multithreading
- I/O Bound, Slow I/O, Many Connections => Asyncio