A while ago I had to speed up the server response in my Flask project. Because the view sequentially calls three remote web services, the page load time for data not served from the cache reached 10 seconds. Yes, maybe Flask is not the framework that should have been used here, but what we have is what we have.
So let's get started. Since I can't publish the real code, I'll walk through academic examples instead.
Task 1. There are three functions a, b, c that need to be called in separate threads; we wait for all of them to finish and then return a response.
To solve task 1 I used the approach from this translation, since I was won over by how easy the library is to use.
import multiprocessing.dummy as multiprocessing
import time

def a():
    time.sleep(2)
    return 'a'

def b():
    time.sleep(2)
    return 'b'

def c():
    time.sleep(1)
    return 'c'

p = multiprocessing.Pool()
results = p.map(lambda f: f(), [a, b, c])
print(results)
p.close()
p.join()
The output of the code:
['a', 'b', 'c']
Great, but there is a significant drawback: the execution time is unbounded, since map waits for all of the functions to finish. Let's change the wording of the task.
Task 2. There are three functions a, b, c that need to be called in separate threads; after a given time interval, check whether they have completed and return a result.
For the solution we use the same library, but this time the map_async function. The difference is that it returns an AsyncResult object immediately instead of blocking.
import multiprocessing.dummy as multiprocessing
import time

def a():
    time.sleep(2)
    return 'a'

def b():
    time.sleep(2)
    return 'b'

def c():
    time.sleep(1)
    return 'c'

p = multiprocessing.Pool()
# map_async returns an AsyncResult right away instead of blocking
results = p.map_async(lambda f: f(), [a, b, c])
TIMEOUT = 3
# get() blocks for at most TIMEOUT seconds, then raises TimeoutError
print(results.get(TIMEOUT))
p.close()
p.join()
With TIMEOUT >= 3 the result is the same as in the previous case, but if at least one of the functions fails to finish in time, a multiprocessing.TimeoutError exception is raised. However, this did not quite suit me: in my case it was essential that one particular function finished, while the results of the others could be missing from the response.
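If a partial, best-effort response is acceptable, one option is simply to catch the timeout. A minimal sketch, reusing the pool and the results object from the example above (the exception class lives in the top-level multiprocessing package, not in multiprocessing.dummy):

from multiprocessing import TimeoutError

try:
    print(results.get(TIMEOUT))
except TimeoutError:
    # at least one function did not finish within TIMEOUT seconds,
    # so there is no complete list of results to return
    print('timed out waiting for all results')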
Task 3. There are three functions a, b, c that need to be called in separate threads; wait only for the result of function a.
import multiprocessing.dummy as multiprocessing
import time

def a():
    time.sleep(2)
    print(1)
    return 'a'

def b():
    time.sleep(3)
    print(2)
    return 'b'

def c():
    time.sleep(1)
    print(3)
    return 'c'

p = multiprocessing.Pool()
results = []
for r in p.imap(lambda f: f(), [a, b, c]):
    results.append(r)
    break
print(results)
p.close()
p.join()
The result:
3
1
['a']
2
As you can see, although two of the three functions had completed, we collected the result only for the priority one: imap yields results strictly in input order. To also pick up the result of the faster function c, use imap_unordered:
results = []
for r in p.imap_unordered(lambda f: f(), [a, b, c]):
    results.append(r)
    if r == 'a':
        break
# the rest is the same as in the previous example
Result:
3
1
['c', 'a']
2
What if in the main thread we need only one result, the one from the fastest thread? It is enough to remove the p.join() call from the previous example and exit the loop on the first result.
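A minimal sketch of that variant, reusing the functions and the pool from task 3 (note that the slower threads keep running in the background after we move on):

results = []
for r in p.imap_unordered(lambda f: f(), [a, b, c]):
    results.append(r)
    break  # take whichever function finished first
print(results)  # ['c'] with the timings above
p.close()
# no p.join(): we do not wait for the remaining threads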
One more point. If you try to use the multiprocessing module itself, which works with processes, instead of multiprocessing.dummy, which works with threads, you will get a cPickle.PicklingError serialization error, because interprocess communication cannot serialize the lambda. To make the code work, you have to introduce a dispatcher function; the code is not as pretty, but it works:
import multiprocessing
import time

def a():
    time.sleep(2)
    return 'a'

def b():
    time.sleep(2)
    return 'b'

def c():
    time.sleep(1)
    return 'c'

params_mapping = {
    'a': a,
    'b': b,
    'c': c,
}

def func(param):
    return params_mapping[param]()

p = multiprocessing.Pool()
results = p.map(func, ['a', 'b', 'c'])
print(results)
p.close()
p.join()
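Incidentally, since a, b and c are ordinary module-level functions, pickle can serialize references to them by name; it is only the lambda that cannot cross the process boundary. So a plain module-level dispatcher that takes the function itself should also work; a sketch under that assumption, with the same a, b, c as above:

def call(f):
    # f is a module-level function, so it is picklable by reference
    return f()

p = multiprocessing.Pool()
results = p.map(call, [a, b, c])
print(results)  # ['a', 'b', 'c']
p.close()
p.join()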