As is well known, the main Python implementation, CPython (
python.org ), uses a Global Interpreter Lock (GIL). It allows only one Python thread to run at any given moment; the others have to wait until the GIL is switched over to them.
A colleague,
Qualab, recently published a
lively article on Habr proposing an innovative approach: create one Python subinterpreter per operating system thread and thereby run all of the subinterpreters in parallel. In other words, the GIL supposedly does not get in the way at all.
The idea is fresh, but it has one major drawback: it does not work.
Let me first look at the GIL in more detail and then go through the author's mistakes.
GIL
I will briefly describe the GIL details that matter for this discussion, as implemented in Python 3.2+ (a more detailed description of the subject can be found
here ).
Version 3.2 is chosen to be specific and to keep the presentation short. For 1.x and 2.x, the differences are minor.
- The GIL, as the name suggests, is a synchronization object. It is designed to prevent simultaneous access to Python's internal state from different threads.
- It can be held by any thread or remain free (unlocked).
- Only one thread can hold the GIL at a time.
- There is a single GIL for the whole process in which Python runs. Once again, I emphasize: the GIL is not hidden in a subinterpreter or anywhere else. It is implemented as a set of static variables shared by all code in the process.
- From the GIL's point of view, each thread that makes Python C API calls must have a PyThreadState structure. The GIL points to one PyThreadState (the running one) or to nothing at all (the GIL is released, and threads run independently and in parallel).
- Once the interpreter has started, the only Python C API operation allowed without holding the GIL is acquiring it. Everything else is forbidden (Py_INCREF is technically safe as well, but Py_DECREF can cause an object to be deleted, which can lead to exactly the uncontrolled, unprotected, simultaneous modification of Python's internal state that the GIL is meant to prevent). A debug build contains more checks for incorrect GIL usage; in a release build some of them are disabled for better performance.
- The GIL is switched on a timer (5 ms by default) or by an explicit call (
PyThreadState_Swap, PyEval_RestoreThread, PyEval_SaveThread, PyGILState_Ensure, PyGILState_Release, etc.)
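The timer mentioned in the last item can be inspected from Python code. The sketch below is only an illustration and is not taken from the article under discussion: it uses the standard library functions sys.getswitchinterval and sys.setswitchinterval (available since Python 3.2), and the helper name with_switch_interval is a hypothetical one introduced here.

```python
import sys

def with_switch_interval(seconds, func):
    """Hypothetical helper: run func() under a temporary switch interval."""
    previous = sys.getswitchinterval()
    sys.setswitchinterval(seconds)
    try:
        return func()
    finally:
        # Always restore the previous interval, even if func() raises.
        sys.setswitchinterval(previous)

# Interval (in seconds) after which a running thread is asked to give up the GIL.
default_interval = sys.getswitchinterval()
print("switch interval at start:", default_interval)

# Temporarily request more frequent switches and read the value back.
inside = with_switch_interval(0.001, sys.getswitchinterval)
print("switch interval inside the helper:", inside)
```

Changing the interval only affects how often a thread is asked to hand over the GIL; it does not let two threads execute Python code at the same time.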
As you can see, it is possible to run code in parallel, but such code must not make any Python C API calls (this also applies, of course, to executing code written in Python).
"Must not" here means (especially in the release build that everyone uses) that the behavior is unstable. It may not break right away. It may work perfectly well for one program and then, after a small and harmless change to the executed Python code, end in a segmentation fault and a heap of side effects.
Why sub-interpreters do not help
What does our colleague
Qualab
do (the link to the archive with the code is in his article; I have mirrored the source code on gist:
gist.github.com/4680136 )?
In the main thread, the
GIL is immediately
released via PyEval_SaveThread(). The main thread no longer works with Python: it creates several worker threads and waits for them to complete.
Each worker thread
acquires the GIL. The code turned out rather strange, but that does not matter now. The main point is that the GIL is now clenched in our fist.
And at that moment the parallel execution of the worker threads becomes serial. There was no need to build the whole construction with subinterpreters: their usefulness in this context is exactly zero, as expected.
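The same serialization can be observed from pure Python, without any C code. The sketch below is not the code from the article under discussion; it is a minimal illustration with hypothetical names (count_down, run_sequential, run_threaded) that times the same CPU-bound work done one call after another and done in several threads. On a standard CPython build with the GIL, the threaded run is not expected to be noticeably faster, although the exact numbers depend on the machine and the interpreter build.

```python
import threading
import time

def count_down(n):
    # Pure Python loop: the thread must hold the GIL to execute it.
    while n > 0:
        n -= 1

def timed(func):
    """Return the wall-clock time func() takes, in seconds."""
    start = time.perf_counter()
    func()
    return time.perf_counter() - start

def run_sequential(n, workers):
    # Do the work one call after another in the current thread.
    for _ in range(workers):
        count_down(n)

def run_threaded(n, workers):
    # Do the same amount of work, one thread per call.
    threads = [threading.Thread(target=count_down, args=(n,))
               for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

N, WORKERS = 2_000_000, 4
sequential = timed(lambda: run_sequential(N, WORKERS))
threaded = timed(lambda: run_threaded(N, WORKERS))
print(f"sequential: {sequential:.2f}s, threaded: {threaded:.2f}s")
```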
I do not know why the author did not notice this right away, before publishing the article. Afterwards he persisted for a long time, preferring to call black white.
Getting back to parallel execution is simple: you need to release the GIL. But then it becomes impossible to work with the Python interpreter.
If you ignore the ban and call the Python C API without the GIL anyway, the program will break, though not necessarily right away and quite possibly with unpleasant side effects. If you want to shoot yourself in the foot in a particularly intricate way, this is your chance.
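A legitimate version of "release the GIL and run in parallel" can be seen in blocking calls that do not touch Python objects while they wait. The sketch below is an illustration added here, not part of the article under discussion; it relies on the fact that time.sleep releases the GIL for the duration of the wait, and the names blocking_call and run_in_threads are hypothetical.

```python
import threading
import time

def blocking_call(seconds):
    # time.sleep releases the GIL while waiting, so other threads can proceed.
    time.sleep(seconds)

def run_in_threads(workers, seconds):
    """Start `workers` threads that each block for `seconds`; return total time."""
    threads = [threading.Thread(target=blocking_call, args=(seconds,))
               for _ in range(workers)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

elapsed = run_in_threads(workers=4, seconds=0.2)
print(f"4 threads sleeping 0.2s each took {elapsed:.2f}s in total")
```

The waits overlap because none of the threads needs the interpreter while it is blocked; as soon as a thread wants to execute Python code again, it has to reacquire the GIL first.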
I repeat: there is one GIL for the whole process, not one per interpreter or subinterpreter. When a thread holds the GIL, all other threads that execute Python code are suspended.
Conclusion
Whether you like the GIL or not, it exists, and I strongly recommend learning how to work with it correctly.
- Either acquire the GIL and call Python C API functions.
- Or release it and do whatever you want, but Python must not be touched in this mode.
- Parallel work is achieved by running several processes at once, through multiprocessing or in some other way. The details of working with processes are beyond the scope of this article.
The rules are simple, there are no exceptions and no loopholes.
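As a brief illustration of the process-based route named in the last rule, here is a minimal sketch using the standard multiprocessing module. It is an assumption-laden example, not a recommendation of a particular design: the function names sum_of_squares and run_in_processes are hypothetical, and the pool size is arbitrary.

```python
import multiprocessing

def sum_of_squares(n):
    # CPU-bound work; each worker process has its own interpreter and its own GIL.
    return sum(i * i for i in range(n))

def run_in_processes(inputs, processes=2):
    """Distribute the inputs over a pool of worker processes."""
    with multiprocessing.Pool(processes=processes) as pool:
        return pool.map(sum_of_squares, inputs)

if __name__ == "__main__":
    # The guard keeps child processes from re-running this part on import.
    print(run_in_processes([10, 100, 1000]))
```

Arguments and results are passed between processes by pickling, so this approach pays off mainly when the computation outweighs the cost of moving the data.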