Execution Threads and PHP

PHP and threads. Just three words, yet you could write a book on the topic. As usual, I will not do that, but I will give you enough information to understand the subject to a certain extent.

Let's start with a confusion that lives in the minds of some programmers. PHP is not a multithreaded language. The PHP engine itself uses no threads, and PHP does not let userland code use them natively as a parallelization mechanism.

In this respect PHP is very different from other technologies. In Java, for example, threads are used heavily, and they show up in user programs as well. There is nothing like that in PHP. And for good reason.

The PHP engine does without threads mainly for the sake of simplicity. After reading the next section, you will see that threads are not a "magic technology that speeds up any program". Sounds like a sales pitch, right? But we are not salespeople, we are techies, and we know what we are talking about. There are currently no threads in the PHP engine. Perhaps they will appear in the future, but that would bring so many difficulties that the result might be far from what is expected. The first difficulty is cross-platform thread programming. The second is shared resources and lock management. The third is that threads do not suit every program. The PHP architecture dates from around 2000, and at that time thread programming was rare and immature. So the authors of PHP (mainly of the Zend engine) decided to build the whole engine without threads. Besides, they did not have the resources needed to create a stable cross-platform multithreaded engine.

In addition, threads are not used in PHP userland. That is not how the language runs your code. The PHP concept is "fire and forget": a request should be handled as quickly as possible in order to free PHP for the next one. PHP was created as a glue language: you do not handle complex tasks that require threads. Instead, you call fast, ready-made resources, tie everything together and send the result back to the user. PHP is a language of action, and if something takes "more time than usual" to process, it should not be done in PHP. That is why queue-based systems (Gearman, AMQP, ActiveMQ, etc.) are used to process heavy tasks asynchronously. The Unix way is: "Develop small, self-sufficient tools and link them together." PHP is not designed for massive parallelization; that is the job of other technologies. The right tool for every problem.

A few words about threads

Let's refresh what threads are. We will not go into detail; you will find that on the Internet and in books.

A thread is a "small" unit of work that lives inside a process. A process can create multiple threads. A thread belongs to one and only one process. The process is the "big" unit of work within the operating system. On multi-core (multiprocessor) machines, several cores (processors) work in parallel and each handles part of the load. If processes A and B are ready to be scheduled and two cores (processors) are available, then A and B can be dispatched at the same time. The computer then effectively handles several tasks per unit of time (time frame). We call this "parallelism".

[Figures: a process; a thread; a process together with its threads]

So far, A and B were processes: completely independent units. But threads are not processes. Threads are units that live inside processes. That is, a process can split its work into several smaller tasks that run at the same time. For example, processes A and B can spawn threads A1, A2, B1 and B2. If the machine has several processors, say eight, then all four threads can run in the same time frame.

Threads are a way to divide the work of a process into several small subtasks solved in parallel (in the same time frame). Moreover, threads run in much the same way as processes: the kernel thread scheduler manages them using states.
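To make the idea of "splitting the work of a process into subtasks" more concrete, here is a minimal sketch using POSIX threads. The names `square_worker` and `sum_of_squares_in_threads` are invented for this illustration, and the example assumes a system where pthreads are available. Each thread only touches its own slot, so no locking is needed here.

```c
#include <pthread.h>

#define N_THREADS 4

/* Each thread squares the number it was given and stores the result in place. */
static void *square_worker(void *arg)
{
    int *slot = (int *) arg;
    *slot = (*slot) * (*slot);
    return NULL;
}

/* Starts N_THREADS threads, waits for all of them and returns the sum of
 * their results. Returns -1 if a thread could not be created. */
int sum_of_squares_in_threads(void)
{
    pthread_t threads[N_THREADS];
    int slots[N_THREADS];
    int i, sum = 0;

    for (i = 0; i < N_THREADS; i++) {
        slots[i] = i + 1; /* inputs 1, 2, 3, 4 */
        if (pthread_create(&threads[i], NULL, square_worker, &slots[i]) != 0) {
            /* Wait for the threads that were already started, then give up. */
            while (i-- > 0) {
                pthread_join(threads[i], NULL);
            }
            return -1;
        }
    }

    for (i = 0; i < N_THREADS; i++) {
        pthread_join(threads[i], NULL); /* the subtask is finished after this */
        sum += slots[i];
    }
    return sum;
}
```

Whether the four subtasks really run at the same moment depends on the number of available cores and on the scheduler; the code only makes it possible.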

Threads are lighter than processes: they only need a stack and a few registers to work. Processes need much more: a new virtual memory (VM) frame from the kernel, a heap, signal information, file descriptor information, locks, etc.

Process memory is managed at the hardware level by the kernel and the MMU, whereas thread memory is managed at the software level by the programmer and the threading libraries.

So remember: threads are lighter than processes and cheaper to manage. Used properly, they also run faster than processes, since the OS kernel barely gets involved in thread management and dispatching.

Thread memory layout

Each thread has its own stack. So when threads access variables declared inside functions, each one gets its own copy of that data.

The process heap is shared between threads, as are global variables and file descriptors. This is both an advantage and a drawback. If we only read from global memory, we need to do it at the right time, for example after thread X and before thread Y. If we write to global memory, we must make sure that several threads do not try to write to it at the same time. Otherwise that memory region ends up in an unpredictable state, the so-called race condition. This is the main problem in thread programming.

To cope with simultaneous access, you need to build mechanisms into the code, such as reentrancy or synchronization routines. Reentrancy avoids the shared state altogether, while synchronization lets you control concurrent access in a predictable way.

Processes do not share memory: they are perfectly isolated at the OS level. Threads within one process, on the other hand, share a large amount of memory.

They therefore need tools to synchronize access to shared memory, such as semaphores and mutexes. These tools are based on locking: if a resource is locked and a thread tries to access it, then by default the thread waits until the resource is unlocked. This is why threads alone will not make your program faster. Unless tasks are distributed efficiently among threads and shared memory locks are managed well, your program will run even slower than a single process without threads: the threads will simply keep waiting for each other (and I am not even talking about deadlocks, starvation, etc.).
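The locking principle can be sketched with a pthread mutex guarding a shared counter. This is only an illustration: `counter`, `increment_worker` and `run_locked_counter` are hypothetical names, and the numbers of workers and increments are arbitrary.

```c
#include <pthread.h>

#define WORKERS    4
#define INCREMENTS 100000

/* Shared global state: every thread of the process sees the same variable. */
static long counter;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

static void *increment_worker(void *unused)
{
    int i;
    (void) unused;

    for (i = 0; i < INCREMENTS; i++) {
        /* Only one thread at a time may run the read-modify-write below.
         * A thread that finds the mutex locked waits here. */
        pthread_mutex_lock(&counter_lock);
        counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Runs the workers and returns the final counter value. */
long run_locked_counter(void)
{
    pthread_t threads[WORKERS];
    int i, started = 0;

    counter = 0;
    for (i = 0; i < WORKERS; i++) {
        if (pthread_create(&threads[i], NULL, increment_worker, NULL) != 0) {
            break;
        }
        started++;
    }
    for (i = 0; i < started; i++) {
        pthread_join(threads[i], NULL);
    }
    return counter;
}
```

Without the lock and unlock calls, the increments of different threads could overlap and the final value would be unpredictable. With them the result is deterministic, but the threads spend part of their time waiting for each other, which is exactly the cost described above.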

If you have no experience in thread programming, it will be a hard task for you. Gaining experience with threads takes many hours of practice and of solving WTF moments. Forget one small detail and the whole program falls apart. A program with threads is harder to debug than one without, especially in real projects with hundreds or thousands of threads in a single process. You will go crazy and simply drown in it all.

Thread programming is hard. Mastering it takes a lot of time and effort.

Sharing everything between threads is not always convenient. That is why thread-local storage (TLS) appeared. TLS can be described as "globals that belong to one thread and are not used by the others". These are memory areas that hold global state private to a particular thread (as if only processes were used). When a thread is created, part of the process heap is allocated as its storage. The thread library is asked for a key that is associated with this storage, and the thread must use that key every time it accesses its storage. A destructor is needed to free the allocated resources when the thread ends.
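The key, storage and destructor described above map directly onto the pthread API. The following is a minimal sketch, not PHP code: `request_id_key`, `my_request_id` and `tls_demo` are names made up for the example, and the "global" is a single `int` per thread.

```c
#include <pthread.h>
#include <stdlib.h>

static pthread_key_t request_id_key;
static pthread_once_t key_once = PTHREAD_ONCE_INIT;

/* Ask the thread library for a key; free() is the destructor that releases
 * a thread's storage when that thread ends. */
static void make_key(void)
{
    pthread_key_create(&request_id_key, free);
}

/* Returns the calling thread's private int, allocating it on first use. */
static int *my_request_id(void)
{
    int *p;

    pthread_once(&key_once, make_key);
    p = pthread_getspecific(request_id_key);
    if (p == NULL) {
        p = calloc(1, sizeof(int));
        if (p == NULL) {
            abort();
        }
        pthread_setspecific(request_id_key, p);
    }
    return p;
}

static void *tls_worker(void *arg)
{
    int *io = (int *) arg;          /* in: value to store, out: value read back */

    *my_request_id() = *io;         /* write this thread's own copy */
    *io = *my_request_id() * 10;    /* read it back */
    return NULL;
}

int tls_demo(void)
{
    pthread_t a, b;
    int va = 1, vb = 2;

    *my_request_id() = 7;           /* the calling thread's own copy */
    if (pthread_create(&a, NULL, tls_worker, &va) != 0) {
        return -1;
    }
    if (pthread_create(&b, NULL, tls_worker, &vb) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);

    /* The workers never touched the caller's copy, which still holds 7. */
    return *my_request_id() * 100 + va + vb;
}
```

Each thread reaches "its" value through the same key, yet the three copies never interfere, and no mutex is needed around the accesses.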

An application is thread safe if every access to global resources is fully under control and fully predictable. Otherwise the scheduler will get in your way: some tasks will run at unexpected moments and performance will drop.

Thread libraries

Threads need help from the OS kernel. Threads appeared in operating systems in the mid-1990s, so the techniques for working with them are mature.

But there are cross-platform issues. The differences are especially large between Windows and Unix systems. These ecosystems adopted different threading models and use different thread libraries.

On Linux, the kernel provides the clone() system call to create a thread or a process. It is incredibly complex to use directly, so C code wraps the system calls to make everyday thread programming easier. libc does not yet manage thread operations itself (the C11 standard library shows an initiative in that direction); external libraries do that. Today, Unix systems typically use pthread (other libraries exist as well). Pthread stands for POSIX threads. This POSIX standardization of thread usage and behavior dates back to 1995. If you need threads, link against the libpthread library by passing -lpthread to GCC. It is written in C, it is open source, and it has its own versioning and governance.

So on Unix systems the pthread library is used most often. It provides concurrency; whether you also get parallelism depends on the specific OS and machine.

Concurrency is when several threads run interleaved on a single processor. Parallelism is when several threads run simultaneously on different processors.

[Figures: concurrency; parallelism]

PHP and execution threads

To begin with, remember:

PHP has no threads: neither its engine nor its code uses threads to parallelize internal work.
PHP does not offer threads to users: you cannot use them natively in PHP. Joe Watkins, one of the PHP developers, created a good library that adds threads to userland: ext/pthread. Personally, though, I would not choose PHP for such tasks: it is not meant for this, and C or Java is a better pick.

So what about threads in PHP?

How PHP handles requests

It all comes down to how PHP handles HTTP requests. The web server has to provide some form of concurrency (or parallelism) to serve several clients at once. After all, while answering one client it cannot put all the others on hold.

Therefore, servers usually use several processes or several threads to respond to clients.

Historically, Unix works with processes. They are the very foundation of Unix: since its birth there have been processes that can create new processes (fork()), terminate (exit()) and synchronize (wait(), waitpid()). In such an environment, many PHP instances serve many client requests, but each one runs in its own process.

In this situation PHP has nothing to do: the processes are completely isolated. Process A, which handles request A with client A's data, cannot interact with (read or write) process B, which handles request B for client B. This is exactly what we want.
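Process isolation is easy to observe with fork() and waitpid(). The sketch below is an illustration only; `per_process_value` and `fork_isolation_demo` are hypothetical names. The child changes a global variable, and the parent checks that its own copy is unaffected.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* A plain C global: after fork() each process has its own copy of it. */
static int per_process_value = 1;

/* Returns parent_value * 10 + child_exit_status, or -1 on error. */
int fork_isolation_demo(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        /* Child: this write is invisible to the parent. */
        per_process_value = 5;
        _exit(per_process_value);
    }

    /* Parent: wait for the child, then look at our own copy. */
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status)) {
        return -1;
    }
    return per_process_value * 10 + WEXITSTATUS(status);
}
```

The parent still sees 1 while the child reported 5 through its exit status. With threads instead of processes, both would have been looking at the same variable.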

In 98% of cases, one of two architectures is used: php-fpm or Apache with mpm_prefork.

Things are more complicated under Windows, and on Unix servers that use threads.

Windows is a really great OS. It has only one drawback: closed source code. But you can find information about the internals of many of its components online or in books. Microsoft engineers talk a lot about how Windows works.

Windows takes a different approach to concurrency and parallelism. This OS uses threads very heavily. In fact, creating a process in Windows is so expensive that it is usually avoided. Threads are used instead, always and everywhere. Threads in Windows are much more powerful than in Linux. Yes, really.

When PHP runs under Windows, the web server (any web server) handles client requests in threads, not in processes. In other words, in this environment PHP runs inside a thread. It therefore has to be especially careful about thread specifics: it must be thread safe.

PHP must be thread safe, that is, it must cope with concurrency that it did not create but within which it operates. In other words, it must protect access to its own global variables. And PHP has a lot of them.

The Zend Thread Safety (ZTS) layer is responsible for this protection.

Note that the same holds on Unix if you decide to use threads to parallelize the handling of client requests. That is a very unusual setup on Unix systems, though, since classical processes are traditionally used for such tasks. Still, nothing stops you from choosing threads, and it can improve performance: threads are lighter than processes, so the system can run many more of them. Also, if a PHP extension you use needs thread safety (like ext/pthread), you will need a thread-safe PHP.

ZTS implementation details

ZTS is enabled with the --enable-maintainer-zts configure switch. Typically you do not need this switch unless you run PHP under Windows or with an extension that requires the engine to be thread safe.

There are several ways to check the current mode. On the command line, php -v tells you whether the build is NTS (Not Thread Safe) or ZTS (Zend Thread Safe).

You can also use phpinfo():

You can also read the PHP_ZTS constant from your PHP code.

    if (PHP_ZTS) {
        echo "You are running a thread safe version of the PHP engine";
    }

When PHP is compiled with ZTS, its whole core becomes thread safe. But the extensions you enable may not be. All official extensions (those distributed with PHP) are safe, but nobody can vouch for third-party ones. Below you will see that making a PHP extension thread safe requires specific use of the API. And, as always with threads, one oversight can bring the whole server down.

With threads, if you call non-reentrant functions (typically from libc) or blindly access a true global variable, you will get strange behavior in all sibling threads. For example, get threads wrong in one extension and it affects every client served by every thread of the server! A dreadful situation: one client can corrupt the data of all the others.

When designing PHP extensions:

Extreme care and a good knowledge of thread programming are required. Otherwise you will break the server in completely unpredictable ways, and you will not be able to debug it quickly.
A mistake with threads affects every client served by every thread of the server. You may not even notice it, because faulty thread programming usually leads to horribly unpredictable behavior that is hard to reproduce.

Reentrant functions

When designing a PHP extension, use reentrant functions: functions that do not depend on global state. That is an oversimplification, though. More precisely, a reentrant function can be called again before its previous call has completed, so it can run in parallel in two or more threads. If it used global state, it would not be reentrant. It could, however, lock its own global state and thus be thread safe ;) Many traditional libc functions are not reentrant because they were written before threads were invented.

So some libc implementations (glibc in particular) publish reentrant equivalents as functions with the _r() suffix. The newer C11 standard gives more room to threads, and some libc functions were reworked in C11 and got the _s() suffix (for example, localtime_s()).

strtok() => strtok_r(); strerror() => strerror_r(); readdir() => readdir_r(), etc.

PHP itself provides some of these functions, mainly for cross-platform use. Take a look at main/reentrancy.c.

Also, do not forget about reentrancy when writing your own C functions. A function is reentrant if everything it needs is passed as arguments (on the stack or in registers) and if it uses neither global/static variables nor any non-reentrant function.
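The difference can be shown with two versions of the same tiny function. Both are hypothetical examples written for this article (`next_token_id` and `next_token_id_r` do not exist in libc or PHP); the second one follows the convention of the _r() functions, where the caller owns the state.

```c
/* Non-reentrant: the state is hidden in a static variable that is shared
 * by every caller and every thread of the process. */
int next_token_id(void)
{
    static int last_id = 0;

    return ++last_id;
}

/* Reentrant: everything the function needs comes in through its arguments,
 * so two threads using two different state variables cannot interfere. */
int next_token_id_r(int *last_id)
{
    return ++(*last_id);
}
```

A caller of the reentrant version keeps one `int` per thread (or per request) and passes its address on each call, much as strtok_r() is handed a caller-owned save pointer.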

Do not link against non-thread-safe libraries

Remember that in thread programming the whole memory image of the process is shared. This includes linked libraries.

If your extension links against a library that is known not to be thread safe, you will have to develop your own thread-safety tricks to protect access to the library's global state. In thread programming and in C this happens often, but it is easily overlooked.

Using ZTS

ZTS is a code layer that controls access to thread globals using Thread Local Storage (TLS) in PHP 7.

When developing the PHP language and its extensions, we have to distinguish two kinds of globals in the code.

There are true globals, which are simply traditional C global variables. Nothing is wrong with them architecturally, but since they are not protected against concurrency between threads, we may only read them while PHP is handling requests. True globals are created and written before any thread is created. In PHP's internal terminology this step is called module init. It is clearly visible in extensions:

    static int val; /* true global */

    PHP_MINIT(wow_ext) /* PHP module initialization */
    {
        if (something()) {
            val = 3; /* writing to a true global */
        }
    }

This pseudocode shows what any PHP extension might look like. Extensions have several hooks that are triggered during the PHP life cycle. The MINIT() hook belongs to PHP initialization. During this step PHP is starting up, and you can safely read a global variable or write to it, as in the example above.

The second important hook is RINIT(), request initialization. It is called for every extension whenever a new request is handled. That is, RINIT() can be called thousands of times for an extension. At this point PHP is already running inside a thread. The web server has split the initial process into threads, so thread safety is required in RINIT(). This is perfectly logical given that the threads are created to handle several requests simultaneously. Do not forget: you do not create the threads. The web server creates them, not PHP.

We also use thread globals. These are global variables whose thread safety is provided by the ZTS layer:

    PHP_RINIT(wow_ext) /* PHP request initialization */
    {
        if (something()) {
            WOW_G(val) = 3; /* writing to a thread global */
        }
    }

To access the thread global we used the WOW_G() macro. Let's see how it works.

The need for macros

Remember: when PHP runs in threads, access to every request-bound global state must be protected. Without threads this protection is not needed, since each process gets its own memory that nobody else uses.

So the way request-bound globals are accessed depends on the environment (on the multitasking model in use). We therefore need a way to access request-bound globals that looks the same regardless of the environment.

Macros are used for this.

The WOW_G() macro expands differently depending on the multitasking model PHP runs under (processes or threads). You change this by recompiling your extension. This is why PHP extensions are not compatible across ZTS and non-ZTS modes. They are binary incompatible!

ZTS is binary incompatible with non-ZTS. When you switch from one mode to the other, extensions must be recompiled.

When running in a process, the WOW_G() macro usually expands as follows:

    #ifndef ZTS
    #define WOW_G(v) wow_globals.v
    #endif

When running in a thread:

    #ifndef ZTS
    #define WOW_G(v) wow_globals.v
    #else
    #define WOW_G(v) (((zend_wow_globals *) (*((void ***) tsrm_get_ls_cache()))[((wow_globals_id)-1)])->v)
    #endif

In ZTS mode, things are more complicated.

When running in a process, that is in non-ZTS mode (Non Zend Thread Safe), a true global is used: wow_globals. This variable is a structure holding the global variables, and the macro gives access to each of them: WOW_G(foo) expands to wow_globals.foo. Naturally, this variable has to be declared so that it is zeroed at startup. This is also done with a macro (in ZTS mode it is done differently):

    ZEND_BEGIN_MODULE_GLOBALS(wow)
        int foo;
    ZEND_END_MODULE_GLOBALS(wow)

    ZEND_DECLARE_MODULE_GLOBALS(wow)

These macros then expand like this:

    #define ZEND_BEGIN_MODULE_GLOBALS(module_name) \
        typedef struct _zend_##module_name##_globals {
    #define ZEND_END_MODULE_GLOBALS(module_name) \
        } zend_##module_name##_globals;
    #define ZEND_DECLARE_MODULE_GLOBALS(module_name) \
        zend_##module_name##_globals module_name##_globals;

And that's all. When running in a process, nothing complicated.

But when running in a thread, that is with ZTS, we no longer have true C globals. The declaration of the globals, however, looks the same:

    #define ZEND_BEGIN_MODULE_GLOBALS(module_name) \
        typedef struct _zend_##module_name##_globals {
    #define ZEND_END_MODULE_GLOBALS(module_name) \
        } zend_##module_name##_globals;

    #define WOW_G(v) (((zend_wow_globals *) (*((void ***) tsrm_get_ls_cache()))[((wow_globals_id)-1)])->v)

In ZTS and non-ZTS alike, the globals are declared the same way.

But they are accessed differently. Under ZTS, the tsrm_get_ls_cache() function is called. This is a call into TLS storage that returns the memory area allocated for the current thread. As the leading casts to void suggest, this code is not that simple.
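The "one macro, two expansions" idea can be reproduced outside PHP in a few lines. The sketch below is not the real Zend code: `DEMO_G`, `DEMO_THREADED`, `demo_globals_t` and the helper functions are all invented for the illustration, and the threaded branch uses plain pthread keys instead of TSRM.

```c
typedef struct {
    int val;
} demo_globals_t;

#ifdef DEMO_THREADED
/* Threaded build: the globals live in per-thread storage. */
#include <pthread.h>
#include <stdlib.h>

static pthread_key_t demo_key;
static pthread_once_t demo_once = PTHREAD_ONCE_INIT;

static void demo_make_key(void)
{
    pthread_key_create(&demo_key, free);
}

static demo_globals_t *demo_storage(void)
{
    demo_globals_t *g;

    pthread_once(&demo_once, demo_make_key);
    g = pthread_getspecific(demo_key);
    if (g == NULL) {
        g = calloc(1, sizeof(*g));
        if (g == NULL) {
            abort();
        }
        pthread_setspecific(demo_key, g);
    }
    return g;
}
# define DEMO_G(v) (demo_storage()->v)
#else
/* Process build: a true C global is enough. */
static demo_globals_t demo_globals;
# define DEMO_G(v) (demo_globals.v)
#endif

/* Code using the macro is identical in both builds. */
int demo_set_and_get(int n)
{
    DEMO_G(val) = n;
    return DEMO_G(val) + 1;
}
```

Compiling with or without `-DDEMO_THREADED` changes what `DEMO_G(val)` turns into, but not the source of `demo_set_and_get()`. It also shows why the two builds cannot be mixed: the compiled function contains one expansion or the other.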

The TSRM layer

ZTS relies on the so-called TSRM layer, the Thread Safe Resource Manager. It is just a piece of C code, nothing more!

The TSRM layer is what makes ZTS work. Most of it lives in the /TSRM folder of the PHP source code.

TSRM is not a perfect layer. Overall it is well designed, and it has been around since the early days of PHP 5 (around 2004). TSRM can work with several low-level thread libraries: GNU Portable Threads, POSIX Threads, State Threads, Win32 Threads and BeThreads. The desired one can be selected at configure time (./configure --with-tsrm-xxxxx).

In our analysis of TSRM we will only discuss the pthreads-based implementation.

TSRM startup

When PHP starts up, during module initialization, it calls tsrm_startup() very early. PHP does not yet know how many threads will be created or how many resources thread safety will require. It prepares the thread tables with a single element each. The tables will grow later; for now they are simply allocated on the heap.

This initial step also matters because this is where the TLS key and the TLS mutex needed for synchronization are created.

    static pthread_key_t tls_key;

    TSRM_API int tsrm_startup(int expected_threads, int expected_resources, int debug_level, char *debug_filename)
    {
        pthread_key_create( &tls_key, 0 ); /* Create the key */

        tsrm_error_file = stderr;
        tsrm_error_set(debug_level, debug_filename);
        tsrm_tls_table_size = expected_threads;

        tsrm_tls_table = (tsrm_tls_entry **) calloc(tsrm_tls_table_size, sizeof(tsrm_tls_entry *));
        if (!tsrm_tls_table) {
            TSRM_ERROR((TSRM_ERROR_LEVEL_ERROR, "Unable to allocate TLS table"));
            return 0;
        }
        id_count=0;

        resource_types_table_size = expected_resources;
        resource_types_table = (tsrm_resource_type *) calloc(resource_types_table_size, sizeof(tsrm_resource_type));
        if (!resource_types_table) {
            TSRM_ERROR((TSRM_ERROR_LEVEL_ERROR, "Unable to allocate resource types table"));
            free(tsrm_tls_table);
            tsrm_tls_table = NULL;
            return 0;
        }

        tsmm_mutex = tsrm_mutex_alloc(); /* Allocate the mutex */
    }

    #define MUTEX_T pthread_mutex_t *

    TSRM_API MUTEX_T tsrm_mutex_alloc(void)
    {
        MUTEX_T mutexp;

        mutexp = (pthread_mutex_t *)malloc(sizeof(pthread_mutex_t));
        pthread_mutex_init(mutexp,NULL);
        return mutexp;
    }

TSRM Resources

Once the TSRM layer is started, new resources have to be added to it. A resource is a memory area holding a set of global variables, usually belonging to a PHP extension. A resource must either belong to the current thread or be protected against concurrent access.

This memory area has a size. It needs initialization (a constructor) and deinitialization (a destructor). Usually, initialization just zeroes the memory area, and deinitialization does nothing.

The TSRM layer assigns a unique ID to the resource. The caller must keep this ID, since it will be needed later to get the protected memory area back from TSRM.

The TSRM function that creates a new resource:

    typedef struct {
        size_t size;
        ts_allocate_ctor ctor;
        ts_allocate_dtor dtor;
        int done;
    } tsrm_resource_type;

    TSRM_API ts_rsrc_id ts_allocate_id(ts_rsrc_id *rsrc_id, size_t size, ts_allocate_ctor ctor, ts_allocate_dtor dtor)
    {
        int i;

        tsrm_mutex_lock(tsmm_mutex);

        /* obtain a resource id (IDs are 1-based, hence the -1 when indexing below) */
        *rsrc_id = (id_count++) + 1;

        /* store the new resource type in the resource types table */
        if (resource_types_table_size < id_count) {
            resource_types_table = (tsrm_resource_type *) realloc(resource_types_table, sizeof(tsrm_resource_type)*id_count);
            if (!resource_types_table) {
                tsrm_mutex_unlock(tsmm_mutex);
                TSRM_ERROR((TSRM_ERROR_LEVEL_ERROR, "Unable to allocate storage for resource"));
                *rsrc_id = 0;
                return 0;
            }
            resource_types_table_size = id_count;
        }
        resource_types_table[(*rsrc_id)-1].size = size;
        resource_types_table[(*rsrc_id)-1].ctor = ctor;
        resource_types_table[(*rsrc_id)-1].dtor = dtor;
        resource_types_table[(*rsrc_id)-1].done = 0;

        /* enlarge the arrays for the already active threads */
        for (i=0; i<tsrm_tls_table_size; i++) {
            tsrm_tls_entry *p = tsrm_tls_table[i];

            while (p) {
                if (p->count < id_count) {
                    int j;

                    p->storage = (void *) realloc(p->storage, sizeof(void *)*id_count);
                    for (j=p->count; j<id_count; j++) {
                        p->storage[j] = (void *) malloc(resource_types_table[j].size);
                        if (resource_types_table[j].ctor) {
                            resource_types_table[j].ctor(p->storage[j]);
                        }
                    }
                    p->count = id_count;
                }
                p = p->next;
            }
        }
        tsrm_mutex_unlock(tsmm_mutex);

        return *rsrc_id;
    }

As you can see, this function takes a mutex lock. If it is called in a child thread (and it will be called in each of them), it blocks the other threads until it has finished manipulating the global thread storage state.

The new resource is added to the dynamic array resource_types_table[] and receives a unique identifier, rsrc_id, which is incremented as resources are added.
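Stripped of the PHP specifics, the registration step boils down to "lock, grow a table, hand out the next ID, unlock". The following is a simplified sketch of that pattern only. It is not the TSRM code: `demo_allocate_id`, `demo_size_of` and the other names are hypothetical, and constructors, destructors and per-thread storage are left out.

```c
#include <pthread.h>
#include <stdlib.h>

typedef struct {
    size_t size; /* size of the memory block this resource needs per thread */
} demo_resource_type;

static demo_resource_type *demo_types;  /* dynamic table of registered resources */
static int demo_id_count;               /* number of IDs handed out so far */
static pthread_mutex_t demo_registry_lock = PTHREAD_MUTEX_INITIALIZER;

/* Registers a resource of the given size.
 * Returns its 1-based ID, or 0 if the table could not grow. */
int demo_allocate_id(size_t size)
{
    demo_resource_type *grown;
    int id = 0;

    pthread_mutex_lock(&demo_registry_lock);
    grown = realloc(demo_types, sizeof(*grown) * (demo_id_count + 1));
    if (grown != NULL) {
        demo_types = grown;
        demo_types[demo_id_count].size = size;
        id = ++demo_id_count; /* the ID is the table index plus one */
    }
    pthread_mutex_unlock(&demo_registry_lock);
    return id;
}

/* Looks up the size registered for an ID; returns 0 for an unknown ID. */
size_t demo_size_of(int id)
{
    size_t size = 0;

    pthread_mutex_lock(&demo_registry_lock);
    if (id >= 1 && id <= demo_id_count) {
        size = demo_types[id - 1].size;
    }
    pthread_mutex_unlock(&demo_registry_lock);
    return size;
}
```

Because the table is shared by all threads, every access goes through the mutex. An ID of 0 is kept free so it can signal failure, which is one plausible reason for the "minus one" indexing visible in the access macros.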

Request startup

Now we are ready to handle requests. Remember that each request is served in its own thread. What happens when a request comes in? At the very beginning of each new request, the ts_resource_ex() function is called. It reads the ID of the current thread and tries to fetch the resources allocated for that thread, that is, the memory for the current thread's globals. If no resources are found (the thread is new), resources are allocated for the current thread based on the model created when PHP started. This is done by allocate_new_resource():

    static void allocate_new_resource(tsrm_tls_entry **thread_resources_ptr, THREAD_T thread_id)
    {
        int i;

        TSRM_ERROR((TSRM_ERROR_LEVEL_CORE, "Creating data structures for thread %x", thread_id));
        (*thread_resources_ptr) = (tsrm_tls_entry *) malloc(sizeof(tsrm_tls_entry));
        (*thread_resources_ptr)->storage = NULL;
        if (id_count > 0) {
            (*thread_resources_ptr)->storage = (void **) malloc(sizeof(void *)*id_count);
        }
        (*thread_resources_ptr)->count = id_count;
        (*thread_resources_ptr)->thread_id = thread_id;
        (*thread_resources_ptr)->next = NULL;

        /* Set thread local storage to this new thread resources structure */
        tsrm_tls_set(*thread_resources_ptr);

        if (tsrm_new_thread_begin_handler) {
            tsrm_new_thread_begin_handler(thread_id);
        }
        for (i=0; i<id_count; i++) {
            if (resource_types_table[i].done) {
                (*thread_resources_ptr)->storage[i] = NULL;
            } else {
                (*thread_resources_ptr)->storage[i] = (void *) malloc(resource_types_table[i].size);
                if (resource_types_table[i].ctor) {
                    resource_types_table[i].ctor((*thread_resources_ptr)->storage[i]);
                }
            }
        }

        if (tsrm_new_thread_end_handler) {
            tsrm_new_thread_end_handler(thread_id);
        }

        tsrm_mutex_unlock(tsmm_mutex);
    }

Extension local storage cache

In PHP 7, each extension can declare its own local storage cache. This means that when each new thread starts, each extension reads the local storage area of its own thread once, rather than walking the list of storages on every global access. There is no magic here; a few things need to be done.

First, PHP must be compiled with cache support: pass -DZEND_ENABLE_STATIC_TSRMLS_CACHE=1 on the compilation command line. This should be the default anyway. Next, when declaring your extension, use the ZEND_TSRMLS_CACHE_DEFINE() macro:

#define ZEND_TSRMLS_CACHE_DEFINE() __thread void *_tsrm_ls_cache = ((void *)0);

As you can see, a true C global is declared, but with the special __thread specifier. It tells the compiler that the variable is thread specific.
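The effect of `__thread` can be seen in isolation with a small sketch. It assumes a compiler that supports this specifier, such as GCC or Clang; `per_thread_hits`, `hit_worker` and `thread_local_demo` are names invented for the example.

```c
#include <pthread.h>

/* Looks like an ordinary global, but every thread gets its own copy,
 * zero-initialized when the thread starts. */
static __thread int per_thread_hits;

static void *hit_worker(void *arg)
{
    int *io = (int *) arg; /* in: number of hits, out: this thread's total */
    int i;

    for (i = 0; i < *io; i++) {
        per_thread_hits++; /* touches only this thread's copy */
    }
    *io = per_thread_hits;
    return NULL;
}

int thread_local_demo(void)
{
    pthread_t worker;
    int n = 5;

    per_thread_hits = 42; /* the calling thread's copy */
    if (pthread_create(&worker, NULL, hit_worker, &n) != 0) {
        return -1;
    }
    pthread_join(worker, NULL);

    /* The caller still sees 42; the worker counted from 0 up to 5. */
    return per_thread_hits * 100 + n;
}
```

No key lookup and no function call are needed to reach the variable, which is what makes such a variable attractive as a cache for a pointer that would otherwise be fetched from TLS on every access.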

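Then this void * storage has to be filled with the data of the storage that the TSRM layer reserved for your globals. To do this, call ZEND_TSRMLS_CACHE_UPDATE() in the globals constructor:

```cpp
PHP_GINIT_FUNCTION(my_ext)
{
#ifdef ZTS
    ZEND_TSRMLS_CACHE_UPDATE();
#endif
    /* Continue initialization here */
}
```

Macro expansion:

```cpp
#define ZEND_TSRMLS_CACHE_UPDATE() _tsrm_ls_cache = tsrm_get_ls_cache();
```

With pthread:

```cpp
#define tsrm_get_ls_cache() pthread_getspecific(tls_key)
```

Finally, this is how the globals are accessed through the macro:

```cpp
#ifdef ZTS
#define MY_G(v) (((my_globals *) (*((void ***) _tsrm_ls_cache))[((my_globals_id)-1)])->v)
#endif
```

As you can see, the MY_G() macro reaches the globals through the _tsrm_ls_cache cache, using the ID of the extension: my_globals_id.

As we have seen, every extension is a resource and is given some space for its globals. The ID is used to get back the storage of that particular extension. TSRM creates this storage when a new thread/request starts.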

Conclusion

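Thread programming is not an easy task. Here we saw how PHP manages its globals: each thread gets its own storage through TLS, behind a layer called TSRM. When a new thread starts, a mutex is locked, storage for that thread's globals is created, and the mutex is released. This way every part of PHP and every extension can reach its own storage.

TSRM: everything is hidden behind this layer of C code, which eases the management of thread globals, in particular for extension authors. They access their globals through a macro which, under ZTS, expands to code that reaches their own storage.

Remember that in PHP these globals are request-bound.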

Source: https://habr.com/ru/post/329446/
