
Survey: how do you solve the problem of synchronizing parallel PHP requests?

For a long time I have been trying to understand how much attention the average PHP programmer pays in daily practice to parallelism and concurrency of code execution. On the one hand, when developing a server application, the programmer automatically writes code that will be executed in parallel. On the other hand, almost all the practical problems in this area have traditionally been solved in PHP by the tools everyone already uses: the web server, the session, and the DBMS.

Do your projects pay attention to synchronizing the parallel HTTP requests they handle? Do you solve it with transactions, with locks? What locking methods do you use? And do you need to worry about this at all, or is the topic irrelevant? Let's find out what the audience thinks. This post does not give answers; it is an exploration.


***
In the PHP world it has historically been customary not to pay much attention to parallel code execution. PHP itself is not built for multithreading (the engine is not thread-safe). I have never met projects or programmers who used pthreads (have you? Then tell us about it in the comments). And multithreading is not really what web applications need in practice anyway: requests must be handled in parallel, not separate parts of the code within a single request. Since parallel execution of incoming requests in separate processes is organized by the application server (php-fpm or Apache), the programmer does not need to think about it - everything works out of the box.

PHP has a session mechanism that is used in the overwhelming majority of projects. With default settings, the session blocks parallel execution of requests within a single session. This "covers" some of the holes and means that in practice you may never run into the obvious problems. That is, until a user starts pushing the limits, for example by working in two browsers at the same time, nothing breaks even though there are no locks or transactions.
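To illustrate (a minimal sketch, assuming PHP's default file-based session handler): session_start() takes an exclusive lock on the session file, so a second request with the same session cookie simply waits, and session_write_close() releases the lock early if you do want parallel requests from one session to proceed.

```php
<?php
// With the default "files" session handler, session_start() acquires an
// exclusive lock on the session file. A second request carrying the same
// session cookie blocks here until the first request finishes or releases
// the lock explicitly.
session_start();

$_SESSION['last_seen'] = time();

// Releasing the lock early lets other requests from the same session run
// in parallel while this one continues doing slow work.
session_write_close();

sleep(5); // simulates slow work that no longer blocks the other requests
```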

In addition, the probability of a collision caused by parallelism is very small for sites that peak at fewer than two requests per second (assuming the response is generated in under a second).

And finally, if parallelism problems do surface, the easiest way to solve them is a transaction in the database with a sufficient isolation level. Since almost all sites developed in PHP use a transactional DBMS for storage, it is often enough to simply start using transactions to fix the data inconsistencies caused by concurrent request execution. Even without going deep into process synchronization, transactions alone can solve the problem.
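As a minimal sketch with PDO (the table, columns, and connection details are invented for the example): an explicit row lock inside a transaction keeps two parallel requests from both reading the same old value and overwriting each other's result.

```php
<?php
// Hypothetical example: decrement a stock counter without letting two
// parallel requests both see the same starting value.
$pdo = new PDO('mysql:host=localhost;dbname=shop', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$pdo->beginTransaction();
try {
    // SELECT ... FOR UPDATE takes a row lock: a concurrent transaction
    // running the same SELECT waits until we commit or roll back.
    $stmt = $pdo->prepare('SELECT qty FROM stock WHERE product_id = ? FOR UPDATE');
    $stmt->execute([42]);
    $qty = (int) $stmt->fetchColumn();

    if ($qty > 0) {
        $pdo->prepare('UPDATE stock SET qty = qty - 1 WHERE product_id = ?')
            ->execute([42]);
    }
    $pdo->commit();
} catch (Throwable $e) {
    $pdo->rollBack();
    throw $e;
}
```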

All this means that in practice the average PHP programmer almost never encounters problems of parallel code execution. Most know very little about parallel programming, synchronization, and locking; it shows clearly in interviews. How much this knowledge is actually in demand is what I want to find out in this survey.

In part this is all well and good: a low barrier to entry is one of PHP's main advantages, and development goes faster when you don't spend time working through every bottleneck. But sooner or later many developers come face to face with process synchronization, and building applications that are reliable, rather than merely lucky, requires studying this subject to some degree.

The simplest example where a transaction does not help is warming up a cache. To keep cached data from being generated in parallel, competing requests have to be blocked so that the request that started first fills the cache. You cannot get by without a lock, and if there are several servers, the lock has to be centralized. A sketch of this case follows below. Another example is file hosting. A user has a limit on the number of files he can upload. When a file is added, you need to compare the number of already uploaded files with the limit and accept the file only if the limit has not been reached. You could pull off a trick and do without locks, but the simplest approach is: take a lock for the user, check the counter, reserve a slot for the file, release the lock, and only then accept the file body itself.
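Here is a rough sketch of the cache-warming case using a plain flock() on a single server (the file names and TTL are arbitrary; with several servers you would use a centralized lock instead):

```php
<?php
// Sketch: warm a file cache under an exclusive flock() so only one request
// regenerates the value; the others wait on the lock and then read the result.
function getCachedReport(string $cacheFile, int $ttl = 300): string
{
    if (is_file($cacheFile) && time() - filemtime($cacheFile) < $ttl) {
        return file_get_contents($cacheFile); // warm cache, no lock needed
    }

    $lock = fopen($cacheFile . '.lock', 'c');
    flock($lock, LOCK_EX);               // competing requests queue up here
    try {
        // Re-check: another request may have warmed the cache while we waited.
        if (is_file($cacheFile) && time() - filemtime($cacheFile) < $ttl) {
            return file_get_contents($cacheFile);
        }
        $value = 'expensive result: ' . date('c'); // stands in for slow generation
        file_put_contents($cacheFile, $value);
        return $value;
    } finally {
        flock($lock, LOCK_UN);
        fclose($lock);
    }
}
```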

Using transactions has its own problems, too. At a minimum, when a race condition occurs and the transaction is rolled back because of a conflict, it has to be retried. And there are open questions when working with resources outside the database: files, the cache, calls to a remote API.
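A retry loop might look roughly like this (the error codes are MySQL-specific and the attempt limit is arbitrary):

```php
<?php
// Sketch: rerun a transaction a few times when it is rolled back because of
// a deadlock or lock wait timeout. Error codes 1213 and 1205 are MySQL's.
function withRetry(PDO $pdo, callable $work, int $attempts = 3)
{
    for ($i = 1; ; $i++) {
        $pdo->beginTransaction();
        try {
            $result = $work($pdo);
            $pdo->commit();
            return $result;
        } catch (PDOException $e) {
            $pdo->rollBack();
            $retryable = in_array($e->errorInfo[1] ?? null, [1213, 1205], true);
            if (!$retryable || $i >= $attempts) {
                throw $e;
            }
            // otherwise loop and try the whole transaction again
        }
    }
}
```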

***

In fact, all PHP programmers write code that runs in a concurrent environment, often a highly concurrent one, and access to shared resources from parallel processes has to be synchronized. I think many of you, like me, would be interested to hear how fellow developers look at this problem. How do your projects solve the synchronization of access to shared resources?

How do we solve this problem?
We use transactions and locks. Transactions help preserve data consistency when the task boils down to a series of queries against the database. When we need to synchronize code that works with more than just the database, or does not touch it at all, we use locks through my lock abstraction library. If the backend runs on a single server, the flock() driver is enough; if the lock has to be distributed, the Redis or Memcache drivers can be used.


If you have good materials on this topic, share links in the comments.

P.S. To fans of other programming languages: if you want to tell us how this problem is solved well in your language or framework, you are welcome to. Otherwise, please mind the hub this publication is posted in.

Source: https://habr.com/ru/post/250617/
