Not so long ago there was a wonderful article describing the general principles of working with the Gearman queue server. I would like to continue that material and complement it with some details of practical application, namely:
- server installation and management
- queue management: what is possible and how
- PECL and PEAR PHP extensions for working with Gearman
- server monitoring
- code examples
- data transfer in portions
- organizing parallel computing in PHP
Interested? Then read on.
Before getting to the material itself, I should say where it came from. It is a compilation of practical experience from several projects (all of which dealt with slow remote sites), extracts from documentation and discussions, plus some observations and abstract considerations. Of course, there may be inaccuracies, debatable decisions and debatable thoughts; the author will be grateful for any criticism and corrections.
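Before we go into details, let us illustrate once more how work with the server is organized, using the simplest example. First, how a function is called and executed in plain PHP: the calling script runs the function itself and receives the result directly.
Trivial. Now the same, but using the queue server: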
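The original listings are not reproduced here, so the following is only a minimal sketch of the two variants. It assumes PHP 7.1+ and a recent pecl gearman extension, where the synchronous call is named doNormal() (older releases call it do()); the function name reverse_text and the file name demo.php are hypothetical.

```php
<?php
// Plain PHP: the caller runs the function itself and gets the result directly.
function reverse_text(string $text): string
{
    return strrev($text);
}

// Worker side: the same logic wrapped as a Gearman handler.
// workload() returns the string that the client sent.
function reverse_handler(GearmanJob $job): string
{
    return reverse_text($job->workload());
}

function run_worker(): void
{
    $worker = new GearmanWorker();
    $worker->addServer('127.0.0.1', 4730);
    // Register the handler on the server under the name "reverse_text".
    $worker->addFunction('reverse_text', 'reverse_handler');
    while ($worker->work()) {
        // work() blocks until a job arrives, then runs the handler.
    }
}

function run_client(string $text): string
{
    $client = new GearmanClient();
    $client->addServer('127.0.0.1', 4730);
    // Sends the function name and its argument, then waits for the result.
    return $client->doNormal('reverse_text', $text);
}

// Usage: "php demo.php worker" in one terminal, "php demo.php client" in another.
$mode = $argv[1] ?? 'local';
if ($mode === 'worker') {
    run_worker();
} elseif ($mode === 'client') {
    echo run_client('Hello'), PHP_EOL;
} else {
    echo reverse_text('Hello'), PHP_EOL; // direct call, no queue server involved
}
```

Run without arguments, the script only performs the direct call, so it can be tried even without a Gearman server.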
In short, the script that needs the result or side effect of a function (the "client") sends the function name and its arguments to the server, that is, registers them there; a task is created on the server.
If a handler (a "worker") is registered on the server for a function with that name and it is currently free, the function name and the data to process are passed to it in the form of a job. If no worker is registered for that function, or the worker is busy, the task is queued and waits for processing.
After processing, the worker passes the function's return value back to Gearman. The queue server looks up which client registered the task with that name and that data, and sends the worker's result to it. This applies only if the client did not register the task as a background task; for a background task nothing is sent back to the client.
What follows immediately from this principle of operation?
1. Four objects are involved in working with Gearman: client, task, worker, job. All of them are described on the official PHP site.
2. The data transferred, both from the client to the server and back from the worker, is only a string, and only one string. If you need to pass several arguments, or something other than a string (an array, for example), serialization is required.
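As an illustration of the second point, here is a minimal sketch of packing several arguments into the single string that Gearman carries. JSON is only one possible choice (serialize() would also work), and the helper names pack_workload and unpack_workload are hypothetical.

```php
<?php
// Pack several arguments into one string for the queue server.
function pack_workload(array $args): string
{
    $json = json_encode($args);
    if ($json === false) {
        throw new InvalidArgumentException('Arguments cannot be encoded as JSON');
    }
    return $json;
}

// Restore the arguments on the worker side (or the result on the client side).
function unpack_workload(string $workload): array
{
    $args = json_decode($workload, true);
    if (!is_array($args)) {
        throw new InvalidArgumentException('Workload is not a JSON array or object');
    }
    return $args;
}

// The client would send $workload; the worker would call unpack_workload() on it.
$workload = pack_workload(['url' => 'http://example.com/page', 'retries' => 3]);
$args = unpack_workload($workload);
echo $args['url'], ' ', $args['retries'], PHP_EOL;
```

The same pair of helpers can be applied to the value the worker returns, since the result is also limited to one string.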
However, all of the above is just a brief recap of the official documentation and of already published materials. Now for the nuances.
Gearman Installation and Management
If you really need to, you can run a Gearman server on Windows: there is a Java implementation at https://launchpad.net/gearman-java
But we will consider working under Linux (the following examples are for Debian). Server installation goes smoothly and has no peculiarities:
aptitude install gearman-job-server
After a successful installation, the server is controlled through /etc/init.d/gearman-job-server with simple and clear commands: {start | stop | restart | force-reload}. The default host for the server is localhost, and the default port is 4730.
Server state monitoring
There is always a need to see what is happening on the server: which tasks are registered, how many are in the queue, and how many workers are registered for each task. This can be done from the console with the command (echo status) | netcat 127.0.0.1 4730
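The same information can be read from PHP. The sketch below assumes the server's text admin protocol, in which the status command returns one tab-separated line per function (name, jobs in the queue, jobs running, available workers) and a line with a single dot marks the end. The function names and array keys are hypothetical.

```php
<?php
// Parse the text returned by the "status" admin command.
function parse_gearman_status(string $response): array
{
    $status = [];
    foreach (preg_split('/\r?\n/', $response) as $line) {
        $parts = explode("\t", $line);
        if (count($parts) !== 4) {
            continue; // skips the terminating "." and blank lines
        }
        [$name, $total, $running, $workers] = $parts;
        $status[$name] = [
            'total'   => (int) $total,   // jobs in the queue, including running ones
            'running' => (int) $running, // jobs being processed right now
            'workers' => (int) $workers, // workers registered for this function
        ];
    }
    return $status;
}

// Ask a live server for its status over a plain TCP socket.
function fetch_gearman_status(string $host = '127.0.0.1', int $port = 4730): array
{
    $socket = @fsockopen($host, $port, $errno, $errstr, 2.0);
    if ($socket === false) {
        throw new RuntimeException("Cannot connect to $host:$port: $errstr");
    }
    fwrite($socket, "status\n");
    $response = '';
    while (($line = fgets($socket)) !== false) {
        if (rtrim($line) === '.') {
            break; // end of the status listing
        }
        $response .= $line;
    }
    fclose($socket);
    return parse_gearman_status($response);
}

// Usage: "php status.php live" queries the server; otherwise a sample is parsed.
if (($argv[1] ?? '') === 'live') {
    print_r(fetch_gearman_status());
} else {
    print_r(parse_gearman_status("fetch_site\t12\t1\t4\nreverse_text\t0\t0\t2\n.\n"));
}
```

Keeping the parser separate from the socket code makes it easy to feed the result into a monitoring page or an alert script.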
What happens to the queue when Gearman restarts
A question arises immediately: what will happen to the queue on a running server if you restart it? With the default installation the queue is stored in memory, so restarting the server cancels all tasks that clients have sent to it and cancels the registration of all workers. In this case the workers will raise exceptions like "Lost connection with the server." The queue can instead be persisted in a database; MySQL, PostgreSQL and SQLite are supported. How to set up such a queue is well described on the official site: http://gearman.org/index.php?id=manual:job_server#persistent_queues
Keep in mind that even if the queue is persisted, the task handlers (workers) will still "fall off" the server when it restarts. You will either have to restart them or build handling of this situation into the workers themselves in advance.
Clearing the entire queue and clearing the queue for a single task
If the queue is not stored in a database, restarting the server, as mentioned above, clears the entire queue. If the queue is stored in a database, then in addition to restarting the server you need to clear the corresponding table, entirely or selectively. Recall that restarting the server will raise exceptions in both clients and workers.
However, there may be a situation where you need to clear the queue for only one task. Example: we parse 100 sites, one of them has stopped responding, and 1000 tasks for fetching data from that site have piled up in the queue, while data from all the other sites comes through the queue server without problems. Only the queue for the stalled site needs to be cleared. The easiest and most painless way is to run a fake worker that registers a function with the same name on the queue server but returns NULL, an empty string or anything else quickly; speed is the only thing that matters. This worker will run through all the pending tasks and the queue will be cleared.
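A fake worker of this kind might look roughly like the sketch below. It assumes the pecl gearman extension and that the real workers for the function have been stopped first, otherwise they will compete for the same jobs. The function name drain_queue, the file name drain.php and the 5 second idle timeout are hypothetical choices.

```php
<?php
// Register a do-nothing handler under the given name and consume pending jobs.
// Returns the number of jobs that were discarded.
function drain_queue(string $functionName, int $idleTimeoutMs = 5000): int
{
    $worker = new GearmanWorker();
    $worker->addServer('127.0.0.1', 4730);
    // Give up once no job has arrived within the timeout, i.e. the queue is empty.
    $worker->setTimeout($idleTimeoutMs);

    $drained = 0;
    $worker->addFunction($functionName, function (GearmanJob $job) use (&$drained) {
        $drained++;
        return ''; // answer immediately with an empty result
    });

    // work() returns false on timeout or error, which ends the loop.
    while (@$worker->work() && $worker->returnCode() === GEARMAN_SUCCESS) {
        // nothing to do: the handler above has already discarded the job
    }
    return $drained;
}

// Usage: php drain.php fetch_site
if (isset($argv[1])) {
    echo 'Discarded jobs: ', drain_queue($argv[1]), PHP_EOL;
} else {
    echo 'Usage: php drain.php <function name>', PHP_EOL;
}
```

Clients that are waiting synchronously for these jobs will receive the empty result, so they should be prepared to treat it as "no data".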
Restarting a worker
Why might this be needed? Apart from the obvious case where a worker has hung beyond any doubt, there is another, very frequent one. The worker is running and has registered its functions on the server. If you now change the worker's code, then no matter how many times you save the file, tasks from the queue server will still be processed by the code that was loaded when the functions were registered. For the changes to take effect, that is, for tasks to be processed by the modified code, the worker must be restarted: the currently running worker script has to be stopped and started again. The methods are trivial: manually, with a bash script, or by providing a dedicated task in the worker (something like exit_worker) and relaunching the worker from cron, and so on.
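One more possible approach, shown here only as a sketch and not as the author's method: the worker remembers the modification time of its own file and exits as soon as the file changes, leaving the relaunch to cron or a supervisor. The pecl gearman extension is assumed; the helper code_changed, the function name fetch_site and the 5 second timeout are hypothetical.

```php
<?php
// True if the file was modified after the worker loaded it.
function code_changed(string $file, int $loadedMtime): bool
{
    clearstatcache(true, $file); // PHP caches stat() results per request
    return filemtime($file) !== $loadedMtime;
}

function run_reloadable_worker(string $file): void
{
    $loadedMtime = filemtime($file);

    $worker = new GearmanWorker();
    $worker->addServer('127.0.0.1', 4730);
    // Wake up regularly even when idle, so a code change is noticed.
    $worker->setTimeout(5000);
    $worker->addFunction('fetch_site', function (GearmanJob $job) {
        return strtoupper($job->workload()); // placeholder for the real work
    });

    while (true) {
        @$worker->work();
        $code = $worker->returnCode();
        if ($code !== GEARMAN_SUCCESS && $code !== GEARMAN_TIMEOUT) {
            break; // for example, the connection to the server was lost
        }
        if (code_changed($file, $loadedMtime)) {
            break; // exit so that the relaunched process picks up the new code
        }
    }
}

// Usage: php worker.php run   (relaunch it from cron or a supervisor)
if (($argv[1] ?? '') === 'run') {
    run_reloadable_worker(__FILE__);
} else {
    echo 'Usage: php worker.php run', PHP_EOL;
}
```

The check happens between jobs, so a job that is already running is always finished by the old code before the worker exits.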
PECL and PEAR PHP extensions for working with Gearman
Two variants of extensions can be used to work with the Gearman queue server in php: pecl gearman and pear Net_Gearman
There is a fundamental difference between these extensions.
pear Net_Gearman
pear Net_Gearman is just a few PHP files, and that is all. It is very easy to install on the server: pear install channel://pear.php.net/Net_Gearman-0.2.3 You do not even have to install it on the server: just unpack the archive, include the corresponding classes, and it is ready to use.
This, by the way, makes it possible to work with Gearman in PHP under Denwer or OpenServer, for example.
The current version is an alpha release dated 2009. That is nothing to be afraid of: nobody stops you from editing the files you need, and no additional components or libraries are used. Despite its age, the library works on PHP 5.3.* and needs no modification.
Note an important detail of pear Net_Gearman: the library includes tools for monitoring the server and (partially) managing it, which makes it possible to write, for example, the monitoring script mentioned above.
pecl gearman
Despite all its advantages, the interface of the pear Net_Gearman library is poor and, as has been said somewhere, somewhat clumsy. In addition, IDEs (PhpStorm, for example) do not know the Net_Gearman classes, so a workaround is needed for comfortable work, and pecl gearman is better suited for production use. The main difference between pecl gearman and pear Net_Gearman is that pecl gearman does not talk to the server directly: it is a wrapper around the libgearman C library.
Installing pecl gearman
It would be nice to simply run: pecl install gearman But this will not work. At the end of the installation a message appears saying that libgearman version 0.21 or higher is required; building such a library drags several more steps along with it, and not all of them are guaranteed to succeed. There are many recommendations on the Internet to install pecl gearman 0.7.0, but it has a bug: there is a high probability of crashing with a Segmentation fault. Several tests showed that version 0.8.1 works stably and without problems. Here is the installation procedure for debian6 / php5.3.* from scratch, starting with the server (for debian7 / php5.4, see this post here):
aptitude install gearman-job-server
aptitude install php5-dev
aptitude install php-pear
aptitude install make
aptitude install libgearman-dev
cd /tmp
pecl download gearman-0.8.1
tar -xvf gearman-0.8.1.tgz
cd gearman-0.8.1
phpize
./configure
make
make test
make install
echo 'extension=gearman.so' > /etc/php5/conf.d/gearman.ini
Now you can safely use all the beauty of the library in a real project. The IDE contains classes for pecl gearman, no further action is required. All the examples below use the pecl gearman extension.
Data transmission in parts and additional data exchange with the client
It is not always convenient to use the do operation in the client; it is often better to add a task and process the data coming from the server with simple callback functions. In addition, if the data is transmitted in parts, it makes sense to report the number of processed parts, for example for a progress bar.
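Since the original listings are missing, here is a minimal sketch of this pattern, assuming the pecl gearman extension: the worker sends each processed part with sendData() and reports progress with sendStatus(), while the client reacts through callbacks. The function name process_in_parts, the file name parts.php, the part size and the upper-casing "work" are hypothetical.

```php
<?php
// Cut the input into fixed-size parts (stands in for real chunked work).
function split_into_parts(string $text, int $size): array
{
    return str_split($text, $size);
}

// Worker handler: sends every processed part and the progress so far.
function process_in_parts(GearmanJob $job): string
{
    $parts = split_into_parts($job->workload(), 10);
    $total = count($parts);
    foreach ($parts as $i => $part) {
        $job->sendData(strtoupper($part)); // one portion of the result
        $job->sendStatus($i + 1, $total);  // numerator / denominator for a progress bar
    }
    return 'done';
}

function run_worker(): void
{
    $worker = new GearmanWorker();
    $worker->addServer('127.0.0.1', 4730);
    $worker->addFunction('process_in_parts', 'process_in_parts');
    while ($worker->work()) {
        // keep serving jobs
    }
}

function run_client(string $text): void
{
    $client = new GearmanClient();
    $client->addServer('127.0.0.1', 4730);
    $client->setDataCallback(function (GearmanTask $task) {
        echo 'part: ', $task->data(), PHP_EOL;
    });
    $client->setStatusCallback(function (GearmanTask $task) {
        echo 'progress: ', $task->taskNumerator(), '/', $task->taskDenominator(), PHP_EOL;
    });
    $client->setCompleteCallback(function (GearmanTask $task) {
        echo 'result: ', $task->data(), PHP_EOL;
    });
    $client->addTask('process_in_parts', $text);
    $client->runTasks(); // blocks until the task completes, firing the callbacks
}

// Usage: "php parts.php worker" and, in another terminal, "php parts.php client".
$mode = $argv[1] ?? 'local';
if ($mode === 'worker') {
    run_worker();
} elseif ($mode === 'client') {
    run_client('some long text that will be processed in several parts');
} else {
    print_r(split_into_parts('abcdefghij', 4)); // shows only the splitting step
}
```

The callbacks must be set before addTask() is called, because they are attached to tasks at the moment the tasks are created.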
Whether a task is normal or background (synchronous or asynchronous), and what its execution priority is, are both set in the client. If a task is added to the server as a background task, the client simply "throws" it onto the server and takes no interest in its further fate. If the task is normal, the client waits for the server to return the result of its execution. For background tasks the suffix Background is used. There are three priority levels: normal (no prefix), and low and high, marked Low and High respectively. For example, the addTaskHighBackground method adds a high-priority background task. All this is well described in the official documentation, in the GearmanClient section.
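The naming scheme can be sketched in a few lines. The helper add_task_method and the wrapper submit_task are hypothetical; the method names they produce (addTask, addTaskHigh, addTaskLowBackground and so on) are the ones provided by GearmanClient in the pecl gearman extension.

```php
<?php
// Build the GearmanClient method name for a given priority and mode.
function add_task_method(string $priority = 'normal', bool $background = false): string
{
    $levels = ['normal' => '', 'high' => 'High', 'low' => 'Low'];
    if (!isset($levels[$priority])) {
        throw new InvalidArgumentException("Unknown priority: $priority");
    }
    return 'addTask' . $levels[$priority] . ($background ? 'Background' : '');
}

// Add a task with the chosen priority and mode; call runTasks() afterwards.
function submit_task(
    GearmanClient $client,
    string $function,
    string $workload,
    string $priority = 'normal',
    bool $background = false
) {
    $method = add_task_method($priority, $background);
    return $client->{$method}($function, $workload);
}

// Print all six combinations.
foreach (['normal', 'high', 'low'] as $priority) {
    foreach ([false, true] as $background) {
        echo add_task_method($priority, $background), PHP_EOL;
    }
}
```

For a background task the complete callback does not deliver a result to the client, so the result, if it is needed, has to be stored by the worker itself.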
Parallel computations in PHP using Gearman
And for dessert, of course, the tastiest part. The code of this example is rather long, so it is not given here, but there is nothing complicated in it: the client adds 800+ tasks to the queue server and 4 workers handle them. The tasks themselves translate pieces of text using the Yandex Translate API; the job as a whole is the translation of the book The Godfather from Russian into Ukrainian.
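The client side of such a scheme can be outlined as follows. This is not the code from the example described above, only a sketch under stated assumptions: the pecl gearman extension, several copies of a worker already running and registered for a hypothetical function translate_chunk, and text split on blank lines. The translation call itself is not shown.

```php
<?php
// Split a text into paragraphs on blank lines.
function split_paragraphs(string $text): array
{
    $parts = preg_split('/\R{2,}/', trim($text));
    return array_values(array_filter($parts, function ($p) {
        return trim($p) !== '';
    }));
}

// Send every chunk as a separate task and reassemble the results in order.
function process_in_parallel(array $chunks, string $function): string
{
    $client = new GearmanClient();
    $client->addServer('127.0.0.1', 4730);

    // The unique id carries the chunk index; the prefix keeps separate runs apart.
    $prefix = uniqid('batch', true);
    $results = [];
    $client->setCompleteCallback(function (GearmanTask $task) use (&$results) {
        $index = (int) substr(strrchr($task->unique(), ':'), 1);
        $results[$index] = $task->data();
    });

    foreach ($chunks as $i => $chunk) {
        $client->addTask($function, $chunk, null, $prefix . ':' . $i);
    }
    // All tasks are submitted at once; free workers pick them up concurrently.
    $client->runTasks();

    ksort($results); // tasks may finish in any order
    return implode("\n\n", $results);
}

// Usage: php parallel.php book.txt
if (isset($argv[1])) {
    $chunks = split_paragraphs(file_get_contents($argv[1]));
    echo process_in_parallel($chunks, 'translate_chunk'), PHP_EOL;
} else {
    print_r(split_paragraphs("First paragraph.\n\nSecond paragraph.\n\nThird."));
}
```

The degree of parallelism is determined simply by how many worker processes are started; the client code does not change.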