
Gearman Queue Server: Practical Experience and the Gearman Monitor && Control web application

The Gearman queue server is a great tool. But working with it is like working with a bare system unit: it is doing something, but to know what, and to control the process, you need a monitor, a keyboard, and an idea of what is going on inside the box.
It often seems that Gearman is like an exotic tool without a handle: interesting and beautiful, but it is not clear what it is for, and it is painful to use.
We need a way out of this situation, because Gearman really is good.
This article covers how Gearman works, real tasks solved with it, and a web application for monitoring and controlling it.


Interested? Read on.


Can the principle behind Gearman be explained in simple terms?

Yes.
What is a worker?
It is just a console script; the programming language does not matter. The script runs all the time: it is a daemon.
When started, the worker daemon opens a socket to the server, sends it strings with the names of the functions it can perform, and waits.
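
The worker described above can be sketched roughly as follows. This assumes the PECL gearman extension and a server on the default port 4730; the function name `reverse`, the helper names and the `--run` flag are hypothetical and chosen only for illustration.

```php
<?php
// The job logic itself, kept separate from the Gearman plumbing.
function reverse_text(string $workload): string
{
    return strrev($workload); // byte-wise reversal, enough for a demo
}

// Connect, register one function name, then wait for tasks forever.
function run_worker(string $host = '127.0.0.1', int $port = 4730): void
{
    $worker = new GearmanWorker();
    $worker->addServer($host, $port);

    // Tell the server that this worker can perform "reverse".
    $worker->addFunction('reverse', function (GearmanJob $job): string {
        return reverse_text($job->workload());
    });

    // work() blocks until a task arrives, runs the callback, then returns.
    while ($worker->work()) {
        if ($worker->returnCode() !== GEARMAN_SUCCESS) {
            break;
        }
    }
}

// Start the daemon only when asked to, e.g. `php worker.php --run`.
if (PHP_SAPI === 'cli' && in_array('--run', $_SERVER['argv'] ?? [], true)) {
    run_worker();
}
```

Keeping the job logic in its own function means it can be checked without a running queue server.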

What data does a worker get?
The worker receives two strings: the first is the name of the function to execute, the second is the argument. Note that you cannot pass several arguments, an array, or an object to a worker. The argument is just a string, so any other data must first be converted to a string, by serialization or JSON.
There is nothing surprising in this: all data transfer over a protocol such as HTTP is also just the transfer of strings.
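
Since the worker receives a single string, structured data has to be packed before sending and unpacked on arrival. A minimal sketch using JSON; the helper names `pack_workload` and `unpack_workload` and the sample fields are hypothetical, not part of Gearman.

```php
<?php
// Pack several arguments into the single string a task can carry.
function pack_workload(array $args): string
{
    return json_encode($args, JSON_THROW_ON_ERROR);
}

// Restore the arguments inside the worker.
function unpack_workload(string $workload): array
{
    return json_decode($workload, true, 512, JSON_THROW_ON_ERROR);
}

// Client side: build the string argument for the task.
$workload = pack_workload(['product_id' => 42, 'price' => 19.99]);

// Worker side: turn the string back into an array.
$args = unpack_workload($workload);
```

PHP's `serialize()` would work as well, but JSON keeps the payload readable for workers written in other languages.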

Where do these strings, the tasks for the worker, come from?
When sending a task to the server, the Gearman client transmits just two strings: the name of the function to execute and the argument as a string. The server passes them on to a worker, and the task is marked in the queue as in progress.

What happens next?
If the task is a background task, the worker performs the function and sends only a completion signal to the server. If the task is not a background task, the worker sends back a string, the result of executing the function, together with the completion signal.
When the completion signal is received, Gearman marks the task as completed and removes it from the queue.

And what if the worker crashes while performing the function, or an error occurs inside the function, so that the function is never completed?
In that case Gearman does not receive a completion signal, and the task stays in the queue.

What is a client?
A separate "client" concept is hard to single out for Gearman. You talk to Gearman from a programming language, sending and receiving data: you can send a task and wait for the result, or send it in the background. In any case, the client side is the simpler one.

"Receives", "sends", "arrives": how is data actually exchanged with the server? Do you have to look for low-level language extensions?
Communication with Gearman takes place over sockets. You can do without any language extensions and work with Gearman through plain sockets, or even Telnet. In PHP, for example, you can work with Gearman by installing the PECL extension, or you can use the PEAR library by simply including the files of a few classes.

What if the client sends a task to the server, but there is no worker for this function, either because none is free or because there are none at all?
The task waits for a worker. If more tasks arrive for the same function, a queue forms; Gearman is a queue server, after all.

Is it possible to distinguish two tasks on the queue server for the same function?
No. Other than by their argument, there is no way.

Is it possible to find out which particular tasks are in the queue?
No, since tasks for the same function are indistinguishable.

Can I find out how many tasks are in the queue?
Yes. You can also find out how many workers can handle a given function, and where those workers connect from (their IP addresses).
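
These numbers are available through Gearman's text administration protocol: the `status` command, sent over a plain socket, returns one tab-separated line per function and ends with a line containing a single dot. A sketch of reading and parsing that reply; the function names `parse_status` and `fetch_status` and the array keys are assumptions made for illustration.

```php
<?php
// Parse the reply to "status". Each line holds: function name, total tasks
// in the queue, tasks currently running, number of capable workers.
function parse_status(string $reply): array
{
    $functions = [];
    foreach (preg_split('/\r?\n/', trim($reply)) as $line) {
        if ($line === '.' || $line === '') {
            break; // a single dot marks the end of the reply
        }
        [$name, $total, $running, $workers] = explode("\t", $line);
        $functions[$name] = [
            'total'   => (int) $total,
            'running' => (int) $running,
            'workers' => (int) $workers,
        ];
    }
    return $functions;
}

// Ask a server for its status over a plain socket (default port 4730 assumed).
function fetch_status(string $host = '127.0.0.1', int $port = 4730): array
{
    $socket = fsockopen($host, $port, $errno, $errstr, 5);
    if ($socket === false) {
        throw new RuntimeException("Cannot connect: $errstr");
    }
    fwrite($socket, "status\n");
    $reply = '';
    while (($line = fgets($socket)) !== false) {
        $reply .= $line;
        if (rtrim($line) === '.') {
            break;
        }
    }
    fclose($socket);
    return parse_status($reply);
}
```

The `workers` command of the same protocol lists connected workers with their IP addresses and registered functions.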

Imagine that one task has arrived and several workers can do it. Which one gets the task?
Tasks are indistinguishable, and so are workers. Which worker Gearman will hand the task to is not easy to say. There is no queue of workers.

What kinds of tasks are there on the queue server?
Tasks can be divided 1) by priority and 2) into background and non-background tasks.
1) There are three priorities: normal, low (lower than normal) and high (higher than normal). The priority is taken into account in the queue; the default is normal.
2) A background task returns nothing to the queue server, and the client receives no data. A normal task should return a string, and Gearman passes this string back to the client as the result of the task.
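
In the PECL extension these two properties correspond to six `GearmanClient` methods: `doNormal`, `doHigh`, `doLow`, `doBackground`, `doHighBackground` and `doLowBackground`. A small sketch of that mapping; the helper names `client_method` and `submit` are hypothetical.

```php
<?php
// Map a priority and a background flag onto a GearmanClient method name.
function client_method(string $priority = 'normal', bool $background = false): string
{
    $prefix = ['normal' => 'do', 'high' => 'doHigh', 'low' => 'doLow'];
    if (!isset($prefix[$priority])) {
        throw new InvalidArgumentException("Unknown priority: $priority");
    }
    if ($background) {
        return $prefix[$priority] . 'Background';
    }
    // A foreground task with normal priority is doNormal() in the PECL API.
    return $priority === 'normal' ? 'doNormal' : $prefix[$priority];
}

// Hypothetical wrapper; needs the PECL extension and a reachable server.
function submit(
    GearmanClient $client,
    string $function,
    string $workload,
    string $priority = 'normal',
    bool $background = false
): string {
    $method = client_method($priority, $background);
    // Foreground calls return the result string, background calls a job handle.
    return $client->$method($function, $workload);
}
```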

How many workers can you run at the same time?
To be precise, the question is not how many you can start, but how many workers can be connected to Gearman at once to handle tasks.
In theory, as many as you need. In practice, the number is limited by conditions that do not depend on Gearman: the maximum number of simultaneous connections to an external resource, the number of simultaneous database connections, and so on. The author of these lines managed to run about 1000 on one server before MySQL started complaining.

Is it possible to work with Gearman from PHP on Windows?
The question splits in two: 1) can the Gearman queue server itself be run on Windows, and 2) can you work with a queue server that already runs on an external host.
1) There is the option of installing Gearman on Windows using Cygwin.

2) For comfortable work, that is, to use the usual constructs of the form
$worker = new GearmanWorker(); $worker->addServer();

in your code, a language extension is required. This immediately imposes limitations: not only can you not install it on Windows, you also cannot install it on hosting where you lack the access rights needed to install packages.
Denwer and OpenServer do not include extensions for working with Gearman.
So is there no way out, and Gearman is only for *nix systems?
Of course not. The solution is to include the PEAR Net_Gearman library in the project.
Its class names are different and its logic is slightly different, but you can live with that and work with Gearman fully.

Real tasks solved with Gearman



1) An advertising agency manages AdWords and Yandex.Direct advertising campaigns for medium and large online stores. Product information changes constantly: prices change (the dollar exchange rate and so on), new products appear, some products go out of stock, some are withdrawn from sale. Each store has thousands or tens of thousands of products.
The task: keep the information in the AdWords and Yandex.Direct ads up to date. Prices have to be changed, ads / groups / campaigns added for new products, ads paused for products that are out of stock, and ads deleted for products withdrawn from sale.
With Gearman the task is solved simply: from the database we generate tasks for changing prices, adding, pausing and deleting. The tasks are then pushed onto the queue server, where they are calmly processed by several workers in parallel: the workers call the AdWords API or the Yandex.Direct API and perform the required operations.
It turned out like this:

Gearman solves two problems here: parallel execution of processes, and regulating access to external resources.
Neither the AdWords API nor the Yandex.Direct API will let you perform operations in, say, a hundred parallel streams: there is a limit on the number of requests per second. In my case, the maximum was 4 parallel streams to the AdWords API and 8 to the Yandex.Direct API.
About run time. The initial load of 10k products takes up to several hours: several hundred thousand ads, several hundred campaigns and so on are created. An update takes a few minutes and is fully automatic.

2) A continuation of task 1. Almost all stores supply their data as an XML file, but some send an .xls file. Getting data out of an XLS file is no problem, there is PHPExcel. But PHPExcel has a catch: on large files it slows down considerably and, more importantly, it consumes memory until it exceeds php.ini limits of 1024 MB and more.
The processing of an XLS file with PHPExcel can be parallelized (the idea comes from a publication by MParshin, thanks). Workers read the file by rows, each worker taking its own rows, so the process runs in parallel.
Here Gearman solves two problems: 1) parallel processing and 2) getting around the memory limit of a single PHP script.
Of course, you cannot start, say, a thousand workers in this case either: the server's memory is not infinite. In my case, the XLS file could be processed in 20 parallel workers.
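
The row split from task 2 can be sketched as follows: the total row count is cut into contiguous ranges, and each range becomes the string argument of one task. The function name `split_rows`, the 1-based numbering and the sample numbers are assumptions, not the code of the original project.

```php
<?php
// Split rows 1..$totalRows into at most $workers contiguous ranges.
function split_rows(int $totalRows, int $workers): array
{
    if ($totalRows < 1 || $workers < 1) {
        return [];
    }
    $chunk = (int) ceil($totalRows / $workers);
    $ranges = [];
    for ($start = 1; $start <= $totalRows; $start += $chunk) {
        $ranges[] = [
            'from' => $start,
            'to'   => min($start + $chunk - 1, $totalRows),
        ];
    }
    return $ranges;
}

// Each range is serialized to a string and sent as one task.
$tasks = array_map('json_encode', split_rows(100000, 20));
```

Each worker then loads only the rows in its range, so no single PHP process has to hold the whole file in memory.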

3) A large vendor, an equipment manufacturer, wants to know at what prices its equipment is actually sold. The data can be taken from a catalog, but again only from its website. The process is standard: a parser, with workers fetching data from the catalog site in several parallel streams.
The API of the Gearman Monitor && Control system makes it possible to show the client, the vendor, the data being collected in real time.
Here Gearman solves two problems: parallel fetching and processing of data, and real-time display of the process.
Here is the client interface of the system.


4) Hotel service providers supply data. The task: show the hotels on a map. It looks simple enough, since the data comes with coordinates. Unfortunately, many of the supplied coordinates are wrong, and this cannot be shown to customers: the address is in one place and the hotel marker in a completely different one. We decided to obtain the coordinates ourselves from the Google geocoder, using the hotel addresses. But if you simply call the geocoder from JavaScript, many markers are not displayed: the script tries to geocode all hotel addresses at once, and the geocoder blocks most requests because the limit on calls per second is exceeded.
The solution: all geocoder requests from JavaScript go to our own proxy, which forwards the task to the queue server.
Here Gearman solves the problem of regulating and limiting access to an external resource, the Google Geocoder API.


5) The last example is combined use of the queue server. A site publishes specific information about several thousand objects. The task: fetch this information and translate it. The solution: run workers that fetch the information and workers that translate it. The first group fetches information in several parallel streams; as soon as a worker has the required text, it submits a new task to the queue server, this time for the translation workers. The translation workers translate the text and store the finished material in the database.
Gearman is used here in the standard way, fetching and processing information in several parallel streams; the only twist is that the workers themselves create tasks for other workers.
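
A worker that creates tasks for other workers simply acts as a client inside its own callback. A rough sketch, assuming the PECL extension; the function name `translate`, the payload fields and both helper names are hypothetical.

```php
<?php
// Build the follow-up task that a fetching worker hands to the translators.
function translation_task(int $objectId, string $text): array
{
    return [
        'function' => 'translate',
        'workload' => json_encode(['id' => $objectId, 'text' => $text]),
    ];
}

// Called from the fetching worker once the text has been obtained.
function queue_translation(GearmanClient $client, int $objectId, string $text): string
{
    $task = translation_task($objectId, $text);
    // Background submission: the fetching worker does not wait for the translation.
    return $client->doBackground($task['function'], $task['workload']);
}
```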

These practical tasks gave rise to the requirements for a Gearman control system.

Imagine we are working with a queue server. How do we start one worker? We type php worker.php in the console.
Fine, but now we need 20 workers for parallel processing. Open 20 consoles? Not an option. Or we need to start several different workers: the same problem. So we need a class method, and its implementation in the web interface, for:
- starting workers, choosing which one and how many
We move on: the worker is started and running. How do we stop it? We definitely need:
- stopping workers
A situation: during development we launched several different workers, and then: stop, roll everything back, the parameter is wrong! What helps here:
- stopping all workers with one action
But a worker is like a small child: it is doing something, nobody knows what, and it needs supervision. This requires:
- logging what the worker does
A bit more on this. It is nice to watch the worker at work and enjoy your creation. But then, bang, the worker has crashed. Or, even better, it does not start at all. What was the error? A mistake in the code, an unhandled exception, or an unanticipated error from an external service? Therefore:
- logging worker errors, including fatal ones
The log can be large; browsing it is tedious and often impossible because of its size. Required:
- searching the log for arbitrary text
We have dealt with the workers; they are under our control. What about the queue?
Again, the workers, like small children, are picking a pile apart, but what is in that pile? Therefore:
- listing all functions registered on the queue server
- showing the task queue for each function
Now clients have thrown 1000 tasks onto the queue server, all 1000 have piled up for one function, and one worker clearly cannot cope, or there are no workers at all. Or, closer to real life: the external service the workers call is unavailable, and the queue has to be reset. What to do? Required:
- resetting / clearing the queue for each function
Finally, we have played with Gearman enough and need to shut down. Or we sent something wrong to the queue server and need to stop everything urgently. What helps:
- full reset: clear all queues, stop all workers.
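
The first requirements, starting a given number of workers and stopping them again, can be approached roughly like this on a *nix host. This is only a sketch under stated assumptions (a `php` binary on the PATH, the posix extension, hypothetical function names); it is not the implementation used in Gearman_Monitor.

```php
<?php
// Build the shell command that starts one worker in the background
// and prints its process ID.
function worker_command(string $script, string $logFile = '/dev/null'): string
{
    return sprintf(
        'nohup php %s >> %s 2>&1 & echo $!',
        escapeshellarg($script),
        escapeshellarg($logFile)
    );
}

// Start $count copies of a worker and return their process IDs.
function start_workers(string $script, int $count): array
{
    $pids = [];
    for ($i = 0; $i < $count; $i++) {
        $pids[] = (int) shell_exec(worker_command($script));
    }
    return $pids;
}

// Stop workers by process ID (requires the posix extension).
function stop_workers(array $pids): void
{
    foreach ($pids as $pid) {
        if ($pid > 0) {            // never signal PID 0, the whole process group
            posix_kill($pid, 15);  // 15 is SIGTERM
        }
    }
}
```

For example, `start_workers('worker.php', 20)` would replace the twenty consoles, and the returned process IDs are what a "stop all workers" action needs.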

On top of that, everything that happens should be displayed in real time, without reloading the page.

All of the above is handled by the Gearman_Monitor class and by the Gearman Monitor && Control web application, which uses the methods of this class.
Gearman Monitor && Control project on GitHub
Screenshot of the web application

The same screenshot with detailed explanations.


Here is a video of the web application


To use it in development you need the Gearman_Monitor and Gmonitor_Settings classes (the PHP file names match the class names).
The properties and methods of the classes are documented in detail in the files themselves. Only this one needs an explanation:
    public static $func_name_synonyms = array(
        'summ' => 'Sum',
        'muliply' => '    ',
        'subtract' => 'Substract Function',
        'divide' => '功能',
    );

Here:
- the array key is the function name registered by the worker on the queue server
- the value is what you see in the table, in any form
First, this is for the operators' convenience. There is also a second use. Imagine that several projects run on one queue server. Naturally, I want to see only my own project, not everything in one heap. To do this, we define synonyms (if none is defined, the function name is used) and set the value
 public static $synonyms_only_view = true; 

In this case, the function table shows only the functions that have synonyms.
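
The effect of the two settings can be illustrated with a small sketch. The function `visible_functions` and the sample data are hypothetical and only mimic the described behaviour; the actual logic lives in the project's classes.

```php
<?php
// Decide which functions appear in the table and under which label.
function visible_functions(array $registered, array $synonyms, bool $synonymsOnly): array
{
    $rows = [];
    foreach ($registered as $name) {
        if (isset($synonyms[$name])) {
            $rows[$name] = $synonyms[$name]; // show the synonym
        } elseif (!$synonymsOnly) {
            $rows[$name] = $name;            // fall back to the function name
        }
    }
    return $rows;
}

$synonyms   = ['summ' => 'Sum', 'subtract' => 'Substract Function'];
$registered = ['summ', 'subtract', 'other_project_job'];

$own = visible_functions($registered, $synonyms, true);  // only my project
$all = visible_functions($registered, $synonyms, false); // everything on the server
```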

One more thing. The web application contains a logger and a PHP error handler. If you include the gearman_includes.php file in a worker, the worker can write to the log shown in the application, and the worker's errors are also recorded and displayed there.

There are two workers in the /workers directory. fake_worker.php is needed for the application to work; the second worker, taken from a live project, can serve as an example.

To use the web application you need to:
- create the log table in the database (see the file log_create.sql)
- specify the database connection parameters in the Gearman_Db.php file
- set write permissions on all directories inside view/Smarty/smarty_dirs (above all the view/Smarty/smarty_dirs/templates_ directory)
- specify the Gearman server host in the Gearman_Monitor.php file

Successful queues to everyone!

P.S. The web application was pulled out of a working project, so there may be some minor inconsistencies; I apologize in advance.
One more point. There were repeated attempts to rewrite the web application, to polish the code and the interface, but new tasks involving Gearman kept coming up, and each time the very first version was quickly patched up on the fly. As a result, it is published as it is: repeatedly tested, reliable, stable and convenient.

Source: https://habr.com/ru/post/212761/

