📜 ⬆️ ⬇️

Do not be afraid to use HandlerSocket


(example of the protocol HandlerSocket in the picture)

Introduction


In the previous project, there was a need for unloading the database, then life and pushed me with the HandlerSocket.

HandlerSocket is a protocol implemented in the eponymous plugin for MySQL RDBMS, which allows using the NoSQL method to access data stored in InnoDB tables. The main reason for using NoSQL solutions is a very fast search by primary key.
')
More about HandlerSocket
HandlerSocket runs as a daemon inside the mysql process, accepting TCP connections and performing client requests. It does not support SQL queries; instead, it provides a simple query language for CRUD table operations. That is why it is much faster than mysqld / libmysql in some cases:

HandlerSocket handles data without parsing an SQL query, which leads to a decrease in CPU usage.
It supports batch query execution. You can send several requests at once and get the result at once, which again reduces the load on the processor and the network.
The HandlerSocket protocol is more compact than mysql / libmysql, which leads to a reduction in network load.

More details can be read here:



Under the cut waiting for you:


Studying the issue of introducing HS into a project, I was quite surprised that under it there was no normal PHP library that would just be easy to use and cover the protocol functionality. The solutions available at that time were similar to those being developed, but not to any ready-to-use ones.

We were not satisfied with ready-made solutions and they didn’t give me a watch from the project, so that I could take and throw something of my own, so the option of introducing HS was discarded. The project is long gone, but the idea that HS has a cool thing seized me. And now more than a year has passed, when I tried again to find solutions and found that nothing had moved from that dead center.

The cool guys from Badoo once told me that they were using HS under their hood, but I could not find in the public domain what they use, I suspect that something samopisny.

Analysis of ready-made solutions


There were three ready solutions: 1 written in PHP ( HSPHP ) and 2 extensions to the interpreter, written in C.

All 3 had a couple of common problems:


With such a set of problems, the C extensions were immediately rejected, because there was no desire to understand them and finish on C at all, HSPHP remained.

While I was dealing with HSPHP, I managed to make a couple pull requests. Inside, everything is written through callbacks, which I don’t really like. All this together made me abandon this library. By the way, recently added support for authorization in HSPHP, but it all broke down completely.

Why your bike


I needed an easy-to-use tool that out of the box would not require any modifications, was easy to use and clearly kept up to date. Since I did not find anything like this, and the interest in the protocol has not cooled, time and desire were found by themselves and the body sat down to write code in semi-automatic mode.

Meet - HandlerSocketLibrary


Briefly about the results:



Architecture



(Image is clickable)

This is a graph of methods called under the hood of the library, drawn using the XHProf profiler. It clearly shows that the 2 slowest sections of the code are working with the socket and parsing the answer. I use a wrapper for working with streams / sockets SteamLibrary .

The idea is simple - create a connection to the socket, send a command, get an answer. The architecture has the ability to flexibly play with various options for reading data through a socket (fgets, fgetc, fread ...). I have not tested the speed of work through various wrappers on sockets, but this is all already embedded in the architecture.

Query examples


I will not consider all possible requests here, because they are described in the documentation (link to the documentation), I will try to give common examples and tell about the features.

There are 2 sockets - reading and writing. The difference in them all comes down to supported teams. A reading socket can handle 3 commands:


To connect to a reading socket:

$reader = new \HS\Reader('localhost', 9998, 'passwordRead'); 


The constructor accepts - host, port and password for authorization (optional). Authorization command is sent automatically.

Open the index:

 $indexId = $reader->getIndexId( 'database', 'tableName', 'PRIMARY', array('key', 'text'), array('num') ); 

It is necessary to specify the database, table and key (if the key is empty, then by default the work will continue with the PRIMARY key), then the list of columns (key, text) to be opened, and the additional column for filtering (num). Filtering is optional, but only these columns will be available for filtering in the future, and the values ​​of these columns will not be returned when the server responds.

On subsequent calls, the index will not open, it will be used already open.

To obtain data through an open index, execute the code:

 $selectQuery = $reader->selectByIndex($indexId, Comparison::EQUAL, array(42)); 

We take values ​​that are equal to 42, because in the list of columns of this index the sequence goes (key, text), then 42 is assigned to 1 column, namely key, the type of comparison is equality (Comparison :: EQUAL).

By default, HS returns just a list of values. At the output, you can get data in the form of an associative array (values ​​are automatically mapped into an array by column name) or a vector.

To do this, just add after the request and before sending it the line:

 $selectQuery->setReturnType(SelectQuery::ASSOC); //      

To get the data, you need to send a query and parse the results:

 $resultList = $reader->getResultList(); 

The $ resultList variable contains an array of the results of all the commands that were sent for this iteration, but it’s not quick and inconvenient to clear all the answers, so all answers are automatically mapped to queries.

 $selectResult = $selectQuery->getResult(); //   NULL    $reader->getResultList(); $arrayResultList = $selectResult->getData(); //    . 

Another way is to open the index and send the command at once:

 $selectQuery = $reader->select( array('key', 'text'), //  'database', 'tableName', 'PRIMARY', Comparison::MORE, //  ,     > array('1'), // sql like: key > 1 0, // offset 99, // limit array('num'), //   array(new Filter(Comparison::EQUAL, 0, '3')) //  sql like: num = 3 ); // SELECT key,text from database.tableName WHERE key > 1 and num = 3 LIMIT 0,99; $this->getReader()->getResultList(); 

First of all, it will be checked whether there is an index that contains the entire list of columns. If yes, it will be used, and if not, a new index will open with the required number of columns.

The third way is through the “Builder Inquiries” (QueryBuilder), I think there is no point in commenting here:

 $selectQueryBuilder = \HS\QueryBuilder::select(array('key', 'date', 'varchar', 'text', 'set', 'union')) ->fromDataBase($this->getDatabase()) ->fromTable($this->getTableName()) ->where(Comparison::MORE, array('key' => 2)) ->andWhere('float', Comparison::EQUAL, 3); $selectQuery = $reader->addQueryBuilder($selectQueryBuilder); $reader->getResultList(); $selectQuery->getResult()->getData(); 

These are all commands that can work through a reading socket. All others will fall out with ReadOnlyError.

The writing socket handles these 3 commands and 5 more:


Examples of how to work with these commands are in the HandlerSocketLibrary docks . I generally consider the first 3 to be intuitive and it will be enough just to see examples, so I’m missing them.

But increment and decrement are commands that increase or decrease values ​​in the database. You need to know that in the case when the value in the process of the operation should change from negative to positive or vice versa, such an operation will not pass and the value will not change. That is, the value -1 will never become 0 using the increment command. This is a feature of the HS plugin itself.

If in modifying commands you set the flag $ suffix = true, then the answer will be SelectResult with values ​​that meet your criteria for modification BEFORE the modification process. That is, if you update 2 columns and specify suffix = true, then the values ​​of these columns will return to you before the update, and not after.

If the limit and offset are not specified, then by default they are 1 and 0, respectively.

It is important not to forget such a feature of incremental \ decrement queries: the value of the limit is multiplied with the value by which you need to increase or decrease the record.

Performance comparison with analogues


On the graphs, the comparison took place according to the speed of processing requests and the RAM consumed for it. A total of 1000 requests were executed.

The measurements were made on a synthetic database running Mac Os X, i5 + 4Gb Ram. Absolute values ​​were not critical for me, I needed a percentage ratio, but you can do the same on your hardware. To do this, you can use my test scripts laid out on github .

With Insert, I “cheated” and simply inserted the entire 1000 elements with the 1st request.


More is better.


Less is better.

It is necessary to give HSPHP, I thought that it would be stronger to lose in speed and consumed memory to SISH extensions, but it turned out in practice that it was quite a bit. But such a huge gap in performance just caught me in a blunder, I was expecting a maximum 30-50% lag.

It was necessary to correct, it was necessary to look for what the trouble was. First of all, I pulled XHProf out from under the table and wrote down the result for 1000 Select queries:


The first problem is working through an array.

When designing, the array was selected for internal communication. I, of course, understood that I would lose in productivity, but in order to ...

A heavily stripped ParameterBag class from HttpFoundation components was used to work with the array. The point is that if there are values ​​in the array with the key N, then we return them, if not - defaultValue, which you can specify for yourself for any value.

On the graph above, it is just clear that working through ParameterBag takes with it an inadmissible amount of processor time.

The nagging is to check whether something is there or not, there are 2 most obvious ways: isset and array_key_exists .
Isset does not handle null correctly, so if you use it, you need to remove all possible moments from the architecture when null or an empty string comes to the array, since the element will not be located through the isset. In array_key_exists, everything is fine with this, that's why I chose it initially. But thanks to the profiler, the discovery was unexpected and by googling the problem of speed found that array_key_exists loses at about 10! times isset'y The most obvious solution was a temporary crutch - change the condition of checking the value in the array - add isset before the array_key_exists. It turned out this check:

 if(isset($array[$key]) || array_key_exists($array, $key)) { // ~(0.0)~ } 

Just adding an isset check before array_key_exists has increased the performance of the entire library by 4% ! It was cool, but 4% is not a cool enough result as a whole, so I had to completely get rid of ParameterBag, rewriting everything for specific values ​​in the queries.

The result was expected:

The dropped array reduced the execution time for Select requests by 44%, and the memory consumed by 45%.

I can not say that the speed of work is excellent now, I am still not happy with it. Next you have to get rid of unnecessary abstractions and internal optimizations for each specific request. This work is still to be done, but the backlog is reduced to 2.5-3 times, instead of 5.

Error list


Another nice thing that appeared due to the large number of questions on the network about decoding HS errors. HS source codes were studied and all possible types of errors were written out, since it was written in plain language and there are not so many source codes. Below is a list of what errors can be returned when working with HS using HandlerSocketLibrary and what they mean. I would not lay out this list in the article, if it were not for a shortage of information on this topic in the network.

error list
  • AuthenticationError - the server requires authorization and you have not passed;
  • ComparisonOperatorError - an invalid comparison operator was passed (only 4> =, <=, <,> is available);
  • InListSizeError - the search mode by the list of values ​​is selected and the number of these values ​​is specified incorrectly;
  • KeyIndexError - the specified column number exceeds the total number of columns in the open index;
  • ModifyOperatorError - an invalid modifying operator was passed. (+, -, D, U, +?, -?, D?, U?);
  • UnknownError - super prize - aaaaaavtomobil! If you won a car - create this issue;
  • ColumnParseError - invalid column values ​​passed;
  • FilterColumnError - you are trying to open a column with a number greater than the number of open columns to filter;
  • IndexOverFlowError - you are trying to use a non-initialized index number;
  • KeyLengthError - the length of the list of keys is either greater than the list, or <1;
  • OpenTableError - trying to open a nonexistent table, or a nonexistent index;
  • CommandError - you sent an invalid command;
  • FilterTypeError - passed the wrong type of filtering. (F, W);
  • InternalMysqlError - MySQL could not execute the command (for example, insert an already existing value);
  • LockTableError - you are trying to open a locked table;
  • ReadOnlyError - the modifying command was sent to the reading socket.


Here in this php file you can see the error parsing logic.


Plans


First of all, you need to add all possible use cases and cover them with tests. As a result of the struggle for the speed of query execution, rudiments appeared in the code that need to be refactored. Not all requests have a full list of features, that is, some requests may have missed the opportunity to use filters or lists of keys (yes, the code is still slightly damp). In already existing tests, an easy mess, because at the beginning I wrote neatly, in the end it was already just to cover the case. Tests for database modification do not check the state of the entire table after the modifying query.

It is necessary to add the ability to return values ​​in the form of an object on Select queries. HS supports CRUD - so you can add a wrapper to the ORM, for example, to the same Doctrine.

Why so little information about HS and a small community?


I want to clarify right away: HS is a great thing that can and should be used, it’s just that not everyone can do it due to the peculiarities of the protocol and its limitations.

So why isn’t the community growing and why is it so hard to start working with real projects in HS?

I am sure that there were other developers in my place, when between the idea of ​​introducing HS and the actual implementation of HS, tasks emerge to finish the infrastructure associated with HS, which is completely unrelated to the business objectives of the project. The result is a choice: either your manager \ those director \ someone else pays for this work and generally gives you good for it, or you do it yourself after hours, and afterwards you raise the question that everything is ready and can be implemented only profit, or you score on HS, which is most often the case, apparently.

Naturally, the fact that HS does not have access control on databases; this sharply limits the scope of application, all hosting with the number of clients> 1 disappear.

I tried to slightly contribute to solving this problem.

Monitoring


We decided, therefore, to use HS on the project, but how to measure the load passing through it? HS has the ability to configure the number of workers, here are their status and download will be monitored using Munin .





Plugins connect to MySQL and parse the output of the show processlist command. Available for download from here . This data should be enough to assess how loaded all the peaks are.

Symfony2 bundle


If your project uses the 2nd version of the symfony framework, then you have the option of messing with HS out of the box. You only need to add a new dependency to composer.json: “konstantin-kuklin / handlersocket-bundle”: “dev-master”.

Setup:

Add configs for the socket for reading and writing:

 hs: reader: host: localhost port: 9998 debug: "%kernel.debug%" auth_key: "Password_Read1" writer: host: localhost port: 9999 debug: "%kernel.debug%" 

The auth_key parameter is optional and its absence means that there is no need for authorization (your CO). The debug flag changes the operating mode of the HS library under the hood, requests are sent individually and the processing time for each request is recorded, and the operative charge for additional storage of requests is also increased.>

Then everything is simple - we work with 2 services:

 /** @var \HS\Reader $reader */ $reader = $this->get("hs_reader"); /** @var \HS\Writer $writer */ $writer = $this->get("hs_writer"); 


There are “Nishtyak” for debag:
In dev mode, go to the site and see a new icon on the line of the web-profiler:



A very simplified statistics is collected: how many requests and how much time it took to complete them (full time - sending, receiving, parsing). When you click on this new button, we will see more detailed information about the requests:



Above the information where the requests and information about the requests were sent, if you click on Display Data, then we get the data of the query result:



In fact, here my fantasy has dried up, at this moment I got to know the protocol quite well and therefore it was hard to invent cases to display such information. I would be glad to get your ideas on what information would still be useful for debugging.

Of the minuses - the bundle is not tested at all.

I plan to make 1 entry point in HS with the indication of the Database, that is, in the queries you will not have to transfer the database.

Conclusion


All this can theoretically be used in production, but I need help with testing and I hope this article will gather people who have experience with people using HS.

Constructive criticism, pull requests, suggestions, found bugs are welcome.

Now it's easy to start using HS; PerconaServer and MariaDB have this plugin in their builds. Act!

Thanks for attention!

PS On Saturday, the news flew by that the Badoo code will be posted .

Source: https://habr.com/ru/post/239637/


All Articles