Memcached plugin: NoSQL in MySQL

Hello! My name is Maxim Matyukhin, I'm a Badoo PHP programmer. In our work, we actively use MySQL. But sometimes we lack her performance, so we are constantly looking for opportunities to speed up her work.

In 2010, Yoshinori Matsunobu introduced a NoSQL plugin for MySQL called HandlerSocket. It was stated that this plugin allows you to perform more than 750,000 requests per second. We were curious, and we almost immediately began to use this solution. We liked the result so much that we began to make presentations and write articles promoting HandlerSocket.
')
Apparently, we were one of the few users of this plugin - since MySQL version 5.7 it has stopped working. But in this version there is another plugin from Oracle - InnoDB memcached plugin, which promised similar functionality.

Despite the fact that the memcached-plugin appeared in MySQL 5.6 in 2013, there are not so many articles about it and for the most part they repeat the documentation: a simple table is created and queries are made to it through the memcached client.

We have a lot of experience with Memcached and are used to the simplicity of interacting with it. We expected the same simplicity from the InnoDB memcached plugin. But in fact, it turned out that if the plug-in usage patterns are at least slightly different from those described in the documentation and articles, then a host of nuances and limitations emerges that are definitely worth considering if you are going to use the plug-in.

MySQL HandlerSocket

In this article, we will somehow compare the new memcached plugin with the old HandlerSocket. Therefore, I remind you what was the last one.

After installing the HandlerSocket plugin, MySQL started listening to two additional ports:

The first port received client requests to read data.
The second port received client requests to write data.

The client had to establish a normal TCP connection to one of these ports (no authentication was supported), and after that it was necessary to send an “open index” command (a special command with which the client informed which table of which index which fields we are going to read (or write)).

If the “open index” command worked successfully, then it was possible to send GETs or INSERT / UPDATE / DELETE commands depending on the port on which the connection was established.

HandlerSocket allowed to perform not only GETs on the primary key, but also simple samples from a non-unique index, sampling on a range, supported multiget-s and LIMIT. At the same time, it was possible to work with the table both from ordinary SQL and through a plugin. This, for example, made it possible to make some changes in transactions through SQL, and then read this data through a HandlerSocket.

It is important that HandlerSocket handled all connections with a limited pool of threads through the epoll, so tens of thousands of connections could easily be supported, while in MySQL itself a thread is created for each connection and their number is very limited.

At the same time, this is still the usual MySQL server — a familiar technology. We know how to replicate and monitor it. Monitoring the HandlerSocket is difficult, as it does not provide any specific metrics; nevertheless, some standard MySQL and InnoDB metrics are useful.

There were, of course, inconveniences, in particular, this plugin did not support working with the timestamp type. Well, the HandlerSocket protocol is harder to read and therefore harder to debug.

Read more about HandlerSocket here . You can also watch one of our presentations .

InnoDB memcached plugin

What does the new memcached plugin offer us?

As the name implies, his idea is to use the memcached client to work with MySQL and, through memcached commands, get and save data.

You can read about the main advantages of the plugin here .

We are most interested in the following:

Low CPU consumption.
The data is stored in InnoDB, which gives certain guarantees.
It is possible to work with data both through Memcached, and through SQL; they can be replicated using built-in MySQL tools.

To this list, you can add more advantages such as:

Fast and cheap connection. A normal MySQL connection is processed by one thread, and the number of threads is limited, and in a memcached plugin, one thread handles all connections in the event loop.
The ability to request several keys at once with a GET request.
Compared to MySQL HandlerSocket, then the memcached plugin does not need to use the “Open Table” command and all read and write operations occur on one port.

More details about the work of the plugin can be found in the official documentation . The following pages seemed to be the most useful for us:

After installing the plugin, MySQL starts accepting connections on port 11211 (standard memcached-port). A special database (schema) innodb_memcache also appears in which you will configure access to your tables.

Simple example

Suppose you already have a table with which you want to work through the memcached protocol:

CREATE TABLE `auth` (  `email` varchar(96) NOT NULL,  `password` varchar(64) NOT NULL,  `type` varchar(32) NOT NULL DEFAULT '',  PRIMARY KEY (`email`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

and you want to receive and change data on the primary key.

You must first describe the correspondence between the memcached key and the SQL table in the innodb_memcache.containers table. This table looks like this (I removed the encoding description to make it easier to read):

 CREATE TABLE `containers` ( `name` varchar(50) NOT NULL, `db_schema` varchar(250) NOT NULL, `db_table` varchar(250) NOT NULL, `key_columns` varchar(250) NOT NULL, `value_columns` varchar(250) DEFAULT NULL, `flags` varchar(250) NOT NULL DEFAULT '0', `cas_column` varchar(250) DEFAULT NULL, `expire_time_column` varchar(250) DEFAULT NULL, `unique_idx_name_on_key` varchar(250) NOT NULL, PRIMARY KEY (`name`) ) ENGINE=InnoDB DEFAULT

The most important fields:

name - the prefix of your memcached-key;
db_schema - the name of the database (schema);
db_table is your table;
key_columns - the name of the field in the table by which we will search (usually this is your primary key);
value_columns - a list of fields from the table that will be available memcached-plugin;
unique_idx_name_on_key - the index by which you want to search (although you have already specified key_columns, they can be included in different indexes and you must explicitly specify an index).

The remaining fields are not very important for the beginning.

Add a description of our table in innodb_memcache.containers:

 INSERT INTO innodb_memcache.containers SET   name='auth',   db_schema='test',   db_table='auth',   key_columns='email',   value_columns='password|type',   flags='0',   cas_column='0',   expire_time_column='0',   unique_idx_name_on_key='PRIMARY';

In this example, name = 'auth' is the prefix of our memcached-key. In documentation, it is often called table_id, and later in the article I will use this term.

Now we connect the TELNET to the memcached plugin and try to save and retrieve data:

 [21:26:22] maxm@localhost: ~> telnet memchached-mysql.dev 11211 Trying 127.0.0.1... Connected to memchached-mysql.dev. Escape character is '^]'. get @@auth.max@example.com END set @@auth.max@example.com 0 0 10 1234567|89 STORED get @@auth.max@example.com VALUE @@auth.max@example.com 0 10 1234567|89 END

At first we sent a GET request, it did not return anything to us. Then we saved the data with a SET request, and then got it back with a GET.

GET returned the following string: 1234567 | 89. These are the values of the "password" and "type" fields, separated by the character "|". Fields are returned in the order in which they were described in innodb_memcache.containers.value_columns.

Perhaps you have now wondered: “What will happen if the“ | ”character is found in the“ password ”?” I will describe this later.

Through SQL, this data is also available:

 MySQL [(none)]> select * from auth where email='max@example.com'; +-----------------+----------+------+ | email      | password | type | +-----------------+----------+------+ | max@example.com | 1234567  | 89 | +-----------------+----------+------+ 1 row in set (0.00 sec)

Default table_id

There is also such a mode of operation:

 get @@auth VALUE @@auth 0 21 test/auth END get max@example.com VALUE max@example.com 0 10 1234567|99 END set ivan@example.com 0 0 10 qwerty|xxx STORED get ivan@example.com VALUE ivan@example.com 0 10 qwerty|xxx END

In this example, with the get @@ auth request, we make table_id auth the default prefix for this connection. After that, all subsequent requests can be done without specifying table_id.

So far everything is simple and logical. But if you start to understand, it reveals a lot of nuances. I'll tell you about what we found.

Nuances

Caching table innodb_memcache.containers

The memcached plugin reads the innodb_memcache.containers table once at startup. Further, if an unknown table_id comes over the Memcached protocol, the plugin looks for it in the table. Therefore, you can easily add new keys (table_id), but if you want to change the settings of an existing table_id, you will have to restart the memcached plugin:

 mysql> UNINSTALL PLUGIN daemon_memcached; mysql> INSTALL PLUGIN daemon_memcached soname "libmemcached.so";

In the interval between these two requests, the Memcached interface will not work. Because of this, it is often easier to create a new table_id than to change the existing one and restart the plugin.

It was unexpected for us that such an important nuance of the plug-in operation is described on the Adapting a page of the memcached Plugin , which is not a very logical place for such information.

Flags, cas_column, expire_time_column

These fields are needed to simulate some features of Memcached. Documentation on them is controversial. Most of the examples in it illustrate working with tables in which these fields exist. You may be afraid that you will need to add them to your tables (and this is at least three INT-fields). But no. If you do not have such fields in the tables and you are not going to use Memcached functionality such as CAS, expiration or flags, then you do not need to add these fields to the tables.

When configuring a table in innodb_memcache.containers, enter '0' in these fields, make exactly the line with zero:

 INSERT INTO innodb_memcache.containers SET   name='auth',   db_schema='test',   db_table='auth',   key_columns='email',   value_columns='password|type',   flags='0',   cas_column='0',   expire_time_column='0',   unique_idx_name_on_key='PRIMARY';

It's a shame that cas_column and expire_time_column default is NULL, and if you execute INSERT INTO innodb_memcache.containers without specifying the value '0' for these fields, NULL will remain in them and this memcache prefix will simply not work.

Data types

From the documentation it is not very clear what data types can be used when working with a plugin. In several places it is said that the plugin can work only with text fields (CHAR, VARCHAR, BLOB). Here: Adapting an Existing MySQL Schema for the InnoDB memcached Plugin suggests storing numbers in string fields, and if you need to work with these numeric fields from SQL, then create a VIEW in which VARCHAR-fields with numbers will be converted into INTEGER-fields :

 CREATE VIEW numbers AS SELECT c1 KEY, CAST(c2 AS UNSIGNED INTEGER) val FROM demo_test WHERE c2 BETWEEN '0' and '9999999999';

However, in some places the documentation still says that you can work with numbers. So far we have only real production experience with text fields, but the results of experiments show that the plugin also works with numbers:

 CREATE TABLE `numbers` ( `id` int(10) unsigned NOT NULL AUTO_INCREMENT, `counter` int(10) unsigned NOT NULL DEFAULT '0', PRIMARY KEY (`id`) ) ENGINE=InnoDB INSERT INTO innodb_memcache.containers SET name='numbers', db_schema='test', db_table='numbers', key_columns='id', value_columns='counter', flags='0', cas_column='0',expire_time_column='0',unique_idx_name_on_key='PRIMARY';

After that via Memcached protocol:

 get @@numbers.1 END set @@numbers.1 0 0 2 12 STORED get @@numbers.1 VALUE @@numbers.1 0 2 12 END

We see that the memcached plugin can return any data types. But it returns them in the form in which they lie in InnoDB, so, for example, in the case of timestamp / datetime / float / decimal / JSON, a binary string is returned. But integers are returned as we see them through SQL.

Multiget

Memcached protocol allows you to request multiple keys in one request:

 get @@numbers.2 @@numbers.1 VALUE @@numbers.2 0 2 12 VALUE @@numbers.1 0 2 13 END

The fact that multiget works is already good. But it works within the same table_id:

 get @@auth.ivan@example.com @@numbers.2 VALUE @@auth.ivan@example.com 0 10 qwerty|xxx END

The documentation describes this point here: https://dev.mysql.com/doc/refman/8.0/en/innodb-memcached-multiple-get-range-query.html . It turns out that in multiget-e you can specify table_id only for the first key, if all other keys are taken from the default table_id (an example from the documentation):

 get @@aaa.AA BB VALUE @@aaa.AA 8 12 HELLO, HELLO VALUE BB 10 16 GOODBYE, GOODBYE END

In this example, the second key is taken from the default table_id. We could specify a lot more keys from the default table_id, and for the first key we specified a separate table_id, and this is only possible in the case of the first key.

It can be said that multiget works within a single table, because relying on such logic in the production code is not at all desirable: it is not obvious, it is easy to forget about it, to make a mistake.

If to compare with HandlerSocket, there too multiget worked within the limits of one table. But this restriction seemed natural: the client opens the index in the table and requests one or more values from it. But when working with the memcached-multiget plugin with several keys with different prefixes, it is normal practice. And from MySQL memcached-plugin expect the same. But no :(

INCR, DEL

I have already given examples of GET / SET requests. INCR and DEL queries have a feature. It lies in the fact that they only work when using the default table_id:

 DELETE @@numbers.1 ERROR get @@numbers VALUE @@numbers 0 24 test/numbers END delete 1 DELETED

Limitations of the memcached protocol

Memcached has a text protocol, which imposes some restrictions. For example, memcached-keys should not contain whitespace characters (space, newline). If you look again at the description of the table from our example:

 CREATE TABLE `auth` ( `email` varchar(96) NOT NULL, `password` varchar(64) NOT NULL, `type` varchar(32) NOT NULL DEFAULT '', PRIMARY KEY (`email`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

then this means that there should be no such characters in the “email” field.

Also, the length of memcached-keys must be less than 250 bytes (bytes, not characters). If you send more, get an error:

 "CLIENT_ERROR bad command line format"

In addition, it is necessary to take into account the fact that the memcached plugin adds its syntax to the memcached protocol. For example, it uses the character "|" as a field delimiter in response. You need to ensure that this symbol is not used in your table. The separator can be configured, but the settings will apply to all tables on the entire MySQL server.

Field separator value_columns

If you need to return several columns through the memcached protocol, as in our first example:

 get @@auth.max@example.com VALUE @@auth.max@example.com 0 10 1234567|89 END

then the column values are separated by the standard delimiter "|". The question arises: “What will happen if, for example, in the first field in a line there is a character“ | ””? The memcached plugin in this case returns the string as is, like this: 1234 | 567 | 89. In general, it is impossible to understand where there is a field.

Therefore, it is important to immediately select the correct separator. And since it will be used for all keys of all tables, it should be a universal symbol that will not appear in any field with which you will work through the memcached protocol.

Summary

This is not to say that the memcached plugin is bad. But it seems that it was written for a certain work scheme: MySQL-server with one table, to which there is access via the memcached-protocol, and this table_id is made default. Clients establish a persistent connection with the Memcached plugin and make requests to the default table_id. Probably, in such a scheme, everything will work without complaints. If you move away from it, you come across various inconveniences.

You might have expected to see any reports about plug-in performance. But we have not yet decided to use it in high-loaded places. We used it only in a few not very loaded systems and it works there at about the same speed as the HandlerSocket, but we didn’t do fair benchmarks. But still, the plugin provides an interface with which a programmer can easily make a mistake - you need to keep a lot of nuances in your head. Therefore, we are not yet ready to massively use this plugin.

We got some feature requests in the MySQL bug tracker:

https://bugs.mysql.com/bug.php?id=95091
https://bugs.mysql.com/bug.php?id=95092
https://bugs.mysql.com/bug.php?id=95093
https://bugs.mysql.com/bug.php?id=95094

Hopefully, the memcached-plugin development team will improve its product.

Source: https://habr.com/ru/post/453742/

All Articles