
Use mcrouter to scale memcached horizontally.



Developing a high-load project in any language requires a special approach and special tools, but with PHP applications things can get so bad that you end up developing, for example, your own application server. In this article we will talk about the familiar pain of distributed session storage and data caching in memcached, and about how we solved these problems in one project in our care.

The hero of the occasion is a PHP application based on the symfony 2.3 framework, whose upgrade is not in the business plans at all. Besides quite standard session storage, this project made full use of a "cache everything" policy in memcached: responses to requests to the database and API servers, various flags, locks for synchronizing code execution, and much more. In such a situation a memcached failure becomes fatal for the application. On top of that, losing the cache leads to serious consequences: the DBMS starts bursting at the seams, API services start banning requests, and so on. Stabilizing the situation may take tens of minutes, and during that time the service will be terribly slow or completely unavailable.
We needed to provide horizontal scaling of the application with minimal effort, i.e. with minimal changes to the source code and full preservation of functionality. We also had to make the cache not only resilient to failures, but to minimize data loss from it as well.

What is wrong with memcached itself?


Out of the box, PHP's memcached extension supports distributed storage of data and sessions. The mechanism of consistent key hashing allows you to place data evenly on many servers, unambiguously addressing each specific key to a specific server in the group, and the built-in failover tools ensure high availability of the caching service (but, unfortunately, not of the data).
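
For illustration, here is a minimal sketch of this stock approach (the mc-*.mc host names are borrowed from our Kubernetes setup described below; note that keys are only distributed, not replicated):

<?php

// Stock php-memcached distribution: one client, a pool of servers.
// Consistent hashing maps each key to exactly one server of the pool.
$m = new Memcached();
$m->setOption(Memcached::OPT_DISTRIBUTION, Memcached::DISTRIBUTION_CONSISTENT);
$m->setOption(Memcached::OPT_LIBKETAMA_COMPATIBLE, true);
$m->addServers([
    ['mc-0.mc', 11211],
    ['mc-1.mc', 11211],
    ['mc-2.mc', 11211],
]);

// The key lives on a single server; if that server dies, the value is gone.
$m->set('some-key', 'some-value');
var_dump($m->get('some-key'));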

With session storage things are a little better: you can configure memcached.sess_number_of_replicas, as a result of which the data will be saved to several servers at once, and in the event of a failure of one memcached instance the data will be read from the others. However, if a server returns to the pool without data (as usually happens after a restart), some of the keys will be redistributed in its favor. In effect this means the loss of session data, since there is no way to "go" to another replica in case of a miss.
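
For reference, a sketch of the corresponding php.ini fragment (the option names are from the php-memcached 2.x extension; the host names are illustrative):

session.save_handler = memcached
session.save_path = "mc-0.mc:11211,mc-1.mc:11211,mc-2.mc:11211"
memcached.sess_consistent_hash = On
; write every session to one additional replica
memcached.sess_number_of_replicas = 1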

The standard library tools are aimed mainly at horizontal scaling: they allow you to grow the cache to gigantic sizes and to access it from code running on different servers. However, in our situation the volume of stored data does not exceed a few gigabytes, and the performance of one or two nodes is quite enough. Accordingly, the only useful thing the stock tools could do for us was to ensure the availability of memcached while keeping at least one cache instance in working condition. But even that we could not take advantage of... Here we should recall the antiquity of the framework used in the project, which is why it was impossible to make the application work with a pool of servers. And let's not forget about session loss: the customer winced at every mass logout of users.

Ideally, we needed replication of memcached writes and fallback to other replicas on a miss or error. mcrouter helped us implement this strategy.

mcrouter


This is a memcached router developed by Facebook to solve its own problems. It supports the memcached ASCII protocol, which allows scaling memcached installations to insane sizes. A detailed description of mcrouter can be found in its announcement. Among other broad functionality, it can do exactly what we need:

- replicate writes to several servers (pools) at once;
- fall back to other replicas when a read ends in a miss or an error.

Let's get to work!

Mcrouter configuration


Let's go straight to the configuration:

 { "pools": { "pool00": { "servers": [ "mc-0.mc:11211", "mc-1.mc:11211", "mc-2.mc:11211" }, "pool01": { "servers": [ "mc-1.mc:11211", "mc-2.mc:11211", "mc-0.mc:11211" }, "pool02": { "servers": [ "mc-2.mc:11211", "mc-0.mc:11211", "mc-1.mc:11211" }, "route": { "type": "OperationSelectorRoute", "default_policy": "AllMajorityRoute|Pool|pool00", "operation_policies": { "get": { "type": "RandomRoute", "children": [ "MissFailoverRoute|Pool|pool02", "MissFailoverRoute|Pool|pool00", "MissFailoverRoute|Pool|pool01" ] } } } } 

Why three pools? Why do the servers repeat? Let's see how it works.

- Every write (the default_policy) goes via AllMajorityRoute to all servers of pool00, i.e. to every memcached instance; the operation is considered successful as soon as the majority of them confirm it.
- Every get goes via RandomRoute to one randomly chosen pool, where MissFailoverRoute queries the servers in order until it gets a hit.
- Each pool lists the same servers rotated by one position, so each server comes first in exactly one pool. Read load is thus spread evenly, while a miss or an error on one replica simply forwards the request to the next one.

For example, a get may land in pool02: mc-2 is asked first; if it has no data (say, it has just been restarted), the request falls through to mc-0, and then to mc-1.

The main disadvantage of this scheme is that if the data is really not in the cache, then each client request actually turns into N requests to memcached, one per server in the pool. The number of servers in the pools can be reduced, for example, to two: sacrificing storage reliability, we get more speed and less load from requests for missing keys.
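
To experiment with the configuration outside the cluster, mcrouter can be started by hand (a sketch based on the project's README; check the flags against your mcrouter version):

mcrouter --config file:/etc/mcrouter/config.json -p 11211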

NB: the documentation in the project's wiki and its issues (including closed ones), which are a whole storehouse of various configurations, can also be useful links for studying mcrouter.

Build and run mcrouter


In our case, the application (and memcached itself) runs in Kubernetes, so mcrouter lives there as well. To build the container we use werf, whose config will look like this:

NB: the listings from this article are published in the flant/mcrouter repository.

configVersion: 1
project: mcrouter
deploy:
  namespace: '[[ env ]]'
  helmRelease: '[[ project ]]-[[ env ]]'
---
image: mcrouter
from: ubuntu:16.04
mount:
- from: tmp_dir
  to: /var/lib/apt/lists
- from: build_dir
  to: /var/cache/apt
ansible:
  beforeInstall:
  - name: Install prerequisites
    apt:
      name: [ 'apt-transport-https', 'tzdata', 'locales' ]
      update_cache: yes
  - name: Add mcrouter APT key
    apt_key:
      url: https://facebook.imtqy.com/mcrouter/debrepo/xenial/PUBLIC.KEY
  - name: Add mcrouter Repo
    apt_repository:
      repo: deb https://facebook.imtqy.com/mcrouter/debrepo/xenial xenial contrib
      filename: mcrouter
      update_cache: yes
  - name: Set timezone
    timezone:
      name: "Europe/Moscow"
  - name: Ensure a locale exists
    locale_gen:
      name: en_US.UTF-8
      state: present
  install:
  - name: Install mcrouter
    apt:
      name: [ 'mcrouter' ]

( werf.yaml )
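
Building the image and rolling out the release is then the usual werf routine; the exact flags vary between werf versions, and the registry address below is just a placeholder (a sketch for werf 1.1):

werf build --stages-storage :local
werf deploy --stages-storage :local --images-repo registry.example.com/mcrouter --env test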

... and add a Helm chart. The only interesting thing in it is the config generator driven by the number of replicas (if someone has a more concise and elegant version, share it in the comments):

{{- $count := (pluck .Values.global.env .Values.memcached.replicas | first | default .Values.memcached.replicas._default | int) -}}
{{- $pools := dict -}}
{{- $servers := list -}}
{{- /* The server list repeated twice: "0 1 2 0 1 2" */ -}}
{{- range until 2 -}}
{{-   range $i, $_ := until $count -}}
{{-     $servers = append $servers (printf "mc-%d.mc:11211" $i) -}}
{{-   end -}}
{{- end -}}
{{- /* Pools as sliding windows of N servers each: "[0 1 2] [1 2 0] [2 0 1]" */ -}}
{{- range $i, $_ := until $count -}}
{{-   $pool := dict "servers" (slice $servers $i (add $i $count)) -}}
{{-   $_ := set $pools (printf "MissFailoverRoute|Pool|pool%02d" $i) $pool -}}
{{- end -}}
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: mcrouter
data:
  config.json: |
    {
      "pools": {{- $pools | toJson | replace "MissFailoverRoute|Pool|" "" -}},
      "route": {
        "type": "OperationSelectorRoute",
        "default_policy": "AllMajorityRoute|Pool|pool00",
        "operation_policies": {
          "get": {
            "type": "RandomRoute",
            "children": {{- keys $pools | toJson }}
          }
        }
      }
    }

( 10-mcrouter.yaml )
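
To sanity-check the generator, the chart can be rendered locally; here we assume the templates live in the .helm directory (as werf expects) with a minimal Chart.yaml next to them:

helm template .helm --set global.env=test --set memcached.replicas._default=3

For three replicas, the resulting config.json matches the configuration shown at the beginning of the article (up to the order of the children).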

We roll out to the test environment and check:

# php -a
Interactive mode enabled

php > # Simple caching works:
php > $m = new Memcached();
php > $m->addServer('mcrouter', 11211);
php > var_dump($m->set('test', 'value'));
bool(true)
php > var_dump($m->get('test'));
string(5) "value"
php > # Oops! But sessions do not:
php > ini_set('session.save_handler', 'memcached');
php > ini_set('session.save_path', 'mcrouter:11211');
php > var_dump(session_start());
PHP Warning: Uncaught Error: Failed to create session ID: memcached (path: mcrouter:11211) in php shell code:1
Stack trace:
#0 php shell code(1): session_start()
#1 {main}
  thrown in php shell code on line 1
php > # Hmm... let's try setting the session_id manually:
php > session_id("zzz");
php > var_dump(session_start());
PHP Warning: session_start(): Cannot send session cookie - headers already sent by (output started at php shell code:1) in php shell code on line 1
PHP Warning: session_start(): Failed to write session lock: UNKNOWN READ FAILURE in php shell code on line 1
PHP Warning: session_start(): Failed to write session lock: UNKNOWN READ FAILURE in php shell code on line 1
PHP Warning: session_start(): Failed to write session lock: UNKNOWN READ FAILURE in php shell code on line 1
PHP Warning: session_start(): Failed to write session lock: UNKNOWN READ FAILURE in php shell code on line 1
PHP Warning: session_start(): Failed to write session lock: UNKNOWN READ FAILURE in php shell code on line 1
PHP Warning: session_start(): Failed to write session lock: UNKNOWN READ FAILURE in php shell code on line 1
PHP Warning: session_start(): Unable to clear session lock record in php shell code on line 1
PHP Warning: session_start(): Failed to read session data: memcached (path: mcrouter:11211) in php shell code on line 1
bool(false)
php >

Searching for the text of the error gave nothing, but a search for "mcrouter php" brought us to the project's oldest open issue: the lack of support for the memcached binary protocol.

NB: the memcached ASCII protocol is slower than the binary one, and the stock means of consistent key hashing only work with the binary protocol. But this does not create problems for our particular case.

So the point is: all that remains is to switch to the ASCII protocol, and everything will work... However, in this case the habit of looking for answers in the documentation on php.net played a cruel joke on us. You will not find the correct answer there... unless you scroll to the very end, where the "User contributed notes" section contains a correct and undeservedly downvoted answer.

Yes, the correct option name is memcached.sess_binary_protocol. It must be disabled, after which sessions start working. All that remains is to put the container with mcrouter into the pod with PHP!
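
The final php.ini fragment looks something like this (the save_path value matches the test above; adjust the host once mcrouter moves into the pod):

session.save_handler = memcached
session.save_path = "mcrouter:11211"
; mcrouter speaks only the ASCII protocol, so the binary one must be switched off
memcached.sess_binary_protocol = Off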

Conclusion


Thus, with infrastructure changes alone we were able to solve the problem at hand: the memcached fault-tolerance issue is resolved and the reliability of cache storage is increased. Besides the obvious advantages for the application, this gave us room for maneuver when working on the platform: when every component has a reserve, the administrator's life is much simpler. Yes, this method has its drawbacks and may look like a crutch, but if it saves money, buries the problem and does not cause new ones, why not?


Source: https://habr.com/ru/post/457510/

