
Lock-free memcache API

Good day, Habr readers!
This post is a brief summary of many hours of thinking, scribbling on paper, and code sketches that ended in code actually running in production.
Our site (from here on, simply "the site") makes heavy use of memcache for hot data. The code that fills the cache can run for a long time (0.5 seconds counts as long), and in that window incoming user requests manage to launch another hundred copies of the same update procedure. The consequences are obvious, but for a long time we could not see them in the overall load figures. Only when we noticed spikes in the service time of some requests (under the extra load, their queries also ended up in the MySQL slow query log) did the work begin in earnest.


The problem, illustrated


Let us look more closely at what happens when a key is requested from memcache and that key has expired or was never set.
To make the problem clear, I drew a small diagram:

[Diagram: data request workflow]
The update logic lives in the application code: the application knows that if it gets FALSE back, the key must be regenerated. As the diagram shows, the trouble starts when the data triggered by the first request is not yet ready and further requests are already asking for the same data, so each of them starts its own regeneration.
Thanks to Habr user evilbloodydemon for the precise term: this situation is called the "dog pile effect".
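To make the effect concrete, here is a minimal sketch of the naive "miss means regenerate" pattern. It is only an illustration: a plain PHP array stands in for memcache, the function naive_get() is a hypothetical name, and the two phases model the fact that no request finishes regenerating before the others have already seen the miss.

```php
<?php
// Hypothetical in-memory stand-in for memcache; not the real Memcache API.
$cache = [];
$regenerations = 0;

// Naive read: a missing key is reported as FALSE, like Memcache::get().
function naive_get(array $cache, string $key) {
    return array_key_exists($key, $cache) ? $cache[$key] : false;
}

// Phase 1: 100 requests arrive while the key is missing. Nobody has
// finished regenerating yet, so every one of them sees FALSE.
$pending = [];
for ($request = 0; $request < 100; $request++) {
    if (naive_get($cache, 'some_key') === false) {
        $pending[] = $request;
    }
}

// Phase 2: every request that saw the miss runs the expensive generator.
foreach ($pending as $request) {
    $regenerations++;                  // stands in for the slow 0.5 s update
    $cache['some_key'] = 'fresh data';
}

echo $regenerations, "\n"; // 100
```

One slow update has turned into a hundred identical ones, which is exactly the load spike described above.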

Solution


We came up with many ideas for avoiding it.
At first we considered a locking system with an update queue, but that approach is too slow.
Then we thought: if the code already knows how to regenerate the data, let it. All we need is to return FALSE to it and, at the same moment, put the old data back into the cache. The result: the update procedure runs once, and the data returned to the other requests until regeneration finishes is stale only for the duration of that regeneration.
To do this, we have to store not only the data itself but also its timeout and its invalidation time. The resulting array is kept in memcache for twice the requested time, to be safe. The documentation states that the longest key lifetime is 30 days, so it is enough to cap the timeout passed to the wrapper at 15 days minus 1 second: doubled, that is 2,591,998 seconds, just under the 30-day limit of 2,592,000. The same cap applies to keys with timeout = 0 (that is, stored forever, until evicted). I have never seen a case where data in memcache is needed only once every 15 days. If that is your case, something needs to change.
We also quickly ran into a problem with increment. We had to agree on a convention that all increment keys end with, for example, "_inc". When such a key is detected, we simply fetch the value that memcache itself has incremented. * I removed this stub from the Memcache_Proxy::get() method shown here.
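The idea can be followed step by step in a small in-memory sketch. This is not the production class: a PHP array stands in for memcache, the clock is passed in explicitly, and the names clamp_ttl(), wrap_set(), wrap_get() and the evicted_at field are hypothetical, introduced only for this illustration.

```php
<?php
// Sketch only: an array stands in for memcache, and the current time is an
// explicit argument so each step can be traced by hand.
const MAX_LOGICAL_TTL = 15 * 24 * 3600 - 1; // 1295999 s = 15 days - 1 second

function clamp_ttl(int $ttl): int {
    // 0 ("forever") and anything too large fall back to the maximum.
    return ($ttl === 0 || $ttl > MAX_LOGICAL_TTL) ? MAX_LOGICAL_TTL : $ttl;
}

function wrap_set(array &$store, string $key, $data, int $ttl, int $now): void {
    $ttl = clamp_ttl($ttl);
    $store[$key] = [
        'data'       => $data,
        'life_end'   => $now + $ttl,      // logical expiry seen by the app
        'ttl'        => $ttl,
        'evicted_at' => $now + 2 * $ttl,  // physical expiry in the store
    ];
}

function wrap_get(array &$store, string $key, int $now) {
    if (!isset($store[$key]) || $now >= $store[$key]['evicted_at']) {
        return false;                     // really gone: plain miss
    }
    $entry = $store[$key];
    if ($now > $entry['life_end']) {
        // Stale: put the old data back for later readers and report a miss
        // to this caller only, so that it alone regenerates.
        wrap_set($store, $key, $entry['data'], $entry['ttl'], $now);
        return false;
    }
    return $entry['data'];
}

$store = [];
wrap_set($store, 'some_key', 'old', 3600, 0);
var_dump(wrap_get($store, 'some_key', 10));    // string(3) "old"
var_dump(wrap_get($store, 'some_key', 3601));  // bool(false): this reader regenerates
var_dump(wrap_get($store, 'some_key', 3602));  // string(3) "old": others get stale data
wrap_set($store, 'some_key', 'new', 3600, 3603);
var_dump(wrap_get($store, 'some_key', 3604));  // string(3) "new"
```

Only the reader at second 3601 is told to regenerate; everyone else keeps receiving the old value until the fresh one is written. The cap keeps the doubled lifetime (2 * 1295999 = 2591998 seconds) below the 30-day limit.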

Code


The code is commented where necessary :) I apologize in advance for the wall of code, but I could not cut it down any further.
class MC
{
    // Singleton instance of our class, which extends the native Memcache class
    private static $_proxy;

    private static function _proxy()
    {
        if (is_null(self::$_proxy) || self::$_proxy->closed) {
            self::$_proxy = new Memcache_Proxy;
        }
        return self::$_proxy;
    }

    public static function get($key = '')
    {
        return self::_proxy()->get($key);
    }

    public static function set($key = '', $data = NULL, $flag = FALSE, $timeout = 3600)
    {
        return self::_proxy()->set($key, $data, $flag, $timeout);
    }

    public static function delete($key = '')
    {
        return self::_proxy()->delete($key);
    }

    public static function increment($key = '', $increment = 1)
    {
        return self::_proxy()->increment($key, $increment);
    }
}


The MC class lets all of the code talk to a single memcache instance without explicitly opening a connection. The connection is created the first time any method of this class is called.

class Memcache_Proxy extends Memcache
{
    public $closed = false;

    public function __construct()
    {
        $this->connect(MEMCACHE_HOST, MEMCACHE_PORT, null);
        $this->closed = false;
    }

    function __destruct()
    {
        $this->close();
        $this->closed = true;
    }

    /**
     * Mirror for $memcache->get() method
     */
    public function get($key = '')
    {
        if (empty($key)) return FALSE;

        $data = parent::get($key);
        if ($data !== FALSE && $this->_is_valid_cache($data)) {
            if (!isset($data['_dc_cache'])) $data['_dc_cache'] = NULL;

            // check lifetime
            if (time() > $data['_dc_life_end']) {
                // expired: save the same data again for other connections
                // and report a miss to this caller only
                $this->set($key, $data['_dc_cache'], FALSE, $data['_dc_cache_time']);
                return FALSE;
            } else {
                // still alive
                return $data['_dc_cache'];
            }
        }
        return FALSE;
    }

    /**
     * Mirror for $memcache->set() method
     */
    public function set($key = '', $data = NULL, $flag = FALSE, $timeout = 3600)
    {
        if (empty($key)) return FALSE;

        // Place here "_inc" key check
        if (is_int($data) || $data === FALSE) parent::delete($key . '_increment');

        // Maximum timeout = 15 days - 1 second
        if ((int)$timeout == 0 || (int)$timeout > 1295999) $timeout = 1295999;

        return $this->_set($key, $data, $flag, $timeout);
    }

    /**
     * Mirror for $memcache->delete() method
     */
    public function delete($key = '')
    {
        if (empty($key)) return FALSE;

        // Magic for increment. Place here "_inc" key check
        parent::delete($key . '_increment');
        return parent::delete($key);
    }

    public function increment($key, $increment = 1)
    {
        // The counter lives in a raw side key so memcache can increment it itself
        $inc_value = parent::increment($key . '_increment', $increment);

        $data = parent::get($key);
        if ($data === FALSE) return FALSE;

        if ($this->_is_valid_cache($data)) {
            if ($inc_value === FALSE) {
                // no side key yet: seed it from the wrapped value
                $inc_value = $data['_dc_cache'] + $increment;
                parent::set($key . '_increment', $inc_value, FALSE, $data['_dc_cache_time'] * 2);
            }
            // keep the wrapped copy in sync for the rest of its lifetime
            $time = $data['_dc_life_end'] - time();
            if ($time > 0) {
                $this->_set($key, $inc_value, FALSE, $time);
                return $inc_value;
            }
        }
        return $inc_value;
    }

    private function _set($key = '', $data = NULL, $flag = FALSE, $timeout = 3600)
    {
        $cache = array(
            '_dc_cache'      => $data,
            '_dc_life_end'   => time() + $timeout, // logical expiry checked in get()
            '_dc_cache_time' => $timeout,
        );
        // Keep the wrapper in memcache twice as long as its logical lifetime,
        // so the stale copy is still there while the data is regenerated.
        // (If both expired at the same moment, get() would never see stale data.)
        return parent::set($key, $cache, $flag, $timeout * 2);
    }

    // Maybe we have pure Memcache data, not our array structure
    private function _is_valid_cache($value)
    {
        return (is_array($value)
            && isset($value['_dc_life_end'])
            && isset($value['_dc_cache_time'])
            && !empty($value['_dc_life_end'])
            && !empty($value['_dc_cache_time'])
        ) ? TRUE : FALSE;
    }
}


Usage example


Code, just code. If the data has gone stale, we trigger regeneration by returning FALSE to the current requester only, and put the same data back at the same moment. The next requesters therefore receive the old data until the first process finishes regenerating and calls MC::set() with the fresh data. From then on, all processes receive the fresh data.

$data = MC::get('some_key');
if ($data === FALSE) {
    // cache miss or stale data: regenerate and store
    $data = huge_generate_func_call();
    MC::set('some_key', $data, FALSE, 3600);
}


That is, we keep using memcache exactly as before. If you already had a wrapper for accessing memcache, you can change the wrapper and NOT touch anything in the application code. That, by the way, was one of the requirements: minimal refactoring when introducing the new memcache class.

Summary


The overhead of storing the timestamp and timeout can be ignored: memory is cheap these days.
The data does go stale for a period equal to its generation time, but that is tolerable rather than fatal, and no extra processes are spawned to generate the same data. QED!

PS
Suggestions and comments are welcome! Spelling corrections by private message, substantive remarks in the comments!

UPD.
To those downvoting: please explain your choice. Not everyone was born with the talents of Pushkin and Stroustrup!

UPD. 2
Addressing the downvotes:
1. The MC class exists so that nothing in the code has to change. Nothing at all. Replace it with the name of your own memcache wrapper, if you have one. If not, most decent IDEs support Refactor -> Change ClassName.
2. The MC class is static. That is how it evolved historically: minimal refactoring was the main requirement. I did not rework the code for Habr; the main idea is conveyed as it is.

Source: https://habr.com/ru/post/128275/

