📜 ⬆️ ⬇️

Combat rebuilding cache with RED

Description of the problem


Imagine the average high-loaded site. Usually on such sites between the backend'om and DB put a cache layer. With the increase in the number of visitors, the likelihood that several users simultaneously stumble upon a "rotten" cache increases. If this happens, the load on the backend and DB increases, which in turn increases the processing time of the request and increases the likelihood of such a situation. Here is such a system with positive feedback: Small red humps are “dull” requests on a multiple cache update. This article will describe one of the approaches to solving the problem with the example ( patch attached ) PHP / APC bundles, but the theoretical framework is applicable to any language and caching system.

Solutions to the problem


There may be several approaches to solving the problem. The method proposed below is exactly workaround. The correct approach, IMHO, is to use busy lock'ov, which would allow only one process to update the cache, while the rest would give outdated data. However, this solution is not trivial enough and in most cases (with a uniform distribution of requests) the described workaround should be more than enough.

Also, in my case, it was decided to implement RED at the level of the cache itself, but the same could be implemented in the class / wrapper inside the application code. This method was chosen because it was too lazy to go into PHP code.
')

RED


RED , also known as random early detection , random early discard or random early drop is an algorithm for managing queues encountered in routers and shapers. Its physical meaning is approximately as follows: instead of waiting for the queue to fill up and you have to drop everything (taildrop), a certain threshold is entered, after which we start to drop packets with a certain probability, the probability itself is directly proportional to the free space in the buffer. Hence the Random Early Drop .

In our case, we will enter a certain threshold for the write cache drop_threshold ( drop_threshold ), after which we start with some probability p to issue cache miss read requests from the cache, which will force the application to put a fresh version of the data in the cache. Thereby, we will reduce the number of competitive cache updates in inverse proportion to p .

Materiel


Mathematics is at the third grade level here, so I’ll just give a picture and a formula:

Disclaimer


PHP I do not know. On C, I write like cattle.

APC patch


The code is written more like PoC than for production use. A value of 75% was taken from the ceiling. In real conditions, drop_threshold should be rendered to the config.
Index: apc_cache.c
================================================= =================
--- apc_cache.c ( revision 320057 )
+++ apc_cache.c ( working copy )
@@ - 705 , 6 + 705 , 13 @@
volatile apc_cache_entry_t * value = NULL;
unsigned long h;

+ / * APC-RED PATCH * /
+ const int drop_threshold = 75 ;
+ unsigned int probability = 0 ;
+ int red_cf = 0 ;
+ short drop = 0 ;
+ / * END OF APC-RED PATCH * /
+
if ( apc_cache_busy ( cache ) )
{
/ * cache cleanup in progress * /
@@ - 720 , 8 + 727 , 25 @@
while ( * slot ) {
if ( ( h == ( * slot ) -> key.h ) &&
! memcmp ( ( * slot ) -> key.data.user.identifier, strkey, keylen ) ) {
+ / * APC-RED PATCH * /
+ if ( ( * slot ) -> value-> data.user.ttl ) {
+ red_cf = ( t - ( * slot ) -> creation_time ) * 100 /
+ ( * slot ) -> value-> data.user.ttl;
+ if ( red_cf> = 100 ) {
+ drop = 1 ;
+ }
+ else if ( red_cf <= drop_threshold ) {
+ drop = 0 ;
+ }
+ else {
+ probability = ( red_cf - drop_threshold ) * 100 / ( 100 - drop_threshold ) ;
+ drop = ( arc4random ( ) % 100 ) <probability? 1 : 0 ;
+ }
+ }
+ / * END OF APC-RED PATCH * /
+
/ * Check to make sure this entry is expired by a hard TTL * /
- if ( ( * slot ) -> value-> data.user.ttl && ( time_t ) ( ( * slot ) -> creation_time + ( * slot ) -> value-> data.user.ttl ) <t ) {
+ if ( ( ( * slot ) -> value-> data.user.ttl && ( time_t ) ( ( * slot ) -> creation_time + ( * slot ) -> value-> data.user.ttl ) <t ) | | drop ) {
#if ( USE_READ_LOCKS == 0 )
If you have a write-lock
* might be right-away. Otherwise an insert

Considering how many parsers this patch went through before getting to Habr, it probably will not apply, so it’s better to leave a link to gist here: patch-apc_red.patch

Reflections on the beautiful


It would also be worthwhile to add support for the RCU (for example via liburcu ) to the APC code, because under heavy loads it is not so difficult to rest against rwlock (by the way, relatively recently added to APC). APC itself has long been worth building in PHP.

You can also think about how to adaptively set drop_threshold for some global structure like board_config , in which phpbb stores all of the global settings and which twitches on every request by all users, it makes sense to set the threshold to 50%, and vice versa , it is necessary to disable the feature, setting the threshold to 100%. If you think about this topic, you can again look at the network analogs, for example, Adaptive / Active RED (ARED) .

You can also read the long-standing, but still current report by Andrei Smirnov ( smira ) on Highload 2008: Web, caching and memcached

Source: https://habr.com/ru/post/133910/


All Articles