Automatic compression of stored data in Redis

The problem: during peak hours the network interface could not cope with the volume of data being transferred.

Of the available solutions, we chose to compress the stored data.

tl;dr: memory savings of more than 50% and network traffic savings of more than 50%. This post is about a predis plugin that automatically compresses data before it is sent to Redis.



As you know, Redis uses a binary-safe text protocol and stores data in its original form. Our application keeps serialized PHP objects and even fragments of HTML in Redis, which suits compression very well: the data is homogeneous and contains many repeated groups of characters.



While looking for a solution, we found a discussion in the group: the developers do not plan to add compression to the protocol. So we will do it ourselves.



So, the concept: if the data passed to Redis for storage is larger than N bytes, compress it with gzip before saving. When reading data from Redis, check its first bytes for a gzip header and, if one is found, decompress the data before handing it to the application.
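The concept can be sketched in a few lines of plain PHP, without predis. The function names and the threshold value are hypothetical and only illustrate the idea; the sketch also assumes the zlib extension is available and that mbstring.func_overload is off, so strlen counts bytes.

```php
<?php
// Hypothetical names and values, chosen for illustration only.
const THRESHOLD_BYTES = 2048;
const GZIP_MAGIC = "\x1f\x8b\x08";

function packForRedis(string $value): string
{
    // Short values are stored as-is.
    if (\strlen($value) <= THRESHOLD_BYTES) {
        return $value;
    }

    $compressed = \gzencode($value);
    if ($compressed === false) {
        throw new \RuntimeException('Compression failed');
    }

    return $compressed;
}

function unpackFromRedis(string $value): string
{
    // Only values that start with the gzip header are decompressed.
    if (\strncmp($value, GZIP_MAGIC, \strlen(GZIP_MAGIC)) !== 0) {
        return $value;
    }

    $decompressed = \gzdecode($value);
    if ($decompressed === false) {
        throw new \RuntimeException('Decompression failed');
    }

    return $decompressed;
}

// Repetitive markup, similar in spirit to the HTML fragments mentioned above.
$html = \str_repeat('<li class="item">repeated markup</li>', 200);
$stored = packForRedis($html);
\printf("%d bytes -> %d bytes\n", \strlen($html), \strlen($stored));
```

A value that was stored uncompressed but happens to start with the same three bytes would be treated as gzip on read, so this scheme relies on such values not occurring in the application's data.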

Since we use predis to work with redis, the plugin was written for it.

Let's start small and write the mechanism that handles compression: CompressorInterface, with methods that decide whether to compress, compress, decide whether to decompress, and decompress. The class constructor takes a threshold in bytes above which compression is enabled. This interface lets you plug in your favorite compression algorithm yourself, for example good old WinRAR.
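The interface itself is not shown in this post, so here is a sketch of what it might look like, inferred from the implementations below; the plugin's real declaration may differ in details. To show how another algorithm could be plugged in, the sketch adds a hypothetical PrefixedDeflateCompressor that uses raw deflate and its own made-up marker prefix instead of the gzip header.

```php
<?php
// Assumed shape of the interface, inferred from the classes in this article.
interface CompressorInterface
{
    public function shouldCompress($data): bool;

    public function compress(string $data): string;

    public function isCompressed($data): bool;

    public function decompress(string $data): string;
}

// Hypothetical alternative implementation; not part of the plugin.
final class PrefixedDeflateCompressor implements CompressorInterface
{
    // Made-up marker that tags values compressed by this class.
    private const MARKER = "\x00DFL";

    private $threshold;

    public function __construct(int $threshold)
    {
        $this->threshold = $threshold;
    }

    public function shouldCompress($data): bool
    {
        return \is_string($data) && \strlen($data) > $this->threshold;
    }

    public function compress(string $data): string
    {
        $deflated = \gzdeflate($data);
        if ($deflated === false) {
            throw new \RuntimeException('Compression failed');
        }

        return self::MARKER . $deflated;
    }

    public function isCompressed($data): bool
    {
        return \is_string($data)
            && \strncmp($data, self::MARKER, \strlen(self::MARKER)) === 0;
    }

    public function decompress(string $data): string
    {
        $inflated = \gzinflate(\substr($data, \strlen(self::MARKER)));
        if ($inflated === false) {
            throw new \RuntimeException('Decompression failed');
        }

        return $inflated;
    }
}

$compressor = new PrefixedDeflateCompressor(16);
$packed = $compressor->compress(\str_repeat('a', 100));
\var_dump($compressor->isCompressed($packed));
```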



The logic of checking the size of the input data is moved to the AbstractCompressor class so as not to duplicate it in each of the implementations.

AbstractCompressor
    abstract class AbstractCompressor implements CompressorInterface
    {
        // Single-byte charset, so mb_strlen() counts bytes.
        const BYTE_CHARSET = 'US-ASCII';

        protected $threshold;

        public function __construct(int $threshold)
        {
            $this->threshold = $threshold;
        }

        public function shouldCompress($data): bool
        {
            if (!\is_string($data)) {
                return false;
            }

            return \mb_strlen($data, self::BYTE_CHARSET) > $this->threshold;
        }
    }


We use mb_strlen to avoid possible problems with mbstring.func_overload, and we pass a single-byte encoding so that the length is counted in bytes and mbstring does not try to detect the encoding from the data.
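A small illustration of the difference, assuming the mbstring extension is loaded. The sample string is arbitrary; it is written with \u{...} escapes so that the result does not depend on the encoding of the source file.

```php
<?php
// "naive cafe" with two accented letters: 10 characters, 12 bytes in UTF-8.
$text = "na\u{00ef}ve caf\u{00e9}";

// With a multi-byte charset mb_strlen() counts characters.
$characters = \mb_strlen($text, 'UTF-8');

// With a single-byte charset every byte counts as one character,
// which is what a size threshold in bytes needs.
$bytes = \mb_strlen($text, 'US-ASCII');

\var_dump($characters, $bytes);
```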



The implementation is based on gzencode, whose output starts with the magic bytes "\x1f\x8b\x08": the gzip signature \x1f\x8b followed by \x08, the identifier of the deflate method. These bytes are how we will know that a string has to be decompressed.

GzipCompressor
    class GzipCompressor extends AbstractCompressor
    {
        public function compress(string $data): string
        {
            $compressed = @\gzencode($data);
            if ($compressed === false) {
                throw new CompressorException('Compression failed');
            }

            return $compressed;
        }

        public function isCompressed($data): bool
        {
            if (!\is_string($data)) {
                return false;
            }

            // The data is compressed if it starts with the gzip magic bytes.
            return 0 === \mb_strpos($data, "\x1f" . "\x8b" . "\x08", 0, self::BYTE_CHARSET);
        }

        public function decompress(string $data): string
        {
            $decompressed = @\gzdecode($data);
            if ($decompressed === false) {
                throw new CompressorException('Decompression failed');
            }

            return $decompressed;
        }
    }




A nice bonus: if you use RedisDesktopManager, it automatically decompresses gzip when you view a value. I tried to inspect the plugin's output in it and, until I learned about this feature, thought the plugin was not working :)



predis has a Processor mechanism that lets you change the arguments of commands before they are sent to the server, and this is what we will use. Incidentally, the key prefixer that ships with predis, which dynamically adds a string to all keys, is built on the same mechanism.



    class CompressProcessor implements ProcessorInterface
    {
        private $compressor;

        public function __construct(CompressorInterface $compressor)
        {
            $this->compressor = $compressor;
        }

        public function process(CommandInterface $command)
        {
            if ($command instanceof CompressibleCommandInterface) {
                // Hand the compressor to the command; it is also needed later in parseResponse().
                $command->setCompressor($this->compressor);

                if ($command instanceof ArgumentsCompressibleCommandInterface) {
                    $arguments = $command->compressArguments($command->getArguments());
                    $command->setRawArguments($arguments);
                }
            }
        }
    }


The processor looks for commands that implement one of these interfaces:

1. CompressibleCommandInterface - indicates that the command supports compression and declares the method through which the command receives a CompressorInterface implementation.

2. ArgumentsCompressibleCommandInterface - extends the first interface and indicates that the command supports compression of its arguments.



The logic turned out strange, don't you think? Why is argument compression explicit and invoked by the processor, while the decompression of responses is not? Take a look at the code predis uses to create commands ( \Predis\Profile\RedisProfile::createCommand() ):



    public function createCommand($commandID, array $arguments = array())
    {
        // ... (the code that resolves $commandClass is omitted)
        $command = new $commandClass();
        $command->setArguments($arguments);

        if (isset($this->processor)) {
            $this->processor->process($command);
        }

        return $command;
    }


This logic causes several problems.

The first is that the processor can influence a command only after it has already received its arguments. This makes it impossible to pass an external dependency into the command beforehand ( GzipCompressor in our case, but it could be anything else that has to be initialized outside predis, such as an encryption system or a data-signing mechanism). This is why an interface with a method for compressing arguments appeared.

The second problem is that the processor cannot affect how a command handles the server's response. Because of this, the decompression logic has to live in CommandInterface::parseResponse() , which is not quite right.



Together, these two problems mean that the compressor has to be stored inside the command, and the decompression logic is hidden there rather than being explicit. I think the processor in predis should be split into two stages: a preprocessor (to transform the arguments before they are sent to the server) and a postprocessor (to transform the server's response). I shared these thoughts with the predis developers.
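To make the proposal more concrete, here is a rough sketch of what a two-stage processor could look like. Everything in it is hypothetical: predis has no PreProcessorInterface or PostProcessorInterface, and GzipValueProcessor is a made-up class that works on bare arrays and strings instead of real command objects.

```php
<?php
// Hypothetical interfaces; neither of them exists in predis.
interface PreProcessorInterface
{
    // Transforms the arguments before they are sent to the server.
    public function preProcess(array $arguments): array;
}

interface PostProcessorInterface
{
    // Transforms the response received from the server.
    public function postProcess($response);
}

// Made-up processor that keeps compression and decompression side by side.
final class GzipValueProcessor implements PreProcessorInterface, PostProcessorInterface
{
    private $valueIndex;
    private $threshold;

    public function __construct(int $valueIndex, int $threshold)
    {
        $this->valueIndex = $valueIndex;
        $this->threshold = $threshold;
    }

    public function preProcess(array $arguments): array
    {
        $value = $arguments[$this->valueIndex] ?? null;
        if (\is_string($value) && \strlen($value) > $this->threshold) {
            $compressed = \gzencode($value);
            if ($compressed === false) {
                throw new \RuntimeException('Compression failed');
            }
            $arguments[$this->valueIndex] = $compressed;
        }

        return $arguments;
    }

    public function postProcess($response)
    {
        // Responses without the gzip header are passed through untouched.
        if (!\is_string($response) || \strncmp($response, "\x1f\x8b\x08", 3) !== 0) {
            return $response;
        }

        $decompressed = \gzdecode($response);
        if ($decompressed === false) {
            throw new \RuntimeException('Decompression failed');
        }

        return $decompressed;
    }
}

// The value of "SET key value" is argument 1.
$processor = new GzipValueProcessor(1, 16);
$sent = $processor->preProcess(['key', \str_repeat('a', 100)]);
$received = $processor->postProcess($sent[1]);
\var_dump($received === \str_repeat('a', 100));
```

With such a split, the command would not need to know about the compressor at all, and both halves of the logic would live in one place.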



Typical Set Command Code

    use CompressibleCommandTrait;
    use CompressArgumentsHelperTrait;

    public function compressArguments(array $arguments): array
    {
        // Argument 1 is the value in "SET key value".
        $this->compressArgument($arguments, 1);

        return $arguments;
    }

Typical Get Command Code

    use CompressibleCommandTrait;

    public function parseResponse($data)
    {
        if (!$this->compressor->isCompressed($data)) {
            return $data;
        }

        return $this->compressor->decompress($data);
    }
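The two traits used above are not shown in this post. Below is a guess at their shape, inferred only from how the commands call them; the plugin's real code may differ. SketchCompressor and ToySetCommand are hypothetical stand-ins that let the sketch run without predis.

```php
<?php
// Hypothetical, simplified stand-in for the plugin's gzip compressor.
final class SketchCompressor
{
    private $threshold;

    public function __construct(int $threshold)
    {
        $this->threshold = $threshold;
    }

    public function shouldCompress($data): bool
    {
        return \is_string($data) && \strlen($data) > $this->threshold;
    }

    public function compress(string $data): string
    {
        $compressed = \gzencode($data);
        if ($compressed === false) {
            throw new \RuntimeException('Compression failed');
        }

        return $compressed;
    }
}

// Assumed shape: the trait only stores the compressor handed over by the processor.
trait CompressibleCommandTrait
{
    protected $compressor;

    public function setCompressor(SketchCompressor $compressor): void
    {
        $this->compressor = $compressor;
    }
}

// Assumed shape: compresses a single argument in place when it is large enough.
trait CompressArgumentsHelperTrait
{
    protected function compressArgument(array &$arguments, int $position): void
    {
        if (isset($arguments[$position])
            && $this->compressor->shouldCompress($arguments[$position])
        ) {
            $arguments[$position] = $this->compressor->compress($arguments[$position]);
        }
    }
}

// Toy SET-like command without any predis base class.
final class ToySetCommand
{
    use CompressibleCommandTrait;
    use CompressArgumentsHelperTrait;

    public function compressArguments(array $arguments): array
    {
        $this->compressArgument($arguments, 1);

        return $arguments;
    }
}

$command = new ToySetCommand();
$command->setCompressor(new SketchCompressor(16));
$arguments = $command->compressArguments(['key', \str_repeat('x', 100)]);
\var_dump(\strlen($arguments[1]) < 100);
```

Passing the arguments by reference is what allows the command to call compressArgument and then simply return the same array.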


The results of enabling the plugin, on the graphs of one of the cluster instances:







How to install and start using:
 composer require b1rdex/predis-compressible 


    use B1rdex\PredisCompressible\CompressProcessor;
    use B1rdex\PredisCompressible\Compressor\GzipCompressor;
    use B1rdex\PredisCompressible\Command\StringGet;
    use B1rdex\PredisCompressible\Command\StringSet;
    use B1rdex\PredisCompressible\Command\StringSetExpire;
    use B1rdex\PredisCompressible\Command\StringSetPreserve;
    use Predis\Client;
    use Predis\Configuration\OptionsInterface;
    use Predis\Profile\Factory;
    use Predis\Profile\RedisProfile;

    // strings with length > 2048 bytes will be compressed
    $compressor = new GzipCompressor(2048);

    $client = new Client([], [
        'profile' => function (OptionsInterface $options) use ($compressor) {
            $profile = Factory::getDefault();
            if ($profile instanceof RedisProfile) {
                $processor = new CompressProcessor($compressor);
                $profile->setProcessor($processor);

                // Replace the standard string commands with their compressible versions.
                $profile->defineCommand('SET', StringSet::class);
                $profile->defineCommand('SETEX', StringSetExpire::class);
                $profile->defineCommand('SETNX', StringSetPreserve::class);
                $profile->defineCommand('GET', StringGet::class);
            }

            return $profile;
        },
    ]);


Upd: link to the plugin on GitHub.

Source: https://habr.com/ru/post/331474/


