
Redis under the hood: Strings

If you already know why the simple string `strings` takes 56 bytes of memory in Redis, this article will probably not be interesting to you. For everyone else, I will try to explain what strings in Redis are and why it is important for a developer using this database to understand how they are built and how they behave. This knowledge is especially important if you are trying to calculate your application's actual memory consumption, are planning to build high-load statistics or data-accounting systems, or, as often happens, are urgently trying to understand why your Redis instance has suddenly started consuming an unexpectedly large amount of memory.

What I will cover: how strings are stored in Redis, which internal structures are used to store them, and what optimizations Redis applies under the hood; how to store large structures efficiently, and in which situations you should avoid strings or structures built on top of them. Strings are the key structure in Redis: HSET / ZSET / LIST are built on top of them, adding a small overhead for their own internal representation. Why I am writing this article: for more than a year I have been reading and actively answering questions on Stack Overflow under the redis tag, and over that time I have seen a steady stream of questions that, one way or another, come down to developers not understanding how Redis uses RAM and what you pay for its high performance.

The answer to the question of how much memory will actually be used depends on the operating system, the compiler, the type of your process, and the allocator used (jemalloc by default in Redis). All further calculations are given for Redis 3.0.5 compiled on a 64-bit server running CentOS 7.
A small interlude seems absolutely necessary for those who do not write in C/C++ or are not very familiar with how things work at a low level. Let's monstrously oversimplify a few concepts so the calculations are easier to follow. When you declare a structure in a C/C++ program and it has unsigned int fields (4-byte unsigned integers), the compiler will carefully align them to 8 bytes in actual RAM (on the x64 architecture). This article will regularly mention the memory allocator: the component that allocates memory "cleverly". For example, jemalloc tries to optimize the speed of finding new blocks of memory by relying on the alignment of the allocated fragments. The allocation and alignment strategy in jemalloc is well documented, but here we will use the simplification that any allocated fragment is rounded up to the nearest power of 2: ask for 24 bytes and you get 32; ask for 61 and you get 64. I am simplifying greatly, and I hope this makes things a little clearer. These are things you normally do not have to worry about in interpreted languages, but here I ask you to pay attention to them.
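A minimal C sketch of both simplifications; the rounding helper is purely illustrative and is not how jemalloc really picks size classes:

```c
#include <stdio.h>

/* Two 4-byte unsigned ints: the whole struct occupies 8 bytes on x86-64. */
struct two_uints {
    unsigned int len;
    unsigned int free;
};

/* My own illustration of the simplified "round up to the next power of two"
 * rule described above; real jemalloc size classes are more fine-grained. */
static size_t round_up_pow2(size_t n) {
    size_t size = 8;
    while (size < n) size *= 2;
    return size;
}

int main(void) {
    printf("sizeof(struct two_uints) = %zu\n", sizeof(struct two_uints)); /* 8  */
    printf("ask 24 bytes -> get %zu\n", round_up_pow2(24));               /* 32 */
    printf("ask 61 bytes -> get %zu\n", round_up_pow2(61));               /* 64 */
    return 0;
}
```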

The concept and implementation of strings, written by Salvatore Sanfilippo (aka antirez), lives in one of the Redis subprojects called SDS (Simple Dynamic String, github.com/antirez/sds ):
```
+--------+-------------------------------+-----------+
| Header | Binary safe C alike string... | Null term |
+--------+-------------------------------+-----------+
         |
         `-> Pointer returned to the user.
```

This is a simple sds structure whose header stores the current size and the free space in the already allocated memory, followed by the string itself and the mandatory terminating zero that Redis adds itself. In sds strings we are most interested in the cost of the header, the resizing strategy, and the alignment penalties when memory is allocated.
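In Redis 3.0.x that header is, roughly, the following struct from sds.h (note that the pointer you get back from the sds functions points at buf, not at the struct itself):

```c
/* The string header, roughly as it appears in sds.h of Redis 3.0.x. */
struct sdshdr {
    unsigned int len;   /* bytes of buf actually used              */
    unsigned int free;  /* bytes allocated to buf but not yet used */
    char buf[];         /* the string data itself, followed by '\0' */
};
```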

On July 4, 2015 a long story about string optimization came to an end; it should land in Redis 3.1. This optimization brings additional memory savings on string headers (from 16% to 200% on synthetic tests) and removes the 512 MB limit on the maximum string length in Redis. All of this becomes possible because the header length now changes dynamically with the string length: the header will occupy only 3 bytes for strings up to 256 bytes long, 5 bytes for strings shorter than 65 KB, 9 bytes (as it is now) for strings up to 512 MB, and 17 bytes for strings whose size fits into a uint64_t (a 64-bit unsigned integer). By the way, this change saves our production farm about 19.3% of its memory (~42 GB). In the current 3.0.x, however, the header is simple: 8 bytes plus 1 byte for the terminating zero. Let's estimate how much memory the string `strings` will take: 16 (header) + 7 (string length) + 1 (terminating zero) = 24 bytes (16 bytes for the header, because the compiler aligns the two unsigned ints for you). At the same time, jemalloc will allocate 32 bytes for you. Let's set that aside for now (I hope it will become clear later why).
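For the curious, the 3- and 5-byte figures come from packed variable-size headers along these lines (my transcription of the reworked sds layout; the released version may differ in details):

```c
#include <stdint.h>

/* Sketch of the reworked variable-size sds headers (not Redis 3.0 code). */
struct __attribute__ ((__packed__)) sdshdr8 {   /* 3-byte header, strings < 256 bytes */
    uint8_t len;          /* used length                                */
    uint8_t alloc;        /* allocated length, header and '\0' excluded */
    unsigned char flags;  /* header type tag                            */
    char buf[];
};

struct __attribute__ ((__packed__)) sdshdr16 {  /* 5-byte header, strings < 64 KB */
    uint16_t len;
    uint16_t alloc;
    unsigned char flags;
    char buf[];
};
```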

What happens when a string is resized? Whenever you grow a string and there is not enough allocated memory, Redis checks the new length against the SDS_MAX_PREALLOC constant (defined in sds.h as 1,048,576 bytes). If the new length is less than this value, twice the requested memory is allocated; if the string length already exceeds SDS_MAX_PREALLOC, the value of this constant is added to the new requested length. This behaviour will matter in the story about "disappearing memory when using bitmaps". By the way, when memory is allocated for a bitmap, twice the requested amount is always allocated, due to the way the SETBIT command is implemented (see setbitCommand in bitops.c).
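A condensed sketch of that decision (modelled on sdsMakeRoomFor() in sds.c of Redis 3.0; the real function also performs the realloc and error handling):

```c
#include <stddef.h>

#define SDS_MAX_PREALLOC (1024*1024)  /* 1 MB, from sds.h */

/* Returns the number of bytes of buf[] to allocate when growing a string
 * of current_len bytes by addlen bytes; header and '\0' come on top. */
size_t sds_new_alloc_size(size_t current_len, size_t addlen) {
    size_t newlen = current_len + addlen;
    if (newlen < SDS_MAX_PREALLOC)
        newlen *= 2;                /* small strings: double what is needed   */
    else
        newlen += SDS_MAX_PREALLOC; /* big strings: add a fixed 1 MB of slack */
    return newlen;
}
```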

Now we could say that our string takes 32 bytes of RAM (including alignment). Those who have read the advice from the folks at hashedin.com (redis memory optimization guide) may recall that they strongly recommend not using strings shorter than 100 bytes, because storing a short string, say with `set foo bar`, costs ~96 bytes, of which 90 bytes are overhead (on a 64-bit machine). Misleading? Let's dig further.

All values in Redis are stored in a structure of type redisObject. This allows Redis to know the type of a value, its internal representation (Redis calls this the encoding), the data needed for LRU, the number of objects referencing the value, and the value itself:
```
+------+----------+-----+----------+-------------------------+
| Type | Encoding | LRU | RefCount | Pointer to data (ptr*)  |
+------+----------+-----+----------+-------------------------+
```
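For reference, this header is, give or take, the following struct from redis.h in Redis 3.0 (the diagram above is a flattened view of it):

```c
#define REDIS_LRU_BITS 24

/* The value header, roughly as defined in redis.h of Redis 3.0.x. */
typedef struct redisObject {
    unsigned type:4;              /* REDIS_STRING, REDIS_LIST, ...            */
    unsigned encoding:4;          /* REDIS_ENCODING_INT / _EMBSTR / _RAW, ... */
    unsigned lru:REDIS_LRU_BITS;  /* LRU clock, used for eviction             */
    int refcount;                 /* how many places reference this value     */
    void *ptr;                    /* the value itself, or a pointer to it     */
} robj;
```

Note that type, encoding and lru are bitfields packed into a single 32-bit word; that is why, a little further down, type and encoding are counted together.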


A little later we will calculate its size for our string, taking into account compiler alignment and jemalloc behaviour. In the context of strings, it is important to know which encodings are used to store them. Right now Redis uses three storage strategies (a rough sketch of how the choice is made follows the list):

  1. REDIS_ENCODING_INT is fairly simple. A string can be stored in this form if its value, cast to a long, falls within the range LONG_MIN .. LONG_MAX. For example, the string "1952672100" will be stored in this encoding, as the number 1952672100 (0x74636964). The same encoding is used for the pre-allocated range of shared values REDIS_SHARED_INTEGERS (defined in redis.h, 10000 by default); values from this range are allocated once, when Redis starts.
  2. REDIS_ENCODING_EMBSTR is used for strings up to 39 bytes long (the value of the REDIS_ENCODING_EMBSTR_SIZE_LIMIT constant from object.c). In this case the redisObject and the structure holding the sds string are placed in a single memory area returned by the allocator, which is what lets us calculate the alignment correctly. It is no less important for understanding memory fragmentation in Redis and how to live with it.
  3. REDIS_ENCODING_RAW is used for all strings whose length exceeds REDIS_ENCODING_EMBSTR_SIZE_LIMIT. In this case ptr* holds an ordinary pointer to the memory area with the sds string.
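To make the choice concrete, here is a small, self-contained sketch of that decision (my own helper, loosely following tryObjectEncoding()/createStringObject() in object.c; real Redis also checks leading zeros, string length, shared integers, and so on):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>

/* Hypothetical helper that mimics how Redis 3.0 picks a string encoding. */
static const char *guess_encoding(const char *s) {
    char *end;
    errno = 0;
    long v = strtol(s, &end, 10);
    (void)v;
    if (errno == 0 && end != s && *end == '\0')
        return "int";     /* the value fits in a long -> REDIS_ENCODING_INT */
    if (strlen(s) <= 39)  /* REDIS_ENCODING_EMBSTR_SIZE_LIMIT in object.c   */
        return "embstr";  /* robj and sds share one allocation              */
    return "raw";         /* robj->ptr points to a separately allocated sds */
}

int main(void) {
    const char *samples[] = {
        "1952672100",
        "strings",
        "a string that is clearly longer than thirty-nine bytes in total"
    };
    for (size_t i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
        printf("%-12.12s... -> %s\n", samples[i], guess_encoding(samples[i]));
    return 0;
}
```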

EMBSTR appeared in 2012 and brought a 60-70% performance improvement when working with short strings, but there is still no serious research into its effect on memory usage and fragmentation.

Our string `strings` is only 7 bytes long, so its internal representation is EMBSTR. A string created this way is laid out in memory like this:
```
+--------------+--------------+------------+--------+----+
| robj data... | robj->ptr    | sds header | string | \0 |
+--------------+-----+--------+------------+--------+----+
                     |                      ^
                     +----------------------+
```

Now we are ready to calculate how much RAM Redis needs to store our string `strings`:
(4 + 4) (type and encoding) + 8 (lru) + 8 (refcount) + 8 (ptr) + 16 (sds header) + 7 (the string itself) + 1 (the terminating zero) = 56 bytes.

The type and encoding in redisObject use only the low and high 4 bits of the same number, which is why these two fields together take 8 bytes after alignment.

Let's check that I am not leading you astray by looking at the encoding and the value. We will use a little-known command for debugging strings, DEBUG SDSLEN. By the way, the command is not in the official documentation; it was added in Redis 2.6 and can be very useful:
```
set key strings
+OK
debug object key
+Value at:0x7fa037c35dc0 refcount:1 encoding:embstr serializedlength:8 lru:3802212 lru_seconds_idle:14
debug sdslen key
+key_sds_len:3, key_sds_avail:0, val_sds_len:7, val_sds_avail:0
```

The encoding used is embstr and the string length is 7 bytes (val_sds_len). What about the 96 bytes the folks from hashedin.com were talking about? As I understand it, they were slightly mistaken: their `set foo bar` example would require allocating 112 bytes of RAM (56 bytes for the value and the same for the key), of which 106 bytes are overhead.

A little earlier I promised a story about disappearing memory when using BITMAP. The behaviour I want to describe constantly escapes the attention of some of the developers who use it; it is what memory-optimization consultants such as redis-labs or datadog regularly make money on. The family of bit- and byte-level operations appeared in Redis 2.2 and was immediately positioned as a magic wand for memory-saving real-time counters (for example, the article from Spool). The official memory optimization guide also carries the advertising slogan that for online-presence tracking "For 100 million users, this data will take up only 12 megabytes of RAM". The descriptions of SETBIT and SETRANGE warn about possible server lags when allocating memory, but omit what I consider an important section: "When you should not use BITMAP" or "When it is better to use SET instead of BITMAP".

Armed with an understanding of how strings grow in Redis, let's see how a bitmap behaves in practice.

Consider an example. Suppose you have up to 10 million registered users and your ten-millionth user comes online:
```
setbit online 10000000 1
:0
debug sdslen online
+key_sds_len:6, key_sds_avail:0, val_sds_len:1250001, val_sds_avail:1048576
```

Your actual memory consumption is 2,298,577 bytes, of which only 1,250,001 bytes are "useful" to you. Storing this single user cost you ~2.3 MB, whereas with SET it would cost ~64 bytes (with 4 bytes of payload). You need to choose aggregation intervals that reduce the sparsity of the data and try to keep the bitmap at least around 30% filled; only then do you actually use memory for this data structure efficiently. I mention this because if you have a multi-million audience but only, say, 10,000 - 100,000 people online per hour, using a bitmap for this purpose can be a waste of memory.
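A back-of-the-envelope helper (my own, not Redis code) that reproduces this number from the SDS growth rule described earlier:

```c
#include <stdio.h>

#define SDS_MAX_PREALLOC (1024*1024)  /* from sds.h */

/* Estimates the sds buffer that SETBIT on an empty key ends up with:
 * len + free, header and '\0' not counted. */
static long long bitmap_bytes(long long max_offset) {
    long long needed = max_offset / 8 + 1;  /* bytes that actually carry bits */
    long long slack  = needed < SDS_MAX_PREALLOC ? needed : SDS_MAX_PREALLOC;
    return needed + slack;
}

int main(void) {
    /* One user with id 10,000,000 marked online. */
    printf("~%lld bytes\n", bitmap_bytes(10000000));
    return 0;
}
```

Running it gives 2,298,577 bytes for a single set bit at offset 10,000,000, matching the debug sdslen output above.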

Finally, resizing strings in Redis means constantly reallocating blocks of memory. Memory fragmentation is another Redis specific that developers rarely think about.
```
info memory
$222
# Memory
used_memory:506920
used_memory_human:495.04K
used_memory_rss:7565312
used_memory_peak:2810024
used_memory_peak_human:2.68M
used_memory_lua:36864
mem_fragmentation_ratio:14.92
mem_allocator:jemalloc-3.6.0
```

The mem_fragmentation_ratio metric shows the ratio between the memory allocated by the operating system (used_memory_rss) and the memory used by Redis (used_memory). Both used_memory and used_memory_rss already include the data itself as well as the cost of the internal structures Redis uses to store and represent it. Redis treats RSS (Resident Set Size) as the amount of memory allocated to it by the operating system, which, on top of the user data and the cost of its internal representation, also includes the overhead of fragmentation arising from how the operating system itself physically allocates memory.

How should you read mem_fragmentation_ratio? A value of 2.1 tells us that we are using 2.1 times more memory to store the data than we actually need. A value below 1 indicates that memory has run out and the operating system is swapping.

In practice, if mem_fragmentation_ratio falls outside the range of 1 - 1.5, it means something is wrong. Try:


In this conversation about fragmentation I am not touching on the specifics of Redis with LRU enabled, nor the additional difficulties that come with a large number of regular string keys; all of that deserves a separate article. I would be grateful if you would share whether it is worth writing about, and what else seems important to you when working with Redis.

User pansa rightly notes that in a swapping situation Redis will not recalculate used_memory_rss after the operating system returns part of the RAM to the process; Redis recalculates this value only when the data is accessed again.


Source: https://habr.com/ru/post/271487/

