
Under the hood of Redis: Hash table (part 1)

If you already know why running `hset mySet foo bar` will consume at least 296 bytes of RAM, why Instagram engineers do not use string keys, why you should always tune hash-max-ziplist-entries / hash-max-ziplist-value, and why the data type underlying hash is, in part, also what underlies list, sorted set and set - feel free to skip this article. For everyone else, I will try to explain it. Understanding the structure and operation of hash tables in Redis is crucial when writing systems where memory savings are important.

What this article is about: what it costs Redis to store the key itself, what ziplist and dict are, when and why they are used, and how much memory they occupy; when a hash is stored as a ziplist and when as a dict, and what that gives us; which tips from trendy articles about Redis optimization should not be taken seriously, and why.
I want to apologize right away for splitting the material into two articles. Having finished the part about the dictionary, I realized that the part about the ziplist is only slightly shorter than the first. As a result, after trying the text on colleagues, I decided to publish it in two parts.

First, we need to understand how the individual structures underlying hashes work. Then we will calculate the cost of storing the data. It will be easier to follow the material if you already know what redisObject is and how Redis stores strings.
There is some confusion because the hashes structure is often perceived as a hash table. Under the hood there is a separate type, dict, which is a real hash table; the hashes structure itself is a hybrid of dict and ziplist. That is why we need a solid introduction to how hash tables are organized and how they work - after showing a draft of this article without that material to several colleagues, I became convinced that without it everything only gets more confusing.

In Redis, you control which internal data type represents a particular hash key. The hash-max-ziplist-entries variable determines the maximum number of elements in a hash for which the REDIS_ENCODING_ZIPLIST encoding can be used. For example, with hash-max-ziplist-entries = 100 your hash will be represented as a ziplist as long as it contains no more than 100 elements; as soon as there are more, it will be converted to REDIS_ENCODING_HT (dict). hash-max-ziplist-value works in a similar way: as long as the length of every individual field value in the hash does not exceed hash-max-ziplist-value, the ziplist will be used. The parameters work as a pair - the conversion of the internal representation from ziplist to dict happens as soon as either of them is exceeded. By default, every new hash starts out with the ziplist encoding.
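A minimal sketch of a session that lets you watch the conversion happen (the threshold of 2 and the key name myHash are chosen purely for illustration; replies are shown in the same raw protocol style as the sessions further below):

 config set hash-max-ziplist-entries 2
 +OK
 hset myHash f1 v1
 :1
 hset myHash f2 v2
 :1
 object encoding myHash
 $7
 ziplist
 hset myHash f3 v3
 :1
 object encoding myHash
 $9
 hashtable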

Dictionaries (dict) are the key structure of Redis - not only for hashes, but also for list, zset and set. This is the main structure Redis uses to store any key-value data. Redis dictionaries are a classic hash table that supports insert, replace, delete, search and getting a random entry. The entire implementation of dictionaries can be found in dict.h and dict.c in the Redis sources.
Any hash table has a number of parameters that are critical for understanding its efficiency and for choosing the right implementation. Besides the algorithmic complexity that matters for any data structure, there are factors that depend directly on the conditions of use. In this respect the fill factor and the resizing strategy are particularly interesting. The latter is especially important in any system operating in real time, since every available theoretical implementation forces a choice: spend a lot of RAM or a lot of CPU time. Redis uses so-called incremental resizing, although, depending on the specifics of the hash, there are also proposals to try linear hashing, monotonic keys, consistent hashing or resizing by copying.

The implementation of incremental resizing (the functions dictRehashMilliseconds and dictRehash in dict.c) comes down to the following:
  1. When resizing a table, we allocate a new hash table that is deliberately larger than the existing one (and do not touch the old one). As with strings, Redis will double the number of slots for the new table, and any requested size (including the initial one) is always rounded up to the nearest power of two: if you need 9 slots, 16 will be allocated (see the sketch after this list). The minimum size of the new table is determined by the compile-time constant DICT_HT_INITIAL_SIZE (4 by default, defined in dict.h).
  2. During each read / write operation we look in both tables.
  3. All insert operations are carried out only in a new table.
  4. For any operation, we move n elements from the old table to the new one.
  5. Delete the old table if all items are moved.
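A minimal sketch of that power-of-two rounding (next_power is the name used later in the formula; cf. _dictNextPower in dict.c):

 #include <stdio.h>

 #define DICT_HT_INITIAL_SIZE 4   /* minimum table size, as in dict.h */

 /* Round the requested size up to the nearest power of two,
    never going below DICT_HT_INITIAL_SIZE. */
 static unsigned long next_power(unsigned long size)
 {
     unsigned long i = DICT_HT_INITIAL_SIZE;
     while (i < size) i *= 2;
     return i;
 }

 int main(void)
 {
     printf("%lu\n", next_power(9));   /* prints 16 */
     printf("%lu\n", next_power(1));   /* prints 4  */
     return 0;
 }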

In the case of Redis, n is the number of keys the server managed to move within 1 millisecond, working in steps of 1000; in other words, one rehashing step is at least 1000 elements. This is important because with keys holding long string values such a step can take significantly more than 1 ms, causing the server to hang. An additional aspect is the large peak memory consumption during rehashing, which is especially acute for hashes with a large number of "long" keys. In the calculations below we will consider tables that are not being rehashed, remembering, however, that if our Redis node holds one large table, there is a high chance of a denial of service if we cannot resize it - for example, if you rent a budget Redis on Heroku (only 25 MB of memory). A simplified sketch of a single rehashing step follows.
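This sketch of one rehashing step is my condensed paraphrase of dictRehash from dict.c, using the dict/dictht/dictEntry structures shown a bit further below; error handling, the empty-bucket scan limit and the zfree allocator wrapper are omitted:

 #include <stdlib.h>
 #include "dict.h"   /* dict, dictht, dictEntry and the dictHashKey/dictIsRehashing macros */

 /* Move up to n slots from the old table ht[0] to the new table ht[1].
    Returns 1 while there is still work left, 0 when rehashing is finished. */
 int dict_rehash_sketch(dict *d, int n)
 {
     if (!dictIsRehashing(d)) return 0;
     while (n-- && d->ht[0].used != 0) {
         /* find the next non-empty slot in the old table */
         while (d->ht[0].table[d->rehashidx] == NULL) d->rehashidx++;
         dictEntry *de = d->ht[0].table[d->rehashidx];
         while (de) {                         /* move the whole chain of entries */
             dictEntry *next = de->next;
             unsigned long idx = dictHashKey(d, de->key) & d->ht[1].sizemask;
             de->next = d->ht[1].table[idx];  /* prepend into the new table's slot */
             d->ht[1].table[idx] = de;
             d->ht[0].used--;
             d->ht[1].used++;
             de = next;
         }
         d->ht[0].table[d->rehashidx] = NULL;
         d->rehashidx++;
     }
     if (d->ht[0].used == 0) {                /* step 5: all items moved, drop the old table */
         free(d->ht[0].table);                /* the real code uses zfree() */
         d->ht[0] = d->ht[1];
         d->rehashidx = -1;
         return 0;
     }
     return 1;
 }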

The fill factor (the constant dict_force_resize_ratio, 5 by default) determines how densely packed the dictionary may get before Redis starts doubling the size of the current dictionary. The current value of the fill factor is defined as the ratio of the total number of elements in the hash table to the number of slots. In other words, if we have only 4 slots and 24 elements (distributed among them), Redis will decide to double the size (24 / 4 > 5). A similar check happens every time you access any dictionary key: if the number of elements becomes equal to or exceeds the number of slots, Redis will also try to double the number of slots.
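A condensed paraphrase of that check, as I remember it from _dictExpandIfNeeded in dict.c (in the real code dict_can_resize and dict_force_resize_ratio are file-scope globals of dict.c; the former is temporarily cleared while, for example, a background save is running):

 #include "dict.h"   /* dict, dictExpand, DICT_OK, DICT_HT_INITIAL_SIZE, dictIsRehashing */

 static int dict_can_resize = 1;                  /* cleared during BGSAVE / AOF rewrite */
 static unsigned int dict_force_resize_ratio = 5; /* the fill factor threshold */

 /* Expand the table if needed, before inserting a new entry (sketch). */
 static int expand_if_needed(dict *d)
 {
     if (dictIsRehashing(d)) return DICT_OK;      /* already rehashing: nothing to do */
     if (d->ht[0].size == 0)                      /* empty table: start with 4 slots */
         return dictExpand(d, DICT_HT_INITIAL_SIZE);
     if (d->ht[0].used >= d->ht[0].size &&        /* elements >= slots ...            */
         (dict_can_resize ||
          d->ht[0].used / d->ht[0].size > dict_force_resize_ratio)) /* ... or ratio > 5 */
     {
         return dictExpand(d, d->ht[0].used * 2); /* ask for double the element count */
     }
     return DICT_OK;
 }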


Let's look at these structures in the source code:
 typedef struct dict {
     dictType *type;
     void *privdata;
     dictht ht[2];
     long rehashidx; /* rehashing not in progress if rehashidx == -1 */
     int iterators; /* number of iterators currently running */
 } dict;

 typedef struct dictht {
     dictEntry **table;
     unsigned long size;
     unsigned long sizemask;
     unsigned long used;
 } dictht;

 typedef struct dictEntry {
     void *key;
     union {
         void *val;
         uint64_t u64;
         int64_t s64;
         double d;
     } v;
     struct dictEntry *next;
 } dictEntry;


Each dictionary is built around three structures: dict, dictht and dictEntry. dict is the root structure; its ht field holds one or two (during rehashing) dictht tables with the list of slots. In turn, each slot of a dictht is a linked list of dictEntry. Each dictEntry stores a pointer to the key and the value (as a pointer or as a double / uint64_t / int64_t). If you use a number as the value, dictEntry stores the number itself; if you store a string, it stores a pointer to a redisObject with an sds string on board.

When Redis accesses a specific key within the hash table:
  1. Using the hashing function, the required slot is found (MurmurHash2 is currently used to pick the slot).
  2. All entries in the slot (a linked list) are walked one by one, comparing the keys.
  3. The requested operation is performed on the found dictEntry (with its key and value) - a simplified sketch of this path is shown below.
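A simplified sketch of this lookup (the function name dict_find_sketch is mine; cf. dictFind in dict.c, which additionally looks into ht[1] while a rehash is in progress):

 #include "dict.h"   /* dict, dictEntry and the dictHashKey/dictCompareKeys macros */

 /* Find the entry for a key, searching the first table only (sketch). */
 dictEntry *dict_find_sketch(dict *d, const void *key)
 {
     unsigned int h = dictHashKey(d, key);                  /* 1. hash the key to pick a slot */
     dictEntry *he = d->ht[0].table[h & d->ht[0].sizemask];
     while (he) {                                           /* 2. walk the slot's linked list */
         if (dictCompareKeys(d, key, he->key))
             return he;                                     /* 3. operate on the found entry */
         he = he->next;
     }
     return NULL;                                           /* key not present */
 }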

The help for HSET or HGET says that these operations are O(1). Take this with a grain of salt: much of the time your hset will cost O(n), and on large tables (more than 1000 elements) with active writes, hget will also cost O(n).
The answer to the question of how much memory will actually be used depends on the operating system, the compiler, the bitness of your process and the allocator used (jemalloc by default in Redis). All further calculations are given for Redis 3.0.5 compiled on a 64-bit server running CentOS 7.

Now we can calculate how much memory is spent when you call `hset mySet foo bar`. Putting together the overhead of creating an empty dictionary - the dict, dictht and dictEntry structures described above - we get a formula for the approximate overhead of storing n elements:
  56 + 2 * size_of(pointer)  +  3 * next_power(n) * size_of(pointer)
  -------------------------     -------------------------------------
        dict + dictht                         dictEntry


This formula does not include the cost of storing the key itself and the value. The key is always an sds string inside a redisObject; the value, depending on its type, can be an sds string or a native number (integer or floating point). Let's check this for `hset mySet foo bar`, remembering that the minimum number of slots (dictEntry) in a new dictionary equals DICT_HT_INITIAL_SIZE (4):
56 + 2 * 8 + 3 * 4 * 8 = 168 bytes for the dictionary itself, plus 2 * 56 = 112 bytes for the field and the value ("foo" and "bar" as redisObjects with sds strings), giving 280 bytes in total (which will be aligned up to 296 bytes).

Checking:
 config set hash-max-ziplist-entries 0
 +OK
 config set hash-max-ziplist-value 1
 +OK
 hset mySet foo bar
 :1
 debug object mySet
 +Value at:0x7f44894d6340 refcount:1 encoding:hashtable serializedlength:9 lru:5614464 lru_seconds_idle:6

What does info memory show? It reports 320 bytes spent - 24 bytes more than we expected. This extra memory is the cost of alignment during allocation.

How is the mySet key itself saved, and how does Redis find our hash by that name? At the very heart of Redis lies the redisDb structure (the representation of a Redis database, in redis.h). It contains a single dict dictionary for all keys - exactly the same structure as described above. This is important because it gives an idea of the cost of storing the keys of the database, and it gives us a basis for understanding the advice not to store a lot of simple keys in a single instance.
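A heavily abridged view of redisDb as I recall it for Redis 3.0 (most fields are omitted here; see redis.h in the sources for the full definition):

 typedef struct redisDb {
     dict *dict;      /* the keyspace: one dictEntry per key of this database */
     dict *expires;   /* keys with a TTL and their expiration times */
     /* ... other fields omitted ... */
     int id;          /* database number (0..15 by default) */
 } redisDb;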

Let's see what the article about storing hundreds of millions of simple keys says: if you need to store many key-value pairs, do not use string keys - use hash tables.
We will rewrite the test from the Instagram article (originally published as a gist) in Lua, so that nothing other than a running Redis is needed to reproduce it:
 flushall
 +OK
 info memory
 $225
 # Memory
 used_memory:508408
 eval "for i=0,1000000,1 do redis.call('set', i, i) end" 0
 $-1
 info memory
 $226
 # Memory
 used_memory:88578952

It took us a little more than 88 MB to store 1,000,000 integer keys. Now let's store the same data in hashes, evenly distributing the keys across 2,000 hash tables:
 flushall
 +OK
 info memory
 $227
 # Memory
 used_memory:518496
 eval "for i=0,1000000,1 do local bucket=math.floor(i/500); redis.call('hset', bucket, i, i) end" 0
 $-1
 info memory
 $229
 # Memory
 used_memory:104407616

To store the same data with integer fields, we spent a little more than 103 MB. However, if we try to store, say, 10,000,000 keys, the situation changes dramatically: roughly 970 MB versus roughly 165 MB (and almost all of the difference is the overhead of the system hash table of keys). In general, the rule "if you have hundreds of millions of keys, do not use string keys" holds.

Let's look at another aspect. Articles describing optimizations of this kind very often assume that you have many keys of the form entity:identifier (for example, `set username:4566 RedisMan`) and suggest switching to hash tables of the form bucket:id -> id -> value (for example, `hset usernames:300 4566 RedisMan`). It is important to understand that a partial substitution of concepts takes place here: the sds string `username:4566` is transformed into the hash field 4566 with the encoding REDIS_ENCODING_INT. In other words, instead of an sds string, the redisObject now holds a number, so the minimum of 32 bytes for the sds string (after jemalloc alignment) has turned into 4 bytes.

Let's force the hash table encoding to the ziplist:
 config set hash-max-ziplist-entries 1000
 +OK
 config set hash-max-ziplist-value 1000000
 +OK
 flushall
 +OK
 eval "for i=0,1000000,1 do local b=math.floor(i/500); redis.call('hset', 'usernames:' ..b, i, i) end" 0
 $-1
 info memory
 $228
 # Memory
 used_memory:16816432

Storing our data took only 16 MB - a saving of 72 MB (more than 5 times less) in my example with 1,000,000 keys. Tempting? We will get to the explanation in the second part of the article.

As an intermediate conclusion, it is worth formulating several important takeaways for a loaded system that needs to save memory:


Source: https://habr.com/ru/post/271205/

