The world's easiest lock-free hash table

A non-blocking hash table is a two-sided medal. In some cases, they allow to achieve such performance, which is not obtained in other ways. On the other hand, they are quite complex.

The first working non-blocking table I heard about was written in Java by Dr. Cliff Click. His code was released in 2007, the same year the author made a presentation on Google. I admit, when I first watched this presentation, I did not understand most of it. The main thing that I have learned is that Dr. Cliff Click must be some kind of magician.

')
Fortunately, six years was enough for me to (almost) catch Cliff in this matter. As it turned out, one does not need to be a wizard in order to understand and implement the simplest, but at the same time ideally working, non-blocking hash table. Here I will share the source code of one of them. I am sure that anyone who has experience in multi-threading development in C ++ and who is willing to carefully study the previous posts of this blog will understand it without any problems.

The hash table was written using Mintomic , a portable library for non-blocking development in C / C ++, which I released last month . It is built and run out of the box on several x86 / 64, PowerPC and ARM platforms. And since each Mintomic function has an equivalent in C ++ 11, translating this table into C ++ 11 is a trivial task.

Restrictions

We, programmers, have the instinct to write data structures immediately as universal as possible, so that they are easy to reuse. This is not bad, but it can serve us a bad service, if from the very beginning we turn it into a goal. For this post, I went to the other extreme, creating a limited and highly specialized non-blocking hash table as I could. Here are its limitations:

Stores only 32-bit integer keys and 32-bit integer values.
All keys must be non-zero.
All values must be non-zero.
The table has a fixed maximum number of records that it can store, and this number must be a power of two.
There are only two operations: SetItem and GetItem.
No delete operation.

Be sure that when you master a limited version of this hash table, it will be possible to consistently remove all restrictions without changing the approach drastically.

An approach

There are many ways to implement hash tables. The approach I chose is just a simple modification of the ArrayOfItems class, described in my previous post, A Lock-Free ... Linear Search? I strongly recommend that you read it before continuing.

Similar to ArrayOfItems , this hash table class, which I called HashTable1 , is implemented using a simple giant array of key / value pairs.

 struct Entry { mint_atomic32_t key; mint_atomic32_t value; }; Entry *m_entries;

HashTable1 no linked lists for resolving hash collisions outside the table; there is only the array itself. The zero key in the array indicates an empty entry, and the array itself is initialized with zeros. And just like in ArrayOfItems , the values are added and arranged in HashTable1 using a linear search.

The only difference between ArrayOfItems and HashTable1 is that ArrayOfItems always starts a linear search with a zero index, while HashTable1 starts each linear search with an index calculated as a hash of a key . I chose MurmurHash3's integer finalizer as a hash function, since it is fast enough and well encodes integer data.

 inline static uint32_t integerHash(uint32_t h) { h ^= h >> 16; h *= 0x85ebca6b; h ^= h >> 13; h *= 0xc2b2ae35; h ^= h >> 16; return h; }

As a result, each time we call SetItem or GetItem with the same key, the linear search will start with the same index, but when we pass different keys, the search will, in most cases, begin with completely different indexes. Thus, the values are much better distributed over the array than in ArrayOfItems , and it is SetItem call SetItem and GetItem from several parallel threads.

HashTable1 uses a circular search, which means that when a SetItem or GetItem reaches the end of the array, it simply returns to zero index and continues the search. Since the array will never be filled, each search is guaranteed to end either by detecting the required key or by detecting a record with key 0, which means that the required key does not exist in the hash table. This technique is called open addressing with linear probing , and in my opinion, this is the most lock-free-friendly hash table of the existing ones. In fact, Dr. Click uses the same technique in his Java-free security table.

Code

Here is the function that implements SetItem . It passes through an array and stores the value in the first record, the key of which is 0 or coincides with the desired key. Its code is almost identical to the ArrayOfItems::SetItem code described in the previous post . The differences are only the integer hash and bitwise “and” applied to idx , which does not allow it to go beyond the bounds of the array.

 void HashTable1::SetItem(uint32_t key, uint32_t value) { for (uint32_t idx = integerHash(key);; idx++) { idx &= m_arraySize - 1; uint32_t prevKey = mint_compare_exchange_strong_32_relaxed(&m_entries[idx].key, 0, key); if ((prevKey == 0) || (prevKey == key)) { mint_store_32_relaxed(&m_entries[idx].value, value); return; } } }

The GetItem code also almost coincides with ArrayOfItems::GetItem except for minor modifications.

 uint32_t HashTable1::GetItem(uint32_t key) { for (uint32_t idx = integerHash(key);; idx++) { idx &= m_arraySize - 1; uint32_t probedKey = mint_load_32_relaxed(&m_entries[idx].key); if (probedKey == key) return mint_load_32_relaxed(&m_entries[idx].value); if (probedKey == 0) return 0; } }

Both of the above functions are thread-safe and ArrayOfItems for the same reasons as their counterparts in ArrayOfItems : all operations with array elements are performed using atomic library functions, values are assigned to keys using compare-and-swap (CAS) on keys, and all code is resilient against memory reordering. And again, for a better understanding, I advise you to refer to the previous post .

And finally, just like in the previous article, we optimize SetItem , first checking whether a CAS is really necessary, and if not, not applying it. Thanks to this optimization, an example of the application that you find below works almost 20% faster.

 void HashTable1::SetItem(uint32_t key, uint32_t value) { for (uint32_t idx = integerHash(key);; idx++) { idx &= m_arraySize - 1; // Load the key that was there. uint32_t probedKey = mint_load_32_relaxed(&m_entries[idx].key); if (probedKey != key) { // The entry was either free, or contains another key. if (probedKey != 0) continue; // Usually, it contains another key. Keep probing. // The entry was free. Now let's try to take it using a CAS. uint32_t prevKey = mint_compare_exchange_strong_32_relaxed(&m_entries[idx].key, 0, key); if ((prevKey != 0) && (prevKey != key)) continue; // Another thread just stole it from underneath us. // Either we just added the key, or another thread did. } // Store the value in this array entry. mint_store_32_relaxed(&m_entries[idx].value, value); return; } }

All is ready! Now you have the simplest non-blocking hash table in the world. Here are the links to the source code and the header file .

A short warning: as in ArrayOfItems , all operations with HashTable1 performed with weak (relaxed) memory ordering constraints. Therefore, if you want to make any data available to other threads by writing a flag in HashTable1 , it is necessary that this record has “ release semantics ”, which can be guaranteed by putting release fence (“release barrier”) immediately before instruction. Similarly, accessing GetItem in the stream that wants to receive data needs to acquire a acquire fence (“acquisition barrier”).

 // Shared variables char message[256]; HashTable1 collection; void PublishMessage() { // Write to shared memory non-atomically. strcpy(message, "I pity the fool!"); // Release fence: The only way to safely pass non-atomic data between threads using Mintomic. mint_thread_fence_release(); // Set a flag to indicate to other threads that the message is ready. collection.SetItem(SHARED_FLAG_KEY, 1) }

Sample application

To evaluate HashTable1 at work, I created another simplest application, very similar to the example from the previous post. Each time it chooses from two experiments:

Each of the two threads adds 6,000 values with unique keys.
Each stream adds 12,000 different values with the same keys.

The code is on GitHub, so you can build it and run it yourself. For assembly instructions, see README.md .

Since the hash table never overflows - say, less than 80% of the array will be used - HashTable1 will show remarkable performance. Perhaps, I should confirm this with measurements, but based on previous experience in measuring single - threaded hash-table performance, I can argue that you will not be able to create a faster, non-blocking hash table than HashTable1 . It may seem surprising, but the ArrayOfItems that underlies it shows awful performance. Of course, as with any hash table, there is a non-zero risk to hash a large number of keys into one array index, and then the performance will be equal to the speed of ArrayOfItems . But with a fairly large table and a good hash function, such as MurmurHash3, the likelihood of such a scenario is negligible.

I used non-blocking tables, similar to this, in real projects. In one case, in the game I was working on, the severe competition of threads for shared locks created a bottleneck every time memory monitoring was turned on. After switching to lock-free hash tables, the frame rate in the worst case improved from 4 FPS to 10 FPS.

Source: https://habr.com/ru/post/322496/

All Articles

The world's easiest lock-free hash table

Restrictions

An approach

Code

Sample application

More articles: