Case studies and testing of a stream-safe pointer and contention-free shared-mutex
In this article, we show: additional optimizations, examples of use and testing of a thread-safe pointer developed by us with an optimized shared mutex.
Contfree_safe_ptr <T> is equivalent to
safe_ptr <T, contention_free_shared_mutex <>>At the end we will show comparative graphs of tests of our thread-safe pointer and one of the best lock-free algorithms from libCDS on Intel Core i5 / i7, Xeon, 2 x Xeon processors.
Three related articles:
- Making any object thread safe
- Accelerate std :: shared_mutex 10 times
- Thread safe std :: map with lock-free map performance
→
My article in English→
Examples and tests from all three articlesYou can find the libCDS library with which we will compare our solution at the link:
github.com/khizmax/libcdsAll tests in this article use this commit from libCDS:
github.com/khizmax/libcds/tree/5e2a9854b0a8625183818eb8ad91e5511fed5898')
Different granularity of locks
First, we will show how to optimally use shared-mutex, using the example of working with a table (an array of structures). Referring to the experience of industrial databases. For example, in the MSSQL DBMS, locks of different granularity are used — a lock: one or several rows, a page, a extent, one section of a table, an index, the entire table, the entire database.
https://technet.microsoft.com/en-us/library/ms189849 (v=sql.105).aspxIndeed, if we work with one row for a long time and it is important for us that at this time the row is not changed by another thread, then there is no need to block the entire table all this time - it is enough to block only 1 row.
- Lock the entire table with shared locking (shared)
- We are looking for the desired line or several lines.
- Then we block the found string.
- Unlock the table
- And working with a locked string
Then other threads will be able to work in parallel with the other lines.
So far, we have only used table level locking, i.e. blocked one or more tables.
Or all the tables used in the expression were automatically locked until it was fully completed.
(*safe_map_1)["apple"].first = "fruit";
In other cases, we manually locked one or more tables using the RAII objects of the lock until the end of the
scope of these locks (until they are destroyed):
{ std::lock_guard<decltype(safe_map_1)> lock(safe_map_1);
Let's look at an example in which we randomly select an index to insert, then randomly one of four operations (insert, delete, update, read) and perform it on a thread-safe object of type
contfree_safe_ptr <std :: map> .
Example: [37]
coliru.stacked-crooked.com/a/5b68b65bf2c5abebIn this case, we will impose the following locks on the table:
- Insert - eXclusive lock
- Delete - eXclusive lock
- Update - eXclusive lock
- Read - Shared lock
For Update or Read operations, we do:
- Lock the entire table (xlock for Update, slock for Read)
- We search for the necessary line, we read or we change it
- Unlock the table
One iteration code of our example is 1:
int const rnd_index = index_distribution(generator);
Now we will make it so that during the Update operation the table is blocked by the read lock (shared), instead of the change lock (exclusive). This will greatly speed up Update operations when using our “write contention free shared mutex” that we developed earlier.
In this case, multiple threads can simultaneously perform Update and Read operations on a single table. For example, one thread reads one line, and another thread changes another line. But if one thread tries to change the same line that another thread is reading, then in order to avoid Data-races we have to block the line itself when reading it and changing it.
Example: [38]
coliru.stacked-crooked.com/a/89f5ebd2d96296d3Now for Update or Read operations we do:
- Lock the entire table with shared locking (shared)
- We are looking for the desired line or several lines.
- Then we block the found string (xlock for Update, slock for Read)
- And we work with a locked row (X / S-lock) and a locked table (S-lock)
- Unlock the string
- Unlock the table
Diff - what we changed:

One iteration code for our example is 2:
int const rnd_index = index_distribution(generator);
Here, for thread-safe string handling, we used safe_obj. Inside safe_obj <T> there is an object of type T, and not a pointer to it, as in safe_ptr <T>. Therefore, when using safe_obj, you do not need to separately allocate memory for the object itself and change the atomic reference counter, as is required in safe_ptr. Therefore, the Insert and Delete operations are performed much faster with safe_obj than with safe_ptr.
It should be noted that when copying safe_obj, not the pointer to the object is copied, but the object itself is copied, while previously blocking the initial and final safe_obj.
Note: Strictly speaking, we do not block the entire line, but all fields of the line, except for the index of the line we are looking for. Therefore, we called our object field, not row. And also, to emphasize that in this way we can block not only one line, but even individual fields in one line, if we place them in separate safe_obj objects. This would allow different threads to block different fields and work with them in parallel.
Now we use the following locks for different operations:

This example is very fast for a large number of short time operations. But we still hold the read lock (shared) on the table in the process of changing or reading a row (field). And if we have rare but very long operations on the rows of the table, then all of this will hold the lock on the entire table for a long time.
However, if according to the logic of your task it doesn’t matter that a row can be deleted by one thread, while another thread is reading or modifying the same row, then we only need to block the table for the duration of the search row. And to avoid accessing the freed memory when another thread has deleted a string, we need to use std :: shared_ptr <T> - a pointer with an atomic thread-safe reference counter. In this case, the memory will be released only when no thread has pointers to this string.
Instead of safe_obj, we will use safe_ptr to protect the string. This will allow us to copy the pointer to the string and use the thread-safe reference counter in std :: shared_ptr <T> which is contained in safe_ptr.
Example: [39]
coliru.stacked-crooked.com/a/f2a051abfbfd2716Now for Update or Read operations we do:
- Lock the entire table with shared locking (shared)
- We are looking for the desired line or several lines.
- Then we block the found string (xlock for Update, slock for Read)
- Unlock the table
- And working with a locked string (X / S-lock) as long as it takes.
- Unlock the string
Diff - what we changed:

Example-3:
int const rnd_index = index_distribution(generator);
Well-designed multi-threaded programs use short references to shared objects, so in the future we will use not the last but the penultimate example for short read operations.
Disadvantages of Execute Around Idiom
Let's look at possible problems and criticize our code. In the previous chapter, we looked at a fairly convenient and high-performance example, explicitly setting the type of blocking for Update and Read operations, using functions:
- slock_safe_ptr () - read only
- xlock_safe_ptr () - for reading and editing
Here, the lock is held until the end of the life of the
lock_obj object returned by these functions:
auto lock_obj = slock_safe_ptr (sf_p);However, for the Insert and Delete operations, implicit locks were used, i.e. our safe_ptr <std :: map> object was blocked automatically using the Execute Around Pointer idiom, and unlocked immediately after the end of the Insert or Delete operation.
Example: [40]
coliru.stacked-crooked.com/a/89f5ebd2d96296d3But you can forget to use explicit locks on Update and Read operations. In this case, safe_ptr <std :: map> will be unlocked immediately after the search operation ends, and then you continue to use:
- found iterator that may be invalidated by another thread
- or found item that can be deleted by another thread
To partially solve this problem, instead of safe_ptr <> and safe_obj <>, you can use safe_hide_ptr <> and safe_hide_obj <> - they do not use Execute Around Pointer and you can only access members of the class after explicit blocking:
safe_hide_obj<field_t, spinlock_t> field_hide_tmp;
Example: [41]
coliru.stacked-crooked.com/a/d65deacecfe2636bIf earlier you could be mistaken and write the following - the
erroneous code:
auto it = safe_map->find(rnd_index);
Now, such a call will not compile and will require explicit blocking of objects — the
correct code:
auto s_safe_map = slock_safe_ptr(safe_map);
However, you still have the danger of using locks as temporary objects —
not correctly :
auto it = slock_safe_ptr(safe_map)->find(rnd_index);
You have a choice:
- Use safe_ptr and safe_obj to be able to explicitly or automatically (Execute Around Idiom) block your object
- Or use safe_hide_ptr and safe_hide_obj leaving only the ability to explicitly block your object.
It is up to you what to choose:
- use convenient automatic locking (Execute Around Idiom)
- or slightly reduce the possibility of making a mistake by explicitly blocking
In addition, the following functions are planned to be
added to the standard C ++ 17 for
std :: map <> :
- insert_or_assign () - if there is an element then assign, if not, then insert
- try_emplace () - if there is no element, then create an element
- merge () - merge 2 maps to 1
- extract () - get the element, and remove it from the map
The introduction of such functions allows you to perform frequently used composite operations without using iterators - in this case, using Execute Around Idiom will always guarantee the flow-safety of these operations. In general, avoiding iterators for all containers (except for std :: array and std :: vector arrays) is a big help in building multithreaded programs. The less often you use iterators, the less likely you are to access an iterator that is invalid by this or another thread. But the very idea of ​​iterators does not contradict the idea of ​​multithreading, for example, the DBMS (Oracle, MSSQL) supports
cursors (analogs of iterators) and different isolation levels of translations (different consistency of multithreading).
But whatever you choose: use Execute Around Idiom, use explicit locks for
safe_hide_ptr , discard them and use standard std :: mutex locks ... or even write your own lock-free algorithms - you always have many opportunities to make a mistake.

We section the table - still we increase productivity
Let's return again to the experience of industrial relational DBMS. For example, in a DBMS, table partitioning may be used to improve performance. In this case, instead of the entire table, you can block only the used partition (partition):
https://technet.microsoft.com/en-us/library/ms187504(v=sql.105).aspxAlthough the DBMS does not normally lock the entire table for the Delete and Insert operations, and this is always true for Delete operations. But for Insert operations, it is possible to perform very fast data loading into the table, the necessary condition of which is the exclusive table blocking:
- MS SQL (dbcc traceon (610, -1)): INSERT INTO sales_hist WITH ( TABLOCKX )
- Oracle: INSERT / * + APPEND * / INTO sales_hist
https://docs.oracle.com/cd/E18283_01/server.112/e17120/tables004.htm#i1009887
Locking Considerations with Direct-Path INSERT
During the INSERT, the database of the partitioned table locks. If you’re not in use, you can’t make it. But it is supported, but it is supported by the request before the insert operation.
Since our task is to create the fastest multithreaded container, then we also blocked the container entirely on Insert / Delete operations. But now let's try blocking only part of our container.
Let's try to implement our own partitioned ordered partitioned_map associative array and see how the performance increases. We will block in it only the section currently required.
The meaning is:
std :: map <safe_ptr <std :: map <>>>Here the first
std :: map will be constant and will contain sections (sub-tables).
This will be a very simplified example where the number of sections is set in the constructor and does not change further.
Each section is a stream-safe associative array
safe_ptr <std :: map <>> .
For maximum performance, the number of sections and their ranges must be such that the probability of accessing each section is the same. If you have a key range of 0 - 1000000, and the read / update / insert / delete requests to the beginning of the range are greater than to the end of the range, then the sections with a small key value should be larger and their ranges smaller. For example, 3 sections: [0 - 100000], [100001 - 400000], [400001 - 1000000].
But in our examples we will assume that query keys have a uniform distribution.
Section ranges can be set in two ways:
- safe_map_partitioned_t <std :: string, int> safe_map {"a", "f", "k", "p", "u"};
safe_map_partitioned_t <int, int> (0, 100000, 10000);
// set the boundaries of values ​​0 - 100000 and step for each section 10000
If, when accessing the container, the key goes beyond the section boundaries, then the request will address the nearest section — The program will work correctly.
Example: [42]
coliru.stacked-crooked.com/a/fc148b08eb4b0580Also, for maximum performance, it is necessary to use the previously implemented "contention-free shared-mutex" inside
safe_ptr <> , i.e. the meaning is:
std :: map <contfree_safe_ptr <std :: map <>>>Take the code from the previous example and add the code contfree_safe_ptr from the previous chapter.
Replace: safe_map_partitioned_t <std :: string, std :: pair <std :: string, int >>
To: safe_map_partitioned_t <std :: string, std :: pair <std :: string, int>, contfree_safe_ptr>
Example: [43]
coliru.stacked-crooked.com/a/23a1f7a3982063a1This class safe_map_partitioned_t <> we did “Just for fun”, i.e. It is not recommended to use it in real programs. We just showed an example of how you can implement your own algorithms based on the contfree_safe_ptr <> pointer and contention_free_shared_mutex <> blocking.
How to use
First download the
safe_ptr.h file from the repository root:
github.com/AlexeyAB/object_threadsafeThen include this file in your cpp-file:
#include "safe_ptr.h"As an optimal use case, we will focus on Example 2 shown above - this is simple and highly productive:
struct field_t { int money, time; field_t(int m, int t) : money(m), time(t) {} field_t() : money(0), time(0) {} }; typedef safe_obj<field_t, spinlock_t> safe_obj_field_t;
Insert and delete - change the map : Because our slock_safe_ptr () shared lock is as fast as possible, then before the change (insert_op or delete_op) we find the element we need to delete or closest to the one we need to insert using slock_safe_ptr (), which will be unlocked immediately after the find () operation. , -L1 map. test_map->emplace() test_map->erase() exclusive-lock . Insert/delete — , , — shared-lock exclusive-lock ( deadlock — ).
Read and update — ( map) : (read_op) — map ( map). slock_safe_ptr() map, map. s_safe_map, xlock_safe_ptr() , . , {}, x_field , s_safe_map map.
test_map — — .
,
auto const& some_ptr = test_map; , ,
some_ptr->find(); (shared lock) ,
some_ptr->emplace(); . .
slock_safe_ptr(test_map) ,
slock_safe_ptr(test_map)->find(); , slock_safe_ptr(test_map)->emplace(); — . .
.
safe_ptr
. .
– (MOps), 0 – 90%. : insert, delete, update ( Update std::map, ). , 15% – : 5% insert, 5% delete, 5% update, 85% read. : g++ 4.9.2 (Linux Ubuntu) x86_64
Linux (GCC 4.9.2) Windows (MSVS2015):
github.com/AlexeyAB/object_threadsafe/tree/master/benchmarkClang++ 3.8.0 Linux Makefile.
16 Intel Xeon E5-2660 v3 2.6 GHz. :
safe<map,contfree>&rowlock .
CPU, :
numactl --localalloc --cpunodebind=0 ./benchmark 16CPU, :
./benchmark
Conclusion:
- , contfree- contfree_safe_ptr<map> , std::shared_mutex safe_ptr<map,std::shared_mutex>
- , 15% std::mutex , std::shared_mutex ( safe_ptr<map,std::mutex> , safe_ptr<map,std::shared_mutex>)
- , 30% std:map – 1 thread contfree_safe_ptr<map>&rowlock. . - , .
- safe_map_partitioned<,,contfree_safe_ptr>, , «Just-for-fun» — .
- 15% , shared-mutex ( contfree_safe_ptr<map> & rowlock) 8.37 Mops, 10 , std::shared_mutex ( safe_ptr<map, std::shared_mutex>), 0.74 Mops.
«contention-free shared-mutex» : 70 contention-free, exclusive-lock. , «exclusive-lock» std::mutex , std::shared_mutex 15% .
, .. .
main.cpp :
const bool measure_latency = true;:
numactl --localalloc --cpunodebind=0 ./benchmark 16
- , std::map & std::mutex safe_ptr<map,std::mutex>, .. safe_ptr<> - , .
- , 60% , contfree_safe_ptr<map>&rowlock , contfree_safe_ptr<map>. , contfree_safe_ptr<map>&rowlock .
contfree_safe_ptr<std::map> CDS-lib desktop-CPU
desktop-CPU, .

- , contfree_safe_ptr<map> , lock-free-map CDS-lib.
- «Just-for-fun» safe_map_partitioned<,,contfree_safe_ptr> 1.7 .
contfree_safe_ptr<std::map> CDS-lib server-CPU
CPU, .
Linux (GCC 4.9.2) Windows (MSVS2015):
github.com/AlexeyAB/object_threadsafe/tree/master/CDS_testClang++ 3.8.0 Linux Makefile.
Xeon , , :
numactl --localalloc --cpunodebind=0 ./benchmark 16CPU, :
./benchmark
- , 16 , , lock-free CDS-lib , contfree_safe_ptr<map>. Those. lock-free . 16 , 8 , contfree_safe_ptr<map> , lock-free .
- «Just-for-fun»- safe_map_partitioned<,,contfree_safe_ptr> , , .
– 50% .
main.cpp :
const bool measure_latency = true;:
numactl --localalloc --cpunodebind=0 ./benchmark 16
contfree_safe_ptr<map> lock-free 8 , 16 .
.
. .
for(volatile int i = 0; i < 9000; ++i); - . 100 000 . Intel Xeon E5-2686 v4 2.3 GHz 20.5 .
Linux (GCC 4.9.2) Windows (MSVS2015) :
github.com/AlexeyAB/object_threadsafe/tree/master/Real_app_benchClang++ 3.8.0 Linux Makefile.
2- :
2 x Intel Xeon E5-2686 v4 2.3 GHz (Broadwell) : 36
72 (Hyper Threading).
:
cd libcds
make
cd…
make
./benchmark

- std::mutex std::shared_mutex std::map lock-free-map 16 . std::mutex&std::map std::shared_mutex&std::map , 32 .
- - contfree_safe_ptr< std::map<> > lock-free-map libCDS 1 64. , 20 .
— main.cpp :
const bool measure_latency = true;Linux :
./benchmark
- , 64 std::mutex , std::shared_mutex.
- - contfree_safe_ptr<std::map<>> lock-free-map libCDS 1 64. , 20 .
- 64 30 , 20 — , 10 — . , 30% , contfree_safe_ptr<T> (MOPS -), lock-free libCDS.
lock-free wait-free (queue, stack …) libCDS , .
Conclusion:- , .. 8 , CDSlib: github.com/khizmax/libcds
- thread-safe CDSlib – .
- - , 30% , contfree_safe_ptr<> , map- CDSlib.
- , CDSlib, contfree_safe_ptr<T> . , lock-free .
C++ Execute Around Pointer Idiom. Contention-free shared-mutex. lock-free libCDS lock-based .
,
safe_ptr.h , :
github.com/AlexeyAB/object_threadsafeboost?