📜 ⬆️ ⬇️

Lock-free data structures. Outside: an introduction to libcds


In this article, I give a brief overview of how to use the lock-free library of data structures libcds. I will not go into the implementation here, it’s just a view from the outside , a view from the library user.

The libcds library has its own point of view on many well-known data structures. This is partly due to the target area — the lock-free data structures are quite minimalist in the set of methods provided — partly in the desire to go beyond the limitations and solutions of the standard STL library. What came out of this is up to the libcds users.

Who cares - welcome under the cat

Libcds is a C ++ library of lock-free data structures and safe memory reclamation (SMR) methods for lock-free structures. It is almost header-only — all data structures are defined in the .h files of the headers and only the implementation of the core of the SMR algorithms is rendered into a small dynamic library.
')

SMR Overview


Perhaps the most subtle point in the development of a lock-free data structure is to determine at what point in time it is safe to physically remove the container element. Deleting an item is always a two-phase operation: first, we exclude the item from the container, then delete (deallocate) the memory allocated for the item. In the lock-free approach, the second phase - physical deletion - requires that there is complete certainty that no one (none of the parallel streams) has a link, global or local, to the item to be deleted.
SMR algorithms that can be considered as specialized garbage collectors (GC) are responsible for safe removal. In this article I will not dwell on the internal details of the implementation of a particular SMR algorithm, I will give only a general description, sufficient to start working with libcds. I hope to discuss in detail all the algorithms in future articles.

The following SMR algorithms are implemented in libcds:


All SMR algorithms require a single object representing the GC, that is, the GC is a singleton. In the dynamic library - the core of libcds - is just the implementation of the basic SMR algorithms.

As a rule, your application should use only one SMR algorithm (although libcds allows for using several different GC algorithms - all tests are built this way, but you cannot use two objects of the same GC). Therefore, the first thing to do is to select the SMR algorithm. Most data structure classes in libcds have GC, the memory management algorithm, as their first template argument. Moreover, for each GC class there is its own specialization of the template class (template) of the container class, so for different GCs, as a rule, you should include (include) different container class header files.

To use SMR algorithms, there is no need to know the internal structure of the GC, - all the mechanics of interaction with GC is hidden inside the container classes. But it is required to initialize the GC singleton object at the beginning of the program, as well as at the beginning of each thread (thread), which uses containers that depend on the GC.
Containers without GC
In libcds there are some lock-free data structures without GC:
  • lock-based containers using sophisticated (fine-grained) synchronization algorithms
  • limited (bounded) in maximum size containers
  • set and map that do not support append-only operations. As a rule, each set / map template class has a separate specialization for “GC” cds::gc::nogc . The class cds::gc::nogc is actually a mark of the fact that the container does not support removal.



Libcds initialization


The initialization of the library is to declare a GC object. This object should be the only one in your program. A GC object is simply an object-oriented superstructure over the internal implementation of the SMR algorithm. As a rule, your program will not need to refer to the methods of the GC object, - all interaction with the GC is hidden inside the lock-free data structure, so the GC object can be local.
In libcds, all different types of GC - cds::gc::HP , cds::gc::PTB cds::urcu::gc< cds::urcu::general_buffered<> > , cds::urcu::gc< cds::urcu::general_buffered<> > , etc., are initialized uniformly, but the constructors may take different arguments that set the parameters of a particular GC. All arguments to GC constructors have default values; In this introductory article, I will call constructors with no arguments, relying on default values.

The initialization code for the Hazard Pointer memory management algorithm is as follows:
 #include <cds/init.h> //cds::Initialize  cds::Terminate #include <cds/gc/hp.h> //cds::gc::HP (Hazard Pointer) int main(int argc, char** argv) { //  libcds cds::Initialize() ; { //  Hazard Pointer  cds::gc::HP hpGC ; //  main thread  lock-free  // main thread    //   libcds cds::threading::Manager::attachThread() ; // , libcds    //     ... } //  libcds cds::Terminate() ; } 

The cds::Initialize() function initializes some global internal libcds data and defines the topology of the system. This is followed by the creation of a GC object, in this example, Hazard Pointer, class cds::gc::HP , the object is created with arguments by default.
At the end, main should call the terminator function of the cds::Terminate library, which will release (free) the internal structures.
Can I get rid of the cds :: Initialize () and cds :: Terminate () calls?
It is cds::Initialize() get rid of cds::Initialize() and cds::Terminate (for example, by defining them in the DllMain dll of the Windows library or by declaring static internal functions with the constructor / destructor attribute for GCC), since these functions are half inline and generally depend on macros that can be specified on the compiler command line.


Initializing threads


Many SMR algorithms, as well as lock-free data structures, require support from threads. For example, some data is local to the thread (per-thread data), so it must be in thread local storage (TLS). To ensure this, some implementations dictate their flow control policy or their class hierarchy for threads. This, of course, is convenient for the library developer, but it is completely inconvenient for the user. Therefore, when designing libcds, I went another way: the library does not depend on the model (or class hierarchy) of streams you selected, but must be explicitly attached (attach) to the stream.

Each thread working with a container from libcds should be initialized as follows:
 // cds::threading::Manager #include <cds/threading/model.h> int myThreadEntryPoint(void *) { //     libcds cds::threading::Manager::attachThread() ; //        // lock-free  libcds ... //    libcds cds::threading::Manager::detachThread() ; return 0; } 

The cds::threading::Manager class is an adapter class for the internal thread management infrastructure. You may need it only to connect / disconnect the stream to the libcds infrastructure.
The above thread initialization code is not exceptionally safe. A better approach is to use an object whose constructor connects the stream to the libcds infrastructure, and the destructor disables it. For this, each GC class has an internal thread_gc class. With it, the thread initialization looks like this:
 #include <cds/gc/hp.h> int myThreadEntryPoint(void *) { //     libcds cds::gc::HP::thread_gc threadGC ; //        // lock-free  libcds ... return 0; } 

What if the thread was not created by us?
There are situations when it is required to connect a stream that was not created by us to libcds, and we cannot influence the initialization of this stream in any way. For example, our application is a web server plugin. The web server creates a thread pool (thread pool) and calls handlers implemented in plugins in the context of these threads. In this case, we can neither create a GC-object, nor initialize a thread by declaring an object of type GC::thread_gc . What to do?

We have to use the internal libcds initialization mechanisms: create a library initialization class and declare a static object of this class:
 #include <cds/gc/hp.h> class LibcdsInit { public: LibcdsInit() { //  libcds cds::Initialize(); //  Hazard Pointer singleton cds::gc::hzp::GarbageCollector::Construct() ; } ~LibcdsInit() { //  Hazard Pointer singleton //  true –   //  Hazard Pointer   cds::gc::hzp::GarbageCollector::Destruct(true) ; //  libcds cds::Terminate(); }; }; //    static LibcdsInit libcdsInitializator ; 

Instead of the initializer class LibcdsInit corresponding calls can be placed in DllMain (for Windows) or in constructors / destructors ( __attribute__((constructor)) for GCC).

The initialization of a thread is to connect it “forever” with a call to cds::threading::Manager::attachThread() :
 // cds::threading::Manager #include <cds/threading/model.h> void callback() { Cds::threading::Manager::attachThread(); //      lock-free //  libcds } 

We do not call thread disconnection.



Containers


After initializing the library, you can use the lock-free data structure. In libcds containers are divided into two large classes - ordinary and unusual intrusive.
Regular ones are STL-style containers, they store a copy of the data. All such containers are declared in the cds::container namespace.
Intrusive containers store the data itself, not copies. The idea is taken from boost :: intrusive . The advantage of intrusive containers is that you do not need to call the allocator to create a copy of the data. It is believed, and not without reason, that the use of a standard allocator negates the idea of ​​lock-free, since the allocator itself contains synchronization primitives within itself. In general, this is correct, but modern advanced allocators can optimize memory allocation, eliminating internal synchronization due to, for example, the presence of a memory pool for each stream.
I will not dwell on intrusive containers in this article - they require special techniques to free up memory when removing an item from a container, this is a topic for another conversation. Here I will just note that most classes of conventional containers are based on intrusive ones, that is, as a rule, all the interesting lock-free code is in the class of the intrusive container. Intrusive containers are declared in the cds::intrusive .

All data structures in libcds are template classes with a uniform template interface. There are two types of template-interfaces:

Variadic template based ads are used for simple containers - stacks and queues. The general pattern is:
 template <typename GC, typename T, typename… Options> class queue { … }; 

For compilers that do not support the variadic template, emulation is used.
Inside the container class, Options packaged in a Traits structure.

The Variadic template approach was not very convenient for complex inheritance, therefore for classes of more complex containers such as set , map , etc., traits are used:
 template <typename GC, typename T, typename Traits> class set { … }; 

For each such class, there is a make_traits that transforms the variadic template into traits:
 template <typename… Options> struct make_traits { typedef typename pack<Options…>::type type ; }; 

Using make_traits declaring a container object takes the form:
 struct Foo { … }; cds::container::SkipListSet< cds::gc::HP, Foo, typename cds::container::skip_list::make_traits< Opt1, Opt2, Opt3 >::type > theSkipListSet ; 

Such an ad, of course, looks cool, but it can create problems when debugging, - the class name can take several dozen lines. Traits ads are more compact for debugging:
 struct Foo { … }; struct skipListSetTraits: public cds::container::skip_list::make_traits< Opt1, Opt2, Opt3 >::type {}; cds::container::SkipListSet< cds::gc::HP, Foo, skipListSetTraits > theSkipListSet ; 


The template arguments for the data structure class are:

On the options should stay separately.


Options


Each container class, in addition to the data type and the SMR algorithm, can have a varied number of additional settings defining a particular behavior. For example, for ordered containers (map, set) it is necessary to specify the binary predicate of comparison of the std::less<T> elements. Or set one or another action strategy when detecting parallel work (back-off strategy). Or install the allocator used to create container elements.
Each container class has several (up to a dozen) such, as a rule, optional settings, which I call options. The approach used in STL, when each setting is a template template's template argument, is not very pleasant to me: such position-dependent arguments with default values ​​complicate the use of the class. If the class has 10 template arguments with default values, in order to change the eighth argument, we need to specify the default values ​​of all seven arguments preceding the eighth one. It is very uncomfortable. Therefore, in libcds I use a position-independent approach.
With this approach, each option has its own type; For example, the “allocator” option has the following declaration:
 template <typename Alloc> struct allocator { template <typename Base> struct pack: public Base { typedef Alloc allocator ; }; }; 

Here, the template argument Alloc is a specific class of allocator, and the allocator structure itself is the name of the option. Here is the task of the allocator through the option:
 cds::opt::allocator< std::allocator<int> > 

Using such option declarations, it is quite easy to build a meta function that collects the structure of traits from options. This process is called packing .
There are two types of options in libcds:

In the description of each class of the container is a list of valid options.

Comparison predicates


In many containers of the type map, set it is necessary to specify the predicate of comparison of elements. In STL, predicates std::less<T> are used for this, which are able to compare only elements of type T But very often it is inconvenient and costly. For example, the type std::string can be compared with the value char const * , and for such a comparison there is no need to create a temporary object std::string . For set-containers, the type T stored in the container very often has the key fields needed for comparison, and the “load” is the actual data that is not involved in the search.
Considering all this, libcds uses extended predicates that can compare values ​​of different types. Predicates are given by options. Moreover, as a comparison predicate, you can use extended less (option cds::opt::less< typename Less > ) or extended compare (option cds::opt::compare< typename Comparator > ). Compare returns an integer value:

The std::string::compare function method also works.
Writing extended predicates ( Less for the cds::opt::less< typename Less > option cds::opt::less< typename Less > or Comparator for the cds::opt::compare< typename Comparator > option) is your conscience. Here's what the Comparator might look like for string comparisons:
 struct string_cmp { int operator()(std::string const& s1, std::string const& s2) const { return s1.compare( s2 ); } int operator()(std::string const& s1, char const* s2) const { return s1.compare( s2 ); } int operator()(char const* s1, std::string const& s2) const { return –s2.compare( s1 ); } }; 


Allokatory


Containers from the cds::container namespace store copies of data, so one of the options for them is cds::opt::allocator - the allocator for allocating memory for data. Some containers have two allocator options — one for distributing data ( cds::opt::node_allocator ) and another ( cds::opt::allocator ) for allocating memory for the supporting structures of the container. By default, the std::allocator<int> used as the std::allocator<int> . If you want to use your allocator, its interface should be the same as that of std::allocator .

The library relies heavily on two properties of allocators:


Brief specifications


Supported compilers:


Supported operating systems and processor architectures:

Support for other operating systems or processor architectures is possible, but not trivial, especially if the target compiler does not support C ++ 11 atomic .


Conclusion


I hope after reading this article it will be clear how to use the libcds library. In addition, I tried to explain the basic principles, guided by which I implement in the library data structures.

I absolutely do not touch on performance issues. As I mentioned earlier, you can only talk about performance in relation to a specific task and a specific hardware. Modern hardware is very different about lock-free algorithms: one favors them, the other is necessary to persuade for a long time, choosing one or another container class options to achieve acceptable performance. Lock-free data structures are quite complex compared to their normal counterparts, so the “who is faster” factor starts playing: either a synchronization primitive + a regular container, or a complex lock-free container without external synchronization.

If you want good advice, here it is: try not to use shared data at all . I did not discover America with such advice, but it can be difficult to follow it in everyday life.

Links
[Des09] M.Desnoyers Low-Impact Operating System Tracing PhD Thesis,Chapter 6 «User-Level Implementations of Read-Copy Update»

[Des11] M.Desnoyers, P.McKenney, A.Stern, M.Dagenias, J.Walpole User-Level Implementations of Read-Copy Update

[Gid05] Anders Gidenstam, Marina Papatriantafilou and Philippas Tsigas Practical and Efficient Lock-Free Garbage Collection Based on Reference Counting

[Gid06] A.Gidenstam Algorithms for synchronization and consistency in concurrent system services , Chapter 5 «Lock-Free Memory Reclamation» Thesis for the degree of Doctor of Philosophy

[Her02] M. Herlihy, V. Luchangco, and M. Moir. The repeat offender problem: A mechanism for supporting dynamic-sized lockfree data structures Technical Report TR-2002-112, Sun Microsystems Laboratories, 2002

[Her05] M. Herlihy, V. Luchangco, P. Martin, and M. Moir. Nonblocking Memory Management Support for Dynamic_Sized Data Structures ACM Transactions on Computer Systems, Vol.23, No.2, May 2005

[Mic02] Maged M.Michael Safe memory reclamation for dynamic lock-free objects using atomic reads and writes

[Mic03] Maged M.Michael Hazard Pointers: Safe memory reclamation for lock-free objects

[Mic04] Andrei Alexandrescy, Maged Michael Lock-free Data Structures with Hazard Pointers


Source: https://habr.com/ru/post/196834/


All Articles