Lock-free data structures. Outside: an introduction to libcds

In this article, I give a brief overview of how to use the lock-free library of data structures libcds. I will not go into the implementation here, it’s just a view from the outside , a view from the library user.

The libcds library has its own point of view on many well-known data structures. This is partly due to the target area — the lock-free data structures are quite minimalist in the set of methods provided — partly in the desire to go beyond the limitations and solutions of the standard STL library. What came out of this is up to the libcds users.

Who cares - welcome under the cat

Libcds is a C ++ library of lock-free data structures and safe memory reclamation (SMR) methods for lock-free structures. It is almost header-only — all data structures are defined in the .h files of the headers and only the implementation of the core of the SMR algorithms is rendered into a small dynamic library.
')

SMR Overview

Perhaps the most subtle point in the development of a lock-free data structure is to determine at what point in time it is safe to physically remove the container element. Deleting an item is always a two-phase operation: first, we exclude the item from the container, then delete (deallocate) the memory allocated for the item. In the lock-free approach, the second phase - physical deletion - requires that there is complete certainty that no one (none of the parallel streams) has a link, global or local, to the item to be deleted.
SMR algorithms that can be considered as specialized garbage collectors (GC) are responsible for safe removal. In this article I will not dwell on the internal details of the implementation of a particular SMR algorithm, I will give only a general description, sufficient to start working with libcds. I hope to discuss in detail all the algorithms in future articles.

The following SMR algorithms are implemented in libcds:

Hazard Pointer is perhaps the first and most famous of the SMR algorithms. Invented by Michael (this surname) in 2002 [Mic02, Mic03, Mic04]. It is distinguished by relative simplicity and good speed; it is intended for the “protection” of local references to elements of the container. The disadvantage is that you specify the maximum number of concurrent threads. Header file <cds/gc/hp.h> , class cds::gc::HP
Pass-the-Buck is conceptually similar to the Hazard Pointer, but does not depend on the number of threads. Proposed in 2002 by Herlihy, V. Luchangco and M. Moir [Her02, Her05]. Due to the independence from the number of threads, it is a bit heavier than the Hazard Pointer. Header <cds/gc/ptb.h> , class cds::gc::PTB
Hazard Pointer with Reference Counting is a variant of the Hazard Pointer algorithm proposed by representatives of the Swedish school headed by Håkan Sundell - Gidenstam et al. [Gid05, Gid06]. Like the Hazard Pointer, it depends on the number of threads. A distinctive feature is that it can protect global references to container elements, that is, it is possible to implement iterators. Not recommended for use due to insufficient performance and rare glitches (I can not debug it to the end). Header file <cds/gc/hrc.h> , class cds::gc::HRC
RCU (Read-Copy Update) - unlike the above, this algorithm refers to synchronization algorithms, that is, suspends execution of streams when deleting elements, but allows parallel execution of other operations — insert, search — is therefore good for data structures with rare deletion, type map, set. It is proposed by Paul McKenney and is actively used in the Linux kernel; The original RCU can only be implemented in the OS kernel, but in 2009 Desnoyers proposed the RCU version for application (user space RCU, URCU) [Des09, Des11], which is implemented in libcds. The library offers five implementations of URCU, the most efficient is buffered URCU. Header file <cds/urcu/general_buffered.h> , class cds::urcu::gc< cds::urcu::general_buffered<> > . In libcds, URCU applies only to map and set.

All SMR algorithms require a single object representing the GC, that is, the GC is a singleton. In the dynamic library - the core of libcds - is just the implementation of the basic SMR algorithms.

As a rule, your application should use only one SMR algorithm (although libcds allows for using several different GC algorithms - all tests are built this way, but you cannot use two objects of the same GC). Therefore, the first thing to do is to select the SMR algorithm. Most data structure classes in libcds have GC, the memory management algorithm, as their first template argument. Moreover, for each GC class there is its own specialization of the template class (template) of the container class, so for different GCs, as a rule, you should include (include) different container class header files.

To use SMR algorithms, there is no need to know the internal structure of the GC, - all the mechanics of interaction with GC is hidden inside the container classes. But it is required to initialize the GC singleton object at the beginning of the program, as well as at the beginning of each thread (thread), which uses containers that depend on the GC.

Containers without GC

In libcds there are some lock-free data structures without GC:

lock-based containers using sophisticated (fine-grained) synchronization algorithms
limited (bounded) in maximum size containers
set and map that do not support append-only operations. As a rule, each set / map template class has a separate specialization for “GC” cds::gc::nogc . The class cds::gc::nogc is actually a mark of the fact that the container does not support removal.

Libcds initialization

The initialization of the library is to declare a GC object. This object should be the only one in your program. A GC object is simply an object-oriented superstructure over the internal implementation of the SMR algorithm. As a rule, your program will not need to refer to the methods of the GC object, - all interaction with the GC is hidden inside the lock-free data structure, so the GC object can be local.
In libcds, all different types of GC - cds::gc::HP , cds::gc::PTB cds::urcu::gc< cds::urcu::general_buffered<> > , cds::urcu::gc< cds::urcu::general_buffered<> > , etc., are initialized uniformly, but the constructors may take different arguments that set the parameters of a particular GC. All arguments to GC constructors have default values; In this introductory article, I will call constructors with no arguments, relying on default values.

The initialization code for the Hazard Pointer memory management algorithm is as follows:

 #include <cds/init.h> //cds::Initialize  cds::Terminate #include <cds/gc/hp.h> //cds::gc::HP (Hazard Pointer) int main(int argc, char** argv) { //  libcds cds::Initialize() ; { //  Hazard Pointer  cds::gc::HP hpGC ; //  main thread  lock-free  // main thread    //   libcds cds::threading::Manager::attachThread() ; // , libcds    //     ... } //  libcds cds::Terminate() ; }

The cds::Initialize() function initializes some global internal libcds data and defines the topology of the system. This is followed by the creation of a GC object, in this example, Hazard Pointer, class cds::gc::HP , the object is created with arguments by default.
At the end, main should call the terminator function of the cds::Terminate library, which will release (free) the internal structures.

Can I get rid of the cds :: Initialize () and cds :: Terminate () calls?

It is cds::Initialize() get rid of cds::Initialize() and cds::Terminate (for example, by defining them in the DllMain dll of the Windows library or by declaring static internal functions with the constructor / destructor attribute for GCC), since these functions are half inline and generally depend on macros that can be specified on the compiler command line.

Initializing threads

Many SMR algorithms, as well as lock-free data structures, require support from threads. For example, some data is local to the thread (per-thread data), so it must be in thread local storage (TLS). To ensure this, some implementations dictate their flow control policy or their class hierarchy for threads. This, of course, is convenient for the library developer, but it is completely inconvenient for the user. Therefore, when designing libcds, I went another way: the library does not depend on the model (or class hierarchy) of streams you selected, but must be explicitly attached (attach) to the stream.

Each thread working with a container from libcds should be initialized as follows:

 // cds::threading::Manager #include <cds/threading/model.h> int myThreadEntryPoint(void *) { //     libcds cds::threading::Manager::attachThread() ; //        // lock-free  libcds ... //    libcds cds::threading::Manager::detachThread() ; return 0; }

The cds::threading::Manager class is an adapter class for the internal thread management infrastructure. You may need it only to connect / disconnect the stream to the libcds infrastructure.
The above thread initialization code is not exceptionally safe. A better approach is to use an object whose constructor connects the stream to the libcds infrastructure, and the destructor disables it. For this, each GC class has an internal thread_gc class. With it, the thread initialization looks like this:

 #include <cds/gc/hp.h> int myThreadEntryPoint(void *) { //     libcds cds::gc::HP::thread_gc threadGC ; //        // lock-free  libcds ... return 0; }

What if the thread was not created by us?

There are situations when it is required to connect a stream that was not created by us to libcds, and we cannot influence the initialization of this stream in any way. For example, our application is a web server plugin. The web server creates a thread pool (thread pool) and calls handlers implemented in plugins in the context of these threads. In this case, we can neither create a GC-object, nor initialize a thread by declaring an object of type GC::thread_gc . What to do?

We have to use the internal libcds initialization mechanisms: create a library initialization class and declare a static object of this class:

 #include <cds/gc/hp.h> class LibcdsInit { public: LibcdsInit() { //  libcds cds::Initialize(); //  Hazard Pointer singleton cds::gc::hzp::GarbageCollector::Construct() ; } ~LibcdsInit() { //  Hazard Pointer singleton //  true –   //  Hazard Pointer   cds::gc::hzp::GarbageCollector::Destruct(true) ; //  libcds cds::Terminate(); }; }; //    static LibcdsInit libcdsInitializator ;

Instead of the initializer class LibcdsInit corresponding calls can be placed in DllMain (for Windows) or in constructors / destructors ( __attribute__((constructor)) for GCC).

The initialization of a thread is to connect it “forever” with a call to cds::threading::Manager::attachThread() :

 // cds::threading::Manager #include <cds/threading/model.h> void callback() { Cds::threading::Manager::attachThread(); //      lock-free //  libcds }

We do not call thread disconnection.

Containers

After initializing the library, you can use the lock-free data structure. In libcds containers are divided into two large classes - ordinary and ~~unusual~~ intrusive.
Regular ones are STL-style containers, they store a copy of the data. All such containers are declared in the cds::container namespace.
Intrusive containers store the data itself, not copies. The idea is taken from boost :: intrusive . The advantage of intrusive containers is that you do not need to call the allocator to create a copy of the data. It is believed, and not without reason, that the use of a standard allocator negates the idea of lock-free, since the allocator itself contains synchronization primitives within itself. In general, this is correct, but modern advanced allocators can optimize memory allocation, eliminating internal synchronization due to, for example, the presence of a memory pool for each stream.
I will not dwell on intrusive containers in this article - they require special techniques to free up memory when removing an item from a container, this is a topic for another conversation. Here I will just note that most classes of conventional containers are based on intrusive ones, that is, as a rule, all the interesting lock-free code is in the class of the intrusive container. Intrusive containers are declared in the cds::intrusive .

All data structures in libcds are template classes with a uniform template interface. There are two types of template-interfaces:

based on variadic template
traits based

Variadic template based ads are used for simple containers - stacks and queues. The general pattern is:

 template <typename GC, typename T, typename… Options> class queue { … };

For compilers that do not support the variadic template, emulation is used.
Inside the container class, Options packaged in a Traits structure.

The Variadic template approach was not very convenient for complex inheritance, therefore for classes of more complex containers such as set , map , etc., traits are used:

 template <typename GC, typename T, typename Traits> class set { … };

For each such class, there is a make_traits that transforms the variadic template into traits:

 template <typename… Options> struct make_traits { typedef typename pack<Options…>::type type ; };

Using make_traits declaring a container object takes the form:

 struct Foo { … }; cds::container::SkipListSet< cds::gc::HP, Foo, typename cds::container::skip_list::make_traits< Opt1, Opt2, Opt3 >::type > theSkipListSet ;

Such an ad, of course, looks cool, but it can create problems when debugging, - the class name can take several dozen lines. Traits ads are more compact for debugging:

 struct Foo { … }; struct skipListSetTraits: public cds::container::skip_list::make_traits< Opt1, Opt2, Opt3 >::type {}; cds::container::SkipListSet< cds::gc::HP, Foo, skipListSetTraits > theSkipListSet ;

The template arguments for the data structure class are:

GC - SMR algorithm used
T - the type of data stored in the container
Options… or Traits - options (policy) of the container

On the options should stay separately.

Options

Each container class, in addition to the data type and the SMR algorithm, can have a varied number of additional settings defining a particular behavior. For example, for ordered containers (map, set) it is necessary to specify the binary predicate of comparison of the std::less<T> elements. Or set one or another action strategy when detecting parallel work (back-off strategy). Or install the allocator used to create container elements.
Each container class has several (up to a dozen) such, as a rule, optional settings, which I call options. The approach used in STL, when each setting is a template template's template argument, is not very pleasant to me: such position-dependent arguments with default values complicate the use of the class. If the class has 10 template arguments with default values, in order to change the eighth argument, we need to specify the default values of all seven arguments preceding the eighth one. It is very uncomfortable. Therefore, in libcds I use a position-independent approach.
With this approach, each option has its own type; For example, the “allocator” option has the following declaration:

 template <typename Alloc> struct allocator { template <typename Base> struct pack: public Base { typedef Alloc allocator ; }; };

Here, the template argument Alloc is a specific class of allocator, and the allocator structure itself is the name of the option. Here is the task of the allocator through the option:

 cds::opt::allocator< std::allocator<int> >

Using such option declarations, it is quite easy to build a meta function that collects the structure of traits from options. This process is called packing .
There are two types of options in libcds:

General options - applicable to many containers. Declarations of such options are in the cds::opt namespace, and valid values for options are in the namespace cds::opt::v
Special options - applicable only to a specific container class. Their declarations are located in the namespace of a specific container, for example, in cds::container::skip_list

In the description of each class of the container is a list of valid options.

Comparison predicates

In many containers of the type map, set it is necessary to specify the predicate of comparison of elements. In STL, predicates std::less<T> are used for this, which are able to compare only elements of type T But very often it is inconvenient and costly. For example, the type std::string can be compared with the value char const * , and for such a comparison there is no need to create a temporary object std::string . For set-containers, the type T stored in the container very often has the key fields needed for comparison, and the “load” is the actual data that is not involved in the search.
Considering all this, libcds uses extended predicates that can compare values of different types. Predicates are given by options. Moreover, as a comparison predicate, you can use extended less (option cds::opt::less< typename Less > ) or extended compare (option cds::opt::compare< typename Comparator > ). Compare returns an integer value:

cmp(a, b) < 0 if a < b
cmp(a, b) == 0 if a == b
cmp(a, b) > 0 if a > b

The std::string::compare function method also works.
Writing extended predicates ( Less for the cds::opt::less< typename Less > option cds::opt::less< typename Less > or Comparator for the cds::opt::compare< typename Comparator > option) is your conscience. Here's what the Comparator might look like for string comparisons:

 struct string_cmp { int operator()(std::string const& s1, std::string const& s2) const { return s1.compare( s2 ); } int operator()(std::string const& s1, char const* s2) const { return s1.compare( s2 ); } int operator()(char const* s1, std::string const& s2) const { return –s2.compare( s1 ); } };

Allokatory

Containers from the cds::container namespace store copies of data, so one of the options for them is cds::opt::allocator - the allocator for allocating memory for data. Some containers have two allocator options — one for distributing data ( cds::opt::node_allocator ) and another ( cds::opt::allocator ) for allocating memory for the supporting structures of the container. By default, the std::allocator<int> used as the std::allocator<int> . If you want to use your allocator, its interface should be the same as that of std::allocator .

The library relies heavily on two properties of allocators:

each allocator has a template <typename Other> struct rebind , which allows to rebuild the allocator for a different type Other . As I have already said, the default std::allocator<int> is std::allocator<int> , - the container, by means of the rebind metafunction rebind changes the int type to the required one
allocator is a global object. This requirement arises from the fact that the physical removal of the lock-free element of a container is delayed and can occur after the destruction of the container object itself. Therefore, unlike STL, the libcds containers never store references to the allocator object and do not have methods for specifying the allocator object. If you need to allocate memory within the container method, the allocator is called via a temporary object: allocator_type().allocate( 1 ) . Similarly, freeing memory is done by calling allocator_type().deallocate( p ) . It is worth noting once again that the release is delayed and can occur even when the container object itself is no longer present, and it is called from the SMR algorithm.

Brief specifications

Supported compilers:

GCC 4.3 and above
MS Visual C ++ 2008 and up
Clang 3.0 and higher (with the standard GCC library libstdc ++)

Supported operating systems and processor architectures:

Windows x86, amd64
Linux x86, amd64, Intel Itanium, Sparc
Solaris sparc
HP-UX Intel Itanium

Support for other operating systems or processor architectures is possible, but not trivial, especially if the target compiler does not support C ++ 11 atomic .

Conclusion

I hope after reading this article it will be clear how to use the libcds library. In addition, I tried to explain the basic principles, guided by which I implement in the library data structures.

I absolutely do not touch on performance issues. As I mentioned earlier, you can only talk about performance in relation to a specific task and a specific hardware. Modern hardware is very different about lock-free algorithms: one favors them, the other is necessary to persuade for a long time, choosing one or another container class options to achieve acceptable performance. Lock-free data structures are quite complex compared to their normal counterparts, so the “who is faster” factor starts playing: either a synchronization primitive + a regular container, or a complex lock-free container without external synchronization.

If you want good advice, here it is: try not to use shared data at all . I did not discover America with such advice, but it can be difficult to follow it in everyday life.

Links

[Des09] M.Desnoyers Low-Impact Operating System Tracing PhD Thesis,Chapter 6 «User-Level Implementations of Read-Copy Update»

[Des11] M.Desnoyers, P.McKenney, A.Stern, M.Dagenias, J.Walpole User-Level Implementations of Read-Copy Update

[Gid05] Anders Gidenstam, Marina Papatriantafilou and Philippas Tsigas Practical and Efficient Lock-Free Garbage Collection Based on Reference Counting

[Gid06] A.Gidenstam Algorithms for synchronization and consistency in concurrent system services , Chapter 5 «Lock-Free Memory Reclamation» Thesis for the degree of Doctor of Philosophy

[Her02] M. Herlihy, V. Luchangco, and M. Moir. The repeat offender problem: A mechanism for supporting dynamic-sized lockfree data structures Technical Report TR-2002-112, Sun Microsystems Laboratories, 2002

[Her05] M. Herlihy, V. Luchangco, P. Martin, and M. Moir. Nonblocking Memory Management Support for Dynamic_Sized Data Structures ACM Transactions on Computer Systems, Vol.23, No.2, May 2005

[Mic02] Maged M.Michael Safe memory reclamation for dynamic lock-free objects using atomic reads and writes

[Mic03] Maged M.Michael Hazard Pointers: Safe memory reclamation for lock-free objects

[Mic04] Andrei Alexandrescy, Maged Michael Lock-free Data Structures with Hazard Pointers

Lock-free data structures

Start
:

libcds

Source: https://habr.com/ru/post/196834/

All Articles