```cpp
struct Node {
    Node * m_pNext;
};

class queue {
    Node * m_pHead;
    Node * m_pTail;
public:
    queue(): m_pHead( nullptr ), m_pTail( nullptr ) {}

    void enqueue( Node * p )
    {
        p->m_pNext = nullptr;
        if ( m_pTail )
            m_pTail->m_pNext = p;
        else
            m_pHead = p;
        m_pTail = p;
    }

    Node * dequeue()
    {
        if ( !m_pHead )
            return nullptr;
        Node * p = m_pHead;
        m_pHead = p->m_pNext;
        if ( !m_pHead )
            m_pTail = nullptr;
        return p;
    }
};
```
The lock-free Michael & Scott queue is implemented in libcds as cds::intrusive::MSQueue. I will give the code with comments inserted right into it, trying to make them not too boring.

```cpp
bool enqueue( value_type& val )
{
    /* Implementation detail: the node type node_type and the user type
       value_type may differ; node_traits::to_node_ptr converts a value
       into a node pointer (in the simplest intrusive case it is just
       static_cast<node_type *>( &val ))
    */
    node_type * pNew = node_traits::to_node_ptr( val );

    // A guard of the garbage collector, here a Hazard Pointer
    typename gc::Guard guard;

    // Back-off strategy (a template parameter of the queue)
    back_off bkoff;

    node_type * t;
    // As usual in lock-free code, everything happens in an endless loop...
    while ( true ) {
        /* Read m_pTail and protect it with the hazard pointer,
           so that no competitor can delete it under us */
        t = guard.protect( m_pTail, node_to_value() );

        node_type * pNext = t->m_pNext.load( memory_model::memory_order_acquire );

        /* If the tail's next is not null, m_pTail lags behind:
           somebody has already linked a new node but has not yet
           advanced the tail. Help him: move the tail forward. */
        if ( pNext != nullptr ) {
            // Help! (Advance the tail)
            m_pTail.compare_exchange_weak( t, pNext,
                std::memory_order_release, std::memory_order_relaxed );
            /* Whether our CAS succeeded or a competitor's CAS got there
               first makes no difference: in any case m_pTail has been
               advanced, so simply continue the loop */
            continue;
        }

        // Try to link our new node after the current tail
        node_type * tmp = nullptr;
        if ( t->m_pNext.compare_exchange_strong( tmp, pNew,
            std::memory_order_release, std::memory_order_relaxed ))
        {
            // Success: the element is linked into the queue
            break;
        }

        /* The CAS was unsuccessful: some other thread has inserted its
           element before us. Under contention it makes sense to step
           aside for a while; that is what back_off is for */
        bkoff();
    }

    // The element is added; count it
    ++m_ItemCounter;

    /* Finally, try to move m_pTail to the new element. We do not care
       whether this CAS succeeds: if it fails, some other thread has
       already advanced the tail for us (helping again). Note the
       absence of '!' before the call; the same helping is also done
       in dequeue */
    m_pTail.compare_exchange_strong( t, pNew,
        std::memory_order_acq_rel, std::memory_order_relaxed );

    /* Always returns true. The intrusive interface also allows bounded
       queues, whose enqueue may return false when the queue is full;
       this queue is unbounded */
    return true;
}
```

And the symmetric dequeue:

```cpp
value_type * dequeue()
{
    node_type * pNext;
    back_off bkoff;

    // dequeue needs two Hazard Pointers
    typename gc::template GuardArray<2> guards;

    node_type * h;
    // Again, the endless loop...
    while ( true ) {
        // Read m_pHead and protect it
        h = guards.protect( 0, m_pHead, node_to_value() );
        // Protect the next element too: it is the one we will return
        pNext = guards.protect( 1, h->m_pNext, node_to_value() );

        // Check that the head did not change while we were setting up
        // the hazard pointers...
        if ( m_pHead.load(std::memory_order_relaxed) != h ) {
            // ...it did: somebody managed to change it under us,
            // so start over
            continue;
        }

        /* The head has no successor: the queue is empty.
           (Remember, the head is a dummy element.) */
        if ( pNext == nullptr )
            return nullptr;

        /* Read the tail. No hazard pointer is needed here: we only
           compare the pointer value, we do not dereference it */
        node_type * t = m_pTail.load(std::memory_order_acquire);
        if ( h == t ) {
            /* The head equals the tail, yet pNext is not null, so
               there is something behind the tail and the tail lags.
               Help the competitor: push the tail forward...
            */
            m_pTail.compare_exchange_strong( t, pNext,
                std::memory_order_release, std::memory_order_relaxed );
            // ...and retry regardless of the result of this CAS
            continue;
        }

        // Try to move the head to pNext
        if ( m_pHead.compare_exchange_strong( h, pNext,
            std::memory_order_release, std::memory_order_relaxed ))
        {
            // Success: the element is extracted
            break;
        }

        /* The CAS failed: a competitor dequeued before us.
           Step aside for a while and retry */
        bkoff();
    }

    // The element is extracted; decrement the counter
    --m_ItemCounter;

    // Retire the former dummy node 'h' (it is physically deleted
    // once no hazard pointer protects it)
    dispose_node( h );

    /* Attention! We return pNext, not h: pNext becomes the new dummy
       element (the new head), and its value is what is returned */
    return node_traits::to_value_ptr( pNext );
}
```
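The back_off functor used above is a template parameter of the queue. As a minimal illustration of the idea (my own sketch, not the libcds back-off classes), an exponential back-off can look like this:

```cpp
#include <cassert>
#include <thread>

// A tiny exponential back-off functor: each call yields the CPU for a
// while, doubling the delay up to a cap. This is a sketch of the idea,
// not the cds back-off implementation.
class exp_backoff {
    unsigned m_nDelay = 1;                  // current delay, in "yield units"
    static const unsigned c_nMaxDelay = 1024;
public:
    void operator()()
    {
        for ( unsigned i = 0; i < m_nDelay; ++i )
            std::this_thread::yield();      // step aside, let competitors run
        if ( m_nDelay < c_nMaxDelay )
            m_nDelay *= 2;                  // exponential growth
    }
    void reset() { m_nDelay = 1; }
    unsigned delay() const { return m_nDelay; }
};
```

In the queue loops above, bkoff() is called after a failed CAS; a fresh functor (or a reset) is used for each operation so the delay does not grow without bound across calls.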
The only emptiness check in dequeue is the check of pNext for nullptr. And where is the check of m_pHead itself for nullptr? You will not find one. To ensure physical (but not logical) non-emptiness, one false (dummy) element is added to the queue in its constructor; it is both the head and the tail. The consequence is this: during extraction (dequeue), the element following the head is returned; it becomes the new dummy element (the new head), while the former dummy element (the former head) is deleted.
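The dummy-node mechanics are easiest to see without any concurrency. A single-threaded sketch (types and names are mine): dequeue reads the value from the node after the dummy, that node becomes the new dummy, and the old dummy is freed.

```cpp
#include <cassert>

struct DNode {
    int     value;      // payload; meaningless in the current dummy node
    DNode * next;
};

// A sequential queue that is never physically empty:
// it always holds at least the dummy node.
class dummy_queue {
    DNode * m_pHead;    // always points to the dummy node
    DNode * m_pTail;
public:
    dummy_queue() { m_pHead = m_pTail = new DNode{ 0, nullptr }; }
    ~dummy_queue() {
        while ( m_pHead ) { DNode * p = m_pHead; m_pHead = p->next; delete p; }
    }

    void enqueue( int v )
    {
        DNode * p = new DNode{ v, nullptr };
        m_pTail->next = p;
        m_pTail = p;
    }

    // Returns false only when the queue is logically empty,
    // i.e. the dummy node has no successor
    bool dequeue( int& v )
    {
        DNode * pNext = m_pHead->next;
        if ( !pNext )
            return false;
        v = pNext->value;           // the successor carries the result...
        DNode * pOld = m_pHead;
        m_pHead = pNext;            // ...and becomes the new dummy node
        delete pOld;                // the former dummy is deleted
        return true;
    }
};
```

The lock-free version does exactly this, except that the old dummy cannot be deleted immediately and is retired to the garbage collector instead.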
In enqueue, we first read the tail's m_pNext. If this pointer is not nullptr, the tail is not in its place and must be moved forward. There is another pitfall here: it may happen that the tail ends up pointing to an element in front of the head (the head and tail cross). To avoid this, we implicitly check m_pTail->m_pNext in dequeue: we read the element following the head (m_pHead->m_pNext), make sure that pNext != nullptr, and only then look at whether the head equals the tail. If it does, there is something behind the tail (after all, pNext exists), so the tail must be pushed forward. This is a typical example of threads helping each other, a pattern that is very common in lock-free programming.
MoirQueue is a variation of MSQueue. The MSQueue algorithm reads the tail at each iteration of the dequeue loop, which is redundant: the tail needs to be read (to check that it really is the tail, that is, that it points to the last element) only when the head has been successfully updated. Thus we can expect reduced pressure on m_pTail under certain types of load. This optimization is represented in libcds by the class cds::intrusive::MoirQueue.
The idea of baskets was introduced in 2007. Nir Shavit, a researcher rather well known in lock-free circles, and his colleagues approached the optimization of the classic Michael & Scott lock-free queue from another angle. In the enqueue operation under high contention, when the CAS changing the tail fails (that is, exactly at the point where MSQueue calls back-off), we cannot determine the order in which the competing elements will be added to the queue, as they are being added simultaneously. Such a group of elements is a logical basket. In essence, it turns out that the abstraction of logical baskets is a kind of back-off strategy.
From MSQueue we have already seen that lock-free code is very verbose, so I will not quote the implementation here; those wishing to see it are referred to the class cds::intrusive::BasketQueue of the libcds library, file cds/intrusive/basket_queue.h. To explain the algorithm, I will borrow another picture from the work of Nir Shavit & Co:
Several threads performing enqueue read the same value of the tail (and we remember that in MSQueue the tail may not point to the last element of the queue at all) and try to change it at the same time. One of the threads wins the enqueue race and successfully adds its element, changing the tail; the losers will insert their elements into the same logical basket. But while a losing thread retries its enqueue
, another thread may already delete the element in front of C. To prevent such a situation, it is proposed to apply logical deletion, that is, to first mark deleted elements with a special deleted flag. Since the flag and the pointer to the element must be readable atomically, we store the flag in the low-order bit of the element's m_pNext pointer. This is permissible because in modern systems memory is allocated aligned to at least 4 bytes, so the lower 2 bits of a pointer are always zero. Thus we have invented marked pointers, a technique widely used in lock-free data structures; we will meet it more than once in the future. By applying logical deletion, that is, setting the low-order bit of m_pNext to 1 using CAS, we exclude the possibility of inserting an element in front of the head: the insertion is also performed by CAS, and a deleted element contains 1 in the low-order bit of its pointer, so that CAS will fail (of course, when following a marked pointer we take not the whole value but only its high bits, which contain the actual address, with the low bits set to zero).
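The marked-pointer trick fits in a few lines. In this sketch (types and helper names are mine, not the libcds ones), the deleted flag is packed into bit 0 of a node pointer:

```cpp
#include <cassert>
#include <cstdint>

struct MNode { MNode * next; int value; };

// Pack a "logically deleted" flag into bit 0 of a node pointer.
// Valid because nodes are aligned to at least 4 bytes, so the two
// low bits of a real node address are always zero.
inline MNode * mark( MNode * p ) {
    return reinterpret_cast<MNode *>( reinterpret_cast<uintptr_t>( p ) | 1 );
}
inline MNode * unmark( MNode * p ) {
    return reinterpret_cast<MNode *>( reinterpret_cast<uintptr_t>( p ) & ~uintptr_t(1) );
}
inline bool is_marked( MNode * p ) {
    return ( reinterpret_cast<uintptr_t>( p ) & 1 ) != 0;
}
```

In BasketQueue the marked value lives in an atomic next field; a CAS that expects an unmarked pointer fails as soon as the node is logically deleted, which is exactly what forbids insertion in front of the head.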
The second change in BasketQueue concerns the physical removal of elements. It was noticed that changing the head on every successful dequeue can be costly: a CAS is called there too, which, as you know, is rather heavy. Therefore the head is changed only when several logically deleted elements have accumulated (three in the default libcds implementation), or when the queue becomes empty. It can be said that the head in BasketQueue moves in hops.
Another variation of MSQueue was proposed by its authors under the name optimistic. They noticed that in MSQueue the dequeue operation requires only one CAS, whereas enqueue requires two (see the figure to the right). Let's see where the two CASes of MSQueue::enqueue come from. The first CAS links the new element to the tail, changing pTail->pNext; the second advances the tail itself. Can the pNext field be changed with a regular atomic store instead of a CAS? Yes, if the direction of our singly linked list were different: not from head to tail, but the opposite. Then we could set the new element's link with an atomic store (pNew->pNext = pTail) and only afterwards change pTail. But if we change direction, how do we perform dequeue? There would be no pHead->pNext any more; the direction of the list has changed.
The solution is to make the list doubly linked. The forward list of next links (the one built by enqueue) will always be consistent. But the backward list, from head to tail, the one most important for us, the prev links, may be not quite consistent; its violation is permissible. If we find such a violation, we can always restore the correct backward list by following the next links. How do we detect such a violation? Very simply: pHead->prev->next != pHead. If this inequality is detected in dequeue, the auxiliary fix_list procedure is called:

```cpp
void fix_list( node_type * pTail, node_type * pHead )
{
    // pTail and pHead are already protected by hazard pointers
    node_type * pCurNode;
    node_type * pCurNodeNext;

    typename gc::template GuardArray<2> guards;

    pCurNode = pTail;
    while ( pCurNode != pHead ) {   // walk from tail to head over next
        pCurNodeNext = guards.protect( 0, pCurNode->m_pNext, node_to_value() );
        // If the head has changed, a competitor has dequeued
        // and our walk is no longer valid
        if ( pHead != m_pHead.load(std::memory_order_relaxed) )
            break;
        // Restore the prev link of the following element
        pCurNodeNext->m_pPrev.store( pCurNode, std::memory_order_release );
        guards.assign( 1, node_traits::to_value_ptr( pCurNode = pCurNodeNext ));
    }
}
```
[the code is from the cds::intrusive::OptimisticQueue class of the libcds library]
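The violation condition itself is easy to see on a plain doubly linked list. In this single-threaded sketch (types and names are mine) the prev links are never set during insertion, the check pHead->prev->next != pHead detects the inconsistency, and a walk over the trusted next links repairs it, just as fix_list does:

```cpp
#include <cassert>

struct ONode { ONode * next; ONode * prev; int value; };

// The next links run from tail to head and are always correct;
// the prev links (head to tail) may be stale
inline bool prev_is_broken( ONode * pHead ) {
    return pHead->prev == nullptr || pHead->prev->next != pHead;
}

// Single-threaded analogue of fix_list: walk from tail to head
// over the trusted next links, rebuilding every prev link
inline void fix_prev_links( ONode * pTail, ONode * pHead )
{
    ONode * pCur = pTail;
    while ( pCur != pHead ) {
        ONode * pNext = pCur->next;
        pNext->prev = pCur;     // restore the backward link
        pCur = pNext;
    }
}
```

The concurrent fix_list additionally re-checks the head on every step, because a competing dequeue can invalidate the walk.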
The fix_list procedure runs through the entire queue from tail to head via the known-correct pNext links and adjusts the pPrev links. And here is OptimisticQueue::enqueue:
```cpp
bool enqueue( value_type& val )
{
    node_type * pNew = node_traits::to_node_ptr( val );
    typename gc::template GuardArray<2> guards;
    back_off bkoff;

    guards.assign( 1, &val );
    node_type * pTail = guards.protect( 0, m_pTail, node_to_value() );
    while ( true ) {
        // Build the forward (next) link optimistically, with a plain store
        pNew->m_pNext.store( pTail, std::memory_order_release );

        // The only CAS of enqueue: try to advance the tail to pNew
        if ( m_pTail.compare_exchange_strong( pTail, pNew,
             std::memory_order_release, std::memory_order_relaxed ))
        {
            /* Success. Now set the backward (prev) link with a plain
               store, not a CAS. pTail cannot be deleted by a concurrent
               dequeue here, since it is protected by the hazard pointer */
            pTail->m_pPrev.store( pNew, std::memory_order_release );
            break;      // Enqueue done!
        }

        /* The CAS failed, but pTail now contains the current value of
           m_pTail (that is the C++11 CAS semantics!); protect the new
           value with the hazard pointer */
        guards.assign( 0, node_traits::to_value_ptr( pTail ));

        // High contention detected: back off
        bkoff();
    }
    return true;
}
```
The optimism is right here: we build the pPrev list (the most important one for us) without any verification, counting on success. And if we then find that the forward and backward lists disagree, well, we will have to spend time on reconciliation, that is, on running fix_list. In both enqueue
and dequeue we now have exactly one CAS. The price is a call to fix_list when a violation of the backward list is detected. Whether this price is big or small, experiment will tell. The implementation lives in the file cds/intrusive/optimistic_queue.h, the cds::intrusive::OptimisticQueue class of the libcds library. A few words about wait-free queues, which are built on the helping technique. Before performing an operation (enqueue
/ dequeue), a thread first declares it: it saves a descriptor of the operation, together with its arguments, in a shared storage, and then begins to help the competing threads: it browses the descriptors in the storage and tries to accomplish what is written in them. As a result, under high load several threads perform the same work, and only one of them will be the winner.
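The announce-and-help pattern can be sketched without any real queue. In this toy single-threaded model (all names are mine), each slot of a shared announcement array holds an operation descriptor, and any thread may complete any pending descriptor; an atomic flag guarantees each operation is applied exactly once:

```cpp
#include <atomic>
#include <cassert>
#include <deque>

// Toy model of the wait-free "announce then help" pattern.
enum class op_kind { none, push };

struct op_desc {
    op_kind           kind = op_kind::none;
    int               arg  = 0;
    std::atomic<bool> done{ false };
};

const int c_nThreads = 4;

struct announced_queue {
    std::deque<int> items;              // the protected structure itself
    op_desc         announce[c_nThreads];

    // A thread publishes its operation instead of executing it directly
    void announce_push( int tid, int v ) {
        announce[tid].arg  = v;
        announce[tid].done.store( false );
        announce[tid].kind = op_kind::push;
    }

    // Helping: scan all descriptors and finish whatever is pending.
    // The CAS on 'done' ensures the operation is applied exactly once,
    // no matter how many helpers scan the array.
    void help_all() {
        for ( int i = 0; i < c_nThreads; ++i ) {
            op_desc& d = announce[i];
            bool expected = false;
            if ( d.kind == op_kind::push
              && d.done.compare_exchange_strong( expected, true )) {
                items.push_back( d.arg );
            }
        }
    }
};
```

A real wait-free queue is far more involved (the descriptor itself is installed and completed with CASes), but this is the skeleton of the idea.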
The libcds library does not have a wait-free queue implementation: the authors themselves cite disappointing data on its performance in their research.

Now it is time for experiments. The benchmarks were compiled with -O3 -march=native -mtune=native. We test the classes from cds::container
, that is, not intrusive. This means that memory is allocated for each element. We will compare with standard implementations std::queue<T, std::deque<T>>
and std::queue<T, std::list<T>>
with synchronization mutex. Type T
is a structure of two integers. All lock-free queues are based on Hazard Pointers. The first test is a stream of enqueue / dequeue pairs, 10 million pairs. The second test is a mix of 75% enqueue and 25% dequeue: 10 million enqueue calls and 2.5 million dequeue calls, so that about 7.5 million elements remain in the queue at the end of the test. Curiously, in some runs the winner is std::queue<T, std::deque<T>>
. Why? I think the point is memory allocation: std::deque allocates memory in blocks of N elements rather than for each element separately, so the standard queue pays for allocation much more rarely, and under low contention a mutex is cheap. The non-intrusive queues of libcds, on the contrary, allocate memory for every element, which is expensive. As for the lock-free queues themselves, MSQueue and its variations show results of the same order, and OptimisticQueue lives up to its name as the number of threads grows.
A few words about stacks. The libcds library also implements Treiber's stack with elimination back-off. The test scenario is similar: some threads are producers (push), others are consumers (pop), 10 million operations of each kind (10 million push and 10 million pop). What does elimination give? According to the statistics that libcds can gather, with 10 million push/pop operations and 10 to 15 threads (on a 64-core server) the share of eliminated operations is a fraction of a percent, about 0.1%, while with a large number of threads (where elimination back-off really comes into play) it reaches 35%! So elimination back-off pays off precisely under high contention (in these tests the elimination back-off wait interval was 5 ms), and gives little at low thread counts.

Source: https://habr.com/ru/post/219201/