
An efficient readers-writer lock based on interlocked variable access

Introduction



The project I work on has a peculiarity: on the one hand, the use of third-party libraries is not allowed (with a few exceptions), and on the other hand, great emphasis is placed on very deep code optimization. So you often have to reinvent the wheel in the form of your own implementations.



In this article, I want to share an idea for implementing the well-known synchronization primitive, the readers-writer lock, on top of so-called atomic operations. As you know, a readers-writer lock is designed to synchronize access to a shared resource in such a way that simultaneous reading and writing is impossible, while an arbitrary number of threads may still read in parallel.



Finding a solution



Initially I took the task rather lightly, but, as it turned out, in vain. To begin with, after looking through the available resources, I still could not find an implementation that I liked. The biggest problem turned out to be blocking incoming "readers" and "writers" in such a way that they do not remain blocked forever. If, for example, you organize the waiting with condition variables, it is very easy to end up in a situation where an incoming thread should block, but before the blocking actually happens, the thread that owns the resource finishes its work, and the completion signal it sends never reaches the addressee. The incoming thread then successfully completes its transition into the "wait" state, and at that point the implementation fails. The problem can in fact be solved by adding extra logic, but in my case that implementation eventually turned out even slower than the mindless and merciless locking of every incoming thread. Of course, I do not claim that this is the absolute truth, and maybe I missed something, but I began to look for other options... All this time I had the feeling that the problem could be solved with the much simpler mechanisms of interlocked variable access, with which I had somewhat earlier, quite elegantly, dealt with double-checked locking optimization. So I began to think in that direction, and as a result I got the following...



Implementation



The project requires support for QNX (the system the product itself targets) and Windows (emulation, debugging, etc.). Since the latter is far more widespread, I will give the C++ implementation for it; however, I will touch on one point of porting to POSIX. So:


Class skeleton


class CRwLock
{
public:
    CRwLock();

    void readLock();
    void writeLock();
    void readUnlock();
    void writeUnlock();

private:
    static const LONG WRITER_BIT = 1L << (sizeof(LONG) * 8 - 2);

    LONG mWriterReaders;
};




The set of functions is self-explanatory and, in my opinion, needs no comments. Let's talk about the only field of the class, mWriterReaders: its low bits count the currently active readers, while WRITER_BIT (the second most significant bit of a LONG) marks an active writer, so any value greater than or equal to WRITER_BIT means that a writer owns the lock or is acquiring it.







Class constructor


CRwLock::CRwLock()
    : mWriterReaders(0)
{
}




It performs a single task: setting the initial state, which corresponds to nobody owning the resource, or, more simply, resetting all bits to 0.



Read lock


As you may have guessed, I’m going to use atomic operations on a single integer variable to achieve our goal.



void CRwLock::readLock()
{
    if(::InterlockedExchangeAdd(&mWriterReaders, 1) >= WRITER_BIT)
    {
        while(::InterlockedCompareExchange(&mWriterReaders, 0, 0) >= WRITER_BIT)
        {
            Sleep(0);
        }
    }
}




On entry, we immediately add ourselves to the readers and at the same time check whether someone is currently writing (this is the case if the writer bit is set, and therefore the whole value is greater than or equal to WRITER_BIT). If so, we must wait, together with the others, for the write operation to complete. For this I chose a spin-wait that keeps checking that same writer bit; as soon as it is cleared, everyone waiting to read gets access (this implementation gives priority to reading threads, but more on that later).



Here I would like to dwell on the issue of the so-called busy wait, when a waiting thread actively consumes CPU resources. In this case I compromised by adding a Sleep(0) call, which hands the remainder of the thread's time slice back to the scheduler, which can give it to other threads depending on their priority. In other words, the time is not burned idly but can be put to good use. On the other hand, the waiting thread's responsiveness to changes in the flags is dulled; with my hardware configuration, thread counts, and workload, this resulted in roughly a 10% increase in the test program's run time. But we should by no means forget that the system as a whole certainly wins by having a free CPU at its disposal.



Under POSIX, sched_yield can be used instead of Sleep(0).



Write lock


void CRwLock::writeLock()
{
    while(::InterlockedCompareExchange(&mWriterReaders, WRITER_BIT, 0) != 0)
    {
        Sleep(0);
    }
}




A thread that needs to write must wait until everyone has released the resource (the value is exactly zero), and only then does it set the writer bit itself. Hence the reader priority: readers that enter the waiting state even while a writer holds the lock are guaranteed, in advance, to get in ahead of any similarly blocked "writers".



Unlocking


void CRwLock::readUnlock()
{
    ::InterlockedExchangeAdd(&mWriterReaders, -1);
}

void CRwLock::writeUnlock()
{
    ::InterlockedExchangeAdd(&mWriterReaders, -WRITER_BIT);
}




A simple decrement in the reading case, and clearing the writer bit in the writing case.



Instead of an afterword



I want to say that in my case this gave a good performance result compared to using critical sections, and at the same time the lock behaves predictably as the ratio of writing to reading threads changes.



I will be glad to hear your criticism and comments.

Source: https://habr.com/ru/post/200924/


