
Fighting deadlocks: the unlocked callbacks pattern

Deadlock situations



Wikipedia gives the following definition of deadlock: "A deadlock is a situation in a multitasking environment or a DBMS in which several processes wait endlessly for resources that are held by those same processes."



Deadlocks are usually dynamic in nature: whether they show up depends on factors such as user actions, availability of network services, hard disk head positioning, task switching in a system with preemptive multitasking, and so on.



A classic example of a deadlock: the first thread (A) locks mutex M1 and then mutex M2. The second thread (B) locks mutex M2 and then mutex M1. The deadlock can occur as follows: thread A locks M1, thread B locks M2, and from then on both threads are "doomed": thread A cannot lock M2 and thread B cannot lock M1, so each attempt to lock the second mutex blocks its thread forever.


This deadlock occurs only if each of the two threads manages to lock exactly one mutex. Otherwise, the threads continue running.
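The timing condition can be made visible without actually hanging the program. The sketch below is only an illustration under stated assumptions: the names `probeLockOrderInversion` and `DeadlockProbe` are hypothetical, and `try_lock` is used in place of a blocking `lock` so that both threads can report that they would have blocked and then back off.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical result type: records whether each thread would have blocked.
struct DeadlockProbe {
    bool aBlocked = false;
    bool bBlocked = false;
};

DeadlockProbe probeLockOrderInversion() {
    std::mutex m1, m2;
    std::atomic<int> holding{0};  // threads that already hold their first mutex
    std::atomic<int> tried{0};    // threads that already probed their second mutex
    DeadlockProbe result;

    auto body = [&](std::mutex &first, std::mutex &second, bool &blocked) {
        std::lock_guard<std::mutex> guard(first);
        ++holding;
        // Force the unlucky timing: wait until both threads hold one mutex.
        while (holding.load() < 2) std::this_thread::yield();
        // A blocking lock() here would never return; try_lock() only reports it.
        if (second.try_lock()) {
            second.unlock();
        } else {
            blocked = true;
        }
        ++tried;
        // Keep the first mutex until the other thread has probed it too.
        while (tried.load() < 2) std::this_thread::yield();
    };

    std::thread a(body, std::ref(m1), std::ref(m2), std::ref(result.aBlocked));  // M1, then M2
    std::thread b(body, std::ref(m2), std::ref(m1), std::ref(result.bBlocked));  // M2, then M1
    a.join();
    b.join();
    assert(tried.load() == 2);
    return result;
}
```

With the forced timing, both flags end up set: each thread holds one mutex and cannot get the other. Without the first waiting loop, one thread would often acquire both mutexes before the other starts, and nothing would go wrong.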



This situation is very common in complex multithreaded systems. As a rule, the mutexes involved are located far apart (in different components of the system), which makes it quite difficult to identify the participants in the deadlock.



A common situation in a real system



One particular case of the deadlock described above looks as follows:



Why is the situation described above dangerous? Because while developing the Worker object, the developer did not yet know exactly which functions of the system would be called through the callback interface. The developer only set requirements for the interface: the function must take such-and-such parameters, through which such-and-such data is passed. And this call "into the unknown" is made while holding a mutex.
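A minimal sketch of the hazard, assuming a simplified worker: the class `NaiveWorker`, its `deliver` and `isBusy` functions, and the helper `workerMutexHeldDuringCallback` are hypothetical names, not the article's code. It shows that for the whole duration of a callback every other thread is locked out of the worker.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical simplified worker that calls consumers under its own mutex.
class NaiveWorker {
public:
    using Callback = std::function<void(int)>;

    void registerCallback(Callback cb) {
        assert(cb);  // an empty std::function would throw when called
        std::lock_guard<std::mutex> guard(mutex_);
        callbacks_.push_back(std::move(cb));
    }

    // Non-blocking probe, used only to observe whether the mutex is free.
    bool isBusy() {
        if (mutex_.try_lock()) {
            mutex_.unlock();
            return false;
        }
        return true;
    }

    // The hazard: unknown consumer code runs while mutex_ is still held.
    void deliver(int data) {
        std::lock_guard<std::mutex> guard(mutex_);
        for (const Callback &cb : callbacks_) cb(data);
    }

private:
    std::mutex mutex_;
    std::vector<Callback> callbacks_;
};

// Returns true if another thread finds the worker's mutex held
// while a callback is running.
bool workerMutexHeldDuringCallback() {
    NaiveWorker worker;
    bool busy = false;
    worker.registerCallback([&](int) {
        // The probe runs on a separate thread; calling try_lock on the thread
        // that already owns a std::mutex would be undefined behavior.
        std::thread probe([&] { busy = worker.isBusy(); });
        probe.join();
    });
    worker.deliver(42);
    return busy;
}
```

If the callback itself needs another mutex, and some other thread holds that mutex while calling `registerCallback`, the two lock orders are inverted and the classic deadlock becomes possible.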



It is enough to add a few details to the picture described above (this happens in complex systems):



That's it. The result is a situation in which a deadlock is possible. It will not occur in 100% of cases (the timing has to work out so that each participating thread manages to lock only one mutex), and this makes such errors considerably harder to find.



The following are two ways to solve this problem.



Method 1: change the lock order



The Worker object provides separate functions for locking and unlocking its internal mutex, and a consumer registers as follows:

  1. First, the internal mutex of the Worker object is locked (in advance).
  2. After that, the consumer performs the actions that require locking mutex M.
  3. The consumer registers with the Worker object; no internal mutex is locked inside the Worker object during this call.
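The three steps can be sketched as follows. This is an illustration under assumptions, not the article's implementation: `registerCallbackLocked`, `deliver`, `callbackCount`, `Consumer::attach` and the `int` payload are hypothetical, and standard-library mutexes stand in for whatever the real system uses.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

struct ICallback {
    virtual ~ICallback() = default;
    virtual void dataReady(int data) = 0;
};

// Hypothetical worker that exposes its internal mutex via lock()/unlock().
class Worker {
public:
    void lock() { mutex_.lock(); }
    void unlock() { mutex_.unlock(); }

    // Precondition: the caller already holds the worker mutex via lock().
    void registerCallbackLocked(ICallback *cb) {
        assert(cb != nullptr);
        callbacks_.push_back(cb);
    }

    // Delivery still runs under the worker mutex, as in the original design.
    void deliver(int data) {
        std::lock_guard<std::mutex> guard(mutex_);
        for (ICallback *cb : callbacks_) cb->dataReady(data);
    }

    std::size_t callbackCount() {
        std::lock_guard<std::mutex> guard(mutex_);
        return callbacks_.size();
    }

private:
    std::mutex mutex_;
    std::vector<ICallback *> callbacks_;
};

class Consumer : public ICallback {
public:
    void attach(Worker &worker) {
        std::lock_guard<Worker> workerGuard(worker);  // step 1: worker mutex first
        std::lock_guard<std::mutex> ownGuard(m_);     // step 2: then mutex M
        worker.registerCallbackLocked(this);          // step 3: no locking inside
    }

    // Called with the worker mutex held, then locks M: same order as attach().
    void dataReady(int data) override {
        std::lock_guard<std::mutex> ownGuard(m_);
        last_ = data;
    }

    int last() {
        std::lock_guard<std::mutex> ownGuard(m_);
        return last_;
    }

private:
    std::mutex m_;  // the consumer's own mutex M
    int last_ = 0;
};
```

Both paths now take the worker mutex before M, so the order can no longer be inverted. The price is that the worker's internal mutex leaks into its public interface and every consumer has to follow the protocol by hand.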


The disadvantages of this method are obvious:



Method 2: do not hold the mutex while passing data to consumers



This method sounds promising: if data is passed from the internal thread of the Worker object to consumers while no mutex is locked, this eliminates all possible deadlocks when working with the Worker object.



Why not simply make the callback while the mutex is not held? Because the thread of the Worker object has to walk the list of registered consumers and call the interface function of each one. If the list is not protected by the mutex and its contents change during this loop, the program may well loop forever or even crash due to an invalid memory access.



Why not make a copy of the consumer list (locking the mutex while copying) and then loop over the copy? Because the consumer must be guaranteed that no data will be passed to it after unregisterCallback returns. If the consumer calls unregisterCallback from its destructor, a subsequent call to this consumer's callback interface will crash the program.
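The problem with the plain copy can be reproduced deterministically on a single thread. In the sketch below, `SnapshotWorker`, `snapshot`, `deliver` and `callsAfterUnregister` are hypothetical names; the consumer is deliberately kept alive so that the stale call only increments a counter instead of touching freed memory.

```cpp
#include <algorithm>
#include <cassert>
#include <mutex>
#include <vector>

struct ICallback {
    virtual ~ICallback() = default;
    virtual void dataReady(int data) = 0;
};

// Hypothetical worker that iterates over a copy of the consumer list.
class SnapshotWorker {
public:
    void registerCallback(ICallback *cb) {
        assert(cb != nullptr);
        std::lock_guard<std::mutex> guard(mutex_);
        callbacks_.push_back(cb);
    }

    void unregisterCallback(ICallback *cb) {
        std::lock_guard<std::mutex> guard(mutex_);
        callbacks_.erase(std::remove(callbacks_.begin(), callbacks_.end(), cb),
                         callbacks_.end());
    }

    // The copy is taken under the mutex...
    std::vector<ICallback *> snapshot() {
        std::lock_guard<std::mutex> guard(mutex_);
        return callbacks_;
    }

    // ...and the callbacks are called on the copy without the mutex.
    static void deliver(const std::vector<ICallback *> &copy, int data) {
        for (ICallback *cb : copy) cb->dataReady(data);
    }

private:
    std::mutex mutex_;
    std::vector<ICallback *> callbacks_;
};

struct CountingConsumer : ICallback {
    int calls = 0;
    void dataReady(int) override { ++calls; }
};

// Replays the unlucky interleaving step by step on one thread.
int callsAfterUnregister() {
    SnapshotWorker worker;
    CountingConsumer consumer;
    worker.registerCallback(&consumer);
    std::vector<ICallback *> copy = worker.snapshot();  // worker takes its copy
    worker.unregisterCallback(&consumer);               // consumer unregisters in between
    SnapshotWorker::deliver(copy, 1);                   // stale pointer is still called
    return consumer.calls;
}
```

The function returns 1: the consumer receives data after `unregisterCallback` has already returned. Had the consumer been destroyed at that point, the same call would go through a dangling pointer.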



Thus, we have almost arrived at the solution:



Here is the complete solution. Implementing it requires one more synchronization object, a condition variable:



Important note: if unregisterCallback can be called from within the callback interface implemented by the consumer, the described algorithm is guaranteed to hang inside unregisterCallback. This is easy to fix: if unregisterCallback is called in the context of the internal thread of the Worker object, there is no need to check the flag and wait on the condition variable.
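The copy, the flag, the condition variable and the same-thread exception can be put together in a standard-library-only sketch. Several things here are assumptions made for illustration: `deliver` stands in for one iteration of the Worker's internal thread loop, the payload is an `int`, a single delivering thread is assumed, and `OneShotConsumer`, `SlowConsumer` and the two helper functions are hypothetical.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

struct ICallback {
    virtual ~ICallback() = default;
    virtual void dataReady(int data) = 0;
};

class Worker {
public:
    void registerCallback(ICallback *cb) {
        std::lock_guard<std::mutex> guard(mutex_);
        callbacks_.push_back(cb);
    }
    void unregisterCallback(ICallback *cb) {
        std::unique_lock<std::mutex> lock(mutex_);
        callbacks_.erase(std::remove(callbacks_.begin(), callbacks_.end(), cb),
                         callbacks_.end());
        // Called from inside a callback: waiting would never end, so skip it.
        if (std::this_thread::get_id() == deliveringThread_) return;
        // Otherwise wait until the current round of callbacks has finished.
        finished_.wait(lock, [this] { return !callingNow_; });
    }
    void deliver(int data) {
        std::vector<ICallback *> copy;
        {
            std::lock_guard<std::mutex> guard(mutex_);
            assert(!callingNow_);  // this sketch assumes one delivering thread
            callingNow_ = true;
            deliveringThread_ = std::this_thread::get_id();
            copy = callbacks_;
        }
        for (ICallback *cb : copy) cb->dataReady(data);  // no mutex held here
        {
            std::lock_guard<std::mutex> guard(mutex_);
            callingNow_ = false;
            deliveringThread_ = std::thread::id();
        }
        finished_.notify_all();
    }
private:
    std::mutex mutex_;
    std::condition_variable finished_;
    bool callingNow_ = false;
    std::thread::id deliveringThread_;
    std::vector<ICallback *> callbacks_;
};

// Unregisters itself from inside its own callback.
struct OneShotConsumer : ICallback {
    explicit OneShotConsumer(Worker &w) : worker(w) {}
    void dataReady(int) override { ++calls; worker.unregisterCallback(this); }
    Worker &worker;
    int calls = 0;
};

int oneShotCallCount() {
    Worker worker;
    OneShotConsumer consumer(worker);
    worker.registerCallback(&consumer);
    worker.deliver(1);  // the callback unregisters itself without hanging
    worker.deliver(2);  // the list is empty now
    return consumer.calls;
}

// Another thread unregisters this consumer while its callback is running.
struct SlowConsumer : ICallback {
    explicit SlowConsumer(Worker &w) : worker(w) {}
    void dataReady(int) override {
        remover = std::thread([this] { worker.unregisterCallback(this); returned = true; });
        std::this_thread::sleep_for(std::chrono::milliseconds(20));
        returnedDuringCallback = returned.load();
    }
    Worker &worker;
    std::thread remover;
    std::atomic<bool> returned{false};
    bool returnedDuringCallback = false;
};

bool unregisterWaitsForCallback() {
    Worker worker;
    SlowConsumer consumer(worker);
    worker.registerCallback(&consumer);
    worker.deliver(1);
    consumer.remover.join();
    return !consumer.returnedDuringCallback && consumer.returned.load();
}
```

`oneShotCallCount()` returns 1, which shows the same-thread exception at work. `unregisterWaitsForCallback()` returns true: the unregistering thread cannot return while the callback is still running, which is exactly the guarantee a consumer needs before it is destroyed.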



Implementation using the Qt library synchronization tools



Header file:

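#include <QByteArray>
#include <QLinkedList>
#include <QMutex>
#include <QThread>
#include <QWaitCondition>

class ICallback
{
public:
    virtual void dataReady(QByteArray data) = 0;
};

class Worker : public QThread
{
public:
    Worker();
    void registerCallback(ICallback *callback);
    void unregisterCallback(ICallback *callback);

protected:
    virtual void run();

private:
    QMutex _mutex;         // protects _callbacks and _callingNow
    QWaitCondition _wait;  // woken when a round of callbacks ends
    bool _callingNow;      // true while run() is calling callbacks
    QLinkedList<ICallback *> _callbacks;
};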




Implementation:

#include <QMutexLocker>

Worker::Worker()
    : QThread(), _mutex(QMutex::NonRecursive), _callingNow(false)
{
    ...
}

void Worker::registerCallback(ICallback *callback)
{
    QMutexLocker locker(&_mutex);
    _callbacks.append(callback);
}

void Worker::unregisterCallback(ICallback *callback)
{
    QMutexLocker locker(&_mutex);
    _callbacks.removeOne(callback);
    // When called from a callback (on the Worker's own thread),
    // waiting would hang forever, so the wait is skipped.
    if (QThread::currentThread() != this) {
        while (_callingNow)
            _wait.wait(&_mutex);
    }
}

void Worker::run()
{
    while (...) {
        QByteArray data;
        ...
        QLinkedList<ICallback *> callbacksCopy;
        _mutex.lock();
        _callingNow = true;
        callbacksCopy = _callbacks;
        _mutex.unlock();

        // The callbacks are called with the mutex released.
        for (QLinkedList<ICallback *>::const_iterator it = callbacksCopy.begin();
             it != callbacksCopy.end(); ++it) {
            (*it)->dataReady(data);
        }

        _mutex.lock();
        _callingNow = false;
        _wait.wakeAll();
        _mutex.unlock();
    }
}

Source: https://habr.com/ru/post/175401/


