Transactions and multithreaded database access

Recently, I needed to execute the following code (presented in the most simplified form):

public void Start()
{
    using (var transactionScope = new TransactionScope())
    {
        ...
        GetOrCreateCompany(someValue);
        ...
        transactionScope.Complete();
    }
}

private Company GetOrCreateCompany(string companyName)
{
    // Returns the company if it exists; otherwise null
    var company = _companiesRepository.GetCompany(companyName);
    if (company == null)
        company = _companiesRepository.Add(companyName);
    return company;
}

This code ran in a multithreaded environment, where each thread used the Start method as its entry point (which means each thread had its own transaction).
This seemingly simple code has several subtleties, which are discussed below. The most obvious one: two threads can both find that the company is missing and both try to add it.

There is a general solution to this problem: set up the necessary constraints, run the transaction in a loop, and if an exception occurs, try again (rethrowing the exception once a certain number of attempts is exceeded). However, with a large number of threads, conflicting locks make this approach very slow (it is simpler to say that it does not work). So I set out to implement a more specific solution.
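The general retry approach can be sketched roughly as follows. This is only an illustration, not the code I actually used: the Retry class, the Execute method and the isTransient predicate are hypothetical names, and deciding which exceptions count as a conflict (deadlock, unique-constraint violation) is left to the caller.

```csharp
using System;

public static class Retry
{
    // Runs `work` until it succeeds or `maxAttempts` attempts have been made.
    // `isTransient` (hypothetical) decides whether an exception is a conflict
    // worth retrying; anything else propagates immediately.
    public static T Execute<T>(Func<T> work, int maxAttempts, Func<Exception, bool> isTransient)
    {
        if (maxAttempts < 1)
            throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        for (var attempt = 1; ; attempt++)
        {
            try
            {
                return work();
            }
            catch (Exception ex) when (attempt < maxAttempts && isTransient(ex))
            {
                // Conflict on this attempt: fall through and try again.
                // On the last attempt the filter is false, so the exception
                // is rethrown to the caller.
            }
        }
    }
}
```

Under this sketch, the delegate passed as work would open its own TransactionScope on every attempt, so that a failed attempt is rolled back completely before the next one starts.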

Below is the code of my first solution (from here on, the specifics of transactions and locks are discussed using MySQL as an example):

public void Start()
{
    using (var transactionScope = new TransactionScope(
        TransactionScopeOption.Required,
        new TransactionOptions() { IsolationLevel = IsolationLevel.Serializable }))
    {
        ...
        GetOrCreateCompany(someValue);
        ...
        transactionScope.Complete();
    }
}

private Company GetOrCreateCompany(string companyName)
{
    // Issues SELECT ... FOR UPDATE
    var company = _companiesRepository.GetCompanyWithWriteLock(companyName);
    if (company == null)
        company = _companiesRepository.Add(companyName);
    return company;
}

In the code above, the SELECT locks the company row together with the corresponding range (because of Serializable), and the lock is released only after the commit. The next transaction that tries to read the same company will wait until the transaction holding the lock commits.

In general, this solution works. But look at Amdahl's law, then at our code, then at Amdahl's law again, and it gets depressing: not only are we forced to hold locks on the records until the transaction commits (and the amount of code that runs after the lock is taken is not bounded), we also take a write lock even when the query returns the company we want (so nothing needs to be added). The first point is inherent to this solution and cannot be avoided, so let's deal with the second.

The query for the company returns a non-null result in two cases:

1. The company was added by a previous transaction that has already been committed.
2. The company was added by the current transaction and will become visible to other transactions only after the current transaction commits.

Let's handle the first case separately: before taking the lock, we run a query outside the current transaction to check whether the required data has already been committed by another transaction:

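private Company GetOrCreateCompany(string companyName)
{
    Company company;
    // Suppress the ambient transaction so that this read runs outside it:
    // it takes no locks in the main transaction and sees only committed data
    using (var independentTransactionScope = new TransactionScope(TransactionScopeOption.Suppress))
    {
        company = _companiesRepository.GetCompany(companyName);
    }
    // The company was added by an already committed transaction: no lock is needed
    if (company != null)
        return company;
    // company is null here in two cases:
    // 1. it was added in a transaction that has not committed yet
    //    (for example, earlier in the current one)
    // 2. it does not exist at all
    company = _companiesRepository.GetCompanyWithWriteLock(companyName);
    if (company == null)
        // Still not found under the lock: add it
        company = _companiesRepository.Add(companyName);
    return company;
}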

A small downside of this optimization is that the query has to be repeated when the value is not visible outside the current transaction (in other words, when the query in the independent TransactionScope returned null). But given the performance gain, this extra query can be overlooked.
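The shape of this optimization (a cheap read first, then a lock and a repeated read only when the first read found nothing) can be shown without a database. The sketch below is only an analogy under stated assumptions: the InMemoryCompanies class is hypothetical, a ConcurrentDictionary stands in for the committed rows, and a C# lock stands in for SELECT ... FOR UPDATE. It says nothing about how MySQL behaves.

```csharp
using System.Collections.Concurrent;

// Hypothetical in-memory stand-in for the companies table.
public sealed class InMemoryCompanies
{
    private readonly ConcurrentDictionary<string, int> _committed =
        new ConcurrentDictionary<string, int>();
    private readonly object _writeLock = new object();
    private int _nextId;

    // Number of inserts actually performed; only changed under _writeLock.
    public int InsertCount { get; private set; }

    public int GetOrCreate(string name)
    {
        // Cheap read without any lock: plays the role of the query
        // executed outside the main transaction.
        if (_committed.TryGetValue(name, out var id))
            return id;

        lock (_writeLock)
        {
            // Repeat the read under the lock: another thread may have
            // added the company between the first read and the lock.
            if (_committed.TryGetValue(name, out id))
                return id;

            id = ++_nextId;
            InsertCount++;
            _committed[name] = id;
            return id;
        }
    }
}
```

Once the company exists, every later call returns from the first read and never touches the lock, which is where the gain comes from; the price is the duplicated lookup on the path where the company is missing.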

Why did I write this article? I can't shake the feeling that I went in the wrong direction: the problem is trivial, and the solution... not so much. Besides, I could not find any examples of solutions to this problem, which puzzles me even more. I hope commenters will clarify the situation by proposing alternative solutions or by agreeing with mine.

Update
In private correspondence, Habr user Shaddix (who also happens to be my colleague) suggested an interesting development of the independent-transaction idea: perform not only the read but also the insert in an independent transaction. This seemingly small modification changes a great deal.

First implementation:

public class TransactionCode
{
    private static readonly object _lockObject = new object();

    public void Start()
    {
        using (var transactionScope = new TransactionScope())
        {
            ...
            GetOrCreateCompany(someValue);
            ...
            transactionScope.Complete();
        }
    }

    private Company GetOrCreateCompany(string companyName)
    {
        Company company;
        // Check whether the company has already been added and committed
        using (var independentTransactionScope = new TransactionScope(TransactionScopeOption.RequiresNew))
        {
            company = _companiesRepository.GetCompany(companyName);
        }
        if (company != null)
            return company;
        // Not found: read it again under a write lock and add it if needed,
        // in a separate transaction that commits immediately
        using (var independentTransactionScope = new TransactionScope(TransactionScopeOption.RequiresNew))
        {
            company = _companiesRepository.GetCompanyWithWriteLock(companyName);
            if (company == null)
                company = _companiesRepository.Add(companyName);
            independentTransactionScope.Complete();
        }
        return company;
    }
}

Now we add the company not in the main transaction but in an independent one. The key point is that because the insert happens in an independent transaction that commits immediately, the added value becomes visible to other transactions right away (and not after the main transaction commits, as in the previous solution). There is no doubt that this approach is more efficient: in my case the performance gain was 300% (and the more dynamic the data, the greater the gain, thanks to the shorter locking time).
But this solution also has drawbacks:
1. The main drawback is that if the main transaction is rolled back, the data added in the independent transaction is not rolled back (it has already been committed). In general this is a rather critical drawback... but not for me: in GetOrCreate-style methods I add relatively independent data that would be added sooner or later anyway, if not in this transaction then in the next one.
2. I have personal reservations about using non-read-only transactions with TransactionScopeOption.RequiresNew and TransactionScopeOption.Suppress (I will describe them in the next post).

So, although I generally prefer the latter solution, neither can be called better than the other: they are different.

Source: https://habr.com/ru/post/115156/
