Transactions
A transaction is a sequence of operations on data that has a beginning and an end.
A transaction is the sequential execution of read and write operations. The end of a transaction can be either saving changes (commit, commit) or discarding changes (rollback). In relation to a DB, a transaction is several requests that are treated as a single request.
Transactions must comply with ACID properties.
Atomicity The transaction is either completely or not executed at all.
Consistency Upon completion of the transaction, the limitations imposed on the data (for example, constraints in the database) should not be violated. Consistency implies that the system will be transferred from one correct state to another correct one.
')
Isolation Concurrently executed transactions should not affect each other, for example, to change the data that another transaction uses. The result of executing concurrent transactions should be as if the transactions were executed sequentially.
Sustainability. After fixing, the changes should not be lost.
Transaction log
The log stores changes made by transactions, ensures atomicity and data stability in the event of a system failure.
The log contains the values that the data had before and after they were changed by the transaction. Write-ahead log strategy obliges to add to the log a record of previous values before the start, and of end values after the completion of the transaction. In the case of a sudden stop, the DB system reads the log in the reverse order and cancels the changes made by the transactions. Having met the interrupted transaction, the database executes it and makes changes about it in the log. Being in the state at the time of failure, the database reads the log in direct order and returns the changes made by the transactions. Thus, the stability of transactions that have already been fixed and the atomicity of the interrupted transaction is preserved.
Simply re-performing erroneous transactions is not enough to restore.
Example. On account of the user $ 500 and the user decides to withdraw them through an ATM. Two transactions are performed. The first one reads the value of the balance, and if there is enough money on the balance sheet it gives money to the user. The second subtracts the required amount from the balance. Suppose a system crash occurred and the first operation failed, and the second one completed. In this case, we cannot reissue money to the user without returning the system to its original state with a positive balance.Isolation levels
Read fixed data
The problem of dirty reading (Dirty Read) is that a transaction can read the intermediate result of another transaction.
Example. The initial balance value is 0 $. T1 adds $ 50 to the balance. T2 reads the balance value ($ 50). T1 cancels changes and ends. T2 continues execution with incorrect balance data.The solution is to read fixed data (Read Committed) prohibiting read data modified by the transaction. If transaction A has changed some data set, then transaction B, when requesting this data, is forced to wait for the completion of transaction A.
Repeatable Read (Repeatable Read)
Lost Updates Problem. T1 saves changes over T2 changes.
Example. The initial balance value is $ 0 and two transactions simultaneously replenish the balance. T1 and T2 read a balance of $ 0. Then T2 adds $ 200 to $ 0 and saves the result. T1 adds $ 100 to $ 0 and saves the result. The final result is $ 100 instead of $ 300.The problem of non-repeatable read (Unrepeatable read). Re-reading the same data returns different values.
Example. T1 reads the balance value of $ 0. Then T2 adds $ 50 to the balance and ends. T1 re-reads the data and detects inconsistencies with the previous result.Repeatable Read ensures that repeated reads will return the same result. Data read by one transaction is forbidden to change in others until the completion of the transaction. If transaction A read some data set, then transaction B, when requesting this data, is forced to wait for the completion of transaction A.
Ordered reading (Serializable)
Phantom Reads Problem. Two queries that select data by a certain condition return different values.
Example. T1 asks for the number of all users whose balance is greater than $ 0 but less than $ 100. T2 subtracts $ 1 from a user with a balance of $ 101. T1 re-executes the request.Ordered reading (Serializable). Transactions are executed as fully sequential. It is forbidden to update and add entries that are subject to the terms of the request. If transaction A has requested data from the entire table, then the entire table is frozen for the remaining transactions until transaction A is complete.
Scheduler
Sets the order in which operations should be performed with concurrent transactions
Provides a specified level of isolation. If the result of operations is not dependent on their order, then such operations are commutative (Permutable). Commutative read operations and operations on various data. Read-write and write-write operations are not commutative. The task of the scheduler is to alternate the operations performed by parallel transactions so that the execution result is equivalent to the sequential execution of transactions.
Control mechanisms for parallel tasks (Concurrency Control)
Optimistic based on the detection and resolution of conflicts, pessimistic to prevent conflicts
With an optimistic approach, several users have at their disposal copies of the data. The first one who completed editing saves the changes, the rest should merge the changes. An optimistic algorithm allows conflict to occur, but the system must recover from the conflict.
With a pessimistic approach, the first user who captures the data prevents the others from receiving data. If conflicts are rare it is wise to choose an optimistic strategy, as it provides a higher level of parallelism.
Locking
If one transaction has blocked data, then the remaining transactions when accessing data must wait for unlocking
A block can be superimposed on a database, table, row or attribute. Shared Lock can be imposed on one data by several transactions, allows all transactions (including overlaying) to read, prohibits change and exclusive capture. Exclusive Lock can be imposed only by one transaction, allows any actions of the imposed transaction, prohibits any actions by the others.
Deadlock is considered a situation when transactions are in standby mode, lasting indefinitely
Example. The first transaction is waiting for the release of data captured by the second, while the second is waiting for the release of data captured by the first.An optimistic solution to a deadlock problem allows a deadlock to occur, but then restores the system by rolling back one of the transactions involved in the deadlock
With a certain frequency is searched for deadlocks. One of the detection methods is by time, that is, consider that a deadlock occurred if the transaction takes too long. When a deadlock is found, one of the transactions is rolled back, which allows other transactions involved in the deadlock to complete. Victim selection can be based on transaction costs or their precedence (Wait-Die and Wound-wait schemes).
Each transaction
T is assigned a timestamp
TS containing the start time of the transaction.
Wait die
If TS (Ti) < TS (Tj) , then Ti waits, otherwise Ti rolls back and starts again with the same timestamp.If a young transaction captures a resource, and an older one requests the same resource, then the older transaction is allowed to wait. If an older transaction captures a resource, then the younger transaction requesting this resource will be rolled back.
Wound wait
If TS (Ti) < TS (Tj) , then Tj rolls back and starts again with the same timestamp, otherwise Ti waits.If a younger transaction captures a resource, and an older transaction requests the same resource, then the younger transaction will be rolled back. If an older transaction captures a resource, then a younger transaction requesting this resource is allowed to wait. Selection of the victim based on precedence prevents deadlocks, but rolls back transactions that are not in a locked state. The problem is that transactions can be rolled back many times, because an older transaction can hold a resource for a long time.
A pessimistic solution to the deadlock problem does not allow the transaction to begin execution if there is a risk of a deadlock
To detect deadlocks, a graph is built (wait graph, wait-for-graph), the vertices of which are transactions, and edges are directed from transactions waiting for data to be released to the transaction that captured this data. It is believed that a deadlock occurred if the graph is obsessive. Building a graph of expectations, especially in distributed databases, is an expensive procedure.
Two-phase locking - preventing deadlocks by capturing all the resources used by a transaction at the beginning of a transaction and freeing them at the end
All blocking operations must precede the first unlocking. It has two phases - Growing Phase at which there is an accumulation of captures and Shrinking Phase at which release of captures occurs. If it is impossible to capture one of the resources, the transaction begins again. It is possible that a transaction will not be able to capture the required resources, for example, if several transactions will compete for one resource.
Two-phase commit provides commit on all replicas of the database
Each database enters information about the data that will be changed in the log and responds to the coordinator OK (Voting Phase). After everyone answered OK, the coordinator sends a signal obliging everyone to commit. After the server commit, they answer OK, if at least one of them didn’t answer OK, then the coordinator sends a signal to cancel changes to all servers (Completion Phase).
Timestamp method
Older transaction is rolled back when attempting to access data from a younger transaction.
Each transaction is assigned a time stamp
TS corresponding to the start time. If
Ti is older than
Tj , then
TS (Ti) <
TS (Tj) .
When a transaction is rolled back, it is assigned a new timestamp. Each
Q data object involved in a transaction is marked with two labels.
W-TS (Q) is the timestamp of the youngest transaction that successfully recorded over
Q. R-TS (Q) is the timestamp of the youngest transaction that performed the read record over
Q.When transaction
T requests the reading of
Q data, two options are possible.
If TS (T) < W-TS (Q) , that is, the data has been updated by a younger transaction, then transaction T is rolled back.If TS (T) > = W-TS (Q) , then the reading is performed and R-TS (Q) becomes MAX (R-TS (Q), TS (T)) .When transaction
T requests a change in
Q data, two options are possible.
If TS (T) < R-TS (Q) , then the data has already been read by a younger transaction and if you make a change, a conflict will arise. Transaction T is rolled back.If TS (T) < W-TS (Q) , that is, the transaction is trying to overwrite a newer value, transaction T is rolled back. In other cases, the change is made and W-TS (Q) becomes equal to TS (T) .No costly waiting graph construction is required. Older transactions depend on newer ones, therefore there are no cycles in the waiting column. There are no deadlocks, because transactions are not expected, but immediately rolled back. Cascading kickbacks are possible. If
Ti rolled back, and
Tj read the data that
Ti changed, then
Tj should roll back too. If at the same time Tj was already committed, then there will be violations of the principle of sustainability.
One solution to cascade rollbacks. A transaction performs all write operations at the end, and the remaining transactions are required to wait for the completion of this operation. Transactions are waiting for commit before reading.
Thomas write rule - a variation of the timestamp method in which data updated by a younger transaction is prohibited from overwriting an older one.
Transaction
T requests a change in
Q data. If
TS (T) <
W-TS (Q) , that is, the transaction is trying to overwrite a newer value, transaction T is not rolled back as in the timestamp method.