Managing concurrent access to data in Microsoft Azure Storage

In modern web applications, situations often arise when several users simultaneously work with the same data.

To make sure every user's actions succeed, application developers need to think carefully about how such scenarios are handled, especially when several users genuinely need to update the same data at the same time.
Most often, developers use the following three strategies for managing concurrent access to data:

Optimistic concurrency

The application performs an update only after verifying that the data has not changed since the application last read it.

For example, two users view the same wiki page and then both decide to update it at the same time.

In this case, the wiki platform must ensure that the second user's update does not overwrite the first, and that both users learn whether their own update succeeded.

This is the strategy most commonly used in web applications.

Pessimistic concurrency

In this case, the application locks the data it intends to update. Until the lock is released, that is, until the first user has finished editing, other users cannot update the data.

For example, in master/slave replication scenarios where only the master performs updates, the master typically holds an exclusive lock on the data for an extended period so that no one else can modify it.

Last writer wins

This approach lets any update proceed without checking whether another application has changed the data since the application first read it.

This strategy (or lack of a formal strategy) is usually applied where data is partitioned in such a way that multiple users are not expected to access the same data.

It can also be useful when processing short-lived data streams.

This article describes how the Azure Storage platform simplifies application development by supporting all three concurrency strategies.

Azure Storage simplifies cloud development

Azure Storage supports all three concurrency strategies. It can fully support optimistic and pessimistic concurrency because it was designed around a strong consistency model: once the storage service commits an insert or update operation, every subsequent access to that data sees the latest update.

Storage platforms that use an eventual consistency model have a lag between the moment one user performs a write and the moment the updated data becomes visible to other users. This complicates client application development, because the client has to keep these inconsistencies from affecting end users.

In addition to choosing an appropriate concurrency strategy, developers need to understand how the storage platform isolates changes, in particular changes made to the same object across transactions.

The Azure Storage service uses snapshot isolation so that read operations can run concurrently with write operations within a single partition.

Unlike other isolation levels, snapshot isolation guarantees that every read sees a consistent snapshot of the data even while updates are in progress, essentially by returning the last committed values while an update transaction is being processed.

Managing concurrency in the Blob service

You can choose which concurrency strategy to use to control access to blobs and containers in the Blob service.

If you do not explicitly specify a strategy, last writer wins is used by default.

Optimistic concurrency for blobs and containers

The storage service assigns an identifier to every stored object. This identifier is updated each time an update operation is performed on the object. It is returned to the client as part of the response to an HTTP GET request, in the ETag (entity tag) header defined by the HTTP protocol.

A client performing an update on such an object can send the original ETag together with a conditional header, so that the update takes place only if a certain condition is met. Here the condition is an If-Match header, which requires the storage service to verify that the ETag specified in the update request is the same as the one it currently stores.

Below is a general outline of this process:

Retrieve a blob from the storage service. The response includes an HTTP ETag header value that identifies the current version of the object in the storage service.
When you update the blob, include the ETag value received in the previous step in the If-Match conditional header of the request you send to the service.
The service compares the ETag value in the request with the current ETag value of the blob.
If the current ETag value of the blob differs from the ETag in the If-Match request header, the service returns a 412 error to the client. This tells the client that another process has updated the blob since the client retrieved it.
If the current ETag value of the blob matches the ETag in the If-Match request header, the service performs the requested operation and updates the current ETag value of the blob to show that it has created a new version.
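The outline above can be illustrated without any network calls. The following is a minimal sketch, not the storage service's actual implementation: `InMemoryBlobStore`, its `Get` and `Put` methods, and the counter-based ETag format are all hypothetical names and choices made for this illustration. It only mimics how a service might compare the If-Match value with the stored ETag and answer with 412 on a mismatch.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for the blob service (not an Azure SDK type).
public sealed class InMemoryBlobStore
{
    // Each blob keeps its content and a version counter that backs the ETag.
    private readonly Dictionary<string, (string Content, long Version)> _blobs =
        new Dictionary<string, (string Content, long Version)>();

    // A read returns the content together with the current ETag.
    public (string Content, string ETag) Get(string name)
    {
        var blob = _blobs[name];
        return (blob.Content, ToETag(blob.Version));
    }

    // A write optionally carries an If-Match value.
    // ifMatch == null models an unconditional write (last writer wins).
    // Returns an HTTP-like status (201 or 412) and the blob's current ETag.
    public (int Status, string ETag) Put(string name, string content, string ifMatch = null)
    {
        bool exists = _blobs.TryGetValue(name, out var current);
        string currentETag = exists ? ToETag(current.Version) : null;

        // Compare the supplied ETag with the stored one; reject on mismatch.
        if (ifMatch != null && ifMatch != currentETag)
        {
            return (412, currentETag);
        }

        // Apply the write and issue a new ETag.
        long nextVersion = exists ? current.Version + 1 : 1;
        _blobs[name] = (content, nextVersion);
        return (201, ToETag(nextVersion));
    }

    // Assumption: the ETag is simply the quoted version number.
    private static string ToETag(long version) => "\"" + version + "\"";
}
```

In this sketch a client that read the blob, lost a race to another writer, and then calls `Put` with its stale ETag gets 412 back and the stored content stays untouched. Calling `Put` with the fresh ETag, or with no ETag at all, succeeds.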

The C# snippet below shows a simple example of building an If-Match condition with the AccessCondition class, based on the ETag value taken from the properties of a blob that was previously retrieved or inserted.

It then passes the AccessCondition object when updating the blob: the AccessCondition object adds the If-Match header to the request.

If another process has updated the blob, the Blob service returns an HTTP 412 (Precondition Failed) status.

A full example can be downloaded here.

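    using System;
    using System.Net;
    using Microsoft.WindowsAzure.Storage;
    using Microsoft.WindowsAzure.Storage.Blob;

    // blockBlob is an existing CloudBlockBlob that was just written with UploadText,
    // so its ETag property is already populated from the service response.
    string originalETag = blockBlob.Properties.ETag;

    // Simulate an update by a third party.
    string helloText = "Blob updated by a third party.";

    // No ETag is supplied, so the original blob is overwritten (generating a new ETag).
    blockBlob.UploadText(helloText);
    Console.WriteLine("Blob updated. Updated ETag = {0}", blockBlob.Properties.ETag);

    // Now try to update the blob using the ETag captured before the third-party update.
    try
    {
        Console.WriteLine("Trying to update blob using original ETag to generate If-Match access condition");
        blockBlob.UploadText(helloText, accessCondition: AccessCondition.GenerateIfMatchCondition(originalETag));
    }
    catch (StorageException ex)
    {
        if (ex.RequestInformation.HttpStatusCode == (int)HttpStatusCode.PreconditionFailed)
        {
            Console.WriteLine("Precondition failure as expected. Blob's original ETag no longer matches");
        }
        else
        {
            throw; // any other failure is unexpected here
        }
    }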

The storage service also supports other conditional headers, such as If-Modified-Since, If-Unmodified-Since, and If-None-Match.
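To make the meaning of these headers concrete, here is a small sketch of how a condition could be evaluated against a resource's current ETag and last-modified time. The types `ResourceState`, `Preconditions`, and `PreconditionEvaluator` are hypothetical and exist only for this illustration. The sketch assumes the resource exists and simply requires every supplied header to hold; it does not model the precedence rules a real service may apply when several headers are combined.

```csharp
using System;

// Hypothetical model of a stored resource's validators.
public sealed class ResourceState
{
    public string ETag { get; set; }
    public DateTimeOffset LastModified { get; set; }
}

// Hypothetical holder for the conditional headers; null means "header not sent".
public sealed class Preconditions
{
    public string IfMatch { get; set; }
    public string IfNoneMatch { get; set; }
    public DateTimeOffset? IfModifiedSince { get; set; }
    public DateTimeOffset? IfUnmodifiedSince { get; set; }
}

public static class PreconditionEvaluator
{
    // Returns true when every supplied condition holds for an existing resource.
    public static bool IsSatisfied(ResourceState current, Preconditions p)
    {
        // If-Match: the ETag must be equal, or the header is the wildcard "*".
        if (p.IfMatch != null && p.IfMatch != "*" && p.IfMatch != current.ETag)
            return false;

        // If-None-Match: the ETag must differ; "*" fails for any existing resource.
        if (p.IfNoneMatch != null && (p.IfNoneMatch == "*" || p.IfNoneMatch == current.ETag))
            return false;

        // If-Modified-Since: the resource must have changed after the given time.
        if (p.IfModifiedSince.HasValue && current.LastModified <= p.IfModifiedSince.Value)
            return false;

        // If-Unmodified-Since: the resource must not have changed after the given time.
        if (p.IfUnmodifiedSince.HasValue && current.LastModified > p.IfUnmodifiedSince.Value)
            return false;

        return true;
    }
}
```

A request whose conditions are not satisfied would be rejected rather than applied; which status code is returned depends on the operation and the header involved.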

For additional information, see the documentation on MSDN.

The following table lists the container operations that accept conditional headers such as If-Match in the request and that return an ETag value in the response:

Operation	Returns ETag value	Accepts conditional headers
Create Container	Yes	No
Get Container Properties	Yes	No
Get Container Metadata	Yes	No
Set Container Metadata	Yes	Yes
Get Container ACL	Yes	No
Set Container ACL	Yes	Yes (*)
Delete Container	No	Yes
Lease Container	Yes	Yes
List Blobs	No	No

(*) The permissions defined by Set Container ACL are cached, and updates to them take 30 seconds to propagate. During that period, updates are not guaranteed to be consistent.

The following table lists the blob operations that accept conditional headers such as If-Match in the request and that return an ETag value in the response:

Operation	Returns ETag value	Accepts conditional headers
Put Blob	Yes	Yes
Get Blob	Yes	Yes
Get Blob Properties	Yes	Yes
Set Blob Properties	Yes	Yes
Get Blob Metadata	Yes	Yes
Set Blob Metadata	Yes	Yes
Lease Blob (*)	Yes	Yes
Snapshot Blob	Yes	Yes
Copy Blob	Yes	Yes (for source and destination blob)
Abort Copy Blob	No	No
Delete Blob	No	Yes
Put Block	No	No
Put Block List	Yes	Yes
Get Block List	Yes	No
Put Page	Yes	Yes
Get Page Ranges	Yes	Yes

(*) Lease Blob does not change the ETag of a blob.

Pessimistic concurrency in blobs

To lock a blob for exclusive use, you acquire a lease on it. When you acquire a lease, you specify its duration: between 15 and 60 seconds, or infinite, which amounts to an exclusive lock. You can renew a finite lease to extend it, and you can release any lease once you have finished working with the blob. The Blob service automatically releases finite leases when they expire.

Leases make it possible to support different synchronization strategies, including exclusive write / shared read, exclusive write / exclusive read, and shared write / exclusive read.

Where a lease exists, the storage service enforces exclusive writes (put, set, and delete operations). Ensuring exclusivity for read operations, however, requires the developer to make sure that all client applications use a lease ID and that only one client at a time holds a valid lease ID. Read operations that do not include a lease ID result in shared reads.

The C# snippet below acquires an exclusive 15-second lease on a blob, updates the blob under that lease, and then shows that an update without the lease is rejected. If the blob already has an active lease when you try to acquire a new one, the Blob service returns HTTP 409 (Conflict).

The snippet uses an AccessCondition object to carry the lease information when it sends the update request to the storage service.

A full example can be downloaded here.

    using System;
    using System.Net;
    using Microsoft.WindowsAzure.Storage;
    using Microsoft.WindowsAzure.Storage.Blob;

    // Acquire a lease on the existing CloudBlockBlob for 15 seconds.
    string lease = blockBlob.AcquireLease(TimeSpan.FromSeconds(15), null);
    Console.WriteLine("Blob lease acquired. Lease = {0}", lease);

    // Update the blob using the lease. This operation will succeed.
    const string helloText = "Blob updated";
    var accessCondition = AccessCondition.GenerateLeaseCondition(lease);
    blockBlob.UploadText(helloText, accessCondition: accessCondition);
    Console.WriteLine("Blob updated using an exclusive lease");

    // Simulate a third-party update to the blob without the lease.
    try
    {
        // This operation will fail because no valid lease ID is provided.
        Console.WriteLine("Trying to update blob without valid lease");
        blockBlob.UploadText("Update without lease, will fail");
    }
    catch (StorageException ex)
    {
        if (ex.RequestInformation.HttpStatusCode == (int)HttpStatusCode.PreconditionFailed)
            Console.WriteLine("Precondition failure as expected. Blob's lease does not match");
        else
            throw;
    }

If you attempt a write operation on a leased blob without passing the lease ID, the request fails with a 412 error. Note that if the lease expires before you call the UploadText method and you still pass the lease ID, the request also fails with a 412 error.
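The status codes described above (409 when a second lease is requested, 412 when a write carries a missing, wrong, or expired lease ID) can be mimicked with a small in-memory model. This is only a sketch under stated assumptions: `LeasableBlob` is a hypothetical class, not an Azure SDK type, the current time is passed in explicitly so that expiry is easy to follow, and infinite leases, renewal, and release are left out.

```csharp
using System;

// Hypothetical in-memory model of a single leasable blob (not an Azure SDK type).
public sealed class LeasableBlob
{
    private string _leaseId;
    private DateTimeOffset _leaseExpiry;

    public string Content { get; private set; } = "";

    // A lease is active only until its expiry time.
    private bool HasActiveLease(DateTimeOffset now) => _leaseId != null && now < _leaseExpiry;

    // Returns 201 and a new lease ID, or 409 when another lease is still active.
    public (int Status, string LeaseId) AcquireLease(TimeSpan duration, DateTimeOffset now)
    {
        if (duration < TimeSpan.FromSeconds(15) || duration > TimeSpan.FromSeconds(60))
            throw new ArgumentOutOfRangeException(nameof(duration), "Finite leases last 15 to 60 seconds.");

        if (HasActiveLease(now))
            return (409, null);

        _leaseId = Guid.NewGuid().ToString();
        _leaseExpiry = now + duration;
        return (201, _leaseId);
    }

    // Returns 201 on success, or 412 when the lease ID is missing, wrong, or expired.
    public int Write(string content, DateTimeOffset now, string leaseId = null)
    {
        bool leased = HasActiveLease(now);

        // Leased blob: the caller must present the matching lease ID.
        if (leased && leaseId != _leaseId) return 412;

        // No active lease: a supplied lease ID can no longer be valid.
        if (!leased && leaseId != null) return 412;

        Content = content;
        return 201;
    }
}
```

With this model, a writer that holds the lease ID succeeds while the lease is active, a writer without it is rejected with 412, and once the 15 seconds have passed the old lease ID is rejected as well while plain writes succeed again.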

For information on managing the lease duration and the lease ID, see the documentation.

The following blob operations can use leases to manage pessimistic concurrency:

Put Blob
Get Blob
Get Blob Properties
Set Blob Properties
Get Blob Metadata
Set Blob Metadata
Delete Blob
Put Block
Put Block List
Get Block List
Put Page
Get Page Ranges
Snapshot Blob - the lease ID is optional if a lease exists
Copy Blob - the lease ID is required if a lease exists on the destination blob
Abort Copy Blob - the lease ID is required if an infinite lease exists on the destination blob
Lease Blob

Pessimistic concurrency for containers

Leases on containers support the same synchronization strategies as leases on blobs (exclusive write / shared read, exclusive write / exclusive read, and shared write / exclusive read). Unlike blobs, however, the storage service enforces exclusivity only for delete operations.

To delete a container that has an active lease, the client must include the active lease ID in the delete request.

All other operations on a leased container succeed without a lease ID, in which case they are shared operations.

If exclusive update (put or set) or read operations are required, developers must make sure that every client uses a lease ID and that only one client at a time holds a valid lease ID.

The following container operations can use leases to manage pessimistic concurrency:

Delete Container
Get Container Properties
Get Container Metadata
Set Container Metadata
Get Container ACL
Set Container ACL
Lease Container

Managing concurrency in the Table service

When you work with entities, the Table service uses optimistic concurrency checks by default, unlike the Blob service, where you have to opt in to optimistic concurrency explicitly.

Another difference between the Table and Blob services is that with tables you can manage concurrency behavior only for entities, whereas with the Blob service you can manage concurrency for both containers and blobs.

To use optimistic concurrency and check whether another process has modified an entity since you retrieved it from the Table service, you can use the ETag value you received when the entity was returned.

The outline of this process is as follows:

Retrieve an entity from the Table service. The response includes an ETag value that identifies the current version of the entity in the storage service.
When you update the entity, include the ETag value received in the previous step in the If-Match conditional header of the request you send to the service.
The service compares the ETag value in the request with the current ETag value of the entity.
If the current ETag value of the entity differs from the ETag in the If-Match request header, the service returns a 412 error to the client. This tells the client that another process has updated the entity since the client retrieved it.
If the current ETag value of the entity matches the ETag in the If-Match request header, or the If-Match header contains the wildcard character (*), the service performs the requested operation and updates the current ETag value of the entity to show that it has been updated.

Note that, unlike the Blob service, the Table service requires the client to include an If-Match header in update requests. It is still possible to force an unconditional update (last writer wins) and bypass the concurrency check by setting the If-Match header to the wildcard character (*) in the request.
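The two points that distinguish entity updates from blob updates, the mandatory If-Match value and the `*` wildcard, can be sketched with an in-memory model. `InMemoryTable` and its members are hypothetical names used only for this illustration, and the counter-based ETag format is an assumption; the real service is reached through the SDK calls shown elsewhere in this article.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for a table of customers (not an Azure SDK type).
public sealed class InMemoryTable
{
    private readonly Dictionary<string, (string Email, int Version)> _rows =
        new Dictionary<string, (string Email, int Version)>();

    // Inserts a new entity and returns its first ETag.
    public string Insert(string key, string email)
    {
        _rows.Add(key, (email, 1));
        return ToETag(1);
    }

    // Returns the entity together with its current ETag.
    public (string Email, string ETag) Retrieve(string key)
    {
        var row = _rows[key];
        return (row.Email, ToETag(row.Version));
    }

    // ifMatch is mandatory, mirroring the rule described above.
    // Passing "*" skips the comparison and forces the update (last writer wins).
    // Returns an HTTP-like status (204 or 412) and the entity's current ETag.
    public (int Status, string ETag) Replace(string key, string email, string ifMatch)
    {
        if (ifMatch == null)
            throw new ArgumentNullException(nameof(ifMatch), "If-Match is required for updates.");

        var row = _rows[key];
        if (ifMatch != "*" && ifMatch != ToETag(row.Version))
            return (412, ToETag(row.Version));

        int nextVersion = row.Version + 1;
        _rows[key] = (email, nextVersion);
        return (204, ToETag(nextVersion));
    }

    // Assumption: the ETag is simply the quoted version number.
    private static string ToETag(int version) => "\"" + version + "\"";
}
```

In this sketch, two clients that both read version 1 cannot both replace the entity: the second `Replace` with the stale ETag returns 412, whereas a `Replace` with `"*"` goes through regardless of the stored version.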

The C# snippet below updates the email address of a customer entity that was previously either created or retrieved. The initial insert or retrieve operation stores the ETag value in the customer object, and because the example uses the same object instance when it executes the replace operation, the ETag value is automatically sent back to the Table service, which lets the service check for concurrency violations.

If another process has updated the entity in table storage, the service returns an HTTP 412 (Precondition Failed) status.

A full example is available here.

    using System;
    using Microsoft.WindowsAzure.Storage;
    using Microsoft.WindowsAzure.Storage.Table;

    try
    {
        // customer was inserted or retrieved earlier, so it carries the current ETag.
        customer.Email = "updatedEmail@contoso.org";
        TableOperation replaceCustomer = TableOperation.Replace(customer);
        customerTable.Execute(replaceCustomer);
        Console.WriteLine("Replace operation succeeded.");
    }
    catch (StorageException ex)
    {
        if (ex.RequestInformation.HttpStatusCode == 412)
            Console.WriteLine("Optimistic concurrency violation - entity has changed since it was retrieved.");
        else
            throw;
    }

To explicitly disable the concurrency check, set the ETag property of the customer object to "*" before executing the replace operation.

    customer.ETag = "*";

The following table shows how the table entity operations use ETag values:

Operation	Returns ETag value	Requires If-Match request header
Query Entities	Yes	No
Insert Entity	Yes	No
Update Entity	Yes	Yes
Merge Entity	Yes	Yes
Delete Entity	No	Yes
Insert or Replace Entity	Yes	No
Insert or Merge Entity	Yes	No

Note that the Insert or Replace Entity and Insert or Merge Entity operations do not perform any concurrency checks, because they do not send an ETag value to the Table service.

In general, developers using tables should rely on optimistic concurrency when building scalable applications.

If pessimistic locking is needed, one approach is to assign a designated blob to each table and try to take a lease on that blob before operating on the table.

This approach requires the application to make sure that every data access path to the table acquires the lease first.

Also note that the minimum lease duration is 15 seconds, which needs careful consideration when designing for scalability.

Additional Information:

Entity operations

Managing concurrency in the Queue service

For queues, concurrency becomes a concern in one scenario: when several clients retrieve messages from the same queue. When a message is retrieved from a queue, the response includes the message itself and a pop receipt value, which is required to delete the message later.

The message is not automatically removed from the queue, but after it has been retrieved it is invisible to other clients for the time interval specified by the visibilitytimeout parameter.

The client that retrieves the message is expected to delete it after processing it and before the time given by the TimeNextVisible element of the response.

TimeNextVisible is determined by adding the value of visibilitytimeout to the time at which the message was retrieved.
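As a worked example of that arithmetic, the short sketch below computes the time at which a message becomes visible again. `QueueVisibility` and its methods are hypothetical helper names introduced here; the service computes this value itself and returns it in the response.

```csharp
using System;

// Hypothetical helper that mirrors the arithmetic described above.
public static class QueueVisibility
{
    // TimeNextVisible = time of retrieval + visibility timeout.
    public static DateTimeOffset TimeNextVisible(DateTimeOffset retrievedAt, TimeSpan visibilityTimeout)
        => retrievedAt + visibilityTimeout;

    // True while the message is still hidden from other clients,
    // i.e. while the retrieving client can delete it without racing anyone.
    public static bool IsStillInvisible(DateTimeOffset now, DateTimeOffset timeNextVisible)
        => now < timeNextVisible;
}
```

For instance, a message retrieved at 12:00:00 UTC with a visibility timeout of 30 seconds becomes visible to other clients again at 12:00:30 UTC if it has not been deleted by then.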

The Queue service supports neither optimistic nor pessimistic concurrency, so clients that process messages retrieved from a queue must make sure the messages are processed in an idempotent manner.
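One common way to make processing idempotent is to remember which message IDs have already been handled and skip repeat deliveries. The sketch below is a minimal illustration of that idea, not a prescribed design: `QueueMessage` and `IdempotentProcessor` are hypothetical types, and the in-memory `HashSet` is an assumption that only works for a single worker. With several workers, the record of processed IDs would need to live in a shared durable store.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical message shape; only the ID matters for deduplication.
public sealed class QueueMessage
{
    public QueueMessage(string id, string body)
    {
        Id = id;
        Body = body;
    }

    public string Id { get; }
    public string Body { get; }
}

// Hypothetical wrapper that runs a handler at most once per message ID.
// Not thread-safe: intended for a single processing loop.
public sealed class IdempotentProcessor
{
    private readonly HashSet<string> _processedIds = new HashSet<string>();
    private readonly Action<QueueMessage> _handler;

    public IdempotentProcessor(Action<QueueMessage> handler)
    {
        _handler = handler ?? throw new ArgumentNullException(nameof(handler));
    }

    // Returns true if the handler ran, false if the message was a repeat delivery.
    public bool Process(QueueMessage message)
    {
        if (_processedIds.Contains(message.Id))
            return false;

        _handler(message);

        // Record the ID only after the handler finished without throwing,
        // so a failed attempt can be retried when the message reappears.
        _processedIds.Add(message.Id);
        return true;
    }
}
```

If a message becomes visible again because its visibility timeout elapsed before it was deleted, a second call to `Process` with the same ID returns false and the handler's side effects are not repeated.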

Update operations such as SetQueueServiceProperties, SetQueueMetaData, SetQueueACL, and UpdateMessage use the last writer wins strategy.

Managing concurrency in the File service

The File service can be accessed through two different protocols: SMB and REST. The REST service supports neither optimistic nor pessimistic locking, and all updates follow the last writer wins strategy.

Clients that use SMB can rely on file system-level locking mechanisms to control access to shared files, including pessimistic locking.

When an SMB client opens a file, it specifies both the file access option and the share mode. Setting the file access option to "Write" or "Read/Write" together with a share mode of "None" causes the file to be locked by that SMB client until the file is closed.

If a REST operation is attempted on a file that an SMB client has locked, the REST service returns status code 409 (Conflict) with the error code SharingViolation.

When an SMB client opens a file for deletion, it marks the file as pending delete until all other SMB clients have closed it. While the file is marked as pending delete, any REST operation on it returns status code 409 (Conflict) with the error code SMBDeletePending. Status code 404 (Not Found) is not returned, because the SMB client can still remove the pending delete flag before closing the file. In other words, 404 (Not Found) is returned only once the file has actually been deleted.

Note that while a file is in the SMB pending delete state, it is not included in List Files results.

Also note that the REST Delete File and Delete Directory operations are committed atomically and do not put the file into the pending delete state.

Additional Information:

File Lock Organization

Conclusion

Microsoft Azure Storage was designed to meet the needs of complex web applications without forcing developers to compromise on, or rethink, key design assumptions such as concurrent data access and data consistency. The mechanisms that provide them are built into the storage service itself.

Full sample application used in the article: