In this fourth article of the series "The internal structure and architecture of the AtContent.com service", I invite you to take a look at background task processing using Worker Role instances of an Azure service.
The recommended primary communication channel between instances is the Azure Queue. However, using this channel alone does not let you use service instances to their full potential. In this article you will learn how to:
- minimize the delay between sending a job to an instance and starting its processing
- minimize the number of transactions to the Azure Queue
- increase the efficiency of task processing

As we can see from the diagram, the standard Azure SDK tools assume that interaction happens exclusively through the queue (Azure Queue). At the same time, a worker instance has no way of knowing when an application instance will send it a job through the queue. It therefore has to check the queue for jobs periodically, which causes a number of problems and inconveniences.
One of these problems is the delay in task processing, and it arises for the following reason. The worker instance has to check the queue for jobs periodically. If it checks too often, it generates a large number of transactions against the Azure Queue; if jobs arrive rarely, most of those transactions are "idle". If it checks less often, the interval between checks grows, so a task that lands in the queue right after a check will only be processed at the beginning of the next interval. This introduces significant delays in message processing.
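For reference, the interval-based approach looks roughly like the sketch below. This is only an illustration, not code from the service: the queue name "tasks", the one-second interval and ProcessTask are assumptions, and queueStorage is a CloudQueueClient, as in the library code shown further down.

// Minimal sketch of the standard polling loop; names and interval are illustrative.
var queue = queueStorage.GetQueueReference("tasks");

while (true)
{
    CloudQueueMessage message = queue.GetMessage();    // one transaction per call, even when the queue is empty
    if (message != null)
    {
        ProcessTask(message.AsString);                 // application-specific processing (placeholder)
        queue.DeleteMessage(message);
    }
    else
    {
        Thread.Sleep(TimeSpan.FromSeconds(1));         // trade-off: shorter sleep means more "idle" transactions
    }
}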
Another problem is idle transactions: even when there are no messages in the queue, the worker instance still has to contact it to check whether any have arrived. This creates overhead.
With the standard approach offered by the SDK, you have to choose between transaction costs and processing delay. For some scenarios the processing time may not matter, and tasks arrive fairly regularly; in that case you can follow the SDK recommendations and process tasks by periodically picking them from the queue. But if tasks arrive irregularly, with bursts and lulls, the efficiency of the standard approach drops.
These issues can be resolved with the messaging mechanism between instances and roles described in the previous article of the series ( http://habrahabr.ru/post/140461/ ). In a few words: it allows you to send a message from one instance to another, or to all instances of a role. With it you can trigger various handlers on instances, which makes it possible, for example, to synchronize instances. Applied to our task, it lets queue processing start immediately after a job is added to the queue, so there is no need to constantly poll the queue for jobs, which eliminates the "idle" transactions.
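For illustration only, such a notification could be delivered over an internal TCP endpoint roughly as in the sketch below. This is not the CPlase implementation from the previous article; the role name "Worker", the endpoint name "NotifyEndpoint" and the message format are assumptions.

// Hypothetical sketch: notify every instance of the worker role over an internal endpoint.
using System.Net.Sockets;
using System.Text;
using Microsoft.WindowsAzure.ServiceRuntime;

public static class RoleNotifier
{
    public static void NotifyWorkers(string command)
    {
        foreach (var instance in RoleEnvironment.Roles["Worker"].Instances)
        {
            var endpoint = instance.InstanceEndpoints["NotifyEndpoint"].IPEndpoint;
            using (var client = new TcpClient())
            {
                client.Connect(endpoint);
                byte[] payload = Encoding.UTF8.GetBytes(command);
                client.GetStream().Write(payload, 0, payload.Length);
            }
        }
    }
}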

If we consider this mechanism in more detail, we will see the following:

An Azure application instance adds a task to an Azure Queue and sends a message to a worker instance. The message, in turn, activates a handler. What the handler does next depends on its settings. If the settings record that queue processing is already running, there is no point in starting it again; otherwise the handler starts processing and adds the corresponding entry to the settings. It then picks tasks from the queue one after another and continues until the queue is empty.
Handler settings can be stored in various ways: for example, in instance memory (not the most reliable way), in the instance's local storage, or in blob storage (Azure Blob Storage).
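As an illustration, the "start only if not already running" logic with an in-memory flag (the least reliable of the options above) might look like this minimal sketch; the class and method names and ProcessTask are assumptions, not CPlase code.

// Minimal sketch: start the processing loop only if it is not already running.
public static class QueueCommandHandler
{
    private static int isProcessing; // 0 = idle, 1 = running

    public static void OnQueueCommand(CloudQueue queue)
    {
        // Only one processing loop per instance: bail out if one is already running.
        if (Interlocked.CompareExchange(ref isProcessing, 1, 0) == 1)
            return;

        try
        {
            CloudQueueMessage message;
            while ((message = queue.GetMessage()) != null)
            {
                ProcessTask(message.AsString);   // application-specific processing
                queue.DeleteMessage(message);
            }
        }
        finally
        {
            Interlocked.Exchange(ref isProcessing, 0);
        }
    }

    private static void ProcessTask(string task)
    {
        // Placeholder for real task processing.
    }
}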
The CPlase library has a special Queue class for queue processing that lets you create queue handlers. For this purpose the library also includes the IWorkerQueueHandler interface and extensions for it, which make working with queues more convenient:
public interface IWorkerQueueHandler
{
    bool HandleQueue(string Message);
}

public static class WorkerQueueHandlerExtensions
{
    private static string CleanUpQueueName(string DirtyQueueName)
    {
        return DirtyQueueName.Substring(0, DirtyQueueName.IndexOf(","))
            .ToLowerInvariant().Replace(".", "-");
    }

    public static string GetQueueName(this IWorkerQueueHandler Handler)
    {
        return CleanUpQueueName(Handler.GetType().AssemblyQualifiedName);
    }

    public static string GetQueueName(Type HandlerType)
    {
        return CleanUpQueueName(HandlerType.AssemblyQualifiedName);
    }
}
The interface is very simple: it has a single method that implements all the logic of processing messages from the queue. The extension has one purpose: to derive an acceptable queue name from the handler type. This frees the programmer from having to manage queue names for the various handlers.
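For example, a handler implementation could look like the following sketch; the class name, the task format and the return-value semantics are assumptions made for illustration.

// Hypothetical handler: the class name and the meaning of the return value are illustrative.
public class IncomeDistributionQueueHandler : IWorkerQueueHandler
{
    public bool HandleQueue(string Message)
    {
        try
        {
            // Application-specific processing of a single task goes here,
            // e.g. distributing income between the author, distributor and service.
            return true;   // presumably: processed successfully
        }
        catch
        {
            return false;  // presumably: processing failed
        }
    }
}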
The Queue class itself selects the required queue and creates it if it does not yet exist in Azure storage. It also adds a job to the queue, which is coupled with sending a message to a worker instance; here, as noted earlier, the message exchange mechanism between instances and roles is used.
public static CloudQueue GetQueue(string QueueName)
{
    CreateOnceQueue(QueueName);
    return queueStorage.GetQueueReference(QueueName);
}

public static bool AddToQueue<QueueHandlerType>(string Task)
    where QueueHandlerType : IWorkerQueueHandler
{
    var Queue = GetQueue(WorkerQueueHandlerExtensions.GetQueueName(typeof(QueueHandlerType)));
    return AddToQueue<QueueHandlerType>(Queue, Task);
}

public static bool AddToQueue<QueueHandlerType>(CloudQueue Queue, string Task)
{
    try
    {
        var Message = new CloudQueueMessage(Task);
        Queue.AddMessage(Message);
        Internal.RoleCommunicatior.WorkerRoleCommand(typeof(QueueHandlerType));
        return true;
    }
    catch
    {
        return false;
    }
}
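From the application side, queuing a task then reduces to a single call. A minimal usage sketch, reusing the hypothetical handler from above (the payload format is also an assumption):

// Enqueue a task and notify the worker role in one call.
bool queued = Queue.AddToQueue<IncomeDistributionQueueHandler>(
    "{ \"PublicationId\": 42, \"Amount\": 1.50 }");

if (!queued)
{
    // Handle the failure, e.g. log it and retry later.
}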
It is easy to estimate the benefit of such a solution if you roughly know the volume of tasks arriving in the queue and the approximate structure of the flow. For example, if your tasks appear mainly in the evening, between 20:00 and 23:00, then for most of the day an interval-based check will run "idle": for the remaining 21 hours it will be polling an empty queue. If such a check is performed once a second, this amounts to about 75,000 "idle" transactions per day. At a transaction cost of $0.01 per 10,000, that is $0.075 per day, or $2.25 per month. And if you have 100 different queues, the additional cost comes to $225 per month.
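The same back-of-the-envelope calculation, expressed as compile-time constants purely for illustration:

// Cost of "idle" polling, using the numbers from the example above.
public static class IdlePollingCost
{
    public const double IdleHoursPerDay = 21;          // tasks arrive only between 20:00 and 23:00
    public const double ChecksPerSecond = 1;
    public const double PricePer10kTransactions = 0.01;

    public const double IdleTransactionsPerDay =
        IdleHoursPerDay * 3600 * ChecksPerSecond;                       // ~75,600
    public const double CostPerDay =
        IdleTransactionsPerDay / 10000 * PricePer10kTransactions;       // ~$0.075
    public const double CostPerMonth = CostPerDay * 30;                 // ~$2.25
    public const double CostFor100Queues = CostPerMonth * 100;          // ~$225
}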
The other side of interval-based processing is the processing delay. For some tasks such a delay is not significant, but we also have tasks that need to be processed as quickly as possible, for example distributing income between the author, the distributor and the service. Such a task is sensitive to processing time, since the author wants to see the money he has earned in his account immediately, not after some time has passed. Moreover, if the task flow is very large, a one-second interval between checks limits processing to 60 tasks per minute, and that is assuming processing itself is instantaneous. Thus, at peak load the flow of tasks can inflate the queue very quickly.
With interval-based processing of tasks from the queue, you have to balance the cost of "idle" attempts to pick a task from the queue against the processing delay at peak load. The solution we propose removes the need to worry about intervals and idle transactions: it lets tasks be processed from the queue without delay, with high efficiency and without the extra cost of "idle" transactions.
Note also that this queue-handling mechanism is part of the CPlase open-source library, which will soon be published and available to everyone.
Read in the series: