
First, let's define what a “queue” is.
We will consider the "FIFO" (first in, first out) type of queue. To borrow the definition from
Wikipedia, “this is an abstract data type with the discipline of access to elements”. In short, this means that we cannot read data from it in arbitrary order; we can only take the element that arrived first.
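As a minimal illustration of the FIFO discipline (using Python's standard `collections.deque`, which is only a stand-in for a real message queue):

```python
from collections import deque

# A FIFO queue: items are appended at one end and taken from the other
fifo = deque()
fifo.append("first")
fifo.append("second")
fifo.append("third")

# popleft() always returns the oldest remaining item
taken = [fifo.popleft() for _ in range(3)]
print(taken)
```

The items come out in exactly the order they went in.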
Next, why are queues needed at all?
1. For deferred operations. A classic example is image processing. Suppose a user uploads a picture to the site and we need to process it. This operation takes a lot of time, and the user does not want to wait that long. So we store the image and put a task for it in the queue. It will be processed when some “worker” picks it up.
2. To handle peak loads. For example, some part of the system occasionally receives a lot of traffic but does not require an instant response. Report generation is one such case. By putting these tasks in the queue, we let them be processed at an even load on the system.
3. Scalability. This is probably the most important reason: a queue makes it possible to scale. You can run several services that process messages in parallel, which greatly increases throughput.
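The three reasons above can be sketched with Python's standard `queue` and `threading` modules. This is only an in-process analogy, not a distributed queue, and `process_image` is a hypothetical stand-in for the slow operation:

```python
import queue
import threading

def process_image(name):
    # Hypothetical stand-in for slow work such as resizing an image
    return name.upper()

tasks = queue.Queue()
results = []
results_lock = threading.Lock()

def worker():
    while True:
        name = tasks.get()
        if name is None:          # sentinel: no more work for this worker
            tasks.task_done()
            break
        processed = process_image(name)
        with results_lock:
            results.append(processed)
        tasks.task_done()

# "Scaling" here just means starting more workers on the same queue
workers = [threading.Thread(target=worker) for _ in range(3)]
for w in workers:
    w.start()

# The upload handler only enqueues the task and returns immediately
for name in ["a.png", "b.png", "c.png", "d.png"]:
    tasks.put(name)

for _ in workers:
    tasks.put(None)               # one sentinel per worker
tasks.join()
for w in workers:
    w.join()

print(sorted(results))
```

Changing the number of workers changes how many tasks are processed in parallel, without touching the code that enqueues them.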
Now let's look at the problems we will face if we create the queue ourselves:
1. Parallel access. Only one handler may retrieve a given message from the queue. That is, if two services ask for messages at the same time, each of them should get its own unique set of messages. Otherwise one message will be processed twice, which can have unpleasant consequences.
2. The mechanism of deduplication. The service should have a system that protects the queue from duplicates. The same data set may accidentally be sent to the queue twice. As a result, we would process the same thing twice, which again can have unpleasant consequences.
3. Error handling mechanism. Suppose our service took three messages from the queue. It successfully processed two of them and sent requests to delete them from the queue. It could not process the third one and died. A message that is in the processing status is not available to other services, but it should not stay in that status forever. Such a message should be passed to another handler according to some logic. We will soon look at how this logic is implemented, using AWS SQS (Simple Queue Service) as an example.
Amazon Web Services - Simple Queue Service
Now let's look at how SQS solves these problems and what it can do.
1. Parallel access. You can set the "Visibility timeout" parameter on a queue. It determines how long the processing of a message may take. The default is 30 seconds. When a service picks up a message, the message moves to the “In Flight” status for 30 seconds. If no command to delete the message arrives during this time, it becomes visible in the queue again and the next service can receive it for processing.
(Figure: a small diagram of how this works.)
Notice: Be careful. In some cases SQS may deliver a message more than once (At-Least-Once Delivery, the guarantee given by standard queues). Therefore, the service that processes messages must be idempotent.
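The visibility timeout can be illustrated with a toy in-memory model. This is a sketch of the behaviour described above, not of how SQS is implemented. The class `ToyQueue` and its methods are invented for the example, and time is passed in explicitly as `now` (in seconds) so that the run is deterministic:

```python
import itertools

class ToyQueue:
    """Toy in-memory model of a visibility timeout (not how SQS is built)."""

    def __init__(self, visibility_timeout=30):
        self.visibility_timeout = visibility_timeout
        self._ids = itertools.count(1)
        self._messages = {}   # message id -> [body, visible_from]

    def send(self, body, now):
        msg_id = next(self._ids)
        self._messages[msg_id] = [body, now]
        return msg_id

    def receive(self, now):
        # Hand out the oldest message that is currently visible
        for msg_id, entry in self._messages.items():
            body, visible_from = entry
            if visible_from <= now:
                entry[1] = now + self.visibility_timeout  # now "in flight"
                return msg_id, body
        return None

    def delete(self, msg_id):
        # The worker confirms successful processing
        self._messages.pop(msg_id, None)

q = ToyQueue(visibility_timeout=30)
q.send("resize a.png", now=0)

first = q.receive(now=0)     # worker A takes the message
hidden = q.receive(now=10)   # worker B gets nothing: the message is in flight
again = q.receive(now=31)    # A never deleted it, so it is visible again
q.delete(again[0])
empty = q.receive(now=100)   # deleted messages never come back
```

The second delivery at `now=31` is exactly why the handler has to be idempotent: worker A may still finish its work after the message has been handed to someone else.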
2. The error handling mechanism. In SQS, you can configure a second queue as a dead letter queue. Messages that our service could not process are sent to this separate queue, which you can handle at your discretion. You can also set the number of unsuccessful attempts after which a message goes to the “dead” queue. An unsuccessful attempt is the expiration of the "Visibility timeout": if no delete request was sent during this time, the message is considered unprocessed and either returns to the main queue or goes to the “dead” queue.
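A rough sketch of this redrive logic, again as a toy model rather than the SQS implementation. `ToyRedriveQueue`, `timeout_expired` and the other names are made up for the example, and the timeout is simulated by an explicit call:

```python
from collections import deque

class ToyRedriveQueue:
    """Toy model of a redrive policy; names here are illustrative only."""

    def __init__(self, max_receives):
        self.max_receives = max_receives
        self.main = deque()   # entries are [body, receive_count]
        self.dead = []        # bodies that ran out of attempts

    def send(self, body):
        self.main.append([body, 0])

    def receive(self):
        while self.main:
            entry = self.main.popleft()
            if entry[1] >= self.max_receives:
                self.dead.append(entry[0])   # too many failed attempts
                continue
            entry[1] += 1
            return entry
        return None

    def timeout_expired(self, entry):
        # No delete arrived in time: the message becomes visible again
        self.main.append(entry)

q = ToyRedriveQueue(max_receives=2)
q.send("broken payload")

attempts = 0
while (entry := q.receive()) is not None:
    attempts += 1
    q.timeout_expired(entry)   # simulate a worker that always fails

print(attempts, q.dead)
```

After two failed attempts the message ends up in `q.dead` instead of circulating in the main queue forever.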
3. Deduplication of messages. SQS also has protection against duplicates. Each message has a “Deduplication Id”, and SQS will not add a message with a repeated “Deduplication Id” to the queue for 5 minutes. You must either specify a “Deduplication Id” in each message or enable id generation based on content, in which case a hash of the message content is used as the “Deduplication Id”. The parameter is called "Content-Based Deduplication". See the SQS documentation for more about deduplication.
Notice: Be careful: if you send two identical messages within 5 minutes and “Content-Based Deduplication” is enabled, SQS will not add the second message to the queue.
Notice: Be careful: if, for example, a device loses its connection, does not receive a response, and resends the request more than 5 minutes later, a duplicate will be created.
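Both notices can be reproduced with a small simulation of the 5-minute window. This is a simplified model under stated assumptions: `ToyDedupQueue` is an invented class, time is passed in as `now` in seconds, and the content-based id is computed as a SHA-256 hash of the body:

```python
import hashlib

DEDUP_WINDOW_SECONDS = 5 * 60

class ToyDedupQueue:
    """Toy model of the deduplication window (illustrative only)."""

    def __init__(self):
        self.messages = []
        self._seen = {}   # deduplication id -> time it was first accepted

    def send(self, body, now, dedup_id=None):
        if dedup_id is None:
            # Content-based deduplication: derive the id from the body
            dedup_id = hashlib.sha256(body.encode("utf-8")).hexdigest()
        accepted_at = self._seen.get(dedup_id)
        if accepted_at is not None and now - accepted_at < DEDUP_WINDOW_SECONDS:
            return False   # duplicate inside the window: silently dropped
        self._seen[dedup_id] = now
        self.messages.append(body)
        return True

q = ToyDedupQueue()
first = q.send("order #1", now=0)        # accepted
retry_soon = q.send("order #1", now=120) # same content after 2 minutes
retry_late = q.send("order #1", now=301) # same content after the window
print(first, retry_soon, retry_late, len(q.messages))
```

The retry after 2 minutes is dropped, while the retry after the window has passed is accepted as a new message, which is the duplicate the second notice warns about.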
4. Long poll. SQS supports this type of connection with a maximum timeout of 20 seconds. This saves traffic and avoids needlessly "jerking" the service with repeated requests.
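The idea of long polling can be shown with the standard `queue.Queue` as a local analogy (no network involved; `long_poll` is a hypothetical helper): the consumer makes one blocking request that returns as soon as a message arrives or when the wait time runs out.

```python
import queue
import threading

def long_poll(q, wait_seconds):
    """Block up to wait_seconds for one message; return None on timeout."""
    try:
        return q.get(timeout=wait_seconds)
    except queue.Empty:
        return None

q = queue.Queue()

# A producer delivers a message shortly after the consumer starts waiting
threading.Timer(0.05, q.put, args=("report-42",)).start()

first = long_poll(q, wait_seconds=2.0)   # returns once the message arrives
second = long_poll(q, wait_seconds=0.1)  # nothing arrives: None after the wait
```

Compared with asking the queue in a tight loop, the consumer issues far fewer requests while still receiving the message almost immediately.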
5. Metrics. Amazon also provides detailed queue metrics, such as the number of received / sent / deleted messages, the sizes of these messages in KB and so on. SQS is also integrated with the CloudWatch monitoring service, where you can see even more. There you can also set up so-called “alarms” and configure actions for various events.
To learn more, see the documentation on connecting SQS to CloudWatch and the CloudWatch documentation.
Now let's look at the queue settings:
Main settings:
Default Visibility Timeout - the number of seconds / minutes / hours during which a message, once received, is hidden from other receivers. The maximum is 12 hours.
Message Retention Period - the number of seconds / minutes / hours / days that unprocessed messages are stored in the queue. Maximum - 14 days.
Maximum Message Size - maximum message size in KB. The value is from 1KB to 256KB.
Delivery Delay - the delay before a message is delivered to the queue, from 0 seconds to 15 minutes (in fact, the message is already in the queue but is not yet visible to receivers).
Receive Message Wait Time - how long the connection is held open waiting for new messages when we use “Long poll”.
Content-Based Deduplication - a flag; if set to true, a “Deduplication Id” in the form of a SHA-256 hash generated from the content is added to each message.
Dead letter queue settings:
Use Redrive Policy - a flag; if set, messages are redirected to the “dead” queue after several failed attempts.
Dead Letter Queue - the name of the “dead” queue to which unprocessed messages will be sent.
Maximum Receives - the number of failed processing attempts after which the message is sent to the “dead” queue.
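Notice: Also note that some of these parameters can be overridden per request. For example, a Visibility Timeout can be set when receiving messages, and an individual message can have its own Delivery Delay (per-message delays are supported on standard queues only, not on FIFO queues).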
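The settings above map onto SQS queue attribute names such as `VisibilityTimeout` and `RedrivePolicy`. The sketch below only builds and validates the attribute dictionary. The helper `build_queue_attributes`, the `LIMITS` table and the ARN are invented for the example. The bounds follow the values quoted in this article, except the 60-second minimum retention, which is added here, and current AWS limits may differ:

```python
import json

# (min, max) bounds; upper bounds follow the limits quoted in the article
LIMITS = {
    "VisibilityTimeout": (0, 12 * 60 * 60),              # seconds
    "MessageRetentionPeriod": (60, 14 * 24 * 60 * 60),   # seconds
    "MaximumMessageSize": (1024, 256 * 1024),            # bytes
    "DelaySeconds": (0, 15 * 60),                        # seconds
    "ReceiveMessageWaitTimeSeconds": (0, 20),            # seconds
}

def build_queue_attributes(settings, dead_letter_arn=None, max_receives=None,
                           content_based_dedup=False):
    """Validate numeric settings and return them as string attributes."""
    attributes = {}
    for name, value in settings.items():
        low, high = LIMITS[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside {low}..{high}")
        attributes[name] = str(value)   # attribute values are sent as strings
    if content_based_dedup:
        attributes["ContentBasedDeduplication"] = "true"
    if dead_letter_arn is not None:
        # The redrive policy is itself a JSON document stored as a string
        attributes["RedrivePolicy"] = json.dumps({
            "deadLetterTargetArn": dead_letter_arn,
            "maxReceiveCount": max_receives,
        })
    return attributes

attrs = build_queue_attributes(
    {"VisibilityTimeout": 30, "ReceiveMessageWaitTimeSeconds": 20},
    dead_letter_arn="arn:aws:sqs:us-east-1:123456789012:example-dead.fifo",  # made-up ARN
    max_receives=3,
    content_based_dedup=True,
)
print(attrs)
```

With the boto3 library, a dictionary of this shape is what would be passed as the `Attributes` argument of `create_queue`. That call is deliberately left out so the example runs without AWS credentials.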
Now a little about the messages themselves and their properties:
The message has several parameters:
1. Message body - any text.
2. Message Group Id - something like a tag or channel; it is mandatory for every message in a FIFO queue. Messages within each group are guaranteed to be processed in FIFO order.
3. Message Deduplication Id - a string to identify duplicates. If the “Content-Based Deduplication” mode is set, the parameter is optional.
There are also message attributes.
Attributes consist of a name, a type and a value.
1. Name - string
2. Type - there are several types: String, Number, Binary. The type is passed simply as a string, and it is possible to add a custom postfix to it. In this case, the type comes with the postfix after a dot, for example String.example_postfix
3. Value - string
Notice: Please note that the maximum number of attributes per message is 10.
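Putting the message parameters and attributes together, a message could be assembled roughly as follows. The dictionary keys follow the parameter names of the SQS `SendMessage` request as exposed by boto3. The helper `build_message` and all sample values are made up, and only String and Number attributes are handled in this sketch:

```python
MAX_ATTRIBUTES = 10

def build_message(body, group_id, dedup_id=None, attributes=None):
    """Assemble the parameters of one FIFO message as a plain dictionary."""
    attributes = attributes or {}
    if len(attributes) > MAX_ATTRIBUTES:
        raise ValueError("a message can carry at most 10 attributes")
    message = {"MessageBody": body, "MessageGroupId": group_id}
    if dedup_id is not None:
        # Optional when Content-Based Deduplication is enabled on the queue
        message["MessageDeduplicationId"] = dedup_id
    if attributes:
        # Each attribute is a name mapped to its type and string value
        message["MessageAttributes"] = {
            name: {"DataType": data_type, "StringValue": str(value)}
            for name, (data_type, value) in attributes.items()
        }
    return message

msg = build_message(
    "resize a.png",
    group_id="images",
    dedup_id="upload-0001",
    attributes={
        "width": ("Number", 800),
        "source": ("String.example_postfix", "web"),
    },
)
print(msg)
```

With boto3, such a dictionary could be unpacked into `send_message` together with the queue URL. The call itself is omitted here so that the example runs offline.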
PS: This article gives a brief description of queues and a little about the capabilities and mechanics of SQS. The next article will be devoted to AWS Lambda, and the one after that to using the two together in practice.