Any interaction between a client and a server eventually runs into the need to retry requests. The network connection may be unreliable, the server may have problems, or there may be any other reason that forces the client to repeat a request. The same applies to a backend server talking to a database or any other data store (or another service).
Today we will talk about the retry interval: how long after a failed request should we wait before repeating it? We will look at two strategies: retrying at a fixed interval and exponential backoff. A simulation will show that, given a large number of clients, retrying at a fixed interval may never let the server recover, while exponential backoff avoids this problem.
The retry interval becomes important when the server runs into trouble. Very often a server can withstand the load from clients that send requests in the normal "flowing" mode, spreading their requests randomly over time. But if the server fails, all clients detect it and start retrying after a certain interval. The rate of those retries can then exceed the limit the server is able to handle.
Another important point: the client often cannot distinguish server problems from network problems on its own side. If the response does not arrive within a given time interval, the client cannot tell what went wrong, and its behavior (retry the request, retry interval) will be the same in both situations.
Retrying at a fixed interval
This is the simplest (and most widely used) strategy: if a request fails, the client retries it after an interval T. Here we are looking for a compromise: do not flood the server with overly frequent retries on failure, but also do not hurt the user experience by waiting too long to retry after a single transient network problem.
A typical retry interval can range from hundreds of milliseconds to several seconds.
Exponential backoff
The exponential backoff algorithm can be written in pseudocode as follows:
delay = MinDelay
while True:
    error = perform_request()
    if not error:
        # ok, got response
        break
    # error happened, pause between requests
    sleep(delay)
    # calculate next delay
    delay = min(delay * Factor, MaxDelay)
    delay = delay + norm_variate(delay * Jitter)
In this pseudocode, delay is the interval between requests, perform_request performs the request itself, and norm_variate draws a value from the normal distribution.
The algorithm has the following parameters:
- MinDelay - the minimum (initial) retry interval (for example, 100 ms);
- MaxDelay - the upper limit on the retry interval (for example, 15 minutes);
- Factor - the coefficient of exponential growth of the delay (for example, 2);
- Jitter - the "jitter", which defines the random component (for example, 0.1).
The calculation of the delay interval is quite simple: the first interval equals MinDelay; then, each time a request fails, the interval is multiplied by Factor, and a random value proportional to Jitter is added. The retry interval is capped at MaxDelay (not counting the Jitter component).
Why is this way of calculating retries better than retrying after a fixed period of time?
- the interval grows with every attempt, so if the server cannot cope with the flow of requests, that flow will decrease over time;
- the intervals are randomized, so there will be no synchronized "bursts" of retries repeating with the period of a fixed interval.
A reasonable question: what if the client simply has a bad network connection? After all, the exponential backoff algorithm will keep increasing the retry interval, and the user will not wait forever. Three points can be made here:
- if the connection is really bad (attempts 2, 3, 4 fail in a row), the client probably should not be retrying that often anyway;
- the client does not know what the problem is (see above), and for some time this behavior is not so bad for the user compared to the situation where the server is overwhelmed with retries and cannot return to normal operation at all;
- the Factor, MaxDelay and MinDelay parameters can be tuned to slow down the growth of the retry interval or to cap it.
To illustrate, here are graphs of the delay interval produced by this algorithm with two sets of parameters:
- blue graph: MinDelay 100 ms, Factor 2.7, Jitter 0.1, MaxDelay 10 min;
- green graph: MinDelay 1 sec, Factor 1.5, Jitter 0.1, MaxDelay 5 min.
The horizontal axis shows the attempt number.

The same graphs, on a logarithmic scale:

Simulation
Is the extra complexity of exponential backoff really justified? Let's run a simulation and see how the two algorithms behave. The simulation consists of two independent parts, the client and the server; I will describe them in turn.
The server emulates the typical response delay of a database: up to a certain limit on the number of simultaneous requests, the server responds with a fixed delay; beyond that limit, the response time starts to grow exponentially with the number of simultaneous requests. If we are talking about a backend server, its response time is most often also bounded by the response time of the database.
The server responds within minDelay (100 ms) until the number of simultaneous connections exceeds concurrencyLimit (30), after which the response time grows exponentially: minDelay * factor ^ ((concurrency - concurrencyLimit) / K), where factor and K are constants and concurrency is the current number of simultaneous connections. The server does not actually do any work; it answers every request with a fixed response, but the delay is computed by the formula above.
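The server-side formula can be sketched in Go as follows (the factor and K values here are assumptions for illustration; the actual defaults live in server.go):

```go
package main

import (
	"fmt"
	"math"
	"time"
)

const (
	minDelay         = 100 * time.Millisecond
	concurrencyLimit = 30
	factor           = 1.5  // assumed value for illustration
	K                = 10.0 // assumed value for illustration
)

// responseDelay computes the emulated response time for a given number
// of simultaneous connections, per the formula in the text.
func responseDelay(concurrency int) time.Duration {
	if concurrency <= concurrencyLimit {
		return minDelay
	}
	scale := math.Pow(factor, float64(concurrency-concurrencyLimit)/K)
	return time.Duration(float64(minDelay) * scale)
}

func main() {
	for _, c := range []int{10, 30, 60, 120} {
		fmt.Printf("concurrency %3d -> delay %v\n", c, responseDelay(c))
	}
}
```

Below the limit the delay is flat; above it, every additional K connections multiply the delay by factor, which is what makes an overloaded server degrade so quickly.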
The client models a stream of requests from N clients, each of which issues requests independently. Each client thread issues requests at intervals drawn from an exponential distribution with parameter lambda (the expected interval between requests is lambda). The exponential distribution describes well random processes that occur in the real world (for example, user activity that produces requests to the server). If a request fails (or its response time exceeds the timeout), the client starts retrying it, either at a simple fixed interval or using the exponential backoff algorithm.
The simulation scenario is as follows: we start the client and the server with certain parameters; after 10-15 seconds a steady state is reached in which requests succeed with small delays. Then we stop the server (emulating a server crash or a network problem) simply by sending it a SIGSTOP signal; the client detects this and starts retrying requests. Then we resume the server and watch how quickly it returns to normal (or fails to).
Project code
The simulation code is written in Go and published on GitHub. To build it you need Go 1.1+; the compiler can be installed from an OS package or downloaded:
$ git clone https://github.com/smira/exponential-backoff.git
$ cd exponential-backoff
$ go build -o client client.go
$ go build -o server server.go
A simple scenario: in the first console, start the server:
$ ./server
Jun 21 13:31:18.708: concurrency: 0, last delay: 0
...
The server waits for incoming HTTP requests on port 8070 and once a second prints the current number of simultaneous connections and the most recently computed response delay.
In another console, launch the client:
$ ./client
OK: 98.00 req/sec, errors: 0.00 req/sec, timedout: 0.00 req/sec
OK: 101.00 req/sec, errors: 0.00 req/sec, timedout: 0.00 req/sec
OK: 100.00 req/sec, errors: 0.00 req/sec, timedout: 0.00 req/sec
...
Every 5 seconds the client prints statistics on successful requests, failed requests, and requests whose response timed out.
In the first console, stop the server:
Jun 21 22:11:08.519: concurrency: 8, last delay: 100ms
Jun 21 22:11:09.519: concurrency: 10, last delay: 100ms
^Z
[1]+ Stopped ./server
The client detects that the server is stopped:
OK: 36.00 req/sec, errors: 0.00 req/sec, timedout: 22.60 req/sec
OK: 0.00 req/sec, errors: 0.00 req/sec, timedout: 170.60 req/sec
OK: 0.00 req/sec, errors: 0.00 req/sec, timedout: 298.80 req/sec
OK: 0.00 req/sec, errors: 0.00 req/sec, timedout: 371.20 req/sec
Now we try to bring the server back:
$ fg
./server
Jun 21 22:13:06.693: concurrency: 0, last delay: 100ms
Jun 21 22:13:07.492: concurrency: 1040, last delay: 2.671444385s
Jun 21 22:13:08.492: concurrency: 1599, last delay: 16.458895305s
Jun 21 22:13:09.492: concurrency: 1925, last delay: 47.524196455s
Jun 21 22:13:10.492: concurrency: 2231, last delay: 2m8.580906589s
...
The number of simultaneous connections keeps growing, so the server's response delay grows; for the client this means even more timeouts, even more retries, and even more simultaneous requests, which drive the delay up further. Our model service has "crashed" and will never come back up.
Stop the client, restart the server, and start over, this time with exponential backoff enabled on the client:
$ ./client -exponential-backoff
Repeat the same sequence of suspending and resuming the server: this time the server does not collapse; it comes back up successfully, and within a short period of time the service is restored.
Instead of a conclusion
The test client and server have a large number of parameters that control the load, the timeouts, the behavior under load, and so on:
$ ./client -h
$ ./server -h
Even in this simple simulation it is easy to see that the behavior of a group of clients (which is what the client program emulates) differs between "successful" operation and problem situations (in the network or on the server side):
- during normal operation, the server load (the number of requests per second) follows a random distribution but stays approximately constant (about 100 requests per second with the default parameters);
- when problems occur with a fixed retry delay, the number of requests quickly reaches high values (bounded only by the number of clients and the retry delay);
- when problems occur with exponential backoff, the load on the server decreases over time (until the server can cope with the load again).
Tip: use exponential backoff for any request retries, whether in a client talking to a server or in a server talking to a database or another service.
This post is based on materials from a master class about high load and reliability.