
Load planning should be addressed at an early stage of any web project's development. A server crash (and it always happens unexpectedly, at the most inopportune moment) can have very serious consequences, both moral and material. Initially, problems caused by insufficient server performance under growing load can be solved by increasing server capacity or by optimizing the algorithms, program code, and so on. But sooner or later there comes a time when these measures too prove insufficient.
It then becomes necessary to resort to clustering: several servers are combined into a cluster, and the load is distributed between them using a set of special techniques called balancing. Besides addressing the problem of high load, clustering also provides redundancy: the servers back each other up.
The efficiency of clustering directly depends on how the load is distributed (balanced) between the cluster elements.
Load balancing can be carried out using both hardware and software tools. In this article we would like to discuss the basic methods and algorithms of balancing.
Balancing levels
The balancing procedure relies on a whole set of algorithms and methods corresponding to the following levels of the OSI model:
- network;
- transport;
- application.
Consider these levels in more detail.
Network Level Balancing
Balancing at the network level solves the following problem: several different physical machines must respond to a single server IP address. This can be achieved in a variety of ways.
- DNS balancing. Multiple IP addresses are allocated to one domain name. The server to which the client request will be sent is usually determined using the Round Robin algorithm (the methods and algorithms of balancing will be described in detail below).
- Building an NLB cluster. With this method, servers are combined into a cluster consisting of input and compute nodes, and the load is distributed using a special algorithm. This approach is used in Microsoft solutions.
- IP balancing using an additional router.
- Balancing on a geographic basis, carried out by placing identical services with identical addresses in geographically different regions of the Internet (this is how Anycast DNS technology works). Geographic balancing is also used in many CDNs.
Balancing at the transport level
This type of balancing is the simplest: the client contacts the balancer, which forwards the request to one of the servers for processing. The server that will process the request can be chosen according to a variety of algorithms (discussed below): simple round-robin, selecting the least loaded server in the pool, and so on.
Sometimes balancing at the transport level is difficult to distinguish from balancing at the network level. Consider the following rule for the pf packet filter in BSD systems, which balances traffic on a specific TCP port:
web_servers = "{ 10.0.0.10, 10.0.0.11, 10.0.0.13 }"
match in on $ext_if proto tcp to port 80 rdr-to $web_servers round-robin sticky-address
Consider now another example:
pass in on $int_if from $lan_net \
    route-to { ($ext_if1 $ext_gw1), ($ext_if2 $ext_gw2) } \
    round-robin
This rule is about balancing outbound traffic at the network level. It does not specify a specific port or a specific protocol.
The difference between the balancing levels can be explained as follows. The network level includes solutions that do not terminate user sessions on themselves. They simply redirect traffic and do not work in proxy mode.
At the network level, the balancer simply decides which server to send packets to. The session with the client is carried out by the server.
At the transport level, communication with the client is terminated at the balancer, which works as a proxy. It interacts with the servers on its own behalf, passing information about the client in additional data and headers. This is how, for example, the popular software balancer HAProxy works.
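The proxy mode described above can be illustrated with a minimal Python sketch. The backend addresses and header handling here are purely illustrative: the balancer picks a backend round-robin and records the original client address in an X-Forwarded-For header, the way proxy-mode balancers such as HAProxy preserve client identity when connecting to servers on their own behalf.

```python
import itertools

# Hypothetical backend pool; addresses are illustrative only.
backends = ["10.0.0.10:80", "10.0.0.11:80", "10.0.0.13:80"]
pool = itertools.cycle(backends)

def forward(client_ip, request_headers):
    """Pick a backend round-robin and add the client address to the
    headers, the way a proxy-mode balancer preserves the original
    client identity while talking to the backend itself."""
    backend = next(pool)
    headers = dict(request_headers)
    headers["X-Forwarded-For"] = client_ip
    return backend, headers

backend, headers = forward("203.0.113.7", {"Host": "example.com"})
print(backend, headers["X-Forwarded-For"])
```

A real balancer would, of course, also open the TCP connection to the chosen backend and relay the response; this sketch only shows the selection and header step.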
Balancing at the application level
When balancing at the application level, the balancer works in "smart proxy" mode: it analyzes client requests and redirects them to different servers depending on the nature of the requested content. This is how, for example, the Nginx web server works, distributing requests between the frontend and the backend; the upstream module is responsible for balancing in Nginx. More details on Nginx balancing with various algorithms can be found in the official documentation.
Another example of an application-level balancing tool is pgpool, an intermediate layer between the client and the PostgreSQL DBMS server. It can distribute requests to database servers depending on their content: for example, read requests can be sent to one server and write requests to another.
Algorithms and methods of balancing
There are many different algorithms and methods for load balancing. When choosing a specific algorithm, one needs to proceed, firstly, from the specifics of a particular project and, secondly, from the goals we plan to achieve.
Among the goals for which balancing is used, the following should be highlighted:
- fairness: it is necessary to guarantee that system resources are allocated to the processing of every request and to prevent situations in which one request is being processed while all the others wait their turn;
- efficiency: all servers that process requests should be busy; it is desirable to avoid situations where one of the servers sits idle waiting for requests to process (let us note right away that in real practice this goal is not always achieved);
- shorter request execution time: the time between the start of request processing (or its queuing for processing) and its completion should be minimal;
- shorter response time: the response time to a user request should be minimal.
It is also highly desirable that the balancing algorithm has the following properties:
- predictability : you need to clearly understand in what situations and under what loads the algorithm will be effective for solving the set tasks;
- uniform loading of system resources ;
- scalability : the algorithm must remain operable with increasing load.
Round robin
Round Robin, or the round-robin algorithm, works on a circular principle: the first request is sent to one server, the next request to another, and so on until the last server is reached, after which the cycle starts over.
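In code, plain round robin reduces to cycling over the server list. A minimal Python sketch (the server names are made up for illustration):

```python
import itertools

servers = ["srv-a", "srv-b", "srv-c"]  # illustrative names
rr = itertools.cycle(servers)

# Ten consecutive requests are spread evenly, wrapping around
# after the last server is reached.
assignments = [next(rr) for _ in range(10)]
print(assignments)
```

Every server receives (almost) the same number of requests regardless of how heavy each request is, which is exactly the limitation discussed below.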
The most common implementation of this algorithm is, of course, the Round Robin DNS balancing method. As you know, any DNS server stores a pair of "host name - IP address" for each machine in a specific domain. This list might look like this:
example.com xxx.xxx.xxx.2
www.example.com xxx.xxx.xxx.3
You can associate several IP addresses with each name from the list:
example.com xxx.xxx.xxx.2
www.example.com xxx.xxx.xxx.3
www.example.com xxx.xxx.xxx.4
www.example.com xxx.xxx.xxx.5
www.example.com xxx.xxx.xxx.6
The DNS server goes through the records in the table and returns the next IP address for each new request: for example, xxx.xxx.xxx.2 for the first request, xxx.xxx.xxx.3 for the second, and so on. As a result, all servers in the cluster receive the same number of requests.
Among the undoubted advantages of this algorithm, the first to mention is its independence from the high-level protocol: the Round Robin algorithm can be used with any protocol in which the server is addressed by name.
Balancing based on the Round Robin algorithm does not depend on the load on the server: the caching DNS servers will help to cope with any influx of clients.
Using the Round Robin algorithm does not require communication between servers, so it can be used for both local and global balancing.
Finally, solutions based on the Round Robin algorithm have a low cost: in order for them to start working, just add a few records to the DNS.
The Round Robin algorithm also has a number of significant drawbacks. For load distribution under this algorithm to meet the criteria of fairness and efficiency described above, every server must have the same set of resources, and every operation must involve the same amount of resources. In actual practice these conditions are, in most cases, not met.
Also, when balancing according to the Round Robin algorithm, the load of one or another server within the cluster is completely ignored. Imagine the following hypothetical situation: one of the nodes is 100% loaded, while the others are only 10 - 15%. The Round Robin algorithm does not take into account the possibility of such a situation in principle, so the overloaded node will still receive requests. In this case, there is no question of any fairness, efficiency and predictability.
Due to the circumstances described above, the scope of the Round Robin algorithm is very limited.
Weighted round robin
This is an enhanced version of the Round Robin algorithm. The essence of the improvements is as follows: each server is assigned a weighting factor in accordance with its performance and power. This helps to distribute the load more flexibly: servers with more weight handle more requests. However, this does not solve all the problems with fault tolerance. More effective balancing is provided by other methods in which a greater number of parameters are taken into account when planning and distributing the load.
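The weighted variant can be sketched with a simple expanded-list implementation (weights and server names here are hypothetical): each server appears in the rotation as many times as its weight, so a server with weight 3 receives three times as many requests as one with weight 1.

```python
from itertools import cycle

# Weights reflect relative server capacity (hypothetical values).
weights = {"big-srv": 3, "small-srv": 1}

# Expanded-list implementation: each server is repeated in the
# rotation according to its weight.
expanded = [s for s, w in weights.items() for _ in range(w)]
wrr = cycle(expanded)

assignments = [next(wrr) for _ in range(8)]
print(assignments.count("big-srv"), assignments.count("small-srv"))
```

Production balancers usually use a "smooth" weighted round robin that interleaves servers rather than sending bursts to the heavy one, but the proportion of requests per server is the same.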
Least connections
In the previous section, we listed the main disadvantages of the Round Robin algorithm. Let's call one more: it does not take into account the number of currently active connections.
Consider a practical example. There are two servers, which we will conditionally call A and B. Fewer users are connected to server A than to server B, yet server A turns out to be more heavily loaded. How is this possible? The answer is quite simple: connections to server A are kept open longer than connections to server B.
The described problem can be solved with an algorithm known as least connections (leastconn for short). It takes into account the number of connections currently maintained by the servers: each new request is sent to the server with the fewest active connections.
There is an improved version of this algorithm, intended primarily for use in clusters consisting of servers with different technical characteristics and different performance. It is called Weighted Least Connections and takes into account in the load distribution not only the number of active connections, but also the weighting factor of the servers.
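Both variants boil down to a minimum search over the server pool. A Python sketch under simplifying assumptions (the connection counts and weights are invented; a real balancer tracks them as connections open and close):

```python
def pick_least_connections(active, weights=None):
    """Return the server with the fewest active connections.
    With weights, compare connections relative to server capacity
    (Weighted Least Connections)."""
    if weights is None:
        return min(active, key=active.get)
    return min(active, key=lambda s: active[s] / weights[s])

# Hypothetical snapshot of active connections per server.
active = {"a": 12, "b": 5, "c": 9}
print(pick_least_connections(active))  # plain leastconn picks "b"

# Server "a" is three times as powerful, so its 12 connections are a
# lighter *relative* load (12/3 = 4) than "b"'s 5 connections.
weights = {"a": 3, "b": 1, "c": 1}
print(pick_least_connections(active, weights))  # weighted variant picks "a"
```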
Among other advanced variants of the Least Connections algorithm, we should first of all highlight Locality-Based Least Connection Scheduling and Locality-Based Least Connection Scheduling with Replication Scheduling.
The first method was created specifically for caching proxy servers. Its essence is as follows: the largest number of requests is sent to the servers with the fewest active connections. Each server is assigned a group of client IP addresses; requests from these IPs are sent to their "native" server unless it is fully loaded, in which case the request is redirected to another server (which must be less than half loaded).
In the Locality-Based Least Connection Scheduling with Replication Scheduling algorithm, each IP address or group of IP addresses is assigned not to a separate server, but to a whole group of servers. The request is transferred to the least loaded server from the group. If all the servers from the “native” group are overloaded, then a new server will be reserved. This new server will be added to the group serving the IP from which the request was sent. In turn, the most loaded server from this group will be removed - this avoids redundant replication.
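The locality-based idea from the previous paragraph can be sketched as follows. This is a simplified illustration, not the exact LVS scheduler: the affinity table, loads, and capacities are hypothetical, and the "less than half loaded" threshold follows the rule described above.

```python
def route(client_ip, affinity, load, capacity):
    """Send the request to the client's 'native' server unless it is
    fully loaded; otherwise fall back to the least loaded server that
    is under half of its capacity."""
    native = affinity[client_ip]
    if load[native] < capacity[native]:
        return native
    # Fallback: candidates must be less than half loaded.
    candidates = [s for s in load if load[s] < capacity[s] / 2]
    return min(candidates, key=lambda s: load[s]) if candidates else native

affinity = {"203.0.113.7": "cache-1"}          # hypothetical IP -> native server
load = {"cache-1": 10, "cache-2": 2, "cache-3": 6}
capacity = {"cache-1": 10, "cache-2": 10, "cache-3": 10}

# cache-1 is full, cache-3 is more than half loaded, so cache-2 wins.
print(route("203.0.113.7", affinity, load, capacity))
```

Keeping clients on their native server matters for caching proxies: it maximizes cache hits, and the fallback only fires when the native node is saturated.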
Destination Hash Scheduling and Source Hash Scheduling
The Destination Hash Scheduling algorithm was designed to work with a cluster of caching proxy servers, but it is often used in other cases. In this algorithm, the server processing the request is selected from a static table by the recipient's IP address.
The Source Hash Scheduling algorithm is based on the same principles as the previous one, only the server that will process the request is selected from the table by the sender's IP address.
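Both hash-scheduling methods reduce to mapping an IP address onto the server list deterministically. A minimal Python sketch of Source Hash Scheduling (the server names are illustrative; a stable hash is used because Python's built-in hash() is randomized between runs):

```python
import hashlib

servers = ["s1", "s2", "s3"]  # hypothetical server pool

def pick_by_source_ip(client_ip):
    """Source Hash Scheduling: the same client IP always maps to the
    same server, with no per-connection state on the balancer."""
    digest = hashlib.md5(client_ip.encode()).digest()
    return servers[int.from_bytes(digest, "big") % len(servers)]

# The mapping is deterministic: repeated requests from one IP always
# land on the same server.
print(pick_by_source_ip("203.0.113.7") == pick_by_source_ip("203.0.113.7"))
```

Destination Hash Scheduling is identical except that the recipient's address is hashed instead of the sender's. Note that with plain modulo hashing, changing the pool size remaps most clients; consistent hashing is the usual remedy.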
Sticky sessions
Sticky Sessions is an algorithm for distributing incoming requests in which connections from the same client are always passed to the same server in the group. It is used, for example, in the Nginx web server, where user sessions can be pinned to a specific server with the IP hash method (see the official documentation for details). With this method, requests are distributed across servers based on the client's IP address: as the documentation states, the method ensures that "requests from the same client will always be passed to the same server". If the server assigned to a given address is unavailable, the request is redirected to another server. An example of a configuration file fragment:
upstream backend {
ip_hash;
server backend1.example.com;
server backend2.example.com;
server backend3.example.com;
server backend4.example.com;
}
Starting with Nginx version 1.2.2, a weight can be specified for each server.
Using this method involves certain problems. Session binding can break if the client uses a dynamic IP address, and in a situation where a large number of requests pass through a single proxy server, balancing can hardly be called effective or fair. These problems, however, can be solved using cookies. The commercial version of Nginx includes a special sticky module, which uses cookies for balancing; it also has free analogues, for example nginx-sticky-module.
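The cookie-based approach can be sketched as follows (a simplified Python illustration; the cookie name and server names are invented): on the first visit a server is chosen round-robin and its identity is stored in a cookie, and subsequent requests carrying that cookie go to the same server regardless of the client's IP.

```python
import itertools

servers = ["app-1", "app-2"]   # illustrative names
rr = itertools.cycle(servers)
COOKIE = "SRV"                 # hypothetical sticky-cookie name

def route(cookies):
    """If the client already carries the sticky cookie, honour it;
    otherwise pick a server round-robin and set the cookie so that
    subsequent requests stay on the same server."""
    server = cookies.get(COOKIE)
    if server not in servers:          # first visit or stale cookie
        server = next(rr)
    return server, {**cookies, COOKIE: server}

s1, c1 = route({})    # new client: gets a server and a cookie
s2, _ = route(c1)     # same client: stays on the same server
print(s1, s2)
```

Unlike IP hash, this keeps sessions stable even for clients with dynamic IPs or behind a shared proxy, which is exactly why the sticky modules use cookies.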
The sticky-sessions method can also be used in HAProxy; see its documentation for more information.
Conclusion
This article is essentially an introduction to load balancing; we will continue the discussion of this topic in further publications. If you have questions, comments, or additions, welcome to the comments. We would also be grateful if you shared non-trivial practical examples of load balancing from various projects.
Readers who for one reason or another cannot post comments here are invited to our blog.