
GitHub will open-source GLB, its own load balancer


GitHub serves billions of HTTP, Git, and SSH connections every day. To get the best performance, the company runs on bare metal, that is, physical servers without an additional virtualization layer. Historically, however, network load balancing has been one of the harder components to optimize.

For its load balancing tier, GitHub historically scaled vertically: a small number of very large machines running haproxy. It also relied on a very specific hardware configuration that provided failover for dedicated 10G links.

Eventually, GitHub engineers realized that they would need to build their own solution tailored to the company's needs. They developed the GitHub Load Balancer (GLB), and GitHub has now decided to release it as an open-source project.
According to the engineers, simply scaling horizontally with standard load balancing schemes was not enough for GitHub.
As a project's load and traffic grow, vertical scaling (adding resources such as memory or faster disks to an existing server) sooner or later hits a limit and stops delivering noticeable gains. At that point, horizontal scaling is used instead: new servers are added and the load is redistributed among them.
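To make the horizontal scaling idea concrete, here is a minimal sketch that spreads a fixed number of requests across a pool of servers in round-robin order. The function name `distribute` and the server names are hypothetical, and round robin is just the simplest possible policy, not what GLB uses.

```python
from collections import Counter
from itertools import cycle, islice


def distribute(num_requests: int, servers: list[str]) -> Counter:
    """Assign requests to servers in round-robin order and count each server's load."""
    return Counter(islice(cycle(servers), num_requests))


# Scaling out from two to three servers lowers the load on each one.
print(distribute(1200, ["srv-1", "srv-2"]))           # 600 requests each
print(distribute(1200, ["srv-1", "srv-2", "srv-3"]))  # 400 requests each
```

Adding a server does not make any single machine faster; it lowers the share of the total load each machine has to carry, which is why horizontal scaling keeps working after vertical scaling has run out of headroom.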

With the bottlenecks of the previous system in mind, the developers settled on the following requirements for the new load balancer:

• Runs on standard, commodity hardware.
• Scales horizontally.
• Provides high availability, keeps TCP connections intact during normal operation and failover, and tolerates faults.
• Supports connection draining (stopping new connections to a host while existing ones finish).
• Balances load per service, with support for multiple services on each load balancer host.
• Can be iterated on and deployed like ordinary software.
• Is testable at each layer, not only through integration tests.
• Works across multiple points of presence and data centers.
• Is resilient to common DDoS attacks and provides tools to help mitigate new kinds of attacks.

Working with IP addresses


Normally, one public IP address is tied to one physical machine. DNS can split traffic across several IP addresses, and therefore across several servers, but each address still points to a single machine. GitHub needed a solution that would let one IP address be served by multiple machines.
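The DNS approach can be sketched as a toy round-robin DNS server that rotates the order of its A records on every query, so successive clients land on different addresses. The class name, the hostname, and the addresses (from the RFC 5737 documentation range) are hypothetical, and the assumption that a client connects to the first address in the answer is a simplification.

```python
from collections import deque


class RoundRobinDNS:
    """Toy DNS server that rotates the order of its A records on every query."""

    def __init__(self, records: dict[str, list[str]]):
        self._records = {name: deque(ips) for name, ips in records.items()}

    def resolve(self, name: str) -> list[str]:
        ips = self._records[name]
        answer = list(ips)
        ips.rotate(-1)  # the next query starts from the following address
        return answer


# Hypothetical name and documentation-range addresses (RFC 5737).
dns = RoundRobinDNS({"www.example.test": ["192.0.2.10", "192.0.2.11", "192.0.2.12"]})
# Assume a simple client that connects to the first address it receives.
for _ in range(4):
    print(dns.resolve("www.example.test")[0])
```

Traffic is spread out, but each address in the answer still maps to exactly one machine, which is the limitation described above.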

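To let one IP address be served by several machines, GitHub used ECMP (Equal-Cost Multi-Path) routing. The router treats several machines as equally good next hops for the same address and picks one for each connection by hashing fields from the packet headers, so every packet of a given connection reaches the same machine. This balances the load at the connection level.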
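A minimal sketch of ECMP-style next-hop selection: hash the connection's 5-tuple and take the result modulo the number of next hops. Real routers use vendor-specific hash functions and configurable field sets; SHA-256 here is only a deterministic stand-in, and the names `Flow`, `ecmp_next_hop`, and the `lb-*` hosts are hypothetical.

```python
import hashlib
from typing import NamedTuple


class Flow(NamedTuple):
    """The 5-tuple that identifies a TCP or UDP connection."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    proto: int  # 6 = TCP, 17 = UDP


def ecmp_next_hop(flow: Flow, next_hops: list[str]) -> str:
    """Pick one of several equal-cost next hops by hashing the flow's 5-tuple.

    Every packet of the same flow yields the same hash, so a connection
    stays pinned to one machine while different connections spread out.
    """
    key = "|".join(map(str, flow)).encode()
    digest = hashlib.sha256(key).digest()
    return next_hops[int.from_bytes(digest[:8], "big") % len(next_hops)]


# Hypothetical machines that all serve the same public IP 192.0.2.1.
hops = ["lb-a", "lb-b", "lb-c"]
flow = Flow("198.51.100.7", 51234, "192.0.2.1", 443, 6)
assert ecmp_next_hop(flow, hops) == ecmp_next_hop(flow, hops)  # stable per flow

# Many client connections (different source ports) spread across the hops.
spread = {ecmp_next_hop(flow._replace(src_port=p), hops) for p in range(40000, 41000)}
print(sorted(spread))
```

A plain modulo has a known drawback: when a machine is added or removed, most flows hash to a different next hop and their TCP connections break. That is the kind of problem the requirement for TCP connection stability during failover addresses.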

L4 / L7 separation


Load balancing is split into two tiers, L4 and L7. At L4, the router uses ECMP to distribute connections across the L7 tier: machines running the proxy software that handles application traffic (haproxy, for example).
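The two-tier split can be modeled roughly as follows: the L4 step sees only IP and TCP headers and hashes the connection to choose a proxy host, while the L7 step has parsed the HTTP request and routes on its contents (here, the Host header). This is a toy model, not GLB's or haproxy's actual logic; the proxy names, hostnames, backend pools, and function names are all hypothetical.

```python
import hashlib

# Hypothetical L7 proxy hosts that the router reaches via ECMP.
PROXIES = ["proxy-1", "proxy-2", "proxy-3"]

# Hypothetical L7 routing table: HTTP Host header -> backend pool.
BACKENDS = {
    "www.example.test": ["web-1", "web-2"],
    "api.example.test": ["api-1", "api-2"],
}


def l4_pick_proxy(src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> str:
    """L4: only IP and TCP headers are visible, so hash the connection tuple."""
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    h = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return PROXIES[h % len(PROXIES)]


def l7_pick_backend(host: str, request_no: int) -> str:
    """L7: the proxy has parsed the HTTP request and routes on its contents."""
    pool = BACKENDS[host]
    return pool[request_no % len(pool)]  # round-robin within the pool


# One client connection carrying two HTTP requests to the API host.
proxy = l4_pick_proxy("198.51.100.7", 51234, "192.0.2.1", 443)
backends = [l7_pick_backend("api.example.test", n) for n in range(2)]
print(proxy, backends)
```

Keeping the L4 tier ignorant of HTTP lets it stay simple and fast, while the L7 tier can make per-service decisions that require reading the request.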

In upcoming posts, GitHub engineers promise to describe GLB in more detail and to talk about the migration to the new load balancing system.

Source: https://habr.com/ru/post/310852/

