
Kubernetes Network Performance Comparison



Kubernetes requires that each container in a cluster has a unique, routable IP. Kubernetes does not assign IP addresses by itself, leaving this task to third-party solutions.

The purpose of this study is to find the solution with the lowest latency, the highest throughput, and the lowest setup cost. Since our workload is latency-sensitive, we measure high-percentile latencies under a fairly active network load. In particular, we focus on performance at 30-50 percent of the maximum load, since this best reflects typical situations for non-overloaded systems.

Options


Docker with --net=host


Our reference setup, against which all the other options were compared.

The --net=host option means that containers inherit the IP addresses of their host machines, i.e. there is no network containerization.

The absence of network containerization a priori performs better than any implementation of it, which is why we used this setup as the baseline.
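For reference, starting a container in the host's network namespace looks like this (a minimal sketch; the nginx image is just an example):

    # Run a container directly in the host network namespace:
    # it shares the host's IP addresses and interfaces.
    docker run -d --net=host nginx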

Flannel


Flannel is a virtual network solution maintained by the CoreOS project. It is well tested and production-ready, so its implementation cost is minimal.

When a machine running flannel joins the cluster, flannel does three things:

  1. Assigns a subnet to the new machine using etcd .
  2. Creates a virtual bridge interface on the machine (the docker0 bridge).
  3. Configures the packet forwarding backend :

    • aws-vpc - registers the machine's subnet in the Amazon AWS route table. The number of entries in this table is limited to 50, i.e. you cannot have more than 50 machines in a cluster if you use flannel with aws-vpc . In addition, this backend works only with Amazon AWS;
    • host-gw - creates IP routes to subnets via the IP addresses of the remote machines. Requires direct L2 connectivity between the hosts running flannel;
    • vxlan - creates a virtual VXLAN interface.

Because flannel uses the bridge interface to forward packets, each packet passes through two network stacks when sent from one container to another.
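The backend is chosen in flannel's network configuration stored in etcd. A minimal sketch, assuming etcd v2 tooling and flannel's default key (the address range is a placeholder):

    # Write the flannel network config to etcd; flannel reads it at startup.
    # 10.50.0.0/16 is an example range.
    etcdctl set /coreos.com/network/config \
        '{"Network": "10.50.0.0/16", "Backend": {"Type": "host-gw"}}'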

IPvlan


IPvlan is a driver in the Linux kernel that allows you to create virtual interfaces with unique IP addresses without the need to use a bridge interface.

To assign an IP address to a container with IPvlan, you need to:

  1. Create a container without any network interface.
  2. Create an ipvlan interface in the default network namespace.
  3. Move the interface into the container's network namespace.

IPvlan is a relatively new solution, so there are no tools to automate this process yet. This makes deploying IPvlan across many machines and containers more complicated, i.e. the implementation cost is high; the sketch below shows the manual steps. However, IPvlan does not require a bridge interface and forwards packets directly from the NIC to the virtual interface, so we expected it to perform better than flannel.
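A minimal sketch of those steps, assuming the host NIC is eth0 and the container's network namespace is visible as ns1 (both names, and the address, are placeholders):

    # Create an ipvlan interface on top of the physical NIC (L2 mode).
    ip link add link eth0 name ipvl0 type ipvlan mode l2
    # Move it into the container's network namespace.
    ip link set ipvl0 netns ns1
    # Assign an address and bring the interface up inside the namespace.
    ip netns exec ns1 ip addr add 10.50.1.10/24 dev ipvl0
    ip netns exec ns1 ip link set ipvl0 up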

Load test script


For each option, we performed the following steps:

  1. Set up the network on two physical machines.
  2. Launched tcpkali in a container on one machine, configured to send requests at a constant rate.
  3. Launched nginx in a container on the other machine, configured to respond with a fixed-size file.
  4. Collected system metrics and the tcpkali results.

We ran this test with different numbers of requests: from 50,000 to 450,000 requests per second (RPS).

For each request, nginx responded with a static file of fixed size: 350 bytes (100 bytes of body and 250 bytes of headers) or 4 kilobytes.
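An illustrative pair of commands for one such run; the flags, rates, file name, and address are placeholders, and the exact syntax should be checked against tcpkali --help:

    # Machine B: nginx serving fixed-size static files from a host directory.
    docker run -d --net=host -v /srv/www:/usr/share/nginx/html:ro nginx

    # Machine A: 1000 connections, each sending 150 requests/s for 60 s,
    # i.e. ~150,000 RPS in total.
    tcpkali -c 1000 -T 60s -r 150 -e \
        -m 'GET /350b.txt HTTP/1.1\r\nHost: bench\r\n\r\n' \
        10.50.0.1:80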

Results


  1. IPvlan showed the lowest latency and the best maximum throughput. Flannel with host-gw and aws-vpc follows with similar metrics, with host-gw doing better under maximum load.
  2. Flannel with vxlan showed the worst results in all tests. However, we suspect that its exceptionally bad 99.999th percentile was caused by a bug.
  3. The results for the 4-kilobyte response are similar to the 350-byte case, but there are two noticeable differences:

    • the maximum RPS is significantly lower, since about 270,000 RPS is enough with 4-kilobyte responses to fully load the 10-gigabit NIC;
    • IPvlan is much closer to --net=host when approaching a bandwidth limit.

Our current choice is flannel with host-gw . It has few dependencies (in particular, it requires neither AWS nor a new version of the Linux kernel), it is easy to set up compared to IPvlan, and it offers sufficient performance. IPvlan is our fallback: if at some point flannel gains IPvlan support, we will switch to that option.

Although aws-vpc performed slightly better than host-gw , its 50-machine limitation and the fact that it is tied to Amazon AWS were the deciding factors for us.

50,000 RPS, 350 bytes



At 50,000 requests per second, all candidates showed acceptable performance. The main trend is already visible: IPvlan shows the best results, host-gw and aws-vpc follow, and vxlan comes last.

150,000 RPS, 350 bytes




Latency percentiles at 150,000 RPS (≈30% of maximum RPS), ms


IPvlan is slightly better than host-gw and aws-vpc , but it has the worst 99.99th percentile. host-gw performs slightly better than aws-vpc .

250,000 RPS, 350 bytes




We consider such a load typical for production, so these results are especially important.

Latency percentiles at 250,000 RPS (≈50% of maximum RPS), ms


IPvlan again shows the best performance, but aws-vpc has the best results at the 99.99th and 99.999th percentiles. host-gw outperforms aws-vpc at the 95th and 99th percentiles.

350,000 RPS, 350 bytes




In most cases, the latency is close to the 250,000 RPS (350 bytes) results, but it grows rapidly beyond the 99.5th percentile, which means we are approaching the maximum RPS.

450,000 RPS, 350 bytes





Interestingly, host-gw shows much better performance than aws-vpc :


500,000 RPS, 350 bytes


At 500,000 RPS, only IPvlan still works, even outperforming --net=host , but the latency is so high that we cannot consider it acceptable for latency-sensitive applications.


50,000 RPS, 4 kilobytes




Larger responses (4 kilobytes versus the 350 bytes tested earlier) create a higher network load, but the leaderboard remains practically unchanged:

Latency percentiles at 50,000 RPS (≈20% of maximum RPS), ms


150,000 RPS, 4 kilobytes




host-gw has a surprisingly poor 99.999th percentile, but it still shows good results at the lower percentiles.

Latency percentiles at 150,000 RPS (≈60% of maximum RPS), ms


250,000 RPS, 4 kilobytes




This is the maximum RPS with the largest response size (4 KB). aws-vpc significantly outperforms host-gw here, unlike the small-response case (350 bytes).

vxlan was again excluded from the graph.

Test environment


The basics


To better understand this article and reproduce our test environment, you should be familiar with the basics of high-performance networking.

These articles contain useful information on this topic:


Machines



Configuration


Modern NICs use Receive Side Scaling (RSS) over multiple interrupt request ( IRQ ) lines. EC2 provides only two such lines in a virtualized environment, so we tested several RSS and Receive Packet Steering (RPS) configurations and ended up with the following settings, partly recommended by the Linux kernel documentation:
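A sketch of what such a setup looks like, assuming a NIC named eth0 with two IRQ lines and ten cores; the IRQ numbers and CPU masks are placeholders:

    # Pin the two NIC IRQ lines to cores 0 and 9 (IRQ numbers are examples).
    echo 001 > /proc/irq/24/smp_affinity
    echo 200 > /proc/irq/25/smp_affinity
    # Let RPS spread packet processing across the remaining cores 1-8.
    echo 1fe > /sys/class/net/eth0/queues/rx-0/rps_cpus
    echo 1fe > /sys/class/net/eth0/queues/rx-1/rps_cpus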


This configuration allowed us to evenly distribute the interrupt load across the processor cores and achieve better throughput while maintaining the same latency as in the other tested configurations.

Cores 0 and 9 handle only NIC interrupts and do not process packets themselves, yet they remain the busiest:



We also used tuned from Red Hat with the network-latency profile enabled.
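Enabling the profile is a single command (assuming tuned is installed):

    # Activate the low-latency network profile shipped with tuned.
    tuned-adm profile network-latency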

To minimize the influence of nf_conntrack , NOTRACK rules were added.
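For example, such rules in the raw table might look like this (port 80 stands in for our benchmark traffic):

    # Skip connection tracking for the benchmark traffic.
    iptables -t raw -A PREROUTING -p tcp --dport 80 -j NOTRACK
    iptables -t raw -A OUTPUT -p tcp --sport 80 -j NOTRACK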

The sysctl configuration was tuned to support a large number of TCP connections:

    fs.file-max = 1024000
    net.ipv4.ip_local_port_range = "2000 65535"
    net.ipv4.tcp_max_tw_buckets = 2000000
    net.ipv4.tcp_tw_reuse = 1
    net.ipv4.tcp_tw_recycle = 1
    net.ipv4.tcp_fin_timeout = 10
    net.ipv4.tcp_slow_start_after_idle = 0
    net.ipv4.tcp_low_latency = 1
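Settings like these are conventionally placed in /etc/sysctl.conf and applied with:

    # Reload kernel parameters from /etc/sysctl.conf.
    sysctl -p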

From the translator : Many thanks to our colleagues at Machine Zone, Inc. for this testing! It helped us, so we wanted to share it with others.

PS Perhaps you will also be interested in our article " Container Networking Interface (CNI) - network interface and standard for Linux-containers ."

Source: https://habr.com/ru/post/332432/

