
How the CPU Manager in Kubernetes works

Translator's note: This article was published on the official Kubernetes blog and was written by two Intel employees who are directly involved in the development of the CPU Manager, a new feature in Kubernetes that we covered in our Kubernetes 1.8 release review. At the time of writing (i.e., for K8s 1.11), this feature has beta status; its purpose is described in more detail later in the article.

This publication is about the CPU Manager, a beta feature in Kubernetes. The CPU Manager allows the kubelet, the Kubernetes node agent, to place workloads more effectively by assigning dedicated CPUs to the containers of a specific pod.



Sounds great! But will the CPU Manager help me?


It depends on your workload. A single compute node in a Kubernetes cluster can run many pods, and some of them can run CPU-intensive workloads. In this scenario, the pods may compete for the processor resources available on that node. When this competition intensifies, a workload may be moved to different CPUs depending on whether it is throttled and on which CPUs are available at scheduling time. There are also cases where a workload is sensitive to context switches. In all of these scenarios, workload performance may suffer.

If your workload is sensitive to such scenarios, you can enable the CPU Manager to provide better performance isolation by allocating dedicated CPUs to the workload.

The CPU Manager can help workloads with the following characteristics:

- sensitive to the effects of CPU throttling;
- sensitive to context switches;
- sensitive to processor cache misses;
- sensitive to cross-socket memory traffic.


OK! How to use it?


Using the CPU Manager is easy. First, enable it with the Static policy in the kubelet running on the cluster's compute nodes. Then set the Guaranteed Quality of Service (QoS) class for the pod. Request whole numbers of CPU cores (for example, 1000m or 4000m) for the containers that need dedicated cores. Create the pod in the usual way (for example, kubectl create -f pod.yaml) ... and voila: the CPU Manager will assign dedicated processor cores to each container of the pod according to its CPU requests.

apiVersion: v1
kind: Pod
metadata:
  name: exclusive-2
spec:
  containers:
  - image: quay.io/connordoyle/cpuset-visualizer
    name: exclusive-2
    resources:
      # Pod is in the Guaranteed QoS class because requests == limits
      requests:
        # CPU request is an integer
        cpu: 2
        memory: "256M"
      limits:
        cpu: 2
        memory: "256M"

Pod specification requesting 2 dedicated CPUs.
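
Enabling the Static policy itself happens on the kubelet side. Below is a minimal sketch of a KubeletConfiguration fragment, assuming the kubelet is configured via a config file; the reservation values are illustrative and have to be adapted to your nodes (the same policy can also be enabled with the --cpu-manager-policy=static kubelet flag):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cpuManagerPolicy: static
# The Static policy needs an explicit CPU reservation for Kubernetes and system
# daemons, so the shared pool is never completely handed out to exclusive containers
kubeReserved:
  cpu: "500m"
systemReserved:
  cpu: "500m"

Note that if the kubelet previously ran with a different policy, the old cpu_manager_state checkpoint usually has to be removed before restarting it with the new policy.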

How does the CPU Manager work?


Let us consider three kinds of CPU resource controls available in most Linux distributions that are relevant to Kubernetes and to the purposes of this publication. The first two are CFS shares (what is my weighted "fair" share of CPU time in the system) and CFS quota (what is the maximum CPU time allotted to me per period). The CPU Manager also uses a third one, called CPU affinity (on which logical CPUs I am allowed to run).

By default, all pods and containers running on a Kubernetes cluster node can run on any available cores of the system. The total amount of assignable shares and quota is limited by the CPU resources explicitly reserved for Kubernetes and system daemons. However, limits on the CPU time used can be set via CPU limits in the pod specification. Kubernetes uses the CFS quota to enforce CPU limits on pod containers.
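
As an illustration of how these controls relate to a pod specification, here is a sketch of a container whose requests and limits map onto CFS shares and quota; the pod name and image are placeholders, and the cgroup values in the comments are approximate, based on the usual conversion (about 1024 shares per requested CPU and a quota of "limit x 100 ms" per 100 ms period):

apiVersion: v1
kind: Pod
metadata:
  name: cfs-example            # hypothetical name
spec:
  containers:
  - name: app
    image: nginx               # any image; used only for illustration
    resources:
      requests:
        cpu: 500m              # -> cpu.shares of roughly 512 (500/1000 * 1024)
      limits:
        cpu: "1"               # -> cpu.cfs_quota_us of roughly 100000 with cpu.cfs_period_us = 100000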

When the CPU Manager is enabled with the Static policy, it manages a shared pool of CPUs. Initially this pool contains all the CPUs of the compute node. When the kubelet creates a container in a Guaranteed pod that requests whole dedicated processor cores, the CPUs assigned to that container are allocated to it for its lifetime and removed from the shared pool. The workloads of the remaining containers are migrated from these dedicated cores to other ones.

All containers without dedicated CPUs (Burstable, BestEffort, and Guaranteed with non-integer CPU requests) run on the cores remaining in the shared pool. When a container with dedicated CPUs terminates, its cores are returned to the shared pool.
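
For contrast with the exclusive-2 specification above, here is a sketch of a pod that is also in the Guaranteed class but requests a non-integer number of CPUs and therefore keeps running in the shared pool (the pod name is hypothetical; the image is the same demo image as above):

apiVersion: v1
kind: Pod
metadata:
  name: shared-1-5             # hypothetical name
spec:
  containers:
  - name: shared-1-5
    image: quay.io/connordoyle/cpuset-visualizer
    resources:
      # requests == limits, so the pod is Guaranteed, but the CPU request is not
      # an integer, so no dedicated cores are assigned and the container stays
      # in the shared pool
      requests:
        cpu: 1500m
        memory: "256M"
      limits:
        cpu: 1500m
        memory: "256M"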

More details, please ...




The diagram above shows the anatomy of the CPU Manager. It uses the UpdateContainerResources method of the Container Runtime Interface (CRI) to change the CPUs on which containers run. The manager periodically reconciles, via cgroupfs, the current state (State) of the CPU resources for each running container.
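
The reconciled state is also persisted in a checkpoint file on the node (typically /var/lib/kubelet/cpu_manager_state). The file is JSON and its exact layout differs between Kubernetes versions, so the following is only an illustrative sketch of the kind of information it holds:

# Illustrative sketch only; field names and format vary by version
policyName: static
defaultCpuSet: "0,25-47"       # CPUs remaining in the shared pool
entries:
  "<container-id>": "1-24"     # container -> exclusively assigned CPUs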

The CPU Manager uses policies to decide how to assign CPU cores. Two policies are implemented: None and Static. By default, starting with Kubernetes 1.10, it is enabled with the None policy.

The Static policy assigns dedicated CPUs to pod containers of the Guaranteed QoS class that request an integer number of cores. The Static policy tries to assign CPUs in the topologically best way: it prefers to allocate all CPUs from the same processor socket, then all hyperthreads of the same physical core, and only then individual logical CPUs, favoring those on the same socket.


How does the CPU Manager improve compute isolation?


With the Static policy enabled in the CPU Manager, workloads may show better performance for one of the following reasons:

- dedicated CPUs can be allocated to the workload container but not to other containers, so the workload does not share processor resources with co-located or aggressor workloads;
- there is less interference on the resources the workload uses (such as caches and memory bandwidth), because CPUs are partitioned between workloads;
- the CPU Manager allocates CPUs in a topologically sorted order on a best-effort basis, which reduces cross-socket traffic.


OK! Do you have any results?


To see the performance improvements and isolation provided by enabling the CPU Manager in the kubelet, we ran experiments on a compute node with two sockets (Intel Xeon CPU E5-2680 v3) and hyperthreading enabled. The node has 48 logical CPUs (24 physical cores, each with hyperthreading). The following shows the benefits of the CPU Manager in performance and isolation, as measured with benchmarks and real-world workloads in three different scenarios.

How to interpret the graphs?


For each scenario, box plots are shown that illustrate the normalized execution time and its variability when running a benchmark or a real workload with the CPU Manager turned on and off. Execution time is normalized to the best-performing run (1.00 on the Y axis represents the best run time; the smaller the value, the better). The height of a box shows the variation in performance. For example, if a box collapses to a line, there is no performance variation across those runs. Within a box, the middle line is the median, the upper edge is the 75th percentile, and the lower edge is the 25th percentile. The height of the box (i.e., the difference between the 75th and 25th percentiles) is the interquartile range (IQR). The whiskers show data outside this range, and the individual points show outliers. Outliers are defined as any data points that lie more than 1.5 x IQR below or above the corresponding quartile. Each experiment was run 10 times.

Protection against aggressor workloads


We ran six benchmarks from the PARSEC suite (the "victim" workloads) [for more details about victim workloads, see, for example, here - translator's note] co-located with a CPU-stressing container (the "aggressor" workload), with the CPU Manager turned on and off.

The aggressor container was launched in the Burstable QoS class, requesting 23 CPUs and started with the --cpus 48 flag. The benchmarks are run as pods of the Guaranteed QoS class, requiring a full socket worth of CPUs (that is, 24 CPUs on this system). The graphs below show the normalized execution time of the benchmark pod running next to the aggressor pod, with and without the Static policy of the CPU Manager. In all test cases, you can see improved performance and reduced performance variability when the policy is enabled.
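
For illustration, a sketch of what such an aggressor pod could look like; the pod name, image, and stress arguments are assumptions made for the example, and only the resource shape (Burstable with a 23-CPU request) follows from the text above:

apiVersion: v1
kind: Pod
metadata:
  name: aggressor              # hypothetical name
spec:
  containers:
  - name: aggressor
    image: progrium/stress     # hypothetical CPU-stressing image
    args: ["--cpu", "48"]      # spin up 48 CPU workers, matching the --cpus 48 flag above
    resources:
      # a CPU request without a matching limit puts the pod in the Burstable QoS
      # class, so it runs only on the shared pool of CPUs
      requests:
        cpu: 23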



Isolation of co-located workloads


This scenario demonstrates how the CPU Manager can be useful when many workloads are co-located. The box plots below show the performance of two benchmarks from the PARSEC suite (Blackscholes and Canneal) running in the Guaranteed (Gu) and Burstable (Bu) QoS classes, adjacent to each other, with the Static policy turned on and off.

Going clockwise from the top-left chart, we see the performance of Blackscholes in the Bu QoS class (top left), Canneal in the Bu QoS class (top right), Canneal in the Gu QoS class (bottom right), and Blackscholes in the Gu QoS class (bottom left). In each chart, these workloads are co-located (again going clockwise) with Canneal in the Gu QoS class (top left), Blackscholes in the Gu QoS class (top right), Blackscholes in the Bu QoS class (bottom right), and Canneal in the Bu QoS class (bottom left), respectively. For example, the Bu-blackscholes-Gu-canneal chart (top left) shows the performance of Blackscholes running in the Bu QoS class and co-located with Canneal in the Gu QoS class. In each case, the Gu QoS workload requires a full socket worth of cores (that is, 24 CPUs), while the Bu QoS workload requires 23 CPUs.

We see better performance and less performance variation for both co-located workloads in all tests. For example, look at Bu-blackscholes-Gu-canneal (top left) and Gu-canneal-Bu-blackscholes (bottom right). They show the performance of Blackscholes and Canneal running simultaneously with the CPU Manager turned on and off. In this case, Canneal receives dedicated cores from the CPU Manager, since it belongs to the Gu QoS class and requests an integer number of CPU cores. However, Blackscholes effectively also gets a dedicated set of CPUs, since it is the only workload in the shared pool. As a result, both Blackscholes and Canneal benefit from workload isolation when the CPU Manager is used.



Isolation for standalone workloads


This scenario demonstrates how the CPU Manager can be useful for standalone real-world workloads. We took two workloads from the official TensorFlow models: wide and deep and ResNet. The typical data sets are used for them (census and CIFAR10, respectively). In both cases, the pods (wide and deep, ResNet) request 24 CPUs, which corresponds to a full socket. As the graphs show, the CPU Manager provides better isolation in both cases.
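
For reference, a sketch of such a full-socket pod specification; the pod name and image are hypothetical placeholders for the TensorFlow workload, and only the Guaranteed class with a 24-CPU request follows from the text:

apiVersion: v1
kind: Pod
metadata:
  name: resnet-full-socket     # hypothetical name
spec:
  containers:
  - name: resnet
    image: tensorflow/tensorflow   # hypothetical image running the ResNet benchmark
    resources:
      # requests == limits and the CPU count is an integer, so the pod is Guaranteed
      # and the Static policy pins it to 24 dedicated CPUs (a full socket on this node)
      requests:
        cpu: 24
        memory: "8Gi"
      limits:
        cpu: 24
        memory: "8Gi"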



Limitations


Users may want to have CPUs allocated on the socket closest to the bus connecting to an external device, such as an accelerator or a high-performance network card, in order to avoid cross-socket traffic. This type of configuration is not yet supported in the CPU Manager. Since the CPU Manager makes a best-effort allocation of CPUs belonging to the same socket or physical core, it is sensitive to corner cases and can lead to fragmentation. The CPU Manager also does not take into account the isolcpus Linux kernel boot parameter, even though it is a popular practice for CPU isolation in some cases (for more information about this parameter, see, for example, here - translator's note).

Source: https://habr.com/ru/post/418269/

