In a Kubernetes cluster, a node may die or be restarted.
Tools like Kubernetes are designed for high availability, reliable operation, and automatic recovery in such scenarios, and Kubernetes does indeed handle this very well.

However, you may notice that when a node crashes, its pods are still considered running for some time and keep receiving requests that are no longer served.
In my opinion, this time is too long by default, and it can be reduced. It is controlled by several parameters configured in the Kubelet and the Controller Manager.
Here is a description of the processes that occur when a node crashes (the default values are also summarized in a sketch right after this list):
1. The Kubelet sends its status to the masters with --node-status-update-frequency=10s;
2. The node in Kubernetes dies.
3. The kube-controller-manager daemon, which monitors the nodes, checks on the masters, every --node-monitor-period=5s, the node status received from the Kubelet.
4. Kube-controller-manager sees that the node is not responding and, during the soft interval --node-monitor-grace-period=40s, waits before considering the node unhealthy. This parameter should be equal to node-status-update-frequency multiplied by N, where N is the number of attempts the Kubelet is allowed to make to report the node status. N is a constant set to 5 in the code (see the nodeStatusUpdateRetry variable in kubelet/kubelet.go).
Note that the default values do not match what the documentation says, because node-status-update-frequency × N ≠ node-monitor-grace-period (10 × 5 ≠ 40). As far as I understand it, the five status updates (one every 10 seconds) fit into 40 seconds because the first one is sent immediately and the second and subsequent ones follow at 10-second intervals, so all five attempts complete within 40 seconds.
That is, the real formula is:
node-status-update-frequency × (N - 1) = node-monitor-grace-period
Details can be found in the controller/node/nodecontroller.go code.
5. When the node is marked as unhealthy, kube-controller-manager evicts its pods, governed by the --pod-eviction-timeout=5m0s parameter.
This is a very important timeout, and by default it is 5 minutes, which in my opinion is very long. Even though the node is already marked as unhealthy, kube-controller-manager will not evict its pods for this period, so they remain reachable through their Services, but requests to them are not served.
6. Kube-proxy constantly monitors pod statuses through the API, so it will immediately notice the moment the pods are evicted and update the iptables rules of the node, after which the failed pods are no longer accessible.
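To make this concrete, here is roughly how the default values from the steps above look as daemon flags. This is only an illustration: the exact way kubelet and kube-controller-manager are started (systemd units, static pod manifests, and so on) and the set of available flags depend on your installation and Kubernetes version.

```
# Illustrative only: the flags described above, with their default values.

# Step 1: the Kubelet posts its status every 10 seconds.
kubelet --node-status-update-frequency=10s

# Steps 3-5: the controller manager checks the reported status every 5 seconds,
# marks the node unhealthy after 40 seconds (10s * (5 - 1), see the formula above),
# and starts evicting its pods only 5 minutes after that.
kube-controller-manager \
  --node-monitor-period=5s \
  --node-monitor-grace-period=40s \
  --pod-eviction-timeout=5m0s

# Worst case with these defaults: about 40s + 5m = 5m40s before the pods
# of a dead node stop receiving traffic.
```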
So, the values mentioned above can be changed to reduce the number of unserved requests when a node crashes.
For my Kubernetes cluster, I configured them as follows (a sketch of how these flags might be passed to the daemons follows the list):
- kubelet: --node-status-update-frequency=4s (instead of 10s)
- controller-manager: --node-monitor-period=2s (instead of 5s)
- controller-manager: --node-monitor-grace-period=16s (instead of 40s)
- controller-manager: --pod-eviction-timeout=30s (instead of 5m)
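For illustration, here is a minimal sketch of the tuned flags as command lines; the mechanism for actually passing them to the daemons (a systemd drop-in, a static pod manifest, etc.) varies between installations, so treat this as a sketch rather than a ready configuration.

```
# A sketch of the tuned values from the list above.

kubelet --node-status-update-frequency=4s

kube-controller-manager \
  --node-monitor-period=2s \
  --node-monitor-grace-period=16s \
  --pod-eviction-timeout=30s

# 4s * (5 - 1) = 16s, so node-status-update-frequency and
# node-monitor-grace-period still satisfy the formula from step 4.
```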
The results are good enough: a node failure is now detected and handled not in 5 minutes 40 seconds, but in 46 seconds (node-monitor-grace-period + pod-eviction-timeout: 16s + 30s instead of 40s + 5m).
P.S. from the translator: we are still choosing the optimal values for ourselves, depending on the network configuration (a single data center or several, a single rack, etc.), and plan to share the results of this research (i.e. the optimal values) once they are obtained empirically.