
CPU Load: when to start worrying?

This note is a translation of an article from Scout’s blog. It gives a simple, visual explanation of load average. The article is aimed at beginner Linux administrators, but more experienced admins may find it useful as well. If you are interested, read on.

You are probably already familiar with the concept of load average. It is the three numbers displayed when you run the top or uptime commands. They look like this:
 load average: 0,35, 0,32, 0,41 
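
To work with these three numbers in a script, they first have to be extracted from the text. Below is a minimal sketch, assuming the line looks like the sample above; the function name `parse_load_average` is made up for this illustration. It accepts both "." and "," as the decimal separator, since the sample here uses a comma.

```python
import re

# Matches the three figures after "load average:" with either "." or ","
# as the decimal separator (the separator depends on the locale).
_LOAD_RE = re.compile(
    r"load averages?:\s*(\d+[.,]\d+),?\s+(\d+[.,]\d+),?\s+(\d+[.,]\d+)"
)


def parse_load_average(line):
    """Return (one_min, five_min, fifteen_min) as floats from an uptime-style line."""
    match = _LOAD_RE.search(line)
    if match is None:
        raise ValueError("no load average found in: %r" % line)
    # Normalize a decimal comma to a dot before converting to float.
    return tuple(float(value.replace(",", ".")) for value in match.groups())
```

If the script runs on the machine being measured, Python's standard `os.getloadavg()` returns the same three values directly on Unix-like systems, so parsing is only needed for captured output.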

Most people intuitively understand that these three numbers are averages of processor load over progressively longer time intervals (one, five, and fifteen minutes), and that lower is better. Large numbers indicate that the server is overloaded. But where is the limit? Which values are “bad” and which are “good”? When is a high load average merely something to keep an eye on, and when should you drop everything and fix the problem as quickly as possible?
To begin with, let's see what load average means. Consider the simplest case: one server with a single-core processor.

Traffic flow analogy


A single-core processor is like a single-lane road. Imagine that you manage traffic on a bridge. Sometimes the bridge is so busy that cars have to wait in line to cross it. You want to let people know how long they will have to wait to get to the other side of the river. A good way to do this is to show how many cars are waiting in the queue at a given moment. If there are no cars in the queue, approaching drivers know that they can cross the bridge immediately. Otherwise, they understand that they will have to wait their turn.
So, Bridge Manager, what kind of notation will you use? How about this:

[Image: the bridge at load average = 1.00]
[Image: the bridge at load average = 0.50]
[Image: the bridge at load average = 1.70]
This is the basic meaning of CPU load. The “cars” are processes that either use a slice of processor time (“cross the bridge”) or wait in the queue. In Unix this is called the run-queue length: the number of processes currently running plus the number of processes waiting to run.
As the bridge manager, you would like your cars (processes) never to wait in line. It is therefore preferable for the processor load to stay below 1.00. Occasional bursts of traffic that push the load above 1.00 are fine, but if it stays above this value, it is time to start worrying.
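
The bridge arithmetic can be sketched in a few lines. This is only an illustration of the analogy for a single core, not how the kernel computes anything; the function name `bridge_status` is hypothetical. It splits a load figure into the part that keeps the lane busy and the part that is left waiting.

```python
def bridge_status(load):
    """Split a single-core load figure into (busy share, waiting share)."""
    in_use = min(load, 1.0)                   # one lane cannot be more than 100% busy
    waiting = round(max(load - 1.0, 0.0), 2)  # demand beyond capacity queues up
    return in_use, waiting


# The three values from the bridge pictures.
for load in (1.00, 0.50, 1.70):
    in_use, waiting = bridge_status(load)
    print(f"load {load:.2f}: lane {in_use:.0%} busy, {waiting:.2f} waiting")
```

Under this reading, 0.50 leaves half of the lane free, 1.00 fills it exactly, and 1.70 means a full lane with a queue building up behind it.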

So you say 1.00 is the ideal load average?


Not really. The problem with a load of 1.00 is that you have no headroom left. In practice, many system administrators draw the line at 0.70.

What about multiprocessor systems? My server shows load 3.00 and everything is OK!


Do you have a four-processor system? Then a load average of 3.00 is fine.
On multiprocessor systems, the load is relative to the number of available processor cores. A 100% load corresponds to 1.00 on a single-core machine, 2.00 on a dual-core machine, 4.00 on a quad-core machine, and so on.
Returning to the bridge analogy, 1.00 means “one fully loaded lane”. If the bridge has only one lane, 1.00 means the bridge is 100% loaded, but if it has two lanes, it is only 50% loaded.
The same goes for processors: 1.00 means 100% utilization of a single-core processor, 2.00 means 100% utilization of a dual-core processor, and so on.
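
The per-core scaling is a single division. Here is a small sketch of it; the function name `load_per_core` is invented for this example.

```python
def load_per_core(load, cores):
    """Scale a load average by the number of cores; 1.0 means 100% loaded."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load / cores


# A load of 3.00 on four cores is 75% of capacity.
print(f"{load_per_core(3.00, 4):.0%}")
# A load of 1.00 on two "lanes" is only half of capacity.
print(f"{load_per_core(1.00, 2):.0%}")
```

Comparing the scaled value against 1.00 (or against a stricter line such as 0.70) lets the same rule of thumb be used on machines with different core counts.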

Multicore vs. multiprocessor


Which is better: one processor with two cores or two separate processors? In terms of performance, the two solutions are roughly equal. Yes, roughly: there are many nuances related to cache size, moving processes between processors, and so on. Nevertheless, the only characteristic that matters for measuring system load is the total number of cores, regardless of how many physical processors they sit on.
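This leads us to two more practical rules: on a multicore system the load should not exceed the number of cores, and only the total number of cores counts, not how they are spread across physical processors.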

Let's bring it all together


Let's look at the load averages using the uptime command:
 ~$ uptime
 09:14:44 up 1:20, 5 users, load average: 0,35, 0,32, 0,41

These are the figures for a system with a quad-core processor, and we can see that there is plenty of headroom. I will not even think about it until the load average exceeds 3.70.
Which average should I monitor: one, five, or 15 minutes?

For the thresholds discussed earlier (1.00 means fix it immediately, and so on), look at the five- and 15-minute averages. If the load exceeds 1.00 only in the one-minute average, you are still fine. If it exceeds 1.00 in the five- or 15-minute average, you should start taking action (taking into account, of course, the number of cores in your system).
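
The rule above can be written down as a small check. This is a sketch under the assumptions of this article only: the limit is 1.00 per core, and the function name `assess_load` and the three labels it returns are made up for the illustration.

```python
def assess_load(one_min, five_min, fifteen_min, cores=1):
    """Classify a load-average triple against the 'fully loaded' limit."""
    limit = 1.00 * cores  # 100% load scales with the number of cores
    if five_min > limit or fifteen_min > limit:
        return "take action"  # the overload is sustained
    if one_min > limit:
        return "short spike"  # only the one-minute average is high
    return "fine"


# The uptime sample from this article, on a quad-core system.
print(assess_load(0.35, 0.32, 0.41, cores=4))
```

A stricter monitoring setup could use a lower limit per core, such as the 0.70 line mentioned earlier, by changing the `limit` calculation.
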
The number of cores is important for interpreting load average correctly. How do I find it out?

The command cat /proc/cpuinfo displays information about all the processors in your system. To find the number of cores, pipe its output to grep:
 ~$ cat /proc/cpuinfo | grep 'cpu cores'
 cpu cores : 4
 cpu cores : 4
 cpu cores : 4
 cpu cores : 4
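
Note that the `cpu cores` field is printed once per logical processor and gives the number of cores in that processor's physical package, so the lines should be read rather than summed. The sketch below counts both logical processors and distinct physical cores from the same text. It assumes the x86 layout of /proc/cpuinfo, with `processor`, `physical id` and `core id` fields in blank-line-separated stanzas; other architectures may omit some of these fields. The function name `count_cores` is hypothetical.

```python
def count_cores(cpuinfo_text):
    """Return (logical_cpus, physical_cores) from /proc/cpuinfo-style text."""
    logical = 0
    physical = set()  # distinct (physical id, core id) pairs
    physical_id = core_id = None
    # The extra empty string flushes the last stanza.
    for line in cpuinfo_text.splitlines() + [""]:
        if not line.strip():
            if physical_id is not None and core_id is not None:
                physical.add((physical_id, core_id))
            physical_id = core_id = None
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "processor":
            logical += 1
        elif key == "physical id":
            physical_id = value
        elif key == "core id":
            core_id = value
    return logical, len(physical)
```

On a Linux machine it could be called as `count_cores(open("/proc/cpuinfo").read())`. For the logical count alone, Python's standard `os.cpu_count()` is simpler.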

Translator's notes


The text above is a translation of the article itself. A lot of interesting information can also be gleaned from its comments. For example, one commenter points out that not every system needs spare capacity with the load kept below 0.70: sometimes we need the server to work flat out, and in such cases a load average of 1.00 is just what the doctor ordered.

PS


Habr user dukelion added a valuable remark in the comments: in some scenarios, to get the most out of the hardware, it is worth keeping the load average slightly above 1.00, at the expense of the efficiency of each individual process.

PPS


Habr user enemo added in the comments that a high load average can be caused by a large number of processes that are currently performing read/write operations. That is, a load average above 1.00 on a single-core machine does not always mean that your system has no spare processor capacity. The reasons behind the figure need to be studied more carefully. By the way, this is a good topic for a new post on Habr :-)
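
One way to start such a study on Linux is to look at process states. Linux counts processes in uninterruptible sleep (state D, typically waiting on disk I/O) in the load average together with runnable ones (state R). The sketch below tallies both from the contents of /proc/<pid>/stat files; the function names are invented for this example, and the lines are passed in as strings so that the parsing can be tried on any machine.

```python
def state_from_stat(stat_line):
    """Extract the one-letter process state from a /proc/<pid>/stat line."""
    # The command name sits in parentheses and may itself contain spaces
    # or ")", so split at the last closing parenthesis.
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def count_load_contributors(stat_lines):
    """Count processes in state R (runnable) and D (uninterruptible sleep)."""
    counts = {"R": 0, "D": 0}
    for line in stat_lines:
        state = state_from_stat(line)
        if state in counts:
            counts[state] += 1
    return counts
```

The input lines could be collected by reading every file matching `/proc/[0-9]*/stat`, skipping processes that exit in the meantime. Many D entries alongside a mostly idle CPU suggest that the load comes from I/O rather than from the processor.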

PPPS


Habr user esvaf asks in the comments how to interpret load average values on a processor with Hyper-Threading technology. There is no definite answer at the moment. One article claims that a processor with two virtual cores on one physical core is 10-30% more productive than a plain single-core one. If we accept that assumption, I believe only the number of physical cores should be taken into account when interpreting the load average.

Source: https://habr.com/ru/post/216827/

