After the launch, I received a lot of questions about exactly how resources in the cloud are counted. Some people intuitively understand what “processor time” is, but others want a detailed explanation. Since a detailed explanation would take up too much space in the general announcement, I moved it into a separate post. This format also lets me describe in more detail how Xen and the virtual machines interact. The level of this text is popular science: I will not go into the weeds of ring buffers, event masking, the “credit scheduler”, and so on. Instead, I will try to explain in plain language how the hypervisor manages guest machines.

What is “CPU time”? At first we wanted to use the more familiar term “machine time”, which dates back to the mainframe era, when the idea of time-sharing a computer was just being born, but we stopped ourselves in time. The machine time of those years covered all the resources used by the machine, whereas in our case we are talking only about the processor, since each limited resource is accounted for separately.
So, what is “processor time”, and how can it be that one virtual machine accumulates 4 hours of it in a day, while another racks up 30 “hours” in ten hours?
Selectel Cloud runs on Xen, or more precisely on the Xen Cloud Platform, in which Xen acts as the hypervisor.
Xen has the concept of a “domain scheduler”. Leaving aside the difference between a domain and a virtual machine (a domain is a running virtual machine: when the virtual machine reboots, a new domain is created; when the virtual machine is turned off, there is no domain, but the machine itself still exists), we can treat this as a virtual machine scheduler. Those familiar with how modern operating systems work have probably already guessed that the domain scheduler is suspiciously similar to the process scheduler in those same operating systems.
What does the work of a virtual machine look like?
An event occurs: a network packet arrives, a timer fires, a reboot signal comes in, etc. Xen tells the processor to start running the virtual machine (more precisely, the domain, but within this explanation we will treat these concepts as equivalent). The kernel of the virtual machine handles the event that woke it up. If necessary, it wakes user processes. The processes do their work and tell the kernel “all done.” The kernel finishes its own business and in turn tells the hypervisor (Xen): “that's it, I'm done.” After that, Xen stops executing the machine. It does nothing at all, in the literal sense of the word. The machine stays in this state until a new event occurs.
On modern machines these events occur at a tremendous rate. For example, if a file is being downloaded at 5 MB/s, then with a packet size of 1500 bytes that is more than 3000 packets per second. Each packet is a separate interrupt (more precisely, Xen is smarter than that: several events can be merged into one, so sometimes a virtual machine is even slightly faster than bare hardware). And every such event wakes the machine up. But modern processors are so fast that after each wake-up the kernel of the virtual machine and the processes (for example, Apache or nginx) have time to do their work and fall asleep again. Serving static files at 5 MB/s is a very low load, about 1-2% of one processor core. So, even though events arrive roughly every 300 microseconds, the virtual machine runs for 3-6 microseconds, tells the hypervisor “I'm done”, and sleeps for the remaining 294-297 microseconds. Then it wakes up again, does its work, and falls asleep again. So it turns out that the virtual machine simply sleeps most of the time.
Those moments when the virtual machine is actually running are exactly what “CPU time” is.
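The arithmetic behind these figures can be checked with a short sketch. It assumes that the 5 MB/s figure means 5,000,000 bytes per second and that every packet causes exactly one wake-up (the text notes that Xen may merge events, so this is an upper bound). The function names are made up for this illustration.

```python
# Back-of-the-envelope check of the figures in the text.
# Assumption: 5 MB/s is taken as 5,000,000 bytes per second.
THROUGHPUT_BYTES_PER_S = 5_000_000
PACKET_SIZE_BYTES = 1500


def packets_per_second(throughput_bytes_per_s, packet_size_bytes):
    """How many packets (and so wake-up events) arrive each second."""
    return throughput_bytes_per_s / packet_size_bytes


def interval_us(pps):
    """Average gap between two events, in microseconds."""
    return 1_000_000 / pps


def busy_us(gap_us, load_fraction):
    """Time the VM actually runs per event at a given load of one core."""
    return gap_us * load_fraction


pps = packets_per_second(THROUGHPUT_BYTES_PER_S, PACKET_SIZE_BYTES)
gap = interval_us(pps)

for load in (0.01, 0.02):
    running = busy_us(gap, load)
    print(f"load {load:.0%}: runs {running:.1f} us, "
          f"sleeps {gap - running:.1f} us of every {gap:.0f} us")
```

At a 1-2% load the machine is awake for only a few microseconds out of every 300, which is where the "sleeps most of the time" conclusion comes from.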
A thoughtful reader may ask: what if the virtual machine never says “I'm done”? If we had Windows 3.11, with its cooperative multitasking, the other machines would not get their fair share of time. But Xen uses preemptive multitasking: a virtual machine that runs too greedily is simply suspended. Forcibly. And later resumed.
This situation typically arises when processor time is scarce, and the Xen authors have spent thousands of hours developing fair schedulers that, when the processor is overloaded, allocate time so that everyone keeps running more or less evenly.
However, in the real conditions of modern hosting, processors are so fast that the processor is the least demanded and most idle resource, and in 99% of cases there is no competition for it at all.
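To make the difference between cooperative and preemptive multitasking concrete, here is a toy round-robin sketch. It is not Xen's actual scheduler and does not model weights, caps, or credits. The names `run_round_robin` and `finish_time`, the 30 ms slice, and the two example domains are all assumptions made up for this illustration.

```python
from collections import deque


def run_round_robin(demands_ms, slice_ms):
    """Run each domain for at most slice_ms, then preempt it and requeue it.

    Passing slice_ms=float("inf") imitates cooperative multitasking:
    a domain keeps the CPU until it finishes on its own.
    """
    queue = deque(demands_ms.items())
    timeline = []  # list of (domain name, milliseconds it ran this turn)
    while queue:
        name, remaining = queue.popleft()
        ran = min(remaining, slice_ms)
        timeline.append((name, ran))
        if remaining > ran:
            queue.append((name, remaining - ran))  # preempted, goes to the back
    return timeline


def finish_time(timeline, name):
    """Wall-clock moment (ms) at which the named domain ran for the last time."""
    elapsed, done = 0, None
    for who, ran in timeline:
        elapsed += ran
        if who == name:
            done = elapsed
    return done


demands = {"greedy": 100, "polite": 5}  # CPU each domain wants, in ms
preemptive = run_round_robin(demands, slice_ms=30)
cooperative = run_round_robin(demands, slice_ms=float("inf"))
print("preemptive: polite done at", finish_time(preemptive, "polite"), "ms")
print("cooperative: polite done at", finish_time(cooperative, "polite"), "ms")
```

In both runs the greedy domain receives the same 100 ms of CPU time in total; preemption only changes how long the polite one has to wait.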
CPU time is the time during which the virtual machine is running. If it ran for 2 seconds in an hour, that is 2 seconds of CPU time. If it ran for 40 minutes, that is forty minutes. CPU time has nothing to do with “real” wall-clock time. Since Xen is in command of the virtual machines, Xen knows how long each machine has run, down to the nanosecond. We round this value to microseconds (to avoid problems with int64), and only whole seconds are recorded in billing (the fractional part accumulates until it adds up to a full second). Money for processor time is charged as soon as at least 1 kopeck's worth has accumulated (currently 36 seconds). For comparison, booting a virtual machine consumes about 3-6 seconds of CPU time, and this is the most “expensive” operation in the domain life cycle.
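The accumulation rules described above can be sketched roughly as follows. This is only an illustration of the described behavior, not the actual billing code: the class `CpuMeter`, its fields, and the rounding method are assumptions. The only figure taken from the text is 36 seconds per kopeck.

```python
SECONDS_PER_KOPECK = 36  # from the text: 1 kopeck currently buys 36 s of CPU time


class CpuMeter:
    """Hypothetical accumulator: ns -> us -> whole seconds -> kopecks."""

    def __init__(self):
        self.pending_us = 0        # fractional part, not yet a whole second
        self.unbilled_s = 0        # whole seconds not yet worth a kopeck
        self.charged_kopecks = 0   # total charged so far

    def add_run(self, run_ns):
        """Record one stretch of running time; return kopecks charged now."""
        # Round nanoseconds to the nearest microsecond (integer arithmetic).
        self.pending_us += (run_ns + 500) // 1000
        # Only whole seconds reach billing; the remainder keeps accumulating.
        whole_s, self.pending_us = divmod(self.pending_us, 1_000_000)
        self.unbilled_s += whole_s
        # Charge as soon as at least one kopeck's worth has accumulated.
        kopecks, self.unbilled_s = divmod(self.unbilled_s, SECONDS_PER_KOPECK)
        self.charged_kopecks += kopecks
        return kopecks


meter = CpuMeter()
first = meter.add_run(35_500_000_000)   # 35.5 s: not yet a kopeck
second = meter.add_run(500_000_000)     # +0.5 s: 36 s in total
print(first, second, meter.charged_kopecks)
```

At 36 seconds per kopeck, a full hour of CPU time comes to 100 kopecks.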
Another important detail is multiprocessing. Multiple processor cores are in effect independent processors, and they can work in parallel. Suppose we are serving 5 MB/s to several users. At some point we need to send a new packet before we have finished sending the previous one (for example, the gap between them was 0.5 microseconds). With a single processor, this request would wait in the queue and be processed after the first one. But with several processors, the request is handled by the first free core, independently of the ones already busy.
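The queueing effect can be shown with a small simulation. It is a simplified model: every request is assumed to need the same fixed amount of CPU (3 microseconds here, an arbitrary figure), and the function name `completion_times` is made up for this illustration.

```python
import heapq


def completion_times(arrivals_us, service_us, cores):
    """Return when each request finishes if it goes to the first free core."""
    free_at = [0.0] * cores          # moment at which each core becomes free
    heapq.heapify(free_at)
    finished = []
    for arrival in sorted(arrivals_us):
        core_free = heapq.heappop(free_at)   # earliest available core
        start = max(arrival, core_free)      # wait if that core is still busy
        finish = start + service_us
        heapq.heappush(free_at, finish)
        finished.append(finish)
    return finished


arrivals = [0.0, 0.5]  # second request arrives 0.5 us after the first
print("1 core: ", completion_times(arrivals, service_us=3.0, cores=1))
print("2 cores:", completion_times(arrivals, service_us=3.0, cores=2))
```

With one core the second request finishes at 6 microseconds, with two cores at 3.5. The total CPU time spent is 6 microseconds in both cases; only the waiting differs.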
If the load is high, several processors end up working simultaneously. Each of them is doing work, so their processor time is summed. Two cores loaded at the same time means 2 seconds of CPU time per second. Eight means eight. In reality, though, it usually turns out that several cores are busy but not fully: at one moment 2 cores are working, at another 3, and at another none. So it is quite possible to see 10 minutes of processor time per hour on an 8-core machine serving tens of thousands of clients.
If the machine's load is below 100% (that is, it consumes less than an hour of processor time per hour), then formally it could be limited to a single core.
But remember what I said above about serving clients simultaneously? Several cores make the machine more “responsive” to requests, even though a single core might have coped, at the cost of a longer response time per request.
This, by the way, also answers one more question: does the number of cores affect the CPU time consumed? The answer is no. If the cores are idle, no processor time is used. A larger number of cores only reduces the delay in serving simultaneous requests from multiple clients.
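How CPU time adds up across cores can be sketched like this. The data layout (a dictionary mapping a core number to a list of busy intervals in seconds) and the function name `cpu_seconds` are assumptions for illustration, and the 75 seconds per core is chosen only so that the total matches the 10-minute example.

```python
def cpu_seconds(busy_by_core):
    """Sum the busy time of every core.

    Intervals on different cores that overlap in wall-clock time are all
    counted, which is why CPU time can exceed elapsed time.
    """
    return sum(end - start
               for intervals in busy_by_core.values()
               for start, end in intervals)


# Two cores fully loaded during the same wall-clock second.
two_cores = {0: [(0.0, 1.0)], 1: [(0.0, 1.0)]}

# Eight cores, each busy for only 75 seconds somewhere within one hour.
eight_cores = {core: [(0.0, 75.0)] for core in range(8)}

print(cpu_seconds(two_cores), "CPU seconds in one second")
print(cpu_seconds(eight_cores) / 60, "CPU minutes in one hour")
```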

Finally, a little about how to understand expressions like “gives time” and “allocates time”. The processor is a dumb piece of silicon. All it can do is execute code (and respond to interrupts). The processor has no idea whether it is running a “virtual machine domain” or a copy of Angry Birds. So concepts like “domain” and “hypervisor” are, in a sense, a convention. When we say “the virtual machine ran for 10 ms”, we actually mean “the processor executed the virtual machine's code for 10 ms”. When we say “the hypervisor preempted the virtual machine”, we actually mean “on a timer interrupt, the processor updated the time counter, saved the context, and transferred control somewhere other than the point where the timer interrupted it”. Turning an object (code) into a subject capable of acting greatly simplifies the explanation: every program has a behavior algorithm, and it is easier to say “the program behaves this way” than “the processor, executing the program, does this and that.”
Now a little about how much CPU is actually consumed. At the beginning of the article there is a graph from a very busy server that runs Asterisk with calls for an entire company, a web server, statistics collection from routers, etc. Below is a website with approximately 5,000 unique visitors per day. This speaks to the question of whether modern server applications load the processor heavily (cyan on the charts is idle processor time).