This is the third part of the series. Previous parts: first.
This part covers memory management and virtual machine processors.
To understand how XCP works with memory, you first need to understand how Xen does. Unlike OpenVZ, Xen always allocates memory to a virtual machine (more precisely, to a domain) for exclusive use. A domain's memory belongs to that domain and to it alone. No overselling, no shared pages, no hypervisor swap (guest machines can, of course, swap internally). If you have 4GB, roughly 3.5GB can be divided between guest machines (about 512MB goes to dom0). How you divide it is up to you. But you cannot give a machine more memory than is physically available. Cannot. Period.
The management of the memory that is actually allocated, however, works very well. In Xen 3.4 the memory management mechanism (xenballoon) is built on an idea that takes some effort to grasp but is simple from the hypervisor's point of view: memory pages are transferred between the domain and the hypervisor.
A transfer means the sender ends up with less memory and the receiver with more. This mechanism is used not only by xenballoon but also by many drivers (instead of making copies, they simply hand pages of data to each other); only xenballoon, however, uses it at such scale. The balloon works exactly as its name suggests: it is "inflated" inside the domain, handing the space it occupies over to the hypervisor. When memory is given back to the domain, the balloon deflates.
Memory management is not free: it consumes a bit of the domain's memory for its own bookkeeping. In the worst case this is about 5%; the more memory, the smaller the percentage (at 16GB the overhead drops to roughly 1%).
There are two memory management scenarios: adding memory to a running machine and removing memory from it. Removal is simple: xenballoon hands pages over to the hypervisor. Adding is trickier: the kernel cannot actually grow its address space (Xen 3.4 did not yet support memory hotplug), so the scheme is implemented slightly differently: when a domain is created, some of its pages are marked as allocated to the domain and immediately handed back to the hypervisor (i.e., the balloon starts out pre-inflated), and when needed the hypervisor returns those pages to the domain.
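As a rough illustration, the pre-inflated balloon scheme can be sketched as a toy model (this is not the real xenballoon code, which lives in the guest kernel; all names here are invented for illustration):

```python
# Toy model of Xen's pre-inflated balloon (illustrative only; the real
# logic lives in the guest kernel's xenballoon driver).

class Domain:
    def __init__(self, static_max_pages, initial_pages):
        # All static_max_pages are assigned to the domain at creation,
        # but the pages above the initial allocation are immediately
        # handed back to the hypervisor: the balloon starts pre-inflated.
        self.static_max = static_max_pages
        self.balloon = static_max_pages - initial_pages  # pages held by hypervisor

    @property
    def usable_pages(self):
        return self.static_max - self.balloon

    def inflate(self, pages):
        """Give pages back to the hypervisor (the domain shrinks)."""
        if pages > self.usable_pages:
            raise ValueError("cannot balloon out more than the domain holds")
        self.balloon += pages

    def deflate(self, pages):
        """The hypervisor returns pages to the domain (the domain grows)."""
        if pages > self.balloon:
            raise ValueError("cannot grow past the static maximum")
        self.balloon -= pages

dom = Domain(static_max_pages=1024, initial_pages=512)
dom.deflate(256)   # "add memory": 512 -> 768 usable pages
dom.inflate(128)   # "remove memory": 768 -> 640 usable pages
print(dom.usable_pages)
```

The key point the model captures is that the domain can never grow past the page count fixed at creation time: growing is only ever deflating a balloon that was inflated up front.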
This model implies a theoretical ceiling (fixed when the domain is created, i.e., when the virtual machine starts) up to which the machine's memory can grow. The ceiling affects the size of the bookkeeping overhead and also artificially raises the lower memory limit (for a 16GB ceiling it is about 512MB; for 2GB, around 128MB). Thus we get two boundaries that are set when the domain starts: memory_static_max and memory_static_min. Even though the user can set memory_static_min arbitrarily low, the amount of memory will never drop below a certain value derived from the maximum. That value is hard-coded in xenballoon.c and protects against the OOM killer firing out of the blue (it guarantees enough memory for kernel overhead and a minimal set of software to function). In places it raises the bar too aggressively (especially with ceilings around 2-4GB), but on the whole the values are reasonable.
On top of these limits, two more (purely administrative) values are introduced: memory_dynamic_max and memory_dynamic_min. They define the boundaries within which memory can be regulated, and they can be changed on the fly (in other words, they are simply an administrative convention). There is also a target value (memory_target): the desired amount of memory. Most of the time the actual amount of memory matches it (memory = memory_target), but if the target cannot be satisfied, the two values may differ.
There are two ways to adjust a virtual machine's memory: manual and automatic. Manual sets the target to the specified value. Automatic sets the boundaries (maximum/minimum) and lets XCP regulate each domain's memory on its own.
This regulation is called Dynamic Memory Control and is handled by the squeezed daemon running on every host in the pool (it has nothing to do with Debian Squeeze; the name is a coincidence). The daemon gives new domains as much of the available memory as possible (but no more than dynamic-max), and when the hypervisor needs memory (for example, to start a new virtual machine) it squeezes all machines down (but not below dynamic-min). If squeezed fails to free the required amount of memory, the host refuses to start the new virtual machine (if the start command was issued without specifying a host, the machine will start wherever it can; if it said "start here", the launch fails).
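A hypothetical sketch of the arithmetic behind this squeezing (the real squeezed is a separate XCP service; the function and data here are invented for illustration):

```python
# Hypothetical sketch of squeezed's reclaim arithmetic (illustrative only;
# the real daemon is part of XCP and works through the balloon drivers).

def squeeze(domains, needed_mb):
    """Try to free needed_mb by lowering each domain's target toward its
    dynamic-min. Returns the amount freed; the caller refuses to start
    the new VM if it is less than needed_mb."""
    freed = 0
    for d in domains:
        reclaimable = d["target"] - d["dynamic_min"]
        take = min(reclaimable, needed_mb - freed)
        d["target"] -= take
        freed += take
        if freed >= needed_mb:
            break
    return freed

doms = [
    {"target": 2048, "dynamic_min": 1024},
    {"target": 4096, "dynamic_min": 2048},
]
print(squeeze(doms, 1536))  # 1536: 1024 from the first, 512 from the second
```

Note that a domain already sitting at its dynamic-min contributes nothing, which is exactly why a start request can fail even when the pool nominally has memory.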
The dynamic-min/max values can be changed on the fly, within the static limits, of course. Feasibility is not checked when the values change, so you can end up with a machine where memory = 512MB, dynamic-min = 2GiB, target = 4GiB, dynamic-max = 8GiB (on reboot, if the conditions cannot be met, the machine will not start).
In any case, on every change the following invariant is checked:
static-min ≤ dynamic-min ≤ target ≤ dynamic-max ≤ static-max
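The invariant is a direct chain of comparisons, so it transcribes into code one-to-one (function name invented for illustration):

```python
# The ordering invariant XCP enforces on every memory-parameter change:
# static-min <= dynamic-min <= target <= dynamic-max <= static-max.

def limits_ok(static_min, dynamic_min, target, dynamic_max, static_max):
    return static_min <= dynamic_min <= target <= dynamic_max <= static_max

# A consistent configuration (values in MB):
print(limits_ok(128, 512, 1024, 2048, 4096))   # True
# target above dynamic-max would be rejected:
print(limits_ok(128, 2048, 4096, 1024, 8192))  # False
```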
Xen supports dynamically turning processors on and off. (And yes, to answer a question from the comments: there can be more than 4 cores; I have personally seen a virtual machine with 16 cores, and the number of processors is not limited.) By default all machines are created with one processor; to enable multiprocessing, the machine must be shut down first. Three values determine the behavior:
- VCPUs-max - the maximum number of processors for the domain (fixed when the domain is created, i.e., when the virtual machine is turned on)
- VCPUs-number - the current number of processors
- VCPUs-at-startup - the number of processors when the machine is turned on (i.e., when the domain is created)
It must be said that Xen blatantly cheats here: it does not actually know how to attach processors on the fly. In reality all VCPUs-max processors are present in the virtual machine; some of them are simply marked offline. So attaching/detaching processors is not hotplug from the OS point of view but a plain enable/disable. That, however, does not make life easier for the guest system: the number of processors involved in scheduling still changes. Some programs go haywire when the processor count changes (the most characteristic is atop, which, after a couple of processors are attached, assumes they were always there but had been "stolen" by the hypervisor). Most programs do not care, because they never think about the number of processors. The "attached but disabled" processors do carry a small penalty: the kernel still knows about them and reserves memory for VCPUs-max cores, not just the enabled ones.
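The "all VCPUs-max processors exist, some are merely offline" scheme can be sketched as a toy model (illustrative only; this is not the hypervisor's actual interface, and the class name is invented):

```python
# Toy model of Xen's vCPU "hotplug": all VCPUs-max vCPUs exist from the
# start; changing VCPUs-number only flips their online flags.

class VcpuSet:
    def __init__(self, vcpus_max, vcpus_at_startup):
        assert vcpus_at_startup <= vcpus_max
        self.max = vcpus_max
        # The guest kernel reserves per-CPU structures for all of them,
        # which is the small memory penalty mentioned above.
        self.online = [i < vcpus_at_startup for i in range(vcpus_max)]

    def set_number(self, n):
        """Analogue of changing VCPUs-number: no CPUs appear or vanish,
        they are only marked online/offline."""
        if not 0 < n <= self.max:
            raise ValueError("must stay within 1..VCPUs-max")
        self.online = [i < n for i in range(self.max)]

    @property
    def number(self):
        return sum(self.online)

v = VcpuSet(vcpus_max=8, vcpus_at_startup=2)
v.set_number(6)          # "attach" 4 more: just flips flags
print(v.number, v.max)   # current vs. reserved processor count
```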
There can be more virtual processors than physical ones, but this makes the machine lag (instead of quietly executing code, the hypervisor constantly switches context between processors).
Since there are usually fewer processors than virtual machines, there is competition for CPU time, and Xen schedules virtual machines much the way an OS kernel schedules processes. At the moment there are three schedulers (they can be switched on the fly): fair, real-time, and experimental. Frankly, I have not looked into them deeply; the default "fair" one is enough for most tasks.
Relative to one another, virtual machines can have processor limits and relative priorities.
The limit is set by the cap value, which specifies the maximum allowed machine time per processor, as a percentage (note: per core, so cap = 75% with two cores means 150% of machine time; some documentation claims cap is the machine's total consumption, i.e. cap = 150, cap = 330, and so on; that is not true). cap = 10 gives us 10% of machine time and turns a powerful Xeon into a very sorry P2 (or even P1), which is felt most acutely while booting. In general it is not worth lowering cap below 25%: the machine may start to stall even at the console (a sprawling bash completion manages that easily).
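To make the per-core semantics concrete, here is a trivial sketch of the arithmetic (the function name is invented for illustration):

```python
# cap is per-vCPU, not per-VM: the machine's total allowed consumption
# is cap multiplied by its number of virtual processors.

def total_cpu_percent(cap, vcpus):
    """Maximum total machine time, in percent of one physical core."""
    return cap * vcpus

print(total_cpu_percent(75, 2))  # the cap=75, two-core example: 150
print(total_cpu_percent(10, 1))  # the "sorry P2" scenario: 10
```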
The second mechanism is more interesting: relative priority. It is measured in abstract units called weight. The virtual machine with the larger weight gets more machine time.
The ratio is strict: a machine with weight = 20 gets at least twice as much machine time as a machine with weight = 10 (if it actually wants to consume it; if the "priority" machine is idle, the low-priority one can eat as much as it likes). These numbers scale comfortably into the tens of thousands (i.e., you can have machines with weight = 20000 and weight = 1, in which case the second one effectively runs at idle priority).
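Under contention, each runnable machine's guaranteed share is proportional to its weight. A minimal sketch of that proportion (illustrative arithmetic, not the scheduler's actual accounting):

```python
# Weights are relative: when all listed domains want to run, each one's
# guaranteed share of CPU time is its weight over the sum of all weights.

def shares(weights):
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

# weight=20 gets exactly twice the time of weight=10 under full load:
print(shares({"a": 20, "b": 10}))
# weight=20000 vs weight=1: the second is effectively at idle priority
print(shares({"big": 20000, "tiny": 1})["tiny"])
```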
Important: all these priorities, unfortunately, can only be changed while the domain is stopped.
For true connoisseurs of digging into the guts, it is possible to pin a virtual machine to specific cores (so-called VCPU pinning) and to mask off (disable) some of the processor's features (VCPU features).
You can see the real situation with virtual machines using