
VM performance analysis in VMware vSphere. Part 2: Memory



Part 1. About CPU

In this article we will talk about RAM performance counters in vSphere.
Memory seems more clear-cut than CPU: if a VM has memory problems, they are hard to miss. But when they do appear, fighting them is much harder. But first things first.

A bit of theory


The virtual memory of virtual machines is carved out of the RAM of the server the VMs run on. That much is obvious :). If the server does not have enough RAM for everyone, ESXi starts applying memory reclamation techniques. Otherwise, the VM operating systems would crash with memory access errors.

Which technique ESXi applies depends on how loaded the host RAM is (each state is defined by the amount of free RAM relative to minFree):
High: 400% of minFree. Once this boundary is reached, large memory pages start being broken into small ones (TPS works in its standard mode).
Clear: 100% of minFree. Large memory pages are broken into small ones; TPS works forcibly.
Soft: 64% of minFree. TPS + Balloon.
Hard: 32% of minFree. TPS + Compress + Swap.
Low: 16% of minFree. Compress + Swap + Block.

minFree is the RAM needed for the hypervisor to work.

Up to and including ESXi 4.1, minFree was fixed by default at 6% of the server's RAM (the percentage could be changed via the Mem.MinFreePct option on the host). In later versions, as server memory sizes grew, minFree started to be calculated on a sliding scale from the amount of host memory rather than as a fixed percentage.

The default minFree value is calculated as follows:

0-4 GB of host memory: 6% reserved for minFree
4-12 GB: 4%
12-28 GB: 2%
Remaining memory: 1%

For example, for a server with 128 GB of RAM, minFree would be:
minFree = 245.76 + 327.68 + 327.68 + 1024 = 1925.12 MB ≈ 1.88 GB
The actual value may differ by a couple of hundred MB depending on the server and its RAM.
The breakdown for 128 GB of host RAM:
6% of the first 4 GB (0-4 GB): 245.76 MB
4% of the next 8 GB (4-12 GB): 327.68 MB
2% of the next 16 GB (12-28 GB): 327.68 MB
1% of the remaining 100 GB: 1024 MB
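If you want to repeat this arithmetic for your own hosts, here is a minimal PowerShell sketch of the sliding scale and the state thresholds derived from it. The tier sizes and percentages are taken from the tables above; the function name is my own, not a VMware API.

```powershell
# Rough sketch of the ESXi minFree sliding scale (values from the tables above).
# Not an official VMware implementation, just the same arithmetic.
function Get-MinFreeMB {
    param([double]$HostMemoryGB)

    $tiers = @(
        @{ SizeGB = 4;  Pct = 0.06 },  # first 4 GB       -> 6%
        @{ SizeGB = 8;  Pct = 0.04 },  # 4-12 GB (8 GB)   -> 4%
        @{ SizeGB = 16; Pct = 0.02 }   # 12-28 GB (16 GB) -> 2%
    )

    $minFreeMB   = 0.0
    $remainingGB = $HostMemoryGB
    foreach ($tier in $tiers) {
        $chunkGB = [math]::Min($remainingGB, $tier.SizeGB)
        if ($chunkGB -le 0) { break }
        $minFreeMB   += $chunkGB * 1024 * $tier.Pct
        $remainingGB -= $chunkGB
    }
    # Everything above 28 GB -> 1%
    if ($remainingGB -gt 0) { $minFreeMB += $remainingGB * 1024 * 0.01 }
    return $minFreeMB
}

$minFree = Get-MinFreeMB -HostMemoryGB 128   # ~1925 MB, as in the example above
"minFree      : {0:N2} MB" -f $minFree
"High  (400%) : {0:N2} MB free" -f ($minFree * 4)
"Clear (100%) : {0:N2} MB free" -f $minFree
"Soft  ( 64%) : {0:N2} MB free" -f ($minFree * 0.64)
"Hard  ( 32%) : {0:N2} MB free" -f ($minFree * 0.32)
"Low   ( 16%) : {0:N2} MB free" -f ($minFree * 0.16)
```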


For production environments, only the High state should normally be considered acceptable. For test and development environments, the Clear and Soft states may be tolerable. If free RAM on the host drops below 64% of minFree, the VMs running on it start having performance problems.

Each state triggers certain memory reclamation techniques, ranging from TPS, which has practically no effect on VM performance, to Swapping. Let me describe them in more detail.

Transparent Page Sharing (TPS). Roughly speaking, TPS is deduplication of virtual machine RAM pages on the server.

ESXi looks for identical virtual machine RAM pages by computing and comparing page hashes, then removes the duplicates, replacing them with references to a single page in the server's physical memory. As a result, physical memory consumption drops and some memory oversubscription can be achieved with almost no performance penalty.



This mechanism works only for 4 KB memory pages (small pages). The hypervisor does not even try to deduplicate 2 MB pages (large pages): the chance of finding identical pages of that size is slim.

By default, ESXi allocates memory to VMs in large pages. Breaking large pages into small ones starts when the High state threshold is reached and happens forcibly when the host enters the Clear state (see the memory state list above).

If you want TPS to start working without waiting for the host RAM to fill up, set the “Mem.AllocGuestLargePage” value to 0 in the ESXi Advanced Options (the default is 1). Allocation of large memory pages for virtual machines will then be disabled.

Since December 2014, inter-VM TPS has been disabled by default in all ESXi releases, because a vulnerability was found that theoretically allows access to the memory of one VM from another. Details here. I have not come across any reports of this TPS vulnerability being exploited in practice.

The TPS policy is controlled by the advanced option “Mem.ShareForceSalting” on ESXi:
0 - Inter-VM TPS. TPS works across pages of different VMs;
1 - TPS only for VMs with the same “sched.mem.pshare.salt” value in their VMX files;
2 (default) - Intra-VM TPS. TPS works only for pages within a single VM.

It definitely makes sense to disable large pages and enable inter-VM TPS on test benches. It can also be useful for environments with many similar VMs: for example, on VDI farms the physical memory savings can reach tens of percent.
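For reference, the same advanced options can be inspected and changed with PowerCLI instead of the host client. A hedged sketch; the vCenter and host names are placeholders:

```powershell
# Requires VMware PowerCLI; "vcenter.lab.local" and "esxi01.lab.local" are placeholder names.
Connect-VIServer vcenter.lab.local
$esx = Get-VMHost -Name esxi01.lab.local

# Check the current values of the TPS-related advanced options
Get-AdvancedSetting -Entity $esx -Name Mem.AllocGuestLargePage
Get-AdvancedSetting -Entity $esx -Name Mem.ShareForceSalting

# Disable large pages so TPS can deduplicate 4 KB pages without waiting for memory pressure
Get-AdvancedSetting -Entity $esx -Name Mem.AllocGuestLargePage |
    Set-AdvancedSetting -Value 0 -Confirm:$false

# Allow inter-VM TPS again (keep the security note above in mind)
Get-AdvancedSetting -Entity $esx -Name Mem.ShareForceSalting |
    Set-AdvancedSetting -Value 0 -Confirm:$false
```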

Memory Ballooning. Ballooning is not as harmless and transparent to the guest operating system as TPS. But with proper use, you can live and even work with Ballooning.

Along with VMware Tools, a special driver called the Balloon Driver (aka vmmemctl) is installed in the VM. When the hypervisor starts running short of physical memory and enters the Soft state, ESXi asks the VM to give back unused RAM through this Balloon Driver. The driver works at the operating system level and requests free memory from the OS. The hypervisor sees which physical memory pages the Balloon Driver has occupied, takes that memory away from the virtual machine and returns it to the host. This causes no problems for the OS, since at the OS level the memory is simply occupied by the Balloon Driver. By default, the Balloon Driver can take up to 65% of the VM's memory.

If VMware Tools is not installed in the VM or Ballooning is disabled (I do not recommend it, but there is a KB for it :), the hypervisor immediately falls back to harsher memory reclamation techniques. Conclusion: make sure VMware Tools is present on your VMs.


You can check the Balloon Driver's activity from inside the guest OS via VMware Tools.
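You can also see the Tools status and the current balloon size for all VMs from the vSphere side. A small PowerCLI sketch; the property paths follow the vSphere API object exposed as ExtensionData, so treat it as an illustration:

```powershell
# Which VMs have no running VMware Tools (so no Balloon Driver) and how much memory
# is currently ballooned (MB, from the VM quick stats).
Get-VM |
    Select-Object Name,
        @{ N = 'ToolsStatus'; E = { $_.ExtensionData.Guest.ToolsRunningStatus } },
        @{ N = 'BalloonedMB'; E = { $_.ExtensionData.Summary.QuickStats.BalloonedMemory } } |
    Sort-Object BalloonedMB -Descending
```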

Memory Compression. This technique is applied when ESXi reaches the Hard state. As the name implies, ESXi tries to compress 4 KB RAM pages down to 2 KB and thus free up some space in the server's physical memory. This technique noticeably increases access time to the contents of the VM's RAM pages, since a page first has to be decompressed. Sometimes not all pages can be compressed, and the process itself takes time, so in practice this technique is not particularly effective.

Memory Swapping. After the short Memory Compression phase, ESXi almost inevitably (if the VMs have not migrated to other hosts or been powered off) moves on to Swapping. And if very little memory is left (the Low state), the hypervisor also stops allocating memory pages to VMs, which can cause problems inside the guest OSs.

Here is how Swapping works. When a virtual machine is powered on, a file with the .vswp extension is created for it. Its size equals the VM's unreserved RAM: the difference between the configured and the reserved memory. When Swapping kicks in, ESXi evicts virtual machine memory pages into this file and starts using it instead of the server's physical memory. Of course, such "RAM" is several orders of magnitude slower than the real thing, even if the .vswp file sits on fast storage.
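To see how large the .vswp files can get, this sketch lists configured memory against the memory reservation per VM; the difference is the potential swap file size. The ResourceConfig values come from the vSphere API and are in MB; treat this as an illustration rather than a definitive report:

```powershell
# Potential .vswp size per VM = configured memory - memory reservation.
Get-VM |
    Select-Object Name, MemoryGB,
        @{ N = 'ReservationGB'; E = { [math]::Round($_.ExtensionData.ResourceConfig.MemoryAllocation.Reservation / 1024, 1) } },
        @{ N = 'MaxVswpGB';     E = { [math]::Round($_.MemoryGB - ($_.ExtensionData.ResourceConfig.MemoryAllocation.Reservation / 1024), 1) } }
```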

Unlike Ballooning, which picks unused pages from the VM, with Swapping pages that are actively used by the OS or applications inside the VM can end up on disk. As a result, VM performance drops to the point of hanging. Formally, the VM still works and can at least be shut down properly from within the OS. If you are patient ;)

A VM going into Swap is an emergency that should be avoided whenever possible.

Core Virtual Machine Memory Performance Counters


So we have reached the main part. The following counters are available for monitoring memory in a VM:

Active - the amount of RAM (KB) that the VM accessed during the previous measurement period.

Usage - the same as Active, but as a percentage of the VM's configured RAM. It is calculated by the formula: active ÷ configured virtual machine memory size.
High Usage and Active values are not by themselves a sign of VM performance problems. If a VM uses memory aggressively (or at least accesses it), that does not mean the memory is insufficient. Rather, it is a reason to look at what is happening inside the OS.
There is a standard alarm for VM Memory Usage:



Shared - the amount of VM RAM deduplicated with TPS (within the VM or between VMs).

Granted - the amount of host physical memory (KB) given to the VM. Includes Shared.

Consumed (Granted - Shared) - the amount of host physical memory (KB) the VM consumes. Does not include Shared.

If part of the VM's memory is served not from the host's physical memory but from the swap file, or has been taken away via the Balloon Driver, that amount is not counted in Granted and Consumed.
High Granted and Consumed values are perfectly normal. The operating system gradually takes memory from the hypervisor and does not give it back. On an actively working VM, these counters approach the amount of configured memory over time and stay there.

Zero - the amount of VM RAM (KB) that contains only zeros. The hypervisor considers such memory free and can hand it out to other virtual machines. Once the guest OS has written anything into it, the memory moves into Consumed and does not come back.

Reserved Overhead - the amount of RAM (KB) reserved by the hypervisor for running the VM. It is a small amount, but it must be available on the host, otherwise the VM will not power on.

Balloon - the amount of RAM (KB) taken from the VM with the Balloon Driver.

Compressed - the amount of RAM (KB) that has been compressed.

Swapped - the amount of RAM (KB) that has been moved to disk due to a lack of physical memory on the server.
On a normally working VM, Balloon and the other memory reclamation counters are zero.
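If you prefer to pull these counters with PowerCLI rather than the vSphere Client charts, here is a hedged sketch. The VM name is a placeholder; the metric IDs are the standard vSphere counterparts of the counters listed above:

```powershell
# Latest real-time samples of the core VM memory counters (values in KB, Usage in %).
# "app-vm-01" is a placeholder VM name.
$vm = Get-VM -Name "app-vm-01"
$metrics = @(
    'mem.active.average',      # Active
    'mem.usage.average',       # Usage (%)
    'mem.shared.average',      # Shared
    'mem.granted.average',     # Granted
    'mem.consumed.average',    # Consumed
    'mem.zero.average',        # Zero
    'mem.vmmemctl.average',    # Balloon
    'mem.compressed.average',  # Compressed
    'mem.swapped.average'      # Swapped
)
Get-Stat -Entity $vm -Stat $metrics -Realtime -MaxSamples 1 |
    Sort-Object MetricId |
    Select-Object MetricId, Value, Unit
```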

Here is a graph with the Memory counters of a normally working VM with 150 GB of RAM.



On the graph below, the VM clearly has problems. Beneath the graph you can see that all of the described memory reclamation techniques have been applied to this VM. Balloon for this VM is much larger than Consumed. In fact, the VM is more dead than alive.



ESXTOP


As with the CPU, if we want to quickly assess the situation on the host, as well as its dynamics at intervals of down to 2 seconds, we should use ESXTOP.

The ESXTOP memory screen is opened with the “m” key and looks like this (fields B, D, H, J, K, L, O are selected):



The following parameters are of interest to us:

Mem overcommit avg - average memory oversubscription on the host over 1, 5 and 15 minutes. Anything above zero is a reason to look at what is going on, but not necessarily a sign of problems.

The PMEM/MB and VMKMEM/MB lines show the server's physical memory and the memory available to the VMkernel. The interesting bits here are the minfree value (in MB) and the host memory state (high, in our case).

The NUMA/MB line shows the distribution of RAM across NUMA nodes (sockets). In this example the distribution is uneven, which in principle is not great.

Next comes server-wide statistics on the memory reclamation techniques:

PSHARE/MB - TPS statistics;

SWAP/MB - Swap usage statistics;

ZIP/MB - memory page compression statistics;

MEMCTL/MB - Balloon Driver usage statistics.

For individual VMs, the following information may be of interest. I have hidden the VM names so as not to embarrass the audience :). Where an ESXTOP metric corresponds to a vSphere counter, I give the matching counter.

MEMSZ - the amount of memory configured for the VM (MB).
MEMSZ = GRANT + MCTLSZ + SWCUR + untouched memory.

GRANT - Granted in MB.

TCHD - Active in MB.

MCTL? - whether the Balloon Driver is installed on the VM.

MCTLSZ - Balloon in MB.

MCTLTGT - the amount of RAM (MB) that ESXi wants to reclaim from the VM via the Balloon Driver (Memctl Target).

MCTLMAX is the maximum amount of RAM (MB) that ESXi can remove from the VM via the Balloon Driver.

SWCUR - the current amount of the VM's RAM (MB) served from the Swap file.

SWTGT - the amount of the VM's RAM (MB) that ESXi wants to move to the Swap file (Swap Target).

ESXTOP can also show more detailed information about the VM NUMA topology. To do this, select fields D and G:



NHN - the NUMA node(s) the VM is placed on. Here you can immediately spot wide VMs that do not fit into a single NUMA node.

NRMEM - how many megabytes of the VM's memory are taken from remote NUMA nodes.

NLMEM - how many megabytes of the VM's memory are taken from the local NUMA node.

N%L - the percentage of the VM's memory located on the local NUMA node (if it is below 80%, performance problems may occur).

Memory on the hypervisor


While the hypervisor's CPU counters are usually of little interest, with memory the situation is the opposite. High Memory Usage on a VM does not always indicate a performance problem, but high Memory Usage on the hypervisor triggers the memory reclamation techniques and does cause VM performance problems. Host Memory Usage alarms should be watched so that VMs are kept out of Swap.
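A quick fleet-wide check of host memory usage can look like this. A minimal PowerCLI sketch; the 90% threshold is an arbitrary example, pick one that keeps your hosts comfortably in the High state:

```powershell
# List hosts whose current memory usage is above an arbitrary 90% threshold.
Get-VMHost |
    Select-Object Name, MemoryTotalGB,
        @{ N = 'MemoryUsagePct'; E = { [math]::Round($_.MemoryUsageGB / $_.MemoryTotalGB * 100, 1) } } |
    Where-Object { $_.MemoryUsagePct -gt 90 } |
    Sort-Object MemoryUsagePct -Descending
```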





Unswap


If a VM has gone into Swap, its performance drops sharply. The traces of Ballooning and Compression disappear quickly once free RAM appears on the host, but the virtual machine is in no hurry to return from Swap to the server's RAM.
Up to ESXi 6.0, the only reliable and fast way to get a VM out of Swap was a reboot (more precisely, powering the container off and on). Starting with ESXi 6.0, a not entirely official but working and reliable way to get a VM out of Swap appeared. At one of the conferences I managed to talk to one of the VMware engineers responsible for the CPU Scheduler. He confirmed that the method is perfectly workable and safe. In our experience, we saw no problems with it either.

The actual commands for getting a VM out of Swap were described by Duncan Epping. I will not repeat the detailed description, just give an example of their use. As you can see in the screenshot, some time after the commands are executed, Swap on the VM disappears.



ESXi RAM Management Tips


Finally, here are a few tips that will help you avoid VM performance problems caused by RAM:


That is all I have on memory. The next article will be devoted to storage.

Source: https://habr.com/ru/post/455820/

