Accounting for RAM in the cloud

We continue a detailed analysis of how resources are counted.

Before we discuss how memory is taken into account, first we will look at how this memory is allocated to a virtual machine, and what the “virtual machine memory” is.

Real Virtual Machine Memory

The Xen hypervisor, which is the foundation of XCP, which is the foundation of the Selectle cloud, controls several aspects of the virtual machines. From the point of view that interests us from the point of view, the processor and memory. The processor we discussed is now the turn of the RAM.
')
From the point of view of Xen, the allocation of memory to a domain (virtual machine) means that the domain has the right to write to the specified memory page. Attempting a domain to write to a memory page that is forbidden for it will cause an exception situation and, with a high probability, the domain will stop working, so the kernel of the guest system is careful not to go beyond the permitted memory (exactly the same if the program tries to access the nonexistent memory page , then the kernel will crash the program or call an error handler). In any case, the virtual machine is allowed to use only the memory that it was allowed to use. Thus, Xen always knows exactly how many memory pages are allocated to a particular virtual machine. (Yes, the minimum gradation of memory accounting is a 4kb piece of memory called a “page”). I’ll leave out the section on address translation, as this is one of the most ... difficult ... areas. In short - add two more descriptor tables to the usual virtual memory translation scheme on i386 - you will get about it.

When a domain is created (that is, the virtual machine starts), a program called domain_builder tells Xen how many pages to allocate to the domain. The same information is reported to the kernel of the guest system so that it knows how much memory is allocated to it.

A modd (memory on demand dæmon) is running inside the virtual machine, which, using the xenstore (the zena-specific inter-domain interaction method), informs the management service about the state of the domain's memory. In reality, this is just a record of the contents of / proc / meminfo, no more. The server looks at the settings of the virtual machine and decides how much memory to add or remove. And gives the command to change the memory.

And here begins the most interesting. In Xen, there is the concept of transferring pages of memory . This, literally, means "take a memory page from one domain and transfer to another." Accordingly, when a command is given for the return / reception of memory, Xen takes the memory from the guest system, or gives it this memory. Due to the general (useful) paranoia of Xen, all pages returned from the domain are reset to zero (so as not to accidentally give away valuable data to outside virtualization neighbors).

It is clear that the memory has limits to reduce - the kernel and the basic programs (init, getty, etc.) want some sort of memory for themselves. Below it is impossible to descend, and it is dangerous to approach (it is possible to “slightly” press and cause OOM or other unpleasant problems). This is what explains the rather high lower memory limit.

The upper limit is more complicated. When creating a domain (starting a virtual machine), the upper limit of memory is set, and the starting machine reserves a certain amount of service data for possible “future” pages. This is not a very large amount, but it is still there, and the high upper memory limit increases memory consumption. Therefore, the official recommendation is not to make a minimum / maximum ratio greater than 8-10. In theory, when the work of the memory hotplug is fully stable, this limit will be removed, but at the moment the technology is too experimental to charge for it.

Thus, the memory changes automatically within the specified limits. We will talk about MOD again after our programmers finish the ability for clients to change the MOD settings.

But, within the framework of general curiosity, I will answer what will happen if MOD is turned off. The memory will simply cease to be regulated (it is fixed in the volume that was, and everything will calm down on this). Memory accounting is not related to MOD.

Memory accounting

It is clear that the classic VDS "XXX megabytes per month" (the straight line in the picture on the right) does not fit - now it is 256 megabytes, and after ten seconds - already 500, or even a couple of gigabytes.

Instead, we consider the value called GB * h, that is, gigabytes per hour. In fact, our internal accounting unit is kb * ns (kilobyte-nanosecond), but the price of kilobyte-seconds is too small, and writing it in the price list - confusing people, not to mention accountants, frightening figures like "price 1.3 * 10 ^-19 rubles ”, or, even better,“ quantity: 10e + 23 ”, so that in the interface we bring this value to a reasonable figure (GB * h). But the internal accounting is similar to how we do with the processor - the money for the memory is removed as soon as a penny or more has come over.

GB * h - the value of synthetic. It can be 128 MB, stretched for 8 hours, or 4 GB for 15 minutes, or a combination of changing memory sizes, folded accordingly. The pictures show different uses of memory. All pictures show the interval in an hour of astronomical time.

Now about how we summarize. Once in a small time interval, we take the value of memory, multiply by the duration of the interval, we get the number of kb * s for this interval. These values are summed up, regardless of which host the machine is on. By the way, statistics is collected in the same place (for charts), but we really lack programmers (we are serious - we have programmer vacancies and we are waiting for you), so this will appear later in the interface for clients. In fact, this is the simplest numerical integration of a memory allocation curve, approximated by rectangles. To avoid unpleasant effects, we discard the first and last values, so you can assume that we give you 2 seconds of using memory for the entire life of the machine.

Disk Cache, Free Memory and Swap

Another question that frequently arises is the question of what memory we consider. If you look closely at the data / proc / meminfo or ps output, you can see a bunch of different values. Fortunately, all these values, including the size of the address space, are just "own inventions" of the guest system core. There is a value of allocated memory (totalmem), it indicates how many physical memory pages are allocated to the virtual machine.

By the way, I forgot to say. Xen has no overcell. What is oversell? This is when 10 virtual machines say “here you are 500 MB,” and there is only 2 GB on the node (the server that does virtualization). Funny But this is how any host lives under openVZ. Since programs rarely clog up 100% of RAM, the remnants are “resold” (eng. Oversell) again. In most cases, this is good, but what if all ten cars occupied 400MB each? Trouble So, Xen does not have an overcell mode, and moreover, the memory model in Xen does not, in principle, allow for the situation of “promised but not allocated” RAM. There are very specific situations of “common” memory pages (they are used in drivers and the account goes into units of kilobytes), but there are no situations where application memory pages (and not drivers) are common between different virtual machines. All allocated memory for the virtual machine belongs to her and only to her. Even if this memory is not used.

Actually, we return to free memory. The Linux kernel is known for being greedy for memory - all free memory is occupied by disk cache. We'll talk about this a little later, but disk cache in the cloud is a very useful thing, as it allows you to significantly reduce the number of disk operations (especially reading). Free memory can only be on a very idle machine with excess memory. We try to provide such a mode of operation so that there is no “completely free” memory. Although, if the virtual machine is idle, the situation when 60 of 128 MB are occupied is quite possible. We take into account all the allocated domain memory - and yes, free and idle memory is taken into account, since it was allocated for exclusive use of the domain.

Now a little about the swap-file, the paging file (more precisely, most often it is a paging section). swap is very useful for the guest system - it allows the kernel to increase the size of virtual memory . With MOD turned on, the only purpose of the paging file is to increase virtual memory (in fact, it is the “oversell” that suits the kernel for processes in the guest machine). In addition, he insures against sudden leaps of memory. MOD reacts quite quickly (reaction time less than a second), however, in certain conditions (for example, someone asked for a lot of memory) may not be in time. In this case, the paging file will resolve the situation, the request will be successful, and the arrived in time MOD will provide the necessary amount of memory). In addition, the situation when a part of the process of a running machine is pushed out to the swap file is normal - this is a natural optimization of memory usage. For example, CentOS (in my opinion, a rather thick system overloaded with baubles) launches quite a few demons, a decent part of which is not used most of the time. Such demons are pushed into the paging file and lie there for weeks without movement.

We do not recommend disabling the paging file, as this does not save much money, but the risks of running into a MemoryError message at the time of an ugly tampering with a memory request from a thick application increase greatly. Moreover, if it turns out that many processes reserve a lot of virtual memory, we advise, on the contrary, to increase the size of the paging file (for example, by creating and connecting the second) - this will give more freedom for the operating system to manage address space.

Source: https://habr.com/ru/post/110792/

All Articles

Accounting for RAM in the cloud

Real Virtual Machine Memory

Memory accounting

Disk Cache, Free Memory and Swap

More articles: