
In the previous articles of this series we discussed preparing the host machine and creating and cloning virtual machines. Today I will cover an equally important topic: limiting the resources used by virtual machines.
Introduction to cgroups
To limit the resources used by virtual machines, I suggest using cgroups, a kernel subsystem that relies on the Linux task scheduler to enforce the required limits.
On the Russian-language internet, cgroups is hardly covered at all; on the English-language one, most material boils down to descriptions of the famous "200 lines kernel patch that does wonders", with the notable exception of the article by Daniel Berrange.
In this article I want to look at working with cgroups as applied to virtual machines, but nothing prevents you from using this subsystem for ordinary desktop tasks as well.
In essence, cgroups is a hierarchical file system, similar to /sys or /proc, that provides a simple interface to the kernel's internal resource-allocation mechanisms. An example will help show what this file system is: let's create mount points, mount two controllers there - cpu and blkio - and see what's inside.
# mkdir /cgroup
# mkdir /cgroup/cpu
# mkdir /cgroup/blkio
# mount -t cgroup -ocpu cgroup /cgroup/cpu
# mount -t cgroup -oblkio cgroup /cgroup/blkio
# ls /cgroup/cpu
cgroup.clone_children cgroup.event_control cgroup.procs cpu.rt_period_us cpu.rt_runtime_us cpu.shares notify_on_release release_agent sysdefault tasks
# ls /cgroup/blkio
blkio.io_merged blkio.io_queued blkio.io_service_bytes blkio.io_service_time
blkio.io_serviced blkio.io_wait_time blkio.reset_stats blkio.sectors
blkio.throttle.io_service_bytes blkio.throttle.io_serviced
blkio.throttle.read_bps_device blkio.throttle.read_iops_device
blkio.throttle.write_bps_device blkio.throttle.write_iops_device
blkio.time blkio.weight blkio.weight_device cgroup.clone_children
cgroup.event_control cgroup.procs notify_on_release release_agent
sysdefault tasks
The tasks file in each branch of the tree contains the PIDs of the processes managed through that particular group. This makes it possible to build a hierarchical structure of process limits. Keep in mind that a limit applies to all processes in the group.
CPU load limiting
A few words about how to limit a process. It is very simple:
- pick the PID of the process we want to place under cgroups;
- write it into the tasks file;
- set the limits (assign the process a weight; the value can be chosen arbitrarily, but it must be greater than zero).
# echo $$ > /cgroup/cpu/tasks
# echo 100 > /cgroup/cpu/cpu.shares
Where
100 is the weight assigned to the current bash process.
Adding PIDs to tasks has a few quirks. Only one PID can be written at a time, so echo PID1 PID2 ... will not work. Also note that echo does not reliably report whether the write succeeded; for a robust implementation, use the write() system call directly.
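Given that constraint, moving several processes into a group means one write per PID. A minimal sketch (the PIDs are illustrative, and a stand-in file is used in place of /cgroup/cpu/tasks so the loop can be tried without a mounted hierarchy):

```shell
# One PID per write: multi-PID writes to the tasks file are rejected.
TASKS=${TASKS:-/tmp/tasks_demo}    # stand-in for /cgroup/cpu/tasks
: > "$TASKS"
for pid in 1234 1235 1236; do      # illustrative PIDs
    # write each PID in a separate operation and check for errors
    if ! echo "$pid" >> "$TASKS"; then
        echo "failed to add $pid" >&2
    fi
done
wc -l < "$TASKS"                   # one line per successful write
```

With the real tasks file, each write moves one process; the error check matters because the kernel can reject a PID (for example, if the process has already exited).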
This limiting method has a significant drawback: the limits cannot be made strict. That restricts its use in service provisioning, where a fixed amount of resources is supposed to be allocated to a virtual machine. The cgroups developers assume that the processor should be used at 100% (that is, after all, what it is for) rather than sit idle; this is implemented through a weighting mechanism.
But this minus is often a big plus. Take, for example, three virtual machines with weights of 50, 30, and 20 and a single processor core. If all the machines are under full load, each gets 50, 30, and 20 percent of the CPU respectively.
It often happens that a virtual machine does not need resources at some moment. Say it is the second machine (weight 30). The two machines with weights 50 and 20 then share the core: one gets 71% (50 / (50 + 20)) of the processor time, the other 100% - 71% = 29%. This detail of resource allocation lets a virtual machine use up to the full power of the core when it needs to.
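The share arithmetic above can be sketched directly: each active group receives its weight divided by the sum of the weights of all currently active groups (using the illustrative 50/30/20 weights from the example, with the weight-30 machine idle):

```shell
# CPU share of each active group = 100 * weight / sum(active weights)
weights="50 20"                    # the weight-30 machine is idle
total=0
for w in $weights; do total=$(( total + w )); done
for w in $weights; do
    echo "weight $w -> $(( 100 * w / total ))% of the core"
done
```

Note that integer arithmetic rounds down, so the weight-20 group shows as 28% here rather than the 29% (100% - 71%) quoted above.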
Disk Subsystem Limiting
With disks, the situation is more complicated, although the implementation of limits there may be more stringent.
Let's see what we have inside the blkio controller, which is responsible for managing disk IO.
# cd /cgroup/blkio
# ls
blkio.io_merged blkio.io_queued blkio.io_service_bytes blkio.io_service_time
blkio.io_serviced blkio.io_wait_time blkio.reset_stats blkio.sectors
blkio.time blkio.weight blkio.weight_device cgroup.clone_children
cgroup.event_control cgroup.procs notify_on_release release_agent tasks
As you can see, there are several parameters for limiting the performance of the disk subsystem, namely:
- iops - the number of I/O operations per second;
- bps - the bandwidth in bytes per second;
- weight - the weight of the group.
To specify which disk and which process an iops or bps limit applies to, you need to determine the major and minor numbers of the disk (see device classification) and write them to the corresponding file (an example for bps):
# ls -la /dev/sda
brw-rw---- 1 root disk 8, 0 11 13:59 /dev/sda
# echo $$ > /cgroup/blkio/tasks
# echo 3 > /proc/sys/vm/drop_caches
# echo "8:0 1000000" > /cgroup/blkio/blkio.throttle.read_bps_device
# echo "8:0 1000000" > /cgroup/blkio/blkio.throttle.write_bps_device
# dd if=/dev/zero of=/tmp/zerofile bs=4K count=102400 oflag=direct
# dd if=/tmp/zerofile of=/tmp/zerofile2 bs=4K count=102400
25426+0 records in
25425+0 records out
104140800 bytes (104 MB) copied, 102.21 s, 1.0 MB/s
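The major:minor pair can also be obtained programmatically rather than read off ls output. A sketch: stat prints the device numbers in hex, so they need converting to decimal before being written to the throttle files (here /dev/null stands in as a device that always exists; in practice you would query /dev/sda):

```shell
# stat's %t/%T print the device major/minor numbers in hex;
# convert them to decimal to build the "MAJOR:MINOR" string for blkio.
dev=/dev/null                              # stand-in; use your block device
major=$(printf '%d' "0x$(stat -c %t "$dev")")
minor=$(printf '%d' "0x$(stat -c %T "$dev")")
echo "$major:$minor"                       # /dev/null is 1:3 on Linux
```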
Therefore, if you need hard limits, the virtual machine should use a separate block device (for example, an LVM volume), which has its own major and minor numbers. For images stored as files, only the blkio.weight mechanism can be used.
A natural question is why dd obeys the same limits that we set for the bash interpreter. The answer is simple: cgroups keeps track of the children forked by a parent and automatically places them into tasks.
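This inheritance is easy to observe without setting any limits: a child process reports the same cgroup membership as its parent (a sketch; /proc/self/cgroup lists the groups the current process belongs to):

```shell
# A spawned child starts out in its parent's cgroups, so the two
# /proc/self/cgroup readings below should be identical.
parent_cg=$(cat /proc/self/cgroup)
child_cg=$(bash -c 'cat /proc/self/cgroup')
[ "$parent_cg" = "$child_cg" ] && echo "child inherited the parent's cgroups"
```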
Note that the blkio.throttle.* files only appear if the kernel was built with the CONFIG_BLK_CGROUP and CONFIG_BLK_DEV_THROTTLING options enabled. Out of curiosity, I recommend searching menuconfig for CGROUP and BLK - quite a few interesting things are disabled by default.
Unfortunately, cgroups limits only disk access that bypasses the page cache. If you do not drop the caches or pass oflag=direct or iflag=direct to dd, the limits will not be applied. You can read more about this here and here.
Things are simpler with blkio.weight: it works the same way as cpu.shares.
Working with cgroups as an unprivileged user
An unprivileged user cannot write to cgroups directly. But there is a set of utilities from the libcg project that allow working with cgroups without administrative privileges. On Debian, they come with the cgroup-bin package.
With the package installed, let's see which utilities it provides:
$ ls /usr/*bin/cg*
/usr/bin/cgclassify /usr/bin/cgdelete /usr/bin/cgget /usr/bin/cgsnapshot /usr/sbin/cgconfigparser
/usr/bin/cgcreate /usr/bin/cgexec /usr/bin/cgset /usr/sbin/cgclear /usr/sbin/cgrulesengd
The most useful for us will be cgcreate and cgdelete, along with cgconfigparser, a tool that reads a configuration file and automatically creates the required groups at system startup.
Let's configure cgconfig so that the necessary file systems are mounted automatically at system startup:
$ cat /etc/cgconfig.conf
mount {
cpu = /cgroup/cpu;
cpuacct = /cgroup/cpuacct;
devices = /cgroup/devices;
# memory = /cgroup/memory;
blkio = /cgroup/blkio;
# freezer = /cgroup/freezer;
cpuset = /cgroup/cpuset;
}
$ sudo /etc/init.d/cgconfig restart
We don't really need the freezer and memory controllers; moreover, memory refuses to be mounted automatically, so we will mount it manually when needed.
Now let's create a group that our unprivileged user username can manage on his own:
$ sudo cgcreate -f 750 -d 750 -a username:libvirt -t username:libvirt -g cpu,blkio:username
Where:
- -f and -d set the permissions on the files and directories within the group, respectively;
- -a and -t set the owner of the subsystem parameters and of the tasks file;
- -g creates groups for the specified cpu and blkio controllers relative to the root (in this case they will be /cgroup/cpu/username and /cgroup/blkio/username, respectively).
After this you will see that a username directory has been created under /cgroup/cpu and /cgroup/blkio, and the user username can write PIDs into its tasks files.
It is very convenient to create a group and set limits in it when a virtual machine starts, and to delete the group when it stops:
$ sudo cgdelete cpu,blkio:username
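That lifecycle is easy to wrap in a pair of hooks. A sketch assuming the cgroup-bin utilities described above and an illustrative group name (cgset, like cgcreate and cgdelete, ships in the same package):

```shell
# Hypothetical per-VM lifecycle hooks built on the libcg utilities.
vm=debian_guest                    # illustrative group name

start_limits() {
    # create the group for this VM and give it a CPU weight
    cgcreate -a "$USER:libvirt" -t "$USER:libvirt" -g "cpu,blkio:$vm"
    cgset -r cpu.shares=512 "$vm"
}

stop_limits() {
    # drop the group once the VM has shut down
    cgdelete "cpu,blkio:$vm"
}
```

Calling start_limits from a VM start script and stop_limits from its shutdown script keeps the hierarchy free of stale groups.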
When using cgroups to service virtual machines, the process of creating groups and setting limits can be partly automated. In the /etc/libvirt/qemu.conf file you can list the controllers qemu may work with, as well as the devices a virtual machine is allowed to access:
cgroup_controllers = [ "cpu", "devices", "memory", "cpuset", "blkio" ]
cgroup_device_acl = [
"/dev/null", "/dev/full", "/dev/zero",
"/dev/random", "/dev/urandom",
"/dev/ptmx", "/dev/kvm", "/dev/kqemu",
"/dev/rtc", "/dev/hpet", "/dev/net/tun",
]
It is convenient to use virsh for setting cpu and blkio weights (part of the libvirt library, see the article about
preparing the host system ):
$ virsh schedinfo --set cpu_shares=1024 debian_guest
$ virsh blkiotune debian_guest --weight 1024
The value in /cgroup/cpu/sysdefault/libvirt/qemu/debian_guest/cpu.shares will then change to 1024. But, unfortunately, an ordinary user cannot change settings in the sysdefault tree.
Network load limiting
We have more or less dealt with processor and disk resources; now for the most important part: the network. We want a hard division of bandwidth while keeping latency and server load acceptable. What we are about to do is known as shaping and QoS (Quality of Service). Carving out bandwidth is easy; it is much harder to account for the various usage scenarios of a virtual machine.
Suppose we need to allocate 5 Mbps of bandwidth to a virtual machine. In theory the task is simple; it is solved by adding tc rules (a band with minimal delay):
tc qdisc add dev debian_guest root handle 1: htb default 20
tc class add dev debian_guest parent 1: classid 1:1 htb rate 5mbit burst 15k
But a closer look shows that such a simple division is not enough.
Suppose the machine serves a large flow of HTTP traffic and the server becomes hard to reach.
This is solved by prioritization:
tc class add dev debian_guest parent 1:1 classid 1:10 htb rate 5mbit burst 15k prio 1
Everything else goes into a second class:
tc class add dev debian_guest parent 1:1 classid 1:20 htb rate 5mbit burst 15k prio 2
tc qdisc add dev debian_guest parent 1:10 handle 10: sfq perturb 10
SSH traffic should be placed higher:
tc filter add dev debian_guest parent 1:0 protocol ip prio 10 u32 match ip tos 0x10 0xff flowid 1:10
ICMP packets also need to be put higher:
tc filter add dev debian_guest parent 1:0 protocol ip prio 10 u32 match ip protocol 1 0xff flowid 1:10
TCP ACKs also get the highest priority:
tc filter add dev debian_guest parent 1: protocol ip prio 10 u32 match ip protocol 6 0xff match u8 0x05 0x0f at 0 match u16 0x0000 0xffc0 at 2 match u8 0x10 0xff at 33 flowid 1:10
And let everything else go with the second priority.
And here is the policing of incoming traffic:
tc qdisc add dev debian_guest handle ffff: ingress
tc filter add dev debian_guest parent ffff: protocol ip prio 50 u32 match ip src 0.0.0.0/0 police rate 5mbit burst 15k drop flowid :1
You can add countless other rules, but you need to take the specifics of the virtual machine into account - perhaps some ports should be closed, and so on.
In general, it does not hurt to close IRC ports (6667-6669), torrent ports (6881-6889), and other ungodly services.
When the machine stops, you can delete the rules that are no longer needed:
tc qdisc del dev debian_guest root
tc qdisc del dev debian_guest ingress
Statistics on the interface can be easily viewed:
tc -s -d qdisc show dev debian_guest
tc -s -d class show dev debian_guest
tc -s -d filter show dev debian_guest
Accordingly, you can see how much traffic enters the interface and leaves it, and it is easy to set up all sorts of graphs.
Very conveniently, you can combine iptables and tc rules using the mangle table (Google will help) and then manage the marked traffic through tc. For example, suppose we need to mark bittorrent traffic and allocate a 1 kilobit band for it:
tc class add dev debian_guest parent 1:1 classid 1:30 htb rate 1kbit prio 3
tc qdisc add dev debian_guest parent 1:30 handle 30: sfq perturb 10
tc filter add dev debian_guest parent 1: protocol ip prio 3 handle 30 fw flowid 1:30
iptables -t mangle -A POSTROUTING -o debian_guest -p tcp --sport 6881:6889 -j MARK --set-mark 30
iptables -t mangle -A POSTROUTING -o debian_guest -p tcp --dport 6881:6889 -j MARK --set-mark 30
Note that --sport/--dport require an explicit protocol (-p tcp), and the fw filter is what actually directs packets carrying mark 30 into class 1:30.
In conclusion
After reading this series of articles (1, 2, 3, and 4), you will be able to install and configure the KVM virtualization system and the libvirt management toolkit, and build your own virtual machine image. All this will let you save resources, increase system reliability (thanks to live migration of guests), and simplify the creation of backups (for example, using LVM snapshots).
I hope I have managed to convey some useful information, and that the experience shared in this series of articles saves you time.
This concludes this series of articles. If anything is of interest, ask in the comments - I am always ready to answer any questions that arise.