
In the previous articles of this series we discussed preparing the host machine and creating and cloning virtual machines. Today I will cover an equally important topic: limiting the resources used by virtual machines.
Introduction to cgroups
To limit the resources used by virtual machines, I suggest using cgroups, a kernel subsystem that relies on the Linux task scheduler to enforce the required limits.
On the Russian-language internet, cgroups is hardly covered at all; on the English-language one, most material boils down to descriptions of the famous "200 lines kernel patch that does wonders", with the notable exception of the article by Daniel Berrange.
In this article I want to look at working with cgroups as applied to virtual machines, but nothing prevents you from using this subsystem for ordinary desktop tasks as well.
In essence, cgroups is a hierarchical file system, similar to /sys or /proc, that provides a simple interface to the kernel's internal resource-allocation mechanisms. An example will help show what this file system is: let's create mount points, mount two controllers there - cpu and blkio - and see what's inside.
# mkdir /cgroup
# mkdir /cgroup/cpu
# mkdir /cgroup/blkio
# mount -t cgroup -ocpu cgroup /cgroup/cpu
# mount -t cgroup -oblkio cgroup /cgroup/blkio
# ls /cgroup/cpu
cgroup.clone_children cgroup.event_control cgroup.procs cpu.rt_period_us cpu.rt_runtime_us cpu.shares notify_on_release release_agent sysdefault tasks
# ls /cgroup/blkio
blkio.io_merged blkio.io_queued blkio.io_service_bytes blkio.io_service_time
blkio.io_serviced blkio.io_wait_time blkio.reset_stats blkio.sectors
blkio.throttle.io_service_bytes blkio.throttle.io_serviced
blkio.throttle.read_bps_device blkio.throttle.read_iops_device
blkio.throttle.write_bps_device blkio.throttle.write_iops_device
blkio.time blkio.weight blkio.weight_device cgroup.clone_children
cgroup.event_control cgroup.procs notify_on_release release_agent
sysdefault tasks
The tasks file in each branch of the tree contains the PIDs of the processes managed through that particular group. This makes it possible to build a hierarchical structure of process limits. Keep in mind that a limit applies to all processes in the group.
CPU load limiting
A few words about how to limit a process. It is very simple:
- pick the PID of the process we want to place under cgroups;
- write it into the tasks file;
- set the limits (assign the process a weight; the value can be chosen arbitrarily, but it must be greater than zero).
# echo $$ > /cgroup/cpu/tasks
# echo 100 > /cgroup/cpu/cpu.shares
Where
100 is the weight assigned to the current bash process.
Adding PIDs to tasks has a few quirks. Only one PID can be written at a time, so echo PID1 PID2 ... will not work. Also note that echo does not reliably report whether the write succeeded; for a robust implementation, use the write() system call directly.
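Given that constraint, moving several processes into a group means one write per PID. A minimal sketch (the PIDs are illustrative, and a stand-in file is used in place of /cgroup/cpu/tasks so the loop can be tried without a mounted hierarchy):

```shell
# One PID per write: multi-PID writes to the tasks file are rejected.
TASKS=${TASKS:-/tmp/tasks_demo}    # stand-in for /cgroup/cpu/tasks
: > "$TASKS"
for pid in 1234 1235 1236; do      # illustrative PIDs
    # write each PID in a separate operation and check for errors
    if ! echo "$pid" >> "$TASKS"; then
        echo "failed to add $pid" >&2
    fi
done
wc -l < "$TASKS"                   # one line per successful write
```

With the real tasks file, each write moves one process; the error check matters because the kernel can reject a PID (for example, if the process has already exited).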
This limiting method has a significant drawback: the limits cannot be made strict. That restricts its use in service provisioning, where a fixed amount of resources is supposed to be allocated to a virtual machine. The cgroups developers assume that the processor should be used at 100% (that is, after all, what it is for) rather than sit idle; this is implemented through a weighting mechanism.
But this minus is often a big plus. Take, for example, three virtual machines with weights of 50, 30, and 20 and a single processor core. If all the machines are under full load, each gets 50, 30, and 20 percent of the CPU respectively.
It often happens that a virtual machine does not need resources at some moment. Say it is the second machine (weight 30). The two machines with weights 50 and 20 then share the core: one gets 71% (50 / (50 + 20)) of the processor time, the other 100% - 71% = 29%. This detail of resource allocation lets a virtual machine use up to the full power of the core when it needs to.
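The share arithmetic above can be sketched directly: each active group receives its weight divided by the sum of the weights of all currently active groups (using the illustrative 50/30/20 weights from the example, with the weight-30 machine idle):

```shell
# CPU share of each active group = 100 * weight / sum(active weights)
weights="50 20"                    # the weight-30 machine is idle
total=0
for w in $weights; do total=$(( total + w )); done
for w in $weights; do
    echo "weight $w -> $(( 100 * w / total ))% of the core"
done
```

Note that integer arithmetic rounds down, so the weight-20 group shows as 28% here rather than the 29% (100% - 71%) quoted above.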
Disk Subsystem Limiting
With disks, the situation is more complicated, although the implementation of limits there may be more stringent.
Let's see what we have inside the blkio controller, which is responsible for managing disk IO.
# cd /cgroup/blkio
# ls
blkio.io_merged blkio.io_queued blkio.io_service_bytes blkio.io_service_time
blkio.io_serviced blkio.io_wait_time blkio.reset_stats blkio.sectors
blkio.time blkio.weight blkio.weight_device cgroup.clone_children
cgroup.event_control cgroup.procs notify_on_release release_agent tasks
As you can see, there are several parameters for limiting the performance of the disk subsystem, namely:
- iops - the number of I/O operations per second;
- bps - the bandwidth in bytes per second;
- weight - the weight of the group.
To specify which disk and which process an iops or bps limit applies to, you need to determine the major and minor numbers of the disk (see device classification) and write them to the corresponding file (an example for bps):
# ls -la /dev/sda
brw-rw---- 1 root disk 8, 0 11 13:59 /dev/sda
# echo $$ > /cgroup/blkio/tasks
# echo 3 > /proc/sys/vm/drop_caches
# echo "8:0 1000000" > /cgroup/blkio/blkio.throttle.read_bps_device
# echo "8:0 1000000" > /cgroup/blkio/blkio.throttle.write_bps_device
# dd if=/dev/zero of=/tmp/zerofile bs=4K count=102400 oflag=direct
# dd if=/tmp/zerofile of=/tmp/zerofile2 bs=4K count=102400
25426+0 records in
25425+0 records out
104140800 bytes (104 MB) copied, 102.21 s, 1.0 MB/s
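The major:minor pair can also be obtained programmatically rather than read off ls output. A sketch: stat prints the device numbers in hex, so they need converting to decimal before being written to the throttle files (here /dev/null stands in as a device that always exists; in practice you would query /dev/sda):

```shell
# stat's %t/%T print the device major/minor numbers in hex;
# convert them to decimal to build the "MAJOR:MINOR" string for blkio.
dev=/dev/null                              # stand-in; use your block device
major=$(printf '%d' "0x$(stat -c %t "$dev")")
minor=$(printf '%d' "0x$(stat -c %T "$dev")")
echo "$major:$minor"                       # /dev/null is 1:3 on Linux
```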
Therefore, if you need hard limits, the virtual machine should use a separate block device (for example, an LVM volume), which has its own major and minor numbers. For images stored as files, only the blkio.weight mechanism can be used.
A natural question is why dd obeys the same limits that we set for the bash interpreter. The answer is simple: cgroups keeps track of the children forked by a parent and automatically places them into tasks.
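This inheritance is easy to observe without setting any limits: a child process reports the same cgroup membership as its parent (a sketch; /proc/self/cgroup lists the groups the current process belongs to):

```shell
# A spawned child starts out in its parent's cgroups, so the two
# /proc/self/cgroup readings below should be identical.
parent_cg=$(cat /proc/self/cgroup)
child_cg=$(bash -c 'cat /proc/self/cgroup')
[ "$parent_cg" = "$child_cg" ] && echo "child inherited the parent's cgroups"
```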
Note that the blkio.throttle.* files only appear if the kernel was built with the CONFIG_BLK_CGROUP and CONFIG_BLK_DEV_THROTTLING options enabled. Out of curiosity, I recommend searching menuconfig for CGROUP and BLK - quite a few interesting things are disabled by default.
Unfortunately, cgroups limits only disk access that bypasses the page cache. If you do not drop the caches or pass oflag=direct or iflag=direct to dd, the limits will not be applied. You can read more about this here and here.
Things are simpler with blkio.weight: it works the same way as cpu.shares.
Working with cgroups as an unprivileged user
An unprivileged user cannot write to cgroups directly. But there is a set of utilities from the libcg project that allow working with cgroups without administrative privileges. On Debian, they come with the cgroup-bin package.
With the package installed, let's see which utilities it provides:
$ ls /usr/*bin/cg*
/usr/bin/cgclassify /usr/bin/cgdelete /usr/bin/cgget /usr/bin/cgsnapshot /usr/sbin/cgconfigparser
/usr/bin/cgcreate /usr/bin/cgexec /usr/bin/cgset /usr/sbin/cgclear /usr/sbin/cgrulesengd
The most useful for us will be cgcreate and cgdelete, along with cgconfigparser, a tool that reads a configuration file and automatically creates the required groups at system startup.
Let's configure cgconfig so that the necessary file systems are mounted automatically at system startup:
$ cat /etc/cgconfig.conf
mount {
cpu = /cgroup/cpu;
cpuacct = /cgroup/cpuacct;
devices = /cgroup/devices;
# memory = /cgroup/memory;
blkio = /cgroup/blkio;
# freezer = /cgroup/freezer;
cpuset = /cgroup/cpuset;
}
$ sudo /etc/init.d/cgconfig restart
We don't really need the freezer and memory controllers; moreover, memory refuses to be mounted automatically, so we will mount it manually when needed.
Now let's create a group that our unprivileged user username can manage on his own:
$ sudo cgcreate -f 750 -d 750 -a username:libvirt -t username:libvirt -g cpu,blkio:username
Where:
- -f and -d set the permissions on the files and directories within the group, respectively;
- -a and -t set the owner of the subsystem parameters and of the tasks file;
- -g creates groups for the specified cpu and blkio controllers relative to the root (in this case they will be /cgroup/cpu/username and /cgroup/blkio/username, respectively).
After this you will see that a username directory has been created under /cgroup/cpu and /cgroup/blkio, and the user username can write PIDs into its tasks files.
It is very convenient to create a group and set limits in it when a virtual machine starts, and to delete the group when it stops:
$ sudo cgdelete cpu,blkio:username
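That lifecycle is easy to wrap in a pair of hooks. A sketch assuming the cgroup-bin utilities described above and an illustrative group name (cgset, like cgcreate and cgdelete, ships in the same package):

```shell
# Hypothetical per-VM lifecycle hooks built on the libcg utilities.
vm=debian_guest                    # illustrative group name

start_limits() {
    # create the group for this VM and give it a CPU weight
    cgcreate -a "$USER:libvirt" -t "$USER:libvirt" -g "cpu,blkio:$vm"
    cgset -r cpu.shares=512 "$vm"
}

stop_limits() {
    # drop the group once the VM has shut down
    cgdelete "cpu,blkio:$vm"
}
```

Calling start_limits from a VM start script and stop_limits from its shutdown script keeps the hierarchy free of stale groups.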
When using cgroups to service virtual machines, the process of creating groups and setting limits can be partly automated. In the /etc/libvirt/qemu.conf file you can list the controllers qemu may work with, as well as the devices a virtual machine is allowed to access:
cgroup_controllers = [ "cpu", "devices", "memory", "cpuset", "blkio" ]
cgroup_device_acl = [
"/dev/null", "/dev/full", "/dev/zero",
"/dev/random", "/dev/urandom",
"/dev/ptmx", "/dev/kvm", "/dev/kqemu",
"/dev/rtc", "/dev/hpet", "/dev/net/tun",
]
It is convenient to use virsh for setting cpu and blkio weights (part of the libvirt library, see the article about
preparing the host system ):
$ virsh schedinfo --set cpu_shares=1024 debian_guest
$ virsh blkiotune debian_guest --weight 1024
The value in /cgroup/cpu/sysdefault/libvirt/qemu/debian_guest/cpu.shares will then change to 1024. But, unfortunately, an ordinary user cannot change settings in the sysdefault tree.
Network load limiting
We have more or less dealt with processor and disk resources; now for the most important part: the network. We want a hard division of bandwidth while keeping latency and server load acceptable. What we are about to do is known as shaping and QoS (Quality of Service). Carving out bandwidth is easy; it is much harder to account for the various usage scenarios of a virtual machine.
Suppose we need to allocate 5 Mbps of bandwidth to a virtual machine. In theory the task is simple; it is solved by adding tc rules (a band with minimal delay):
tc qdisc add dev debian_guest root handle 1: htb default 20
tc class add dev debian_guest parent 1: classid 1:1 htb rate 5mbit burst 15k
But a closer look shows that such a simple division is not enough.
Suppose the machine serves a large flow of HTTP traffic and the server becomes hard to reach.
This is solved by prioritization:
tc class add dev debian_guest parent 1:1 classid 1:10 htb rate 5mbit burst 15k prio 1
Everything else goes into a second class:
tc class add dev debian_guest parent 1:1 classid 1:20 htb rate 5mbit burst 15k prio 2
tc qdisc add dev debian_guest parent 1:10 handle 10: sfq perturb 10
SSH traffic should be placed higher:
tc filter add dev debian_guest parent 1:0 protocol ip prio 10 u32 match ip tos 0x10 0xff flowid 1:10
ICMP packets also need to be put higher:
tc filter add dev debian_guest parent 1:0 protocol ip prio 10 u32 match ip protocol 1 0xff flowid 1:10
TCP ACKs also get the highest priority:
tc filter add dev debian_guest parent 1: protocol ip prio 10 u32 match ip protocol 6 0xff match u8 0x05 0x0f at 0 match u16 0x0000 0xffc0 at 2 match u8 0x10 0xff at 33 flowid 1:10
And let everything else go with the second priority.
And here is the policing of incoming traffic:
tc qdisc add dev debian_guest handle ffff: ingress
tc filter add dev debian_guest parent ffff: protocol ip prio 50 u32 match ip src 0.0.0.0/0 police rate 5mbit burst 15k drop flowid :1
You can add countless other rules, but you need to take the specifics of the virtual machine into account - perhaps some ports should be closed, and so on.
In general, it does not hurt to close IRC ports (6667-6669), torrent ports (6881-6889), and other ungodly services.
When the machine stops, you can delete the rules that are no longer needed:
tc qdisc del dev debian_guest root
tc qdisc del dev debian_guest ingress
Statistics on the interface can be easily viewed:
tc -s -d qdisc show dev debian_guest
tc -s -d class show dev debian_guest
tc -s -d filter show dev debian_guest
Accordingly, you can see how much traffic enters the interface and leaves it, and it is easy to set up all sorts of graphs.
Very conveniently, you can combine iptables and tc rules using the mangle table (Google will help) and then manage the marked traffic through tc. For example, suppose we need to mark bittorrent traffic and allocate a 1 kilobit band for it:
tc class add dev debian_guest parent 1:1 classid 1:30 htb rate 1kbit prio 3
tc qdisc add dev debian_guest parent 1:30 handle 30: sfq perturb 10
tc filter add dev debian_guest parent 1: protocol ip prio 3 handle 30 fw flowid 1:30
iptables -t mangle -A POSTROUTING -o debian_guest -p tcp --sport 6881:6889 -j MARK --set-mark 30
iptables -t mangle -A POSTROUTING -o debian_guest -p tcp --dport 6881:6889 -j MARK --set-mark 30
Note that --sport/--dport require an explicit protocol (-p tcp), and the fw filter is what actually directs packets carrying mark 30 into class 1:30.
In conclusion
After reading this series of articles (1, 2, 3, and 4), you will be able to install and configure the KVM virtualization system and the libvirt management toolkit, and build your own virtual machine image. All this will let you save resources, increase system reliability (thanks to live migration of guests), and simplify the creation of backups (for example, using LVM snapshots).
I hope I have managed to convey some useful information, and that the experience shared in this series of articles saves you time.
This concludes this series of articles. If anything is of interest, ask in the comments - I am always ready to answer any questions that arise.