
Creating LXC containers with a shared file base

The use of lightweight LXC containers is still rather limited, mainly because they are still rather raw. Running them in production is a job for true Jedi. By production here we mean the direct provision of services to customers.



However, for simple separation of services and control of resources, such containers are quite suitable, with some assumptions. For example, we assume that root inside a container is equal to root on the host system.



This article will show how to quickly create lightweight containers on a local disk with shared files, without using LVM snapshots.






Briefly about the essence of LXC containers





LXC is a tool for implementing virtual containers on the Linux kernel. At its core, LXC is simply a collection of userspace utilities that exploit capabilities implemented in the kernel; the kernel itself has no concept of an "LXC container".



The two main components of a container are namespaces and control groups (cgroups). The former provide isolation of container processes from each other, and the latter are responsible for limiting the resources allocated to the container.



Currently, the available namespaces are:

- mount (mnt): file system mount points;
- UTS: hostname and domain name;
- IPC: System V IPC objects and POSIX message queues;
- PID: process identifiers;
- network (net): network interfaces, routing tables, firewall rules;
- user: user and group identifiers.

By the way, the last of these, the user namespace, was promised to be finally finished by kernel version 3.9.



These namespaces are enough for a container to feel independent.



Initially, all processes in the system share common namespaces. When creating a new process, we can tell the kernel to clone the namespaces we need for it. This is done by passing the special CLONE_NEW* flags to the clone() call. By specifying such a flag for a particular namespace, we ensure that the process is created in its own copy of that namespace. This is exactly how the LXC utilities create a new container.
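To get a feel for what these flags do, you can use the unshare(1) wrapper from util-linux, which simply passes the corresponding CLONE_NEW* flags to the kernel. A minimal sketch, assuming the utility is installed (run as root):

 # start a shell in its own UTS and mount namespaces
 $ unshare --uts --mount /bin/bash
 # changing the hostname affects only the new UTS namespace
 $ hostname sandbox
 $ hostname
 sandbox
 # the hostname of the parent namespace stays untouched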



You can detach the namespaces of an existing process by calling unshare(). You can completely replace one namespace of a process with another by using setns(), but this call requires a recent kernel (> 3.0).



It is setns() that is used to "jump" into the container.



Control groups, like namespaces, are implemented in the kernel. In user space, LXC accesses them through the interface of the special cgroup file system. The LXC utilities create a directory in this file system named after the container, and then write the pids of its processes into the control group's files. So the name of the container is essentially the name of a control group.
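To illustrate, here is roughly what happens under the hood, sketched by hand with a hypothetical group named demo (this assumes the memory controller is not yet mounted anywhere else):

 # mount the memory controller and create a control group
 $ mount -t cgroup -o memory none /cgroup
 $ mkdir /cgroup/demo
 # cap the group at 128M
 $ echo 134217728 > /cgroup/demo/memory.limit_in_bytes
 # move the current shell into the group
 $ echo $$ > /cgroup/demo/tasks
 # check the membership
 $ cat /proc/self/cgroup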



Preparing the system for creating LXC containers





Let's skip this step, since it is described in many places, for example here. The essence of the setup is to build a kernel with the necessary options and install the userspace utilities.



Fortunately, the kernels of many modern distributions are already compiled with these options, so you probably won't need to rebuild.



If you are used to managing virtualization with libvirt, the good news is that libvirt fully supports LXC. This article will not cover it, though, so as to stay closer to the subject.



Create a file system base for containers





The usual approach is this: a base LVM device is created, and separate snapshots of it are created for the file systems of each container. This saves disk space, because a snapshot grows only by the amount of modified blocks.

Alternatively, instead of LVM you can use a file system that supports snapshots, for example btrfs.
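For reference, the snapshot variant looks roughly like this (a sketch with a hypothetical volume group vg0 and a base volume lxc-base):

 # create a copy-on-write snapshot of the base volume for one container
 $ lvcreate -s -L 2G -n lxc-container1 /dev/vg0/lxc-base
 $ mkdir -p /lxc/lxc-container1
 $ mount /dev/vg0/lxc-container1 /lxc/lxc-container1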



But this method has a significant drawback: disk writes to LVM snapshots are extremely slow.



Therefore, for certain tasks, you can use the following method instead:

- copy the base system into an ordinary directory on a local disk;
- move the immutable part of the system (bin, lib, lib64, sbin, usr) into a shared common directory;
- for each container, copy only the mutable part and bind-mount the common part read-only.
Let's get started. For the base we will use the same LVM device (although this is not necessary at all):



 $ mkdir -p /lxc/base
 $ mount /dev/mapper/lxc /lxc/base
 $ cat /.exclude
 /dev/*
 /mnt/*
 /tmp/*
 /proc/*
 /sys/*
 /usr/src/*
 /lxc
 $ rsync --exclude-from=/.exclude -avz / /lxc/base/
 $ DEV="/lxc/base/dev"
 $ mknod -m 666 ${DEV}/null c 1 3
 $ mknod -m 666 ${DEV}/zero c 1 5
 $ mknod -m 666 ${DEV}/random c 1 8
 $ mknod -m 666 ${DEV}/urandom c 1 9
 $ mkdir -m 755 ${DEV}/pts
 $ mkdir -m 1777 ${DEV}/shm
 $ mknod -m 666 ${DEV}/tty c 5 0
 $ mknod -m 600 ${DEV}/console c 5 1
 $ mknod -m 666 ${DEV}/full c 1 7
 $ mknod -m 600 ${DEV}/initctl p
 $ mknod -m 666 ${DEV}/ptmx c 5 2




After the copying finishes, let's create the immutable part. Let's call it common:



 $ cd /lxc/base
 $ mkdir common
 $ mv bin lib lib64 sbin usr common/
 $ ln -s common/bin
 $ ln -s common/sbin
 $ ln -s common/lib
 $ ln -s common/lib64
 $ ln -s common/usr
 $ chroot /lxc/base
 $ > /etc/fstab




After that, remove start_udev from /etc/rc.sysinit, disable unnecessary services, and make any additional adjustments at your discretion. Remove the hostname from the configuration files so that it is not redefined when the container starts.
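On CentOS 6, for example, this boils down to something like the following (a sketch; exact file names and service names depend on the distribution):

 $ chroot /lxc/base
 # udev must not start inside a container
 $ sed -i '/start_udev/d' /etc/rc.sysinit
 # do not let the init scripts override the container hostname
 $ sed -i '/^HOSTNAME=/d' /etc/sysconfig/network
 # disable an unnecessary service, as an example
 $ chkconfig udev-post off
 $ exit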



Now mount the cgroup file system through which container resources will be limited. Limiting is done by creating a directory named after the container inside this file system; the directory is created (and deleted) by the LXC utilities.



 $ mount -t cgroup -o cpuset,memory,cpu,devices,net_cls none /cgroup 




We explicitly specify the controllers we want to mount because, by default, the CentOS 6/RHEL 6 distributions also mount the blkio controller, which does not support the nested hierarchies required for LXC to work. There is no such problem on Ubuntu/Debian.



The cgclear utility from libcgroup can also be useful: it not only unmounts control groups but also destroys them at the kernel level. This helps avoid the -EBUSY error when re-mounting individual controllers.



Now let's create a network bridge into which all the containers will be plugged. Be careful: while this operation is performed, the network disappears.



 $ brctl addbr br0
 $ brctl addif br0 eth0
 $ ifconfig eth0 0.0.0.0
 $ ifconfig br0 10.0.0.15 netmask 255.255.255.0
 $ route add default gw 10.0.0.1




All new virtual container interfaces will be added to this bridge.



Do not forget to reflect all these changes in the distribution's startup configuration files.
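On CentOS 6/RHEL 6 that means creating ifcfg files for the bridge, roughly like this (a sketch for the addresses used above):

 $ cat > /etc/sysconfig/network-scripts/ifcfg-br0 << EOF
 DEVICE=br0
 TYPE=Bridge
 BOOTPROTO=static
 IPADDR=10.0.0.15
 NETMASK=255.255.255.0
 GATEWAY=10.0.0.1
 ONBOOT=yes
 EOF
 $ cat > /etc/sysconfig/network-scripts/ifcfg-eth0 << EOF
 DEVICE=eth0
 BRIDGE=br0
 ONBOOT=yes
 EOF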



Create LXC Container





After preparing the base image of the system, we can proceed directly to creating the first container. Let's simply call it lxc-container.



The procedure for creating a container consists of three simple steps:

- set up an fstab for the container;
- prepare the container's file system from the base image;
- write the container configuration file.
Set up fstab for our container:



 $ cat > /lxc/lxc-container.fstab << EOF
 devpts /lxc/lxc-container/dev/pts devpts defaults 0 0
 proc /lxc/lxc-container/proc proc defaults 0 0
 sysfs /lxc/lxc-container/sys sysfs defaults 0 0
 EOF




Now let's prepare the file system for our first lxc-container using the previously created immutable part of the base image.



 $ mkdir /lxc/lxc-container && cd /lxc/lxc-container
 $ rsync --exclude=/dev/* --exclude=/common/* -avz /lxc/base/ .
 $ mount --bind /lxc/base/dev /lxc/lxc-container/dev
 $ mount --bind /lxc/base/common /lxc/lxc-container/common
 $ mount -o remount,ro /lxc/lxc-container/common




Unfortunately, the last two lines cannot be combined into one. Oh well.

As you can see, the main drawback (or main advantage) of the described method shows up here: the base part of the file system inside the container is read-only.



And finally, the most important thing: the container configuration file. In this example, we assume that the lxc utilities are installed in the root of the system.



 $ mkdir -p /var/lib/lxc/lxc-container
 $ cat > /var/lib/lxc/lxc-container/config << EOF
 # container hostname
 lxc.utsname = lxc-name0
 # number of ttys
 lxc.tty = 2
 # root file system and fstab
 lxc.rootfs = /lxc/lxc-container
 lxc.rootfs.mount = /lxc/lxc-container
 lxc.mount = /lxc/lxc-container.fstab
 # network settings
 lxc.network.type = veth
 lxc.network.name = eth0
 lxc.network.link = br0
 lxc.network.flags = up
 lxc.network.mtu = 1500
 lxc.network.ipv4 = 10.0.0.16/24
 # cgroup limits and allowed /dev devices
 lxc.cgroup.memory.limit_in_bytes = 128M
 lxc.cgroup.memory.memsw.limit_in_bytes = 256M
 lxc.cgroup.cpuset.cpus =
 lxc.cgroup.devices.deny = a
 lxc.cgroup.devices.allow = c 1:3 rwm
 lxc.cgroup.devices.allow = c 1:5 rwm
 lxc.cgroup.devices.allow = c 5:1 rwm
 lxc.cgroup.devices.allow = c 5:0 rwm
 lxc.cgroup.devices.allow = c 4:0 rwm
 lxc.cgroup.devices.allow = c 4:1 rwm
 lxc.cgroup.devices.allow = c 1:9 rwm
 lxc.cgroup.devices.allow = c 1:8 rwm
 lxc.cgroup.devices.allow = c 136:* rwm
 lxc.cgroup.devices.allow = c 5:2 rwm
 lxc.cgroup.devices.allow = c 254:0 rwm
 EOF




Note that we denied access to all devices except those explicitly listed, and also limited memory and swap. Despite this limit, utilities such as free inside the container will still display the full physical memory of the host.
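The real limit can always be checked from the host through the container's control group (assuming the cgroup mount point /cgroup used above):

 $ cat /cgroup/lxc-container/memory.limit_in_bytes
 134217728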



Do not forget to reflect all changes in the distribution's startup configuration files, otherwise they will all be lost after a reboot!



Launch the LXC container





It is time to launch our newly created container. To do this, we use the lxc-start utility, passing it the name of our container as an argument:



 $ lxc-start --name lxc-container 
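In this form lxc-start occupies the current terminal. In practice it is more convenient to start the container in the background and keep a log, using the -d and -o options:

 $ lxc-start --name lxc-container -d -o /var/log/lxc-container.log
 $ lxc-info --name lxc-container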




Connect to LXC container





In LXC, there is a problem with jumping into a container from a physical server.



lxc-attach, which is designed for this, works only with a patched kernel. The patches implement the missing functionality for certain namespaces (namely, the mount namespace and the pid namespace). The patches themselves can be downloaded from the link.



The jump functionality is implemented by the special system call setns(), which attaches a third-party process to existing namespaces.
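On fresh kernels with util-linux 2.23 or newer, the nsenter(1) wrapper around setns() can do the same job. A sketch (the exact output format of lxc-info may differ between versions):

 # find the pid of the container init and enter its namespaces
 $ PID=$(lxc-info --name lxc-container | awk '/pid/ { print $2 }')
 $ nsenter --target $PID --mount --uts --ipc --net --pid /bin/bash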



On older systems, a substitute for jumping into the container is lxc-console, which connects to one of the container's virtual consoles:



 $ lxc-console --name lxc-container -t 2 




And before us is the container's console, /dev/tty2:



 CentOS release 6.3 (Final)
 Kernel 2.6.32 on an x86_64

 lxc-container login: root
 Password:
 Last login: Fri Nov 23 14:28:43 on tty2
 $ hostname
 lxc-container
 $ tty
 /dev/tty2
 $ ls -l /dev/tty2
 crw--w---- 1 root tty 136, 3 Nov 26 14:25 /dev/tty2




The /dev/tty2 device has major number 136, so it is not a "real" tty. It is serviced by the pseudo-terminal driver, whose master side is opened on the physical server and whose slave side is opened in the container. In other words, our /dev/tty2 is an ordinary /dev/pts/3 device.



And, of course, you can connect via ssh:



 $ ssh root@lxc-container 




LXC Operation





This is a very interesting, but separate, topic of discussion. Here we only note that the LXC utilities take over some of the container administration tasks, but you can do entirely without them. For example, you can list the processes in the system, divided by container:



 $ ps ax -o pid,command,cgroup 




The cgroup in this case coincides with the name of the container.
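For example, only the processes of our container (the container name is matched in the cgroup column):

 $ ps ax -o pid,command,cgroup | grep lxc-container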

Source: https://habr.com/ru/post/181247/


