Diskless network boot and life after it

Story

One day some servers arrived (well, not by themselves ...) with 14 hard drives of 2 TB each. Having got rid of the hardware RAID (why is a separate question), we decided it would be nice to boot them over the network and avoid the fuss with partitions. The disks were to be exported via iSCSI, and we did not want to set aside some disks as special system disks and the rest for everything else. Thus the task arose: boot over the network with the root directory located in RAM.

Theory

For the system to boot, it needs three components: the kernel, the initial environment (initramfs), and the root directory in which the system will run.

Practice

All steps are performed on a machine running Ubuntu precise (12.04).

PXE

First, let's configure PXE. There are plenty of manuals on this topic, so I will cover only the essentials.
Install your favorite DHCP server, for example isc-dhcp-server. It will hand out IP addresses to the machines and tell them the path to the pxelinux.0 file, which is served by the TFTP server (tftpd-hpa or atftpd).

aptitude install isc-dhcp-server tftpd-hpa

An example DHCP server config. Here the PXE server is located at 10.0.0.1.

 option domain-name-servers 8.8.8.8;
 server-name "pxe";
 subnet 10.0.0.0 netmask 255.255.255.0 {
     range dynamic-bootp 10.0.0.2 10.0.0.10;
     option subnet-mask 255.255.255.0;
     option routers 10.0.0.1;
     option root-path "10.0.0.1:/var/lib/tftpboot/";
     filename "pxelinux.0";
 }

Start the TFTP server (on Ubuntu it has an init script, but you may have to run it through inetd/xinetd).
Check that it works: put the file in the /var/lib/tftpboot directory and try to fetch it with a tftp client.

 tftp 10.0.0.1
 tftp> get pxelinux.0

In principle, it does not matter where you get the pxelinux.0 file from, since it is just a bootloader to which we pass what needs to be loaded next.
You can build a nice menu in the bootloader, but we do not need one right now, so my pxelinux.cfg/default looks like this:

 default vesamenu.c32
 prompt 1
 timeout 2

 label ubuntu 12.04
     menu label Ubuntu precise
     kernel vmlinuz
     append initrd=initrd.img boot=ram rooturl=http://10.0.0.1/rootfs.squashfs ip=dhcp
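If the boot server address or the image name changes, the config has to be edited by hand. As a minimal sketch, the function below prints the same config for a given server and image. The function name `pxe_default_config` and its two arguments are our own illustration, not part of pxelinux, and the directives mirror the config above (using the standard `prompt` directive).

```shell
#!/bin/sh
# Print a pxelinux.cfg/default for a given boot server and image name.
# pxe_default_config is a hypothetical helper; the defaults are the
# values used in this article.
pxe_default_config() {
    server="${1:-10.0.0.1}"
    image="${2:-rootfs.squashfs}"
    cat <<EOF
default vesamenu.c32
prompt 1
timeout 2

label ubuntu 12.04
    menu label Ubuntu precise
    kernel vmlinuz
    append initrd=initrd.img boot=ram rooturl=http://${server}/${image} ip=dhcp
EOF
}

# Example: regenerate the config for the server used in this article.
# pxe_default_config 10.0.0.1 rootfs.squashfs > /var/lib/tftpboot/pxelinux.cfg/default
```

Only the rooturl value depends on the arguments; everything else is emitted unchanged.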

rootfs

We build the rootfs image with debootstrap, chroot into it, and install the necessary programs. We configure the network, hostname, firewall, and so on. The more we install and configure, the larger the image will be. Most importantly, do not forget to set the root password.

 mkdir -p /mnt/rootfs
 debootstrap precise /mnt/rootfs/ http://mirror.yandex.ru/ubuntu/
 chroot /mnt/rootfs /bin/bash
 aptitude install vim ...

With our minimal set of packages, the system came to about 200 MB.

Initramfs

In this example, we fetch the root fs image from a web server running on our network boot server, that is, at 10.0.0.1. We chose this approach simply because our initramfs already included the wget utility. To avoid pulling a large amount of data over the network, we decided to compress the image. An ordinary tar would do, but squashfs is worth trying, especially since tar is usually not built into initramfs (although nothing prevents you from adding it).

Squashfs
Squashfs is a compressed read-only file system that has been included in the kernel since version 2.6.29. With it you can pack a directory into an image, mount the image through a loop device, and read from it; to write, you have to go through the procedure of adding files to the archive. Since every access to squashfs reads from a compressed archive, it puts additional load on the CPU.

 mksquashfs /mnt/rootfs/ rootfs.squashfs -noappend -always-use-fragments
 du -hs rootfs.squashfs
 92M rootfs.squashfs

For more efficient compression, you can use the -comp option to set the type of compression; gzip is used by default.

Next, you need to teach init from initramfs to take the root image and put it into RAM.

The init in initramfs is an sh script that parses options from the kernel command line, mounts the file systems, performs switch_root, and starts the system's main init process.
Let's take advantage of this and add our own command-line options. We will write a script named ram, which is called when the boot=ram option is set.

vim /usr/share/initramfs-tools/scripts/ram

 #!/bin/bash

 retry_nr=0

 do_rammount()
 {
     log_begin_msg "Configuring networking"
     configure_networking
     log_end_msg

     log_begin_msg "Downloading rootfs image"
     mkdir -p /tmp/squashfs
     wget ${rooturl} -O /tmp/squashfs/rootfs.squashfs
     log_end_msg

     log_begin_msg "Mounting rootfs image to /mnt/squashfs"
     mkdir -p /mnt/squashfs
     mount -t squashfs -o loop /tmp/squashfs/rootfs.squashfs /mnt/squashfs
     log_end_msg

     log_begin_msg "Mounting tmpfs and copy rootfs image"
     mkdir -p ${rootmnt}
     mount -t tmpfs -o size=1G none ${rootmnt}
     cp -r -v /mnt/squashfs/* ${rootmnt} || exit 2
     log_end_msg

     log_begin_msg "Umount squashfs"
     umount /mnt/squashfs || exit 2
     log_end_msg
 }

 mountroot()
 {
     for x in $(cat /proc/cmdline); do
         case $x in
             rooturl=*)
                 export rooturl=${x#rooturl=}
                 ;;
         esac
     done

     log_begin_msg "Loading module squashfs"
     modprobe squashfs
     log_end_msg

     # For DHCP
     modprobe af_packet

     wait_for_udev 10

     # Default delay is around 180s
     delay=${ROOTDELAY:-180}

     # loop until rammount succeeds
     do_rammount
     while [ ${retry_nr} -lt ${delay} ] && [ ! -e ${rootmnt}${init} ]; do
         log_begin_msg "Retrying rammount"
         /bin/sleep 1
         do_rammount
         retry_nr=$(( ${retry_nr} + 1 ))
         log_end_msg
     done
 }
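The command-line parsing in mountroot() can be tried on an ordinary machine before rebuilding the initramfs. The sketch below wraps the same `case` pattern in a function that takes the command line as a string instead of reading /proc/cmdline. The name `parse_rooturl` is our own, and the sample command line is the one from the pxelinux config in this article.

```shell
#!/bin/sh
# Extract the value of rooturl= from a kernel command line string.
# If the option appears several times, the last one wins, as in the
# boot script. Prints an empty line if the option is absent.
parse_rooturl() {
    url=""
    # $1 is left unquoted on purpose so that it is split into words.
    for x in $1; do
        case $x in
            rooturl=*) url=${x#rooturl=} ;;
        esac
    done
    printf '%s\n' "$url"
}

cmdline='initrd=initrd.img boot=ram rooturl=http://10.0.0.1/rootfs.squashfs ip=dhcp'
parse_rooturl "$cmdline"    # prints http://10.0.0.1/rootfs.squashfs
```

The `${x#rooturl=}` expansion strips the shortest matching prefix, so everything after the first `rooturl=` is kept intact, including any `=` inside the URL.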

The rooturl parameter specifies where to download the root fs image from. To work with squashfs, its module has to be loaded into the kernel. Set BOOT=ram in /etc/initramfs-tools/initramfs.conf and rebuild the initramfs:

 mkinitramfs -o /var/lib/tftpboot/initrd.img

Turn on the test machine and watch what happens. After a successful boot we get a diskless system that takes up about 300 MB of memory. We can write to it, but after a reboot the system returns to its original state.
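To see how much space the root file system occupies on the booted machine, df can be pointed at the root mount. This is only a rough sketch: the function name `used_mb` is our own, and it assumes the tmpfs variant above, where the space used on / is held in RAM (page cache and the downloaded image are not counted).

```shell
#!/bin/sh
# Print the used space of a mount point in whole megabytes.
# Defaults to the root file system when no argument is given.
used_mb() {
    # -P forces one line per file system, -k reports 1024-byte blocks.
    df -Pk "${1:-/}" | awk 'NR==2 { printf "%d\n", $3 / 1024 }'
}

used_mb /
```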

In this example we used squashfs only to compress the image, but why not mount the squashfs image itself as the root partition and see what happens? We change our script so that the do_rammount() function keeps only the squashfs mount.

 do_rammount()
 {
     log_begin_msg "Configuring networking"
     configure_networking
     log_end_msg

     log_begin_msg "Downloading rootfs image"
     mkdir -p /tmp/squashfs
     wget ${rooturl} -O /tmp/squashfs/rootfs.squashfs
     log_end_msg

     log_begin_msg "Mounting rootfs image to /mnt/squashfs"
     mkdir -p /mnt/squashfs
     mount -t squashfs -o loop /tmp/squashfs/rootfs.squashfs ${rootmnt}
     log_end_msg
 }

Rebuild the initramfs, boot, and look. The system boots in read-only (ro) mode, but it takes up only about 180 MB of memory.
In some cases a read-only root is fine, but it does not suit us, and we do not want to waste RAM either. The solution turned out to be Aufs.

Aufs
Aufs lets you stack file systems into a single union mount: one branch in read-only mode and a second in rw. It works in copy-on-write mode, that is, all changes are written to the rw branch, and subsequent reads of changed files are served from it.
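The lookup order can be illustrated without root privileges or the aufs module. The sketch below is not aufs: it is a simulation with two plain directories, and the helper names `union_read` and `union_write` are our own. It only mimics the rule that reads prefer the rw layer and writes never touch the ro layer.

```shell
#!/bin/sh
# Simulate union-mount lookups with two plain directories.
# This is NOT aufs; it only mimics the read/write order of the layers.

union_read() {   # union_read <rw_dir> <ro_dir> <relative_path>
    if [ -e "$1/$3" ]; then
        cat "$1/$3"          # a changed copy exists in the rw layer
    else
        cat "$2/$3"          # fall back to the read-only layer
    fi
}

union_write() {  # union_write <rw_dir> <relative_path> <content>
    mkdir -p "$(dirname "$1/$2")"
    printf '%s\n' "$3" > "$1/$2"   # writes always land in the rw layer
}

work=$(mktemp -d)
mkdir -p "$work/ro/etc" "$work/rw"
echo "original" > "$work/ro/etc/hostname"

union_read "$work/rw" "$work/ro" etc/hostname    # prints "original" (from ro)
union_write "$work/rw" etc/hostname "changed"
union_read "$work/rw" "$work/ro" etc/hostname    # prints "changed" (from rw)
cat "$work/ro/etc/hostname"                      # prints "original" (ro untouched)
rm -rf "$work"
```

A real union file system does this inside the kernel and copies a file up to the rw branch the first time it is modified; the simulation simply overwrites the whole file.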
Once again we rewrite our script.
Add the following to the mountroot() function:

 log_begin_msg "Loading module aufs"
 modprobe aufs
 log_end_msg

And the do_rammount() function takes the following form:

 do_rammount()
 {
     log_begin_msg "Configuring networking"
     configure_networking
     log_end_msg

     log_begin_msg "Downloading rootfs image"
     mkdir -p /tmp/squashfs
     wget ${rooturl} -O /tmp/squashfs/rootfs.squashfs
     log_end_msg

     log_begin_msg "Mounting rootfs image to /mnt/ro"
     mkdir -p /mnt/ro
     mount -t squashfs -o loop /tmp/squashfs/rootfs.squashfs /mnt/ro
     log_end_msg

     log_begin_msg "Mounting tmpfs to /mnt/rw"
     mkdir -p /mnt/rw
     mount -t tmpfs -o size=1G none /mnt/rw
     log_end_msg

     log_begin_msg "Mounting aufs to /mnt/aufs"
     mkdir -p /mnt/aufs
     mount -t aufs -o dirs=/mnt/rw=rw:/mnt/ro=ro aufs /mnt/aufs
     log_end_msg

     [ -d /mnt/aufs/mnt/ro ] || mkdir -p /mnt/aufs/mnt/ro
     [ -d /mnt/aufs/mnt/rw ] || mkdir -p /mnt/aufs/mnt/rw

     mount --move /mnt/ro /mnt/aufs/mnt/ro   # move squashfs into aufs
     mount --move /mnt/rw /mnt/aufs/mnt/rw   # move tmpfs into aufs

     mount --move /mnt/aufs ${rootmnt}       # move aufs to ${rootmnt}
 }

Rebuild the initramfs, boot, and look. The system takes up 181 MB of memory, and we can modify it, write to it, and read from it. All changes are stored separately in /mnt/rw, while the system itself is stored in /mnt/ro.

As a result, we have a system that boots over the network, takes up little memory, and loses all changes after each reboot (so any by-products of the system's work that we need must be collected in a safe place beforehand).
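Since every change ends up under /mnt/rw, one simple way to collect it before a reboot is to archive that directory. This is a minimal sketch rather than a backup scheme: the function name `save_changes` and the destination path in the example are hypothetical. As far as we know, aufs also keeps its own bookkeeping files (names starting with .wh.) in the rw branch, so these may show up in the archive.

```shell
#!/bin/sh
# Archive everything under the writable layer so it can be copied
# off the machine. Arguments: <rw_dir> <output_tarball>.
save_changes() {
    # -C switches into the layer so that archived paths are relative.
    tar -czf "$2" -C "$1" .
}

# Example (the destination is a hypothetical persistent mount):
# save_changes /mnt/rw /mnt/persistent/changes.tar.gz
```

The tarball should be written somewhere outside /mnt/rw, otherwise it is lost on reboot together with everything else.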

All of the methods above have a right to exist. I hope this information is useful to you, and I will be interested to read or hear your comments.
Thank you for your attention.


Source: https://habr.com/ru/post/164147/
