Cloud: Cloning disks VS installation

One of the questions that arise when creating a service (in this case it does not matter, cloud or VDS) is how to create client machines.

Most of our competitors use image cloning, which was once installed and configured by the system administrator. We use a clean installation from scratch every machine.

Below I will try to explain what caused this decision, what both approaches have pros and cons.
')

How it all began...

When the cloud had just acquired the first features, the task of automating the creation of virtual machines arose. Of course, the first decision lying on the surface was “take and bend”. Fortunately, all the business there is one team (xe vm-clone). Next was the need to correct the network settings, the host name and root password. All work - half a day. Well, two days with flea fishing.

Made? Made. Fortunately, even beta testers did not see this version.

After analyzing all the negative factors of disk cloning, we still came to the idea of "installation". This process was, to put it mildly, thorny, and it took me about a month and a half, during which I had to very deeply study the specifics of each Linux distribution.

The result is that the existing template system allows us to quite simply change the settings of the created machines, and the whole process is not tied to a kind of mystical “master copy”, sacred and insanely important.

What is a “clean install”? This is when a virtual machine with an empty disk is created, a special kernel is launched with a special initrd, inside of which is an installer (netinst). This installer is given a file with answers to questions (each distribution has its own).

Actually, what is the difference between cloning a disk of an already installed OS and a full installation (this, of course, is not about the process, but about the result)?

Unique ssh-keys. Yes, every ssh server has a private key, which must be unique on each server. This key (in debian / ubuntu) is generated at installation (and not at the first launch). And if two client virtual machines have the same private key, it will be a huge security hole, because the server of one client can impersonate the server of another. Exactly also applies to ssl-certificates and their private keys in Apache and other services that use openssl.
When installing SQL server into debian / ubuntu, a special account is created in the database with a randomly generated password so that dpkg can update the databases. The password for this account must be unique between the virtual machines. If it is the same, it means that your “neighbor in the cloud” knows the admin password from your sql. Nicely? Not really.
Date the files were created. During cloning, you will find that the image creation date refers to the time when the cloud administrator created it. A part of the files (if the administrator rolls updates) has a completely different date. Deadly? Not. Unpleasant? Yes.
Log files contain extra information - extraneous logins in the authorization log, the installation log in general is full of nonsense of unknown years of origin.
The uuids of all possible types (including the file system UUID) are repeated from disk to disk, which makes the word “unique” in the UUID meaningless

Of course, all this can be changed and corrected. If you know about it. And if you do not know? Realize the post-fact that there was a hole on several thousand virtual machines due to the fact that someone forgot about another unique identifier in one of the subsystems (in udev, in device-mapper, somewhere else, what are we forgot to think)? In order to avoid these errors, you need to examine every distribution through. Each. And Debian, and Susi, and Centos, and Ubuntu (although Ubuntu, although a fork of Debian, but amassed a very decent amount of differences); and if one more distribution kit is added?

The level of responsibility when cloning increases incredibly. And disproportionate benefits.

But, in fact, the true motivation for using the installation instead of cloning was the ideological purity of the method. Probably, this may seem strange to someone, but in many matters I want to do not just “make it work”, but follow the ideology laid down in the architecture. If distribution authors assume that their distribution will be installed, then it is correct if it is installed.

However, a clean installation also has some goodies for users.

For example, CentOS users will find in / root the anaconda-ks.cfg file with the settings with which the machine was installed, as well as install.log and install.log.syslog with installation logs. These files refer specifically to the machine where they are located, and not to any once-delivered machine. Exactly the same way, users of Debian / Ubuntu will find their own logs in the / var / log / install directory. About the correct dates for the files I have already written above.

To lie, that there are no minuses in the "full installation" compared to cloning a disk, I will not. They are. But they all relate only to the time of installation and do not affect the future life of the machine.

I will list them:

Installation is slower than cloning, especially in situations of small system disks.
The installation is more capricious - if the caching proxy “hiccups” at least once, the installation will “stick”, in some cases fatal (the installation will hang).
It is much more difficult to make changes to the installation; debugging these changes requires the installation procedure to be launched each time.

As you can see, none of these drawbacks concern the life of the machine after the installation is completed, all the problems are either the torment of the installation itself or problems in planning and deployment. Thus, once having overcome the problem, we get the most pure systems that correspond to the idea of the creators of the distribution.

However, I do not want to curse competitors frantically. The variant with cloning is quite viable, and with due diligence it will make the machines quite similar to “freshly installed” ones. Perhaps the only question for me is "how similar" ...

How it works?

In the database (the very SCAPI, about which I have not yet told) there is a class of objects of the template. Template stores a bunch of options for creating a machine - limits on the minimum / maximum memory, disk; text description, link to icons - in short, everything that is shown to the user when creating the machine (example on the left). In addition, the most important thing is stored there: the text autoinstall_script. Each template has its own script, although for practical purposes it is generated from administrative files using a single file for 32-bit and 64-bit versions of the same OS. This is done by a bunch of variable substitutions, since The actual text of the 32-bit and 64-bit versions is very different (not just the package names).

So, when the user clicks on the “install” button, the script sends to the scapi a series of commands (vm-create, vif-create, vdi-create, vbd-create, etc.) that “create” the machine. After that, run the main command - vm-install. This command takes the parameters required for the installation (such as the host name), takes a script from the database that matches the machine template (the template is transmitted at the time of vm-create) and starts the installation process. As I wrote above, the “installation process” literally means the following: a special kernel, a special initrd, and a bundle of parameters that are passed through the PV-args to the paravirtualized kernel.

Inside the initrd each system has its own installer, which wants its own format. Immediately, as soon as the virtual machine starts up with the kernel / initrd for installation, its settings change to “working”, with the usual for this distribution kernel and initrd to boot. (By the way, which kernel to use for loading, also stores the template in itself).

There are three installation methods among mainstream distros: preseed, kickstart and autoyast.

Preseed is a debian achievement, also used in Ubuntu. It contains answers to questions that packages can ask through deb mechanisms. Unfortunately, the answers to these questions are not always described in the documentation - sometimes you have to look inside the deb / udeb files (udeb is such a small deb that stores installer fragments). It must be said, although debian and ubuntu are very close, the set of answers they have differs significantly enough to be perceived as different operating systems.

The main advantage of preseed, the ability to transfer any answers to questions both in the form of a file and in the form of kernel arguments, which allows the “common” parameters for machines to be transferred to a static file, and everything that is unique to the machine (name, password, IP address) ) pass it through PV-args. By the way, Ubuntu preseed handles the installer from debian (the same red-blue in text mode), so no beautiful gui (which desktop users see when installing) is not there.

Kickstart - the invention of Red Hat, along with the rest of the features of RHEL was completely transferred to CentOS. In terms of capabilities, it is somewhat inferior to preseed, which is why we had to invent a whole system of dynamic generation of kickstart with protection from loading by unauthorized machines (it will be very unpleasant if someone downloads your initial password, right?). Another CentOS piquancy is that its netinstall image does not support Xen, and it has to be patched (replaced by kudzu) before running.

Autoyast is an extension of Yast, the configuration system Suse (and OpenSuse, respectively), on the one hand, allows you to do a lot (more precisely, within the framework of the Yast concept, everything), on the other hand, it does not support normal PV-args in exactly the same way. Plus, autoyast is written in the vivid XML, which is why the size of autoyast.xml is about 5 times bigger than the kickstart and preseed with a much smaller amount of useful information.

It is necessary to tell separately about the specifics of installing CentOS. Because of their excessive adherence to immutability, immediately after installing CentOS, it’s as ... as a centos. That is, yum update gives as much as 80+ MB updates on a freshly installed system. In order not to leave users with ancient systems, yum update had to be included in the% post section for kickstart, which, by the way, explains why centos is put longer than any other system.

Source: https://habr.com/ru/post/113915/

All Articles

Cloud: Cloning disks VS installation

How it all began...

How it works?

More articles: