
A Linux distribution from scratch for building Docker images: our experience with dappdeps


Building a Docker image on top of a base image usually involves invoking commands in the environment of that base image: for example, calling apt-get, which is present in the base image, to install new packages.

Often there is a need to install some set of utilities into the base system and use them to build or install files required in the final image. For example, to build a Go application, you need to install the Go compiler, put the application sources into the base image, and compile the program. However, the final image needs only the compiled program, not the whole set of utilities used to build it.

The problem is well known, and one way to solve it is to build an auxiliary image and transfer files from it to the resulting one. For this purpose, Docker introduced multi-stage builds, and dapp has artifact images. This approach neatly solves the problem of transferring compilation results into the final image. However, it does not solve all possible problems...
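
For the Go scenario mentioned above, a multi-stage Dockerfile might look like this (a sketch; the image tag, paths, and build flags are illustrative, not taken from the article):

```dockerfile
# Stage 1: build the program with the full Go toolchain
FROM golang:1.10 AS build
WORKDIR /src
COPY . .
RUN go build -o /app .

# Stage 2: the final image receives only the compiled binary
FROM scratch
COPY --from=build /app /app
ENTRYPOINT ["/app"]
```

The Go toolchain and the sources stay in the first stage and never reach the final image.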

Here is another example: Chef in local mode is used to build an image. To do this, chefdk is installed in the base image, recipes are mounted or added, and these recipes are run to customize the image: they install new components, packages, config files and so on. Similarly, another configuration management system can be used - for example, Ansible. However, the installed chefdk takes about 500 MB and significantly increases the size of the final image - there is no point in leaving it there.
Docker's multi-stage builds no longer solve this problem. What if the user does not want to know about the side effects of the program - in particular, which files it creates? For example, so as not to maintain unnecessary explicit lists of all the paths to export from an image. One just wants to run the program and get some result in the image, but have the program and the whole environment needed for its work stay out of the final image.

In the chefdk case, one could mount the directory with chefdk into the build image for the duration of the build. But this solution has problems:

  1. Not every program needed for the build is installed into a separate directory that is easy to mount into the build image. In the case of Ansible, you would have to install Python in a non-standard place so as not to conflict with the system Python, and this may cause problems.
  2. The mounted program depends on the base image used. If the program was compiled for Ubuntu, it may not start in an environment it was not built for - for example, in Alpine. Even chefdk, which is an omnibus package bundling all its dependencies, still depends on the system glibc and will not work in Alpine, which uses musl libc.

But what if we could prepare a kind of static, unchanging set of all possible useful utilities, linked so cleverly that it works in any base image, even scratch? After mounting such an image into the base one, the final image would contain only an empty mount-point directory where these utilities were attached.

In search of adventures


Theory


We need an image that contains a set of programs in some fixed, non-standard directory - for example, /myutils. Any program in /myutils must depend only on libraries in /myutils.
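
This requirement can be checked mechanically by filtering the output of ldd. Below is a sketch: the sample lines are hypothetical, and in practice you would pipe `ldd /myutils/bin/bash` into the awk filter instead of printf:

```shell
# Flag any shared-library dependency resolved outside /myutils.
# Sample ldd-style lines stand in for real `ldd` output here.
printf '%s\n' \
  'libtinfo.so.5 => /myutils/lib/libtinfo.so.5 (0x00007f0000000000)' \
  'libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0000200000)' |
awk '$3 ~ /^\// && $3 !~ /^\/myutils/ {print "outside:", $3}'
# prints: outside: /lib/x86_64-linux-gnu/libc.so.6
```

An empty result would mean every dependency lives under /myutils.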

A dynamically linked Linux program depends on the location of the ld-linux dynamic linker in the system. For example, the bash binary in ubuntu:16.04 is compiled so that it depends on the linker /lib64/ld-linux-x86-64.so.2:

  $ ldd /bin/bash
  linux-vdso.so.1 =>  (0x00007ffca67d8000)
  libtinfo.so.5 => /lib/x86_64-linux-gnu/libtinfo.so.5 (0x00007fd8505a6000)
  libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fd8503a2000)
  libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fd84ffd8000)
  /lib64/ld-linux-x86-64.so.2 (0x00007fd8507cf000)

Moreover, this dependency is static and compiled into the binary itself:

  $ grep "/lib64/ld-linux-x86-64.so.2" /bin/bash
  Binary file /bin/bash matches

Thus, it is necessary: a) to compile the hypothetical /myutils/bin/bash so that it uses the linker /myutils/lib64/ld-linux-x86-64.so.2; b) to configure the linker /myutils/lib64/ld-linux-x86-64.so.2 so that it dynamically loads libraries from /myutils/{lib64,lib}.
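
Requirement (a) can be met at link time with standard GNU ld options. A sketch with hypothetical /myutils paths (the gcc command is only printed here, not executed):

```shell
# Flags that embed a non-standard dynamic linker and library search
# path into a binary at link time (standard GNU ld options).
PREFIX=/myutils
LDFLAGS="-Wl,--dynamic-linker=$PREFIX/lib64/ld-linux-x86-64.so.2 -Wl,-rpath,$PREFIX/lib:$PREFIX/lib64"
# Show the resulting command line instead of invoking the compiler
echo gcc $LDFLAGS -o "$PREFIX/bin/hello" hello.c
```

A binary linked this way asks for /myutils/lib64/ld-linux-x86-64.so.2 instead of the system linker and searches /myutils/{lib,lib64} first.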

The first step is to build a toolchain image containing everything needed to build, and subsequently run, other programs in a non-standard root directory. Here, the instructions of the Linux From Scratch project will come in handy.


Building the dappdeps distribution


Why is the set of images of our "distribution" called dappdeps? Because these images are used by the dapp builder - they are built for the needs of that project.

So, our ultimate goal:


dappdeps images may depend on each other. For example, building dappdeps/base requires the toolchain and glibc from the dappdeps/toolchain image, and after compilation the utilities in dappdeps/base will need files from dappdeps/toolchain at runtime.

The main condition is that the utilities from these images live in a non-standard place, namely in /.dapp/deps/, and do not depend on any utilities or libraries in standard system paths. Also, dappdeps images must not contain any files besides /.dapp/deps.

From such images you can create containers with declared volumes containing the utilities, and mount those volumes into other containers with Docker's --volumes-from option.

Building dappdeps/toolchain


Chapter 5, “Constructing a Temporary System,” of the Linux From Scratch book describes the process of building a temporary chroot environment in /tools with a set of utilities, with which the main distribution is then compiled.

In our case we slightly change the directory of the chroot environment: when compiling, we pass /.dapp/deps/toolchain/0.1.1 in the --prefix parameter. This is the directory that will appear in the build container when dappdeps/toolchain is mounted into it - it contains all the necessary utilities and libraries. We only need GNU binutils, GCC and glibc.

The image is built using Docker multi-stage builds: in an image based on ubuntu:16.04, the whole environment is prepared and the programs are compiled and installed into /.dapp/deps/toolchain/0.1.1. Then this directory is copied into the scratch image dappdeps/toolchain:0.1.1. The Dockerfile can be found here.
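
The shape of that Dockerfile can be sketched as follows (illustrative; the actual LFS-style build commands are omitted):

```dockerfile
# Stage 1: compile binutils, GCC and glibc with --prefix=/.dapp/deps/toolchain/0.1.1
FROM ubuntu:16.04 AS toolchain-build
# ... LFS chapter 5 style build steps go here ...

# Stage 2: keep only the toolchain directory on top of an empty image
FROM scratch
COPY --from=toolchain-build /.dapp/deps/toolchain/0.1.1 /.dapp/deps/toolchain/0.1.1
```

The final image thus contains nothing but the /.dapp/deps/toolchain/0.1.1 tree.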

The final dappdeps/toolchain image is the “temporary system” in LFS terminology. GCC in this system is still tied to the system library paths, but we will not try to make GCC work in any base image. dappdeps/toolchain is an auxiliary image; it will be used later, among other things, to build programs that really are independent of the shared system libraries.

Using Omnibus with dappdeps/toolchain



For projects such as chefdk or GitLab, Omnibus is used. It allows you to create self-contained bundles of a program and all its dependent libraries, except the system linker and libc. Builds are described by easy-to-read Ruby recipes, and the Omnibus project also maintains a library of ready-made omnibus-software recipes.

So, let's describe the builds of the remaining dappdeps images with Omnibus. To get rid of the dependency on the system linker and libc, we will build all programs in Omnibus with the compiler from dappdeps/toolchain. The programs will then be tied to the glibc that also lives in dappdeps/toolchain.

To do this, save the contents of dappdeps/toolchain as an archive:

  $ docker pull dappdeps/toolchain:0.1.1
  $ docker save dappdeps/toolchain:0.1.1 -o dappdeps-toolchain.tar

Add this archive via the Dockerfile ADD directive and unpack its contents into the root of the build container:

  ADD ./dappdeps-toolchain.tar /dappdeps-toolchain
  RUN tar xf /dappdeps-toolchain/**/layer.tar -C /

Before running the build via Omnibus, prepend /.dapp/deps/toolchain/0.1.1/bin to the PATH variable so that GCC from dappdeps/toolchain takes priority.
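
In shell terms this boils down to one export (a sketch; the directory only exists once dappdeps/toolchain has been unpacked into the build container):

```shell
# Prepend the toolchain's bin directory so its gcc shadows the system compiler
TOOLCHAIN_BIN=/.dapp/deps/toolchain/0.1.1/bin
export PATH="$TOOLCHAIN_BIN:$PATH"
# The first PATH entry is now the toolchain directory
echo "$PATH" | cut -d: -f1
# prints: /.dapp/deps/toolchain/0.1.1/bin
```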

The output of Omnibus is a package (in our case a DEB), whose contents are unpacked and transferred into /.dapp/deps/{base|gitartifact|...} using Docker multi-stage builds, just as for dappdeps/toolchain.

Building dappdeps/base


The Omnibus project is described by the project file dapp/dappdeps/base/omnibus/config/projects/dappdeps-base.rb :

  name 'dappdeps-base'
  license 'MIT'
  license_file 'LICENSE.txt'

  DOCKER_IMAGE_VERSION = "0.2.3"

  install_dir "/.dapp/deps/base/#{DOCKER_IMAGE_VERSION}"

  build_version DOCKER_IMAGE_VERSION
  build_iteration 1

  dependency "dappdeps-base"

This file specifies the dependencies of the dappdeps-base Omnibus package and the target installation directory. Dependencies can live either in a separate repository (for example, omnibus-software) or in the omnibus/config/software directory; each file there describes the installation instructions for one package/component. For dappdeps-base, we wrote the software recipes that are missing from the standard omnibus-software repository: acl, attr, coreutils, diffutils, findutils, gtar, rsync, sed, shadow, sudo, termcap.

Here is what an Omnibus software recipe looks like, using rsync as an example:

  name 'rsync'
  default_version '3.1.2'

  license 'GPL-3.0'
  license_file 'COPYING'

  version('3.1.2') { source md5: '0f758d7e000c0f7f7d3792610fad70cb' }

  source url: "https://download.samba.org/pub/rsync/src/rsync-#{version}.tar.gz"

  dependency 'attr'
  dependency 'acl'
  dependency 'popt'

  relative_path "rsync-#{version}"

  build do
    env = with_standard_compiler_flags(with_embedded_path)

    command "./configure --prefix=#{install_dir}/embedded", env: env
    command "make -j #{workers}", env: env
    command 'make install', env: env
  end

The source directive specifies the URL to download the source code from. Dependencies on other components are declared with dependency by name, and the name of the component being built is given by the name directive. Each software recipe may in turn declare dependencies on other components. Inside the build block are the usual commands for building from source.

The Omnibus project and the Dockerfile for dappdeps/base can be found here.

Building dappdeps/gitartifact


For dappdeps-gitartifact, only a recipe for building Git is needed, and it already exists in omnibus-software - all that remains is to plug it into our Omnibus project. Everything else is the same.

The Omnibus project and the Dockerfile for dappdeps/gitartifact can be found here.

Building dappdeps/chefdk


For chefdk there is also a ready-made Omnibus project. It only remains to add it to the build container via the Dockerfile and replace the standard chefdk installation path /opt/chefdk with /.dapp/deps/chefdk/2.3.17-2 (our installation path includes the Chef version).

The Dockerfile for building dappdeps/chefdk can be found here.

Building dappdeps/ansible


To build Ansible, we also create an Omnibus project, in which we install the Python interpreter and pip and describe a software recipe for Ansible:

  name "ansible"

  ANSIBLE_GIT_TAG = "v2.4.4.0+dapp-6"

  dependency "python"
  dependency "pip"

  build do
    command "#{install_dir}/embedded/bin/pip install https://github.com/flant/ansible/archive/#{ANSIBLE_GIT_TAG}.tar.gz"
    command "#{install_dir}/embedded/bin/pip install pyopenssl"
  end

As you can see, the Ansible image is an embedded Python, pip, and Ansible installed via pip together with its dependencies.

The Omnibus project and the Dockerfile for dappdeps/ansible can be found here.

How to use the dappdeps distribution?


To use dappdeps images via volume mounts, you first need to create a container from each image and declare which volume this container provides. This is done with docker create:

  $ docker create --name dappdeps-toolchain --volume /.dapp/deps/toolchain/0.1.1 dappdeps/toolchain:0.1.1 no-such-cmd
  13edda732176a44d7d822202d8327565b78f4a2190368bb1df46cdad1e127b6e

  $ docker ps -a | grep dappdeps-toolchain
  13edda732176  dappdeps/toolchain:0.1.1  "no-such-cmd"  About a minute ago  Created  dappdeps-toolchain

The container is named dappdeps-toolchain: by this name, all declared volumes of this container can be mounted into other containers with --volumes-from. The no-such-cmd command argument is required by Docker, but this container will never be started - it will stay in the Created state.

Create the remaining containers:

  $ docker create --name dappdeps-base --volume /.dapp/deps/base/0.2.3 dappdeps/base:0.2.3 no-such-cmd
  20f524c5b8b4a59112b4b7cb85e47eee660c7906fb72a4935a767a215c89964e

  $ docker create --name dappdeps-ansible --volume /.dapp/deps/ansible/2.4.4.0-10 dappdeps/ansible:2.4.4.0-10 no-such-cmd
  cd01ae8b69cd68e0611bb6c323040ce202e8e7e6456a3f03a4d0a3ffbbf2c510

  $ docker create --name dappdeps-gitartifact --volume /.dapp/deps/gitartifact/0.2.1 dappdeps/gitartifact:0.2.1 no-such-cmd
  2c12a8743c2b238d90debaf066e29685b41b138c10f2b893a815931df866576d

  $ docker create --name dappdeps-chefdk --volume /.dapp/deps/chefdk/2.3.17-2 dappdeps/chefdk:2.3.17-2 no-such-cmd
  4dffe74c49c8e4cdf9d749177ae9efec3bdae6e37c8b6df41b6eb527a5c1d891

So we have reached the climax this whole endeavor was conceived for. As a demonstration, let's install the nginx and tree packages into an Alpine image by running Ansible from dappdeps/ansible via Bash from dappdeps/base:

  $ docker run -ti --name mycontainer \
      --volumes-from dappdeps-toolchain \
      --volumes-from dappdeps-base \
      --volumes-from dappdeps-gitartifact \
      --volumes-from dappdeps-ansible \
      --volumes-from dappdeps-chefdk \
      alpine:latest \
      /.dapp/deps/base/0.2.3/embedded/bin/bash -lc '/.dapp/deps/ansible/2.4.4.0-10/embedded/bin/ansible localhost -m apk -a "name=nginx,tree update_cache=yes"'
  [WARNING]: Unable to parse /etc/ansible/hosts as an inventory source
  [WARNING]: No inventory was parsed, only implicit localhost is available
  [WARNING]: provided hosts list is empty, only localhost is available. Note that the implicit localhost does not match 'all'
  localhost | SUCCESS => {
      "changed": true,
      "failed": false,
      "msg": "installed nginx tree package(s)",
      "packages": [
          "pcre",
          "nginx",
          "tree"
      ],
      "stderr": "",
      "stderr_lines": [],
      "stdout": "(1/3) Installing pcre (8.41-r1)\n(2/3) Installing nginx (1.12.2-r3)\nExecuting nginx-1.12.2-r3.pre-install\n(3/3) Installing tree (1.7.0-r1)\nExecuting busybox-1.27.2-r7.trigger\nOK: 6 MiB in 14 packages\n",
      "stdout_lines": [
          "(1/3) Installing pcre (8.41-r1)",
          "(2/3) Installing nginx (1.12.2-r3)",
          "Executing nginx-1.12.2-r3.pre-install",
          "(3/3) Installing tree (1.7.0-r1)",
          "Executing busybox-1.27.2-r7.trigger",
          "OK: 6 MiB in 14 packages"
      ]
  }

The final chord: create an image from the resulting container and... see that only empty mount-point directories from dappdeps remain in it!

  $ docker commit mycontainer myimage
  sha256:9646be723b91daeaf538b7d92bb8844578abc7acd3028394f543e883eeb382bb

  $ docker run -ti --rm myimage tree /.dapp
  /.dapp
  └── deps
      ├── ansible
      │   └── 2.4.4.0-10
      ├── base
      │   └── 0.2.3
      ├── chefdk
      │   └── 2.3.17-2
      ├── gitartifact
      │   └── 0.2.1
      └── toolchain
          └── 0.1.1

  11 directories, 0 files


It would seem, what more could one wish for?..

Further work and problems


What are the problems with dappdeps?


We need to work on reducing the size of dappdeps/toolchain. To do this, the toolchain should be split into two parts: the part needed to build new utilities for dappdeps, and the part with base libraries like glibc that must be present at runtime to run those utilities.

To make the Ansible apt module work in dappdeps/ansible, we had to add the contents of Ubuntu's python-apt package directly into the image, without rebuilding it. With this, the apt module works without problems in DEB-based base images, but it requires a specific version of glibc. Since apt is a distribution-specific module anyway, this is acceptable.

What is missing in the Dockerfile?


To use a volume from the dappdeps/toolchain image, you first have to save this image as an archive and then add it to another image via the Dockerfile ADD directive (see the section “Using Omnibus with dappdeps/toolchain”). Dockerfile lacks functionality that would let you simply attach a directory of another image as a VOLUME for the duration of the build, i.e. an analogue of the --volumes-from option for Dockerfile.

Conclusions


We made sure the idea works: it lets you use GNU and other CLI utilities in build instructions, run the Python or Ruby interpreter, and even run Ansible or Chef in Alpine or scratch images. At the same time, the author of the build instructions does not need to know the side effects of the commands being run or explicitly list which files to import, as is the case with Docker multi-stage builds.

The results of this work are also applied in practice: dapp uses dappdeps images in build containers. For example, Git from dappdeps/gitartifact is used to work with patches, and this Git behaves, with some guarantee, identically in all base images. However, how dapp uses dappdeps is beyond the scope of this article (links to the code for the most curious: dapp/deps, dapp/dimg/builder/chef.rb, dapp/dimg/builder/ansible.rb).

The purpose of this article was to convey the idea itself and to show, with a real practical example, that it can be applied.

P.S. All the dappdeps images described are available on hub.docker.com: dappdeps/toolchain:0.1.1 , dappdeps/base:0.2.3 , dappdeps/gitartifact:0.2.1 , dappdeps/ansible:2.4.4.0-10 , dappdeps/chefdk:2.3.17-2 - you are welcome to use them.

Source: https://habr.com/ru/post/352432/

