Building Docker images on top of a base image usually involves invoking commands in the environment of that base image — for example, running apt-get, which ships with the base image, to install new packages.
Often a set of utilities must be installed into the base system in order to build or install the files required in the final image. For example, to build a Go application, you install the Go compiler, put the application sources into the base image, and compile the program. However, the final image needs only the compiled binary, not the whole toolchain used to produce it.
The problem is well known, and one way to solve it is to build an auxiliary image and transfer files from it into the resulting one. For this purpose, Docker introduced multi-stage builds, and dapp has artifact images. This approach neatly solves the problem of transferring compilation results into the final image. However, it does not solve every possible problem...
Here is another example: Chef is used in local mode to build an image. To do this, chefdk is installed into the base image, recipes are mounted or added, and those recipes are run to customize the image: install new components, packages, config files, and so on. A different configuration management system, such as Ansible, can be used in the same way. However, the installed chefdk takes about 500 MB and significantly increases the size of the final image — there is no point in leaving it there.
But Docker's multi-stage builds will not solve this problem. What if the user does not want to know about the side effects of the program — in particular, which files it creates — for example, to avoid maintaining explicit lists of all the paths to export from an image? We just want to run the program and get its results in the image, while the program itself and the entire environment needed for it to work remain outside the final image.
In the case of chefdk, the directory containing it could be mounted into the build image for the duration of the build. But this solution has problems:
- Not every program needed for the build is installed into a separate directory that can easily be mounted into the build image. In the case of Ansible, you would need to install Python in a non-standard location so that it does not conflict with the system Python, which can cause problems.
- The mounted program depends on the base image being used. If the program was compiled for Ubuntu, it may not start in an environment it was not built for — for example, in Alpine. Even chefdk, which is an omnibus package bundling all its dependencies, still depends on the system glibc and will not work in Alpine, where musl libc is used.
But what if we could prepare a static, unchanging set of all the useful utilities we might need, linked so cleverly that it works in any base image, even scratch? After mounting such images into the base, the final image would contain only empty mount-point directories where these utilities were attached.
In search of adventures
Theory
We need an image that contains a set of programs in a statically defined, non-standard directory — for example, /myutils. Any program in /myutils must depend only on libraries inside /myutils.
A dynamically linked Linux program depends on the location of the ld-linux dynamic linker in the system. For example, the bash binary in ubuntu:16.04 is compiled so that it depends on the linker /lib64/ld-linux-x86-64.so.2:
```
$ ldd /bin/bash
    linux-vdso.so.1 =>  (0x00007ffca67d8000)
    libtinfo.so.5 => /lib/x86_64-linux-gnu/libtinfo.so.5 (0x00007fd8505a6000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fd8503a2000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fd84ffd8000)
    /lib64/ld-linux-x86-64.so.2 (0x00007fd8507cf000)
```
Moreover, this dependence is static and compiled into the binary itself:
```
$ grep "/lib64/ld-linux-x86-64.so.2" /bin/bash
Binary file /bin/bash matches
```
Thus, it is necessary: a) to compile the hypothetical /myutils/bin/bash so that it uses the linker /myutils/lib64/ld-linux-x86-64.so.2; b) to configure the linker /myutils/lib64/ld-linux-x86-64.so.2 to dynamically link libraries from /myutils/{lib64,lib}.
The first step is to build a toolchain image containing everything needed to build, and subsequently run, other programs in a non-standard root directory. Here the instructions of the Linux From Scratch project will come in handy.
Building the dappdeps distribution
Why is our set of "distribution" images called dappdeps? Because these images are used by the dapp builder — they are assembled for the needs of that project.
So, our ultimate goal:
- A dappdeps/toolchain image with the GCC compiler for building other applications, and the glibc library.
- A dappdeps/base image with a set of programs and all their dependent libraries: bash, gtar, sudo, coreutils, findutils, diffutils, sed, rsync, shadow, termcap.
- A dappdeps/gitartifact image with the Git utility and all its dependencies.
- A dappdeps/chefdk image with the omnibus chefdk package, which contains all Chef dependencies, including the Ruby interpreter.
- A dappdeps/ansible image with the Ansible utility and all its dependencies, including the Python interpreter.

The dappdeps images may depend on each other. For example, building dappdeps/base requires the toolchain and glibc from the dappdeps/toolchain image, and after all the utilities in dappdeps/base are compiled, they will need files from dappdeps/toolchain at runtime.
The main condition is that the utilities from these images live in a non-standard place — namely, in /.dapp/deps/ — and do not depend on any utilities or libraries in standard system paths. Also, the dappdeps images must contain no files other than /.dapp/deps.
Such images make it possible to create containers with volumes containing the utilities, and to mount those volumes into other containers using Docker's --volumes-from option.
Building dappdeps/toolchain
Chapter 5, "Constructing a Temporary System", of the Linux From Scratch book describes the process of building a temporary chroot environment in /tools with a set of utilities that is then used to compile the main distribution.
In our case, we slightly change the chroot environment's directory: when compiling, we pass /.dapp/deps/toolchain/0.1.1 in the --prefix parameter. This is the directory that will appear in the build container when dappdeps/toolchain is mounted into it — it contains all the necessary utilities and libraries. We only need GNU binutils, GCC, and glibc.
The image is built using Docker multi-stage builds. In an image based on ubuntu:16.04, the whole environment is prepared and the programs are compiled and installed into /.dapp/deps/toolchain/0.1.1. Then this directory is copied into the scratch image dappdeps/toolchain:0.1.1.
The Dockerfile can be found here. The resulting dappdeps/toolchain image is the "temporary system" in LFS terminology. GCC in this system is still tied to the system library paths, but we will not try to make this GCC work in any base image: dappdeps/toolchain is an auxiliary image that will be used later, among other things, to build programs that really are independent of the shared system libraries.
Using Omnibus with dappdeps/toolchain
Projects such as chefdk and GitLab use Omnibus. It allows you to create self-contained bundles of a program with all its dependent libraries, except the system linker and libc. All instructions are described by easy-to-read Ruby recipes, and the Omnibus project also maintains a library of already written recipes, omnibus-software.
So, let's try to describe the builds of the remaining dappdeps components with Omnibus. To get rid of the dependency on the system linker and libc, we will build all the programs in Omnibus with the compiler from dappdeps/toolchain. The programs will then be tied to the glibc that also lives in dappdeps/toolchain.
To do this, save the contents of dappdeps/toolchain as an archive:

```
$ docker pull dappdeps/toolchain:0.1.1
$ docker save dappdeps/toolchain:0.1.1 -o dappdeps-toolchain.tar
```
We add this archive via the Dockerfile ADD directive and unpack its contents into the root of the build container:
```
ADD ./dappdeps-toolchain.tar /dappdeps-toolchain
RUN tar xf /dappdeps-toolchain/**/layer.tar -C /
```
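This glob works because a `docker save` archive stores each image layer as `<layer-id>/layer.tar` alongside the manifest. Here is a small local simulation of these two steps — the file names and layer id are made up for the demonstration, and no Docker is required:

```shell
# Build a stand-in for "docker save" output: one layer archive whose
# contents mirror the toolchain directory layout from the article.
mkdir -p root/.dapp/deps/toolchain/0.1.1/bin
echo fake-gcc > root/.dapp/deps/toolchain/0.1.1/bin/gcc
tar cf layer.tar -C root .
mkdir -p image/0123abcdef && mv layer.tar image/0123abcdef/
tar cf dappdeps-toolchain.tar -C image .

# The ADD step: unpack the saved image into a directory...
mkdir target
tar xf dappdeps-toolchain.tar -C target

# ...and the RUN step: unpack the layer itself into the (stand-in) root.
tar xf target/*/layer.tar -C target
ls target/.dapp/deps/toolchain/0.1.1/bin   # prints: gcc
```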
Before running the build via Omnibus, we prepend the path /.dapp/deps/toolchain/0.1.1/bin to the PATH variable so that GCC from dappdeps/toolchain takes priority.
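A sketch of that PATH adjustment (the directory is the one from the article; in the real build it would be set via ENV in the Dockerfile or in the build script):

```shell
# Prepend the toolchain's bin directory so its gcc shadows the system one.
export PATH="/.dapp/deps/toolchain/0.1.1/bin:$PATH"

# With the toolchain volume mounted, `command -v gcc` would now resolve
# to /.dapp/deps/toolchain/0.1.1/bin/gcc before any system compiler.
echo "$PATH" | cut -d: -f1   # prints: /.dapp/deps/toolchain/0.1.1/bin
```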
The output of Omnibus is a package (in our case, a DEB) whose contents are unpacked and transferred into /.dapp/deps/{base|gitartifact|...} using Docker multi-stage builds, just as with dappdeps/toolchain.
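The unpacking step can be done with dpkg-deb, which extracts a package's payload without installing it. A self-contained sketch that builds a tiny stand-in package with the same layout (the package metadata is invented for the demo, and dpkg-deb — i.e. a Debian-based environment — is assumed):

```shell
# Assemble a minimal DEB whose payload already sits under the target prefix.
mkdir -p pkg/DEBIAN pkg/.dapp/deps/base/0.2.3/bin
printf '%s\n' \
  'Package: dappdeps-base' \
  'Version: 0.2.3-1' \
  'Architecture: all' \
  'Maintainer: demo <demo@example.com>' \
  'Description: stand-in for the Omnibus-built package' \
  > pkg/DEBIAN/control
echo fake-bash > pkg/.dapp/deps/base/0.2.3/bin/bash
dpkg-deb -b pkg dappdeps-base.deb

# Extract the payload the way a multi-stage build stage would; the
# /.dapp/deps tree is then ready to be copied into the final image.
mkdir extracted
dpkg-deb -x dappdeps-base.deb extracted
ls extracted/.dapp/deps/base/0.2.3/bin   # prints: bash
```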
Building dappdeps/base
The Omnibus project is described by the project file dapp/dappdeps/base/omnibus/config/projects/dappdeps-base.rb:
```
name 'dappdeps-base'

license 'MIT'
license_file 'LICENSE.txt'

DOCKER_IMAGE_VERSION = "0.2.3"

install_dir "/.dapp/deps/base/#{DOCKER_IMAGE_VERSION}"

build_version DOCKER_IMAGE_VERSION
build_iteration 1

dependency "dappdeps-base"
```
This file declares all the dependencies of the dappdeps-base Omnibus package and the target installation directory. Dependencies can live either in a separate repository (for example, omnibus-software) or in the omnibus/config/software directory; each file there describes the instructions for installing one package/component. For dappdeps-base, we wrote software recipes that are missing from the standard omnibus-software repository: acl, attr, coreutils, diffutils, findutils, gtar, rsync, sed, shadow, sudo, termcap.
Let's take rsync as an example of what an Omnibus software recipe looks like:
```
name 'rsync'
default_version '3.1.2'

license 'GPL-3.0'
license_file 'COPYING'

version('3.1.2') { source md5: '0f758d7e000c0f7f7d3792610fad70cb' }

source url: "https://download.samba.org/pub/rsync/src/rsync-#{version}.tar.gz"

dependency 'attr'
dependency 'acl'
dependency 'popt'

relative_path "rsync-#{version}"

build do
  env = with_standard_compiler_flags(with_embedded_path)

  command "./configure --prefix=#{install_dir}/embedded", env: env
  command "make -j #{workers}", env: env
  command 'make install', env: env
end
```
The source directive specifies the URL from which to download the source code. Dependencies on other components are declared with dependency by name; the name of the component being built is set by the name directive. Each software recipe may in turn declare dependencies on other components. Inside the build block are the standard commands for building from source.
The Omnibus project and the Dockerfile for dappdeps/base can be found here.
Building dappdeps/gitartifact
In the case of dappdeps-gitartifact, only a recipe for building Git is needed, and it already exists in omnibus-software — all that remains is to plug it into our Omnibus project. Everything else is the same.
The Omnibus project and the Dockerfile for dappdeps/gitartifact can be found here.
Building dappdeps/chefdk
For chefdk there is also a ready-made Omnibus project. It only remains to add it to the build container through the Dockerfile and replace chefdk's standard installation path /opt/chefdk with /.dapp/deps/chefdk/2.3.17-2 (our installation path includes the Chef version).
The Dockerfile for building dappdeps/chefdk can be found here.
Building dappdeps/ansible
To build Ansible, we also set up an Omnibus project, in which we install the Python interpreter and pip, and describe the software recipe for Ansible:
```
name "ansible"

ANSIBLE_GIT_TAG = "v2.4.4.0+dapp-6"

dependency "python"
dependency "pip"

build do
  command "#{install_dir}/embedded/bin/pip install https://github.com/flant/ansible/archive/#{ANSIBLE_GIT_TAG}.tar.gz"
  command "#{install_dir}/embedded/bin/pip install pyopenssl"
end
```
As you can see, the Ansible image is an embedded Python and pip, plus Ansible installed via pip together with its dependencies.
The Omnibus project and the Dockerfile for dappdeps/ansible can be found here.
How to use the dappdeps distribution?
To use the dappdeps images via mounted volumes, you must first create a container for each image and declare which volume that container provides. This is what Docker currently requires:
```
$ docker create --name dappdeps-toolchain --volume /.dapp/deps/toolchain/0.1.1 dappdeps/toolchain:0.1.1 no-such-cmd
13edda732176a44d7d822202d8327565b78f4a2190368bb1df46cdad1e127b6e

$ docker ps -a | grep dappdeps-toolchain
13edda732176   dappdeps/toolchain:0.1.1   "no-such-cmd"   About a minute ago   Created   dappdeps-toolchain
```
The container is named dappdeps-toolchain: under this name, all the declared volumes of this container can be mounted into other containers with --volumes-from. The no-such-cmd command-line argument is required by Docker, but this container will never be launched — it will remain in the Created state.
Create the remaining containers:
```
$ docker create --name dappdeps-base --volume /.dapp/deps/base/0.2.3 dappdeps/base:0.2.3 no-such-cmd
20f524c5b8b4a59112b4b7cb85e47eee660c7906fb72a4935a767a215c89964e

$ docker create --name dappdeps-ansible --volume /.dapp/deps/ansible/2.4.4.0-10 dappdeps/ansible:2.4.4.0-10 no-such-cmd
cd01ae8b69cd68e0611bb6c323040ce202e8e7e6456a3f03a4d0a3ffbbf2c510

$ docker create --name dappdeps-gitartifact --volume /.dapp/deps/gitartifact/0.2.1 dappdeps/gitartifact:0.2.1 no-such-cmd
2c12a8743c2b238d90debaf066e29685b41b138c10f2b893a815931df866576d

$ docker create --name dappdeps-chefdk --volume /.dapp/deps/chefdk/2.3.17-2 dappdeps/chefdk:2.3.17-2 no-such-cmd
4dffe74c49c8e4cdf9d749177ae9efec3bdae6e37c8b6df41b6eb527a5c1d891
```
So we have reached the climax for which all this fuss was conceived. As a demonstration of the possibilities, let's install the nginx and tree packages into an Alpine image by running Ansible from dappdeps/ansible via Bash from dappdeps/base:
```
$ docker run -ti --name mycontainer \
    --volumes-from dappdeps-toolchain \
    --volumes-from dappdeps-base \
    --volumes-from dappdeps-gitartifact \
    --volumes-from dappdeps-ansible \
    --volumes-from dappdeps-chefdk \
    alpine:latest \
    /.dapp/deps/base/0.2.3/embedded/bin/bash -lc '/.dapp/deps/ansible/2.4.4.0-10/embedded/bin/ansible localhost -m apk -a "name=nginx,tree update_cache=yes"'
 [WARNING]: Unable to parse /etc/ansible/hosts as an inventory source
 [WARNING]: No inventory was parsed, only implicit localhost is available
 [WARNING]: provided hosts list is empty, only localhost is available. Note that the implicit localhost does not match 'all'
localhost | SUCCESS => {
    "changed": true,
    "failed": false,
    "msg": "installed nginx tree package(s)",
    "packages": [
        "pcre",
        "nginx",
        "tree"
    ],
    "stderr": "",
    "stderr_lines": [],
    "stdout": "(1/3) Installing pcre (8.41-r1)\n(2/3) Installing nginx (1.12.2-r3)\nExecuting nginx-1.12.2-r3.pre-install\n(3/3) Installing tree (1.7.0-r1)\nExecuting busybox-1.27.2-r7.trigger\nOK: 6 MiB in 14 packages\n",
    "stdout_lines": [
        "(1/3) Installing pcre (8.41-r1)",
        "(2/3) Installing nginx (1.12.2-r3)",
        "Executing nginx-1.12.2-r3.pre-install",
        "(3/3) Installing tree (1.7.0-r1)",
        "Executing busybox-1.27.2-r7.trigger",
        "OK: 6 MiB in 14 packages"
    ]
}
```
The final chord: create an image from the resulting container and... see that only empty mount-point directories from dappdeps remain in it!

```
$ docker commit mycontainer myimage
sha256:9646be723b91daeaf538b7d92bb8844578abc7acd3028394f543e883eeb382bb

$ docker run -ti --rm myimage tree /.dapp
/.dapp
└── deps
    ├── ansible
    │   └── 2.4.4.0-10
    ├── base
    │   └── 0.2.3
    ├── chefdk
    │   └── 2.3.17-2
    ├── gitartifact
    │   └── 0.2.1
    └── toolchain
        └── 0.1.1

11 directories, 0 files
```
It would seem, what else can you dream? ..
Further work and problems
What are the problems with dappdeps?
We need to work on reducing the size of dappdeps/toolchain. To do this, the toolchain should be split into two parts: the part needed to build new dappdeps utilities, and the part with base libraries such as glibc that must be present at runtime to run those utilities.
To make the Ansible apt module work in dappdeps/ansible, we had to add the contents of Ubuntu's python-apt package directly to the image, without rebuilding it. The apt module then works without problems in DEB-based base images, but it requires a specific version of glibc. Since apt is a distribution-specific module anyway, this is acceptable.
What is missing in the Dockerfile?
To use a volume from the dappdeps/toolchain image, we first had to save that image as an archive and then add it to another image via the Dockerfile ADD directive (see the section "Using Omnibus with dappdeps/toolchain"). Dockerfile lacks the ability to simply attach a directory of another image as a VOLUME for the duration of the build — that is, an analogue of the --volumes-from option for Dockerfile.
Conclusions
We have made sure the idea works: it lets you use GNU and other CLI utilities in build instructions, run the Python or Ruby interpreter, and even run Ansible or Chef in Alpine or scratch images. At the same time, the author of the build instructions does not need to know the side effects of the commands being run, or explicitly list which files must be imported, as is the case with Docker multi-stage builds.
The results of this work are also applied in practice: dapp uses the dappdeps images in its build containers. For example, Git from dappdeps/gitartifact is used to work with patches, and this Git behaves the same, with some guarantee, in all base images. However, how exactly dapp uses dappdeps is beyond the scope of this article (links to the code for the most curious: dapp/deps, dapp/dimg/builder/chef.rb, dapp/dimg/builder/ansible.rb).
The purpose of this article was to convey the very idea and show the possibility of its application using a real practical example.
P.S. All the dappdeps images described are available on hub.docker.com: dappdeps/toolchain:0.1.1, dappdeps/base:0.2.3, dappdeps/gitartifact:0.2.1, dappdeps/ansible:2.4.4.0-10, dappdeps/chefdk:2.3.17-2 — you are welcome to use them.