Docker image optimization

Docker images can be very large. Many exceed 1 GB in size. How do they get so big? Do they need to be? Can we make them smaller without sacrificing functionality?

At CenturyLink Labs we have been building a lot of Docker images lately. When we started experimenting with them, we found that our images quickly ballooned in size (it was common to end up with an image of 1 GB or more). Size does not matter much when a couple of gigabytes of images sit on your local machine, but it becomes a problem once you are constantly pulling and pushing those images over the Internet.

I decided it was worth digging deeper into how Docker images are built, to understand what we could do to make ours smaller.
As a small digression: Adriaan de Jonge recently published an article, “Creating the smallest possible Docker container,” in which he described how to build an image containing nothing but a statically linked Go binary that runs when the container starts. His image is strikingly small: 3.6 MB. I will not go to such extremes here. As someone used to working with languages like Python and Ruby, I need a bit more support from the OS. I will gladly sacrifice a hundred megabytes or so to be able to run Debian and apt-get install my dependencies. So, although I envy Adriaan’s tiny image, I need to support a wider range of applications, which makes his approach impractical for me.

Layers

Before we get to shrinking your images, we need to talk about layers. The concept of layers involves various low-level technical details, such as the root file system (rootfs), copy-on-write, and union mounts. Fortunately, this topic is well covered elsewhere, so I will not rehash it here. For our purposes, the important thing to understand is that each instruction in a Dockerfile results in a new image layer.
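
To make the union-mount idea a little more concrete, here is a toy Python model. It is a simplification, not how Docker's storage drivers are actually implemented. Each layer is a dict of path to contents. A path is resolved by searching from the top layer down, so upper layers hide lower ones. `WHITEOUT` is a hypothetical stand-in for the way real drivers record deletions.

```python
from typing import Dict, List, Optional

# Hypothetical marker meaning "this path was deleted in this layer"
# (real storage drivers record deletions with special whiteout entries).
WHITEOUT = object()

Layer = Dict[str, object]  # path -> file contents (toy model)


def resolve(layers: List[Layer], path: str) -> Optional[object]:
    """Return what a container would see at `path`; layers[-1] is the top layer."""
    for layer in reversed(layers):  # search from the top layer down
        if path in layer:
            value = layer[path]
            return None if value is WHITEOUT else value
    return None  # no layer mentions this path


base = {"/etc/motd": "welcome", "/bin/sh": "<shell binary>"}
upper = {"/tmp/note": "hello", "/etc/motd": WHITEOUT}

print(resolve([base, upper], "/tmp/note"))  # hello (found in the upper layer)
print(resolve([base, upper], "/bin/sh"))    # <shell binary> (falls through to base)
print(resolve([base, upper], "/etc/motd"))  # None (hidden by the whiteout)
```

Note that `base` itself is never modified. The lower layer still contains `/etc/motd`; it is merely hidden from view, and that detail matters for image size later on.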

Let's take a look at the Dockerfile example to see this in action:

 FROM debian:wheezy
 RUN mkdir /tmp/foo
 RUN fallocate -l 1G /tmp/foo/bar

A completely useless image, but it will help us demonstrate the point. Here we use debian:wheezy as the base image, create the /tmp/foo directory, and allocate 1 GB of space for a file named bar in that directory.

Let's build this image:

 $ docker build -t sample .
 Sending build context to Docker daemon 2.56 kB
 Sending build context to Docker daemon
 Step 0 : FROM debian:wheezy
  ---> e8d37d9e3476
 Step 1 : RUN mkdir /tmp/foo
  ---> Running in 3d5d8b288cc2
  ---> 9876aa270471
 Removing intermediate container 3d5d8b288cc2
 Step 2 : RUN fallocate -l 1G /tmp/foo/bar
  ---> Running in 6c797329ee43
  ---> 3ebe08b36733
 Removing intermediate container 6c797329ee43
 Successfully built 3ebe08b36733

If you look at the output of docker build, you can see exactly what Docker does to build our image:

Using the value of the FROM instruction, Docker starts a container from the debian:wheezy image (container ID: 3d5d8b288cc2)
Inside this container, Docker runs mkdir /tmp/foo
The container is stopped, committed (producing a new image with ID 9876aa270471), and then deleted
Docker starts another container, this time from the image saved in the previous step (this container has ID 6c797329ee43)
Inside this container, Docker runs fallocate -l 1G /tmp/foo/bar
The container is stopped, committed (producing a new image with ID 3ebe08b36733), and then deleted
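
The steps above amount to a simple loop. Start from the previous image's filesystem, run one instruction, and commit only what changed as a new layer. Below is a toy Python sketch of that loop. It assumes a filesystem can be modeled as a dict of path to size in kB (decimal, as docker history reports sizes). `build`, `commit_diff`, and the made-up `/bin/bash` base entry are hypothetical names, not Docker APIs.

```python
from typing import Callable, Dict, List

FS = Dict[str, int]                # path -> size in kB (toy model)
Instruction = Callable[[FS], None]


def commit_diff(parent: FS, child: FS) -> int:
    """Size of a new layer: files added or changed since the parent.

    Deleted files cost nothing here (real layers record them as tiny whiteouts).
    Simplification: a file counts as changed only if its size differs.
    """
    return sum(size for path, size in child.items() if parent.get(path) != size)


def build(base: FS, instructions: List[Instruction]) -> List[int]:
    """Run each instruction on a copy of the previous image and commit a layer."""
    layer_sizes: List[int] = []
    current = dict(base)
    for run in instructions:
        container = dict(current)                 # "start a container" from the last image
        run(container)                            # execute the instruction inside it
        layer_sizes.append(commit_diff(current, container))  # "commit" the changes
        current = container                       # the new image is the next parent
    return layer_sizes


def mkdir_foo(fs: FS) -> None:
    fs["/tmp/foo/"] = 0                           # directories modeled as zero-size entries


def fallocate_bar(fs: FS) -> None:
    fs["/tmp/foo/bar"] = 1_073_742                # 1 GiB is about 1,073,742 kB (decimal)


print(build({"/bin/bash": 1_000}, [mkdir_foo, fallocate_bar]))  # [0, 1073742]
```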

We can see the end result by running docker images --tree (unfortunately, the --tree flag is deprecated and will likely be removed in a future release):

 $ docker images --tree
 Warning: '--tree' is deprecated, it will be removed soon. See usage.
 └─511136ea3c5a Virtual Size: 0 B Tags: scratch:latest
   └─59e359cb35ef Virtual Size: 85.18 MB
     └─e8d37d9e3476 Virtual Size: 85.18 MB Tags: debian:wheezy
       └─9876aa270471 Virtual Size: 85.18 MB
         └─3ebe08b36733 Virtual Size: 1.159 GB Tags: sample:latest

Here you can see the image tagged debian:wheezy, followed by the two layers committed from the containers mentioned earlier (one for each RUN instruction in the Dockerfile).

We often talk about layers and images as if they were different things. In fact, each layer is itself an image, and an image is nothing more than a collection of other images (its layers).

Just as we can run:

 docker run -it sample:latest /bin/bash

We can easily run one of the unnamed layers:

 docker run -it 9876aa270471 /bin/bash

Both of them are images, on the basis of which containers can be launched. The only difference is that the first is named and the second is not. This ability to run containers from any layer can be quite useful when debugging your Dockerfile.

Image size

Knowing that an image is nothing more than a collection of other images, one can come to an obvious conclusion: the image size is equal to the sum of the sizes of the images that make it up.

Let's look at the output of docker history:

 $ docker history sample
 IMAGE          CREATED          CREATED BY                                SIZE
 3ebe08b36733   3 minutes ago    /bin/sh -c fallocate -l 1G /tmp/foo/bar   1.074 GB
 9876aa270471   3 minutes ago    /bin/sh -c mkdir /tmp/foo                 0 B
 e8d37d9e3476   4 days ago       /bin/sh -c #(nop) CMD [/bin/bash]         0 B
 59e359cb35ef   4 days ago       /bin/sh -c #(nop) ADD file:1e2ba3d9379f   85.18 MB
 511136ea3c5a   13 months ago                                              0 B

Here we can see every layer of the sample image, along with the command that created it and its size. Note that docker history lists the layers newest first, the reverse of the order shown by docker images --tree.

Only two instructions actually add anything to our image: an ADD instruction (inherited from debian:wheezy) and our fallocate command.
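
Because an image is just the sum of its layers, adding up the SIZE column of docker history should reproduce the image's total. Here is a small Python sketch that does this. `parse_size` is a hypothetical helper. It assumes decimal units (1 GB = 1000^3 bytes), which fits the numbers above: a 1 GiB file shows up as 1.074 GB.

```python
import re

# Assumed decimal multipliers, consistent with 1 GiB being reported as 1.074 GB.
UNITS = {"B": 1, "kB": 1_000, "KB": 1_000, "MB": 1_000_000, "GB": 1_000_000_000}


def parse_size(text: str) -> int:
    """Convert a string such as '85.18 MB' into an approximate byte count."""
    match = re.fullmatch(r"\s*([\d.]+)\s*([kKMG]?B)\s*", text)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    return round(float(number) * UNITS[unit])


# SIZE column of `docker history sample`, copied from the output above.
history_sizes = ["1.074 GB", "0 B", "0 B", "85.18 MB", "0 B"]
total = sum(parse_size(s) for s in history_sizes)
print(total)  # 1159180000, i.e. the 1.159 GB virtual size shown by --tree
```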

Let's save our image to a tarball and check its size:

 $ docker save sample > sample.tar
 $ ls -lh sample.tar
 -rw-r--r-- 1 core core 1.1G Jul 26 02:35 sample.tar

When an image is saved to a tar file this way, some metadata about each layer is included as well, so the file ends up slightly larger than the sum of the layer sizes.
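
To see where the bytes in sample.tar go, you can walk the archive with Python's standard tarfile module. This sketch assumes the legacy docker save layout of this era, where each layer is stored as `<layer-id>/layer.tar` next to its JSON metadata. Newer Docker versions write a different (OCI-style) layout, so the file-name test would need adjusting. Tar headers and padding, which member sizes do not include, account for the rest of the gap against ls -lh.

```python
import sys
import tarfile
from typing import Dict, Tuple


def save_breakdown(path: str) -> Tuple[Dict[str, int], int]:
    """Split a `docker save` archive into per-layer bytes and everything else.

    Assumes the legacy layout: <layer-id>/layer.tar plus JSON metadata files.
    """
    layers: Dict[str, int] = {}
    metadata = 0
    with tarfile.open(path) as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue  # skip the per-layer directories themselves
            if member.name.endswith("/layer.tar"):
                layer_id = member.name.rsplit("/", 1)[0]
                layers[layer_id] = member.size
            else:
                metadata += member.size  # json, VERSION, repositories, ...
    return layers, metadata


if __name__ == "__main__" and len(sys.argv) > 1:
    layer_sizes, metadata_bytes = save_breakdown(sys.argv[1])
    for layer_id, size in sorted(layer_sizes.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{size:>15,}  {layer_id}")
    print(f"{metadata_bytes:>15,}  other files (metadata)")
```

Saved as a script (the file name is up to you), it would be run with the tarball as its only argument, e.g. `python save_breakdown.py sample.tar`.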

Let's add one more instruction to the Dockerfile:

 FROM debian:wheezy
 RUN mkdir /tmp/foo
 RUN fallocate -l 1G /tmp/foo/bar
 RUN rm /tmp/foo/bar

The new instruction deletes the file immediately after it is created.

If we run docker build on the updated Dockerfile and look at the history again, we see the following:

 $ docker history sample
 IMAGE          CREATED          CREATED BY                                SIZE
 9d9bdb929b00   8 seconds ago    /bin/sh -c rm /tmp/foo/bar                0 B
 3ebe08b36733   24 minutes ago   /bin/sh -c fallocate -l 1G /tmp/foo/bar   1.074 GB
 9876aa270471   24 minutes ago   /bin/sh -c mkdir /tmp/foo                 0 B
 e8d37d9e3476   4 days ago       /bin/sh -c #(nop) CMD [/bin/bash]         0 B
 59e359cb35ef   4 days ago       /bin/sh -c #(nop) ADD file:1e2ba3d9379f   85.18 MB
 511136ea3c5a   13 months ago                                              0 B

Notice that the rm call added a new layer (of 0 bytes), but everything else remains the same. If we save the updated image, the size is practically unchanged (there is a slight difference due to the metadata of the added layer):

 $ docker save sample > sample.tar
 $ ls -lh sample.tar
 -rw-r--r-- 1 core core 1.1G Jul 26 02:55 sample.tar

If we ran a container from this image and looked in /tmp/foo, we would find it empty (after all, the file was deleted). However, because our Dockerfile produced a layer containing a 1 GB file, that file has become a permanent part of the image.

Each additional instruction in your Dockerfile can only increase the overall size of the image (or leave it unchanged); it can never shrink it.
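
The gap between what a container sees and what the image stores can be shown with another toy Python model. It is a simplification in which each layer maps a path to a size in kB and `None` marks a deletion. The `/debian-rootfs` entry is a hypothetical stand-in for the 85.18 MB ADD layer. Stored size sums every layer, while the visible view applies deletions.

```python
from typing import Dict, List, Optional

# Toy model: each layer maps path -> size in kB; None marks a deletion (whiteout).
Layer = Dict[str, Optional[int]]


def image_size(layers: List[Layer]) -> int:
    """Stored size: every layer's added bytes count, even if later deleted."""
    return sum(size or 0 for layer in layers for size in layer.values())


def visible_files(layers: List[Layer]) -> Dict[str, int]:
    """What a container sees once the layers are stacked bottom to top."""
    view: Dict[str, int] = {}
    for layer in layers:
        for path, size in layer.items():
            if size is None:
                view.pop(path, None)  # hidden from view, but still stored below
            else:
                view[path] = size
    return view


base = {"/debian-rootfs": 85_180}          # stand-in for the ADD layer
mkdir = {"/tmp/foo/": 0}
fallocate = {"/tmp/foo/bar": 1_073_742}    # 1 GiB in decimal kB
rm = {"/tmp/foo/bar": None}

print(image_size([base, mkdir, fallocate]))          # 1158922
print(image_size([base, mkdir, fallocate, rm]))      # 1158922 (unchanged)
print(visible_files([base, mkdir, fallocate, rm]))   # {'/debian-rootfs': 85180, '/tmp/foo/': 0}
```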

Of course, this example is contrived. But understanding that an image is the sum of its layers is important when looking for ways to make images smaller. Below I describe several ways to do this.

Choose your base

This one may seem obvious, but the choice of base image can significantly affect the final image size. For example, here is a list of popular base images and their sizes:

 $ docker images
 REPOSITORY   TAG      IMAGE ID       CREATED         VIRTUAL SIZE
 scratch      latest   511136ea3c5a   13 months ago   0 B
 busybox      latest   a9eb17255234   7 weeks ago     2.433 MB
 debian       latest   e8d37d9e3476   4 days ago      85.18 MB
 ubuntu       latest   ba5877dc9bec   4 days ago      192.7 MB
 centos       latest   1a7dc42f78ba   2 weeks ago     236.4 MB
 fedora       latest   88b42ffd1f7c   10 days ago     373.7 MB

Our team used to use ubuntu as a base, mostly because most of us were already familiar with it. However, after experimenting with debian, we concluded that it fully meets our needs while saving more than 100 MB of space.

Which base images are useful depends on your needs, but the choice is definitely worth evaluating. If you use Ubuntu when BusyBox would be enough, you are wasting a lot of space.
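
The savings are easy to compute from the table above. This Python sketch lists each smaller base and how much it would save relative to the one you are using now. The sizes are the VIRTUAL SIZE column converted to kB; `smaller_bases` is a hypothetical helper. Because base layers are shared between images, the saving applies once per host rather than once per image.

```python
from typing import List, Tuple

# VIRTUAL SIZE column from the `docker images` listing above, in kB (1 MB = 1000 kB).
BASE_SIZES_KB = {
    "scratch": 0,
    "busybox": 2_433,
    "debian": 85_180,
    "ubuntu": 192_700,
    "centos": 236_400,
    "fedora": 373_700,
}


def smaller_bases(current: str) -> List[Tuple[str, int]]:
    """Bases smaller than `current`, paired with the kB saved, biggest saving first."""
    current_size = BASE_SIZES_KB[current]
    options = [
        (name, current_size - size)
        for name, size in BASE_SIZES_KB.items()
        if size < current_size
    ]
    return sorted(options, key=lambda item: item[1], reverse=True)


for name, saved in smaller_bases("ubuntu"):
    print(f"{name:<8} would save {saved / 1000:.1f} MB")
```

The smallest option only helps if it still provides what your application needs. scratch is completely empty, and busybox has no apt-get, which is why debian is the practical choice in our case.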

It would be nice if the Docker Hub registry displayed image sizes. Unfortunately, at the moment you have to pull an image to find out how big it is.

Reuse your base

One of the advantages of the layer approach is the ability to reuse layers between different images. The example below shows three images that use debian:wheezy as a basis:

 $ docker images --tree
 Warning: '--tree' is deprecated, it will be removed soon. See usage.
 └─511136ea3c5a Virtual Size: 0 B Tags: scratch:latest
   └─e8d37d9e3476 Virtual Size: 85.18 MB Tags: debian:wheezy
     ├─22a0de5ea279 Virtual Size: 85.18 MB
     │ └─057ac524d834 Virtual Size: 85.18 MB
     │   └─bd30825f7522 Virtual Size: 106.2 MB Tags: creeper:latest
     ├─d689af903018 Virtual Size: 85.18 MB
     │ └─bcf6f6a90302 Virtual Size: 85.18 MB
     │   └─ffab3863d257 Virtual Size: 95.67 MB Tags: enderman:latest
     └─9876aa270471 Virtual Size: 85.18 MB
       └─3ebe08b36733 Virtual Size: 1.159 GB
         └─9d9bdb929b00 Virtual Size: 1.159 GB Tags: sample:latest

Each one builds on debian:wheezy, but these are not three copies of Debian. Instead, each image references the same Debian layers. (One of the reasons I like docker images --tree is that it clearly shows the relationships between layers.)

This means that once you have pulled debian:wheezy, you never have to pull those layers again, and they are stored on disk only once, no matter how many images use them.

So by using a common base for different images, you can save a considerable amount of disk space and network traffic.
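
The accounting can be sketched in a few lines of Python. Each image's virtual size counts the base layers, but disk usage counts each distinct layer once. The layer names are made up and the zero-byte layers are omitted. The sizes (in kB) come from the output above: the creeper and enderman application layers are derived by subtracting the 85.18 MB base from their virtual sizes, and the sample layer is the 1.074 GB fallocate layer.

```python
from typing import Dict, List

# Per-layer sizes in kB (hypothetical names, sizes derived from the outputs above).
LAYER_KB = {
    "debian:wheezy": 85_180,
    "creeper-app": 21_020,        # 106.2 MB - 85.18 MB
    "enderman-app": 10_490,       # 95.67 MB - 85.18 MB
    "sample-fallocate": 1_074_000,
}

IMAGES: Dict[str, List[str]] = {
    "creeper": ["debian:wheezy", "creeper-app"],
    "enderman": ["debian:wheezy", "enderman-app"],
    "sample": ["debian:wheezy", "sample-fallocate"],
}


def virtual_size(image: str) -> int:
    """What `docker images` reports: the sum of all of the image's layers."""
    return sum(LAYER_KB[layer] for layer in IMAGES[image])


def disk_usage(images: Dict[str, List[str]]) -> int:
    """Each distinct layer is stored once, however many images reference it."""
    unique = {layer for layers in images.values() for layer in layers}
    return sum(LAYER_KB[layer] for layer in unique)


naive = sum(virtual_size(name) for name in IMAGES)
print(naive, disk_usage(IMAGES))  # 1361050 1190690
```

The difference, 170,360 kB, is exactly two extra copies of the 85,180 kB Debian base that sharing avoids.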

Group your commands

In the example above, we created a file and then immediately deleted it. The situation is contrived, but something similar often happens when building images. Let's look at something more realistic:

 FROM debian:wheezy
 WORKDIR /tmp
 RUN wget -nv
 RUN tar -xvf someutility-v1.0.0.tar.gz
 RUN mv /tmp/someutility-v1.0.0/someutil /usr/bin/someutil
 RUN rm -rf /tmp/someutility-v1.0.0
 RUN rm /tmp/someutility-v1.0.0.tar.gz

We download a tarball, unpack it, move a binary into place, and clean up.

As we saw earlier, each of these instructions creates a separate layer. Although we delete the archive and the extracted files, they still remain part of the image.

 $ docker history someutility
 IMAGE     CREATED          CREATED BY                                       SIZE
 33f4a99   16 seconds ago   /bin/sh -c rm /tmp/someutility-v1.0.0.tar.gz     0 B
 fec7b5e   17 seconds ago   /bin/sh -c rm -rf /tmp/someutility-v1.0.0        0 B
 0851974   18 seconds ago   /bin/sh -c mv /tmp/someutility-v1.0.0/someuti    12.21 MB
 5b6b996   19 seconds ago   /bin/sh -c tar -xvf someutility-v1.0.0.tar.gz    99.91 MB
 0eebad5   20 seconds ago   /bin/sh -c wget -nv http://centurylinklabs.com   55.34 MB
 d6798fc   8 minutes ago    /bin/sh -c #(nop) WORKDIR /tmp                   0 B
 e8d37d9   5 days ago       /bin/sh -c #(nop) CMD [/bin/bash]                0 B
 59e359c   5 days ago       /bin/sh -c #(nop) ADD file:1e2ba3d9379f7685a1    85.18 MB
 511136e   13 months ago                                                     0 B

Running wget produces a 55 MB layer, and unpacking the archive produces a 99 MB layer. We do not need either of these in the final image, so we are wasting more than 150 MB.

We can fix this by doing a little refactoring of our Dockerfile:

 FROM debian:wheezy
 WORKDIR /tmp
 RUN wget -nv && \
     tar -xvf someutility-v1.0.0.tar.gz && \
     mv /tmp/someutility-v1.0.0/someutil /usr/bin/someutil && \
     rm -rf /tmp/someutility-v1.0.0 && \
     rm /tmp/someutility-v1.0.0.tar.gz

Instead of running each command in a separate RUN instruction, we grouped them using the && operator. And although the Dockerfile becomes a bit less readable, it allows us to remove the tarball and extracted directory before the layer is committed.

Here is what happened as a result:

 $ docker history someutility
 IMAGE     CREATED          CREATED BY                                       SIZE
 8216b5f   7 seconds ago    /bin/sh -c wget -nv http://centurylinklabs.com   12.21 MB
 d6798fc   17 minutes ago   /bin/sh -c #(nop) WORKDIR /tmp                   0 B
 e8d37d9   5 days ago       /bin/sh -c #(nop) CMD [/bin/bash]                0 B
 59e359c   5 days ago       /bin/sh -c #(nop) ADD file:1e2ba3d9379f7685a1    85.18 MB
 511136e   13 months ago                                                     0 B

Note that we end up with the same result, while getting rid of a few extra layers and saving about 150 MB of space.
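
The saving can be reproduced with a toy Python model. Each RUN commits only the files it added or changed, so a file created and deleted within the same RUN is never committed at all. The layer sizes come from the docker history output. Splitting the 99.91 MB of extracted files into `someutil` plus a hypothetical `other-files` entry is an assumption, as are all the function names.

```python
from typing import Callable, Dict, List

FS = Dict[str, int]  # path -> size in kB (toy model, decimal units)
Step = Callable[[FS], None]

DIR = "/tmp/someutility-v1.0.0/"
TARBALL = "/tmp/someutility-v1.0.0.tar.gz"


def wget(fs: FS) -> None:
    fs[TARBALL] = 55_340


def untar(fs: FS) -> None:
    fs[DIR + "someutil"] = 12_210
    fs[DIR + "other-files"] = 87_700  # hypothetical rest of the 99.91 MB


def mv(fs: FS) -> None:
    fs["/usr/bin/someutil"] = fs.pop(DIR + "someutil")


def rm_dir(fs: FS) -> None:
    for path in [p for p in fs if p.startswith(DIR)]:
        del fs[path]


def rm_tarball(fs: FS) -> None:
    del fs[TARBALL]


def layer_sizes(runs: List[List[Step]]) -> List[int]:
    """Commit one layer per RUN; a layer stores files added or changed during it."""
    sizes: List[int] = []
    current: FS = {}
    for steps in runs:
        container = dict(current)
        for step in steps:
            step(container)
        # Simplification: a file counts as changed only if its size differs.
        sizes.append(sum(s for p, s in container.items() if current.get(p) != s))
        current = container
    return sizes


steps = [wget, untar, mv, rm_dir, rm_tarball]
separate = layer_sizes([[step] for step in steps])  # one RUN per command
combined = layer_sizes([steps])                     # one RUN joined with &&
print(separate, sum(separate))  # [55340, 99910, 12210, 0, 0] 167460
print(combined, sum(combined))  # [12210] 12210
```

The difference of 155,250 kB (about 155 MB) is in line with the saving of roughly 150 MB seen in the real docker history output.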

I would not advise you to rush off and rewrite every Dockerfile so that all the commands sit on one line. However, if you notice a spot where files are created and later deleted, combining those instructions into one will help keep the image as small as possible.

"Flatten" your images

All the strategies described above assume that you are building your own image, or at least have access to its Dockerfile. However, you may have an image built by someone else that you would like to slim down.

In this case, we can take advantage of the fact that a container presents all of an image's layers merged into a single filesystem.

Let's go back to our sample image (the one with fallocate and rm ) and run it:

 $ docker run -d sample
 7423d238b754e6a2c5294aab7b185f80be2457ee36de22795685b19ff1cf03ec

Since our image doesn't actually do anything, the container exits immediately. That leaves us with a stopped container whose filesystem is the merged view of all the image's layers (I used the -d flag only so that the container ID is printed).

If we export this container and pipe the output into docker import, we can turn it back into an image:

 $ docker export 7423d238b | docker import - sample:flat
 3995a1f00b91efb016250ca6acc31aaf5d621c6adaf84664a66b7a4594f695eb
 $ docker history sample:flat
 IMAGE          CREATED          CREATED BY   SIZE
 3995a1f00b91   12 seconds ago                85.18 MB

Notice that the history of our new sample:flat image shows a single 85 MB layer; the layer containing the 1 GB file is gone.

Although this is a rather clever trick, it has significant drawbacks:

By merging all the layers together, you lose the layer-sharing benefits described earlier. Our sample:flat image now contains its own copy of debian:wheezy.
All the metadata normally stored with an image is lost in the run/export/import process. Exposed ports, environment variables, the default command: everything declared in the original image is gone.
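
Both drawbacks can be seen in a toy Python model of docker export | docker import. The layer stack is merged into one brand-new layer, so it no longer shares anything with debian:wheezy. Deleted files vanish, and the image configuration is not carried over. The `Image` class and `flatten` function are hypothetical illustrations, not part of any Docker API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

Layer = Dict[str, Optional[int]]  # path -> size in kB; None marks a deletion


@dataclass
class Image:
    layers: List[Layer]
    config: Dict[str, object] = field(default_factory=dict)  # ENV, ports, CMD, ...

    def size(self) -> int:
        """Stored size: the sum of every layer, including later-deleted files."""
        return sum(s or 0 for layer in self.layers for s in layer.values())


def flatten(image: Image) -> Image:
    """Toy export/import: one merged layer, and no configuration carried over."""
    merged: Layer = {}
    for layer in image.layers:          # bottom to top
        for path, size in layer.items():
            if size is None:
                merged.pop(path, None)  # deleted files simply vanish
            else:
                merged[path] = size
    # The merged layer is new, so it cannot be shared with other debian:wheezy images.
    return Image(layers=[merged])


sample = Image(
    layers=[{"/debian-rootfs": 85_180}, {"/tmp/foo/bar": 1_073_742}, {"/tmp/foo/bar": None}],
    config={"Cmd": ["/bin/bash"]},
)
flat = flatten(sample)
print(sample.size(), flat.size(), flat.config)  # 1158922 85180 {}
```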

So I definitely would not advise you to rush to flatten all your images. But it can sometimes be useful: if you are trying to optimize someone else's image, or just want to find out how much you can squeeze out of your own.

- Source: Optimizing Docker Images

Source: https://habr.com/ru/post/234829/
