
Since we started working on Iron.io, we have been trying to solve the problem of keeping our IronWorker containers up to date with new runtimes and Linux packages. For the past two years, IronWorker ran on the same runtime environment without changes; it was only a few weeks ago that we finally released a set of new language environments in production.
Since the launch of our service, we had used a single container that held a set of language environments and binary packages: Ruby, Python, PHP, Java, .NET, and other languages, as well as libraries such as ImageMagick and SoX.
That container, and our strategy around it, was becoming obsolete, as were Ruby 1.9.1, Node 0.8, Mono 2, and the other dated language versions the stack used by default. Over time the problem grew more acute: people wanted to use new things but were forced to adapt their code to old language versions.
Limited to one LXC container
IronWorker uses LXC containers to isolate resources and keep tasks secure while they run. LXC worked fine as a runtime component, but it fell short when it came to provisioning all the different environments needed to run tasks. We were at a dead end when it came to building runtime environments: on the one hand, we could not simply update versions inside the existing container without risking the million-plus tasks that run every day. (We tried that once, early in the service's life, and it did not end well.)
Nor could we keep separate LXC containers for different language versions, since each one holds a full copy of the operating system and libraries (roughly 2 GB per image). That approach would actually work fine in a PaaS environment such as Heroku, where processes run indefinitely and you can fetch the right container before starting a process: each customer would get a large custom image. But IronWorker is different.
IronWorker is a large multi-tenant task-processing system in which users queue up tasks that are then executed across thousands of workers. It can be used to offload work from a main execution thread into the background, run scheduled jobs, continuously process transactions and message streams, or perform parallel processing across a large number of cores. The advantage is that users get on-demand processing with very high parallelism, with no effort on their part.
Internally, the service works as follows: it takes a task from a set of queues, sets up the runtime environment inside a particular VM, loads the task code, and then runs the process. By the nature of the service, all machines are constantly shared by all customers; we do not allocate a machine to a specific application or customer for any long period. Tasks are typically short-lived: some run for only a few seconds or minutes, and the maximum runtime is capped at sixty minutes.
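To make that flow concrete, here is a rough sketch of the per-task loop in shell. Only the lxc-execute call and the sixty-minute cap come from the description above; the helper names, paths, and config file are hypothetical stand-ins, not our actual code.

# Hypothetical sketch of the per-task flow; fetch_task_from_queue,
# TASK_CODE_URL, and run.sh are made-up names for illustration.
TASK_ID="$(fetch_task_from_queue)"                           # take a task from a queue
mkdir -p "/tmp/tasks/$TASK_ID"
curl -s "$TASK_CODE_URL" -o "/tmp/tasks/$TASK_ID/code.zip"   # load the task code
unzip -q "/tmp/tasks/$TASK_ID/code.zip" -d "/tmp/tasks/$TASK_ID"
# run the process in an isolated container, killed at the sixty-minute limit
timeout 3600 lxc-execute -n "task-$TASK_ID" -f /etc/lxc/worker.conf \
    -- /bin/sh "/tmp/tasks/$TASK_ID/run.sh"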
LXC did its job, but we puzzled over how to update or add anything to our existing container without breaking backward compatibility or consuming an insane amount of disk space. Our options seemed rather limited, so we kept putting the decision off.
...and then came Docker

We first heard about Docker over a year ago. We help organize the GoSF meetup, and Solomon Hykes, the creator of Docker, came to the March 2013 event to demonstrate his new project, written in Go. In fact, he had released it publicly that very day; it was the first time anyone had seen it.
The demo was great, and the hundred-plus developers in the audience were impressed by what he and his team had built. (And in the same talk, as one of his comments attested, Solomon inaugurated a new development methodology: Shame Driven Development.)
It was too raw for production use, we decided right away, but the project was clearly worthy of praise.
Solomon Hykes and Travis Reeder at the 2013 OpenStack Summit.
A month later, I met up with Solomon at the OpenStack Summit in Portland to work together and see how we could use Docker to solve our problem. (I had planned to attend just one meeting, but we ended up spending a lot of time working with Solomon and the other developers.)
I played with Docker, and Solomon helped me understand what it could do and how it worked. It was not just a good project; it solved a difficult problem in a well-designed way. Nor did it hurt that it was a young project written in Go and carried essentially no technical debt, at least from my point of view.
Research and development
Before Docker, we tried various package managers, including an attempt with Nix. Nix is a great project with a lot going for it: it supports atomic updates and rollbacks and takes a declarative approach to system configuration.
Unfortunately, it was hard to maintain the build scripts for the various programs and libraries we use in our images, and hard to add custom packages and programs; the effort required to integrate scripts, libraries, and everything else felt like patching our system. We kept looking for something that would come closer to meeting our requirements.
Our initial requirements were:
Provide different versions of the same language (e.g. Ruby 1.9 and Ruby 2.1)
Have a safe way to update one part of the system without disturbing the others (for example, updating only the Python libraries while leaving the Ruby ones untouched)
Use a declarative approach to system configuration (simple scripts that describe what should be inside an image; see the sketch after this list)
Provide an easy way to update and roll back
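As a toy example of what "declarative" means here, a stack image can be described in a few Dockerfile lines and built as a diff on top of a shared base. The image names and package below are hypothetical, not our actual production files.

# Hypothetical: describe a Ruby 1.9 stack declaratively, then build it.
cat > Dockerfile <<'EOF'
# the shared OS and common libraries live in the base image
FROM worker:base
RUN apt-get update && apt-get install -y ruby1.9.1
EOF
docker build -t worker:ruby-1.9 .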
Along the way, it became clear that Docker offered several other advantages we had not anticipated. These include:
Creating separate, isolated environments for each runtime/language
Support for a copy-on-write (CoW) file system, which gives us safer and more efficient image management
A reliable way to switch between different runtime environments on the fly (illustrated just below)
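To make that last point concrete: with one image per stack, switching runtimes is just a matter of choosing an image tag at docker run time. The tags below follow the worker:STACK_NAME convention shown later but are themselves illustrative, and the task files are assumed to already be inside the container.

docker run -i worker:ruby-1.9 ruby my_task.rb      # task on Ruby 1.9
docker run -i worker:ruby-2.1 ruby my_task.rb      # same task on Ruby 2.1
docker run -i worker:python-2.7 python my_task.py  # a different stack entirely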
Working with Docker
Integrating Docker was not difficult, since we were already using LXC. (Docker complements LXC with a high-level, process-level API; see the StackOverflow excerpt below.)
Once we migrated our existing shell scripts to Dockerfiles and created images, all we had to do was switch from calling LXC directly to running "docker run" (instead of "lxc-execute") and specify the image ID required for each task.
Command to run an LXC image:
lxc-execute -n VM_NAME -f CONFIG_FILE COMMAND
Command to run a Docker image:
docker run -i -name=VM_NAME worker:STACK_NAME COMMAND
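(A side note: this reflects the single-dash flag syntax of early Docker releases; in later versions the same flag is written --name.)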
It is worth noting that we deviate slightly from the recommended approach to building and deploying containers. The standard approach is either to build images at runtime from Dockerfiles or to store them in private or public repositories in the cloud. Instead, we build images, snapshot them, and store them on EBS volumes attached to our systems. We do this because the system has to start very quickly: building images at runtime was not an option, and even pulling them from external storage would have been too slow.
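A rough sketch of that build-then-snapshot flow, under the assumption that Docker's image store lives on a dedicated EBS volume (paths, tags, and the volume ID are placeholders, and the snapshot step uses today's AWS CLI rather than whatever tooling was current at the time):

# build every stack image on a builder machine whose /var/lib/docker
# sits on a dedicated EBS volume
docker build -t worker:ruby-1.9 stacks/ruby-1.9/
docker build -t worker:python-2.7 stacks/python-2.7/
# snapshot the volume; workers attach volumes created from the snapshot,
# so every image is available at boot without any pulls
aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 \
    --description "worker stack images"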
Base images plus diffs
Using Docker also solved the disk-space problem, since each image is just a set of changes (a diff) against a base image. That means we can have one base image containing the operating system and the Linux libraries we use everywhere, and use it as the basis for many other images. The size of a derived image includes only its differences from the base image.
For example, if you install Ruby, the new image contains only the files that Ruby's installation added. To keep this from getting confusing, think of it as a Git repository containing all the files on a machine, where the base image is the master branch and every other image is a branch created off it. This ability to capture diffs and create images from existing containers is very useful: it lets us continuously release new versions and add libraries and packages while staying focused on solving problems.
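These diffs are easy to see for yourself: docker history lists the layers of an image together with the size each one contributes. The image name and sizes below are illustrative, and the output is abridged.

docker history worker:ruby-1.9
# IMAGE          CREATED BY                               SIZE
# 8f1a2b3c4d5e   RUN apt-get install -y ruby1.9.1         45 MB
# 1a2b3c4d5e6f   (base image: OS and shared libraries)    630 MB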
Some problems
We ran into several problems while building and rolling out the new environments with Docker, but none of them were serious.
We had some difficulty removing containers after a task finished. The removal process would occasionally fail, but we found a fairly clean workaround.
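Our actual fix is not shown here, but a small retry wrapper along these lines is one way to paper over intermittent removal failures; the retry count and delay are arbitrary.

# retry container removal a few times before giving up
remove_container() {
  for attempt in 1 2 3; do
    docker rm "$1" && return 0
    sleep 2
  done
  echo "failed to remove container $1" >&2
  return 1
}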
While setting up some software components, we found that Docker did not correctly emulate certain low-level functions, such as FUSE. As a result, we had to resort to some magic to get a properly working Java image.
And that is about it. Our requests to the Docker developers mostly boiled down to a few fixes. As for new Docker functionality, what exists is enough for us. (We have yet to ask for anything to be added, since the existing feature set is quite extensive.)
LXC, Containers and Docker
LXC (LinuX Containers) is an operating-system-level virtualization system that provides a secure way to isolate one or more processes from the other processes running on the same Linux system. With containers, resources can be isolated, services restricted, and each process given its own isolated space within the operating system, with its own file-system structure and network interfaces. Multiple containers share the same kernel, but each can be limited to a certain amount of resources such as CPU, memory, and I/O. As a result, applications, tasks, and other processes can be set up to run as multiple lightweight, isolated Linux instances on a single machine.
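For instance, those per-container caps can be set right on the run command: docker run exposes the underlying cgroup limits through flags such as -m (memory) and -c (CPU shares). The values and image tag here are illustrative.

# cap the container at 512 MB of memory and a relative CPU weight of 512
docker run -i -m 512m -c 512 worker:ruby-1.9 ruby my_task.rb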
Docker is built on top of LXC and adds image management and deployment on top of it. Here is a StackOverflow answer from Solomon on the differences between, and compatibility of, LXC and Docker:
If you take a look at the features of Docker, most of them are already provided by LXC. So what does Docker add? Why should I use Docker instead of plain LXC?
Docker is not a replacement for LXC. "LXC" refers to capabilities of the Linux kernel (in particular, namespaces and control groups) that allow processes to be isolated from one another and the allocation of their resources to be controlled.
Docker offers a high-level tool with several powerful features on top of these low-level kernel functions.
Read more >>
Docker in production
Docker is the foundation of IronWorker's stack.
We currently use Docker in production as part of the IronWorker service. You can choose among 10 different "stacks" (containers) for your tasks by setting the "stack" option when uploading code. If you think about it, that is a remarkable capability: you can specify the language version for a short-lived task that will run on any number of cores.
Using Docker for image management lets us update images without fear of breaking other parts of the system. In other words, we can update the Ruby 1.9 image without touching the Ruby 2.1 image. (Maintaining consistency is paramount in any large-scale system, especially one supporting a large set of languages.)
We also now have a more automated process for updating images via Dockerfiles, which lets us roll out updates on a predictable schedule. In addition, we can create custom images, built around a specific language version and/or including particular frameworks and libraries.
Looking to the future
Deciding to use Docker in production was not an overly risky step. It may have been a year ago, but today it is a stable product. Its newness is, in our eyes, an advantage: it has a minimal feature set and is built for large-scale, dynamic cloud environments like ours.
We had an inside look at Docker and got to know the people behind it, but even without that, Docker would have been the natural choice: the pluses are many and the minuses few.
As a piece of advice, we suggest using ready-made Dockerfiles, scripts, and publicly available images; there is plenty there to get you started. In fact, we will probably make our own Dockerfiles and images public, which means people will be able to easily run their workers locally, and we will accept pull requests to improve them.
Processing tens of thousands of hours of CPU time and millions of tasks every day, in almost every language, is not easy. Docker let us solve some serious problems with little effort. It has increased our ability to innovate and to build new capabilities for IronWorker. Just as importantly, it lets us maintain, and even exceed, the service-level guarantees we work so hard to uphold.
Docker has a great future, and we are glad we decided to include it in our technology stack.