
Hi Habr! In the previous article, I explained how to build your own cloud hosting in 5 minutes using Ansible, Docker and Docker Swarm. In this part, I will talk about how services running in the cloud find each other, how load is balanced between them, and how their fault tolerance is ensured.
This is an introductory article: here we will focus on reviewing the tools that will solve the problem of service discovery in our cloud.
In the next part we will put them into practice, so I decided to give you time to get acquainted with them first.
Problem
Let's analyze the most typical problem and its common solution: we have a web application, and we need to ensure load balancing and fault tolerance for it.
We can run multiple copies of our web application and have Supervisor monitor them.
Supervisor will restart a copy of the web application if it crashes with an error, and will also log such events. Installing Nginx will solve the load balancing problem.
The Nginx configuration will look something like this:
upstream app {
    server 192.168.1.2:8080 max_fails=3 fail_timeout=5s;
    server 192.168.1.2:8081 max_fails=3 fail_timeout=5s;
    server 192.168.1.2:8082 max_fails=3 fail_timeout=5s;
}

server {
    location / {
        proxy_pass http://app;
        health_check;
    }
}
This configuration works as follows: if the number of unsuccessful attempts to reach one of the web application copies reaches 3 within 5 seconds, that copy is marked as unavailable for 5 seconds (if it crashed with an error, Supervisor restarts it). The load is thus distributed evenly between the working copies of the application.
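The max_fails / fail_timeout behavior above can be sketched in a few lines of Python. This is a toy model of Nginx's passive health checking, not its actual implementation; the `Backend` class and its method names are made up for this illustration:

```python
import time

# Toy model of Nginx passive health checking: a backend is marked down once
# it accumulates max_fails errors within the fail_timeout window, and is
# then skipped for fail_timeout seconds before being retried.
class Backend:
    def __init__(self, addr, max_fails=3, fail_timeout=5.0):
        self.addr = addr
        self.max_fails = max_fails
        self.fail_timeout = fail_timeout
        self.fails = []          # timestamps of recent failures
        self.down_until = 0.0    # time until which the backend is skipped

    def report_failure(self, now=None):
        now = time.monotonic() if now is None else now
        # keep only failures inside the sliding window
        self.fails = [t for t in self.fails if now - t < self.fail_timeout]
        self.fails.append(now)
        if len(self.fails) >= self.max_fails:
            self.down_until = now + self.fail_timeout
            self.fails = []

    def is_up(self, now=None):
        now = time.monotonic() if now is None else now
        return now >= self.down_until

b = Backend("192.168.1.2:8080")
for _ in range(3):
    b.report_failure(now=0.0)
print(b.is_up(now=1.0))   # False: marked down for fail_timeout seconds
print(b.is_up(now=6.0))   # True: the timeout has expired, backend is retried
```

A real load balancer would consult `is_up()` on every request and spread traffic across the remaining healthy backends.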
Disadvantages
In fact, this is a good configuration, and if you have few applications and the load is more or less even, it is better to use it.
But we are building a cloud, and we do not know in advance which applications will be launched on it. Load may vary between different sites and web applications, so it would be nice to be able to change the number of running copies of an application depending on the situation. In other words, we cannot configure Nginx / Apache / etc. in advance for such a setup.
It would be great if Nginx and our other services adapted to the dynamic nature of our cloud. Solving this particular problem is what the rest of this article is about.
Requirements
We need a place where our services can register themselves and receive information about each other. Docker Swarm, which we started using in the previous article, works out of the box with etcd, Consul and Zookeeper.
We need our services to be registered in and removed from the above systems automatically (we are not going to teach each application to do this). For this purpose we will use Registrator (covered in more detail below), which works out of the box with Consul, etcd and SkyDNS 2 (Zookeeper support is planned).
Our services should be able to find each other using DNS queries. Consul and SkyDNS 2 (which works together with etcd) can solve this problem.
We also need health monitoring for our services. Consul (which we will use) provides it out of the box, and Registrator supports it (it passes Consul the information about how each service should be monitored).
Last but not least, we need a service to automatically reconfigure our components. If we run 10 copies of one web application and 20 copies of another, something must notice this and react immediately (by changing the Nginx configuration, for example). This role will be played by Consul Template (covered in more detail below).
Note: as you can see, there are different solutions to our problem. Before writing this article, I had been running the configuration described here for a little over a month and did not encounter any problems.
Consul

Of the options above (Consul, Zookeeper, etcd), Consul is the most self-contained project, able to solve our service discovery problem out of the box.
Although Consul, Zookeeper and etcd are listed here in the same row, I would not compare them with each other: all three projects implement a distributed key/value store, and that is where their similarities end.
Consul provides us with a DNS server, which Zookeeper and etcd lack (for etcd it can be added with SkyDNS 2). Moreover, Consul gives us health monitoring (which neither etcd nor Zookeeper can boast of), which is also necessary for full-fledged service discovery.
With Consul we also get a Web UI (a demo of which you can see right now) and high-quality official documentation.
Note: even if you plan to use the same configuration I describe, and Zookeeper and SkyDNS 2 are not in your plans, I would still recommend getting familiar with these projects.
Registrator
Registrator receives information from Docker about container starts and stops (through a socket connection, using the Docker API) and adds the corresponding services to Consul or removes them from it.
Registrator automatically derives information about a service from its published ports and from the Docker container's environment variables. In other words, it works with any containers you have and requires additional configuration only if you need to override the automatically obtained parameters.
And since all of our services run exclusively in Docker containers (including Registrator itself), Consul will always have information about every running service in our cloud.
This is all great, of course, but what is even better is that Registrator can tell Consul how to check the health of our services. This is done using the same environment variables.
Note: Consul can check the health of services if the Consul Service Catalog (which we use) is used to store information about them.
If the Consul key/value store is used instead (it is also supported by Registrator, and Docker Swarm uses it, for example, to store information about Docker nodes), there is no such function.
Let's look at an example:
$ docker run -d --name nginx.0 -p 4443:443 -p 8000:80 \
    -e "SERVICE_443_NAME=https" \
    -e "SERVICE_443_CHECK_SCRIPT=curl --silent --fail https://our-https-site.com" \
    -e "SERVICE_443_CHECK_INTERVAL=5s" \
    -e "SERVICE_80_NAME=http" \
    -e "SERVICE_80_CHECK_HTTP=/health/endpoint/path" \
    -e "SERVICE_80_CHECK_INTERVAL=15s" \
    -e "SERVICE_80_CHECK_TIMEOUT=3s" \
    -e "SERVICE_TAGS=www" nginx
After such a launch, the list of our services in Consul will look like this:
{
  "services": [
    {
      "id": "hostname:nginx.0:443",
      "name": "https",
      "tags": ["www"],
      "address": "192.168.1.102",
      "port": 4443,
      "checks": [
        {
          "script": "curl --silent --fail https://our-https-site.com",
          "interval": "5s"
        }
      ]
    },
    {
      "id": "hostname:nginx.0:80",
      "name": "http",
      "tags": ["www"],
      "address": "192.168.1.102",
      "port": 8000,
      "checks": [
        {
          "http": "/health/endpoint/path",
          "interval": "15s",
          "timeout": "3s"
        }
      ]
    },
    ...
  ]
}
As you can see, based on the published ports, Registrator concluded that two services (http and https) need to be registered. Moreover, Consul now has all the information it needs to check the health of these services.
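To illustrate what the catalog above contains, here is a small Python sketch that parses a trimmed copy of that JSON (the trailing "..." entry is dropped) and lists each registered service with its address and checks:

```python
import json

# A trimmed, hard-coded copy of the service list shown above --
# not a live Consul response.
catalog = json.loads("""
{
  "services": [
    {"id": "hostname:nginx.0:443", "name": "https", "tags": ["www"],
     "address": "192.168.1.102", "port": 4443,
     "checks": [{"script": "curl --silent --fail https://our-https-site.com",
                 "interval": "5s"}]},
    {"id": "hostname:nginx.0:80", "name": "http", "tags": ["www"],
     "address": "192.168.1.102", "port": 8000,
     "checks": [{"http": "/health/endpoint/path",
                 "interval": "15s", "timeout": "3s"}]}
  ]
}
""")

# Print one line per registered service: name, endpoint, number of checks.
for svc in catalog["services"]:
    print(f'{svc["name"]} -> {svc["address"]}:{svc["port"]} '
          f'({len(svc["checks"])} check(s))')
# https -> 192.168.1.102:4443 (1 check(s))
# http -> 192.168.1.102:8000 (1 check(s))
```

Any consumer of Consul's catalog (Consul Template, a custom script, etc.) works with exactly this kind of structure.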
In the first case, the command "curl --silent --fail https://our-https-site.com" will be executed every 5 seconds, and the result of the check will depend on the exit code of this command.
In the second case, Consul will poll the URL we provided every 15 seconds. If the server's response code is 2xx, the service is "healthy"; if it is 429 Too Many Requests, the service is in a "warning" state; any other code means the service is considered dead.
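This mapping of response codes to health states can be written down as a tiny Python function (a sketch of the rule described above, not Consul's actual code):

```python
# Map an HTTP check's response code to a Consul-style health state:
# 2xx -> passing, 429 -> warning, anything else -> critical.
def check_state(status_code: int) -> str:
    if 200 <= status_code <= 299:
        return "passing"
    if status_code == 429:
        return "warning"
    return "critical"

print(check_state(200))  # passing
print(check_state(429))  # warning
print(check_state(500))  # critical
```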
More examples and more detailed information can be found in the official documentation.
Consul Template

We have decided where information about all the services in our cloud will be stored, as well as how it gets there and is kept up to date. But we have not yet figured out how we will get that information back out and, subsequently, pass it to our services. This is exactly what Consul Template does.
To do this, you take the configuration file of the application you want to configure and turn it into a template, following the rules of Consul Template's templating language (based on Go templates).
Let's look at a simple example with an Nginx configuration file:
upstream app {
    least_conn;
    {{range service "tag1.cool-app"}}server {{.Address}}:{{.Port}} max_fails=3 fail_timeout=60s;
    {{else}}server 127.0.0.1:65535; # force a 502{{end}}
}
After we tell Consul Template where this template is located, where to put the rendered result, and what command to execute when the result changes (it knows how to do that too; in this case, reload Nginx), the magic begins. Consul Template will fetch the addresses and port numbers of all copies of the "cool-app" application that are tagged "tag1" and are in a "healthy" state, and add them to the configuration file. If there are no such copies, then, as you already guessed, whatever follows {{else}} is used instead.
Each time a cool-app service with the tag1 tag is added or removed, the configuration file will be rewritten and Nginx will be reloaded. All this happens automatically and requires no intervention: we just run the required number of copies of our application and don't worry about anything.
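As a toy illustration of what such a render step produces, here is a Python sketch that builds the upstream block from a list of healthy (address, port) pairs, falling back to the {{else}} branch when the list is empty. The helper name render_upstream is made up for this example; real Consul Template watches Consul and re-renders the file on every change:

```python
# Toy stand-in for Consul Template rendering the upstream block above:
# one "server" line per healthy instance, or the {{else}} fallback if none.
def render_upstream(instances):
    lines = ["upstream app {", "    least_conn;"]
    if instances:
        for addr, port in instances:
            lines.append(f"    server {addr}:{port} max_fails=3 fail_timeout=60s;")
    else:
        # no healthy instances: point at a closed port so Nginx returns a 502
        lines.append("    server 127.0.0.1:65535;")
    lines.append("}")
    return "\n".join(lines)

print(render_upstream([("192.168.1.2", 8080), ("192.168.1.3", 8080)]))
```

After producing a file like this, Consul Template would run the reload command we configured (e.g. `nginx -s reload`) so Nginx picks up the new backend list.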
More examples can be found in the official documentation.
Conclusion
Today there are plenty of tools for solving the service discovery problem, but not many that solve it out of the box and immediately provide everything we need.
In the next part, I publish a set of Ansible scripts that will configure all of the above tools for us, and we will get down to practice.
That's all. Thank you all for your attention. Stable clouds and good luck to you!
Follow me on Twitter, where I talk about working at a startup, my mistakes and right decisions, about Python and everything related to web development.
P.S. I'm looking for developers for my company; details are in my profile.