Performance orchestra

It would hardly be wrong to say that the best of people
gain joy through suffering.
Ludwig van Beethoven

I'm Sergey, and I work on the performance research team at Yandex.Money. I want to tell the beginning of the story of how we came to orchestration: how we chose the tools and what we took into account. The events in this article are unfolding in real time, so you, dear readers, are following the situation almost live.

Why do we need a conductor in the team?

Who is a conductor? The Russian word for conductor, dirizhyor, comes from the French diriger (to manage, to direct). In music, the conductor is the person who leads an ensemble as it rehearses and performs. In our case, that place is taken by orchestration and automation systems.

Their role is no different from a conductor's in music: they help the team, guide it, and organize its playing.

As a rule, a team has a certain amount of capacity: let's call it the servers on which the team runs its projects.

Approaches to obtaining and operating these servers vary. A few examples:

The team submits a request, for example to the operations team, for resources with certain parameters.
The operations team provides the required amount, cloud or bare metal, and undertakes to keep it in proper condition according to the SLA. Configuration is also done by the operations team.
The team receives only the cloud or bare-metal resources from the operations team and does the configuration itself.
The team “buys” resources itself and supports and configures them entirely on its own.

Our team uses servers that need to be maintained: OS updates, new package installs, and so on.

For ourselves, we divide them into two main groups:

the tank group,
the service group.

The tank group consists of hosts with Yandex.Tank.

The service group covers everything related to supporting our work: various services that support the release cycle, generate automatic reports, and so on.

At some point, managing all of this by hand became inconvenient, and we started thinking about automating the whole process, from provisioning the servers to developing, deploying, and launching our internal services.

Why is a conductor needed, even if the orchestra can play?

To begin with, we learned Ansible and started provisioning our bare-metal servers ourselves so as to depend less on system administrators. Everyone wins here: we gain new skills, and we take some work off the administrators, who always have plenty without us. We try to grow beyond our specialty and to make the team as autonomous as possible.

Working with Ansible has been an established and regulated process in the company for quite a long time, so we easily integrated our solution into it.

Host provisioning now consists of three Ansible roles:

the first role installs the OS,
the second applies the basic host settings, LDAP authorization for example,
and the third installs Yandex.Tank and its dependencies in a Docker container.

Let us turn to the services that we use within the team.

For our tasks we use Kotlin and Python in equal measure, plus a little Go. To unify the development and deployment of our services, we decided to package them in Docker containers. This leaves the choice of programming language free while establishing a uniform delivery format for the application.

A quick note about Docker and IPv6

Some of the services we interact with are available only over IPv6, so we had to figure out how to give containers IPv6 connectivity.

According to the IPv6 documentation on the official Docker website, IPv6 is enabled by adding these parameters to daemon.json:

{ "ipv6": true, "fixed-cidr-v6": "2001:db8:1::/64" }

In this case, the provider must allocate an IPv6 subnet, which you specify in fixed-cidr-v6.
However, we chose a different option, IPv6 NAT, and here is why:

At the time of writing, Docker cannot run in IPv6-only mode.
A globally routable address on each container means that all ports, even unpublished ones, are reachable by everyone unless additional filtering is applied.
Ports are published over IPv6 through the userland proxy, while iptables rules are used for IPv4 only.

IPv6 NAT runs as a Docker container that manages the ip6tables rules and updates them when a new container is added.

For this solution to work properly, a few more steps were needed. The ip6table_nat kernel module must be loaded. Having the module installed on the system does not guarantee that it is loaded into the kernel at boot. We ran into this when starting the NAT container on a fresh host produced the following error:

 2019/01/22 14:59:54 running [/sbin/ip6tables -t filter -N DOCKER --wait]: exit status 3: modprobe: can't change directory to '/lib/modules': No such file or directory
 ip6tables v1.6.2: can't initialize ip6tables table `filter': Table does not exist (do you need to insmod?)

The problem was solved by adding two tasks to the Ansible role: one loads the module using the modprobe module, and the other makes it load at OS startup using lineinfile:

 - name: Add ip6table_nat module
   modprobe:
     name: ip6table_nat
     state: present

 - name: Add ip6table_nat to boot
   lineinfile:
     path: /etc/modules
     line: 'ip6table_nat'
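
As a quick sanity check on a freshly provisioned host, one could confirm that the module really is loaded. The sketch below is not part of our role. It relies on the fact that Linux lists loaded modules in /proc/modules, one per line, with the module name as the first field. The function names are hypothetical.

```python
from pathlib import Path


def is_module_loaded(proc_modules_text: str, module: str = "ip6table_nat") -> bool:
    """Return True if `module` is the first field of any line in the text."""
    for line in proc_modules_text.splitlines():
        fields = line.split()
        # Only the first field is the module name; later fields list dependents.
        if fields and fields[0] == module:
            return True
    return False


def read_loaded_modules(path: str = "/proc/modules") -> str:
    """Read the kernel's list of loaded modules (Linux only)."""
    return Path(path).read_text()
```

On the host itself, is_module_loaded(read_loaded_modules()) should return True once the tasks above have been applied.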

By the way, there is a good article on Habr that briefly and clearly describes the pros and cons of each way of doing IPv6 in Docker.

But back to our question asked at the beginning:
Why is a conductor needed, even if the orchestra can play?

Now everyone on our team has an idea of how to play:

the server provisioning process is in place,
the development and deployment of services are unified.

A reasonable question arises: how do we deploy, update, and control our services in Docker containers efficiently and with as much automation as possible?

Even though every member of the orchestra knows their own part, they may stray from the original idea. This brings us to the point: without a conductor, our orchestra cannot rehearse effectively or play together smoothly. The conductor is responsible for every aspect of the performance, making sure everything is united by a single tempo and mood.

How to get a good conductor with minimal investment?

Orchestration is a well-developed area with plenty of tools on the market. But first, let's talk about the auxiliary tools that can help the conductor.

Consul is a system that provides two main functions:

service discovery,
distributed key-value storage.

In our orchestra, Consul will be responsible for registering services and storing their configurations. There are two registration options:

Active: the service registers itself through the HTTP API;
Passive: the service has to be registered manually.
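
To illustrate the active option, here is a minimal sketch of a service registering itself with a local Consul agent through the agent's HTTP endpoint /v1/agent/service/register. The service name, address, port, and health-check URL are hypothetical, and the agent is assumed to listen on the default 127.0.0.1:8500. This is not our production code.

```python
import json
import urllib.request


def build_registration(name: str, address: str, port: int, health_url: str) -> dict:
    """Build the service definition that the Consul agent expects."""
    return {
        "Name": name,
        "ID": f"{name}-{port}",  # hypothetical ID scheme: name plus port
        "Address": address,
        "Port": port,
        # HTTP health check that the agent polls every 10 seconds
        "Check": {"HTTP": health_url, "Interval": "10s"},
    }


def build_register_request(payload: dict,
                           agent_url: str = "http://127.0.0.1:8500") -> urllib.request.Request:
    """Prepare the PUT request; nothing is sent until urlopen is called."""
    return urllib.request.Request(
        url=f"{agent_url}/v1/agent/service/register",
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def register(payload: dict) -> int:
    """Send the registration and return the HTTP status code."""
    with urllib.request.urlopen(build_register_request(payload)) as response:
        return response.status
```

A service would call register(build_registration(...)) once at startup. Passive registration skips this call and relies on someone adding the service definition by hand.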

Vault is a store that standardizes and unifies the secure storage and handling of secrets such as passwords and certificates.
Here are the benefits we get from using this tool:

A single place for creating and storing secrets and managing their life cycle through the HTTP API.
Transit Secrets Engine: encrypts and decrypts data without storing it, which makes it possible to send data in encrypted form over insecure channels.
Access policies that are easy to configure.
Auditing of access to secrets.
The ability to create your own CA (Certificate Authority) to manage self-signed certificates within your infrastructure.
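
As an illustration of the Transit Secrets Engine item, the sketch below prepares an encryption request to Vault's HTTP API. It assumes the engine is mounted at the default path transit. The key name, the token, and the Vault address are hypothetical. Vault expects the plaintext to be base64-encoded and returns the ciphertext without storing the data.

```python
import base64
import json
import urllib.request


def build_encrypt_request(vault_addr: str, token: str, key_name: str,
                          plaintext: bytes) -> urllib.request.Request:
    """Prepare a POST to the transit encrypt endpoint for the given key."""
    body = {"plaintext": base64.b64encode(plaintext).decode("ascii")}
    return urllib.request.Request(
        url=f"{vault_addr}/v1/transit/encrypt/{key_name}",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"X-Vault-Token": token, "Content-Type": "application/json"},
    )


def encrypt(vault_addr: str, token: str, key_name: str, plaintext: bytes) -> str:
    """Send the request and return the ciphertext string from the response."""
    request = build_encrypt_request(vault_addr, token, key_name, plaintext)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["data"]["ciphertext"]
```

The returned ciphertext can then travel over an insecure channel, and only a party with access to the same key in Vault can decrypt it.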

Taking all our requirements into account, two candidates suited the role of conductor: Kubernetes and Nomad.

Kubernetes

So many articles and books have been written about it (this one, for example) and so many talks given that I will be brief: it is an all-in-one tool that can do almost everything. The price is that setting up and maintaining a Kubernetes cluster is not always easy.

Nomad

A tool from HashiCorp, the company known for Consul and Vault, mentioned above.

Nomad seemed to us simpler to install and configure than Kubernetes. A single binary runs in both server and client mode. At the same time, Nomad covers the whole list of tasks we want it to solve: cluster management, a fast scheduler, multi-datacenter support. Plus, by using Consul and Vault we get tighter integration for orchestrating our services.

What is in progress now:

servers for the Consul deployment have been prepared,
the Nomad cluster configuration will be stored in Consul, and Nomad should be deployed automatically from it,
in parallel, we will set up Vault to store secrets.

A question for the audience: is it worth bringing in a conductor for tasks like these, or can the orchestra manage without one? Tell us what you think in the comments.

Subscribe to our blog and stay in touch: we will soon tell you how it turned out and whether we managed to set up the Nomad cluster the way we wanted.

Come to our cozy Telegram chat, where you can always ask for advice, help colleagues, and just talk about performance research and more.

Source: https://habr.com/ru/post/443822/
