At Dailymotion, we started using Kubernetes in production 3 years ago. But deploying applications across several clusters is still a challenge, so over the past few years we have kept improving our tools and workflows.
Here we describe how we deploy our applications on several Kubernetes clusters around the world.
To deploy several Kubernetes objects at once we use Helm, and all our charts are stored in a single git repository. To deploy a full stack made up of several services, we use a so-called umbrella chart: essentially a chart that declares dependencies and lets us initialize the API and its services with a single command.
We also wrote a small Python script on top of Helm that runs checks, creates charts, adds secrets and deploys applications. All these tasks run on a central CI platform, using a Docker image.
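That script is internal, but a minimal sketch of the core idea, wrapping the Helm CLI with subprocess calls, could look like this (the function name and value layout are illustrative, not our actual tool):

import subprocess

def helm_deploy(release, chart, kube_context, values_files=(), set_values=None):
    """Run `helm upgrade --install` for one release against one cluster."""
    cmd = ["helm", "upgrade", "--install", release, chart,
           "--kube-context", kube_context]
    for values_file in values_files:
        cmd += ["--values", values_file]
    for key, value in (set_values or {}).items():
        cmd += ["--set", f"{key}={value}"]
    # Fail loudly so the CI job goes red on any Helm error.
    subprocess.run(cmd, check=True)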
Let's get to the point.
Note: by the time you read this, the first release candidate of Helm 3 has already been announced. This major version includes a whole set of improvements designed to solve some of the problems we have faced in the past.
We use a branching workflow for applications, and we decided to apply the same approach to charts.
Each environment has its own private repository that stores our charts, and we use Chartmuseum, which exposes very useful APIs. This way we guarantee strict isolation between environments and can test charts under real conditions before using them in production.
Chart repositories in different environments
It is worth noting that when developers push a dev branch, a version of their chart is automatically published to the dev Chartmuseum. All developers share the same dev repository, so you have to specify your own chart version carefully in order not to accidentally pick up someone else's changes.
Moreover, our small Python script validates Kubernetes objects against the Kubernetes OpenAPI specifications with Kubeval before publishing them to Chartmuseum.
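A minimal sketch of that validation step, assuming the helm and kubeval binaries are available on the PATH (our real script is more involved):

import subprocess

def validate_chart(chart_dir):
    """Render the chart with `helm template` and check the result with kubeval."""
    rendered = subprocess.run(["helm", "template", chart_dir],
                              check=True, capture_output=True, text=True).stdout
    # kubeval reads manifests from stdin and validates them against the
    # Kubernetes OpenAPI schemas; --strict also rejects unknown fields.
    subprocess.run(["kubeval", "--strict"], input=rendered, text=True, check=True)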
For a while we used Kubernetes cluster federation, which lets you declare Kubernetes objects through a single API endpoint. But we ran into problems: some Kubernetes objects could not be created on the federation endpoint, which made it hard to maintain federated objects alongside the objects managed per cluster.
To solve this, we started managing the clusters independently, which greatly simplified the process (we were using the first version of federation; things may have changed in the second).
Our platform is now distributed across 6 regions: 3 on-premises and 3 in the cloud.
Four global Helm values allow us to describe the differences between clusters. All of our charts have minimal defaults for them.
global:
  cloud: True
  env: staging
  region: us-central1
  clusterName: staging-us-central1
Global values
These values help define the context for our applications and are used for different tasks: monitoring, tracing, logging, making external calls, scaling, etc.
Here is a specific example:
{{/* Returns Horizontal Pod Autoscaler replicas for GraphQL */}}
{{- define "graphql.hpaReplicas" -}}
{{- if eq .Values.global.env "prod" }}
{{- if eq .Values.global.region "europe-west1" }}
minReplicas: 40
{{- else }}
minReplicas: 150
{{- end }}
maxReplicas: 1400
{{- else }}
minReplicas: 4
maxReplicas: 20
{{- end }}
{{- end -}}
Helm Template Example
This logic is defined in a helper template to avoid cluttering the Kubernetes YAML.
Our deployment tools are based on several YAML files. Below is an example of how we declare a service and its scaling topology (number of replicas) in a cluster.
releases:
  - foo.world

foo.world:                   # Release name
  services:                  # List of dailymotion's apps/projects
    foobar:
      chart_name: foo-foobar
      repo: git@github.com:dailymotion/foobar
      contexts:
        prod-europe-west1:
          deployments:
            - name: foo-bar-baz
              replicas: 18
            - name: another-deployment
              replicas: 3
Service definition
Below is a diagram of all the steps that make up our deployment workflow. The last step deploys the application to several production clusters in parallel.
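Conceptually, that last step is just a loop over the targeted kube contexts. A hedged sketch, assuming each chart exposes its replica counts under a "<deployment>.replicas" value (an assumption made for illustration):

import subprocess

def deploy_to_clusters(release, chart, contexts, image_tag):
    """Deploy the same release to every targeted cluster, one kube context at a time."""
    for kube_context, deployments in contexts.items():
        cmd = ["helm", "upgrade", "--install", release, chart,
               "--kube-context", kube_context,
               "--set", f"image.tag={image_tag}"]
        for deployment in deployments:
            # Replica counts come from the per-cluster service definition above.
            cmd += ["--set", f"{deployment['name']}.replicas={deployment['replicas']}"]
        subprocess.run(cmd, check=True)

In practice the per-cluster invocations can be parallelised so the new version lands on all clusters at roughly the same time.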
As for security, we collect all our secrets from different places and store them in a single Vault instance in Paris.
Our deployment tools pull secrets from Vault and inject them into Helm at deployment time.
To do this, we defined a mapping between the secrets in Vault and the secrets our applications need:
secrets:
  - secret_id: "stack1-app1-password"
    contexts:
      - name: "default"
        vaultPath: "/kv/dev/stack1/app1/test"
        vaultKey: "password"
      - name: "cluster1"
        vaultPath: "/kv/dev/stack1/app1/test"
        vaultKey: "password"
apiVersion: v1
data:
  {{- range $key, $value := .Values.secrets }}
  {{ $key }}: {{ $value | b64enc | quote }}
  {{ end }}
kind: Secret
metadata:
  name: "{{ .Chart.Name }}"
  labels:
    chartVersion: "{{ .Chart.Version }}"
    tillerVersion: "{{ .Capabilities.TillerVersion.SemVer }}"
type: Opaque
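A hedged sketch of how such a mapping could be resolved with the hvac Python client before the values are handed to the Secret template above (the helper and its parameters are illustrative; the paths and keys come from the mapping file):

import hvac

def resolve_secrets(secret_mappings, cluster_name, vault_url, vault_token):
    """Turn the secret mapping into a {secret_id: value} dict for one cluster."""
    client = hvac.Client(url=vault_url, token=vault_token)
    resolved = {}
    for mapping in secret_mappings:
        # Prefer the cluster-specific entry, fall back to the "default" context.
        contexts = {c["name"]: c for c in mapping["contexts"]}
        context = contexts.get(cluster_name, contexts["default"])
        data = client.read(context["vaultPath"])["data"]
        resolved[mapping["secret_id"]] = data[context["vaultKey"]]
    return resolved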
Today, chart development is kept separate from application development. This means developers have to work in two git repositories: one for the application itself, and one that defines its deployment to Kubernetes. Two git repositories means two workflows, and it is easy for a newcomer to get confused.
As we said earlier, umbrella charts are very convenient for declaring dependencies and quickly deploying several applications at once. We use --reuse-values to avoid having to pass every value each time we deploy an application that is part of such an umbrella chart.
In our continuous delivery workflow only two values change regularly: the number of replicas and the image tag (version). Other, more stable values are changed by hand, which is error-prone. Moreover, a single mistake when deploying the umbrella chart can lead to serious outages, as we have learned from our own experience.
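In practice a routine rollout therefore only overrides those two values, roughly like this (the value names are made up for the example):

import subprocess

def redeploy(release, chart, kube_context, image_tag, replicas):
    """Routine CD update: keep all previously set values, override only the tag and replicas."""
    subprocess.run(["helm", "upgrade", release, chart,
                    "--kube-context", kube_context,
                    "--reuse-values",
                    "--set", f"image.tag={image_tag}",
                    "--set", f"replicaCount={replicas}"],
                   check=True)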
When a developer adds a new application, they have to change several files: the application declaration, the list of secrets, and the dependency list if the application is part of the umbrella chart.
On top of that, we currently have a single AppRole that reads all the secrets from Vault.
Rolling back requires running a command on several clusters, which is error-prone. We do this manually to make sure the right revision identifier is used everywhere.
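A sketch of what that could look like if automated; note that Helm tracks revision numbers per cluster, so the right revision has to be supplied for each context (the helper is illustrative):

import subprocess

def rollback_everywhere(release, revisions_by_context):
    """Roll a release back on every cluster, one kube context at a time."""
    for kube_context, revision in revisions_by_context.items():
        # Helm revisions are tracked per cluster, which is exactly why we
        # still double-check them by hand today.
        subprocess.run(["helm", "rollback", release, str(revision),
                        "--kube-context", kube_context],
                       check=True)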
We want to move the chart back into the git repository of the application it deploys.
The workflow will be the same as for development: for example, when a branch is merged into master, a deployment will be triggered automatically. The main difference between this approach and the current workflow is that everything will be managed in git (the application itself and the way it is deployed to Kubernetes).
There are several advantages:
Our developers have been using this workflow for 2 years now, so we need the most painless migration possible. Therefore, we decided to add an intermediate stage on the way to the goal.
The first stage is simple:
apiVersion: "v1"
kind: "DailymotionRelease"
metadata:
  name: "app1.ns1"
  environment: "dev"
  branch: "mybranch"
spec:
  slack_channel: "#admin"
  chart_name: "app1"
  scaling:
    - context: "dev-us-central1-0"
      replicas:
        - name: "hermes"
          count: 2
    - context: "dev-europe-west1-0"
      replicas:
        - name: "app1-deploy"
          count: 2
  secrets:
    - secret_id: "app1"
      contexts:
        - name: "default"
          vaultPath: "/kv/dev/ns1/app1/test"
          vaultKey: "password"
        - name: "dev-europe-west1-0"
          vaultPath: "/kv/dev/ns1/app1/test"
          vaultKey: "password"
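A hedged sketch of how a tool could read such a manifest and turn it into per-cluster replica overrides (the field names match the example above; the parsing helper itself is illustrative):

import yaml

def load_release(path):
    """Parse a DailymotionRelease manifest into per-context replica overrides."""
    with open(path) as f:
        release = yaml.safe_load(f)
    overrides = {
        scaling["context"]: {r["name"]: r["count"] for r in scaling["replicas"]}
        for scaling in release["spec"]["scaling"]
    }
    return release["metadata"]["name"], overrides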
We talked to all the developers, so the migration has already begun. The first stage is still driven by the CI platform. Soon I will write another post about the second stage: how we moved to a GitOps workflow with Flux. I will describe how we set everything up and what difficulties we ran into (multiple repositories, secrets, etc.). Stay tuned.
Here we have tried to describe how our application deployment workflow has evolved over the past few years, and how that evolution led us to think about a GitOps approach. We have not reached the goal yet and will report on the results, but we are already convinced that we did the right thing by deciding to simplify everything and bring it closer to developers' habits.
Source: https://habr.com/ru/post/458934/