Is preparing a Kubernetes cluster easy and convenient? Announcing addon-operator

Following shell-operator, we are introducing its big brother, addon-operator. It is an open source project for installing system components into a Kubernetes cluster; collectively, these components can be called add-ons.

Why are add-ons needed at all?

It is no secret that Kubernetes is not a finished all-in-one product, and building a “grown-up” cluster requires various additions. Addon-operator helps to install and configure these add-ons and to keep them up to date.
The need for additional components in a cluster is covered in a talk by our colleague driusha. In short, the current situation with Kubernetes is this: for a simple “play around” installation, the out-of-the-box components are enough; for development and testing, you can add Ingress; but for a full installation that you can call “production ready”, you need a dozen different add-ons: something for monitoring, something for logs, not forgetting ingress and cert-manager, plus node groups, network policies, and a seasoning of sysctl and pod autoscaler settings...

What are the specifics of working with them?

As practice shows, installation alone is not the end of it. To work with a cluster comfortably, you need to update add-ons, disable them (remove them from the cluster), and test things before installing them in a production cluster.

So maybe Ansible is enough? Maybe. But full-fledged add-ons rarely live without settings. These settings may differ depending on the cluster type (aws, gce, azure, bare-metal, do, ...). Some settings cannot be set in advance and have to be obtained from the cluster. And the cluster is not static: for some settings, you have to track changes. Here Ansible falls short: you need a program that lives in the cluster, that is, a Kubernetes operator.

Those who have tried shell-operator will say that installing and updating add-ons and tracking settings can all be done with shell-operator hooks. You can write a script that runs something like kubectl apply and watches, for example, a ConfigMap where the settings are stored. That is roughly how addon-operator implements it.
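To make the idea concrete, here is a minimal sketch of such a hook in Python. It relies on the shell-operator convention that a hook prints its configuration when called with --config and does its work when called on an event. The ConfigMap name, the manifest path, and the exact layout of the binding fields are assumptions made for illustration; check the shell-operator documentation for the syntax of your version.

```python
import json
import subprocess

# Hypothetical names, used only for illustration.
SETTINGS_CONFIGMAP = "addon-settings"
MANIFEST_PATH = "/addons/my-addon.yaml"


def hook_config():
    """Describe which event the hook wants to be called on.

    The field layout below is an assumption for illustration, not a
    reference for the shell-operator binding syntax.
    """
    return {
        "configVersion": "v1",
        "kubernetes": [
            {
                "apiVersion": "v1",
                "kind": "ConfigMap",
                "nameSelector": {"matchNames": [SETTINGS_CONFIGMAP]},
            }
        ],
    }


def apply_command(manifest_path):
    """Build the kubectl command that (re)applies the add-on manifest."""
    return ["kubectl", "apply", "-f", manifest_path]


def main(argv, run=subprocess.run):
    """Print the config on --config, otherwise apply the manifest."""
    if "--config" in argv:
        print(json.dumps(hook_config()))
        return 0
    # Called on an event: the watched ConfigMap changed, so re-apply.
    return run(apply_command(MANIFEST_PATH), check=False).returncode
```

A real hook file would finish by calling main(sys.argv[1:]) and exiting with its return value. The run parameter exists only so that the kubectl call can be replaced in tests.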

How is this organized in addon-operator?

In designing the new solution, we followed these principles:

The add-on installer must support templating and declarative configuration. We do not write magic scripts that install add-ons. Addon-operator uses Helm to install add-ons. To install one, you create a chart and identify the values that will be used for configuration.
Settings can be generated during installation, obtained from the cluster, or updated by watching cluster resources. These operations can be implemented with hooks.
Settings can be stored in the cluster. For this purpose, a ConfigMap/addon-operator is created, and addon-operator watches this ConfigMap for changes. Addon-operator gives hooks access to the settings through simple conventions.
An add-on depends on its settings. If the settings change, addon-operator rolls out the Helm chart with the new values. We call the combination of a Helm chart, its values, and hooks a module (more on this below).
Staging. There are no magic release scripts. The update mechanism is the same as for an ordinary application: build the add-ons and addon-operator into an image, run it, and roll it out.
Control of the result. Addon-operator can expose metrics for Prometheus.
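The "settings changed, so roll out the chart again" principle can be sketched as follows. This is not addon-operator's actual code: RolloutTracker and both helper functions are hypothetical names, and comparing checksums of the values is just one simple way to detect a change. The helm command line uses the standard helm upgrade --install form.

```python
import hashlib
import json


def values_checksum(values):
    """Return a stable checksum of a values dict (key order does not matter)."""
    canonical = json.dumps(values, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def helm_upgrade_command(release, chart_dir, values_file):
    """Build the Helm command that installs or upgrades a release."""
    return ["helm", "upgrade", "--install", release, chart_dir,
            "--values", values_file]


class RolloutTracker:
    """Remember which values each release was last rolled out with."""

    def __init__(self):
        self._applied = {}  # release name -> checksum of applied values

    def needs_rollout(self, release, values):
        """True if the release was never applied or its values changed."""
        return self._applied.get(release) != values_checksum(values)

    def mark_applied(self, release, values):
        """Record the values after a successful rollout."""
        self._applied[release] = values_checksum(values)
```

A caller would check needs_rollout() after every settings change, run the command from helm_upgrade_command() when it returns True, and then call mark_applied().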

What is an add-on in addon-operator?

An add-on is anything that adds new functionality to the cluster. Installing Ingress, for instance, is a great example of an add-on. It can be any operator or controller with its own CRD: prometheus-operator, cert-manager, kube-controller-manager, etc. Or it can be something small that simplifies operations: for example, secret copier, which copies registry secrets to new namespaces, or sysctl tuner, which configures sysctl parameters on new nodes.

To implement add-ons, the Addon-operator provides several concepts:

Helm chart. It is used to install various software in a cluster, for example Prometheus, Grafana, nginx-ingress. If the component you need has a Helm chart, installing it with addon-operator will be very easy.
Values storage. Helm charts usually have many settings that can change over time. Addon-operator maintains storage for these settings and can track their changes in order to reinstall the Helm chart with new values.
Hooks. These are executable files that addon-operator runs on events and that have access to the values storage. A hook can watch for changes in the cluster and update values in the values storage. That is, with hooks you can do discovery to collect values from the cluster at startup or on a schedule, or you can do continuous discovery, collecting values from the cluster as it changes.
Module. This is the combination of a Helm chart, a values storage, and hooks. Modules can be enabled and disabled. Disabling a module removes all of its Helm chart releases. Modules can enable themselves dynamically, for example when all the modules they depend on are enabled, or when discovery in the hooks has found the necessary parameters; this is done with an auxiliary enabled script.
Global hooks. These are standalone hooks: they are not part of any module and have access to the global values storage, whose values are available to all hooks in the modules.
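As a rough mental model, the concepts above can be written down as a small data structure. This is a sketch under stated assumptions, not addon-operator's internal representation: the Module class, the requires field, and the layout of the merged values (a "global" key next to a key named after the module) are all chosen here for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    """A Helm chart plus its own values and the modules it depends on."""
    name: str
    chart_dir: str
    values: dict = field(default_factory=dict)
    requires: tuple = ()  # names of modules that must be enabled first


def merged_values(global_values, module):
    """Combine global and module values into what the chart would receive.

    The {"global": ..., <module name>: ...} layout is an assumption.
    """
    return {"global": dict(global_values), module.name: dict(module.values)}


def enabled_modules(modules, switched_on):
    """Return the names of modules that end up enabled.

    A module is enabled when it is switched on and every module it
    requires is enabled too. Modules are assumed to be listed in
    dependency order.
    """
    enabled = []
    for module in modules:
        if module.name in switched_on and all(
                dep in enabled for dep in module.requires):
            enabled.append(module.name)
    return enabled
```

In this model, the enabled script of a real module plays the role of the check inside enabled_modules(): it is where a module decides for itself whether its prerequisites are met.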

How do these parts work together? Consider a picture from the documentation:

There are two scenarios:

A global hook is triggered by an event, for example when a resource changes in the cluster. The hook processes the change and writes new values to the global values storage. Addon-operator notices that the global storage has changed and runs all modules. Each module uses its hooks to determine whether it should be enabled and updates its own values storage. If the module is enabled, addon-operator starts the installation of its Helm chart. The Helm chart has access to values from both the module storage and the global storage.
The second scenario is simpler: a module hook is triggered by an event and changes values in the module's values storage. Addon-operator notices this and runs the Helm chart with the updated values.
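The difference between the two scenarios is easiest to see in a toy simulation. The Operator class below is hypothetical and only records which charts would be rolled out and with which values; the per-module "enabled" check is modelled as a plain function of the global values, which is a simplification.

```python
class Operator:
    """Toy model of the two event flows; nothing is actually installed."""

    def __init__(self, enabled_checks):
        # Module name -> function(global_values) -> bool, a stand-in
        # for the module's own "should I be enabled?" logic.
        self.enabled_checks = enabled_checks
        self.global_values = {}
        self.module_values = {name: {} for name in enabled_checks}
        self.rollouts = []  # log of (module name, values given to the chart)

    def _rollout(self, name):
        values = {"global": dict(self.global_values),
                  name: dict(self.module_values[name])}
        self.rollouts.append((name, values))

    def on_global_hook(self, patch):
        """Scenario 1: global values changed, so every module is re-run."""
        self.global_values.update(patch)
        for name, is_enabled in self.enabled_checks.items():
            if is_enabled(self.global_values):
                self._rollout(name)

    def on_module_hook(self, name, patch):
        """Scenario 2: only the module whose values changed is re-run."""
        self.module_values[name].update(patch)
        self._rollout(name)
```

For example, with modules "ingress" (always enabled) and "sysctl-tuner" (enabled once nodes are discovered), a call to on_global_hook({"nodes": 3}) logs a rollout for both modules, while a later on_module_hook("ingress", {"replicas": 2}) logs a rollout for ingress only.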

An add-on can be implemented as a single hook, as a single Helm chart, or even as several dependent modules; it depends on the complexity of the component being installed and on the desired flexibility of configuration. For example, the repository (/examples) contains the sysctl-tuner add-on, implemented both as a simple module with a hook and a Helm chart, and in a variant that uses the values storage, which makes it possible to add settings by editing a ConfigMap.

Delivery of updates

A few words about the organization of updates to the components that Addon-operator installs.

To run addon-operator in a cluster, you need to build an image containing the add-ons as hooks and Helm charts, add the addon-operator binary and everything the hooks need: bash, kubectl, jq, python, etc. This image can then be rolled out to the cluster like a normal application, and you will most likely want to set up some tagging scheme. If there are only a few clusters, the same approach as for applications may work: new release, new version, go through all the clusters and update the image in the Pods. However, when rolling out to a significant number of clusters, the concept of self-updating from a channel suited us better.

Here is how it works for us:

A channel is essentially an identifier that can be set to anything (for example, dev / stage / ea / stable).
The channel name is the image tag. When updates need to be rolled out to a channel, a new image is built and tagged with the channel name.
When a new image appears in the registry, addon-operator restarts and runs with the new image.
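The channel scheme boils down to comparing the digest of the running image with the digest that the channel tag currently points to. Below is a minimal sketch of that check. ChannelWatcher is a hypothetical name, and fetch_digest stands in for whatever mechanism queries the registry; how addon-operator actually detects a new image is not described here.

```python
def image_reference(repository, channel):
    """The channel name is used as the image tag."""
    return f"{repository}:{channel}"


class ChannelWatcher:
    """Detect that the image behind a channel tag has changed."""

    def __init__(self, repository, channel, running_digest, fetch_digest):
        self.image = image_reference(repository, channel)
        self.running_digest = running_digest
        # fetch_digest(image) -> digest string; a stand-in for a
        # registry query, injected so the logic can be tested offline.
        self._fetch_digest = fetch_digest

    def update_available(self):
        """True if the tag now points to a different image than the one running."""
        return self._fetch_digest(self.image) != self.running_digest
```

When update_available() returns True, the operator would restart so that the new image is pulled. Switching a cluster to another channel amounts to creating the watcher with a different channel name.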

This is not best practice, as the Kubernetes documentation points out. It is not recommended, but that advice concerns an ordinary application living in a single cluster. In the case of addon-operator, the application is a multitude of Deployments scattered across clusters, and self-updating helps a great deal and makes life simpler.

Channels also help with testing: if you have an auxiliary cluster, you can set it to the stage channel and roll updates out to it before rolling out to the ea and stable channels. If a cluster on the ea channel hits an error, you can switch it to stable while the problem is being investigated. If a cluster is taken out of active support, it is switched to its own “frozen” channel, for example freeze-2019-03-20.

Besides updates to hooks and Helm charts, you may need to update a third-party component. Say you noticed a bug in node-exporter and even figured out how to patch it. You then open a PR and wait for a new release so that you can go through all the clusters and bump the image version. To avoid waiting indefinitely, you can build your own node-exporter and switch to it until the PR is accepted.

In general, this can be done without addon-operator, but with addon-operator the module that installs node-exporter lives in a single repository, the Dockerfile can be kept right there, and it is easier for everyone involved to understand what is happening... And if there are several clusters, it becomes easier both to test your PR and to roll out a new version!

This way of organizing component updates works well for us, but you can implement any other suitable scheme; after all, addon-operator is just a simple binary.

Conclusion

The principles implemented in Addon-operator allow you to build a transparent process for creating, testing, installing and updating add-ons in a cluster, similar to the processes for developing ordinary applications.

Add-ons for addon-operator in the form of modules (Helm chart + hooks) can be made publicly available. We at Flant plan to publish our own work as such add-ons over the summer. Join the development on GitHub (shell-operator, addon-operator), try building your own add-on based on the examples and documentation, and watch for news on Habr and on our YouTube channel!

UPDATED (June 14): If you have English-speaking colleagues who may be interested in addon-operator, an announcement for them is available on our Medium blog.