Code review is an engineering practice from agile development methodologies. It is the analysis (inspection) of code in order to identify errors, shortcomings, and inconsistencies in coding style, and to check whether the code actually solves the task.
Today I will talk about how we organized the review process for monitoring configuration in Zabbix. The article will be useful to anyone who works with the Zabbix monitoring system, whether in a large team or alone, even if you have "ten hosts, what is there to review".
We use Zabbix to monitor our internal services and build infrastructure. We have a naming convention (we use a role model, with Role templates for assigning roles and Profile templates for the monitoring itself), but there is no dedicated monitoring team. There are senior engineers who know monitoring inside out, plus engineers and junior engineers; we have ~500 hosts and ~150 templates (a small but very dynamic infrastructure).
This infrastructure supports and automates the development processes in the company. Besides maintaining it, we also develop automation and integration tools, so we have some experience and understanding of development processes from the inside.
As the number of employees and the number of changes to the monitoring system grew, we ran into more and more common errors that were difficult to track:
In the programming world, all these problems are solved quite simply: linters and code review. So why not apply these best practices to reviewing Zabbix configuration? So we did!
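The article does not include a linter, but as an illustration of what "a linter for Zabbix configuration" could mean, here is a minimal sketch of a template-name check. It assumes a convention of the form Kind.Area.Name with Kind being Role or Profile, inferred from names such as Profile.DevOps.Test that appear in this article; the exact rules, the regular expression, and the function name lint_template_names are hypothetical.

```python
import re

# Hypothetical convention inferred from names like "Profile.DevOps.Test":
# <Kind>.<Area>.<Name>, where Kind is "Role" or "Profile" and the other
# parts are alphanumeric. Adjust to your own naming rules.
TEMPLATE_NAME_RE = re.compile(r"^(Role|Profile)\.[A-Za-z0-9]+\.[A-Za-z0-9]+$")


def lint_template_names(names, allowed=()):
    """Return template names that break the assumed naming convention.

    `allowed` lets you exempt templates you do not own, for example the
    stock templates shipped with Zabbix.
    """
    return [
        name
        for name in names
        if name not in allowed and not TEMPLATE_NAME_RE.match(name)
    ]


if __name__ == "__main__":
    names = ["Profile.DevOps.Test", "Role.Build.Agent", "my template"]
    for bad in lint_template_names(names):
        print(f"naming convention violation: {bad!r}")
```

Run against the names of all templates in an export, a check like this could fail a CI job before the merge request even reaches a human reviewer.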
We have already written about the advantages and examples of code review: Implementation of code inspections in the development process, Practical example of the implementation of code inspections, Code inspection. Results.
Why you might need to review your Zabbix configuration:
Add your own problems in the comments, and let's try to figure out together how review can help solve them.
Zabbix has an Audit subsystem, which we use to see who made configuration changes. Its major drawback is the large number of saved events, since it records every single user action.
Imagine that every code change stays in the git history: you spent an hour choosing a variable name, tried 40 options, and all of them are now saved, each as a separate commit. Then you review the history of these commits instead of comparing the initial and final versions. Awful, right?
Zabbix Audit works exactly like this. It lets you track changes, but it does not let you quickly see the difference (diff) between two states of the system (say, at the beginning and at the end of the week). In addition, its entries are split by action type: additions, changes, and deletions have to be viewed in different windows. You can see an example on the Audit tab of your own Zabbix (or look at the screenshot). It is hard to tell what the initial state was, what the current one is, and what changed during the week. It gets even harder when there are dozens of changes a week.
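What is missing from Audit is a single diff between two snapshots. Once the configuration is exported to text files, Python's standard difflib module can produce exactly that. The sketch below compares a start-of-week and an end-of-week snapshot of one template; the YAML snippets are simplified and hypothetical, not the real export format.

```python
import difflib

# Hypothetical, simplified snapshots of one template's trigger.
BEFORE = """\
triggers:
- name: High CPU load
  expression: '{Profile.DevOps.Test:system.cpu.load[,avg5].last()}>5'
  priority: WARNING
"""

AFTER = """\
triggers:
- name: High CPU load
  expression: '{Profile.DevOps.Test:system.cpu.load[,avg5].last()}>10'
  priority: HIGH
"""


def config_diff(before: str, after: str, name: str) -> str:
    """Return a unified diff between two snapshots of the same object."""
    return "".join(
        difflib.unified_diff(
            before.splitlines(keepends=True),
            after.splitlines(keepends=True),
            fromfile=f"a/{name}",
            tofile=f"b/{name}",
        )
    )


if __name__ == "__main__":
    # One hunk showing the old and new expression and priority side by side,
    # regardless of how many intermediate edits were made during the week.
    print(config_diff(BEFORE, AFTER, "templates/Profile.DevOps.Test.yaml"), end="")
```

However many times the trigger was edited in between, the diff only shows the net change, which is what a reviewer needs.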
We wanted a mechanism that would allow us to:
Now let's talk about how we implemented this mechanism and how it can be useful for your Zabbix infrastructure.
To store the Zabbix configuration, we use the following formats:
We set up three git repositories (we use GitLab for storage, but any VCS will do):
In these repositories we store the configuration data according to the following rules:
Now we can clearly see what type of object has changed and which object it is; in the example below, the template Profile.ScmDev.FlusContinuousTest has changed.
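The exact storage rules are not reproduced here, but the idea that the file path alone tells you the object type and name can be sketched as a mapping from (object type, object name) to a path. The folder names below and the choice of one YAML file per object are assumptions for illustration; only the templates folder is mentioned in the article.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from Zabbix object type to repository folder.
FOLDERS = {
    "template": "templates",
    "host": "hosts",
    "action": "actions",
    "mediatype": "mediatypes",
}


def object_path(obj_type: str, name: str) -> PurePosixPath:
    """Return the repository path for one exported object (one file per object)."""
    try:
        folder = FOLDERS[obj_type]
    except KeyError:
        raise ValueError(f"unknown object type: {obj_type!r}") from None
    # Replace characters that are awkward in file names; dots are kept so
    # names like "Profile.DevOps.Test" stay recognisable in a diff.
    safe_name = name.replace("/", "_").replace(" ", "_")
    return PurePosixPath(folder) / f"{safe_name}.yaml"


if __name__ == "__main__":
    print(object_path("template", "Profile.ScmDev.FlusContinuousTest"))
```

With one file per object, a merge request that touches templates/Profile.ScmDev.FlusContinuousTest.yaml immediately tells the reviewer both the object type and the object name.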
To view the changes, we use GitLab's merge request mechanism.
The template Profile.DevOps.Test was changed (its trigger expression). We can tell it is a template because the file is located in the templates folder:
The trigger expression and priority were changed:
One template was linked to another:
An action was changed (a new line was added to the end of the default message text):
An example of a discussion in a merge request (just like programmers do it!): here the reviewer points out that a standard template was linked directly to a host, and that it would be better to create a separate role for it for the future. The screenshot is from an older review, when we still used the XML representation of the configuration.
In general, everything is simple:
Suppose you have completed a task and want to ask a colleague to check whether you have forgotten anything. To request a review, manually launch the gitlab-ci job in the zabbix-review-export repository.
Assign the merge request to a colleague, who reviews it, discusses it, and fixes the monitoring infrastructure code.
Once a week, a new review is launched to catch small changes: a scheduled pipeline exports the configuration and saves it to the git repository (as a new commit), and the monitoring guru reviews the changes.
Now we'll tell you how to set up this Zabbix configuration review system (we love open source and try to share our experience with the community).
There are two ways to use it:
git add * && git commit && git push
This option is suitable for rare changes or when you work with the monitoring system alone. Both options are described in the https://gitlab.com/devopshq/zabbix-review-export repository, which contains everything you need: scripts, gitlab-ci settings, and a README.md explaining how to deploy it in your infrastructure.
To get started, try the first option (also the one to use if you do not have a gitlab-ci infrastructure): work in manual mode, that is, run the zabbix-export.py script to export (back up) the configuration, then run git add * && git commit && git push on your working machine. When you get tired of that, switch to the second option and automate the automation!
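Under the hood, an export script talks to the Zabbix JSON-RPC API. The sketch below shows the kind of configuration.export call such a script can make; it is not taken from zabbix-export.py, and the function names, the example URL, and the template ID are hypothetical. The request shape follows the Zabbix 4.x API, where authentication is passed in the "auth" field.

```python
import json
import urllib.request


def export_request(template_ids, auth_token, fmt="xml", request_id=1):
    """Build a configuration.export JSON-RPC request for the given template IDs."""
    return {
        "jsonrpc": "2.0",
        "method": "configuration.export",
        "params": {
            "options": {"templates": [str(t) for t in template_ids]},
            "format": fmt,
        },
        # "auth" matches the Zabbix 4.x API; newer versions deprecate it in
        # favour of an HTTP Authorization header.
        "auth": auth_token,
        "id": request_id,
    }


def call_api(url, payload):
    """POST a JSON-RPC payload to Zabbix and return its "result" field."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json-rpc"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    if "error" in body:
        raise RuntimeError(body["error"])
    return body["result"]
```

A manual run would then look roughly like xml = call_api("https://zabbix.example.com/api_jsonrpc.php", export_request(["10001"], token)), followed by writing xml to a file in the repository and running the git commands above.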
Currently, changes are anonymous: to find out who made a change, you have to use the Audit system, which causes pain and suffering. But it is not that bad: Audit is rarely needed, and a message in the team chat is usually enough to find the right person.
Another problem: if you change an item or trigger on a host, the change is not included in the exported XML. That is, we could disable all triggers on a particular host or lower their priority, and nobody would know about it or correct us! We are waiting for a fix for this in https://support.zabbix.com/browse/ZBX-15175
We have not yet built an automatic rollback mechanism. Suppose a template or host has been changed significantly, and we realize the changes are wrong and everything needs to be returned to how it was. Right now we find the required XML for the corresponding host and import it manually through the UI, whereas we would like to simply click "Roll back template TemplateName to the state of commit commit-hash".
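Such a button does not exist yet, but the two building blocks are available: git can return a file as it was in any commit, and the Zabbix API has a configuration.import method. The sketch below combines them. It assumes the XML export is stored in the repository under a path such as templates/Profile.DevOps.Test.xml; the path, the function names, and the choice of import rules are hypothetical. The rules follow the Zabbix 4.x API.

```python
import subprocess


def file_at_commit(repo_dir, commit, path):
    """Return the contents of `path` as it was in `commit` (via `git show`)."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "show", f"{commit}:{path}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def import_request(xml_source, auth_token, request_id=1):
    """Build a configuration.import request that restores objects from XML."""
    create_update = {"createMissing": True, "updateExisting": True}
    # deleteMissing also removes items/triggers that exist in Zabbix but not
    # in the old export, so the template really returns to the committed state.
    with_delete = dict(create_update, deleteMissing=True)
    return {
        "jsonrpc": "2.0",
        "method": "configuration.import",
        "params": {
            "format": "xml",
            "rules": {
                "templates": create_update,
                "items": with_delete,
                "triggers": with_delete,
                "templateLinkage": {"createMissing": True},
            },
            "source": xml_source,
        },
        "auth": auth_token,
        "id": request_id,
    }
```

Usage would be: old = file_at_commit("zabbix-xml", "commit-hash", "templates/Profile.DevOps.Test.xml"), then POST import_request(old, token) to the api_jsonrpc.php endpoint. Note that deleteMissing on items also drops their collected history, and templateLinkage here only re-creates links, it does not remove templates linked after that commit.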
You could implement two-way synchronization, where changes made in YAML are applied to the Zabbix configuration, so you would not have to go to the Zabbix web interface at all. We came across a similar project on GitHub, but it quickly faded away and the community did not adopt the idea; apparently, it is not so easy to express in YAML what can be clicked with the mouse in the web interface. So we settled on one-way interaction.
The ideal option would be to build this configuration-as-code saving system, even if only in XML format, into Zabbix itself, as is done in the TeamCity CI server: configuration changes made via the UI produce commits on behalf of the user who made them. This gives a very handy tool for viewing changes and also solves the problem of anonymous changes.
Start exporting your Zabbix configuration, commit it to a repository (a local one is enough), wait a week, and do it again. Now the changes are under your control! https://gitlab.com/devopshq/zabbix-review-export
If you would like to see this functionality built into Zabbix, please vote for the issue https://support.zabbix.com/browse/ZBXNEXT-4862
100% uptime to everyone!
Source: https://habr.com/ru/post/433126/