Forced Introduction to Configuration Management Systems

Abstract: how to make yourself learn one of the existing configuration management systems and stop editing files on the server by hand.

This post is about the heartache you have to overcome. There will be a little technical content, but for the most part it is about the struggle with yourself, deferred reward, and how much muscle memory controls you.

Introduction for hermits who have not heard what configuration management systems are

For many years (three generations by IT standards), there have been programs that automate the process of configuring servers. All of these programs are complex; they invade the administrator's holy of holies and force them to do "everything differently than before." Studying and internalizing them (admitting that "yes, this is the right way to do it") is an absolute must in the career of any system administrator.

The main pain of any configuration management system

The main pain is that a configuration management system breaks your fingers' familiar automatism. Previously, you could bring up a web server in 2 minutes almost without looking at the screen. Now you are asked to spend 15-20 minutes on exactly the same actions (if you know the configuration management system well) or even several days (!!!!!) if you are still learning it.

This is a crime against personal efficiency. Cut it by a factor of ten (0xA), and they call this progress?

Regardless of all other arguments, this thought will haunt you all the time, even years later. You not only press more buttons to do the same thing, you also do it more slowly, and you have to wait for the computer (you never used to wait tens of seconds for an editor to change a file and restart the web server). Worse, when all you want is to write a primitive construct into a config, you will have to fight with stray whitespace in the DSL (a special gibberish programming language that you need to learn), think about all sorts of incredibly complicated extraneous junk, and make special concessions to the robots. Debugging will be worse and more disgusting: instead of a normal error message about a typo in the config, you will get an incoherent, oversharing wall of output two screens long, which takes longer to read than to "go and do it by hand".

Even worse: often this output will have nothing to do with what you did, but with completely unrelated changes elsewhere in the project. And you will be forced to deal with it. Sometimes the cause will even be bugs in the configuration management system itself, and you will be fixing or working around those bugs instead of doing your actual job.

Well, have I "sold" you on this technology? Ready to fight off, with all your might, any attempt to introduce it at your workplace?

Before we go on about configuration management systems, I want to show you a very illustrative example from psychology: the marshmallow experiment. For those too lazy to read Wikipedia, a retelling: children are given a marshmallow and told that if they do not eat it within 15 minutes, they will get a second one (and may eat both). The older the child, the more likely they are to hold out for the 15 minutes. Little kids cannot restrain themselves and eat the marshmallow right away, even though they feel bad about it afterwards. The experiment tests the mechanism of "deferred gratification."

This is exactly what configuration management systems offer. You spend half an hour on an operation you could do by hand in 3 minutes, and afterwards that operation can be performed again and again (whenever needed) without spending those three minutes. And, most importantly, without involving your head. Most often this procedure is delegated to a CI server (Jenkins, Buildbot, etc.), and that is the first step through the magic door called CI/CD.

Staging

And this little step is called "staging": a copy of your production that does nothing useful. On it you can find the answer to the question "what will happen if I change the server version?" and run other fun experiments without breaking production.

Of course, staging can be done without configuration management systems. But who will make sure staging looks like production? Moreover, who will make sure that after your ridiculous experiment, staging is still similar to production? If you make a ridiculous experiment on the result of a previous ridiculous experiment, then perhaps the result will be different from what you will get later on production.

The question "who will keep an eye on it?" is actually a trick question. Unless you have a giant bloated staff where you can afford a couple of people to watch over staging, the answer is: no one. No one is watching. What to do?

The answer: raze it to the ground and re-create it from scratch. If this process takes more than a minute of a person's time, then of course no one will do it. It is too dreary to do the same thing over and over just to undo a "ridiculous experiment."
The usual thought is: "Why re-create it? I'll just revert everything by hand, it's faster." The output is a ridiculous experiment plus the result of its manual fix, instead of a "copy of production".

But if there is a robot that "goes and does everything itself," then sure, no problem. It went and did it.

Staging propagation

If re-creating staging is so simple, why not bring up one more instance? It may be simpler, without some of the important, complex, heavy components, but with the one piece you are working on right now. Why not?

It can also live on localhost, in a virtual machine or a container. That gives you almost zero latency while working (nice) and support for offline mode (useful). This will take a bit of work with the configuration management system, but far less than it might seem. And then, click-click, and you have a fresh copy of a piece of production (or even something specific to your feature branch from git).

Refactoring

After you have written the process of turning several servers into production and you can repeat it again and again (with little effort), you are free to start changing pieces of this process (to a more convenient / correct / trendy process) and see what you get.

This requires the second part of the configuration management system: server validation. We will talk about it a little later; for now, focus on a simple idea: you can try changing something in the server configuration process and see what happens. For free. (Personally, when I am not sure, I sometimes run 2-3 variants in parallel on different stagings to choose the best one.)

Code review

Refactoring, together with keeping the instructions for the configuration management system in git, makes code review possible (if there is more than one person on the team). Code review is awesome! First, it disciplines you not to leave crooked hacks behind. Second, you learn from each other how to do things better. Third, it increases shared knowledge about the project and the changes happening in it. By developing the CI/CD pipeline, with some effort you can even see the results of running a proposed change on a temporary installation, and the robot can reject a pull request simply because it "breaks everything", with no human involved.

Tests

If we have a set of instructions for a configuration management system, then we can check the result. There are many interesting tools for this: testinfra, goss, inspec, serverspec, etc. In essence, these tests let you verify the result of applying your configuration. For example: "something is listening on port 80", "180 checks have appeared in monitoring", "user X can log in to the server", and so on. Once you have such tests, you can play with the process as much as you need: if the tests pass, you did everything correctly. Thanks to this, you can try new (bold) things and not be afraid of the unexpected (for example, "oh, I did not think that enabling SSL would break all of monitoring").
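
To make the idea of "checking the result" concrete, here is a minimal sketch in plain Python, standard library only. It is not how testinfra, goss, inspec or serverspec are implemented; those tools ship such checks ready-made. The function names port_is_listening and run_checks are made up for this illustration.

```python
import socket


def port_is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises an OSError subclass on refusal or timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def run_checks(checks: dict) -> list:
    """Run named zero-argument checks and return the names that failed."""
    return [name for name, check in checks.items() if not check()]


if __name__ == "__main__":
    # Hypothetical check list for a freshly configured web server.
    failed = run_checks({
        "something listens on port 80": lambda: port_is_listening("127.0.0.1", 80),
    })
    print("failed checks:", failed)
```

An empty list of failed checks is the signal that the experiment on staging did not break anything the checks cover; anything the checks do not cover remains unverified.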

Job security

Configuration management systems directly threaten the jobs of low-skilled system administrators. If we used to need 20 administrators, each able to manage 20 servers, now a team of three (slightly more qualified) administrators can comfortably handle 400 servers. In fact, one could handle them alone, but 2-3 give greater team competence, less of a bus-factor problem (knowledge concentrated in a single person), and a better atmosphere in the team thanks to mutual responsibility for the quality of work. And with job security, everything is simple: either you are on the list of those three administrators, or you are not.

In fact, I am being a bit sly here, and reality usually looks like this: instead of 60 servers, three administrators end up with 400 (1000? 2000?) servers on their hands, and the option "hire 17 more administrators" is simply not considered for budget reasons. These are features of a growing market and a shortage of qualified staff, but the general argument still stands: configuration management systems increase labor efficiency, and people with higher labor efficiency are more in demand on the market.

Another program that needs attention

For all the positivity above, any configuration management system is just a program. This means there will be bugs, including nasty ones that force you to do awkward and ugly things just to work around them. This means that obvious architectural features will not be implemented simply because the programmers have a different vision of where the project is going. This means that in place of part of the documentation there will be "Documentation" (which lives in the src directory). Like any other program, it will demand attention to its inner world, your time, and your competence (and more time for studying).

And once again about the paradigm shift

All of this (including the bugs and the deep inner world) will require adapting your thinking to the model dictated by the configuration management system. Configuration management is an extremely invasive entity that wants to be at the center of everything, and many workflows will have to be tailored to the requirements of this program. There will be many borderline moments when things get worse (before, it could be done better and more simply), and many times when you feel you are performing an absurd ritual instead of doing normal work.

But at the same time there will be another feeling: that there is less button-pressing and more thought. Instead of "go and change it" there will be a moment of reflection: "is this the right way to change it?", "why should it be changed at all?". There will be more discussion of architecture (why do we change one value on server A and a different one on server B? Can we find something in common between these changes and describe it as a new entity?).
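
One way to picture "describing the common part as a new entity" is a shared set of defaults plus small per-server overrides. The sketch below is plain Python, not the syntax of any particular configuration management system, and every name and value in it (WEB_DEFAULTS, HOST_OVERRIDES, server-a, the settings themselves) is hypothetical.

```python
def effective_config(*layers: dict) -> dict:
    """Merge configuration layers; later layers override earlier ones."""
    merged: dict = {}
    for layer in layers:
        merged.update(layer)
    return merged


# Hypothetical values, for illustration only.
# The "new entity": what all web servers have in common.
WEB_DEFAULTS = {"worker_processes": 4, "keepalive_timeout": 65}

# What is genuinely different on each server.
HOST_OVERRIDES = {
    "server-a": {"worker_processes": 8},
    "server-b": {"keepalive_timeout": 15},
}


def config_for(host: str) -> dict:
    """Return the defaults with the overrides for one host applied on top."""
    return effective_config(WEB_DEFAULTS, HOST_OVERRIDES.get(host, {}))
```

With this shape, the question "why is server A different from server B?" has a visible answer: it is whatever is listed under that server in HOST_OVERRIDES, and nothing else.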

Ontology, or 2nd hard problem

Ontological problems will rise to their full height. Thinking about "what different changes have in common" will constantly produce new things, and new things need names. It is well known that naming things is the second hard problem in IT (the first is cache invalidation), and it is hard because the chosen name defines the properties and expectations of the object. And you will have to agonize over names every time. Every mistake is a crooked piece in the architecture. If changes used to happen "because something has to be changed", now they will happen "because that is what implementing the properties of a sepulka requires."

I tried to avoid anecdotal real-life examples, but I will give one anyway. In one project it took me three weeks to come up with the name "system configuration" (for the part of the changes that affects server settings and requires Ansible, as opposed to "software configuration" for the part that does not require Ansible's involvement). The idea turned out to be sound and helped untangle an impossibly intertwined bundle of dependencies: "this must be changed on server A, but only after the user changes interface B, but if the user changes B, then we must not touch A, and if the user changes E, then B changes too, so we somehow need to change both the server configuration and the part that is not configured by Ansible." Phew... A long-forgotten horror, which I had to feel through, think through, and then find a name for a separate entity within it.

Moving from the frontier to the back office

As the buttons for changing config on the server get pressed less and less, and you think about abstractions more and more, life gradually creeps from ssh user@server to git commit.
This means that along with the configuration management system comes a lot from the life of programmers. Code structure (code!!), tests, expressiveness and clarity, code review, the readiness and resilience of the code (code!!) to unexpected changes in the spec; the spec itself starts to appear in one form or another. Issues appear that are not "fix it now" but "figure out how to eliminate it"; technical debt appears (things that now need to be rewritten); a backlog appears; known bugs appear. Example: "...yes, yes, we know that our configuration currently sets the disk caching options incorrectly, but to change that we must first rewrite the piece responsible for classifying block devices." And this despite the fact that a normal admin would have run hdparm -W 0 /dev/sda on all 400 servers long ago. Perhaps around the 300th server he would realize that sda is sometimes a virtual CD-ROM, and that on each server you have to check which device the attribute is being set on... but that is another story.
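
A rough sketch of what "the piece responsible for classifying block devices" might look like, in Python. It assumes a Linux host where SCSI-style devices expose their peripheral device type in /sys/block/<dev>/device/type ("0" for a disk, "5" for a CD-ROM). Devices without that file (NVMe, for example) are simply skipped here, which a real classifier would have to handle. The function names are invented for this example, and the code only builds the hdparm command lines from the text; it does not run them.

```python
from pathlib import Path

# SCSI peripheral device type for direct-access devices (ordinary disks).
SCSI_TYPE_DISK = "0"


def is_plain_disk(dev: str, sysfs_root: str = "/sys/block") -> bool:
    """Return True if sysfs reports `dev` (e.g. "sda") as an ordinary disk."""
    type_file = Path(sysfs_root) / dev / "device" / "type"
    try:
        return type_file.read_text().strip() == SCSI_TYPE_DISK
    except OSError:
        # No type file (or unreadable): be conservative and skip the device.
        return False


def hdparm_commands(devices, sysfs_root: str = "/sys/block") -> list:
    """Build `hdparm -W 0` command lines only for devices that are plain disks."""
    return [
        ["hdparm", "-W", "0", f"/dev/{dev}"]
        for dev in devices
        if is_plain_disk(dev, sysfs_root)
    ]
```

The sysfs_root parameter exists only so the classification can be tried against a fake directory tree on staging or in a test before it is pointed at 400 real servers.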

The admin's life looks less and less like an admin's life. The profession splits into those who write the configuration and those who operate it (i.e., stay on the frontier, one on one with the server). The latter are sometimes called SREs, and depending on how critical the project is at a given moment, this can be either a very cool and important profession or "upgraded support".

Where to begin?

If I have motivated you at least a little but you do not know where to start, begin by finding a configuration management system that you like. Candidates include Ansible, CFEngine, Chef, Juju, Puppet, SaltStack, Quattor, etc. Most likely, the list is incomplete. None of these solutions is good (in my opinion), but they are the best we have. The choice should be based partly on the programming languages you know (IMHO), on how the syntax feels, and on the viability of the project.

After choosing the software, do not rush in headlong. Reading smart books about the system is necessary, and you can play with it at any level of complexity, but roll it out very gradually, without losing control of the situation. This means starting with simple automation of lengthy actions (rewrite your favorite deployment script in the language of the configuration management system) and trying to describe at least some of what you currently do by hand.

Postscript

One of the most interesting life hacks with configuration management systems, in my view, is using them in the lab. When you are researching new software, configuring it through the configuration management system lets you "try again from scratch" (when it is unclear whether the software leaves side effects on the server).

At the same time, the description for the robot of "what to do" works great as a preliminary draft of the implementation. At one point I was debugging corosync, which refused to work on a network that limited unknown unicast / multicast. During debugging I had to rebuild the cluster several dozen times. Because Ansible did it, "several dozen times" was no feat, just something on the level of "restart the lab".

The daily use of configuration management systems requires rewiring your most basic instincts for working with servers, but in exchange for a very unpleasant learning curve it offers not only a tool that increases work efficiency, but also a change in behavior and thinking patterns toward more effective ones. This is a very difficult step, but one that has to be taken.

Source: https://habr.com/ru/post/343644/
