How to take control of network infrastructure. Chapter Four Automation. Templates

This article is the sixth in a series of articles on "How to take control of network infrastructure." The contents of all articles in the series and links can be found here .

Leaving several topics behind, I decided to start a new chapter.

I'll be back to safety later. Here I want to discuss one simple but effective approach, which, I am sure, in one form or another, can be useful to many. It is rather a short story about how automation can change the life of an engineer. It will be about the use of Templates. At the end there is a list of my projects where you can see how everything described here works.
')

DevOps for the network

Creating a configuration script, using GIT to control changes in the IT infrastructure, remote “uploading” - these ideas come first of all when you are thinking about the technical implementation of the DevOps approach. The pros are obvious. But there are, unfortunately, and cons.

When more than 5 years ago, our developers came to us, to networkers, with these proposals, we were not thrilled.

I must say that we inherited a rather motley network consisting of equipment from about 10 different vendors. Something was convenient to configure through our favorite cli, but somewhere we preferred to use the GUI. In addition, long work on "live" equipment has been taught to real-time control. I, for example, making changes, I feel much more comfortable working directly through cli. So I can quickly see that something went wrong and roll back the changes. All this was in some contradiction with their ideas.

Other issues arise, for example, the interface may change slightly from version to software version. This, in the end, will lead to the fact that your script will create the wrong "config". It would not be desirable to use production for "running in".

Or, how to understand that the configuration commands were applied correctly and what to do in case of an error?

I do not want to say that all these issues are unsolvable. Just saying “A” is probably reasonable to say “B” and, if you want to use the same processes to control changes as in development, then you need to have in addition to production also the dev and staging environments. Then this approach looks complete. But how much will it cost?

But there is one situation when the minuses are practically leveled, and there are only pluses. I'm talking about design work.

Project

For the last two years I have been participating in a project to build a data center for one large provider. I am responsible in this project for F5 and Palo Alto. From a Cisco point of view, this is the “3rd party equipment”.

For me personally, there are two distinct stages in this project.

First stage

The first year I was infinitely busy, I worked at night and on weekends. I could not lift my head. The pressure from the management and the customer was strong and continuous. In a constant routine, I could not even try to optimize the process. It was not only and not so much the configuration of the equipment, as the preparation of project documentation.

Here the first tests began, and I would be amazed how many minor errors and inaccuracies were made. Of course, everything worked, but the letter in the title was missed, the line in the team was missed ... The tests continued and continued, and I was already in a constant, daily struggle with errors, tests and documentation.

So it went on for a year. The project, as I understand it, was not easy for everyone, but gradually the client became more and more satisfied, and this made it possible to take on additional engineers who were able to take on part of the routine.

Now you could look around a bit.
And this was the beginning of the second stage.

Second stage

I decided to automate the process.

What I understood from the then communication with the developers (and I must pay tribute, we had a strong team), so this is that the text format, although it seems at first glance something from the world of the DOS operating system, but has some valuable properties .
For example, a text format will be useful if you want to take full advantage of the benefits of GIT and all its derivatives. And I wanted to.

Well, it would seem, you can simply store the configuration or the list of commands, but it is rather inconvenient to make changes. In addition, the design is another important task. You should have documentation describing your overall design (Low Level Design) and specific implementation (Network Implementation Plan). And in this case, the use of templates looks like a very suitable option.

Thus, when using YAML and Jinja2, the YAML file with configuration parameters such as IP addresses, BGP AS numbers, ... perfectly plays the role of NIP, while Jinja2 templates include design-specific syntax, that is, in fact, a reflection of LLD.

It took two days to learn YAML and Jinja2. To understand how this works quite a few good examples. Then it took about two weeks to create all the templates corresponding to our design: a week for Palo Alto and another week for F5. All this was laid out on corporate githab.

Now the change process was as follows:

changed the yaml file
created using the template (Jinja2) configuration file
saved to remote repository
flooded the created configuration on the equipment
saw an error
changed YAML file or Jinja2 template
created using the template (Jinja2) configuration file
...

It is clear that at first a lot of time was spent on edits, but after a week or two it was already rather a rarity.

A good test and opportunity to debug everything was the desire of the client to change the naming convention. Who worked with the F5 understands the piquancy of the situation. But for me, everything was pretty simple. I changed the names in the YAML file, deleted the entire configuration from the hardware, generated a new one and uploaded it. For everything, taking into account the bug fixes, it took 4 days: two days for each technology. After that, I was ready for the next stage, namely the creation of DEV and Staging data centers.

Dev and Staging

Staging virtually repeats production. Dev is a heavily trimmed copy built mostly on virtual hardware. The ideal situation for a new approach. If, from the general process, to isolate the time I have spent, then I think the work took no more than 2 weeks. The main time is the waiting time of the other party, and a joint search for problems. The implementation of the 3rd party was almost invisible to others. There was even a time to teach something and write a couple of articles on Habré :)

Summarize

So what do I have in the bottom line?

All I need to do to change the configuration is to change a simple, clearly structured YAML file with configuration parameters. I never change the python script and very rarely (only if there is an error) change Jinja2 warmer
from the point of view of documentation, an almost ideal situation is obtained. You change the documentation (YAML files play the role of NIP) and upload this configuration to the hardware. So your documentation is always up to date.

All this led to the fact that

error rate dropped to almost 0
it took 90 percent of the routine
At times increased the rate of implementation

PAY, F5Y, ACY

I said that a few examples are enough to understand how this works.
Here is a brief (and of course modified) version of what was created in the course of my work.

PAY = deployment P alo A lto from Y aml = Palo Alto from Yaml
F5Y = deployment F5 from Y aml = F5 from Y aml (coming soon)
ACY = deployment AC i from Y aml = F5 from Y aml

I will add a few words about ACY (not to be confused with ACI).

Those who worked with ACI know that this miracle (and in a good sense too) was not created by networkers exactly :). Forget everything you know about the network - it will not come in handy!
It is a little exaggerated, but it conveys approximately the feeling that I have been experiencing constantly, for 3 years now, while working with ACI.

And in this case, ACY is not only an opportunity to build a change control process (which is especially important in the case of ACI, because it is assumed to be the central and most critical part of your data center), but also gives you a friendly interface for creating configurations.

The engineers in this project to configure ACI instead of YAML for exactly the same purpose use Excel. In using Excel, of course, there are pluses:

your NIP in one file
beautiful signs that look nice to the client
you can use some excel tools

But there is one drawback, and in my opinion it outweighs the advantages. To control changes and coordinate the work of the team becomes much more difficult.

ACY is actually using the same approaches that I used for the 3rd party to configure ACI.

Source: https://habr.com/ru/post/453920/

All Articles