In October of this year, I gave a talk at the HashiConf 2018 conference about 5 key lessons that my colleagues at Gruntwork and I learned while creating and maintaining a library of more than 300,000 lines of infrastructure code used in production systems by hundreds of companies. In this post, I share the video and slides from the talk, along with a condensed written version of the 5 main lessons.
Although the industry is full of fashionable buzzwords (Kubernetes, microservices, service meshes, immutable infrastructure, big data, data lakes, and so on), the reality is that when you are knee-deep in building infrastructure, you do not feel nearly so fashionable or cutting-edge.
Personally, DevOps reminds me more of this:
Building production-grade infrastructure is hard. It is stressful. And it takes a lot of time. A lot of time.
Here is how much time you should expect your next infrastructure project to take. The numbers are based on empirical data that we collected while working with hundreds of different companies:
DevOps projects always take longer than you expect. Always. Why is that?
The first reason is yak shaving. Below is a vivid illustration of this phenomenon (an excerpt from the series “Malcolm in the Middle”):
The second reason is that building production-grade infrastructure (that is, infrastructure you would bet your company on) involves thousands of small details. The overwhelming majority of developers are not aware of these details, so when estimating a project, you usually forget about many critical (and time-consuming) tasks.
To avoid this, use this checklist each time you start working on a new piece of infrastructure:
Not every item on the list will be needed for every piece of infrastructure, but you must consciously and explicitly document which items you implemented, which ones you decided to skip, and why.
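One way to keep that record honest is to store the decisions as data and flag any skipped item that has no stated reason. The sketch below is a minimal illustration in Python. The item names ("backups", "monitoring", "tests"), the reasons, and the ChecklistDecision type are hypothetical placeholders, not the actual checklist.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChecklistDecision:
    """One checklist item and what was decided about it (hypothetical type)."""
    item: str
    implemented: bool
    reason: str = ""  # only needed when the item is skipped


def undocumented_skips(decisions):
    """Return the names of skipped items that carry no explanation."""
    return [
        d.item
        for d in decisions
        if not d.implemented and not d.reason.strip()
    ]


# Hypothetical decisions for one piece of infrastructure.
decisions = [
    ChecklistDecision("backups", implemented=True),
    ChecklistDecision("monitoring", implemented=False,
                      reason="covered by the shared platform team"),
    ChecklistDecision("tests", implemented=False),  # skipped, no reason given
]

missing = undocumented_skips(decisions)
print(missing)
```

A check like this could run in code review or CI, so that a skipped item without a reason is caught before the project is declared done.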
Here are the main tools that we at Gruntwork use to create and manage infrastructure (as of 2018):
All of these tools are useful, but that is not the lesson. The lesson is that tools alone are not enough. You also have to change the behavior of the team.
In particular, even the best tools in the world will not help your team if the team does not want to use them or does not have enough time to learn them. The key takeaway is that adopting “infrastructure as code” is an investment: you will have to pay certain costs up front. If you invest wisely, you will earn big dividends in the long run.
Newcomers to “infrastructure as code” often define their entire infrastructure for all their environments (development, staging, production, etc.) in a single file or a single set of files that are deployed as a whole. This is a bad idea.
Here are just some of the disadvantages of this approach:
... the terraform plan command took 5-6 minutes to complete!
In short, you should build your code from small, standalone, reusable, composable modules. This is not news. You have heard it a thousand times, albeit in slightly different contexts:
“Do one thing and do it well” - Unix philosophy.
“The first rule of functions is that they should be small. The second rule of functions is that they should be smaller than that.” - “Clean Code”
If your infrastructure code has no automated tests, it is broken. You just do not know it yet. But testing infrastructure code is hard. You have no “localhost” (for example, you cannot deploy an AWS VPC on your laptop) and no real “unit tests” (for example, you cannot isolate Terraform code from the “outside” world, because communicating with the outside world is its whole purpose).
Therefore, to test infrastructure code properly, you usually have to deploy it into a real environment, run real infrastructure, verify that it works, and then tear it down. For an example of this testing style, see Terratest, an open source library that includes tools for testing Terraform, Packer, and Docker code, working with the AWS, GCP, and Kubernetes APIs, executing shell commands locally and on remote servers over SSH, and many other features. So for infrastructure testing, we have to redefine the terms slightly:
Unit tests deploy and test one or more closely related infrastructure modules of the same type (for example, modules for a single database).
Integration tests deploy and test several different types of infrastructure modules to verify that they work together correctly (for example, database modules and web service modules).
End-to-end tests (e2e) deploy and test the entire architecture.
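The deploy, validate, tear down cycle behind all three kinds of test can be sketched in a few lines. This is a minimal illustration in Python, not Terratest's API: deploy, validate, and destroy are hypothetical stand-ins that only record what they were asked to do, where a real test would run Terraform and probe live infrastructure.

```python
calls = []  # records the order of operations so the flow is visible


def deploy(module):
    """Hypothetical stand-in for deploying a module into a real environment."""
    calls.append(f"deploy {module}")
    return {"module": module, "healthy": True}


def validate(resource):
    """Hypothetical stand-in for checking that the deployed resource works."""
    calls.append(f"validate {resource['module']}")
    if not resource["healthy"]:
        raise AssertionError(f"{resource['module']} is not healthy")


def destroy(module):
    """Hypothetical stand-in for tearing the module down again."""
    calls.append(f"destroy {module}")


def run_infrastructure_test(modules):
    """Deploy every module, validate each one, and always clean up."""
    deployed = []
    try:
        for module in modules:
            resource = deploy(module)
            deployed.append(resource)
        for resource in deployed:
            validate(resource)
    finally:
        # Tear down in reverse order, even if deployment or validation failed.
        for resource in reversed(deployed):
            destroy(resource["module"])


# A "unit" test covers one module; an "integration" test covers several.
run_infrastructure_test(["database"])
run_infrastructure_test(["database", "web-service"])
```

The try/finally block is the important part: resources are destroyed even when validation fails, so a failed test does not leave infrastructure running in the test account.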
Please note that the diagram is a pyramid: we have a lot of unit tests, fewer integration tests and very few e2e tests. Why? It depends on the duration of each type of test:
Infrastructure tests take a long time, especially at the upper levels of the pyramid, and, naturally, you will want to find and correct as many errors as possible at the very bottom. For this you need:
To summarize, here is how you will create and manage infrastructure:
Source: https://habr.com/ru/post/434774/