Terraform: a new approach to Infrastructure as Code

Hello, colleagues! While the brilliant Elon Musk pursues his ambitious plans to terraform Mars, we are interested in the new possibilities opened up by the "Infrastructure as Code" paradigm, and we would like to offer you a translation of an article about one of its "magnificent seven": Terraform. Yevgeniy Brikman's book on the topic is quite good, but it will soon be a year old, so please speak up: would you like to see it in Russian?

The floor goes to Kamal Marhubi of Heap.

Our infrastructure runs on AWS, and we manage it with Terraform. In this post we have collected practical tips and tricks that have proven useful in our work.

Terraform and infrastructure as code


Terraform is a tool from HashiCorp that lets you manage infrastructure declaratively. Instead of manually creating instances, networks, and so on in your cloud provider's console, you write a configuration that describes how you want your infrastructure to look. The configuration is a human-readable text format. When you want to change your infrastructure, you edit the configuration and run terraform apply. Terraform then sends API calls to your cloud provider to bring the infrastructure in line with the configuration in that file.

Once infrastructure management moves into text files, we can bring all our favorite source code management tools and processes to bear on the infrastructure. The infrastructure is now under version control just like source code; it can be reviewed the same way, or rolled back to an earlier state if something goes wrong.

Here's how, for example, an EC2 instance with an EBS volume is defined in Terraform:

 resource "aws_instance" "example" { ami = "ami-2757f631" instance_type = "t2.micro" ebs_block_device { device_name = "/dev/xvdb" volume_type = "gp2" volume_size = 100 } } 

If you have not tried Terraform yet, this getting started guide will help you quickly get comfortable with its workflow.

Terraform data model


At a high level, the Terraform data model is simple: Terraform manages resources, and resources have attributes. Some examples from the AWS world:

  - an EC2 instance is a resource with attributes such as its AMI, instance type, and attached block devices;
  - an EBS volume is a resource with attributes such as its size, type, and availability zone.

Terraform maintains a mapping between the resources described in the configuration file and the corresponding cloud provider resources. This mapping is called the state, and it is one giant JSON file. When you run terraform apply, Terraform refreshes the state by querying the cloud provider, then compares the returned resources with what is written in your configuration. If there is any difference, it creates a plan: essentially a list of changes to make to the cloud provider's resources so that the actual infrastructure matches your configuration. Finally, Terraform applies these changes by issuing the appropriate calls to the cloud provider.

Not every Terraform resource is an AWS resource.


A data model of resources with attributes is not hard to understand, but it may not map exactly onto the cloud provider's API. In fact, a single Terraform resource can correspond to one, several, or even zero underlying objects at the cloud provider. Some examples from AWS:

  - a plain aws_instance corresponds to exactly one EC2 instance;
  - an aws_instance with ebs_block_device blocks corresponds to several objects: the EC2 instance plus its EBS volumes;
  - an aws_volume_attachment corresponds to no EC2 object at all.

The latter may seem surprising. When an aws_volume_attachment is created, Terraform makes an AttachVolume request; when it is destroyed, Terraform makes a DetachVolume request. No EC2 object is involved at all: aws_volume_attachment is entirely synthetic in Terraform! Like every Terraform resource, it has an ID; but while most IDs come from the cloud provider, the ID of an aws_volume_attachment is just a hash of the volume ID, the instance ID, and the device name. There are other synthetic resources in Terraform, for example aws_route53_zone_association, aws_elb_attachment, and aws_security_group_rule. You can often spot them by searching for "association" or "attachment" in the resource name, though that does not always work.

Every problem can be solved in several ways, so choose carefully!


In Terraform, exactly the same infrastructure can be represented in several different ways. Here is another description of our example instance with its EBS volume, one that produces exactly the same EC2 resources:

 resource "aws_instance" "example" { ami = "ami-2757f631" instance_type = "t2.micro" } resource "aws_ebs_volume" "example-volume" { availability_zone = "${aws_instance.example.availability_zone}" type = "gp2" size = 100 } resource "aws_volume_attachment" "example-volume-attachment" { device_name = "/dev/xvdb" instance_id = "${aws_instance.example.id}" volume_id = "${aws_ebs_volume.example-volume.id}" } 

Now that the EBS volume is a full-fledged Terraform resource, it is decoupled from the EC2 instance, and a third, synthetic resource links the two. With the instance and volume split apart, we can add and remove volumes simply by adding and removing aws_ebs_volume and aws_volume_attachment pairs, as in the sketch below.
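
Attaching a second volume, for instance, is just one more pair of resources. A minimal sketch, where the resource names and the /dev/xvdc device name are hypothetical:

 resource "aws_ebs_volume" "example-volume-2" {
   availability_zone = "${aws_instance.example.availability_zone}"
   type              = "gp2"
   size              = 100
 }

 resource "aws_volume_attachment" "example-volume-attachment-2" {
   device_name = "/dev/xvdc"  # hypothetical second device name
   instance_id = "${aws_instance.example.id}"
   volume_id   = "${aws_ebs_volume.example-volume-2.id}"
 }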

Often it does not matter which EBS representation you choose. But if you choose wrong, changing your infrastructure later can be quite painful!

We made the wrong choice


This is where we got burned. We run a large PostgreSQL cluster on AWS, with 18 EBS volumes attached to each instance as storage. Each of these instances is represented in Terraform as a single aws_instance resource, with the volumes defined in ebs_block_device blocks.

Our database instances store their data in a ZFS file system. ZFS lets you dynamically add block devices to grow the file system with no downtime, and this is how we gradually expand our storage as customers send us more and more data. Since we are an analytics company collecting all sorts of information, this capability is a huge help. We are constantly optimizing queries and insert operations in our cluster, so rather than being stuck with the CPU-to-storage ratio we chose when we first provisioned the cluster, we can adjust the balance on the fly to make the most of our latest improvements.

This process would be even smoother if it were not for the ebs_block_device blocks. One might hope that Terraform would add a 19th ebs_block_device block to the aws_instance and everything would just work. Instead, Terraform sees this as a change it cannot make in place: it does not know how to take an instance from 18 volumes to 19. Terraform plans to destroy the entire instance and create a new one in its place! That is the last thing we want for database instances holding terabytes of data!
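
To illustrate, here is a sketch of the kind of change that triggers this behavior; the resource name, AMI, and device letters are illustrative, not from our actual configuration:

 resource "aws_instance" "database" {
   ami           = "ami-2757f631"
   instance_type = "t2.micro"

   # ... the 18 existing ebs_block_device blocks ...

   ebs_block_device {
     # the new, 19th volume: adding this block makes Terraform
     # plan to replace the entire instance
     device_name = "/dev/xvdt"
     volume_type = "gp2"
     volume_size = 100
   }
 }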

Until recently, we worked around this by bringing Terraform back in sync in several steps:

  1. run a script that uses the AWS CLI to create and attach the volumes;
  2. run terraform refresh so that Terraform updates its state; and
  3. finally, change the configuration to match the new reality.

Between steps 2 and 3, terraform plan would show that Terraform intended to destroy and recreate every one of our database instances. Until someone updated the configuration, working with those instances in Terraform was effectively impossible. Needless to say, lingering in that state is scary!

Terraform state surgery


Having settled on the aws_volume_attachment approach, we decided to restructure our representation. Each volume would become two new Terraform resources: an aws_ebs_volume and an aws_volume_attachment. At 18 volumes per instance across the cluster, that meant more than a thousand new resources. And changing the representation is not just a configuration change: we had to get into the Terraform state and change its view of the resources.

Given that we were adding over a thousand resources, we were definitely not going to do it by hand. Terraform state is stored as JSON. Although this format is stable, the documentation warns that "it is not recommended to edit state files directly". We had to do it anyway, but we wanted to be sure we were doing it right. Rather than reverse-engineer the JSON format, we wrote a program that uses Terraform's internals as a library to read, modify, and write the state. It was not entirely straightforward, since this was the first Go program any of us had worked on! But we felt it was worth it to be sure we would not mangle the states of all our database instances into one big mess.

We put the tool on GitHub, in case you want to play with it and put yourself in our shoes.

Using Terraform carefully


Running terraform apply is one of the few actions that can seriously damage your company's entire infrastructure. There are a few tips that reduce the risk, and make the whole thing a lot less scary.

Always save the plan with -out, and apply exactly that plan.

If you run terraform plan -out planfile, Terraform will write the plan to planfile. You can then apply exactly that plan by running terraform apply planfile. This way, the changes that get applied are exactly the ones Terraform showed you at planning time; there is no way the infrastructure can change unexpectedly because a colleague modified it between your plan and your apply.

However, be careful with this file: Terraform variables are included in it, so if you pass in anything secret, that information will be written to the file system unencrypted. For example, if you pass your cloud provider credentials as variables, they will be saved to disk in plain text.

Create a read-only IAM role for iterating on changes.

When you run terraform plan, Terraform refreshes its view of your infrastructure. To do this, it needs only read-only access to your cloud provider. With such a role in place, you can iterate on configuration changes and check them with terraform plan, without any risk that a careless apply will wipe out a day's, or a week's, worth of work!

With AWS, you can manage IAM roles and the associated access policies in Terraform itself. Here is what the role looks like:

 resource "aws_iam_role" "terraform-readonly" { name = "terraform-readonly" path = "/", assume_role_policy = "${data.aws_iam_policy_document.assume-terraform-readonly-role-policy.json}" } 

The assume_role_policy simply lists the users who are allowed to assume this role.
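
The article does not show that policy document; a minimal sketch of what it could look like, with a hypothetical user ARN, is:

 data "aws_iam_policy_document" "assume-terraform-readonly-role-policy" {
   statement {
     actions = ["sts:AssumeRole"]

     principals {
       type        = "AWS"
       # hypothetical user allowed to assume the role
       identifiers = ["arn:aws:iam::123456789012:user/alice"]
     }
   }
 }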

Finally, we need a policy that grants read-only access to all AWS resources. Amazon kindly provides a ready-made policy document that you can copy and paste; that is exactly what we used. We define an aws_iam_policy that references this document:

 resource "aws_iam_policy" "terraform-readonly" { name = "terraform-readonly" path = "/" description = "Readonly policy for terraform planning" policy = "${file("policies/terraform-readonly.json")}" } 

Then we attach the policy to the terraform-readonly role by adding an aws_iam_policy_attachment:

 resource "aws_iam_policy_attachment" "terraform-readonly-attachment" { name = "Terraform read-only attachment" roles = ["${aws_iam_role.terraform-readonly.name}"] policy_arn = "${aws_iam_policy.terraform-readonly.arn}" } 

Now you can use the AssumeRole method of the Security Token Service API to obtain temporary credentials that allow you to query AWS but not change anything. Running terraform plan with these credentials updates Terraform's state to reflect the current infrastructure. If you work with local state, it is written to the terraform.tfstate file. If you use remote state, for example in S3, your read-only role will also need write access to the state's location; otherwise it cannot store the refreshed state back in S3.
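
One convenient way to pick up the role from Terraform itself is the AWS provider's assume_role block; a sketch, where the account ID is hypothetical:

 provider "aws" {
   region = "us-east-1"

   assume_role {
     # hypothetical account ID; the read-only role defined above
     role_arn = "arn:aws:iam::123456789012:role/terraform-readonly"
   }
 }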

Setting up this role was far easier than rewriting the entire Terraform state to use aws_volume_attachment for our database volumes. We knew that no changes to the AWS infrastructure were planned: only its representation in Terraform had to change. After all, we had no intention of changing the infrastructure, so why have the ability to do so?

Ideas for the future


Our team is growing, and new employees are learning to make infrastructure changes with Terraform. We want this process to be simple and safe. Most outages are caused by human error and configuration changes, and a Terraform run can involve both at once, which is a scary thought, you must agree.

For example, in a small team it is easy to ensure that only one person runs Terraform at any given time. In a larger team there is no such guarantee; you can only hope. If terraform apply is run from two machines at the same time, the result can be a terrible non-deterministic mess. Terraform 0.9 introduced state locking, which guarantees that only one terraform apply can run at a time.
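
With the S3 remote state backend, for example, locking is enabled by pointing the backend at a DynamoDB table. A sketch with hypothetical bucket and table names (the 0.9 releases called the argument lock_table; later versions renamed it dynamodb_table):

 terraform {
   backend "s3" {
     bucket         = "example-terraform-state"       # hypothetical bucket
     key            = "state/infrastructure.tfstate"
     region         = "us-east-1"
     dynamodb_table = "terraform-state-lock"          # DynamoDB table that holds the lock
   }
 }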

Another area where we would like more ease and safety is reviewing infrastructure changes. Right now we simply copy the terraform plan output into a comment on the review and, once the change is approved, apply everything manually.

We have already adapted our continuous integration tool to validate the Terraform configuration: for now it just runs terraform validate, which checks the code for syntax errors. Our next task is to have the CI tool run terraform plan and post the infrastructure changes as a comment on the code review, then run terraform apply automatically as soon as the change is approved. That removes one manual step and also gives a more consistent audit trail of changes, traceable through the comments. Terraform Enterprise offers this capability, so it is worth a closer look.

Source: https://habr.com/ru/post/351878/

