
[Terraform + SaltStack] Cooking a PrestoDB Cluster in a Pressure Cooker (Part 1)

What is interesting here?

A recipe for cooking a tasty and healthy PrestoDB cluster in the public AWS cloud, using a Terraform pressure cooker and SaltStack. We will look in detail at the nuances of preparing the pressure cooker itself, the steps needed to properly cook the dish, and, naturally, say a few words about consuming the finished result. This part can also serve as study material on Terraform.

So, let's get started:
Ingredients for the recipe



Let's look at the ingredients in more detail (without the rules for preparing them):
1. Terraform - a wonderful tool from the folks at HashiCorp (who also made such useful things as Vagrant, Consul, Packer, Vault, etc.), used to create and modify infrastructure in various cloud (and not only cloud) environments.
2. SaltStack - a tool for automated server provisioning and configuration management. Your humble servant has already written about it here and here.
3. PrestoDB - a layer on top of Big Data providers that lets you query them in native, understandable SQL. Developed by the folks at Facebook, who released it as open source, for which many thanks to them.
4. AWS (or any other public / private cloud, for example GCE or OpenStack) from the list supported by Terraform, in which our PrestoDB cluster will later run. We will use AWS because it is the most common public cloud platform and is understandable to many without a lot of additional explanation.
5. The article describes only the basic principles of how this bundle of products works together, plus a few tricks to make the process easier; I will not dwell on the fine points of each component - you could, in principle, write a book about each of them. So adapting these techniques with your own head is very welcome. And one more thing - please do not write in the comments that something is not optimally tuned (PrestoDB in particular) - that is not the goal I am pursuing.

Preparing the pressure cooker!

Any culinary recipe assumes by default that the pots and pans are ready for cooking, but in our case the correct preparation of the pressure cooker (Terraform + SaltStack) is almost 80% of the key to successful cooking.
So, let's start with Terraform. There is CloudFormation for AWS and Salt Cloud from the creators of SaltStack, so why was Terraform chosen? The main feature of Terraform is its simplicity and understandable DSL: to create an instance (or 10 of them), the following description is necessary and sufficient (assuming Terraform has been downloaded and is available in $PATH):
provider "aws" { access_key = "XXXXXXXXXXXXXXXXXXXXX" # AWS IAM key secret_key = "******************************************" # AWS IAM secret region = "us-east-1" # region used to create resources } resource "aws_instance" "example_inst" { ami = "ami-6d1c2007" # CentOS 7 AMI located in US-East-1 instance_type = "t2.medium" count = "1" # or "10" can be used for parallel creation vpc_security_group_ids = [ "default-sg" ] # some security group with at least 22 port opened key_name = "secure_key" # pre created AWS E2 key pair subnet_id = "sub-abcdef123" # AWS VPC subnet } 

and a simple sequence of commands:
terraform plan
terraform apply

The narrative is understandable and, it seems to me, requires no explanation for those who are familiar with AWS. You can learn more about the available AWS resources here. Of course, we assume that the AWS account whose keys are specified in the Terraform configuration has the privileges to create the necessary resources.
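As a side note, the keys do not have to live in the .tf file at all: the AWS provider also reads the standard AWS environment variables, so a sketch like the following keeps secrets out of version control (the key values are placeholders):

# Export credentials instead of hardcoding them in the configuration
export AWS_ACCESS_KEY_ID="XXXXXXXXXXXXXXXXXXXXX"
export AWS_SECRET_ACCESS_KEY="******************************************"

# provider "aws" { region = "us-east-1" } is then enough in the .tf file
terraform plan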
Actually, the most interesting part is in the Terraform invocations themselves: terraform plan "measures" what needs to be done relative to the last known state (in our example, a new instance needs to be created) and shows which resources will be created, deleted or modified; apply actually starts creating the planned resources. If Terraform has already been run and you have changed the configuration (say, added instances), the planning stage will show which resources are missing, and apply will create them.

terraform destroy

completely removes all resources created with Terraform (the .tfstate files in the current directory, which describe the state of the created infrastructure, are taken into account).
An important point you should not forget: in most cases Terraform will not modify existing resources - it will simply delete the old ones and create them anew. This means, for example, that if you created an instance of type t2.medium and then changed the configuration to specify a new type, say m4.xlarge, then on apply Terraform will first destroy the previously created instance and then create a new one. This may seem strange to AWS users (you could stop an instance, change its type and start it again without losing the data on disk), but it was done to provide the same predictable behavior on all platforms. One more thing: Terraform does not know how (and, by its nature, should not know how) to manage resources during their life cycle - it provides no commands like stop or reboot for the instances it creates; you have to use other means to manage the infrastructure it has built.
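If the destroy-then-create order is a problem (say, you want the replacement node up before the old one disappears), Terraform's lifecycle block can reverse it. A minimal sketch on the same worker resource (shortened):

resource "aws_instance" "worker_nodes" {
  ami           = "${var.cluster_node_ami}"
  instance_type = "${var.cluster_node_type}"
  <...skipped...>

  # Ask Terraform to create the replacement instance first,
  # and only then destroy the old one
  lifecycle {
    create_before_destroy = true
  }
}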
Terraform's DSL provides an excellent set of functionality: variables (https://www.terraform.io/docs/configuration/variables.html), interpolation functions (needed for iterating over and modifying variables), modules, etc. Here is one example that uses all of this:
# Cluster shortname
variable cluster_name { default = "example-presto" }

# Count of nodes in cluster
variable cluster_size { default = 3 }

# Default owner for all nodes
variable cluster_owner { default = "user@example.com" }

# Default AWS AMI to use for cluster provisioning
variable cluster_node_ami { default = "ami-6d1c2007" }

# Default AWS instance type to use for cluster provisioning
variable cluster_node_type { default = "t2.large" }

# Default VPC subnet
variable cluster_vpc_subnet { default = "subnet-da628fad" }

# Default security group to apply to instances
variable cluster_sg { default = "sg-xxxxxxx" }

# Default key pair to use for provisioning
variable cluster_keyname { default = "secure_key" }

# Cluster worker nodes
resource "aws_instance" "worker_nodes" {
  ami                     = "${var.cluster_node_ami}"
  instance_type           = "${var.cluster_node_type}"
  count                   = "${var.cluster_size - 1}"   # one node will be used for the coordinator
  vpc_security_group_ids  = [ "${var.cluster_sg}" ]
  key_name                = "${var.cluster_keyname}"
  subnet_id               = "${var.cluster_vpc_subnet}"
  disable_api_termination = true

  tags {
    Name    = "${var.cluster_name}-cluster-worker-${format("%02d", count.index+1)}"
    Owner   = "${var.cluster_owner}"
    Purpose = "PrestoDB cluster '${var.cluster_name}' node ${format("%02d", count.index+1)}"
  }
}


This example shows the use of variables, arithmetic operations on them, interpolation with format, the index of the current element (when several instances of the same type are created), and resource tagging.
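Thanks to the variables, the cluster can also be resized without editing the configuration at all. A quick sketch using Terraform's standard -var and -var-file flags (the production.tfvars file name is just an assumption):

# Override individual variables on the command line
terraform plan  -var 'cluster_size=5' -var 'cluster_node_type=m4.xlarge'
terraform apply -var 'cluster_size=5' -var 'cluster_node_type=m4.xlarge'

# Or keep the overrides in a file, e.g. production.tfvars
terraform apply -var-file=production.tfvars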
But it is not enough just to create / destroy instances - they also need to be initialized somehow (copy files, install and configure specific software, update the system, perform cluster configuration, etc.). For this, Terraform introduces the concept of provisioners. The main ones are file, remote-exec, chef and null_resource. Typical operations are copying files and running scripts on a remote instance.
Here is the previous example with provisioning operations added:
# Locally stored SSH private key filename
variable cluster_keyfile { default = "~/.ssh/secure_key.pem" }

# Cluster worker nodes
resource "aws_instance" "worker_nodes" {
  ami                     = "${var.cluster_node_ami}"
  instance_type           = "${var.cluster_node_type}"
  count                   = "${var.cluster_size - 1}"   # one node will be used for the coordinator
  vpc_security_group_ids  = [ "${var.cluster_sg}" ]
  key_name                = "${var.cluster_keyname}"
  subnet_id               = "${var.cluster_vpc_subnet}"
  disable_api_termination = true

  tags {
    Name    = "${var.cluster_name}-cluster-worker-${format("%02d", count.index+1)}"
    Owner   = "${var.cluster_owner}"
    Purpose = "PrestoDB cluster '${var.cluster_name}' node ${format("%02d", count.index+1)}"
  }

  # Copy bootstrap script
  provisioner "file" {
    source      = "bootstrap-script.sh"
    destination = "/tmp/bootstrap-script.sh"

    connection {
      type        = "ssh"
      user        = "centos"
      private_key = "${file("${var.cluster_keyfile}")}"
    }
  }

  # Run provisioning commands
  provisioner "remote-exec" {
    inline = [
      "yum -y update",
      "sudo sh /tmp/bootstrap-script.sh"
    ]

    connection {
      type        = "ssh"
      user        = "centos"
      private_key = "${file("${var.cluster_keyfile}")}"
    }
  }
}


The main note here concerns the connection information for the remote host - for AWS this is most often key-based access, so you have to specify exactly where the key lies (a variable was introduced for convenience). Note that the private_key attribute of the connection section cannot accept a path to a file (only the key text itself) - instead, the ${file()} interpolation function is used to open the file on disk and return its contents.
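To avoid repeating the same connection block in every provisioner, it can also be declared once at the resource level, where it is shared by all provisioners of that resource. A shortened sketch:

resource "aws_instance" "worker_nodes" {
  <...skipped...>

  # Shared by all provisioners of this resource
  connection {
    type        = "ssh"
    user        = "centos"
    private_key = "${file("${var.cluster_keyfile}")}"
  }

  provisioner "file" {
    source      = "bootstrap-script.sh"
    destination = "/tmp/bootstrap-script.sh"
  }

  provisioner "remote-exec" {
    inline = [ "sudo sh /tmp/bootstrap-script.sh" ]
  }
}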
We have now gotten as far as creating a simple cluster of several instances (we will not go into the contents of the bootstrap-script.sh file - let's assume the installation of the necessary software is scripted there). Next, let's look at how to make a cluster with a dedicated master in our pressure cooker. In general, the worker nodes of the cluster need to know where the master node is located in order to register with it and later receive tasks (let's leave goodies like the Raft and Gossip protocols for electing a master and spreading information around the cluster to other articles). For simplicity, let's just make the workers know the master's IP address. How do we implement this in Terraform? First, create a separate instance for the master:
 resource "aws_instance" "master_node" { ami = "${var.cluster_node_ami}" instance_type = "${var.cluster_node_type}" count = "1" <...skipped...> provisioners { <...skipped...> } } 


then add a dependency to the worker nodes:
# Cluster worker nodes
resource "aws_instance" "worker_nodes" {
  depends_on = ["aws_instance.master_node"]   # dependency on the master node introduced

  ami           = "${var.cluster_node_ami}"
  instance_type = "${var.cluster_node_type}"
  count         = "${var.cluster_size - 1}"   # one node will be used for the coordinator
  <...skipped...>
}

The depends_on resource modifier can be used to specify the order in which the infrastructure is created: Terraform will not start creating the worker nodes until the master node has been fully created. As you can see from the example, a dependency is given as a list of strings built from the resource type and its name, joined by a dot. In AWS you can create not only instances but also VPCs, subnets, etc. - they will need to be specified as dependencies for the resources that use the VPC, which guarantees the correct order of creation.
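To illustrate the VPC case, here is a small sketch (the CIDR blocks and resource names are assumptions, not part of the cluster above). Note that simply referring to another resource's attribute already gives Terraform an implicit dependency, so an explicit depends_on is often unnecessary:

resource "aws_vpc" "presto_vpc" {
  cidr_block = "10.0.0.0/16"
}

resource "aws_subnet" "presto_subnet" {
  vpc_id     = "${aws_vpc.presto_vpc.id}"     # implicit dependency on the VPC
  cidr_block = "10.0.1.0/24"
}

resource "aws_instance" "example_inst" {
  ami           = "ami-6d1c2007"
  instance_type = "t2.medium"
  subnet_id     = "${aws_subnet.presto_subnet.id}"   # implicit dependency on the subnet
}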
But let's continue with passing the master node's address to all worker nodes. Terraform provides a mechanism for referencing previously created resources, so you can simply extract the master node's IP address in the worker description:
# Cluster worker nodes
resource "aws_instance" "worker_nodes" {
  depends_on = ["aws_instance.master_node"]   # dependency on the master node introduced

  ami           = "${var.cluster_node_ami}"
  instance_type = "${var.cluster_node_type}"
  count         = "${var.cluster_size - 1}"   # one node will be used for the coordinator
  <...skipped...>

  # Run provisioning commands
  provisioner "remote-exec" {
    inline = [
      "yum -y update",
      "sudo sh /tmp/bootstrap-script.sh ${aws_instance.master_node.private_ip}"   # master IP passed to the script
    ]

    connection {
      type        = "ssh"
      user        = "centos"
      private_key = "${file("${var.cluster_keyfile}")}"
    }
  }
}


That is, using references of the form ${aws_instance.master_node.private_ip} you can access almost any information about a resource. In this example we assume that bootstrap-script.sh can take the master node's address as a parameter and use it later for internal configuration.
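The same references can also be surfaced to the operator as outputs that Terraform prints after apply. A small sketch (the output names are arbitrary):

output "master_ip" {
  value = "${aws_instance.master_node.private_ip}"
}

output "worker_ips" {
  value = "${join(",", aws_instance.worker_nodes.*.private_ip)}"
}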
Sometimes such references are not enough - for example, you may need to run some scripts on the master side after the worker nodes have joined (accept keys, run init tasks on the worker nodes, etc.). For this, Terraform has a mechanism called null_resource - a fake resource that, using the dependency mechanism (see above), can be created only after all the master and worker nodes have been created. Here is an example of such a resource:
 resource "null_resource" "cluster_provision" { depends_on = [ "aws_instance.master_node", "aws_instance.worker_nodes" ] # Changes to any instance of the workers' cluster nodes or master node requires re-provisioning triggers { cluster_instance_ids = "${aws_instance.master_node.id},${join(",", aws_instance.worker_nodes.*.id)}" } # Bootstrap script can run only on the master node connection { host = "${aws_instance.master_node.private_ip}" type = "ssh" user = "centos" private_key = "${file("${var.cluster_keyfile}")}" } provisioner "remote-exec" { inline = [ <... some after-provision scripts calls on master node...> ] } } 


A small explanation:
1. depends_on - we specify the list of resources that must be ready beforehand.
2. triggers - a string (in our case, the IDs of all instances joined by commas) whose change causes all the provisioners specified in this resource to be executed again.
3. In the connection section we indicate which instance the provisioning scripts specified in this resource should run on.

If you need to perform several steps on different servers, create several null_resources with the necessary dependencies.
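For example, here is a hedged sketch of a second null_resource that runs a follow-up step on each worker only after the master-side step above has finished; the worker-postconfig.sh script name is hypothetical:

resource "null_resource" "worker_postconfig" {
  # Runs only after the master-side provisioning step has completed
  depends_on = [ "null_resource.cluster_provision" ]

  count = "${var.cluster_size - 1}"

  connection {
    host        = "${element(aws_instance.worker_nodes.*.private_ip, count.index)}"
    type        = "ssh"
    user        = "centos"
    private_key = "${file("${var.cluster_keyfile}")}"
  }

  provisioner "remote-exec" {
    inline = [ "sudo sh /tmp/worker-postconfig.sh" ]   # hypothetical follow-up script
  }
}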

In general, what has been described above is enough to create fairly complex infrastructures with Terraform.
Here are some more important tips for those who like to learn from the mistakes of others:
1. Do not forget to store the .tfstate files carefully - Terraform keeps the latest state of the created infrastructure in them (besides, each is a JSON file that can serve as an exhaustive source of information about the created resources); one way to keep the state safe is a remote backend, sketched after this list.
2. Do not change resources created with Terraform manually (via the services' own management consoles or other external tools) - on the next plan & apply you will get a re-creation of the resource that no longer matches its description, which can be very unexpected and often destructive.
3. Test your configurations first on small instances / a small number of them - many errors are very hard to catch before the resources are actually created, and the validator built into Terraform only catches syntax errors (and not all of them).
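On the first tip: one way to keep .tfstate safe is to store it remotely rather than on a single machine. A minimal sketch, assuming a pre-created S3 bucket and the backend syntax of newer Terraform releases (bucket and key names here are made up):

terraform {
  backend "s3" {
    bucket = "example-terraform-state"              # pre-created S3 bucket (name is an assumption)
    key    = "presto-cluster/terraform.tfstate"
    region = "us-east-1"
  }
}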

In the second part we will continue preparing the pressure cooker: we will describe how to deploy a SaltStack master + minions on top of the created infrastructure in order to install PrestoDB.

Source: https://habr.com/ru/post/301192/

