Experience in building Infrastructure-as-Code in VMware. Part 1: Problem Identification

Greetings, dear reader. This is the first in a series of articles on how we searched for a way to apply the Infrastructure-as-Code approach to our VMware vSphere virtual environment.

We use Puppet as the configuration management system for Linux and, at the moment, DSC for Windows Server.

On the Linux side, almost everything is automated. We keep machine configuration in nodes.yaml and roles in Hiera, we build modules (or take ready-made ones), we have PXE, and DHCP hands out IP addresses based on MAC address.

That is, from the moment a Linux virtual machine first boots until it is ready for use, no manual action is needed. Can you guess which step in this chain is still done by hand? Right: creating the virtual machine itself in vSphere.
When I first raised this question, I was told that people had looked for a solution and tried several options, but nothing had worked. "To hell with it!" I thought, and bet a case of beer that I would find a solution that worked in the following scenario: a developer or engineer opens a Pull Request containing the configuration of a virtual machine (cores, memory, network, template, etc.), then some magic goes to vSphere and creates a machine according to the settings in the file.
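
To make the scenario more concrete, here is a minimal Python sketch of what the file in such a Pull Request might contain and how it could be checked before anything touches vSphere. The JSON layout, the field names (`name`, `cpus`, `memory_gb`, `network`, `template`), the `VMSpec` class and the `load_spec` helper are all assumptions made for illustration only; they do not describe a real format or API.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class VMSpec:
    """Hypothetical description of one virtual machine requested in a Pull Request."""
    name: str
    cpus: int
    memory_gb: int
    network: str
    template: str

    def validate(self) -> None:
        # Reject obviously broken requests before anything is sent to vSphere.
        if not self.name:
            raise ValueError("name must not be empty")
        if self.cpus < 1:
            raise ValueError("cpus must be at least 1")
        if self.memory_gb < 1:
            raise ValueError("memory_gb must be at least 1")


def load_spec(text: str) -> VMSpec:
    """Parse a JSON document into a validated VMSpec."""
    data = json.loads(text)
    spec = VMSpec(**data)  # raises TypeError on missing or unknown fields
    spec.validate()
    return spec


# Illustrative request; every value here is made up.
EXAMPLE = """
{
  "name": "mytestvm",
  "cpus": 1,
  "memory_gb": 1,
  "network": "mynetwork",
  "template": "mytemplate"
}
"""

if __name__ == "__main__":
    print(load_spec(EXAMPLE))
```

The point of validating first is that a bad request fails in review or CI, not halfway through provisioning. What actually talks to vSphere is deliberately left out of the sketch.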

Let me tell you a little about our environment, so that you understand what I have to deal with.

We use VMware vSphere for on-premise virtualization: a couple of data centers, a datastore cluster and several Resource Pools (RP) for each team. Team members have the right to create virtual machines within their RP; the infrastructure team does not interfere with this and simply maintains the platform as a whole, periodically reminding developers and engineers to clean up their unused machines (resources are not unlimited).

We have both Windows and Linux virtual machines, and the range of workloads is huge: web servers, reverse proxies, load balancers, domain controllers, application servers, databases, and the list goes on.

Now I will tell you which tools I tried and why they did not work for me.

By trial and error ...


Ansible vsphere_guest


As I wrote in the previous article, I am very fond of Ansible, and when it comes to automation it is the first tool I check for a fit.

According to the documentation, there is a good vsphere_guest module that can create and delete virtual machines. Exactly what is needed. Here is my playbook, createvm.yaml:

    ---
    - name: Create a VM in resource pool
      hosts: localhost
      connection: local
      gather_facts: False
      vars_prompt:
        - name: "user"
          prompt: "Enter your username to virtualcenter"
          private: no
        - name: "password"
          prompt: "Enter your password to virtualcenter"
          private: yes
        - name: "guest"
          prompt: "Enter you guest VM name: "
          private: no
      tasks:
        - name: create VM
          vsphere_guest:
            vcenter_hostname: vcenter.example.com
            validate_certs: no
            username: '{{ user }}'
            password: '{{ password }}'
            guest: '{{ guest }}'
            state: powered_off
            vm_extra_config:
              vcpu.hotadd: yes
              mem.hotadd: yes
              notes: This is a test VM
            vm_disk:
              disk1:
                size_gb: 10
                type: thick
                datastore: mydatastore
            vm_nic:
              nic1:
                type: vmxnet3
                network: mynetwork
                network_type: standard
            vm_hardware:
              memory_mb: 1024
              num_cpus: 1
              osid: centos64Guest
              scsi: paravirtual
            resource_pool: "/Resources/MyResourcePool"
            esxi:
              datacenter: mysite
              #hostname: myesxhost01

I deliberately commented out the esxi hostname because I create the virtual machine directly in the RP, not on a specific host. DRS will decide where to place it.

If I run the playbook, it complains that the required parameter hostname is not specified. If I uncomment it, it complains about the lack of rights to create a virtual machine on the ESX host (which is to be expected, since I only have rights on the RP). I opened an issue about this, so I hope the Ansible developers will fix it, because the tool is really good.

Terraform


Another tool that can create virtual machines in VMware is Terraform, a HashiCorp product. It was originally geared towards working with Packer and deploying to AWS, but it can also handle our task. Here is the configuration file:

    provider "vsphere" {
      user                 = "mylogin@example.com"
      password             = "${var.vsphere_password}"
      vsphere_server       = "virtualcenter.example.com"
      allow_unverified_ssl = "true"
    }

    resource "vsphere_virtual_machine" "test" {
      name          = "${var.machine_name}"
      vcpu          = 1
      memory        = 1024
      domain        = "test.example.com"
      datacenter    = "mysite"
      resource_pool = "Production Cluster #1/Resources/myresourcepool"

      network_interface {
        label              = "test"
        ipv4_address       = "192.168.1.2"
        ipv4_prefix_length = "24"
        ipv4_gateway       = "192.168.1.1"
      }

      disk {
        datastore = "${var.datastore}"
        size      = "10"
        name      = "${var.datastore}/${var.machine_name}/${var.machine_name}.vmdk"
        template  = "mytemplate"
      }
    }

variables.tf
    variable "vsphere_password" {}

    variable "machine_name" {
      type    = "string"
      default = "test"
    }

    variable "datastore" {
      type    = "string"
      default = "mysite/mydatastore"
    }

terraform plan works great.

    $ terraform plan
    var.vsphere_password
      Enter a value: supersecurepassword

    Refreshing Terraform state in-memory prior to plan...
    The refreshed state will be used to calculate this plan, but will not be persisted to local or remote state storage.

    The Terraform execution plan has been generated and is shown below.
    Resources are shown in alphabetical order for quick scanning.
    Green resources will be created (or destroyed and then created if an existing resource exists), yellow resources are being changed in-place, and red resources will be destroyed.
    Cyan entries are data sources to be read.

    Note: You didn't specify an "-out" parameter to save this plan, so when "apply" is called, Terraform can't guarantee this is what will execute.

    + vsphere_virtual_machine.test
        datacenter: "mysite"
        detach_unknown_disks_on_delete: "false"
        disk.#: "1"
        disk.1370406802.bootable: ""
        disk.1370406802.controller_type: "scsi"
        disk.1370406802.datastore: ""
        disk.1370406802.iops: ""
        disk.1370406802.keep_on_remove: ""
        disk.1370406802.key: "<computed>"
        disk.1370406802.name: ""
        disk.1370406802.size: ""
        disk.1370406802.template: "mytemplate"
        disk.1370406802.type: "eager_zeroed"
        disk.1370406802.uuid: "<computed>"
        disk.1370406802.vmdk: ""
        domain: "test.example.com"
        enable_disk_uuid: "false"
        linked_clone: "false"
        memory: "1024"
        memory_reservation: "0"
        name: "test"
        network_interface.#: "1"
        network_interface.0.ip_address: "<computed>"
        network_interface.0.ipv4_address: "192.168.1.2"
        network_interface.0.ipv4_gateway: "192.168.1.1"
        network_interface.0.ipv4_prefix_length: "24"
        network_interface.0.ipv6_address: "<computed>"
        network_interface.0.ipv6_gateway: "<computed>"
        network_interface.0.ipv6_prefix_length: "<computed>"
        network_interface.0.label: "test"
        network_interface.0.mac_address: "<computed>"
        network_interface.0.subnet_mask: "<computed>"
        resource_pool: "Production Cluster #1/Resources/myresourcepool"
        skip_customization: "false"
        time_zone: "Etc/UTC"
        uuid: "<computed>"
        vcpu: "1"

    Plan: 1 to add, 0 to change, 0 to destroy.

What is also great is that you can set the IP address and domain name, that is, fully customize the machine created from the template. Now let's try to run it ...

    Error applying plan:

    1 error(s) occurred:

    * vsphere_virtual_machine.test: Datastore 'mysite/mydatastore' not found.

    Terraform does not automatically rollback in the face of errors.
    Instead, your Terraform state file has been partially updated with any resources that successfully completed.
    Please address the error above and apply again to incrementally change your infrastructure.

Hmm, the datastore was not found. As I said, we have a datastore cluster, so I will try a dirty hack and point the configuration at one of the datastores in the cluster.

    Error applying plan:

    1 error(s) occurred:

    * vsphere_virtual_machine.test: Datastore 'mysite/mydatastore/mydatastore-vol01' not found.

    Terraform does not automatically rollback in the face of errors.
    Instead, your Terraform state file has been partially updated with any resources that successfully completed.
    Please address the error above and apply again to incrementally change your infrastructure.

Well ... bad luck again. Later it turned out that Terraform does not know how to work with datastore clusters. My colleague opened a corresponding issue on GitHub, but unfortunately it has not been resolved either.

PowerCLI


Having failed to find a working third-party tool, I decided to turn to the vendor's own solutions.

The vendor offers two: PowerCLI (an add-on for PowerShell) and vmware-cli (a command-line interface for *nix).

I could not get vmware-cli to work on CentOS 7 or OS X (one fellow sufferer even wrote a blog post about it), so I decided to go straight to the tool that works.

An attentive reader may wonder why I spent so much time on Ansible and Terraform when PowerCLI has been around for a long time. The reasons are simple: I do not know PowerShell well enough to start using it right away, and it forces me to keep a Windows machine that does nothing but provisioning. However, I have no other options.

A quick study of the documentation gave me enough skills to write a simple script.

    Param(
        [string]$Name,
        [string]$ResourcePool,
        [string]$Location,
        [int]$NumCPU,
        [int]$MemoryGB,
        [int]$DiskGB)

    $ErrorActionPreference = "Stop"

    Try {
        $credential = Get-Credential
        Add-PSSnapin VMware.VimAutomation.Core
        [string] $username = $credential.GetNetworkCredential().UserName
        $username = 'example\' + $username
        Connect-VIServer -Server virtualcenter.example.com -User $username -Password $credential.GetNetworkCredential().Password -Force
        $params = @{
            name = $Name
            ResourcePool = $ResourcePool
            Location = $Location
            NumCPU = $NumCPU
            MemoryGB = $MemoryGB
            DiskGB = $DiskGB
        }
        new-vm @params
    }
    Catch [VMware.VimAutomation.ViCore.Types.V1.ErrorHandling.DuplicateName] {"VM exists"}

This script works and does everything necessary. It is run as follows:

    .\createvm.ps1 -Name mytestvm -ResourcePool myresourcepool -Location myteam -NumCPU 1 -MemoryGB 1 -DiskGB 10

The script asks for a login and password, passes the parameters on, and creates the machine using the New-VM cmdlet. The reader may wonder why these lines are present:

    [string] $username = $credential.GetNetworkCredential().UserName
    $username = 'example\' + $username

Experienced PowerShell users, please correct me if I am wrong. Get-Credential creates an object consisting of a login, a password and a domain (if any). The password is stored as a SecureString. Unfortunately, as far as I could tell, PowerCLI cannot work with either the Get-Credential object or a SecureString, so you have to resort to such tricks to hand it the login and password as plain string variables.
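
To tie this back to the Pull Request scenario: once the machine parameters live in a file, the call to the script can be generated instead of typed by hand. The Python sketch below only builds the argument list for createvm.ps1, using the parameter names from its Param(...) block. The `build_command` helper and the choice of `powershell.exe -File` as the launcher are assumptions for illustration, and nothing is executed.

```python
from typing import Dict, List, Union

# Parameter names taken from the Param(...) block of createvm.ps1.
SCRIPT_PARAMS = ("Name", "ResourcePool", "Location", "NumCPU", "MemoryGB", "DiskGB")


def build_command(spec: Dict[str, Union[str, int]],
                  script: str = r".\createvm.ps1") -> List[str]:
    """Return an argument list that would run the script with the given parameters.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    missing = [p for p in SCRIPT_PARAMS if p not in spec]
    if missing:
        raise KeyError("missing parameters: " + ", ".join(missing))
    command = ["powershell.exe", "-File", script]
    for param in SCRIPT_PARAMS:
        # Each script parameter becomes a "-Name value" pair.
        command += ["-" + param, str(spec[param])]
    return command


if __name__ == "__main__":
    example = {
        "Name": "mytestvm",
        "ResourcePool": "myresourcepool",
        "Location": "myteam",
        "NumCPU": 1,
        "MemoryGB": 1,
        "DiskGB": 10,
    }
    print(" ".join(build_command(example)))
```

Because the script calls Get-Credential, running the generated command would still prompt for a login and password. Making it fully unattended is a separate problem that this sketch does not address.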

Conclusions


Dear reader, if you ever face the task of automating virtual machine creation in VMware, consider the following:

If you have a single ESX node, I recommend Ansible: it has a low barrier to entry and is quite simple and capable.

If your infrastructure is as complex as ours, it is better not to reinvent the wheel and to learn PowerCLI instead.

In the next part, I will tell you how we made our script smarter and taught it to check customization, the number of cores and other resources, and the naming convention.

Source: https://habr.com/ru/post/317188/

