
How we got the bank's infrastructure systems to work together using ManageIQ

A couple of years ago, the main trends were automation, DevOps practices, and faster delivery of value to the market. Home Credit Bank decided to keep up and set a course for technological development, all the more so because the grumbling of users who were tired of waiting several days for new resources for their important projects was growing louder across the open-plan office.


We decided to start with the process of approving requests across departments, which, as in many large companies, took time and effort. As the first task, we chose the creation of a virtual machine regardless of the virtualization environment. While drawing up the list of tasks, we realized that we would need to integrate with other systems used in the bank's infrastructure, for example via their APIs.




The most suitable solution turned out to be ManageIQ. Red Hat acquired the project in 2012 and built the commercial Red Hat CloudForms product on top of it. ManageIQ itself remained open source and is developed in parallel with CloudForms.


ManageIQ is written in Ruby and supports a large number of virtualization, public cloud, and container providers. At the moment, Home Credit runs the Gaprindashvili release in a high-availability configuration.


How the process has changed


Previously, each team had to make separate settings in its own area of responsibility. After this preliminary preparation, all the data was collected and sent to the administrator, who deployed and configured the virtual machine. Then someone had to inform, for example, the monitoring team that a new host had appeared and needed to be added to monitoring. Communication delays, specialists' workload, and human error could stretch this process to several days.


Having fit the whole process into ManageIQ, we got the following results:


Virtual resource type                     | Before introducing ManageIQ | After introducing ManageIQ
Linux virtual machine in VMware / oVirt   | Up to one working week      | ~10 minutes
Virtual machine for a Rancher environment | Up to one working week      | ~15 minutes
Windows virtual machine in VMware         | Up to one working week      | ~25 minutes

The time difference arises because, in the second case, additional time is needed to prepare the host for working with Docker and to download and tag the images for the infrastructure containers from Artifactory, since at this stage the host still has no access to Docker Hub. In the case of Windows, the difference has two causes. First, creating a Linux VM without customization takes approximately 2 minutes, while creating a Windows VM takes 6 minutes. Second, customizing Windows itself takes about 10 minutes, versus 2 minutes for Linux.


Ten minutes is not that fast, considering that only about 2-3 minutes go into creating the VM itself. In the remaining time, ManageIQ does the following:


  1. The system collects the parameters that the user specified in the order form and splits them into variables.
  2. A new change request is created in the incident management system, showing the data about the new resource.
  3. ManageIQ queries the resource naming system, which returns a name for the new resource.
  4. The IP address management system issues a new address based on the entered parameters.
  5. A new DNS record is registered on the local DNS server.
  6. Based on the parameters, the environment, and resource load, the virtualization type and the cluster for placement are selected.
  7. Next, the virtual machine is created with the specified parameters.
  8. Once the virtual machine has been deployed from the template, scripts run to make the final settings:
    • expanding the disk to the specified size,
    • generating a new root password, changing it on the Linux host, and saving it to the password manager,
    • creating a YAML configuration file for Puppet in GitLab,
    • running the runbooks that apply the necessary settings and updates to Windows VMs, or
    • launching Puppet, which updates and configures Linux machines.
  9. After all this, the change request created in step 2 is closed. Fresh data, such as the IP address and host name, is added to it.
  10. The new unit is registered in the computing resource management database (CMDB).
  11. The virtual machine is registered in Zabbix and added to monitoring.
  12. The customer and other interested parties receive an e-mail with information about the new unit created using ManageIQ.

What's inside


Let's delve into the technical details of the product. Out of the box, ManageIQ can create a virtual machine from a template. How does this differ from what we do in, say, vCenter? The correct answer: it doesn't. ManageIQ uses the same methods as the virtualization systems themselves, but does so from a single place. On top of that, you can add your own scripts for anything that does not fit into the standard feature set. So if you have resources in public Azure, in a vCenter deployed on your own hardware, and a Kubernetes cluster running somewhere else, all of it can be conveniently managed from ManageIQ.


Besides a wide variety of providers to integrate with, ManageIQ has convenient customization tools, such as a builder for forms tailored to your task:



Thanks to this, it was possible to construct a full-fledged interface for ordering a virtual machine, fitting all the necessary parameters into it:



We select the amount of computing resources and the OS, and fill in all the additional information needed for integration with external systems. Then, using internal mechanisms (more on them a little later), the system chooses where the new resources will be placed: the data center, cluster, host, and datastore are selected based on the entered parameters and on current resource load.


Keep in mind that people may order too many resources, or not what they really need at all. This is where the system of requests and approvals comes into play:



Any resources ordered by a user must be approved by a responsible person. At Home Credit, a group of architects does this.


Automation structure


If you decompose all the automation processes in ManageIQ into small parts, you will notice a certain structure.


Automate Domain



Datastore hosts all the domains that ManageIQ has.


By default, there is a ManageIQ domain, which is locked and serves as a reference model. If you need to make changes, you create another domain, copy elements from the ManageIQ domain into it, and modify them for your own tasks.


Automate Namespace



Inside, the domains are divided into parts that are responsible for individual processes: this may be the section responsible for managing the infrastructure (Infrastructure) or for working with services (Service). We have our own Namespace, which contains everything related to the bank's systems.


Let's look at the structure in more detail, using the provisioning process for a new virtual machine as an example. It is described in the Automate Class called VMProvision_VM.


Automate Class


The class has a structure that includes Instances, Methods, Properties, and Schema. From the point of view of automation, the Schema is of most interest:


The schema resembles a pipeline in CI/CD systems. It describes the steps that will be performed during the automation process.


Automate Instance



The class described above has two Automate Instances. Each of them inherits from the schema the steps for which a Default Value is set. Steps that have null values in the schema are defined in the instance.



In the instance, values have appeared for the steps that were empty in the schema. You can also see who made the last change and when.
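The relationship between schema defaults and instance values can be sketched in plain Ruby. This is only an illustration of the idea, not ManageIQ's resolution code: the step names and paths below are hypothetical, and `resolve_instance` is a helper invented for this example.

```ruby
# Hypothetical schema: ordered steps, each with an optional default value.
# Step names and paths are illustrative, not the real VMProvision_VM schema.
SCHEMA = {
  "AcquireIPAddress" => nil,
  "Provision"        => "/Infrastructure/VM/Provisioning/StateMachines/Methods/Provision",
  "EmailOwner"       => nil,
}.freeze

# An instance supplies values only for the steps it wants to set;
# every other step falls back to the schema default (which may be nil).
def resolve_instance(schema, instance_values)
  schema.each_with_object({}) do |(step, default), resolved|
    resolved[step] = instance_values.fetch(step, default)
  end
end

instance = resolve_instance(SCHEMA, {
  "AcquireIPAddress" => "/Bank/Methods/acquire_ip",  # hypothetical path
})

# Steps are listed in schema order, like stages of a pipeline.
instance.each do |step, value|
  puts "#{step}: #{value || '(not set)'}"
end
```

The instance keeps the schema's step order, so the sequence of stages is defined once in the schema and each instance only fills in what differs.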


Let's see what one of these values represents:


This is an Automate Class called Methods, which has one Automate Instance. Its schema describes the ipam_base_uri attribute and the execute method. The execute method, in turn, calls the Automate Method acquire_ip.


Automate Method


This is a Ruby script that lets the provisioning process exchange data with other systems over a REST API, as is the case with the IPAM (IP address management) system. From IPAM we get the address, mask, subnet, and VLAN for the VM. The difficulty is that the machine can be deployed in a test or a production environment, for applications or for databases. Or the security team may have decided to place it in the PCI DSS segment. All this information is collected when the VM is created or is passed in the parameters of the called instance (in the screenshot above you can see that the parameter contains the URI through which the method will access IPAM):


Here is some Ruby code:

    base_uri    = $evm.object['ipam_base_uri']
    prov        = $evm.root["miq_provision"]
    site        = prov.get_option(:site)
    app         = prov.get_option(:dialog_dropdown_list_information_system)
    crq         = prov.get_option(:crq)
    descr       = prov.get_option(:dialog_textarea_box_usernotes)
    owner       = $evm.root['user'].name
    scope       = prov.get_option(:dialog_dropdown_scope)
    environment = prov.get_option(:landscape)

$evm.root is a method that gives access to everything that can be stored in ManageIQ: information about the user, the environment, variables, the current request ('miq_request'), and so on. We are interested in the current provisioning process.


Next, we can pick up the necessary values: get_option(:site) returns a value that was set at one of the previous steps, while, for example, get_option(:dialog_dropdown_list_information_system) returns a value from the form that the user fills in when ordering new resources.
All the collected values are passed as variables in the request body in JSON format:


    options = {
      verify: false,  # TLS certificate verification is disabled
      headers: {"Content-Type" => "application/json"},
      body: {
        "site"  => "#{site}",
        "env"   => "#{environment}",  # read from :landscape above
        "app"   => "#{app}",
        "scope" => "#{scope}",
        "role"  => "#{role}",         # role is set elsewhere in the method (not shown)
        "crq"   => "#{crq}",
        "descr" => "#{descr}",
        "owner" => "#{owner}",
      }.to_json,
    }

Using this set of parameters, IPAM will unambiguously determine in which VLAN the virtual machine should be located, and will return the network parameters.
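As a rough sketch of both sides of this exchange, the request body can be built from a plain hash and the reply checked for the four network parameters. The response field names (`address`, `mask`, `subnet`, `vlan`) and both helper names are assumptions made for this example; the article does not show the actual IPAM response format.

```ruby
require 'json'

# Build an HTTParty-style options hash for the IPAM call from plain values.
# Every value is converted to a string, as in the interpolated original.
def ipam_request_options(params)
  {
    verify: false,
    headers: { "Content-Type" => "application/json" },
    body: params.transform_values(&:to_s).to_json,
  }
end

# Hypothetical response shape: the field names below are assumed.
def parse_ipam_response(raw_json)
  data = JSON.parse(raw_json)
  missing = %w[address mask subnet vlan] - data.keys
  raise "IPAM response is missing: #{missing.join(', ')}" unless missing.empty?
  data
end

options = ipam_request_options(site: "dc1", env: "test", app: "DNS", scope: "-")
sample  = '{"address":"10.0.0.15","mask":"255.255.255.0","subnet":"10.0.0.0","vlan":"120"}'
network = parse_ipam_response(sample)

puts options[:body]
puts network["address"]
```

Validating the reply before provisioning continues means an incomplete IPAM answer fails early, at the step that caused it.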


In addition to receiving data for the correct VM configuration, ManageIQ can also generate additional information for settings made at the so-called post-provisioning stage (after the virtual machine is deployed and started). At Home Credit, we use Puppet to manage Linux host configurations. For each computing unit, a YAML file with a set of groups is created in GitLab:


Some more Ruby code:

    options = {
      headers: {"Private-Token" => "#{api_token}", "Content-Type" => "application/json"},
    }
    body = {
      "branch"         => "#{branch}",
      "author_email"   => "email@your.domain",
      "author_name"    => "ManageIQ Bot",
      "content"        => "",
      "commit_message" => "New host created by ManageIQ",
    }
    descr = prov.get_option(:long_description)
    # Assign through the string key "content" so that the empty placeholder
    # above is overwritten instead of a second, symbol-keyed entry being added.
    if descr.include?('rancher') && descr.include?('test')
      body["content"] = "---\ngroups:\n - #{yaml_server}\n - rancher\n - user-devops-UDCR"
    elsif descr.include?('rancher')
      body["content"] = "---\ngroups:\n - #{yaml_server}\n - rancher\n"
    else
      body["content"] = "---\ngroups:\n - #{yaml_server}\n - #{$is_id}"
    end

Groups depend on the type of virtual machine, the environment in which it is created, and the information system.
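The group selection can also be written as a small function that is easy to test on its own. This is a sketch under assumptions: `puppet_groups_yaml` is a name invented here, `yaml_server` and `is_id` mirror the variables in the snippet above, and the output is normalized to have no trailing newline.

```ruby
# Returns the YAML content of a host's Puppet group file.
# descr       - the VM description (checked for 'rancher' and 'test')
# yaml_server - the server group name, as in the snippet above
# is_id       - the information system identifier, as in the snippet above
def puppet_groups_yaml(descr, yaml_server, is_id)
  groups = [yaml_server]
  if descr.include?('rancher')
    groups << 'rancher'
    # Test Rancher hosts get an additional group.
    groups << 'user-devops-UDCR' if descr.include?('test')
  else
    groups << is_id
  end
  "---\ngroups:\n" + groups.map { |g| " - #{g}" }.join("\n")
end

puts puppet_groups_yaml("rancher test host", "linux_server", "dns")
puts puppet_groups_yaml("application host", "linux_server", "dns")
```

Building the list of groups first and rendering it in one place keeps the three cases from drifting apart in formatting.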



After successful completion of the procedure, the user receives an email with information:


The text of the email can also be adjusted by adding the necessary information.
If an error occurs at one of the critical steps of the process, you can add a condition that explicitly aborts the process. If the error is not fatal, you can likewise specify that the process may continue despite the problem.
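A minimal sketch of that distinction follows. In ManageIQ state machines a method usually reports its outcome through `$evm.root['ae_result']` and `$evm.root['ae_reason']`; here a plain Hash stands in for `$evm.root` so that the example runs outside ManageIQ. The `run_step` helper and the `warnings` key are hypothetical, not part of the product.

```ruby
# `root` stands in for $evm.root; a plain Hash is used for illustration.
# critical: true  -> a failure marks the step as 'error' (process aborts)
# critical: false -> a failure is recorded and the process may continue
def run_step(root, name, critical:)
  yield
  root['ae_result'] = 'ok'
rescue StandardError => e
  if critical
    root['ae_result'] = 'error'
    root['ae_reason'] = "#{name} failed: #{e.message}"
  else
    root['ae_result'] = 'ok'
    # 'warnings' is a hypothetical key used only in this sketch.
    root['warnings'] = Array(root['warnings']) << "#{name}: #{e.message}"
  end
end

critical_run = {}
run_step(critical_run, "register_dns", critical: true) { raise "DNS server unreachable" }

tolerant_run = {}
run_step(tolerant_run, "send_email", critical: false) { raise "SMTP timeout" }

puts critical_run.inspect
puts tolerant_run.inspect
```

Which steps count as critical is a policy decision; the sketch only shows that the decision can be made explicit per step.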


Logging


ManageIQ logs everything that can be tracked. The automation process is written to automation.log. In addition, there are API logs, logs for the various cloud providers, and security logs; even the output of the top command is logged.


For each step in the schema, you can configure log entries for its start and end:


In addition, you can write your messages in the logs:


 $evm.log(:info, "Call job status uri: #{item_uri}/#{job_id}/api/json") 

This is very useful when calling other systems' APIs, to understand why something went wrong, or to track the current status of a lengthy process, such as a Jenkins job or an SCCM runbook:


    $evm.log(:info, "acquire_osname --- naming jobStatus: #{jobStatus}")
    break if jobStatus.to_s == "Completed"
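The fragment above comes from inside a polling loop. A self-contained sketch of such a loop might look like the following, where `wait_for_job`, `fetch_status`, and the `log` callable are assumptions standing in for the real HTTP call and for `$evm.log`.

```ruby
# Polls a status source until it reports "Completed" or attempts run out.
# fetch_status - any callable returning the current job status; in a real
#                method it would wrap the HTTP call to Jenkins or SCCM
# log          - any callable taking a message; stands in for $evm.log
def wait_for_job(fetch_status, max_attempts: 10, delay: 0, log: ->(msg) { puts msg })
  max_attempts.times do |attempt|
    status = fetch_status.call.to_s
    log.call("attempt #{attempt + 1}: jobStatus: #{status}")
    return true if status == "Completed"
    sleep(delay)
  end
  false  # gave up: the job never reported completion
end

# Simulated job that completes on the third poll.
statuses = ["Queued", "Running", "Completed"].each
done = wait_for_job(-> { statuses.next })
puts "job finished: #{done}"
```

Capping the number of attempts keeps a stuck external job from blocking the provisioning process indefinitely. Inside ManageIQ, a state machine retry may be a better fit than sleeping in a method; that choice is outside this sketch.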

You can use the standard functions for exceptions to write to the logs:


 raise "VM not specified" if vm.nil?

By default, all logs are stored in /var/log/manageiq/, but from my own experience I can say that hunting for a problem with tail and grep is not convenient. Given that ManageIQ writes a lot of different logs, it is worth forwarding them somewhere, for example to an ELK stack.


ManageIQ API


In addition to a user-friendly web interface, ManageIQ has a functional API. With it, for example, we solved the problem of dynamically determining the identifier of the template to be specified when creating a VM:
    require 'httparty'
    require 'json'

    def get_template(vendor, os, ems)
      user = '#{user}'  # placeholder for the API user name
      pass = '#{pass}'  # placeholder for the API password
      options = {
        verify: false,
        headers: {"Accept" => "*/*", "accept-encoding" => "gzip, deflate"},
        basic_auth: { username: user, password: pass },
      }
      # host (the ManageIQ base URL) is expected to be defined elsewhere.
      # %27 is a URL-encoded single quote, %2A a URL-encoded asterisk.
      response = HTTParty.get("#{host}/api/templates?filter[]=vendor=%27#{vendor}%27&filter[]=name=%27%2A#{os}%2A%27&filter[]=ems_id=%27#{ems}%27", options).to_s
      link = JSON.parse(response)
      url = nil
      link["resources"].each { |r| url = r["href"] }  # keeps the last match
      response = HTTParty.get(url, options).to_s
      details = JSON.parse(response)
      ["#{details['id']}, #{details['name']}"]
    end

Using a GET request and specifying filters for the search, we get the desired template.
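The filter part of such a URL can be assembled with the standard library instead of hand-encoding the quotes and asterisks. This is a sketch: `template_filter_query` is a helper invented here, and the example values are made up. Its encoded output may differ character by character from a hand-written URL (for instance, `=` inside a filter becomes `%3D`), but it decodes to the same filter expressions.

```ruby
require 'uri'

# Builds the query string for /api/templates from three filter conditions.
# Each condition is percent-encoded as a whole, so quotes and other special
# characters in the values cannot break the URL.
def template_filter_query(vendor:, os:, ems_id:)
  filters = [
    "vendor='#{vendor}'",
    "name='*#{os}*'",      # wildcard match on the template name
    "ems_id='#{ems_id}'",
  ]
  filters.map { |f| "filter[]=#{URI.encode_www_form_component(f)}" }.join("&")
end

query = template_filter_query(vendor: "vmware", os: "CentOS", ems_id: 3)
puts "/api/templates?#{query}"
```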
In addition to solving internal problems, you can create new API methods for use by external systems. Earlier in the article, the process of ordering a new virtual machine through the web interface was shown. And this is how it looks if you do it with a POST request:
    curl -X POST \
      http://Manageiq.hostname/api/service_catalogs/4/service_templates/31 \
      -H 'Authorization: Basic Token-Value' \
      -H 'Content-Type: application/json' \
      -d '{
        "action": "order",
        "resource": {
          "radio_button_vcpu": "a_2",
          "radio_button_vram": "a_2",
          "hdd_size": "40",
          "dropdown_os": "CentOS",
          "text_box_filter": "dns",
          "dropdown_list_information_system": "DNS ",
          "text_box_validator": "OK (DNS )",
          "textarea_box_usernotes": " ",
          "dropdown_env": "production",
          "date_control_retirement_dt": "2022-05-21",
          "dropdown_scope": "-"
        }
      }'
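For an external system written in Ruby, the same order could be prepared with the standard library. The sketch below only builds the request and does not send it; `build_order_request` is a hypothetical helper, and the host, IDs, and token are the placeholders from the curl example.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Builds (but does not send) the POST request that orders a service.
# dialog_values - a Hash of dialog field names to values, as in the curl call
def build_order_request(base_url, catalog_id, template_id, auth_header, dialog_values)
  uri = URI.join(base_url, "/api/service_catalogs/#{catalog_id}/service_templates/#{template_id}")
  request = Net::HTTP::Post.new(uri)
  request["Authorization"] = auth_header
  request["Content-Type"]  = "application/json"
  request.body = { "action" => "order", "resource" => dialog_values }.to_json
  request
end

request = build_order_request(
  "http://Manageiq.hostname", 4, 31, "Basic Token-Value",
  { "radio_button_vcpu" => "a_2", "hdd_size" => "40", "dropdown_os" => "CentOS" }
)

puts request.path
puts request.body
# Sending would look like:
#   Net::HTTP.start(host, port) { |http| http.request(request) }
```

Keeping request construction separate from sending makes the payload easy to inspect and test without a running ManageIQ instance.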

Conclusion


Pros:



Cons:



Where we are now:


Because ManageIQ can take full advantage of the Ruby language, we were able to integrate it with the following APIs:



The functionality and capabilities of the system are very extensive; I have become familiar with many of them and keep discovering more as the implementation goes on.
For example, I did not mention that the system lets you create your own dashboards with statistics, configure billing, and add buttons to which you can attach individual scripts or entire workflows. You can also add your own fields to record additional information about services and virtual machines, and so on.


What Home Credit is aiming for:



Finally, I would like to remind you that tools are great, but do not forget about the importance of interaction between teams. The changes described in this article would not have been possible without well-established communication and constant cooperation among all interested parties on emerging issues.



Source: https://habr.com/ru/post/461891/

