You can recall how the story began in the first article, but let me briefly remind you what is going on.
It all started with the migration of monolithic applications to microservices, when our well-oiled process of rolling out new releases to the test environment began to fail because of the sharply increased granularity. To keep developers from fighting over shared test benches and to keep everything running quickly and smoothly, we launched an automation project.
Because of managerial difficulties the project did not show results right away, so here is the continuation of the story.
Until recently, the company had several old monolithic test environments, each containing frontend, backend and billing components; they were needed for application development and integration tests. A test bench is a group of hosts with a specific set of applications; internally we call such a bench a test circuit. There were only a few shared circuits (integration, load and so on), and they did not match the production architecture.
For instance, applications did not always run in their own containers, and it was impossible to test the nginx configuration on a test circuit.
To test a future release you had to install and configure it, and then record in a separate table which hosts you were using and for what, so as to avoid conflicts. All of this came with the following set of problems:
About a year ago the company's teams began to grow rapidly, while the number of test circuits stayed the same; only the resources required to support them increased. The volume of manual operations and the related problems grew just as fast and at some point began to seriously slow down testing, and therefore development.
First of all, we decided to eliminate conflicts between application versions within a single circuit and to guarantee correct delivery of callbacks, so that a component's response arrives at a specific server rather than at whichever one another developer happened to pick earlier. The best solution was to give each developer their own isolated environment.
The operations team supports only one reference circuit, which mirrors production as closely as possible. For the remaining problems, the following list of requirements for the new test circuits emerged:
When choosing a solution, the main constraint was the lack of automatic application deployment: the package with the application was simply copied to the test environment via a network share. So before building the new test circuits, we either had to build an automatic deployment system or accept that, at first, new circuits would get their applications by cloning them from the reference circuit (mainly the Windows components).
Another option was to deploy the production version of the application with the operations team's scripts. The main disadvantage of this approach is that to get new component versions into a circuit, the circuit has to be recreated, or we have to design an update mechanism and constantly maintain those scripts separately from the applications.
By the way, we have a separate article about our long road to automating builds and deployments, though it is mostly about builds.
You can minimize configuration changes when creating a circuit if the host names that applications call stay the same. So if the reference circuit is configured to use, for example, short names, then when it is cloned into a new circuit the settings do not need to change: you only need to bring up a separate DNS server for the new circuit, so that a short name such as `billing` resolves to that circuit's own billing host.
Skipping the long and painful selection process, the fastest solution for us turned out to be OpenStack with a private network per circuit, plus Ansible as the deployment tool.
The final solution is based on OpenStack + CEPH, with each application in its own VM. For each test circuit, a separate OpenStack project is created and its own private network is allocated; the machines with that circuit's applications are attached to it. On top of all this sits a set of Jenkins jobs that perform all service operations via Pipeline and Ansible: creating and deleting circuits and subcircuits, deploying applications, and running periodic database synchronization tasks.
The vertical axis of the diagram is Jenkins, through which circuit creation, application deployment, database creation and the other processes run.
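Conceptually, the "create a circuit" step boils down to a handful of OpenStack calls. Below is a minimal sketch of that idea using openstacksdk, not our actual Jenkins/Ansible pipeline; the cloud name, image, flavor, CIDR and component list are illustrative assumptions.

```python
# A minimal sketch: what "create a test circuit" amounts to in OpenStack
# terms, written with openstacksdk. Names, image, flavor and CIDR are assumed.
import openstack

SCHEME = "circuit-demo-42"                      # hypothetical circuit name

conn = openstack.connect(cloud="test-cloud")    # entry from clouds.yaml

# One OpenStack project per circuit keeps resources and quotas isolated.
project = conn.identity.create_project(name=SCHEME, domain_id="default")

# A private network and subnet that only this circuit's VMs attach to.
net = conn.network.create_network(name=f"{SCHEME}-net", project_id=project.id)
conn.network.create_subnet(network_id=net.id, project_id=project.id,
                           ip_version=4, cidr="10.42.0.0/24")

# One VM per application; the circuit name and role go into instance metadata,
# which the dynamic inventory script described below relies on.
image = conn.compute.find_image("base-image")
flavor = conn.compute.find_flavor("m1.medium")
for role in ("frontend", "backend", "billing"):
    conn.compute.create_server(
        name=f"{role}.{SCHEME}",
        image_id=image.id,
        flavor_id=flavor.id,
        networks=[{"uuid": net.id}],
        metadata={"scheme": SCHEME, "role": role},
    )
```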
To visualize all the processes around test circuits, we use Jenkins Pipeline and Blue Ocean: we get a clear picture of every step, and a broken stage is easy to spot. Most of the processes are idempotent, so in case of an error it is enough to simply restart the build.
Process visualization with Jenkins Pipeline.
We tried to move all databases to separate physical machines so as not to create extra disk load, leaving the application servers in the test circuits, and we clone every instance that we cannot yet deploy automatically. Instances with database applications that have not been moved to separate physical machines are created in CEPH on a dedicated SSD pool.
Speaking of numbers: by the end of the project, one medium-sized circuit included more than 60 instances using about 180 GB of RAM and 600 GB of disk space.
And I will probably answer:
To make the existing Ansible roles and playbooks usable for test circuits without any changes, we use a dynamic inventory generation script: from the circuit name, which matches the project name in OpenStack, it builds an inventory of the same shape as in production. All the information needed for the dynamic inventory is stored in the instance metadata in OpenStack.
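For illustration, a dynamic inventory script of this kind might look roughly like the sketch below. It assumes the openstacksdk connection is scoped to the circuit's OpenStack project and that each instance carries a "role" key in its metadata; the clouds.yaml entry name and the group names are made up for the example.

```python
#!/usr/bin/env python3
# Sketch of a dynamic inventory script: it maps the instances of one
# OpenStack project (= one test circuit) into an Ansible inventory shaped
# like the production one. Group names and metadata keys are assumptions.
import json
import sys

import openstack


def build_inventory() -> dict:
    conn = openstack.connect(cloud="test-scheme")   # assumed clouds.yaml entry
    inventory = {"_meta": {"hostvars": {}}}
    for server in conn.compute.servers(details=True):
        role = server.metadata.get("role", "ungrouped")   # e.g. "billing"
        inventory.setdefault(role, {"hosts": []})["hosts"].append(server.name)
        # The first fixed address on the circuit's private network becomes
        # ansible_host for that instance.
        addrs = next(iter(server.addresses.values()), [])
        if addrs:
            inventory["_meta"]["hostvars"][server.name] = {
                "ansible_host": addrs[0]["addr"],
            }
    return inventory


if __name__ == "__main__":
    # Ansible invokes dynamic inventory scripts with --list / --host <name>.
    if "--list" in sys.argv:
        print(json.dumps(build_inventory()))
    else:
        print(json.dumps({}))
```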
As for the applications themselves, we moved a significant share of our components to deployment via Ansible: an application is packaged into a deb package together with config templates and variable values for different environments, and a universal Ansible playbook can render the configs, switch symlinks between versions and check that the application has come up. As a result, the number of components that have to be cloned from the reference circuit when creating a new one has dropped sharply.
To be fair, not all components have been moved to the new deployment yet: at the time of writing, the project is at the "migrate what is left" stage.
Another important task was keeping all test circuits up to date. For this we pull the production versions from our monitoring and run an update at night or on demand. If a circuit is running not the production version of an application but a build from a feature branch, that component is automatically excluded from the nightly update.
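The nightly-update decision itself is simple to express. The sketch below assumes a hypothetical monitoring endpoint that returns production versions per component and a convention where feature builds carry a "feature" substring in the version; both are assumptions for the example, not a description of our actual API.

```python
# Sketch of the nightly-update decision: compare the versions running in a
# circuit with the production versions reported by monitoring, and skip
# components built from feature branches. URL and version format are assumed.
import requests

MONITORING_URL = "https://monitoring.example.local/api/prod-versions"  # assumed


def fetch_prod_versions() -> dict:
    """Production version per component, as reported by monitoring (assumed API)."""
    return requests.get(MONITORING_URL, timeout=10).json()


def components_to_update(scheme_versions: dict, prod_versions: dict) -> list:
    """Return (component, target_version) pairs for the nightly update."""
    to_update = []
    for component, version in scheme_versions.items():
        if "feature" in version:        # feature-branch build: leave it untouched
            continue
        prod = prod_versions.get(component)
        if prod and prod != version:    # behind production: schedule an update
            to_update.append((component, prod))
    return to_update


if __name__ == "__main__":
    # "billing" lags behind production, "frontend" carries a feature build.
    print(components_to_update(
        {"billing": "2.41.0", "frontend": "2.50.0-feature-PAY-123"},
        {"billing": "2.42.1", "frontend": "2.49.0"},
    ))
    # -> [('billing', '2.42.1')]
```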
Since most applications in the test circuits are involved in the payment flow in one way or another, a pool of test bank cards is generated automatically for each circuit. Card creation, authorization and clearing are performed by the processing module, which sits apart from the test circuits and is shared by all payment components. And although this processing is a dedicated test instance, it can only send its response to one specific server.
So we have many circuits and one processing server that does not want to know anything about them. On top of that, new card ranges are added to the test circuits dynamically. An obvious solution would be to simply clone the processing for each circuit, but that would mean an excessive increase in resources, plus automating the processing configuration in every circuit; all in all, not the best way.
To avoid heavily reworking the existing processing, we decided to adjust the component that creates cards in a test circuit and to put up an nginx server with the Lua module to proxy the authorization requests coming from the processing. After a card range is added to a circuit, card creation is triggered in the processing, and for each card the name of the circuit that issued it is written into a service field.
All callback requests from the processing to the test circuits go to the nginx server, where the Lua script finds the circuit name in the request body and proxies the request to the right place.
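In production this routing lives in the nginx Lua script; purely to illustrate the idea, here is the same logic as a small Python proxy. The "scheme" field in the callback body and the per-circuit host naming are assumptions made for the example.

```python
# Illustration of the callback routing idea (the real thing is nginx + Lua):
# read the circuit name from the request body and forward the callback to
# that circuit's own host. Field name and host naming convention are assumed.
import json
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class CallbackRouter(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        # The card-issuing component wrote the circuit name into a service
        # field when the card was created; recover it from the callback body.
        scheme = json.loads(body or b"{}").get("scheme", "default")
        upstream = f"http://processing-gw.{scheme}.test.local{self.path}"
        req = urllib.request.Request(upstream, data=body,
                                     headers=dict(self.headers), method="POST")
        with urllib.request.urlopen(req, timeout=10) as resp:
            self.send_response(resp.status)
            self.end_headers()
            self.wfile.write(resp.read())


if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), CallbackRouter).serve_forever()
```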
As Dima wrote in the first part of our saga about test environments, when several departments work with a system, projects tend to drag on.
Nevertheless, the finish line is already in sight, and we can sum up what we have managed to solve:
At the moment, we have already set up auto-updating of most components on all test circuits to their production versions, and we have started a project to create circuits with a selectable set of components in order to save resources. There are also plans to use LXC containers in OpenStack.
On the whole, all this machinery is just the tip of the iceberg. The most painful part is persuading developers and testers to move to the new circuits, that is, to establish new processes. We are now in the second quarter of slowly migrating, step by step, from the old circuits to the new ones.
Source: https://habr.com/ru/post/345654/