
Continuous integration in Yandex

Maintaining a huge code base while keeping a large number of developers productive is a serious challenge. Over the past five years, Yandex has been developing a dedicated continuous integration system. In this article, we'll tell you about the scale of the Yandex code base, about moving development into a single repository with a trunk-based development approach, and about the tasks a continuous integration system must solve to work effectively under such conditions.



Many years ago, there were no special rules for service development at Yandex: each department could use any languages, any technologies, and any deployment systems. And as practice showed, such freedom did not always help us move faster. At the time, there were often several in-house or open-source solutions to the same problems. As the company grew, this ecosystem worked worse and worse. At the same time, we wanted to remain one big Yandex rather than split into many independent companies, because that brings major advantages: many people work on similar tasks, and the results of their work can be reused. This ranges from common data structures, such as distributed hash tables and lock-free queues, to a lot of specialized code that we have written over 20 years.


Many of the tasks we solve have no solutions in the open-source world. There is no MapReduce that works well at our scale (5000+ servers) and on our tasks, and no task tracker that can handle all of our tens of millions of tickets. This is what makes Yandex attractive: you can do really big things.


But we lose a lot of efficiency when we solve the same tasks from scratch and rework ready-made solutions, which makes integration between components difficult. It is nice and convenient to build everything just for yourself in your own corner and not think about others for the time being. But as soon as a service becomes noticeable, it acquires dependencies. It only seems that different services depend weakly on each other; in fact, there are many connections between different parts of the company. Many services are accessible through the Yandex app, the Browser, and so on, or are embedded in each other. For example, Alice appears in the Browser, and with Alice's help you can order a taxi. We all use common components: YT, YQL, Nirvana.


The old development model had significant problems. Because there were many repositories, it was difficult for an ordinary developer, especially a beginner, to find out:



As a result, there was a problem with the mutual use of components. Components could hardly use other components, because they were "black boxes" to each other. This hurt the company: components were not only left unreused, but often left unimproved. Many components were duplicated, and the amount of code that had to be maintained grew greatly. Overall, we were moving more slowly than we could.


Single repository and infrastructure


Five years ago, we started a project to move development into a single repository with common systems for building, testing, deployment, and monitoring.


The main goal we wanted to achieve was to remove the obstacles that prevent integrating someone else's code. The system should provide easy access to ready-made working code, a clear scheme for connecting and using it, and a guarantee that projects always build (and pass tests).


As a result of the project, a single stack of infrastructure technologies emerged for the company: source code storage, code review system, build system, continuous integration system, deployment, monitoring.


Now most of the source code of Yandex projects is stored in a single repository or is in the process of moving to it.



Benefits for the company:



It should also be understood that this development model has drawbacks that need to be taken into account:



Our approach to a common repository imposes general rules that everyone must follow. A single repository places restrictions on the languages, libraries, and deployment methods used. On the other hand, in any neighboring project everything will be the same or very similar to yours, and you can even fix something there yourself.


Many large companies have come to the shared repository model. The monolithic repository is a large, well-studied, and widely discussed topic, so we won't go into it deeply here. If you would like to learn more, at the end of the article you will find several useful links that cover this topic in more detail.


Conditions in which the continuous integration system operates


Development follows the trunk-based development model. Most users work with HEAD, the most recent copy of the repository taken from the main development branch, called trunk. Commits to the repository are applied sequentially. Immediately after a commit, the new code is visible to and usable by all developers. Development in separate branches is discouraged, although branches can be used for releases.


Projects depend on each other via source code. Projects and libraries form a complex dependency graph, which means that changes made in one project potentially affect the rest of the repository.


A large stream of commits goes to the repository:



The codebase contains more than 500,000 build targets and tests.


Under such conditions, it would be very difficult to move forward quickly without a dedicated continuous integration system.


Continuous integration system


The continuous integration system launches builds and tests for each change:



Builds and tests run in parallel on large clusters of hundreds of servers, and on different platforms. On the main platform (Linux), all projects are built and all tests are run; on the other platforms, a subset configured by users.


After the build and test results are collected and analyzed, the user receives feedback, for example, if their changes break any tests.




When new build or test breakages are detected, we notify the test owners and the author of the change. The system also stores and displays check results in a special web interface, which shows the progress and result of a check broken down by test type. A check results screen may look like this:




Features and capabilities of the continuous integration system


We developed our continuous integration system while solving the various problems that developers and testers face. The system already solves many of them, but there is still much to improve.


Types and sizes of tests


There are several types of targets that the continuous integration system can launch:



Test run frequency and binary search for breaking commits


Huge resources are allocated to testing at Yandex: hundreds of powerful servers. But even with that many resources, we cannot run every test on every change that affects it. At the same time, it is very important to help the developer localize the commit where a test broke, especially in such a large repository.


Here is how we proceed. For every change, builds, style checks, and small and medium tests are run for all affected projects. The remaining tests are run not on every affecting commit, but at certain intervals, provided there are commits that affect them. In some cases users can control the run frequency; in others, it is set by the system. If a test breakage is detected, a binary search for the breaking commit begins. The less frequently a test is run, the longer the search for the breaking commit takes once a breakage is detected.
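
The search itself is an ordinary binary search over the interval of commits between the last known green run and the first red one. Below is a minimal sketch, assuming a `test_passes` callback that builds and runs the test at a given revision; the names are illustrative, not the real system's API.

```python
from typing import Callable, Sequence


def find_breaking_commit(commits: Sequence[str],
                         test_passes: Callable[[str], bool]) -> str:
    """Return the first commit in `commits` at which the test fails.

    Assumes the test passes at commits[0] and fails at commits[-1],
    i.e. the interval between the last green run and the first red
    run brackets the breakage.
    """
    lo, hi = 0, len(commits) - 1  # green at lo, red at hi
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if test_passes(commits[mid]):
            lo = mid  # still green: the breakage is later
        else:
            hi = mid  # red: the breakage is at mid or earlier
    return commits[hi]
```

Each probe requires a full build and test run at that revision, which is why rarely run tests, with their wider intervals, take longer to bisect.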




When running precommit checks, we likewise run only builds and light tests. The user can then manually trigger heavy tests, choosing from the list of affected tests provided by the system.


Flaky test detection


Flaky tests are tests whose result (Passed/Failed) on the same code can vary from run to run. The causes of flakiness differ: sleep() calls in test code, bugs in multithreaded code, infrastructure problems (unavailability of some system), and so on. Flaky tests present a serious problem:



At the moment, for every check we run all the tests twice to detect flaky ones. We also take into account complaints from users (the notification recipients). If we detect flakiness, we mark the test with a special flag (muted) and inform the test owner. From then on, only the test owners receive notifications about breakages of that test. We continue running the test in the normal mode while analyzing its run history. If the test does not flake within a certain time window, automation may decide that it has stopped flaking and reset the flag.
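
A toy sketch of this double-run check and the mute/unmute decision, assuming a `run_test` callback that returns the verdict of one launch; the names and the stability criterion are invented for the example, not the real interface.

```python
from typing import Callable, Sequence

STABLE_WINDOW = 50  # assumed number of recent runs required to unmute


def looks_flaky(run_test: Callable[[], bool]) -> bool:
    """Run the test twice on the same code; diverging verdicts mean flaky."""
    first, second = run_test(), run_test()
    return first != second


def should_unmute(recent_runs: Sequence[bool], muted: bool) -> bool:
    """Decide whether a muted test has stopped flaking.

    `recent_runs` holds the verdicts of the latest launches; here a full
    window of identical verdicts is taken as a sign of stability.
    """
    return (muted
            and len(recent_runs) >= STABLE_WINDOW
            and len(set(recent_runs[-STABLE_WINDOW:])) == 1)
```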


Our current algorithm is quite simple, and many improvements are planned here. Above all, we want to use many more useful signals.


Automatic update of test input data


When testing the most complex Yandex systems, testing with a black-box strategy plus data-driven testing is often used alongside other methods. To ensure good coverage, such tests require a large set of input data, which can be sampled from production clusters. But the data quickly becomes obsolete: the world does not stand still, and our systems are constantly evolving. Over time, outdated test data stops providing good coverage and eventually breaks the test, because programs begin to use new data that is absent from the outdated set.


To keep the data from becoming outdated, the continuous integration system can update it automatically. Here is how it works:


  1. Test data is stored in a special resource repository.
  2. The test contains metadata describing the required input data (see the sketch after this list).
  3. The correspondence between the required test input data and the resources is stored in the continuous integration system.
  4. The developer provides regular delivery of fresh data to the storage of resources.
  5. The continuous integration system searches for new versions of test data in the resource repository and switches input data.
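
As an illustration of steps 2 and 3, the test's declaration of its input data and its resolution to a concrete resource version might look roughly like this. The metadata keys and the storage client interface are assumptions made for this sketch, not the real interfaces.

```python
# Hypothetical metadata attached to a test: which resource type it needs
# and which versions of it are acceptable.
TEST_METADATA = {
    "test": "search_ranking_test",
    "input_data": {
        "type": "SEARCH_QUERIES_SAMPLE",  # resource type in the storage
        "released": "stable",             # only take released versions
    },
}


def resolve_input(storage, metadata: dict):
    """Return the newest storage resource matching the test's metadata."""
    spec = metadata["input_data"]
    candidates = storage.find(type=spec["type"], released=spec["released"])
    return max(candidates, key=lambda resource: resource.created_at)
```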

It is important to update the data in a way that does not cause false alarms. You cannot simply start using the new data from some commit onward: if the test then breaks, it will be unclear what is to blame, the commit or the new data. That would also blind the diff tests (described below).



Therefore, we arrange a small interval of commits on which the test is run with both the old and the new version of the input data.
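
Schematically, assuming the system knows the commit at which the switch-over starts and the width of the window (both values here are invented for the example):

```python
OVERLAP_COMMITS = 20  # assumed width of the switch-over window


def data_versions_for(commit_index: int, switch_index: int) -> list:
    """Which input data versions the test runs on at a given commit."""
    if commit_index < switch_index:
        return ["old"]
    if commit_index < switch_index + OVERLAP_COMMITS:
        return ["old", "new"]  # overlap: run on both versions
    return ["new"]
```

Inside the window a failure can be attributed: if the test fails on the new data but still passes on the old, the data update is to blame rather than the commit.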




Diff tests


We call diff tests a special type of data-driven test that differs from the conventional approach in that the test has no reference result, yet we still need to find the commits in which the test changed its behavior.


The standard approach to data-driven testing looks like this: the reference result is captured on the test's first run and can be stored in the repository next to the test. Subsequent runs of the test must produce the same result.



If the result differs from the reference, the developer must decide whether the change is expected or an error. If it is expected, the developer updates the reference result together with committing the code changes.
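
A minimal example of this standard approach in pytest style; the file name and the function under test are placeholders:

```python
import json
from pathlib import Path

GOLDEN = Path(__file__).parent / "golden.json"  # stored next to the test


def function_under_test(queries):
    """Placeholder for the component being tested."""
    return sorted(queries)


def test_matches_reference():
    result = function_under_test(["query-b", "query-a"])
    expected = json.loads(GOLDEN.read_text())
    # If the behavior change is expected, the developer regenerates
    # golden.json and commits it together with the code change.
    assert result == expected
```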


Using this approach in a large repository with a large stream of commits runs into difficulties:


  1. There can be many tests, and the tests can be very heavy. The developer has no way to run all the affected tests in their working environment.
  2. After a change lands, the test may break if the reference result was not updated together with the code changes. Then another developer changes the same component, and the test result changes again. One error is layered on top of another; such situations are very hard to untangle, and they cost developers time.

What we do. Diff tests consist of two parts: a check component, which runs the test at a given revision and saves its result, and a diff component, which compares the results of two check runs.



The continuous integration system controls the launching of the check and diff components.
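
Schematically, under assumed interfaces (a `run_test` callback and a result `store`, both invented for this sketch), the two components might look like this:

```python
def check(revision: str, run_test, store) -> str:
    """Run the test at `revision`, persist its output, return its id."""
    output = run_test(revision)
    return store.save(revision, output)


def diff(store, old_id: str, new_id: str) -> bool:
    """Return True if two stored outputs differ, i.e. behavior changed."""
    return store.load(old_id) != store.load(new_id)
```

When `diff` reports a change, the commit range is bisected with the same kind of binary search used for ordinary breakages.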




If the continuous integration system detects a diff, it first performs a binary search for the commit that caused the change. After receiving a notification, the developer can examine the diff and decide what to do next: acknowledge the diff as expected (which requires a special action) or fix or roll back the changes.


To be continued


In the next article, we will describe how the continuous integration system is structured internally.


Links


Monolithic repository, Trunk-based development



Data-driven testing


Source: https://habr.com/ru/post/428972/

