Software Configuration Management // Version Control

Hello again.

I am continuing the series of articles about SCM, software configuration management.
The three previous notes can be read in this same blog.

Today I will talk about what most readers work with - version control.

Disclaimer

The main techniques implemented in the vast majority of version control systems are described below. How they are implemented in the particular applications the reader uses is left to the numerous user guides, how-tos, FAQs and other documents, which are easy to find. The main thing is to understand the principles and why things work this way.

What is it about?

A version control system is software that lets you create versions of elements and work with these versions as if they were separate elements. In English-language sources the term is version control system, abbreviated VCS. Working with versions involves both creating the versions themselves and maintaining the structure that stores them. As a rule, that structure is either a chain or a tree.

Before working with elements and their versions, you need to create these elements, i.e. let the version control system take existing real-world objects and put them under control. The first version of an element is always created together with the element itself.
Most often, the elements placed under version control are:

files;
directories;
hard and soft links.

Inside the version control system, the elements themselves can be stored in different ways - that depends on the VCS architects. The user only needs to know that an element is placed in the repository and is worked with using the commands of the chosen tool.

Branching and merging versions

As already mentioned, version control systems must provide structures for storing versions. The most common such structure is the version tree. It organizes the versions of an element so that a new sequence of versions can be started from any existing version. Each such sequence, originating from an arbitrary version, is called a branch. And since a branch contains versions, each of them can in turn be the source of other branches. In short, a tree.

The name of the model speaks for itself: a plant (an element) has buds and leaves (versions), and branches grow from them in turn. On the branches there are more leaves (other versions) and other branches, and the same vegetation grows on those again. The result is a tree whose crown is a multitude of versions. One element - one tree.
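The tree described above can be sketched as a small data structure. This is only an illustration under assumptions: the class names `Version` and `Branch`, the helper functions and the branch names `main` and `feature` are hypothetical and do not come from any particular VCS.

```python
from dataclasses import dataclass, field


@dataclass
class Version:
    """One version of an element; any version may sprout new branches."""
    number: int
    branches: dict = field(default_factory=dict)  # branch name -> Branch


@dataclass
class Branch:
    """A named sequence of versions."""
    name: str
    versions: list = field(default_factory=list)

    def add_version(self):
        # Versions on a branch are numbered 1, 2, 3, ... in creation order.
        version = Version(number=len(self.versions) + 1)
        self.versions.append(version)
        return version


def grow_branch(parent, name):
    """Create a new branch that originates from the given version."""
    branch = Branch(name)
    parent.branches[name] = branch
    return branch


def count_versions(branch):
    """Count every version reachable from a branch, sub-branches included."""
    total = 0
    for version in branch.versions:
        total += 1
        for child in version.branches.values():
            total += count_versions(child)
    return total


# One element - one tree: the root branch holds the first version.
main = Branch("main")
v1 = main.add_version()
v2 = main.add_version()
feature = grow_branch(v1, "feature")  # a branch growing from version 1
feature.add_version()
feature.add_version()
total_versions = count_versions(main)  # two on main plus two on feature
```

Because every version carries its own dictionary of branches, a branch can be grown from any version at any depth, which is exactly what turns a chain into a tree.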

Why is this whole structure needed? Is it really impossible to simply line versions up one after another? Of course it is possible. However, this immediately limits the use of such a system. If versions appear strictly one after another, then at any given time only one of the users can create a new version, and the rest have to wait. Moreover, whenever a new version appears, everyone else has to combine their own changes with it. And so on, until everyone has put their work into the chain of versions. Each person also has to make sure that merging the versions has not broken the system. Besides, until all the changes are put under control this way, everyone who is waiting has to keep intermediate results somewhere locally, without mixing them with what is currently in use. It is fine when a couple of people work on a dozen elements - they can always come to an agreement. But what if the scale is much larger? Add a dozen people (even without increasing the number of elements), and such simple chains bring the work to a complete stop. In general, a linear version structure gives rise to many difficulties.

So it is clear that we cannot do without branches. But surely a branch should not be grown at a developer's slightest sneeze? Let's see in which cases branches are grown. Typical examples of branches are:

change request branch - created for the versions produced in the course of work on a change request (a "developer" or "CR" branch);
integration branch - serves as intermediate storage for the stabilization process;
release branch - holds versions once the configuration is stabilized (see the corresponding section in the first part of the series). Some versions on this branch can later be declared part of a basic configuration;
debug branch - for short-term storage of versions, mainly in order to try out some solution.

Scheme 1. The element.c version tree

Scheme 1 shows an example version tree. The element.c file has a release branch, release_1.x, where the stabilized versions of this element (1-5) are placed. To store the delta for each change request, a separate branch is created with a special name format. In our case the format is rec<record_number>_<user_name>, where record_number is the ID of the change request in the tracking system. To integrate deltas from different developers, integration branches are created with names like int_<username>_<suffix>, where the suffix holds a description of the integration or the number of the configuration being stabilized. You can also see a debug branch; these are most often named dbg_<username>_<arbitrary_comment> and hold trial variants of changes.
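Naming conventions like these are easy to check mechanically. The sketch below is an assumption-laden illustration rather than part of any real tool: the function `classify_branch` and the patterns are hypothetical, user names are assumed to contain only letters and digits, and the trailing comment on change request and debug branches is treated as optional.

```python
import re

# Hypothetical patterns for the branch name formats used in this article.
BRANCH_PATTERNS = {
    "change request": re.compile(
        r"^rec(?P<record>\d+)_(?P<user>[A-Za-z0-9]+)(?:_(?P<comment>.+))?$"
    ),
    "integration": re.compile(r"^int_(?P<user>[A-Za-z0-9]+)_(?P<suffix>.+)$"),
    "debug": re.compile(r"^dbg_(?P<user>[A-Za-z0-9]+)(?:_(?P<comment>.+))?$"),
    "release": re.compile(r"^release_(?P<line>.+)$"),
}


def classify_branch(name):
    """Return (kind, fields) for a branch name, or (None, {}) if unknown."""
    for kind, pattern in BRANCH_PATTERNS.items():
        match = pattern.match(name)
        if match:
            return kind, match.groupdict()
    return None, {}


kind, fields = classify_branch("rec98_user2")
# kind is "change request"; fields holds the record number and the user name
```

A check like this could run when a branch is created, so that names violating the project policy are rejected before any versions are placed on them.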

How each branch in the example was grown is described in more detail below.
Each project may have its own ways of creating and naming branches, but the main kinds are listed above. If product lines are used, all of the listed kinds of branches become necessary.

The version tree grows and spreads, and sooner or later the results of the work have to be combined. For example, a developer has grown a branch for one of the elements to work on a change request. He put several versions on it, and the last one contains debugged and tested code. At the same time, there is a release branch where versions are published as part of basic configurations and stable releases. The results have to be combined.

For this, a version merge mechanism is used. As a rule, it means creating a new version of the element: the version on the selected branch (the base) is taken as the basis, and the changes contained in the selected version from another branch (the source) are applied to it. In English-language sources the term merge is used.

The branch with the source version may have been grown either from the base version or from one of its earlier ancestors. Existing VCSs let you merge both manually and automatically, and the automatic way is the main one. A manual merge is requested only in case of conflicts.

Merge conflicts arise when the same fragment has been changed in both versions of an element. Such a situation arises when the ancestor of the source version is not the version from which the new version will grow. A typical example of such a conflict is the revision history that is added to the beginning of a source file so that every version shows at once who changed it last and what was done. When versions from different sources are merged, this line will certainly cause a conflict, and it is resolved only by inserting both lines into the history. In more complicated cases, the developer or an expert in the affected code has to make the necessary changes manually and carefully.

On the question of common ancestors and merging changes: besides being manual or automatic, a merge can be two-way or three-way. A two-way merge is performed by simply comparing the two versions and applying their delta (the difference between the versions of the element). The algorithm works on the principle of diff, or close to it: take the delta and insert, delete or change the necessary lines.
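The "take the delta and apply it" idea can be sketched with Python's standard `difflib` module. This is a minimal illustration, not how any specific VCS does it; the helper names `compute_delta` and `apply_delta` and the sample lines are hypothetical.

```python
import difflib


def compute_delta(old_lines, new_lines):
    """Describe how to turn old_lines into new_lines.

    Each entry is (tag, start, end, replacement): replace old_lines[start:end]
    with replacement. The tag is 'replace', 'delete' or 'insert'.
    """
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    delta = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            delta.append((tag, i1, i2, new_lines[j1:j2]))
    return delta


def apply_delta(old_lines, delta):
    """Apply a delta produced by compute_delta to old_lines."""
    result = list(old_lines)
    # Work from the end of the file so that earlier indices stay valid.
    for _tag, start, end, replacement in reversed(delta):
        result[start:end] = replacement
    return result


old = ["int a;", "int b;", "return a;"]
new = ["int a;", "long b;", "int c;", "return a;"]
delta = compute_delta(old, new)
rebuilt = apply_delta(old, delta)  # equals new
```

Note that only two versions take part here: the delta says what differs, but nothing in it tells which side made the change. That is the limitation the three-way approach addresses.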

A three-way merge takes into account the “common ancestor” of both versions and calculates the deltas from the history of changes to the element on the corresponding branches. Accordingly, in the event of a merge conflict the developer is offered three versions of the element: the common ancestor and the two variants it has turned into over time. This approach helps to assess the extent and importance of the delta on each branch and to decide whether the conflicting piece needs to be integrated, often without involving the authors of the changes.
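A deliberately simplified three-way merge might look like the sketch below. It is an assumption-based illustration, not the algorithm of any real VCS: the function names are hypothetical, the merge works on whole lines, and changes from the two sides that overlap or even touch in the common ancestor are conservatively reported as conflicts unless they are identical.

```python
import difflib


def changes(base, other):
    """Edits turning base into other, as (start, end, new_lines) on base."""
    matcher = difflib.SequenceMatcher(a=base, b=other, autojunk=False)
    return [
        (i1, i2, tuple(other[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def three_way_merge(base, ours, theirs):
    """Return (merged_lines, conflicts); merged_lines is None on conflict."""
    ours_edits = changes(base, ours)
    theirs_edits = changes(base, theirs)

    # Two different edits conflict when their ranges in base overlap or touch.
    conflicts = [
        (mine, yours)
        for mine in ours_edits
        for yours in theirs_edits
        if mine != yours and mine[0] <= yours[1] and yours[0] <= mine[1]
    ]
    if conflicts:
        return None, conflicts

    merged = list(base)
    # Identical edits from both sides are applied once; apply from the end
    # of the file so that earlier indices stay valid.
    for start, end, new_lines in sorted(set(ours_edits + theirs_edits), reverse=True):
        merged[start:end] = new_lines
    return merged, []


base = ["a", "b", "c", "d"]
ours = ["a", "B", "c", "d"]      # one side changed the second line
theirs = ["a", "b", "c", "D"]    # the other side changed the fourth line
merged, conflicts = three_way_merge(base, ours, theirs)

clash = ["a", "x", "c", "d"]     # also changes the second line, differently
failed, clash_conflicts = three_way_merge(base, ours, clash)
```

In the first call the two deltas touch different places of the ancestor, so both are applied automatically. In the second call both sides rewrote the same line, and the function hands back the competing edits for a person to resolve.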

After a merge is completed, information about it should be saved if possible. As a rule, most mature VCSs can store “merge arrows” - meta-information about from where, to where and at what point in time the changes were merged, and who did it.
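The meta-information listed above fits into a very small record. The sketch below is hypothetical: the class name `MergeArrow`, its field names, the version paths, the author and the timestamp are all made up for illustration, and real systems store such data in their own formats.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class MergeArrow:
    """Who merged what into what, and when."""
    source: str          # version the changes were taken from
    target: str          # version created by the merge
    author: str          # user who performed the merge
    merged_at: datetime  # moment the merge was recorded


def describe(arrow):
    """Render a merge record as a one-line log entry."""
    return (
        f"{arrow.author} merged {arrow.source} into {arrow.target} "
        f"at {arrow.merged_at.isoformat()}"
    )


# All values below are arbitrary sample data.
arrow = MergeArrow(
    source="/main/feature/2",
    target="/main/3",
    author="alice",
    merged_at=datetime(2000, 1, 1, 12, 0, tzinfo=timezone.utc),
)
log_line = describe(arrow)
```

Keeping such records makes it possible to answer later which versions already contain a given change.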

An example of branching and merging

Consider an example: the version tree of an element in scheme 2, which shows the order in which branches on it are grown and merged. As you may have guessed, the tree is taken entirely from scheme 1, with merge arrows added to it.

Scheme 2. Example of merging changes between different branches

So, the project produces a certain product that includes the file element.c. To store stabilized versions, the team agreed that all stable or basic versions are kept on the “release_1.x” branch. We will call it the release branch. Our element is no exception, and its initial version 1 is created on the release branch.
For simplicity, we will describe branches as if they were directories on disk. Accordingly, the first version is called /release_1.x/1.
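Since versions are written here as directory-like paths, they can be taken apart with ordinary path tools. The sketch below assumes this article's notation only; the function `parse_version_path` is a hypothetical helper, not a command of any VCS.

```python
from pathlib import PurePosixPath


def parse_version_path(path):
    """Split a path like /release_1.x/1 into (branch chain, version number)."""
    # Drop the leading "/" component; what remains is branches plus a number.
    parts = [part for part in PurePosixPath(path).parts if part != "/"]
    *branches, number = parts
    return branches, int(number)


branches, number = parse_version_path("/release_1.x/1")
# branches is ["release_1.x"], number is 1
```

The last component is always the version number, and everything before it is the chain of branches leading to that version, starting from the release branch.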

Next, one of the managers made entry number 98 in the change request tracking system (from here on we will simply call this system the “bugtracker”), describing new functionality required in the product. And, of course, he assigned one of the users to be responsible for this task - let it be user2. user2 thought for a while and started working on the task, and after some time decided to put the resulting sources under version control. According to the naming standards adopted by the project (the CM policies), a branch for making changes in our project is named rec<record_number>_<user>[_<comments>]. Therefore the new branch was named rec98_user2; its creator abstained from comments. The work is in full swing: version /release_1.x/rec98_user2/1 appears, and then /release_1.x/rec98_user2/2.
Let's leave the developer user2 at this point for now and let him think about the task. While he was working, record (CR) number 121 was registered in the bugtracker, describing a new error found by the testers. This record was assigned to user1, and he set about correcting the error. Along the way he decided to start a branch to save the results. Following the project policies, he named the new branch rec121_user1. Note that by the time he started work and created the branch, someone had already added another stable version to the release branch - /release_1.x/2. Therefore the branch grows from the latest version at that time (the second). Once the branch is created, versions can be added to it. The end result is version /release_1.x/rec121_user1/2.

What's next? The error has been corrected and tested (we will leave that layer of work behind the scenes for now), so it is time to make these changes part of a stable configuration and, possibly, of a new basic configuration. This is where the work of the CM engineer, or of the team member who performs this role, begins. With the help of ~~a crowbar and a sledgehammer~~ the merge command, he creates a new version on the release branch - /release_1.x/3. Note the arrow with the number 1 - it shows exactly this merge.

Let's go back to user2. He has just come up with some changes for his task, but first decided to quickly check what he had and let his colleagues take a look at his solution. To do this, he creates a debug branch. The CM policy of the project says that it should be named dbg_<user>[_<comment>]. Accordingly, the new branch is called /release_1.x/rec98_user2/dbg_user2. The user creates version /release_1.x/rec98_user2/dbg_user2/1 on it. It was decided to accept the solution into the main code, so the author merged the new delta into the version from which the debug branch had grown. Along the way he cleaned up and optimized the code so that it would be no shame to hand it over for integration; the result was version /release_1.x/rec98_user2/3. The bright arrow number 2 shows this merge.

However, user2 finds out that while he was working, a serious error, filed as CR #121, was corrected, and that this fix may affect the new functionality. The decision is made to combine both deltas and see what happens. Versions /release_1.x/rec98_user2/3 and /release_1.x/rec121_user1/2 are merged to form version /release_1.x/rec98_user2/4, and merge arrow number 3 appears. The new version is checked for operability and errors, and the decision is made: it must be integrated! The CM engineer takes up his tools again and makes version /release_1.x/4, drawing the corresponding arrow number 4 to it (any coincidence of the numbers is accidental).

However, life does not stand still. While our two developers were making and merging their deltas, other team members had already changed the same file. Two CRs were filed, 130 and 131, and then assigned to user3. He completed them successfully and made two branches, one per record. Since the tasks were set and solved at different times, the branches for them grew from different versions on the release branch. The result was versions /release_1.x/rec130_user3/1 and /release_1.x/rec131_user3/1, which were generated from version /release_1.x/3.

There are changes, so they need to be combined, stabilized and, if everything is fine, turned into a basic configuration. For this purpose the CM engineer, who goes by the operational alias user7 in the version control system, creates an integration branch, which in this project is named int_<user>_<future-release-number>. So the /release_1.x/int_user7_1.5 branch appears. Both deltas are merged onto it: first the changes for record 130, forming version /release_1.x/int_user7_1.5/1, then those for record 131, for which version 2 is created on the same branch. Merge arrows are drawn for all these operations.

The CM engineer's final chord is merging version /release_1.x/int_user7_1.5/2 onto the release branch to form version /release_1.x/5. Later this version will become part of the basic configuration of the product.
Such is the rather lengthy description of a small picture. A picture really is worth hundreds of words.
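For readers without the picture at hand, the merges of this walkthrough can be written down as plain data. This is a sketch under assumptions: the arrow directions are my reading of the text above, and the names `MERGE_ARROWS` and `merged_from` are hypothetical.

```python
# (source version, version created by the merge), in the order of the story.
MERGE_ARROWS = [
    ("/release_1.x/rec121_user1/2", "/release_1.x/3"),                        # arrow 1
    ("/release_1.x/rec98_user2/dbg_user2/1", "/release_1.x/rec98_user2/3"),   # arrow 2
    ("/release_1.x/rec121_user1/2", "/release_1.x/rec98_user2/4"),            # arrow 3
    ("/release_1.x/rec98_user2/4", "/release_1.x/4"),                         # arrow 4
    ("/release_1.x/rec130_user3/1", "/release_1.x/int_user7_1.5/1"),
    ("/release_1.x/rec131_user3/1", "/release_1.x/int_user7_1.5/2"),
    ("/release_1.x/int_user7_1.5/2", "/release_1.x/5"),
]


def merged_from(target, arrows=MERGE_ARROWS):
    """Return the versions whose changes were merged to create target."""
    return [source for source, dest in arrows if dest == target]


# For every version on the release branch, list the sources of its merge.
release_sources = {
    f"/release_1.x/{n}": merged_from(f"/release_1.x/{n}") for n in range(1, 6)
}
```

Versions 3, 4 and 5 on the release branch each have exactly one incoming arrow. Version 1 has none because it was created together with the element, and version 2 has none either.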

An attentive reader probably has a question: if everything is done through branches and merge arrows, where does version /release_1.x/2 come from? After all, not a single arrow from any branch leads to it! A natural question, and the answer is natural too. Yes, there are situations when changes are made directly on the release branch. For example, a terrible mistake was made in the first version - someone forgot to add a comment to the revision history section about who made the changes! Of course, this is a joke; no one will violate the policy for the sake of such trifles. Still, it does happen. The main thing is to know exactly who created the new version and why. Best of all is when the version control system lets you restrict the rights to create versions for each branch separately. In that case we additionally protect the project by giving the right to add versions on the release branch only to the CM engineer. At the very least, with such a restriction it will be easier to find whom to blame :).
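A per-branch restriction of this kind can be pictured as a simple lookup table. The sketch is purely illustrative: the table `BRANCH_WRITERS` and the function `may_create_version` are hypothetical, and real systems express such rights through their own locks, triggers or access lists. Branches absent from the table are assumed to be open to everyone.

```python
# Hypothetical access table: branch name -> users allowed to add versions.
# user7 is the CM engineer from the example above.
BRANCH_WRITERS = {
    "release_1.x": {"user7"},
}


def may_create_version(user, branch, writers=BRANCH_WRITERS):
    """Return True if the user may add a version on the given branch."""
    allowed = writers.get(branch)
    # Branches without an entry are not restricted in this sketch.
    return allowed is None or user in allowed


cm_engineer_ok = may_create_version("user7", "release_1.x")      # allowed
developer_blocked = may_create_version("user1", "release_1.x")   # refused
own_branch_ok = may_create_version("user1", "rec121_user1")      # allowed
```

With such a table, developers keep full freedom on their own branches while the release branch accepts versions only from the person responsible for it.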

After all of the above, need it be said that the ability to work with branches is in fact basic functionality of any mature version control system? Without branches, a version control system can be considered one only formally - simply because it can store and hand out versions, but no more.

Summary

The above showed how the main functions of any version control system work. How they are implemented in specific applications is something everyone can look up in the system they personally use.
Take another look at your tools with a cool head and compare them with what is described here.

By tradition, here is a list of recommended materials for independent, thoughtful reading.

en.wikipedia.org/wiki/Comparison_of_revision_control_software - a great comparison of existing version control systems;
www.cmcrossroads.com/bradapp/acme/branching - a good article on branching policies; it describes many branching patterns suitable for different projects;
www.infoq.com/articles/agile-version-control - a sensible article on how to organize the growing and merging of branches using agile techniques. Thanks to sse for the link.

To complete the picture of how version control systems are used, it remains to describe how this control is distributed among development teams located in different geographic locations. But more about that in the next article.

To be continued.

PS I publish in Project Management rather than in Version Control Systems simply so that the whole series of notes is published in a uniform way.

Source: https://habr.com/ru/post/68932/
