Software Configuration Management // Distributed Version Control

Greetings. As promised, here is the next installment in the series of notes on software configuration management (SCM). The whole series can be found under the CM tag. Only a couple of topics remain to be covered.

Today we will talk about a rather controversial and somewhat provocative topic: distributed version control systems. I know that such systems are popular among Habr readers, so I am prepared for a discussion in advance. Moreover, I urge you not to pass by and to speak up if you have something to say on the subject.

So, there is a project, and in it a version control system that serves the several teams implementing the project. There is one version control system for everyone. Let me remind you that this continues a series of notes, and earlier I talked about version control in general, without going into specific implementations. So the subject matter is gradually moving from simple to complex.
At some point the need arises to make the central repository available locally in one of the development centers, in order to speed up work and get around traffic or bandwidth restrictions. For example, there are two teams in different places and time zones, say the Russian Far East and the US Central Time Zone, with half a world between them. They work on one project and need to change the same parts of the product. Assume that the version control server is in the USA. Developers in Russia then have to send changes across half the globe to create each new version. And any operation such as switching to another branch and fetching the whole configuration will take far too long, given the network latency. In general, centralized storage is not the most convenient option in such situations.
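To get a feel for the scale of the problem, here is a back-of-the-envelope sketch. All figures in it (number of round trips, round-trip time, payload size, bandwidth) are assumptions chosen for illustration, not measurements from any real system, and the function name is hypothetical.

```python
def transfer_time_seconds(round_trips, rtt_ms, payload_mb, bandwidth_mbit_s):
    """Rough estimate of how long an operation against a remote server takes.

    The model is deliberately simple: every request waits one full round trip,
    and the payload is then sent at the full nominal bandwidth.
    """
    latency_part = round_trips * rtt_ms / 1000.0        # seconds spent waiting
    throughput_part = payload_mb * 8 / bandwidth_mbit_s  # seconds spent sending
    return latency_part + throughput_part


# Assumed example: switching a branch touches 2000 files, one request each,
# over a 250 ms link, fetching 100 MB in total at 10 Mbit/s.
print(transfer_time_seconds(2000, 250, 100, 10))  # 580.0
```

Under these assumptions most of the time is spent waiting on round trips rather than transferring data, which is why a local copy of the repository helps so much.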

Since the problem is not new and still relevant, different approaches to solving it have been formulated over time. Or rather, two approaches to building distributed version control systems.

Open distribution is a design in which each working copy of a configuration can have its own set of child versions, and the versions created are exchanged at the discretion of the author of the change.

The advantage of such systems is that work on an individual workstation can proceed independently of other instances of the repository. In fact, there may be no central repository at all: there are as many repositories as there are copies. Not surprisingly, such systems found their first use mainly in Open Source. Since there is no need to maintain a separate server, only the information that is actually needed gets exchanged, without loading storage and traffic with deltas that someone may never need.
The disadvantage of this approach is that the exchange of work product deltas is hard to control centrally. The result is a kind of Brownian motion of deltas, which many managers who are used to centralization may not like.
Examples of such systems are BitKeeper, git and Mercurial (Hg).
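The idea can be sketched as a toy model. This is not how git or Mercurial actually store or transfer history (real systems exchange commits together with their ancestry); the `Repo` class and its methods are hypothetical names used only to show that every copy is a repository and that the exchange is selective.

```python
class Repo:
    """A toy repository: every working copy holds its own set of versions."""

    def __init__(self, name):
        self.name = name
        self.versions = {}  # version id -> content

    def commit(self, version_id, content):
        """Create a new version locally; nobody else sees it yet."""
        self.versions[version_id] = content

    def pull(self, other, version_ids):
        """Copy only the chosen versions from another repository."""
        for vid in version_ids:
            if vid in other.versions:
                # Keep our own copy if we already have this version.
                self.versions.setdefault(vid, other.versions[vid])


alice = Repo("alice")
bob = Repo("bob")
alice.commit("a1", "fix parser")
alice.commit("a2", "private experiment")

# Only the version that was offered and requested gets exchanged.
bob.pull(alice, ["a1"])
print(sorted(bob.versions))  # ['a1']
```

Note that there is no server object anywhere in the sketch: any two copies can exchange versions directly.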

Distribution by replication means creating equal copies of the central data repository (or of its parts) on all distributed servers. An analogy can be drawn here with databases and their replication. For each developer, the version repository to which they are connected is the primary one. All versions and branches are created in the central repository or in a replica. To distribute the data, the repository is copied to other available servers, and some of the developers switch to the copy. When work results need to be exchanged, the repositories are replicated: both servers exchange meta-information.

The advantage of this approach is that work stays centralized within each team location. It is also worth adding that part of the accumulated information can be excluded from synchronization with other teams while still being available to the entire local team. This is important when the code of the subsystems under development must not leak outside, even to other teams working on the same product.

The downside is the need to configure the replication mechanisms. As a rule, though, systems that use this approach provide tools for efficient data exchange. In addition, some may count it as a minus that all version operations are performed on a single server rather than on the developer's local computer. That is, the "distribution" of the system shows up at the level of teams and their locations, but not at the level of the individual developer.
Examples of replication-based systems are ClearCase and Perforce.
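Replication can be sketched in the same toy style. This is only an illustration of the idea described above, not the protocol of ClearCase, Perforce or any other product; the `Replica` class, the `private` flag and the `synchronize` function are hypothetical.

```python
class Replica:
    """A toy site-level replica of the repository, shared by one local team."""

    def __init__(self, site):
        self.site = site
        self.versions = {}    # version id -> content
        self.private = set()  # version ids that must never leave this site

    def add(self, version_id, content, private=False):
        self.versions[version_id] = content
        if private:
            self.private.add(version_id)

    def export_packet(self):
        """Everything this site is willing to share with other replicas."""
        return {vid: c for vid, c in self.versions.items()
                if vid not in self.private}

    def import_packet(self, packet):
        for vid, content in packet.items():
            self.versions.setdefault(vid, content)


def synchronize(a, b):
    """Two-way replication: each side sends its shareable versions."""
    packet_a = a.export_packet()
    packet_b = b.export_packet()
    a.import_packet(packet_b)
    b.import_packet(packet_a)


east = Replica("east")
west = Replica("west")
east.add("e1", "public change")
east.add("e2", "closed subsystem", private=True)
west.add("w1", "another change")

synchronize(east, west)
print(sorted(west.versions))  # ['e1', 'w1']
print(sorted(east.versions))  # ['e1', 'e2', 'w1']
```

The private version stays visible to everyone connected to its own replica but never reaches the other site, and the unit of exchange is the whole shareable part of the repository rather than a hand-picked version.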

Both types (open and replicated) are similar to each other: in both cases information is exchanged between different copies of the same set of elements and their versions. The difference between them is one of scale. In systems with replication, the minimum unit of replication is usually a repository or a significant part of it, processed as a whole. In systems with open distribution, the minimum unit of exchange is a single version of an individual element.
Both types of distribution share a common problem: the need for a clear agreement on the naming of elements and their branches, as well as of the labels that mark the resulting configurations. When work results are combined, you must not end up with different files that have the same name and the same meta-information (branches, labels, attributes). Therefore all developers and teams working separately have to follow common standards. Different systems provide different mechanisms to enforce this. For example, in ClearCase triggers can be attached to the creation of any meta-information to check it against the standard: every branch name must contain the code (or identifier) of the site (team) where the branch was created.
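A naming check of this kind might look like the sketch below. It is not ClearCase trigger code; the site codes, the `<site>_<name>` convention and the function name are all assumptions made up for the example.

```python
import re

# Hypothetical site codes; a real project would define its own list.
SITE_CODES = {"vld", "chi"}

# Assumed convention: <site>_<free-form name>, for example "vld_bugfix_1234".
BRANCH_PATTERN = re.compile(r"^(?P<site>[a-z]+)_(?P<name>[a-z0-9_]+)$")


def check_branch_name(branch, local_site):
    """Return True if the branch name carries the code of the creating site."""
    match = BRANCH_PATTERN.match(branch)
    if match is None:
        return False
    site = match.group("site")
    return site in SITE_CODES and site == local_site


print(check_branch_name("vld_bugfix_1234", "vld"))  # True
print(check_branch_name("bugfix_1234", "vld"))      # False
print(check_branch_name("chi_feature", "vld"))      # False
```

With such a check in place, two sites cannot both create a branch with the same name, because each site can only create names that start with its own code.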

In addition, systems with open distribution effectively leave it to each individual developer to decide what to hand over to the team as a delta and what to keep out of public view. Whether this is good or bad depends on the culture adopted within the project. For more centralized systems, with repository replication, the problem looks different. When everyone is obliged to commit their changes to the version control system that is central for their team, the meta-information base grows quickly, which affects both the cost of storage and the speed of replication between geographically separated databases.

Which approach is better cannot, of course, be decided once and for all projects. It is up to the management of each project to decide whether to bring the Brownian motion of git to a more stable state. There is no single solution for all teams and projects. Those interested in the differences between particular systems and models can follow link [1].

By the way, not only version control systems can be distributed, but also change request tracking systems. The logic is exactly the same, except that here the basic model is replication. An example is IBM Rational ClearDDTS. Since such systems are not very common, we will not dwell on them in detail.

By tradition, here are the sources used and recommended for further reading:

[1] en.wikipedia.org/wiki/Comparison_of_revision_control_software - a comparison of version control systems;
[2] lib.custis.ru/index.php/The_Risks_of_Distributed_Version_Control - a sober look at the risks associated with distributed version control systems.

I would also recommend an article about the problems of adopting different version control systems:
lib.custis.ru/index.php/Version_Control_and_%E2%80%9Cthe_80%25%E2%80%9D
It is a bit provocative, but worth reading above all for those who consider themselves the progressive part of the programming community. And, to answer the question in advance: no, I'm not a fan of either SVN or git.

That's all for today. Do not pass by, speak up. It would be interesting to hear from people who have used Perforce.
Fans of git / Hg / etc: it would be interesting to hear about the non-obvious problems that arise from exchanging deltas (there must be some, nothing is ever perfectly smooth).
If anyone has replicated repositories in SVN or even CVS, please tell us about it, thank you.

Well, to be continued.

Source: https://habr.com/ru/post/72370/
