By the nature of my work, I often witness "holy wars" between fellow programmers over which version control system to choose for a project. The choice matters most for projects with a long history of development and maintenance. There are many options, but I want to concentrate on the two that are, in my opinion, the most promising: Mercurial and Git. Below, we will compare the capabilities of both systems from the perspective of their internal structure.
A bit of history
Both Mercurial and Git were created in response to a single event in 2005: the Linux kernel project lost the ability to use the BitKeeper version control system for free. After three years with BitKeeper, the kernel developers had grown accustomed to its distributed workflow. Automated patch handling greatly simplified recording and merging changes. Having a long history available also made it possible to track down regressions.
The hierarchical organization of developers had become another important part of the Linux kernel development process. At the top of the hierarchy stood the Dictator, Linus Torvalds, with many Lieutenants in charge of individual kernel subsystems. Each Lieutenant accepted or rejected individual changes within his subsystem. Linus, in turn, pulled their changes and published them in the official Linux kernel repository. Any tool that replaced BitKeeper had to support this process.
The third critical requirement for the future system was performance when handling large numbers of changes and files. The Linux kernel is a very large project that accepts thousands of individual changes from thousands of different people.
None of the many existing tools was suitable. Almost simultaneously, Matt Mackall and Linus Torvalds released their own version control systems: Mercurial and Git, respectively. Both systems drew on ideas from the Monotone project, which had appeared two years earlier.
Similarity
Both version control systems have a number of common features:
- revisions are identified by checksums (SHA-1 hashes);
- history forms a directed acyclic graph (DAG);
- high-level operations are supported, including bisection, branching and selective commits (cherry-picking).
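Bisection, listed above, is essentially a binary search over history for the first revision that introduces a bug. The Python sketch below is a simplified illustration, not how either tool implements it. It assumes a linear list of revisions in which the first is known to be good and the last known to be bad, whereas hg bisect and git bisect work on the full DAG. The function name bisect_first_bad is hypothetical.

```python
from typing import Callable, Sequence


def bisect_first_bad(history: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad revision in a linear history.

    Hypothetical helper: assumes history[0] is good, history[-1] is bad,
    and that every revision after the first bad one is also bad.
    """
    good, bad = 0, len(history) - 1
    # Invariant: history[good] is good and history[bad] is bad.
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(history[mid]):
            bad = mid
        else:
            good = mid
    return history[bad]


revisions = [f"r{i}" for i in range(10)]
print(bisect_first_bad(revisions, lambda rev: int(rev[1:]) >= 6))  # r6
```

In this run only three of the ten revisions (r4, r6 and r5) are actually tested. A long history makes bisection most valuable for exactly this reason.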
Differences
Despite their shared ideas and high-level functionality, the two systems differ considerably in their low-level implementation.
History storage
Both Git and Mercurial identify file versions by their checksums. The checksums of individual files are combined into manifests. In Git, manifests are called trees, and a tree can point to other trees (subdirectories). Each manifest is tied directly to a revision (commit).
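The Python sketch below makes the checksum-and-manifest scheme concrete using Mercurial-style identifiers. As far as I know, a Mercurial revision ID is the SHA-1 of the two parent IDs (in sorted order) followed by the content. The sketch assumes first revisions with no parents and ignores copy metadata and file flags. It also uses a simplified manifest resembling Mercurial's: one "path, NUL, hex ID" line per file. The helper names hg_node_id and build_manifest are hypothetical.

```python
import hashlib

NULL_ID = b"\x00" * 20  # stands in for a missing parent


def hg_node_id(text: bytes, p1: bytes = NULL_ID, p2: bytes = NULL_ID) -> bytes:
    """SHA-1 over the sorted parent IDs followed by the content (simplified)."""
    first, second = sorted((p1, p2))
    h = hashlib.sha1()
    h.update(first)
    h.update(second)
    h.update(text)
    return h.digest()


def build_manifest(files: dict) -> bytes:
    """Combine per-file checksums into one manifest text, sorted by path."""
    lines = []
    for path in sorted(files):
        node = hg_node_id(files[path])
        lines.append(path.encode() + b"\x00" + node.hex().encode() + b"\n")
    return b"".join(lines)


manifest = build_manifest({"README": b"hello\n", "src/main.py": b"print('hi')\n"})
manifest_id = hg_node_id(manifest)  # the manifest is hashed just like a file
print(manifest_id.hex())
```

Changing any file changes its checksum. That in turn changes the manifest text and the manifest's own ID. A commit refers to that single manifest ID, so one hash ends up covering the whole tree.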
Mercurial uses a dedicated storage engine, Revlog, to improve performance. Each file placed in the repository is associated with two others: an index and a data file. Data files contain full snapshots and deltas, and a new full snapshot is written only when the number of changes to a file exceeds a certain threshold. The index provides efficient access to the data file. Deltas produced by modifying version-controlled files are only ever appended to the data files. The index is also used to assemble pieces from different places in the data file into a single revision. Revisions of individual files are combined into manifests, and manifests into commits (changesets). This approach has proven very efficient for creating revisions, looking them up and computing differences between them. Its other advantages include compact use of disk space and a fairly efficient protocol for transferring changes over the network.
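The snapshot-plus-deltas idea can be illustrated with a toy append-only store in Python. This is a sketch under assumptions, not Mercurial's on-disk format. Deltas are lists of (start, end, replacement) hunks computed with difflib, each hunk is assumed to cost a 12-byte header, and revisions are stored as a simple linear chain. The snapshot rule is hypothetical: a full snapshot is written when the accumulated delta size exceeds twice the new text's length. ToyRevlog and its helpers are illustrative names.

```python
import difflib


def make_delta(old: bytes, new: bytes) -> list:
    """Hunks (start, end, replacement) that turn old into new."""
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    return [(i1, i2, new[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes() if tag != "equal"]


def apply_delta(old: bytes, delta: list) -> bytes:
    out, pos = [], 0
    for start, end, data in delta:
        out.append(old[pos:start])  # unchanged bytes before the hunk
        out.append(data)            # replacement bytes
        pos = end
    out.append(old[pos:])
    return b"".join(out)


def delta_size(delta: list) -> int:
    return sum(12 + len(data) for _, _, data in delta)  # assumed 12-byte hunk header


class ToyRevlog:
    """Append-only store: an index plus a list of chunks (snapshots or deltas)."""

    def __init__(self, ratio: float = 2.0):
        self.ratio = ratio  # hypothetical snapshot threshold
        self.index = []     # per revision: (snapshot base revision, delta bytes since base)
        self.chunks = []    # per revision: full text (bytes) or a delta (list)

    def revision(self, rev: int) -> bytes:
        base, _ = self.index[rev]
        text = self.chunks[base]              # start from the full snapshot...
        for r in range(base + 1, rev + 1):    # ...and replay the deltas after it
            text = apply_delta(text, self.chunks[r])
        return text

    def add(self, text: bytes) -> int:
        rev = len(self.index)
        if rev > 0:
            base, chain = self.index[-1]
            delta = make_delta(self.revision(rev - 1), text)
            chain += delta_size(delta)
            if chain <= self.ratio * len(text):
                self.index.append((base, chain))
                self.chunks.append(delta)
                return rev
        self.index.append((rev, 0))  # first revision, or chain too long: full snapshot
        self.chunks.append(text)
        return rev


log = ToyRevlog()
for t in [b"line1\n", b"line1\nline2\n", b"line1\nline2\nline3\n", b"something else\n"]:
    log.add(t)
print([base for base, _ in log.index])  # [0, 0, 0, 3]
```

The first three revisions share one snapshot and are rebuilt by replaying deltas. The unrelated fourth text would make the chain too expensive, so it gets a fresh snapshot. This trade-off keeps both reads and disk usage bounded.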
Git's storage model is based on binary large objects (BLOBs). Each new revision of a file is stored as a complete copy, which makes saving revisions fast. The copies are compressed, but a great deal of duplication remains. To reduce storage requirements, the Git developers applied packing techniques. In effect, they created something similar to Revlog, but applied at a particular point in time (whenever packing runs) rather than on every commit. The resulting packs differ from Revlog but pursue the same goal: storing data while using disk space efficiently. Because Git stores file snapshots rather than increments, commits can easily be created and discarded. When you need to see the difference between two commits, Git computes the diff on the fly.
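The loose-object side of Git's model can be sketched in a few lines of Python. As far as I know, Git hashes a header of the form "blob <size>" plus a NUL byte, followed by the raw content. It zlib-compresses the result and stores it under objects/, using the first two hex digits of the hash as a directory name. The sketch keeps that layout in an in-memory dict instead of on disk and leaves out packfiles entirely. It computes a diff only when asked, as described above. The names store_blob, load_blob and diff_blobs are hypothetical.

```python
import difflib
import hashlib
import zlib


def _path(oid: str) -> str:
    return f"objects/{oid[:2]}/{oid[2:]}"


def store_blob(objects: dict, data: bytes) -> str:
    raw = b"blob %d\x00" % len(data) + data   # header + full content, no delta
    oid = hashlib.sha1(raw).hexdigest()
    objects.setdefault(_path(oid), zlib.compress(raw))  # identical content is stored once
    return oid


def load_blob(objects: dict, oid: str) -> bytes:
    raw = zlib.decompress(objects[_path(oid)])
    _header, _, body = raw.partition(b"\x00")
    return body


def diff_blobs(objects: dict, old_oid: str, new_oid: str) -> list:
    """No diff is stored anywhere; it is computed on demand from two snapshots."""
    old = load_blob(objects, old_oid).decode().splitlines(keepends=True)
    new = load_blob(objects, new_oid).decode().splitlines(keepends=True)
    return list(difflib.unified_diff(old, new, "a/file", "b/file"))


objects = {}
v1 = store_blob(objects, b"one\n")
v2 = store_blob(objects, b"one\ntwo\n")
print("".join(diff_blobs(objects, v1, v2)))
```

For an empty file, the sketch should produce e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, the same ID that git hash-object reports for an empty blob.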
Branching
Branching is a very important part of configuration management systems, since it allows new functionality to be developed in parallel while the old code stays stable. Both Git and Mercurial support branches, but the differences in how they store history are reflected in how branching is implemented. In Mercurial, a branch is a label that is attached to a commit permanently. This label is global and unique. Anyone pulling changes from a remote repository will see all the branches in their repository and all the commits in each of them. For Mercurial, a branch is a public line of development alongside the main trunk. Because branch names are published to all participants, long-lived names such as release version numbers are usually chosen.
In Git, branches are merely pointers to commits. In different clones of a repository, branches with the same name may point to different commits. Branches in Git can be deleted and transferred individually (each one is identified by its local name in the source repository).
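The two branch models can be contrasted with a toy in-memory repository in Python. It is a simplification, not how either tool stores data. A Mercurial-style branch name is modelled as a field inside each commit, and a Git-style branch as an entry in a local name-to-commit map. Fetched Git branches are assumed to arrive under remote-tracking names such as origin/main. The Repo and Commit classes are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Commit:
    cid: str
    parents: tuple = ()
    branch: Optional[str] = None  # Mercurial-style: the branch name is part of the commit


@dataclass
class Repo:
    commits: dict = field(default_factory=dict)  # commit id -> Commit (shared history)
    refs: dict = field(default_factory=dict)     # Git-style: branch name -> commit id (local)

    def commit(self, cid, parents=(), branch=None):
        self.commits[cid] = Commit(cid, tuple(parents), branch)
        return cid

    def pull(self, other: "Repo", remote: str = "origin"):
        # Commits, and any branch names baked into them, are copied over...
        for cid, c in other.commits.items():
            self.commits.setdefault(cid, c)
        # ...but Git-style pointers only arrive under remote-tracking names.
        for name, cid in other.refs.items():
            self.refs[f"{remote}/{name}"] = cid

    def named_branches(self):
        return {c.branch for c in self.commits.values() if c.branch}


upstream, clone = Repo(), Repo()
for repo in (upstream, clone):
    repo.commit("c1", branch="default")
upstream.commit("c2", ["c1"], branch="stable")
upstream.refs["main"] = "c2"
clone.refs["main"] = "c1"              # same name, different commit in another clone
clone.pull(upstream)
print(sorted(clone.named_branches()))  # ['default', 'stable']
print(clone.refs)                      # {'main': 'c1', 'origin/main': 'c2'}
```

After the pull, the Mercurial-style name "stable" is visible in the clone because it travels inside the commit. The Git-style "main" still means different things in the two repositories. Deleting a Git-style branch only removes a dict entry and leaves the commits in place.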
Practical aspects of use
The differences in the implementations of Git and Mercurial can be illustrated with examples.
Mercurial makes it easy to commit changes and to push and pull them while preserving the entire previous history. Git cares less about preserving history as a whole: it records changes and creates pointers to them. For Git, what matters is not the earlier history or what the pointers used to reference, but what is current right now. There is even a mechanism that protects local history when pulling changes from a remote repository: fast-forward-only merging. When it is enabled, Git reports any incoming changes that cannot be applied as a simple fast-forward. These errors can be ignored if the incoming changes are expected.
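The check behind a fast-forward-only update is an ancestry test on the commit graph. The branch may move to the incoming commit only if its current tip is one of that commit's ancestors, so no local commit is left behind. The Python sketch below assumes history is given as a hypothetical dict from commit ID to parent IDs. The names is_ancestor and fast_forward are illustrative, not Git internals.

```python
def is_ancestor(parents: dict, ancestor: str, descendant: str) -> bool:
    """Depth-first walk from descendant back through parent links."""
    stack, seen = [descendant], set()
    while stack:
        cid = stack.pop()
        if cid == ancestor:
            return True
        if cid not in seen:
            seen.add(cid)
            stack.extend(parents.get(cid, ()))
    return False


def fast_forward(refs: dict, parents: dict, branch: str, new_tip: str) -> None:
    """Move branch to new_tip only if the current tip is already part of its history."""
    old_tip = refs[branch]
    if not is_ancestor(parents, old_tip, new_tip):
        raise ValueError(f"not a fast-forward: {old_tip} is not an ancestor of {new_tip}")
    refs[branch] = new_tip


parents = {"a": (), "b": ("a",), "c": ("b",), "x": ("a",)}
refs = {"main": "b"}
fast_forward(refs, parents, "main", "c")      # b -> c is a fast-forward
try:
    fast_forward(refs, parents, "main", "x")  # x does not contain c, so it is refused
except ValueError as err:
    print(err)
```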
To undo a commit or a merge, Git simply moves the branch pointer back to the previous commit. Whenever you need to roll back to some earlier state, Git looks up the corresponding checksum in its log (the reflog) and tells you which commit it corresponds to. Once something has been committed in Git, you can always return to that state. In Mercurial, there are cases where it is impossible to fully return to the original state. Since Mercurial resolves a problem by creating a new commit, it can be hard to step back once that fresh change exists.
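Undoing by moving a pointer, and finding old states through a log of where that pointer has been, can be sketched in Python. The Ref class below is hypothetical and keeps its history as a plain list. Each entry is assumed to stand for one reflog record, and at(n) plays the role of the HEAD@{n} syntax.

```python
class Ref:
    """A branch pointer plus a log of its previous values (a toy reflog)."""

    def __init__(self, target: str):
        self.log = [target]

    @property
    def target(self) -> str:
        return self.log[-1]

    def update(self, new_target: str) -> None:
        self.log.append(new_target)  # every move is recorded, including resets

    def at(self, n: int) -> str:
        """Value n moves ago, in the spirit of HEAD@{n}."""
        return self.log[-1 - n]

    def reset(self, n: int) -> None:
        self.update(self.at(n))  # undoing is just another recorded pointer move


main = Ref("c1")
main.update("c2")
main.update("m1")               # e.g. a merge commit we want to undo
main.reset(1)                   # back to c2
print(main.target, main.at(1))  # c2 m1
```

Because the reset itself is logged, the undone merge m1 is still one step away. That is why a committed state in Git is never truly lost.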
Mercurial relies on extensions to solve various problems. Each extension handles its own task well in isolation. However, some extensions provide similar functionality in different ways.
For example, consider working with deferred history. Suppose we need to set aside changes from the working copy without committing them to the repository. Git offers stash for this. A stash is a commit (or branch) that is not stored in the usual place: it is not shown in the list of branches, yet all tools treat it as a branch. If similar functionality is required in Mercurial, the attic or shelve extensions can be used. Both of these extensions store “deferred” history as files in the repository, which can be committed if necessary. Each extension solves the problem in a slightly different way, so their formats are inconsistent.
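A stash is a commit kept outside the usual branch namespace, and this idea can be sketched in Python. This is a toy model, not Git's actual implementation. A real stash records both the index and the working tree, and popping it merges rather than overwrites. The refs/heads/ and refs/stash names mirror Git's ref layout, while ToyRepo and its methods are hypothetical.

```python
import hashlib
import json


class ToyRepo:
    """Single-branch toy repository; commits are stored as snapshots in a dict."""

    def __init__(self):
        self.objects = {}                      # commit id -> {"parent": ..., "files": {...}}
        self.refs = {"refs/heads/main": None}  # ordinary branches live under refs/heads/
        self.stashes = []                      # stand-in for the list of stash entries
        self.working = {}                      # working copy: path -> content

    def _write_commit(self, parent, files) -> str:
        body = json.dumps({"parent": parent, "files": files}, sort_keys=True)
        cid = hashlib.sha1(body.encode()).hexdigest()
        self.objects[cid] = {"parent": parent, "files": dict(files)}
        return cid

    def commit(self) -> str:
        head = self.refs["refs/heads/main"]
        self.refs["refs/heads/main"] = self._write_commit(head, self.working)
        return self.refs["refs/heads/main"]

    def branches(self) -> list:
        # Only refs/heads/* are listed, so the stash stays hidden.
        return [r[len("refs/heads/"):] for r in self.refs if r.startswith("refs/heads/")]

    def stash(self) -> None:
        head = self.refs["refs/heads/main"]
        cid = self._write_commit(head, self.working)  # the stash entry is an ordinary commit
        self.stashes.append(cid)
        self.refs["refs/stash"] = cid                 # ...kept under a ref outside refs/heads/
        self.working = dict(self.objects[head]["files"]) if head else {}

    def stash_pop(self) -> None:
        cid = self.stashes.pop()
        self.working = dict(self.objects[cid]["files"])  # simplification: overwrite, no merge
        if self.stashes:
            self.refs["refs/stash"] = self.stashes[-1]
        else:
            del self.refs["refs/stash"]


repo = ToyRepo()
repo.working = {"notes.txt": "v1"}
repo.commit()
repo.working["notes.txt"] = "v2 (unfinished)"
repo.stash()
print(repo.working, repo.branches())  # {'notes.txt': 'v1'} ['main']
repo.stash_pop()
print(repo.working)                   # {'notes.txt': 'v2 (unfinished)'}
```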
Another example is the git commit --amend command. Suppose you need to change the most recent commit, for example to add something you forgot or to edit the commit message. In that case, git commit --amend creates a completely new set of file objects, trees and commit objects, and then updates the branch pointer. If the amendment later needs to be undone, it is enough to move the pointer back to the previous commit with git reset --hard HEAD@{1}. To achieve the same in Mercurial, you need to roll back the commit and create a new one. You then import the contents of the last commit with the queue extension (MQ), add it, and make a new commit.
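What git commit --amend does to the object graph can be modelled in Python. The model has three parts: a new commit that reuses the old commit's parent, a branch pointer that moves to it, and an old commit object that stays in the store. The old object's survival is why HEAD@{1} can still reach it. The commit fields, the toy ID scheme and the helpers write and amend are assumptions for illustration, not Git's object format.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    parent: Optional[str]
    tree: str      # ID of the snapshot this commit records
    message: str

    def oid(self) -> str:
        # Toy ID: a hash over the fields (real Git hashes a serialized commit object).
        return hashlib.sha1(f"{self.parent}|{self.tree}|{self.message}".encode()).hexdigest()


def write(objects: dict, commit: Commit) -> str:
    oid = commit.oid()
    objects[oid] = commit  # objects are immutable and never rewritten in place
    return oid


def amend(objects: dict, refs: dict, branch: str,
          tree: Optional[str] = None, message: Optional[str] = None) -> str:
    old = objects[refs[branch]]
    # The replacement takes the old commit's parent, so it sits beside the old commit.
    new = Commit(old.parent,
                 old.tree if tree is None else tree,
                 old.message if message is None else message)
    refs[branch] = write(objects, new)  # only the branch pointer moves
    return refs[branch]


objects, refs = {}, {}
base = write(objects, Commit(None, "tree0", "initial"))
first = write(objects, Commit(base, "tree1", "WIP"))
refs["main"] = first
fixed = amend(objects, refs, "main", message="add parser")
print(objects[fixed].parent == base, first in objects)  # True True
```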
None of the extensions listed above use the capabilities of Mercurial's storage format. They exist purely as independent layers on top of it.
Conclusions
In this final section, I would like to share my own opinion on choosing a version control system. Both Mercurial and Git are good in their own niches.
For a commercial software project, for example, I find Mercurial more appealing:
- Mercurial's careful handling of history makes it possible to trace an error back to its original source.
- After merging a branch in Git, we risk ending up with a giant patch that has an error hidden somewhere inside.
- Global branches also make it possible to keep track of colleagues' work through regular synchronization with the central repository.
Git is better suited for storing binary files, such as an electronic library. Unlike Mercurial, it is not built around computing file deltas, which are not very efficient for binary content. The files themselves rarely change, and the basic operations on them are moving and adding. In my own observation, the Git repository folder holding the history of my library is within about 10% of the size of the working copy.