Secrets of lost commits in Git

Git is not especially complicated, but it is flexible, and that flexibility sometimes has amusing consequences. For example, look at this commit on GitHub. It looks like a normal commit, but if you clone the repository, you will not find that commit in it. That is because it is a lost commit, more precisely an unreachable (or dangling) commit, sometimes called an orphaned commit: the object still exists in storage, but no branch or tag leads to it. Below is a short look at Git's internals: where such commits come from and what to do if you run into one.

How Git stores commits

A Git repository is built on a simple key-value store: the key is a SHA-1 hash, and the value is an object of one of a few types, chiefly a commit description (commit), a file tree description (tree), or file contents (blob); annotated tags are stored as a fourth type. There are even low-level service commands (plumbing) for working with this store as a database:

echo 'test content' | git hash-object -w --stdin
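The object written by the command above can be read back by its key. The following is a minimal sketch, assuming a throwaway repository created with mktemp; the variable names are illustrative only.

```shell
# Work in a throwaway repository so nothing real is touched.
repo=$(mktemp -d)
git init --quiet "$repo"
cd "$repo"

# Store a blob; the command prints the key (the hash of the object).
key=$(echo 'test content' | git hash-object -w --stdin)

# Look the value up again by its key.
type=$(git cat-file -t "$key")    # object type
value=$(git cat-file -p "$key")   # object contents
echo "$key $type $value"
```

Here git cat-file plays the role of a "get" on the key-value store, while git hash-object -w is the "put".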

This architectural feature gave rise to the vague saying that Git tracks renames by file content. After a rename, the new tree object contains a link to a “file contents” object under the new name, but if the contents have not changed, that link points to the object already in the repository.
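A small experiment makes this visible. It is only a sketch: the file names old.txt and new.txt and the commit identity are made up for the example.

```shell
repo=$(mktemp -d)
git init --quiet "$repo"
cd "$repo"
# Identity used only inside this throwaway repository (hypothetical values).
git config user.name "Example"
git config user.email "example@example.com"

echo 'hello' > old.txt
git add old.txt
git commit --quiet -m "add old.txt"

# Rename the file without touching its contents.
git mv old.txt new.txt
git commit --quiet -m "rename to new.txt"

# Keys of the "file contents" objects before and after the rename.
before=$(git rev-parse HEAD~1:old.txt)
after=$(git rev-parse HEAD:new.txt)
echo "$before"
echo "$after"
```

Both lines show the same key: the rename added a new tree and a new commit, but no new object for the file contents.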

When a developer creates a commit, Git places one commit object in the repository, plus a handful of objects that describe the file structure and the file contents. Thus, “commits” are interconnected Git objects in a key-value store.
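The chain of objects behind a commit can be walked with git cat-file. This is a rough sketch in a throwaway repository; file.txt and the identity are placeholders.

```shell
repo=$(mktemp -d)
git init --quiet "$repo"
cd "$repo"
# Hypothetical identity for the throwaway repository.
git config user.name "Example"
git config user.email "example@example.com"

echo 'hello' > file.txt
git add file.txt
git commit --quiet -m "first commit"

# Each step follows one link in the chain: commit, then tree, then blob.
commit_type=$(git cat-file -t HEAD)
tree_type=$(git cat-file -t 'HEAD^{tree}')
blob_type=$(git cat-file -t HEAD:file.txt)

git cat-file -p HEAD            # commit: key of the tree, author, message
git cat-file -p 'HEAD^{tree}'   # tree: mode, type, key and name of each entry
git cat-file -p HEAD:file.txt   # blob: the file contents
```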

By default, Git stores the full contents of each file version: if we change one line in a 100-kilobyte source file, a new object holding all 100 kilobytes, compressed with zlib, is added to the storage. To keep the repository from swelling too much, Git has a garbage collector (git gc). Some commands start it automatically once enough objects have accumulated, and it can also be run by hand. It repackages objects into a pack file, where similar objects are stored as deltas (differences against another object) rather than as full copies.
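The effect of repacking can be observed with git count-objects. The sketch below assumes a throwaway repository with three revisions of a made-up file big.txt; the exact counts depend on what has been committed.

```shell
repo=$(mktemp -d)
git init --quiet "$repo"
cd "$repo"
# Hypothetical identity for the throwaway repository.
git config user.name "Example"
git config user.email "example@example.com"

# Three revisions of the same file, each stored as a full object.
for i in 1 2 3; do
  echo "line $i" >> big.txt
  git add big.txt
  git commit --quiet -m "revision $i"
done

# "count" is the number of loose objects, "in-pack" the number of packed ones.
loose_before=$(git count-objects -v | awk '/^count:/ {print $2}')

# Run the garbage collector by hand.
git gc --quiet

loose_after=$(git count-objects -v | awk '/^count:/ {print $2}')
packed_after=$(git count-objects -v | awk '/^in-pack:/ {print $2}')
echo "loose before: $loose_before, loose after: $loose_after, packed: $packed_after"
```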

When commits die

In some cases a commit is no longer needed. For example, a developer made a commit foo and then rolled the change back with the reset command. Git is designed so that it does not remove commits right away, which lets the developer undo even the most destructive actions. The reflog command shows the activity log, which records every position that HEAD and the branches have pointed to.
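The scenario with the commit foo can be replayed as follows. This is a sketch under simple assumptions: a throwaway repository, a made-up file file.txt, and a reset by exactly one commit.

```shell
repo=$(mktemp -d)
git init --quiet "$repo"
cd "$repo"
# Hypothetical identity for the throwaway repository.
git config user.name "Example"
git config user.email "example@example.com"

echo 'one' > file.txt
git add file.txt
git commit --quiet -m "base"

echo 'two' >> file.txt
git commit --quiet -am "foo"
foo=$(git rev-parse HEAD)

# Roll the change back: "foo" is no longer on the branch.
git reset --quiet --hard HEAD~1

# The activity log still remembers where HEAD was before the reset.
git reflog
recovered=$(git rev-parse 'HEAD@{1}')

# "Turn back" the destructive action.
git reset --quiet --hard "$recovered"
restored=$(git rev-parse HEAD)
```

HEAD@{1} means "where HEAD pointed one move ago", which in this sketch is the commit foo.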

But “unnecessary” commits appear not only after reset. For example, the popular rebase operation creates copies of commits and leaves the originals, which no one needs anymore, in storage. So that such “lost” objects do not accumulate, Git has the garbage collector already mentioned above, which is started automatically by some commands or run manually.
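That rebase copies commits rather than moving them can be checked directly. The sketch below uses a made-up branch named feature and placeholder file names; the default branch name is read from the repository because it varies between installations.

```shell
repo=$(mktemp -d)
git init --quiet "$repo"
cd "$repo"
# Hypothetical identity for the throwaway repository.
git config user.name "Example"
git config user.email "example@example.com"

echo 'base' > base.txt
git add base.txt
git commit --quiet -m "base"
main=$(git symbolic-ref --short HEAD)   # name of the default branch

git checkout --quiet -b feature
echo 'feature' > feature.txt
git add feature.txt
git commit --quiet -m "feature work"
original=$(git rev-parse HEAD)

git checkout --quiet "$main"
echo 'main' > main.txt
git add main.txt
git commit --quiet -m "main work"

# Replay the feature commit on top of the updated default branch.
git checkout --quiet feature
git rebase --quiet "$main"
copy=$(git rev-parse HEAD)

# The original object is still stored, but the branch no longer leads to it.
if git cat-file -e "$original"; then stored=yes; else stored=no; fi
if git merge-base --is-ancestor "$original" HEAD; then on_branch=yes; else on_branch=no; fi
echo "original: $original copy: $copy stored: $stored on branch: $on_branch"
```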

The garbage collector looks for objects that are no longer referenced and removes them from storage. The reflog plays a large role here: its entries have a limited lifespan, by default 30 days for entries that are not reachable from the current tip of the branch and 90 days for entries that are. The garbage collector first removes expired entries from the reflog and then deletes objects that nothing references anymore (recently created loose objects get a further grace period, two weeks by default). This design gives the developer at least 30 days to restore an “unnecessary” commit before it can be permanently deleted from the repository.
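The lifespans are controlled by the gc.reflogExpire, gc.reflogExpireUnreachable and gc.pruneExpire settings. To see the whole cycle without waiting a month, the expiry can be forced by hand. This is a sketch for a throwaway repository only, since it destroys exactly the safety net described above; the file name and identity are placeholders.

```shell
repo=$(mktemp -d)
git init --quiet "$repo"
cd "$repo"
# Hypothetical identity for the throwaway repository.
git config user.name "Example"
git config user.email "example@example.com"

echo 'one' > file.txt
git add file.txt
git commit --quiet -m "base"
echo 'two' >> file.txt
git commit --quiet -am "foo"
foo=$(git rev-parse HEAD)
git reset --quiet --hard HEAD~1

# The commit is off the branch, but the reflog still keeps it alive.
if git cat-file -e "$foo" 2>/dev/null; then before=present; else before=gone; fi

# Step 1: expire every reflog entry immediately instead of after 30/90 days.
git reflog expire --expire=now --expire-unreachable=now --all
# Step 2: collect garbage and skip the grace period for unreferenced objects.
git gc --quiet --prune=now

if git cat-file -e "$foo" 2>/dev/null; then after=present; else after=gone; fi
echo "before gc: $before, after gc: $after"
```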

What happened on GitHub?

I think you have already guessed. The commit in question was unnecessary: most likely, the author did a rebase. But GitHub shows the contents of the server-side repository, where no developer runs Git commands by hand and where, most likely, nobody has run the garbage collector since. At the same time, when a repository is cloned, Git sends over the network only the commits that are reachable from some reference, so the “lost” commits, properly called unreachable objects, remain as dead weight on the server side.
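The same situation can be reproduced locally. In this sketch a directory stands in for the server, and the clone goes through a file:// URL so that objects are transferred the way they would be over the network (cloning a plain local path may simply copy the whole object store). All names are made up for the example.

```shell
server=$(mktemp -d)
git init --quiet "$server"
cd "$server"
# Hypothetical identity for the throwaway repository.
git config user.name "Example"
git config user.email "example@example.com"

echo 'one' > file.txt
git add file.txt
git commit --quiet -m "base"
echo 'two' >> file.txt
git commit --quiet -am "soon to be lost"
lost=$(git rev-parse HEAD)

# Drop the commit from the branch; no garbage collection is run afterwards.
git reset --quiet --hard HEAD~1

# The "server" still stores the commit.
if git cat-file -e "$lost" 2>/dev/null; then on_server=yes; else on_server=no; fi

# A clone receives only objects reachable from branches and tags.
client=$(mktemp -d)
git clone --quiet "file://$server" "$client/copy"
if git -C "$client/copy" cat-file -e "$lost" 2>/dev/null; then in_clone=yes; else in_clone=no; fi
echo "on server: $on_server, in clone: $in_clone"
```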

I hope this short excursion into the guts of Git will save someone valuable time searching for “missing commits” that are referenced, for example, by a bug tracker. If I have made a mistake somewhere, or if you have remarks, I will be happy to talk in the comments.

Source: https://habr.com/ru/post/261743/
