Several ways to optimize working with Git

On our Habr blog, we cover a range of technologies from the IaaS world and beyond. For example, we recently published an overview of software VPN implementations [ Part 1 ; Part 2 ] and a piece on DNS . Today we turn to application and service development and look at Git, in particular at ways to make working with it faster.

/ photo hackNY.org CC

Let's start at the very beginning: what is Git? Git is a version control system (VCS), and services such as GitHub and GitLab are built on top of it. A great deal of software you probably know well is developed with Git, including the Linux kernel, Firefox, and Chrome.

If you have worked on a software product as part of a team, you know how it goes. You have a specific version of the project that you send to your colleagues. They change the code and send it back. You merge their changes into your code base and get a new version of the project.

One of Git's main jobs is to prevent confusion between product versions, the kind that produces files named project_betterVersion.kdenlive or project_FINAL-alternateVersion.kdenlive.

A VCS exists to take this bookkeeping off your hands. Each team member can work on the latest version of the project, make changes, and share them with colleagues.

Version control systems store multiple revisions of the same document and let you roll it back to an earlier state when needed. You make a copy of the repository (clone), work with it locally, and then use dedicated commands to publish your edits to the main version (push) or to bring in changes made by your colleagues (pull).
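The copy, push, and pull cycle described above can be sketched with throwaway local repositories. This is only an illustration: the directory names (central.git, alice, bob), the file name, and the identities are assumptions made for the example, and a local bare repository stands in for a real server.

```shell
# Sketch of the clone / commit / push / pull cycle.
# All paths and names below are hypothetical.
work=$(mktemp -d)
git init -q --bare "$work/central.git"      # stands in for the shared server

# Two developers each make their own copy (the empty-clone warning is silenced).
git clone -q "$work/central.git" "$work/alice" 2>/dev/null
git clone -q "$work/central.git" "$work/bob" 2>/dev/null

# Alice edits locally and publishes her change.
cd "$work/alice"
echo "first draft" > notes.txt
git add notes.txt
git -c user.name=Alice -c user.email=alice@example.com commit -q -m "Add notes"
branch=$(git symbolic-ref --short HEAD)     # default branch name varies by setup
git push -q origin HEAD

# Bob brings Alice's work into his own copy.
cd "$work/bob"
git pull -q origin "$branch"
cat notes.txt
```

Each developer keeps a full copy of the history, so the commit itself happens offline; only push and pull talk to the shared repository.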

Improving performance

On large products, source files are constantly renamed, new branches are created, and revisions are compared with earlier ones. As a result, Git can slow down on sufficiently large projects. Even Facebook once ran into this problem.

The company's engineers traced the slowdown to the index file, which was rewritten on every change to the source files and had grown beyond 100 MB in their large project. (By the way, here is one interesting solution to another performance problem with version control systems at Facebook, proposed by the company's engineers.)
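To see whether the index is a concern in your own repository, you can check how many paths it tracks and how large it is on disk. The sketch below assumes it is run inside a working tree with at least one file added; the helper name index_stats is made up for this example.

```shell
# Hypothetical helper: report how many paths the index tracks
# and how many bytes the index file occupies on disk.
index_stats() {
    index_file=$(git rev-parse --git-path index)    # usually .git/index
    entries=$(git ls-files | wc -l | tr -d ' ')
    bytes=$(wc -c < "$index_file" | tr -d ' ')
    echo "entries=$entries bytes=$bytes"
}

# Usage, from inside a repository:
#   index_stats
```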

To speed up work with Git, developers use various techniques, utilities and solutions. One option would be to reduce the size of the repository.

Shrinking the repository

Ruby on Rails developer Steve Lorek writes in his blog that he managed to shrink a repository from 180 MB to 7 MB. To do this, he first made a local clone of the repository and then looked for files that took up too much space. For that he used a bash script by Antony Stubbs, which lists the 10 largest files in the repository so that the unnecessary ones can be picked out.
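The script itself is not reproduced here. As a rough idea of how such a search can work, the sketch below lists the largest blobs found anywhere in the history, including files that have since been deleted. It is not the script mentioned above; the helper name largest_blobs is an assumption for this example.

```shell
# Hypothetical helper: print "<size in bytes> <path>" for the N largest
# blobs reachable from any ref (default N = 10).
largest_blobs() {
    count="${1:-10}"
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        awk '$1 == "blob"' |      # keep file contents, drop commits and trees
        cut -d' ' -f2- |          # strip the object type column
        sort -rn |                # biggest first
        head -n "$count"
}

# Usage, from inside a repository:
#   largest_blobs 10
```

Because the listing walks every ref rather than the current checkout, a file that was deleted long ago but still bloats the history shows up too.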

He then removed these files from the history with the following series of commands:

$ git filter-branch --tag-name-filter cat --index-filter 'git rm -r --cached --ignore-unmatch filename' --prune-empty -f -- --all
$ rm -rf .git/refs/original
$ git reflog expire --expire=now --all
$ git gc --prune=now
$ git gc --aggressive --prune=now

Finally, Steve pushed the rewritten history to the remote repository so that no one else would have to download 180 MB to start working.
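The article does not show the push itself. Since git filter-branch rewrites commits and therefore changes their hashes, an ordinary push is normally rejected and a forced one is needed. A minimal sketch, assuming the remote is called origin (the helper name push_rewritten_history is hypothetical):

```shell
# Hypothetical helper: publish rewritten branches and tags to a remote.
# WARNING: this overwrites history on the remote.
push_rewritten_history() {
    remote="${1:-origin}"
    git push "$remote" --force --all &&
        git push "$remote" --force --tags
}

# Usage, from inside the cleaned-up repository:
#   push_rewritten_history origin
```

After such a push, colleagues are usually better off making a fresh clone than pulling into their old copies, because the old commits no longer match the remote.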

Smart mirroring

This approach suits organizations with several hundred developers. Many members of such teams work remotely from different countries, which makes fetching data from repositories slow. Some teams even resort to mailing hard drives to one another.

Mirroring sets up one or more active mirror servers that serve only read operations from copies of the repositories and stay synchronized with the primary instance. This approach can cut the time needed to transfer a copy of a 5 GB repository by a factor of about 25.
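Commercial mirroring products do much more than this, but the basic idea can be sketched with plain Git: a mirror clone that is refreshed periodically, with developers reading from the mirror and pushing to the primary. The layout below (primary, mirror.git, dev) uses local directories as stand-ins for servers, and every name in it is an assumption for the example.

```shell
# Sketch: a read-only mirror kept in sync with a primary repository.
work=$(mktemp -d)

# The primary repository with one commit.
git init -q "$work/primary"
cd "$work/primary"
git config user.name Demo
git config user.email demo@example.com
echo v1 > app.txt
git add app.txt
git commit -q -m "v1"

# Create the mirror: a bare copy that carries every ref of the primary.
git clone -q --mirror "$work/primary" "$work/mirror.git"

# New work lands on the primary...
echo v2 > app.txt
git commit -q -am "v2"

# ...and a periodic job on the mirror pulls it across.
git -C "$work/mirror.git" fetch -q --prune origin

# A developer close to the mirror reads from it but pushes to the primary.
git clone -q "$work/mirror.git" "$work/dev"
git -C "$work/dev" remote set-url --push origin "$work/primary"
git -C "$work/dev" remote -v
```

With separate fetch and push URLs on the same remote, the heavy read traffic stays on the nearby mirror while writes still reach the single primary.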

A different approach to storing large files

Because every developer keeps the entire change history on their own machine, Git repositories grow quickly when they contain large files. A number of utilities address this problem. For example, git-annex lets you store a symbolic link (symlink) to a file in the repository instead of the file itself.

Also worth noting is the Git Large File Storage ( Git LFS ) extension, which stores pointers to files in the repository instead of the files themselves. Operations on these files are handled by the clean and smudge filters , and their contents are stored on a remote server such as GitHub.com or GitHub Enterprise. Descriptions of several other utilities can be found at the link .
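To make the clean and smudge mechanism more concrete, here is a toy filter built with plain Git. It is not Git LFS and does not use its pointer format: the filter name toy, the local store directory, and both helper scripts are assumptions invented for this sketch. The clean script runs when a file is staged and swaps its content for a one-line pointer; the smudge script runs on checkout and restores the real content.

```shell
# Toy clean/smudge filter (NOT Git LFS; all names are hypothetical).
work=$(mktemp -d)
store="$work/store"
mkdir -p "$store"

# clean: save the real content in the store, emit a small pointer instead.
cat > "$work/clean.sh" <<EOF
#!/bin/sh
tmp=\$(mktemp)
cat > "\$tmp"
id=\$(git hash-object --stdin < "\$tmp")
mv "\$tmp" "$store/\$id"
echo "toy-pointer \$id"
EOF

# smudge: read the pointer, emit the real content from the store.
cat > "$work/smudge.sh" <<EOF
#!/bin/sh
read -r _ id
cat "$store/\$id"
EOF
chmod +x "$work/clean.sh" "$work/smudge.sh"

git init -q "$work/repo"
cd "$work/repo"
git config filter.toy.clean "$work/clean.sh"
git config filter.toy.smudge "$work/smudge.sh"
echo '*.bin filter=toy' > .gitattributes    # route *.bin files through the filter

head -c 100000 /dev/zero > video.bin
git add .gitattributes video.bin
git cat-file -p :video.bin      # the staged blob is only the pointer line

rm video.bin
git checkout -q -- video.bin    # smudge restores the full content
wc -c < video.bin
```

Git LFS follows the same pattern but keeps the real content on a remote server rather than in a local directory, so clones only download the large files they actually check out.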

Using aliases

This tip has less to do with Git performance or transfer speed than with convenience. Defining aliases can noticeably speed up everyday work with Git and simplify many operations. Aliases are stored in the configuration file and can be set with git config:

 git config --global alias.co checkout
 git config --global alias.br branch
 git config --global alias.ci commit
 git config --global alias.st status

Interestingly, you can also define your own commands this way, ones that do not exist in Git by default, for example:

 git config --global alias.l "log --oneline --graph"

In this case, the git l command prints the log with one commit per line, drawn as a graph.
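If you would like to try aliases without touching your global configuration, they can also be defined per repository by leaving out --global. The sketch below does this in a throwaway repository; the directory name, file name, and identity are assumptions for the example.

```shell
# Sketch: repository-local aliases in a scratch repository.
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"

# Without --global, the aliases are written to this repository's .git/config.
git config alias.st status
git config alias.l "log --oneline --graph"

echo hello > readme.txt
git add readme.txt
git -c user.name=Demo -c user.email=demo@example.com commit -q -m "Initial commit"

git config --get-regexp '^alias\.'    # list the aliases defined so far
git l                                 # runs: git log --oneline --graph
```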

These small tips can simplify work with large repositories and make life easier for development teams, which matters a great deal for the quality and delivery speed of a company's important projects.


Source: https://habr.com/ru/post/309704/
