Joshua Redstone complained on the Git mailing list about performance problems that Facebook ran into with a large repository. To investigate, the team created a synthetic repository and ran tests against it.
The test repository has 4 million commits, a linear history, and about 1.3 million files. The .git folder is about 15 GB; it was packed with the repack command:
git repack -a -d -f --max-pack-size=10g --depth=100 --window=250
Repacking took about two days on a powerful machine (lots of memory, SSD). The resulting index file was 191 MB.
Git is disappointingly slow in such a repository. Here are the results of running commands on a server with an ordinary HDD and more than 10 GB of RAM (each command was repeated several times; runs with a hot OS cache are faster than the first run):
git status
39 minutes with a cold cache; 24 seconds with a hot cache;
git blame
44 minutes with a cold cache; 11 minutes with a hot cache;
git add
(after appending a couple of characters to the end of a file)
7 seconds with a cold cache; 5 seconds with a hot cache;
git commit -m "foo bar3" --no-verify --untracked-files=no --quiet --no-status
41 minutes with a cold cache; 20 seconds with a hot cache.
Facebook's developers say these results are unacceptable and ask for advice on how to improve the situation. In their view, Git probably needs dedicated servers and some kind of support at the file system level to speed up particular operations (for example, determining which files have changed). That would mean either rewriting Git's code to work with such servers or building a script-based layer on top of Git to act as an access interface.
Redstone's colleague explained that the slowdown comes from the large number of O(n) structures in Git, which cause problems at this scale. The index file is rewritten from scratch on even the slightest change, and in a large project it exceeds 100 MB. In addition, Git uses lstat to check files for changes, so with millions of files the disk operations become a bottleneck, especially with a cold cache.
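To make the cost of lstat-based change detection concrete, here is a minimal Python sketch. It is not Git's implementation: the function names `snapshot` and `changed_files` are hypothetical, and the "index" is just a dictionary of file size and modification time. It only illustrates why the work grows with the number of tracked files even when nothing has changed.

```python
import os
import tempfile


def snapshot(root):
    """Record (size, mtime_ns) for every file under root.

    A toy stand-in for an index; Git's real index stores far more.
    """
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # one system call per file
            index[os.path.relpath(path, root)] = (st.st_size, st.st_mtime_ns)
    return index


def changed_files(root, index):
    """Return the sorted paths whose lstat data differs from the snapshot."""
    changed = []
    # Every tracked file is stat'ed, so the cost is O(n) even with no changes.
    for rel, cached in index.items():
        try:
            st = os.lstat(os.path.join(root, rel))
        except FileNotFoundError:
            changed.append(rel)  # file was deleted
            continue
        if (st.st_size, st.st_mtime_ns) != cached:
            changed.append(rel)
    return sorted(changed)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for name in ("a.txt", "b.txt"):
            with open(os.path.join(root, name), "w") as f:
                f.write("hello")
        index = snapshot(root)
        with open(os.path.join(root, "a.txt"), "a") as f:
            f.write("!!")  # append a couple of characters
        print(changed_files(root, index))
```

With 1.3 million files, the loop in `changed_files` issues 1.3 million lstat calls on every check; with a cold cache each call may require a disk seek, which is consistent with the gap between the cold and hot timings above.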
Overall, Facebook's developers are hinting that Git should be rewritten to improve performance. They refuse to split the repository into several parts.