We did a daily big backup for
Stack Overflow , so again I played around a bit with file compression.
Our server has the latest 64-bit version
7zip (4.64). I believe that a dual-core processor is enough for a desktop system. But everything is simple with servers: the more cores, the merrier! The server has two quad-core processors - only eight cores, so I was a little disappointed to find that neither RAR nor 7zip use more than two cores.
But, even using only two cores for compression, the 7zip algorithm is incredibly efficient, and recently it has become quite fast. I used to recommend using RAR instead of Zip. Given the free, unlike RAR, and the effectiveness of 7zip, now it is logical to choose it.
Here are a couple of my tests for compressing a 4.76 GB backup database file (using a server with two 2.5 GHz Xeon E5420 quad-core processors on board):
7zip | fastest | 5 min | 14 MB / sec | 973 MB |
7zip | fast | 7 min | 11 MB / sec | 926 MB |
7zip | normal | 34 min | 2.5 MB / sec | 752 MB |
7zip | maximum | 41 min | 2.0 MB / sec | 714 MB |
7zip | ultra | 48 min | 1.7 MB / sec | 698 MB |
If you think that,
since 7zip shows good results with maximum and ultra compression ratios, then at ultra-plus the result will be incredible and will disappoint you. There are certain reasons why vendors of archivers set the normal compression ratio by default. If the compression ratio is reduced, the size of the output archive increases dramatically, while an increase in the compression ratio will lead to a relatively small decrease in the size of the output archive instead of a large increase in the compression time.
')
And now let's see what happens if I switch 7zip to use
bzip2 :

We will compress the same file (4.76 GB) on the same machine:
bzip2 | fastest | 2 min | 36 MB / sec | 1092 MB |
bzip2 | fast | 2.5 min | 29 MB / sec | 1011 MB |
bzip2 | normal | 3.5 min | 22 MB / sec | 989 MB |
bzip2 | maximum | 7 min | 12 MB / sec | 987 MB |
bzip2 | ultra | 21 min | 4 MB / sec | 986 MB |
Why is bzip2 so much faster than 7zip? It's simple:
CPU usage when using 7zip
CPU usage when using bzip2
Bzip2 uses more than two cores for parallelization. I do not know how many kernels it can use, but in the drop-down menu, 7zip offers no more than sixteen cores for bzip2. Our server has eight cores, so I chose so much during the test.
Unfortunately, the increase in the speed of bzip2 is meaningless with a high degree of compression - the difference between normal, maximum, and ultra is a measly 0.06%. The algorithm scales well in time, but practically does not scale to the size of the output file. This is very strange, because it would be to reduce the size I would like to spend the time won by parallelization.
But even a minimal size reduction can make sense, depending on the circumstances:
= + n * ( / + )
For example, if you compress a file to send it over the network, n = 1, so the compression time has a noticeable effect on the total time. If you want to put the file on the Internet, n is large, so a long compression time will have little effect on the total time.
After all, slow networks work well with slow but effective algorithms, while fast networks need fast, but perhaps less efficient algorithms.
On the other hand, the ability to compress a five-gigabyte file five times in two minutes is impressive. Therefore, the idea that how quickly the 7zip algorithm would work if it was rewritten and parallelized to work more than on two cores did not leave me at all.