Linux systems commonly use gzip and bzip2 for compression. Both give good compression ratios and are convenient to use. Of the two, bzip2 compresses most files more tightly, but it is slower than the more traditional gzip or zip.
You can, however, get bzip2's compression ratio at a significantly higher speed with pbzip2, the Parallel BZIP2 utility. Ordinary bzip2 uses only one processor core, while modern systems may have 2, 4, or even 8.
Pbzip2 can use several processor cores at once, which, according to its authors, gives an almost linear increase in performance. The compressed files that pbzip2 creates are fully compatible with bzip2 1.0.2 and newer versions of bzip2 (there is also pigz, a multithreaded implementation of gzip - thanks to altexxx).
Below are the results of testing compression speed on a 1000 MB portion of an SQL dump (dd if=dump.sql of=testfile bs=1M count=1000) on a computer with two Intel Xeon E5520 processors (4 cores, 8 threads, 2.26 GHz each):
As the test results show, pbzip2 running 4 threads is about 3.6 times faster than single-threaded bzip2, which is close to a linear speedup.
At the same time, pbzip2 running 16 threads was slower than pbzip2 running 4 threads, probably because I/O speed became the bottleneck. See also the additional tests in the comments (thanks to tristan and bliznezz), including runs on a tmpfs partition in RAM.
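A comparison like the one above can be repeated on your own hardware with a small script. This is only a rough sketch: bench_compress is a hypothetical helper, not part of pbzip2, testfile is assumed to be the sample created with dd above, and the timing has whole-second resolution, which is adequate for a 1000 MB file but not for small ones. The data is fed through stdin and the output is discarded, so the source file is left untouched.

```shell
#!/usr/bin/env bash
# Hypothetical helper: time one command that reads FILE on stdin.
# Usage: bench_compress FILE COMMAND [ARGS...]
bench_compress() {
    local file=$1; shift
    local start end
    start=$(date +%s)
    # Compressed data is thrown away; only the elapsed time matters here.
    "$@" < "$file" > /dev/null || return 1
    end=$(date +%s)
    echo "$*: $((end - start)) s"
}

# Run the comparison only if the sample file and pbzip2 are both present.
if [ -f testfile ] && command -v pbzip2 > /dev/null 2>&1; then
    bench_compress testfile bzip2 -c
    for threads in 1 2 4 8 16; do
        bench_compress testfile pbzip2 -c "-p$threads"
    done
fi
```

To take the disk out of the picture, the same script can be run with testfile placed on a tmpfs mount.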
Pbzip2 is used in much the same way as bzip2, with a few additional features, such as displaying the progress of an operation as a percentage.
To compress a file:
pbzip2 -k -p4 filename
Here filename is the name of the file. By default, the compressed file has the same name as the source with .bz2 appended (in this case, filename.bz2).
The -k option is needed so that pbzip2 does not delete the source after it has finished compressing. You can also add the -v option to display detailed information, including the progress of the operation as a percentage.
The -p option sets the number of threads (in this case, 4).
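Instead of hard-coding 4, the thread count can be taken from the machine itself. The sketch below assumes a hypothetical helper named detect_threads, which is not part of pbzip2. It relies on nproc (GNU coreutils) or getconf being available and falls back to 1 otherwise. The pbzip2 help text lists autodetection as the default for -p, so an explicit value is mostly useful when you want to leave some cores free.

```shell
# Hypothetical helper: print the number of online CPU cores, or 1 if unknown.
detect_threads() {
    if command -v nproc > /dev/null 2>&1; then
        nproc
    else
        getconf _NPROCESSORS_ONLN 2> /dev/null || echo 1
    fi
}

threads=$(detect_threads)

# Compress only if the example file and pbzip2 actually exist.
if [ -f filename ] && command -v pbzip2 > /dev/null 2>&1; then
    pbzip2 -k "-p$threads" filename
fi
```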
To decompress the file:
pbzip2 -dk -p4 filename.bz2
Here filename.bz2 is the name of the compressed file. By default, the decompressed file has the same name as the compressed one with .bz2 removed (in this case, filename).
The -d option is needed to decompress.
Accordingly, compression with detailed output and a progress display looks like this:
pbzip2 -kv -p4 filename
And you can unpack the file like this:
pbzip2 -dkv -p4 filename.bz2
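Since pbzip2 output is meant to be readable by plain bzip2, a quick round trip is an easy way to check this on your own data. The following is a minimal sketch: roundtrip_ok is a hypothetical helper, and filename stands for any file you want to test. It compresses to a pipe, decompresses with a second tool, and compares the result with the original byte for byte, without writing anything to disk.

```shell
# Hypothetical helper: compress FILE with COMPRESSOR, decompress the stream
# with DECOMPRESSOR, and compare the result with the original file.
# Returns 0 when the round trip reproduces the file exactly.
roundtrip_ok() {
    local file=$1 compressor=$2 decompressor=$3
    "$compressor" -c "$file" | "$decompressor" -dc | cmp -s - "$file"
}

# Check that bzip2 can read what pbzip2 wrote, if both tools are installed.
if [ -f filename ] && command -v pbzip2 > /dev/null 2>&1 \
        && command -v bzip2 > /dev/null 2>&1; then
    if roundtrip_ok filename pbzip2 bzip2; then
        echo "bzip2 reads pbzip2 output correctly"
    else
        echo "round trip failed" >&2
    fi
fi
```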
If you need to compress a whole directory, then, as with gzip and bzip2, you first create a so-called tarball (which is not itself compressed) containing the directory, and then compress it with the chosen utility.
In the case of pbzip2, you can do it in one line like this:
tar cf myfile.tar.bz2 --use-compress-program=pbzip2 directory_to_compress/
Or like this:
tar -c directory_to_compress/ | pbzip2 -c > myfile.tar.bz2
In the second case you can also add -p4 to the pbzip2 command to set the number of threads to 4.
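The pipe variant, together with the matching extraction step, could be wrapped up as shown below. This is a sketch under stated assumptions: pack_dir and unpack_archive are hypothetical helper names, and the script falls back to bzip2 when pbzip2 is not installed. Extra options such as -p4 are passed through to the compressor unchanged, so they should only be given when pbzip2 is actually in use.

```shell
# Use pbzip2 when available, otherwise plain bzip2 (an assumption of this sketch).
compressor=pbzip2
command -v pbzip2 > /dev/null 2>&1 || compressor=bzip2

# pack_dir DIRECTORY ARCHIVE [COMPRESSOR_OPTIONS...]
# Example: pack_dir directory_to_compress myfile.tar.bz2 -p4   (pbzip2 only)
pack_dir() {
    local dir=$1 archive=$2; shift 2
    # "-f -" makes tar write the archive to stdout explicitly.
    tar -cf - "$dir" | "$compressor" -c "$@" > "$archive"
}

# unpack_archive ARCHIVE DESTINATION
unpack_archive() {
    local archive=$1 dest=$2
    mkdir -p "$dest" && "$compressor" -dc "$archive" | tar -xf - -C "$dest"
}
```

For example, pack_dir directory_to_compress myfile.tar.bz2 -p4 followed by unpack_archive myfile.tar.bz2 restored should recreate the directory under restored/.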