
I will not explain what Reiser4 is or what it is good for, since there is plenty of information on the subject [1, 2] and I see no point in repeating it. Instead, I'll start with the fact that I first tried Reiser4 in 2010, but because of problems when transparent compression was used together with tail packing (it turned out there were bugs in the flush procedure, which have since been fixed [3]), I switched back to ReiserFS. In 2013 I learned that the problem had been solved [4] and returned to Reiser4 (LZO1 on a desktop system, no compression on a laptop). Some time later I remembered the news about LZ4, the "Extremely Fast Compression Algorithm", and the fact that the Illumos community had added support for it to ZFS. Then the thought came to me: "It would be great to have LZ4 support in Reiser4!" So I started bolting it onto Reiser4.
First, I looked at the code of the ccreg40 plugin (as you know, Reiser4 has a plugin architecture). It all starts with the file fs/reiser4/plugin/compress/compress.h, which contains the reiser4_compression_id enumeration:
typedef enum {
	LZO1_COMPRESSION_ID,
	GZIP1_COMPRESSION_ID,
	LAST_COMPRESSION_ID,
} reiser4_compression_id;
It assigns an identification number to each compression algorithm (LZO1 and GZIP1 are available by default). The last entry, LAST_COMPRESSION_ID, is a sentinel used to size the various tables that hold information about the algorithms and their functions.
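As a minimal sketch of how a new algorithm slots into this enumeration: the placement of LZ4_COMPRESSION_ID, the compression_labels table, and the compression_label() helper below are assumptions made for illustration, not code taken from the patch.

```c
#include <stddef.h>

/* Sketch: the new ID goes before LAST_COMPRESSION_ID, so the sentinel
 * still equals the number of algorithms. The exact placement of
 * LZ4_COMPRESSION_ID is an assumption. */
typedef enum {
	LZO1_COMPRESSION_ID,
	GZIP1_COMPRESSION_ID,
	LZ4_COMPRESSION_ID,
	LAST_COMPRESSION_ID,
} reiser4_compression_id;

/* Hypothetical table sized by the sentinel: one slot per algorithm. */
static const char *const compression_labels[LAST_COMPRESSION_ID] = {
	[LZO1_COMPRESSION_ID]  = "lzo1",
	[GZIP1_COMPRESSION_ID] = "gzip1",
	[LZ4_COMPRESSION_ID]   = "lz4",
};

/* Hypothetical helper: returns the label for a valid ID,
 * or NULL when the ID is out of range. */
static const char *compression_label(int id)
{
	if (id < 0 || id >= LAST_COMPRESSION_ID)
		return NULL;
	return compression_labels[id];
}
```

Because the table is sized by LAST_COMPRESSION_ID, adding an algorithm only requires a new enumerator and a new table entry; the bounds check does not change.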
Next comes the file fs/reiser4/plugin/compress/compress.c, where the functions themselves are defined. There are seven main functions:
- init() - Needed if the algorithm requires some kind of pre-initialization. GZIP1, LZO1, and LZ4 do not, so their implementations simply return 0.
- overrun() - Returns the maximum size of the "tail" that compression can add. If the tail is not accounted for, incompressible input will overflow the output buffer. For GZIP1 this value is 0, for LZO1 it is "src_len / 64 + 19", and for LZ4 it is "src_len / 255 + 16".
- alloc() - Allocates memory for the algorithm's needs.
- free() - Frees the memory allocated for the algorithm's needs.
- min_size_deflate() - Returns the minimum block size that is still worth compressing.
- compress() - Compresses the data.
- decompress() - Decompresses the data.
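The overrun() formulas from the list can be sketched as follows. The function bodies use the values quoted above; the signatures, the gzip1_/lzo1_ function names (modeled on the lz4_ naming), and the safe_dst_size() helper are assumptions for illustration.

```c
#include <stddef.h>

/* Worst-case growth ("tail") for incompressible input, using the
 * formulas quoted in the text. Signatures are assumed. */
static int gzip1_overrun(unsigned src_len)
{
	(void)src_len;	/* no tail is reserved for GZIP1 */
	return 0;
}

static int lzo1_overrun(unsigned src_len)
{
	return src_len / 64 + 19;
}

static int lz4_overrun(unsigned src_len)
{
	return src_len / 255 + 16;
}

/* Hypothetical helper: an output buffer of this size cannot be
 * overrun, even when the input does not compress at all. */
static size_t safe_dst_size(unsigned src_len, int (*overrun)(unsigned))
{
	return (size_t)src_len + (size_t)overrun(src_len);
}
```

For a 64 KiB input, safe_dst_size() gives 65536 + 273 bytes for LZ4 and 65536 + 1043 bytes for LZO1.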
I will dwell on the alloc()/free() functions. One of the arguments they take is act, of type tfm_action. tfm_action is an enumeration defined in the header file fs/reiser4/plugin/compress/compress.h (it is structured the same way as reiser4_compression_id) with two working elements, TFMA_READ and TFMA_WRITE:
typedef enum {
	TFMA_READ,
	TFMA_WRITE,
	TFMA_LAST
} tfm_action;
This makes it possible to tell whether the function is being called for reading or for writing. Some algorithms need additional memory for decompression, and the act argument lets us allocate exactly the amount required. For example, GZIP1 needs extra memory for decompression, so we allocate it, whereas LZO1 and LZ4 do not, so we allocate nothing.
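A user-space sketch of this pattern, for an algorithm that needs a workspace only when compressing, might look like this. The names demo_alloc()/demo_free(), the workspace size, and the use of malloc() in place of a kernel allocator are all assumptions; the real plugin signatures may differ.

```c
#include <stdlib.h>

typedef enum { TFMA_READ, TFMA_WRITE, TFMA_LAST } tfm_action;

/* Hypothetical workspace sizes, for illustration only. */
#define DEMO_COMPRESS_MEM   (16 * 1024)
#define DEMO_DECOMPRESS_MEM 0	/* nothing needed to decompress */

/* Returns a workspace for the given action, or NULL when none is
 * needed. *err is 0 on success and -1 on failure, which tells
 * "nothing to allocate" apart from "allocation failed". */
static void *demo_alloc(tfm_action act, int *err)
{
	size_t size = 0;
	void *mem;

	*err = 0;
	switch (act) {
	case TFMA_WRITE:	/* compression path */
		size = DEMO_COMPRESS_MEM;
		break;
	case TFMA_READ:		/* decompression path */
		size = DEMO_DECOMPRESS_MEM;
		break;
	default:		/* unknown action */
		*err = -1;
		return NULL;
	}
	if (size == 0)
		return NULL;
	mem = malloc(size);	/* stand-in for a kernel allocator */
	if (mem == NULL)
		*err = -1;
	return mem;
}

static void demo_free(void *mem, tfm_action act)
{
	(void)act;	/* both actions release memory the same way here */
	free(mem);	/* free(NULL) is a no-op */
}
```

An algorithm like GZIP1, which needs memory in both directions, would simply use a non-zero size in the TFMA_READ branch as well.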
Everything comes together in the same file, compress.c, in the definition of the compression_plugins array, where we specify the plugin's type, its identification number, label, functions, and so on:
[LZ4_COMPRESSION_ID] = {
	.h = {
		.type_id = REISER4_COMPRESSION_PLUGIN_TYPE,
		.id = LZ4_COMPRESSION_ID,
		.pops = &compression_plugin_ops,
		.label = "lz4",
		.desc = "lz4 compression transform",
		.linkage = {NULL, NULL}
	},
	.init = lz4_init,
	.overrun = lz4_overrun,
	.alloc = lz4_alloc,
	.free = lz4_free,
	.min_size_deflate = lz4_min_size_deflate,
	.checksum = reiser4_adler32,
	.compress = lz4_compress,
	.decompress = lz4_decompress
}
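To illustrate why the table is indexed by ID, here is a stripped-down sketch of dispatching through such an array. The struct demo_compression_plugin, the DEMO_ identifiers, and demo_overrun_for() are hypothetical and model only two of the callbacks; the real Reiser4 structures carry more fields.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down plugin descriptor. */
struct demo_compression_plugin {
	const char *label;
	int (*init)(void);
	int (*overrun)(unsigned src_len);
};

static int demo_lz4_init(void)
{
	return 0;	/* no pre-initialization needed */
}

static int demo_lz4_overrun(unsigned src_len)
{
	return src_len / 255 + 16;
}

enum { DEMO_LZ4_ID, DEMO_LAST_ID };

static struct demo_compression_plugin demo_plugins[DEMO_LAST_ID] = {
	[DEMO_LZ4_ID] = {
		.label   = "lz4",
		.init    = demo_lz4_init,
		.overrun = demo_lz4_overrun,
	},
};

/* Callers never name an algorithm directly: they index the table by
 * ID and call through the function pointer. Returns -1 for a bad ID. */
static int demo_overrun_for(int id, unsigned src_len)
{
	if (id < 0 || id >= DEMO_LAST_ID)
		return -1;
	return demo_plugins[id].overrun(src_len);
}
```

With this layout, the rest of the code stays unchanged when an algorithm is added; only the enumeration and the table grow.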
Now for what I changed in the LZ4 code. For starters, I removed all the code related to Microsoft Visual Studio (maybe someday the Linux kernel will be built with the MS VS compiler, but clearly not in the near future) and to C++ (a single extern "C"). Then I removed the code that optimizes for big-endian systems, which made the output incompatible with little-endian systems, as well as the code that allows stack memory to be used instead of heap memory (that would be faster, but we are in the kernel, where such liberties are not available). Finally, I removed the malloc()/free() calls from the code and instead added to the function arguments a pointer to a memory region allocated for LZ4's needs (recall alloc()).
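The last change, replacing internal malloc()/free() with a caller-supplied pointer, can be sketched with a toy function. This is not LZ4 code: demo_count_matches(), struct demo_workspace, and the table size are hypothetical stand-ins that only show the shape of the interface.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_TABLE_ENTRIES 256

/* Hypothetical workspace layout: the table the algorithm would
 * otherwise malloc() internally or place on the stack. */
struct demo_workspace {
	/* position + 1 of the last occurrence of each byte, 0 = unseen */
	size_t last_pos[DEMO_TABLE_ENTRIES];
};

/* Counts bytes whose value already occurred earlier in the input,
 * using only caller-owned memory. A stand-in for a real compressor's
 * match search; it performs no allocation of its own. */
static size_t demo_count_matches(const unsigned char *src, size_t len,
				 void *workmem)
{
	struct demo_workspace *ws = workmem;
	size_t i, matches = 0;

	memset(ws, 0, sizeof(*ws));	/* reset state between calls */
	for (i = 0; i < len; i++) {
		if (ws->last_pos[src[i]] != 0)
			matches++;
		ws->last_pos[src[i]] = i + 1;
	}
	return matches;
}
```

The caller allocates the workspace once (in the plugin's alloc() callback, in the scheme described above) and passes the same pointer to every call, so the hot path never allocates.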
And now the most important part: how did it all work? Frankly, badly. The LZ4 plugin was slower and compressed worse than the LZO1 plugin. Measurements were taken on a live system in single-user mode. Each measurement included unmounting the partition (so that the sync/flush procedures ran and the files were fully written to disk). Three tests were performed: sequential write/read of a file filled with zeros (from /dev/zero), sequential write/read of an incompressible file (taken beforehand from /dev/urandom and kept in tmpfs), and unpacking/packing the Linux kernel 3.9.5 sources. Of all the tests, the LZ4 plugin had an advantage only when writing/reading the file of zeros. In all the others, LZO1 beat LZ4 both in compression/decompression speed and in the final file size.
Further investigation (fullbench from LZ4, and lz4c vs lzop) showed that LZ4 loses its advantages on small blocks and delivers the claimed performance [5] only on large ones: by default fullbench uses 4 MiB blocks and lz4c uses 8 MiB. As Edward Shishkin put it: "4 MiB is a bit much. LZO1 compresses much smaller chunks too ..." [6]
Thus, I found that for Reiser4, LZO1 is preferable to LZ4. By the way, something tells me that the LZ4 support added by the community will not always show its strengths in ZFS either (although compared to LZJB it always will), and the unsuccessful attempts to push LZ4 into Linux [7] (as an option for kernel or initramfs compression) confirm this. As for LZ4 in btrfs ... Edward Shishkin has clearly explained what btrfs is and how it is being developed [8].
Patch Reiser4 for Linux 3.9
Patch LZ4 for Reiser4
PS The LZ4_decompress_safe() function needs to be redone a little, but it makes no sense, so I did not.
Patch for reiser4progs

[1] habrahabr.ru/post/45873
[2] http://theoks.net/~onekopaka/Reiser4Site/v4.html
[3] marc.info/?l=reiserfs-devel&m=135146138331012&w=2
[4] sourceforge.net/p/reiser4/discussion/general/thread/2bca4f8e
[5] code.google.com/p/lz4
[6] sourceforge.net/p/reiser4/discussion/general/thread/780facb4
[7] lwn.net/Articles/534168
[8] habrahabr.ru/post/108629