We are glad to welcome you. Almost a year has passed since the last article was published, and we are ready to tell you what happened with the algorithm itself and how delta coding is involved.
After the release of the article on improvements to the Broo algorithm, we were faced with an obstacle in improving the level of compression and performance, namely it was impossible to improve the level of compression without degrading the speed of decompression and vice versa. I will immediately make a reservation, the improvements were made without prejudice to the other characteristics of the algorithm, but these changes are minor, then we will write about these changes. So, after, we thought about where we can apply the accumulated expertise and knowledge in a similar direction. And the choice fell on delta encoding .
Delta encoding ( English Delta encoding) - a way to present data as a difference ( delta ) between serial data instead of the data itself.
In practice, if the compression algorithms allow you to reduce the file size and store or send it without any dependencies on other files, then the delta encoding algorithms allow you to build a smaller patch (difference) based on two files (data set) and apply a patch to the file ( data set) 1 - get the file (data set) 2 .
The most common use for delta coding is to update applications on your phones and PCs. Instead of downloading the application completely and then replacing the files, a much smaller patch is built (depending on the number of changes), which makes it much faster to download the update, and the speed with which the patch is applied directly affects the update speed of the application itself.
If you know where else delta coding is applied, then write in the comments.
As we said, they are few:
These changes are still at the stage of experimentation and debugging. The main problem - after adding support for large files, the decompression speed dropped by 20%, which is unacceptable for us. So we are still searching for a solution.
Below we present only one table of comparisons of the old version of the algorithm, the experimental one and some levels of zstd. File "xml" from the previous article .
Processor: Intel i7-7700HQ
Memory: DDR4-2400
Algorithm Name | Packing speed | Unpacking speed | Compressed file size, byte | % of the original |
---|---|---|---|---|
memcpy | 17460 MB / s | 17194 MB / s | 5345280 | 100.00 |
zstd 1.3.1 -6 | 141 MB / s | 1311 MB / s | 585810 | 10.96 |
broo 1.2 | 11 MB / s | 1905 MB / s | 606838 | 11.35 |
zstd 1.3.1 -5 | 196 MB / s | 1207 MB / s | 619510 | 11.59 |
zstd 1.3.1-4 | 357 MB / s | 1214 MB / s | 637587 | 11.93 |
zstd 1.3.1 -3 | 366 MB / s | 1220 MB / s | 639073 | 11.96 |
broo 1.1 | 14 MB / s | 2005 MB / s | 643084 | 12.03 |
zstd 1.3.1 -2 | 394 MB / s | 1108 MB / s | 690508 | 12.92 |
zstd 1.3.1 -1 | 479 MB / s | 1213 MB / s | 703093 | 13.15 |
Like many algorithms, the speed depends on the processor, as we can see in the table, the decompression speed is more than 1.5 times faster than the first level zstd, on the Intel i7-7700HQ. While on the old Intel i3-550, the decompression speed was approximately equal to the zstd decompression speed, you can see the comparative tables here .
This suggests that you can perform tighter integration with individual processors. Depends on the specifics of the problem.
As you might have guessed, we developed our own delta encoding algorithm and named it DBroo (Delta Broo).
Main characteristics and features:
There are ready-made solutions, such as diff, bsdiff, xdelta and others. There was a goal to find the best (and also available) in this direction and to be with it. In a purely experimental way, Xdelta3 turned out to be the main competitor. It gives a good compression and a fairly fast speed of the patch. Also, Xdelta3 is used to update CyanogenMod (now LineageOS ).
Now look at the comparison table DBroo and Xdelta3. "Xml" is used as the reference file, and the same but changed randomly as the new file.
Algorithm Name | Patch creation speed | Patch speed | Patch size, byte | % of original |
---|---|---|---|---|
memcpy | 18052 MB / s | 18665 MB / s | 5326823 | 100.00 |
Xdelta3 -9 + lzma | 5.40 MB / s | 306 MB / s | 106542 | 2.00 |
Xdelta3 -6 + lzma | 20 MB / s | 310 MB / s | 121916 | 2.28 |
DBroo 1.0 | 7.40 MB / s | 1600.00 MB / s | 123052 | 2.31 |
Xdelta3 -9 | 7.00 MB / s | 688.24 MB / s | 179732 | 3.37 |
Xdelta3 -6 | 36.71 MB / s | 694.09 MB / s | 201681 | 3.78 |
Xdelta3 -3 | 59.22 MB / s | 637.43 MB / s | 237218 | 4.45 |
Xdelta3 -2 | 72.73 MB / s | 582.75 MB / s | 279223 | 5.24 |
Xdelta3 -1 | 81.43 MB / s | 540.53 MB / s | 478824 | 8.9 |
Only those products that are in demand in the market receive development. Therefore, we welcome your comments. We also created a telegram channel .
Thank.
Source: https://habr.com/ru/post/425255/
All Articles