📜 ⬆️ ⬇️

The birth of a new algorithm called Broo and comparison with Brotli and the rest

Good day to habravchane and guests of the site, we will focus on the lossless compression algorithm, which is a joint "child." In this article, the intermediate results of which could be achieved in the form of comparison tables with popular algorithms will be presented.

Briefly about the algorithm


The main ideology for the algorithm was compiled in several characteristics:


The packing (compression) speed was not included here initially, but will gradually improve as the whole algorithm as a whole. Entropy compression and vocabulary methods are not used.

Training


For cleanliness and ease of measurement, the algorithm was integrated into the lzbench utility (link to GitHub ), since it already has a sufficient number of other algorithms and it is easy to integrate your own.
')
Next, the files were selected from a ready-made package for testing algorithms called Silesia. Brief description of the files:
File nameDescriptionType ofSize, Byte
dickensThe collection of works by Charles Dickenstxt (eng)10 192 446
mozillatar archive with Mozilla 1.0 executablesexe51 220 480
mrMRI imageimage9,970,564
nciBase of chemical structuresdatabase33,553,445
oofficeDll files from Open Office.org 1.01exe6,152,192
osdbSample MySQL database format fromdatabase10,085,684
reymontText bookPolish, pdf6,627,202
sambasource tar archivesrc21,606,400
saoStar Catalog of the Smithsonian Astrophysical Observatorybin data7,251,944
websterWebster's American English Dictionaryhtml41,458,703
xmlCollection of xml filesxml5 345 280
X-rayX-ray imageimage8,474,240
Source

List of algorithms that participate in comparisons



The list is saturated with algorithms that solve different problems, but everything is known in comparison.

A short description of some of them.


Brotli - is based on the modern version of the LZ77 algorithm, the entropy Huffman coding and the modeling of the 2nd order context.
Designed to speed up the loading of web pages, supported in Chrome browsers based on Chromium and Firefox.

Deflate is a lossless compression algorithm using a combination of LZ77 and Huffman algorithms.

Zstandard (Zstd) is a lossless data compression algorithm developed from 2015 Yann Collet with the support of Facebook. Combines the vocabulary data compression algorithm of the LZ77 type and the effective entropy coding of the tANS type (FSE - Finite State Entropy), a modification of the Huffman code that implements a non-integer number of bits for storing characters.

LZMA - used in the 7-Zip archiver to create compressed archives in 7z format. The algorithm is based on a dictionary data compression scheme similar to that used in the LZ77, and provides a high compression ratio (usually higher than that obtained with bzip2 compression), and also allows the use of dictionaries of various sizes (up to 4 GB).

Snappy - another google lossless compression algorithm is designed to achieve high speeds without achieving maximum compression levels.

PC specifications


DualCore Intel Core i3 550, 3200 MHz processor
Memory GoodRam 8119 Mb DDR3-1333 DDR3 SDRAM
Ubuntu OS 16.10 x64

results


memcpy - a function that copies the data, taken as a reference for the speed of packing and unpacking.

The tables are sorted by compression ratio (“% of the original”, this way lzbench displays, did not convert) from small to large.

Test 1. Collection of works by Charles Dickens, text
Algorithm NamePacking speedUnpacking speedCompressed file size, byte% of originalFile Name, Type
memcpy4029 MB / s4034 MB / s10192446100.00dickens, txt
csc 2016-10-13 -118 MB / s31 MB / s402091639.45dickens, txt
lzlib 1.7 -07.76 MB / s34 MB / s381533537.43dickens, txt
lzma 9.38 -013 MB / s38 MB / s404485039.68dickens, txt
libdeflate 0.6 -185 MB / s435 MB / s423154341.52dickens, txt
zstd 1.1.3 -1143 MB / s486 MB / s427927341.98dickens, txt
xpack 2016-06-02 -183 MB / s359 MB / s428224542.01dickens, txt
brotli 0.5.2 -0168 MB / s178 MB / s440126943.18dickens, txt
zlib 1.2.8 -150 MB / s195 MB / s458561844.99dickens, txt
broo 1.06.03 MB / s265 MB / s475093646.61dickens, txt
gipfeli 2016-07-13178 MB / s254 MB / s495563248.62dickens, txt
yalz77 2015-09-19 -162 MB / s304 MB / s563410955.28dickens, txt
quicklz 1.5.0 -1250 MB / s326 MB / s583135357.21dickens, txt
lzsse2 2016-05-14 -018 MB / s1481 MB / s586570557.55dickens, txt
yappy 2014-03-22 -091 MB / s1122 MB / s614185360.26dickens, txt
snappy 1.1.3179 MB / s648 MB / s633783462.18dickens, txt
lz4 1.7.5264 MB / s1652 MB / s642874263.07dickens, txt
lz5 2.0 -10216 MB / s1855 MB / s643186963.10dickens, txt

Test 2. Tar archive with executable files Mozilla 1.0, exe
Algorithm NamePacking speedUnpacking speedCompressed file size, byte% of originalFile Name, Type
memcpy3986 MB / s4042 MB / s51220480100.00mozilla exe
csc 2016-10-13 -111 MB / s41 MB / s1533119129.93mozilla exe
lzma 9.38 -017 MB / s43 MB / s1642527232.07mozilla exe
lzlib 1.7 -018 MB / s33 MB / s1647048432.16mozilla exe
xpack 2016-06-02 -176 MB / s368 MB / s1839187435.91mozilla exe
libdeflate 0.6 -192 MB / s396 MB / s1978012438.62mozilla exe
zstd 1.1.3 -1209 MB / s542 MB / s2012045939.28mozilla exe
zlib 1.2.8 -153 MB / s209 MB / s2057722640.17mozilla exe
brotli 0.5.2 -0217 MB / s186 MB / s2174012842.44mozilla exe
broo 1.05.11 MB / s350 MB / s2317722045.25mozilla exe
gipfeli 2016-07-13236 MB / s436 MB / s2438055847.60mozilla exe
quicklz 1.5.0 -1315 MB / s368 MB / s2475681948.33mozilla exe
yalz77 2015-09-19 -149 MB / s436 MB / s2545453249.70mozilla exe
lzsse2 2016-05-14 -013 MB / s1493 MB / s2582664850.42mozilla exe
lz4 1.7.5437 MB / s1876 ​​MB / s2643566751.61mozilla exe
snappy 1.1.3303 MB / s1013 MB / s2646192451.66mozilla exe
lz5 2.0 -10334 MB / s2097 MB / s2701624252.74mozilla exe
yappy 2014-03-22 -0107 MB / s1749 MB / s2772821854.14mozilla exe

Test 3. MRI image, image
Algorithm NamePacking speedUnpacking speedCompressed file size, byte% of originalFile Name, Type
lzlib 1.7 -020 MB / s34 MB / s313089731.40mr, image
lzma 9.38 -016 MB / s44 MB / s315762631.67mr, image
csc 2016-10-13 -112 MB / s40 MB / s328580532.96mr, image
xpack 2016-06-02 -182 MB / s323 MB / s352682835.37mr, image
libdeflate 0.6 -198 MB / s428 MB / s375098537.62mr, image
zlib 1.2.8 -160 MB / s227 MB / s382836638.40mr, image
zstd 1.1.3 -1191 MB / s637 MB / s382923138.41mr, image
brotli 0.5.2 -0198 MB / s185 MB / s397564339.87mr, image
gipfeli 2016-07-13220 MB / s395 MB / s470256147.16mr, image
broo 1.05.94 MB / s305 MB / s474121947.55mr, image
quicklz 1.5.0 -1410 MB / s363 MB / s477819447.92mr, image
lzsse2 2016-05-14 -024 MB / s1523 MB / s512028951.35mr, image
yalz77 2015-09-19 -158 MB / s396 MB / s526936852.85mr, image
snappy 1.1.3302 MB / s912 MB / s541983154.36mr, image
lz4 1.7.5422 MB / s2024 MB / s544093754.57mr, image
yappy 2014-03-22 -0108 MB / s1609 MB / s645412064.73mr, image
lz5 2.0 -10294 MB / s2248 MB / s697848669.99mr, image

Test 4. Base of chemical structures, database
Algorithm NamePacking speedUnpacking speedCompressed file size, byte% of originalFile Name, Type
memcpy4042 MB / s4047 MB ​​/ s33553445100.00nci, db
csc 2016-10-13 -139 MB / s156 MB / s24637737.34nci, db
lzma 9.38 -043 MB / s153 MB / s27779978.28nci, db
lzlib 1.7 -049 MB / s103 MB / s28687618.55nci, db
zstd 1.1.3 -1435 MB / s915 MB / s28845308.60nci, db
broo 1.08.65 MB / s1000 MB / s29819708.89nci, db
xpack 2016-06-02 -1180 MB / s807 MB / s383884711.44nci, db
brotli 0.5.2 -0539 MB / s575 MB / s398419911.87nci, db
libdeflate 0.6 -1180 MB / s1165 MB / s406691312.12nci, db
zlib 1.2.8 -1122 MB / s404 MB / s462459713.78nci, db
yalz77 2015-09-19 -1197 MB / s695 MB / s505059615.05nci, db
gipfeli 2016-07-13529 MB / s681 MB / s5063829September 15nci, db
lz4 1.7.5765 MB / s2496 MB / s553304016.49nci, db
lz5 2.0 -10657 MB / s2644 MB / s554581016.53nci, db
snappy 1.1.3560 MB / s1452 MB / s614684418.32nci, db
quicklz 1.5.0 -1512 MB / s799 MB / s616063618.36nci, db
lzsse2 2016-05-14 -015 MB / s2984 MB / s633980718.89nci, db
yappy 2014-03-22 -0179 MB / s1941 MB / s896756226.73nci, db

Test 5. Dll files from Open Office.org 1.01, exe
Algorithm NamePacking speedUnpacking speedCompressed file size, byte% of originalFile Name, Type
memcpy4054 MB / s4102 MB / s6152192100.00ooffice, exe
csc 2016-10-13 -19.91 MB / s29 MB / s230152337.41ooffice, exe
lzma 9.38 -013 MB / s31 MB / s284157846.19ooffice, exe
lzlib 1.7 -014 MB / s24 MB / s287948946.80ooffice, exe
xpack 2016-06-02 -160 MB / s342 MB / s313796051.01ooffice, exe
libdeflate 0.6 -169 MB / s286 MB / s318743451.81ooffice, exe
zlib 1.2.8 -140 MB / s151 MB / s329053253.49ooffice, exe
brotli 0.5.2 -0154 MB / s143 MB / s353961557.53ooffice, exe
zstd 1.1.3 -1166 MB / s487 MB / s357989958.19ooffice, exe
broo 1.04.93 MB / s412 MB / s375720661.07ooffice, exe
gipfeli 2016-07-13163 MB / s354 MB / s392227663.75ooffice, exe
lzsse2 2016-05-14 -015 MB / s1205 MB / s399509164.94ooffice, exe
quicklz 1.5.0 -1234 MB / s264 MB / s401385965.24ooffice, exe
yalz77 2015-09-19 -135 MB / s398 MB / s412557067.06ooffice, exe
yappy 2014-03-22 -082 MB / s1718 MB / s423568768.85ooffice, exe
snappy 1.1.3222 MB / s889 MB / s427115069.42ooffice, exe
lz4 1.7.5337 MB / s1671 MB / s433891870.53ooffice, exe
lz5 2.0 -10251 MB / s1997 MB / s437007071.03ooffice, exe

Test 6. Sample MySQL database format from the Open Source Database Benchmark, database
Algorithm NamePacking speedUnpacking speedCompressed file size, byte% of originalFile Name, Type
memcpy4095 MB / s4073 MB / s10085684100.00osdb db
csc 2016-10-13 -110 MB / s38 MB / s331780032.90osdb db
lzlib 1.7 -019 MB / s33 MB / s334596533.18osdb db
xpack 2016-06-02 -168 MB / s475 MB / s375287137.21osdb db
zstd 1.1.3 -1194 MB / s585 MB / s377056637.39osdb db
libdeflate 0.6 -190 MB / s470 MB / s389680338.64osdb db
brotli 0.5.2 -0214 MB / s224 MB / s391050238.77osdb db
lzma 9.38 -015 MB / s38 MB / s398882339.55osdb db
zlib 1.2.8 -156 MB / s211 MB / s407639140.42osdb db
broo 1.05.40 MB / s474 MB / s414746541.12osdb db
lzsse2 2016-05-14 -012 MB / s1724 MB / s449255144.54osdb db
gipfeli 2016-07-13232 MB / s530 MB / s451751744.79osdb db
yalz77 2015-09-19 -151 MB / s596 MB / s457019345.31osdb db
lz4 1.7.5359 MB / s1629 MB / s525666652.12osdb db
lz5 2.0 -10278 MB / s1842 MB / s528673952.42osdb db
snappy 1.1.3303 MB / s1110 MB / s532932152.84osdb db
quicklz 1.5.0 -1277 MB / s330 MB / s549644354.50osdb db
yappy 2014-03-22 -070 MB / s1794 MB / s751573574.52osdb db

Test 7. Text of the book by Chłopi, Polish writer Radislav Reymont, Polish, PDF
Algorithm NamePacking speedUnpacking speedCompressed file size, byte% of originalFile Name, Type
memcpy4123 MB / s4120 MB / s6627202100.00reymont pdf
csc 2016-10-13 -115 MB / s47 MB ​​/ s187232428.25reymont pdf
lzma 9.38 -015 MB / s49 MB / s192195429.00reymont pdf
lzlib 1.7 -022 MB / s37 MB / s208229731.42reymont pdf
zstd 1.1.3 -1157 MB / s486 MB / s216738532.70reymont pdf
libdeflate 0.6 -1100 MB / s512 MB / s220693233.30reymont pdf
xpack 2016-06-02 -197 MB / s389 MB / s227971634.40reymont pdf
broo 1.05.10 MB / s423 MB / s228901934.54reymont pdf
brotli 0.5.2 -0212 MB / s226 MB / s236073235.62reymont pdf
zlib 1.2.8 -159 MB / s213 MB / s237643035.86reymont pdf
gipfeli 2016-07-13222 MB / s318 MB / s264491639.91reymont pdf
quicklz 1.5.0 -1284 MB / s399 MB / s300382545.33reymont pdf
yalz77 2015-09-19 -176 MB / s347 MB ​​/ s301708345.53reymont pdf
lzsse2 2016-05-14 -016 MB / s1735 MB / s303939245.86reymont pdf
yappy 2014-03-22 -0119 MB / s1252 MB / s316134447.70reymont pdf
lz4 1.7.5303 MB / s1611 MB / s318138748.00reymont pdf
lz5 2.0 -10265 MB / s1626 MB / s318490148.06reymont pdf
snappy 1.1.3208 MB / s729 MB / s323378748.80reymont pdf

Test 8. Tar source archive Samba 2-2.3, src
Algorithm NamePacking speedUnpacking speedCompressed file size, byte% of originalFile Name, Type
memcpy4048 MB / s4033 MB / s21606400100.00samba src
csc 2016-10-13 -117 MB / s60 MB / s440724120.40samba src
lzlib 1.7 -026 MB / s46 MB / s517881923.97samba src
lzma 9.38 -021 MB / s59 MB / s533893524.71samba src
zstd 1.1.3 -1257 MB / s715 MB / s555063725.69samba src
xpack 2016-06-02 -1107 MB / s568 MB / s566929526.24samba src
libdeflate 0.6 -1113 MB / s615 MB / s592297327.41samba src
brotli 0.5.2 -0304 MB / s285 MB / s608432728.16samba src
broo 1.06.90 MB / s650 MB / s618604228.63samba src
zlib 1.2.8 -173 MB / s276 MB / s632945529.29samba src
gipfeli 2016-07-13323 MB / s426 MB / s681062331.52samba src
yalz77 2015-09-19 -181 MB / s512 MB / s709889932.86samba src
quicklz 1.5.0 -1366 MB / s497 MB / s730945233.83samba src
lzsse2 2016-05-14 -014 MB / s2144 MB / s739573734.23samba src
lz4 1.7.5486 MB / s2035 MB / s771683935.72samba src
lz5 2.0 -10398 MB / s2246 MB / s792717836.69samba src
snappy 1.1.3353 MB / s1089 MB / s800877437.07samba src
yappy 2014-03-22 -0123 MB / s1769 MB / s918327342.50samba src

Test 9. Star catalog of the Smithsonian Astrophysical Observatory, bin
Algorithm NamePacking speedUnpacking speedCompressed file size, byte% of originalFile Name, Type
memcpy4096 MB / s4114 MB / s7251944100.00sao bin
lzma 9.38 -09.47 MB ​​/ s22 MB / s492352967.89sao bin
lzlib 1.7 -010 MB / s16 MB / s500557369.02sao bin
csc 2016-10-13 -15.69 MB / s17 MB / s508284670.09sao bin
xpack 2016-06-02 -147 MB ​​/ s312 MB / s525960672.53sao bin
libdeflate 0.6 -160 MB / s258 MB / s549426875.76sao bin
zlib 1.2.8 -131 MB / s158 MB / s556777476.78sao bin
brotli 0.5.2 -0130 MB / s120 MB / s601984183.01sao bin
gipfeli 2016-07-13146 MB / s422 MB / s604336183.33sao bin
broo 1.03.55 MB / s496 MB / s608611883.92sao bin
yappy 2014-03-22 -068 MB / s1709 MB / s620175285.52sao bin
zstd 1.1.3 -1145 MB / s483 MB / s625428286.24sao bin
yalz77 2015-09-19 -126 MB / s576 MB / s629903086.86sao bin
snappy 1.1.3212 MB / s969 MB / s643526688.74sao bin
quicklz 1.5.0 -1229 MB / s222 MB / s649830189.61sao bin
lzsse2 2016-05-14 -015 MB / s941 MB / s671054292.53sao bin
lz4 1.7.5337 MB / s2161 MB / s679027393.63sao bin
lz5 2.0 -10236 MB / s2501 MB / s679272093.67sao bin

Test 10. Webster's American English Dictionary, html
Algorithm NamePacking speedUnpacking speedCompressed file size, byte% of originalFile Name, Type
memcpy3970 MB / s4008 MB / s41458703100.00webster, html
csc 2016-10-13 -113 MB / s44 MB / s1036015524.99webster, html
lzma 9.38 -017 MB / s47 MB ​​/ s1270487830.64webster, html
lzlib 1.7 -022 MB / s38 MB / s1272759630.70webster, html
zstd 1.1.3 -1169 MB / s531 MB / s1373828433.14webster, html
libdeflate 0.6 -199 MB / s524 MB / s1383919233.38webster, html
broo 1.05.42 MB / s266 MB / s1385419533.42webster, html
xpack 2016-06-02 -194 MB / s441 MB / s1400690733.79webster, html
brotli 0.5.2 -0187 MB / s207 MB / s1455900735.12webster, html
zlib 1.2.8 -160 MB / s211 MB / s1499124236.16webster, html
gipfeli 2016-07-13209 MB / s281 MB / s1615231238.96webster, html
lzsse2 2016-05-14 -014 MB / s1897 MB / s1745951742.11webster, html
quicklz 1.5.0 -1276 MB / s369 MB / s1831581644.18webster, html
yalz77 2015-09-19 -162 MB / s315 MB / s1843524844.47webster, html
yappy 2014-03-22 -0107 MB / s1378 MB / s1989961048.00webster, html
lz4 1.7.5317 MB / s1593 MB / s2013998848.58webster, html
lz5 2.0 -10260 MB / s1790 MB / s2015354748.61webster, html
snappy 1.1.3214 MB / s765 MB / s2020646648.74webster, html

Test 11. Collection of xml files, xml
Algorithm NamePacking speedUnpacking speedCompressed file size, byte% of originalFile Name, Type
memcpy4118 MB / s4113 MB / s5345280100.00xml
csc 2016-10-13 -127 MB / s99 MB / s60676311.35xml
lzma 9.38 -034 MB / s108 MB / s69123612.93xml
zstd 1.1.3 -1363 MB / s887 MB / s70315113.15xml
lzlib 1.7 -039 MB / s73 MB / s74153713.87xml
broo 1.07.91 MB / s1277 MB / s80052614.98xml
brotli 0.5.2 -0409 MB / s451 MB / s90575716.94xml
libdeflate 0.6 -1143 MB / s856 MB / s94040917.59xml
zlib 1.2.8 -1104 MB / s344 MB / s96524818.06xml
xpack 2016-06-02 -1137 MB / s634 MB / s100000818.71xml
yalz77 2015-09-19 -1157 MB / s666 MB / s106737819.97xml
gipfeli 2016-07-13406 MB / s527 MB / s110053620.59xml
quicklz 1.5.0 -1452 MB / s712 MB / s112470821.04xml
lzsse2 2016-05-14 -018 MB / s2870 MB / s120112522.47xml
lz4 1.7.5617 MB / s1991 MB / s122749522.96xml
lz5 2.0 -10524 MB / s2231 MB / s124009823.20xml
snappy 1.1.3414 MB / s1196 MB / s130837424.48xml
yappy 2014-03-22 -0155 MB / s1915 MB / s160545930.04xml

Test 12. X-ray image, image
Algorithm NamePacking speedUnpacking speedCompressed file size, byte% of originalFile Name, Type
memcpy4023 MB / s4106 MB / s8474240100.00x-ray, image
csc 2016-10-13 -116 MB / s21 MB / s404963047.79x-ray, image
lzlib 1.7 -09.85 MB / s18 MB / s507927459.94x-ray, image
lzma 9.38 -010 MB / s23 MB / s519889461.35x-ray, image
xpack 2016-06-02 -148 MB / s243 MB / s586336769.19x-ray, image
libdeflate 0.6 -163 MB / s267 MB / s599975070.80x-ray, image
zlib 1.2.8 -135 MB / s145 MB / s603393271.20x-ray, image
brotli 0.5.2 -0139 MB / s121 MB / s660052377.89x-ray, image
zstd 1.1.3 -1419 MB / s569 MB / s677228679.92x-ray, image
lzsse2 2016-05-14 -017 MB / s883 MB / s729287686.06x-ray, image
quicklz 1.5.0 -1264 MB / s219 MB / s744063287.80x-ray, image
gipfeli 2016-07-13165 MB / s486 MB / s764139190.17x-ray, image
broo 1.03.47 MB ​​/ s487 MB / s770271590.90x-ray, image
yalz77 2015-09-19 -123 MB / s491 MB / s793365393.62x-ray, image
snappy 1.1.3446 MB / s1869 MB / s820918096.87x-ray, image
yappy 2014-03-22 -059 MB / s3200 MB / s832858298.28x-ray, image
lz4 1.7.5852 MB / s3457 MB / s839019599.01x-ray, image
lz5 2.0 -10540 MB / s4126 MB / s845968599.83x-ray, image

“Afterword”


It should be noted that it is still damp and there is still enough work to improve, but it already gives a positive effect. There are still hours of generating and testing hypotheses, writing tests and optimizing code.

Thanks for attention.

Source: https://habr.com/ru/post/322978/


All Articles