
Optimum continuous archive sorting

The whole program embodies one idea: arrange files so that the resulting archive is as small as possible.
The program tests how well files compress in pairs and then sorts the list accordingly before handing it to the archiver.

sourceforge.net/projects/saro-vks
If anyone needs it, feel free to take it.

The idea itself is not new: WinRAR 5, for example, can find duplicate files and store them in the archive as links.
A similar principle is used here, except that the "similarity" of files is determined by a compression test.

The best results come with large files, when the combined size of a pair of files exceeds the archiver's dictionary.
For example, compared to the default ordering, 1.5 GB of executables (about 50 MB each) shrank from 709 to 696 MB in the archive, and an archive of 25 MB of MIDI files (30..300 KB each) went from 5.55 to 5.51 MB.
A couple of pictures illustrating the algorithm, for clarity.

First, the compressibility of pairs is tested: an N*N matrix over the files is filled (in a diamond pattern, so that if the last files are deleted from the list, the results already obtained for the first files are kept).
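The matrix step can be sketched as follows. This is a minimal illustration, not the author's code: it uses zlib as a stand-in for whatever archiver the real program invokes, and the function names (`compressed_size`, `pair_matrix`) are made up for this example.

```python
import zlib

def compressed_size(data: bytes) -> int:
    """Size of the zlib-compressed payload (stand-in for the archiver's test)."""
    return len(zlib.compress(data, 6))

def pair_matrix(files: list[bytes]) -> list[list[int]]:
    """Fill the N*N matrix of pairwise compressed sizes.

    Entry [i][j] holds the compressed size of file i concatenated with
    file j: similar files share dictionary data, so their pair entry is
    noticeably smaller than that of two unrelated files."""
    n = len(files)
    m = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            m[i][j] = compressed_size(files[i] + files[j])
    return m
```

Because each entry depends only on the two files involved, appending new files to the list only adds new rows and columns; the previously computed entries stay valid, which is the point the "diamond" filling order exploits.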



Then the sequence is sorted with a sliding "window": within the window, all possible orderings of its files are tried without repetition (N! combinations), and the one giving the minimum size is kept.
The window gradually moves along the list as long as the size keeps decreasing.
When no smaller size can be found anywhere, the sorting is complete.
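The window search described above might look like this. This is a simplified sketch under an assumption the post does not spell out: that the objective being minimized can be approximated by the sum of pairwise matrix entries for adjacent files. The function name `window_sort` and the default window width are invented for illustration.

```python
from itertools import permutations

def window_sort(order: list[int], m: list[list[int]], window: int = 4) -> list[int]:
    """Slide a window over the ordering; at each position, try every
    permutation of the files inside the window and keep the ordering
    that minimizes the total cost. Repeat full passes until no
    improvement is found anywhere, then stop."""
    def total(seq: list[int]) -> int:
        # Sum of pairwise compressed sizes of adjacent files.
        return sum(m[seq[k]][seq[k + 1]] for k in range(len(seq) - 1))

    order = list(order)
    improved = True
    while improved:
        improved = False
        for start in range(len(order) - window + 1):
            prefix = order[:start]
            win = order[start:start + window]
            suffix = order[start + window:]
            best = min(permutations(win),
                       key=lambda p: total(prefix + list(p) + suffix))
            candidate = prefix + list(best) + suffix
            if total(candidate) < total(order):  # strict improvement only
                order = candidate
                improved = True
    return order
```

Each accepted swap strictly decreases the total, so the loop terminates; the factorial cost of trying every permutation is what keeps the window small in practice.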



P.S. I apologize for the rough state of this version; polishing is going slowly, and it is unknown when the final version will be ready, but it is already quite usable.
One significant limitation remains: the size of an archive holding a pair of files is limited to 4 GB (very large files were not needed, and handling 64-bit sizes takes more time).
If something is missing, write to me and I will add it if possible.

Source: https://habr.com/ru/post/253313/

