Writing a ZLib-based archiver in .NET

Why write one

because it is convenient to have your own customizable tool that lets you intervene at any stage of archiving
because it is interesting
because many archivers that offer an API are paid, and for the rest, see the first point

Technologies and Libraries

You will need the zlib.net.dll library (official site).
Visual Studio 2010 development environment
C# language
.NET Framework 3.5

Requirements

The archiver should be able to:

compress files and directories
build an archive without compression
encrypt data (with and without compression)
exclude specified paths
delete files after they are compressed
unpack an archive

Design

Archive format

After some optimization, I settled on the following layout:

Purpose	Size
Archive type	1 byte
Header length (after compression and encryption)	4 bytes
Header (described in more detail below)	N bytes
First file content block	N bytes
Second file content block	N bytes
...	...
K-th file content block	N bytes

Archive header format

Purpose	Size
Raw header size	4 bytes
Block 1	N bytes
Block 2	N bytes
...	...
Block K	N bytes

Archive header block format

Purpose	Size
Block size	4 bytes
Absolute path length	4 bytes
Absolute path	N bytes
Relative path length	4 bytes
Relative path	N bytes
Object size after processing	8 bytes
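
As a rough illustration of the block layout above, here is a minimal sketch of writing and reading one header block with BinaryWriter and BinaryReader. The class and member names (HeaderBlock, AbsolutePath, RelativePath, ProcessedSize) are hypothetical. The UTF-8 path encoding, the little-endian integers that BinaryWriter produces, and the reading of "block size" as the number of bytes that follow the size field are assumptions of this sketch, not details confirmed by the project sources.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical in-memory form of one header block; names are illustrative.
public class HeaderBlock
{
    public string AbsolutePath;
    public string RelativePath;
    public long ProcessedSize; // object size after compression/encryption

    // Layout: block size (4) | abs length (4) | abs path | rel length (4) | rel path | size (8)
    public void Write(BinaryWriter writer)
    {
        byte[] abs = Encoding.UTF8.GetBytes(AbsolutePath); // assumption: UTF-8 paths
        byte[] rel = Encoding.UTF8.GetBytes(RelativePath);

        // Assumption: the block size counts everything after the size field itself.
        int blockSize = 4 + abs.Length + 4 + rel.Length + 8;

        writer.Write(blockSize);
        writer.Write(abs.Length);
        writer.Write(abs);
        writer.Write(rel.Length);
        writer.Write(rel);
        writer.Write(ProcessedSize);
    }

    public static HeaderBlock Read(BinaryReader reader)
    {
        reader.ReadInt32(); // block size; not needed when reading sequentially

        var block = new HeaderBlock();
        int absLength = reader.ReadInt32();
        block.AbsolutePath = Encoding.UTF8.GetString(reader.ReadBytes(absLength));
        int relLength = reader.ReadInt32();
        block.RelativePath = Encoding.UTF8.GetString(reader.ReadBytes(relLength));
        block.ProcessedSize = reader.ReadInt64();
        return block;
    }
}
```

Storing the block size first lets a reader skip a block without decoding its paths, which is probably why the format carries it.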

A brief explanation. The archive file begins with the header, which collects all the metadata about the archived objects. The header goes through the same compression and encryption stages as the archived files. After the header come the blocks holding the processed file contents, stored back to back. Block boundaries are determined from the header, which stores the size of each block.
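
Since the content blocks follow one another without gaps, the start of each block is the sum of the sizes before it. A minimal sketch of that calculation follows. The names BlockLocator and ComputeOffsets are hypothetical, and the sketch assumes that the first block starts immediately after the processed header, that is, after 1 byte of archive type, 4 bytes of header length, and the header itself.

```csharp
using System;
using System.Collections.Generic;

public static class BlockLocator
{
    // Returns the file offset at which each content block starts.
    // processedHeaderLength: header size after compression/encryption (the 4-byte field).
    // processedSizes: the "object size after processing" of each header block, in order.
    public static long[] ComputeOffsets(int processedHeaderLength, IList<long> processedSizes)
    {
        // 1 byte archive type + 4 bytes header length + the header itself.
        long position = 1 + 4 + (long)processedHeaderLength;

        var offsets = new long[processedSizes.Count];
        for (int i = 0; i < processedSizes.Count; i++)
        {
            offsets[i] = position;
            position += processedSizes[i]; // the next block starts right after this one
        }
        return offsets;
    }
}
```

For example, with a 100-byte processed header and blocks of 10, 0, and 25 bytes, the blocks start at offsets 105, 115, and 115.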

General principles of work

The user sets the compression options, which determine the file handlers to attach (compressor, encryptor). Each handler has two methods, Execute and BackExecute. Archiving calls Execute; unpacking calls BackExecute and applies the handlers in reverse order. This structure makes it very easy to extend the program with any number of new handlers (for example, other encryption or compression methods).
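
The handler idea can be sketched roughly as follows. Only the method names Execute and BackExecute come from the article; the IHandler interface, the byte-array signatures, and the Pipeline class are assumptions. The compressor here is a stand-in built on System.IO.Compression.DeflateStream rather than zlib.net, and XorHandler is a toy placeholder for an encryption handler, not real encryption.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Hypothetical handler contract; the signatures are assumed.
public interface IHandler
{
    byte[] Execute(byte[] data);     // applied when archiving
    byte[] BackExecute(byte[] data); // applied when unpacking
}

// Stand-in compressor using DeflateStream instead of zlib.net.
public class DeflateHandler : IHandler
{
    public byte[] Execute(byte[] data)
    {
        using (var output = new MemoryStream())
        {
            using (var deflate = new DeflateStream(output, CompressionMode.Compress))
                deflate.Write(data, 0, data.Length);
            return output.ToArray(); // ToArray still works after the stream is closed
        }
    }

    public byte[] BackExecute(byte[] data)
    {
        using (var input = new DeflateStream(new MemoryStream(data), CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            var buffer = new byte[4096];
            int read;
            while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
                output.Write(buffer, 0, read);
            return output.ToArray();
        }
    }
}

// Toy placeholder for an encryption handler: XOR with one key byte (not secure).
public class XorHandler : IHandler
{
    private readonly byte key;
    public XorHandler(byte key) { this.key = key; }

    public byte[] Execute(byte[] data)
    {
        var result = new byte[data.Length];
        for (int i = 0; i < data.Length; i++)
            result[i] = (byte)(data[i] ^ key);
        return result;
    }

    // XOR is its own inverse, so undoing it is the same operation.
    public byte[] BackExecute(byte[] data) { return Execute(data); }
}

public static class Pipeline
{
    // Archiving: apply every handler in the configured order.
    public static byte[] Pack(IList<IHandler> handlers, byte[] data)
    {
        foreach (IHandler handler in handlers)
            data = handler.Execute(data);
        return data;
    }

    // Unpacking: undo the handlers in reverse order.
    public static byte[] Unpack(IList<IHandler> handlers, byte[] data)
    {
        for (int i = handlers.Count - 1; i >= 0; i--)
            data = handlers[i].BackExecute(data);
        return data;
    }
}
```

Compressing before encrypting is the usual order, because encrypted data is close to random and barely compresses.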

Work algorithm

Determining the archive type (compressed, encrypted)
Reading the list of objects to archive
Building the complete list of objects to archive from that list and the exclusion list
Creating the archive header (as an in-memory object)
Iterating over the full list of objects in the header
Processing each object: updating its post-processing size in the header and writing the processed content to a temporary file
Saving the header to a file
Processing the header (compression, encryption)
Building the final archive file
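
The step that builds the complete object list from the include and exclusion lists could look roughly like this. The ObjectList class and its Build method are hypothetical. The sketch assumes that an exclusion removes the path itself and everything below it, compares paths case-insensitively (a Windows-style assumption), and, as a simplification, collects only files, so empty directories are not listed.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ObjectList
{
    // Expands the include list into individual files and drops anything
    // located at or under an excluded path.
    public static List<string> Build(IEnumerable<string> includes, IEnumerable<string> excludes)
    {
        var excluded = new List<string>();
        foreach (string path in excludes)
            excluded.Add(Normalize(path));

        var result = new List<string>();
        foreach (string path in includes)
        {
            if (Directory.Exists(path))
            {
                foreach (string file in Directory.GetFiles(path, "*", SearchOption.AllDirectories))
                    AddIfAllowed(result, excluded, file);
            }
            else if (File.Exists(path))
            {
                AddIfAllowed(result, excluded, path);
            }
        }
        return result;
    }

    // Full path without a trailing separator, so prefixes compare consistently.
    private static string Normalize(string path)
    {
        return Path.GetFullPath(path)
                   .TrimEnd(Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar);
    }

    private static void AddIfAllowed(List<string> result, List<string> excluded, string file)
    {
        string full = Normalize(file);
        foreach (string ex in excluded)
        {
            // Excluded if it is the path itself or lies below an excluded directory.
            if (full.Equals(ex, StringComparison.OrdinalIgnoreCase) ||
                full.StartsWith(ex + Path.DirectorySeparatorChar, StringComparison.OrdinalIgnoreCase))
                return;
        }
        result.Add(full);
    }
}
```

Appending the separator before the prefix check keeps an exclusion of "C:\data\sub" from also excluding "C:\data\subfolder".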

Implementation

ZLib can compress and decompress data passed to it as a byte array. That is all we need and all that will be used. It cannot encrypt data; for that we use the standard .NET Framework library, System.Security.Cryptography.
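
For the encryption half, a minimal sketch of password-based encryption of a byte array with System.Security.Cryptography might look like this. The PasswordCrypto class is hypothetical, and the choice of AES in its default CBC mode, key derivation with Rfc2898DeriveBytes (PBKDF2), the iteration count, and storing the salt in front of the ciphertext are all assumptions of this sketch; the article does not say which algorithm the project uses.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class PasswordCrypto
{
    private const int SaltSize = 16;
    private const int Iterations = 10000; // illustrative value, not from the article

    public static byte[] Encrypt(byte[] data, string password)
    {
        var salt = new byte[SaltSize];
        RandomNumberGenerator.Create().GetBytes(salt);

        using (Aes aes = Aes.Create())
        {
            // Derive both key and IV from the password and the random salt.
            var kdf = new Rfc2898DeriveBytes(password, salt, Iterations);
            aes.Key = kdf.GetBytes(32); // AES-256 key
            aes.IV = kdf.GetBytes(16);  // one AES block

            using (var output = new MemoryStream())
            {
                output.Write(salt, 0, salt.Length); // the salt is stored unencrypted
                using (var crypto = new CryptoStream(output, aes.CreateEncryptor(), CryptoStreamMode.Write))
                    crypto.Write(data, 0, data.Length);
                return output.ToArray();
            }
        }
    }

    public static byte[] Decrypt(byte[] data, string password)
    {
        // Recover the salt written by Encrypt, then re-derive the same key and IV.
        var salt = new byte[SaltSize];
        Array.Copy(data, 0, salt, 0, SaltSize);

        using (Aes aes = Aes.Create())
        {
            var kdf = new Rfc2898DeriveBytes(password, salt, Iterations);
            aes.Key = kdf.GetBytes(32);
            aes.IV = kdf.GetBytes(16);

            using (var output = new MemoryStream())
            {
                using (var crypto = new CryptoStream(output, aes.CreateDecryptor(), CryptoStreamMode.Write))
                    crypto.Write(data, SaltSize, data.Length - SaltSize);
                return output.ToArray();
            }
        }
    }
}
```

Because both methods take and return byte arrays, they have the same shape as the compression calls and can be chained with them in either order.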
During archiving and unpacking, you can obtain information about the object currently being processed, as well as any errors that occur.
If an error occurs while processing a file, the user is offered a choice of four actions:

abort execution
ignore the error
ignore all errors
retry

The prompt can be disabled simply by commenting out the ErrorProcessing event subscription, in which case an error interrupts the program.
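
One way such an error prompt could be wired up is sketched below. Only the event name ErrorProcessing comes from the article; the ErrorAction enum, the FileErrorEventArgs class, the FileProcessor class, and the event signature are hypothetical. The sketch defaults to aborting, which matches the behavior described when nothing handles the event.

```csharp
using System;

public enum ErrorAction { Abort, Ignore, IgnoreAll, Retry }

// Hypothetical event arguments; the subscriber sets Action to choose what happens.
public class FileErrorEventArgs : EventArgs
{
    public string Path;
    public Exception Error;
    public ErrorAction Action = ErrorAction.Abort; // used when no subscriber changes it
}

public class FileProcessor
{
    // Only the event name is taken from the article; the signature is assumed.
    public event EventHandler<FileErrorEventArgs> ErrorProcessing;

    private bool ignoreAll;

    // Runs work(path). Returns true if the file was processed, false if it was skipped.
    public bool Process(string path, Action<string> work)
    {
        while (true)
        {
            try
            {
                work(path);
                return true;
            }
            catch (Exception ex)
            {
                if (ignoreAll) return false; // the user already chose "ignore all"

                var args = new FileErrorEventArgs { Path = path, Error = ex };
                EventHandler<FileErrorEventArgs> handler = ErrorProcessing;
                if (handler != null) handler(this, args);

                switch (args.Action)
                {
                    case ErrorAction.Retry:
                        continue; // run the work again
                    case ErrorAction.Ignore:
                        return false;
                    case ErrorAction.IgnoreAll:
                        ignoreAll = true;
                        return false;
                    default:
                        throw; // Abort, also the outcome when nobody is subscribed
                }
            }
        }
    }
}
```

A UI would subscribe to ErrorProcessing, show the four choices in a dialog, and write the selected value into Action.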
I won't reproduce the program code here; the sources are linked below.

Direct downloads:
Project
As a DLL

SVN:
svn://svn.code.sf.net/p/yark/code-0/trunk

Project:
sourceforge.net/projects/yark

And a usage example:

Compression

    ArchiveProvider compressor = new ArchiveProvider();
    using (SaveFileDialog sfd = new SaveFileDialog())
    {
        if (sfd.ShowDialog() == System.Windows.Forms.DialogResult.OK)
        {
            CompressorOption option = new CompressorOption()
            {
                Password = __,          // placeholder for the archive password
                WithoutCompress = true, // true to build the archive without compression
                RemoveSource = true,    // true to delete the source files afterwards
                Output = sfd.FileName
            };
            // paths to include
            foreach (string line in lbIncludes.Items)
                option.IncludePath.Add(line);
            // paths to exclude
            foreach (string line in lbExclude.Items)
                option.ExcludePath.Add(line);
            compressor.Compress(option);
        }
    }

Unarchiving

    ArchiveProvider decompressor = new ArchiveProvider();
    using (FolderBrowserDialog fbd = new FolderBrowserDialog())
    {
        if (fbd.ShowDialog() == System.Windows.Forms.DialogResult.OK)
        {
            decompressor.Decompress(__, fbd.SelectedPath, __);
        }
    }

Comparison of work results

I did not measure timing; it was roughly the same for all three.
Input data:

directory with text files (1,430 KB)
directory with mixed data (18,893 KB)

Archiver	Text (KB)	Mixed data (KB)
WinRAR	613	8,045
Zip	638	8,709
This archiver	588	8,655
For the RAR and ZIP formats, the normal compression level was selected, the same one this program uses.
The current archive format stores absolute file and directory paths; dropping them would slightly improve the compression ratio.

Possible improvements

save file metadata (creation and modification dates, access rights)
add multithreading (only the creation of temporary files needs to be parallelized)
add comments to the archive
associate archive files with the program

Source: https://habr.com/ru/post/133379/
