
7-Zip from .NET, or how I ran an open source project on CodePlex



This article could equally well belong to the ".NET", "Open source" and "I am promoting" blogs. As I wrote it, it became clear that most of it was about open source. Please do not judge me too harshly if I picked the wrong one.

So, below is the story of my year and a half of developing the open source SevenZipSharp library, which I published on CodePlex in February 2009. The library is a wrapper over 7-Zip that makes it easy to use from .NET.

Using SevenZipSharp

The library has two main classes: SevenZipExtractor and SevenZipCompressor. The usage pattern for the first:
// Synchronous extraction
using (var extr = new SevenZipExtractor(@"\\"))
{
    // Subscribe with a method group; the handler is not called here
    extr.Extracting += DoExtractingEvent;
    extr.ExtractArchive(@"\");
    DoFinishEvent();
}

// Asynchronous extraction
var extr = new SevenZipExtractor(@"\\");
extr.Extracting += DoExtractingEvent;
// Release the extractor only after the background operation has finished
extr.ExtractionFinished += (s, e) => { DoFinishEvent(); extr.Dispose(); extr = null; };
extr.BeginExtractArchive(@"\");


The usage pattern for the second:
// Synchronous compression
var cmpr = new SevenZipCompressor();
cmpr.CompressDirectory(@"\\\", @"\");
DoFinishEvent();
cmpr = null;

// Asynchronous compression
var cmpr = new SevenZipCompressor();
cmpr.CompressionFinished += (s, e) => { DoFinishEvent(); cmpr = null; };
cmpr.BeginCompressDirectory(@"\\\", @"\");



I don’t want to turn the article into documentation on SevenZipSharp, so I’ll just list some of its features:

How it all began

In February 2009, I needed to work with 7-zip archives in one of the paid projects I was working on. For several days I searched unsuccessfully for a ready-made, usable solution, but found nothing better than this article on CodeProject. I did, however, read plenty of posts where people complained that no such solution existed. So, gathering my will into a fist and starting from that article, I bravely set out to write my own wrapper over 7-Zip. I decided to publish the code on the recently opened CodePlex under the LGPLv3 license. At first the work was in full swing, and I put out release after release every few days (you can see for yourself in the "Other downloads" section on the download page). Then my enthusiasm cooled a little and I began to stabilize the code. In September 2009 the frequent releases stopped (I got married), and since then I have supported the project as best I can.
I also considered compiling 7-Zip into a mixed assembly with the /clr flag. I rejected this option because, first, the resulting interface would have been low-level and unsuitable for quick use, so I would still have had to write a layer on top of it; and second, building the code with the /clr:pure flag would have required rewriting a lot of code, and the unmanaged parts would have remained anyway.

When SevenZipSharp first appeared, I wanted to tell potentially interested users and developers about it. I left a short description of the library wherever I could: in answers to StackOverflow questions, on programming forums (including MSDN and Channel 9), in the comments on that same CodeProject article, and even on the English Wikipedia. All this paid off, and soon Google search results became the top source of traffic. I think everyone should advertise their projects, otherwise most people simply won't know they exist. The effectiveness of the advertising can be measured by the download and visit statistics, which are publicly available.

SevenZipSharp and 7-Zip

As many know, 7-Zip is written in C++ with a small amount of C and the notorious assembler function that calculates CRC32, which Igor Pavlov implemented for x86, x86-64 and ARM. There is no documentation for the code at all, which is very much in the style of Russian programmers involved in the open source movement. There is a lot of code, it is far from simple, and it takes a while to make sense of the abundant defines, interfaces and classes. Implementations of compression algorithms are called codecs. Codecs are built into the library in a standard way, as plug-in modules, much like the protocol plug-ins in messengers such as Miranda or Pidgin. The 7-Zip architecture is inseparable from COM; this is exactly what holds back the development of p7zip, the version of 7-Zip for POSIX systems, which Igor Pavlov also works on. In p7zip, COM is replaced with a crutch that simulates its behavior and, along the way, declares half of the windows.h types. The algorithms themselves are written flawlessly and are very stable, but the upper layers, as you may have guessed, leave much to be desired. If the author were starting 7-Zip today, I think he would come up with a core architecture that is more understandable, universal and portable, ideally in a language like C# or Java, or even Python (C++ is simply not suited to this purpose, let's face it).

By the way, 7-Zip for end users (the installer downloaded from 7-zip.org) is built with Visual Studio 6, a turn-of-the-century vintage. The solution files convert successfully to the VS2008/2010 formats, and after replacing the C/C++ compiler with a newer one and enabling all optimization flags (yes, compilers are my main profession), as well as using profile-guided optimization, you get a speedup of around 15% (LZMA/LZMA2). Just a note...

This is how SevenZipSharp wraps 7-Zip. Through COM's CreateObject, an object is created that supports the requested interface (IInArchive, IOutArchive). The necessary functions are then called on this object to achieve the desired result (for example, IInArchive.Extract(...)). During lengthy operations, the unmanaged code calls managed callbacks, and this leads to a problem I did not notice right away: error handling. For example, if there is an error in a callback, or an exception in a user event handler invoked from that callback, the operation fails without warning and without any intelligible information, apart from a strange 32-bit error code. I decided to wrap all the callbacks in try/catch and push every exception encountered onto an error stack, which is shown to the user if the operation fails. If there is a more elegant solution, please tell me about it.
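
The try/catch approach described above could look roughly like the following. This is only a minimal sketch, not the actual SevenZipSharp code: the CallbackGuard class and its members are hypothetical names invented for illustration, and it assumes the native side only understands an integer result (0 for success, E_FAIL otherwise).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of SevenZipSharp): runs managed callbacks
// on behalf of unmanaged code and records every exception they throw.
public sealed class CallbackGuard
{
    public const int S_OK = 0;
    public const int E_FAIL = unchecked((int)0x80004005);

    private readonly List<Exception> _errors = new List<Exception>();

    // Exceptions collected so far, in the order they occurred.
    public IReadOnlyList<Exception> Errors => _errors;

    // Executes the callback and converts any exception into an error code,
    // so that nothing propagates across the managed/unmanaged boundary.
    public int Invoke(Action callback)
    {
        try
        {
            callback();
            return S_OK;
        }
        catch (Exception ex)
        {
            _errors.Add(ex);
            return E_FAIL;
        }
    }

    // Called after the native operation returns: turns a bare error code
    // back into an exception that carries the recorded details.
    public void ThrowIfFailed(int hresult)
    {
        if (hresult == S_OK)
        {
            return;
        }
        throw new AggregateException(
            "The archive operation failed with code 0x" + hresult.ToString("X8"),
            _errors);
    }
}
```

In such a design every callback passed to the native library would go through Invoke, and the wrapper would call ThrowIfFailed once the native call returns, so the user sees the original exceptions instead of only the numeric code.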

Enthusiasts regularly attempt to rewrite the entire 7-Zip code in C# in one fell swoop, but none of these attempts has gone beyond discussion. Porting the algorithms from C++ to C# is not worth it: the effort spent and the drop in speed are not repaid by cross-platform support and ideology, and only Pavlov himself could rewrite the core with all its subtleties taken into account. To back this up: the C#/.NET implementation of LZMA from the LZMA SDK is, according to measurements, 4 times slower than the unmanaged one. So perhaps the best thing to do in this situation was to make a wrapper with a clear and simple interface.

At one point, I wanted to make SevenZipSharp work under Mono (GNU/Linux). And here the problem of 7-Zip's dependence on COM showed itself in all its glory. The low-level part of the library had to be rewritten almost from scratch. Because the 7-Zip code, as I already wrote, is peculiar, automatic wrapper generators like SWIG turned out to be useless; to get them to work at all, I first had to run the whole code through the preprocessor and strip out the ten-story defines. I am currently writing a COM-independent wrapper.

Development

Many novice C# developers probably repeat the same mistakes, and I am no exception. When I learned about FxCop and StyleCop, I immediately tried to use them. It seemed a logical way to keep the library code in good shape. However, StyleCop produces too many warnings by default, and I did not want to configure it, so I dropped it right away. I used FxCop for some time, but at one point I caught myself thinking that I was spending too much time on trivial code changes to satisfy rules that, in fact, affect nothing. My conclusion is that these tools are important for developing a personal programming style, but solo developers should not get carried away with them.

Initially, SevenZipSharp was written in Visual Studio 2008 and targeted .NET Framework 2.0. Even then, I understood that the lower the required .NET version, the fewer problems people would have using the library. It is a pity that many developers on CodePlex do not understand this: they start writing code using the ultra-modern features of .NET 4 and then wonder why they have so few downloads. Then I found out that Windows Mobile before version 7 has full-fledged COM, and within a few days I ported SevenZipSharp to these mobile systems. In case anyone doesn't know, the regular and Compact frameworks differ in a number of ways, and without minor changes the code will not compile, let alone work. I considered maintaining two almost identical branches unreasonable and solved the problem with numerous #if / #else / #endif directives (the standard approach in C++ sources).

When Visual Studio 2010 / C# 4 came out, I found that features of the new language version could be put to good use in the code (for example, the newly introduced optional parameters eliminate 10+ overloads of a single logical method). To maintain backward compatibility, I used #if / #else / #endif again. The code gradually began to turn from graceful classes into a branching monster. And when the idea came to port SevenZipSharp to Mono, I finally split some of the code files, because otherwise I myself would not have been able to make sense of them a couple of weeks later. As a result, I faced the problem of supporting different platforms and frameworks within a single file in all its glory. Example:
#if !DOTNET20
/// <summary>
/// Unpacks the whole archive asynchronously to the specified directory name at the specified priority.
/// </summary>
/// <param name="directory">The directory where the files are to be unpacked.</param>
/// <param name="eventPriority">The priority of events, relative to the other pending operations in the System.Windows.Threading.Dispatcher event queue, the specified method is invoked.</param>
#else
/// <summary>
/// Unpacks the whole archive asynchronously to the specified directory name at the specified priority.
/// </summary>
/// <param name="directory">The directory where the files are to be unpacked.</param>
#endif
public void BeginExtractArchive(string directory
#if !DOTNET20
    , DispatcherPriority eventPriority
#if CS4
    = DispatcherPriority.Normal
#endif
#endif
)
{
    SaveContext(
#if !DOTNET20
        eventPriority
#endif
    );
    (new ExtractArchiveDelegate(ExtractArchive)).BeginInvoke(directory, AsyncCallbackImplementation, this);
}


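The remark about C# 4 optional parameters replacing a pile of overloads can be illustrated with a small sketch. The ArchiveCall class, its methods and the default values are hypothetical and are not taken from SevenZipSharp; the methods only return a description of the call so that the effect of the defaults is visible.

```csharp
using System;

// Hypothetical example (not SevenZipSharp code): the same logical method
// written with pre-C# 4 overloads and with C# 4 optional parameters.
public static class ArchiveCall
{
    // Before C# 4: every shorter argument list needs its own overload
    // that forwards to the full version with default values.
    public static string DescribeOld(string directory)
    {
        return DescribeOld(directory, 5);
    }

    public static string DescribeOld(string directory, int level)
    {
        return DescribeOld(directory, level, false);
    }

    public static string DescribeOld(string directory, int level, bool solid)
    {
        return string.Format("{0} level={1} solid={2}", directory, level, solid);
    }

    // C# 4 and later: a single method, with the defaults in the signature.
    public static string Describe(string directory, int level = 5, bool solid = false)
    {
        return string.Format("{0} level={1} solid={2}", directory, level, solid);
    }
}
```

With named arguments a caller can skip a parameter in the middle, for example ArchiveCall.Describe("out", solid: true), which the overload chain above cannot express without adding yet another overload.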

I should note that I developed SevenZipSharp spontaneously. If I wanted to add something to the functionality, I just went and added it, without consulting bosses, coordinating decisions with management, and so on. This has its drawbacks, but on the other hand bugs were fixed instantly and requests for new features were satisfied within a few days. Complete freedom of action, and real learning from my own mistakes.

Bonuses

Taking part in an open source project on CodePlex brought some unexpected pleasant surprises. First, I noticed that JetBrains, the creator of ReSharper, issues free licenses to open source developers. I tried my luck and did not regret it: I really was given a license. ReSharper turned out to be an indispensable assistant in writing code, and I recommend it to everyone. Second, CodePlex recently started showing ads on project pages (at the request of their owners). The advertising revenue can either be donated to charity or kept for yourself. I chose the second option and get about $10 a month. Third, when SevenZipSharp gained popularity, companies that develop useful tools for C#/.NET programmers, such as NDepend and SciTech .NET Memory Profiler, began offering me sponsorship. Fourth, the Donate button brings in a little money. No one has donated more than $10, but even that is pleasant and motivating.

It is encouraging when people write to thank me for the library and report that they use it in real and very well-known projects (for example, at Stardock). Sometimes I receive letters from people proposing to work on the library together. The person is usually full of enthusiasm, assures me that together it will be great, and so on. In my reply I write that I do not hand out the SVN password to anyone right away, ask what the person can actually do, and outline rough development plans for the future. After that reply, no one has ever written back. It seems strange to me; perhaps the psychology of such people will be explained to me in the comments.


A typical letter with no follow-up

Since we are talking about people, I will say something about the public on CodePlex. Code was contributed to me several times, and only once by the rules, as a patch. Sometimes people gave practical advice and suggested what could be done better. It's nice when bugs are fixed not by you but by other users, who then share the fixes. However, bugs are often reported not in the Issue Tracker but in Discussions, even when the bug is obvious. You have to dig into the questions regularly and decide whose fault it is: the library's or a clumsy user's. There were also occasional "participants" who immediately filed about a dozen bugs, of which at best a couple were really worthwhile, while the rest were requests to add extra functionality that nobody needs except the "participants" themselves. It also happens that someone files a bug like "unpacking does not work"; you ask in the comments which version of the library they use and how to reproduce the error, but they have long since forgotten about SevenZipSharp and do not respond. At all.

People who leave ratings (stars on CodePlex, from 1 to 5) are amusing in their own way. It is infuriating when they give 2 and do not explain why. It is equally infuriating, though, when they give 2 and write that nothing works at all and your library is shit. Fortunately, this rarely happens with SevenZipSharp, unlike some other popular projects which, I am sure, do not deserve such ratings.

Results

Looking back, I see that I got involved with SevenZipSharp for good reason. I gained invaluable experience and some benefits. If you ask whether it is worth developing your own open source project "for the soul", I will answer without hesitation: of course!

Thank you for your attention.

Source: https://habr.com/ru/post/103521/

