This is a story from personal experience: finding a bug in someone else's old, unsupported code.
It all started as usual. I had a task that looked simple at first glance: pack the files in the current folder into a ZIP archive with a given password, in C++/Qt. What could be easier?
Naturally, the first assistant was Google. It suggested that there are two Qt libraries for working with ZIP, QuaZIP and OSDaB ZIP; besides that, Qt itself provides the qCompress and qUncompress functions.
I found that these functions were of little use to me: they can only compress a stream, and all the headers and the encryption are left to the developer. That path was too long, so I abandoned it right away and turned to the libraries.
OSDaB ZIP had to be dropped immediately: although it is a great library, its code is distributed only under the GPL, and I needed to build the functionality into a proprietary application. Fortunately, QuaZIP turned out to be dual-licensed under the GPL and the LGPL, so I settled on it. Without digging much into its internals, I sketched the simplest class for working with it and began to test.
This is where the problems started. The library created and encrypted the archive perfectly well and unpacked it just as well, but any other archiver rejected the correct password as wrong. I checked the unpacking side first, reasoning as follows: if the error were symmetrical, the library would be unable to unpack a password-protected file created by another archiver. It handled such a file perfectly. So the problem had to be in the encryption side only, but how to find it?
To begin with, I studied the code, its comments, and the information about the author. I learned that QuaZIP is a Qt wrapper over the MiniZip library, which is written in pure C. Having quickly made sure that the QuaZIP wrapper had nothing to do with the error, I began to study the MiniZip code. At first glance, everything worked perfectly. So I went back to Google with the question, and also to the website of the library's developer. There I found two unanswered bug reports, dated 2007, describing the same problem I had.
Well, I set about studying the problem. The first file to fall under suspicion was, of course, crypt.h, which implements the encryption algorithm itself.
I found that this file is a slightly adapted copy of the BSD-licensed crypt.h from the Info-ZIP software package (the same zip/unzip that any Linux system has). Comparing the files visually revealed no significant differences or errors.
Then I took the description of the ZIP format from the PKWARE website, created one archive with the library and a similar one with the standard zip utility, and began to compare them.
The picture was very interesting, here it is:
The encrypted ZIP file has a fairly simple structure:
The entire file is divided into two sections: a section with the content itself and a section called the Central header. The Central header contains information about the files (their size, creation date, attributes, and so on), so that an archiver can quickly read the list of contents and show it to the user without analyzing the entire archive file.
The file section is built like this:
[local file header 1]
[12 bytes encrypt header 1]
[file data 1]
[data descriptor 1]
[local file header n]
[12 bytes encrypt header n]
[file data n]
[data descriptor n]
It is interesting to look at the local file header:
local file header signature 4 bytes (0x04034b50)
version needed to extract 2 bytes
general purpose bit flag 2 bytes
compression method 2 bytes
last mod file time 2 bytes
last mod file date 2 bytes
crc-32 4 bytes
compressed size 4 bytes
uncompressed size 4 bytes
file name length 2 bytes
extra field length 2 bytes
file name (variable size)
extra field (variable size)
It should also be noted that any header block starts with 4 "magic" bytes "PK ..", where the last two bytes change depending on which block has started.
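The field list above maps directly onto a small parser. The following is a minimal sketch in C, not code from MiniZip or Info-ZIP: the names `LocalFileHeader`, `rd16`, `rd32` and `parse_local_header` are hypothetical, and the only things taken from the format itself are the little-endian byte order, the 30-byte fixed part and the 0x04034b50 signature.

```c
#include <stddef.h>
#include <stdint.h>

/* Fixed-size part of a ZIP local file header (30 bytes on disk).
   Hypothetical struct for illustration; field order follows the list above. */
typedef struct {
    uint16_t version_needed;
    uint16_t flags;              /* general purpose bit flag */
    uint16_t method;
    uint16_t mod_time;
    uint16_t mod_date;
    uint32_t crc32;
    uint32_t compressed_size;
    uint32_t uncompressed_size;
    uint16_t name_len;
    uint16_t extra_len;
} LocalFileHeader;

/* ZIP stores all multi-byte fields in little-endian order. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Parses the header at buf and returns the offset of the first byte after
   the file name and extra field (for an encrypted entry, the start of the
   12-byte encryption header). Returns 0 if the signature is wrong or the
   buffer is too short. */
static size_t parse_local_header(const uint8_t *buf, size_t len, LocalFileHeader *h)
{
    if (len < 30 || rd32(buf) != 0x04034b50u)
        return 0;
    h->version_needed    = rd16(buf + 4);
    h->flags             = rd16(buf + 6);
    h->method            = rd16(buf + 8);
    h->mod_time          = rd16(buf + 10);
    h->mod_date          = rd16(buf + 12);
    h->crc32             = rd32(buf + 14);
    h->compressed_size   = rd32(buf + 18);
    h->uncompressed_size = rd32(buf + 22);
    h->name_len          = rd16(buf + 26);
    h->extra_len         = rd16(buf + 28);
    size_t end = 30u + h->name_len + h->extra_len;
    return end <= len ? end : 0;
}
```

Dumping these fields for two archives side by side is one way to spot a difference such as a single flag bit without reading raw hex.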
So, as can be seen from the screenshot, there was only one discrepancy with the original, and to achieve full conformity I corrected the flag bit.
Then I looked further. Naturally, the 12-byte encryption header will always differ, because a random number generator is used to produce it. The file contents matched in length, which was a good sign, but the data descriptor was completely absent from the file generated by the library. I decided to add it. As it turned out later, that was not necessary at all.
As a result, the files became identical in all header fields and other supporting information.
To check the algorithm, I decided to remove the randomness: replace the random value with a static one both in the library and in Info-ZIP, compile both programs and create archives with them. Never mind that such archives are guaranteed to be broken; they should look identical if the algorithms match.
I replaced the line
c = (rand() >> 7) & 0xff;
in crypt.h in both programs, obtained two new archives created in the same way, and began to examine them in Okteta.
The files again turned out different, but they now had a new common part: the first 10 bytes after the local header. That is, 10 of the 12 bytes of the encryption header matched, but two did not.
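The same comparison can be done programmatically instead of by eye in a hex editor. Below is a minimal sketch in C; the helper `common_run` is hypothetical, and it assumes both archives are already loaded into memory and that the offset where the local header ends is known.

```c
#include <stddef.h>
#include <stdint.h>

/* Counts how many consecutive bytes are equal in both buffers,
   starting at offset start. Stops at the first mismatch or at the
   end of the shorter buffer. */
static size_t common_run(const uint8_t *a, size_t a_len,
                         const uint8_t *b, size_t b_len, size_t start)
{
    size_t n = 0;
    while (start + n < a_len && start + n < b_len &&
           a[start + n] == b[start + n])
        n++;
    return n;
}
```

Called with the offset just past the local header, a result of 10 would correspond to the situation described above: ten matching bytes of the encryption header followed by a mismatch.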
Looking at the contents of crypt.h, we see that the last two bytes are generated in a special way, from a value called crcForCrypting:
buf[n++] = zencode(pkeys, pcrc_32_tab, (int)(crcForCrypting >> 16) & 0xff, t);
buf[n++] = zencode(pkeys, pcrc_32_tab, (int)(crcForCrypting >> 24) & 0xff, t);
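To show why only the last two bytes can differ when the random input is fixed, here is a simplified sketch of the traditional ZIP stream cipher in C. It is not the crypt.h code: all names (`ZipKeys`, `encode`, `decode`, `make_crypt_header`) are hypothetical, the ten "random" bytes are passed in by the caller, and the header is built in a single encryption pass, whereas the real crypt.h runs the random bytes through the cipher twice. The key constants and the update rule are the ones from the traditional PKWARE encryption scheme.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t crc_table[256];

/* Builds the standard CRC-32 table (reflected polynomial 0xEDB88320).
   Must be called once before any other function here. */
static void make_crc_table(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t c = i;
        for (int k = 0; k < 8; k++)
            c = (c & 1) ? 0xEDB88320u ^ (c >> 1) : c >> 1;
        crc_table[i] = c;
    }
}

static uint32_t crc32_step(uint32_t crc, uint8_t b)
{
    return crc_table[(crc ^ b) & 0xff] ^ (crc >> 8);
}

typedef struct { uint32_t k[3]; } ZipKeys;

/* Mixes one plaintext byte into the three keys. */
static void update_keys(ZipKeys *z, uint8_t c)
{
    z->k[0] = crc32_step(z->k[0], c);
    z->k[1] += z->k[0] & 0xff;
    z->k[1] = z->k[1] * 134775813u + 1;
    z->k[2] = crc32_step(z->k[2], (uint8_t)(z->k[1] >> 24));
}

static void init_keys(ZipKeys *z, const char *passwd)
{
    z->k[0] = 0x12345678u;
    z->k[1] = 0x23456789u;
    z->k[2] = 0x34567890u;
    while (*passwd)
        update_keys(z, (uint8_t)*passwd++);
}

/* Next keystream byte, derived from the third key. */
static uint8_t stream_byte(const ZipKeys *z)
{
    uint32_t t = (z->k[2] & 0xffff) | 2;
    return (uint8_t)((t * (t ^ 1)) >> 8);
}

static uint8_t encode(ZipKeys *z, uint8_t c)
{
    uint8_t t = stream_byte(z);
    update_keys(z, c);              /* keys are updated with the plaintext */
    return (uint8_t)(t ^ c);
}

static uint8_t decode(ZipKeys *z, uint8_t c)
{
    c ^= stream_byte(z);
    update_keys(z, c);
    return c;
}

/* Builds a 12-byte encryption header: 10 caller-supplied bytes followed by
   two check bytes taken from the high half of check_word, mirroring the
   two crcForCrypting lines above. Single-pass simplification. */
static void make_crypt_header(const char *passwd, const uint8_t rnd[10],
                              uint32_t check_word, uint8_t out[12])
{
    ZipKeys z;
    init_keys(&z, passwd);
    for (int i = 0; i < 10; i++)
        out[i] = encode(&z, rnd[i]);
    out[10] = encode(&z, (uint8_t)(check_word >> 16));
    out[11] = encode(&z, (uint8_t)(check_word >> 24));
}
```

With the same password and the same ten input bytes, two headers built from different check words agree in their first ten bytes and differ only at the end, which matches what the hex editor showed.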
OK. After substituting a hard-coded crcForCrypting in both programs, I saw that the files matched (!). That is, they matched completely, as if they were the same file.
Feeling that the truth was somewhere nearby, I decided to find out where the value of crcForCrypting comes from. It turned out that in Info-ZIP this value is obtained by shifting the file time left by 16 bits, while in MiniZip it always turned out to be zero.
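A zero check word explains the "wrong password" message: after decrypting the 12-byte header, an unpacker compares the last byte with a value it can compute from the local header, and rejects the password on a mismatch. The sketch below illustrates this in C under stated assumptions. The function names are hypothetical; the writer side follows the description above (file time shifted left by 16 bits), and the reader side is my understanding of how unpackers commonly validate the header (time-based when flag bit 3, the data descriptor bit, is set, CRC-based otherwise), not code copied from any of them.

```c
#include <stdint.h>

#define ZIP_FLAG_ENCRYPTED       0x0001u  /* general purpose flag, bit 0 */
#define ZIP_FLAG_DATA_DESCRIPTOR 0x0008u  /* general purpose flag, bit 3 */

/* Writer side: check word derived from the 16-bit DOS file time,
   as described for Info-ZIP above. */
static uint32_t check_word_from_time(uint16_t dos_time)
{
    return (uint32_t)dos_time << 16;
}

/* Reader side (assumption): the value expected in the last decrypted
   byte of the encryption header. */
static uint8_t expected_check_byte(uint16_t flags, uint32_t crc, uint16_t dos_time)
{
    if (flags & ZIP_FLAG_DATA_DESCRIPTOR)
        return (uint8_t)(dos_time >> 8);   /* CRC not known yet: use the time */
    return (uint8_t)(crc >> 24);           /* otherwise the high byte of the CRC */
}

/* Returns nonzero if the decrypted header passes the quick password check. */
static int password_looks_valid(uint8_t last_decrypted_byte, uint16_t flags,
                                uint32_t crc, uint16_t dos_time)
{
    return last_decrypted_byte == expected_check_byte(flags, crc, dos_time);
}
```

A header whose check bytes were encoded from zero passes this test only when the expected byte happens to be zero as well, so for most files the correct password is reported as wrong.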
That turned out to be the solution. Having added the code needed to populate this variable and, not forgetting to put the random generator back in place, I generated the archive with the library again, and it was successfully unpacked in Ark.
In conclusion, I will say that I corresponded with the author of QuaZIP while digging. He is, by the way, our compatriot, but unfortunately he could not help me. However, once my research was complete, he accepted the patch and promised to release a new version of QuaZIP in the near future.
Since there is no QuaZIP update yet, I am attaching to this article a diff with a patch for the current version.