Some time ago I needed to compress data directly in memory without using anything third-party, relying only on what Windows ships with. I settled on Cabinet.dll for compression and on the IStream interface for working with data in memory. I could not find anything like this on the Internet, so I decided to share my work.
Introduction
I did not want to use third-party solutions, because I would have to ship libraries with the program or include their source code in the project. Windows itself offers only a small set of compression / decompression tools: Cabinet.dll, ZipFldr.dll (compressed Zip folders), and RtlCompressBuffer / RtlDecompressBuffer. I could not find any clear documentation on compressed Zip folders, and RtlCompressBuffer / RtlDecompressBuffer support only LZ compression in Windows versions up to and including Windows 7, whereas Cabinet.dll has been present in the system from Windows 95 to the present day.
For the file and memory callbacks, the documentation suggests the standard C library functions or Windows API functions such as CreateFile / CloseHandle / ReadFile / WriteFile. Since all my file operations had to happen in memory, I decided to use IStream for this purpose.
A little about Cabinet.dll
The library is functionally divided into two parts: FCI (File Compression Interface) and FDI (File Decompression Interface). You can read about it here. Both interfaces use essentially the same functions for working with files and memory, but for some reason Microsoft gave FCI and FDI different prototypes. Nothing prevents you from expressing one in terms of the other, though; how to do this is shown below.
To use the library, include FCI.h and / or FDI.h as needed and link against Cabinet.lib. All of these files ship with the Windows SDK.
Implementation of the compression interface
The simplest code that implements compression looks like this:
ERF erf; CCAB ccab = {MAXINT, MAXINT}; *(IStream**)ccab.szCabPath = SHCreateMemStream(0, 0);
That is, the code itself is quite simple. The real work is done by the callbacks that are passed when the FCI context is created and in the calls that follow. Their parameters and return values are documented here, so only the essentials are given below, one function at a time.
One caveat: our file descriptors are non-standard, since they are pointers to IStream, so care is needed in how such a "descriptor" is handed over. For example, the CCAB structure has two fields, szCabPath and szCab, and it would seem logical to put the pointer into the second one, but that does not work. FCI concatenates the two strings (or rather, it thinks it is concatenating strings, but we know better), so the resulting file "name" is whatever is in szCabPath, which therefore also serves as the descriptor.
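The "pointer stored in a path buffer" idea can be sketched without the Windows headers. The following is a minimal, portable illustration rather than the article's code: `FakeStream`, `storeHandle`, and `loadHandle` are hypothetical names, and the 256-byte buffer merely stands in for a field such as `szCabPath`. It uses `memcpy` instead of the cast-and-assign form, which avoids alignment and aliasing concerns.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for IStream; only its address matters here.
struct FakeStream { int id; };

// Stand-in for a path field such as CCAB::szCabPath (the size is an assumption).
constexpr std::size_t kPathSize = 256;

// Store the stream pointer in the first bytes of the "path" buffer.
inline void storeHandle(char (&path)[kPathSize], FakeStream* stream) {
    static_assert(sizeof(stream) <= kPathSize, "buffer too small for a pointer");
    assert(stream != nullptr);
    std::memset(path, 0, kPathSize);             // keep the rest of the buffer defined
    std::memcpy(path, &stream, sizeof(stream));  // the "file name" is the pointer itself
}

// Recover the pointer from the "file name" that an open-style callback receives.
inline std::intptr_t loadHandle(const char* name) {
    assert(name != nullptr);
    FakeStream* stream = nullptr;
    std::memcpy(&stream, name, sizeof(stream));
    return reinterpret_cast<std::intptr_t>(stream);
}
```

With this layout, an open-style callback only has to return `loadHandle(pszFile)`, and every later callback casts that value back to the stream pointer.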
fPlaced
Called every time a new file is added to the archive.
FNFCIFILEPLACED(fPlaced){ return 0; }
Returning -1 means an error; any other value is defined by the application. It can be used, for example, to report that a file has been added.
fGetNext
Called before creating a new archive volume.
FNFCIGETNEXTCABINET(fGetNext){ return 1; }
Returns TRUE on success and FALSE otherwise. Nothing remarkable.
fStatus
It is called at several stages of file processing: block compression, adding a compressed block and recording an archive.
FNFCISTATUS(fStatus){ return typeStatus == statusCabinet ? cb2 : 0; }
On error you must return -1; otherwise any value will do, except when typeStatus == statusCabinet, in which case you must return the size of the archive, which is passed in the cb2 parameter.
fInfo
Opens the source file and reports its date, time, and attributes.
FNFCIGETOPENINFO(fInfo){ *pattribs = 0; return (INT_PTR)pszName; }
IStream carries neither date information nor file attributes, so the value at pattribs should be set to 0; otherwise you risk getting files with strange attributes in the archive (or not getting an archive at all).
Returning -1 means an error; otherwise you must return a handle to the opened file.
fTemp
Creating a temporary file.
FNFCIGETTEMPFILE(fTemp){ *(IStream**)pszTempName = SHCreateMemStream(0, 0); return 1; }
Returns TRUE on success and FALSE otherwise. The file name (a pointer to IStream in this case) is passed back through the pszTempName parameter.
fDelete
Delete the file.
FNFCIDELETE(fDelete){ (*(IStream**)pszFile)->Release(); return 0; }
Returns 0 on success and -1 on failure. Deleting a file here means freeing the resources held by the stream, so we simply call Release().
fAlloc, fFree
Allocation / release of memory.
FNFCIALLOC(fAlloc){ return new char[cb]; } FNFCIFREE(fFree){ delete[] (char*)memory; /* must match new[] */ }
It's all very simple, so I even combined these functions in one section.
fOpen
Opening file (stream).
FNFCIOPEN(fOpen){ return *(INT_PTR*)pszFile; }
Since the file name in our case is equivalent to the file descriptor, we return the name as the descriptor (or -1 if an error has occurred).
fClose
Close the file descriptor.
FNFCICLOSE(fClose){ LARGE_INTEGER li = {}; ((IStream*)hf)->Seek(li, STREAM_SEEK_SET, 0); return 0; }
Returns 0 on success and -1 on failure. Why not Release()? Because that "deletes the file", i.e. destroys the stream, whereas here it only needs to be closed. So we just reset the position to the beginning.
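The difference between closing and deleting can be modelled in a few lines. This is a portable sketch under stated assumptions, not the article's implementation: `CountedStream`, `closeStream`, and `deleteStream` are hypothetical names, and the `destroyed` flag replaces real deallocation so that the state remains inspectable.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical reference-counted stream, modelling only what matters here:
// a read/write position and a COM-style reference count.
struct CountedStream {
    std::size_t pos = 0;
    int refs = 1;            // the creator holds the first reference
    bool destroyed = false;  // set instead of freeing memory (sketch only)
};

// "Close": the stream must survive, so only the position is reset.
// Returns 0 on success and -1 on failure, like the close callback.
inline int closeStream(CountedStream* s) {
    if (s == nullptr || s->destroyed) return -1;
    s->pos = 0;
    return 0;
}

// "Delete": drop one reference; releasing the last one destroys the stream.
inline int deleteStream(CountedStream* s) {
    if (s == nullptr || s->destroyed) return -1;
    assert(s->refs > 0);
    if (--s->refs == 0) s->destroyed = true;
    return 0;
}
```

Rewinding on close also means that the next "open" of the same stream starts reading from the beginning, much like reopening a file would.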
fRead, fWrite
Read / write data from file / to file.
FNFCIREAD(fRead){
    ULONG ul = 0;
    HRESULT hr = ((IStream*)hf)->Read(memory, cb, &ul);
    // S_OK and S_FALSE (fewer bytes than requested) both count as success
    return (hr && hr != S_FALSE) ? -1 : ul;
}
FNFCIWRITE(fWrite){
    ULONG ul = 0;
    HRESULT hr = ((IStream*)hf)->Write(memory, cb, &ul);
    return (hr && hr != S_FALSE) ? -1 : ul;
}
Returns the number of bytes read / written or -1 in case of an error (0 - end of file reached).
fSeek
Positioning the pointer in the file.
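FNFCISEEK(fSeek){
    LARGE_INTEGER liDist;
    liDist.QuadPart = dist;   // assign the full 64-bit value so negative offsets keep their sign
    ULARGE_INTEGER uliPos = {};
    HRESULT hr = ((IStream*)hf)->Seek(liDist, seektype, &uliPos);
    return hr ? -1 : (long)uliPos.LowPart;
}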
Returns -1 on error; otherwise, a new pointer position.
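The return conventions of the read, write, and seek callbacks can be illustrated with a small portable model. This is only a sketch under assumptions: `MemStream` is a hypothetical stand-in for a memory-backed `IStream`, and `memRead`, `memWrite`, and `memSeek` mimic the contracts described above (byte count, 0 at end of data, -1 on error) rather than the real FCI prototypes. Unlike a real stream, this model refuses to seek past the end of the data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical in-memory stream: a growable buffer plus a current position.
struct MemStream {
    std::vector<unsigned char> data;
    std::size_t pos = 0;
};

// Returns the number of bytes read; 0 means the end of the data was reached.
inline int memRead(MemStream& s, void* out, unsigned cb) {
    assert(s.pos <= s.data.size());
    std::size_t avail = s.data.size() - s.pos;
    std::size_t n = cb < avail ? cb : avail;
    if (n) std::memcpy(out, s.data.data() + s.pos, n);
    s.pos += n;
    return static_cast<int>(n);
}

// Returns the number of bytes written, growing the buffer when needed.
inline int memWrite(MemStream& s, const void* in, unsigned cb) {
    if (s.pos + cb > s.data.size()) s.data.resize(s.pos + cb);
    if (cb) std::memcpy(s.data.data() + s.pos, in, cb);
    s.pos += cb;
    return static_cast<int>(cb);
}

// origin: 0 = start, 1 = current position, 2 = end (the SEEK_SET/CUR/END values).
// Returns the new position, or -1 if the target lies outside the data.
inline long memSeek(MemStream& s, long dist, int origin) {
    long long base;
    switch (origin) {
        case 0: base = 0; break;
        case 1: base = static_cast<long long>(s.pos); break;
        case 2: base = static_cast<long long>(s.data.size()); break;
        default: return -1;
    }
    long long target = base + dist;  // signed arithmetic: dist may be negative
    if (target < 0 || target > static_cast<long long>(s.data.size())) return -1;
    s.pos = static_cast<std::size_t>(target);
    return static_cast<long>(target);
}
```

Negative offsets work here only because the arithmetic is done in a signed type. The same care applies to how the LARGE_INTEGER offset is filled in before calling IStream::Seek.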
Unpacking interface implementation
The unpacking code looks like this:
ERF erf; HFDI hFDI = FDICreate(fAlloc, fFree, fnOpen, fnRead, fnWrite, fnClose, fnSeek, cpuUNKNOWN, &erf); if(hFDI){ IStream *pIStrSrc = SHCreateMemStream(0, 0); if(FDICopy(hFDI, (PSZ)&pIStrCab, (PSZ)&pIStrCab, 0, fnNotify, 0, &pIStrSrc)){
Things are less simple here. Extraction of all files from the archive is triggered by a single function, FDICopy, which calls fnNotify as it runs, and that is where all the magic happens. But more on that later.
In general, the process is similar: we create an FDI context and a stream for the output data, extract the file from the archive into that stream (in my example only a single file had to be extracted), and destroy the context.
(PSZ)&pIStrCab must be specified twice because the function concatenates both parameters while it runs, and omitting either of them causes an error (yes, I stepped on that rake too).
Now a little about the functions. In general they are similar to the FCI functions, except that they lack two parameters; the memory allocation / release functions are identical, so there is no point in describing them again. To reduce the amount of code further, you could instead implement the FCI functions in terms of the FDI ones, so that no extra zero arguments have to be passed.
fnOpen, fnClose
Open / close file (stream).
FNOPEN(fnOpen){ return *(INT_PTR*)pszFile; } FNCLOSE(fnClose){ return fClose(hf, 0, 0); }
fnOpen is easier to duplicate than to route through fOpen, while fnClose calls the FCI fClose function with its last two parameters set to zero, because this implementation does not use them.
fnRead, fnWrite, fnSeek
Reading / writing data and positioning the pointer.
FNREAD(fnRead){ return fRead(hf, pv, cb, 0, 0); }
FNWRITE(fnWrite){ return fWrite(hf, pv, cb, 0, 0); }
FNSEEK(fnSeek){ return fSeek(hf, dist, seektype, 0, 0); }
The return values are the same as for FCI.
fnNotify
The most important function.
FNFDINOTIFY(fnNotify){ if(fdint == fdintCOPY_FILE) if(!lstrcmp(pfdin->psz1, "Data"))
All information on the function can be found here. A few explanations are in order.
In most cases the function returns 0 to indicate success (the exception is fdintCLOSE_FILE_INFO, where it returns TRUE). When fdint == fdintCOPY_FILE, the behavior is as follows: 0 means the file is skipped, -1 is an error (FDICopy terminates), and any other value is the descriptor of the stream into which the data should be extracted.
Now the fun begins: if we create the streams inside this function, we cannot reach them from outside. There are at least two solutions, and both rely on the last parameter of FDICopy, pvUser, which has gone unused and unnoticed so far. It carries user data into the callback, where it arrives as pfdin->pv. The first way applies when you have a fixed list of file names to extract: pass an array of structures, each holding a required file name and a pointer to the IStream to extract into. The second way applies when the number of files is unknown and you need all of them: through pvUser, pass the address of a container (for example, std::vector) that will store the names and descriptors of the extracted files.
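The second approach, collecting every extracted file in a container reached through the user pointer, might look roughly like this. The sketch is portable and deliberately avoids the real FDI types: `OutStream`, `ExtractedFile`, `NotifyType`, `Notification`, and `onNotify` are hypothetical names that only mirror the shape of the notification callback and of `pfdin->pv`.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// Hypothetical output stream; in the article this role is played by IStream.
struct OutStream { std::vector<unsigned char> bytes; };

// One entry per extracted file: its name in the archive and its stream.
struct ExtractedFile {
    std::string name;
    std::unique_ptr<OutStream> stream;
};

// Simplified model of the notification data: a file name and the user pointer.
enum class NotifyType { CopyFile, CloseFileInfo, Other };
struct Notification {
    const char* name;  // plays the role of pfdin->psz1
    void* user;        // plays the role of pfdin->pv
};

// Returns a stream "handle" for CopyFile, 1 (TRUE) for CloseFileInfo, 0 otherwise.
inline std::intptr_t onNotify(NotifyType type, Notification* n) {
    assert(n != nullptr && n->user != nullptr);
    auto* files = static_cast<std::vector<ExtractedFile>*>(n->user);
    switch (type) {
        case NotifyType::CopyFile: {
            // Create the stream here, but record it in the caller's container
            // so that it stays reachable after the extraction call returns.
            files->push_back({n->name, std::make_unique<OutStream>()});
            return reinterpret_cast<std::intptr_t>(files->back().stream.get());
        }
        case NotifyType::CloseFileInfo:
            return 1;
        default:
            return 0;
    }
}
```

Each stream is heap-allocated, so the handles returned earlier remain valid even when the vector reallocates as more files are added.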
Afterword
This method suits cases where the resulting data is not particularly large, up to about a hundred megabytes. Of course, with 8+ GB of memory that is not much of an expense, but remember that reallocating memory is not a fast operation and also leads to fragmentation, so at some point a sufficiently large contiguous block may suddenly not be available.
As an alternative, you can use structured storage (which offers the same IStream) or file streams created with SHCreateStreamOnFile / SHCreateStreamOnFileEx. This makes it possible to combine in-memory input / output with the same operations on files, since the IStream interface can be used in both cases without any additional work.
If you have any questions about the implementation, I am ready to answer them in the comments.