
Working with Compound Files

I have been working with compound files for a long time, more than 15 years, and over that time I have learned plenty about their pros and cons.
On the one hand, they are a very convenient way to store information that lets you change data on the fly; on the other hand, this convenience is partly offset by slow data access.

So what are compound files typically used for?
For anything that needs to be stored in some kind of container (a NoSQL subset, if you like).
For example, documents from older versions of Microsoft Office, from 97 through 2003 (each of which actually consists of several dozen files), were stored in a compound file. They are still stored in a container today, only now the container is ZIP.

MSI installation packages are also compound files, and even the folder thumbnail cache file uses this format.

True, for Word alone there is a whole family of utilities (Recovery for Word, Word Recovery Toolbox, Munsoft Easy Word Recovery) that restore, or at least attempt to repair, damaged documents. Draw your own conclusions.
That said, if you work with compound files properly, the problem of damage can be solved (and I will show how).

And, of course, an undoubted advantage of this format is that a full-fledged file system, with its own files and folders, is emulated inside the storage.

One more detail. Before starting this article I ran a survey on several forums, and it turned out that the vast majority of developers do not work with compound files for a simple reason: they have never heard of them.
Let's close that gap.

1. General information about compound files and their creation


I will not describe the structure and internal format of a compound file just yet; it is unnecessary at this point.
First you need to get a feel for what it is.

So let's start by creating a new compound file with a call to StgCreateDocfile.
Add two units to the uses clause: ActiveX and AxCtrls (they will come in handy).
Now we write:

    // An HRESULT carries its own error code, so report it directly
    // instead of relying on GetLastError (which OLE does not have to set).
    procedure CheckHResult(Code: HRESULT);
    begin
      if not Succeeded(Code) then
        raise Exception.CreateFmt('OLE call failed, HRESULT = 0x%.8x', [Code]);
    end;

    var
      TestFilePath: string;
      WideBuff: WideString;
      Root: IStorage;
    begin
      TestFilePath := ExtractFilePath(ParamStr(0)) + '..\data\simple.bin';
      ForceDirectories(ExtractFilePath(TestFilePath));
      WideBuff := TestFilePath;
      CheckHResult(StgCreateDocfile(PWideChar(WideBuff),
        STGM_CREATE or STGM_WRITE or STGM_SHARE_EXCLUSIVE, 0, Root));

First, a word about the flags.
STGM_CREATE and STGM_WRITE are the two flags used to create a new compound file, and STGM_WRITE is mandatory here (nothing will work without it).

IMPORTANT:
The third flag, STGM_SHARE_EXCLUSIVE, is trickier. It is required always and everywhere, except when a file is opened in read-only mode, which is covered in the second chapter.
You can verify this yourself in IDA Pro Freeware.
StgCreateDocfile calls the DfOpenDocfile function, which in turn calls VerifyPerms, and that is where the following check lives:


The check for this flag is at address 72554E62, and if the flag is missing, an open error is returned. In other words, a compound file cannot be opened for writing more than once at the same time.

I was somewhat surprised to see a check like this in ring 3, and for the sake of experiment I even patched it out, after which I was able to open the file for writing twice at once. Of course, writing correctly through both instances did not work. :)

In fact this is a perfectly sound decision that follows from the data storage format itself, but I will come back to it later, towards the end of the article.

If all checks pass and StgCreateDocfile returns S_OK, the fourth parameter receives an IStorage interface that points to the root element of the compound file. All further work goes through it.
What can we do next?

For example, create a new file in the root (this is a file system, after all) and write a block of data into it.
Let's write the following function:

    procedure WriteFile(Storage: IStorage; AName: WideString; Data: AnsiString);
    var
      Stream: IStream;
      OS: TOleStream;
    begin
      CheckHResult(Storage.CreateStream(PWideChar(AName),
        STGM_WRITE or STGM_SHARE_EXCLUSIVE, 0, 0, Stream));
      OS := TOleStream.Create(Stream);
      try
        // Data[1] is not valid for an empty string, so guard the write
        if Length(Data) > 0 then
          OS.WriteBuffer(Data[1], Length(Data));
      finally
        OS.Free;
      end;
    end;

First we create a new “file” by calling Storage.CreateStream. It is almost identical to StgCreateDocfile discussed above, except that it returns an IStream interface through which the file contents are accessed.

Note the flags: STGM_SHARE_EXCLUSIVE is required, and when creating a stream the second flag must be either STGM_WRITE or STGM_READWRITE. Since the compound file itself was created with STGM_WRITE, that is the one used here.

For convenience, IStream is accessed through the TOleStream adapter class, which does the actual writing.

This is not essential, of course: you could call the Write method of ISequentialStream, from which IStream inherits, but the TOleStream class is easier to work with.
Now call the function we have just implemented:

 WriteFile(Root, 'RootFile', 'First file data'); 

As a result, a file named RootFile with the contents “First file data” appears in the root.

IMPORTANT:
There is one subtlety here. Names of files and folders inside a compound file cannot be longer than 31 Unicode characters (the limit is actually 32, but do not forget the terminating zero).

That's right: a folder or file can be called “123”, but not “My long file name and many more digits”. In addition, the specification defines a set of characters that cannot be used in a name (codes 0 through 0x1F).
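
The two limits just described can be checked before a name ever reaches CreateStream or CreateStorage. Below is a minimal sketch of such a check. IsValidElementName and MaxElementNameLen are hypothetical names introduced only for this illustration, and rejecting an empty name is my own assumption rather than something stated above.

```pascal
program StorageNameCheck;
{$APPTYPE CONSOLE}

const
  // 32 WideChars in the directory entry minus the terminating zero
  MaxElementNameLen = 31;

// Hypothetical helper: True when AName fits the length limit and
// contains no characters with codes 0..$1F.
function IsValidElementName(const AName: WideString): Boolean;
var
  I: Integer;
begin
  // Assumption: an empty name is treated as invalid
  Result := (Length(AName) > 0) and (Length(AName) <= MaxElementNameLen);
  if not Result then
    Exit;
  for I := 1 to Length(AName) do
    if Ord(AName[I]) <= $1F then
    begin
      Result := False;
      Exit;
    end;
end;

begin
  Writeln(IsValidElementName('123'));                                    // TRUE
  Writeln(IsValidElementName('My long file name and many more digits')); // FALSE
end.
```

The second name is 38 characters long, so it is rejected by the length test alone.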

You may ask: why such restrictions, and what if I want to build a huge, branching file system with very deep nesting?
That is not a problem: unlike ordinary files, you are not affected by the MAX_PATH limit.
500 nested subfolders, each named “my big name”?
Easy. We are working with a virtual file system, so create whatever you want. :)

Back to the matter at hand: let's create a folder in the root.

 CheckHResult(Root.CreateStorage('SubFolder', STGM_WRITE or STGM_SHARE_EXCLUSIVE, 0, 0, Folder)); 

The code is almost the same as the call to Storage.CreateStream, only this time we get another IStorage interface, pointing to the folder we have just created.

We can create a new file in it right away:

 WriteFile(Folder, 'SubFolderFile', 'Second file data'); 

Here the first parameter is not Root, which refers to the root, but the newly created Folder.

IMPORTANT:
Now another subtlety: if we close the application right now, the data may not be saved.
It is not entirely predictable, either: on my home machine this behavior reproduces every time, while on my work machine it never does.

To make sure the data is saved, run the following code:

 CheckHResult(Root.Commit(STGC_DEFAULT)); 

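After this call, all data is guaranteed to be written to the file on disk. And if you change your mind, you can undo all the changes made since the previous commit. Keep in mind that Revert only has something to roll back when the storage was opened in transacted mode (STGM_TRANSACTED); in direct mode, as used in this example, changes are applied immediately. The call itself looks like this: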

 CheckHResult(Root.Revert); 

A word about closing the file.
This is done simply by setting the root to nil: when @IntfClear is called for the interface held in the Root variable, all the other interfaces are released in hierarchical order.
What is left?

Ah yes, there are also the CopyTo / MoveElementTo / EnumElements methods and more...
We will get to them a bit later; for now you can open the archive attached to the article and look at the implementation of the code described above in the file "..\simple\StorageCreateDemo.dpr".

Now let's try to read all of this back.

2. Reading the compound file


Let's create a new project, add ActiveX and AxCtrls to the uses clause again, and write the opening code:

    var
      TestFilePath: string;
      WideBuff: WideString;
      Root: IStorage;
    begin
      TestFilePath := ExtractFilePath(ParamStr(0)) + '..\data\simple.bin';
      WideBuff := TestFilePath;
      CheckHResult(StgOpenStorage(PWideChar(WideBuff), nil,
        STGM_READ or STGM_SHARE_DENY_WRITE, nil, 0, Root));

Since we do not need write access, we use the STGM_READ flag, and here we have a choice: use STGM_SHARE_DENY_WRITE or keep STGM_SHARE_EXCLUSIVE (one of the two must be present).

The result is the Root variable, of type IStorage, pointing to the root.

How would you search for files in a given folder on disk?
With a recursive directory traversal using FindFirstFile, naturally.
Here we have something similar: the EnumElements method of the IStorage interface, which is called roughly like this:

    var
      Enum: IEnumStatStg;
    begin
      CheckHResult(Storage.EnumElements(0, nil, 0, Enum));

Roughly speaking, this is the counterpart of FindFirstFile, but instead of a handle to work with we get an IEnumStatStg interface.

There is one interesting point worth your attention.
When used, this interface returns a TStatStg structure, one of whose fields is pwcsName, of type POleStr.

Do you see the catch?

Of course: this is a potential memory leak, because OLE knows nothing about our native memory manager and allocates the block for this string by its own means, through the IMalloc interface.

If we do not handle this, the application will leak memory like Victoria Falls, though the memory consumption counters will be fun to watch. :)

So the first step is to obtain a reference to an instance of this interface:

    if (CoGetMalloc(1, ShellMalloc) <> S_OK) or (ShellMalloc = nil) then
      raise Exception.Create('CoGetMalloc failed.');

We need it to free the memory that was allocated for us.
Like this:

 ShellMalloc.Free(TmpElement.pwcsName); 
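
The copy-then-free step can be folded into one small routine so that the raw pointer never leaks into the rest of the code. The sketch below uses CoTaskMemFree from the ActiveX unit, which releases memory through the same OLE task allocator that CoGetMalloc(1, ...) returns. The unit name OleNameUtils and the function TakeOleName are hypothetical and are not part of the archive attached to the article.

```pascal
unit OleNameUtils;

interface

uses
  ActiveX;

// Hypothetical helper: returns a native copy of Element.pwcsName
// and releases the block that OLE allocated for it.
function TakeOleName(var Element: TStatStg): string;

implementation

function TakeOleName(var Element: TStatStg): string;
begin
  Result := '';
  if Element.pwcsName = nil then
    Exit;
  // Copy the text into memory owned by the Delphi memory manager
  Result := WideCharToString(Element.pwcsName);
  // Release the OLE allocation
  CoTaskMemFree(Element.pwcsName);
  // Clear the pointer so a second call cannot free it twice
  Element.pwcsName := nil;
end;

end.
```

With a helper like this, an enumeration loop can call TakeOleName(TmpElement) once per element and work with an ordinary string afterwards.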

One more detail.
The element type returned in TStatStg (the dwType field) can be one of the following:

STGTY_STORAGE - a folder (storage)
STGTY_STREAM - a file (stream)

All other values are purely internal and do not interest us.

Here is how it all looks:

    procedure Enumerate(const Root: string; Storage: IStorage);
    var
      Enum: IEnumStatStg;
      TmpElement: TStatStg;
      ShellMalloc: IMalloc;
      Fetched: Longint; // Next expects a pointer to a 32-bit counter
      Folder: IStorage;
      AFile: IStream;
    begin
      // The names are allocated by OLE, so we need IMalloc to free them
      if (CoGetMalloc(1, ShellMalloc) <> S_OK) or (ShellMalloc = nil) then
        raise Exception.Create('CoGetMalloc failed.');
      // Get the enumerator for the current storage
      CheckHResult(Storage.EnumElements(0, nil, 0, Enum));
      // Fetch the elements one at a time
      Fetched := 1;
      while Fetched > 0 do
        if Enum.Next(1, TmpElement, @Fetched) = S_OK then
          // Make sure the name really was allocated by OLE
          if ShellMalloc.DidAlloc(TmpElement.pwcsName) = 1 then
          begin
            // Print the element name
            Write('Found: ', Root, '\', AnsiString(TmpElement.pwcsName));
            // Look at the element type
            case TmpElement.dwType of
              // A file: open it and show its contents
              STGTY_STREAM:
              begin
                Writeln(' - file: ', sLineBreak);
                CheckHResult(Storage.OpenStream(TmpElement.pwcsName, nil,
                  STGM_READ or STGM_SHARE_EXCLUSIVE, 0, AFile));
                ShowFileData(AFile);
                Writeln;
              end;
              // A folder: open it and enumerate it recursively
              STGTY_STORAGE:
              begin
                Writeln(' - folder');
                CheckHResult(Storage.OpenStorage(TmpElement.pwcsName, nil,
                  STGM_READ or STGM_SHARE_EXCLUSIVE, nil, 0, Folder));
                Enumerate(Root + '\' + string(TmpElement.pwcsName), Folder);
              end;
            else
              Writeln('Unsupported type: ', TmpElement.dwType);
            end;
            // Done with the element: free the name allocated by OLE
            ShellMalloc.Free(TmpElement.pwcsName);
          end;
    end;

And now let's see what happens when reading the file created in the first chapter:


This is exactly the data we wrote in the first chapter.

The code for this example is in the article's archive, at "..\simple\StorageReadDemo.dpr".

Now let's see how to make working with all this a little more convenient.

3. Wrapper class


Some time ago I wrote a small unit (about a thousand lines with comments) that implements several classes which take care of all the subtleties of working with compound files and provide a more convenient interface.
You can find it in the archive at "..\StorageReader\FWStorage.pas".

It has a few shortcomings. I stopped developing it a very long time ago, so Unicode versions of Delphi emit warnings related to string handling:
[dcc32 Warning] FWStorage.pas (860): W1057 Implicit string cast from 'AnsiString' to 'string'
[dcc32 Warning] uStgReader.pas (102): W1057 Implicit string cast from 'ShortString' to 'string'

Nevertheless it is fully functional, and these warnings do not affect its operation in any way. (To be honest, I am just too lazy to clean them up.)

You may use this unit as you see fit, with the following reservations.

If you change the code of the classes (add bells and whistles, fix any bugs you find) and then publish it on the Internet, the author's name must be kept in the unit header.
I no longer maintain this unit (it is outdated for me), so I will decline requests for improvements.
From this unit we are interested in the TFWStorage class, which works with the compound file itself, and the TFWStorageCursor class, which is a wrapper around IStorage.
First I will list the methods of these classes, and then give an example of using them.
The TFWStorage class is intended only for working with the file and provides a few utility methods:


That is, its main job is to hand us an instance of the TFWStorageCursor class, through which the main work with the compound file is done.

Its methods are as follows:


As you can see, there is no wrapper for IStream; that interface is handled by the CreateStream, ReadStream and WriteStream methods.

In the TFWStorageEnum array returned by the Enumerate method you do not need to free the memory allocated for pacsName: that has already been done, and you are working with a copy of the data held in memory allocated by the native memory manager.
The only method that may raise questions is Backward: why does it destroy itself?

Now I will show that it really is convenient.
Suppose we need to open a path like “path to file\Subfolder1\subfolder2\subsubfolder”. With the plain interfaces from the second chapter we would have to do the following:
open the file itself and get the IStorage interface pointing to the root, then get IStorage for the first folder, then for the second, and finally for the third one, “subsubfolder”, which is the one we actually need.
That is as many as four objects that have to be kept somewhere.

With TFWStorage everything becomes much simpler:

    procedure TForm1.Button1Click(Sender: TObject);
    var
      Path: string;
      Storage: TFWStorage;
      Root, Folder: TFWStorageCursor;
      Data: TStringStream;
    begin
      Storage := TFWStorage.Create;
      try
        // Build the path to the compound file
        Path := ExpandFileName(ExtractFilePath(ParamStr(0)) + '..\data\test.bin');
        // Create the file and get a cursor on its root
        Storage.OpenFile(Path, True, Root);
        // Open the nested folder in a single call
        Storage.ForceStorage(Path + '\Subfolder1\subfolder2\subsubfolder', Folder);
        Data := TStringStream.Create;
        try
          // Prepare the data to write
          Data.WriteString('new file data.');
          // Walk back up to the root, writing a file into every folder on the way
          while Folder <> Root do
          begin
            Folder.WriteStream(Folder.GetName + '_new_file.txt', Data);
            // Move one level up
            Folder.Backward(Folder);
          end;
          // Save the changes
          Root.FlushBuffer;
        finally
          Data.Free;
        end;
      finally
        Storage.Free;
      end;
    end;

That's all. From a programming point of view it turned out to be very convenient.

Now let's write something more serious, namely an editor for the contents of a compound file.

Create a new project and build a form along these lines:


Add three fields to the private section:

    private
      FCurrentFileName: string;
      FStorage: TFWStorage;
      FRoot: TFWStorageCursor;


In the form's OnCreate handler, write the following code:

    procedure TForm1.FormCreate(Sender: TObject);
    begin
      // The file opened by default
      FCurrentFileName := ExpandFileName(ExtractFilePath(ParamStr(0)) + '..\data\simple.bin');
      // Create the storage wrapper
      FStorage := TFWStorage.Create;
      // Open the file
      OpenFile(False);
    end;


Now the file-opening procedure itself; it is simple:

    procedure TForm1.OpenFile(CreateNew: Boolean);
    begin
      // Close the previous file, if one is open
      FStorage.CloseFile;
      // Open the file, creating it when asked to or when it does not exist
      FStorage.OpenFile(FCurrentFileName,
        CreateNew or not FileExists(FCurrentFileName), FRoot);
      Caption := FCurrentFileName;
      // Show the contents of the root
      ShowStorageData(FRoot);
    end;

Simple so far, right? The rest of the code is just as unpretentious.

Now the procedure that displays the contents of a folder:

    procedure TForm1.ShowStorageData(AStorage: TFWStorageCursor);

      procedure AddItem(const ACaption: string; AIndex: Integer);
      begin
        with ListView1.Items.Add do
        begin
          Caption := ACaption;
          case AIndex of
            -1: ImageIndex := -1;
            1:
            begin
              ImageIndex := 0;
              SubItems.Add('Folder');
            end
          else
            ImageIndex := 1;
            SubItems.Add('File');
          end;
          // Keep the element kind in Data:
          // -1 - link to the parent folder
          //  0 - file
          //  1 - folder
          // The double-click handler relies on it
          Data := Pointer(AIndex);
        end;
      end;

    var
      AData: TFWStorageEnum;
      I: Integer;
    begin
      ListView1.Items.BeginUpdate;
      try
        ListView1.Items.Clear;
        // Unless we are at the root, add a link to the parent folder
        if not AStorage.IsRoot then
          AddItem('..', -1);
        // Enumerate the contents of the current folder
        AStorage.Enumerate(AData);
        // And show them in the ListView
        for I := 0 to AData.Count - 1 do
          AddItem(
            string(AData.ElementEnum[I].pacsName),
            Byte(AData.ElementEnum[I].dwType = STGTY_STORAGE));
      finally
        ListView1.Items.EndUpdate;
      end;
    end;

If everything is done correctly, run the project: it opens the file "..\data\simple.bin" that we created in the first chapter, and the result should look something like this:


Now let's add navigation through our storage.
The logic is simple:


To do this, write the following code in the ListView's OnDblClick handler:

    procedure TForm1.ListView1DblClick(Sender: TObject);
    begin
      // Nothing is selected: nothing to do
      if ListView1.Selected = nil then Exit;
      // The element kind is stored in Data
      case Integer(ListView1.Selected.Data) of
        -1: // link to the parent folder
        begin
          // Move one level up
          FRoot.Backward(FRoot);
          // Refresh the list
          ShowStorageData(FRoot);
        end;
        0: // file
          EditFile;
        1: // folder
        begin
          // Open the selected folder
          FRoot.OpenStorage(AnsiString(ListView1.Selected.Caption), FRoot);
          // Refresh the list
          ShowStorageData(FRoot);
        end;
      end;
    end;

Now you can walk around the storage with double clicks. :)

File editing works as follows. Add a new form to the project, put a Save button, a Cancel button and a TMemo for displaying the file contents on it, and then write the following code:

    procedure TForm1.EditFile;
    var
      Buff: TMemoryStream;
      Data: AnsiString;
    begin
      Buff := TMemoryStream.Create;
      try
        // Read the file contents
        FRoot.ReadStream(AnsiString(ListView1.Selected.Caption), Buff);
        // Copy them into a string
        if Buff.Size > 0 then
        begin
          // Rewind in case ReadStream left the position at the end
          Buff.Position := 0;
          SetLength(Data, Buff.Size);
          Buff.Read(Data[1], Buff.Size);
        end;
        // Create the editor form
        frmEdit := TfrmEdit.Create(Self);
        try
          // Put the text into the Memo
          frmEdit.Memo1.Text := string(Data);
          // Show the form; leave if the user cancelled
          if frmEdit.ShowModal <> mrOk then Exit;
          // Take the edited text from the Memo
          Buff.Clear;
          Data := AnsiString(frmEdit.Memo1.Text);
          if Length(Data) > 0 then
            Buff.Write(Data[1], Length(Data));
          // Write it back to the file
          FRoot.WriteStream(AnsiString(ListView1.Selected.Caption), Buff);
          // Save the changes
          FRoot.FlushBuffer;
        finally
          frmEdit.Release;
        end;
      finally
        Buff.Free;
      end;
    end;

We now have an almost full-fledged editor; all that remains is to implement the buttons at the top of the form.

The handlers for creating a new compound file and opening an existing one look like this:

    procedure TForm1.btnCreateDFaseClick(Sender: TObject);
    begin
      if SaveDialog1.Execute then
      begin
        FCurrentFileName := SaveDialog1.FileName;
        OpenFile(True);
      end;
    end;

    procedure TForm1.btnOpenDBaseClick(Sender: TObject);
    begin
      if OpenDialog1.Execute then
      begin
        FCurrentFileName := OpenDialog1.FileName;
        OpenFile(False);
      end;
    end;


Here is the code for creating a new folder and deleting an existing one:

    procedure TForm1.btnAddFolderClick(Sender: TObject);
    var
      NewFolderName: string;
      Tmp: TFWStorageCursor;
    begin
      if InputQuery('New folder', 'Enter folder name', NewFolderName) then
      begin
        FRoot.CreateStorage(AnsiString(NewFolderName), Tmp);
        FRoot.FlushBuffer;
      end;
      ShowStorageData(FRoot);
    end;

    procedure TForm1.btnDelFolderClick(Sender: TObject);
    begin
      // Nothing is selected: nothing to delete
      if ListView1.Selected = nil then Exit;
      if Application.MessageBox(
        PChar(Format('Delete folder: "%s"?', [ListView1.Selected.Caption])),
        'Delete folder', MB_ICONQUESTION or MB_YESNO) = ID_YES then
      begin
        FRoot.DeleteStorage(AnsiString(ListView1.Selected.Caption));
        FRoot.FlushBuffer;
        ShowStorageData(FRoot);
      end;
    end;


And the same, only for the buttons that create and delete a file:

    procedure TForm1.btnAddFileClick(Sender: TObject);
    var
      NewFileName: string;
    begin
      if InputQuery('New file', 'Enter file name', NewFileName) then
      begin
        FRoot.CreateStream(AnsiString(NewFileName));
        FRoot.FlushBuffer;
      end;
      ShowStorageData(FRoot);
    end;

    procedure TForm1.btnDelFileClick(Sender: TObject);
    begin
      // Nothing is selected: nothing to delete
      if ListView1.Selected = nil then Exit;
      if Application.MessageBox(
        PChar(Format('Delete file: "%s"?', [ListView1.Selected.Caption])),
        'Delete file', MB_ICONQUESTION or MB_YESNO) = ID_YES then
      begin
        FRoot.DeleteStream(AnsiString(ListView1.Selected.Caption));
        FRoot.FlushBuffer;
        ShowStorageData(FRoot);
      end;
    end;


, :)

, - DOC ? :)


The source code for this example is in the archive, in the folder "..\StorageReader\"

4. Compound File


, , , .

— .

.
— .
, . :))
, , .

, , :


( ) — , , .

, ( «//») , , INI (- thumbs.db), , , ?

, .
, .

: , , .
: « . -, 1, 142 1 — 163 10, 22, 0».

— 31 , : « ».
, : « ».

, , - .

:
, ( — ), . «Properties», ( , , ) ( 1024 — , , ).
— — «Data», .

:


, ?
GUID «» 31 . :)

31 , - .

, 5 ?
-, , 5 , . , , — . :)

, « » — . «» — .

: «, , ?».

, ( ), — ?!!!

, .
, 2 200 ( ). , …
— , , ( ).

, : 50 , / .

, , , - .
, , , . , Firebird/Interbase, — MS SQL/Olracle, ADO. , .

— , , .

.
- , OpenStorage, , — .

TFWStorage : ReConnect — ForceStorage, , .

: . () ( — MSDN).

, , : « , ».

.
, , StgOpenStorage.
…

— - 4 , .
, . .

, — , ( ) — MSDN , . , .

, :
- () — , EnumElements. IEnumStatStg, . CreateStream , .

. DestroyElement CreateStream, : «- ».

, , .

5. RAW


, : , , ? ?

. , , .

, NTFS , API — ? :)

, .

, MS , POIFS , Wiki POIFS ( ), ( ).

, ? :)
.

, , :

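    TPoifsFileHeader = packed record
      // Signature: D0 CF 11 E0 A1 B1 1A E1 (0xE011CFD0, 0xE11AB1A1)
      _abSig: array [0..7] of Byte;
      // Class ID. Set by WriteClassStg, read by GetClassFile/ReadClassStg.
      // For Excel it is 0
      _clid: TGUID;
      // Minor version of the format
      _uMinorVersion: USHORT;
      // Major version of the Dll/format
      _uDllVersion: USHORT;
      // 0xFFFE means Intel (little-endian) byte order
      _uByteOrder: USHORT;
      // Sector size as a power of two. Usually 9, i.e. 512 bytes (2 ^ 9)
      _uSectorShift: USHORT;
      // Mini-sector size as a power of two. Usually 6, i.e. 64 bytes (2 ^ 6)
      _uMiniSectorShift: USHORT;
      // Reserved, must be 0
      _usReserved: USHORT;
      // Reserved, must be 0
      _ulReserved1: ULONG;
      // Reserved, must be 0
      _ulReserved2: ULONG;
      // Number of sectors occupied by the FAT
      _csectFat: ULONG;
      // First sector of the Property Set Storage
      // (the directory, which begins with the Root Directory Entry)
      _sectDirStart: ULONG;
      // Transaction signature
      _signature: ULONG;
      // Streams smaller than this go to the mini-stream. Usually 4096
      _ulMiniSectorCutoff: ULONG;
      // First sector of the mini-FAT; (-2) if there is no mini-FAT
      _sectMiniFatStart: ULONG;
      // Number of sectors in the mini-FAT; 0 if there is none
      _csectMiniFat: ULONG;
      // First sector of the DIF chain; (-2) if there are no DIF sectors
      _sectDifStart: ULONG;
      // Number of sectors in the DIF chain; 0 if the FAT fits in the header
      _csectDif: ULONG;
      // The first 109 FAT sector numbers (enough for roughly 7 MB of data
      // with 512-byte sectors); unused entries hold (-1)
      _sectFat: array [0..108] of ULONG;
    end;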
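
Before trusting any other field of this header it makes sense to verify the signature in _abSig. The following is a minimal sketch of such a check as a stand-alone console program; HasCompoundSignature and CompoundSignature are hypothetical names used only here, not code from the archive.

```pascal
program CheckSignature;
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

const
  // The eight signature bytes stored in _abSig
  CompoundSignature: array [0..7] of Byte =
    ($D0, $CF, $11, $E0, $A1, $B1, $1A, $E1);

// Hypothetical helper: True when the stream starts with the signature.
function HasCompoundSignature(Stream: TStream): Boolean;
var
  Sig: array [0..7] of Byte;
begin
  Stream.Position := 0;
  // Read returns the number of bytes actually read, so short files fail here
  Result := (Stream.Read(Sig, SizeOf(Sig)) = SizeOf(Sig)) and
    CompareMem(@Sig, @CompoundSignature, SizeOf(Sig));
end;

var
  F: TFileStream;
begin
  if ParamCount < 1 then
  begin
    Writeln('Usage: CheckSignature <file>');
    Exit;
  end;
  F := TFileStream.Create(ParamStr(1), fmOpenRead or fmShareDenyWrite);
  try
    Writeln(HasCompoundSignature(F));
  finally
    F.Free;
  end;
end.
```

Run against a .doc or .msi file it should report TRUE; for a ZIP-based .docx it should report FALSE, since that container starts with a different signature.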

, , — .

, _uSectorShift _uMiniSectorShift , , .

    procedure TPoifsFile.InitHeader;
    begin
      FStream.ReadBuffer(FHeader, SizeOf(TPoifsFileHeader));
      // Replace the shifts with the actual sizes in bytes (2 ^ shift)
      FHeader._uSectorShift := Round(IntPower(2, FHeader._uSectorShift));
      FHeader._uMiniSectorShift := Round(IntPower(2, FHeader._uMiniSectorShift));
    end;

FAT, , _ulMiniSectorCutoff .
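    procedure TPoifsFile.ComposeFAT;
    var
      I, J, X, FatLength: Integer;
      FatBlock: TPoifsFatBlock;
      CurrentFat, Offset: Integer;
      XFat: array of Integer;
    begin
      // Total number of FAT entries (each FAT sector holds 128 of them)
      FatLength := FHeader._csectFat * 128;
      // The FAT itself
      SetLength(FFat, FatLength);
      // File offset of the FAT sector that holds each entry
      SetLength(FFatOffset, FatLength);
      // With DIF sectors present the header lists all 109 FAT sectors,
      // otherwise only the first _csectFat entries are used
      for I := 0 to IfThen(FHeader._csectDif > 0, 108, FHeader._csectFat - 1) do
      begin
        // Each FAT sector holds 128 entries
        FatBlock := TPoifsFatBlock(GetBlock(FHeader._sectFat[I]));
        for J := 0 to 127 do
        begin
          FFat[I * 128 + J] := FatBlock[J];
          // GetBlock leaves the stream right after the sector, so step back
          FFatOffset[I * 128 + J] := FStream.Position - SizeOf(TPoifsBlock);
        end;
      end;
      // Nothing more to do if there are no DIF sectors
      if (FHeader._csectDif = 0) or (FHeader._sectDifStart = 0) then Exit;
      // The DIF chain starts at this sector
      Offset := FHeader._sectDifStart;
      // A DIF sector holds 127 FAT sector numbers plus a link to the next one
      SetLength(XFat, 128);
      // Index of the last FAT entry filled so far
      CurrentFat := 13951; // 109 * 128 - 1
      for X := 0 to FHeader._csectDif - 1 do
      begin
        // Seek to the DIF sector (_uSectorShift already holds the sector size)
        FStream.Position := GetBlockOffset(Offset);
        // Read its 128 entries
        FStream.ReadBuffer(XFat[0], 128 * SizeOf(DWORD));
        // The first 127 entries are FAT sector numbers
        for I := 0 to 126 do
        begin
          // A negative value marks an unused entry: the FAT is complete
          if XFat[I] < 0 then Exit;
          // Each FAT sector holds 128 entries
          FatBlock := TPoifsFatBlock(GetBlock(XFat[I]));
          for J := 0 to 127 do
          begin
            Inc(CurrentFat);
            FFat[CurrentFat] := FatBlock[J];
            FFatOffset[CurrentFat] := FStream.Position - SizeOf(TPoifsBlock);
          end;
        end;
        // The last entry points to the next DIF sector
        Offset := XFat[127];
      end;
    end;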


TPoifsFatBlock here is an array of 128 Integer values.
The code also relies on two helper functions, GetBlockOffset and GetBlock.

.
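    function TPoifsFile.GetBlockOffset(BlockIndex: Integer): Int64;
    begin
      // Sectors follow the header; _uSectorShift already holds the size in bytes.
      // The Int64 cast keeps the product from overflowing on large files
      Result := HEADER_SIZE + Int64(FHeader._uSectorShift) * BlockIndex;
    end;

    function TPoifsFile.GetBlock(Adress: Integer): TPoifsBlock;
    begin
      FStream.Position := GetBlockOffset(Adress);
      FStream.ReadBuffer(Result, SizeOf(TPoifsBlock));
    end;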


, , _ulMiniSectorCutoff.

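    procedure TPoifsFile.ComposeMiniFat;
    var
      I, CurrChain: Integer;
      TmpPosition: Int64;
    begin
      // First sector of the mini-FAT chain
      CurrChain := FHeader._sectMiniFatStart;
      // Allocate the table (each mini-FAT sector holds 128 entries)
      SetLength(FMiniFat, FHeader._csectMiniFat * 128);
      I := 0;
      while Integer(CurrChain) >= 0 do
      begin
        // The chain is longer than the header promised: stop
        if I >= Length(FMiniFat) then Exit;
        // Offset of the current sector
        TmpPosition := GetBlockOffset(CurrChain);
        // Stop if the offset is invalid
        if TmpPosition < 0 then Exit;
        //if TmpPosition > FStream.Size then Exit;
        FStream.Position := TmpPosition;
        // Read one sector of entries
        FStream.ReadBuffer(FMiniFat[I], 512 {128 * SizeOf(DWORD)});
        Inc(I, 128);
        // The next sector of the chain comes from the main FAT
        CurrChain := FFat[CurrChain];
      end;
    end;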

.

:

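    TPoifsProperty = packed record // 128 bytes long
      // Name of the file/folder
      Caption: array[0..31] of WChar;
      // Size of the name
      CaptionSize: Word;
      // Element type, one of the STGTY_ constants
      PropertyType: Byte;
      // Node color (the TPoifsProperty entries form a Red-Black-Tree)
      NodeColor: Byte; // 0 (red) or 1 (black)
      // Index of the previous element at the same level
      PreviousProp: Integer;
      // Index of the next element at the same level
      NextProp: Integer;
      // Index of the child element
      ChildProp: Integer;
      Reserved1: TGUID;
      UserFlags: DWORD;
      // Timestamps
      ATime: array [0..1] of Int64;
      // Index of the first FAT (or mini-FAT) entry of the chain with the data
      StartBlock: Integer;
      // Data size
      Size: Integer;
      Reserved2: DWORD;
    end;

    TPoifsPropsBlock = array[0..3] of TPoifsProperty;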
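
In a damaged file the Caption field cannot be assumed to be zero-terminated, so it is safer to build the name from CaptionSize and clamp it. The sketch below assumes that CaptionSize is the length of the name in bytes including the two-byte terminating zero, as the compound file format defines it. CaptionToString and TCaption are hypothetical names introduced for this illustration.

```pascal
program EntryNameDemo;
{$APPTYPE CONSOLE}

type
  TCaption = array [0..31] of WideChar;

// Hypothetical helper. Assumes CaptionSize counts bytes and includes
// the terminating zero; the result is clamped to the 31-character limit.
function CaptionToString(const Caption: TCaption; CaptionSize: Word): WideString;
var
  CharCount: Integer;
begin
  CharCount := CaptionSize div SizeOf(WideChar) - 1;
  if CharCount < 0 then
    CharCount := 0;
  if CharCount > 31 then
    CharCount := 31;
  SetString(Result, PWideChar(@Caption[0]), CharCount);
end;

var
  C: TCaption;
  S: WideString;
  I: Integer;
begin
  // Build an entry name by hand: 10 characters, 22 bytes with the terminator
  FillChar(C, SizeOf(C), 0);
  S := 'Root Entry';
  for I := 1 to Length(S) do
    C[I - 1] := S[I];
  Writeln(CaptionToString(C, 22)); // Root Entry
end.
```

A CaptionSize that is obviously out of range (0, or more than 64) is then a cheap early sign that the directory entry is corrupted.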

:
    function TPoifsFile.ReadPropsArray: Boolean;
    var
      I, J, Len: Integer;
      PropsBlock: TPoifsPropsBlock;
    begin
      Result := True;
      // Number of entries read so far
      Len := 0;
      // First sector of the Property Set Storage
      J := FHeader._sectDirStart;
      repeat
        // Each sector holds 4 entries
        Inc(Len, 4);
        SetLength(FPropsArray, Len);
        PropsBlock := TPoifsPropsBlock(GetBlock(J));
        for I := 0 to 3 do
          FPropsArray[Len - 4 + I] := PropsBlock[I];
        // The next sector of the chain comes from the FAT
        J := FFat[J];
      until J = ENDOFCHAIN;
    end;


:
  1. FAT, .
  2. MiniFAT

FAT MiniFAT?
FHeader._uSectorShift, .
FAT ( , ).
, 1 , 2048 , 512 ( ).
- , , 10 , — .
, ( ) FAT StartBlock TPoifsProperty, , , ( FAT).
, , .

, .
. , TPoifsProperty, PreviousProp, NextProp ChildProp, NodeColor. Red-Black-Tree.
, .

, .

, :


: ( TreeView), .

, Extract:

    begin
      FileStream := TFileStream.Create(edSrc.Text, fmOpenReadWrite);
      try
        AFile := TPoifsFile.Create(FileStream);
        try
          // Parse the structures of the compound file
          AFile.LoadFromStream;
          ATree := TStorageTree.Create;
          try
            // Add every directory entry as a node
            for I := 0 to AFile.PropertiesCount - 1 do
              ATree.AddNode(I).Data := AFile[I];
            // Link the nodes, starting from the root entry
            FillAllChilds(0, ATree.GetNode(0).Data.ChildProp);
            // Show the result in the TreeView
            TreeView1.Items.Clear;
            FillTree(nil, 0);
            // Extract the data, logging everything that could not be written
            DebugLog := TStringList.Create;
            try
              Extract(IncludeTrailingPathDelimiter(edDst.Text), 0);
              if DebugLog.Count > 0 then
                DebugLog.SaveToFile(IncludeTrailingPathDelimiter(edDst.Text) + 'cannotread.log');
            finally
              DebugLog.Free;
            end;
          finally
            ATree.Free;
          end;
        finally
          AFile.Free;
        end;
      finally
        FileStream.Free;
      end;
    end;

( ) .
.

.

, N , TPoifsProperty ( , « »).
, .

, :


:


, ChildProp, — NextProp, — PreviousProp.
, , .

, , , .
    var
      ATree: TStorageTree;
    ...
    procedure FillAllChilds(RootIndex, CurrentIndex: Integer);
    var
      SubChildIndex: Integer;
      RootNode, CurrNode, ChildNode: TStorageElement;
    begin
      if CurrentIndex < 0 then Exit;
      // The parent node
      RootNode := ATree.GetNode(RootIndex);
      // The node that is being attached to it
      CurrNode := ATree.GetNode(CurrentIndex);
      if CurrNode = nil then Exit;
      // A node that is already linked is skipped
      if CurrNode.Added then Exit;
      CurrNode.Added := True;
      // Link the node to its parent
      ATree.AddVector(RootNode, CurrNode);
      // Descend into the node's own children
      FillAllChilds(CurrNode.ID, CurrNode.Data.ChildProp);
      // Walk the previous elements; they belong to the same parent
      SubChildIndex := CurrNode.Data.PreviousProp;
      while SubChildIndex >= 0 do
      begin
        FillAllChilds(RootIndex, SubChildIndex);
        ChildNode := ATree.GetNode(SubChildIndex);
        if ChildNode <> nil then
          SubChildIndex := ChildNode.Data.PreviousProp
        else
          SubChildIndex := -1;
      end;
      // Then the next elements, which also belong to the same parent
      SubChildIndex := CurrNode.Data.NextProp;
      while SubChildIndex >= 0 do
      begin
        FillAllChilds(RootIndex, SubChildIndex);
        ChildNode := ATree.GetNode(SubChildIndex);
        if ChildNode <> nil then
          SubChildIndex := ChildNode.Data.NextProp
        else
          SubChildIndex := -1;
      end;
    end;


, TStorageTree , , .
, GetNode, ( TPoifsProperty, Data) AddVector, .

— .
    procedure FillTree(Node: TTreeNode; RootNodeIndex: Integer);
    var
      W: WideString;
      TreeNode: TTreeNode;
      I: Integer;
      RootStorageNode, ChildStorageNode: TStorageElement;
    begin
      // The storage node to display
      RootStorageNode := ATree.GetNode(RootNodeIndex);
      // Add it to the TreeView under its name
      W := RootStorageNode.Data.Caption;
      TreeNode := TreeView1.Items.AddChildFirst(Node, W);
      case RootStorageNode.Data.PropertyType of
        STGTY_STORAGE: TreeNode.ImageIndex := 0;
        STGTY_STREAM: TreeNode.ImageIndex := 1;
      end;
      // Go through all the links of the node
      for I := 0 to RootStorageNode.VectorCount - 1 do
      begin
        // Take the node at the other end of the link
        ChildStorageNode := TStorageElement(RootStorageNode.GetVector(I).SlaveNode);
        if ChildStorageNode = nil then Continue;
        // A link back to the node itself is skipped, everything else is a child
        if ChildStorageNode.ID <> RootNodeIndex then
          FillTree(TreeNode, ChildStorageNode.ID);
      end;
    end;


.
, , ( GetVector(I).SlaveNode) , .

, «Red-Black-Tree» NodeColor?
. : « — ». :)

— .
— , .

FAT — , , FAT.

, « » ( , _ulMiniSectorCutoff ):

    procedure TPoifsFile.GetDataFromStream(ChainStart: ULONG;
      NeedLength: DWORD; const Stream: TStream);
    begin
      Stream.Size := 0;
      while (Integer(ChainStart) >= 0) and (Stream.Size < NeedLength) do
      begin
        // Seek to the current sector of the chain
        FStream.Position := GetBlockOffset(ChainStart);
        // Get the next sector of the chain from the FAT
        ChainStart := FFat[ChainStart];
        // Copy the whole sector
        Stream.CopyFrom(FStream, FHeader._uSectorShift);
      end;
      // Trim the padding of the last sector
      if Stream.Size > NeedLength then
        Stream.Size := NeedLength;
    end;
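
The loop above trusts every link it reads from the FAT. In a damaged file a link may point outside the table or form a cycle, so the chain can be validated first. This is a minimal sketch of such a defensive walk over an in-memory FAT; CollectChain and TIntArray are hypothetical names and this routine is not part of the archive.

```pascal
program SafeChainDemo;
{$APPTYPE CONSOLE}

const
  ENDOFCHAIN = -2;

type
  TIntArray = array of Integer;

// Hypothetical helper: follows the chain that starts at Start.
// Returns False when a link leaves the table or when the chain gets
// longer than the table itself, which is only possible if it loops.
function CollectChain(const Fat: TIntArray; Start: Integer;
  out Chain: TIntArray): Boolean;
var
  Current, Count: Integer;
begin
  Result := False;
  Count := 0;
  SetLength(Chain, 0);
  Current := Start;
  while Current <> ENDOFCHAIN do
  begin
    if (Current < 0) or (Current >= Length(Fat)) then Exit; // broken link
    if Count >= Length(Fat) then Exit;                      // cycle
    SetLength(Chain, Count + 1);
    Chain[Count] := Current;
    Inc(Count);
    Current := Fat[Current];
  end;
  Result := True;
end;

var
  Fat, Chain: TIntArray;
begin
  // A healthy chain: 0 -> 1 -> 2 -> end
  SetLength(Fat, 3);
  Fat[0] := 1;
  Fat[1] := 2;
  Fat[2] := ENDOFCHAIN;
  Writeln(CollectChain(Fat, 0, Chain), ' ', Length(Chain)); // TRUE 3
  // A damaged chain: 0 -> 1 -> 0 -> ... never ends
  Fat[1] := 0;
  Writeln(CollectChain(Fat, 0, Chain));                     // FALSE
end.
```

When the function returns False, Chain still holds the sectors collected before the problem was found, which is useful when the goal is to salvage as much data as possible.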

- , .
, , , — ? :)

, , ChainStart, . StartBlock TPoifsProperty.

, _ulMiniSectorCutoff.

    procedure TPoifsFile.GetDataFromMiniStream(ChainStart: ULONG;
      NeedLength: DWORD; const Stream: TStream);
    var
      MiniStreamOffset: DWORD;
      RealMiniStreamSector, TmpChain: Integer;
    begin
      Stream.Size := 0;
      while (Integer(ChainStart) >= 0) and (Stream.Size < NeedLength) do
      begin
        // Find the regular sector of the mini-stream that holds this mini-sector
        TmpChain := ChainStart;
        RealMiniStreamSector := Properties[0].StartBlock;
        while TmpChain >= 8 do
        begin
          Dec(TmpChain, 8);
          RealMiniStreamSector := FFat[RealMiniStreamSector];
        end;
        // Offset of that sector in the file
        MiniStreamOffset := GetBlockOffset(RealMiniStreamSector);
        // Position on the mini-sector inside it
        FStream.Position := MiniStreamOffset +
          (ChainStart mod 8) * FHeader._uMiniSectorShift;
        // Get the next mini-sector of the chain from the mini-FAT
        ChainStart := FMiniFat[ChainStart];
        // Copy the whole mini-sector
        Stream.CopyFrom(FStream, FHeader._uMiniSectorShift);
      end;
      // Trim the padding of the last mini-sector
      if Stream.Size > NeedLength then
        Stream.Size := NeedLength;
    end;

, ?
, , TmpChain FAT. () FAT , TmpChain , .
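
The TmpChain arithmetic in GetDataFromMiniStream is easier to follow on concrete numbers. The sketch below repeats the same calculation as a pure function over an in-memory FAT. It assumes 512-byte sectors, 64-byte mini-sectors and a 512-byte header (the value of HEADER_SIZE is not shown in the article, so that constant is my assumption); MiniSectorOffset is a hypothetical name, and the function does no bounds checking.

```pascal
program MiniSectorDemo;
{$APPTYPE CONSOLE}

const
  HEADER_SIZE = 512;     // assumption: the header occupies one 512-byte block
  SectorSize = 512;      // 2 ^ 9
  MiniSectorSize = 64;   // 2 ^ 6
  MiniPerSector = SectorSize div MiniSectorSize; // 8 mini-sectors per sector

type
  TIntArray = array of Integer;

// Hypothetical helper: file offset of mini-sector MiniIndex, where
// RootStart is the first sector of the mini-stream (StartBlock of entry 0).
function MiniSectorOffset(const Fat: TIntArray;
  RootStart, MiniIndex: Integer): Int64;
var
  Sector, Hop: Integer;
begin
  Sector := RootStart;
  // Every 8 mini-sectors we move to the next regular sector of the chain
  for Hop := 1 to MiniIndex div MiniPerSector do
    Sector := Fat[Sector];
  Result := HEADER_SIZE + Int64(Sector) * SectorSize +
    (MiniIndex mod MiniPerSector) * MiniSectorSize;
end;

var
  Fat: TIntArray;
  I: Integer;
begin
  SetLength(Fat, 6);
  for I := 0 to High(Fat) do
    Fat[I] := -1;
  // The mini-stream occupies sectors 3 and 5
  Fat[3] := 5;
  Fat[5] := -2;
  Writeln(MiniSectorOffset(Fat, 3, 0));  // 2048 = 512 + 3 * 512
  Writeln(MiniSectorOffset(Fat, 3, 10)); // 3200 = 512 + 5 * 512 + 2 * 64
end.
```

Mini-sector 10 lives in the second sector of the mini-stream (10 div 8 = 1 hop through the FAT) at position 10 mod 8 = 2 inside it.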

, :
    procedure GetStorageData(ANode: TStorageElement; const Stream: TStream);
    begin
      // Small streams live in the mini-stream, the rest in regular sectors
      if ANode.Data.Size < Integer(AFile.Header._ulMiniSectorCutoff) then
        AFile.GetDataFromMiniStream(ANode.Data.StartBlock, ANode.Data.Size, Stream)
      else
        AFile.GetDataFromStream(ANode.Data.StartBlock, ANode.Data.Size, Stream);
    end;


, , , .

That seems to be almost everything; the last, fifth, stage remains: unpacking the entire contents of the compound file into a folder.

    procedure Extract(Path: string; RootNodeIndex: Integer);
    var
      W: WideString;
      I: Integer;
      RootStorageNode, ChildStorageNode: TStorageElement;
      F: TFileStream;
    begin
      RootStorageNode := ATree.GetNode(RootNodeIndex);
      W := RootStorageNode.Data.Caption;
      case RootStorageNode.Data.PropertyType of
        STGTY_STORAGE:
          Path := Path + W + '\';
        STGTY_STREAM:
        begin
          try
            ForceDirectories(Path);
            F := TFileStream.Create(Path + W, fmCreate);
            try
              GetStorageData(RootStorageNode, F);
            finally
              F.Free;
            end;
          except
            DebugLog.Add(Path + W);
          end;
        end;
      end;
      for I := 0 to RootStorageNode.VectorCount - 1 do
      begin
        ChildStorageNode := TStorageElement(RootStorageNode.GetVector(I).SlaveNode);
        if ChildStorageNode = nil then Continue;
        if ChildStorageNode.ID <> RootNodeIndex then
          Extract(Path, ChildStorageNode.ID);
      end;
    end;


No comments are needed here: we have seen it all before, it is an ordinary algorithm.

If you run the project and point it at some Word document, it will look like this:


And the folder where we extracted everything will contain this:


, ?
, , , - "|" «CompObj».

OLE , ( 0 0x1F). , , : «cannotread.log».

, , .

The code for this example is in the archive, in the folder "..\RawStorageReader\".

, ?
, : "..\corrupted\corrupted_storage.bin"

- :


, , API:



, , RAW :


, , :


, ReadBuffer GetBlock.
Let's deal with it.

6. .


«» Word. - . :)

.


, , , , , , . :)

:


— . , FAT, .

— .
, FAT , : ENDOFCHAIN (-2) FAT .
ENDOFCHAIN, , , , ( ).

.
, FAT, , ENDOFCHAIN. , , ( 99 ), , .

, , , ( — ).

:
, — FAT .
RAW , .

:

    function TPoifsFile.GetBlock(Adress: Integer): TPoifsBlock;
    var
      BlockOffset: Int64; // GetBlockOffset returns Int64
    begin
      BlockOffset := GetBlockOffset(Adress);
      // Negative sector numbers are service markers, not real sectors
      if (Adress >= 0) and (BlockOffset < FStream.Size) then
      begin
        FStream.Position := BlockOffset;
        FStream.ReadBuffer(Result, SizeOf(TPoifsBlock));
      end
      else
        raise Exception.Create('Wrong block offset at address: ' + IntToStr(Adress));
    end;

Now it checks the offsets and raises an exception if something does not add up.

The second change goes into the ReadPropsArray procedure, where we control the state of the FAT array more strictly:

    function TPoifsFile.ReadPropsArray: Boolean;
    var
      I, J, Len, LastGood: Integer;
      PropsBlock: TPoifsPropsBlock;
    begin
      Result := True;
      // Number of entries read so far
      Len := 0;
      // First sector of the Property Set Storage
      J := FHeader._sectDirStart;
      LastGood := J;
      repeat
        // The chain runs into a free sector: terminate it at the last good link
        if J = FREESECT then
        begin
          FixFatEntry(LastGood, ENDOFCHAIN);
          Break;
        end;
        // Each sector holds 4 entries
        Inc(Len, 4);
        SetLength(FPropsArray, Len);
        // The sector may lie outside the file
        try
          PropsBlock := TPoifsPropsBlock(GetBlock(J));
        except
          FixFatEntry(LastGood, ENDOFCHAIN);
          Break;
        end;
        for I := 0 to 3 do
          FPropsArray[Len - 4 + I] := PropsBlock[I];
        LastGood := J;
        // The next sector of the chain comes from the FAT
        J := FFat[J];
        // Any other service value in the middle of the chain is also an error
        if J < ENDOFCHAIN then
        begin
          FixFatEntry(LastGood, ENDOFCHAIN);
          Break;
        end;
      until J = ENDOFCHAIN;
    end;

All that remains is to write the FixFatEntry procedure:

    procedure TPoifsFile.FixFatEntry(FatIndex, NewValue: Integer);
    var
      J, Offset: Integer;
    begin
      // Position of the entry inside its FAT sector
      J := FatIndex mod 128;
      Offset := FFatOffset[FatIndex] + J * 4;
      // Write the new value straight into the file
      FStream.Position := Offset;
      FStream.WriteBuffer(NewValue, SizeOf(Integer));
    end;

This is the procedure that makes changes to the FAT chain in the original file.

Now let's see what we get:


… :)
, , , .

, , FixFatEntry FStream.WriteBuffer.

, , , , , , .

:)

The code with these changes is in the archive: "..\RawStorageReader\PoifsWithRepair.pas".

, .

7.


— , .

, , — , , , ( , ) .

— , .
, , ?
, , — , , :)

, , :
, , , , .
() .

, , , .

:
8 (, RAW ) 473 , .
, — 150 , .
24 , 12 .
: 150000 * 24 * 12 * 8 = 345 ().

, 473 «» ( 1-2 , ). ( ), FAT ( — , ).
, FAT- — , UnErase : ?
— , .

, , : ?
1 — CD , :)

Don't believe me?
( , ) ?

?
— : .
«» — :)

, , — , ( , ).

, :


, .

, " " .

: , ( 1024 ). , , NTFS? :)

Good luck :)

— (Rouse_)

, 2015

Source: https://habr.com/ru/post/254541/

