WordDocument and 1Table “files”, as before, but with presentation-specific ones: Current User and PowerPoint Document. Both “files” must be present in the “file system” of a CFB file for it to be a presentation. Their presence lets us recognize a presentation even when the file has the wrong extension.

CurrentUserAtom
is the only entry in Current User. It contains technical information about who last edited the file, but that is not the important part. What matters is that this block holds the offset to the first UserEditAtom record, which is discussed below.

rh
is the header that carries technical information about a record. To get it, read the first 8 bytes of any record. The first WORD usually holds nothing we need, but the next 6 bytes do. The WORD at offset 2 (rh.recType) identifies the record type, which tells us what to do with the record next. The DWORD at offset 4 (rh.recLen) is the record length, not counting the 8-byte header. This layout is convenient and helps avoid many errors when parsing a presentation file.

UserEditAtom
is the next record to read, and it already lives in PowerPoint Document. From here on we work only with this "file". By reading this record and the ones linked to it, we have to build an array of offsets called PersistDirectory, which we will then use to find the main structure of a PowerPoint document, DocumentContainer. To do this, read the current UserEditAtom record and take two offsets from it: offsetPersistDirectory, which points to the current "live" version of PersistDirectory, and offsetLastEdit, which points to the next UserEditAtom. Keep collecting offsets this way until the DWORD offsetLastEdit is zero.

offsetPersistDirectory
offsets have now all been collected, so we can build the PersistDirectory itself. Walk the offsets in reverse order and read the PersistDirectoryAtom record at each one. Each record contains an array of PersistDirectoryEntry entries. Every entry starts with the identifier of its first object, persistId, and the number of objects it covers, cPersist. After these two fields comes an array of offsets to the PersistDirectory objects. This is the most important array: through it we find references to every object in the presentation.

UserEditAtom
also holds the docPersistIdRef field. It is the number of the most important object, DocumentContainer, in PersistDirectory. We read that object. It stores a whole cartload of information about the current presentation: headers and footers, slide notes and, most importantly, the SlideListWithTextContainer record, which contains all sorts of information about slides.

TextCharsAtom
, TextBytesAtom and SlidePersistAtom are the records to look for there. The first two are easy: they hold the slide text as Unicode and as plain ANSI, respectively. It is a different story when, instead of text, we get a SlidePersistAtom reference to a slide. Following it, we have to read the Drawing object, which (sic!) is not a PPT object. Yes, in this case an MS Drawing object is embedded in the slide, with a rather unpleasant structure of nested records.

rh
headers in Drawing records use the same recType values as PPT, though. That made it possible to simplify the task and cheat a little by searching the Drawing object for the same TextCharsAtom and TextBytesAtom records by their recType.

PersistDirectory
. If anyone has clarifications, I will gladly hear them.

Source: https://habr.com/ru/post/76033/
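The walk over the UserEditAtom chain and the construction of PersistDirectory described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the author's code: the function names are hypothetical, `stream` is assumed to be the raw bytes of PowerPoint Document already extracted from the CFB file, and `first_edit_offset` is assumed to be the offset taken from CurrentUserAtom. The recType constants, the field positions inside UserEditAtom (16 and 20 bytes from the start of the record) and the 20/12-bit split of persistId and cPersist are not given in the article; they are taken from the MS-PPT specification.

```python
import struct

# Assumed values from the MS-PPT specification, not from the article.
RT_USER_EDIT_ATOM = 0x0FF5
RT_PERSIST_DIRECTORY_ATOM = 0x1772


def read_header(stream, offset):
    """Return (recType, recLen) from the 8-byte rh header at offset."""
    _ver_instance, rec_type, rec_len = struct.unpack_from("<HHI", stream, offset)
    return rec_type, rec_len


def collect_directory_offsets(stream, first_edit_offset):
    """Follow offsetLastEdit links and collect every offsetPersistDirectory."""
    offsets = []
    seen = set()
    edit_offset = first_edit_offset
    while edit_offset != 0:
        if edit_offset in seen:
            raise ValueError("cycle in the UserEditAtom chain")
        seen.add(edit_offset)
        rec_type, _ = read_header(stream, edit_offset)
        if rec_type != RT_USER_EDIT_ATOM:
            raise ValueError("expected UserEditAtom at offset %d" % edit_offset)
        # offsetLastEdit and offsetPersistDirectory sit 16 bytes into the record.
        offset_last_edit, offset_persist_dir = struct.unpack_from(
            "<II", stream, edit_offset + 16
        )
        offsets.append(offset_persist_dir)
        edit_offset = offset_last_edit
    return offsets


def build_persist_directory(stream, directory_offsets):
    """Read PersistDirectoryAtom records oldest first, so newer edits win."""
    directory = {}
    for dir_offset in reversed(directory_offsets):
        rec_type, rec_len = read_header(stream, dir_offset)
        if rec_type != RT_PERSIST_DIRECTORY_ATOM:
            raise ValueError("expected PersistDirectoryAtom at offset %d" % dir_offset)
        pos = dir_offset + 8
        end = pos + rec_len
        while pos < end:
            (entry,) = struct.unpack_from("<I", stream, pos)
            persist_id = entry & 0xFFFFF  # low 20 bits: first identifier
            count = entry >> 20           # high 12 bits: cPersist
            pos += 4
            for i in range(count):
                (object_offset,) = struct.unpack_from("<I", stream, pos)
                directory[persist_id + i] = object_offset
                pos += 4
    return directory
```

With the resulting dictionary, the offset of DocumentContainer would be `directory[docPersistIdRef]`. Reading the offsets in reverse order matters because a later edit may store a new offset for an identifier that an earlier edit already defined.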
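The shortcut of searching nested records for TextCharsAtom and TextBytesAtom by recType could look something like the sketch below. It is a guess at the approach rather than the author's implementation: `extract_text` is a hypothetical name, the two recType values and the rule that a record with recVer equal to 0xF is a container come from the MS-PPT specification, and decoding TextBytesAtom as Latin-1 is an assumption standing in for "plain ANSI".

```python
import struct

# Assumed values from the MS-PPT specification, not from the article.
RT_TEXT_CHARS_ATOM = 0x0FA0
RT_TEXT_BYTES_ATOM = 0x0FA8


def extract_text(data, start=0, end=None):
    """Collect text from TextCharsAtom and TextBytesAtom records in data."""
    if end is None:
        end = len(data)
    texts = []
    pos = start
    while pos + 8 <= end:
        ver_instance, rec_type, rec_len = struct.unpack_from("<HHI", data, pos)
        body_start = pos + 8
        # Clamp to the enclosing range so a bad recLen cannot run past it.
        body_end = min(body_start + rec_len, end)
        if rec_type == RT_TEXT_CHARS_ATOM:
            texts.append(data[body_start:body_end].decode("utf-16-le", errors="replace"))
        elif rec_type == RT_TEXT_BYTES_ATOM:
            texts.append(data[body_start:body_end].decode("latin-1"))
        elif ver_instance & 0x000F == 0x000F:
            # Container record: its body is a sequence of nested records.
            texts.extend(extract_text(data, body_start, body_end))
        pos = body_end
    return texts
```

Because the same 8-byte header is used at every level, the function does not need to know whether it is inside a PPT container or a Drawing object; it only descends into containers and picks out the two text record types.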