
Many people have heard of rarjpeg files. A rarjpeg is a special kind of file: a JPEG image and a RAR archive glued together back to back. It makes an excellent container for hiding the very fact that information is being transferred. You can create a rarjpeg with the following commands:
UNIX:
cat image1.jpg archive.rar > image2.jpg
WINDOWS:
copy /b image1.jpg + archive.rar image2.jpg
Or by hand in a hex editor.
Of course, JPEG is not the only format that can hide the fact of a transfer; many others will do. Each format has its own peculiarities that make it more or less suitable as a container. Below I describe how to find glued files in the most popular formats, or at least how to spot signs of gluing.
Methods for detecting glued files fall into three groups:
- Checking the area after the EOF marker. Many popular file formats have a so-called end-of-file marker that tells the reader where the meaningful data ends. Image viewers, for example, read all bytes up to this marker and ignore whatever follows it. This method is ideal for JPEG, PNG, GIF, ZIP, RAR and PDF.
- Checking the file size. The structure of some formats (audio and video containers) lets you calculate the expected file size and compare it with the actual size. Formats: AVI, WAV, MP4, MOV.
- Checking CFB files. CFB, or Compound File Binary Format, is a document format developed by Microsoft; it is a container with its own file system. This method relies on detecting anomalies in the file.
Is there life after the end of the file?
Jpeg
To answer this question, we need to dig into the specification of the format that is the "ancestor" of glued files and understand its structure. Every JPEG starts with the signature 0xFF 0xD8.
The signature is followed by service information, optionally a thumbnail, and finally the compressed image itself. The end of the image is marked by the two-byte signature 0xFF 0xD9.
PNG
The first eight bytes of a PNG file hold the signature 0x89 0x50 0x4E 0x47 0x0D 0x0A 0x1A 0x0A. The data stream ends with the signature 0x49 0x45 0x4E 0x44 0xAE 0x42 0x60 0x82.
Rar
All RAR archives share the signature 0x52 0x61 0x72 0x21 (Rar!). It is followed by the archive version and other related data. It was established experimentally that the archive ends with the signature 0xC4 0x3D 0x7B 0x00 0x40 0x07 0x00 (this matches the end-of-archive block of RAR 4.x archives).
Table of formats and their signatures:
Format | Initial signature | Final signature |
---|---|---|
JPEG | 0xFF 0xD8 | 0xFF 0xD9 |
PNG | 0x89 0x50 0x4E 0x47 0x0D 0x0A 0x1A 0x0A | 0x49 0x45 0x4E 0x44 0xAE 0x42 0x60 0x82 |
RAR | 0x52 0x61 0x72 0x21 | 0xC4 0x3D 0x7B 0x00 0x40 0x07 0x00 |
The algorithm for checking these formats for gluing is extremely simple:
- Find the initial signature;
- Find the final signature;
- If there is no data after the final signature, the file is clean and contains no attachments. Otherwise, look for other formats after the final signature.
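
The three steps above can be sketched in Python roughly as follows. This is only an illustration, not a finished detector: the names `END_MARKERS`, `KNOWN_STARTS`, `trailing_data` and `guess_payload` are my own, only JPEG and PNG are covered, and the code stops at the first final signature it meets, so a JPEG with an embedded thumbnail (which carries its own 0xFF 0xD9) would produce a false alarm.

```python
# Start/end signatures for JPEG and PNG, as listed in the article.
END_MARKERS = {
    "jpeg": (b"\xff\xd8", b"\xff\xd9"),
    "png": (b"\x89PNG\r\n\x1a\n", b"IEND\xaeB`\x82"),
}

# Start signatures used to guess what the trailing data might be
# (an illustrative, deliberately short list).
KNOWN_STARTS = {
    b"Rar!": "rar",
    b"PK\x03\x04": "zip",
    b"%PDF": "pdf",
    b"GIF8": "gif",
}


def trailing_data(data: bytes, fmt: str) -> bytes:
    """Return the bytes that follow the first final signature of `fmt`."""
    start_sig, end_sig = END_MARKERS[fmt]
    start = data.find(start_sig)
    if start < 0:
        raise ValueError(f"no {fmt} start signature found")
    end = data.find(end_sig, start + len(start_sig))
    if end < 0:
        raise ValueError(f"no {fmt} final signature found")
    return data[end + len(end_sig):]


def guess_payload(tail: bytes) -> list[str]:
    """List known formats whose start signature occurs in the tail."""
    return [name for sig, name in KNOWN_STARTS.items() if sig in tail]
```

An empty result from `trailing_data` means the file is clean in the sense of the algorithm; a non-empty one can be passed to `guess_payload` to see which known format, if any, it resembles.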
GIF and PDF
Format | Initial signature | Final signature |
---|---|---|
GIF | 0x47 0x49 0x46 0x38 | 0x00 0x3B |
PDF | 0x25 0x50 0x44 0x46 | 0x0A 0x25 0x25 0x45 0x4F 0x46 |
A PDF document may contain more than one EOF marker, for example because of incorrect document generation. The number of final signatures in a GIF file equals the number of frames in it. Taking these features into account, the algorithm for detecting glued files can be improved:
- Step 1 is the same as in the previous algorithm.
- Step 2 is the same as in the previous algorithm.
- When a final signature is found, remember its position and keep searching;
- If the last EOF marker found this way sits at the very end of the file, the file is clean.
- If the file does not end with a final signature, go to the position of the last final signature found.
A large difference between the file size and the position just after the last final signature indicates an attachment. A threshold of more than ten bytes works, although other values can be chosen.
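
A minimal sketch of this improved check might look like the following. The function names and the default `threshold` parameter are my own choices; `bytes.rfind` stands in for "keep searching and remember the last position". One obvious weakness, which the sketch does not address, is that the glued payload may itself contain the marker bytes (0x00 0x3B is a short and common sequence), which would hide it.

```python
GIF_END = b"\x00\x3b"   # final signature of GIF
PDF_END = b"\n%%EOF"    # final signature of PDF (0x0A 0x25 0x25 0x45 0x4F 0x46)


def bytes_after_last_marker(data: bytes, end_sig: bytes) -> int:
    """Count the bytes that follow the last occurrence of the final signature."""
    pos = data.rfind(end_sig)
    if pos < 0:
        raise ValueError("final signature not found")
    return len(data) - (pos + len(end_sig))


def looks_glued(data: bytes, end_sig: bytes, threshold: int = 10) -> bool:
    """Flag the file when more than `threshold` bytes follow the last marker."""
    return bytes_after_last_marker(data, end_sig) > threshold
```

The threshold leaves room for harmless trailers such as a final newline after %%EOF.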
ZIP
A distinctive feature of ZIP archives is that they have three different signatures:
Signature | Description |
---|---|
0x50 0x4B 0x03 0x04 | Regular archive |
0x50 0x4B 0x05 0x06 | Empty archive |
0x50 0x4B 0x07 0x08 | Archive split into parts |
The archive structure is as follows:
Local File Header 1 |
File Data 1 |
Data Descriptor 1 |
Local File Header 2 |
File Data 2 |
Data Descriptor 2 |
... |
Local File Header n |
File Data n |
Data Descriptor n |
Archive decryption header |
Archive extra data record |
Central directory |
The most interesting part is the central directory, which holds metadata about the files in the archive. The central directory always starts with the signature 0x50 0x4B 0x01 0x02 and ends with the signature 0x50 0x4B 0x05 0x06, followed by 18 bytes of metadata. Interestingly, an empty archive consists of nothing but the final signature and 18 zero bytes. The 18 bytes are followed by the archive comment area, which is an ideal container for hiding a file.
To check a ZIP archive, find the final signature of the central directory, skip 18 bytes and look for signatures of known formats in the comment area. An unusually large comment is also a sign of gluing.
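
The ZIP check could be sketched like this. The helper `inspect_zip_tail` and the `KNOWN_STARTS` table are hypothetical names of mine. The sketch also relies on one detail from the ZIP specification that is not spelled out above: the last two of the 18 metadata bytes hold the declared comment length (little-endian). Searching backwards with `rfind` is a simplification and can be fooled if the hidden data itself contains the signature.

```python
import struct

EOCD_SIG = b"PK\x05\x06"   # final signature of the central directory
EOCD_SIZE = 22             # 4-byte signature + 18 bytes of metadata

# Illustrative list of start signatures to look for in the comment area.
KNOWN_STARTS = {
    b"Rar!": "rar",
    b"\xff\xd8": "jpeg",
    b"\x89PNG": "png",
    b"%PDF": "pdf",
    b"GIF8": "gif",
}


def inspect_zip_tail(data: bytes) -> tuple[int, bytes, list[str]]:
    """Return (declared comment length, bytes after the record, format hits)."""
    pos = data.rfind(EOCD_SIG)
    if pos < 0 or pos + EOCD_SIZE > len(data):
        raise ValueError("end of central directory record not found")
    # The comment length sits in the last 2 of the 18 metadata bytes.
    (declared,) = struct.unpack_from("<H", data, pos + 20)
    tail = data[pos + EOCD_SIZE:]
    hits = [name for sig, name in KNOWN_STARTS.items() if sig in tail]
    return declared, tail, hits
```

A long tail, a tail longer than the declared comment length, or any hit in the last element are all reasons to look at the archive more closely.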
Size matters
Avi
An AVI file is structured as follows: every file starts with the RIFF signature (0x52 0x49 0x46 0x46). At offset 8 is the signature that specifies the format, AVI followed by a space (0x41 0x56 0x49 0x20). The 4-byte block at offset 4 holds the size of the data block that follows (little-endian byte order). To find the offset at which the next block begins, add the header size (8 bytes) to the size read from bytes 4-8. This gives the total file size. The calculated size is allowed to be smaller than the actual file size, in which case the file must contain only zero bytes after the calculated size (padding that aligns the file to a 1 KB boundary).
Size calculation example:

Offset | Size | Next offset |
---|---|---|
4 | 31442 | 8 + 31442 = 31450 |
Wav
Like AVI, a WAV file starts with the RIFF signature, but the signature at offset 8 is WAVE (0x57 0x41 0x56 0x45). The file size is calculated in the same way as for AVI. The actual size must match the calculated one exactly.
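
The RIFF size rule shared by AVI and WAV can be sketched as below. The function names are mine, and the sketch assumes the whole file is already in memory as `bytes`. For AVI it tolerates a tail of zero bytes, as described above; for any other RIFF form it requires an exact match.

```python
import struct


def riff_expected_size(data: bytes) -> int:
    """8-byte header plus the little-endian size stored at offset 4."""
    if data[:4] != b"RIFF":
        raise ValueError("not a RIFF file")
    (chunk_size,) = struct.unpack_from("<I", data, 4)
    return 8 + chunk_size


def riff_looks_glued(data: bytes) -> bool:
    """Compare the calculated size with the actual one."""
    expected = riff_expected_size(data)
    if data[8:12] == b"AVI ":
        # AVI may be padded with zeros; anything non-zero is suspicious.
        return any(data[expected:])
    # WAV (and other forms): sizes must match exactly.
    return len(data) != expected
```

With the numbers from the AVI example, a size field of 31442 yields an expected file size of 31450 bytes.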
Mp4
MP4, or MPEG-4, is a media container format for storing video and audio streams; it can also store subtitles and images.
The signatures are located at offset 4: the file type marker ftyp (66 74 79 70) (QuickTime Container File Type) followed by the type mmp4 (6D 6D 70 34). For detecting hidden files, what interests us is the ability to calculate the file size.

Consider an example. The size of the first block is at offset zero and equals 28 (00 00 00 1C, big-endian byte order); it also gives the offset at which the size of the second data block is stored. At offset 28 we find the next block size, which equals 8 (00 00 00 08). To find the offset of each following block size, keep adding up the sizes of the blocks found so far. This is how the file size is calculated:
Offset | Value | Next offset |
---|---|---|
0 | 28 | 0 + 28 = 28 |
28 | 8 | 28 + 8 = 36 |
36 | 303739 | 36 + 303739 = 303775 |
303775 | 6202 | 303775 + 6202 = 309977 |
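
The walk shown in the table can be sketched as follows. The function names are my own. Two details come from the general MP4 container layout rather than from the example above, so treat them as assumptions of this sketch: a size of 1 means a 64-bit size follows the 4-byte type, and a size of 0 means the block runs to the end of the file. The walk stops as soon as a block would not fit into the file, which is exactly what happens when it runs into glued data.

```python
import struct


def walk_top_level_blocks(data: bytes) -> list[tuple[int, int]]:
    """Return (offset, size) for each top-level block that fits in the file."""
    blocks = []
    offset = 0
    while offset + 8 <= len(data):
        (size,) = struct.unpack_from(">I", data, offset)  # big-endian size
        if size == 1:
            # Assumed rule: a 64-bit size is stored right after the type.
            if offset + 16 > len(data):
                break
            (size,) = struct.unpack_from(">Q", data, offset + 8)
        elif size == 0:
            # Assumed rule: the block extends to the end of the file.
            size = len(data) - offset
        if size < 8 or offset + size > len(data):
            break  # not a plausible block: stop walking here
        blocks.append((offset, size))
        offset += size
    return blocks


def calculated_size(data: bytes) -> int:
    """End offset of the last plausible block, i.e. the calculated file size."""
    blocks = walk_top_level_blocks(data)
    if not blocks:
        return 0
    last_offset, last_size = blocks[-1]
    return last_offset + last_size
```

If `calculated_size(data)` is smaller than `len(data)`, the remainder of the file does not belong to the container. Glued data that happens to look like a valid block would slip through, so checking the 4-byte block type would be a sensible extension.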
Mov
This widely used format is also an MPEG-4 container. MOV uses a proprietary data compression algorithm, has a structure similar to MP4 and serves the same purpose: storing audio and video data along with related material.
Like MP4, every MOV file has the 4-byte signature ftyp at offset 4, but the signature that follows it is qt__ (71 74 20 20, that is, qt followed by two spaces). The rule for calculating the file size is the same: starting from the beginning of the file, read the size of each block and add it to the running offset.
The way to check this group of formats for glued files is to calculate the size by the rules above and compare it with the actual size of the file being scanned. If the calculated size is much smaller than the actual file size, this indicates gluing. For AVI files, the calculated size is allowed to be smaller than the file size because zeros may be appended for alignment; in that case, check that everything after the calculated size consists of zeros.
Checking Compound File Binary Format
This file format, developed by Microsoft, is also known as OLE (Object Linking and Embedding) or COM (Component Object Model). DOC, XLS and PPT files belong to the CFB family of formats.
A CFB file consists of a 512-byte header followed by sectors of equal length that store data streams or service information. Each sector has its own non-negative number; the exceptions are two special values: -1 marks a free sector and -2 marks the sector that ends a chain. All sector chains are defined in the FAT table.

Suppose that an attacker has modified a doc file and appended another file to its end. There are several ways to detect this or to point to an anomaly in the document.
Abnormal file size
As mentioned above, any CFB file consists of a header and sectors of equal length. To find the sector size, read the two-byte number at offset 30 from the beginning of the file and raise 2 to that power. The number must be either 9 (0x0009) or 12 (0x000C), giving a sector size of 512 or 4096 bytes respectively. Once the sector size is known, check the following equality:
(FileSize - 512) mod SectorSize = 0
If the equality does not hold, the file shows signs of gluing. However, this method has a significant drawback: an attacker who knows the sector size only needs to append the file plus n extra bytes so that the size of the glued data is a multiple of the sector size.
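
The size check can be sketched in a few lines. The function names are mine, and the code follows the formula above literally, with a fixed 512-byte header. For files with 4096-byte sectors the header may occupy a full sector, in which case the constant would need adjusting; the sketch does not attempt to handle that case.

```python
import struct

CFB_HEADER_SIZE = 512


def cfb_sector_size(data: bytes) -> int:
    """Read the two-byte exponent at offset 30 and return 2 ** exponent."""
    (shift,) = struct.unpack_from("<H", data, 30)
    if shift not in (9, 12):
        raise ValueError(f"unexpected sector shift {shift}")
    return 1 << shift


def cfb_size_is_aligned(data: bytes) -> bool:
    """Check (FileSize - 512) mod SectorSize = 0."""
    return (len(data) - CFB_HEADER_SIZE) % cfb_sector_size(data) == 0
```

As the text notes, padding the glued data to a multiple of the sector size defeats this check, so a `True` result proves nothing on its own.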
Unknown sector type
If the attacker knows how to bypass the previous check, this method can still detect the gluing by finding sectors of undefined type.
We define the equality:
FileSize = 512 + CountReal * SectorSize, where FileSize is the file size, SectorSize is the sector size, CountReal is the number of sectors.
We also define the following variables:
- CountFAT is the number of FAT sectors. It is stored at offset 44 from the beginning of the file (4 bytes);
- CountMiniFAT is the number of MiniFAT sectors. It is stored at offset 64 from the beginning of the file (4 bytes);
- CountDIFAT is the number of DIFAT sectors. It is stored at offset 72 from the beginning of the file (4 bytes);
- CountDE is the number of Directory Entry sectors. To find it, locate the first DE sector, whose number is stored at offset 48, then follow the full DE chain in the FAT and count the DE sectors;
- CountStreams is the number of sectors holding data streams;
- CountFree is the number of free sectors;
- CountClassified is the number of sectors with a definite type;
CountClassified = CountFAT + CountMiniFAT + CountDIFAT + CountDE + CountStreams + CountFree
Obviously, if CountClassified and CountReal are not equal, we can conclude that files may have been glued together.
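
Below is a partial sketch of this comparison. It only reads the counters stored in the header and derives CountReal from the file size; counting the DE, stream and free sectors requires walking the FAT chains, which is not shown here, so those three values are passed in as parameters. All function and key names are my own.

```python
import struct


def cfb_header_counts(data: bytes) -> dict[str, int]:
    """Read the sector counters from the header and derive CountReal."""
    (shift,) = struct.unpack_from("<H", data, 30)
    sector_size = 1 << shift
    (count_fat,) = struct.unpack_from("<I", data, 44)
    (first_de,) = struct.unpack_from("<I", data, 48)
    (count_minifat,) = struct.unpack_from("<I", data, 64)
    (count_difat,) = struct.unpack_from("<I", data, 72)
    return {
        "SectorSize": sector_size,
        # From FileSize = 512 + CountReal * SectorSize
        "CountReal": (len(data) - 512) // sector_size,
        "CountFAT": count_fat,
        "CountMiniFAT": count_minifat,
        "CountDIFAT": count_difat,
        "FirstDESector": first_de,
    }


def has_unclassified_sectors(counts: dict[str, int], count_de: int,
                             count_streams: int, count_free: int) -> bool:
    """True when CountClassified differs from CountReal."""
    classified = (counts["CountFAT"] + counts["CountMiniFAT"]
                  + counts["CountDIFAT"] + count_de
                  + count_streams + count_free)
    return classified != counts["CountReal"]
```

A mismatch means some sectors in the file are not accounted for by any known structure, which is what sector-aligned glued data looks like.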