Search for deleted files in NTFS

In this article, I would like to talk about the algorithms for working with the NTFS file system that we used when creating the data recovery program Hetman Partition Recovery. It continues the previous post about FAT.

Below, I describe the algorithm for finding and recovering deleted files on an NTFS partition that we applied when developing our program. This algorithm is best described in the book "File System Forensic Analysis" by Brian Carrier.

NTFS (New Technology File System) was developed by Microsoft for Windows NT. The main goals of its developers were reliability, security, and support for high-capacity storage media.

Perhaps the main feature of this file system is that all service data is stored in files. Files with administrative data can be located anywhere in the volume, just like regular files. Thus, unlike other file systems, NTFS does not have a rigidly defined layout. The entire file system is considered a data area, and any sector can be allocated to a file. The only fixed requirement is that the first sectors of the volume contain the boot sector and the boot code.

NTFS stores all file information in the Master File Table (MFT). Small files can be stored directly in the MFT record. Otherwise, clusters are allocated for the file, and the MFT record contains a list of these clusters. The records themselves are very simple. Their size is 1 KB, but only the first 42 bytes have a fixed purpose. The remaining bytes store attributes: small data structures that each perform a strictly specialized function. For example, one attribute stores the file name, and another stores the file contents.

Fig. 1. The basic structure of an MFT record with a header and three attributes

The MFT record contains a small header, and the remaining bytes are for storing various attributes. The record shown in the figure contains three attributes.

Note that NTFS keeps a backup copy of the beginning of the MFT (the $MFTMirr file), which can be very useful when recovering data.

MFT Record Content

The size of each MFT record is defined in the boot sector, but all Microsoft versions of NTFS use 1024-byte records. The first 42 bytes contain 12 fields, and the remaining 982 bytes have no fixed structure and are filled with attributes. In simple terms, an MFT record can be compared to a large storage chest. Basic information about the owner, such as the name and address, is written on the outside of the chest (this corresponds to the fixed fields of the MFT record). Inside the chest you can put any object that is smaller than the chest itself. In the same way, the body of an MFT record has no fixed structure and holds attributes, each of which stores a specific kind of information.

The MFT uses sequential 48-bit addressing of records, with the first record assigned address 0. The maximum MFT address changes as the MFT grows and is determined by dividing the size of $MFT by the size of one record.

Each MFT record also contains a 16-bit sequence number, which is incremented every time the record is allocated. Consider MFT record 313 with sequence number 1. The file that owned record 313 is deleted, and the record is re-allocated to a new file. The record then receives a new sequence number, 2. The MFT address (the lower 48 bits) is combined with the sequence number (the upper 16 bits) to form the 64-bit base address of the file, also known as the file reference.
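The arithmetic behind the base address can be sketched in a few lines of Python. The function names are hypothetical and chosen only for this illustration.

```python
from typing import Tuple

RECORD_BITS = 48
RECORD_MASK = (1 << RECORD_BITS) - 1


def make_file_reference(record_number: int, sequence: int) -> int:
    """Combine a 48-bit MFT address and a 16-bit sequence number."""
    if not 0 <= record_number <= RECORD_MASK:
        raise ValueError("record number does not fit in 48 bits")
    if not 0 <= sequence <= 0xFFFF:
        raise ValueError("sequence number does not fit in 16 bits")
    return (sequence << RECORD_BITS) | record_number


def split_file_reference(ref: int) -> Tuple[int, int]:
    """Return (record_number, sequence) for a 64-bit base address."""
    return ref & RECORD_MASK, (ref >> RECORD_BITS) & 0xFFFF


def reference_is_current(ref: int, record_sequence: int) -> bool:
    """A reference is stale if the record has been re-allocated since."""
    return split_file_reference(ref)[1] == record_sequence


def max_record_count(mft_size_bytes: int, record_size: int = 1024) -> int:
    """Number of records in the MFT: size of $MFT divided by record size."""
    return mft_size_bytes // record_size


ref = make_file_reference(313, 2)
print(hex(ref))                       # 0x2000000000139
print(split_file_reference(ref))      # (313, 2)
```

A reference that still carries sequence number 1 for record 313 would fail the `reference_is_current` check once the record has been re-allocated with sequence number 2.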

Fig. 2. The base address of the file is formed by combining the address of the MFT record and the sequence number.

NTFS uses the base address to refer to MFT records because the sequence number makes it easy to detect file system corruption. For example, if the system crashes while allocating data structures for a file, the sequence number comes to the rescue: it shows whether a reference to an MFT record is left over from the previous file or belongs to the new one. In addition, the sequence number can be used when recovering deleted content.

As you have already noticed, the fixed structure of an MFT record is minimal, and most of the record is used to store attributes: objects that contain data of a particular type. There are many different attributes, and each has its own internal structure. For example, there are attributes for the file name, for dates and times, and even for the file contents. This is another way in which NTFS stands apart. Most file systems read and write the contents of files, whereas NTFS reads and writes attributes, one type of which encapsulates the file contents.

Let us return to our analogy, in which the MFT record was compared to a chest. The attributes are then small boxes placed inside the chest. A box can have whatever shape best suits the object it holds. For example, discs are more convenient to store in round boxes, and posters in long tubes.

Fig. 3. Sample MFT record with attribute headers and content areas

Although different attributes are designed for different types of data, all attributes have two common parts: the header and the content. The header is generic and has the same form for all attributes. The content depends on the attribute type and can be of any size.
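The header/content split can be illustrated with a minimal Python sketch that walks the attributes of one record. It is a sketch under assumptions, not our production code: I assume the commonly documented 16-byte generic header (type, total length, non-resident flag, name length, name offset, flags, attribute ID), a resident header that continues with the content size and content offset, and the value 0xFFFFFFFF as the end-of-attributes marker. The names `Attribute` and `iter_attributes` are made up for this example.

```python
import struct
from typing import Iterator, NamedTuple

END_MARKER = 0xFFFFFFFF                 # assumed end-of-attributes marker
_COMMON = struct.Struct("<IIBBHHH")     # generic header, 16 bytes
_RESIDENT = struct.Struct("<IH")        # content size, content offset


class Attribute(NamedTuple):
    type_id: int        # e.g. 0x30 for the file name, 0x80 for the data
    non_resident: bool
    attr_id: int
    content: bytes      # resident content, or b"" for non-resident


def iter_attributes(record: bytes, first_attr_offset: int) -> Iterator[Attribute]:
    """Yield the attributes of one MFT record, stopping on damage."""
    pos = first_attr_offset
    while pos + _COMMON.size <= len(record):
        type_id, length, non_res, _nlen, _noff, _flags, attr_id = \
            _COMMON.unpack_from(record, pos)
        if type_id == END_MARKER:
            break
        if length < _COMMON.size or pos + length > len(record):
            break  # corrupt length field: do not run past the record
        raw = record[pos:pos + length]
        content = b""
        if not non_res and length >= _COMMON.size + _RESIDENT.size:
            size, offset = _RESIDENT.unpack_from(raw, _COMMON.size)
            content = raw[offset:offset + size]
        yield Attribute(type_id, bool(non_res), attr_id, content)
        pos += length
```

The starting offset is taken from the record header. For a non-resident attribute the content lives in external clusters, so this sketch returns an empty `content` for it.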

File recovery

Recovering a deleted file in NTFS is easier than in most file systems. But NTFS has one unpleasant feature. When a file is deleted, its name is removed from the index of the parent directory, and its MFT record and the clusters it occupied are marked as free. When the name is removed from the parent directory index, the index is re-sorted, and the name information stored there may be lost. In that case, the name of the deleted file disappears from its original directory.

But do not give up: this disadvantage is partly compensated by the fact that all MFT records are stored in one table, which greatly simplifies the search for all free records. In addition, each record contains an attribute with the base address of the parent directory. This means that when a free record is found, it is usually possible to determine its full path.
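Because all records sit in one table, the search for free records reduces to a linear scan. Below is a minimal Python sketch of such a scan over an in-memory copy of $MFT. It assumes 1024-byte records, the "FILE" signature at offset 0, and a 16-bit flags field at offset 22 whose lowest bit means "in use". The function name is hypothetical, and a real tool would also apply the fixup (update sequence) array before trusting the rest of the record.

```python
import struct
from typing import Iterator, Tuple

RECORD_SIZE = 1024      # assumed record size
FLAGS_OFFSET = 22       # assumed offset of the flags field in the header
FLAG_IN_USE = 0x0001


def iter_free_records(mft: bytes,
                      record_size: int = RECORD_SIZE) -> Iterator[Tuple[int, bytes]]:
    """Yield (record_number, raw_record) for records that were used once
    but are currently marked as free, i.e. candidates for recovery."""
    for number in range(len(mft) // record_size):
        record = mft[number * record_size:(number + 1) * record_size]
        if record[:4] != b"FILE":
            continue  # never initialized, or damaged
        (flags,) = struct.unpack_from("<H", record, FLAGS_OFFSET)
        if not flags & FLAG_IN_USE:
            yield number, record
```

Each record returned by the scan can then be examined for its file name attribute and the reference to its parent directory.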

To recover all deleted files in NTFS, you need to search for free records in the MFT. Having found a free record, we can determine the file name from the file name attribute and the location from the address of the parent directory. The cluster pointers are still present, and if the data has not yet been overwritten, the file can be restored. Recovery is possible even when the file is heavily fragmented. If an attribute value was resident (that is, it fits inside the MFT record itself), the data will not be overwritten until the MFT record is re-allocated. If more than one MFT record was required to store the file attributes, the other records may also be needed for a full recovery.
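The cluster pointers of a non-resident attribute are stored as a compact run list. The following Python sketch decodes one, assuming the commonly documented encoding: each run starts with a header byte whose low nibble gives the size of the length field and whose high nibble gives the size of the offset field, the offset is a signed value relative to the start of the previous run, and a zero header byte ends the list. The function name is made up for this example, and the run list bytes are assumed to have been extracted from the attribute already.

```python
from typing import List, Optional, Tuple


def decode_data_runs(runlist: bytes) -> List[Tuple[Optional[int], int]]:
    """Return a list of (first_cluster, cluster_count) pairs.

    first_cluster is None for a sparse run that has no clusters on disk.
    """
    runs: List[Tuple[Optional[int], int]] = []
    pos = 0
    previous_start = 0
    while pos < len(runlist):
        header = runlist[pos]
        if header == 0:
            break                       # end of the run list
        length_size = header & 0x0F
        offset_size = header >> 4
        pos += 1
        if length_size == 0 or pos + length_size + offset_size > len(runlist):
            raise ValueError("truncated or malformed run list")
        count = int.from_bytes(runlist[pos:pos + length_size], "little")
        pos += length_size
        if offset_size == 0:
            runs.append((None, count))  # sparse run
        else:
            delta = int.from_bytes(runlist[pos:pos + offset_size],
                                   "little", signed=True)
            previous_start += delta     # offsets are relative to the last run
            runs.append((previous_start, count))
        pos += offset_size
    return runs


print(decode_data_runs(bytes([0x11, 0x04, 0x64, 0x11, 0x02, 0xF6, 0x00])))
# [(100, 4), (90, 2)]
```

Two runs in the output mean the file is fragmented into two pieces. Reading the listed clusters in order reassembles the content, provided they have not been overwritten.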

When recovering files or examining deleted content, data from the file system journal or the change journal may be useful.

The file system journal allows the operating system to quickly return the file system to a consistent state. File system corruption usually occurs when the system crashes while writing data to the file system. The journal records every upcoming metadata update and also records when each update has completed successfully. If an error occurs, the operation in progress can be rolled back and the system returned to its previous state. Note that the journal does not contain non-resident data stored in external clusters, so it cannot be used to restore such file contents. It does store the contents of resident attributes so that recent changes can be undone.

The change journal is a file that records all changes to files and directories. It can be useful for identifying files that changed during a given period of time. Without it, we would need to go through all the files and directories in the file system and compare their time stamps with a threshold value. That procedure can take a lot of time, and the change journal makes it much easier.

In conclusion, I want to emphasize that NTFS is a very complex and powerful file system. It was designed not only for the needs of its time but also with room for future growth. Nevertheless, despite this complexity, data recovery in NTFS is easier than in most other file systems.

Source: https://habr.com/ru/post/193044/
