
A terabyte is not the limit. Recovering extra-large volumes of data, using a damaged Microsoft SQL Server database as an example

When recovering data from corrupted files, what usually matters most is a thorough knowledge of the internal storage format, along with algorithms for working around and correcting errors in the data structures. But sometimes there are additional factors that have to be taken into account when processing damaged data and recovering it. One of these factors, which I would like to talk about in this article, is file size.



Most of the corrupted files we encounter in our work are office files (documents, spreadsheets, presentations) or graphic formats. Their other distinctive feature is a relatively small size (well under 10 MB). There are two reasons for this. First, a huge number of users create and work with files in these formats. Second, such small and, as is often assumed, not very important files usually fall outside the scope of corporate data protection. They are also frequently kept on portable storage media (USB flash drives, and sometimes even floppy disks), which does their preservation no favors. When processing this class of files, the size of the input data is usually not a problem: the input file can, if desired, be loaded entirely into RAM and processed there directly.
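As a rough illustration (not the actual code of our tools), "loading the whole file into memory" for small inputs can be as simple as the following Python sketch:

    import mmap

    def load_whole_file(path):
        # For small damaged files (well under 10 MB) the whole input can be
        # mapped into memory and analyzed there directly, with no streaming.
        with open(path, "rb") as f:
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
                return bytes(mm)  # an in-RAM copy of the entire file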



Various databases also make up a significant share of the files that come to us for recovery. Their size usually ranges from hundreds of megabytes to tens of gigabytes. Files like these usually do fall under corporate data-integrity measures, but even that gives no absolute guarantee that the data will survive a total failure. Most of these files are impractical or impossible to hold in memory. Therefore, during processing, the program first builds in RAM a markup of where the data is located in the file; in the next recovery step, this markup is used to read the recoverable data and generate the output. If the markup itself may occupy a large amount of space, or if the recovery process has to link scattered pieces of data that form a single object (for example, messages in an Exchange Server database), a temporary database is used to store the markup.
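To make the idea of a markup store concrete, here is a minimal sketch of what such a temporary database might look like (hypothetical Python/SQLite code; the table and column names are invented for illustration and are not the actual schema used by our products):

    import sqlite3

    def create_markup_db(path):
        # A temporary database that records where recoverable fragments
        # live inside the damaged input file.
        con = sqlite3.connect(path)
        con.execute("""
            CREATE TABLE IF NOT EXISTS markup (
                object_id   INTEGER,  -- logical object (table row, message, ...)
                file_offset INTEGER,  -- byte offset of the fragment in the input file
                length      INTEGER   -- fragment size in bytes
            )""")
        con.execute("CREATE INDEX IF NOT EXISTS ix_obj ON markup(object_id)")
        con.commit()
        return con

    def add_fragment(con, object_id, file_offset, length):
        # Record one fragment found while scanning the damaged file.
        con.execute("INSERT INTO markup VALUES (?, ?, ?)",
                    (object_id, file_offset, length))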


But there are exceptional cases: broken databases ranging in size from hundreds of gigabytes to several terabytes. Data of this size obviously cannot be unimportant, and often the work of an entire company is built around such a database. All backup schemes should of course be applied to data like this to ensure reliable storage, but even then databases still sometimes fail. One such case is discussed below.



PROBLEM STATEMENT



Our company was contacted for support by a user who needed to restore a 1.8-terabyte MS SQL Server database. The largest file we had processed before that was a 200-gigabyte Exchange Server database, so we had no experience with files of this size. The user had bought our product for this database format, Recovery for SQL Server ( www.officerecovery.com/mssql/ ), but when he tried to restore the database, the program wrote out the first 99,999 SQL script files for one of the tables and then showed no visible progress. The solution was to increase the number of digits in the SQL-script counter in the output file names and to increase the amount of data written to each file. After an updated build was sent, the user launched it again, but after a few days of operation the program crashed. The user also complained that the program was not very fast. Our further work proceeded precisely from these two complaints.
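The 99,999-file ceiling was essentially a limitation of how the output file names were generated. A hypothetical sketch of the fix (the naming pattern, counter width and per-file limit here are assumptions for illustration, not the product's exact format):

    # A wider zero-padded counter removes the 5-digit / 99,999-file ceiling.
    def script_name(table, index, width=9):
        return f"{table}_{index:0{width}d}.sql"   # e.g. MyTable_000000123.sql

    # Writing more data per script file reduces the total file count as well.
    MAX_SCRIPT_SIZE = 256 * 1024 * 1024           # hypothetical 256 MB per file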



To speed up work on these problems, we asked the user to send us a copy of the problem database, with a non-disclosure agreement (NDA) signed on our part. The database arrived on a hard drive, which we connected to one of our servers in California. All further runs on the real data took place there.



CHANGES TO THE PROGRAM



While working on this request, the program was improved with two main goals: to eliminate the crash and to speed up the recovery.



First of all, we investigated the program's crash on this file. One possibility was that the crash was caused by insufficient protection of the algorithm against invalid data; in that case we would have had to localize the invalid data, analyze it and add protection against that kind of corruption, searching among the roughly 500 separate files that make up this database. But the other hypothesis was confirmed: because of a non-optimal algorithm for reading the markup from the temporary database, the program ran out of RAM on very large volumes of data. The algorithm was reworked and the crash was eliminated; no further crashes occurred during the rest of the work.
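The article does not show the real code, but the general shape of such a rework can be sketched like this (a hypothetical Python/SQLite illustration, matching the invented markup schema sketched above): instead of pulling the entire markup table into memory, iterate over it with a cursor in bounded batches.

    import sqlite3

    def iter_markup(db_path, batch_size=10_000):
        # Stream markup rows from the temporary database instead of
        # loading the whole table into RAM at once.
        con = sqlite3.connect(db_path)
        try:
            cur = con.execute(
                "SELECT object_id, file_offset, length "
                "FROM markup ORDER BY object_id, file_offset")
            while True:
                rows = cur.fetchmany(batch_size)  # bounded memory per step
                if not rows:
                    break
                for row in rows:
                    yield row
        finally:
            con.close()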



The speed of the program was improved in two respects.



After a full "dry" run of the recovery, without writing the output files to the hard disk, another problem became apparent: the scripts needed to recreate the database would occupy 12,730,132,244,866 bytes (11.6 terabytes), and there was simply nowhere to store them.



To solve the problem of storing the output files, the SQL scripts were compressed into zip archives on the fly. Packing was launched in separate threads, in batches of scripts about 20 gigabytes each; after packing, the corresponding scripts were deleted. But since scripts were generated faster than the previous batches could be packed, and the load on the hard disk grew sharply as the number of threads increased, the number of threads packing simultaneously was limited to four. When this limit was reached, the main recovery thread was suspended until packing finished.
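As a rough illustration of this scheme (not the actual code of Recovery for SQL Server; only the four-thread limit and the "pack, then delete" behaviour come from the description above), the interaction between the recovery thread and the packing threads could look something like this:

    import os
    import threading
    import zipfile

    MAX_PACKERS = 4                          # at most four packing threads at once
    packer_slots = threading.BoundedSemaphore(MAX_PACKERS)

    def pack_batch(script_paths, archive_path):
        # Compress one batch (~20 GB) of generated .sql scripts into a zip
        # archive, then delete the originals to free disk space.
        try:
            with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
                for path in script_paths:
                    zf.write(path, arcname=os.path.basename(path))
            for path in script_paths:
                os.remove(path)
        finally:
            packer_slots.release()           # free a slot for the recovery thread

    def submit_batch(script_paths, archive_path):
        # Called by the main recovery thread after a batch of scripts is ready;
        # blocks (suspending recovery) if four packers are already running.
        packer_slots.acquire()
        threading.Thread(target=pack_batch,
                         args=(script_paths, archive_path)).start()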



As a result, the database was recovered successfully. The recovery took 17 days; archiving the data slowed the process down, but that was a compromise we had to accept. The packed scripts take up 1.1 terabytes. They were written to a separate hard disk and sent to the user. After the time needed to execute all the scripts and bring the database back into service, we received a positive response from the user.



FINDINGS



Recovering even such huge volumes of data is possible, although it will most likely require an individual approach on the part of the recovery service and the development or customization of the program for the specific case.



If you run into problems with a database of this class, do not immediately give up on a program that could not recover the damaged file on the first try. In such cases, direct communication with us helps bring the program to a level where it can handle even files like these. This is not an isolated case for OfficeRecovery: we know of several more cases of 200-500 gigabyte databases being recovered, and we are confident that we do not even know about all of them.



All performance figures were measured on the following configuration: Intel Core i7 920, 8 GB RAM, 1 TB (system) + 2 TB (input) + 2 TB (output) SATA-2 HDDs. A separate SSD for temporary storage of the scripts awaiting packing would help speed up the recovery.



Do not forget to make backups and use reliable storage methods. But remember that no one is immune to data loss. And if data loss happens anyway, OfficeRecovery is waiting for you.

Source: https://habr.com/ru/post/151056/


