Restore individual pages in the database

Foreword

Gail Shaw article “Help, my database is corrupt. Now what? ” , The translation of which I posted last week, seemed to cause some interest, but, alas, it did not contain“ practice ”. Yes, it says how to save the data, but there are no examples.
Initially, I wanted to make another translation of the same author, but, on reflection, decided to write a post “from myself”, as if “based on”. The reasons that prompted me to do so, I will describe at the end of the post, in the notes.

Database Recovery in SQL Server

As already mentioned in the previous article, in the event that the clustered index or heap pages are damaged, the data contained on these pages is lost and the only option to restore them is to directly restore the database.

SQL Server provides many features for recovering databases. Firstly, it is the recovery of the entire database - it can take quite a lot of time (depending on the size of the database and the speed of the hard drives). Secondly, the restoration of individual file groups, or files, if your database consists of several file groups (or, respectively, files). In this case, it is possible to restore only damaged parts of the database without affecting the rest. These two types of database recovery are used quite often and will not be affected further.
Thirdly, SQL Server 2005 now has the ability to restore individual database pages - in this case, only the specified pages will be restored from the backup. Such a recovery will be especially relevant if DBCC CHECKDB finds several damaged pages in some huge table that is “lying” in a hefty file. Due to the fact that not the entire file, or even the entire table, but only a few pages will be restored, the idle time can be significantly reduced.

Requirements and limitations

Recovery Model and Availability of Transaction Log Backups

The most important thing to remember is to restore individual pages, the database must use the full (full) recovery model, or the recovery model with incomplete logging (bulk-logged). If your databases are in a simple (simple) recovery model - then you can no longer read.
The second requirement is that your full backups and transaction log backups must form an indissoluble log chain . If you never run the BACKUP LOG WITH TRUNCATE_ONLY (NO_LOG) command and do not switch to the simple recovery model in order to reduce the transaction log, and you have ALL transaction log backups since the last full backup without any damaged pages (including this one) the most complete backup) - you can not worry about the chain of journals.
In the recovery model with incomplete logging, theoretically, the recovery of individual pages should work normally if the conditions described above are met and the pages being restored are not modified by operations performed with minimal logging .
')

SQL Server Editions

Page recovery is possible in any edition of SQL Server, but for Enterprise Editions and Developer Edition editions, it is possible to restore damaged pages on-line, i.e. You can refer to the database during recovery (and moreover, you can even refer to the table to which the pages being restored are currently located, if these pages themselves are not “affected” - otherwise, the query will fail). For editions “below” Enterprise Edition, page recovery takes place off-line and the database becomes inaccessible for the duration of the recovery.

Type of damaged page

In the event that the index pages or data are damaged, their recovery is possible online in the Enterprise Edition.
Pages that provide critical system tables can be restored, but the database, when restored, will be unavailable in any edition of SQL Server.
"Placement cards" cannot be restored "separately." If GAM, SGAM, PFS, ML, DIFF pages are damaged, you need to restore the entire database. The only exceptions are IAM pages. Although they refer to “allocation maps”, they describe only one table, not the entire database, and their recovery is possible.
The database loading page (the 9th page in the 1st database file) and the file header page (the 0th page in each file) cannot be restored “separately”, if they are damaged, you will have to restore the entire database.

Actually, recovery

Now, finally, we move from theory to practice.
First of all, for training, you need a corrupted database.

Portim DB

For experiments, I will use the AdventureWorks database that comes with SQL Server. If you have not installed it, you can download it here if you wish. I transfer it to the full recovery model:

ALTER DATABASE AdventureWorks SET RECOVERY FULL

make sure there are no errors yet:

 DBCC CHECKDB('AdventureWorks') WITH NO_INFOMSGS, ALL_ERRORMSGS, DATA_PURITY

and create a full backup:

 BACKUP DATABASE AdventureWorks TO DISK = 'D:\tmp\aw_backups\aw_full_ok1.bak'

In this database, I create a crash table.

 CREATE TABLE crash (txt varchar(1000))

We will spoil the varchar type field in order to check what will happen if SQL Server suddenly detects in it the wrong data that he himself wrote there.
Before spoiling something, you need to fill it with something. I hammer in the created table the left data.

 SET NOCOUNT ON DECLARE @i INT SET @i = 1 WHILE @i<100000 BEGIN INSERT INTO crash SELECT REPLICATE('a', 1000) SET @i = @i + 1 END SET NOCOUNT OFF

Now I back up the transaction log:

 BACKUP LOG AdventureWorks TO DISK = 'D:\tmp\aw_backups\aw_log_ok1.trn'

Now let's change the data a bit:

So, everything is ready. Disconnect the database and open the mdf file with FAR (or which is more convenient for you), look for the string “zzzzzzz” in it and replace several 'z' with arbitrary characters:

Now that the database is corrupted, we connect it. And yes, I remember that in the previous article it was clearly stated that you should not detach / attach the database. But in our case it is absolutely “safe” - the database in “suspect” will not fall.

We are looking for errors

So, the damaged database successfully returned to the system. Run the integrity check again:

 DBCC CHECKDB('AdventureWorks') WITH NO_INFOMSGS, ALL_ERRORMSGS, DATA_PURITY

As a result, what we were waiting for ( be sure to remember the numbers of the damaged pages! ):

Msg 8928, Level 16, State 1, Line 1
Object ID 1883153754, index ID 0, partition ID 72057594054246400, alloc unit ID 72057594061651968 (type In-row data): Page (1:20455) could not be processed. See other errors for details.
Msg 8939, Level 16, State 98, Line 1
Table error: Object ID 1883153754, index ID 0, partition ID 72057594054246400, alloc unit ID 72057594061651968 (type In-row data), page (1:20455). Test (IS_OFF (BUF_IOERR, pBUF->bstat)) failed. Values are 29493257 and -4.
CHECKDB found 0 allocation errors and 2 consistency errors in table 'crash' (object ID 1883153754).
CHECKDB found 0 allocation errors and 2 consistency errors in database 'AdventureWorks'.
repair_allow_data_loss is the minimum repair level for the errors found by DBCC CHECKDB (AdventureWorks).

In this case, the data in the heap itself is damaged (index id = 0), therefore SQL Server cannot recover this data.
Now we have three options:

Accept data loss and perform DBCC CHECKDB ('AdventureWorks', REPAIR_ALLOW_DATA_LOSS)
Make a backup of the active part of the transaction log and restore the entire database - as a result of data loss will not, but it will take a long time
Make a backup of the active part of the transaction log and restore only one (!) Damaged page

With the second option, everything should be clear, but what happens if you run DBCC CHECKDB or how individual pages are restored - I will show further.

We recover the damaged page

First of all, we need to backup the final fragment of the transaction log ( tail backup ). At the same time, if you have Enterprise Edition, you can not add the NORECOVERY parameter, which will transfer the database to the “restoring” state, since this edition supports on-line page recovery. Moreover, if your transaction log backups are performed on a regular basis so as not to break the chain of journals, in the Enterprise Edition edition, you can make a COPY_ONLY backup.
I follow the path of off-line recovery and perform:

 BACKUP LOG AdventureWorks TO DISK = 'D:\tmp\aw_backups\aw_log_fail3.trn' WITH NORECOVERY

Now, you can repair the damaged page. First of all, we use full backup (aw_full_ok1.bak):

 RESTORE DATABASE AdventureWorks PAGE = '1:20455' FROM DISK = 'D:\tmp\aw_backups\aw_full_ok1.bak' WITH NORECOVERY

As a result, we have:

Please note that you need to use the NORECOVERY option, because we still have to roll back the transaction log backups.

 RESTORE LOG AdventureWorks FROM DISK = 'D:\tmp\aw_backups\aw_log_ok1.trn' WITH NORECOVERY

and

 RESTORE LOG AdventureWorks FROM DISK = 'D:\tmp\aw_backups\aw_log_fail3.trn' WITH RECOVERY

It seems that everything went well, we run DBCC CHECKDB and ...

Recovery was successful.
Please note that idle time is reduced due to the fact that we do not restore the entire database from the full backup, but only the damaged pages (if I restored the entire backup, the backup would recover in 8.5 seconds, but the larger the database size more will be the time difference). The lucky ones with SQL Server Enterprise Edition, who use on-line recovery, will also save time on recovering from the log backups, and with an off-line recovery, alas, the logs will be processed entirely.
It is also worth adding that in SQL Server 2005, 2008, 2008 R2, the restoration of a separate page is possible only with the help of T-SQL, Denali has the opportunity to do this through the GUI.

And if all the same DBCC CHECKDB?

Just in case, I decided to check what SQL Server would do if I run DBCC CHECKDB with the REPAIR_ALLOW_DATA_LOSS option. All the same error:

First we convert the database to SINGLE_USER mode:

 ALTER DATABASE AdventureWorks SET SINGLE_USER

And then, start the recovery:

 DBCC CHECKDB('AdventureWorks', REPAIR_ALLOW_DATA_LOSS) WITH NO_INFOMSGS, ALL_ERRORMSGS, DATA_PURITY

Eventually:

Repair: The page (1:20455) has been deallocated from object ID 1883153754, index ID 0, partition ID 72057594054246400, alloc unit ID 72057594061651968 (type In-row data).

Yeah, SQL Server deleted the “spoiled” page. We transfer the database to the MULTI_USER mode so that it becomes accessible to everyone and check that it is gone:

Given that the page size in SQL Server is 8KB, and a little less is available for user data, then everything is natural, the table is “thinner” by 7 records (at the beginning there were 99999). Since this table did not have a clustered index, data could be stored in a random order, i.e. we could not even know what data would be lost.

So why, after all, not a translation?

So, why is this still not a translation, but a post “based on”. The fact is that in the open access article "Page Restore" by Gail Shaw is not. There is such a section in the book SQL Server MVP Deep Dives vol.2, which is sold for quite tangible money (but, of course, is easily on the Internet) and I’m not sure that publishing a translation is uh ... right or wrong.
In general, I read the article, took note of the main points, and then I wrote the text myself and, along the way, conducted an experiment on restoration. I hope someone this experience was useful.
And, gentlemen, I sincerely hope that if you decide to repeat this experiment, you will be extremely careful (for example, you will not experiment with the main database on the production-server). Remember that I am not responsible for your actions.

Source: https://habr.com/ru/post/137301/

All Articles