📜 ⬆️ ⬇️

12 typical database backup errors


Initially, this article was conceived only for developers and administrators of Firebird DBMS, but after talking with administrators of other databases, it turned out that most of the errors are common, and literally everything comes to a very similar rake. If you can add something to this list (even if it is specific for a particular DBMS), write to your personal email or in the comments.
In English: 12 Common Mistakes while Backing Up Databases

Our company is engaged in recovery, backup, optimization and DBMS support tools (mainly Firebird, but there are also MSSQL, PostgreSQL, InterBase, etc.) and, as a result of numerous audits and repairs, has accumulated a collection of backup-related errors. All items below are set out based on real cases with database damage, loss and damage of backups, disks, server crashes, and other “joys” of database administrators.

I would like to talk about them so that administrators and developers can change their approaches to backup management and prevent possible problems.
')
So let's get started.

1. Delete the previous backup copy before creating a new backup copy
Most often, this error is made by newbies who do not understand that the main purpose of the backup database is to provide a minimal simple information system (an important part of which is the database), and not just creating a copy of the database.
As a result, from the moment of deleting the last backup before creating a new one, the system is in an unprotected state, because during this period the database does not have a single backup copy. Since the backup can be created for a long time, this is the ideal time to trigger the Murphy law. This approach works especially well in conjunction with clause 7 (see below).
Recommendation : do not delete the previous backup until the new one is created! (and do not make a new backup to an existing file)

2. Overwriting an existing database when restoring from a backup
This mistake is made less often, but the results can be much sadder. If the backup was not checked and was damaged (see clause 6), then as a result of overwriting, you will not have either the previous copy of the database or the valid backup.
Usually this disgrace happens on Friday evening, at the time of dergotni, confusion and conflicting instructions from the authorities. A little negative luck and languid weekend in the server provided.
Firebird has some protection against this error - creating a restauran from a backup using the gbak utility with the default key –create will not work if the specified file name points to an existing database. Unfortunately, there is also a bypass of this protection - the –rep switch, which allows you to overwrite the existing file.
Recommendations : never overwrite the combat database file without receiving written instructions from the management and, preferably, without receiving a proposal for a new job.

3. Using a one-step backup restaurant, without creating an intermediate backup file
Standard I / O streams allow you to turn with many DBMS (including Firebird) an interesting trick: perform a stream backup with immediate recovery of the database from it. As a result, an intermediate backup file is not created. This is convenient for maintenance and running a test restore (if you have another backup), but in no case should you use it for an automatic backup!
If in the process of such a backup-restaurant a serious disk failure occurs, for example, the original database may be damaged, and a new one will not be created yet. Of course, if item 1 is observed, and there is a copy of the database from the previous attempt, then only data that has been created or modified in the database since the creation of its copy will be lost.
Recommendations : do not use single-step backup-restore in automatic mode, and in manual mode always check for a sufficiently fresh copy.

4. Storage of backup and database on the same physical device
Here, many may laugh, that we give some children's advice — this is the ABC of system administration. That's the way it is, but due to the proliferation of virtual environments and the database, and the disk can be on the same storage system. And it will surely break at the most inappropriate moment for the business. Plus, there are still people who believe that if they use RAID (from 1 or higher), nothing can happen to their data at all. There are still people who believe in the super-reliability of “branded” iron, but this is a special case.
Recommendations : do not store backup and database on the same physical device, no matter how reliable it may seem.

5. Lack of verification of successful completion of backup
This is quite a common mistake of both administrators and IT managers. If the result of the backup is not checked, then you can not backup at all - the result is generally the same. Be sure to need notifications by email about a successful backup, and even better by SMS. Moreover, the lack of notification is a sign of a problem!
And where are the leaders, the attentive reader will ask, who has read this far? And despite the fact that the administrator will usually configure the backup, he will be too lazy to check the notifications, especially since they are in his separate daddy, and therefore the head of the IT department should periodically request an additional report on the status of all backups. This is the question of who to punish if there are some backups, but at the right moment they were not there :)
! And when combined with paragraph 2, we get the absence of both the base and the backup.
Recommendations : use backup automation tools that can track successful and unsuccessful backups, report problems to users, and have an overview of controls (especially important when you need to control tens and hundreds of backups on different servers).

6. Lack of validation backups
The fact that backups are put somewhere does not mean that they can be read from there.
Therefore, periodic verification of backups created is mandatory, to be sure that the backups being created are not damaged, have not been copied to / dev / null
Recommendations : do not trust anyone, even yourself. All need to check.

7. Lack of health check database when using unverified backups.
Usually DBMS have several types of backup - dumps, just backups, etc. Without going into specifics, we can distinguish 2 categories - verified and unverified backups. Firebird has gbak and nbackup.
Gbak reads the entire database at the record level to create a backup file, and creates the database by inserting records into the new database, and thus verifies the backup (there are options for how errors can leak into the replicated copy, but this is another type of database administrator’s fix with incorrect migration), and the database itself (if it can be read from beginning to end, then with a high degree of probability it is not damaged).
Nbackup (also known as incremental backup) temporarily blocks the main database file for writing (in a consistent state), and allows you to quickly copy the database file (fully or partially / incrementally).
For large Firebird databases (more than 500GB), it is preferable to do nbackup, so as not to slow down users, but you need to check the database, because the created unverified backups are page copies of the database, and if the error nests at the record level (this happens because of a failure RAM) or at a logical level, then an unverified backup will contain it in the same way as the original database.
To do this, you need to use the online validation of the source database (in Firebird, starting from version 2.5.4, online validation using gfix is ​​available, and our FBDataGuard tool supports online database checking for versions 1.5-2.5).
Also, in addition to an unverified backup, it is advisable to make a verified backup periodically (once a week, for example).
For other DBMS, you must use the appropriate tools and combinations of checks.

8. Lack of control over the space for backup
In general, this is a classic mistake - with insufficient space, the backup takes up all the free space and crashes. When placing a backup on the same disk along with the database, it can lead to stopping the work with the database, and when placed on the system disk, it can lead to system failure.
In combination with point 4, at best, we will get the system to stop working, because the database also needs free space, and it has ended because of the backup.
And in combination with points 5 and 2, we again receive as a result the absence of both the base and the backup.
Recommendations : use backup tools that make a forecast of the size of the backup and warn you of a possible lack of space.

9. Lack of time control for backup duration
Just six months ago, the backup lasted 40 minutes, and suddenly it became 3 hours - why? Perhaps the size of the database has grown, or maybe the disk has fallen out of the RAID array, which has caused the write speed to deteriorate dramatically, and all your backups are about to leave this mortal world. Or maybe a good colleague simultaneously included another backup system (by the way, you can run several backups in Firebird at once, only it’s not very clear why this might be necessary).
If you do not control the execution time of the backup, then you can overlook the problem and miss the chance to fix it before it becomes large.
Also, if the backup system does not track the status of tasks, but simply launches them on schedule, you can easily get to the “morning trolley bus” - this is when a new backup can start at the moment when the previous backup is not over.
Recommendations : use the means of controlling the duration of the backup process.

10. Execution of database backup during application of OS updates
A very common problem, especially in combination with p.9 and the included automatic updates of Windows (which by default occur at 3 nights). At best, it slows down the process, and if the OS is overloaded to use updates, then the backup will be corrupted. It's good that OS updates don't happen every day.
Recommendations : If you can not turn off, then assign OS updates at a time when they can not interfere with backups.

11. Database backup by means of file backup or by means of backup of the virtual machine entirely, with the database server turned on
Many administrators forget about the fact that any DBMS has an active and complex cache that contains read and write data, and the database files themselves are opened in random access mode.
That is why it is necessary to use special types of backup, and not a simple file backup (including simple copying of database files) or backup of a virtual machine. File backup tools read the database consistently and, especially for large databases, can take a long time, so the integrity of the created copy cannot be guaranteed.
For those who want to backup the database using file tools or tools for backing up virtual machines, you can offer 2 ways:
  1. completely shut down the services and processes of the DBMS so that nothing is in the cache,
  2. use agents and / or scripts that translate databases into a special mode when copying a database file in a consistent manner is safe.

For Firebird, you need to block the main database file using nbackup before starting the backup and unlock it after it is finished, for other DBMS there are similar means to enable / disable the corresponding modes.

Some database administrators are convinced that if the DBMS has a transaction log, then such a database can be safely backed up using standard file tools, in the extreme case, only this log will be damaged. This is a dangerous misconception that is not supported by database vendors.

The roots of this misconception are clear: aggressive advertising from virtual machine manufacturers and from manufacturers of backup tools is usually silent on the fact that additional settings are needed for databases and other actively changing files. Do not trust advertising - not all yogurts are equally useful.
Recommendations : do not use file backup tools and VM backup tools without appropriate automation tools.

12. Replacing backup replication
Backup and data replication serve to increase reliability and prevent data loss, but still differ significantly.
Everyone loves replication for the ability to synchronize data on another server with minimal latency, however, backup also has a number of undeniable advantages. For example, in case of accidental (or intentional) deletion of data, replication quickly and calmly transmits changes to the replica, and the backup, especially on read-only media, is unresponsive to such operations. Setting up the correct replication, as well as the correct backup, is worth some effort, and the likelihood of errors there also exists.
Recommendations : If you have replication configured, do not neglect backups, use them together.

findings
Properly organizing backups for your beloved DBMS is not so easy, so usually DBA administrators from organizations that value data usually use professional backup tools that allow you to take into account and prevent the errors described above. For Firebird (sorry for advertising) there is our FBDataGuard, for other DBMS you can use DBArtisan or other means.

Well, and of course - do not forget to groom and cherish your admin paranoia, for example, take and check your backups ... right now!


UPDATE

Gentlemen, please respond to the PM of those who use CBT on VMWare for VM with DB.

Source: https://habr.com/ru/post/267881/


All Articles