This article is intended for IT managers and system administrators responsible for developing and implementing backup and data protection strategies. It discusses the typical problems caused by data corruption, the shortcomings of traditional solutions to these problems, and ways to improve existing strategies to further minimize losses when failures occur.
The article is based on unique statistics collected from a set of 200 thousand damaged files submitted for recovery to OfficeRecovery Online.
The problem and its causes
One of the most important tasks in planning and building an information infrastructure is preserving data. Damage to or loss of accumulated information can cause significant harm to a business. Data protection should therefore be diverse and multi-layered, covering as many data-loss scenarios as possible.
Before looking at the main methods of ensuring data integrity, let us consider the main causes of data corruption:
- Hardware failure. Data loss due to physical media failure. With such damage, arbitrary parts of files are overwritten with meaningless data. In severe cases, the damage extends beyond individual files to the file system as a whole, which can make it hard even to find files, let alone read them.
- Software crash. Data loss after an error in the application processing the data, for example while saving changes to a file. Typical problems of this kind are out-of-memory conditions, application bugs and operating system failures. In this case, the data in the file may no longer be complete, but it can often be recovered well.
- Human factor. For example, the loss of important data because files were deleted by mistake. Modern tools for recovering deleted data from disk use specialized algorithms, but they do not always achieve the desired result: some parts of the files may already have been overwritten with arbitrary garbage from the disk.
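Tools for recovering deleted files often rely on signature-based "carving": they scan the raw disk for the byte patterns that mark the start and end of a known format. The Python sketch below illustrates the idea for JPEG. It assumes a raw disk image has already been read into memory, and the function name and size limit are hypothetical. Real carvers are far more careful. For example, a JPEG with an embedded thumbnail contains a second end marker, and this naive version would cut the file at that point.

```python
from __future__ import annotations

# Minimal signature-based carving sketch (illustrative only, not a production tool).
JPEG_SOI = b"\xff\xd8\xff"  # start-of-image marker followed by the 0xFF of the next marker
JPEG_EOI = b"\xff\xd9"      # end-of-image marker


def carve_jpegs(disk_image: bytes, max_size: int = 20 * 1024 * 1024) -> list[bytes]:
    """Return byte ranges that start with a JPEG header and end at the next EOI marker."""
    carved = []
    start = disk_image.find(JPEG_SOI)
    while start != -1:
        end = disk_image.find(JPEG_EOI, start + len(JPEG_SOI))
        if end == -1 or end + len(JPEG_EOI) - start > max_size:
            # No plausible end marker: skip this header and look for the next one.
            start = disk_image.find(JPEG_SOI, start + 1)
            continue
        carved.append(disk_image[start:end + len(JPEG_EOI)])
        # Continue scanning after the carved candidate.
        start = disk_image.find(JPEG_SOI, end + len(JPEG_EOI))
    return carved
```

If part of a deleted file has already been overwritten, the carved range will still contain that foreign data. This is exactly the "garbage" situation described above.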
Backup is the main way of dealing with the consequences of data corruption. Large organizations also use so-called Disaster Recovery Planning, or contingency planning (hereinafter, DR strategy and DR planning).
Backup and DR planning as a solution to corrupted data
It is interesting to note that DR planning has been a hot topic in the West for many years, yet the term does not exist in the Russian IT vocabulary. At any rate, the corresponding Wikipedia article has no Russian-language counterpart.
The difference between backup strategies and DR planning is that the latter is a comprehensive set of technologies and procedures. It answers every question involved in restoring a business's IT infrastructure after a catastrophic event. The task of backup is to return a complete set of data to users. The task of DR planning is to restore a working IT infrastructure, which often amounts to bringing the entire business back into operation.
Backups and fault-tolerant storage (for example, RAID disk arrays) are thus only technological tools used in developing a DR strategy.
The ultimate goal of DR planning is to eliminate entirely the situations in which you have to deal with corrupted files and databases. Any solution provider in this area will confidently tell you that in the event of force majeure, your data will be returned to you at the press of a few buttons.
Unfortunately, this is not entirely true. Real-world practice shows that damaged files still appear, even in organizations that have invested billions in data protection.
The main reasons for this are as follows:
- Improper use of backup systems and incorrect implementation of the adopted practices. The probability that your data protection solution is still working three days after deployment is 99%, and most likely much higher. But as time goes on, storage fills up and employees come and go. Two years later, it may turn out that the solution stopped working long ago and nobody noticed.
- The inevitable existence of zones not covered by backup systems. Does one of your employees have a habit of editing an important confidential business document directly on a flash drive? One untimely removal of the flash drive during editing, and you have another corrupted file.
- Backup frequency. A typical interval is 24 hours, though it may be 72 or 12 hours depending on how much storage you can allocate for backups. The problem is that in case of force majeure, you are only guaranteed complete data as of the last backup, which may be up to 24 (or 72, or 12) hours old. Nobody promises to restore the data accumulated since the last backup, and this is the newest and often the most valuable data.
- Exposure of backup systems to the same failures that corrupted data on the production servers. Strictly speaking, this is a shortcoming of DR planning. Even so, a backup RAID array often fails slightly earlier than the production server standing next to it that it is meant to protect.
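The first and third points above lend themselves to simple automated monitoring. A backup job that has silently stopped running shows up as a growing gap since the newest backup file. That gap is also the amount of recent data currently at risk. The Python sketch below is hypothetical. It assumes backups are written as files into one directory and that a file's modification time reflects when the backup finished.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone
from pathlib import Path


def newest_backup_time(backup_dir: Path) -> datetime | None:
    """Modification time (UTC) of the newest file under backup_dir, or None if empty."""
    mtimes = [p.stat().st_mtime for p in backup_dir.rglob("*") if p.is_file()]
    if not mtimes:
        return None
    return datetime.fromtimestamp(max(mtimes), tz=timezone.utc)


def check_backup_freshness(
    backup_dir: Path,
    interval: timedelta,
    grace: timedelta = timedelta(hours=1),
    now: datetime | None = None,
) -> tuple[bool, timedelta | None]:
    """Return (ok, age); ok is False if no backup exists or the newest one is overdue.

    age is how long ago the newest backup finished, i.e. roughly how much
    recent data would be lost if production storage failed right now.
    """
    newest = newest_backup_time(backup_dir)
    if newest is None:
        return False, None
    if now is None:
        now = datetime.now(timezone.utc)
    age = now - newest
    return age <= interval + grace, age
```

Run from a scheduler, for example as `check_backup_freshness(Path("/srv/backups"), timedelta(hours=24))` with a hypothetical path, a `False` result is the signal that the backup solution has quietly stopped working.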
What should you do if a file is damaged, the application that works with it refuses to open it, and the backup system cannot offer a copy containing the data you need? Can data be recovered even from a severely damaged file? How likely is success? And what does this mean for improving DR planning in your organization?
OfficeRecovery Online: an analysis of 200,000 damaged-file recoveries
In August 2011, OfficeRecovery launched a cloud service for online recovery of damaged files (https://online.officerecovery.com/ru/). By September 2012, 200,000 files had passed through the system. The statistics collected are of considerable interest for DR planning in organizations looking for ways to make their business more resilient to man-made force majeure.
About a month was spent processing the collected data and identifying typical causes of damage. Here are the recovery statistics for some popular file types:
- Corel WordPerfect files - 93.1% recovered successfully
- ZIP archives - 79.0%
- Microsoft Word Documents - 75.9%
- Microsoft Project Files - 66.2%
- Adobe Photoshop images - 66.1%
- Microsoft Excel spreadsheets - 63.2%
- Microsoft Access Databases - 55.4%
- Microsoft PowerPoint Presentations - 52.1%
- Graphic formats (pictures, photos) - 46.4%
Note: recovery is considered successful when at least some of the data is retrieved from the file. Some data loss is usually unavoidable, but even a small recovered fragment is often of great value to customers.
Graphic formats are the hardest to recover. Image content is often stored in compressed form, and it is almost impossible to restore the part of the picture that follows the point of damage in acceptable quality. This mainly concerns the JPEG, TIFF and RAW formats.
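A small experiment shows why compression makes damage spread. The Python sketch below uses zlib's DEFLATE (the compression used in ZIP and PNG) as a stand-in. JPEG uses a different, Huffman-based scheme, but the effect is broadly similar. Once the decoder reaches a corrupted byte, it loses track of the bit stream, so everything after that point is lost or garbled. In uncompressed data, the same damage affects only one byte. The sample text and helper names are hypothetical.

```python
from __future__ import annotations

import random
import zlib


def make_sample_text(n_words: int = 4000, seed: int = 0) -> bytes:
    """Hypothetical document-like text: random words from a small vocabulary."""
    vocab = ["backup", "restore", "server", "archive", "disk",
             "file", "table", "sheet", "report", "invoice"]
    rng = random.Random(seed)
    return " ".join(rng.choice(vocab) for _ in range(n_words)).encode("ascii")


def corrupt_byte(data: bytes, pos: int) -> bytes:
    """Simulate media damage by inverting every bit of one byte."""
    damaged = bytearray(data)
    damaged[pos] ^= 0xFF
    return bytes(damaged)


def salvage_deflate(stream: bytes) -> tuple[bytes, bool]:
    """Decompress byte by byte, keeping whatever was produced before an error.

    Returns (output, ok); ok is True only if the stream ended cleanly and its
    checksum matched.
    """
    decoder = zlib.decompressobj()
    out = bytearray()
    try:
        for i in range(len(stream)):
            out += decoder.decompress(stream[i:i + 1])
        out += decoder.flush()
    except zlib.error:
        return bytes(out), False
    return bytes(out), decoder.eof


def common_prefix_len(a: bytes, b: bytes) -> int:
    """Number of leading bytes that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n
```

To see how much survived, corrupt the middle of `zlib.compress(original)`, then compare `common_prefix_len(salvaged, original)` with `len(original)`. The data before the damage point comes back intact, and the rest does not.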
The situation with office application formats is much better. OfficeRecovery has been working with office application formats for over 14 years and has extensive experience in this area.
In terms of recovery, Microsoft Word files are considered easy. Even when a file is very seriously damaged, it usually remains possible to extract all the text stored in it, albeit with loss of formatting. Often this is the only way to help users.
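When a document's structure is beyond repair, one common way to salvage its text is to scan the raw bytes for runs of printable characters, much like the Unix `strings` utility. Legacy binary .doc files usually store body text either as 8-bit characters or as UTF-16LE, so the hypothetical Python sketch below looks for both. It only illustrates the idea and is not how OfficeRecovery works. It recovers only ASCII-range text (Cyrillic, for example, would need wider patterns). It also does not apply to .docx files, whose XML is compressed inside a ZIP container.

```python
from __future__ import annotations

import re

# Runs of at least 4 printable ASCII characters encoded as UTF-16LE (char, 0x00 pairs).
UTF16LE_RUN = re.compile(rb"(?:[\x20-\x7e]\x00){4,}")
# Runs of at least 4 printable ASCII characters stored as single bytes.
ASCII_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def extract_text_runs(data: bytes) -> list[str]:
    """Pull readable text fragments out of raw, possibly damaged bytes (formatting is lost)."""
    runs = [m.group().decode("utf-16-le") for m in UTF16LE_RUN.finditer(data)]
    runs += [m.group().decode("ascii") for m in ASCII_RUN.finditer(data)]
    return runs
```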
Next in ease of recovery is Microsoft Excel. If the internal structure of the workbook is seriously damaged and the set of sheets cannot be read in full, it is still possible to extract the contents of all the cells onto a single sheet.
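Modern Office formats (.docx, .xlsx, .pptx) are ZIP containers, and ZIP archives themselves show one of the higher recovery rates in the statistics above. One plausible reason is the ZIP layout. Each member is preceded by its own local file header, while the central directory that tools normally read sits at the end of the file. If the end is lost, the members can often still be found by scanning for local headers. The Python sketch below illustrates this under simplifying assumptions: no encryption, ZIP64 or spanned archives, and only stored and deflated members. The function name is hypothetical.

```python
from __future__ import annotations

import struct
import zlib

LOCAL_HEADER_SIG = b"PK\x03\x04"
LOCAL_HEADER = struct.Struct("<IHHHHHIIIHH")  # fixed 30-byte part of a local file header


def salvage_zip_members(blob: bytes) -> dict[str, tuple[bytes, bool]]:
    """Recover members by scanning for local file headers, ignoring the central directory.

    Returns {name: (data, crc_ok)}; data may be partial if the member is truncated.
    A signature occurring by chance inside compressed data can cause false hits.
    """
    recovered: dict[str, tuple[bytes, bool]] = {}
    pos = blob.find(LOCAL_HEADER_SIG)
    while pos != -1 and pos + LOCAL_HEADER.size <= len(blob):
        (_sig, _ver, flags, method, _time, _date,
         crc, csize, _usize, name_len, extra_len) = LOCAL_HEADER.unpack_from(blob, pos)
        name_start = pos + LOCAL_HEADER.size
        data_start = name_start + name_len + extra_len
        name = blob[name_start:name_start + name_len].decode("utf-8", errors="replace")
        if method == 8 and flags & 0x08:
            # Sizes (and CRC) are in a trailing data descriptor; a deflate stream marks its own end.
            payload = blob[data_start:]
        else:
            payload = blob[data_start:data_start + csize]  # shorter than csize if truncated
        data = None
        if method == 0:  # stored
            data = payload
        elif method == 8:  # deflate: raw stream without a zlib header
            try:
                data = zlib.decompressobj(-zlib.MAX_WBITS).decompress(payload)
            except zlib.error:
                data = None
        if data is not None:
            recovered[name] = (data, zlib.crc32(data) == crc)
        pos = blob.find(LOCAL_HEADER_SIG, pos + 4)
    return recovered
```

A member whose CRC does not match is not useless. A truncated deflate stream still yields correct data up to the cut, which counts as the kind of partial recovery described in the note above.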
On average, recovery succeeded in more than half of all cases. In other words, OfficeRecovery Online returned the full or partial content of over 100 thousand files that users had considered lost.
Recovering corrupted files as part of a data protection strategy
As this article shows, broken files (a) are a common problem and (b) can for the most part be “treated”, with varying degrees of success.
Conclusion: when developing your DR strategy, include software products and procedures for recovering data damaged by failures in your IT infrastructure from the very start. Do not assume that deploying a backup system makes this situation “impossible in principle”.
OfficeRecovery offers a suite of products that complements traditional DR planning solutions with the ability to recover data that, for one reason or another, fell outside the coverage of backup systems.
The OfficeRecovery Online service is well suited to “light” formats such as Word, Excel, PowerPoint and dozens of others, which form the basis of business document management. With this service, any employee can recover a broken file without special skills, using only a browser. Demo results are available immediately after recovery. It is even possible to get the results free of charge 2-4 weeks after recovery. For an additional fee, qualified specialists can analyze and treat problem files.
For recovering large amounts of data (for example, damaged databases, virtual disk images or Exchange mail databases), the main OfficeRecovery site (www.officerecovery.com) offers a set of traditional “offline” software products covering most common formats. These products are also recommended when online recovery is not possible for privacy reasons. Instead of uploading corrupted data to an online service, a client can buy the corresponding software product and recover the data without it ever leaving the company.