
Some experience with backup & storage

Hello!

Some time ago I plunged into the world of "harsh enterprise", namely the part of it that is responsible for storing and backing up data, or rather into most of that part. Over that period I have accumulated a few rules that I try to follow when designing or maintaining solutions in this area. Some of them have already outlived their usefulness as technology evolved, while others are still perfectly workable. And I decided to share them with you.

There will be no 3-2-1 rule here, which gets mentioned often enough without me, no ready-made techniques for specific situations, and nothing else in that spirit. For most readers this will probably be basics and platitudes. It is just my humble experience, and I hope it will be useful to someone. Welcome under the cut.

Features of local "sizing"


Sooner or later the need arises for a few more terabytes and/or IOPS, and then the sizing begins. Often meaningless and merciless, because it is extremely rare for anyone to include in the sizing requirements the RTO that is usually imposed on backups, even though it looks like an obvious requirement for any hardware estate. In other words, when sizing new equipment and drawing up requirements for it, nobody accounts for the backup system that will sooner or later have to urgently restore something onto that hardware. Sometimes something very big. Some margin of performance and capacity is usually included, but the very first data recovery shows that it is not enough for the life cycle that was defined for this equipment.
Over the past year I have twice seen a situation where the disk array being restored to was the bottleneck in data recovery. The RTO was still met, but it is an alarming bell.
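A back-of-the-envelope check like the sketch below is usually enough to catch this at the sizing stage. The dataset size, RTO and array throughput here are made-up placeholders, not figures from any real project:

```python
# A rough check: can the target array sustain the write throughput
# needed to restore a dataset within the agreed RTO?
# All numbers below are hypothetical placeholders.

def required_restore_throughput_mb_s(dataset_gb: float, rto_hours: float) -> float:
    """Minimum sustained write speed (MB/s) to restore the dataset within the RTO."""
    return dataset_gb * 1024 / (rto_hours * 3600)

if __name__ == "__main__":
    dataset_gb = 8_000        # 8 TB of data to restore (assumption)
    rto_hours = 4             # agreed RTO (assumption)
    array_write_mb_s = 450    # sustained write speed of the target array (assumption)

    needed = required_restore_throughput_mb_s(dataset_gb, rto_hours)
    print(f"Required: {needed:.0f} MB/s, array delivers: {array_write_mb_s} MB/s")
    if needed > array_write_mb_s:
        print("The target array, not the backup system, will be the bottleneck.")
```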

We have a clustered solution, why do you need backups?


That was the phrase, spoken rather "energetically", that I heard while talking to the developer of a piece of software that is very useful for a certain business. The developer argued that backups were pointless because the solution is deployed on a cluster, and the cluster will save the day if a node (or a disk array) fails at the site. In those cases it certainly will. It is genuinely great when people think about fault tolerance as early as the design stage.

However, equipment failure at a single site is not the only way to lose data, and for a long time this developer refused to understand that. As a consequence, the first version of the software shipped on the community edition of the DBMS, whose backup mechanics made it impossible to meet either the RTO/RPO requirements or the SLA of the contracting organization.
In general, I hear this phrase about clusters quite often.

First this, then that!


One of my biggest mistakes was treating backup objects as independent of each other. Here is the DBMS, here is the application. One is backed up like this, the other like that. First one, then the other. And one day we could not recover. More precisely, we could, but only after several days spent fixing errors in the database. And it was not me who fixed them, which is what I am especially ashamed of. And that was despite using the standard backup mechanism for this DBMS, one already proven on other systems.

Since then I always pester the developer or owner of the system about how its backups should properly be created and restored. For example, in one case the only way to get a working backup was to completely stop the services on five servers, take the backup, and then start the services again.
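That workflow boiled down to something like the sketch below. The host list, service name and backup command are hypothetical stand-ins; the real procedure is whatever the system owner prescribes:

```python
# A minimal sketch of the "stop everything, back up, start everything" case.
# Hosts, service name and backup command are hypothetical placeholders.
import subprocess

HOSTS = ["app1", "app2", "app3", "db1", "db2"]   # the five servers (assumption)
SERVICE = "myapp.service"                        # hypothetical service name

def run(host: str, command: str) -> None:
    """Run a command on a remote host over ssh and fail loudly on error."""
    subprocess.run(["ssh", host, command], check=True)

def cold_backup() -> None:
    stopped = []
    try:
        for host in HOSTS:
            run(host, f"systemctl stop {SERVICE}")
            stopped.append(host)
        # hypothetical backup step; replace with the real backup tool or job
        run("db1", "tar czf /backup/app-cold.tar.gz /var/lib/myapp")
    finally:
        # restart the services even if the backup itself failed
        for host in stopped:
            run(host, f"systemctl start {SERVICE}")

if __name__ == "__main__":
    cold_backup()
```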

Is a dump our everything?


I often come across solutions built on DBMSs such as MySQL and PostgreSQL. And even more often I come across the situation where the backup method is a plain database dump into /tmp, which is then copied to another medium. Meanwhile, the systems using these DBMSs are quite sensitive to downtime in case of data loss and are heavily loaded. I will not even mention the data volumes.

For some reason few people read the documentation for these products, and so they do not know that there are alternative ways and tools for backing up these DBMSs: MySQL Enterprise Backup for MySQL and pg_basebackup (pg_start_backup / pg_stop_backup) for PostgreSQL. Or they know, but it has slipped their mind. Yet these approaches are not particularly difficult, and they are faster. Faster backup, faster restore, faster testing.
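For PostgreSQL, for instance, a physical base backup is a single call to pg_basebackup. A minimal sketch, with a hypothetical host, role and target path:

```python
# A minimal sketch of a physical PostgreSQL backup with pg_basebackup,
# as an alternative to dumping into /tmp. Host, role and paths are
# hypothetical placeholders; the flags are standard pg_basebackup options.
import subprocess

cmd = [
    "pg_basebackup",
    "-h", "db1.example.org",     # hypothetical host
    "-U", "replication_user",    # hypothetical role with REPLICATION privilege
    "-D", "/backup/pg/base",     # target directory for the base backup
    "-F", "t",                   # tar format output
    "-z",                        # gzip-compress the tar files
    "-X", "stream",              # stream WAL while the backup runs
    "-P",                        # show progress
]

subprocess.run(cmd, check=True)
```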
Please do not shoot the pianist.
He is doing his best.
Oscar Fingal O'Flahertie Wills Wilde

Source: https://habr.com/ru/post/459952/

