
Backup tools can be divided into several categories:
- For home / office use (backing up important documents, photos, etc. to a NAS or to the cloud);
- For medium and large (offline) enterprises (backing up important documents, reports, databases, etc., both on servers and on employees' workstations);
- For small web projects (backing up files and databases from a hosting site or VPS / VDS to a remote host, or vice versa);
- For large web projects with a distributed architecture (much the same as for offline enterprises, only working over the public network rather than a local one, and usually built with open source tools).
With software for home and office, everything is quite simple: there are plenty of solutions, both open and proprietary, from cmd / bash scripts to products from well-known software vendors.
In the enterprise sector, everything is rather boring: there are plenty of software products that have long been working successfully at many enterprises, large banks, etc., and we will not advertise any of them. Many of these products have made system administrators' lives much easier, for quite "modest money" by the standards of some enterprises.
In this article we will take a closer look at open source solutions for backing up web projects of various sizes, and also run a file backup speed test.
The article will be useful to webmasters and small web studios, and perhaps even a seasoned admin will find something useful here.
What do you need to back up a small site or blog, or several sites, hosted, say, on a VPS where disk space is tight?
Backup to a remote host to the rescue. That is, to save precious space on your hosting or VPS, you can connect from your home / office computer (maybe you have a NAS) over the ftp or sftp protocols, pick up the files manually or on a schedule, and carefully tuck them away somewhere safe. Any ftp or sftp client will do; rsync is a good option.
With rsync, it looks like this:
rsync -avzPch user@remote.host:/path/to/copy /path/to/local/storage
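To run this on a schedule instead of by hand, a single crontab entry is enough (the time and paths here are just an illustration):
# pull the files every night at 03:00
0 3 * * * rsync -avzPch user@remote.host:/path/to/copy /path/to/local/storage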
And that all seems fine, but what if you need to keep several versions of database backups? Or you suddenly need incremental copies, and encryption would be nice too. You can sit down for a while and roll your own script to suit your needs (for example, our rsync-backup), or take one of the ready-made utilities.
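For the database part, a hand-rolled version might look something like this minimal sketch (the host, credentials and GPG key ID are placeholders, and a real script would want error handling):
#!/bin/bash
# dump the database, compress it, encrypt it with a GPG key, and keep dated versions
DATE=$(date +%F)
mysqldump -u backup_user -p'secret' mydb | gzip > /tmp/mydb-$DATE.sql.gz
gpg --encrypt --recipient ABCD1234 /tmp/mydb-$DATE.sql.gz
rsync -az /tmp/mydb-$DATE.sql.gz.gpg user@home.host:/home/backup/db/
rm /tmp/mydb-$DATE.sql.gz /tmp/mydb-$DATE.sql.gz.gpg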
Let's consider a few utilities that suit various applications, including the case described above.
Duplicity

Duplicity is a console backup utility with fairly extensive capabilities.
There are several graphical front-ends for Duplicity - Deja-dup for the Gnome environment and test-drive for KDE. There is also a console wrapper, duply.
Duplicity backs up to encrypted tar-format volumes, locally or to a remote host. The librsync library provides incremental file transfer, gzip is used for compression, and gpg handles encryption.
There is no configuration file, so you will have to automate the backup process yourself.
Usage examples:
Back up a local folder to a remote host:
duplicity /usr scp://host.net/target_dir
Back up from a remote host to a local folder:
duplicity sftp://user@remote.host/var/www /home/backup/var/www
Restore:
duplicity restore /home/backup/var/www sftp://user@remote.host/var/www
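Since there is no config file, automation boils down to a small wrapper script run from cron. A minimal sketch, with illustrative paths, retention periods, and simplified passphrase handling:
#!/bin/bash
# full backup once a week, incremental otherwise; drop backup chains older than a month
export PASSPHRASE=secret
duplicity --full-if-older-than 7D /home/web/sites sftp://user@home.host//home/backup/duplicity
duplicity remove-older-than 30D --force sftp://user@home.host//home/backup/duplicity
unset PASSPHRASE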
There have already been articles about Duplicity on Habré, so we will not dwell on it.
Rsnapshot

Quite a lot has also been said about rsnapshot on Habré, here and here. And here is a good article. On the whole, rsnapshot is a good tool for creating incremental backups (snapshots). It is written in perl and uses rsync to copy files. It is quite fast (faster than rdiff-backup) and saves disk space rather well thanks to hard links. It can run pre- and post-backup operations, but cannot (without workarounds) encrypt or back up to a remote host. Files are stored in their original form, so restoring is easy. The configuration is organized quite conveniently. It supports multiple time-based backup levels (daily, weekly, monthly). There is a fairly active community.
Once you put the necessary lines into the config (what to back up and where), you can run a backup:
rsnapshot -v hourly
By default, several hourly and daily snapshots will be kept. Rsnapshot differs from other utilities in that it is automated out of the box (at least on Debian / Ubuntu): the necessary lines are written to cron, and the directories "/home", "/etc", "/usr/local" are pre-configured in the config.
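For reference, the relevant part of /etc/rsnapshot.conf looks roughly like this (fields must be separated with tabs; the paths and host below are just examples):
# where snapshots are stored
snapshot_root	/home/backup/rsnap/
# how many snapshots of each level to keep
retain	hourly	6
retain	daily	7
# what to back up: a local directory and a remote host pulled over ssh
backup	/etc/	localhost/
backup	user@remote.host:/home/web/sites/	remote.host/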
Rdiff-backup

Rdiff-backup is very similar to rsnapshot, but unlike it, it is written in Python and uses the librsync library for data transfer. It can copy files to a remote host, which, by the way, we used quite successfully and still use in places. You can also back up from a remote host, but Rdiff-backup has to be installed there first. It stores file changes (deltas) in compressed form, which is good for large files; it saves disk space even better than rsnapshot.
Metadata (permissions, dates, owner) is stored in separate files.
Run backup from the console:
rdiff-backup remote.host::/home/web/sites/ /home/backup/rdiff/
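Restoring is just as straightforward; a couple of illustrative commands against the repository from the example above:
# list the increments available in the repository
rdiff-backup --list-increments /home/backup/rdiff/
# restore the state as of 3 days ago into a local directory
rdiff-backup -r 3D /home/backup/rdiff/ /home/restore/sites/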
There is no configuration file, so most of the automation falls on you.
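In practice that usually means a couple of cron lines, roughly like these (times and retention are illustrative):
# nightly backup at 02:00, then prune increments older than two weeks
0 2 * * * rdiff-backup remote.host::/home/web/sites/ /home/backup/rdiff/
30 2 * * * rdiff-backup --remove-older-than 2W /home/backup/rdiff/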
Obnam

Obnam is an open client-server backup application written in Python; data is transferred over the SSH protocol. It can operate in two modes:
- Push: backups go from the local host to a remote server running the Obnam daemon.
- Pull: the daemon itself collects files from remote hosts over the ssh protocol. In this case, no Obnam client is needed.
It can do snapshots, deduplication and GnuPG encryption. Backup files are stored in volumes. Metadata is stored in separate files. Recovery is done through the console.
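A couple of illustrative commands, assuming the repository used in the test below (paths are placeholders):
# list the generations (snapshots) available in the repository
obnam generations --repository sftp://home.host/home/backup/obnam/
# restore the latest generation into a local directory
obnam restore --repository sftp://home.host/home/backup/obnam/ --to /home/restore/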
A small excerpt from the description on Opennet (http://www.opennet.ru/opennews/art.shtml?num=39323):
“The approach to backup in Obnam is aimed at three goals: high storage efficiency, ease of use and security. Storage efficiency is achieved by placing backups in a special repository, where data is stored in an optimal representation using deduplication. Backups of different clients and servers can be stored in one repository, and duplicates are merged across all stored backups, regardless of their type, creation time or source. To check the integrity of the repository and recover it after a crash, a special version of the fsck utility is provided.
If the same operating system is used across a group of servers, only one copy of duplicate files is saved in the repository, which can significantly save disk space when backing up a large number of similar systems, such as virtual environments. The repository can be placed both on a local disk and on external servers (no additional software is needed on the server side to store backups; SFTP access is enough). Backups can be accessed by mounting a virtual partition using a specially prepared FUSE module.”
All this is good, BUT scp is used for copying to a remote host, with all the ensuing consequences.
Bacula

Bacula is cross-platform client-server software that lets you manage backup, recovery, and data verification over the network for computers and operating systems of various types. At the moment, Bacula runs on almost any unix-like system (Linux (including zSeries), NetBSD, FreeBSD, OpenBSD, Solaris, HP-UX, Tru64, IRIX, Mac OS X) and on Microsoft Windows.
Bacula can also run entirely on a single computer or be distributed across several, and it can write backups to various types of media, including tapes, tape libraries (autochangers) and disks.
Bacula is a networked client / server backup, archive and restore program. Offering extensive storage management capabilities, it makes it easy to find and recover lost or damaged files. Thanks to its modular structure, Bacula is scalable and works on everything from small setups to large systems of hundreds of computers spread across a large network.
GUI and web interfaces (Almir, Webmin) of varying degrees of sophistication are available for Bacula.
Some time ago I spent a long time fruitlessly fiddling with Almir, trying to get it running on Debian Wheezy.
Bacula is a reliable, time-tested backup system that has proven itself at many large enterprises. Bacula's scheme of work differs fundamentally from Obnam's: in the client-server variant, Bacula is a 100% centralized system, and you also need a client application on every host you want to back up. Three daemons run on the server side at once: SD, FD and DIR - Storage Daemon, File Daemon and Director, respectively. It is not hard to guess which is responsible for what.
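Everything is driven by the Director's configuration; for a flavor of it, here is a minimal sketch of a Job resource in bacula-dir.conf (all resource names are illustrative):
# a backup job tying a client, a file set and a schedule together
Job {
  Name = "BackupWebSites"
  Type = Backup
  Level = Incremental
  Client = web-fd
  FileSet = "Web Files"
  Schedule = "WeeklyCycle"
  Storage = File
  Pool = Default
  Messages = Standard
}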
Bacula stores backup copies of files in volumes. Metadata is stored in a database (SQLite, MySQL, PostgreSQL). Recovery is performed with a console utility or through a graphical shell. The console recovery process, frankly, is not the most convenient.
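For a taste of that console recovery, here is a rough sketch of an interactive bconsole session (the client name and paths are hypothetical):
# at the bconsole prompt, start an interactive restore of the latest backup
* restore client=web-fd select current
# bacula then drops you into a file-selection tree:
$ cd /home/web/sites
$ mark *
$ done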
Numbers

I decided to check the backup speed on a small folder (626M) with several WordPress sites.
For this, I was not even too lazy to deploy and configure all of this software. :)
The test consists of two parts:
1. Participants: Duplicity, Rsync, Rsnapshot, Rdiff-backup. We copy from a remote server to a home computer, and since Rsnapshot cannot back up to a remote host, it and Rdiff-backup (for comparison) will run on the home machine, i.e. will pull files from the server, while the rest, on the contrary, will push to the home machine.
All utilities are run with the minimum required options.
Rsync

rsync -az /home/web/sites/ home.host:/home/backup/rsync
Full backup
Execution time:
real 4m23.179s user 0m31.963s sys 0m2.165s
Incremental
Execution time:
real 0m4.963s user 0m0.437s sys 0m0.562s
Space used:
626M /home/backup/rsync/
Duplicity

duplicity full /home/web/sites/ rsync://home.host//home/backup/duplicity
Full backup
Execution time:
real 5m52.179s user 0m46.963s sys 0m4.165s
Incremental
Execution time:
real 0m49.883s user 0m5.637s sys 0m0.562s
Space used:
450M /home/backup/duplicity/
Rsnapshot

rsnapshot -v hourly
Full backup
Execution time:
real 4m23.192s user 0m32.892s sys 0m2.185s
Incremental
Execution time:
real 0m5.266s user 0m0.423s sys 0m0.656s
Space used:
626M /home/tmp/backup/rsnap/
Rdiff-backup

rdiff-backup remote.host::/home/web/sites/ /home/backup/rdiff/
Full backup
Execution time:
real 7m26.315s user 0m14.341s sys 0m3.661s
Incremental
Execution time:
real 0m25.344s user 0m5.441s sys 0m0.060s
Space used:
638M /home/backup/rdiff/
The results are quite predictable. Rsync was the fastest, with Rsnapshot almost matching it. Duplicity is a bit slower but takes up less disk space. Rdiff-backup was, as expected, the slowest.
2. Now for the interesting part: let's check how Obnam and Bacula do. Both solutions are quite universal and somewhat similar. Let's see which is faster.
Obnam

The first time, copying from the remote host to home, I had to wait a long time:
obnam backup --repository sftp://home.host/home/backup/obnam/ /home/web/sites/
Full backup
Backed up 23919 files, uploaded 489.7 MiB in 1h42m16s at 81.7 KiB/s average speed
Execution time:
real 102m16.469s user 1m23.161s sys 0m10.428s
obnam backup --repository sftp://home.host/home/backup/obnam/ /home/web/sites/
Incremental
Backed up 23919 files, uploaded 0.0 B in 3m8s at 0.0 B/s average speed
Execution time:
real 3m8.230s user 0m4.593s sys 0m0.389s
Space used:
544M /home/backup/obnam/
Not a very good result in my opinion, although understandable.
Let's try a second time, but to a neighboring server over a gigabit network, and with compression added.
obnam backup --compress-with=deflate --repository sftp://remote.host/home/backup/obnam/ /home/web/sites/
Full backup
Backed up 23919 files, uploaded 489.7 MiB in 2m15s at 3.6 MiB/s average speed
Execution time:
real 2m15.251s user 0m55.235s sys 0m6.299s
obnam backup --compress-with=deflate --repository sftp://remote.host/home/backup/obnam/ /home/web/sites/
Incremental
Backed up 23919 files, uploaded 0.0 B in 8s at 0.0 B/s average speed
Execution time:
real 0m7.823s user 0m4.053s sys 0m0.253s
Space used:
434M /home/backup/obnam/
That is faster, and the backup is smaller too. I did not try encryption; maybe later, if there is time.
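For the record, encryption in Obnam is one extra option; something along these lines should work (not tested here, and the key ID is a placeholder):
# the same backup, encrypted with a GPG key
obnam backup --encrypt-with=ABCD1234 --repository sftp://remote.host/home/backup/obnam/ /home/web/sites/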
Bacula

For Bacula, I set up a full-fledged client-server configuration, with the client and server on the same gigabit network.
I started the job in the background and went off to drink tea. When I came back, everything was already done, and the log contained the following:
...
Scheduled time: 23-Apr-2014 22:14:18
Start time: 23-Apr-2014 22:14:21
End time: 23-Apr-2014 22:14:34
Elapsed time: 13 secs
Priority: 10
FD Files Written: 23,919
SD Files Written: 23,919
FD Bytes Written: 591,680,895 (591.6 MB)
SD Bytes Written: 596,120,517 (596.1 MB)
Rate: 45513.9 KB/s
...
I was even a little surprised. Everything was done in 13 seconds, and the next run took one second.
Summary

With rsnapshot, you can easily solve the problem of backing up the files and databases (with an extra script) of your VPS to your home computer / laptop / NAS. Rsnapshot will also cope fine with a small fleet of 10-25 hosts (or more, depending on your appetite). Rdiff-backup is a good fit for backing up large files (video content, databases, etc.).
Duplicity will help you not only keep your data safe but also protect it in case of theft (unfortunately, you cannot protect yourself from yourself; be careful, and keep your keys somewhere safe and inaccessible to anyone else).
Bacula is an open source industry standard that will help keep the data of a large fleet of computers and servers safe at any enterprise.
Obnam is an interesting tool with a number of useful advantages, but I probably would not recommend it to anyone.
If for some reason none of these solutions suits you, do not hesitate to reinvent the wheel. That can be useful both for you personally and for many other people.
UPD: A small summary table: