
Fast implementation of incremental backup on Amazon S3

After moving my website from shared hosting to a virtual server in the cloud, the question of data archiving became pressing: where the hoster used to take care of the daily backup, these worries now fall entirely on the administrator's shoulders. Since storing large archives on your own server is not only unsafe but also (sometimes) expensive, I decided to copy the bulk of the files to the Amazon S3 service. Below I describe my way of implementing an incremental backup. The approach is fairly beginner-level, but anyone who wants to repeat it will easily find ways to fine-tune it to their needs.

First, let's define the tasks


We need:
1. Make daily archives of the web server (site directories and databases). Archives for the last month must be kept.
2. Make the archives incremental to save disk space: a full archive once a week, then 6 archives of changed files, then repeat the cycle.
3. Store copies of the archives on a third-party server (Amazon S3). On the local server we will keep archives for one week; the older archives will exist only on the Amazon server.
My server runs Debian; for other operating systems you may need to adjust the installation commands and file paths accordingly.

Required components


backupninja : a convenient wrapper around other backup utilities that lets you manage the backup process centrally.
Installation: apt-get install backupninja

duplicity : a utility for creating incremental backups; it can work with remote servers.
Installation: apt-get install duplicity

boto : a Python API for working with Amazon Web Services. duplicity uses it to write archives directly to the S3 service.
Installation: apt-get install python-boto

s3cmd : command line utility for working with Amazon S3.
Installation: apt-get install s3cmd
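
If you prefer, all four packages can be installed in one step (assuming a Debian/Ubuntu system where they are available under exactly these names):

 apt-get install backupninja duplicity python-boto s3cmd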

Implementation


You need to start by registering with AWS. The process is straightforward and is well described by maxout in the topic "Fast backup implementation in Amazon S3". In Uryupinsk the registration went through without a hitch using a card from the local Sberbank branch, so there should be no problems in other cities and villages either.

Next, configure backupninja. The program's main configuration file is /etc/backupninja.conf. The options in it are self-explanatory, so there is no point in listing them all here. In the simplest case nothing needs to be changed at all, except perhaps the logging level (from fatal errors up to debug information) and the contents of the job report sent to the administrator's mail.
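
For reference, a minimal sketch of the settings I mean; the option names (loglevel, when, reportemail, reportsuccess, reportwarning) are taken from the stock configuration file, but the exact set may differ between backupninja versions, so check them against your own /etc/backupninja.conf:

 # /etc/backupninja.conf (fragment)
 # logging level: 1 = fatal errors only ... 5 = debug
 loglevel = 4
 # when the jobs from /etc/backup.d are run
 when = everyday at 01:00
 # where to send the job report and how verbose it should be
 reportemail = root
 reportsuccess = yes
 reportwarning = yes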

backupninja stores the settings for each individual job in separate files in the /etc/backup.d directory (by default). Each file name starts with a numeric prefix that sets the order in which the job is executed. Jobs in files without a prefix, or with the prefix 0, are not executed. Jobs in files with the same prefix run in parallel, and every job with a larger prefix runs only after the jobs with smaller prefixes have finished. So by choosing the file prefix we control the sequence in which the jobs are performed.

In addition to the prefix, task files must have an extension corresponding to the type of task.
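
As an illustration, by the end of this article the contents of my /etc/backup.d look roughly like this (the annotations are mine, and 40-www.dup is simply the name I chose for the site-archiving job; pick whatever prefixes suit your ordering):

 20-all.mysql       # dump all MySQL databases
 30-databases.dup   # incremental archive of the dumps (duplicity)
 40-www.dup         # incremental archives of the site directories (duplicity)
 50-upload.sh       # sync the archives to Amazon S3 and prune the old ones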

backupninja comes with ninjahelper, a console shell for creating and testing job configurations. Personally I find it more convenient to create and edit the jobs in a text editor and to use the shell only for testing them.
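
A quick sketch of how jobs can be tested from the command line; the --now, --test, --debug and --run options are present in the backupninja versions I have used, but verify them against backupninja --help on your system:

 # interactive menu for creating and testing jobs
 ninjahelper

 # dry-run all jobs immediately with debug output, without touching any data
 backupninja --test --now --debug

 # run a single job file right away
 backupninja --debug --run /etc/backup.d/20-all.mysql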

So, the configuration file for the database backup job. I archive the databases in two steps: first I dump all the databases, then I build incremental archives from the dumps. Step one, the file 20-all.mysql (I copied the file templates from /usr/share/doc/backupninja/examples and made the necessary changes to them):
  ### backupninja MySQL config file ###

 # hotcopy = <yes | no> (default = no)
 # make a backup of the actual database binary files using mysqlhotcopy.
 hotcopy = no

 # sqldump = <yes | no> (default = no)
 # make a backup using mysqldump. this creates text files with sql commands
 # sufficient to reconstruct the database.
 #
 sqldump = yes

 # sqldumpoptions = <options>
 # (default = --lock-tables --complete-insert --add-drop-table --quick --quote-names)
 # arguments to pass to mysqldump
 # sqldumpoptions = --add-drop-table --quick --quote-names

 # compress = <yes | no> (default = yes)
 # if yes, compress the sqldump output.
 compress = yes

 # dbhost = <host> (default = localhost)

 # backupdir = <dir> (default: /var/backups/mysql)
 # where to dump the backups. hotcopy backups will be in a subdirectory
 # 'hotcopy' and sqldump backups will be in a subdirectory 'sqldump'
 backupdir = /home/backups/mysql

 # databases = <all | db1 db2 db3> (default = all)
 # which databases to backup. should either be the word 'all' or a
 # space separated list of database names.
 databases = all

 user = root

In this configuration I specified that:
- dumps should be made with mysqldump;
- the dumps should be compressed;
- the compressed files should be saved in the directory /home/backups/mysql;
- all databases should be backed up.

After the job has run, gzip-compressed files named after the databases will appear in /home/backups/mysql (in the sqldump subdirectory).
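
To check a dump, or to restore a single database from it by hand, something along these lines works (the database name mydb and the dump file name are placeholders for whatever your server actually produces):

 # inspect the dump
 zcat /home/backups/mysql/sqldump/mydb.sql.gz | less

 # restore one database from its dump (the database must already exist)
 zcat /home/backups/mysql/sqldump/mydb.sql.gz | mysql -u root -p mydb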

The next job makes an incremental archive of the databases with duplicity, the file 30-databases.dup (I list only the options I changed; you need to copy the configuration file in full from /usr/share/doc/backupninja/examples/example.dup and make the changes there, otherwise the job will not run):
  ## additional options for duplicity
 ## I do not encrypt the archive, I increase the volume size to 512 megabytes,
 ## I set the path and name for the duplicity cache
 options = --no-encryption --volsize 512 --archive-dir /home/backups/duplicity --name vds1.databases

 ## temporary directory. note that backupninja requires free disk space
 ## of no less than the volume size
 tmpdir = /home/backups/tmp

 [source]
 ## what is included in the backup. multiple sources can be set, one per line
 include = /home/backups/mysql/sqldump

 ## if you need to exclude something from the archive, specify it here
 #exclude = /www/urup.ru/sxd/backup

 [dest]
 ## incremental backup (this is the default)
 incremental = yes

 ## how many days to do incremental backups before making a full backup again
 increments = 7

 ## how many days to keep backups. We keep the last 7 days of archives on the local server.
 keep = 7

 ## where to put the backups. duplicity can write to various remote services,
 ## including directly to AWS, but we need to keep part of the archives on the
 ## local disk, so we will copy the files to Amazon in a separate job
 desturl = file:///home/backups/mysql

Once again: I listed only the modified lines of the configuration file. For the job to work, you must copy the entire configuration from /usr/share/doc/backupninja/examples/example.dup and make the necessary changes in that file.

We create the jobs for archiving the web server directories in the same way. Don't forget that you can list several sources to include in the archive and several to exclude, one per configuration line. I archive each site in a separate job, putting each archive into its own directory named after the site under /home/backups/files/www/. A sketch of such a job follows below.
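
A minimal sketch of such a job, say 40-www.dup; the site path /var/www/example.com, the excluded cache directory and the archive name vds1.www.example.com are invented for the illustration, and just like the database job the file must be based on the full example.dup:

 options = --no-encryption --volsize 512 --archive-dir /home/backups/duplicity --name vds1.www.example.com
 tmpdir = /home/backups/tmp

 [source]
 include = /var/www/example.com
 exclude = /var/www/example.com/cache

 [dest]
 incremental = yes
 increments = 7
 keep = 7
 desturl = file:///home/backups/files/www/example.com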

When creating the archives you can, if it matters to you, encrypt the files with GnuPG. In that case the --no-encryption option must be removed.
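
If I remember correctly, the example.dup shipped with backupninja has a [gpg] section for this; the option names below are how they look in my copy of the example file, so double-check them against yours before relying on this sketch:

 [gpg]
 # sign the archives in addition to encrypting them
 sign = yes
 # key ID used for encryption (and for signing, unless signkey is set separately)
 encryptkey = 0xDEADBEEF
 # passphrase of the key
 password = a_very_long_passphrase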

The last job copies the created archive files to Amazon S3. The file is called 50-upload.sh and is a regular shell script:
  #!/bin/sh

 # synchronize the local copies of the archives with the copies on Amazon
 s3cmd sync \
   --bucket-location=EU \
   --exclude 'sqldump/*' \
   /home/backups/files \
   /home/backups/mysql \
   s3://vds1.backup

 # delete archives older than 30 days on Amazon
 duplicity --no-encryption --s3-use-new-style --archive-dir /home/backups/duplicity --name vds1.databases.s3 --force remove-older-than 30D s3+http://vds1.backup/mysql
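
Before this script can work, s3cmd needs your credentials and the target bucket has to exist. A sketch of the one-time setup, using the bucket name vds1.backup from above (s3cmd --configure and s3cmd mb are standard s3cmd commands, but check s3cmd --help for your version):

 # interactively store the AWS access and secret keys for s3cmd (written to ~/.s3cfg)
 s3cmd --configure

 # create the bucket the archives will be synced into
 s3cmd mb --bucket-location=EU s3://vds1.backup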

For duplicity to work correctly with the Amazon services, boto must be installed and configured. Configuration amounts to specifying your credentials in the /etc/boto.cfg file:
  [Credentials]
 aws_access_key_id = *
 aws_secret_access_key = * 
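
Since this file contains the secret key in plain text, it is worth making sure only root can read it, assuming the backups run as root:

 chown root:root /etc/boto.cfg
 chmod 600 /etc/boto.cfg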


Recovery from archives


Get the archive information:
 duplicity --no-encryption --s3-use-new-style collection-status s3+http://vds1.backup/mysql


Unpack the local archive:
 duplicity --no-encryption file:///home/backups/mysql /home/backups/mysql/sqldump

The first parameter is the URL of the archive, the second is the directory into which the files will be unpacked. Existing files will not be overwritten unless the --force option is specified.
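
If you need the state as of a particular day rather than the latest one, duplicity can restore relative to a point in time with the -t/--time option (values like 3D mean "three days ago"); the target directory restore-3d below is just a placeholder:

 # restore the archive as it looked three days ago
 duplicity --no-encryption -t 3D file:///home/backups/mysql /home/backups/mysql/restore-3d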

Unpack the archive from the Amazon server:
 duplicity --no-encryption --s3-use-new-style s3+http://vds1.backup/mysql /home/backups/mysql/sqldump


Extract a single file from the archive:
 duplicity --no-encryption --file-to-restore home/backups/mysql/sqldump/mysql.sql.gz --force file:///home/backups/mysql /home/backups/mysql/sqldump/mysql1.sql.gz

The file to unpack is specified with the --file-to-restore key as a full relative path (without the leading slash). The first parameter is the URL of the archive, the second is the full path to which the file will be unpacked. If the file already exists, it will not be overwritten unless the --force option is specified.

Source: https://habr.com/ru/post/128710/

