
Ten millionth backup script

This article is a manual for a script I wrote. The script is written in Python and runs on Linux. If that sounds interesting, read on below the cut.


Features




Installation


Add the following line to /etc/apt/sources.list:
deb http://repo.nixdi.com/ubuntu/ precise soft 

And run in the terminal:
 apt-key adv --recv-keys --keyserver keyserver.ubuntu.com 74C7B31B5F4E1715 && apt-get update && apt-get install py4backup 

To update the package, run:
 apt-get update && apt-get upgrade py4backup 

OR
Manually download the package with the command:
 wget http://repo.nixdi.com/ubuntu/py4backup_latest.deb 

and install it:
 dpkg -i ./py4backup_latest.deb 

OR
For distributions other than Ubuntu / Debian run:
 git clone https://github.com/larrabee/py4backup 

Then copy the ddd and py4backup files to a directory for executables (usually /usr/bin) and the py4backup_lib.py file to the Python library directory. You will also need to install the dependencies manually: Python 3.x, btrfs-tools (btrfs-progs), lvm2, and rsync. The examples/ folder contains sample configuration files; copy them to /etc/py4backup/.

Configuration


After installation, create the working configuration files from the bundled examples. To do this, run:
 mv /etc/py4backup/py4backup.conf.example /etc/py4backup/py4backup.conf
 mv /etc/py4backup/jobs.conf.example /etc/py4backup/jobs.conf

Then open py4backup.conf in a text editor.
Boolean parameters accept True / False, yes / no, or 1 / 0.
A parameter can be separated from its value with either '=' or ':'.
Each parameter must be placed in its own section. The section name is written in square brackets ('[]') before its group of parameters.
The order of sections, and of parameters within a section, does not matter. If a parameter is not specified in the configuration file, its default value is used.
Sample configuration file:
 [MAIL]
 send_mail_reports = True
 login = login@test.com
 passwd = password
 sendto = recipient@test.com
 server = mail.test.com
 port = 25
 tls = True

 [DD]
 bs = 4M
 ddd_bs = 4096
 ddd_hash = md5

 [LOGGING]
 logpath = /var/log/py4backup.log
 enable_logging = True
 log_with_time = True
 traceback = False
 command_output = True

 [OTHER]
 temp_snap_name = py4backup_temp_snap
 host_desc = My Description
 pathenv = /sbin:/usr/sbin
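
The syntax rules above (sections in square brackets, '=' or ':' as the delimiter, several spellings for booleans, default values for missing parameters) match what Python's standard configparser module accepts. Whether py4backup itself reads its configuration this way is an assumption; the sketch below only illustrates the rules on a shortened inline copy of the sample, and the fallback value is illustrative.

```python
import configparser

# Shortened inline copy of the sample; a real run would read
# /etc/py4backup/py4backup.conf instead.
SAMPLE = """
[MAIL]
send_mail_reports = yes
port: 25
tls = 1

[LOGGING]
enable_logging = False
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE)

# getboolean() accepts true/false, yes/no, on/off and 1/0 (case-insensitive).
send_reports = parser.getboolean("MAIL", "send_mail_reports")
use_tls = parser.getboolean("MAIL", "tls")
logging_on = parser.getboolean("LOGGING", "enable_logging")

# Both '=' and ':' work as delimiters by default.
port = parser.getint("MAIL", "port")

# A parameter that is absent from the file falls back to a default.
# The value 4096 here is illustrative, not py4backup's documented default.
ddd_bs = parser.getint("DD", "ddd_bs", fallback=4096)
```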


Let us consider the parameters in more detail.
[MAIL]: parameters for sending email notifications.
send_mail_reports: enables / disables sending an email report after a task completes.
login: login for the SMTP server.
passwd: password for the SMTP server.
sendto: notification recipients. Multiple addresses can be given, separated by spaces.
server: domain name or IP address of the SMTP server.
port: SMTP server port.
tls: enables / disables TLS encryption.

[DD]: options for backups made with the dd and ddd programs.
bs: block size for dd (used to create full copies of LVM volumes). The size can be given in bytes, kilobytes (k), or megabytes (M). It affects how fast the copy is created. The optimal value is 32M.
ddd_bs: block size for ddd (used to create differential copies of LVM volumes). The size is given in bytes. The larger the block, the more space the differential copy takes, but the faster it is created. The optimal value is 4096.
ddd_hash: block hashing algorithm. You can choose between md5, crc32, and None. MD5 puts more load on the processor than crc32 and takes up more space, but the chance of collisions is much lower.
None disables checksums. Backup time, backup size, and processor load are then minimal, but if the backup gets damaged you will not find out. Not recommended.
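
To give a feel for the size difference between md5 and crc32, here is a rough estimate of how much space the checksums alone would take. It assumes one checksum is stored per block of the volume; how ddd actually stores checksums is not described in this manual, so treat the numbers as an upper-bound illustration. The helper name checksum_overhead is hypothetical.

```python
import hashlib
import zlib

def checksum_overhead(volume_bytes, block_size, digest_bytes):
    """Bytes of checksums needed if every block stores one digest."""
    blocks = -(-volume_bytes // block_size)  # ceiling division
    return blocks * digest_bytes

MD5_BYTES = hashlib.md5().digest_size  # an MD5 digest is 16 bytes
CRC32_BYTES = 4                        # zlib.crc32 returns a 32-bit value

GIB = 1024 ** 3
md5_total = checksum_overhead(GIB, 4096, MD5_BYTES)    # 1 GiB volume, ddd_bs = 4096
crc_total = checksum_overhead(GIB, 4096, CRC32_BYTES)

# Computing both checksums for a single 4096-byte block:
block = b"\x00" * 4096
md5_digest = hashlib.md5(block).digest()
crc_value = zlib.crc32(block)
```

Under these assumptions a 1 GiB volume needs 4 MiB of MD5 checksums against 1 MiB for crc32.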

[LOGGING]: logging settings.
logpath: path to the log file. If you use a non-standard log location, do not forget to change the logrotate settings.
enable_logging: enables / disables logging.
log_with_time: enables / disables adding the date and time to each log entry.
traceback: enables / disables adding tracebacks to the log when errors occur. Useful for debugging.
command_output: enables / disables adding the output of console commands to the log. Useful for debugging.

[OTHER]: settings that do not belong to any other section.
temp_snap_name: the name of temporary snapshots. Used when copying LVM volumes or folders / files on a BTRFS file system. Do not change it unless you have to.
host_desc: textual description of the host. The value of this parameter is added to the log file and to the email report.
pathenv: the value of this parameter is appended to the $PATH variable (if it passes validation). To add several folders, separate them with a colon (':'). For example, on Ubuntu, to create copies of LVM volumes when py4backup runs from cron, you must add the /sbin folder to $PATH. Paths here are specified without a trailing slash ('/').
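
As an illustration of what appending pathenv to $PATH could look like, here is a minimal sketch. The manual does not say which checks the script applies, so the validation here (skip empty entries, duplicates, and anything that is not an existing directory) is an assumption, and the function name extend_path is hypothetical.

```python
import os

def extend_path(current_path, pathenv, is_dir=os.path.isdir):
    """Append each directory listed in pathenv to a PATH-style string.

    Entries are skipped if they are empty, already present, or not an
    existing directory. These checks are assumptions, not py4backup's
    documented behaviour.
    """
    parts = current_path.split(":") if current_path else []
    for entry in pathenv.split(":"):
        entry = entry.strip()
        if not entry or entry in parts:
            continue
        if is_dir(entry):
            parts.append(entry)
    return ":".join(parts)

# Typical use: extend the environment of the running process.
os.environ["PATH"] = extend_path(os.environ.get("PATH", ""), "/sbin:/usr/sbin")
```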

Tasks


General information

The list of tasks is stored in the /etc/py4backup/jobs.conf file.
Task example:
 [mail-diff]
 type = file-diff
 sopath = server:/opt/
 snpath =
 dpath = /mnt/backup_dest/
 dayexp = 30
 prescript = bash /root/script1.sh
 postscript = bash /root/script2.sh
 include = test test2
 exclude = tests*

Where:
[xxx]: unique name of the task.
type: task type. See details below.
sopath: backup source. For the file-full and file-diff types, the source can be a remote host.
snpath: where to create the snapshot. Used only with the btrfs-full, btrfs-diff, and btrfs-snap types.
dpath: where to save the backup. For the btrfs-full, btrfs-diff, file-full, and file-diff types, the destination can be a remote host.
dayexp: number of days after which old backups are deleted. If set to -1, backups are never deleted.
prescript: a script that runs before the backup. Pipes, pipelines, and other bash operators do not work. If you need to execute complex commands, save them as a script and run that script.
postscript: a script that runs after the backup. Otherwise it behaves like the prescript parameter.
include: what to include in the backup. See the descriptions of the backup types for details.
exclude: what to exclude from the backup. See the descriptions of the backup types for details.
Attention! All paths must end with '/'.

Types of backup

The 'type' parameter of each task sets the type of backup. It determines how the backup copy is created and how some of the other parameters behave.
In total, py4backup has 7 types of backup: file-full, file-diff, btrfs-full, btrfs-diff, btrfs-snap, lvm-full, and lvm-diff.

Let us look at each of them more closely.

file-full

Creates a backup using rsync. A copy is made of the folder specified in sopath, including everything mounted beneath it.
Features:
In sopath and dpath you can specify not only local folders but also remote hosts. For example:
sopath = root@192.168.0.1:/home/admin/ or dpath = server:/home/admin/
In the second case, a matching entry must exist in ~/.ssh/config. Key-based authentication is used (for details, see your distribution's wiki).
You cannot specify remote hosts for both sopath and dpath at the same time.
The values of include and exclude are passed to rsync as the --include= and --exclude= options. You can specify multiple values separated by spaces.
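
The rules for file-full can be sketched as a function that builds an rsync command line. This is only an illustration: the options the script really passes to rsync are not listed in this manual (the -aAX set is borrowed from the recovery section), the remote-host test is a simple heuristic, and the names is_remote and build_rsync_command are hypothetical.

```python
def is_remote(path):
    """Treat 'host:/path' or 'user@host:/path' as remote (a heuristic)."""
    head = path.split("/", 1)[0]
    return ":" in head

def build_rsync_command(sopath, dpath, include="", exclude=""):
    """Build an rsync argument list from task parameters."""
    if is_remote(sopath) and is_remote(dpath):
        raise ValueError("sopath and dpath cannot both be remote hosts")
    cmd = ["rsync", "-aAX"]
    # Space-separated values become repeated --include= / --exclude= options.
    cmd += ["--include=" + pattern for pattern in include.split()]
    cmd += ["--exclude=" + pattern for pattern in exclude.split()]
    cmd += [sopath, dpath]
    return cmd

cmd = build_rsync_command("server:/opt/", "/mnt/backup_dest/",
                          include="test test2", exclude="tests*")
```

Passing the command as a list (for example to subprocess.run) avoids shell quoting problems with patterns such as tests*.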

file-diff

Creates a differential backup from the source (sopath) and the last full copy found in the destination folder (dpath). If the full copy is not found, the task execution will fail.
The parameter list is similar to the 'file-full' type.
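
Finding the last full copy can be sketched like this. It assumes the naming seen in the recovery examples of this manual (a YYYY-MM-DD date followed by -full or -diff); the function name latest_full_copy is hypothetical and the script's own lookup may differ.

```python
def latest_full_copy(names):
    """Return the newest entry whose name ends in '-full'."""
    fulls = sorted(name for name in names if name.endswith("-full"))
    if not fulls:
        # Mirrors the documented behaviour: without a full copy the task fails.
        raise FileNotFoundError("no full copy found in destination")
    return fulls[-1]  # ISO dates sort chronologically as plain strings

base = latest_full_copy(["2014-06-14-full", "2014-06-21-full", "2014-06-22-diff"])
```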

btrfs-full

This type is similar to 'file-full', but before the backup is created a snapshot of the source directory is taken, and the copy is made from that snapshot.
This type of backup requires the snpath parameter. A temporary snapshot of the source folder (sopath) is created in the folder specified in snpath. The snpath folder must be located on the same file system as the folder specified in sopath. Note that only the contents of this file system subvolume are copied. All mounted folders and nested subvolumes are ignored. The other parameters are the same as for the 'file-full' type.

btrfs-diff

With this type of backup, a snapshot of the source folder (sopath) is taken first, and then a differential copy is created from the snapshot and the last full copy found in the destination folder (dpath). If no full copy is found, the task fails.
As with the 'btrfs-full' type, the snapshot folder (snpath) must be on the same file system as the source folder (sopath).
Note that only the contents of this file system subvolume are copied. All mounted folders and nested subvolumes are ignored. The other parameters are the same as for the 'file-full' type.

btrfs-snap

This type creates snapshots of the source folder specified in sopath and places them in the snapshot folder specified in snpath.
The exclude, include, and dpath parameters have no effect for this type. As with the 'btrfs-full' type, the snapshot folder (snpath) must be on the same file system as the source folder (sopath).
Note that only the contents of this file system subvolume are included. All mounted folders and nested subvolumes are ignored.

lvm-full

This type is intended for creating full copies of LVM volumes. It has a few peculiarities. The sopath parameter specifies the path to the volume group (VG). For example:
sopath = /dev/main_vg/
By default, the script makes a copy of every volume in this VG.
The dpath parameter specifies where to save the backup. Remote hosts cannot be used as the destination. To copy only the volumes you need, use the include and exclude parameters.
The exclude parameter lists the volumes to exclude from the backup. It also accepts the keyword all, which means that all volumes are excluded.
The include parameter lists the volumes to include in the backup. It takes priority over exclude. For example:
 exclude = all
 include = mail root

backs up only mail and root volumes. And the following example will make a copy of all volumes, except the mail volume:
 exclude = mail 


lvm-diff

The last backup type (as of version 1.5) is intended for creating differential copies of LVM volumes.
The script searches the destination folder (dpath) for the latest full backup and, if it finds one, creates a differential copy between it and a snapshot of the current state. Two files, *-diff.dd and *-diff.ddm, then appear in the destination folder. Both are required for recovery.
All parameters are the same as for the lvm-full type.
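
The general idea of a block-level differential copy can be sketched in memory as follows. This is not ddd's implementation: the real layout of the *-diff.dd and *-diff.ddm files is not described in this manual, both images are assumed to have the same size, and the function names make_diff and restore are hypothetical.

```python
import hashlib

def make_diff(full, current, bs=4096):
    """Return {block_index: (block_bytes, md5_hex)} for blocks that differ."""
    diff = {}
    for offset in range(0, len(current), bs):
        block = current[offset:offset + bs]
        if block != full[offset:offset + bs]:
            diff[offset // bs] = (block, hashlib.md5(block).hexdigest())
    return diff

def restore(full, diff, bs=4096):
    """Rebuild the current image from the full copy plus the changed blocks.

    Also returns the indices of changed blocks whose checksum no longer matches.
    """
    image = bytearray(full)
    damaged = []
    for index, (block, digest) in sorted(diff.items()):
        if hashlib.md5(block).hexdigest() != digest:
            damaged.append(index)
        image[index * bs:index * bs + len(block)] = block
    return bytes(image), damaged

full = bytes(4096 * 3)        # three zero-filled blocks
changed = bytearray(full)
changed[4096] = 1             # modify the first byte of block 1
diff = make_diff(full, bytes(changed))
restored, damaged = restore(full, diff)
```

The sketch also shows the trade-off mentioned for ddd_bs: a single changed byte costs one whole block in the differential copy, so larger blocks mean larger diffs but fewer comparisons.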

Launch

Running the tasks you need is very simple.
Pass the --jobs (or -j) option followed by the names of the tasks. For example:
py4backup --jobs backup_data backup_home
All specified tasks are executed sequentially, in the order they are given to --jobs. The script can also be run from cron, but remember that cron's environment may differ from your user's, so you may need to add the paths to the rm, dd, rsync, btrfs, lvcreate, and lvremove utilities to the pathenv parameter in the configuration file.
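
For readers who wonder how an option can take several task names at once, here is a minimal argparse sketch of an interface shaped like the one above. It is not py4backup's actual argument handling, which is not shown in this manual; only the --jobs / -j option names come from the text.

```python
import argparse

parser = argparse.ArgumentParser(prog="py4backup")
parser.add_argument("-j", "--jobs", nargs="+", required=True, metavar="JOB",
                    help="names of the tasks from jobs.conf to run, in order")

# Equivalent to: py4backup --jobs backup_data backup_home
args = parser.parse_args(["--jobs", "backup_data", "backup_home"])

for job in args.jobs:  # the order given on the command line is preserved
    print("would run task:", job)
```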

Recovery


Now we come to the most interesting part. A backup is worth nothing without the ability to restore it quickly. In this section I describe typical cases of recovery from backups created by the script.

File backups

The following applies to both full and differential backups made by tasks of the btrfs-full, btrfs-diff, file-full, and file-diff types. Restore such a backup using rsync with the -aAX options. For example:
 rsync -aAX /mnt/backup/home/2014-06-21-full/ /home/ 

or
 rsync -aAX /mnt/backup/home/2014-06-22-diff/ /home/ 

In both cases you end up with a complete, ready-to-use copy of the data in the destination folder.

Restoring snapshots

Snapshots created by the btrfs-snap type can be restored in several ways.

By default, snapshots are created in read-only mode, so you cannot write to them directly. Consider an example.
BTRFS is used as the root file system. The script creates snapshots of the /home folder and stores them in /snapshots_home. Now the day has come when we need to restore /home from a snapshot.
The first step is to get the /home folder out of the way (rename or delete it).
Next, choose the snapshot you need (say, the one from 2014-06-19) and create a snapshot of it (yes, a snapshot of a snapshot):
 btrfs subvolume snapshot /snapshots_home/2014-06-19 /home 

This way we have made the data writable and also protected it: even when the script's rotation removes the 2014-06-19 snapshot, the newly created snapshot remains intact.

Restoring full LVM backups

It's all very simple.
Create a new LVM volume that is at least as large as the backup copy, and write the backup copy to it with dd.
Example:
 dd if=/backups/2014-06-19-old_volume-full of=/dev/main_vg/new_volume bs=32M 
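
To size the new volume, the backup file's size can be rounded up to whole LVM extents. The sketch below assumes LVM's default 4 MiB extent size and uses the volume names from the example above; the helper name lvcreate_command is hypothetical, and the command is only built here, not executed.

```python
EXTENT = 4 * 1024 * 1024  # LVM's default physical extent size (4 MiB)

def lvcreate_command(backup_bytes, vg, name, extent=EXTENT):
    """Build an lvcreate command for a volume large enough to hold the backup."""
    extents = -(-backup_bytes // extent)  # ceiling division
    size_mib = extents * extent // (1024 * 1024)
    return ["lvcreate", "-L", f"{size_mib}M", "-n", name, vg]

# A backup of 10 GiB plus one byte needs one extra extent.
cmd = lvcreate_command(10 * 1024 ** 3 + 1, "main_vg", "new_volume")
```

In practice the byte count would come from os.path.getsize() on the backup file.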


Restoring differential LVM backups

For this kind of recovery, use the ddd utility bundled with py4backup.
To restore, pass the --restore option, the -s option with the path to the differential backup file WITHOUT ITS EXTENSION, and the -r option with the restore destination (a block device or a file). ddd remembers the path to the full backup, but if the full backup has been moved, you must point ddd to the new path manually with the -f option.
Example:
The backup folder contains the following copies:
 root@virtserver / # ls /backup/
 2014-06-18-volume-full 2014-06-19-volume-diff.dd 2014-06-19-volume-diff.ddm

We want to restore the backup from 2014-06-19 to the device /dev/main_vg/volume. To do this, run the command:
 ddd --restore -s /backup/2014-06-19-volume-diff -r /dev/main_vg/volume 

Now suppose the full copy has been moved to the /backup_old/ folder:
 ddd --restore -s /backup/2014-06-19-volume-diff -r /dev/main_vg/volume -f /backup_old/2014-06-18-volume-full 

After recovery, ddd prints a list of damaged blocks, indicating the file in which each damaged block is located. A full23 entry means that block number 23 of the full copy is damaged, and diff24 means that block 24 of the differential copy is damaged.
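
Entries such as full23 and diff24 can be turned into byte offsets with a few lines of Python. The sketch assumes that blocks are numbered from zero and that the block size equals ddd_bs (4096 in the sample configuration); neither is confirmed by this manual, and the function name parse_damage is hypothetical.

```python
import re

ENTRY = re.compile(r"^(full|diff)(\d+)$")

def parse_damage(entry, block_size=4096):
    """Split an entry like 'full23' into (file kind, block number, byte offset)."""
    match = ENTRY.match(entry)
    if match is None:
        raise ValueError(f"unrecognised entry: {entry!r}")
    source, number = match.group(1), int(match.group(2))
    return source, number, number * block_size

first = parse_damage("full23")
second = parse_damage("diff24")
```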

Tips & Tricks


Here I cover some non-obvious points and ways of using the script.


Conclusion


Disclaimer: the author of the script is not responsible for any action or inaction of the program that results in loss of or damage to data.
The script does contain bugs (mostly minor; it runs stably for me and on 4 test machines), and I will be grateful for bug reports, especially ones that include tracebacks and console command output.
This manual applies to version 1.5.3.
You can contact me by email at larrabee@nixdi.com or via Habr.
Source code on GitHub.
Packages in the repository.
Thank you for reading. I will be grateful for your comments.

Source: https://habr.com/ru/post/228787/

