
Bareos: tapes, Hyper-V and more

This is a post about life after "Getting Started with Bareos", and about the things that took the longest to dig out of its rather comprehensive manual.



If you have already run test jobs in a sandbox and can talk to Bareos via bconsole, read on.

Situation


We are an ordinary organization, not an IT company and not a hosting provider. We back up virtual machines from Hyper-V clusters, files from file servers, database dumps, and assorted other small things.

Why Bareos


Because a Windows client is available. As you know, Bareos is a dramatic fork of Bacula, a well-deserved and proven product. But by the time we were choosing, Bacula had closed the source code (and even the binaries) of its fd for Windows, so no. Veeam is good, but it costs as much as a season seat at FC Zenit's stadium. There was DPM, but no matter how much Antoine from Microsoft technical support and I fought with it, love never blossomed.

Installation


Installation has been described many times. What the Director is and how it works with the other daemons can be read, for example, here. I will only note that it is highly desirable for dir and sd to be the same version; the fd version matters less. As for autochangers, it feels like version 16 or higher is the way to go.

Basic setup


Out of DPM habit I wanted to create one big job and cram as much as possible into it. It turned out that small jobs are more convenient: if a run fails, a small job reruns faster and squeezes through the spooler better (more on that later). Job size does not affect restores, except that a job with hundreds of thousands of files can be slow at the file-selection stage.

Hyper-V


Virtual machines (VMs) run on Hyper-V clusters. Within a cluster the fd settings are identical on all nodes, and the hostname for all of them is the cluster name. The Director also lists the cluster as a client, with the cluster address. A VM can move to another cluster volume, so instead of a fixed path we point the FileSet at a script:

 FileSet {
   # one FileSet per virtual machine
   Name = "VM_lamachine-fs"
   Include {
     # the list of paths is produced by a script on the client (here the cluster "example.com")
     File = "\\|C:/Windows/System32/WindowsPowerShell/v1.0/powershell.exe -file c:/cmd/search-vm.ps1 -machine lamachine.example.com"
     Options {
       # compress on the client
       Compression = LZO
       # skip the .bin files under "Virtual Machines"
       RegexFile = ".*/Virtual Machines/.*.bin"
       Exclude = yes
     }
   }
 }

And here is c:\cmd\search-vm.ps1, which returns the paths for the given machine:

 Param(
     [string]$level,
     [string]$machine = "NOEXISTENTVM.example.com"
 )
 Import-Module failoverclusters

 $backuppath = @()
 $Cluster = Get-Cluster

 # find the cluster resource for the requested VM and extract its VmID
 $ClusterMachines = @()
 $ClusterMachines += Get-ClusterResource -Cluster $Cluster |
     where { $_.ResourceType -like "Virtual Machine" } |
     where { $_.Name -like "*$machine" } |
     select -Property OwnerNode, Name, @{
         Name = "VmID"
         Expression = { (Get-ClusterParameter -Cluster $Cluster -InputObject $_ |
             where { $_.Name -eq "VmID" } | select -Property Value).Value }
     }

 if ($ClusterMachines.Count -eq 0) {
     "NO MACHINES"
     exit 2
 }

 # collect the VM configuration path and the directories of all its virtual disks
 foreach ($ClusterMachine in $ClusterMachines) {
     $VM = Get-VM -ComputerName $ClusterMachine.OwnerNode -Id $ClusterMachine.VmID
     $path = $VM.Path.Replace('\','/')
     $backuppath += $path
     foreach ($HardDrive in $VM.HardDrives) {
         $drivepath = $HardDrive.Path | Split-Path -Parent
         $drivepath = $drivepath.Replace('\','/')
         if ($drivepath -notin $backuppath) {
             $backuppath += $drivepath
         }
     }
 }
 $backuppath
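
As mentioned above, the Director sees the whole cluster as one client at the cluster address. A minimal sketch of such a Client resource in bareos-dir.conf (the name, address, password and retention values here are made up for illustration):

 Client {
   Name = hvcluster-fd                 # hypothetical client name for the cluster
   Address = hvcluster.example.com     # the cluster address, not an individual node
   Password = "changeme"               # must match the Director entry in the nodes' bareos-fd.conf
   File Retention = 30 days
   Job Retention = 3 months
   AutoPrune = yes
 }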

Before the backup a snapshot (checkpoint) is created and after the backup it is deleted; for this there are a couple of borrowed, roughly adapted scripts.

Creation:

 #Copyright disclaimer:
 # Copyright (C) 2015, ITHierarchy Inc (www.ithierarchy.com). ALl rights reserverd.
 # This program is free software: you can redistribute it and/or modify
 # it under the terms of the GNU General Public License as published by
 # the Free Software Foundation, either version 3 of the License, or
 # (at your option) any later version.
 #
 # This program is distributed in the hope that it will be useful,
 # but WITHOUT ANY WARRANTY; without even the implied warranty of
 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 # GNU General Public License for more details.
 #
 # You should have received a copy of the GNU General Public License
 # along with this program. If not, see <http://www.gnu.org/licenses/>.
 Param(
     [string]$level,
     [string]$machine = "noexist.example.com",
     #[string]$prefix = "",
     [int]$DayOfWeekForFullBackup = 2
 )
 Import-Module failoverclusters

 "Processing $machine via $env:computername"

 $dow = [int]$(get-date).DayOfWeek
 if ($dow -eq $DayOfWeekForFullBackup){ $prefix="Weekly" }
 $DateStamp=$(((get-date)).ToString("yyyyMMddTHHmmss"))
 if ($level -eq "Full"){$Backup=" Bacula -*"}Else{$Backup=" Bacula -$level*"}

 #$HyperVPath="C:\Hyper-V" #Set path to your Hyper-V Machines to be backed up
 #Sort out Actual Volume path to VM
 #$VMDrive=$HyperVPath.Substring(0,1)
 #$volume=Get-Volume $VMDrive
 #$TrueHyperVPath=$($HyperVPath.Replace("$($VMDrive):\",$($Volume.path)))

 #Get List of VMs
 $Cluster = Get-Cluster
 # let's initialize it like array (for simplier size check)
 $ClusterMachines = @()
 $ClusterMachines += Get-ClusterResource -Cluster $Cluster |
     where { $_.ResourceType -like "Virtual Machine" } |
     where { $_.Name -like "*$machine" } |
     select -Property OwnerNode, Name, @{
         Name = "VmID"
         Expression = { (Get-ClusterParameter -Cluster $Cluster -InputObject $_ |
             where { $_.Name -eq "VmID" } | select -Property Value).Value }
     }

 if ($ClusterMachines.count -gt 1){
     "Ambiguous machine name"
     exit 2
 }
 if ($ClusterMachines.count -ne 1){
     "Machine not found: absent, not in failover cluster or something"
     exit 2
 }

 foreach ($ClusterMachine in $ClusterMachines){
     $VM = Get-VM -ComputerName $ClusterMachine.OwnerNode -Id $ClusterMachine.VmID
     write-host "Working on VM $($vm.Name) @ '$($vm.Path)'"
     $CurrentSnapShots = $VM | Get-VMSnapshot
     foreach ($SnapShot in $CurrentSnapShots){
         if ($SnapShot.Name -like ("$($prefix)Backup*")){
             write-host "Removing VM Checkpoint '$($SnapShot.Name)'"
             $SnapShot | Remove-VMSnapshot # -ComputerName $ClusterMachine.OwnerNode
             $LoopCount=0
             do {
                 Write-host "Waiting for snapshot '$($SnapShot.name)' to delete..."
                 Start-Sleep -s 10
                 $LoopCount=$LoopCount+1
             } while ($VM.Status -eq "Merging disks" -and $LoopCount -lt 30)
         }
     }
     $label = "$($prefix)Backup-$level-$DateStamp"
     write-host "Creating Checkpoint $label ($($VM.Name))"
     $VM | Checkpoint-VM -SnapshotName $label
 }

Removal:

 #Copyright disclaimer:
 # Copyright (C) 2015, ITHierarchy Inc (www.ithierarchy.com). ALl rights reserverd.
 # This program is free software: you can redistribute it and/or modify
 # it under the terms of the GNU General Public License as published by
 # the Free Software Foundation, either version 3 of the License, or
 # (at your option) any later version.
 #
 # This program is distributed in the hope that it will be useful,
 # but WITHOUT ANY WARRANTY; without even the implied warranty of
 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 # GNU General Public License for more details.
 #
 # You should have received a copy of the GNU General Public License
 # along with this program. If not, see <http://www.gnu.org/licenses/>.
 Param(
     [string]$level,
     [string]$machine = "noexist.example.com",
     [string]$vmmserver = "vldvmm.example.com"
 )
 Import-Module failoverclusters

 $Cluster = Get-Cluster
 $ClusterMachines = Get-ClusterResource -Cluster $Cluster |
     where { $_.ResourceType -like "Virtual Machine" } |
     where { $_.Name -like "*$machine" } |
     select -Property OwnerNode, Name, @{
         Name = "VmID"
         Expression = { (Get-ClusterParameter -Cluster $Cluster -InputObject $_ |
             where { $_.Name -eq "VmID" } | select -Property Value).Value }
     }

 # FIXME foreach
 foreach ($ClusterMachine in $ClusterMachines){
     $VM = Get-VM -ComputerName $ClusterMachine.OwnerNode -Id $ClusterMachine.VmID
     write-host "Working on VM $($vm.Name) @ '$($vm.Path)'"
     $CurrentSnapShots = $VM | Get-VMSnapshot
     foreach ($SnapShot in $CurrentSnapShots){
         if ($SnapShot.Name -like ("Backup*")){
             write-host "Removing VM Checkpoint '$($SnapShot.Name)'"
             $SnapShot | Remove-VMSnapshot # -ComputerName $ClusterMachine.OwnerNode
             $LoopCount=0
             do {
                 Write-host "Waiting for snapshot '$($SnapShot.name)' to delete..."
                 Start-Sleep -s 10
                 $LoopCount=$LoopCount+1
             } while ($VM.Status -eq "Merging disks" -and $LoopCount -lt 30)
         }
     }
 }


You can also choose not to delete the snapshot; then incremental backups become possible (the script has the beginnings of this functionality, see DayOfWeekForFullBackup).
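
The post does not show how these scripts are attached to the job. One plausible way, sketched here with made-up script and resource names, is the Client Run Before Job / Client Run After Job directives of the Job resource, with %l substituting the job level:

 Job {
   Name = "VM_lamachine"               # hypothetical
   JobDefs = "SundayTape"
   Client = hvcluster-fd
   FileSet = "VM_lamachine-fs"
   # create the checkpoint before the backup and remove it afterwards (script names are assumptions)
   Client Run Before Job = "C:/Windows/System32/WindowsPowerShell/v1.0/powershell.exe -file c:/cmd/create-checkpoint.ps1 -level %l -machine lamachine.example.com"
   Client Run After Job = "C:/Windows/System32/WindowsPowerShell/v1.0/powershell.exe -file c:/cmd/remove-checkpoint.ps1 -level %l -machine lamachine.example.com"
 }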

Tapes


We use tape libraries: a two-unit (2U) box with tape cartridges, one or two drives and an autochanger robot. Bacula, and by inheritance Bareos, gets along very well with tapes (better than with HDDs). What confused me: in one library Bareos discovered two autochangers, which should not happen. It turned out the device had been programmatically split into two "logical libraries" back in the days of fighting DPM. Go to the device's admin panel and disable the unnecessary one, and the system sees the correct number of changers: one. The devices can be listed with ls /dev/tape/by-id/ ; the entry with the "-nst" suffix is the tape drive, the one without it is the autochanger robot.
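
For reference, a rough sketch of how such a library could be described in bareos-sd.conf; the device paths and names here are illustrative, not taken from the post:

 Autochanger {
   Name = mylittlestorage
   Device = Drive-0
   Changer Device = /dev/tape/by-id/scsi-CHANGERID        # the entry without "-nst": the robot
   Changer Command = "/usr/lib/bareos/scripts/mtx-changer %c %o %S %a %d"
 }
 Device {
   Name = Drive-0
   Media Type = LTO
   Archive Device = /dev/tape/by-id/scsi-DRIVEID-nst      # the entry with "-nst": the drive
   AutoChanger = yes
   Automatic Mount = yes
   Removable Media = yes
   Random Access = no
 }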

Regarding using two drives to write to one pool (a set of volumes) in parallel: that would reduce the write time, but I did not do it. The backups fit into the chosen window anyway, while tape consumption could grow. But if anyone wants parallel writing, don't forget to set Prefer Mounted Volumes = No. Writing to two different pools at once works without problems or overhead.
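
For completeness, that directive sits in the Job resource; a tiny sketch with a hypothetical job name:

 Job {
   Name = "SomeParallelTapeJob"        # hypothetical
   ...
   Prefer Mounted Volumes = No
 }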

Scratch pools. I want to draw attention to them: it is from them that Bareos takes tapes to add to the pools it is about to write to. Bareos will not add an unknown tape to a working pool on its own. Therefore all new tapes go into the Scratch pool:

 label barcodes storage=mylittlestorage slot=1 pool=Scratch 

You can add tapes directly to a working pool instead of Scratch, but with several pools it is not always possible to predict how many tapes each of them will need. So let Bareos take them as the need arises.
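
A sketch of how the pools can be tied together (the working pool name and retention are illustrative): the working pool pulls fresh volumes from Scratch and sends recycled ones back there:

 Pool {
   Name = Scratch
   Pool Type = Backup
 }
 Pool {
   Name = WeeklyTape                   # hypothetical working pool
   Pool Type = Backup
   Storage = mylittlestorage
   Scratch Pool = Scratch              # take new volumes from here when needed
   Recycle Pool = Scratch              # and return expired, recycled volumes back
   Volume Retention = 2 months
   AutoPrune = yes
   Recycle = yes
 }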

Block and file sizes should be increased; this has a beneficial effect on tape speed. The manual essentially says that with LTO-4 and newer drives you should not be afraid to raise them considerably. So don't be shy.
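
Both knobs live in the sd Device resource. The values below are only an illustration of "noticeably bigger than the defaults"; keep in mind that the block size should be settled before volumes are written, since a volume is read with the block size it was written with:

 Device {
   Name = Drive-0
   ...
   Maximum Block Size = 1048576        # 1 MiB instead of the ~64K default
   Maximum File Size = 50G             # fewer file marks on the tape
 }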

So that Bareos is not tempted to write something onto a tape that is already sitting in a remote safe, it is better to change its status from Append to Used before taking it out:

 update volume=KYF389L6 volstatus=Used 

The same command can change other properties of a tape, for example move it to another pool:

 update volume=KYF389L6 pool=YetAnotherPool 

A tape is easy to steal (well, easier than an IBM DS8800), so encrypting the data is highly desirable. You can do this with the drive itself, but I prefer software solutions as more versatile and flexible. Just don't forget.
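
A minimal sketch of client-side data encryption in bareos-fd.conf; the paths are made up, and the master key is the thing you really must not forget to keep safe (only its public part goes on the clients, the private part stays offline):

 FileDaemon {
   Name = somehost-fd
   ...
   PKI Signatures = Yes
   PKI Encryption = Yes
   PKI Keypair = "/etc/bareos/somehost-fd.pem"   # this client's certificate and private key
   PKI Master Key = "/etc/bareos/master.cert"    # public master key only
 }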

It happens that Bareos has once written a label to a tape but no longer has a record of that tape in the database. Labeling it a second time does not work ("error: already labeled"); there is an add command, but in my case it caused problems after which the tape could not be used at all. So this one-liner was born (run in bash on the sd server, with bareos-sd itself stopped):

 mtx -f /dev/sg10 load 25 && mt -f /dev/st0 rewind && mt -f /dev/st0 weof && mt -f /dev/st0 rewind && mtx -f /dev/sg10 unload 

If the drive refuses to unload the tape, then:

 mt -f /dev/st0 offline 

Spooling


This is when data is written first to an SSD (or at least a fast HDD) and only then to tape. Compared with writing straight to tape, the drive's working time goes down (it writes faster from the spool), and so does the total run time when there are many jobs. With a single job the drive time still goes down, but the total time to complete the job increases.

To make it work, first specify the location and size of the spool on the sd side:

 Device {
   Name = Drive-0
   ...
   # how many jobs may use this device at once
   Maximum Concurrent Jobs = 20
   Spool Directory = /mnt/backup/spool
   Maximum Spool Size = 1950 G
   Maximum Job Spool Size = 1200 G
 }

and then enable it for specific jobs:

 JobDefs {
   Name = "SundayTape"
   ...
   Spool Data = Yes
 }

I keep this directive in the template for "tape" jobs; for "disk" jobs spooling is practically useless.

The principles are:


The spool size and the number of concurrent jobs depend on the total number of jobs, their sizes, the read speed (which may be limited by the network or by the spool itself), the tape write speed, the ratios of sizes and speeds across jobs, and human wishes (get results sooner, wear the drive less, or use fewer tapes). All of this could be tied together with hellish math, but I recommend tuning the spool by feel, because even the simplified rules are not simple:


The number of concurrently running jobs is limited in many places; search the documentation for "Concurrent Jobs =". I found it convenient to set a generously large number everywhere and then cap it at the required value on the specific device (sd Device).
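
A sketch of that idea with illustrative numbers: generous limits in the Director-side resources, the real cap on the sd Device:

 # bareos-dir.conf
 Director {
   Name = mydir
   ...
   Maximum Concurrent Jobs = 100
 }
 Storage {
   Name = mylittlestorage
   ...
   Maximum Concurrent Jobs = 100
 }

 # bareos-sd.conf
 Device {
   Name = Drive-0
   ...
   Maximum Concurrent Jobs = 20        # the limit that actually bites
 }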

About files


A Linux habit: crawl into the guts, pry things open and poke around. I wanted the same from Bareos file volumes, so that the right volume could be found and restored from even with the Director down. To that end I tried creating a new file for each job, with the job name in the file name. I then had to delete obsolete volumes with a cron script, and also make sure that every volume had a job in the database and vice versa. Bareos quickly started sprouting eerie crutches, so it was decided to abandon human-readable naming and use meaningless names plus Recycle (purging and reusing a file for another job). You cannot get anywhere without the Director anyway: if you lose the backup server, it is the first thing you restore.

IBM, by the way, recommends storing one job per file, and so far I agree with them.
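
A sketch of a disk pool along these lines (names and retention are illustrative): automatically labeled volumes with meaningless names, one job per file, recycled when expired:

 Pool {
   Name = FilePool                     # hypothetical
   Pool Type = Backup
   Label Format = "File-"              # generated names like File-0001, nothing human-readable
   Maximum Volume Jobs = 1             # one job per file volume
   Volume Retention = 1 month
   AutoPrune = yes
   Recycle = yes                       # purge and reuse the file for another job
 }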

Monitoring


The built-in reporting facilities are a bit^W^W also had to be supplemented with scripts. The most used one returns the status of the last run of every job.

 #!/bin/bash
 RED='\033[0;31m'
 NC='\033[0m' # No Color
 GREEN='\033[0;32m'
 YELLOW='\033[0;33m'
 JOBS=`su - postgres -c "psql -d bareos -c \"WITH summary AS (
     SELECT name, jobstatus, jobid,
            ROW_NUMBER() OVER(PARTITION BY name ORDER BY starttime DESC) AS rk
     FROM job p
     WHERE starttime > current_date - INTERVAL '5 days')
   SELECT s.* FROM summary s WHERE s.rk=1;\"" | grep "1$" | sed 's/ //g'`
 #echo "$JOBS"
 for job in $JOBS; do
   jobstatus=`echo $job | cut -d '|' -f2`
   jobname=`echo $job | cut -d '|' -f1`
   jobid=$(echo $job | cut -d '|' -f3)
   if [ "$jobstatus" == "R" ]; then
     printf "%-30s" "$jobname ($jobid)"
     echo -e "$YELLOW running$NC ($jobstatus)"
   elif [ "$jobstatus" == "W" ]; then
     printf "%-30s" "$jobname ($jobid)"
     echo -e "$YELLOW warning$NC ($jobstatus)"
   elif [ "$jobstatus" == "T" ]; then
     if [[ $1 == "printall" ]]; then
       printf "%-30s" "$jobname ($jobid)"
       echo -e "$GREEN OK$NC ($jobstatus)"
     fi
   else
     printf "%-30s" "$jobname ($jobid)"
     echo -e "$RED failed$NC ($jobstatus)"
   fi
 done

The original version of the script also checked whether encryption actually happened, but that turned out to be unnecessary: if there are problems with encryption the job ends with a fatal error, which is visible from the status anyway.

There is a version for Zabbix:
 #!/bin/bash
 # returns a numeric status of the last run of job $1 for Zabbix;
 # jobfiles/jobbytes are selected too so the "empty job" check below actually works
 JOBS=`su - postgres -c "psql -d bareos -c \"
   SELECT name, starttime, jobfiles, jobbytes, jobstatus
   FROM job p
   WHERE starttime > current_date - INTERVAL '62 days' AND name = '$1'
   ORDER BY starttime DESC LIMIT 1;\"" | sed 's/ //g' | grep "|.$"`
 for job in $JOBS; do
   jobname=`echo $job | cut -d '|' -f1`
   jobfiles=`echo $job | cut -d '|' -f3`
   jobbytes=`echo $job | cut -d '|' -f4`
   jobstatus=`echo $job | cut -d '|' -f5`
   if [ "$jobstatus" == "E" ] || [ "$jobstatus" == "f" ]; then
     #echo "Job $jobname failed ($jobstatus)."
     echo "3"
     exit
   elif [ "$jobstatus" == "W" ]; then
     #echo "Job $jobname with warning ($jobstatus)."
     echo "1"
     exit
   elif [ "$jobstatus" != "T" ] && [ "$jobstatus" != "R" ]; then
     #echo "Job $jobname not ok ($jobstatus)."
     echo "2"
     exit
   elif [ "$jobfiles" == 0 ] || [ "$jobbytes" == 0 ]; then
     #echo "Job $jobname is empty."
     echo "4"
     exit
   else
     echo "0"
     exit
   fi
 done

The discovery script (LLD) is tied to the bconsole output format and can easily break, but so far it works. The JSON is also assembled by hand, but for now that works too.

 #!/bin/bash
 FIRST=true
 JOBS=$(echo "show jobs" | bconsole | grep "^ *Name = \|Enabled = no" | sed 'N;/\n Enabled = no/d;P;D' | grep -v -e "-test\"$" | cut -d'=' -f 2 | grep -o "[a-zA-Z0-9_-]*")
 echo '{ "data": ['
 for job in $JOBS; do
   if [ "$FIRST" = false ]; then
     echo -n ","
   fi
   FIRST=false
   echo ""
   echo " {"
   echo "  \"{#JOBNAME}\": \"$job\""
   echo -n " }"
 done
 echo ' ] }'


I also love stacked graphs in Zabbix, for example the amount of tape occupied by different jobs:



It is evident that it is time to split the blue job into several smaller ones.

Nice little things



Future achievements



Ask questions: the software is decent, though it has a character of its own, and I want to contribute to its adoption.

And do not forget to check your backups.

Source: https://habr.com/ru/post/275259/

