As you know, all people are divided into two types: those who do not yet make backups, and those who already do. For those just starting out, the first question is usually how to archive the data. We will not consider the simple options (manually burning discs, copying entire directories to other servers): they offer very limited facilities for indexing and searching archived files. Instead, we turn to automatic backup systems, in particular bacula. This article does not address the question of why bacula; the main reasons are that it is distributed under a free license, is available for a great many platforms, and is very flexible.
The second question, after choosing the backup system, is where to store the backups. Bacula can use tape drives and CDs, and can write archives to FIFO devices and to regular files. A tape drive is convenient on corporate servers where there is permanent physical access to the hardware. Storing archives in files works when the volume of archives does not exceed the capacity of your hard drives; for reliable storage it is also advisable to build a RAID array with redundancy, or even several physical backup servers, preferably in different rooms. Otherwise, all of this lasts only until the first fire. Burning discs is the homemade option, whose main disadvantage is the need to regularly insert fresh blanks. We will set up bacula to archive data to Amazon S3.
Amazon S3 is file storage in the Amazon cloud; it is very inexpensive and suitable for archiving any servers that are permanently connected to the Internet. These can be home computers, office servers, or servers hosted in data centers.
So, let's proceed to the setup. Our installation archives an office network and a small cluster in a data center.
Director
Bacula is organized so that the central role is played by the main server, the Director. Its job is to store information about the entire archiving system, start the necessary jobs on schedule, and log the results of their execution. We install the bacula-director-mysql package (as it is called in Debian) on the office network server.
Bacula has excellent configuration documentation, so I will focus only on the key settings. The Catalog section contains the address of the MySQL server where the archive index will be stored, along with the database name, user name, and password. In the Messages section we put the administrator's e-mail address, where reports on job results will be sent:
mailcommand = "/usr/lib/bacula/bsmtp -h smtp.example.com -f \"\(Bacula\) %r\" -s \"Bacula daemon message\" %r"
mail = sysadmin@example.com = all, !skipped
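For completeness, the Catalog section mentioned above might look like this minimal sketch (the database name, user, and password are placeholders; adjust them to your MySQL setup):

Catalog {
  Name = MyCatalog
  DB Address = localhost
  dbname = "bacula"
  dbuser = "bacula"
  dbpassword = "catalog-db-password"
}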
Storage
Bacula has a distributed architecture. The servers that host the media onto which archives are written are called Storage servers, and it is they that will interact with Amazon S3. In our system we needed two of them: one installed in the office, on the same server as the Director (all data from office computers will be backed up there), the other on one of the servers in the data center (data from the other machines in the data center will go there). The point of this separation is to avoid pushing paid traffic between the data center and the office.
After signing up for Amazon AWS and creating a bucket with a name of your choice, you will have the following account details in hand: the bucket name, an access_key, and a secret_key. The next step is to use the s3fs FUSE module to mount the bucket directly into the server's file system. s3fs is not part of the Debian distribution, so it has to be built manually. For some reason the fresh versions did not work right away (there were hangs after deleting files, and the file system had to be remounted), but version 1.16 worked from the start without a single problem. Add the mount to the server's startup:
s3fs your-bucket-name /mnt/backup -o allow_other,retries=10,connect_timeout=30,readwrite_timeout=30
And create the file /etc/passwd-s3fs with your S3 credentials:
access_key:secret_key
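Recent versions of s3fs refuse a credentials file that is readable by other users; in any case, restrict its permissions right away (a small step that is easy to forget):

chmod 600 /etc/passwd-s3fs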
Once the file system is mounted, we install the bacula-sd package and teach bacula to save archive files to S3. In the bacula-sd.conf configuration file, in the Storage section, specify a Name that is easy to identify, for example Name = companyname-office-sd, and Maximum Concurrent Jobs = 1, so that there are no simultaneous accesses to different files over S3. In the Director section, write the name of our Director server and some random password. Then we describe a Device resource, the actual physical storage for the archives:
Device {
  Name = S3-companyname-office
  Media Type = File
  Archive Device = /mnt/backup/office
  Label Media = yes
  Random Access = yes
  Automatic Mount = yes
  Removable Media = no
  Always Open = no
}
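Putting it together, the remaining resources of bacula-sd.conf might look like this sketch (the Director name and password are placeholders and must match what the Director's own config uses):

Storage {
  Name = companyname-office-sd
  SDPort = 9103
  Working Directory = /var/lib/bacula
  Pid Directory = /var/run/bacula
  Maximum Concurrent Jobs = 1
}

Director {
  Name = companyname-dir
  Password = "storage-server-password"
}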
Now let's return to the Director server for a minute and in its config we will create a new Storage section:
Storage {
  Name = S3-companyname-office
  Address = storage.server.domain.name
  SDPort = 9103
  Password = "storage-server-password"
  Device = S3-companyname-office
  Media Type = File
  Maximum Concurrent Jobs = 1
}
The Password here must be the same as the one in the Director section of the Storage server's configuration.
File daemon
The File Daemon is the daemon installed on the computers that need to be backed up. We install the bacula-fd package on all machines of the cluster and on the office computers that hold valuable data, and edit its bacula-fd.conf. In the Director section we put the identifier of our Director server and invent another random password (see the sketch below). That completes its configuration, and we return to the Director server once more to register the new client.
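A minimal bacula-fd.conf might look like this (the Director name, client name, and password are placeholders):

Director {
  Name = companyname-dir
  Password = "file-server-password"
}

FileDaemon {
  Name = server-name-fd
  FDport = 9102
  Working Directory = /var/lib/bacula
  Pid Directory = /var/run/bacula
}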
In the Director config we create a new section:
Client {
  Name = server-name-fd
  Address = this.server.host.name
  FDPort = 9102
  Catalog = MyCatalog
  Password = "file-server-password"
  Maximum Concurrent Jobs = 1
  AutoPrune = yes
  Job Retention = 365 days
}
The Password must match the one in the Director section of the File Daemon's config.
An important parameter is Job Retention. After this time, old data is removed from the archive index. The longer this interval, the longer your old archive files remain unoverwritten, and the more money you pay for Amazon S3. The shorter it is, the cheaper your backups, but the shallower their history. In addition, make sure you run full (Full) backups more often than the Job Retention interval; otherwise there will be moments when the old data has already been dropped from the index and the new data has not yet been archived.
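As an illustration, a Schedule resource in the Director config in the style of Bacula's stock example (the name and timings are placeholders; this sketch runs fulls monthly, well within the constraint, so stretch the intervals to taste while keeping them shorter than the retention periods):

Schedule {
  Name = "BackupCycle"
  Run = Full 1st sun at 23:05
  Run = Differential 3rd sun at 23:05
  Run = Incremental mon-sat at 23:05
}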
Freeing up space
Note that the physical data is not erased from the archive even when references to it are removed from the index after the Retention interval expires. Removal from the index merely means that new data may be written over the old files. Disk space is not released, and you will not pay less even if you purge the entire index. Physically, files can either be deleted manually, after making sure the index no longer references them, or you can install a fresh version of bacula, which can automatically truncate freed volumes. In practice this is rarely needed, and we do not use the feature.
Firewall setup
During operation bacula requires three kinds of TCP connections: from the Director to Storage on port 9103, from the Director to the File Daemon on port 9102, and from the File Daemon to Storage on port 9103. Firewalls must be open in all these directions, and the addresses given in the Address parameters of the Storage and File Daemon definitions must be reachable along them. In particular, if your Storage server sits inside a local network, then the Director and the File Daemons that archive to it should be inside the same network; if for some reason you need to keep the Storage inside the network while backing up machines outside it, configure forwarding so that the corresponding port is passed through to it.
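As a sketch, the corresponding iptables rules might look like this (the source network 192.0.2.0/24 and the Director address 192.0.2.10 are placeholders for your own):

# on the Storage server: let the Director and File Daemons reach bacula-sd
iptables -A INPUT -p tcp --dport 9103 -s 192.0.2.0/24 -j ACCEPT
# on each backed-up machine: let the Director reach bacula-fd
iptables -A INPUT -p tcp --dport 9102 -s 192.0.2.10 -j ACCEPT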
S3fs features
When writing to any file, s3fs downloads the entire previous version of the file to the computer, all modifications happen in the local copy, and after the file is closed it is uploaded back to S3 in full. This means that appending even a few bytes to a 500-megabyte archive file results in a gigabyte transferred over the network. Since traffic to Amazon S3 is billed, this peculiarity of s3fs should not be forgotten, and volume files should not be too large, so that the downloads and re-uploads stay small. We have three pools on each Storage server:
- Full backups: Maximum Volume Bytes = 500000000, Volume Retention = 12 months;
- Diff backups: Maximum Volume Bytes = 300000000, Volume Retention = 7 months;
- Incremental backups: Maximum Volume Bytes = 100000000, Volume Retention = 2 months.
Note that with these settings differential backups must be made at least once every 2 months (we do them monthly), otherwise there will be times when the incremental backups have already been deleted but no fresh differential exists yet. Similarly, full backups must be made at least once every 7 months (we do them every 6).
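A Pool resource in the Director config for, say, the Full pool might look like this sketch (the name and Label Format are placeholders; the numbers are the ones above):

Pool {
  Name = Full-S3
  Pool Type = Backup
  Recycle = yes
  AutoPrune = yes
  Volume Retention = 12 months
  Maximum Volume Bytes = 500000000
  Label Format = "Full-"
}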
In addition, since s3fs needs somewhere to keep its local copies, there must always be free space on the file system for them. Where the local copies go is set by the use_cache mount option; by default it is the root file system. The more files are open simultaneously, the more space is required there, which is why we limit both Maximum Concurrent Jobs on a Storage server (so that too many files are not held open) and Maximum Volume Bytes. If the space runs out, s3fs hangs and waits for space to be freed.
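For example, to keep the cache on a dedicated partition instead of the root file system, the mount command from above could be extended like this (the cache path is a placeholder):

s3fs your-bucket-name /mnt/backup -o use_cache=/var/cache/s3fs,allow_other,retries=10,connect_timeout=30,readwrite_timeout=30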
Cost
Storage prices range from $0.037 per gigabyte per month (for large volumes with reduced durability of 99.99%) to $0.14 per gigabyte per month (for small volumes with standard durability of 99.999999999%). We chose standard durability and store about a terabyte of archives, which comes, together with the cost of traffic, to about $180 a month.
Security
Here is a small security checklist:
- remove world read access from the bacula configs and from /etc/passwd-s3fs, since disclosing the passwords stored there is extremely dangerous;
- restrict access to ports 9101, 9102 and 9103, except for the directions required for bacula operation;
- configure bacula to use TLS when transmitting data over public networks (see the sketch after this list);
- close directories with archives from prying eyes;
- encrypt the data that ends up on Amazon S3, for example with a FUSE encryption layer;
- use separate S3 buckets for different Storage servers, so that if one archive is compromised the attacker does not gain access to the others;
- regularly perform test restores to make sure you have not forgotten to include something important in the archives.
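For the TLS item, a minimal sketch of the relevant directives (the certificate paths and names are placeholders): the same block, with the appropriate certificate and key, goes into the resources on both sides of each connection, for example into the Director section of bacula-fd.conf:

Director {
  Name = companyname-dir
  Password = "file-server-password"
  TLS Enable = yes
  TLS Require = yes
  TLS CA Certificate File = /etc/bacula/ssl/ca.pem
  TLS Certificate = /etc/bacula/ssl/fd-cert.pem
  TLS Key = /etc/bacula/ssl/fd-key.pem
}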