
Amazon Glacier: a Perl client with multithreaded / multipart upload




In short, Amazon Glacier is a service with a very attractive price, designed for storing archives and backups. Restoring archives, however, is fairly complicated and/or expensive, so the service is best suited for secondary backups.
Glacier has already been covered in more detail on Habr.

What this post is about


I want to share an open-source Perl client for synchronizing a local directory with Glacier, point out some nuances of working with Glacier, and describe how the client works.

Features


So, meet mtglacier (GitHub link).
Features of the program:


Project goals


Combine four things in one program (each of them exists separately elsewhere, but not all together):

  1. Implementation in Perl - I believe the language/technology a program is written in also matters to the end user/administrator, so it is good to have a choice of implementations in different languages.
  2. Amazon S3 support is definitely planned.
  3. Multipart and multithreaded operations -
    multipart upload avoids the situation where you have uploaded several gigabytes to a remote server and the connection suddenly drops (only the failed part has to be resent). Multithreading speeds up transfers in general, and significantly speeds up uploading lots of small files or deleting a large number of files.
  4. Its own implementation of the protocol - the plan is to make the code reusable and publish it as separate modules on CPAN.
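The multipart and multithreading ideas from item 3 can be sketched in a few lines. This is a minimal Python illustration, not mtglacier's Perl code: `plan_parts` splits a file into byte ranges using Glacier's rule that the part size is 1 MiB times a power of two (at most 4 GiB), and `upload_all` sends parts in a thread pool, retrying only the part that failed. The `upload_part` callable is a hypothetical stand-in for the real per-part upload request.

```python
import concurrent.futures

MIB = 1024 * 1024

def plan_parts(file_size: int, part_size: int) -> list[tuple[int, int]]:
    """Split a file into (start, end_exclusive) byte ranges for a multipart upload."""
    # Glacier requires the part size to be 1 MiB times a power of two, up to 4 GiB
    n = part_size // MIB
    if part_size % MIB or n < 1 or n & (n - 1) or part_size > 4096 * MIB:
        raise ValueError("part size must be 1 MiB * 2^k, at most 4 GiB")
    return [(start, min(start + part_size, file_size))
            for start in range(0, file_size, part_size)]

def upload_all(parts, upload_part, concurrency=3, retries=3):
    """Upload parts in parallel (like --concurrency); a failed part is retried on its own.

    upload_part is a hypothetical callable that sends one byte range.
    """
    def worker(rng):
        for attempt in range(1, retries + 1):
            try:
                return upload_part(rng)
            except OSError:
                # Only this part is resent; parts already uploaded are kept
                if attempt == retries:
                    raise

    with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as pool:
        # pool.map returns results in the same order as the parts
        return list(pool.map(worker, parts))
```

With 2 MiB parts, a 5 MiB file becomes three ranges; if the connection breaks while sending the first one, only that range is sent again.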


How it works



When synchronizing files to the service, mtglacier writes a log (a text file, set with the --journal option) that records every upload operation: for each one, the local file name, the upload time, the file's Tree Hash, and the archive_id Glacier returned for the file.
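The Tree Hash stored in the log is the checksum format Glacier itself defines: SHA-256 over each 1 MiB chunk, then adjacent digests are concatenated and hashed pairwise until a single digest remains. Here is a minimal Python sketch of that algorithm (not mtglacier's Perl implementation); treating empty input as one empty chunk is a choice made for this sketch.

```python
import hashlib

CHUNK = 1024 * 1024  # Glacier tree hashes use 1 MiB leaf chunks

def tree_hash(data: bytes) -> str:
    """Return the hex SHA-256 tree hash of data."""
    # Leaf level: SHA-256 of each 1 MiB chunk (empty input is hashed as one empty chunk here)
    level = [hashlib.sha256(data[i:i + CHUNK]).digest()
             for i in range(0, len(data), CHUNK)] or [hashlib.sha256(b"").digest()]
    # Combine adjacent pairs until one digest remains; an odd last node is promoted unchanged
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                nxt.append(hashlib.sha256(level[i] + level[i + 1]).digest())
            else:
                nxt.append(level[i])
        level = nxt
    return level[0].hex()
```

For files up to 1 MiB the tree hash equals the plain SHA-256 of the file; larger files differ, which is why a stored tree hash can be checked part by part.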

When restoring files from Glacier to local disk, the data needed for recovery is taken from this log, because a listing of the files stored in Glacier can only be obtained with a delay of four hours or more.

When files are deleted from Glacier, deletion entries are added to the log. On re-synchronization, only the files that, according to the log, are not yet in Glacier are uploaded.
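The log-driven re-sync can be sketched by replaying entries in order. This Python sketch assumes a hypothetical tab-separated format (`CREATE<TAB>archive_id<TAB>tree_hash<TAB>path` and `DELETE<TAB>archive_id<TAB>path`); mtglacier's real journal format may differ.

```python
def replay_journal(lines):
    """Replay log entries in order; return {relative_path: archive_id} for files still in Glacier.

    The CREATE/DELETE line format used here is hypothetical.
    """
    current = {}
    for line in lines:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        op = fields[0]
        if op == "CREATE":
            _, archive_id, _tree_hash, path = fields
            current[path] = archive_id
        elif op == "DELETE":
            _, archive_id, path = fields
            # Forget the path only if the deleted archive is the one recorded for it
            if current.get(path) == archive_id:
                del current[path]
    return current

def files_to_upload(local_files, journal_lines):
    """On re-sync, upload only local files the log does not list as present in Glacier."""
    in_glacier = replay_journal(journal_lines)
    return sorted(set(local_files) - set(in_glacier))
```

Because only the log is consulted, no slow Glacier inventory request is needed to decide what to upload.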

File recovery is a two-pass procedure:
  1. The first pass creates retrieval jobs for files that are present in the log but missing from the local disk.
  2. After waiting four hours or more, a second command downloads these files.
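The two passes can be sketched as two small Python functions. This assumes the log has already been reduced to a hypothetical `{path: archive_id}` dict of files currently in Glacier; the cap mirrors the --max-number-of-files option, and `completed_archive_ids` stands in for whatever job-status check the real client performs.

```python
def restore_pass_one(journal_state, local_files, max_number_of_files):
    """Pass 1: select files the log says are in Glacier but are missing locally.

    journal_state is a hypothetical {path: archive_id} dict built from the log.
    """
    missing = sorted(set(journal_state) - set(local_files))
    # Cap the batch size, as --max-number-of-files does
    return [(path, journal_state[path]) for path in missing[:max_number_of_files]]

def restore_pass_two(requested, completed_archive_ids):
    """Pass 2 (hours later): only archives whose retrieval jobs have completed can be downloaded."""
    return [path for path, archive_id in requested if archive_id in completed_archive_ids]
```

Files whose jobs are not finished yet simply stay pending until the second command is run again.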


Cautions




How to use


  1. Create a vault in the Amazon console
  2. Create glacier.cfg (specify the same region in which the vault was created):
     key = YOURKEY
     secret = YOURSECRET
     region = us-east-1
    

  3. Sync the local directory to Glacier. The --concurrency parameter sets the number of threads:
    ./mtglacier.pl sync --config=glacier.cfg --from-dir /data/backup --to-vault=myvault --journal=journal.log --concurrency=3

  4. You can add more files and run sync again
  5. Check file integrity (local files are checked only against the log):
    ./mtglacier.pl check-local-hash --config=glacier.cfg --from-dir /data/backup --to-vault=myvault --journal=journal.log


  6. You can delete some files from /data/backup
  7. Create a restore task. The --max-number-of-files parameter sets how many archives to restore. For now, values above a few dozen are not recommended, because loading more than one page of the current job list is not implemented yet:
    ./mtglacier.pl restore --config=glacier.cfg --from-dir /data/backup --to-vault=myvault --journal=journal.log --max-number-of-files=10

  8. Wait 4 hours or more
  9. Restore the deleted files:
    ./mtglacier.pl restore-completed --config=glacier.cfg --from-dir /data/backup --to-vault=myvault --journal=journal.log

  10. If the backup is no longer needed, delete all files from Glacier:
    ./mtglacier.pl purge-vault --config=glacier.cfg --from-dir /data/backup --to-vault=myvault --journal=journal.log
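The glacier.cfg from step 2 is a plain `key = value` file. Below is a minimal Python sketch of reading such a file and checking for the three settings shown; it assumes one setting per line and no comment syntax, which may not match mtglacier's own parser.

```python
def parse_config(text):
    """Parse 'key = value' lines into a dict, ignoring blank lines (assumed format)."""
    config = {}
    for lineno, line in enumerate(text.splitlines(), 1):
        line = line.strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"line {lineno}: expected 'key = value'")
        config[key.strip()] = value.strip()
    # The three settings from step 2; region must match the vault's region
    for required in ("key", "secret", "region"):
        if required not in config:
            raise ValueError(f"missing required setting: {required}")
    return config
```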
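Step 7 notes that only one page of the job list is loaded. Glacier's ListJobs response carries a `JobList` and, when more results remain, a `Marker` to pass into the next request. A minimal Python sketch of following that marker; `fetch_page` is a hypothetical callable that performs one ListJobs request for the given marker.

```python
def list_all_jobs(fetch_page):
    """Collect jobs across every page by following the pagination marker.

    fetch_page(marker) is hypothetical: it returns one ListJobs-style response dict.
    """
    jobs, marker = [], None
    while True:
        page = fetch_page(marker)
        jobs.extend(page["JobList"])
        marker = page.get("Marker")
        # A missing or null marker means this was the last page
        if not marker:
            return jobs
```

With this loop in place, the number of pending restore jobs would no longer be limited to what fits on the first page.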



Implementation



What is missing


The priority for the first beta was to have a stable (not alpha) version ready by the end of the week, so a lot of things are still missing.
Will definitely be added:

Source: https://habr.com/ru/post/150324/

