
Archiving and restoring indexes in Elasticsearch

One fine morning we faced the task of archiving Elasticsearch indexes. I wanted to see neat rows of compressed files in storage, one file per index.



Out of the box, Elastic does not offer such a solution, at least not in version 5.x. After consulting Google the Almighty, we decided to reinvent the wheel and build our own tool. A little clumsy, perhaps, but home-grown.








So, given:

A server running Linux. On it, Elasticsearch 5.x is installed, configured and running, storing indexes with names like project_name-yyyy.mm.dd.
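Before anything else, it is handy to check which indexes are actually there; the standard _cat API does the job (the index pattern here simply matches the naming scheme above):

curl -s 'http://localhost:9200/_cat/indices/project_name-*?v'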



Task:

Archive each index older than N days, compressing each one into a separate file so that any particular index can be restored for the desired date. Once an index has been archived, remove it from Elasticsearch.



Solution



Archiving indexes



To work with the indexes, we need Curator no older than version 5.0.

We will pull indexes out of Elasticsearch through snapshots. So first make sure that the Elasticsearch configuration file (usually /etc/elasticsearch/elasticsearch.yml) contains the path to the file-system repository:



path.repo: /opt/elasticsearch/snapshots 


If there is no such line, add it and restart Elasticsearch.
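To apply the change and make sure the path was picked up (assuming a systemd-based installation; the service name may differ on your distribution):

sudo systemctl restart elasticsearch
curl -s 'http://localhost:9200/_nodes/settings?pretty' | grep repo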



Next, create a repository in which we will place snapshots:



mkdir -p /opt/elasticsearch/snapshots/repository

curl -XPUT 'http://localhost:9200/_snapshot/repository' -H 'Content-Type: application/json' -d '
{
    "type": "fs",
    "settings": {
        "location": "repository",
        "compress": true
    }
}'


And right away create another repository, called “recovery”; we will need it later to restore indexes:



mkdir -p /opt/elasticsearch/snapshots/recovery

curl -XPUT 'http://localhost:9200/_snapshot/recovery' -H 'Content-Type: application/json' -d '
{
    "type": "fs",
    "settings": {
        "location": "recovery",
        "compress": true
    }
}'
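To verify that both repositories are registered, query the snapshot API:

curl -s 'http://localhost:9200/_snapshot/_all?pretty'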


Now, the archiving script itself. The indexes .kibana and elastalert_status are excluded from the list of indexes to archive. The variable values set in the header of the script could, of course, be pulled out and passed in as arguments; a matter of taste.



The script stores the archives in a local folder, which is, of course, not the best practice; it is better to put the archives somewhere far away, in a safe place.



The archiving process is logged; the log file is set by the $LOG variable. Do not forget to configure rotation for the log file.
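A minimal logrotate rule, for example, could look like this (a sketch, assuming logrotate is installed; place it in /etc/logrotate.d/elasticsearch_backup and tune to taste):

/var/log/elasticsearch/elasticsearch_backup.log {
    weekly
    rotate 8
    compress
    missingok
    notifempty
}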



The logic of the script is described in its comments. Do not forget to adjust the variable values if your settings differ from the defaults.



#!/bin/bash

DAYS=31     # indexes older than this many days will be archived
SNAPSHOT_DIRECTORY="/opt/elasticsearch/snapshots"
BACKUP_DIR="/opt/elasticsearch/elasticsearch_backup"
REPOSITORY="repository"
LOG="/var/log/elasticsearch/elasticsearch_backup.log"
DATE=`date`

# Create the backup folder if it does not exist yet
if ! [ -d $BACKUP_DIR ]; then
  mkdir -p $BACKUP_DIR
fi

# Get the list of indexes older than $DAYS days
INDICES=`curator_cli --config /etc/elasticsearch/curator-config.yml --host localhost --port 9200 show_indices --filter_list "[{\"filtertype\":\"age\",\"source\":\"creation_date\",\"direction\":\"older\",\"unit\":\"days\",\"unit_count\":\"$DAYS\"},{\"filtertype\":\"kibana\",\"exclude\":\"True\"},{\"filtertype\":\"pattern\",\"kind\":\"regex\",\"value\":\"elastalert_status\",\"exclude\":\"True\"}]"`

# Check whether the list of indexes was retrieved successfully
TEST_INDICES=`echo $INDICES | grep -q -i "error" && echo 1 || echo 0`

if [ $TEST_INDICES == 1 ]
then
  echo "$DATE error getting the list of indexes" >> $LOG
  exit
else
  # Walk through the list of indexes in $INDICES one by one
  for i in $INDICES
  do
    # Make a snapshot of index $i
    curator_cli --config /etc/elasticsearch/curator-config.yml --timeout 600 --host localhost --port 9200 snapshot --repository $REPOSITORY --filter_list "{\"filtertype\":\"pattern\",\"kind\":\"regex\",\"value\":\"$i\"}"
    # Get the name of the snapshot of index $i
    SNAPSHOT=`curator_cli --config /etc/elasticsearch/curator-config.yml --host localhost --port 9200 show_snapshots --repository $REPOSITORY`
    # Pack the snapshot files into an archive and move it to the backup storage
    cd $SNAPSHOT_DIRECTORY/$REPOSITORY && tar cjf $BACKUP_DIR"/"$i".tar.bz" ./*
    # Delete the snapshot
    curator_cli --config /etc/elasticsearch/curator-config.yml --host localhost --port 9200 delete_snapshots --repository $REPOSITORY --filter_list "{\"filtertype\":\"pattern\",\"kind\":\"regex\",\"value\":\"$SNAPSHOT\"}"
    # Delete the index
    curator_cli --config /etc/elasticsearch/curator-config.yml --host localhost --port 9200 delete_indices --filter_list "{\"filtertype\":\"pattern\",\"kind\":\"regex\",\"value\":\"$i\"}"
    # Clean up the snapshot repository folder
    rm -rf $SNAPSHOT_DIRECTORY/$REPOSITORY/*
  done
fi
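The script expects connection settings for curator_cli in /etc/elasticsearch/curator-config.yml. That file is not shown above, so here is a minimal sketch of what it might contain for this setup (Curator 5 client config format; values are assumptions matching the rest of the article):

client:
  hosts:
    - localhost
  port: 9200
  timeout: 600

logging:
  loglevel: INFO
  logfile: /var/log/elasticsearch/curator.log
  logformat: default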


Deleting obsolete archives



Fine! We now have index archives in storage, but they too need periodic cleanup. Let's create a separate script for this, /opt/elasticsearch/delete_archives.sh



#!/bin/bash

# Delete archives older than $DAYS days.
# NB: the script expects file names in which a "-" separates the index name
# from a date in "yyyy.mm.dd" format.
# Example: aaa_bbb.ccc-yyyy.mm.dd.tar.bz

DAYS=91
BACKUP_DIR="/opt/elasticsearch/elasticsearch_backup"

# Compute the threshold date
THRESHOLD=$(date -d "$DAYS days ago" +%Y%m%d)
#echo "THRESHOLD=$THRESHOLD"

FILES=`ls -1 $BACKUP_DIR`

# Turn each file name into a "yyyymmdd filename" pair
TODELETE=`for i in $FILES; do echo $i | awk -F- '{printf "%s\n",$2 ;}' | awk -F. '{printf "%s%s%s \n",$1,$2,$3 ;}' | sed "s/$/$i/"; done`

# Delete every file whose date is at or below the threshold
echo -e "$TODELETE" |\
while read DATE FILE
do
  [[ $DATE -le $THRESHOLD ]] && rm -rf $BACKUP_DIR/$FILE
done
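To see what the pipeline building $TODELETE actually does, you can run it by hand on a sample name (the file name below is made up):

echo "project_name-2018.01.15.tar.bz" | awk -F- '{printf "%s\n",$2 ;}' | awk -F. '{printf "%s%s%s \n",$1,$2,$3 ;}'
# prints "20180115", which is then compared numerically with $THRESHOLD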


Well, now all that remains is to add the scripts to cron. We run them once a day, at night, and redirect the script output to the log file.



0 1 * * * /bin/bash /opt/elasticsearch/backup_snapshot.sh >> /var/log/elasticsearch/elasticsearch_backup.log
0 3 * * * /bin/bash /opt/elasticsearch/delete_archives.sh >> /var/log/elasticsearch/elasticsearch_backup.log


Restoring indexes from an archive



The process of restoring an index is also simple. For convenience, we wrap it in a script that takes the archive file name as its argument. For the script to work, the jq utility must be installed. The recovery repository folder /opt/elasticsearch/snapshots/recovery and the corresponding “recovery” repository in Elasticsearch were created earlier.



#!/bin/bash

# The archive file name is passed as an argument
ARCHIVE=$1
BACKUP_DIR="/opt/elasticsearch/elasticsearch_backup"
RECOVERY_DIR="/opt/elasticsearch/snapshots/recovery/"

# Clean up the recovery repository folder
rm -rf $RECOVERY_DIR/*

# Unpack the archive into the recovery repository folder
tar xjf $BACKUP_DIR/$ARCHIVE -C $RECOVERY_DIR

# Put the snapshot name into $SNAPSHOT
SNAPSHOT=`curl -s -XGET "localhost:9200/_snapshot/recovery/_all?pretty" | jq '.snapshots[0].snapshot' | sed 's/\"//g'`

# Restore the index from the snapshot
curl -XPOST "localhost:9200/_snapshot/recovery/$SNAPSHOT/_restore?pretty"

# Give Elasticsearch some time to open the restored index
sleep 30

# Delete the snapshot from the recovery repository
curl -XDELETE "localhost:9200/_snapshot/recovery/$SNAPSHOT?pretty"

# Clean up the recovery repository folder
rm -rf $RECOVERY_DIR/*
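Usage then looks like this (the script path and archive name here are illustrative; save the script wherever suits you):

/bin/bash /opt/elasticsearch/restore_index.sh project_name-2018.01.15.tar.bz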


Summing up



We set up archiving of each index into its own file, arranged removal of obsolete archives, and learned how to restore an index for the desired date from its archive.

Now you can treat yourself to a pie from the shelf.

Source: https://habr.com/ru/post/349192/


