
Redis data recovery

This article was written by our employee Stepan Karamyshev (skob).
He does not have an invite yet, so I am posting it for him.
The rest of the story is told in his words.

This text makes no claim to genius, and the methods described in it are neither a panacea nor a silver bullet. It exists only because, while investigating the incident described below, I could not find a single source in Russian or English (or, it seems, in German) with comparable information. Note: the information is provided "as is"; applying it without careful thought can be destructive!

Grandpa grabs the radish
Folk tale



Things happen. Sometimes even the most reliable systems fail. So this time, dropping into the chair of my command center, I discovered with horror and bitterness more than a hundred messages in the Skype chat. Skimming the story showed that our customers' Redis had swollen, withdrawn into itself and been killed by the OOM killer. Naturally, it did not have time to finish writing the append-only file. The magical redis-check-aof --fix cheerfully announced a fix for the file, which amounted to cutting the 8-gigabyte file down to 900 megabytes.

- Oh God! shouted the customers.
- Fiasco! shouted the customers.
- $#%! [_ *** #%! shouted the customers.


An AOF file, in case anyone does not know, contains the Redis commands that were executed, recorded one after another.
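To make that concrete, here is a minimal Python sketch of how a single command is laid out in the file. It assumes the AOF stores each command in the plain Redis protocol format (argument count after '*', byte length after '$', every line terminated by \r\n), which is the layout visible in the hexedit dumps later in this story. The function name encode_command is made up for the illustration.

```python
def encode_command(*args: str) -> bytes:
    """Encode one command the way it would appear in an append-only file."""
    parts = [b"*%d\r\n" % len(args)]            # number of arguments
    for arg in args:
        raw = arg.encode("utf-8")
        parts.append(b"$%d\r\n" % len(raw))     # byte length of this argument
        parts.append(raw + b"\r\n")             # the argument itself
    return b"".join(parts)


if __name__ == "__main__":
    # An HSET with a 16-byte key, as in the dumps further down.
    print(encode_command("HSET", "{u204237826932}g", "150", "1"))
```

Every command therefore starts with a '*' line and every argument with a '$' line, and those are exactly the prefixes redis-check-aof complains about when the file is damaged.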

Fortunately, the file had been copied before --fix was sent "into battle". In parallel, a management decision was made: nobody leaves until the file is repaired.
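Since everything that follows depends on having an untouched copy, here is a small Python sketch of "copy first, then verify the copy". It is an illustration only, not what was actually run during the incident; the helper names sha256_of and safe_copy are hypothetical.

```python
import hashlib
import shutil


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a multi-gigabyte AOF does not have to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def safe_copy(src: str, dst: str) -> str:
    """Copy src to dst and refuse to continue if the copy differs from the original."""
    shutil.copy2(src, dst)
    if sha256_of(src) != sha256_of(dst):
        raise IOError("copy does not match the original")
    return dst
```

All repair attempts then go to the copy, and the original stays as the last line of defense.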

The very first attempt to start the daemon with the file produced “HQET command not found”. Then the mind games began. The file was copied once more, to be on the safe side, and then subjected to cruel vivisection. The first step was to open http://redis.io/commands, which lists every Redis command.


Level one. Hurt me plenty.


yum install -y hexedit <- install hexedit
hexedit <aof-file> <- open the file in hexedit
<tab> <- switch to ASCII mode
</> <- start a search
HQET-> <- find the broken command and fix it
<ctrl-X> <- save and exit hexedit
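Hunting for broken command names by hand gets tedious, so the same search can be sketched in Python. This is an assumption-laden illustration, not the procedure used at the time: KNOWN is a tiny hypothetical whitelist (a real one would be built from the list at http://redis.io/commands), and the regular expression only recognizes the usual "*N, $N, name" start of a command.

```python
import re

# Assumption: a short illustrative whitelist, NOT the full Redis command set.
KNOWN = {b"SET", b"GET", b"DEL", b"HSET", b"HGET", b"ZADD", b"EXPIRE", b"SELECT"}

# Every command starts with "*<argc>\r\n$<len>\r\n<name>\r\n".
HEADER = re.compile(rb"\*\d+\r\n\$\d+\r\n([A-Za-z]+)\r\n")


def find_unknown_commands(data: bytes):
    """Yield (offset_of_name, name) for every command name missing from KNOWN."""
    for match in HEADER.finditer(data):
        name = match.group(1)
        if name.upper() not in KNOWN:
            yield match.start(1), name


if __name__ == "__main__":
    sample = b"*4\r\n$4\r\nHQET\r\n$1\r\na\r\n$1\r\nb\r\n$1\r\nc\r\n"
    for offset, name in find_unknown_commands(sample):
        print(hex(offset), name)
```

The sketch works on bytes already in memory; an 8-gigabyte file would have to be scanned in pieces or memory-mapped, and a value that happens to look like a command header will produce a false hit. The reported offsets are the places to visit in hexedit.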

After the first wave of broken commands (HQET, HSAT, ZASD) had been corrected, redis-check-aof was set loose on the file again. Its output brought us to


Level two. Ultra-Violence.


Now the file demanded attention at:

0x308a7042: Expected prefix '\r\n', got: '0f0a' 


hexedit <aof-file> <- open the file in hexedit
<enter> <- go to the address
<change 0f0a to 0d0a> <- a reminder: 0d0a is \r\n in hex.
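The same two-byte fix can be sketched in Python for those who prefer a script to a hex editor. This is only an illustration under stated assumptions: patch_bytes is a hypothetical helper, and it deliberately refuses to write unless the bytes currently at the offset are the ones you expect, so a typo in the address cannot damage a healthy part of the file.

```python
def patch_bytes(path: str, offset: int, expected: bytes, replacement: bytes) -> None:
    """Overwrite bytes at offset, but only if the old bytes are the expected ones."""
    if len(expected) != len(replacement):
        raise ValueError("replacement must not change the file length")
    with open(path, "r+b") as f:
        f.seek(offset)
        found = f.read(len(expected))
        if found != expected:
            raise ValueError(
                f"at {offset:#x}: expected {expected.hex()}, found {found.hex()}"
            )
        f.seek(offset)
        f.write(replacement)
```

For the error above, the call would look like patch_bytes("copy-of-the.aof", 0x308a7042, b"\x0f\x0a", b"\x0d\x0a"), where the file name is a placeholder for your working copy.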

Things seemed to be getting easier, but the next run of redis-check-aof started talking about


Level three. Nightmare!


 0x       308a7046: Expected prefix '*', got: '$' 


The Redis sources were downloaded and redis-check-aof was rebuilt with extended logging. In my case I chose to print the sequence of characters following the problem spot, to identify it more precisely, and got something like:

 '34' 'D' 'A' '0' 


This is done by logging not only buf[0] but also the few characters after it, or the entire buf.
Also note that in hexedit the string from the example above appears as '34 0D 0A 00', that is, two characters per byte with a leading zero.
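The difference between the two notations is easy to reproduce. The Python sketch below assumes the extended logging printed each byte as unpadded hex (which matches the sample output above), and shows the same bytes the way hexedit presents them in its hex and ASCII columns. The three function names are invented for this example.

```python
def as_debug_print(chunk: bytes) -> str:
    """Each byte as unpadded hex in quotes, like the extra logging output above."""
    return " ".join(f"'{b:X}'" for b in chunk)


def as_hexedit(chunk: bytes) -> str:
    """Each byte as two hex digits with a leading zero, like hexedit's hex column."""
    return " ".join(f"{b:02X}" for b in chunk)


def as_ascii(chunk: bytes) -> str:
    """Printable bytes as-is, everything else as '.', like hexedit's ASCII column."""
    return "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)


if __name__ == "__main__":
    chunk = b"4\r\n\x00"
    print(as_debug_print(chunk))
    print(as_hexedit(chunk))
    print(as_ascii(chunk))
```

This is also why every \r\n shows up as ".." in the ASCII dumps.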

Next, you need to read the ASCII just before the problem spot very carefully.
For the example above, the ASCII view shows:

 HSET..$16..{u204237826932}g..$3..150..$1..1..*6..$4..HSET..$16..{u204237826932}g..$3 ..139..$1..1..*4..$4.. 


Next, we carefully studied the Redis source code, in particular this piece:

    if (buf[0] != '*') goto fmterr;
    argc = atoi(buf+1);
    if (argc < 1) goto fmterr;
    argv = zmalloc(sizeof(robj*)*argc);
    for (j = 0; j < argc; j++) {
        if (fgets(buf,sizeof(buf),fp) == NULL) goto readerr;
        if (buf[0] != '$') goto fmterr;
        len = strtol(buf+1,NULL,10);
        argsds = sdsnewlen(NULL,len);
        if (len && fread(argsds,len,1,fp) == 0) goto fmterr;
        argv[j] = createObject(REDIS_STRING,argsds);
        if (fread(buf,2,1,fp) == 0) goto fmterr; /* discard CRLF */
    }


and at some point we realized that the line above:

 HSET..$16..{u204237826932}g..$3..150..$1..1..*6..$4..HSET..$16..{u204237826932}g..$3 ..139..$1..1..*4..$4.. 


will be swallowed by the check without complaint if it is corrected to:

 HSET..$16..{u204237826932}g..$3..150..$1..1..*4..$4..HSET..$16..{u204237826932}g..$3 ..139..$1..1..*4..$4.. 


For the lazy: the difference is the 6 replaced by 4 before the second HSET. That changes the argument count (argc) read by the code above, which let us get past the “goto fmterr” checks and quietly move on.
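The effect of a wrong argument count can be reproduced with a small Python sketch that walks the data the same way the C loop above does. It is not a port of redis-check-aof, just an illustration: first_format_error is a hypothetical name, and unlike the C fragment it also checks the \r\n after each argument.

```python
def first_format_error(data: bytes):
    """Return (offset, message) for the first framing error, or None if clean."""
    pos = 0
    while pos < len(data):
        # Command header: "*<argc>\r\n"
        if data[pos:pos + 1] != b"*":
            return pos, f"expected prefix '*', got {data[pos:pos + 1]!r}"
        eol = data.find(b"\r\n", pos)
        if eol == -1:
            return pos, "unterminated argument count"
        try:
            argc = int(data[pos + 1:eol])
        except ValueError:
            return pos, "argument count is not a number"
        if argc < 1:
            return pos, "argument count below 1"
        pos = eol + 2
        for _ in range(argc):
            # Argument header: "$<len>\r\n", then <len> bytes, then "\r\n"
            if data[pos:pos + 1] != b"$":
                return pos, f"expected prefix '$', got {data[pos:pos + 1]!r}"
            eol = data.find(b"\r\n", pos)
            if eol == -1:
                return pos, "unterminated argument length"
            try:
                length = int(data[pos + 1:eol])
            except ValueError:
                return pos, "argument length is not a number"
            end = eol + 2 + length
            if data[end:end + 2] != b"\r\n":
                return end, "expected '\\r\\n' after the argument"
            pos = end + 2
    return None
```

Note where the error is reported: a command that claims 6 arguments but carries 4 makes the parser trip at the start of the next command, not at the wrong digit. That is why the bytes before the reported address deserve the closest look.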

In the end the file was repaired by hand, Redis was started, the customers were reassured, the project went back to work, our dampened reputation was dried off a little, and the file backup was reconfigured and rechecked. Twice, just in case.

Source: https://habr.com/ru/post/187826/

