It so happened that the server was attacked by a cryptographer who, by "happy coincidence", partially left intact .ibd files (raw data files of innodb tables), but completely encrypted the .fpm files (structure files). At the same time .idb could be divided into:
It was possible to determine which of the variants the tables were opened in any text editor under the desired encoding (in my case it is UTF8) and simply looking at the file for the presence of text fields, for example:
Also, at the beginning of the file you can observe a large number of 0-bytes, and viruses using the block encryption algorithm (most common) usually affect them.
In my case, the attackers at the end of each encrypted file left a string of 4 bytes (1, 0, 0, 0), which simplified the task. A script was enough to search for uninfected files:
def opened(path): files = os.listdir(path) for f in files: if os.path.isfile(path + f): yield path + f for full_path in opened("C:\\some\\path"): file = open(full_path, "rb") last_string = "" for line in file: last_string = line file.close() if (last_string[len(last_string) -4:len(last_string)]) != (1, 0, 0, 0): print(full_path)
Thus it turned out to find the files belonging to the first type. The second implies a long manual, but it has already been found enough. Everything is good, but you need to know the exact structure and (of course) there was such a case that you had to work with a frequently changing table. No one remembered whether the field type was changing or whether a new column was added.
Debri city unfortunately could not help with such a case, so this article is written.
There is a structure of a table 3 months old that does not coincide with the current one (perhaps according to one field, but possible and more). Table structure:
CREATE TABLE `table_1` ( `id` INT (11), `date` DATETIME , `description` TEXT , `id_point` INT (11), `id_user` INT (11), `date_start` DATETIME , `date_finish` DATETIME , `photo` INT (1), `id_client` INT (11), `status` INT (1), `lead__time` TIME , `sendstatus` TINYINT (4) );
at the same time, you need to extract:
id_point
int (11);id_user
int (11);date_start
DATETIME;date_finish
DATETIME.For recovery, a single byte analysis of the .ibd file is used, followed by their conversion to a more readable form. Since to search for the required, we only need to analyze such data types as int and datatime, only they will be described in the article, but sometimes they will refer to other data types, which can help in other similar incidents.
Problem 1 : in the fields with the DATETIME and TEXT types there was a NULL value, and they are simply skipped in the file, because of this, it was not possible to determine the structure for recovery in my case. In the new columns, the default value was null, and some transactions could be lost due to the setting innodb_flush_log_at_trx_commit = 0, so additional time would have to be spent to determine the structure.
Problem 2 : it should be noted that the rows deleted via DELETE will all be exactly in the ibd file, but with ALTER TABLE their structure will not be updated. As a result, the data structure can vary from the beginning of the file to its end. If you often use OPTIMIZE TABLE, then with a similar problem is unlikely to come across.
Please note that the DBMS version affects the way data is stored, and this example may not work for other major versions. In my case, the windows version of mariadb 10.1.24 was used. Also, although in mariadb you work with InnoDB tables, but in fact they are XtraDB , which excludes the applicability of the method with InnoDB mysql.
In python, the data type bytes () displays data in unicode instead of the usual set of numbers. Although the file can be viewed in this form, but for convenience, you can convert bytes into a numeric form by transferring an array of bytes into a regular array (list (example_byte_array)). In any case, both methods will be useful for analysis.
After reviewing several ibd files, you can find the following:
Moreover, if you divide the file by these keywords, you get mostly flat data blocks. We will use infimum as a divider.
table = table.split("infimum".encode())
An interesting observation, for tables with a small amount of data, between infimum and supremum is a pointer to the number of rows in the block.
- test table with the 1st row
- test table with 2 lines
An array of table [0] strings can be skipped. After reviewing it, I could not find the raw data tables. Most likely, this block is used to store indexes and keys.
Starting from table [1] and translating it into a numeric array, you can already notice some regularities, namely:
These are the int values stored in the string. The first byte indicates whether the number is positive or negative. In my case, all the numbers are positive. Of the remaining 3 bytes, you can determine the number using the following function. Script:
def find_int(val: str): # example '128, 1, 2, 3' val = [int(v) for v in val.split(", ")] result_int = val[1]*256**2 + val[2]*256*1 + val[3] return result_int
For example, 128, 0, 0, 1 = 1 , or 128, 0, 75, 108 = 19308 .
The table had a primary key with auto-increment, and here it can also be found
Comparing the data from the test tables, it was revealed that the DATETIME object consists of 5 bytes starting at 153 (most likely indicating annual intervals). Since the DATTIME range is '1000-01-01' to '9999-12-31', I think the number of bytes may differ, but in my case, the data falls in the period from 2016 to 2019, so we assume that 5 bytes is enough .
To determine the time without seconds, the following functions were written. Script:
day_ = lambda x: x % 64 // 2 # {x,x,X,x,x } def hour_(x1, x2): # {x,x,X1,X2,x} if x1 % 2 == 0: return x2 // 16 elif x1 % 2 == 1: return x2 // 16 + 16 else: raise ValueError min_ = lambda x1, x2: (x1 % 16) * 4 + (x2 // 64) # {x,x,x,X1,X2}
For the year and month it was not possible to write a health-working function, so it was necessary to harkodit. Script:
ym_list = {'2016, 1': '153, 152, 64', '2016, 2': '153, 152, 128', '2016, 3': '153, 152, 192', '2016, 4': '153, 153, 0', '2016, 5': '153, 153, 64', '2016, 6': '153, 153, 128', '2016, 7': '153, 153, 192', '2016, 8': '153, 154, 0', '2016, 9': '153, 154, 64', '2016, 10': '153, 154, 128', '2016, 11': '153, 154, 192', '2016, 12': '153, 155, 0', '2017, 1': '153, 155, 128', '2017, 2': '153, 155, 192', '2017, 3': '153, 156, 0', '2017, 4': '153, 156, 64', '2017, 5': '153, 156, 128', '2017, 6': '153, 156, 192', '2017, 7': '153, 157, 0', '2017, 8': '153, 157, 64', '2017, 9': '153, 157, 128', '2017, 10': '153, 157, 192', '2017, 11': '153, 158, 0', '2017, 12': '153, 158, 64', '2018, 1': '153, 158, 192', '2018, 2': '153, 159, 0', '2018, 3': '153, 159, 64', '2018, 4': '153, 159, 128', '2018, 5': '153, 159, 192', '2018, 6': '153, 160, 0', '2018, 7': '153, 160, 64', '2018, 8': '153, 160, 128', '2018, 9': '153, 160, 192', '2018, 10': '153, 161, 0', '2018, 11': '153, 161, 64', '2018, 12': '153, 161, 128', '2019, 1': '153, 162, 0', '2019, 2': '153, 162, 64', '2019, 3': '153, 162, 128', '2019, 4': '153, 162, 192', '2019, 5': '153, 163, 0', '2019, 6': '153, 163, 64', '2019, 7': '153, 163, 128', '2019, 8': '153, 163, 192', '2019, 9': '153, 164, 0', '2019, 10': '153, 164, 64', '2019, 11': '153, 164, 128', '2019, 12': '153, 164, 192', '2020, 1': '153, 165, 64', '2020, 2': '153, 165, 128', '2020, 3': '153, 165, 192','2020, 4': '153, 166, 0', '2020, 5': '153, 166, 64', '2020, 6': '153, 1, 128', '2020, 7': '153, 166, 192', '2020, 8': '153, 167, 0', '2020, 9': '153, 167, 64','2020, 10': '153, 167, 128', '2020, 11': '153, 167, 192', '2020, 12': '153, 168, 0'} def year_month(x1, x2): # {x,X,X,x,x } for key, value in ym_list.items(): key = [int(k) for k in key.replace("'", "").split(", ")] value = [int(v) for v in value.split(", ")] if x1 == value[1] and x2 // 64 == value[2] // 64: return key return 0, 0
Sure, if you spend n times, then this misunderstanding can be corrected.
Next, the function returns a datetime object from the string. Script:
def find_data_time(val:str): val = [int(v) for v in val.split(", ")] day = day_(val[2]) hour = hour_(val[2], val[3]) minutes = min_(val[3], val[4]) year, month = year_month(val[1], val[2]) return datetime(year, month, day, hour, minutes)
We managed to find frequently repeated values from int, int, datetime, datetime It looks like this is what you need. Moreover, such a sequence twice per line is not repeated.
Using a regular expression, we find the necessary data:
fined = re.findall(r'128, \d*, \d*, \d*, 128, \d*, \d*, \d*, 153, 1[6,5,4,3]\d, \d*, \d*, \d*, 153, 1[6,5,4,3]\d, \d*, \d*, \d*', int_array)
Please note that when searching for this expression, it will not be possible to determine NULL values in the required fields, but in my case this is not critical. After in the loop we iterate the found. Script:
result = [] for val in fined: pre_result = [] bd_int = re.findall(r"128, \d*, \d*, \d*", val) bd_date= re.findall(r"(153, 1[6,5,4,3]\d, \d*, \d*, \d*)", val) for it in bd_int: pre_result.append(find_int(bd_int[it])) for bd in bd_date: pre_result.append(find_data_time(bd)) result.append(pre_result)
Actually everything, the data from the result array, this is the data we need. ### PS. ###
I understand that this method is not for everyone, but the main goal of the article is rather to push into action than to solve all your problems. I think the most correct solution would be to start exploring the source code of mariadb itself, but due to limited time, the current method seemed to be the fastest.
In some cases, after analyzing the file, you can determine the approximate structure and restore it in one of the standard ways from the links above. It will be much more correct and cause less problems.
Source: https://habr.com/ru/post/453368/
All Articles