📜 ⬆️ ⬇️

Depersonalization of the MySQL database. Interesting technique

image

In the company where I work, we use a depersonalized base with Production-a. Its total volume at the moment about 30 GB. The ruby ​​obfuscation script took about 6 hours. Acceleration can be achieved by rewriting this all into a stored procedure (stored procedure). But in our project they are forbidden ... Alas and ah.

Then I asked myself a question: is it possible to speed up the process to the maximum, to depersonalize the entire database (or at least one table completely) using only one update statement? The problem is that some fields. D. unique and some random values ​​from the list.
')

It turned out possible. A little thought, the decision came with the help of user variables, a pseudo-random number generator and the case operator.

Below is a bit of code and explanation:

Let there is a table users with fields:
first_name
gender
last_name
address_1
address_2
home_phone
birthdate
ssn
password

After obfuscation should be:
first_name is one of: female - Patricia, Taylor, Susan, Lisa, Linda, Sandra, Carol, Debra, Teresa, Rebecca, Diana, Veronika, Helen, Alexandra, Svetlana, Elona, ​​Marina, Mila, Olga, Vasilisa, Marta
men - David, John, Robert, Steven, William, Mark, Thomas, Michael, Richard, Kevin, Donald, Andrew, Ruslan, Eugene, Sergey, Alexandr, Yura, Ivan, Daniel
gender: no change
last_name: Johnson, Anderson, Reed, Erickson, Frank, Lucas, Jenkins, Watson, Morgan, Kim, Kovalinen, Konovalov, Tereshko, Urchik, Kuleshov, Kisliakov, Areshnik, Pekar, Matroskin, Gallagher
address_1: 123 Main Street
address_2: if there is nothing in the original, then after obfuscation it should be NULL, otherwise the address is 123 Main Street
home_phone: if there is nothing in the original, then after obfuscation it should be NULL, otherwise the phone 111-111-2222
birthdate: leave the year of birth the same and change the date and month
ssn: a random nine-character value unique to each line, starting with '30'
password: NULL

Decision:
SET @rand = 276821; UPDATE users SET first_name= CASE WHEN gender='F' || gender='f' THEN ELT(FLOOR(1+RAND() * 21), 'Patricia', 'Taylor', 'Susan', 'Lisa', 'Linda', 'Sandra', 'Carol', 'Debra', 'Teresa', 'Rebecca', 'Diana', 'Veronika', 'Helen', 'Alexandra', 'Svetlana', 'Elona', 'Marina', 'Mila', 'Olga', 'Vasilisa', 'Marta') ELSE ELT(FLOOR(1+RAND() * 19), 'David', 'John', 'Robert', 'Steven', 'William', 'Mark', 'Thomas', 'Michael', 'Richard', 'Kevin', 'Donald', 'Andrew', 'Ruslan', 'Eugene', 'Sergey', 'Alexandr', 'Yura', 'Ivan', 'Daniel') END, last_name = ELT(FLOOR(1+RAND() * 20), 'Johnson', 'Anderson', 'Reed', 'Erickson', 'Frank', 'Lucas', 'Jenkins', 'Watson', 'Morgan', 'Kim', 'Kovalinen', 'Konovalov', 'Tereshko', 'Urchik', 'Kuleshov', 'Kisliakov', 'Areshnik', 'Pekar', 'Matroskin', 'Gallagher'), address_1 = '123 Main Street', address_2 = CASE WHEN address_2 is not null THEN 'Apt. 14' ELSE null END, home_phone = CASE WHEN home_phone is not null THEN '111-111-2222' ELSE null END, birthdate = CASE WHEN birthdate is not null THEN MAKEDATE(YEAR(birthdate),FLOOR(RAND()*365)) ELSE null END, ssn = CASE WHEN ((@rand + 609673) % 1048576 <> 0) && ((@rand + 609673) % 1048576 <> 1000000) THEN CONCAT('30',LPAD(@rand:=(@rand + 609673) % 1048576,7,'0')) ELSE CONCAT('30',LPAD(@rand:=(@rand + 609673*2) % 1048576,7,'0')) END, password = null; 


As can be seen from the code, you can select a random value from the list using ELT (FLOOR (1 + RAND () * 21), ...), where FLOOR (1 + RAND () * 21) is a random value in the range from 1 to 21. ELT - selects the appropriate drain with the specified index.

CASE helps to choose separate female and male names depending on gender. Such progress control functions that can be used in a separate statement are four CASE, IF, IFNULL (), NULLIF ().

From what is worth mentioning, this is a generator of a random unique value. A prime number (276821) was chosen as the initial value of the generator and written to the user variable rand . The following value is set directly in the CASE statement: rand : = ( rand + 609673 * 2)% 1048576. The CONCAT expression ('30', LPAD ( rand , 7, '0')) forms the final look of the rand value.

Conclusion:
What have we achieved in this way of depersonalizing the base?
1) The execution speed was reduced from 6 hours to 4 minutes.
2) Stored procedures are not used.
3) Clear (not complicated) logic of work and all code is assembled in one place.

Source: https://habr.com/ru/post/132869/


All Articles