
ALTER very large tables in MySQL

If your project has tables whose size is measured in gigabytes, and changing the structure of such a table means stopping all services for a few hours, this article is for you.

Given: a table with several tens of gigabytes of data. The task: change the structure of this table.

I will say up front that the method works only on transactional tables. If you have a MyISAM table of dozens of gigabytes then, as in the joke, "deal with your problems yourself". The examples below use InnoDB tables.
Suppose our table structure is:

 CREATE TABLE `users` (
   `id` int(11) NOT NULL AUTO_INCREMENT,
   `email` varchar(40) NOT NULL DEFAULT '',
   `password_hash` char(32) NOT NULL DEFAULT '',
   `registration_date` int(11) NOT NULL DEFAULT '0',
   PRIMARY KEY (`id`)
 ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

We want to add the last_login field to this table.

What options do we have?

Head-on


 ALTER TABLE `users` ADD COLUMN `last_login` int(11) NOT NULL DEFAULT 0; 

This option works fine on small projects where tables rarely exceed 50,000 rows. It is not an option for us, because the ALTER would run for far too long, and the whole time the table would be locked for both writing and reading. The service would have to be stopped for the duration.

Turn on the brain


You can avoid touching the table at all and instead create a separate `users_lastvisits` table:
 CREATE TABLE `users_lastvisits` (
   `user_id` int(11) NOT NULL,
   `last_login` int(11) NOT NULL DEFAULT '0',
   PRIMARY KEY (`user_id`)
 ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Now every query that needs last_login can JOIN against `users_lastvisits`. It will, of course, run more slowly, and adding the JOIN to all those queries takes time, but sometimes this is enough and you can stop here.

And yet, the field needs to be added


You can set up master-slave replication, run the ALTER on the slave server, and then swap the two. To be honest, I have never done it this way; it may be simpler than the following method, but setting up replication is not always possible.

My way is as follows


We create a new table with the final structure and put triggers on the first table that log all changes; at the same time we start pouring data from the first table into the second. When the copy finishes, we apply the logged changes and rename the tables.

So, we prepare two tables: the first with the target structure, the second for logging changes.
 CREATE TABLE `_users` (
   `id` int(11) NOT NULL AUTO_INCREMENT,
   `email` varchar(40) NOT NULL DEFAULT '',
   `password_hash` char(32) NOT NULL DEFAULT '',
   `registration_date` int(11) NOT NULL DEFAULT '0',
   `last_login` int(11) NOT NULL DEFAULT 0,
   PRIMARY KEY (`id`)
 ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

 CREATE TABLE `users_updated_rows` (
   `id` int(11) NOT NULL AUTO_INCREMENT,
   `row_id` int(11) NOT NULL DEFAULT '0',
   `action` enum('updated','deleted') NOT NULL DEFAULT 'updated',
   PRIMARY KEY (`id`)
 ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Set up the triggers:
 DELIMITER ;;
 CREATE TRIGGER users_after_delete AFTER DELETE ON users FOR EACH ROW
 BEGIN
   INSERT INTO users_updated_rows VALUES (0, OLD.id, 'deleted');
 END;;

 CREATE TRIGGER users_after_insert AFTER INSERT ON users FOR EACH ROW
 BEGIN
   INSERT INTO users_updated_rows VALUES (0, NEW.id, 'updated');
 END;;

 CREATE TRIGGER users_after_update AFTER UPDATE ON users FOR EACH ROW
 BEGIN
   IF (OLD.id != NEW.id) THEN
     INSERT INTO users_updated_rows VALUES (0, OLD.id, 'deleted');
   END IF;
   INSERT INTO users_updated_rows VALUES (0, NEW.id, 'updated');
 END;;
 DELIMITER ;

Now we begin the data transfer. To do this, open two connections to the database: in one, the copy will run; the other is needed to lock the table for a short while.
 mysql> LOCK TABLES users WRITE;
 Query OK, 0 rows affected (0.00 sec)
 mysql> -- clear the change log
 mysql> TRUNCATE users_updated_rows;
 Query OK, 0 rows affected (0.17 sec)
 mysql> -- start copying the data
 mysql> INSERT INTO _users SELECT id, email, password_hash, registration_date, 0 FROM users;
 mysql> -- unlock the tables
 mysql> UNLOCK TABLES;
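A common variation on this copy step (not shown in the article; an assumption of mine, borrowed from tools like pt-online-schema-change) is to copy in primary-key chunks rather than in one giant INSERT … SELECT, so that no single transaction runs for hours. A minimal Python simulation of that chunking logic on in-memory rows; the function name and chunk size are illustrative:

```python
# Sketch of copying a big table in ascending primary-key chunks instead of
# one long INSERT ... SELECT. Simulated on dicts; with a real database each
# chunk would be one
#   INSERT INTO _users SELECT ... FROM users WHERE id > ? ORDER BY id LIMIT ?
# statement. All names and the chunk size are illustrative assumptions.

def copy_in_chunks(source, dest, chunk_size=1000):
    """Copy rows from `source` to `dest` (keyed by id) in id-ordered chunks."""
    last_id = 0
    while True:
        # next chunk: the lowest ids not yet copied
        chunk = sorted(
            (row_id, row) for row_id, row in source.items() if row_id > last_id
        )[:chunk_size]
        if not chunk:
            break
        for row_id, row in chunk:
            # the new column gets its default, as in the SELECT ... , 0 above
            dest[row_id] = dict(row, last_login=0)
        last_id = chunk[-1][0]
    return dest

source = {i: {"email": f"u{i}@example.com"} for i in range(1, 2501)}
dest = {}
copy_in_chunks(source, dest, chunk_size=1000)
```

The change-log triggers make this safe: any row modified mid-copy is recorded and replayed later, so chunk boundaries do not have to be consistent with each other.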

That's it. Now, while the table is being copied, we have time to think about how to apply the data that changed after the copy started. There is nothing complicated here, so I will not give the script: you just take the entries from the users_updated_rows table in the order they were added (sorted by primary key) and, for each one, update or delete the corresponding row in the _users table.
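The catch-up loop just described can be sketched as follows. This is a simulation on in-memory dicts, not a real MySQL client; the function name and data shapes are my assumptions, but the logic mirrors the description: read users_updated_rows in primary-key order and update or delete the matching row in _users.

```python
# Sketch of the catch-up script the article describes but does not show:
# replay logged ('updated' / 'deleted') actions against the new table.
# Simulated on dicts; names mirror the article's schema, the function
# itself is an illustrative assumption.

def replay_log(log, source, dest):
    """Apply logged changes to the new table.

    log    -- list of (log_id, row_id, action), as in users_updated_rows
    source -- current rows of the old `users` table, keyed by id
    dest   -- rows of the new `_users` table, keyed by id
    """
    for _log_id, row_id, action in sorted(log):  # primary-key order
        if action == "deleted":
            dest.pop(row_id, None)           # DELETE FROM _users WHERE id = ?
        elif row_id in source:
            dest[row_id] = dict(source[row_id])  # re-read the current row from users

users = {1: {"email": "a@x"}, 3: {"email": "c@x"}}
new_users = {1: {"email": "old@x"}, 2: {"email": "b@x"}}
log = [(1, 2, "deleted"), (2, 1, "updated"), (3, 3, "updated")]
replay_log(log, users, new_users)
# new_users now matches users
```

Replaying an 'updated' entry re-reads the current row, so applying the same log entry twice is harmless; that is what lets the script run continuously while writes keep coming in.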

When the initial copy has finished, the remaining changes need to be applied. Run the script; it should keep running, applying every record that is added to the log. Once it has caught up with all the data, rename the tables:
 mysql> TRUNCATE users_updated_rows;
 Query OK, 0 rows affected (0.16 sec)
 mysql> RENAME TABLE users TO __users, _users TO users;
 Query OK, 0 rows affected (0.11 sec)

It is worth noting that slight data loss is possible at this moment, since the two statements are not executed atomically as a pair. If that is critical, it is better to take the service down briefly so that no write requests arrive; for example, revoke write privileges from the application's user and run the commands as another user.

If everything is done correctly, no data is lost and the service suffers practically no interruption, which is exactly what we needed. The same approach can be used to move a table to another server; only the copy step changes. Instead of
 mysql> INSERT INTO _users SELECT id, email, password_hash, registration_date, 0 FROM users; 

you pour the data through mysqldump:
 $ mysqldump -h host1 db users --single-transaction -ecQ | pv | mysql -h host2 

In this way, I managed to move a table of 60 GB and 400 million rows to another server in about 12 hours without stopping the services.

By the way, this wheel has already been invented by Facebook; it is called Online Schema Change for MySQL.

Source: https://habr.com/ru/post/121129/

