Imagine a project that should work 24/7, a project using the mysql database and a typical situation, the developer comes in and says:
In the table, which now has 300 million lines, I forgot to add a field and an index on it, and stopping the work with base is invalid, think of something.
If you try to google on the topic, you will mainly find tips on creating a second label, gradual copying of data there (“insert into T select * from S limit OFFSET”) and then renaming tables, but this mechanism has significant drawbacks, for example, data in the original table may change immediately after copying, and since it is impossible to lock the table on record, then integrity of data will be in doubt.
You can copy data with external code, taking into account all changes synchronously in 2 tables, in reading only unread from the old to the new, but this requires a change in the application code, which is not always possible.
Are there any conditions when such an alter can be made on the go? Certainly yes, there are several of them, but I will tell only one option for my spherical horse in vacuum.
Given - mysql database on 2 servers with master replication and external mechanism (haproxy, application config, etc.), which shows which mysql instance is active in the code. Those. we have the ability to switch requests from one server to another.
Replica two-way:
')
server1 <==> server2
server1 is a logical master, i.e. it receives requests from the application code.
1. Turn off the slave on server1 (
mysql> stop slave; ). Since requests for server2 do not go, then disabling the replica in this direction will not cause negative consequences for the integrity of data on both servers.
2. We are running alter on server2, this query is filling the table only on server2, but the prod does not address it now, it does not cause problems, which means it is safe.
3. We follow the lag from the master to server2, the show slave status command; pay attention to Seconds_Behind_Master. The backlog is caused by the fact that requests to the locked table during alter are not played on the slave server2, they accumulate in the replay-log and after the execution of alter they are played.
4. Next, we get an obsolete table to server1, and to server2 a table with a new format we need, we need to switch the logical master to server2, we switch.
5. Turn off the server on 1 server, for example through
mysqladmin start . which causes alter to come from server2 to server1 and modify the table there.
6. Just in case we wait that the server1 caught up with the replica. If necessary, we switch the logical master again (for example, there is a hard-code for removing backups from server2 and for the logical master to backups they should not be slowed down).
And how do you solve such problems?