
Proper migration from MyISAM to InnoDB

Let me distract you from the cat pictures and tell you, based on my experience, what pitfalls await in the transition from MyISAM to InnoDB, and how to avoid them. The application code will be in PHP.

I decided to write this post after reading a huge number of incorrect answers to this question on the Internet. Illiterate or incomplete answers are scattered all over, creating the impression that migrating your database to InnoDB is very easy. No, it's not easy! So, let's begin!

Why switch to InnoDB


With this question, I think everything is clear to everyone, and I will not dwell on it: plenty of articles on the Internet are devoted to the advantages of InnoDB. If you are reading these lines, it means you have consciously arrived at the idea of moving your setup to InnoDB and, habrauser, went googling. I hope this article is exactly what you need.
Preparatory stage

1. Starting with the banal: make sure you have the required amount of free disk space where the database lives. InnoDB takes roughly 1.5 times more space than MyISAM.

2. A very important point, which will pay off later when you troubleshoot database slowdowns: comment every SQL query in your application with a unique identifier, for example a sequence number. If you have hundreds or thousands of SQL queries, how have you lived without this until now?

SELECT /*017*/ client_id, money, lastname FROM clients WHERE money > 100; 


If you do this, the output of SHOW PROCESSLIST, as well as query dumps in the slow query log, will contain a hint for you, the SQL query number, and you can instantly find that query in the code and optimize it.
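If tagging hundreds of queries by hand sounds tedious, a tiny helper can inject the number for you. This is just my sketch, not part of the migration itself; numbered_query() is a hypothetical name:

 function numbered_query($link, $num, $query)
 {
     // Hypothetical helper: injects the tag right after the first SQL keyword,
     // turning "SELECT client_id ..." into "SELECT /*017*/ client_id ...".
     $tagged = preg_replace('/^\s*(\w+)/', '$1 /*' . $num . '*/', $query, 1);
     return mysqli_query($link, $tagged);
 }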

3. Add this to the my.cnf config file:
 [mysqld]
 innodb_file_per_table=1

This flag will allow each table to be stored in a separate tablespace (in a separate file on disk), so as not to clutter up the system tablespace.
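To double-check that the flag actually took effect after a restart, you can query it, for example from PHP (a trivial check, not required for the migration itself):

 // Confirm innodb_file_per_table is enabled on the running server.
 $result = mysqli_query($link, "SHOW VARIABLES LIKE 'innodb_file_per_table'");
 $Row = mysqli_fetch_assoc($result);
 echo $Row['Variable_name'] . " = " . $Row['Value'] . "\n"; // expect ON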

4. Set the cache sizes for InnoDB, in the same my.cnf file:
 # (this can be reduced, since MyISAM will now only serve the system tables)
 key_buffer_size = 8M
 # set this to 50-80% of the RAM available on the server
 innodb_buffer_pool_size = 512M

5. Configure how the database handles transactions:
 transaction-isolation = READ-COMMITTED
 innodb_lock_wait_timeout = 5
 innodb_rollback_on_timeout = 1
 binlog-format = MIXED
 innodb_log_file_size = 200M

I set the transaction isolation level to READ-COMMITTED for my application, instead of the default REPEATABLE-READ, because otherwise there was an excessive number of deadlocks in the database. I decided that my application can live with not reading the very latest data, at the price of faster operation, instead of fully up-to-date data burdened with lots of locks. However, for a mission-critical transaction you can raise the isolation level in the code; this affects only that one transaction:
 mysqli_query($link, "SET TRANSACTION ISOLATION LEVEL SERIALIZABLE"); 

The next parameter, innodb_lock_wait_timeout, I deliberately reduced from the default 50 seconds to 5, so that lock waits do not hang client sessions for too long.

innodb_rollback_on_timeout is very important for how your code handles errors. I have never seen this point explained clearly, so let me do it.

- if this flag is not set, then when a lock wait timeout occurs (error code 1205), InnoDB rolls back only the statement that timed out in your transaction. That is, you only need to repeat that one statement, not the entire transaction from the beginning. To me this option seemed harder to implement.

- if the flag is set, the entire transaction is rolled back, just as happens when a deadlock is detected (error code 1213). I chose this option because it makes the error handling code uniform: on either of these two errors, you repeat the transaction from its first statement.

innodb_log_file_size has to be increased because of pitfall #3 (below): this log must be large enough to hold at least several rows, and if there are rows with MEDIUMTEXT columns, their size can exceed several MB, so the default value of 5 MB is far too small. After changing this parameter, stop the database, delete the old ib_logfile0 and ib_logfile1 files, and only then bring the database back up.

What to fear in InnoDB

Actually, in InnoDB you only need to watch carefully for two error codes that did not exist in MyISAM: 1205 (lock wait timeout) and 1213 (deadlock). With the server configured as above, it will roll back your transactions in both cases, and you will need to repeat them from the beginning. Your application code can consist both of single statements acting as transactions (with autocommit = 1) and of transactions made up of several SQL statements; in the latter case the transaction starts with
 mysqli_query($link, "START TRANSACTION"); 
and ends with
 mysqli_commit($link); 

(I know about mysqli_begin_transaction(), but it is only available in PHP >= 5.5, and not everyone has such a new PHP everywhere.)
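For multi-statement transactions the retry loop has to wrap the whole block, not one call. A minimal sketch under the configuration above (the table, query numbers, and the 10-attempt limit are illustrative):

 for ($attempts = 1; $attempts <= 10; $attempts++) {
     mysqli_query($link, "START TRANSACTION");
     $ok = mysqli_query($link, "UPDATE /*018*/ clients SET money = money - 10 WHERE client_id = 1")
        && mysqli_query($link, "UPDATE /*019*/ clients SET money = money + 10 WHERE client_id = 2");
     if ($ok && mysqli_commit($link)) {
         break; // committed successfully
     }
     $error_code = mysqli_errno($link);
     if (($error_code != 1205) && ($error_code != 1213)) {
         mysqli_rollback($link); // some other error: log the query, retrying will not help
         break;
     }
     // on 1205/1213 the server has already rolled the whole transaction back
     // (innodb_rollback_on_timeout = 1), so we simply repeat from the first statement
 }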

If your calls to mysqli_query() are not wrapped in something like for ($i = 0; $i < 5; $i++) { }, consider your code at risk. You need to go through all the code and replace every call to mysqli_query() with a call to your own my_mysqli_query() function, which repeats the rolled-back transaction, for example:

 function my_mysqli_query($link, $query)
 {
     $result = FALSE;
     $error_code = 0;
     $msc = microtime(true); // start the timer for this query

     for ($attempts = 1; $attempts <= 10; $attempts++) {
         $result = mysqli_query($link, $query);
         $msctime = round(microtime(true) - $msc, 2);

         if ($result) {
             // log successful queries that needed a retry or ran longer than 2 seconds
             if (($attempts > 1) || ($msctime > 2)) {
                 tologfile("[$msctime sec.] SUCCESS (from attempt #$attempts)\n$query\n\n");
             }
             break;
         }

         $error_code = mysqli_errno($link);
         if (($error_code != 1205) && ($error_code != 1213)) {
             // not a timeout or deadlock: retrying will not help, log and bail out
             tologfile("[$msctime sec.] IGNORING Code: $error_code\n$query\n\n");
             break;
         }

         // 1205 or 1213: the transaction was rolled back, go around for another attempt
         tologfile("[$msctime sec.] FOR RETRY; Code: $error_code (attempt $attempts)\n$query\n\n");
     }

     if (!$result) {
         tologfile("[$msctime sec.] FAILED after $attempts attempts; Code: $error_code\n$query\n\n");
     }
     return $result;
 }

This code retries single-statement transactions that were rolled back due to timeouts or deadlocks, and logs the oddities along the way, which let me catch some rather rare bugs. Note also that this is essentially a hand-rolled, more flexible version of the log_slow_queries configuration option: for example, I log queries that run longer than 2 seconds.
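The tologfile() function above is just my little logging helper; a minimal version might look like this (the log path and timestamp format are up to you):

 // Append a timestamped message to the application's SQL log file.
 function tologfile($message)
 {
     $line = date('Y-m-d H:i:s') . ' ' . $message;
     file_put_contents('/var/log/myapp_sql.log', $line, FILE_APPEND | LOCK_EX);
 }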

Pitfall #1

I have seen a widespread mistake in how people handle errors:
 for ($attempts = 0; $attempts < 5; $attempts++) {
     $result = mysqli_query($link, $Query);
     if ($result) {
         # everything went fine, stop retrying
         break;
     }
 }
 mysqli_commit($link);

It all seems fine... but only at first glance. In fact, this code is comical: if the SQL query contains a syntax error (error code 1064), or the data does not fit into a column (Data truncated, error code 1265), it will pointlessly repeat the query 5 times.
Hence the conclusion: go to the next iteration of the loop only if the error code is 1205 or 1213. In all other cases, log the failing query and deal with it later. This is where the sequence-number comments in the SQL query body come in handy.

Pitfall #2

Everything is simple here, you just need to remember it: the code you retry on errors 1205 and 1213 must be idempotent. That is, running it once or running it several times must produce the same results. For example, if inside the retried for loop you push query results into an array:
 array_push($clients_array, $Row['client_id']); 

then at the beginning of that for loop there must be code that resets the array:
 $clients_array = array(); 

otherwise, after a retry of the transaction, the array will contain the results twice.
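Putting the two rules together, a safe retry loop for such a read might look like this sketch (the query number and names are illustrative):

 for ($attempts = 1; $attempts <= 10; $attempts++) {
     $clients_array = array(); // reset on every attempt so a retry starts clean
     $result = mysqli_query($link, "SELECT /*020*/ client_id FROM clients WHERE money > 100");
     if ($result) {
         while ($Row = mysqli_fetch_assoc($result)) {
             array_push($clients_array, $Row['client_id']);
         }
         break; // success
     }
     $error_code = mysqli_errno($link);
     if (($error_code != 1205) && ($error_code != 1213)) {
         break; // non-retryable error: log it and deal with it later
     }
 }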

Pitfall #3

And this pitfall is a real ambush. Remember how, with the best of intentions, I set the transaction isolation level to READ-COMMITTED? Well, in that case, if you also have replication enabled, the server's binary logs will grow by leaps and bounds! At this isolation level, MySQL can no longer replicate your data changes as bare SQL statements, so events go into the binary logs (and from there to the slaves) not in the STATEMENT format as before, but row-based (hence binlog-format = MIXED in the config file, otherwise it will not work at all!): the entire row in which at least one column changed is written to the log.

Now imagine that the database has a table storing some large MEDIUMTEXT (say, we keep processing logs in the database) alongside other columns:
 CREATE TABLE `processing_logs` (
   `id` int(11) NOT NULL AUTO_INCREMENT,
   `isfinished` int(11) NOT NULL,
   `text` mediumtext NOT NULL,
   PRIMARY KEY (`id`)
 ) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8

And in the code we do
 mysqli_query($link, "UPDATE /*041*/ processing_logs set isfinished=1 where id=102"); 

In this case, the entire row, including the unchanged text column, is written to the mysql-bin.00001 log, simply because the isolation level is READ-COMMITTED, and a 100-megabyte binary log will overflow within just a few minutes of production work.
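You can watch this growth with your own eyes; a quick check of the binary log sizes (my addition, requires the REPLICATION CLIENT privilege):

 // List binary logs and their sizes to see how fast they grow.
 $result = mysqli_query($link, "SHOW BINARY LOGS");
 while ($Row = mysqli_fetch_assoc($result)) {
     echo $Row['Log_name'] . ": " . $Row['File_size'] . " bytes\n";
 }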

Hence the conclusion: you have to deviate from classical relational database theory and move large columns (MEDIUMTEXT, for example) into separate tables whenever their data changes less often than the other attributes of the entity.
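For the processing_logs table above, such a split might look like this sketch (the new table names are mine):

 // Frequently updated flags stay in a narrow table...
 mysqli_query($link, "CREATE TABLE `processing_status` (
   `id` int(11) NOT NULL AUTO_INCREMENT,
   `isfinished` int(11) NOT NULL,
   PRIMARY KEY (`id`)
 ) ENGINE=InnoDB DEFAULT CHARSET=utf8");

 // ...while the bulky, rarely changing text lives separately, joined by id.
 mysqli_query($link, "CREATE TABLE `processing_texts` (
   `id` int(11) NOT NULL,
   `text` mediumtext NOT NULL,
   PRIMARY KEY (`id`)
 ) ENGINE=InnoDB DEFAULT CHARSET=utf8");

Now updating isfinished touches only the narrow row, and the MEDIUMTEXT no longer lands in the binary log on every status change.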

Note: this applies to MySQL 5.5. Newer versions have the binlog-row-image option, which can be set to minimal; that should solve the problem of rapid binary log growth on updates like the one above. But I have not tested it.

The actual transition to InnoDB

We will switch to InnoDB by creating a new database into which we copy all the tables of the old one. The snippet below, by the way, takes a correct dump of the database without stopping it, and at the same time records the master status, which is needed to start replication on a slave.
 select now()\G
 FLUSH TABLES WITH READ LOCK;
 SET GLOBAL read_only=ON;
 show master status\G
 \! mysqldump --add-drop-database -u root -pmypassword myclientsdb > /root/myclientsdb.sql
 SET GLOBAL read_only=OFF;
 select now()\G
 exit

In our case, all we need is the SQL dump, which we will pour into the new InnoDB database. To import into InnoDB without exceeding the max_allowed_packet size, run these two commands in mysql:
 set global net_buffer_length=1000000;
 set global max_allowed_packet=1000000000;

Next, create the new database and a user for it:
 create database newclientsdb;
 use newclientsdb;
 create user 'newclientsdb'@'localhost' identified by 'passworddb';
 grant delete,insert,update,select on newclientsdb.* to 'newclientsdb'@'localhost';

And we pour the entire old database into the new one. I love this kind of pipelining, where the engine conversion is done on the fly:
 cat myclientsdb.sql | sed 's/ENGINE\=MyISAM/ENGINE\=InnoDB/g' | mysql -u root -pmypassword newclientsdb 
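After the import it is worth checking that nothing is still left on MyISAM; a quick sanity check from PHP (my addition, the database name is as above):

 // List any tables in the new database that are still not InnoDB.
 $result = mysqli_query($link,
     "SELECT table_name AS tbl, engine AS eng FROM information_schema.tables
      WHERE table_schema = 'newclientsdb' AND engine <> 'InnoDB'");
 while ($Row = mysqli_fetch_assoc($result)) {
     echo $Row['tbl'] . " is still " . $Row['eng'] . "\n";
 }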


Change the username, password, and database name in your application's database connection string to the new database:
 $link = mysqli_connect ($Host, $User, $Password, $DBName); 

Test everything, and if all is OK, the old MyISAM database can be dropped.

That seems to be all.

Source: https://habr.com/ru/post/269121/

