
Percona XtraDB Cluster. Installation and Testing

Some time ago I was thinking about improving the availability of my project's servers. Everything was solved relatively easily except for the MySQL database. Master-master replication creates problems with index synchronization, and while the cluster solution NDB Cluster is developing rapidly, it is still not suitable for migrating an existing project to it because of its long list of differences and limitations.

But there is an alternative solution: Galera Cluster, on which Percona XtraDB Cluster is based. Below I describe how to install, configure and test it on Ubuntu.


How is this better than NDBcluster?


First: fewer restrictions (NDB Cluster limitations: dev.mysql.com/doc/refman/5.5/en/mysql-cluster-limitations.html, XtraDB Cluster limitations: www.percona.com/doc/percona-xtradb-cluster/limitation.html).
I only had to convert a few MyISAM tables to InnoDB (see the example after this list).
Second: numerous Percona patches to the MySQL engine code.
Third: the ability to run in a two-node configuration (although, I should say right away, it is only suitable for testing).
Finally: MySQL forks are trendy now :)
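Converting a table from MyISAM to InnoDB is a single ALTER TABLE per table. A minimal sketch (the database and table names here are placeholders, not from the article):

 # convert one table to InnoDB (mydb.mytable is a placeholder name)
 mysql -u root -p -e "ALTER TABLE mydb.mytable ENGINE=InnoDB;"
 # list the remaining MyISAM tables that still need converting
 mysql -u root -p -e "SELECT table_schema, table_name FROM information_schema.tables WHERE engine='MyISAM' AND table_schema NOT IN ('mysql','information_schema','performance_schema');"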

How is this better than Galera Cluster bolted onto some other MySQL fork?


The xtrabackup utility, which can take backups on the fly. This is useful because when a new node joins the cluster, you do not need to stop any of the working nodes in order to pull a copy of the database from it.
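The same backup user created during installation below can also be used for regular hot backups. A minimal sketch, assuming the innobackupex wrapper shipped with xtrabackup at the time and a hypothetical target directory:

 # stream a hot backup of a running node into /backups (the path is a placeholder)
 innobackupex --user=backup --password=password /backups/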

Installation


Installation under Ubuntu, as usual, is elementary:
 gpg --keyserver hkp://keys.gnupg.net --recv-keys 1C4CBDCDCD2EFD2A
 gpg -a --export CD2EFD2A | sudo apt-key add -
 sudo apt-get update
 sudo apt-get install percona-xtradb-cluster-server-5.5

Sometimes apt complains about unresolvable dependencies. In that case it is enough to install the dependencies first, and then the server package itself.
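For example (a sketch only; the exact package names depend on the repository version, so treat them as an assumption):

 # install the client and common packages first, then retry the server package
 sudo apt-get install percona-xtradb-cluster-client-5.5 percona-xtradb-cluster-common-5.5
 sudo apt-get install percona-xtradb-cluster-server-5.5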

First, without changing anything in the settings yet, open the mysql console (doing this only on the node that will be started first is enough) and add a user for backups:
 grant RELOAD, LOCK TABLES, REPLICATION CLIENT, FILE on *.* to backup@localhost identified by 'password'; 


Then one small tweak is needed. The problem is that on Ubuntu the /bin/sh symlink points to dash, while the cluster startup script is written for bash. If there are no contraindications, you can fix this system-wide:
 dpkg-reconfigure dash 

Answer "No" when asked whether to use dash as the default system shell.
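If you would rather not change the system-wide /bin/sh, an alternative (a sketch, assuming the startup script is /usr/bin/mysqld_safe, as referenced in the problems section below) is to point only that script at bash:

 # change the interpreter line of mysqld_safe from /bin/sh to /bin/bash
 sudo sed -i '1s|^#!/bin/sh|#!/bin/bash|' /usr/bin/mysqld_safe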

Since the databases become completely identical once the cluster is running, the system (debian-sys-maint) user's password on all nodes will end up the same as on the first node. Therefore you need to copy /etc/mysql/debian.cnf from the server that will be started first to the rest of the nodes.
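A minimal way to do that (the remote user and the second node's address are placeholders):

 # copy the maintenance credentials from the first node to the second one
 scp /etc/mysql/debian.cnf root@192.168.1.2:/etc/mysql/debian.cnf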

My configs look like this:
 [mysqld_safe]
 wsrep_urls=gcomm://192.168.1.1:3400,gcomm://192.168.1.2:3400,gcomm://

 [mysqld]
 port=3306
 socket=/var/run/mysqld/mysqld.sock
 datadir=/var/lib/mysql
 basedir=/usr
 user=mysql
 log_error=error.log
 binlog_format=ROW
 wsrep_provider=/usr/lib/libgalera_smm.so
 wsrep_sst_receive_address=192.168.1.1:3500
 wsrep_node_incoming_address=192.168.1.1
 wsrep_slave_threads=2
 wsrep_cluster_name=cluster0
 wsrep_provider_options="gmcast.listen_addr=tcp://192.168.1.1:3400;"
 wsrep_sst_method=xtrabackup
 wsrep_sst_auth=backup:password
 wsrep_node_name=node0
 innodb_locks_unsafe_for_binlog=1
 innodb_autoinc_lock_mode=2
 innodb_buffer_pool_size=5000M
 innodb_log_file_size=256M
 innodb_log_buffer_size=4M

 [client]
 port=3306
 socket=/var/run/mysqld/mysqld.sock

Check the location of libgalera_smm.so.
The recommended value of wsrep_slave_threads is the number of CPU cores * 4 (see the one-liner after these notes).
wsrep_sst_auth specifies the login and password of the backup user.
innodb_buffer_pool_size, innodb_log_file_size and innodb_log_buffer_size affect performance and are chosen experimentally. In my case each node has 8 GB of RAM.
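To get the suggested wsrep_slave_threads value for a node, you can multiply the core count reported by nproc by four:

 # print the number of CPU cores times 4
 echo $(( $(nproc) * 4 ))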

To add nodes, list them in the wsrep_urls line (there should be an empty entry at the end of the line).
All IP addresses found in the file (except in the wsrep_urls line) refer to the current node. They need to be changed when distributing this file to the other nodes. You also need to change the node name in wsrep_node_name.
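A sketch of how the per-node values could be rewritten when preparing the config for the second node (192.168.1.2 and node1 are assumptions matching the example addresses above; wsrep_urls is deliberately left untouched):

 # produce a config for node1 from the first node's config, changing only per-node values
 sed -e 's|^wsrep_sst_receive_address=.*|wsrep_sst_receive_address=192.168.1.2:3500|' \
     -e 's|^wsrep_node_incoming_address=.*|wsrep_node_incoming_address=192.168.1.2|' \
     -e 's|gmcast.listen_addr=tcp://192.168.1.1:3400|gmcast.listen_addr=tcp://192.168.1.2:3400|' \
     -e 's|^wsrep_node_name=.*|wsrep_node_name=node1|' \
     /etc/mysql/my.cnf > my.cnf.node1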

In my configuration, port 3400 is used for synchronization, port 3500 for pouring in the dump (state transfer), and port 3306 (the standard one) for client connections.

You can run several nodes on the same machine by giving them different ports. If you decide to do this, you need to create several configuration files in /etc/mysql and start each server with a command like:
 /usr/bin/mysqld_safe --defaults-file=/etc/mysql/my0.cnf 

Note that xtrabackup can only connect via the standard socket /var/run/mysqld/mysqld.sock (it ignores the parameter from the config). So in this case you will have to use a different SST method, e.g. wsrep_sst_method=rsync.


And finally, restart the daemon:
 sudo service mysql restart 

If something went wrong, look into /var/lib/mysql/error.log.
Usually you need to erase /var/lib/mysql/ib_logfile* because of the changed innodb_log_file_size in the config (see the sketch below).
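A sketch of that cleanup, assuming the stock Ubuntu service name and data directory:

 # stop MySQL, remove the InnoDB log files so they are recreated with the new size, start again
 sudo service mysql stop
 sudo rm /var/lib/mysql/ib_logfile*
 sudo service mysql start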

Sometimes it's easier to erase the whole /var/lib/mysql/ (if there is nothing valuable in the databases) and re-create the default databases:
 mysql_install_db 


I have also listed some possible errors at the end of the article.

Number of nodes


The vendor recommends at least three, but there is a difference from NDB: all nodes are initially of equal rank and have identical functionality. Nothing prevents you from running just two nodes.

With a two-node configuration everything is simple and sad: it works until the connection between the nodes is lost. In that case the cluster, to protect itself against split-brain, refuses to do anything at all. There are a couple of config options under which it will keep working even when the connection is broken, but, understandably, they do more harm than good: when the connection comes back, we end up with two diverged copies of the database with no clear way to merge them.
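You can see a node's view of the cluster through the wsrep status variables, for example (root credentials assumed):

 # wsrep_cluster_size shows the number of nodes, wsrep_cluster_status should be "Primary";
 # on a node cut off by a split-brain the status becomes non-Primary and queries are rejected
 mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_cluster_size'; SHOW STATUS LIKE 'wsrep_cluster_status';"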

The three-node variant did not work for me, since I only have two servers at a colocation facility, and a third one added somewhere on the office connection would hurt performance, because the speed of the cluster is determined by the speed of its slowest node.

But there is a solution: Galera Arbitrator. This is a stripped-down node that stores no databases and does not need a fast channel, but as long as a node keeps in touch with it, that node will continue to work. It is so stripped-down that it does not even use a config file; all parameters are passed on the command line.
Of the parameters, you only need the address of one of the normal nodes and the cluster name:
 garbd -a gcomm://192.168.1.1:3400 -g cluster0 


Tests


I tested with the standard sql-bench utility from the test package, first the nodes separately, then the cluster. The software was the same in both cases: percona-xtradb-cluster.

Install and run the test package:
 apt-get install percona-server-test-5.5 libdbd-mysql-perl
 cd /usr/share/sql-bench/sql-bench/
 perl run-all-tests --server=mysql --user=root --password=<password>


First I tested on two identical fast machines with SSDs.
I threw out the results of the first run, as they were noticeably better than the following ones (I attribute that to the still-empty SSDs).

This is 1 node:
 alter-table: Total time: 17 wallclock secs ( 0.04 usr 0.00 sys + 0.00 cusr 0.00 csys = 0.04 CPU)
 ATIS: Total time: 7 wallclock secs ( 2.48 usr 0.16 sys + 0.00 cusr 0.00 csys = 2.64 CPU)
 big-tables: Total time: 10 wallclock secs ( 1.87 usr 0.34 sys + 0.00 cusr 0.00 csys = 2.21 CPU)
 connect: Total time: 64 wallclock secs (19.80 usr 5.68 sys + 0.00 cusr 0.00 csys = 25.48 CPU)
 create: Total time: 548 wallclock secs ( 3.35 usr 1.66 sys + 0.00 cusr 0.00 csys = 5.01 CPU)
 insert: Total time: 531 wallclock secs (155.04 usr 19.15 sys + 0.00 cusr 0.00 csys = 174.19 CPU)
 select: Total time: 168 wallclock secs (17.93 usr 1.90 sys + 0.00 cusr 0.00 csys = 19.83 CPU)
 transactions: Test skipped because the database doesn't support transactions
 wisconsin: Total time: 5 wallclock secs ( 1.31 usr 0.18 sys + 0.00 cusr 0.00 csys = 1.49 CPU)


This is a cluster:
 alter-table: Total time: 21 wallclock secs ( 0.04 usr 0.05 sys + 0.00 cusr 0.00 csys = 0.09 CPU)
 ATIS: Total time: 21 wallclock secs ( 2.76 usr 0.30 sys + 0.00 cusr 0.00 csys = 3.06 CPU)
 big-tables: Total time: 17 wallclock secs ( 1.98 usr 0.40 sys + 0.00 cusr 0.00 csys = 2.38 CPU)
 connect: Total time: 67 wallclock secs (21.13 usr 5.59 sys + 0.00 cusr 0.00 csys = 26.72 CPU)
 create: Total time: 597 wallclock secs ( 3.55 usr 1.55 sys + 0.00 cusr 0.00 csys = 5.10 CPU)
 insert: Total time: 1710 wallclock secs (164.66 usr 35.25 sys + 0.00 cusr 0.00 csys = 199.91 CPU)
 select: Total time: 187 wallclock secs (19.49 usr 2.44 sys + 0.00 cusr 0.00 csys = 21.93 CPU)
 transactions: Test skipped because the database doesn't support transactions
 wisconsin: Total time: 47 wallclock secs ( 1.62 usr 0.88 sys + 0.00 cusr 0.00 csys = 2.50 CPU)

A 3-fold drop on write operations and a serious (not to say epic) drop on the Wisconsin test, where, by design, reads and writes are interleaved.
I do not know why it complains that transactions are not supported; in fact, they work.

There was an assumption that everything depends on network speed, so I also tested on machines (no longer identical) with ordinary hard drives.

First node separately:
 alter-table: Total time: 55 wallclock secs ( 0.04 usr 0.02 sys + 0.00 cusr 0.00 csys = 0.06 CPU)
 ATIS: Total time: 10 wallclock secs ( 2.40 usr 0.14 sys + 0.00 cusr 0.00 csys = 2.54 CPU)
 big-tables: Total time: 7 wallclock secs ( 1.23 usr 0.15 sys + 0.00 cusr 0.00 csys = 1.38 CPU)
 connect: Total time: 53 wallclock secs (16.31 usr 7.65 sys + 0.00 cusr 0.00 csys = 23.96 CPU)
 create: Total time: 3215 wallclock secs ( 2.58 usr 0.83 sys + 0.00 cusr 0.00 csys = 3.41 CPU)
 insert: Total time: 541 wallclock secs (142.41 usr 22.53 sys + 0.00 cusr 0.00 csys = 164.94 CPU)
 select: Total time: 154 wallclock secs (12.66 usr 1.34 sys + 0.00 cusr 0.00 csys = 14.00 CPU)
 transactions: Test skipped because the database doesn't support transactions
 wisconsin: Total time: 4 wallclock secs ( 1.15 usr 0.29 sys + 0.00 cusr 0.00 csys = 1.44 CPU)


Second node separately:
 alter-table: Total time: 59 wallclock secs ( 0.03 usr 0.03 sys + 0.00 cusr 0.00 csys = 0.06 CPU)
 ATIS: Total time: 11 wallclock secs ( 2.35 usr 0.23 sys + 0.00 cusr 0.00 csys = 2.58 CPU)
 big-tables: Total time: 11 wallclock secs ( 1.92 usr 0.30 sys + 0.00 cusr 0.00 csys = 2.22 CPU)
 connect: Total time: 64 wallclock secs (19.67 usr 5.84 sys + 0.00 cusr 0.00 csys = 25.51 CPU)
 create: Total time: 4592 wallclock secs ( 3.90 usr 1.39 sys + 0.00 cusr 0.00 csys = 5.29 CPU)
 insert: Total time: 581 wallclock secs (148.16 usr 19.80 sys + 0.00 cusr 0.00 csys = 167.96 CPU)
 select: Total time: 168 wallclock secs (18.45 usr 2.07 sys + 0.00 cusr 0.00 csys = 20.52 CPU)
 transactions: Test skipped because the database doesn't support transactions
 wisconsin: Total time: 5 wallclock secs ( 1.18 usr 0.25 sys + 0.00 cusr 0.00 csys = 1.43 CPU)


Cluster:
 alter-table: Total time: 110 wallclock secs ( 0.04 usr 0.02 sys + 0.00 cusr 0.00 csys = 0.06 CPU)
 ATIS: Total time: 496 wallclock secs ( 1.61 usr 0.17 sys + 0.00 cusr 0.00 csys = 1.78 CPU)
 big-tables: Total time: 116 wallclock secs ( 1.02 usr 0.16 sys + 0.00 cusr 0.00 csys = 1.18 CPU)
 connect: Total time: 34 wallclock secs (10.98 usr 2.49 sys + 0.00 cusr 0.00 csys = 13.47 CPU)
 create: Total time: 4638 wallclock secs ( 2.42 usr 0.91 sys + 0.00 cusr 0.00 csys = 3.33 CPU)
 insert: Estimated total time: 43470.8 wallclock secs (106.50 usr 15.34 sys + 0.00 cusr 0.00 csys = 121.84 CPU)
 select: Total time: 631 wallclock secs (11.02 usr 1.02 sys + 0.00 cusr 0.00 csys = 12.04 CPU)
 transactions: Test skipped because the database doesn't support transactions
 wisconsin: Total time: 1576 wallclock secs ( 1.37 usr 0.44 sys + 0.00 cusr 0.00 csys = 1.81 CPU)


However, the assumption was not confirmed: write operations on the hard-drive machines slowed down in the cluster by as much as 10 times. I have no explanation for this.

As for using this in real production, everyone decides for themselves. For me, this relative drop in performance is not critical because the absolute numbers are small.

Possible problems and solutions


1.
 WSREP: Process completed with error: wsrep_sst_xtrabackup 'donor' '192.168.1.1:6000/xtrabackup_sst' 'backup:password' '/var/lib/mysql2/' '/etc/mysql/my2.cnf' '9bdd7773-0cb4-11e2-0800-8e876ebc6b70' '0' '0': 22 (Invalid argument) 

xtrabackup could not connect to the database. Are you, by any chance, trying to run multiple nodes on the same machine? See innobackup.backup.log for details.

2.
 121002 15:19:54 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql0 /usr/bin/mysqld_safe: 172: [: gcomm://: unexpected operator 

An interpreter other than bash is being used. Either change the first line (the shebang) of mysqld_safe, or fix the /bin/sh symlink (see the dash note above).

3. If MyISAM tables are created but not populated from a dump: this is expected; only InnoDB is supported.

4. If the MySQL client answers "Unknown command" to every query that reads or writes data: this is the split-brain protection that kicks in when the connection between the nodes is lost.

Source: https://habr.com/ru/post/152969/

