📜 ⬆️ ⬇️

"Perfect" cluster. Part 3.1 Implementing MySQL Multi-Master Cluster

In continuation of the series of articles on the “Ideal” cluster, I want to share my experience in deploying and configuring Multi-Master MySQL clusters.





My other publications on the “Ideal” cluster


First of all, you should understand which of the MySQL Galera technology implementation you will use. On the market there are implementations of Galera from Percona and MariaDB. It is these two implementations that shared the lion's share of MySQL Galera implementations.
')
Both forks are used as an InnoDB plug-in, the Percona XtraDB Storage Engine.

This engine is based on the InnoDB-plugin code and is fully compatible with it, but differs in markedly higher performance due to the integration of patches from Google and Percona.
In particular, XtraDB improved the mechanism of working with memory, improved the operation of the I / O subsystem InnoDB, added support for multiple read and write threads, support for bandwidth control,
implementation of data pre-fetch (read-ahead), adaptive checkpoint installation (adaptive checkpointing), scaling options for large projects expanded, locking system adapted
to work on systems with a large number of CPUs, added additional features for the accumulation and analysis of statistics.


At the same time, MariaDB Galera is distinguished by a number of other improvements and features .





In addition, MariaDB Galera 10 has a number of improvements compared to version 5.5:



Improvements ported from MySQL 5.6:


From myself I want to add that both forks also support HandlerSocket and Memcached plugin

A more detailed description of the stable release of MariaDB 10.0 DBMS can be found in the source on opennet

Why did I choose MariaDB Galera 10?


.

MariaDB Galera 10 supports MySQL Query Cache out of the box. Any instructions for installing any of the MySQL Galera implementations clearly indicate the need to disable Query Cache. As a result, when switching from a single database server to a cluster option, the speed of reading complex queries drops significantly. And the load on the server increases commensurately.
Percona XtraDB Cluster in version 5.6 also came close to implementing full-fledged support for Query Cache, but here you need to enable it on the “live” one, after launching the node using queries:

SET GLOBAL query_cache_size =128*1024*1024; SET GLOBAL query_cache_type = 1; 


When Query Cache is enabled, up to 95% of queries return the result from the cache instead of being executed again.

I want to immediately give a couple of their comments.

Cache should not be much . The largest size, which is generally worth installing, is no more than 512MB . Even 512MB is a lot, really you need less. And that's why:

If any of the tables, the sample of which is in the cache, undergo changes (insert or change rows), then MySQL removes such samples from the cache . This approach speeds up the work of MySQL, but may be ineffective for systems with a large number of requests for changing tables. This leads to the fact that the tables are simply locked in Waiting mode for query cache lock .


The query cache can be thought of as a hash, the keys of which are queries, and the values ​​are the results of queries.
If query cache usage is enabled, then, upon receiving a query, MySQL determines whether the first three characters of the query are “SEL”. If yes, then MySQL looks to see if there is an entry in the query cache with a key equal to the query.

From here follow two important rules:



In addition to the results, MySQL stores in cache the list of tables, a selection of which is cached.

Read more about the cache of requests, you can read in the source on habrahabr

From words to deeds



I think that you use, you have already figured out. Further in the text I describe the work with MariaDB Galera 10, but almost everything described above is also true for Percona XtraDB Cluster 5.6.

If we translate a single MySQL installation into cluster execution:



To solve the first problem there are 2 ways:

# Option 1

 mysql __ -e "show table status where Engine='MyISAM';" | awk 'NR>1 {print "ALTER TABLE "$1" ENGINE = InnoDB;"}' | mysql __ 


Option 2

 mysql __ -e "show table status where Engine='MyISAM';" | awk '{print $1}' | xargs -t -i pt-online-schema-change --alter-foreign-keys-method=auto --alter "ENGINE=InnoDB" --execute --statistics --set-vars="wait_timeout=10000,innodb_lock_wait_timeout=10,lock_wait_timeout=180" --progress=time,1 D=__,t={} 


For small tables, the first option works pretty quickly. But with large tables there are problems. Since the conversion will take a long time, the table will be blocked and all operations with it will become impossible, which will certainly affect the provision of services / services. The pt-online-schema-change utility from the percona-toolkit kit will help us to solve this problem.

This utility is installed from the repository for CentOS :

 rpm -Uhv http://www.percona.com/downloads/percona-release/percona-release-0.0-1.x86_64.rpm 


Important It is necessary that the table being converted has either a primary (PRIMARY) or a unique (UNIQUE) key, otherwise it will generate an error, for example:
Cannot chunk the original table `database`.`NAMETABLE01_NOKEY`: There is no index and the table is oversized. at / usr / bin / pt-online-schema-change line 5442.


To solve the second problem, alas, there is only one way - to add the PRIMARY or UNIQUE key via ALTER.

All tables should be a primary key (multi-column primary keys are supported). DELETE operations are unsupported on tables without a primary key. Also, it can be different ordering for different nodes.


Those. possible loss of nodes, deadlocks and other problems. Plus the order of lines gets off. It needs to be repaired first.

If we have left these problems behind, then we will proceed to installing and configuring the database server itself.



 cat > /etc/yum.repos.d/MariaDB.repo << EOL [mariadb] # MariaDB 10.0 CentOS repository list - created 2015-02-18 14:04 UTC # http://mariadb.org/mariadb/repositories/ [mariadb] name = MariaDB baseurl = http://yum.mariadb.org/10.0/centos6-amd64 gpgkey=https://yum.mariadb.org/RPM-GPG-KEY-MariaDB gpgcheck=1 EOL 


 yum install MariaDB-Galera-server MariaDB-client rsync galera ntp nscd 


 chkconfig nscd on $$ /etc/init.d/nscd start 


# Need to disable selinux, this is the requirement of the developers MariaDB

 sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config echo 0 > /selinux/enforce 


# On all nodes, the time must be set correctly, this is mandatory. Otherwise, you will encounter the fact that, with the SST node from the donor, the synchronized node will just wait for something, without any sign of activity.

 yum install ntp -y chkconfig ntpd on /etc/init.d/ntpd stop ntpdate 165.193.126.229 0.ru.pool.ntp.org 1.ru.pool.ntp.org 2.ru.pool.ntp.org 3.ru.pool.ntp.org /etc/init.d/ntpd start 


To set up MariaDB servers and Galera clusters, I wrote a script, it creates a configuration file, individually for each server.

I want to say again, at the output we get a workpiece that requires subsequent editing.

 #!/bin/sh # wget --no-check-certificate -q -O - 'https://cloud.sycraft.info/index.php/s/7bf49db6da59f6d48d61abcb2c4b4791/download' | bash -x - # fetch -o mysqld_config.sh 'https://cloud.sycraft.info/index.php/s/7bf49db6da59f6d48d61abcb2c4b4791/download' # sh mysqld_config.sh if [ "$(uname)" == 'Linux' ]; then IBS=innodb_buffer_pool_size\ \=\ $((`free -m | grep Mem | awk '{print $2}'`*60/100000))G; socket=socket=\/var\/lib\/mysql\/mysql.sock; DB=datadir=\/var\/lib\/mysql; conf=\/etc; cpu=$((`cat /proc/cpuinfo | grep -c processor`*2)) else IBS=innodb_buffer_pool_size\ \=\ $((`dmesg |grep real\ memory | awk '{print $5}' |cut -c 2- | tail -1`*60/100000))G; conf=\/var\/db\/mysql; cpu=$((`sysctl hw.ncpu | awk '{print $2}'`*2)) fi mkdir -p ~/backup/mysql > /dev/null 2>&1 mkdir $conf/mysql.d > /dev/null 2>&1 mkdir $conf/mysql.d/ssl > /dev/null 2>&1 mkdir /var/log/mysql > /dev/null 2>&1 chown mysql:mysql $conf/mysql.d chown mysql:mysql $conf/mysql.d/ssl chown -R mysql:mysql /var/log/mysql if [ -f $conf/my.cnf ]; then cp $conf/my.cnf ~/backup/mysql/my.cnf.`date +%Y-%m-%d_%H-%M` fi if [ -f $conf/mysql.d/000-galera.cnf ]; then cp $conf/mysql.d/000-galera.cnf ~/backup/mysql/000-galera.cnf.`date +%Y-%m-%d_%H-%M` fi if [ -f $conf/mysql.d/001-server.cnf ]; then cp $conf/mysql.d/001-server.cnf ~/backup/mysql/001-server.cnf.`date +%Y-%m-%d_%H-%M` fi if [ -f $conf/mysql.d/002-myisam.cnf ]; then cp $conf/mysql.d/002-myisam.cnf ~/backup/mysql/002-myisam.cnf.`date +%Y-%m-%d_%H-%M` fi if [ -f $conf/mysql.d/003-rep-master.cnf ]; then cp $conf/mysql.d/003-rep-master.cnf ~/backup/mysql/003-rep-master.cnf.`date +%Y-%m-%d_%H-%M` fi if [ -f $conf/mysql.d/004-rep-slave.cnf ]; then cp $conf/mysql.d/004-rep-slave.cnf ~/backup/mysql/004-rep-slave.cnf.`date +%Y-%m-%d_%H-%M` fi if [ -f $conf/mysql.d/005-mariadb-opt.cnf ]; then cp $conf/mysql.d/005-mariadb-opt.cnf ~/backup/mysql/005-mariadb-opt.cnf.`date +%Y-%m-%d_%H-%M` fi if [ -f $conf/mysql.d/006-ssl.cnf ]; then cp $conf/mysql.d/006-ssl.cnf ~/backup/mysql/006-ssl.cnf.`date +%Y-%m-%d_%H-%M` fi if [ -f $conf/mysql.d/007-handlersocket.cnf ]; then cp $conf/mysql.d/007-handlersocket.cnf ~/backup/mysql/007-handlersocket.cnf.`date +%Y-%m-%d_%H-%M` fi if [ -f $conf/mysql.d/008-threadpool.cnf ]; then cp $conf/mysql.d/008-threadpool.cnf ~/backup/mysql/008-threadpool.cnf.`date +%Y-%m-%d_%H-%M` fi cat > $conf/my.cnf << EOL !includedir $conf/mysql.d/ EOL # galera-only cat > $conf/mysql.d/000-galera.cnf << EOL [mysqld] wsrep_provider = /usr/lib64/galera/libgalera_smm.so wsrep_cluster_address = gcomm://192.168.0.30,192.168.0.40,192.168.0.41,192.168.0.74,192.168.0.75,192.168.0.76,192.168.0.161 # Node4 address wsrep_node_address = 192.168.0.161 # Cluser name wsrep_cluster_name = cluster wsrep_node_name = prod-db-new-04 #wsrep_slave_threads = $cpu innodb_autoinc_lock_mode = 2 # SST method wsrep_sst_method = xtrabackup wsrep_sst_auth = "sstuser:s3cretPass" ##wsrep_sst_method = rsync wsrep_retry_autocommit = 3 wsrep_provider_options = "gcache.size=5G; repl.commit_order=1; gmcast.segment=2" EOL cat > $conf/mysql.d/001-server.cnf << EOL [mysqld] symbolic-links=0 default_storage_engine = InnoDB innodb_file_per_table = 1 event_scheduler=on #character-set-server = utf8 $DB $socket # network connect_timeout = 600000 wait_timeout = 28800 max_connections = 600 max_allowed_packet = 512M max_connect_errors = 10000 net_read_timeout = 600000 connect_timeout = 600000 net_write_timeout = 600000 # innodb engine settings innodb_open_files = 512 $IBS innodb_buffer_pool_instances = 2 innodb_file_format = barracuda innodb_locks_unsafe_for_binlog = 1 innodb_flush_log_at_trx_commit = 2 innodb_flush_method = O_DIRECT transaction-isolation = READ-COMMITTED innodb-data-file-path = ibdata1:10M:autoextend innodb-log-file-size = 256M innodb_log_buffer_size = 8M # performance settings skip-name-resolve skip-external-locking skip-innodb_doublewrite query_cache_size = 128M query_cache_type = 1 query_cache_min_res_unit = 2K join_buffer_size = 8M read_rnd_buffer_size = 3M table_definition_cache = 2048 table_open_cache = 2048 thread_cache_size = 128 tmp_table_size = 2048M max_heap_table_size = 2048M log_error = /var/log/mysql/mysql-error.log #slow_query_log_file = /var/log/mysql/mysql-slow.log EOL # myisam cat > $conf/mysql.d/002-myisam.cnf << EOL [mysqld] key_buffer_size = 512M EOL # rep-master cat > $conf/mysql.d/003-rep-master.cnf << EOL [mysqld] #log-bin = /var/log/mysql/mysql-bin # cluster # binlog_format=ROW # single installation binlog_format=MIXED server_id = 226 sync-binlog = 0 expire-logs_days = 3 max-binlog-size = 1G log-slave-updates EOL # rep-slave cat > $conf/mysql.d/004-rep-slave.cnf << EOL [mysqld] slave-skip-errors = 1062 log_slave_updates = 1 slave_type_conversions=ALL_NON_LOSSY relay-log = /var/log/mysql/mysql-relay-bin relay-log-index = /var/log/mysql/mysql-relay-bin.index relay-log-info-file = /var/log/mysql/mysql-relay-log.info skip-slave-start # replicate-rewrite-db=from_name->to_name # replicate-ignore-table=db_name.table_name # replicate-wild-ignore-table=db_name.table_name EOL # mariadb-opt cat > $conf/mysql.d/005-mariadb-opt.cnf << EOL [mysqld] optimizer_switch='derived_merge=off,derived_with_keys=off' EOL # ssl cat > $conf/mysql.d/006-ssl.cnf << EOL #[mysqld] #ssl-ca = $conf/mysql.d/ssl/ca-cert.pem #ssl-cert = $conf/mysql.d/ssl/server-cert.pem #ssl-key = $conf/mysql.d/ssl/server-key.pem EOL # handlersocket cat > $conf/mysql.d/007-handlersocket.cnf << EOL [mysqld] #handlersocket_address=127.0.0.1 #handlersocket_port=9998 #handlersocket_port_wr=9999 EOL # threadpool cat > $conf/mysql.d/008-threadpool.cnf << EOL [mysqld] thread_handling = pool-of-threads thread_pool_size = $cpu EOL 


Life does not stand still and I, just like you, continue to evolve continuously, the latest version of the script, it is better to take immediately from its permanent page , probably from the moment of writing the article, a lot has changed in it.

Explanations to the config and generation script



wsrep_sst_method = xtrabackup

If you use rsync mode, then at the time of synchronization of the node from the donor, the donor will be completely blocked for recording. In xtrabackup mode, the lock will last only a few seconds while xtrabackup “clings” to the base.
If you use HAProxy as described here HAPRoxy for Percona or Galera on CentOS. Its configuration and monitoring in Zabbix is to work with the server while it is in donor mode, we need to edit the clustercheck script on the nodes.


# Replacing string
 AVAILABLE_WHEN_DONOR=${3:-0} 


# per line
 AVAILABLE_WHEN_DONOR=1 


In this mode, in the case of a complete drop in all the nodes of the cluster, we can reduce the idle time to synchronization to the minimum.

transaction_isolation = REPEATABLE-READ

It is worth trying to change to transaction-isolation = READ-COMMITTED transition to snapshot transactions. Each transaction becomes a kind of independent sandbox. Snapshot of data.
This is similar to the Oracle isolation level. All SELECT ... FOR UPDATE and SELECT ... LOCK IN SHARE MODE statements block only index entries and do not block the interval in front of them. Therefore, they allow you to freely add new entries after being blocked. UPDATE and DELETE, which use a unique index and unique search conditions, block only the found index entry, and do not block the interval in front of it. But in UPDATE and DELETE, a range type in InnoDB must set the next key lock or interval lock and block other users from adding to the interval covered by the range. This is necessary because "Phantom lines" must be blocked for successful replication and recovery in MySQL. Consistent reading works just like Oracle: every consistent read, even within a single transaction, sets and reads its own snapshot.
In most cases, the transition gives an increase in speed on a competitive record, but phantom reading effect is also possible. In my practice, I have met only one application that was ill with phantom. Those. These are applications using DBMS, you need to check for the ability to work in this mode.


innodb_flush_log_at_trx_commit = 2

A value of “1” means that any completed transaction will synchronously flush the log to disk. This is the default option, it is the most reliable in terms of data integrity, but the slowest in terms of speed.
The value “2” does the same, only flushes the log not to the disk, but to the cache of the operating system (that is, it does not flush after each operation). This value is suitable in most cases, because does not perform expensive write operations after each transaction. In this case, the log is written to the disk with a delay of several seconds, which is very safe from the point of view of data integrity.
But we have a cluster and in the event of a crash, the data will still be transferred from the donor. The main thing that the transaction zakomitilas on other nodes. Then we get the data at SST


innodb_buffer_pool_instances = 2

By default, InnoDB uses one instance for the Buffer Pool.
In this case, it is possible to select several blocks - and in some cases MySQL works with them in InnoDB much more efficiently. This is associated with smaller cache locks when writing data.


innodb_file_format = barracuda

This format is the most "new" and supports compression. This reduces the load on the IO (disks) by using compression. Just as a recommendation, you can use a 16KB block size.


Here is an example of alter:
 ALTER TABLE `t1` ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=16; 


Here are the results of testing the speed and size of data under compression.
But there are downsides to compression . On compressed tables, ALTERs will occur much longer, as you know, ALTER, like any DDL, locks the table and with it the entire cluster. ALTER is not a transactional instruction, and therefore is not replicated by ROW binary diffs, but is transmitted in the form of requests. And until this request is executed on all nodes of the cluster, all commits will be frozen.
Those. it makes sense to do compression either on large tables where DDL is not planned in principle, or on single mysql instances.

innodb_flush_method = O_DIRECT

Data reset bypassing disk cache. This is necessary to prevent double buffering of data in the innodb_buffer_pool cache and the file system cache. Will allow more efficient use of RAM.


It is worth adding the skip-innodb_doublewrite parameter, which is important in terms of performance .

Even though it doesn’t need to be written over twice. Write to write buffer it is pretty cheap. It can be used as a rule. . However, it could not be implemented at the same time. I would expect no more than 5-10% performance loss due to use of doublewrite.


tmp_table_size = 2048M
max_heap_table_size = 2048M

Very interesting parameters, their values ​​need to squeeze to the maximum. This will reduce the number of temporary tables created on the disk. As a rule, it is the creation of temporary tables on the disk that takes more time on sorts, groupings and other complex select.


optimizer_switch = 'derived_merge = off, derived_with_keys = off'

There are problems with the compatibility of the application with the base, after switching to percona 5.6 and galera 10. The most significant of them should be immediately warned with the parameter


thread_handling = pool-of-threads
thread_pool_size = number of cores

You should also use thread_pool


wsrep_retry_autocommit = 3

Important! If the database is deadlock, the commits will be retarded, i.e. The node will not fall out of the cluster at the first sneeze, but will continue to work and we do not lose commit.

wsrep_provider_options = “gcache.size = 5G; repl.commit_order = 1; gmcast.segment = 2 "

Here is a detailed description, these parameters I usually put by default always.

The parameter wsrep_replicate_myisam = 1 is almost a 100% guarantee of the death of the cluster if at least one combat myisam table appears there.

This feature is still experimental and its inclusion adds to the ROW (based on binary diff snapshots) replication and the statement, as in the replication of DDL commands. This means constant conflicts, locks and the collapse of the cluster after any of the myisam tables.


If you have any questions, difficulties, or need advice:
My profile contacts

Source: https://habr.com/ru/post/253869/


All Articles