After my first and second articles on Apache CloudStack, a reader asked me how to organize a highly available ACS management server setup using Galera. I had not used such a deployment topology before, so I decided to try this configuration.
This article describes a method for deploying a fault-tolerant configuration of Apache CloudStack management servers on top of a MariaDB multi-master cluster (Galera). The following software versions were used when preparing this guide:
UPD: after the initial publication of this article, an architect at ShapeBlue drew my attention to some ACS limitations that make the original version unusable as-is, so the article has been amended to take the identified shortcomings into account.
The target deployment model is shown in the following figure.
Within this model, each ACS server connects to one selected MariaDB server as its master, with the remaining servers assigned as slaves. Some English-language guides suggest placing an intermediate HAProxy layer between ACS and MariaDB so that each ACS server can transparently switch between DBMS servers on failure; in my opinion, however, it is enough to switch external traffic between the ACS servers themselves, which can be done with nginx.
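As an illustration, a minimal nginx sketch of such external switching might look like the following (the hostnames h1, h2, h3 and port 8080 match the deployment described later in this article; ip_hash is one way to keep each client pinned to a single management server, since UI sessions are not shared between nodes):

upstream cloudstack-ui {
    ip_hash;            # keep each client on one management server
    server h1:8080;
    server h2:8080;
    server h3:8080;
}

server {
    listen 80;
    location /client {
        proxy_pass http://cloudstack-ui;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}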
Let's consider the general scheme of deploying the entire system and the initial assumptions.
To deploy the MariaDB configuration, we will use an Ansible playbook available on GitHub. This approach lets the article focus on installing ACS rather than on setting up a Galera cluster. Since the playbook is designed for CentOS 7, that operating system was chosen; if the reader plans a similar setup on another operating system compatible with ACS 4.9.2, the general scheme remains the same.
The playbook is not mine; the repository shows where it was forked from. Compared to the original, it is slightly modified to support networking on eth0.
Strange as it may sound, you cannot install ACS 4.9.2 directly onto a running Galera cluster. The limitation stems from the fact that during installation the management server creates a database for version 4.0 and then converts it to the 4.9.2 schema through a chain of migrations. The initial installation and the migrations rely on database engines (used in earlier ACS versions) that Galera does not support, so the process does not complete successfully.
To overcome this problem, we will perform the installation in several stages:
The result is the working topology shown in the first figure.
We will deploy on 4 hosts:
The Galera cluster deployment playbook assumes that the hosts are on a secure network (simplified firewall setup) reachable through the eth0 network interface, and that the firewalld service is in use.
Install the base components on the ac host:
# yum install epel-release
# yum install net-tools git mariadb mariadb-server ansible
Launch MariaDB:
# systemctl start mariadb
Check that MariaDB works:
# mysql -uroot
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 2
Server version: 5.5.52-MariaDB MariaDB Server

Copyright (c) 2000, 2016, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]>
Add the options recommended for ACS to the [mysqld] section of the MariaDB server settings (/etc/my.cnf.d/server.cnf):
[mysqld]
innodb_rollback_on_timeout=1
innodb_lock_wait_timeout=600
max_connections=350
log-bin=mysql-bin
binlog-format = 'ROW'
Restart MariaDB:
# systemctl restart mariadb
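Optionally, verify that the new settings took effect (a quick sanity check; the value shown follows directly from the config above):

# mysql -uroot -e "SHOW VARIABLES LIKE 'max_connections';"
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| max_connections | 350   |
+-----------------+-------+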
Create a Yum repository file for Apache CloudStack, /etc/yum.repos.d/cloudstack.repo, with the following contents:
[cloudstack]
name=cloudstack
baseurl=http://cloudstack.apt-get.eu/centos/7/4.9/
enabled=1
gpgcheck=0
Install the ACS management server package:
# yum install cloudstack-management
Locate the Java security file:
# grep -l '/dev/random' /usr/lib/jvm/java-*/jre/lib/security/java.security
/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.131-3.b12.el7_3.x86_64/jre/lib/security/java.security
where we replace the random number generator (required for the encryption library used in ACS):
# sed -i 's#securerandom.source=file:/dev/random#securerandom.source=file:/dev/urandom#' /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.131-3.b12.el7_3.x86_64/jre/lib/security/java.security
Change the SELinux policy to permissive in the file:
# sed -i 's#SELINUX=enforcing#SELINUX=permissive#' /etc/selinux/config
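The change in /etc/selinux/config takes effect at the next boot; to switch the running system to permissive mode immediately, you can additionally run:

# setenforce 0
# getenforce
Permissive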
Run the ACS database installation (creating the MariaDB user cloud with password secret) as the root DB user (no password is required for local root database access):
# cloudstack-setup-databases cloud:secret@localhost --deploy-as=root
Mysql user name:cloud                                                          [ OK ]
Mysql user password:******                                                     [ OK ]
Mysql server ip:localhost                                                      [ OK ]
Mysql server port:3306                                                         [ OK ]
Mysql root user name:root                                                      [ OK ]
Mysql root user password:******                                                [ OK ]
Checking Cloud database files ...                                              [ OK ]
Checking local machine hostname ...                                            [ OK ]
Checking SELinux setup ...                                                     [ OK ]
Detected local IP address as 176.120.25.66, will use as cluster management server node IP [ OK ]
Preparing /etc/cloudstack/management/db.properties                             [ OK ]
Applying /usr/share/cloudstack-management/setup/create-database.sql            [ OK ]
Applying /usr/share/cloudstack-management/setup/create-schema.sql              [ OK ]
Applying /usr/share/cloudstack-management/setup/create-database-premium.sql    [ OK ]
Applying /usr/share/cloudstack-management/setup/create-schema-premium.sql      [ OK ]
Applying /usr/share/cloudstack-management/setup/server-setup.sql               [ OK ]
Applying /usr/share/cloudstack-management/setup/templates.sql                  [ OK ]
Processing encryption ...                                                      [ OK ]
Finalizing setup ...                                                           [ OK ]
CloudStack has successfully initialized database, you can check your database configuration in /etc/cloudstack/management/db.properties
Run the installation of the management server:
# cloudstack-setup-management --tomcat7
Starting to configure CloudStack Management Server:
Configure Firewall ...        [OK]
Configure CloudStack Management Server ...  [OK]
CloudStack Management Server setup is Done!
After a few seconds, check server availability (http://ac:8080/client):
# LANG=C wget -O /dev/null http://ac:8080/client 2>&1 | grep '200 OK'
HTTP request sent, awaiting response... 200 OK
I also recommend opening this URL in a browser and making sure that the server works and lets you log in with the admin/password pair. If so, the management server installation has completed successfully.
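You can also exercise the API directly; a quick sketch, assuming the default admin/password credentials are still in place (a successful response contains a session key):

# curl -s 'http://ac:8080/client/api?command=login&username=admin&password=password&response=json'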
Now save the database dumps for further import to the Galera cluster:
# mysqldump -uroot cloud >cloud.sql
# mysqldump -uroot cloud_usage >cloud_usage.sql
Ensure that all tables use the InnoDB engine:
# grep ENGINE *.sql | grep -v InnoDB | grep -v -c '*/'
0
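The same check can be made against the live databases instead of the dumps (an equivalent sketch; an empty result means every table is InnoDB):

# echo "SELECT TABLE_NAME, ENGINE FROM information_schema.TABLES WHERE TABLE_SCHEMA IN ('cloud', 'cloud_usage') AND ENGINE <> 'InnoDB';" | mysql -uroot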
We will deploy the Galera cluster using Ansible. Ansible works over SSH, so the public SSH key of the ac host must be installed on hosts h1, h2, and h3.
Generate an SSH key on the ac host:
# ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
30:81:f7:90:f0:4f:e0:d4:8a:74:ca:40:ba:d9:c7:8e root@ac
The key's randomart image is:
+--[ RSA 2048]----+
|  ..  .o+o       |
|  .. o+=o.       |
|. + ==+.         |
| + .+ .=.        |
|o . o S          |
|       +         |
|  E .            |
|                 |
|                 |
+-----------------+
Spread the key to all hosts:
# ssh-copy-id h1
The authenticity of host 'h1 (XYZC)' can't be established.
ECDSA key fingerprint is 27:f7:34:23:ea:b4:d2:61:8c:ec:d8:13:c2:9f:8a:ef.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@h1's password:

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'h1'"
and check to make sure that only the key(s) you wanted were added.
Do the same for the h2 and h3 hosts.
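Equivalently, as a small loop (you will be prompted for each host's password):

# for h in h2 h3; do ssh-copy-id "$h"; done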
Now you need to clone the Git repository with the Ansible playbook:
# git clone https://github.com/bwsw/mariadb-ansible-galera-cluster.git
Cloning into 'mariadb-ansible-galera-cluster'...
remote: Counting objects: 194, done.
remote: Compressing objects: 100% (5/5), done.
remote: Total 194 (delta 0), reused 2 (delta 0), pack-reused 188
Receiving objects: 100% (194/194), 29.91 KiB | 0 bytes/s, done.
Resolving deltas: 100% (68/68), done.
Change to the mariadb-ansible-galera-cluster directory:

# cd mariadb-ansible-galera-cluster
# pwd
/root/mariadb-ansible-galera-cluster
Edit the galera.hosts file, listing our servers h1, h2, h3:

[galera_cluster]
h[1:3] ansible_user=root
Check the correctness of the configuration:
# ansible -i galera.hosts all -m ping
h3 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
h2 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
h1 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
Install the necessary dependencies:
# ansible -i galera.hosts all -m raw -s -a "yum install -y epel-release firewalld ntpd"
Update system time:
# ansible -i galera.hosts all -m raw -s -a "chkconfig ntpd on && service ntpd stop && ntpdate 165.193.126.229 0.ru.pool.ntp.org 1.ru.pool.ntp.org 2.ru.pool.ntp.org 3.ru.pool.ntp.org && service ntpd start"
Make the necessary changes to the Ansible configuration file (/etc/ansible/ansible.cfg), as described in the playbook's README.md:
[defaults]
gathering = smart
fact_caching = jsonfile
fact_caching_connection = ~/.ansible/cache
Start the Galera cluster installation according to the playbook's README.md:
# ansible-playbook -i galera.hosts galera.yml --tags setup
This will take several minutes. When it completes, MariaDB servers will be installed on hosts h1, h2, h3, but not yet assembled into a cluster. The next step is to run the remaining steps of the playbook:
# ansible-playbook -i galera.hosts galera.yml --skip-tags setup
The last step is to start the cluster:
# ansible-playbook -i galera.hosts galera_bootstrap.yml
Check the status of the cluster. To do this, log in to each of the hosts h1, h2, h3 and open the mysql console:
# mysql -uroot
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 6
Server version: 10.1.25-MariaDB MariaDB Server

Copyright (c) 2000, 2017, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> SHOW GLOBAL STATUS LIKE 'wsrep_%';
+------------------------------+------------------------------------------------------------+
| Variable_name                | Value                                                      |
+------------------------------+------------------------------------------------------------+
| wsrep_apply_oooe             | 0.000000                                                   |
| wsrep_apply_oool             | 0.000000                                                   |
| wsrep_apply_window           | 0.000000                                                   |
| wsrep_causal_reads           | 0                                                          |
| wsrep_cert_deps_distance     | 0.000000                                                   |
| wsrep_cert_index_size        | 0                                                          |
| wsrep_cert_interval          | 0.000000                                                   |
| wsrep_cluster_conf_id        | 3                                                          |
| wsrep_cluster_size           | 3                                                          |
| wsrep_cluster_state_uuid     | 230ed410-6b9b-11e7-8aa5-4b041fceb486                       |
| wsrep_cluster_status         | Primary                                                    |
| wsrep_commit_oooe            | 0.000000                                                   |
| wsrep_commit_oool            | 0.000000                                                   |
| wsrep_commit_window          | 0.000000                                                   |
| wsrep_connected              | ON                                                         |
| wsrep_desync_count           | 0                                                          |
| wsrep_evs_delayed            |                                                            |
| wsrep_evs_evict_list         |                                                            |
| wsrep_evs_repl_latency       | 0/0/0/0/0                                                  |
| wsrep_evs_state              | OPERATIONAL                                                |
| wsrep_flow_control_paused    | 0.000000                                                   |
| wsrep_flow_control_paused_ns | 0                                                          |
| wsrep_flow_control_recv      | 0                                                          |
| wsrep_flow_control_sent      | 0                                                          |
| wsrep_gcomm_uuid             | 230d9f4f-6b9b-11e7-ba99-fab39514a7e8                       |
| wsrep_incoming_addresses     | 111.120.25.96:3306,111.120.25.229:3306,111.120.25.152:3306 |
| wsrep_last_committed         | 0                                                          |
| wsrep_local_bf_aborts        | 0                                                          |
| wsrep_local_cached_downto    | 18446744073709551615                                       |
| wsrep_local_cert_failures    | 0                                                          |
| wsrep_local_commits          | 0                                                          |
| wsrep_local_index            | 0                                                          |
| wsrep_local_recv_queue       | 0                                                          |
| wsrep_local_recv_queue_avg   | 0.000000                                                   |
| wsrep_local_recv_queue_max   | 1                                                          |
| wsrep_local_recv_queue_min   | 0                                                          |
| wsrep_local_replays          | 0                                                          |
| wsrep_local_send_queue       | 0                                                          |
| wsrep_local_send_queue_avg   | 0.000000                                                   |
| wsrep_local_send_queue_max   | 1                                                          |
| wsrep_local_send_queue_min   | 0                                                          |
| wsrep_local_state            | 4                                                          |
| wsrep_local_state_comment    | Synced                                                     |
| wsrep_local_state_uuid       | 230ed410-6b9b-11e7-8aa5-4b041fceb486                       |
| wsrep_protocol_version       | 7                                                          |
| wsrep_provider_name          | Galera                                                     |
| wsrep_provider_vendor        | Codership Oy <info@codership.com>                          |
| wsrep_provider_version       | 25.3.20(r3703)                                             |
| wsrep_ready                  | ON                                                         |
| wsrep_received               | 10                                                         |
| wsrep_received_bytes         | 769                                                        |
| wsrep_repl_data_bytes        | 0                                                          |
| wsrep_repl_keys              | 0                                                          |
| wsrep_repl_keys_bytes        | 0                                                          |
| wsrep_repl_other_bytes       | 0                                                          |
| wsrep_replicated             | 0                                                          |
| wsrep_replicated_bytes       | 0                                                          |
| wsrep_thread_count           | 2                                                          |
+------------------------------+------------------------------------------------------------+
58 rows in set (0.01 sec)

MariaDB [(none)]>
Pay attention to the wsrep_incoming_addresses field: it should contain the IP addresses of all three servers h1, h2, h3.
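The same check can be run from the ac host for all nodes at once (a convenience sketch; every host should report a cluster size of 3):

# ansible -i galera.hosts all -m raw -s -a "mysql -uroot -e \"SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size'\""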
The next step is to import cloud.sql and cloud_usage.sql into the cluster.
Copy the dumps:
# scp ../*.sql h1:
cloud.sql                                     100% 1020KB   1.0MB/s   00:00
cloud_usage.sql                               100%   33KB  32.7KB/s   00:00
Open an SSH connection to host h1, where we will perform all operations for importing databases:
# ssh h1
Create databases, issue privileges to the user, and import the dumps:
[root@h1 ~]# echo "CREATE DATABASE cloud;" | mysql -uroot [root@h1 ~]# echo "CREATE DATABASE cloud_usage;" | mysql -uroot [root@h1 ~]# echo "GRANT ALL PRIVILEGES ON cloud.* TO cloud@'localhost' identified by 'secret'" | mysql -uroot [root@h1 ~]# echo "GRANT ALL PRIVILEGES ON cloud_usage.* TO cloud@'localhost' identified by 'secret'" | mysql -uroot [root@h1 ~]# echo "GRANT ALL PRIVILEGES ON cloud.* TO cloud@'h1' identified by 'secret'" | mysql -uroot [root@h1 ~]# echo "GRANT ALL PRIVILEGES ON cloud_usage.* TO cloud@'h1' identified by 'secret'" | mysql -uroot [root@h1 ~]# echo "GRANT ALL PRIVILEGES ON cloud.* TO cloud@'h2' identified by 'secret'" | mysql -uroot [root@h1 ~]# echo "GRANT ALL PRIVILEGES ON cloud_usage.* TO cloud@'h2' identified by 'secret'" | mysql -uroot [root@h1 ~]# echo "GRANT ALL PRIVILEGES ON cloud.* TO cloud@'h3' identified by 'secret'" | mysql -uroot [root@h1 ~]# echo "GRANT ALL PRIVILEGES ON cloud_usage.* TO cloud@'h3' identified by 'secret'" | mysql -uroot [root@h1 ~]# cat cloud.sql | mysql -uroot cloud [root@h1 ~]# cat cloud_usage.sql | mysql -uroot cloud_usage
It is worth making sure that the cloud and cloud_usage databases have been replicated to the h2 and h3 servers.
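A quick spot check from the ac host (a sketch; cloud.configuration is one of the tables created by the installer, so a non-zero count indicates the data arrived):

# ssh h2 "echo 'SELECT COUNT(*) FROM cloud.configuration;' | mysql -uroot"
# ssh h3 "echo 'SELECT COUNT(*) FROM cloud.configuration;' | mysql -uroot"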
Now we will install the ACS control servers. First, from the ac host, copy the ACS repository settings:
# ansible -i galera.hosts all -m copy -a "src=/etc/yum.repos.d/cloudstack.repo dest=/etc/yum.repos.d/cloudstack.repo"
Then install the Apache CloudStack management server packages:
# ansible -i galera.hosts all -m raw -s -a "yum install -y cloudstack-management"
Change the Java security setting we already know:
# ansible -i galera.hosts all -m raw -s -a "sed -i 's#securerandom.source=file:/dev/random#securerandom.source=file:/dev/urandom#' /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.131-3.b12.el7_3.x86_64/jre/lib/security/java.security"
Configure the connection of the ACS management servers to the DBMS on the h1 host:
# ansible -i galera.hosts all -m raw -s -a "cloudstack-setup-databases cloud:secret@h1"
Now all servers will use host h1 as the main database server. Next, we will perform additional configuration of auxiliary servers.
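You can verify that the generated db.properties points at h1 on every node (a sketch; db.cloud.host and db.usage.host are the relevant keys):

# ansible -i galera.hosts all -m raw -s -a "grep -E '^db\.(cloud|usage)\.host=' /etc/cloudstack/management/db.properties"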
Install the Apache CloudStack MySQL high-availability connection package (for some reason it is missing from the CentOS 7 repository):
# ansible -i galera.hosts all -m raw -s -a "rpm -i http://packages.shapeblue.com.s3-eu-west-1.amazonaws.com/cloudstack/upstream/centos7/4.9/cloudstack-mysql-ha-4.9.2.0-shapeblue0.el7.centos.x86_64.rpm"
We start the control servers:
# ansible -i galera.hosts all -m raw -s -a "cloudstack-setup-management --tomcat7"
Make sure everything is up and running:
# ansible -i galera.hosts all -m raw -s -a "ps xa | grep java"
Now fault-tolerant management servers are running at http://h1:8080/client, http://h2:8080/client, and http://h3:8080/client, and they can be fronted by nginx to provide high availability. Meanwhile, in the log /var/log/cloudstack/management/management-server.log you can see records showing that the servers know about each other:
# cat /var/log/cloudstack/management/management-server.log | grep 'management node'
2017-07-18 17:05:27,209 DEBUG [c.c.c.ClusterManagerImpl] (Cluster-Heartbeat-1:ctx-b1d64593) (logid:18f29c84) Detected management node joined, id:7, nodeIP:111.120.25.96
2017-07-18 17:05:27,231 DEBUG [c.c.c.ClusterManagerImpl] (Cluster-Heartbeat-1:ctx-b1d64593) (logid:18f29c84) Detected management node joined, id:12, nodeIP:111.120.25.152
2017-07-18 17:05:27,231 DEBUG [c.c.c.ClusterManagerImpl] (Cluster-Heartbeat-1:ctx-b1d64593) (logid:18f29c84) Detected management node joined, id:17, nodeIP:111.120.25.229
2017-07-18 17:05:33,195 DEBUG [c.c.c.ClusterManagerImpl] (Cluster-Heartbeat-1:ctx-76484aea) (logid:b7416e7b) Detected management node left and rejoined quickly, id:7, nodeIP:111.120.25.96
2017-07-18 17:07:22,582 INFO  [c.c.c.ClusterManagerImpl] (localhost-startStop-1:null) (logid:) Detected that another management node with the same IP 111.120.25.229 is considered as running in DB, however it is not pingable, we will continue cluster initialization with this management server node
2017-07-18 17:07:33,292 DEBUG [c.c.c.ClusterManagerImpl] (Cluster-Heartbeat-1:ctx-8880f35d) (logid:037a1c4d) Detected management node joined, id:7, nodeIP:111.120.25.96
2017-07-18 17:07:33,312 DEBUG [c.c.c.ClusterManagerImpl] (Cluster-Heartbeat-1:ctx-8880f35d) (logid:037a1c4d) Detected management node joined, id:12, nodeIP:111.120.25.152
2017-07-18 17:07:33,312 DEBUG [c.c.c.ClusterManagerImpl] (Cluster-Heartbeat-1:ctx-8880f35d) (logid:037a1c4d) Detected management node joined, id:17, nodeIP:111.120.25.229
2017-07-18 17:12:41,927 DEBUG [c.c.c.ClusterManagerImpl] (Cluster-Heartbeat-1:ctx-7e3e2d82) (logid:e63ffa72) Detected management node joined, id:7, nodeIP:111.120.25.96
2017-07-18 17:12:41,935 DEBUG [c.c.c.ClusterManagerImpl] (Cluster-Heartbeat-1:ctx-7e3e2d82) (logid:e63ffa72) Detected management node joined, id:12, nodeIP:111.120.25.152
2017-07-18 17:12:41,935 DEBUG [c.c.c.ClusterManagerImpl] (Cluster-Heartbeat-1:ctx-7e3e2d82) (logid:e63ffa72) Detected management node joined, id:17, nodeIP:111.120.25.229
Finally, configure the ACS management servers to use the additional Galera servers (h2, h3) as slaves and restart them:
# ansible -i galera.hosts all -m raw -s -a "service cloudstack-management stop" # ansible -i galera.hosts all -m raw -s -a "sed -i 's#db.ha.enabled=false#db.ha.enabled=true#' /etc/cloudstack/management/db.properties" # ansible -i galera.hosts all -m raw -s -a "sed -i 's#db.cloud.slaves=.*#db.cloud.slaves=h2,h3#' /etc/cloudstack/management/db.properties" # ansible -i galera.hosts all -m raw -s -a "sed -i 's#db.usage.slaves=.*#db.usage.slaves=h2,h3#' /etc/cloudstack/management/db.properties" # ansible -i galera.hosts all -m raw -s -a "service cloudstack-management start"
The setup is quite simple and should not cause any difficulties. It is somewhat disappointing that the installation cannot be performed on a Galera cluster right away, and that after installation you have to migrate from a non-replicated environment to a replicated one.
Source: https://habr.com/ru/post/333590/