Failover architecture for two web servers on Debian Squeeze

I was given the task of making a web application fault-tolerant using two servers. The web application consists of static files and data in a MySQL database.
The customer's main requirement: the web application must always be available, and if a failure occurs, service must be restored within 5 minutes.
The two servers, located in geographically separate data centers, have to meet this requirement.

To solve this problem, I chose Debian Squeeze, since the web application's developers use Debian. I decided to implement the failover logic through DNS. There is a domain name, test.ru; my two servers act as its NS servers, and all zone information is stored locally. If the main server fails, the zone is rewritten so that the A record points to the backup server.
Besides DNS, the files have to be synchronized, and in both directions, because data may be uploaded to the backup server while the main one is down. For this I use the Unison package.
The MySQL databases are synchronized with standard bidirectional (master-master) MySQL replication. With unidirectional (master-slave) replication, data written to the slave database is not propagated back, so the databases can become inconsistent, which can lead to a replication error. With bidirectional replication, changes made on either server are replicated to the other, so the databases stay consistent.
The arbiter of the whole system is a set of home-made scripts, which I give below.

System preparation

We start with a clean Debian system with the apache2 and mysql-server packages installed. I will not describe their installation, since there is plenty of information about it on the internet.
The main server is master, IP 10.1.0.1; the backup server is slave, IP 10.2.0.2.

File Sync

Web application files are located in the /site/web/ directory. Apache on both servers should already be configured to serve this directory.
Next, to synchronize files, we need passwordless SSH access between the servers. For this, generate a key pair on the main server and copy the public key to the backup server:

 ssh-keygen -t rsa    # leave the passphrase empty
 scp /root/.ssh/id_rsa.pub root@10.2.0.2:/root/.ssh/authorized_keys2

and similarly on the second server:

 ssh-keygen -t rsa    # leave the passphrase empty
 scp /root/.ssh/id_rsa.pub root@10.1.0.1:/root/.ssh/authorized_keys2
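
Note that the scp commands above overwrite any existing authorized_keys2 file on the target server. A minimal sketch of a gentler alternative, assuming the public key has first been copied to some temporary file on the target: the hypothetical helper add_key (not part of the original setup) appends the key only if it is not there yet. The standard ssh-copy-id tool does a similar job.

```shell
#!/bin/sh
# add_key KEYFILE AUTHFILE
# Appends the public key stored in KEYFILE to AUTHFILE
# unless an identical line is already present.
# (add_key is a hypothetical helper name.)
add_key() {
    keyfile=$1
    authfile=$2
    mkdir -p "$(dirname "$authfile")"
    touch "$authfile"
    chmod 600 "$authfile"
    key=$(cat "$keyfile")
    # -q quiet, -x match the whole line, -F treat the key as a fixed string
    if ! grep -qxF -- "$key" "$authfile"; then
        printf '%s\n' "$key" >> "$authfile"
    fi
}
```

On the target server it could be called as add_key /tmp/master_id_rsa.pub /root/.ssh/authorized_keys2, where the temporary path is only an example.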

After that, make sure that the contents of the /site/web/ directories on both servers are identical, and install Unison:

 apt-get install unison

Then create the configuration file /root/.unison/web.prf with the following contents:

 # Directories to be synchronized
 root = /site/web
 root = ssh://root@10.2.0.2//site/web
 # Preserve owner and modification times, run without asking questions
 owner = true
 times = true
 batch = true
 # Logging
 log = true
 logfile = /var/log/unison_sync.log

Now you can start synchronization with the following command on the main server:

 unison web

To synchronize automatically every 5 minutes, use the following script:

 #!/bin/sh
 # Do nothing if the previous synchronization is still running
 if [ -f /var/lock/sync.lock ]
 then
     echo lockfile exists!
     exit 1
 fi
 /usr/bin/touch /var/lock/sync.lock
 # "web" is the profile created above (/root/.unison/web.prf)
 /usr/bin/unison web
 /bin/rm /var/lock/sync.lock
 #End

Save this script as /root/bin/sync.sh and make it executable:

 chmod +x /root/bin/sync.sh

Then add the job to cron with "crontab -e":

 */5 * * * * /root/bin/sync.sh > /dev/null 2>&1

Cron will now run this script every 5 minutes.
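
The lock in sync.sh is taken in two steps (test, then touch), so two runs that start at the same moment could both pass the test. A minimal sketch of an atomic variant, assuming a lock directory instead of a lock file; run_locked is a hypothetical helper name, and the lock path in the usage line is an assumption as well:

```shell
#!/bin/sh
# run_locked LOCKDIR COMMAND [ARGS...]
# Runs COMMAND only if LOCKDIR can be created. mkdir either creates the
# directory or fails, so two concurrent callers cannot both get the lock.
# (run_locked is a hypothetical helper name.)
run_locked() {
    lockdir=$1
    shift
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "lock $lockdir exists, skipping this run" >&2
        return 1
    fi
    "$@"
    status=$?
    # Release the lock whether the command succeeded or failed
    rmdir "$lockdir"
    return $status
}
```

In sync.sh this could replace the lock handling with a single line such as run_locked /var/lock/sync.lock.d /usr/bin/unison web. A lock left behind by a killed process still has to be removed by hand, exactly as with the original script.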

MySQL replication

For replication, we will use a MySQL user named replication with the password some_password.
On the master server, edit the file /etc/mysql/my.cnf. Insert the following lines into the section related to replication:

 server-id = 1
 log_bin = /var/log/mysql/mysql-bin.log
 expire_logs_days = 10
 max_binlog_size = 100M
 binlog_ignore_db = mysql
 binlog_ignore_db = test
 master-host = 10.2.0.2          # IP address of the slave server
 master-user = replication       # replication user
 master-password = some_password # its password
 master-port = 3306

In the same file, change the bind-address variable so that MySQL listens on all interfaces:

 bind-address = 0.0.0.0

Log in to MySQL as root and allow the replication user to connect to this server from the backup (slave) server:

 mysql -u root -p
 Enter password: (enter the MySQL root password)
 > grant replication slave on *.* to 'replication'@'10.2.0.2' identified by 'some_password';
 > flush privileges;
 > quit;
 /etc/init.d/mysql restart

Now go to the second (slave) server. Edit the file /etc/mysql/my.cnf. Insert the following lines into the section related to replication:

 server-id = 2
 log_bin = /var/log/mysql/mysql-bin.log
 expire_logs_days = 10
 max_binlog_size = 100M
 binlog_ignore_db = mysql
 binlog_ignore_db = test
 master-host = 10.1.0.1          # IP address of the master server
 master-user = replication       # replication user
 master-password = some_password # its password
 master-port = 3306

In the same file, change the bind-address variable so that MySQL listens on all interfaces:

 bind-address = 0.0.0.0

Log in to MySQL as root and allow the replication user to connect to this server from the master server:

 mysql -u root -p
 Enter password: (enter the MySQL root password)
 > grant replication slave on *.* to 'replication'@'10.1.0.1' identified by 'some_password';
 > flush privileges;
 > quit;
 /etc/init.d/mysql restart

On both servers, verify that the replication process is running. To do this, run the following:

 mysql -u root -p
 Enter password: (enter the MySQL root password)
 > show slave status \G

In the output we are interested in three parameters:

 Slave_IO_State: Waiting for master to send event
 Slave_IO_Running: Yes
 Slave_SQL_Running: Yes

If these parameters match the above on both servers, replication is configured correctly. If not, look at the logs.
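
The same check can be scripted. A sketch, assuming the output of show slave status \G is piped in on standard input; replication_ok is a hypothetical helper name, not something MySQL provides:

```shell
#!/bin/sh
# replication_ok
# Reads the output of "show slave status \G" on stdin and succeeds only
# if both replication threads report "Yes".
# (replication_ok is a hypothetical helper name.)
replication_ok() {
    awk -F': *' '
        /Slave_IO_Running:/  { io  = $2 }
        /Slave_SQL_Running:/ { sql = $2 }
        END { exit !(io == "Yes" && sql == "Yes") }
    '
}
```

A possible call is mysql -u root -p -e 'show slave status \G' | replication_ok && echo "replication OK", run on each of the two servers.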

DNS servers

To manage the test.ru domain zone, you need to delegate it to our servers in the domain's settings. Suppose our servers are known on the internet as ns.master.my.com (main server) and ns.slave.my.com (backup server).
We can manage the DNS zone only after it has been delegated to our servers.
On both servers, install the bind9 package:

 apt-get install bind9

Add a line to /etc/bind/named.conf that includes our own zone definitions:

 echo 'include "/etc/bind/my-zones.conf";' >> /etc/bind/named.conf

Then create the /etc/bind/my-zones.conf file with the following contents:

 zone "test.ru" {
     type master;
     file "/etc/bind/db.test.ru";
 };

On the first (master) server, create the zone file /etc/bind/db.test.ru with the following contents:

 $ORIGIN test.ru.
 $TTL 10
 @       IN      SOA     ns.master.my.com. admin.my.com. (
                         2       ; Serial
                         10      ; Refresh
                         10      ; Retry
                         10      ; Expire
                         10 )    ; Negative Cache TTL
         IN      NS      ns.master.my.com.
         IN      NS      ns.slave.my.com.
 ;
 @       IN      A       10.1.0.1

Here ns.master.my.com is the domain name of this server (the my.com domain is fictional).
The key elements are the A record and the TTL of 10 seconds, which keeps resolvers from caching the old address for long.

On the second (backup) server, create the zone file /etc/bind/db.test.ru with the following contents:

 $ORIGIN test.ru.
 $TTL 10
 @       IN      SOA     ns.slave.my.com. admin.my.com. (
                         2       ; Serial
                         10      ; Refresh
                         10      ; Retry
                         10      ; Expire
                         10 )    ; Negative Cache TTL
         IN      NS      ns.master.my.com.
         IN      NS      ns.slave.my.com.
 ;
 @       IN      A       10.1.0.1

It differs from the main server's file only in the SOA record.
Next, restart bind on both servers:

 /etc/init.d/bind9 restart

Then, from some external computer, check the A record of the test.ru domain,
for example with nslookup test.ru. It should return the address 10.1.0.1.
If something is wrong, look at the logs.
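
It is worth querying each of the two name servers directly, since an ordinary lookup only shows what one of them answers. A sketch using dig from the dnsutils package; check_a_record is a hypothetical helper name, and the DIG variable exists only so that the command can be replaced, for example in tests:

```shell
#!/bin/sh
# check_a_record NAME EXPECTED_IP SERVER
# Asks SERVER for the A record of NAME and succeeds only if the answer
# is exactly EXPECTED_IP.
# (check_a_record and the DIG override are assumptions of this sketch.)
check_a_record() {
    answer=$("${DIG:-dig}" +short "$1" A "@$3")
    [ "$answer" = "$2" ]
}
```

For the setup described here, both check_a_record test.ru 10.1.0.1 10.1.0.1 and check_a_record test.ru 10.1.0.1 10.2.0.2 should succeed while the main server is healthy.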

Arbitration

We turn to the most interesting part: arbitration, which ties all of this together and manages the DNS zone.
The checks are based on the SSH, DNS, HTTP and MySQL services. HTTP and MySQL are the critical ones here: if at least one of them is not working on the main server, all requests must be sent to the backup server, and the administrator must be notified of the problem by email.
To test a service, I use a script that checks the availability of its port with telnet and records the result in a special state file.
First, the master server is polled and the results are written to a file; then the backup (slave) server is polled and its results are written to a file as well. Next, the state of the critical services on the main server is checked; if there are problems, the state of the backup server is checked and, depending on the result, the DNS zone is rewritten. I will not describe all the logic here; it is in the scripts.
First, create a special directory on both servers:
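
Telnet is not the only way to probe a port. As an alternative sketch (not what the scripts in this article use), bash can open a TCP connection itself through its /dev/tcp redirection, and the timeout utility from coreutils bounds the wait. The helper name port_open and the 5-second limit are assumptions:

```shell
#!/bin/bash
# port_open HOST PORT
# Succeeds if a TCP connection to HOST:PORT can be opened within 5 seconds.
# Host and port are passed to the inner bash as $0 and $1, so they are
# never interpolated into the command string.
# (port_open is a hypothetical helper name.)
port_open() {
    timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}
```

For example, port_open 10.1.0.1 80 && echo "HTTP is reachable" checks the web server on the main host.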

 mkdir /var/lock/sync/

All scripts are stored in /root/bin/.
The first script, /root/bin/master.sh, checks the DNS, SSH, HTTP and MySQL services on the master server.

 #!/bin/bash
 # Checks the availability of services on the master server
 # This script is licensed under GNU GPL version 2.0 or above
 # ---------------------------------------------------------------------
 ### Checks ports 22, 53, 80 and 3306 ###
 ### and notifies the administrator by email ###
 ###### Settings ######
 WORKDIR="/root/bin/"
 SEMAFOR="/var/lock/sync/master.sem"
 MAILFILE="/root/bin/master_server_problem.txt"
 # Address of the master server
 HOST="10.1.0.1"
 HTTP="80"
 SSH="22"
 MYSQL="3306"
 DNS="53"
 PROTOCOLS="SSH HTTP MYSQL DNS"
 ### Notification address ###
 EMAIL="admin@my.com"
 ###### Do not change anything below this line ######
 ### Binaries ###
 TELNET=$(which telnet)
 ### Change dir ###
 cd $WORKDIR
 ### Remove the notification left over from the previous run ###
 if [ -f $MAILFILE ]; then
     rm -rf $MAILFILE
 fi
 # Create the state file if it does not exist yet
 if [ ! -f $SEMAFOR ]; then
     printf 'DNS 0\nSSH 0\nHTTP 0\nMYSQL 0\n' > $SEMAFOR
 fi
 ### Check every service ###
 for PROTO in $PROTOCOLS
 do
     # Current failure counter of this service
     Num_PROTO=`grep $PROTO $SEMAFOR | awk '{print $2}'`
     # ${!PROTO} expands to the port stored in the variable named by $PROTO
     ( echo "quit" ) | $TELNET $HOST ${!PROTO} | grep Connected > /dev/null 2>&1
     if [ "$?" -ne "1" ]; then # Ok
         echo "$PROTO PORT CONNECTED"
         if [ $Num_PROTO -ne "0" ]; then # !=0
             if [ $Num_PROTO = "3" ]; then # ==3, the service has recovered
                 echo "$PROTO PORT CONNECTING, AVAILABLE on server $HOST" >> $MAILFILE
             fi
             OLD_Line="$PROTO $Num_PROTO"
             NEW_Line="$PROTO 0"
             sed -i -e "s/$OLD_Line/$NEW_Line/g" $SEMAFOR
         fi
     else # Connection failure
         if [ $Num_PROTO -ne "3" ]; then
             if [ $Num_PROTO = "2" ]; then # ==2 send notification
                 echo "$PROTO PORT NOT CONNECTING, FAILED on server $HOST" >> $MAILFILE
             fi
             OLD_Line="$PROTO $Num_PROTO"
             NEW_Line="$PROTO $(($Num_PROTO+1))"
             sed -i -e "s/$OLD_Line/$NEW_Line/g" $SEMAFOR
         fi
     fi
 done
 ### Send the mail notification after repeated failed checks ###
 # mutt sends the message through the SMTP server 10.6.6.6
 if [ -f $MAILFILE ]; then
     /usr/bin/mutt -x -e "set smtp_url=smtp://10.6.6.6" -e "set from=admin@my.com" -s "Server problem" $EMAIL < $MAILFILE
 fi
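
The heart of the script above is a per-service failure counter kept in the state file: a successful check resets it to 0, and a failed check increases it by one until it reaches 3. A failure notice is queued on the step from 2 to 3, and a recovery notice on the step from 3 back to 0. The counter rule alone can be sketched as follows; next_count and its "up"/"down" arguments are hypothetical names introduced only for illustration:

```shell
#!/bin/sh
# next_count CURRENT RESULT
# CURRENT is the stored counter (0..3), RESULT is "up" or "down".
# Prints the new counter value.
# (next_count is a hypothetical helper name.)
next_count() {
    current=$1
    result=$2
    if [ "$result" = "up" ]; then
        echo 0                    # any successful check resets the counter
    elif [ "$current" -lt 3 ]; then
        echo $((current + 1))     # count failures up to the maximum
    else
        echo 3                    # already at the maximum, stay there
    fi
}
```

Capping the counter at 3 is what lets compare.sh treat a sum of 6 over the two critical services as "both have failed repeatedly".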

The second script /root/bin/slave.sh is responsible for checking services on the backup slave server.

 #!/bin/bash
 # Checks the availability of services on the slave server
 # This script is licensed under GNU GPL version 2.0 or above
 # ---------------------------------------------------------------------
 ### Checks ports 22, 53, 80 and 3306 ###
 ### and notifies the administrator by email ###
 ###### Settings ######
 WORKDIR="/root/bin/"
 SEMAFOR="/var/lock/sync/slave.sem"
 MAILADMIN=0
 MAILFILE="/root/bin/slave_server_problem.txt"
 # Address of the slave server
 HOST="10.2.0.2"
 HTTP="80"
 SSH="22"
 MYSQL="3306"
 DNS="53"
 PROTOCOLS="SSH HTTP MYSQL DNS"
 ### Notification address ###
 EMAIL="admin@my.com"
 ###### Do not change anything below this line ######
 ### Binaries ###
 TELNET=$(which telnet)
 ### Change dir ###
 cd $WORKDIR
 ### Remove the notification left over from the previous run ###
 if [ -f $MAILFILE ]; then
     rm -rf $MAILFILE
 fi
 # Create the state file if it does not exist yet
 if [ ! -f $SEMAFOR ]; then
     printf 'DNS 0\nSSH 0\nHTTP 0\nMYSQL 0\n' > $SEMAFOR
 fi
 ### Check every service ###
 for PROTO in $PROTOCOLS
 do
     # Current failure counter of this service
     Num_PROTO=`grep $PROTO $SEMAFOR | awk '{print $2}'`
     # ${!PROTO} expands to the port stored in the variable named by $PROTO
     ( echo "quit" ) | $TELNET $HOST ${!PROTO} | grep Connected > /dev/null 2>&1
     if [ "$?" -ne "1" ]; then # Ok
         echo "$PROTO PORT CONNECTED"
         if [ $Num_PROTO -ne "0" ]; then # !=0
             if [ $Num_PROTO = "3" ]; then # ==3, the service has recovered
                 echo "$PROTO PORT CONNECTING, AVAILABLE on server $HOST" >> $MAILFILE
             fi
             OLD_Line="$PROTO $Num_PROTO"
             NEW_Line="$PROTO 0"
             sed -i -e "s/$OLD_Line/$NEW_Line/g" $SEMAFOR
         fi
     else # Connection failure
         if [ $Num_PROTO -ne "3" ]; then
             if [ $Num_PROTO = "2" ]; then # ==2 send notification
                 echo "$PROTO PORT NOT CONNECTING, FAILED on server $HOST" >> $MAILFILE
             fi
             OLD_Line="$PROTO $Num_PROTO"
             NEW_Line="$PROTO $(($Num_PROTO+1))"
             sed -i -e "s/$OLD_Line/$NEW_Line/g" $SEMAFOR
         fi
     fi
 done
 ### Send the mail notification after repeated failed checks ###
 # mutt sends the message through the SMTP server 10.6.6.6
 if [ -f $MAILFILE ]; then
     /usr/bin/mutt -x -e "set smtp_url=smtp://10.6.6.6" -e "set from=admin@my.com" -s "Server problem" $EMAIL < $MAILFILE
 fi

These two files are identical except for the $HOST, $SEMAFOR and $MAILFILE variables. In principle they could be merged into a single script with a loop, but I decided to keep them as separate files.
The third file, /root/bin/compare.sh, compares the status of the services on the two servers and rewrites the DNS zone.

 #!/bin/bash
 # Compares the state of the services and rewrites the DNS zone
 # This script is licensed under GNU GPL version 2.0 or above
 # ---------------------------------------------------------------------
 FILE_MASTER="/var/lock/sync/master.sem"
 FILE_SLAVE="/var/lock/sync/slave.sem"
 HOST_MASTER="10.1.0.1"
 HOST_SLAVE="10.2.0.2"
 DNSFILE="/etc/bind/db.test.ru"
 LOG="/var/log/dns_rewrite.log"
 PROTOCOLS="HTTP MYSQL"
 MASTER_COL=0
 SLAVE_COL=0
 COL=0
 # Sum the failure counters of the critical services on each server
 for PROTO in $PROTOCOLS
 do
     COL=$(($COL + 1))
     Master_PROTO=`grep $PROTO $FILE_MASTER | awk '{print $2}'`
     MASTER_COL=$(($MASTER_COL + $Master_PROTO))
     Slave_PROTO=`grep $PROTO $FILE_SLAVE | awk '{print $2}'`
     SLAVE_COL=$(($SLAVE_COL + $Slave_PROTO))
 done
 MAX_COL=$(($COL * 3))
 if [ $MASTER_COL = $MAX_COL ]; then # ==6
     if [ $SLAVE_COL = "0" ]; then # ==0
         # Switch the zone to the slave
         grep $HOST_MASTER $DNSFILE
         if [ "$?" -ne "1" ]; then # ok, rewrite
             sed -i -e "s/$HOST_MASTER/$HOST_SLAVE/g" $DNSFILE
             echo "Rewrite DNS to $HOST_SLAVE" >> $LOG
             /etc/init.d/bind9 restart
         fi
     fi
 else # check master
     if [ $MASTER_COL = "0" ]; then # ==0
         # The master is healthy, switch the zone back to it
         grep $HOST_SLAVE $DNSFILE
         if [ "$?" -ne "1" ]; then # ok, rewrite
             sed -i -e "s/$HOST_SLAVE/$HOST_MASTER/g" $DNSFILE
             echo "Rewrite DNS to $HOST_MASTER" >> $LOG
             /etc/init.d/bind9 restart
         fi
     else
         if [ $SLAVE_COL = "0" ]; then # ==0
             # Switch the zone to the slave
             grep $HOST_MASTER $DNSFILE
             if [ "$?" -ne "1" ]; then # ok, rewrite
                 sed -i -e "s/$HOST_MASTER/$HOST_SLAVE/g" $DNSFILE
                 echo "Rewrite DNS to $HOST_SLAVE" >> $LOG
                 /etc/init.d/bind9 restart
             fi
         fi
     fi
 fi
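
As far as can be told from its branches, the decision made by compare.sh reduces to three cases: a master sum of 0 means the zone should point to the master; otherwise a slave sum of 0 means it should point to the slave; otherwise nothing changes. A sketch of just that decision, with the hypothetical name choose_target and the two sums passed as arguments:

```shell
#!/bin/sh
# choose_target MASTER_SUM SLAVE_SUM
# The sums are the added failure counters of the critical services
# (HTTP and MYSQL) on each server. Prints "master", "slave" or "keep".
# (choose_target is a hypothetical helper name.)
choose_target() {
    if [ "$1" -eq 0 ]; then
        echo master    # master is fully healthy
    elif [ "$2" -eq 0 ]; then
        echo slave     # master has failures, slave is fully healthy
    else
        echo keep      # both have failures, leave the zone alone
    fi
}
```

Read this way, the zone moves to the backup server as soon as a critical service on the master has failed a single check while the backup is healthy, not only when the master sum reaches 6.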

Finally, we call all these scripts from one file, /root/bin/dnswrite.sh:

 #!/bin/bash
 # Check the master server
 /root/bin/master.sh
 # Check the slave server
 /root/bin/slave.sh
 # Compare the results and rewrite the DNS zone if necessary
 /root/bin/compare.sh

Make the scripts executable:

 chmod +x /root/bin/*.sh

and add the job to cron with "crontab -e" so that it runs every minute:

 */1 * * * * /root/bin/dnswrite.sh > /dev/null 2>&1

Everything is ready.
We now have a fully automated, fault-tolerant setup for the web application!
I will be glad to hear comments and suggestions.

Source: https://habr.com/ru/post/174713/
