
Configuring Active / Passive PostgreSQL Cluster using Pacemaker, Corosync, and DRBD (CentOS 5.5)

This article explains how to configure an Active/Passive PostgreSQL cluster using Pacemaker, Corosync, and DRBD.

Prepared by Raphael Marangoni, from the BRLink Servidor Linux team



1. Notes:





Linux distribution:


CentOS 5.5 was taken as the basis, but you can also use Red Hat Enterprise Linux or Fedora Core.

We will use DRBD to replicate the PostgreSQL data between the nodes, so you need a disk or partition dedicated exclusively to DRBD.

Remember: disk partitioning must be done during installation.



Network equipment / Topology:


We use two Gigabit NICs per node: one (eth0) connects to the network (LAN), and the other (eth1) connects the two nodes directly with a crossover cable. The dedicated crossover link carries the DRBD replication traffic, which helps performance.

We will use two physical nodes, node1.clusterbr.int and node2.clusterbr.int:

node1.clusterbr.int: IP 10.0.0.191 (LAN) and IP 172.16.0.1 (crossover)

node2.clusterbr.int: IP 10.0.0.192 (LAN) and IP 172.16.0.2 (crossover)

dbip.clusterbr.int: cluster IP 10.0.0.190 (all applications must use this IP to access PostgreSQL)

* The cross-connect link is referred to below simply as the crossover link.


Disks:


Both nodes have two disks:

/dev/sda: for the OS;

/dev/sdb: for DRBD.



PostgreSQL:


PostgreSQL version 8.4 is used; all of its data will be stored on the DRBD device so that the cluster can fail it over between nodes.



2. Preparation of nodes:



Disable SELinux:

vi /etc/selinux/config



(Change only this line; leave the rest as is)

SELINUX=disabled



Configure the hostname and gateway on both nodes:

vi /etc/sysconfig/network



node1:

NETWORKING=yes

NETWORKING_IPV6=no

HOSTNAME=node1.clusterbr.int

GATEWAY=10.0.0.9

node2:

NETWORKING=yes

NETWORKING_IPV6=no

HOSTNAME=node2.clusterbr.int

GATEWAY=10.0.0.9




Configuring network interfaces:

node1:

LAN interface:

vi /etc/sysconfig/network-scripts/ifcfg-eth0



DEVICE=eth0

BOOTPROTO=static

IPADDR=10.0.0.191

NETMASK=255.255.255.0

ONBOOT=yes

HWADDR=a6:1e:3d:67:66:78




Cross-over / DRBD interface:

vi /etc/sysconfig/network-scripts/ifcfg-eth1



DEVICE=eth1

BOOTPROTO=static

IPADDR=172.16.0.1

NETMASK=255.255.255.0

ONBOOT=yes

HWADDR=ee:ef:ff:9a:9a:57




node2:

LAN interface:

vi /etc/sysconfig/network-scripts/ifcfg-eth0



DEVICE=eth0

BOOTPROTO=static

IPADDR=10.0.0.192

NETMASK=255.255.255.0

ONBOOT=yes

HWADDR=52:52:a1:1a:62:32




Cross-over / DRBD interface:

vi /etc/sysconfig/network-scripts/ifcfg-eth1



DEVICE=eth1

BOOTPROTO=static

IPADDR=172.16.0.2

NETMASK=255.255.255.0

ONBOOT=yes

HWADDR=1a:18:b2:50:96:1e




DNS Setup:

vi /etc/resolv.conf



search clusterbr.int

nameserver 10.0.0.9




Hosts file setup:

The configuration is the same on both nodes:

vi /etc/hosts



127.0.0.1 localhost.localdomain localhost

10.0.0.191 node1.clusterbr.int node1

10.0.0.192 node2.clusterbr.int node2

10.0.0.190 dbip.clusterbr.int dbip




Check the network:

node1:

Ping node2 (via LAN interface):

ping -c 2 node2



[root@node1 ~]# ping -c 2 node2

PING node2 (10.0.0.192) 56(84) bytes of data.

64 bytes from node2 (10.0.0.192): icmp_seq=1 ttl=64 time=0.089 ms

64 bytes from node2 (10.0.0.192): icmp_seq=2 ttl=64 time=0.082 ms

--- node2 ping statistics ---

2 packets transmitted, 2 received, 0% packet loss, time 999ms

rtt min/avg/max/mdev = 0.082/0.085/0.089/0.009 ms




Ping node2 (via cross-over interface):

ping -c 2 172.16.0.2



[root@node1 ~]# ping -c 2 172.16.0.2

PING 172.16.0.2 (172.16.0.2) 56(84) bytes of data.

64 bytes from 172.16.0.2: icmp_seq=1 ttl=64 time=0.083 ms

64 bytes from 172.16.0.2: icmp_seq=2 ttl=64 time=0.083 ms

--- 172.16.0.2 ping statistics ---

2 packets transmitted, 2 received, 0% packet loss, time 999ms

rtt min/avg/max/mdev = 0.083/0.083/0.083/0.000 ms




node2:

Ping node1 (via LAN interface):

ping -c 2 node1

[root@node2 ~]# ping -c 2 node1

PING node1 (10.0.0.191) 56(84) bytes of data.

64 bytes from node1 (10.0.0.191): icmp_seq=1 ttl=64 time=0.068 ms

64 bytes from node1 (10.0.0.191): icmp_seq=2 ttl=64 time=0.063 ms

--- node1 ping statistics ---

2 packets transmitted, 2 received, 0% packet loss, time 999ms

rtt min/avg/max/mdev = 0.063/0.065/0.068/0.008 ms




Ping node1 (via cross-over interface):

ping -c 2 172.16.0.1



[root@node2 ~]# ping -c 2 172.16.0.1

PING 172.16.0.1 (172.16.0.1) 56(84) bytes of data.

64 bytes from 172.16.0.1: icmp_seq=1 ttl=64 time=1.36 ms

64 bytes from 172.16.0.1: icmp_seq=2 ttl=64 time=0.075 ms

--- 172.16.0.1 ping statistics ---

2 packets transmitted, 2 received, 0% packet loss, time 1001ms

rtt min/avg/max/mdev = 0.075/0.722/1.369/0.647 ms




Set the default runlevel:

(Change only this line; leave the rest as is)

vi /etc/inittab



id:3:initdefault:




Let's look at the services enabled at runlevel 3 (the output below is from a Portuguese-locale system, where "sim"/"não" correspond to on/off; on an English-locale system use grep 3:on instead):

chkconfig --list | grep 3:sim



[root@node1 ~]# chkconfig --list | grep 3:sim

acpid 0:não 1:não 2:sim 3:sim 4:sim 5:sim 6:não

anacron 0:não 1:não 2:sim 3:sim 4:sim 5:sim 6:não

apmd 0:não 1:não 2:sim 3:sim 4:sim 5:sim 6:não

atd 0:não 1:não 2:não 3:sim 4:sim 5:sim 6:não

cpuspeed 0:não 1:sim 2:sim 3:sim 4:sim 5:sim 6:não

crond 0:não 1:não 2:sim 3:sim 4:sim 5:sim 6:não

irqbalance 0:não 1:não 2:sim 3:sim 4:sim 5:sim 6:não

kudzu 0:não 1:não 2:não 3:sim 4:sim 5:sim 6:não

network 0:não 1:não 2:sim 3:sim 4:sim 5:sim 6:não

rawdevices 0:não 1:não 2:não 3:sim 4:sim 5:sim 6:não

sshd 0:não 1:não 2:sim 3:sim 4:sim 5:sim 6:não

syslog 0:não 1:não 2:sim 3:sim 4:sim 5:sim 6:não




After performing the steps described above, reboot both nodes.



3. Install the necessary software



Install the required packages:

yum install -y postgresql84* gcc perl-MailTools perl-DBI php-pgsql



Add the EPEL repository:

rpm -Uvh http://download.fedora.redhat.com/pub/epel/5/x86_64/epel-release-5-4.noarch.rpm



Now we add the ClusterLabs repository:

wget -O /etc/yum.repos.d/pacemaker.repo http://clusterlabs.org/rpm/epel-5/clusterlabs.repo



Install the cluster and DRBD packages:

yum install -y pacemaker corosync drbd83 kmod-drbd83 heartbeat



4. Configure DRBD


First, we need to configure DRBD on both nodes:

vi /etc/drbd.conf



global {
    usage-count no;
}
common {
    syncer { rate 100M; }
    protocol C;
}
resource postgres {
    startup {
        wfc-timeout 0;
        degr-wfc-timeout 120;
    }
    disk { on-io-error detach; }
    on node1.clusterbr.int {
        device /dev/drbd0;
        disk /dev/sdb;
        address 172.16.0.1:7791;
        meta-disk internal;
    }
    on node2.clusterbr.int {
        device /dev/drbd0;
        disk /dev/sdb;
        address 172.16.0.2:7791;
        meta-disk internal;
    }
}




Main configuration items:



resource: the name of the resource managed by DRBD; we called it "postgres"

disk: the device (disk or partition) that DRBD will use

address: the IP address and port that DRBD will use (we specified the crossover interfaces)

syncer: the data transfer rate between the nodes

If you have any questions or concerns, you can always use the manual: www.drbd.org/users-guide-emb
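Before creating the metadata, it can be worth verifying that the configuration file parses cleanly. A minimal check (drbdadm dump prints the configuration as DRBD understands it, or reports syntax errors), run on either node:

drbdadm dump postgres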



After this configuration, we can create the DRBD metadata for the postgres resource.

Do the following:

node1:

drbdadm create-md postgres



[root@node1 ~]# drbdadm create-md postgres

Writing meta data...

initializing activity log

NOT initialized bitmap

New drbd meta data block successfully created.




node2:

drbdadm create-md postgres



[root@node2 ~]# drbdadm create-md postgres

Writing meta data...

initializing activity log

NOT initialized bitmap

New drbd meta data block successfully created.




Next, bring the resource up (attach and connect it); again, run the following on both nodes:

drbdadm up postgres



Now we can start the initial synchronization between the nodes, making node1 the primary:

drbdadm -- --overwrite-data-of-peer primary postgres



To check the synchronization progress:

cat /proc/drbd



[root@node1 ~]# cat /proc/drbd

version: 8.3.8 (api:88/proto:86-94)

GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 2010-06-04 08:04:09

0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r----

ns:48128 nr:0 dw:0 dr:48128 al:0 bm:2 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:8340188

[>....................] sync'ed: 0.6% (8144/8188)M delay_probe: 7

finish: 0:11:29 speed: 12,032 (12,032) K/sec




Wait for the synchronization to complete. It can take a long time, depending on the volume size, disk performance, and of course the speed of the cluster network link.
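To follow the progress without retyping the command, one option is to refresh it every couple of seconds with the watch utility (available on a default CentOS install); the DRBD init script should also be able to print a short summary:

watch -n 2 cat /proc/drbd

/etc/init.d/drbd status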

After the synchronization finishes, we can look at the state of the postgres resource:

node1:

cat /proc/drbd



[root@node1 ~]# cat /proc/drbd

version: 8.3.8 (api:88/proto:86-94)

GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 2010-06-04 08:04:09

0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----

ns:8388316 nr:0 dw:0 dr:8388316 al:0 bm:512 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0




node2:

cat /proc/drbd



[root@node2 ~]# cat /proc/drbd

version: 8.3.8 (api:88/proto:86-94)

GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 2010-06-04 08:04:09

0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r----

ns:0 nr:8388316 dw:8388316 dr:0 al:0 bm:512 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0




5. Configure PostgreSQL


First, we need to start the DRBD service on both nodes.

/etc/init.d/drbd start



As previously mentioned, node1 is our primary node, so we run the check on it:

cat /proc/drbd



[root@node1 ~]# cat /proc/drbd

version: 8.3.8 (api:88/proto:86-94)

GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 2010-06-04 08:04:09

0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----

ns:8388316 nr:0 dw:0 dr:8388316 al:0 bm:512 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0




Primary / Secondary means that the local server is primary and the other is secondary.



Next, format the DRBD device with the ext3 file system.

node1:

mkfs.ext3 /dev/drbd0



After that, we can mount the device at the standard PostgreSQL data location.

Mount the partition on the node:

node1:

mount -t ext3 /dev/drbd0 /var/lib/pgsql



Next, change the owner and group of the mount point:

node1:

chown postgres.postgres /var/lib/pgsql



Now we initialize the PostgreSQL database:

node1:

su - postgres

initdb /var/lib/pgsql/data

exit




Allow trust authentication for the node IPs and the cluster IP:

node1:

echo "host all all 10.0.0.191/32 trust" >> /var/lib/pgsql/data/pg_hba.conf

echo "host all all 10.0.0.192/32 trust" >> /var/lib/pgsql/data/pg_hba.conf

echo "host all all 10.0.0.190/32 trust" >> /var/lib/pgsql/data/pg_hba.conf




Next, configure PostgreSQL to listen on all interfaces:

node1:

vi /var/lib/pgsql/data/postgresql.conf



Uncomment and change only this line:

listen_addresses = '0.0.0.0'



Run PostgreSQL:

node1:

/etc/init.d/postgresql start
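As a quick sanity check that PostgreSQL is really listening on all interfaces (netstat is part of the base net-tools package):

netstat -nlt | grep 5432

# expect a LISTEN socket on 0.0.0.0:5432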



Now create a user with administrator rights to manage PostgreSQL:

node1:

su - postgres

createuser --superuser admpgsql --pwprompt




You will be prompted to set a password for admpgsql.



After that we will create a database and fill it:

node1:

su - postgres

createdb pgbench

pgbench -i pgbench




pgbench -i populates the database with some sample data:

pgbench -i pgbench



-bash-3.2$ pgbench -i pgbench

NOTA: tabela "pgbench_branches" não existe, ignorando

NOTA: tabela "pgbench_tellers" não existe, ignorando

NOTA: tabela "pgbench_accounts" não existe, ignorando

NOTA: tabela "pgbench_history" não existe, ignorando

creating tables...

10000 tuples done.

20000 tuples done.

30000 tuples done.

40000 tuples done.

50000 tuples done.

60000 tuples done.

70000 tuples done.

80000 tuples done.

90000 tuples done.

100000 tuples done.

set primary key...

NOTA: ALTER TABLE / ADD PRIMARY KEY criará índice implícito "pgbench_branches_pkey" na tabela "pgbench_branches"

NOTA: ALTER TABLE / ADD PRIMARY KEY criará índice implícito "pgbench_tellers_pkey" na tabela "pgbench_tellers"

NOTA: ALTER TABLE / ADD PRIMARY KEY criará índice implícito "pgbench_accounts_pkey" na tabela "pgbench_accounts"

vacuum...done.




Now connect to the database to verify:

node1:

psql -U admpgsql -d pgbench

select * from pgbench_tellers;



psql -U admpgsql -d pgbench

psql (8.4.5)

Digite "help" para ajuda.



pgbench=# select * from pgbench_tellers;

tid | bid | tbalance | filler

-----+-----+----------+--------

1 | 1 | 0 |

2 | 1 | 0 |

3 | 1 | 0 |

4 | 1 | 0 |

5 | 1 | 0 |

6 | 1 | 0 |

7 | 1 | 0 |

8 | 1 | 0 |

9 | 1 | 0 |

10 | 1 | 0 |

(10 registros)




This completes the PostgreSQL setup.



Check PostgreSQL on node2:

Before we start working with Pacemaker, it is better to make sure that postgres also works on node2.

First, on node1 we have to stop postgresql:

node1:

/etc/init.d/postgresql stop



Then unmount the DRBD device:

umount /dev/drbd0



Now make node1 the DRBD secondary:

drbdadm secondary postgres



Now on node2 let's make DRBD primary:

node2:

drbdadm primary postgres



Next, mount the DRBD device:

mount -t ext3 /dev/drbd0 /var/lib/pgsql/



and finally run PostgreSQL:

/etc/init.d/postgresql start



Now check that we can access pgbench from node2:

psql -U admpgsql -d pgbench

select * from pgbench_tellers;



[root@node2 ~]# psql -U admpgsql -d pgbench

psql (8.4.5)

Digite "help" para ajuda.



pgbench=# select * from pgbench_tellers;

tid | bid | tbalance | filler

-----+-----+----------+--------

1 | 1 | 0 |

2 | 1 | 0 |

3 | 1 | 0 |

4 | 1 | 0 |

5 | 1 | 0 |

6 | 1 | 0 |

7 | 1 | 0 |

8 | 1 | 0 |

9 | 1 | 0 |

10 | 1 | 0 |

(10 registros)




After making sure everything works, we return the primary role to node1 and stop the services, since the cluster will manage them from now on:

node2:

/etc/init.d/postgresql stop

umount /dev/drbd0

drbdadm secondary postgres

/etc/init.d/drbd stop




node1:

drbdadm primary postgres

/etc/init.d/drbd stop




Disable automatic startup of these services by executing the following commands on both nodes (Pacemaker will manage them from now on):

chkconfig --level 35 drbd off

chkconfig --level 35 postgresql off
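To double-check that neither service will start on its own after a reboot:

chkconfig --list | egrep "drbd|postgresql"

# runlevels 3 and 5 should now be off for both services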




6. Configure Corosync (openAIS)



Set the Corosync configuration variables:

node1:

export ais_port=4000

export ais_mcast=226.94.1.1

export ais_addr=`ip address show eth0 | grep "inet " | tail -n 1 | awk '{print $4}' | sed s/255/0/`




Then verify the variables:

env | grep ais_



The ais_addr variable should contain the network address that the cluster will listen on; in our case this is 10.0.0.0.

Next we will create a corosync configuration file:

cp /etc/corosync/corosync.conf.example /etc/corosync/corosync.conf

sed -i.gres "s/.*mcastaddr:.*/mcastaddr:\ $ais_mcast/g" /etc/corosync/corosync.conf

sed -i.gres "s/.*mcastport:.*/mcastport:\ $ais_port/g" /etc/corosync/corosync.conf

sed -i.gres "s/.*bindnetaddr:.*/bindnetaddr:\ $ais_addr/g" /etc/corosync/corosync.conf
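A quick way to confirm that the sed substitutions landed where expected:

grep -E "bindnetaddr|mcastaddr|mcastport" /etc/corosync/corosync.conf

# expected values: bindnetaddr: 10.0.0.0, mcastaddr: 226.94.1.1, mcastport: 4000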




Add the following to our config:

cat <<-END >>/etc/corosync/corosync.conf

aisexec {

user: root

group: root

}

END

cat <<-END >>/etc/corosync/corosync.conf

service {

# Load the Pacemaker Cluster Resource Manager

name: pacemaker

ver: 0

}

END




The /etc/corosync/corosync.conf file looks like this:

compatibility: whitetank

totem {
        version: 2
        secauth: off
        threads: 0
        interface {
                ringnumber: 0
                bindnetaddr: 10.0.0.0
                mcastaddr: 226.94.1.1
                mcastport: 4000
        }
}

logging {
        fileline: off
        to_stderr: yes
        to_logfile: yes
        to_syslog: yes
        logfile: /tmp/corosync.log
        debug: off
        timestamp: on
        logger_subsys {
                subsys: AMF
                debug: off
        }
}

amf {
        mode: disabled
}

aisexec {
        user: root
        group: root
}

service {
        # Load the Pacemaker Cluster Resource Manager
        name: pacemaker
        ver: 0
}




From node1, copy the configuration to node2:

scp /etc/corosync/* node2:/etc/corosync/



On both nodes, create the cluster log directory:

mkdir /var/log/cluster/



Now start Corosync:

node1:

/etc/init.d/corosync start



Let's check that the service started correctly:

node1:

grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/messages



[root@node1 bin]# grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/messages

Apr 7 12:37:21 node1 corosync[23533]: [MAIN ] Corosync Cluster Engine ('1.2.0'): started and ready to provide service.

Apr 7 12:37:21 node1 corosync[23533]: [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.




Check which interface Corosync is listening on:

grep TOTEM /var/log/messages



[root@node1 bin]# grep TOTEM /var/log/messages

Apr 7 12:37:21 node1 corosync[23533]: [TOTEM ] Initializing transport (UDP/IP).

Apr 7 12:37:21 node1 corosync[23533]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).

Apr 7 12:37:21 node1 corosync[23533]: [TOTEM ] The network interface [10.0.0.191] is now up.

Apr 7 12:37:21 node1 corosync[23533]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
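Corosync communicates over UDP, so its sockets should also be visible directly (the exact ports and output format vary slightly between versions):

netstat -nlup | grep corosync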




Next, check that Pacemaker was started:

grep pcmk_startup /var/log/messages



[root@node1 bin]# grep pcmk_startup /var/log/messages

Apr 7 12:37:21 node1 corosync[23533]: [pcmk ] info: pcmk_startup: CRM: Initialized

Apr 7 12:37:21 node1 corosync[23533]: [pcmk ] Logging: Initialized pcmk_startup

Apr 7 12:37:21 node1 corosync[23533]: [pcmk ] info: pcmk_startup: Maximum core file size is: 4294967295

Apr 7 12:37:21 node1 corosync[23533]: [pcmk ] info: pcmk_startup: Service: 9

Apr 7 12:37:21 node1 corosync[23533]: [pcmk ] info: pcmk_startup: Local hostname: node1




Check the Corosync process tree:

ps axf



[root@node1 bin]# ps axf

(should contain something like this)

23533 ? Ssl 0:00 corosync

23539 ? SLs 0:00 \_ /usr/lib/heartbeat/stonithd

23540 ? S 0:00 \_ /usr/lib/heartbeat/cib

23541 ? S 0:00 \_ /usr/lib/heartbeat/lrmd

23542 ? S 0:00 \_ /usr/lib/heartbeat/attrd

23543 ? S 0:00 \_ /usr/lib/heartbeat/pengine

23544 ? S 0:00 \_ /usr/lib/heartbeat/crmd




If everything went well, we can start Corosync on node2:

node2:

/etc/init.d/corosync start



Perform a cluster check on both nodes:

crm_mon -1



[root@node1 ~]# crm_mon -1

============

Last updated: Fri Oct 29 17:44:36 2010

Stack: openais

Current DC: node1.clusterbr.int - partition with quorum

Version: 1.0.9-89bd754939df5150de7cd76835f98fe90851b677

2 Nodes configured, 2 expected votes

0 Resources configured.

============



Online: [ node1.clusterbr.int node2.clusterbr.int ]




Make sure that both nodes are shown as online.



Enable Corosync at boot on both nodes:

chkconfig --level 35 corosync on



7. Configure Pacemaker



Important commands to manage the cluster:



Cluster configuration check:

crm_verify -L



Get the list and status of the cluster:

crm_mon -1



Cluster configuration list:

crm configure show



Open the interactive crm console:

crm



STONITH setup

If we check the cluster configuration now, we will get errors, because STONITH is enabled but not configured:

crm_verify -L



Since we are not using STONITH here, disable it by executing the following command on one of the nodes:

crm configure property stonith-enabled=false



Now the cluster configuration check should pass without errors:

crm_verify -L



Cluster basic settings



Run the following commands on any one of the nodes:

crm configure property no-quorum-policy=ignore



The first setting tells the cluster to keep running even without quorum, which is unavoidable in a two-node cluster when one node fails. The second, resource-stickiness, makes resources prefer to stay on the node where they are currently running instead of being moved back automatically:

crm configure rsc_defaults resource-stickiness=100



Let's see our configuration:

crm configure show



[root@node1 ~]# crm configure show

node node1.clusterbr.int

node node2.clusterbr.int

property $id="cib-bootstrap-options" \

dc-version="1.0.9-89bd754939df5150de7cd76835f98fe90851b677" \

cluster-infrastructure="openais" \

expected-quorum-votes="2" \

stonith-enabled="false" \

no-quorum-policy="ignore"

rsc_defaults $id="rsc-options" \

resource-stickiness="100"




DBIP configuration



Add the cluster IP address as the DBIP resource:

crm configure primitive DBIP ocf:heartbeat:IPaddr2 \

params ip=10.0.0.190 cidr_netmask=24 \

op monitor interval=30s




Checking status:

crm_mon -1



[root@node1 ~]# crm_mon -1

============

Last updated: Fri Oct 29 17:47:53 2010

Stack: openais

Current DC: node1.clusterbr.int - partition with quorum

Version: 1.0.9-89bd754939df5150de7cd76835f98fe90851b677

2 Nodes configured, 2 expected votes

1 Resources configured.

============



Online: [ node2.clusterbr.int node1.clusterbr.int ]



DBIP (ocf::heartbeat:IPaddr2): Started node2.clusterbr.int




Note that the cluster status shows where the resource is running; at the moment it is on node2, but it could just as well run on node1.
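You can also verify on the node currently holding DBIP that the address was really added to the LAN interface (in this setup IPaddr2 attaches it to eth0 as a secondary address):

ip addr show eth0 | grep 10.0.0.190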



DRBD configuration on the cluster



Add DRBD to our cluster:

crm configure primitive drbd_postgres ocf:linbit:drbd \

params drbd_resource="postgres" \

op monitor interval="15s"




Configure the master/slave (primary/secondary) roles for the DRBD resource:

crm configure ms ms_drbd_postgres drbd_postgres \

meta master-max="1" master-node-max="1" \

clone-max="2" clone-node-max="1" \

notify="true"




Configure the filesystem mount on top of the DRBD device:

crm configure primitive postgres_fs ocf:heartbeat:Filesystem \

params device="/dev/drbd0" directory="/var/lib/pgsql" fstype="ext3"




Configuring PostgreSQL on the cluster



Add postgresql to our cluster:

crm configure primitive postgresql ocf:heartbeat:pgsql \

op monitor depth="0" timeout="30" interval="30"




Now we add our resources to the postgres group and colocate the group with the DRBD master:

crm configure group postgres postgres_fs DBIP postgresql

crm configure colocation postgres_on_drbd inf: postgres ms_drbd_postgres:Master




Configure the postgres group to start only after DRBD has been promoted:

crm configure order postgres_after_drbd inf: ms_drbd_postgres:promote postgres:start



Let's look at the cluster configuration

crm configure show



[root@node1 ~]# crm configure show

node node1.clusterbr.int

node node2.clusterbr.int

primitive DBIP ocf:heartbeat:IPaddr2 \

params ip="10.0.0.190" cidr_netmask="24" \

op monitor interval="30s"

primitive drbd_postgres ocf:linbit:drbd \

params drbd_resource="postgres" \

op monitor interval="15s"

primitive postgres_fs ocf:heartbeat:Filesystem \

params device="/dev/drbd0" directory="/var/lib/pgsql" fstype="ext3"

primitive postgresql ocf:heartbeat:pgsql \

op monitor interval="30" timeout="30" depth="0" \

meta target-role="Started"

group postgres postgres_fs DBIP postgresql \

meta target-role="Started"

ms ms_drbd_postgres drbd_postgres \

meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"

colocation postgres_on_drbd inf: postgres ms_drbd_postgres:Master

order postgres_after_drbd inf: ms_drbd_postgres:promote postgres:start

property $id="cib-bootstrap-options" \

dc-version="1.0.9-89bd754939df5150de7cd76835f98fe90851b677" \

cluster-infrastructure="openais" \

expected-quorum-votes="2" \

stonith-enabled="false" \

no-quorum-policy="ignore"

rsc_defaults $id="rsc-options" \

resource-stickiness="100"

[root@node1 ~]#




Configure the preferred node:

crm configure location master-prefer-node1 DBIP 50: node1.clusterbr.int



Checking status:

crm_mon -1



[root@node2 ~]# crm_mon -1

============

Last updated: Fri Oct 29 19:54:09 2010

Stack: openais

Current DC: node2.clusterbr.int - partition with quorum

Version: 1.0.9-89bd754939df5150de7cd76835f98fe90851b677

2 Nodes configured, 2 expected votes

2 Resources configured.

============



Online: [ node2.clusterbr.int node1.clusterbr.int ]



Master/Slave Set: ms_drbd_postgres

Masters: [ node2.clusterbr.int ]

Slaves: [ node1.clusterbr.int ]

Resource Group: postgres

postgres_fs (ocf::heartbeat:Filesystem): Started node2.clusterbr.int

DBIP (ocf::heartbeat:IPaddr2): Started node2.clusterbr.int

postgresql (ocf::heartbeat:pgsql): Started node2.clusterbr.int




If you see errors, reboot both nodes so that the new Corosync configuration is picked up. After the reboot, you should be able to reach the postgres service through the DBIP (10.0.0.190) on TCP port 5432.
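A simple connectivity test from either node (the admpgsql user and the pgbench database were created earlier):

psql -h 10.0.0.190 -p 5432 -U admpgsql -d pgbench -c "select version();"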



Cluster management



Transferring a resource to another node:

crm resource migrate postgres node1.clusterbr.int



Removing the constraint created by the migrate command:

crm resource unmigrate postgres



Clear resource failure messages:

crm resource cleanup postgres



Stop the PostgreSQL resource:

crm resource stop postgresql



Start the PostgreSQL resource:

crm resource start postgresql
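Putting these commands together, a basic failover drill could look like the following sketch (adjust the node name to wherever the group is not currently running):

# move the whole postgres group to node1 and check where it is running
crm resource migrate postgres node1.clusterbr.int
crm_mon -1

# remove the location constraint created by migrate so the cluster can place the group freely again
crm resource unmigrate postgres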



8. Create a web interface for service status



The web interface will be convenient for monitoring our cluster.

Start apache:

/etc/init.d/httpd start

chkconfig --level 35 httpd on




Create a directory for the cluster page under the DocumentRoot:

mkdir /var/www/html/cluster/



To generate the status page, run:

crm_mon --daemonize --as-html /var/www/html/cluster/index.html



and make it start automatically at boot:

echo "crm_mon --daemonize --as-html /var/www/html/cluster/index.html" >> /etc/rc.d/rc.local



Check the page in a browser:

http://10.0.0.190/cluster
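If you prefer checking from the command line and curl is installed, something like this should show the node status from the generated page:

curl -s http://10.0.0.190/cluster/ | grep -i -e online -e offline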



9. Installing phpPgAdmin to work with PostgreSQL



Perform actions on both nodes:

mkdir /download

cd /download

wget 'http://downloads.sourceforge.net/project/phppgadmin/phpPgAdmin%20%5Bbeta%5D/phpPgAdmin-5.0/phpPgAdmin-5.0-beta2.tar.bz2?r=http%3A%2F%2Fphppgadmin.sourceforge.net%2F%3Fpage%3Ddownload&ts=1288189530&use_mirror=ufpr'




Install:

tar -jxvf phpPgAdmin-5.0-beta2.tar.bz2

mv phpPgAdmin-5.0-beta2 /var/www/html/cluster-pgadmin

chown apache.apache -R /var/www/html/cluster-pgadmin




Go to http://10.0.0.190/cluster-pgadmin in the browser.

P.S. Log in with the database user and password created during the PostgreSQL setup (admpgsql).



10. Network access



If you need access to PostgreSQL from the local network, do not forget to configure authentication in Postgres.



Here we enable MD5 authentication for the 10.0.0.0/24 network:

echo "host all all 10.0.0.0/24 md5" >> /var/lib/pgsql/data/pg_hba.conf



Restart postgres:

crm resource stop postgresql

crm resource start postgresql
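From a workstation in the 10.0.0.0/24 network with the PostgreSQL client installed, a password-authenticated connection should now work; a hypothetical client-side example:

psql -h 10.0.0.190 -U admpgsql -d pgbench

# you will be prompted for the admpgsql password set earlier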




11. Monitoring



Monitoring is an essential part of running this cluster. I suggest monitoring with Zabbix: install the Zabbix agent on each node and configure checks for the following items:



1. Checking ping for accessibility (10.0.0.191, 10.0.0.192 and 172.16.0.1, 172.16.0.2)

2. Check availability of the DBIP (cluster IP) 10.0.0.190

3. Checking TCP port 5432 for DBIP 10.0.0.190

4. Checking the CPU, RAM and Disk

5. Check the DRBD state with the monitor_drbd.sh script below (it returns 1 when everything is OK and 0 when there are problems)

Script monitor_drbd.sh for Zabbix:

#!/bin/bash

# Extract the disk state (local/remote) from /proc/drbd
CHECK=`cat /proc/drbd | grep UpToDate/UpToDate | cut -d: -f5 | cut -c1-17`
STRING_OK="UpToDate/UpToDate"

# Compare the two strings
if [ "$CHECK" == "$STRING_OK" ] ; then
    # Everything is OK, return 1
    echo 1;
else
    # Not OK, return 0
    echo 0;
fi
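One possible way to hook the script into the Zabbix agent; the path and key name below are assumptions, adjust them to your agent layout. Save the script, make it executable, expose it as a UserParameter in zabbix_agentd.conf, and then restart the Zabbix agent service:

chmod +x /usr/local/bin/monitor_drbd.sh

# add to /etc/zabbix/zabbix_agentd.conf (hypothetical path):
# UserParameter=drbd.status,/usr/local/bin/monitor_drbd.sh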




P.S. If you notice typos or translation mistakes, please let me know in a private message; I will fix them quickly for everyone's benefit.

Source: https://habr.com/ru/post/108833/


