
Pacemaker + DRBD (dual primary) + CTDB cluster storage

Good day, Habr readers. I was given the task of deploying fault-tolerant, highly available storage using Pacemaker + DRBD (in dual-primary mode) + clvmd + ctdb, to be mounted on the servers. I'll say up front that this is the first time I've worked with any of these tools, so criticism, additions, and corrections are welcome. Instructions for this particular stack are either absent online or outdated. The setup below is working at the moment, but there is one problem whose solution I hope to find soon. All actions must be performed on both nodes unless stated otherwise.

Let's get started. We have two virtual machines running CentOS 7.

1) For reliability, we add the nodes to /etc/hosts:
 192.168.0.1 node1
 192.168.0.2 node2
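
A quick sanity check that both names resolve as expected (my addition, not a step from the original):

 getent hosts node1 node2
 ping -c1 node2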

2) DRBD is not in the standard repositories, so we need to connect a third-party one:

 rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
 rpm -Uvh https://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm

3) Install DRBD version 8.4 (I did not manage to get 9.0 running in dual-primary mode):

 yum install -y kmod-drbd84 drbd84-utils 

4) Load the drbd kernel module and add it to autoload:

 modprobe drbd
 echo drbd > /etc/modules-load.d/drbd.conf

5) Create drbd resource configuration file /etc/drbd.d/r0.res

 resource r0 {
     protocol C;
     device /dev/drbd0;
     meta-disk internal;
     disk /dev/sdb;
     net {
         allow-two-primaries;
     }
     disk {
         fencing resource-and-stonith;
     }
     handlers {
         fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
         after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
     }
     on node1 {
         address 192.168.0.1:7788;
     }
     on node2 {
         address 192.168.0.2:7788;
     }
 }

6) Disable the drbd unit (Pacemaker will be responsible for it later), create metadata for the DRBD disk, and bring the resource up:

 systemctl disable drbd
 drbdadm create-md r0
 drbdadm up r0

7) On the first node only, make the resource primary:

 drbdadm primary --force r0 
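
The initial synchronization can be watched through /proc/drbd (a check I've added; in DRBD 8.4 this is the easiest way to see the resource state). Wait until both disks report UpToDate:

 cat /proc/drbd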

8) Install Pacemaker:

 yum install -y pacemaker pcs resource-agents 

9) Set a password for the hacluster user for authorization on the nodes

 echo CHANGEME | passwd --stdin hacluster 

10) Enable and start pcsd on both nodes:

 systemctl enable pcsd
 systemctl start pcsd
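
If firewalld is active on the nodes, the cluster ports have to be opened as well. This is an assumption about the environment; skip it if the firewall is disabled:

 firewall-cmd --permanent --add-service=high-availability
 firewall-cmd --reload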

11) Authorize the nodes in the cluster. From this stage on, we do everything on one node only:

 pcs cluster auth node1 node2 -u hacluster 

12) Create a cluster named samba_cluster

 pcs cluster setup --force --name samba_cluster node1 node2 

13) Enable and start the cluster on all nodes:

 pcs cluster enable --all
 pcs cluster start --all
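
At this point it is worth confirming that both nodes show up online (verification I've added):

 pcs status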

14) Since we use virtual machines as servers, we disable the STONITH mechanism and tell the cluster to ignore loss of quorum (a two-node cluster has no meaningful quorum):

 pcs property set stonith-enabled=false
 pcs property set no-quorum-policy=ignore

15) Create a VIP

 pcs resource create virtual_ip ocf:heartbeat:IPaddr2 ip=192.168.0.10 cidr_netmask=24 op monitor interval=60s 

16) Create the DRBD resource:

 pcs cluster cib drbd_cfg
 pcs -f drbd_cfg resource create DRBD ocf:linbit:drbd drbd_resource=r0 op monitor interval=60s
 pcs -f drbd_cfg resource master DRBDClone DRBD master-max=2 master-node-max=1 clone-node-max=1 clone-max=2 notify=true interleave=true
 pcs cluster cib-push drbd_cfg
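
After the CIB is pushed, both nodes should be promoted, i.e. DRBD should report ro:Primary/Primary. A quick way to confirm (my addition):

 cat /proc/drbd | grep ro: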

17) Install the necessary clvm packages and prepare clvm

 yum install -y lvm2-cluster gfs2-utils
 /sbin/lvmconf --enable-cluster

18) Add the dlm and clvmd resources to Pacemaker:

 pcs resource create dlm ocf:pacemaker:controld op monitor interval=30s on-fail=fence clone interleave=true ordered=true
 pcs resource create clvmd ocf:heartbeat:clvm op monitor interval=30s on-fail=fence clone interleave=true ordered=true
 pcs constraint colocation add clvmd-clone with dlm-clone

19) At this stage, starting clvmd and dlm will produce an error. Go to the Pacemaker web interface at https://192.168.0.1:2224. If the cluster does not appear there, add it via "Add existing". Then go to Resources - dlm - optional arguments and set allow_stonith_disabled = true.

20) Set the order in which resources start:

 pcs constraint order start DRBDClone then dlm-clone
 pcs constraint order start dlm-clone then clvmd-clone
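
The resulting ordering and colocation rules can be reviewed with (verification I've added):

 pcs constraint show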

21) Forbid LVM from writing its cache and clear the existing one. On both nodes:

 sed -i 's/write_cache_state = 1/write_cache_state = 0/' /etc/lvm/lvm.conf
 rm /etc/lvm/cache/*

22) Edit /etc/lvm/lvm.conf so that LVM does not see /dev/sdb (the backing disk must only be accessed through /dev/drbd0). On both nodes:

 # This configuration option has an automatic default value.
 # filter = [ "a|.*/|" ]
 filter = [ "r|^/dev/sdb$|" ]

23) Create the clustered volume group and logical volume. We do this on one node only:

 $ vgcreate -Ay -cy cl_vg /dev/drbd0
   Physical volume "/dev/drbd0" successfully created.
   Clustered volume group "cl_vg" successfully created
 $ lvcreate -l100%FREE -n r0 cl_vg
   Logical volume "r0" created.
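
Since /dev/sdb carries the same data as /dev/drbd0 (the metadata is internal), the filter from step 22 is what keeps LVM from seeing a duplicate PV. It is worth confirming that the PV is reported only on /dev/drbd0 (check added by me):

 pvs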

24) Format the volume as GFS2. Note that the part of the lock table name before the colon must match the cluster name (samba_cluster in our case), otherwise the filesystem will refuse to mount:

 mkfs.gfs2 -j2 -p lock_dlm -t samba_cluster:r0 /dev/cl_vg/r0

25) Next, we add the mounting of this filesystem to Pacemaker and tell it to start after clvmd:

 pcs resource create fs ocf:heartbeat:Filesystem device="/dev/cl_vg/r0" directory="/mnt/" fstype="gfs2" --clone
 pcs constraint order start clvmd-clone then fs-clone

26) Now it's time for CTDB, which will manage Samba:

 yum install -y samba ctdb cifs-utils 

27) Edit the config /etc/ctdb/ctdbd.conf

 CTDB_RECOVERY_LOCK="/mnt/ctdb/.ctdb.lock"
 CTDB_NODES=/etc/ctdb/nodes
 CTDB_MANAGES_SAMBA=yes
 CTDB_LOGGING=file:/var/log/ctdb.log
 CTDB_DEBUGLEVEL=NOTICE
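
The directory holding the recovery lock must exist on the shared GFS2 filesystem before CTDB starts; the original does not mention this step. Since /mnt is the clustered mount, creating it once on either node is enough:

 mkdir -p /mnt/ctdb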

28) Create the file with the list of nodes. ATTENTION! Every IP in the node list must be followed by a newline, including the last one; otherwise the node will fail to initialize. A safe way to write the file is shown after the listing.

 cat /etc/ctdb/nodes
 192.168.0.1
 192.168.0.2
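
For example, writing the file with printf guarantees the trailing newline (my suggestion, producing the same file as above):

 printf '192.168.0.1\n192.168.0.2\n' > /etc/ctdb/nodes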

29) Add the following configuration to /etc/samba/smb.conf:

 [global]
     clustering = yes
     private dir = /mnt/ctdb
     lock directory = /mnt/ctdb
     idmap backend = tdb2
     passdb backend = tdbsam

 [test]
     comment = Cluster Share
     path = /mnt
     browseable = yes
     writable = yes
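
The syntax can be validated with Samba's own checker before handing control over to CTDB (check added by me):

 testparm -s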

30) Finally, we create the ctdb resource and tell it to start after the filesystem:

 pcs constraint order start fs-clone then samba 
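
The command that actually creates the samba resource appears to have been lost from the original; it has to be run before the ordering constraint above. A minimal sketch, assuming CTDB is driven through its systemd unit (the resource name samba matches the constraint; a cloned variant may be preferable so that CTDB runs on both nodes):

 pcs resource create samba systemd:ctdb op monitor interval=60s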

And now about the problem I have not yet solved. If a node is rebooted, the whole stack falls apart, because DRBD needs time to activate /dev/drbd0. DLM does not see the volume, since it is not activated yet, and so does not start, and so on down the chain. The workaround is to activate the volume group manually and refresh the Pacemaker resources:

 vgchange -ay
 pcs resource refresh
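
One direction that might remove the need for this workaround (an untested assumption on my part, not from the original article): order dlm after DRBD's promotion rather than after its start, i.e. replace the first constraint from step 20 with the following, so that dlm and everything above it waits until /dev/drbd0 is actually Primary:

 pcs constraint order promote DRBDClone then start dlm-clone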

Source: https://habr.com/ru/post/435906/

