
pacemaker: how to finish off a downed node

When managing certain types of resources, it is critical that no more than one node uses the resource at a time. With DRBD, for example, it must never happen that the device ends up mounted in RW mode on two systems at once. The same applies to disk arrays that are connected to multiple servers.

Pacemaker itself takes care of this, but situations can arise where Pacemaker decides that a resource needs to be moved yet is unable to issue a stop command on the other node (for example, loss of network connectivity when iSCSI runs over a separate network, and so on). To deal with this, STONITH (Shoot The Other Node In The Head) is used. In Pacemaker it is configured as a resource and is able to solve many such problems.

The initial configuration will be simple:


As a first step, to avoid problems, dump the current configuration into a file; the changes below will be applied to this file and pushed to the cluster in one go:
pcs cluster cib stonith.xml 

STONITH must be enabled on the cluster, and quorum must be disabled (since the cluster has only two nodes). Make sure of it:
 # pcs -f stonith.xml property show
 ...
 no-quorum-policy: ignore
 stonith-enabled: true
 ...

If not, then
 pcs -f stonith.xml property set stonith-enabled=true
 pcs -f stonith.xml property set no-quorum-policy=ignore


Then we create the IPMI STONITH resources (pcs stonith list gives the full list of available STONITH agents, and pcs stonith describe shows the full list of parameters for a given agent):
 pcs -f stonith.xml stonith create node1.stonith fence_ipmilan ipaddr="node1.ipmi" passwd="xXx" login="xXx" action="reboot" method="cycle" pcmk_host_list="node1.eth" pcmk_host_check=static-list stonith-timeout=10s op monitor interval=10s
 pcs -f stonith.xml stonith create node2.stonith fence_ipmilan ipaddr="node2.ipmi" passwd="xXx" login="xXx" action="reboot" method="cycle" pcmk_host_list="node2.eth" pcmk_host_check=static-list stonith-timeout=10s op monitor interval=10s
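Before relying on these resources it is worth checking that the IPMI interfaces actually answer with the credentials given above. A minimal sketch, assuming ipmitool is installed and that node1.ipmi and the xXx credentials are the same placeholders as in the commands above:

 # ask the BMC of node1 over the network whether address, login and password are correct
 ipmitool -I lanplus -H node1.ipmi -U xXx -P xXx chassis power status

An answer like "Chassis Power is on" means the fence agent should be able to reach the same interface.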

Special attention should be paid to two parameters: ipaddr and pcmk_host_list . The first specifies the address of the node's IPMI interface, and the second specifies which nodes the created resource is allowed to finish off.
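The values in pcmk_host_list must match the node names exactly as the cluster knows them. A quick sanity check (assuming the cluster is already running) is to list the nodes and compare:

 # node names as pacemaker/corosync see them; these must match pcmk_host_list
 crm_node -l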

Since STONITH, from Pacemaker's point of view, is an ordinary resource, it can migrate like any other resource. It would be very unpleasant if the resource responsible for rebooting node2 ended up running on node2 itself. With the following constraints we forbid STONITH resources from landing on the very nodes they are supposed to shoot:
 pcs -f stonith.xml constraint location node1.stonith avoids node1.eth=INFINITY
 pcs -f stonith.xml constraint location node2.stonith avoids node2.eth=INFINITY
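Before pushing anything, the staged file can be reviewed to make sure the resources and constraints ended up as intended. A small check, still run against the local stonith.xml dump (assuming the pcs version used here supports these show commands with -f):

 # inspect the staged stonith resources and location constraints without touching the live cluster
 pcs -f stonith.xml stonith show
 pcs -f stonith.xml constraint location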


Setup is complete. Push the configuration into Pacemaker:
 pcs cluster push cib stonith.xml 
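Note that this syntax belongs to older pcs releases; on newer versions the same step is spelled differently (an assumption about your pcs version, check pcs cluster --help if unsure):

 # newer pcs releases replace "push cib" with cib-push
 pcs cluster cib-push stonith.xml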


After that, a simple check
 stonith_admin -t 20 --reboot node1.eth 

will confirm that everything turned out right.
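Keep in mind that this command really does power-cycle node1, so run it in a maintenance window. The result can also be confirmed from the surviving node; a small sketch using standard pacemaker tooling:

 # fencing history for the node that was just shot
 stonith_admin --history node1.eth
 # one-shot view of the cluster state once the node comes back
 crm_mon -1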

The final configuration should look something like this.

 # pcs status
 Online: [ node1.eth node2.eth ]

 Full list of resources:

  FS             (ocf::heartbeat:Filesystem):    Started node2.eth
  node1.stonith  (stonith:fence_ipmilan):        Started node2.eth
  node2.stonith  (stonith:fence_ipmilan):        Started node1.eth

 # pcs constraint location
 Location Constraints:
   Resource: node1.stonith
     Disabled on: node1.eth
   Resource: node2.stonith
     Disabled on: node2.eth

 # pcs property show
 no-quorum-policy: ignore
 stonith-enabled: true

Source: https://habr.com/ru/post/200348/

