Hello!
I want to talk about how we use Proxmox Virtual Environment.
I will not describe the installation and initial setup: Proxmox is very simple and pleasant to install and configure. Instead, I will tell you how we use the system in a cluster environment.
For the cluster to do its job, any of its hosts must be able to quickly take over control of a virtual machine, and the VM data must not need to be copied anywhere. In other words, all cluster hosts must have access to the data of any given machine; they all work with a single shared storage inside which a particular set of virtual machines lives.
Proxmox works with two types of virtualization: operating-system-level, based on OpenVZ, and hardware virtualization, based on KVM. The two take different approaches to using disk space. In the case of OpenVZ containers, the virtual machine's disk lives directly in the host file system, while a KVM machine uses a disk image that contains the virtual machine's own file system. The host operating system does not care how data is laid out inside a KVM disk; the hypervisor takes care of that. When organizing a cluster, the disk-image variant is easier to implement than working with the file system.
From the host operating system's point of view, the data of a KVM machine can simply live "somewhere" in the storage. This concept maps remarkably well onto LVM, with the KVM disk image sitting inside a logical volume.
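This is, in fact, how Proxmox stores KVM disks on LVM-backed storage: every virtual disk is just a logical volume. A rough sketch, with the volume group name "shared" and the VM id made up for illustration:
root@srv03-vmx-02:~# lvcreate -L 32G -n vm-101-disk-1 shared    # one logical volume per virtual disk
root@srv03-vmx-02:~# ls -l /dev/shared/vm-101-disk-1            # the hypervisor uses this block device directly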
In the case of OpenVZ we are dealing with a file system, not just with data areas on the Shared Storage, so we need a full-blown cluster file system.
Cluster file systems will not be covered in this part of the article, and neither will working with KVM. For now, let's talk about preparing the cluster to work with shared storage.
I should say right away that we do not put production load on this cluster, and we do not plan to. The system is used for internal needs, of which we have plenty. As I wrote in the previous article, we are gradually moving the production load to vCloud and deploying Proxmox on the freed capacity.
In our case, the problem of organizing shared storage boils down to two aspects:
- We have a block device exported over the network that several hosts will access simultaneously. So that these hosts do not fight over space on the device, we need CLVM - the Clustered Logical Volume Manager. It is the same as LVM, only Clustered: thanks to CLVM, every host has up-to-date information about the state of the LVM volumes on the Shared Storage (and can change that state safely, without compromising integrity). Logical volumes in CLVM behave just like in ordinary LVM; they hold either KVM images or a cluster FS. A minimal setup sketch follows right after this list.
- In the case of OpenVZ, we have a logical volume with a file system on it. Several machines working simultaneously with a non-cluster file system inevitably breaks everything - it is the swan, the crayfish and the pike from the fable, only worse. The file system must be aware that it lives on a shared resource and must be able to work in that mode.
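For reference, a minimal sketch of how CLVM is brought up on a node, assuming the clvm package is installed and the cluster already has quorum; the device and volume group names here are illustrative:
root@srv03-vmx-02:~# lvmconf --enable-cluster                    # switches locking_type to 3 in /etc/lvm/lvm.conf
root@srv03-vmx-02:~# /etc/init.d/clvm start                      # clvmd coordinates LVM metadata changes via DLM
root@srv03-vmx-02:~# vgcreate -cy shared /dev/mapper/shared-lun  # "-cy" marks the volume group as clustered
root@srv03-vmx-02:~# vgs                                         # every node now sees the same picture of the VG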
I will not describe creating a cluster and joining nodes to it; you can read about that, for example, on the developers' site. Their knowledge base, by the way, is quite extensive and informative.
We run Proxmox version 2.2. From here on I assume that the cluster is already configured and working.
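A quick sanity check that the cluster really is alive, using the standard Proxmox 2.x tools:
root@tpve01:~# pvecm status    # quorum, votes, cluster generation
root@tpve01:~# pvecm nodes     # membership of the nodes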
Now let's set up the fenced daemon.
The specifics of a cluster environment require the nodes to follow certain rules of behavior. You cannot just go and start writing to the device; first you have to ask permission. This is controlled by several cluster subsystems: CMAN (the cluster manager), DLM (the distributed lock manager) and fenced.
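These subsystems can be inspected directly with the low-level utilities from the cman suite, for example (output omitted):
root@tpve01:~# cman_tool status    # cluster name, quorum state, expected votes
root@tpve01:~# dlm_tool ls         # DLM lockspaces currently in use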
The fenced daemon plays the role of a bouncer. If, from the cluster's point of view, a node starts behaving inadequately, all communication with the storage in the cluster is frozen, and fenced tries to expel the failed node from the cluster.
Fencing is the process of cutting a node off from the cluster storage. Since an inadequate machine may not respond to polite requests to leave, the node can be removed with the help of forces external to the cluster. Specially trained fence agents are used to communicate with these forces. As a rule, fencing boils down to powering the node off, after which the cluster takes a breath and resumes work.
The role of the external force can be played by any equipment able to power a node off on command or to isolate the machine from the network. Most often these are managed power distribution units or server management consoles. We use HP hardware, so fencing is done through iLO cards.
Until the fenced daemon receives confirmation from the agent that the node has been safely "fenced off", all I/O operations in the cluster are suspended. This is done to minimize the risk of data corruption in the storage: since the failed node has stopped following the generally accepted rules of behavior, anything can be expected of it, for example unauthorized (and unlogged) attempts to write to disk. Any communication with the storage in this situation therefore increases the risk of corrupting the data.
If no fence agent is configured for a node, fenced will not be able to kick it out when problems arise, and the cluster will stay frozen until the situation is resolved. From here there are several possible scenarios:
- The node can come to its senses, return to the cluster and promise never to do it again. It will be forgiven, and the cluster will resume work.
- The node can reboot (or someone reboots it) and ask to rejoin the cluster. A fresh attempt to join is taken as a sign that the node is healthy, and work can continue.
- The node can die. This situation requires manual intervention: the fenced daemon must be told that the cluster can resume working with the storage, since the node is no longer dangerous (and may well never return). The utility "fence_ack_manual" exists for exactly this; by running it, the operator takes responsibility for the decision to resume the cluster.
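A sketch of that manual intervention, run on one of the surviving nodes, and only after making absolutely sure the dead node is really powered off:
root@tpve01:~# fence_ack_manual tpve03    # on older cluster versions the syntax is "fence_ack_manual -n tpve03"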
If a host shuts down normally, it simply asks to be removed from the fence domain, after which it loses the ability to communicate with the storage.
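That "asking" is the same fence_tool, just in reverse; the init scripts normally do it on shutdown, but it can also be done by hand:
root@srv03-vmx-02:~# fence_tool leave    # leave the fence domain gracefully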
Membership in the fence domain is a prerequisite for performing any operations on the shared storage with the cluster software.
Let's walk through the fenced configuration using the fence_ilo agent (the configuration is performed on every node of the cluster).
In the file /etc/default/redhat-cluster-pve we set
FENCE_JOIN="yes"
Now the node will join the fence domain when the system boots. We do not want to reboot, so we add the node to the domain manually:
root@srv03-vmx-02:~# fence_tool join
You can check the state of the fenced daemon like this:
root@srv03-vmx-02:~# fence_tool ls
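The output looks roughly like this (the counts and member IDs are, of course, illustrative):
fence domain
member count  3
victim count  0
victim now    0
master nodeid 1
wait state    none
members       1 2 3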
Testing the fence agent
There are different agents for different versions of iLO:
root@tpve01:~# ls /usr/sbin/fence_ilo*    # list the iLO agents shipped with fence-agents
First of all, let's poll the status of the node we are interested in through iLO:
root@tpve01:~# fence_ilo -a IP_ILO_TPVE02 -l LOGIN -p PASSWORD -o status
Status: ON. Instead of "-o status" you can say "-o reboot", and the test machine will promptly catch a hard reset.
In the same way we check that iLO responds on all the nodes.
Now let's configure the cluster so that the fence agents work correctly.
There is a good article about setting up fenced in Proxmox, so I will not retell what is written there and will only show the final configuration of our cluster:
root@tpve01:~# cat /etc/pve/cluster.conf.new
<?xml version="1.0"?>
<cluster name="tpve" config_version="5">
  <cman keyfile="/var/lib/pve-cluster/corosync.authkey">
  </cman>
  <fencedevices>
    <fencedevice agent="fence_ilo" ipaddr="IP_ILO_TPVE01" name="tpve01" passwd_script="/usr/local/pvesync/ilo_pass/tpve01" login="LOGIN"/>
    <fencedevice agent="fence_ilo" ipaddr="IP_ILO_TPVE02" name="tpve02" passwd_script="/usr/local/pvesync/ilo_pass/tpve02" login="LOGIN"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="tpve01" votes="1" nodeid="1">
      <fence>
        <method name="power">
          <device name="tpve01"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="tpve02" votes="1" nodeid="2">
      <fence>
        <method name="power">
          <device name="tpve02"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="tpve03" votes="1" nodeid="3"/>
  </clusternodes>
</cluster>
In our example, the "
tpve03 " node does not have a configured fence agent, and if there are problems with it, the conflict will have to be resolved manually.
To avoid exposing iLO passwords in the config, the agent settings specify the following parameter instead of a password:
passwd_script="/usr/local/pvesync/ilo_pass/tpve01"
This is the path to a script that outputs the password. The script is primitive:
root@tpve01:~# cat /usr/local/pvesync/ilo_pass/tpve01
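In essence it is a one-line shell wrapper that prints the password to stdout (the password below is a placeholder, and the script must be executable):
#!/bin/sh
# the fence agent runs this script and reads the password from its stdout
echo "ILO_PASSWORD"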
These scripts must exist on all nodes of the cluster.
After all the changes to the cluster configuration have been made, the config has to be validated and then applied.
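Validation can be done with ccs_config_validate from the cman tooling, pointed at the new file:
root@tpve01:~# ccs_config_validate -f /etc/pve/cluster.conf.new    # check the new config before activating it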
To apply it, go to the HA settings in the Proxmox web interface and click "Activate". If everything went through without errors and the cluster configuration version number was bumped, the changes take effect, and you can start testing the nodes. In general, it is highly recommended to actually fence every node of the cluster once, to make sure everything really works.
To start with, we fence a node by hand:
root@tpve01:~# fence_node tpve02
The node duly caught its reset.
Now let's try to emulate a real problem. On one of the nodes we take down the network interface over which the cluster nodes talk to each other:
root@tpve02:~# ifdown vmbr0    # or whichever interface carries the cluster traffic
After a while, polling the fenced daemon on one of the surviving nodes shows the following:
root@tpve01:~# fence_tool ls
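The interesting part of the output is roughly this (values illustrative):
...
victim count  1
wait state    fencing
...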
"
wait state fencing " says that the problem node is being eliminated from the cluster right now, and the
fenced daemon is waiting for news from the
fence agent.
After the agent confirms the fencing:
root@tpve01:~# fence_tool ls
The node has been killed; we carry on.
Once the node comes back up, it rejoins the cluster.
Now our cluster is ready to work with shared storage.
I will probably stop here. In the next article, which will apparently come out in January, I will talk about connecting the storage and working with it.