Hello!
I want to talk about how we use Proxmox Virtual Environment.
I will not describe the installation and initial setup: Proxmox is very simple and pleasant to install and configure. Instead, I will tell you how we use the system in a cluster environment.
For the cluster to do its job, any of its hosts must be able to quickly take over control of a virtual machine, and the VM data must not need to be copied anywhere. In other words, all cluster hosts must have access to the data of any given machine; they all work with a single shared storage inside which a particular set of virtual machines lives.
Proxmox works with two types of virtualization: operating-system-level, based on OpenVZ, and hardware virtualization, based on KVM. The two take different approaches to using disk space. In the case of OpenVZ containers, the virtual machine's disk lives directly in the host file system, while a KVM machine uses a disk image that contains the virtual machine's own file system. The host operating system does not care how data is laid out inside a KVM disk; the hypervisor takes care of that. When organizing a cluster, the disk-image variant is easier to implement than working with the file system.
From the host operating system's point of view, the data of a KVM machine can simply live "somewhere" in the storage. This concept maps remarkably well onto LVM, with the KVM disk image sitting inside a logical volume.
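This is, in fact, how Proxmox stores KVM disks on LVM-backed storage: every virtual disk is just a logical volume. A rough sketch, with the volume group name "shared" and the VM id made up for illustration:
root@srv03-vmx-02:~# lvcreate -L 32G -n vm-101-disk-1 shared    # one logical volume per virtual disk
root@srv03-vmx-02:~# ls -l /dev/shared/vm-101-disk-1            # the hypervisor uses this block device directly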
In the case of OpenVZ we are dealing with a file system, not just with data areas on the Shared Storage, so we need a full-blown cluster file system.
Cluster file systems will not be covered in this part of the article, and neither will working with KVM. For now, let's talk about preparing the cluster to work with shared storage.
I should say right away that we do not put production load on this cluster, and we do not plan to. The system is used for internal needs, of which we have plenty. As I wrote in the previous article, we are gradually moving the production load to vCloud and deploying Proxmox on the freed capacity.
In our case, the problem of organizing shared storage boils down to two aspects:
- We have a block device exported over the network that several hosts will access simultaneously. So that these hosts do not fight over space on the device, we need CLVM - the Clustered Logical Volume Manager. It is the same as LVM, only Clustered: thanks to CLVM, every host has up-to-date information about the state of the LVM volumes on the Shared Storage (and can change that state safely, without compromising integrity). Logical volumes in CLVM behave just like in ordinary LVM; they hold either KVM images or a cluster FS. A minimal setup sketch follows right after this list.
- In the case of OpenVZ, we have a logical volume with a file system on it. Several machines working simultaneously with a non-cluster file system inevitably breaks everything - it is the swan, the crayfish and the pike from the fable, only worse. The file system must be aware that it lives on a shared resource and must be able to work in that mode.
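For reference, a minimal sketch of how CLVM is brought up on a node, assuming the clvm package is installed and the cluster already has quorum; the device and volume group names here are illustrative:
root@srv03-vmx-02:~# lvmconf --enable-cluster                    # switches locking_type to 3 in /etc/lvm/lvm.conf
root@srv03-vmx-02:~# /etc/init.d/clvm start                      # clvmd coordinates LVM metadata changes via DLM
root@srv03-vmx-02:~# vgcreate -cy shared /dev/mapper/shared-lun  # "-cy" marks the volume group as clustered
root@srv03-vmx-02:~# vgs                                         # every node now sees the same picture of the VG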
I will not describe creating a cluster and joining nodes to it; you can read about that, for example, on the developers' site. Their knowledge base, by the way, is quite extensive and informative.
We run Proxmox version 2.2. From here on I assume that the cluster is already configured and working.
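A quick sanity check that the cluster really is alive, using the standard Proxmox 2.x tools:
root@tpve01:~# pvecm status    # quorum, votes, cluster generation
root@tpve01:~# pvecm nodes     # membership of the nodes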
Now let's set up the fenced daemon.
The specifics of a cluster environment require the nodes to follow certain rules of behavior. You cannot just go and start writing to the device; first you have to ask permission. This is controlled by several cluster subsystems: CMAN (the cluster manager), DLM (the distributed lock manager) and fenced.
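These subsystems can be inspected directly with the low-level utilities from the cman suite, for example (output omitted):
root@tpve01:~# cman_tool status    # cluster name, quorum state, expected votes
root@tpve01:~# dlm_tool ls         # DLM lockspaces currently in use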
The fenced daemon plays the role of a bouncer. If, from the cluster's point of view, a node starts behaving inadequately, all communication with the storage in the cluster is frozen, and fenced tries to expel the failed node from the cluster.
Fencing is the process of cutting a node off from the cluster storage. Since an inadequate machine may not respond to polite requests to leave, the node can be removed with the help of forces external to the cluster. Specially trained fence agents are used to communicate with these forces. As a rule, fencing boils down to powering the node off, after which the cluster takes a breath and resumes work.
The role of the external force can be played by any equipment able to power a node off on command or to isolate the machine from the network. Most often these are managed power distribution units or server management consoles. We use HP hardware, so fencing is done through iLO cards.
Until the fenced daemon receives confirmation from the agent that the node has been safely "fenced off", all I/O operations in the cluster are suspended. This is done to minimize the risk of data corruption in the storage: since the failed node has stopped following the generally accepted rules of behavior, anything can be expected of it, for example unauthorized (and unlogged) attempts to write to disk. Any communication with the storage in this situation therefore increases the risk of corrupting the data.
If no fence agent is configured for a node, fenced will not be able to kick it out when problems arise, and the cluster will stay frozen until the situation is resolved. From here there are several possible scenarios:
- The node can come to its senses, return to the cluster and promise never to do it again. It will be forgiven, and the cluster will resume work.
- The node can reboot (or someone reboots it) and ask to rejoin the cluster. A fresh attempt to join is taken as a sign that the node is healthy, and work can continue.
- The node can die. This situation requires manual intervention: the fenced daemon must be told that the cluster can resume working with the storage, since the node is no longer dangerous (and may well never return). The utility "fence_ack_manual" exists for exactly this; by running it, the operator takes responsibility for the decision to resume the cluster.
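A sketch of that manual intervention, run on one of the surviving nodes, and only after making absolutely sure the dead node is really powered off:
root@tpve01:~# fence_ack_manual tpve03    # on older cluster versions the syntax is "fence_ack_manual -n tpve03"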
If a host shuts down normally, it simply asks to be removed from the fence domain, after which it loses the ability to communicate with the storage.
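That "asking" is the same fence_tool, just in reverse; the init scripts normally do it on shutdown, but it can also be done by hand:
root@srv03-vmx-02:~# fence_tool leave    # leave the fence domain gracefully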
Membership in the fence domain is a prerequisite for performing any operations on the shared storage with the cluster software.
Let's walk through the fenced configuration using the fence_ilo agent (the configuration is performed on every node of the cluster).
In the file /etc/default/redhat-cluster-pve we set
FENCE_JOIN="yes"
Now the node will join the fence domain when the system boots. We do not want to reboot, so we add the node to the domain manually:
root@srv03-vmx-02:~# fence_tool join
You can check the state of the fenced daemon like this:
root@srv03-vmx-02:~# fence_tool ls
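The output looks roughly like this (the counts and member IDs are, of course, illustrative):
fence domain
member count  3
victim count  0
victim now    0
master nodeid 1
wait state    none
members       1 2 3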
Testing the fence agent
There are different agents for different versions of iLO:
root@tpve01:~# ls /usr/sbin/fence_ilo*    # list the iLO agents shipped with fence-agents
First of all, let's poll the status of the node we are interested in through iLO:
root@tpve01:~# fence_ilo -a IP_ILO_TPVE02 -l LOGIN -p PASSWORD -o status
Status: ON. Instead of "-o status" you can say "-o reboot", and the test machine will promptly catch a hard reset.
In the same way we check that iLO responds on all the nodes.
Now let's configure the cluster so that the fence agents work correctly.
There is a good article about setting up fenced in Proxmox, so I will not retell what is written there and will only show the final configuration of our cluster:
root@tpve01:~# cat /etc/pve/cluster.conf.new
<?xml version="1.0"?>
<cluster name="tpve" config_version="5">
  <cman keyfile="/var/lib/pve-cluster/corosync.authkey">
  </cman>
  <fencedevices>
    <fencedevice agent="fence_ilo" ipaddr="IP_ILO_TPVE01" name="tpve01" passwd_script="/usr/local/pvesync/ilo_pass/tpve01" login="LOGIN"/>
    <fencedevice agent="fence_ilo" ipaddr="IP_ILO_TPVE02" name="tpve02" passwd_script="/usr/local/pvesync/ilo_pass/tpve02" login="LOGIN"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="tpve01" votes="1" nodeid="1">
      <fence>
        <method name="power">
          <device name="tpve01"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="tpve02" votes="1" nodeid="2">
      <fence>
        <method name="power">
          <device name="tpve02"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="tpve03" votes="1" nodeid="3"/>
  </clusternodes>
</cluster>
In our example, the "
tpve03 " node does not have a configured fence agent, and if there are problems with it, the conflict will have to be resolved manually.
To avoid exposing iLO passwords in the config, the agent settings specify the following parameter instead of a password:
passwd_script="/usr/local/pvesync/ilo_pass/tpve01"
This is the path to a script that outputs the password. The script is primitive:
root@tpve01:~# cat /usr/local/pvesync/ilo_pass/tpve01
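In essence it is a one-line shell wrapper that prints the password to stdout (the password below is a placeholder, and the script must be executable):
#!/bin/sh
# the fence agent runs this script and reads the password from its stdout
echo "ILO_PASSWORD"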
These scripts must exist on all nodes of the cluster.
After all the changes to the cluster configuration have been made, the config has to be validated and then applied.
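Validation can be done with ccs_config_validate from the cman tooling, pointed at the new file:
root@tpve01:~# ccs_config_validate -f /etc/pve/cluster.conf.new    # check the new config before activating it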
To apply it, go to the HA settings in the Proxmox web interface and click "Activate". If everything went through without errors and the cluster configuration version number was bumped, the changes take effect, and you can start testing the nodes. In general, it is highly recommended to actually fence every node of the cluster once, to make sure everything really works.
To start with, we fence a node by hand:
root@tpve01:~# fence_node tpve02
The node duly caught its reset.
Now let's try to emulate a real problem. On one of the nodes we take down the network interface over which the cluster nodes talk to each other:
root@tpve02:~# ifdown vmbr0    # or whichever interface carries the cluster traffic
After a while, polling the fenced daemon on one of the surviving nodes shows the following:
root@tpve01:~# fence_tool ls
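The interesting part of the output is roughly this (values illustrative):
...
victim count  1
wait state    fencing
...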
"
wait state fencing " says that the problem node is being eliminated from the cluster right now, and the
fenced daemon is waiting for news from the
fence agent.
After the agent confirms the fencing:
root@tpve01:~# fence_tool ls
The node has been killed; we carry on.
Once the node comes back up, it rejoins the cluster.
Now our cluster is ready to work with shared storage.
I will probably stop here. In the next article, which will apparently come out in January, I will talk about connecting the storage and working with it.