
Proxmox cluster storage. Part Three: Nuances

Hello!



The third part of this article is a kind of appendix to the previous two, in which I talked about working with a Proxmox cluster. In this part I will describe the problems we ran into while working with Proxmox and how we solved them.



Authenticated iSCSI connections



If you need to specify credentials when connecting to iSCSI, it is better to do this bypassing Proxmox. Why?




It is easier to connect manually:



root@srv01-vmx:~# iscsiadm -m discovery -t st -p 10.11.12.13
root@srv01-vmx:~# iscsiadm -m node --targetname "iqn.2012-10.local.alawar.ala-nas-01:pve-cluster-01" --portal "10.11.12.13:3260" --op=update --name node.session.auth.authmethod --value=CHAP
root@srv01-vmx:~# iscsiadm -m node --targetname "iqn.2012-10.local.alawar.ala-nas-01:pve-cluster-01" --portal "10.11.12.13:3260" --op=update --name node.session.auth.username --value=Admin
root@srv01-vmx:~# iscsiadm -m node --targetname "iqn.2012-10.local.alawar.ala-nas-01:pve-cluster-01" --portal "10.11.12.13:3260" --op=update --name node.session.auth.password --value=Lu4Ii2Ai
root@srv01-vmx:~# iscsiadm -m node --targetname "iqn.2012-10.local.alawar.ala-nas-01:pve-cluster-01" --portal "10.11.12.13:3260" --login


These commands must be executed on every node of the cluster, for each portal that serves the target we need. Alternatively, you can run them on a single node and then copy the resulting configuration files for this connection to the other nodes. The files live in the "/etc/iscsi/nodes" and "/etc/iscsi/send_targets" directories.
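For example, a minimal sketch of pushing the prepared configuration from one node to the rest and logging in there (the node names srv02-vmx and srv03-vmx are just placeholders for the remaining cluster nodes; rsync and key-based ssh between nodes are assumed):

for node in srv02-vmx srv03-vmx; do
    # copy the discovered targets and the session settings (including the CHAP credentials)
    rsync -a /etc/iscsi/nodes/ ${node}:/etc/iscsi/nodes/
    rsync -a /etc/iscsi/send_targets/ ${node}:/etc/iscsi/send_targets/
    # log in to the target on the remote node as well
    ssh ${node} iscsiadm -m node --targetname "iqn.2012-10.local.alawar.ala-nas-01:pve-cluster-01" --portal "10.11.12.13:3260" --login
done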



Mounting GFS2 on a new node



In order to mount a GFS2 file system on a new node, another journal has to be added to it (to the file system). This is done as follows: on any cluster node where the required FS is already mounted, run the command:



 root@pve01:~# gfs2_jadd -j 1 /mnt/cluster/storage01 


The " -j " parameter specifies the number of logs to add to FS .



This command may fail with the error:



 create: Disk quota exceeded 


Causes of the error:



A GFS2 volume actually contains not one file system but two: the second one is used for service purposes. If desired, it can be mounted by adding the "-o meta" option; changes inside this FS can potentially corrupt the data file system. When a journal is added to the FS, this meta file system is mounted into a "/tmp/TEMP_RANDOM_DIR" directory, and the journal file is created inside it. For reasons we have not yet figured out, the kernel sometimes decides that the quota for creating objects in the mounted meta-FS has been exceeded, which is what produces this error. The way out is to remount the GFS2 data file system (naturally, all virtual machines living on this FS have to be stopped first) and run the journal-adding command once again. You also need to unmount the meta-FS left over from the last unsuccessful attempt to add the journal:



 cat /proc/mounts | grep /tmp/ | grep -i gfs2 | awk '{print $2}' | xargs umount 
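After that cleanup, the remount and the retry might look roughly like this (just a sketch: the device path /dev/mapper/cluster-storage01 is a placeholder, and all virtual machines living on this FS must be stopped beforehand):

root@pve01:~# umount /mnt/cluster/storage01
root@pve01:~# mount -t gfs2 /dev/mapper/cluster-storage01 /mnt/cluster/storage01
root@pve01:~# gfs2_jadd -j 1 /mnt/cluster/storage01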


Mounting the data source inside the container



Container virtualization technology is good in that the host has almost unlimited possibilities for interacting with the virtual machine.



When vzctl starts a container, it tries to execute the following set of scripts (if they exist):



vps.mount
CTID.mount
CTID.start

When the container is stopped, the following scripts are executed:



CTID.stop
CTID.umount
vps.umount

where " CTID " is the container number. The " vps. * " Scripts are executed during operations with any container. The scripts " * .start " and " * .stop " are executed in the context of the container, all the others are in the context of the host. Thus, we can script the process of starting / stopping the container, adding data mounting to it. Here are some examples:



Mounting the data directory inside the container


If a container works with a large amount of data, we try not to keep that data inside the container but to mount it from the host. This approach has two advantages:



  1. The container stays small and is quickly backed up by Proxmox. We can quickly restore or clone the container's functionality at any moment.
  2. The container's data can be backed up centrally by a grown-up backup system with all the facilities it provides (multi-level backups, rotation, statistics, and so on).


Contents of the file " CTID.mount ":



#!/bin/bash
. /etc/vz/vz.conf          # global OpenVZ config; among other things it defines ${VE_ROOT} - the container root on the host
. ${VE_CONFFILE}           # config of the container being started
DIR_SRC=/storage/src_dir   # directory on the host whose data we want to see inside the container
DIR_DST=/data              # mount point inside the container where $DIR_SRC will appear
mkdir -p ${VE_ROOT}/${DIR_DST}                               # create the mount point inside the container
mount -n -t simfs ${DIR_SRC} ${VE_ROOT}/${DIR_DST} -o /data  # mount the host directory via simfs
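To check from the host that the data is actually visible inside the container, something like this can be used (the container ID 101 is just an example):

root@srv01-vmx:~# vzctl exec 101 df -h /data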


Mounting the file system inside the container


On the host there is a volume that needs to be handed over to the container. Contents of the file "CTID.mount":



#!/bin/bash
. /etc/vz/vz.conf
. ${VE_CONFFILE}
UUID_SRC=3d1d8ec1-afa6-455f-8a27-5465c454e212   # UUID of the file system that will be mounted into the container
DIR_DST=/data                                   # mount point inside the container
mkdir -p ${VE_ROOT}/${DIR_DST}
mount -n -U ${UUID_SRC} ${VE_ROOT}/${DIR_DST}


Mounting a file system stored in a file inside the container


Why might this be needed? Sometimes a tricky product (for example, Splunk) flatly refuses to work on simfs, or we are not satisfied with how GFS2 performs under certain conditions. For example, we may have some kind of cache kept in a heap of small files, and GFS2 is not particularly fast with large numbers of small files. In that case we can create a file system other than GFS2 (say, ext3) on the host and attach it to the container.



We mount a file-backed loop device into the container:



First create the file:



 root@srv01:/storage# truncate -s 10G CTID_ext3.fs 


Format the FS in the file:



root@srv01:/storage# mkfs.ext3 CTID_ext3.fs
mke2fs 1.42 (29-Nov-2011)
CTID_ext3.fs is not a block special device.
Proceed anyway? (y,n) y
...


Contents of the file " CTID.mount ":



#!/bin/bash
. /etc/vz/vz.conf
. ${VE_CONFFILE}
CFILE_SRC=/storage/CTID_ext3.fs   # file containing the FS that will be mounted into the container
DIR_DST=/data                     # mount point inside the container
mkdir -p ${VE_ROOT}/${DIR_DST}
mount -n ${CFILE_SRC} -t ext3 ${VE_ROOT}/${DIR_DST} -o loop


Unmounting external data when the container is stopped


When a container is stopped, the system automatically tries to unmount all file systems attached to it. But in particularly exotic configurations it does not manage to do this, so, just in case, here is an example of a simple "CTID.umount" script:



#!/bin/bash
. /etc/vz/vz.conf
. ${VE_CONFFILE}
DIR=/data
if mountpoint -q "${VE_ROOT}${DIR}" ; then
    umount ${VE_ROOT}${DIR}
fi


Working in a cluster with a non-cluster file system



If for some reason you do not want to use a cluster FS (you are not happy with its stability, not happy with its speed, and so on), but you still want to work with a single shared storage, this option is possible too. All we really need for it is shared block storage with CLVM on top of it.





Procedure:



Each cluster node is allocated its own logical volume in CLVM, which is then formatted.
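For example, a sketch of such an allocation for one node (the volume group name "vg-cluster" and the size are placeholders; the VG itself must already exist as a clustered VG, as described in the previous parts):

root@srv01-vmx:~# lvcreate -n srv01-storage -L 500G vg-cluster
root@srv01-vmx:~# mkfs.ext3 /dev/vg-cluster/srv01-storage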



We create the main storage. Create a directory with the same name on all nodes of the cluster (for example, "/storage"). Mount the node's own logical volume into it. Then create a storage of type "Directory" in the Proxmox admin panel, call it, say, "STORAGE", and state that it is not shared.



We create the backup storage. Create a directory with the same name on all nodes of the cluster (for example, "/storage2"). Create a storage of type "Directory" in the Proxmox admin panel, call it, say, "STORAGE2", and again state that it is not shared. If one of the nodes dies or is shut down, we will mount its volume into the "/storage2" directory on the node that takes over the load of the deceased.
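The corresponding entries in "/etc/pve/storage.cfg" might then look roughly like this (a sketch only; the exact set of content types is up to you):

dir: STORAGE
        path /storage
        content images,rootdir
        shared 0

dir: STORAGE2
        path /storage2
        content images,rootdir
        shared 0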



What we have in the end: a kind of semi-failover. Theoretically, when one node dies, its containers can be brought back up on a surviving node.



Why "semi", and why "theoretically":



The virtual machines live in the "STORAGE" storage, which is located in the "/storage" directory. The disk from the dead node will be mounted into the "/storage2" directory, where Proxmox will see the containers but will not be able to launch them from there. To bring up the virtual machines located in that storage, you need to do three things:



  1. Tell the evacuated containers that their new home is not the "/storage" directory but "/storage2". To do this, in every "*.conf" file in the "/etc/pve/nodes/<dead_node_name>/openvz" directory, change the value of the VE_PRIVATE variable from "/storage/private/CTID" to "/storage2/private/CTID".
  2. Tell the cluster that the virtual machines of the dead node now live on this live one. To do this, it is enough to move all the files from the "/etc/pve/nodes/<dead_node_name>/openvz" directory to "/etc/pve/nodes/<live_node_name>/openvz". Perhaps there is some correct API call for this, but we did not bother looking for it :)
  3. Reset the quota for each evacuated container (just in case):



     vzquota drop CTID 




That's it. The containers can now be started.
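The three steps above are easy to wrap into a script. A rough sketch (the node names "deadnode" and "livenode" are placeholders for the dead node and the surviving one):

#!/bin/bash
DEAD=/etc/pve/nodes/deadnode/openvz
LIVE=/etc/pve/nodes/livenode/openvz
for conf in ${DEAD}/*.conf; do
    CTID=$(basename ${conf} .conf)
    # step 1: point the container at its new home
    sed -i 's|^VE_PRIVATE=.*|VE_PRIVATE="/storage2/private/'${CTID}'"|' ${conf}
    # step 3: reset the quota, just in case
    vzquota drop ${CTID}
done
# step 2: hand the container configs (and action scripts, if any) over to the live node
mv ${DEAD}/* ${LIVE}/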



If the containers from the dead node take up little space, or we have incredibly fast disks, or we can simply afford to wait, we can skip the first and third steps by moving the containers we need from "/storage2/private" to "/storage/private".



If the cluster has collapsed



A cluster is a capricious creature, and there are cases when it throws a sulk: for example, after a massive network problem or a massive power failure. It looks like this: any access to the cluster storage blocks the current session, polling the fence domain status produces alarming messages such as "wait state messages", and connection errors keep appearing in dmesg.



If no attempt to revive the cluster succeeds, the simplest thing to do is to disable automatic joining of the fence domain on all cluster nodes (the "/etc/default/redhat-cluster-pve" file) and then reboot all the nodes. Be prepared for the fact that the nodes may not be able to reboot on their own. Once all the nodes are back up, we manually join the fence domain, start CLVM, and so on. How to do this was described in the previous parts.
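A rough sketch of both stages (assuming that the automatic join is controlled by the FENCE_JOIN variable in that file and that CLVM is started by its usual init script; check both on your installation):

# on every node, before rebooting: do not join the fence domain automatically
root@srv01-vmx:~# sed -i 's/^FENCE_JOIN=.*/FENCE_JOIN="no"/' /etc/default/redhat-cluster-pve
root@srv01-vmx:~# reboot

# on every node, after all of them are back up: rejoin manually
root@srv01-vmx:~# fence_tool join
root@srv01-vmx:~# /etc/init.d/clvm start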



That is probably all.



In the next part I will talk about how we automate the work in the cluster.



Thanks for your attention!



Source: https://habr.com/ru/post/178429/


