Hello!
The third part of this article is a sort of appendix to the previous two, in which I described working with a Proxmox cluster. In this part I will cover the problems we ran into with Proxmox and how we solved them.
Authenticated iSCSI connections
If you need to specify credentials when connecting to an iSCSI target, it is better to do this bypassing Proxmox. Why?
- Firstly, because it is not possible to create an authenticated iSCSI connection via the Proxmox web interface.
- Secondly, even if you create an unauthenticated connection in Proxmox in order to add the authentication information manually, you will have to fight the system for the ability to change the target configuration files: if the connection to the iSCSI host fails, Proxmox overwrites the target information and tries to connect again.
It is easier to connect manually with iscsiadm.
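A sketch of such a manual connection (the target IQN, portal address, and credentials below are placeholders; substitute your own):

```shell
# Discover targets offered by the portal:
iscsiadm -m discovery -t sendtargets -p 192.168.1.50:3260

# Enable CHAP and store the credentials for the target:
iscsiadm -m node -T iqn.2001-04.com.example:storage.lun1 -p 192.168.1.50:3260 \
  --op update -n node.session.auth.authmethod -v CHAP
iscsiadm -m node -T iqn.2001-04.com.example:storage.lun1 -p 192.168.1.50:3260 \
  --op update -n node.session.auth.username -v myuser
iscsiadm -m node -T iqn.2001-04.com.example:storage.lun1 -p 192.168.1.50:3260 \
  --op update -n node.session.auth.password -v mypassword

# Log in and make the connection come up automatically on boot:
iscsiadm -m node -T iqn.2001-04.com.example:storage.lun1 -p 192.168.1.50:3260 --login
iscsiadm -m node -T iqn.2001-04.com.example:storage.lun1 -p 192.168.1.50:3260 \
  --op update -n node.startup -v automatic
```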
These commands must be executed on all the nodes of the cluster, for every portal that provides the target we need. Alternatively, you can execute them on one node and distribute the resulting configuration files to the others; the files live in the "/etc/iscsi/nodes" and "/etc/iscsi/send_targets" directories.
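For example, the saved configuration can be copied to another node like this (the node name is a placeholder):

```shell
# Copy the saved iSCSI connection data to another cluster node:
scp -r /etc/iscsi/nodes /etc/iscsi/send_targets root@pve02:/etc/iscsi/
```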
Mounting GFS2 on a new node
To mount a GFS2 file system on a new node, another journal needs to be added to the file system. This is done as follows: on any cluster node where the file system in question is already mounted, run gfs2_jadd. Its "-j" parameter specifies the number of journals to add to the FS.
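A minimal example, assuming the GFS2 file system is mounted at /storage:

```shell
# Add one more journal to the GFS2 file system mounted at /storage:
gfs2_jadd -j 1 /storage
```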
This command may fail with the error:
create: Disk quota exceeded
Cause of the error: a GFS2 volume actually contains not one file system but two; the second one serves internal purposes. If desired, it can be mounted by adding the "-o meta" option, but changes within this FS can potentially corrupt the data file system. When a journal is added, the meta file system is mounted under a "/tmp/TEMP_RANDOM_DIR" directory, and the journal file is created inside it. For reasons so far unknown to us, the kernel sometimes decides that the object-creation quota in the mounted meta-FS has been exceeded, which is what produces this error. You can get out of this situation by remounting the data GFS2 file system (of course, you first have to stop all VMs located on this FS) and running the journal-add command once more. You also need to unmount the meta-FS left over from the last failed attempt to add the journal:
grep -i gfs2 /proc/mounts | grep /tmp/ | awk '{print $2}' | xargs -r umount
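The whole remount-and-retry sequence might look like this (assuming the FS is mounted at /storage and has an /etc/fstab entry):

```shell
# Stop all VMs on the FS first, then remount it:
umount /storage
mount /storage
# ...and retry adding the journal:
gfs2_jadd -j 1 /storage
```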
Mounting the data source inside the container
Container virtualization technology is good in that the host has almost unlimited ability to interact with the virtual machine.
When vzctl starts a container, it tries to execute the following set of scripts (if they exist):
- /etc/pve/openvz/vps.premount
- /etc/pve/openvz/CTID.premount
- /etc/pve/openvz/vps.mount
- /etc/pve/openvz/CTID.mount
- /etc/pve/openvz/CTID.start
When stopped, the following scripts are executed:
- /etc/pve/openvz/CTID.stop
- /etc/pve/openvz/CTID.umount
- /etc/pve/openvz/vps.umount
- /etc/pve/openvz/CTID.postumount
- /etc/pve/openvz/vps.postumount
where "CTID" is the container number. The "vps.*" scripts are executed for operations on any container, while the "CTID.*" scripts apply only to the given container. The "*.start" and "*.stop" scripts are executed in the context of the container; all the others run in the context of the host. Thus, we can script the container start/stop process and add data mounts to it. Here are some examples.
Mounting the data directory inside the container
If a container works with a large amount of data, we try not to keep this data inside the container but mount it from the host. This approach has two advantages:
- The container stays small and is quickly backed up by Proxmox, so we can quickly restore or clone the container's functionality at any time.
- The container's data can be centrally backed up by a full-fledged backup system with all the facilities it provides (multi-level backups, rotation, statistics, and so on).
Contents of the "CTID.mount" file:
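A minimal sketch of such a script; the /storage/data source and the /data mount point inside the container are examples. vzctl runs the script on the host with the VE_CONFFILE variable set, and sourcing the configs provides VE_ROOT:

```shell
#!/bin/bash
# CTID.mount: executed on the host right before the container is mounted.
. /etc/vz/vz.conf        # global OpenVZ defaults
. "${VE_CONFFILE}"       # provides VE_ROOT for this container

# Bind-mount a data directory from the host into the container:
mount -n --bind /storage/data "${VE_ROOT}/data"
```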
Mounting a file system inside the container
There is a volume on the host that needs to be given to the container. Contents of the "CTID.mount" file:
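A sketch under the same conventions; the device name and mount point are examples:

```shell
#!/bin/bash
. /etc/vz/vz.conf
. "${VE_CONFFILE}"

# Mount a host block device (e.g. an LVM volume) into the container:
mount -n /dev/mapper/vg0-ct101data "${VE_ROOT}/data"
```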
Mounting a file system located in a file inside the container
Why might this be needed? For example, if some tricky product (such as Splunk) refuses to work with simfs, or we are not satisfied with the performance of GFS2 under certain conditions. Say we have some kind of cache consisting of a large number of small files: GFS2 does not handle large volumes of small files very quickly. In that case you can create a file system other than GFS2 (for example, ext3) on the host and attach it to the container.
We mount a loop device from the file into the container. First create the file, then create the FS inside it.
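For example (the path and size are placeholders):

```shell
# Create a 10 GB file to hold the future file system:
dd if=/dev/zero of=/storage/private/101-data.img bs=1M count=10240

# Create an ext3 file system in it (-F lets mkfs work on a regular file):
mkfs.ext3 -F /storage/private/101-data.img
```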
Contents of the "CTID.mount" file:
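A sketch of the script; the image path and the /cache mount point inside the container are examples:

```shell
#!/bin/bash
. /etc/vz/vz.conf
. "${VE_CONFFILE}"

# Attach the file through a loop device and mount it inside the container:
mount -n -o loop /storage/private/101-data.img "${VE_ROOT}/cache"
```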
Unmounting external data when the container stops
When a container is stopped, the system automatically tries to unmount all file systems attached to it, but in particularly exotic configurations this can fail. So, just in case, here is an example of a simple "CTID.umount" script:
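A minimal sketch (the mount points inside the container are examples):

```shell
#!/bin/bash
. /etc/vz/vz.conf
. "${VE_CONFFILE}"

# Unmount the external data if it is still mounted:
for dir in "${VE_ROOT}/data" "${VE_ROOT}/cache"; do
    mountpoint -q "$dir" && umount "$dir"
done
exit 0
```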
Working in a cluster with a non-cluster file system
If for some reason you do not want to use a cluster FS (its stability does not suit you, its speed does not suit you, etc.) but you still want to work with a single shared storage, this option is possible. For it we need:
- A separate logical volume in CLVM for each node of the cluster
- The main storage for the normal operation of containers
- An empty backup storage for urgently mounting another node's volume if that node crashes or is shut down
Procedure:
Each cluster node allocates its own logical volume in CLVM and formats it.
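For example (the volume group name, volume name, and size are placeholders):

```shell
# On node pve01: carve out this node's volume from the shared CLVM VG
lvcreate -n pve01-storage -L 500G sharedvg
mkfs.ext3 /dev/sharedvg/pve01-storage
```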
We create the main storage: create a directory with the same name on all nodes of the cluster (for example, "/storage") and mount the node's logical volume into it. In the Proxmox admin panel we create a storage of the "Directory" type, call it, for example, "STORAGE", and mark it as not shared.
We create the backup storage: create a directory with the same name on all nodes of the cluster (for example, "/storage2"), and in the Proxmox admin panel create a storage of the "Directory" type, call it, for example, "STORAGE2", again not shared. If one of the nodes crashes or shuts down, we will mount its volume into the "/storage2" directory on whichever cluster node takes over the load of the deceased.
What we get in the end:
- Migration (including online) of containers between nodes (as long as no outside data is mounted into the container). A container is transferred from node to node by copying, so the migration time depends on the amount of data in the container: the more data, the longer the transfer between nodes. Do not forget about the increased disk load while this is happening.
- Semi-fault-tolerance. When a node dies, its data can be mounted on a neighboring node, and you can theoretically start working with it.
Why "semi-", and why "theoretically"?
The virtual machines live in the "STORAGE" storage, which resides in the "/storage" directory. The disk from the dead node will be mounted into the "/storage2" directory, where Proxmox will see the containers but will not be able to start them from there. To bring up the virtual machines located in this storage, you need to do three things:
- Tell the rescued containers that their new home is not the "/storage" directory but "/storage2". To do this, in every "*.conf" file in the "/etc/pve/nodes/DEAD_NODE_NAME/openvz" directory, change the VE_PRIVATE variable from "/storage/private/CTID" to "/storage2/private/CTID".
- Tell the cluster that the VMs of the dead node are now located on the live one. To do this, it is enough to move all the files from the "/etc/pve/nodes/DEAD_NODE_NAME/openvz" directory to "/etc/pve/nodes/LIVE_NODE_NAME/openvz". Perhaps there is some correct API call for this, but we did not bother with it :)
- Reset the quota for each rescued container (just in case):
vzquota drop CTID
That's it, the containers can now be started.
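The three steps can be sketched as a small script (node names are placeholders; run it on the surviving node after mounting the dead node's volume at /storage2):

```shell
#!/bin/bash
DEAD=pve02   # the dead node
LIVE=pve01   # the node taking over

# 1. Point the rescued containers at their new home:
sed -i 's|/storage/private|/storage2/private|' /etc/pve/nodes/${DEAD}/openvz/*.conf

# 2. Hand the containers over to the live node:
mv /etc/pve/nodes/${DEAD}/openvz/*.conf /etc/pve/nodes/${LIVE}/openvz/

# 3. Reset the quota for each rescued container, just in case:
for conf in /etc/pve/nodes/${LIVE}/openvz/*.conf; do
    vzquota drop "$(basename "${conf}" .conf)"
done
```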
If the containers from the dead node do not take up much space, or we have incredibly fast disks, or we can afford to wait, we can skip the first and third steps by simply moving the containers we need from "/storage2/private" to "/storage/private".
If the cluster has collapsed
A cluster is a capricious creature, and there are cases when it simply refuses to cooperate: for example, after a massive network problem, or due to a massive power failure. It looks like this: any access to the cluster storage blocks the current session, polling the fence domain status shows alarming messages such as "wait state messages", and connection errors keep appearing in dmesg.
If no attempts to revive the cluster succeed, the simplest way out is to disable automatic joining of the fence domain on all nodes of the cluster (the "/etc/default/redhat-cluster-pve" file) and then restart all the nodes. Be prepared for the nodes to be unable to reboot on their own. Once all the nodes have rebooted, we manually join the fence domain, start CLVM, and so on; how to do this was described in the previous articles.
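After the reboot, the manual recovery on each node might look like this (as far as we know, the FENCE_JOIN variable in "/etc/default/redhat-cluster-pve" is what controls the automatic join):

```shell
# Join the fence domain manually:
fence_tool join

# Start clustered LVM:
service clvm start

# Check that the node is back in the fence domain:
fence_tool ls
```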
That is probably all.
In the next part I will talk about how we automate work with the cluster.
Thanks for your attention!