
How I implemented virtual machine mirroring for free ESXi

In my home lab, I use VMware's free virtualization: it's cheap and reliable. First there was one server, then I started adding local datastores to it, then I assembled a second server... A recurring chore was relocating virtual machines. While doing these operations by hand, I stumbled on a method that lets you switch a running virtual machine over to copies of its flat extents in a completely different location. The method is extremely simple: take a snapshot of the virtual machine, clone the flat extent to the new location, and then rewrite the parent-disk link in the delta so it points at the clone. The hypervisor does not keep the disk's descriptor files open, so when you remove the snapshot, the delta merges into the new disk, and the old one can be safely deleted. This approach works fine without VDDK, which is not available on free hypervisors, and is used, for example, by Veeam in a similar situation.
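The descriptor surgery on the delta can be sketched in Python. The paths and the single-pattern rewrite here are my illustrative assumptions: a real delta descriptor carries more fields (CID, parentCID, extent lines) that must stay consistent, but the core of the trick is just this one-line edit.

```python
import re

def repoint_parent(descriptor: str, new_parent: str) -> str:
    """Rewrite the parentFileNameHint line of a vmfsSparse delta
    descriptor so the delta chains to the cloned base disk."""
    return re.sub(
        r'parentFileNameHint="[^"]*"',
        'parentFileNameHint="%s"' % new_parent,
        descriptor,
    )

# Hypothetical delta descriptor for a snapshotted disk:
delta = (
    '# Disk DescriptorFile\n'
    'createType="vmfsSparse"\n'
    'parentFileNameHint="/vmfs/volumes/old-ds/vm/vm.vmdk"\n'
)
patched = repoint_parent(delta, "/vmfs/volumes/new-ds/vm/vm.vmdk")
```

In practice you would also check that the delta's parentCID matches the CID of the clone's descriptor before swapping the file in, otherwise the hypervisor will refuse the chain.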


I easily automated this procedure in Python, applying a few more tricks that I can reveal in future articles if there is interest. A little later, a good person from among my former colleagues agreed to write a GUI; it ended up being implemented in Unity, but for the resulting free solution, which we called Extrasphere, that was not bad at all. Isn't that a nice toy for an admin?


Having built virtual machine migration for my home lab, I started thinking about crash protection. The minimum requirement was to back up a running virtual machine; the unattainable maximum was zero lag between the backup and the original. Not that I had data where losing 15 seconds would be critical (to tell the truth, losing even a couple of days would not be critical for me), but I wanted to work toward that ideal.
I will not give an analysis and comparison of the solutions available at the time: it was long ago, no notes have survived, and all I remember is an unthinkable amount of wheel reinvention.


On a free hypervisor, you can build the simplest backup agent out of a Linux virtual machine to which you attach the flat extents of the machine being backed up. This works well for creating full backups, but it is completely unsuitable for incremental backup, since native CBT is not available on free hypervisors.


I thought it would be nice to implement CBT myself, but how? I had heard about Zerto and SRM with their vSCSI filter, but after downloading the open-source package for ESXi I found nothing similar there, except that you can write a character device. I decided to take a look at the hbr_filter device, and to my surprise everything turned out to be not too difficult. Three weeks of experiments, and I could already attach my filter driver to a virtual disk and track changes.
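At its core, what such change tracking boils down to can be sketched as a block-level dirty bitmap. This is a minimal Python model of the idea, not the actual driver; the 64 KiB block size and the interface are my assumptions.

```python
class ChangeTracker:
    """Minimal CBT sketch: record which fixed-size blocks of a virtual
    disk have been touched by guest writes since the last backup."""

    def __init__(self, block_size=64 * 1024):
        self.block_size = block_size
        self.dirty = set()

    def on_write(self, offset, length):
        # Mark every block the write overlaps, including partial ones.
        first = offset // self.block_size
        last = (offset + length - 1) // self.block_size
        self.dirty.update(range(first, last + 1))

    def changed_blocks(self):
        """Block indices an incremental backup needs to read."""
        return sorted(self.dirty)

    def reset(self):
        """Called once a backup has consumed the bitmap."""
        self.dirty.clear()
```

An incremental backup then only reads `changed_blocks()` from the flat extent instead of the whole disk, which is exactly what native CBT provides on paid editions.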


And what if we not just track the changes but replicate them? The biggest danger here is ending up writing a ton of code for the transmission channel: pull out the changes, pack them and send them over the network, receive, unpack and write, with error handling at every step. It seems you can't get away with fewer than a couple of agents. One look at Zerto's architecture is enough to understand that writing and stabilizing such a solution alone is unrealistic:



Fig. 1. Zerto Virtual Replication architecture, from their marketing materials.


Then I remembered that ESXi itself can write over the network, for example via iSCSI or NFS. All that is needed is to mount the target datastore locally. And if the replica is also powered on, you can write to it directly from the filter driver! I started experimenting: at first I did not know what to do with the powered-on replica and simply booted it from an Ubuntu Live CD; after a couple of weeks I started getting a workable copy, and then I learned how to transfer changes on the fly. Moreover, the source machine does not receive acknowledgement of a write until it has been committed to both recipients. So I got replication with zero lag.
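The write path can be sketched as a synchronous fan-out: the guest's write returns only after every target has it, which is exactly why the replica cannot lag. In this sketch, in-memory file-like objects stand in for the real local disk and the NFS-mounted replica; the class and names are illustrative, not the actual driver code.

```python
import io

class MirroredDisk:
    """Synchronous mirroring sketch: a guest write is acknowledged
    only after it has landed on both the local disk and the replica."""

    def __init__(self, local, replica):
        self.targets = (local, replica)

    def write(self, offset, data):
        for target in self.targets:
            target.seek(offset)
            target.write(data)
        # Only now is the write acknowledged to the guest,
        # which is why the mirror has zero lag.
        return len(data)

# Stand-ins for the local flat extent and the remote replica:
local = io.BytesIO(b"\0" * 16)
replica = io.BytesIO(b"\0" * 16)
disk = MirroredDisk(local, replica)
disk.write(4, b"DATA")
```

The price of this zero lag is that every guest write waits on the slower of the two targets, which is what the bounded-lag variant discussed later relaxes.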


The technology turned out to be agentless, with minimal code, and replica creation was implemented in Python right away. For this dissimilarity to classic replication, and for its simplicity, I decided to call it mirroring.


The problem of the powered-on mirror I solved by writing a simple bootloader, and to get at least some benefit out of it, it shows the mirror's latest status at boot and then halts. As a result, actual memory consumption tends to zero, and a little CPU is spent while transferring data, though an installed agent would eat no less. The write-activity graphs of the mirror and the source machine come out identical.



Fig. 2. CPU consumption on source machine with load.



Fig. 3. CPU consumption on the mirror in the same period.



Fig. 4. Memory consumption on the source machine with a load.



Fig. 5. Memory consumption on the mirror in the same period.



Fig. 6. Disk performance of the source machine under load.



Fig. 7. Disk performance of the mirror over the same period.


To check the state of the mirror, I made test machines that run as linked clones from a snapshot of the mirror. If desired, snapshots can be kept so the tests can be rerun later, and if you really like a test machine, you can turn it into a standalone virtual machine using the built-in migration described at the beginning of this story.


A local target is good, but what if you need to mirror to another office or city? Even if the link is wide enough, the round-trip time will drastically lower the performance of the source machine; remember that the source machine does not receive acknowledgement of a write until it completes on both recipients. The solution here is extremely simple: extend the write-acknowledgement lag from zero to some reasonable value. For example, a tolerable lag of 3-5 seconds will give both good data integrity and decent performance. I am currently working on this solution. The next steps are working without SSH and application-level consistency, neither of which will happen without tricks that I will gladly share.



Source: https://habr.com/ru/post/316056/

