📜 ⬆️ ⬇️

How to be ready or DR on Nutanix: asynchronous replication



The topic of building fault-tolerant data centers using hyper-convergent systems Nutanix is ​​quite extensive, so that, on the one hand, “not to ride over the tops,” and on the other hand, not to go deep into tediousness, we divide it into two parts. In the first, we will look at the theory and “classic” replication methods, in the second, our new chip, the so-called Metro Availability, the ability to build a synchronous-replicable metro cluster (metropolitan-area, large city scale and even more) at a distance of 400 centimeters.

Today, it is very rarely necessary to “start from the very beginning” and explain why a backup data center is needed at all. Today, when, on the one hand, the volumes and value of information stored in data centers have grown, on the other hand, many solutions for organizing distributed data storage and processing have become widespread, there is usually no need to explain why the company needs a backup data center. But you still have to look at “how exactly” and how it is needed.

')
Nutanix, as a young company and emerging “on the cutting edge of needs,” immediately focused on the need for a modern data center to have both a backup data center and correct, multifunctional replication. That is why, even in the youngest license that comes with every Nutanix system, and is included in the price (we call it Starter), we already have full-featured replication tools. This distinguishes Nutanix from many traditional systems in which replication often comes as an option and costs additional, often very substantial, money.
Not so with us. Any Nutanix already knows how to bidirectional asynchronous replication. Bidirectionality, by the way, which allows you to use both data centers in the “active-active” mode (and not the “active-reser”, as usual), is also not found among all vendors.

In the world of replication methods, it is customary to distinguish between two basic principles of their operation, the so-called "synchronous" and "asynchronous" replication.
Synchronous replication is easy. Each disk write operation is not considered complete until the same write operation is completed and completed on the disks of the remote system. The data block came, the block was recorded on the disks of the local storage device, but the application has not received the answer “ready, recorded!” Meanwhile, the data block was transferred to the remote system, and the process of recording this block began on it. And only after the block is recorded on the remote system, it reports success, and then the system reports the completion of the recording to the application, the block of which was recorded, the local, first system.



The advantages of synchronous replication are obvious. Data is recorded on the local and remote systems as securely as possible, the copy in the backup data center is always completely identical to the copy in your local data center. However, there is a minus, although it is one, it is huge.
If you have two systems connected by synchronous replication, then the speed of the active, local for your application, system will be equal to the speed of the data transmission channel between the local and remote systems. Because until the remote system receives the data block for writing and writes it, your local system will stand and wait. And if you have 16G FC from an application to the local system's disks, and from a local system to a remote one, 1GB Ethernet via an ISP channel through several routers, then recording to local disks will not exceed the speed of this Gigabit Ethernet channel here.

If this option does not suit you, then you should look at asynchronous replication.
Asynchronous replication, on the contrary, has many drawbacks, but one plus. But big. :)
With asynchronous replication, recording occurs on the local device's disks, regardless of the state of the remote system, and then, at a fixed frequency, they are transferred to this remote system, while copying not the operations, but simply the state of the disks of the local data source system. This is advantageous in terms of performance and channel utilization. Often, for example, in the case when an application constantly reads and changes the active data region, it is much easier to transmit the result once than hundreds or thousands of sequential operations. Alas, this is a big plus accompanied by the fact that an asynchronous copy, strictly speaking, never fully corresponds to the data of an active system, always lagging behind it for those few minutes that takes a replication cycle.

However, in practice, asynchronous replication is widely used. What to do with the "inaccuracy copy"? Well, first of all, you might think, and estimate that system recording performance and low latency values ​​are more important for you than possible loss of 15 minutes of changes in data (and often it happens). Or, in addition to general asynchronous replication for all applications, use some data protection tools at the application level, and not data storage in general (for example, various Microsoft products, Availability Groups in MS Exchange, or similar tools). in MS SQL Server).

In principle, here Nutanix has not invented anything new. On Nutanix systems, there is both synchronous, through which the so-called Metro Availability is implemented, and asynchronous replication from one Nutanix cluster, in one DC, to another, usually in another DC. Asynchronous replication is accomplished using the transfer of snapshots of disk storage, and synchronous replication is a bit more complicated and curious.

A few words about some terminological hitch.
The word “cluster” used today by everyone can be a little embarrassing, if not misleading. The fact is that in the IT industry today, the word “cluster” is used to designate not only things that are very different in nature, but also mainly applies to the structure that provides high availability. And so often the word “cluster” is used to designate just and only “HA-cluster”, that many have the impression that “cluster” = “fault-tolerant cluster”, and nothing more.
Not always the case.

First, what is a “cluster”?
A cluster is an association of some elements into a certain general essence of the higher order, existing and managed as an independent unit. As you can see, in this definition there is nothing about fault tolerance. It may be, it may not be, and the cluster is not always “HA-cluster”.

I often have to answer the question: "But can we spread the nodes of the Nutanix cluster across different sites, will we have reliability?"
The answer is this: in theory, yes, you can (of course, you shouldn’t forget that you need to stretch the 10G channels for intracluster communication through the sites, too, but even this can be done today), but in practice it is not recommended, and in general "You yourself do not want to." Why?

Let's imagine, for the sake of analogy, that a Nutanix cluster is a disk group in a disk array connected via FC. Here we have site A, and there is a JBOD shelf and a dozen disks on it in FC, and site B, with the same shelf and a dozen disks, all included in the overall FC network. And we decide to make a RAID group of RAID-5 or 6 out of all these twenty disks. In principle, in theory, we can, why not. We see each JBOD disk as a separate FC device, merge. But then the question is, what will happen to the RAID if we have an excavator driving, will something happen to the network between the sites?
Nothing good. Losses of half, or even at least a third of the participants at once, most likely will not sustain either RAID or, in a similar situation, the Nutanix cluster.

That is why, when speaking with users about clusters and Nutanix, we have to minimize confusion, specifying that we are saying: “Nutanix cluster”, and here “cluster” means “VMware cluster”, and here - Real Application Cluster (RAC ) Oracle database. And these will be all different clusters, often located hierarchically above each other, and this is completely normal.

Thus, we cannot scatter nodes of a single Nutanix cluster across sites for fault tolerance. But we can locate the nodes of one “Nutanix cluster” on one site, another site (Nutanix cluster) on another site, and then merge them into replicated relationships in the right direction, for example, from the main DC to the backup, from the head office to the branches (and vice versa), or even in both directions.

Moreover, if this is a “synchronous” metrocluster, then on top of such a synchronously replicable pair of clusters, which, I remind, can be located at a distance, in the ideal case, up to 400 kilometers (or, technically speaking, we need to between the package did not exceed 5ms), and then on top of this metro pair we can create one VMware cluster, within which virtual machines will be restarted and migrated as if on a common, unified disk storage, according to the DRS policies we write. But I will talk about this in more detail in the next series.

For asynchronous data replication from one site to another, Nutanix uses an internal Vmcaliber snapshot mechanism. When they are used, only the differential part, a kind of diff of the contents of the disk-container volume, is transmitted over the data transfer channel between the two sites. The diff from the previous transmission is taken from the source system, then sent to the remote replication recipient site, and there it rolls over. At present, Nutanix asynchronous replication transmission interval is 15 minutes. That is, in the worst case, you can lose only 15 minutes of data, while, unlike synchronous replication, the process does not reduce performance and does not increase latency when working with data.



Thus, every 15 minutes, diff disks of the main system leave for the DR site, and there it automatically turns around and rolls, keeping it up to date. If you lose the system in the "main" DC, in the worst case, you will lose data changes in 15 minutes.
But what to do if a fifteen-minute lag is too much, and you need synchronous replication?
To do this, you can use our technology Metro Availability, about which the next post.

Source: https://habr.com/ru/post/250197/


All Articles