Hello, Habr readers! The topic of this article is the disaster recovery features of AERODISK Engine storage systems. Initially we wanted to cover both features — replication and the metrocluster — in a single article, but unfortunately it grew too large, so we split it into two parts. We will go from simple to complex: in this article we will set up and test synchronous replication, take down one data center, and also break the communication channel between the data centers and see what happens.
Our customers often ask us questions about replication, so before we proceed to setting up and testing our implementation, we will say a little about replication in storage systems in general.
Storage replication is a continuous process of keeping data identical on several storage systems at once. Technically, replication is performed by one of two methods.
Synchronous replication is the copying of data from the main storage system to the backup one, with mandatory confirmation from both storage systems that the data has been written. Only after confirmation from both sides (from both storage systems) is the data considered written, and only then can it be worked with. This guarantees data identity across all storage systems participating in the replica.
Advantages of this method:
- Guaranteed data identity on both sides at all times (RPO = 0).

Minuses:
- High cost and strict requirements for the communication channel (distance and latency limits).
- Write latency on the main system depends on the channel.
Asynchronous replication is also copying of data from the main storage system to the backup one, but with a certain delay and without waiting for write confirmation from the other side. You can work with the data immediately after it is written to the main storage system, and on the backup storage it will appear after some time. Data identity, in this case, is naturally not guaranteed: the data on the backup storage is always a bit "in the past."
Advantages of asynchronous replication:
- Low requirements for the communication channel and distance.
- No impact on write latency of the main storage system.

Minuses:
- The data on the backup storage lags behind (RPO > 0), so the most recent data can be lost in a disaster.
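To make the difference concrete, here is a minimal sketch in Python of the two write paths. This is purely illustrative, with made-up names — not AERODISK's implementation:

```python
# Conceptual sketch of the two replication modes (hypothetical names;
# not the actual AERODISK implementation).

def sync_write(primary, secondary, block):
    """Synchronous: the host gets an ack only after BOTH sides hold the data."""
    primary.append(block)
    secondary.append(block)       # must complete before we acknowledge
    return "ack"                  # RPO = 0: both copies are identical now

def async_write(primary, backlog, block):
    """Asynchronous: ack immediately, ship to the backup in the background."""
    primary.append(block)
    backlog.append(block)         # drained to the backup "some time later"
    return "ack"                  # host does not wait for the remote side

primary, secondary, backlog = [], [], []
sync_write(primary, secondary, b"data1")
assert primary == secondary       # identical after a synchronous write

async_write(primary, backlog, b"data2")
# until the backlog is drained, the backup is "in the past": RPO > 0
assert b"data2" not in secondary
```

The price of the guarantee is visible in the sketch: the synchronous path cannot return until the remote write completes, which is exactly why channel latency translates directly into write latency.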
Thus, the choice of replication mode depends on business objectives. If it is critical that the backup data center hold exactly the same data as the main one (i.e., a business requirement of RPO = 0), you will have to pay up and accept the limitations of a synchronous replica. If some lag in the data is acceptable, or there is simply no budget, then the asynchronous method is clearly the way to go.
Separately, we single out a mode (more precisely, a topology) called the metrocluster. In metrocluster mode synchronous replication is used, but, unlike the regular replica, the metrocluster lets both storage systems work in active mode. That is, there is no division into active and backup data centers: applications work simultaneously with two storage systems physically located in different data centers. Downtime in case of accidents in this topology is very small (RTO is usually minutes). We will not cover our metrocluster implementation in this article, since it is a large topic in its own right; we will devote the next article to it.
Also, when we talk about storage-based replication, many people quite reasonably ask: "Many applications have their own replication tools — why use replication on the storage system? Is it better or worse?"
There is no single answer, so here are the arguments FOR and AGAINST:
Arguments for replication storage:
Arguments AGAINST storage replication:
* — a controversial thesis. For example, a well-known DBMS vendor long officially claimed that their DBMS can be properly replicated only with their own tools, and that all other replication (including storage-based) is "not real." But life has shown otherwise. Most likely (though this is not certain) it was simply a not entirely honest attempt to sell customers more licenses.
As a result, in most cases storage-based replication is better, because it is the simpler and cheaper option; but there are complex cases where specific application functionality is required, and application-level replication has to be used.
We will set up the replica in our lab. In the laboratory we emulated two data centers (in reality, two racks standing next to each other that pretend to be in different buildings). The stand consists of two Engine N2 storage systems interconnected by optical cables. A physical server running Windows Server 2016 is connected to both storage systems via 10Gb Ethernet. The stand is quite simple, but that does not change the essence.
Schematically, it looks like this:
Logically, replication is organized as follows:
Now let's look at the replication functionality that we have now.
Two modes are supported: asynchronous and synchronous. Logically enough, the synchronous mode is limited by distance and by the communication channel. In particular, synchronous mode requires fiber as the physical medium and 10 Gigabit Ethernet (or higher).
The supported distance for synchronous replication is 40 kilometers, with an optical channel latency between data centers of up to 2 milliseconds. Strictly speaking, it will work with longer delays too, but then writes will slow down noticeably (which is also logical), so if you plan synchronous replication between data centers, check the quality of the optics and the latency.
The requirements for asynchronous replication are not as serious. More precisely, there are none at all: any working Ethernet connection will do.
Currently, AERODISK ENGINE storage systems support replication for block devices (LUNs) over Ethernet (copper or optics). For projects that require replication through a SAN fabric over Fibre Channel, we are adding the corresponding solution, but for now it is not ready; so in our case it is Ethernet only.
Replication can work between any ENGINE-series storage systems (N1, N2, N4), from junior models to senior models and vice versa.
The functionality of the two replication modes is completely identical.
There are many more minor features, but there is little point in listing them all; we will mention them as we go through the setup.
The setup process is quite simple and consists of three stages.
An important point: the first two stages must be repeated on the remote storage system; the third stage is performed only on the main one.
The first step is to configure the network ports over which replication traffic will travel. To do this, the ports must be enabled and assigned IP addresses in the Front-end adapters section.
After that we need to create a pool (in our case an RDG) and a virtual IP for replication (VIP). A VIP is a floating IP address tied to the two "physical" addresses of the storage controllers (the ports we just configured). It will be the main replication interface. You can also operate with VLAN-type VIPs instead if you need to work with tagged traffic.
The process of creating a VIP for the replica is not much different from creating a VIP for I/O (NFS, SMB, iSCSI). In this case we create a regular VIP (without VLAN), but we must indicate that it is for replication — without this flag we will not be able to add the VIP to the replication rule at the next stage.
The VIP must be in the same subnet as the IP ports between which it "floats."
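This requirement is easy to check with Python's standard `ipaddress` module. The addresses below are examples for illustration, not taken from the stand's actual configuration:

```python
import ipaddress

def vip_matches_ports(vip: str, port_ips: list, prefix: int) -> bool:
    """Return True if the VIP lies in the same subnet as every controller port IP."""
    vip_net = ipaddress.ip_network(f"{vip}/{prefix}", strict=False)
    return all(ipaddress.ip_address(ip) in vip_net for ip in port_ips)

# Example: a VIP floating between two controller ports in 192.168.3.0/24
print(vip_matches_ports("192.168.3.10", ["192.168.3.11", "192.168.3.12"], 24))  # True
# A VIP from another subnet would violate the rule:
print(vip_matches_ports("192.168.2.10", ["192.168.3.11", "192.168.3.12"], 24))  # False
```
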
We repeat these settings on the remote storage system — with different IP addresses, naturally.
The VIPs of different storage systems may be in different subnets, as long as there is routing between them. Our setup is exactly such an example (192.168.3.XX and 192.168.2.XX).
This completes the preparation of the network part.
Configuring the storage for a replica differs from the usual setup only in that the mapping is done through a special menu, "Mapping Replication." Otherwise everything is the same as in a regular setup. Now, step by step.
In the previously created pool R02, we need to create a LUN. We create it and name it LUN1.
We also need to create a LUN of the same size on the remote storage system. We create it; to avoid confusion, the remote LUN is named LUN1R.
If we wanted to use an already existing LUN, then while setting up the replica that production LUN would have to be unmounted from the host, and on the remote storage system we would simply create an empty LUN of identical size.
The storage configuration is complete; we proceed to creating the replication rule.
After the LUNs have been created on the storage system that is currently Primary, we configure the replication rule from LUN1 on SHD1 to LUN1R on SHD2.
The setup is done in the "Remote Replication" menu.
We create a rule. To do this, we specify the replica recipient, and in the same dialog set the connection name and the replication type (synchronous or asynchronous).
In the "remote systems" field we add our SHD2. To add it, you must use the storage system's management IP (MGR) and the name of the remote LUN to which we will replicate (in our case, LUN1R). The management IPs are needed only at the stage of adding the connection; replication traffic will not go through them — the previously configured VIPs will be used for that.
Already at this stage we can add more than one remote system for a "one to many" topology: just click the "add node" button, as in the figure below.
In our case there is only one remote system, so we limit ourselves to that.
The rule is ready. Note that it is added automatically on all replication participants (in our case there are two). You can create as many rules as you like, for any number of LUNs and in any direction. For example, for load balancing you can replicate some LUNs from SHD1 to SHD2, and others, conversely, from SHD2 to SHD1.
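Such a rule set can be pictured as a simple table. The sketch below is our own hypothetical model (the structure, the rule REPL2, and the LUN2/LUN2R names are invented for illustration; only REPL1 exists on the stand) of cross-replication for load balancing:

```python
# Hypothetical model of a replication rule set (illustration only;
# REPL2, LUN2 and LUN2R are invented names, not from the actual stand).
rules = [
    {"name": "REPL1", "source": ("SHD1", "LUN1"), "target": ("SHD2", "LUN1R"), "mode": "sync"},
    {"name": "REPL2", "source": ("SHD2", "LUN2"), "target": ("SHD1", "LUN2R"), "mode": "sync"},
]

def primaries(rules, system):
    """LUNs for which `system` currently holds the Primary role."""
    return [r["source"][1] for r in rules if r["source"][0] == system]

print(primaries(rules, "SHD1"))  # ['LUN1'] -- SHD1 is Primary for LUN1
print(primaries(rules, "SHD2"))  # ['LUN2'] -- SHD2 for LUN2: load is balanced
```
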
SHD1: synchronization began immediately after the rule was created.
SHD2: we see the same rule, but synchronization has already finished.
LUN1 on SHD1 is in the Primary role, i.e. it is active. LUN1R on SHD2 is in the Secondary role, i.e. it is standing by in case SHD1 fails.
Now we can connect our LUN to the host.
We will use an iSCSI connection, although FC is also possible. Setting up iSCSI LUN mapping for a replica is almost the same as the regular scenario, so we will not cover it in detail here; this process is described in the "Quick Setup" article.
The only difference is that we create the mapping in the "Mapping Replication" menu.
We configure the mapping and present the LUN to the host. The host sees the LUN.
We format it with a local file system.
That's it — the setup is complete. Next come the tests.
We will test three main scenarios.
To begin, we start writing data to our LUN (files with random data). We immediately check that the communication channel between the storage systems is being utilized; this is easy to see by opening the load monitoring of the ports responsible for replication.
Both storage systems now have “useful” data; we can begin the test.
Just in case, we take the hash sums of one of the files and write them down.
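Any checksum tool will do for this. As an example, the check can be scripted in Python (a generic sketch, not the specific tool used on the stand; the file path is hypothetical):

```python
import hashlib

def file_sha256(path: str) -> str:
    """Compute the SHA-256 digest of a file in chunks (works for large files)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# Record the digest before the failover, recompute after, and compare:
# before = file_sha256("E:/testdata.bin")   # hypothetical path
# ... switch roles ...
# assert file_sha256("E:/testdata.bin") == before
```
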
The role-switch operation (changing the replication direction) can be performed from either storage system, but you still have to visit both, since mapping must be disabled on the Primary and enabled on the Secondary (which becomes the new Primary).
A reasonable question may arise here: why not automate this? The answer is simple: replication is a straightforward disaster recovery tool based entirely on manual operations. To automate these operations there is the metrocluster mode; it is fully automated, but its configuration is considerably more complicated. We will write about metrocluster setup in the next article.
On the main storage system we disable the mapping to make sure writes have stopped.
Then on either of the storage systems (it does not matter whether the main or the backup), in the "Remote Replication" menu we select our REPL1 connection and click "Change role."
After a few seconds, LUN1R (backup storage) becomes Primary.
We map LUN1R from SHD2 to the host.
After that the E: drive automatically reappears on the host, only this time it is served from LUN1R.
Just in case, we compare the hash sums.
They are identical. Test passed.
At the moment, after the regular switchover, the main storage system is SHD2 with LUN1R. To emulate an accident, we cut power to both controllers of storage system 2.
There is no more access to it.
Let's see what happens on SHD1 (currently the backup).
We see that the Primary LUN (LUN1R) is unavailable. Error messages appeared in the logs, in the information panel, and in the replication rule itself. Accordingly, the data is currently unavailable to the host.
We change the role of LUN1 to Primary.
We enable the mapping to the host.
We make sure that drive E: appears on the host.
We check the hash.
All good. The failure of the active data center was handled successfully. The approximate time we spent on "reversing" the replication and connecting the LUN from the backup data center was about 3 minutes. Clearly, in real production everything is much more complicated: besides the storage operations, many more steps are needed on the network, on the hosts, and in the applications, so in real life this window will be considerably longer.
Here we would like to write that the test has been completed successfully — but let's not rush. The main storage system is down, and we know that when it "fell" it was in the Primary role. What happens if it suddenly comes back up? Will there be two Primary roles, which means data corruption? Let's check.
We "suddenly" power the downed storage system back on.
It boots for several minutes and then, after a short synchronization, returns to service — but already in the Secondary role.
All is well: no split-brain occurred. We anticipated this situation — after a failure, the storage system always comes back up in the Secondary role, regardless of the role it held "in its previous life." Now we can say for certain that the data center failure test was successful.
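The decision rule can be sketched in a few lines (our own illustration of the logic, not the actual firmware code): a node recovering from a crash never assumes the Primary role on its own.

```python
def role_after_recovery(role_before_crash: str) -> str:
    """A recovered node always comes back as Secondary, whatever it was before.

    This avoids split-brain: two Primaries can never coexist, because the
    surviving node was promoted manually while the failed one was down.
    """
    return "Secondary"

# Whatever the role was "in a previous life":
print(role_after_recovery("Primary"))    # Secondary
print(role_after_recovery("Secondary"))  # Secondary
```
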
The main task of this test is to make sure the storage systems do not start misbehaving if the communication channels between them temporarily disappear and then come back.
So: we disconnect the cables between the storage systems (imagine an excavator dug through them).
On the Primary we see that there is no connection with the Secondary.
On the Secondary we see that there is no connection with the Primary.
Everything keeps working, and we continue writing data to the main storage system, which means its data is now guaranteed to differ from the backup's — the two sides have "diverged."
A few minutes later we "repair" the communication channel. As soon as the storage systems see each other, data synchronization starts automatically. Nothing is required from the administrator here.
After a while, the synchronization ends.
The connection is restored; the break in the communication channels caused no abnormal situations, and after the link came back, synchronization ran automatically.
We went through the theory: what is needed and why, and what the pluses and minuses are. Then we configured synchronous replication between two storage systems.
Next we ran the main tests: regular switchover, data center failure, and a break in the communication channels. In all cases the storage systems behaved well: there was no data loss, and administrative operations are kept to a minimum for a manual scenario.
Next time we will complicate the situation and show how all this logic works in an automated metrocluster in active-active mode — that is, when both storage systems are primary and the behavior in case of a storage failure is fully automated.
Please leave comments; we will be glad to receive sensible criticism and practical advice.
Until next time.
Source: https://habr.com/ru/post/456348/