How to achieve replication with zero RPO over long distances

What is SLD and why is it needed?

One of the most important tasks of an enterprise IT department is protecting data from external events such as fire, earthquake, flood, and other disasters. Data replication technologies are the traditional answer. However, replication usually synchronizes the same data set (with some RPO value) between only two data centers. For many customers this is quite enough, but not for all. A customer that requires zero RPO must use synchronous replication, and synchronous replication limits the distance between data centers to about 100 km. In a serious disaster, two data centers that close together can both be hit at the same time, and the data will be lost.

Some enterprises, however, need an extremely high level of data protection, namely:

replication of the same data set between three data centers,
zero RPO if any one of the three data centers fails,
continued operation if any two data centers fail one after another.

For such demanding customers, we offer a special solution: HPE 3PAR Synchronous Long Distance (SLD).

SLD is long-distance replication without data loss. How it works is explained below.

What types of replication are supported

First, a reminder of which replication modes and topologies the HPE 3PAR StoreServ family of arrays supports.

3PAR StoreServ arrays support 3 modes of replication (Remote Copy):

Synchronous mode (RPO = 0);
Asynchronous Periodic mode (min RPO = 5 min);
Asynchronous Streaming mode (RPO about 10 seconds).

Synchronous mode needs no explanation, so here is a brief description of how the asynchronous modes work:

asynchronous periodic mode: volumes are synchronized using snapshots taken at a specified interval; only the blocks that are new in the latest snapshot compared to the previous one are replicated to the remote array;
asynchronous streaming mode: new blocks are accumulated in the local array for a short period of time (measured in fractions of a second) and then replicated to the remote array.
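
The asynchronous periodic mode described above can be illustrated with a minimal sketch. This is a toy model, not the array's actual implementation: a volume is represented as a Python dictionary of block address to contents, and the helper names `snapshot_delta` and `apply_delta` are hypothetical. Deleted blocks are deliberately ignored to keep the example short.

```python
def snapshot_delta(previous, current):
    """Return the blocks that are new or changed in `current` relative to `previous`."""
    return {addr: data for addr, data in current.items()
            if previous.get(addr) != data}


def apply_delta(remote, delta):
    """Apply a set of replicated blocks to the remote copy of the volume."""
    remote.update(delta)


# Toy volume: block address -> block contents (an assumption for illustration).
volume = {0: "a", 1: "b"}
remote = {}

# First replication cycle: everything in the first snapshot is new.
snap1 = dict(volume)
apply_delta(remote, snapshot_delta({}, snap1))

# The host keeps writing between snapshots.
volume[1] = "B"   # overwrite an existing block
volume[2] = "c"   # write a new block

# Second cycle: only the difference between the two snapshots is sent.
snap2 = dict(volume)
delta2 = snapshot_delta(snap1, snap2)
apply_delta(remote, delta2)
```

In this sketch only blocks 1 and 2 travel over the link in the second cycle, which is why the achievable RPO in periodic mode is tied to the snapshot interval.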

In all three modes, data consistency is maintained during replication.

As a transport layer for replication, you can use the following 3 options:

Remote Copy over Fibre Channel (RCFC): the FC ports of the array are used for replication, and the FC network carries the data;
Remote Copy over Internet Protocol (RCIP): the built-in IP ports of the array (1GbE or 10GbE, depending on the array model) are used for replication, and the IP network carries the data;
Remote Copy over FCIP (Fibre Channel over IP): the FC ports of the array are used for replication, and the IP network carries the data. FCIP requires additional FC-to-IP protocol converters (gateways).

And finally, supported replication topologies / configurations:

One-to-one: replication is performed only between two arrays;
Many-to-many: N arrays can replicate data to M other arrays. The current maximum for both N and M is 4. An example of such a configuration is shown in the figure below:

Fig.1. Many-to-many replication configuration. Each array replicates data to 4 other arrays. All replication directions may be bidirectional. Here we are talking, of course, about replicating different data sets (volumes) on different arrays.

Synchronous Long Distance (SLD) is a special replication mode that allows you to simultaneously replicate the same set of data (volumes) from one array to two other arrays. This is the replication mode we consider in detail below.

How SLD works

So, SLD is:

Simultaneous replication of a volume group from one array (A) to two other arrays (B and C). Replication to array B runs in synchronous mode, and replication to array C runs in asynchronous periodic mode (see Fig.2 below). Arrays A and B must therefore be located relatively close to each other: the maximum distance is determined by the maximum round-trip time (RTT) allowed for synchronous replication between two arrays, 10 ms. Array C, by contrast, can be located a considerable distance from arrays A and B: the maximum distance is determined by the maximum RTT allowed for asynchronous periodic replication between two arrays, 120 ms.
RPO = 0 on the remote array C. Because array C is located far away, synchronous replication to it is impossible, so SLD is the only way to switch to remote array C without data loss, whether after a failure of the main array A or during a planned switchover.
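
To get a feel for what the RTT limits above mean in terms of distance, here is a rough back-of-the-envelope sketch. It assumes that light in optical fiber covers about 200 km per millisecond (roughly two-thirds of its speed in vacuum) and counts propagation delay only. The constant and the function names are assumptions made for this illustration; real links add switch, gateway, and protocol latency and rarely follow a straight line, so practical distances are considerably shorter than these upper bounds.

```python
# Assumption: signal speed in optical fiber is roughly 200,000 km/s,
# i.e. about 200 km per millisecond.
FIBER_KM_PER_MS = 200.0


def propagation_rtt_ms(distance_km):
    """Round-trip propagation delay over a fiber of the given length."""
    return 2 * distance_km / FIBER_KM_PER_MS


def max_distance_km(rtt_budget_ms):
    """Upper bound on link length if the entire RTT budget were propagation delay."""
    return rtt_budget_ms * FIBER_KM_PER_MS / 2


rtt_100km = propagation_rtt_ms(100)   # typical synchronous distance
sync_bound = max_distance_km(10)      # synchronous limit, RTT = 10 ms
periodic_bound = max_distance_km(120) # asynchronous periodic limit, RTT = 120 ms

print(f"100 km adds about {rtt_100km} ms of round-trip propagation delay")
print(f"RTT 10 ms bounds the link at {sync_bound} km")
print(f"RTT 120 ms bounds the link at {periodic_bound} km")
```

Every synchronous write waits for at least one round trip, which is why the synchronous leg is kept short even though the RTT limit alone would permit more.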

Fig.2. SLD scheme.

SLD works as follows. In normal mode, data is replicated from array A to arrays B and C. Asynchronous periodic replication is also configured between arrays B and C, but it is normally in a passive state (see Figure 2). If the main array A fails, replication from array B to array C is activated automatically, and the data that was written to array B but had not yet reached array C is copied to array C. As a result, after a failure of array A, arrays B and C are automatically synchronized up to the last block written to array A before the failure.

Once arrays B and C are synchronized, data processing can continue, with either array B or array C selected as the main array. No data written to array A is lost (RPO = 0), and replication now runs between arrays B and C, so the data remains protected after the failure of one of the three arrays.

After restoring array A, new data that was written to arrays B and C will be copied to array A, after which it will be possible to return to the normal operation mode using array A as the main array.
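
The failover sequence described above can be sketched as a small toy model. It is only an illustration of the idea, not HPE's implementation: each array is reduced to an ordered log of writes, and the class and method names (`Array`, `SldGroup`, `periodic_sync`, `fail_a`) are hypothetical.

```python
class Array:
    """A storage array reduced to an ordered log of the writes it holds."""

    def __init__(self, name):
        self.name = name
        self.log = []


class SldGroup:
    """Toy model of an SLD volume group: A is primary, B is sync, C is periodic."""

    def __init__(self):
        self.a, self.b, self.c = Array("A"), Array("B"), Array("C")

    def write(self, block):
        # Synchronous leg: the write completes only once A and B both hold it.
        self.a.log.append(block)
        self.b.log.append(block)

    def periodic_sync(self):
        # Asynchronous periodic leg A -> C: ship whatever C does not have yet.
        self.c.log.extend(self.a.log[len(self.c.log):])

    def fail_a(self):
        # A is lost. The standby B -> C link is activated and ships the
        # writes that reached B but had not yet been sent to C.
        missing = self.b.log[len(self.c.log):]
        self.c.log.extend(missing)
        return missing


group = SldGroup()
for block in ["w1", "w2", "w3"]:
    group.write(block)
group.periodic_sync()          # C now holds w1..w3

group.write("w4")              # written after the last periodic cycle
group.write("w5")

missing = group.fail_a()       # A fails before the next cycle
print("copied from B to C:", missing)
print("C after failover:", group.c.log)
```

Because array B received every write synchronously, it always holds the tail that array C is missing, and this is what makes RPO = 0 possible on the distant array.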

In conclusion, I want to note two more important points:

Arrays A and B, which replicate to each other synchronously, can be used symmetrically, that is, both can be main arrays at the same time. In this case, one set of volumes is replicated from array A to arrays B and C, and another set of volumes is replicated from array B to arrays A and C.
SLD can be used together with the 3PAR Peer Persistence technology, which makes switching between arrays A and B fully automatic. Peer Persistence also lets hosts switch transparently between the two arrays (A and B in this case) and lets virtual machines be moved online between the two arrays (that is, between the two sites).

Vladimir Korobeynikov, @Vladkor

Source: https://habr.com/ru/post/320366/
