Data storage systems: how they are slowly but surely being decoupled from hardware

An accident in the first data center and automatic restart of services in the other

Virtualization is one of my favorite topics. The point is that you can now almost completely forget about the underlying hardware and organize, for example, a data storage system as a “logical” unit that works with information according to simple rules. Everything that happens between the virtual unit and the real hardware in different data centers is handled by the virtualization layer and is invisible to applications.

This brings many advantages but also raises new problems. For example, data consistency has to be guaranteed under synchronous replication, which limits the distance between nodes.

Here the speed of light becomes a real physical barrier: it does not let the customer place the second data center more than 40-50 kilometers, or even less, from the first.

But let's start from the very beginning: how storage virtualization works, why it is needed, and what problems it solves. And most importantly, where exactly you gain and how.

A bit of history

At first there were servers with internal disks on which all information was stored. This simple and logical solution quickly stopped being optimal, and over time we began to use external storage. At first it was simple, but gradually specialized systems were needed that could store very large amounts of data and provide very fast access to it. Storage systems differed from each other in capacity, reliability and speed. Depending on the technology (for example, magnetic tape, hard drives, or even SSDs, which no longer surprise anyone), you could vary these parameters over a fairly wide range, as long as you had the money.

For IT directors, reliability is now one of the most important criteria for storage. For example, an hour of downtime for a bank can easily cost as much as 10–20 such racks of hardware, not to mention reputational losses. That is why geographically distributed fault-tolerant solutions have become the main paradigm. Simply put, these are hardware units that duplicate each other and sit in two different data centers.

Evolution:

Stage I: Two servers with built-in drives

Stage II: Storage and two machines in one data center

Stage III: Different data centers and replication between them (the most common option)

Stage IV: Virtualization of this whole setup. By the way, the link in the middle can also take, for example, an additional backup.

Stage V: Storage virtualization (EMC VPLEX)

One of the tools for solving this problem is EMC VPLEX, which makes a good example for showing the advantages of this kind of virtualization.

System comparison

So what is the point?

Before VPLEX: there are two servers, each sees its own volume, and the volumes are replicated. One server always works while the other always stands by. The backup server has no write access until the primary data center fails.
After deploying VPLEX: replication is not needed. Both servers see a single virtual volume that appears directly attached to each of them. Each server treats it as its own local volume, while in reality each one works with its own storage system.

Before: to move data to another physical storage system, you have to reconfigure servers, clusters, and so on. This reduces fault tolerance and can cause errors: for example, the cluster can fall apart during reconfiguration.
After: data can be moved without reducing fault tolerance and transparently for the servers (all storage systems are hidden behind VPLEX, and the servers do not even know about them). The mechanics are as follows: we add a new storage system, connect it under VPLEX, mirror the data onto it without removing the old one, and then switch over. The server does not even notice.

Before: different vendors do not mix. For example, you cannot set up replication between an HP array and an EMC array.
After: you can connect HP and EMC arrays (or those of other manufacturers) and simply assemble a volume from the two storage systems. This is especially convenient because large customers often have a heterogeneous “zoo”, which can now be tightly integrated and easily upgraded. It means that any critical system can be moved to new hardware without the accompanying headache.

Before: switching replication and the cluster takes time.
After: only the application in the cluster is restarted. It always runs on either one node or the other, but it is moved transparently and quickly.

Before: the architecture is a geocluster with all its limitations.
After: the architecture is a local cluster. More precisely, that is what the server thinks, so working with it poses no difficulties.

Before: you need replication management software.
After: VPLEX monitors replication at the system level. Strictly speaking, there is no replication at all, only a “mirror”.

Before: SRM imposes restrictions on restarting a VM in the backup data center.
After: standard vMotion works when moving a VM to the backup site (anticipating the question about the link: yes, we have a high-bandwidth link between the sites, since this is a serious Disaster Recovery solution).

How to move without downtime?

Moving from one piece of hardware to another is quite common: highly loaded systems need an upgrade about once every two to three years. In Russia, the reality is that many customers are simply afraid to touch their systems and pile up workarounds instead of migrating, often with good reason, because there are too many examples of migrations gone wrong. With VPLEX, moving is easy. The main thing is to know that the option exists.

Another interesting case is migrating systems whose performance requirements are hard to predict. For example, a bank launches a new service, and six months later its availability becomes an important competitive advantage. The load on the hardware grows, and a difficult and painful move is needed (banks are afraid of losing even one transaction, and even 5 milliseconds of downtime is a problem). In this case, VPLEX-like systems become the only more or less reasonable option. Otherwise, replacing the storage quickly and transparently will not be easy.

Suppose the system is old and tightly bound to its hardware. Moving it to other hardware requires an environment that lets you carry out the transfer without affecting users and services. Once such a system is placed under VPLEX, it can easily be moved between vendors, and applications will not even notice. There are no problems with OS support either: the compatibility list includes all the major operating systems found at customer sites. In exotic cases, you can check compatibility with the vendor or a partner and get confirmation.

We take the existing storage system (left) and mirror it using EMC VPLEX, unnoticed by the server and applications (right). In VPLEX terms, this is called a distributed volume. The server continues to think that it works with one storage system and one volume.

In effect, the first storage system becomes one half of a mirror. We disconnect it, and the move is done.
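
The mirror-then-detach sequence described above can be illustrated with a toy model. This is only a sketch under simple assumptions: the `Array` and `VirtualVolume` classes and their methods are hypothetical names invented for the example and have nothing to do with the actual VPLEX interfaces.

```python
# Toy model of migrating a virtual volume between arrays by mirroring.
# All class and method names are hypothetical; this is not the VPLEX API.


class Array:
    """A physical storage array, modelled as a block-number -> data mapping."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}


class VirtualVolume:
    """What the server sees: one volume, backed by one or more mirror legs."""

    def __init__(self, first_leg):
        self.legs = [first_leg]

    def write(self, block, data):
        # Every write goes to all legs, so they stay identical.
        for leg in self.legs:
            leg.blocks[block] = data

    def read(self, block):
        # Any leg can serve a read because the legs are kept in sync.
        return self.legs[0].blocks[block]

    def attach_mirror(self, new_leg):
        # Copy existing data to the new array, then start mirroring writes to it.
        new_leg.blocks = dict(self.legs[0].blocks)
        self.legs.append(new_leg)

    def detach(self, old_leg):
        if len(self.legs) == 1:
            raise ValueError("cannot detach the last remaining leg")
        self.legs.remove(old_leg)


if __name__ == "__main__":
    old, new = Array("old"), Array("new")
    volume = VirtualVolume(old)
    volume.write(0, b"payroll")
    volume.attach_mirror(new)      # step 1: mirror onto the new array
    volume.write(1, b"invoices")   # the server keeps working during the move
    volume.detach(old)             # step 2: drop the old array
    print(volume.read(0), volume.read(1))
```

The server-facing object (`volume`) never changes during the move, which is the whole point: only the set of legs behind it does.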

About synchronization

There are three configurations: Local (one data center), Metro (synchronous replication) and Geo (two data centers with asynchronous replication). Campus is a variant of synchronous replication with a cross-connect. Synchronous replication is the most in demand (99% of deployments in Russia). This is where the heartless speed of light comes in and sets the maximum distance between data centers: the signal must make the trip within 5 milliseconds. The system can be configured for 10 milliseconds, but the closer the data centers, the better. Usually the maximum is 30-40 kilometers.
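
To get a feel for how distance turns into latency, here is a rough back-of-the-envelope sketch. It assumes that a signal travels through optical fiber at about 200,000 km/s (roughly two thirds of the speed of light in vacuum) and counts propagation only. The function names and the `round_trips_per_write` parameter are illustrative assumptions, not VPLEX settings.

```python
# Rough propagation-delay estimate for a fiber link between two data centers.
# Assumption: signal speed in fiber is ~200,000 km/s (about 2/3 of c in vacuum).
FIBER_SPEED_KM_PER_MS = 200.0  # 200,000 km/s expressed in km per millisecond


def round_trip_ms(distance_km):
    """Propagation-only round-trip time over a fiber path of the given length."""
    return 2 * distance_km / FIBER_SPEED_KM_PER_MS


def max_distance_km(rtt_budget_ms, round_trips_per_write=1):
    """Longest fiber path that fits the budget if one write needs several round trips."""
    return rtt_budget_ms * FIBER_SPEED_KM_PER_MS / (2 * round_trips_per_write)


if __name__ == "__main__":
    for km in (10, 40, 100):
        print(f"{km:>4} km -> {round_trip_ms(km):.2f} ms round trip (propagation only)")
```

Propagation is only one part of the budget. Every synchronous write waits for at least one such round trip, possibly several, and switching equipment and indirect cable routes add their own delay, so the usable distance is much shorter than the raw physics suggests.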

Topology options for synchronous replication

VPLEX gives servers read-write access. Each server sees one data volume at its own site, but in reality it is a VPLEX virtual distributed volume. Metro tolerates higher latency, while Campus gives greater reliability. With Campus, it looks like this:

The best part is that there are no problems with switching replication when moving virtual machines.

With Campus, the failure of all disk subsystems and the local part of VPLEX results in the loss of only half of the disk paths. The disks themselves remain available to the servers, but only through the cross-connect and the remote part of VPLEX. This is how it works for Oracle.

There are also cases such as data centers in Moscow and Novosibirsk; these are handled by asynchronous replication. VPLEX can do that too, but in the Geo configuration.

What if there is an accident?

Here we have VPLEX Metro and VMware HA (Hyper-V is also possible), and an accident in one of the data centers.

Services are restarted in the other data center without administrator involvement, because for VMware this is a single HA cluster.

In the middle sits the Witness, a virtual machine that monitors the state of both clusters and makes sure that, when the connection between them breaks, they do not both start processing data. In other words, it protects against split-brain. In the event of a failure, the Witness lets only one cluster continue working with the most current copy. After the problem is fixed, the second cluster simply receives the newer version of the data and continues to work.

The Witness is placed either at a third site or with a cloud provider. It communicates with EMC VPLEX over IP via VPN and needs nothing else to do its job.
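
The arbitration idea can be sketched in a few lines. This is a minimal illustration under assumed rules (the first site to ask after a link failure wins, and a fully isolated site stops); the `Witness` class and `may_serve_io` function are hypothetical and do not describe how the real Witness makes its decision.

```python
# Minimal sketch of witness-based arbitration against split-brain.
# Names and rules are illustrative assumptions, not VPLEX Witness internals.


class Witness:
    """Third-site arbiter: lets at most one site keep serving I/O after a split."""

    def __init__(self):
        self.winner = None  # site allowed to continue, None while the link is healthy

    def request_to_continue(self, site):
        # The first site to ask after a link failure wins; the other is refused.
        if self.winner is None:
            self.winner = site
        return self.winner == site

    def link_restored(self):
        # Once the sites are reconnected and resynchronised, forget the decision.
        self.winner = None


def may_serve_io(site, peer_reachable, witness):
    """Decide whether a site may keep processing writes.

    `witness` is a Witness object, or None if the site cannot reach it.
    """
    if peer_reachable:
        return True   # normal operation: both sites mirror every write
    if witness is None:
        return False  # isolated from everyone: stop rather than risk divergence
    return witness.request_to_continue(site)


if __name__ == "__main__":
    witness = Witness()
    # The inter-site link breaks; both sites still reach the witness.
    print("A:", may_serve_io("A", peer_reachable=False, witness=witness))
    print("B:", may_serve_io("B", peer_reachable=False, witness=witness))
```

Whatever the exact rule, the property that matters is that the two sites can never both receive permission while they cannot see each other.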

Data at the remote site

Accessing data physically located in another data center is not a problem either. Why would you want to? For example, if there is not enough space in the main data center, you can borrow it from the backup one.

Heterogeneous hardware

Ecosystem "VPLEX and partners"

At the CROC solution center for EMC technologies, we built exactly such a setup. One site has EMC storage and a Cisco server (for many it is news that Cisco makes very good servers); the other site has virtualized Hitachi storage and an IBM server. As you can see, other vendors are supported just as smoothly. Last week we demonstrated the system to one of the banks on a test bench assembled for the purpose, and their specialists were convinced that there were no glitches and that the integration really was smooth. During the demonstration we simulated various accidents and failures, both ones we had prepared and ones the customer suggested during the meeting. The next stage is a pilot project on several small systems. Even though there is plenty of experience operating this hardware, every customer wants to make sure that everything works. That is what the solution center is for, so we don’t mind.

More bonuses

Within a single data center, VPLEX also solves the problem of mirroring inside that data center. In addition, VPLEX is much more forgiving of mistakes in estimating required performance: you can move to a more powerful storage system along the way.
If you have capacity in a remote data center, you will want to use it: in some cases you can use the “backup” data center for storage while keeping the server at your own site.

How does data access work?

You are surely wondering how this works under the hood. Previously, when writing at one site and reading at another, there was a chance of getting stale data. VPLEX has a directory system that shows which node holds the most up-to-date data, so the cache can be considered common to the entire system.

This mechanism kicks in when a machine reads a block that another machine has just written.
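
The directory idea can be shown with a small sketch. It assumes a single shared map from block number to the node that wrote it last. The `Directory` and `Node` classes are hypothetical names for illustration, and the real cache-coherence protocol is certainly more involved than this.

```python
# Sketch of a directory that tracks which node holds the newest copy of a block.
# Names are hypothetical; the real cache-coherence protocol is not described here.


class Directory:
    """Shared map: block number -> node with the most up-to-date data."""

    def __init__(self):
        self.owner = {}


class Node:
    def __init__(self, name, directory):
        self.name = name
        self.directory = directory
        self.cache = {}  # block number -> data held locally

    def write(self, block, data):
        self.cache[block] = data
        self.directory.owner[block] = self  # record who has the latest version

    def read(self, block):
        owner = self.directory.owner.get(block)
        if owner is None:
            raise KeyError(f"block {block} has never been written")
        if owner is not self:
            # Our copy may be stale: fetch the current data from the owning node.
            self.cache[block] = owner.cache[block]
        return self.cache[block]


if __name__ == "__main__":
    directory = Directory()
    site_a = Node("site-A", directory)
    site_b = Node("site-B", directory)
    site_a.write(7, b"v1")
    print(site_b.read(7))   # site B consults the directory and fetches from site A
    site_a.write(7, b"v2")
    print(site_b.read(7))   # the stale local copy at site B is not used
```

Because every read checks the directory first, a node never returns an old local copy of a block that another node has since rewritten.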

Configuration

You can start small, for example with one unit containing 2 controllers (already a fault-tolerant configuration, up to 500,000 IOPS). From there you can move to the medium configuration or reach the maximum 4-node configuration in a rack in each data center, that is, up to 2,000,000 IOPS, which is not always necessary but is achievable. You can go further and create VPLEX domains, but I think no one in our market has grown to that point yet.
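
For a first rough sizing, the figures above can be turned into a small calculation. The sketch assumes that throughput scales linearly between the two quoted points (one unit at up to 500,000 IOPS, four units at up to 2,000,000 IOPS). These are peak figures, so the result is an optimistic lower bound, and the function name is made up for the example.

```python
import math

# Assumption: throughput scales linearly with the number of units, using the
# figures quoted above (1 unit ~ 500,000 IOPS, 4 units ~ 2,000,000 IOPS).
IOPS_PER_UNIT = 500_000
MAX_UNITS_PER_SITE = 4


def units_needed(target_iops):
    """Smallest number of units whose combined peak IOPS covers the target."""
    units = max(1, math.ceil(target_iops / IOPS_PER_UNIT))
    if units > MAX_UNITS_PER_SITE:
        raise ValueError("target exceeds the maximum 4-unit configuration")
    return units


if __name__ == "__main__":
    for target in (300_000, 1_200_000, 2_000_000):
        print(f"{target:>9} IOPS -> {units_needed(target)} unit(s)")
```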

Advantages and disadvantages

Cons:

Implementation and license costs are required.
You need to commit to a new philosophy and train staff.
Most likely, the transition will be phased (but remember how we virtualized our servers!).

Pros:

You stop depending on specific hardware and no longer worry about failures of parts of the system, while getting five-nines reliability.
Simple scaling and migration.
Simple manipulation of virtual machines and applications.
No multi-vendor problem.
Beyond a certain scale, the system is cheaper than a pile of software for solving local storage problems in data centers.
Because all operations are simple and transparent, using VPLEX significantly reduces the number of human errors.
It forgives inaccurate performance forecasts.
VPLEX lets you stop tailoring hardware to a specific task and use it however you like.
A move from resiliency to mobility.
Server administrators set up clusters as before, using the same tools.
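
For readers who want to see what “five nines” means in practice, here is the arithmetic as a short sketch. It simply converts an availability percentage into the downtime it permits per year, ignoring leap years. The function name is illustrative.

```python
# Convert an availability figure into the downtime it allows per year.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability):
    """Yearly downtime budget, in minutes, for availability given as a fraction."""
    return (1.0 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    for label, value in (("three nines", 0.999), ("four nines", 0.9999), ("five nines", 0.99999)):
        print(f"{label}: {allowed_downtime_minutes(value):.2f} minutes per year")
```

Five nines leaves a budget of a little over five minutes of downtime per year.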

Implementation

If you want to try it or see how all this works live, write to vbolotnov@croc.ru. I will also answer any questions in the comments.

Source: https://habr.com/ru/post/169333/
