There is a small data center at a manufacturing company in a small city fairly far from Moscow. It has to run around the clock. As it happens, the site has only one mains feed and no diesel generator. The company is in manufacturing, not IT, so the facility was not designed properly at the outset, and since everything worked, nobody revisited it.
Then the power feed began to misbehave. Every week the power was cut for several hours, and it was a lottery: an outage could last an hour or longer. There was no pattern.
The admin proposed buying a diesel generator, but the business said that was not the admin's concern. His job was to keep downtime to no more than an hour. They had just sunk a lot of money into the hardware, so moving to the cloud was not an option, and there were no commercial data centers nearby to relocate the equipment to.
So what do you do?
This is the problem the customer brought to us. There was no budget, so we had to look for an existing solution.
The standard approach (leaving aside a second power feed, relocating the equipment, or getting a diesel generator) is to deploy a second, identical instance in the cloud and switch over to it if something fails. This is called Disaster Recovery. Some companies build a second data center that either sits cold, waiting for the main one to fail, or runs in active-active mode and carries 50% of the load.
But there was no money for a second full data center.
Here is what we came up with:
The customer's data center has a heavy physical server running a database. The applications that use this database run as a set of virtual machines on ESXi.
To replicate the database to the cloud, we installed Carbonite Availability (formerly Double-Take Availability), which works at the operating system level. To replicate the VMs, we installed Zerto, which works at the hypervisor level. Both products work in roughly the same way: first they replicate the full set of server data to the cloud, then they intercept every disk write at the main site and duplicate it to the disks in the cloud. In this particular case the lag is 10 seconds, so we always have a fresh copy of the data that is about 10 seconds old.
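The "full copy first, then intercept and forward writes" scheme can be illustrated with a toy model. This is only a sketch of the general idea, not how Zerto or Carbonite are implemented internally; the class name `AsyncReplicator`, the block dictionary, and the explicit `now` timestamps are all hypothetical.

```python
from collections import deque


class AsyncReplicator:
    """Toy model: one full initial sync, then journaled writes shipped with a lag."""

    def __init__(self, primary_blocks, lag_seconds=10):
        self.primary = dict(primary_blocks)
        self.replica = dict(primary_blocks)  # initial full synchronization
        self.lag = lag_seconds               # 10 s, as in the case described
        self.journal = deque()               # pending (timestamp, block_id, data)

    def write(self, now, block_id, data):
        """Apply a write on the main site and record it for the replica."""
        self.primary[block_id] = data
        self.journal.append((now, block_id, data))

    def ship(self, now):
        """Apply to the replica every journaled write at least `lag` seconds old."""
        while self.journal and now - self.journal[0][0] >= self.lag:
            _, block_id, data = self.journal.popleft()
            self.replica[block_id] = data


r = AsyncReplicator({0: "a", 1: "b"})
r.write(now=0, block_id=0, data="a1")
r.write(now=5, block_id=1, data="b1")
r.ship(now=10)  # only the write made at t=0 is old enough to be applied
```

If the main site fails at t=10 in this model, the replica is missing only the write made at t=5, which is what a 10-second lag means in practice.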
The VMs in the cloud are kept powered off. A single button in the Zerto control panel starts all of them at once. Startup takes about 28 minutes (the machines start in parallel), while the SLA allows 1 hour of downtime. The failover is triggered by a call to the administrator on duty; the customer decides when it is needed.
The VMs attach to the database and start working.
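The timing budget is simple arithmetic: the 28-minute startup has to fit inside the 1-hour SLA together with the time the customer spends deciding and calling. A minimal sketch, in which the 15-minute decision time is a made-up figure for illustration and the function name is hypothetical:

```python
def rto_margin_minutes(sla_minutes, vm_start_minutes, decision_minutes):
    """Minutes left in the SLA after the decision delay and the VM startup."""
    return sla_minutes - (vm_start_minutes + decision_minutes)


# 60 and 28 come from the case above; 15 is an assumed decision/call time.
margin = rto_margin_minutes(sla_minutes=60, vm_start_minutes=28, decision_minutes=15)
fits_sla = margin >= 0
```

With these numbers the customer can spend up to 32 minutes deciding before the one-hour limit is at risk.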
When power is restored at the site, the customer sorts out their own infrastructure and fixes whatever broke, then manually enables reverse replication. The changes that accumulated in the database while the applications ran in the cloud are sent back. Once replication completes, we switch back. In this particular case, each hour of VM operation accumulates about 5 minutes' worth of traffic to transfer back. The longer the emergency instance runs, the smaller this share becomes, because writes often hit the same database tables and only the difference is sent.
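Why the share shrinks can be seen in a small sketch: if only the latest state of each changed block is sent back, repeated writes to the same place collapse into one transfer. The block-level granularity and the names below are assumptions for illustration, not a description of how the replication products track changes. The 5-minutes-per-hour figure from this case then acts as an upper estimate.

```python
def changed_blocks(writes):
    """Keep only the latest content per block; repeated writes collapse into one."""
    delta = {}
    for block_id, data in writes:
        delta[block_id] = data
    return delta


def failback_upper_bound_minutes(hours_in_cloud, minutes_per_hour=5):
    """Upper estimate of transfer time, using the ~5 min/hour figure from this case."""
    return hours_in_cloud * minutes_per_hour


# Four writes made while running in the cloud, three of them to the same block.
writes = [(7, "v1"), (7, "v2"), (9, "x"), (7, "v3")]
delta = changed_blocks(writes)          # only two blocks need to go back
estimate = failback_upper_bound_minutes(6)
```

Here four writes produce a delta of two blocks, so the traffic to send back grows more slowly than the number of writes.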
After the switch back, the virtual machines in the cloud are powered off. The customer does not pay for resources that are powered off. Billing is in one-hour increments.
The customer pays only for the volume of stored data, the channel, and the replication software licenses (Zerto and Carbonite). We do the work as Disaster Recovery as a Service and give an SLA on the whole thing. We are financially liable for that SLA.
The customer replicates essentially everything: the cloud holds a virtual machine with the same parameters as the physical server, and all disks are mirrored.
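The billing model can be sketched as fixed fees plus compute charged only while the cloud VMs run. All rates below are invented for illustration, and rounding a started hour up to a full hour is an assumption about what hourly billing means here.

```python
import math


def billable_hours(runtime_minutes):
    """Round cloud runtime up to whole hours (assumed rounding rule)."""
    if runtime_minutes <= 0:
        return 0
    return math.ceil(runtime_minutes / 60)


def dr_invoice(storage_fee, channel_fee, license_fee,
               compute_rate_per_hour, runtime_minutes):
    """Fixed fees are always charged; compute only for hours the VMs were on."""
    compute = compute_rate_per_hour * billable_hours(runtime_minutes)
    return storage_fee + channel_fee + license_fee + compute


# Hypothetical rates in arbitrary currency units.
quiet_month = dr_invoice(100, 50, 200, 10, runtime_minutes=0)
outage_month = dr_invoice(100, 50, 200, 10, runtime_minutes=150)
```

In a month with no outages the invoice contains only storage, channel, and licenses; a 150-minute failover adds three billable hours of compute.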
Here is what Zerto offers:
Agentless replication, an asynchronous mode, VMs kept powered off at the DR site, journal-based replication, WAN optimization, cross-hypervisor replication, and licensing per protected virtual machine.
Carbonite uses agent-based replication, so the hypervisor does not matter. It offers an asynchronous mode, snapshot support, compression of transmitted data, and licensing per protected virtual machine.
This installation uses both solutions at once, which was necessary because of a number of specifics. Usually we offer just one.
A similar problem can also be solved with the domestic solution Veeam Cloud Connect (we usually use it when the customer already has Veeam backup).
Summary
We all understand that the problem could have been solved differently, by upgrading the server room with a diesel generator. However, the business lowered its requirements for redundancy instead. We provided the service, and it all worked. It turned out to be a good example of how to deploy a DR site properly and inexpensively.