How a provider did not want to let our customer go

The story is rather short, but funny, and it actually happened to our customer. It all started when an IT infrastructure provider decided to relocate its data center. The provider warned all its clients about the three-day downtime roughly half a year in advance, but timed and arranged the bureaucratic red tape so that some customers simply did not have time to prepare for a migration.

Imagine: you are a CIO. You have no budget for an emergency backup site, and no old equipment either. The business provides medical services, where every extra hour of delay is expensive: one day of downtime threatens financial and reputational losses equal to a year's profit, and after two hours customers start to suffer.

And now the final chord: you are in fact being prevented from moving, boxed in by restrictions while every approval takes a month. After all, you pay the provider a lot, so why would they let you go?

Provider logic

Everything is quite simple: the customer has lived with them for many years with its whole infrastructure, and if it survives the downtime, it will stay for many more years and keep paying all the while. So the customer must be retained at any cost, which means putting spokes in the wheels until the very end of the move. Where would he go? Well, at most, compensate slightly for the inconvenience under the SLA or offer a symbolic discount.

Customer logic

The customer, to put it mildly, did not appreciate the humor. More precisely, he understood everything at once and turned to us to move the entire infrastructure as quickly as possible, and to set up two sites later, once the crisis was over.

Our task was to pull out all the data and deploy the same setup in our cloud, which is hosted in two fault-tolerant, geographically distributed data centers certified by Uptime Institute. It is a routine procedure, three days for everything, provided there are no rare operating systems or hardware keys, for example. Here there were no rare operating systems, hardware keys, old storage systems, or heavy number-crunching servers. Just 5 TB of data.

This is where the problems started.

Problems

The provider's SLA specified a response time of 3 days. In the last hour of that term, each of our requests was answered in the spirit of unforgettable Bangladeshi support, with a question clarifying some trifle. Then another 2 days and 23 hours of waiting.

The Internet link was up to 50 Mbit/s (and keep in mind that the customer uses at least half of this channel for live production traffic, so a more or less usable share of the bandwidth is available only late at night). The provider refused to increase the bandwidth even for extra money.

Two weeks remained.
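To get a feel for how tight this was, here is a rough back-of-the-envelope sketch in Python. The function name and the `usable_fraction` parameter are our own illustration, not part of any tool mentioned here, and the figures ignore compression and protocol overhead.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(data_bytes: float, link_bits_per_s: float,
                  usable_fraction: float = 1.0) -> float:
    """Days needed to push data_bytes through a link.

    usable_fraction is the share of the link actually available
    for the migration (an assumption; production traffic takes the rest).
    """
    seconds = data_bytes * 8 / (link_bits_per_s * usable_fraction)
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    five_tb = 5e12   # 5 TB, decimal terabytes
    link = 50e6      # 50 Mbit/s
    print(f"whole link: {transfer_days(five_tb, link):.1f} days")
    print(f"half link:  {transfer_days(five_tb, link, 0.5):.1f} days")
```

With these assumptions, 5 TB takes about 9.3 days over the whole link and about 18.5 days over half of it, so a plain copy of everything over the wire leaves almost no margin in a two-week window.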

“Checkmate”, the administrator probably thought.
“But to hell with you,” we decided.

Double-Take

There is such a thing as Carbonite (formerly Double-Take), designed for asynchronous replication. It has an edition called Carbonite Move, designed for migrations. That is, the license is single-use: it is valid only until the move is settled.

The agent is installed at the OS level and starts monitoring write activity. The job of the agent on the replicated machine is to send data over the channel and keep track of everything newly written after synchronization begins. Once the first hefty chunk of data has been transferred to the target infrastructure, the rest of the software is quiesced at the OS level, and Move identifies and sends the difference between what existed at the start of the copy and what was written while the copy was running.

The downtime needed to transfer a week's worth of differences is quite small, about an hour, plus another hour to bring up the new infrastructure. In total we scheduled four hours but finished much faster. The customer could afford four hours at night; eight, no longer.

Why a week? Because that is how long it took to pull the data from the agents, given how starved the channel was.
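The general idea of "bulk copy first, then send only what changed" can be shown with a toy model. This is a minimal sketch of dirty-block tracking under our own assumptions, not Carbonite's actual implementation; the class, the function, and the `writes_during_copy` schedule are all hypothetical.

```python
class DirtyBlockTracker:
    """Wraps a list of blocks and remembers which ones were written to."""

    def __init__(self, blocks):
        self.blocks = blocks
        self.dirty = set()

    def write(self, index, data):
        # Every successful write is recorded by its "coordinates" (block index).
        self.blocks[index] = data
        self.dirty.add(index)


def migrate(source_blocks, writes_during_copy):
    """Copy blocks to a new target while production keeps writing.

    writes_during_copy maps a copy step to a list of (index, data) writes
    that happen right after that step (a simulated production workload).
    Returns the target blocks and the list of block indexes sent as the delta.
    """
    tracker = DirtyBlockTracker(source_blocks)
    target = [None] * len(source_blocks)

    # Phase 1: long bulk copy over the slow channel, production stays online.
    for step in range(len(source_blocks)):
        target[step] = tracker.blocks[step]
        for index, data in writes_during_copy.get(step, []):
            tracker.write(index, data)

    # Phase 2: short downtime window. Writers are paused, only the delta is sent.
    delta = sorted(tracker.dirty)
    for index in delta:
        target[index] = tracker.blocks[index]
    return target, delta


if __name__ == "__main__":
    source = [b"a", b"b", b"c", b"d"]
    writes = {1: [(0, b"A")], 2: [(3, b"D")]}
    target, delta = migrate(source, writes)
    print(target, delta)
```

The model is deliberately conservative: a block written before the bulk copy reaches it is still resent in the delta. What matters is that the downtime is proportional to the amount of changed data, not to the total volume.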

Next came the regular deployment and recovery, quick tests, a day in production, and then the customer shut down the virtual machines at our opponent's site for good.

For the provider this came as a surprise: when we stopped writing to them, they apparently decided that victory was in their pocket.

What is this software?

Here is the software page.
Its main page promises near-zero downtime, and that is true when few changes occur during the copy, that is, when the systems can be moved one at a time or the channel is wide. That was not our case.

With this tool you can test new sites before putting them into production (the license allows it) or, using another packaged product, simply set up replication in Active-Passive mode.

It is managed from a simple console.

Everything is copied and no connectors are needed: Double-Take simply takes everything from the machines almost sector by sector (more precisely, in units of storage logic, recording exactly the fact of each successful write and its coordinates). The channel is protected in transit, so nobody can see what exactly is being transferred.

What did not work

Not everything could be moved by the scenario above. There was one heavily loaded virtual machine (CPU at close to 100 percent) with a hot database running on it. When we installed the agent, it started eating up capacity and affecting the database. At night the load on the database was lower, but the database was very large and the window was small. We realized the transfer would take a very long time, and that by copying only at night we simply would not manage to migrate in time.

We made a backup of the virtual machine, physically brought it to our site on a disk, and uploaded it to the cloud. We restored from the backup and applied the accumulated changes using standard MS SQL tools. That's all. It sounds simple, but it had to be done quickly and right the first time.

So, good luck with your moves.

Links

Source: https://habr.com/ru/post/359190/
