
Half a fire that never happened: how we moved to a new data center

Two moves are equivalent to one fire.
(Popular wisdom)

In place of a preface


A well-known cardiac surgeon arrives at a car service and drops off his car for repair. A mechanic working in the shop seizes the opportunity, calls out to the doctor and asks him a question:

- Doctor! We are really doing the same thing: I take out the "hearts" of cars, pull out their valves and put in new ones. I can even replace the whole engine. Either way, after my work the car goes on living with a new "heart". But you rake in money by the shovelful, and I get pennies for my work. Why is that?!

To which the doctor reasonably replied:

- And you, my friend, try overhauling an engine while it is running!

We are growing rapidly, and we constantly need new capacity to house our equipment. At the same time, growth in volume must under no circumstances lower the quality of our services. This is a strategic task.

Summer is vacation season, the "quietest" period for most webmasters: an out-of-schedule but announced server reboot is taken more calmly.

We waited for summer and moved to our new data center!

Task


What is there to tell, it would seem? At first glance there is nothing tricky about moving: with a certain amount of care, you can transport anything anywhere, especially when it comes to transporting boxes of hardware, which is essentially what server and network equipment is. In fact, moving a hosting service to a new technical site, and in St. Petersburg at that (this is an important point!), has its own piquant quirks. In particular, it is highly desirable for the hosting to keep working during the move. So the main problem to solve during the move was minimizing downtime in the provision of services. The means were chosen with this goal in mind.

Relocation planning was carried out based on the following data:


Solutions


The task before us could be solved in various ways, each of which was carefully analyzed. Three main groups of solutions were identified.

Simple, cheap, clumsy


The simplest solution would look like this:


Pros:


Cons:


Although the advantages of this relocation scenario outnumbered its disadvantages, they could not outweigh how serious those disadvantages were, so we rejected this option.

Laborious, expensive, elegant


An elegant solution would be to deploy a completely new technical site in the new location: new equipment in the same quantity as in the old data center, and new IP address ranges. Once the new site was ready, it would be possible to:


Pros:


Cons:


Here the disadvantages seriously outweighed the advantages, and this solution was also rejected as unsuitable: nobody can afford to rely on processes outside their control as a tool for solving their own problems.

Life


After analyzing the factors that determine how long the interruption in service lasts, we developed a solution that, deep down, we are proud of.

Technical factors



Organizational factors



Once the physical transfer of equipment to the new data center was complete, all that remained was to shut down the network in the old data center and change the routing of our networks, which we did successfully. From the outside, this interruption in a site's operation could easily go unnoticed. For those who did notice it for technical reasons, the site "disappeared" from view for no more than 10 minutes.

The only drawbacks of the solution we adopted and carried out are its considerable labor intensity and some overhead costs (for example, the purchase of "buffer" equipment for the new technical site). But these did not affect the quality of the process, so they turned out to be acceptable.

Organizational Conclusions


Of course, we did not manage to "overhaul a running engine": for objective reasons, equipment cannot be physically moved without interrupting its operation. But we are glad that we managed to avert "half a fire". For shared hosting users and for most VDS and dedicated server rental customers, the physical relocation of equipment looked no different from an ordinary full server reboot, such as one performed to upgrade hardware or system software. Instead of the planned two hours of downtime that we had announced to customers in our mailings, the average time a site was unavailable was 1 hour and 20 minutes.
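
To put the two figures side by side, here is a minimal sketch of the arithmetic in Python. Only the two durations come from the text; the function name `downtime_summary` is hypothetical and used purely for illustration.

```python
from datetime import timedelta


def downtime_summary(planned: timedelta, actual: timedelta):
    """Return (time saved, share of the planned window actually used).

    Hypothetical helper for illustration only.
    """
    saved = planned - actual
    # Dividing one timedelta by another yields a plain float ratio.
    share_used = actual / planned
    return saved, share_used


# Figures quoted in the article: a 2-hour announced window,
# and an average site unavailability of 1 hour 20 minutes.
planned_window = timedelta(hours=2)
average_outage = timedelta(hours=1, minutes=20)

saved, share_used = downtime_summary(planned_window, average_outage)
print(f"Saved: {saved}, share of window used: {share_used:.0%}")
```

In other words, on average a site was unavailable for about two thirds of the announced window, 40 minutes less than customers had been told to expect.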

Source: https://habr.com/ru/post/147955/

