Recently we decided to go beyond the budget server segment: to rethink our approach to hosting virtual machines and to build the most fault-tolerant service we could.
In this article I will tell you how our standard VPS platform is organized and what techniques we used to improve it.
Our standard VDS technology

At the moment, our virtual server hosting is organized as follows:
1U servers of roughly the same configuration are installed in the racks:
- CPU: 2 × Intel Xeon E5-2630 v2 @ 2.60 GHz
- Motherboard: Intel Corporation S2600JF
- RAM: 64 GB
- Disks: 2 × HGST HDN724040ALE640 (4 TB), 1 × Intel SSDSC2BP480G4 (480 GB)
One of the servers is the main one: VMmanager is installed on it, and the additional servers (nodes) are attached to it.
Besides VMmanager itself, the main server also hosts client virtual servers.
Each server faces the outside world through its own network interface, and to speed up VDS migration between nodes, the servers are interconnected via separate dedicated interfaces.
(Fig. 1. The current virtual server hosting scheme)

All servers operate independently of one another. If performance problems appear on one of them, all of its virtual servers can be redistributed to neighboring nodes (the “Migration” function in VMmanager) or moved to a newly added node.
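VMmanager drives the migration itself from the panel, but to give an idea of what such a move involves under the hood, here is a minimal sketch using the libvirt Python bindings. The host URIs and the VM name are hypothetical, and with node-local storage the disk image has to be copied along with the machine.

```python
# Minimal sketch: moving one VM off a problematic node with the libvirt
# Python bindings. VMmanager performs this itself via its "Migration"
# function; host URIs and the VM name below are hypothetical.
import libvirt

src = libvirt.open("qemu+ssh://root@node1/system")  # node we want to unload
dst = libvirt.open("qemu+ssh://root@node2/system")  # neighboring node

dom = src.lookupByName("client-vds-42")             # hypothetical VM name

# With node-local storage the disk must travel with the VM, hence
# VIR_MIGRATE_NON_SHARED_DISK: qemu copies the image over the dedicated
# migration interface while the guest keeps running.
flags = (libvirt.VIR_MIGRATE_LIVE
         | libvirt.VIR_MIGRATE_NON_SHARED_DISK
         | libvirt.VIR_MIGRATE_PERSIST_DEST       # define the VM on the target
         | libvirt.VIR_MIGRATE_UNDEFINE_SOURCE)   # remove it from the source

dom.migrate(dst, flags, None, None, 0)

src.close()
dst.close()
```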
When a server fails (kernel panic, failed drives, a dead power supply, etc.), the client virtual machines on it become unavailable. Of course, the monitoring system immediately notifies the engineers on duty, who start diagnosing and fixing the problem. In 90% of cases, replacing the failed components takes less than an hour, plus the time needed to deal with the consequences of the emergency shutdown (storage re-synchronization, file system errors, and so on).
All of this is, of course, unpleasant for us and for our customers, but the simple scheme lets us avoid unnecessary expenses and keep prices low.
New Cloud VDS

To satisfy the most demanding customers, for whom server uptime is critical, we have created a service with the highest possible reliability.
So, we needed new software and hardware.
Since we already work with ISPsystem products, the logical step was to look at VMmanager-Cloud. This panel was created precisely to solve the fault-tolerance problem; by now it is well developed and has reached a certain stability. It suited us, and we did not consider alternatives.
Ceph was adopted unconditionally as the distributed storage system. It is a free, actively developed product, flexible and scalable. We tried other storage systems, but Ceph was the only one that fully satisfied our requirements. It seemed complicated at first, but after a few attempts we finally figured it out, and we do not regret it.
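For illustration, here is a minimal sketch (not our production tooling) of talking to a Ceph cluster from Python via the official rados bindings: it connects using the standard config path and prints raw usage and the list of pools.

```python
# Minimal sketch: querying the Ceph cluster with the official "rados"
# Python bindings. Assumes a standard /etc/ceph/ceph.conf and a client
# keyring on the node.
import rados

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()

stats = cluster.get_cluster_stats()          # raw usage in kB across all OSDs
used_tb = stats["kb_used"] / 1024 ** 3
total_tb = stats["kb"] / 1024 ** 3
print(f"raw usage: {used_tb:.1f} / {total_tb:.1f} TB")

print("pools:", cluster.list_pools())        # e.g. the RBD pool with VM images

cluster.shutdown()
```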
The nodes of the new cluster are built on the same hardware as the existing VMmanager cluster, but with a few changes:
- We switched to multi-node chassis with redundant power supplies.
- For the interconnect between cluster nodes, we used InfiniBand instead of the usual gigabit links. It raises the connection speed to 56 Gbit/s (Mellanox Technologies MT27500 Family ConnectX-3 IB cards, a Mellanox SX6012 switch).
- CentOS 7 was chosen as the operating system for the cluster nodes.
However, to make all of the above work together, we had to build our own kernel, rebuild qemu, and request a few improvements in VMmanager-Cloud.
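As a quick sanity check that the InfiniBand links actually negotiated the full 56 Gbit/s, something like the following can be run on each node. The sysfs layout is the standard one for the in-kernel IB stack; the device name mlx4_0 is what a ConnectX-3 card typically registers as, but treat it as an assumption for other hosts.

```python
# Sanity check: print the state and negotiated rate of each InfiniBand port.
# The device name "mlx4_0" is an assumption (typical for ConnectX-3 cards).
from pathlib import Path

for port in sorted(Path("/sys/class/infiniband/mlx4_0/ports").iterdir()):
    state = (port / "state").read_text().strip()  # e.g. "4: ACTIVE"
    rate = (port / "rate").read_text().strip()    # e.g. "56 Gb/sec (4X FDR)"
    print(f"port {port.name}: {state}, {rate}")
```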

(Fig. 2. The new cloud virtual server hosting scheme)
The benefits of the new technology

As a result, we got the following:
- An even more professional virtual server service with high uptime; its stability does not depend on hardware problems on individual cluster nodes.
- Increased data storage reliability thanks to the distributed storage system keeping multiple copies of the data.
- Fast migration of virtual machines. Transferring a running VPS from node to node happens almost instantly, without any dropped packets or pings (see the sketch after this list). When needed, this quickly frees a node for maintenance.
- When a node fails, client virtual machines are automatically started on other nodes. To the client this looks like an unplanned power-loss reboot; the downtime equals the time the OS takes to boot.
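As mentioned in the list above, here is a rough sketch of what such a migration looks like at the libvirt level once the disks live in Ceph: unlike the earlier example for the old scheme, no disk copying is needed, only the guest's RAM state moves between nodes. Host and VM names are again hypothetical.

```python
# Rough sketch: live migration between cloud nodes with shared Ceph storage.
# Only the guest's RAM state is transferred; the RBD image stays in place.
# Host URIs and the VM name are hypothetical.
import libvirt

src = libvirt.open("qemu+ssh://root@cloud-node1/system")
dst = libvirt.open("qemu+ssh://root@cloud-node2/system")

dom = src.lookupByName("client-vds-42")

# No VIR_MIGRATE_NON_SHARED_DISK here: the disks are on shared Ceph storage.
flags = (libvirt.VIR_MIGRATE_LIVE
         | libvirt.VIR_MIGRATE_PERSIST_DEST
         | libvirt.VIR_MIGRATE_UNDEFINE_SOURCE)

dom.migrate(dst, flags, None, None, 0)

src.close()
dst.close()
```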
The cluster has been running in production since early December last year and currently serves several hundred clients. During this time we have run into plenty of pitfalls, dealt with bottlenecks, done the necessary tuning, and modeled every abnormal situation we could.
While we continue testing, our economists are calculating the cost. Because of the additional redundancy and the more expensive technologies, it turned out to be higher than for the previous cluster. We have taken this into account and are developing a new pricing plan for the most demanding customers.
There are some risks we cannot eliminate at this level at all: the data center's power supply and the external communication channels. Such problems are usually solved with geographically distributed clusters; perhaps that will be one of our next projects.
If you are interested in the technical details of implementing the technology described above, we are ready to share them in the comments or write a separate article based on the discussion.