
Scalaxi Cloud Platform Architecture Overview

In one of our previous posts, we described the architecture of our disk storage. The article received a lot of feedback, and that gave us the idea of describing the entire current architecture of our cloud. We have repeatedly talked about individual components at professional conferences, but, firstly, not everyone has the opportunity to attend them, and secondly, our architecture is dynamic, constantly evolving and being extended, so much of that information is no longer up to date.

Conventionally, the current architecture of the Scalaxi platform can be divided into three significant parts:

- Virtualization pool: a limited number of physical servers for running virtual machines and a system for storing the virtual machines' disk images, joined by a common high-performance communication bus.
- Network system: the switches and multiservice gateways that route and filter the platform's IP traffic.
- Managing cluster: the storage system, nodes, and system services that control the platform.

Component architecture:

The overall architecture is shown in the diagram:
[image]
Virtualization system

The virtualization system consists of diskless virtualization servers running the open-source Xen 4.0.1 hypervisor.
[image]
Virtualization Server Configuration
- Intel Xeon 55xx / 56xx CPU;
- 68-96 GB RAM;
- no local drives;
- Infiniband adapter;
- Ethernet adapter (for power management and emergency access via IPMI 2.0).


The virtualization server boots over the Infiniband network from the managing cluster. During boot, a control Xen domain (dom0) running SUSE Linux Enterprise is created. Then, at the request of the managing cluster, client paravirtual and HVM domains (virtual machines) running Linux and Windows are created on the virtualization server.

Each virtual machine gets its own set of resources; depending on the settings passed by the managing cluster, resource limits are configured for it.
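
To make this concrete, here is a minimal sketch of how a paravirtual domain with a CPU cap might be created on a virtualization server using the classic xm toolstack that ships with Xen 4.0. The names and sizes (vm01, /dev/vg0/vm01-disk, br0, 2 GB RAM, 2 vCPUs) and the choice of a CPU cap as the example limit are illustrative assumptions, not the actual Scalaxi tooling.

```python
#!/usr/bin/env python
"""Hypothetical sketch: creating a paravirtual Xen domain and applying a
CPU limit with the xm toolstack. All names and values are illustrative."""
import subprocess


def create_domain(name, memory_mb, vcpus, disk_lv, cpu_cap_pct):
    # Classic Xen (xm) domain configuration file.
    cfg = "\n".join([
        'name = "%s"' % name,
        "memory = %d" % memory_mb,
        "vcpus = %d" % vcpus,
        'bootloader = "pygrub"',               # boot the guest's own kernel
        "disk = ['phy:%s,xvda,w']" % disk_lv,  # LVM logical volume as the VM disk
        "vif = ['bridge=br0']",                # assumed bridge name
    ])
    path = "/etc/xen/%s.cfg" % name
    with open(path, "w") as f:
        f.write(cfg + "\n")

    # Start the domain, then cap its CPU share with the credit scheduler.
    subprocess.check_call(["xm", "create", path])
    subprocess.check_call(["xm", "sched-credit", "-d", name, "-c", str(cpu_cap_pct)])


if __name__ == "__main__":
    create_domain("vm01", memory_mb=2048, vcpus=2,
                  disk_lv="/dev/vg0/vm01-disk", cpu_cap_pct=100)
```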

Block Access Storage System

The block-access storage system consists of two types of servers: proxies and storage nodes. More details, with photos, are in the previous post.
[image]
For block-level storage access, SRP, an implementation of the SCSI protocol that runs over Infiniband, is used. As in other network SCSI implementations, SRP uses targets and initiators (servers and clients): targets export SCSI LUNs (logical units) to the initiators.

SRP initiators and the multipathd daemon run on the virtualization servers. multipathd aggregates identical LUNs exported by different proxy servers into a single virtual block device, providing fault tolerance: if one of the proxy servers fails, multipathd switches the path to another proxy server, so the virtual machines on the virtualization servers do not notice the failure.
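
As an illustration, the sketch below shows how the stock Linux SRP initiator is usually pointed at the targets: ibsrpdm lists the targets visible on the Infiniband fabric, and writing each target string into sysfs creates the corresponding SCSI devices, which multipathd then aggregates. This is a generic sketch of the standard ib_srp/srptools workflow, not Scalaxi's own scripts.

```python
#!/usr/bin/env python
"""Hypothetical sketch: logging the SRP initiator in to all visible targets.
Assumes the ib_srp kernel module is loaded and srptools (ibsrpdm) is installed."""
import glob
import subprocess


def srp_login():
    # "ibsrpdm -c" prints one "id_ext=...,ioc_guid=...,dgid=...,service_id=..."
    # string per SRP target visible on the fabric.
    targets = subprocess.check_output(["ibsrpdm", "-c"]).decode().splitlines()

    # Each local HCA port exposes an add_target file; writing a target string
    # to it makes the kernel create the SCSI host and its LUNs.
    for add_target in glob.glob("/sys/class/infiniband_srp/srp-*/add_target"):
        for target in targets:
            target = target.strip()
            if target:
                with open(add_target, "w") as f:
                    f.write(target)


if __name__ == "__main__":
    srp_login()
```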

The device created by multipathd is divided into logical volumes according to the metadata of a single LVM volume group. The resulting block devices are passed through to virtual machines, which see them as disks. If a virtual machine's disk size needs to be changed, it is enough to resize the corresponding logical volume in the LVM volume group, which is a very simple operation.
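
For example, growing a virtual machine's disk comes down to a single lvextend on the logical volume that backs it. The volume group and volume names below (vg0, vm01-disk) are assumptions for illustration only.

```python
#!/usr/bin/env python
"""Hypothetical sketch: growing a VM disk by extending its LVM logical volume."""
import subprocess


def grow_vm_disk(vg, lv, extra):
    # Extend the logical volume that backs the VM disk, e.g. by "10G".
    subprocess.check_call(["lvextend", "-L", "+" + extra, "/dev/%s/%s" % (vg, lv)])
    # The guest then sees a larger block device and can grow its filesystem
    # (e.g. with resize2fs) from inside the virtual machine.


if __name__ == "__main__":
    grow_vm_disk("vg0", "vm01-disk", "10G")
```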

Storage nodes are servers with disks running SUSE Linux Enterprise OS. SCSI targets on them are provided by the SCST driver. Any storage system that supports the SRP, FC, or iSCSI protocols, such as NetApp, EMC, and others, can also be used as a storage node.

Proxy servers handle data replication and combine the storage nodes' space into one logical LVM volume group. Data is replicated across several storage nodes using the Linux md driver: a RAID 1+0 array with a configurable level of redundancy is assembled from the LUNs of several storage nodes. By default, the redundancy level is 2x (each virtual machine disk is stored on two storage nodes).
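
A hedged sketch of that assembly step: building a RAID 1+0 array from the LUNs of several storage nodes with mdadm and handing it to LVM as a physical volume. The device, array, and volume group names are illustrative assumptions.

```python
#!/usr/bin/env python
"""Hypothetical sketch: assembling RAID 1+0 from storage-node LUNs on a proxy
server and adding it to the LVM volume group. Names are illustrative."""
import subprocess


def build_mirrored_stripe(md_dev, luns, vg):
    # RAID 10: every block is mirrored on two different storage nodes,
    # which corresponds to the default 2x redundancy described above.
    subprocess.check_call(["mdadm", "--create", md_dev,
                           "--level=10", "--raid-devices=%d" % len(luns)] + luns)
    # Hand the array to LVM so logical volumes (VM disks) can be carved from it.
    subprocess.check_call(["pvcreate", md_dev])
    subprocess.check_call(["vgextend", vg, md_dev])


if __name__ == "__main__":
    build_mirrored_stripe("/dev/md0",
                          ["/dev/sdb", "/dev/sdc", "/dev/sdd", "/dev/sde"],
                          "vg0")
```
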
Storage node configuration
- Intel Xeon 55xx CPU;
- 96 GB RAM;
- 36 x 600 GB SAS2 disks;
- Infiniband adapter;
- Ethernet adapter (for power management and emergency access via IPMI 2.0).

Proxy Configuration
- Intel Xeon 56xx CPU;
- 4 GB RAM;
- no local drives;
- 2 x Infiniband adapters (one for connections to the storage nodes, the other for connections to the virtualization servers);
- Ethernet adapter (for power management and emergency access via IPMI 2.0).


Infiniband bus

The Infiniband bus consists of two main elements: Infiniband switches and gateways to the Ethernet network.
[image]
The pools use 324-port Grid Director 4700 Infiniband switches. Currently, each switch is redundant at the module level (it has a fully passive backplane and a modular architecture, so operation is not interrupted when a module fails). As the platform grows, Infiniband switches will also be made redundant at the chassis level.

An Ethernet gateway is a server running SUSE Linux Enterprise OS with an Infiniband adapter and an Ethernet adapter. The gateways are redundant.

Network system


The network system consists of two main elements: switches and multiservice gateways.
[image]
Juniper EX8208 switches route IP traffic. The switches are redundant.

Juniper SRX3600 multiservice gateways protect the system from parasitic traffic by recognizing various types of attacks against a signature library. The multiservice gateways are redundant.

Managing cluster


The main elements of the management cluster are the storage system, nodes and system services running on the nodes.
[image]

The management cluster storage consists of two HP MSA2312sa hardware RAID arrays, each with two controllers. Each array is connected to four nodes of the management cluster over SAS. Data between the two arrays is replicated at the service level.

The nodes of the management cluster are servers running SUSE Linux Enterprise OS and Xen 4.0.1 hypervisor. Each management cluster service is one or more virtual machines with associated software.

Services

The management system includes the following system services:



Each service runs in two or more instances; data between the instances is replicated using MySQL database replication.
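
As a small illustration of what this redundancy implies operationally, the sketch below checks that a replica's MySQL slave threads are running and not lagging. The host, credentials, and the 60-second lag threshold are placeholder assumptions; the article does not describe the actual monitoring.

```python
#!/usr/bin/env python
"""Hypothetical sketch: health check for a service replica's MySQL replication."""
import mysql.connector  # pip install mysql-connector-python


def replication_healthy(host, user, password):
    conn = mysql.connector.connect(host=host, user=user, password=password)
    cur = conn.cursor(dictionary=True)
    cur.execute("SHOW SLAVE STATUS")
    row = cur.fetchone()
    conn.close()
    if row is None:
        return False  # replication is not configured on this instance
    return (row["Slave_IO_Running"] == "Yes"
            and row["Slave_SQL_Running"] == "Yes"
            and (row["Seconds_Behind_Master"] or 0) < 60)


if __name__ == "__main__":
    print(replication_healthy("10.0.0.2", "monitor", "secret"))
```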

All of this documentation is available on our wiki, which also has a FAQ and a description of our API with examples.

Source: https://habr.com/ru/post/116683/

