
A big primer: distributed data storage systems in practice, for administrators of medium and large businesses

Modern networks and data centers are moving briskly toward a fully software-defined scheme, where it largely no longer matters what hardware you cram inside: everything is handled in software. For mobile operators this began with not wanting to install twenty antennas per building (their nodes are reconfigured, changing frequencies and parameters, simply by updating a config); in data centers it started with server virtualization, which is now a must-have, and then continued with storage virtualization.

But back to Russia in 2015. Below I will show how to save money, increase reliability and solve a number of typical tasks for sysadmins of medium and large businesses using "improvised means": x86 machines and whatever storage boxes are at hand.


This diagram shows both architectures that will be discussed. SDS: two red controllers in the center with any backend behind them, from internal disks to FC shelves and clouds. And Virtual SAN, built on the hyper-converged storage scheme.
The most important thing:

Along the way, we will look at a couple of typical tasks with specific hardware and prices.

Who needs it and why?


In essence, SDS software for data storage creates a management server (or cluster) to which different types of storage are connected: disk shelves, the disks and RAM of servers (used as cache), PCI-SSD, SSD shelves, as well as standalone "cabinet" storage systems of different types and models from different vendors, with different disks and connection protocols.



From that point on, this whole space becomes shared. The software, however, understands that "fast" data should be kept over here and slow archive data over there. As a sysadmin, roughly speaking, you stop thinking in terms of "a RAID group on this storage array" and start thinking in terms of "here is a data set, it needs to be placed in the FAST profile". Of course, you first decide in the setup wizard, or predefine in advance, that the FAST profile lives on such-and-such disks of such-and-such a storage system.
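
To make the mental shift concrete, here is a minimal, purely illustrative Python sketch of what "think in profiles, not RAID groups" means. The names (`PROFILES`, `place_dataset`, the media strings) are invented for this example and are not part of any vendor API.

```python
# Purely illustrative: a "profile" is a service level backed by a set of media,
# so day-to-day work becomes "tag the data set", not "carve out a RAID group".

PROFILES = {
    # profile name -> media the admin agreed (or predefined) to back it with
    "FAST":    ["PCI-SSD in node 1", "SSD shelf A"],
    "NORMAL":  ["FC shelf B (15K SAS)"],
    "ARCHIVE": ["SATA shelf C", "cloud bucket"],
}

def place_dataset(name: str, profile: str) -> str:
    """Pick a backing medium for a data set according to its profile."""
    media = PROFILES[profile]
    # Real SDS software balances and tiers across everything in the profile;
    # taking the first medium keeps this sketch short.
    return f"data set '{name}' -> {media[0]}"

print(place_dataset("billing-db", "FAST"))
print(place_dataset("scan-archive-2014", "ARCHIVE"))
```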

The same software uses server RAM (via virtual storage controllers) as a cache: ordinary x86 RAM, up to 1 TB of it, caching both reads and writes. On top of that there are perks such as read-ahead, block grouping, multithreading, and a genuinely interesting Random Write Accelerator (more on that below).

The most frequent applications are:


What is a Software-defined Data Center and how SDS is included in the SDDC philosophy


The difference between software-defined infrastructures and the usual "static" ones is about the same as between the good old vacuum-tube circuits and the "new" transistor ones. That is, very significant, but at first quite hard to wrap your head around. It requires new approaches and a new understanding of the architecture.

Note that there is nothing fundamentally new in the very concept of software-defined; the basic principles were being applied at least 15 years ago. It was simply called differently and was far from ubiquitous.

In this post we will discuss only SDS (Software Defined Storage): storage systems, disk arrays and other storage devices, as well as their interfaces.

I will describe the technology using DataCore software as the example. It is not the only vendor, but it covers almost the entire range of storage virtualization tasks.

Here are a few other vendors that solve data storage tasks on software-defined architectures:
• EMC with their ScaleIO lets you combine any number of x86 servers with disk shelves into a single fast storage pool. There is the theory, and then there is the practice of building a fault-tolerant system on domestic, not-the-most-reliable servers.



• The Russian vendor RAIDIX. There is a separate write-up about them and their solutions.


For a number of specific tasks, such as video editing, their architecture replaces an 80–100 thousand dollar storage system with one costing 10–20 thousand.

• Riverbed has a neat solution with which we connected all of a bank's branches to the storage system in Moscow so that each branch saw it as part of its own city LAN, with quasi-synchronous replication through a cache.


Servers in cities 1 and 2 access the storage in Moscow as if it were a local, "in-the-box" disk, at LAN speed. If necessary, they can also work with it directly (case 3, a disaster recovery office), but that already means the usual signal delays from the city to Moscow and back.

• In addition, Citrix and some other vendors have similar solutions, but, as a rule, they are more focused on the company's own products.

• Nutanix addresses hyper-converged storage, but it is often expensive because they sell a combined hardware-software appliance, and the software is sold separately from the hardware only at very, very large volumes.

• Red Hat offers Ceph and Gluster, but, open-source image notwithstanding, these guys have joined the sanctions.

I have the most hands-on experience with DataCore, so I apologize in advance (and please add corrections) if I accidentally overlook someone's cool features.

Actually, what you need to know about this company: they are Americans (but have not joined the sanctions, since they are not even listed on a stock exchange), have been on the market for 18 years, and all this time have been developing the same product under the same guy who ran it at the very beginning: software for building storage, SANsymphony-V, which I will call SSV for short. Since their chief is an engineer, they polished the technology and barely thought about marketing. As a result, almost nobody knew them by name until last year; they made their money by embedding their technology into partners' solutions under other brands.

About symphony


SSV is software-defined storage. From the consumer (host) side, SSV looks like an ordinary storage system; in fact, it looks like a disk plugged directly into the server. In practice this is usually a virtual multiport disk whose two physical copies are accessible through two different DataCore nodes.

From this follows the first basic function of SSV, synchronous replication: most DataCore LUNs actually used in practice are fault-tolerant disks.

The software can be installed on (almost) any x86 server, and almost any block device can be used as a resource: external storage systems (FC, iSCSI), internal disks (including PCI-SSD), DAS, JBOD, all the way up to attached cloud storage. "Almost", because there are hardware requirements.

SSV virtual disks can be presented to any host (with the exception of IBM i5/OS).

Simple application (virtualizer / FC / iSCSI target):



And more interesting:



Sweet functionality


SSV has a whole range of functions - caching, load balancing, Auto-Tiering and Random Write Accelerator.

Let's start with caching. The cache here is all the free RAM of the server on which DataCore is installed; it works for both writes and reads, with a maximum size of 1 TB. ScaleIO and RAIDIX, by contrast, do not use RAM but instead load the disks of "their" servers or controllers. RAM gives a faster cache.

This DataCore architecture bets on speed and reliability. In my opinion, for the practical tasks of a medium-sized business today, this is the fastest and yet still quite affordable cache you can get.

The same RAM-based cache also hosts the random write optimization function, useful, for example, under OLTP workloads.



The principle of the optimizer is very simple: the host sends random blocks of data (for example, from SQL) to a virtual disk; they land in the cache (RAM), which by its nature can absorb random blocks quickly, and there they are arranged into sequential order. Once a sufficient run of sequential data has accumulated, it is destaged to the disk subsystem.
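
The idea can be sketched in a few lines of Python. This is a toy model, not DataCore's code: the buffer structure and flush policy here are invented purely to illustrate the principle of turning random writes into a sequential stream.

```python
# Toy model of a write-back cache that turns random writes into sequential flushes.
# Nothing here is the vendor's implementation; it only illustrates the principle.

class RandomWriteAccelerator:
    def __init__(self, backend, flush_threshold=1024):
        self.backend = backend              # object with write_sequential(list_of_(lba, data))
        self.flush_threshold = flush_threshold
        self.buffer = {}                    # lba -> data; overwrites coalesce for free

    def write(self, lba: int, data: bytes) -> None:
        """Host-facing write: lands in RAM and is absorbed immediately."""
        self.buffer[lba] = data
        if len(self.buffer) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        """Destage the accumulated blocks in LBA order, i.e. as a sequential stream."""
        batch = sorted(self.buffer.items())  # random arrival order -> sequential layout
        self.backend.write_sequential(batch)
        self.buffer.clear()
```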

Roughly the same place handles read-ahead, block grouping, multithreading, block consolidation, protection against boot/login storms, and the "blender effect". If the management software understands what the host application is doing (for example, reading a VDI image according to a standard pattern), the data can be read before the host requests it, because the host has asked for the same thing several times in the same situation. It makes sense to put that data into the cache the moment it becomes clear what the host is doing.
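
A read-ahead heuristic of that kind can also be sketched. Again, this is a simplification with made-up thresholds, not the vendor's logic: once a few sequential reads are observed, the next blocks are pulled into the cache before the host asks for them.

```python
# Simplified read-ahead: spot a sequential access pattern and prefetch the next
# blocks before the host asks. Thresholds are arbitrary, chosen for illustration.

class ReadAhead:
    def __init__(self, backend, window=8, trigger=3):
        self.backend = backend      # object with read(lba) -> bytes
        self.cache = {}             # prefetched blocks: lba -> data
        self.last_lba = None
        self.seq_run = 0
        self.window = window        # how far ahead to read
        self.trigger = trigger      # sequential hits needed before prefetching

    def read(self, lba: int) -> bytes:
        # Track whether the host is reading sequentially (e.g. booting a VDI image).
        if self.last_lba is not None and lba == self.last_lba + 1:
            self.seq_run += 1
        else:
            self.seq_run = 0
        self.last_lba = lba

        if self.seq_run >= self.trigger:
            # The pattern is recognized: pull the next blocks into the cache
            # before the host requests them.
            for ahead in range(lba + 1, lba + 1 + self.window):
                self.cache.setdefault(ahead, self.backend.read(ahead))

        if lba in self.cache:
            return self.cache.pop(lba)
        return self.backend.read(lba)
```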

Auto-Tiering means that every virtual disk is built on a pool that can include the most varied media, from PCI-SSD and FC storage down to slow SATA and even external cloud storage. Each medium is assigned a tier from 0 to 14, and the software automatically redistributes blocks between the media depending on how often each block is accessed. That is, archive data ends up on SATA and other slow media, while hot database fragments end up, say, on SSD. All available resources are used automatically and optimally; there is no manual shuffling of files.
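
A minimal sketch of such a policy in Python. Only the 0–14 tier convention comes from the description above; the promotion and demotion thresholds, the default tier and the class itself are invented for illustration.

```python
# Toy auto-tiering: blocks are promoted to faster tiers when they are hot,
# demoted when they are cold. Tier 0 is the fastest medium, 14 the slowest.

from collections import defaultdict

class AutoTiering:
    def __init__(self, hot_threshold=100, cold_threshold=5):
        self.access_count = defaultdict(int)   # block id -> hits since last rebalance
        self.tier_of = defaultdict(lambda: 7)  # block id -> current tier (middle by default)
        self.hot_threshold = hot_threshold
        self.cold_threshold = cold_threshold

    def record_access(self, block: int) -> None:
        self.access_count[block] += 1

    def rebalance(self) -> None:
        """Called periodically (every 30 seconds by default in the product described here)."""
        for block, hits in self.access_count.items():
            if hits >= self.hot_threshold and self.tier_of[block] > 0:
                self.tier_of[block] -= 1          # promote towards SSD / PCI-SSD
            elif hits <= self.cold_threshold and self.tier_of[block] < 14:
                self.tier_of[block] += 1          # demote towards SATA / cloud
        self.access_count.clear()                 # start a fresh statistics window
```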



Statistics are evaluated and blocks moved once every 30 seconds by default, but only if this does not introduce delays for current read and write operations. Load balancing is present both as an analogue of RAID 0, striping across the physical media in a disk pool, and as the ability to use both cluster nodes fully (active-active) as primaries, which lets you load the adapters and the SAN network more efficiently.
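
The RAID-0-style part of that balancing is just an address mapping. A hedged sketch follows; the stripe size and device names are arbitrary choices for the example, not anything prescribed by the product.

```python
# RAID-0-style striping across the physical media of a disk pool:
# consecutive stripes land on consecutive devices, so all media share the load.

STRIPE_SIZE = 128 * 1024  # 128 KiB stripes, an arbitrary choice for this example

def locate(offset: int, devices: list[str]) -> tuple[str, int]:
    """Map a logical byte offset on the virtual disk to (device, offset on that device)."""
    stripe_index = offset // STRIPE_SIZE
    within_stripe = offset % STRIPE_SIZE
    device = devices[stripe_index % len(devices)]
    device_offset = (stripe_index // len(devices)) * STRIPE_SIZE + within_stripe
    return device, device_offset

# Example: a pool striped over three physical disks
print(locate(300 * 1024, ["disk-A", "disk-B", "disk-C"]))   # -> ('disk-C', 45056)
```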

Using SSV you can, for example, build a metro cluster between storage systems that do not support this feature themselves or require expensive extra equipment for it. And at the same time (given a fast channel between the nodes) you lose nothing, but actually gain in performance and functionality, plus get a performance margin.

Architecture




There are only two SSV architectures.

The first is SDS, software-defined storage. A classic "heavy" storage system is, for example, a physical rack containing a RISC server and SSD shelves (or HDD arrays). Beyond the actual price of the disks, the cost of such a rack is largely determined by the architecture, which matters a great deal for high-reliability deployments (banks, for example). The price difference between home-grown x86 builds and a comparably sized storage system from EMC, HP or another vendor is roughly a factor of two for a similar set of disks. Roughly half of that difference is the architecture.

So, of course, you can combine several x86 servers with disk shelves into a single fast network and teach them to work as a cluster; there is special software for that, for example EMC ViPR. Or you can build storage out of a single x86 server, stuffing it with disks to the brim.

SDS is essentially such a server, with the only difference that in 99% of practical cases there will be two nodes, and on the backend there can be just about anything.

Technically, these are two x86 servers. They run Windows and DataCore SSV, with synchronization (block-level) and control (IP) links between them. These servers sit between the hosts (consumers) and the storage resources, for example a pile of shelves full of disks. The one restriction is that there must be block-level access on both sides.

The clearest way to describe the architecture is the block write procedure. A virtual disk is presented to the host as an ordinary block device. The application writes a block to the disk; the block lands in the RAM of the first node (1), is then copied over the synchronization channel into the RAM of the second node (2), and is then written to the backend of the first node (3) and of the second node (4).



As soon as the block exists in two copies, the application receives the write acknowledgement. The configuration of the DataCore platform and backend depends only on the load requirements of the hosts. As a rule, system performance is limited by the resources of the adapters and the SAN network.
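
The write path just described, steps (1) through (4) with the acknowledgement issued once two in-RAM copies exist, can be condensed into a short Python sketch. The `Node` class and the dict-based backend are stand-ins invented for this example, not real DataCore interfaces.

```python
# Sketch of the synchronous mirror write path: the host gets its acknowledgement
# once the block exists in the RAM cache of BOTH nodes; destaging to the backends
# happens afterwards. Classes are illustrative stand-ins, not a vendor API.

class Node:
    def __init__(self, name, backend):
        self.name = name
        self.cache = {}          # RAM write cache: lba -> data
        self.backend = backend   # dict used as a fake block device

    def cache_write(self, lba, data):
        self.cache[lba] = data                   # steps (1) and (2)

    def destage(self, lba):
        self.backend[lba] = self.cache.pop(lba)  # steps (3) and (4)

def mirrored_write(lba, data, node1, node2):
    node1.cache_write(lba, data)   # (1) block lands in the RAM of the first node
    node2.cache_write(lba, data)   # (2) copied over the sync channel to the second node
    ack = "ACK to host"            #     two copies exist -> host gets its confirmation
    node1.destage(lba)             # (3) first node writes to its backend
    node2.destage(lba)             # (4) second node writes to its backend
    return ack

n1, n2 = Node("dc-node-1", {}), Node("dc-node-2", {})
print(mirrored_write(42, b"payload", n1, n2))
```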

The second is Virtual SAN, i.e. virtual storage. DataCore SSV sits in a virtual machine running Windows Server, and DataCore is given the storage resources attached to that host (hypervisor). These can be both internal disks and external storage systems, and in the current version there can be from 2 to 64 such nodes. DataCore lets you pool resources "underneath" all the hypervisors and distribute that capacity dynamically.

As in the previous architecture, there are two physical copies of each block. In practice, these are most often internal server disks. The typical use is building a fault-tolerant mini data center without external storage: 2–5 nodes, to which you can add more when new compute or storage resources are needed. This is a particular instance of the now fashionable idea of a hyperconverged environment, the kind used by Google, Amazon and others.

Simply put, you can build an enterprise-grade environment out of a pile of not-the-most-reliable and not-the-fastest x86 hardware: stuff the machines with disks and go off to conquer the world on a small budget.



This is what the resulting system can do:



Two practical tasks


Task number 1. Build a Virtual SAN. There are three virtualization servers: two in the main data center and one in the backup data center.
They need to be united into a single geographically distributed VMware vSphere 5.5 virtualization cluster that provides fault tolerance and backup using the following technologies:
• VMware High Availability;
• VMware DRS load balancing;
• redundant data communication channels;
• a virtual storage network.

Provide the following modes of operation:
1) Normal operation.
Normal operation means all VMs are running in the virtualization cluster across the main and backup data centers.
2) Emergency operation.
Emergency operation covers the following conditions:
a) all virtual machines keep running in their data center in case of:
- network isolation of a virtualization server within its data center;
- failure of the virtual storage network between the main and backup data centers;
- LAN failure between the main and backup data centers.
b) all virtual machines are automatically restarted on the other cluster nodes in case of:
- failure of one or two virtualization servers;
- failure of an entire data center site in a disaster (failure of all its virtualization servers).




Server hardware specifications

Server number 1
• Server: HP DL380e Gen8
• 2 × processor: Intel Xeon Processor E5-2640 v2
• RAM capacity: 128 GB
• 10 × HDD: HP 300 GB 6G SAS 15K 2.5in SC ENT HDD
• Network interfaces: 2 × 10 Gb, 4 × 1 Gb

Server number 2
• Server: HP DL380e Gen8
• 2 × processor: Intel Xeon Processor E5-2650
• RAM capacity: 120 GB
• 8 × HDD: HP 300 GB 6G SAS 15K 2.5in SC ENT HDD
• Network interfaces: 2 × 10 Gb, 4 × 1 Gb

Server number 3
• Server: IBM x3690 X5
• 2 × processor: Intel Xeon Processor X7560 8C
• RAM type: IBM 8GB PC3-8500 CL7 ECC DDR3 1066 MHz LP RDIMM
• RAM capacity: 264 GB
• 16 × HDD: IBM 146 GB 6G SAS 15K 2.5in SFF SLIM HDD
• Network interfaces: 2 × 10 Gb, 2 × 1 Gb




Solution:
• Use the existing hardware to build the virtualization subsystem.
• On the basis of the same equipment and the servers' internal drives, build a virtual storage network with synchronous volume replication using DataCore software.

A virtual server, a DataCore node, is deployed on each virtualization server, and virtual disks created on the local disk resources of the virtualization servers are attached to the DataCore nodes. These disks are combined into disk resource pools, from which mirrored virtual disks are created. The disks are mirrored between two DataCore Virtual SAN nodes: the "original" disk resides in the disk resource pool of one node, its "mirror copy" in the pool of the second. The virtual disks are then presented to the virtualization servers (hypervisors) or directly to virtual machines, as sketched below.
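
As a rough illustration of the resulting topology, here is a data-model sketch only, with invented node, disk and host names, not DataCore's configuration language: each node owns a pool built from local disks, and every mirrored virtual disk has one copy in the pools of two different nodes.

```python
# Data-model sketch of the Virtual SAN layout from Task 1: DataCore node VMs on top
# of the hypervisors, local disks pooled per node, and mirrored virtual disks whose
# two copies live in the pools of two different nodes. All names are hypothetical.

pools = {
    "dc-node-1": ["esx1-local-disk1", "esx1-local-disk2"],
    "dc-node-2": ["esx2-local-disk1", "esx2-local-disk2"],
    "dc-node-3": ["esx3-local-disk1"],
}

mirrored_vdisks = [
    # (virtual disk, node holding the "original", node holding the "mirror copy")
    ("vd-vmfs-prod-01", "dc-node-1", "dc-node-2"),
    ("vd-vmfs-prod-02", "dc-node-2", "dc-node-3"),
]

def present(vdisk: str, hosts: list[str]) -> None:
    """Show where the two copies of a mirrored virtual disk live and who sees it."""
    original, mirror = next((o, m) for v, o, m in mirrored_vdisks if v == vdisk)
    print(f"{vdisk}: copy A in pool of {original}, copy B in pool of {mirror}, "
          f"presented to {', '.join(hosts)}")

present("vd-vmfs-prod-01", ["esx1", "esx2", "esx3"])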

The result is cheap and cheerful (colleagues suggest: "competitive for the price") and requires no additional hardware. Besides solving the immediate task, the storage network gains a lot of useful extra functionality: higher performance, integration with VMware, snapshots of entire volumes, and so on. For further growth you only need to add virtual cluster nodes or upgrade the existing ones.

Here is the diagram:



Task number 2. Unified Storage System (NAS / SAN).

It all started with a Windows failover cluster for a file server. The customer needed a file server for storing documents, with high availability, data backup and near-instant data recovery. A cluster was the chosen answer.

Of existing hardware, the customer had two Supermicro servers (one of them with a SAS JBOD attached). There was more than enough disk space in the two servers (about 10 TB per server), but a cluster requires shared storage. A backup of the data was also planned, since a single storage system is a single point of failure, preferably with CDP covering the working week. The data must be available at all times; the maximum allowable downtime is 30 minutes (otherwise heads will roll). The standard solution would have meant buying a storage array plus another server for backups.

Solution:
• DataCore software is installed on each server.
In the DataCore architecture, a Windows failover cluster can be deployed without shared SAN storage (using internal server disks), or with DAS, JBOD or external storage systems, making full use of the DataCore Unified Storage (SAN & NAS) architecture: it takes advantage of Windows Server 2012 with NFS and SMB (CIFS) while also providing SAN services to external hosts. That is the architecture that was eventually deployed, and the disk space not needed by the file server was presented as SAN to the ESXi hosts.



It turned out very cheap in comparison with traditional solutions, plus:


Main principle


The basic principle of storage virtualization is, on the one hand, to hide the entire backend from the consumer and, on the other, to give any backend whatever genuinely competitive functionality is needed.
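
In code terms the principle is essentially a facade: whatever the backend is, the consumer sees the same block interface, and features such as mirroring are bolted on in the virtualization layer. A hedged sketch under that interpretation, with made-up class names and no relation to any vendor's API:

```python
# The virtualization layer hides heterogeneous backends behind one block interface
# and adds features (here: synchronous mirroring) the backends themselves may lack.

from abc import ABC, abstractmethod

class BlockBackend(ABC):
    @abstractmethod
    def write(self, lba: int, data: bytes) -> None: ...
    @abstractmethod
    def read(self, lba: int) -> bytes: ...

class DictBackend(BlockBackend):
    """Stand-in for anything with block access: FC shelf, internal disk, cloud volume."""
    def __init__(self):
        self.blocks = {}
    def write(self, lba, data):
        self.blocks[lba] = data
    def read(self, lba):
        return self.blocks[lba]

class VirtualDisk(BlockBackend):
    """What the host sees: one ordinary disk, regardless of what is behind it."""
    def __init__(self, *copies: BlockBackend):
        self.copies = copies
    def write(self, lba, data):
        for c in self.copies:            # mirroring added by the virtualization layer
            c.write(lba, data)
    def read(self, lba):
        return self.copies[0].read(lba)

vd = VirtualDisk(DictBackend(), DictBackend())   # e.g. an FC shelf plus internal disks
vd.write(0, b"hello")
print(vd.read(0))
```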

Important practical notes about DataCore


Source: https://habr.com/ru/post/272795/

