So, you have just finished restoring your system after a crash. Fortunately, this time it worked out: the services, thank God, are back up. But your boss is not satisfied and demands that you "draw conclusions and take action." Clearly, it is time to think about how to live on. Perhaps it is worth reconsidering the existing approach to building the infrastructure and taking steps to ensure the resiliency of your production system.
Disclaimer: I apologize in advance to SAN specialists and to the respected Habr community for the liberties and simplifications I have allowed myself in preparing this material. It is intended for those who are not yet familiar with the storage technologies built into Windows Server 2012 R2, as well as for those who do not have the opportunity to deploy a full-fledged FC or iSCSI storage network.
Where to start? I would venture to suggest that solving the fault tolerance problem requires two things:
- A good plan.
- Money (a budget).
Strictly speaking, there is also a third ingredient, skilled hands, but discussing it is beyond the scope of this article.
With money, everything is clear. As for the plan, we need to design an architecture such that the next time a failure occurs (and it will certainly occur!) the system survives it. Here a small digression is needed. The fact is that modern terminology includes several well-established concepts that are often confused with one another. Here they are:
High Availability (HA) - the ability to minimize both planned and unplanned downtime. That is, we (read: the customer) agree in advance that in the event of a failure it will take some reasonable time to switch to backup equipment and bring up the "fallen" services on it. A break in the connection is inevitable. A typical example: a Hyper-V cluster.
Fault Tolerance (FT) - the ability to keep working when one or more components fail. This is when a failure has occurred but nobody except the admin noticed it. Or when we take one of the nodes down for scheduled maintenance (for example, to install updates) and the second node picks up the entire load in the meantime. Connections are not dropped, applications remain available, only the response time increases slightly. A typical example: RAID 1.
Disaster Recovery (DR) - the ability to bring the service back up relatively quickly after a global cataclysm. This is when everything collapses at once. Hurricanes, tsunamis and floods are the favorite examples of such cataclysms. However, in the realities of our country a power outage in the data center, whether local (an excavator cutting a cable in the ground) or a rolling blackout, as well as a flooded basement, is far more likely. A typical example: a backup data center.
Why does terminology matter? Because the tasks listed above are fundamentally different, and the approaches to solving them must differ accordingly. First of all, we have to decide what we want: high availability, fault tolerance or disaster recovery.
In this article we will discuss only one of the three, namely fault tolerance. In fairness, it should be noted that a real need for it does not arise that often. In practice, most customers are quite willing to put up with the small downtime written into the SLA in exchange for substantial savings from avoiding prohibitively expensive "ultra-reliable" solutions. For example, if Excel hangs for users for a few minutes, that is not a big problem for the business; if anything, it is a reason to stretch and get some coffee. However, there are services that are extremely sensitive to even a brief loss of network connectivity, for example DBMSs and hypervisors. If Hyper-V loses contact with the virtual hard disks on which its virtual machines are running, the consequences can be dire. The same goes for SQL Server: a server suddenly losing its database can keep a DBA busy for a long time.
So, we have decided to build a fault-tolerant solution. In other words, we need to eliminate possible points of failure at every existing level: the server level, the network level and the storage level. How is this achieved? Of course, by duplicating everything that can be duplicated: servers, network interfaces, data transfer channels and, of course, the disk subsystem. Here the bright image of a SAN appears before our eyes. Indeed, what could be better for fault tolerance than a good old hardware FC SAN? The trouble is, this solution has one fatal flaw: the price. The cost of such a system starts at seven-digit numbers, and the upper limit is practically unbounded. Therefore you cannot just go out and buy one; at the very least, you will have to budget for it.
In addition, by purchasing expensive hardware we become seriously dependent on the vendor, since compatibility with third-party equipment is far from guaranteed. And scaling such systems can be quite time-consuming: nobody keeps expensive parts in stock, so they have to be ordered and waited for, for weeks or even months. Meanwhile the boss demands it "here and now", working "like clockwork" and at minimal cost!
Where is the way out? Are the bright dreams of a SAN not destined to come true? Wait... what is a SAN, anyway? In essence, it is simply a way of sharing high-performance storage devices between servers at the block level. Once upon a time classic FC SANs were practically the only option where exceptional performance, fault tolerance and scalability were required: FC speeds far exceeded the typical Ethernet of those days with its 100 Mbit/s.
But over time 1 and 10 Gbit Ethernet adapters appeared, along with software and hardware technologies for increasing network bandwidth (NIC teaming, data compression, etc.), which somewhat eroded the advantages of FC and led to explosive growth in the popularity of the iSCSI interface. As for resiliency and scalability, there has been progress there too. For example, options appeared for building SAN-like storage on the SAS interface, which, generally speaking, was originally intended for connecting storage directly to a server (DAS). The point is that SAS, in addition to high speeds (6-12 Gbit/s), has another significant advantage: very low latency. This is very important for heavily loaded hosts such as Hyper-V.
And what do we have besides SAN? Besides SAN there is only NAS. The main difference between the two is that a SAN works at the block level: a host system can create logical partitions on it, format them and use them like ordinary local hard drives. A NAS works at the file system level and uses file transfer protocols such as SMB/CIFS. Therefore NAS is, of course, cheap and simple, but oh so slow, and thus unpromising for highly loaded production systems.
Is it possible to somehow combine the high speed and reliability of a SAN with the simplicity and affordability of a NAS? What if we try to implement part of the SAN functionality in software? Apparently, that is roughly how the engineers of a modest Redmond company reasoned while preparing their new technology for market. As a result, they really did come up with something formally reminiscent of a SAN, but several times cheaper. We are invited to use very inexpensive and widely available components to prepare a gourmet dish called the Scale-Out File Server. The word "scale-out", in my opinion, does not quite accurately reflect the essence, since first and foremost the server turns out to be fault-tolerant.
So, today we will prepare the “Soup from SAN” based on Microsoft Windows Server 2012 R2 technologies.
As ingredients, we will need:
- servers without disks (only small "mirrors" for the OS) - 2 pcs;
- an inexpensive JBOD disk shelf with two SAS interfaces - 1 pc;
- SAS HDDs - at least 10 pcs (the more, the better);
- SAS SSDs - at least 2 pcs;
- 1-10 Gbit network adapters (preferably with RDMA support) - 2-4 pcs.
As seasoning we will use the recommended set of spices: Storage Spaces, Tiering, SMB Direct, SMB Multichannel, CSV. Preparation time: 1-1.5 hours with experience, 1-2 days without.
Some theory
Windows Server 2012 and Windows 8 introduced an interesting technology called Storage Spaces. It is intended to abstract away the physical layer of the disk subsystem. In essence, it is an operating system driver sitting after the partition manager and before the volume manager that virtualizes block storage, hiding it from the operating system. This is achieved by grouping physical disks into pools and creating virtual disks (LUNs in SAN terminology) on top of those pools. All applications then deal with the virtual disks without even knowing what they are made of. But wait... virtual disks again? After all, Microsoft already shipped this technology under the name "dynamic disks" (more precisely, licensed it from Veritas) back in 2000, as part of Windows 2000! Are we being sold stale goods again?
Not quite... Unlike dynamic disks, Storage Spaces is a much more intelligent technology, as we will see later. In the meantime, let's clarify the terms:
Storage pool - a collection of physical disks that allows you to combine disks, flexibly expand capacity and delegate administration.
Storage space - a virtual disk created from the free space of a storage pool. Attributes of a storage space include the resiliency level, storage tiers, fixed provisioning and precise administrative control.
Clustered Storage Spaces - the same storage spaces, but located on shared storage. Exactly what we need!
How do we create a virtual disk? In our case, we first need to pool the physical SAS disks, both HDD and SSD. Generally speaking, you can pool drives with different interfaces: SATA, SCSI and even USB. But for deploying a failover cluster (Scale-Out File Server) only disks with a SAS interface are suitable. Combining disks into a pool presents no difficulty and is done with a wizard in just a couple of clicks. Of course, the disks being combined must not contain any partitions, otherwise those will have to be deleted. Having grouped the disks into a pool, we still need to explain to the operating system what to do with them next: create a virtual hard disk (LUN) from the disk pool. Storage Spaces lets you create three kinds of virtual disks, similar to RAID levels:
- Simple (analogous to RAID 0) - recommended only for tests;
- Mirror (analogous to RAID 1) - recommended for production workloads;
- Parity (analogous to RAID 5) - recommended for volumes with archival data.
As with hardware RAID, you can set aside one or two disks as hot spares. The features listed above are available in the default GUI. If that is not enough, PowerShell cmdlets let you build trickier combinations corresponding, for example, to RAID 10.
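For illustration, here is a rough PowerShell sketch of creating a pool and a RAID-10-like mirrored space; the pool, disk and space names are made up, and the subsystem lookup is simplified to the first one found:

```powershell
# Gather all physical disks that are eligible for pooling (no partitions on them).
$disks = Get-PhysicalDisk -CanPool $true

# Create a pool on the local Storage Spaces subsystem (names are illustrative).
$subsys = Get-StorageSubSystem | Select-Object -First 1
New-StoragePool -FriendlyName "Pool1" `
                -StorageSubSystemUniqueId $subsys.UniqueId `
                -PhysicalDisks $disks

# Two data copies striped across two columns gives a layout similar to RAID 10;
# this requires at least four disks in the pool.
New-VirtualDisk -StoragePoolFriendlyName "Pool1" `
                -FriendlyName "FastSpace" `
                -ResiliencySettingName Mirror `
                -NumberOfDataCopies 2 `
                -NumberOfColumns 2 `
                -UseMaximumSize
```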
The reader may object: if this is software RAID, it must be slower than hardware RAID! And the reader will be absolutely right. Yes, it is slower. But by how much? It is not that simple. First, as practice shows, the speed of individual disks is easily compensated by their number: the more disks we put in the pool, the less noticeable the difference in performance. For production use at least 12 disks are recommended. Second, Windows Server 2012 R2 introduced a great feature: combining SSDs and HDDs in one pool to form a so-called hybrid pool (Tiered Storage). In this case the system itself tracks the most frequently used data and moves it to the fast SSDs (remember, I said the system is smart!). Moreover, the "hot" data is moved to the SSDs block by block rather than at the file level, and with PowerShell cmdlets you can explicitly specify which files to place on the SSDs and which on the HDDs. Third, Storage Spaces supports a write-back cache: during short bursts of write operations the system intercepts the data and places it in a special area on the SSDs, which smooths out the performance drop during sudden peak loads. Taken together, a large number of disks, a hybrid pool and a write-back cache can significantly improve system performance and minimize the negative effect of software RAID. As for saving disk space, Storage Spaces supports the familiar SAN technology of thin provisioning for more economical allocation of disk resources. In fairness, note that thin provisioning is incompatible with the hybrid pool, so you will have to choose one or the other.
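A hedged sketch of what tiering and the write-back cache look like in PowerShell; the pool and tier names, sizes and file path are all illustrative, and tiered spaces use fixed provisioning:

```powershell
# Define SSD and HDD tiers inside an existing pool (names and sizes are illustrative).
$ssdTier = New-StorageTier -StoragePoolFriendlyName "Pool1" -FriendlyName "SSDTier" -MediaType SSD
$hddTier = New-StorageTier -StoragePoolFriendlyName "Pool1" -FriendlyName "HDDTier" -MediaType HDD

# A mirrored, tiered virtual disk with a 1 GB write-back cache.
New-VirtualDisk -StoragePoolFriendlyName "Pool1" `
                -FriendlyName "TieredSpace" `
                -ResiliencySettingName Mirror `
                -StorageTiers $ssdTier, $hddTier `
                -StorageTierSizes 100GB, 900GB `
                -WriteCacheSize 1GB

# Optionally pin a specific hot file to the SSD tier (path is hypothetical);
# the data is actually moved during the next tier optimization job.
Set-FileStorageTier -FilePath "C:\ClusterStorage\Volume1\vm1.vhdx" `
                    -DesiredStorageTierFriendlyName "SSDTier"
```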
So, Storage Spaces gives us fault tolerance at the storage level. Now let's move one level up: the servers need to be clustered. This capability has been available from Microsoft for quite some time. However, previously such a cluster could only be called highly available (recall the terminology). Only with the release of Windows Server 2012 did it become possible to make it truly fault-tolerant. This feature is called the Scale-Out File Server. Here it is worth recalling that file clusters can work in one of two modes:
- "Active - Passive"; failover with service interruption - failover.
- "Active - Active"; transparent failover - transparent failover.
In the first case only one of the nodes is active; it is the one clients exchange data with, while the second stands by. If the first node fails, the second takes over the entire load, but this inevitably means a dropped connection and an interruption of service. This is how the file cluster works in Windows Server 2008 R2. In the second case both nodes are active and can receive data from clients simultaneously. If one of the nodes fails, the SMB session is not lost, so applications are not interrupted. This capability appeared only in Windows Server 2012.
But for such simultaneous access to the storage to be possible, one more technology was needed: Cluster Shared Volumes (CSV). Without going into details, this is a logical volume specially prepared for simultaneous use by several nodes of the cluster.
And what about the network level? Here Microsoft has a few pleasant surprises in store. The first is SMB Direct, built on top of RDMA. Simply put, this is direct memory access through the network interface, without the overhead of the central processor. With this feature enabled, network adapters effectively work at wire speed, providing high bandwidth and extremely fast response to network requests, which gives a huge performance gain for workloads such as Hyper-V and SQL Server. Let's just say that working with a remote file server becomes similar to working with local storage. And although network adapters with RDMA support are still quite expensive, their cost is continuously decreasing (at the time of writing, about 20 thousand rubles).
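A quick way to check from PowerShell whether the installed adapters actually expose RDMA to SMB (the output naturally varies per system):

```powershell
# Is RDMA enabled on the physical network adapters?
Get-NetAdapterRdma

# Does the SMB client consider its interfaces RSS/RDMA capable?
Get-SmbClientNetworkInterface

# The same picture from the file server side:
Get-SmbServerNetworkInterface
```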
The second surprise is called SMB Multichannel. If both the load consumer (for example, SQL Server) and the receiving side (the file cluster) have two network adapters each, a multi-channel SMB connection is established between client and server. This means that if, for example, a file is being copied over the network and something happens to one of the network adapters during the copy, the process is not interrupted: the file keeps copying as if nothing had happened. To verify that SMB Multichannel is in use, run the PowerShell cmdlet Get-SmbMultichannelConnection. You will see something like this:

As you can see, the connection is established using two network interfaces at once.
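A couple of related cmdlets, with an illustrative server name, in case you want to inspect or constrain the multichannel behaviour:

```powershell
# Show active multichannel connections to a particular file server (name is illustrative).
Get-SmbMultichannelConnection -ServerName ScaleOutFS

# Optionally restrict SMB traffic to specific interfaces only.
New-SmbMultichannelConstraint -ServerName ScaleOutFS -InterfaceAlias "Ethernet 1", "Ethernet 2"
```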
Finally, the SMB 3.0 protocol optimizes load balancing between nodes, which is exactly what a Scale-Out File Server configuration needs.
Well, now that we have briefly run through the technologies, it is time to move on to practice. To keep things simple we will build a two-node cluster, although nothing prevents us from building a fault-tolerant file cluster with up to 8 nodes. Let's describe the procedure from the very beginning and then carry it out.
Preparatory work
So, we take a disk shelf in JBOD mode and fill it with disks, at least two of which must be SSDs. The shelf should have two SAS expanders with two ports each; through them we connect the shelf to two servers, preferably identical ones. Simple single-socket servers are quite suitable for this purpose. On the servers we install ordinary SAS HBAs as controllers.

The next steps (a PowerShell sketch of the same sequence follows the list):
- Install Windows Server 2012 R2 on each server.
- Configure network connections, install updates, join the servers to the domain.
- Add the File Server role on each server.
- On one of the servers, open the Failover Cluster Manager console.
- Using the wizard, create a standard failover cluster.
- Create a new pool: Storage -> Pools -> New Storage Pool.
- Add the SSD and HDD drives to the pool and, if necessary, specify access parameters.
- Create a virtual disk: Pool -> right-click -> New Virtual Disk.
- In the wizard, set the resiliency type (Mirror).
- In the wizard, create a volume on the disk, assign a letter and format it as NTFS.
- Create a Cluster Shared Volume (CSV): select the desired disk -> Add to Cluster Shared Volumes.
- Configure the role: Roles -> Configure Role -> File Server -> Scale-Out File Server for application data.
- To avoid waiting, flush the DNS resolver cache (ipconfig /flushdns).
- Select the role -> Add File Share -> SMB Share - Applications.
- Specify the location of the shared resource and give it a name.
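Below is a rough PowerShell equivalent of these steps, assuming two nodes named SRV1 and SRV2; all names, addresses, paths and sizes are illustrative and will differ in your environment.

```powershell
# 1. Add the required roles/features (repeat on the second node, or use -ComputerName).
Install-WindowsFeature FS-FileServer, Failover-Clustering -IncludeManagementTools

# 2. Create the failover cluster from the two nodes.
New-Cluster -Name FSCLUSTER -Node SRV1, SRV2 -StaticAddress 192.168.1.50

# 3. Pool the shared SAS disks (pick the appropriate storage subsystem;
#    on a cluster there may be more than one).
$disks  = Get-PhysicalDisk -CanPool $true
$subsys = Get-StorageSubSystem | Select-Object -First 1
New-StoragePool -FriendlyName "Pool1" `
                -StorageSubSystemUniqueId $subsys.UniqueId `
                -PhysicalDisks $disks

# 4. Create a mirrored space, then initialize and format it.
New-VirtualDisk -StoragePoolFriendlyName "Pool1" -FriendlyName "VDisk1" `
                -ResiliencySettingName Mirror -UseMaximumSize
Get-VirtualDisk -FriendlyName "VDisk1" | Get-Disk |
    Initialize-Disk -PartitionStyle GPT -PassThru |
    New-Partition -UseMaximumSize |
    Format-Volume -FileSystem NTFS

# 5. Add the cluster disk to Cluster Shared Volumes (the resource name will vary).
Add-ClusterSharedVolume -Name "Cluster Virtual Disk (VDisk1)"

# 6. Create the Scale-Out File Server role and the application share.
Add-ClusterScaleOutFileServerRole -Name ScaleOutFS
New-Item -Path "C:\ClusterStorage\Volume1\Shares\VMStore" -ItemType Directory
New-SmbShare -Name Share -Path "C:\ClusterStorage\Volume1\Shares\VMStore" `
             -FullAccess "DOMAIN\Hyper-V-Hosts", "DOMAIN\Domain Admins"
```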

That's it. The end result of our efforts is a file share exposed via a standard UNC path such as \\ScaleOutFS\Share. Now we can place critical file resources on it, such as Hyper-V virtual hard disks or SQL Server databases. In effect we have obtained a ready-made storage network. Its principal difference from a traditional SAN is that the connection uses SMB 3.0 rather than one of the block protocols (iSCSI/FC), which in a certain sense is even an advantage. Some may want to deploy the Hyper-V role directly on the cluster servers, placing the virtual disks on the shared storage. We will have to disappoint them: unfortunately, this configuration is not supported yet. For the Hyper-V and SQL Server roles you need to set up separate servers that access our storage over SMB.
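As an illustration, on such a separate Hyper-V host a virtual machine can keep its configuration and disk directly on the new share; the names and paths below are made up, and the share must grant the Hyper-V host's computer account full access:

```powershell
# Create a dynamic VHDX directly on the Scale-Out File Server share.
New-VHD -Path "\\ScaleOutFS\Share\VM01\disk0.vhdx" -SizeBytes 60GB -Dynamic

# Create a VM whose configuration and disk both live on the share.
New-VM -Name "VM01" `
       -MemoryStartupBytes 2GB `
       -Path "\\ScaleOutFS\Share\VM01" `
       -VHDPath "\\ScaleOutFS\Share\VM01\disk0.vhdx"
```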
It remains to take stock ...
Fault tolerance
It is provided at all levels: storage, servers and the network.
Performance
It will depend on several factors. Typically it is comparable to iSCSI performance, and when all available capabilities are used, including RDMA, storage throughput can be even higher than with an FC connection (up to 56 Gbit/s).
Scalability
At the disk subsystem level it is provided by simply adding disks or cascading JBOD enclosures; at the server level, by adding nodes to the cluster; at the network level, by adding network adapters, teaming them (NIC teaming) or replacing them with higher-bandwidth ones.
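If teaming is used, a minimal sketch looks like this (adapter and team names are illustrative); note that RDMA-capable adapters are usually left unteamed, since teaming hides RDMA and SMB Multichannel already aggregates them:

```powershell
# Combine two physical adapters into a switch-independent team.
New-NetLbfoTeam -Name "Team1" -TeamMembers "Ethernet 1", "Ethernet 2" `
                -TeamingMode SwitchIndependent
```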
Security
Disk pools can be managed using access control lists (ACLs), and authority can be delegated to administrators. Storage management is fully integrated with AD DS.
Cost
Obviously, even when using every capability built into this technology, the cost of a turnkey solution will be several times lower than that of a traditional SAN.
But what about the disadvantages? Of course, they exist too. For example, the storage and the servers cannot be separated by a considerable distance, as with FC or iSCSI, and you cannot build a complex topology: SAS switches are still rare. In addition, SAS does not support hardware replication, so it will have to be implemented in software. Therefore the concept described above is not a panacea, but merely an alternative to traditional storage systems. If you already have a deployed hardware SAN, that is by no means a reason to abandon it: the iron must earn back the money invested in it. But if you are only just thinking about the architecture of your future storage, this option is well worth considering as sound from both an engineering and an economic point of view. Finally, I would note that "soup from SAN" can be cooked not only from Microsoft technologies: if you have iSCSI storage, you can use products such as StarWind iSCSI SAN, VMware Virtual SAN, Openfiler, FreeNAS, Open-E DSS V6, etc.
Enjoy your meal!
In preparing this article, materials from the Microsoft Virtual Academy portal were used.
Arsen Azgaldov, dpandme@gmail.com