
In previous posts on this blog I gradually covered the most significant capabilities of NetApp storage systems; now it is time to dilute the dry theory with practical matters. From here on I am going to write about how all these features work in specific solutions: virtual infrastructures, databases, high-performance file services and other traditional NetApp applications.
I am aware that the heading is somewhat provocative, but I will try to defend my position in the body of the article. If you hold a different opinion about the "ideal" storage for VMware ESX, or, more broadly, for any server or desktop virtualization tool, I will be glad to discuss it in the comments.
Server and desktop virtualization is not the only market for NetApp storage systems (for example, Larry Ellison made a sensational statement this year that, according to his information, up to 60% of NetApp's business is Oracle database storage), but it is the current market trend: VMware, MS Hyper-V, Xen, and the hardware solutions built around them form the most advanced, technologically sophisticated and fastest-growing segment of the software and server market. No wonder NetApp has been so closely involved with it almost from the moment this industry was born. The ideas and principles embodied in NetApp storage systems turned out to be surprisingly well matched, technologically, to what virtualization vendors are now doing in their field.
Before we examine NetApp's main features in this area in detail, let's simply list the most interesting and "catchy" capabilities of NetApp storage systems for virtualization (the links lead to previous articles on Habré devoted to these capabilities).
Let's consider how these potential storage capabilities become real advantages for virtual systems.
1. The ability to connect disk storage to an ESX server and host its data over the NFS protocol appeared in VMware back in version 3.0, yet this connection option still seems unusual to many, especially beginners (and somehow "less serious" than FC ;). Experienced people have long since tried and adopted it; as large-scale examples I can cite giants such as T-Mobile (the world's largest "cloud" SAP hosting platform, about 2 million customers), Oracle, SAP, Deutsche Telekom, Rackspace, and many less "transnational" others, all of which use NFS as the main protocol for delivering data from disk storage to servers and hypervisors. According to analysts at the consulting firm Forrester, NFS as a datastore access protocol for VMware servers is growing steadily in prevalence and popularity, reaching 36% today (versus 18% two years ago), and has already overtaken iSCSI.

Among the many advantages of using NFS with VMware (I plan to devote a separate, detailed article to this topic later) are such interesting features as:
- The ability to create datastores far larger than with VMFS (up to 16TB in a single piece).
- Datastores can be not only grown (which VMFS also allows) but also shrunk (which VMFS does not allow, and which is often needed in dynamically provisioned "cloud" environments), in increments of just 4KB.
- High granularity. Unlike a datastore on VMFS, you can operate (for example, back up and restore) not on the whole datastore, but on an individual virtual disk of an individual virtual machine, or on its configuration file. This is very convenient when you use a large datastore with dozens or hundreds of machines on it.
- Thin by design. The "virtual disks" of virtual machines on an NFS datastore are ordinary files on a network share. They take up only as much space as the data they actually contain, not as much as was reserved when they were created. A one-terabyte VMDK that so far holds only 3GB of written data will occupy only 3GB on the storage system.
- Deduplication (covered in more detail below, and earlier in this blog) frees up space that is directly visible to the ESX server, which can immediately place new data there. Deduplication of LUNs with VMFS also frees space on the storage system, but that space does not immediately become available to the virtual machines.
- Finally, NFS runs over plain, familiar Ethernet; there is no need to build a separate, special and expensive FC infrastructure. With Gigabit and 10G Ethernet you can stay within the usual (and inexpensive) Ethernet infrastructure, and the performance difference versus FCP and iSCSI, as VMware's own test results show, is negligible.
- There are no problems with SCSI command queue depth limits or SCSI locking (which is especially important for large, dynamically reconfigured "cloud" systems running without VAAI support, which, I recall, is available only in the top-end VMware Enterprise Plus license).
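To make this less abstract, here is a rough sketch of what connecting an NFS datastore can look like. The volume name, export rule and addresses below are invented for illustration, and the exact syntax depends on your Data ONTAP and ESXi versions (older ESX hosts use esxcfg-nas instead of esxcli):

    # On the NetApp controller (7-Mode): create a flexible volume and export it to the ESXi hosts
    vol create vm_nfs aggr0 4t
    exportfs -p "rw=192.168.10.0/24,root=192.168.10.0/24" /vol/vm_nfs

    # On the ESXi host: mount the export as an NFS datastore and verify it
    esxcli storage nfs add --host 192.168.10.5 --share /vol/vm_nfs --volume-name vm_nfs_ds
    esxcli storage nfs list

After that, vm_nfs_ds shows up in vCenter like any other datastore, and everything listed above (growing, shrinking, per-VMDK operations) applies to it.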
However, the advantage of universal, multi-protocol (unified) storage is that the decision to use, say, NFS is not forced on you. A unified storage system can serve data over any of its access protocols. You can use any available protocol, and, moreover, any combination of protocols at the same time. If you need "block" protocols for LUNs (for example, for RDM), you can take FCP or iSCSI; if you want NFS, you use it alongside them, on the same storage system.
For example, it may make sense to use FCP or 10G iSCSI for a few highly loaded, critical VMs, iSCSI over Gigabit Ethernet with VMFS for less demanding VMs that still need a block protocol, and NFS for a large datastore (up to 16TB) holding, say, dozens or even hundreds of VMs with relatively light I/O. In real life this flexibility is absolutely invaluable.

2. Probably one of the most popular and spectacular capabilities of NetApp storage systems for virtual environments (not only VMware ESX, but also VMware View (VDI), MS Hyper-V and Citrix Xen) is data deduplication, that is, the elimination of repeated fragments from the stored data. Deduplication works especially well in virtual environments, since most virtual disk files will obviously contain the same OS, with the same or very similar files inside.
For this reason, deduplicating datastores can yield space savings of 50% or more (in practice, results of up to 75-85% have been seen!). In other words, after deduplication the available capacity of your storage effectively doubles or triples.

What is particularly pleasing is that you do not pay for this with a drop in speed. In the overwhelming majority of cases, users see no noticeable decrease in the performance of their datastores after deduplication.
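For reference, here is roughly what turning deduplication on looks like on a 7-Mode controller; the volume name is invented, and in clustered ONTAP the equivalent commands live under "volume efficiency":

    # Enable deduplication (A-SIS) on the datastore volume
    sis on /vol/vm_nfs

    # Deduplicate the data already on the volume; subsequent writes are picked up by the scheduled runs
    sis start -s /vol/vm_nfs

    # Watch progress, then check the space savings
    sis status /vol/vm_nfs
    df -s /vol/vm_nfs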
But the reduction of space occupied on disk is only one benefit. Just as important and interesting is that the data in the cache is deduplicated as well!
Imagine a storage system: a host server connected to it reads data blocks, as many of them as will fit end up in the cache, and the rest, those that did not fit, are slowly read from the disks (cache misses).

Now imagine that the same host, or several hosts, are reading deduplicated blocks, and the cache knows that the blocks the hosts are requesting are identical in content and occupy the same location on the physical disks.

Unlike an "ordinary" storage system, for which all blocks on disk are unique and indistinguishable from one another, so that even blocks with completely identical content each take up a place in the cache when read (classic storage knows nothing about a block's contents; to it they are simply "block number three", "number five hundred forty" and "number such-and-such"), the NetApp system knows that a block has been deduplicated, and for three read commands from the hosts it can read into the cache, and serve, just one block, identical in content to all three requested.
Thus, having cut the space occupied on disk by half or more thanks to deduplication and block-sharing technology, we also cut by half or more the space these blocks occupy in the cache; in effect, the equivalent cache on such a system becomes twice as large or more. Identical blocks are now stored without duplication not only on disk, but also in cache memory.
"So that's the second benefit," as the elephant child said about his new trunk in the well-known cartoon.
3. The situation with space savings and the virtual increase in cache capacity becomes even more useful because NetApp actively promotes a relatively recent concept that I have already written about on Habré: so-called Virtual Storage Tiering, built around a large cache in flash memory.
Such a cache solves the boot-storm and login-storm problems that are so painful for virtual infrastructures (imagine a large company using VMware View or a similar desktop product, whose employees all power on thousands of workstations at 9 am), as well as all the other "storms", because the working (and deduplicated) data set usually fits into this mega-cache with room to spare, cutting latency by roughly an order of magnitude (10x) and raising storage system performance accordingly.
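Purely as an illustration (this assumes a Flash Cache / PAM module in a 7-Mode system, and the option names may differ between Data ONTAP releases), the behaviour of this cache is controlled by a handful of options:

    # Enable the Flash Cache (FlexScale) extended cache
    options flexscale.enable on

    # Cache normal user data blocks, but keep low-priority sequential blocks out of it
    options flexscale.normal_data_blocks on
    options flexscale.lopri_blocks off

    # Observe the hit rate with the built-in counters
    stats show -p flexscale-access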

Another important benefit is the saving of resources: power, cooling and rack space. A shelf packed with disks occupies 4U of rack space, draws about 340 watts and gives off about 1,400 BTU/h of heat, while a Flash Cache board, which can deliver the performance of several such shelves, draws 18 watts, takes up no rack space at all, and gives off only 90 BTU/h. For large systems this can be a very, very significant saving.

4. Thin Provisioning, which I mentioned earlier, is ideally suited to "cloud" storage tasks, especially when the occupied volume can grow arbitrarily and unpredictably and the storage is shared by tens or hundreds of customers. Disk space is allocated to such clients dynamically, using overprovisioning: a model in which the client "sees" as much free space as was requested, while on the storage system's disks only the data actually written takes up space.

At the same time, I would like to note that there is practically no performance difference between thin and thick disks in VMware. In practice, the "fragmentation" effect of such a disk growing as it is written to is also nil.
Note also that "hardware" thin provisioning on the storage system works not only with VMware thin disks but also with thick ones (except eager-zeroed thick). So if for some reason you do not want to, or cannot, use VMware's own thin-disk mechanism, or you use a non-VMware hypervisor that does not yet offer thin disks, you can still get all the benefits of thin provisioning, implemented for you independently by the storage system.
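A minimal sketch of what this storage-side ("hardware") thin provisioning can look like on a 7-Mode system, with made-up volume and LUN names: the space guarantee is removed from the volume, and the LUN is created without a space reservation.

    # Create a volume with no space guarantee (thin volume)
    vol create vm_vol -s none aggr0 2t

    # Create an unreserved (thin) LUN for the ESX hosts inside it
    lun create -s 500g -t vmware -o noreserve /vol/vm_vol/esx_lun0

    # Optionally let the volume grow automatically as real data arrives
    vol autosize vm_vol -m 3t -i 100g on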
5. I have already written about snapshots and how NetApp uses them. I am sure you already know what they are and how convenient it is to create snapshots and roll the data back to the point in time you need. As you may also know, VMware has its own mechanism for taking snapshots of a datastore, but those who have tried it rarely speak well of it. Indeed, it must be admitted that many attempts to implement snapshots in storage systems or in software have turned out rather poorly, with numerous unpleasant side effects, such as reduced performance while they are in use and, in VMware's case, plenty of trouble when deleting them. On the whole I have to agree with the well-known Russian VMware specialist Mikhail Mikheev: "Snapshots are evil," with one amendment: "VMware snapshots are evil," because NetApp snapshots are a different matter ("...are good" ;). And here is why.
Thanks to the WAFL mechanisms, the resulting snapshots not only do not slow the storage down, they also avoid the problems described above with snapshots in VMware, which makes it possible to use them as widely as you like, not only to "freeze" particular virtual machine states, but also as full-fledged backups.
For this purpose there is a dedicated software product, SnapManager for Virtual Infrastructure, which takes on all the tasks of creating a consistent copy of a VMware datastore's contents and of restoring the datastore, or part of it, from such a copy.

The storage system's own snapshot mechanism is integrated with VMware snapshots: when the storage system takes a snapshot, then, to ensure consistency of the file system and the VM state (I/O must be quiesced at the moment the snapshot is taken), the hypervisor's work on that datastore is paused for a split second via the VMware snapshot mechanism, the snapshot is taken, and the hypervisor is released, without a "bad" VMware snapshot actually being left behind; only the problem-free "hardware" snapshots of the storage system are used.
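In practice SnapManager for Virtual Infrastructure drives all of this for you, but at the lowest level the storage-side snapshots it relies on boil down to commands like these (7-Mode syntax, names invented):

    # Take a near-instant snapshot of the entire datastore volume
    snap create vm_nfs nightly.0

    # List the snapshots kept on the volume
    snap list vm_nfs

    # Roll the whole volume, or just a single VMDK file, back to that snapshot
    snap restore -t vol -s nightly.0 vm_nfs
    snap restore -t file -s nightly.0 /vol/vm_nfs/vm01/vm01.vmdk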
6. FlexClone. You can deduplicate duplicate data, or you can simply avoid creating duplicates in the first place; at NetApp this even has its own term, "non-duplication". The same shared-blocks technology used in deduplication, where hundreds of "logical" file system blocks can reference a single physical block, underlies a feature called FlexClone.

The technology is somewhat similar to Linked Clones, but it works for any workload, since it is implemented by the storage itself.
When a clone of some data (a volume, LUN or file) is created, its contents are not copied to a new instance; instead, a new copy of the metadata is created that points to the existing data blocks. Only blocks that are later modified relative to the original consume new space, and only they. The result is, in effect, an implementation of "differencing disks".

Now, if the need arises, hundreds of working VM images can be created from a reference virtual machine image in minutes and in a very small amount of storage space, because only the changes in the clones consume space.
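As a sketch (7-Mode syntax, with hypothetical volume and snapshot names; FlexClone is a separately licensed feature), cloning a volume that holds a reference VM image looks roughly like this:

    # Snapshot the volume holding the reference ("golden") VM image
    snap create vm_gold golden_base

    # Create a writable clone backed by that snapshot; initially it consumes almost no extra space
    vol clone create vdi_pool01 -s none -b vm_gold golden_base

    # Later, if a clone needs to become fully independent of its parent, split it off
    vol clone split start vdi_pool01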

7. It is nice that almost all the NetApp storage features listed here are available from a dedicated storage "control center" integrated into vCenter. The VMware administrator no longer has to jump between two or more management tools, one for the storage with its own UI and another for VMware in vCenter. All management is now concentrated in vCenter.
This panel integrates into vCenter and is available free of charge for all NetApp storage systems. It also ships with tools that automatically apply optimal settings, for example for multipathing, SCSI timeouts and other required parameters, in accordance with the vendor's best practices.


8. Finally, in passing, it is worth mentioning the interesting secure multitenancy features, that is, the ability, when needed, to divide the storage system into several "virtual", independent storage subsystems. For example, your organization's security policy may require absolute isolation (even from the administrators!) of the storage belonging to, say, the HR or finance department from administrators of other departments. A single physical storage system can now operate in this "logically partitioned" form.
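On 7-Mode systems this partitioning is done with MultiStore "vfilers" (a licensed feature); a rough sketch with invented names and addresses:

    # Create an isolated virtual storage system for the HR department,
    # with its own IP address, its own root volume and its own data volume
    vfiler create vf_hr -i 192.168.20.10 /vol/vf_hr_root /vol/vf_hr_data

    # List the virtual filers running on the physical system
    vfiler status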
NetApp storage systems were also among the first to implement support for VAAI, which offloads some tasks from the hypervisor server to the storage system, such as creating and zero-filling volumes, copying them, and a new, more "granular" SCSI locking scheme, thereby increasing performance in large infrastructures.
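Whether a given LUN is actually being offloaded via VAAI can be checked from the ESXi host itself, roughly like this (ESXi 5.x syntax; the device identifier below is just a placeholder):

    # Show the VAAI primitives (ATS, Clone, Zero, Delete) reported for each device
    esxcli storage core device vaai status get

    # Or for one specific NetApp LUN
    esxcli storage core device vaai status get -d naa.60a98000486e533452346c4f63624f59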
In addition, NetApp develops a very interesting tool for analyzing and optimizing virtual infrastructure performance, OnCommand Insight (formerly Akorri BalancePoint), which is available independently of NetApp storage systems; I mention it here "so as not to get up twice", for those who have made it through today's indecently huge text. :)
So, to summarize:
I believe that NetApp storage systems are today the natural and best choice for any virtualization environment, such as VMware vSphere, VMware View, MS Hyper-V, Citrix Xen and others, because they offer several important and convenient capabilities at once:
- Multiprotocol operation - working over several different access protocols (FC, iSCSI, NFS) simultaneously, without having to partition the storage system or the data on it, and with uniform access to that data.
- Deduplication - saves space on the storage system, cutting occupied space by half or more by eliminating duplicate fragments, such as guest OS files in virtual disks, without compromising performance; performance can even improve thanks to the virtual increase in capacity of the dedupe-aware cache.
- Thin Provisioning - simplifies administration, saves disk space and makes it more convenient to allocate space to workloads "in the cloud".
- Flash Cache - increases performance by using flash memory as an effective caching tier that holds the hottest data blocks, without resorting to capricious and expensive SSDs.
- Snapshots - let you almost instantly capture the state of the data, make backup copies from those snapshots, and instantly restore virtual machines from them, without sacrificing performance or wasting storage space unproductively on the snapshots themselves.
- FlexClone - creates "ideal clones" of data on the storage, such as virtual machine disks or user data, that consume disk space only in the amount of changes relative to the original, allowing hundreds of volume clones to be stored in a small space.
- Virtual Storage Console - lets you conveniently administer the storage system from a page integrated into the vCenter interface, automate a number of routine procedures, automatically tune critical storage settings for best results, and give the VMware administrator control of the ESX-related storage parameters delegated to them, without bothering the storage administrator with such trifles.
- Plus a number of other goodies, such as secure multitenancy (the ability to safely divide the storage system into isolated "virtual filers" for different users), VAAI, and so on, which I have not dwelt on here so as not to make this article truly endless.
To date, no other storage vendor offers such a rich set of capabilities for working in virtual environments. And that is before saying anything about speed, reliability and ease of administration, each of which deserves a separate article.
Thus, for a price comparable to that of similar storage systems from other manufacturers, you get a system with broader connectivity thanks to multi-protocol support, greater reliability and data protection thanks to RAID-DP and snapshots, higher performance thanks to Flash Cache, and greater effective capacity thanks to deduplication, FlexClone and thin provisioning.
So it seems to me that if you are planning to deploy a virtual server infrastructure or a "cloud" system and have already shortlisted a storage system from one manufacturer or another, then before making the final choice it is worth getting to know NetApp storage systems more closely. All the more so because in most cases you do not have to buy a pig in a poke: most NetApp partner companies can provide a system for a trial so that you can evaluate its capabilities on real hardware, specifically for your task. Try it, poke at it, put it under load, and decide whether it is what you need.
A list of partner companies, who are also the right people to ask this blog's traditional question of "how much?" ;) as well as to arrange a "test drive", can be found on the official NetApp website in Russian.
That is why I call NetApp storage systems "the ideal choice for VMware" (as well as Hyper-V, Xen, KVM, and so on). What do you think: what capability is your storage system missing for you to consider it "the ideal solution for virtualization"?