Configuring Virtual Infrastructure: VDI Cluster Optimization

Well, "optimization" is a stretch. It was a creative effort to straighten out a listing, shaky infrastructure that people had been holding together with all their might on the principle of "don't touch anything, everything might break." That is a dangerous phrase, and it quickly becomes the life philosophy of an IT specialist who has stopped growing. It is the source of IT "quackery".

Half a year has passed since the person chiefly responsible for the virtual infrastructure quit. He left me the whole farm, with operational documentation in the form of a list of service records. Since then I have done a fair amount of work to shore up the foundation and improve the reliability, and even the comfort, of the whole structure. Here are the key points I want to share.

So, given:
VMware Enterprise Plus virtualization infrastructure, comprising production, a test zone and VDI. The latter is built on Fujitsu Pano Logic, a product that has not been updated for two years and apparently is no longer supported.
The cluster that got the most attention is VDI, as the largest critical service and the most resource-dense one. It is built on full clones, because Pano Manager itself does not understand linked clones, and the business does not want to buy VMware View either.

For storage we use a set of EMC arrays: several CX4-240s and a pair of VNXs. On top of them sits an extra layer, IBM SVC, used for storage consolidation and virtualization: LUNs from the arrays are presented to the SVC, pools are built from them there, and new LUNs carved from those pools are presented to the servers. All storage is connected via an FC SAN.

With no documentation, I had to figure out how everything lived and worked as I went. Some seemingly innocuous changes led to unexpectedly unpleasant consequences, exposing strange settings and crutches (hacky workarounds).

Quick navigation:

1. Storage subsystem
2. Network
3. Computational resources

1. Storage subsystem

I started on this area even before my colleague left, since the SAN was my main area of responsibility.

1.1 VAAI

The first thing that surprised me was the large number of small (1 TB) datastores. Nobody could explain why. An attempt to consolidate VMs onto fewer, larger datastores immediately exposed a problem: too many SCSI locks, resulting in high latency and, as a consequence, boot storms. The strange part was that the storage behaved as if it did not support VAAI, even though the datastore properties explicitly said "Hardware Acceleration: Supported".

It clicked when we were adding a new host to the cluster. A colleague remembered that before putting it into service "you need to run a command that disables some thing that causes problems with SVC". That "thing" turned out to be the VMFS3.HardwareAcceleratedLocking parameter, set to 0. In other words, it disabled what is perhaps the most important VAAI primitive, Atomic Test and Set (ATS). ATS lets a host that is changing datastore metadata lock only the specific on-disk sectors involved, instead of the entire datastore.
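To see why disabling ATS hurts dense datastores, here is a toy model (my own simplification, not VMware's locking code). With a SCSI reservation, every metadata update locks the whole LUN, so all updates serialize. With ATS, only updates that touch the same on-disk lock sector contend. Host names and sector numbers are made up.

```python
from collections import Counter


def rounds_with_scsi_reservation(updates):
    """A SCSI reservation locks the whole LUN, so every metadata update
    (from any host, to any sector) has to wait its turn."""
    return len(updates)


def rounds_with_ats(updates):
    """ATS locks only the on-disk sector being changed: updates to different
    sectors proceed in parallel, and only same-sector updates queue up."""
    per_sector = Counter(sector for _host, sector in updates)
    return max(per_sector.values(), default=0)


if __name__ == "__main__":
    # Hypothetical boot storm: 8 hosts each update the lock record of a
    # different VM on one datastore, and two hosts race for the same record.
    updates = [(f"esx{i:02d}", 100 + i) for i in range(8)]
    updates += [("esx01", 500), ("esx02", 500)]
    print(rounds_with_scsi_reservation(updates))  # 10
    print(rounds_with_ats(updates))               # 2
```

The more VMs share a datastore, the wider this gap gets. That is why small datastores were the only way to live with ATS turned off.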

However, ATS had problems with SVC, at least with the firmware version we had. An attempt to update the firmware killed two of the three nodes we updated, and support for this hardware had already expired, so we decided to move the datastores directly onto the EMC arrays. Frankly, I never understood why anyone would stretch such a crooked storage virtualization layer over the clearly more advanced EMC pools.

Recommendations: from a performance point of view, fewer large LUNs are better than many small ones. Each LUN adds overhead for servicing it and for parallelizing requests, and the various cache levels become less effective. (EMC gives the same advice for reducing storage processor (SP) load: reduce the number of LUNs where possible, and if a RAID group holds multiple LUNs, replace them with fewer, larger LUNs to further reduce SP utilization.)

When using a storage virtualization layer, make sure it is "smarter" than the layer below, or at least does not cripple the storage functionality. In our case SVC was clearly "dumber" than the EMC arrays: it refused to see LUNs larger than 2 TB and made features such as auto-tiering pointless (and, I suspect, Flash Cache as well).
And, of course, make sure that the storage hardware supports VAAI and that this functionality is not disabled at the virtualization layer.
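One way to check that VAAI is not switched off on the host side is to look at the three VAAI advanced settings: /VMFS3/HardwareAcceleratedLocking, /DataMover/HardwareAcceleratedMove and /DataMover/HardwareAcceleratedInit. All three should be 1. The sketch below parses text captured from "esxcli system settings advanced list -o <path>". It assumes the output contains an indented "Int Value: N" line, which matches typical ESXi output, but verify that against your build.

```python
import re

# The three host-side VAAI switches; 1 means the primitive is enabled.
VAAI_SETTINGS = (
    "/VMFS3/HardwareAcceleratedLocking",   # ATS
    "/DataMover/HardwareAcceleratedMove",  # full copy (XCOPY)
    "/DataMover/HardwareAcceleratedInit",  # block zeroing (WRITE SAME)
)

# Assumed layout: an indented "Int Value: <n>" line in the esxcli output.
# "Default Int Value:" lines do not match because ^\s* must hit "Int" directly.
INT_VALUE_RE = re.compile(r"^\s*Int Value:\s*(-?\d+)\s*$", re.MULTILINE)


def parse_int_value(esxcli_output: str) -> int:
    """Extract the current integer value from one setting's esxcli output."""
    match = INT_VALUE_RE.search(esxcli_output)
    if match is None:
        raise ValueError("no 'Int Value:' line found")
    return int(match.group(1))


def disabled_vaai_settings(outputs: dict) -> list:
    """Return the VAAI settings whose captured value is not 1.

    `outputs` maps a setting path to the text esxcli printed for it.
    """
    return [path for path in VAAI_SETTINGS
            if parse_int_value(outputs[path]) != 1]


if __name__ == "__main__":
    sample = {
        "/VMFS3/HardwareAcceleratedLocking":
            "   Path: /VMFS3/HardwareAcceleratedLocking\n"
            "   Int Value: 0\n   Default Int Value: 1\n",
        "/DataMover/HardwareAcceleratedMove": "   Int Value: 1\n",
        "/DataMover/HardwareAcceleratedInit": "   Int Value: 1\n",
    }
    print(disabled_vaai_settings(sample))  # ['/VMFS3/HardwareAcceleratedLocking']
```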

1.2 Zoning and data placement across arrays

The second issue was the strange scattering of different categories of data across the arrays. Databases, mixed with file servers and VDI, were spread chaotically across all the storage systems on a "wherever there was free space" basis. Not to mention that some datastores were connected directly to the EMC arrays and some through SVC.

After lengthy migrations and rebalancing, we arrived at a much more sensible layout. The most resource-hungry workloads (production servers and VDI) went onto the VNXs, while the CLARiiONs got backups and less demanding services. Only LUNs presented directly to servers remained on SVC; I moved all the datastores off it. The number of zones on the switches dropped from about 120 to about 75, even though previously there were "many targets, many initiators" zones and now each zone has no more than one initiator. This was possible simply because each type of data that particular servers work with now lives on a single storage system rather than on three different ones.

What is the gain? Extra zones put extra load on the SAN fabric. Mixing heterogeneous workloads (IO-intensive ones such as databases with sequential writes such as backups and file servers) on one array hurts performance. And putting more than one initiator in a zone is bad practice.
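Single-initiator zoning is easy to generate mechanically once each host's data lives on one array. You create one zone per host HBA port, containing that initiator plus the target ports of its array. The sketch below assumes a simple mapping of initiators to arrays. The WWPNs, aliases and zone-name pattern are hypothetical, and the output is a list of zones, not commands for any particular switch vendor.

```python
from typing import Dict, List, Tuple

Zone = Tuple[str, List[str]]  # (zone name, member WWPNs)


def single_initiator_zones(initiators: Dict[str, Tuple[str, str]],
                           array_targets: Dict[str, List[str]]) -> List[Zone]:
    """Build exactly one zone per initiator port.

    initiators:    alias -> (initiator WWPN, name of the array it uses)
    array_targets: array name -> list of that array's target WWPNs
    """
    zones = []
    for alias, (wwpn, array) in sorted(initiators.items()):
        members = [wwpn] + array_targets[array]  # one initiator, many targets
        zones.append((f"z_{alias}_{array}", members))
    return zones


if __name__ == "__main__":
    # Hypothetical fabric: two ESXi HBAs using a VNX, one backup server on a CX4.
    arrays = {
        "vnx1": ["50:06:01:60:00:00:00:01", "50:06:01:68:00:00:00:01"],
        "cx4a": ["50:06:01:60:00:00:00:02"],
    }
    hbas = {
        "esx01_hba1": ("10:00:00:00:c9:00:00:01", "vnx1"),
        "esx02_hba1": ("10:00:00:00:c9:00:00:02", "vnx1"),
        "bkp01_hba1": ("10:00:00:00:c9:00:00:03", "cx4a"),
    }
    for name, members in single_initiator_zones(hbas, arrays):
        print(name, members)
```

The zone count equals the number of initiator ports. If a host's data were spread over three arrays, it would need more zones or broader ones, which is how we ended up with fewer zones after consolidation.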

1.3 Path Selection Settings

The output of esxcli storage nmp device list on the hosts showed that:
a) for the most part, the Round Robin policy is used to select paths to the datastores;
b) for some datastores (the first six) on the first two hosts, the path is switched after every 3 I/Os:
Path Selection Policy Device Config: {policy = iops, iops = 3,
while the rest use the default value:
Path Selection Policy Device Config: {policy = rr, iops = 1000,
c) on the last five hosts of the cluster, some datastores even used Fixed (all communication with the storage system goes over the same path as long as it is available).

The Path Selection Policy should be chosen according to the storage system's model and vendor. In most cases Round Robin is used for active-active configurations, since it provides some load balancing. By default, the path is switched after 1000 I/Os, which in some cases can cause delays. There is a VMware KB article that recommends changing this value to 1, and there are tests showing that storage performance really is higher in that case.
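A toy model of Round Robin with an I/O limit shows why 1000 can be too coarse. The policy sends a fixed number of I/Os down one path before rotating, so a short burst may never leave the first path. This models only the switching rule, not queueing or latency. On real hosts the limit is set per device with "esxcli storage nmp psp roundrobin deviceconfig set --type=iops --iops=1 --device=<device ID>". Check the VMware KB and your array vendor's guidance before changing it. The path names below are hypothetical.

```python
from collections import Counter


class RoundRobinPathSelector:
    """Rotate to the next path after `iops_limit` I/Os on the current one."""

    def __init__(self, paths, iops_limit=1000):
        if iops_limit < 1:
            raise ValueError("iops_limit must be >= 1")
        self.paths = list(paths)
        self.iops_limit = iops_limit
        self._current = 0
        self._sent_on_current = 0

    def next_path(self):
        if self._sent_on_current == self.iops_limit:
            # Limit reached: rotate to the next path and reset the counter.
            self._current = (self._current + 1) % len(self.paths)
            self._sent_on_current = 0
        self._sent_on_current += 1
        return self.paths[self._current]


def distribute(burst, paths, iops_limit):
    """Count how many I/Os of a burst land on each path."""
    selector = RoundRobinPathSelector(paths, iops_limit)
    counts = Counter(selector.next_path() for _ in range(burst))
    return [counts[p] for p in paths]


if __name__ == "__main__":
    paths = ["vmhba1:C0:T0:L1", "vmhba1:C0:T1:L1",
             "vmhba2:C0:T0:L1", "vmhba2:C0:T1:L1"]
    print(distribute(500, paths, 1000))  # [500, 0, 0, 0]
    print(distribute(500, paths, 1))     # [125, 125, 125, 125]
```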

Recommendations: configure multipathing according to the vendor's recommendations for your configuration, and make sure the settings are identical on all hosts. VMware Host Profiles help a lot with this.
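Host Profiles are the proper tool, but a quick script can also flag drift like the mix of Fixed, iops=3 and iops=1000 found above. This sketch parses text saved from esxcli storage nmp device list on each host. It assumes that device IDs start at the beginning of a line and that "Path Selection Policy" and "Path Selection Policy Device Config" appear as indented "key: value" lines. That matches typical output but should be checked against your ESXi version. Host and device names in the test are hypothetical.

```python
import re
from collections import defaultdict

IOPS_RE = re.compile(r"iops=(\d+)")


def parse_nmp_device_list(text):
    """Map device ID -> (PSP name, iops limit or None) from esxcli output."""
    devices, current = {}, None
    for line in text.splitlines():
        if line and not line[0].isspace():
            current = line.strip()          # an unindented line starts a device
            devices[current] = [None, None]
            continue
        key, sep, value = line.strip().partition(":")  # split on first ':' only
        if current is None or not sep:
            continue
        if key == "Path Selection Policy":
            devices[current][0] = value.strip()
        elif key == "Path Selection Policy Device Config":
            match = IOPS_RE.search(value)   # absent for Fixed / MRU
            devices[current][1] = int(match.group(1)) if match else None
    return {dev: tuple(cfg) for dev, cfg in devices.items()}


def find_drift(per_host_output):
    """Return {device: {(psp, iops): [hosts]}} for devices that differ."""
    seen = defaultdict(lambda: defaultdict(list))
    for host, text in per_host_output.items():
        for dev, cfg in parse_nmp_device_list(text).items():
            seen[dev][cfg].append(host)
    return {dev: dict(cfgs) for dev, cfgs in seen.items() if len(cfgs) > 1}
```

Feed find_drift a dict of host name to saved output; an empty result means every host sees every shared device with the same policy and I/O limit.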

2. Network

To set up load balancing, all the switches in the VDI cluster's blade chassis had been stacked and combined with EtherChannel, and load balancing in the Teaming and Failover section was set to Route based on IP hash. IP hash works only on top of EtherChannel, and it is the only policy compatible with EtherChannel. However, a problem arose when the cluster grew into a second blade chassis whose switches did not support EtherChannel. It showed up as heavy MAC flapping on the switches of the second chassis (according to the network team) and as dropped received packets on the first (10 to 100 per second, according to the monitoring system).
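Why MAC flapping appears without EtherChannel can be shown with a simplified model of IP-hash teaming. The uplink is chosen per source/destination IP pair, so one VM's MAC address shows up on several uplinks at once. A port channel on the physical switch treats those links as one logical port. Separate ports, as in the second chassis, see the same MAC jump between them. The hash used here (XOR of the two addresses modulo the number of uplinks) is a simplification for illustration and not necessarily ESXi's exact formula. The addresses are hypothetical.

```python
import ipaddress


def ip_hash_uplink(src: str, dst: str, n_uplinks: int) -> int:
    """Simplified IP-hash teaming: pick an uplink from the src/dst IP pair."""
    s = int(ipaddress.IPv4Address(src))
    d = int(ipaddress.IPv4Address(dst))
    return (s ^ d) % n_uplinks


def uplinks_seen_for_vm(vm_ip: str, peers, n_uplinks: int) -> set:
    """Set of uplinks on which the VM's single MAC address will appear."""
    return {ip_hash_uplink(vm_ip, peer, n_uplinks) for peer in peers}


if __name__ == "__main__":
    desktop = "10.10.1.25"
    peers = ["10.10.0.10", "10.10.0.11", "10.10.0.12", "10.10.0.13"]
    used = uplinks_seen_for_vm(desktop, peers, n_uplinks=2)
    # More than one uplink: fine behind a port channel, but independent
    # switch ports keep re-learning this MAC, which is MAC flapping.
    print(sorted(used))  # [0, 1]
```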

An important recommendation: do not change network settings for the whole cluster at once. After testing on one host and making sure everything was fine, we turned off EtherChannel on all the others, and lost access to everything except the first one. For 15 painful minutes, while we recovered from the shock and rolled the configuration back, nobody could work except the lucky users whose desktops ran on the first server. From then on, settings were changed one host at a time, putting each into Maintenance Mode first. Meanwhile, I calculated the total downtime by multiplying 15 minutes by 1300 (the number of virtual desktops) and dividing by 60: 325 man-hours. Thanks to management for their understanding... But this was not the first shock associated with virtual desktops.
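The lesson generalizes to a simple rolling-change loop. Change one host at a time: evacuate it first, apply the change, verify, and stop at the first failure instead of continuing across the cluster. The sketch below is a generic skeleton. enter_maintenance, apply_change, host_is_healthy and exit_maintenance are hypothetical callables you would wire to your own tooling (PowerCLI, pyVmomi or similar); they are not functions of any VMware library.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def rolling_change(hosts: Iterable[str],
                   enter_maintenance: Callable[[str], None],
                   apply_change: Callable[[str], None],
                   host_is_healthy: Callable[[str], bool],
                   exit_maintenance: Callable[[str], None],
                   ) -> Tuple[List[str], Optional[str]]:
    """Apply a change one host at a time and stop at the first failure.

    Returns (hosts changed and verified, first failing host or None).
    A failing host is deliberately left in Maintenance Mode so that no
    desktops are placed on it until someone has looked at it.
    """
    done: List[str] = []
    for host in hosts:
        enter_maintenance(host)      # evacuate VMs before touching the host
        apply_change(host)
        if not host_is_healthy(host):
            return done, host        # do NOT continue to the next host
        exit_maintenance(host)
        done.append(host)
    return done, None
```

With this loop, the EtherChannel mistake would have cost one empty host in Maintenance Mode instead of 1300 idle users.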
By the way, for reasons I still don't know, the hosts threw an error on every reboot until I recreated the dvSwitch: LACP Error: <something to the effect that the current configuration supports IP Hash only>. This happened even though EtherChannel had been disabled. The new dvSwitch did not show this error. Moving the hosts and virtual machines to the new distributed switch cost me another batch of nerve cells, but everything worked out.

Along the way, I also reworked how the uplinks are used. Before I started, all port groups were configured the same way:

I did this:

For load balancing I chose Route based on physical NIC load. With this policy, ESXi moves some VM traffic to a less loaded uplink when an uplink's utilization exceeds 75% over a 30-second interval.
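Load-based teaming can be pictured as a periodic rebalancing pass. The sketch is a simplified model under my own assumptions: fixed link capacity, known per-port load, and one move per pass. ESXi's actual algorithm is internal to the distributed switch. The threshold defaults to 75%, as in VMware's documentation. Port and uplink names are hypothetical.

```python
from typing import Dict, Optional, Tuple


def rebalance_once(port_load: Dict[str, float],
                   port_uplink: Dict[str, str],
                   uplinks: Tuple[str, ...],
                   capacity: float,
                   threshold: float = 0.75) -> Optional[Tuple[str, str, str]]:
    """One simplified load-based-teaming pass.

    If an uplink is above `threshold` of `capacity`, move its busiest port
    that fits to the least loaded uplink without pushing that uplink over
    the threshold. Mutates `port_uplink`; returns (port, old, new) or None.
    """
    load = {u: 0.0 for u in uplinks}
    for port, uplink in port_uplink.items():
        load[uplink] += port_load[port]
    limit = threshold * capacity
    for hot in sorted(uplinks, key=load.get, reverse=True):
        if load[hot] <= limit:
            break                                  # nothing is overloaded
        target = min(uplinks, key=load.get)
        ports = sorted((p for p, u in port_uplink.items() if u == hot),
                       key=port_load.get, reverse=True)
        for port in ports:
            if load[target] + port_load[port] <= limit:
                port_uplink[port] = target
                return port, hot, target
    return None
```

Unlike IP hash, each port stays on exactly one uplink at a time, so the physical switches see each MAC on a single port and no EtherChannel is needed.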

Conclusions: setting up load balancing and network resiliency is a creative process. But subsequent monitoring showed that with this configuration a) packet loss on the first blade chassis disappeared; b) load across uplinks became more even; c) hosts used to become unreachable out of the blue from time to time, and over the past few months this has not happened once; d) mass vMotion (for example, when putting a server into Maintenance Mode) no longer affects VM traffic.

I see no advantages of IP Hash over load-based balancing.

3. Computational resources

From the start it seemed to me that 1.5 GB of RAM for Windows 7 virtual desktops was a mockery of the users. It most likely also hurt the disk subsystem through swapping inside the guest OS. But there was no spare memory, and the risk of losing fault tolerance and of swapping at the hypervisor level was even worse. The idea came from the news that Transparent Page Sharing would be disabled by default in upcoming versions of vSphere, or more precisely, from a discussion of that news on Facebook.

Summary:

The feature is being disabled because of a hypothetical security threat (under certain highly controlled conditions).
In most deployments the technology really has become almost useless since the arrival of ASLR and large memory page support: two identical 2 MB pages are far less likely to occur than two identical 4 KB pages. And for server virtualization, the performance gain from 2 MB pages matters much more than the memory savings.
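The 4 KB vs 2 MB argument is easy to check with a toy version of content-based page sharing. Hash every page, and count every copy beyond the first of identical content as saved memory. The model VMs below are hypothetical: identical "OS" pages plus one private page in every 64. Real TPS also does a full byte comparison after a hash match, which this sketch skips.

```python
import hashlib
from collections import Counter

SMALL_PAGE = 4 * 1024                        # 4 KB
LARGE_PAGE = 2 * 1024 * 1024                 # 2 MB
PAGES_PER_LARGE = LARGE_PAGE // SMALL_PAGE   # 512 small pages per large page


def make_vm_memory(vm_id, n_small_pages, private_every):
    """Hypothetical guest memory as a list of small-page contents.

    Every `private_every`-th page holds VM-specific data (think ASLR or
    per-VM state); all other pages are identical across VMs.
    """
    return [f"vm{vm_id}-private-{i}".encode() if i % private_every == 0
            else f"os-page-{i}".encode()
            for i in range(n_small_pages)]


def saved_bytes(vms, pages_per_unit, unit_size):
    """Memory saved if identical units (pages of a given size) are shared."""
    counts = Counter()
    for pages in vms:
        for start in range(0, len(pages), pages_per_unit):
            unit = b"|".join(pages[start:start + pages_per_unit])
            counts[hashlib.sha1(unit).digest()] += 1
    return sum(n - 1 for n in counts.values()) * unit_size


if __name__ == "__main__":
    vms = [make_vm_memory(vm, 2048, 64) for vm in range(2)]   # 2 x 8 MB
    print(saved_bytes(vms, 1, SMALL_PAGE))                # 8257536 (2016 pages)
    print(saved_bytes(vms, PAGES_PER_LARGE, LARGE_PAGE))  # 0
```

A single differing 4 KB page is enough to make a whole 2 MB page unique. That is why disabling large pages is a precondition for TPS to find anything.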

However, why not test it on a VDI cluster?

I changed the following host Advanced Settings:
Mem.AllocGuestLargePage = 0 instead of 1: stop backing guest memory with large pages;
Mem.ShareScanGhz = 6 instead of 4: raise the cap on how much memory is scanned per second per GHz of available host CPU;
Mem.ShareScanTime = 30 instead of 60: scan each VM's memory in full every 30 minutes instead of every 60.
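A back-of-envelope estimate helps explain the CPU cost of the scanning changes. This is my own model, not ESXi's scheduler logic. It assumes the scanner must cover all VM memory once per ShareScanTime, and that ShareScanGhz caps the rate at that many MB/s per GHz of host CPU. The host in the example (60 desktops of 1.5 GB each, 52 GHz) is hypothetical.

```python
def tps_scan_rates(vm_mem_mb, share_scan_time_min, share_scan_ghz, host_ghz):
    """Return (needed, cap) page-scan rates in MB/s under a simple model.

    needed: rate required to scan all VM memory once per ShareScanTime
    cap:    ceiling implied by ShareScanGhz (MB/s per GHz of host CPU)
    """
    needed = sum(vm_mem_mb) / (share_scan_time_min * 60)
    cap = share_scan_ghz * host_ghz
    return needed, cap


if __name__ == "__main__":
    desktops = [1536] * 60                          # hypothetical host load
    print(tps_scan_rates(desktops, 60, 4, 52.0))    # defaults: (25.6, 208.0)
    print(tps_scan_rates(desktops, 30, 6, 52.0))    # tuned:    (51.2, 312.0)
```

Under these assumptions, halving ShareScanTime doubles the scanning work per host, which is consistent with the CPU increase reported below.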

To compensate for the increased CPU load, I turned off the vNUMA features, which are useless for VMs with fewer than 8 vCPUs. I pushed these settings (along with the Path Selection Policy settings) to all hosts using Host Profiles. The result can be seen in the screenshot below.

Explanation of the parameters:
If two VMs share 100 MB of identical memory, Shared will be 200 MB and Shared Common 100 MB.
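The definitions translate directly into arithmetic: the memory actually saved is Shared minus Shared Common. Here is a tiny helper, with made-up page groups as input.

```python
def sharing_stats(shared_groups):
    """shared_groups: list of (size_mb, n_vms) for each set of identical
    pages, where n_vms guest copies are backed by one machine copy.

    Returns (Shared, Shared Common, saved) in MB: Shared counts every
    guest copy, Shared Common counts the single machine copy, and the
    difference is the memory saved.
    """
    shared = sum(size * n for size, n in shared_groups)
    shared_common = sum(size for size, _ in shared_groups)
    return shared, shared_common, shared - shared_common


if __name__ == "__main__":
    print(sharing_stats([(100, 2)]))            # (200, 100, 100)
    print(sharing_stats([(100, 2), (50, 4)]))   # (400, 150, 250)
```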
As the monthly monitoring results show, Shared Common grew fourfold and Shared sixfold. Total memory savings reached almost 700 GB, 600 GB more than on October 26, which is almost a quarter of the cluster's total memory. True, the average CPU load rose from 50-60% to 70-90%.
There is a slight dip on November 5, when Mem.ShareScanTime and Mem.ShareScanGhz had to be returned to their default values to reduce CPU load; it now stays at 60-80%. Nevertheless, the savings remained significant, and we were able to raise memory from 1.5 GB to 2 GB on all machines that had 1.5 GB.

The effect of these changes on disk subsystem responsiveness can be judged from the image below, which shows Read Latency for several datastores. Unfortunately, experiments with SCCM maintenance windows took place during the same period. They produced night-time peaks with fantastic values of up to 80 seconds and made the averages completely meaningless. For this reason I did not include the graph itself: the night peaks wiped out all visibility of the daytime load. But the change can be estimated from the minimum response times (the first columns).
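The problem with averages can be illustrated in a few lines. A couple of night-time spikes drag the mean far away from what users feel during the day, while the minimum or median of daytime samples stays informative. The latency samples below are invented purely for illustration and do not come from our monitoring. The 8:00-20:00 working window is also an assumption.

```python
from statistics import mean, median


def summarize(samples, day_hours=range(8, 20)):
    """samples: list of (hour, read_latency_ms).

    Returns the whole-day mean plus mean, median and minimum over
    working hours only.
    """
    all_ms = [ms for _, ms in samples]
    day_ms = [ms for hour, ms in samples if hour in day_hours]
    return {
        "mean_all": mean(all_ms),
        "mean_day": mean(day_ms),
        "median_day": median(day_ms),
        "min_day": min(day_ms),
    }


if __name__ == "__main__":
    # Invented hourly samples: 15 ms all day, two SCCM-style night spikes.
    samples = [(h, 15.0) for h in range(24)]
    samples[2] = (2, 80000.0)   # 80 s
    samples[3] = (3, 40000.0)
    for name, value in summarize(samples).items():
        print(f"{name}: {value:.1f} ms")
```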

At first it felt strange to see average latency readings in the "green" zone, 10-20 ms; we were used to them almost never dropping below 30 ms.

Conclusion: not all hyped technologies are as useful as they are made out to be. And not all written-off features are pointless, even ones that the vendors themselves (in this case VMware) bury in their presentations. You do, however, need to know them and understand when to use them. For that you need to study the product's settings, parameters and properties, including undocumented ones. Then managing IT infrastructure stops being shamanism and becomes the competent, creative use of knowledge that is the basis of professionalism.

Thank you for your attention.

Source: https://habr.com/ru/post/243413/
