Before building an infrastructure based on virtualization, and all the more before putting it into production, it is necessary to make sure that system resources are used as efficiently as possible and that performance is maximized. In this series of articles I will give recommendations on how to optimize the system for performance, both on the host side and on the virtual machine side.
Let's start with the host
Since servers hosting virtual machines often run at peak loads, the performance of such servers is crucial for the performance of the entire system. Potential "bottlenecks" can be:
- CPU
- Memory
- Disk subsystem
- Network subsystem
Here I will tell you how to identify bottlenecks in all four areas and how to deal with them, and most importantly, how to avoid them.
The processor is the heart of the computer
The "heart" of any computer is the processor. And the correctness of the choice of processor in the context of virtualization - it becomes even more important. The processor is the most expensive part of any computer, and the choice of a too powerful processor can lead to unnecessary costs not only for the purchase of the processor itself, but also in the future for electricity and cooling. If the processor is not powerful enough - the system will not be able to provide the necessary performance, which can result in the purchase of a new processor - and, therefore, costs again.
We need to get answers to the following key questions:
- How many processors to install?
- How many cores do they need?
- How fast should they be?
Answering these questions is not as easy as it sounds. A simple example: which system to use, dual-processor or four-processor? On price, dual-processor systems win outright: one four-processor server costs roughly as much as three dual-processor ones. It would seem that the best solution is to buy three dual-processor servers and combine them into a failover cluster, getting a solution that is both faster and more fault-tolerant. But on the other hand, this brings many new costs:
- More software licenses are required - both on the OS and on the management software (SCVMM, SCCM, SCOM, etc.)
- Administration costs increase - three servers instead of one
- Three servers consume more power, which means they generate more heat and take up more rack space than a single server, albeit a more powerful one.
After weighing all this, it may turn out that a four-processor server is the better choice: it may cost a little more up front and be less fault-tolerant, but once all the overhead is counted it can still come out cheaper.
Nevertheless, the performance of the system as a whole may depend not only, and not so much, on the processors. Take a DBMS, for example. In some cases its CPU requirements are modest while the disk subsystem is used very heavily; if the same DBMS also runs business logic and analytics (OLAP, reports), then, on the contrary, the demands on the processor and memory can be much higher than on the disk subsystem.
To determine whether the processor is a bottleneck, you need to know how heavily it is loaded, and different system utilities can be used for this. Many system administrators are accustomed to the standard Windows Task Manager. Unfortunately, due to the peculiarities of the Hyper-V architecture, Task Manager will show neither the weather in Honduras nor the Zimbabwean dollar rate, but only the CPU load of the host OS. Virtual machines are not taken into account, since the host OS, just like every virtual machine, runs in its own isolated partition. Therefore you need to use the Perfmon snap-in. Many administrators, especially those who have passed the MCSA exams, know this utility. For those who do not, it starts quite easily:
Start - Administrative Tools - Reliability and Performance. From this snap-in we need the Monitoring Tools - Performance Monitor branch.
With this utility you can see the values of almost any system parameter and watch them change on a graph. By default only one parameter is added (a "counter", in Perfmon terms): "% Processor Time". This counter shows the same thing as Task Manager, the CPU load of the host OS, so it can be deleted.
We proceed to add counters. Perfmon has many Hyper-V-related counters. Of these, we are currently interested in two:
- Hyper-V Hypervisor Virtual Processor, % Total Run Time - displays the load of virtual processors. You can show the total load of all processors of all running virtual machines, or select a specific virtual processor of a specific virtual machine.
- Hyper-V Hypervisor Root Virtual Processor, % Total Run Time - shows the load on the selected logical processors from tasks not related to Hyper-V.
Note: what is a logical processor? The easiest way to understand it is by example. If you have one single-core processor, you have one logical processor. If the processor is dual-core, there are two logical processors, and if it also supports Hyper-Threading, there are four.

These two counters give a real picture of the host processor load. They are measured in percent, so the closer they are to 100%, the higher the CPU load, and the more reason to consider buying additional or more powerful processors.
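If you want to log these counters from the command line instead of watching the Perfmon graph, the built-in typeperf utility can do it. Below is a minimal sketch in Python that simply shells out to typeperf; the counter paths are assumed to match the English counter names above, and the existence of a _Total instance is also an assumption - check the "Add Counters" dialog on your host.

```python
# A minimal sketch: sampling the two Hyper-V CPU counters with typeperf
# (ships with Windows) instead of the Perfmon GUI.
import subprocess

COUNTERS = [
    r"\Hyper-V Hypervisor Virtual Processor(_Total)\% Total Run Time",
    r"\Hyper-V Hypervisor Root Virtual Processor(_Total)\% Total Run Time",
]

# Take 10 samples, one per second, and print them as CSV.
result = subprocess.run(
    ["typeperf", "-sc", "10", "-si", "1"] + COUNTERS,
    capture_output=True, text=True, check=True,
)
print(result.stdout)
```

The same approach works for any counter mentioned later in this article: substitute the counter path and redirect the CSV output to a file for long-term collection.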
You can never have too much memory
A powerful processor is good, but when memory runs short the system starts using paging files, and performance drops almost exponentially. As they say on the Internet, "512 megabytes is not memory, it is insanity."
Unfortunately (or, more likely, fortunately), Hyper-V cannot allocate more memory to virtual machines than is physically present in the system. This feature is called "memory overcommit", and the marketing departments of other virtualization vendors play it up with great glee. Whether it is a good or a bad thing is a topic for a separate article, and many a lance has been broken over it.
In this regard, the question arises: how much memory do we need as a result? The answer depends on various factors:
- How many virtual machines will be running, and how much memory will they need? The amount of memory required for each virtual machine depends on the tasks it will perform. The approach is the same as for conventional servers, but virtual machines can allocate memory more flexibly - not 1024 MB, but, for example, 900 MB.
- The host OS also needs memory. It is recommended to leave at least 512 MB free for the needs of the hypervisor and the host OS itself. If free memory drops below 32 MB, the system will not allow any more virtual machines to start until memory is freed. In addition, the host OS may perform other tasks besides virtualization; although this is not recommended, it does happen, and it must be taken into account.
- Other virtual machines (for Live Migration scenarios). If the infrastructure is planned on the basis of a failover cluster, additional memory should be provided on each host. Virtual machines can move from one host to another, either manually (Live Migration) or when one of the hosts fails. If the receiving host does not have enough memory for the migrating virtual machines, they simply will not start on it. Therefore, at the design stage, an "emergency reserve" of 50-100% of the required memory capacity must be provided (a small sizing sketch follows this list). The situation may improve somewhat with the release of Windows Server 2008 R2 SP1, which includes dynamic memory allocation, but I will only be able to say for sure once I have tested it myself.
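To make the arithmetic concrete, here is a small back-of-the-envelope sketch of the calculation described above. All the virtual machine sizes are made-up illustrative numbers, not recommendations.

```python
# A sizing sketch for host RAM in a failover cluster, following the
# reasoning above. All figures are illustrative assumptions.
vm_memory_mb = [2048, 2048, 1024, 900, 4096]   # planned VMs on this host
host_os_reserve_mb = 512                        # hypervisor + host OS minimum
failover_reserve = 0.5                          # 50-100% "emergency reserve"

vm_total = sum(vm_memory_mb)
required = vm_total + host_os_reserve_mb + int(vm_total * failover_reserve)
print(f"VMs alone: {vm_total} MB; with reserves: {required} MB "
      f"(~{required / 1024:.1f} GB per host)")
```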
How do we see what happens to memory? Fortunately, you can look in your favorite Task Manager: unlike CPU load, it shows memory usage fairly accurately. And you can (and even should) turn to the familiar Perfmon and its Memory \ Available MBytes and Memory \ Pages/sec counters.
Hard drives: how many do you need?
As a rule, it is quite difficult to predict how much disk space virtual machines will need. Therefore, situations where there is not enough disk space, or the opposite, where there is too much of it and the disks sit idle, are quite common.
In addition to volume, there is another very important characteristic - the speed of the disk subsystem. 2 TB of disk space is certainly good, but if these are two SATA disks that are not combined into a RAID array, then the bandwidth may simply not be enough, and this will greatly affect the system performance.
Planning the storage subsystem includes the following aspects:
Controllers. Hard disk controllers can have different bus widths and cache sizes, and their overall performance can vary greatly. Some controllers are fully "hardware", processing all requests themselves; others are "semi-software", offloading part of the request processing to the computer's own CPU. The speed of the disk subsystem depends first of all on the controller, so it must be chosen carefully.
Type of disks. Besides capacity, hard drives have many other characteristics that should not be forgotten: the interface type (IDE, SATA, SCSI, SAS), the spindle speed (7200, 10000, 15000 rpm) and the cache size of the disk itself. The difference between a 7200 rpm disk and a 10000 rpm one, let alone 15000 rpm, or between 8 MB and 32 MB of cache, is quite noticeable for systems as heavily loaded as virtualization hosts.
The number of disks and the type of RAID array. As already mentioned, sometimes, to achieve higher performance and reliability, the best solution is not a single large disk, but several smaller disks combined into a RAID array. There are several types of RAID (a short capacity comparison follows this list):
- RAID 0 - "striping". Information is written in blocks ("stripes") to several disks simultaneously. Because of this, reading and writing large amounts of data is much faster than from a single disk, and the more disks in the array, the faster it is. But there is one big drawback: low reliability. Failure of any one disk leads to complete loss of information. Therefore, in practice, RAID 0 is used quite rarely. One example is intermediate backup storage in the "disk-to-disk-to-tape" model, where reliability is not as important as speed.
- RAID 1 - "mirroring". Information is written simultaneously to several disks, and the contents of all disks are identical. The speed of writing and reading is no higher than for a single disk, but reliability is much higher: the failure of one disk does not lead to loss of information. The only drawback is cost: where one disk would suffice, you have to install two or more. It makes sense where reliability is crucial.
- RAID 4 and RAID 5 - "striping with parity". A kind of "middle ground" between RAID 0 and RAID 1. The idea is that information is stored on disks in interleaved blocks, as in RAID 0, but in addition checksums of the stored data are calculated. If one of the disks fails, the missing data is automatically recalculated from the remaining data and the checksums. Of course, this reduces performance, but the data is not lost, and when the failed disk is replaced, all information is restored (a process called rebuilding the array). Data loss occurs only when two or more disks fail. A distinguishing feature of such arrays is that their write speed is much lower than their read speed, because every written block requires a checksum to be calculated and written to disk. RAID 4 and RAID 5 differ in that RAID 4 writes checksums to a dedicated disk, while RAID 5 spreads them across all disks in the array, alongside the data. In either case, such an array needs N disks for data storage plus one disk, unlike RAID 1 and RAID 10, where the number of drives simply doubles.
- RAID 6, also known as RAID DP (double parity). The same as RAID 5, but checksums are calculated twice, using different algorithms. Such an array requires not N + 1 disks, as RAID 5 does, but N + 2; in return it can survive the simultaneous failure of two disks. It is relatively rare, found as a rule in enterprise-level storage systems, for example NetApp.
- RAID 10 - a "hybrid" of RAID 0 and RAID 1: either a RAID 0 stripe of several RAID 1 mirrors (called RAID 1+0) or, vice versa, a RAID 1 mirror of several RAID 0 stripes (RAID 0+1). It offers the highest performance in both writing and reading, but also the highest cost, since it requires twice as many disks as needed for data storage.
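To see what these levels mean in terms of usable space, here is a small sketch comparing them for a hypothetical shelf of eight 500 GB disks. It is an illustration only: hot spares, formatting overhead, and controller specifics are ignored.

```python
# Usable capacity for the RAID levels described above; a sketch only.
def usable_disks(level: str, n: int) -> int:
    """Number of disks' worth of capacity available for data."""
    if level == "RAID0":
        return n            # all capacity, no redundancy
    if level == "RAID1":
        return 1            # everything mirrored
    if level in ("RAID4", "RAID5"):
        return n - 1        # one disk of parity
    if level == "RAID6":
        return n - 2        # two disks of parity
    if level == "RAID10":
        return n // 2       # mirrored stripes
    raise ValueError(level)

disk_size_gb, n_disks = 500, 8
for level in ("RAID0", "RAID1", "RAID5", "RAID6", "RAID10"):
    gb = usable_disks(level, n_disks) * disk_size_gb
    print(f"{level}: {gb} GB usable from {n_disks} disks")
```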
As you can see, choosing disks is quite a difficult task, so the choice must be based not only on disk-space requirements, but also on performance requirements and, of course, on the available budget. Sometimes external storage is more justified, for example when it comes to large volumes and/or performance that internal disks cannot deliver; and when a highly fault-tolerant infrastructure is planned, there is certainly no escaping external storage. External storage systems should be selected on the same principles as internal disks: interface bandwidth, number and type of disks, supported RAID levels, plus additional features such as resizing virtual disks (LUNs) on the fly, snapshot support, and so on.
What about measurements? There are several counters related to the performance of the disk subsystem. Of interest are the following:
- Physical Disk, % Disk Read Time
- Physical Disk, % Disk Write Time
- Physical Disk, % Idle Time
These counters show the percentage of time spent reading from and writing to the disk, and the percentage of idle time, respectively. If the read and write values stay above 75% for long periods of time, the performance of the disk subsystem is not high enough.
In addition, there are two more counters:
- Physical Disk, Avg. Disk Read Queue Length
- Physical Disk, Avg. Disk Write Queue Length
These two counters show the average length of the disk queue for reading and writing, respectively. High values (above 2) for short periods ("peaks") are quite acceptable, and for DBMS or MS Exchange servers even typical, but long-term exceedances indicate that the disk subsystem is probably the bottleneck.
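The "short peaks are fine, sustained queues are not" rule is easy to express as a check over consecutive samples. Below is a sketch with made-up sample values; THRESHOLD and WINDOW are illustrative assumptions, not official limits.

```python
# Flag the disk only if the average queue length stays above 2 for a
# whole window of consecutive samples; isolated peaks are ignored.
samples = [0.5, 1.1, 3.0, 0.7, 2.5, 2.8, 3.1, 2.6, 2.9, 3.3, 0.9]
THRESHOLD = 2.0
WINDOW = 5  # consecutive samples (e.g. 5 seconds at a 1 s interval)

streak = 0
for i, queue_len in enumerate(samples):
    streak = streak + 1 if queue_len > THRESHOLD else 0
    if streak == WINDOW:
        print(f"sustained queue starting at sample {i - WINDOW + 1}: "
              f"the disk subsystem may be the bottleneck")
```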
Network subsystem
The network subsystem becomes a "bottleneck" much less often than the processor, memory or disks, but it should not be forgotten either.
As with all the other components, there are several questions that should be answered at the planning stage:
- How many virtual machines will be running at the same time, and what will be the load on the network?
- What is the network bandwidth?
- Are iSCSI storage systems in use?
- Does the server have remote management hardware independent of the installed OS (for example, HP iLO or Dell DRAC)?
Depending on the answers, different network configuration scenarios are possible. Suppose we have just one server with exactly 4 network interfaces and three running virtual machines. The server has no out-of-band management controller, which means that if something goes badly wrong, someone will have to run to the server (which is at the other end of the city).
At the host level
For servers that do not have remote management hardware, it is recommended to leave one of the network interfaces out of all virtual networks and dedicate it to management tasks. This greatly reduces the risk of losing remote control of the server because of excessive utilization or misconfiguration of the network interface. It can be done either during installation of the Hyper-V role, by unchecking one of the network interfaces, or afterwards, by removing the virtual network attached to the interface that will be used for management.
In addition, at the host level it is imperative to install the freshest possible drivers for the network adapters. This is necessary to take advantage of their special features - VLAN tagging, teaming, TCP offloading, VMQ (provided the adapters themselves support them; as a rule, these are specialized server network adapters).
Network loads
Suppose that our three virtual machines have already worked for some time, and the traffic analysis showed that two of them do not burden the network interface very much, while the third generates very large amounts of traffic. The best solution would be to “release to the world” a virtual machine that generates a large amount of traffic through a separate network interface. To do this, you can create two virtual networks of the External type: one for those virtual machines that do not load the network, and a separate one for the third virtual machine.
In addition, you can create an External virtual network without creating a virtual network adapter in the parent partition. This is done with scripts; I will not go into details and will just give a link:
blogs.msdn.com/b/robertvi/archive/2008/08/27/howto-create-a-virtual-swich-for-external-without-creating-a-virtual-nic-on-the-root.aspx
iSCSI
If you plan to use an iSCSI storage system, it is highly recommended to allocate a separate network interface for iSCSI, or even two for MPIO. If LUNs will be mounted in the host OS, simply leave one or two interfaces unattached to virtual networks. If the iSCSI initiators run inside virtual machines, create one or two separate virtual networks for them that will carry iSCSI traffic exclusively.
VLAN tagging
VLAN tagging (IEEE 802.1q) means "marking" network packets with a special marker (tag), thanks to which a packet can be associated with a particular virtual network (VLAN). Hosts belonging to different VLANs end up in different broadcast domains even though they are physically connected to the same equipment. Hyper-V virtual network adapters also support VLAN tagging: open the properties of the virtual adapter in the virtual machine settings and set the corresponding VLAN ID there.
Active equipment
So far we have talked about network interfaces and virtual network adapters within a host. But the bandwidth of the active equipment, such as the switches our hosts connect to, must also be taken into account. A simple example: if an 8-port 1 Gbps switch has every port utilizing its full 1 Gbps, a 1 Gbps uplink physically cannot carry that volume of traffic, and performance will drop. This is especially important with iSCSI, where loads are high and packet delays are critical for performance. Therefore, when using iSCSI, it is highly recommended to route iSCSI traffic through separate switches.
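A tiny sketch of that arithmetic, with the figures from the example above:

```python
# Worst-case uplink oversubscription: eight fully loaded 1 Gbps ports
# funneled into a single 1 Gbps uplink.
ports, port_gbps, uplinks, uplink_gbps = 8, 1.0, 1, 1.0
demand = ports * port_gbps
capacity = uplinks * uplink_gbps
print(f"oversubscription {demand / capacity:.0f}:1; "
      f"uplinks needed at line rate: {int(demand / uplink_gbps)}")
```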
Recommendations for the host OS
We now turn to recommendations on the host OS. As you know, Windows Server 2008 R2 can be installed in two different modes: Full and Server Core. In terms of the work of the hypervisor, these modes are no different. Although the Server Core mode seems more difficult at first glance (especially for inexperienced administrators), it is recommended to use this mode. Installing an OS in Server Core mode has the following advantages over a full installation:
- Smaller updates
- Smaller attack surface for potential intruders
- Lower processor and memory load in parent partition
Running other applications in the host OS
Running third-party (non-Hyper-V) applications in the host OS, as well as installing other server roles besides Hyper-V, can lead to a dramatic drop in performance and stability. Due to the peculiarities of the Hyper-V architecture, all interaction between virtual machines and devices goes through the parent partition, so heavy loads or a blue screen in the parent partition will inevitably degrade the performance of, or simply bring down, all running virtual machines. This includes antivirus software. Whether it is needed at all on a host that does nothing but virtualization is a separate question, but if an antivirus is installed, the first thing to do is exclude all folders where virtual machine files may reside. Otherwise scanning may slow everything down, and if something virus-like is found inside a VHD file, the antivirus may corrupt the VHD itself while trying to disinfect it. Similar cases have been observed with MS Exchange databases, which is why the first recommendation there is not to put file antiviruses on Exchange servers at all, and if you do, to add the database folders to the exclusions.
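As a rough illustration, here is a sketch that walks the disk and collects the folders containing virtual machine files, so they can be handed to the antivirus exclusion list. The two root paths are the default Hyper-V locations on Windows Server 2008 R2; adjust them if your virtual machines live elsewhere.

```python
# Gather folders that hold VM files (configs, VHDs, snapshots, saved
# state) so they can be added to the antivirus exclusions.
import os

roots = [
    r"C:\ProgramData\Microsoft\Windows\Hyper-V",              # VM configuration
    r"C:\Users\Public\Documents\Hyper-V\Virtual Hard Disks",  # default VHDs
]
extensions = (".vhd", ".avhd", ".xml", ".bin", ".vsv")

exclusions = set()
for root in roots:
    for dirpath, _dirs, files in os.walk(root):
        if any(f.lower().endswith(extensions) for f in files):
            exclusions.add(dirpath)

for path in sorted(exclusions):
    print(path)
```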
Virtual Machine Recommendations
The steps that need to be taken to improve the performance of the virtual machines themselves depend on the applications that will be run on them. Microsoft has recommendations (best practices) for each of the applications - Exchange, SQL Server, IIS, and others. Similar recommendations exist for software of other vendors. Here I will give only general recommendations that are independent of specific software.
I will explain why you need to install Integration Services in the guest OS, how to simplify the deployment of new virtual machines using a VHD library, and how to keep those VHDs up to date as new patches are released.
Integration services
Integration services are a set of drivers running inside the guest OS. They should be installed immediately after installing the OS. At the moment, the list of supported OS is as follows:
- Windows 2000 Server SP4
- Windows Server 2003 SP2
- Windows Server 2008
- Windows XP SP2, SP3
- Windows Vista SP1
- SUSE Linux Enterprise Server 10 SP3 / 11
- Red Hat Enterprise Linux 5.2 - 5.5
Windows 7 and Windows Server 2008 R2 contain integration services in the installation package, so they do not need to be installed on these OSes additionally.
Installation of integration services allows the use of synthetic devices that have higher performance compared to emulated. Learn more about the difference between emulated and synthetic devices in my article on Hyper-V architecture.
Here is a list of drivers included in Integration Services:
- IDE controller - replaces the emulated IDE controller, which increases the speed of access to the disks
- SCSI controller - a fully synthetic device, so it requires integration services in order to work at all. Up to 64 disks can be connected to each SCSI controller, and there can be up to 4 controllers per virtual machine.
- Network adapter - has a higher performance than the emulated (Legacy Network Adapter), and supports special features such as VMQ.
- Video and mouse - increase the convenience of managing a virtual machine through its console.
In addition to the drivers listed, the following functions are supported when installing integration services:
- Operating System Shutdown - the ability to correctly shut down the guest OS without a login to it. Same as pressing the Power button on an ATX chassis.
- Time Synchronization - as the name implies - synchronization of the system time between the host and the guest OS.
- Data Exchange - the exchange of registry keys between the guest and the host OS. For example, the guest OS can determine the name of the host it is running on. This feature is available only for Windows guests (a short sketch of reading these keys from inside a guest follows this list).
- Heartbeat - a special service that periodically sends signals meaning that everything is fine with the virtual machine. If the guest OS hangs for some reason, it stops sending heartbeats, which can serve as a trigger, for example, for an automatic reboot.
- Online Backup - a VSS writer that allows a consistent backup copy of the virtual machine data to be taken at any time. When a backup is started through VSS, applications running in the virtual machine automatically flush their data to disk, so the backup is consistent.
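For the curious, here is what Data Exchange looks like from inside a Windows guest: integration services publish host information in the guest's registry. A minimal sketch follows; the key path and value names are those commonly documented for the KVP exchange, so treat them as assumptions and verify them in your own guest.

```python
# Read host information published by Hyper-V Data Exchange (KVP).
# Run inside a Windows guest with integration services installed.
import winreg

KEY = r"SOFTWARE\Microsoft\Virtual Machine\Guest\Parameters"
with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY) as key:
    for name in ("HostName", "PhysicalHostName", "VirtualMachineName"):
        try:
            value, _type = winreg.QueryValueEx(key, name)
            print(f"{name} = {value}")
        except FileNotFoundError:
            print(f"{name}: not published")
```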
To install integration services in a Windows guest, select Action - Insert Integration Services Setup Disk. An ISO image with the installation files is then automatically mounted in the virtual machine, and the installation starts. If autorun is disabled in the guest OS, the installation will have to be started manually.
Integration components for Linux are not included in the distribution of Windows Server - they must be downloaded from the Microsoft website.
Sysprep: create a master image
If you have a sufficiently large infrastructure, and you often have to create new virtual machines and install an OS on them, a set of ready-made "master images" of virtual hard disks will save a lot of time. A master image is a prepared VHD file with an installed and configured OS, from which new virtual machines are deployed by simply copying the VHD (provided, of course, that licensing allows it).
A master image is created roughly as follows:
- Install the OS in a virtual machine, apply all updates, and install and configure the software needed in every deployment
- Run the Sysprep utility, which removes the unique system information from the image, including the security identifier (SID)
When a copy of such an image boots for the first time, a "mini-setup" wizard runs, asking for the computer name and other unique parameters, after which the new system is ready for use.
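Deployment from the library then boils down to copying the generalized VHD. A trivial sketch, with purely illustrative paths and file names:

```python
# Deploy a new VM system disk by copying a sysprepped master VHD from
# the image library; all paths and names here are made up.
import shutil
from pathlib import Path

LIBRARY = Path(r"D:\VHD-Library")
TARGET = Path(r"D:\Hyper-V\Virtual Hard Disks")

def deploy(master: str, vm_name: str) -> Path:
    """Copy a generalized master VHD as the system disk of a new VM."""
    src = LIBRARY / master
    dst = TARGET / f"{vm_name}.vhd"
    shutil.copy2(src, dst)   # mini-setup runs on the copy's first boot
    return dst

print(deploy("w2k8r2-master.vhd", "new-web-01"))
```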
Keeping master images up to date
Master images, however, tend to go stale: new patches keep coming out, and an OS deployed from an old image has to download and install all of them, which takes time. Updating every image by hand - deploying it, patching it, and generalizing it again - also takes time. To automate this task, Microsoft offers a free solution, the "Offline Virtual Machine Servicing Tool". It works together with System Center Virtual Machine Manager (SCVMM) and with WSUS or SCCM, and keeps the stored images up to date automatically. For each image in the library, the tool:
- deploys a temporary virtual machine from it on a host designated in SCVMM as a maintenance host
- starts the virtual machine and installs the available updates through WSUS or SCCM
- shuts it down and returns the updated VHD file to the library
The Offline Virtual Machine Servicing Tool is free and can be downloaded at:
www.microsoft.com/solutionaccelerators
Conclusion
In this article I have covered general approaches to improving the performance of a Hyper-V host and its virtual machines. Every environment is different, so treat these recommendations as a starting point, and always verify the effect of any change with measurements.