Good day!
In this article I want to talk about how to slightly improve the performance of an ESXi host using SSD caching. At work and at home I use VMware products; my home lab is based on Free ESXi 6.5. The host runs virtual machines both for the home infrastructure and for testing some work projects (at one point I even had to run a VDI infrastructure on it). Gradually the applications of the heavier VMs started to hit the limits of the disk subsystem, and not everything fit on the SSD. lvmcache was chosen as the solution. The logic diagram looks like this:
The basis of the whole scheme is the svm VM, based on CentOS 7. The HDDs are presented to it using RDM, together with a small VMDK disk from the SSD datastore. Caching and data mirroring are implemented in software - mdadm and lvmcache. The VM's disk space is mounted on the host as an NFS datastore. Part of the SSD datastore is reserved for VMs that need a high-performance disk subsystem.
The compute node is assembled from desktop hardware:
MB: Gigabyte GA-Z68MX-UD2H-B3 (rev. 1.0)
HDD: 2 x Seagate Barracuda 750Gb, 7200 rpm
SSD: OCZ Vertex 3 240Gb
There are 2 RAID controllers on the motherboard:
- Intel Z68 SATA Controller
- Marvell 88SE9172 SATA Controller
I didn't manage to get the 88SE9172 working under ESXi (Marvell adapters, at least the 88SE91xx series, are problematic there), so I decided to leave both controllers in AHCI mode.
RDM
RDM (Raw Device Mapping) technology allows a virtual machine to access a physical drive directly. The link is provided through special mapping files on a separate VMFS volume. RDM has two compatibility modes:
- Virtual mode - works in the same way as in the case of a virtual disk file, allows you to take advantage of the virtual disk in VMFS (file locking mechanism, instant snapshots);
- Physical mode - provides direct access to the device for applications that require a lower level of control.
In virtual mode, read / write operations are sent to the physical device. The RDM device is represented in the guest OS as a virtual disk file, the hardware characteristics are hidden.
In physical mode, almost all SCSI commands are transmitted to the device, in the guest OS, the device is represented as real.
By connecting the disk drives to the VM using RDM, you can bypass the VMFS layer, and in physical compatibility mode their state can be monitored from the VM (using SMART). In addition, if something happens to the host, you can get to the data by mounting the HDD in another working system.
lvmcache
lvmcache provides transparent caching of data of slow HDD devices on fast SSD devices. LVM cache places the most frequently used blocks on a fast device. Turning on and off caching can be done without interrupting work.
On a read request, the cache is checked to see whether the data is already there. If it is not, the data is read from the HDD and written to the cache along the way (cache miss). Subsequent reads of that data come from the cache (cache hit).
Write modes:
- Write-through mode - on a write, the data is written both to the cache and to the HDD; a safer option, the probability of data loss in case of a failure is small;
- Write-back mode - on a write, the data is first written to the cache and only then flushed to disk, so there is a chance of data loss in the event of a crash (a faster option, since the write completion signal is returned to the OS as soon as the cache has received the data).
This is how the data can be flushed from the cache (in write-back mode) to the disks:
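A minimal sketch using LVM tools, assuming the cached volume is cl_svm/data as configured below; the cleaner policy forces the dirty blocks to be written back to the HDDs:

    # Switch the cache to the "cleaner" policy so that dirty blocks are flushed to the HDDs
    lvchange --cachepolicy cleaner cl_svm/data

    # Watch the number of dirty blocks drop to zero
    lvs -o lv_name,cache_policy,cache_dirty_blocks cl_svm/data

    # Return to the default smq policy once the flush is complete
    lvchange --cachepolicy smq cl_svm/data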
System Setup
An SSD datastore is created on the host. I chose the following scheme for using the available space:
220Gb - DATASTORE_SSD, of which:
149Gb - space for VMs that need fast storage
61Gb - VMDK with the cache for the svm VM
10Gb - Host Swap Cache
A virtual network looks like this:
A new vSwitch was created:
Networking → Virtual switches → Add standard virtual switch → (name: svm_vSwitch).
A VMkernel NIC is connected to it through a port group:
Networking → VMkernel NICs → Add VMkernel NIC
→ Port group → New Port group
→ New port group → svm_PG
→ Virtual switch → svm_vSwitch
→ IPv4 settings → Configuration → Static → set the IP address
A port group was created to which the svm VM will be connected:
Networking → Port groups → Add port group → (name: svm_Network, virtual switch: svm_vSwitch)
Disk preparation
You need to log in to the host via SSH and run the following commands:
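A sketch of creating the RDM mapping files; the disk identifiers below are placeholders (look up the real ones with the ls command), the mapping files are placed on the SSD datastore, and -r creates the mapping in virtual compatibility mode:

    # List the physical disks visible to the host to find the HDD identifiers
    ls -l /vmfs/devices/disks/

    # Directory for the RDM mapping files on the SSD datastore
    mkdir /vmfs/volumes/DATASTORE_SSD/svm

    # Create RDM mapping files in virtual compatibility mode (-r) for both HDDs
    # (the t10.ATA... identifiers are placeholders - substitute your own)
    vmkfstools -r /vmfs/devices/disks/t10.ATA_____ST3750525AS___DISK1 /vmfs/volumes/DATASTORE_SSD/svm/hdd1.vmdk
    vmkfstools -r /vmfs/devices/disks/t10.ATA_____ST3750525AS___DISK2 /vmfs/volumes/DATASTORE_SSD/svm/hdd2.vmdk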
VM preparation
Now these disks can be attached (Existing hard disk) to the new VM. Template: CentOS 7, 1 vCPU, 1024Mb RAM, 2 RDM disks, a 61Gb SSD disk, 2 vNICs (port groups VM Network and svm_Network); during OS installation we use Device Type - LVM, RAID Level - RAID1.
Setting up an NFS server is quite simple:
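A minimal sketch for CentOS 7, assuming the data volume is mounted at /data and the storage network is 10.0.0.0/24 (the export options are my assumption):

    # Install and enable the NFS server
    yum install -y nfs-utils
    systemctl enable rpcbind nfs-server
    systemctl start rpcbind nfs-server

    # Export the data volume to the storage network of the ESXi host
    echo '/data 10.0.0.0/24(rw,no_root_squash,sync)' >> /etc/exports
    exportfs -ra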
Prepare the cache and metadata volumes to enable caching of the cl_svm/data volume:
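A sketch of the lvmcache setup, assuming the SSD VMDK is visible inside the VM as /dev/sdc and the volume group is cl_svm (adjust device names and sizes to your layout):

    # Add the SSD disk to the volume group that holds the data volume
    pvcreate /dev/sdc
    vgextend cl_svm /dev/sdc

    # Create the cache data and cache metadata volumes on the SSD
    lvcreate -L 60G -n lv_cache cl_svm /dev/sdc
    lvcreate -L 60M -n lv_cache_meta cl_svm /dev/sdc

    # Combine them into a cache pool and attach it to the data volume in write-through mode
    lvconvert --type cache-pool --poolmetadata cl_svm/lv_cache_meta cl_svm/lv_cache
    lvconvert --type cache --cachemode writethrough --cachepool cl_svm/lv_cache cl_svm/data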
Array status change notifications:
At the end of the /etc/mdadm.conf file, add parameters containing the address to which messages will be sent in case of problems with the array, and, if necessary, specify the sender's address:
MAILADDR alert@domain.ru
MAILFROM svm@domain.ru
For the changes to take effect, you must restart the mdmonitor service:
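For example (the second command, which sends a test notification for each array, is optional):

    systemctl restart mdmonitor

    # Send a test message to check that mail delivery works
    mdadm --monitor --scan --oneshot --test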
Mail from the VM is sent via ssmtp. Since I use RDM in virtual compatibility mode, the state of the disks (SMART) will be monitored by the host itself.
Host preparation
Add NFS datastore to ESXi:
Storage → Datastores → New datastore → Mount NFS datastore
Name: DATASTORE_NFS
NFS server: 10.0.0.2
NFS share: /data
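The same datastore can also be mounted from the host command line (a sketch; the parameters match the values above):

    esxcli storage nfs add --host=10.0.0.2 --share=/data --volume-name=DATASTORE_NFS

    # Verify that the datastore is mounted
    esxcli storage nfs list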
Configure VM autostart:
Host → Manage → System → Autostart → Edit settings
Enabled β Yes
Start delay β 180sec
Stop delay β 120sec
Stop action β Shut down
Wait for heartbeat β No
Virtual Machines → svm → Autostart → Increase priority
This policy lets the svm VM start first; the hypervisor then mounts the NFS datastore, and only after that the rest of the machines are powered on. Shutdown happens in reverse order. The VM start delay was chosen based on the results of the crash test: with a small Start delay the NFS datastore did not get mounted in time, and the host tried to start VMs that were not yet available. You can also play with the NFS.HeartbeatFrequency parameter.
VM autostart can be configured more flexibly from the command line:
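A sketch using vim-cmd; the VM ID (35 below) is a placeholder, look up the real one with getallvms:

    # Find the VM ID of svm
    vim-cmd vmsvc/getallvms

    # Enable autostart on the host
    vim-cmd hostsvc/autostartmanager/enable_autostart true

    # Per-VM entry: vmid, start action, start delay, start order, stop action, stop delay, wait for heartbeat
    vim-cmd hostsvc/autostartmanager/update_autostartentry 35 powerOn 180 1 guestShutdown 120 systemDefault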
Small optimization
Enable Jumbo Frames on the host:
Networking → Virtual switches → svm_vSwitch → MTU 9000;
Networking → VMkernel NICs → vmk1 → MTU 9000
In Advanced Settings, set the following values:
NFS.HeartbeatFrequency = 12
NFS.HeartbeatTimeout = 5
NFS.HeartbeatMaxFailures = 10
Net.TcpipHeapSize = 32 (default 0)
Net.TcpipHeapMax = 512
NFS.MaxVolumes = 256
NFS.MaxQueueDepth = 64 (default 4294967295)
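These values can also be set from the command line, for example:

    esxcli system settings advanced set -o /NFS/HeartbeatFrequency -i 12
    esxcli system settings advanced set -o /NFS/HeartbeatTimeout -i 5
    esxcli system settings advanced set -o /NFS/HeartbeatMaxFailures -i 10
    # Changing the TCP/IP heap settings requires a host reboot to take effect
    esxcli system settings advanced set -o /Net/TcpipHeapSize -i 32
    esxcli system settings advanced set -o /Net/TcpipHeapMax -i 512
    esxcli system settings advanced set -o /NFS/MaxVolumes -i 256
    esxcli system settings advanced set -o /NFS/MaxQueueDepth -i 64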
Enable Jumbo Frames on VM svm:
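A sketch for CentOS 7, assuming the storage-network interface inside svm is ens192 and the VMkernel interface has the address 10.0.0.1 (both are assumptions, adjust to your setup):

    # Set MTU 9000 on the interface connected to svm_Network
    echo 'MTU=9000' >> /etc/sysconfig/network-scripts/ifcfg-ens192
    systemctl restart network

    # Check with a non-fragmented ping of the VMkernel interface (8972 = 9000 minus headers)
    ping -M do -s 8972 10.0.0.1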
Performance
Performance was measured with a synthetic test (for comparison, I also took readings from the cluster at work, at night).
Software used on the test VM:
- OS: CentOS 7.3.1611 (8 vCPU, 12Gb vRAM, 100Gb vHDD)
- fio v2.2.8
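The fio runs looked roughly like this (a sketch; the test file path, block size and runtime are my assumptions), with rw=randread/randwrite and iodepth=1/24 varied between runs:

    fio --name=test --filename=/test/file --size=10G --direct=1 \
        --ioengine=libaio --bs=4k --rw=randread --iodepth=24 --runtime=60 --time_based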
The results obtained are presented in the tables below (* - the average CPU usage / CPU Ready on the svm VM noted during the tests):
VMFS6 Datastore
| Disk type | randread, depth 1 (iops) | randwrite, depth 1 (iops) | randread, depth 24 (iops) | randwrite, depth 24 (iops) |
|---|---|---|---|---|
| HDD | 77 | 99 | 169 | 100 |
| SSD | 5639 | 17039 | 40868 | 53670 |
NFS Datastore
| SSD Cache | randread, depth 1 (iops) | randwrite, depth 1 (iops) | randread, depth 24 (iops) | randwrite, depth 24 (iops) | CPU / Ready*, % |
|---|---|---|---|---|---|
| Off | 103 | 97 | 279 | 102 | 2.7 / 0.15 |
| On | 1390 | 722 | 6474 | 576 | 15 / 0.1 |
Work cluster
| Disk type | randread, depth 1 (iops) | randwrite, depth 1 (iops) | randread, depth 24 (iops) | randwrite, depth 24 (iops) |
|---|---|---|---|---|
| 900Gb 10k (6D+2P) | 122 | 1085 | 2114 | 1107 |
| 4Tb 7.2k (8D+2P) | 68 | 489 | 1643 | 480 |
More tangible results came from starting five Windows 7 VMs with an office suite (MS Office 2013 Pro + Visio + Project) at the same time. As the cache warmed up, the VMs booted faster, while the HDDs were barely involved in the load. On each run I noted the time to fully boot one of the five VMs and the time to fully boot all of them.
Simultaneous start of five VMs
| No | Datastore | First start: first VM | First start: all VMs | Second start: first VM | Second start: all VMs | Third start: first VM | Third start: all VMs |
|---|---|---|---|---|---|---|---|
| 1 | HDD VMFS6 | 4 min 8 s | 6 min 28 s | 3 min 56 s | 6 min 23 s | 3 min 40 s | 5 min 50 s |
| 2 | NFS (SSD Cache Off) | 2 min 20 s | 3 min 2 s | 2 min 34 s | 3 min 2 s | 2 min 34 s | 2 min 57 s |
| 3 | NFS (SSD Cache On) | 2 min 33 s | 2 min 50 s | 1 min 23 s | 1 min 51 s | 1 min 0 s | 1 min 13 s |
The boot time of a single VM was:
- HDD VMFS6 - 50 s
- NFS (SSD Cache Off) - 35 s
- NFS (SSD Cache On) - 26 s
In the form of a graph:
Crash test
Power failure
After the power was restored and the host booted, the svm VM started with an FS check (the data remained in the cache), the NFS datastore was mounted on the host, then the rest of the VMs were started; no problems or data loss were observed.
HDD failure (simulated)
I decided to cut power to one of the SATA drives. Unfortunately, hot swap is not supported, so the host has to be shut down the hard way. Immediately after the disk is disconnected, information about it appears in Events.
The unpleasant moment was that when the disk was lost, the hypervisor put a question to the svm VM ("Retry" or "Click Cancel to terminate this session"), and the machine stays frozen until it is answered.
If we imagine that the problem with the disk was temporary and minor (for example, a bad SATA cable), then after it is fixed and the host is powered on, everything boots up normally.
SSD failure
The most unpleasant situation is an SSD failure. Access to the data is only possible via the emergency procedure described below. When replacing the SSD, the system setup procedure has to be repeated.
Service (disk replacement)
If a disk is about to fail (according to SMART), the following procedure must be performed to replace it with a working one (on the svm VM):
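A sketch, assuming the failing HDD is /dev/sdb and a single md array (check /proc/mdstat for your actual array and partition names, and repeat for every array the disk is a member of):

    # Check the current state of the arrays
    cat /proc/mdstat

    # Mark the partition of the failing disk as faulty and remove it from the array
    mdadm /dev/md127 --fail /dev/sdb1 --remove /dev/sdb1

    # Shut the VM down before detaching the vHDD
    shutdown -h now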
In the VM settings, detach the failing vHDD, then replace the physical HDD with a new one.
Then prepare a new RDM disk and add it to the svm VM; it is important to attach it to the same SCSI Virtual Device Node as the old vHDD:
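A sketch of bringing the new disk into the array, assuming the healthy disk is /dev/sda and the new one is /dev/sdb (for GPT-partitioned disks use sgdisk instead of sfdisk):

    # Copy the partition table from the healthy disk to the new one
    sfdisk -d /dev/sda | sfdisk /dev/sdb

    # Add the new partition back into the array and watch the rebuild
    mdadm /dev/md127 --add /dev/sdb1
    cat /proc/mdstat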
Emergency data access
One of the disks is connected to a workstation, then you need to assemble the RAID array, disable the cache and get to the data by mounting the LVM volume:
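A sketch of the recovery on the workstation, assuming the HDD shows up as /dev/sdb; device names are placeholders, and since the cache PV is absent the exact flags (e.g. --force) may vary with the lvm2 version:

    # Assemble the RAID1 array in degraded mode from the single available member
    mdadm --assemble --run /dev/md127 /dev/sdb2

    # Detach the (missing) cache from the data volume, then activate and mount it
    lvconvert --uncache cl_svm/data
    vgchange -ay cl_svm
    mount /dev/cl_svm/data /mnt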
I also tried booting the system directly from this disk, configuring the network and mounting the NFS datastore on another host - the VMs were available.
Summary
As a result, I use lvmcache in write-through mode with a 60Gb cache partition. By slightly sacrificing the host's CPU and RAM resources, instead of 210Gb of very fast and 1.3Tb of slow disk space I got 680Gb of fast and 158Gb of very fast space, plus fault tolerance (although if a disk fails unexpectedly, I will have to take part in restoring access to the data).