
Accelerate Home ESXi 6.5 with SSD Caching

Good day!



In this article I want to talk about how to squeeze a bit more performance out of an ESXi host using SSD caching. Both at work and at home I use VMware products; my home lab is built on Free ESXi 6.5. The host runs virtual machines both for the home infrastructure and for testing work projects (at one point I even had to run a VDI infrastructure on it). Gradually the applications on the heavier VMs started to hit the limits of the disk subsystem, and not everything fit on the SSD. lvmcache was chosen as the solution. The logical diagram looks like this:







The basis of the whole scheme is the svm VM, based on CentOS 7. The HDDs are presented to it via RDM, along with a small VMDK disk from the SSD datastore. Caching and data mirroring are implemented in software, with mdadm and lvmcache. The VM's disk space is mounted on the host as an NFS datastore. Part of the SSD datastore is reserved for VMs that need a high-performance disk subsystem.


The compute node is assembled from desktop hardware:



MB: Gigabyte GA-Z68MX-UD2H-B3 (rev. 1.0)

HDD: 2 x Seagate Barracuda 750Gb, 7200 rpm

SSD: OCZ Vertex 3 240Gb



There are 2 RAID controllers on the motherboard:



- Intel Z68 SATA Controller

- Marvell 88SE9172 SATA Controller



I didn't manage to get the 88SE9172 working in ESXi (Marvell adapters, at least the 88SE91xx series, are problematic there), so I decided to leave both controllers in AHCI mode.



RDM



RDM (Raw Device Mapping) technology allows a virtual machine to access a physical drive directly. The link is provided through special mapping files on a separate VMFS volume. RDM has two compatibility modes:



- Virtual mode - behaves the same way as a virtual disk file and keeps the advantages of a virtual disk on VMFS (the file locking mechanism, instant snapshots);

- Physical mode - provides direct access to the device for applications that require a lower level of control.



In virtual mode, only read/write operations are sent to the physical device. The RDM device appears in the guest OS as a virtual disk file; the hardware characteristics are hidden.



In physical mode, almost all SCSI commands are passed through to the device, and in the guest OS the device appears as a real one.



By attaching the disk drives to the VM via RDM, you can get rid of the VMFS layer, and in physical compatibility mode their state can be monitored from inside the VM (using SMART). In addition, if something happens to the host, the data can be reached by mounting the HDD in any working system.
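
For reference, the compatibility mode is chosen when the mapping file is created with vmkfstools: -r produces a virtual-mode RDM, -z a physical-mode one. A short sketch with placeholder paths (the real commands for this setup appear in the Disk preparation section below):

Virtual compatibility mode:
# vmkfstools -r /vmfs/devices/disks/<device> rdm_virtual.vmdk

Physical compatibility mode:
# vmkfstools -z /vmfs/devices/disks/<device> rdm_physical.vmdk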



lvmcache



lvmcache provides transparent caching of data from slow HDD devices on fast SSD devices. LVM cache places the most frequently used blocks on the fast device. Caching can be enabled and disabled without interrupting work.





On a read request, the system first checks whether the data is already in the cache. If it is not, the read is served from the HDD and the data is also copied into the cache along the way (a cache miss). Subsequent reads of that data are served from the cache (a cache hit).



Writes



- Write-through mode - on a write, data is written both to the cache and to the HDD; the safer option, with little chance of data loss in a crash;

- Write-back mode - on a write, data is first written to the cache and only later flushed to disk, so there is a chance of data loss in a crash. (The faster option, since the write-completion signal is returned to the guest OS as soon as the cache has received the data.)
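
The mode is set when the cache pool is created (as in the setup below); for an already cached volume it can also be switched later. A sketch using the cl_svm/data volume that is configured further down in this article:

# lvconvert --cachemode writeback cl_svm/data
# lvconvert --cachemode writethrough cl_svm/data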



This is how the data can be flushed from the cache (write-back) to the disks:
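
A minimal sketch of one standard way to do this: temporarily switch the cache policy to "cleaner" and wait for the dirty blocks to drain (again using the cl_svm/data volume configured later in this article):

# lvchange --cachepolicy cleaner cl_svm/data
# lvs -o+cache_dirty_blocks cl_svm/data
(wait until the dirty block count drops to 0, then return to the default policy)
# lvchange --cachepolicy smq cl_svm/data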







System Setup



An SSD datastore is created on the host. I chose the following scheme for using the available space:



220Gb - DATASTORE_SSD

149Gb - space for VMs that need a fast disk subsystem

61Gb - VMDK disk for the svm VM (cache)

10Gb - Host Swap Cache




The virtual network looks like this:







I created a new vSwitch:



Networking → Virtual Switches → Add standard virtual switch - give it a name (svm_vSwitch), leave the rest at defaults.



A VMkernel NIC is connected to it through a port group:



Networking → VMkernel NICs → Add VMkernel NIC

- Port group - New Port group

- New port group - svm_PG

- Virtual switch - svm_vSwitch

- IPv4 settings - Configuration - Static - IP 10.0.0.1




I created the port group that the svm VM will be connected to:



Networking → Port Groups → Add port group - name it svm_Network and attach it to svm_vSwitch
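
If you prefer the command line, roughly the same network configuration can be done from the ESXi shell with esxcli (a sketch; the 10.0.0.1/255.255.255.0 addressing is an assumption matching the NFS export used later):

# esxcli network vswitch standard add -v svm_vSwitch
# esxcli network vswitch standard portgroup add -v svm_vSwitch -p svm_PG
# esxcli network ip interface add -i vmk1 -p svm_PG
# esxcli network ip interface ipv4 set -i vmk1 -t static -I 10.0.0.1 -N 255.255.255.0
# esxcli network vswitch standard portgroup add -v svm_vSwitch -p svm_Network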



Disk preparation



Log in to the host over SSH and run the following commands:



List the device paths of the disks:

# ls -lh /vmfs/devices/disks/
lrwxrwxrwx 1 root root 72 Feb 22 20:24 vml.01000000002020202020202020202020203956504257434845535433373530 -> t10.ATA_____ST3750525AS_________________________________________9*E
lrwxrwxrwx 1 root root 72 Feb 22 20:24 vml.01000000002020202020202020202020203956504257434b46535433373530 -> t10.ATA_____ST3750525AS_________________________________________9*F

Change to the directory where the «mapping files» will be stored:

# cd /vmfs/volumes/DATASTORE_SSD/

Create the RDMs in virtual compatibility mode:

# vmkfstools -r /vmfs/devices/disks/vml.01000000002020202020202020202020203956504257434845535433373530 9*E.vmdk
# vmkfstools -r /vmfs/devices/disks/vml.01000000002020202020202020202020203956504257434b46535433373530 9*F.vmdk


VM preparation



Now these disks can be attached to the new VM (Existing hard disk). The VM: CentOS 7 template, 1 vCPU, 1024 MB RAM, 2 RDM disks, a 61Gb SSD disk, 2 vNICs (port groups VM Network and svm_Network). During OS installation the partitioning uses Device Type - LVM, RAID Level - RAID1.



Setting up an NFS server is quite simple:



# yum install nfs-utils
# systemctl enable rpcbind
# systemctl enable nfs-server
# systemctl start rpcbind
# systemctl start nfs-server
# vi /etc/exports
/data 10.0.0.1(rw,sync,no_root_squash,no_subtree_check)
# exportfs -ar
# firewall-cmd --add-service=nfs --permanent
# firewall-cmd --add-service=rpc-bind --permanent
# firewall-cmd --add-service=mountd --permanent
# firewall-cmd --reload
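
Before moving on to the host side, the export can be double-checked locally (an optional sketch):

# exportfs -v
# showmount -e localhost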


Prepare the cache and metadata volumes and enable caching of the cl_svm/data volume:



Add the SSD disk to the existing volume group:

# pvcreate /dev/sdc
# vgextend cl_svm /dev/sdc

Create the metadata volume; according to the man page it should be roughly 1000 times smaller than the cache volume:

# lvcreate -L 60M -n meta cl_svm /dev/sdc

Create the cache volume:

# lvcreate -L 58.9G -n cache cl_svm /dev/sdc

Combine the volumes into a cache pool:

# lvconvert --type cache-pool --cachemode writethrough --poolmetadata cl_svm/meta cl_svm/cache

Attach the cache pool to the data volume:

# lvconvert --type cache --cachepool cl_svm/cache cl_svm/data

Cache statistics can be viewed like this:

# lvs -o cache_read_hits,cache_read_misses,cache_write_hits,cache_write_misses
  CacheReadHits    CacheReadMisses  CacheWriteHits   CacheWriteMisses
         421076           282076           800554          1043571
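
To confirm that the data volume is actually cached and see which devices back it, something like this can be used (a sketch; the output will differ):

# lvs -a -o lv_name,lv_size,segtype,devices cl_svm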


Array status change notifications:



At the end of the /etc/mdadm.conf file, add parameters containing the address to which messages will be sent in case of problems with the array, and, if necessary, specify the sender's address:



MAILADDR alert@domain.ru

MAILFROM svm@domain.ru




For the changes to take effect, you must restart the mdmonitor service:



# systemctl restart mdmonitor
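
To make sure the notifications actually get delivered, mdadm can be asked to send a one-off test alert for each array:

# mdadm --monitor --scan --test --oneshot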


Mail from the VM is sent via ssmtp. Since I use RDM in virtual compatibility mode, the health of the disks (SMART) is monitored from the host itself.



Host preparation



Add NFS datastore to ESXi:



Storage β†’ Datastores β†’ New Datastore β†’ Mount NFS Datastore

Name: DATASTORE_NFS

NFS server: 10.0.0.2

NFS share: /data
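
The same datastore can also be mounted from the ESXi command line (a sketch with the names used above):

# esxcli storage nfs add -H 10.0.0.2 -s /data -v DATASTORE_NFS
# esxcli storage nfs list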




Configure VM autostart:



Host β†’ Manage β†’ System β†’ Autostart β†’ Edit Settings

Enabled β€” Yes

Start delay β€” 180sec

Stop delay β€” 120sec

Stop action β€” Shut down

Wait for heartbeat β€” No



Virtual Machines β†’ svm β†’ Autostart β†’ Increase Priority





This policy allows the svm VM to start first; the hypervisor then mounts the NFS datastore, after which the rest of the machines are powered on. Shutdown happens in reverse order. The start delay was chosen based on the results of the crash test: with a small Start delay the NFS datastore did not manage to get mounted in time, and the host tried to start VMs that were not yet available. You can also play with the NFS.HeartbeatFrequency parameter.



VM autostart can be configured more flexibly from the command line:



View the current autostart settings:

# vim-cmd hostsvc/autostartmanager/get_autostartseq

Change the autostart parameters for a VM (syntax):

update_autostartentry VMId StartAction StartDelay StartOrder StopAction StopDelay WaitForHeartbeat

Example:

# vim-cmd hostsvc/autostartmanager/update_autostartentry 3 "powerOn" "120" "1" "guestShutdown" "60" "systemDefault"
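
The VMId required in the command above can be found by listing the VM inventory:

# vim-cmd vmsvc/getallvms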


Small optimization



Enable Jumbo Frames on the host:



Networking → Virtual Switches → svm_vSwitch → MTU 9000;

Networking → VMkernel NICs → vmk1 → MTU 9000




In Advanced Settings, set the following values:



NFS.HeartbeatFrequency = 12

NFS.HeartbeatTimeout = 5

NFS.HeartbeatMaxFailures = 10

Net.TcpipHeapSize = 32 (default 0)

Net.TcpipHeapMax = 512

NFS.MaxVolumes = 256

NFS.MaxQueueDepth = 64 (default 4294967295)
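
These can also be set from the ESXi shell; a sketch with two of the parameters, the rest follow the same pattern:

# esxcli system settings advanced set -o /NFS/HeartbeatFrequency -i 12
# esxcli system settings advanced set -o /NFS/MaxQueueDepth -i 64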




Enable Jumbo Frames on VM svm:



# ifconfig ens224 mtu 9000 up
# echo MTU=9000 >> /etc/sysconfig/network-scripts/ifcfg-ens224
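
Whether jumbo frames actually pass end to end can be checked from the VM with a non-fragmenting ping of a full 9000-byte frame (8972 bytes of payload plus 28 bytes of headers); 10.0.0.1 is assumed to be the host's VMkernel address as above:

# ping -M do -s 8972 -c 3 10.0.0.1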


Performance



Performance was measured with a synthetic test (for comparison I also took readings from the cluster at work, at night).



Used software on test VM:



- OS CentOS 7.3.1611 (8 vCPU, 12Gb vRAM, 100Gb vHDD)

- fio v2.2.8



The disk was first filled with data, then the tests were run:

# dd if=/dev/zero of=/dev/sdb bs=2M oflag=direct
# fio -readonly -name=rr -rw=randread -bs=4k -runtime=300 -iodepth=1 -filename=/dev/sdb -ioengine=libaio -direct=1
# fio -readonly -name=rr -rw=randread -bs=4k -runtime=300 -iodepth=24 -filename=/dev/sdb -ioengine=libaio -direct=1
# fio -name=rw -rw=randwrite -bs=4k -runtime=300 -iodepth=1 -filename=/dev/sdb -ioengine=libaio -direct=1
# fio -name=rw -rw=randwrite -bs=4k -runtime=300 -iodepth=24 -filename=/dev/sdb -ioengine=libaio -direct=1


The results are presented in the tables below (* during the tests I noted the average CPU usage on the svm VM):

VMFS6 Datastore (iops)

Disk type | randread, depth 1 | randwrite, depth 1 | randread, depth 24 | randwrite, depth 24
HDD       | 77                | 99                 | 169                | 100
SSD       | 5639              | 17039              | 40868              | 53670


NFS Datastore (iops)

SSD Cache | randread, depth 1 | randwrite, depth 1 | randread, depth 24 | randwrite, depth 24 | CPU / Ready *, %
Off       | 103               | 97                 | 279                | 102                 | 2.7 / 0.15
On        | 1390              | 722                | 6474               | 576                 | 15 / 0.1


Work Cluster (iops)

Disk type           | randread, depth 1 | randwrite, depth 1 | randread, depth 24 | randwrite, depth 24
900Gb 10k (6D + 2P) | 122               | 1085               | 2114               | 1107
4Tb 7.2k (8D + 2P)  | 68                | 489                | 1643               | 480


Results you can feel with your hands were obtained by starting five VMs with Windows 7 and an office suite (MS Office 2013 Pro + Visio + Project) simultaneously. As the cache warmed up, the VMs booted faster and the HDDs were barely involved in the load. On each run I recorded the boot time of one of the five VMs and the time until all of them had fully booted.

Simultaneous start of five VMs

No | Datastore           | First start (first VM / all VMs) | Second start (first VM / all VMs) | Third start (first VM / all VMs)
1  | HDD VMFS6           | 4 min 8 s / 6 min 28 s           | 3 min 56 s / 6 min 23 s           | 3 min 40 s / 5 min 50 s
2  | NFS (SSD Cache Off) | 2 min 20 s / 3 min 2 s           | 2 min 34 s / 3 min 2 s            | 2 min 34 s / 2 min 57 s
3  | NFS (SSD Cache On)  | 2 min 33 s / 2 min 50 s          | 1 min 23 s / 1 min 51 s           | 1 min 0 s / 1 min 13 s


The loading time of a single VM was:



- HDD VMFS6 - 50 s

- NFS (SSD Cache Off) - 35 s

- NFS (SSD Cache On) - 26 s




In the form of a graph:







Crash test



Power down



After power was restored and the host booted, the svm VM started with a file system check (the data was still in the cache), the NFS datastore was mounted on the host, and then the rest of the VMs booted; no problems or data loss were observed.



HDD failure (simulated)



I decided to cut power to one SATA drive. Unfortunately hot swapping is not supported, so the host has to be shut down for the replacement. Immediately after the disk was disconnected, information about it appeared in Events.







An unpleasant moment was that when the disk was lost, the hypervisor put a question to the svm VM ("Retry. Click Cancel to terminate this session"), and the machine stayed frozen until it was answered.



If we imagine that the problem with the disk was temporary and minor (a bad cable, for example), then after it is fixed and the host is powered on, everything boots in normal mode.



SSD failure



The most unpleasant situation is an SSD failure. Data then has to be accessed in emergency mode (see below). When replacing the SSD, the system setup procedure must be repeated.



Service (disk replacement)



If a disk is about to fail (according to SMART), the following procedure must be performed on the svm VM in order to replace it with a working one:



Check the state of the arrays:

# cat /proc/mdstat

Look at the details of an array:

# mdadm --detail /dev/md126

Mark the partitions of the failing disk as faulty:

# mdadm --manage /dev/md127 --fail /dev/sda1
# mdadm --manage /dev/md126 --fail /dev/sda2

Remove the partitions from the arrays:

# mdadm --manage /dev/md127 --remove /dev/sda1
# mdadm --manage /dev/md126 --remove /dev/sda2


In the VM settings, detach the failing vHDD, then replace the physical HDD with a new one.

Then prepare an RDM disk for it and add it to the svm VM:



Rescan the SCSI bus (X is the number of the SCSI host that corresponds to the Virtual Device Node of the new vHDD):

# echo "- - -" > /sys/class/scsi_host/hostX/scan

Copy the partition layout to the new disk with sfdisk:

# sfdisk -d /dev/sdb | sfdisk /dev/sdc

Add the new partitions to the arrays and install the bootloader:

# mdadm --manage /dev/md127 --add /dev/sdc1
# mdadm --manage /dev/md126 --add /dev/sdc2
# grub2-install /dev/sdc


Emergency data access



One of the disks is connected to a workstation; then you need to assemble the RAID, disable the cache, and get to the data by mounting the LVM volume:



# mdadm --assemble --scan
# lvremove cl_svm/cache
# lvchange -ay /dev/cl_svm/data
# mount /dev/cl_svm/data /mnt/data


I also tried booting the system directly from one of the disks, configured the network, and mounted the NFS datastore on another host; the VMs were available.



Summary



As a result, I use lvmcache in write-through mode with a 60Gb cache partition. By sacrificing a little of the host's CPU and RAM, instead of 210Gb of very fast and 1.3Tb of slow disk space I got 680Gb of fast and 158Gb of very fast storage, plus fault tolerance (though if a disk fails unexpectedly, you will have to take part in restoring access to the data yourself).

Source: https://habr.com/ru/post/323824/


