Recently, "virtual" hard drives have been appearing more and more often in cloud and hosting environments. The hoster's support team will assure you that the "virtual" disk is as fast as a dozen RAID 10 arrays (RAID 100 ;-)) and can sustain hundreds or even thousands of IOPS, yet for the clients MySQL is noticeably slow. How do you prove this to the hoster?
The problem is that measuring the "speed" of a virtual hard disk from inside a virtual machine is not easy: it is not obvious what exactly to measure, how, and why. Yet you need to do it to convince the administrators of the virtual setup that the problem is not in the application or the MySQL settings. And, as they say, we simply had to "wash our hands" before they went off to read the manual for their storage system.
In this article I will show a simple way to find the "tipping point" of virtual hard disk performance using tools available in any distribution: sysbench and iostat. We will also measure the tipping point of Amazon's well-known EBS virtual disks, both regular EBS and Provisioned IOPS EBS (1000 and 2000 IOPS).
Theory
To hell with theory! Too many dimensions and too many words: sequential read/write, random read/write, rewrite, the effect of the kernel file cache, request queue optimization, file system architecture options... Let's just watch how the disk, or whatever is hidden behind it on the network, breathes under load.
And to keep it from getting boring, let's add a little gnu.org romance :-)

How MySQL loads the disk
For InnoDB in a typical web application, if the dataset does not fit in RAM (which is why databases were invented ;-)), data is mostly read from disk in random order (buffer pool pages and the clustered index on disk) and written sequentially (transaction logs and the binary log).
Periodically the buffer pool pages in RAM are flushed to disk, which is random writing. We assume right away that the virtual storage does have a write cache with a "battery"; it must, otherwise MySQL in ACID mode (innodb-flush-log-at-trx-commit = 1) will simply die of grief.
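As a quick sanity check, assuming you have a local mysql client with access to the server, you can verify the relevant settings (these are standard MySQL/InnoDB variables):

mysql -e "SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';"
mysql -e "SHOW VARIABLES LIKE 'innodb_buffer_pool_size';"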
It is important to understand that MySQL client queries are executed in parallel threads, so the disk will be loaded by several threads at once.
Creating the load
Let's start with a regular Amazon EBS disk. We will load the virtual disk with the sysbench tool (it is available as a CentOS package and is easy to build from source):
yum install sysbench
mkdir -p /mount/disk_ebs/mysql/test_folder
cd /mount/disk_ebs/mysql/test_folder
sysbench --test=fileio --file-total-size=16G prepare
The important point is that the total size of the test files (16G) should be at least twice the amount of RAM in the virtual machine (do not ask why exactly twice ;-), the more the better). This is needed to reduce the effect of the operating system file cache; for the same reason it is better to regenerate the test files before re-running the test (or create several test folders and switch between them).
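For example, regenerating the test files between runs could look roughly like this (same folder and size as above; sysbench's cleanup command simply removes the test files):

cd /mount/disk_ebs/mysql/test_folder
sysbench --test=fileio --file-total-size=16G cleanup
sysbench --test=fileio --file-total-size=16G prepare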
Now we create load in N threads, emulating roughly N clients of the DBMS server executing queries (yes, I know the DBMS also has several internal service threads, but let's not complicate things for now). Suppose we expect 10 Apache processes to work with the database at the same time:
sysbench --num-threads=10 --test=fileio --file-test-mode=rndrw --max-time=180 --file-rw-ratio='2' --file-total-size=16G --max-requests=1000000 run
What are we measuring?
Now the fun part. We are not interested in the results sysbench itself reports; it is only there to generate the load. We just watch how the virtual disk feels under that load:
iostat -xm 10

Device:  rrqm/s  wrqm/s     r/s    w/s   rMB/s   wMB/s avgrq-sz avgqu-sz   await  svctm  %util
xvdm       0.00    0.00  120.50   0.00    2.05    0.00    34.79    10.50   87.47   8.30 100.00
The disk is "swamped" with requests nearly all of the time, as "%util = 100" shows: the disk driver accumulates requests in a queue and "feeds" them to the device as it becomes ready (some people confuse this indicator with the bandwidth of the bus to the disk, which is of course wrong). Clearly, if the driver has to wait almost all the time, the disk is loaded to capacity.
The average service time of a single request (svctm) is 8.3 ms. That is a lot, but normal for Amazon disks; nothing criminal here, just ordinary physics.
The average time a request waits to be served (await) is 87.47 ms, and the average length of the request queue in the disk driver (avgqu-sz) is 10.5. That is a lot: almost 100 ms to get a single request served! It is also clear where this value comes from: roughly, the queue size (avgqu-sz) multiplied by the service time of a single request (svctm); here 10.5 * 8.3 ms ≈ 87 ms.
So we see that a mere 10 concurrent threads of random read/write (the --file-test-mode=rndrw and --file-rw-ratio='2' options) are enough to noticeably slow down work with the virtual hard disk. So much so that a single request has to wait for almost 100 ms. And if a web page generates 200 disk requests, how long will it take to build? 20 seconds?
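If you want to watch only the interesting columns for the device under test, a rough one-liner like this will do (xvdm is the device from our example; the field numbers depend on your sysstat version, so check them against the iostat header first):

iostat -xm 10 | awk '/^xvdm/ {print "await=" $10, "svctm=" $11, "util=" $12}'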
It is interesting to find out at how many threads the Amazon disk does not yet pile up a queue and still serves requests within at least 50 ms (ideally under 20 ms, but that is subjective). We see that it is about 5 threads. Of course, this is a weak result; you cannot get by without a software RAID here...
Device:  rrqm/s  wrqm/s     r/s    w/s   rMB/s   wMB/s avgrq-sz avgqu-sz   await  svctm  %util
xvdm       0.00    0.00  127.50   0.00    2.05    0.00    32.88     5.10   39.78   7.84 100.00
We see that the queue size is 5.1 and the time to serve one request (await) is 39.78 ms.
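To find this tipping point with less manual work, you can simply sweep the number of threads with a rough script like the one below (keep iostat -xm 10 running in a second terminal while it works; the thread counts are just an example):

for t in 1 2 5 10 20 50; do
  echo "=== $t threads ==="
  sysbench --num-threads=$t --test=fileio --file-test-mode=rndrw \
    --max-time=60 --file-rw-ratio='2' --file-total-size=16G \
    --max-requests=1000000 run
done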
Testing Amazon's “fast” virtual disks
Relatively recently Amazon announced "fast" disks with a guaranteed number of IOPS (the theory behind IOPS is as vast as our country; google it, starting with en.wikipedia.org/wiki/IOPS ). We know that a mere mortal SATA drive does not handle more than about 100 IOPS (mixed reads and writes), and, sadly, an equally mortal 15k SAS drive no more than ~200 IOPS. It is also known that SSDs and SANs built on other technologies can handle hundreds and even thousands of IOPS; obviously they cost much more.
So let's see at how many simultaneous threads the "fast" Amazon virtual disks start to choke and pile requests up in the queue. Let's break the goat's left leg!

EBS Disk with 1000 IOPS
One thread:

Device:  rrqm/s  wrqm/s     r/s    w/s   rMB/s   wMB/s avgrq-sz avgqu-sz   await  svctm  %util
xvdk       0.00    0.00 1084.50   0.00   27.33    0.00    51.61     3.72    3.43   0.91  99.20
Note the short service time of a request by the virtual disk itself: 0.91 ms. Apparently an SSD-backed array ;-) The queue size is ~4, and the average time per request is 3.43 ms.
20 threads:

Device:  rrqm/s  wrqm/s     r/s    w/s   rMB/s   wMB/s avgrq-sz avgqu-sz   await  svctm  %util
xvdk       0.00    0.00 1059.50   0.00   26.37    0.00    50.98    55.39   51.97   0.94 100.00
We see that at 20 threads a request has to wait ~50 ms because a queue of 55 requests builds up.
EBS Disk with 2000 IOPS
20 threads:

Device:  rrqm/s  wrqm/s     r/s    w/s   rMB/s   wMB/s avgrq-sz avgqu-sz   await  svctm  %util
xvdl       0.00    0.00 1542.50   0.00   36.29    0.00    48.18    33.20   21.29   0.65 100.00
50 threads:

Device:  rrqm/s  wrqm/s     r/s    w/s   rMB/s   wMB/s avgrq-sz avgqu-sz   await  svctm  %util
xvdl       0.00    0.00 1498.50   0.00   36.63    0.00    50.06    86.17   57.05   0.67 100.00
Measurement results
We see that the EBS disk with 2000 IOPS shows roughly the same latency (~50 ms) at 50 threads as the disk with 1000 IOPS does at 20 threads and a regular EBS disk does at 6-7 threads (apparently a regular EBS disk delivers somewhere around 200-300 IOPS).
What else can happen
Virtual hard drives often bring surprises. Sometimes they are simply under-tuned, because someone did not get around to finishing the man page...
I recently ran into a case like this: under a multi-threaded test load on an otherwise idle MySQL server backed by a "big-expensive-fast-networked" virtual disk, svctm ranged from 0.5 to 1 ms at night and from ~10 to 100 ms during the day (I also wanted to measure it at full moon, but could not wait that long). MySQL, of course, crawled. The cause turned out to be other projects sharing the network storage without knowing about each other, not the MySQL settings that people tried to blame ;-)
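In such cases it helps to simply log iostat around the clock and compare day and night; a minimal sketch (interval, duration and file name are arbitrary):

# one sample per minute for 24 hours (1440 samples), written to a dated log
nohup iostat -xm 60 1440 > iostat_$(date +%F).log 2>&1 &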
Summary
Using the tools at hand, we rather quickly found the concurrency limit of multi-threaded load at which the virtual disk starts to pile up a queue and serve quite typical MySQL requests in 50 ms or more. Now we can estimate how many disks to combine into a RAID to guarantee a latency of, say, 10-20 ms for a given number of clients. Of course these are approximate figures, but they will certainly help you move forward, especially if you also measure the performance of a real hard disk / RAID and come to the cloud hoster with a cloud of champagne ;-)
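A very crude way to get that first estimate, in the spirit of the await ≈ avgqu-sz * svctm approximation above (it ignores real queueing theory and assumes the load spreads evenly across the disks): disks ≈ threads * svctm / target await. For example, for 20 client threads, ~8 ms svctm per request and a 20 ms target:

echo "scale=0; (20 * 8) / 20" | bc   # ~8 disks in the software RAID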
In conclusion, congratulations to everyone on the recent holiday; I wish you fast virtual disks, reliable servers, and accurate measurements! Come visit us at Bitrix24. Good luck!