
Choosing a disk system for a MySQL database

In many large, high-load web projects, database speed is often the performance bottleneck. You can add memory or tune certain parameters, but in the end it most often comes down to the disks.







We ran into the same bottleneck on our own projects, periodically seeing disk utilization close to 100% in iostat.
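As a rough illustration of how such saturation can be spotted, the sketch below filters the output of `iostat -dx` for devices whose `%util` is at or above a threshold. The function name `busy_devices` and the 90% default are assumptions made for this example, not something from our setup.

```shell
#!/bin/sh
# Print "device %util" for every device whose %util is at or above a threshold.
# Reads `iostat -dx` output on stdin. The %util column is located by name,
# because its position differs between sysstat versions.
busy_devices() {
    threshold="${1:-90}"   # hypothetical default threshold, in percent
    awk -v limit="$threshold" '
        /^Device/ {
            col = 0
            for (i = 1; i <= NF; i++) if ($i == "%util") col = i
            next
        }
        col && NF >= col && ($col + 0) >= limit { print $1, $col }
    '
}

# Usage (requires the sysstat package): 3 reports at 5-second intervals.
# iostat -dx 5 3 | busy_devices 90
```

Keep in mind that the first iostat report shows averages since boot, so the later reports say more about the current load.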


In this post we want to share our experience of solving this problem.



The first (and seemingly most obvious) solution is to use faster disks.



The fastest option today is probably SSD.



SSD drives work very, very fast! But…



Even Domas Mituzas (a database performance engineer at Facebook), speaking at the recent Highload++ 2011 conference, said something like: "If we could use SSDs everywhere, we wouldn't need to invent anything at all in terms of performance; our entire work would not make much sense."



Another approach is to use several disks instead of one. In other words, RAID.



We have already written that we host our own projects in the Amazon cloud, where we successfully work with software RAID assembled from Amazon EBS disks.



There are many different RAID configurations.



Surely many of you have already seen the results of the Amazon EBS disk tests published on the MySQL Performance Blog.



They are interesting, but they did not quite suit us, mainly because very different setups are compared incorrectly (for example, reading from a single disk in one thread, RAID 0 with 8 threads, RAID 10 with 4, and so on).



Therefore, we decided to run our own tests with the same tool, sysbench.



We decided to work with RAID 10, which is both fast and reliable. But it, too, can be configured in many different ways.



A small digression. During testing we came to appreciate another very important advantage of the "cloud": it makes it very convenient to run all kinds of tests, assembling and dismantling test benches at will, while paying only for the time actually used!



So, we assembled 5 test benches.



1. Single disk - 100 GB



2. RAID 10 - four 50 GB disks



We added 4 disks in the Amazon admin panel, attached them with the appropriate device names, and then created the array like this:



# mdadm --create /dev/md0 --level=10 --raid-devices=4 /dev/xvd[g-j]



3. RAID 10 - RAID 0 over two RAID 1 arrays (2 disks of 50 GB each)



The same procedure, but the final raid is created in three steps:



# mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/xvd[gh]

# mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/xvd[ij]

# mdadm --create /dev/md2 --level=0 --raid-devices=2 /dev/md[0-1]




4. RAID 10 - eight 25 GB disks



Same as item 2, but with 8 disks attached instead of 4.



# mdadm --create /dev/md0 --level=10 --raid-devices=8 /dev/xvd[g-n]



5. RAID 10 - RAID 0 over four RAID 1 arrays (2 disks of 25 GB each)



# mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/xvd[gh]

# mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/xvd[ij]

# mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/xvd[kl]

# mdadm --create /dev/md3 --level=1 --raid-devices=2 /dev/xvd[mn]

# mdadm --create /dev/md4 --level=0 --raid-devices=4 /dev/md[0-3]
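The disk sizes above are chosen so that every bench offers the same usable space. A minimal sketch of that arithmetic, assuming two copies of each block (the mdadm default for RAID 10) and ignoring metadata overhead; the helper name `raid10_usable_gb` is made up for this example:

```shell
#!/bin/sh
# Usable capacity of a RAID 10 array that keeps two copies of each block:
# half of the raw capacity.
# $1: number of disks, $2: size of each disk in GB.
raid10_usable_gb() {
    echo $(( $1 * $2 / 2 ))
}

echo "single disk:      100 GB"
echo "RAID 10, 4 x 50:  $(raid10_usable_gb 4 50) GB"
echo "RAID 10, 8 x 25:  $(raid10_usable_gb 8 25) GB"
```

The nested variants (RAID 0 over RAID 1 pairs) give the same figure, since each mirrored pair contributes the capacity of one disk.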




The ext4 file system was used on all test benches. Mount options:



noatime,nodiratime,data=writeback,barrier=0
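For illustration, here is roughly how a freshly created array could be formatted and mounted with these options. This is a sketch, not our exact procedure: the mount point `/mnt/mysql-test` and the helper name are hypothetical. The function only prints the commands so they can be reviewed before being run as root.

```shell
#!/bin/sh
# Mount options taken from the text above.
MOUNT_OPTS="noatime,nodiratime,data=writeback,barrier=0"

# Print the commands that would format and mount an array.
# $1: block device (e.g. /dev/md0), $2: mount point (hypothetical path).
prepare_fs_commands() {
    dev="$1"
    dir="$2"
    echo "mkfs.ext4 $dev"
    echo "mkdir -p $dir"
    echo "mount -t ext4 -o $MOUNT_OPTS $dev $dir"
}

# Review the output first; to actually run it, pipe it to `sh` as root:
# prepare_fs_commands /dev/md0 /mnt/mysql-test | sh
prepare_fs_commands /dev/md0 /mnt/mysql-test
```

Note that data=writeback and barrier=0 favor speed over crash safety, which is acceptable on a disposable test bench.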



For the tests we used sysbench with a 256 MB file in three modes (random read, random write, random read/write) and with different numbers of threads, from 1 to 16.
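A test plan of this kind can be sketched as the script below, which prints one sysbench command per mode and thread count. The exact thread steps (1, 2, 4, 8, 16) and the 60-second run time are assumptions for the example. The syntax is that of sysbench 1.0 and later; the older 0.4.x releases use `--test=fileio` and `--num-threads` instead.

```shell
#!/bin/sh
# Print the sysbench fileio commands for one file size across all modes
# and thread counts. Nothing is executed; the plan is only printed.
FILE_SIZE="${FILE_SIZE:-256M}"
THREADS="1 2 4 8 16"          # assumed steps between 1 and 16
MODES="rndrd rndwr rndrw"     # random read, random write, random read/write
RUN_TIME=60                   # assumed duration of each run, in seconds

sysbench_plan() {
    echo "sysbench fileio --file-total-size=$FILE_SIZE prepare"
    for mode in $MODES; do
        for t in $THREADS; do
            echo "sysbench fileio --file-total-size=$FILE_SIZE --file-test-mode=$mode --threads=$t --time=$RUN_TIME run"
        done
    done
    echo "sysbench fileio --file-total-size=$FILE_SIZE cleanup"
}

sysbench_plan
```

Run the printed commands from a directory on the file system under test, since sysbench creates its test files in the current directory.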















X axis - the number of threads

Y axis - the number of operations per second.



On reads, all results are comparable: RAID gives little advantage.



But this picture is heavily distorted, because the file cache strongly influenced the results (the test file fits entirely in RAM).
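A quick sanity check of this kind can be done before a run. The sketch below compares the test file size with the total RAM reported in /proc/meminfo (Linux only); the function name `fits_in_ram` is invented for this example, and "fits in RAM" is only a rough indicator of how much of the file the page cache can hold.

```shell
#!/bin/sh
# Succeeds if a test file of $1 megabytes is no larger than total RAM.
# $2 (optional): RAM size in kB; defaults to MemTotal from /proc/meminfo.
fits_in_ram() {
    file_mb="$1"
    mem_kb="${2:-$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)}"
    [ $(( file_mb * 1024 )) -le "$mem_kb" ]
}

if fits_in_ram 256; then
    echo "256 MB test file fits in RAM: reads may be served from the page cache"
fi
```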



On writes, some RAID configurations lose ground (overhead has an effect).



* * *



Any question that begins with the words "Which is better..." makes no sense on its own.



Which CMS is better?



Which database to choose?



Which RAID is better to choose?



With any choice, what always matters is the task being set and solved!



We are choosing a disk system for a database. Our storage engine is InnoDB.



This means that we mostly work with large (several GB) ibdata files.



A typical load profile is random read / write (more reads).



And now, with a clearer real-world task in mind, we run a new series of tests on a 16 GB file.















* * *



To summarize.



A typical MySQL workload is random read/write, with more reads than writes. The best-performing option for this task is RAID 10 with a large number of disks.



The downside of this solution is that the disks cost twice as much (which, at current prices, is not critical).



The main advantage is that we get a simple way to scale disk system performance (software RAID can be assembled both on a physical server and in the cloud).

Source: https://habr.com/ru/post/130096/


