Flashcache - a cheap and cheerful alternative to HW RAID 10 SAS

Until 2014, FirstVDS servers used enterprise HDDs with a
SAS interface and hardware controllers, assembled in RAID 10. This solution fully satisfied us in terms of reliability and performance. In 12 years of use we had partial loss of customer data only 3 times: twice a hardware controller burned out, and once the battery failed, so during an emergency power-off the RAID's built-in cache memory was lost.

However, SAS HDDs are expensive. For one server we bought a set of four 600 GB disks and a hardware RAID controller with a battery. The whole solution cost 44,806 rubles per 1 TB. We did not want to raise VDS prices, so we needed a cheaper solution that would not lose in speed or reliability, and would ideally increase the space provided for VDS.

A pure SSD setup was even more expensive. At that time, 240 GB drives cost from 8,000 rubles, so staying on RAID 10 SAS was cheaper than building 1 TB of SSD storage, and increasing capacity further would cost even more. Therefore, we reviewed several software caching solutions and included SSD in the tests as a speed reference. The table with the results is below.

Alternative solutions


zfs is a file system and logical volume manager with an adaptive replacement cache, developed by Sun Microsystems. ZFS cannot be included in the mainline Linux kernel due to license incompatibility (CDDL vs GPL). It can be attached with DKMS modules, but the effort was not worth it: judging by public tests, the read/write speed was low. We did not run tests ourselves.

bcache - developed at Google; in 2013 it was still raw and not used in production. It worked only with CentOS 7, and we used CentOS 6, so bcache was not tested either.

lvm cache - a Linux community technology. It also worked only with CentOS 7, but at that time there were no public tests, so we ran our own. We did not like the numbers.

flashcache - developed by Facebook: the company inspired confidence, and the technology had already been proven in production.

Flashcache works in 3 modes:

- write through: data is written to the SSD cache and to the main disk simultaneously;
- write around: data is written directly to the main disk, and only reads are cached;
- write back: data is written to the SSD cache first and flushed to the main disk later.

Since write back is the fastest mode, we chose it for the tests.

MD is Linux software RAID. Flashcache works in conjunction with MD in RAID 1. We included MD without flashcache in the tests to see how it performs on its own.

Test results


To bring the test conditions as close as possible to real ones, we ran random writes and reads on a 32 GB file (on a mounted file system).
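
The article does not name the benchmarking tool. As a rough sketch, an equivalent test with fio could look like this (the file path and the 4k block size are assumptions; the 32 GB file and queue depth 32 are from the article):

```python
import subprocess

def run_fio(rw: str) -> str:
    """Run one random-I/O job and return fio's text report."""
    cmd = [
        "fio",
        "--name", rw,
        "--filename", "/mnt/test/bench.file",  # hypothetical path on the tested FS
        "--rw", rw,                 # 'randread' or 'randwrite'
        "--size", "32G",            # 32 GB test file, as in the article
        "--bs", "4k",               # assumed block size; the article does not state it
        "--iodepth", "32",          # queue depth 32, as in the table below
        "--ioengine", "libaio",
        "--direct", "1",            # bypass the page cache
    ]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

for mode in ("randread", "randwrite"):
    print(run_fio(mode))
```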
Parameter      RAID 10 SAS    SSD             MD            flashcache write back
queue depth    32             32              32            32
IOPS, read     1,401          51,460          598           6,124
IOPS, write    999            23,082          230           3,205
read speed     5,607 KB/s     205,842 KB/s    2,393 KB/s    24,496 KB/s
write speed    3,998 KB/s     92,329 KB/s     922 KB/s      12,823 KB/s

Flashcache in writeback mode beat lvm cache and overtook the software RAID. It lost heavily to the expensive SSD, but most importantly, flashcache outperformed our solution on SAS HDD.

New solution with flashcache


Following the January 2014 study, we implemented flashcache on SSD + SATA HDD. Since then, each such server carries 1 SSD and 2 mirrored 4 TB SATA HDDs. The technology works in writeback mode: it writes data to the cache quickly and slowly flushes it to the main storage in the background.
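
For illustration, a minimal sketch of assembling such a stack with mdadm and flashcache_create; the device names are assumptions, and the exact commands we run are not part of the article:

```python
import subprocess

def sh(*cmd: str) -> None:
    """Run one command, raising if it fails."""
    subprocess.run(cmd, check=True)

# 1) Mirror the two 4 TB SATA HDDs with MD software RAID 1.
sh("mdadm", "--create", "/dev/md0", "--level=1", "--raid-devices=2",
   "/dev/sda", "/dev/sdb")

# 2) Create a flashcache device in writeback mode ("-p back"):
#    the SSD (/dev/sdc) caches the MD mirror.
sh("flashcache_create", "-p", "back", "cachedev", "/dev/sdc", "/dev/md0")

# The cached volume then appears as /dev/mapper/cachedev and can be
# formatted and mounted like any block device.
```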

While implementing and maintaining flashcache, we ran into some peculiarities of the technology.

Flashcache features


1) SSD wears out

Once the number of writes/rewrites is exceeded, the SSD stops accepting new data. To prevent this, we monitor the disk's SMART wear attributes.


The monitoring system tracks these values automatically and notifies staff about problem disks. All we have to do is replace them in time.
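
As an illustration, a sketch of such a check built on smartctl (smartmontools 7+ for JSON output); the attribute names here are assumed examples, since the article does not list the exact ones that are monitored:

```python
import json
import subprocess

# Assumed examples of wear-related attributes; the article does not
# name the exact ones.
WATCHED = {"Media_Wearout_Indicator", "Wear_Leveling_Count", "Reallocated_Sector_Ct"}

def smart_attributes(device: str) -> dict:
    """Return {attribute_name: normalized_value} for one disk."""
    # smartctl may use nonzero exit codes as status flags, so we do not
    # treat them as errors here.
    out = subprocess.run(
        ["smartctl", "--json", "-A", device],
        capture_output=True, text=True,
    ).stdout
    table = json.loads(out)["ata_smart_attributes"]["table"]
    return {a["name"]: a["value"] for a in table if a["name"] in WATCHED}

for name, value in smart_attributes("/dev/sdc").items():  # device name assumed
    if value < 20:  # hypothetical alert threshold on the normalized value
        print(f"replace the SSD soon: {name} = {value}")
```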

We used to use 240 GB disks, and they lasted less than a year. Now over-provisioning lets us enlarge the disk's reserve area and thereby extend the SSD's lifetime. We cut a 1 TB disk down to 240 GB: this is the working area, and the remaining 760 GB is a wear reserve. Now an SSD lasts about a year on average.
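
The article does not say how exactly the disk is "cut"; one common way is to shrink the visible area with hdparm, sketched below with an assumed device name:

```python
import subprocess

# Goal: expose only 240 GB of a 1 TB SSD; the remaining space stays
# unaddressed and serves as a wear reserve for the controller.
VISIBLE_BYTES = 240 * 10**9
SECTOR_SIZE = 512                      # assumed logical sector size
visible_sectors = VISIBLE_BYTES // SECTOR_SIZE

# The "p" prefix makes the new limit permanent across power cycles.
subprocess.run(
    ["hdparm", "-N", f"p{visible_sectors}", "/dev/sdc"],  # device name assumed
    check=True,
)
```

Simply leaving the extra space unpartitioned after a secure erase achieves a similar effect.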

2) The SSD dies and unsynchronized (dirty) data is lost

In writeback mode, data first lands in the SSD cache and only then on the SATA HDD. Data that has not yet made it to the SATA HDD is called dirty. In case of a failure, it burns out irretrievably together with the SSD. The SSD can also fail with data loss during an emergency power-off.

Fortunately, such failures do not happen often. In 2.5 years we had two cases of losing client data that had not made it to the main storage.

There are two ways to reduce the number of failures:


3) Cleaning the cache takes a long time

Replacing the SSD and configuring flashcache takes 5 minutes. But before that, the cache has to be cleaned: all dirty data must be flushed to the disks.

On average, 30% of the data on our SSDs is dirty; the maximum we have seen is 70%. Clearing the cache takes up to 4 hours.

During this time the system is slower because it works with the slow media. We always warn customers about the speed drop, but we cannot force the process: the write speed to the SATA HDD depends on how intensively clients use the disk. The heavier the use, the higher the load and the slower the flush.
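
For illustration, a sketch of forcing a flush through flashcache's do_sync sysctl and watching progress with dmsetup status; the device name and the polling logic are assumptions:

```python
import subprocess
import time

CACHE = "cachedev"  # assumed flashcache device name

# Ask flashcache to start flushing all dirty blocks to the backing disk.
# Sysctl names follow the flashcache convention dev.flashcache.<name>.do_sync;
# the exact <name> depends on how the cache was created.
subprocess.run(["sysctl", f"dev.flashcache.{CACHE}.do_sync=1"], check=True)

# 'dmsetup status' prints flashcache statistics, among them the number of
# dirty blocks; poll it until that counter drops to zero (the output format
# differs between flashcache versions, so we just display it here).
for _ in range(240):  # up to 4 hours, matching the worst case above
    out = subprocess.run(["dmsetup", "status", CACHE],
                         capture_output=True, text=True, check=True).stdout
    print(out)
    time.sleep(60)
```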

4) The cache can overflow

Frequently used data lives in the cache and is called hot. On our servers it averages about 13% of the cache, 62% at maximum. This is enough for fast reads and writes for all VDS on the server. But a single unscrupulous client can overflow the cache and degrade performance for everyone.

Suppose a client wants to test the disk subsystem and starts writing a file with random data. If the client's disk is larger than the cache, things go bad: the cache overflows, performance collapses, and every VDS on the server is affected.

If you decide to run such a test, do not expect representative results: we programmatically limit the offender's number of disk operations, which reduces their speed.
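
The article does not describe our limiting mechanism; as one possible illustration, per-client IOPS throttling with the cgroup v1 blkio controller could look like this (the path, device numbers and limits are made up):

```python
import os

# Illustrative per-client IOPS throttle via the cgroup v1 blkio controller.
CGROUP = "/sys/fs/cgroup/blkio/vds1001"   # hypothetical per-client cgroup
DEVICE = "253:0"                          # major:minor of the client's volume
LIMIT = 500                               # max IOPS allowed for the offender

os.makedirs(CGROUP, exist_ok=True)        # creating the directory creates the cgroup

for knob in ("blkio.throttle.read_iops_device",
             "blkio.throttle.write_iops_device"):
    with open(os.path.join(CGROUP, knob), "w") as f:
        f.write(f"{DEVICE} {LIMIT}\n")

# The client's processes must then be listed in the cgroup's tasks file
# for the limits to apply.
```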

5) Flashcache does not work on CentOS 7

After a kernel update, flashcache became incompatible with CentOS 7. Since this version of the distribution runs on 50% of our servers, the problem is acute. For now, CentOS 7 servers use software RAID 1 on SSD. On three clusters we are testing EnhanceIO, another caching technology, but we are not yet ready to announce the results.

Flashcache deployment results


Let's calculate how much more profitable flashcache on SSD + SATA HDD is than RAID 10 SAS by computing the cost of each solution.
RAID 10 SAS (approximate prices for March 2013)

SAS 600 GB, 4 pcs.             7,714 rub. x 4
hardware controller + battery  8,600 rub. + 4,500 rub.
cable                          850 rub.
Total                          44,806 rub., or $1,493 (at the rate of $1 = 30 rub.)

This is the cost of 1 TB of space on the parent server.

SATA HDD + SSD (prices for May 2017)

SATA HDD 4 TB, 2 pcs.          12,000 rub. x 2
SSD                            17,100 rub.
Total                          41,100 rub., or $685 (at the rate of $1 = 60 rub.)


Since 2013, the dollar has doubled against the ruble. As a result, the flashcache solution costs almost as much as RAID 10 SAS in rubles, but half as much in dollars.

By increasing the storage capacity 4 times, we also reduced the price of 1 TB: it is now about 4 times cheaper in rubles and roughly 8 times cheaper in dollars.
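
A quick check of the arithmetic behind these factors:

```python
# Price per terabyte of each solution (numbers from the tables above).
old_rub_per_tb = 44_806 / 1       # RAID 10 SAS: 44,806 rub. for 1 TB
new_rub_per_tb = 41_100 / 4       # flashcache:  41,100 rub. for 4 TB

old_usd_per_tb = 44_806 / 30 / 1  # $1 = 30 rub. in 2013
new_usd_per_tb = 41_100 / 60 / 4  # $1 = 60 rub. in 2017

print(old_rub_per_tb / new_rub_per_tb)  # ~4.4x cheaper in rubles
print(old_usd_per_tb / new_usd_per_tb)  # ~8.7x cheaper in dollars
```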

Conclusion


In 2014 we rolled out flashcache: we quadrupled the space provided for VDS and increased the speed of the disk subsystem. The solution came out cheaper than the previous one, which let us cut costs without raising VDS prices.

Reliability remained an open question: there were fewer failures with HW RAID 10 SAS. So in May 2015, for customers for whom reliability and speed are critical, we introduced plans with SSD as the main storage.

Source: https://habr.com/ru/post/330040/

