The reliability of SSDs has been on the agenda for years now. Some people are negative about them, pointing to more than one failed drive, while others argue that their drives feel great even under load and have been working for years. I probably belong to the second camp, and now I will tell you why.
SSDs in PCs and laptops have become commonplace (especially in ultrabooks and netbooks, thanks to their small size), but the load on them there is completely different, and so are the failure statistics. Most regrettably, such statistics for servers are extremely scarce, because SSDs in the server segment still remain exotic, even though SSDs as such have been gaining market share for years and have already managed to “bite off” a small piece from ordinary HDDs.
Many people draw conclusions from early experience, when SSDs had only just appeared on the market. And these fears are well founded. SSDs burst onto the market and developed very quickly from a technological point of view, which is why there were repeated problems with defects that affected both the lifespan of the drives and their speed. These problems were both hardware and software (firmware), but, happily, drive manufacturers are actively working on improving the reliability and quality of their products.
The absence of moving mechanical parts does not by itself make a device completely reliable. Unfortunately, any equipment can fail. The first weak point of an SSD is its memory chips: MLC chips have a limited write endurance (as do the somewhat more durable SLC chips), which can be mitigated with ECC error correction, redundancy, and SMART monitoring to predict drive failure. The controller is a weak point too. Because it sits physically between the interface and the memory chips, the probability of it being damaged by a malfunction or power problems is very high. In that case the data on the memory chips themselves remains intact and can in theory be recovered, but this is quite a laborious process.
As for the firmware, even a small error in it can lead to complete data loss, and recovery becomes almost impossible because of the complex data layout and the interleaving of writes across different memory chips. But this is no reason to abandon SSDs altogether: nobody is without sin, and these days one often comes across defective HDD batches or batches with buggy firmware (many still remember the scandal with Seagate's 1.5 TB drives four years ago).
But let's get back to servers. I was inspired to write this article by looking at the SMART data of the SSD that holds the system in my personal server. Of course, this is not a production server for a large project, but the load on it is not that small either. To begin with, the system lives on the simplest and cheapest SSD available at the time of purchase (2.5 years ago): an Intel X25-V Value 40 GB.

But first, a little background. The server has been running for more than 6 years, and the only thing that has not changed over that time is the case :) The predecessors of this SSD were a WD Raptor, which lived for exactly a year and then died suddenly, and its “elder brother”, a WD VelociRaptor, which also gave up the ghost after a year of work (I still keep it in a box as a memento, along with an ancient Quantum Fireball :)). The choice in favor of the Intel X25-V was very simple: I had to buy something to replace the dead hard drive as quickly as possible and on a very limited budget. There was, naturally, no time to read reviews and compare options, so I bought the cheapest SSD available at the time, or rather the first one I found that came with a 3.5" adapter (although back then I did not use SSDs even in regular PCs). Now a little about what the server does. Although it is a “home” server, it does a lot, in particular:
- Web server with several sites, fairly well visited (about 150 unique visitors per day each).
- MySQL server. None of the sites would work without it, of course, but the database is also used by my own statistics system, and the zabbix monitoring system writes its metrics there too.
- Deluge (a torrent client, and yes, on the SSD, which many consider the main killer of solid-state drives :)). It writes temporary files to the system disk and then moves finished downloads to a regular HDD.
- zabbix-agentd, one of the most “voracious” services in my system, which very actively writes various metrics to the database.
- airvideoserver, a video encoder for mobile devices that encodes video on the fly and stores its cache on the system disk.
The remaining tasks are not that interesting in themselves, so I will simply list them: avahi, dhcpd, fail2ban, iptables, netatalk, nginx, openvpn, php-fpm, pure-ftpd, samba, sshd, zabbix-server. These services are not as voracious, but they actively write logs.
All of this runs on Gentoo Linux. Why am I describing it in such detail? Because quite a lot of services are running on the server, and they not only write logs (logging is not disabled at all, and there is no TRIM support to speak of) but also create temporary files on the system disk (airvideoserver, for example) and put a heavy write load on the disk subsystem.
This is what iotop measured in 5 minutes of work:
On the one hand, this is not that much, but the drive is actively used for both reading and writing. System logs are written especially actively, for example when some local FTP search engine tries to index the server and fails to log in anonymously for the umpteenth time.
First, let's see if there were any errors during the lifetime of the disk:
And now let's turn to the disk's SMART data:
Pay attention to the Power_On_Hours parameter: 19727 hours, which gives us 822 days of uptime. An even more interesting indicator is Host_Writes_32MiB, the amount of data written by the host, counted in 32 MiB units. That is, 473216 * 32 / 1024 / 1024 ≈ 14.44 TB, or about 18 GB per day, which is not such a small figure considering that the drive holds only 40 GB. It turns out that the disk has been completely overwritten about 370 times, and if we recall that MLC memory is rated for about 10,000 rewrite cycles, we still have a decent margin. Admittedly, writes are not spread perfectly evenly across the memory chips: some cells are written more often and some less often, but this does not spoil the statistics much. And in the more expensive SSD models built on SLC memory, the write endurance of each cell is about 10 times higher still.
Reallocated Sector Count (the number of reassigned bad blocks) is 6, not a number large enough to start worrying about the drive. We are also interested in the Media_Wearout_Indicator parameter, a kind of drive health counter. On a new drive it equals 99-100, and on a nearly dead one it tends to zero (although there are precedents of drives working fine with Media_Wearout_Indicator at 0). Based on this, we can conclude that the drive is more than alive, even though it is worked much harder than an SSD in an ordinary home machine.
And this applies not only to Intel drives (and let me remind you that the X25-V is one of the simplest and cheapest models on the market). Unfortunately, there are so far no new technological solutions in the form of new memory types, so companies are competing actively on the SSD controller front.
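The arithmetic on the SMART values above is easy to check with a few lines of Python. This is only a sketch: the function name `write_stats` is made up for illustration, the 40 GB capacity is treated as 40 GiB for simplicity, and the 10,000-cycle figure is the rough MLC estimate quoted in the text, not a datasheet value.

```python
# Raw SMART values quoted in the text for the Intel X25-V.
POWER_ON_HOURS = 19727
HOST_WRITES_32MIB = 473216      # counter in units of 32 MiB
CAPACITY_GIB = 40               # nominal drive size, treated as GiB (assumption)
MLC_CYCLES = 10_000             # rough MLC endurance figure from the text


def write_stats(power_on_hours, host_writes_32mib, capacity_gib):
    """Return (days of uptime, TiB written, GiB written per day, full overwrites)."""
    days = power_on_hours / 24
    written_gib = host_writes_32mib * 32 / 1024   # 32 MiB units -> GiB
    return days, written_gib / 1024, written_gib / days, written_gib / capacity_gib


days, tib, gib_per_day, overwrites = write_stats(
    POWER_ON_HOURS, HOST_WRITES_32MIB, CAPACITY_GIB
)
print(f"uptime:          {days:.0f} days")
print(f"written:         {tib:.2f} TiB")
print(f"average:         {gib_per_day:.1f} GiB/day")
print(f"full overwrites: {overwrites:.0f} of ~{MLC_CYCLES}")
```

Run as is, it reproduces the figures in the text: about 822 days, 14.44 TB written, roughly 18 GB per day and about 370 full overwrites.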
Last year, for example, OCZ launched drives based on the new Indilinx controller, which gets faster with every new firmware release, as numerous tests on the Internet confirm (by the way, I have already switched my own work machines to the Vertex 4, but this is by no means an advertisement). Unfortunately, manufacturers of ordinary SSDs rarely give us any utilities for monitoring drive health. The most a manufacturer usually provides is a firmware update tool, which is certainly not enough, so we have to resort to third-party software (smartmontools in this article's example), which is not always tuned to the data of a particular drive: SMART parameters differ between manufacturers, as do the “normal” values of those parameters.
The corporate segment lives by somewhat different rules: a complete software suite for working with a drive is quite normal there and sometimes plays an important role in choosing one. ADVANSERV is actively promoting Fusion-io's PCI-E SSD solutions on the Russian corporate market, so I would like to digress from our test subject and talk about the Fusion ioSphere Management Solution software, which is designed to monitor the health of those SSDs.

Key features of this software are:
- The ability to monitor and control multiple devices from one management console
- Intuitive web user interface (which, admittedly, requires Flash)
- Detection and inventory of ioMemory modules
- A variety of system reports
- Failure prediction
- LDAP user authentication
Now let's look at each of the items in a little more detail.
- The ability to control multiple ioMemory devices from a single management console simplifies managing and monitoring the devices and makes the work faster and more convenient.
- Automatic discovery of ioMemory devices speeds up commissioning and setup. The ability to copy hardware settings also greatly simplifies configuring new devices.
- Real-time monitoring lets you actively track the health and performance of every ioMemory device on your network and anticipate problems before they become critical. Ready-made and customizable alert profiles ensure that the administrator is notified immediately of problems that require attention, preventing downtime and data loss.
- The performance history shown in the control panel gives an overview of how the equipment has behaved over its entire life cycle, which helps plan your infrastructure more efficiently.
- Failure forecasting based on the amount of data rewritten to flash memory helps prevent unexpected equipment failures and company downtime. In fact, this data is taken from the device's SMART and presented in a convenient graphical form.
- LDAP authentication lets you control access to the system using existing mechanisms, at no additional cost in either money or time.
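To give a feel for what a forecast based on the amount of rewritten data can look like, here is a minimal Python sketch. It is not how ioSphere works internally (the article does not describe that): the function `days_until_worn_out` is hypothetical, and it assumes perfectly even wear and a constant write rate. The example input is the home-server figures from earlier in the article.

```python
def days_until_worn_out(capacity_gib, rated_cycles, written_gib, days_in_service):
    """Linearly extrapolate how many days remain until the rated write volume is used up.

    Simplifying assumptions: wear is spread evenly over all cells and the
    drive keeps being written at its historical average rate.
    """
    if days_in_service <= 0:
        raise ValueError("days_in_service must be positive")
    rated_total_gib = capacity_gib * rated_cycles        # total writable volume
    remaining_gib = max(rated_total_gib - written_gib, 0)
    daily_rate = written_gib / days_in_service           # average GiB per day so far
    if daily_rate == 0:
        return float("inf")                              # nothing written yet
    return remaining_gib / daily_rate


# Illustrative input: 40 GB drive, ~10,000 MLC cycles,
# 14,788 GiB written over 822 days (the X25-V figures from this article).
remaining = days_until_worn_out(40, 10_000, 14_788, 822)
print(f"about {remaining / 365:.0f} years of writes left at the current rate")
```

A real tool would more likely track the wear indicator reported by the device over time rather than rely on a nominal cycle count, but the idea of extrapolating from the write history is the same.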
ioSphere is an extension of the basic ioManager 3 software suite. It adds greater visibility and reporting for multiple ioMemory modules (which may be of different models) across multiple servers. The table below lists the additional features of ioSphere:
Management and reports | ioManager 3 | ioSphere |
Installation | Web server | Web server |
Detection and inventory | One node | Many nodes |
Remote access | X | X |
ioMemory configurator | One node | Many nodes |
Real-time performance indicator | X | X |
Performance history | | X |
Failure forecast | | X |
Monitoring | ioManager 3 | ioSphere |
ioMemory device monitoring | One node | Many nodes |
Individual alert criteria | | X |
SMS / e-mail notifications | | X |
Alert search history | | X |
User authentication | X | |
LDAP authentication | | X |
In the visual presentation, it looks like this:
Because it uses a web interface, this software works on different systems, both Windows and Linux/Unix. It is worth mentioning that correct operation is guaranteed only in Internet Explorer and Firefox; other browsers may work but are not officially supported (in particular, there are known problems with Google Chrome on Mac). In the near future we will try to tell you more about the various software solutions from Fusion-io.
In conclusion, I would like to note that modern SSDs are no less reliable than solutions built on conventional HDDs. Of course, there is always a risk of equipment failure, and this applies not only to drives. But a manufacturer's warranty and buying equipment from a trusted supplier will spare your nerves. And the main rule for any drive: make backups. Even the most "stable" solution does not guarantee 100% data integrity.
Korp author