... or what a 'RAID edition' hard drive actually is

A bit of theory
There are two strategies a hard drive can follow when it detects an error:
- standalone / desktop: keep trying to read to the bitter end. From the outside this looks like a drive that stalls but still works; a one-off failure shows up as "it hung for a while, then carried on", accompanied by the characteristic clatter of the heads recalibrating.
- RAID: give up immediately. From the outside this looks like "there was a sudden disk error, but afterwards MHDD and similar tools find nothing wrong".
The strategies obviously serve different purposes. A desktop drive is better off stalling for a while than returning an error. In a RAID there is a redundant drive, and nobody can afford to sit through minute-long stalls on reads. Couldn't read? Read from the redundant drives, mark the whole drive as failed, start the resync, and send the drive off for disposal. Perhaps unfair, but a drive in a responsible position has no business hiccuping.
Control over the error-handling strategy is a feature of expensive hard drives. In desktop series it is often absent altogether, or present but with no way to turn it on: the drive grinds away at the error for as long as it sees fit. The second important point is that on RAID drives this option is enabled by default, which can lead to problems.
Deciphering the name
The feature that controls how the disk behaves on errors has a very confusing name: SCT ERC. This stands for SCT Error Recovery Control. SCT in turn is the name of a general-purpose protocol, SMART Command Transport. SMART in turn stands for Self-Monitoring, Analysis and Reporting Technology, so the full expansion of SCT ERC is:
Self-Monitoring, Analysis and Reporting Technology Command Transport Error Recovery Control (exhale).
Quick reference
To see whether a hard disk supports error recovery control, run
smartctl -a /dev/sdxx
and look for the SCT capabilities block:
SCT capabilities: (0x303f) SCT Status supported.
                           SCT Error Recovery Control supported.
                           SCT Feature Control supported.
If the "SCT Error Recovery Control supported" line is absent, the disk does not support these commands.
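The check described above can be scripted. The following is a minimal sketch, not a definitive tool: `has_sct_erc` is a hypothetical helper name, and it assumes the capability line is worded exactly as in the sample output shown here.

```shell
#!/bin/sh
# Hypothetical helper: reads `smartctl -a` output on stdin and reports
# whether the drive advertises SCT Error Recovery Control.
# Assumption: the capability line is worded as in the sample output above.
has_sct_erc() {
    if grep -q 'SCT Error Recovery Control supported'; then
        echo "supported"
    else
        echo "not supported"
    fi
}

# Intended use on a real system (needs root and smartmontools):
#   smartctl -a /dev/sda | has_sct_erc
```

Reading from stdin rather than calling smartctl inside the function keeps the check easy to try against saved output.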
Now for the actual configuration. The disks I have seen expose two parameters: the timeout for read operations and the timeout for write operations. The values for every disk I could get my hands on are listed below.
To see the timeouts, run
smartctl -l scterc /dev/sda
The output looks like this:
# smartctl -l scterc /dev/sda
SCT Error Recovery Control:
           Read:     70 (7.0 seconds)
          Write:     70 (7.0 seconds)
# smartctl -l scterc /dev/sde
SCT Error Recovery Control:
           Read: Disabled
          Write: Disabled
# smartctl -l scterc /dev/sdd
Warning: device does not support SCT Error Recovery Control command
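The three kinds of output above can be told apart mechanically. Here is a rough sketch, assuming the output format is exactly as in these samples; `erc_state` is a hypothetical name, and the labels it prints are my own, not anything smartctl produces.

```shell
#!/bin/sh
# Hypothetical helper: reads `smartctl -l scterc` output on stdin and prints
#   unsupported | disabled | timeouts read=<N> write=<N>
# where N is the raw value in tenths of a second.
# Assumption: the output looks like the three samples shown above.
erc_state() {
    awk '
        /does not support SCT Error Recovery Control/ { unsupported = 1 }
        $1 == "Read:"  { r = $2 }
        $1 == "Write:" { w = $2 }
        END {
            if (unsupported || r == "" || w == "") print "unsupported"
            else if (r == "Disabled" && w == "Disabled") print "disabled"
            else print "timeouts read=" r " write=" w
        }'
}

# Intended use on a real system (needs root and smartmontools):
#   smartctl -l scterc /dev/sda | erc_state
```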
To set them, give the values, separated by commas, after scterc:
smartctl -l scterc,120,60 /dev/sde
The values are in tenths of a second, so 120 means 12 seconds; the first number is the read timeout, the second is the write timeout. 0 means "to the bitter end", that is, no time limit.
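Since the units are easy to get wrong, here is a small sketch that does the conversion from seconds to tenths of a second. `erc_set_cmd` is a hypothetical helper; it only prints the command and does not run it, so nothing on the disk changes.

```shell
#!/bin/sh
# Hypothetical helper: builds (but does not run) the smartctl command that
# sets the ERC timeouts.
# Arguments: device, read timeout, write timeout (both in whole seconds).
erc_set_cmd() {
    dev=$1
    read_ds=$(( $2 * 10 ))     # smartctl expects tenths of a second
    write_ds=$(( $3 * 10 ))
    echo "smartctl -l scterc,${read_ds},${write_ds} ${dev}"
}

erc_set_cmd /dev/sde 12 6    # prints: smartctl -l scterc,120,60 /dev/sde
```

Printing the command first and pasting it in by hand is a deliberate choice here: it leaves a chance to double-check the device name before touching a real disk.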
Default values
Here is the data from the various disks I have around the house:
Name | Model | ERC (supported or not; if supported, default read/write timeouts in seconds) |
---|---|---|
Western Digital VelociRaptor | WDC WD1500HLFS-01G6U1 | Yes, 7/7 |
Western Digital RE4 Serial ATA | WDC WD1500HLFS-01G6U1 | Yes, 7/7 |
Western Digital RE3 Serial ATA family | WD1002FBYS-02A6B0 | Yes, 7/7 |
Western Digital Caviar Green (Adv. Format) | WDC WD20EARS-00MVWB0 | Not supported |
Western Digital Caviar Green | WD7500AACS-00D6B0 | Yes, 0/0, cannot be enabled |
Seagate Maxtor DiamondMax 22 | STM3500320AS | Yes, 0/0, can be enabled |
Seagate Barracuda 7200.9 | ST3400633AS | No (Maxtor/Seagate drives of the same era have it, but this one does not) |
Seagate Barracuda 7200.10 | ST3500630AS | No |
Seagate Barracuda 7200.11 | ST31500341AS | (surprise!) Yes, 0/0, can be enabled |
Seagate Barracuda LP | ST31500541AS | Yes, 0/0 (i.e. disabled), can be enabled |
SAMSUNG SpinPoint F4 EG (AFT) | SAMSUNG HD204UI | Yes, 0/0 (disabled), can be enabled |
Hitachi Deskstar 7K3000 | HDS723030ALA640 | Yes, 0/0, cannot be enabled (SCSI error: aborted command) |
Hitachi Deskstar T7K500 | HDT725032VLA360 | Yes, 0/0, cannot be enabled |
(just don’t ask me where I got so many drives at home).
Moral
People who buy RE4 disks (or RAID editions from the other remaining manufacturers) or VelociRaptors to use as a single hard disk, and do not set ERC to zero, are doing something hugely foolish. It is matched only by the foolishness of people who put desktop drives into a RAID without setting up ERC and hope that the RAID will save them when a failure happens.
In short: bought a single fancy drive for home use? Turn ERC off (0,0). Bought a drive for a RAID? Check that its ERC is non-zero, and preferably close to a sensible value of around 3-10 s (30-100 in smartctl units).
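The rule of thumb can be written down as a small check. This is only a sketch of the advice in this section: `erc_advice` is a hypothetical helper, and the 3 to 10 second range is expressed as 30 to 100 because smartctl counts in tenths of a second. The 70,70 suggestion is simply the WD default from the table above.

```shell
#!/bin/sh
# Hypothetical helper encoding the rule of thumb from this section.
# Arguments: role (single|raid), read timeout, write timeout
#            (both in tenths of a second, 0 = ERC disabled).
erc_advice() {
    role=$1; r=$2; w=$3
    case $role in
        single)
            # A lone drive should retry for as long as it needs to.
            if [ "$r" -eq 0 ] && [ "$w" -eq 0 ]; then
                echo "ok"
            else
                echo "disable ERC: smartctl -l scterc,0,0 <device>"
            fi ;;
        raid)
            # A RAID member should give up within roughly 3-10 s (30-100).
            if [ "$r" -ge 30 ] && [ "$r" -le 100 ] &&
               [ "$w" -ge 30 ] && [ "$w" -le 100 ]; then
                echo "ok"
            else
                echo "set ERC to 3-10 s, e.g. smartctl -l scterc,70,70 <device>"
            fi ;;
        *)
            echo "unknown role: $role" >&2
            return 1 ;;
    esac
}
```

For example, `erc_advice single 70 70` suggests disabling ERC, while `erc_advice raid 70 70` reports that the setting is fine.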
Models that require attention on the desktop: WD RE3, RE4, Raptor, Seagate NS.
PS Besides ERC, manufacturers promise higher quality and reliability for the RE / NS series. We cannot verify that, whereas the presence or absence of ERC is an objective, easily checked feature. A disk without ERC should never be in a RAID under any circumstances, since on failure it will do more harm than good.
PPS I have not the slightest idea how to work with SMART in Microsoft Windows. Call the manufacturer's support line and ask. Phone: 8 (800) 200-8001.
For Mac OS X, as far as I know, there is a smartmontools port, so the commands above (run as root) should work there.
PPPS (from the comments) For WD there is the WDTLER utility (TLER stands for Time-Limited Error Recovery); on some Green-series drives it can still enable ERC / TLER: blog.agdunn.net/?p=208