
SCT Error Recovery Control

... or what a 'RAID edition' hard drive actually is



A bit of theory


There are two strategies for how an HDD behaves when it detects an error: the desktop (standalone) strategy, where the drive keeps retrying the read for as long as it takes, and the RAID strategy, where the drive gives up quickly and reports an error.
The strategies obviously serve different purposes. A desktop drive would rather stall than return an error. In a RAID there is a redundant drive, and nobody can afford to put up with minute-long stalls on reads. Couldn't read? We read from the redundant drive, mark the whole drive as failed, start a resync, and the disk later goes to the scrap heap. Perhaps unfair, but there is no room for hiccups in a responsible post.

Control over the error-handling strategy is a feature of expensive hard drives. In desktop series it is often simply absent, or it is present but cannot be enabled, so the drive stalls on an error for as long as it sees fit. The second important point is that on RAID drives this option is enabled by default, which can lead to problems.

Deciphering the name


The ability to control how the disk behaves on errors has a very confusing name: SCT ERC. This stands for SCT Error Recovery Control. SCT, in turn, is the name of the general protocol, SMART Command Transport. SMART, in turn, stands for Self-Monitoring, Analysis and Reporting Technology, so the full expansion of SCT ERC is: Self-Monitoring, Analysis and Reporting Technology Command Transport Error Recovery Control (exhale).

Quick reference


To see whether a hard disk supports error recovery control, run smartctl -a /dev/sdxx and look at the SCT capabilities line:

 SCT capabilities: (0x303f) SCT Status supported.
                            SCT Error Recovery Control supported. *****
                            SCT Feature Control supported.

If that line is missing, the disk does not support these commands.
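
This check can be scripted. A minimal sketch, assuming the capability line is worded exactly as in the sample output; the function name has_scterc is hypothetical, and it reads the smartctl -a text from standard input so it can be tried without a real drive.

```shell
# has_scterc: hypothetical helper. Reads `smartctl -a` output on stdin and
# succeeds (exit status 0) if the drive advertises SCT ERC support.
# Assumes the capability line is worded as in the sample output.
has_scterc() {
    grep -q 'SCT Error Recovery Control supported'
}

# Usage on a real system (needs root):
#   smartctl -a /dev/sda | has_scterc && echo "ERC supported"
```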

Next comes the actual management. The disks I have seen have two parameters: the timeout for read operations and the timeout for write operations. Below I list the values for all the disks I could get my hands on.

To see the timeouts, use smartctl -l scterc /dev/sda. The output looks like this:

 # smartctl -l scterc /dev/sda
 SCT Error Recovery Control:
            Read:     70 (7.0 seconds)
           Write:     70 (7.0 seconds)

 # smartctl -l scterc /dev/sde
 SCT Error Recovery Control:
            Read: Disabled
           Write: Disabled

 # smartctl -l scterc /dev/sdd
 Warning: device does not support SCT Error Recovery Control command
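
When checking many disks it helps to reduce this output to one line per drive. A rough sketch, assuming the three output shapes shown above are the only ones that occur; parse_scterc is a hypothetical name, and the function reads the smartctl text from standard input.

```shell
# parse_scterc: hypothetical helper. Reads `smartctl -l scterc` output on
# stdin and prints "read=<value> write=<value>", where <value> is the raw
# number in tenths of a second or the word "disabled". Prints "unsupported"
# if the drive rejects the command or no timeouts are found.
parse_scterc() {
    awk '
        /does not support SCT Error Recovery Control/ { unsupported = 1 }
        $1 == "Read:"  { r = ($2 == "Disabled") ? "disabled" : $2 }
        $1 == "Write:" { w = ($2 == "Disabled") ? "disabled" : $2 }
        END {
            if (unsupported || r == "" || w == "") print "unsupported"
            else print "read=" r " write=" w
        }
    '
}

# Usage on a real system (needs root):
#   smartctl -l scterc /dev/sda | parse_scterc
```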

To set the timeouts, give the values separated by commas after scterc: smartctl -l scterc,120,60 /dev/sde (values are in tenths of a second, so 120 corresponds to 12 seconds; the first number is the read timeout, the second is the write timeout). 0 means "to the bitter end", that is, retry indefinitely.
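
Since the tenths-of-a-second unit is easy to get wrong, here is a small sketch that builds the option value from whole seconds. The helper name scterc_option is hypothetical; only the scterc,READ,WRITE syntax comes from the command above.

```shell
# scterc_option: hypothetical helper. Takes the read and write timeouts in
# whole seconds and prints the matching smartctl option value, which is
# expressed in tenths of a second.
scterc_option() {
    read_s=$1
    write_s=$2
    echo "scterc,$((read_s * 10)),$((write_s * 10))"
}

# Usage on a real system (needs root), 12 s for reads and 6 s for writes:
#   smartctl -l "$(scterc_option 12 6)" /dev/sde
```

Calling scterc_option 12 6 prints scterc,120,60, the same value as in the example command.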

Default values


Here is the data from the various disks I have at hand:

Family                                     | Model                 | ERC (supported? if yes, default read/write values in seconds)
Western Digital VelociRaptor               | WDC WD1500HLFS-01G6U1 | Yes, 7/7
Western Digital RE4 Serial ATA             | WDC WD1500HLFS-01G6U1 | Yes, 7/7
Western Digital RE3 Serial ATA family      | WD1002FBYS-02A6B0     | Yes, 7/7
Western Digital Caviar Green (Adv. Format) | WDC WD20EARS-00MVWB0  | Not supported
Western Digital Caviar Green               | WD7500AACS-00D6B0     | Yes, 0/0, cannot be enabled
Seagate Maxtor DiamondMax 22               | STM3500320AS          | Yes, 0/0, can be enabled
Seagate Barracuda 7200.9                   | ST3400633AS           | No (Maxtor/Seagate drives of the same years have it, but this one does not; wow)
Seagate Barracuda 7200.10                  | ST3500630AS           | No
Seagate Barracuda 7200.11                  | ST31500341AS          | (suddenly!) Yes, 0/0, can be enabled
Seagate Barracuda LP                       | ST31500541AS          | Yes, 0/0 (i.e. disabled), can be enabled
SAMSUNG SpinPoint F4 EG (AFT)              | SAMSUNG HD204UI       | Yes, 0/0 (disabled), can be enabled
Hitachi Deskstar 7K3000                    | HDS723030ALA640       | Yes, 0/0, cannot be enabled (SCSI error: aborted command)
Hitachi Deskstar T7K500                    | HDT725032VLA360       | Yes, 0/0, cannot be enabled

(just don’t ask me where I got so many drives at home).

Moral


People who buy RE4 disks (and RAID editions from the other remaining manufacturers) or VelociRaptors to use as a single hard disk, and do not set ERC to zero, are committing a folly comparable only to that of people who put desktop drives into a RAID without configuring ERC and hope that the RAID will save them when a drive fails.

In short: bought a single fancy drive for home? Turn ERC off (0,0). Bought a drive for a RAID? Check that its ERC is nonzero, preferably close to a reasonable value of around 3-10 s (30-100 in tenths of a second).
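
The rule of thumb can be written down as a small check. This is only a sketch of the advice above, not a tool: erc_verdict is a hypothetical name, the role labels "single" and "raid" are my own, and the 3-10 s window is expressed as 30-100 because smartctl counts in tenths of a second.

```shell
# erc_verdict: hypothetical helper encoding the rule of thumb.
#   $1 = role of the drive: "single" or "raid" (labels are assumptions)
#   $2 = ERC read timeout in tenths of a second (0 = disabled)
# Prints "ok" or a short hint about what to change.
erc_verdict() {
    role=$1
    timeout=$2
    case $role in
        single)
            if [ "$timeout" -eq 0 ]; then echo "ok"
            else echo "disable ERC (0,0)"; fi ;;
        raid)
            if [ "$timeout" -ge 30 ] && [ "$timeout" -le 100 ]; then echo "ok"
            elif [ "$timeout" -eq 0 ]; then echo "enable ERC"
            else echo "outside 3-10 s"; fi ;;
        *)
            echo "unknown role" ;;
    esac
}
```

For example, erc_verdict single 70 flags a RAID-edition drive used alone with its factory 7-second timeout, and erc_verdict raid 0 flags a desktop drive placed in an array with ERC disabled.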

Models that require attention on the desktop: WD RE3, RE4, Raptor, Seagate NS.

PS Besides ERC, manufacturers promise higher quality and reliability for the RE / NS series. We cannot verify that, whereas the presence or absence of ERC is an objective, easily checked feature. A disk without ERC should not be in a RAID under any circumstances, since on failure it will do more harm than good.

PPS I have not the slightest idea how to work with SMART in Microsoft Windows. Call the manufacturer's support line and ask. Telephone: 8 (800) 200-8001.

For Mac OS X, as far as I know, there is a smartmontools port, so the commands above (run as root) should work there.

PPPS (from the comments) For WD there is the WDTLER utility (TLER stands for Time-Limited Error Recovery); on some Green-series drives it can still enable ERC / TLER: blog.agdunn.net/?p=208

Source: https://habr.com/ru/post/92701/

