SCT Error Recovery Control

...or what a "RAID edition" hard drive actually is

A bit of theory

There are two strategies an HDD can follow when it detects a read error:

standalone / desktop: keep trying to read until the very end. From the outside it looks like a "stalling drive" that still works. If it is a one-off failure, the drive "stuttered but got through", usually with the characteristic clunk of the heads recalibrating.
raid: give up immediately. From the outside it looks like "there was suddenly a disk error, but afterwards MHDD and similar tools FIND NOTHING WRONG."

The two strategies obviously serve different purposes. A desktop drive would rather stall for a while than return an error. In a RAID there are redundant drives, and nobody can afford to sit through minute-long stalls on a read. Couldn't read? Read the data from the other drives, mark the whole drive as failed, start a resync, and send the disk off for disposal. Perhaps that is unfair, but a drive in a critical role has no business hiccuping.

Control over the error-handling strategy is a feature of expensive hard drives. Desktop models often lack it entirely, or report it but do not allow it to be enabled, so the drive stalls on an error for as long as it sees fit. The second important point is that on RAID hard drives this option is enabled by default, which can lead to problems when such a drive is used on its own.

Deciphering the name

The ability to control how a disk behaves on errors has a very confusing name: SCT ERC. This stands for SCT Error Recovery Control. SCT, in turn, is the name of a common protocol, SMART Command Transport. SMART, in turn, stands for Self-Monitoring, Analysis and Reporting Technology, so the complete expansion of SCT ERC is: Self-Monitoring, Analysis and Reporting Technology Command Transport Error Recovery Control (exhale).

Quick reference

You can check whether a hard disk supports error recovery control by running smartctl -a /dev/sdX and looking at the SCT capabilities line:

 SCT capabilities:       (0x303f) SCT Status supported.
                                  SCT Error Recovery Control supported.
                                  SCT Feature Control supported.

If the "SCT Error Recovery Control supported" line is missing, the disk does not support these commands.
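The check above is easy to script. Below is a minimal sketch, assuming a POSIX shell with grep; the function name has_scterc is hypothetical (not part of smartmontools). It reads smartctl output on stdin and reports support through its exit status.

```shell
# Hypothetical helper: succeed (exit 0) if the smartctl output on stdin
# advertises SCT Error Recovery Control, fail (exit 1) otherwise.
has_scterc() {
  grep -q 'SCT Error Recovery Control supported'
}
```

Usage: smartctl -a /dev/sda | has_scterc && echo "ERC supported"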

Next comes the actual control. On the disks I have seen there are two parameters: the read timeout and the write timeout. Below are the values for every disk I have managed to get my hands on.

To view the timeouts, run smartctl -l scterc /dev/sda. The output looks like this:

 # smartctl -l scterc /dev/sda
 SCT Error Recovery Control:
            Read:     70 (7.0 seconds)
           Write:     70 (7.0 seconds)

 # smartctl -l scterc /dev/sde
 SCT Error Recovery Control:
            Read: Disabled
           Write: Disabled

 # smartctl -l scterc /dev/sdd
 Warning: device does not support SCT Error Recovery Control command
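To use these values in a script, they have to be pulled out of the text. This is a minimal sketch, assuming each field sits on its own line as in the output above; scterc_value is a hypothetical helper name. It prints the raw value in tenths of a second, the word "Disabled", or nothing when the drive does not support the command.

```shell
# Hypothetical helper: extract the Read or Write value from
# `smartctl -l scterc` output supplied on stdin.
scterc_value() {
  field="$1"   # "Read" or "Write"
  # The first whitespace-separated token is "Read:" or "Write:";
  # the second is the value (e.g. 70) or "Disabled".
  awk -v f="${field}:" '$1 == f { print $2; exit }'
}
```

Usage: smartctl -l scterc /dev/sda | scterc_value Read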

To set the timeouts, append the values to scterc, separated by commas: smartctl -l scterc,120,60 /dev/sde. Values are given in tenths of a second, so 120 means 12 seconds; the first number is the read timeout, the second the write timeout. 0 means "to the bitter end", that is, no time limit.
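Because smartctl takes tenths of a second, it is easy to be off by a factor of ten. Here is a minimal sketch of a hypothetical helper pair, to_tenths and erc_cmd, that converts seconds and only prints the resulting command (it does not run it), so the line can be reviewed before executing it as root.

```shell
# Hypothetical helper: convert seconds (integer or decimal) to the
# tenths-of-a-second units smartctl expects, rounding to nearest.
to_tenths() {
  awk -v s="$1" 'BEGIN { printf "%d\n", s * 10 + 0.5 }'
}

# Hypothetical helper: print (not execute) the smartctl command that
# sets read/write ERC timeouts given in seconds on a device.
erc_cmd() {
  read_s="$1"; write_s="$2"; dev="$3"
  printf 'smartctl -l scterc,%s,%s %s\n' \
    "$(to_tenths "$read_s")" "$(to_tenths "$write_s")" "$dev"
}
```

For example, erc_cmd 12 6 /dev/sde prints smartctl -l scterc,120,60 /dev/sde, and erc_cmd 0 0 /dev/sda prints the command that turns ERC off.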

Default values

Here is the data for the various disks I have around:

Name	Model	ERC (supported? if so, default read/write timeouts in seconds)
Western Digital VelociRaptor	WDC WD1500HLFS-01G6U1	Yes, 7/7
Western Digital RE4 Serial ATA	WDC WD1500HLFS-01G6U1	Yes, 7/7
Western Digital RE3 Serial ATA family	WD1002FBYS-02A6B0	Yes, 7/7
Western Digital Caviar Green (Adv. Format)	WDC WD20EARS-00MVWB0	Not supported
Western Digital Caviar Green	WD7500AACS-00D6B0	Yes, 0/0, cannot be enabled
Seagate Maxtor DiamondMax 22	STM3500320AS	Yes, 0/0, can be enabled
Seagate Barracuda 7200.9	ST3400633AS	No (Maxtor/Seagate drives of the same years have it, but this one does not, wow)
Seagate Barracuda 7200.10	ST3500630AS	No
Seagate Barracuda 7200.11	ST31500341AS	Yes (surprisingly!), 0/0, can be enabled
Seagate Barracuda LP	ST31500541AS	Yes, 0/0 (i.e. disabled), can be enabled
SAMSUNG SpinPoint F4 EG (AFT)	SAMSUNG HD204UI	Yes, 0/0 (disabled), can be enabled
Hitachi Deskstar 7K3000	HDS723030ALA640	Yes, 0/0, cannot be enabled (SCSI error: aborted command)
Hitachi Deskstar T7K500	HDT725032VLA360	Yes, 0/0, cannot be enabled

(Just don't ask me where I got so many drives at home.)
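To collect a table like the one above on your own machines, a loop along these lines could work. This is a minimal sketch: erc_survey is a hypothetical name, and it assumes smartctl is in PATH, the script runs as root, and the drives are ATA drives (smartctl reports the model as "Device Model:" for those; SCSI/SAS drives report it differently).

```shell
# Hypothetical helper: print "device<TAB>model<TAB>read/write" ERC values
# for each device given as an argument.
erc_survey() {
  for dev in "$@"; do
    # Model name from the identity section; empty if not found.
    model=$(smartctl -i "$dev" | sed -n 's/^Device Model:[[:space:]]*//p')
    # Query ERC once and parse both fields from the captured text.
    erc=$(smartctl -l scterc "$dev")
    rd=$(printf '%s\n' "$erc" | awk '$1 == "Read:" { print $2; exit }')
    wr=$(printf '%s\n' "$erc" | awk '$1 == "Write:" { print $2; exit }')
    printf '%s\t%s\t%s/%s\n' "$dev" "${model:-unknown}" "${rd:-n/a}" "${wr:-n/a}"
  done
}
```

Usage: erc_survey /dev/sda /dev/sdb /dev/sdc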

The moral

People who buy RE4 disks (or the RAID editions of other manufacturers), or VelociRaptors, to use as a single hard disk and do not set ERC to zero are making a huge mistake, comparable only to that of people who put desktop drives into a RAID without configuring ERC and hope the RAID will save them if something fails.

In short: bought a single fancy drive for home? Turn ERC off (0,0). Bought a drive for a RAID? Check that its ERC is non-zero, preferably somewhere in the reasonable 3-10 s range (30-100 in smartctl's tenths of a second).
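The rule of thumb above can be turned into a quick check. This is a minimal sketch; erc_check is a hypothetical helper, and the 30-100 window simply encodes the suggested 3-10 s range. It takes the drive's role (single or raid) followed by the Read and Write values as smartctl reports them (tenths of a second, or "Disabled"), prints OK or a warning, and returns non-zero on a warning.

```shell
# Hypothetical helper: check ERC values against the article's rule of thumb.
#   single: a standalone disk should have ERC off (0 / Disabled)
#   raid:   an array member should have ERC on, ideally 30..100 (3-10 s)
erc_check() {
  role="$1"
  shift
  for v in "$@"; do
    [ "$v" = "Disabled" ] && v=0
    case "$role" in
      single)
        if [ "$v" -ne 0 ]; then
          echo "WARN: standalone disk has ERC enabled; consider scterc,0,0"
          return 1
        fi ;;
      raid)
        if [ "$v" -eq 0 ]; then
          echo "WARN: RAID member has ERC disabled"
          return 1
        elif [ "$v" -lt 30 ] || [ "$v" -gt 100 ]; then
          echo "WARN: ERC value $v is outside the suggested 30-100 (3-10 s) range"
          return 1
        fi ;;
      *)
        echo "usage: erc_check single|raid READ WRITE" >&2
        return 2 ;;
    esac
  done
  echo OK
}
```

For example, erc_check raid 70 70 prints OK, while erc_check single 70 70 warns that a standalone disk still has ERC enabled.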

Models that need attention when used as a desktop drive: WD RE3, RE4, Raptor, Seagate NS.

PS Besides ERC, manufacturers promise higher quality and reliability for the RE/NS series. We cannot verify that, but the presence or absence of ERC is an objective, easily verifiable feature. A disk without ERC should never be used in a RAID: when it fails, it does more harm than good.

PPS I have not the slightest idea how to work with SMART in Microsoft Windows. Call the manufacturer's support line and ask: 8 (800) 200-8001.

For Mac OS X there is, as far as I know, a port of smartmontools, so the commands above should work there as well (run them as root).

PPPS (from the comments) For WD drives there is the WDTLER utility (TLER = Time-Limited Error Recovery), which can still enable ERC/TLER on some Green-series drives: blog.agdunn.net/?p=208

Source: https://habr.com/ru/post/92701/
