
Careful: Hetzner uses old and worn disks

Hetzner

HDD tools

Much has been said about Hetzner on Habr: here and there. At first our team, like many others, loved it.
We have a long-standing relationship with Hetzner. Our old project Name.ly (along with Brief.ly) has been running at the Germans' since its inception. As “low-cost dedicated hosting” they gave us no cause for complaint; we have rented servers from Hetzner since 2008. There were no big problems. Support was so-so, not the fastest guys, but within half an hour to an hour, sometimes two, they answered and helped.

But since the middle of 2011 our opinion has changed. Maybe the hardware we originally ordered in 2008 and 2009 has aged, or there are other reasons.

First, at the end of May, a hardware RAID controller burned out and at the same time “torched” two of our disks. We spent two days on it (others have had a similar problem), but in the end had to restore everything from backup.

Then, in early September, another old machine “lost” two disks assembled in a software RAID. Again it all ended with restoring the data from backup.

Thanks to Hetzner at least for the 100 GB of free FTP space for personal use.

By the way, Hetzner “destroys” old disks immediately. If you do not ask in advance for the disks to be kept for a while, the data will be lost.

Only half a year passed, and the machine we had restored in June lost its second hardware RAID. This time it went “more successfully”: we managed to pull the data off one of the disks.

Here is the state of the disks after the “crash”:

Model Family: Seagate Barracuda 7200.11 family
Device Model: ST31500341AS
Firmware Version: CC1H
User Capacity: 1,500,301,910,016 bytes

ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE

Disk 0:
1 Raw_Read_Error_Rate 0x000f 113 099 006 Pre-fail Always - 56315943
3 Spin_Up_Time 0x0003 100 100 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 10
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 12
7 Seek_Error_Rate 0x000f 084 060 030 Pre-fail Always - 245310368
9 Power_On_Hours 0x0032 094 094 000 Old_age Always - 6090
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 10
184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
188 Command_Timeout 0x0032 100 100 000 Old_age Always - 0
189 High_Fly_Writes 0x003a 028 028 000 Old_age Always - 72
190 Airflow_Temperature_Cel 0x0022 055 050 045 Old_age Always - 45 (Lifetime Min/Max 45/48)
194 Temperature_Celsius 0x0022 045 050 000 Old_age Always - 45 (0 19 0 0)
195 Hardware_ECC_Recovered 0x001a 039 020 000 Old_age Always - 56315943
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 234921826195397
241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 1601743597
242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 3358359105
...

Disk 1:
1 Raw_Read_Error_Rate 0x000f 120 099 006 Pre-fail Always - 235534254
3 Spin_Up_Time 0x0003 100 100 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 11
5 Reallocated_Sector_Ct 0x0033 001 001 036 Pre-fail Always FAILING_NOW 4092
7 Seek_Error_Rate 0x000f 091 060 030 Pre-fail Always - 1594050702
9 Power_On_Hours 0x0032 080 080 000 Old_age Always - 18095
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 11
184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
188 Command_Timeout 0x0032 099 099 000 Old_age Always - 17180131332
189 High_Fly_Writes 0x003a 001 001 000 Old_age Always - 100
190 Airflow_Temperature_Cel 0x0022 052 047 045 Old_age Always - 48 (Lifetime Min/Max 48/50)
194 Temperature_Celsius 0x0022 048 053 000 Old_age Always - 48 (0 17 0 0)
195 Hardware_ECC_Recovered 0x001a 040 015 000 Old_age Always - 235534254
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 214026810312367
241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 3923719077
242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 3342903896
...
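
As a reading aid for the tables above, here is a minimal Python sketch of how the WHEN_FAILED column relates to the normalized columns. It assumes the rule smartmontools applies: an attribute with a nonzero THRESH counts as failing when its normalized VALUE is at or below THRESH. The function name `failing_attributes` is made up for this illustration, and the sample rows are copied from Disk 1.

```python
def failing_attributes(smart_rows):
    """Return names of attributes whose normalized VALUE is at or below THRESH.

    Each row is one line of the `smartctl -A` attribute table:
    ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    failing = []
    for row in smart_rows:
        fields = row.split()
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # skip headers, "..." and blank lines
        name, value, thresh = fields[1], int(fields[3]), int(fields[5])
        # A threshold of 0 means the attribute never triggers a failure.
        if thresh > 0 and value <= thresh:
            failing.append(name)
    return failing


# Sample rows copied from Disk 1 above.
disk1 = [
    "5 Reallocated_Sector_Ct 0x0033 001 001 036 Pre-fail Always FAILING_NOW 4092",
    "9 Power_On_Hours 0x0032 080 080 000 Old_age Always - 18095",
    "189 High_Fly_Writes 0x003a 001 001 000 Old_age Always - 100",
]
print(failing_attributes(disk1))
```

Only Reallocated_Sector_Ct is reported here: High_Fly_Writes also has a normalized value of 001, but its threshold is 000, so it is not treated as a failure.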


We got going again and asked for new disks to be installed. But this time we decided to check them. Here is the smartctl output:

Model Family: Seagate Barracuda 7200.11 family
Device Model: ST31500341AS
Firmware Version: CC1H
User Capacity: 1,500,301,910,016 bytes

Disk 0:
...
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 1
...
9 Power_On_Hours 0x0032 092 092 000 Old_age Always - 7037
...


Over this one we would not have made a fuss, although the disk is not fresh: about 10 months of use, after all. And here is the second “new” disk:

Disk 1:
...
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 32
...
9 Power_On_Hours 0x0032 089 089 000 Old_age Always - 10155
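
To put the Power_On_Hours values of the two replacement disks in calendar terms, here is a small Python sketch of the arithmetic. The helper name `power_on_age` is invented for this example, and a month is approximated as one twelfth of a 365-day year.

```python
def power_on_age(hours):
    """Convert raw Power_On_Hours into whole days and approximate months."""
    days = hours // 24
    months = days / (365 / 12)  # assumption: average month length
    return days, round(months, 1)


# Raw Power_On_Hours values of the two replacement disks.
for label, hours in [("Disk 0", 7037), ("Disk 1", 10155)]:
    days, months = power_on_age(hours)
    print(f"{label}: {hours} h = {days} days, about {months} months")
```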


The disk has already worked 10155 hours, i.e. 423 days, i.e. a year and two months. We asked for a new one and were refused:

Dear client,
we check the hard discs again.
We can't guarantee to install brand new hard discs. If you have issues with a hard
disc, you can contact us any time. We are 24/7 available.


We asked them not to repeat the summer mistake, when, as it turned out, they had installed a year-and-a-half-old disk for us that died half a year later. Again we were refused:

We regret a lifetime of
as example 18095
harddisk will get faulty sooner than any other.
If you think that your server is a harddisk please provide us logfiles
which shows the error.
This would be a one-time fee of 39, - Euro
for each harddisk.


39 euros for a replacement disk, with no guarantee of what kind of disk you will get in return. It is a kind of “Russian roulette”.

I personally tried to call and talk to them. I waited on the line for ten and a half minutes; the German said “Guten Abend” several times and then, without hearing me, hung up (the call was via Skype, but I had called England that way without problems before).

Next, we asked by email why a disk with 32 already reallocated sectors had been installed for us. The reply:

Dear client,
If you suspect a hardware failure, we can provide a full hardware check. Both hard
disc will checked with the SMARTCTL long check. A hard disc has many spare sectors
and this is not a critical value.
If you want, we can move your request to our supervisor.


We wrote to the supervisor and got a brush-off in response:

I can understand your concern about hard drive failure, but as my
colleague
we need the hard disc again. Also a new hard
drive is not possible to install
any clients only new hard drives.
We can't guarantee to install brand new hard discs. If you have issues with a hard
disc, you can contact us logfiles which shows the error. We
are 24/7 available.


We asked again why a new disk could not be had for an additional charge. I even added that otherwise we would have to vent our surprise on the forums. But the Germans did not give in:

With this threat, we come no further into this matter.


Support left further questions on this topic unanswered.

A lot of bad things have been written about 3ware RAID. On the Seagate forums people are screaming about firmware problems, and the model in the Hetzner lineup is not ok either.

For now, because we have other servers at Hetzner, we cannot move away right away.

So far we have made one change: paying 25 euros (+ VAT) every month for a RAID controller (Hetzner FlexiPack + 2-Port Hardware RAID-Controll) that burns disks makes no sense. We went back to software RAID.

A Reallocated_Sector_Ct of 32 at the time of installation may not be extremely critical. But reallocated sectors have already appeared, the disk has already worked for more than a year, and Hetzner flatly refuses to replace it even after the second “crash” of the system in less than a year. That is alarming.

We will watch how fast the reallocated count grows and have added this information to our daily log. We recommend that everyone hosted at Hetzner check their drives with smartctl -A /dev/sda .
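
A daily check of this kind could be sketched in Python roughly as follows. This is only an illustration under assumptions: the device path /dev/sda comes from the command above, while the log file name `smart_daily.log`, the function names and the choice of watched attributes are made up here. smartctl must be installed and usually needs root.

```python
import datetime
import subprocess

# Attributes to track; this selection is an assumption for the example.
WATCHED = ("Reallocated_Sector_Ct", "Power_On_Hours")


def parse_raw_values(smartctl_output, names=WATCHED):
    """Extract RAW_VALUE for the named attributes from `smartctl -A` text."""
    values = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Rows: ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in names:
            raw = fields[9]
            # Some drives print non-numeric raw values; keep those as text.
            values[fields[1]] = int(raw) if raw.isdigit() else raw
    return values


def log_smart(device="/dev/sda", log_path="smart_daily.log"):
    """Run smartctl once and append one dated line to the log file."""
    # smartctl signals disk problems through nonzero exit bits,
    # so the return code is deliberately not treated as an error.
    result = subprocess.run(
        ["smartctl", "-A", device], capture_output=True, text=True
    )
    values = parse_raw_values(result.stdout)
    stamp = datetime.date.today().isoformat()
    with open(log_path, "a") as log:
        log.write(f"{stamp} {device} {values}\n")
    return values


# Parsing demo on two rows copied from the second "new" disk.
sample = (
    "5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 32\n"
    "9 Power_On_Hours 0x0032 089 089 000 Old_age Always - 10155\n"
)
print(parse_raw_values(sample))
```

Calling `log_smart()` once a day, for example from cron, would append one line per day, so growth in the reallocated count shows up when comparing consecutive lines.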

Maybe our team is doing something wrong? But the situation looks like this:

You come to Toyota to buy a car, pick one out, and look at the brakes. The brakes are already old and worn and may fail.

You ask: "Can we have new brakes, even for an extra charge?"

And the answer is: "We have no such option!"


I wonder what experience other teams have with Hetzner? Has anyone encountered a similar situation with other hosters? What would you require from a company in such a situation?

What should we do? What good alternatives can you recommend (other than AWS: the budget of our project, which is now running at Hetzner, cannot stretch to Amazon yet)?

Post Mortem (2012-01-27)

After a day and repeated reminders, the head of support replied and agreed to our request for a new disk, albeit for 69 euros. The answer was friendly, only a bit late: the system had already been launched, and changing a disk in software RAID means downtime.

Perhaps the pressure from Habr changed Rene's mind.

Thank you all for the very helpful tips.

I think we will listen to noonesshadow and achekalin: we will save up the money and go for an upgrade. Perhaps to the cloud.

If anyone stays at Hetzner, pay attention to EQ vs EX:
* ZloiZmei
* vgrayster
* inkvizitor68sl
* Fr0stb1te
* Lux_In_Tenebris

and SSD
* synergy

Source: https://habr.com/ru/post/137004/

