
MTBF - where does a million-hour MTBF come from?



It is surprising how much misunderstanding surrounds a concept as widespread as MTBF (Mean Time Between Failures). Even experts in the field of data storage often do not understand what this value means.

It would seem that nothing could be simpler. Time to failure is the time of trouble-free operation, counted in hours from the moment a new disk is first switched on until it fails.
Yet almost anyone who looks up the MTBF that manufacturers quote for modern disks and does some simple arithmetic will be surprised by how strange the value is.
Today, MTBF is quoted as a million or even one and a half million hours.
A year has approximately 8,760 hours, so, by this understanding of the “physical meaning” of the value, the manufacturer would be promising that any such disk runs without failure for over a hundred years (114 years for a million-hour MTBF). That is obviously absurd to anyone who has had hard drives die.

Then what is this “million hours”, and where and how is it measured?
Of course, the manufacturer does not run a disk for 114 years; the estimate is derived indirectly. But where does the “million hours” value come from?
The fact is that MTBF is measured over the whole population of disks in service, and it covers only the declared warranty period for that type of disk. Both points are important, and both are often omitted from the description, which leads to a fundamental misunderstanding.

Imagine that we put a hard drive in a server, it works for the 3 years of its warranty period and, while still intact, is replaced with a new one. The next one also works for three years and is replaced when its warranty expires, and so on. Then somewhere around the 38th disk (38 × 3 years ≈ 114 years, or a million hours) you can expect a disk not to survive to the end of its warranty period.
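The arithmetic behind the “114 years” and “38th disk” figures can be checked with a short sketch. The variable names are only illustrative; the inputs (a million hours, 8,760 hours per year, a three-year warranty) are the ones used in the text.

```python
HOURS_PER_YEAR = 8760        # 365 days * 24 hours, as in the text

mtbf_hours = 1_000_000       # vendor-quoted MTBF
warranty_years = 3           # warranty period assumed in the example

# Naive reading: MTBF as the lifetime of a single disk, in years.
mtbf_years = mtbf_hours / HOURS_PER_YEAR

# Number of back-to-back three-year service periods that add up to the MTBF.
disks_in_sequence = mtbf_years / warranty_years

print(f"MTBF in years: {mtbf_years:.1f}")
print(f"Disks run one after another: {disks_in_sequence:.1f}")
```

This should print roughly 114 years and 38 disks, matching the figures above.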

Or imagine a slightly more realistic situation.
For ease of counting, suppose we have a storage system with 115 disks. For each disk, the manufacturer quotes an MTBF of a million hours. But we must take into account that in a large disk population the probability that some disk fails grows with the number of disks in use; equivalently, the MTBF of the population as a whole shrinks.
For 115 disks, the vendor's MTBF figure means we can expect at least one disk of the 115 to fail before the end of the three-year warranty period. In fact, 115 disks accumulate about a million disk-hours per year, so the expected rate is roughly one failure per year.
This picture is much closer to reality.
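The population example can be sketched in a few lines. This assumes a constant failure rate (the same exponential model that underlies the AFR formula); the function names are hypothetical and not taken from any library.

```python
import math

HOURS_PER_YEAR = 8760

def expected_failures(n_disks: int, mtbf_hours: float, hours: float) -> float:
    """Expected number of failures: accumulated disk-hours divided by MTBF.

    Assumes a constant failure rate and that failed disks are replaced.
    """
    return n_disks * hours / mtbf_hours

def prob_at_least_one_failure(n_disks: int, mtbf_hours: float, hours: float) -> float:
    """Probability of at least one failure under the exponential model."""
    return 1.0 - math.exp(-expected_failures(n_disks, mtbf_hours, hours))

per_year = expected_failures(115, 1_000_000, HOURS_PER_YEAR)
over_warranty = expected_failures(115, 1_000_000, 3 * HOURS_PER_YEAR)
p_warranty = prob_at_least_one_failure(115, 1_000_000, 3 * HOURS_PER_YEAR)

print(f"Expected failures per year:      {per_year:.2f}")
print(f"Expected failures over 3 years:  {over_warranty:.2f}")
print(f"P(at least one failure, 3 years): {p_warranty:.2%}")
```

Under these assumptions, 115 disks give about one expected failure per year and about three over the warranty period, so at least one failure within three years is very likely.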

Strictly speaking, instead of MTBF it is much more practical to use the AFR parameter, the Annual Failure Rate: the annual probability of failure, derived from MTBF.
It is calculated as: AFR = 1 - exp(-8760 / MTBF)
The AFR for a disk with a million-hour MTBF is 0.87%, which is slightly optimistic (Google's well-known 2007 study reports AFRs of around 1% for new disks within the warranty period) but already quite consistent with practice.
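As a minimal sketch, the formula above translates directly into code. The inverse function is an addition for illustration, obtained by solving the same formula for MTBF; both function names are hypothetical.

```python
import math

HOURS_PER_YEAR = 8760

def afr_from_mtbf(mtbf_hours: float) -> float:
    """Annual Failure Rate from MTBF: AFR = 1 - exp(-8760 / MTBF)."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)

def mtbf_from_afr(afr: float) -> float:
    """Inverse of the formula above: MTBF = -8760 / ln(1 - AFR)."""
    return -HOURS_PER_YEAR / math.log(1.0 - afr)

afr_1m = afr_from_mtbf(1_000_000)
print(f"AFR for a 1,000,000-hour MTBF: {afr_1m:.2%}")
print(f"AFR for a 1,500,000-hour MTBF: {afr_from_mtbf(1_500_000):.2%}")
print(f"MTBF implied by a 1% AFR: {mtbf_from_afr(0.01):,.0f} hours")
```

For a million-hour MTBF this gives the 0.87% quoted in the text. The inverse shows what MTBF an observed annual failure rate would correspond to under the same model.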

It is curious that a hard drive manufacturer such as WD has completely stopped quoting MTBF and now specifies another parameter, “power on/off cycles”. Apparently this is not least because users so clearly misunderstand the quoted MTBF values and find them hard to apply.

Source: https://habr.com/ru/post/122529/

