Inside IT Storage

Seagate Enterprise Inside IT Storage

Diving into “MTBF” and “AFR”: Storage Reliability Specs Explained

During Seagate’s recent launch of its Savvio 10K.4 drive, I read some news stories suggesting that the 10K.4’s two-million-hour Mean Time Between Failure (MTBF) specification meant the drive’s actual average lifespan was over 200 years of use. While the 10K.4’s reliability rating is indeed 25% higher than that of its prior generation, the MTBF specification itself is often misunderstood. So let’s talk a little bit about what “The Mean” really means and why Annualized Failure Rate (AFR) is a preferred predictor of reliability. Many thanks go to Bill Rudock of Seagate for his insight and help with this week’s blog.

Before we get into the numbers, an important note concerning either MTBF or AFR is that these are used to estimate reliability for a population or group of drives. Neither specification is designed to determine a single individual drive’s useful lifespan.

Historically, the term Mean Time Between Failure (MTBF) was most often used as a reliability description for repairable systems.  Very early in the history of hard drives – which now spans over 50 years – drives were actually repairable systems that required frequent field service intervention/maintenance. The mean times between failures were measured in days or weeks. The term has remained in use for describing disc drive reliability since those early days.

Let’s first give an example of MTBF as it applies to a population of repairable airplanes. Consider a commercial fleet of 100 airplanes tracked for one year (= 8,760 hrs). Assume seven airplanes (#9, #16, #18, #56, #61, #77 and #94) each experienced one failure (perhaps of a component) during the year that required repair. In addition, airplane #41 required two repairs and #67 required six repairs in that same year. Neglecting the downtime during which repairs take place and assuming the airplanes are otherwise always flying, the cumulative time the fleet is utilized is 100 x 8,760 hrs = 876,000 hrs. The total number of failures for the fleet for the year was 7 + 2 + 6 = 15. The MTBF for the fleet = 876,000 / 15 = 58,400 hrs. Notice that the MTBF is longer than a year, and yet some individual airplanes experienced a failure within the year. Also notice that the MTBF as calculated here does not take into account when any individual airplane failed. Nor does it seem to be an accurate description of airplane #67, which required six repairs in that year.
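The fleet arithmetic above can be sketched in a few lines of Python. This is only an illustration of the calculation described in the example; the function name `fleet_mtbf` and the `repairs` dictionary are made up here and do not come from any Seagate tool.

```python
HOURS_PER_YEAR = 8_760


def fleet_mtbf(units, hours_per_unit, failures):
    """Population MTBF: cumulative operating hours divided by total failures."""
    if failures <= 0:
        raise ValueError("MTBF is undefined without at least one failure")
    return units * hours_per_unit / failures


# Repairs per airplane from the example: seven airplanes with one repair
# each, plus #41 with two repairs and #67 with six.
repairs = {9: 1, 16: 1, 18: 1, 56: 1, 61: 1, 77: 1, 94: 1, 41: 2, 67: 6}

total_failures = sum(repairs.values())
mtbf_hours = fleet_mtbf(100, HOURS_PER_YEAR, total_failures)

print(total_failures)  # 15
print(mtbf_hours)      # 58400.0
```

Note that only the total failure count enters the calculation. Which airplane failed, and when, has no effect on the result, which is exactly the limitation pointed out above.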

Seagate Cheetah drives are well-known within the storage industry as being among the most reliable HDDs offered

Disc drive reliability specifications based on MTBF can lead to common misconceptions. For example, Seagate’s previous generation of enterprise-class disc drives has a specified MTBF of 1,600,000 hrs. This is much longer than any individual drive’s expected mission life. Yet someone might innocently read that specification and expect every individual drive to last that long. Seagate has therefore added an Annualized Failure Rate (AFR) specification to make its reliability descriptions clearer and more precise. For disc drives, the reliability metrics and specifications (AFR or MTBF) are, necessarily, probabilistic population metrics for groups of drives.

The following is quoted from a prior-generation Seagate Cheetah drive product manual for MTBF:

“… The mean time between failures (MTBF) target is specified as device power-on hours (POH) for all drives in service per failure. The following expression defines MTBF:

[MTBF = estimated power-on operating hours in the period / number of drive failures in the period]

“Estimated power-on operating hours means power-on hours per disc drive times the total number of disc drives in service.”

Now, on to AFR. Compare the above with the following quotation from a current generation Product Manual:

“These drives shall achieve an AFR of 0.55% (MTBF of 1,600,000 hours) when operated in an environment that ensures the HDA case temperatures do not exceed the values specified in Section 6.4.1. Operation at case temperatures outside the specifications in Section 6.4.1 may increase the AFR (decrease the MTBF). AFR and MTBF statistics are population statistics that are not relevant to individual units.
AFR and MTBF specifications are based on the following assumptions for Enterprise Storage System environments:
• 8,760 power-on hours per year
• 250 average on/off cycles per year
• Operating at nominal voltages
• System provides adequate cooling to ensure the case temperatures specified in Section 6.4.1 are not exceeded”

To calculate AFR, we use this formula: AFR = 1 – exp ( – Annual Operating Hours / MTBF)
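As a quick check, the formula can be evaluated for the figures quoted from the product manual. The sketch below assumes 8,760 power-on hours per year, as the manual does; the function names `afr_from_mtbf` and `mtbf_from_afr` are illustrative only.

```python
import math

HOURS_PER_YEAR = 8_760  # power-on hours per year assumed in the product manual


def afr_from_mtbf(mtbf_hours, annual_hours=HOURS_PER_YEAR):
    """AFR = 1 - exp(-annual operating hours / MTBF)."""
    return 1.0 - math.exp(-annual_hours / mtbf_hours)


def mtbf_from_afr(afr, annual_hours=HOURS_PER_YEAR):
    """Invert the same formula to recover the MTBF implied by an AFR."""
    return -annual_hours / math.log(1.0 - afr)


afr = afr_from_mtbf(1_600_000)
print(f"{afr:.2%}")  # 0.55%
```

For an MTBF of 1,600,000 hours this gives the 0.55% quoted in the manual. Because the annual operating hours are tiny compared with the MTBF, the result is very close to the simple ratio 8,760 / 1,600,000.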

But even setting the formula aside for a moment, the AFR percentage itself (0.55% in the above example) is much easier to understand at a glance.

The MTBF estimated in the airplane example, and implied by the calculation method described in the previous-generation Cheetah product manual, inherently assumes a statistically constant failure rate. Although this assumption is commonly made, Seagate finds that it is not generally true for disc drives, so use of the term MTBF can again be confusing. As a result, we prefer Annualized Failure Rate (AFR) as a reliability metric, but we include MTBF in enterprise product literature for historical reference.
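To see what a constant failure rate would imply, here is a minimal sketch that applies the same exponential formula over longer periods. It takes the constant-rate assumption purely for illustration, which, as noted above, is not generally true for disc drives; the function name `failure_probability` is hypothetical.

```python
import math

HOURS_PER_YEAR = 8_760
MTBF_HOURS = 1_600_000


def failure_probability(hours, mtbf_hours):
    """P(failure within `hours`), ASSUMING a constant failure rate of 1/MTBF."""
    return 1.0 - math.exp(-hours / mtbf_hours)


# Cumulative probability of failure after one, two and three years
# of continuous operation under the constant-rate assumption.
cumulative = {
    years: failure_probability(years * HOURS_PER_YEAR, MTBF_HOURS)
    for years in (1, 2, 3)
}
for years, p in cumulative.items():
    print(years, f"{p:.2%}")

# Fraction of a population still running when the elapsed time equals the MTBF.
survive_to_mtbf = 1.0 - failure_probability(MTBF_HOURS, MTBF_HOURS)
print(f"{survive_to_mtbf:.1%}")
```

Under this assumption only about 37% of a population (exp(-1)) would still be running at the MTBF itself, which is another way to see that the MTBF figure is not a typical lifespan.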

8 Comments

  • Hi.

    Excellent text! I now know what MTBF and AFR mean.

    I see that:
    Enterprise “barracuda-es” HDs have an AFR of 0.73%
    Home-consumer “barracuda 7200.12” HDs have an AFR of 0.34%

    At first glance, it seems to me that the “barracuda 7200.12” is more reliable than the “barracuda-es”.

    Are the test parameters different for home-consumer and enterprise drives? It seems to me that the “barracuda 7200.12” was tested under different conditions.

    What is the expected AFR of a “barracuda 7200.12” in a 24 hours/day environment?

    I’d like to make a comparison between home and enterprise HD AFRs.

    Thanks,

  • Thank you for your response and question. You are correct: the tests for enterprise and desktop drives are conducted with different parameters. Unfortunately, we don’t have any test data available showing reliability estimates for that specific drive (7200.12) or other desktop-oriented drives when used in enterprise workload conditions. However, it is my understanding from discussing this subject with our engineering teams that reliability will drop in those use cases.

  • Norm Nantel Says:

    I landed here digging up info on the term AFR and now realize that it’s nothing new, simply the failure probability computed at year 1, as shown in reliability texts. So the AFR of 0.55% only applies to 1 year of operation. After year 2 the probability of failure is 1.1%, after year 3 it’s 1.6%, etc.

  • Yes, correct. The transition to using AFR was to provide more clarity versus MTBF which many were confused about.

  • [...] data integrity with the new T10 Protection Information standard and an increase in reliability (1.4 million hours MTBF). With a Self Encrypting Drive (SED) option, data security is covered throughout the entire drive [...]

  • Andras Bartok Says:

    Hallo David,
    your study is very useful and understandable.
    Do you have any analysis on other HDD’s AFR?
    Thanks and best regards,
    Andras Bartok

  • Thank you, Andras. I do not have information on competing products’ AFRs, and many vendors still use MTBF.
    Best,
    David

  • Ng Peng Mun Says:

    Hi,

    It was given that AFR = 0.48% for 2,400 POH at 25C and 0.97% for 8,760 POH at 25C. So what is the acceleration factor to calculate the MTBF for 8,760 POH? I don’t think the formula would be AFR = 1 – exp ( – Annual Operating Hours / MTBF). Please advise! Thanks.
