Monthly Archives: October 2011

Traditional Mahalanobis distance is a generalized distance that can be considered a measure of the degree of similarity (or divergence) in the mean values of multiple characteristics of a population, taking the correlation among the characteristics into account. It has been used for many years in clustering, classification, and discriminant analysis. Mahalanobis distance is attributable to Prof. P.C. Mahalanobis, founder of the Indian Statistical Institute, who introduced it in the 1930s. Mahalanobis distance has been used for various types of pattern recognition, e.g. inspection systems, face and voice recognition systems, counterfeit detection systems, etc. The URL from SAS shown here displays data published by Fisher (1936) and a cluster analysis in which classification into three predetermined categories is demonstrated.

Another generalized distance most engineers have encountered is the Euclidean distance between two multivariate points p and q. If p = (p1, p2,…, pn) and q = (q1, q2,…, qn) are two points in Euclidean n-space, then the distance from p to q, or from q to p is given by

D=sqrt[(q1-p1)**2 + (q2-p2)**2 +… + (qn-pn)**2]

No consideration is given to the correlation between characteristics in Euclidean distance calculations.
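
To make the difference concrete, here is a minimal sketch that compares the two distances for two correlated characteristics. The data values and function names are made up for illustration, and the example is limited to two characteristics so that the 2x2 matrix inverse can be written out by hand.

```python
import math

def euclidean(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((qi - pi) ** 2 for pi, qi in zip(p, q)))

def mahalanobis_2d(point, data):
    """Mahalanobis distance from the centroid of 2-D `data` to `point`."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]]
    sxx = sum((x - mx) ** 2 for x, _ in data) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in data) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in data) / (n - 1)
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - mx, point[1] - my
    # d^T S^-1 d, with the inverse of the 2x2 covariance matrix written out
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2)

# Hypothetical reference data: the two characteristics rise together.
reference = [(1, 1), (2, 3), (3, 2), (4, 5), (5, 4)]
centroid = (3, 3)
with_trend = (4, 4)     # follows the correlation
against_trend = (4, 2)  # violates the correlation

print(euclidean(centroid, with_trend), euclidean(centroid, against_trend))
print(mahalanobis_2d(with_trend, reference), mahalanobis_2d(against_trend, reference))
```

Both test points are the same Euclidean distance from the centroid (about 1.41), but the point that runs against the correlation has a Mahalanobis distance of 2.0 versus about 0.67 for the point that follows it.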
Dr. G. Taguchi of Ohken Associates Japan developed an innovative method for determining the generalized distance from the centroid of a reference group (of multivariate data) to a multivariate point. For example, if a doctor had a group of very healthy patients whose vital characteristics, such as blood pressure, body temperature, skin color, heart rate, and respiration rate, were all considered exemplary, then he could define a Mahalanobis space, a reference space, with those healthy folks, use the centroid as the zero point, and define a unit distance for a continuous degree-of-health scale. If a not-so-healthy person came to the same doctor and the same characteristics were measured, he would have an MHD number much higher than the reference group. His MHD number would be indicative of his generalized distance from the centroid of the healthy group. As time passed, the MHD number for the not-so-healthy patient could increase (or decrease), depending on whether his health was failing or improving, respectively. In general, very healthy people tend to look quite similar, while unhealthy people tend to look quite different from one another (and from the healthy group). In addition, changes in the correlation structure among the unhealthy patients’ characteristics strongly affect their MHD numbers. If a person’s MHD number reached a predetermined high threshold value, for example, the doctor might recommend hospitalization. If the MHD became similar to those of the reference group, the patient could be recommended for occasional routine doctor visits.
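
A minimal sketch of such a reference space follows. It assumes one common convention: each characteristic is standardized with the healthy group's mean and population standard deviation, and the squared distance is divided by the number of characteristics k, so that the healthy group averages 1 on the scale. The patient data, the choice of two characteristics (heart rate and systolic blood pressure), and the function names are all hypothetical.

```python
import math

def build_reference_space(group):
    """Means, population std devs, and correlation of 2-D reference data."""
    n = len(group)
    means = [sum(row[j] for row in group) / n for j in range(2)]
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in group) / n)
            for j in range(2)]
    r = sum(((row[0] - means[0]) / stds[0]) * ((row[1] - means[1]) / stds[1])
            for row in group) / n
    return means, stds, r

def scaled_md(sample, space):
    """Scaled squared distance (1/k) * z^T R^-1 z for k = 2 characteristics."""
    means, stds, r = space
    z1 = (sample[0] - means[0]) / stds[0]
    z2 = (sample[1] - means[1]) / stds[1]
    # Inverse of the correlation matrix [[1, r], [r, 1]] written out by hand
    return (z1 * z1 - 2 * r * z1 * z2 + z2 * z2) / (1 - r * r) / 2

# Hypothetical healthy group: (heart rate, systolic blood pressure)
healthy = [(62, 112), (65, 118), (68, 115), (70, 121),
           (72, 119), (66, 116), (64, 113), (69, 120)]
space = build_reference_space(healthy)

group_average = sum(scaled_md(p, space) for p in healthy) / len(healthy)
patient = scaled_md((95, 150), space)  # hypothetical not-so-healthy patient
print(group_average, patient)
```

With this scaling the centroid sits at zero and the healthy group averages one unit, so a patient's number reads directly as a multiple of the typical healthy distance.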
From any number of multivariate characteristics measured, it is possible to readily identify those characteristics which are most important (in a Pareto sense). Reducing the cost of measurement is an important consideration for many enterprises. There is usually a subset of measurements that provides all the data necessary to make correct decisions. Strong correlations between measurements make it possible to eliminate measures that add little value. The information contained in a handful of multivariate measurements may be sufficient to identify abnormal conditions.

A medical trend chart of MHD illustrates the relative level of health of a person as a function of time. For example, daily collection of data for a patient, along with daily estimation of MHD, could be used to track overall health improvements (or deteriorations). Increasing trends could be used for prognostics, to initiate preventive countermeasures before a threshold condition is reached. The corrective effect of the countermeasure could be captured in the MHD number from the following days. Multivariate process control charts, like Shewhart and CUSUM charts, are similar, but these are based on probabilistic control limits derived from various statistical distribution assumptions. No such assumptions are made with MHD. Rather, consideration of costs is used to set limits.
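
As a rough sketch of the prognostic idea, the function below fits a straight line to the most recent daily MHD values and projects how many days remain before the threshold is reached. The linear trend, the window length, and the function name are my own assumptions; the threshold is simply an input, to be set from cost considerations as described above.

```python
def days_until_threshold(mhd_history, threshold, window=5):
    """Project days until `threshold`, using a least-squares line through
    the last `window` daily MHD values. Returns 0.0 if the threshold is
    already reached and None if there is no upward trend."""
    recent = mhd_history[-window:]
    n = len(recent)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    if recent[-1] >= threshold:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = sum(recent) / n
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in enumerate(recent))
             / sum((x - x_mean) ** 2 for x in range(n)))
    if slope <= 0:
        return None
    # Project forward from the latest observed value at the fitted slope
    return (threshold - recent[-1]) / slope

# Hypothetical daily MHD values rising by 0.5 per day
print(days_until_threshold([1.0, 1.5, 2.0, 2.5, 3.0], threshold=5.0))  # 4.0
```

A countermeasure could be triggered when the projected number of days drops below whatever lead time is needed to act.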
For manufactured products, multivariate measures from testing are typically collected following final assembly. If we assume that the health of a manufactured product is analogous to the health of a patient, we could use similar methods to identify abnormal conditions and calculate a continuous MHD number for the multivariate condition. By collecting a group of manufactured systems with exemplary performance, a Mahalanobis space could be constructed from the multivariate characteristics. A zero point and unit distance scale would be estimated as before. The system’s health could be diagnosed at t=0, just after assembly, and again later at intervals dictated by a data collection schedule. The manufactured product could easily be classified into normal and abnormal states at t=0, and the product’s tendency to become abnormal could be tracked.

The MHD measure can be utilized for many interesting industrial reliability problems, including fault detection, fault isolation, degradation identification, and prognostics. For example, an air bag deployment decision relies on the ability to first establish a reference space for normal everyday driving, and then to release the air bags when multivariate shock loads and accelerations exceed a threshold value. This is fault detection. Fire alarms should actuate when fire conditions exist over and above those expected from simple kitchen cooking or cigarette smoking. A multivariate reference space would be collected from normal cooking conditions, and an abnormal fire condition would be declared above some threshold value. The tendency to fail of a high volume printer with multivariate sensor data could be inspected periodically, and a service agent could be dispatched or an electronic countermeasure could be applied before the customer ever noticed. Availability of the printer would be higher without the fault downtime, and customer satisfaction would be higher.

The other day I was thinking about my Reliability Blog and it led to my thinking about CURVES, especially those most common in Reliability Engineering. We regularly use the Gaussian function and the Weibull Chart, but as far as my experience goes, the Bathtub Curve has been the most popular way to visually summarize the lifetime expectations of just about everything. Then, looking at the curve I could not avoid noting that it uses words most common in personal daily life: Infant Mortality, Useful Life and End of Life, and that stimulated thought on life at the beginning, middle and end.

Then Rob Reiner’s 1989 movie “When Harry Met Sally” popped into my head, not only as a way of getting your attention, but because anyone who saw the movie can undoubtedly recall “that” scene where Sally (Meg Ryan) challenges Harry (Billy Crystal) that women can deceive men by faking an orgasm, and then “fakes” a very public (and very persuasive) orgasm to convince him. After Ms. Ryan is through with her demonstration, a nearby customer (Estelle Reiner, Rob Reiner’s mother), when asked by a waiter what she wants, replies, “I’ll have what she’s having” (33rd on the list led by “Frankly my dear, I don’t give a damn”). Now, how does this relate to the Bathtub Curve? Well, if Harry and Sally link up and Harry impregnates Sally, do we then have the beginning of a Bathtub Curve? Probably not. Bathtub Curves by definition describe the lifetime of a population of products (or people, et al.) in graphical form. What do you think?

Rather than take up a lot more of your time on this issue, I would like to refer you to some very well written material on Bathtub Curves and then have you comment on how you see Bathtub Curves and what purpose they have served for you in the past and the present, and how you see them in the future.

My first source was Wikipedia:

My second and I believe the best written and probably the most informative source is a 2-part paper written by Dennis Wilkins while he was at Hewlett-Packard (reportedly now a consultant with ReliaSoft):


We are giving a free webinar on Tuesday, October 11th at 11:30am.

Simulation and testing often go together as part of a strong reliability program.  Finite element simulation provides data and insights that would be difficult or impossible to obtain from testing alone.  Simulation gives results for all components, including those that are inaccessible, and over a wide range of conditions. The results from simulation can be critical parts of accelerated life test programs such as HALT and ALT.  In this seminar, we will show you techniques to integrate these tools into a reliability program to optimize your reliability results.

Register at

Reliability Specification

The importance of accurate and clearly specified product requirements cannot be overstated. The reliability specification is usually part of a larger document, called the product specification or product requirements document. This document contains information such as the product description, performance specifications, environmental requirements, electrical interface specifications, physical specifications and reliability specifications.


Because the definition of reliability depends on the various product specifications, including the reliability requirements, it is important that they are well understood and properly stated in the product specification. Nothing should be left to interpretation or to be assumed by the reader, unless it is a legitimate design choice. Any operating condition or performance requirement that will affect the reliability value should be nailed down somewhere in the body of this document. To help in the understanding of how the product’s reliability will or should be measured, definitions of failure and MTBF (Mean-Time-Between-Failure) should appear in the reliability specification section. An example of a properly developed reliability specification for a printer follows:




The MTBF of the Model X-100 printer shall not be less than 2000 hours, based on the following operating conditions:


Power On:  8 hours per day; 168 hours per month

Printing:   25% Duty Cycle;  42 hours per month

Characters Printed:  40 characters per second average; 3.3 million per month

Columns and Lines:  40% average density


MTBF for purposes of this specification shall be defined as:


MTBF = Total Operating Hours for the Period / Total Number of Chargeable Failures


Where the total operating hours is the total power on time accumulated for the period during the specified useful life of a population (minimum of 200) of field installed printers, when operated within the specifications stated in this document. To establish a meaningful MTBF, operating hours per unit must be greater than 1000 hours.


A failure is defined as the inability of a unit to perform its specified function when operated within the defined limits of the product specification, requiring unscheduled maintenance to restore performance. Failures excluded from MTBF calculations include stoppage or substandard performance caused by operator error, environments beyond specified limits, power source failure, failure of associated supplies and equipment supplied by other vendors, or any other failures not caused by the printer.
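
The definition above can be turned into a short calculation. The sketch below is one possible reading, not part of the specification itself: the cause labels and function name are hypothetical, and the 1000 hour requirement is interpreted as an average across the population, which the text does not state explicitly.

```python
# Hypothetical labels for failure causes the specification excludes
EXCLUDED_CAUSES = {"operator_error", "environment", "power_source",
                   "third_party_supplies"}

def field_mtbf(unit_hours, failure_causes,
               min_units=200, min_hours_per_unit=1000):
    """MTBF = total operating hours / number of chargeable failures.

    unit_hours     -- power-on hours accumulated by each field unit
    failure_causes -- one cause label per recorded failure
    """
    if len(unit_hours) < min_units:
        raise ValueError("population too small for a meaningful MTBF")
    total_hours = sum(unit_hours)
    # Assumption: the per-unit minimum is applied to the population average
    if total_hours / len(unit_hours) <= min_hours_per_unit:
        raise ValueError("not enough operating hours per unit")
    chargeable = sum(1 for c in failure_causes if c not in EXCLUDED_CAUSES)
    if chargeable == 0:
        raise ValueError("no chargeable failures; MTBF is undefined")
    return total_hours / chargeable

# 200 units at 1200 hours each, 100 printer failures, 20 operator errors
hours = [1200] * 200
causes = ["printer"] * 100 + ["operator_error"] * 20
print(field_mtbf(hours, causes))  # 2400.0
```

In this made-up example the 20 operator errors are not chargeable, so 240,000 hours divided by 100 failures gives 2400 hours, which would meet the 2000 hour requirement.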


In this example, a minimum operating time per unit was specified to ensure that enough hours have been accumulated to obtain a statistically meaningful MTBF calculation. A minimum population was specified because a specified MTBF is an average value for a large population. Measurements for a small number of units can result in a wide range of values that will not reflect the true reliability level of the product. Some individuals have from time to time requested that manufacturers specify product MTBF requirements to a confidence level, such as 90%. This practice should be avoided because there is no direct relationship between an MTBF specification and confidence levels. The addition of a confidence level statement does not add anything to the specification, and only leads to confusion and a false sense of security among customers. The minimum population requirement is all that is usually required to eliminate small sample size measurement fluctuations. Sample size in this case equals the number of failures and the associated number of operating hours. With a large enough sample, all confidence level values will converge around the true population value. Therefore, it is not necessary to indicate a confidence level.
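
The effect of sample size on a measured MTBF can be illustrated with a small simulation. This is a sketch under simplifying assumptions of my own: times between failures are drawn from an exponential distribution with a true MTBF of 2000 hours, and each "measurement" is the average of a fixed number of such times. The function name and the number of trials are arbitrary.

```python
import random
import statistics

def measured_mtbf_spread(true_mtbf, n_failures, trials=1000, seed=1):
    """Simulate repeated MTBF measurements, each based on `n_failures`
    failures, and return the mean and standard deviation of the results."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        # Total operating hours accumulated over n_failures failures
        total_hours = sum(rng.expovariate(1.0 / true_mtbf)
                          for _ in range(n_failures))
        estimates.append(total_hours / n_failures)
    return statistics.mean(estimates), statistics.pstdev(estimates)

mean_small, spread_small = measured_mtbf_spread(2000, n_failures=5)
mean_large, spread_large = measured_mtbf_spread(2000, n_failures=500)
print(mean_small, spread_small)
print(mean_large, spread_large)
```

With only a handful of failures the measured values scatter widely around 2000 hours, while with several hundred failures they cluster tightly around it, which is the fluctuation the minimum population requirement is meant to remove.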





Another misconception that sometimes appears in reliability specifications is the stating of MTBF as a guaranteed life expectancy, or confusing a ‘guaranteed’ MTBF with an extended warranty. Based on a random distribution of failures (that is, a constant failure rate), about 63% of the units in a population, or roughly 2/3, are expected to fail before the MTBF value is reached, and about 37%, or roughly 1/3, after, for an overall average value equal to the MTBF. Therefore, a specified MTBF does not guarantee a cost-free repair period equal to the MTBF value. A warranty period is a period of time in which the vendor agrees to pay the cost of repairs. The product MTBF certainly helps determine the number of such repairs and should be taken into consideration when setting the warranty period. However, the two should not be confused.
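
The roughly two-thirds figure follows from assuming a constant failure rate, where the fraction of units that have failed by time t is 1 - exp(-t / MTBF). A minimal sketch, with function names of my own choosing and the 2000 hour MTBF and 168 hours per month taken from the printer example:

```python
import math

def fraction_failed(t_hours, mtbf_hours):
    """Fraction of units with at least one failure by time t,
    assuming a constant failure rate (exponential distribution)."""
    return 1.0 - math.exp(-t_hours / mtbf_hours)

def expected_repairs_per_unit(t_hours, mtbf_hours):
    """Average number of failures per repairable unit over t hours,
    under the same constant failure rate assumption."""
    return t_hours / mtbf_hours

mtbf = 2000.0
print(fraction_failed(mtbf, mtbf))  # about 0.632

# A hypothetical 12-month warranty at 168 power-on hours per month
warranty_hours = 12 * 168
print(expected_repairs_per_unit(warranty_hours, mtbf))  # 1.008
```

So a one-year warranty on a unit that just meets the 2000 hour MTBF would, on average, involve about one repair per unit, which is why the MTBF informs the warranty cost but is not itself a failure-free period.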



Bob Bowman

Senior Reliability Consultant

Ops A La Carte