Modeling Methods

Physical acceleration means that operating a unit at some higher stress level(s) (e.g., higher temperature, voltage, humidity, or duty cycle) should produce the same failures that would occur at typical-use stresses, except that they are expected to occur much sooner.

Failures may be due to mechanical fatigue, corrosion, chemical reaction, diffusion, migration, etc. These are the same causes of failure seen under normal stress conditions; the only difference is the time scale (the time to failure).

When there is true acceleration, changing stress is equivalent to transforming the time scale used to record when failures occur. The transformations commonly used are linear, which means that time-to-fail at high stress just has to be multiplied by a constant (the Acceleration Factor or AF) to obtain the equivalent time-to-fail at use stress.

For many engineers, this is where the biggest challenges arise: What is the preferred model? How do I fit the model to the conditions, materials, and physical attributes of the unit under test (UUT)?

Too many users default to the Arrhenius equation; it is not always the appropriate model, and care must be taken to select the proper one.
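For temperature-driven mechanisms the Arrhenius model is the usual starting point, with acceleration factor AF = exp[(Ea/k)(1/T_use − 1/T_stress)]. Below is a minimal sketch; the activation energy and temperatures are illustrative assumptions only, and other stresses (voltage, humidity, vibration) generally call for different models.

```python
import math

BOLTZMANN_EV = 8.617e-5  # Boltzmann constant in eV/K

def arrhenius_af(ea_ev, t_use_c, t_stress_c):
    """Arrhenius acceleration factor between a use and a stress temperature.

    ea_ev      -- activation energy in eV (mechanism dependent)
    t_use_c    -- use temperature in degrees C
    t_stress_c -- accelerated-test temperature in degrees C
    """
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV) * (1.0 / t_use_k - 1.0 / t_stress_k))

# Illustrative values only: Ea = 0.7 eV, 55 C use, 125 C test
af = arrhenius_af(0.7, 55.0, 125.0)
print(f"AF = {af:.1f}")  # each hour at 125 C counts as roughly AF hours at 55 C
```

Under those assumed values, each hour at the elevated temperature counts as roughly AF hours at the use temperature, which is the linear time-scale transformation described above.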

What are the experiences with other models?

What are the success stories?

The adoption of the functional safety standards IEC 61508 and ISO 26262 by the European Union has breathed new life into the slowly fading activity of reliability prediction. Both reliability prediction and reliability demonstration are now key parts of many product development programs; however, despite the similarity of their names, the two activities have little in common, and neither do the results they generate.

While reliability prediction is an analytical activity, often based on a mathematical combination of the reliabilities of the parts or components comprising the system, reliability demonstration is based on product testing and is statistically driven by the test sample size. Therefore the results obtained can differ drastically. For example, a predicted system failure rate of 30 FIT (30 failures per billion (10⁹) hours) corresponds to a 10-year reliability of 99.87% (assuming 12 hours of operation per day). To demonstrate this level of reliability with 50% confidence (50% confidence is considered low in most industries), one would need to successfully test 533 parts (based on the binomial distribution) to the equivalent of 10 years of field life. Needless to say, a test sample of this size is prohibitive in most industries. In automotive electronics, for example, a test sample size of 23 is quite common, which roughly corresponds to 97% reliability demonstrated with 50% confidence.
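A minimal sketch of the arithmetic behind those figures, assuming a constant failure rate (exponential model) for the prediction and a zero-failure, success-run binomial test plan for the demonstration; those modeling choices are standard but are my assumption here, not stated in the text:

```python
import math

# Prediction side: constant failure rate of 30 FIT (failures per 1e9 hours).
fit = 30.0
hours = 10 * 365 * 12                      # 10 years at 12 operating hours per day
reliability = math.exp(-fit * 1e-9 * hours)
print(f"Predicted 10-year reliability: {reliability:.4%}")        # ~99.87%

# Demonstration side: zero-failure success-run plan, n = ln(1 - C) / ln(R).
confidence = 0.50
n = math.log(1 - confidence) / math.log(0.9987)   # using the rounded 99.87% figure
print(f"Parts needed at 50% confidence: {math.ceil(n)}")          # ~533

# Conversely, 23 parts tested failure-free at 50% confidence demonstrate only:
r_demo = (1 - confidence) ** (1 / 23)
print(f"Reliability demonstrated by 23 parts: {r_demo:.1%}")      # ~97%
```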

The natural question is: how do you reconcile the numbers obtained from reliability prediction with the numbers you can support as part of reliability demonstration?

The answer is: I don’t believe that you can.

You can make the argument that reliability demonstration produces a lower-bound estimate. Additionally, the test often addresses higher-percentile-severity users, so the demonstrated reliability for the whole product population will likely be higher. However, in most cases the gap will remain too wide to close. This is something that reliability engineers, design teams, and, most importantly, customers need to be aware of and be able to deal with as part of the product development reality.

What does the audience think? We’d love to hear your opinions on this.

Andre Kleyner

Robust Design & Reliability

I recently delivered a webinar describing the differences and similarities between robust design (RD) activities and reliability engineering (RE) activities in hardware product development. A survey of the several hundred attendees indicated a diversity of opinions. About half the participants indicated they did not differentiate at all between the two methodologies, approximately 20% indicated they did differentiate between the two, and about 30% indicated that they did not know.

I was quite surprised at the result, especially since the participants included working quality engineers, reliability engineers, engineering directors, systems engineers, and others. Somewhere along the way, the differences and similarities between the two seem to have become muddled. Below I have collected just twelve of the many ways in which the activities differ:

 

RD1: Focus on design transfer functions and ideal function development.
RE1: Focus on design dysfunction, failure modes, failure times, and mechanisms of failure.

RD2: Engineering focus, empirical models, generic models, statistics.
RE2: Mechanistic understanding, physical models, science-oriented approach.

RD3: Optimization of input-output functions, with verification testing required.
RE3: Characterization of natural phenomena, with root cause analysis and countermeasure decisions.

RD4: Orthogonal array testing, design of experiments planning.
RE4: Life tests, accelerated life tests, highly accelerated tests, accelerated degradation tests, survival methods.

RD5: Multitude of control, noise, and signal factor combinations for reducing sensitivity to noise and amplifying sensitivity to signal.
RE5: Single-factor testing, some multifactor testing, fixed designs with noise factors, acceleration factors.

RD6: Actively change design parameters to improve insensitivity to noise factors and sensitivity to signal factors.
RE6: Design-Build-Test-Fix cycles for reliability growth.

RD7: Failure inspection only, with verification testing of improved functions.
RE7: Design out failure mechanisms, reduce variation in product strength, and reduce the effect of usage/environment.

RD8: Synergy with axiomatic design methodology, including ideal design and simpler design.
RE8: Simplify design complexity for reliability improvement; reuse reliable hardware.

RD9: Hierarchy of quantitative design limits, including functional limits, spec limits, control limits, and adjustment limits.
RE9: Identify and increase design margins; HALT and HASS testing to flush out design weaknesses; temperature and vibration stressors predominate.

RD10: Measurement system and response selection paramount.
RE10: Time-to-failure quantitative measurements supported by analytic methods.

RD11: Ideal function development for energy-related measures.
RE11: Fitting distributions to stochastic failure-time data; time compression by stress application.

RD12: Compound noise factors provide the largest stress; reduce variability to noise factors through interactions between noise and control factors and between signal and noise factors.
RE12: HALT and HASS highly accelerated testing to reveal design vulnerabilities and expand margins; root cause exploration and mitigation.

There are many other differences of course, but this list should start the conversation. I would invite bloggers to submit their own opinions and lists of differences (and similarities).

Louis LaVallee

Sr. Reliability Consultant

Ops a la Carte

 

 

When performing various reliability tasks, non-repairable systems or products are treated differently from repairable systems or products.  Some of the tools that are used for one type are not applicable to the other.   Obviously, at some level, repairable systems are composed of non-repairable parts.   Examples of non-repairable systems would be “one-shot” devices like light bulbs or more complex devices like pacemakers.  Examples of repairable systems are computers, automobiles, and airplanes.

 

What is unique about repairable systems?  Availability becomes a key measure of importance.  In simple terms, availability is the percentage of time that the product or system is able to perform its required functions.  When the required functions cannot be performed because a failure has occurred, the system must be repaired to restore the functionality.  This is where another measure, maintainability, impacts the system availability.  The faster the system can be repaired, the greater the availability to the customer.  For systems that require high reliability or availability, redundancy can improve the design.  However, repairable systems will benefit significantly more than non-repairable systems when using redundancy.
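As a simple illustration (a sketch with made-up numbers, not figures from the text), steady-state availability is commonly estimated as MTBF / (MTBF + MTTR), which makes the maintainability effect explicit: shortening repair time raises availability just as lengthening the time between failures does.

```python
def availability(mtbf_hours, mttr_hours):
    """Steady-state availability of a repairable unit: uptime / (uptime + downtime)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

# Illustrative values only
print(f"{availability(1000.0, 10.0):.3%}")  # ~99.010%: 1000 h between failures, 10 h to repair
print(f"{availability(1000.0, 2.0):.3%}")   # ~99.800%: same MTBF, faster repair
```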

 

Common metrics used in measuring system types are shown in the table below.

| METRIC | NON-REPAIRABLE | REPAIRABLE |
| --- | --- | --- |
| Time to failure | MTTF, time to first failure, hazard rate | MTBF, time to first failure, ROCOF/failure rate |
| Probability | Reliability | Availability (reliability) |
| Maintainability | N/A | Maintainability, downtime |
| Warranty | Product replacement within warranty period | Part/product replacement within warranty period |

The table below compares some additional areas of non-repairable systems and repairable systems.

| NON-REPAIRABLE | REPAIRABLE |
| --- | --- |
| Discarded (recycled?) upon failure | Restored to operating condition without replacing the entire system |
| Lifetime is a random variable described by a single time to failure | Lifetime is the age of the system or total hours of operation |
| Group of systems: lifetimes assumed independent and identically distributed (from the same population) | Random variables of interest are the times between failures and the number of failures at a particular age |
| Failure rate is the hazard rate of a lifetime distribution – a property of the time to failure | Failure rate is the rate of occurrence of failures (ROCOF) – a property of a sequence of failure times |

 

Reliability modeling is usually more complex for repairable systems.  Often, methods like Markov models (chains) are required to adequately model repairable systems, as opposed to the simple series block diagram methods used for non-repairable systems.
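As a minimal sketch of the kind of model referred to here (rates are illustrative assumptions, not values from the text), a single repairable unit can be represented as a two-state Markov chain with failure rate λ and repair rate μ, whose steady-state availability is μ / (λ + μ):

```python
import numpy as np

# Two-state continuous-time Markov chain: state 0 = up, state 1 = down.
lam = 1.0 / 1000.0   # failure rate (per hour), illustrative
mu = 1.0 / 10.0      # repair rate (per hour), illustrative

# Generator matrix Q for the up/down process
Q = np.array([[-lam, lam],
              [mu, -mu]])

# Steady-state distribution pi solves pi @ Q = 0 with pi summing to 1.
A = np.vstack([Q.T, np.ones(2)])
b = np.array([0.0, 0.0, 1.0])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

print(f"Steady-state availability: {pi[0]:.4%}")         # numerical solution
print(f"Closed form mu/(lam+mu):   {mu / (lam + mu):.4%}")  # ~99.01%
```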

In the area of monitoring or analysis, the following table compares methods for both types of systems.

| METHOD | NON-REPAIRABLE | REPAIRABLE |
| --- | --- | --- |
| Weibull | Useful method (single failure modes only) | Not used at system level |
| Reliability growth (Duane, AMSAA) | Usually not used | Used during development testing |
| Mean cumulative function (MCF) | Usually not used | Useful method (non-parametric) |
| Event series (point processes) | HPP (for random, constant-average-rate events) | NHPP (parametric method) – complex |
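As one hedged example from the repairable-system column (a small sketch with made-up repair ages, not data from the text), a non-parametric mean cumulative function can be estimated by averaging the cumulative repair count across a fleet at each event age, assuming here that every unit is observed over the same interval:

```python
# Repair ages (hours) for three units of a small fleet, all observed to 2000 h.
# Values are illustrative only.
fleet = {
    "unit_A": [300, 900, 1500],
    "unit_B": [700, 1800],
    "unit_C": [1200],
}

n_units = len(fleet)
events = sorted(t for times in fleet.values() for t in times)

# With every unit observed over the full interval, the MCF at age t is simply
# (total repairs at or before t) / (number of units).
mcf = [(t, (i + 1) / n_units) for i, t in enumerate(events)]

for age, value in mcf:
    print(f"age {age:>5} h  MCF = {value:.2f}")
```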

 

It is important to understand the type of system being designed and to use the appropriate reliability methods and tools for that system.  This may require some research, but using the correct methods is essential to avoid misleading results.

What has been your experience in doing analysis of repairable systems compared to non-repairable systems?