Assessing/Planning for Reliability

Why Reliability?

Reliability has several definitions, most of which express the likelihood that a product will function under stated conditions for a given timeframe. Designers and manufacturers often put significant effort into preventing failure. But this raises the question: “why?”

The answer is that reliability is a potential benefit to the user and, through sales and profit, to the supplier. Benefits come in the form of safety, confidence that the product will deliver its intended function, and lower operating costs. However, there are also potential downsides: a higher up-front cost, changed aesthetics, lower performance, etc. And sales may be driven by these more visible attributes. Hence, the increased investment a supplier makes to assure high reliability may have a negative return unless it brings increased sales and/or profit margin. Developing improved reliability could also delay product launch and thus lose key marketing opportunities.

Hence, we must recognize that reliability is a competing attribute.

So, how to compete?

Compete on cost by presenting the operating-cost savings in dollars. Compete on performance by factoring in the percentage of successful missions. (There’s no point in having a high-performance aircraft that never completes its mission.) Relate the cost of warranty returns to the direct loss of profits. Factor reliability and product assurance (including more generous warranties) into marketing activities, such as focus groups, to highlight potential gains in sales.
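Relating warranty returns to lost profit can start as simple expected-cost arithmetic. A minimal sketch, assuming a flat return rate over the warranty period and a fixed cost per return; the function names and all figures are hypothetical:

```python
def warranty_cost_per_unit(return_rate, cost_per_return):
    """Expected warranty cost per unit sold: fraction of units returned x cost of each return."""
    return return_rate * cost_per_return

def profit_after_warranty(units, unit_margin, return_rate, cost_per_return):
    """Gross margin over the sales volume minus the expected warranty cost for the same volume."""
    return units * (unit_margin - warranty_cost_per_unit(return_rate, cost_per_return))

# Hypothetical figures, for illustration only.
units = 10_000
margin = 40.0            # $ gross margin per unit
cost_per_return = 150.0  # $ per return (shipping, repair or replacement, handling)

for rate in (0.02, 0.05):
    profit = profit_after_warranty(units, margin, rate, cost_per_return)
    print(f"return rate {rate:.0%}: profit ${profit:,.0f}")
```

With these assumed numbers, each percentage point of returns costs $1.50 of a $40 unit margin, which is the kind of figure that makes a quality or reliability investment easy to compare against its price.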

And remember that good reliability is often achieved simply by ensuring good quality. There are many instances where poor supplier and manufacturing quality generate far more warranty returns than “design” reliability issues do. Good supplier and manufacturing quality can often be achieved at low cost, and it has no negative impact on product performance.

So, to be effective, a reliability engineer should use the tools of the marketing, field service, supply chain management, quality and accounting departments (as well as design), and be a positive agent in developing corporate strategy. Without these links, we would miss several opportunities to optimize products for customers and supplier alike.

Question: Where does the reliability engineer go to learn those tools, beyond forging links with these several departments?

Starting with low-cost, low-complexity design alternatives in large engineering programs and then systematically raising cost and complexity where warranted is a far better approach than starting with high-cost, complex designs and later doing cost-down and simplification activities. I have worked on many product teams where the latter was the norm. Subsystem teams would select costlier, more complex technologies, more costly high-precision components and assemblies, and costly manufacturing and control systems, all in an effort to get a jump start on functional performance and time-to-market requirements. Early demonstration of performance, even with costs over allocation, was considered perfectly fine, as long as everyone tacitly understood that there would be cost-down and simplification activities at the end of development. Many engineering teams understood that their cost allocations would be waived initially to satisfy time-to-market and performance requirements.

The higher-initial-cost design approach was quite appealing, as it usually avoided unwanted attention and pressure from engineering management. Company buyers, in turn, would prepare to put undue pressure on suppliers to reduce the cost of their deliverables. Manufacturing engineering would invest time and resources in higher-precision processes, many with secondary operations, again on the assumption that both design and manufacturing cost-down would come later. Sales and marketing people, in turn, would prepare for a higher-priced offering than originally planned, and extra pressure was put on sales teams to pass the higher prices along to loyal customers.

The high-cost design approach often proved difficult in later development stages, because the cost-down negatively affected performance. This is something one would like to deal with early in the design cycle, not at the end. The rationale for original design choices was sometimes lost or forgotten. The cost engineers tasked with the cost-down and simplification activities were usually not the original design engineers. This in turn usually created new difficulties for downstream service engineers and manufacturing quality engineers.

When starting with low-cost design alternatives, it becomes imperative to quickly identify a set of robust technologies and robust manufacturing processes that simultaneously satisfy quality, cost, and delivery requirements. In selecting low-cost alternatives, engineers are tasked with exploring the available design space (using flexible fixtures) to identify first a working prototype condition and then directions for improvement that add no cost. Experimental design and parameter design methods have been widely used to find an optimal set of nominal values. The rule of thumb was that if you could get the functions to work just once with the low-cost approach, then you could begin the optimization process without adding cost. There would be many opportunities to capitalize on better combinations of control factors and signal factors. If the optimization efforts fell short, cost could then be added incrementally until the trajectory to design maturity improved. Nevertheless, the initial low-cost approach would still end at a better place than starting with high cost and trying to drive cost and complexity down late in the cycle.
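The post does not say which parameter design method was used. As one common example, a Taguchi-style analysis scores each run of an orthogonal array with a nominal-the-best signal-to-noise (S/N) ratio and then picks, for each control factor, the level with the highest average S/N. The sketch below assumes a hypothetical L4 array with three two-level factors named A, B and C, each run measured under a few noise conditions; it illustrates the technique and is not the author's procedure.

```python
import math
import statistics

# Hypothetical L4(2^3) orthogonal array: 4 runs, 3 two-level control factors.
L4 = [
    {"A": 1, "B": 1, "C": 1},
    {"A": 1, "B": 2, "C": 2},
    {"A": 2, "B": 1, "C": 2},
    {"A": 2, "B": 2, "C": 1},
]

def sn_nominal_the_best(ys):
    """Nominal-the-best S/N ratio in dB: 10 * log10(mean^2 / sample variance)."""
    mean = statistics.mean(ys)
    var = statistics.variance(ys)  # needs at least two observations per run
    if var == 0:
        raise ValueError("zero variance: S/N ratio is undefined")
    return 10 * math.log10(mean ** 2 / var)

def best_levels(array, responses):
    """For each factor, return the level whose runs have the highest mean S/N ratio.

    responses[i] holds the measurements of run i taken under the noise conditions.
    """
    sn = [sn_nominal_the_best(ys) for ys in responses]
    best = {}
    for factor in array[0]:
        by_level = {}
        for run, ratio in zip(array, sn):
            by_level.setdefault(run[factor], []).append(ratio)
        best[factor] = max(by_level, key=lambda lvl: statistics.mean(by_level[lvl]))
    return best
```

Because moving a nominal value usually costs nothing, this kind of analysis fits the low-cost-first approach: the S/N-maximizing settings reduce variation, and a separate adjustment factor is then used to bring the mean onto target.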

How often have you cut a board, a piece of material, or even wrapping paper, and guess what? You come up short. Argh! Well, there may be a few reasons the material was not measured effectively. It can be any of the following: 1) your measurement system or rule is not accurate enough for the material you are measuring; 2) you need to measure in millimeters, not inches; 3) your measurements are not repeatable, and there is too much variance in your system.

If the third cause, poor repeatability, applies, we would suggest a Gage R&R study, and an Ops A La Carte consultant can help you here.

As an example of the second cause, in a recent supplier quality issue, a manufacturing technician measured a high-tech German cable for medical applications and found that a couple of the pigtail wire lengths were out of spec!

Well, this would have been an expensive error, as the whole cable assembly would have had to be shipped back to the supplier in Germany.

Instead, I measured the cable assembly’s pigtails a second time, in mm, and guess what: they were all within the ±1 mm spec.

So measure at least twice. It will save you money and time, and your company will be more productive with fewer rejects.
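The first two causes can combine: a rule graduated in 1/16 inch can only be read to the nearest graduation, an error of up to 1/32 in (about 0.79 mm), which eats most of a ±1 mm tolerance. A minimal sketch of that effect; the nominal and true lengths below are invented for illustration and are not the values from the cable case:

```python
MM_PER_INCH = 25.4

def read_on_inch_rule(actual_mm, divisions_per_inch=16):
    """Simulate reading a length on an inch rule graduated in 1/divisions_per_inch inch:
    round to the nearest graduation, then convert the reading back to millimeters."""
    graduations = round(actual_mm / MM_PER_INCH * divisions_per_inch)
    return graduations / divisions_per_inch * MM_PER_INCH

def in_spec(value_mm, nominal_mm, tol_mm):
    """True if the value lies within nominal +/- tolerance."""
    return abs(value_mm - nominal_mm) <= tol_mm

# Hypothetical pigtail spec and true length (illustrative only).
nominal_mm, tol_mm = 152.0, 1.0
actual_mm = 151.2
reading_mm = read_on_inch_rule(actual_mm)

print(f"true length {actual_mm} mm in spec: {in_spec(actual_mm, nominal_mm, tol_mm)}")
print(f"1/16-in reading {reading_mm:.2f} mm in spec: {in_spec(reading_mm, nominal_mm, tol_mm)}")
```

Here a part that is within spec is read as out of spec purely because of the rule's resolution, which is why re-measuring in the unit and resolution the tolerance is written in matters.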

Happy Measuring.

Greg Swartz, CQE

We can’t underestimate the power of a well laid-out qualification process. We’ve all heard of DVT, the Design Verification Test step that puts a product through several test requirements; but a comprehensive qualification plan requires more than DVT.

My professional experience has shown that a three-step qualification process (EVT, DVT and PMT) allows a much more thorough qualification. These acronyms may be expanded differently in different organizations, but they have about the same meaning:

  • EVT is for the early stages of product development, when the product is still somewhat immature; for example, for an electronic assembly, one can choose to conduct EVT on just the PCBAs before they are mated to the main system.
  • DVT is for a more mature stage of product development, when the product is complete and functional.
  • PMT is the final qualification step, where customer systems and applications are involved and the product is ready to ship.

This article and the accompanying white paper link were written by Mike Keer and the Product Realization Group (PRG) team.

New Product Introduction (NPI) consists of people, processes and technology, which together provide a formal methodology for a product’s transition from engineering design to volume manufacturing. NPI is a subset of the product lifecycle process, which covers the entire lifecycle of a product from concept to end of life; its primary focus is on a product’s beta, pilot, and general availability (GA) stages.

Here are seven best practices for deploying a strong NPI strategy:

  1. Use Concurrent Engineering
  2. Mitigate Risks
  3. Employ Design for Excellence (DFX)
  4. Leverage Rapid Prototyping and Accelerated Life Testing
  5. Adhere to Agency and Environmental Compliance Requirements
  6. Learn from Prototype and Pilot Builds
  7. Deploy Scalable Business Systems
To download the rest of this article, please go to: Seven Best Practices by the PRG – 2012 05 06.pdf

Successful new product development (NPD) involves the art of balancing schedules, resources, and costs to enable products that launch on time, with the desired performance and at the right cost. Inevitably, trade-offs must be made and risks taken along the way. Companies that manage these trade-offs and risks consistently outperform their competitors.

On May 17, we convened a panel of industry experts to look at how to identify the risks in development and to share their knowledge of the latest emerging risks. The panelists offer a variety of perspectives, ranging across mechanical and electrical design, product reliability and parts fabrication. The panel explores practical issues of how upstream design decisions affect downstream performance, quality and costs.

You can download a copy of the presentation at: Managing Design Risk

Or you can download a copy from the PRG website at: Managing Design Risk-PRG

In the presentation, you will learn about:

– Five tips for risk mitigation
– Managing constraints and trade-offs
– Concurrent engineering and communications
– Emerging risks
– Reliability considerations

If you have any tips or recommendations, please respond to this blog with your inputs. We’d love to hear from you.

The adoption of the functional safety standards IEC 61508 and ISO 26262 in the European Union brought new life to the slowly fading activity of reliability prediction. Both reliability prediction and reliability demonstration are now key parts of many product development programs; however, despite the similarity of their names, the two have little in common, and neither do the results they generate.

While reliability prediction is an analytical activity, often based on a mathematical combination of the reliabilities of the parts or components comprising the system, reliability demonstration is based on product testing and is statistically driven by the test sample size. Therefore, the results obtained can differ drastically. For example, a predicted system failure rate of 30 FIT (30 failures per billion, 10^9, hours) corresponds to a 10-year reliability of 99.87% (assuming 12 hours per day of operation). To demonstrate this reliability with 50% confidence (50% confidence is considered low in most industries), one would need to successfully test 533 parts (based on the binomial distribution) to the equivalent of 10 years of field life. Needless to say, a sample of this size is prohibitive in most industries. In automotive electronics, for example, a test sample size of 23 is quite common, which roughly corresponds to demonstrating 97% reliability with 50% confidence.
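The arithmetic behind these figures can be checked with a short sketch, assuming a constant failure rate for the prediction and a zero-failure (success-run) binomial test for the demonstration. The 533-part figure follows from the rounded 99.87%; the unrounded predicted reliability gives about 528 parts, which does not change the conclusion. Function names are illustrative.

```python
import math

def reliability_from_fit(fit, hours):
    """Constant failure rate model: R(t) = exp(-lambda * t), with lambda = FIT / 1e9 per hour."""
    return math.exp(-fit * 1e-9 * hours)

def success_run_sample_size(reliability, confidence):
    """Zero-failure binomial (success-run) test: smallest n with R**n <= 1 - C."""
    return math.ceil(math.log(1 - confidence) / math.log(reliability))

def demonstrated_reliability(n, confidence):
    """Reliability demonstrated when n units all pass with zero failures: R = (1 - C)**(1/n)."""
    return (1 - confidence) ** (1 / n)

hours = 10 * 365 * 12  # 10 years at 12 hours per day = 43,800 h
r_pred = reliability_from_fit(30, hours)

print(f"predicted 10-year reliability: {r_pred:.4%}")
print("units needed at 50% confidence (R = 99.87%):", success_run_sample_size(0.9987, 0.5))
print("units needed at 50% confidence (unrounded R):", success_run_sample_size(r_pred, 0.5))
print(f"23 units at 50% confidence demonstrate R = {demonstrated_reliability(23, 0.5):.1%}")
```

Raising the confidence to a more typical 90% in the same formula multiplies the required sample size by ln(0.1)/ln(0.5), roughly 3.3, which widens the gap further.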

The natural question is: how do you reconcile the numbers obtained from reliability prediction with the numbers you can support as part of reliability demonstration?

The answer is: I don’t believe that you can.

You can make an argument that reliability demonstration produces lower-bound estimates. Additionally, the test often addresses high-percentile severity users, so the demonstrated reliability for the whole product population will likely be higher. However, in most cases the gap will remain too wide to close. This is something that reliability engineers, design teams, and most importantly customers need to be aware of and be able to deal with as part of the product development reality.

What does the audience think? We’d love to hear your opinions on this.

Andre Kleyner

In order to gain cooperation from the supplier, the key point is to look at your improvement objective through the eyes of the supplier: “Why should I do this?” “What’s in it for me?” The intent is to bring value to the supplier as well as the customer, so that both win/gain!

GAIN WHAT – reduction in cost!

How do we do this?  There are tools/methods such as:

  • SMED – Single Minute Exchange of Die – How does it help?
  • VMI – Vendor Managed Inventory

Who is using these in concert with their suppliers? We will explore each in future blogs. Please provide your comments, supportive or otherwise!

Reliability Specification

The importance of accurate and clearly specified product requirements cannot be overstated. The reliability specification is usually part of a larger document, called the product specification or product requirements document. This document contains information such as the product description, performance specifications, environmental requirements, electrical interface specifications, physical specifications and reliability specifications.

Because the definition of reliability depends on the various product specifications, including the reliability requirements, it is important that they are well understood and properly stated in the product specification. Nothing should be left to interpretation or to be assumed by the reader, unless it is a legitimate design choice. Any operating condition or performance requirement that will affect the reliability value should be nailed down somewhere in the body of this document. To help in the understanding of how the product’s reliability will or should be measured, definitions of failure and MTBF (Mean-Time-Between-Failure) should appear in the reliability specification section. An example of a properly developed reliability specification for a printer follows:

The MTBF of the Model X-100 printer shall not be less than 2000 hours, based on the following operating conditions:

Power On:  8 hours per day; 168 hours per month

Printing:   25% Duty Cycle;  42 hours per month

Characters Printed:  40 characters per second average; 3.3 million per month

Columns and Lines:  40% average density

MTBF for purposes of this specification shall be defined as:

MTBF = Total Operating Hours for the Period / Total Number of Chargeable Failures

Where the total operating hours is the total power on time accumulated for the period during the specified useful life of a population (minimum of 200) of field installed printers, when operated within the specifications stated in this document. To establish a meaningful MTBF, operating hours per unit must be greater than 1000 hours.

A failure is defined as the inability of a unit to perform its specified function when operated within the defined limits of the product specification, requiring unscheduled maintenance to restore performance. Failures excluded from MTBF calculations include stoppage or substandard performance caused by operator error, environments beyond specified limits, power source failure, failure of associated supplies and equipment supplied by other vendors, or any other failures not caused by the printer.
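The MTBF definition above, with its minimum-population and minimum-hours conditions, can be sketched as a small calculation. The function name and constants are hypothetical, classifying failures as chargeable is assumed to have been done already, and "operating hours per unit" is read here as the population average (the spec could also be read as a per-unit minimum):

```python
import math

MIN_POPULATION = 200       # minimum population of field-installed printers
MIN_HOURS_PER_UNIT = 1000  # operating hours per unit must exceed this

def spec_mtbf(unit_hours, chargeable_failures):
    """MTBF = total operating hours for the period / total number of chargeable failures.

    unit_hours: power-on hours accumulated by each field-installed unit in the period.
    chargeable_failures: failures remaining after the spec's exclusions (operator error,
    out-of-limit environments, power source failure, other vendors' equipment, etc.).
    """
    if len(unit_hours) < MIN_POPULATION:
        raise ValueError(f"population of {len(unit_hours)} is below {MIN_POPULATION}")
    total_hours = sum(unit_hours)
    # Assumption: "operating hours per unit" is read as the population average.
    if total_hours / len(unit_hours) <= MIN_HOURS_PER_UNIT:
        raise ValueError("average operating hours per unit must exceed 1000")
    if chargeable_failures == 0:
        return math.inf  # no chargeable failures yet, so the ratio is unbounded
    return total_hours / chargeable_failures

# Hypothetical field data: 250 printers with 1,200 power-on hours each, 120 chargeable failures.
mtbf = spec_mtbf([1200.0] * 250, 120)
print(f"MTBF = {mtbf:.0f} h; meets 2000 h requirement: {mtbf >= 2000}")
```

At the specified 168 power-on hours per month, accumulating more than 1,000 hours per unit takes roughly six months of field operation.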

In this example, a minimum operating time per unit was specified to ensure that enough hours are accumulated to obtain a statistically meaningful MTBF calculation. A minimum population was specified because a specified MTBF is an average value for a large population. Measurements for a small number of units can result in a wide range of values that will not reflect the true reliability level of the product. Some individuals have from time to time requested that manufacturers specify product MTBF requirements to a confidence level, such as 90%. This practice should be avoided because there is no direct relationship between an MTBF specification and confidence levels. The addition of a confidence level statement does not add anything to the specification, and only leads to confusion and a false sense of security on the part of customers. The minimum population requirement is usually all that is required to eliminate small-sample measurement fluctuations. Sample size in this case equals the number of failures and the associated number of operating hours. With a large enough sample, all confidence level values will converge around the true population value. Therefore, it is not necessary to indicate a confidence level.

Another misconception that sometimes appears in reliability specifications is stating MTBF as a guaranteed life expectancy, or confusing a ‘guaranteed’ MTBF with an extended warranty. Assuming randomly occurring failures (a constant failure rate, so that times to failure are exponentially distributed), about 63%, or roughly 2/3, of the units in a population are expected to fail before the MTBF is reached and about 1/3 after, for an overall average equal to the MTBF. Therefore, a specified MTBF does not guarantee a cost-free repair period equal to the MTBF value. A warranty period is a period of time in which the vendor agrees to pay the cost of repairs. The product MTBF certainly helps determine the number of such repairs and should be taken into consideration when setting the warranty period. However, the two should not be confused.
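The "roughly 2/3" figure comes from the exponential model: the fraction of units failed by time t is 1 - exp(-t/MTBF), which equals 1 - 1/e, about 63.2%, at t = MTBF. A minimal sketch under the same constant-failure-rate assumption, also estimating repairs per unit over a warranty period (assuming negligible repair time); the 12-month warranty is hypothetical, while the 2000-hour MTBF and 168 power-on hours per month come from the X-100 example:

```python
import math

def fraction_failed_by(t_hours, mtbf_hours):
    """Constant failure rate: cumulative fraction failed F(t) = 1 - exp(-t / MTBF)."""
    return 1 - math.exp(-t_hours / mtbf_hours)

def expected_repairs_per_unit(warranty_hours, mtbf_hours):
    """With a constant failure rate and repair-and-return, failures of a unit form a
    Poisson process, so the expected number of repairs in the period is t / MTBF."""
    return warranty_hours / mtbf_hours

print(f"fraction failed by the 2000 h MTBF: {fraction_failed_by(2000, 2000):.1%}")

warranty_hours = 12 * 168  # hypothetical 12-month warranty at 168 power-on hours per month
print(f"expected repairs per unit in warranty: {expected_repairs_per_unit(warranty_hours, 2000):.2f}")
```

So a printer meeting its 2000-hour MTBF would still be expected to need about one repair per unit during such a warranty, which is exactly why the MTBF and the warranty period should not be confused.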

Bob Bowman

Senior Reliability Consultant

Ops A La Carte

When a manufacturer of a material used for an entire line of products informs you that it can no longer supply this material, it can be difficult to find a replacement material that performs to the same specifications. Once you’ve identified a number of potential alternative suppliers/materials, what’s next? How the material is used in the products, and the percentage of parts in the product line made from it, will probably determine how stringent and extensive the qualification process should be.

My company recently faced exactly this situation. A plastics manufacturer was having trouble meeting demand and informed us that the material would no longer be available in 6 months. Our typical qualification process consists of several destructive and non-destructive tests meant to evaluate the integrity of the alternative material relative to the material being replaced. The tests common to our methods include moldability tests, drop tests, tensile testing, UV exposure for color-shift effects, and modest weatherability tests (more accurately, environmental tests) for plastic warping. The moldability tests included 30-piece sample measurements of critical and overall dimensions, mold-flow testing, and visual inspection of all features to ensure there were no short shots or other molding issues. As for the ‘weatherability tests’, the materials were essentially placed in temperature chambers and run through high-heat and thermal-cycling routines, but the only data collected were dimensional measurements before and after testing, to gauge dimensional drift.

The testing up to this point formed a relatively comprehensive platform and was a good overall gauge of the plastic’s performance as it is today. However, we realized it was missing one major element: the performance of the plastic in 5 years, or near the end of its warranty period. Only one test was designed to evaluate the “performance” of the plastic after 5 years, and that was the UV test. However, this test does not evaluate the mechanical properties or performance of the material, only its physical appearance. Moreover, no acceleration factor had been evaluated for this test; the plastics were simply put into a UV chamber for a subjectively chosen period of time. The only tests that evaluated material properties were the drop and tensile tests performed on newly molded plastic.

This is where reliability becomes useful. Reliability forces you to consider the effects of the elements on the specimen/material/device at hand not just at the present moment, but several hours, duty cycles, or years down the line. Since the mechanical integrity of the material near the end of the warranty period had never been considered in this type of testing, it was up to us to determine how to do that.

We set out to determine what factors would accelerate the aging of the material. After debating whether high-temperature soaking would act as a stressor and age the material to the equivalent of 5 years, we concluded it probably would not: it would most likely only bring the plastic back up to its heat deflection temperature and relieve residual stresses, perhaps even allowing it to perform better in certain tests.

The next potential ‘stressor’ was UV. It was easily agreed that UV exposure would affect the mechanical properties of the plastic, but we needed a system to determine its equivalent 5-year exposure. The method we decided to implement was a 3-point test with molded tensile bars. We decided to expose the material to UV for 50 days, 75 days, and 100 days. Once the aging process is finished, we intend to subject the bars to tensile testing along with ‘virgin’ tensile bars. As of this date, the units are still aging in the UV chamber. With 20 pieces of each, we should be able to determine the relationship of UV exposure to tensile strength with relatively high confidence. Results pending.
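When the aged bars come out of the chamber, the analysis could follow the sketch below; no results are implied. It assumes a linear trend of tensile strength versus UV days (the three exposure levels plus the virgin bars give some chance to check that), and, for translating chamber days into field years, a dose-equivalence (reciprocity) assumption: equal total UV radiant exposure, in the same units and wavelength band, causes equal degradation. Reciprocity does not always hold for polymers, so treat that conversion as a first approximation. All function names are hypothetical.

```python
import statistics

def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def retention_by_exposure(days, strengths):
    """Mean tensile strength at each exposure level as a fraction of the virgin (day 0) mean."""
    groups = {}
    for d, s in zip(days, strengths):
        groups.setdefault(d, []).append(s)
    virgin = statistics.mean(groups[0])
    return {d: statistics.mean(v) / virgin for d, v in sorted(groups.items())}

def chamber_days_for_field_years(years, field_dose_per_year, chamber_dose_per_day):
    """Dose-equivalence assumption: chamber days giving the same total UV dose as the field.
    Both doses must use the same units (e.g. radiant exposure in the same spectral band)."""
    return years * field_dose_per_year / chamber_dose_per_day

# Usage outline: days holds 0 for virgin bars and 50, 75 or 100 for aged bars (20 bars each);
# strengths holds the measured tensile strength of each bar, in the same order.
# intercept, slope = fit_line(days, strengths)
# retention = retention_by_exposure(days, strengths)
```

Combining the fitted slope with the chamber-day equivalent of 5 years of field exposure would give an estimate of the tensile strength retained near the end of the warranty period.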

See the Ops A La Carte Seminar on Fundamentals of Climatic Testing