Monthly Archives: August 2011

There can be a lot of confusion in the reliability field when we all start talking about everyday activities by their acronyms rather than their full names. We hope this list will help those who are new to reliability.

AFR – Actual Failure Rate
AFR – Annualized Failure Rate or Average Failure Rate
ALT – Accelerated Life Testing
ANOVA – Analysis of Variance
AQL – Acceptable Quality Level
ASME – American Society of Mechanical Engineers
ASP – Authorized Service Provider
ASQ – American Society for Quality
ASTR – Accelerated Stress Testing and Reliability
AVL – Approved Vendor List
CAD – Computer Aided Design
CAF – Conductive Anodic Filament
CALCE – Center for Advanced Life Cycle Engineering (part of the University of Maryland)
CAPA – Corrective and Preventive Action
CAR – Corrective Action Report or Request
CDF – Cumulative Distribution Function
CFD – Computational Fluid Dynamics
CLCA – Closed Loop Corrective Action
CM – Contract Manufacturer
COL – Cold Operating Limit (see also LOL)
COTS – Commercial-off-the-Shelf
CPK – Process Capability Index
CQE – Certified Quality Engineer
CRE – Certified Reliability Engineer
CRM – Customer Relationship Management
CTQ – Critical to Quality
DFM – Design for Manufacturability
DFR – Design for Reliability
DFW – Design for Warranty
DIP – Dual Inline Package
DOA – Dead on Arrival
DOD – Department of Defense
DOE – Design of Experiments
DTIC – Defense Technical Information Center
DVT – Design Verification Test
ECAP – Electronic Circuit Analysis Program
ECO – Engineering Change Order
EDA – Electronic Design Automation
EMC – Electro Magnetic Compatibility
EMI – Electromagnetic Interference
EMS – Electronic Manufacturing Service
EOL – End-of-Life
EOS – Electrical Overstress
ERT – Early Reliability Testing
ESD – Electrostatic Discharge
ESR – Equivalent Series Resistance
FAR – Failure Analysis Report or Request
FEA – Finite Element Analysis
FET – Field Effect Transistor
FIT – Failure in Time
FLT – Fundamental Limit of Technology
FMEA – Failure Mode and Effects Analysis
FMECA – Failure Modes Effects and Criticality Analysis
FRACAS – Failure Reporting, Analysis and Corrective Action System
FRB – Failure Review Board
FTA – Fault Tree Analysis
FTIR – Fourier Transform Infrared
FYM – First Year Multiplier
GRMS – Root Mean Square Acceleration (in g)
HALT – Highly Accelerated Life Test
HASA – Highly Accelerated Stress Audit
HASS – Highly Accelerated Stress Screen
HDD – Hard Disk Drive
HOL – High Operating Limit (also see UOL)
IC – Integrated Circuit
IEEE – Institute of Electrical & Electronics Engineers
IP – Intellectual Property
IT – Information Technology
KHZ – Kilohertz
KSLOC – Kilo Source Lines of Code
LCC – Life Cycle Cost
LDL – Lower Destruct Limit
LMM – Lumped Mass Model
LOL – Lower Operating Limit (see also COL)
LTPD – Lot Tolerance Percent Defective
MOS – Metal Oxide Semiconductor
MRB – Material Review Board
MSD – Mean Square Deviation
MTBF – Mean Time Between Failures
MTTR – Mean Time to Repair
NPF – No Problem Found
NRE – Non-Recurring Engineering
NSF – National Science Foundation
ODM – Original Design Manufacturer
OEM – Original Equipment Manufacturer
OOBA – Out of Box Audit
ORT – On-Going Reliability Test
PATCA – Professional and Technical Consultants Association
PCB – Printed Circuit Board
PHM – Prognostic and Health Management
PLC – Product Life Cycle
PLM – Product Lifecycle Management
PM – Preventive Maintenance
POF – Physics of Failure
POS – Proof of Screen
PPM – Parts Per Million
PRG – Product Realization Group
PRN – Product Realization Network
PRST – Probability Ratio Sequential Testing
PTH – Plated Through Hole
QA – Quality Assurance
QC – Quality Control
QPL – Qualified Products List
RCA – Root Cause Analysis
RDT – Reliability Demonstration Test
RIAC – Reliability Information Analysis Center
RoHS – Restriction of Hazardous Substances
ROI – Return on Investment
RPIP – Reliability Program and Integration Plan
RPM – Reliability Planning and Management
RPN – Risk Priority Number
RPP – Reliability Program Plan
RTP – Reliability Test Plan
S-N – Stress Versus Number of Cycles Relationship
S/N – Signal-to-Noise
SCA – Sneak Circuit Analysis
SDFR – Software Design for Reliability
SEM – Scanning Electron Microscope
SFMEA – Software Failure Modes and Effects Analysis
SFTA – Software Fault Tree Analysis
SME – Society of Manufacturing Engineers
SPICE – Simulation Program with Integrated Circuit Emphasis
SRC – System Reliability Center
TGA – Thermo-Gravimetric Analysis
TMA – Thermo-Mechanical Analysis
TRIAC – Triode for Alternating Current
TW – Time to Wearout
UDL – Upper Destruct Limit
UOL – Upper Operating Limit (also see HOL)
USB – Universal Serial Bus
VDL – Vibration Destruct Limit
VOL – Vibration Operating Limit
XRF – X-Ray Fluorescence

In today’s consulting market, helping customers who have serious system reliability issues but low budgets can be challenging. Cash-strapped customers look for the maximum bang for the buck and cannot afford highly paid consultants or many design changes. If they do hire a consultant, they may try to keep the engagement as brief as possible to cut down the time on the job. This is understandable, and from a management standpoint it may seem like the only option. When faced with this situation, even top-of-the-line consultants may err by proposing a work package that is technically complete, with all of the bells and whistles needed to answer every technical question, and with the concomitant costs to the customer. Such a proposal may be technically sound yet fail to fit the customer’s budget for changes and fixes. The proposal may then be rejected, or the customer may keep looking for a less expensive approach or a less expensive consultant. There are alternatives a consultant can examine. The following is an example of a case where more thoughtful compromises may help the company and be a better business approach.

Company X has a reliability issue with its high-tech commercial equipment. A reliability analysis has suggested that the overall reliability of the equipment will not meet the contractual reliability requirements and quoted warranty due to a low MTBF, and a preliminary test has confirmed this fear. The company cannot afford a complete redesign of the equipment, yet it cannot ship the equipment in this condition or it may face high-cost warranty claims and potential lawsuits later if and when the equipment fails prematurely in its mission. The company calls in a consultant and requests help in making the equipment more reliable. The budget for the exercise is too low for the major rework plan that seems to be needed. The consultant sees the problem and works out a design-redo plan, but knows that the plan will be rejected at the outset due to cost and time.

The consultant has a few options that may help the customer without incurring the high cost of redoing the design and paying high consulting fees. The solution requires a complete understanding of the system operation, the parts required, the materials, the reliability issues, and the alternatives for fixes.
1) A DFMECA analysis showed that specific parts in the system were the main contributors to the risk of failure, and a correction and risk-mitigation analysis showed there were solutions to some of the potential issues. A review of the initial reliability analysis revealed errors in the assumptions that had indicated low reliability: all of the parts were assumed to act in series, but they were not all in series. When a more realistic analysis was made, the reliability was actually higher than originally estimated. Redoing the analysis showed a significant improvement, but this was not the whole answer; the expected overall MTBF was still too low.
2) There was no redundancy in a key part. If a redundant part were used, the operation could continue without interruption while the failed part was repaired and brought back on line, or, with enough redundancy, the operation could continue to the term life without replacement.
3) Some key components were changed for higher reliability (at lower cost).
4) Instead of redesigning the entire system, a careful reliability analysis suggested that the system could meet the contract requirements if one key component (with a short life) could be replaced midway through the term life. This would have to be negotiated with the customer, but a slight reduction in system price and other minor incentives were considered easily negotiable to allow the part replacement.
5) A review of the manufacturing plan revealed that some of the manufacturing operations were actually contributing to the lower reliability, and an improved workmanship plan had to be put in place.
6) Training the customer's technicians to use the equipment at their own site, not only under usual running conditions but also under adverse conditions, would pay off in higher system availability and reliability than without the training. The original training budget covered operator training at the manufacturer's plant but not at the customer site, and did not include "what if" training with videos and pictures. This cost was renegotiated.
7) Some redesign of the system to simplify the operation was performed where possible without major redesign effort.
8) A quicker and more reliable way to estimate the reliability as the design progressed was adopted to help steer the design and eliminate inferior design choices.
9) The reliability issue was fixed at a reasonable cost and without a high consulting fee for the time expended. It may help to have a reliability team that can address the mechanical, electrical, and manufacturing issues, since one consultant may not have all of the expertise to cover every potential issue. Consider adding a consultant's clause with a reduced base fee but with bonuses, where solving the problem on a timely basis and the resulting cost savings are worth some extra compensation.
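Items 1 and 2 above can be sketched numerically. Under the usual exponential (constant failure rate) assumption, a series model multiplies the block reliabilities, while a redundant block fails only if all of its copies fail. The three blocks and their failure rates below are hypothetical, purely for illustration:

```python
import math

def r_exp(lmbda, t):
    """Reliability of one component with constant failure rate lmbda at time t."""
    return math.exp(-lmbda * t)

def r_series(rs):
    """Series system: every block must survive."""
    p = 1.0
    for r in rs:
        p *= r
    return p

def r_parallel(rs):
    """Active-redundant (parallel) blocks: the system fails only if all fail."""
    q = 1.0
    for r in rs:
        q *= (1.0 - r)
    return 1.0 - q

t = 8760.0  # one year of operation, in hours
lam_a, lam_b, lam_c = 2e-6, 5e-6, 1e-5  # hypothetical failure rates (per hour)

# Pessimistic assumption: every block in series
r_all_series = r_series([r_exp(lam_a, t), r_exp(lam_b, t), r_exp(lam_c, t)])

# More realistic model: the weakest block (C) is duplicated in parallel
r_with_redundancy = r_series([
    r_exp(lam_a, t),
    r_exp(lam_b, t),
    r_parallel([r_exp(lam_c, t), r_exp(lam_c, t)]),
])

print(f"all-series reliability at 1 year: {r_all_series:.4f}")
print(f"with redundant block C:           {r_with_redundancy:.4f}")
```

Even duplicating only the weakest block raises the one-year system reliability noticeably, which is the kind of targeted, low-cost fix item 2 describes.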
These are just some of the ways a reliability consultant can help the customer while keeping the customer's costs down at the same time. Not every case has this success scenario, but it may help to keep in mind

– “There are always alternatives” as Mr. Spock would say.

Semiconductor manufacturing employs high volume automated production lines using very complex wafer processing technologies requiring tight controls throughout the flow.

Specific physical, optical, chemical, and electrical tests are performed at several stages in the manufacturing flow to screen out wafer batches, wafers, or single devices outside the tight distribution limits of a given processing step.

At the end of the wafer processing line, the wafers are tested using special test chips or test structures on the wafers and all product die are thoroughly tested for functionality.

Finished encapsulated devices are tested again for full functionality and performance against the data sheet specification limits.

New technologies, products, and packages are qualified for reliability with special tests and environmental and electrical stresses to demonstrate reliability and to estimate the population failure rate and lifetime.
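As a rough sketch of how such stress results are turned into a population failure-rate estimate, the Arrhenius model is commonly used to compute an acceleration factor between the stress and use temperatures, and the observed failures are then converted to FIT (failures per 10^9 device-hours). The test conditions, activation energy, and failure count below are hypothetical:

```python
import math

K_BOLTZMANN_EV = 8.617e-5  # Boltzmann constant, eV/K

def arrhenius_af(ea_ev, t_use_c, t_stress_c):
    """Arrhenius acceleration factor between stress and use temperature."""
    t_use = t_use_c + 273.15
    t_stress = t_stress_c + 273.15
    return math.exp((ea_ev / K_BOLTZMANN_EV) * (1.0 / t_use - 1.0 / t_stress))

def fit_point_estimate(failures, devices, test_hours, af):
    """Point-estimate failure rate in FIT (failures per 1e9 device-hours),
    referred to use conditions via the acceleration factor."""
    equivalent_hours = devices * test_hours * af
    return failures / equivalent_hours * 1e9

# Hypothetical life test: 1000 devices for 1000 h at 125 C, 2 failures,
# activation energy 0.7 eV, use temperature 55 C.
af = arrhenius_af(0.7, 55.0, 125.0)
fit = fit_point_estimate(2, 1000, 1000.0, af)
print(f"acceleration factor: {af:.1f}")
print(f"estimated failure rate: {fit:.1f} FIT")
```

A real qualification would also apply a chi-squared confidence bound to the observed failure count rather than use a simple point estimate.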

All the above tests in manufacturing may result in devices not meeting the required test limits. These defective devices are highly valuable to provide detailed information on the particular failure mechanism(s) causing the defect.

It is imperative to do detailed electrical and physical characterization of the defective devices, followed by physical failure analysis, including layer-by-layer de-processing of the devices using appropriate analytical tools such as the optical microscope, SEM (scanning electron microscope), Auger analysis (chemical profiling), and a long list of other very specialized analytical tools.

Results of the F/A (Failure Analysis) are used to identify the root cause of the failure of a defective device.

This information set is the basis for corrective action(s) to improve chip design, manufacturing, and production test screens. Implementing corrective actions in manufacturing should be followed up by checking/testing samples for the failure mechanisms addressed by the corrective action(s).

There are four important sources of information for continuous quality/reliability improvement in the high volume semiconductor industry:

The first is the out-going quality assurance testing of samples of the production device population to a tight AQL quality level.
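As a sketch of how such a sampling test behaves, the acceptance probability of a single-sampling plan can be modeled with the binomial distribution. The plan parameters below (sample of 125, accept on at most 1 defect) are hypothetical, not a specific AQL table entry:

```python
from math import comb

def p_accept(n, c, p_defective):
    """Probability a lot is accepted: at most c defects in a sample of n
    (binomial model; assumes the lot is large relative to the sample)."""
    return sum(comb(n, k) * p_defective**k * (1 - p_defective)**(n - k)
               for k in range(c + 1))

# Hypothetical single-sampling plan: sample 125 units, accept on <= 1 defect.
n, c = 125, 1
for ppm in (100, 1000, 10000):
    p = ppm / 1e6
    print(f"{ppm:>6} ppm defective -> P(accept) = {p_accept(n, c, p):.3f}")
```

The operating characteristic is the point: lots near the target quality level pass almost always, while clearly bad lots are rejected with high probability.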

The second vehicle is on-going reliability monitoring of significant sample sizes of the outgoing finished product. This reliability monitor consists of accelerated environmental and electrical stresses such as dynamic high-temperature burn-in life test, accelerated temperature/humidity stress, temperature cycling, etc.

The third and important tool is "LIFO" (Last In First Out). To gain advance quality/reliability information on the product population in the production line, samples move through the production line with priority, ahead of the rest of a production "mother lot", and are subjected to all of the above-described quality/reliability tests and stresses. Lots that indicate problems while still on the production line are put on hold until corrective action(s) are implemented.

All information from the defective devices of the above tests and monitors, after rapid failure and root-cause analysis, is used for immediate corrective action on the production line.

The most important information on failure mechanisms is extracted from field-defective devices returned by customers. These are carefully analyzed by the methods described above, root cause(s) are identified, corrective action(s) are implemented, and the results are rapidly communicated back to the customers.

Potting compounds are often used on printed circuit boards to improve reliability.  In spite of this added protection, adhesive failures, otherwise known as delamination, can occur and lead to substantial problems from moisture ingress.  Root causes of adhesive failures may be surface contaminants, inherently weak bonding between solder mask and potting material or thermal stresses that develop during temperature cycling.

In an adhesion analysis a combination of chemical and mechanical tests is often needed to determine the root cause.  Chemical testing includes surface analyses such as XPS and SIMS; both reveal chemical groups available for bonding at the surface.   Contaminants may be revealed during these tests.  Basic tests such as FTIR, TGA and EDS provide information on the materials including fire retardants whose loadings are typically quite high, affecting adhesive bonding.  Mass spectrometry can be used to identify suspected contamination areas.  Ion chromatography is often used to identify weak organic acids, a byproduct from incomplete volatilization of no-clean flux.   For determining an inherent compatibility of bonding surfaces, contact angle measurements using standard solvents, and surface tension measurements using the pendant drop method, can be very valuable.

Chemical testing must be done in conjunction with mechanical testing.  The aim of the mechanical test is to reproduce the failure mode in the laboratory under controlled conditions.  The easiest mechanical tests to perform are lap shear or peel strength tests.   The lap shear test is straightforward to fixture and provides a method for comparing bond strengths among different boards or potting materials.  A more sophisticated technique, the four-point bending test using pre-notched specimens, can be used to quantify adhesion energy, Gc.
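As a sketch, assuming the commonly quoted steady-state expression for the symmetric four-point bend delamination specimen, G = 21 P² L² (1 − ν²) / (16 E b² h³) (see the Gan et al. reference below), the adhesion energy can be computed from the plateau load. All specimen values here are hypothetical:

```python
def g_ss_four_point_bend(load_n, span_m, width_m, half_thickness_m,
                         youngs_pa, poisson):
    """Steady-state adhesion energy (J/m^2) for the symmetric four-point
    bend delamination specimen, using the commonly quoted expression
    G = 21 P^2 L^2 (1 - nu^2) / (16 E b^2 h^3).
    P: critical load at the crack-growth plateau, L: spacing between inner
    and outer loading lines, b: specimen width, h: half-thickness."""
    p, l, b, h = load_n, span_m, width_m, half_thickness_m
    return 21.0 * p**2 * l**2 * (1.0 - poisson**2) / (16.0 * youngs_pa * b**2 * h**3)

# Hypothetical specimen: plateau load 5 N, loading-line spacing 10 mm,
# width 5 mm, half-thickness 0.5 mm, E = 130 GPa, nu = 0.28 (silicon-like).
gc = g_ss_four_point_bend(5.0, 10e-3, 5e-3, 0.5e-3, 130e9, 0.28)
print(f"Gc = {gc:.2f} J/m^2")
```

Because G in this geometry is independent of crack length over the plateau, the load trace itself identifies the critical load, which is what makes the test attractive for quantifying adhesion.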


For more information see:

Firas Awaja, Michael Gilbert, Georgina Kelly, Bronwyn Fox, Paul J. Pigram, "Adhesion of polymers," Progress in Polymer Science 34 (2009) 948–968

Zhenghao Gan, S.G. Mhaisalkar, Zhong Chen, Sam Zhang, Zhe Chen, K. Prasad, "Study of interfacial adhesion energy of multilayered ULSI thin film structures using four-point bending test," Surface & Coatings Technology 198 (2005) 85–89


The VDA (German Automotive Industry Association) describes the process of building a system FMEA. I would like to highlight two key areas of the VDA approach:

  • Functional analysis
  • Failure analysis

The process consists of the following steps:

  1. Product breakdown to system levels
  2. Functional description of system
  3. Failure Analysis
  4. Risks Evaluation
  5. Risk Optimization

The first step is the definition of the system. The team breaks the system down into several levels, depending on the scope of the analysis, e.g. sensor → sensing element → sensing element characteristics. The output of this step is the system structure net.

In the second step, the team defines requirements, functions, and characteristics for each element of the product structure. The logic is to describe how the requirements are ensured by functions and characteristics/parameters. The objective is to describe how the system/product works. The output is the functional net.
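The structure net and functional net from the first two steps can be represented as a simple tree. The sensor example here is hypothetical, echoing the sensor → sensing element → characteristics breakdown above:

```python
# Hypothetical system structure net for a sensor, with functions attached
# to each structure element (VDA steps 1 and 2).
structure_net = {
    "element": "Sensor",
    "functions": ["Provide output signal proportional to pressure"],
    "children": [
        {
            "element": "Sensing element",
            "functions": ["Convert pressure into electrical signal"],
            "children": [
                {
                    "element": "Sensing element characteristics",
                    "functions": ["Membrane thickness within tolerance"],
                    "children": [],
                },
            ],
        },
    ],
}

def iter_functions(node, level=0):
    """Walk the structure net and yield (level, element, function) triples."""
    for f in node["functions"]:
        yield level, node["element"], f
    for child in node["children"]:
        yield from iter_functions(child, level + 1)

for level, element, func in iter_functions(structure_net):
    print("  " * level + f"{element}: {func}")
```

Keeping structure and functions in one tree makes the next step mechanical: each function at each level becomes an anchor point for attaching failures.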

Once the system is described, failure(s) are added for each requirement and function. At the requirements level these are known as failure effects: the situation in which a requirement is not met, or is only partially met.

Other steps, such as risk evaluation (RPN = S × O × D) and risk optimization (mitigating RPNs above a critical level), are similar to the standard FMEA process. See also FMECA.
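The risk-evaluation step can be sketched directly. The failure modes and the critical threshold below are hypothetical:

```python
def rpn(severity, occurrence, detection):
    """Risk Priority Number: product of the severity, occurrence and
    detection ratings (each typically scored 1-10)."""
    for score in (severity, occurrence, detection):
        if not 1 <= score <= 10:
            raise ValueError("ratings must be between 1 and 10")
    return severity * occurrence * detection

# Hypothetical failure modes: (description, S, O, D)
failure_modes = [
    ("Membrane cracks under overpressure", 8, 3, 4),
    ("Signal drift over temperature", 6, 5, 2),
    ("Connector corrosion", 4, 2, 3),
]

CRITICAL_RPN = 80  # hypothetical action threshold
for name, s, o, d in failure_modes:
    value = rpn(s, o, d)
    flag = "  <- needs risk optimization" if value > CRITICAL_RPN else ""
    print(f"{name}: RPN = {value}{flag}")
```

Only the modes above the threshold get mitigation actions, which is the "risk optimization" step of the process.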

Key benefits of VDA approach:

  • A structured approach, not just a form
  • Requirements and functional description of the system/product
  • Cause-and-effect description based on functional analysis
  • Better orientation and traceability