FREE WEBINAR – October 7, 2015

Host: Ops A La Carte
Speaker: Robert Mueller, Senior Reliability Engineer
Date:  October 7, 2015
Time: 12:00pm-1:00pm Pacific Time

Root Cause Analysis of software defects continues to confirm that design defects remain the dominant root cause of software system failures. Yet few non-regulated software development teams using ‘agile’ development methodologies emphasize formal design reviews in their processes. The design Failure Mode & Effects Analysis methodology (dFMEA) is a proven review tool for enhancing the reliability of a product’s design. Incorporating key elements of dFMEA methodology into each iteration of the ‘agile’ development process gives the product development team a continuously increasing product reliability growth profile from the very start of the software development process.

This webinar will explore how software development teams have successfully added the elements of the dFMEA process to their Scrum (‘agile’) development processes, sprint by sprint. We will explore how teams augmented their definition of done (DoD) with FMEA process deliverables. We will explore what design artifacts (e.g., system models, object models, sequence or interaction diagrams) are typically used with the dFMEA-enhanced software design review process. We will also explore the advantages of prioritizing the product’s backlog using both an item’s value and its technical (e.g., reliability) risk.

Join us in this in-depth exploration of how FMEA can be integrated into the ‘agile’ software development process. Most of all, do not forget to bring your questions. “Yah-buts” are encouraged!


The FMEA method is well known today. Many guides, articles, and standards have been written about it. However, much less has been written about the links between FMEAs, and between FMEAs and other processes. Are these links important? To build a really strong FMEA approach, we should take into account the inputs and outputs that connect one FMEA to another. In most cases we develop and produce products within a supply chain: our product is part of a higher-level system or customer application, and it consists of components co-developed or produced by our suppliers. We are located between the customer domain and the supplier domain. Each domain has its own FMEA (the Application FMEA and the Supplier FMEAs). The key point is to create interfaces between the domain FMEAs. If we neglect these interfaces, then we develop “our product” rather than the “customer product”, and we can fail in the customer application. There are many real examples where a product failed because the voice of the customer was missing. We should keep this in mind and pass the Voice of the Customer on to our suppliers. It is important to set up an FMEA communication platform between all supply chain entities, with the same risk evaluation criteria (Severity, Occurrence, Detection). The objective of this approach is to identify all risks and the chain from failure cause to failure effect, from the supplier through us to the customer. It gives us a complete view of what happens in our domain if a parameter of a supplier component fails, and what the failure effect will be on the customer application.






Another aspect of a more robust FMEA approach is the set of links between FMEAs and other company processes within our domain. The benefit of such an approach is that FMEAs are viewed from various functional perspectives. The following items describe how these processes can strengthen FMEAs and how FMEAs can strengthen other processes.




Requirement Management – to understand what the customer really needs, how the application works, and what the failure effects and their severity on the customer application can be. It is the basic input for FMEA.

Quality Planning – FMEA is part of the quality planning process, and other quality tools and methods depend on it, such as the control plan, measurement system analysis, process capability analysis, and verification and validation planning.

Risk Management – FMEA is a source of product and process risks, which have to be evaluated from other risk perspectives such as financial effect, project timing, product portfolio, and technology roadmap.

Supplier Management – FMEA is a good communication platform for discussing component failures and their effects on the customer system. Customers learn from suppliers and suppliers learn from customers.

Continual Improvement – FMEA is a good source for defining potential product or process improvement projects, based on the highest risks.

Reliability Engineering – FMEA is an integral part of product reliability analysis. It helps engineers understand failure mechanisms, and it is a good source for reliability test planning and post-test failure analysis.

Change Management – When any change in a process or product is planned, we should analyze it with the support of FMEAs.

Problem Solving – FMEA can be a good reference for the team to learn from past failures. New failure events should be added to the FMEA.

Managing all these links is quite a complex task. But when we think about these links, our FMEA can bring us more interesting results than before. It will no longer be a separate method but will become an integral part of our company processes. There is a very interesting tool to help a company manage all these links. See to


A client developed a novel FMECA technique that I think has much to recommend it. A key difference lies in the way “Occurrence” is handled. Generally, when we develop an FMECA, we assign a somewhat arbitrary number (1 – 10) to the occurrence factor. In this approach, the failure rate of each component (in FITs, i.e. failures per billion device-hours) is used instead. The total of the FITs for the components in the subsystem under consideration is then used to normalize the FIT of each component. In this way, the most critical components can be determined. The total can also be rolled up to the next higher level. Another feature of this approach is that the failure mode of the component (for example: open, shorted, parameter change) can be included in the analysis since, in some cases, one failure mode can have a more deleterious effect than another.

The main advantage of this approach is that it removes some of the arbitrariness of the standard approach. A challenge, however, lies in finding the FIT values (and especially the ratios of the failure modes) for some of the components.
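The normalization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the client's actual worksheet: the class and function names, the component values, the failure-mode ratios, and the choice to weight each normalized share by a 1-10 severity rating are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ComponentMode:
    component: str
    mode: str          # failure mode, e.g. "open", "shorted", "parameter change"
    fit: float         # component failure rate in FIT
    mode_ratio: float  # fraction of the component's failures in this mode
    severity: int      # 1-10 severity rating for this mode (assumed input)


def mode_fit(entry):
    """FIT attributable to one failure mode of one component."""
    return entry.fit * entry.mode_ratio


def subsystem_fit(entries):
    """Total FIT of the subsystem; this is also the value rolled up a level."""
    return sum(mode_fit(e) for e in entries)


def rank_by_criticality(entries):
    """Normalize each mode's FIT by the subsystem total and weight by severity."""
    total = subsystem_fit(entries)
    rows = []
    for e in entries:
        share = mode_fit(e) / total  # takes the place of the 1-10 occurrence score
        rows.append((e.component, e.mode, share, share * e.severity))
    return sorted(rows, key=lambda row: row[3], reverse=True)


# Hypothetical subsystem: two components, two failure modes each.
entries = [
    ComponentMode("R1", "open", 2.0, 0.8, 5),
    ComponentMode("R1", "parameter change", 2.0, 0.2, 3),
    ComponentMode("C1", "shorted", 8.0, 0.5, 9),
    ComponentMode("C1", "open", 8.0, 0.5, 4),
]

for component, mode, share, score in rank_by_criticality(entries):
    print(f"{component:3} {mode:17} share={share:.2f} score={score:.2f}")
```

Because the mode ratios of each component sum to 1 here, the subsystem total equals the sum of the component FITs, which is the figure that would be carried up to the next level.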

The VDA (German Association of the Automotive Industry) describes a process for building a system FMEA. I would like to highlight the key areas of the VDA approach:

  • Functional analysis
  • Failure analysis

The process consists of the following steps:

  1. Product breakdown into system levels
  2. Functional description of the system
  3. Failure Analysis
  4. Risk Evaluation
  5. Risk Optimization

The first step is the definition of the system. The team breaks the system down into several levels, depending on the scope of the analysis, e.g. sensor, sensing element, sensing element characteristics. The output of this step is the system structure net.

In the second step the team assigns requirements, functions, and characteristics to each element of the product structure. The logic is to describe how the requirements are ensured by functions and characteristics/parameters. The objective is to describe how the system/product works. The output is the functional net.

Once the system is described, we add one or more failures for each requirement and function. At the requirements level, a failure is known as the effect of failure: the situation in which a requirement is not met or is only partially met.

The remaining steps, risk evaluation (RPN = S x O x D) and risk optimization (mitigating RPNs above the critical level), are similar to the standard FMEA process. See also FMECA.
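To make the three nets more concrete, here is a minimal Python sketch of how the structure, functions, and failures could be linked so that a cause at the lowest level can be traced up to its effect at the requirements level. It is not taken from the VDA material: the class names, the function texts, and the membrane and pressure details of the sensor example are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Failure:
    description: str
    effects: list = field(default_factory=list)   # failures one level higher


@dataclass
class Element:
    name: str
    function: str                                  # functional net entry
    failures: list = field(default_factory=list)   # failure net entries
    children: list = field(default_factory=list)   # structure net


def structure_levels(element, depth=0):
    """Walk the structure net and return (depth, element name) pairs."""
    rows = [(depth, element.name)]
    for child in element.children:
        rows.extend(structure_levels(child, depth + 1))
    return rows


def effect_chains(failure):
    """Return every cause-to-effect path that starts at `failure`."""
    if not failure.effects:
        return [[failure.description]]
    return [[failure.description] + tail
            for effect in failure.effects
            for tail in effect_chains(effect)]


# Hypothetical three-level example following the sensor breakdown in the text.
no_signal = Failure("Sensor delivers no output signal")
no_response = Failure("Sensing element does not respond", effects=[no_signal])
out_of_tolerance = Failure("Membrane thickness out of tolerance",
                           effects=[no_response])

characteristic = Element("Sensing element characteristics",
                         "Keep membrane thickness within tolerance",
                         failures=[out_of_tolerance])
sensing_element = Element("Sensing element", "Convert pressure to a voltage",
                          failures=[no_response], children=[characteristic])
sensor = Element("Sensor", "Report pressure to the customer application",
                 failures=[no_signal], children=[sensing_element])

for chain in effect_chains(out_of_tolerance):
    print(" -> ".join(chain))
```

Keeping the failures attached to elements of the structure, rather than to rows of a form, is what allows the cause and effect description to follow from the functional analysis.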

Key benefits of the VDA approach:

  • A structured approach rather than just a form
  • Requirements and functional description of the system/product
  • Cause and effect description based on functional analysis
  • Better orientation and traceability

FMEA is a great tool used in many quality, reliability, and risk analysis processes. It is not a highly sophisticated tool and is certainly not technically complex. As a reliability tool, the FMEA is extremely effective in identifying the risks of greatest concern and thus focusing design and test activities on eliminating those risks or reducing them to tolerable levels.

Even though there is software available to assist in performing the FMEA, a spreadsheet is often adequate.  Getting the proper team together with the patience to conscientiously fill out the spreadsheet is often a more difficult task.

A typical FMEA process for a design FMEA might be composed of the following steps:

  • Step 1: Review the Process/Design
  • Step 2: Brainstorm potential failure modes
  • Step 3: List potential effects of each failure mode
  • Step 4: Assign a severity rating for each effect
  • Step 5: Assign an occurrence rating for failure modes
  • Step 6: Assign a detection rating for modes/effects
  • Step 7: Calculate the risk priority numbers
  • Step 8: Prioritize the failure modes for action
  • Step 9: Take action to eliminate/reduce high-risk failure modes
  • Step 10: Calculate the resulting RPN
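Steps 4 through 10 are simple enough to sketch in Python, which is one reason a spreadsheet is often adequate. The sketch below assumes the conventional 1-10 rating scales and RPN = severity x occurrence x detection. The failure modes, their ratings, and the action threshold of 100 are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FailureModeEntry:
    name: str
    severity: int    # Step 4, rated 1-10
    occurrence: int  # Step 5, rated 1-10
    detection: int   # Step 6, rated 1-10 (10 = hardest to detect)

    @property
    def rpn(self):
        """Step 7: risk priority number."""
        return self.severity * self.occurrence * self.detection


def prioritize(entries, threshold):
    """Step 8: keep entries at or above the threshold, highest RPN first."""
    flagged = [e for e in entries if e.rpn >= threshold]
    return sorted(flagged, key=lambda e: e.rpn, reverse=True)


# Hypothetical worksheet rows.
worksheet = [
    FailureModeEntry("Connector corrosion", 7, 4, 6),      # RPN 168
    FailureModeEntry("Watchdog fails to reset", 9, 2, 8),  # RPN 144
    FailureModeEntry("Label misprint", 2, 3, 2),           # RPN 12
]

ACTION_THRESHOLD = 100  # assumed cut-off; each team sets its own

action_list = prioritize(worksheet, ACTION_THRESHOLD)

# Steps 9-10: suppose an added test improves detection of corrosion from 6 to 2.
improved = replace(action_list[0], detection=2)
print(action_list[0].name, action_list[0].rpn, "->", improved.rpn)
```

The arithmetic is trivial. The hard part is agreeing on the ratings that go into it.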

I believe that most of these steps are quite easy to perform but one that seems to cause a great deal of confusion is Step 6: Assign a detection rating.  To assign a detection rating, the probability of detecting a failure before the effect is realized must be determined.  So, what does that mean?  I have seen a number of different explanations for what “detection” means for an FMEA.  Does that mean detecting a potential failure prior to shipment?  Does that mean detecting that a failure is imminent but prior to occurrence in the customer use environment (a type of prevention)?  Does that mean detecting the failure after it occurs but prior to it impacting the customer?  Or, does that mean just detecting that a failure has occurred?

Here are some opinions found in an internet search:

  • First, an engineer should look at the current controls of the system, that prevent failure modes from occurring or which detect the failure before it reaches the customer. Hereafter one should identify testing, analysis, monitoring and other techniques that can be or have been used on similar systems to detect failures. From these controls an engineer can learn how likely it is for a failure to be identified or detected.
  • The Design Control Detection then allows us to describe how we will test this design and the confidence we have that this test would find any potential failure mode(s) about which we are concerned.
  • Identify process or product related controls for each failure mode and then assign a detection ranking to each control. Detection rankings evaluate the current process controls in place.
  • A control can relate to the failure mode itself, the cause (or mechanism) of failure, or the effects of a failure mode.  To make evaluating controls even more complex, controls can either prevent a failure mode or cause from occurring or detect a failure mode, cause of failure, or effect of failure after it has occurred.
  • Design Control will almost certainly detect a potential cause/mechanism and subsequent failure mode.
  • Identify Current Controls (design or process). Current Controls (design or process) are the mechanisms that prevent the cause of the failure mode from occurring or which detect the failure before it reaches the Customer. The engineer should now identify testing, analysis, monitoring, and other techniques that can or have been used on the same or similar products/processes to detect failures. Each of these controls should be assessed to determine how well it is expected to identify or detect failure modes.
  • Detection is an assessment of the likelihood that the Current Controls (design and process) will detect the Cause of the Failure Mode or the Failure Mode itself, thus preventing it from reaching the Customer.
  • Identify the existing controls that identify and reduce failures. Controls may be Preventive (designed in) or Detective (found by functional testing, etc.). Preventive controls are those that help reduce the likelihood that a failure mode or cause will occur (affect occurrence value). Detective controls are those that find problems that have been designed into the product (assigned detection value).
  • It is your ability to detect the failure when it occurs.
  • Basically prior to “impending” failure. The new AIAG FMEA manual has implemented “2” control columns in an effort to assist in this endeavor. Preventive Controls: Essentially, what are you doing to prevent the failure from occurring? This includes such things as SW diagnostics. In an automotive application, an ABS lamp activates prior to impending failure to allow you to take it to the dealership. Detective Controls: Essentially, what tests do you have in place that can detect the failure prior to design/process release to the end user?
  • Detection: Detect the Cause/Mechanism or Failure Mode, either by analytical or physical methods, before the item is released to production.
  • FMEA is a mitigation planning tool.  Detection must be relevant to mitigation.
  • Detection is sometimes termed EFFECTIVENESS. It is a numerical subjective estimate of the effectiveness of the controls to prevent or detect the cause or failure mode before the failure reaches the customer.  The assumption is that the cause has occurred.
  • A description of the methods by which occurrence of the failure mode is detected by the operator. The failure detection means, such as visual or audible warning devices, automatic sensing devices, sensing instrumentation or none will be identified. (MIL-STD-1629)
  • The definition of Detection usually depends on the scope of the analysis. Definitions usually fall into one of three categories:

    i) Detection during the design & development process
    ii) Detection during the manufacturing process
    iii) Detection during operation

It’s obvious that there are a number of opinions on what “detection” means in the context of an FMEA.


One thing is clear: during the preliminary discussions, before detailed FMEA development begins, everyone should agree on what detection means for the product being addressed.


Does anyone have an opinion on this subject?

DFMEA is a Design FMEA performed on the product design.

PFMEA is a Process FMEA performed on the manufacturing process.

Both are very useful and both techniques should be utilized.

There is a correlation between the two: one analysis can definitely influence the other, and often a mitigation identified in a Design FMEA is a change in the manufacturing process, if the problem cannot be fixed by design alone.

There are many other forms of FMEA such as User FMEA, Software FMEA, Functional FMEA and others. FMEA is a very powerful tool and you choose the type of FMEA based on the particular circumstance. Here is a short summary of my favorite 6 types:


Design FMEAs are performed on the product or system at the design level.

The purpose is to analyze how failure modes affect the system, and to minimize failure effects upon the system. Design FMEAs are used before products are released to the manufacturing operation. All anticipated design deficiencies will have been detected and corrected by the end of this process.


Process FMEAs are performed on the manufacturing processes.

They are conducted through the quality planning phase as an aid during production.

The possible failure modes in the manufacturing process, limitations in equipment, tooling, gauges, operator training, or potential sources of error are highlighted, and corrective action taken.


System FMEAs are made up of part-level FMEAs.

All of the part-level FMEAs tie together to form the system.

As an FMEA goes into more detail, more failure modes will be considered.

A system FMEA need only go down to the level of detail required.


Functional FMEAs are also known as “Black Box” FMEAs.

This type of FMEA focuses on the performance of the intended part or device rather than on the specific characteristic of the individual parts.

As an example, if a project is in the early design stages, a Black Box analysis would focus on the function of the device rather than on the exact specifications (color must be blue-gray, knob is 2.15 mm to the left, etc.).


A User FMEA is a subset of the Design FMEA that focuses specifically on the customer and how they will use or misuse the product.

An input to the User FMEA is the user manual.

The User FMEA will look at installation, use, and end-of-life situations.


All of the FMEA methods (Design, Process, System, Functional, User) can also be applied to software.

In a Software FMEA, we are not only interested in potential software bugs but errors in interfaces and errors in boundary conditions.

A Software FMEA is an excellent tool to use if you have a set of bugs and are trying to determine the likely cause.