SOFT ERRORS: The Key to NPFs?
Authors: Charlie Slayman and Mike Silverman
Soft errors are a hot topic in electronics today: as devices get smaller, the chance that soft errors will affect your circuits and cause failures at your customers' sites keeps increasing.
The annual International Reliability Physics Symposium (IRPS), held in Anaheim May 2-6, dedicated an entire section to soft errors, with a total of 16 papers presented on their sources and effects.
What are soft errors? Soft errors are any change in the output or state of a circuit that is not permanent and can be corrected by a simple re-write, re-compute or circuit reset operation.
Where do soft errors come from? Soft errors are the result of energetic particles that generate enough charge in an IC to change its operation. Alpha particles originate from naturally occurring radioactive isotopes that contaminate IC process chemicals and packaging materials. Neutrons originate from the reaction of cosmic rays with the earth’s atmosphere.
Why are soft errors important? If a system is not architected to handle soft errors, they can result in field service calls, returned product and customer dissatisfaction. The returned product will often be diagnosed as “No Problem Found” (NPF), because a soft error leaves no permanent damage to find. Soft error rates can be orders of magnitude higher than hard fail rates. As device technology scales, the amount of charge required to upset a circuit keeps getting smaller.
How do you deal with soft errors? Why not just eliminate the source? Reduction of alpha particles can be very expensive from a manufacturing standpoint. It is often more cost effective to use mitigation techniques (such as error correction code). There is no practical way to shield neutrons (a 100 MeV neutron will penetrate 20 feet of concrete!), so mitigation techniques are the only practical solution for reliable product designs.
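As an illustration of the mitigation idea, here is a minimal sketch of an error correction code in Python. It uses a Hamming(7,4) code, which is our choice for the example and is not named in the article; the function names are hypothetical, and production memories generally use wider codes. It shows how three parity bits let a decoder locate and undo a single flipped bit, which is exactly the kind of upset a soft error causes.

```python
def hamming74_encode(nibble):
    """Encode 4 data bits (an int 0..15) into a 7-bit Hamming codeword.

    Returns a list of 7 bits for codeword positions 1..7.
    Positions 1, 2 and 4 hold parity; positions 3, 5, 6 and 7 hold data.
    """
    data = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8  # index 0 is unused so indices match positions
    code[3], code[5], code[6], code[7] = data
    code[1] = code[3] ^ code[5] ^ code[7]  # covers positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]  # covers positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]  # covers positions 4, 5, 6, 7
    return code[1:]


def hamming74_decode(bits):
    """Decode a 7-bit codeword, correcting at most one flipped bit.

    Returns (nibble, syndrome). A syndrome of 0 means no error was seen;
    otherwise it is the position (1..7) of the bit that was corrected.
    """
    code = [0] + list(bits)
    # XOR together the positions of all set bits. For a valid codeword
    # this is 0; a single flipped bit makes it equal to that bit's position.
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos
    if syndrome:
        code[syndrome] ^= 1  # undo the upset
    data = [code[3], code[5], code[6], code[7]]
    nibble = sum(bit << i for i, bit in enumerate(data))
    return nibble, syndrome


# Simulate a soft error: store a value, flip one bit, then read it back.
stored = hamming74_encode(0b1011)
stored[4] ^= 1  # particle strike flips the bit at position 5
value, syndrome = hamming74_decode(stored)
print(value == 0b1011, syndrome)
```

A code like this corrects only one flipped bit per codeword; two upsets in the same word would be miscorrected, which is one reason practical designs pair wider codes with additional detection bits.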

FUTURE WORK:
Future Directions in Soft Error Modeling – Soft error models exist at various levels, from nuclear physics models for charge generation, to transistor level models for charge collection, to circuit level models for memory and logic upsets. However, much of the work is proprietary and industry standards do not exist. Olivier Lauzeral of iRoC Technologies and ShiJie Wen of Cisco Systems led a survey of the workshop participants to determine what features users wanted to see in the development of soft error tools, from device modeling to circuit layout. The results will be available shortly (stay tuned to the Ops A La Carte website).
Assessing Repeatability and Accuracy: The Alpha Counting Consortium – Reduction of soft errors from alpha particle contamination is driving the need for low and ultra-low alpha particle materials (~2 to 50 counts/cm²-khr) in semiconductor processing and packaging. This emission level is approaching the background detection level of alpha counters. Jeff Wilkinson of Medtronic and Rick Wong of Cisco Systems discussed the plans for a round robin test among up to ten facilities using low and ultra-low alpha particle samples. The objective of the first phase of the study will be to assess the measurement repeatability and accuracy of the various methods employed by participants. Jeff welcomes comments and feedback. Email Jeff at jeff.wilkinson@medtronic.com. Results of this consortium will be part of a future presentation, which we will share with readers once it becomes available to the public.

OPS CONSULTING SERVICES FOR SOFT ERRORS
Consulting Services on Design for Soft Errors (DFSE):
- recommend best practices before beginning of design phase
- review of existing IC and system designs for impact of soft errors
- validate design by recommending modeling and accelerated testing techniques
Consulting Services on Testing for Soft Errors:
- program management of soft error testing (Ops A La Carte does not actually do soft error testing, but can manage the project between the customer and the soft error design and test service)
Consulting Services on Troubleshooting for Soft Errors:
- review of field service issues to determine if the problem is caused by soft errors or something else. If you currently have a high NPF rate, give us a call and we can determine if it is being caused by soft errors.
For more information on this topic, see the one-hour overview we presented at a recent IEEE Reliability Society talk. You can download the presentation or the webcast.

Mention this article and receive $1K off your next Soft Error service.

Below is a summary of the Best of our Blog for last quarter - highlights of the best blog topics we had. If you would like to contribute to our blog, please either
or go to 
HOW CAN YOU PROTECT YOURSELF FROM SOFT ERRORS?
With all this talk about soft errors, is there anything you can do to protect your equipment or to reduce the effects of soft errors for your customers?
The best response within the next 5 days will receive one free consultation in the area of soft errors (see the Ops Consulting Service section of the Feature Section of this newsletter for the types of consultation offered). Alternatively, you can choose one free pass to our next Certified Reliability Engineer (CRE) or Certified Quality Engineer (CQE) course or one free day at our upcoming DfX Symposium.

Congratulations to Quan Tran of Boeing for providing the best response to last quarter's quiz: A) What should Toyota do differently going forward? and B) How do you see your company reacting and changing so that you don't become the next Toyota story?
Quan's answer was:
A) Toyota needs to determine the root causes ASAP by defining the problem, investigating the problem, verifying the problem and ensuring that the problem will not happen again. To regain customers’ trust and favorable perception, Toyota needs to address the root cause and implement its corrective and preventive actions ASAP. To obtain steps A and C, Toyota needs to fully reevaluate its current and future design and verification processes. Toyota needs to keep in mind that “No matter how well we design the product, we are still going to have failures during testing” and “It is only with a robust RCA process that we can quickly and effectively deal with these failures so that we ensure the problem is fixed, does not recur, and does not show up in the field.”
B) Neither our company nor any other wants to become the next Toyota story. To avoid this scenario, our company is ensuring that System Safety, Reliability and Maintainability requirements are included and verified during the preliminary and detailed design phases of the program. System Safety includes requirements such as critical hazards, catastrophic hazards, and control of functions resulting in critical or catastrophic hazards. As Mike Silverman mentioned in his article “Reliability Matters: The Toyota Take-Away”, other manufacturers of automobiles did foresee this particular problem and put in software that overrides the gas pedal in situations like this. Why did Toyota not foresee this particular problem? Toyota needs to apply a robust RCA process in its current and future designs. The Reliability section includes requirements such as failure tolerance, failure propagation, separation of redundant paths, and redundancy status. The Maintainability section includes requirements such as qualitative maintainability design, Failure Detection, Isolation, and Recovery (FDIR), testing at operation location, and ambiguity resolution.
High Technology OEMs face significant increases in global competition, time to market pressures, and technological complexities, which result in higher risks for bringing new products to market. These risks drive the need for greater use of outsourced services.
To reduce these risks, the PRG has developed a “one-stop-shop” of outsourced services that helps companies to bring new products to market. Engagements are tailored to business and product profiles, and partners work together to deliver services that range from initial concept through global logistics and repair.
Your Benefits
- Our integrated solutions will save you time, streamlining bringing your product to market.
- Our proven partnerships will eliminate guesswork and prevent breakdowns in communication.
- Our partners will educate you on current best practices.
Product Realization Group | www.productrealizationgroup.com | mkeer@productrealizationgroup.com


Computer Magic Training is a computer software and professional development training provider that really makes a difference. Their training works like Magic! What makes them different is that their focus is on the customer, to provide the very best learning experience, whether the class is at Computer Magic or at a corporate location. Their instructors are caring and clear communicators with exceptional knowledge. The training facilities are outstanding - warm, beautiful and comfortable with great computer equipment. The Computer Magic staff and trainers take great care of the students and make sure their needs are met. Students receive helpful after-class support and free class repeats. Computer Magic also provides Tutorial and Consulting Services and a line of Professional Development classes from their Business Magic Training division.
Computer Magic, Inc. | www.computermagictraining.com | sharon@computermagictraining.com | +1.408.261.2600.
Given the current economic climate, we know of many talented individuals who are currently looking for work. Therefore, if you are an employer and have a need for any position within reliability, engineering, or operations, we are offering to advertise it in our newsletter at no cost. Just doing our part to help stimulate the economy! Below are a few positions that we know about.
Senior Reliability Consultant
Ops A La Carte is looking for Senior Reliability Consultants around the world to join our team of consultants and work on some of the most exciting and challenging projects in the industry. Whether you have an existing consulting practice or are interested in developing one, please contact us.
Reliability Engineering Manager
If interested, email Harry McLean at: harry.mclean@aei.com.
Ops A La Carte's newsletter goes out to over 18,000 subscribers. If you would like to put an ad or job opening in next quarter's "Reliability News", email us via our Job Openings Contact Form or call at (408) 654-0499.