Root cause analysis (RCA) describes a collection of systematic approaches to determine the underlying causes of problems. RCA is relevant for quality improvement experts in healthcare, maintenance technicians in manufacturing, accident investigators in aviation and rail, and many more professionals across a wide range of industries.
The primary goal of RCA? To determine the corrective actions needed to prevent adverse events from reoccurring, whether on the production line or in the air.
RCA has many practical applications in numerous fields and can take multiple forms. In its most basic structure, the RCA process looks like this:
Define the problem
Write a problem statement describing the incident, its symptoms, and potential ramifications. Consider brainstorming with team members to ensure there are no details you are overlooking and to check against bias.
Gather information related to the incident. Ask yourself questions like: “When did this problem first present itself? What is its specific impact on operations?” A technician diagnosing a malfunctioning machine might ask how old the machine is, how often it is lubricated, and what the last maintenance session uncovered.
A sequence of events will emerge, pointing you to potential root causes.
Determine possible causal factors
Ask yourself: “What conditions allowed this to happen? Are there any other issues stemming from the primary incident?”
Mapping out a sequence of events in chronological order helps you separate non-causal events from causal events. Remember that just because two events correlate doesn’t mean that one caused the other. Closely examine the relationship between events to identify key contributing factors.
Identify the root cause
After you’ve differentiated causal factors from the non-causal factors, you can work towards pinpointing the root cause of the problem.
For example, let’s say you’ve determined the malfunctioning machine is less than a year old. Technicians lubricated it every three days and discovered a stripped bolt during the last emergency maintenance session. Equipment specifications require oiling the machine daily.
Though the stripped bolt could have contributed to the malfunction, the root cause is infrequent maintenance. Let’s assume you had skipped investigating the machine’s specs. You might have replaced the bolt only to face another breakdown days later.
Define the changes your company will need to make to keep this specific incident from recurring. Consider whether your business can devote more resources to prevention, then enact those changes based on your current capacity.
For example, if preventive maintenance requires daily lubrication for this machine, your plant should plan to purchase more industrial oils and invest in more staff.
Now that your analysis is complete, it’s time to switch from reactive RCA to proactive. In addition to conducting regular follow-ups to safeguard against another incident, establish a long-term action plan.
In the long term, your facility should audit the maintenance schedules of all equipment to ensure this scenario does not occur elsewhere in the plant.
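A schedule audit like this can be sketched in a few lines of Python. This is only an illustration: the `Equipment` record, its field names, and the sample fleet are hypothetical, and only the first entry mirrors the machine described above (daily oiling required, oiled every three days).

```python
from dataclasses import dataclass


@dataclass
class Equipment:
    name: str
    required_interval_days: int  # taken from the equipment specification
    actual_interval_days: int    # taken from the maintenance log


def audit_schedules(fleet):
    """Return the names of equipment serviced less often than the spec requires."""
    return [
        item.name
        for item in fleet
        if item.actual_interval_days > item.required_interval_days
    ]


# Hypothetical plant data; only the first row reflects the example in the text.
fleet = [
    Equipment("press", required_interval_days=1, actual_interval_days=3),
    Equipment("conveyor", required_interval_days=7, actual_interval_days=7),
    Equipment("lathe", required_interval_days=2, actual_interval_days=1),
]

overdue = audit_schedules(fleet)
print(overdue)  # ['press']
```

Anything the audit flags is a candidate for the same corrective action applied to the original machine.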
Now that you know the basics of RCA, let’s drill down into the most popular root cause analysis frameworks. The methodology can vary widely from industry to industry and from scenario to scenario.
According to the Interaction Design Foundation, the 5 Whys is an interrogative technique that traces the cause of one effect back to another up to five times. The more steps in the cause-and-effect chain, the more effort it takes to determine what set the sequence of events in motion.
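One way to picture the technique is as a walk backward along a chain of recorded causes. The sketch below assumes the causes have already been written down as a simple mapping from each effect to its cause. The function name `five_whys` and the sample chain are hypothetical and loosely based on the machine example earlier in this article.

```python
def five_whys(problem, cause_of, max_depth=5):
    """Follow a cause-and-effect chain backward, asking "why?" up to max_depth times."""
    chain = [problem]
    current = problem
    for _ in range(max_depth):
        cause = cause_of.get(current)
        if cause is None:  # no deeper cause recorded, so stop early
            break
        chain.append(cause)
        current = cause
    return chain


# Hypothetical chain: each key is an effect, each value is its recorded cause.
cause_of = {
    "machine stopped": "moving parts seized",
    "moving parts seized": "parts ran without enough oil",
    "parts ran without enough oil": "machine was oiled every three days",
    "machine was oiled every three days": "schedule ignored the daily oiling spec",
}

chain = five_whys("machine stopped", cause_of)
for depth, step in enumerate(chain):
    print("  " * depth + step)
```

The last entry in `chain` is the deepest cause on record, which is a candidate root cause rather than a proven one.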
This model seeks to prevent accidents by positioning barriers between targets (often workers, passengers, or pedestrians) and hazards. When a barrier fails, the threat makes contact with the target. You can utilize barrier analysis to determine why the barrier failed and how to enact protective measures to avert further mishaps.
Barrier analysis primarily evaluates safety incidents. For example, an accident investigator might determine that a distracted train driver caused a derailment. The investigator might recommend implementing positive train control (PTC) to monitor movement. In a process called “defense in depth”, the transportation authority could layer multiple barriers by installing a computerized speed-limiting system in addition to PTC.
Event or change analysis is ideal for incidents with many potential causes. With this methodology, you consider the many changes leading up to a deviation in performance.
According to nuclear engineer and performance improvement expert Bill Wilson, the causes with the fewest additional conditions or assumptions are more likely to have caused the deviation.
Cause and effect (Fishbone or Ishikawa diagrams)
Fishbone diagrams take a visual approach to the 5 Whys technique, working back the cause of one effect to another. CMS.gov describes the problem as the head or mouth of the fish.
The “ribs” of the fish serve as the major categories of causes. For example, one category might be “environmental,” with the location and manufacturing plant layout listed as potential sub-causes. Another category might be “personnel,” with scheduling and onboarding as sub-causes. Working through this diagram reveals the systemic processes and knowledge gaps responsible for triggering a harmful event.
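The structure of a fishbone diagram maps naturally onto nested data: one problem, several categories, and a list of sub-causes per category. The sketch below is a minimal illustration using the two categories named above. The dictionary layout and the `render` helper are assumptions made for this example, not a standard format.

```python
# The problem is the "head"; each category is a "rib" with its sub-causes.
fishbone = {
    "problem": "harmful event",
    "categories": {
        "environmental": ["location", "plant layout"],
        "personnel": ["scheduling", "onboarding"],
    },
}


def render(diagram):
    """Return a plain-text outline of the diagram, one rib per category."""
    lines = [f"Problem (head): {diagram['problem']}"]
    for category, sub_causes in diagram["categories"].items():
        lines.append(f"  Rib: {category}")
        for sub_cause in sub_causes:
            lines.append(f"    - {sub_cause}")
    return "\n".join(lines)


outline = render(fishbone)
print(outline)
```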
Kepner-Tregoe Problem Solving and Decision Making
A Kepner-Tregoe Matrix defines what the problem is and is not, isolating the who, what, when, where, and how behind an event. The Agency for Healthcare Research and Quality provides a helpful template for using the Kepner-Tregoe method.
Note: It’s important not to assign individual blame when considering who was involved in the incident. Examine organizational and environmental factors instead to identify potential systemic changes.
| Problem Statement | Is | Is Not | Distinction |
| --- | --- | --- | --- |
| What objects are affected? | List affected objects | List unaffected objects | Identify differences between affected vs. unaffected objects |
| Where does the problem occur? | Define event location | Define locations where problem is absent | Determine differences between affected vs. unaffected locations |
| When does the problem occur? | Describe when event occurs | Describe when event is absent | Identify differences between when event occurs vs. when it does not |
| Who is involved? | Identify personnel involved in event | Identify personnel not present | Compare present vs. absent personnel |
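The “Distinction” column can be thought of as a set comparison: what do all affected items share that no unaffected item has? The sketch below illustrates that idea for the first row of the matrix. The `distinctions` function and the sample machines and attributes are hypothetical.

```python
def distinctions(is_items, is_not_items):
    """Attributes shared by every affected item and absent from every unaffected one."""
    if not is_items:
        return set()
    shared = set.intersection(*(set(attrs) for attrs in is_items.values()))
    seen_elsewhere = set().union(*(set(attrs) for attrs in is_not_items.values()))
    return shared - seen_elsewhere


# Hypothetical "Is" and "Is Not" columns for "What objects are affected?"
affected = {
    "press A": {"line 1", "oiled every 3 days"},
    "press B": {"line 2", "oiled every 3 days"},
}
unaffected = {
    "press C": {"line 1", "oiled daily"},
}

result = distinctions(affected, unaffected)
print(result)  # {'oiled every 3 days'}
```

A distinction found this way is a lead worth investigating, not a confirmed cause.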
Management Oversight and Fault Tree Analysis
Also known as a management oversight and risk tree (MORT), this method examines workplace accidents and safety programs. The US Department of Energy describes this decision tree as a way to rapidly evaluate and assimilate new technologies and findings into an existing safety system.
Fault tree analysis uses Boolean logic to map the relationships between causes. For example, you would use the logical AND operator if two causes must happen simultaneously to trigger a problem. Otherwise, logical OR describes two causes that can individually provoke an incident.
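The AND and OR gates described above can be evaluated recursively. In this sketch, the tuple-based tree representation, the `evaluate` function, and the event names are all assumptions chosen for illustration. Real fault tree tools typically add probabilities and further gate types.

```python
def evaluate(node, occurred):
    """Evaluate a fault tree.

    A leaf is a basic-event name (str). A gate is a tuple ("AND" | "OR", [children]).
    `occurred` maps basic-event names to True/False; missing events count as False.
    """
    if isinstance(node, str):
        return occurred.get(node, False)
    gate, children = node
    results = [evaluate(child, occurred) for child in children]
    if gate == "AND":
        return all(results)
    if gate == "OR":
        return any(results)
    raise ValueError(f"unknown gate: {gate}")


# Hypothetical tree: the incident happens if power is lost,
# OR if the main pump AND the backup pump both fail.
tree = ("OR", ["power loss", ("AND", ["pump failure", "backup pump failure"])])

print(evaluate(tree, {"pump failure": True}))                               # False
print(evaluate(tree, {"pump failure": True, "backup pump failure": True}))  # True
print(evaluate(tree, {"power loss": True}))                                 # True
```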
Root cause analysis first emerged in the engineering field in the 1950s. Because the techniques proved useful, they spread to other industries throughout the twentieth century and into the present day.
Sakichi Toyoda, the founder of Toyota Industries Co., Ltd., developed the 5 Whys method to troubleshoot manufacturing processes.
BrightHub PM reports that Toyoda also developed the Jidoka principle, selling this invention to a British firm to fund the start-up of Toyota.
The Federal Aviation Administration (FAA) implemented the Aviation Safety Reporting System (ASRS) to reduce airline incidents and improve safety management.
Aviation MRO software facilitates compliance by tracking maintenance, managing flight operation functionalities, and eliminating human errors.
Motorola developed a risk management strategy called Six Sigma to reduce manufacturing variability and defects and improve the overall quality of production and business processes.
The healthcare system began exploring RCA techniques to mitigate fatalities due to medical errors, the eighth leading cause of death in the US in 1999.
The Joint Commission’s (TJC) assessment standards mandate that all healthcare providers establish a standardized RCA process.
Five main types of RCA have evolved to meet the unique requirements of a diverse range of fields, from manufacturing to occupational safety.
This methodology originated in engineering and maintenance through equipment breakdown analysis.
Production-based RCA focuses on quality control in the manufacturing field.
This RCA method examines faults in business and manufacturing processes.
Safety-based RCA focuses on occupational safety and workplace accident analysis.
Derived from the previous four techniques, systems-based RCA utilizes aspects of change management, systems analysis, and risk management.
When should you perform root cause analysis? Explore these real-world industry applications where RCA is critical to finding the root cause of a problem.
A healthcare quality manager is responsible for patient safety, monitoring risk areas, and spearheading new care initiatives. Over the last several months, a downward trend in patient complaints has emerged. The manager utilizes RCA methodology to pinpoint which business processes are responsible for this success.
After a thorough analysis, the manager discovers that complaints dropped after a series of continuing education seminars on disability justice concepts. To replicate this success, the manager approaches the CEO and board of trustees with a roster of patient-focused training courses to ensure continuous improvement in patient advocacy.
The board of trustees could vote to implement quality management software (QMS) specializing in life sciences, like Qualityze. QMS provides access to root cause analysis tools to help analyze risks, audit for optimal performance, and implement workflows. Healthcare QMS helps hospitals meet compliance standards, manage suppliers, protect patient data, and improve patient satisfaction.
A quality control specialist at an injection molding factory discovers warping in the plastic furniture emerging from the production line. Having made a minor adjustment to the mold design in the past to resolve this issue, the specialist explores other possible causes for the defective furniture.
The specialist returns to the error reporting paper trail they’ve maintained over time. Using these detailed records, they find that inconsistent mold cooling on a separate furniture line caused warping. They instruct shop floor workers to give parts sufficient time to cool before working on the next piece of furniture.
To help the specialist track maintenance activities more effectively, the plant should consider implementing computerized maintenance management software (CMMS). CMMS maximizes the runtime of machinery, scans equipment for performance trends, and helps schedule regular preventive maintenance.
Manufacturing
A large manufacturing enterprise suffered frequent production line pauses due to a malfunctioning machine. An industrial maintenance technician used RCA to determine that a particular part was responsible for the malfunction. Unfortunately, after the technician ordered a replacement from the supplier and installed it, the machine malfunctioned again.
After revisiting their RCA methodology, the technician detected a recurring pattern of receiving faulty parts from this supplier. The plant decided to buy the part from a new vendor to avoid replacing another component. As a result, the machine began operating at full capacity once the new part was installed.
A robust manufacturing execution system (MES) allows facilities to record, manage, and investigate quality non-conformances in supplier products. Investing in this software could reduce manufacturing cycle times and help schedule machine downtime in advance.
Due to the complex nature of root cause analysis, your team should avoid rushing the process to arrive at solutions faster. However, by moving through each step methodically, you can reap five significant benefits of this investigative technique:
Boost financial health
Reducing the costs associated with resolving defects and cleaning up incidents translates to better long-term company performance and profits.
Identify the root cause of a problem to avoid repeat incidents, and apply your newfound knowledge to other operations to prevent similar issues from emerging elsewhere.
Enhance time to market
In manufacturing, products hit the market faster when corrective action eliminates defects immediately.
RCA prevents injuries and safety hazards by highlighting gaps in emergency procedures and training requirements.
RCA helps improve communications systems to increase staff awareness of risks, required activities, and proper procedures.
By working with a team for more thorough data coverage, performing RCA for successes and setbacks, and planning for future analyses, you can get the most out of root cause analysis techniques.