Root cause analysis
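In science and engineering, root cause analysis (RCA) is a method of problem solving used to identify the root causes of faults or problems.[1] It is widely used in IT operations, telecommunications, industrial process control, accident analysis (e.g., in aviation,[2] rail transport, or nuclear plants), medicine (for medical diagnosis), the healthcare industry (e.g., for epidemiology), etc. RCA can be decomposed into four steps:

1. Identify and clearly describe the fault/problem.
2. Establish a timeline (history of events) from the normal situation until the fault/problem.
3. Distinguish between the root cause and causal factors.
4. Establish a causal graph between the root cause and the fault/problem.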
RCA generally serves as input to a remediation process, in which corrective actions are taken to prevent the fault/problem from recurring. The name of this process varies from one application domain to another.

Definitions

In science and engineering, there are essentially two ways of repairing faults and solving problems. Reactive fault/problem management consists of reacting quickly after the fault/problem occurs, by treating the symptoms. This type of management is implemented by reactive systems,[3][4] self-adaptive systems,[5] self-organized systems, and complex adaptive systems. The goal here is to react quickly and alleviate the symptoms of the fault/problem as soon as possible.

Proactive fault/problem management, conversely, consists of preventing problems from occurring. Many techniques can be used for this purpose. They range from good design practices to detailed analysis of problems that have already occurred, followed by actions to make sure those problems never recur. Here, the accuracy and precision of the diagnosis matter more than speed. The focus is on addressing the real cause of the fault/problem rather than its symptoms.

Root-cause analysis is often used in proactive fault/problem management to identify the root cause of a fault/problem, that is, the factor that was the main cause of that fault/problem. It is customary to refer to the root cause in the singular, but one or several factors may in fact constitute the root cause(s) of the fault/problem under study. A factor is considered the root cause of a fault/problem if removing it prevents the fault/problem from recurring. A causal factor, conversely, is one that affects an event's outcome but is not the root cause. Removing a causal factor can improve an outcome, but it does not prevent the fault/problem from recurring with certainty.
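The distinction between a root cause and a causal factor amounts to a counterfactual test: take the factor away and check whether the fault can still occur. The sketch below makes that test concrete under strong simplifying assumptions. The fault is modeled as a deterministic boolean function of which factors are present, and the observed situations are supplied as a list of scenarios. The names `classify_factor` and `fault`, the factor labels, and the toy model itself are hypothetical illustrations, not part of any standard RCA tool.

```python
from typing import Callable, FrozenSet, List

Scenario = FrozenSet[str]  # the set of factors present in one observed situation


def classify_factor(factor: str,
                    fault: Callable[[Scenario], bool],
                    scenarios: List[Scenario]) -> str:
    """Counterfactual test: remove `factor` from every faulty scenario and re-check."""
    faulty = [s for s in scenarios if fault(s)]
    # Outcome of each faulty scenario once the factor is taken away.
    after_removal = [fault(s - {factor}) for s in faulty]
    if faulty and not any(after_removal):
        return "root cause"     # removing it prevents every recurrence
    if not all(after_removal):
        return "causal factor"  # removing it helps in some cases only
    return "non-causal"         # removing it never changes the outcome


# Hypothetical toy model: the fault needs scrap ingress plus a heavy load or a long shift.
def fault(s: Scenario) -> bool:
    return "scrap_ingress" in s and ("heavy_load" in s or "long_shift" in s)


scenarios = [
    frozenset({"scrap_ingress", "heavy_load"}),
    frozenset({"scrap_ingress", "long_shift"}),
    frozenset({"heavy_load", "long_shift"}),
    frozenset({"scrap_ingress", "heavy_load", "paint_color"}),
]

for f in ["scrap_ingress", "heavy_load", "paint_color"]:
    print(f, "->", classify_factor(f, fault, scenarios))
```

With this toy model, `scrap_ingress` is classified as the root cause, `heavy_load` as a causal factor, and `paint_color` as non-causal. Real investigations rarely have a complete model like `fault`. The sketch only shows the logic of the definition and is not a way to find root causes automatically.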
Example of RCA

Imagine an investigation into a machine that stopped because it overloaded and the fuse blew.[6] Investigation shows that the machine overloaded because one of its bearings was not being sufficiently lubricated. The investigation proceeds further and finds that the pump in the automatic lubrication mechanism was not pumping sufficiently, hence the lack of lubrication. Investigation of the pump shows that it has a worn shaft. Investigation of why the shaft was worn reveals that there is no adequate mechanism to prevent metal scrap from getting into the pump. This allowed scrap to enter the pump and damage it. The root cause of the problem is therefore that metal scrap can contaminate the lubrication system. Fixing this problem ought to prevent the whole sequence of events from recurring.

Compare this with an investigation that does not find the root cause. Replacing the fuse, the bearing, or the lubrication pump will probably allow the machine to go back into operation for a while. But there is a risk that the problem will simply recur until the root cause is dealt with.

Application domains

Root-cause analysis is used in many application domains.

RCA in Manufacturing and Industrial process control

The example above illustrates how RCA can be used in manufacturing. RCA is also routinely used in industrial process control, e.g. to control the production of chemicals (quality control). RCA is also used for failure analysis in engineering and maintenance.

RCA in IT and Telecommunications

Root-cause analysis is frequently used in IT and telecommunications to detect the root causes of serious problems. For example, in the ITIL service management framework, the goal of incident management is to restore a faulty IT service as soon as possible (reactive management). Problem management, by contrast, deals with solving recurring problems for good by addressing their root causes (proactive management).
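The machine investigation in the example above works by repeatedly asking why each observed effect happened. This style of questioning is often called the "five whys". The sketch below is a minimal illustration rather than a prescribed method. It records each answer as a mapping from an effect to its immediate cause and follows the chain until no further cause has been recorded. The dictionary `WHY` and the function `why_chain` are hypothetical names, and the entries paraphrase the example.

```python
from typing import Dict, List

# Each observed effect mapped to the immediate cause found by the investigation
# (the wording of each entry paraphrases the machine example).
WHY: Dict[str, str] = {
    "machine stopped": "overload blew the fuse",
    "overload blew the fuse": "bearing insufficiently lubricated",
    "bearing insufficiently lubricated": "lubrication pump not pumping enough",
    "lubrication pump not pumping enough": "pump shaft worn",
    "pump shaft worn": "metal scrap entered the pump",
    "metal scrap entered the pump": "no mechanism keeps scrap out of the lubrication system",
}


def why_chain(symptom: str, why: Dict[str, str]) -> List[str]:
    """Follow 'why?' answers from the symptom until no further cause is recorded."""
    chain = [symptom]
    seen = {symptom}
    while chain[-1] in why:
        cause = why[chain[-1]]
        if cause in seen:  # guard against circular explanations
            raise ValueError(f"circular explanation at {cause!r}")
        chain.append(cause)
        seen.add(cause)
    return chain


chain = why_chain("machine stopped", WHY)
root = chain[-1]
for step in chain:
    print(step)
```

The function only reports where the recorded chain ends. Deciding that this end point is the root cause, and not just the place where questioning stopped, remains the investigator's judgment.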
Another example is the computer security incident management process, where root-cause analysis is often used to investigate security breaches.[7] RCA is also used in conjunction with business activity monitoring and complex event processing to analyze faults in business processes.

RCA in Health & Safety

In the domains of health and safety, RCA is routinely used in medicine (diagnosis), epidemiology (e.g., to identify the source of an infectious disease), environmental science (e.g., to analyze environmental disasters), accident analysis (aviation and rail industry), and occupational safety and health.[8]

RCA in Systems analysis

RCA is also used in change management, risk management, and systems analysis.

General principles

Schools of root cause analysis take different approaches, and each application domain has its own specifics. Despite this, RCA generally follows the same four steps.

Identify and clearly describe the fault/problem

Clear problem statements and event descriptions (of failures, for example) are helpful, and usually required, to ensure that an appropriate root cause analysis is carried out.

Establish a timeline (history of events) from the normal situation until the fault/problem

RCA should establish a sequence of events, or timeline, to understand the relationships between contributory (causal) factors, the root cause, and the fault/problem under investigation.

Distinguish between the root cause and causal factors

RCA correlates this sequence of events with the nature, magnitude, location, and timing of the fault/problem, and possibly also with a library of previously analyzed faults/problems. This should enable the investigator(s) to distinguish between the root cause, causal factors, and non-causal factors. One way to trace root causes is to use hierarchical clustering and data-mining techniques (such as graph-theory-based data mining).
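The sketch below illustrates the timeline and correlation steps described above. It does not implement the clustering or data-mining techniques themselves. It merges event logs from several monitoring sources into one chronological timeline. It then keeps, as candidate factors, the events that occurred at the fault's location within a chosen time window before the fault. The `Event` fields, the window heuristic, and all log contents are hypothetical. Correlating by time and location alone is a deliberately crude assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List


@dataclass(frozen=True)
class Event:
    time: datetime
    source: str       # monitoring system that reported the event (hypothetical)
    location: str     # where it happened, e.g. a machine or host name
    description: str


def build_timeline(logs: Iterable[List[Event]]) -> List[Event]:
    """Merge per-source event logs into one chronological timeline."""
    return sorted((e for log in logs for e in log), key=lambda e: e.time)


def candidate_factors(timeline: List[Event], fault: Event,
                      window: timedelta) -> List[Event]:
    """Keep events at the fault's location that precede it within `window`."""
    start = fault.time - window
    return [e for e in timeline
            if e.location == fault.location and start <= e.time < fault.time]


# Hypothetical logs from two monitoring sources.
t0 = datetime(2024, 1, 1, 8, 0)
sensors = [
    Event(t0 + timedelta(minutes=50), "sensors", "press-1", "bearing temperature high"),
    Event(t0 + timedelta(minutes=5), "sensors", "press-2", "vibration spike"),
]
maintenance = [Event(t0, "maintenance", "press-1", "lubrication pump flow low")]
fault = Event(t0 + timedelta(minutes=55), "plc", "press-1", "fuse blew")

timeline = build_timeline([sensors, maintenance])
suspects = candidate_factors(timeline, fault, window=timedelta(hours=1))
for e in suspects:
    print(e.time, e.description)
```

Here the vibration spike is dropped because it occurred on another machine. The two press-1 events survive as candidates for further analysis. Whether they are causal still has to be established.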
Another way to trace root causes is to compare the situation under investigation with past situations stored in case libraries, using case-based reasoning tools.

Establish a causal graph between the root cause and the fault/problem

Finally, the investigator should be able to extract from the sequence of events a subsequence of key events that explain the fault/problem, and convert it into a causal graph.

From RCA to corrective actions

The goal of RCA is to identify the root cause of the fault/problem. The next step is to trigger long-term corrective actions that address the root cause identified during RCA, and to make sure that the fault/problem does not resurface. Correcting a fault/problem is not formally part of RCA, however. It is a separate step in a broader problem-solving process, which has different names in different domains: fault management in IT and telecommunications, repair in engineering, remediation in aviation, environmental remediation in ecology, therapy in medicine, etc.

Why is RCA difficult?

Without delving into the idiosyncrasies of specific problems, several general conditions can make RCA more difficult than it may appear at first sight.

First, important information is often missing, because it is generally not possible in practice to monitor everything and store all monitoring data for a long time.

Second, gathering data and evidence, and arranging them along a timeline of events leading to the final fault/problem, can be nontrivial. In telecommunications, for instance, distributed monitoring systems typically manage between 1 million and 1 billion events per day. Finding a few relevant events in such a mass of irrelevant events is akin to finding the proverbial needle in a haystack.

Third, there may be more than one root cause for a given fault/problem, and this multiplicity can make the causal graph very difficult to establish.

Fourth, causal graphs often have many levels, and root-cause analysis terminates at a level that is "root" in the eyes of the fault/problem investigator.
Consider again the industrial process control example above. A deeper investigation could reveal that the plant's maintenance procedures called for inspecting the lubrication subsystem every two years, while the current lubrication subsystem vendor's product specified a 6-month period. The vendor switch may have been driven by management's desire to save money, combined with a failure to consult engineering staff on how the change would affect maintenance procedures. Thus, addressing the "root cause" shown above may have prevented the quoted recurrence. It would not, however, have prevented other (perhaps more severe) failures affecting other machines.

Best practices in RCA

To be effective, root cause analysis must be performed systematically. A team effort is typically required. For aircraft accident analyses, for example, the conclusions of the investigation and the root causes that are identified must be backed up by documented evidence.[9]
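The causal-graph step, together with the multi-level graphs discussed under "Why is RCA difficult?", can be illustrated with a small sketch. It assumes the graph is given as (cause, effect) pairs. A "root" is taken to be any ancestor of the fault with no recorded cause of its own, so the answer depends on how far the graph has been extended. The node labels paraphrase the lubrication example. Where the deeper maintenance findings attach to the chain is an assumption made for illustration, and `root_causes` is a hypothetical helper.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (cause, effect)


def root_causes(edges: Iterable[Edge], fault: str) -> Set[str]:
    """Ancestors of `fault` that have no recorded cause of their own."""
    causes: Dict[str, List[str]] = defaultdict(list)
    for cause, effect in edges:
        causes[effect].append(cause)

    roots: Set[str] = set()
    seen = {fault}
    queue = deque([fault])
    while queue:  # breadth-first walk backwards from the fault
        node = queue.popleft()
        parents = causes.get(node, [])
        if not parents and node != fault:
            roots.add(node)  # the graph stops here, so this node counts as "root"
        for parent in parents:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return roots


FAULT = "machine overloaded, fuse blew"
shallow = [
    ("metal scrap contaminates lubrication system", "pump shaft worn"),
    ("pump shaft worn", "pump output low"),
    ("pump output low", "bearing under-lubricated"),
    ("bearing under-lubricated", FAULT),
]
# Assumed placement of the deeper maintenance findings in the graph.
deeper = shallow + [
    ("inspections every 2 years, vendor specifies 6 months", "pump output low"),
    ("vendor switched without consulting engineering",
     "inspections every 2 years, vendor specifies 6 months"),
]

print(root_causes(shallow, FAULT))
print(root_causes(deeper, FAULT))
```

Extending the graph by two levels changes the answer from one root to two. This shows both the multiplicity of root causes and the sense in which the analysis stops at a level that is "root" only from the investigator's point of view.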
Notes

1. See Wilson 1993, pp. 8–17.
2. See IATA 2016 and Sofema 2017.
3. See Manna 1995.
4. See Lewerentz 1995.
5. See Babaoglu 2005.
6. See Ohno 1988.
7. See Abubakar 2016.
8. See OSHA 2019.
9. See IATA 2016.

References
Abubakar, Aisha; Bagheri Zadeh, Pooneh; Janicke, Helge; Howley, Richard (2016). "Root cause analysis (RCA) as a preliminary tool into the investigation of identity theft". Proc. 2016 International Conference On Cyber Security And Protection Of Digital Services (Cyber Security).

Babaoglu, O.; Jelasity, M.; Montresor, A.; Fetzer, C.; Leonardi, S.; van Moorsel, A.; van Steen, M., eds. (2005). Self-star Properties in Complex Information Systems: Conceptual and Practical Foundations. LNCS. Vol. 3460. Springer.

IATA (8 April 2016). "Root Cause Analysis for Civil Aviation Authorities and Air Navigation Service Providers". International Air Transport Association. http://www.iata.org/training/courses/Pages/root-cause-analysis-talp37.aspx. Archived from the original on 8 April 2016: https://web.archive.org/web/20160408135838/http://www.iata.org/training/courses/Pages/root-cause-analysis-talp37.aspx. Retrieved 17 November 2017. "Key steps to conducting an effective root cause analysis, which tools to use for root cause identification, and how to develop effective corrective actions plans."

Lewerentz, Claus; Lindner, Thomas, eds. (1995). Formal Development of Reactive Systems: Case Study Production Cell. LNCS. Vol. 891. Springer.

Manna, Zohar; Pnueli, Amir (1995). Temporal Verification of Reactive Systems: Safety. Springer. ISBN 978-0387944593.

Ohno, Taiichi (1988). Toyota Production System: Beyond Large-Scale Production. Portland, Oregon: Productivity Press. p. 17. ISBN 0-915299-14-3.

OSHA; EPA. "FactSheet: The Importance of Root Cause Analysis During Incident Investigation" (PDF). Occupational Safety and Health Administration. https://www.osha.gov/Publications/OSHA3895.pdf. Retrieved 22 March 2019.

Sofema (17 November 2017). "Root Cause Analysis for Safety Management Practitioners & Business Area Owners". Sofema Aviation Services. https://sassofia.com/course/root-cause-analysis-safety-management-practitioners-business-area-owners-2-days/. Archived from the original on 17 November 2017: https://web.archive.org/web/20171117220831/https://sassofia.com/course/root-cause-analysis-safety-management-practitioners-business-area-owners-2-days/. Retrieved 17 November 2017. "Identify best practice techniques and behaviours to perform effective Root Cause Analysis (RCA)"

Wilson, Paul F.; Dell, Larry D.; Anderson, Gaylord F. (1993). Root Cause Analysis: A Tool for Total Quality Management. Milwaukee, Wisconsin: ASQ Quality Press. ISBN 0-87389-163-5.