Safety thinking: A brief summary of Safety-I and Safety-II (2024)

The past decades have seen a shift in safety thinking, which has been inspired by reflections on incidents, accidents, and safety efforts in various industries. The present summary seeks to briefly outline the underlying thoughts and assumptions of the traditional view of safety, commonly referred to as Safety-I, and aspirations to think about and do safety differently (Safety-II). [1]

Safety-I

Safety has traditionally been defined as the absence of unwanted outcomes, such as incidents, accidents, or injuries. Today, freedom from unacceptable risk is often equated with high levels of safety: operation is deemed safe when the number of things that go wrong is acceptably low. Safety is thus understood as the absence of negatives, a condition in which the number of adverse outcomes (near misses, incidents, accidents) is as low as possible. This definition largely accords with the original meaning of ‘safety’: to be uninjured, unharmed, and not exposed to danger.

In Safety-I, efforts to improve safety mainly focus on what goes wrong or could go wrong. Safety is measured indirectly, by the absence of negatives: safety is high when the number of negative events is low, and vice versa. Focusing on negatives is assumed to allow blocking the transition of an organisation from a normal (functioning) to an abnormal (non-functioning) state, using barriers, automation, redundancy, and so on. Safety efforts thus largely pursue a mission zero: to reduce the number of incidents and accidents, and all sorts of frequency rates (e.g. LTIFR, TRIFR), to zero. Lower numbers are equated with progress on safety.
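To make the measurement logic concrete, frequency rates such as LTIFR are lagging indicators computed purely from counts of negative events. A minimal sketch in Python, using the common convention of lost-time injuries per million hours worked (some schemes normalise per 200,000 hours instead); the figures are hypothetical:

```python
def ltifr(lost_time_injuries: int, hours_worked: float) -> float:
    """Lost Time Injury Frequency Rate: lost-time injuries per million
    hours worked (a common convention; normalisation bases vary)."""
    return lost_time_injuries * 1_000_000 / hours_worked

# Hypothetical figures: 3 lost-time injuries over 2.5 million hours worked.
print(ltifr(3, 2_500_000))  # -> 1.2
```

On this logic, a falling LTIFR is read as progress on safety, which is precisely the assumption Safety-II later calls into question.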

Assumptions of Safety-I are partly mirrored in efforts to improve work processes, efficiency, and productivity. An early example is Frederick Taylor’s Scientific Management Theory. In 1911, Taylor suggested a set of steps to increase the performance of an organisation:

1. Work analysis: Analyse the best way of doing work. Workers’ tasks are broken down into elementary steps and movements, and the most effective performance is determined.
2. Selection: A best match is sought between workers’ capabilities and the requirements for successfully completing a task. Workers should be neither under- nor overqualified.
3. Training: Workers are instructed to apply and follow exactly the specified process deemed best in the analysis. People are trained to ensure specific performance and to remain within the set boundaries of their tasks and activities.
4. Compliance: Line supervisors constantly monitor workers’ performance and compliance. Incentives and awards are used to increase productivity.

Scientific Management Theory has influenced and shaped prevailing views of the roles of organisations and humans. An organisation or system (e.g. the collaboration of various organisations to achieve a certain goal, such as air transport) is considered basically safe because it can be thoroughly designed and described. Technologies and processes are well established and controlled. Procedures are correct, complete, and applicable to any work situation. System developers are capable of anticipating and designing for all contingencies.

For work to succeed, people only have to follow the specified rules and procedures. They are not supposed to improvise. Variability of human performance is considered harmful and has to be prevented as far as possible. Yet workers do not always meticulously execute work as planned or imagined by management; sometimes they deviate from the specified processes (work as done). This makes people a liability, a problem to control, or even a threat to safety. To avoid negative events, humans thus have to be controlled. The replacement of humans with automation is regarded as a valuable approach to reducing and eliminating human error.

In Safety-I, the principle of safety management is reactive: changes are made when a negative event has occurred or when something is deemed an unacceptable risk. Accidents are caused by malfunction and failure. All causes have an effect, and all effects have a cause. Things go wrong due to differences between work as imagined by management and work as done by the workforce. The relationship between cause (e.g. an operator’s inadequate decisions) and effect (e.g. an accident) is deemed linear and unproblematic. Investigations have to reveal the causes and contributing factors, reasoning backwards from the negative event. This often means identifying the components, both technical and human, that have failed. The sequence of events leading up to the accident is traced back in time until a (plausible) root cause is found (or constructed).

Reflections on Safety-I

The world has continuously changed since the Industrial Revolution. During the Age of Technology, from around 1760 to the late 1970s, numerous technologies were invented, such as the steam engine and railway lines. It was of course important to ensure that the new technologies worked reliably and without causing harm to humans and the environment. In case of a malfunction, the faulty machine was taken apart until the broken parts were found and replaced, since the functioning of the machine results from the functioning of all of its parts.

Large accidents in the 1970s and 1980s (e.g. Three Mile Island in 1979, Chernobyl in 1986) pointed out that the Human Factor had initially been left out of the equation. In the Age of Human Factors, it thus made sense to apply to the human element the same methods and methodologies that had been successfully applied to technologies (e.g. root cause analysis, reductionism). Like technologies, humans are either successful or unsuccessful (bimodality principle). Somebody must have failed when something went wrong, and the larger the accident and the higher the number of injuries and fatalities, the more severe someone’s mistakes must have been: large and serious outcomes have equally large and serious causes (belief in the proportionality between cause and effect). The more serious an event, the more can be learnt from it. To prevent accidents from occurring and reoccurring, the faulty human operators, like broken technical components, have to be identified, retrained, or replaced (Bad Apple Theory).

The period from the late 1970s onward is characterised by production that increasingly had to be faster, better, and cheaper. Accidents such as the explosion of the Space Shuttle Challenger in 1986 tragically pointed out the potential consequences of such an aggressive strategy and the need to account for organisational factors. During the Age of Safety Management, the development and use of Safety Management Systems became a central and typical HSE effort in many organisations and industries.

As such, the focus of safety efforts has expanded from technologies to human, organisational, and systemic factors. Yet the underlying assumptions about the functioning of technologies, humans, organisations, and systems have hardly changed. To improve safety and the performance of humans and organisations, the “broken” human and organisational components need to be identified and fixed. Accident investigations largely apply complex linear models (e.g. the Swiss Cheese Model) to identify the broken parts, equivalent to holes in the layers of defence. Human and organisational factors are dealt with in the same way as technologies, following the assumption that success and failure have different underlying mechanisms.
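The Swiss Cheese Model lends itself to a simple quantitative reading that also exposes its linear assumptions: if the layers of defence fail independently, an accident requires all the holes to line up at once, and its probability is just the product of the per-layer probabilities. A minimal sketch with hypothetical numbers:

```python
from math import prod

# Hypothetical probabilities that each defence layer has a "hole" at the
# critical moment (e.g. design, maintenance, crew, warning system).
hole_probabilities = [0.01, 0.02, 0.05, 0.1]

# Linear-model assumption: layers fail independently, so an accident
# occurs only when every hole lines up simultaneously.
p_accident = prod(hole_probabilities)
print(f"P(accident) = {p_accident:.1e}")  # -> 1.0e-06
```

The critique developed in the following paragraphs is precisely that complex systems rarely honour this independence and linearity.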

Safety-I assumptions may apply to very simple work processes and systems that are well understood, tested, and relatively uncomplicated. Yet compared to the beginning of the 20th century, when Taylor proposed Scientific Management Theory, today’s work environment has changed dramatically. Traffic volume has increased (e.g. in the air and on the road). Computerisation and automation seem unstoppable. Engines have become more powerful. Transport takes place at higher speeds and over greater distances. Production is still expected to become even faster, better, and cheaper.

At the same time, the world is becoming increasingly complex, and so are operations in various socio-technical systems, such as aviation. Multiple diverse parts (e.g. people, operators, technologies, organisations) are interdependently connected and adapt to the surrounding conditions. New technologies, tools, and equipment are constantly developed and introduced. Organisations face harsh competition and are largely dependent on financial markets (e.g. oil price changes), which often requires company-internal adaptations, such as restructuring and redundancies. The result is a complex interplay of different actors and components that potentially gives rise to outcomes that are difficult or impossible to foresee.

As a consequence of increased complexity, socio-technical systems are becoming intractable, that is, difficult or even impossible to fully describe and model: by the time a system has been thoroughly described and modelled, it may already have changed and adapted. The system is interdependent with other systems and difficult to control. Elaborate descriptions with many details are necessary, and the principles of the system’s functioning are only partly known. Examples of intractable systems are aviation, emergency medical treatment, and military operations.

Many of today’s intractable systems stand in sharp contrast to the work environment at the beginning of the 20th century. Back then, work processes were relatively simple and could be described with few details. The principles of the functioning of work were largely known. The system was mostly independent of other systems, rather easy to control, and hardly changed while being described.

Rapidly developing technologies, such as new aircraft, are produced, delivered, and used across the continents. In the hangar, professionals are able to understand the functioning of complicated airliners in detail (albeit with considerable effort). Yet once released into operation all over the world, factors may come into play that are difficult to account for during the planning and design phase, such as differences in culture, training, use, or design assumptions (e.g. the meaning of the colour red). This is when complexity increases.

There is no doubt that Safety-I thinking has helped many industries become highly safe (such as commercial aviation or nuclear power generation). Accident rates have gradually decreased in ultra-safe systems, where the probability of an accident is on the order of 10⁻⁶ per event. Yet rates have also mostly become asymptotic, in the sense that they have reached a plateau: many systems are very safe, but they are hardly getting any safer. A small number of accidents continue to occur.
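A back-of-the-envelope calculation shows why some accidents persist even at this level of safety; the per-event probability comes from the text above, while the annual volume is a purely hypothetical figure:

```python
# Illustrative arithmetic only: accidents expected per year in an ultra-safe
# system. The volume below is a hypothetical assumption, not source data.
p_accident_per_flight = 1e-6     # probability per flight (from the text)
flights_per_year = 40_000_000    # assumed annual traffic volume

print(p_accident_per_flight * flights_per_year)  # -> 40.0 expected accidents
```

At high traffic volumes, even a 10⁻⁶ probability translates into a steady trickle of accidents, consistent with the plateau described above.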

These observations raise the question of the limits of traditional safety thinking as a means to further improve safety in complex systems. Whereas Safety-I assumptions may well have applied to work processes at the beginning of the last century, they might be limited or no longer unequivocally applicable to some of today’s increasingly complex systems and work environments.

Safety-II

Safety-II offers an alternative, complementary view of safety, questioning widely held assumptions. As outlined above, Safety-I regards a system as safe when negative events are absent. Safety-II questions whether this is a valid and logical conclusion: referring to safety as the absence of negatives means focusing on the lack of safety, on unsafety. Consequently, Safety-II defines safety as the presence of positives, such as people’s capabilities, capacities, and competencies that make things go right. Hence, safety is present when as many things as possible go right.

The need to define safety as the presence of positives is reflected in the regulator paradox: the safer the system, the less there is to measure. At a perfect level of safety, there is nothing to measure at all (until the next accident occurs). At this stage, it becomes impossible to demonstrate that safety efforts have any positive effect. Low counts of things that go wrong make expenditure on safety difficult to justify, and reductions in safety efforts are likely to follow. Focusing on diminishing numbers is thus a highly questionable measurement of safety, and the absence of negatives a poor predictor of safe operation in the future. For instance, managers had celebrated a seven-year absence of lost-time accidents on Deepwater Horizon just days before the rig exploded on 20 April 2010, killing 11 people. Today, various concepts and models are available to measure and determine what goes wrong (e.g. loss of situational awareness; slips, trips, and falls; inadequate knowledge; poor decision making; distraction; fatigue). In contrast, few methods exist to identify the presence of positives and why things usually go right.
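How weak a predictor the absence of negatives is can be made concrete with a simple probability argument. Assuming, purely for illustration, that serious incidents arrive as a Poisson process with a constant rate λ, the probability of observing none over a period T is e^(−λT), so even a long clean record is quite compatible with a substantial underlying risk; the rates below are hypothetical:

```python
from math import exp

def p_zero_incidents(rate_per_year: float, years: float) -> float:
    """P(no incidents in `years`) under a Poisson model: exp(-rate * years)."""
    return exp(-rate_per_year * years)

# Hypothetical underlying rates of serious incidents per year.
for rate in (0.1, 0.2, 0.3):
    print(f"rate {rate:.1f}/yr -> P(zero in 7 yrs) = {p_zero_incidents(rate, 7.0):.2f}")
# 0.1/yr -> 0.50; 0.2/yr -> 0.25; 0.3/yr -> 0.12
```

Under these assumptions, a system expected to suffer a serious incident once a decade still has an even chance of showing a spotless seven-year record.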

Safety-II challenges the prevailing attitude towards humans and organisations. Systems and organisations are no longer deemed basically safe and merely undermined by unreliable workers. Instead, humans are seen as a valuable and necessary resource for the flexibility and resilience an organisation needs to succeed. Workers are a solution to harness because they know the messy details of how to get the work done. Only the people doing the work know how to deal with time pressure, inadequate tools, incomplete system design, and trade-offs, such as being thorough and efficient at the same time. Rules and procedures are no longer regarded as entirely complete and applicable to any work situation. For work to succeed, people have to constantly adapt and adjust their performance to the local circumstances.

According to Scientific Management Theory, work processes have to be developed and specified by management, following assumptions about the working conditions at the sharp, operational end. In contrast, Safety-II underlines the importance of involving workers in the planning and improvement of work processes. Management needs to identify where and why workers deviate from specified procedures to get the job done. Differences between work as planned and work as done have to be identified and resolved.

In Safety-II, the principle of safety management is proactive: continuously trying to anticipate developments and events in an uncertain future. Safety efforts reach far beyond the assessment of visible risks and calculations of probability: paths towards unlikely, even unthinkable outcomes are explored and discussed. Voices of minorities are heard, no matter how small or seemingly irrelevant people’s concerns appear at the time. Safety efforts are made and maintained even when adverse events are absent, and discussions about safety and risks are kept alive even when everything looks safe.

Safety-II questions whether humans and technologies succeed and fail in the same way. For instance, the stopping of an elevator is performed by a technological mechanism that operates in a stable environment (e.g. the floors do not move up or down; the weight in the elevator cannot exceed a certain limit). Technologies are unable to adapt their performance unless programmed to do so. In contrast, humans are capable of adjusting their actions to the situation encountered, where conditions and outcomes might be unknown or only partly known. A situation might differ from what was expected or previously experienced by the worker, the organisation, colleagues, or management. In order to create safety, humans have to adapt their performance to the local conditions and circumstances.

Safety-II challenges the bimodality principle of human work and the assumption that human success and failure have different origins (the hypothesis of different causes). It is questioned whether success (acceptable outcomes) is solely the result of compliance (in the sense that work as done matches work as planned), whereas failure (unacceptable outcomes) is caused by error, malfunction, and non-compliance (work as done deviates from work as planned). Instead, success and failure, function and malfunction, are thought of as results of the same everyday work. Both result from identical processes: the things that go right and the things that go wrong have the same underlying mechanism and basically happen in the same way, regardless of the outcome. The same performance that usually leads to success sometimes leads to failure.
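A toy simulation, not from the source, can illustrate this ‘same mechanism’ claim: one and the same performance process, with ordinary variability, yields mostly acceptable outcomes and occasionally unacceptable ones, with no separate failure mechanism anywhere in the model:

```python
import random

random.seed(1)

def everyday_work(margin: float = 3.0) -> bool:
    """One unit of everyday work: performance varies around what the
    situation requires; the outcome is acceptable unless the variability
    exceeds the margin. All numbers are hypothetical illustrations."""
    variability = random.gauss(0.0, 1.0)  # identical process on every run
    return abs(variability) < margin      # True = things go right

outcomes = [everyday_work() for _ in range(100_000)]
print(f"share of things going right: {sum(outcomes) / len(outcomes):.4f}")
# ~0.997: the same performance that usually produces success
# occasionally produces failure, with no distinct cause.
```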

Procedures provide valuable guidance on how to successfully perform specific tasks. Yet rules and procedures cannot always be complete and specify work for every possible situation. For instance, no procedure existed for landing a DC-10 after the complete loss of its flight controls following an engine failure, as on United Airlines Flight 232 on 19 July 1989. Yet the crew managed to perform an emergency landing at Sioux City airport, saving 185 of the 296 people on board. Similarly, the survivors of the Piper Alpha rig explosion in 1988 were largely those who risked a 35-metre leap into the ocean, against the procedure of remaining on the platform in case of a fire.

In fact, many of today’s socio-technical systems have become so intractable that work situations are often underspecified in terms of procedures. Work can only be specified in detail for situations that are entirely understood. Taylorist ideas may apply to simple work situations and processes, yet a growing number of negative events cannot be explained by means of linear cause-effect relationships. An example is the friendly-fire shootdown of two U.S. Black Hawk helicopters on 14 April 1994 over northern Iraq. The more complex and less tractable a socio-technical system becomes, the greater the uncertainty about the details of how to perform the work. Hence, more is left to humans, and less to technology and automation.

In a world that is becoming increasingly complex, the Human Factor is the most valuable asset for system safety. Most of today’s systems and organisations do not succeed because they have been perfectly thought out and designed. They are successful and reliable because their people are flexible and able to adjust, at all levels of the organisation. In contrast to technologies, people have the ability to adapt their performance, adjust work to the existing conditions and local circumstances (e.g. resources and requirements), improvise when necessary, and create safety in a challenging environment. People can detect and intervene when something is about to go wrong. They come up with new ideas and improvements. Workers can apply and interpret procedures to match the conditions at work. They can identify and overcome problems, recognise present demands, and adjust their performance accordingly. They can make trade-offs between multiple, competing goals (e.g. economic efficiency, timeliness, and safety). Humans do the tasks that machines cannot do. They keep organisations and systems working in a complex, rapidly developing, and partly unpredictable world.

[1] This summary largely builds on the following publications:

Amalberti, R. (2001). The paradoxes of almost totally safe transportation systems. Safety Science, 37(3), 109-126.

Dekker, S. W. A. (2011). Drift into Failure: From Hunting Broken Components to Understanding Complex Systems. Farnham, UK: Ashgate.

Dekker, S. W. A. (2014). Safety Differently: Human Factors for a New Era. Boca Raton, FL: CRC Press.

Hollnagel, E. (2010). Safer Complex Industrial Environments: A Human Factors Approach. Boca Raton, FL: CRC Press.

Hollnagel, E. (2014). Safety-I and Safety-II: The Past and Future of Safety Management. Farnham, UK: Ashgate.

Leveson, N. G. (2011). Applying systems thinking to analyze and learn from events. Safety Science, 49(1), 55-64.

Woods, D. D., Dekker, S. W. A., Cook, R. I., Johannesen, L., & Sarter, N. B. (2010). Behind Human Error. Aldershot, UK: Ashgate.
