A 'common mode failure' occurs when the failures of several components are not statistically independent. That is, a single event causes multiple systems to fail.
An example is when all of the pumps for a fire sprinkler system are located in one room. If the room becomes too hot for the pumps to operate, they will all fail at essentially the same time, from one cause (the heat in the room).
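The sprinkler example can be made quantitative with a small model. The sketch below is a hypothetical illustration, not a standard method: it assumes each pump fails independently with probability `p_pump`, and that the shared room separately overheats with probability `p_room_overheats`, which disables every pump at once. The function name and all numbers are made up for illustration.

```python
def prob_all_pumps_fail(p_pump: float, n_pumps: int, p_room_overheats: float) -> float:
    """Probability that every pump fails in a year (hypothetical model).

    Assumes pump failures are independent of each other and of the room
    overheating; an overheated room (the common cause) disables all pumps.
    """
    # Chance that all pumps fail on their own, with no common cause.
    independent_all_fail = p_pump ** n_pumps
    # All pumps fail if the room overheats, or if it does not and each pump fails anyway.
    return p_room_overheats + (1 - p_room_overheats) * independent_all_fail


# Illustrative numbers only: two pumps, each failing once in 1,000 years.
without_common_cause = prob_all_pumps_fail(1e-3, 2, 0.0)
with_common_cause = prob_all_pumps_fail(1e-3, 2, 1e-4)
print(f"{without_common_cause:.2e}")  # 1.00e-06
print(f"{with_common_cause:.2e}")     # 1.01e-04
```

Under these assumed numbers, a shared cause that strikes only once in 10,000 years makes losing both pumps about a hundred times more likely than independent failures alone would suggest.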
The principle of redundancy states that, when the failures of components are statistically independent, the probability that they all fail is the product of their individual failure probabilities. Thus, for instance, if the probability of failure of a component of a system is one in one thousand per year, the probability that two such components both fail is one in one million per year, provided that their failures are statistically independent. This principle favors the strategy of building redundant components into a system. One place this strategy is implemented is RAID 1, where two hard disks store a computer's data redundantly.
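The multiplication rule can be checked directly. This minimal sketch assumes, as the principle requires, that the components fail independently; the helper name `joint_failure_probability` is hypothetical. It reproduces the one-in-a-million figure for two components that each fail once in a thousand years, which is also the chance that both disks of a two-disk mirror fail if their failures are independent.

```python
import math


def joint_failure_probability(probabilities: list[float]) -> float:
    """Probability that all components fail, assuming statistically independent failures."""
    # Under independence, the probability of joint occurrence is the product.
    return math.prod(probabilities)


# Two components, each failing with probability 1/1000 per year.
both_fail = joint_failure_probability([1e-3, 1e-3])
print(f"{both_fail:.0e}")  # 1e-06
```

Adding a third independent component would multiply the result by another factor of 1/1000, which is why redundancy is attractive when independence actually holds.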
By contrast, if the failures of two components are maximally statistically dependent (one fails whenever the other does), the probability that both fail is the same as the probability that each fails individually. In such a case, the advantages of redundancy are lost. Strategies for avoiding common mode failures include keeping redundant components physically isolated.
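The contrast between independent and maximally dependent failures can be seen in a short Monte Carlo simulation. In this sketch (the function name and parameters are hypothetical), two identical components each fail with probability `p`. In the independent case each draws its own random number; in the dependent case both share one draw, modelling a single common cause. A fairly large `p` is used only so the estimates converge quickly.

```python
import random


def simulate_joint_failure(p: float, shared_cause: bool,
                           trials: int = 200_000, seed: int = 0) -> float:
    """Estimate the probability that two identical components both fail.

    shared_cause=False: each component uses its own random draw (independent).
    shared_cause=True: both use the same draw, so they always fail together.
    """
    rng = random.Random(seed)  # seeded for reproducible estimates
    both = 0
    for _ in range(trials):
        u1 = rng.random()
        u2 = u1 if shared_cause else rng.random()
        if u1 < p and u2 < p:
            both += 1
    return both / trials


p = 0.1
independent_estimate = simulate_joint_failure(p, shared_cause=False)
dependent_estimate = simulate_joint_failure(p, shared_cause=True)
print(independent_estimate)  # close to p * p = 0.01
print(dependent_estimate)    # close to p = 0.1
```

With the shared draw, the second component adds no protection at all: the joint failure probability stays at `p` instead of dropping to `p * p`.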
A prime example of redundancy with isolation is a nuclear power plant. The new Advanced Boiling Water Reactor (ABWR) has three divisions of Emergency Core Cooling Systems, each with its own generators and pumps and each isolated from the others. The new European Pressurized Reactor has two containment buildings, one inside the other. Even so, a common mode failure cannot be ruled out entirely; one could be caused, for example, by a highly unlikely Richter 10 earthquake.
See also
★ Nuclear safety
★ Probabilistic risk assessment