Surety White Paper
White Paper for the Surety Science and Engineering Workshop
September 23, 1998
Sponsored by the National Academy of Engineering, the National
Academy of Sciences, and the Department of Energy
Produced by Sandia National Laboratories
Introduction
Surety Science and Engineering refers to a newly generalized approach to
designing complex systems that assures:
- Reliability under normal circumstances
- Safety under abnormal circumstances
- Security and Use Control under hostile circumstances.
We propose that the basic approaches to reliability, safety, or
security and use control that have been used in the design of high
consequence, complex systems in the nuclear weapons program can be
generalized into a more rigorous and comprehensive surety discipline that
will apply to many complex systems. This paper describes a draft framework
for investigating this possibility, and it serves to frame the issues to
be examined and applied at the Workshop. In the discussion below, we
introduce the concept that surety solutions can be broadly classified into
four levels. For each level, there are attributes that help describe the
state of surety, and, in some cases, for each attribute there are
sublevels that define a range of effectiveness in the implementation of
the surety solution. Finally, we present eight basic approaches to
achieving the various levels of surety along with several examples of
their application. The appendix at the end of this paper amplifies on
these eight approaches.
Background
For the past 50 years, Sandia National Laboratories has been responsible for the
surety science and engineering of very high consequence systems, most
notably nuclear weapons and nuclear reactors. The resulting best practices
have been examined to construct a strategy that government departments
and agencies, together with their national laboratories, corporations, and
universities, can use to address surety challenges of the 21st century,
including the following:
- Manage increasing complexity of interdependent systems
- Counter cyber and physical terrorism
- Assure no nuclear yield by accident or hostile intent
- Counter the potential for nuclear proliferation
- Assure a reliable infrastructure so America is the place "where
things work."
- Provide solutions for the aging nuclear weapons stockpile
- Assure a safe, secure, and reliable energy supply
- Provide for a clean and sustainable environment
- Counter crime
- Provide safe and secure schools
- Advance solutions for aviation reliability, safety, and security
- Provide solutions for aging infrastructure
The increasing complexity and interdependence of these emerging
challenges stimulated the development of a proposed surety strategy for
the nation. Participants in the Workshop on Surety Science and Engineering
will examine and expand the conceptual methodology to test its potential
application to broader national challenges. The results will assist the
agencies responsible for those issues in communicating, if they desire to
do so, the issues and potential surety solutions to the new
106th Congress at a Surety Expo in the first week of February
1999.
The surety methodology
Sandia developed the methodology for surety science and engineering
under sponsorship by the Department of Energy. The method features
four steps:
- Select the degree of aggregation that is most likely to accommodate
a real solution.
- Determine which of four levels of surety (presented in this paper)
dominates the current state of surety at the chosen degree of
aggregation.
- Use each of the fundamental approaches (presented in this paper)
that is appropriate to the current level of surety or to higher levels to
stimulate potential solutions to surety challenges. The process is
similar to using simple machines like the screw, the lever, the inclined
plane, etc., to stimulate the invention of more complex machines, or
using the abstract concepts of energy, momentum, and mass to stimulate
solutions to problems in physics.
- Evaluate the feasibility of the potential solutions by considering
the cultural, social, political, economic, and technical drivers
affecting the issue.
The workshop will concentrate on the first three steps and briefly
register the degree of consensus for each option to explore the last step.
Additional work after the workshop by the interested stakeholders may
address the fourth step more thoroughly.
This white paper briefly describes the method and illustrates the four
levels of surety and the eight approaches with simple examples, system
level examples, one set of options for nuclear reactor surety, and the
tools that accompany each approach.
Selecting the degree of aggregation that is most likely to implement a
real solution
The first step recognizes that complex endeavors can often be
decomposed into an interdependent structural hierarchy. For example,
the defensive capability of the United States can be decomposed into
the hierarchy of services, weapon platforms, weapon systems,
subsystems, components, etc. The system is interconnected in that a
change at one place can affect the surety of the whole. The surety
methodology can be applied at any level of aggregation. Generally,
the higher the level of aggregation, the greater the residual
ambiguity and the more difficult it is to assure surety. The optimum
approach will be different for different degrees of aggregation.
Picking the level that can best accommodate a solution is a very
important first step.
Example hierarchy, from the highest to the lowest degree of aggregation:
- Defense Establishment
- Services
- Weapon Platforms
- Weapon
- Subsystems
- Components
Each situation is unique. There are no firm guidelines for the first
step. The discretion and judgment of informed practitioners are required.
One approach to achieving informed judgment is to assemble a team of
knowledgeable participants, whose experiences cover as many aspects of the
challenge as possible, and have them share insights on the current state
and on the level of aggregation most likely to effect positive change.
The four levels of surety and their application
Identifying the approachable Degree of Aggregation lets the group discuss
the Level of Surety that best describes the relevant state of surety.
Our examination of surety solutions developed over the past 50 years
revealed four levels of surety, each identified by its principal reliance
on one of four foundations of surety: things just working as intended,
human intervention, science and engineering, and the laws of nature and
mathematics. The progression from intrinsically lower surety to higher
surety as one changes the principal reliance is not obvious. One goal of
the workshop is to test the universal validity of this observation.
The Four Levels of Surety
| I. | Working Sufficiently as Expected and Buying Insurance to Cover Upsets |
| II. | Surety by Proactive Human Intervention |
| III. | Surety by Positive Measures from Science and Engineering |
| IV. | Surety from Laws of Nature and Mathematics |
Each of the higher levels of surety depends on the lower levels to make
the whole system work, as illustrated in Table 1.
Table 1: Examples of the Four Levels of Surety and their relative dependencies
| Examples Using Safety | Level I -- Working as Expected | Level II -- Proactive Human Intervention | Level III -- Science and Engineering Positive Measures | Level IV -- Surety from First Principles of Nature |
| Goal for all known threats | | | | |
| Nuclear weapons with ENDS | | | | |
| Pantex operations | | | | |
| Nuclear reactors | | | | |
| Aircraft safety | | | | |
| Conventional Military operations | | | | |
| Highway transportation | | | | |
| Industrial safety | | | | |
| Key: | Primary reliance | Secondary reliance |
At Level I, the accountable decision maker assumes everything will work
sufficiently as expected but buys insurance to cover upsets, just in case.
Although system designers rely on foresight to assure surety and design
the system for reliable, safe, and secure operation, foresight is often
insufficient and there is no special consideration of off-normal
conditions. At best, reactive responses mitigate consequences. This level
of surety is quite adequate for many applications in which more robust
surety measures would cost more than they appear to be worth. The
surety of consumer products falls in this category since limited
warranties are agreeable to both the customer and supplier. Until
recently, school safety and security have been acceptable at Level I. The
recent rise in vandalism and lethal assaults on students by other students
stimulates the search for higher levels of surety without accompanying
detrimental consequences.
At Level II, surety relies on Proactive Human Intervention. A plan is
in place and uses highly disciplined human action to control the
environment for the operation, to perform the operation reliably, and to
respond in case of emergency. If the team has difficulty deciding whether
the activity is currently at Level II, it can compare its knowledge of the
current surety practice with the attributes of Level II Surety shown in
Table 2 and defined as follows:
- Organizational culture - attitudes of the organization and workforce
- Oversight - review of design and operations
- Performance - personnel skills, knowledge, and experience
- Operational controls - procedures and other measures used by
personnel in performing operations
- Environmental controls - human initiated controls implemented to
control the environment under which the operation is performed
- Emergency response - personnel response to emergency
situations
Incremental improvement in surety within a level can be
achieved by improving the rigor of the processes supporting each of these
attributes to progress to higher sublevels. Alternatively, the improvement
effort can target a move to a higher surety level.
Table 2: The Attributes of Level II Surety by Sublevel
| Level 2 | Culture | Oversight | Performance | Operational Controls | Environmental Controls | Emergency Response |
| Sub-Level 3 | Self-actualized | Independent assessment | Highly skilled, knowledgeable, certified | Formalized and controlled procedures | Formalized and controlled to preclude undesirable environment | Dedicated response effort |
| Sub-Level 2 | Compliant | Self-assessment | Some experience, trained, depth not breadth | Written procedural guidelines | Guidelines exist to reduce likelihood of undesired environment | Trained response personnel (not dedicated) |
| Sub-Level 1 | Fear and ignorance | No assessment | Novice, checklist follower | Training for normal situations only | Operation free of environmental controls | Ad hoc |
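As a rough illustration of how a team might record such an assessment, the sketch below stores a sublevel (1 to 3) for each of the six Level II attributes and lists the attributes that sit at the lowest sublevel as candidates for incremental improvement. The attribute keys, the `weakest_attributes` helper, and the idea of starting with the weakest attributes are assumptions made for illustration; the paper does not prescribe a scoring rule.

```python
# Hypothetical sketch: record a Level II assessment as attribute -> sublevel.
# The attribute keys mirror the six Level II attributes; the names are ours.
LEVEL_II_ATTRIBUTES = (
    "culture",
    "oversight",
    "performance",
    "operational_controls",
    "environmental_controls",
    "emergency_response",
)


def weakest_attributes(assessment):
    """Return (lowest sublevel, attributes at that sublevel, in table order)."""
    missing = [name for name in LEVEL_II_ATTRIBUTES if name not in assessment]
    if missing:
        raise ValueError(f"missing attributes: {missing}")
    for name in LEVEL_II_ATTRIBUTES:
        if assessment[name] not in (1, 2, 3):
            raise ValueError(f"{name}: sublevel must be 1, 2, or 3")
    lowest = min(assessment[name] for name in LEVEL_II_ATTRIBUTES)
    weakest = [name for name in LEVEL_II_ATTRIBUTES if assessment[name] == lowest]
    return lowest, weakest


# Illustrative values only; they do not describe any real operation.
example = {
    "culture": 2,
    "oversight": 3,
    "performance": 2,
    "operational_controls": 3,
    "environmental_controls": 1,
    "emergency_response": 1,
}

if __name__ == "__main__":
    print(weakest_attributes(example))
```

A team could equally decide that a move to Level III is the better investment; the sketch only makes the within-level option explicit.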
At Level III, surety relies principally on positive measures from
science and engineering. Positive measures are additional features
employed specifically to improve surety.
Engineering and scientific measures are proactively developed and
maintained to control the environment for the operation, to achieve sure
performance, and to respond in case of emergency. Examples of Level III
surety include nuclear reactors, ballistic missile defense, self-healing
telecommunication routers, and nuclear weapons without modern safety
features.
If the team has difficulty deciding whether the activity is currently at
Level III, it can compare its knowledge of the current surety practice with
the attributes of Level III Surety shown in Table 3 and defined as follows:
- Predictability - degree of assured performance given implementation
of engineered and scientific measures
- Range of effectiveness - range of situations over which given
measures are effective
- Theme and reliance on principles - development of a surety theme and
the principles upon which the theme rests
- Design implementation - implementation of scientific and engineered
measures into the theme
- Environmental controls - scientific and engineered measures to
ensure control of the operational environment
- Emergency response - technology and its availability to support
emergency response.
Table 3: The Attributes of Level III Surety by Sublevel
| Level 3 | Predictability | Range of Effectiveness | Theme/Principles | Implementation | Environmental Controls | Emergency Response |
| Level 3, Sub-Level 3 | Only one response likely, almost certain | All credible threats | Robust theme | Several options with redundancy | Precludes environment | Dedicated infrastructure and logistics |
| Level 3, Sub-Level 2 | Control certain responses, not entire range | Some set of threats (multiple) | Theme, but not for all threats | Single point, no redundancy | Reduces likelihood | Standby equipment |
| Level 3, Sub-Level 1 | Little or none | Single threat | No theme | No theme to implement | Not considered | Inventive |
Level III Surety can have problems. Designs age or are flawed. Software
has bugs. Hardware fails. Sequences unfold in unexpected and escalating
ways. These problems are minimized at Level IV by reducing the reliance on
engineered effectiveness and simplifying the situation to rely principally
on the laws of nature and mathematics and, to the extent possible, only
on those laws.
At Level IV, surety relies principally on the laws of nature and
mathematics.
The designer uses first principles to constrain the physically
accessible parameter space in the quest for the physical impossibility of
undesired consequences. Continuous assessment moves the system towards
absolute surety. Accountable personnel identify and respond to changes in
the system or challenges to surety, ensure design principles are
maintained over the life of the system, broaden their understanding of
performance of the system over the entire range of circumstances that may
occur, and obtain the utmost confidence and predictability.
The eight approaches for surety
The four levels of surety provide a framework for
assessing the intrinsic level of surety and provide some general guidance
for increasing the state of surety. We also seek a more explicit
framework for developing surety solutions. Beyond the analogies
with simple machines and with the conserved quantities of energy,
momentum, and mass in physics, we looked for other examples that turned a
list of best practices into a more powerful framework. The definition of
the "basic operations" of chemical engineering (filtration, heat transfer,
distillation, etc.) at MIT in the early 1940s provided sufficient rigor
to the existing best practices of the industry to create the discipline of
chemical engineering. We sought a set of principles that might stimulate a
similar development for surety-related challenges.
Many surety-related challenges involve human beings as complex adaptive
systems, that is, systems that learn from experience and adapt their future
behaviors accordingly. This adaptive nature makes security and use control
in hostile circumstances particularly difficult. Our search did not reveal
a general theory of complex adaptive systems that could be tested for our
purposes. Therefore, we examined how human beings (probably the most
evolved complex adaptive system) have learned to tackle complex
challenges and have incorporated those ways into a toolkit for cognitively
solving complex problems. After all, human beings have created some very
robust solutions to complex challenges, e.g., the tribe, agriculture,
money, and the scientific method.
Elliott Jaques studied how people approached work and published The
General Theory of Bureaucracy in 1974. It offers a framework that can
be adapted to our purposes. He and his colleagues further applied their
framework over the next 25 years. Based on their work and our
investigations into many surety solutions, we have produced eight basic
approaches to surety:
- Foresight and good practices
- Mitigation after the fact and correcting what went wrong
- Proper operations with thorough science-based understanding,
independent assessment, and continuous improvement
- Administrative and/or engineered controls to reduce the probability
of occurrence
- All relevant positive measures must succeed
- Only one of many positive measures is necessary for success
- Predictable cumulative, comparative, and adaptive positive measures
- Reliance as much as possible upon laws of nature to approach
physical impossibility of high consequences
The detailed derivation of these approaches is beyond the scope of this
white paper. Furthermore, we do not contend that the eight approaches are
unique or that their description is the best one. We welcome improvements
and hope that the workshop will stimulate improvements or additions.
We then examined how the eight approaches relate to the four Levels of
Surety, each identified by the principal reliance on one of four
foundations of surety: things just working as intended, human
intervention, science and engineering, and the laws of nature and
mathematics. The results show the eight approaches are distributed across
the four levels, with one approach spanning the boundary between human
intervention on one side and science and engineering on the other. These
results are illustrated in Tables 4, 5, and 6, in which the approaches are
designated 1 through 8 and are assigned to their corresponding Level I,
II, III, or IV, e.g., I:1.0 denotes the first approach and its association
with Surety Level I. Table 4 includes an illustration of a simple
(relatively low degree of aggregation) example of each approach. Table 5
shows a system-level example of each approach. Table 6 lists the tools
that are most important to each approach. Together, these tables serve to
operationally define the approaches, so they can be applied to stimulating
the creation of new surety solutions at the Workshop.
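The designations used in Tables 4 through 7 (for example `III:4.5`) pair a surety level with an approach number, and the approach that spans the boundary between Levels II and III appears as both 4.0 and 4.5. A minimal sketch of that mapping as a data structure follows; the names `Designation`, `parse_designation`, and `approaches_by_level` are hypothetical and are introduced only for illustration.

```python
from typing import NamedTuple

# Roman numerals for the four levels of surety.
ROMAN = {"I": 1, "II": 2, "III": 3, "IV": 4}


class Designation(NamedTuple):
    level: int       # 1..4, the level of surety
    approach: float  # 1.0..8.0; 4.5 is the engineered half of approach 4


def parse_designation(text):
    """Split a designation such as 'III:4.5' into level and approach."""
    level_part, approach_part = text.split(":")
    return Designation(ROMAN[level_part.strip()], float(approach_part))


# Approach labels as they appear in Table 4.
APPROACHES = {
    "I:1.0": "Foresight and good practices",
    "I:2.0": "Mitigation after the fact and correcting what went wrong",
    "II:3.0": "Proper operations with thorough science-based understanding",
    "II:4.0": "Administrative controls reduce the probability of occurrence",
    "III:4.5": "Engineered controls reduce the probability of occurrence",
    "III:5.0": "All relevant positive measures must succeed",
    "III:6.0": "Only one of many positive measures is necessary for success",
    "III:7.0": "Predictable cumulative/comparative/adaptive positive measures",
    "IV:8.0": "Rely as much as possible upon laws of nature",
}


def approaches_by_level():
    """Group approach numbers under the level they are assigned to."""
    grouped = {}
    for key in APPROACHES:
        designation = parse_designation(key)
        grouped.setdefault(designation.level, []).append(designation.approach)
    return grouped


if __name__ == "__main__":
    print(approaches_by_level())
```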
One of the goals of the workshop is to use the methodology with four
levels of surety and the eight approaches to order existing solutions and
invent new ones for each national challenge. Application of the method to
the surety of nuclear reactors produced the results in Table 7. We hope
the workshop will produce similar templates of solution options for a wide
variety of national challenges.
Table 4: Mapping of the eight approaches onto the four Levels of Surety
and illustration of each approach with a simple example.
Levels of Surety, Surety Approaches, and Simple Examples
| Level | Approach | Simple Example of a Subsystem or Component |
| Everything Working Sufficiently as Intended | I:1.0 Foresight and good practices | Liability insurance |
| Everything Working Sufficiently as Intended | I:2.0 Mitigation after the fact and correcting what went wrong | Airline accident investigation |
| Proactive Human Intervention | II:3.0 Proper operations with thorough science-based understanding, independent assessment, and continuous improvement | Simulator training and requalification of airline pilots |
| Proactive Human Intervention | II:4.0 Administrative control reduces the probability of occurrence | X-ray and metal screening at airports |
| Positive Measures from Science and Engineering | III:4.5 Engineered controls reduce the probability of occurrence | Automatic breathalyzer or blood alcohol monitor before car can be started |
| Positive Measures from Science and Engineering | III:5.0 All relevant positive measures must succeed | Seat belts and air bags in automobiles in near-fatal accidents |
| Positive Measures from Science and Engineering | III:6.0 Only one of many positive measures is necessary for success | Coded car door locks and keyed ignitions in automobiles |
| Positive Measures from Science and Engineering | III:7.0 Predictable cumulative/comparative/adaptive positive measures | Anti-lock brakes in cars |
| Laws of Nature and Mathematics | IV:8.0 Rely as much as possible upon laws of nature to approach physical impossibility of high consequences | Passively fused electrical circuit |
Table 5: Illustration of each approach with a system-level example.
Levels of Surety, Surety Approaches, and System Level Examples
| Level | Approach | System Level Example |
| Everything Working Sufficiently as Intended | I:1.0 Foresight and good practices | School security |
| Everything Working Sufficiently as Intended | I:2.0 Mitigation after the fact and correcting what went wrong | Software surety by the design-test-fix method |
| Proactive Human Intervention | II:3.0 Proper operations with thorough science-based understanding, independent assessment, and continuous improvement | Predictive reliability through well-diagnosed experiments, tests, and simulations |
| Proactive Human Intervention | II:4.0 Administrative control reduces the probability of occurrence | Administrative controls based on systemic understanding of each airline's system reduce the probability of an airline safety occurrence and save money |
| Positive Measures from Science and Engineering | III:4.5 Engineered controls reduce the probability of occurrence | Automated and autonomous controls of the electric power grid |
| Positive Measures from Science and Engineering | III:5.0 All relevant positive measures are necessary for success | Cooling and Loss of Coolant Systems in a nuclear reactor |
| Positive Measures from Science and Engineering | III:6.0 Only one of many positive measures is necessary for success | Ballistic missile defense with independent exo, endo, and terminal layers |
| Positive Measures from Science and Engineering | III:7.0 Predictable cumulative/comparative/adaptive positive measures | Self-healing telecommunications switches like the 5ESS from AT&T sense problems and the state of alternative paths and offload capacity to assure reliability |
| Laws of Nature and Mathematics | IV:8.0 Rely as much as possible upon laws of nature to approach physical impossibility of high consequences | Passive, self-safing reactors like the HTGR and the MHR |
Table 6: The most important tools for each approach. The tools for one
approach generally rely on the tools of the preceding approaches.
Levels of Surety, Surety Approaches, and Preferred Tools
| Level | Approach | Preferred Tools (Cumulative) |
| Everything Working Sufficiently as Intended | I:1.0 Foresight and good practices | Statistics, Quantitative Economic Analysis |
| Everything Working Sufficiently as Intended | I:2.0 Mitigation after the fact and correcting what went wrong | Conduct of Operations, Fishbone Diagrams, Pareto Analysis |
| Proactive Human Intervention | II:3.0 Proper operations with thorough science-based understanding, independent assessment, and continuous improvement | Operations Research and Decision Analysis, Gaming, Red Teaming, Black Hatting, Model-based Simulations, Data Mining, Human Factors |
| Proactive Human Intervention | II:4.0 Administrative control reduces the probability of occurrence | Probabilistic risk assessment and consequence management |
| Positive Measures from Science and Engineering | III:4.5 Engineered controls reduce the probability of occurrence | Probabilistic risk assessment and consequence management |
| Positive Measures from Science and Engineering | III:5.0 All relevant positive measures must succeed | System design, engineering, and synthesis using model-based simulations in high-performance computers with databases that are validated by well-diagnosed experiments |
| Positive Measures from Science and Engineering | III:6.0 Only one of many positive measures is necessary for success | (same cell as III:5.0) |
| Positive Measures from Science and Engineering | III:7.0 Predictable cumulative/comparative/adaptive positive measures | (same cell as III:5.0) |
| Laws of Nature and Mathematics | IV:8.0 Rely as much as possible upon laws of nature to approach physical impossibility of high consequences | (same cell as III:5.0) |
Table 7: Application of Surety Methodology to Reactor Safety. Current
practice is Level III, Approach 4.5 with some applications at higher
approaches.
Reactor Safety
| Level: Approach | Surety Solution |
| I:1.0 Good Practices | Standard Operating Procedures |
| I:2.0 Fix Problems | Conduct of Operations |
| II:3.0 Proactive understanding | "Eyes of the Outsider" Assessment and Emergency Operational Exercises |
| II:4.0 Prevention by Administrative Controls | Watchers watching the watchers |
| III:4.5 Prevention by Engineering Controls | Active independent parallel coolant system for Loss of Coolant Accident |
| III:5.0 All Positive Measures must succeed | Passive independent cooling system in Loss of Coolant Accident |
| III:6.0 One of many Positive Measures must succeed | Cost-effective IMEMS-based strong-link/weak-link system to assure predictable operation |
| III:7.0 Cumulative-Comparative-Adaptive | Cost-effective IMEMS-based sensor-processor-actuator system to monitor state of health and automatically adapt system |
| IV:8.0 Laws of Nature and Mathematics | Passively self-safing reactor dynamics, possibly MHR and HTGR |
Appendix: The eight approaches of surety
Graphical illustrations of the eight approaches of surety and
additional examples of each complete the description of this proposed
methodology for enhancing surety.
Approach I:1.0 Reliance on foresight of designers
and good practices of people
Examples
- Liability Insurance
- School security
- Design and manufacturing of most consumer products
- Nuclear Nonproliferation in India and Pakistan (mistake
at Level I)
This approach supports Level I: Surety by
everything working sufficiently as intended.
Approach I:2.0 Mitigation after the fact by
coordinated emergency response and correcting what went
wrong
Examples
- Investigations of airline and nuclear reactor incidents
and accidents and retrofit of units or systems to correct
faults.
- Hardware or software reliability by
design-test-fix
This approach supports Level I:
Surety by everything working sufficiently as intended.
Approach II:3.0 Surety is maintained by proper
operations with thorough understanding, independent
assessment, and continuous improvement.
Examples
- Continual simulator training and flight requalification
of airline pilots
- Humans diagnosing and disarming terrorist bombs.
- Nuclear Stockpile Management-1998
This approach
supports Level II: Surety by proactive human intervention.
Approach II:4.0 Administrative controls reduce the
probability of deleterious environment occurring.
Examples
- X-ray and metal screening at airports
- Controlling access to the vicinity of buildings to prevent bombing.
- Human Controlled operation of the electric power
grid
This approach supports Level II: Surety by
proactive human intervention.
Approach III:4.5 Engineered controls reduce the
probability of deleterious environment occurring.
Examples
- Automated breathalyzers and blood alcohol monitors that must be
passed before someone can start a car
- Automated autonomous controls of the electric power
grid
This approach supports Level III: Surety by
positive measures of science and engineering.
Approach III:5.0 All relevant positive measures are
necessary for success.
Examples
- Single-layer ballistic missile defense system
- Airbags and seatbelts
This approach supports
Level III: Surety by positive measures of science and
engineering.
Approach III:6.0 Only one of many positive measures
is necessary for success.
Examples
- Coded car doors and keyed ignition switches to keep
drunks from driving
- Many security systems to prevent theft
- Firewalls in information networks
- Multi-tier ballistic missile defense
- Pre-modern nuclear safety
This approach supports Level III: Surety by positive measures of
science and engineering.
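The contrast between Approach III:5.0 (all measures must succeed) and Approach III:6.0 (only one of many is necessary) can be made concrete with elementary probability. The sketch below assumes that each positive measure succeeds independently with a known probability; that independence is an idealization introduced here, not a claim made in this paper, and the function names are hypothetical.

```python
import math


def _check(probabilities):
    """Validate that every entry is a probability in [0, 1]."""
    values = list(probabilities)
    if not values:
        raise ValueError("at least one positive measure is required")
    for p in values:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"not a probability: {p}")
    return values


def all_must_succeed(probabilities):
    """Approach III:5.0 style: success requires every measure to work."""
    return math.prod(_check(probabilities))


def one_of_many_suffices(probabilities):
    """Approach III:6.0 style: success requires at least one measure to work."""
    return 1.0 - math.prod(1.0 - p for p in _check(probabilities))


if __name__ == "__main__":
    measures = [0.9, 0.9, 0.9]  # illustrative values only
    print("all must succeed:    ", all_must_succeed(measures))
    print("one of many suffices:", one_of_many_suffices(measures))
```

Under these assumptions, adding a measure can only lower the first figure and can only raise the second, which is one way to read the preference for layered, independent measures in the examples above.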
Approach III:7.0 Predictable
cumulative/comparative/ adaptive positive measures
Examples
- Anti-lock brakes
- Public Switched Telephone System with 5ESS Switch
- Classified weapon application
This approach supports Level III: Surety by positive measures of
science and engineering.
[Figure labels: Input; Comparator; Predictable Interventions; Output]
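The labels of the Approach III:7.0 illustration (input, comparator, predictable interventions, output) suggest a feedback loop. The following is a deliberately simplified sketch of such a loop, assuming a proportional correction with a fixed gain; the function `run_comparator_loop`, its parameters, and the proportional rule are all assumptions made for illustration and do not describe any of the listed systems.

```python
def run_comparator_loop(setpoint, initial, gain=0.5, steps=20):
    """Repeatedly compare the output with the desired input and correct it.

    Each step applies a predictable intervention proportional to the error,
    so with 0 < gain < 1 the remaining error shrinks by (1 - gain) per step.
    """
    if not 0.0 < gain < 1.0:
        raise ValueError("gain must lie strictly between 0 and 1")
    value = float(initial)
    history = [value]
    for _ in range(steps):
        error = setpoint - value   # comparator: desired input vs. actual output
        value += gain * error      # predictable intervention
        history.append(value)
    return history


if __name__ == "__main__":
    trace = run_comparator_loop(setpoint=10.0, initial=0.0)
    print(f"start={trace[0]}, end={trace[-1]:.6f}")
```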
Approach IV:8.0 Rely as much as possible upon laws
of nature to approach physical impossibility of high
consequences.
Examples
- Passively fused power circuits
- Hang glider air foil that becomes a parachute instead of
stalling
- Enhanced Nuclear Detonation Safety systems
This approach supports Level IV: Surety by laws of nature and
mathematics.
[Figure labels: Precluded High Consequences; Physics; Chemistry; Material Science; Permitted Operations]
Conclusions
The increasing complexity of life in the 20th and
21st centuries requires creative and comprehensive solutions to challenges
in reliability, safety, and security and use control. The fifty years of
experience at Sandia National Laboratories have been mined to provide a
draft strategy for stimulating surety solutions for these challenges. The
strategy consists of four levels of surety and eight approaches. The
workshop will examine this draft strategy by applying it to a variety of
national challenges of the 21st century.