
NASA/TP--2000-207428

Reliability and Maintainability (RAM) Training

Edited by
Vincent R. Lalli
Glenn Research Center, Cleveland, Ohio

Henry A. Malec
Siemens Stromberg-Carlson, Albuquerque, New Mexico

Michael H. Packard
Raytheon Engineers and Constructors, Cleveland, Ohio

National Aeronautics and


Space Administration

Glenn Research Center

July 2000
The NASA STI Program Office... in Profile

Since its founding, NASA has been dedicated to the advancement of aeronautics and space science. The NASA Scientific and Technical Information (STI) Program Office plays a key part in helping NASA maintain this important role.

The NASA STI Program Office is operated by Langley Research Center, the Lead Center for NASA's scientific and technical information. The NASA STI Program Office provides access to the NASA STI Database, the largest collection of aeronautical and space science STI in the world. The Program Office is also NASA's institutional mechanism for disseminating the results of its research and development activities. These results are published by NASA in the NASA STI Report Series, which includes the following report types:

TECHNICAL PUBLICATION. Reports of completed research or a major significant phase of research that present the results of NASA programs and include extensive data or theoretical analysis. Includes compilations of significant scientific and technical data and information deemed to be of continuing reference value. NASA's counterpart of peer-reviewed formal professional papers but has less stringent limitations on manuscript length and extent of graphic presentations.

TECHNICAL MEMORANDUM. Scientific and technical findings that are preliminary or of specialized interest, e.g., quick release reports, working papers, and bibliographies that contain minimal annotation. Does not contain extensive analysis.

CONTRACTOR REPORT. Scientific and technical findings by NASA-sponsored contractors and grantees.

CONFERENCE PUBLICATION. Collected papers from scientific and technical conferences, symposia, seminars, or other meetings sponsored or cosponsored by NASA.

SPECIAL PUBLICATION. Scientific, technical, or historical information from NASA programs, projects, and missions, often concerned with subjects having substantial public interest.

TECHNICAL TRANSLATION. English-language translations of foreign scientific and technical material pertinent to NASA's mission.

Specialized services that complement the STI Program Office's diverse offerings include creating custom thesauri, building customized data bases, organizing and publishing research results... even providing videos.

For more information about the NASA STI Program Office, see the following:

• Access the NASA STI Program Home Page at http://www.sti.nasa.gov

• E-mail your question via the Internet to [email protected]

• Fax your question to the NASA Access Help Desk at (301) 621-0134

• Telephone the NASA Access Help Desk at (301) 621-0390

• Write to:
  NASA Access Help Desk
  NASA Center for AeroSpace Information
  7121 Standard Drive
  Hanover, MD 21076
Acknowledgments

In 1993 the Orlando Division of the Martin Marietta Company recognized the need to provide its engineers,
especially its design engineers, with a practical understanding of the principles and applications of reliability
engineering. To this end, a short, informative reliability training program was prepared. The author of this
company-sponsored effort was Richard B. Dillard, who was also the principal instructor.
In response to the students' enthusiasm, their quest for additional information, and the support of their
supervisors and managers, Mr. Dillard researched and wrote chapters 2 to 6 and appendix A of this text.
Robert E. Kastner and Henry N. Hartt were coauthors for our training manual on interface definition and control.
Credit is also due to William L. Hadley, who was the stimulus for many of the ideas presented, and to
Dr. D.C. Schiavone and William E. Wood, who directed and supported the efforts that went into preparing this
material.
Thanks are extended to Frank E. Croxton and Prentice-Hall, Inc. for the use of two-tail and one-tail tables
of the normal distribution and to Arthur Wald and John Wiley & Sons, Inc. for the use of tables of the cumulative
normal distribution.
In recognition of the need to help project managers better understand safety and assurance technologies,
Gary G. Kelm, Frank J. Barber, and Frank J. Barina prepared appendix B.
Kam L. Wong, using information and concepts from Charles Ryerson and Irwin Quart, prepared chapter 1 in
our previous workbook, RP-1253; thanks are extended to North-Holland, Inc. for permission to reprint some of the
figures and text. Thanks to Fredrick D. Gregory, Dr. Michael A. Greenfield, Dr. Peter Rutledge, Vernon W.
Wessel, and Frank Robinson, Jr. for their encouragement and support in allowing the Professional Development
Team to develop this new workbook for our NASA Safety Training Course 017.
Henry A. Malec has passed away and will be missed. He will be remembered for his hard work in promoting
the Reliability Society. He prepared chapters 7, 10, and 11 of the original version of this text. Martha Wetherholt
and Tom Ziemianski prepared chapters 8 and 9. Thanks are extended to the Digital Press of Digital Equipment
Corporation for the software evaluation materials contained in chapter 7. Vincent R. Lalli, presently a risk manage-
ment consultant at the NASA Glenn Research Center in Cleveland, Ohio, prepared some of the new sections and
appendix C, added some of the problems, and edited and cared for the final NASA printing of this revised version
of the manual.
E.A. Winsa and Gary G. Kelm served as the final NASA project office reviewers. Their suggestions improved
the usefulness of the text for flight projects.
To supplement the material presented herein, the bibliography at the end of this manual will enable the reader
to select other authoritative material in specific reliability areas.
The editors, Vincent R. Lalli, Henry A. Malec, and Michael H. Packard would like to thank the many members
of the IEEE Reliability Society Administrative Committee for their help in developing this text.

Trade names or manufacturers' names are used in this report for


identification only. This usage does not constitute an official
endorsement, either expressed or implied, by the National
Aeronautics and Space Administration.

Available from

NASA Center for Aerospace Information
7121 Standard Drive
Hanover, MD 21076
Price Code: A16

National Technical Information Service
5285 Port Royal Road
Springfield, VA 22100
Price Code: A16
Preface
What Does Reliability Mean?

Systems...

The word "reliability" applies to systems that consist of people, machines, and written
information.
A system is reliable--that is, has good reliability--if the people who need it can depend
on it over a reasonable period of time. People can depend on a system if it reasonably satisfies
their needs.

People...

The views of the people involved in a system are different and depend on their responsi-
bilities; some rely on it, others keep it reliable, and others do both. Consider an automatic
grocery checkout system and the people involved:

• The owners, who are the buyers


• The store manager, who is responsible for its operation
• The clerk, who operates it
• The repair person, who maintains it in working condition
• The customer, who buys the products

Machines...

A grocery checkout system may comprise several types of machines. It has mechanical
(conveyor belt), electrical (conveyor belt motor, wiring), electronic (grocery and credit card
scanners, display screen, and cash register), and structural (checkout counter, bag holder)
parts.

Written Information...

Several types of written information contribute to the way people rely on a system:

• The sales literature


• The specifications
• The detailed manufacturing drawings
• The software user's manual, programs, and procedures
• The operating instructions
• The parts and repair manual
• The inventory control

Reliability ...

People rely on systems to

• Do work or provide entertainment


• Do no unintentional harm to users, bystanders, property, or the environment
• Be reasonably economical to own and to repair
• Be safe to store or dispose of
• Accomplish their purposes without failure

What Does Reliability Engineering Mean?

Reliability engineering means accomplishing specific tasks while a system is being


planned, designed and developed, manufactured, used, and improved. These tasks are not the
usual engineering and management tasks but are those that ensure that the system meets the
users' expectations--not only when it is new but as it ages and requires repeated repairs.

Why Do We Need Reliability Engineering?

Technology users have always needed reliability engineering, but it has only developed
since the 1940's as a separate discipline. Before the Industrial Revolution, most of the
reliability details were the individual worker's responsibility because the machines, prod-
ucts, and tools were relatively simple. However, shoddy goods were produced--wheels that
broke easily, farming implements that were not dependable, lumber that rotted prematurely.
As technology rapidly changed, systems became large and complex. Companies that
produce these systems must likewise be large and complex. In such situations, many
important details that affect reliability are often relegated to a lower priority than completing
a project on time and at an affordable cost. Among the first to see the need for a separate
reliability discipline were the telephone and electric power utilities and the military.

Contents
Chapter 1
Historical Perspective of Space System Reliability ................................................. 1
Summary ........................................................ 1
Past Space System Reliability ..................................... 1
Risk Management in the Revised NASA ............................. 2
The Challenge of NASA's Brave New World .......................... 3
Risk as a Resource .............................................. 3
The Role of Safety and Mission Assurance (SMA) in Risk Management .... 5
References .................................................... 7
Reliability Training ............................................. 8
Mathematics Review ............................................ 9
Notation ...................................................... 9
Manipulation of Exponential Functions .............................. 9

Chapter 2
Reliability Mathematics and Failure Physics ............................ 9
Rounding Data ................................................... 9
Integration Formulas ........................................... 10
Differential Formulas ........................................... 10
Partial Derivatives ............................................. 10
Expansion of (a + b)^n .......................................... 11
Failure Physics .................................................. 11
Probability Theory ............................................... 12
Fundamentals ................................................. 12
Probability Theorems ............................................. 13
Concept of Reliability ............................................. 14
Reliability as Probability of Success ............................... 14
Reliability as Absence of Failure .................................. 15
Product Application ............................................ 15
Interface Definition and Control .................................. 16
Concluding Remarks .............................................. 16
References ...................................................... 17
Reliability Training ............................................... 18

Chapter 3
Exponential Distribution and Reliability Models ........................ 21
Exponential Distribution ........................................... 21
Failure Rate Definition ......................................... 22
Failure Rate Dimensions ........................................ 23
Mean Time Between Failures ..................................... 24
Calculations of Pc for Single Devices .............................. 24
Reliability Models ................................................ 25
Calculation of Reliability for Series-Connected Devices ................ 25



Calculation of Reliability for Parallel-Connected Devices (Redundancy) ..... 27
Calculation of Reliability for Complete System ......................... 29
Concluding Remarks .............................................. 30
References ...................................................... 30
Reliability Training ............................................... 31

Chapter 4
Using Failure-Rate Data ............................................ 35
Variables Affecting Failure Rates .................................... 35
Operating Life Test ............................................. 35
Storage Test .................................................. 36
Summary of Variables Affecting Failure Rates ....................... 36
Part Failure Rate Data ............................................. 39
Improving System Reliability Through Part Derating .................... 39
Use of Application Factor .......................................... 40
Predicting Reliability From Part Failure Rate Data ...................... 40
Predicting Reliability by Rapid Techniques ............................ 41
Use of Failure Rates in Tradeoffs .................................... 41
Nonoperating Failures ............................................. 42
Applications of Reliability Predictions to Control of Equipment Reliability ... 42
Standardization as a Means of Reducing Failure Rates ................... 42
Allocation of Failure Rates and Reliability ............................ 42
Importance of Learning From Each Failure ............................ 44
Failure Reporting, Analysis, Corrective Action, and Concurrence ........... 44
Case Study--Achieving Launch Vehicle Reliability ..................... 44
Design Challenge .............................................. 44
Subsystem Description .......................................... 44
Approach to Achieving Reliability Goals ........................... 44
Launch and Flight Reliability ..................................... 52
Field Failure Problem ........................................... 52
Mechanical Tests .............................................. 53
Runup and Rundown Tests ....................................... 53
Summary of Case Study ......................................... 53
Concluding Remarks .............................................. 55
References ...................................................... 55
Reliability Training ............................................... 56

Chapter 5
Applying Probability Density Functions ............................... 59
Probability Density Functions ...................................... 59
Application of Density Functions .................................... 61
Cumulative Probability Distribution .................................. 62
Normal Distribution .............................................. 63
Normal Density Function ........................................ 63
Properties of Normal Distribution ................................. 64
Symmetrical Two-Limit Problems ................................. 65

One-Limit Problems ............................................ 66
Nonsymmetrical Two-Limit Problems .............................. 71
Application of Normal Distribution to Test Analyses and
Reliability Predictions ......................................... 71
Effects of Tolerance on a Product .................................. 74
Notes on Tolerance Accumulation: A How-To-Do-It Guide ............. 74
Estimating Effects of Tolerance ................................... 75
Concluding Remarks .............................................. 76
References ...................................................... 77
Reliability Training ............................................... 78

Chapter 6
Testing for Reliability .............................................. 81
Demonstrating Reliability .......................................... 81
Pc Illustrated .................................................. 81
Pw Illustrated ................................................. 82
K-Factors Illustrated ............................................ 82
Test Objectives and Methods ....................................... 82
Test Objectives ................................................ 83
Attribute Test Methods .......................................... 83
Statistical Confidence ........................................... 83
Test-To-Failure Methods ........................................ 86
Life Test Methods .............................................. 92
Conclusion ..................................................... 96
References ...................................................... 96

Chapter 7
Software Reliability ................................................ 99
Models ........................................................ 99
Time Domain Models .......................................... 100
Data Domain Models .......................................... 101
Axiomatic Models ............................................ 101
Other Models ................................................ 102
Trends and Conclusions ........................................ 103
Software ...................................................... 103
Categories of Software ......................................... 103
Processing Environments ....................................... 104
Severity of Software Defects .................................... 104
Software Bugs Compared With Software Defects .................... 104
Hardware and Software Failures ................................. 105
Manifestations of Software Bugs ................................. 105
References ..................................................... 107
Reliability Training .............................................. 108
Reference Document for Inspection: "Big Bird's" House Concept ......... 109
"Big Bird's" General Concept ................................... 109
Class Meeting Exercise: Requirements Inspection ................... 109
Reference Document for Inspection System Requirements ............. 110

"BigBird's"HouseSystems
Requirements ......................... 110
ExcuseMe,AreThoseRequirements? ............................. 110
"BigBird's"Requirements
Checklist................................ 111

Chapter 8
Software Design Improvements ..................................... 115
Part I--Software Benefits and Limitations ............................ 115
Part II--Software Quality and the Design and Inspection Process ........ 124
Software Development Specifications ............................. 124
Specifications and Programming Standards ......................... 125
NASA Software Inspection Activities ............................... 125
Additional Recommendations ...................................... 127
Conclusions .................................................... 129
References ..................................................... 129
Reliability Training .............................................. 131

Chapter 9
Software Quality Assurance ........................................ 133
Concept of Quality .............................................. 133
Software Quality ................................................ 134
Software Quality Characteristics ................................. 135
Software Quality Metrics ....................................... 135
Overall Software Quality Metrics ................................ 137
Software Quality Standards ..................................... 143
Concluding Remarks ............................................. 143
References ..................................................... 144
Reliability Training ............................................. 145

Chapter 10
Reliability Management ........................................... 147
Roots of Reliability Management ................................... 147
Planning a Reliability Management Organization ...................... 147
General Management Considerations ................................ 148
Program Establishment ........................................ 148
Goals and Objectives .......................................... 149
Symbolic Representation ....................................... 149
Logistics Support and Repair Philosophy .......................... 150
Reliability Management Activities .................................. 152
Performance Requirements ..................................... 152
Specification Targets .......................................... 152
Field Studies ................................................. 153
Human Reliability ............................................... 153
Analysis Methods ............................................. 153
Human Errors ................................................ 154
Example .................................................... 154

Presentation of Reliability ........................................ 155
Engineering andManufacturing .................................. 155
User or Customer ............................................. 155
References ..................................................... 157
Reliability Training .............................................. 158

Chapter 11
Designing for Maintainability and System Availability .................. 161
Introduction .................................................... 161
Definitions .................................................... 161
Importance of Maintainability ..................................... 163
Elements of Maintainability ....................................... 163
Total Cost of Ownership .......................................... 165
Maintainability and Systems Engineering ............................ 166
Maintainability Processes and Documents ............................ 167
First Phase .................................................. 167
Second Phase ................................................ 170
Third Phase .................................................. 170
Documents .................................................. 171
Maintainability Analysis Mathematics ............................... 173
Additional Considerations ...................................... 176
Requirements and Maintainability Guidelines for ORU's ................ 176
Related Techniques and Disciplines ................................. 177
Maintainability Problems ......................................... 178
Example 1 ................................................... 178
Example 2 ................................................... 178
Problem Solving Strategy ......................................... 178
Recommended Techniques ...................................... 181
Conclusion .................................................... 182
References ..................................................... 182
Reliability Training .............................................. 183

Appendix A
Reliability Information ............................................ 185
References ..................................................... 185

Appendix B
Project Manager's Guide to Risk Management and
Product Assurance ............................................. 229
Introduction .................................................... 229
Risk Management and Product Assurance at the
NASA Glenn Research Center ................................... 229
Project Assurance Lead ........................................... 229
Role .......................................................... 229
Responsibilities ................................................. 230

Appendix C
Reliability Testing Examples ....................................... 247
Accelerated Life Testing .......................................... 267
Accept/Reject Decisions With Sequential Testing ...................... 268
References ..................................................... 275

Bibliography ........................................................ 277

Reliability Training Answers ........................................... 279

Appendix D
Training Manual for Elements of Interface ........................... 281

Chapter 1
Historical Perspective of Space System Reliability
Summary

The NASA Strategic Plan (ref. 1-1) is the backbone of our new Strategic Management System, an important aspect of which is risk management. Coincident with a decreasing NASA budget is the new working environment that demands a better, faster, and cheaper way to conduct business. In such an environment where risk is considered a knowledge-based resource, mission assurance has come to play an important role in our understanding of risk.

Through the years, much of mission assurance has been aimed at increasing independent systems engineering and further refining basic design approaches. Now the time has come to direct our attention to managing the risks that come from system interactions during a mission. To understand such risks, we must bring to bear all the engineering techniques at our disposal. Mission assurance engineers are entering the era of interaction in which engineering and system engineering must work closely to achieve better performance on time and within cost.

A structured risk management approach is critical to a successful project. This is nothing new. A risk policy must be integral to the program as part of a concurrent engineering process, and risk and risk drivers must be monitored throughout. Risk may also be managed as a resource: the new way of managing better, faster, cheaper programs encompasses up-front, knowledge-based risk assessment. The safety and mission assurance (S&MA) community can provide valuable support as risk management consultants.

Past Space System Reliability

Ever since the need for improved reliability in space systems was recognized, it has been difficult to establish an identity for mission assurance engineering. Attempts to delineate an independent set of tasks for mission assurance engineering in the 1970's and 1980's resulted in the development of applied statistics for mission assurance and a large group of tasks for the project. Mission failures in a well-developed system come from necessary risks that remain in the system for the mission. Risk management is the key to mission assurance. The traditional tasks of applied statistics, reliability, maintainability, system safety, quality assurance, logistics support, human factors, software assurance, and system effectiveness for a project are still important and should still be performed.

In the past, mission assurance activities were weakly structured. Often they were decoupled from the project planning activity. When a project had a problem (e.g., a spacecraft would not fit on the launch vehicle adapter ring), the mission assurance people were involved to help solve it. Often problems were caused by poorly communicated overall mission needs, a limited data base available to the project, tight funding, and a limited launch window. These factors resulted in much risk that was not recognized until it happened. The rule-based management method used by NASA recognized risk as a consequence and classified four types of payloads: A, B, C, and D. These were characterized as high priority, minimum risk; high priority, medium risk; medium priority, medium-high risk; and high risk, minimum cost. Guidelines for system safety, reliability, maintainability and quality assurance (SRM&QA) project requirements for class A-D payloads were also spelled out. An example is the treatment of single failure points (SFP): class A, success-critical SFP's were not permitted; class B, success-critical SFP's were allowed without a waiver but were minimized; class C, success-critical SFP's were allowed without a formal waiver; class D, the same as class C.

Often risk came as a consequence of the mission. In an attempt to minimize risk, extensive tests and analyses were conducted. The residual risk was a consequence of deficiencies in the tradable resources of mass, power, cost, performance, and schedule. NASA tried to allocate resources, develop the system, verify and validate risk, launch the system, and accomplish the mission with minimal risk. Using these methods resulted in a few failures.
Figure 1-1.--Distribution of reliability emphasis with respect to calendar years (updated from ref. 6; original figure prepared by Kam L. Wong). [Chart not reproduced; it plots percent of reliability emphasis against calendar year (1940 to 2010) for activities grouped under manufacturing control, parts management, design control, reliability methods, failure-cause detection, finished item reliability control, flaw identification, and dynamic (tailored) screening.]

Various reliability efforts were grouped into categories: manufacturing control, design control, reliability methods, failure-cause detection, finished item reliability, flaw control, and risk management. Figure 1-1 illustrates how these categories have been emphasized through the years. The construction of figure 1-1 is approximate because its purpose is to identify activities, not to classify efforts precisely. Note that specific mission assurance activities are changing and that the amount of effort expended in these may not be proportional to the emphasis given them. A good parts management program is always important. The decrease in the use of reliability methods does not mean that parts management is unimportant; it only reflects that the importance of parts management has been well established and that parts management has become a standard design control task as part of a project.

Risk Management in the Revised NASA

The new NASA handbook on the Management of Major Systems and Programs is divided according to the four parts of the program life cycle: formulation, approval, implementation, and evaluation. It stresses risk management as an integral part of project management. The Formulation section defines a risk management-risk assessment process and requires that all projects use it. All risks must be dispositioned before flight.

The definition of risk management (ref. 1-2) is "An organized, systematic decision-making process that efficiently identifies risks, assesses or analyzes risks, and effectively reduces or eliminates risks to achieving the program goals." It also explains that effective project management depends on a thorough understanding of the concept of risk, the principles of risk management, and the establishment of a disciplined risk management process, which is shown in figure 1-2. The figure also explains the risk management plan requirements. A completed risk management plan is required at the end of the formulation phase and must include risk management responsibilities: resources, schedules, and milestones; methodologies: processes and tools to be used for risk identification, risk analysis, assessment, and mitigation; criteria for categorizing or ranking risks according to probability and consequences; the role of decisionmaking, formal reviews, and status reporting with respect to risk management; and documentation requirements for risk management products and actions.
Figure 1-2.--Risk management process (ref. 3). [Flow diagram not reproduced; it traces project constraints and mission success criteria into identification of general risk issues and concerns and the program risk management plan; test data, expert opinion, FMEA, lessons learned, and technical analysis into special risk identification and qualitative categorization, analytical assessment, and evaluation of quantified risk, consequence, and/or severity; risk drivers into risk mitigation actions (those not classified as "accepted"); then verification and/or validation of mitigation actions; and finally documentation and tracking of dispositioned risks and actions.]

A new direction for mission assurance engineers should be to provide dynamic, synthesizing feedback to those responsible for design, manufacturing, and mission operations. The feedback should take the form of identifying and ranking risk, determining risk mechanisms, and explaining risk management techniques. Mission assurance and the project should work together to achieve mission success.

The Challenge of NASA's Brave New World

NASA and many other Government agencies have been forced to face a new workplace environment. With the NASA budget shrinking, the nature of projects has changed: many are fast track and have fixed prices, which means that they must be completed in a better, faster, and cheaper manner. The dollars once put into facilities are very limited; the spacecraft budgets are smaller so the development cycle time has been reduced to save money. NASA's solution to these constraints is to emphasize proactive risk management processes. The paradigm has to change from rule-based to knowledge-based decisions and new methods that will improve productivity. Figure 1-3 shows the total NASA Earth and Space Science project budgets that reflect the slogan "better, faster, and cheaper."

Risk as a Resource

NASA's new paradigm (ref. 1-3) requires that risks be identified and traded as a resource with an appropriate level of mitigation. The tradable resources have increased by one: risk, mass, power, schedule, performance, and cost. The resources are allocated during hardware development, and at the same time risks are addressed and traded off. When the adequacy is demonstrated, the spacecraft is launched, and the flight performance is accomplished with a recognized risk. As seen for rule-based activities, there may be some failures but there will be more spacecraft launches to learn from. Thus, the risk has been used as a resource process. The goal is to optimize the overall risk posture by accepting risk in one area to benefit another. A strategy to recover from the occurrence of risk must also be considered. Risk trades will be made (best incremental return), possible risk consequences evaluated and developed, and decision or recovery options accepted and tracked. How is the cost of risk reduced? Here it is important to consider its marginal cost. When the cost per "unit of risk reduction" in a given component or subsystem increases significantly--stop. It would be better to buy down risk somewhere else.
Figure 1-3.--Total NASA Earth and space science projects completed in better, faster, and cheaper environment (ref. 3). [Bar charts not reproduced; they compare fiscal year groups 1990-94, 1995-99, and 2000-04 on average spacecraft development cost (millions of FY 95 dollars: $590, $190, $77), average development time (8.3, 4.6, and 3.1 yr), and annual flight rate (average number of missions launched per year: 2, 9, 16).]

Figure 1-4.--Risk analysis for class EEE parts (ref. 3). [Diagram not reproduced; for each parts choice (class S, grade 1; class B; commercial off-the-shelf (COTS)) it lists the risk trade, possible risk consequences (e.g., poor availability, higher mass and volume, cost, schedule, lot variations in radiation tolerance, performance degradation, incidence of "maverick" parts), and advantages (lowest risk and long-life missions for class S; moderate cost and higher performance expectation than COTS for class B; low cost, ready availability, and suitability for short-duration, multiple-launch missions for COTS), together with the safety and mission assurance (SMA) role: procurement specifications, vendor qualification, upgrading process definition, parts testing program, and residual parts risk assessment.]
Figure 1-5.--Notational risk surface (ref. 3). [Diagram not reproduced; it maps areas of higher to lower risk across technology readiness level (TRL), project management experience, design verification, complexity, EEE parts class, verification and validation approach, integration and test (including software verification and validation), and degree of safety and mission assurance (SMA) involvement.]

Dr. Greenfield, the Deputy Associate Administrator in the Office of Safety and Mission Assurance at NASA Headquarters, gave a risk management presentation and illustrated through six examples how to use risk as a resource (ref. 1-4). One of his examples dealt with the class of electrical, electronic, and electromagnetic (EEE) parts (ref. 1-5). Figure 1-4 shows the function, risk trade, possible risk consequence, and advantages for the class of parts to be used in a spacecraft. The risk trade that a project needs to make is the type of parts to use: class S, grade 1; class B; or commercial off-the-shelf (COTS) parts. Each has possible risk consequences. For example, class S, grade 1 parts have poor availability and are usually older technology, which means higher mass and volume. The advantages are that they are low risk, fit long-life missions, and are more resistant to single-event upset (SEU).

A measure of risk exists for a project that chooses to use a new technology, and it is now termed the technology infusion risk. The technology readiness level (TRL) scale ranges from 1 to 9. A TRL of 9 is used for existing, well-established, proven (very low-risk) technology. A TRL of 1 is used for unproven, very high-risk technology at the basic research stage. New technology can save time and money so there is a critical point at which it should be put to use. The diagram of figure 1-5 shows areas of high to low risk for the various risk elements. Called a risk surface (notational), if one looks along the EEE parts line, the commercial off-the-shelf (COTS) parts have more risk than B parts and B parts have more risk than S parts. Other risk elements are also shown in this figure.

The Role of Safety and Mission Assurance (SMA) in Risk Management

NASA's Safety and Mission Assurance (SMA) Office has the core competencies to serve as a risk management consultant to the projects and is supporting the risk management plan development. It provides projects with risk-resource tradeoffs: strategies, consequences, benefits, and mitigation approaches. Its role is to interact in all phases of the project decision process (planning, design, development, and operations). It provides projects with residual risk assessment during the project life cycle. Figure 1-6 shows the mission failure modes that cause risk and some of the methods used to manage them so that mission success can be achieved.
Figure 1-6.--Some mission failure modes and methods leading to mission success (ref. 3). [Diagram not reproduced; the methods shown include technology qualification, QML vendors, process control, life testing, mission simulation, inspections and verifications, reliability analyses, assembly testing, performance testing, and system testing, leading to mission success.]

TABLE 1-1.--SAFETY AND MISSION ASSURANCE (SMA) ROLE IN RISK MANAGEMENT

SMA area                     Typical areas involved in tradeoffs

Quality assurance            Documentation, surveillance, inspection, certification, audit,
                             materials review board

Configuration control        Drawings, equipment lists, delivery schedules, approval authority,
                             freeze control, as-built documentation

Environmental requirements   Design and test requirements, documentation, approvals, functional
                             and environment tests, programmatics (component, subsystem,
                             system), analysis

EEE parts                    Parts lists, parts class, policy, nonstandard parts, traceability,
                             derating, failure analysis, burn-in, selection, acquisition,
                             upgrades, lot control, screening, destructive physical analysis,
                             vendor control

Reliability                  Single-failure-point policy, problem and failure reporting and
                             disposition, design performance analysis (failure modes and effects
                             criticality analysis, fault tree analysis, part stress, redundancy
                             switching, worst case, single-event upset), reviews, redundancy

Systems safety               Documentation, hazard identification and/or impact, analysis (fault
                             tree analysis, hazard, failure modes and effects criticality
                             analysis, sneak circuit), structures and materials reviews,
                             electrostatic discharge (ESD) control, tests, inspections, surveys

Software product assurance   Initiation, problem and failure reporting and disposition,
                             simulations, independent verification and validation (IVV), tests
The SMA role in risk management is presented in table 1-1, which shows the SMA area and other typical areas involved in project tradeoffs. For example, with EEE parts, 16 tradeoff areas are identified to help the project understand parts management risks. SMA must take the lead to answer some very important questions: Where are the problems? What has been done about them? Have all the risks been mitigated? Are we ready to fly?

References

1-1. NASA Strategic Plan (NASA Policy Directive 1000.1), 1998. Available at http://www.hq.nasa.gov/office/codeq/qdoc.pdf
1-2. Greenfield, Dr. Michael A.: Risk Management: Risk as a Resource. NASA HQ, Washington, DC, 1998.
1-3. Lalli, Vincent R.: Reliability Training. NASA RP-1253, 1992.
1-4. Hoffman, Dr. Edward J.: Issues in NASA Program and Project Management. NASA SP-6101(11), 1996.
1-5. Recommended Practice for Parts Management. ANSI/AIAA R-100-1996.
1-6. Management of Major Systems and Programs. NASA NHB 7120.5A, 1998. Available at http://www.hq.nasa.gov/office/codeq/qdoc.pdf

Reliability Training 1

1. Which NASA Policy Guide explains risk management?

A. 8701.draft 1 B. 7120.5A C. 2820.1

2. What challenge is NASA facing?

A. The NASA budget is shrinking.


B. Many projects are being done faster, cheaper, and better.
C. Dollars are very limited for facilities.
D. All of the above.

3. What are the tradable resources that projects can use?

A. Performance, cost, and schedule


B. Mass, power, performance, cost, and schedule
C. Risk, mass, power, performance, cost, and schedule

4. How should the projects use the Safety and Mission Assurance Office?

A. Design consultants
B. Systems consultants
C. Risk management consultants

¹Answers are given at the end of this manual.

Chapter 2

Reliability Mathematics and Failure Physics


Mathematics Review

Readers should have a good working knowledge of algebra and a familiarity with integral and differential calculus. However, for those who feel rusty, the following review includes solved examples for every mathematical manipulation used in this manual.

Notation

The Greek symbol $\Sigma$ (sigma) means "take the sum of," and the notation

$$\sum_{i=1}^{n} X_i$$

means to take the sum of the $X_i$'s from $i = 1$ to $i = n$.

The symbol $\sqrt[n]{x}$ means "take the $n$th root of $x$." The square root $\sqrt[2]{x}$ is usually written as $\sqrt{x}$ without the index (the 2).

The Greek symbol $\Pi$ (pi) means "take the product of," and the notation

$$\prod_{i=1}^{n} X_i$$

means to take the product of the $X_i$'s from $i = 1$ to $i = n$.

The notation $x!$ is referred to as a factorial and is a shorthand method of writing $1 \times 2 \times 3 \times 4 \times 5 \times 6 \times \cdots \times x$ or, in general, $x! = x(x-1)(x-2)\cdots(1)$. However, $0!$ is defined as unity.
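As an aside added for this edition (not part of the original manual), these notations map directly onto a few lines of Python; the sample values below are arbitrary:

```python
import math

x = [2.0, 4.0, 6.0]            # arbitrary sample values X_1 ... X_n

sum_x = sum(x)                 # sigma notation: X_1 + X_2 + ... + X_n = 12.0
prod_x = math.prod(x)          # pi notation: X_1 * X_2 * ... * X_n = 48.0
cube_root = prod_x ** (1 / 3)  # nth root (here n = 3) of 48

print(math.factorial(5))       # 5! = 1 * 2 * 3 * 4 * 5 = 120
print(math.factorial(0))       # 0! is defined as unity: 1
```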
Manipulation of Exponential Functions

An exponential function is the Napierian base of the natural logarithms, $e = 2.71828\ldots$, raised to some power. For example, $e^2$ is an exponential function and has the value 7.3891. This value can be calculated on most calculators. Rules that must be followed when manipulating these functions are given next.

Rule 1:

$$e^x \times e^y = e^{x+y}$$

Rule 2:

$$e^{-x} = \frac{1}{e^x}$$

Rule 3:

$$\frac{e^x}{e^y} = e^{x-y}$$
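These rules are easy to verify numerically. The sketch below is an illustration added for this edition, not part of the original text:

```python
import math

x, y = 1.5, 0.7   # arbitrary exponents

# Rule 1: e^x * e^y = e^(x + y)
assert math.isclose(math.exp(x) * math.exp(y), math.exp(x + y))

# Rule 2: e^(-x) = 1 / e^x
assert math.isclose(math.exp(-x), 1 / math.exp(x))

# Rule 3: e^x / e^y = e^(x - y)
assert math.isclose(math.exp(x) / math.exp(y), math.exp(x - y))

print(math.exp(2))   # 7.389056..., the value of e^2 quoted above
```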
Rounding Data

Reliability calculations are made by using failure rate data. If the failure rate data base is accurate to three places, calculations using these data can be made to three places. Use should be made of the commonly accepted rule (computer's rule) to round the computational results to the proper number of significant figures. The Mathematics Dictionary (ref. 2-1) defines rounding off:

    When the first digit dropped is less than 5, the preceding digit is not changed; when the first digit dropped is greater than 5 or 5 and some succeeding digit is not zero, the preceding digit is increased by 1; when the first digit dropped is 5 and all succeeding digits are zero, the commonly accepted rule is to make the preceding digit even, i.e., add 1 to it if it is odd, and leave it alone if it is already even.

For example, if the reliability of a system is 0.8324, 0.8316, or 0.8315, it would take the form 0.832 if rounded off to three places.
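The quoted rule is what numerical libraries usually call round-half-to-even (banker's rounding). A minimal sketch, not from the manual, using Python's decimal module, which implements the rule exactly:

```python
from decimal import Decimal, ROUND_HALF_EVEN

def round3(value: str) -> Decimal:
    """Round a decimal string to three places per the rule quoted above."""
    return Decimal(value).quantize(Decimal("0.001"), rounding=ROUND_HALF_EVEN)

print(round3("0.8324"))   # 0.832  (first digit dropped, 4, is less than 5)
print(round3("0.8316"))   # 0.832  (dropped digit 6 > 5: raise 1 to 2)
print(round3("0.8315"))   # 0.832  (dropped 5, rest zero: make the odd 1 even)
print(round3("0.8325"))   # 0.832  (dropped 5: preceding 2 is already even)
```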

Integration Formulas

Only the following integration formulas are used in this manual:

$$\int_a^b x^n\,dx = \left.\frac{x^{n+1}}{n+1}\right|_a^b = \frac{b^{n+1} - a^{n+1}}{n+1} \tag{1}$$

$$\int_a^b e^{-x}\,dx = \left.-e^{-x}\right|_a^b = -e^{-b} + e^{-a} = e^{-a} - e^{-b} \tag{2}$$

$$\int_p^q e^{-ax}\,dx = \left.-\frac{e^{-ax}}{a}\right|_p^q = \frac{e^{-ap} - e^{-aq}}{a} \tag{3}$$

Example 1:

$$\int x^2\,dx = \frac{x^{2+1}}{2+1} = \frac{x^3}{3}$$

$$\int_2^3 x\,dx = \left.\frac{x^2}{2}\right|_2^3 = \frac{(3)^2}{2} - \frac{(2)^2}{2} = \frac{9}{2} - \frac{4}{2} = \frac{5}{2}$$

Example 2:

$$\int_3^4 e^{-x}\,dx = \left.-e^{-x}\right|_3^4 = e^{-3} - e^{-4}$$

Example 3:

$$\int_3^4 e^{-2x}\,dx = \left.-\frac{e^{-2x}}{2}\right|_3^4 = \frac{e^{-6} - e^{-8}}{2}$$
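Examples 2 and 3 can be sanity-checked numerically. The following sketch, added as an illustration rather than taken from the manual, compares each closed form with a midpoint-rule approximation:

```python
import math

def midpoint_integral(f, a, b, n=100_000):
    """Approximate the integral of f from a to b with the midpoint rule."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# Example 2: integral of e^(-x) from 3 to 4 equals e^(-3) - e^(-4)
closed_form_2 = math.exp(-3) - math.exp(-4)
numeric_2 = midpoint_integral(lambda x: math.exp(-x), 3, 4)
assert math.isclose(numeric_2, closed_form_2, rel_tol=1e-6)

# Example 3: integral of e^(-2x) from 3 to 4 equals (e^(-6) - e^(-8))/2
closed_form_3 = (math.exp(-6) - math.exp(-8)) / 2
numeric_3 = midpoint_integral(lambda x: math.exp(-2 * x), 3, 4)
assert math.isclose(numeric_3, closed_form_3, rel_tol=1e-6)
```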

Differential Formulas

Only the following differential formulas are used in this manual:

$$\frac{d(ax)}{dx} = a \tag{4}$$

$$\frac{d(ax^n)}{dx} = nax^{n-1} \tag{5}$$

Example 4:

$$\frac{d(4x)}{dx} = 4$$

Example 5:

$$\frac{d(x^2)}{dx} = 2x^{2-1} = 2x$$

$$\frac{d(4x^3)}{dx} = (3)4x^{3-1} = 12x^2$$

Partial Derivatives

This manual uses the following partial derivative formula:

$$\frac{\partial v}{\partial x} = \frac{\partial(xyz)}{\partial x} = yz \tag{6}$$

Example 6: For a box with x = 2 ft, y = 3 ft, and z = 4 ft,

$$v = 2\ \text{ft} \times 3\ \text{ft} \times 4\ \text{ft} = 24\ \text{ft}^3$$

$$\frac{\partial v}{\partial x} = yz = 12\ \text{ft}^2$$

TABLE 2-1.--BINOMIAL COEFFICIENTS
[Coefficient of each term of (a + b)^n]

n = 0:   1
n = 1:   1   1
n = 2:   1   2   1
n = 3:   1   3   3   1
n = 4:   1   4   6   4   1
n = 5:   1   5  10  10   5   1
n = 6:   1   6  15  20  15   6   1
n = 7:   1   7  21  35  35  21   7   1
n = 8:   1   8  28  56  70  56  28   8   1
n = 9:   1   9  36  84 126 126  84  36   9   1
n = 10:  1  10  45 120 210 252 210 120  45  10   1
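Each row of table 2-1 is a row of Pascal's triangle, i.e., the binomial coefficients n!/[m!(n - m)!] that appear in equation (7) of the next section. A short illustrative sketch (not in the original text):

```python
import math

def binomial_row(n):
    """Coefficients of (a + b)^n -- one row of table 2-1."""
    return [math.comb(n, m) for m in range(n + 1)]

for n in range(11):        # reproduces table 2-1 for n = 0 to 10
    print(n, binomial_row(n))

print(binomial_row(4))     # [1, 4, 6, 4, 1], used in Example 7 below
```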

Expansion of (a + b)^n

It will be necessary to know how to transform the expression $(a + b)^n$ into a binomial expansion. This type of problem is easily solved by using table 2-1 and recalling that

$$(a+b)^n = a^n + na^{n-1}b + \frac{(n-1)(n)}{2!}a^{n-2}b^2 + \frac{(n-2)(n-1)(n)}{3!}a^{n-3}b^3 + \cdots + \frac{(n-m+1)\cdots(n-1)(n)}{m!}a^{n-m}b^m + \cdots + b^n \tag{7}$$

Example 7: Expand $(a + b)^4$. From table 2-1 with $n = 4$,

$$(a+b)^4 = a^4 + 4a^3b + 6a^2b^2 + 4ab^3 + b^4$$

Failure Physics

When we consider reliability, we think of all the parts or components of a system continuing to operate correctly. Therefore a reliable system or product must have reliable parts. But what makes a part reliable? When asked, many people would say a reliable part is one purchased according to a certain source control document and bought from an approved vendor. Unfortunately, these two qualifications are not always guarantees of reliability. The following case illustrates this problem.

A clock purchased according to PD 4600008, procured from an approved vendor for use in the ground support equipment of a missile system, was subjected to qualification tests as part of the reliability program. These tests consisted of high- and low-temperature, mechanical shock, temperature shock, vibration, and humidity. The clocks from the then sole-source vendor failed two of the tests: low temperature and humidity. A failure analysis revealed that lubricants in the clock's mechanism froze and that the seals were not adequate to protect the mechanism from humidity. A second approved vendor was selected. His clocks failed the high-temperature test. In the process, the dial hands and numerals turned black, making readings impossible from a distance of 2 ft. A third approved vendor's clocks passed all the tests except mechanical shock, which cracked two of the cases. Ironically, the fourth approved vendor's clocks, though less expensive, passed all the tests.

The point of this illustration is that four clocks, each designed to the same specification and procured from a qualified vendor, all performed differently in the same environments. Why did this happen? The specification did not include the gear lubricant or the type of coating on the hands and numerals or the type of case material.

Many similar examples could be cited, ranging from requirements for glue and paint to complete assemblies and systems. The key to solving these problems is best stated as follows: To know how reliable a product is or how to design a reliable product, you must know all the ways its parts could fail and the types and magnitude of stresses that cause such failures. Think about this: if you knew every conceivable way a missile could fail and if you knew the type and level of stress required to produce each failure, you could build a missile that would never fail because you could eliminate

(1) As many types of failure as possible
(2) As many stresses as possible
(3) The remaining potential failures by controlling the level of the remaining stresses
TABLE 2-2.--RESULTS OF QUALIFICATION TESTS ON SOURCE CONTROL DOCUMENT CLOCK

Vendor   High          Low           Mechanical   Temperature   Vibration   Humidity
         temperature   temperature   shock        shock
  1      Pass          Fail          Pass         Pass          Pass        Fail
  2      Fail          Pass          Pass         Pass          Pass        Pass
  3      Pass          Pass          Fail         Pass          Pass        Pass
  4      Pass          Pass          Pass         Pass          Pass        Pass
Sound simple? Well, it would be except that despite the thousands of failures observed in industry each day, we still know very little about why things fail and even less about how to control the failures. However, through systematic data accumulation and study, we continue to learn more.

As stated, this manual introduces some basic concepts of failure physics: failure modes (how failures are revealed); failure mechanisms (what produces the failure mode); and failure stresses (what activates the failure mechanisms). The theory of and the practical tools for controlling failures are also presented.
Probability Theory

Fundamentals

Because reliability values are probabilities, every student of reliability disciplines should know the fundamentals of probability theory, which is used in chapter 3 to develop models that represent how failures occur in products.

Probability defined.--Probability can be defined as follows: If an event can occur in A different ways, all of which are considered equally likely, and if a certain number B of these events are considered successful or favorable, the ratio B/A is called the probability of the event. A probability, according to this definition, is also called an a priori (beforehand) probability because its value is determined without experimentation. It follows that reliability predictions of the success of missile flights that are made before the flights occur are a priori reliabilities. In other words, a priori reliabilities are estimates of what may happen and are not observed facts.

After an experiment has been conducted, an a posteriori probability, or an observed reliability, can be defined as follows: If f(n) is the number of favorable or successful events observed in a total number of n trials or attempts, the relative frequency f(n)/n is called the statistical probability, the a posteriori probability, the empirical probability, or the observed reliability. Note that the number of favorable events f(n) is a function of the total number of trials or attempts n. Therefore, as the number of trials or attempts changes, f(n) may also change, and consequently the statistical probability (or observed reliability) may change.

Reliability of a coin.--To apply this theory, consider the physics of a coin. Assume that it has two sides, is thin, and is made of homogeneous material. If the coin is tossed, one of two possible landings may occur: with the head side up or tail side up. If landing heads up is considered more favorable than landing tails up, a prediction of success can be made by using the a priori theory. From the a priori definition, the probability of success is calculated as

$$\frac{1\ \text{favorable event}}{2\ \text{possible events}} = \frac{1}{2},\ \text{or 50 percent}$$

This is an estimate of what should be observed if the coin is tossed but is not yet an observed fact. After the coin is tossed, however, the probability of success could be much more specific as shown in table 2-3.

TABLE 2-3.--OBSERVED PROBABILITY OF SUCCESS

Number of tosses, n                                   1     10    100    1000   10 000
Number of heads observed, f(n)                        0      7     55     464     5080
Relative frequency (observed probability
of success), f(n)/n                                   0   0.70   0.55   0.464    0.508

The table shows two important phenomena:

(1) As the number of trials changes, the number of favorable events observed also changes. An observed probability of success (or observed reliability) may also change with each additional trial.

(2) If the assumptions made in calculating the a priori probability (reliability prediction) are correct, the a posteriori (observed) probability will approach the predicted probability as the number of trials increases. Mathematically, the relative frequency f(n)/n approaches the a priori probability B/A as the number of trials n increases, or

$$\lim_{n \to \infty} \frac{f(n)}{n} = \frac{B}{A}$$

In the coin toss example, the predicted reliability was 0.50. The observed reliability of 0.508 indicates that the initial assumptions about the physics of the coin were probably correct. If, as a result of 10 000 tosses, heads turned up 90 percent of the time, this could indicate that the coin was incorrectly assumed to be homogeneous and that, in fact, it was "loaded." Inconsistency in the actual act of tossing the coin, a variable that was not considered in the initial assumptions, could also be indicated. Here again, even with a simple coin problem, it is necessary to consider all the ways the coin may "fail" in order to predict confidently how it will perform.
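The convergence of f(n)/n toward B/A described above is easy to reproduce by simulation. A hedged sketch added for this edition, assuming a fair coin and an arbitrary random seed:

```python
import random

random.seed(1)                            # arbitrary seed, for repeatability only
heads, tossed = 0, 0
for n in (1, 10, 100, 1_000, 10_000):     # same checkpoints as table 2-3
    while tossed < n:
        heads += random.random() < 0.5    # a priori P(heads) = B/A = 1/2
        tossed += 1
    print(f"n = {n:6d}   f(n) = {heads:5d}   f(n)/n = {heads / n:.3f}")

# Like table 2-3, f(n)/n wanders for small n and settles near 0.500
# as the number of trials grows.
```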

prediction represents a level of success stated by the customer or when the predicted value is mandatory for the missile to be effective. This subject of reliability predictions is discussed again in chapter 4.

In practice, reliability testing yields the knowledge needed to verify and improve initial assumptions. As experience is gained, the assumptions undergo refinements that make it possible to develop more accurate reliability predictions for new missiles and systems not yet tested or operated. This information also provides design engineers and management with data to guide design decisions toward maximum missile or system reliability. Some reliability problems require the use of Bayes or Markovian probability theorems. Additional information on these and other topics is available in references 2-2 to 2-5, in IEEE Reliability Society publications, and in the documents listed in the reference sections for chapters 3 to 9 and in the bibliography at the end of this manual.

Probability Theorems

The three probability theorems presented here are fundamental and easy to understand. In these theorems and examples, the probability of success (reliability) is represented with an R and the probability of failure (unreliability) with a Q. The following section (Concept of Reliability) examines what contributes to the reliability and unreliability of products.

Theorem 1.--If the probability of success is R, the probability of failure Q is equal to 1 - R. In other words, the probability that all possible events will occur is Q + R = 1.

Example 1: If the probability of a missile flight success is 0.81, the probability of flight failure is 1 - 0.81 = 0.19. Therefore, the probability that the flight will succeed or fail is 0.19 + 0.81 = 1.0.

Theorem 2.--If R1 is the probability that a first event will occur and R2 is the probability that a second independent event will occur, the probability that both events will occur is R1R2. A similar statement can be made for more than two independent events.

Example 2: If the probability of completing one countdown without a failure R1 is 0.9, the probability of completing two countdowns without failure is R1R2 = (0.9)(0.9) = 0.81. The probability that at least one of the two countdowns will fail is 1 - R1R2 = 1 - 0.81 = 0.19 (from theorem 1). We say that at least one will fail because the unreliability term Q includes all possible failure modes, which in this case is two: one or both countdowns fail.

Example 3: If the probability of failure Q1 during one countdown is 0.1, the probability of failure during two countdowns is Q1Q2 = (0.1)(0.1) = 0.01. Therefore, the probability that at least one countdown will succeed is 1 - Q1Q2 = 1 - 0.01 = 0.99. We say that at least one will succeed because the value 0.99 includes the probability of one countdown succeeding and the probability of both countdowns succeeding.

Example 4: If the probability of completing one countdown without failure R1 is 0.9 and the probability of a second countdown failing is Q2 = 0.1, the probability that the first will succeed and the second will fail is R1Q2 = (0.9)(0.1) = 0.09.

Theorem 3.--If the probability that one event will occur is R1 and the probability that a second event will occur is R2 and if not more than one of the events can occur (i.e., the events are mutually exclusive), the probability that either the first or second event, not both, will occur is R1 + R2. A similar theorem can be stated for more than two events.

Example 5 (true event method): Consider now the probability of completing two countdowns without a failure. Let the probabilities of success for the first and second countdowns be R1 and R2 and the probabilities of failure be Q1 and Q2. To solve the problem using theorem 3, it is best to diagram the possible events as shown in figure 2-1. The mutually exclusive events are

   Q1     first countdown fails
   R1Q2   first countdown succeeds and second fails
   R1R2   both countdowns succeed

[Figure 2-1.--Diagram of possible events--probability of completing two countdowns without a failure.]

From theorem 3, the probability that one of the three events will occur is

   Q1 + R1Q2 + R1R2

But because these three events represent all possible events that can occur, their sum equals 1 (from theorem 1). Therefore,

   Q1 + R1Q2 + R1R2 = 1

The probability of completing both countdowns without one failure, R1R2, is the solution to the proposed problem; therefore,

   R1R2 = 1 - (R1Q2 + Q1)

If R1 = 0.9, Q1 = 0.1, R2 = 0.9, and Q2 = 0.1, then
   R1R2 = 1 - [(0.9)(0.1) + 0.1] = 1 - (0.09 + 0.1) = 1 - 0.19 = 0.81

which agrees with the answer found in example 2 by using theorem 2. The expression for R1R2 can also be written

   R1R2 = 1 - (R1Q2 + Q1) = 1 - [(1 - Q1)Q2 + Q1] = 1 - (Q1 + Q2 - Q1Q2)

which is the usual form given for the probability of both events succeeding. However, note that in this expression, the event indicated by Q1Q2 (both countdowns fail) is not a true possible event because we stipulated in the problem that only one countdown could fail. The term Q1Q2 is only a mathematical event with no relation to observable events. In other words, if the first countdown fails, we have lost our game with chance.

Example 6 (mathematical event method): Now consider the problem of example 5, ignoring for the time being the restriction on the number of failures allowed. In this case, the diagram of the possible events looks like that shown in figure 2-2, and the mutually exclusive events are

   R1R2   both countdowns succeed
   R1Q2   first countdown succeeds and second fails
   Q1R2   first countdown fails and second succeeds
   Q1Q2   both countdowns fail

[Figure 2-2.--Diagram of possible events--number of failures not restricted.]

Keep in mind that in this example both countdowns may fail. From theorem 3, the probability that one of the four events will occur is

   R1R2 + R1Q2 + Q1R2 + Q1Q2

Again, because the four events represent all possible events that can occur, their sum equals unity (from theorem 1); that is,

   R1R2 + R1Q2 + Q1R2 + Q1Q2 = 1

Solving for the probability that both countdowns will succeed gives

   R1R2 = 1 - (R1Q2 + Q1R2 + Q1Q2)

Substituting 1 - Q1 for R1 and 1 - Q2 for R2 on the right side of the equation yields the answer given in example 5:

   R1R2 = 1 - [(1 - Q1)Q2 + Q1(1 - Q2) + Q1Q2]
        = 1 - (Q2 - Q1Q2 + Q1 - Q1Q2 + Q1Q2)
        = 1 - (Q1 + Q2 - Q1Q2)

This countdown problem has been solved in two ways to acquaint you with both methods of diagramming probability events, the true event method and the mathematical event method. The exercises at the end of this chapter may be solved by using the method you prefer. We suggest that you work the problems before continuing to the next section because they help you to gain a working knowledge of the three theorems presented.
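As a quick numerical check of this algebra, the following Python sketch (added for illustration; it is not part of the original manual) evaluates both forms with the values from examples 5 and 6:

    # Check that the true event method (example 5) and the mathematical
    # event method (example 6) give the same probability that both
    # countdowns succeed. Values are from the text: R1 = R2 = 0.9.
    R1, R2 = 0.9, 0.9
    Q1, Q2 = 1 - R1, 1 - R2                          # theorem 1: Q = 1 - R

    true_event = 1 - (R1 * Q2 + Q1)                  # example 5
    math_event = 1 - (R1 * Q2 + Q1 * R2 + Q1 * Q2)   # example 6
    print(true_event, math_event)                    # both 0.81, within rounding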
Concept of Reliability

Now that you understand the concepts of probability and failure physics, you are ready to consider the concept of reliability. First, we will discuss the most common definition of reliability--in terms of the successful operation of a device. This definition, to fit the general theme of the manual, is then modified to consider reliability in terms of the absence of failure modes.

Reliability as Probability of Success

The classical definition of reliability is generally expressed as follows: Reliability is the probability that a device will operate successfully for a specified period of time and under specified conditions when used in the manner and for the purpose intended. This definition has many implications. The first is that when we say that reliability is a probability, we mean that reliability is a variable, not an absolute value. Therefore, if a device is 90 percent reliable, there is a 10 percent chance that it will fail. And because the failure is a chance, it may or may not occur. As in the coin example, as more and more of the devices are tested or operated, the ratio of total successes to total attempts should approach the stated reliability of 90 percent. The next implication concerns the statement "... will
operate successfully ..." This means that failures that keep the device from performing its intended mission will not occur. From this comes a more general definition of reliability: it is the probability of success.

It should be obvious then that a definition of what constitutes the success of a device or a system is necessary before a statement of its reliability is possible. One definition of success for a missile flight might be that the missile leaves the launching pad; another, that the missile hits the target. Either way, a probability of success, or reliability, can be determined, but it will not be the same for each definition of success. The importance of defining success cannot be overemphasized. Without it, a contractor and a customer will never reach an agreement on whether or not a device has met its reliability requirements (i.e., the mission).

The latter part of the classical definition indicates that a definition of success must specify the operating time, the operating conditions, and the intended use. Operating time is defined as the time period in which the device is expected to meet its reliability requirements. The time period may be expressed in seconds, minutes, hours, years, or any other unit of time. Operating conditions are defined as the environment in which the device is expected to operate; they specify the electrical, mechanical, and environmental levels of operation and their durations. Intended use is defined as the purpose of the device and the manner in which it will be used. For example, a missile designed to hit targets 1000 miles away should not be considered unreliable if it fails to hit targets 1100 miles away. Similarly, a set of ground checkout equipment designed to be 90 percent reliable for a 1-hour tactical countdown should not be considered unreliable if it fails during 10 consecutive countdowns or training exercises. The probability of success in this case is (0.9)^10 = 0.35 (from probability theorem 2).

In addition to these specified requirements, we must also consider other factors. As explained in the inherent product reliability section of this chapter, these areas have a marked effect on the reliability of any device.

Reliability as Absence of Failure

Although the classical definition of reliability is adequate for most purposes, we are going to modify it somewhat and examine reliability from a slightly different viewpoint. Consider this definition: Reliability is the probability that the critical failure modes of a device will not occur during a specified period of time and under specified conditions when used in the manner and for the purpose intended. Essentially, this modification replaces the words "a device will operate successfully" with the words "critical failure modes ... will not occur." This means that if all the possible failure modes of a device (ways the device can fail) and their probabilities of occurrence are known, the probability of success (or the reliability of a device) can be stated. It can be stated in terms of the probability that those failure modes critical to the performance of the device will not occur. Just as we needed a clear definition of success when using the classical definition, we must also have a clear definition of failure when using the modified definition.

For example, let a system have two subsystems, A and B, whose states are statistically independent and whose separate reliabilities are known to be RA = 0.990 and RB = 0.900. The system fails if and only if at least one subsystem fails. The appropriate formula for system reliability is

   Rsystem = RA × RB = 0.990 × 0.900 = 0.891

Product Application

This section relates reliability (or the probability of success) to product failures.

What are the types of product failure modes? In general, critical equipment failures may be classified as catastrophic, tolerance, or wearout. The expression for reliability then becomes

   RD = Probability{C × t × W}

where

   RD   design-stage reliability of a product
   C    event that catastrophic failure does not occur
   t    event that tolerance failure does not occur
   W    event that physical wearout does not occur

This is the design-stage reliability of a product as described by its documentation. (Note that Ri, the inherent reliability, is a term often used in place of RD.) The documentation specifies the product itself and states the conditions of use and operation. This design-stage reliability is predicated on the decisions and actions of many people. If they change, the design-stage reliability could change.

Why do we consider design-stage reliability? Because the facts of failure are these: When a design comes off the drawing board, the parts and materials have been selected; the tolerance, error, stress, and other performance analyses have been performed; the type of packaging is firm; the manufacturing processes and fabrication techniques have been decided; and usually the test methods and the quality acceptance criteria have been selected. The design documentation represents some potential reliability that can never be increased except by a design or manufacturing change or good maintenance. However, the possibility exists that the observed reliability will be much less than the potential reliability.
To understand why this is true, consider the hardware as a black box with a hole in both the top and bottom. Inside are potential failures that limit the design-stage reliability of the design. When the hardware is operated, these potential failures fall out the bottom (i.e., operating failures are observed). The rate at which the failures fall out depends on how the box or hardware is operated. Unfortunately, we never have just the design-stage failures to worry about because other types of failures are being added to the box through the hole in the top. These other failures are generated by the manufacturing, software, quality, and logistics functions, by the user or customer, and even by the reliability organization itself. We discuss these added failures and their contributors in the following paragraphs, but it is important to understand that because of the added failures, the observed failures could be greater than the design-stage failures.

K-Factors

The other contributors to product failure just mentioned are called K-factors; they have a value between 0 and 1 and modify the design-stage reliability:

   Rproduct = RD × (Kq × Km × Ks × Kr × Kl × Ku)

K-factors denote probabilities that design-stage reliability will not be degraded by

   Kq   quality test methods and acceptance criteria
   Km   manufacturing, fabrication, and assembly techniques
   Ks   software
   Kr   reliability engineering activities
   Kl   logistics activities
   Ku   user or customer

Any K-factor can cause reliability to go to zero. If each K-factor equals 1 (the goal), Rproduct = RD.
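To make the arithmetic concrete, here is a minimal Python sketch (the numerical values are hypothetical, chosen only for illustration; this code is not part of the original manual) that degrades a design-stage reliability by the six K-factors:

    # Hypothetical illustration of Rproduct = RD x (Kq Km Ks Kr Kl Ku).
    R_D = 0.98                    # design-stage (inherent) reliability

    k_factors = {
        "Kq": 0.99,   # quality test methods and acceptance criteria
        "Km": 0.97,   # manufacturing, fabrication, and assembly
        "Ks": 1.00,   # software
        "Kr": 0.99,   # reliability engineering activities
        "Kl": 1.00,   # logistics activities
        "Ku": 0.95,   # user or customer
    }

    R_product = R_D
    for k in k_factors.values():
        R_product *= k
    print(round(R_product, 3))    # less than R_D unless every K equals 1

Because every K-factor is at most 1, Rproduct can never exceed RD; this is the sense in which the design sets the upper limit on product reliability.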
Interface Definition and Control

This section describes a training manual on the elements of interface definition and control (ref. 2-7). This technical manual was developed as part of the Office of Safety and Mission Assurance continuous training initiative. The structured information contained herein will enable the reader to efficiently and effectively identify and control the technical detail needed to ensure that flight system elements mate properly during assembly operations (on the ground and in space).

Techniques used throughout the Federal Government to define and control technical interfaces for hardware and software were investigated. The proportion of technical information actually needed to effectively define and control the essential dimensions and tolerances of system interfaces rarely exceeded 50 percent of any interface control document. Also, the current government process for interface control is very paper intensive. Streamlining this process can improve communication, provide significant cost savings, and improve overall mission safety and assurance.

The objective of this manual is to ensure that the format, information, and control of interfaces between equipment are clear and understandable and contain only the information needed to guarantee interface compatibility. The emphasis is on controlling the engineering design of the interface, not on the functional performance requirements of the system or on the internal workings of the interfacing equipment. Interface control should take place, with rare exception, at the interfacing elements and not further.

Two essential sections of the manual are Principles of Interface Control and The Process: Through the Design Phases. The first discusses how interfaces are defined, describes the types of interfaces to be considered, and recommends a format for the documentation necessary to adequately control the interface. The second provides tailored guidance for interface definition and control.

This manual can be used to improve planned or existing interface control processes during system design and development and also to refresh and update the corporate knowledge base. The information presented will reduce the amount of paper and data required in interface definition and control processes by as much as 50 percent and will shorten the time required to prepare an interface control document. It also highlights the essential technical parameters that ensure that flight subsystems will indeed fit together and function as intended after assembly and checkout. Please contact the NASA Center for AeroSpace Information, (301) 621-0390, to obtain a copy.

Appendix A contains tables and figures that provide reference data to support chapters 2 to 6. Appendix B is a practical product assurance guide for project managers.

Concluding Remarks

Chapter 2 explained two principal concepts:

1. To design a reliable product or to improve a product, you must understand first how the product can fail and then how to control the occurrence of the failures.
2. There is an upper limit to a product's reliability when a traditional method of design and fabrication is used. This limit is the inherent reliability. Therefore, the most effective reliability engineer is the designer because all his decisions directly affect the product's reliability.

The three probability theorems were also illustrated.
References

2-1. James, G.: Mathematics Dictionary. Fourth Edition, Van Nostrand Reinhold, 1976.
2-2. Bazovsky, I.: Reliability Theory and Practice. Prentice-Hall, 1961.
2-3. Earles, D.R.; and Eddins, M.F.: Reliability Physics, The Physics of Failure. AVCO Corp., Wilmington, MA, 1962.
2-4. Calabro, S.: Reliability Principles and Practices. McGraw-Hill, 1962.
2-5. Electronic Reliability Design Handbook. MIL-HDBK-338, Vols. 1 and 2, Oct. 1988.
2-6. Lalli, Vincent R.; and Packard, Michael H.: Design for Reliability: Failure Physics and Testing. AR&MS Tutorial Notes, 1994.
2-7. Lalli, Vincent R.; Kastner, Robert E.; and Hartt, Henry N.: Training Manual for Elements of Interface Definition and Control. NASA RP-1370, Cleveland, OH, 1997.
Reliability Training 1

1a. What notation means to take the sum of the xi's from i = 1 to i = n?

   A. Σ xi (i = 0 to ∞)   B. Σ xk   C. Σ xi (i = 1 to n)

1b. If x̄ = 100, x1 = 90, x2 = 70, and x3 = 50, what is Σ (x̄ - xi)^2 (i = 1 to 3)?

   A. 350   B. 35×10^2   C. 35 000

2a. What notation means to take the nth root of x?

   A. x^n   B. √n   C. n√x

2b. If x̄ = 100, x1 = 90, x2 = 70, and x3 = 50, what is [Σ (x̄ - xi)^2]^(1/2) (i = 1 to 3)?

   A. 3.6   B. 59.2   C. 640

3a. What notation means to take the product of the xi's from i = 1 to n?

   A. Π xi (i = 0 to ∞)   B. Π xk   C. Π xi (i = 1 to n)

3b. If x1 = 0.9, x2 = 0.99, and x3 = 0.999, what is Π xi (i = 1 to 3)?

   A. 0.890   B. 0.800   C. 0.991

4a. The notation x! refers to what shorthand method of writing?

   A. Poles   B. Factorial   C. Polynomials

4b. What does 10!/8! equal?

   A. 800   B. 900   C. 90

5a. Describe the three rules for manipulation of exponential functions.

   i. Products

      A. Subtract exponents   B. Add exponents   C. Multiply exponents

   ii. Negative exponent

      A. Cancel exponents   B. Balance exponents   C. 1/Exponent

   iii. Division

      A. Add exponents   B. Subtract exponents   C. Multiply exponents

1 Answers are given at the end of this manual.
5b. Simplify e^6 e^3 e^-4.

   A. e^2   B. e^4   C. e^5

6. What is the integral of the following functions?

   a. ∫ x^3 dx from x1 to x2

      A. x^4/4   B. x^4/4 + C   C. [(x2)^4 - (x1)^4]/4

   b. ∫ e^-ax dx from x1 to x2

      A. -e^-ax/a   B. [e^-ax1 - e^-ax2]/a

7. What is the derivative of the following functions?

   a. 10x^4

      A. 40x^2   B. 40x^3   C. 10x^3

   b. e^2x

      A. e^2x   B. e^2x/2   C. 2e^2x

8a. Write the first two terms of the binomial expansion (a + b)^n.

   A. a^n + (n - 1)a^(n-1)b + ...   B. a^n - na^(n-1)b + ...   C. a^n + na^(n-1)b + ...

8b. Expand (a + b)^3 by using table 2-1.

   A. a^3 + 2a^2b + b^3   B. a^3 - 3a^2b - 3ab^2 + b^3   C. a^3 + 3a^2b + 3ab^2 + b^3

9. What needs to be done to design a reliable product?

   A. Test and fix it
   B. Know how its parts fail
   C. Know the type and magnitude of stresses that cause such failures
   D. Both B and C

10. What are a priori reliabilities estimates of?

   A. What may happen   B. What will happen   C. What has happened

11. What are a posteriori reliabilities observing?

   A. What may happen   B. What has happened   C. What will happen
12. If the probability of success is R, what is the probability of failure Q?

   A. 1 + R   B. 1 - R^2   C. 1 - R

13. If R1, R2, and R3 are the probabilities that three independent events will occur, what is the probability that all three will occur?

   A. R1 + R2 + R3   B. R1(R2 + R3)   C. Π Ri (i = 1 to 3)

14. If R1, R2, and R3 are the probabilities that three independent events will occur and not more than one of the events can occur, what is the probability that one of these events will occur?

   A. R1R2R3   B. R3(R1 + R2)   C. Σ Ri (i = 1 to 3)

15. What do we need to know if a device is to perform with classical reliability?

   A. Operating time and conditions
   B. How it will be used
   C. The intended purpose
   D. All of the above

16. What do we need to know if a device is to perform with reliability defined as the absence of failure?

   A. Critical failure modes
   B. Operating time and conditions
   C. How it will be used
   D. The intended purpose
   E. All of the above

17. What is the inherent reliability Ri of the product you are working on?

   A. Pc (the probability that catastrophic part failures will not occur)
   B. Pt (the probability that tolerance failures will not occur)
   C. Pw (the probability that wearout failures will not occur)
   D. The product of all the above

18. What is the reliability of your product?

   A. Kq (the probability that quality test methods will not degrade Ri)
   B. Km (the probability that manufacturing processes will not degrade Ri)
   C. Kr (the probability that reliability activities will not degrade Ri)
   D. Kl (the probability that logistic activities will not degrade Ri)
   E. Ku (the probability that the user will not degrade Ri)
   F. The product of all of the above and Ri
Chapter 3
Exponential Distribution and Reliability Models

An expression for the inherent reliability of a product was given in chapter 2 as (ref. 3-1)

   Ri = Pc Pt Pw

where

   Pc   probability that catastrophic part failures will not occur
   Pt   probability that tolerance failures will not occur
   Pw   probability that wearout failures will not occur

In chapter 3, we discuss the term Pc and develop and explain its mathematical representation in detail. We then use the probability theorems to establish methods of writing and solving equations for product reliability in terms of series and redundant elements.

Exponential Distribution

To understand what is meant by exponential distribution, first examine a statistical function called the Poisson distribution, which is expressed as (ref. 3-2)

   P(x,t) = (λt)^x e^(-λt) / x!

where

   x   observed number of failures
   t   operating time
   λ   average failure rate

This distribution states that if an observed average failure rate λ is known for a device, it is possible to calculate the probability P(x,t) of observing x = 0, 1, 2, 3, ... failures when the device is operated for any period of time t.

To illustrate, consider a computer that has been observed to make 10 arithmetic errors (or catastrophic failures) for every hour of operation. Suppose that we want to know the probability of observing 0, 1, and 2 failures during a 0.01-hr program. From the data given,

   x (observed failures) = 0, 1, and 2
   t (operating time) = 0.01 hr
   λ (failure rate) = 10 failures/hr

The probability of observing no failures P(0, 0.01) is then

   P(0, 0.01) = (10 × 0.01)^0 e^(-(10 × 0.01)) / 0! = (1 × e^-0.1) / 1 = e^-0.1 = 0.905

The probability of observing one failure P(1, 0.01) is

   P(1, 0.01) = (10 × 0.01)^1 e^(-(10 × 0.01)) / 1! = (0.1)e^-0.1 / 1 = 0.1 × 0.905 = 0.091
The probability of observing two failures P(2, 0.01) is

   P(2, 0.01) = (10 × 0.01)^2 e^(-(10 × 0.01)) / 2! = (0.1)^2 e^-0.1 / (2 × 1) = 0.01 × 0.905 / 2 = 0.0045

Remember that the definition of Pc is the probability that no catastrophic failures will occur. So, for the computer, Pc = P(0, 0.01) = 0.905. In other words, there is a 90.5-percent chance that no arithmetic errors will occur during the 0.01-hr program. This is the reliability of the computer for that particular program.
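These three values can be reproduced with a few lines of Python (a sketch added for illustration; it is not part of the original manual):

    # Poisson check of the computer example: lambda = 10 failures/hr,
    # t = 0.01 hr, and x = 0, 1, 2 observed failures.
    from math import exp, factorial

    def poisson(x, lam, t):
        """P(x,t): probability of exactly x failures in operating time t."""
        return (lam * t) ** x * exp(-lam * t) / factorial(x)

    lam, t = 10.0, 0.01
    for x in range(3):
        print(x, round(poisson(x, lam, t), 5))
    # prints 0.90484, 0.09048, 0.00452 -- the text's 0.905, 0.091, 0.0045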
Again, the Poisson distribution for x = 0 (i.e., no observed failures) is

   P(0,t) = (λt)^0 e^(-λt) / 0! = e^(-λt)

The term e^(-λt) is called the exponential distribution and is the simplest form of Pc. Consequently, for a device that has an average failure rate λ, the probability of observing no failures for a period of time t is (ref. 3-3)

   Pc = e^(-λt)

The expression for inherent reliability now takes the form

   Ri = e^(-λt) Pt Pw

or, in the more general expression for total product reliability,

   R = e^(-λt) Pt Pw (Kq Km Kr Kl Ku)

At this point it is probably a good idea to digress for a moment to explain why these expressions for reliability may differ from those used elsewhere. During the conceptual and early research and development phases of a program, it is common practice (and sometimes necessary because of a lack of information) to assume that Pt = 1 (the design is perfect), that Pw = 1 (no wearout failures will occur), and that the K-factors all equal 1 (there will be no degradation of inherent reliability). These assumptions reduce the inherent reliability and product reliability expressions to

   Ri = R = e^(-λt)

Frequently, these assumptions are not realistic, and the resultant reliability predictions are usually high. They may bear little resemblance to the reliability finally observed when the product is tested. Later in this manual, we will let

   Pc = R = e^(-λt)

to keep the notation simple.

On the other hand, it is also common to use e^(-λt) to represent the observed product reliability. In this case the observed average failure rate λ represents the combination of all types of failures including catastrophic, tolerance, and wearout. If the total product failure rate is λ', then

   R = e^(-λ't) = e^(-λt) Pt Pw (Kq Km Kr Kl Ku)

Failure Rate Definition

The failure rate λ as used in the exponential distribution e^(-λt) represents random catastrophic part failures that occur in so short a time that they cannot be prevented by scheduled maintenance (ref. 3-4). Random means that the failures occur randomly in time (not necessarily from random causes, as many people interpret random failure) and randomly from part to part. For example, suppose a contractor uses 1 million integrated circuits in a computer. Over a period of time he may observe an average of one circuit failure every 100 operating hrs. Even though he knows the failure rate, he cannot say which one of the million circuits will fail. All he knows is that, on the average, one will fail every 100 hrs. In fact, if a failed circuit is replaced with a new one, the new one, theoretically, has the same probability of failure as any other circuit in the computer. In addition, if the contractor performs a failure analysis on each of the failed circuits, he may find that every failure is caused by the same mechanism, such as poorly welded joints. Unless he takes some appropriate corrective action, he will continue to observe the same random failures even though he knows the failure cause.

A catastrophic failure is an electrical open or short, a mechanical or structural defect, or an extreme deviation from an initial setting or tolerance (a 5-percent-tolerance resistor that deviated beyond its end-of-life tolerance, say to 20 percent, would be considered to have failed catastrophically).

The latter portion of the failure rate definition refers to the circumstance under which a failure is revealed. If a potential operating failure is corrected by a maintenance function, such as scheduled preventive maintenance where an out-of-tolerance part could be replaced, that replacement cannot be represented by λ because it did not cause an operating or unscheduled failure. Here we see one of the many variables that affect the operating failure rate of a product: the maintenance philosophy.
TABLE 3-1.--COMMON FAILURE RATE DIMENSIONS

   Failures/10^3 hr,   Failures/10^6 hr   Failures/10^9 hr
   percent
   10.0                100.0              100 000.0
   1.0                 10.0               10 000.0
   0.1                 1.0                1 000.0
   0.01                0.1                100.0
   0.001               0.01               10.0
   0.0001              0.001              1.0
   0.00001             0.0001             0.1
   0.000001            0.00001            0.01
   0.0000001           0.000001           0.001

Failure Rate Dimensions

Failure rate has the dimension of failures per unit of time, where the time is usually expressed in 10^x hours or cycles. Some government documents express λ in percent failures per 10^3 hours. Table 3-1 shows the most common usage. Generally, the form that permits calculations using whole numbers rather than decimal fractions is chosen.

"Bathtub" Curve

In the Poisson distribution, λ was referred to as an average failure rate, indicating that λ may be a function of time, λ(t). Figure 3-1 shows three general curves representing λ(t) possibilities. Curve A shows that as operating time increases, the failure rate also increases. This type of failure rate is found where wearout or age is a dominant failure mode stress (e.g., slipped clutches or tires). Curve B shows that as operating time increases, the failure rate decreases. This type of failure rate has been observed in some electronic parts, especially semiconductors. Curve C shows that as operating time increases, the failure rate remains constant. This type of failure rate has been observed in many complex systems and subsystems. In a complex system (i.e., one with a large number of parts), parts having decreasing failure rates reduce the effect of those having increasing failure rates. The net result is an observed near-constant failure rate for the system. Therefore, part failure rates are usually given as a constant although in reality they may not be. This manual deals only with constant part failure rates because they are related to system operation. Even if the failure rates might be changing over a period of time, the constant-failure-rate approximation is used.

[Figure 3-1.--Failure rate curves.]

If the failure rate for a typical system or complex subsystem is plotted against operating life, a curve such as that shown in figure 3-2 results. The curve is commonly referred to as a "bathtub" curve. The time t0 represents the time at which the system is first put together. The interval from t0 to t1 represents a period during which assembly errors, defective parts, and compatibility problems are found and corrected. As shown, the system failure rate decreases during this debugging, or burn-in, interval as these gross errors are eliminated. The interval from t1 to t2 represents the useful operating life of the equipment and is generally considered to have a constant failure rate. During this time, the expression Pc = e^(-λt) is used. Therefore, when using e^(-λt), we assume that the system has been properly debugged. In practice, this assumption may not be true, but we may still obtain an adequate picture of the expected operating reliability by accepting the assumption. The interval from t2 to t3 represents the wearout period during which age and deterioration cause the failure rate to increase and render the system inoperative or extremely inefficient and costly to maintain.

[Figure 3-2.--Failure rate versus operating time: debugging region (t0 to t1), intrinsic failure rate region (t1 to t2), and wearout region (t2 to t3).]

The following analogy should help to summarize the concepts of failure and failure rate. A company picnic is planned to be held on the edge of a high cliff. Because families will be invited, there will be various types of people involved: large, small, young, and old, each type with its own personality and problems. Picnic officials are worried about someone's falling over the cliff. The question is, What can be done about it? Four possible solutions are presented:

(1) Move the picnic farther back from the cliff. The farther back, the less the chance someone will fall over.
(2) Shorten the picnic time. The shorter the picnic, the less time someone has to walk to the cliff.
(3) Look over the cliff to see if anyone has fallen. A good idea because people would know when to call the ambulance. Unfortunately, looking over the cliff does not keep others from falling. It is possible, however, that going to the bottom of the cliff to see who has fallen over might reveal that every 15 minutes one person over the age of 99 falls over the cliff. Knowing this, all persons over 99 could be sent home and the picnic saved from further tragedy.

(4) Build a high fence to separate the cliff edge from the picnickers. Obviously, this is the best solution because it is doubtful that anyone would climb the fence just to get to the cliff.

Now, let us look at this picnic-to-failure-rate analogy. Say that we are building a system (picnic) made of many parts (people) and that there are many types of parts: some large, some small, and some new and untried, such as integrated circuits. Some of these parts, the composition resistors for instance, are old and mature. Each part has its own personality (the way it was fabricated). Our problem is how to keep these parts from failing (falling over the cliff). Again we have four possible solutions:

(1) Reduce the stresses on the parts (move the picnic back from the cliff); the lower the stresses, the fewer the failures.

(2) Reduce the operating time (the picnic); the shorter the operating time, the less chance a part has to fail.

(3) Establish part failure rates (look over the cliff to see if anyone has fallen); this only helps if we know what parts (people) are failing. Once we know this, we can eliminate those parts from our system.

(4) Eliminate the failure mechanisms of the part (build a fence to separate the cliff edge from the picnic). This is the best answer, of course, because if we eliminate the cause of part failures, we cannot have any system failures.

Mean Time Between Failures

For the exponential distribution, the reciprocal of the failure rate is the mean time between failures (MTBF) and is the integral of the exponential distribution:

   MTBF = ∫ e^(-λt) dt (from 0 to ∞) = 1/λ

Therefore, if a device has a failure rate of one failure per 100 hrs, its MTBF is 100 hrs.

If the time dimension is given in cycles, the MTBF becomes mean cycles between failures (MCBF), a term also in common use. For a nonrepairable device, mean time to failure (MTTF) is used instead of MTBF. For a repairable device, MTBF is usually equal to MTTF.

For example, if a device has an MTBF of 200 hrs, this neither means that the device will not fail until 200 operating hours have accumulated nor that the device will fail automatically at 200 hrs. MTBF is exactly what it says: a mean or average value, which can be seen from

   e^(-λt) = e^(-t/MTBF)

When the operating time t equals the MTBF, the probability of no failure is (using exponential tables or a slide rule)

   e^(-MTBF/MTBF) = e^(-1) = 0.368

which means that there is a chance of 1 - 0.368 = 0.632 that the device will fail before its MTBF is reached. In other words, if a device has an MTBF of 1000 hrs, replacing the device after 999 hrs of operation will not improve reliability. To show the concept of a mean value in another way, consider the following empirical definition of MTBF:

   MTBF = Total test hours / Total observed failures

Note that the time when the failures were observed is not indicated. The assumption of a constant failure rate leads to a constant time between failures, or MTBF.
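The following Python sketch (illustrative only; the test totals are hypothetical, not from the manual) ties the empirical MTBF definition to the exponential reliability expression:

    # Empirical MTBF and the 0.368 rule for the exponential distribution.
    from math import exp

    total_test_hours = 500.0
    total_failures = 5
    mtbf = total_test_hours / total_failures    # 100 hr
    lam = 1.0 / mtbf                            # failure rate, failures/hr

    t = mtbf                                    # operate for one full MTBF
    print(round(exp(-lam * t), 3))              # 0.368: a 63.2% chance of
                                                # failing before the MTBF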
Calculations of Pc for Single Devices

If a failure rate for a device is known, the probability of observing no failures for any operating period t can be calculated.

Example 1: A control computer in a missile has a failure rate of 1 per 10^2 hrs. Find Pc for a flight time of 0.1 hr.

Solution 1:

   Pc = e^(-λt) = e^(-(1/10^2)(0.1)) = e^(-1×10^-3) = e^(-0.001) = 0.999

Therefore, there is one chance in a thousand that the control computer will fail. (Note: if λt or t/MTBF is less than 0.01, Pc ≈ 1 - λt, or 1 - t/MTBF.) For example,

   Pc = e^(-0.001) ≈ 1 - 0.001 = 0.999
If λt, or t/MTBF, is greater than 0.01, use exponential tables to find Pc, as shown here:

   Pc = e^(-0.08) = 0.923

Example 2: The same type of problem can be solved if the MTBF is known. The MTBF of a tape reader used in ground support equipment is 100 hrs. Find Pc for a 2-hr operation.

Solution 2:

   Pc = e^(-t/MTBF) = e^(-2/100) = e^(-0.02) = 0.980

If a specific Pc is required for a specified operating time, the required failure rate, or MTBF, can be calculated.

Example 3: A relay is required to have a 0.999 probability of not failing for 10 000 cycles. Find the required failure rate and MCBF.

Solution 3:

   R = e^(-λt)
   0.999 = e^(-0.001) = e^(-λ(10^4 cycles))

Equating exponents gives

   λ(10^4 cycles) = 0.001
   λ = 0.001/10^4 = 1 failure/10^7 cycles

The required MCBF is therefore

   MCBF = 1/λ = 10^7 cycles

Reliability Models

In the following sections we replace Pc = e^(-λt), the reliability of a part, with an R to keep the notation simple.

Calculation of Reliability for Series-Connected Devices

In reliability, devices are considered to be in series if each device is required to operate without failure to obtain system success (ref. 3-5). A system composed of two parts is represented in a reliability diagram, or model, as shown in figure 3-3.

[Figure 3-3.--Series model: if part 1 does not fail and part 2 does not fail, then success.]

If the reliability R for each part is known (probability theorem 2, ch. 2), the probability that the system will not fail is

   Rs = R1R2

(We assume that the part reliabilities are independent; i.e., the success or failure of one part will not affect the success or failure of another part.) If there are n parts in the system with each one required for system success, the total system reliability is given by

   Rs = R1R2R3 ··· Rn = Π Rj (j = 1 to n)

where

   Rs   probability that system will not fail
   Rj   reliability of jth part
   n    total number of parts

The expression

   Rs = Π Rj (j = 1 to n)

is often called the product rule.

Example 4: A system has 100 parts, each one required for system success. Find the system reliability Rs if each part has R = 0.99.

Solution 4:

   Rs = R1R2R3 ··· R100 = (0.99)(0.99)(0.99) ··· (0.99) = (0.99)^100 = (e^(-0.01))^100 = e^(-1) = 0.368

Therefore, the probability that the system will succeed is about 37 percent.
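Example 4 can be reproduced in a few lines of Python (an added illustration, not part of the original manual):

    # Product rule for 100 series parts, each with R = 0.99 (example 4).
    from math import prod

    R_s = prod([0.99] * 100)
    print(round(R_s, 3))   # 0.366, close to the text's e**-1 = 0.368
                           # (the text uses the approximation 0.99 = e**-0.01)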
success (ref. 3-5). A system composed of two parts is represented Example 5: For a typical missile that has 7000 active parts
in a reliability diagram, or model, as shown in figure 3-3. If the and a reliability requirement of 0.90, each part would have to
reliability R for each part is known (probability theorem 2, have a reliability Rp of 0.999985, which is calculated using
ch. 2), the probability that the system will not fail is table A- 1:

NASA/TP--2000-207428 25
   (Rp)^7000 = 0.90 = e^(-0.105)

Solution 5: Therefore,

   Rp = (e^(-0.105))^(1/7000) = e^(-1.5×10^-5) = e^(-0.000015) = 1 - 0.000015 = 0.999985

The product rule can also be expressed as

   Rs = Π Rj = R1R2R3 ··· Rn
      = e^(-λ1t1) e^(-λ2t2) e^(-λ3t3) ··· e^(-λntn)
      = e^(-(λ1t1 + λ2t2 + λ3t3 + ··· + λntn))
      = exp(-Σ λjtj)   (j = 1 to n)

where

   λj   failure rate of jth part
   tj   operating time of jth part

Therefore, if for each series-connected part in a system the failure rate and operating time are known, the system reliability can be calculated by finding -Σ λjtj and raising e to that power.

Example 6: Find the system reliability from the model shown in figure 3-4.

[Figure 3-4.--Series model using failure rates and operating times: part 1, λ1 = 10/10^3, t1 = 10; part 2, λ2 = 20/10^3, t2 = 4; part 3, λ3 = 100/10^3, t3 = 2.]

Solution 6:

Step 1:

   Σ λjtj = λ1t1 + λ2t2 + λ3t3 = 10/10^3 (10) + 20/10^3 (4) + 100/10^3 (2)
          = 100/10^3 + 80/10^3 + 200/10^3 = 380/10^3

Step 2:

   Rs = exp(-Σ λjtj) = e^(-380/10^3) = e^(-0.38) = 0.684

If the tj's are equal (i.e., each part of the device operates for the same length of time), the product rule can further be reduced to

   Rs = exp(-tc Σ λj)   (j = 1 to n)

where tc is the common operating time.

Example 7: Find the reliability of the system shown in figure 3-5.

[Figure 3-5.--Series model with operating times equal: λ1 = 7/10^3, λ2 = 5/10^3, λ3 = 6/10^3; t1 = t2 = t3 = 10.]

Solution 7:

Step 1:

   Σ λj = λ1 + λ2 + λ3 = 7/10^3 + 5/10^3 + 6/10^3 = 18/10^3
Step 2:

   Rs = exp(-tc Σ λj) = e^(-(18/10^3)(10)) = e^(-0.18) = 0.835

Calculation of Reliability for Parallel-Connected Devices (Redundancy)

In reliability, devices are considered to be in parallel if one or more of them can fail without causing system failure but at least one of them must succeed for the system to succeed. First we consider simple redundancy.

Simple redundancy.--If n devices are in parallel so that only one of them must succeed for the system to succeed, the devices are said to be in simple redundancy. The model of a two-part redundancy system presented in figure 3-6 illustrates this concept. In other words, if part 1 fails, the system can still succeed if part 2 does not fail, and vice versa. However, if both parts fail, the system fails.

[Figure 3-6.--Simple redundancy model: if part 1 does not fail or part 2 does not fail, then success.]

From probability theorem 3 in chapter 2, we know that the possible combinations of success R and failure Q of two devices are given by

   R1R2 + R1Q2 + Q1R2 + Q1Q2

where

   R1R2   both parts succeed
   R1Q2   part 1 succeeds and part 2 fails
   Q1R2   part 1 fails and part 2 succeeds
   Q1Q2   both parts fail

We also know that the sum of these events equals unity since they are mutually exclusive (i.e., if one event occurs, the others cannot occur). Therefore,

   R1R2 + R1Q2 + Q1R2 + Q1Q2 = 1

Because at least one of the parts or devices must succeed in simple redundancy, the probability of this happening is given by

   R1R2 + R1Q2 + Q1R2 = 1 - Q1Q2

In simple terms, if the only way the redundant system can fail is by all redundant parts failing, the probability of success must be equal to 1 minus the probability that all redundant parts will fail (i.e., R = 1 - Q, from probability theorem 1 in chapter 2). This reasoning can be extended to n redundant parts if at least one of the n parts must succeed for the system to succeed.

Example 8: Suppose that a space capsule can be guided three ways: (1) automatically with R1 = 0.9, (2) semiautomatically with R2 = 0.8, and (3) manually with R3 = 0.7. The diagram of successful guiding, assuming that the three ways are independent of each other, is shown in figure 3-7.

[Figure 3-7.--Space capsule guidance model: automatic control (R1 = 0.9), or semiautomatic control (R2 = 0.8), or manual control (R3 = 0.7), then success.]

From probability theorem 3 in chapter 2, the possible events are given by

   R1R2R3 + R1R2Q3 + R1Q2R3 + Q1R2R3 + R1Q2Q3 + Q1Q2R3 + Q1R2Q3 + Q1Q2Q3

Because the sum of these probabilities is equal to unity and at least one of the control systems must operate successfully, the probability that guidance will be successful, Rguidance, is

   Rguidance = R1R2R3 + R1R2Q3 + R1Q2R3 + Q1R2R3 + R1Q2Q3 + Q1Q2R3 + Q1R2Q3
             = 1 - Q1Q2Q3 = 1 - [(1 - R1)(1 - R2)(1 - R3)]
             = 1 - [(1 - 0.9)(1 - 0.8)(1 - 0.7)]
             = 1 - [(0.1)(0.2)(0.3)]
             = 1 - 0.006 = 0.994
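Example 8 can be verified the same way; the sketch below (an added illustration, not part of the original manual) multiplies the failure probabilities of the redundant paths:

    # Simple redundancy: the system fails only if all three guidance
    # modes fail (example 8).
    R = [0.9, 0.8, 0.7]            # automatic, semiautomatic, manual

    Q_all = 1.0
    for r in R:
        Q_all *= (1 - r)           # probability that every mode fails
    print(round(1 - Q_all, 3))     # 0.994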
In general, then, for simple redundancy

   Rsimple redundant = 1 - Π Qj (j = 1 to n) = 1 - (Q1Q2Q3 ··· Qn)

where
   Π Qj   total probability of failure (j = 1 to n)
   Qj     probability of failure of jth redundant part
   n      total number of redundant parts

Example 9: Find the reliability of the redundant system shown in figure 3-8.

[Figure 3-8.--Simple redundancy model using failure rates and operating times: part 1, λ1 = 120/10^6, t1 = 1000; part 2, λ2 = 340/10^6, t2 = 1000.]

Solution 9:

Step 1--Solve for the reliability of parts 1 and 2:

   R1 = e^(-λ1t1) = e^(-(120/10^6)×10^3) = e^(-0.120) = 0.887
   R2 = e^(-λ2t2) = e^(-(340/10^6)×10^3) = e^(-0.340) = 0.712

Step 2--Solve for the unreliability of each part:

   Q1 = 1 - R1 = 0.113
   Q2 = 1 - R2 = 0.288

Step 3--Solve for the reliability of the redundant system:

   Rsimple redundant = 1 - Q1Q2 = 1 - (0.113)(0.288) = 1 - 0.033 = 0.967

There is a 96.7-percent chance, therefore, that both parts will not fail during the 1000-hr operating time.

Compound redundancy.--Compound redundancy exists when more than one of n redundant parts must succeed for the system to succeed. This can be shown in a model of a three-element redundant system in which at least two of the elements must succeed (fig. 3-9).

[Figure 3-9.--Compound redundancy model: any two of parts 1, 2, and 3 must not fail for success.]

From probability theorem 3 in chapter 2, the possible events are

   R1R2R3 + R1R2Q3 + R1Q2R3 + Q1R2R3 + R1Q2Q3 + Q1Q2R3 + Q1R2Q3 + Q1Q2Q3

To simplify the notation, let R1 = R2 = R3 = R and Q1 = Q2 = Q3 = Q. This reduces the expression to

   R^3 + R^2Q + R^2Q + R^2Q + RQ^2 + RQ^2 + RQ^2 + Q^3

or

   R^3 + 3R^2Q + 3RQ^2 + Q^3

Because the sum of these probabilities equals unity and at least two of the three parts must succeed, the probability for success is given by

   Rs = R^3 + 3R^2Q = 1 - (3RQ^2 + Q^3)

where 3RQ^2 represents one part succeeding and two parts failing and Q^3 represents all three parts failing.

Example 10: Assume that there are four identical power supplies in a fire control center and that at least two of them must continue operating for the system to be successful. Let each supply have the same reliability, R = 0.9 (which could represent e^(-λt) or Ri or R). Find the probability of system success Rs.

Solution 10: The number of possible events is given by

   (R + Q)^4 = R^4 + 4R^3Q + 6R^2Q^2 + 4RQ^3 + Q^4

The sum of the probabilities of these events equals unity; therefore, the expression for two out of four succeeding is

   Rs = R^4 + 4R^3Q + 6R^2Q^2 = 1 - (4RQ^3 + Q^4)

Substituting R = 0.9 and Q = 1 - 0.9 gives

   Rs = 1 - (4RQ^3 + Q^4) = 1 - [4(0.9)(0.1)^3 + (0.1)^4]
      = 1 - [(3.6)(0.001) + 0.0001] = 1 - (0.0036 + 0.0001)
      = 1 - 0.0037 = 0.996
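The binomial bookkeeping in example 10 generalizes to any arrangement in which at least k of n identical parts must succeed; a minimal Python sketch (added for illustration, not part of the original manual) is:

    # Compound redundancy: at least 2 of 4 identical supplies must work.
    # P(k of n work) comes from the binomial expansion of (R + Q)**n.
    from math import comb

    R, n, need = 0.9, 4, 2
    Q = 1 - R
    R_s = sum(comb(n, k) * R**k * Q**(n - k) for k in range(need, n + 1))
    print(round(R_s, 4))   # 0.9963, the text's 0.996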
Calculation of Reliability for Complete System

To find the reliability for a complete system, begin by developing a model for the system, write the equation for the probability of success from the model, and then use the failure rates and operating times of the system elements to calculate the reliability of the system (refs. 3-6 to 3-8).

Example 11: Consider the system model with series and redundant elements shown in figure 3-10.

[Figure 3-10.--Model of system with series and redundant elements: parts 1, 2, and 3 in series, followed by parts 4, 5, and 6 in simple redundancy.]

Solution 11: The equation can be written directly as

   Rs = R1R2R3(1 - Q4Q5Q6)

where R1R2R3 represents the probability of success of the series parts and (1 - Q4Q5Q6) represents the probability of success of the three parts in simple redundancy. If we know that

   R1 = 0.99 = e^(-0.01)      R4 = 0.85
   R2 = 0.999 = e^(-0.001)    R5 = 0.89
   R3 = 0.95 = e^(-0.05)      R6 = 0.78

where R may represent e^(-λt), inherent reliability Ri, or observed product reliability depending on the stage of product development, then the reliability of the system is

   Rs = e^(-0.01) e^(-0.001) e^(-0.05) [1 - (1 - 0.85)(1 - 0.89)(1 - 0.78)]
      = e^(-0.061) [1 - (0.15)(0.11)(0.22)] = e^(-0.061) (1 - 0.00363)
      = e^(-0.061) e^(-0.0036) = e^(-0.065) = 0.935

However, this does not mean that there will be no equipment failures. The system will still succeed even though one or two of the redundant paths have failed.

Example 12: Write the equation for the system shown in figure 3-11.

[Figure 3-11.--System reliability model using series, simple redundancy, and compound redundancy elements: parts 1 and 2 in series, parts 3, 4, and 5 in two-out-of-three compound redundancy, and parts 6 and 7 in simple redundancy.]

Solution 12: The equation can be written directly as

   Rs = R1R2[1 - (R3Q4Q5 + Q3R4Q5 + Q3Q4R5 + Q3Q4Q5)](1 - Q6Q7)

where R1R2 is the probability that the two parts in series will not fail, 1 - (R3Q4Q5 + ... + Q3Q4Q5) is the probability that two out of three of the compound redundant parts will not fail, and (1 - Q6Q7) is the probability that both the simple redundant parts will not fail. If data giving the reliabilities of each part are available, insert this information in the system success equation to find the system reliability.
Example 13: Write the equation for the system shown in figure 3-12.

[Figure 3-12.--Model with series elements in redundant paths: part 1 in series with three redundant paths (part 2; part 3; parts 4 and 5 in series), followed by parts 6 and 7 in series.]

Solution 13: The equation can be written directly as

   Rs = R1{1 - [Q2Q3(1 - R4R5)]}R6R7

where R1R6R7 is the reliability of the series parts, (1 - R4R5) is the probability that R4 or R5 will fail in the bottom redundant path, and {1 - [Q2Q3(1 - R4R5)]} is the reliability of the three paths in simple redundancy.
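To tie the chapter together, the sketch below (an added illustration; the helper names and part reliabilities are hypothetical, not from the manual) composes series, simple redundant, and two-out-of-three compound redundant groups in Python. Note that the k_of_n helper assumes identical part reliabilities, as in example 10:

    # Building-block functions for system reliability models.
    from math import comb

    def series(rs):
        """Product rule for series-connected parts."""
        out = 1.0
        for r in rs:
            out *= r
        return out

    def simple_redundant(rs):
        """At least one of the parallel parts must succeed."""
        q_all = 1.0
        for r in rs:
            q_all *= (1 - r)
        return 1 - q_all

    def k_of_n(k, n, r):
        """At least k of n identical parts must succeed."""
        q = 1 - r
        return sum(comb(n, j) * r**j * q**(n - j) for j in range(k, n + 1))

    # Hypothetical system: two series parts, a 2-of-3 group, a redundant pair.
    R_s = series([0.99, 0.999]) * k_of_n(2, 3, 0.95) * simple_redundant([0.9, 0.9])
    print(round(R_s, 3))   # about 0.972 for these values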
Concluding Remarks

Chapter 3 has presented several important concepts that you should have clearly in mind:

(1) The exponential distribution e^(-λt) represents the probability that no catastrophic part failures will occur in a product.

(2) The failure rate λ as used in e^(-λt) is a constant and represents the rate at which random catastrophic failures occur.

(3) Although the cause of failure is known, random failures may still occur.

(4) The mean time between failures (MTBF) is the reciprocal of the failure rate.

(5) In reliability, devices are in series if each one is required to operate successfully for the system to be successful. Devices are parallel or redundant if one or more can fail without causing system failure but at least one of the devices must succeed for the system to succeed.

In addition, you should be able to calculate the following:

(1) The reliability of a device, given failure rate and operating time

(2) The reliability of devices connected in series from the product rule, Rs = Π Rj (j = 1 to n)

(3) The reliability of devices connected in simple redundancy from Rsimple redundant = 1 - Π Qj (j = 1 to n)

(4) The reliability of n devices connected in compound redundancy by expanding (R + Q)^n and collecting the appropriate terms

And finally, you should be able to combine the four methods described above to calculate the reliability of a total system.

In 1985, alternative methodologies were introduced in the form of computer reliability analysis programs. One such underlying model uses a Weibull failure rate during the burn-in, or "infant mortality," period and a constant failure rate during the steady-state period for electronic devices. Initial results indicate that given a 15- to 40-yr system life, the infant mortality period is assumed to last for the first year. Of course, the higher the stress of the environment, the shorter the period of infant mortality. The point is that there are many ways to perform reliability studies, and different methodologies could be equally appropriate or inappropriate. Appendix C describes five distribution functions that can be used for reliability analysis. Table C-1 shows the time-to-failure fit for various systems. The basic criteria relate to the distribution of failures with time.

References

3-1. Failure Distribution Analyses Studies, Vols. I, II, and III. Computer Applications Inc., New York, Aug. 1964. (Avail. NTIS: AD-631525, AD-631526, AD-631527.)
3-2. Hoel, Paul G.: Elementary Statistics. John Wiley & Sons, Inc., 1960.
3-3. Calabro, S.: Reliability Principles and Practices. McGraw-Hill, 1962.
3-4. Reliability Prediction of Electronic Equipment. MIL-HDBK-217E, Jan. 1990.
3-5. Electronic Reliability Design Handbook. MIL-HDBK-338, Vols. I and II, Oct. 1988.
3-6. Bloomquist, C.; and Graham, W.: Analysis of Spacecraft On-Orbit Anomalies and Lifetimes. (PRC R-3579, PRC Systems Sciences Co.; NASA Contract NAS5-27279), NASA CR-170565, 1983.
3-7. Government-Industry Data Exchange Program (GIDEP): Reliability-Maintainability (R-M) Analyzed Data Summaries. Vol. 7, Oct. 1985.
3-8. Kececioglu, D.: Reliability Engineering Handbook. Vols. 1 and 2, Prentice-Hall, 1991.
Reliability Training 1

1a. Of 45 launch vehicle flights, 9 were determined to be failures. What is the observed reliability?

   A. 0.7   B. 0.8   C. 0.9

1b. What is the observed reliability if the next five flights are successful?

   A. 0.72   B. 0.82   C. 0.87

1c. After the five successes of part 1b, how many more successes (without additional failures) are required for a reliability of R = 0.90?

   A. 20   B. 30   C. 40

2. A three-stage launch vehicle has a reliability for each stage of R1 = 0.95, R2 = 0.94, R3 = 0.93.

   a. What is the probability of one successful flight?

      A. 0.83   B. 0.85   C. 0.87

   b. What is the probability of flight failure for part a?

      A. 0.00021   B. 0.15   C. 0.17

   c. What is the probability of two successful flights?

      A. 0.689   B. 0.723   C. 0.757

3. You are taking a trip in your car and have four good tires and a good spare. By expanding (R + Q)^5,

   a. How many events (good tires or flats) are available?

      A. 16   B. 32   C. 64

   b. How many combinations provide four or more good tires?

      A. 6   B. 7   C. 16

   c. If R = 0.99 for each tire and a successful trip means you may have only one flat, what is the probability that you will have a successful trip?

      A. 0.980   B. 0.995   C. 0.9990

4. A launch vehicle system is divided into five major subsystems, three of which have already been built and tested. The reliability of each is as follows: R1 = 0.95, R2 = 0.95, R3 = 0.98. The reliability of the overall system must be equal to or greater than 0.85. What will be the minimum acceptable reliability of subsystems 4 and 5 to ensure 85-percent reliability?

   A. 0.92   B. 0.95   C. 0.98

1 Answers are given at the end of this manual.
5a. A launch vehicle test program consists of 20 test firings requiring 90-percent reliability. Five tests have already been completed with one failure. How many additional successes must be recorded to successfully complete the test program?

   A. 13   B. 14   C. 15

5b. Based on the probability (four successes in five flights), what is the probability of achieving successful completion of the test program?

   A. 0.04   B. 0.167   C. 0.576

6. During individual tests of major launch vehicle subsystems, the reliability of each subsystem was found to be

   Subsystem 1 = 0.95
   Subsystem 2 = 0.99
   Subsystem 3 = 0.89
   Subsystem 4 = 0.75

Since all subsystems are required to function properly to achieve success, what increase in reliability of subsystem 4 would be necessary to bring the overall system reliability to 0.80?

   A. 15 percent   B. 20 percent   C. 25 percent

7. Solve for the following unknown values:

   a. λ = 750×10^-6 failures/hr; t = 10 hr; R = ?

      A. 0.9925   B. 0.9250   C. 0.9992

   b. λ = 8.5 percent failures/10^3 hr; t = 3000 hr; R = ?

      A. 0.9748   B. 0.7986   C. 0.0781

   c. MTBF = 250 hr; t = 0.5 hr; R = ?

      A. 0.9802   B. 0.9980   C. 0.9998

   d. R = 0.999; t = 10 hr; λ = ?

      A. 1000×10^-9 failures/hr   B. 10×10^-6 failures/hr   C. 10 percent failures/10^3 hr

   e. For part d, MTBF = ?

      A. 10^4 hr   B. 10^5 hr   C. 10^6 hr

8. The a priori MTBF prediction of a printed circuit board was 12.5×10^6 hr. Find the number of expected failures during a 10^8-hr (accelerated) life test of 10 circuit board samples.

   A. 12.5   B. 80   C. 125
9a. Write the reliability equation for the battery activation success diagram shown below:

   If battery activates command (part 1)
   and passes umbilical path (part 2)
   and initiates EBW 1 (part 3) or EBW 2 (part 4)
   and ignites initiator 1 (part 5) or initiator 2 (part 6)
   then battery activates (part 7): Success

   A. Rs = R1R2(1 - R3R4)(1 - R5R6)R7   B. Rs = R1R2(1 - Q3Q4)(1 - Q5Q6)R7

9b. If R = 0.9 for all series parts and R = 0.8 for all parallel parts, solve for Rs.

   A. 0.73   B. 0.26   C. 0.67

10. A launch vehicle subsystem is required to be stored for 10 years (use 9000 hr = 1 year). If the subsystem reliability goal is 0.975,

   a. What λ is required with no periodic checkout and repair?

      A. 2800×10^-9   B. 28×10^-9   C. 280×10^-9

   b. What λ is required with checkout and repair every 5 years? (Assume 100-percent checkout.)

      A. 5600×10^-9   B. 56×10^-9   C. 560×10^-9

   c. What λ is required with checkout and repair every year? (Assume 100-percent checkout.)

      A. 2800×10^-9   B. 28×10^-9   C. 280×10^-9
Chapter 4
Using Failure-Rate Data
Now that you have a working knowledge of the exponential distribution e^(−λt) and have the fundamentals of series and redundant models firmly in mind, the next task is to relate these concepts to your everyday world. To do this, we explore further the meaning of failure rates, examine variables that affect part failure modes and mechanisms, and then use part failure rate data to predict equipment reliability. We introduce a simple technique for allocating failure rates to elements of a system. The concepts discussed in this chapter are tools the designer can use for trading off reliability with other factors such as weight, complexity, and cost. These concepts also provide guidelines for designing reliability into equipment during the concept stage of a program.

Variables Affecting Failure Rates

Part failure rates are affected by (1) acceptance criteria, (2) all environments, (3) application, and (4) storage. To reduce the occurrence of part failures, we observe failure modes, learn what caused the failure (the failure stress), determine why it failed (the failure mechanism), and then take action to eliminate the failure. For example, one of the failure modes observed during a storage test was an "open" connection in a wet tantalum capacitor. The failure mechanism was end seal deterioration, which allowed the electrolyte to leak. One obvious way to avoid this failure mode in a system that must be stored for long periods without maintenance is not to use wet tantalum capacitors. If this is impossible, the best solution would be to redesign the end seals. Further testing would be required to isolate the exact failure stress that produces the failure mechanism. Once isolated, the failure mechanism can often be eliminated through redesign or additional process controls.

Operating Life Test

The tests involved 7575 parts—3930 resistors, 1545 capacitors, 915 diodes, 1080 transistors, and 105 transformers. One-third of the parts were operated at −25 °F, one-third at 77 °F, and one-third at 125 °F. The parts, tested in circuits (printed circuit boards), were derated no more than 40 percent.
The ordinate of the curve shows cumulative failures as a function of operating time. For example, at about 240 hours the first failure was observed, and at about 385 hours the second. Several important observations can be made concerning failure rates and failure modes.
Constant Failure Rate.—Figure 4-1 shows that the failure rate for the first 1600 hr is constant at one failure every 145 hr. This agrees with the constant-λ theory. Bear in mind that constant failure rate is an observation and not a physical law. Depending on the equipment, failure rates may decrease or increase for a period of time.
Random Nature.—Notice that the failures in this constant-failure-rate region are random (in occurrence). For example, two diodes fail, then three transistors, then a silicon switch, then a diode, then a trimpot and a resistor, and so on.
Repetitive Failures.—Figure 4-1 also shows that during the first 1600 hr, only two of these failures involved the same type of device. This is important because in most systems the problems that receive the most attention are the repetitive ones. It should be apparent in this case that the repetitive failures are not the ones that contribute the most to unreliability (failure rate); taking corrective action on the repetitive type of failure would only improve the observed failure rate by 18 percent.
Failure Modes.—Table 4-1 shows the observed failure modes (the way the failures were revealed) for the transistor, diode, and resistor failures given in figure 4-1. In table 4-1(a), note that the short failure mode for transistors had an occurrence rate five times that of any other mode. Note also that the eight transistor failures were distributed about evenly in the three environments but that some different failure modes were observed in each environment.
Observe again in table 4-1(b) that the short failure mode for diodes occurred most frequently. The failures were not distributed evenly in each environment, but a different failure mode occurred in each environment.

[Figure 4-1.—Observed part failures versus test and storage time. The curve plots cumulative failures against time to 12×10³ hr, spanning the test time and the subsequent storage time, for the 7575-part sample (3930 resistors, 1545 capacitors, 915 diodes, 1080 transistors, and 105 transformers). An infant mortality failure rate of 1 failure/145 hr governs the early test hours; the intrinsic failure rate thereafter is 1 failure/2300 hr. The labeled failures include transistor shorts, opens, leakage, and intermittents (2N389, 2N396, 2N498, 2N1016B, 2N1057, MD-90); diode opens and shorts (1N483, 1N708A, 1N761); a selector switch short (SA60A); a metal film resistor tolerance change; a trimpot intermittent; and a wet tantalum capacitor electrolyte leak.]
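
The rates called out in figure 4-1 follow from the simple quotient of failures to observation time. A small Python sketch using the counts read from the figure:

    # Observed failure rate = number of failures / total observation time
    def hours_per_failure(failures, hours):
        return hours / failures

    print(hours_per_failure(11, 1600.0))  # burn-in region: ~145 hr/failure
    print(hours_per_failure(3, 2900.0))   # remaining test time: ~967 hr/failure
    print(hours_per_failure(3, 7000.0))   # storage period: ~2333 hr/failure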

Resistors failed in two modes (table 4-1(c)): one intermittent resistor at low temperatures and one tolerance failure at high temperatures.
Burn-In.—As shown in figure 4-1, after 1600 hr the failure rate of the 7575 parts dropped by a factor of 7 for the remaining 2900 test hours (3 failures per 2900 hr, failures 12, 13, and 14, as compared with 11 failures per 1600 hr). This is an example of what are commonly called burn-in failures. The first 11 failures represent parts that had some defect not detected by the normal part screening or acceptance tests. Such defects do not reveal themselves until the part has been subjected to operation for some time. As mentioned earlier, eliminating the repetitive failures would only decrease the failure rate in the first 1600 hr by about 18 percent, but if screening tests were sensitive enough to detect all defects, the failure rate would approach the intrinsic failure rate shown in figure 4-1 right from the start.
In summary, some of the observed properties of operating failure rates are as follows:

(1) For complex equipment, the intrinsic failure rate of electronic parts is usually constant in time.
(2) Failures are random, with repetitive failures representing only a small portion of the problems.
(3) Failure modes of parts and equipment vary, depending on the operating environment.
(4) Most parts have a dominant failure mode. For example, the dominant failure mode for semiconductors is shorting.
(5) Rigid part screening and acceptance criteria can substantially reduce operating failure rates by eliminating early failures.

Storage Test

After the operating test, the parts were put in storage for approximately 7000 hr (10 months) and then were retested to determine the effect of storage on parts. As shown in figure 4-1, three failures (14, 15, and 16) were observed at the end of the storage period. Note that the average failure rate observed in storage (one failure per 2300 hr) is close to the rate observed in the previous 2900 hr of operation. Thus, it can be concluded that storage does produce part failures and that the storage failure rate may be as high as the operating rate. Industry is conducting a great deal of research on this problem because storage failure rates become a significant factor in the reliability of unmanned systems and affect considerably the maintenance policy of manned systems.

Summary of Variables Affecting Failure Rates

Part failure rates are thus affected by

TABLE 4-1.—FAILURE MODES

(a) Transistors

Observed        Observed part failures at temperature         Total     Observed failure rate,
failure mode    −25 °F          77 °F           125 °F        failures  failures/10⁶ hr
Open            ........        ........        MD-90         1         0.206
Short           MD-90, 2N498    2N389, 2N396    2N1016B       5         1.03
Intermittent    ........        ........        MD-90         1         0.206
Leakage         2N1057          ........        ........      1         0.206
Totals          3               2               3             8         1.65

(b) Diodes

Open            ........        ........        1N483         1         0.24
Short           ........        1N761, 1N708A,  ........      3         0.73
                                SA60A
Totals          0               3               1             4         0.97

(c) Resistors

Intermittent    Trimpot         ........        ........      1         0.06
Tolerance       ........        ........        Metal film    1         0.06
Totals          1               0               1             2         0.12

TABLE 4-3.—STRESS RATIOS THAT MEET ALLOCATION REQUIREMENT

[Table 4-3 tabulates the failure rate of the derated part, λ, in failures per 10⁶ hr against stress ratio, W (columns from 0.2 to 0.6), and part temperature (rows from 30 to 70 °C). The entries recoverable from the source lie between about 0.22 and 0.25.]

TABLE 4-2.—FAILURE RATE CALCULATION

(a) Tactical fire control station logic gate

Component                            Stress   Number   Failure rate of    Application factor    Total failure rate,
                                     ratio,   used,    derated part at    for ground-mounted    λT = N·λG·KA,
                                     W        N        40 °C, λG,         vehicle, KA           failures/10⁶ hr
                                                       failures/10⁶ hr
Resistor, composition (2000 Ω)       0.5      1        0.0035             10                    0.035
Resistor, composition (180 000 Ω)    0.5      1        0.0035             10                    0.035
Resistor, composition (22 000 Ω)     0.5      1        0.0038             10                    0.038
Resistor, composition (6500 Ω)       0.5      2        0.0035             10                    0.070
Transistor, germanium (PNP type),    ...      1        1.3                8                     10.400
  <1 W, 0.4 normalized junction
  temperature
Diode, 1N31A                         ...      1        3.5                5                     17.500
Total, λT = ΣNλGKA = 29.68

(b) Proposed logic gate

Resistor, film (1300 Ω)              0.8      1        0.19               0.3                   0.057
Resistor, film (3320 Ω)              ...      1        0.14               0.3                   0.042
Resistor, film (46 600 Ω)            ...      1        0.14               0.3                   0.042
Transistor, silicon (NPN type),      ...      1        0.165              8                     1.320
  <1 W, 0.15 normalized junction
  temperature
Diode, 1N31A                         ...      5        3.0                5                     75.000
Total, λT = ΣNλGKA = 76.461
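
The arithmetic in table 4-2 is a plain parts-count sum, λT = Σ N·λG·KA. A minimal Python sketch using a few of the table 4-2(a) entries (abbreviated list; same units of failures per 10⁶ hr):

    # Parts-count prediction: total rate = sum of N * lambda_G * K_A
    parts = [
        # (name, N, lambda_G, K_A)
        ("resistor, composition, 2000 ohm", 1, 0.0035, 10),
        ("resistor, composition, 6500 ohm", 2, 0.0035, 10),
        ("transistor, germanium PNP",       1, 1.3,     8),
    ]
    total = sum(n * lam_g * k_a for _name, n, lam_g, k_a in parts)
    print(total)   # failures per 10**6 hr for this partial parts list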

(1) Acceptance criteria
(2) All environments
(3) Application
(4) Age or storage

To find ways of reducing the occurrence of part failures, we observe failure modes, learn what caused the failure (the failure stress), determine why it failed (the failure mechanism), and then take action to eliminate the failure. For example, one of the failure modes observed during the storage test was an "open" in a wet tantalum capacitor. The failure mechanism was deterioration of the end seals, which allowed the electrolyte to leak. One obvious way to avoid this failure mode in a system that must be stored for long periods without maintenance is not to use wet tantalum capacitors. If this is impossible, the next best thing would be to redesign the end seals. This would no doubt require further testing to isolate the exact failure stress that produces the failure mechanism. Once isolated, the failure mechanism can often be eliminated through redesign or additional process controls.
One of the best known methods of representing part failures is the use of failure rate data. Figure 4-2 (from ref. 4-1) shows a typical time-versus-failure-rate curve for flight hardware. This is the well-known "bathtub curve," which over the years has become widely accepted by the reliability community and has proven to be particularly appropriate for electronic equipment and systems. It displays the sum of three failure rate quantities: quality (QFR), stress (SFR), and wearout (WFR).

[Figure 4-2.—Hazard rate versus equipment life periods. The overall life characteristic curve is the sum of quality failures (dominant in zone I, infant mortality), stress-related failures (constant through zone II, useful life), and wearout failures (rising in zone III, wearout).]

Zone I, the infant mortality period, is characterized by an initially high failure rate (QFR). This is normally the result of poor design, use of substandard components, or lack of adequate controls in the manufacturing process. When these mistakes are not caught by quality control operations, an early failure is likely to result. Early failures can be eliminated by a "burn-in" period during which the equipment is operated at stress levels closely approximating the intended actual operating conditions. The equipment is then released for actual use only when it has successfully passed through the burn-in period. For most well-described complex equipment, a 100-hr failure-free burn-in is usually adequate to cull out a large proportion of the infant mortality failures caused by stresses on the parts.
Zone II, the useful life period, is characterized by an essentially constant failure rate (SFR). This is the period dominated by chance failures, defined as those failures that result from strictly random or chance causes. They cannot be eliminated by either lengthy burn-in periods or good preventive maintenance practices.
Equipment is designed to operate under certain conditions and to have certain strength levels. When these strength levels are exceeded because of random unforeseen or unknown events, a chance failure will occur. Although reliability theory and practice are concerned with all three types of failure, the primary concern is with chance failures since they occur during the useful life of the equipment. Figure 4-2 is somewhat deceiving because zone II is usually much longer than zone I or III. The time when a chance failure will occur cannot be predicted, but the likelihood or probability that one will occur during a given period of time within the useful life can be determined by analyzing the equipment design. If the probability of a chance failure is too great, either design changes must be introduced or the operating environment made less severe.
The SFR period is the basis for the application of most reliability engineering design methods. Because it is constant, the exponential distribution of time to failure is applicable and is the basis for the design and prediction procedures spelled out in documents such as MIL-HDBK-217E (ref. 4-2).
The simplicity of the approach (utilizing the exponential distribution, as previously indicated) makes it extremely attractive. Fortunately, it is widely applicable for complex equipment and systems. If complex equipment consists of many components, each having a different mean life and variance that are randomly distributed, then the system malfunction rate becomes essentially constant as failed parts are replaced. Thus, even though the failures might be wearout failures, the mixed population causes them to occur at random intervals with a constant failure rate and exponential behavior. This has been verified for much equipment, from electronic systems to rocket motors.
Zone III, the wearout period, is characterized by an increasing failure rate (WFR) resulting from equipment deterioration due to age or use. For example, mechanical components, such as transmission bearings, will eventually wear out and fail regardless of how well they are made. Early failures can be postponed and the useful life extended by good design and maintenance practices. The only way to prevent failure due to wearout is to replace or repair the deteriorating component before it fails.
Because modern electronic equipment is almost completely composed of semiconductor devices that really have no short-term wearout mechanism, except for perhaps electromigration, one might question whether predominantly electronic equipment will even reach zone III of the bathtub curve.
Different statistical distributions might be used to characterize each zone. Hazard rate has been defined for five different failure distribution functions.
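
As an aside, the bathtub shape itself can be reproduced by summing three simple hazard terms, one per zone. The short Python sketch below is purely illustrative; every parameter value is an assumption chosen only to display the shape, not data from this manual:

    import math

    def hazard(t):
        infant = 0.02 * math.exp(-t / 100.0)  # quality failures, decreasing
        chance = 0.001                         # stress-related, constant
        wearout = 1e-13 * t**3                 # wearout, increasing with age
        return infant + chance + wearout

    for t in (10, 100, 1000, 5000, 10000):
        print(t, round(hazard(t), 5))
    # High at first, nearly flat at 0.001/hr in midlife, rising late.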

Depending on which distribution fits the hazard rate data best, a failure distribution function can be selected. The infant mortality period for the typical hazard rate in figure 4-2 might be represented by the Weibull distribution, the useful life period by the exponential distribution, and the wearout period by the log normal distribution.

Part Failure Rate Data

It is common in the field of reliability to represent part integrity or reliability in terms of failure rate or mean time between failures (MTBF). In general, part failure rates are presented as a function of temperature and electrical stress as shown in figure 4-3. The family of curves on the graph represents different applied electrical stresses in terms of a stress ratio or derating factor. For example, if a part is to operate at temperature A and is derated 20 percent (stress ratio, 0.8), that part will have the failure rate read from the 0.8 curve, as shown. If the part is derated 70 percent (stress ratio, 0.3), it will have the failure rate read from the 0.3 curve, etc. Failure rate is usually given in failures per 10⁶ hr, although, as indicated in chapter 3, other dimensions are used depending on who publishes the data.

[Figure 4-3.—Failure rate versus electrical stress ratio and temperature. A family of curves for stress ratios from 0.3 to 1.0 shows the part failure rate rising with both temperature and stress ratio.]

The current authoritative failure rate data published by the Department of Defense are in MIL-HDBK-217E (ref. 4-2). The MIL-HDBK-217 series is a direct result of the 1952 AGREE effort mentioned in chapter 1. The publications listed in table 1-1 and in references 4-3 to 4-5 are also offshoots of this effort to meet the need for authoritative, statistically based part failure rates. Because new data on both existing and new state-of-the-art parts are constantly being generated and analyzed, failure rate handbooks do change. Therefore, be sure to use the latest version available. Even the latest version of the data used for compiling the handbook may not represent the parts you are using. The best procedure is to use your own failure rate data with modern computer-aided software to simulate your designs.
As emphasized in chapter 3, failure rates are statistical, and there is no such thing as an absolute failure rate. Consider the simple definition of failure rate:

Failure rate = Number of observed failures / Total operating time

Obviously, if today we observe two failures in 100 hr and tomorrow we accumulate no more failures, the new failure rate is two failures in 124 hr. Then, if a failure occurs in the next 1-hr period, the failure rate is three failures in 125 hr. Therefore, we can never know what the true failure rate is, but we can determine representative failure rates, or best estimates, from many hours of observed operating time. This type of failure rate data is presented in the MIL-HDBK-217 series.

Improving System Reliability Through Part Derating

The best way to explain how to derate a component is to give an example. Consider two 20-V wet-slug tantalum capacitors, both to be operated at a component temperature of 165 °F. One is to be operated at 20 V and the other at 12 V. First, find the stress ratio or operating-to-rated ratio for both applications:

Stress ratio = Operating voltage / Rated voltage

Hence, one capacitor has a stress ratio of 1.0,

Stress ratio = 20 V / 20 V = 1.0

and the other, a stress ratio of 0.6,

Stress ratio = 12 V / 20 V = 0.6

(A stress ratio of 0.6 is the same as "derating" the component 40 percent.) To find the failure rate λ for each capacitor, go to the MIL-HDBK-217E (ref. 4-2) table for MIL-C-3965 glass-sealed, wet-slug capacitors. Move horizontally across the 165 °F line to the vertical 0.6 and 1.0 stress ratio columns and read the two failure rates directly.
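
This lookup procedure is easy to mechanize. In the Python sketch below, the small table is a made-up placeholder standing in for the handbook page; the real values must be taken from MIL-HDBK-217E:

    # Stress ratio = operating stress / rated stress
    def stress_ratio(operating, rated):
        return operating / rated

    print(stress_ratio(20.0, 20.0))  # 1.0, no derating
    print(stress_ratio(12.0, 20.0))  # 0.6, derated 40 percent

    # Placeholder failure rates at 165 F keyed by stress ratio
    # (NOT handbook data; illustrative numbers only).
    rate_at_165f = {0.6: 0.4, 1.0: 2.0}   # failures per 10**6 hr
    for w in (0.6, 1.0):
        print(w, rate_at_165f[w])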

Importance of Learning From Each Failure

When a product fails, a valuable piece of information about it has been generated because we have the opportunity to learn how to improve the product if we take the right actions. Failures can be classified as

(1) Catastrophic (a shorted transistor or an open wire-wound resistor)
(2) Degradation (change in transistor gain or resistor value)
(3) Physical wearout (brush wear in an electric motor)

These three failure categories can be subclassified further:

(1) Statistically independent (a shorted capacitor in a radio-frequency amplifier being unrelated to a low-emission cathode in a picture tube)
(2) Cascade (the shorted capacitor in the radio-frequency amplifier causing excessive current to flow in its transistor and burning the collector beam lead open)
(3) Common mode (a circuit board used for primary control of a process and a backup circuit board both burned out by an overvoltage condition in a power supply that feeds the two of them)

On the basis of these categories, much can be learned from each failure that occurs during flight acceptance testing for a mission through good failure reporting, conducting failure analyses, maintaining a concurrence system, and taking corrective action. Failure analysis determines what caused the part to fail. Corrective action ensures that the cause is dealt with. Concurrence informs management of actions being taken to avoid another failure. These data enable all personnel to compare the part ratings with the use stresses and to verify that the part is being used with a known margin.

Failure Reporting, Analysis, Corrective Action, and Concurrence

Many different methods can be used to record reliability data for any given project. The Department of Defense has standardized a method on DD form 787-1. A simple form that tells the whole story on one sheet of paper is NASA-C-8192 (fig. 4-6). The method that you use to record reliability data will have to fit your needs. Keep your form simple and easy to fill out, and get approval from management.

Case Study—Achieving Launch Vehicle Reliability

Design Challenge

The launch vehicle studied requires the highest acceleration and velocity and the shortest reaction time of any developed. As such, the design challenges were formidable; typical in-flight environments include random vibration of 61 g's rms up to 3 kHz, mechanical shock at 25 000 g's peak (between 5 and 10 kHz), linear acceleration well in excess of 100 g's, acoustics of 150 dB, and aerodynamic heating up to 6200 °F. The development philosophy was that a vehicle be launched from a tactical silo with the initial design. Although many changes occurred during the 13-year development, the first flight test vehicle was not greatly different from the 70 now deployed.

Subsystem Description

The vehicle is launched from an underground silo, which also serves as a storage container during the multiyear design life. Adjacent to the silo and integral to it is a small compartment housing the ground support equipment. This equipment is used to conduct periodic tests of the vehicle electronics, to prepare the vehicle for launch, and to launch the vehicle. It also maintains the silo environment at 80±10 °F and 50 percent or less relative humidity.
The vehicle is predominantly in a power-off storage mode when deployed in its silo. A periodic test of flight electronics is conducted automatically every 4 weeks. In a multiyear design life, the flight electronics accumulate about 11 min of operating time and 43 830 hr of storage time. The ratio of storage time to operating time is nearly 240 000:1.

Approach to Achieving Reliability Goals

Reliability mathematical models were developed early in the research and development program. From these models it was apparent that the following parameters were the most important in achieving the reliability goals:

(1) Electronic storage failure rate during a multiyear design life (i.e., storage failures)
(2) Percent testability of missile electronics (i.e., MIL-STD-471A, ref. 4-6)
(3) Periodic test interval for missile electronics
(4) Severity of in-flight environments (acceleration, shock, vibration, and aerodynamic heating)

Page of

Glenn Research Center


PROBLEM REPORT # (Hardware __ Software __ )

A. Project Name Procedure No. Date Identified

Assy/CSCI Name ID No. Location

Type: 1. Eng/Qual __  Process: 1. Inspect __  3. Design __  5. Test __  Test Type:
      2. Flight __             2. Assemble __  4. Code __
      3. GSE __

B. Background Info. & Descriptions: (use continuation sheets as needed)

Initiator Date

C. Analysis/Root Cause/Effect on System (use continuation sheets as needed)

Is damage assessment required? Yes __  (Is work sheet attached? Yes __)

Defect(s) info. (Name, ID, Lot code, Supplier, affected routines/sub-routines/programs, etc.)

Defect Code:

Problem Type: __ Nonconformance __ Failure Analyst Date

D. Disposition: Rework/Rewrite __ Repair/Patch __ Use as is __ Return __ Scrap __ Request Waiver __

E. Corrective Action: (use continuation sheets as needed)

Initiated: Eng Chg Order __, Software Chg Req __, Waiver Req __, Request/Order #

Project Eng OMS&A Reviewed on: / /

F. Corrective Action Follow-up: / / By (name & title):

G. Project Office Approval Signature(s) & Date OMS&A Approval Signature(s) & Date

NASA-C-8192 (Rev 4-97) Page 1 of 2 Distribution: Project Mgr. (orig.), OMS&A, Hardware File (Ref. PAI# 440)

(a)

Figure 4-6.—Failure report and analysis forms. (a) Problem report. (b) Damage assessment worksheet. (c) Defect codes.

INSTRUCTIONS (Please print/write legibly)

Problem Report # -- Unique number assigned by OMS&A PRACA Administrator.


(Hardware __ Software __) -- Analyst selects 1 of 2 categories.

Section A -- To be completed by person who discovered the problem


Project Name -- Name or acronym of project.
Procedure No. -- Title and/or No. of procedure/instructions used to carry out required task.
Date Identified -- Date when nonconformance is found or failure occurred.
Assy/CSCI Name -- Name of specific pkg., assy., sub-assy., or software pkg. with problem.
ID No. -- Part No., Serial No. if there are multiple parts of same design, or SCM# (SW Config. Mgmt. #).
Location -- Location where problem is identified, e.g., GRC, KSC, EMI Lab, Machine Shop, etc.
Type -- Choose 1 of 3 choices: "Engineering/Qualification, Flight, or Ground Support Equip."
Process -- Choose 1 of 5 choices: "Inspection, Assembly, Design, Code, or Test."
Test Type -- Applies to test processes only, e.g., Burn-in, Vib., Thermal Cycle, Integration, Acceptance, etc.

Section B -- To be completed by person who discovered the problem


Background Info & Descriptions -- How much operating time/cycles did the package have when the problem occurred? Record what was actually measured (actual data) and what it should have been (specifications), and note which computer or micro was running the software.
Initiator -- Name of person who initiated the report. Date -- Report date.

Section C -- To be completed by responsible Project Engineer/Analyst


Analysis/Root Cause/Effect on System -- Brief summary of analysis; describe root cause(s) and the effect on the system if the root cause(s) is not eliminated.
Defective Part(s) Info. -- Record defective part(s) name, identification (P/N & S/N), model, lot code, supplier/manufacturer.
Problem Type -- Choose 1 of 2 choices: "Nonconformance or Failure."
Analyst -- Name of analyst. Date -- Analysis complete date.

Section D -- Responsible Project Engineer(s) will choose 1 of 6 disposition choices.


Rework/Rewrite -- Correct hardware/software to conform to requirements (dwgs., specs., procedures, etc.).
Repair/Patch -- Modify hardware or patch software programs to usable condition.
Use as is -- Accept hardware/software as is, without any modifications; or "Work around" - Software remains as is, but further action is
required on operator or other systems.
Return -- Return to supplier for corrective action (rework, repair, replace, analysis, etc.).
Scrap -- Isolate defective material for detailed analysis, or discard unusable material.
Request Waiver -- Initiate a Waiver Request for authorization to vary from specified requirements.

Section E -- Joint effort of Project Engr., OMS&A Rep. & Specialist(s) as needed
Corrective Action -- Record specific actions required to eliminate problem(s) and prevent recurrence. Identify the extent of software regression testing and the affected routines/programs, including any ECO# (Eng Chg Order), SCR# (Software Chg Request), and Waiver Request # initiated.
Project Eng. -- Responsible project engineer's signature.
OMS&A -- Cognizant OMS&A representative's signature.
Reviewed on -- Date when Corrective Action plan is reviewed, or Problem Review Board meets.

Section F -- To be completed by OMS&A Representative


Corrective Action Follow-up -- Date when corrective action is verified. Assure the approved waiver is attached if one has been requested. This will be the official "Problem Closure Date."
Verified By -- Name of OMS&A Rep. who completed the follow-up.

Section G -- Approval Signature Requirements


Problem identified during assembly/inspection -- Required sign-off by Project Eng. & OMS&A Rep.
Problem identified during test -- Required signatures of Project Engineer, OMS&A Rep., Project Assurance Manager, and Project
Manager.

** Training on PRACA System is available through Assurance Management Office**

NASA-C-8192 (Rev 4-97) Page 2 of 2

(a)

Figure 4-6.—(a) Concluded.

[Figure 4-6.—(b) Damage assessment worksheet.]
ATTACHMENT 3.2.7

DEFECT CODES

INITIAL DEFECT CODES

FAILURE
Component Select (Separate Test)
Combined (POT/COAT)
Post (POT/COAT)
Performance/Functional
Shock
Thermal Cycle
EMI/EMC
Burn-In
Pre (POT/COAT)
Vibration
Thermal Vacuum
X-Ray Examination Reject
Launch Site Test (Ground Equipment)
Acoustics
Continuity/Ground
Launch Site Test (Airborne Equipment)
Engine Leaks
Leak Test
Model Survey
Structural Load
Thermal Balance
Pressurization
Proof Pressure
Appendage Deployment
Phasing Test
Alignment Test
Weight and CG

SUSPECT (NOTE: Temporary code must be changed before final closeout.)
Suspect
Suspect as a result of DC&R activity

FINAL DEFECT CODES

CONTAMINATION
Fluid
Biological
Corrosion
Particulate
Foreign Object
Contaminated

ELECTRICAL
Incorrect Mounting
Connector Damaged
Incorrect Lead Bend
Unqualified Part
Short Lead
Damaged Component
Long Lead
Burnt/Discolored
Lead/Wire Damaged
Wire Size Incorrect
Birdcaged
Crimping Incorrect
Insulation Damaged
Missing Part
Polarity Incorrect
Dirty Relay Contacts
**Routing Incorrect
**Miswired
**Other
**Wrong Part
Incorrect Reference Designators

MECHANICAL
Incorrect Part
Binding, Stuck, or Jammed
Dissimilar Metals
Excess Bonding
Holes Incorrect
Lack of Proper Lubrication
Insufficient Bonding
Interference
Bent Contacts/Pins
Misaligned
Missing Part
Improper Assembly
Safety Wire Items
Weight
Torque Values Incorrect
Part Damaged
Does Not Engage or Lock Correctly
Incorrect Dimensions

(c)
Figure 4-6.—(c) Defect codes.

DEFECT CODES (continued)

FINAL DEFECT CODES (continued)

MECHANICAL (continued)                       CODE
Location                                     2002
Missing or Extra                             2003
Insert                                       2004
Rework/Repair Damages                        2025
Detail Out of Tolerance                      6001
Layout                                       6002
Bend Radius/Angle                            6003
Made in Reverse                              6004
Undersize Machine/Grind                      6005
Incorrect Loft Lines Used                    6006

DAMAGE
Packaging/Handling                           301
Launch                                       303
During Fabrication                           305
During Usage                                 306
During Transportation                        307
During Test                                  308
Damage                                       1009
Damaged PWB                                  2046

DIMENSIONAL
Inside Dimension Distorted                   401
Incorrect Length                             402
Inside Dimension Undersize                   403
Incomplete-Missing                           404
Outside Dimension Distorted                  405
Mislocated Feature                           406
Outside Dimension Oversize                   407
Surface Finish                               408
Thickness Oversize                           409
Outside Dimension Undersize                  410
Thickness Undersize                          411
Incorrect Width                              412
Inside Dimension Oversize                    413
Inside Diameter Undersize                    416
Inside Diameter Oversize                     417
Outside Diameter Undersize                   418
Outside Diameter Oversize                    419
Flatness                                     420
Straightness                                 421
Roundness                                    422
Cylindricity                                 423
Perpendicularity                             424
Angularity                                   425
Parallelism                                  426
Profile                                      427
Runout-Total Runout
True Position
Burrs-Sharp Edges                            431
Threads                                      432
Angle                                        433
Depth                                        434

DOCUMENTATION
Other Documentation                          450
Test Reports/Certs in Error/Not Complete     452
Test Reports/Certs Not Received              453
Missing/Lost MARS                            455
MARS in Error                                456
Missing/Lost Process Plan                    457
Incorrect Entry Process Plan                 458
Process Plan Not to Latest DCN               459
Q Codes (Other than Test Reports/Certs)      470

PLASTICS
Improper Cure/Mix                            475
Delamination                                 476
Discontinuities (Holes/Blisters/Voids)       477
Fiber Content                                478
Flexural                                     479
Lap Shear                                    480
Exposed Circuitry                            482
Incorrect Coating                            484
Incorrect Bonding                            485

FINISH
Adhesion                                     501
Blistered/Flaking                            502
Color                                        503
Cracked/Crazed                               504
Incorrect                                    505
Pitted/Porous                                506
No Samples                                   507
Rough/Irregular                              508
Thickness                                    509
Scratched                                    510

IDENTIFICATION
Incomplete                                   551
Incorrect                                    552
Smeared/Illegible                            554
Missing                                      556

MATERIALS PROPERTIES
Chemical                                     611
Metallurgical                                612
Improper Mix/Cure                            613

(c)
Figure 4-6.—(c) Continued.

DEFECT CODES (continued)

FINAL DEFECT CODES (continued)

MATERIAL PROPERTIES (continued)
Heat Treat
Mechanical
Voids/Inclusions
Crack/Fracture
Voids/Porosity/Inclusions/Cracks
Certification Reject
Incorrect Material
Incorrect Dimensions
Chemical Composition
Moisture Content
Pot Life
Tensile Strength
Yield
Hardness
Cure Hardness
Peel Strength

SOLDER
Cold Joint
Hole in Solder
Fractured Joint
Pitted/Porous
Insufficient
Excess Flux
Excess Solder
Solder Ball/Splash
Dewetted Joint
Lifted Pads
Measling
Insulation in Solder
Potential Short
Bridging
Improper Tinning
Manual Soldering Discrepancy
Machine Soldering Discrepancy
Contaminated Joint
Corrosion/Oxidation

NO DEFECT
NOTE: This code to be used for MARS closures where no discrepancies were identified.

MISCELLANEOUS
Test Error
Process/Steps Missed
No Evidence of Source Inspection
Procurement Error
Destructive Physical Analysis (DPA)
Particle Impact Noise Detection (PIND) Reject
Defective Tool/Test Equipment
Incorrect Assembling
Integrity Seal Missing or Broken
Intermittent Operations
Launch Usage
Leakage
Out of Calibration
Shipped Short
Burst/Ruptured
Failed Due to Associated Equipment
Expended (Normal Life)
Time/Operational, Temperature Sensitive, Expirations
Procedure Not Followed
Proof Test
Missing Operation
Contamination
All Trailer Problems
Documentation/Certification Problems
History Jacket Problems
Directed Rejection Item

WELDING
Cracks
Porosity
Lack of Fusion
Burn Through
Lack of Penetration
Laps
Mismatch/Suck-In
Location
Build Up
Craters
Discoloration
Fill-Up
Length
Preparation
Profile
Undercut
Oxidation
Metal Expulsion

(c)
Figure 4-6.—(c) Continued.
DEFECT CODES (continued)

FINAL DEFECT CODES (continued)

ELECTRONIC/COMPUTERS                             CODE
Faulty Program or Disk                           931
Unable to Load Program                           932
Nonprogrammed Halt                               933
Illegal Operation or Address                     934
Computer Memory Error/Defect                     935
Input/Output Pulse Distortion                    936
Low Power Output                                 937
Frequency Out of Band, Unstable or Incorrect     938
Commercial Part Failure                          941
Communication/Transmission Line Disturbance      943
Externally Induced Transient                     945

COMPONENT LEAD WELDING
Excessive Embedment                              950
Cracks                                           951
Voids                                            952
Excessive Expulsion                              953
Open/Missed Welds                                954
Damaged Ribbon/Lead                              955
Dimensions Incorrect                             956
Sleeving Missing                                 957
Insufficient Heat/Cold Weld                      958
Misrouted                                        959
Insufficient Fillet                              960
Ribbon/Lead Misalignment                         961
Ribbon/Lead Length Incorrect                     962

ASSEMBLY/INSTALLATIONS
Parts Mismatched                                 2019
Fastener Wrong or Damaged                        2020
Damaged or Missing Seals                         2021
Missing/Improperly Installed                     2022
Parts Missing/Wrong/Damaged                      2023
Improper Configuration                           2024

RESISTANCE WELDING
Resistance Weld Defects                          2067

BONDING/COMPOSITES/POTTING
Separation/Delamination                          2013
Improper Cure                                    2014
Incorrect Lay-Up/Set-Up                          2015
Test Specimen Failure Missing                    2016
Voids/Blisters/Bridging/Pits                     2017
Damage                                           2018
Mission Operation                                2051
Damaged                                          2052

CONNECTORS-COMPONENTS/EEE
Exceeds PDA                                      2041
Outside of SPC Boundaries                        2042
X-ray to Applicable MIL Spec                     2043
Improper Testing                                 2044
Noisy Output (EMF only)                          2045

TOOLING FUNCTION
Incomplete Hardware                              6007
Burrs                                            6006
Inadequate Structure                             6009
Discrepant Drill Bushing                         6010
Improper Insert/Bushing                          6012

FUSION WELDING
Fusion Weld Defects                              2066

TUBE/HOSE
Damaged Flares/Lip Seals                         2005
Incorrect Contours/Bends                         2006
Wrong or Binding B-Nuts Sleeves                  2007
Dimensional                                      2008
Expended                                         2009
Damaged Braid                                    2010
Cracks                                           2011

CHEMICAL/PLATING/LUBE/PAINT
Contamination                                    2012

(c)
Figure 4-6.—(c) Concluded.

Launch and Flight Reliability

The flight test program demonstrated the launch and flight reliability of the vehicle. The ultimate flight program success ratio of 91 percent exceeded the overall availability-reliability goal by a comfortable margin.

Field Failure Problem

Twenty-six guidance sections failed the platform caging test portion of the launch station periodic tests (LSPT's). These failures resulted in a major alarm powerdown. An investigation was conducted.
Description of launch station periodic tests.—The system test requirements at the site include a requirement for station periodic tests upon completion of cell or vehicle installation and every 28 days thereafter. LSPT's check the overall system performance to evaluate the readiness of a cell. During an LSPT, the software initiates a test of the vehicle and ground equipment, the data processing system, and radar interfaces. Any nonconformance during an LSPT is logged by the data processor and printed out, and the time from initiation of LSPT to failure is recorded. During an LSPT, the platform spin motor is spun up and held at speed for approximately 10 sec. After this, the system is returned to normal.
An LSPT consists of two phases:

(1) Spinup, a power-up phase to spin the gyros, align the platform, verify platform null, and check airborne power supply operation
(2) A detailed test of airborne electronics in the radio-frequency test phase

Initial failure occurrence.—Cell 3 on remote farm 1 (R1C3) experienced an LSPT failure (a major alarm powerdown) 5.936 sec after "prep order," the command to ready the vehicle for launch. The failure did not repeat during four subsequent LSPT's. R1C3 had previously passed three scheduled LSPT's before failure. A total of four cells on remote farms 1 and 2 had experienced similar failures. Two of the failures occurred at 5.360 sec (an inverter test to determine if ac power is available). Two occurred at 5.936 sec (a caging test to determine if the platform is nulled to the reference position; see fig. 4-7). Replacement of failed guidance and control sections (G&C's) 28, 102, and 86 led to successful LSPT's. G&C 99, which failed only once during in-cell testing, was left on line. G&C's 28, 102, and 86 were returned to Martin Marietta, Orlando, for analysis of the failed condition.
Failure verification and troubleshooting.—The test plan that was generated permitted testing the failed G&C's in a horizontal marriage test and a G&C test to maximize the probability of duplicating the field failures. Test results confirmed site failures for both the caging null and the inverter null during a horizontal marriage test on G&C 102, a G&C level test on G&C's 28 and 86, and an autopilot level test on G&C 102. G&C 102 failed caging null four times and inverter null once at horizontal marriage. An evaluation of the inverter null failure revealed that a high caging amplifier output caused the launch sequencer level detector to become offset during inverter monitoring, resulting in the major alarm even though the autopilot inverter voltage was normal. Launch sequencer offset may or may not occur with an uncaged platform depending on the amplitude of the caging amplifier output when the inverter voltage is monitored. Therefore, both the inverter null and the caging null LSPT failures at the site were attributed to failure of the platform to cage.
An autopilot acceptance test tool was modified to permit monitoring of the platform spin motor voltage (800 Hz, 8 V, 3-phase) and the spin motor rotation detector (SMRD). During a spinup test on autopilot 69 (G&C 102), recordings indicated sustained caging oscillation. The SMRD showed no evidence of spin motor operation even though all autopilot voltages were correct, including the spin motor excitation voltage at the platform terminals. Further verification was obtained by listening for characteristic motor noises with a stethoscope.
G&C 86 failed the G&C level test because of caging null and inverter null alarms. Then, 3.5 sec into the third run, the caging loop stopped oscillating, but the platform did not cage in time to pass the test. The next run met all G&C test requirements.

[Figure 4-7.—System spinup tests. A remote farm time reference (RFTR) timeline shows the launch sequencer clock running from 0 to 6 sec after "prep order," through the inverter, cage null, and "system ready" gates; the inverter gate falls at about 5347.7 to 5373.9 ms and the cage null gates at 5924.5, 5950.7, 5976.9, and 6003.1 ms. Gate times are within ±50 ms of those shown because of data processor tolerances.]
It appeared obvious that the spin motor started spinning in the middle of the run.
G&C 28 failed one run of the G&C level test; however, it met all requirements in the autopilot level test. This means that the spin motor successfully met its acceptance test procedure requirements. A hesitation was noted during two of the seven spinup tests conducted. Platform 127 was heated to normal on the gyro test set. Its resistances were checked and found to meet specification requirements. No attempt was made to start platform 127's spin motor at platform level. Both units were hand-carried to the subcontractor for failure analysis. The subcontractor was familiar with the construction of the platform and had the facilities to disassemble the platform without disturbing the apparently intermittent failure condition.
Verification test conclusions.—Verification tests isolated the site LSPT failures to a failure of the platform spin motor to spin up, thereby causing major alarms at the inverter null or caging null gate. During testing, three of the first four failed platforms caged upon repeated application of voltage. Once the platform caged, the platform, autopilot, and G&C met all system test requirements. On the basis of these results, it was decided to repeat LSPT's up to 10 times after a site failure before removing the G&C. If the LSPT's were successful, the G&C would be left on line.
Measurements at platform level indicated the problem was internal to the platform and that all resistances and the platform temperature were correct. Subcontractor representatives reviewed the test results and concurred that the problem was internal to the platform.

Mechanical Tests

The spin motor breakaway torque was measured with a gram gage on platform 127 and was found to be normal (750 dyne cm). Dynamometer tests were performed on both platforms. The dynamometer is an instrument that measures rotation torque by slowly rotating the rotor of the spin motor while recording the stator rotational torque. The dynamometer is used during initial builds to establish the spin motor bearing preload (torque). The spin motor generates approximately 4000 dyne cm of starting torque with normal excitation voltage; 800 dyne cm of this torque is used to overcome the inertia and frictional torque of the motor.
Platform 140 was tested on the dynamometer and produced the torque peaks of 3400 and 3100 dyne cm shown in figure 4-8. The torque peaks were three revolutions apart. This is four times the normal running torque level for a new spin motor and about four times the torque level for this spin motor for the rest of its run. The torque increase lasted for about one-half a revolution and repeated within three revolutions. The spin motor bearings were cleaned and reassembled. Two large torque spikes of approximately 3000 dyne cm were observed on the first revolution. A 2200-dyne cm torque hump, one revolution in duration, was centered at the beginning of the second revolution. From these results, it was concluded that something in the spin motor bearing was causing an abnormal frictional load there. This result isolated the problem to the spin motor bearing area and eliminated the motor electrical characteristics as a contributor.

Runup and Rundown Tests

A series of tests were performed on spin motors 96 and 140 to determine the effect of motor running time on spin motor start and running torque. Figure 4-9 shows the change in rundown time with a change in motor run time.

Summary of Case Study

Field problem cause.—The 26 LSPT failures at the site were caused by the failure of the G&C platform spin motors to spin up within 6 sec after the command to ready the vehicle for launch. It was determined that the spin motors did not start with the normal application of voltage. A polymer film had formed on the bearing surfaces during testing at 175 °F and caused the balls to stick to the outer race. This film was identified as one from the alkyl phenol and alkyl benzene families, and its source was determined to be uncured resins from the bearing retainer.

[Figure 4-8.—Platform dynamometer torque test. Torque peaks of 3400 and 3100 dyne cm, each about 0.5 revolution wide, occurred 3 revolutions apart.]

[Figure 4-9.—Rundown time versus motor run time for spin motors 96 and 140 (motor run time prior to rundown, 0 to 10 min).]

Polymer film.—A film approximately 900 Å thick had formed on the metal surfaces of the bearings of failed spin motors. The amount of material generated was ~10⁻⁷ g/ball. To put this number in proper perspective, 2×10⁻⁴ g of oil is put on the bearing race during initial build, and 2×10⁻³ g of oil is impregnated in the bearing retainer.
Alkyl phenol/alkyl benzene is a generic identification of a family of organic compounds. Further analysis identifies the major compounds in the family as phenol and methyl phenol (alkyl phenols) and toluene, xylene, and benzene (alkyl benzenes). A phenolic polymeric film would have the gummy, adhesive, and insolubility properties detected in the analysis. There is little doubt that the gummy film detected was a phenol-based material.
Source of phenol.—Phenols are used in three areas of the spin motor. A phenolic adhesive bonds the stator laminations together and bonds the hysteresis ring to the rotor. The bonding processes adequately cure the phenol to the point where uncured phenols would not be present. Also, the stator laminations are coated with epoxy after bonding. The remaining source is the paper phenolic retainer, which serves as a spacer and a lubrication source for the spin motor bearings. Mass spectral analysis of the retainers yielded spectra essentially identical to the spectrum of the coating on the failed bearings. The conclusion of this analysis is that the source of the phenolic is uncured phenolic resins or resin compounds in the retainer.
Retainer processing.—The retainer material is manufactured to military specifications by a vendor and is screened to tighter vehicle requirements for specific gravity. There is no specific requirement concerning uncured resins in the retainer material. The vendor estimated an upper limit of 1 percent of uncured resin in the retainer raw material. One percent would provide 3×10⁻⁵ g of uncured resins, more than sufficient to cause the spin motor problem.
The finished retainer material is cleaned by an extraction process with benzene or hexane. This process does not remove a significant amount of uncured resins. Therefore, if uncured resins survive the vendor processing, they will remain in the uncured state in the installed retainers.
Mechanism of film formation.—It is theorized that the uncured resins are transferred from the retainer to the bearing surfaces through the natural lubricating process of the retainer. Running the spin motors generates centrifugal forces that sling the excess oil off the rotating surfaces, leaving a thin film of oil. The force of gravity during subsequent storage of the motor causes the already thin film to become thinner on the top surfaces and thicker on the lower surfaces. This redistribution process involves only the oil and leaves more viscous contaminants in place. Subsequent running of the motor will cause replacement of oil on the oil-free surfaces. The source of the replacement oil is the retainer capillaries. This replacement process will cause the oil to bring any uncured phenolics to the surface of the retainer. The metal surfaces will then become lubricated with oil containing a small percentage of uncured resins. Subsequent storage cycles and running will continue this redistribution process, steadily increasing the phenolic concentration. Exposure to a temperature of 175 °F and extended operational maintenance gradually cure these phenolics in two stages. Initially, a highly viscous gummy residue is formed; finally, a hard, insoluble polymer film is formed on the metal surfaces. The film forms a bond between the balls and the races. The coating builds up to the point where the spin motor torque cannot overcome the bond at the initial power application.
Extent of problem.—An analysis of failed and unfailed field units proved that not all platforms are susceptible to this failure. Obviously, a high percentage are susceptible, since 26 failures have been experienced. It is likely that many unfailed platforms contain some small percentage of uncured resins.
The significantly higher failure rate in the units with higher serial numbers points to a process (or common) failure mode. All evidence points to lot-to-lot variations in the amount of uncured resins present in the retainer raw material. Traceability from retainer lot to individual platform spin motor was not possible in this case, but such records should be available. The 26 units that have failed and the failure rate at the 14-day interval bound the total platform failure rate. The number of spares available is adequate to meet system life and reliability requirements.
Site reliability.—The site system reliability goal allows approximately two G&C failures per month for any cause. Analysis of test data indicates the goal can be achieved at either a 7-day test interval (0.8 failure/month) or a 14-day test interval (1.5 failures/month). It cannot be achieved at a 21-day interval (7.7 failures/month) or a 28-day interval (8.6 failures/month). Even though at least 74 percent of the site failures were restarted, a limited number of spare G&C's are available.
Tests at the site revealed that most failed spin motors can be restarted within 10 power applications and once started will perform properly. The site procedure was revised to leave any failed G&C's that restart within 10 attempts on line. Platforms that did not start within 10 attempts were returned to the contractor and were restarted by repetitive application of overvoltage or reverse voltage up to the motor saturation limit. These data support the conclusion that the failure mode was the formation of a film bond on the race and that increasing the inverter output voltage to the motor saturation limit would not eliminate the problem.
Current site operating procedures provide a 14-day LSPT interval with a 10-min run time. This enables the G&C failure rate to meet system reliability goals. The vehicle site is currently being deactivated. If reactivation should be required, the repair of all defective or suspect platforms should be included as part of that effort.

Concluding Remarks

Now that you have completed chapter 4, several concepts should be clear.

(1) The failure rate of complex equipment is usually considered to be a constant.
(2) Most failures are random, with repetitive failures representing a small portion of unreliability.
(3) The rate at which failures occur depends upon
   (a) The acceptance criteria, which determine how effectively potential failures are detected
   (b) All applied stresses, including electrical, mechanical, and environmental. (As these stresses increase, the failure rate usually increases.)
(4) Published failure rate data represent the potential failures expected of a part. The rate at which these failures are observed depends on the applied electrical stresses (the stress ratio) and the mechanical stresses (the KA factor).
(5) In general, failure rate predictions are best applied on a relative basis.
(6) Failure rate data can be used to provide reliability criteria to be traded off with other performance parameters or physical configurations.
(7) The reliability of a device can be increased only if the device's failure mechanisms and their activation causes are understood.

In addition, you should be able to use failure rate data to predict the failure rate expected of a design and, consequently, to calculate the first term, e^(−λt) (the probability of no catastrophic part failures), of inherent reliability. Finally, you should be able to allocate failure rate requirements to parts after having been given a reliability goal for a system or the elements of a system.

References

4-1. Electronic Reliability Design Handbook. MIL-HDBK-338, Oct. 1988.
4-2. Reliability Prediction of Electronic Equipment. MIL-HDBK-217E, Jan. 1990.
4-3. Taylor, J.R.: Handbook of Piece Part Failure Rates. Martin Marietta Corp., June 1970. (Avail. NTIS, AD-B007168L.)
4-4. Bloomquist, C.; and Graham, W.: Analysis of Spacecraft On-Orbit Anomalies and Lifetime. (PRC R-3579, PRC Systems Sciences Co.; NASA Contract NAS5-27279), NASA CR-170565, 1983.
4-5. Government-Industry Data Exchange Program (GIDEP): Reliability-Maintainability (R-M) Analyzed Data Summaries, vol. 7, Oct. 1985.
4-6. Maintainability Demonstration. MIL-STD-471A, Jan. 10, 1975.
4-7. Reliability Modeling and Prediction. MIL-STD-756B, Aug. 1982.
4-8. Lloyd, D.K.; and Lipow, M.: Reliability: Management, Methods, and Mathematics. Prentice-Hall, 1962.
4-9. Landers, R.R.: Reliability and Product Assurance. Prentice-Hall, 1963.
4-10. Anstead, R.J.; and Goldberg, E.: Failure Analysis of Electronic Parts: Laboratory Methods. NASA SP-6508, 1975.
4-11. Devaney, J.R.; Hill, G.L.; and Seippel, R.G.: Failure Analysis Mechanisms, Techniques and Photo Atlas. Failure Recognition and Training Service Inc., Monrovia, CA, 1985.
4-12. Smith, G., et al.: How to Avoid Metallic Growth Problems on Electronic Hardware. IPC-TR-476, Institute of Printed Circuits, Sept. 1977.

Reliability Training

1a. Using the failure rate data in table 4-4, calculate the flight failure rate for a launch vehicle electronic subsystem consisting
of the following parts (assume KA = 1000):

Component                                  Number of parts, N
Resistor, G657109/10                       5
Resistor, variable, 11176416               1
Capacitor, G657113                         3
Diode, G6557092                            3
Transistor, 11176056                       4
Integrated circuit, analog, 11177686       1

A. 195 failures per 10⁹ hr   B. 195 000 failures per 10⁹ hr   C. 195 000 failures per 10⁶ hr

1b. Assume the flight failure rate for this circuit is 500 000 failures per 10⁹ hr. Calculate the reliability of the circuit for a 0.01-hr
flight.

A. 0.9995 B. 0.99995 C. 0.999995

2. The a posteriori flight failure rate of a launch vehicle is 440 000 failures per 10⁹ hr.

a. If the storage failure rate is 0.3 of the operating rate, how long can the vehicle be stored with a 90.4-percent probability of no
failures?

A. 30 days B. 40 days C. 50 days

b. After 1450 hr (2 months) in storage the vehicle is removed and checked out electronically. If the vehicle passes its electronic
checkout and the checkout equipment can detect only 80 percent of the possible failures, what is the probability that the vehicle
is good? (Ignore test time.)

A. 0.962 B. 0.858 C. 0.946

3. A subassembly in a piece of ground support equipment has a reliability requirement of 0.995. Preliminary estimates suggest that
the subassembly will contain 300 parts and will operate for 200 hr. What is the average part failure rate required to meet the
reliability goal?

A. 25×10⁻⁶ B. 16 667×10⁻⁹ C. 83×10⁻⁹

4. A piece of ground support equipment has a reliability goal of 0.9936. It contains four subassemblies of approximately equal risk.

a. What is the allocated reliability goal of each of the four subassemblies?

A. 0.99984 B. 0.9984 C. 0.9884

b. Allocating further into subassembly 1, assume the goal is 0.998. Solve for the average part failure rate given the following:

Estimated parts count: 100
Estimated operating time: 10 hr

A. 20 000×10⁻⁹ B. 2000×10⁻⁹ C. 200×10⁻⁹
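
Problems 3 and 4 use the same allocation identity: for N equal-risk parts operating t hours under a system goal R, the average allowable part failure rate is λ = −ln R/(Nt). A hedged Python sketch with illustrative values (deliberately not the quiz numbers):

    import math

    def allocated_part_rate(r_goal, n_parts, t_hours):
        # Average part failure rate that just meets the reliability goal
        return -math.log(r_goal) / (n_parts * t_hours)

    print(allocated_part_rate(0.999, 50, 100.0))  # about 2.0e-7 failures/hr
    # Equal-risk allocation to k subassemblies: each gets R_goal**(1/k).
    print(0.98 ** (1 / 4))                        # about 0.99496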

†Answers are given at the end of this manual.

TABLE 4-4.—SELECTED LISTING—APPROVED ELECTRONIC FAILURE RATES FOR LAUNCH VEHICLE APPLICATION^a

Part number                Part                            Operating   Nonoperating
                                                           mode^b      mode^b
                                                           Failure rate, failures/10⁹ hr
Integrated circuits
11177680/81/82/83/84/85    Digital                         10          3
11177686                   Analog                          30          10

Transistors
6557155                    Double switch                   10
6557318/19                 Medium-power switch             20
6557046                    PNP type transistor             1
11176911                   Medium-power switch
11176056                   High-speed switch
11177685                   Field-effect transistor
6310038                    2N5201                          10
6557072                    2N918 (unmatched)               50

Diodes
6557061                    Rectifier and logic (5 V)       20
6557092                    Rectifier and logic (30 V)      5
6557123                    Rectifier and logic (50 V)
6557125                    Rectifier and logic (600 V)
11176912                   Rectifier and logic (400 V)

Resistors
6557018                    2.5-W wirewound                 2           1
6557015                    1/8-W wirewound                 3           2
6557016/17                 1- and 2-W wirewound            2           5
6557030                    1/10-W fixed film               1           5
6557031                    6-W wirewound                   5           5
6557109/10                 1/4-W fixed composition         1           2
6557329                    1/8-W fixed film                1           3
11176416                   1-W variable metal film         50          10.3

Capacitors
G657020/21/22              Fixed glass                     0.1         0.1
G657113/173                Fixed ceramic                   5           1
G657114                    Fixed ceramic                   10          1
G657119/120                Solid tantalum                  2           1
G657202                    Precision, fixed ceramic        50          3

Relays
11176326/453               DPDT armature                   100         20

Transformers (RF)
11301034/35/43/49                                          10          5
11301064                                                   1           5

RF coil
G657140/41                                                 3           2
G657178/81                                                 10          2

RF filter
G657189                                                    50          5

^a Current failure rate data are available from refs. 4-1 and 4-4.
^b Applies to all slash numbers of parts shown (worst case shown).
Chapter 5
Applying Probability Density Functions
The inherent reliability of equipment is defined in chapter 3 as

    Ri = e^(−λt) Pt Pw

where

    Ri         probability of no failures
    e^(−λt)    probability of no catastrophic part failures
    Pt         probability of no tolerance failures
    Pw         probability of no wearout failures

Before discussing the Pt and Pw terms in the next chapter, it is necessary to understand probability density functions and
cumulative probability functions. These concepts form another part of probability theory not discussed in chapter 2. First, in
this chapter, the theory of density and cumulative functions is discussed in general; then the normal, or Gaussian, distribution
is discussed in detail. This normal distribution is used extensively later in the manual.

Probability Density Functions

If a chance variable x can take on values only within some interval, say between a and b, the probability density function
p(x) of that variable has the property that (ref. 5-1)

    ∫_a^b p(x) dx = 1

In other words, the area under the curve p(x) is equal to unity. This is shown in figure 5-1. In the language of probability,
the probability of x being within the interval (a,b) is given by

    P(a ≤ x ≤ b) = ∫_a^b p(x) dx = 1

In other words, the probability that x lies between a and b is 1. This should be clear, since x can take only values between
a and b. In a similar fashion, we can find the probability of x being within any other interval, say between c and d, from

    P(c < x < d) = ∫_c^d p(x) dx

which is shown in figure 5-2.

Example 1: Suppose we were to perform an experiment in which we measured the height of oak trees in a 1-acre woods.
The result, if our measuring accuracy is ±5 ft, might look like the histogram shown in figure 5-3. The value at the top of each
histogram cell (or bar) indicates the number of trees observed to have a height within the boundaries of that cell. For example,
19 trees had a height between 0 and 10 feet, 17 trees had a height between 10 and 20 feet, and so on. The figure shows that
100 trees were observed.

Now let us calculate values for the ordinate of the histogram so that the area under the histogram equals unity. Then, we will
establish a probability density function for the tree heights. Since we observed 100 trees, it should be apparent that if the
calculated ordinate of a cell times the width of the cell (the cell area) yields the percentage of 100 trees in that cell, the sum
of the percentages in all cells will have to equal 100 percent. Or, if the percentages are expressed as decimal fractions, their sum
will equal 1, which will be the total area under the histogram. Therefore,

For the cell 0 to 10 feet, which has 19 percent of the trees in it,

    Ordinate of cell = Percent of trees in cell / Width of cell = (19/100) × (1/10) = 0.019

As a check, we can see that

    Ordinate of cell × Cell width = 0.019 × 10 = 0.19, or 19 percent

In a similar fashion, the ordinates for the other cells can be calculated; they are shown in table 5-1 and figure 5-4.
The next step (fig. 5-4) is to draw a line through the midpoints of the cells. The equation for this line is called the probability
density function p(x) and has the form

    p(x) = −0.0002x + 0.02

Figure 5-1.—Probability density function curve. (The area under the curve between a and b equals unity.)

The area under the curve is (ref. 5-2)

    Area = ∫_0^100 p(x) dx = ∫_0^100 (−0.0002x + 0.02) dx = [−x²/10⁴ + 0.02x]_0^100
         = −(100)²/10⁴ + 0.02(100) = −1 + 2 = 1

This agrees with our requirement that the area under a probability density function equal unity.

Figure 5-2.—Application of probability density function. (The area under p(x) between x = c and x = d is the probability
that x lies between c and d.)

TABLE 5-1.—CALCULATION OF CELL ORDINATES FOR TREE DATA

    Cell, ft    Ordinate                Area (cell width times cell ordinate)
    0-10        19/(100×10) = 0.019     0.19
    10-20       17/10³ = 0.017           .17
    20-30       15/10³ = 0.015           .15
    30-40       13/10³ = 0.013           .13
    40-50       11/10³ = 0.011           .11
    50-60        9/10³ = 0.009           .09
    60-70        7/10³ = 0.007           .07
    70-80        5/10³ = 0.005           .05
    80-90        3/10³ = 0.003           .03
    90-100       1/10³ = 0.001           .01
                              Total area 1.00

Figure 5-3.—Height of trees observed in 1-acre woods.

Figure 5-4.—Probability density function for tree heights, p(x) = −0.0002x + 0.02.

Application of Density Functions

Now let us see how we can apply the density function to the tree data. To find the percentage of trees between 60 and 80 feet
high, solve for

    P(60 < x < 80) = ∫_60^80 p(x) dx = ∫_60^80 (−0.0002x + 0.02) dx = [−x²/10⁴ + 0.02x]_60^80
                   = −(80² − 60²)/10⁴ + 0.02(80 − 60) = −(1/10⁴)(2800) + 0.4 = −0.28 + 0.4
                   = 0.12, or 12 percent

Figure 5-3 shows that this answer is correct, since 12/100 trees were observed to have a height between 60 and 80 feet.
Another way to look at this example is that there is only a 12-percent chance that a tree picked at random from the 1-acre
area would have a height between 60 and 80 feet. In a similar fashion, we can calculate the probability that a tree would have
any range of heights within the boundary of 0 to 100 feet.

In the tree example, we were able to measure the trees in a particular part of the woods and to obtain a height density
function for those trees. But what do we do if we are interested in a different area of woods and for some reason we are not able
to go out and measure the trees? We would probably assume that the acre we measured was representative of all other acres
in the same woods. If we accept this assumption, we could then use our experience (the established density function) to predict
the distribution of tree heights in an unmeasured acre. And this is exactly what is done in industry.

As you can see, if we know what the density functions are for such things as failure rates, operating temperatures, and missile
accuracy, it is easy to determine the probability of meeting a failure rate requirement for equipment (such as a missile)
specified to operate in some temperature range with a required accuracy.

Example 2: Suppose that a missile has a maximum target miss distance requirement of 90 feet and that after several
hundred firings, the probability density function for miss distance is

    p(x) = −0.0002x + 0.02    where 0 ≤ x ≤ 100

which is the same as the p(x) for the tree example and is shown in figure 5-5.

Figure 5-5.—Probability density function for missile target miss distance. (The shaded area beyond 90 ft is the probability
that the miss distance will exceed 90 ft.)

To predict the probability that the next missile fired will miss the target by more than 90 feet, solve for

    P(90 ≤ x ≤ 100) = ∫_90^100 (−0.0002x + 0.02) dx = [−x²/10⁴ + 0.02x]_90^100
                    = −(100² − 90²)/10⁴ + 0.02(100 − 90) = −(1/10⁴)(1900) + 0.02(10)
                    = −0.19 + 0.2 = 0.01, or 1 percent

In other words, there is a 99-percent chance that the missile will hit within 90 feet of the target and a 1-percent chance that
it will not. This is shown as the shaded area under the density function in figure 5-5.
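
Because this p(x) integrates in closed form, the percentages in examples 1 and 2 can be checked with a few lines of code. A minimal sketch, added here for illustration and not part of the original text:

```python
# Probability that x falls in [a, b] for the linear density of examples 1 and 2,
# p(x) = -0.0002x + 0.02 on 0 <= x <= 100.
def prob(a, b):
    F = lambda x: -x**2 / 1e4 + 0.02 * x  # antiderivative of p(x)
    return F(b) - F(a)

print(prob(0, 100))   # 1.0  -> the total area under the density is unity
print(prob(60, 80))   # 0.12 -> 12 percent of trees between 60 and 80 ft
print(prob(90, 100))  # 0.01 -> 1 percent chance of missing by more than 90 ft
```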

Cumulative Probability Distribution

Another practical tool in probability calculation is the cumulative probability distribution F(x) from reference 5-3. An F(x)
curve for the tree example in the preceding section is shown in figure 5-6. The curve represents the cumulative area under the
probability density function p(x). The ordinates of the curve were calculated as shown in table 5-2.

TABLE 5-2.—ORDINATES FOR CUMULATIVE DISTRIBUTION OF TREE DATA

    Tree height,    Area under    Ordinate of F(x) curve
    ft              p(x) curve    (cumulative area)
    0-10            0.19          0.19
    10-20            .17           .36
    20-30            .15           .51
    30-40            .13           .64
    40-50            .11           .75
    50-60            .09           .84
    60-70            .07           .91
    70-80            .05           .96
    80-90            .03           .99
    90-100           .01          1.00

The cumulative curve can be used to solve the same problems that the density curve was used to solve.

Example 3: Referring again to example 1, suppose that we want to know the probability that a particular tree selected at
random from the woods will have a height between 30 and 50 feet.

Solution 3A: Using the density function for tree height,

    P(30 < x < 50) = ∫_30^50 (−0.0002x + 0.02) dx = [−x²/10⁴ + 0.02x]_30^50
                   = −(50² − 30²)/10⁴ + 0.02(50 − 30) = −0.16 + 0.40
                   = 0.24, or 24 percent

Solution 3B: Using the cumulative curve shown in figure 5-6,

    P(30 < x < 50) = F(50) − F(30) = 0.75 − 0.51 = 0.24, or 24 percent

which agrees with solution 3A.

Note that in working out solution 3A, the next-to-last step (0.75 − 0.51) is the same as the next-to-last step of solution 3B.
The reason for this is that the equation of the cumulative probability function F(x) is found from

    F(x) = ∫ p(x) dx

and

    ∫_a^b p(x) dx = F(b) − F(a)

For the tree example

    F(x) = ∫ (−0.0002x + 0.02) dx = −x²/10⁴ + 0.02x

Consequently, we can find the probability of a variable x being within some interval by using the cumulative function F(x)
even though the cumulative graph is not available.

Example 4: What is the probability that a tree selected at random will have a height less than 20 feet?

Solution 4:

    P(0 ≤ x ≤ 20) = ∫_0^20 p(x) dx = F(20) − F(0) = [−x²/10⁴ + 0.02x]_0^20
                  = −0.04 + 0.4 = 0.36, or 36 percent

which agrees with a graphical solution.

Figure 5-6.—Cumulative probability function for tree heights, F(x) = ∫ p(x) dx.
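
The cumulative form turns these calculations into single subtractions. Another short sketch (an illustrative addition), using the F(x) just derived for the tree data:

```python
# Cumulative distribution for the tree data: F(x) = -x^2/10^4 + 0.02x, 0 <= x <= 100.
def F(x):
    return -x**2 / 1e4 + 0.02 * x

print(F(20) - F(0))   # P(0 <= x <= 20) = 0.36, as in solution 4
print(F(50) - F(30))  # P(30 < x < 50)  = 0.24, as in solution 3B
print(1 - F(20))      # P(x > 20)       = 0.64, from the rule P(x > a) = 1 - F(a)
```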

Some general rules for the use of the cumulative function F(x) are

    (1) P(x < a) = F(a)
    (2) P(x > a) = 1 − F(a)
    (3) P(a < x < b) = F(b) − F(a)

Example 5: Suppose that we would like to know the probability of equipment seeing tropic zone temperatures above 120 °F
during operation because at or above 120 °F, we have to add a costly air-conditioning system to cool the equipment. If we
could obtain the temperature data, we might find that the cumulative distribution for tropic zone temperatures would be
that shown in figure 5-7.

Figure 5-7.—Cumulative distribution of tropic zone temperatures.

Solution 5: From the curve, the probability of observing a temperature at or above 120 °F is given by

    P(temp ≥ 120 °F) = 1 − F(120 °F) = 1 − 0.97 = 0.03, or 3 percent

With only a 3-percent chance of temperatures above 120 °F, we probably would decide against air conditioning (all other
parameters, such as failure rate, being equal).

Normal Distribution

One of the most frequently used density functions in reliability engineering is the normal, or Gaussian, distribution. A more
descriptive term, however, is the normal curve of error because it represents the distribution of errors observed from repeated
measurements of an object or some physical phenomenon (ref. 5-4).

Example 6: Assume that we need to measure the heights of eighth-grade children. A histogram of the children's heights
would resemble the curve in figure 5-8. If, as in our tree example, we calculate an ordinate for the histogram so that the area under
the histogram equals unity and then connect the midpoints of each cell, we obtain a smooth curve as shown in figure 5-8. This
curve represents the density function for the heights of the children. Such a curve (sometimes called a bell curve) is the
shape of the normal distribution. We say that the children's heights are distributed normally.

Figure 5-8.—Histogram and density function for heights of children (4.4 to 6.2 ft).

Normal Density Function

The equation for the density function p(x) of the normal distribution is

    p(x) = [1/(σ√(2π))] e^(−(x − x̄)²/2σ²)

This curve is shown in figure 5-9. The function p(x) has two parameters. The first is the mean x̄ calculated from

    x̄ = (1/n) Σ xi    (i = 1 to n)

where

    n     total number of measurements or observations
    xi    value of ith measurement

The mean, therefore, is the arithmetic average of the measurements. From example 6, we would add all the heights observed
and then divide by the number of children measured to obtain a mean or average height. The mean of all the children's heights
from the data in figure 5-8 is 5.3 ft.
The second parameter of p(x) is the standard deviation σ calculated from

    σ = [Σ (xi − x̄)²/(n − 1)]^(1/2)    (i = 1 to n)

where

    x̄    mean of measurements
    xi   value of ith measurement
    n    total number of measurements

Note that n − 1 is used in the equation to give an unbiased sampling distribution. In the general definition of σ, n instead
of n − 1 would be used.

The standard deviation is the square root of the variance, which is denoted by σ². The magnitude of the variance, as well
as the standard deviation, indicates how far all the measurements deviate from the mean. The standard deviation of the
children's height data, for example, is approximately 0.3 ft. If the range of heights observed had been from 5 to 5.6 ft, the
standard deviation would have been approximately 0.1 ft; with this standard deviation, the distribution would look squeezed
together, as shown by the dashed curve in figure 5-8. However, the area under the dashed curve would still equal the area under
the solid curve.

Properties of Normal Distribution

The normal density function is a continuous distribution from −∞ to ∞. It is symmetrical about the mean and has an area
equal to unity as required for probability density functions. For the normal distribution, the standard deviation is the distance
on the abscissa from the mean x̄ to the intercept on the abscissa of a line drawn perpendicular to the abscissa through the point
of inflection on the curve. This is shown in figure 5-9. It is also shown that equal increments of the standard deviation can be
laid out to the left (−) and the right (+) of the mean x̄.

Figure 5-9.—Normal density function. (The area under the curve equals unity; one standard deviation σ is the distance from
the mean to the point of inflection.)

As you will recall, in determining probabilities from a density function, we need to calculate the area under the curve
p(x). When using the normal density function, it is common practice to relate areas to the standard deviation. In general, the
area under the curve between z and −z standard deviations can be found from

    P(−z ≤ x ≤ z) = Area = [1/√(2π)] ∫_−z^z e^(−z²/2) dz

The areas for various values of z are shown in table 5-3. This table shows that the area under the normal curve between 1σ
and −1σ is 0.683, or 68.3 percent; the area under the normal curve between 2σ and −2σ is 0.9545, or 95.45 percent, and so
forth.

TABLE 5-3.—AREAS BETWEEN −z AND z

    z    Area under curve    Probability
    1    0.683               P(−1σ ≤ x ≤ 1σ)
    2     .9545              P(−2σ ≤ x ≤ 2σ)
    3     .9973              P(−3σ ≤ x ≤ 3σ)
    4     .999937            P(−4σ ≤ x ≤ 4σ)
    5     .999999426         P(−5σ ≤ x ≤ 5σ)
    6     .99999999803       P(−6σ ≤ x ≤ 6σ)
    7     .999999999992      P(−7σ ≤ x ≤ 7σ)

Example 7: The term "3σ limit" refers to the area under the normal curve between 3σ and −3σ, which is 0.9973, or
99.73 percent, as shown in table 5-3. Therefore, if a power supply output is defined as 28 ± 3 V and the ±3 V represents a
3σ limit, 99.73 percent of all such power supplies will have an output between 25 and 31 V. The percentage of supplies
having an output greater than 31 V or less than 25 V will be 1 − 0.9973 = 0.0027, or 0.27 percent, as shown in figure 5-10.

Up to now we have been working with areas under the normal density function between integer multiples of σ, that is, 1, 2, 3,
and so on. In practice, however, we are usually interested in the area between decimal fractions of σ, such as 1.1, 2.3,
et cetera. We have also been using z to represent the number of standard deviations that a particular limit value is from the
mean. For instance, in the power supply example, 25 V was given as being three standard deviations from the mean of 28 V.
It is better when working in decimal fractions of σ to let z = (x − x̄)/σ, where x − x̄ is the distance from the mean x̄ to
the limit value and σ is the standard deviation. Going back to the supply example, our lower limit was 25 V, which was 3 V
from the mean of 28 V, and the standard deviation was 1 V; therefore, z = (25 − 28)/1 = −3.
TABLE 5-4.—AREAS IN TWO TAILS OF NORMAL CURVE AT SELECTED VALUES OF z
[From reference 5-1. A condensed entry such as 0.0³967 denotes 0.000967.]

  z      0       0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09

  0     1.0000  0.9920  0.9840  0.9761  0.9681  0.9601  0.9522  0.9442  0.9362  0.9283
  .1     .9203   .9124   .9045   .8966   .8887   .8808   .8729   .8650   .8572   .8493
  .2     .8415   .8337   .8259   .8181   .8103   .8026   .7949   .7872   .7795   .7718
  .3     .7642   .7566   .7490   .7414   .7339   .7263   .7188   .7114   .7039   .6965
  .4     .6892   .6818   .6745   .6672   .6599   .6527   .6455   .6384   .6312   .6241

  .5     .6171   .6101   .6031   .5961   .5892   .5823   .5755   .5687   .5619   .5552
  .6     .5485   .5419   .5353   .5287   .5222   .5157   .5093   .5029   .4965   .4902
  .7     .4839   .4777   .4715   .4654   .4593   .4533   .4473   .4413   .4354   .4295
  .8     .4237   .4179   .4122   .4065   .4009   .3953   .3898   .3843   .3789   .3735
  .9     .3681   .3628   .3576   .3524   .3472   .3421   .3371   .3320   .3271   .3222

 1.0     .3173   .3125   .3077   .3030   .2983   .2937   .2891   .2846   .2801   .2757
 1.1     .2713   .2670   .2627   .2585   .2543   .2501   .2460   .2420   .2380   .2340
 1.2     .2301   .2263   .2225   .2187   .2150   .2113   .2077   .2041   .2005   .1971
 1.3     .1936   .1902   .1868   .1835   .1802   .1770   .1738   .1707   .1676   .1645
 1.4     .1615   .1585   .1556   .1527   .1499   .1471   .1443   .1416   .1389   .1362

 1.5     .1336   .1310   .1285   .1260   .1236   .1211   .1188   .1164   .1141   .1118
 1.6     .1096   .1074   .1052   .1031   .1010   .0989   .0969   .0949   .0930   .0910
 1.7     .0891   .0873   .0854   .0836   .0819   .0801   .0784   .0767   .0751   .0735
 1.8     .0719   .0703   .0688   .0672   .0658   .0643   .0629   .0615   .0601   .0588
 1.9     .0574   .0561   .0549   .0536   .0524   .0512   .0500   .0488   .0477   .0466

 2.0     .0455   .0444   .0434   .0424   .0414   .0404   .0394   .0385   .0375   .0366
 2.1     .0357   .0349   .0340   .0332   .0324   .0316   .0308   .0300   .0293   .0285
 2.2     .0278   .0271   .0264   .0257   .0251   .0244   .0238   .0232   .0226   .0220
 2.3     .0214   .0209   .0203   .0198   .0193   .0188   .0183   .0178   .0173   .0168
 2.4     .0164   .0160   .0155   .0151   .0147   .0143   .0139   .0135   .0131   .0128

 2.5     .0124   .0121   .0117   .0114   .0111   .0108   .0105   .0102   .00988  .00960
 2.6     .00932  .00905  .00879  .00854  .00829  .00805  .00781  .00759  .00736  .00715
 2.7     .00693  .00673  .00653  .00633  .00614  .00596  .00578  .00561  .00544  .00527
 2.8     .00511  .00495  .00480  .00465  .00451  .00437  .00424  .00410  .00398  .00385
 2.9     .00373  .00361  .00350  .00339  .00328  .00318  .00308  .00298  .00288  .00279

  z      0       0.1     0.2     0.3     0.4     0.5     0.6     0.7     0.8     0.9

  3     0.00270 0.00194 0.00137 0.0³967 0.0³674 0.0³465 0.0³318 0.0³216 0.0³145 0.0⁴962
  4      .0⁴633  .0⁴413  .0⁴267  .0⁴171  .0⁴108  .0⁵680  .0⁵422  .0⁵260  .0⁵159  .0⁶958
  5      .0⁶573  .0⁶340  .0⁶199  .0⁶116  .0⁷666  .0⁷380  .0⁷214  .0⁷120  .0⁸663  .0⁸364
  6      .0⁸197  .0⁸106  .0⁹565  .0⁹298  .0⁹155  .0¹⁰803 .0¹⁰411 .0¹⁰208 .0¹⁰105 .0¹¹520

Symmetrical Two-Limit Problems

In this discussion the term "symmetrical two-limit problems" refers to the area under the density function at equal
values of z from both sides of the mean. The power supply example was this type, since we were concerned with the area
between −3σ and 3σ from the mean x̄. To work these problems when z is a decimal fraction, we use tables of areas in
the two tails of the normal curve.

Table 5-4 shows tabulated areas in two tails of the normal curve for selected values of z from the mean x̄. For example,
when z = 3.0, the table shows that 0.00270 of the total area lies in the two tails of the curve below −3σ and above 3σ.
Because the curve is symmetrical, 0.00135 of the area will lie to the left of −3σ and 0.00135 to the right of 3σ. Note that
this agrees with figure 5-10 for the power supply example.

Example 8 (using table 5-4): Suppose that a circuit design requires that the gain β of a transistor be no less than 30 and

no greater than 180. The mean x̄ of the β density function of a particular transistor is 105 with a standard deviation of 32.
What percentage of the transistors will have a β within the required limits?

Figure 5-10.—Probability density functions for power supply outputs. (P(25 V < x < 31 V) = 99.73 percent; 0.00135 of the
supplies will have an output less than 25 V and 0.00135 will have an output greater than 31 V.)

Solution 8:

Step 1—Solve for z.

    x − x̄ = 105 − 30 = 180 − 105 = 75

Since σ is given as 32,

    z = 75/32 = 2.34

Step 2—From table 5-4, the area in the two tails when z = 2.34 is 0.0193. Therefore, 0.00965 of the transistors
will have a β below 30 and 0.00965 will have a β above 180.

Step 3—Now find P(30 ≤ β ≤ 180). Since 0.0193 of the transistors will have a β below 30 or above 180, 1 − 0.0193
must give the percentage that will lie between 30 and 180:

    P(30 ≤ β ≤ 180) = 1 − 0.0193 = 0.9807, or 98.07 percent

as shown in figure 5-11. If we were to buy 100 000 of these transistors, we would expect 98 070 of them to have a β between
30 and 180. The remaining 1930 would not meet our β requirements.

Figure 5-11.—Transistor gain. (0.00965 of the transistors lie below β = 30 and 0.00965 lie above β = 180.)

One-Limit Problems

In many applications, engineers are interested only in one-sided limits, an upper or lower limit, rather than a two-sided
upper and lower limit. In these cases, they are interested in the area under one tail of the density function as shown in
figure 5-12. Tabulated values of the area in one tail of the normal density function at selected values of z are given in
table 5-5.

Figure 5-12.—Example of one-limit problems. (a) Lower limit, P(x ≤ lower limit). (b) Upper limit, P(x ≥ upper limit).

Example 9: Suppose an exploding bridgewire (EBW) power supply is required to produce an output voltage of at least
1500 V. At this output voltage or greater, all the bridgewire detonators will explode. If the mean output of all such supplies
is known to be 1575 V and the standard deviation is 46 V, what is the probability that an output of 1500 V or greater will be
observed?
TABLE 5-5.—AREAS IN ONE TAIL OF NORMAL CURVE AT SELECTED VALUES OF z
[From reference 5-1. A condensed entry such as 0.0³968 denotes 0.000968.]

  z      0       0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09

  0     0.5000  0.4960  0.4920  0.4880  0.4840  0.4801  0.4761  0.4721  0.4681  0.4641
  .1     .4602   .4562   .4522   .4483   .4443   .4404   .4364   .4325   .4286   .4247
  .2     .4207   .4168   .4129   .4090   .4052   .4013   .3974   .3936   .3897   .3859
  .3     .3821   .3783   .3745   .3707   .3669   .3632   .3594   .3557   .3520   .3483
  .4     .3446   .3409   .3372   .3336   .3300   .3264   .3228   .3192   .3156   .3121

  .5     .3085   .3050   .3015   .2981   .2946   .2912   .2877   .2843   .2810   .2776
  .6     .2743   .2709   .2676   .2643   .2611   .2578   .2546   .2514   .2483   .2451
  .7     .2420   .2389   .2358   .2327   .2296   .2266   .2236   .2206   .2177   .2148
  .8     .2119   .2090   .2061   .2033   .2005   .1977   .1949   .1922   .1894   .1867
  .9     .1841   .1814   .1788   .1762   .1736   .1711   .1685   .1660   .1635   .1611

 1.0     .1587   .1562   .1539   .1515   .1492   .1469   .1446   .1423   .1401   .1379
 1.1     .1357   .1335   .1314   .1292   .1271   .1251   .1230   .1210   .1190   .1170
 1.2     .1151   .1131   .1112   .1093   .1075   .1056   .1038   .1020   .1003   .0985
 1.3     .0968   .0951   .0934   .0918   .0901   .0885   .0869   .0853   .0838   .0823
 1.4     .0808   .0793   .0778   .0764   .0749   .0735   .0721   .0708   .0694   .0681

 1.5     .0668   .0655   .0643   .0630   .0618   .0606   .0594   .0582   .0571   .0559
 1.6     .0548   .0537   .0526   .0516   .0505   .0495   .0485   .0475   .0465   .0455
 1.7     .0446   .0436   .0427   .0418   .0409   .0401   .0392   .0384   .0375   .0367
 1.8     .0359   .0351   .0344   .0336   .0329   .0322   .0314   .0307   .0301   .0294
 1.9     .0287   .0281   .0274   .0268   .0262   .0256   .0250   .0244   .0239   .0233

 2.0     .0228   .0222   .0217   .0212   .0207   .0202   .0197   .0192   .0188   .0183
 2.1     .0179   .0174   .0170   .0166   .0162   .0158   .0154   .0150   .0146   .0143
 2.2     .0139   .0136   .0132   .0129   .0125   .0122   .0119   .0116   .0113   .0110
 2.3     .0107   .0104   .0102   .00990  .00964  .00939  .00914  .00889  .00866  .00842
 2.4     .00820  .00798  .00776  .00755  .00734  .00714  .00695  .00676  .00657  .00639

 2.5     .00621  .00604  .00587  .00570  .00554  .00539  .00523  .00508  .00494  .00480
 2.6     .00466  .00453  .00440  .00427  .00415  .00402  .00391  .00379  .00368  .00357
 2.7     .00347  .00336  .00326  .00317  .00307  .00298  .00289  .00280  .00272  .00264
 2.8     .00256  .00248  .00240  .00233  .00226  .00219  .00212  .00205  .00199  .00193
 2.9     .00187  .00181  .00175  .00169  .00164  .00159  .00154  .00149  .00144  .00139

  z      0       0.1     0.2     0.3     0.4     0.5     0.6     0.7     0.8     0.9

  3     0.00135 0.0³968 0.0³687 0.0³483 0.0³337 0.0³233 0.0³159 0.0³108 0.0⁴723 0.0⁴481
  4      .0⁴317  .0⁴207  .0⁴133  .0⁵854  .0⁵541  .0⁵340  .0⁵211  .0⁵130  .0⁶793  .0⁶479
  5      .0⁶287  .0⁶170  .0⁷996  .0⁷579  .0⁷333  .0⁷190  .0⁷107  .0⁸599  .0⁸332  .0⁸182
  6      .0⁹987  .0⁹530  .0⁹282  .0⁹149  .0¹⁰777 .0¹⁰402 .0¹⁰206 .0¹⁰104 .0¹¹523 .0¹¹260

Solution 9:

Step 1—Calculate z.

    z = (Mean − Limit)/σ = (1575 − 1500)/46 = 75/46 = 1.63

Step 2—Find the area in one tail of the normal curve at z from the mean. From table 5-5 the tail area at z = 1.63 from the mean
is given as 0.0516. Therefore, there is a 0.0516 probability that an observed output will be below 1500 V.

Step 3—Find the probability that the output will be 1500 V or greater. Since from step 2 P(x < 1500) = 0.0516,

    P(x ≥ 1500) = 1 − P(x < 1500) = 1 − 0.0516 = 0.9484, or 94.84 percent

We can therefore expect to obtain a 1500-V output voltage level 94.84 percent of the time. Or to express it another way,
94.84 percent of the supplies will produce an output above the minimum requirement of 1500 V. This result is shown in
figure 5-13.
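
Solutions 8 and 9 both reduce to evaluating the normal cumulative function, which in code can come from the standard error function instead of printed tables. A minimal sketch (an illustrative addition; math.erf is the Python standard-library error function):

```python
import math

def Phi(z):
    """Cumulative normal distribution F(z), as tabulated in tables 5-6 and 5-7."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Example 8 (symmetrical two-limit): gain limits 30 to 180, mean 105, sigma 32.
z8 = (180 - 105) / 32
print(Phi(z8) - Phi(-z8))  # ~0.981; table 5-4 with z rounded to 2.34 gives 0.9807

# Example 9 (one-limit): EBW supply, mean 1575 V, sigma 46 V, lower limit 1500 V.
z9 = (1500 - 1575) / 46
print(1 - Phi(z9))         # ~0.9484, as in solution 9
```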



Figure 5-13.—Exploding bridgewire power supply output. (The probability that the output will be below 1500 V is 0.0516;
the probability that it will be above 1500 V is 0.9484.)

Associated with the probability density function p(x) of the normal distribution is a cumulative probability distribution
denoted by F(x). As shown in the integral formulas of chapter 2, the relation between the two is given by

    F(x) = ∫_−∞^x p(x) dx

So, for the normal distribution

    F(x) = [1/(σ√(2π))] ∫_−∞^x e^(−(1/2)[(x − x̄)/σ]²) dx

or in z notation

    F(z) = [1/√(2π)] ∫_−∞^z e^(−(1/2)z²) dz

A graph of F(x) is shown in figure 5-14. Recall that in discussing cumulative functions earlier, F(x) was called the
cumulative area under the density curve. Looking at figure 5-14, then, you can see that

    (1) F(x̄) = 0.5; that is, 50 percent of the area under the normal distribution is between −∞ and the mean x̄, or there is a
50-percent probability that a variable x lies in the interval (−∞, x̄).

    (2) 1 − F(x̄) = 0.5; that is, 50 percent of the area under the normal distribution is between the mean x̄ and ∞, or there is
a 50-percent probability that a variable x lies in the interval (x̄, ∞).

    (3) The area between −1σ and x̄ is

        P(−1σ ≤ x ≤ x̄) = F(x̄) − F(−1σ) = 0.5 − 0.16 = 0.34

or there is a 0.34 probability that a variable x will lie between the mean x̄ and −1σ.

Figure 5-14.—Cumulative normal curve.

For more accurate work, the cumulative areas for selected values of z have been tabulated and are shown in tables 5-6
and 5-7. Table 5-6 shows the cumulative areas for values of z from −∞ to 0, which are illustrated in figure 5-15. Table 5-6
shows that

    (1) At z = 0 (i.e., when the distance from the limit to x̄ is 0), the cumulative area from −∞ to x̄ is 0.5000, or 50 percent
    (2) At z = −1.0, the cumulative area from −∞ to −1σ is 0.1587, or 15.87 percent
    (3) At z = −2.0, the cumulative area from −∞ to −2σ is 0.02275, or 2.275 percent

Table 5-7 shows the cumulative areas for values of z from 0 to ∞, which are illustrated in figure 5-16. In both tables the
value tabulated at each z is F(x). It therefore follows that

    (1) The probability of the variable x lying between −∞ and x̄ is

        P(−∞ < x < x̄) = F(x̄) − F(−∞) = F(z = 0) − F(z = −∞) = 0.5 − 0 = 0.5, or 50 percent

    (2) The probability of the variable x lying between −2.1σ and 3.2σ is
TABLE 5-6.—CUMULATIVE NORMAL DISTRIBUTION FROM z = −∞ TO 0
[From reference 5-2. A condensed entry such as 0.0²9642 denotes 0.009642.]

  z      0       0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09

 −0     0.5000  0.4960  0.4920  0.4880  0.4840  0.4801  0.4761  0.4721  0.4681  0.4641
 −.1     .4602   .4562   .4522   .4483   .4443   .4404   .4364   .4325   .4286   .4247
 −.2     .4207   .4168   .4129   .4090   .4052   .4013   .3974   .3936   .3897   .3859
 −.3     .3821   .3783   .3745   .3707   .3669   .3632   .3594   .3557   .3520   .3483
 −.4     .3446   .3409   .3372   .3336   .3300   .3264   .3228   .3192   .3156   .3121

 −.5     .3085   .3050   .3015   .2981   .2946   .2912   .2877   .2843   .2810   .2776
 −.6     .2743   .2709   .2676   .2643   .2611   .2578   .2546   .2514   .2483   .2451
 −.7     .2420   .2389   .2358   .2327   .2297   .2266   .2236   .2206   .2177   .2148
 −.8     .2119   .2090   .2061   .2033   .2005   .1977   .1949   .1922   .1894   .1867
 −.9     .1841   .1814   .1788   .1762   .1736   .1711   .1685   .1660   .1635   .1611

−1.0     .1587   .1562   .1539   .1515   .1492   .1469   .1446   .1423   .1401   .1379
−1.1     .1357   .1335   .1314   .1292   .1271   .1251   .1230   .1210   .1190   .1170
−1.2     .1151   .1131   .1112   .1093   .1075   .1056   .1038   .1020   .1003   .09853
−1.3     .09680  .09510  .09342  .09176  .09012  .08851  .08691  .08534  .08379  .08226
−1.4     .08076  .07927  .07780  .07636  .07493  .07353  .07215  .07078  .06944  .06811

−1.5     .06681  .06552  .06426  .06301  .06178  .06057  .05938  .05821  .05705  .05592
−1.6     .05480  .05370  .05262  .05155  .05050  .04947  .04846  .04746  .04648  .04551
−1.7     .04457  .04363  .04272  .04182  .04093  .04006  .03920  .03836  .03754  .03673
−1.8     .03593  .03515  .03438  .03362  .03288  .03216  .03144  .03074  .03005  .02938
−1.9     .02872  .02807  .02743  .02680  .02619  .02559  .02500  .02442  .02385  .02330

−2.0     .02275  .02222  .02169  .02118  .02068  .02018  .01970  .01923  .01876  .01831
−2.1     .01786  .01743  .01700  .01659  .01618  .01578  .01539  .01500  .01463  .01426
−2.2     .01390  .01355  .01321  .01287  .01255  .01222  .01191  .01160  .01130  .01101
−2.3     .01072  .01044  .01017  .0²9903 .0²9642 .0²9387 .0²9137 .0²8894 .0²8656 .0²8424
−2.4     .0²8198 .0²7976 .0²7760 .0²7549 .0²7344 .0²7143 .0²6947 .0²6756 .0²6569 .0²6387

−2.5     .0²6210 .0²6037 .0²5868 .0²5703 .0²5543 .0²5386 .0²5234 .0²5085 .0²4940 .0²4799
−2.6     .0²4661 .0²4527 .0²4396 .0²4269 .0²4145 .0²4025 .0²3907 .0²3793 .0²3681 .0²3573
−2.7     .0²3467 .0²3364 .0²3264 .0²3167 .0²3072 .0²2980 .0²2890 .0²2803 .0²2718 .0²2635
−2.8     .0²2555 .0²2477 .0²2401 .0²2327 .0²2256 .0²2186 .0²2118 .0²2052 .0²1988 .0²1926
−2.9     .0²1866 .0²1807 .0²1750 .0²1695 .0²1641 .0²1589 .0²1538 .0²1489 .0²1441 .0²1395

−3.0     .0²1350 .0²1306 .0²1264 .0²1223 .0²1183 .0²1144 .0²1107 .0²1070 .0²1035 .0²1001
−3.1     .0³9676 .0³9354 .0³9043 .0³8740 .0³8447 .0³8164 .0³7888 .0³7622 .0³7364 .0³7114
−3.2     .0³6871 .0³6637 .0³6410 .0³6190 .0³5976 .0³5770 .0³5571 .0³5377 .0³5190 .0³5009
−3.3     .0³4834 .0³4665 .0³4501 .0³4342 .0³4189 .0³4041 .0³3897 .0³3758 .0³3624 .0³3495
−3.4     .0³3369 .0³3248 .0³3131 .0³3018 .0³2909 .0³2803 .0³2701 .0³2602 .0³2507 .0³2415

−3.5     .0³2326 .0³2241 .0³2158 .0³2078 .0³2001 .0³1926 .0³1854 .0³1785 .0³1718 .0³1653
−3.6     .0³1591 .0³1531 .0³1473 .0³1417 .0³1363 .0³1311 .0³1261 .0³1213 .0³1166 .0³1121
−3.7     .0³1078 .0³1036 .0⁴9961 .0⁴9574 .0⁴9201 .0⁴8842 .0⁴8496 .0⁴8162 .0⁴7841 .0⁴7532
−3.8     .0⁴7235 .0⁴6948 .0⁴6673 .0⁴6407 .0⁴6152 .0⁴5906 .0⁴5669 .0⁴5442 .0⁴5223 .0⁴5012
−3.9     .0⁴4810 .0⁴4615 .0⁴4427 .0⁴4247 .0⁴4074 .0⁴3908 .0⁴3747 .0⁴3594 .0⁴3446 .0⁴3304

−4.0     .0⁴3167 .0⁴3036 .0⁴2910 .0⁴2789 .0⁴2673 .0⁴2561 .0⁴2454 .0⁴2351 .0⁴2252 .0⁴2157
−4.1     .0⁴2066 .0⁴1978 .0⁴1894 .0⁴1814 .0⁴1737 .0⁴1662 .0⁴1591 .0⁴1523 .0⁴1458 .0⁴1395
−4.2     .0⁴1335 .0⁴1277 .0⁴1222 .0⁴1168 .0⁴1118 .0⁴1069 .0⁴1022 .0⁵9774 .0⁵9345 .0⁵8934
−4.3     .0⁵8540 .0⁵8163 .0⁵7801 .0⁵7455 .0⁵7124 .0⁵6807 .0⁵6503 .0⁵6212 .0⁵5934 .0⁵5668
−4.4     .0⁵5413 .0⁵5169 .0⁵4935 .0⁵4712 .0⁵4498 .0⁵4294 .0⁵4098 .0⁵3911 .0⁵3732 .0⁵3561

−4.5     .0⁵3398 .0⁵3241 .0⁵3092 .0⁵2949 .0⁵2813 .0⁵2682 .0⁵2558 .0⁵2439 .0⁵2325 .0⁵2216
−4.6     .0⁵2112 .0⁵2013 .0⁵1919 .0⁵1828 .0⁵1742 .0⁵1660 .0⁵1581 .0⁵1506 .0⁵1434 .0⁵1366
−4.7     .0⁵1301 .0⁵1239 .0⁵1179 .0⁵1123 .0⁵1069 .0⁵1017 .0⁶9680 .0⁶9211 .0⁶8765 .0⁶8339
−4.8     .0⁶7933 .0⁶7547 .0⁶7178 .0⁶6827 .0⁶6492 .0⁶6173 .0⁶5869 .0⁶5580 .0⁶5304 .0⁶5042
−4.9     .0⁶4792 .0⁶4554 .0⁶4327 .0⁶4111 .0⁶3906 .0⁶3711 .0⁶3525 .0⁶3348 .0⁶3179 .0⁶3019

 −∞      0       0       0       0       0       0       0       0       0       0

TABLE 5-7.—CUMULATIVE NORMAL DISTRIBUTION FROM z = 0 TO ∞
[From reference 5-2. A condensed entry such as 0.9²0097 denotes 0.990097.]

  z      0       0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09

  0     0.5000  0.5040  0.5080  0.5120  0.5160  0.5199  0.5239  0.5279  0.5319  0.5359
  .1     .5398   .5438   .5478   .5517   .5557   .5596   .5636   .5675   .5714   .5753
  .2     .5793   .5832   .5871   .5910   .5948   .5987   .6026   .6064   .6103   .6141
  .3     .6179   .6217   .6255   .6293   .6331   .6368   .6406   .6443   .6480   .6517
  .4     .6554   .6591   .6628   .6664   .6700   .6736   .6772   .6808   .6844   .6879

  .5     .6915   .6950   .6985   .7019   .7054   .7088   .7123   .7157   .7190   .7224
  .6     .7257   .7291   .7324   .7357   .7389   .7422   .7454   .7486   .7517   .7549
  .7     .7580   .7611   .7642   .7673   .7703   .7734   .7764   .7794   .7823   .7852
  .8     .7881   .7910   .7939   .7967   .7995   .8023   .8051   .8078   .8106   .8133
  .9     .8159   .8186   .8212   .8238   .8264   .8289   .8315   .8340   .8365   .8389

 1.0     .8413   .8438   .8461   .8485   .8508   .8531   .8554   .8577   .8599   .8621
 1.1     .8643   .8665   .8686   .8708   .8729   .8749   .8770   .8790   .8810   .8830
 1.2     .8849   .8869   .8888   .8907   .8925   .8944   .8962   .8980   .8997   .90147
 1.3     .90320  .90490  .90658  .90824  .90988  .91149  .91309  .91466  .91621  .91774
 1.4     .91924  .92073  .92220  .92364  .92507  .92647  .92785  .92922  .93056  .93189

 1.5     .93319  .93448  .93574  .93699  .93822  .93943  .94062  .94179  .94295  .94408
 1.6     .94520  .94630  .94738  .94845  .94950  .95053  .95154  .95254  .95352  .95449
 1.7     .95543  .95637  .95728  .95818  .95907  .95994  .96080  .96164  .96246  .96327
 1.8     .96407  .96485  .96562  .96638  .96712  .96784  .96856  .96926  .96995  .97062
 1.9     .97128  .97193  .97257  .97320  .97381  .97441  .97500  .97558  .97615  .97670

 2.0     .97725  .97778  .97831  .97882  .97932  .97982  .98030  .98077  .98124  .98169
 2.1     .98214  .98257  .98300  .98341  .98382  .98422  .98461  .98500  .98537  .98574
 2.2     .98610  .98645  .98679  .98713  .98745  .98778  .98809  .98840  .98870  .98899
 2.3     .98928  .98956  .98983  .9²0097 .9²0358 .9²0613 .9²0863 .9²1106 .9²1344 .9²1576
 2.4     .9²1802 .9²2024 .9²2240 .9²2451 .9²2656 .9²2857 .9²3053 .9²3244 .9²3431 .9²3613

 2.5     .9²3790 .9²3963 .9²4132 .9²4297 .9²4457 .9²4614 .9²4766 .9²4915 .9²5060 .9²5201
 2.6     .9²5339 .9²5473 .9²5604 .9²5731 .9²5855 .9²5975 .9²6093 .9²6207 .9²6319 .9²6427
 2.7     .9²6533 .9²6636 .9²6736 .9²6833 .9²6928 .9²7020 .9²7110 .9²7197 .9²7282 .9²7365
 2.8     .9²7445 .9²7523 .9²7599 .9²7673 .9²7744 .9²7814 .9²7882 .9²7948 .9²8012 .9²8074
 2.9     .9²8134 .9²8193 .9²8250 .9²8305 .9²8359 .9²8411 .9²8462 .9²8511 .9²8559 .9²8605

 3.0     .9²8650 .9²8694 .9²8736 .9²8777 .9²8817 .9²8856 .9²8893 .9²8930 .9²8965 .9²8999
 3.1     .9³0324 .9³0646 .9³0957 .9³1260 .9³1553 .9³1836 .9³2112 .9³2378 .9³2636 .9³2886
 3.2     .9³3129 .9³3363 .9³3590 .9³3810 .9³4024 .9³4230 .9³4429 .9³4623 .9³4810 .9³4991
 3.3     .9³5166 .9³5335 .9³5499 .9³5658 .9³5811 .9³5959 .9³6103 .9³6242 .9³6376 .9³6505
 3.4     .9³6631 .9³6752 .9³6869 .9³6982 .9³7091 .9³7197 .9³7299 .9³7398 .9³7493 .9³7585

 3.5     .9³7674 .9³7759 .9³7842 .9³7922 .9³7999 .9³8074 .9³8146 .9³8215 .9³8282 .9³8347
 3.6     .9³8409 .9³8469 .9³8527 .9³8583 .9³8637 .9³8689 .9³8739 .9³8787 .9³8834 .9³8879
 3.7     .9³8922 .9³8964 .9⁴0039 .9⁴0426 .9⁴0799 .9⁴1158 .9⁴1504 .9⁴1838 .9⁴2159 .9⁴2468
 3.8     .9⁴2765 .9⁴3052 .9⁴3327 .9⁴3593 .9⁴3848 .9⁴4094 .9⁴4331 .9⁴4558 .9⁴4777 .9⁴4988
 3.9     .9⁴5190 .9⁴5385 .9⁴5573 .9⁴5753 .9⁴5926 .9⁴6092 .9⁴6253 .9⁴6406 .9⁴6554 .9⁴6696

 4.0     .9⁴6833 .9⁴6964 .9⁴7090 .9⁴7211 .9⁴7327 .9⁴7439 .9⁴7546 .9⁴7649 .9⁴7748 .9⁴7843
 4.1     .9⁴7934 .9⁴8022 .9⁴8106 .9⁴8186 .9⁴8263 .9⁴8338 .9⁴8409 .9⁴8477 .9⁴8542 .9⁴8605
 4.2     .9⁴8665 .9⁴8723 .9⁴8778 .9⁴8832 .9⁴8882 .9⁴8931 .9⁴8978 .9⁵0226 .9⁵0655 .9⁵1066
 4.3     .9⁵1460 .9⁵1837 .9⁵2199 .9⁵2545 .9⁵2876 .9⁵3193 .9⁵3497 .9⁵3788 .9⁵4066 .9⁵4332
 4.4     .9⁵4587 .9⁵4831 .9⁵5065 .9⁵5288 .9⁵5502 .9⁵5706 .9⁵5902 .9⁵6089 .9⁵6268 .9⁵6439

 4.5     .9⁵6602 .9⁵6759 .9⁵6908 .9⁵7051 .9⁵7187 .9⁵7318 .9⁵7442 .9⁵7561 .9⁵7675 .9⁵7784
 4.6     .9⁵7888 .9⁵7987 .9⁵8081 .9⁵8172 .9⁵8258 .9⁵8340 .9⁵8419 .9⁵8494 .9⁵8566 .9⁵8634
 4.7     .9⁵8699 .9⁵8761 .9⁵8821 .9⁵8877 .9⁵8931 .9⁵8983 .9⁶0320 .9⁶0789 .9⁶1235 .9⁶1661
 4.8     .9⁶2067 .9⁶2453 .9⁶2822 .9⁶3173 .9⁶3508 .9⁶3827 .9⁶4131 .9⁶4420 .9⁶4696 .9⁶4958
 4.9     .9⁶5208 .9⁶5446 .9⁶5673 .9⁶5889 .9⁶6094 .9⁶6289 .9⁶6475 .9⁶6652 .9⁶6821 .9⁶6981

  ∞     1.0     1.0     1.0     1.0     1.0     1.0     1.0     1.0     1.0     1.0

Figure 5-15.—Cumulative areas for values of z from −∞ to 0.

Figure 5-16.—Cumulative areas for values of z from 0 to ∞.

        P(−2.1σ < x < 3.2σ) = F(3.2) − F(−2.1) = F(z = 3.2) − F(z = −2.1)
                            = 0.9993129 − 0.01786 = 0.9814529, or 98 percent

Nonsymmetrical Two-Limit Problems

The cumulative function is useful for solving nonsymmetrical two-limit problems, which are in practice the most frequently
encountered.

Example 10: Suppose that a time-delay relay is required to delay the transmission of a signal at least 90 sec but no more
than 98 sec. If the mean "time out" of the specific type of relay is 95 sec and the standard deviation is 2.2 sec, what is
the probability that the signal will be delayed within the specified times?

Solution 10:

Step 1—Find F(98 sec). Since the mean is given as 95 sec and the standard deviation as 2.2 sec,

    z = (Limit − Mean)/σ = (98 − 95)/2.2 = 3/2.2 = 1.36

From table 5-7,

    F(98 sec) = F(z) = F(1.36) = 0.91309

Step 2—Find F(90 sec). Since the mean is 95 sec and the standard deviation is 2.2 sec,

    z = (90 − 95)/2.2 = −5/2.2 = −2.27

From table 5-6,

    F(90 sec) = F(z) = F(−2.27) = 0.01160

Step 3—Find P(90 ≤ x ≤ 98). From steps 1 and 2,

    P(90 ≤ x ≤ 98) = F(98) − F(90) = 0.91309 − 0.01160 = 0.90149, or 90 percent

There exists, therefore, a 90-percent probability that the signal will be delayed no less than 90 sec and no more than 98 sec, as
shown in figure 5-17.

Figure 5-17.—Signal delay time.
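
The same error-function route handles nonsymmetrical limits, so the two table lookups of solution 10 collapse to one line. A short sketch, illustrative only, using the relay figures above:

```python
import math

def Phi(z):  # cumulative normal F(z), as in tables 5-6 and 5-7
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mean, sigma = 95.0, 2.2  # relay "time out" mean and standard deviation, sec
p = Phi((98 - mean) / sigma) - Phi((90 - mean) / sigma)
print(p)                 # ~0.902; solution 10's rounded-z lookup gives 0.90149
```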

Application of Normal Distribution to Test Analyses and Reliability Predictions

This section gives two examples of how the normal distribution techniques may be applied to the analysis of test data of
certain devices and how the results of the analysis may be used to estimate or predict the outcome of actual tests (ref. 5-5).
Many similar examples are given in chapter 6.

Example 11: For this two-limit problem, assume that a door hinge has a pin pull-force requirement of 12 ± 4.64 lb. Assume
further that we have received 116 door hinges and have actually measured the pin pull-force required for 16 of them as part
of an acceptance test. The results of the test are shown in table 5-8 and in the histogram of figure 5-18. We now want to apply
normal distribution theory and then estimate what percentage of the remaining 100 door hinges will meet the pin pull-force
requirement.

Solution 11:

Step 1—Solve for the mean of the test data x̄. We have already seen that
TABLE 5-8.—RESULTS OF DOOR HINGE ACCEPTANCE TEST

    Pull-force      Number of
    required, lb    occurrences
    8                 1
    10                3
    12                7
    14                4
    16                1
    Total            16

Figure 5-18.—Door hinge test results. (The area under the density function between the acceptance limits (±2.32σ) is
98 percent; 1 percent will be defective below the lower limit and 1 percent above the upper limit.)

    x̄ = (1/n) Σ xi    (i = 1 to n)

where

    xi    value of ith measurement
    n     total number of measurements

Let x = pound forces so that

    x1 = 8     x9 = 12
    x2 = 10    x10 = 12
    x3 = 10    x11 = 12
    x4 = 10    x12 = 14
    x5 = 12    x13 = 14
    x6 = 12    x14 = 14
    x7 = 12    x15 = 14
    x8 = 12    x16 = 16

and let n = 16 (number of occurrences). The mean x̄ is therefore

    x̄ = (Σ xi)/n = [8 + 3(10) + 7(12) + 4(14) + 16]/16 = 12 lb (rounded to two places)

Step 2—Solve for the standard deviation σ. We have also seen that

    σ = [Σ (xi − x̄)²/(n − 1)]^(1/2)

where

    x̄    observed mean
    xi   value of ith measurement
    n    total number of measurements

Solve for Σ (xi − x̄)²:

    Σ (xi − 12)² = (8 − 12)² + 3(10 − 12)² + 7(12 − 12)² + 4(14 − 12)² + (16 − 12)²
                 = (−4)² + 3(−2)² + 7(0)² + 4(2)² + (4)² = 16 + 12 + 0 + 16 + 16 = 60



Then solve for

    Σ (xi − 12)²/(n − 1) = 60/(16 − 1) = 60/15 = 4

Finally solve for

    σ = [Σ (xi − 12)²/(n − 1)]^(1/2) = √4 = 2 lb

Step 3—With a mean of x̄ = 12 lb and a standard deviation of σ = 2 lb, figure 5-18 shows that

    (1) The lower pull-force limit of 7.36 lb is z = (7.36 − 12)/2 = −2.32 standard deviations from the mean.
    (2) The upper limit of 16.64 lb is z = (16.64 − 12)/2 = 2.32 standard deviations from the mean.

Consequently, the percentage of door hinges that should fall within the 12 ± 4.64-lb tolerance is given by

    P(−2.32σ < x < 2.32σ) = F(2.32) − F(−2.32) = 0.98983 − 0.01017 (from tables 5-6 and 5-7)
                          = 0.97966, or 98 percent

This says that 98 percent of the door hinges should fall within the 12 ± 4.64-lb tolerance and that 2 percent should be outside
the required tolerance. However, none of the 16 samples was outside the tolerance. So where are the 2 percent that the
analysis says are defective? The answer is that the 2 percent of defective door hinges are in the 100 not tested.

We can make this statement by assuming that if we had tested all 100 door hinges, we would have expected to observe the
same mean (x̄ = 12 lb) and standard deviation (σ = 2 lb) that we did with the 16 samples. Note that this assumption is subject
to confidence limits discussed in chapter 6. If we accept this assumption, we would expect to find 2 of the 100 door hinges
defective: one would have a pull force less than 7.36 lb (the lower limit) and the other, a pull force greater than 16.64 lb (the
upper limit). This is also shown in figure 5-18. However, considering the 16 door hinges to be actually representative of
all such door hinges, we could predict that only 98 percent of such door hinges produced would meet the acceptance criteria of
a 12 ± 4.64-lb pin pull force.
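
The arithmetic of solution 11, and of solution 12 below (which differs only in using a single limit), can be scripted directly. A sketch offered as an illustration rather than as part of the manual:

```python
import math

forces = [8] + [10] * 3 + [12] * 7 + [14] * 4 + [16]  # table 5-8 pull forces, lb

n = len(forces)
mean = sum(forces) / n                                             # 12.0 lb
sigma = math.sqrt(sum((x - mean) ** 2 for x in forces) / (n - 1))  # 2.0 lb

def Phi(z):  # cumulative normal F(z)
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

lo, hi = 12 - 4.64, 12 + 4.64                                      # limits, lb
p_within = Phi((hi - mean) / sigma) - Phi((lo - mean) / sigma)
print(mean, sigma, p_within)  # 12.0, 2.0, ~0.9797 -> about 98 percent
```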

Example 12: In this one-limit problem, 10 power supplies are selected out of a lot of 110 and tested at increasing temperatures
until all exceed a maximum permissible output of 31 V. The failure temperatures in degrees centigrade of the 10 supplies
are observed to be

    x1 = 57    x6 = 60
    x2 = 65    x7 = 75
    x3 = 53    x8 = 82
    x4 = 62    x9 = 71
    x5 = 66    x10 = 69

Find the probability that the remaining 100 supplies will have an output greater than 31 V at 50 °C and below.

Solution 12:

Step 1—Solve for the mean x̄:

    x̄ = (Σ xi)/10 = (57 + 65 + 53 + 62 + 66 + 60 + 75 + 82 + 71 + 69)/10 = 660/10 = 66 °C

Step 2—Solve for the standard deviation σ. First,

    Σ (xi − 66)² = (57 − 66)² + (65 − 66)² + (53 − 66)² + (62 − 66)² + (66 − 66)²
                 + (60 − 66)² + (75 − 66)² + (82 − 66)² + (71 − 66)² + (69 − 66)²
                 = 81 + 1 + 169 + 16 + 0 + 36 + 81 + 256 + 25 + 9 = 674

Then

    σ = [674/(10 − 1)]^(1/2) = (74.9)^(1/2) = 8.7 °C (rounded to two places)

Step 3—Solve for z = (Limit − Mean)/σ. With an observed mean x̄ = 66 and a standard deviation σ = 8.7, the 50 °C
limit is z = (50 − 66)/8.7 = −16/8.7 = −1.84 standard deviations from the mean.

Step 4—Look at table 5-6 and find the cumulative area from −∞ to z = −1.84. This is given as 0.03288. Therefore, there
is a 3.288-percent probability that the remaining 100 supplies will have an output greater than 31 V at 50 °C and below.
This is shown in figure 5-19.

Figure 5-19.—Failure distribution of power supplies. (The area below the limit temperature, P = 0.03288, is the probability
that the output will be greater than 31 V at the limit temperature and below; the area above it, P = 0.96712, is the probability
that it will not.)

Effects of Tolerance on a Product

Because tolerances must be anticipated in all manufacturing processes, some important questions to ask about the effects of
tolerance on a product are

    (1) How is the reliability affected?
    (2) How can tolerances be analyzed and what methods are available?
    (3) How are tolerance failures affected?

Electrical circuits are often affected by part tolerances (circuit gains can shift up or down, and transfer function poles
or zeros can shift into the right-hand s-plane, causing oscillations). Mechanical components may not fit together or
may be so loose that excessive vibration causes failure (refs. 5-6 to 5-8).

Notes on Tolerance Accumulation: A How-To-Do-It Guide

General.—The notation used in calculating tolerance is

    T             tolerance
    σ             standard deviation
    V             dependent variable subject to tolerance accumulation
    x             independent, measurable parameter
    1,2,3,...,n   subscript notation for parameters
    i             generalized subscript (i.e., i = 1,2,3,...,n for xi)

Tolerance is usually ±3σ. When in doubt, find out. Note that when T is expressed in percent, always convert to engineering
units before proceeding. The mean or average is V̄ = f(x̄1, x̄2, x̄3, ..., x̄n). The coefficient of variation is
Cv = (σV/V̄) × 100 percent.

Worst-case method.—The worst-case method is as follows:

    ±V = f[(x̄1 ± T1), (x̄2 ± T2), ..., (x̄n ± Tn)]

where the plus or minus signs are selected first to give maximum V and then to give minimum V. If these ±V worst-case limits
are acceptable, go no farther. If not, try the root-sum-square method.

Root-sum-square method.—The root-sum-square method is valid only if the f(x's) are algebraically additive (i.e., when V
is a linear function of the x's):

    ±V = V̄ ± 3σV

where

    σV² = σ1² + σ2² + σ3² + ... + σn²

and

    σi = Ti/3    if Ti = ±3σ
Stated another way, T = 3σ. If these ±V root-sum-square limits are acceptable, go no farther. If they are not acceptable or
the f(x's) involve products or quotients, try the perturbation or partial derivative methods.

Perturbation method.—The perturbation method is as follows:

    ±V = V̄ ± 3σV

where

    σV = [(Vσ1 − V̄)² + (Vσ2 − V̄)² + ... + (Vσn − V̄)²]^(1/2)

and

    Vσi = f[x̄1, x̄2, ..., (x̄i + σi), ..., x̄n]

The ±V limits are valid if Cv = (σV/V̄) × 100 ≤ 10 percent.

Partial derivative method.—The partial derivative method is as follows:

    ±V = V̄ ± 3σV

where

    σV = [(∂V/∂x1)²σ1² + (∂V/∂x2)²σ2² + ... + (∂V/∂xn)²σn²]^(1/2)

The ±V limits are valid if Cv = (σV/V̄) × 100 ≤ 10 percent.

Thus, four methods are available for estimating the effects of tolerance on a product. The worst-case method can be used on
any problem. In those cases where the ±V worst-case limits are not acceptable, other methods can be tried. The root-sum-square
method is usually valid if the functions are algebraically additive. The perturbation or partial derivative methods are
valid only if the coefficient of variation is less than or equal to 10 percent.

Estimating Effects of Tolerance

The following examples illustrate how these tolerance equations can be used. Consider a stacked tolerance problem where
the dependent variable is a linear function (three variables added to give V):

    V = f(x̄1, x̄2, x̄3)
    V̄ = x̄1 + x̄2 + x̄3
    T = 3σ

where

    x̄1 = 1 ± 0.1 mil
    x̄2 = 2 ± 0.1 mil
    x̄3 = 3 ± 0.1 mil

Now, find V̄ and the expected range of V:

    V̄ = 1 + 2 + 3 = 6 mils

Using the worst-case method, with positive tolerance

    V+ = (1 + 0.1) + (2 + 0.1) + (3 + 0.1) = 6.3 mils

and with negative tolerance

    V− = (1 − 0.1) + (2 − 0.1) + (3 − 0.1) = 5.7 mils

or

    V = 6 ± 0.3 mil

In the worst-case method, the tolerance on V̄ (i.e., 0.3 mil) is worse than the 3σV tolerance. Tolerance can and often does
cause fit problems and circuit problems. Therefore, in some cases we need to know what tolerance is acceptable.

Using the root-sum-square method,

    V̄ = 6 mils

and

    σ1 = 0.1/3 = 0.033 = σ2 = σ3

    σV = (σ1² + σ2² + σ3²)^(1/2) = [3(0.033)²]^(1/2) = 0.0572

    3σV = 0.172

so that

    V = 6 ± 0.172 mil

In the root-sum-square method, the T value of 0.172 is the 3σ tolerance on V.

As a second example, consider a volume problem that has three variables in multiplication. Find V̄ and the expected
range of V:

    V̄ = LWH = 10 ft × 5 ft × 2 ft = 100 ft³

First, convert percent tolerances to engineering units:

    L = 10 ft ± 10 percent = 10 ft ± 10 ft × 0.1 = 10 ft ± 1 ft
    W = 5 ft ± 10 percent = 5 ft ± 5 ft × 0.1 = 5 ft ± 0.5 ft
    H = 2 ft ± 5 percent = 2 ft ± 2 ft × 0.05 = 2 ft ± 0.1 ft

Using the worst-case method,

    V+ = (10 + 1) × (5 + 0.5) × (2 + 0.1) = 11 × 5.5 × 2.1 = 127 ft³
    V− = 9 × 4.5 × 1.9 = 77 ft³

The root-sum-square method cannot be used because these variables are not algebraically additive. Using the perturbation
method,

    V = V̄ ± 3σV

where

    σL = TL/3 = 1/3 = 0.33 ft
    σW = TW/3 = 0.5/3 = 0.17 ft
    σH = TH/3 = 0.1/3 = 0.03 ft

    σV = {[(10 + 0.33)(5)(2) − 100]² + [(5 + 0.17)(10)(2) − 100]² + [(2 + 0.03)(10)(5) − 100]²}^(1/2)
       = [(103.3 − 100)² + (103.4 − 100)² + (101.5 − 100)²]^(1/2)
       = (10.89 + 11.56 + 2.25)^(1/2) = √24.7 = 5

    V = 100 ± 15 ft³

Checking the validity gives

    Cv = (σV/V̄) × 100 = (5/100) × 100 = 5 percent

which is less than 10 percent. This solution is a better estimate of the effects of tolerance on volume. Note also that various
values can now be estimated for different types of problems regarding this volume because it has been represented as a
normal distribution function.

Using the partial derivative method, again T = ±3σ and

    V = V̄ ± 3σV

where

    ∂V/∂L = WH,    ∂V/∂W = LH,    ∂V/∂H = LW

    σV = [(WH)²σL² + (LH)²σW² + (LW)²σH²]^(1/2)
       = [(5 × 2)²(0.33)² + (10 × 2)²(0.17)² + (10 × 5)²(0.03)²]^(1/2)
       = (10.9 + 11.6 + 2.25)^(1/2) = √24.7 = 5

    V = V̄ ± 3σV = 100 ± 15 ft³

This method is more work and gives the same results as the perturbation method. Because Cv = 5 percent, which is less
than 10 percent, the method would be suitable to use.
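
Three of the four tolerance methods can be compared side by side in code (the root-sum-square method does not apply here, since V = LWH is not additive). The sketch below is an illustrative addition; it reuses the numbers of the volume example, and the partial derivatives of V = LWH are written out by hand.

```python
import math

# Volume example: V = L*W*H, L = 10 +/- 1 ft, W = 5 +/- 0.5 ft, H = 2 +/- 0.1 ft.
means = [10.0, 5.0, 2.0]
tols = [1.0, 0.5, 0.1]            # each tolerance T taken as 3*sigma
sigmas = [t / 3 for t in tols]

f = lambda L, W, H: L * W * H
V = f(*means)                     # 100 ft^3

# Worst-case method: every variable at its tolerance limit.
v_max = f(*[m + t for m, t in zip(means, tols)])  # ~127 ft^3
v_min = f(*[m - t for m, t in zip(means, tols)])  # ~77 ft^3

# Perturbation method: shift one variable by one sigma at a time.
sq = 0.0
for i in range(3):
    shifted = list(means)
    shifted[i] += sigmas[i]
    sq += (f(*shifted) - V) ** 2
sigma_v = math.sqrt(sq)           # ~5, so V = 100 +/- 15 ft^3 at 3 sigma

# Partial derivative method: sigma_V^2 = sum((dV/dx_i)^2 * sigma_i^2).
L, W, H = means
sigma_v2 = math.sqrt((W * H * sigmas[0]) ** 2 +
                     (L * H * sigmas[1]) ** 2 +
                     (L * W * sigmas[2]) ** 2)    # ~5 again

Cv = sigma_v / V * 100            # ~5 percent, passing the 10-percent test
print(v_min, v_max, 3 * sigma_v, 3 * sigma_v2, Cv)
```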


Concluding Remarks

Now that you have completed chapter 5, you should have a clear understanding of the following concepts:

(1) A probability density function p(x) for a random variable describes the probability that the variable will take on a certain
range of values.

(2) The area under the density function is equal to unity, which means that the probability is 1 that the variable will be
within the interval described by the density function. For example, the normal distribution describes the interval from
−∞ to ∞.

(3) Associated with each probability density function is a cumulative probability distribution F(x) that represents the
cumulative sum of the areas under the density function.

(4) The normal distribution (also called the bell curve, the Gaussian distribution, and the normal curve of error) is a
probability density function. Using the normal distribution, you should be able to solve the following types of problems:

    (a) Symmetrical two-limit problems, which are concerned with the probability of a variable taking on values within equal
distances from both sides of the mean.
    (b) Nonsymmetrical two-limit problems, which are similar to (a) but within unequal distances from both sides of the
mean of the density function.
    (c) One-limit problems, which are concerned with the probability of a variable taking on values above or below some limit
represented by some distance from the mean of the density function.

(5) You should be able to take data measurements of a certain device and calculate the mean of the data given by

    x̄ = (1/n) Σ xi    (i = 1 to n)

and the standard deviation of the data given by

    σ = [Σ (xi − x̄)²/(n − 1)]^(1/2)    (i = 1 to n)

Using the data mean and standard deviation, you should then be able to estimate the probability of failures occurring when more
of the same devices are tested or operated.

(6) The worst-case method can be used on any problem:
    (a) Limits will be defined.
    (b) No estimates can be made from the population distribution.

(7) The root-sum-square method only applies to algebraic variables that are additive.

(8) The perturbation or partial derivative methods are only valid if the coefficient of variation is 10 percent or less.

References

5-1. Croxton, F.E.: Tables of Areas in Two Tails and in One Tail of the Normal Curve. Prentice-Hall Inc., 1949.
5-2. Hald, A.: Tables of the Cumulative Normal Distribution. John Wiley & Sons, Inc., 1949.
5-3. Failure Distribution Analyses Study. Vols. 1, 2, and 3, Computer Applications Inc., New York, 1964. (Avail. NTIS,
AD-631525, AD-631526, AD-631527.)
5-4. Hoel, P.G.: Elementary Statistics. John Wiley & Sons, 1960.
5-5. Berretonni, J.N.: Practical Applications of the Weibull Distribution. Industrial Quality Control, vol. 21, no. 2, Aug. 1964,
pp. 71-79.
5-6. Reliability Prediction of Electronic Equipment. MIL-HDBK-217E, Jan. 1990.
5-7. Electronic Reliability Design Handbook. MIL-HDBK-338, vols. 1 and 2, Oct. 1988.
5-8. Reliability Modeling and Prediction. MIL-STD-756B, Aug. 1982.
Reliability Training¹

1. A unit is required to operate at 100 °F. If tests show the mean strength of the data for the unit is 123 °F and the standard
deviation is 9 °F, what is the probability that the unit will operate successfully; that is, P(x > 100 °F)?

A. 0.5234 B. 0.2523 C. 0.9946 D. 0.9995

2. A pressure vessel (including a factor of safety) has an upper operating limit of 8000 psi. Burst tests show a mean
strength of 9850 psi and a standard deviation of 440 psi. What is the probability of pressure vessel failure; that is,
P(x < 8000 psi)?²

A. 0.0⁴267    B. 0.0⁴133    C. 0.0⁴317

3. A memory drum is required to reach sync speed and stabilize in 15.5 sec at 125 °F. Five drums are tested with these
stabilizing time results: 13.2, 12.3, 14.8, 10.3, and 12.9 sec.

a. What is the mean stabilizing time?

A. 13.1 B. 10.7 C. 12.7

b. What is the standard deviation?

A. 1.63 B. 1.45 C. 1.32

c. What is the estimated percentage of drums out of specification; that is, P(x > 15.5 sec)?

A. 6.7 B. 8.5 C. 4.3

4. A pyrotechnic gyro has an uncaging time requirement of 142 ± 20 msec. Six gyros were tested resulting in these
uncaging times: 123, 153, 140, 129, 132, and 146 msec.

a. What is the mean uncaging time?

A. 133.2 msec    B. 135.2 msec    C. 137.2 msec

b. What is the standard deviation?

A. 10.2 B. 11.2 C. 11.9

c. What is the estimated percentage of gyros within specification; that is, P(122 < x < 162 msec)?

A. 89.8 B. 96.8 C. 82.6

5. A hydraulic pressure line was designed to the following stresses:

(a) Maximum operating pressure (actual), 1500 psi


(b) Design pressure (10-percent safety factor), 1650 psi

Tests of the pressure line indicated a mean failure pressure of 1725 psi and a standard deviation of 45 psi.

a. What is the reliability of the line when the design pressure limits are considered?

A. 0.10 B. 0.90 C. 0.95

¹Answers are given at the end of this manual.


²The superscripted numbers in the answers are shorthand; for example, 0.0⁴267 means 2.67×10⁻⁵.

b. What is the reliability of the line when the maximum operating pressure is considered?

A. 0.99 B. 0.90 C. 0.80

6. A communications network requires a 1300-msec watchdog delay after initiation. A sample of 10 delays was tested

from a rack of 100 delays. The time delays of the circuits are as shown:

    Circuit number    Delay, msec
    1                 1250
    2                 1400
    3                 1700
    4                 1435
    5                 1100
    6                 1565
    7                 1485
    8                 1385
    9                 1350
    10                1400

a. What is the average (mean) delay time?

A. 1386 msec B. 1400 msec C. 1407 msec

b. What is the standard deviation?

A. 52.7 B. 87.1 C. 163.4

c. On the basis of this sample, what percentage of the 100 circuits will meet specifications (1300-msec or greater

delay)?

A. 75 B. 80 C. 90

7. A circuit contains four elements in series. Their equivalent resistance values are

    Element    Nominal resistance,    Tolerance,ᵃ
               R, ohm                 T, percent
    A          100                    ±10
    B          20                     ±1
    C          10                     ±5
    D          10                     ±5

    ᵃWhere ±T = ±3σ.

a. What is the nominal or mean total resistance RT?

A. 120 Ω    B. 140 Ω    C. 160 Ω

b. What are the worst-case R values (upper number, maximum; lower number, minimum)?

A. 131.6 Ω    B. 176.3 Ω    C. 151.2 Ω
   118.7 Ω       146.2 Ω       128.8 Ω

c. Using the root-sum-square method, what is the probability that RT > 135 Ω?

A. 0.905 B. 0.962 C. 0.933

d. Using the perturbation method, what is the probability that RT > 135 Ω?

A. 0.905 B. 0.962 C. 0.933

8. Given power (watts) P = I²R, where I = 0.5 A, TI = ±5 percent, R = 100 Ω, and TR = ±10 percent. (Note: ±T = ±3σ.)

a. What is the nominal or mean power output P̄?

A. 25 W B. 20 W C. 30 W

b. What are the worst-case P values (upper number, maximum; lower number, minimum)?

A. 26.6 W   B. 35.2 W   C. 30.3 W
   18.2 W      22.6 W      20.3 W

c. Using the perturbation method, what is the probability that 23.5 ≤ P ≤ 26.5 W?

A. 0.94 B. 0.80 C. 0.86

d. What is the Cv (in percent) for the perturbation method used in question 8c?

A. 12 B. 8 C. 4.6

e. Is the root-sum-square method valid for solving the probability problem 8c?

A. Yes B. No

f. Using the partial derivative method, what is the probability that 23.5 ≤ P ≤ 26.5 W?

A. 0.942 B. 0.803 C. 0.857
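Problems 7 and 8 exercise the tolerance-analysis methods of chapter 5. The sketch below (Python, ours; each ±T is taken as a 3σ spread per the table footnote) works problem 7 by the worst-case and root-sum-square routes; problem 8 follows the same pattern once the standard deviation of P = I²R is formed from the partial derivatives:

from math import erf, sqrt

# (nominal resistance in ohms, tolerance as a fraction) for elements A to D
elements = [(100, 0.10), (20, 0.01), (10, 0.05), (10, 0.05)]

nominal = sum(r for r, _ in elements)              # 140 ohms
worst_hi = sum(r * (1 + t) for r, t in elements)   # about 151.2 ohms
worst_lo = sum(r * (1 - t) for r, t in elements)   # about 128.8 ohms

# Root-sum-square: each element contributes sigma = R*T/3; add in quadrature.
sigma = sqrt(sum((r * t / 3.0) ** 2 for r, t in elements))  # about 3.34 ohms

z = (nominal - 135.0) / sigma
p_above_135 = 0.5 * (1.0 + erf(z / sqrt(2.0)))
print(nominal, worst_hi, worst_lo, round(p_above_135, 3))   # ... about 0.933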

Chapter 6

Testing for Reliability


In chapters 3 and 4, we discussed the methods used to predict the probability that random catastrophic part failures would occur in given products and systems. These analytical techniques are well established (ref. 6-1). Yet, we should keep in mind that they are practical only when adequate experimental data are available in the form of part failure rates. In other words, their validity is predicated on great amounts of empirical information.

Such is not the case when we undertake similar analyses to determine the influence of tolerance and wearout failures on the reliability of a product. An understanding of these failure modes depends on experimental data in the form of probability density functions such as those discussed in chapter 5. In general, such data are unavailable on items at the part or system level; this kind of information must be developed empirically through reliability test methods.

Chapter 6 reviews and expands the terms used in the reliability expression given in chapter 2 and then shows how the terms can be demonstrated or assessed through the application of attribute test, test-to-failure, and life test methods (ref. 6-2).

Demonstrating Reliability

Recall from chapter 2 that one way to define product reliability is as the probability that one or more failure modes will not be manifested (ref. 6-3). This can be written as

R = PcPtPw(KqKmKrKlKu)

where

Pc   probability that catastrophic part failures will not occur
Pt   probability that out-of-tolerance failures will not occur
Pw   probability that wearout failures will not occur
Kq   probability that quality test methods and acceptance criteria will not degrade inherent reliability
Km   probability that manufacturing processes, fabrication, and assembly techniques will not degrade inherent reliability
Kr   probability that reliability engineering activities will not degrade inherent reliability
Kl   probability that logistics activities will not degrade inherent reliability
Ku   probability that the user or customer will not degrade inherent reliability

The term PcPtPw denotes inherent reliability Ri; (KqKmKrKlKu) are factors that affect the probability of the three modes of failure occurring during hardware manufacture and use rather than occurring from unreliable hardware design.

First, we illustrate how the empirical value of these terms affects product reliability. Then, we discuss the particular test methods used to develop these values. Assume that a device was designed with a reliability requirement of 0.996. This means that only 4 out of 1000 such devices can fail. The device contains 1000 parts, it has a function to perform within a tolerance of X ± 2 percent, and it must operate for a mission cycle of 1000 hours at 50 °C.

Pc Illustrated

If we know the number and types of parts in the device plus the applied stresses and part failure rates used in the exponential distribution, e^(−t Σλ), we can estimate the probability that no catastrophic part failure will occur during the mission cycle. Assuming, for example, that our estimate is Pc = 0.999 (i.e., one device in 1000 will incur a catastrophic part failure during the mission cycle), the product reliability of the device becomes

R = PcPtPw(K-factors) = e^(−t Σλ) PtPw(K-factors) = 0.999PtPw(K-factors)

Pt Illustrated

Suppose that we now test one of the devices at 50 °C. If the functional output is greater than the specified tolerance of X ± 2 percent, the reliability of that particular device is zero. It is zero because Pt is zero (i.e., R = (0.999)(0)Pw(K-factors) = 0). We can say, however, that the device will continue to operate in an out-of-tolerance condition with a probability of no catastrophic failures equal to 0.999 just as we predicted. To understand this, recall that part failure rates reflect only the electrical, mechanical, and environmental stresses applied to the individual parts. For this reason, a prediction on the basis of such data will neglect to indicate that (1) the parts have been connected to obtain a specified function, (2) a tolerance analysis of the function has been performed, or (3) the parts are packaged correctly. In other words, Pc represents only how well the individual parts will operate, not how well the combined parts will perform.

If nine more of the devices are tested at 50 °C with all the output functions remaining within the X ± 2 percent tolerance, Pt becomes 9/10 = 0.9 and the reliability of the device R = (0.999)(0.9)Pw(K-factors). Because the reliability requirement of the device is 0.996, it should be clear that Pt must be greater than 0.996. Let us assume then that 1000 devices are tested at 147 °F with only one tolerance failure, which produces an observed Pt = 999/1000 = 0.999. The reliability of the device is now

R = (0.999)(0.999)Pw(K-factors) = 0.998Pw(K-factors)

Note that, because operating time is accumulated during original functional testing, it is possible for random catastrophic part failures to occur. Remember, however, that this type of failure is represented by Pc and not Pt.

Pw Illustrated

Now let us take another operating device and see whether wearout failures will occur within the 1000-hour mission cycle. If, as run time is accumulated, a faulty function output or catastrophic failure is caused by a wear mechanism, the reliability of the device again becomes zero. It is zero because Pw is zero as shown in the equation

R = (0.999)(0.999)(0)(K-factors) = 0

Note the emphasis on the words "wear mechanism." Because it is possible to experience random catastrophic part failures and even out-of-tolerance conditions during a test for wearout, it is absolutely necessary to perform physics-of-failure analyses. This is essential to ascertain if the failures are caused by true physical wear before including them in the Pw assessment.

So far, the first two terms, Pc and Pt, combine to yield a probability of (0.999)(0.999) = 0.998. As a result, the remaining terms, Pw(K-factors), must be no less than 0.998 if the 0.996 device requirement is to be satisfied. Therefore, we assume that we have demonstrated a Pw of 0.999, which reduces the device reliability to

R = PcPtPw(K-factors) = (0.999)(0.999)(0.999)(K-factors) = 0.997(K-factors)

K-Factors Illustrated

Since testing obviously must be conducted on real hardware, the K-factors as well as the P terms of reliability are present in every test sample. Establishing values for the K-factors requires that all failures observed during a test be subjected to physics-of-failure analyses to identify specific failure mechanisms. Actually, the action taken to prevent the recurrence of an observed failure mechanism determines the factor that caused the failure. A failure that can be prevented by additional screening tests as part of the quality acceptance criteria is charged to the Kq factor; one that requires additional control over some manufacturing process is charged to the Km factor, and so on. Failures that require changes in documentation, design, and tolerance would be charged to the Pc, Pt, or Pw terms as applicable.

The least important aspect of testing is the ability to charge an organization or function with responsibility for a failure. More important is the need to prevent observed failures from recurring. This requires that corrective action be made a recognized part of each reliability test program.

Getting back to the illustration, we assume that one failure out of 1000 devices was caused by one of the K-factors even though it could have been observed during a Pc, Pt, or Pw failure evaluation. This reduces the reliability of the device to

R = PcPtPw(K-factors) = (0.999)(0.999)(0.999)(0.999) = 0.996

which indicates that the device met its requirement.

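As a compact recap of the preceding illustration, the sketch below (Python, ours rather than the manual's) multiplies the demonstrated terms together:

# Each term was demonstrated to be 0.999 in the illustration.
terms = {"Pc": 0.999, "Pt": 0.999, "Pw": 0.999, "K-factors": 0.999}

R = 1.0
for name, value in terms.items():
    R *= value
    print(f"after {name}: R = {R:.4f}")
# R ends at about 0.996, the device requirement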
Test Objectives and Methods

The purpose of the preceding illustration was to provide a better understanding of (1) how the P terms and the K-factors relate to physical hardware and (2) the techniques for demonstrating the terms through testing. Table 6-1 shows the suggested test methods. We say "suggested" because any of the test methods can be used if certain conditions are met (ref. 6-4). These conditions are pointed out as each method is discussed. Table 6-1 indicates the most efficient methods by assigning priority numbers from 1 to 3 (with 1 being the most efficient and 3 the least).

TABLE 6-1.—TEST METHOD PRIORITIES FOR DEMONSTRATING RELIABILITY

Reliability   Suggested test method
term          Attribute tests   Tests to failure   Life tests
Pc            2                 3                  1
Pt            3                 1                  2
Pw            3                 2                  1
K-factors     3                 1                  2

Test Objectives

At least 1000 test samples (attribute tests) are required to demonstrate a reliability requirement of 0.999. Because of cost and time, this approach is impractical. Furthermore, the total production of a product often may not even approach 1000 items. Because we usually cannot test the total production of a product (called product population), we must demonstrate reliability on a few samples. Thus, the main objective of a reliability test is to test an available device so that the data will allow a statistical conclusion to be reached about the reliability of similar devices that will not or cannot be tested. That is, the main objective of a reliability test is not only to evaluate the specific items tested but also to provide a sound basis for predicting the reliability of similar items that will not be tested and that often have not yet been manufactured.

As stated, to know how reliable a product is, one must know how many ways it can fail and the types and magnitudes of the stresses that produce such failures. This premise leads to a secondary objective of a reliability test: to produce failures in the product so that the types and magnitudes of the stresses causing such failures can be identified. Reliability tests that result in no failures provide some measure of reliability but little information about the population failure mechanisms of like devices. (The exceptions to this are not dealt with at this time.)

In subsequent sections, we discuss statistical confidence, attribute test, test-to-failure, and life test methods; explain how well these methods meet the two test objectives; show how the test results can be statistically analyzed; and introduce the subject and use of confidence limits.

Attribute Test Methods

Qualification, preflight certification, and design verification tests are categorized as attribute tests (refs. 6-5 and 6-6). They are usually go/no-go and demonstrate that a device is good or bad without showing how good or how bad. In a typical test, two samples are subjected to a selected level of environmental stress, usually the maximum anticipated operational limit. If both samples pass, the device is considered qualified, preflight certified, or verified for use in the particular environment involved (refs. 6-7 and 6-8). Occasionally, such tests are called tests to success because the true objective is to have the device pass the test.

An attribute test is usually not a satisfactory method of testing for reliability because it can only identify gross design and manufacturing problems. It can be used for reliability testing only when a sufficient number of samples are tested to establish an acceptable level of statistical confidence.

Statistical Confidence

The statistical confidence level is the probability that the corresponding confidence interval covers the true (but unknown) value of a population parameter. Such a confidence interval is often used as a measure of uncertainty about estimates of population parameters. In other words, rather than express statistical estimates as point estimates, it is much more meaningful to express them as a range (or interval), with an associated probability (or confidence) that the true value lies within such an interval.

It should be noted, however, that statistical confidence intervals can be difficult to evaluate (see also refs. 6-4 and 6-9). For simple distributions in reliability, intervals and levels are calculated in a straightforward manner. For more complicated or multiparameter distributions, especially where parameter estimates are not statistically independent, such intervals and levels can be very difficult to calculate.

To illustrate further the limitations of attribute test methods, we apply statistics to the test results. Figure A-4(a) in appendix A shows on the ordinate the number of events (successes) necessary to demonstrate a reliability value (abscissa) for various confidence levels (family of curves) when no failures are observed. Figures A-4(b) to (f) provide the same information when one to five failures are observed.

From the results of two devices tested with no failures, figure A-4(a) shows that we can state with 50-percent confidence that the population reliability of such devices is no less than 71 percent. Fifty-percent confidence means that there is a 50-percent chance that we are wrong and that the reliability of similar untested devices will actually be less than 71 percent. Similarly, we can also state from the same figure that we are 60 percent confident that the reliability of all such devices is 63 percent. But either way, the probability of success is less than encouraging.

To gain a better understanding of figure A-4 and the theory behind it, let us stop for a moment and see how confidence levels are calculated. Recall from chapter 2 that the combination of events that might result from a test of two devices was given by

R² + 2RQ + Q² = 1

where

R²    probability that both devices will pass
2RQ   probability that one device will pass and one will fail
Q²    probability that both devices will fail

In the power supply example, we observed the first event R² because both supplies passed the test. If we assume a 50-percent probability that both will pass, we can set R² = 0.50 and solve for the reliability of the device as follows:

R² = 0.50
R = √0.50 = 0.71

We then can say with 50-percent confidence that the population reliability of the device is no less than 0.71. By assuming a 50-percent chance, we are willing to accept a 50-percent risk of being wrong, hence the term "50 percent confident." If we want only to take a 40-percent risk of being wrong, we can again solve for R from

R² = 0.40
R = √0.40 = 0.63

In this case, we can be 60 percent confident that the population reliability of the devices is no less than 0.63.

Selection of the confidence level is a customer's or engineer's choice and depends on the amount of risk he is willing to take on being wrong about the reliability of the device. The customer usually specifies the risk he is willing to take in conjunction with the system reliability requirement. As higher confidence levels (lower risk) are chosen, the lower the reliability estimate will be. For example, if we want to make a 90-percent confidence (10-percent risk) statement based on the results of the test to success of two devices, we simply solve

R² = (1 − Confidence level) = 1 − 0.90 = 0.10

so that

R = √0.10 = 0.316

Table 6-2 illustrates how the reliability lower bound changes with various confidence levels. The curves in figure A-4 are developed in a similar manner.

TABLE 6-2.—RELIABILITY AND CONFIDENCE LEVEL FOR TWO-SAMPLE ATTRIBUTE TEST WITH NO FAILURES

Confidence level,   Reliability,   Risk,
percent             R              percent
10                  0.95           90
50                  .71            50
60                  .63            40
70                  .55            30
80                  .45            20
90                  .32            10
99                  .10            1

In figure A-4(b), which is used when one failure is observed, for 10 samples tested with one observed failure, the statistically predicted or demonstrated reliability at 90-percent confidence is 0.66. This answer is found by solving

R¹⁰ + 10R⁹Q = 1 − 0.90
R = 0.663

which agrees with the figure to two places.

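The same calculation generalizes to any number of samples and observed failures. A minimal Python sketch (ours; a simple bisection solver) reproduces both the two-device result and the 10-sample, one-failure result just quoted:

from math import comb

def lower_bound_reliability(n, f, confidence):
    # Solve sum_{k<=f} C(n,k) * R^(n-k) * (1-R)^k = 1 - confidence for R.
    target = 1.0 - confidence
    lo, hi = 0.0, 1.0
    for _ in range(60):  # bisection; the left side increases with R
        mid = (lo + hi) / 2.0
        s = sum(comb(n, k) * mid ** (n - k) * (1.0 - mid) ** k
                for k in range(f + 1))
        if s > target:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2.0

print(lower_bound_reliability(2, 0, 0.50))   # about 0.707 (the square root of 0.50)
print(lower_bound_reliability(10, 1, 0.90))  # about 0.663, as in the text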
Application.—The discussion thus far has underscored the shortcomings of attribute tests when sample sizes are small. Tests involving only two or three samples may reveal gross errors in hardware design or manufacturing processes, but when relied upon for anything more, the conclusions become risky (refs. 6-7 and 6-8).

Attribute tests can be useful in testing for reliability when a sufficient sample size is used. For example, 10 samples tested without failure statistically demonstrate a population reliability of 0.79 at 90-percent confidence; 100 tests without failure demonstrate a population reliability of 0.976 at 90-percent confidence. To understand better the application of attribute tests and the use of figure A-4, consider the following examples:

Example 1: During the flight testing of 50 missiles, five failures are observed. What confidence do we have that the missile is 80 percent reliable?

Solution 1: From figure A-4(f) the answer is read directly to be a 95-percent confidence level. The a posteriori reliability of these 50 missiles, or that derived from the observed facts, is still 45/50 = 90 percent. Thus, future flights will be at least 80 percent reliable with a 5-percent risk of being wrong.

Example 2: An explosive switch has a reliability requirement of 0.98. How many switches must be fired without a failure to demonstrate this reliability at 80-percent confidence?

Solution 2: From figure A-4(a), the answer is read directly as 80 switches.

Example 3: A test report states that the reliability of a device was estimated to be 0.992 at 95-percent confidence based on a test of 1000 samples. How many failures were observed?

Solution 3: In figure A-4(d), the 95-percent confidence curve crosses the 1000-event line at R = 0.992. Therefore, three failures were observed.

In these examples, the population reliability estimates may represent any of the P terms or the K-factors in the expression for product reliability, depending on the definition of failure used to judge the test results. For a device that is judged only on its capability to remain within certain tolerances, the reliability would be the Pt term. Had catastrophic failures been included, we would have demonstrated the Pc and Pt terms. In general, attribute tests include all failure modes as part of the failure definition and, consequently, the associated reliability is product reliability with both the P terms and the K-factors included.

Attribute test/safety margin slide rule.—A special-purpose slide rule that was developed to facilitate determining attribute test/safety margin confidence levels will be available in class for these exercises. (See the back of this manual for the slide rule and the instructions to assemble it.)

Examples 4 (confidence level for attribute test): Attribute tests are tests to success. The objective is for a selected number of samples, called tests on the slide rule, to operate successfully at some predetermined stress level. Some tests, however, may fail. This slide rule handles combinations of up to 1000 tests and up to 500 failures. The answer is a direct population reliability reading of the untested population at a selected confidence level. Six confidence levels from 50 to 90 percent are available. (The statistical basis for this rule is the χ² approximation of the binomial distribution.)

Example 4a: Fifteen items are tested with one failure observed. What is the population reliability at 70-percent confidence level?

Solution 4a: Set one failure on the movable slide above the 70-percent confidence level index. Read from TOTAL NUMBER OF TESTS the tests for a population reliability of 0.85 at 70-percent confidence level. By setting one failure at successive levels of confidence, this example gives these population reliabilities: 0.710 at 95-percent confidence level, 0.758 at 90 percent, 0.815 at 80 percent, 0.873 at 60 percent, and 0.895 at 50 percent.

Example 4b: A population reliability of 0.9 at 95-percent confidence level is desired. How many tests are required to demonstrate this condition?

Solution 4b: Set zero failures at the 95-percent confidence level index. From TOTAL NUMBER OF TESTS read 29 tests directly above 0.90 population reliability. Therefore, 29 tests without failure will demonstrate this combination. If, however, one failure occurs, set one failure at 95 percent. Then 46 others must pass the test successfully. Progressively more observed failures such as 10 (set of 10 at 95 percent) require 170 successes (160 + 10).

Examples 5 (confidence level for safety margins): Safety margin SM indicates the number of standard deviations σM between some preselected reliability boundary Rb and the mean of the measured sample failure distribution. Thus, SM = (X̄M − Rb)/σM, where X̄M and σM are the measured mean and standard deviation of the samples under test. The larger the sample size, the more nearly the measured SM approaches the safety margin of the untested population SD. This rule equates SM for six levels of confidence for sample sizes N between 5 and 80. (Statistical basis for this rule: the noncentral t distribution.)

Example 5a: Ten items are tested to failure with an observed or measured SM of 5.8. What is the lower expected safety margin of the untested population at 90-percent confidence?

Solution 5a: Set 5.8 on the movable slide at the top window for the SM value. Under N = 10 on the 90-percent window, read SD ≥ 3.9. Without moving the slide, read, for successive levels of confidence, 4.45 at 80 percent, 4.85 at 70 percent, 5.21 at 60 percent, and 5.57 at 50 percent.

Example 5b: Six samples are available for test. What SM is required to demonstrate a population safety margin of 4.0 or greater at 90-percent confidence level?

Solution 5b: Using the 90-percent window, set SD = 4.0 opposite N = 6. At SM read 7.1. Therefore, test results of 7.1 or greater will demonstrate SD ≥ 4.0 at a 90-percent confidence level. If 25 samples are available for test, set SD = 4.0 opposite N = 25 on the 90-percent window. An SM of only 5.0 or greater would demonstrate a 4.0 or greater safety margin at 90-percent confidence.

Sneak circuits.—During attribute testing, the flight hardware may sometimes not work properly because of a sneak circuit. A sneak circuit is defined for both hardware and software as follows (ref. 6-10):

(1) Hardware: a latent condition inherent to the system design and independent of component failure that inhibits a desired function or initiates an undesired function (path, timing, indication, label)
(2) Software: an unplanned event with no apparent cause-and-effect relationship that is not dependent on hardware failure and is not detected during a simulated system test (path, timing, indication, label)

Each sneak circuit problem should be analyzed, a cause determined, and corrective action implemented and verified. References 6-10 to 6-12 give a number of examples of how this can be done:

(1) Reluctant Redstone—making complex circuitry simple
(2) F-4 example
(3) Trim motor example
(4) Software example

A few minutes spent with one of these references should solve any sneak circuit problem.

Attribute test summary.—In summary, four concepts should be kept in mind:

(1) An attribute test, when conducted with only a few samples, is not a satisfactory method of testing for reliability, but it can identify gross design and manufacturing problems.
(2) An attribute test is an adequate method of testing for reliability only when sufficient samples are tested to establish an acceptable level of statistical confidence.

(3) Some situations dictate attribute tests or no tests at all (e.g., limited availability or the high cost of samples, limited time for testing, test levels that exceed the limits of test equipment, and the need to use the test samples after testing).
(4) Confidence, a statistical term that depends on supporting statistical data, reflects the amount of risk we are willing to take when stating the reliability of a product.

Test-To-Failure Methods

The purpose of the test-to-failure method is to develop a failure distribution for a product under one or more types of stress. Here, testing continues until the unit under test ceases to function within specified limits. Alternatively, test to failure may be accomplished by increasing electrical or mechanical load until a failure is induced. The results are used to calculate the probability of the failure of the device for each load. In this case, the failures are usually tolerance or physical wearout. The test-to-failure method is also valuable because we can determine the "spread" or standard deviation of the loads that cause failure (or the spread of the times to failure, etc.). This spread has a significant effect on the overall reliability.

In this discussion of test-to-failure methods, the term safety factor SF is included because it is often confused with safety margin SM. Safety factor is widely used in industry to describe the assurance against failure that is built into structural products. Safety factor SF can be defined as

SF = X̄avgs / Rb

where

X̄avgs   mean strength of material
Rb       reliability boundary, the maximum anticipated operating stress level the component receives

We choose to define "safety margin" SM by taking into account the standard deviation or the spread of the data; hence, SM is the number of standard deviations of the strength distribution that lie between the reliability boundary Rb and the mean strength X̄avgs:

SM = (X̄avgs − Rb) / σs

where σs is the standard deviation of the strength distribution. Using SF presents little risk when we deal with materials with clearly defined, repeatable, and "tight" strength distributions, such as sheet and structural steel or aluminum. However, when we deal with plastics, fiberglass, and other metal substitutes or processes with wide variations in strength or repeatability, using SM provides a clearer picture of what is happening. In most cases, we must know the safety margin to understand how useful the safety factor is.

Figure 6-1.—Test-to-failure method applied to metallic structure. Mean strength of material, X̄avgs, 13; reliability boundary, Rb, 10; standard deviation, σs, 0.75; safety factor, SF, 13/10 or 1.3; safety margin, SM, |10 − 13|/0.75 or 4.0; probability of defect, 0.00003 or 0.003 percent.

Consider the example of the design of a support structure to hold cargo in a launch vehicle. The component strength is expressed and represented by its ability to withstand a particular g force. Structural members (consisting of various materials) are tested with a mechanical load until failure occurs. We may have materials with clearly defined, repeatable, and tight strength distributions, such as sheet and structural steel or aluminum. Here, using SF presents little risk (see fig. 6-1 for a metallic structure where a normal (Gaussian) distribution is assumed). Alternatively, we may have plastics, fiberglass, and other metal substitutes or processes with wide variations in strength or repeatability, and using SM provides a clearer picture of a potential problem (see fig. 6-2 for a metal substitute, a composite).

To use and benefit from this concept we need to

(1) Know the material strengths and distributions
(2) Identify the reliability boundary Rb for the loading of the material
(3) Know the safety margin to understand the usefulness of the safety factor

Using safety margins in this way in the design process has a major benefit because they provide a clearer picture of what is happening in the real world by taking strength distributions into account. Also, the difference in the probability of defects (calculated by solving for the area under the normal distribution curve to the left of Rb) is better reflected in the difference in the strength margins.

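The contrast drawn in figures 6-1 and 6-2 can be reproduced directly from the normal model. The following Python sketch (ours, using the figures' stated means and deviations) shows how the same safety factor hides very different defect probabilities:

from math import erf, sqrt

def defect_fraction(mean_strength, sigma, boundary):
    # Area of the normal strength distribution below the reliability boundary.
    z = (boundary - mean_strength) / sigma
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

for label, sigma in [("metal (fig. 6-1)", 0.75), ("composite (fig. 6-2)", 2.308)]:
    sf = 13.0 / 10.0                   # safety factor: identical in both cases
    sm = (13.0 - 10.0) / sigma         # safety margin: sharply different
    p = defect_fraction(13.0, sigma, 10.0)
    print(f"{label}: SF = {sf:.1f}, SM = {sm:.2f}, defect fraction = {p:.5f}")
# metal: SM = 4.00, defects about 0.00003; composite: SM = 1.30, defects about 0.097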
Figure 6-2.—Test-to-failure method applied to metal substitute (composite). Mean strength of material, X̄avgs, 13; reliability boundary, Rb, 10; standard deviation, σs, 2.308; safety factor, SF, 13/10 or 1.3; safety margin, SM, |10 − 13|/2.308 or 1.3; probability of defect, 0.0968 or 9.68 percent.

In summary, test-to-failure methods can be used to develop a strength distribution that provides a good estimate of tolerance and physical wearout problems without the need for the large samples required for attribute tests (note that extrapolation outside the range of data should be avoided). The results of a test-to-failure exposure of a device can be used to predict the reliability of similar devices that cannot or will not be tested. Testing to failure also provides the means for evaluating the failure modes and mechanisms of devices so that improvements can be made. It was also shown that a safety factor is much more useful if the associated safety margin is known.

Test procedure and sample size.—Devices that are not automatically destroyed upon being operated are normally not expended or destroyed during a functional test. Electronic equipment usually falls into this category. For such equipment, a minimum sample size of five is necessary, each sample being subjected to increasing stress levels until failure occurs or the limits of the testing facility are reached. In the latter case, no safety margin calculation is possible because no failures are observed. Here, we must rely on intuition when deciding the acceptability of the device.

Test-to-failure procedure and sample size requirements for one-shot devices are different because a one-shot device is normally expended or destroyed during a functional test. Ordnance items such as squib switches fall into this category. For such devices, at least 20 samples should be tested, but 30 to 70 would be more desirable. At least 12 failures should be observed during a test. In a typical one-shot test, of which there are many variations, a sample is tested at the reliability boundary and, if it passes, a new sample is tested at predetermined stress increments until a failure occurs. Then, the next sample is tested at one stress increment below the last failure. If this sample passes, the stress is increased one increment for the next sample. This process, depicted in figure 6-3, continues until at least 12 failures have been observed.

Figure 6-3.—Example of one-shot test-to-failure procedure.

Safety margins for single failure modes.—For devices that exhibit a single failure mode during a test-to-failure exposure, the safety margin and the reliability are calculated by the technique just discussed in the definition of safety margin. The following examples further illustrate the method and show the practical results.

Example 6: A test was conducted on a vendor's 0.25- and 0.50-W film resistors to evaluate their ability to operate reliably at their rated power levels. Thirty samples of each type were tested by increasing the power dissipation until the resistance change exceeded 5 percent. The results are shown in figure 6-4, from which the following points are noteworthy:

(1) The mean strength of the 0.25-W resistor was less than half the mean strength of the 0.50-W resistor: X̄0.25 = 1.19 W compared with X̄0.50 = 2.6 W. This was to be expected since the 0.50-W resistor was larger, had more volume, and could dissipate more energy.

(2) The standard deviation of the 0.25-W resistor was almost the same as that for the 0.50-W resistor: σ0.25 = 0.272 W; σ0.50 = 0.332 W. This was also expected because both resistors were made by the same manufacturer and were subjected to the same process controls and quality acceptance criteria.

(3) The 0.50-W resistor, because of its higher mean strength, had a safety margin of 6.32 with reference to its rated power dissipation of 0.50 W. According to table 5-5, this means that only 0.0⁹149 resistors would exceed a 5-percent resistance change when applied at 0.50 W. The 0.25-W resistor, because of its lower mean strength, had a safety margin of only 3.45 with reference to its rated power of 0.25 W. According to table 5-5 again, this means that 0.0³337 resistors would exceed a 5-percent resistance change when applied at 0.25 W. Derating the 0.25 W to 0.125 W increased the safety margin to 3.92 and decreased the expected number of failures to 0.0⁴481, an improvement factor of 7.5. This, of course, is the reason for derating components, as discussed in chapter 4. Although we have indicated that a safety margin of 6.32 has statistical meaning, in practice a population safety margin of 5 or higher indicates that the applicable failure mode will not occur unless, of course, the strength distribution deviates greatly from a normal distribution.

Example 7: A fiberglass material to be used for a flame shield was required to have a flexural strength of 15 000 psi. The results of testing 59 samples to failure are presented in figure 6-5. The strength distribution of the material was calculated to have a mean of 19 900 psi and a standard deviation of 4200 psi. The safety margin was then calculated as

SM = |15 000 − 19 900| / 4200 = 1.17

Because, from table 5-7, SM = 1.17 indicates that 87.9 percent of the samples will fail at reliability boundaries above 15 000 psi, we can see that 12.1 percent will fail at boundaries below 15 000 psi. This analysis is optimistic in that 11/59 = 18.7 percent actually did fail below 15 000 psi. The test also shows that the reliability of the flame shield could be improved either by selecting another type of material to obtain a higher mean strength or by changing the fabrication processes to reduce the large strength deviation.

Example 8: Samples of transistors from two vendors were tested to failure under high temperatures. Failure was defined as any out-of-tolerance parameter. The results shown in figure 6-6 indicate that vendor B's materials, design, and process control were far superior to vendor A's as revealed by the large differences in mean strength and standard deviation. With an SM of 1.41, 7.9 percent of vendor A's transistors would fail at the 74 °C reliability boundary; with an SM of 8.27, vendor B's transistors would not be expected to fail at all. It is unlikely that an attribute test would have identified the better transistor.

Example 9: Squib switch samples were tested to failure under vibration in accordance with the procedure for testing one-shot items. The results are shown in figure 6-7, where the mean and standard deviation of the failure distribution have been calculated from the failure points observed. As shown, X̄s = 14 g's and σs = 1.04 g's to produce a safety margin of 3.84 with reference to the reliability boundary of 10 g's.

The preceding examples have shown how the Pt product reliability term can be effectively demonstrated through test-to-failure methods. This has been the case because each example except the squib switch involved a tolerance problem. The examples also show that the Km factor plays an important role in product reliability and that control over K-factors can ensure a significant increase in reliability.

Multiple failure modes.—Most products perform more than one function and have more than one critical parameter for each function. In addition, most products are made up of many types of materials and parts and require many fabrication processes during manufacture. It follows then that a product can exhibit a variety of failure modes during testing.

Figure 6-5.—Strength distribution in fiberglass material. X̄s = 19 900 psi; σs = 4200 psi.

Figure 6-6.—Test-to-failure results for two transistors. (a) Vendor A: X̄s = 105 °C; σs = 22 °C. (b) Vendor B: X̄s = 165 °C; σs = 11 °C.

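Returning to Example 7, its safety-margin arithmetic can be checked in a few lines of Python (ours; the normal strength model is the section's standing assumption):

from math import erf, sqrt

mean_strength = 19_900.0  # psi, from the 59-sample flexural test
sigma = 4_200.0           # psi
boundary = 15_000.0       # psi, required flexural strength

sm = (mean_strength - boundary) / sigma  # about 1.17
p_fail = 0.5 * (1.0 + erf((boundary - mean_strength) / (sigma * sqrt(2.0))))
print(f"SM = {sm:.2f}; predicted fraction below boundary = {p_fail:.3f}")
# about 0.12 (the text's 12.1 percent), versus 11/59 = 18.7 percent observed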
Figure 6-7.—Vibration test-to-failure results of one-shot device (squib switch); pass/fail results plotted against sample number and gravity level, g. X̄s = 14 g's; σs = 1.04 g's.

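The staircase pattern in figure 6-7 is what the one-shot procedure described above produces. A minimal simulation sketch (Python, ours; the seed, step size, and normal strength model are illustrative assumptions):

import random

random.seed(1)
true_mean, true_sigma = 14.0, 1.0  # underlying squib strength distribution, g's
level, step = 10.0, 1.0            # start at the reliability boundary
results = []                       # (stress level, failed?) per expended sample

while sum(1 for _, failed in results if failed) < 12:
    strength = random.gauss(true_mean, true_sigma)
    failed = strength <= level     # each one-shot sample is destroyed by its test
    results.append((level, failed))
    # step down one increment after a failure, up one increment after a pass
    level += -step if failed else step

failure_levels = [lvl for lvl, failed in results if failed]
print(len(results), "samples expended; mean failure level =",
      round(sum(failure_levels) / len(failure_levels), 1), "g's")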
Figure 6-8.—Test-to-failure results when multiple failure modes are observed (three failure distributions, with SM = 3.5, 2.1, and 7.6, referenced to the same reliability boundary).

Figure 6-9.—Stress distribution for operating temperature. X̄s = 85 °F; σs = 20 °F.

In the conduct of a test to failure, each failure mode detected must be evaluated individually; that is, a failure distribution must be developed for each failure mode, and safety margins must be calculated for each individual failure distribution. Moreover, as mentioned before, at least five samples or failure points are needed to describe each failure mode distribution. To see this more clearly, consider the test results shown in figure 6-8. Here, each of the three failure modes observed is described in terms of its own failure distribution and resulting safety margin with reference to the same reliability boundary. If these failure modes are independent and each represents an out-of-tolerance Pt condition, the Pt of the test device is given by

Pt,total = Pt,1(SM = 3.5) × Pt,2(SM = 2.1) × Pt,3(SM = 7.6) = (0.9998)(0.9821)(1.00) = 0.9819

This also shows that the independent evaluation of each failure mode identifies the priorities necessary to improve the product. For example, the elimination of failure mode 2, either by increasing Pt,2 to 1 or by eliminating the mode altogether, increases Pt,total from 0.9819 to 0.9998.

When stress distribution is known.—When safety margins are calculated with reference to a single point or a fixed reliability boundary, the resulting reliability estimate is conservative because it is assumed that the equipment will always be operated at the reliability boundary. As an illustration, figure 6-9 shows the stress distribution for the operating temperature of a device and the maximum anticipated operating limit (145 °F), which is given in the device specifications and would normally be considered the reliability boundary.

Figure 6-10 shows the strength distribution of the device for high temperatures and also that the safety margin for the device, when referenced to the 145 °F reliability boundary, is 1.54, or a reliability of 93.8 percent. We know, however, that the 145 °F limit is the 3σ limit of the stress distribution and will occur only 0.135 percent of the time. The question is, How does this affect the estimated reliability of the device in the temperature environment?

If we select random values from the stress and strength distributions and subtract the stress value from the strength value, a positive result indicates a success: the strength exceeds the stress. A negative result indicates a failure: the stress exceeds the strength. With this knowledge, we can calculate a difference distribution and, through the application of the safety margin technique, solve for the probability of the strength being greater than the stress (i.e., success). This difference distribution is also distributed normally and has the following parameters:
X̄difference = X̄s − X̄stress
σdifference = (σs² + σstress²)^1/2

From the strength and stress distribution parameters given in the preceding example (figs. 6-9 and 6-10),

X̄difference = 165 − 85 = 80 °F
σdifference = (20² + 13²)^1/2 = 24 °F

This distribution is shown in figure 6-11.

Figure 6-10.—Strength distribution for operating temperature. X̄s = 165 °F; σs = 13 °F.

Figure 6-11.—Strength and stress difference distribution. X̄ = 80 °F; σ = 24 °F; SM = 3.33; area under difference distribution, 0.9996.

Because positive numbers represent success events, we are interested in the area under the difference distribution that includes only positive numbers. This can be calculated by using zero as the reliability boundary and solving for the safety margin from

SM = |0 − X̄| / σ = |0 − 80| / 24 = 3.33

This 3.33 safety margin gives a reliability of 0.9996 when the stress distribution is considered. Comparing this result with the estimated reliability of 0.938 when the reliability boundary point estimate of 145 °F was used shows the significance of knowing the stress distribution when estimating reliability values.

Confidence levels.—As discussed before, the main objective in developing a failure distribution for a device by test-to-failure methods is to predict how well a population of like devices will perform. Of course, such failure distributions, along with the resulting safety margins and reliability estimates, are subject to error. Errors result from sample size limitations in much the same way as the demonstrated reliability varies with sample size in attribute testing. Specifically, the mean and the standard deviation of the strength distribution must be adjusted to reflect the sample size used in their calculation. For this purpose, tables A-3 to A-5 in appendix A have been developed by using the noncentral t distribution. Table 6-3 shows the applicable appendix A tables for selected confidence levels and sample sizes, and the examples that follow illustrate their use.

TABLE 6-3.—CONFIDENCE LEVEL TABLES FOR VARIOUS SAMPLE SIZES

Confidence level,   Confidence level tables (by sample size)
percent
99                  A-3(a)   A-3(b)   A-3(c)   A-3(d)
95                  A-4(a)   A-4(b)   A-4(c)   A-4(d)
90                  A-5(a)   A-5(b)   A-5(c)   A-5(d)

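Before turning to the examples, here is a short Python sketch (ours) of the stress-strength difference calculation developed above:

from math import erf, sqrt

strength_mean, strength_sd = 165.0, 13.0  # degF, from figure 6-10
stress_mean, stress_sd = 85.0, 20.0       # degF, from figure 6-9

diff_mean = strength_mean - stress_mean             # 80 degF
diff_sd = sqrt(strength_sd ** 2 + stress_sd ** 2)   # about 24 degF

sm = diff_mean / diff_sd          # about 3.35 (3.33 with the rounded sigma of 24)
reliability = 0.5 * (1.0 + erf(sm / sqrt(2.0)))     # P(strength - stress > 0)
print(f"SM = {sm:.2f}, reliability = {reliability:.4f}")  # about 0.9996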
Example 10: Upon being tested to failure at high temperatures, 10 devices were found to have a failure distribution of X̄s = 112.7 °C and σs = 16 °C. The reliability boundary was 50 °C. Find the safety margin and reliability demonstrated at 90-percent confidence.

Solution 10:

Step 1—Solve first for the observed safety margin:

SM = |Rb − X̄s| / σs = |50 − 112.7| / 16 = 3.92

From table 5-7, the observed reliability is 0.99996.

Step 2—Now in appendix A refer to table A-5(a), which deals with 90-percent confidence limits for safety margins, and follow across to column N = 10, the number of samples. The values under the N headings in all the tables listed in table 6-3 represent the observed safety margins for sample sizes as calculated from raw test data. The SM column lists corresponding population safety margins for the observed safety margins shown under the N headings. Finally, corresponding population reliability estimates are shown under the Px headings, which may represent Pt or Pw as applicable.

Step 3—Proceed down the N = 10 column to 3.923, the observed safety margin derived in step 1.

Step 4—Having located SM = 3.923 with 10 samples, follow horizontally to the left to find the demonstrated population safety margin in the SM column. This is 2.6.

Step 5—With a population SM of 2.6, follow the same line to the right to find the population reliability estimate under the Px heading. This value is 0.9953. Recall that the observed safety margin was 3.923 and the observed reliability, 0.99996.

Example 11: Twelve gyroscopes were tested to failure by using time as a stress to develop a wearout distribution. The wearout distribution was found to have an X̄s of 5000 hours and a σs of 840 hours. Find the Pw demonstrated at 95-percent confidence with a reliability boundary of 1000 hours.

Solution 11:

Step 1—The sample safety margin is

SM = |1000 − 5000| / 840 = 4.76

Step 2—The population safety margin at 95-percent confidence with a 12-sample safety margin of 4.76 is read directly from table A-4(a) to be 3.0.

Step 3—For a population SM of 3.0, the corresponding Pw under the Px column is 0.9986. Therefore, 99.86 percent of the gyroscopes will not wear out before 1000 hours have been accumulated.

Safety factor.—This section is included in the discussion of test-to-failure methods because the term "safety factor" is often confused with safety margin. It is used widely in industry to describe the assurance against failure that is built into structural products. There are many definitions of safety factor SF, with the most common being the ratio of mean strength to reliability boundary:

SF = X̄avgs / Rb

When dealing with materials with clearly defined, repeatable, and "tight" strength distributions, such as sheet and structural steel or aluminum, using SF presents little risk. However, when dealing with plastics, fiberglass, and other metal substitutes or processes with wide variations in strength or repeatability, using SM provides a clearer picture of what is happening (fig. 6-12). In most cases, we must know the safety margin to understand how accurate the safety factor may be.

Figure 6-12.—Two structures with identical safety factors (SF = 13/10 = 1.3) but with different safety margins. (a) Structure A: SM = 1.3; 9.68 percent defective. (b) Structure B: SM = 4.0; 0.003 percent defective.

Test-to-failure summary.—In summary, you should understand the following concepts about test-to-failure applications:

(1) Developing a strength distribution through test-to-failure methods provides a good estimate of the Pt and Pw product reliability terms without the need for the large samples required for attribute tests.
(2) The results of a test-to-failure exposure of a device can be used to predict the reliability of similar devices that cannot or will not be tested.
(3) Testing to failure provides a means of evaluating the failure modes and mechanisms of devices for improvement purposes.
(4) Testing to failure allows confidence levels to be applied to the safety margins and to the resulting population reliability estimates.

(5) To know how accurate a safety factor may be, we must also know the associated safety margin.

Life Test Methods

Life tests are conducted to illustrate how the failure rate of a typical system or complex subsystem varies during its operating life. Such data provide valuable guidelines for controlling product reliability. They help to establish burn-in requirements, to predict spare part requirements, and to understand the need for or lack of need for a system maintenance program. Such data are obtained through laboratory life tests or from the normal operation of a fielded system.

Life tests are performed to evaluate product failure-rate characteristics. If failures include all causes of system failure, the failure rate of the system is the only true factor available for evaluating the system's performance. Life tests at the parts level often require large sample sizes if realistic failure-rate characteristics are to be identified and laboratory life tests are to simulate the major factors that influence failure rates in a device during field operations. Furthermore, the use of running averages in the analysis of life data will identify burn-in and wearout regions if such exist. Failure rates are statistics and therefore are subject to confidence levels when used in making predictions (see refs. 6-13 to 6-17).

Figure 6-13 illustrates what might be called a failure surface for a typical product. It shows system failure rate versus operating time and environmental stress. These three parameters describe a surface such that, given an environmental stress and an operating time, the failure rate is a point on the surface.

Figure 6-13.—Product failure surface.

Test-to-failure methods generate lines on the surface parallel to the stress axis; life tests generate lines on the surface parallel to the time axis. Therefore, these tests provide a good description of the failure surface and, consequently, the reliability of a product.

Attribute tests result only in a point on the surface if failures occur and a point somewhere on the x,y-plane if failures do not occur. For this reason, attribute testing is one of the least desirable methods for ascertaining reliability. Of course, in the case of missile flights or other events that produce go/no-go results, an attribute analysis is the only way to determine product reliability.

Application.—Although life test data are derived basically for use in evaluating the failure characteristics of a product, byproducts of the evaluation may serve many other purposes. Four of the most frequent are

(1) To serve as acceptance criteria for new hardware. For example, a product may be subjected to a life test before it is accepted for delivery to demonstrate that its failure rate is below some predetermined value. Examples of such applications are burn-in or debugging tests and group B life tests conducted on electronic parts. Some manufacturers of communications satellites subject all electronic parts to a 1200-hour burn-in test and use only the ones that survive.

(2) To identify product improvement methods. Here, life tests serve a dual purpose by providing hardware at essentially no cost for physics-of-failure analyses. In turn, these analyses identify failure mechanisms and the action needed to reduce effectively a product's failure rate. In the past 10 years, this has resulted in significant part failure-rate reductions. In fact, the failure rates of some components have been reduced so far that accelerated life tests (life tests at elevated stress levels) and test-to-failure techniques must be employed to attain reliability improvements in a reasonable timeframe.

(3) To establish preventive maintenance policies. Products with known or suspected wear mechanisms are life tested to determine when the wearout process will begin to cause undesirable failure-rate trends. Once the wearout region is established for a product, system failures can be reduced by implementing a suitable preventive maintenance plan or overhaul program. This is effectively illustrated in figure 6-14, which shows the failure-rate trend in a commercial jet aircraft subsystem.

Figure 6-14.—Failure-rate characteristics of commercial jet electronic subsystem.

Here, the upward trend after 4000 hours of operation was revealed to be caused by a servomechanism that required lubrication. By establishing a periodic lubrication schedule for the mechanism, further failures were eliminated. Note that this subsystem also exhibited burn-in and intrinsic-failure-rate regions.

(4) To assess reliability. Here, tests are performed or life data are collected from fielded systems to establish whether contractual reliability requirements are actually being met. In cases of noncompliance and when the field failures are analyzed, one of the preceding methods is employed to improve the product, or else a design change is implemented. The effectiveness of the corrective action is then evaluated from additional life data.

Because life-test-observed failure rates include catastrophic, tolerance, wearout, and K-factor failures, life tests usually demonstrate product reliability.

Test procedure and sample size.—Conducting a life test is fairly straightforward. It involves only the accumulation of equipment operating time. Precautions must be taken, however, when the test is conducted in a laboratory. Operating conditions must include all the factors that affect failure rates when the device is operated tactically. Major factors are environment, power-on and power-off times, power cycling rates, preventive maintenance, operator tasks, and field tolerance limits. Ignoring any of these factors may lead to an unrealistic failure-rate estimate.

When accelerated life tests are conducted for screening purposes, stress levels no greater than the inherent strength of the product must be chosen. The inherent strength limit can be evaluated through test-to-failure methods before the life tests are conducted.

Experience with nonaccelerated life tests of military standard electronic parts for periods as long as 5000 hours indicates that an average of one to two failures per 1000 parts can be expected. For this reason, life tests will not provide good reliability estimates at the part level except when quantities on the order of 1000 or more parts are available. On the other hand, life tests are efficient at the system level with only one sample as long as the system is fairly complex (includes several thousand parts).

Life tests intended to reveal the wearout characteristics of a device may involve as few as five samples, although from 20 to 30 are more desirable if a good estimate of the wearout distribution is to be obtained.

Analyzing life test data.—Recall from chapter 3 that an empirical definition of mean time between failures (MTBF) was given as

MTBF = Total test hours / Total observed failures

Remember also that because this expression neglects to show when the failures occur, it assumes an intrinsic failure rate and therefore an intrinsic mean time between failures, or MTBF. The assumption of an intrinsic failure rate may not be valid in some cases, but life test results have traditionally been reported this way.

To see this illustrated, consider the results of a 4000-hour life test of a complex (47 000 parts) electronic system as shown in figure 6-15. This graph plots the 47 failures cumulatively in terms of the times they are observed so that the slopes of the lines represent the failure rate. The solid line shows the system failure rate that resulted from assuming an intrinsic failure rate, which was

λ = Total failures / Total operation time = 47/4000 = 1 failure/86 hours

Figure 6-15.—Results of complex electronic system life test.

From the plotted test data, it is obvious that this intrinsic failure rate was not a good estimate of what really happened. The plotted data indicate that there were two intrinsic-failure-rate portions: one from 0 to 1000 hours and the other from 1000 to 4000 hours. In the 0- to 1000-hour region, the actual failure rate was

λ = 35/1000 = 1 failure/29 hours

or about 3 times higher than the total average failure rate of 1/86 hours; in the 1000- to 4000-hour region, the actual failure rate was

λ = 12/3000 = 1 failure/250 hours

or about 2.9 times lower than the average.

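A minimal Python sketch (ours; the counts come from the illustration) of the region-by-region analysis:

regions = {
    "overall (0 to 4000 hr)": (4000, 47),
    "burn-in (0 to 1000 hr)": (1000, 35),
    "mature (1000 to 4000 hr)": (3000, 12),
}

for name, (hours, failures) in regions.items():
    print(f"{name}: 1 failure per {hours / failures:.1f} hours")
# 85.1, 28.6, and 250.0 hours; the text rounds these to 86, 29, and 250

# Spares planning at 1000 operating hours per year:
print("first-year replacements:", round(1000 * 35 / 1000))  # 35
print("later-year replacements:", round(1000 * 12 / 3000))  # 4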
   λ = 12 failures/3000 hours = 1 failure/250 hours

or about 2.9 times lower than the average.

This illustration establishes the desirability of knowing when failures occur, not just the number of failures. The results of analyzing data by regions can be used to evaluate burn-in and spare parts requirements. The burn-in region was identified to be from 0 to 1000 hours because after this time the failure rate decreased by a factor of 8.6.

This result also has a significant effect on logistics. For example, if we assume that the system will accumulate 1000 hours per year, we can expect during the first year to replace 35 parts:

   (1 failure/29 hours) × 1000 hours ≈ 35 failures

whereas during the next and subsequent years we can expect to make only four replacements:

   (1 failure/250 hours) × 1000 hours = 4 failures

Using the average failure rate of 1 failure/86 hours, we would have to plan, however, for about 12 replacements every year (1000 hours × 1 failure/86 hours ≈ 12 failures). Obviously, the cost impact of detailed analysis can be substantial.

Running averages.--When system failure rates are irregular or when there is a need to evaluate the effect of different operating conditions on a system, running average analyses are useful. This can best be illustrated through the example presented in figure 6-16. A 300-hour running average in 50-hour exposures is shown for a complex system during an engineering evaluation test. (Running averages are constructed by finding the failure rate for the first 300 hours of operation, then dropping the first 50 hours, picking up the 300- to 350-hour interval, and calculating the new 300-hour regional failure rate, and then repeating the process by dropping the second 50 hours of data and adding the next 50 hours, for the total test period.) From the resultant curve, you can readily see (1) the effects of the debugging test, (2) the increase in failure rate during the high-temperature test and the decrease after that test, (3) another increase during low-temperature exposure and the subsequent decrease, (4) a slight increase caused by vibration, and (5) a continuously decreasing rate as the test progressed. The curve indicates that the system is the most sensitive to high temperature and that because the failure rate continued to decrease after high-temperature exposure, exposure to high temperatures is an effective way to screen defective parts from the system. Because the failure rate continued to decrease after the tests were completed, neither low temperature nor vibration caused permanent damage to the system.

[Figure 6-16.--Running average failure-rate analysis of life test data (300-hr running average in 50-hr increments). Plot of failure rate, failures/hr, against operating time, hr, with high- and low-temperature exposures marked.]

At the end of the 3000-hour period, the failure rate was 3.3 failures per 1000 hours. This reflected a tenfold decrease from the initial failure rate during debugging, typical of the results observed for many complex systems. An example of a running average failure-rate analysis that identifies a system wearout region is shown in figure 6-17. The increasing failure rate after 3000 hours was caused by relay failures (during approximately 10 000 cycles of operation). This type of information can be used to establish a relay replacement requirement as part of a system preventive maintenance plan.
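The construction just described is easy to automate when the individual failure times have been logged. The sketch below is a minimal illustration (Python; the failure log and test length are hypothetical) of the 300-hour running average stepped in 50-hour increments:

```python
def running_average_rate(failure_times, total_hours, window=300.0, step=50.0):
    """Running-average failure rate: count the failures in each
    [start, start + window) span and divide by the window length,
    advancing the window in `step`-hour increments."""
    points = []
    start = 0.0
    while start + window <= total_hours:
        failures = sum(1 for t in failure_times if start <= t < start + window)
        points.append((start + window / 2.0, failures / window))
        start += step
    return points

# Hypothetical life test log: cumulative failure times in hours
log = [8, 21, 55, 90, 140, 260, 480, 700, 1150, 1900, 2750]
for midpoint, rate in running_average_rate(log, total_hours=3000.0):
    print(f"{midpoint:6.0f} hr: {1000.0 * rate:5.1f} failures per 1000 hr")
```

Plotting the resulting points against operating time reproduces the kind of curve shown in figures 6-16 and 6-17, from which burn-in, environmental sensitivity, and wearout regions can be read.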
Confidence levels.--As discussed in chapter 4, failure rates are statistical. Consequently, they are subject to confidence levels just as attribute and test-to-failure results are influenced by such factors. Confidence levels for intrinsic failure rates are calculated by using table A-2 in appendix A.

To use this table, first calculate the total test hours accumulated from

   t = Σ (i = 1 to n) N_i T_i

where

   N_i   ith unit tested
   T_i   test time of N_i
   n     total units tested

Then find under the number of failures observed during the test the tolerance factor for the desired confidence level. The lower limit for the MTBF at the selected confidence level is then found from
   MTBF = t / Tolerance factor

and the upper limit for failure rate from

   λ = Tolerance factor / t

[Figure 6-17.--Running average failure-rate analysis of life test data identifying wearout region (600-hr running average in 200-hr increments). Plot of failure rate, failures/hr, against operating time, hr, with the relay wearout region marked.]

Example 13: A system was life tested for 3000 hours, during which six failures were observed. What is the demonstrated 80-percent-confidence MTBF?

Solution 13:
Step 1--Solve for the total test hours.

   t = Σ N_i T_i = 1 × 3000 = 3000 hours

Step 2--From table A-2 find the tolerance factor for six failures at 80-percent confidence to be 9.0.
Step 3--Solve for the demonstrated MTBF.

   MTBF = t / Tolerance factor = 3000/9 = 333 hours

in contrast to the observed MTBF of 3000/6 = 500 hours.

Example 14: Had four of the six failures in example 13 been observed in the first 1000 hours, what would be the demonstrated MTBF at 80-percent confidence in the region from 1000 to 3000 hours?

Solution 14:
Step 1--The total test time is given as t = 2000 hours.
Step 2--From table A-2 find the tolerance factor for two failures at 80-percent confidence to be 4.3.
Step 3--Find the demonstrated MTBF at 80-percent confidence after 1000 to 3000 hours.

   MTBF = 2000/4.3 = 465 hours

Example 15: It is desired to demonstrate an 80-hour MTBF on a computer at 90-percent confidence. How much test time is required on one sample if no failures occur?

Solution 15:
Step 1--From table A-2 find the tolerance factor for no failures at 90-percent confidence to be 2.3.
Step 2--Because the desired 90-percent-confidence MTBF is given as 80 hours and the tolerance factor is known, calculate the total test time required from

   t = (MTBF)(Tolerance factor) = (80)(2.3) = 184 hours

to prove that 184 hours with no failures demonstrates an 80-hour MTBF at 90-percent confidence.

A good discussion of fixed time and sequential tests is given in MIL-STD-781D (ref. 6-3).
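Table A-2 appears in appendix A rather than here, but the tolerance factors quoted in examples 13 to 15 are consistent with the standard chi-square limit for a time-terminated test: for r observed failures, the factor at a given confidence level equals the chi-square percentile with 2r + 2 degrees of freedom, divided by 2. A minimal sketch under that assumption (Python with SciPy):

```python
from scipy.stats import chi2

def tolerance_factor(failures, confidence):
    # Chi-square percentile with 2r + 2 degrees of freedom, halved;
    # assumed here to reproduce the table A-2 values used in the examples.
    return chi2.ppf(confidence, 2 * failures + 2) / 2.0

def demonstrated_mtbf(total_hours, failures, confidence):
    # Lower confidence limit for MTBF (the example 13 computation)
    return total_hours / tolerance_factor(failures, confidence)

print(round(tolerance_factor(6, 0.80), 1))       # 9.1 (table A-2 gives 9.0)
print(round(demonstrated_mtbf(3000, 6, 0.80)))   # about 331 hours
print(round(80 * tolerance_factor(0, 0.90)))     # 184 hours, as in example 15
```

The small differences from the worked examples reflect only the rounding of the tabulated factors.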
Life test summary.--In summary, the following concepts are reiterated:

(1) Life tests are performed to evaluate product failure-rate characteristics.
(2) If "failures" include all causes of system failure, the failure rate of the system is the only true factor available for evaluating the system's performance.
(3) Life tests at the part level require large sample sizes if realistic failure-rate characteristics are to be identified.
(4) Laboratory life tests must simulate the major factors that influence failure rates in a device during field operations.
(5) The use of running averages in the analysis of life data will identify burn-in and wearout regions if such exist.
(6) Failure rates are statistics and therefore are subject to confidence levels when used in making predictions.
Conclusion

When a product fails, whether during a test or from service, a valuable piece of information about it has been generated. We have the opportunity to learn how to improve the product if we take the right actions.

Much can be learned from each failure by using good failure reporting, analysis, and a concurrence system and by taking corrective action. Failure analysis determines what caused the part to fail. Corrective action ensures that the cause is dealt with.

With respect to testing, experimentation and evaluation to determine failure modes and effects greatly benefit reliability analysis. They do so by giving precise answers to the questions of why and how a product or component fails. Testing helps to reduce high development risks associated with a completely new design, to analyze high-risk portions of the design, and to confirm analytical models.

Attribute tests, although not the most satisfactory method of testing, can still identify gross design and manufacturing problems. Test-to-failure methods can be used to develop a strength distribution that gives a good estimate of tolerance and physical wearout problems without the need for the large samples required in attribute tests. Life tests are performed to evaluate product failure-rate characteristics, but the tests must be carefully designed.

All these test methods can be used to establish system-level reliability and, when conducted properly and in a timely fashion, can give valuable information about product behavior and overall reliability.

References

6-1. Reliability Program for Systems and Equipment Development and Production. MIL-STD-785B, July 1986.
6-2. Bazovsky, I.: Reliability Theory and Practice. Prentice-Hall, 1963.
6-3. Reliability Testing for Engineering Development, Qualification, and Production. MIL-STD-781D, Oct. 1986.
6-4. Reliability Test Methods, Plans, and Environments for Engineering Development, Qualification, and Production. MIL-HDBK-781, July 1987.
6-5. Laube, R.B.: Methods to Assess the Success of Test Programs. J. Environ. Sci., vol. 26, no. 2, Mar.-Apr. 1983, pp. 54-58.
6-6. Test Requirements for Space Vehicles. MIL-STD-1540B, Oct. 1982.
6-7. Haugen, E.B.: Probabilistic Approaches to Design. John Wiley & Sons, 1968.
6-8. Kececioglu, D.; McKinley, J.W.; and Saroni, M.J.: A Probabilistic Method of Designing Specified Reliabilities Into Mechanical Components With Time Dependent Stress and Strength Distributions. NASA CR-72836, 1967.
6-9. Laubach, C.H.: Environmental Acceptance Testing. NASA SP-T-0023, 1975.
6-10. Sneak Circuit Analysis. Boeing Safety Seminar, Boeing Systems Division, Aug. 1985.
6-11. Sneak Circuit Analysis. Naval Avionics Center, R&M-STD-R00205, May 1986.
6-12. Sneak Circuit Application Guidelines. Rome Air Development Center, RADC-TR-82-179, June 1989. (Avail. NTIS AD-A118479.)
6-13. Electronic Reliability Design Handbook. MIL-HDBK-338, vols. 1 and 2, Oct. 1988.
6-14. Reliability Modeling and Prediction. MIL-STD-756B, Aug. 1982.
6-15. Reliability Prediction of Electronic Equipment. MIL-HDBK-217F, Notice A, Jan. 1992.
6-16. Kreyszig, E.: Introductory Mathematical Statistics: Principles and Methods. John Wiley & Sons, Inc., 1970.
6-17. Roberts, N.H.: Mathematical Methods in Reliability Engineering. McGraw-Hill, 1964.
Reliability Training¹

1. Seven hydraulic power supplies were tested in a combined high-temperature and vibration test. Outputs of six of the seven
units tested were within limits.

a. What is the observed reliability R of the seven units tested?

A. 0.825 B. 0.857 C. 0.913

b. What is the predicted population reliability R at 80-percent confidence?

A. 0.50 B. 0.75 C. 0.625

c. How many tests (with one failure already experienced) are needed to demonstrate R = 0.88 at 80-percent confidence?

A. 24 B. 15 C. 30

2. A vibration test was conducted on 20 autopilot sensing circuits with these results: mean x̄_s = 7.8 g's; standard deviation
σ_s = 1.2 g's; reliability boundary R_b = 6 g's.

a. What is the observed safety margin S_M?

A. 2.0 B. 1.0 C. 1.5

b. What is the observed reliability R?

A. 0.900 B. 0.935 C. 0.962

c. What is the predicted population safety margin S_M at 80-percent confidence?

A. 1.19 B. 2.19 C. 3.19

d. What is the predicted population reliability R at 80-percent confidence?

A. 0.75 B. 0.95 C. 0.88

e. How could the autopilot be made more reliable?

A. Add brackets, thicker mounting materials, stiffer construction.

B. Control material tolerances more tightly; inspect torque values and weld assemblies.

C. Use vibration isolators.

D. All of the above.

3. Twenty-five low-pressure hydraulic line samples were tested to destruction. These lines are rated to carry 30 psia (R_b);
x̄_s = 31.5 psia; σ_s = 0.75 psia.

a. What is the observed S_M of these test items?

A. 1.0 B. 2.0 C. 3.0

¹Answers are given at the end of this manual. Please assemble and use the slide rule at the back of this manual to do this problem set.

b. What is the predicted population safety margin S_M at 90-percent confidence?

A. 0.95 B. 1.25 C. 1.51

c. The design requirement calls for an S_M ≥ 4.0 at 90-percent confidence. After discussing the problem with the designer, it
was learned that the 30-psia rating included a 2.5-psia "pad." Using the corrected R_b of 27.5 psia, now what are the S_M and
S_D at 90-percent confidence?

i. S_M (observed) = ?

A. 4.22 B. 5.33 C. 6.44

ii. S_D (predicted) = ?

A. 4.28 B. 3.75 C. 4.80

Chapter 7
Software Reliability
Software reliability management is highly dependent on how the relationship between quality and reliability is perceived. For the purposes of this manual, quality is closely related to the process, and reliability is closely related to the product. Thus, both span the life cycle.

Before we can stratify software reliability, the progress of hardware reliability should be briefly reviewed. Over the past 25 years, the industry has observed (1) the initial assignment of "wizard status" to hardware reliability for theory, modeling, and analysis, (2) the growth of the field, and (3) the final establishment of hardware reliability as a science. One of the major problems was aligning reliability predictions and field performance. Once that was accomplished, the wizard status was removed from hardware reliability. The emphasis in hardware reliability from now to the year 2000, as discussed in chapter 1, will be on system failure modes and effects.

Software reliability has not yet reached classification as a science, for many reasons. The difficulty in assessing software reliability is analogous to the problem of assessing the reliability of a new hardware device with unknown reliability characteristics. The existence of 30 to 50 different software reliability models indicates the disorganization in this area. As discussed in chapter 1, hardware reliability started at a few companies and later was the focus of the AGREE reports. The field then logically progressed through different models in sequence over the years. Along the same lines, numerous people and companies have simultaneously entered the software reliability field in their major areas: namely, cost, complexity, and reliability. The difference is that at least 100 times as many people are now studying software reliability as initially studied hardware reliability. The existence of so many models and their claims tends to mask the fact that several of these models have shown excellent correlations between software performance predictions and actual software field performance; for instance, the Musa model as applied to communications systems and the Xerox model as applied to office copiers. There are also reasons for not accepting software reliability as a science, and they are briefly discussed here.

One impediment to the establishment of software reliability as a science is the tendency toward programming development philosophies such as (1) "do it right the first time" (a reliability model is not needed), or (2) "quality is a programmer's development tool," or (3) "quality is the same as reliability and is measured by the number of defects in a program and not by its reliability." All these philosophies tend to eliminate probabilistic measures because the managers consider a programmer to be a software factory whose quality output is controllable, adjustable, or both. In actuality, hardware design can be controlled for reliability characteristics better than software design can. Design philosophy experiments that failed to enhance hardware reliability are again being formulated for software design. (Some of the material in this chapter is reprinted with permission from ref. 7-1.) Quality and reliability are not the same: quality is characteristic and reliability is probabilistic. Our approach draws the line between quality and reliability because quality is concerned with the development process and reliability is concerned with the operating product. Many models have been developed, and a number of the measurement models show great promise. Predictive models have been far less successful, partly because a data base (such as MIL-HDBK-217E (ref. 7-2) for hardware) is not yet available for software. Software reliability often has to use other methods; it must be concerned with the process of software product development.

Models

The development of techniques for measuring software reliability has been motivated mainly by project managers who not only need ways of estimating the manpower required to develop a software system with a given level of performance but also need techniques for determining when this level of performance has been reached. Most software reliability models presented to date are still far from satisfying these two needs.

Most models assume that the software failure rate will be proportional to the number of implementation and design errors
in the system without taking into account that different kinds of errors may contribute differently to the total failure rate. Eliminating one significant design error may double the mean time to failure, whereas eliminating 10 minor implementation errors (bugs) may have no noticeable effect. Even assuming that the failure rate is proportional to the number of bugs and design errors in the system, no model considers that the failure rate will then be related to the system workload. For example, doubling the workload without changing the distribution of input data to the system may double the failure rate.

Software reliability models can be grouped into four categories: time domain, data domain, axiomatic, and other.

Time Domain Models

Models formulated in the time domain attempt to relate software reliability (characterized, for instance, by a mean-time-to-failure (MTTF) figure under typical workload conditions) to the number of bugs present in the software at a given time during its development. Typical of this approach are the models presented by Shooman (ref. 7-3), Musa (ref. 7-4), and Jelinsky and Moranda (ref. 7-5). Removing implementation errors should increase MTTF, and correlating bug removal history with the time evolution of the MTTF value may allow the prediction of when a given MTTF will be reached. The main disadvantages of time domain models are that bug correction can generate more bugs, that software unreliability can be due not only to implementation errors but also to design (specification) errors, and that characterization and simulation of the typical workload during testing are difficult.

The Shooman model (ref. 7-3) attempts to estimate the software reliability--that is, the probability that no software failure will occur during an operating time interval (0,t)--from an estimate of the number of errors per machine-language instruction present in a software system after T months of debugging. The model assumes that at system integration there are E_i errors present in the system and that the system is operated continuously by an exerciser that emulates its real use. The hazard function after T months of debugging is assumed to be proportional to the remaining errors in the system. The reliability of the software system is then assumed to be

   R(t) = e^(-C E(r,T) t)

where E(r,T) is the remaining number of errors in the system after T months of debugging and C is a proportionality constant. The model provides equations for estimating C and E(r,T) from the results of the exerciser and the number of errors corrected.

The Jelinsky-Moranda model (ref. 7-5) is a special case of the Shooman model. The additional assumption made is that each error discovered is immediately removed, decreasing the remaining number of errors by one. Assuming that the amount of debugging time between error occurrences has an exponential distribution, the density function of the time of discovery of the ith error, measured from the time of discovery of the (i-1)th error, is

   p(t_i) = λ(i) e^(-λ(i) t_i)

where λ(i) = f(N - i + 1) and N is the number of errors originally present. The model gives the maximum likelihood estimates for N and f.
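The maximum likelihood estimates are simple to obtain numerically. The sketch below (Python; the inter-failure times are hypothetical) eliminates f from the likelihood equations and then locates N by bisection; it illustrates the estimation idea only and is not a reproduction of any published procedure:

```python
def jelinsky_moranda_mle(times, n_cap=1.0e6):
    """MLE of (N, f) when the ith inter-failure time is exponential
    with rate f*(N - i + 1); N is treated as continuous."""
    n, total = len(times), sum(times)

    def g(N):
        # Likelihood equation for N after substituting the MLE of f:
        #   sum_i 1/(N - i + 1) = n * sum_i t_i / sum_i (N - i + 1) t_i
        weighted = sum((N - j) * t for j, t in enumerate(times))
        return sum(1.0 / (N - j) for j in range(n)) - n * total / weighted

    lo, hi = (n - 1) + 1.0e-9, n_cap
    if g(hi) >= 0.0:
        raise ValueError("no finite MLE; the data show no reliability growth")
    for _ in range(200):                  # bisection: g(lo) > 0 > g(hi)
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) > 0.0 else (lo, mid)
    N = 0.5 * (lo + hi)
    f = n / sum((N - j) * t for j, t in enumerate(times))
    return N, f

gaps = [4.0, 6.0, 9.0, 14.0, 22.0, 35.0, 56.0, 90.0]   # hypothetical hours
N_hat, f_hat = jelinsky_moranda_mle(gaps)
print(f"estimated N = {N_hat:.1f} initial errors, f = {f_hat:.5f} per hour")
```

Steadily lengthening inter-failure times pull the estimate of N toward the number of failures already observed; data without such growth yield no finite estimate.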
The Jelinsky-Moranda model has been extended by Wolverton and Schick (ref. 7-6). They assume that the error rate is proportional not only to the number of errors but also to the time spent in debugging, so that the chance of discovery increases as time goes on. Thayer, Lipow, and Nelson (ref. 7-7) give another extension in which more than one error can be detected in a time interval, with no correction being made after the end of this interval. New maximum likelihood estimators of N and f are also given.

All the models presented so far attempt to predict the reliability of a software system after a period of testing and debugging. In a good example of an application of this type of model, Miyamoto (ref. 7-8) describes the development of an on-line, real-time system for which a requirement is that the mean time between software errors (MTBSE) has to be longer than 30 days. The system will operate on a day-by-day basis, 13 hours a day. (It will be loaded every morning and reset every evening.) The requirement is formulated so that the value of the reliability function R(t) for t = 13 hours has to be greater than e^(-13/MTBSE) = 0.9672. Miyamoto also gives the MTBSE variations in time as a function of the debugging time. The MTBSE remained low for most of the debugging period, jumping to an acceptable level only at the end. The correlation coefficient between the remaining number of errors in the program and the failure rate was 0.77, but the scatter plot shown is disappointing and suggests that the correlation coefficient between the failure rate and any other system variable could have given the same value. In the same paper, Miyamoto describes in detail how the system was tested. None of the above models takes into account that in the process of fixing a bug, new errors may be introduced in the system. The final number given is usually the mean time between software errors, but only Miyamoto points out that this number is valid only for a specific set of workload conditions.
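The quoted figure can be checked directly from this exponential form, reading the 30-day MTBSE in operating hours (30 days × 13 hours/day). A short sketch (Python):

```python
import math

def reliability(t_hours, mtbse_hours):
    # Exponential model: probability of no software error in t hours
    return math.exp(-t_hours / mtbse_hours)

print(round(reliability(13.0, 30.0 * 13.0), 4))   # 0.9672, as quoted
```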
Other models for studying the improvement in reliability of a software item during its development phase exist, such as Littlewood (ref. 7-9), in which the execution of a program is simulated with continuous-time Markov switching among smaller programs. This model also demonstrates that under certain conditions in the software system structure, the failure process will be asymptotically Poisson. Trivedi and Shooman (ref. 7-10) give another Markov model, in which the most probable number of errors that will have been corrected at any time t is based on preliminary modeling of the error occurrence and repair rates. The model also predicts the system's availability and reliability at time t.
Schneidewind (ref. 7-11) describes a model which assumes that the failure process is described by a nonhomogeneous Poisson process. The rate of error detection in a time interval is assumed to be proportional to the number of errors present during that interval. This leads to a Poisson distribution with a decreasing hazard rate.

Data Domain Models

Another approach to software reliability modeling is studying the data domain. The first model of this kind is described by Nelson (ref. 7-12). In principle, if sets of all input data upon which a computer program can operate are identified, the reliability of the program can be estimated by running the program for a subset of input data. Thayer, Lipow, and Nelson (ref. 7-7) describe data domain techniques in more detail. Schick and Wolverton (ref. 7-13) compare the time domain and data domain models. However, different applications will tend to use different subsets of all possible input data, yielding different reliability values for the same software system. This fact is formally taken into account by Cheung (ref. 7-14), in which software reliability is estimated from a Markov model whose transition probabilities depend on a user profile. Cheung and Ramamoorthy (ref. 7-15) give techniques for evaluating the transition probabilities for a given profile.

In the Nelson model (ref. 7-12) a computer program is described as a computable function F defined on the set E = (E_i, i = 1, ..., N), where E includes all possible combinations of input data. Each E_i is a sample of data needed to make a run of the program. Execution of a program produces, for a given value of E_i, the function value F(E_i).

In the presence of bugs or design errors, a program actually implements F'. Let E_e be the set of input data such that F'(E_e) produces an execution failure (execution terminates prematurely, or fails to terminate, or the results produced are not acceptable). If N_e is the quantity of E_i leading to failure F_e,

   p = N_e/N

is the probability that a run of the program will result in an execution failure. Nelson defines the reliability R as the probability of no failures, or

   R = 1 - p = 1 - N_e/N

In addition, this model is further refined to account for the fact that the inputs to a program are not selected from E with equal a priori probability but are selected according to some operational requirement. This requirement may be characterized by a probability distribution (P_i, i = 1, ..., N), P_i being the probability that the selected input is E_i. If we define the auxiliary variables Y_i to be 0 if a run with E_i is successful, and 1 otherwise,

   p = Σ (i = 1 to N) P_i Y_i

where p is again the probability that a run of the program will result in an execution failure.

A mathematical definition of the reliability of a computer program is given as the probability of no execution failures after n runs:

   R(n) = R^n = (1 - p)^n

The model elaborates on how to choose input data values at random for E according to the probability distribution P_i to obtain an unbiased estimator of R(n). In addition, if the execution time for each E_i is also known, the reliability function can be expressed in terms of the more conventional probability of no failure in a time interval (0,t).
of the program. Execution of a program produces, for a
given value of E i, the function value F(Ei). The third category includes models in which software reli-
In the presence of bugs or design errors, a program actually ability (as well as software quality in general) is postulated to
implements F'. Let Ee be the set of input data such that F'(E e) obey certain universal laws (Ferdinand and Sutherla, ref. 7-16;
produces an execution failure (execution terminates prema- Fitzsimmons and Love, ref. 7-17), Although such models have
turely, or fails to terminate, or the results produced are not generated great interest, their general validity has never been
acceptable). IfN e is the quantity of E i leading to failure F e, proven and, at most, they only give an estimate of the number
of bugs present in a program.
The best-known axiomatic model is the so-called software
N_
p=-- science theory developed by Halstead (see ref. 7-18). Halstead
N
used an approach similar to thermodynamics to provide quan-
titative measures of program level, language level, algorithm
is the probability that a run of the program will result in an
purity, program clarity, effect of modularization, programming
execution failure. Nelson defines the reliability R as the prob-
effort, and programming time. In particular, the estimated
ability of no failures or
number of bugs in a program is given by the expression

R=l-p=l -Ne
N

In addition, this model is further refined to account for the


fact that the inputs to a program are not selected from E with where
equal a priori probability but are selected according to some
operational requirement. This requirement may be character- K proportionality constant
ized by a probability distribution (P i, i = 1..... N), Pi being the E0 mean number of mental discriminations between errors
probability that the selected input is E i. If we define the auxil- made by programmer

NASA/TP--2000-207428 101
   V     volume of algorithm implementation, N log2(n)

where

   N   program length
   n   size of vocabulary defined by language used

More specifically,

   N = N_1 + N_2

   n = n_1 + n_2

where

   N_1   total number of occurrences of operators in a program
   N_2   total number of occurrences of operands in a program
   n_1   number of distinct operators appearing in a program
   n_2   number of distinct operands appearing in a program

and E_0 has been empirically estimated to be approximately 3000.
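For a concrete token stream the counts, the volume, and the bug estimate take only a few lines. The sketch below (Python) uses a deliberately simplified, hypothetical operator set and assumes K = 1 together with the empirical E_0 = 3000:

```python
import math
from collections import Counter

OPERATORS = {"=", "+", "*", "(", ")", "if", "return"}   # simplified set

def software_science(tokens):
    ops = Counter(t for t in tokens if t in OPERATORS)
    operands = Counter(t for t in tokens if t not in OPERATORS)
    N1, N2 = sum(ops.values()), sum(operands.values())  # total occurrences
    n1, n2 = len(ops), len(operands)                    # distinct counts
    N, n = N1 + N2, n1 + n2
    V = N * math.log2(n)                                # volume
    B = 1.0 * V / 3000.0                                # bug estimate, K = 1
    return N, n, V, B

tokens = ["y", "=", "a", "*", "x", "+", "b", "return", "y"]
N, n, V, B = software_science(tokens)
print(f"length N = {N}, vocabulary n = {n}, volume V = {V:.1f}, bugs B = {B:.4f}")
```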
Many publications have either supported or contradicted the results proposed by the software science theory, including a special issue of the IEEE Transactions on Software Engineering (ref. 7-18). Though unconventional, the measures proposed by the software science theory are easy to compute, and in any case it is an alternative for estimating the number of bugs in a software system. Table 7-1 shows a correlation coefficient between the real number of bugs found in a software project and the number predicted by the software science theory for several experiments. There are significant correlations with error occurrences in the programs, although the data reported by Fitzsimmons and Love (ref. 7-17) (obtained from three General Electric software development projects totaling 166 280 statements) show weaker correlation than the original values reported by Halstead.

TABLE 7-1.--CORRELATION OF EXPERIENCE TO SOFTWARE BUG PREDICTION BY AXIOMATIC MODELS

   Reference                            Correlation coefficient between
                                        predicted and real number of bugs
   Funami and Halstead (ref. 7-19)      0.98, 0.83, 0.92
   Cornell and Halstead (ref. 7-20)     0.99
   Fitzsimmons and Love (ref. 7-17):
     System A                           0.81
     System B                           .25
     System C                           .75
     Overall                            .76

Other Models

The model presented by Costis, Landrault, and Laprie (ref. 7-21) is based on the fact that for well-debugged programs, a software error results from conditions on both the input data set and the logical paths encountered. We can then consider these events random and independent of the past behavior of the system (i.e., with constant failure rate). Also, because of their rarity, design errors or bugs may have the same effect as transient hardware faults.

The model is built on the following assumptions:

(1) The system initially possesses N design errors or bugs that can be totally corrected by N interventions of the maintenance team.
(2) The software failure rate is constant for a given number of system design errors.
(3) The system starts and continues operation until a fault is detected; it then passes to a repair state. If the fault is due to a hardware transient, the system is put into operation again after a period of time for which the probability density function is assumed to be known. If the fault is due to a software failure, maintenance takes place, during which the error may be removed, more errors may be introduced, or no modifications may be made to the software.

The model computes the availability of the system as a function of time by using semi-Markovian theory. That is, the system will make state transitions according to the transition probabilities matrix, and the time spent in each state is a random variable whose probability density function is either assumed to be known or is measurable. The main result presented by Costis, Landrault, and Laprie (ref. 7-21) is how the availability of the system improves as the design errors are being removed, approaching its asymptotic value (when all the design errors have been removed) under some restrictive conditions. They show that the minimum availability depends only on the software failure rate at system integration and not on the order of occurrence of the different types of design errors. The presence of different types of design errors only extends the time necessary to approach the asymptotic availability.

The mathematics of the model is complex, requiring numerical computation of inverse Laplace transforms for the transition probabilities matrix, and it is not clear that the parameters needed to simulate a real system accurately can be easily measured from a real system.

Finally, some attempts have been made to model fault-tolerant software through module duplication (Hecht, ref. 7-22), and warnings have been given about how not to measure software reliability (Littlewood, ref. 7-23).

None of the preceding models characterizes system behavior accurately enough to give the user a guaranteed level of performance under general workload conditions. They estimate the number of bugs present in a program but do not provide any accurate method of characterizing and measuring operational system unreliability due to software.
There is a large gap between the variables that can be easily measured in a running system and the number of bugs in its software. Instead, a cost-effective analysis should allow precise evaluation of software unreliability from variables easily measurable in an operational system, without knowing the details of how the software has been written.

Trends and Conclusions

With software reliability being questioned as a science, programming process control appears to be the popular answer to both software reliability and software quality. Measurements of the programming process are supposed to ensure the generation of an "error-free" programming product, if such an achievement is possible. Further, quality and productivity measurements combined with select leading process indicators are supposed to fulfill the control requirements for developing quality software. This so-called answer is similar to a philosophy that failed in attempts to develop hardware reliability control. Reliability should be used to predict field performance. Especially with real-time communications and information management systems, the field performance requirements vastly overshadow the field defect level requirements. How can we change the present popular trend (toward programming process control) to one that includes a probabilistic reliability approach? The answer is not a simple one; these models must be finely balanced so that a clear separation of reliability and quality can be achieved.

The trends for reliability tasks in the large-scale integrated circuit (LSI) and very large-scale integrated circuit (VLSI) hardware areas are in the failure modes and effects analysis and the control of failures. The same emphasis can be placed on software (programming bugs or software errors). Once this is done, reliability models can reflect system performance due to hardware and software "defects" because their frequency of occurrence and the effects of their presence in the operation will be known. This philosophy focuses on the complete elimination of critical defects and the specified tolerance level of minor defects. Normally, minor defects are easier to find and more numerous than the most critical defects and therefore dominate a defect-removal-oriented model.

We conclude that the proper method for developing quality programming products combines quality, reliability, and a selective measurements program. In addition, a redirection of the programming development process to be based in the future on the criticality of defects, their number, and their budgeting at the various programming life-cycle phases is the dominant requirement. A reliability growth model will monitor and control the progress of defect removal for the design phases and prove a direct correlation to actual system field performance. With such an approach, a system can be placed in operation at a customer site at a preselected performance level as predicted by the growth model.

Software

For several reasons, we have discussed software models before describing software. The reader should not be biased or led to a specific type of software. Few papers on software reliability make a distinction between product software, embedded software, applications software, and support software. In addition, the models do not distinguish between vendor-acquired software and in-house software and combinations of these.

Categories of Software

According to Electronic Design Magazine, the United States supports at least 50 000 software houses, each grossing approximately $500 000 per year. It is projected that software sales in the United States will surpass hardware sales and reach the $60 billion range. International competition will eventually yield error-free software.

In-house and vendor-acquired software can be categorized as follows:

(1) Product
(2) Embedded
(3) Applications
(4) Support

Product software.--This categorization is from the viewpoint of the software specialist. Communications digital switching systems software is included as "product software" along with the software for data packet switching systems, text systems, etc.

Embedded software.--This category comprises programming systems embedded in physical products to control their operational characteristics. Examples of products are radar controllers, boiler controls, avionics, and voice recognition systems.

Applications software.--This category is usually developed to service a company's internal operations. The accounting area of this category covers payroll systems, personnel systems, etc. The business area includes reservations systems (car, motel), delivery route control, manufacturing systems, and on-line agent systems.

Support software.--This category consists of the software tools needed to develop, test, and qualify other software products or to aid in engineering design and development. The category includes compilers, assemblers, test executives, error seeders, and development support systems.

Vendor-acquired software.--This software can be absorbed by the previous four categories and is only presented here for clarification. It includes FORTRAN compilers, COBOL compilers, assemblers, the UNIX operating system, the ORACLE data base system, and application packages.
Processing Environments (e) Military systemdefects: significant operational restric-
tions; loss of intermediate fast frequency function in
Software can usually be developed in three ways: (I) inter-
detection systems; loss of one or more antijamming
active, (2) batch, and (3) remote job entry. In the operational features
environment, these expand to include real time. Real-time
(f) Space system defects: occasional loss of telemetry data
development can be characteristic of both product software and and communications; significant operational or control
embedded software. However, because product software and restrictions
embedded software differ greatly in their requirements and in (g) Process control defects: process cannot consistently
their development productivity and quality methodologies, handle exceptions; inability to complete all process
they should not be combined (e.g., avionics has size, weight, control functions
and reliability requirements resulting in dense software of a (3) Minor restrictions (generic: loss of features; inability to
type that a communications switching system does not have). effectively modify program)
(a) MIS software defects: mishandling of records; system
Severity of Software Defects occasionally cannot handle exceptions
(b) CAD/CAM/CAE defects: occasional errors produced
We must categorize and weigh the effects of failures. The in design system; faults produced for which there are
following four-level defect severity classification is presented workarounds
in terms of typical software product areas: (c) Telephone switching defects: loss of some support
feature, such as call forwarding or conferencing
(I) System unusable (generic: frequent system crashes) (d) Data communications defects: occasional inability to
(a) Management information system (MIS) software keep up with data rate or requests; occasional minor loss
defects: inability to generate accounts payable or to of data transmitted or received
access data base; improper billing (e) Military system defects: loss of some operational modes
(b) Computer-aided design (CAD), manufacturing (CAM), such as tracking history, monitor or slave model of oper-
and engineering (CAE) defects: inability to use systems; ation, multiple option selection
CAD produces incorrect designs (f) Space system defects: occasional loss of update infor-
(c) Telephone switching defects: frequent service outages; mation or frame; occasional loss of subframe synchroni-
loss of emergency communications service zation or dropouts of some noncritical measurements
(d) Data communications defects: loss of one or (g) Process control defects: problems that require a work-
more signaling channels; unrecoverable errors in trans- around to be implemented; minor reductions in rate or
mission; erratic service throughput; manual intervention at some points in the
(e) Military system defects: success ofmission jeopardized; process
inability to exercise fire control systems; loss of elec- (4) No restrictions (generic: cosmetic; misleading documenta-
tronic countermeasure capabilities tion; inefficient machine/person interface)
(f) Space system defects: success of space mission jeopar-
dized; risk of ground support team or flight crew life; loss
of critical telemetry information
(g) Process control defects: waste of labor hours, raw Software Bugs Compared With Software Defects
materials, or manufactured items; loss of control result-
Software bugs are not necessarily software defects: the term
ing in contamination or severe air and water pollution "defect" implies that removal or repair is necessary, and the term
(2) Major restrictions (generic: loss of some functions)
"bug" implies removal, some degree of correction, or a certain
(a) MIS software defects: loss of some ticket reservation
level of toleration. A recent example of bug toleration from the
centers or loss of certain features such as credit card
telecommunications industry is contained in reference 7-24:
verification
(b) CAD/CAM/CAE defects: loss of some features in
It is not technically or economically feasible to
computer-aided design such as the update function;
detect and fix all software problems in a system
significant operational restrictions in CAM orCAE areas;
as large as No. 4 Electronic Switching System
faults produced for which there is no workaround
(ESS). Consequentl y, a strong emphasis has been
(c) Telephone switching defects: loss of full traffic cap-
placed on making it sufficiently tolerant of soft-
ability; loss of billing
ware errors to provide successful operation and
(d) Data communications defects: occasional loss of con-
fault recovery in an environment containing soft-
sumer data; inability to operate in degraded mode with
ware problems.
loss of equipment

104 NAS A/TP---2000-207428


Various opinions exist in the industry about what constitutes a software failure. Definitions range from a software failure being classed as any software-caused processor restart or memory reload to a complete outage. One argument against assigning an MTBF to software-caused processor restarts or memory reloads is that if the system recovers in the proper manner by itself, there has not been a software failure, only a software fault or the manifestation of a software bug. From a systems reliability viewpoint, if the system recovers within a reasonable time, the event is not to be classed as a software failure.

Hardware and Software Failures

Microprocessor-based products have more refined definitions. Four types of failure may be considered: (1) hardware catastrophic, (2) hardware transient, (3) software catastrophic, and (4) software transient. In general, the catastrophic failures require a physical or remote hardware replacement, a manual or remote unit restart, or a software program patch. The transient failure categories can result in either restarts or reloads for the microprocessor-based systems, subsystems, or individual units and may or may not require further correction. A recent reliability analysis of such a system assigned ratios to these categories. Hardware transient faults were assumed to occur at 10 times the hardware catastrophic rate, and software transient faults were assumed to occur at 100 to 500 times the software catastrophic rate.

The time of day is of great concern in reliability modeling and analysis. Although hardware catastrophic failures occur at any time of the day, they often manifest themselves during busier system processing times. On the other hand, hardware transient failures generally occur during the busy hours, as do software transient failures. The availability of restart times is also critical, and in the example presented in reference 7-25, the system downtime is presented as a function of the MTBF of the software and the reboot time. When a system's predicted reliability is close to the specified reliability, such a sensitivity analysis must be performed.
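That sensitivity analysis reduces to simple arithmetic once a software MTBF and a reboot (recovery) time are assumed. A minimal sketch (Python; all figures below are hypothetical and are not taken from reference 7-25):

```python
def annual_downtime_minutes(mtbf_hours, reboot_minutes, op_hours=8760.0):
    # Expected software-caused reloads per year times recovery time per reload
    return (op_hours / mtbf_hours) * reboot_minutes

# Downtime grid: software MTBF versus reboot time
for mtbf in (500.0, 1000.0, 2000.0):            # hours between reloads
    row = [f"{annual_downtime_minutes(mtbf, r):7.1f}" for r in (2.0, 5.0, 10.0)]
    print(f"MTBF {mtbf:6.0f} hr | downtime (min/yr) at 2-, 5-, 10-min reboots:", *row)
```

When the predicted and specified reliabilities are close, a grid of this kind shows immediately whether the margin survives a pessimistic reboot-time assumption.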
between the first year and the twentieth year. We also tend to
Reference 7-26 presents a comprehensive summary of
over-simplify the actual situations. Even with five vendors of
developed models and methods that encompass software life-
these 4K devices, the manufacturing quality control person
cycle costs, productivity, reliability and error analysis, com-
may have to set up different screens to eliminate the defective
plexity, and the data parameters associated with these models
devices from different vendors. Thus, the system software will
and methods. The various models and methods are compared
in reference 7-26 on a common basis, and the results are see many different transient memory problems and combina-
tions of them in operation.
presented in matrix form.
Central control technology has prevailed in communications
systems for 25 years. The industry has used many of its old
modeling tools and applied them directly to distributed control
Manifestations of Software Bugs
structures. Most modeling research was performed on large
Many theories, models, and methods are available for quan- duplex processors. With an evolution through forms of mul-
tifying software reliability. Nathan (ref. 7-27) stated, "It is tiple duplex processors and load-sharing processors and onto
contrary to the definition of reliability to apply reliability the present forms of distributed processing architectures, the

NASA/TP--2000-207428 105
With fully distributed control systems, the software reliability model must be conceptually matched to the software design to achieve valid predictions of reliability.

The following trends can be formulated for software transient failures:

(1) Software transient failures decrease as the system architecture approaches a fully distributed control structure.
(2) Software transient failures increase as the processing window decreases (i.e., less time allowed per function, fast timing mode entry, removal of error checking, removal of system ready checks, etc.).

A fully distributed control structure can be configured to operate as its own error filter. In a hierarchy of processing levels, each level acts as a barrier to the level below and prevents errors or transient faults from propagating through the system. Central control structures cannot usually prevent this type of error propagation.

If the interleaving of transaction processes in a software program is reduced, such as with a fully distributed control architecture, the transaction processes are less likely to fail. This is especially true with nonconsistent user interaction as experienced in communications systems. Another opinion on software transient failures is that the faster a software program runs, the more likely it is to cause errors (such as encountered in central control architectures). Some general statements can be formulated:

(1) In large communications systems, software transient failures tend to remain constant, and software catastrophic failures tend to decrease with time.
(2) In small communications systems, software transient failures decrease with time.
(3) As the size of the software program increases, software transient failures decrease and hardware failures increase.

A "missing link" needs further discussion. Several methods can be used to quantify the occurrence of software bugs. However, manifestations in the system's operations are detrimental to the reliability analysis because each manifestation could cause a failure event. The key is to categorize levels of criticality for bug manifestations and estimate their probability of occurrence and their respective distributions. The importance of this increases with the distribution of the hardware and software. Software reliability is often controlled by establishing a software reliability design process. Reference 7-24 presents techniques for such a design process control. The final measure is the system test, which includes the evaluation of priority problems and the performance of the system while under stress as defined by audits, interrupts, reinitialization, and other measurable parameters. The missing link in quantifying software bug manifestations needs to be found before we can obtain an accurate software reliability model for measuring tradeoffs in the design process on a predicted performance basis. If a software reliability modeling tool could additionally combine the effects of hardware, software, and operator faults, it would be a powerful tool for making design tradeoff decisions. Table 7-2 is an example of the missing link and presents a five-level criticality index for defects. Previously, we discussed a four-level defect severity classification with level four not causing errors. These examples indicate the flexibility of such an approach to criticality classification.

TABLE 7-2.--CRITICALITY INDEX

   Bug manifestation   Defect removal   Level of      Failure type     Failure characteristic
   rate                rate             criticality
   4 per day           1 per month      5             Transient        Errors come and go
   3 per day           1 per week       4             Transient        Errors are repeated
   2 per week          1 per month      3             Transient or     Service is affected
                                                      catastrophic
   1 per month         2 per year       2             Transient or     System is partially down
                                                      catastrophic
   1 per two years     1 per year       1             Catastrophic     System stops

Software reliability measurement and its applications are discussed in reference 7-31 for two of the leading software reliability models, Musa's execution time model and Littlewood's Bayesian model. Software reliability measurement has made substantial progress and continues to progress as additional projects collect data. Work on the major hurdle, establishing a software reliability measurement tool for use during the requirements stage, is under way.

Comparing references 7-32 and 7-31 yields an insight into the different methods of achieving software reliability. The method described in reference 7-32 concentrates on the design process meeting a preset level of reliability or performance at the various project design stages. When the system meets its final software reliability acceptance criteria, the process is complete. Reference 7-31 describes a model that provides the design process with a continuous software reliability growth prediction. The Musa model can compare simultaneous software developments and can be used extensively in making design process decisions. An excellent text on software reliability based on extensive data gathering was published in 1987 (ref. 7-33).


reliability based on extensive data gathering was published in 7-13. Schick, G.J.; and Wolverton. R.W.: An Analysis of Computing Soft-
ware Reliability Models. IEEE Trans. Software Eng.. vol. SE-4,
1987 (ref. 7-33).
no. 2, Mar. 1978, pp, 104-120.
We can choose a decreasing, constant, or increasing software
7-14. Cheung, R.C.: A User-Oriented Software Reliability Model. [EEE
bug removal rate for systems software. Although each has its Trans. Software Eng.. vol. SE--6, no. 6, Mar. 1970, pp. 118---125.
application to special situations and systems, a decreasing 7-15. Cheung, R.C.: and Ramamoorthy, C.V.: Optimum Measurement of

software bug removal rate will generally be encountered. Program Path Frequency and Its Applications. International Federa-
tion of Automatic Control: 6th World Congress, Instrument Society of
Systems software also has advantages in that certain software
America, 1975, vol. 4, paper 34-3.
defects can be temporarily patched and the permanent patch
7-16. Ferdinand, A.E.; and Sutherla, T.W.: A Theory of Systems Complexity.
postponed to a more appropriate date. Thus, this type of defect Int. J. Gen. Syst., vol. 1, no. 1, 1974, pp. 19-33.
manifestation is treated in general as one that does not affect 7-17. Fitzsimmons, A.; and Love, T.: Review and Evaluation of Software

service, but it should be included in the overall software quality Science. Comput. Surv., vol. 10, no. I, Mar. 1978, pp. 3-18.
7-18. Commemorative Issue in Honor of Dr. Maurice H. Halstead. IEEE
assessment. The missing link concerns software bug manifes-
Trans. Software Eng., vol. SE-5, no. 2, Mar. 1979.
tations. As described in reference 7-34, until the traditional
7-19. Funami, Y.; and Halstead, M.H.: A Software Physics Analysis of
separation of hardware and software systems is overcome in Akiyama's Debugging Data. Purdue University, CSD TR- 144, 1975.
the design of large systems, it will be impossible to achieve a 7-20. Cornell, L.; and Halstead, M H.: Predicting the Number of Bugs Expected
satisfactory performance benchmark. This indicates that soft- in a Program Module. Purdue University, CSD TR-202, 1976.
7- 21. Costis, A.; Landrault, C.; and Laprie, J.C.: Reliability and Availability
ware performance modeling has not yet focused on the specific
Models for Maintained Systems Featuring Hardware Failures and
causes of software unreliability.
Design Faults. IEEE Trans. Comput., vol. C-27, June 1978,
References

7-1. Siewiorek, D.P.; and Swarz, R.S.: The Theory and Practice of Reliable System Design. Digital Press, Bedford, MA, 1982, pp. 206-211.
7-2. Reliability Prediction of Electronic Equipment. MIL-HDBK-217E, Jan. 1990.
7-3. Shooman, M.L.: The Equivalence of Reliability Diagrams and Fault-Tree Analysis. IEEE Trans. Reliab., vol. R-19, no. 2, May 1970, pp. 74-75.
7-4. Musa, J.D.: A Theory of Software Reliability and Its Applications. IEEE Trans. Software Eng., vol. SE-1, no. 3, Sept. 1975, pp. 312-327.
7-5. Jelinsky, Z.; and Moranda, P.B.: Applications of a Probability Based Method to a Code Reading Experiment. Record 1973: IEEE Symposium on Computer Software Reliability, IEEE, New York, 1973, pp. 78-82.
7-6. Wolverton, R.W.; and Schick, G.J.: Assessment of Software Reliability. TRW-SS-73-04, Los Angeles, CA, 1974.
7-7. Thayer, T.A.; Lipow, M.; and Nelson, E.C.: Software Reliability: A Study of a Large Project Reality. North Holland, 1978.
7-8. Miyamoto, I.: Software Reliability in Online Real Time Environment. International Conference on Reliable Software, IEEE/Automation Industries, Inc., Silver Spring, MD, 1975, pp. 518-527.
7-9. Littlewood, B.: A Reliability Model for Markov Structured Software. International Conference on Reliable Software, IEEE/Automation Industries, Inc., Silver Spring, MD, 1975, pp. 204-207.
7-10. Trivedi, A.K.; and Shooman, M.L.: A Many-State Markov Model for the Estimation and Prediction of Computer Software Performance. International Conference on Reliable Software, IEEE/Automation Industries, Inc., Silver Spring, MD, 1975, pp. 208-220.
7-11. Schneidewind, N.F.: Analysis of Error Processes in Computer Software. International Conference on Reliable Software, IEEE/Automation Industries, Inc., Silver Spring, MD, 1975, pp. 337-346.
7-12. Nelson, E.C.: A Statistical Basis for Software Reliability Assessment. TRW, 1973.
7-22. Hecht, H.: Fault-Tolerant Software for Real-Time Applications. Comput. Surv., vol. 8, no. 4, Dec. 1976, pp. 391-407.
7-23. Littlewood, B.: How To Measure Software Reliability and How Not To. IEEE Trans. Software Eng., vol. SE-5, no. 2, June 1979, pp. 103-110.
7-24. Davis, E.A.; and Giloth, P.K.: Performance Objectives and Service Experience. Bell Syst. Tech. J., vol. 60, no. 6, July-Aug. 1981, pp. 1203-1224.
7-25. Aveyard, R.L.; and Man, F.T.: A Study on the Reliability of the Circuit Maintenance System-1B. Bell Syst. Tech. J., vol. 59, no. 8, Oct. 1980, pp. 1317-1332.
7-26. Software Engineering Research Review: Quantitative Software Models. Report No. SPR-1, Data and Analysis Center for Software (DACS), Griffiss AFB, NY, 1979.
7-27. Nathan, I.: A Deterministic Model To Predict "Error-Free" Status of Complex Software Development. Workshop on Quantitative Software Models, IEEE, New York, 1979.
7-28. Wong, K.L.: Unified Field (Failure) Theory--Demise of the Bathtub Curve. Annual Reliability and Maintainability Symposium, IEEE, New York, 1981, pp. 402-407.
7-29. Malec, H.A.: Maintenance Techniques in Distributed Communications Switching Systems. IEEE Trans. Reliab., vol. R-30, no. 3, Aug. 1981, pp. 253-257.
7-30. Angus, J.E.; and James, L.E.: Combined Hardware/Software Reliability Models. Annual Reliability and Maintainability Symposium, IEEE, New York, 1982, pp. 176-181.
7-31. Musa, J.D.: The Measurement and Management of Software Reliability. IEEE Proceedings, vol. 68, no. 9, Sept. 1980, pp. 1131-1143.
7-32. Giloth, P.K.; and Witsken, J.R.: No. 4 ESS--Design and Performance of Reliable Switching Software. International Switching Symposium (ISS '81--CIC), IEEE, 1981, pp. 33A1/1-9.
7-33. Musa, J.D.; Iannino, A.; and Okumoto, K.: Software Reliability. McGraw-Hill, 1987.
7-34. Malec, H.A.: Transcribing Communications Performance Standards Into Design Requirements. ITT Adv. Technol. Center Tech. Bul., vol. 2, no. 1, Aug. 1981.



Reliability Training*

1. In-house and vendor-acquired software can be classified into what four categories?

   A. Product, embedded, applications, and error-free software
   B. Useful, embedded, applications, and harmful software
   C. Product, embedded, applications, and support software

2. Name the four categories of software reliability models.

   A. Time domain, data axiom, corollary, and many
   B. Time domain, data domain, axiomatic, and other
   C. Time axiom, data domain, frequency domain, and corollary

3. Can the bug manifestation rate be

   A. Equal to the defect removal rate?
   B. Greater than the defect removal rate?
   C. Less than the defect removal rate?
   D. All of the above?

4. What are the various software processing environments?

   A. Interactive, batch, remote job entry, and real time
   B. Hyperactive, batch, close job entry, and compressed time
   C. Interactive, batch, real job entry, and remote time

5. Name the four levels of severity for software defect categorizations.

   A. Generic system, functional, category restrictions, and working
   B. System unusable, major restrictions, minor restrictions, and no restrictions
   C. System unusable, system crashes, loss of features, and minor bugs

6. An online, real-time system has a mean time between software errors of 15 days. The system operates 8 hours per day. What is the value of the reliability function? Use the Miyamoto model.

   A. 0.962   B. 0.999   C. 0.978

7. Is it always necessary to remove every bug from certain software products?

   A. Yes   B. No   C. Don't know

8. Name the four types of hardware and software failure.

   A. Hardware part, hardware board, software module, software plan
   B. Hardware plan, hardware build, software cycle, software type cycle
   C. Hardware catastrophic, hardware transient, software catastrophic, software transient

*Answers are given at the end of this manual.
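As a worked sketch of the arithmetic in question 6, assuming the Miyamoto model is applied here as a simple exponential reliability function, R(t) = exp(-t/MTBF), with the mean time between software errors converted to clock hours (this reading of the model is an assumption; see ref. 7-8 for the model itself):

    import math

    # Assumption: reliability follows R(t) = exp(-t / MTBF), with the
    # 15-day mean time between software errors expressed in clock hours.
    mtbf_hours = 15 * 24    # mean time between software errors
    t_hours = 8             # one 8-hour operating day

    reliability = math.exp(-t_hours / mtbf_hours)
    print(f"R({t_hours} h) = {reliability:.3f}")    # prints R(8 h) = 0.978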

Reference Document for Inspection: "Big Bird's" House Concept

What is desired: bird house

For whom (client, customer, user): "Big Bird" (the tall yellow bird on "Sesame Street")

Why: Why not, even Oscar the Grouch has a house.

"Big Bird's" General Concept

"Big Bird" needs a house (and he's willing to pay for it) and he wants it big enough for him to live in (he's over 6 feet tall). He
wants to be able to enter and leave the house comfortably and to be able to lock out the big bad wolves (even those dressed as granny). He wants the materials used to be strong enough to support his weight (he's not particularly svelte) and the house to be weatherproofed enough to keep him dry and warm in stormy weather, as defined by the post office (rain, sleet, hail, snow, wind).

Class Meeting Exercise: Requirements Inspection

Statement of Problem: "Big Bird" has no house.

Step                                          Life Cycle Stage            Status
Step 1: Build a house.                        Concept                     Done
Step 2: State the kind of house desired.      Requirements
                                                System                    Done
                                                Subsystem                 To be inspected
Step 3: Make drawings of desired house.       Design
Step 4: Build house.                          Development
Step 5: Walk through house                    Test
        (open doors and windows).
Step 6: Pay for house.                        Delivery
Step 7: Live in house.                        Operation and maintenance

Note: At any step, perform analysis and SQUAWK if changes are needed.

Reference Document for Inspection System Requirements

"Big Bird's" House Systems Requirements

Excuse Me, Are Those Requirements?

Well, yes, after a bit of questioning and head scratching, the following system requirements were defined:

1. The house shall accommodate "Big Bird" and his belongings.

2. The house shall provide easy access to "Big Bird."

3. The building materials shall be strong enough to support "Big Bird" (who is, ahem, rather rotund).

4. The building materials shall deny entrance to big bad wolves (straw definitely being out of favor).

5. The house shall have security measures to prevent easy access to any nefarious beings intending "fowl" play.

6. The building materials shall be weatherproof and found in nature.

7. The building materials shall be low cost (even birds have budgets).

8. The house shall be one room.

9. The house shall have one door.

10. The house shall have a floor.

11. The house shall have a roof.

12. The house shall have one window.

13. The house shall rest on level ground beneath his tree.

14. There will be no electricity, plumbing, heating, or air conditioning (client has feathers, candles, a bird bath, and ice cream).

15. Client will bring his own bed (BHOB).

16. The cost of the house shall not exceed 80 bird bucks.

"Big Bird's" Requirements Checklist

Clarity

1. Are requirements specified in an implementation-free way so as not to obscure the original requirements?
2. Are implementation, method, and technique requirements kept separate from functional requirements?
3. Are the requirements clear and unambiguous (i.e., are there aspects of the requirements that you do not
understand; can they be misinterpreted)?

Completeness

1. Are requirements stated as completely as possible? Have all incomplete requirements been captured as TBD's?
2. Has a feasibility analysis been performed and documented?
3. Is the impact of not achieving the requirements documented?
4. Have trade studies been performed and documented?
5. Have the security issues of hardware, software, operations personnel, and procedures been addressed?
6. Has the impact of the project on users, other systems, and the environment been assessed?
7. Are the required functions, external interfaces, and performance specifications prioritized by need date? Are they prioritized
by their significance to the system?

Compliance

1. Does this document follow the project's system documentation standards?


2. Does it follow JPL's standards?
3. Does the appropriate standard prevail in the event of inconsistencies?

Consistency

1. Are the requirements stated consistently without contradicting themselves or the requirements of related systems?
2. Is the terminology consistent with the user and/or sponsor's terminology?

Correctness

1. Are the goals of the system defined?

Data Usage

1. Are "don't care" condition values truly "don't care?" ("Don't care" values identify cases when the value of a condition or flag
is irrelevant, even though the value may be important for other cases.)
2. Are "don't care" condition values explicitly stated? (Correct identification of "don't care" values may improve a design's
portability.)

Functionality

1. Are all functions clearly and unambiguously described?


2. Are all described functions necessary and together sufficient to meet mission and system objectives?

Interfaces

1. Are all external interfaces clearly defined?


2. Are all internal interfaces clearly defined?
3. Are all interfaces necessary, together sufficient, and consistent with each other?

Maintainability

1. Have the requirements for system maintainability been specified in a measurable, verifiable manner?
2. Are requirements written to be as weakly coupled as possible so that rippling effects from changes are minimized?

Performance

1. Are all required performance specifications and the amount of performance degradation that can be tolerated explicitly stated
(e.g., consider timing, throughput, memory size, accuracy, and precision)?
2. For each performance requirement defined,
a. Do rough estimates indicate that they can be met?
b. Is the impact of failure to meet the requirement defined?

Reliability

1. Are clearly defined, measurable, and verifiable reliability requirements specified?


2. Are there error detection, reporting, and recovery requirements?
3. Are undesired events (e.g., single-event upset, data loss or scrambling, operator error) considered and their required responses
specified?
4. Have assumptions about the intended sequence of functions been stated? Are these sequences required?
5. Do these requirements adequately address the survivability of the system after a software or hardware fault, from the point of view of hardware, software, operations personnel, and procedures?

Testability

1. Can the system be tested, demonstrated, inspected, or analyzed to show that it satisfies requirements?
2. Are requirements stated precisely to facilitate specification of system test success criteria and requirements?

Traceability

1. Are all functions, structures, and constraints traced to mission/system objectives?


2. Is each requirement stated in a manner that it can be uniquely referenced in subordinate documents?

"Big Bird's" Formal Inspection Subsystem Requirements
'Subsystem Requirements' Written for Big Bird's Approval

For each requirement, the inspector either marks it Acceptable [ ] or records a defect with a severity (Major, Minor, or Open issue), a category (Missing, Wrong, or Extra), a Type, an Origin, and a Defect classification.

1. The house shall be made of wood.
2. The house shall be nailed together.
3. The house size shall be 4 cubits by 4 cubits by 3 cubits. (If cubits were good enough for Noah, they are good enough for us.)
4. The door shall be made of balsa wood.
5. The door opening shall be 4 inches by 8 feet.
6. The door shall have a lock and key.
7. The door shall have a door knob and hinges.
8. The door shall be on the same wall as the window.
9. The door shall be 12 meters from the mah jong set, shall be glued with silly putty to the wall, and shall play the Hallelujah Chorus when the doorbell is rung by the wolves wanting to eat Big Bird.
10. The floor shall be carpeted with a silk comforter (cost of 100 bird bucks; client has cold feet).
11. The roof shall be shingled.
12. The shingles shall be taffy.
13. The house shall be painted blue.
14. The window shall be 3 by 3 feet.
15. The window shall have interior locking wood shutters (wolf proofing).
16. The window shall have a screen.
17. The screen shall be made of oriental tissue paper.
"Big Bird's" Formal Inspection Subsystem Requirements
'Subsystem Requirements' Written for Big Bird's Approval (Concluded)

18. The window shall open, close, and lock.
19. The window shall have double thermal glass panes.
20. The window shall be placed next to the flumajubit and the whupinsnapper.
21. The house shall be insulated (see 10 above).
22. The insulation shall be cough lozenges (Smith Brothers, cherry flavor).
23. The house shall have one bed (cost of 100 bird bucks).
24. The cost of the house shall be 300 bird bucks.
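To show how the form's classification fields fit together, here is a minimal sketch of one inspection finding captured as a data record (the class name, field names, and sample finding are illustrative assumptions, not part of the inspection standard):

    from dataclasses import dataclass

    @dataclass
    class InspectionFinding:
        requirement: int       # requirement number on the form
        acceptable: bool       # "Acceptable" checkbox
        severity: str = ""     # "Major", "Minor", or "Open issue"
        category: str = ""     # "Missing", "Wrong", or "Extra"
        defect_type: str = ""  # "Type" field (e.g., a checklist heading)
        origin: str = ""       # "Origin" field (life-cycle stage)

    # Hypothetical finding: a 4-inch by 8-foot door opening cannot admit
    # a client who is over 6 feet tall, so the system requirement fails.
    findings = [
        InspectionFinding(requirement=5, acceptable=False, severity="Major",
                          category="Wrong", defect_type="Correctness",
                          origin="Requirements"),
    ]
    majors = sum(1 for f in findings if f.severity == "Major")
    print(f"{majors} major defect(s) recorded")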

Chapter 8

Software Design Improvements

Part I--Software Benefits and Limitations

Introduction

Computer hardware and associated software have been used for many years to process accounting information, to analyze test data, and to perform engineering analysis. Now, computers and software control everything from automobiles to washing machines, and the number and type of applications are growing at an exponential rate. The size of individual programs has shown similar growth. Furthermore, software and hardware are used to monitor and/or control potentially dangerous products and safety-critical systems. These uses include everything from airplanes and braking systems to medical devices and nuclear plants.

The benefits to systems of using software are reduction in weight, better optimization, autonomous action taken in emergencies, more features and, hence, flexibility for users of computer-based products, increased capabilities, better design analysis, and identification of the causes of problems.

What is the benefit of weight reduction? Using a computer system to control aircraft and spacecraft has tremendous weight and cost advantages over relying upon conventional electromechanical systems and personnel (who could be better used elsewhere).

Some of the questions software designers ask are, How can this hardware and software be made more reliable? How can software quality be improved? What methodology needs to be provided on large and small software products to improve the design? How can software be verified?

Software reliability.--Software reliability includes the probability that the program (in terms of the computer and its software) being executed will not deliver erroneous output. People have come to trust computer-generated results (assuming that the input data are correct); however, we are now beginning to encounter problems. Recently a manufacturer reported that its motherboards, which employed a particular IDE (integrated drive electronics) controller, "when using certain operating systems have the potential for data corruption that could manifest itself as a misspelled word in a document, incorrect values or account balances in accounting software ... or even corruption of an entire partition or drive." The potential for data errors due to software embedded in certain Pentium computer chips has also been discovered (ref. 8-1).

Importance of reliability.--The tremendous growth in the use of software to control systems has also drawn attention to the importance of reliability. Critical life-support systems and flight controls on military and civilian aircraft use software. For example, mechanical interlocks, which prevent unsafe conditions from occurring (such as disabling power when an instrument cover is removed), are being replaced with software-controlled interlocks.

The size of the software also continues to grow, making it more costly to find and fix errors. Programs have grown from a few lines of code 20 years ago to 500 000 source lines of code (SLOC) for the flight software of the space shuttle alone (ref. 8-2) and 1.588 million SLOC for the F-22 fighter (ref. 8-3). The application of software in the automotive industry has increased from an 8-bit processor that controlled engine applications to a powerful personal computer that added more built-in diagnostics and systems controls. Also, because of its complexity, only 1 percent of major software projects are finished on time and on budget, and 25 percent are never finished at all (ref. 8-4).

Some problems have become apparent. There occasionally exists a lack of discipline in generating software; people treat software controls very lightly and often have not attempted to predict the reliability and safety implications of their software. Hence, there are many potential and unrecognized pitfalls in the application of software that are only now being realized. Many serious incidents in safety-critical applications may have been related to software and the complex control interfaces that often accompany software-controlled systems. One example occurred when "in 1983 a United Airlines Boeing 767 went into a 4-minute powerless glide after the pilot was compelled to shut down both engines." This was due to a computerized engine-control system (in an attempt to optimize fuel efficiency) that ordered the engines to run at a speed where ice buildup and overheating occurred (ref. 8-5).

A China Airlines A300-600R Airbus crashed in part because of cockpit confusion. "Essentially, the crew had to choose between allowing the aircraft to be governed by its automatic pilot or to fly it manually. Instead, they took a halfway measure, probably because they failed to realize that their trimmable horizontal stabilizer (THS) had moved to a maximum noseup deflection as an automatic response to a go-around command. It was defeating their effort to bring the aircraft's nose down with elevator control ... (ref. 8-6)."

Because of these problems, we need to ask the following questions: What computer system errors can occur? What are the risks to the system from software? Why do accidents involving software happen, from both the systems engineering and the software engineering viewpoint? What are some software reliability or safety axioms that can be applied to software development? How can we be aware of the real risks and dangers from the application of software to a control and sensor problem?

Software quality.--How can the design of software be improved? Part II of this chapter, Software Quality and the Design and Inspection Process, will answer these questions. It will also discuss the following topics: useful software quality metrics, tools to improve software quality, software specifications, assessing the quality and reliability of software, specifications to improve software safety, tools that affect software reliability and quality, and factors that affect tradeoffs and costing when software quality is evaluated.

Software safety.--Software development is now a key factor affecting system safety because of the often catastrophic effects of software errors. Therefore, a system can only be safe if its software cannot cause the hardware to create an unsafe condition. Software safety is the effective integration of software design, development, testing, operation, and maintenance into the system development process. A safety-critical computer software component (SCCSC) is one whose errors can result in a potential hazard, loss of predictability, or loss of system control. System functions are safety critical when software operations, if not performed, performed out of sequence, or performed incorrectly, can result in improper control functions that could directly or indirectly cause or allow a hazardous condition to exist. How can this software be improved?

Overview: How Do Failures Arise?

Generally, we can say that all failures come from the design or manufacturing process or from the operation of the equipment (the computer), its associated software, and the system it controls (fig. 8-1). Software is becoming a critical source of failures because software failures often occur in unexpected ways. Through a long history of the design process, and particularly in the design of mechanisms or structures, the type and severity of failures have become well known. Hardware failures can often be predicted, inspections can be set up to look for potential failures, and the manufacturing process can be changed to make a mechanical system more reliable.

[Figure 8-1.--Failure origins: design, manufacture, and operation.]

Although a small anomaly or error in the design or operation of a mechanical system often produces a predictable and corresponding failure, software is different. An incorrect bit, a corrupted line of code, or an error in logic can have disastrous consequences. Testing a mechanical system (though not perfect) can be set up to validate all "known" events; on the other hand, software with only a few thousand SLOC may contain hundreds of decision options with millions of potential outcomes that cannot all be tested for or even predicted. Also, historically the design and behavior of mechanical systems have been well known, so expanding the performance envelope of the design led to a new system that was similar to the old one. The behavior of the new mechanical system was predictable. This does not hold true for software because minor changes in a program can lead to major changes in output.

Error types.--The types and sources of errors that can occur in a computer-controlled system are presented in figure 8-2 and are described next:

• Hardware failure in the computer: common to all electrical devices
• Hardware logic errors (in program logic controllers (PLC's)): mistakes in design or manufacture
• Coding errors: mistakenly written into the program, or the program became corrupted
• Requirements errors: missing, incomplete, ambiguous, or contradictory specifications
• Unintended outcome or state: logic errors in the program code for a given set of inputs
• Corrupted data: partially failed sensors or errors in internal lookup tables
• User interface problems: several sources (e.g., multiple points to turn off computer control of a system or keyboard buffers that are too small)
• Faulty software tools (e.g., finite-element structural analysis code generation programs): errors in logic and outputs
• Execution problems:
  ° Variations in computer architecture from platform to platform causing software verified on one platform to behave differently on another platform
  ° Faulty or difficult-to-use interfaces between computers or between computers and sensors

[Figure 8-2.--Types of errors.]

Hardware and software failure differences.--In comparison with the methods used to verify the reliability of system hardware components, those used for software prediction, inspection, testing, and reliability verification differ greatly. The reason for the differences is the nonphysical, abstract nature of software, whose failures are almost always design oversights or programming mistakes and are not caused by environmental stresses or cumulative damage. Furthermore, the design rules for mechanical systems are usually well known, a vast amount of historical data on similar systems being available along with mathematical models of wear, fatigue, electrical stress, and so forth to make life predictions. Each software system is often unique; even with some code reuse, complexity makes reapplication difficult. Some features of software and hardware reliability are compared in table 8-1.

TABLE 8-1.--HARDWARE AND SOFTWARE FAILURE DIFFERENCES

Reliability prediction tools--
  Hardware: Many mathematical models exist for predicting wear, fatigue life, and electronic component life.
  Software: Reliability predictions are nearly impossible due to the nonrandom distribution of errors.

Causes of failures--
  Hardware: Wearout, misuse, inadequate design, manufacture, or maintenance, or incorrect use can contribute to failures.
  Software: Poor design affects software (the computer system on which the software resides can also fail).

Redundancy--
  Hardware: Hardware reliability is usually improved with redundancy.
  Software: Software reliability (except possibly for multiple voting systems) is not improved with redundancy.

Hard or soft failures--
  Hardware: Soft failures (some degradation in service before complete failure) often occur due to wear, chemical action, electrical degradation, etc.
  Software: Usually no soft failures occur (however, there may be some recovery routines that can take the system to a safe state, etc.).

Maintenance--
  Hardware: Usually testing and maintenance improve hardware and increase reliability.
  Software: Software reprogramming may introduce new and unpredictable failure modes into the system. Reliability may be decreased. Any change to the code should require complete retesting of the software, but this is usually not done.

Reliability prediction methodology--
  Hardware: Design theory, a history of previous systems, and load predictions all allow excellent reliability prediction.
  Software: Software reliability is a function of the development process.

Types of Software

Software types are classified on the bases of timing and control, run methodology, and run environment.

Timing and control.--Software risks and their impact on systems and data can be evaluated based on how the software interacts with the system, how humans interact with the system and the software, and whether this activity is carried on in real time. Factors to evaluate are whether (1) the software controls a system or just provides information, (2) real-time human interference and evaluation of output are allowed, (3) the software output time is critical or nontime critical, and (4) the data supplied by informational software are critical or noncritical. These factors are summarized in table 8-2. The reader should also consult MIL-STD-882C, System Safety Program Requirements (ref. 8-7), for types of software based on levels of control and hazard criticality.
TABLE 8-2.--CLASSIFICATION OF SOFTWARE BASED ON LEVEL OF HAZARD AND CONTROL

1. Software control: Autonomous control exercised over hazardous systems.
   Information: Some information may be available but insufficient for real-time interference.
   Human/other control: May be possible but not desirable; often no other independent safety systems.
   Real time: Yes.
   Examples: Space shuttle main engine and solid rocket booster ignition sequence.

2. Software control: Semiautonomous control exercised over hazardous systems.
   Information: Real-time information is available to allow some human/other system interaction and control.
   Human/other control: Possible and desirable under some circumstances; other independent safety systems or the ability to disengage the automatic control (override).
   Real time: Yes.
   Examples: Aircraft terrain-following system; medication dispensing device; nuclear power plant safety systems; automatic go-around mode in aircraft.

3. Software control: Mix of computer and human control over hazardous systems; human control of some functions.
   Information: Real-time information is available to allow human interaction and control.
   Human/other control: Yes, required for some subsystems of operation; other independent safety systems.
   Real time: Yes.
   Examples: Aircraft fly-by-wire systems of unstable aircraft (e.g., the B-2), where the computer translates the pilot's control requests into feasible flight surface modifications.

4. Software control: No, but the software generates information requiring immediate human action.
   Information: Complete real-time information presented to allow human control over hazardous systems.
   Human/other control: Human interaction required to properly control the system; other independent safety systems.
   Real time: Yes.
   Examples: Aircraft collision avoidance systems; nuclear power plant instrumentation; hospital patient vital signs.

5. Software control: No, but human action is based on the information.
   Information: Information not presented in real time; the software does provide critical information.
   Human/other control: Human actions and decisions directly influenced by the information; other checks.
   Real time: No.
   Examples: Statistical process control of machine tools; historical medical information summaries.

6. Software control: No, but human action is based on the information.
   Information: Information not presented in real time; the software does not provide critical information.
   Human/other control: Human actions and decisions directly influenced by the information.
   Real time: No.
   Examples: Financial and economic data.

Run methodology.--Another classification of software is based on run methodology and includes these types:

Interactive: a program that is continuously running and interacting with the operator

Batch: a single run or process of a program (often acting on data, such as a finite-element analysis) from which a single output will occur

Remote job entry: a software environment in which programs are submitted or started by others from remote locations who usually seek a single output

Environment.--Software may be classified according to the environment in which it operates:

Embedded: computer code written to control a product; usually resides on a processor that is part of the product; typical applications are boiler controllers and washing machine and automobile computer controls

Applications: a program that analyzes data; often runs as a batch job on a computer with limited input from the user once the job is submitted; operates in payroll systems, finite-analysis programs, and material requirements planning (MRP) systems (to update sections)

Support: software tools that may be considered another class of programs; used to develop, test, and qualify other software products or to aid in engineering design and development; typical applications are compilers, assemblers, and computer-aided-software-engineering (CASE) tools

Types of Computer System Errors

The following examples are problems that have been observed with the application of software to control processes and systems.

Space probe.--Clementine 1, which successfully mapped the Moon's surface, was to have a close encounter with a near-Earth asteroid. A hardware or software malfunction on the spacecraft "resulted in a sequencing mode that triggered an opening of valves for four of the spacecraft's 12 attitude control thrusters, allowing all the hydrazine propellant to be used up (ref. 8-8)."

Chemical plant.--Programmers did not fully understand the way a chemical plant operated. The specifications stated that if an alarm occurred, all process control settings were to be frozen. The resulting computer system software released a catalyst into a reactor and began to increase cooling water flow to it. While the flow was increasing, the system received an oil sump, oil low alarm and froze the flow of cooling water at too slow a rate. The result was that "the reactor overheated and the pressure release valve vented a quantity of noxious fumes into the atmosphere (ref. 8-4)."

Space Shuttle.--An aborted mission nearly occurred during the first flight of Endeavour to rendezvous with and repair an Intelsat satellite. The software routine used to calculate rendezvous firings "failed to converge to a solution due to a mismatch between the precision of the state-vector tables, which describes the position and velocity of the Shuttle (ref. 8-2)."

Airliner.--A laptop computer used by a passenger on a Boeing 747-400 flying over the Pacific caused the airliner's navigation system to behave erratically. When the computer was brought to the flight deck and turned on, "the navigation displays went crazy (ref. 8-9)."

Sources of Errors

Investigating the sources of problems should take precedence over finding the errors in the software logic. Anytime an analog and/or an electromechanical control system is replaced by a computer system, many unique problems can occur.

Organizational problems.--Determining the causes of errors and eliminating them requires an analysis of the procedures, organizational arrangements, and methodology that cause problems with software. Figure 8-3 gives an overview of the following organizational problems:

[Figure 8-3.--Sources of errors based on organizational problems.]

(1) Communication between the software programmer and the systems or design engineer: The designer does not know the software, and the programmer does not know the system with all its potential failure modes (they do not have domain-specific knowledge). Programmers frequently fail to understand the potential for problems if certain actions do not occur in a logical sequence. For example, "start heater and add fluids to boiler" may be a "logical" programming sequence, but what if the computer has a fault after the heater is started, before enough fluid is added to the boiler? Similarly, design and safety engineers frequently lack knowledge about specific software, the way it will control the system, and the potential for software problems. They treat the computer and its software as a black box with no regard for the consequences if the unit fails. Consequently, in the past, system safety engineers ignored software or looked at it superficially when analyzing systems.

(2) Documentation standards for software, testing, and verification: Many problems are caused by the practices of not documenting the software analysis and the procedures for inspection and testing and of making last-minute fixes without retesting and reverification. Design and verification tools may not exist. Formal procedures for software inspection may not exist, or the procedures may be in place but be essentially ignored by the software development group. For example, a potential flight problem was noticed on one experiment scheduled to fly in space to evaluate the effects of microgravity. To correct it, the software was changed during a preflight checkout on a holiday, but the change was not verified. During the mission, the heaters on a device developed only 25 percent of the needed power because the simple software change caused the loss of some mission data.

(3) Standardization of software structure: In many organizations, not requiring adherence to software standards contributes to many system failures. Trying to be elegant in writing software, using complex techniques, and neglecting internal comments and written documentation can seriously affect the quality of software and decrease its reuse.

(4) Configuration control management over software changes: During software development and maintenance, unauthorized or undocumented changes made by a programmer to fix a possible mistake may cause many problems down the line. Toward the end of a project, pressure to complete the job encourages code changes without proper review or documentation.

(5) Silver bullets: Overreliance on silver bullets to solve a company's software problems results in real issues being overlooked. One of the most difficult problems to deal with is the unrealistic hope that an advance in software development technology, a new code-generating tool, or object-oriented super code will make software generation problems disappear. This reliance also manifests itself when state-of-the-art techniques are exclusively relied upon in lieu of using good documentation, formal requirements, and continuous interface between software, design, and safety personnel.

(6) Personnel: A greater attempt to keep good programming talent should be made because turnover results in a loss of corporate knowledge, reduces the reuse of code, and causes problems with software maintenance.

(7) Software reuse: When existing software could be reused, many software programs are started from scratch (again with little control over how the code is to be written). Note that careful reuse of code has saved time and manpower.

Design and requirements problems.--Poor analysis and flowdown of requirements specifications for an individual project can cause errors, delays, and cost overruns:

(1) Requirements: Poorly defined requirements for a specific software project can cause a cost overrun and increase the probability that code logic errors will be introduced. When real-time systems are developed for new applications or applications outside the normal areas of the software engineer's expertise, additional requirements are needed to implement the basic system. Frequently discovered while the software development process is well underway, these requirements are often inconsistent, incomplete, incomprehensible, contradictory, and ambiguous.
(2) Additional features: Adding new features to the software is also a major problem and is based on the perception that, long after programming has started, requirements for new features can be added with little negative effect. However, such additions to performance requirements adversely affect system software because it must be changed for each new requirement. Because each change risks increasing errors in the function or logic, design engineers must always ask, Have the requirements been analyzed as a complete set?

(3) Anticipating problems: More attention must be given to protecting the software-controlled system from off-nominal environments and to anticipating what states the system can reach through an unexpected series of events. Too often, the emphasis is on fulfilling performance requirements without carefully analyzing what can go wrong with the system.

(4) Software and/or hardware interaction: Problems result from a lack of understanding of how the program will actually run once a system is operational. Software may not be able to process all the sensor data during a clock cycle, or it may be unable to deal with changes in physical conditions and processors.

(5) Isolating processes: Adding too many unnecessary software processes on a computer controlling a safety-critical system can reduce assurance that critical processes will be handled properly (safety critical refers to systems whose failures can cause a loss of life, the mission, or the system).

Other problem areas.--In addition to the associated hardware, sensors, and interfaces that can also increase the risk of errors, other problems concern incorrect data, the reliability of the system, and the production, distribution, and maintenance of software.

(1) Reliability: The reliability and survivability of the computer hardware, sensors, and power supplies are often not adequately planned for. The central processing unit (CPU), memory, or disk drives of a computer can fail, the system can lose power, excess heat or voltage spikes can cause unanticipated errors in performance and output, or the system can completely shut down.

(2) System and/or sensor interfaces: The interfaces between sensors and other mechanical devices can fail, resulting in damage to cables and the failure of power supplies to sensors or servocontrollers. Often the anticipation of these events and effective solutions are not handled adequately.

(3) Radio frequency noise: The effect of radio frequency (RF) noise is often unanticipated. It can cause a computer processor, its memory, and input/output devices to operate improperly, or it can cause errors or erroneous readings from sensors, poorly shielded cables, connectors, and interface boards (e.g., fiber optic to digital conversion).

(4) Manufacture and maintenance: Improper handling of the manufacture, reproduction, and distribution of software results in compilation errors and improper revisions of code being distributed. Integration problems can occur while assembling the code, linking program modules together, and transferring files. Poor control over maintenance upgrades of software and firmware also causes errors from improperly loading programs, using the wrong batch files, and patching to the wrong revision of software.

A Rome Laboratories study classified errors by percentage of occurrence (table 8-3), which reveals the importance of interface design and documentation (ref. 8-10).

TABLE 8-3.--SOURCES OF ERRORS BY PERCENT

Logic ................................. 19
Input/output .......................... 14.74
Data handling ......................... 14.49
Computation ........................... 8.34
Preset data base ...................... 7.83
Documentation ......................... 6.25
User interface ........................ 7.70
Routine-to-routine interface .......... 9.62

Tools to Improve Software System Reliability and Safety

For each of the aforementioned problem-causing agents, the following tools minimize risk and may even eliminate the problem.

Organizational improvement.--Various tools and techniques properly applied and supported at all organizational levels can greatly improve software reliability and safety.

(1) Communication: Improve communication between designers, software engineers, and safety engineers through concurrent engineering, safety review teams, and joint training. Concurrent engineering with regular meetings between design and software engineers to review specifications and requirements will improve communications. Continuous discussions with the end users will help them to understand the background of the various system performance requirements. Joint training and cross training will encourage them to develop informal relationships and communication. Software safety review committees consisting of design, software, and safety personnel who continually meet to review software specifications and implementation will assure that safety-critical software performs properly and that specifications are carefully written, not just in "legal" terms but with clear descriptions of how the system should work.

(2) Documentation: Improve software documentation standards, testing, and verification procedures. Encourage the application of standards for all software projects, including the general requirements for all system development projects, the industry or military standards to be followed, and the documents to be generated for a specific product. These documents may include a software version description document (see part II for more details) and plans for software management, assurance, configuration management, requirements specifications, and testing.

(3) Standardization: Set and enforce software structure standards to delineate what is and what is not allowed. The programmer should not design a "clever" program that cannot be readily understood or debugged. Enforce safe subsets of the programming language, coding standards, and style guides.

(4) Configuration management: Implement consistent controls over software changes and the change approval process by using software development products that include software configuration management and code generation tools. Computer-aided-software-engineering (CASE) tools and other configuration management techniques can automatically compare software revisions with previous copies and limit unapproved changes. Other programming tools provide mission simulation and module interface documentation.

(5) Silver bullets: The introduction of major changes in the procedures for generating software must be scrupulously reviewed, and their impact on the software personnel, maintenance, and standardization must be evaluated carefully. Major disruptions to personnel can result from any major change in the way a product is designed and developed; therefore, careful and complete training of personnel, a free flow of information about the new system, assurances as to the support of existing programmers, and the gradual introduction of the new methods (e.g., starting on one small project) are required. Projects already underway and those scheduled to begin may or may not benefit from the changes.

(6) Personnel: Provide incentives to keep good programming talent and maintain the corporate knowledge base. The programmers should have a mix of programming skills and experience and the ability to transmit practical programming knowledge to new programmers who have only classroom training with little or no insight into real-world problems. Keeping senior programmers or senior managers who can review software and participate in independent verification and validation (IV&V) of software across missions or products is also beneficial, as is retaining workers who know the software systems that support software maintenance and new applications of the code. Provide training in the proper methodologies.

Software should be modularized to facilitate changes and maintenance. The modules should have low coupling (the number of links between modules is minimized) and high cohesion (a high level of self-containment).

Use a "clean room" approach to develop software. This approach implies a highly structured programming environment with tight control of the specifications for the software and system and with support of, and adherence to, the software analysis specifications.

(7) Software reuse: Encourage the reuse of software with strict controls imposed over software structure and procedures for code reuse. Software modules and software reuse also improve reliability because of the benefits derived from faults removed in prior usage. Modularized software with well-documented and verifiable inputs and outputs also enhances maintainability. Lewis Research Center's launch vehicle programs are reused for each mission with only minor modifications, and excellent reliability results have been achieved.

Design and requirements improvements.--The hardware and the software must be integrated to work together. This integration includes the entire system with input sensors and signal conditioners, analog-to-digital (A/D) boards, the computer hardware and software itself, and the output devices (control actuators). Basic design methodology can improve software as well; thus, the following approaches support this concept:

(1) Requirements: Spend sufficient time defining and understanding requirements. The system, software, and safety engineers should work with the end user to develop requirements, to express the requirements in mutually understandable language, and to design requirements that are testable and verifiable.

(2) Additional features: Limit changes in requirements once the software design process begins. Question whether an additional feature is really necessary or if, instead, functionality should be reduced to achieve safety and basic performance goals. A large number of ancillary noncritical devices and special graphical user interfaces may not be necessary and may only complicate and slow the system.

Avoid developing a false sense of security by putting software in its proper place of importance. Erroneously, many people think that a computer controlling a system can never fail and will believe computer-controlled readouts rather than rely on their own good senses.

(3) Anticipating problems: Fully analyze the ways the software-controlled system can fail and the undesirable states the system can attain. Then, implement procedures and methods to ensure that these undesirable states and failure modes cannot be attained and that they are not attainable through some unusual (though not impossible) combination of software states, environment, and/or input data. Such steps will ensure the system's invulnerability to these failures.

Use error detection, correction, and recovery software development to achieve fault tolerance. Examples of common errors include inconsistent data in data bases; process deadlock, starvation, and premature termination; runtime failures due to out-of-range values; attempts to divide by zero; and lack of storage for dynamically allocated objects. Although software does not degrade, it is virtually impossible to prove the correctness of large, complex, real-time systems. The selective use of logic engines can be effective in reducing uncertainty about a system's performance.

Use software that can detect and properly handle runtime errors, and use software controls that assume the worst and prepare for it, such as the undesirable states the computer can attain and the ways each of these states can be prevented. Make a careful analysis of responses to failed or suspect sensors.

Software capable of real-time diagnosis of its own hardware and sensors is very useful. Memory can be protected with parity, error-correcting code, and read-only circuitry in memory. Messages received should be checked for accuracy, and routes can be automatically changed when errors are detected. Predefined system exceptions and user-defined fault exceptions should be designed into application software. Predefined exceptions can be raised by runtime systems, so the software should also have built-in or operating-system recovery procedures. Information for recovery includes processor identification, process name, data reference, memory location, error type, and time of detection.
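As a minimal sketch of a user-defined fault exception that carries this recovery information (the class name, field names, sensor, and valid range below are illustrative assumptions, not part of any cited standard):

    import time

    class FaultException(Exception):
        """User-defined fault exception carrying recovery information."""
        def __init__(self, processor_id, process_name, data_ref,
                     memory_location, error_type):
            super().__init__(error_type)
            self.record = {
                "processor_id": processor_id,      # which CPU raised the fault
                "process_name": process_name,      # task that was executing
                "data_reference": data_ref,        # offending data item
                "memory_location": memory_location,
                "error_type": error_type,          # e.g., out-of-range value
                "detected_at": time.time(),        # time of detection
            }

    def read_pressure(raw_volts):
        if not 0.0 <= raw_volts <= 5.0:            # assumed valid sensor range
            raise FaultException("CPU-1", "sensor_scan", "pressure_xdcr_3",
                                 0x1F40, "out-of-range value")
        return raw_volts

    try:
        read_pressure(9.9)                         # a failed sensor's reading
    except FaultException as exc:
        print("recovery routine invoked for:", exc.record["error_type"])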
(4) Software and/or hardware interaction: Computer timing problems and buffer overload problems must be eliminated. If all alarms and sensors cannot be read in one clock cycle of the CPU, errors may occur or alarms may be missed. Overloaded buffers can result in CPU lockup.

Load balancing should be a part of the operating system software routines because failures are often caused by overloading one or more processors in the system. A few examples of overloading are caused by an increase in message traffic or the inability of a processor to perform within time constraints. In these cases, a potential tool to support complex systems is dynamic traffic time sharing in which message streams are distributed among identical processors with a traffic coordinator keeping track of the relative load among processors.
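A minimal sketch of such a traffic coordinator follows, with a queued-message count standing in for the load metric (the class name and the metric are assumptions for illustration):

    import heapq

    class TrafficCoordinator:
        """Route each message to the least-loaded identical processor."""
        def __init__(self, n_processors):
            self.loads = [(0, pid) for pid in range(n_processors)]
            heapq.heapify(self.loads)              # heap of (load, processor)

        def dispatch(self, message):
            load, pid = heapq.heappop(self.loads)  # least-loaded processor
            heapq.heappush(self.loads, (load + 1, pid))
            return pid                             # processor to handle message

    coord = TrafficCoordinator(3)
    print([coord.dispatch(m) for m in range(6)])   # e.g., [0, 1, 2, 0, 1, 2]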
(5) Isolating processes: Systems for safety-critical applications need to be separate from everything else. System specifications often require gathering data from hundreds of sensors and performing all sorts of noncritical tasks. Segregating these noncritical tasks in a separate computer system will often improve the chances that safety-critical functions will not be disrupted by defects in noncritical resources. Safety-critical modules should be "firewalled," and proven hardware and technology should be used for critical systems. "Flight-proven" older computer systems and software that do the job should be chosen over newer computers whose standards are rapidly evolving where critical applications are involved.

Analog interlocks on safety-critical systems should be replaced with software interlocks only with the greatest of care. A thorough, well-documented analysis of what would happen with a computer failure and with a failure of the system that the interlock protects should also be made. An example of the problem of replacing mechanical interlocks with software interlocks involves a radiation therapy machine. An early model of the therapy machine had a hardware interlock to prevent radiation overdoses. When the interlock was removed on a later model and replaced with software logic, several people were killed by radiation overdoses. The problem was caused by the operator interface, poorly documented data input procedures, and inadequate safety procedures. The earlier model never experienced the problem because the program did not control the interlock (ref. 8-11).

In many cases, safety-critical systems can have an analog process (or a stand-alone computer) capable of taking over if the primary computer fails. If a computer control fails in a process plant, an analog backup system (which is presumably controlled by the computer) could keep the process running (though at less than optimum conditions). Alternatively, control actuators could go to a safe position if a failure occurred. Usually, the process must be allowed to proceed to some nominal conditions (e.g., partial cooling water or partial product inflow into a process) before shutting down.

Monitor the health of the backup systems and the output of software control commands independently of the main control computer. A separate computer should be performing health checks on the main computer and on safety-critical sensor outputs.
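A minimal sketch of one such independent health check, written as a heartbeat deadline monitor (the names and the 2-second deadline are assumptions):

    import time

    HEARTBEAT_DEADLINE_S = 2.0     # assumed maximum silence before alarm

    def main_computer_healthy(last_heartbeat, now=None):
        """Run on the separate monitor computer, not on the main computer."""
        now = time.time() if now is None else now
        return (now - last_heartbeat) <= HEARTBEAT_DEADLINE_S

    last = time.time() - 5.0       # main computer last reported 5 s ago
    if not main_computer_healthy(last):
        print("main computer unresponsive: switch to analog backup")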
Conduct special tests to verify the performance of safety-critical software. This testing should verify that the software responds correctly and safely to single and multiple failures or alarms; that it properly handles operator input or sensor errors (e.g., data from a failed sensor); that it does not perform any unintended routines; that it detects failures and takes action with respect to entry into and execution of safety-critical software components; and that it is able to receive alarms or other inhibit commands.

Formal methods can use abstract models and specification languages to develop correct requirements. Logic engines can be used to prove the correctness of the requirements.

For many years, the Lewis Research Center's launch vehicle program verified the software for each mission by running the complete program in the mission simulation lab. All the mission constants and components were checked and verified. Lewis never lost a vehicle because of software problems.

Other improvements.--The hardware/software system must also be integrated with input sensors and signal conditioners (e.g., analog-to-digital boards) and the output devices (e.g., servocontrolled actuators). Because the reliability of all this hardware is also an issue, some basic approaches to total system performance follow:

(1) Reliability: The reliability and survivability of the electronic components associated with the software control system can be improved by properly protecting components from vibration, excess heat and voltage, and current spikes. Properly maintained grounding and shielding also must be assured with maintenance training and documentation. Robust sensors, actuators, and interfaces also contribute to a more reliable system. Sensor failure can cause the wrong data to be processed. Even the fraying of cables has been linked with possible uncontrolled changes in aircraft flight surface actuation. The reliability of computer-controlled output devices (servoactuators, valves, relays) must also be verified. Because output devices may be subject to noise problems, error recovery and restart procedures should be included in software and properly tested.

Passive controls should be designed so that failures cause the system to go to a safe state. If input commands or sensor readings are suspect, the system should go to a safe condition, which is accomplished by an analog backup or an autonomous software module that should be in a separate backup system.
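A minimal sketch of that fail-safe rule (the valid range, scaling, and safe position are illustrative assumptions):

    SAFE_POSITION = 0.0            # assumed fail-safe actuator setting

    def actuator_command(sensor_value, valid_range=(0.0, 5.0)):
        """Drive the actuator only from trustworthy input; otherwise fail safe."""
        lo, hi = valid_range
        if sensor_value is None or not lo <= sensor_value <= hi:
            return SAFE_POSITION   # suspect reading: go to the safe state
        return sensor_value / hi   # nominal: scale the reading to a command

    assert actuator_command(9.9) == SAFE_POSITION    # out-of-range fails safe
    assert actuator_command(None) == SAFE_POSITION   # missing reading fails safe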
Multiple voting systems (multiple computers running the same task in parallel with independently written programs) might help to improve reliability. Although this concept is beneficial in theory, some studies suggest that common software logic faults arise from common requirements. Furthermore, maintenance and configuration management of this type of system are greatly complicated by having different active versions of code (ref. 8-12). Multiple computers with software written for the same functional output but developed independently are one way to handle the critical problem of software taking the operator to a condition that was never intended. Systems should sense the occurrence of anomalies and alert the operator. Health monitoring of the controlled system and the computer itself, including frequent self-checks, should be part of the program.
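A minimal sketch of the voting step itself (a real system would vote in independent hardware; the three-channel example is illustrative):

    from collections import Counter

    def vote(channel_outputs):
        """Majority vote over independently developed channels; None means
        no majority exists and an anomaly should be raised to the operator."""
        value, count = Counter(channel_outputs).most_common(1)[0]
        return value if count > len(channel_outputs) / 2 else None

    print(vote([42, 42, 41]))   # 42: the dissenting channel is outvoted
    print(vote([1, 2, 3]))      # None: no majority, alert the operator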
Redundant systems need to have separate power sources and locations (to avoid common mode failures). Use uninterruptible power supplies for critical software systems. Have battery backup for as long as needed to switch to manual operation. Avoid a common power supply that can send a surge to all devices at once or can shut off all devices at once.

A distributed system can also be used to improve reliability. The system can sense problems in one processor and transfer its work to another processor or system. Hardware components degrade with time and represent the most important factor in ensuring reliability of real-time systems. However, note that the complexities of a distributed system can cause new problems that possibly reduce reliability. For example, the synchronization and precision of numerical values between programs and communications procedures can cause errors. More resources are also consumed for coding and testing, and programs become larger (with more chance for error).

(2) System and/or sensor interfaces: The computer and sensor interfaces must be thoroughly tested to prevent mechanical failures, intermittent contacts, connector problems, and noise. Again, provisions for data out of acceptable ranges must be made.

(3) Radio frequency noise: Radio frequency (RF) noise problems can be avoided. Input and output data should be validated before use. The software should check for data outside valid ranges and take appropriate action such as setting off an alarm or shutting down the system. Proper maintenance procedures and training in the removal and replacement of grounding and shielding should be developed. The interaction of, and possible need for, separate analog and digital grounds should also be investigated. Thorough system testing in all anticipated environments should be performed.

(4) Manufacturing and maintenance: The duplication, loading, and maintenance of software must be planned and controlled. Procedures must be developed to assure that the proper code is loaded on each processor model. All new compilations of code must be verified. Buggy compilers can introduce defects. Subtle changes from one revision of an operating system to another can cause a difference in response to the same code. Procedures and requirements for maintenance upgrades must also be developed. The updated software should be adequately tested and verified (to the same level and extent and to the same requirements as the operating software) for accuracy (performance), reliability, and maintainability. New software should be modularized and uploaded as individual modules when maintenance is being performed. Also, whenever possible, issue firmware changes as fully populated and tested circuit cards (not as individual chips).

Software Development Tools

Several methods can be used to analyze and verify software.

Fault tree analysis.--This can identify critical faults and potential faults or problems. Then, all the conditions that can lead to these faults are considered and diagrammed.

Petri net analysis.--This provides a way to model systems graphically. A Petri net has a set of symbols that show inputs, outputs, and states with nodes that are either "places" (represented by circles) or "transitions" (represented by vertical lines). When all the places with connections to a transition are marked, the net is "fired" by removing marks from each input place and adding a mark to each place pointed to by the transition (the output places) (ref. 8-4).
lems that possibly reduce reliability. For example, the synchro- Hazard analysis.hThis uses formal methods to identify
nization and precision of numerical values between programs hazards and evaluate software systems (ref. 8-I 1).
and communications procedures can cause errors. More re- Formal logic analyzers.--These are logic engines that can
sources are also consumed for coding and testing and programs verify specifications. Some source analyzers can reveal logic
become larger (with more chance for error). problems in code and branching problems.
(2) System and/or sensor interfaces: The computer and sensor interfaces must be thoroughly tested to prevent mechanical failures, intermittent contacts, connector problems, and noise. Again, provisions must be made for data out of acceptable ranges.
(3) Radio frequency noise: Radio frequency (RF) noise problems can be avoided. Input and output data should be validated before use. The software should check for data outside valid ranges and take appropriate action, such as setting off an alarm or shutting down the system. Proper maintenance procedures and training in the removal and replacement of grounding and shielding should be developed. The interaction of, and possible need for, separate analog and digital grounds should also be investigated. Thorough system testing in all anticipated environments should be performed. (A minimal sketch of such a range check appears after this list.)
(4) Manufacturing and maintenance: The duplication, loading, and maintenance of software must be planned and controlled. Procedures must be developed to assure that the proper code is loaded on each processor model. All new compilations of code must be verified; buggy compilers can introduce defects. Subtle changes from one revision of an operating system to another can cause a difference in response to the same code. Procedures and requirements for maintenance upgrades must also be developed. The updated software should be adequately tested and verified (to the same level and extent and to the same requirements as the operating software) for accuracy (performance), reliability, and maintainability. New software should be modularized and uploaded as individual modules when maintenance is being performed. Also, whenever possible, issue firmware changes as fully populated and tested circuit cards (not as individual chips).
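A minimal sketch of the range check recommended in item (3) follows. It was written for this text and is not drawn from the original material; the limits and the alarm and safe-state actions are hypothetical placeholders:

    SAFE_RANGE = (0.0, 500.0)   # hypothetical valid range, in engineering units

    def raise_alarm(message):
        print("ALARM:", message)                  # stand-in for the operator alert

    def enter_safe_state():
        print("commanding system to safe state")  # stand-in for the shutdown action

    def validate_reading(value, valid_range=SAFE_RANGE):
        """Pass a sensor reading on for processing only if it is in range.

        Out-of-range data set off an alarm and drive the system to a safe
        state instead of being processed.
        """
        low, high = valid_range
        if low <= value <= high:
            return value
        raise_alarm(f"reading {value} outside [{low}, {high}]")
        enter_safe_state()
        return None

    validate_reading(720.0)   # out of range: alarms and enters the safe state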
Software Development Tools

Several methods can be used to analyze and verify software.

Fault tree analysis.--This can identify critical faults and potential faults or problems. Then, all the conditions that can lead to these faults are considered and diagrammed.

Petri net analysis.--This provides a way to model systems graphically. A Petri net has a set of symbols that show inputs, outputs, and states, with nodes that are either "places" (represented by circles) or "transitions" (represented by vertical lines). When all the places with connections to a transition are marked, the net is "fired" by removing marks from each input place and adding a mark to each place pointed to by the transition (the output places) (ref. 8-4). (A minimal sketch of this firing rule appears after this list.)

Hazard analysis.--This uses formal methods to identify hazards and evaluate software systems (ref. 8-11).

Formal logic analyzers.--These are logic engines that can verify specifications. Some source analyzers can reveal logic problems in code and branching problems.

Pseudocodes.--These are used for program design and verification. They are similar to programming languages but are not compiled. They have the flow and naming notation of a programming language but have a readable style that allows someone to better understand program logic (ref. 8-4).

State transition diagrams (STD's).--These are graphs that show the possible states of the system as nodes and the possible changes that may take place as lines. They can highlight poor architecture or unnecessarily complex computer code (ref. 8-4).

Software failure modes and effects analysis (FMEA).--This analyzes what can go wrong with the software and with the system itself. The FMEA should analyze whether the system is fault tolerant with respect to hardware failures and make certain that the system specifications are complete. The actual failure of the computer hardware usually results in a hard failure, and the effects are easily identified. However, the effects of failures handled by software may not be so clear. For example, how does the software handle the loss of one piece of sensor data or a recovery from a fault?
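The firing rule quoted for Petri nets reduces to a few lines of code. The sketch below is a minimal illustration prepared for this text; the net, its place names, and its marking are hypothetical:

    def fire(transition, marking, inputs, outputs):
        """Fire a transition if every one of its input places holds a mark.

        marking -- dict mapping place name to token (mark) count
        inputs  -- dict mapping each transition to its input places
        outputs -- dict mapping each transition to its output places
        """
        if all(marking[p] > 0 for p in inputs[transition]):
            for p in inputs[transition]:
                marking[p] -= 1          # remove a mark from each input place
            for p in outputs[transition]:
                marking[p] += 1          # add a mark to each output place
            return True
        return False                     # the transition is not enabled

    # Hypothetical two-place net: t1 moves a token from "ready" to "running".
    marking = {"ready": 1, "running": 0}
    inputs = {"t1": ["ready"]}
    outputs = {"t1": ["running"]}
    fire("t1", marking, inputs, outputs)
    print(marking)                       # prints {'ready': 0, 'running': 1}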
Software Safety Axioms and Suggestions

These axioms should be read and reread and the principles behind them thoroughly understood.

(1) Persons who design software should not write the code, and those who write the code should not do the testing.
(2) Accidents are caused by incomplete or wrong assumptions about the system or process being controlled. Actual coding errors are less frequent perpetrators of accidents.
(3) Unhandled controlled-system states and environmental conditions are a big cause of "software malfunctions."
(4) The lack of up-to-date professional standards in software engineering, and/or the lack of use of these standards, is a root cause of many problems.
(5) Changes to the original system specifications should be limited.
(6) It is impossible to build a complex software system that behaves exactly as it should under all conditions.
(7) Software safety, quality, and reliability are designed in, not tested in.
(8) Upstream approaches to software safety are the most effective.
(9) Software alone is neither safe nor unsafe.
(10) Many software bugs are timing problems, which are difficult to test for.
(11) Software often fails because it goes somewhere that the programmer did not think it could get to.
(12) Software systems do not work well until they have been used.
(13) Mathematical functions implemented by software are not continuous functions but have an arbitrary number of discontinuities.
(14) Engineers believe one can design "black box tests" of software systems without knowledge of what is inside the box.
(15) Safety-critical systems should be kept as small and as simple as possible; any functions that are not safety critical should be moved to other modules.
(16) A software control system should be treated as a single-point failure (in the past the software was often ignored).
(17) What must not happen should be decided at the outset, and then one should make sure that the program cannot get there.
(18) The system should be fault tolerant and able to recover from faults and instruction jumps.
(19) Independent verification and validation (IV&V) of the software should be used.

Conclusions

Software is now used in many safety-critical applications, and each system has the potential to be a single-point failure or to be zero fault tolerant; that is, a single failure will cause the system to fail or, if a computer is controlling a hazardous function, a single failure can cause a hazardous condition to exist. Potential problems with software are not well understood. Computers controlling a system (the computer hardware, the software, the sensors, and the output devices that direct the flow of energy) are not a black box that can be ignored in a safety, reliability, or risk evaluation. However, if handled and applied properly, software and hardware may be used to control a system and thus can be a valuable design option.

The software development process can be improved by good communication, documentation, standardization, and configuration management. Another major factor in proper software development is correct and understandable requirements. Factors that help to improve confidence in the system are anticipating problems, properly handling errors, and improving hardware reliability. Methods to validate and improve software quality (and safety) are discussed in part II.

Part II--Software Quality and the Design and Inspection Process

Software Development Specifications

Improving software with standards and controls must include the following:

• Robust design: making software fault tolerant
• Process controls: standardizing the software development process
• Design standards: standardizing the software specifications
• Inspection: standardizing the software requirements inspection process
• Code inspection: standardizing the software code inspection process

Precise and easily readable documentation and specifications are necessary for a successful software project. Ideally, formal methods and a specifications language should be used and, once written, must be understood and adhered to. Accomplishing this requires team participation in document and specification generation, as well as real support of the specifications, the documentation, and the verification of software conformance and validation by upper management and the team.
Some of these documents and related practices should include

(1) A formal software management plan that includes the software development cycle, the configuration management plan, approval authority, and group charter and responsibilities. (This plan would specify what other documentation is required, how interfaces are to be controlled, and what the quality assurance and verification requirements are.)
(2) A formal software design specification that includes architecture specifications and hardware interfaces
(3) A software development plan that describes development activities, facilities, personnel, activity flow, and the development tools for software generation
(4) A plan for formal inspection of software that includes (a) a software quality assurance plan to integrate hardware and software safety, quality, and reliability; (b) a software verification test specification; and (c) a software fault tolerance and failure modes and effects analysis specification
(5) A software safety program plan that includes a software safety handbook and reliability practices specifications
(6) A formal plan for maintenance and operation
(7) Configuration management and documentation plans that specify recording all changes to software and the reasons for the changes. (Records should include design changes that require software modifications or any change in the functional capabilities, performance specifications, or allocation of software to components or interfaces.)
(8) Interface control documentation that specifies linking hardware and software, vendor-supplied software, and internally generated software
(9) Failure review boards to review bugs, the bug removal process, and the overall effect of bugs on the system
(10) Lessons learned, to be used to document problems and their solutions and to eliminate repetition of errors
(11) Test plans that will, to the greatest extent possible, validate the software system

Once these documents are developed and the procedures set up, they must be implemented, enforced, and maintained. A multidisciplined software system safety working team can assist software engineering and continually monitor adherence to the documentation. The team also has to engender respect for the need to follow the specifications, not mandate them and walk away. Therefore, the team and software engineering management must educate programmers in the understanding and use of specifications (ref. 8-13).

Specifications and Programming Standards

Structured programming with a well-defined design approach and extensive commenting benefits the software design process. Standardizing formats, nomenclature, language, compilers, and platforms for the software contributes to project success as well. Besides the many excellent internal company standards for software development, a number of documents exist to help in standardization and to gauge the maturity of software development. Some of these documents are

(1) The Software Engineering Institute (SEI) Capability Maturity Model (CMM), a method for assessing the software engineering capabilities of development organizations. It evaluates the level of process control and methodology in developing software and is designed to rank the "maturity" of the company and its ability to undertake major software development projects.
(2) ISO 9000-3, Guidelines for the Application of ISO 9001 to the Development, Supply, and Maintenance of Software, which is intended to provide suggested controls and methods.
(3) The IEEE Software Engineering Standards Collection, which includes 22 standards (1993 edition) covering terminology, quality assurance plans, configuration management, test documentation, requirements specifications, maintenance, metrics, and other subjects.
(4) NASA-developed software standards, including NSS 1740.13 (INTERIM, June 1994), NASA Software Safety Standard, which expands on the requirements of NASA Management Instruction (NMI) 2410.10, NASA Software Management Assurance and Engineering Policy. These documents contain a detailed reference document list.
(5) DOD standards, including MIL-STD-882C, System Safety Program Requirements (ref. 8-7); DOD-STD-2167A and MIL-STD-498, Defense System Software Development (ref. 8-14); and numerous other standards and guidelines for software development and documentation (e.g., ref. 8-15) (for reference only).

NASA Software Inspection Activities

We now want to focus on one area of the software documentation, testing, inspection, and qualification process: the software inspection activity. This inspection process includes (1) metrics, (2) software inspection training, and (3) formal software inspection. Inspection activities include

• Implementation of requirements
• Review of pseudocode
• Review of mechanics
• Review of data structure
• "Walkthrough" of code
• Verification and validation
• Independent verification and validation

The objectives of formal inspection include (1) removing defects as early as possible in the development process, (2) having a structured, well-defined review process for finding and fixing defects, (3) generating metrics and checklists used to
improve quality, (4) following total quality management (TQM) techniques such as working together as a team, and (5) taking responsibility for a work product shared by the author's peers.

To achieve these objectives, specifications must be reviewable, formally analyzable, and usable by the designers and the assurance and safety engineers. Furthermore, the specifications must support completeness and robustness checks, and they must support the generation of mission test data.

Formal design requirements and inspections.--The objective of inspection is to remove defects at the earliest possible point in the product development life cycle. The product can be a document, a process, software, or a design. Inspection topics include requirements, design requirements, detailed design requirements, source code, test plans, procedures, manual standards, and plans.

Inspection is a very structured process that requires that team members, who are involved because of their technical expertise, be sincerely interested in the software product. Rather than being viewed as a fault-finding mission, the inspection should be considered a tool to help the author identify and correct problems as early as possible in the development process. The inspection should also help to foster a team environment by emphasizing that everyone is involved to develop a high-quality product.

Metrics (minor errors discovered, major errors discovered) generated during this process are used to monitor the types of software defects discovered and to help prevent their recurrence (refs. 8-16 to 8-18).

Process overview.--Staff, procedures, development time, and training are applied to a developing software product to improve its quality. The formal seven-step program for inspection includes

(1) The planning phase: organizing for the inspection
(2) The training phase: background and details of the inspection activity given to team members
(3) The preparation phase: review of the work by individual inspectors prior to the joint inspection meeting
(4) The inspection meeting: defects identified, classified, and recorded by the team
(5) The "third hour" (cause phase): offline discussions held by programmers to get help with defects
(6) The rework phase (corrective action): defects corrected by programmers
(7) The followup phase: revisions reviewed and verified by the team

Roles.--Each person who participates in the inspection performs various tasks:

• Moderator: coordinates the inspection process, chairs the inspection meetings, and ensures that the inspection is conducted
• Reader: presents the work product to the inspection team during the meeting (the programmer (author) does not give the presentation)
• Recorder: documents all the defects, open issues, and action items brought forward during the meeting
• Inspector: helps to identify and evaluate defects (the responsibility of every person at the meeting)

Development process benefits.--Some of the benefits of formal inspection for the overall software development process are that it

• Improves quality and saves cost through early fault detection and correction
• Provides a technically correct base for the following development phases
• Contributes to project tracking
• Improves communication between developers
• Aids in project education of personnel
• Provides structure for in-process reviews

Inspection also benefits the software developer in a number of ways:

• Reduces defects made by the author because they are identified early in the product life cycle
• Identifies efficiently any omissions in the requirements
• Provides constructive criticism of and guidance to the programmer by the inspection team in private rather than by tearing down software in open public project design reviews
• Provides a constructive atmosphere for the entire team because of lessons learned from others' mistakes
• Implements improved project tracking with inspection milestones embedded in the project
• Improves understanding of the overall project and engenders communication and teamwork by bringing together project persons from varied backgrounds
• Trains new members of the software development team by having them work with senior team members

Figure 8-4 presents the waterfall flowchart of the software development process (based on phases in MIL-STD-498, Defense System Software Development, ref. 8-14). The following acronyms are used:

CDR critical design review
CSCI computer software configuration item (major computer software program)
CSU computer software unit (program module)
FCA functional configuration audit
I software inspections
IV&V independent verification and validation activity
PCA physical configuration audit
PDR preliminary design review
SDR system design review
SRR system requirements review
SSR software specification review
SW computer software
TRR test readiness review
V&V verification and validation activity

Basic rules of inspection.--These basic rules must be followed if the software inspection process is to be effective:

(1) Inspections are in-process reviews conducted during the development of a product, in contrast to milestone reviews conducted between development phases.
(2) Inspections are conducted by a small peer team, each member of which has a special interest in the project's success.
(3) Managers are not involved in the inspection, and its results are not used as a tool to evaluate developers.
(4) The moderator leads the inspection and must have received formal training to do so.
(5) Each team member, in addition to being an inspector, is assigned a specific role.
(6) The inspection is spelled out in detail, and no step of the process is omitted.
(7) The overall time of the inspection is preset to aid in meeting the schedule.
(8) Checklists are used to help identify defects.
(9) Inspection teams should work at an optimal rate, the object of the meeting being to identify as many defects as possible, not to cover as many pages as possible.
(10) Inspection metrics are defect type, number, and time spent on inspections. These metrics are used to improve the development process and the work product and to monitor the inspection.
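Rule (10) implies that every inspection should leave a small, uniform data record behind. A minimal sketch of such a record and the metrics derived from it follows; the defect categories and counts are hypothetical, not from the original material:

    # Hypothetical defect log from one inspection meeting, as kept by the
    # recorder: (defect type, severity) pairs.
    defects = [
        ("interface", "major"), ("logic", "minor"), ("data", "minor"),
        ("logic", "major"), ("standards", "minor"),
    ]
    hours_spent = 6.0   # preparation plus meeting time for the whole team

    by_type = {}
    for dtype, severity in defects:
        by_type.setdefault(dtype, []).append(severity)

    print("defects found:", len(defects))
    print("defects per inspection hour:", round(len(defects) / hours_spent, 2))
    for dtype, severities in sorted(by_type.items()):
        print(f"  {dtype}: {len(severities)} total, {severities.count('major')} major")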

Results of software inspections.--Formal inspections save costs because fixing defects early in the development cycle is less costly than removing them later; they train team members and provide them with a valuable development tool through lessons learned from their participation in the bug identification and removal process; and they improve developer and development efficiency and lead to higher quality.

Fixing a defect found through inspection costs on the average less than 1 hour per defect; fixing a defect found during software testing typically takes from 5 to 18 hours. Another cost factor is that defects tend to amplify: one defect in requirements or design may impact multiple lines of code. For example, a small study conducted by the Jet Propulsion Laboratory (JPL) found an amplification rate of 1 to 15, which means that 1 defect in the requirements impacts 15 source lines of code (SLOC), as seen in figure 8-5 (information taken from ref. 8-19).

Figure 8-5.--Amplification of requirements into source code. Average amplification ratio, 1:15 (from ref. 8-19). [figure omitted]
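The cost argument above can be made concrete with rough arithmetic. The sketch below uses the figures quoted in the text (a 1:15 amplification ratio, under 1 hour per inspection fix, and 5 to 18 hours per test fix); the defect count itself is a hypothetical example:

    requirements_defects = 10          # hypothetical count of requirements defects
    amplification = 15                 # 1 requirements defect -> 15 defective SLOC
    inspection_hours_per_defect = 1.0  # upper bound quoted for inspection fixes
    test_hours_per_defect = (5 + 18) / 2.0   # midpoint of the 5- to 18-hour range

    defective_sloc = requirements_defects * amplification
    cost_if_inspected = requirements_defects * inspection_hours_per_defect
    cost_if_found_in_test = requirements_defects * test_hours_per_defect

    print(f"{defective_sloc} defective source lines from {requirements_defects} defects")
    print(f"fixed in inspection: about {cost_if_inspected:.0f} hours")
    print(f"fixed in test: about {cost_if_found_in_test:.0f} hours")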

Figure 8-4.--Waterfall flowchart of the software development process, based on the phases of MIL-STD-498, showing the embedded inspection points. [figure omitted]

Inspections were also used at IBM Federal Systems to develop software for the space shuttle. The original defect rate of 2.25 defects per thousand lines of code (KLOC) was unacceptable. Over a 3-year period, inspections were applied to requirements, design, code and test plans, specifications, and procedures. The goal for this effort was 0.2 defect per KLOC. With inspections, the project was able to surpass the goal and attain a defect rate of 0.08 defect per KLOC.

One of the most essential lessons learned from the initial implementation of the inspection process is that all inspection participants require some type of training. Everyone needs to understand the purpose and focus of inspections and the resources required to support the process. Adequate time has to be provided for inspections in the software development process. Furthermore, using metrics from inspections provides an excellent basis for monitoring both the inspection and development processes and for evaluating process improvements.

Another lesson learned is that a formal inspection requires projects to have an established development life cycle, an established set of documents produced during the phases of the life cycle, programming standards, and software development standards (e.g., NASA Software Assurance Standard, NASA-STD-2201-93, which states that "Software verification and validation activities shall be performed during each phase of the software life cycle and shall include formal inspections.").

Additional benefits of formal inspections to the project are that they can be used with any development methodology because, no matter which development process or life cycle is used, the products being produced can be inspected; they are applied during the development of work products and are a complement to milestone or formal reviews but are not intended to replace them; and they are recommended by the NASA Software Assurance Standard and can be applied to the work products called out in the NASA Software Documentation Standard (refs. 8-20 to 8-22).

Additional Recommendations

On the basis of an evaluation of the space shuttle software development process, the following recommendations were made (ref. 8-13):

(1) Verification and validation (V&V) inspections by contractors should pay close attention to off-nominal cases (crew and/or ground error, hardware failure, and software error conditions), should focus on verifying the consistency of the levels of descriptions for modules with the consistency of module requirements and the design platform, should assure correctness with respect to the hardware and software platforms, and should maintain the real independence of independent verification and validation (IV&V).
(2) The project should have sufficient personnel trained in system reliability and quality assurance (SR&QA) to support software-related activities and provide oversight and evaluation of software development activities by the individual SR&QA offices.
(3) The same standards and procedures should be provided and enforced for multiple centers on the same program. Consistent software development coding guidelines should be provided to contractors.
(4) Visibility for potential software problems should be provided by defining detailed procedures to report software reliability, quality assurance (QA), or safety problems to the program-level organization.
(5) Accepted policies and guidelines should be provided for the development and implementation of software V&V, IV&V, assurance, and safety. These should include a well-documented maintenance and upgrade process.
(6) Sufficient resources, personnel, and expertise should be provided to develop the required standards. Also, sufficient resources, manpower, and authority should be used to compel development contractors to verify that proper procedures were followed.
(7) Lessons learned in the development, maintenance, and assurance of software should be recorded for use by other programs (refs. 8-23 to 8-26).
(8) The information that each development and oversight contractor is responsible for making available to the community as a whole should be precisely identified. Mechanisms should be in place to ensure that programs are given all information needed to make intelligent implementations of software oversight functions.

Conclusions

The overall software design process will be improved by carefully constructing the initial documentation to generate real and usable requirements. Requirements must be capable of being verified by inspection and test.

Software product assurance activities include formal inspection, production-quality metrics, software inspection training, a code "walkthrough," verification and validation, and independent verification and validation. These activities are making NASA projects more successful.

References

8-1. Halfhill, T.R.: The Truth Behind the Pentium Bug. BYTE, vol. 20, no. 3, Mar. 1995.
8-2. Gray, E.M.; and Thayer, R.H.: Requirements. Aerospace Software Engineering: A Collection of Concepts, C. Anderson and M. Dorfman, eds., AIAA, Washington, DC, 1991, pp. 89-121.
8-3. F-22 Software on Track With Standard Processes. Aviat. Week Space Technol., vol. 143, no. 4, July 1995, pp. 53-54.
8-4. Norris, M.; and Rigby, P.: Software Engineering Explained. Wiley, New York, NY, 1992.
8-5. Beatson, J.: Is America Ready to 'Fly by Wire'? Washington Post, sec. C, April 2, 1989, p. 1.
8-6. Mecham, M.: Autopilot Go-Around Key to CAL Crash. Aviat. Week Space Technol., vol. 140, no. 19, May 1994, pp. 31-32.
8-7. System Safety Program Requirements. MIL-STD-882C, Jan. 1993.
8-8. Asker, J.R.; and Lenorovitz, J.M.: Computer Woes Spoil Clementine Flyby Plan. Aviat. Week Space Technol., vol. 140, no. 20, May 1994, p. 58.
8-9. Begley, Sharon: Mystery Stories at 10,000 Feet. Newsweek, vol. 122, July 1993, p. 50.
8-10. Addy, E.A.: A Case Study on Isolation of Safety-Critical Software. Proc. 6th Annual Conference on Computer Assurance, NIST/IEEE, 1991, pp. 75-83.
8-11. Leveson, N.: Safeware: System Safety and Computers. Addison-Wesley, Reading, MA, 1995.
8-12. Knight, J.C.; and Leveson, N.G.: An Experimental Evaluation of the Assumption of Independence in Multiversion Programming. IEEE Trans. Software Eng., vol. SE-12, no. 1, Jan. 1986, pp. 96-109.
8-13. Soistman, E.C.; and Ragsdale, K.B.: Impact of Hardware/Software Faults on System Reliability. Report No. OR 18,173, Griffiss Air Force Base, Rome Air Development Center, Rome, NY, 1985.
8-14. Defense System Software Development. MIL-STD-498, Dec. 1994.
8-15. Software Engineering Handbook. General Electric Company, Corporate Information Systems, McGraw-Hill, New York, NY, 1986.
8-16. Brooks, F.P., Jr.: No Silver Bullet: Essence and Accidents of Software Engineering. Comput., vol. 20, no. 4, 1987, pp. 10-19.
8-17. Fagan, M.E.: Advances in Software Inspections. IEEE Trans. Software Eng., vol. SE-12, no. 7, July 1986, pp. 123-152.
8-18. Clardy, R.C.: Sneak Analysis: An Integrated Approach. Third International System Safety Conference Proceedings, 1977, pp. 377-387.
8-19. Kelly, John C.; Sherif, Joseph S.; and Hops, Jonathan: An Analysis of Defect Densities Found During Software Inspection. J. Systems Software, vol. 17, no. 2, Feb. 1992, pp. 111-117.
8-20. Software System Safety Handbook. AFISC SSH 1-1, U.S. Air Force Inspection and Safety Center, Norton Air Force Base, CA, 1985.
8-21. Leveson, N.: Safety Analysis Using Petri Nets. IEEE Trans. Software Eng., 1986.
8-22. Mattern, S.: Confessions of a Modern-Day Software Safety Analyst. Proceedings of the Ninth International System Safety Conference, Long Beach, California, June 1989.
8-23. Goel, A.L.; and Okumoto, K.: A Time Dependent Error Rate Model for Software Reliability and Other Performance Measures. IEEE Transactions on Reliability, vol. R-28, 1979, pp. 206-211.
8-24. Sagols, G.; and Albin, J.L.: Reliability Model and Software Development: A Practical Approach. Software Engineering: Practice and Experience, E. Girard, ed., Oxford Publishing, Oxford, England, 1984.
8-25. Reliability Qualification and Production Approval Tests. MIL-STD-781, Oct. 1986.
8-26. Myers, G.: Software Reliability: Principles and Practice. Wiley, New York, NY, 1976.

Reliability Training

Read the "Reference Document for Inspection: 'Big Bird's' House Concept" (found at the end of ch. 7). The class meeting exercise explains what has to be done, and the reference document explains the system requirements. The "'Big Bird's' Requirements Checklist" gives the classifications for the inspection. Complete the "'Big Bird's' Formal Inspection Subsystems Requirements" and send it to the instructor to grade. A score of 70 percent correct will qualify you for a certificate (e.g., item 1, 2-acceptable; item 3-squawk, a cubit is about 17 inches, major, wrong, correctness, system).

Chapter 9
Software Quality Assurance
Concept of Quality

Let us first look at the concept of quality before going on to software quality. The need for quality is universal. The concepts of "zero defects" and "doing it right the first time" have changed our perspective on quality management from that of measuring defects per unit and acceptable quality levels to monitoring the design and cost-reduction processes. The present concepts indicate that quality is not free. One viewpoint is that a major improvement in quality can be achieved by perfecting the process of developing a product. Thus, we would characterize the process, implement processes to achieve customer satisfaction, correct defects as soon as possible, and then strive for total quality management. The key to achieving quality appears to have a third major factor in addition to product and process: the environment. People are important because they make the process or the product successful. Figure 9-1 represents the union of these three factors.

Figure 9-1.--Quality diagram (the union of product, process, and environment). [figure omitted]

The term "software quality" is defined and interpreted differently by the many companies involved in producing programming products. To place the subject in perspective, we present principles and definitions for software quality from several source materials:

(1) The purpose of software quality assurance is to assure the acquisition of high-quality software products on schedule, within cost, and in compliance with the performance requirements (ref. 9-1).
(2) The developer of a methodology for assessing the quality of a software product must respond to various needs. There can be no single quality metric (ref. 9-2).
(3) The process of assessing the quality of a software product begins when specific characteristics and certain of the metrics are selected (ref. 9-3).
(4) Software quality can be defined as (a) the totality of features and characteristics of a software product that bear on its ability to satisfy needs (e.g., conform to specifications), (b) the degree to which software possesses a desired combination of attributes, (c) the degree to which a customer or user perceives that software meets his or her expectations, and (d) the composite characteristics of software that determine the degree to which the software in use will meet the expectations of the user.

We can infer from these statements and other source materials that software quality metrics (e.g., defects per 1000 lines of code per programmer year, 70 percent successful test cases for the first 4 weeks, and zero major problems at the preliminary design review) may vary more than hardware quality metrics (e.g., mean time between failures (MTBF) or errors per 1000 transactions). In addition, software quality management has generally focused on the process, whereas software reliability management has focused on the product. Since processes differ for different software products, few comparative benchmarks are available. For hardware in general, benchmarks have been available for a long time (e.g., the MIL-HDBK-217E series (ref. 9-4) for reliability). Recently, Rome Air Development Center (RADC), the sponsor of MIL-HDBK-217E, sponsored a software reliability survey that was intended to give software quality the same status as that of hardware.

The next step is to discuss the process of achieving quality in software and how quality management is involved. The purpose of quality management for programming products is to ensure that a preselected software quality level be achieved on schedule and in a cost-effective manner. In developing a quality management system, the programming product's critical life-cycle phase reviews provide the reference base for tracking the achievement of quality objectives. The guidelines for reliability and maintainability management of the International Electrotechnical Commission (IEC) system life-cycle phases follow:
(1) Concept and definition: The need for the product is
decided and its basic requirements defined, usually in the form of a product specification agreed upon by the manufacturer and user.
(2) Design and development: The product hardware and software are created to perform the functions described in the product specification. This phase will normally include the assembly and testing of a prototype product under laboratory-simulated conditions or in actual field trials and the formulation of detailed manufacturing specifications and instructions for operation and maintenance.
(3) Manufacturing, installation, and acceptance: The design is put into production. In the case of large, complex products, the installation of the product on a particular site may be regarded as an extension of the manufacturing process. This phase will normally conclude with acceptance testing of the product before it is released to the user.
(4) Operation and maintenance: The product is operated for the period of its useful life. During this phase, essential preventive and corrective maintenance is performed, product enhancements are made, and product performance is monitored. The useful life of a product ends when its operation becomes uneconomical because of increasing repair costs, it becomes technically obsolete, or other factors make its use impractical.
(5) Disposal: The product reaches the end of its planned useful life or the requirement no longer exists for the product, so it is disposed of, destroyed, or modernized, if economically feasible.

The quality of the programming product can be controlled in the first three life-cycle phases to achieve the expected level of performance of the final product. When the fourth phase (operation and maintenance) has been entered, the quality of the software is generally fixed. With these five life-cycle phase boundaries in place, we can conceptualize what can be implemented as "programming quality measurement." If the phases and activities are the X- and Y-coordinates, the individual quality metrics can be placed on the Z-axis as shown in figure 9-2.

Figure 9-2.--Programming quality measurement map (life-cycle phases along the X-axis, activities along the Y-axis, and individual quality metrics (QM) along the Z-axis). [figure omitted]

Without stating the specific activities for each phase, we can discuss the generalities of software quality and its cost. The cost of implementing quality increases with distance along the X-axis. Activities can be arranged along the Y-axis so that the cost of quality increases with distance along the Y-axis. With this arrangement, we can establish rigorous quality standards for the individual quality metrics as a function of cost effectiveness (e.g., error seeding, the statistical implanting and removal of software defects, may be expensive). Other quality metrics (e.g., test case effectiveness) may cost significantly less and could be selected.

In general, for a programming product, the higher the level of quality, the lower the costs of the product's operation and maintenance phase. This fact produces an incentive for implementing quality metrics in the early design phases. The programming industry has traditionally required large maintenance organizations to correct programming product defects. Figure 9-3 presents a typical phase-cost curve that shows the increased costs of correcting programming defects in the later phases of the programming product's life cycle. Note that the vertical axis is nonlinear.

Figure 9-3.--Increasing costs of programming defects (relative cost of correction, on a nonlinear scale from 1 to 100, versus life-cycle phase from concept and definition through operation and maintenance). [figure omitted]

Software Quality

The next step is to look at specific software quality items. Software quality is defined in reference 9-4 as "the achievement of a preselected software quality level within the costs, schedule, and productivity boundaries established by management." However, agreement on such a definition is often difficult to achieve. In practice, the quality emphasis can change with respect to the specific product application environment. Different perspectives of software product quality have been presented over the years. However, in today's literature, there is general agreement that the proper quality level for a particular software product should be determined in the concept and definition phase and that quality managers should monitor the project during the remaining life-cycle phases to ensure the proper quality level.
The developer of a methodology for assessing the quality of a software product must respond to the specific characteristics of the product. There can be no single quality metric. The process of assessing the quality of a software product begins with the selection of specific characteristics, quality metrics, and performance criteria.

The specifics of software quality can now be addressed with respect to these areas:

(1) Software quality characteristics
(2) Software quality metrics
(3) Overall software quality metrics
(4) Software quality standards

Areas (1) and (2) are applicable during the design and development phase and the operation and maintenance phase. In general, area (2) is used during the design and development phase before the acceptance phase for a given software product.

Software Quality Characteristics

A software quality characteristic tree is presented in reference 9-5. The authors assume that different software products require different sets of quality characteristics. A product that has a rigorous constraint on size may sacrifice the maintainability characteristic of the software to meet its operational program size goals. However, this same product may need to be highly portable for use on several different processors. In general, the primary software quality characteristics are

(1) Maintainability
(2) Portability
(3) Reliability
(4) Testability
(5) Understandability
(6) Usability
(7) Freedom from error

Management's view of software quality is the quality characteristics. Established criteria for these characteristics will provide the level of quality desired. The quantitative measures (metrics) place the quality at the achieved level. This concept is shown in figure 9-4.

Figure 9-4.--Management's view of quality (a layered structure from characteristics, through criteria, to metrics). [figure omitted]

Software quality criteria and metrics are directly related to the specific product. Too often, establishing the characteristic and the metric in the early life-cycle phases without the proper criteria leads to defective software. An example of the characteristics and their importance for various applications is presented in table 9-1.

TABLE 9-1.--APPLICATION-DEPENDENT SOFTWARE QUALITY CHARACTERISTICS

Characteristic     Application                       Importance
Maintainability    Aircraft                          High
                   Management information systems    Medium
                   Testbeds                          Low
Portability        Spacecraft                        Low
                   Testbeds                          High

Software Quality Metrics

The entire area of software measurements and metrics has been widely published and discussed. Two textbooks (refs. 9-6 and 9-7) and the establishment of the Institute of Electrical and Electronics Engineers (IEEE) Computer Society's working group on metrics, which has developed a guide for software reliability measurement, are three examples of such activity. Software metrics cannot be developed before the cause and effect of a software defect have been established for a given product with relation to its product life cycle.
TABLE 9-2.--MEASUREMENT OF SOFTWARE QUALITY CHARACTERISTICS

[Matrix of the quality characteristics (maintainability, portability, reliability, testability (test case completion and estimate of bugs remaining), understandability, usability, and freedom from error) against software life-cycle phases 3 (product definition), 4 (top-level design), 5 (detailed design), 7 (testing and integration), and 9 (maintenance and enhancements); the individual cell entries are not recoverable in this copy.]

(a) Where the quality characteristic should be measured.
(b) Where the impact of poor quality is realized.
(c) The metric can take the form of a process indicator.

TABLE 9-3.--MEASUREMENTS AND PROGRAMMING PRODUCT LIFE CYCLE

System life-cycle phase; software life-cycle phase: primary measure; secondary measure

Concept and definition
  Conceptual planning (1): ---; ---
  Requirements definition (2): ---; ---
  Product definition (3): quality metrics(a); ---
Design and development
  Top-level design (4): quality metrics; process indicators
  Detailed design (5): quality metrics; process indicators
  Implementation (6): process indicators(b); quality metrics
Manufacturing and installation
  Testing and integration (7): process indicators; performance measures
  Qualification, installation, and acceptance (8): performance measures(c); quality metrics
Operation and maintenance
  Maintenance and enhancements (9): performance measures; ---
Disposal
  Disposal (10): ---; ---

(a) Metrics: qualitative assessment, quantitative prediction, or both.
(b) Indicators: month-by-month tracking of key project parameters.
(c) Measures: quantitative performance assessment.

Table 9-2 is a typical cause-and-effect chart for a software product and includes the process indicator concept. At the testing stage of product development, the evolution of software quality levels can be assessed by characteristics such as freedom from error, completion of successful test cases, and the estimate of the software bugs remaining. These process indicators can be used to predict slippage of the product delivery date, the inability to meet original design goals, or other development problems.

When the programming product enters the qualification, installation, and acceptance phase and continues into the maintenance and enhancements phase, the concept of performance is important in the quality characteristic activity. This concept is shown in table 9-3, where the aforementioned 5 IEC system life-cycle phases have been expanded into 10 software life-cycle phases:

(1) Conceptual planning: The functional, operational, and economic context of the proposed software is understood and documented in a product proposal.
(2) Requirements definition: A product proposal is expanded into specific product requirements, and the requirements, such as performance and functional capabilities, are analyzed and translated into unambiguous developer-oriented terms.
(3) Product definition: Software engineering principles, technical information, and creativity are used to describe the architecture, interfaces, algorithms, and data that will satisfy the specified requirements.
(4) Top-level design: The functional, operational, and performance requirements are analyzed, and designs for system architecture, software architecture, interfaces, and data are created and documented to satisfy requirements.
(5) Detailed design: The functional, operational, and performance requirements are analyzed, and designs for system architecture, software architecture, components, interfaces, and data are further created, documented, and verified to satisfy requirements.
(6) Implementation: The software product is created or implemented from the software design, and the faults are detected and removed.
(7) Testing and integration: Software elements, hardware elements, or both are combined into an overall system or an element of a system, and the elements are tested in an orderly process until the entire system has been evaluated, integrated, and tested.
(8) Qualification, installation, and acceptance: A software product is formally tested to assure the customer or the customer's representative that the product meets its specified requirements. This phase includes all steps necessary to deliver, install, and test a specific release of the system software and its deliverable documentation.
(9) Maintenance and enhancements: The product is ready for serving its designated function, is monitored for satisfactory performance, and is modified as necessary to correct problems or to respond to changing requirements.
(10) Disposal: The product reaches the end of its planned useful life or the requirement no longer exists for the product, and it is disposed of, destroyed, or modernized, if economically feasible.

Overall Software Quality Metrics

Several overall software quality metrics have been put into practice and have effectively indicated software quality. Jones (ref. 9-8) presents an overall quality metric called defect removal efficiency. The data collected for the overall quality metric are simplified to the more practical expression of "defects per 1000 lines of source code."
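Both of these overall metrics reduce to simple arithmetic. The sketch below computes a delivered defect density per 1000 lines and a defect removal efficiency in the spirit of Jones (ref. 9-8); the counts are hypothetical, and the formulation shown is a common one rather than necessarily the exact one used in the reference:

    sloc = 48_000                        # hypothetical delivered source lines of code
    defects_found_before_release = 230   # hypothetical inspection and test discoveries
    defects_found_after_release = 20     # hypothetical field discoveries

    total_defects = defects_found_before_release + defects_found_after_release
    defects_per_ksloc = 1000.0 * defects_found_after_release / sloc
    removal_efficiency = defects_found_before_release / total_defects

    print(f"delivered defect density: {defects_per_ksloc:.2f} per 1000 lines")
    print(f"defect removal efficiency: {removal_efficiency:.0%}")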
A second overall quality metric is based on the concept of quality prisms (refs. 9-9 and 9-10), which considers the extent of effort with which a given quality characteristic has been implanted into a product and the degree of effort for quality that has occurred in each life-cycle phase. An example of the extent and degree of effort is presented in table 9-4 for any given quality characteristic.

TABLE 9-4.--QUALITY CHARACTERISTIC DEGREE/EXTENT MATRIX

Degree of effort increases from level 0 (no quality activity) through level 4 (fourth level of quality):

Planning
  0--No activity required
  1--General high-level requirements definition
  2--Specific detailed definition
  3--Highly complex required definition and support
  4--Difficult or complex required definition and model prototype
Design and test
  0--No activity
  1--General architecture consideration; general test and measurement program
  2--Detailed architecture structure impact; language impact; test tailored to program
  3--Extensive architecture and structure consideration; tailored language, extended operating system, and man-machine interface impact, etc.; code walkthroughs; detailed documentation; extended test plans
  4--Separate quality teams to verify design; detailed test facility; extensive qualification test plans and procedures
Integration and installation
  0--No activity
  1--General quality management program; acceptance test; nominal change control quality program
  2--Extensive qualification test plans and detailed procedures to verify characteristics; above-nominal quality-requirement verification testing
  3--Quality teams formed; detailed quality configuration control release program; extensive data collection, verification, and analysis
  4--Specialized quality integration, manufacturing, and installation programs to ensure achievement of quality characteristics by a separate quality organization
Service
  0--No activity
  1--General quality tracking and redesign program to achieve quality objectives and requirements
  2--Formal data collection and analysis program to verify quality objectives; quality redesign effort
  3--Detailed measurements, data analysis, and modeling program to verify high-level quality objectives; extensive redesign to obtain detailed quality requirements; extensive change program
  4--Extensive measures and modeling, vigorous data analysis, and specialized tests to ensure high-level achievement of detailed quality requirements

From the table,

(1) Each quality characteristic can have a matrix similar to this, with a specific quality program tailored to a company's products.
(2) The quality effort is extended to each of the product's life-cycle phases to the degree desired by the company.
(3) For each level, as the complexity and difficulty of a characteristic requirement increase, the intensity of the test and verification program effort increases.
(4) This matrix will change for each characteristic in accordance with company emphasis.
(5) Traditionally, the quality levels of a product correspond to degrees of effort. However, this matrix extends the effort to all phases of the product's life cycle.

As an example of using the matrix shown in table 9-4, a characteristic such as reliability may be targeted to reach service level 2. Then throughout planning, design, testing, integration, and installation, the reliability effort should achieve at least level 2. These indicators are tied to the proper major phase review points of a product's life cycle. For most characteristics, the planning level should be achieved after the preliminary design review (PDR); the design level, after the development phase or at the critical design review (CDR); the integration level, after integration at the qualification testing; and the service level, during the operational service reviews.

Now, quality management can apply this matrix to each characteristic in a manner depending on how critical it is to ensure achievement of the characteristic. For example, the reliability goal for a key system may be 10 or fewer mishandled calls per week, but the reliability goal for a private branch exchange (PBX) may be only 5 mishandled calls per month. These objectives may cause quality management to define a planning 2, design 2, integration 2, and service 2 program for the key system and a more demanding planning 4, design 3, integration 3, and service 3 program for the PBX.

In this manner, the quality characteristics are clearly identified by detailed criteria that set the scope of, and limit, the required objectives. Once these objectives are identified, a quality program can be determined to define the specific required definition, design, test, and measurement efforts. No longer are nebulous measurements made against vague objectives in the service phase of a product's life cycle in a last-minute attempt to improve quality.

The program for pursuing quality characteristics must be established early. If a particular quality characteristic is not pursued to a reasonable extent in the planning and design phases, a maximum degree of effort (4) may not realistically be achieved in the service phase. Conversely, the more uniformly and consistently a quality characteristic is pursued, the more achievable and figuratively stable the characteristic. This is graphically represented for a single characteristic in figures 9-5 to 9-7, where the quality item is shown as either stable, unstable, or extremely costly to stabilize.

In figure 9-5 an optimum tradeoff of stability and productivity is portrayed. The base of the prism is secure, supporting the platform by properly balancing quality versus cost. In figure 9-6 schedule pressures have established an unstable prism to support the platform. In this example, the decision was made to send the product into the field at service level 1 even though it initially had reached a more extensive degree of quality (3) in the planning phase (considerable effort to define quality objectives in the planning phase but no followup). Figure 9-7 presents the extremely costly view of upgrading a programming product in the field to service level 4 (after passing the first three phases only to the first degree). Note the increasing amount of time and effort needed to achieve service levels 1, 2, or 3. Service level 4 in this example is usually extremely difficult and expensive, if not impossible, to achieve. The measured productivity of such a product will most likely be low.

An excellent example of the need for this type of quality management process occurred many years ago, but the lessons still apply today. An automated program was proposed to generate, from 160 fields of input data per customer, a centralized data base that would control a table-driven, wired-logic system. It was estimated that 13 weeks of design time would be required to construct this table generator by using a nominal amount of computer support time. A representative of the design group was assigned to define the input and output requirements for the support program and verify its operation. The program was initially written in assembly language. It was later redesigned and split into three separate programs written in a high-level language. These programs could then be separately designed, verified, and maintained. The main consideration became the verification process. An input and output test was written to check the extensive program paths. The project dragged along for a year as verification testing attempted to meet a zero-defect objective (imposed after the initial design had been completed). Costs increased and the schedule became critical as the customer became impatient (fig. 9-7). As the program began to function more successfully, deciding the degree of testing required for verification became a serious problem. Confrontation developed between the design and marketing departments over the commercial release of the program. The testing continued without agreement on the required degree of effort. Eventually, the customer became disillusioned and turned to another firm to provide the table generator.
Figure 9-5.--Stability in quality and cost. [figure omitted]

Figure 9-6.--Instability due to scheduling decisions. [figure omitted]

Figure 9-7.--Extremely costly programming products. [figure omitted]

Had a clear quality management decision been made in the planning phase and tracked throughout the development on the degree of error-free "verified" operation, the quality characteristic objectives for its design architecture and structure, the language required for changes, and so forth, a more realistic projection (and control) of schedule and people could have been achieved. Several releases to the customer may have been required as the program designs and operation were verified to a predetermined extent within the various life-cycle phases. Had this procedure been followed, both the customer and the supplier would have been more satisfied.

This example offered an excellent opportunity to first determine the type and degree of quality desired. Then management could have constructed a quality process, in terms of the extent and degree of each desired characteristic, with an elastic compromise between the schedule, resources, and design activity needed to achieve it. In this case, many of the "ilities" (changeability, usability, maintainability, and reliability) were subsequently more critically identified. These considerations could have been translated into the initial requirements for structural design, program segmentation, extensive documentation, type of language, amount of code walkthrough, number of subfunctional tests, amount of error acceptable at first release, depth of verification reviews, and so on. From this form of planning, the quality prisms could have been established to define the extent and degree (such as service level 2, 3, or 4) to which each of these characteristics should have been pursued in terms of project cost restraints that depended on user willingness to pay and wait for a quality product.

A figuratively secure prismatic base for the programming product is presented in figure 9-5. This security is developed through execution of an extensive quality program, as progressively shown in figures 9-8 to 9-10. A product's quality objective is usually composed of more than one characteristic. Previously, those have tentatively been noted as maintainability, portability, reliability, testability, understandability, usability, and freedom from error. Thus, quality management can extend the support prismatic structure to a greater depth than just one quality characteristic. In practice, several quality prisms will be placed together to achieve a firm quality base.

It may be desirable to have a product developed that has reached service level 4 for all the aforementioned quality characteristics. However, realistic schedules and productivity goals must be considered in terms of cost. These considerations establish the need for vigorous quality management over all life-cycle phases to selectively balance the various possibilities. It would be nonsupportive, expensive, and time consuming if quality management established the structural combination of individual characteristic quality prisms graphically presented in figure 9-11.


Figure 9-8.--Delicate balance--planning complete.

Figure 9-9.--Delicate balance--design and testing complete.

Figure 9-10.--Delicate balance--integration and installation complete.

Figure 9-11.--Example of poor quality management.

Figure 9-12.--Example of good quality management.

presented in figure 9-11. Unfortunately, this is the case for too many products. Quality management would do better to establish a more consistent support structure, like that represented in figure 9-12. The figurative result of this consistent effort is shown in the solid cost-effective base of figure 9-13.

Figure 9-13.--Example of solid quality base.

If quality characteristics are established, monitored, measured, and verified throughout the life cycle, a realistic balance can successfully be achieved between quality costs, schedule, and productivity. However, it will require an active quality management process to establish and track these indicators. An example of such a quality management process matrix is presented in table 9-5 to quantify the extent and degree of effort needed to achieve a desired level of quality. This table can be used as a programming product quality worksheet or as both the characteristic survey data collection instrument and part of the final quality prisms planning document.

As discussed, a quality management team must establish the degree of quality that a particular quality characteristic must reach throughout its life cycle. It may use specialized support tools, measurement systems, and specific product quality standards to pursue its quality objectives. A point system can give a quantitative reference for the pursuit of quality. The point system can become the basis for trading time versus cost to reach specific quality goals.

TABLE 9-5.--EXAMPLE OF QUALITY MANAGEMENT PROCESS MATRIX

[Number in parentheses denotes the degree of quality (1 to 4) selected by a quality management process.]

Product phase                   Quality characteristic
                     Reliability     Changeability    Maintainability
Planning             1 (2) 3 4       1 2 3 (4)        1 2 (3) 4
Design and test      1 (2) 3 4       1 2 3 (4)        1 2 (3) 4
Integration and      1 (2) 3 4       1 (2) 3 4        1 2 3 (4)
 installation
Service              1 (2) 3 4       1 (2) 3 4        1 2 (3) 4

TABLE 9-6.--EXAMPLE OF PURSUIT OF QUALITY

Product phase                   Quality characteristic
                     Reliability     Changeability    Maintainability
Planning                  2               4                 3
Design and test           2               4                 3
Integration and           2               2                 4
 installation
Service                   2               2                 3
Total points/            8/16           12/16             13/16
 available points    (50 percent)    (75 percent)      (81 percent)

Total: (33/48)/C3, or (69 percent)/C3

Of course, a firm's quality management will define their own point system. However, the following example point system will serve as an illustration for discussion purposes.

If a single characteristic's quality effort has progressed through all four levels and through each level's maximum degree, it has accumulated a maximum of 4 + 4 + 4 + 4 = 16 points. If another characteristic's effort has moved through the levels only at one-half its maximum degree, it has accumulated 2 + 2 + 2 + 2 = 8 points. If it reached three-quarters of the maximum degree of effort on all levels, it has 3 + 3 + 3 + 3 = 12 points. Management can now assign a reference value to the pursuit of quality for a programming product. This is shown in the simplified example in table 9-6. For this example the total is 8 + 12 + 13 = 33 points out of a possible 16 + 16 + 16 = 48 points, or 69 percent. (In more general terms, this can also be referred to as an overall level-3 quality effort in the 50- to 75-percent range.) Note that the real indication of the quality objectives will be the magnitude of the X/Y (33/48) values. The greater the X- and Y-values, the deeper the degree to which the characteristics have been pursued. The greater the X-value, the more stable the structure has become and the more quality objectives the programming product has achieved.

If this type of analysis is carried over all eight characteristics (8 x 16), a maximum of 128 points is possible. Products that approach this level of effort will have a considerably more stable structure than those that are only based upon a 16-point, single-characteristic structure. The X-percent quality reference number should also be qualified by a factor to note how many characteristics were actually used. This could be shown as 69 percent/C3, or 33/48/C3.

Finally, some characteristics will be more complex and require greater costs to achieve than others. Thus, a weighting multiplier (WM) can be used to equalize the quality characteristics. Weighting multipliers for the preceding example are demonstrated in table 9-7. For this example, the total of 10 + 28 + 19 = 57 points out of a possible 20 + 40 + 24 = 84 points is 57/84/C3, or 68 percent/C3. This three-part programming quality ratio (e.g., 57/84/C3) can be used for reviewing quality across programming products within a corporation as a more quantitative cross reference of quality costs to quality objectives.

TABLE 9-7.--EXAMPLE OF USE OF WEIGHTING MULTIPLIERS (WM)

Product phase                   Quality characteristic
                     Reliability     Changeability    Maintainability
                     Level x WM      Level x WM       Level x WM
Planning             2 x 1           4 x 2            3 x 2
Design and test      2 x 1           4 x 2            3 x 1.5
Integration and      2 x 1           2 x 3            4 x 1
 installation
Service              2 x 2           2 x 3            3 x 1.5
Total points/        10/20           28/40            19/24
 available points    (50 percent)    (70 percent)     (79 percent)

Total: (57/84)/C3, or (68 percent)/C3

A quality management process matrix (table 9-5) has been presented for pursuing quality throughout a programming product's life cycle. It relates the pursuit of quality characteristics to the planning, design and testing, integration and installation, and service phases. In practice, actual implementation of this approach will require the selection of languages, walkthroughs of code, type of testing, and so forth to be specifically defined for reaching service quality level 2, 3, or 4. From this matrix, the impact on schedule and the cost of quality can be projected and monitored.
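The bookkeeping in tables 9-5 to 9-7 is easy to mechanize. The short Python sketch below is illustrative only; the per-phase degrees and weighting multipliers are hard-coded from the worked example above, and the function name is ours. It reproduces both the unweighted 33/48/C3 ratio and the weighted 57/84/C3 ratio.

# Quality-ratio sketch for the worked example above (inputs hard-coded for illustration).
# Each characteristic lists the degree of quality (1 to 4) reached in the four phases:
# planning, design and test, integration and installation, service.
MAX_DEGREE = 4

degrees = {
    "reliability":     [2, 2, 2, 2],     # from table 9-6
    "changeability":   [4, 4, 2, 2],
    "maintainability": [3, 3, 4, 3],
}

# weighting multipliers per phase (from table 9-7) to equalize characteristics
wm = {
    "reliability":     [1, 1, 1, 2],
    "changeability":   [2, 2, 3, 3],
    "maintainability": [2, 1.5, 1, 1.5],
}

def quality_ratio(degrees, wm=None):
    """Return (points earned X, points available Y, 'NN percent/Cn')."""
    x = y = 0.0
    for ch, ds in degrees.items():
        ws = wm[ch] if wm else [1] * len(ds)
        x += sum(d * w for d, w in zip(ds, ws))
        y += sum(MAX_DEGREE * w for w in ws)
    return x, y, f"{100 * x / y:.0f} percent/C{len(degrees)}"

print(quality_ratio(degrees))      # (33.0, 48.0, '69 percent/C3')
print(quality_ratio(degrees, wm))  # (57.0, 84.0, '68 percent/C3')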
This process will also help management to compare the extent and degree of quality for products of competing companies or internal corporate divisions. Of course, until such a standard is developed, the quality management team will subjectively assign values and multipliers as noted in table 9-5 and relate them to their own acceptable degree of documentation, walkthrough of code, and module tests. These subjective values are extremely useful in establishing individual product quality effort goals, translating the concept of quality prisms to planning, design, and test considerations that balance schedule and cost against quality objectives. However, management will now have a more reasonable opportunity to pursue and successfully achieve the extent and degree of desired quality for their products.

The ability to specify an overall software quality metric has been addressed. Overall quality measurements can be normalized, as in the quality prisms concept, for purposes of comparison. The quality prisms concept can be used to compare the software of two or more different projects within the same company or of different companies even if the software products have unique applications or utilize different programming languages. Quality prisms can also be used to combine hardware quality and software quality into an assessment of the quality of the entire system.

Software Quality Standards

The relationship of software quality standards and software quality measurements is depicted in figure 9-14. Measurements and standards must agree. If a set of quality standards is established (e.g., zero defects) and quality measurement cannot prove it (i.e., through exhaustive testing, error seeding, etc.), the software development project must realistically set a goal so that both quality standards and measurements can be developed. The IEEE has published many articles on and general guides for formulating goal criteria. In addition, many technical papers are available on setting specific goals on the bases of life cycle and a per-delivered software product (ref. 9-11).

Figure 9-14.--Relationship of measurements and standards (software quality standards; programming project; software quality measurements).

Concluding Remarks

This chapter has presented a snapshot of software quality assurance today and has indicated future directions. A basis for software quality standardization was issued by the IEEE. Research is continuing into the use of overall software quality metrics and better software prediction tools for determining the defect population. In addition, simulators and code generators are being further developed so that high-quality software can be produced.

Several key topics were discussed:

(1) Life-cycle phases
(2) Software quality characteristics
(3) Software quality metrics
(4) Overall software quality metrics
(5) Software quality standards
(6) Process indicators
(7) Performance measures

Process indicators are closely tied to the software quality effort, and some include them as part of software development. In general, there are measures such as (1) test cases completed versus test cases planned and (2) the number of lines of code developed versus the number expected. Such process indicators can also be rolled up (all software development projects added together) to give an indication of overall company or corporate progress toward a quality software product. Too often, personnel are moved from one project to another, and thus the lagging projects improve but the leading projects decline in their process indicators. The life cycle for programming products should not be disrupted.

Performance measures, which include such criteria as the percentage of proper transactions, the number of system restarts, the number of system reloads, and the percentage of uptime, should reflect the user's viewpoint. The concept of recently proposed performability combines performance and availability from the customer's perspective.

In general, the determination of applicable quality measures for a given software product development is viewed as a specific task of the software quality assurance function. The determination of the process indicators and performance measures is a task of the software quality standards function.
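As a concrete illustration of rolling up process indicators, the sketch below combines two projects' test-case and lines-of-code counts into corporate-level ratios. The project names and counts are invented for the example.

# roll-up of process indicators across projects (hypothetical data)
projects = [
    {"name": "proj_a", "tests_done": 180, "tests_planned": 200,
     "loc_done": 9000, "loc_expected": 10000},
    {"name": "proj_b", "tests_done": 40, "tests_planned": 100,
     "loc_done": 2500, "loc_expected": 8000},
]

def rollup(projects):
    # all software development projects added together, as described above
    tests = sum(p["tests_done"] for p in projects) / sum(p["tests_planned"] for p in projects)
    loc = sum(p["loc_done"] for p in projects) / sum(p["loc_expected"] for p in projects)
    return {"test_cases": round(tests, 3), "lines_of_code": round(loc, 3)}

print(rollup(projects))  # {'test_cases': 0.733, 'lines_of_code': 0.639}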

References

9-1. Dunn, R.; and Ullman, R.: Quality Assurance for Computer Software. McGraw-Hill, 1982, p. 265.
9-2. Boehm, B.W., et al.: Characteristics of Software Quality. North-Holland, 1978, p. 3-1.
9-3. IEEE Standard Glossary of Software Engineering Terminology. IEEE Computer Society, 1982, p. 34.
9-4. Reliability Prediction of Electronic Equipment. MIL-HDBK-217E, Jan. 1990.
9-5. Boehm, B.W.; Brown, J.R.; and Lipow, M.: Quantitative Evaluation of Software Quality. Tutorial on Models and Metrics for Software Management and Engineering, V.R. Basili, ed., IEEE Computer Society Press, 1980.
9-6. Perlis, A.J.; Sayward, F.G.; and Shaw, M., eds.: Software Metrics: An Analysis and Evaluation. MIT Press, 1981.
9-7. Basili, V.R.: Tutorial on Models and Metrics for Software Management and Engineering. IEEE Computer Society Press, 1980.
9-8. Jones, T.C.: Measuring Programming Quality and Productivity. IBM Syst. J., vol. 17, no. 1, 1978, pp. 39-63.
9-9. Heldman, R.K.; and Malec, H.A.: Quality Management Process for Telecommunications Programming Products. 1984 IEEE Global Telecommunications Conference (GlobeCom 1984), IEEE, 1984, pp. 557-565.
9-10. Malec, H.A.: An Introduction to Quality Prisms and Their Application to Software. Relectronic '85, Sixth Symposium on Reliability in Electronics, OMIKK-Technoinform, Budapest, Hungary, pp. 155-163.
9-11. Jones, D.R.; and Malec, H.A.: Communications Systems Performability: New Horizons. 1989 IEEE International Conference on Communications, vol. 1, IEEE, 1989, pp. 1.4.1-1.4.9.

Reliability Training 1

1. What are the three factors that determine quality software?

A. Process, material, and vibration


B. Process, product, and environment
C. Planning, product, and shock
D. All of the above

2. What does software quality consist of?

A. Various aspects of producing programming products


B. Bar charts for process control
C. Statistical analysis of software bugs
D. All of the above

3. How is the term "software quality" defined?

A. To assure the acquisition of high-quality software products on schedule, within cost, and in compliance with the
performance requirements
B. To ignore various needs
C. To develop specifications and attributes, perceive customer needs, and meet the user's expectations
D. All of the above

4a. What are the 10 software life-cycle phases?

A. Conceptual; requirements; product definition; design; implementation; testing; vibration; prototypes; installation; and
disposal
B. Planning; definition; design; manufacturing; testing; acceptance; debugging; and repair
C. Conceptual planning; requirements definition; product definition; top-level design; detailed design; implementation;
testing and integration; qualification, installation, and acceptance; maintenance and enhancements; and disposal
D. All of the above

4b. What are the IEC system life-cycle phases?

A. Concept and research; design and plan; manufacture and debug; operation and maintenance; and wearout
B. Concept and definition; design and development; manufacturing and installation; operation and maintenance; and
disposal
C. Research and development; design and breadboard; manufacturing and testing; operation and maintenance; and disposal
D. All of the above

4c. How can the 10 software life-cycle phases be combined to fit in the IEC system life-cycle phases?

A. Concept and definition: conceptual planning; requirements definition; and product definition
B. Design and development: top-level design and detailed design
C. Manufacturing and installation: implementation; testing and integration; qualification; and installation and acceptance
D. Operations and maintenance: maintenance and enhancement
E. Disposal: disposal
F. All of the above

1Answers are given at the end of this manual.

5. Can there be different degrees of a quality characteristic for different life-cycle phases?

A. Yes B. No C. Do not know

6a. The definition of a lack of software quality is

A. The lack of proper planning in early life-cycle phases
B. The application of dependent software quality characteristics
C. Poorly developed software that lacks proper criteria in life-cycle phases
D. All of the above

6b. Three example characteristics of software quality are

A. Testing, integration, and portability
B. Maintainability, portability, and reliability
C. Design, implementation, and reliability
D. All of the above

7. Seven software quality characteristics are

A. Maintainability, portability, reliability, testability, understandability, usability, and freedom from error
B. Planning, definition, reliability, testing, software, hardware, usability
C. Design, implementation, integration, qualification, acceptance, enhancement, maintenance
D. All of the above

8. Management has decided that quality engineering should measure four characteristics of the XYZ software: maintainability, portability, reliability, and testability. The desired goals set at the beginning of the program by management for the characteristic effort were maintainability, 3.5; portability, 3.0; reliability, 3.9; and testability, 3.5. The overall goal was thus 87 percent/C4 for the extent of quality. The 2-year program gave the following results:

Characteristic     Planning   Design and test   Integration   Service
Maintainability      4.0            3.5             3.4         3.4
Portability          4.0            3.0             3.1         3.1
Reliability          3.5            3.6             3.9         3.9
Testability          4.0            3.1             3.5         3.6
Total               15.5           13.2            13.9        14.0

a. The actual extent of quality was

A. (87.5 percent)/C4 B. (88.4 percent)/C4 C. (88.8 percent)/C4 D. None of these

b. Have the management objectives been achieved?

A. Yes B. No C. Do not know

Chapter 10
Reliability Management

Roots of Reliability Management

Over the past few years the term "reliability management" has been raised to a high level of awareness. Previously, the management of reliability was concerned with eliminating failure by testing to prove reliability, and it generally complemented the design function. Quality management, on the other hand, focused on quality control and generally aligned itself with manufacturing and production. The picture began to change with the focus on customer reliability and quality concerns. Specifically, the usage and standardization by companies of reliability growth models established that the new concept of reliability management is replacing the old concept of the management of reliability. The focus is now on enlarging the area of reliability concern to all phases of the life cycle. The current thinking is that all aspects of management operations and functions must be integrated in the reliability concept and program. Thus, reliability in the manufacturing or production phase is as important as reliability in the design phase (ref. 10-1), as shown in figure 10-1.

Figure 10-1.--Life-cycle reliability growth with two different parts to first customer shipment.

Planning a Reliability Management Organization

Planning a reliability management organization requires that the reliability function report to a high enough level to be effective. The reporting level is too low if it does not involve top management in reliability issues. For example, many successful programs today encompass 3 to 6 hours per month at vice-presidential staff meetings.
Each company must find the level that makes reliability a significant issue to be addressed. A guide to reliability management is reference 10-2.

A functional organization forms groups to perform similar generic tasks such as planning, designing, testing, and reliability. Often, such an organization gets mired down with too many levels of management, and specific product priorities are often different in the many task groups. However, many benefits accrue from the concentration of talent and constant technical peer review. With today's time-to-market pressures, building such a large centralized reliability organization is often not the best choice. The team approach, distributed reliability, is often selected over functional organization.

In a team organization, people with diverse talents and backgrounds comprise the teams. Quality circles and reliability circles are based on the same organizational approach. Even though peer review is not ongoing, the cross-technology knowledge of today's personnel appears to fully compensate for the lack of constant peer review. In the software development world, several types of team organization exist. For instance, the first type, the project team, is typical and is a hierarchical organization in which programmers with less experience are assigned to work for programmers with more experience. The project team is designed to fit the company organization rather than to fit project requirements. The second type is the chief programmer team, which employs a highly skilled person who performs most of the programming while providing technical direction. A third type is the Weinberg programming team, which is composed of groups of 10 or fewer programmers with complementary skills. Group consensus and leadership role shifts are characteristic of this type. Each of these team organizations has advantages depending on the size of the project, the newness of the technology being implemented, and so on.

The fourth type of team organization, the matrix, is a hybrid approach that combines functional talent to put teams together, but it can be a reliability disaster, especially if time-to-market pressures exist. Often the technology is masked by middle management procedural meetings because these teams report to one manager. Individual contributors are added to work on one or more tasks of a given project or product development. These projects usually report to middle management.

A fifth possible type of team organization is based on the theory stated in reference 10-3: reliability is actively pursued by involvement starting on the vice-presidential level and proceeds throughout the organization. This new style of reliability involves establishing a reliability council, dedicating a full-time diagnostic person or team, and generally making an upward change in the reliability reporting level. Figure 10-2 presents this concept.

Figure 10-2.--Reliability organization (reliability council; diagnostic team or person).

The reliability council's responsibilities are to

(1) Endorse the annual reliability plan
(2) Regularly review reliability status
(3) Approve reliability improvement projects
(4) Set priorities on resources
(5) Assign tasks
(6) Regularly review tasks
(7) Participate in reliability improvement awards

The reliability council membership may consist of the

(1) Vice president of the company or division as chairman
(2) Vice president's staff
(3) Vice president's business partners
(4) Corporate engineering director
(5) Corporate manufacturing director
(6) Corporate customer services director

The diagnostic team's or person's functions are to

(1) Review the internal reliability status
(2) Review reliability as perceived by customers
(3) Recommend tasks to the reliability council
(4) Diagnose problems
(5) Design experiments
(6) Collect and analyze data

The diagnostic team's or person's concerns include

(1) Reliability, quality, and statistics
(2) Engineering and manufacturing engineering
(3) Product development and process optimization
(4) Product assembly and test strategies
(5) Customer perception

This is a new dynamic approach for establishing reliability management at the proper level in a corporation while optimizing its effectiveness.

General Management Considerations

Program Establishment

To design for successful reliability and continue to provide customers with a reliable product, the following steps are necessary:
(1) Determine the reliability goals to be met.
(2) Construct a symbolic representation (e.g., block diagram or Petri net, ref. 10-4).
(3) Determine the logistics support and repair philosophy.
(4) Select the reliability analysis procedure.
(5) Select the source or sources of the data for failure rates and repair rates.
(6) Determine the failure rates and the repair rates.
(7) Perform the necessary calculations.
(8) Validate and verify the reliability.
(9) Measure reliability until customer shipment.

This section will address the first three steps in detail.
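Steps (6) and (7) often reduce to simple arithmetic once the rates are known. The sketch below is a minimal illustration, not a procedure prescribed by this manual: it assumes a single repairable unit with constant failure and repair rates, for which the steady-state availability is MTBF/(MTBF + MTTR). The rate values are hypothetical.

# steady-state availability of one repairable unit (hypothetical constant rates)
failure_rate = 1.0 / 5000.0  # per hour; MTBF = 5000 hr
repair_rate = 1.0 / 4.0      # per hour; MTTR = 4 hr

mtbf = 1.0 / failure_rate
mttr = 1.0 / repair_rate
availability = mtbf / (mtbf + mttr)  # equivalently mu/(lambda + mu)
print(f"A = {availability:.6f}")     # A = 0.999201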

Goals and Objectives

Goals must be placed into the proper perspective. They are often examined by using models that the producer develops. However, one of the weakest links in the reliability process is the modeling. Dr. John D. Spragins, an editor for the IEEE Transactions on Computers, places this fact in context (ref. 10-3) with the following statement:

Some standard definitions of reliability or availability, such as those based on the probability that all components of a system are operational at a given time, can be dismissed as irrelevant when studying large telecommunication networks. Many telecommunication networks are so large that the probability they are operational according to this criterion may be very nearly zero; at least one item of equipment may be down essentially all of the time. The typical user, however, does not see this unless he or she happens to be the unlucky person whose equipment fails; the system may still operate perfectly from this user's point of view. A more meaningful criterion is one based on the reliability seen by typical system users. The reliability apparent to system operators is another valid, but distinct, criterion. (Since system operators commonly consider systems down only after failures have been reported to them, and may not hear of short self-clearing outages, their estimates of reliability are often higher than the values seen by users.)

Reliability objectives can be defined differently for various systems. An example from the telecommunications industry (ref. 10-5) is presented in table 10-1. We can quantify the objectives, for example, for a private automatic branch exchange (PABX) (ref. 10-6) as shown in table 10-2, which presents the reliability specifications for a wide variation of PABX sizes (from fewer than 120 lines to over 5000 lines).

TABLE 10-1.--RELIABILITY OBJECTIVES FOR TELECOMMUNICATIONS INDUSTRY

Module or system              Objective
Telephone instrument          Mean time between failures
Electronic key system         Complete loss of service
                              Major loss of service
                              Minor loss of service
PABX                          Complete loss of service
                              Major loss of service
                              Minor loss of service
                              Mishandled calls
Traffic service position      Mishandled calls
 system (TSPS)                System outage
Class 5 office                System outage
Class 4 office                Loss of service
Class 3 office                Service degradation

TABLE 10-2.--RELIABILITY SPECIFICATION FOR PABX

                                                Number of lines
                                  <120   200   400   600   800   1200  3000  5000
Common control performance:
 Mean time between catastrophic    10    ---   ---   ---   ---   ---   ---   ---
  failures, yr
 System outage time per 20 yr, hr  ---   ---   ---   ---   ---    1     1     1
 Mean time between outages, yr     ---   ---   ---   ---   ---   >5    >5    >5
 Mean time between complete         5     10    40    40    40   ---   ---   ---
  losses of service, yr
Service level:
 Mean time between major losses   200    400   300   200   150   365   365   ---
  of service, days
 Mean time between minor losses    60     60    50    40    30    30    15   ---
  of service, days
 Degradation of service, hr/yr    ---    ---   ---   ---   ---   ---   ---    1
 Mishandled calls, percent        0.1    0.1   0.1   0.1   0.1   0.1   0.1   0.02

Symbolic Representation

Chapter 3 presents reliability diagrams, models that are the symbolic representations of the analysis. The relationship of operation and failures can be represented in these models. Redundancy (simple and compound) is also discussed. Performance estimates and reliability predictions are now being performed simultaneously by using symbolic modeling concepts such as Petri nets.

In 1966, Carl Adam Petri published a mathematical technique for modeling. Known as a Petri net, it is a tool for analyzing systems and their projected behavior. In 1987, he delivered the keynote address at the international workshop on Petri nets and performance models (ref. 10-7). Many applications were discussed: the use of timed models for determining the expected delay in complex sequences of actions, the use of methods to determine the average data throughput of parallel computers, and the average failure rates of fault-tolerant computer designs. Correctness analysis and flexible manufacturing techniques were also described. Timed Petri nets show promise for analyzing throughput performance in computer and communications systems.

A Petri net is an abstract and formal graphical model used for systems that exhibit concurrent, asynchronous, or nondeterministic behavior. The Petri net model provides accurate system information when it validly represents the system and the model solution is correct. A Petri net is composed of four parts: a set of places, a set of transitions, an input function, and an output function. The input and output functions relate to transitions and places. In general, graphics are used to represent the Petri net structures and to show the concepts and the problems. A circle represents a place, a bar represents a transition, and directed arcs connect transitions to places or places to transitions. The state of a Petri net is called the PN marking and is defined by the number of "tokens" contained in each place.
A place is an input to a transition if an arc exists from the place to the transition and an output if an arc exists from the transition to the place. Enabled transitions can be "fired" by removing one token from each input place and adding one token to each output place. The firing of a transition causes a change of state and produces a different PN marking. Reference 10-8 contains additional information. Petri nets are a useful reliability modeling tool.
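The four-part definition above translates directly into code. The following minimal sketch is an illustration, not a tool from this manual: it represents places, transitions, the input and output functions, and the marking, and it fires enabled transitions by moving tokens. The class name and the fail/repair example net are made up for the demonstration.

# minimal Petri net: places, transitions, input/output functions, and a marking
class PetriNet:
    def __init__(self, places, transitions, inputs, outputs, marking):
        # inputs[t] / outputs[t]: places consumed / produced when transition t fires
        self.places = places
        self.transitions = transitions
        self.inputs = inputs
        self.outputs = outputs
        self.marking = dict(marking)  # place -> token count (the PN marking)

    def enabled(self, t):
        # a transition is enabled when every input place holds at least one token
        return all(self.marking[p] >= 1 for p in self.inputs[t])

    def fire(self, t):
        # firing removes one token from each input place, adds one to each output place
        if not self.enabled(t):
            raise ValueError(f"transition {t} is not enabled")
        for p in self.inputs[t]:
            self.marking[p] -= 1
        for p in self.outputs[t]:
            self.marking[p] += 1

# toy example: a unit cycles between 'up' and 'down' through fail/repair transitions
net = PetriNet(
    places=["up", "down"],
    transitions=["fail", "repair"],
    inputs={"fail": ["up"], "repair": ["down"]},
    outputs={"fail": ["down"], "repair": ["up"]},
    marking={"up": 1, "down": 0},
)
net.fire("fail")
print(net.marking)  # {'up': 0, 'down': 1}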

Logistics Support and Repair Philosophy

The logistics support plan is normally based on criteria such as (1) failure rates and repair rates of replaceable units, (2) system maturity, (3) whether the sites can be served by depots or subdepots, and (4) the rate at which additional sites are added to the depot responsibility. Since spares are the key to support, this chapter will examine them further.

The size of the spares stock depends on (1) the criticality of the replaceable unit to the system, (2) the necessary spares adequacy level, (3) the number of systems served, (4) whether the area served is rural, suburban, or urban, and (5) whether the repair facility is onsite or remote. A typical spares policy for a telecommunications system (ref. 10-9) is presented in table 10-3. Policies can be formulated for families of systems or for multifamily geographical areas. The turnaround time depends on the replaceable unit's failure rate, the repair location, the repair costs, and so forth. A specific spares policy can be tailored to a given geographical area. Note that subsystems have different spares policies owing to the criticality of their failures, in contrast to a blanket spares assignment without regard to functionality or survivability.

TABLE 10-3.--SPARES POLICY

Subsystem              Onsite    Subdepot   Turnaround time*   Depot     Turnaround time*
                       spares?   spares?    of subdepot        spares?   of depot
                                            spares, days                 spares, days
Common control and     Yes       Yes        2                  Yes       15
 memory
Network                No                                                30
Line and trunk units   Yes                                               30
Peripheral equipment   No                                                30
Test equipment         No        No         ----                         5

*For replacing spares.

Even though the spares location and turnaround time are the same for two different subsystems, the spares adequacy can be different. Some spares adequacy levels for a telecommunications system are presented in table 10-4.

TABLE 10-4.--SPARES ADEQUACY

Subsystem              Onsite    Subdepot   Depot
                       spares?   spares     spares
                                    Adequacy*
Common control and     Yes       0.9995     0.9995
 memory
Network                No        .995       .995
Line and trunk units   Yes       .999       .999
Peripheral equipment   No        .99        .99
Test equipment         No        ----       .95

*Probability of having spares available.

Spares provisioning is an important part of a spares plan. Requirements must be clearly stated or they can lead to over- or undersparing. For example, a spares adequacy of 99.5 percent can be interpreted in two ways. First, six spares might be needed to guarantee that spares are available 99.5 percent of the time. Alternatively, if one states that when a failure occurs, a spare must be available 99.5 percent of the time, it will be necessary to supply 6 + 1 = 7 spares.

The establishment of depot and subdepot sparing, rather than only individual site sparing, has proven to be cost effective. As an example, table 10-5 presents the depot effectiveness for a typical digital PABX. This table indicates that a 14.5-percent spares level would be required if only per-site sparing was used; however, when one depot serves 100 sites, the required spares level is less than 1 percent.

TABLE 10-5.--DEPOT EFFECTIVENESS FOR TYPICAL DIGITAL PABX

Foreign branch trunk/        Printed wiring cards for n systems     Spare printed wiring cards for n systems
control automatic part         1     2     10     50     100           1     2    10    50    100
15002              6          65   130    650   3250    6500           2     2     5    13     20
15003              5          16    32    160    800    1600           1     1     2     5      7
15004              6          14    28    140    700    1400           1     1     4     5      8
20703              8          28    56    280   1400    2800           2     1     4    10     15
20703             16         153   306   1530   7650   15300           7    11    29   106    196
Total                       1058  2116  10580  52900  105800         153   173   287   658   1001
Spares, percent of total                                             14.5   8.2   2.7   1.2   0.95

A centralized maintenance base (CMB) (ref. 10-10) is essential to a deferred maintenance concept. Deferred maintenance can be available on a real-time basis. When a failure occurs at an unattended site, the CMB would receive information on a display as to the criticality of the failure and the deferred maintenance action taken if imposed and would receive a projection indicating impending problems. The CMB would analyze the situation for the specific site configuration, the processing level in the system, and the site's failure-repair history.

Input data could consist of items such as the last similar occurrence, the next planned visit to the site, the criticality of the site to the operating network, the cumulative site failures for the last 3 months, and the probability of additional failures occurring. The data would be analyzed with a maintenance-prediction computer program to generate a table based on system loading, such as table 10-6. Often the suggested maintenance deferral time is recommended to be the next maintenance visit (NMV). The NMV will vary with the amount of equipment onsite and the projected failure frequency (ref. 10-10).

TABLE 10-6.--MAINTENANCE ACTION RECOMMENDATIONS

Action                          Before      Busy    After    Off-shift
                                busy hour   hour    busy     time
                                                    hour
Repair                          Yes         Yes     Yes      Yes
Defer repair for (days)         0           0       1        1
Is second failure affecting     No          Yes     No       No
 service?
Probability of no similar       0.95        0.90    0.82     0.60
 second failure
Site failures last month        Low         High    Normal   Low
Site failures last year         Low         Low     Normal   Low
Transient error rate            Low         High    Low      Low

The combination of deferred maintenance and a centralized maintenance base dictates the needs for an efficient spares program. Spares planning combined with knowledge of the logistics can optimize support costs. A depot stocking plan can additionally vary because of many factors, including error coverage, system maturity, deferred repair, and maintenance familiarity. A dynamic (continuously updated) depot stocking plan would be cost effective. A dynamic depot model using Monte Carlo methods (ref. 10-11) includes unit delivery schedules, item usage per month, support personnel efficiency, and depot and base repair cycle times.
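Spares adequacy calculations like the 99.5-percent example above are commonly done with a Poisson demand model. The sketch below is one such illustration; the Poisson assumption, the function name, and the demand value are ours, not the manual's. It finds the smallest stock whose probability of covering demand during one turnaround period meets the adequacy target; with a hypothetical expected demand of 2.0 failures per turnaround, it happens to return the six spares of the example, and requiring a spare to be on hand at the instant of each failure adds one more.

import math

def spares_needed(expected_demand, adequacy=0.995):
    # expected_demand: mean number of failures during one replenishment turnaround
    s = 0
    term = math.exp(-expected_demand)  # P(demand = 0)
    cdf = term
    while cdf < adequacy:
        s += 1
        term *= expected_demand / s    # Poisson recurrence: P(k) = P(k-1) * lambda / k
        cdf += term
    return s

s = spares_needed(2.0)  # hypothetical demand of 2.0 failures per turnaround
print(s, "spares cover the period;", s + 1, "if a spare must be on hand at each failure")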
Figure 10-3.--Overall reliability process (bottom up and top down: translation, allocation, requirements, and planning between customer-, system-, subsystem- or module-, and component-oriented reliability; quality assurance and reliability control; reliability planning and reliability standards).
Reliability Management Activities

Performance Requirements

It is often difficult to translate customer performance requirements into design requirements, especially in the area of quality and reliability. Reliability encompasses both quantitative and qualitative measures. New terms in the computer industry, such as "robustness," are not formally metricized. However, we can adapt concepts for the overall performance process (ref. 10-12) to apply to reliability as presented in figure 10-3.

If a business' matrix of reliability requirements is reduced to one or more models, subjective and qualitative customer-oriented reliability measures can be translated into quantitative system-oriented reliability criteria. Figure 10-3 identifies both the top-down and bottom-up approaches to reliability validation, which include (1) translation, (2) allocation, (3) requirements, and (4) planning.

With the identification of the agreed-to system-oriented reliability criteria, designer-oriented subsystem or module reliability parameters can be allocated as shown in figure 10-3, generally by a system reliability team. The team evaluates simple versus redundant configurations, levels of fault detection and correction implementations, software considerations, and so forth. System or module reliability modeling may specify reliability requirements for specific components. An example of such modeling is a failure modes and effects analysis (FMEA) performed on a product to predict the probability of network failures due to a single failure or due to a failure after an accumulation of undetected failures.

For example, a replacement product was to use a very large-scale integration (VLSI) implementation, and the protection against network failures needed to be assessed. An investigation found no apparent standard industry FMEA method for VLSI components. Because future VLSI products may show an increasing need for FMEA, it is important that an industry standard be generated. In the network examples discussed, a single fault could directly cause a customer-oriented problem.

The bottom-up approach to reliability validation ensures customer satisfaction. The appropriate certification, process metrics, and statistical in-process tests must be designed from the customer viewpoint. A step-by-step upward certification and design review using process metrics can be designed to ensure customer-oriented reliability. In addition, we can see the need for the independent upward path from reliability planning and standards to customer-oriented reliability in figure 10-3. This is the key to success, since reliability control cannot be bypassed or eliminated from design- or performance-related issues.
Specification Targets

A system can have a detailed performance or reliability specification that is based on customer requirements. The survivability of a telecommunications network is defined as the ability of the network to perform under stress caused by cable cuts or sudden and lengthy traffic overloads and after failures including equipment breakdowns. Thus, performance and availability have been combined into a unified metric. One area of

telecommunications where these principles have been applied is the design and implementation of fiber-based networks. Reference 10-13 states that "the statistical observation that on the average 56 percent of the pairs in a copper cable are cut when the cable is dug up, makes the copper network 'structurally survivable.'" On the other hand, a fiber network can be assumed to be an all-or-nothing situation with 100 percent of the circuits being affected by a cable cut, failure, or other destruction. In this case study, according to reference 10-13, "cross connects and allocatable capacity are utilized by the intelligent network operation system to dynamically reconfigure the network in the case of failures." Figure 10-4 (from ref. 10-14) presents a concept for specification targets.

Figure 10-4.--Specification target (ref. 10-14) (performance versus availability: fully operational; degraded operation; subliminal availability, major and minor; subliminal performance; unusable).

Field Studies

The customer may observe specific results of availability. For instance, figure 10-5 has been the basis for the proposal of an IEC technology trend document (ref. 10-15).

Figure 10-5.--Software availability.

System reliability testing is performed today to benchmark the reliability, availability, and dependability metrics of complex new hardware and software programs. Figure 10-6 (taken from ref. 10-1) presents the traditional viewpoint of the design, development, and production community on cumulative reliability growth. It is possible that the same data generated both curves in figure 10-6. When we measure the cumulative reliability growth, the decline of production coupled with a decline of reliability is masked. If we track the product on a quarterly basis, often the product shows a relaxation of process control, incorporation of old, marginal components into the last year's product manufacture, failure to incorporate the latest changes into service manuals, knowledgeable personnel transferred to other products, and so forth. Thus, there is a need to track specific products on a quarterly basis (ref. 10-1).

Human Reliability

Analysis Methods

The major objectives of reliability management are to ensure that a selected reliability level for a product can be achieved on schedule in a cost-effective manner and that the customer perceives the selected reliability level. The current emphasis in reliability management is on meeting or exceeding customer expectations. We can view this as a challenge, but it should be viewed as the bridge between the user and the producer or provider. This bridge can be titled "human reliability." In the past, the producer was concerned with the process and the product and found reliability measurements that addressed both. Often there was no correlation between field data, the customer's perception of reliability, and the producer's reliability metrics. Surveys then began to indicate that the customer or user distinguished between reliability performance, response to order placement, technical support, service quality, and so on.
Figure 10-6.--Traditional viewpoint of reliability growth (ref. 10-1) (traditional cumulative versus customer actual).

Human Errors

Human reliability is defined (ref. 10-16) as "the probability of accomplishing a job or task successfully by humans at any required stage in system operations within a specified minimum time limit (if the time requirement is specified)." Although customers generally are not yet requiring human reliability models in addition to the requested hardware and software reliability models, the science of human reliability is well established.

Example

Presently, the focus in design is shifting from hardware and software reliability to human reliability. A recent 2 1/2-year study by Bell Communications Research (ref. 10-17) indicated that reliability in planning, design, and field maintenance procedures must be focused on procedural errors, inadequate emergency actions, recovery and diagnostic programs, the design of preventive measures to reduce the likelihood of procedural errors, and the improvement of the human factors in the design and subsequent documentation. The study revealed the following results for outages or crashes, as shown in figure 10-7. Approximately 40 percent of outage events and downtime is due to procedural problems (human error). In fact, if software recovery problems are included with procedural problems, 62 percent of the events and 68 percent of the downtime are due to human error. Therefore, human reliability planning, modeling, design, and implementation must be focused on to achieve customer satisfaction.

Figure 10-7.--Reliability characteristics (outage frequency, percent of events or crashes, and downtime (3.5 min), percent per year per machine: procedural, 38 and 42; hardware, 29 and 30; recovery software, 24 and 26; operational software, remainder).
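The quoted human-error totals combine as simple sums of the figure 10-7 categories; a one-line check:

# combining figure 10-7 categories into the human-error totals quoted above
events_pct = {"procedural": 38, "recovery_software": 24, "hardware": 29}
downtime_pct = {"procedural": 42, "recovery_software": 26, "hardware": 30}

print(events_pct["procedural"] + events_pct["recovery_software"])      # 62 percent of events
print(downtime_pct["procedural"] + downtime_pct["recovery_software"])  # 68 percent of downtime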

Presentation of Reliability

Reliability testing usually occurs during product development and ends with the first product shipment. However, product reliability testing can be cost effectively run through the manufacturing life of the product to achieve both continued customer satisfaction and the inherent reliability of the product. A major concern in planning reliability testing is the maturity of the specific manufacturing facility. For instance, a new plant may initially need three to five failures per week of tested product under controlled test environments to shape the manufacturing process and the product specifics. Therefore, detailed failure analysis will be conducted on 150 to 250 failed items per year. Once plant personnel begin to feel comfortable as a team and several of the plant's processes, products, or both are certified, the goal of one failure per week can be instituted in a medium-mature plant. The team in a mature plant with few failures can observe leading indicators that forewarn of possible problems and can prevent them from entering into the shipped product. Thus, in a mature plant the goal of one failure per 2 weeks can suffice as a benchmark for quality operations to achieve product reliability.

Engineering and Manufacturing

Measuring reliability in a practical way is a challenge. Reliability grows with product, process, and customer use maturity. We could measure, for example, the reliability at the first customer shipment and the reliability during a 5-year production life. An effective start may be to establish a three- to five-level reliability tier concept (ref. 10-18). For example, table 10-7 presents a five-tier reliability concept. With this concept, products can achieve the first customer shipment at a mean time between failures (MTBF) of T(min). Manufacturing and service will accept risks until T(spec) is reached. Manufacturing has a commitment to drive the MTBF of the product up to T(spec), and engineering has a commitment to provide resources for solving design problems until T(spec) is reached. The qualification team working with this process is now involved throughout the design qualification process through field feedback. Ideally, the MTBF's of tiers 2 to 5 would be equal; however, the calibration of reliability modeling tools and the accuracy of field MTBF measurements are challenges yet to be met in some corporations and industries. Thus, a three- to five-tier approach is a practical and effective solution for developing reliability measurements.

TABLE 10-7.--FIVE-TIER RELIABILITY CONCEPT

Tier   Mean time       Description
       between
       failures
1      T(min)          Minimum demonstrated MTBF before shipping (statistical test)
2      T(spec)         Specified MTBF that meets market needs and supports service pricing
3      T(design)       Design goal MTBF (calculation)
4      T(intrinsic)    Intrinsic MTBF (plant measurement)
5      T(field)        Field MTBF measurement

While the MTBF is between T(min) and T(spec), progress is tracked toward T(spec) as a goal. The point is to find and fix the problems and thus improve the reliability of the product. Teamwork and commonality of purpose with manufacturing and engineering are necessary to deal with real problems and not symptoms. After T(spec) has been achieved, an "insurance policy" is necessary to determine if anything has gone radically wrong. This can be a gross evaluation based on limited data, as the "premiums" for a perfect "insurance policy" are too high. Once T(spec) has been demonstrated, a trigger can be set at the 50-percent lower MTBF limit for control purposes. Improvement plans at this level should be based on the return on investment. At maturity, T(intrinsic), dependence on reliability testing can be reduced. A few suggestions for reductions are testing fewer samples, shortening tests, and skipping testing for 1 or 2 months when the personnel feel comfortable with the product or process. With a reduced dependence on reliability testing, other manufacturing process data can be used for full control.

User or Customer

Reliability growth has been studied, modeled, and analyzed, usually from the design and development viewpoint. Seldom is the process or product studied from the customer's or user's perspective. Furthermore, the reliability that the first customer observes with the first customer shipment can be quite different from the reliability that a customer will observe with a unit or system produced 5 years later, or the last customer shipment. Because the customer's experience can vary with the maturity of a system, reliability growth is an important concept to customers and should be considered in the customer's purchasing decision.

The key to reliability growth is the ability to define the goals for the product or service from the customer's perspective while reflecting the actual situation in which the customer obtains the product or service. For large telecommunications switching systems, there has been a rule of thumb for determining reliability growth. Often systems have been allowed to operate at a lower availability than the specified availability goal for the first 6 months to 1 year of operation (ref. 10-19). In addition, component part replacement rates have often been allowed to be 50 percent higher than specified for the first 6 months of operation. These allowances accommodated craftspersons learning patterns, software patches, design errors, and so on.
TABLE 10-8.--NO. 4 ESS GENERIC QUALITY METRICS
[From ref. 10-20.]

Metric                          Implementation phase
                     Requirements  Design   Laboratory   Field    Field
                                            system test  test     performance
Open questions            0
Problems fixed,                             1/500        1/1000   1/1000
 per words
Problems open,                     1/5000   1/5000       1/2000   1/2000
 per words
Interrupts, per day                         <20          <20      <25
Audits, per day                    0        <10          <10      <25
Service affecting                           0            0        1.8
 incidents, per
 office month
Reinitializations,                                                 1
 per month
Cutoff calls,                                                      <0.2
 per 10000
Denied calls,                                                      <0.7
 per 10000
Trunk out of service,                                              20
 min/yr

TABLE 10-9.--PRODUCTION LIFE-CYCLE RELIABILITY GROWTH CHART

System size                       First production year      Subsequent years
                                  Q1    Q2    Q3    Q4       Q1    Q2   ...   Q3    Q4
Small system:
 Reliability growth, percent       5     0     0     0        0     0   ...    0     0
 Time to steady state, months      3     0     0     0        0     0   ...    0     0
Medium system:
 Reliability growth, percent     100    50    25    10       10    10   ...   10    10
 Time to steady state, months      6     3     2     1        1     1   ...    1     1
Large system:
 Reliability growth, percent     200   100    50    50       33    33   ...   20    20
 Time to steady state, months     12     9     6     3        3     3   ...    3     3
The key to reliability growth is to have the growth measurement encompass the entire life cycle of the product. The concept is not new; only here the emphasis is placed on the customer's perspective. Reference 10-20 presents the goals of software reliability growth (table 10-8).

Table 10-8 covers a large complex system with built-in fault tolerance. Reference 10-21 regarded this system as not "technically or economically feasible to detect and fix all software problems in a system as large as No. 4 ESS [electronic switching system]. Consequently, a strong emphasis has been placed on making it sufficiently tolerant of software errors to provide successful operation and fault recovery in an environment containing software problems."

Reliability growth can be specified from "day 1" on a product development and can be measured or controlled on a product with a 10-year life until "day 5000." We can apply the philosophy of reliability knowledge generation principles, which is to generate reliability knowledge at the earliest possible time in the planning process and to add to this base for the duration of the product's useful life. To accurately measure and control reliability growth, we must examine the entire manufacturing life cycle. One method is the construction of a production life-cycle reliability growth chart.

Table 10-9 presents a chart for setting goals for small (e.g., a 60-line PABX or a personal computer), medium, and large systems. Small systems must achieve manufacturing, shipping, and installation maturity in 3 months to gain and keep a market share for present and future products. This is an achievable but difficult goal to reach. The difference in reliability growth characterization between small systems and larger systems is that the software-hardware-firmware interaction, coupled with the human factors of production, installation, and usage, limits the reliability growth over the production life cycle for most large, complex systems.

In certain large telecommunications systems, the long installation time allows the electronic part reliability to grow so that the customer observes the design growth and the production growth. Large, complex systems often offer a unique environment to each product installation, which dictates that a significant reliability growth will occur. Yet, with the difference that size and complexity impose on the resultant product reliability growth, corporations with a wide scope of product lines should not present overall reliability growth curves on a corporate basis but must present individual product line reliability growth pictures to achieve total customer satisfaction.

References

10-1. Malec, H.A.: Reliability Growth From the Customer Perspective. IEEE J. Sel. Areas Commun., vol. 6, no. 8, Oct. 1988, pp. 1287-1293.
10-2. Dhillon, B.S.; and Reiche, H.: Reliability and Maintainability Management. Van Nostrand Reinhold, 1985.
10-3. Spragins, J.D., et al.: Current Telecommunication Network Reliability Models: A Critical Assessment. IEEE J. Sel. Areas Commun., vol. SAC-4, no. 7, Oct. 1986, pp. 1168-1173.
10-4. PNPM '87, International Workshop on Petri Nets and Performance Models. IEEE Computer Society Press, 1987.
10-5. Malec, H.A.: Reliability Optimization in Telephone Switching Systems Design. IEEE Trans. Rel., vol. R-26, no. 3, Aug. 1977, pp. 203-208.
10-6. Petri, C.A.: Communication With Automata. Final Report, Vol. 1, Supplement 1, RADC TR 65-377-VOL 1-SUPPL 1, Applied Data Research, Princeton, NJ, Jan. 1966.
10-7. Woodside, C.M.: Innovator of Timed Petri Nets Keynotes International Workshop. Spectrum, Mar. 1988, p. 143.
10-8. Peterson, J.L.: Petri Net Theory and the Modeling of Systems. Prentice-Hall, Inc., 1981.
10-9. Malec, H.A.; and Steinhorn, D.: A New Technique for Depot and Sub-Depot Spares. IEEE Trans. Rel., vol. R-29, no. 5, Dec. 1980, pp. 381-386.
10-10. Malec, H.A.: Maintenance Techniques in Distributed Communications Switching Systems. IEEE Trans. Rel., vol. R-30, no. 3, Aug. 1981, pp. 253-257.
10-11. Murray, L.R.; and Morris, R.S.: Spare/Repair Parts Provisioning Recommendations. 1979 IEEE Annual Reliability and Maintainability Symposium, IEEE, 1979, pp. 224-230.
10-12. Gruber, J.G., et al.: Quality-of-Service in Evolving Telecommunications Networks. IEEE J. Sel. Areas Commun., vol. SAC-4, no. 7, Oct. 1986, pp. 1084-1089.
10-13. Roohy-Laleh, E., et al.: A Procedure for Designing a Low Connected Survivable Fiber Network. IEEE J. Sel. Areas Commun., vol. SAC-4, no. 7, Oct. 1986, pp. 1112-1117.
10-14. Jones, D.R.; and Malec, H.A.: Communications Systems Performability: New Horizons. 1989 IEEE International Conference on Communications, vol. 1, IEEE, 1989, pp. 1.4.1-1.4.9.
10-15. Decroix, A.: Analysis and Evaluation of Reliability and Availability of Software. IEC-TC-56 draft, 56AVG10 (DECROIX)02, June 1986.
10-16. Dhillon, B.S.: Human Reliability: With Human Factors. Pergamon Press, 1986.
10-17. Ali, S.R.: Analysis of Total Outage Data for Stored Program Control Switching Systems. IEEE J. Sel. Areas Commun., vol. SAC-4, no. 7, Oct. 1986, pp. 1044-1046.
10-18. Malec, H.A.: Product/Process Reliability Testing. 1987 IEEE International Conference on Communications, IEEE, 1987, pp. 1198-1202.
10-19. Conroy, R.A.; Malec, H.A.; and Van Goethem, J.: The Design, Applications, and Performance of the System-12 Distributed Computer Architecture. First International Conference on Computers and Applications, E.A. Parrish and S. Jiang, eds., IEEE, 1984, pp. 186-195.
10-20. Giloth, P.K.; and Witsken, J.R.: No. 4 ESS--Design and Performance of Reliable Switching Software. International Switching Symposium (ISS '81 CIC), IEEE, 1981, pp. 33A1/1-9.
10-21. Davis, E.A.; and Giloth, P.K.: Performance Objectives and Service Experience. Bell Syst. Tech. J., vol. 60, no. 6, 1981, pp. 1203-1224.

Reliability Training 1

I. Reliability management is concerned with what phases of the life cycle?

A. Design and development B. Manufacturing C. Customer D. All of the above

2. Name a new style of organizing reliability activities.

A. Functional B. Team C. Matrix D. Council

3. What are the functions of the diagnostic team or person?

A. Review the internal reliability status


B. Review reliability as perceived by the customer
C. Recommend tasks to the reliability council
D. Diagnose problems
E. Design experiments
F. Collect and analyze data
G. All of the above

4. Name a goal category for a telephone instrument.

A. Loss of service
B. Mean time between failures
C. Mishandled calls
D. All of the above

5. A PABX with 800 lines has a service-level reliability specification for the mean time between major losses of service (MTBF) of

A. 150 days B. 1 hour C. 0.1 percent D. All of the above

6. A Petri net is composed of which of the following parts?

A. A set of places
B. A set of transitions
C. An input function
D. An output function
E. All of the above

7. For a telecommunications system, what is the spares adequacy level for a network subsystem with spares depots?

A. 0.999 B. 0.995 C. 0.95

8. Turnaround time depends on

A. Replaceable unit failure rate


B. Repair location
C. Repair cost
D. All of the above

¹Answers are given at the end of this manual.

9. Spares adequacy is the probability of having spares available.

A. True B. False C. Do not know

10. What is the normal maintenance action recommendation for the site to defer repair for (days) during off-shift time?

A. 0 B. 2 C. 1

11. The bottom-up approach to reliability makes use of planning, requirements, allocations, and customer orientation.

A. True B. False C. Do not know

12. Specification targets can be used to define what performance and availability requirements?

A. Fully operational
B. Subliminal availability
C. Degraded operation
D. Unusable
E. Subliminal performance
F. All of the above

13. Tracking a product on a quarterly basis often shows

A. A relaxation of process control


B. Incorporation of old marginal components
C. Failure to incorporate the latest changes into service manuals
D. Knowledgeable personnel transferred to other products
E. All of the above

14. If we consider recovery, software, and procedural problems as human error, human error can account for what percentage of outage and downtime problems?

a. Outage frequency, percent of events/crashes A. 38 B. 55 C. 62


b. Downtime (3.5 min), percent per year per machine A. 42 B. 51 C. 68

15. As a benchmark for quality operations to achieve product reliability, what is a reasonable goal (failures per week) for a mature
plant?

A. 3.0 B. 1.0 C. 0.5

16. While the MTBF is between T(min) and T(spec), progress is tracked toward what goal?

A. T(design) B. T(spec) C. T(intrinsic)

17. The key to reliability growth is to have the growth measurement encompass

A. The design phase


B. The manufacturing phase
C. The testing phase
D. The user phase
E. The entire life cycle of the product

18. For a No. 4 ESS system in the field-test phase, the number of interrupts per day can be

A. <20 B. >20 C. 40

19. An electronic system must achieve manufacturing, shipping, and installation maturity in what period of time (months) to
gain and keep market share?

a. Small system A. 1 B. 2 C. 3
b. Medium system A. 4 B. 6 C. 12
c. Large system A. 12 B. 8 C. 16

Chapter 11
Designing for Maintainability and System Availability

Introduction

The final goal for a delivered system (an aircraft, a car, an avionics box, or a computer) should be its availability to operate and to perform its intended function over the expected design life. Hence, in designing a system, we cannot think in terms of delivering the system and just walking away. The system supplier needs to provide support throughout the operating life of the product, which involves the concepts presented in figure 11-1. Here, supportability requires an effective combination of reliability, maintainability, logistics, operations, and safety engineering to have a system that is available for its intended use throughout the designated mission lifetime (see the Definitions section for more details). Maintainability is the key to providing effective support, upkeep, modification, and upgrading throughout the lifetime of the system.

[Figure 11-1.--System supportability requirements: reliability engineering (mean time between failures), maintainability engineering (mean time to repair), logistics engineering (time to acquire and ship spares), and operations engineering (operational constraints) combine across design, development, production, and operation to yield an available, supportable system.]

This chapter will concentrate on maintainability and its integration into the system engineering and design process. The topics to be covered include the elements of maintainability, the total cost of ownership, and the ways that system availability, maintenance, and logistics costs plus spare parts costs affect the overall program costs. System analysis and maintainability will show how maintainability fits into the overall systems approach to project development. Maintainability processes and documents will focus on how maintainability is to be performed and what documents are typically generated for a large-scale program. Maintainability analysis shows how tradeoffs can be performed for various alternative components. Note that the majority of the mathematical analysis and examples will concentrate on maintainability analysis at the component level or below. In a highly complex and redundant system, the evaluation of availability at the system level may be extremely difficult and is beyond the scope of this manual. Redundancy, switches and software that can be used to bypass failed subsystems, and other methodologies can allow a system to operate even with some system degradation. The treatment of these types of problems is also beyond the scope of this manual. Finally, specific problems for hands-on training follow the concluding section.

Definitions

Reliability is the probability that an item can perform its intended functions for a specific interval under stated conditions. What is the chance that a failure will stop the system from operating? Usually the failure is random and unexpected, not predicted as with brake wearout or a clutch or fatigue failure when a given input load spectrum is known.

Availability is a measure of the degree to which an item is in an operable and committable state at the start of a mission, when the mission is called for at an unknown (random) point in time. It is also the probability of system readiness over a long interval of time. Will the system be ready to operate when needed? Does it have very high reliability, very small maintenance requirements (easily maintainable and having a good supply of spare parts), or a combination of both? For example, what was the percentage of times a car started out of the total number of tries over its lifetime? Alternatively, how many days was it in the driveway ready to start as opposed to being in the garage for repairs?

Maintainability is a system effectiveness concept that measures the ease and rapidity with which a system or equipment is restored to operational status after failing. It is also the probability that a failed system can be restored to operating condition in a specified interval of downtime. How easy is it to diagnose the problems in a failed (or marginally operable) system, and how easy is it to replace the failed components (or software) after this diagnosis has been made? If a system is not reliable and is prone to partial or complete failures, if it is difficult to find out what is causing the system to malfunction, or if it is difficult to get to and replace failed components, we have a serious problem that must be corrected (ref. 1).
Safety analysis considers the possible types, reasons, and effects of operation and failures on the system as they affect the personal safety of those who operate or maintain it.
Logistics is the art and science of the management, engineer-
ing, and technical activities concerned with requirements,
design, and planning and maintaining resources to support
objectives, plans, and operations.
Operations defines the environment, schedule, loading, and
input and output parameters a system will need to function and
the tasks it will perform.
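The car example in the availability definition can be made concrete. The minimal sketch below (hypothetical numbers, not data from this manual) estimates availability as the fraction of days the car was operable rather than in the shop:

```python
# Hypothetical illustration of the availability definition:
# availability ~ (days operable) / (days operable + days in repair)
days_operable = 3460    # days in the driveway, ready to start
days_in_repair = 40     # days in the garage for repairs

availability = days_operable / (days_operable + days_in_repair)
print(f"Observed availability: {availability:.4f}")   # -> 0.9886
```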

Importance of Maintainability

The importance of maintainability is further noted in figure 11-2. Too often, the performance specifications or the appearance of a product are the overriding factors in its acquisition or purchase. This attitude can be extremely detrimental, especially when the first failure occurs and it is realized that the availability of critical parts and the ease of maintenance keep critical systems operating. A large integrated system can come from the best possible design, utilizing the newest technology; it can be a work of art and outperform any competitive system, but who would want it if

• System breakdowns could not be diagnosed to a level of detail needed to pinpoint the problem in a short time.
• Spare parts were not readily available.
• Repair required extremely long lead times.
• Installing the spare parts was extremely difficult.
• Checkout and/or alignment of spare parts was difficult.

For all practical purposes, such a system is not available (operational).

[Figure 11-2.--Importance of maintainability.]

Elements of Maintainability

We need to consider up front in our design what must be done to maintain the system. Either the system will not fail for the entire mission or some parts of the system will fail and will need to be replaced. If we do not have a system with perfect reliability (there is wearout), the following questions (as illustrated by fig. 11-3) should be asked:

(1) What parts have high failure rates, and how will their failure be diagnosed? For example, if a cathode ray tube (CRT) screen does not show a display, has the screen failed, has a power supply failed, or has a computer stopped sending the screen data?

(2) Can various problems be diagnosed easily? How quickly can the problem be diagnosed? If there is an intermittent fault, can information during this anomaly be retrieved later? If a failure cannot be isolated or if insufficient diagnostic capabilities are built into the system, restoration can be a time-consuming task.

(3) How quickly can the system be repaired? Has the system been segmented into easily replaceable units? Are parts buried on top of one another with hundreds of attachment points between units? Also, can software be used to detect and route around a hardware failure and make the failure transparent to the user?

(4) Where will spare parts be stored? How many spare units should be ordered? Will parts for a unit in Washington be lost in a warehouse in Los Angeles? Will there be an oversupply of one unit and a shortage of another?

(5) Will a failed unit be discarded or repaired? If it is to be repaired, where should it be repaired? What equipment and personnel are required to do the work?

(6) Will unique parts be available to repair the unit? Will some unique part such as a traveling wave tube or a low-noise amplifier still be manufactured when it is needed to repair a unit? Will the supplier who sold the unit repair it? If repairs are agreed to, will the supplier still be in business (logistics issues)?

When a product is planned, all these questions must be answered. Although some of these questions overlap with logistics (the science of supply and support of a system throughout its product life cycle), they must all be addressed. Early in the design phase of the product, the maintenance concept to be used for the system and the design for maintainability must be examined first. The following definitions will be helpful in making decisions in the design phase.

[Figure 11-3.--Elements of maintainability: mean time between failures (MTBF), preventive maintenance, recertification, repair (equipment and personnel), spares, and disposal combine to determine availability.]

Total Cost of Ownership

The total life-cycle cost of a unit must be assessed when evaluating project cost. The need to support the system through an effective logistics program that includes maintainability is of paramount importance (fig. 11-4).

[Figure 11-4.--Hidden system costs: acquisition, operations, ground operations, ground support equipment, technical data, training, maintenance, test equipment, software, logistics, spares, risk management, and disposal.]

The project can follow a faster development course and procure less reliable hardware; however, the maintenance cost will make the project more expensive. Additionally, if the unit is not available because of lengthy maintenance processes or lack of spare parts, additional units must be procured to keep the fleet strength at the desired level (whether it is delivery vehicles or research aircraft). The total cost of ownership includes

• Total life-cycle cost: more than just the cost of flight units and a prototype unit
• Availability of the unit: more than the advertised features when it is running (backup systems needed for excessive downtime)
• Maintenance and logistics: often 40 to 60 percent of the total system costs
• Spares: a function of reliability and the speed with which the system can be maintained

Often all the costs associated with a project are not considered. Besides just the cost of producing the units, a huge amount of time and money must be expended keeping them operational throughout the mission lifetime. Total project costs are considered in table 11-1. Evident from the table is that total system costs include design and development costs and a whole host of training, operations, and maintenance costs.

As the quality and reliability of the system increase, the cost of the system classically increases. However, this increase may not necessarily occur because as the quality and reliability of the system are improved, the cost of maintenance, logistics, and spares decreases. Since total support costs are a function of maintenance costs and the cost of the total number of spares, spare repair, and spare transport, improved reliability drastically reduces the total cost of ownership, also.

TABLE 11-1.--TOTAL PROJECT COSTS

Cost item -- Cost breakdown
Acquisition -- Design and development (research, trades, design, analysis, prototype production and test); production
Operations -- Personnel, facilities, utilities, operating supplies and other consumables, maintenance ground operations
Ground operations -- Ground support engineering model and test and checkout models; maintenance for these
Ground support equipment -- All test, checkout, and diagnostic equipment; purchase, storage, and calibration of ground support equipment
Technical data -- All manuals, specifications, configuration management; software configuration management, data base, storage
Training -- Continuous training of all operations and maintenance personnel
Maintenance -- Calibration, repair, and system downtime
Repair facilities -- Labs, depots, and others
Test equipment -- Equipment used for maintenance, alignment, and calibration of the system; equipment used for recertification (e.g., flight)
Software -- Maintenance, upgrades, test, and installation
Logistics -- Packaging, storage, transportation, and handling; tracking support
Spares -- Spare orbital replacement units and line replacement units; long-lead-time items and critical components
Disposal -- Disassembling and recycling; disposing of hazardous waste
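As a rough illustration of how the table 11-1 categories accumulate, the sketch below sums hypothetical cost estimates (all dollar values are assumed, not NASA figures) and reports how large the nonacquisition share can become:

```python
# Hypothetical life-cycle cost rollup over the table 11-1 categories
# (all values in $M are illustrative assumptions).
costs = {
    "acquisition": 12.0, "operations": 6.5, "ground operations": 2.0,
    "ground support equipment": 1.5, "technical data": 0.8, "training": 1.2,
    "maintenance": 7.0, "repair facilities": 1.0, "test equipment": 0.9,
    "software": 1.1, "logistics": 4.0, "spares": 3.5, "disposal": 0.5,
}

total = sum(costs.values())
support_share = (total - costs["acquisition"]) / total
print(f"Total life-cycle cost: ${total:.1f}M; nonacquisition share: {support_share:.0%}")
```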



Maintainability and Systems Engineering

Figure 11-5 gives a global overview of a long-term research project, such as the space program, and shows maintainability as an integral part of it. The Horizon Mission Methodology (HMM) was developed initially for the study of breakthrough space technology. The HMM's are hypothetical space missions whose performance requirements cannot be met, even by extrapolating known space technologies. The missions serve to develop conceptual thinking and depart from simple projections and variations of existing capabilities.

The use of HMM's with breakthrough technology options (BTO's) has been an attempt to provide a systematic analytical approach to evaluate and identify technological requirements for BTO's and to assess their potential for providing revolutionary capabilities for advanced space missions.

Therefore, we can think of the space program (or other major research program) not just as a number of isolated projects but as a single unified program with a global goal (e.g., landing men on the Moon, planning a manned mission to Mars, or establishing a permanent manned lunar base).

[Figure 11-5.--Systems engineering and operations: the Horizon Mission Methodology (HMM) and breakthrough technology options (BTO's) feed the space program (current applications, proving technology, ongoing exploration); an individual project (e.g., a space experiment) draws on systems engineering, program management, logistics (maintainability), assurance (reliability, safety, and quality), manufacturing, test and evaluation, and risk (engineering and systems analysis) across its phases.]

The program concept assumes a single consistent objective. It involves putting tested and proven equipment together to perform a step toward the goal. Another area of work involves developing technology and components and conducting ongoing exploration at the outer fringes of what lies ahead. At an individual project level, a number of different disciplines are brought together to design, develop, deploy, and operate the project. One of these disciplines is maintainability. Expanding the various maintainability activities over project phases gives us the chart of figure 11-6. Systems engineering at the National Aeronautics and Space Administration (NASA) uses five phases to describe a mission. Note that the maintainability program runs across all five phases. The task descriptions are also shown. The various activities are defined in the following sections.

Of great importance is that the maintainability concept of the project be introduced early in the program. Without this introduction, long-term missions will see costs rise and downtime increase. True, initial development costs may increase, but total cost will decrease. In some cases, projects have ignored maintainability and built-in diagnostics to obtain budgetary approval of a new system. However, the final costs always increase as a result of this practice (ref. 2).

Finally, figure 11-7 shows the interrelationship of the various project tasks and how work and information flow between operations, reliability, and logistics functions. Basically, systems operation and mission requirements are evaluated to generate the maintainability concept. This concept is further affected by component reliability and the various reliability analyses performed. This maintenance analysis is then integrated with design engineering to develop a design that can be repaired and maintained.

Maintainability data and requirements flow to logistics to allow development of an effective support resource program. The output of the maintenance analysis is also critical to the logistics support analysis.¹ The logistics support analysis record (LSAR) and support resource development feed the plan for (1) facilities to house equipment or ground operations, (2) ground support equipment, (3) the logistics plan and other activities, (4) data (technical publications) for equipment operation and maintenance, and (5) identification of the personnel and training needed to maintain, repair, and support the equipment. Finally, a maintainability demonstration is performed to evaluate the actual times needed to diagnose and physically change out a line replaceable unit (LRU) or an orbital replaceable unit (ORU).

¹The following general guideline distinguishes support, logistics, and maintenance for this manual. Supportability encompasses all logistics, maintainability, and sustaining engineering. Logistics is involved with all movement of orbital replaceable units (ORU's) and spare parts, the procuring and staging of spare parts, and the development of storage containers. Maintainability is responsible for repairing ORU's, shop replaceable units (SRU's), and printed circuit boards (PCB's) after the ORU's are located, which includes test and diagnostic equipment and tools and providing training, a suitable work area, and maintenance personnel.

[Figure 11-6.--Maintainability in the system life cycle (Systems Engineering: Maintainability/Integrated Logistics Support): maintainability program management, maintainability concepts (requirements and availability), policy, support equipment, maintainability analysis, and maintainability design criteria span the five phases: Phase A, preanalysis; Phase B, definition; Phase C, design; Phase D, development/testing; and Phase E, production, operation, and maintenance.]

Maintainability Processes and Documents

The mission requirements analysis and the operational requirements of a new system are derived from the initial needs and wants of the community. Directly and simultaneously derived from this is the system maintenance concept (as described in the maintenance concept document (MCD)). At this time, an initial draft of maintenance requirements should also be developed. Operational requirements and system requirements are funneled into the maintenance concept document, which covers every aspect of a maintenance program throughout the life of the system (see fig. 11-8) (ref. 3).

First Phase

The first phase involves planning and designing because maintainability is made a part of the design process, which includes making components easy to service. In this first step, ORU's (orbital replaceable units) or LRU's (line replaceable units) are selected. As the name implies, replaceable units can be quickly changed out to bring the system back into operation. To speed the system back into operation, it is typically divided into units that can easily be replaced on-orbit or on the flight line. A module or system is designated an ORU or an LRU if that part of the design has high modularity (can be self-contained, such as a power supply) and low connectivity (a minimum of power and data cables to other parts of the system). As we will discuss later, we must be able to diagnose that an ORU or LRU has failed. This means that maintenance on-orbit (or on the flight line) will only replace these items. The system is built, tested, shipped, and put into operation. Operations and maintenance training are also conducted.

The maintainability analysis (see fig. 11-9) also uses (1) the predicted corrective maintenance time multiplied by the number of failures, (2) the predicted preventive maintenance (PM) time multiplied by the number of scheduled PM's, and (3) the predicted changeout time for limited-life items multiplied by the number of scheduled changeouts. With these times, a prediction of overall maintenance time per period is made. Assuming that the system is shut down during maintenance, we can then predict availability.
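A minimal sketch of this prediction (all inputs and variable names are assumed for illustration) follows:

```python
# Predicted maintenance time per period: corrective time x expected failures,
# plus preventive time x scheduled PM's, plus changeout time x scheduled
# limited-life replacements (all inputs are assumed examples).
period_hr = 8760.0                     # one year of clock time
mtbf_hr = 2000.0                       # assumed mean time between failures

corrective_hr = 6.0 * (period_hr / mtbf_hr)    # 6 hr per repair
preventive_hr = 2.5 * 12                       # 2.5 hr per PM, monthly
life_limited_hr = 4.0 * 2                      # 4 hr per changeout, twice yearly

maintenance_hr = corrective_hr + preventive_hr + life_limited_hr

# Assuming the system is shut down during maintenance:
availability = (period_hr - maintenance_hr) / period_hr
print(f"{maintenance_hr:.1f} maintenance hr/yr; availability {availability:.4f}")
```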

[Figure 11-7.--Maintainability in the systems engineering process: operational and system requirements drive the maintainability program plan and the maintenance concept document/plan; maintainability analysis and maintenance analysis exchange data with design (specifications, layouts and drawings, technical reports, design changes) and reliability/risk engineering (allocations, life predictions, failure data and analysis, failure modes and effects analysis, critical item list (CIL)); requirements flow to the logistics support analysis (LSAR) and support resources development (including personnel), ending in a maintainability demonstration for the customer.]

[Figure 11-8.--Maintainability activities: (1) design for maintainability (analysis, build, test, deploy, training, tools); (2) performance of the maintenance action (preventive maintenance, fault diagnostics, maintenance activity integration, plan, action); and (3) repair analysis and repair (repair level analysis, maintenance task analysis, repair and restore, recertify, sparing and storage).]

[Figure 11-9.--Maintainability analysis process: the maintenance concept, specifications, FMEA/CIL/risk data, and the supplier maintenance plan feed the maintainability program; maintainability analysis (maintenance design requirements) interacts with design surveillance, the maintainability design checklist, design tradeoffs, design reviews, failure reporting, problem definition, and the corrective action system; maintenance analysis (maintenance tasks and support) leads to the maintainability demonstration.]

As the design matures and the failure mode and effects analysis/critical items list (FMEA/CIL) and supplier maintainability program data mature, the overall availability (as well as other maintainability figures of merit) is recalculated. The data generated by the maintainability analysis serve to appraise project management of the overall maturity of the design and the ability of the design to meet program objectives. (The following section discusses the maintenance concept document in more detail.)

Second Phase

The second phase of maintenance is handling failures, performing preventive maintenance, and replacing life-limited items. Eventually the deployed unit breaks down. The failure must be detected and isolated to the actual failed ORU/LRU. How is the failure detected, and how is the maintenance action planned and executed? Can it be combined with any other maintenance actions or preventive maintenance activities? The on-orbit or flight line maintenance is performed by removing and replacing the failed unit. But what do we do with the broken ORU/LRU?

Third Phase

The third phase involves the handling of failed components. Here, repair-level analysis evaluates the failed ORU or LRU to determine whether it should be repaired or replaced. If repaired, it may be done in-house (intermediate maintenance at a maintenance depot where more specialized equipment and better diagnostic instrumentation might be available) or at the factory. Then the unit needs to be recertified, retested, finally checked out, and returned to the spare parts storage area (preferably bonded storage).

Only by developing the complete maintenance concept and the maintenance requirements early in the development process will the design really be impacted by maintenance needs. The operational requirements document, the mission (or science) requirements document, and the maintainability concept document with preliminary requirements should be the design drivers. Only then can effective trade studies, systems analysis and functional analysis, and allocation be performed.

[Figure 11-10.--Maintainability documentation: the maintenance concept document (MCD) and preliminary concept lead to the maintainability program plan (MPP), maintainability design guidelines (MDG), maintainability requirements document (MRD), maintenance plan (MP), maintainability analysis plan (MAP), supplier MAP, and maintainability demonstration plan; these feed the integrated logistics support plan, design reviews, maintenance models, the maintainability analysis document, and the verification/demonstration report.]

Also, trade studies with reliability and maintainability alternatives can be used to evaluate total system cost. Reliability and maintainability alternative selections will drive maintenance and repair costs, shipping costs, ORU/LRU spare costs, long-lead-time components, and components manufactured by complex processes.

Documents

Several documents (fig. 11-10) typically support a large-scale engineering project (some describe the activities already discussed). They officially begin with a basic plan and the maintenance concept document (MCD). The MCD together with the operations concept document and the science requirements are the chief design and cost drivers for the future system. The individual documents are as follows:

Maintainability program plan (MPP) (required).--This document defines the overall maintainability program, activities, documents to be generated, responsibilities, interfaces with the logistics function, and the general approach to the analysis of maintenance.

Maintenance concept document (MCD) (required).--This document defines the proposed way maintenance is to be performed on the product (see fig. 11-11); gives details of the aims of the maintenance program and support locations; describes the way all maintenance activities are to be carried out (details of support and logistics may additionally be specified depending on document requirements); and defines the input and output data requirements and the scheduling of maintenance activities, including the following sections:

Mission profile/system operational availability: How often and over what period of time is the system operational? What is the geographic deployment of the system, and where is the location of the system that needs to be repaired?

System-level maintainability requirements: What are the allocated and actual reliability requirements and maintainability requirements (MTTR, MTBF, MLDT, MDT²)?

Design requirements: What constitutes a maintainable element that can be removed or replaced (e.g., an orbital replaceable unit (ORU) or a line replaceable unit (LRU))? What are the size and weight limits?

²MTTR, mean time to repair; MTBF, mean time between failures; MLDT, mean logistic delay time; MDT, maintenance downtime.

Diagnostic principles and concepts: How will a failure be detected and isolated? How will repairs be evaluated?

Requirements for suppliers: What information about parts and components must the supplier give? How will the first-, second-, and third-tier suppliers support their products? How quickly will they be available, and for how long will they be available?

Repair versus replacement policy: How is the decision made to repair or replace a unit? If repaired, how is the unit requalified?

Repair level analysis: Where will different failures be repaired? Which repairs will be made on-orbit (or on the flight line)? Which repairs will be made at an intermediate maintenance facility (depot), and which will be made at the factory?

Tools and test equipment: What diagnostic, alignment, and checkout tools will be required for each level of maintenance (repair)?

Personnel and training: What is the level of training required for the units at each level of maintenance (from simple remove and replace to detailed troubleshooting of an ORU/LRU)?

Crew considerations: What time will be allocated for preventive and corrective maintenance? How much time can a flight crew and a ground crew give to maintenance during or between missions?

Sparing concepts: Which spares will be onboard versus delivered when needed? Will failed units be repaired or replaced? What are the general repair policies?

Elements of logistic support (optional): Where will all the test and ground support equipment and inventory control supplies be located?

[Figure 11-11.--Factors affecting maintainability: availability combines operating time (reliability) and downtime (maintainability plus supportability); each element (e.g., computer, clutch, gear box) contributes a mean time to failure and a mean time to repair.]

Maintenance plan (MP) (required).--This document defines the actual way maintenance is to be performed on the product. The MP gives detailed requirements for repair or replacement analysis, the location for and levels of maintenance, and other detailed requirements for performing the maintenance.

Maintainability design guidelines (MDG) (optional).--This guideline contains suggestions, checklists, and descriptions of ways to make the design maintainable. Related safety and human factors and factors to consider for vendors and transportation may also be covered.

Maintainability requirements document (MRD) (required).--This document gives the specific requirements (criteria) that will facilitate maintenance or repair in the predicted environment. It contains all maintainability requirements.

Maintainability analysis plan (MAP) (required).--The maintainability analysis plan specifies how the maintainability of the system is assessed. It also documents the process that translates system operational and support requirements into detailed quantitative and qualitative maintainability requirements with the associated hardware design criteria and support requirements and provides basic analysis information on each ORU/LRU. This document includes evaluation processes for preventive, corrective, and emergency maintenance. The MAP documents the formal procedure for evaluating system and equipment design,³ using prediction techniques, failure modes and effects analysis, procedures, and design data to evolve a comprehensive, quantitative description of maintainability design status, problem areas, and corrective action requirements.

Supplier maintainability analysis plan (optional).--This document outlines the methodology to evaluate suppliers for conformance to maintainability standards.

Maintenance analysis document (required).--This document provides the details of how each ORU/LRU is to be maintained and includes detailed maintenance tasks, maintenance task requirements, and maintenance support requirements.

Maintainability demonstration plan (optional).--This plan documents the process that translates (and verifies) system operational and support requirements into actual test plans for the maintainability of systems and subsystems. The output, the maintainability demonstration report, includes MTTR's and maintenance descriptions (ref. 4).

³To help the reader distinguish between the various aspects of maintainability evaluation, the following is useful. The three stages of the overall evaluation process are (1) engineering design analysis, (2) maintainability analysis, and (3) the maintainability demonstration. Engineering design analysis includes the initial trade studies and evaluation to determine the optimum ORU design configuration. Also identified are safety hazards, reaction time constraints for critical maintenance, and an evaluation of diagnostic alternatives. Maintainability analysis includes an expanded detailed analysis of the final design to determine all maintainability system parameters. The maintainability demonstration then specifies tests to verify the data collected during the maintainability analysis.

[Figure 11-12.--Maintenance of limited-life items: failure rate λ (failures/10⁶ hr) versus time. Region I, infant mortality: proper burn-in helps discover early failures of defective parts. Region II, constant-failure-rate region (useful life): low-failure-rate parts result from good design, quality control, and manufacturing processes; random failures still occur, and parts must be replaced. Region III, wearout: the failure rate rapidly increases, so life-limited parts are replaced before wearout begins. The service environment (temperature, humidity, vibration) must be identified.]

Maintainability Analysis Mathematics

As previously stated, the goal of system performance is to have the system available when it is needed. As figure 11-11 shows, the failure rate, the mean time to repair, the time to acquire spares, and operational constraints all affect availability.

Availability requirements can be met with an extremely reliable system, with one that is easy to repair and has an adequate supply of spare parts, or with a combination of both. System use and mission profile also affect system availability requirements. The following list gives examples of continuous and intermittent mission requirements (ref. 5).

Is continuous operation required, as for a critical life support system on a space station or an air traffic control system? If so, the reliability has to be very high and/or backup systems may be needed:

• Continuous operation
  o Spacecraft (LEO)
  o Space station
  o Air traffic control system
• Intermittent operation (on demand)
  o Emergency vehicle
  o Research fighter
  o Shipboard Gatling gun
• Intermittent operation (scheduled)
  o Space experiment
  o CAT scan or MRI equipment in hospital
  o Space Shuttle main engines

An intermittent operation requirement is different. If availability is on demand, the built-in-test/built-in-test-equipment (BIT/BITE) and preventive maintenance functions have to be perfected and evaluated (through accumulating many hours on similar units). However, downtime for preventive maintenance has to be accounted for with spare systems. If there is scheduled intermittent operation, critical components can be replaced or continuously monitored (ref. 6).

For the mathematical analysis that follows, we will assume that we have a system that requires continuous operation except for scheduled preventive maintenance, that a temporary backup system exists, or that the system can be down for short periods. Once the system is put into operation, it might experience periods when not all features are operating but the failures can be tolerated until the next scheduled preventive maintenance (e.g., failure of a monitoring sensor or a BIT/BITE function).

Maintenance includes (1) corrective maintenance, the replacement of failed components or ORU's and LRU's; (2) preventive maintenance,⁴ scheduled maintenance identified in the design phase, such as lubrication, alignment, calibration, or replacement of wear items such as clutches, seals, or belts; and (3) replacement of life-limited items such as those illustrated in figure 11-12. Distinctions must be made between the availability calculated from the MTBF, which is only valid in region II, and the availability once a component enters its wearout region. Here the failure rate may increase exponentially, and it is more difficult to predict.

⁴Preventive maintenance can also include software. Fixing corrupted tables, updating data bases, and loading revisions of software are an important part of scheduled maintenance.

The generally accepted practice is to replace life-limited items before they enter their wearout period. If the mission life extends into region III (wearout), the part is a life-limited component and will be replaced before the beginning of the wearout stage at time t2. If the mission life is somewhere in region II, the component will only be replaced if it fails randomly; no scheduled replacement time will be set.

Availability can be calculated as the ratio of operating time to total time, where the denominator, total time, can be divided into operation time (uptime) and downtime. System availability depends on any factor that contributes to downtime. Underpinning system availability, then, are the reliability and maintainability of the system design; however, support factors, particularly logistics delay time, also play a critical role, especially when a long supply line exists (such as with the International Space Station (ISS)). Assuming these factors remain the same, the following availability figures of merit can be calculated:

Inherent availability = MTBF / (MTBF + MTTR)

where MTBF is the mean time between failures and MTTR is the mean time to repair. Inherent availability considers only maintenance of failed units.

Achieved availability = MTTMA / (MTTMA + MMT)

where MTTMA is the mean time to a maintenance action (corrective, preventive, and replacement of limited-life items) and MMT is the mean (active) maintenance time (corrective, preventive, and replacement of limited-life items). Achieved availability includes inherent availability plus consideration of the time spent for preventive maintenance and maintenance of life-limited items.

Operational availability = MTTMA / (MTTMA + MMT + MLDT + MADT)

where MLDT is the mean logistics delay time, which includes downtime due to waiting for spares, equipment, or supplies. Maintenance downtime is the time spent waiting for a spare part to become available or waiting for test equipment, transportation, or a facility area in which to perform maintenance. For this discussion, it does not include local delivery, such as going to a local storage location and returning to the work site or returning the used part to a location for transport to a repair facility. MADT is the mean administrative delay time and includes downtime due to administrative delays: waiting for maintenance personnel, time when maintenance is delayed because personnel are assigned elsewhere, filling out forms, and signing out the part. Operational availability includes achieved availability plus consideration of all delay times.

[Figure 11-13.--Maintainability during system operation: the level of operation (operational, limited, not operating) over time defines the mean time between failures (MTBF) and the mean time to repair (MTTR); each repair comprises diagnosis, accessing and replacement, system restoration, checkout, and closeup.]

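The three availability figures of merit above can be captured directly in code. The following minimal sketch uses our own function names and assumed inputs; it also inverts the inherent-availability expression to find the largest MTTR that still meets a goal, the relationship explored below in figures 11-14 and 11-15:

```python
# Availability figures of merit, following the expressions above.
# All function names and sample inputs are ours (illustrative only).

def inherent_availability(mtbf: float, mttr: float) -> float:
    """Considers only repair of failed units (times in hours)."""
    return mtbf / (mtbf + mttr)

def achieved_availability(mttma: float, mmt: float) -> float:
    """Adds preventive and limited-life maintenance time."""
    return mttma / (mttma + mmt)

def operational_availability(mttma: float, mmt: float,
                             mldt: float, madt: float) -> float:
    """Adds logistics (MLDT) and administrative (MADT) delay times."""
    return mttma / (mttma + mmt + mldt + madt)

def max_mttr_for_goal(mtbf: float, goal: float) -> float:
    """Largest MTTR that still meets an inherent-availability goal."""
    return mtbf * (1.0 - goal) / goal

print(inherent_availability(300.0, 5.0))               # ~0.9836
print(operational_availability(250.0, 6.0, 0.2, 0.1))  # ~0.9754
print(max_mttr_for_goal(mtbf=300.0, goal=0.990))       # ~3.03 hr
```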
Availability measures can also be calculated for a point in time or for an average over a period of time. Availability can also be evaluated for a degraded system. For the remainder of our discussion, we will assume average availability and maintainability factors.

Other important factors in calculating availability include (1) the maximum allowable time to restore, (2) the proportions of faults and the percentage of time they are detected as a function of failure mode, (3) the maximum false alarm rate for built-in test equipment, and (4) the maximum allowable crew time for maintenance activities.

We also want to look in detail at an individual corrective maintenance action. A number of elements make up a maintenance action, and once they are combined, other factors must be considered before the overall impact on crew hours, maintenance hours, and other maintenance parameters is determined (fig. 11-13). These elements are (ref. 7)

(1) Maintainability prediction using the most effective methods available emphasizes an estimation of the time to restore at the ORU/LRU level. For a failed unit, the time to restore is the total corrective maintenance time T in minutes for each ORU:

T = DI + DL + GA + RR + SR + CK + CU

where

DI  diagnostic time to detect and isolate a fault to the ORU level, min
DL  local delivery of a spare ORU/LRU as opposed to shipping it in from a remote location, min
GA  time required to gain access to the failed ORU, min
RR  time required to remove and replace the defective ORU, min
SR  time required to restore the system (including alignment, checkout, and calibration), min
CK  time required to complete system checkout, min
CU  time required to close up the system, min

(2) The mean time to repair (MTTR) the ORU (on-orbit) follows. For this exercise, assume a crew size of one for all repair operations:

MTTR(ORU) = (T × Z) / 60

where MTTR is in hours and Z is the conversion factor from 1 g to 10⁻⁶ g (converting ground task times to on-orbit task times).

(3) The mean time to a maintenance action (MTTMA) based on a yearly average is

MTTMA = 8640 / (MMHY(c) + MMHY(p) + MMHY(l))

where MMHY is the maintenance hours per year; the subscripts c, p, and l denote corrective maintenance, preventive maintenance, and life-limited replacement, respectively; and 8640 is the number of hours in one year.

(4) The maintenance hours per year (MMHY) for corrective (c), preventive (p), and life-limited replacement (l) follow:

MMHY(c) = DC × MTTR(ORU) × K × (8640 / MTBF)

MMHY(p) = MMP × F(P)

MMHY(l) = MTTR(ORU) × (8640 / T(l))

where

DC    duty cycle of ORU, percent
MTBF  mean time between failures, hr
MTBM  mean time between maintenance, hr
MMP   mean hours to perform preventive task, hr
F(P)  preventive task frequency per year
K     MTBF to MTBM conversion factor
T(l)  life limit for ORU, hr

(5) Maximum corrective maintenance time M(max) is the +90 percent time for a normal distribution. It is assumed that since this is a manual operation and not subject to wearout, the normal distribution will apply:

M(max) = MTTR(ORU) + (1.61 × σ)

where σ is the standard deviation of the repair time.

Plots of typical inherent availability are presented in figure 11-14 as a function of MTTR and MTBF. Here, solving the expression

Inherent availability = MTBF / (MTBF + MTTR)

for MTTR gives

MTTR = MTBF × (1 - Inherent availability) / Inherent availability

which, for availability near 1, is approximately (1 - Inherent availability) × MTBF. Figure 11-15 shows MTTR as a function of failure rate (assuming an exponential rate). For an exponential distribution, the failure rate λ is 1/MTBF. Substituting this into the expression for inherent availability and solving for MTTR yields the results shown.

[Figure 11-14.--Relationship of MTTR and MTBF to availability: constant-availability curves of MTTR versus MTBF (0 to 1000 hr).]

[Figure 11-15.--Relationship of MTTR and failure rate to availability: constant-availability curves (e.g., 0.973) of MTTR versus failure rate λ (0 to 400 failures/10⁶ hr).]

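Elements (1) to (5) chain together naturally in code. The sketch below follows the expressions as reconstructed above; every numeric input is an assumed example, and MMHY(l) uses the same 8640-hr/yr convention as MMHY(c):

```python
# Corrective, preventive, and life-limited maintenance rollup following
# elements (1) to (5) above; every numeric input is an assumed example.
HOURS_PER_YEAR = 8640.0          # the convention used in the text

# (1) Total corrective maintenance time T for one ORU, minutes
DI, DL, GA, RR, SR, CK, CU = 25, 10, 15, 20, 30, 15, 10
T = DI + DL + GA + RR + SR + CK + CU                 # 125 min

# (2) Mean time to repair the ORU, hours; Z converts 1-g task times
# to 10^-6-g (on-orbit) task times
Z = 1.5                                              # assumed factor
mttr_oru = (T * Z) / 60.0                            # 3.125 hr

# (4) Maintenance hours per year for corrective (c), preventive (p),
# and life-limited replacement (l)
DC = 0.8                                             # duty cycle (fraction)
K = 1.2                                              # MTBF-to-MTBM factor
MTBF = 5000.0                                        # hr
MMP, F_P = 2.0, 4.0                                  # 2-hr PM task, 4 per year
T_life = 20000.0                                     # ORU life limit, hr

mmhy_c = DC * mttr_oru * K * (HOURS_PER_YEAR / MTBF)
mmhy_p = MMP * F_P
mmhy_l = mttr_oru * (HOURS_PER_YEAR / T_life)

# (3) Mean time to a maintenance action, hours
mttma = HOURS_PER_YEAR / (mmhy_c + mmhy_p + mmhy_l)

# (5) Maximum corrective maintenance time (+90 percent point, per the text)
sigma = 0.8                                          # repair-time std. dev., hr
m_max = mttr_oru + 1.61 * sigma

print(f"T = {T} min, MTTR = {mttr_oru:.2f} hr")
print(f"MTTMA = {mttma:.0f} hr, Mmax = {m_max:.2f} hr")
```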


Additional Considerations

As previously mentioned, to speed the system back into operation, it is typically divided into units (ORU's/LRU's) that can be easily replaced, either on-orbit or on the flight line. This means that maintenance on-orbit (or on the flight line) will usually only replace these items. The following are important questions we need to ask for our maintainability analysis (ref. 8):

• How much downtime is acceptable?
• What will be replaced on the flight line (what should be designated an LRU or an ORU)?
• How will a failure be diagnosed and isolated to an ORU/LRU: by BIT/BITE, manual processes, software, or a combination?
• Will the failed units be scrapped or repaired?
• If repaired, what should be repaired for each type of failure? Where should it be repaired (depot, lab, factory) and by what skill level?
• What preventive maintenance needs to be performed?
• What kind of maintenance tests need to be performed?
• Can all components be inspected for structural defects?
• How will structural defects be detected and tracked?
• Have acceptable damage limits been specified?
• Are safety-related components easy to replace?
• Are there safety issues that occur during maintenance?
• How is corrosion controlled?
• Are limited-life items tracked for maintenance?

A combination of built-in testing and diagnostic procedures (with the needed tools and instruments) must be available to diagnose a fault or failure to at least one ORU/LRU level. If it cannot be determined with that fidelity, the wrong item might be replaced. The built-in test procedures begin with specific questions:

• Do we know what is going to fail?
  o Do maintenance records allow preventive maintenance where critical items are replaced at a known percentage of life?
  o Do smart diagnostic features sense impending failures?
• Do we know what has failed?
  o Does built-in test equipment quickly diagnose the problems?
  o Does readily available external test equipment quickly diagnose the problems?
• Do we know how we are going to handle each failure?
  o Has a repair analysis been performed on all likely failures?
  o How will each failure be diagnosed and repaired?
  o Has the failure modes and effects analysis (FMEA) been evaluated for failures and corrective actions?

The questions that remain are these: Can all plausible and probable failure modes (based on the FMEA/CIL) be diagnosed with BIT/BITE? Can the necessary diagnostic procedures be carried out by a crew member or technician on the flight line? The answers to these questions determine the design concept for maintainability. The aim of this analysis is to reduce downtime.

Requirements and Maintainability Guidelines for ORU's

Other requirements to evaluate ORU's/LRU's follow.

(1) On-orbit replacements of ORU's should not require calibrations, alignments, or adjustments. Replacements of like items in ORU's should be made without adjustments or alignments (this will minimize maintenance time).

(2) Items that have different functional properties should be identifiable and distinguishable and should not be physically interchangeable. Provisions should be incorporated to preclude installation of the wrong (but physically similar) cards, components, cables, or ORU's with different internal components, engineering revision numbers, and so forth. Reprogramming, changing firmware, and changing internal switch settings may be allowed with special procedures and safeguards.

(3) All replaceable items should be designed so that it will be physically impossible to insert them incorrectly. This is a basic maintainability and safety requirement.

Additional maintainability considerations that should be incorporated in the design are

(1) Any ORU, shop replaceable unit (SRU),⁵ their subcomponents, or cards that are physically identical should be interchangeable (excluding cables and connectors). Identical hardware (e.g., a signal conditioning card) shall not be made unique. Different software and switch settings do not affect identity. The ability to replace ORU's with an identical unit from an inactive rack will improve availability.

(2) Standardization should be incorporated to the maximum extent through the design. In the interest of developing an efficient supply support capability and attaining the availability goals, the number of different types of spares should be held to a minimum.

(3) The ORU should be designed from standard off-the-shelf components and parts.

(4) The same items and/or parts should be used in similar ORU's with similar applications (e.g., boards, fasteners, switches, and other human interface items; fuses, cable color designations, and connectors (except to avoid improper hook-ups)).

(5) Equipment control panel positions and layouts (from panel to panel) should be the same or similar when a number of panels are incorporated and provide comparable functions.

Related Techniques and Disciplines

Some disciplines that relate to basic maintainability analysis are now discussed (ref. 9).

Supportability.--This is a global term that covers all maintenance and logistics activities. The unit can be supported if it can be maintained and if spare parts can be delivered to it.

Reliability-centered maintenance (RCM).--This maintenance process is based on the identification of safety-critical failure modes and deterioration mechanisms through engineering analyses and experience. Thus, the consequences of the failure can be determined on the basis of severity level so that maintenance tasks can be allocated according to severity level and risk. The RCM logic process considers maintenance tasks relative to (1) hard-time replacements, in which degradation because of age or usage is prevented and maintenance occurs at predetermined intervals; (2) on-condition maintenance, in which degradation is detected by periodic inspections; and (3) conditional maintenance, in which degradation prior to failure is detected by instrumentation and/or measurements.

Integrated logistics support.--This includes the distribution, maintenance, and support functions for systems and products: (1) maintenance, (2) supportability, (3) test and support equipment, (4) personnel training, (5) operations facilities, (6) data (manuals), (7) computer resources (for maintenance of equipment and software), and (8) disposal. Personnel considerations involve analyzing what level of expertise is needed for each level of maintenance (on the flight line, in a depot (intermediate repair facility), or in the factory) to effectively perform the repairs.

Maintainability, quality, and reliability.--Figure 11-16 shows the relationship among the three. As quality and manufacturing techniques improve, reliability increases. Therefore, for the same availability, MTTR may be allowed to increase; alternatively, for the same MTTR, a higher availability may be attained. The reliability of the product is given by R(product), where the design-stage reliability R(D) is modified by various K factors. These denote probabilities that the design-stage reliability will not be degraded by any given factor. The K factors are external contributors to product failure:

R(product) = R(D) × K(q) × K(m) × K(r) × K(t) × K(u)

where

K(q)  quality test methods and acceptance criteria
K(m)  manufacturing, fabrication, and assembly techniques
K(r)  reliability fault control activities
K(t)  logistics activities
K(u)  user or customer activities

[Figure 11-16.--Effect of quality on maintainability: on constant-availability curves of MTTR versus failure rate λ, improving quality lowers the failure rate from λ(R1) to λ(R2), where R2 = R(D)(K(q)K(m)K(r)K(t)K(u)), permitting a larger MTTR at the same availability.]

Manufacturing processes or assembly techniques that are not statistically controlled can greatly affect reliability. Special cause variation, changes in raw materials, or lack of adherence to manufacturing procedures can dramatically reduce product reliability. Poor test methods may allow substandard components that would fail final test screenings to be used in a product and enter the operating population. Poor packing, shipping practices, storage, and so on will raise the failure rate. The user or customer may abuse the product by using it for what it was not intended or using it in a new, unspecified environment. All these problems require that the system be maintainable during operation.

⁵A part or component that is designed and/or designated to be replaced in a depot or at the manufacturer. For instance, it may be highly modular, but its failure cannot be easily detected on-orbit or on the flight line.

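A minimal numeric sketch of this expression (all probabilities assumed for illustration) shows how each external factor erodes the design-stage reliability:

```python
# Product reliability degraded by external K factors; each K is the
# probability that the factor does NOT degrade the design-stage
# reliability R(D). All values are assumed examples.
r_design = 0.98
k_factors = {"Kq quality": 0.995, "Km manufacturing": 0.99,
             "Kr fault control": 0.998, "Kt logistics": 0.997,
             "Ku user": 0.99}

r_product = r_design
for k in k_factors.values():
    r_product *= k

print(f"R(product) = {r_product:.4f}")   # ~0.9509, down from 0.98
```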
TABLE 11-2.--MAINTAINABILITY FIGURES OF MERIT

Weight of orbital replacement and line replacement units, kg
Volume, m³
Power requirement, W
Definition of partial operation
Mean time between failures, hr
Life and wearout, hr
Mean time to repair, hr/repair
Failure modes and effects analysis
Manifest time, hr
Operation time, hr
Operation period, hr
Spare location
Maintenance cost, dollars
Repair cost, dollars
Transportation, dollars
Built-in test capabilities
Tools required
Preventive maintenance
Supportability
Availability

Maintainability Problems

The maintainability, reliability, and cost data items in table 11-2 represent the information required to perform a maintainability analysis. We will consider how these items interact and how maintainability trades can be made. First, consider examples 1 and 2 (for the basic formulas, refer to the section Maintainability Analysis Mathematics).

Example 1

Five pressure transducers (model c-4) were tested and failed after an average of 2257 hr (time to first failure t(f)). Time studies have shown that it takes 5.5 hr to diagnose, remove, replace, and check out a unit (MTTR). Assuming continuous use and an exponential failure rate, what are the failure rate λ, the reliability for a mission time t(m) of 50 hr, the MTBF, and the availability?

First, determine the failure rate, in failures/hr (or failures/10⁶ hr):

λ = 1/MTBF = 1/2257 = 0.000443 failures/hr, or 443 failures/10⁶ hr

Determine the reliability:

Reliability = exp(-λt(m)) = exp(-0.000443 × 50) = 0.9780

Determine the availability:

Availability = MTBF/(MTBF + MTTR) = 2257/(2257 + 5.5) = 0.9976

Example 2

Five RTD temperature sensors (model RTD-A-7) were tested and failed after an average of 4026 hr (time to first failure t(f)). Time studies have shown that it takes 52 hr to diagnose, remove, order, receive, replace, and check out a unit (MTTR). Assuming continuous use and an exponential failure rate, what are the failure rate λ, the reliability for a mission time t(m) of 50 hr, the MTBF, and the availability?

Determine the failure rate:

λ = 1/MTBF = 1/4026 = 0.000248 failures/hr

The reliability is

Reliability = exp(-λt(m)) = exp(-0.000248 × 50) = 0.9876

The availability is

Availability = MTBF/(MTBF + MTTR) = 4026/(4026 + 52) = 0.9872
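Both examples reduce to the same three formulas; a short sketch (our function name, with the numbers from the examples) reproduces them:

```python
import math

def characterize(mtbf_hr: float, mttr_hr: float, mission_hr: float):
    """Failure rate, mission reliability, and inherent availability for an
    exponentially failing unit, as in examples 1 and 2."""
    lam = 1.0 / mtbf_hr                       # failures/hr
    reliability = math.exp(-lam * mission_hr)
    availability = mtbf_hr / (mtbf_hr + mttr_hr)
    return lam, reliability, availability

# Example 1: pressure transducer, MTBF 2257 hr, MTTR 5.5 hr, 50-hr mission
print(characterize(2257.0, 5.5, 50.0))   # ~(4.43e-4, 0.978, 0.998)

# Example 2: RTD temperature sensor, MTBF 4026 hr, MTTR 52 hr, 50-hr mission
print(characterize(4026.0, 52.0, 50.0))  # ~(2.48e-4, 0.988, 0.987)
```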
TABLE 11-3.--SYSTEM AND MISSION PARAMETERS AND COSTS

System requirements:
(1) Availability minimum, 0.990 -- Based on the MTTR and MTBF for each unit. Is the availability greater than or equal to the requirement (0.990)?
(2) Mean time to repair (MTTR) maximum, 5.0 hr -- What is the maximum repair time that can be allowed? How long can the system be down?
(3) Mean time between failures (MTBF) minimum, 300 hr -- What is the minimum reliability goal of the system?
(4) Logistics delay time (LDT) + administrative delay time (ADT), 0.3 hr -- What is the maximum LDT allowable? For a single repair action, how long does it take to deliver a replacement part from the warehouse or factory (for the total mission, turnaround time for repair of boards also needs to be considered)? What is the ADT? How long will it take to process an order for spares, and how long will it take to do other paperwork? (ADT may not affect system availability, but it will affect total crew maintenance time used to repair the system.)

Mission requirements:
(5) Total mission time, 520 wk -- What is the total time that the unit will be in the system and available for operation?
(6) System operating time per week, 4 hr -- How many hours per week does the unit operate and in what modes (operational, standby, partial, off)?
(7) Maximum resource allocation for maintenance, 0.1 hr/wk -- Are crews available for maintenance and operation of the unit? Is the MTTR reasonable so that the crew will have time to do maintenance?
(8) Operational requirement, 6 hr/wk -- Are there limits on how long an item can take to be repaired? (Often, if a system is difficult to repair, it may be neglected in favor of a more easily maintained system.)
(9) Total mission time, 87 360 hr -- What are the total clock hours the mission is to last (irrespective of whether the system being considered is operating)?
(10) Total system operation, 2080.0 hr/yr -- What are the total hours per year the system or board being considered is operating (6 times/wk × 52)?

Costs:
(11) Board repair, $7000 -- What is the cost to repair a failed board?
(12) Transportation of board, $4500 -- What is the cost to transport a spare board to the site of field repairs? (If the site is remote or on-orbit, the cost may be considerable.)
(13) Maintenance on-orbit, $500/hr -- What are the allocated costs for crew maintenance time on-site or on-orbit? (The cost of crew maintenance time may be considerable and significantly affect the overall trade study costs.)

Problem Solving Strategy

One way to assess tradeoffs is to first evaluate conformance to minimum maintainability requirements and then to calculate the effects that the alternatives have on costs by following these steps: (1) determine screens, minimum or maximum acceptable values for a system or component; (2) determine which tradeoffs meet these screens; (3) of the systems that pass, calculate costs (cost of spare, cost to ship spare, cost to install spare); (4) determine the lowest cost system; and (5) examine the results for reasonableness.

This discussion presents a more detailed analysis of how tradeoffs (at the board or component level) involving maintenance and reliability may be made. This is a more complex example for which we want to determine the lowest cost solution to a maintainability problem with fixed requirements by following the above procedures.

Determining screening requirements.--The reliability and maintainability screening requirements must be determined. Here there is a maximum MTTR⁶ related to maintenance crew availability, a minimum MTBF due to mission restrictions, and a specified availability requirement needed to complete the mission. The operation of the system is intermittent. A detailed list of these requirements and costs is presented in table 11-3, which also gives quantitative system data needed to evaluate the model.

The availability, maintainability, and reliability screens in table 11-3 are also portrayed graphically in figure 11-17, where availability is shown as a function of MTBF and MTTR. The solution space described by the system and mission requirements is bounded by the 0.990 availability line, the MTBF minimum of 300 hr, and the MTTR maximum of 5 hr. Note also that in this figure the constant availability lines are generated with MTBF's and MTTR's that represent average values: MTTR and MTBF are usually considered distributed variables with an exponential or normal distribution.

Having addressed the basic requirements imposed on the system and the costs associated with a maintenance action, we will now evaluate individual boards that are being considered for a black box in the system. First, some additional assumptions must be made: (1) only one spare board is required and it is readily accessible on-orbit or on the flight line; (2) all spares cost the same; (3) there is no finance (carrying) cost; and (4) repair costs for each alternative board are the same.⁷

⁶Strictly speaking, we do not have a "maximum MTTR" since MTTR and MTBF do not have distributions but are derived from a distribution. This notation is kept because we are looking at a number of MTTR's for various alternative boards and the like.

⁷A problem arises when the boards are stored on the ground or in a warehouse (for LRU's) when there are long logistic delay times. If systems were in remote sites or on-orbit (with no local storage of spares) with only three or four deliveries of spares per year (as with the space shuttle), there might be considerable periods of downtime.

[Figure 11-17.--Problem solution area on availability plot: availability versus mean time between failures, MTBF (0 to 1000 hr), with the acceptable region bounded by the availability, MTBF, and MTTR screens.]

TABLE 11-4.--BOARD TRADEOFF OPTION DATA
[Logistics and administrative delay times, LDT + ADT, 0.3 hr.]

Board     MTBF,    Cost,      MTTR,
option    hr       dollars    hr
2         195       74 100    3.7
2a        662      182 900    3.8
3         191       77 600    3.5
3a        583      130 800    3.7
4         199       76 600    3.3
4a        828      188 257    6.8
5          62       45 400    3.4

(Board option 1 was discarded for failing to meet functional design parameters.)

Determining tradeoffs that meet screens.--Data required to evaluate each potential electronic board for a particular function in the system are given in table 11-4. Board option 1 was discarded for failure to meet functional design parameters. Each remaining board (first column) was evaluated for expected MTBF or reliability (with a parts count according to MIL-HDBK-217 or possibly via testing), estimated cost to purchase the board, estimated time to repair the board (based on ease of diagnosis and built-in test circuitry or software), and estimated LDT (based on the supplier turnaround history) and administrative delay time (ADT).

The next step is to calculate the data required in table 11-5 to see if the maintainability and reliability requirements have been met:⁸

    Number of maintenance actions = [Total mission time (wk) × system operating time (hr/wk)]/MTBF

    Availability = MTBF/(MTBF + MTTR + LDT + ADT)

    Total maintenance time (hr) = Number of maintenance actions/mission × (MTTR + LDT + ADT)

    Total maintenance time (hr/wk) = Total maintenance time (hr)/Total mission time (wk)

Note that the maintainability screens are independent and may not necessarily relate to these formulas (e.g., irrespective of the required availability and minimum MTBF, there may be a maximum maintenance time allowed). After evaluating the results, we found that options 2, 3, 4, and 5 failed the minimum MTBF and availability screens and that option 4a failed the maximum MTTR screen; the remaining options, 2a and 3a, will be evaluated to determine which has the lower cost.

Determining the cost of acceptable systems.--Of the systems that pass, calculate the costs of purchasing the spare and the board, repairing the failed unit, and shipping and installing the spare. These figures are shown in table 11-6.

The total mission board repair cost is equal to the cost of repairing each board (at a depot or the factory) times the total number of maintenance actions. The cost of the board repair is $7000/repair, which would theoretically be reduced by the number of spares purchased. The repair cost and turnaround time should be part of the supplier's bid for the board.

The total mission board shipping cost is equal to the cost of transporting the board times the total number of maintenance actions. The cost of shipping the board is $4500 per shipment.

The total mission board maintenance cost reflects costs to change out the board on-orbit or on the flight line. The cost to replace the board (on-orbit or on the flight line) is $500 per hr, which assumes that the board is also an ORU or an LRU. It is equal to the total number of maintenance actions times (MTTR + LDT + ADT).

The total mission repair cost is equal to the total cost of repair, shipping, and maintenance.

The total mission board cost is equal to the total mission repair cost plus the cost of the board and one spare board. The cost of manufacturing the board was already given in column C of table 11-4. For the present example, we will assume that we need to purchase one board and one spare board.⁹

⁸The formula for column F is F = (5)(6)/B, where (5) and (6) refer to items in table 11-3 and B refers to column B in table 11-4.

⁹One must also consider the quantity of spares needed to have a replacement board available at all times. This is a function of the desired probability of an available spare and the time to ship the board out for repairs, to repair it, to recertify it, and to return it to a storage location. A detailed discussion of the mathematics of this evaluation is beyond the scope of this paper. Additional costs will also be incurred with parts storage.

TABLE 11-5.--MAINTAINABILITY FIGURES OF MERIT
[F = (5)(6)/B; G = B/(B + D + E); H = F(D + E); I = H/(5); E = logistics and administrative delay time, 0.3 hr; (5) and (6) refer to items in table 11-3; B and D refer to columns in table 11-4.]

Board     Maintenance     Avail-     Total main-     Total main-
option    actions per     ability    tenance         tenance
(A)       mission (F)     (G)        time, hr (H)    time, hr/wk (I)
2         10.7            0.980       42.7           0.08
2a         3.1             .994       12.9            .02
3         10.9             .980       41.4            .08
3a         3.6             .993       14.3            .03
4         10.5             .982       37.6            .07
4a         2.5             .992       17.8            .03
5         33.3             .944      123.3            .24
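The screening figures of merit in table 11-5 follow directly from these formulas. A minimal Python sketch, using the board data of table 11-4 and the screens of table 11-3 (names are illustrative; availability counts LDT + ADT as downtime, as column G indicates):

    # Board option: (MTBF hr, cost $, MTTR hr), from table 11-4
    boards = {"2": (195, 74100, 3.7), "2a": (662, 182900, 3.8),
              "3": (191, 77600, 3.5), "3a": (583, 130800, 3.7),
              "4": (199, 76600, 3.3), "4a": (828, 188257, 6.8),
              "5": (62, 45400, 3.4)}
    MISSION_WK, OP_HR_WK, DELAY = 520, 4.0, 0.3  # items (5), (6), and (4) of table 11-3

    for name, (mtbf, _cost, mttr) in boards.items():
        actions = MISSION_WK * OP_HR_WK / mtbf  # column F: maintenance actions/mission
        avail = mtbf / (mtbf + mttr + DELAY)    # column G: availability
        maint_hr = actions * (mttr + DELAY)     # column H: total maintenance time, hr
        per_wk = maint_hr / MISSION_WK          # column I: maintenance time, hr/wk
        ok = avail >= 0.990 and mtbf >= 300 and mttr <= 5.0
        print(f"{name:>2}: F={actions:5.1f}  G={avail:.3f}  H={maint_hr:6.1f}  "
              f"I={per_wk:.2f}  {'pass' if ok else 'fail'}")

Running the sketch marks only options 2a and 3a as passing, in agreement with the screening discussion above.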

TABLE 11-6.--TOTAL MISSION COST CALCULATIONS
[J = (11)F; K = (12)F; L = (13)H; M = J + K + L; N = (2C) + M; (11) to (13) refer to items in table 11-3; C refers to column C in table 11-4.]

Board     Board repair,      Board shipping,    Board maintenance,    Repair,     Board and spare,
option    dollars/mission    dollars/mission    dollars/mission       dollars     dollars
(A)       (J)                (K)                (L)                   (M)         (N)
2          74 683             48 011            1861                  124 554     272 754
2a         22 005             14 146            1903                   38 055     403 855
3          76 216             48 996            1761                  126 974     282 174
3a         24 965             16 049            1854                   42 867     304 467
4          73 151             47 026            1660                  121 837     275 037
4a         17 578             11 300            3403                   32 280     408 794
5         233 206            149 918            1733                  384 857     475 657
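Columns M and N are simple roll-ups of the per-mission costs. A minimal sketch for the two options that passed the screens (inputs transcribed from tables 11-4 and 11-6; totals agree with the printed table to within its rounding):

    # Board option: (J repair, K shipping, L maintenance, C board cost), dollars
    costs = {"2a": (22005, 14146, 1903, 182900),
             "3a": (24965, 16049, 1854, 130800)}  # the two options that passed

    totals = {}
    for name, (j, k, l, c) in costs.items():
        m = j + k + l             # column M: total mission repair cost
        totals[name] = m + 2 * c  # column N: one board plus one spare, plus M
        print(f"{name}: M = ${m:,}  N = ${totals[name]:,}")

    print("lowest cost option:", min(totals, key=totals.get))  # 3a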

Determining the lowest cost system.--The solution is to pick the lowest-cost board that passed the screens. Options 2, 3, 4, 4a, and 5 have already failed the screens. Of the remaining candidates, 2a and 3a, 3a has the lowest cost.

Examining the results for reasonableness.--As always, factors other than costs must be included in the analysis. Human factors, hierarchy of repairs, ease of problem diagnosis, ability to isolate faults, ability to test the unit, manufacturer's process controls and experience, and the ability of the manufacturer to provide long-term support for the unit are some additional considerations.

Recommended Techniques

Current and future NASA programs face the challenge of achieving a high degree of mission success with a minimum degree of technical risk. Although technical risk has several elements, such as safety, reliability, and performance, a proven track record of overall system effectiveness ultimately will be the NASA benchmark that will foster the accomplishment of mission objectives within cost and schedule expectations without compromising safety or program risk. A key characteristic of system effectiveness is the implementation of appropriate levels of maintainability through the program life cycle.

Maintainability is a process for assuring the ease with which a system can be restored to operation following a failure. It is an essential consideration for any program requiring ground and/or on-orbit maintenance. The Office of Safety and Mission Assurance (OSMA) has undertaken a continuous improvement initiative to develop a technical roadmap that will provide a path to achieving the desired degree of maintainability while realizing cost and schedule benefits. Although early life-cycle costs are a characteristic of any assurance program, operational cost savings and improved system availability almost always result from a properly administered maintainability assurance program. Experience in NASA programs has demonstrated the value of an effective maintainability program initiated early in the program life cycle.

Technical Memorandum 4628, entitled "Recommended Techniques for Effective Maintainability," provides guidance for achieving continuous improvement of the life-cycle development process within NASA, having been developed from the experiences of NASA, the Department of Defense, and industry. The degree to which these proven techniques should be imposed resides with the project or program and will require an objective evaluation of the applicability of each technique. However, each applicable suggestion not implemented may represent an increase in program risk. Also, the information presented is consistent with OSMA policy, which advocates an integrated product team (IPT) approach for NASA systems acquisition. Therefore, this memorandum should be used to communicate technical knowledge that will promote proven maintainability design and implementation methods resulting
in the highest possible degree of mission success while balancing cost effectiveness and programmatic risk. The recommended techniques can be found online at http://www.hq.nasa.gov/office/codeq/doc.pdf.

Conclusion

The benefit of a system maintainability program is mission success, the goal of every NASA System Reliability and Quality Assurance (SR&QA) office.¹⁰ ¹¹ A well-planned maintainability program gives greater availability at lower costs. A design with easily maintained (and assembled) modules results. Considering maintenance prevents the inclination to use lower-cost components at the expense of reliability unless maintainability tradeoffs justify them. Finally, maintainability analysis forces consideration of potential obsolescence and the need for upgrades¹² and reduces overall maintenance hours and the total cost of ownership.

References

11-1. Maintainability Prediction. MIL-HDBK-472, Jan. 1984.
11-2. Electronic Reliability Design Handbook. MIL-HDBK-338, June 1995.
11-3. Pillar, C.S.: Maintainability in Power Plant Design. Sixth Reliability Engineering Conference for Electric Power Industry, American Society for Quality Control, Milwaukee, 1979.
11-4. Fujiwara, H.: Logic Testing and Design for Testability. MIT Press, 1985.
11-5. Lala, P.K.: Fault Tolerant and Fault Testable Hardware Design. Prentice-Hall, 1985.
11-6. Engineering Design Handbook. Pamphlet AMCP 706-134, McGraw-Hill, 1970.
11-7. USAF R&M 2000 Process. Handbook SAF/AQ, U.S. Air Force, Washington, DC, 1987.
11-8. Testability Program for Systems and Equipment. MIL-HDBK-2165, July 1995.
11-9. Raheja, D.G.: Assurance Technologies: Principles and Practices. McGraw-Hill, 1991.

¹⁰NASA Glenn Research Center is designing a second-generation instrument to measure microgravity on the space station. The operating time for the instrument is expected to be 10 yr. Reliability analysis has shown low reliability for this mission even if we can get all the components to have an MTBF of 40 000 hr. Therefore, we are developing a maintenance program with an on-orbit repair time of 700 hr, which should give a suitable availability for the mission.

¹¹NASA Glenn had an interesting experience with one of its space instruments. It was designed for a mission time of 18 hr and had a reliability greater than 0.90. It was suggested that we use the instrument on MIR for a 3000-hr mission. The reliability fell to 0.40 when this and other factors were considered. Maintainability was factored in with selected spare parts, and software was added to perform built-in test (BIT) of the unit. The mission specialists were also trained to do repair work. The availability was returned to its previously acceptable level (with the previous level of reliability). The instrument has successfully collected data on MIR.

¹²For example, a ruggedized optical disk drive required maintenance after each flight on the space shuttle or after 450 hr of operation. This process took 4 wk, which was unacceptable to NASA when the system had to be placed on the Russian Space Station MIR. To correct the problem, the drives were replaced with another component that greatly reduced maintenance time.
Reliability Training¹³

1. Three thermostats were tested and failed after an average of 39 500 cycles. Time studies showed that it took an average of 6.8 hr to diagnose, remove, replace, and check out a thermostat. What is the MTBF of the unit for a mission time of 168 cycles?

A. 30 200 cycles B. 35 600 cycles C. 39 500 cycles

What is the failure rate?

A. 20.6×10⁻⁶ failure/hr B. 25.3×10⁻⁶ failure/hr C. 30.7×10⁻⁶ failure/hr

What is the reliability?

A. 0.976 B. 0.986 C. 0.996

What is the availability?

A. 0.979 B. 0.989 C. 0.999

2. Three air bearings were tested and failed after an average of 323 000 hr. It is estimated that it will take an average of 3200 hr to diagnose, remove, replace, and check out a bearing in low Earth orbit. What is the MTBF of a unit for a mission time of 80 000 hr?

A. 293 000 hr/failure B. 313 000 hr/failure C. 323 000 hr/failure

What is the failure rate?

A. 3.1×10⁻⁶ failure/hr B. 3.5×10⁻⁶ failure/hr C. 4.0×10⁻⁶ failure/hr

What is the reliability?

A. 0.68 B. 0.78 C. 0.88

What is the availability?

A. 0.79 B. 0.89 C. 0.99

¹³Answers are given at the end of this manual.

Appendix A
Reliability Information
The figures and tables in this appendix provide reference data to support chapters 2 to 6. For the most part these data are self-explanatory.

Figure A-1 contains operating failure rates for military standard parts. They relate to electronic, electromechanical, and some mechanical parts and are useful in making approximate reliability predictions as discussed in chapter 3. Their use, limitations, and validity are explained in chapter 4.

Figure A-2 provides failure rate information for making approximate reliability predictions for systems that use established-reliability parts, such as air- and ground-launched vehicles, airborne and critical ground support equipment, piloted aircraft, and orbiting satellites. The use of this figure is discussed in chapter 4.

Figure A-3 shows the relationship of operating application factor to nonoperating application factor. These data can be used to adjust failure rates for the mission condition. The use of this figure is also discussed in chapter 4.

Figure A-4 contains reliability curves for interpreting the results of attribute tests. They provide seven confidence levels, from 50 to 99 percent, and six test failure levels, from 0 to 5 failures. The use of these figures is discussed in chapter 5.

Table A-1 contains values of the negative exponential function e^-x, where -x varies from 0 to -0.1999. The tabulated data make it easy to look up the reliability, where the product of failure rate λ (or 1/MTBF) and operating time t is substituted for -x. The use of this table is discussed in chapter 3, and it is frequently referred to in chapters 4 to 6.

Table A-2 contains tolerance factors for calculating the results of mean-time-between-failure tests. It provides seven confidence levels, from 50 to 99 percent, for 0 to 15 observed failures. The use of this table is explained in the table. Examples are discussed in chapter 6.

Tables A-3 to A-5 contain tabulated data for safety margins, probability, sample size, and test-demonstrated safety margins for tests to failure. They provide three confidence levels, from 90 to 99 percent, and sample sizes from 5 to 100. Values similar to these are presented on the safety margin side of the reliability slide rule; the slide rule provides six confidence levels and sample sizes from 5 to 80. The use of these tables and the slide rule is discussed in chapter 6.

More information on this subject can be found in references A-1 and A-2.

References

A-1. Reliability Modeling and Prediction. MIL-STD-756B (plus change notices). Aug. 31, 1982.
A-2. Reliability for the Engineer. Book Seven: Reliability Tables. Martin Marietta Corporation, 1965.

[Figure A-1.--Military standard catastrophic failure rates for operating mode, plotted by part type for resistors, capacitors, semiconductors and integrated circuits, and transformers and inductors. (Failure rate for these parts in nonoperating mode can be estimated by dividing these values by the application factor shown in fig. A-3.)]

[Figure A-1.--Continued: electromechanical rotating devices, switches and relays, connectors, hydraulics, and hardware.]

[Figure A-1.--Continued.]

[Figure A-1.--Concluded.]

[Figure A-2.--High-reliability catastrophic failure rates for operating mode, plotted by part type for resistors, capacitors, semiconductors and integrated circuits, and transformers and inductors. Failure rate for these parts in nonoperating mode is about a factor of 10 less than values shown. (From ref. A-1.)]

[Figure A-2.--Continued: electromechanical rotating devices, switches and relays, connectors, hydraulics, and hardware.]



[Figure A-2.--Continued.]

[Figure A-2.--Concluded.]

[Figure A-3.--Application factor comparison for nonoperating storage of military standard electronic parts: operational application factor versus nonoperational application factor (log scales, 10^0 to 10^8). Plotted points include air-launched missiles in flight; ground-launched missiles in countdown; airborne radars; missiles/satellite launch and boost phase; ground support equipment for missiles in countdown; ground electronics equipment; airborne computers; ground support equipment for missiles in laboratory life tests; manned aircraft; satellite orbit phase; fixed ground system in field; laboratory computer; and nonoperating storage. MIL-STD-756 points (solid symbols) are given for comparison. (From ref. A-2.)]

[Figure A-4.--Confidence curves for attribute testing: number of units tested (log scale, 10^0 to 10^4) versus reliability (50 to 99.99 percent) for confidence levels from 50 to 99 percent. (a) When no failures are observed. (b) When one failure is observed. (c) When two failures are observed. (d) When three failures are observed. (e) When four failures are observed. (f) When five failures are observed. (From ref. A-2.)]
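The curves encode the binomial relationship among sample size, observed failures, demonstrated reliability, and confidence. A minimal Python sketch of that relationship (a generic bisection solver written for this appendix, not code from the report):

    from math import comb

    def reliability_lower_bound(n, failures, confidence):
        # Smallest reliability R consistent with seeing <= failures in n trials:
        # solve P(X <= failures | R) = 1 - confidence for R by bisection.
        lo, hi = 0.0, 1.0
        for _ in range(60):
            r = (lo + hi) / 2
            p = sum(comb(n, k) * (1 - r) ** k * r ** (n - k)
                    for k in range(failures + 1))
            if p > 1 - confidence:
                hi = r  # the data are still likely at r, so the bound lies lower
            else:
                lo = r
        return (lo + hi) / 2

    # 20 units tested with no failures demonstrates R >= ~0.89 at 90-percent confidence
    print(f"{reliability_lower_bound(20, 0, 0.90):.3f}")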

TABLE A-1.--VALUES OF NEGATIVE EXPONENTIAL FUNCTION e^-x

[Table A-1 tabulates e^-x for x from 0.0000 to 0.1999 in increments of 0.0001. Representative entries: e^-0.0050 = 0.99501; e^-0.0100 = 0.99005; e^-0.0500 = 0.95123; e^-0.1000 = 0.90484; e^-0.1500 = 0.86071; e^-0.1999 = 0.81881.]
TABLE A-2.--TOLERANCE FACTORS FOR OBSERVED MTBFᵃ

Confidence                        Number of observed failures
level,
percent     0    1    2    3     4     5     6     7     8     9     10    11    12    13    14    15

99         4.6  6.6  8.4  10.1  11.6  13.1  14.6  16.0  17.4  18.7  20.2  21.5  22.8  24.1  25.4  26.8
95         3.0  4.7  6.3   7.8   9.1  10.5  11.8  13.1  14.4  15.7  17.0  18.2  19.4  20.7  21.9  23.1
90         2.3  3.9  5.3   6.7   8.0   9.2  10.5  11.7  13.0  14.2  15.4  16.6  17.8  19.0  20.2  21.3
80         1.6  3.0  4.3   5.5   6.7   7.9   9.0  10.2  11.4  12.5  13.7  14.8  15.9  17.0  18.1  19.2
70         1.2  2.4  3.6   4.8   5.9   7.0   8.1   9.2  10.3  11.4  12.5  13.5  14.6  15.7  16.8  17.8
60          .9  2.0  3.1   4.2   5.2   6.3   7.4   8.4   9.4  10.5  11.5  12.5  13.6  14.7  15.7  16.7
50          .7  1.7  2.7   3.7   4.7   5.7   6.7   7.7   8.7   9.7  10.7  11.7  12.7  13.7  14.7  15.7

ᵃTo use this table:
1. Calculate total test hours, T = Σ Nᵢtᵢ, where Nᵢ is the ith unit tested, tᵢ is the test time of Nᵢ, and the sum runs over the total number of units tested.
2. Enter the table under the number of observed failures at the desired confidence level to find the tolerance factor.
3. Lower confidence limit of MTBF = T/(tolerance factor).
TABLE A-3.--SAFETY MARGINS AT 99-PERCENT CONFIDENCE LEVEL

(a) Sample sizes 5 to 12

Safety Probabilily,
margin, Px t Sample size, N
$s¢ 5 6 7 8 9 I0 11 12

-5.0 0 -2.6271 -2.7679 -2.8843 -2.9789 -3.0590 -3.1327 -3.1958 -3.2521

-4.0 0 -2.0487 -2.1655 -2.2612 -2.3404 -2.4052 -2.4667 -2.5188 -2.5652
-3.0 .0013 -1.4523 -1.5466 -1.6226 -1.6880 -1.7376 -1.7878 -1.8294 -1.8664
-2.0 .0227 -.8028 -.8810 -.9415 -.9923 -1.0351 -1.0740 -1.1071 -1.1364
-1.0 .1586 .0434 -.0500 -.1235 -.1762 -.2227 -.2579 -.2893 -.3168
0 .5000 1.6808 1.3681 1.1900 1.0602 .9617 .8914 .8320 .7833
.1 .5398 1.9138 1.5664 1.3628 1.2168 1.1126 1.0351 .9703 .9175
.2 .5792 2.1557 1.7665 1.5439 1.3850 1.2706 1.1844 1.1137 1.0563
.3 .6179 2.4041 1.9747 1.7328 1.5608 1.4352 1.3389 1.2617 1.1994
.4 .6554 2.6582 2.1986 1.9285 1.7380 1.6061 1.4975 1.4138 1.3463
.5 .6914 2.9406 2.4294 2.1304 1.9206 1.7775 1.6602 1.5697 1.4970
.6 .7257 3.2293 2.6662 2.3378 2.1082 1.9522 1.8270 1.7295 1.6512
.7 .7580 3.5232 2.9083 2.5500 2.3002 2.1309 1.9977 1.8927 1.8085
.8 .7881 3.8217 3.1551 2.7665 2.4961 2.3133 2.1719 2.0591 1.9689
.9 .8159 4.1244 3.4059 2.9869 2.6956 2.4989 2.3493 2.2285 2.1320
1.0 .8413 4.4425 3.6604 3.2107 2.8988 2.6875 2.5295 2.4005 2.2975
1.1 .8643 4.7756 3.9183 3.4375 3.1115 2.8846 2.7118 2.5745 2.4650
1.2 .8849 5.1124 4.1791 3.6672 3.3269 3.0842 2.8962 2.7506 2.6344
1.3 .9031 5.4524 4.4467 3.9006 3.5445 3.2860 3.0827 2.9285 2.8056
1.4 .9192 5.7952 4.7243 4.1431 3.7582 3.4851 3.2713 3.1083 2.9785
1.5 .9331 6.1405 5.0042 4.3877 3.9736 3.6855 3.4616 3.2897 3.1528
1.6 .9452 6.4881 5.2861 4.6340 4.1908 3.8874 3.6533 3.4724 3.3284
1.7 .9554 6.8377 5.5698 4.8820 4.4094 4.0907 3.8463 3.6563 3.5051
1.8 .9640 7.1891 5.8550 5.1279 4.6311 4.2953 4.0404 3.8412 3.6828
1.9 .9712 7.5422 6.1417 5.3723 4.8570 4.5010 4.2353 4.0269 3.8613
2.0 .9772 7.8966 6.4295 5.6180 5.0840 4.7077 4.4310 4.2135 4.0405
2.1 .9821 8.2524 6.7186 5.8647 5.3119 4.9153 4.6277 4.4008 4.2205
2.2 .9860 8.6094 7.0086 6.1125 5.5406 5.1238 4.8251 4.5889 4.4012
2.3 .9892 8.9675 7.2996 6.3612 5.7701 5.3330 5.0232 4.7776 4.5825
2.4 .9918 9.3265 7.5914 6.6107 6.0003 5.5429 5.2219 4.9670 4.7644
2.5 .9937 9.6865 7.8840 6.8609 6.2311 5.7534 5.4212 5.1568 4.9468
2.6 .9953 10.0472 8.1772 7.1119 6.4624 5.9646 5.6210 5.3472 5.1296
2.7 .9965 10.4011 8.4694 7.3635 6.6943 6.1762 5.8213 5.5380 5.3129
2.8 .9974 10.7549 8.7588 7.6191 6.9248 6.3894 6.0221 5.7292 5.4965
2.9 .9981 11.1094 9.0488 7.8753 7.1549 6.6040 6.2232 5.9207 5.6804
3.0 .9986 11.4647 9.3395 8.1319 7.3855 6.8191 6.4247 6.1126 5.8647
3.1 .9990 11.8207 9.6307 8.3889 7.6165 7.0345 6.6266 6.3047 6.0492
3.2 .9993 12.1773 9.9223 8.6463 7.8479 7.2502 6.8287 6.4972 6.2340
3.3 .9995 12.5345 10.2145 8.9040 8.0796 7.4662 7.0312 6.6900 6.4191
3.4 .9996 12.8922 10.5070 9.1620 8.3117 7.6825 7.2339 6.8850 6.6044
3.5 .9997 13.2505 10.8000 9.4203 8.5440 7.8990 7.4368 7.0762 6.7899
3.6 .9998 13.6092 11.0933 9.6789 8.7767 8.1157 7.6400 7.2697 6.9756
3.7 .9998 13.9684 11.3870 9.9377 9.0096 8.3326 7.8435 7.4633 7.1616

TABLE A-3.--Continued.

(a) Concluded.

Safety Probability, Sample size, N
margin, Px
SM 5 6 7 8 9 10 11 12

3.8 0.9999 14.3280 11.6809 10.1968 9.2427 8.5497 8.0471 7.6572 7.3477
3.9 14.6880 11.9752 10.4560 9.4761 8.7671 8.2500 7.8512 7.5340
4.0 15.0488 12.2698 10.7155 9.7097 8.9845 8.4548 8.0454 7.7204
4.1 15.4090 12.5645 10.9751 9.9435 9.2022 8.6590 8.2397 7.9069
4.2 15.7700 12.8597 11.2350 10.1775 9.4200 8.8632 8.4342 8.0937
4.3 16.1313 13.1550 11.4950 10.4116 9.6379 9.0677 8.6288 8.2805
4.4 16.4929 13.4505 11.7551 10.6459 9.8559 9.2722 8.8235 8.4674
4.5 16.8547 13.7463 12.0154 10.8804 10.0741 9.4769 9.0184 8.6545
4.6 17.2168 14.0422 12.2758 11.1150 10.2924 9.6817 9.2134 8.8417
4.7 17.5792 14.3383 12.5364 11.3497 10.5108 9.8866 9.4084 9.0289
4.8 17.9417 14.6346 12.7970 11.5846 10.7293 10.0917 9.6036 9.2163
4.9 1.0000 18.3045 14.9310 13.0578 11.8196 10.9479 10.2968 9.7989 9.4038
5.0 18.6674 15.2276 13.3187 12.0547 11.1666 10.5020 9.9942 9.5913
5.1 19.0306 15.5243 13.5797 12.2900 11.3854 10.7073 10.1896 9.7789
5.2 19.3939 15.8212 13.8408 12.5253 11.6043 10.9127 10.3851 9.9666
5.3 19.7574 16.1182 14.1020 12.7607 11.8232 11.1181 10.5807 10.1543
5.4 20.1210 16.4153 14.3632 12.9962 12.0422 11.3237 10.7764 10.3422
5.5 20.4848 16.7125 14.6246 13.2318 12.2613 11.5293 10.9721 10.5300
5.6 20.8488 17.0099 14.8860 13.4675 12.4804 11.7350 11.1679 10.7180
5.7 21.2129 17.3073 15.1475 13.7033 12.6996 11.9407 11.3638 10.9060
5.8 21.5771 17.6049 15.4091 13.9391 12.9189 12.1465 11.5597 11.0941
5.9 21.9414 17.9025 15.6707 14.1751 13.1382 12.3524 11.7556 11.2822
6.0 22.3059 18.2003 15.9324 14.4110 13.3576 12.5583 11.9516 11.4703
6.1 22.6705 18.4981 16.1941 14.6471 13.5770 12.7643 12.1477 11.6585
6.2 23.0351 18.7960 16.4560 14.8832 13.7965 12.9703 12.3438 11.8468
6.3 23.3999 19.0940 16.7178 15.1194 14.0160 13.1763 12.5400 12.0351
6.4 23.7648 19.3921 16.9797 15.3556 14.2356 13.3825 12.7362 12.2234
6.5 24.1298 19.6902 17.2417 15.5919 14.4552 13.5886 12.9324 12.4118
6.6 24.4948 19.9884 17.5037 15.8282 14.6748 13.7948 13.1287 12.6002
6.7 24.8600 20.2867 17.7658 16.0646 14.8945 14.0011 13.3250 12.7887
6.8 25.2252 20.5850 18.0279 16.3011 15.1143 14.2074 13.5214 12.9771
6.9 25.5905 20.8834 18.2901 16.5375 15.3340 14.4137 13.7177 13.1657
7.0 25.9559 21.1819 18.5523 16.7741 15.5538 14.6200 13.9142 13.3542
7.1 26.3214 21.4804 18.8145 17.0106 15.7736 14.8264 14.1106 13.5428
7.2 26.6869 21.7789 19.0768 17.2472 15.9935 15.0328 14.3071 13.7314
7.3 27.0525 22.0776 19.3391 17.4839 16.2134 15.2393 14.5036 13.9200
7.4 27.4182 22.3762 19.6014 17.7206 16.4333 15.4458 14.7002 14.1087
7.5 27.7839 22.6749 19.8638 17.9573 16.6533 15.6523 14.8967 14.2974
7.6 28.1497 22.9737 20.1262 18.1940 16.8732 15.8588 15.0933 14.4861
7.7 28.5155 23.2725 20.3886 18.4308 17.0932 16.0654 15.2900 14.6748
7.8 28.8814 23.5714 20.6511 18.6676 17.3133 16.2720 15.4866 14.8636
7.9 29.2474 23.8702 20.9136 18.9045 17.5333 16.4786 15.6833 15.0524
8.0 29.6134 24.1692 21.1761 19.1414 17.7534 16.6852 15.8800 15.2412



TABLE A-3.--Continued.

(b) Sample sizes 13 to 20

Safety Probability, Sample size, N
margin, Px
SM 13 14 15 16 17 18 19 20

-5.0 0 -3.3027 -3.3485 -3.3903 -3.4287 -3.4642 -3.4972 -3.5280 -3.5567

-4.0 0 -2.6069 -2.6447 -2.6792 -2.7109 -2.7401 -2.7673 -2.7927 -2.8163
-3.0 .0013 -1.8997 -1.9299 -1.9573 -1.9826 -2.0058 -2.0275 -2.0477 -2.0666
-2.0 .0227 -1.1628 -1.1866 -1.2083 -1.2281 -1.2464 -1.2633 -1.2791 -1.2937
-1.0 .1586 -.3411 -.3628 -.3823 -.4000 -.4162 -.4309 -.4445 -.4571
0 .5000 .7424 .7074 .6770 .6503 .6265 .6049 .5854 .5677
.1 .5398 .8733 .8357 .8031 .7745 .7491 .7260 .7052 .6863
.2 .5792 1.0085 .9679 .9328 .9022 .8751 .8503 .8281 .8079
.3 .6179 1.1477 1.1039 1.0662 1.0332 1.0042 .9777 .9540 .9325
.4 .6554 1.2905 1.2433 1.2028 1.1675 1.1364 1.1081 1.0827 1.0598
.5 .6914 1.4369 1.3862 1.3428 1.3049 1.2717 1.2414 1.2143 1.1898
.6 .7257 1.5867 1.5324 1.4859 1.4454 1.4099 1.3775 1.3485 1.3224
.7 .7580 1.7393 1.6810 1.6312 1.5880 1.5500 1.5155 1.4846 1.4569
.8 .7881 1.8947 1.8324 1.7792 1.7331 1.6926 1.6558 1.6230 1.5935
.9 .8159 2.0527 1.9862 1.9294 1.8803 1.8372 1.7981 1.7632 1.7319
1.0 .8413 2.2130 2.1422 2.0818 2.0295 1.9838 1.9423 1.9053 1.8721
1.1 .8643 2.3752 2.3000 2.2359 2.1805 2.1320 2.0880 2.0489 2.0138
1.2 .8849 2.5393 2.4596 2.3917 2.3330 2.2817 2.2352 2.1939 2.1568
1.3 .9031 2.7050 2.6208 2.5491 2.4871 2.4329 2.3839 2.3403 2.3012
1.4 .9192 2.8722 2.7833 2.7077 2.6424 2.5853 2.5337 2.4877 2.4466
1.5 .9331 3.0408 2.9472 2.8675 2.7988 2.7388 2.6845 2.6362 2.5930
1.6 .9452 3.2106 3.1122 3.0285 2.9563 2.8932 2.8363 2.7856 2.7403
1.7 .9554 3.3815 3.2782 3.1905 3.1147 3.0486 2.9890 2.9360 2.8885
1.8 .9640 3.5534 3.4452 3.3533 3.2740 3.2049 3.1425 3.0870 3.0374
1.9 .9712 3.7259 3.6129 3.5168 3.4340 3.3617 3.2966 3.2387 3.1870
2.0 .9772 3.8992 3.7812 3.6810 3.5946 3.5193 3.4513 3.3910 3.3370
2.1 .9821 4.0732 3.9503 3.8459 3.7559 3.6774 3.6066 3.5438 3.4876
2.2 .9860 4.2479 4.1200 4.0113 3.9177 3.8360 3.7625 3.6972 3.6388
2.3 .9892 4.4232 4.2902 4.1773 4.0800 3.9952 3.9188 3.8510 3.7904
2.4 .9918 4.5990 4.4610 4.3438 4.2429 4.1548 4.0756 4.0052 3.9424
2.5 .9937 4.7753 4.6322 4.5107 4.4061 4.3149 4.2327 4.1599 4.0947
2.6 .9953 4.9520 4.8038 4.6781 4.5697 4.4753 4.3903 4.3149 4.2474
2.7 .9965 5.1291 4.9759 4.8458 4.7337 4.6361 4.5482 4.4702 4.4005
2.8 .9974 5.3066 5.1482 5.0138 4.8980 4.7971 4.7063 4.6258 4.5538
2.9 .9981 5.4843 5.3208 5.1820 5.0625 4.9584 4.8647 4.7816 4.7073
3.0 .9986 5.6624 5.4937 5.3505 5.2273 5.1198 5.0232 4.9376 4.8610
3.1 .9990 5.8407 5.6668 5.5193 5.3922 5.2816 5.1820 5.0938 5.0149
3.2 .9993 6.0192 5.8402 5.6882 5.5575 5.4435 5.3411 5.2502 5.1690
3.3 .9995 6.1981 6.0138 5.8575 5.7229 5.6056 5.5003 5.4069 5.3234
3.4 .9996 6.3771 6.1876 6.0269 5.8885 5.7680 5.6597 5.5637 5.4779
3.5 .9997 6.5564 6.3617 6.1965 6.0544 5.9305 5.8193 5.7207 5.6325
3.6 .9998 6.7358 6.5359 6.3663 6.2204 6.0932 5.9790 5.8778 5.7873
3.7 .9998 6.9154 6.7103 6.5363 6.3865 6.2560 6.1389 6.0351 5.9423

TABLE A-3.--Continued.

(b) Concluded.

Safety Probability, Sample size, N
margin, Px
SM 13 14 15 16 17 18 19 20

3.8 0.9999 7.0952 6.8849 6.7064 6.5528 6.4190 6.2990 6.1925 6.0974
3.9 7.2752 7.0596 6.8767 6.7193 6.5822 6.4591 6.3501 6.2526
4.0 7.4553 7.2344 7.0471 6.8859 6.7454 6.6194 6.5077 6.4079
4.1 7.6356 7.4094 7.2176 7.0526 6.9088 6.7798 6.6655 6.5633
4.2 7.8159 7.5845 7.3883 7.2194 7.0723 6.9403 6.8234 6.7189
4.3 7.9964 7.7597 7.5590 7.3863 7.2359 7.1010 6.9814 6.8745
4.4 8.1771 7.9351 7.7299 7.5533 7.3995 7.2617 7.1394 7.0302
4.5 8.3578 8.1105 7.9009 7.7204 7.5633 7.4225 7.2976 7.1860
4.6 8.5386 8.2861 8.0719 7.8877 7.7272 7.5833 7.4558 7.3419
4.7 8.7195 8.4617 8.2431 8.0550 7.8911 7.7443 7.6141 7.4978
4.8 8.9005 8.6374 8.4143 8.2223 8.0551 7.9053 7.7725 7.6538
4.9 1.0000 9.0816 8.8132 8.5856 8.3898 8.2192 8.0664 7.9310 7.8099
5.0 9.2628 8.9890 8.7570 8.5573 8.3834 8.2276 8.0895 7.9660
5.1 9.4440 9.1650 8.9284 8.7249 8.5476 8.3888 8.2480 8.1222
5.2 9.6253 9.3410 9.0999 8.8925 8.7119 8.5501 8.4067 8.2785
5.3 9.8067 9.5170 9.2715 9.0602 8.8763 8.7114 8.5654 8.4348
5.4 9.9881 9.6932 9.4431 9.2280 9.0406 8.8728 8.7241 8.5912
5.5 10.1696 9.8694 9.6148 9.3958 9.2051 9.0343 8.8829 8.7476
5.6 10.3512 10.0456 9.7865 9.5637 9.3696 9.1958 9.0417 8.9040
5.7 10.5328 10.2219 9.9583 9.7316 9.5341 9.3573 9.2006 9.0605
5.8 10.7145 10.3983 10.1302 9.8996 9.6987 9.5189 9.3595 9.2171
5.9 10.8962 10.5746 10.3020 10.0676 9.8634 9.6805 9.5184 9.3736
6.0 11.0779 10.7511 10.4740 10.2356 10.0280 9.8422 9.6774 9.5302
6.1 11.2598 10.9276 10.6459 10.4037 10.1928 10.0039 9.8365 9.6869
6.2 11.4416 11.1041 10.8179 10.5718 10.3575 10.1656 9.9955 9.8435
6.3 11.6235 11.2806 10.9900 10.7400 10.5223 10.3274 10.1546 10.0003
6.4 11.8054 11.4572 11.1621 10.9082 10.6871 10.4892 10.3138 10.1570
6.5 11.9874 11.6339 11.3342 11.0764 10.8520 10.6510 10.4729 10.3138
6.6 12.1694 11.8105 11.5063 11.2447 11.0168 10.8129 10.6321 10.4706
6.7 12.3514 11.9873 11.6785 11.4130 11.1817 10.9748 10.7913 10.6274
6.8 12.5335 12.1640 11.8507 11.5813 11.3467 11.1367 10.9505 10.7842
6.9 12.7156 12.3407 12.0230 11.7496 11.5116 11.2986 11.1098 10.9411
7.0 12.8978 12.5175 12.1952 11.9180 11.6766 11.4606 11.2691 11.0980
7.1 13.0799 12.6944 12.3675 12.0864 11.8416 11.6226 11.4284 11.2549
7.2 13.2621 12.8712 12.5398 12.2548 12.0067 11.7846 11.5877 11.4119
7.3 13.4443 13.0481 12.7122 12.4233 12.1717 11.9466 11.7471 11.5688
7.4 13.6266 13.2250 12.8846 12.5918 12.3368 12.1087 11.9065 11.7253
7.5 13.8088 13.4019 13.0569 12.7603 12.5019 12.2708 12.0659 11.8823
7.6 13.9911 13.5788 13.2294 12.9288 12.6671 12.4329 12.2253 12.0398
7.7 14.1734 13.7558 13.4018 13.0973 12.8322 12.5950 12.3847 12.1969
7.8 14.3558 13.9328 13.5742 13.2659 12.9974 12.7571 12.5442 12.3539
7.9 14.5381 14.1098 13.7467 13.4344 13.1626 12.9193 12.7036 12.5110
8.0 14.7205 14.2868 13.9192 13.6030 13.3278 13.0814 12.8631 12.6681

TABLE A-3.--Continued.

(c) Sample sizes 21 to 28

Safety Probability, Sample size, N
margin, Px
SM 21 22 23 24 25 26 27 28
-5.0 0 -3.5836 -3.6090 -3.6328 -3.6554 -3.6767 -3.6970 -3.7162 -3.7346

-4.0 0 -2.8385 -2.8594 -2.8790 -2.8976 -2.9152 -2.9318 -2.9477 -2.9628
-3.0 .0013 -2.0842 -2.1008 -2.1164 -2.1312 -2.1451 -2.1584 -2.1710 -2.1830
-2.0 .0227 -1.3075 -1.3204 -1.3325 -1.3439 -1.3548 -1.3650 -1.3748 -1.3840
-1.0 .1586 -.4688 -.4797 -.4900 -.4996 -.5087 -.5173 -.5254 -.5331
0 .5000 .5514 .5366 .5228 .5101 .4982 .4872 .4769 .4671
.1 .5398 .6691 .6533 .6387 .6253 .6128 .6011 .5902 .5800
.2 .5792 .7896 .7728 .7574 .7432 .7299 .7176 .7061 .6953
.3 .6179 .9130 .8952 .8788 .8637 .8496 .8366 .8244 .8130
.4 .6554 1.0390 1.0201 1.0026 .9866 .9717 .9579 .9450 .9330
.5 .6914 1.1677 1.1475 1.1289 1.1119 1.0961 1.0814 1.0678 1.0550
.6 .7257 1.2988 1.2773 1.2576 1.2395 1.2227 1.2071 1.1926 1.1791
.7 .7580 1.4318 1.4089 1.3880 1.3687 1.3510 1.3345 1.3191 1.3048
.8 .7881 1.5669 1.5426 1.5204 1.5000 1.4811 1.4637 1.4474 1.4323
.9 .8159 1.7036 1.6779 1.6544 1.6327 1.6128 1.5943 1.5771 1.5611
1.0 .8413 1.8421 1.8149 1.7900 1.7671 1.7460 1.7265 1.7084 1.6914
1.1 .8643 1.9821 1.9533 1.9270 1.9028 1.8806 1.8600 1.8408 1.8230
1.2 .8849 2.1234 2.0930 2.0652 2.0397 2.0163 1.9945 1.9744 1.9556
1.3 .9031 2.2659 2.2339 2.2046 2.1778 2.1531 2.1302 2.1090 2.0892
1.4 .9192 2.4095 2.3758 2.3451 2.3169 2.2909 2.2669 2.2446 2.2238
1.5 .9331 2.5540 2.5187 2.4864 2.4568 2.4296 2.4044 2.3810 2.3592
1.6 .9452 2.6995 2.6624 2.6286 2.5976 2.5690 2.5426 2.5182 2.4954
1.7 .9554 2.8457 2.8069 2.7716 2.7391 2.7093 2.6817 2.6561 2.6322
1.8 .9640 2.9927 2.9522 2.9152 2.8813 2.8501 2.8213 2.7946 2.7697
1.9 .9712 3.1403 3.0980 3.0594 3.0241 2.9915 2.9615 2.9336 2.9077
2.0 .9772 3.2884 3.2443 3.2041 3.1673 3.1334 3.1021 3.0731 3.0461
2.1 .9821 3.4370 3.3912 3.3493 3.3110 3.2758 3.2432 3.2130 3.1849
2.2 .9860 3.5862 3.5385 3.4950 3.4552 3.4186 3.3847 3.3534 3.3242
2.3 .9892 3.7357 3.6862 3.6411 3.5998 3.5618 3.5267 3.4941 3.4638
2.4 .9918 3.8857 3.8344 3.7876 3.7448 3.7053 3.6689 3.6352 3.6038
2.5 .9937 4.0360 3.9829 3.9344 3.8901 3.8492 3.8115 3.7766 3.7441
2.6 .9953 4.1867 4.1317 4.0816 4.0357 3.9934 3.9544 3.9183 3.8847
2.7 .9965 4.3377 4.2808 4.2290 4.1816 4.1379 4.0976 4.0603 4.0255
2.8 .9974 4.4890 4.4302 4.3767 4.3277 4.2826 4.2410 4.2025 4.1666
2.9 .9981 4.6404 4.5798 4.5246 4.4741 4.4276 4.3847 4.3449 4.3079
3.0 .9986 4.7920 4.7296 4.6727 4.6206 4.5727 4.5284 4.4874 4.4493
3.1 .9990 4.9439 4.8796 4.8210 4.7673 4.7180 4.6724 4.6302 4.5909
3.2 .9993 5.0959 5.0297 4.9694 4.9142 4.8634 4.8166 4.7731 4.7327
3.3 .9995 5.2482 5.1801 5.1181 5.0613 5.0091 4.9609 4.9162 4.8747
3.4 .9996 5.4006 5.3306 5.2669 5.2085 5.1549 5.1053 5.0594 5.0168
3.5 .9997 5.5532 5.4813 5.4158 5.3559 5.3008 5.2500 5.2028 5.1590
3.6 .9998 5.7059 5.6321 5.5649 5.5034 5.4469 5.3947 5.3463 5.3014
3.7 .9998 5.8587 5.7831 5.7142 5.6511 5.5931 5.5396 5.4899 5.4438

TABLE A-3.--Continued.

(c) Concluded.

Safety Probability, Sample size, N
margin, Px
SM 21 22 23 24 25 26 27 28

3.8 0.9999 6.0117 5.9342 5.8635 5.7989 5.7394 5.6845 5.6337 5.5864
3.9 6.1648 6.0854 6.0130 5.9467 5.8858 5.8296 5.7775 5.7291
4.0 6.3180 6.2367 6.1626 6.0947 6.0324 5.9748 5.9215 5.8719
4.1 6.4714 6.3881 6.3122 6.2428 6.1790 6.1201 6.0655 6.0147
4.2 6.6248 6.5396 6.4620 6.3910 6.3257 6.2654 6.2096 6.1577
4.3 6.7788 6.6912 6.6118 6.5392 6.4725 6.4109 6.3538 6.3007
4.4 6.9319 6.8428 6.7618 6.6876 6.6193 6.5564 6.4980 6.4438
4.5 7.0856 6.9946 6.9118 6.8360 6.7663 6.7020 6.6424 6.5870
4.6 7.2393 7.1464 7.0619 6.9844 6.9133 6.8476 6.7868 6.7302
4.7 7.3931 7.2983 7.2120 7.1330 7.0604 6.9933 6.9312 6.8735
4.8 7.5470 7.4503 7.3622 7.2816 7.2075 7.1391 7.0757 7.0168
4.9 1.0000 7.7010 7.6023 7.5125 7.4303 7.3547 7.2849 7.2203 7.1602
5.0 7.8550 7.7544 7.6628 7.5790 7.5019 7.4308 7.3649 7.3037
5.1 8.0090 7.9065 7.8132 7.7278 7.6492 7.5768 7.5096 7.4472
5.2 8.1632 8.0587 7.9636 7.8766 7.7966 7.7227 7.6543 7.5907
5.3 8.3173 8.2110 8.1141 8.0255 7.9440 7.8688 7.7991 7.7343
5.4 8.4716 8.3633 8.2646 8.1744 8.0914 8.0148 7.9439 7.8780
5.5 8.6258 8.5156 8.4152 8.3233 8.2389 8.1609 8.0887 8.0215
5.6 8.7801 8.6680 8.5658 8.4723 8.3864 8.3071 8.2336 8.1653
5.7 8.9345 8.8204 8.7165 8.6214 8.5340 8.4533 8.3785 8.3091
5.8 9.0889 8.9728 8.8671 8.7704 8.6815 8.5995 8.5235 8.4529
5.9 9.2433 9.1253 9.0179 8.9195 8.8292 8.7458 8.6685 8.5967
6.0 9.3978 9.2778 9.1686 9.0687 8.9768 8.8920 8.8135 8.7405
6.1 9.5523 9.4304 9.3194 9.2179 9.1245 9.0384 8.9585 8.8844
6.2 9.7068 9.5830 9.4702 9.3671 9.2722 9.1847 9.1036 9.0283
6.3 9.8614 9.7356 9.6211 9.5163 9.4200 9.3311 9.2487 9.1722
6.4 10.0160 9.8882 9.7720 9.6655 9.5677 9.4775 9.3938 9.3161
6.5 10.1706 10.0409 9.9229 9.8148 9.7155 9.6239 9.5390 9.4601
6.6 10.3252 10.1936 10.0738 9.9641 9.8634 9.7703 9.6842 9.6041
6.7 10.4799 10.3463 10.2247 10.1135 10.0112 9.9168 9.8294 9.7481
6.8 10.6346 10.4991 10.3757 10.2628 10.1591 10.0633 9.9746 9.8921
6.9 10.7893 10.6519 10.5267 10.4122 10.3070 10.2098 10.1198 10.0362
7.0 10.9441 10.8047 10.6777 10.5616 10.4549 10.3563 10.2651 10.1803
7.1 11.0988 10.9575 10.8288 10.7110 10.6028 10.5029 10.4104 10.3244
7.2 11.2536 11.1103 10.9798 10.8605 10.7507 10.6495 10.5557 10.4685
7.3 11.4084 11.2632 11.1309 11.0099 10.8987 10.7961 10.7010 10.6126
7.4 11.5632 11.4160 11.2820 11.1594 11.0467 10.9427 10.8463 10.7567
7.5 11.7181 11.5689 11.4331 11.3089 11.1947 11.0893 10.9916 10.9009
7.6 11.8729 11.7218 11.5843 11.4584 11.3427 11.2359 11.1370 11.0451
7.7 12.0278 11.8748 11.7354 11.6079 11.4907 11.3826 11.2824 11.1893
7.8 12.1827 12.0277 11.8866 11.7575 11.6388 11.5292 11.4278 11.3335
7.9 12.3376 12.1807 12.0378 11.9070 11.7868 11.6759 11.5732 11.4777
8.0 12.4926 12.3337 12.1890 12.0566 11.9349 11.8226 11.7186 11.6219

TABLE A-3.--Continued.

(d) Sample sizes 30 to 100

Safety Probability, Sample size, N
margin, Px
SM 30 40 50 60 70 80 90 100

-5.0 0 -3.7688 -3.9040 -4.0005 -4.0741 -4.1328 -4.1810 -4.2216 -4.2565

-4.0 0 -2.9910 -3.1021 -3.1815 -3.2420 -3.2901 -3.3297 -3.3630 -3.3916
-3.0 .0013 -2.2053 -2.2935 -2.3564 -2.4042 -2.4422 -2.4735 -2.4998 -2.5224
-2.0 .0227 -1.4013 -1.4691 -1.5172 -1.5536 -1.5825 -1.6063 -1.6262 -1.6432
-1.0 .1586 -.5473 -.6024 -.6406 -.6692 -.6918 -.7101 -.7254 -.7385
0 .5000 .4494 .3835 .3401 .3087 .2846 .2655 .2497 .2364
.1 .5398 .5613 .4924 .4472 .4146 .3897 .3700 .3537 .3401
.2 .5792 .6756 .6033 .5560 .5221 .4963 .4758 .4590 .4449
.3 .6179 .7923 .7162 .6665 .6311 .6042 .5828 .5654 .5508
.4 .6554 .9110 .8308 .7786 .7415 .7134 .6912 .6730 .6579
.5 .6914 1.0318 .9471 .8922 .8533 .8239 .8007 .7818 .7659
.6 .7257 1.1545 1.0650 1.0073 .9664 .9356 .9113 .8915 .8750
.7 .7580 1.2788 1.1844 1.1236 1.0806 1.0483 1.0229 1.0022 .9850
.8 .7881 1.4048 1.3051 1.2411 1.1960 1.1621 1.1355 1.1138 1.0958
.9 .8159 1.5321 1.4270 1.3596 1.3122 1.2767 1.2488 1.2261 1.2073
1.0 .8413 1.6608 1.5500 1.4792 1.4294 1.3922 1.3629 1.3392 1.3195
1.1 .8643 1.7906 1.6740 1.5996 1.5474 1.5084 1.4778 1.4530 1.4324
1.2 .8849 1.9215 1.7989 1.7208 1.6662 1.6253 1.5933 1.5673 1.5458
1.3 .9031 2.0535 1.9247 1.8429 1.7856 1.7429 1.7094 1.6823 1.6598
1.4 .9192 2.1863 2.0513 1.9656 1.9057 1.8610 1.8260 1.7977 1.7742
1.5 .9331 2.3198 2.1784 2.0888 2.0262 1.9795 1.9430 1.9135 1.8890
1.6 .9452 2.4542 2.3062 2.2126 2.1473 2.0986 2.0605 2.0297 2.0042
1.7 .9554 2.5891 2.4346 2.3369 2.2688 2.2180 2.1784 2.1463 2.1198
1.8 .9640 2.7247 2.5635 2.4617 2.3908 2.3379 2.2966 2.2632 2.2356
1.9 .9712 2.8608 2.6928 2.5868 2.5130 2.4580 2.4151 2.3804 2.3517
2.0 .9772 2.9972 2.8225 2.7123 2.6356 2.5784 2.5339 2.4979 2.4681
2.1 .9821 3.1341 2.9525 2.8380 2.7584 2.6991 2.6529 2.6155 2.5846
2.2 .9860 3.2715 3.0829 2.9641 2.8815 2.8201 2.7721 2.7334 2.7014
2.3 .9892 3.4091 3.2136 3.0905 3.0049 2.9412 2.8916 2.8515 2.8184
2.4 .9918 3.5471 3.3445 3.2171 3.1285 3.0626 3.0112 2.9698 2.9355
2.5 .9937 3.6854 3.4757 3.3439 3.2523 3.1842 3.1311 3.0882 3.0528
2.6 .9953 3.8240 3.6072 3.4709 3.3763 3.3059 3.2511 3.2068 3.1702
2.7 .9965 3.9628 3.7389 3.5982 3.5005 3.4278 3.3712 3.3256 3.2878
2.8 .9974 4.1019 3.8707 3.7256 3.6248 3.5499 3.4915 3.4444 3.4055
2.9 .9981 4.2411 4.0028 3.8531 3.7493 3.6720 3.6119 3.5634 3.5233
3.0 .9986 4.3805 4.1349 3.9808 3.8738 3.7943 3.7324 3.6825 3.6412
3.1 .9990 4.5201 4.2673 4.1086 3.9985 3.9167 3.8530 3.8017 3.7592
3.2 .9993 4.6598 4.3997 4.2366 4.1234 4.0393 3.9737 3.9209 3.8773
3.3 .9995 4.7997 4.5323 4.3646 4.2483 4.1619 4.0945 4.0403 3.9954
3.4 .9996 4.9397 4.6650 4.4928 4.3733 4.2845 4.2154 4.1597 4.1137
3.5 .9997 5.0799 4.7979 4.6211 4.4984 4.4073 4.3364 4.2792 4.2320
3.6 .9998 5.2202 4.9308 4.7494 4.6236 4.5302 4.4574 4.3988 4.3503
3.7 .9998 5.3606 5.0638 4.8778 4.7489 4.6531 4.5785 4.5184 4.4688

TABLE A-3.--Concluded.

(d) Concluded.

Safety Probability, Sample size, N
margin, Px
SM 30 40 50 60 70 80 90 100

3.8 0.9999 5.5011 5.1969 5.0064 4.8742 4.7761 4.6997 4.6381 4.5872
3.9 5.6417 5.3301 5.1350 4.9996 4.8991 4.8209 4.7579 4.7058
4.0 5.7824 5.4634 5.2636 5.1251 5.0223 4.9422 4.8777 4.8243
4.1 5.9232 5.5968 5.3923 5.2506 5.1454 5.0635 4.9975 4.9430
4.2 6.0640 5.7302 5.5211 5.3762 5.2686 5.1849 5.1174 5.0615
4.3 6.2049 5.8637 5.6500 5.5019 5.3919 5.3063 5.2373 5.1803
4.4 6.3459 5.9972 5.7789 5.6275 5.5152 5.4277 5.3573 5.2991
4.5 6.4870 6.1308 5.9078 5.7533 5.6385 5.5492 5.4773 5.4178
4.6 6.6281 6.2645 6.0368 5.8791 5.7619 5.6708 5.5974 5.5367
4.7 6.7693 6.3982 6.1659 6.0049 5.8853 5.7923 5.7174 5.6555
4.8 6.9106 6.5319 6.2949 6.1307 6.0088 5.9139 5.8375 5.7744
4.9 1.0000 7.0518 6.6657 6.4241 6.2566 6.1323 6.0356 5.9577 5.8933
5.0 7.1932 6.7996 6.5532 6.3825 6.2558 6.1572 6.0778 6.0122
5.1 7.3346 6.9334 6.6824 6.5085 6.3794 6.2789 6.1980 6.1311
5.2 7.4760 7.0673 6.8116 6.6345 6.5029 6.4006 6.3182 6.2501
5.3 7.6175 7.2013 6.9409 6.7605 6.6266 6.5224 6.4384 6.3691
5.4 7.7590 7.3353 7.0702 6.8865 6.7502 6.6441 6.5587 6.4881
5.5 7.9005 7.4693 7.1995 7.0126 6.8738 6.7659 6.6790 6.6071
5.6 8.0421 7.6033 7.3288 7.1386 6.9975 6.8877 6.7993 6.7262
5.7 8.1837 7.7374 7.4582 7.2648 7.1212 7.0095 6.9196 6.8452
5.8 8.3254 7.8715 7.5876 7.3909 7.2449 7.1314 7.0399 6.9643
5.9 8.4671 8.0056 7.7170 7.5170 7.3687 7.2532 7.1603 7.0834
6.0 8.6088 8.1398 7.8464 7.6432 7.4924 7.3751 7.2806 7.2025
6.1 8.7505 8.2740 7.9759 7.7694 7.6162 7.4970 7.4010 7.3217
6.2 8.8923 8.4082 8.1054 7.8956 7.7400 7.6189 7.5214 7.4408
6.3 9.0341 8.5424 8.2348 8.0218 7.8638 7.7409 7.6418 7.5600
6.4 9.1759 8.6766 8.3644 8.1481 7.9876 7.8627 7.7622 7.6791
6.5 9.3177 8.8109 8.4939 8.2744 8.1114 7.9847 7.8827 7.7983
6.6 9.4596 8.9452 8.6234 8.4006 8.2353 8.1067 8.0031 7.9175
6.7 9.6015 9.0794 8.7530 8.5269 8.3592 8.2286 8.1236 8.0367
6.8 9.7434 9.2138 8.8826 8.6532 8.4830 8.3506 8.2440 8.1559
6.9 9.8853 9.3481 9.0122 8.7795 8.6069 8.4726 8.3645 8.2751
7.0 10.0272 9.4824 9.1418 8.9059 8.7308 8.5946 8.4850 8.3944
7.1 10.1692 9.6168 9.2714 9.0322 8.8547 8.7167 8.6055 8.5136
7.2 10.3112 9.7512 9.4010 9.1586 8.9786 8.8387 8.7260 8.6329
7.3 10.4532 9.8856 9.5307 9.2849 9.1026 8.9607 8.8465 8.7521
7.4 10.5952 10.0200 9.6603 9.4113 9.2265 9.0828 8.9670 8.8714
7.5 10.7372 10.1544 9.7900 9.5377 9.3505 9.2048 9.0876 8.9907
7.6 10.8792 10.2888 9.9197 9.6641 9.4744 9.3269 9.2081 9.1100
7.7 11.0213 10.4233 10.0494 9.7905 9.5984 9.4490 9.3287 9.2293
7.8 11.1633 10.5577 10.1791 9.9169 9.7224 9.5711 9.4492 9.3486
7.9 11.3054 10.6922 10.3088 10.0433 9.8464 9.6932 9.5698 9.4679
8.0 11.4475 10.8266 10.4385 10.1698 9.9704 9.8153 9.6904 9.5872
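
Tables A-3 to A-5 are read the same way at the three confidence levels: compute the observed safety margin (sample mean minus the stress or specification level, divided by the sample standard deviation) and compare it with the tabled entry for the probability to be demonstrated and the sample size used. A minimal Python sketch follows (ours, added for illustration; the three entries are transcribed from the N = 10 column of table A-3(a), and the test numbers are invented):

# Safety-margin demonstration at the 99-percent confidence level
# (illustrative sketch; entries from the N = 10 column of table A-3(a)).

REQUIRED_SM_99 = {      # probability Px -> required observed margin, N = 10
    0.8413: 2.5295,     # corresponds to a true safety margin of 1.0
    0.9772: 4.4310,     # corresponds to a true safety margin of 2.0
    0.9986: 6.4247,     # corresponds to a true safety margin of 3.0
}

def demonstrated(x_bar, s, stress, px, table=REQUIRED_SM_99):
    """True when the observed safety margin (x_bar - stress)/s meets the
    tabled requirement for probability px (99-percent confidence, N = 10)."""
    observed_sm = (x_bar - stress) / s
    return observed_sm >= table[px]

# Ten strength tests with mean 5200, standard deviation 310, against a
# 3800 stress level give (5200 - 3800)/310 = 4.52 > 4.4310, so a
# probability of 0.9772 is demonstrated at 99-percent confidence.
print(demonstrated(5200.0, 310.0, 3800.0, 0.9772))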

TABLE A--4.--SAFETY MARGINS AT 95-PERCENT CONFIDENCE LEVEL

(a) Sample sizes 5 to 12

Safety Pmbability, Sample size, N


margin, e.,
SM 8 9 i0 il I2

-5.0 0 -3.1600 -3.2797 -3.3759 -3.4551 -3.5230 -3.5814 -3.6328 -3.6783

-4.0 0 -2.4898 -2.5882 -2.6674 -2.7326 -2.7884 -2.8364 -2.8787 -2.9161
-3.0 .0013 -1.8066 -1.8847 -1.9477 -1.9995 -2.0438 -2.0819 -2.1155 -2.1453
-2.0 .0227 -1.0897 -1.1507 -1.1999 -1.2402 -1.2748 -1.3044 -1.3304 -1.3535
-1.0 .1586 -.2651 -.3241 -.3691 -.4062 -.4363 -.4625 -.4847 -.5043
0 .5000 .9538 .8223 .7340 .6697 .6197 .5796 .5464 .5184
.1 .5398 1.1123 .9664 .8692 .7989 .7449 .7017 .6661 .6361
.2 .5792 1.2779 1.1159 1.0094 .9328 .8741 .8273 .7889 .7567
.3 .6179 1.4509 1.2710 1.1543 1.0708 1.0071 .9562 .9148 .8803
.4 .6554 1.6309 1.4315 1.3035 1.2118 1.1428 1.0882 1.0436 1.0065
.5 .6914 1.8155 1.5966 1.4564 1.3565 1.2822 1.2231 1.1751 1.1353
.6 .7257 2.0061 1.7662 1.6130 1.5047 1.4247 1.3609 1.3093 1.2666
.7 .7580 2.2023 1.9395 1.7729 1.6562 1.5691 1.5010 1.4456 1.3999
.8 .7881 2.4022 2.1163 1.9362 1.8104 1.7162 1.6434 1.5841 1.5353
.9 .8159 2.6057 2.2960 2.1023 1.9666 1.8658 1.7878 1.7245 1.6724
1.0 .8413 2.8129 2.4788 2.2709 2.1252 2.0174 1.9343 1.8667 1.8112
1.1 .8643 3.0237 2.6647 2.4415 2.2858 2.1710 2.0822 2.0103 1.9514
1.2 .8849 3.2370 2.8528 2.6140 2.4483 2.3263 2.2317 2.1554 2.0930
1.3 .9031 3.4526 3.0427 2.7883 2.6123 2.4830 2.3826 2.3018 2.2358
1.4 .9192 3.6702 3.2340 2.9642 2.7779 2.6408 2.5345 2.4493 2.3796
1.5 .9331 3.8896 3.4268 3.1415 2.9447 2.7998 2.6876 2.5977 2.5243
1.6 .9452 4.1105 3.6211 3.3200 3.1126 2.9598 2.8416 2.7471 2.6699
1.7 .9554 4.3329 3.8165 3.4997 3.2815 3.1208 2.9965 2.8972 2.8163
1.8 .9640 4.5564 4.0134 3.6803 3.4513 3.2826 3.1522 3.0482 2.9633
1.9 .9712 4.7811 4.2113 3.8618 3.6216 3.4451 3.3085 3.1997 3.1110
2.0 .9772 5.0067 4.4101 4.0440 3.7927 3.6083 3.4654 3.3518 3.2592
2.1 .9821 5.2332 4.6097 4.2270 3.9645 3.7721 3.6229 3.5044 3.4079
2.2 .9860 5.4605 4.8099 4.4106 4.1368 3.9364 3.7810 3.6575 3.5570
2.3 .9892 5.6885 5.0108 4.5947 4.3096 4.1013 3.9394 3.8111 3.7066
2.4 .9918 5.9171 5.2122 4.7794 4.4829 4.2665 4.0983 3.9650 3.8565
2.5 .9937 6.1463 5.4142 4.9646 4.6567 4.4322 4.2576 4.1193 4.0068
2.6 .9953 6.3761 5.6166 5.1501 4.8308 4.5982 4.4172 4.2739 4.1574
2.7 .9965 6.6063 5.8193 5.3361 5.0053 4.7646 4.5771 4.4289 4.3083
2.8 .9974 6.8369 6.0223 5.5222 5.1801 4.9311 4.7373 4.5840 4.4594
2.9 .9981 7.0679 6.2256 5.7086 5.3552 5.0978 4.8977 4.7394 4.6107
3.0 .9986 7.2993 6.4292 5.8953 5.5306 5.2647 5.0584 4.8950 4.7622
3.1 .9990 7.5311 6.6332 6.0823 5.7062 5.4319 5.2193 5.0508 4.9139
3.2 .9993 7.7631 6.8374 6.2696 5.8821 5.5993 5.3803 5.2068 5.0658
3.3 .9995 7.9954 7.0418 6.4570 6.0582 5.7668 5.5416 5.3630 5.2179
3.4 .9996 8.2280 7.2465 6.6447 6.2344 5.9346 5.7030 5.5194 5.3701
3.5 .9997 8.4608 7.4514 6.8326 6.4109 6.1026 5.8646 5.6759 5.5225
3.6 .9998 8.6939 7.6565 7.0207 6.5875 6.2707 6.0264 5.8325 5.6750
3.7 .9998 8.9271 7.8618 7.2089 6.7643 6.4389 6.1882 5.9893 5.8277

TABLE A-4.--Continued.

(a) Concluded.

Safety Probability, Sample size, N
margin, Px
SM 5 6 7 8 9 10 11 12

3.8 0.9999 9.1606 8.0673 7.3973 6.9412 6.6073 6.3502 6.1462 5.9804
3.9 9.3942 8.2729 7.5858 7.1182 6.7758 6.5124 6.3032 6.1333
4.0 9.6280 8.4787 7.7745 7.2954 6.9444 6.6746 6.4603 6.2852
4.1 9.8619 8.6846 7.9633 7.4727 7.1132 6.8369 6.6175 6.4393
4.2 10.0960 8.8906 8.1522 7.6501 7.2820 6.9994 6.7748 6.5924
4.3 10.3302 9.0968 8.3413 7.8276 7.4510 7.1619 6.9322 6.7456
4.4 10.5645 9.3031 8.5304 8.0052 7.6200 7.3245 7.0896 6.8989
4.5 10.7990 9.5095 8.7196 8.1829 7.7891 7.4872 7.2472 7.0523
4.6 11.0336 9.7160 8.9090 8.3606 7.9583 7.6499 7.4048 7.2057
4.7 11.2683 9.9225 9.0984 8.5385 8.1276 7.8128 7.5625 7.3592
4.8 11.5031 10.1292 9.2879 8.7164 8.2969 7.9757 7.7202 7.5128
4.9 1.0000 11.7379 10.3359 9.4775 8.8944 8.4664 8.1386 7.8780 7.6664
5.0 11.9729 10.5428 9.6671 9.0725 8.6358 8.3017 8.0358 7.8200
5.1 12.2080 10.7497 9.8568 9.2506 8.8054 8.4647 8.1938 7.9738
5.2 12.4431 10.9567 10.0466 9.4288 8.9750 8.6279 8.3517 8.1275
5.3 12.6783 11.1637 10.2364 9.6070 9.1446 8.7910 8.5097 8.2813
5.4 12.9136 11.3708 10.4263 9.7853 9.3143 8.9543 8.6678 8.4352
5.5 13.1489 11.5780 10.6163 9.9636 9.4841 9.1176 8.8259 8.5891
5.6 13.3843 11.7852 10.8063 10.1420 9.6539 9.2809 8.9840 8.7430
5.7 13.6198 11.9925 10.9963 10.3205 9.8237 9.4442 9.1422 8.8970
5.8 13.8553 12.1998 11.1865 10.4989 9.9936 9.6076 9.3004 9.0510
5.9 14.0909 12.4072 11.3766 10.6775 10.1635 9.7710 9.4586 9.2050
6.0 14.3265 12.6146 11.5668 10.8560 10.3335 9.9345 9.6169 9.3591
6.1 14.5622 12.8221 11.7570 11.0346 10.5034 10.0980 9.7752 9.5132
6.2 14.7979 13.0296 11.9473 11.2133 10.6735 10.2615 9.9336 9.6673
6.3 15.0337 13.2372 12.1376 11.3919 10.8435 10.4251 10.0919 9.8215
6.4 15.2695 13.4447 12.3279 11.5706 11.0136 10.5887 10.2503 9.9757
6.5 15.5054 13.6524 12.5183 11.7494 11.1837 10.7523 10.4087 10.1299
6.6 15.7413 13.8600 12.7087 11.9281 11.3539 10.9160 10.5672 10.2841
6.7 15.9772 14.0677 12.8992 12.1069 11.5241 11.0796 10.7257 10.4384
6.8 16.2132 14.2755 13.0897 12.2857 11.6943 11.2433 10.8842 10.5927
6.9 16.4492 14.4832 13.2802 12.4646 11.8645 11.4070 11.0427 10.7470
7.0 16.6852 14.6910 13.4707 12.6434 12.0347 11.5708 11.2012 10.9013
7.1 16.9213 14.8988 13.6612 12.8223 12.2050 11.7345 11.3598 11.0556
7.2 17.1574 15.1067 13.8518 13.0013 12.3753 11.8983 11.5183 11.2100
7.3 17.3935 15.3146 14.0424 13.1802 12.5456 12.0621 11.6769 11.3644
7.4 17.6297 15.5225 14.2330 13.3592 12.7160 12.2259 11.8356 11.5187
7.5 17.8659 15.7304 14.4237 13.5381 12.8863 12.3898 11.9942 11.6732
7.6 18.1021 15.9383 14.6144 13.7171 13.0567 12.5536 12.1528 11.8276
7.7 18.3383 16.1463 14.8050 13.8962 13.2271 12.7175 12.3115 11.9820
7.8 18.5746 16.3543 14.9958 14.0752 13.3975 12.8814 12.4702 12.1365
7.9 18.8109 16.5623 15.1865 14.2542 13.5679 13.0453 12.6289 12.2909
8.0 19.0472 16.7703 15.3772 14.4333 13.7384 13.2092 12.7876 12.4454

TABLE A-4.--Continued.

(b) Sample sizes 13 to 20

Safety Probability, Sample size, N
margin, Px
SM 13 14 15 16 17 18 19 20

-5.0 0 -3.7191 -3.7560 -3.7895 -3.8202 -3.8485 -3.8747 -3.8990 -3.9217

-4.0 0 -2.9497 -2.9800 -3.0076 -3.0329 -3.0561 -3.0777 -3.0977 -3.1164
-3.0 .0013 -2.1719 -2.1960 -2.2179 -2.2380 -2.2564 -2.2735 -2.2894 -2.3042
-2.0 .0227 -1.3741 -1.3927 -1.4097 -1.4251 -1.4394 -1.4525 -1.4647 -1.4761
-1.0 .1586 -.5217 -.5373 -.5514 -.5642 -.5759 -.5866 -.5965 -.6057
0 .5000 .4943 .4733 .4547 .4382 .4234 .4100 .3978 .3867
.1 .5398 .6104 .5881 .5684 .5510 .5353 .5212 .5084 .4967
.2 .5792 .7293 .7055 .6846 .6661 .6496 .6347 .6211 .6087
.3 .6179 .8509 .8256 .8033 .7837 .7661 .7503 .7359 .7228
.4 .6554 .9751 .9480 .9243 .9034 .8848 .8679 .8527 .8388
.5 .6914 1.1017 1.0727 1.0475 1.0252 1.0054 .9875 .9713 .9566
.6 .7257 1.2306 1.1996 1.1727 1.1490 1.1279 1.1089 1.0918 1.0762
.7 .7580 1.3614 1.3284 1.2997 1.2745 1.2521 1.2319 1.2137 1.1972
.8 .7881 1.4942 1.4591 1.4286 1.4018 1.3780 1.3566 1.3373 1.3197
.9 .8159 1.6286 1.5912 1.5588 1.5304 1.5052 1.4825 1.4620 1.4435
1.0 .8413 1.7647 1.7250 1.6906 1.6605 1.6338 1.6097 1.5880 1.5684
1.1 .8643 1.9021 1.8600 1.8236 1.7917 1.7635 1.7380 1.7151 1.6945
1.2 .8849 2.0407 1.9962 1.9577 1.9240 1.8942 1.8674 1.8432 1.8214
1.3 .9031 2.1806 2.1335 2.0929 2.0574 2.0260 1.9977 1.9723 1.9493
1.4 .9192 2.3213 2.2718 2.2290 2.1916 2.1585 2.1288 2.1020 2.0779
1.5 .9331 2.4630 2.4109 2.3659 2.3265 2.2918 2.2606 2.2325 2.2072
1.6 .9452 2.6055 2.5507 2.5035 2.4622 2.4258 2.3930 2.3636 2.3371
1.7 .9554 2.7487 2.6913 2.6418 2.5986 2.5605 2.5261 2.4954 2.4676
1.8 .9640 2.8926 2.8325 2.7807 2.7355 2.6957 2.6598 2.6277 2.5986
1.9 .9712 3.0370 2.9743 2.9202 2.8730 2.8313 2.7939 2.7604 2.7301
2.0 .9772 3.1820 3.1165 3.0601 3.0108 2.9674 2.9284 2.8935 2.8619
2.1 .9821 3.3274 3.2592 3.2004 3.1491 3.1040 3.0633 3.0270 2.9941
2.2 .9860 3.4733 3.4023 3.3411 3.2878 3.2409 3.1986 3.1608 3.1267
2.3 .9892 3.6196 3.5458 3.4823 3.4269 3.3781 3.3343 3.2950 3.2596
2.4 .9918 3.7662 3.6896 3.6237 3.5662 3.5156 3.4702 3.4295 3.3927
2.5 .9937 3.9131 3.8338 3.7654 3.7059 3.6535 3.6064 3.5642 3.5262
2.6 .9953 4.0604 3.9782 3.9075 3.8458 3.7915 3.7428 3.6991 3.6598
2.7 .9965 4.2079 4.1229 4.0497 3.9860 3.9298 3.8794 3.8343 3.7937
2.8 .9974 4.3557 4.2678 4.1922 4.1263 4.0684 4.0163 3.9697 3.9277
2.9 .9981 4.5036 4.4129 4.3349 4.2669 4.2071 4.1533 4.1053 4.0619
3.0 .9986 4.6518 4.5582 4.4777 4.4076 4.3459 4.2905 4.2409 4.1963
3.1 .9990 4.8001 4.7036 4.6207 4.5485 4.4849 4.4279 4.3768 4.3308
3.2 .9993 4.9486 4.8493 4.7639 4.6895 4.6241 4.5653 4.5128 4.4654
3.3 .9995 5.0973 4.9951 4.9072 4.8307 4.7634 4.7030 4.6489 4.6002
3.4 .9996 5.2461 5.1410 5.0507 4.9720 4.9028 4.8407 4.7851 4.7351
3.5 .9997 5.3950 5.2871 5.1943 5.1135 5.0424 4.9786 4.9215 4.8701
3.6 .9998 5.5441 5.4333 5.3380 5.2550 5.1820 5.1165 5.0580 5.0052
3.7 .9998 5.6933 5.5796 5.4818 5.3967 5.3218 5.2546 5.1945 5.1404

TABLE A-4.--Continued.

(b) Concluded.

Safety Probability, Sample size, N
margin, Px
SM 13 14 15 16 17 18 19 20

3.8 0.9999 5.8426 5.7260 5.6257 5.5385 5.4617 5.3928 5.3312 5.2757
3.9 5.9920 5.8725 5.7697 5.6803 5.6016 5.5310 5.4679 5.4110
4.0 6.1416 6.0191 5.9139 5.8223 5.7417 5.6694 5.6047 5.5465
4.1 6.2912 6.1658 6.0580 5.9643 5.8818 5.8078 5.7416 5.6820
4.2 6.4408 6.3126 6.2023 6.1064 6.0220 5.9463 5.8785 5.8176
4.3 6.5906 6.4594 6.3467 6.2485 6.1622 6.0848 6.0156 5.9532
4.4 6.7404 6.6063 6.4911 6.3908 6.3025 6.2234 6.1526 6.0889
4.5 6.8903 6.7533 6.6355 6.5331 6.4429 6.3621 6.2898 6.2247
4.6 7.0403 6.9003 6.7801 6.6754 6.5834 6.5008 6.4270 6.3605
4.7 7.1903 7.0474 6.9247 6.8178 6.7239 6.6396 6.5642 6.4963
4.8 7.3404 7.1946 7.0693 6.9603 6.8644 6.7784 6.7015 6.6322
4.9 1.0000 7.4906 7.3418 7.2140 7.1028 7.0050 6.9173 6.8388 6.7682
5.0 7.6408 7.4891 7.3588 7.2454 7.1456 7.0562 6.9762 6.9042
5.1 7.7910 7.6364 7.5035 7.3879 7.2863 7.1951 7.1136 7.0402
5.2 7.9413 7.7837 7.6484 7.5306 7.4270 7.3341 7.2510 7.1763
5.3 8.0916 7.9311 7.7932 7.6733 7.5677 7.4731 7.3885 7.3124
5.4 8.2420 8.0786 7.9381 7.8160 7.7085 7.6122 7.5260 7.4485
5.5 8.3924 8.2260 8.0831 7.9587 7.8494 7.7513 7.6636 7.5846
5.6 8.5429 8.3735 8.2281 8.1015 7.9902 7.8904 7.8012 7.7208
5.7 8.6934 8.5211 8.3731 8.2443 8.1311 8.0296 7.9388 7.8571
5.8 8.8439 8.6687 8.5181 8.3872 8.2720 8.1687 8.0764 7.9933
5.9 8.9944 8.8163 8.6632 8.5300 8.4129 8.3080 8.2141 8.1296
6.0 9.1450 8.9639 8.8083 8.6729 8.5539 8.4472 8.3518 8.2659
6.1 9.2956 9.1115 8.9534 8.8159 8.6949 8.5864 8.4895 8.4022
6.2 9.4463 9.2592 9.0986 8.9588 8.8359 8.7257 8.6272 8.5385
6.3 9.5969 9.4069 9.2438 9.1018 8.9770 8.8650 8.7650 8.6749
6.4 9.7476 9.5547 9.3890 9.2448 9.1180 9.0044 8.9028 8.8113
6.5 9.8983 9.7024 9.5342 9.3878 9.2591 9.1437 9.0406 8.9477
6.6 10.0491 9.8502 9.6794 9.5309 9.4002 9.2831 9.1784 9.0841
6.7 10.1998 9.9980 9.8247 9.6739 9.5413 9.4225 9.3162 9.2205
6.8 10.3506 10.1458 9.9700 9.8170 9.6825 9.5619 9.4540 9.3570
6.9 10.5014 10.2937 10.1153 9.9601 9.8236 9.7013 9.5919 9.4935
7.0 10.6522 10.4416 10.2606 10.1032 9.9648 9.8407 9.7298 9.6299
7.1 10.8031 10.5894 10.4059 10.2463 10.1060 9.9802 9.8677 9.7664
7.2 10.9539 10.7373 10.5513 10.3895 10.2472 10.1196 10.0056 9.9030
7.3 11.1048 10.8852 10.6967 10.5326 10.3884 10.2591 10.1435 10.0395
7.4 11.2557 11.0332 10.8421 10.6758 10.5296 10.3986 10.2815 10.1760
7.5 11.4066 11.1811 10.9875 10.8190 10.6709 10.5381 10.4194 10.3126
7.6 11.5575 11.3291 11.1329 10.9622 10.8122 10.6776 10.5574 10.4491
7.7 11.7084 11.4770 11.2783 11.1054 10.9534 10.8172 10.6954 10.5857
7.8 11.8594 11.6250 11.4237 11.2487 11.0947 10.9567 10.8334 10.7223
7.9 12.0104 11.7730 11.5692 11.3919 11.2360 11.0963 10.9714 10.8589
8.0 12.1613 11.9210 11.7147 11.5352 11.3773 11.2358 11.1094 10.9955

TABLE A-4.--Continued.

(c) Sample sizes 21 to 28

Safety Probability, Sample size, N
margin, Px
SM 21 22 23 24 25 26 27 28

-5.0 0 -3.9429 -3.9629 -3.9816 -3.9993 -4.0160 -4.0318 -4.0468 -4.0612

-4.0 0 -3.1338 -3.1502 -3.1656 -3.1801 -3.1938 -3.2068 -3.2192 -3.2310
-3.0 .0013 -2.3180 -2.3310 -2.3432 -2.3547 -2.3655 -2.3759 -2.3856 -2.3949
-2.0 .0227 -1.4867 -1.4966 -1.5060 -1.5148 -1.5231 -1.5310 -1.5385 -1.5456
-1.0 .1586 -.6142 -.6222 -.6297 -.6368 -.6434 -.6497 -.6556 -.6613
0 .5000 .3764 .3669 .3581 .3499 .3422 .3350 .3283 .3219
.1 .5398 .4859 .4759 .4667 .4581 .4501 .4426 .4356 .4289
.2 .5792 .5974 .5869 .5772 .5682 .5598 .5519 .5446 .5376
.3 .6179 .7108 .6998 .6896 .6801 .6713 .6630 .6553 .6480
.4 .6554 .8261 .8145 .8037 .7937 .7844 .7757 .7676 .7599
.5 .6914 .9432 .9308 .9194 .9089 .8991 .8899 .8813 .8733
.6 .7257 1.0619 1.0489 1.0368 1.0256 1.0153 1.0056 .9966 .9881
.7 .7580 1.1821 1.1683 1.1555 1.1437 1.1328 1.1226 1.1130 1.1041
.8 .7881 1.3038 1.2891 1.2756 1.2632 1.2516 1.2408 1.2307 1.2213
.9 .8159 1.4266 1.4111 1.3968 1.3837 1.3715 1.3601 1.3495 1.3396
1.0 .8413 1.5506 1.5343 1.5192 1.5053 1.4925 1.4805 1.4693 1.4588
1.1 .8643 1.6756 1.6584 1.6425 1.6279 1.6144 1.6017 1.5900 1.5790
1.2 .8849 1.8016 1.7834 1.7667 1.7513 1.7371 1.7238 1.7114 1.6999
1.3 .9031 1.9284 1.9093 1.8918 1.8756 1.8606 1.8466 1.8337 1.8215
1.4 .9192 2.0560 2.0359 2.0175 2.0005 1.9848 1.9701 1.9565 1.9438
1.5 .9331 2.1842 2.1631 2.1438 2.1260 2.1095 2.0942 2.0799 2.0666
1.6 .9452 2.3130 2.2910 2.2707 2.2521 2.2349 2.2188 2.2039 2.1899
1.7 .9554 2.4424 2.4194 2.3982 2.3787 2.3607 2.3439 2.3283 2.3138
1.8 .9640 2.5723 2.5482 2.5262 2.5058 2.4870 2.4695 2.4532 2.4380
1.9 .9712 2.7026 2.6775 2.6545 2.6333 2.6137 2.5955 2.5785 2.5626
2.0 .9772 2.8333 2.8072 2.7832 2.7611 2.7407 2.7217 2.7041 2.6876
2.1 .9821 2.9644 2.9372 2.9123 2.8893 2.8681 2.8483 2.8300 2.8128
2.2 .9860 3.0958 3.0675 3.0416 3.0178 2.9957 2.9753 2.9562 2.9384
2.3 .9892 3.2275 3.1982 3.1713 3.1465 3.1237 3.1024 3.0827 3.0642
2.4 .9918 3.3594 3.3291 3.3012 3.2756 3.2519 3.2298 3.2094 3.1902
2.5 .9937 3.4917 3.4602 3.4314 3.4048 3.3803 3.3575 3.3363 3.3165
2.6 .9953 3.6241 3.5916 3.5617 3.5343 3.5089 3.4853 3.4634 3.4429
2.7 .9965 3.7568 3.7231 3.6923 3.6639 3.6377 3.6134 3.5907 3.5695
2.8 .9974 3.8896 3.8549 3.8231 3.7938 3.7667 3.7416 3.7182 3.6963
2.9 .9981 4.0226 3.9868 3.9540 3.9237 3.8958 3.8699 3.8458 3.8233
3.0 .9986 4.1558 4.1188 4.0850 4.0538 4.0251 3.9984 3.9735 3.9503
3.1 .9990 4.2891 4.2510 4.2162 4.1841 4.1544 4.1269 4.1013 4.0775
3.2 .9993 4.4225 4.3833 4.3475 4.3145 4.2839 4.2557 4.2293 4.2047
3.3 .9995 4.5560 4.5158 4.4789 4.4449 4.4136 4.3845 4.3574 4.3321
3.4 .9996 4.6897 4.6483 4.6104 4.5755 4.5433 4.5134 4.4856 4.4596
3.5 .9997 4.8235 4.7810 4.7420 4.7062 4.6731 4.6424 4.6138 4.5872
3.6 .9998 4.9573 4.9137 4.8738 4.8370 4.8030 4.7715 4.7422 4.7148
3.7 .9998 5.0913 5.0466 5.0056 4.9679 4.9330 4.9007 4.8707 4.8426

TABLE A-4.--Continued.

(c) Concluded.

Safety Probability, Sample size, N
margin, Px
SM 21 22 23 24 25 26 27 28

3.8 0.9999 5.2254 5.1795 5.1375 5.0988 5.0631 5.0300 4.9992 4.9704
3.9 5.3595 5.3125 5.2695 5.2298 5.1933 5.1593 5.1278 5.0983
4.0 5.4937 5.4456 5.4015 5.3609 5.3235 5.2887 5.2564 5.2262
4.1 5.6280 5.5787 5.5336 5.4921 5.4538 5.4182 5.3851 5.3542
4.2 5.7623 5.7119 5.6658 5.6233 5.5841 5.5477 5.5139 5.4823
4.3 5.8967 5.8452 5.7980 5.7546 5.7145 5.6773 5.6427 5.6104
4.4 6.0311 5.9785 5.9303 5.8859 5.8449 5.8069 5.7716 5.7386
4.5 6.1657 6.1119 6.0626 6.0173 5.9754 5.9366 5.9005 5.8668
4.6 6.3002 6.2453 6.1950 6.1487 6.1060 6.0663 6.0295 5.9951
4.7 6.4348 6.3788 6.3274 6.2802 6.2366 6.1961 6.1585 6.1234
4.8 6.5695 6.5123 6.4599 6.4117 6.3672 6.3259 6.2875 6.2517
4.9 1.0000 6.7042 6.6458 6.5924 6.5433 6.4979 6.4558 6.4166 6.3801
5.0 6.8389 6.7794 6.7250 6.6748 6.6286 6.5856 6.5457 6.5085
5.1 6.9737 6.9131 6.8575 6.8065 6.7593 6.7156 6.6749 6.6369
5.2 7.1085 7.0467 6.9902 6.9381 6.8901 6.8455 6.8041 6.7654
5.3 7.2433 7.1804 7.1228 7.0698 7.0209 6.9755 6.9333 6.8939
5.4 7.3782 7.3141 7.2555 7.2015 7.1517 7.1055 7.0625 7.0224
5.5 7.5131 7.4479 7.3882 7.3333 7.2826 7.2355 7.1918 7.1510
5.6 7.6480 7.5817 7.5209 7.4651 7.4134 7.3656 7.3211 7.2795
5.7 7.7830 7.7155 7.6537 7.5969 7.5443 7.4957 7.4504 7.4081
5.8 7.9180 7.8493 7.7865 7.7287 7.6753 7.6258 7.5797 7.5368
5.9 8.0530 7.9832 7.9193 7.8605 7.8062 7.7559 7.7091 7.6654
6.0 8.1880 8.1171 8.0521 7.9924 7.9372 7.8861 7.8385 7.7941
6.1 8.3231 8.2510 8.1850 8.1243 8.0682 8.0162 7.9679 7.9228
6.2 8.4582 8.3849 8.3179 8.2562 8.1992 8.1464 8.0973 8.0515
6.3 8.5933 8.5189 8.4508 8.3881 8.3303 8.2766 8.2267 8.1802
6.4 8.7284 8.6529 8.5837 8.5201 8.4613 8.4069 8.3562 8.3089
6.5 8.8635 8.7868 8.7166 8.6520 8.5924 8.5371 8.4857 8.4377
6.6 8.9987 8.9208 8.8496 8.7840 8.7235 8.6674 8.6152 8.5664
6.7 9.1338 9.0549 8.9825 8.9160 8.8546 8.7976 8.7447 8.6952
6.8 9.2690 9.1889 9.1155 9.0480 8.9857 8.9279 8.8742 8.8240
6.9 9.4042 9.3230 9.2485 9.1801 9.1168 9.0582 9.0037 8.9528
7.0 9.5395 9.4570 9.3815 9.3121 9.2480 9.1885 9.1333 9.0817
7.1 9.6747 9.5911 9.5146 9.4442 9.3791 9.3189 9.2628 9.2105
7.2 9.8099 9.7252 9.6476 9.5762 9.5103 9.4492 9.3924 9.3394
7.3 9.9452 9.8593 9.7807 9.7083 9.6415 9.5796 9.5220 9.4682
7.4 10.0805 9.9934 9.9137 9.8404 9.7727 9.7099 9.6516 9.5971
7.5 10.2158 10.1276 10.0468 9.9725 9.9039 9.8403 9.7812 9.7260
7.6 10.3511 10.2617 10.1799 10.1046 10.0351 9.9707 9.9108 9.8549
7.7 10.4864 10.3959 10.3130 10.2368 10.1664 10.1011 10.0404 9.9838
7.8 10.6217 10.5300 10.4461 10.3689 10.2976 10.2315 10.1700 10.1127
7.9 10.7570 10.6642 10.5792 10.5010 10.4288 10.3619 10.2997 10.2416
8.0 10.8924 10.7984 10.7123 10.6332 10.5601 10.4923 10.4293 10.3705

TABLE A-4.--Continued.

(d) Sample sizes 30 to 100

Safety Probability, Sample size, N
margin, Px
SM 30 40 50 60 70 80 90 100

-5.0 0 -4.0878 -4.1922 -4.2661 -4.3220 -4.3664 -4.4028 -4.4333 -4.4594

-4.0 0 -3.2528 -3.3386 -3.3992 -3.4452 -3.4815 -3.5113 -3.5364 -3.5578
-3.0 .0013 -2.4123 -2.4801 -2.5280 -2.5642 -2.5929 -2.6164 -2.6361 -2.6530
-2.0 .0227 -1.5589 -1.6105 -1.6469 -1.6743 -1.6960 -1.7138 -1.7286 -1.7413
-1.0 .1586 -.6717 -.7122 -.7402 -.7612 -.7777 -.7911 -.8023 -.8118
0 .5000 .3102 .2664 .2371 .2158 .1993 .1861 .1752 .1660
.1 .5398 .4168 .3713 .3411 .3191 .3022 .2886 .2775 .2681
.2 .5792 .5249 .4776 .4462 .4235 .4060 .3921 .3806 .3710
.3 .6179 .6347 .5852 .5526 .5290 .5109 .4965 .4846 .4747
.4 .6554 .7459 .6941 .6600 .6354 .6166 .6017 .5894 .5791
.5 .6914 .8586 .8042 .7685 .7429 .7233 .7078 .6950 .6843
.6 .7257 .9726 .9154 .8781 .8512 .8308 .8146 .8014 .7903
.7 .7580 1.0877 1.0277 .9885 .9604 .9391 .9222 .9084 .8968
.8 .7881 1.2041 1.1409 1.0998 1.0704 1.0481 1.0304 1.0160 1.0039
.9 .8159 1.3214 1.2550 1.2118 1.1811 1.1577 1.1393 1.1242 1.1116
1.0 .8413 1.4398 1.3699 1.3246 1.2924 1.2680 1.2487 1.2329 1.2198
1.1 .8643 1.5589 1.4855 1.4381 1.4043 1.3787 1.3586 1.3421 1.3284
1.2 .8849 1.6788 1.6018 1.5520 1.5167 1.4900 1.4689 1.4518 1.4374
1.3 .9031 1.7994 1.7187 1.6666 1.6296 1.6017 1.5797 1.5618 1.5468
1.4 .9192 1.9206 1.8361 1.7816 1.7429 1.7138 1.6908 1.6721 1.6566
1.5 .9331 2.0423 1.9539 1.8970 1.8566 1.8262 1.8023 1.7828 1.7666
1.6 .9452 2.1645 2.0722 2.0127 1.9707 1.9390 1.9140 1.8938 1.8769
1.7 .9554 2.2873 2.1908 2.1289 2.0851 2.0521 2.0261 2.0050 1.9874
1.8 .9640 2.4104 2.3099 2.2453 2.1997 2.1654 2.1384 2.1164 2.0981
1.9 .9712 2.5338 2.4292 2.3620 2.3146 2.2789 2.2509 2.2281 2.2091
2.0 .9772 2.6576 2.5488 2.4790 2.4297 2.3927 2.3635 2.3399 2.3202
2.1 .9821 2.7817 2.6686 2.5962 2.5451 2.5066 2.4764 2.4519 2.4314
2.2 .9860 2.9061 2.7887 2.7136 2.6606 2.6207 2.5894 2.5640 2.5428
2.3 .9892 3.0307 2.9090 2.8312 2.7763 2.7350 2.7026 2.6763 2.6544
2.4 .9918 3.1555 3.0295 2.9489 2.8921 2.8494 2.8159 2.7887 2.7661
2.5 .9937 3.2805 3.1502 3.0669 3.0081 2.9640 2.9293 2.9012 2.8778
2.6 .9953 3.4058 3.2710 3.1849 3.1243 3.0787 3.0429 3.0139 2.9897
2.7 .9965 3.5311 3.3920 3.3031 3.2405 3.1935 3.1565 3.1266 3.1017
2.8 .9974 3.6567 3.5131 3.4214 3.3568 3.3083 3.2703 3.2394 3.2137
2.9 .9981 3.7824 3.6344 3.5398 3.4733 3.4233 3.3841 3.3523 3.3258
3.0 .9986 3.9082 3.7557 3.6583 3.5898 3.5384 3.4980 3.4653 3.4380
3.1 .9990 4.0341 3.8771 3.7769 3.7064 3.6535 3.6120 3.5783 3.5503
3.2 .9993 4.1601 3.9987 3.8956 3.8231 3.7687 3.7260 3.6914 3.6626
3.3 .9995 4.2863 4.1203 4.0144 3.9399 3.8840 3.8401 3.8045 3.7750
3.4 .9996 4.4125 4.2420 4.1332 4.0567 3.9993 3.9542 3.9177 3.8874
3.5 .9997 4.5388 4.3638 4.2521 4.1736 4.1147 4.0684 4.0310 3.9998
3.6 .9998 4.6652 4.4856 4.3711 4.2905 4.2301 4.1827 4.1443 4.1123
3.7 .9998 4.7917 4.6075 4.4901 4.4075 4.3456 4.2970 4.2576 4.2249

TABLE A-4.--Concluded.

(d) Concluded.

Safety Probability, Sample size, N
margin, Px
SM 30 40 50 60 70 80 90 100

3.8 0.9999 4.9182 4.7295 4.6092 4.5246 4.4611 4.4114 4.3710 4.3375
3.9 5.0448 4.8515 4.7283 4.6417 4.5767 4.5257 4.4844 4.4501
4.0 5.1715 4.9736 4.8475 4.7588 4.6923 4.6402 4.5979 4.5628
4.1 5.2982 5.0958 4.9667 4.8760 4.8080 4.7546 4.7114 4.6755
4.2 5.4250 5.2179 5.0860 4.9932 4.9237 4.8691 4.8249 4.7882
4.3 5.5519 5.3402 5.2053 5.1105 5.0394 4.9836 4.9385 4.9009
4.4 5.6788 5.4624 5.3246 5.2278 5.1551 5.0982 5.0520 5.0137
4.5 5.8057 5.5847 5.4440 5.3451 5.2709 5.2128 5.1656 5.1265
4.6 5.9327 5.7071 5.5634 5.4624 5.3867 5.3274 5.2793 5.2393
4.7 6.0597 5.8294 5.6828 5.5798 5.5025 5.4420 5.3929 5.3521
4.8 6.1867 5.9519 5.8023 5.6972 5.6184 5.5566 5.5066 5.4650
4.9 1.0000 6.3138 6.0743 5.9218 5.8146 5.7343 5.6713 5.6203 5.5779
5.0 6.4409 6.1968 6.0413 5.9321 5.8502 5.7860 5.7340 5.6908
5.1 6.5681 6.3193 6.1608 6.0495 5.9661 5.9007 5.8477 5.8037
5.2 6.6952 6.4418 6.2804 6.1670 6.0820 6.0154 5.9614 5.9166
5.3 6.8224 6.5643 6.4000 6.2845 6.1980 6.1302 6.0752 6.0296
5.4 6.9497 6.6869 6.5196 6.4021 6.3140 6.2449 6.1890 6.1425
5.5 7.0769 6.8095 6.6392 6.5196 6.4300 6.3597 6.3028 6.2555
5.6 7.2042 6.9321 6.7588 6.6372 6.5460 6.4745 6.4166 6.3685
5.7 7.3315 7.0547 6.8785 6.7548 6.6620 6.5893 6.5304 6.4815
5.8 7.4588 7.1774 6.9982 6.8723 6.7780 6.7041 6.6442 6.5945
5.9 7.5862 7.3000 7.1179 6.9900 6.8941 6.8189 6.7581 6.7075
6.0 7.7136 7.4227 7.2376 7.1076 7.0101 6.9338 6.8719 6.8205
6.1 7.8409 7.5454 7.3573 7.2252 7.1262 7.0486 6.9858 6.9336
6.2 7.9683 7.6681 7.4770 7.3429 7.2423 7.1635 7.0996 7.0466
6.3 8.0958 7.7908 7.5968 7.4605 7.3584 7.2784 7.2135 7.1597
6.4 8.2232 7.9136 7.7165 7.5782 7.4745 7.3932 7.3274 7.2727
6.5 8.3506 8.0363 7.8363 7.6959 7.5906 7.5081 7.4413 7.3858
6.6 8.4781 8.1591 7.9561 7.8136 7.7068 7.6230 7.5552 7.4989
6.7 8.6056 8.2819 8.0759 7.9313 7.8229 7.7379 7.6691 7.6120
6.8 8.7331 8.4047 8.1957 8.0490 7.9390 7.8529 7.7831 7.7251
6.9 8.8606 8.5275 8.3155 8.1667 8.0552 7.9678 7.8970 7.8382
7.0 8.9881 8.6503 8.4354 8.2844 8.1714 8.0827 8.0109 7.9513
7.1 9.1157 8.7731 8.5552 8.4022 8.2875 8.1977 8.1249 8.0644
7.2 9.2432 8.8960 8.6750 8.5199 8.4037 8.3126 8.2388 8.1776
7.3 9.3708 9.0188 8.7949 8.6377 8.5199 8.4276 8.3528 8.2907
7.4 9.4983 9.1417 8.9147 8.7554 8.6361 8.5425 8.4668 8.4038
7.5 9.6259 9.2645 9.0346 8.8732 8.7523 8.6575 8.5807 8.5170
7.6 9.7535 9.3874 9.1545 8.9910 8.8685 8.7725 8.6947 8.6301
7.7 9.8811 9.5103 9.2744 9.1088 8.9847 8.8874 8.8087 8.7433
7.8 10.0087 9.6332 9.3943 9.2266 9.1009 9.0024 8.9227 8.8564
7.9 10.1363 9.7561 9.5142 9.3444 9.2171 9.1174 9.0367 8.9696
8.0 10.2639 9.8790 9.6341 9.4622 9.3334 9.2324 9.1507 9.0828
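
The N = 30 to 100 panels tabulate only every tenth sample size. The publication does not prescribe an interpolation rule; because the factors fall off smoothly and are convex in N, simple linear interpolation between the two bracketing columns is slightly conservative and serves for a rough check. A minimal Python sketch (ours, added for illustration; the four entries are transcribed from the Px = 0.9986 row of table A-4(d)):

# Linear interpolation in sample size for the coarse N = 30 to 100 panels
# (illustrative sketch; entries from table A-4(d), Px = 0.9986,
# 95-percent confidence).

REQUIRED_SM = {30: 3.9082, 40: 3.7557, 50: 3.6583, 60: 3.5898}

def interpolated_requirement(n, table=REQUIRED_SM):
    """Interpolate the required observed margin linearly between the two
    bracketing tabulated sample sizes."""
    sizes = sorted(table)
    lo = max(s for s in sizes if s <= n)
    hi = min(s for s in sizes if s >= n)
    if lo == hi:
        return table[lo]
    frac = (n - lo) / (hi - lo)
    return table[lo] + frac * (table[hi] - table[lo])

# Example: 45 specimens fall midway between the N = 40 and N = 50
# columns, giving a requirement of about 3.7070.
print(round(interpolated_requirement(45), 4))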

TABLE A-5.--SAFETY MARGINS AT 90-PERCENT CONFIDENCE LEVEL

(a) Sample sizes 5 to 12

Safety Probability, Sample size, N
margin, Px
SM 5 6 7 8 9 10 11 12

-5.0 0 -3.5162 -3.6140 -3.6932 -3.7586 -3.8146 -3.8623 -3.9045 -3.9418

-4.0 0 -2.7824 -2.8627 -2.9278 -2.9816 -3.0276 -3.0669 -3.1016 -3.1323
-3.0 .0013 -2.0381 -2.1018 -2.1535 -2.1962 -2.2327 -2.2640 -2.2915 -2.3159
-2.0 .0227 -1.2682 -1.3178 -1.3578 -1.3910 -1.4192 -1.4435 -1.4647 -1.4835
-1.0 .1586 -.4225 -.4673 -.5023 -.5312 -.5548 -.5752 -.5927 -.6082
0 .5000 .6857 .6023 .5439 .5000 .4657 .4373 .4137 .3936
.1 .5398 .8218 .7303 .6665 .6190 .5822 .5518 .5267 .5053
.2 .5792 .9632 .8623 .7930 .7412 .7014 .6689 .6420 .6193
.3 .6179 1.1098 .9984 .9229 .8664 .8234 .7885 .7598 .7355
.4 .6554 1.2615 1.1385 1.0557 .9946 .9481 .9105 .8797 .8537
.5 .6914 1.4168 1.2817 1.1912 1.1256 1.0750 1.0347 1.0016 .9739
.6 .7257 1.5762 1.4282 1.3296 1.2591 1.2043 1.1610 1.1256 1.0960
.7 .7580 1.7392 1.5777 1.4709 1.3945 1.3358 1.2894 1.2514 1.2197
.8 .7881 1.9057 1.7301 1.6147 1.5321 1.4693 1.4196 1.3789 1.3451
.9 .8159 2.0753 1.8851 1.7608 1.6715 1.6044 1.5512 1.5078 1.4717
1.0 .8413 2.2475 2.0422 1.9087 1.8127 1.7412 1.6843 1.6381 1.5997
1.1 .8643 2.4221 2.2010 2.0578 1.9558 1.8792 1.8186 1.7695 1.7288
1.2 .8849 2.5988 2.3615 2.2085 2.1003 2.0184 1.9542 1.9021 1.8590
1.3 .9031 2.7769 2.5237 2.3605 2.2460 2.1589 2.0908 2.0357 1.9901
1.4 .9192 2.9564 2.6871 2.5138 2.3923 2.3003 2.2283 2.1701 2.1220
1.5 .9331 3.1372 2.8518 2.6681 2.5396 2.4426 2.3666 2.3052 2.2546
1.6 .9452 3.3192 3.0175 2.8234 2.6878 2.5857 2.5057 2.4411 2.3879
1.7 .9554 3.5024 3.1841 2.9795 2.8367 2.7296 2.6454 2.5776 2.5217
1.8 .9640 3.6868 3.3516 3.1363 2.9863 2.8740 2.7857 2.7147 2.6562
1.9 .9712 3.8720 3.5198 3.2938 3.1366 3.0189 2.9265 2.8522 2.7910
2.0 .9772 4.0580 3.6886 3.4519 3.2873 3.1643 3.0678 2.9902 2.9262
2.1 .9821 4.2446 3.8580 3.6105 3.4386 3.3101 3.2095 3.1285 3.0619
2.2 .9860 4.4318 4.0279 3.7696 3.5903 3.4563 3.3515 3.2672 3.1979
2.3 .9892 4.6195 4.1983 3.9292 3.7424 3.6029 3.4940 3.4063 3.3341
2.4 .9918 4.8076 4.3691 4.0891 3.8948 3.7499 3.6367 3.5456 3.4707
2.5 .9937 4.9962 4.5403 4.2493 4.0476 3.8971 3.7797 3.6852 3.6076
2.6 .9953 5.1851 4.7118 4.4099 4.2006 4.0446 3.9230 3.8251 3.7446
2.7 .9965 5.3742 4.8836 4.5707 4.3539 4.1924 4.0665 3.9652 3.8819
2.8 .9974 5.5636 5.0557 4.7318 4.5075 4.3404 4.2102 4.1054 4.0194
2.9 .9981 5.7532 5.2281 4.8931 4.6613 4.4886 4.3541 4.2459 4.1570
3.0 .9986 5.9431 5.4007 5.0547 4.8152 4.6370 4.4982 4.3865 4.2948
3.1 .9990 6.1332 5.5735 5.2164 4.9694 4.7855 4.6425 4.5273 4.4328
3.2 .9993 6.3236 5.7465 5.3784 5.1237 4.9342 4.7869 4.6683 4.5709
3.3 .9995 6.5142 5.9197 5.5405 5.2782 5.0831 4.9314 4.8093 4.7091
3.4 .9996 6.7049 6.0931 5.7027 5.4328 5.2321 5.0761 4.9505 4.8475
3.5 .9997 6.8958 6.2666 5.8651 5.5876 5.3812 5.2209 5.0918 4.9859
3.6 .9998 7.0869 6.4402 6.0276 5.7425 5.5305 5.3658 5.2332 5.1245
3.7 .9998 7.2781 6.6140 6.1903 5.8975 5.6798 5.5108 5.3747 5.2631

TABLE A-5.--Continued.

(a) Concluded.

Safety Probability, Sample size, N
margin, Px
SM 5 6 7 8 9 10 11 12

3.8 0.9999 7.4694 6.7879 6.3531 6.0526 5.8293 5.6559 5.5163 5.4018
3.9 7.6609 6.9619 6.5159 6.2078 5.9788 5.8011 5.6580 5.5406
4.0 7.8525 7.1360 6.6789 6.3631 6.1284 5.9464 5.7998 5.6795
4.1 8.0442 7.3102 6.8419 6.5185 6.2781 6.0917 5.9416 5.8185
4.2 8.2360 7.4845 7.0051 6.6740 6.4279 6.2371 6.0835 5.9575
4.3 8.4279 7.6589 7.1683 6.8295 6.5778 6.3826 6.2254 6.0965
4.4 8.6199 7.8334 7.3316 6.9851 6.7277 6.5282 6.3675 6.2357
4.5 8.8120 8.0079 7.4949 7.1408 6.8777 6.6738 6.5095 6.3749
4.6 9.0041 8.1826 7.6584 7.2965 7.0277 6.8194 6.6517 6.5141
4.7 9.1963 8.3572 7.8219 7.4523 7.1778 6.9651 6.7938 6.6534
4.8 9.3886 8.5320 7.9854 7.6082 7.3280 7.1109 6.9361 6.7927
4.9 1.0000 9.5809 8.7068 8.1490 7.7641 7.4782 7.2567 7.0783 6.9321
5.0 9.7734 8.8816 8.3127 7.9200 7.6284 7.4025 7.2206 7.0715
5.1 9.9658 9.0566 8.4764 8.0760 7.7787 7.5484 7.3630 7.2109
5.2 10.1584 9.2315 8.6401 8.2320 7.9290 7.6944 7.5054 7.3504
5.3 10.3509 9.4065 8.8039 8.3881 8.0794 7.8403 7.6478 7.4900
5.4 10.5436 9.5816 8.9677 8.5442 8.2298 7.9863 7.7902 7.6295
5.5 10.7362 9.7567 9.1316 8.7004 8.3802 8.1324 7.9327 7.7691
5.6 10.9289 9.9318 9.2955 8.8566 8.5307 8.2784 8.0752 7.9087
5.7 11.1217 10.1070 9.4595 9.0128 8.6812 8.4245 8.2178 8.0483
5.8 11.3145 10.2822 9.6235 9.1691 8.8318 8.5706 8.3603 8.1880
5.9 11.5073 10.4574 9.7875 9.3253 8.9823 8.7168 8.5029 8.3277
6.0 11.7002 10.6327 9.9515 9.4817 9.1329 8.8630 8.6456 8.4674
6.1 11.8931 10.8080 10.1156 9.6380 9.2835 9.0092 8.7882 8.6071
6.2 12.0860 10.9833 10.2797 9.7944 9.4342 9.1554 8.9309 8.7469
6.3 12.2790 11.1587 10.4438 9.9508 9.5848 9.3016 9.0736 8.8866
6.4 12.4720 11.3341 10.6079 10.1072 9.7355 9.4479 9.2163 9.0264
6.5 12.6650 11.5095 10.7721 10.2636 9.8862 9.5942 9.3590 9.1662
6.6 12.8581 11.6849 10.9363 10.4201 10.0369 9.7405 9.5017 9.3061
6.7 13.0512 11.8604 11.1005 10.5765 10.1877 9.8868 9.6445 9.4459
6.8 13.2443 12.0359 11.2648 10.7330 10.3385 10.0332 9.7873 9.5858
6.9 13.4374 12.2114 11.4290 10.8896 10.4892 10.1795 9.9301 9.7257
7.0 13.6305 12.3869 11.5933 11.0461 10.6400 10.3259 10.0729 9.8656
7.1 13.8237 12.5625 11.7576 11.2026 10.7909 10.4723 10.2157 10.0055
7.2 14.0169 12.7380 11.9219 11.3592 10.9417 10.6187 10.3585 10.1454
7.3 14.2101 12.9136 12.0863 11.5158 11.0925 10.7651 10.5014 10.2853
7.4 14.4033 13.0892 12.2506 11.6724 11.2434 10.9116 10.6443 10.4253
7.5 14.5966 13.2648 12.4150 11.8290 11.3943 11.0580 10.7872 10.5652
7.6 14.7899 13.4405 12.5794 11.9857 11.5452 11.2045 10.9300 10.7052
7.7 14.9831 13.6161 12.7437 12.1423 11.6961 11.3509 11.0729 10.8452
7.8 15.1764 13.7918 12.9082 12.2990 11.8470 11.4974 11.2159 10.9852
7.9 15.3698 13.9675 13.0726 12.4556 11.9979 11.6439 11.3588 11.1252
8.0 15.5631 14.1432 13.2370 12.6123 12.1488 11.7904 11.5017 11.2652

TABLE A-5.--Continued.

(b) Sample sizes 13 to 20

Safety Probability, Sample size, N
margin, Px
SM 13 14 15 16 17 18 19 20

-5.0 0 -3.9752 -4.0054 -4.0328 -4.0579 -4.0809 -4.1023 -4.1221 -4.1406

-4.0 0 -3.1598 -3.1846 -3.2072 -3.2278 -3.2468 -3.2643 -3.2806 -3.2958
-3.0 .0013 -2.3377 -2.3574 -2.3753 -2.3916 -2.4067 -2.4206 -2.4335 -2.4455
-2.0 .0227 -1.5003 -1.5155 -1.5293 -1.5419 -1.5534 -1.5641 -1.5739 -1.5831
-1.0 .1586 -.6220 -.6343 -.6455 -.6556 -.6649 -.6734 -.6813 -.6886
0 .5000 .3762 .3609 .3473 .3351 .3242 .3143 .3052 .2969
.1 .5398 .4869 .4707 .4564 .4436 .4321 .4217 .4123 .4036
.2 .5792 .5997 .5826 .5675 .5540 .5419 .5310 .5210 .5119
.3 .6179 .7146 .6964 .6804 .6662 .6534 .6419 .6314 .6218
.4 .6554 .8315 .8122 .7952 .7801 .7666 .7544 .7433 .7332
.5 .6914 .9502 .9296 .9116 .8956 .8813 .8684 .8566 .8460
.6 .7257 1.0707 1.0488 1.0296 1.0126 .9975 .9838 .9714 .9601
.7 .7580 1.1927 1.1694 1.1490 1.1310 1.1149 1.1004 1.0873 1.0753
.8 .7881 1.3163 1.2915 1.2698 1.2507 1.2336 1.2182 1.2044 1.1917
.9 .8159 1.4411 1.4148 1.3918 1.3715 1.3534 1.3372 1.3225 1.3091
1.0 .8413 1.5673 1.5393 1.5149 1.4935 1.4744 1.4572 1.4416 1.4275
1.1 .8643 1.6944 1.6648 1.6390 1.6163 1.5962 1.5780 1.5616 1.5467
1.2 .8849 1.8225 1.7913 1.7640 1.7401 1.7188 1.6996 1.6824 1.6667
1.3 .9031 1.9516 1.9186 1.8899 1.8646 1.8422 1.8220 1.8038 1.7873
1.4 .9192 2.0814 2.0466 2.0164 1.9898 1.9662 1.9450 1.9259 1.9086
1.5 .9331 2.2119 2.1753 2.1435 2.1156 2.0908 2.0686 2.0485 2.0303
1.6 .9452 2.3430 2.3046 2.2712 2.2420 2.2160 2.1927 2.1716 2.1526
1.7 .9554 2.4747 2.4344 2.3995 2.3688 2.3416 2.3172 2.2952 2.2753
1.8 .9640 2.6069 2.5648 2.5282 2.4962 2.4677 2.4422 2.4192 2.3984
1.9 .9712 2.7395 2.6955 2.6573 2.6238 2.5942 2.5675 2.5435 2.5218
2.0 .9772 2.8725 2.8266 2.7868 2.7519 2.7210 2.6932 2.6682 2.6456
2.1 .9821 3.0059 2.9580 2.9166 2.8802 2.8480 2.8191 2.7931 2.7696
2.2 .9860 3.1396 3.0898 3.0467 3.0089 2.9754 2.9454 2.9183 2.8939
2.3 .9892 3.2736 3.2219 3.1771 3.1378 3.1031 3.0719 3.0438 3.0184
2.4 .9918 3.4079 3.3542 3.3077 3.2670 3.2309 3.1986 3.1695 3.1432
2.5 .9937 3.5424 3.4867 3.4385 3.3963 3.3590 3.3255 3.2954 3.2681
2.6 .9953 3.6771 3.6195 3.5696 3.5259 3.4873 3.4526 3.4214 3.3932
2.7 .9965 3.8121 3.7525 3.7009 3.6557 3.6157 3.5799 3.5476 3.5185
2.8 .9974 3.9472 3.8856 3.8323 3.7856 3.7444 3.7073 3.6740 3.6439
2.9 .9981 4.0825 4.0189 3.9639 3.9157 3.8731 3.8349 3.8005 3.7695
3.0 .9986 4.2179 4.1523 4.0956 4.0459 4.0020 3.9626 3.9272 3.8952
3.1 .9990 4.3535 4.2859 4.2275 4.1763 4.1310 4.0904 4.0539 4.0209
3.2 .9993 4.4893 4.4197 4.3594 4.3067 4.2602 4.2183 4.1808 4.1468
3.3 .9995 4.6251 4.5535 4.4915 4.4373 4.3894 4.3464 4.3078 4.2728
3.4 .9996 4.7611 4.6874 4.6237 4.5680 4.5187 4.4745 4.4348 4.3989
3.5 .9997 4.8971 4.8215 4.7560 4.6988 4.6481 4.6027 4.5619 4.5251
3.6 .9998 5.0333 4.9556 4.8884 4.8296 4.7777 4.7310 4.6892 4.6513
3.7 .9998 5.1695 5.0898 5.0209 4.9605 4.9072 4.8594 4.8165 4.7775

TABLE A-5.--Continued.

(b) Concluded.

Safety Probability, Sample size, N
margin, Px
SM 13 14 15 16 17 18 19 20

3.8 0.9999 5.3059 5.2241 5.1534 5.0916 5.0369 4.9879 4.9438 4.9040
3.9 5.4423 5.3585 5.2860 5.2226 5.1666 5.1164 5.0712 5.0305
4.0 5.5788 5.4929 5.4187 5.3538 5.2964 5.2449 5.1987 5.1570
4.1 5.7153 5.6274 5.5514 5.4850 5.4263 5.3736 5.3263 5.2835
4.2 5.8519 5.7620 5.6842 5.6162 5.5562 5.5022 5.4539 5.4101
4.3 5.9886 5.8966 5.8171 5.7475 5.6861 5.6310 5.5815 5.5368
4.4 6.1253 6.0313 5.9500 5.8789 5.8161 5.7598 5.7092 5.6635
4.5 6.2621 6.1660 6.0829 6.0103 5.9461 5.8886 5.8369 5.7902
4.6 6.3989 6.3008 6.2159 6.1418 6.0762 6.0174 5.9647 5.9170
4.7 6.5358 6.4356 6.3490 6.2733 6.2064 6.1463 6.0925 6.0438
4.8 6.6727 6.5704 6.4821 6.4048 6.3365 6.2743 6.2203 6.1707
4.9 1.0000 6.8096 6.7053 6.6152 6.5364 6.4667 6.4043 6.3482 6.2975
5.0 6.9466 6.8402 6.7483 6.6680 6.5970 6.5333 6.4761 6.4245
5.1 7.0836 6.9752 6.8815 6.7996 6.7272 6.6623 6.6040 6.5514
5.2 7.2207 7.1102 7.0147 6.9313 6.8575 6.7914 6.7320 6.6784
5.3 7.3578 7.2452 7.1480 7.0630 6.9878 6.9205 6.8600 6.8054
5.4 7.4949 7.3803 7.2813 7.1947 7.1182 7.0496 6.9880 6.9324
5.5 7.6321 7.5154 7.4146 7.3264 7.2486 7.1787 7.1160 7.0594
5.6 7.7693 7.6505 7.5479 7.4582 7.3790 7.3079 7.2441 7.1865
5.7 7.9065 7.7856 7.6813 7.5900 7.5094 7.4371 7.3722 7.3136
5.8 8.0437 7.9208 7.8146 7.7218 7.6398 7.5663 7.5003 7.4407
5.9 8.1809 8.0560 7.9480 7.8537 7.7703 7.6955 7.6284 7.5678
6.0 8.3182 8.1912 8.0815 7.9855 7.9008 7.8248 7.7566 7.6960
6.1 8.4555 8.3264 8.2149 8.1174 8.0313 7.9540 7.8847 7.8226
6.2 8.5928 8.4617 8.3484 8.2493 8.1618 8.0833 8.0129 7.9493
6.3 8.7302 8.5969 8.4818 8.3812 8.2924 8.2126 8.1411 8.0765
6.4 8.8675 8.7322 8.6153 8.5131 8.4229 8.3420 8.2693 8.2037
6.5 9.0049 8.8675 8.7488 8.6451 8.5535 8.4713 8.3975 8.3309
6.6 9.1423 9.0028 8.8824 8.7771 8.6841 8.6006 8.5258 8.4582
6.7 9.2797 9.1382 9.0159 8.9090 8.8147 8.7300 8.6540 8.5854
6.8 9.4171 9.2735 9.1495 9.0410 8.9453 8.8594 8.7823 8.7127
6.9 9.5546 9.4089 9.2830 9.1730 9.0759 8.9888 8.9106 8.8400
7.0 9.6920 9.5443 9.4166 9.3051 9.2065 9.1182 9.0389 8.9672
7.1 9.8295 9.6797 9.5502 9.4371 9.3372 9.2476 9.1672 9.0945
7.2 9.9670 9.8151 9.6838 9.5691 9.4679 9.3770 9.2955 9.2219
7.3 10.1045 9.9505 9.8175 9.7012 9.5985 9.5064 9.4238 9.3492
7.4 10.2420 10.0859 9.9511 9.8333 9.7292 9.6359 9.5521 9.4765
7.5 10.3795 10.2213 10.0847 9.9653 9.8599 9.7653 9.6805 9.6038
7.6 10.5170 10.3568 10.2184 10.0974 9.9906 9.8948 9.8088 9.7312
7.7 10.6546 10.4923 10.3521 10.2295 10.1213 10.0243 9.9372 9.8585
7.8 10.7921 10.6277 10.4857 10.3616 10.2521 10.1538 10.0656 9.9859
7.9 10.9297 10.7632 10.6194 10.4938 10.3828 10.2833 10.1940 10.1133
8.0 11.0672 10.8987 10.7531 10.6259 10.5135 10.4128 10.3223 10.2407

TABLE A-5.--Continued.

(c) Sample sizes 21 to 28

Safety    Probability,                          Sample size, N
margin,       Pa
  SM                      21       22       23       24       25       26       27       28

-5.0     0      -4.1579  -4.1740  -4.1893  -4.2036  -4.2172  -4.2300  -4.2422  -4.2538
-4.0     0      -3.3100  -3.3233  -3.3358  -3.3476  -3.3587  -3.3693  -3.3793  -3.3888
-3.0    .0013   -2.4568  -2.4673  -2.4772  -2.4865  -2.4954  -2.5037  -2.5116  -2.5192
-2.0    .0227   -1.5917  -1.5998  -1.6073  -1.6145  -1.6212  -1.6276  -1.6336  -1.6393
-1.0    .1586    -.6954   -.7017   -.7077   -.7133   -.7186   -.7235   -.7283   -.7327
  0     .5000     .2893    .2821    .2755    .2694    .2636    .2582    .2531    .2483
  .1    .5398     .3956    .3882    .3813    .3749    .3689    .3633    .3580    .3530
  .2    .5792     .5035    .4958    .4886    .4819    .4757    .4698    .4643    .4591
  .3    .6179     .6130    .6049    .5973    .5903    .5837    .5776    .5719    .5664
  .4    .6554     .7239    .7153    .7074    .7000    .6931    .6867    .6807    .6750
  .5    .6914     .8362    .8271    .8188    .8110    .8037    .7970    .7906    .7847
  .6    .7257     .9497    .9402    .9314    .9232    .9155    .9084    .9017    .8955
  .7    .7580    1.0644   1.0543   1.0450   1.0364   1.0283   1.0208   1.0138   1.0072
  .8    .7881    1.1802   1.1695   1.1597   1.1506   1.1421   1.1342   1.1269   1.1199
  .9    .8159    1.2969   1.2857   1.2754   1.2658   1.2568   1.2485   1.2407   1.2334
 1.0    .8413    1.4146   1.4028   1.3919   1.3818   1.3724   1.3636   1.3554   1.3477
 1.1    .8643    1.5331   1.5207   1.5092   1.4985   1.4886   1.4794   1.4708   1.4627
 1.2    .8849    1.6524   1.6392   1.6271   1.6159   1.6055   1.5959   1.5868   1.5783
 1.3    .9031    1.7723   1.7585   1.7457   1.7340   1.7231   1.7129   1.7034   1.6945
 1.4    .9192    1.8927   1.8783   1.8649   1.8526   1.8411   1.8305   1.8205   1.8112
 1.5    .9331    2.0137   1.9985   1.9846   1.9716   1.9596   1.9485   1.9380   1.9283
 1.6    .9452    2.1352   2.1193   2.1047   2.0911   2.0786   2.0669   2.0560   2.0458
 1.7    .9554    2.2571   2.2405   2.2252   2.2111   2.1979   2.1857   2.1743   2.1637
 1.8    .9640    2.3794   2.3621   2.3461   2.3313   2.3176   2.3049   2.2930   2.2819
 1.9    .9712    2.5021   2.4839   2.4673   2.4519   2.4376   2.4244   2.4120   2.4004
 2.0    .9772    2.6250   2.6061   2.5888   2.5728   2.5579   2.5441   2.5312   2.5192
 2.1    .9821    2.7482   2.7286   2.7105   2.6939   2.6784   2.6641   2.6507   2.6382
 2.2    .9860    2.8716   2.8513   2.8325   2.8152   2.7992   2.7843   2.7704   2.7574
 2.3    .9892    2.9953   2.9742   2.9547   2.9368   2.9202   2.9047   2.8903   2.8768
 2.4    .9918    3.1192   3.0973   3.0772   3.0586   3.0413   3.0253   3.0104   2.9964
 2.5    .9937    3.2433   3.2206   3.1998   3.1805   3.1627   3.1461   3.1306   3.1162
 2.6    .9953    3.3676   3.3441   3.3225   3.3026   3.2842   3.2670   3.2510   3.2361
 2.7    .9965    3.4920   3.4677   3.4454   3.4249   3.4058   3.3881   3.3716   3.3561
 2.8    .9974    3.6165   3.5915   3.5685   3.5472   3.5276   3.5093   3.4922   3.4763
 2.9    .9981    3.7412   3.7154   3.6916   3.6697   3.6495   3.6306   3.6130   3.5966
 3.0    .9986    3.8660   3.8394   3.8149   3.7924   3.7714   3.7520   3.7339   3.7170
 3.1    .9990    3.9909   3.9635   3.9383   3.9151   3.8935   3.8735   3.8549   3.8374
 3.2    .9993    4.1160   4.0877   4.0618   4.0379   4.0157   3.9951   3.9759   3.9580
 3.3    .9995    4.2411   4.2120   4.1854   4.1608   4.1380   4.1168   4.0971   4.0786
 3.4    .9996    4.3663   4.3364   4.3090   4.2838   4.2603   4.2386   4.2183   4.1994
 3.5    .9997    4.4916   4.4609   4.4328   4.4068   4.3828   4.3604   4.3396   4.3201
 3.6    .9998    4.6169   4.5855   4.5566   4.5299   4.5053   4.4823   4.4610   4.4410
 3.7    .9998    4.7423   4.7101   4.6805   4.6531   4.6278   4.6043   4.5824   4.5619

TABLE A-5.--Continued.

(c) Concluded.

Safety    Probability,                          Sample size, N
margin,       Pa
  SM                      21       22       23       24       25       26       27       28

 3.8    0.9999   4.8678   4.8348   4.8044   4.7764   4.7504   4.7263   4.7039   4.6829
 3.9             4.9934   4.9595   4.9284   4.8997   4.8731   4.8484   4.8254   4.8039
 4.0             5.1190   5.0843   5.0524   5.0231   4.9958   4.9706   4.9470   4.9250
 4.1             5.2447   5.2091   5.1765   5.1465   5.1186   5.0927   5.0686   5.0461
 4.2             5.3704   5.3340   5.3007   5.2699   5.2414   5.2150   5.1903   5.1673
 4.3             5.4961   5.4590   5.4259   5.3934   5.3643   5.3372   5.3120   5.2885
 4.4             5.6219   5.5840   5.5491   5.5170   5.4872   5.4595   5.4338   5.4097
 4.5             5.7478   5.7090   5.6734   5.6405   5.6101   5.5819   5.5556   5.5310
 4.6             5.8737   5.8340   5.7977   5.7641   5.7331   5.7043   5.6774   5.6523
 4.7             5.9996   5.9591   5.9220   5.8878   5.8561   5.8267   5.7992   5.7736
 4.8             6.1255   6.0843   6.0464   6.0115   5.9791   5.9491   5.9211   5.8950
 4.9    1.0000   6.2515   6.2094   6.1708   6.1352   6.1022   6.0716   6.0430   6.0164
 5.0             6.3775   6.3346   6.2952   6.2589   6.2253   6.1941   6.1650   6.1378
 5.1             6.5035   6.4598   6.4197   6.3827   6.3484   6.3166   6.2869   6.2592
 5.2             6.6296   6.5851   6.5442   6.5065   6.4716   6.4391   6.4089   6.3807
 5.3             6.7557   6.7103   6.6687   6.6303   6.5947   6.5617   6.5309   6.5022
 5.4             6.8818   6.8356   6.7932   6.7541   6.7179   6.6843   6.6530   6.6237
 5.5             7.0080   6.9609   6.9178   6.8780   6.8411   6.8069   6.7750   6.7452
 5.6             7.1341   7.0863   7.0423   7.0019   6.9644   6.9295   6.8971   6.8668
 5.7             7.2603   7.2116   7.1669   7.1257   7.0876   7.0522   7.0192   6.9883
 5.8             7.3865   7.3370   7.2916   7.2497   7.2109   7.1749   7.1413   7.1099
 5.9             7.5127   7.4624   7.4162   7.3736   7.3342   7.2975   7.2634   7.2315
 6.0             7.6390   7.5878   7.5408   7.4975   7.4575   7.4202   7.3856   7.3531
 6.1             7.7652   7.7132   7.6655   7.6215   7.5808   7.5429   7.5077   7.4748
 6.2             7.8915   7.8387   7.7902   7.7455   7.7041   7.6657   7.6299   7.5964
 6.3             8.0178   7.9641   7.9149   7.8695   7.8275   7.7884   7.7521   7.7181
 6.4             8.1441   8.0896   8.0396   7.9935   7.9508   7.9112   7.8743   7.8398
 6.5             8.2704   8.2151   8.1643   8.1175   8.0742   8.0340   7.9965   7.9614
 6.6             8.3967   8.3406   8.2891   8.2415   8.1976   8.1567   8.1187   8.0831
 6.7             8.5231   8.4661   8.4138   8.3656   8.3210   8.2795   8.2409   8.2048
 6.8             8.6494   8.5916   8.5386   8.4897   8.4444   8.4023   8.3632   8.3266
 6.9             8.7758   8.7172   8.6633   8.6137   8.5678   8.5252   8.4854   8.4483
 7.0             8.9022   8.8427   8.7881   8.7378   8.6912   8.6480   8.6077   8.5700
 7.1             9.0285   8.9683   8.9129   8.8619   8.8147   8.7708   8.7300   8.6918
 7.2             9.1549   9.0938   9.0377   8.9860   8.9381   8.8937   8.8522   8.8135
 7.3             9.2814   9.2194   9.1625   9.1101   9.0616   9.0165   8.9745   8.9353
 7.4             9.4078   9.3450   9.2873   9.2342   9.1850   9.1394   9.0968   9.0571
 7.5             9.5342   9.4706   9.4122   9.3583   9.3085   9.2622   9.2191   9.1788
 7.6             9.6606   9.5962   9.5370   9.4825   9.4320   9.3851   9.3414   9.3006
 7.7             9.7871   9.7218   9.6619   9.6066   9.5555   9.5080   9.4638   9.4224
 7.8             9.9135   9.8474   9.7867   9.7308   9.6790   9.6309   9.5861   9.5442
 7.9            10.0400   9.9730   9.9116   9.8549   9.8025   9.7538   9.7084   9.6660
 8.0            10.1665  10.0987  10.0365   9.9791   9.9260   9.8767   9.8308   9.7879



TABLE A-5.--Continued.

(d) Sample sizes 30 to 100

Safety    Probability,                          Sample size, N
margin,       Pa
  SM                      30       40       50       60       70       80       90      100

-5.0     0      -4.2753  -4.3596  -4.4191  -4.4640  -4.4996  -4.5286  -4.5530  -4.5738
-4.0     0      -3.4065  -3.4757  -3.5245  -3.5613  -3.5905  -3.6143  -3.6343  -3.6514
-3.0    .0013   -2.5332  -2.5879  -2.6264  -2.6555  -2.6785  -2.6973  -2.7130  -2.7265
-2.0    .0227   -1.6500  -1.6916  -1.7208  -1.7427  -1.7601  -1.7742  -1.7861  -1.7962
-1.0    .1586    -.7411   -.7732   -.7954   -.8121   -.8251   -.8358   -.8446   -.8522
  0     .5000     .2394    .2061    .1837    .1673    .1547    .1445    .1361    .1290
  .1    .5398     .3439    .3094    .2864    .2696    .2566    .2462    .2376    .2304
  .2    .5792     .4496    .4138    .3901    .3727    .3594    .3487    .3399    .3325
  .3    .6179     .5565    .5193    .4946    .4767    .4629    .4519    .4428    .4352
  .4    .6554     .6646    .6257    .6000    .5814    .5671    .5557    .5464    .5385
  .5    .6914     .7737    .7331    .7063    .6869    .6721    .6602    .6505    .6424
  .6    .7257     .8840    .8414    .8134    .7931    .7777    .7654    .7553    .7468
  .7    .7580     .9951    .9505    .9211    .9000    .8839    .8710    .8605    .8517
  .8    .7881    1.1072   1.0603   1.0296   1.0075    .9906    .9773    .9663    .9571
  .9    .8159    1.2201   1.1708   1.1386   1.1155   1.0979   1.0839   1.0725   1.0630
 1.0    .8413    1.3337   1.2820   1.2482   1.2240   1.2056   1.1911   1.1791   1.1692
 1.1    .8643    1.4450   1.3937   1.3583   1.3330   1.3138   1.2986   1.2861   1.2757
 1.2    .8849    1.5628   1.5059   1.4689   1.4424   1.4223   1.4064   1.3935   1.3826
 1.3    .9031    1.6782   1.6186   1.5799   1.5522   1.5312   1.5146   1.5011   1.4898
 1.4    .9192    1.7941   1.7317   1.6912   1.6623   1.6404   1.6231   1.6090   1.5972
 1.5    .9331    1.9105   1.8452   1.8029   1.7727   1.7499   1.7319   1.7171   1.7049
 1.6    .9452    2.0272   1.9590   1.9149   1.8834   1.8596   1.8408   1.8255   1.8127
 1.7    .9554    2.1442   2.0731   2.0271   1.9944   1.9696   1.9500   1.9341   1.9208
 1.8    .9640    2.2616   2.1875   2.1396   2.1055   2.0797   2.0594   2.0428   2.0290
 1.9    .9712    2.3793   2.3021   2.2523   2.2169   2.1901   2.1689   2.1517   2.1374
 2.0    .9772    2.4972   2.4170   2.3652   2.3284   2.3006   2.2786   2.2608   2.2459
 2.1    .9821    2.6154   2.5320   2.4782   2.4401   2.4112   2.3885   2.3700   2.3545
 2.2    .9860    2.7337   2.6472   2.5915   2.5519   2.5220   2.4984   2.4792   2.4633
 2.3    .9892    2.8523   2.7626   2.7049   2.6639   2.6329   2.6086   2.5887   2.5721
 2.4    .9918    2.9710   2.8782   2.8184   2.7759   2.7439   2.7187   2.6982   2.6810
 2.5    .9937    3.0899   2.9938   2.9320   2.8881   2.8550   2.8290   2.8078   2.7901
 2.6    .9953    3.2089   3.1096   3.0457   3.0004   2.9663   2.9393   2.9174   2.8992
 2.7    .9965    3.3280   3.2255   3.1596   3.1128   3.0776   3.0498   3.0272   3.0084
 2.8    .9974    3.4473   3.3416   3.2735   3.2253   3.1889   3.1603   3.1370   3.1176
 2.9    .9981    3.5667   3.4577   3.3875   3.3378   3.3004   3.2709   3.2469   3.2269
 3.0    .9986    3.6861   3.5738   3.5016   3.4505   3.4119   3.3815   3.3568   3.3362
 3.1    .9990    3.8057   3.6901   3.6158   3.5631   3.5234   3.4922   3.4668   3.4456
 3.2    .9993    3.9253   3.8064   3.7300   3.6759   3.6351   3.6030   3.5768   3.5551
 3.3    .9995    4.0451   3.9228   3.8443   3.7887   3.7467   3.7137   3.6869   3.6646
 3.4    .9996    4.1649   4.0393   3.9586   3.9015   3.8585   3.8246   3.7970   3.7741
 3.5    .9997    4.2847   4.1558   4.0730   4.0144   3.9702   3.9355   3.9072   3.8837
 3.6    .9998    4.4047   4.2724   4.1874   4.1273   4.0820   4.0464   4.0174   3.9933
 3.7    .9998    4.5247   4.3891   4.3019   4.2403   4.1939   4.1573   4.1276   4.1029

TABLE A-5.--Concluded.

(d) Concluded.

Safety    Probability,                          Sample size, N
margin,       Pa
  SM                      30       40       50       60       70       80       90      100

 3.8    0.9999   4.6447   4.5057   4.4165   4.3533   4.3057   4.2683   4.2379   4.2126
 3.9             4.7648   4.6224   4.5310   4.4664   4.4177   4.3796   4.3482   4.3223
 4.0             4.8849   4.7392   4.6456   4.5795   4.5296   4.4904   4.4585   4.4320
 4.1             5.0051   4.8560   4.7603   4.6926   4.6416   4.6014   4.5688   4.5417
 4.2             5.1253   4.9728   4.8749   4.8057   4.7536   4.7125   4.6792   4.6516
 4.3             5.2456   5.0897   4.9896   4.9189   4.8656   4.8237   4.7896   4.7612
 4.4             5.3659   5.2066   5.1044   5.0321   4.9776   4.9348   4.9000   4.8710
 4.5             5.4862   5.3235   5.2191   5.1453   5.0897   5.0460   5.0104   4.9809
 4.6             5.6066   5.4405   5.3339   5.2585   5.2018   5.1572   5.1209   5.0907
 4.7             5.7270   5.5575   5.4487   5.3718   5.3139   5.2684   5.2314   5.2006
 4.8             5.8474   5.6745   5.5635   5.4851   5.4260   5.3796   5.3418   5.3104
 4.9    1.0000   5.9679   5.7915   5.6784   5.5984   5.5382   5.4908   5.4523   5.4203
 5.0             6.0883   5.9086   5.7932   5.7117   5.6503   5.6021   5.5628   5.5302
 5.1             6.2088   6.0256   5.9081   5.8250   5.7625   5.7133   5.6734   5.6401
 5.2             6.3294   6.1427   6.0230   5.9384   5.8747   5.8246   5.7839   5.7500
 5.3             6.4499   6.2598   6.1379   6.0518   5.9869   5.9359   5.8945   5.8600
 5.4             6.5705   6.3770   6.2528   6.1651   6.0991   6.0472   6.0050   5.9699
 5.5             6.6911   6.4941   6.3678   6.2785   6.2113   6.1585   6.1156   6.0799
 5.6             6.8117   6.6113   6.4828   6.3919   6.3236   6.2698   6.2262   6.1898
 5.7             6.9323   6.7285   6.5977   6.5054   6.4358   6.3812   6.3368   6.2998
 5.8             7.0529   6.8456   6.7127   6.6188   6.5481   6.4925   6.4474   6.4098
 5.9             7.1735   6.9628   6.8277   6.7322   6.6604   6.6039   6.5580   6.5198
 6.0             7.2942   7.0801   6.9427   6.8457   6.7727   6.7152   6.6686   6.6298
 6.1             7.4149   7.1973   7.0577   6.9592   6.8850   6.8266   6.7792   6.7398
 6.2             7.5356   7.3145   7.1728   7.0726   6.9973   6.9380   6.8899   6.8498
 6.3             7.6563   7.4318   7.2878   7.1861   7.1096   7.0494   7.0005   6.9598
 6.4             7.7770   7.5490   7.4029   7.2996   7.2219   7.1608   7.1112   7.0699
 6.5             7.8978   7.6663   7.5179   7.4131   7.3342   7.2722   7.2218   7.1799
 6.6             8.0185   7.7836   7.6330   7.5266   7.4466   7.3836   7.3325   7.2899
 6.7             8.1393   7.9009   7.7481   7.6401   7.5589   7.4950   7.4432   7.4000
 6.8             8.2600   8.0182   7.8632   7.7537   7.6712   7.6064   7.5538   7.5100
 6.9             8.3808   8.1355   7.9783   7.8672   7.7836   7.7179   7.6645   7.6201
 7.0             8.5016   8.2528   8.0934   7.9807   7.8960   7.8283   7.7752   7.7302
 7.1             8.6224   8.3701   8.2085   8.0943   8.0083   7.9408   7.8859   7.8402
 7.2             8.7432   8.4875   8.3236   8.2078   8.1207   8.0522   7.9966   7.9503
 7.3             8.8640   8.6048   8.4387   8.3214   8.2331   8.1637   8.1073   8.0604
 7.4             8.9848   8.7222   8.5538   8.4349   8.3455   8.2751   8.2180   8.1705
 7.5             9.1056   8.8395   8.6690   8.5485   8.4579   8.3866   8.3287   8.2806
 7.6             9.2264   8.9569   8.7841   8.6621   8.5702   8.4981   8.4394   8.3906
 7.7             9.3473   9.0743   8.8992   8.7756   8.6826   8.6095   8.5502   8.5007
 7.8             9.4681   9.1916   9.0144   8.8892   8.7950   8.7210   8.6609   8.6108
 7.9             9.5890   9.3090   9.1296   9.0028   8.9075   8.8325   8.7716   8.7209
 8.0             9.7098   9.4264   9.2447   9.1164   9.0199   8.9440   8.8823   8.8310

Appendix B
Project Manager's Guide to Risk Management
and Product Assurance
Introduction

This appendix provides project managers with practical information about increasing the chances for project success by using the tools of risk management and product assurance. The elements of an effective product assurance program are described along with the benefits of using a product-assurance-oriented management approach to reduce project risk. The information should be especially useful to new project managers and to others concerned with specifying product assurance requirements or developing risk management or product assurance plans.

This appendix is written from the perspective of the NASA Glenn Research Center's Office of Safety and Assurance Technologies (OSAT). It begins with a general discussion of how OSAT supports projects at Glenn, including the roles and responsibilities of the project assurance lead. Then follow relevant discussions on reliability and quality assurance (R&QA) with respect to economics and requirements, performance-based contracting, and risk management. Finally, it describes frequently applied requirements from various product assurance disciplines. For project managers needing further information, a more comprehensive treatment of risk management and product assurance can be found in the references.

Risk Management and Product Assurance at the NASA Glenn Research Center

The NASA Glenn Office of Safety and Assurance Technologies advises the various project offices on risk management, safety, and product-assurance-related issues. Also, consistent with the NASA Policy Directive on safety and mission success (ref. B-1), OSAT conducts independent assessment activities to reduce risk. Typically, it is more actively involved in flight projects, where the risks of failure are often greater and potentially more severe. However, risk management and product assurance tools can be applied to ground-based projects as well.

Flight projects at Glenn normally develop risk management and product assurance plans to define how they will manage risks and address the applicable product assurance requirements. For many Glenn flight projects, product assurance requirements are specified in the Glenn Standard Assurance Requirements and Guidelines for Experiments (ref. B-2).

The Office of Safety and Assurance Technologies helps Glenn project managers develop their risk management and product assurance plans and recommends ways to mitigate risks and meet applicable product assurance requirements. To this end, OSAT developed and maintains the Glenn Product Assurance Manual (ref. B-3), which contains numerous product assurance instructions that give suggestions for system safety, quality, reliability and maintainability, software, and materials and processes. Glenn projects often use these instructions as is or tailor them to meet specific needs.

Project Assurance Lead

Role

The project assurance lead is OSAT's principal point of contact with the project and serves as an important advisor to the project manager.
The lead provides guidance and advice during the preparation of project, risk management, and product assurance plans; the generation of statements of work; the review of bidders' proposals; and final contract negotiations. The project assurance lead, normally shown in the project organization chart in a staff position reporting to the project manager, works closely with the project office to ensure that risk management and product assurance activities are consistent with the uniqueness of the project and are as cost effective as possible.

Responsibilities

The project assurance lead helps the project manager identify and mitigate risks and ensures that product assurance principles are applied to the design, manufacture, test, handling, installation, and operation of the project. The lead identifies and provides the product assurance technical support needed to ensure that applicable risk, safety, reliability, maintainability, quality assurance, materials and processes, and software requirements are satisfied.

Economics of OSAT

Classical curves in figure B-1 show the relationship of product quality cost and operational cost to product quality. To achieve a very small percentage of product defects (high quality), product quality cost becomes extremely high. Conversely, if the percentage of defects is high (poor quality), operational cost becomes extremely high. The intersection of the two cost curves gives the optimum goal from a cost viewpoint. When finalizing product assurance requirements for a project, the project manager should keep the optimum cost goal in mind. However, from an engineering perspective, there may be some critical items for which additional safeguards must be established and the need for close risk control is mandatory. In this situation, economics is still an important consideration.

[Figure B-1 plots product quality cost and operational cost against quality (percent defective); the curves cross at the optimum cost point.]
Figure B-1.--Relationship of product quality cost to operational cost.

Development of OSAT Requirements

Product assurance is a broad and diverse discipline that has overlapping authority with procurement, engineering, manufacturing, and testing. This problem has been mitigated to some degree at NASA Glenn by developing and using standard product assurance requirements where possible and by assigning experienced project assurance leads to assist projects in defining OSAT requirements.

The project assurance lead typically has an extensive OSAT background and can apply skills, training, and project experience to tailor product assurance requirements to be reasonable in scope and easily understood. In addition, the project assurance lead is responsible for assuring that the product assurance program is consistent with project objectives and that it can satisfy mission requirements.

To illustrate how product assurance requirements can be tailored, table B-1 lists the actual requirements imposed on 10 Glenn contracts and identifies the particular project phase associated with each contract.

Effect of Performance-Based Contracting

Even though the government has moved to performance-based contracting, a disciplined, organized approach to product assurance is still essential to minimize safety risks and to maximize chances for mission success. Although the government seeks to avoid imposing "how to" requirements on performance-based contractors, these contractors still should follow good product assurance practices. To verify their doing so, the government develops and implements surveillance plans to obtain information about performance. This verification is accomplished primarily through "insight" rather than through the more traditional "oversight." (Insight relies on reviewing contractor-generated data and minimizes the amount of direct government involvement; conversely, oversight is more intrusive because it normally involves direct government monitoring of contractor processes and activities.)

Risk Management and Product Assurance Plans

NASA programs and projects are required to use risk management as an integral part of their management process (ref. B-4). This requirement includes developing and implementing a risk management plan to identify, analyze, mitigate, track, and control program and/or project risks as part of a continuous risk management process (ref. B-5).
TABLE B-1.--RELIABILITY AND QUALITY ASSURANCE REQUIREMENTS IMPOSED
ON VARIOUS PROGRAM TYPES

[Composite wind turbine blades, C; global air sampling program, G; lift/cruise fan, L; materials for advanced turbine engines, M; electrical power processor, P; quiet, clean, short-haul experimental engine, Q; JT8D refan engines, R; space experiments, S; variable-cycle engine, V; 200-kW wind turbine generators, W. In the original table the contracts are arrayed under Aeronautics (study, advanced technology, development, and flight), Space (development and flight), and Energy (development and operational) program phases.]

Requirement: contracts imposing it
Reliability program plan: P
Reliability program control: S
Reliability program reporting: S
Reliability training
Supplier control
Reliability of Government-furnished property
Design specifications: S
Reliability prediction: P
Failure mode and effects analysis
Maintainability and human-induced failures: L, S
Design reviews: G
Failure reporting and corrective action: Q, R, G, S
Standardization of design practices
Parts program: P, W
Reliability evaluation plan: S
Testing: P
Reliability assessment: S
Reliability inputs to readiness review: S
Reliability evaluation program reviews
Quality status reporting
Government audits; quality program audits: Q, R, W
Quality program plan: Q, R, W
Technical documents; quality support/design reviews: M, C
Change control: Q, R, G
Identification control: Q, R, G
Data retrieval
Source selection: M, Q, R, G, C, W
Procurement documents: Q, R, G, C
Quality assurance at source: Q, G, W
Receiving inspection: M, Q, R, G
Receiving inspection records: M, Q, R, G
Supplier rating system
Postaward surveys
Coordinate supplier inspection and tests
Nonconformance information feedback
Fabrication operations: Q, R, G
Article and material control: M, Q, R, C, W
Cleanliness control: C, W
Process control: Q, R, G, C, W
Workmanship standards: M, C
TABLE B-1.--Concluded.

Requirement: contracts imposing it
Inspection and test planning: Q, R
Inspection records; inspection and test performance: M, Q, R, G, S, C, W
Contractor quality control actions: S
Nonconformance control: M, Q, R, G, S, C
Nonconformance documentation: M, Q, R, S, C
Failure analysis and corrective action: M, Q, R, G, S
Material review: Q, R, G, C, W
Material review board: Q, R, S
Contracting officer approval: S
Supplier material review board: S
Inspection of test equipment and standards
Evaluation of standards and test equipment: M
Measurement accuracy: S
Calibration accuracy: M, S
Calibration control: V, M, Q, R, G, S, C, W
Environmental requirements
Remedial and preventive action (calibration): R, S
Stamp control system: Q, R, W
Stamp restriction: S
Handling and storage: Q, R, G, S, W
Preserving, marking, packaging, and packing: Q, R, S, W, C
Shipping
Sampling plans: R
Statistical planning and analysis: G, S
Contractor's responsibility for Government property: Q, R, W
Unsuitable Government property: Q, R, G, W
At NASA Glenn, OSAT serves as a risk management consultant to the project manager by offering OSAT risk management training, helping to prepare risk management and/or product assurance plans, conducting risk assessments, helping to track risks, and providing other valuable support to facilitate the risk management process.

An effective product assurance program is an essential ingredient for successfully managing risks. It provides the framework and discipline needed to support a structured risk management approach, a characteristic of many successful projects. The project manager can rely on an effective product assurance program to help mitigate risks in many key areas and thereby serve as an important risk management tool.

Development and Implementation of Product Assurance Plans

As part of an overall risk reduction strategy, Glenn projects and contractors develop and implement product assurance plans to define and perform the tasks necessary to satisfy
applicable product assurance requirements. The plans are intended to establish a disciplined, organized approach to product assurance, thereby minimizing safety risks and maximizing the chances for mission or project success.

The product assurance plan normally includes a description of assurance activities in the areas or disciplines discussed next.

Assurance Reviews

Assurance reviews help to ensure that the engineering development and documentation have sufficiently progressed and that the design and hardware are sufficiently mature to justify moving to the next phase of the project. These reviews ultimately require the project to demonstrate that the components, subsystems, and system can successfully perform their intended function under flightlike operating and environmental conditions.

Verification Plan

As part of its product assurance effort, the project develops a verification plan to describe the tests, analyses, and inspections to be conducted to demonstrate hardware and/or software functionality and ability to safely survive expected environmental extremes. The purpose of the verification program is to ensure that the payload and/or experiment meets all specified mission requirements. This activity includes verifying that the design complies with the requirements and that the hardware/software complies with the mission.

Verification testing includes functional and environmental tests to demonstrate the ability to meet performance requirements. Environmental tests consist of thermal cycling, random vibration, and electromagnetic interference (EMI). Note that environmental stress screening is an effective product assurance tool that project managers can use to verify the adequacy of system design and workmanship.

System Safety

System safety is a critical element in the product assurance plan. Each project must develop and implement a comprehensive system safety program to ensure project compliance with all applicable safety requirements, both flight and ground. Potential safety hazards must be identified and controlled to reduce the risk of injuring personnel or damaging equipment. The Office of Safety and Assurance Technologies provides direct safety support or consultation to guide projects through the NASA safety review process (refs. B-6 to B-11); it helps projects determine the best design solution to meet specific safety requirements, conducts hazard analyses, generates hazard reports, develops safety compliance data packages, supports safety reviews, and resolves safety issues with integration centers or payload safety review panels.

Materials and Processes

To assure safety and promote mission success, projects must exercise care in the selection, processing, inspection, and testing of materials. Prudent project managers invoke a comprehensive materials and processes (M&P) program to ensure that materials meet applicable requirements for flammability, toxic off-gassing, vacuum out-gassing, corrosion, fluid compatibility, and shelf-life control. This program and the associated M&P assurance activities are documented in the product assurance plan.

Projects prepare material identification and usage lists (MIUL's) and attempt to use compliant materials to the maximum extent possible. Regarding materials usage, projects work with and seek the advice of OSAT in several ways: justification for the use of a noncompliant material for a particular application and its selection for that application; preparation of material usage agreements (MUA's) that contain the rationale for using any noncompliant materials; assurances that fabrication and other manufacturing processes be performed in accordance with accepted practices or approved procedures; and the issuance of a materials certification letter, in concert with the applicable NASA Materials and Processes Inter-Center Agreement, when the materials and processes used by the project are shown to be acceptable.

Some applications require the certification of metallic and nonmetallic materials to assure that the chemical and physical properties of the materials are compatible with the design requirements. After materials are selected by the engineer and are precisely defined by a specification (Federal, Society of Automotive Engineers, American Society for Testing and Materials, or other available standards), the purchase order for steels, aluminum alloys, brass, welding rods, solder, metal coatings, gases, and potting compounds should require that a test report, a certificate of conformance (fig. B-2), or both accompany the vendor's shipment. In addition to the vendor's certificate, it may be necessary to conduct periodic in-house tests of metallic and nonmetallic materials to assure their continued conformance.

[Scanned vendor laboratory report of chemical analysis and mechanical tests.]
Figure B-2.--Typical material certification.

Quality Assurance

Quality assurance (QA), another critical element of an effective product assurance program, is documented in the product assurance plan and helps a project establish and satisfy quality requirements through all phases of the project life cycle. Quality assurance (1) promotes discipline, encouraging projects to design in quality and ensure good workmanship by using
proper controls during design, fabrication, assembly, and test; (2) ensures that hardware and software conform to design requirements and that documentation accurately reflect those requirements; and (3) ensures that flight hardware be maintained in a sufficiently clean environment to prevent exposure to any contaminants that could degrade performance and possibly compromise the achievement of mission objectives.

OSAT assists projects in developing effective quality management systems to address areas such as configuration control, procurement, fabrication, inspection, electrostatic discharge control, and nonconformance control. It also performs quality audits of fabrication sources, establishes inspection requirements, provides inspection and/or test monitoring services, makes dispositions for nonconforming material, and ensures that facilities maintain proper environmental controls. Project managers should be familiar with the good QA practices cited in the following sections.

Review of Drawings

Before releasing the engineering drawings to the manufacturer, design engineers may avail themselves of the technical services provided by quality engineers when developing specification callouts in the note section of the drawings (fig. B-3). Give precise information on materials, surface finish, processing, nondestructive testing, cleanliness, identification, and packaging. Special instructions and notes are important in obtaining a quality product.
Changes in Engineering Documents

Early in the design phase, establish a system to control changes (fig. B-4) in engineering documents and to remove obsolete documents. Changes in released drawings, specifications, test procedures, and related documents can be critical, particularly during the building and testing phases. For this reason, process the latest engineering data early to expedite their distribution to the participating line organizations.

Use of a Process Plan

Identify in a plan (fig. B-5) the manufacturing operations that must be performed in a particular sequence. The most commonly used processes are machining, mechanical fastening, grinding, brazing, welding, soldering, polishing, coating, plating, radiography, ultrasonics, fluorescent penetrant inspection, magnetic particle inspection, painting, bonding, heat treating, identification marking, and safety wiring.
[Scanned engineering change order form.]
Figure B-4.--Typical engineering change order.
[Scanned NASA Lewis Research Center process plan form.]
Figure B-5.--Typical process plan.
Calibration of Measuring Devices

Calibrate instruments when physical quantities are to be measured with any degree of accuracy. Calibration includes repair, periodic (recall) maintenance, and determination of the accuracy (adjustments made as required) of the measuring devices as compared with known standards from the National Institute of Standards and Technology. Figure B-6 shows a typical certificate of calibration.

Inspection of Hardware Items

Quality control inspectors check in-process items against acceptable quality standards and engineering documents (fig. B-7). Minor deviations from good quality practices are normally resolved at the worksite; otherwise they are brought to the attention of the inspection supervisor. If the quality standard being violated is not contained in an engineering document, the supervisor may review the inspector's decision if risks are involved. If the discrepancy is a characteristic defined by an engineering document, the final decision is made by material review engineering and product assurance representatives or the material review board.

Nonconformance of Hardware

When hardware is to be built, some provision must be made for the orderly review and disposition of all items that are determined by inspection or test as not conforming to the drawing, specification, or workmanship requirements. The system most frequently used comprises two procedures:

(1) An engineer or a product assurance representative is authorized to review and decide whether hardware can be reworked into a conforming condition without an engineering change, an instruction, or both.

(2) The material review board reviews hardware that cannot be reworked to meet the engineering specifications. The board consists of engineering, product assurance, and when required, government representatives. In difficult situations, the board members consult with other organizations and persons to arrive at the minimum-risk decision.
[Scanned certificate of calibration (Western Automatic Test Services) stating that the listed equipment was calibrated with measurements traceable to the National Bureau of Standards.]
Figure B-6.--Typical certificate of calibration.
[Scanned assembly data sheet showing mandatory quality-control hold points for deburring, cleaning, part stacking, brazing, and cold-test operations.]
Figure B-7.--Typical mandatory quality control inspection points.
Documentation of Equipment Discrepancies

In a design, certain characteristics are distinct, describable, and measurable in engineering units. Critical characteristics are generally identified by engineering documents and are closely controlled by quality assurance personnel. Whenever a design characteristic is determined to be nonconforming to released engineering requirements, one of the following reporting procedures must be followed:

(1) A minor discrepancy is recorded in a discrepancy log (fig. B-8). A disposition must be made by an engineer, an inspector, or both if the condition is a minor discrepancy (e.g., a scratch on a metal surface or excess material) that does not adversely affect form, fit, or function and the hardware can be used "as is" or reworked to engineering requirements.

(2) A failure discrepancy report is written and a disposition is obtained through the engineering review board (ERB) if a mechanical, electrical, or electronic system or subsystem has failed to perform within the limits of a critical characteristic identified by an engineering drawing, specification, test procedure, or related engineering document.

Quality Assurance Documentation of Production, Inspection, and Test Operations

Manufacturing, inspecting, testing, and related operations for major assemblies and subassemblies should be documented for several reasons. Such documentation can provide a status record of the work in progress and the work completed. Also, it can become a part of the permanent record of production, inspection, and test operations. The sophistication of the format and the entries in the log can be adjusted to suit the type of contract--research, development, or production. The chronological entries in the log can be summarized and included in an acceptance data package, which contains information helpful to review during a contractor's acceptance of a supplier's equipment or during final Government acceptance of a contract end item. Figure B-9 shows a checklist used to determine if an item conforms to specifications.
[Scanned discrepancy log sheet.]
Figure B-8.--Typical discrepancy log.
2.0 Quality assurance checklist for conformance to specifications of
    Communications Technology Satellite (CTS) output stage tube (OST)

    OST S/N: 2021        Classification: QTM-2 (QF-2)

2.1 Overall efficiency
    Specification: 50 percent minimum over CTS band of 12.038 to 12.123 GHz, at saturation
    Actual: 40.7 percent minimum at 12.040 GHz. Out of specification. (Waiver required.)

2.2 Center frequency
    Specification: 12.0805 GHz
    Actual: 12.0805 GHz

2.3 RF power output
    Specification: 200 W minimum at saturation over CTS band of 12.038 to 12.123 GHz
    Actual: 170 W minimum at 12.040 GHz. Out of specification. (Waiver required.)

2.4 Small signal bandwidth
    Specification: 3 dB maximum peak to peak measured at 10 dB below peak saturation
    over the CTS band, 12.038 to 12.123 GHz
    Actual: 2.4 dB maximum peak to peak

Figure B-9.--Checklist for item conformance to specifications.
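The pass/fail logic of such a checklist is simple to mechanize. The short sketch below is not from the NASA text; it is a hypothetical illustration that compares each measured value against its specification limit and flags out-of-specification items for waiver action, using the limits and readings from figure B-9 as sample data.

    # Hypothetical sketch: flag out-of-spec checklist items (values from fig. B-9).
    checklist = [
        # (item, comparison, specification limit, measured value)
        ("Overall efficiency, percent",      ">=", 50.0,    40.7),
        ("Center frequency, GHz",            "==", 12.0805, 12.0805),
        ("RF power output, W",               ">=", 200.0,   170.0),
        ("Small signal bandwidth, dB (p-p)", "<=", 3.0,     2.4),
    ]

    for item, op, spec, actual in checklist:
        in_spec = {">=": actual >= spec,
                   "<=": actual <= spec,
                   "==": actual == spec}[op]
        verdict = "in spec" if in_spec else "OUT OF SPEC (waiver required)"
        print(f"{item}: spec {op} {spec}, actual {actual} -> {verdict}")

Run against the figure B-9 data, this reports the efficiency and RF power items as out of specification, matching the waiver notations on the checklist.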

Safety and Mission Assurance for Suppliers of Materials and Services

Materials and services acquired by the user from outside sources must satisfy contract, Government, or company reliability and quality assurance requirements. The user's system of control should involve

(1) Selecting acceptable or qualified sources
(2) Performing surveys and audits of the supplier's facilities
(3) Inspecting the received supplier's products
(4) Reporting and taking corrective action for problems that occur

Reliability and Maintainability

An effective reliability and maintainability (R&M) program can ensure that a project's hardware and software meet mission design life and availability requirements. The R&M program is documented in the project's product assurance plan and includes tests, analyses, and other assurance activities to demonstrate that the project can meet the reliability and availability goals established. The program may also include maintainability analyses or demonstrations to show that equipment can be adequately maintained based on expected component failure rates.

Several ways that OSAT assists and works with projects to ensure that hardware and software meet R&M requirements are by conducting failure mode, effects, and criticality analyses (see the next section); developing reliability models; making reliability predictions; conducting reliability trade studies; providing component selection and control design guidelines; conducting analyses to identify the root causes of failures; implementing design changes to improve reliability and maintainability; developing maintenance concepts; performing spare parts analyses; and developing plans (e.g., preventive maintenance) to address maintainability requirements.

The fundamental objective of a failure mode, effects, and criticality analysis is to identify the critical failure areas in a design. To accomplish this identification, each functional component (or higher level if adequate to attain the intended purpose) is sequentially assumed to fail, and the broad effects of each such failure on the operation of the system (fig. B-10) are traced. More details on this subject are available in the LeR-W0510.060 ISO Work Instruction.
Solar Array Failure Mode and Effects Analysis of Mounting and Mechanical Deployment Assembly
for Space Electric Rocket Test II

Component: Actuator assembly
  Failure mode: Binding. Cause: needle valve plugged. Effect: degraded deployment.
    Criticality: Minor. Action: spring stiffness adequacy and tolerances reviewed; tests
    carefully evaluated. Status: Completed.
  Failure mode: Operation is erratic. Cause: tolerance buildup; O-ring damage; workmanship.
    Effect: partial deployment. Criticality: Major. Action: workmanship inspected.
    Status: Specified.
  Failure mode: Actuation stops. Cause: spring failure. Effect: no deployment.
    Criticality: Critical. Action: data packages will be prepared. Status: Planned.

Component: Linkage (mechanism assembly)
  Failure mode: Motion stops prematurely. Cause: binding and lockup. Effect: partial
    deployment. Criticality: Major. Action: kinematics study disclosed source of binding;
    redesigned. Status: Completed.
  Failure mode: Motion stops prematurely. Cause: design weakness; poor workmanship; damage.
    Effect: slow deployment. Criticality: Minor. Action: confidence tests will verify
    elimination of failure mode. Status: Planned.

Component: Pin-puller assembly
  Failure mode: Tie-rod is not released. Cause: excessive load; squib failure; corrosion of
    pin puller; jamming of catch. Effect: solar array does not deploy. Criticality: Critical.
    Action: need study to develop alternative design with adequate redundancy. Status: Open.

Component: Mechanical assembly
  Failure mode: Attachment joint of solar arrays to Agena bends or breaks. Cause: excessive
    loads. Effect: partial deployment. Criticality: Major. Action: cold gas attitude control
    system to be programmed; low mode to avoid excessive load. Status: Planned.

Component: Hinges
  Failure mode: Hinges bind. Cause: workmanship; tolerance stackup; spring. Effect: slow
    deployment. Criticality: Minor. Action: confidence tests; tolerances reviewed.
    Status: Planned/Completed.

Figure B-10.--Typical failure mode and effects analysis.
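For action tracking, an analysis like figure B-10 is naturally held as a table of records. The sketch below is a hypothetical illustration, not from the NASA text (the record fields and the severity ranking are assumptions); it shows how the critical, still-open failure areas can be pulled to the top of an action list, using two entries from figure B-10 as sample data.

    # Hypothetical sketch: hold FMEA rows as records and rank them for action tracking.
    from dataclasses import dataclass

    @dataclass
    class FmeaEntry:
        component: str
        failure_mode: str
        effect: str
        criticality: str   # "Minor", "Major", or "Critical"
        action: str
        status: str

    # Two sample rows taken from figure B-10.
    entries = [
        FmeaEntry("Actuator assembly", "Binding", "Degraded deployment",
                  "Minor", "Spring stiffness and tolerances reviewed", "Completed"),
        FmeaEntry("Pin-puller assembly", "Tie-rod is not released",
                  "Solar array does not deploy", "Critical",
                  "Study alternative design with adequate redundancy", "Open"),
    ]

    # Rank so that critical, open items surface first for management attention.
    severity = {"Critical": 0, "Major": 1, "Minor": 2}
    for e in sorted(entries, key=lambda e: (severity[e.criticality], e.status != "Open")):
        print(f"[{e.criticality}/{e.status}] {e.component}: {e.failure_mode} -> {e.action}")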

EEE Parts Control

The electronic, electrical, and electromechanical (EEE) parts used by a project can have a major impact on its safety and reliability. The project must be sure that the EEE parts selected and used are appropriate for their application and offer the lowest safety risk and greatest chance for mission success based on cost and schedule constraints. Projects must plan and implement an EEE parts control program consistent with reliability requirements and good engineering practice.

The OSAT helps projects select parts and develop EEE parts identification lists. Also, it verifies that parts selected comply with de-rating guidelines and other requirements (e.g., radiation); conducts Alert searches in conjunction with the Government Industry Data Exchange Program and NASA Parts Advisories to identify and deal with potentially unreliable parts; and assists with parts screening, ensuring traceability and analyzing part failures (see the following sections).

Selection and Screening

The costs incurred during subsystem and system testing are inversely proportional to the money spent for examining and testing parts. Success is directly related to the part screening costs. For example, the exceptional operational life of the Space Electric Rocket Test II satellite is no doubt attributable to the extensive parts selection and screening program.

Other factors influence parts selection and screening: the criticality of the hardware application, unusual environments, contractor experience, and in-house resources. The selection can range from a high-reliability part (identified in a Government- or industry-preferred parts handbook) to an off-the-shelf commercial part. Screening is a selective process as called out in the source control document.
Failure occurs
  |
Initiate report within 24 hr
  |
Assign number and open file (control) -- project manager; Office of Mission Safety and
  Assurance. Distribution: design engineer; cognizant engineer for safety or materials
  and processes, software or electrical, electronic, and electromechanical parts;
  reliability and quality control
  |
Analyze failure (cognizant engineer)
  |
Take corrective action (design engineer or working group: design engineer; safety or
  electrical, electronic, and electromechanical parts; materials and processes; quality
  inspector engineer). Required corrective action: design, material, or process changes;
  reworking, repair, or replacement
  |
Implement corrective action (project team)
  |
Test or verify corrective action (test engineer or technician and quality inspector)
  |
Concur (project manager and Office of Mission Safety and Assurance product assurance
  manager). Distribution: project manager; Office of Mission Safety and Assurance;
  design engineer
  |
Close out file (control)

Figure B-11.--Failure report, analysis, and corrective action flowchart.
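Because the figure B-11 flow is strictly sequential, it can be sketched as an ordered list of states. The snippet below is a hypothetical paraphrase of the flowchart (the state names are assumptions), useful for tracking where a given report stands.

    # Hypothetical sketch of the figure B-11 failure-reporting flow as ordered states.
    FAILURE_REPORT_STEPS = (
        "failure occurs",
        "report initiated (within 24 hr)",
        "number assigned, file opened (control)",
        "failure analyzed (cognizant engineer)",
        "corrective action determined (design engineer or working group)",
        "corrective action implemented (project team)",
        "corrective action tested and verified",
        "concurrence (project manager and product assurance manager)",
        "file closed out (control)",
    )

    def next_step(current: str) -> str:
        """Return the step that follows `current`; raise if the report is closed."""
        i = FAILURE_REPORT_STEPS.index(current)
        if i + 1 == len(FAILURE_REPORT_STEPS):
            raise ValueError("report already closed out")
        return FAILURE_REPORT_STEPS[i + 1]

    print(next_step("failure analyzed (cognizant engineer)"))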

Materials Identification

Good engineering practice identifies parts, components, and materials with a part number, a screening serial number, a date code, and the manufacturer. Furthermore, the marking on parts and components should be affixed in a location that is easily seen when the item is installed in an assembly. The identification method and location on the item are included on a drawing, a specification, or other associated engineering document (fig. B-3, note 6). During the period of fabrication, assembly, and testing, the system of marking and recordkeeping should provide a way to trace backward from an end item to the part or material level.

Failure Analysis

Some failed parts are analyzed and investigated to determine the cause of the failure (fig. B-11). Corrective action is taken to assure that the problem does not recur and then the action is verified by testing. The problem is closed by ERB review. Sometimes corrective action may change a component application criterion, improve a packaging technique, or revise a test procedure. Often the detailed physical and chemical examination reveals that a refinement is needed in the materials used during the manufacturing of a part or that an improvement in the parts screening process is necessary.
[Figure B-12 diagrams the spiral software development life cycle: repeated cycles of
(1) determining objectives, alternatives, and constraints; (2) evaluating alternatives and
identifying and resolving risks (risk analyses, prototypes 1 to 3, operational prototype);
(3) developing and verifying the next-level product (concept of operation, software
requirements and requirements validation, software product design, detailed design,
design validation and verification, code, unit test, integration and test, acceptance
test, implementation); and (4) planning the next phases, with cumulative cost growing
as the spiral progresses through review and commitment partitions.]

Figure B-12.--Spiral software development life cycle.
Software Product Assurance

Software is generally a critical element in the safety and success of a project. Project managers are therefore wise to establish an effective software assurance program to ensure the safety, reliability, and quality of their software systems. Such a program includes a software assurance plan (typically part of the product assurance plan) to address software quality standards, configuration management, testing, problem reporting, performance verification, certification process, and mission simulation.

The software product assurance (SPA) effort is intended to ensure that all software hazards be identified and controlled, that the software be capable of meeting mission availability and design life requirements, that the software meet all performance requirements for the mission simulation, and that software documentation accurately reflect those requirements (fig. B-12).

The OSAT can help projects develop and implement an effective SPA process. For example, it can prepare SPA plans and conduct software hazard analyses, failure tolerance analyses, and audits. It ensures that projects follow proper software configuration management practices. In addition, it witnesses or monitors software tests and verifies that results conform to expectations.

Conclusion

Project managers can realize many benefits by using risk management tools and a product-assurance-oriented approach to their projects. By applying effective product assurance techniques throughout the project life cycle, projects can achieve the highest level of safety, quality, and reliability for the available resources. The investment that project managers make to apply risk management and product assurance to their projects offers the probable return of increased mission safety and a greater probability of success. Experienced project managers consider this to be a wise investment.

References

B-1.  NASA Policy for Safety and Mission Success. NPD 8700.1, NASA, June 12, 1997.
B-2.  Standard Assurance Requirements and Guidelines for Experiments. LeR-M 0510.002, NASA Glenn Research Center, Apr. 1996.
B-3.  NASA Glenn Product Assurance Manual. LeR-M 0510.001, NASA Glenn Research Center, Dec. 18, 1996.
B-4.  NASA Program and Project Management Processes and Requirements. NPG 7120.5A, NASA, Apr. 3, 1998.
B-5.  Continuous Risk Management Guidebook. Carnegie Mellon University, Pittsburgh, PA, 1997.
B-6.  NASA Johnson Space Center Payload Safety Home Page (http://wwwsrqa.jsc.nasa.gov/pce/pcehome.htm).
B-7.  Safety Policy and Requirements for Payloads Using the Space Transportation System. NSTS 1700.7B (Change No. 5, Oct. 12, 1998).
B-8.  Safety Policy Requirements for Payloads Using the International Space Station. NSTS 1700.7B ISS Addendum, Dec. 1995.
B-9.  Space Shuttle Payload Ground Safety Handbook. KHB 1700.7 Rev. B, Sept. 1992.
B-10. Payload Safety Review and Data Submittal Requirements. NSTS/ISS 13830 Rev. C, July 1998.
B-11. Guidelines for the Preparation of Flight Safety Data Packages and Hazard Reports for Payloads Using the Space Shuttle. JSC 26943, Feb. 1995.
Appendix C

Reliability Testing Examples


A great deal of work has been done by various researchers to develop probabilistic methods suitable for reliability problems (ref. C-1). Probabilistic methods that apply discrete and continuous random variables to user problems are not as well covered in the literature.

This appendix concentrates on four useful functions: (1) failure f(t), (2) reliability R(t), (3) failure rate λ, and (4) hazard rate λ'. Because we usually need to know how well a point estimate has been defined, some consideration is given to confidence intervals for these functions. The appendix also explains methods for planning events at the critical delivery milestone and closes with a brief explanation of two reliability case histories.

Useful Distribution Functions

The failure function f(t), which defines failure as a function of time or number of cycles, is important knowledge obtained from reliability testing. Failure records are kept on a particular piece of hardware to obtain a histogram of failures against time. This histogram is studied to determine which failure distribution fits the existing data best. Once a function f(t) is obtained, reliability analysis can proceed. In many cases, sufficient time is not available to obtain large quantities of failure density function data. In these cases, experience can be used to determine which failure frequency function best fits a given set of data. Table C-1 lists seven distributions, five continuous and two discrete. These distributions can be used to describe the time-to-failure functions for various components. The derivation of the four reliability functions for the seven listed distributions is explained in the next section (ref. C-2).

Derivation of Q(t), R(t), λ, and λ' functions.--The unreliability function Q(t) is the probability that in a random trial the random variable is not greater than t; hence,

    Q(t) = \int_0^t p(t)\,dt

When time is the variable, the usual range is 0 to t, implying that the process operates for some finite time interval. This integral is used to define the unreliability function when failures are being considered.

The reliability function R(t) is given by

    R(t) = 1 - Q(t)

In integral form R(t) is given by

    R(t) = \int_t^{\infty} p(t)\,dt

Differentiation yields

    \frac{dR(t)}{dt} = -\frac{dQ(t)}{dt} = -p(t)

The a posteriori probability of failure p_f in a given time interval, t_1 to t_2, can be calculated by using these equations and is given by

    p_f = \frac{1}{R(t_1)} \int_{t_1}^{t_2} p(t)\,dt
TABLE C-1.--FIT DATA FOR FAILURE FUNCTIONS

Distribution    Failure fit

Continuous distribution:
  Exponential   Complex electrical systems
  Normal        Mechanical systems subject to wear
  Weibull       Mechanical, electromechanical, or electrical parts: bearings, linkages
                with fatigue loads, relays, capacitors, and semiconductors. Reduces to
                exponential distribution if α = 1, β = 1, and γ = 0
  Gamma         Combined mechanical and electrical systems
  Lognormal     Mechanical parts under stress rupture loading

Discrete distribution:
  Poisson       One-shot parts
  Binomial      Complex electrical systems for probability of N_f defects

Substituting and simplifying gives

    p_f = 1 - \frac{R(t_2)}{R(t_1)}

The rate at which failures occur in a time interval is defined as the ratio of the probability of failure in the interval to the interval length. Thus, the equation for failure rate λ is given by

    \lambda = \frac{R(t_1) - R(t_2)}{(t_2 - t_1) R(t_1)} = \frac{1}{t_2 - t_1} \left[ 1 - \frac{R(t_2)}{R(t_1)} \right]

Substituting t_1 = t and t_2 = t + h into this equation gives

    \lambda = \frac{R(t) - R(t+h)}{(t + h - t) R(t)} = \frac{R(t) - R(t+h)}{h R(t)}

The instantaneous failure rate in reliability literature is often called the hazard rate. The hazard rate λ' is by definition the limit of the failure rate as h → 0. Using a previous equation and taking the limit of the failure rate as h → 0 gives

    \lambda' = \lim_{h \to 0} \frac{R(t) - R(t+h)}{h R(t)}

Letting h = Δt in this equation gives

    \lambda' = \lim_{\Delta t \to 0} \left\{ -\frac{1}{R(t)} \left[ \frac{R(t + \Delta t) - R(t)}{\Delta t} \right] \right\}

The term in brackets is recognized from the calculus to be the derivative of R(t) with respect to time, and the negative of this derivative is equal to p(t). Substituting these values gives

    \lambda' = -\frac{1}{R(t)} \left[ \frac{dR(t)}{dt} \right] = \frac{p(t)}{R(t)}

As an example, consider a jet airplane traveling from Cleveland to Miami. This distance is about 1500 miles and could be covered in about 2.5 hr. The average rate of speed would be 1500 miles divided by 2.5 hr, or 600 mph. The instantaneous speed may have varied anywhere from 0 to 700 mph. The air speed at any given instant could be determined by reading the speed indicator in the cockpit. Replacing the distance continuum by failures, failure rate is analogous to average speed, 600 mph in this example, and hazard rate is analogous to instantaneous speed, the speed indicator reading in this example.

Figure C-1 presents a summary of the useful frequency functions for the failure distributions given in table C-1. These functions were derived by using the defining equations given previously. Choose any failure function and verify that R(t), λ, and λ' are properly defined by going through the derivation yourself. Five reliability problems using the continuous distributions given in figure C-1 are solved in the next section.

Estimation using the exponential, normal, Weibull, gamma, and lognormal distributions.--As an illustration of how to use these equations for an electrical part that experience indicates will follow the exponential distribution, consider example 1.

Example 1: Testing of a particular tantalum capacitor showed that the failure density function was exponentially distributed. For the 100 specimens tested, it was found that the mean time between failures t̄ was 1000 hr.

(1) What is the hazard rate?
(2) What is the failure rate at 100 hr and during the next 10-hr interval?
(3) What are the failure and reliability time functions?

Solution 1:

(1) Using the equations given in figure C-1 for the exponential distribution, the hazard rate is given by

    \lambda' = \frac{1}{\bar{t}} = \frac{1}{1000 \text{ hr/failure}}

or

    \lambda' = 1 \times 10^{-3} \text{ failure/hr}

(2) The failure rate is given by

    \lambda = \frac{1}{h} \left( 1 - \frac{e^{-t_2/\bar{t}}}{e^{-t_1/\bar{t}}} \right)

For this case the time interval is given by

    h = t_2 - t_1 = 110 - 100 = 10 \text{ hr}

The necessary reliability functions are given by

    e^{-t_2/\bar{t}} = e^{-110/1000} = e^{-0.11} = 0.896

and

    e^{-t_1/\bar{t}} = e^{-100/1000} = e^{-0.1} = 0.905

Substituting these values gives

    \lambda = \frac{1}{10} \left( 1 - \frac{0.896}{0.905} \right) = 1 \times 10^{-3} \text{ failure/hr}

This is to be expected for the exponential case because the failure rate is constant with time and is always equal to the hazard rate.

(3) The failure and reliability time functions are given by

    p(t) = \frac{1}{1000} e^{-t/1000}

    R(t) = e^{-t/1000}
(2)Thefailurerateis givenby TABLE C-2._TEST DATA FOR
GIMBAL ACTUATORS

Ordered Time to Time to


sample failure, failure squared,
_= 1 e_tz/t
number tt t},
hr (IOs hr)2

Forthiscasethetimeintervalisgivenby 1 60xlO 3 3600


2 65 4225
h=t2- q =llO-lOO=lOhr 3 68 4624
4 70 49O0
5 75 5625
The necessary reliability functions are given by 6 75 5625
7 80 6400
$ 83 6889
e -t2/t =e -110/1000 =e --011 =0.896
9 85 7225
IO 90 81oo
and
Totali 750x lOS 57 213
e -t_/t =e -110/1000 =e -0"1 =0.905

Substituting these values gives


n

_,=_}_1 (1 - 0.896)= lxl0_ 3 failure/hr


I0 \ 0.905) _- = f=l

where
This is to be expected for the exponential case because the
failure rate is constant with time and is always equal to the
hazard rate. ? mean time between failures, hr
tf time to failure, hr
(3) The failure and reliability time functions are given by n number of observations

Therefore, using the data from table C-2,


p(t) = _ e -t/1000
1000
750 000
7=_ = 75000 hr
10
e(t) = e-"t°°°
The unbiased standard deviation c is given by
As an illustration of how to use the equations given in figure
C-I for mechanical parts subject to wear using the normal 1/2

distribution, consider example 2.


Example 2: A gimbal actuator is being used where friction,
mechanical loading, and temperature are the principal failure-
causing stresses. Assume that tests to failure have been con-
tl II 2 n
ducted on the mechanical parts, resulting in the data shown in f=I
table C-2. n-I

(1) What is the mean time between failures and the standard
deviation?
(2) What are the hazard rate at 85 300 hr and the failure rate
during the next 10 300-hr interval?
(3) What are the failure and reliability time functions? The sum terms required for this calculation are given by

Solution 2:
Z t_ = 57 213 (103 hr) 2 (column 3, table C- 2)
(1) The mean time between failures is given by f-_l

NASA/TP--2000-207428 249
Distribution p(t ) R( t )

Exponential exp (-t/t)


l exp (-tit)

Normal

c2-_ 1 i ,,1,j
exp - - cr2q_ exp - dt

Weibull

Gamma
1 (,__,0
lex0[_] _r(_)

Log normal
1 _:(,,-,,]2i
_ exp 2L ot. )

Distribution p( Nf) R ( Nf )

Poisson o
(t/t ) N exp (-t/t-)
_, (t/{)Jexp(-t/'t)
Nf_ j!
j =Nf

Binomial n
n! pNf gn-Nf _, n! pign-j
(n-Nf)!Nf!
]=Nf (n-j)!i!

Figure C_-t .--Summary of useful frequency functions.

250 NASAfI'P--2000-207428
_. 7,' Remarks

_t2 '

1 , ex0 /i 1/[ h = t2-t 1

i Complex electrical
h systems

Normal ordinate at t Mechanical systems


1 I1_ R (t2)]
L R(t0J Normal area t 1 to oo

(_ = scale parameter
= shape parameter
7 = location parameter

Mechanical or electrical systems.


If (x = t, 13= 0, and 7 = 0, reduces
to exponential. If 13= 3.5, approx-
imates normal.

Gamma ordinate at t Same as Weibull parameters


t-(t2- y)P1 but may be harder to use.
Gamma area t I to
(t 2_ y)[_-I exp L_J
1, F(_J) = Ft P-l e-t off
0
-(t 1-_)_] r(p) = (6 - 1)P(_- 1)
(tl_ _,)_3-..1
exp LTJ.

Combined mechanical
and electrical systems

Log normal ordinate at t Mechanical parts that fail due


1 Ii_R(t2)1 to some wearout mechanism
L R(tl) J Log normal area t 1 to ,=o

2, _.' Remarks

Not applicable Not applicable Nf = number of failures


One-shot devices

Not applicable Not applicable p = defectives


g = effectives
n = trials (sample size)

Complex systems for


probability of Nf defects

Figure C-1 .--Concluded.

NASA/TP_2000-207428 251
and
2.35 x 10 -4 failure/hr
= 1.47 x 10 -3 failure/hr
1.59 x 10 -!
tf = (750) 2 = 562 500 (103 hr) 2
The failure rate is given by

a \ 9 ) t, 9)

(2) The hazard rate X" is given by In this case h is given as 10 300 hr. The reliability at 95 600 hr
is given by
Scaled ordinate at 85 300 hr
Normal area from 85 300 hr to oo R(t 2) = Normal area from 95 600 hr to oo

Let Yt be the normal ordinate at 85 300 hr and Z l be the Using the preceding procedure results in
standardized normal variable, which is given by
R(t2)--
0.023
t- (85300- 75000)
Z 1 =_=
0- 10 300 hr Substituting values gives

Existing tables for the normal ordinate values for Z = 1.0 gives
_= 1 1-
Y{ = 0.242. The scale constant Ks to modify this ordinate value 0.023)= 8.56x
I0300 hr 0.159} 1.03x 10
104-t
for this problem is given by (ref. C-3)

nO = 8.31 x 10 -5 failure/hr
K S -_
0-

(3) The constants required to write expressions for p(t) and


where 0 is the class interval. Substituting values and solving R(t) are calculated as follows:
for YI gives

10 x 1 failures 1 = 1
YI = f(tl ) = KsYI" = x 0.242 0-(2r0112 (1.03 x 104)x 2.52 =3"87xi0-5
10 300 hr

= 2.35 x 10-4 failure/hr


20 -2 = 2 x (1.03 x 10 4)2 = 2.12 X 108

Note that the denominator required to calculate K' is R (q),


which is the normal area from 85 300 hr to ,,o. Existing tables Using the constants and substituting values gives

for the normal area for Z! = 1.0 (ref. C-3) give the area from
---00to Z 1, so that the unreliability Q(tl) is given by
p(t) = 3.87 x 10 .5 e -(t-7'Sx104):/2"12x10_

Q(q) =0.841 x (Area from -oo to ZI)

Because Q(tl) + R(tl) = 1.000,


R(t) = 3.87 x 10 -5 fie-( t-7"5x104 2
at

R(q) = 1.000 - 0.841 = 0.159 As an illustration for the Weibull distribution, consider
example 3.
and the hazard rate is given by Example 3: A lot of 100 stepping motors was tested to see
what their reliability functions were. A power supply furnished
electrical pulses to each motor. Instrumentation recorded the

252 NASA/TP--2000-207428
TABLE C-3.--WEIBULL DATA FOR STEPPING MOTORS

Number of Cumulative number Median 5-Percent 9_Percent


steps to of failures rank rank rank
failure
Problem 3 Problem 9 Scaled time to failure, _,.

0.2 x 103 2 i 6.70 0.51 25.89


.4 4 2 16.23 3.68 39.42
.9 5 3 25.86 8.73 50.69
4.0 16 4 35.51 15.00 60.66
i0.0 2O 5 45.17 22.24 69.65
18.0 50 6 54.83 30.35 77.76
30.9 90 7 64.49 39.34 85.00
50.0 97 8 74.14 49.30 91.27

number of continuous steps a motor made before it failed to step


even
lxl06
though
steps.
a pulse
The
was
step failure
provided.
data
All testing
are given
was
in table
stopped
C-3.
at
o: xpi-,o
ln l
(1) Calculate the frequency functions. Therefore,
(2) Plot the hazard rate function on log-log paper.
0:1 = e275 = 15.7
(3) What conclusions can be drawn from this graph?

Solution 3: Because there are I00 motors in this lot, the data 0:2 = e 4"6 = 100
give ordered plotting positions suitable for plotting on Weibull
probability paper. Figure C-2 shows a plot of these data. From By using the parameters just estimated and the equations given
the shape of the data in figure C-2, it appears as though two in figure C-1 for the Weibull distribution, the following failure
straight lines are necessary to fit this failure density function. frequency functions can be expressed: The partition limits on
This means that different frequency functions exist at different
the number of steps c are 0 < c < 10 and c > 10. The frequency
times. These frequency functions are said to be separated by a functions are given by
partition parameter 6.
From figure C-2 the Weibull scale, shape, and location
parameters can be estimated by following these steps: =
(1) Estimate the partition parameter 8. This estimate can be
Substituting values results in
obtained directly from figure C-2. The two straight lines that
best fit the given data intersect at pointf. Projecting this point
down to the abscissa gives a failure age of 10 000 cycles for
j_ (c) = 0.75 c0.75_ 1 e_(C/15.7)o.75
the partition parameter 6. 15.7
(2) Estimate the location parameter y. This parameter is used
as a straightener forp(t). Becausep(t - 0) is already a straight or
line for both regions, it is clear that "/I = 72 = 0. In general,
several tries at straightening may be required before the one
yielding a straight line forp (t -7) is found. j_ (c) = 0.47c --0"25 e -c°_51_5_ for 0 < c < 10

(3) Estimate the shaping parameter [3. The intercept point a


for line b, drawn parallel to line c and passing through point d, Similarly,

where In(t-y) = 1 is equal to [3. Thus, [31 = 0.75 and 132= 1.50.
(4) Estimate the scale parameter cc At point e for line c,
f2 (c) = 0.015c -°5 e -c_'5°/j_ fore> 10

I
In a'=- In ln_ The reliability functions are given by
1- a(t)

so that R(c) = e -(c-y)f_'c_

NASA/TP--2000-207428 253
99.9
99 0

90 __ Point d-- k Point f -7 Q,_

- Line b -k \ Point 3 - / /
O-- 60

4O - 95-percent ,_ I- ,, _ .' /"_ t_^:,,


1 -- - confidence line _ ,, / _ ., _ / -- ..... 1

20

0
-,,oin,_l /I , _..,>'_/"
I
I/":
l

_ 2 -- _ 10
6
3 g 4

2 - :-/ / i
1 _ _o,n,,
J / /_ _-s-pe,cen,
5 -- confidence line
.6
.4
r
6 --
.2

I t I illlf I i I it IliI I t J |1
.1
,2 .4 .6 .8 2 4 6 8 10 20 40 60x103
Failure age, cycles

I 1 ! I I I I
-2 -1 0 1 2 3 4
log e (failure age)

Figure C-2.--Weibull plot for stepping motors.

Therefore, substituting values gives


e-'C2 _

_'2 = l _ .d.$tto(I for c> 10


Ri(t ) = e-C °'7S/tS'7 for0<c_<10 [ e "t(ci ))l.s/ll_-

and
The hazard rate functions are given by

R2(t) = e-2"5'"" forc> 10


Z"=_-(c-7)_-'
The failure rate functions are given by
Therefore, substituting values gives

_'i = 0-047c-0"25 for0<c<10


_. = ¼[1 e+:-"/_"°'
e -(q -_',)_°_
and

Therefore, substituting values gives


X_ = 0.015c -°5 for c > 10

e -t c2 ) (2) By using two-cycle log-log paper and the following


forO<c<lO
calculation method, a graph of l' against c can be obtained:
tl= [l , ,0.75_,_,
e_(q),i.75m,

and _,_ = 0.047C -0-25

Taking logarithms to the base 10 gives

254 NASA/TP--2000-207428
.2
log 3,_ = log 0.047 + (--0.25) log c

Useful corollary equations are


_- .08
l0 x =y _-.06

_- .04 (
U)

x = log Y
_ .02

10 0 =1
I t I p Iilll 1 J t t Itll]
.01
and 2 4 6 8 10 20 40 60100x103
Number of steps or cycles, c
log 0.047 = log 4.7 × 10 -2 = 1og4.7 + (-2) log 10
Figure C-3.--Hazard rate plot for stepping motors.
= 2.672, or 8.672 - 10

TABLE C-5.--ELECTRICROCKET "


For c = l, RELIABILITYDATA

log 3-j = log 0.047 + (-0.25) log 1 Ordered Time to Median Scaled Linear
sample failure, rank time to scale
number rp failure rank
xl = 0.047 hr
Scaled time to failure, ti
For c = I 0,
I ! 037.8 6.70 7.2 5.0
2 1 814.4 16.23 12.6 15.0
log 3-i = log 0.047 + (0.25) log 10 = 2.672 - 0.25 = 2.422 3 2 332.8 25.86 16.3 25.0
4 3 124.8 35.51 21.7 35.0
5 3 614.4 45.71 25.1 45.0
3-i= 0.0264 6 4 579.2 54.83 31.8 55.0
7 5 342 4 64.49 37. I 65.0
I

In a similar manner solving for 3,'2 gives the data points 8 6 292.8l 74.14 43.7 75.0
shown in table C-4. These data are plotted in figure C-3. 9 7 920.0 83.77 55.0 85.0
I0 11 404.8 93.30 79.2 95.0

TABLE C-4.--HAZARD
RATEDATA FOR Example 4: Environmental testing of 10 electric rockets with
STEPPING MOTORS associated power conditioning has resulted in the ordered time-
Ntm'/t)_ of Failures to-failure data given in table C-5.
_¢ps, Per cycle,
c X' (1) What is the mean time between failures?
(2) Write the gamma failure and the reliability functions.
i × 103 0.047 (3) What is the hazard rate at 5000 hr?
!0 .026
(4) What is the failure rate at 5000 hr during the next
to .015
1000-hour interval?
100 .150

Solution 4: The essential steps for the graphical solution of


this problem follow (ref. C-5):
(3) Figure C-3 indicates that the hazard rate is decreasing by
0.25 during the first interval and is increasing by 0.50 during (I) Obtain the median ranks for each ordered position; see
the second interval for each logarithmic unit change of c. table C-5.
It appears that step motors, for first misses, jump from the (2) Plot on linear graph paper (10 x 10 to the inch) median
"infant mortality" stage into the wearout stage without any rank against time to failure for the range around 80-percent
transition period of random failures with a constant failure rate median rank.
(ref. @-4). (3) Fit a straight line to the plotted points. For a median
As an illustration of combined mechanical and electrical sys- rank of 80 read the corresponding time to failure ts0 in hours.
tems that follow the gamma distribution, consider example 4: Figure C--4 gives a ts0 of 7200 hr.

NASA/TP--2000-207428 255
9O With these graphical construction aids, the solution to the
problem is readily achieved:

(1) The mean time between failures is given by


8O
_e
t-- /'= o_13= 2.4x 103 hrx2.25 = 5.4x 103 hr
C

_5
(2) The gamma failure and reliability functions are given by
70

1 e-l(t-_,)/a
p(,)=_rff r((,_
)
F tso
/
60 I I / [
6 7 8x103 It has been shown that y = 0; the other constants are calculated
as follows:
Failureage, ti, hr

Figure C-4.--Electric rocketlife.


o_13=(2.4 x103) 2"25
(4) The time-to-failure data are scaled by using the equation
Using logarithms, log _15= 2.25(Iog 2.4 + log 103); performing
50 the indicated operations gives log (z13 = 7.61; hence,
ti =_ti (z13= 4.25x 107.
t80
where The second required constant is F(13) = F(2.25). Using the
identity F(x + I) = x!, then 1"(2.25) = F(1.25 + 1) = 1.25 !. Using
[i ith scaled time to failure Sterling's formula, x!= xXe-X(2nx) t/2. Taking logarithms gives
tso rough estimate of 80-percent failure time
ti ith time to failure, hr
log x,= x log x + (-x)log e + (2)[log 2_r + logx]
Table (2-5 gives /'i for each ordered sample.

(5) Plot on linear graph paper (I0 x 10 to the inch) median =(x +l )log x-O.434x +O.399
rank against scaled time to failure ti. Figure C-5 shows the
log(1.25!) = 1.75 log 1.25 - 0.434 x 1.25 + 0.399 = 0.026
plotted data points for this problem.
(6) These data points fit the gamma curve well with a ]3
estimate of 2.0; hence, it appears as though a two-parameter Substituting and forming the product gives c_13F(13)= (4.24x 107 )
gamma distribution is required with the location parameter y x 1.06 = 4.5x 107. Using these constants and substituting values
equal to zero. The nonzero location parameter case is covered gives
in the literature (ref. C-5).
(7) Overlay the linear axis ( 10 spaces to the inch) of a sheet
1 t1.25e_t/2.4×103
of five-cycle semiiog paper corresponding to a 13of 2.0. Plot on p(t) = 4.5 x 107
this special graph paper the linear scale rank against time-to-
failure data given in table C-5.
and
(8) Fit a straight line through the plotted points. Figure C--6
shows the plot for these data. Two additional straight lines are
shown in this figure: line I was obtained by plotting two known R(t) = -- 1 f _ tl25e -t/2"4xlO 3dt
points (0.5,1) and (20,8) (ref. C-5), line 2 has one point at 4.5 x 107 .,t
(0.5,1 ) with a slope m. If line 1 were coincident with line 2, the
{3estimate would be sufficiently accurate. (3) The hazard rate function at 5000 hr is given by
(9) Because the two lines are not coincident, a closer approxi-
mation for 13is obtained by taking a new midpoint coordinate
estimate of 6.8 from figure C-6. Using existing charts gives
13= 2.25, which satisfies the slope criteria (ref. C-5).
(10) For a shape parameter 13of 2.25, a linear scale rank of Here
20 percent applies. Entering figure C-6 at this point on the
ordinate gives a scale parameter a of 2400 hr.

256 NASAfI'P--2000-207428
Shape
100 -- parameter,

3.0
90 -- / 12.0

80 --

70 -- /,4,"
,/W
"E
60
- ,.o-_../////
///
_¢ 50 - / //
It..
40
_ ///_/_ _o
30
- ////'1 / --_°
20 /_//
I/
10

__Y_ I l I I i I l I
10 20 30 40 50 60 70 80

Scaled time to failure, _i

Figure C-5.--Electric rocket shape parameter curves.


99 --

95 --

90 --
80 --
Line 3 /
(slope m) -_. \
60 --

-_ 40 -

20
Q_
Line 1 ---._
._= 10
,...J

6 (slope m)

2
__-_ _-Line 2
I /--- Ct

I i 1 III111 I t I i 1111[ I t I I Itlll i I/


_l I l ltlll
.5
10 0 101 10 2 103 104
Failure age, ti, hr

Figure C-6.--Electric rocket parameter diagram.

NASA/TP--2000-207428 257
TABLE C-6.--TEST DATA FOR GUY SUPPORTS

P(tl) = 4.5X
"-'----'_ 10 Ordered Time to Median 5-Percent 95-Pereem
sample failure, rank rank rank

Performing the indicated operations gives numl_r q,


hr

I 1 100 6.7 0.5 25.9


2 1 890 16.2 3.7 39.4
=l.17x10 4
4.5x107 3 2 920 25.9 8.7 50.7
4 4 100 35.5 15.0 60.7
5 5 715 45.2 22.2 69.7
We can obtain R(t I) either analytically by using this integral 6 8 720 54.8 30.3 77.8
equation or graphically from figure C-6. Enter figure C-6 at a 7 12 000 64.5 39.3 85.0
8 17 500 74.1 49.3 91.3
failure age of 5000 hr. Draw a vertical line to line 3. Project the
9 23 900 83.3 60.6 96.3
intersection of fit) and 5000 hr over to the linear scale rank
10 46 020 93.3 74.1 99.5
(0.605). Using a previous identity,

R(tl) = 1- 0.605= 0.395


(a) Obtain the median rank for each ordered position (see
table C--6).
Substituting values gives
(b) Plot median rank against time to failure on log-normal
probability graph paper (probability times two log cycles), as
1.17 x 104-4
= 2.71 x 10 -4 failure/hr shown in figure C-7.
3.95 x 10 -1 (c) Ifa straight line can be fitted to these plotted points, the
time-to-failure function is log normal.
(4) The failure rate function at 5000 hr during the next (d) The mean time between failures is calculated by t ' =
1000-hr interval is given by In (t'), where t = 6970 hr as shown in figure C-7 for a median
rank of 50 percent; hence ?'= 8,84.
(e) The standard deviation is given by

Following the procedure given previously and substituting


values gives
where tb = 49 500 hr and t_, = 1020 hr as shown in figure C-7

for a median rank and a 1 - rank of 93,3 percent; hence, cr t, =


R(t2) = 1- 0.710 = 0.290
(10.81 -6,93)/3 = 1.28.
With these constants the expressions for p(t) and R(t) are
and
written as

3.21 x 10 -1 e-Q'-8.84_-_
t s/j h.28x10
p(t) = t'
= _1 (1 - 0.290)0.395
) = 2.65x 104-4 failure/hr

and
As an illustration of mechanical parts, consider example 5:
Example 5: A cable used as guy supports for sail experiments
in wind tunnel testing exhibited the time-to-failure perfor- R(t) = 3.21 x 10-t " e-(/-8"84)2/_'esXl°dt
l*_ /"
mance data given in table C--6. a]n(t)

(1) Write the failure and reliability functions. (2) The log-normal ordinate required for _," can be calculated
(2) What is the hazard rate at 5715 hr? by using the standardized normal variable table as in
(3) What is the failure rate during the next 3000 hr? example 2. The log-normal standardized variable is given by

Solution 5:
t'-?' 8.66-8.84
Z 2 .... 0.143
o"t, 1.28
(I) The essential steps for solving this problem are

258 NASA/TP--2000-207428
From the normal-curve ordinate tables 60x103
I

Cu
Y_ = 0.395 40 -

and

2O

Y2 = NY_ = I0x0.395 = 3.09 failures E-


5-percent
o"t, 1.28 confidence
6
10 line-, \
Substituting values gives N8
._./-, \

p-
p(t,)Y2 = 3.09 = 5.40x 10 -4 failure/hr
t 5.715x103 Poin
4

The log-normal area from t "to infinity can be obtained directly _- 95-percent
from figure C-7 by using the 1 - rank scale. Enter the time-to- Point 4 -_ confidence line
2 --
failure ordinate at 5715 hr; project over to the log-normal file
functionf(t) and down to the 1 - rank abscissa value of 0.638.
Therefore, the hazard rate L' at 5715 hr is given by
1
2 t0 30 50 70 g0 98
5.40 x 10-4
= 8.46 x 10-4 failure/hr Rank, percent
6.38 x 10 -t

I I R(il) I, R(t21)
I I I
(3) The failure rate during the next 3000 hr is calculated by
.98 .90 .70 .50 .30 .10 .02
knowing that R(q) = ---0.638 at a time to failure of 5715 hr and 1 - Rank
by obtaining R(t2) = 0.437 from figure C-7 at 8715 hr. There-
fore, the failure rate is given by Figure C-7.--Guy support life.

1 (1_0.437_=
L= 3×103 (, 0.638? 1.05x10 -'4 failure/hr T 15 000 hr
? .... 1000 hr/failure
r 15 failures

Determination of confidence limits,--In the preceding sec-


(2) The upper and lower confidence limits at some confi-
tions, statistical estimates of various parameters have been
dence level are given by
made. Here we determine the methods for defining the confi-
dence to be placed in some of these estimates. In example 1,
tantalum capacitors with a one-parameter exponential distribu-
tion were studied. For an exponentially distributed population,
additional estimates follow the chi-squared distribution. As an
illustration of how to determine confidence limits for an expo-
nentially distributed estimate, consider example 6.
and
Example 6: One hundred tantalum capacitors were tested for
15 000 hr, during which time 15 parts failed.
2r
LCL = 2 /?
(1) What is the mean time between failures? _(e/2);2r )
(2) What are the upper and lower confidence limits at
98-percent confidence level?
where
Solution 6:
UCL upper confidence limit, hr
(1) The mean time between failures is given by LCL lower confidence limit, hr

NASA/TP--2000-207428 259
T total observed operating time, hr Solution 7: For the areas under the normal curve from --_ to
Z2 percentage points of chi-squared distribution Z equal to 0.98 and 0.02, existing area tables give Z = +2.06
r number of failures and r = 15 + 5 = 20 total failures, with 2r = 40.
1 - a/2 probability that ? will be the calculated (x/2 interval
Substituting values gives
For the 98-percent confidence level required by this problem,

0Z2)1/2 = (2 x 40 - 1)1/2 + 2.06


a = 0.01
2
2 2
X0.0k40 = 59.7, _0.99:40 = 23.4

Hence,

and 40 x 103
UCL = _ = 1709 hr
23.4
2r = 30

40 x 103
Therefore, the chi-squared distribution values are given by LCL = = 670 hr
(available from many existing tables) 59.7
2
%0.0t',30 = 50.9 Thus, it can be said with 98-percent confidence that ? lies
between approximately 670 and 1710 hr; as the test time
2
X0.99;30 = 14.9 increases, the estimated-parameter confidence interval decreases.
In example 2 gimbal actuators that exhibited normally dis-
tributed time-to-failure data were analyzed. For a normally
Substituting values gives
distributed population, additional mean estimates will also be
normal. As an illustration of how to determine confidence
30x 1000
UCL = = 2013 hr intervals for normal estimates, consider example 8.
14.9 Example 8: Twenty-five gimbal actuators have been tested.
The mean time between failures has been calculated to be
and 75 000 hr with a standard deviation of 10 300 hr (see
example 2). What are the upper and lower confidence limits at
30x 1000 a 90-percent confidence level?
LCL = = 589 hr
50.9 Solution 8: The upper and lower confidence limits are given
by
Thus, it is known with 98-percent confidence that the limits of
UCL = ? + K_x/2
the time }" lie between approximately 590 and 2010 hr. rill2

Determining the percentage values for the chi-squared distri-


bution for values ofr greater than 30 may also be useful. It has
CY
been shown that when r >_.30,
LCL = t - Ka/2 n-_

(2Z2) 112 = [2(2r)- 1]112 -+Z where

? mean time between failures, hr


where Z is the area under the normal curve at the specified
confidence level. Example 7 illustrates how this equation is K_2 standardized normal variable
unbiased standard deviation
used for confidence interval calculations.
Example 7: The tantalum capacitors of example 6 have been n number of samples
operated for 5000 more hr; five additional units have failed. 1- a probability that t will be in calculated interval

What are the confidence limits on ? at the 98-percent confi-


For this problem
dence level for this additional testing?

26O NASA/TP--2000-207428
and

1.83 x 9820
LCL = 75 000 = 69 300 hr
101/2
O_
-- = 0.05
2 Comparing this time interval with that calculated for a
sample size of 25 shows that the smaller sample gives a larger
and from existing tables for the area under the normal curve, interval of uncertainty.
Koe2 = 1.64. Substituting values gives In example 3 stepping motors that exhibited Weibull distrib-
uted time-to-failure data were studied. As a graphical illustra-
1.64 x 10 300 tion of how to determine confidence intervals for a
UCL = 75 000+ = 78 400 hr
251/2 Weibull-distributed estimate, consider example 9.
Example 9: Another group of stepping motors has been step
tested as previously explained in example 3. The Weibull plot
and
of percent failures for a given failure age is the same as that
given in figure C-2. During this testing, however, only eight
1.64 × 10 300
LCL = 751300 = 71 600 hr failures have occurred. What is the 90-percent confidence band
251/2 on the reliability estimate at 4000 cycles?
Solution 9: The data needed for graphical construction of the
This means that 90 percent of the time the mean-time-between confidence lines on the Weibull plot are given in table C-3. The
failures estimate t" for 25 gimbal actuators, rather than the following steps are necessary to construct the confidence lines
original 10, will be between 71 600 and 78 400 hr. Note that the in figure C-2:
sample size n has been increased to use this technique. This
reflects the usual user pressure to learn as much as possible with (1) Enter the percent failure axis with the first 5-percent rank
the least amount of testing. Try to keep n > 25 in estimating value hittingf(t); for failure 2 the 5-percent rank is 3.68.
normal parameters with this technique. If n < 25, use Student's (2) Draw a horizontal line that intersectsf(t) at point 1.
t distribution (ref. C---6). To determine the effects on confidence (3) Draw a vertical line to cross the corresponding median
intervals of reducing sample size, rework example 2 for the rank; for failure 2 the median rank is 16.23.
smaller sample size of 10, using Student's t distribution. The (4) Draw a horizontal line at the median rank, 16.23, for
upper and lower confidence limits are given by failure 2. The intersection point of the line for step (3) with this
line is one point on the 95-percent confidence line.
s (5) Repeat steps (I) to (4) until the desired cycle life is
UCL = ? + tel 2 --TIT covered, 4000 cycles in this case.
?1
(6) The 5-percent confidence line is obtained in a similar
and
manner. Enter the percent failure axis with the 95-percent
s failure rank; 25.89 for failure 1.
LCL = i-ta/2 nl/2
(7) Draw a horizontal line that intersectsf(t) at point 3.
where (8) Draw a vertical line to cross the corresponding median
rank; 6.70 for failure 1.
t_2 Student's t variable (9) Draw a horizontal line at the median rank, 6.70, for fail-
s standard deviation ure 1. The intersection point of these two lines is one point on
the 5-percent confidence line.
For this problem, r = n - 1 = 9, et = 0.10, and t_2 from existing (10) Repeat steps (6) to (9) until the desired cycle life is
tables is 1.83. The standard deviation is given by covered.

A 90-percent confidence interval forf(t) at 4000 cycles is,


s = .57 213- 56 250. = 9820 from figure C-2, 1.2 to 37.5 percent. Hence, a 90-percent
)1/2
10
confidence interval for R(t) at 4000 cycles is 0.998 to 0.625.
In example 5, guy supports that exhibited log-normally-
Substituting values gives distributed time-to-failure data were analyzed. As a final graphi-
cal illustration of how to determine confidence intervals for a
1.83 × 9820 log-normally-distributed estimate, consider example 10.
UCL = 75 (300 -_ = 80 700 hr
101/2

NASA/TP--2000-207428 261
Example I0: It has been shown that the guy supports of TABLE C-7.--POISSON DATA
FOR SPEED CONTROLLER
example 5 exhibited a reliability of 0.638 at a time to failure of
5715 hr. Consider now the procedure for determining the
Ordered Time to
confidence band on this log-normal estimate. The data needed sample failure,
for the graphical construction of the 90-percent confidence number tf,
lines on the log-normal graph of figure C-7 are also given in hr
table C-6.
1 3 520.0
Solution 10: The steps necessary to graphically construct the
2 4 671.2
confidence lines in figure C-7 are as follows: 3 6 729.3
4 7 010.0
(1) Enter the rank axis with the first 5-percent rank value 5 8 510.2
6 9 250.1
hittingf(t), the log-normal life function shown in figure C-7;
7 10 910.0
for ordered sample 3, the 5-percent rank is 8.7. 8 11 220.5
(2) Draw a vertical line to intersectf(t) at point I as shown in 9 11 $15.6
figure C-7. l0 12 226.4

(3) Draw a horizontal line to cross the corresponding median


Toad 85 $66.3
rank; for ordered sample 3, the median rank is 25.9.
(4) The intersection point (point 2 in fig. C-7) of step (3) and
the median-rank line is one point on the 95-percent confidence
line. (3) What is the probability that 6, 7, 8, 9, or 10 failures will
(5) Repeat steps (1) to (4) until the desired time to failure is occur? What is the reliability after the fifth failure?
covered; 5715 hr in this case.
(6) The 5-percent confidence line is obtained in a similar Solution 11:
manner. Enter the rank axis with the 95-percent-failure rank,
25.9, for ordered sample 1. (1) Reducing the data given in table C-7 gives the mean time
(7) Draw a vertical line intersectingf(t) at point 3. between failures as
(8) Draw a horizontal line to cross the corresponding median
rank; for ordered sample 1, the median rank is 6.7. 10

(9) The intersection point (point 4 in fig. C-7) of these two _._ ti
lines is one point on the 5-percent confidence line. _-= i=__L_.t
= 8.59 x 104 = 8.59 x 103 hr/failure
(I0) Repeat steps (6) to (9) until the desired time to failure is
N/ 10
covered.

At 5715 hr the 90-percent confidence interval forf(t) is, from Hence, the Poisson failure density function is given by
figure C-7, 19.7 to 69.4 percent. Hence, a 90-percent confi-
dence interval for R(t) at 5715 hr is 0.803 to 0.306. Incidentally,
this graphical procedure for finding confidence intervals is
I t )Nf
completely general and can be used on other types of life test p(Nf)= 8"59×103 e -'/859×103
diagrams.
Estimation using the Poisson and binomial events.--The
Poisson and binomial distributions are discrete functions of the
The reliability function is given by
number of failures Nf that occur rather than of the time t.
The Poisson distribution (fig. C-l) is a discrete function of
the number of failures. When this distribution applies, it is of
interest to determine the probabilities associated with a speci-
fied number of failures in the time continuum. As an illustration
of a complex electrical component that follows the Poisson j=_ J!
distribution, consider example 11.
Example 11: Ten space-power speed controllers were tested
(2) To calculate the probability of five failures in 10 000 hr,
during the rotating solar dynamic development program, The use the ratio
time-to-failure test data are given in table C-7.

( 1) Write the Poisson failure density and reliability functions. t 1.0 x 104
=1.16
(2) What is the probability of five failures in 10 000 hr? t" 8.59 × 103

262 NASA/TP--2000-207428
Theprobability
of five failures in I0 000 hr is given by TABLE C-8.--BINOMIAL
EXPANSION COEFFICIENTS

Sample Possible Binomial


p(5) = (116)5e-116 2.09x0.314 = 5.47× 10 -3 size, numberof expansion
5! = 1.2×102
rl failures _¢fficicats

One easy method of calculating the term (1.16) 5 is 2 I


3 121

log(1.16) 5 = 5 log 1.16 = 5(0.148) = 0.740 s l 4\6:4


, a,

(1.16) 5 = 2.09 4

4! pjqn-j

(3) The reliability from the 5th to the 10th failure is the sum j=Nf

of the remaining terms in the Poisson expansion. The Poisson


expansion in sum form is given by One simple method for obtaining the binomial expansion
coefficients is to make use of Pascal's triangle. Pascal found
10 that there was symmetry to the coefficient development and
R(Ny)= % 0"314(1"16)J explained it as shown in table C-8. Pascal's triangle (dashed
j!
j---6 lines) is shown in the last column. The lower number in the
dashed triangle is obtained by adding the two upper numbers
(i.e., 3 + 3 = 6).
Calculating each term and summing gives
Using these constants and expanding gives p(Nf) as

R(6) = 0.0013
p(Nf)=q4 +4q3p+6q2p2 +4qp3 + p4
The binomial distribution is given in figure C-I as distribu-
tion 7. Considerable work has been done to develop the tech- The probability of one defective unit appearing in a flight
niques suitable for using this powerful tool (refs. C-1 and C-3). quantity of four is given by the second term in the expansion;
As an illustration consider a pyrotechnic part described in hence,
example 12.
Example 12: A suspicious lot of explosive bolts is estimated
4q3p = 4(0.85)3(0.15)= 0.37
to be 15 percent defective due to improper loading density as
observed by neutron radiography.
The resulting histogram for this distribution is shown in figure
(1) Calculate the probability of one defective unit appearing (::-8. The probability that 2, 3, or 4 defects will occur, as the
in a flight quantity of four. reliability after the first defect, is the sum of the remaining terms
(2) Plot the resulting histogram. in the binomial expansion. This probability can be calculated
(3) What is the reliability after the first defect? by using the equation for R(Nf). However, it is simpler to use
the histogram graph and sum the probabilities over Nf from
Not many failure density data are available, but past experience 2 to 4; hence,
with pyrotechnic devices has shown that the binomial distribu-
tion applies. From the given data, the per-unit number of R(2) = 0,096 + 0.011 + 0.0011 = 0.108
effectives q is 0.85, the per-unit number of defectivesp is 0.15,
the sample size n is 4, and the possible number of failures Nf These explosive bolts in their present form are not suitable for
is 0, 1, 2, 3, or 4. The frequency functions corresponding to use on any spacecraft because the probability of zero defects
these constants are given by is only 0.522, much below the usually desired 0.999 for pyro-
technic spacecraft devices.
4_ Determination of confulence limits.--When an estimate is

4-N' made from discrete distributions, it is expected that additional


estimates of the same parameter will be close to the original
estimate. It is desirable to be able to determine upper and lower
and confidence limits at some stated confidence level for discrete

NASA/TP--2000-207428 263
.75 Problem 13: The Poisson estimate of reliability from the 5th

v to the I 0th failure for speed controllers was found to be 0.13013


in a previous problem. What are the upper and lower confidence
O L I IIII II
.5O limits on this estimate at a 95-percent confidence level?
X3 The variation in ? can be found by using figure C-9. Enter
figure C-9 on the 5-percent c_ line at the left-hand end of the 5
"5 interval. Here, T/?l = 10.5; then 71= 10 ? (T/?_ ) = 8.57x 104/10.5
_>, .25
= 8160 hr. Using the left-hand end of the 4 interval gives
t-_

..Q
T/[ 2 = 9.25; then ?2 = 8-57x104/9.25 = 9530 hr. One simple
O

rl
method for finding f(5) is to use figure C-10 (ref. C-5). The
t ?ratios of interest are 1.22, 1.16, and 1.05, respectively. For
2 3 4
these ratios with Nf= 5, the values off(5) from figure C-10
Number of failures, Nf
are 0.997, 0.9987, and 0.99992, respectively. Because the sum
Figure C--8.--Explosive bolts histogram of the last five terms is desired, R(5) is 0.003, 0.0013, and
0.0008, respectively. This means that the probability of the
5th to the 10th failure of a speed control occurring is in the
distribution estimates just as is done for continuous functions of interval 0.0008 to 0.003 at a confidence level of 95 percent.
time. The analytical procedure for determining these intervals As an illustration of how confidence intervals can be obtained
is simplified by using specially prepared tables and graphs. for a binomial distribution, consider example 14.
Useful tables for the binomial distribution are given in the Example 14: The probability of one defective unit appearing
literature (ref. C-3). in a flight quantity of four explosive bolts has been calculated
As an example of how confidence intervals can be obtained to be 0.37. What are the upper and lower confidence limits on
for Poisson estimates, consider problem 13. this estimate at a 90-percent confidence level?

Scale parameter,
(_

20% 10% 5% 1%
lO

o/

V 3 6 7 lu 11 12

--LT_ _ 9 10 11' 12J13 14 16 17


I I i F 18 19 20 21 22 23 24 25
J
! f
I I I I !11 I I )l I I I I ( I I
2 4 6 8 10 12 14 16 18 20 22 24 26 28 30
Normalized time, t/_ 1

Figure C-9._Poisson MTBF fixed test time.

264 NASA/TP--2000-207428
Number of
failures,
Nf

5 6 7 8 9 10

.99999 [--
.9999

.999

.99

.9
.8
_b
.6

,.Q
.4

t-
.2

.1

.01

.001

.0001

.00001
.1 .2 .4 .6 1 2 4 6 8 10 20 30
Time ratio, t/t

Figure C-lO.--Poisson unreliability sum.

If the sample size is n, the number of defectives is r, and the (2) Timeliness--A sample can be studied in less time than the
confidence level is y, this example has the following con- whole population can be studied, giving prompt results.
straints: n = 4, r= 1, and 7= 90 percent. Using these constraints, (3) Destructive nature of a test--Some tests require that the
the upper U and lower L confidence limits can be obtained end item be consumed to demonstrate performance, leaving
directly from existing tables as UCL = 0.680 and LCL = 0.026. nothing to use afterwards.
This means that with a 90-percent confidence level, the prob- (4) Accuracy--A sample survey conducted by well-trained
ability of one defective bolt appearing in a flight quantity of researchers usually will result in accurate and valid decisions.
four is in the interval from 0.026 to 0.680. (5) Infinite population--In many analytical studies, an infi-
nite population is available. If any information is to be used for
decision making, it must be based on a sample.
Sampling
Choosing a sample.--Goodj udgment must be used in choos-
Purpose of sampling.--Sampling is a statistical method ing a sample. Subjective methods of choosing samples fre-
used when it is not practical to study the whole population. quently result in bias. Bias is an expression, either conscious or
There are usually five reasons why sampling is necessary: subconscious, of the selector's preferences. Bias can be held to
a minimum by using a nonsubjective method developed just for
(1) Economy--It usually costs less money to study a sample this purpose. Several nonsubjective sampling procedures are
of an item than to study the whole population. described:

NASA/TP--2000-207428 265
6433 2582 0820 1460 6606 7143 9158 5114 9491 8063
3465 7348 5774 3821 6216 2148 1221 5895 7942 9971
9601 9189 0141 1377 3467 7971 0811 8309 0504 4606
2364 3260 1430 9505 3146 4815 9732 3447 7705 4532
7304 9292 4580 8160 7144 8073 8476 1896 6661 1285

3764 5460 6385 9045 7170 5831 4668 9386 3979 1116
0251 3139 4201 0578 2172 6876 4347 4288 1514 9985
2031 0919 7613 1535 1610 7491 3255 4014 3614 5599
6398 1374 1904 7490 3941 0284 5817 1630 4629 6773
0911 3930 0324 8151 3365 6685 0566 5047 8471 6166

5052 5023 3045 3433 6365 7310 5073 5416 2332 0922
9225 3984 4659 4642 7260 1383 7625 7512 8547 7343
3100 7916 9757 8869 5307 2691 0786 2701 0102 5745
4598 0065 4257 6557 4638 6418 7398 9790 5074 8018
5956 7285 0480 1411 7766 337"7 5023 0227 8047 1887

9360 1041 2094 4212 2623 2384 6422 5374 0651 8673
8796 9974 1913 8309 4943 9423 9143 4683 4436 8413
7071 8254 6825 3020 9000 4673 6129 0176 3670 4836
7336 4451 5863 6559 5344 0714 1856 0451 7855 5998
1660 0222 2005 0215 2370 2687 3039 7953 1960 6579

7506 1020 8718 9665 1892 8245 7249 6023 4602 4227
5000 8237 6203 6829 5325 5784 8720 5053 6347 1112
4255 6894 8093 9191 5011 0452 6199 0009 8086 5170
5764 9837 6780 7490 5412 4869 6950 4183 8671 4008
3609 1368 9129 7113 3099 1887 0544 6415 9148 4381

7218 5939 4932 5465 6648 6365 4179 9266 9803 5572
6854 5911 1495 4940 4630 4514 0942 7218 7382 2145
4403 4263 4755 5451 8251 2652 6207 4841 3528 7665
2978 4381 2205 9638 6946 7126 9039 9194 6676 4396
1072 2292 4428 4934 8183 7385 3236 7748 4488 1351

6488 6568 9530 8316 7709 9022 8041 5564 6667 5329
9263 7756 6300 6793 7769 3099 3606 2468 2574 5230
0357 3493 0385 4451 4313 3024 8243 4920 3523 9644
5372 9351 8393 6023 2811 1744 2306 7083 4330 7278
6570 2866 7565 7871 9490 9050 4454 3475 8319 2972

8596 8251 0336 8119 1968 9115 4202 7785 5269 5941
4177 0092 4207 7386 9891 1149 3429 7062 4622 8415
6438 4892 2089 5509 2054 9024 1213 5791 2543 7863
5820 6287 7464 0339 8585 0968 3675 2440 4000 5148
7721 3804 9520 6184 9152 1853 8640 3601 5606 7218

Figure C-11 .--Random digits table.

(1) Random sampling--Each item in the population has an action is taken on the basis of data obtained from the combina-
equal and independent chance of being selected as a sample. A tion of both samples.
random-digits table (fig. C-I 1) has been developed to facilitate (5) Sequential sampling--Random samples are selected and
drawing random samples and has been constructed to make the studied one at a time. A decision on whether to take action or
10 digits from 0 to 9 equally likely to appear at any location in to continue sampling is made after each observation on the
the table. Adjacent columns of numbers can be combined to get basis of all data available at that selection.
various-sized random numbers.

(2) Stratified samplingmSimilar items in a population are As an illustration of when to use various sampling methods
grouped or stratified, and a random sample is selected from consider example 15.
each group. Example 15: Describe how a sample should be selected for
(3) Cluster sampling--Items in a population are partitioned three cases:

into clusters, and a random sample is selected from each cluster.


(4) Double samplingmA random sample is selected; then, (1) Invoices numbered from 6721 to 8966 consecutively. In
depending on what is learned, some action is taken or a second this case, a random sampling procedure could be used based on
sample is drawn. After the second random sample is drawn, the four-digit table given in figure C-ll. Using the given

266 NASAfrP--2000-207428
invoice numbers, start at the top of the left column and proceed Accelerated Life Testing
down each column selecting random digits until the desired
sample size is obtained. Disregard numbers outside the range of Life testing to define the time duration during which a device
interest. performs satisfactorily is an important measurement in
(2) Printed circuit assemblies to compare the effectiveness of reliability testing because it is a measure of the reliability of a
different soldering methods. If boards are all the same type, a device. The life that a device will exhibit is very much depen-
cluster sampling procedure could be used here. Group the dent on the stresses it is subjected to. The same devices in field
boards by soldering methods; select x joints from each cluster application are frequently subjected to different stresses at
to compare the effectiveness of different soldering methods. varying times. It should be recognized then that life testing
(3) Residual gases in a vacuum vessel to determine the partial involves the following environmental factors:
pressure of gases at various tank locations. A stratified sam-
piing procedure could be used in this case. Stratify the tank near (1) The use stresses may influence the life of the device and
existing feedthroughs into x sections; an appropriate mass run failure rate functions.
could be taken from each section at various ionizer distances (2) The field stresses could be multidimensional.
from the tank walls. Analysis would tell how the partial pres- (3) An interdependence among the stress effects exists in the
sures varied witb ionizer depth at the feedthrough locations. multidimensional stress space.
(4) Life performance may vary because most devices operate
Sample size.--A completely general equation for determin- over a range in a multidimensional stress space.
ing sample size n is given by
Testing objects to failure under multidimensional stress
conditions is usually not practical. Even if it were, if the system
were properly designed, the waiting time to failure would be
n
quite long and therefore unrealistic. It has been shown that
time-to-failure data are important to reliability testing, and now
where
they appear difficult to obtain. These are some of the reasons
why many are turning to accelerated life testing, such as
Nf desired number of time-to-failure points compressed-time testing, advanced-stress testing, or optimum
n sample size life estimates:
tt test truncation time
(1) Compressed-time testing--If a device is expected to
operate once in a given time period on a repeated cycle, life
This equation can be used with any of the reliability functions
testing of this device may be accelerated by reducing the
given in figure C-1.
operating time cycle. The multidimensional stress condition
As an illustration of how these equations can be applied to
need not be changed. The stresses are being applied at a faster
electrical parts, consider example 16, which is derived from rate to accelerate device deterioration. Care should be taken not
example 1.
to accelerate the repetition rate beyond conditions that allow the
Example 16: Tantalum capacitors with a failure rate of
device to operate in accordance with specifications. Such
lxl0 -'3 failure/hr are to be tested to failure. In a 1000-hr test,
acceleration would move the device into a multidimensional
what sample size should be used to get 25 time-to-failure data
stress region that does not exist in field conditions and would
points?
yield biased information. As an illustration of compressed time
Solution 16: The truncated exponential reliability function is
testing, consider example 17.
given by
Example 17: The stepping motor in example 3 was being
pulsed for life testing. How could this life test be accelerated?
R(tt ) = e-t,/100o = 0.37 The power supply providing the stepping pulses may have
been stepping at the rate of one pulse per 10 sec, resulting in a
Solving the general sample size equation for n and substituting test time of 107 sec. These motors had a frequency response
values gives allowing 10 pulses per sec. Increasing the pulse stepping rate up
to the frequency response limit yields comparable time-to-
failure data in 105 sec, a savings in time of 2 orders of magnitude.
n= Nf 25 39.6
1-R(tt) 0.63 (2) Advanced-stress testing--If a device is expected to
operate in a defined multidimensional stress region, life testing
Rounding off to the nearest whole unit gives n = 40 pieces. This of this device may be accelerated by changing the multidimen-
means that 40 capacitors tested for 1000 hr should give 24 sional stress boundary. Usually the changes will be toward
time-to-failure data points. increased stresses because this tends to reduce time to failure.

NASA/TP--2000-207428 267
Thetworeasons whyadvanced stress
testing isused aretosave stresses resulted in fairly long waiting periods to failure.
timeandto seehowa deviceperforms underthesestress Changing the multidimensional stress conditions by a factor of
conditions.
Care should be exercised in changing stress bound- 1.25 to 2, which is usually done during development testing,
aries to be sure that unrealistic conditions leading to wrong tended to identify design deficiencies with shorter waiting
conclusions are not imposed on the device. A thorough study of periods without affecting the failure mechanism.
the failure mechanisms should be made to ensure that proposed (3) Optimum life estimate--One remaining calculation for
changes will not introduce new mechanisms that are not nor- nonreplacement failure or time-truncated life test is the opti-
mally encountered. If an item has a certain failure density mum estimate of mean time between failures ?. It has been
distribution in the rated multidimensional stress region, chang- shown (ref. C-I) that ?given by the time sum divided by the
ing the stress boundaries should not change the failure density number of failures should be modified by a censorship factor
distribution. Some guidelines for planning advanced-stress and a truncation time factor. The censorship factor K is caused
tests are by wearout failures, operator error, manufacturing errors, and
so forth. The correction equation for iis given by (ref. C-I)
(a) Define the multidimensional stress region for an item;
Nf
nominal values should be centrally located.
(b) Study the failure mechanisms applicable to this item.
(c) On the basis of guidelines (a) and (b), decide which stresses ? __ i=l

can be advanced without changing the failure mechanisms. IVI -K


(d) Specify multiple stress tests to establish trends; one point where
should be on the outer surface of the multidimensional region.
(e) Be sure that the specimen size at each stress level is Nf number of failures
adequate to identify the failure density function and that it has
not changed from level to level. K censorship factor
(f) Pay attention to the types of failures that occur at various
stress levels to be sure that new failure mechanisms are not As an illustration consider example 19.
being introduced. Example 19: The tantalum capacitor tested in example 1
(g) Decide whether new techniques being developed for could have been stopped when 10 capacitors (580 part-hours)
advanced-stress testing apply to this item. Several popular out of 100 had failed at a testing time of 100 hr. What is an
techniques are described: optimistic value for ? ?
Solution 19: Inspection of the 10 failed capacitors showed
(i) Sensitivity testing--Test an item at the boundary that two units failed because of manufacturing errors. There-
stress for a given time. If failure occurs, reduce stress by a fixed fore, Nf = 10, K = 2, n = 100 capacitors, tr = 100 hr, and the sum
amount and retest for the same time. If no failure occurs, of ti = 580 hr. Substituting these values into the /'correction
increase stress by a fixed amount and retest for the same time. equation gives
Repeat this process until 25 failures occur. This technique is
used to define endurance limits for items.
(ii) Least-of-Ntesting---Cluster items in groups and sub- ?= 580+(100-10)100 = 1197 hr
10-2
ject each cluster to a specified stress for a given time. Stop at the
first failure at each stress level. Examine failed items to ensure
conformance to expected failure mechanisms. This is an optimistic estimate for the mean time between
(iii) Progressive-stress testing--Test an item by starting failures, but it certainly is fair and reasonable to make these
at the central region in stress space and linearly accelerating types of corrections.
stress with time until failure occurs. Observe both the failure
stress level and the rate of increasing stress. Vary the rate of
increasing stress and observe its effect on the failure stress Accept/Reject Decisions With Sequential Testing
magnitude. Examine failed items to ensure conformance to
expected failure mechanics. A critical milestone occurs in product manufacturing at
delivery time. An ethical producer is concerned about shipping
As an illustration of advanced-stress testing, consider a product lot that does not meet specifications. The consumer
example 18. is concerned about spending money to purchase a product that
Example 18: A power-conditioning supply was being life does not meet specifications. A test method that permits each to
tested at nominal conditions with an associated electric rocket. have an opportunity to obtain data for decisionmaking is
The nominal electrical, thermal, vibration, shock, and vacuum required.

268 NASA/TP--2000-207428
Sequential testing constraints.--If ct is the producer's risk
and 13is the consumer's risk, two delivery time constants valid P,[N _ ( t ,_Nf e-t/loOO
1[ f]=_) Nf!
for small risks have been defined and are given as

(3) Delivery constant B defines the acceptance criteria for


PI/P O.Using this constraint and substituting for P1 and P0 gives

PI(Nf) 2Nz e-t/2000


B=_=

Let Pl be the probability that Nf failures will occur in time t The minimum testing time without failure t(0)mi n is given by
for a specified minimum acceptable ?1, and let P0 be the
probability that Nyfailures will occur in time t for an arbitrarily
0.111 = (2)°e -t(0)min/200
chosen upper value ?0- Test rules using these four constants
have been defined for each condition (refs. C-1 and C-5):
Solving for t(0)mi n gives
(1) Accept if Pj/ Po < B.
(2) Reject if Pl/ Po > A. t(0)min = 2.20 x 2000 = 4400 unit-hr
(3) Continue testing if B < Pt/P 0 < A.
The minimum number of capacitors to be life tested for 48 hr
Exponential parameter decisionmaking.--As an illustra- is given by
tion of how these testing constraints can be implemented for the
exponential distribution, consider example 20. 4400 unit- hr
Example 20: A purchased quantity of 100 000 tantalum nmi n = 91.7
48 hr
capacitors has been received. Negotiations prior to placement
of the order had established that o_= [3= 0.1, ?l= 1000 hr, and
?0= 2000 hr and that the sequential reliability test should be To ensure good results, choose a sample size n that is more than
truncated in 48 hr. twice nmin; for this problem, use n = 200 units. The required
minimum testing time for 200 units is given by
( l ) Calculate A and B.
(2) Write the expressions for PO and PI. 4400 unit-hr
(3) How many units should be placed on test? t(0)min = 200 units = 22.0 hr
(4) Plot a sequential reliability control graph to facilitate
decisionmaking at each failure time. The test can be stopped and an accept/reject decision made at
tr where tt is given by
Solution 20:

tt = 48 hr x 20 units = 9.6 x 103 unit-hr


(1) The delivery time constants are obtained by substituting
values into the defining equations.
(4) The tantalum capacitor reliability chart is constructed by
1-0.I using five points in the (Ns, t) plane; three of these points have
A=_=9 already been calculated and are given by
0.1

t(0)min = 4400, N f = 0
0.1
B=_=0.111
1-0.1
tt = 9.6× 103, Nf =0
(2) Using binomial distribution from figure C-1 and substi-
tuting values gives Po(Nf) and PI(Nf) as t=O, Nf =O

t ]Nf e-t/2000 The remaining two points are calculated by using the test
inequality given by

NASA/TP_2000-207428 269
Because these boundary constraints are straight lines in the form
B<p(Nf)<A

N: --b,+ (aorc)
In general terms the ratio p(N:) is given by

the slope b is given by


- Nf _

, f: t tlj

Taking natural logarithms of the inequality and substituting gives b_ m


5Xl_=_7.x10
in((0 / 0.69
Lq )
In B< Nfln - - t <In A
Figure C-12 shows the resulting tantalum capacitor reliability
chart. The tantalum capacitor acceptance reliability test results
Adding (l/i 1 - 1/} 0)t to each term gives in an "accept," "continue to test," or "reject" decision depend-
ing on the failure performance of the capacitors as a function of
operating time in unit-hours as zoned in figure C-12.
Binomial parameter decisionmaking.--For the binomial frequency function, the procedure to set up a sequential reliability test is similar to the Poisson methodology. Because the unreliability, or number of defectives, is given by 1 - R for an effectiveness of R, P1(Nf) is given in binomial form by

   P1(Nf) = [n!/(Nf!(n - Nf)!)](1 - R1)^Nf (R1)^(n - Nf)

where

   n        total number of trials, Ns + Nf
   Ns       number of successful trials
   Nf       number of failed trials
   R0, R1   chosen reliability values at some time t, R0 > R1

The ratio P1(Nf)/P0(Nf) is given by

   P1(Nf)/P0(Nf) = [(1 - R1)/(1 - R0)]^Nf (R1/R0)^(n - Nf)

Following the steps given in example 20 gives the corresponding points in the (Nf, n) plane:

   N(0)min = ln B/ln(R1/R0), Nf = 0

   Nr = tt Nc, Nf = 0

   n = 0, Nf = 0

where Nc is the number of units chosen for testing; the test can be stopped and an accept/reject decision made at the number of test truncation trials Nr. The remaining two points are given by the constants

   a = ln B/ln[R0(1 - R1)/R1(1 - R0)], n = 0

   c = ln A/ln[R0(1 - R1)/R1(1 - R0)], n = 0
Figure C-12.--Tantalum capacitor reliability chart. (Number of failures Nf, from -4 to 10, versus operating time t, from 0 to 10 x 10^3 unit-hr; the lines a + bt and c + bt bound the accept, continue-to-test, and reject zones, and the test is truncated at tt = 9.6 x 10^3 unit-hr.)

The slope b is given by

   b = ln(R0/R1)/ln[R0(1 - R1)/R1(1 - R0)]

The inequality equation for these conditions is given by

   a + bn < Nf < c + bn

Accept/reject charts at delivery milestones, when based on reliability sequential testing methods, provide a rigorous mathematical method for deciding whether to accept or reject an order of components. The actual reliability value for these components is not known, nor is it wise to consider reliability assessment at this critical milestone.

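As a companion to the exponential sketch above, here is a minimal Python sketch of these binomial boundaries. The R0 and R1 values used at the bottom are illustrative assumptions only; the text leaves them general:

```python
import math

# Binomial sequential-test boundary constants for a + bn < Nf < c + bn.
def binomial_boundaries(R0, R1, A=9.0, B=0.111):
    denom = math.log(R0*(1 - R1) / (R1*(1 - R0)))
    a = math.log(B) / denom        # accept-line intercept
    c = math.log(A) / denom        # reject-line intercept
    b = math.log(R0/R1) / denom    # slope in failures per trial
    return a, b, c

# Illustrative reliability values only:
a, b, c = binomial_boundaries(R0=0.95, R1=0.90)
print(f"accept if Nf <= {a:.2f} + {b:.3f}n; reject if Nf >= {c:.2f} + {b:.3f}n")
```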


Subsample f chart.--The chief advantages of a subsample f chart are that (1) it reduces reliability acceptance testing costs, (2) it provides for product improvements, (3) it determines if statistical control exists, and (4) it determines the mean time to repair.

Example 21: A power supply has the following data:

(1) Acceptable reliability level r1, 0.001 failure/hr; producer's reliability risk Rα, 10 percent; specified mean time to repair, 3.0 hr
(2) Lot tolerance fractional reliability deviation r2, 0.005 failure/hr; consumer's reliability risk Rβ, 10 percent

The product test data are given in table C-9. Use figure C-13 to analyze these data; then answer the following questions:

(1) What is a suitable time sample and rejection number for meeting the 80-percent confidence level selected by management?
(2) What are the subsample sizes and rejection numbers?
(3) What are the confidence levels for the various rejection numbers?
(4) What are the control limits on the mean time to repair?
(5) Plot these data on a subsample f chart.
(6) What should be done with the manufactured units?

TABLE C-9.--POWER SUPPLY PROBLEM DATA

 Sample    Number of   Reason for failure                      Repair
 serial    failures                                            time,
 number                                                        hr
 ----------------------------------------------------------------------
 1         1           A1A2-VR3 zener shorted                  1.2
           1           Ground wire broke                       1.4
           2           A1A2-VR3 zener shorted;                 5.5, 7.3
                       A1A2-Q2 transistor shorted
           0           In a 250-hr test no failure occurred    ---
 2         0           In a 250-hr test no failure occurred    ---
           1           A3A1-C3 capacitor leaked                9.5
 3         1           A3A1-C3 capacitor leaked                9.0
           0           In a 250-hr test no failure occurred    ---
 4         1           A7A1-VR1 unsoldered joint               0.5
           1           A3A1-C3 capacitor leaked                9.5
 5         0           In a 250-hr test no failure occurred    ---

Solution 21: Given the product data, follow these steps:

(1) Calculate the confidence level γ, the ratio of acceptable reliability level to lot tolerance fractional reliability deviation k, and the mean time between failures m:

   γ = 1 - (Rα + Rβ) = 1 - (0.1 + 0.1) = 0.80, or 80 percent

   k = r2/r1 = 0.005/0.001 = 5

   m = 1/r1 = 1/0.001 = 1000 hr

Looking up Zα in a normal curve area table (table 3 in ref. C-3) for Rα = 0.1 shows that Zα = -1.28. The value of K² when k = 5 and γ = 0.80 is obtained from figure 11-1 in reference C-3, where K² = 1.05. The equation for t is thus t = mK² = (1000)(1.05) = 1050 hr ≈ 1000 hr. The rejection number R for a time sample of 1000 hr and a confidence level γ = 0.80 is given by

   R1000(0.80) = K² + ZαK + 0.5 = 1.05 + (1.28)(1.025) + 0.5 = 2.86 ≈ 3

(2) Recalculate the subsample for γ = 0.50 and k = 5: From figure 11-1 in reference C-3, K² = 0.29. Therefore,

   t = mK² = (1000)(0.29) = 290 hr ≈ 250 hr

Looking up Zα in table 3 in reference C-3 for

   (1 - γ)/2 = (1 - 0.5)/2 = 0.25

shows that Zα = -0.68. Recalculate the rejection number as

   R250(0.50) = K² + ZαK + 0.5 = 0.29 + (0.68)(0.54) + 0.5 = 1.16 ≈ 1 failure

(3) Calculate K² for each value of t shown in table C-10 as

   K² = t/m = 250/1000 = 0.25   for k = 5; m = 1000 hr

Look up in figure 11-1 of reference C-3 the confidence level γ values shown in table C-10. Calculate Rα for each confidence level (the calculated values are shown in table C-10):

   Rα = (1 - γ)/2 = (1 - 0.46)/2 = 0.27

Look up Zα for each confidence level in table 3 of reference C-3 (the values are tabulated in table C-10). Recalculate the rejection numbers Rt(γ) for each subsample (the values are listed in table C-10):

   Rt(γ) = K² + ZαK + 0.5

   R250(0.46) = 0.25 + (0.61)(0.5) + 0.5 = 1.05 ≈ 1

   R500(0.63) = 0.50 + (0.89)(0.71) + 0.5 = 1.63 ≈ 2

   R750(0.73) = 0.75 + (1.11)(0.87) + 0.5 = 2.21 ≈ 2

   R1000(0.78) = 1.00 + (1.22)(1) + 0.5 = 2.72 ≈ 3

TABLE C-10.--SUBSAMPLE DATA

 t, hr    K²      γ       Rα       Zα      Rt(γ)
 ------------------------------------------------
 250      0.25    0.46    0.27     0.61    1
 500      0.50    0.63    0.185    0.89    2
 750      0.75    0.73    0.133    1.11    2
 1000     1.0     0.78    0.11     1.22    3
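These rejection numbers can be checked with a few lines of code. A minimal Python sketch follows; the function name rejection_number is illustrative, and K² and Zα are the chart and table lookups described above:

```python
import math

# Rejection number R_t(gamma) = K^2 + Z_alpha*K + 0.5, rounded to whole failures.
def rejection_number(k_squared, z_alpha):
    k = math.sqrt(k_squared)
    return round(k_squared + z_alpha*k + 0.5)

# (K^2, Z_alpha) pairs from table C-10 for t = 250, 500, 750, 1000 hr:
for t, k_sq, z in [(250, 0.25, 0.61), (500, 0.50, 0.89),
                   (750, 0.75, 1.11), (1000, 1.00, 1.22)]:
    print(t, rejection_number(k_sq, z))   # prints 1, 2, 2, 3
```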
(4) Find the control limits on the mean time to repair for the data given in table C-9:

   UCL = 2fφ/χ² = (2 x 4 x 3)/3.49 = 6.88 hr

   LCL = 2fφ/χ² = (2 x 4 x 3)/13.4 = 1.79 hr

where f is the average number of failures, φ denotes the specified mean time to repair, and 3.49 and 13.4 are the lower and upper chi-square percentile values for 2f = 8 degrees of freedom at the 80-percent confidence level. These control limits are shown in figure C-13 for the repair time process. The lower control limit in this case has no importance other than statistical completeness because any value less than 1.79 hr is an indication of a better maintenance activity than what has been specified, a desirable condition.
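For readers who want to reproduce these limits, here is a minimal sketch using SciPy's chi-square percent-point function. Pairing the 10th and 90th percentiles with the UCL and LCL, respectively, is inferred from the printed values 3.49 and 13.4:

```python
from scipy.stats import chi2

f, phi, gamma = 4, 3.0, 0.80        # average failures, specified MTTR (hr), confidence
dof = 2*f                            # 8 degrees of freedom

ucl = 2*f*phi / chi2.ppf((1 - gamma)/2, dof)   # 24/3.49  -> ~6.88 hr
lcl = 2*f*phi / chi2.ppf((1 + gamma)/2, dof)   # 24/13.36 -> ~1.79 hr
print(f"UCL = {ucl:.2f} hr, LCL = {lcl:.2f} hr")
```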
The completed subsample f chart is shown in figure C-13. Table C-11 shows the tabulated data calculated to solve this problem.

TABLE C-11.--POWER SUPPLY ANALYZED DATA
[Sample size, 250 hr.]

 Time     Sample    Sample    Reason for failure           Number of   Repair     Mean time
 sample   serial    number                                 failures    time,      to repair,
          number                                                       hr         hr
 --------------------------------------------------------------------------------------------
 1        1         1         A1A2-VR3 zener shorted       1           1.2        ---
                    2         Ground wire broke            1           1.4        ---
                    3         A1A2-VR3 zener shorted;      2           5.5, 7.3   5.1
                              A1A2-Q2 transistor shorted
                    4         No failures occurred         0           ---        ---
 2        2         5         No failures occurred         0           ---        ---
                    6         A3A1-C3 capacitor leaked     1           9.5        ---
          3         7         A3A1-C3 capacitor leaked     1           9.0        4.6
                    8         No failures occurred         0           ---        ---
 3        4         9         A7A1-VR1 unsoldered joint    1           0.5        ---
                    10        A3A1-C3 capacitor leaked     1           9.5        ---
          5         11        No failures occurred         0           ---        ---
                    12        No failures occurred         0           ---        ---
 --------------------------------------------------------------------------------------------
 Totals                                                    8           43.9

During the various subsample intervals, some useful conclusions can be drawn:

(1) During subsample interval 1 to 4 the failures total

   Σ fi = 4    (i = 1 to 4)

Reject serial number 1, request an engineering investigation, and repair and retest serial number 1 later.
(2) During subsample interval 5 to 8 the failures total

   Σ fi = 2    (i = 5 to 8)

Ship serial numbers 2 and 3 after all failures have been reviewed, the cause identified, and appropriate corrective action worked out and approved by an engineering review board.
(3) During subsample interval 9 to 12 the failures total

   Σ fi = 2    (i = 9 to 12)

Ship serial numbers 4 and 5 after all failures have been reviewed, properly closed out, and approved by the engineering review board.

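The interval decisions can likewise be spot-checked in code. The sketch below assumes the usual acceptance-sampling convention that an interval is rejected when its failure count reaches the rejection number:

```python
# Failures per 250-hr sample from table C-11, grouped into 1000-hr intervals.
failures = [1, 1, 2, 0,   0, 1, 1, 0,   1, 1, 0, 0]
r_1000 = 3                 # rejection number for a 1000-hr time sample

for start in (0, 4, 8):
    total = sum(failures[start:start + 4])
    verdict = "reject" if total >= r_1000 else "acceptable"
    print(f"samples {start + 1}-{start + 4}: {total} failures -> {verdict}")
```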
References

C-1. Bazovsky, I.: Reliability Theory and Practice. Prentice-Hall, 1961.
C-2. Earles, D.R.; and Eddins, M.F.: Reliability Physics. AVCO Corp., Wilmington, MA, 1962.
C-3. Calabro, S.R.: Reliability Principles and Practices. McGraw-Hill, 1962.
C-4. Berrettoni, J.N.: Practical Applications of the Weibull Distribution. American Society for Quality Control, Annual Technical Conference Transactions, vol. 16, 1962, p. 303.
C-5. Failure Distribution Analyses Study. Vols. I, II, and III. Computer Applications Inc., NY, Aug. 1964.
C-6. Hoel, P.G.: Elementary Statistics. John Wiley & Sons, 1960.
Reliability Training Answers

Chapter   Answers

1         1 (B), 2 (D), 3 (C), 4 (C)

2         1a (C), 1b (B), 2a (C), 2b (B), 3a (C), 3b (A), 4a (B), 4b (C), 5ai (B), 5aii (C),
          5aiii (B), 5b (C), 6a (C), 6b (B), 7a (B), 7b (C), 8a (C), 8b (C), 9 (D), 10 (A),
          11 (B), 12 (C), 13 (C), 14 (C), 15 (D), 16 (E), 17 (D), 18 (F)

3         1a (B), 1b (B), 1c (C), 2a (A), 2b (C), 2c (A), 3a (B), 3b (A), 3c (B), 4 (C), 5a (B),
          5b (B), 6 (C), 7a (A), 7b (B), 7c (B), 7d (C), 7e (A), 8 (B), 9a (B), 9b (C), 10a (C),
          10b (C), 10c (A)

4         1a (B), 1b (B), 2a (A), 2b (A), 3 (C), 4a (B), 4b (B)

5         1 (C), 2 (B), 3a (C), 3b (A), 3c (C), 4a (C), 4b (B), 4c (A), 5a (C), 5b (A), 6a (C),
          6b (C), 6c (A), 7a (B), 7b (C), 7c (C), 7d (C), 8a (A), 8b (C), 8c (B), 8d (C), 8e (B),
          8f (B)

6         1a (B), 1b (C), 1c (A), 2a (C), 2b (B), 2c (A), 2d (C), 3a (B), 3b (C), 3ci (B), 3cii (A)

7         1 (C), 2 (B), 3 (D), 4 (A), 5 (B), 6 (C), 7 (B), 8 (C)

8         1. Item 4, squawk, major, wrong, reliability, subsystem

9         1 (B), 2 (A), 3 (C), 4a (C), 4b (B), 4c (F), 5 (A), 6a (C), 6b (B), 7 (A), 8a (B), 8b (A)

10        1 (D), 2 (D), 3 (G), 4 (B), 5 (A), 6 (E), 7 (B), 8 (D), 9 (A), 10 (C), 11 (B), 12 (F),
          13 (E), 14a (C), 14b (C), 15 (C), 16 (B), 17 (E), 18 (A), 19a (C), 19b (B), 19c (A)

11        1a (C), 1b (B), 1c (C), 1d (C), 2a (C), 2b (A), 2c (B), 2d (C)

Appendix D
Training Manual for Elements of Interface
Definition and Control
As part of this reliability and maintainability training manual, the authors have included in this appendix the published document
Training Manual for Elements of Interface Definition and Control. Their desire was to provide the reader the complete texts for
reliability training. This manual was published in 1997 and appears here exactly as it does in print. To avoid confusion, the reader
should note that the original page numbers and content have been retained.

NASA Reference Publication 1370

1997

Training Manual for Elements of Interface Definition and Control

Vincent R. Lalli
Lewis Research Center
Cleveland, Ohio

Robert E. Kastner
Vitro Corporation
Rockville, Maryland

Henry N. Hartt
Vitro Corporation
Washington, DC

National Aeronautics and Space Administration

Office of Management
Scientific and Technical Information Program
Preface
This technical manual was developed under the Office of Safety and Mission Assurance continuous
training initiative. The structured information contained in this manual will enable the reader to efficiently and
effectively identify and control the technical detail needed to ensure that flight system elements mate properly
during assembly operations (both on the ground and in space).
Techniques used throughout the Federal Government to define and control technical interfaces for both
hardware and software were investigated. The proportion of technical information actually needed to
effectively define and control the essential dimensions and tolerances of system interfaces rarely exceeded 50
percent of any interface control document. Also, the current Government process for interface control is very
paper intensive. Streamlining this process can improve communication, provide significant cost savings, and
improve overall mission safety and assurance.
The primary thrust of this manual is to ensure that the format, information, and control of interfaces
between equipment are clear and understandable, containing only the information needed to guarantee
interface compatibility. The emphasis is on controlling the engineering design of the interface and not on the
functional performance requirements of the system or the internal workings of the interfacing equipment.
Interface control should take place, with rare exception, at the interfacing elements and no further.
There are two essential sections of the manual. The first, Principles of Interface Control, discusses how
interfaces are defined. It describes the types of interface to be considered and recommends a format for the
documentation necessary for adequate interface control. The second, The Process: Through the Design Phases,
provides tailored guidance for interface definition and control.
This manual can be used to improve planned or existing interface control processes during system design
and development. It can also be used to refresh and update the corporate knowledge base. The information
presented herein will reduce the amount of paper and data required in interface definition and control processes
by as much as 50 percent and will shorten the time required to prepare an interface control document. It also
highlights the essential technical parameters that ensure that flight subsystems will indeed fit together and
function as intended after assembly and checkout.



Acknowledgments

In 1992 the NASA Reliability and Maintainability Steering Committee recognized the need to provide its
engineers, especially its design engineers, with a practical understanding of the principles and applications of
interface definition and control documentation. A working group was formed and met at Goddard Space Flight
Center to discuss how some of the NASA centers were handling this topic. Four centers and NASA Headquarters
participated in the meeting: Headquarters--NASA handbook methods; Johnson Space Center (JSC) and
Marshall Space Flight Center (MSFC)--space station; Lewis Research Center--space station and launch
vehicles; Jet Propulsion Laboratory (JPL)--Mars Observer; and Goddard Space Flight Center (GSFC)--space
experiments.
To satisfy the need for a short, informative interface definition and control training manual, Robert E.
Kastner and Henry N. Hartt of the Vitro Corporation prepared the manual using material from the working
group meeting. Harvey L. Schabes and William J. Taylor of Lewis Research Center served as the final
NASA project office reviewers. Their suggestions improved the usefulness of the text for flight projects. The
dedication, time, and technical contributions of Jack Remez/GSFC; Donald Bush (retired)/MSFC; David
Oberhettinger/SYSCON (JPL); Daniel Deans/LORAL (JSC); and Ronald Lisk/NASA Headquarters (Code Q)
in the preparation of this manual are appreciated. Without the support of their individual centers and their
enthusiastic personal support and willingness to serve on the NASA Reliability and Maintainability Steering
Committee, this manual would not have been possible.
The following NASA members of the steering committee may be contacted for more information about the
processes and products discussed in this manual:

James F. Clawson
Jet Propulsion Laboratory,
California Institute of Technology
MS 301-456, SEC 505
4800 Oak Grove Drive
Pasadena, CA 91109
Office: (818) 356-7021
Facsimile: (818) 393-4699
E-mail: [email protected]

John Greco
NASA Langley Research Center
MS 421, Bldg. 1162A, Rm. 125
5A Hunsaker Loop
Hampton, VA 23681-0001
Office: (804) 864-3018
Facsimile: (804) 864-6327
E-mail: [email protected]

Wilson Harkins
NASA Headquarters, Code QS
300 E. Street SW
Washington, DC 20546
Office: (202) 358-0584
Facsimile: (202) 358-3104
E-mail: [email protected]

Vincent R. Lalli
NASA Lewis Research Center
Code 0152, MS 501-4
Cleveland, OH 44135
Office: (216) 433-2354
Facsimile: (216) 433-5270
E-mail: [email protected]

Michael E. Langley
NASA George C. Marshall Space Flight Center
CR-10, Bldg. 4203
Marshall Space Flight Center, AL 35812
Office: (205) 544-0056
Facsimile: (205) 544-4155
E-mail: [email protected]

Dan Y. Lee
NASA Ames Research Center
MS 240A-3, P.O. Box 1000
Moffett Field, CA 94035-1000
Office: (415) 604-5962
Facsimile: (415) 604-0399
E-mail: [email protected]

Leon R. Migdalski
NASA Kennedy Space Center
RT-SRD-2, OSB 3309
Kennedy Space Center, FL 32899
Office: (407) 861-3284
Facsimile: (407) 861-4314
E-mail: [email protected]

Jack W. Remez
NASA Goddard Space Flight Center
Code 302, Bldg. 6, Rm. S240
Greenbelt, MD 20771
Telephone: (301) 286-7113
Facsimile: (301) 286-1701
E-mail: [email protected]

Donald L. Wiley
NASA Lyndon B. Johnson Space Center
Code NS3, Bldg. 45, Rm. 616B
Houston, TX 77058
Office: (713) 483-4084
Facsimile: (713) 483-3045
E-mail: [email protected]

Contents

Chapter

1. Introduction
   1.1 Training

2. Principles of Interface Control
   2.1 Purpose of Interface Control
   2.2 Identifying Interfaces
   2.3 Categorizing (Partitioning) and Defining Interfaces
       2.3.1 Electrical/Functional
       2.3.2 Mechanical/Physical
       2.3.3 Software
       2.3.4 Supplied Services
   2.4 Documenting Interfaces
   2.5 Identifying Steady-State and Non-Steady-State Interfaces
   2.6 Selecting a Custodian
   2.7 Analyzing for Interface Compatibility
   2.8 Verifying Design Compliance With Interface Control Requirement
   2.9 Verifying Contract-Deliverable Item
   2.10 Training

3. The Process: Through the Design Phases
   3.1 Program Phases
       3.1.1 Concept Definition
       3.1.2 Requirements Definition
       3.1.3 Systems Integration
   3.2 Preparing and Administering Interface Control Document
       3.2.1 Selecting Types of Interface Control Document
       3.2.2 Tracking and Resolving Missing Interface Design Data
   3.3 Initial Issuance of ICD
   3.4 Document Review and Comment
       3.4.1 Resolving Comments
       3.4.2 Interface Control Working Group
       3.4.3 Approval/Signoff Cycle
       3.4.4 Technical Approval
       3.4.5 Baselining
   3.5 Change Notices
       3.5.1 Initiating Changes
       3.5.2 Requesting Changes
       3.5.3 Proposed Change Notice Review and Comment Cycle
       3.5.4 Processing Approved Changes
       3.5.5 Distributing Approved Changes
       3.5.6 Configuration Control Board
       3.5.7 Closing the Loop
   3.6 Training

Appendixes:
   A: Electrical/Functional Interface Example
   B: Mechanical/Physical Interface Examples
   C: Software Interface Example
   D: Supplied Services Interface Example
   E: Compatibility Analysis
   F: Bracket System for Interfaces
   G: ICD Guidelines
   H: Glossary

References
Bibliography
Training Answers


Chapter 1

Introduction

This technical manual resulted from an investigation of techniques used throughout NASA and other Federal Government agencies to define and control technical interfaces for both hardware and software. The processes described herein distill the requirements for interface definition and control into a concise set of parameters that control the design of only the interface-related elements rather than providing extraneous design detail that must subsequently be configuration managed.

The purpose of this manual is to provide guidelines for establishing and conducting the interface control process so that items produced by different design agencies satisfactorily mate and operate in a way that meets mission requirements. These guidelines were drawn from the methodologies of a number of highly successful programs and therefore represent a compilation of "lessons learned."

The principles and processes of interface definition and control presented in this document apply to all projects and programs but may be tailored for program complexity. For example, the interface control process may be less formal for a project or program that requires only one or two end items and has few participants; however, the formal interface control document is still necessary. For a project or program that requires a number of end items and where several participants are involved, a carefully followed interface control process is imperative, with comments, decisions, agreements, and commitments fully documented and tracked. Individual managers should provide the implementation criteria for their interface control processes early in the project or program (ref. 1).

This manual covers the basic principles of interface definition and control: how to begin an interface control program during the development of a new project or program, how to develop and produce interface documentation, how to manage the interface control process, and how to transfer interface control requirements to hardware and software design.

Interface definition and control is an integral part of system engineering. It should enter the system engineering cycle at the end of the concept development phase. Depending on whether the system under development is designed for one-time or continuous use, the process may continue for the full life cycle of the system. Interface definition and control should not be equated to configuration management or configuration control. Rather it is a technical management tool that ensures that all equipment will mate properly the first time and will continue to operate together as changes are made during the life cycle of the system. Figure 1.1 depicts the elements of the system engineering cycle and is used in chapter 3 to describe the application of the interface discipline at different parts of the life cycle (ref. 2).

Establishing a system that ensures that all interface parameters are identified and controlled from the initial design activities of a program is essential. It is not necessary that the fine details of these parameters be known at that time, but it is very important that the parameters themselves are identified, that everything known about them at that time is recorded and controlled, and that voids¹ are identified and scheduled for elimination. The latter requirement is of primary importance to the proper design of any interface. Initial bounding of a void and scheduled tightening of those bounds until the precise dimensions or conditions are identified act as a catalyst to efficient design and development. An enforced schedule for eliminating voids is one of the strongest controls on schedule that can be applied (ref. 3).

The process of identifying, categorizing, defining, and documenting interfaces is discussed in the following chapter. Guidance for the analysis of interface compatibility is also provided.

Figure 1.1.--System engineering cycle: mission needs definition; risk and systems analysis; concept and requirements definitions; system integration; configuration management; technical oversight; and verification and validation. (The requirements definition phase must include the requirements for the interfaces as well as those which will eventually be reflected in the interface control document.)

¹A "void" is a specific lack of information needed for control of an interface feature. Control and elimination of voids is fundamental to a strong interface definition and control program.

1.1 Training²

1. The processes explained in this manual for interface definition and control are
   A. A concise set of parameters that control the design of the interface-related elements
   B. A set of design details needed for configuration management

2. The process is very important for projects that require
   A. A number of end items
   B. Involvement of several participants
   C. Comments, decisions, agreements, and commitments that must be fully documented and tracked
   D. All of the above

3. What elements does the system engineering cycle contain?
   A. Mission needs, requirements, and integration
   B. Technical oversight, core design, and system configuration
   C. Mission needs definition, risk and systems analysis, concept and requirements definitions, system integration, configuration management, technical oversight, and verification and validation

4a. What is a void?
   A. Bracketed data
   B. Wrong data
   C. Lack of information needed

4b. How should voids be handled?
   A. Voids should be identified and their elimination scheduled.
   B. Data should be analyzed.
   C. Supplier should be guided.

4c. Name a strong control needed for voids.
   A. Precise dimensions
   B. Enforced schedule
   C. Identified catalysts

²Answers are given at the end of this manual.

Chapter 2

Principles of Interface Control


2.1 Purpose of Interface Control

An interface is that design feature of a piece of equipment³ that affects the design feature of another piece of equipment. The purpose of interface control is to define interface requirements so as to ensure compatibility between interrelated pieces of equipment and to provide an authoritative means of controlling the design of interfaces. Interface design is controlled by an interface control document (ICD). These documents

1. Control the interface design of the equipment to prevent any changes to characteristics that would affect compatibility with other equipment
2. Define and illustrate physical and functional characteristics of a piece of equipment in sufficient detail to ensure compatibility of the interface, so that this compatibility can be determined from the information in the ICD alone
3. Identify missing interface data and control the submission of these data
4. Communicate coordinated design decisions and design changes to program participants
5. Identify the source of the interface component

ICD's by nature are requirements documents: they define design requirements and allow integration. They can cause designs to be the way they are. They record the agreed-to design solution to interface requirements and provide a control mechanism to ensure that the agreed-to designs are not changed by one participant without negotiated agreement of the other participant.

To be effective, ICD's should track a schedule path compatible with design maturation of a project (i.e., initial ICD's should be at the 80% level of detail at preliminary design review, should mature as the design matures, and should reach the 99% mark near the critical design review).

2.2 Identifying Interfaces

Identifying where interfaces are going to occur is a part of systems engineering that translates a mission need into a configured system (a grouping of functional areas) to meet that need. Each functional area grouping is assigned certain performance requirements. These performance requirements are translated into design requirements as the result of parametric studies, tradeoff studies, and design analyses. The design requirements are the basis for developing the system specifications. The boundaries between the functional areas as defined in the system specifications become the interfaces. Early interface discussions often contribute to final subsystem specifications. Interface characteristics, however, can extend beyond the interface boundary, or interface plane, where the functional areas actually come together. The interface could be affected by, and therefore needs to be compatible with, areas that contribute to its function but may not physically attach. For example, it may be necessary to define the path of a piece of equipment as it traverses through another piece of equipment and rotates and articulates to carry out its function. Electrical characteristics of a transmitter and receiver separated by an interface plane may have to be defined for each to function properly. Similarly, the acoustic energy produced by one component and transmitted through the structure or onto another component may need a corresponding definition.

Identifying interfaces early in a program is essential to successful and timely development. Functional analyses are used for analyzing performance requirements and decomposing them into discrete tasks or activities (i.e., decomposing the primary system functions into subfunctions at ever increasing levels of detail). Functional block diagrams are used to define data flow throughout the system and interfaces within the system. Once the segments and elements within a system have been defined, a top-level functional block diagram is prepared. The block diagrams are then used in conjunction with N-squared diagrams to develop interface data flows. The N-squared diagram is a technique used extensively to develop data interfaces but can also be refined for use in defining hardware interfaces. However, use of this tool in this manual will be restricted to interface categorization. Additional description is provided in section 3.1.1.

In summary, identifying where interfaces are going to occur begins the systems integration component of systems engineering and must start early in design planning. The interface boundaries or planes vary from program to program depending on how design and development responsibilities are assigned. Interface control can occur within a functional area of other design and development agents. Therefore, interfaces can be identified at many levels, for example,

1. Center to center
2. Discipline to discipline (e.g., propulsion to guidance, sensor to structure, or power to users)
3. Contractor to contractor

³For purposes of this manual, a piece of equipment is a functional area assigned to a specific source. Thus, a piece of equipment can be an element of the space station, a system of a spacecraft, a work package assigned to a contractor, or a subsystem.

4. Center to contractor to discipline
5. Program to program (e.g., shuttle to National Launch System)

Once interface boundaries or planes are established, the interfaces must be categorized and defined.

2.3 Categorizing (Partitioning) and Defining Interfaces

Categorizing, or partitioning, interfaces separates the interface features by technical discipline and allows each category, in most cases, to proceed through the definition process independently.

The following basic interface categories (defined by the types of feature and data they encompass) are recommended for use in most programs:

1. Electrical/functional
2. Mechanical/physical
3. Software
4. Supplied services

During the early phases of systems engineering, interfaces may be assigned only the high-level designation of these categories. As the system becomes better defined, the details of the physical and functional interface characteristics become better defined and are documented.

An interface can be assigned to one of these categories by a number of processes of elimination. The one recommended for use is the N-squared diagram (ref. 4), which is currently being used by some NASA centers.
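Because the N-squared diagram is essentially a matrix of functional blocks with the data that flow between them, it can be sketched as a simple data structure. The block names and flows below are hypothetical, for illustration only:

```python
# N-squared diagram as a matrix: blocks on the diagonal, interfaces off-diagonal.
blocks = ["Power", "Guidance", "Propulsion", "Structure"]
n = len(blocks)

# flows[i][j] is the output of blocks[i] that becomes an input to blocks[j].
flows = [[None]*n for _ in range(n)]
flows[0][1] = "28-Vdc regulated power"     # hypothetical entries
flows[1][2] = "thrust vector commands"
flows[2][3] = "thrust loads"

# Every nonempty off-diagonal cell is a candidate interface to categorize and control.
for i in range(n):
    for j in range(n):
        if i != j and flows[i][j]:
            print(f"{blocks[i]} -> {blocks[j]}: {flows[i][j]}")
```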
2.3.1 Electrical/Functional

Electrical/functional interfaces are used to define and control the interdependence of two or more pieces of equipment when the interdependence arises from the transmission of an electrical signal from one piece of equipment to another. All electrical and functional characteristics, parameters, and tolerances of one equipment design that affect another design are controlled by the electrical/functional ICD. The functional mechanizations of the source and receiver of the interface electrical signal are defined, as well as the transmission medium.

The interface definition includes the data and/or control functions and the way in which these functions are represented by electrical signals. Specific types of data to be defined are listed here:

1. Function name and symbol
2. Impedance characteristics
3. Shielding and grounding
4. Signal characteristics
5. Cable characteristics
6. Data definition
7. Data transmission format, coding, timing, and updating
8. Transfer characteristics
9. Circuit logic characteristics
10. Electromagnetic interference requirements
11. Data transmission losses
12. Circuit protective devices

Other data types may be needed. For example, an analog signal interface document would contain function name and symbol, cable characteristics, transfer characteristics, circuit protective devices, shielding, and grounding; whereas a digital data interface would contain function name and symbol, data format, coding, timing and updating, and data definition. Additional data types under the electrical/functional heading are

1. Transmission and receipt of an electrical/electromagnetic signal
2. Use of an electrically conductive or electromagnetic medium

Appendix A shows recommended formats for electrical and functional interface control drawings.

2.3.2 Mechanical/Physical

Mechanical/physical interfaces are used to define and control the mechanical features, characteristics, dimensions, and tolerances of one equipment design that affect the design of another subsystem. They also define force transmission requirements where a static or dynamic force exists. The features of the equipment that influence or control force transmission are also defined in this ICD. Mechanical interfaces include those material properties of the equipment that can affect the functioning of mating equipment, such as thermal and galvanic characteristics. Specific types of data defined are

1. Optical characteristics
2. Parallelism and straightness
3. Orientation requirements
4. Space or provisions required to obtain access for performing maintenance and removing or replacing items, including space for the person performing the function
5. Size, shape, mass, mass distribution, and center of gravity
6. Service ports
7. Indexing provisions
8. Concentricity
9. Surface finish
10. Hard points for handling



11. Sealing, pressurization, attachment, and locking provisions
12. Location and alignment requirements with respect to other equipment
13. Thermal conductivity and expansion characteristics
14. Mechanical characteristics (spring rate, elastic properties, creep, set, etc.)
15. Load-carrying capability
16. Galvanic and corrosive properties of interfacing materials

Other data types may be needed. For example, an ICD controlling a form-and-fit interface would generally contain such characteristics as size and shape of the item, location of attachment features, location of indexing provisions, and weight and center of gravity of the item. However, an ICD controlling a structural load interface would contain weight and center of gravity, load-carrying capability, and elastic properties of the material if applicable to the loading conditions. Not all ICD's controlling a form-and-fit interface would have to contain all types of data given in this example, but some form-and-fit interface definitions contain more than the 16 types of data listed. Indexing definitions may require angularity, waviness, and contour definitions and tolerances.

Additional data types under the mechanical/physical heading would be

1. Dimensional relationships between mating equipment
2. Force transmission across an interface
3. Use of mechanically conductive media
4. Placing, retaining, positioning, or physically transporting a component by another component
5. Shock mitigation to protect another component

Appendix B (from ref. 5) shows a mechanical/physical drawing.

This extensive variety of possibilities and combinations prevents assigning a standard set of data types or level of detail to a form-and-fit interface. Each interface must be analyzed and the necessary controlling data identified before the proper level of interface definition and control can be achieved. This holds true for all examples given in this chapter.

2.3.3 Software

A software interface defines the actions required when interfacing components that result from an interchange of information. A software interface may exist where there is no direct electrical interface or mechanical interface between two elements. For example, whereas an electrical ICD might define the characteristics of a digital data bus and the protocols used to transmit data, a software interface would define the actions taken to process the data and return the results of the process. Software interfaces include operational sequences that involve multiple components, such as data-processing interactions between components, timing, priority interrupts, and watchdog timers. Controversy generally arises in determining whether these relationships are best documented in an electrical/functional ICD, a software ICD, or a performance requirements document. Generally, software interface definitions include

1. Interface communication protocol
2. Digital signal characteristics
3. Data transmission format, coding, timing, and updating requirements
4. Data and data element definition
5. Message structure and flow
6. Operational sequence of events
7. Error detection and recovery procedures

Other data types may be needed. Appendix C provides an example of a software interface signal.

2.3.4 Supplied Services

Supplied services are those support requirements that a piece of equipment needs to function. Supplied services are provided by an external separate source. This category of interface can be subdivided further into electrical power, communication, fluid, and environmental requirements. The types of data defined for these subcategories are

1. Electrical power interface:
   a. Phase
   b. Frequency
   c. Voltage
   d. Continuity
   e. Interrupt time
   f. Load current
   g. Demand factors for significant variations during operations
   h. Power factor
   i. Regulation
   j. Ripple
   k. Harmonics
   l. Spikes or transients
   m. Ground isolation
   n. Switching, standby, and casualty provisions
2. Communication interface:
   a. Types of communication required between equipment
   b. Number of communication stations per communication circuit
   c. Location of communication stations
3. Fluid interface:
   a. Type of fluid required
      i. Gaseous
      ii. Liquid

   b. Fluid properties
      i. Pressure
      ii. Temperature
      iii. Flow rate
      iv. Purity
      v. Duty cycle
      vi. Thermal control required (e.g., fluid heat lost or gained)
4. Environmental characteristic interface:
   a. Ambient temperature
   b. Atmospheric pressure
   c. Humidity
   d. Gaseous composition required
   e. Allowable foreign particle contents

Other data types may be needed. Appendix D shows an example of a supplied services interface for air-conditioning and cooling water.

2.4 Documenting Interfaces

Once an interface has been categorized and its initial contents defined, that interface definition must be recorded in a document that is technically approved by the parties (designer and manager) and the owners of both sides of the interface. The document then is approved by the next higher level in the project management structure and becomes the official control for interface design.

The program manager must ensure that compliance with the approved interface control document is mandatory. Each level of program management must ensure that the appropriate contractors and Government agencies comply with the documentation. Therefore, technical approval of the interface control document indicates that the designated approving organization is ready to invoke the interface control document contractually on the approving organization's contractor or supporting organization.

The interface categories can be grouped together in one document, or each category can be presented in a separate document (i.e., electrical ICD's, mechanical ICD's, etc.). The format for interface control documents is flexible. In most cases a drawing format is the easiest to understand and is adaptable to the full range of interface data.

The specification format (ref. 6) can also be used. The use of this type of format enables simple changes through the removal and insertion of pages; however, the format is often difficult to use when presenting complex interface definitions that require drawings, and it normally requires many more pages to convey the same level of information.

In either case there must be agreement on a standard for data presentation and interpretation. ANSI standard Y14.5 (ref. 7) can be used for dimensions, along with DOD-STD-100 (ref. 8), for general guidance of a drawing format. The specification format should use MIL-STD-490 (ref. 6) for paragraph numbering and general content.

Some large programs require large, detailed ICD's. Maintaining a large, overly detailed document among multiple parties may be more difficult than maintaining a number of smaller, more focused documents. Grouping small documents by major category of interface and common participants is one of the most effective and efficient strategies. It minimizes the number of parties involved and focuses the technical disciplines, greatly streamlining the decision process and permitting much shorter preparation time. However, interfaces can be multidisciplinary, and separate documents can result in miscommunications.

2.5 Identifying Steady-State and Non-Steady-State Interfaces

Interfaces can vary from a single set that remains constant for the life of a program to a multiple set of documents that reconfigures during specific events in the life of a system. The first category would be used for an interplanetary probe. The interfaces of its instruments with the basic spacecraft structure would remain the same from assembly for launch throughout the life of the experiment. However, a continually evolving platform, such as a lunar base, would perhaps be controlled in a series of interface documents based on the assembly sequence of the base. An initial base would be established and later made more complex with additional structures and equipment delivered by subsequent lunar flights. Pressurized elements, logistic elements, power-generating sources, habitats, laboratories, and mining and manufacturing facilities might be added and reconfigured over time. Each configuration would require a set of interface control documents to ensure compatibility at the construction site as well as with the transportation medium from Earth to Moon. Interfaces that remained constant during this process might be termed steady state and require no further consideration once the interface was verified and delivered; whereas interfaces that would evolve from the initial configuration through multiple iterations would require multiple coordination of interface parameters and schedules. The selection of interface categories should identify the steady-state or non-steady-state nature of interfaces as well as their initial designations (ref. 9).

2.6 Selecting a Custodian

Selecting an ICD custodian can depend on several factors (e.g., percentage of interface ownership, relative mission importance of interface sides, and relative investment of interface sides). However, it is generally most effective if the custodian

However, it is generally most effective if the custodian selected has an objective point of view. An example of this would be someone who is independent of either side of the interface (i.e., without any "vested interest" in the interface hardware or software). Objectivity permits unbiased control of the interface, involvement of the custodian as an objective mediator, and documentation of the interface on a noninterference basis with program/project internal design. Selecting an independent interface custodian should be the first step in establishing an interface control organization. A set of criteria should be used to select the custodian by weighting the content and interests of the interface with the needs of interface control. One set of criteria is as follows:

1. Integration center: Is one center accountable for integrating the interfaces controlled by this ICD? This criterion is considered the most important because the integration center will have the final responsibility for certifying flight readiness of the interfaces controlled in the ICD.
2. U.S. center: Is the participant a U.S. center? This criterion is considered the next most important because of agency experience and projected responsibility.
3. Flight hardware or software: Is the interfacing article flight hardware or software (as opposed to support hardware or software)? Flight hardware or software takes precedence.
4. Flight sequence: Does one side of the interfacing equipment fly on an earlier manifest than the other? An earlier flight sequence takes precedence over follow-on interfacing hardware.
5. Host or user: Is the interfacing article a facility (as opposed to the user of the facility)? Procedure in this criterion is guided by the relative priority of the interfacing articles.
6. Complexity: How complex is the interfacing equipment (relative to each side)? The more complex side of the interface normally takes precedence.
7. Behavior: How active is the interfacing equipment? The active side normally takes precedence over the passive side.
8. Partitions: How are the partitions (categories) used by the interfacing equipment? The relative importance of the partitions to the interface is acknowledged, and selection of the custodian is sensitive to the most important partition developers.

Scores are assigned to each piece of interfacing equipment for each criterion. These scores can be determined by many different methods. Discrete values can be assigned to the first four criteria: a score of 1.0 is assigned if the interfacing piece of equipment is unique in meeting the criterion, and the other piece of equipment then receives a score of 0.0; scores of 0.5 are assigned to both sides if both (or neither) of them meet the criterion. There is no definitive way of assigning scores to the last four criteria; however, verbal consensus or an unbiased survey can be used to assign scores. Also, the partition criterion can be scored by partition evaluation analysis (ref. 4).
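The discrete scoring of the first four criteria is mechanical enough to compute. The following minimal sketch (in Python) is illustrative only and is not part of the reference procedure; the example answers are invented.

    # Illustrative sketch of discrete custodian scoring (first four criteria).
    # A side uniquely meeting a criterion scores 1.0 (the other side 0.0);
    # if both or neither meet it, each side scores 0.5.

    DISCRETE_CRITERIA = ["integration center", "U.S. center",
                         "flight hardware or software", "flight sequence"]

    def score_pair(a_meets, b_meets):
        """Score one criterion for sides A and B."""
        if a_meets and not b_meets:
            return 1.0, 0.0
        if b_meets and not a_meets:
            return 0.0, 1.0
        return 0.5, 0.5

    def custodian_scores(answers):
        """Sum scores over all criteria; answers maps criterion -> (A meets, B meets)."""
        totals = [0.0, 0.0]
        for criterion in DISCRETE_CRITERIA:
            a, b = score_pair(*answers[criterion])
            totals[0] += a
            totals[1] += b
        return tuple(totals)

    # Invented example: side A is the integration center and flies first;
    # both sides are U.S. centers and both deliver flight hardware.
    answers = {"integration center": (True, False),
               "U.S. center": (True, True),
               "flight hardware or software": (True, True),
               "flight sequence": (True, False)}
    print(custodian_scores(answers))  # (3.0, 1.0)

The last four criteria would be scored by consensus or survey, as the text notes, before the totals are compared.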

2.7 Analyzing for Interface Compatibility

The interface definitions to be documented on the ICD's must be analyzed for compatibility before the ICD is authenticated. Appendix E provides guidance on how compatibility analyses may be performed. They vary in their complexity from a simple inspection of the interface definitions to complex mathematical analyses where many variables are involved. Regardless of complexity, the compatibility analysis should be documented and maintained as backup information for the ICD. It can be used to expedite any changes to the interface definition by providing a ready means for evaluating the compatibility of the proposed change. The compatibility analysis also can be used to document how the interface definition was arrived at and why the definition is presented as it is on an ICD.
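As a concrete illustration of the simplest mathematical case, the sketch below checks one worst-case fit of a pin in its mating hole; the dimensions are hypothetical and are not drawn from any ICD in this manual.

    # Hypothetical worst-case tolerance check: largest pin versus smallest hole.
    # All dimensions are invented for illustration.

    def worst_case_fit(pin_nominal, pin_tol, hole_nominal, hole_tol, min_clearance=0.0):
        """Return True if the largest pin still clears the smallest hole."""
        pin_max = pin_nominal + pin_tol
        hole_min = hole_nominal - hole_tol
        return (hole_min - pin_max) >= min_clearance

    # Largest pin 0.2505; smallest hole 0.2510: the interfaces are compatible.
    print(worst_case_fit(0.250, 0.0005, 0.252, 0.001))  # True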
2.8 Verifying Design Compliance With Interface Control Requirement

The ICD can only fulfill its purpose if the contractors' detailed design drawings and construction practices adhere to the limits imposed by the ICD. Verifying compliance of the design as well as of the construction process is an integral part of interface control.

Each contractor should be assigned the responsibility of denoting on their manufacturing and inspection drawings or documents those features and characteristics that, if altered, would affect interfaces controlled by the ICD's. To ensure that all ICD requirements are covered, the contractor should select, at the highest assembly level at which the equipment is inspected, the features and characteristics to be denoted. Any design change affecting an ICD-controlled feature or characteristic would then be clearly identified even at the assembly level (ref. 10).

Entries identified as "to be resolved" (TBR) can be bracketed or shaded to indicate preliminary interface information or an interface problem. This information is subject to further review and discussion and is an interim value for use in evaluating effects. Entries identified as "to be supplied" (TBS) represent data or requirements to be furnished. Appendix F shows a typical bracket system.

2.9 Verifying Contract-Deliverable Item

Each contract-deliverable item that is a mating side to an ICD interface should also be tested or measured to verify that the item complies with the requirement as specified in the ICD. The responsibility for administering and reporting on this verification program could be assigned to the design agent, the contractor, or an independent third party. If feasible, an independent third party should be selected for objectivity.

The verification methods should include analysis, measurement and inspection, demonstration, and functional testing. The specific methods employed at each interface will depend on the type of feature and the production sequence. Compliance should be verified at the highest practical assembly level. To preclude fabrication beyond the point where verification can be performed, an integrated inspection, measurement, and demonstration test outline of both hardware and software should be developed. This verification test outline will provide a schedule, tied to production, that allows all interface requirements to be verified. The resultant data and inspection sheets should become part of the verification data in the history jacket retained by the contractor for NASA.

2.10 Training²

1. What is the purpose of interface control?
A. To define interfaces
B. To ensure compatibility between interrelated equipment
C. To provide an authority to control interface design
D. All of the above

2. How is an interface identified?
A. By boundaries between functional areas
B. By functional analyses of performance requirements
C. By design features of a component that can affect the design features of another component

3a. How can interfaces be categorized?
A. Mechanical, electrical, software, and services
B. Electrical/functional, mechanical/physical, software, and supplied services
C. Electrical, physical, software, and supplies

3b. What is one method of assigning an interface to one of the four basic categories?
A. Functional flow block diagram
B. Timeline analysis
C. N-squared diagram

4a. How can an interface be documented?
A. By drawing format
B. By specification format
C. By both of the above

4b. Who approves the interface control document?
A. Designer or manager
B. Owners of both sides
C. Both of the above

4c. Who ensures compliance with the approved ICD?
A. Designer or manager
B. Owners of both sides
C. Project manager

5a. What is a steady-state interface?
A. A single set that remains constant for the life of the project
B. A multiple-set suite that reconfigures during specific events in the life of the system

5b. Give an example of a steady-state interface.
A. An interplanetary probe
B. A lunar base

5c. What features make this a good example of a steady-state interface?
A. The basic structure of the spacecraft would remain the same from assembly for launch throughout the life of the experiment.
B. An initial base would be established and subsequently made more complex with additional structures and equipment delivered by subsequent lunar flights.

6a. How should an ICD custodian be selected?
A. Percentage of ownership of the interface
B. Relative investment of interface sides
C. An objective point of view

6b. What criteria should be used to select a custodian?
A. Integration or U.S. center, flight hardware or software, flight sequence, host or user, complexity, behavior, and partitions
B. Integration hardware, sequence user, and partitions

6c. What scoring system can be used for these criteria?
A. Zero to 1.0, verbal consensus, unbiased survey, and partition evaluation analysis
B. One to 100, priority ranking, and voting

7a. What is the purpose of an ICD compatibility analysis?
A. Demonstrates definitions and provides mathematical analysis
B. Demonstrates completeness of an interface definition and provides a record that the interface has been examined and found to be compatible

²Answers are given at the end of this manual.
7b. What are the four categories that require interface analysis?
A. Electrical/functional, mechanical/physical, supplied services, and hydraulic/pneumatic
B. Electrical/functional, mechanical/physical, software, and supplied services

7c. The hardware for mounting the satellite vehicle (SV) adapter to the Titan IV Centaur is shown in figures 2.1 to 2.3.
A. Is there sufficient data to perform a compatibility analysis?
i. Yes  ii. No
B. Can the Jet Propulsion Laboratory specify the SV adapter ring?
i. Yes  ii. No
C. What items need to be bracketed?
i. Shear pin material and SV attachment view
ii. SV panel and view C-C

8a. What does a bracket on an ICD represent?
A. Verification of design compliance
B. An interface problem

8b. What interface deficiency rating does a bracket discrepancy have?
A. S & MA impact A > 1 or understanding of risk B > 2
B. S & MA impact A < 1 or understanding of risk B < 2

9a. How are mating sides of an ICD interface verified?
A. Testing or measurement to meet requirements
B. Analysis, measurement or inspection, demonstration, and functional testing

9b. What does the verification test outline provide?
A. Schedule, tied to production, that allows interface requirements to be verified
B. Process controls, tied to manufacturing, for meeting schedules

9c. Where is the resultant test and inspection data stored?
A. Contractor files for use by an independent third party
B. History jackets for use by NASA
Figures 2.1 to 2.3.--Hardware for mounting the satellite vehicle (SV) adapter to the Titan IV Centaur.
Chapter 3

The Process: Through the Design Phases


Interface control should be started when a program begins. This process eventually will define all interface design and documentation responsibilities throughout the life cycle of the program. Each program phase from concept development to project construction is directly related to the maturity level of interface control.

3.1 Program Phases

3.1.1 Concept Definition

During the system engineering concept definition phase (from fig. 1.1), basic functional areas of responsibility are assigned for the various pieces of equipment that will be employed by the project (electrical power, environment control, propulsion, etc.); see figure 3.1. At this point the design responsibilities of the responsible organization and related contractor (if chosen) should be defined to establish a set of tiered, traceable requirements. From these requirements the interfaces to be designed are identified by category (electrical/functional, mechanical/physical, software, and supplied services) and by the type of data that must be defined. This categorization will include a detailed review of each requirement to determine which requirements or features will be controlled by the interface control process. (What is important for this item to fulfill its intended function? On what interfacing equipment is this function dependent?) Once the interfaces to be controlled are selected, the formal procedures for interface control need to be established. These procedures include identifying the participants responsible for the interface control documentation, the approval or signoff loop for documentation, and the degree to which all participants must adhere to interface control parameters, as well as establishing a missing-design-data matrix, change procedures, etc. (See section 3.2.)

Early development of the interface process, products, and participants provides a firm foundation for the design engineer to use the correct information in designing his or her portion of an interface. It minimizes the amount of paper to be reviewed, shortens the schedule, and concentrates the efforts of the designer on his or her area of responsibility.

Initial selection of interfaces generally begins with listing all pieces of equipment in a system and then identifying the extent of interrelation among them. One tool used to help in this process is the N-squared diagram; details of this process can be found in reference 4. The N-squared diagram was initially used for software data interfacing; however, some centers are using it for hardware interfaces. If the diagram is not polarized initially (input/output characteristics not labeled), it is a convenient format for identifying equipment interfaces and for categorizing them. An example of this form is shown in figure 3.2. This diagram can be further stratified to identify the interfaces for each of the categories; however, detailed stratification is best applied to electrical/functional, software, and supplied services interfaces. Using the N-squared diagram permits an orderly identification and categorization of interfaces that can be easily shown graphically and managed by computer.
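Because an unpolarized N-squared diagram is simply a symmetric matrix of equipment against equipment, it is easily managed by computer, as noted above. The sketch below is a minimal illustration using a few items from figure 3.2; the specific entries are illustrative, and the category codes follow the figure's key.

    # Illustrative sketch of an unpolarized N-squared matrix held in software.
    # Key (from fig. 3.2): E = electrical/functional, M = mechanical/physical,
    # SS = supplied services. The entries below are for illustration only.

    interfaces = {
        ("structure", "fuel pods"): ["M"],
        ("structure", "thrusters"): ["M"],
        ("structure", "solar arrays"): ["M"],
        ("fuel pods", "thrusters"): ["M", "SS"],
        ("thrusters", "solar arrays"): ["E"],
    }

    def categories(a, b):
        """Look up a pair's interface categories in either order (unpolarized)."""
        return interfaces.get((a, b)) or interfaces.get((b, a)) or []

    for (a, b), cats in sorted(interfaces.items()):
        print(f"{a} <-> {b}: {', '.join(cats)}")
    print(categories("solar arrays", "thrusters"))  # ['E']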

By the end of this phase the basic responsibilities and management scheme, the framework for the interface control documentation, and the process for tracking missing interface design data (see section 3.2.2) should be established and disseminated.

Concept definition:
• Assign basic functional areas of responsibility.
• Define design responsibilities.
• Categorize interfaces.
• Define interfaces to be controlled.
• Establish formal interface control procedures.
• Disseminate scheme, framework, traceability.
Figure 3.1.--Establishment of interface control process during concept definition.

3.1.2 Requirements Definition

During the requirements definition phase (fig. 3.3; from fig. 1.1), the definitions of the mission objectives are completed so that each subsystem design can progress to development. Here, the technology to be used in the project will be defined to limit the risk associated with the use of new, potentially unproven technologies. Defining requirements and baselining interface documents early in the design process provides the information the designer needs to ensure that interface design is done correctly the first time. Such proactive attention to interfaces will decrease review time, reduce unnecessary paperwork, and shorten schedule times. By the end of requirements definition all interface control documents should be prepared, interfaces defined to the most detailed extent possible, and ICD's presented for baselining.

Figure 3.2.--N-squared diagram for orbital equipment. (Entries not polarized.) The diagram arrays the equipment (structure, fuel pods, thrusters, solar arrays, heat converters, voltage converters, antennas A and B, experiments 1 to 3, and gyros) along the diagonal, with off-diagonal entries marking each interface by category. Key: E, electrical/functional; M, mechanical/physical; SS, supplied services.

Baselining is the act by which the program manager or designated authority signs an ICD. That signature establishes the ICD as an official document defining interface design requirements. The term "baselining" is used to convey that the ICD is the only official definition and that this officiality comes from the technical management level. Not only is the initial version of the ICD baselined, but each subsequent change or update to an ICD is also baselined.

The baselined version of the ICD will identify (by a "void") any missing design data that cannot be included at that time. Agreed-to due dates will be noted on the ICD for each data element required. Each void will define the data required and specify when and by whom such data will be supplied. Where possible, the data to be supplied should be bounded initially on the ICD. These bounds will be replaced by detailed data when the void is filled. The initial bounds give the data user (the other side of the interface) a range that can be used without risk until the detailed data are supplied. Establishing these voids on ICD's provides a means of ensuring that interface design data are supplied when they are required by the data user, yet it allows design freedom to the data supplier until the data are needed. A recommended form for use in identifying the data needed is shown in figure 3.4. The criteria for choosing due dates are discussed in section 3.2.

Requirements definition:
• Define technologies to be used.
• Define and categorize all interfaces.
• Prepare all interface control documents.
• Identify all voids and assign both responsibilities and due dates.
• Bound voids when possible.
• Baseline interface documents.
Figure 3.3.--Development and control of interfaces during requirements definition.

Interface Design Data Required (IDDR)
(Drawing/document number + void number)
Data required: (Brief description of information needed to define the interface element currently lacking details)
Data supplier: (Project center/code/contractor)
Data user(s): (Project center/code/contractor)
Date due: (Date design data are needed, either an actual date or a period of time related to a specific milestone)
Figure 3.4.--Format for interface design data required (IDDR).
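Since figure 3.4 is essentially a record layout, a project that tracks its voids by computer might represent each IDDR as a record like the hypothetical sketch below; the field names mirror the form, and the sample entry paraphrases the void shown in appendix A (fig. A.3).

    # Hypothetical record mirroring the figure 3.4 IDDR form.
    from dataclasses import dataclass

    @dataclass
    class IDDR:
        document_number: str  # drawing/document number + void number
        data_required: str    # brief description of the missing interface data
        data_supplier: str    # project center/code/contractor supplying the data
        data_users: list      # project centers/codes/contractors needing the data
        date_due: str         # actual date, or a period tied to a specific milestone

    void_v027 = IDDR(
        document_number="3288399-V027",
        data_required="Guidance waveform minimum-amplitude value",
        data_supplier="guidance telemetry steering committee",
        data_users=["launch vehicle telemetry contractor"],
        date_due="45 days following guidance preliminary design review",
    )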

Documents should be baselined as early as possible, as soon as the drawings contain 10% of the needed information. The significance of early baselining is that both sides of the interface have the latest, most complete, official, single package of information pertaining to the design of the interface. The package includes all agreed-to design data plus a list of all data needed, its current level of maturity, and when it is to be supplied, by whom, and to whom.

Technical information voids in interface documents must be accounted for and tracked. Otherwise, there is no assurance that the needed information is being provided in time to keep the design on schedule. The status of these voids must be reported, and the owners of the interface-design-data-required forms (IDDR's) must be held responsible for providing the needed information. It is recommended that the status be reported monthly to all parties having responsibility for the interfaces.

Interface Design Data Required (IDDR) Program Status Report
Drawing/doc # and IDDR # | Sheet/page and zone | Short title and data required | Supplier(s) (center/code/contractor) | User(s) (center/code/contractor) | Due date (yr/mo/day) | Remarks
Figure 3.5.--Format for monthly report on IDDR status.
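The monthly report of figure 3.5 can then be produced directly from such records. A minimal sketch, reusing the hypothetical IDDR record from the figure 3.4 example:

    # Illustrative generator for the figure 3.5 monthly IDDR status report.

    def status_report(voids):
        """One consolidated report covers all open voids for all parties."""
        lines = ["Doc #/IDDR #  |  Data required  |  Supplier  |  User(s)  |  Due date"]
        for v in voids:
            lines.append("  |  ".join([v.document_number, v.data_required,
                                       v.data_supplier, ", ".join(v.data_users),
                                       v.date_due]))
        return "\n".join(lines)

    print(status_report([void_v027]))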

A consolidated report is the most efficient, consumes the least paper and mail services, and allows the program manager to track areas important to the integration of the system components. The basic form shown in figure 3.5 is recommended for reporting and tracking IDDR's.

3.1.3 Systems Integration

The interface control program continues to be active during the systems integration phase (fig. 3.6; from fig. 1.1). Design changes that establish a need for a new interface will follow the interface control change procedures as defined in section 3.2. Proposed design changes that affect existing interfaces are not given final approval until all participants' and the cognizant center's baselinings have been received through the ICD change notice system.

During the various design reviews that occur in the full-scale engineering development phase, special attention should be given to design parameters that, if altered, would affect interfaces controlled by the ICD. It is strongly recommended that each design activity denote, on design and manufacturing documentation at the preliminary design review, through a bracket or some highlighting system, those features and characteristics that would affect an interface (see section 2.8). At the critical design review all voids should be resolved and all detailed design drawings should comply with interface control documentation (see section 2.9).

Systems integration:
• Manage and satisfy voids.
• Invoke use of brackets on design drawings.
• Ensure resolution of voids by the time of critical design review.
• Verify compliance of design documentation with ICD's.
Figure 3.6.--Development and control of interfaces during systems integration.

3.2 Preparing and Administering Interface Control Documents

3.2.1 Selecting Type of Interface Control Document

A drawing, a specification, or some combination format should be selected for the ICD on a case-by-case basis. The drawing format generally is preferred when the ICD has features related to physical dimensions and shapes. The specification format is preferred when the ICD needs tables and text to describe system performance. Combinations are used when both dimensions and tables are needed. Members of the coordinating activity responsible for preparing the ICD determine the format, which is approved by the appropriate project authority. Examples of drawing formats are given in appendixes A and B.

The level of detail shown on the ICD varies according to the type and degree of design dependency at the interface being controlled. The ICD should clearly identify and control interfaces between designs and enable compatibility to be demonstrated between the design areas. The key to a useful ICD is limiting the detail shown to what is required to provide compatibility. Any unnecessary detail becomes burdensome and may confuse the contractors responsible for designing the mating interface. Again, the ICD should, at a minimum, define and illustrate physical and functional interface characteristics in sufficient detail that compatibility, under worst-case tolerances, can be determined from the ICD alone; or it should reference applicable revisions of detailed design drawings or documents that define and bracket or identify features, characteristics, dimensions, etc., under worst-case tolerances, such that compatibility can be determined from the bracketed features alone.

3.2.2 Tracking and Resolving Missing Interface Design Data

Missing interface data should be identified on the ICD, and the ICD should control the date for its submission. The notation identifying the missing data should indicate the specific data required, how the data are being tracked for resolution, when the data are needed by the interfacing design agent, and by what date the required data will be supplied. Establishing data-required notations (or voids) on ICD's helps ensure that interface design data will be supplied when needed, yet it allows design freedom to the data supplier until the due date. Every attempt should be made to establish realistic due dates and to meet that schedule unless there is a valid and urgent need to change a due date.

These criteria and procedures should be followed in establishing, reporting, and managing data due dates:

1. Choose the due date as the date when the data user will start to be affected if agreed-upon or baselined data have not been received.
2. When establishing a due date, allow time to process and authenticate a change notice to the ICD (i.e., once the due date has been established, include a period of time to establish that due date for the data supplier).
3. The custodian responsible for the ICD should periodically, as determined by the appropriate project authority, prepare and distribute a report on the status of all missing design information for all project activities. The report should contain the following information:
a. Identification of the data element needed, consisting of the ICD number, the date, and a two- or three-digit number that provides a unique identifier for the data element
b. A short title for the ICD
c. The activity that requires the data
d. The date when the missing data are to be supplied or the period of time after the completion of a program event or milestone when the data are to be supplied
e. The activity from which the data are due
f. The status of the data required (i.e., late data, data in preparation, or change notice number)
g. A description of the data required

3.3 Initial Issuance of ICD

The first issue of an ICD should be a comment issue. The comment issue is distributed to participating centers and contractors for review and comment as designated in the interface responsibility matrix (fig. 3.7).

The interface custodian generates the responsibility matrix for ICD's. The matrix specifies the center and contractor responsibilities for baselining, review and comment, and technical approval. The matrix lists all ICD's applicable to a particular program. Distribution of the ICD's can then be controlled through this matrix as well.

The review and comment process is iterative and leads to agreement on system interface definitions and eventual approval and baselining of the ICD. See figure 3.8 for a flow diagram of the issuance, review and comment, and baselining procedures for ICD's. Concurrent distribution of the comment issue to all participants minimizes the time needed for review and subsequent resolution of differences of opinion.

3.4 Document Review and Comment

As designated in the ICD responsibility matrix, all centers and contractors should submit technical comments through the appropriate authority to all other activities with review and comment responsibilities for the particular ICD and to the ICD custodian.

Technical comments by all activities should be transmitted to the custodian as soon as possible but not later than 30 working days⁴ from receipt of the comment issue. If the comment issue is technically unacceptable to the Government authority or the interfacing contractor, the rationale for unacceptability should be explained, including technical and cost effects if the interface definition is pursued as presented.

3.4.1 Resolving Comments

The ICD custodian collects review comments and works in conjunction with project management for comment resolution until approval is attained, the comment is withdrawn, or the ICD is cancelled. Information on comments and their disposition and associated resolution should be documented and transmitted to all participants after all comments have been received and dispositioned. Allow two weeks⁴ for participants to respond to the proposed resolution. Nonresponses can be considered concurrence with the resolutions if proper prenotification is given to all participants and is made part of the review and comment policy.

When comments on the initial comment issue require major changes and resolution is not achieved through informal communications, an additional comment issue may be required and/or interface control working group (ICWG) meetings may need to be arranged.

3.4.2 Interface Control Working Group

The ICWG is the forum for discussing interface issues. ICWG meetings serve two primary purposes: to ensure effective, detailed definition of interfaces by all cognizant parties, and to expedite baselining of initial ICD's and subsequent drawing changes by encouraging resolution of interface issues in prebaselining meetings. A major goal of interface control should be that baselining immediately follow a prebaselining ICWG meeting.

All ICWG meetings must be convened and chaired by the cognizant project organization. The project can choose a contractor to act as the chair of an ICWG when Government commitments are not required. In all cases the ICWG members must be empowered to commit the Government or contractor to specific interface actions and/or agreements. In cases where a contractor is ICWG chair, the contractor must report to the Government any interface problems or issues that surface during an ICWG meeting.

⁴The times assigned for commenting activities to respond are arbitrary and should be assigned on the basis of the schedule needs of the individual programs.
Figure 3.7.--Interface responsibility matrix, listing each ICD with the centers and contractors responsible for baselining, review and comment, and technical approval.
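Although figure 3.7 could not be reproduced here, its content is a simple table of documents against organizational roles, and one hypothetical computer representation is sketched below; the ICD number and organizations are invented.

    # Hypothetical interface responsibility matrix (cf. fig. 3.7): for each ICD,
    # who baselines it, who reviews and comments, and who gives technical approval.

    responsibility_matrix = {
        "ICD-0001": {
            "baselining": ["project office"],
            "review and comment": ["center A", "center B", "contractor X"],
            "technical approval": ["center A", "contractor X"],
        },
    }

    def distribution_list(icd):
        """ICD's (and their change notices) are distributed per the same matrix."""
        return sorted(set().union(*responsibility_matrix[icd].values()))

    print(distribution_list("ICD-0001"))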
The ICD custodian develops the comment issue of the ICD, which is issued to the contractors and centers for review and comment. The custodian coordinates and resolves the resulting comments (interface control working group meetings are scheduled as needed), the NASA centers and contractors give technical approval, and the ICD is distributed to all participants, with monthly status reports thereafter.
Figure 3.8.--Flow of interface control document production.
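The loop in figure 3.8 can also be written down as a simple state table, as in the hypothetical sketch below; the state names paraphrase the figure and are not official terminology.

    # Hypothetical state table paraphrasing the figure 3.8 flow.
    ICD_FLOW = {
        "comment issue developed by custodian": "issued for review and comment",
        "issued for review and comment": "comments collected",
        "comments collected": "comments resolved (ICWG as needed)",
        "comments resolved (ICWG as needed)": "technical approval",
        "technical approval": "baselined and distributed to all participants",
    }

    state = "comment issue developed by custodian"
    while state in ICD_FLOW:
        print(state)
        state = ICD_FLOW[state]
    print(state)  # baselined and distributed to all participants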

The ICWG chair prepares the ICWG meeting minutes or designates one of the meeting participants for this task. The minutes should include discussions of problems, agreements reached, decisions made, and action items. The ICWG chair also ensures that any updated interface control documentation reflecting the ICWG discussions is distributed within the timeframe agreed to by the affected participants.

3.4.3 Approval/Signoff Cycle

The management plan for the project assigns responsibility for each piece of equipment to a specific project authority and its contractor. The signoff loop for each ICD reflects this plan and can be related to the project and the origin of each design requirement. For each ICD, then, the signoff loop follows the sequence of technical approval by the contractors first and then by the appropriate project authority.

3.4.4 Technical Approval

The appropriate project authority and the primary and associate organizations with an interest in a particular ICD are listed in the responsibility matrix. They each sign the ICD to signify technical agreement and a readiness to contractually invoke its requirements.

3.4.5 Baselining

Interface control documents are baselined when the owners of both sides of the interface at the next level up in the program structure come to technical agreement and sign the document.

3.5 Change Notices

The procedure for initiation, review, technical approval, baselining, and distribution of changes to project ICD's (fig. 3.9) should conform to the following guidelines.

3.5.1 Initiating Changes

Any project activity should request a change to an ICD when

1. Data are available to fill a void.
2. Information contained in a data-required note needs to be modified.
3. Additional data are needed (i.e., a new data requirement has been established).
4. A technical error is discovered on the ICD.
5. An equipment design change or a system or equipment rearrangement that would require changes to an interface or the creation of new interfaces is proposed (e.g., to improve performance, reduce cost, or expedite scheduled deliveries).

Any participant originates a change request; the ICD custodian reviews it and prepares a proposed change notice (with copies to the contractors and the project for information). The proposed change notice is issued to the contractors and the project for review and comment, and the custodian coordinates and resolves the comments, convening ICWG meetings as required. After technical approval by the NASA project and contractors and direction to proceed as required by NASA, the authenticated change notice is distributed to the contractor per the ICD distribution matrix, the change is incorporated into the project, and the master ICD is changed to incorporate the change and distributed.
Figure 3.9.--Development and flow of change notices in the ICD revision process.
3.5.2 Requesting Changes

All requests for changes should be submitted to the organization responsible for maintaining the ICD, with copies to all activities that will review the resultant change notices and to the appropriate project authority. If baselining is needed in less than 30 days, a critical change should be requested. All requests for changes should be submitted in a standard format that includes the following items:

1. Originator's identification number--used as a reference in communications regarding the request; it should appear on resulting change notices.
2. Originating activity--originating project and code or originating contractor.
3. Point of contact--name, area code, telephone number, facsimile number, and e-mail address of the person at the originating activity to be contacted regarding the request.
4. Document affected--number, revision letter, and short title of each ICD that would be affected by the change.
5. Number of data voids (if applicable)--number of data requirements for which data are being provided.
6. Urgency--indication of whether this change is critical or routine (the project decides whether to use the critical route).
7. Detailed description of change--a graphic or textual description of the change in sufficient detail to permit a clear portrayal and evaluation of the request. Separate descriptions should be provided when more than one ICD is affected.
8. Justification--concise, comprehensive description of the need for and benefit from the change.
9. Impact--concise, comprehensive description of the effect in terms of required redesign, testing, approximate cost, and schedule effects if the requested change is not approved; also the latest date on which approval can occur and not affect cost or schedule.
10. Authorizing signature of the organization requiring the change.
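Because every request must carry the same ten items, incoming requests can be screened mechanically for completeness. The sketch below is one hypothetical way to do so; the field names paraphrase the list above.

    # Hypothetical record of the ten standard change-request items,
    # with a completeness check.
    from dataclasses import dataclass, fields

    @dataclass
    class ChangeRequest:
        originator_id: str           # item 1
        originating_activity: str    # item 2
        point_of_contact: str        # item 3
        documents_affected: str      # item 4
        data_voids: str              # item 5 (if applicable)
        urgency: str                 # item 6: "critical" or "routine"
        description: str             # item 7
        justification: str           # item 8
        impact: str                  # item 9
        authorizing_signature: str   # item 10

    def is_complete(request):
        """A request missing any item goes back to the originator."""
        return all(getattr(request, f.name) for f in fields(request))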
Upon receipt of a change request to an ICD, the ICD custodian coordinates the issuance of a proposed change notice. First, the ICD custodian evaluates the technical effect of the proposed change on the operation of the system and mating subsystem. If the effect of the change is justified, the ICD custodian generates and issues a change notice. If the justification does not reflect the significance of the change, the ICD custodian rejects the request, giving the reason or asking for further justification from the originating organization. The ICD custodian evaluates an acceptable change request to determine whether it provides data adequate to generate a change notice.

The proposed change notice describes the specific changes (technical or otherwise) to the ICD in detail by "from-to" delineations and the reasons for the changes, as well as who requested the changes and how the change request was transmitted (i.e., by letter, facsimile, ICWG action item, etc.).

3.5.3 Proposed Change Notice Review and Comment Cycle

The review and comment cycle for proposed changes to ICD's should follow the same system as that used for the initial issuance of the ICD (see sections 3.3 and 3.4).

3.5.4 Processing Approved Changes

The baselined change notice should be distributed to all cognizant contractors and project parties expeditiously to promulgate the revised interface definition. The master ICD is revised in accordance with the change notice, and copies of the revised sheets of the ICD are distributed (see sections 3.3 and 3.4). Approval of the change by the project constitutes authority for the cognizant organization to implement the related changes on the detailed design.

3.5.5 Distributing Approved Changes

The custodian distributes the baselined change notice to all cognizant centers and contractors to expeditiously promulgate the revised interface definition. The master ICD is then revised in accordance with the change notice, and copies of the revised ICD sheets are distributed as was the change notice.

The responsibility matrix (fig. 3.7) can be used to identify the distribution of change notices as it was used for the distribution of the ICD's.

3.5.6 Configuration Control Board

During development the project's configuration control board is responsible for reviewing and issuing changes to the configuration baseline. The board reviews all class I engineering change proposals to determine if a change is needed and to evaluate the total effect of the change. The configuration control board typically consists of a chairman and representatives from the project management office, customers, engineering, safety assurance, configuration management (secretary), fabrication, and others as required.

Changes to configuration items can only be effected by the duly constituted configuration control board. The board first defines a baseline comprising the specifications that govern development of the configuration item design. Proposed changes to this design are classified as either class I or class II changes. Class I changes affect form, fit, or function. However, other factors, such as cost or schedule, can cause a class I change. Class I changes must be approved by the project before being implemented by the contractor.
All other changes are class II changes. Examples of class II changes are editorial changes in documentation or hardware changes (such as material substitution) that do not qualify as class I changes. Project concurrence, generally, is required for the contractor to implement class II changes. Government plant representatives (Defense Contracts Administration Services (DCAS), Navy Programs Resident Office (NAVPRO), and Air Force Programs Resident Office (AFPRO)) usually accomplish these tasks.
these tasks. la. When should the ICD process be started?
A. Concept definition B. Requirements definition
3.5.7 Closing the Loop C. Systems integration

A wide range of methods are available for verifying by test lb. What are the benefits of early development of the ICD
that the design meets the technical requirements. During the process?
definition phase analysis may be the only way of assessing what A. Assigns basic areas of responsibility
is largely a paper design. Typical methods are testing by B. Provides firm foundation for design, minimizes
similarity, analysis, modeling, and use of flight-proven compo- paper, shortens schedule, and concentrates efforts
nents; forecasting; and comparison, mathematical modeling,
simulation modeling, and using flight-proven experience and lC. What tool can be used to list equipment and identify their
decisions. The actual methods to be used are determined by the interrelations in a system?
project office. Each method has associated costs, requires A. Prechart B. N-squared dia_am
development time, and provides a specific level of performance
verification. The Government and industry managers must 2a. What should be done in the ICD process during require-
carefully trade off program needs for performance verification ments definition?
with the related costs. A. Define mission objectives
If any demonstrated or forecast parameter fails outside the B. Define technology and interfaces and present for
planned tolerance band, corrective action plans are prepared by baselining
the contractor and reviewed by the Government project office.
Each deviation is analyzed to determine its cause and to assess 2b. What is baselining?
the effect on higher level parameters, interface requirements, A. The designated authority signing an ICD
and system cost effectiveness. Alternative recovery plans are B. The only official definition
developed showing fully explored cost, schedule, and technical
performance implications. Where performance exceeds re- 2c. How are voids in an ICD accounted for and tracked?
quirements, opportunities for reallocation of requirements and A. Procedure or administration report
resources are assessed.
B. Monthly program status report on interface design
Although functional and performance requirements are con- data required
rained in the appropriate configuration item specification, the
definition, control, and verification of interface compatibility 3a. What should be done in the ICD process during develop-
must be handled separately. Otherwise, the volume of detail ment?
will overwhelm both the designers and managers responsible A. Manage voids, invoke brackets, resolve voids, and
for meeting the functional and performance requirements of the verify compliance
system. Early establishment of the interface definition and B. Control interface developments
control process will provide extensive savings in schedule,
manpower, money, and paper. This process will convey pre- 3b. How should proposed design changes be handled?
cise, timely information to the interface designers as to what the A. Discussed at critical design review
designer of the opposing side is committed to provide or needs B. Discussed and approved by all participants
and will subsequently identify the requirements for verifying
compliance. 3C. What should be given special attention?
Whether the interface is defined in a drawing format or in a A. Design parameters that affect controlled ICD
narrative format is at the discretion of the program. What is of B. Manufacturing documentation
primary importance is that only the information necessary to
define and control the interface should be on these contractural
documents to focus the technical users and minimize the need
for updating information. 2Answers are given at the end of this manual.

4a. When is the drawing format used for an ICD?
A. To describe the type and nature of the component
B. To describe physical dimensions and shapes

4b. When should a specification be used?
A. To describe performance with tables and text
B. To describe a software function

4c. What is the key to providing a useful ICD?
A. Give as much detail as possible
B. Limit the detail to what is necessary to demonstrate compatibility

5a. What is the purpose of the initial issue of an ICD?
A. Issuance, review, comment, and baselining
B. Review and resolution of differences of opinion

5b. Who is responsible for controlling the flow of an ICD?
A. Contractor
B. Custodian

6a. Who should review ICD's?
A. Organizations designated in the responsibility matrix
B. ICD custodian

6b. How are comments resolved?
A. By the project office
B. By project management and the custodian working for resolution and approval, or the comment being withdrawn

6c. Where are interface issues discussed?
A. Project office
B. Interface control working group

6d. Who approves and baselines an ICD?
A. Projects at the next level up in program structure
B. The project office

7a. When should a project activity request a change to an ICD?
A. At the custodian's request
B. When data are available, requirements need change, an error is discovered, or the design changes

7b. What items should be included in a change notice request?
A. Identification number, activity, contact, document affected, number of data voids, urgency, description, justification, impact, and authorizing signature
B. Those established by the ICWG

7c. Who evaluates and issues a proposed change notice?
A. ICD custodian
B. Project office

7d. What does a proposed change notice describe?
A. Specific changes (from-to), reasons, and the requestor
B. Project notices

7e. How is a change notice approved and distributed?
A. By the project authority to all cognizant parties
B. By all cognizant parties to the contractors

National Aeronautics and Space Administration


Lewis Research Center
Cleveland, Ohio, 44135, July 1995.

Appendix A

Electrical/Functional Interface Example


This appendix illustrates elements of a telemetry drawing interface control document showing control of waveform parameters and data rates. This interface example depicts data transfer between a guidance system electronics assembly and a launch vehicle telemetry system. The basic drawing (fig. A.1) covers the isolation elements of the guidance system, the jack and pins assigned, and the shielding and grounding on the guidance side of the interface. Bus functions are named (e.g., guidance telemetry data 1 (parametric)), and the shielding requirements through to the first isolating elements of the telemetry system are provided (see notes on fig. A.1).

Table A.1 contains the details to be controlled for each bus function. Signal source (electronics assembly) and destination (telemetry system) are identified. The waveform (fig. A.2) and its critical characteristics (table A.2) are provided, as well as data rates and source and load impedances. Telemetry load impedance is further described by an equivalent circuit (see note 3 on fig. A.1).

The final value of pulse minimum amplitude is missing in this example. This is noted by the design-data-required (DDR) callout in table A.2 and the accompanying DDR block (fig. A.3). The DDR block notes that the responsible parties have agreed on an amplitude band with which they can work until the guidance design becomes firm. However, there is also a date called out that indicates when (45 days after preliminary design review) the telemetry contractor must have the data to be able to complete design and development and deliver the telemetry in time to support launch vehicle flight.

The parameters called out in this example are only those needed to control the design of either side of the interface through the first isolating element. Also note that only the shielding and wire gage of the launch vehicle cabling between the two systems are provided. Only pin numbers for the guidance side of the interface are called out and controlled. Connector types and other pertinent cable specifications are as per a referenced standard that applies to all launch vehicle cabling. In this case the same pulse characteristics apply to each of the functions covered; however, table A.2 is structured to permit variation for each function if the design should dictate different values for the characteristics of each function.

Figure A.1.--Basic interface drawing: guidance system isolation elements, jack and pin assignments, and shielding and grounding on the guidance side of the interface.
Table A.1.--Details to be controlled for each bus function: signal source and destination, waveform, data rates, and source and load impedances.
Waveform features shown: pulse duration; maximum and minimum amplitude; 10% of minimum amplitude; rise time; undershoot; noise; no-transmission level; reference level; leading and trailing edges; and interpulse period.
Notes:
1. The interpulse period shall be the period from 150 ns after the trailing edge of a pulse until 100 ns prior to the leading edge of the subsequent pulse.
2. The reference level shall be the average voltage for the last 200 ns of the interpulse period.
3. The no-transmission level shall be 0 V differential at the guidance/launch vehicle interface using the test load specified in table A.2.
4. Shielding depicted represents the telemetry shielding requirements only. For cable routing see void #01. Telemetry shielding shall be carried through all connectors between the electronic assembly and the telemetry subsystem.
5. A radiofrequency cap shall be provided on electronic assemblies in all launch vehicles in lieu of this connector.
Figure A.2.--Guidance data pulse characteristics.

Table A.2.--REQUIRED PULSE CHARACTERISTICS AND TEST PARAMETERS
[The same values apply to each guidance telemetry function: data 1, data 2, bit synchronization, frame synchronization, data 1 word synchronization, and data 2 word synchronization.]
Pulse characteristics (see fig. A.2):
Pulse duration: 255 ± 50 ns
Minimum amplitude: 9 ± 2 V (see V027)
Maximum amplitude: 15 V
Rise time: 75 ns maximum
Fall time: 125 ns maximum
Undershoot: 2.5 V maximum
Reference level offset: 0 to -4.5 V relative to no-transmission level
Noise: 1.4 V maximum peak to peak
Receiver susceptibility: 2.0 V minimum
Test parameters:
Test load: 75 Ω ± 5%, resistive
Receiver susceptibility: 2.0 V minimum

DDR No. 3288399-V027
Data required: Guidance subsystem waveform parameter data (minimum amplitude value to replace coordinated temporary amplitude band currently on ICD-3288399)
Data supplier: SP-2012/guidance telemetry steering committee
Data user(s): SP-2732/launch vehicle telemetry contractor/interface coordinator
Date due: 45 days following guidance preliminary design review
Figure A.3.--Typical design data required for table A.2.
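Verification of a deliverable against these limits (section 2.9) can be mechanized. The sketch below is illustrative only: it checks a few invented measurements against the table A.2 values, and it covers only a subset of the characteristics.

    # Illustrative check of measured pulse characteristics against table A.2.
    # Limits follow the table; the 'measured' values are invented.

    LIMITS = {                                   # (minimum, maximum)
        "pulse duration, ns": (205.0, 305.0),    # 255 +/- 50 ns
        "rise time, ns": (0.0, 75.0),            # 75 ns maximum
        "fall time, ns": (0.0, 125.0),           # 125 ns maximum
        "maximum amplitude, V": (0.0, 15.0),     # 15 V maximum
        "noise, V peak to peak": (0.0, 1.4),     # 1.4 V maximum
    }

    def out_of_limits(measured):
        """Return the names of any characteristics outside their ICD limits."""
        return [name for name, value in measured.items()
                if not LIMITS[name][0] <= value <= LIMITS[name][1]]

    measured = {"pulse duration, ns": 260.0, "rise time, ns": 80.0,
                "fall time, ns": 110.0, "maximum amplitude, V": 14.2,
                "noise, V peak to peak": 0.9}
    print(out_of_limits(measured))  # ['rise time, ns'] -- 80 ns exceeds 75 ns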

Appendix B

Mechanical/Physical Interface Examples


B.1 Mechanical Interface for Distributed Electrical Box

Figure B.1 is an example of an interface development document (IDD) that, from initial inspection, appears to be fairly complete. This figure contains a great amount of detail, and just about everything appears to be dimensioned. However, closer examination will reveal serious shortcomings.

First, the basic function of the interface must be defined. The box depicted must be capable of being removed and replaced on orbit, in many cases outside the crew habitat. In some cases it is to be removed and replaced robotically. The box slides along the L-shaped bracket held to the support structure by three mounting bolts labeled "bolt 1," "bolt 2," and "bolt 3." As the box slides along the L-shaped bracket from left to right in the figure, a piloting feature on the box connectors engages the connectors mounted to the support structure by the spring-mounted assembly, and the connector engages fully when the lead screw is completely engaged.

1. The initial interface area to be examined is that of the L-shaped bracket to the support structure (i.e., the interface of the three mounting bolts). The interface is being examined from the perspective of the designer of the support structure. Does figure B.1 contain enough information for a mating interface to be designed? (The area of interest has been enlarged and is presented as figure B.2.)

a. The dimensions circled in figure B.2 and lettered a, b, c, and d locate the position of the mounting bolts relative to the box data. The following pertinent deficiencies are noted concerning this dimensioning:

i. Dimension a locates the holes relative to a "reference datum for coldplate support structure," but the datum is not defined on the drawing. Is it a line or a plane? What are the features that identify/locate the datum? What is the relationship of this datum to the other data identified on the IDD (data A, B, and D)? This information is required so that the designer of the support structure can relate his or her interface features easily to those of the box IDD.

ii. The IDD states that the tolerance on three-place decimals is ±0.010. Dimensions a, b, c, and d are three-place decimal dimensions and would, therefore, fall under this requirement. Elsewhere on the IDD a true position tolerance for bolt locations is indicated. A feature cannot be controlled by both bilateral and true position tolerancing; it must be one or the other. Considering the function of the mounting bolts--to locate the box relative to the electrical connectors--it has to be assumed that dimensions a, b, c, and d are basic dimensions. Interface control drawings cannot require the designer of the mating interface to assume anything. IDD's must stand by themselves.

b. Figure B.3 depicts the initial details of the mounting bolts for the L-shaped bracket. On first inspection there appears to be a great amount of detail. However, further examination shows that much of the detail is not related to interface definition. The interface is the bolt. Where is it relative to other features of the box? What is the relationship of bolts 1 and 2 to bolt 3 (datum C)? What is the thread of the bolt? How long is the bolt? The following data on the IDD are not required:

i. Counterbore for bolt head
ii. Diameter of bolt hole in bracket for bolts 1, 2, and 3
iii. Distance of bolt hole to first thread
iv. The fact that there is a screw retaining ring

Adding data not required for the interface, even if they are only pictorial, is expensive. It takes time for the organization to develop and present it, and it takes time for the designer of the mating interface to determine that the information is not necessary and discard it. If the extraneous information stays on the IDD, it must be maintained (i.e., changed if the design details change). Only the features of a design that affect the features of the design of the mating interfaces need be placed on the IDD.

c. Once the unnecessary data are removed, what remains is shown in figure B.4. The data that remain are not complete and are unclear. The true position notations are indicated as being those for the "mounting interface for bolt," suggesting that the true position applies to the hole in the support structure. However, since the IDD is basically covering the features of the box, it is assumed that these locations apply to the bolts on the box. It should not be necessary to have to make assumptions about data on an IDD or ICD. The document should stand by itself.

The only other data left in figure B.4 are the callouts for the locking inserts. These callouts refer to the method used by the designer of the support structure for retaining the bolts. This IDD should not have this callout, since the method used for retaining the bolts is not the responsibility of the box designer. Generally IDD's and ICD's should not specify design solutions, especially when the design solutions are not the responsibility of the one specifying them.

What is missing is how far the bolts protrude from the box. These data are required so that the designer of the support structure knows how deep to make the mating hole and how much of a mating thread must be supplied to grip the bolts on the box.

Considering all of the above, figure B.5 represents what is really required (along with the locations and thread types already defined in fig. B.1) to define the box side of the interface and for the designers of the support structure to design a compatible interface between the retaining bolts and the support structure.

Figure B.1.--Interface development document for the distributed electrical box.
Figure B.2.--Detail of L-shaped bracket interface.
Figure B.3.--Initial details of mounting bolts.
Figure B.4.--Necessary details of mounting bolts.
Figure B.5.--Minimal interface definition.

2. The next area to be examined is that of the connector interface. Since both parts of the connector are being provided by the box designer, the interface is the plate on which the connectors are attached to the support structure. Again, the question is, Does figure B.1 contain enough information for a mating interface to be designed? The answer to that question is, Definitely not! The interface of the plate (holding the connectors) that mates with the support structure is identified as datum D. Again, there is no definition of this datum. Is it a plane passing through the three highest points of the plate or some other features of the connector plate?

If a compatible mating interface is to be designed, the relationship between the surface to which the connector plate is attached and the surface to which the L-shaped bracket is attached must be known. None of these data are supplied in figure B.1. The following are data needed to establish this relationship:
a. Therequired
perpendicularity
of D to A B.2 Space Reservation and Attachment
b. The required parallelism of D to B
Features for Space Probe Onboard
c. The required angular relationship ofthe vertical centerline
shown in view B-B with the vertical centerline shown in Titan IV Launch Vehicle
view A-A
d. The pattern required for the four fasteners holding the Figure B.6 is an example of an ICD that defines the space
connector plate to the support structure. View B-B does envelope available onboard the Titan IV launch vehicle for a
contain a dimension of 2.594 for a horizontal spacing of payload and the attachment feature details for the launch
the lower two features but does not indicate that this vehicle side of the interface. The intended payload is the
dimension is applicable to the upper two fasteners. In Cassini Mission spacecraft. The Titan payload fairing, as
addition, there is no dimension for the distance between would be expected, is defined. The other side of this envelope
the fasteners in the Z direction. (i.e., the spacecraft) must also be defined to show compatibility.
e. The required relationship of the hole pattern for the When the spacecraft dimensions are established, compatibility
connector plate relative to the box, namely, should be shown by a comparison of the two envelopes. The
i. The location of the hole pattern above A in the Z Titan documentation defines the available space reserved for
direction equipment (i.e., a stay-out zone for the Titan launch vehicle
ii. The location of the hole pattern relative to C in the items). Ideally, this ICD should define a minimum space
X direction available for the spacecraft. Therefore, if the spacecraft dimen-
iii. The distance of datum D from C in the Y direction sions are constrained to a maximum size equal to the launch
when the box is fully installed vehicle's minimum, less a value for environmental effects, etc.,
then the two envelopes are compatible.
Since none of these data are identified as items to be determined Since interface data have been provided for the attachment
(TBD' s), it must be assumed either that the data are not required details for the launch vehicle side of the interface, the design of
because the connectors can be mated properly with a great deal the Cassini adapter for mounting to the Centaur launch vehicle
of misalignment or that the box designer did not recognize that at station -150.199 can be explained by using the Titan design
this type of data is required. Designers never wish to freeze a data.
design. The placement of design constraints in an ICD is The following key interface features have been established
basically freezing an area of a design or at least impeding the for this connection:
ability to change a design without that design being scrutinized
at another level. Therefore, the tendency of designers is to 1. Sheet 1 (fig. B.6(a)), note 5: Location of holes is estab-
disclose the minimum that they feel is necessary in the lished by a common master gauge tool with reference dimen-
interface for the control process. This is the primary reason sions provided.
for the ICD custodian not to be organizationally a part of 2. Sheet 3 (fig. B.6(c)), section F-F: Bearing areas are to be
the design process. Yet the ICD custodian must have access to fiat within 0.006 (units), and per view G the maximum bearing
the design function of an agency or contractor organization to area has been defined.
ensure the ready flow of the data required for proper interface 3. Sheet 3 (fig. B.6(c)), view H: Shape and dimensions of the
definition. (Can interface compatibility be demonstrated from shear alignment pins have been established.
the ICD's alone?) 4. Sheet 1 (fig. B.6(a)), note 4: How loads are to be transmit-
The ICD custodian must always test the data in interface ted is indicated.
documentation from the viewpoint of another design agent who
must develop a compatible mating interface. The following data elements missing from figure B.6 are
The preceding discussion simplifies specification of the mostly related to the lack of spacecraft design data:
L-shaped bracket and the mounting bolts. This redefinition of 1. No apparent tracking of TBD's. A tracking system
the interface tied up loose ends and provided needed dimen- should be in place at the beginning of ICD development.
sions and callouts absent from the original document. These Each TBD should have a unique sequential identifier with
portions of the document can now be controlled more easily and due dates and suppliers established.
related to a 100% mate design. 2. Norevision block for tracking the incorporation of changes.
Some type of revision record should be placed on each sheet.

NASA RP-1370 33
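The TBD tracking recommended in item 1 reduces to a small record-keeping structure. The following is a minimal sketch, not part of the source document: the class name, field names, and the example entry are all hypothetical, chosen only to illustrate the unique sequential identifier, due date, and supplier that the text calls for.

    from dataclasses import dataclass
    from datetime import date

    @dataclass
    class TBDItem:
        number: int          # unique sequential identifier, e.g., 1 -> "TBD-001"
        description: str     # exact data to be supplied
        supplier: str        # organization responsible for supplying the data
        due: date            # agreed due date
        closed: bool = False

        @property
        def identifier(self) -> str:
            return f"TBD-{self.number:03d}"

    def overdue(log: list[TBDItem], as_of: date) -> list[TBDItem]:
        # Open, past-due TBD's for status reporting at interface control meetings.
        return [t for t in log if not t.closed and t.due < as_of]

    # Hypothetical usage:
    log = [TBDItem(1, "Hole pattern location relative to datum C",
                   "Spacecraft design agent", date(1996, 3, 1))]
    print([t.identifier for t in overdue(log, date(1996, 4, 1))])   # ['TBD-001']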
[Figure B.6, interface control drawing sheets (a) to (c) for the Titan IV launch vehicle: graphic pages not reproduced in this extraction.]
Upon exchange of design data relating to the Cassini probe it would be expected that the probe's maximum envelope would be established and related to the data system of the Titan/Centaur launch vehicle.

This example is basically a one-sided interface. The Titan/Centaur side of the interface is well defined, which is to be expected considering the maturity of the design. The tendency should be resisted, in cases like this, to ignore or place less emphasis on the definition and documentation of the mating interface, given the completeness of the launch vehicle side of the interface. The mating interface, namely, the spacecraft side, should be completely defined. Otherwise, the spacecraft designer will be signing up to design a compatible interface by agreeing with what the interface on the launch vehicle side looks like. Although this approach allows freedom to go off and "do independent things," it lacks the degree of positive control needed for interface compatibility. The chances for an incompatibility are much less if the spacecraft side of the interface is defined. Space vehicle data, stations, and fasteners must be identified and controlled. The designer of the space vehicle is then able to commit to the design and production of an interface that is defined. The launch vehicle designers can then verify that the spacecraft interface will mate with the launch vehicle.
Appendix C

Software Interface Example:
Definitions and Timing Requirements for Safety Inhibit Arm Signals

Signal definition          Centaur sequence   Initiating event + time        Persistence   Function
                           control unit
                           switch number

Satellite vehicle (SV)     45                 Main engine cutoff (MECO) 2    3±0.5 sec     Unshorts SV pyro capacitor banks
pyro unshort (primary)                        + 3±0.5 sec

SV latch valve             33                 MECO 2 + 10±0.5 sec            3±0.5 sec     Arms safety inhibit relay for SV
arm (primary)                                                                              main engines

SV pyro unshort            89                 MECO 2 + 15±0.5 sec            3±0.5 sec     Provides redundant unshort of SV
(secondary)                                                                                pyro capacitor banks

SV latch valve             88                 MECO 2 + 17±0.5 sec            3±0.5 sec     Provides redundant arm of inhibit
arm (secondary)                                                                            relay for SV main engines

Radiofrequency             34                 Titan IV/Centaur               3±0.5 sec     Serves as backup (redundant to SV
monopropellant driver                         separation + 24±0.5 sec                      ground support equipment command)
backup enable                                                                              enable of safety inhibit SV
                                                                                           functions (radiofrequency sources
                                                                                           and reaction control system
                                                                                           thruster drivers)
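Timing tables like the one above translate directly into machine-checkable requirements. The following is a minimal sketch, not from the source: it encodes the four MECO 2-referenced switches as (start, tolerance, persistence) triples and checks an observed event against its window. The separation-referenced switch 34 is omitted for brevity, and the function and variable names are ours.

    # Verify observed switch-closure times against the MECO 2-relative
    # windows in the table above. Times are seconds relative to MECO 2.
    REQUIREMENTS = {
        45: (3.0, 0.5, 3.0),    # switch: (start, tolerance, persistence)
        33: (10.0, 0.5, 3.0),
        89: (15.0, 0.5, 3.0),
        88: (17.0, 0.5, 3.0),
    }

    def check_event(switch: int, observed_start: float, observed_duration: float) -> bool:
        """True if the observed event meets both the start window and the persistence."""
        start, tol, persist = REQUIREMENTS[switch]
        start_ok = abs(observed_start - start) <= tol
        persist_ok = abs(observed_duration - persist) <= tol
        return start_ok and persist_ok

    # Example: switch 33 closing 10.2 sec after MECO 2 and holding 3.1 sec passes.
    print(check_event(33, 10.2, 3.1))   # True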


Appendix D

Supplied Services Interface Example


This appendix provides a simplistic text-based example of a supplied services (air-conditioning and cooling water) interface control document with a typical design-data-required (DDR) block. This example contains elements condensed from a number of service documents originally used for a submarine weapons program; however, the principles contained herein are universally applicable to any complex system of interfaces. Page 1 of the ICD lists the DDR's (table D.1), showing DDR numbers, location on the drawing, brief description, and due date. The DDR block (fig. D.1) on the drawing expands on this information and identifies supplier, user, and time urgency of the data needed. The DDR numbering convention used here is "V09 = Void #09." Preceding the void number with the ICD number provides a program-unique DDR number that is easily related to its associated ICD and easily maintained in a data base.
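The numbering convention just described is simple enough to show in a few lines. This sketch is illustrative only; the helper names are ours, and the ICD and void numbers are the ones used in this appendix.

    def ddr_number(icd_number: str, void: int) -> str:
        """Compose a program-unique DDR number, e.g., ('1466134', 9) -> '1466134-V09'."""
        return f"{icd_number}-V{void:02d}"

    def parse_ddr(ddr: str) -> tuple[str, int]:
        """Split a DDR number back into (ICD number, void number) for data base keys."""
        icd, void = ddr.rsplit("-V", 1)
        return icd, int(void)

    print(ddr_number("1466134", 9))   # 1466134-V09
    print(parse_ddr("2543150-V12"))   # ('2543150', 12)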

TABLE D.1.—DESIGN-DATA-REQUIRED SUMMARY AND LOCATOR

Void number   Location            Description                Date due
V01           ...                 ...                        ...
V09           Sheet 1, zone C-7   Main heating and cooling   30 days after authentication of
                                  (MHC) water schedule       data fulfilling DDR 5760242-V12

DDR No. 1466134-V09
Data required: Heating and cooling (HC) system upper zone water schedule
  (supply water temperature versus environmental temperature)
Data supplier: HC working group
Data user: Launch vehicle design agent
Date due: 30 days after authentication of data fulfilling DDR No. 2543150-V12

Figure D.1.—Typical design-data-required block.
The following pages present the kinds of data required to fully define the air-conditioning requirements for suites of equipment located in a launch control center. Table D.2 details a conditioned-air distribution; table D.3 presents typical interface data required to ensure that a cooling water service is provided to electrical equipment and indicates requirements for the equipment before and after the incorporation of an engineering change.

701. Launch vehicle control center services:
  A. Air-conditioning shall be provided with a dedicated closed-circuit system capable of supplying a minimum total flow of 12 820 scfm with a 50% backup capability.
    1. The conditioned air shall be distributed to each equipment flue as specified in table D.2. The distributed conditioned air at the inlet to the equipment shall satisfy the following parameters:
      a. Temperature: The minimum temperature shall be 65 °F and the maximum, 70 °F.
      b. Humidity: The maximum humidity shall be 75 grains per pound of dry air.
      c. Working pressure: The working pressure shall be enough to overcome equipment pressure drops and to maintain positive pressure at the equipment outlet with respect to compartment ambient pressure. A 10% minimum leakage rate in the compartment shall be assumed.
      d. Flow resistance: The system shall be able to overcome the pressure drop across the equipment (i.e., from exit of orifice plate to top of equipment) as shown in table D.2.
      e. Flow profile:
        (1) The flow distribution for each flue shall be such that the flow velocity between the flue centerline and 1.3 in. from the edge of the flue, and (where equipment permits) 6 in. above the flue gasket, shall not be less than 80% of the achieved average flow velocity. The achieved average flow velocity must equal or exceed the velocity based on the minimum flow rate specified in table D.2.
        (2) Velocity profiling is not required for flues designated 301 through 310, 011 through 015, 446BC, 405-2A, 405-2B, 405-6A, and 405-6B.
      f. Adjustment capability: The system shall provide flow adjustment from 0 to 300 scfm at each of the equipment flues requiring velocity profiling.
      g. Air quality: Air at the inlet to the equipment shall be equivalent to or better than air filtered through a 0.3-μm filter with an efficiency of 95%.
    2. The closed-loop system shall have the capacity of removing 52.8 kW (minimum) of heat dissipated by equipment using closed-circuit conditioned air. This heat load includes 1.3 kW reserved for launcher equipment in the launch vehicle control center (see note 702 below).

702. The system shall provide the capability of removing 1.65 kW minimum of heat dissipated by equipment by using compartment ambient air as a cooling medium while maintaining the compartment within specified limits.
  A. The ship shall take no action that eliminates the option for launcher equipment to use compartment ambient air or closed-circuit conditioned air for dissipating launcher-generated heat of 1.3 kW.
  B. Heat dissipated to ambient air by equipment using closed-circuit conditioned air is not included.

703. The system shall provide distribution trunks to equipment flues with total flow capacity as designated below for the conditions of table D.2:

  Trunk   Minimum flow, scfm
  A       2700
  B       1620
  C       2300
  D       3400
  E       1300
  F       1500

704. Flows at reference designations marked with an asterisk in table D.2 are to be considered flow reserve capabilities. These designated flues do not require verification of flow per table D.2 nor profiling per note 701.A.1.e(1) until these flues are activated. Government-furnished pipe assemblies and caps will be supplied for flues not activated.

705. The minimum flow for flues 446BC and 447BC is 100 scfm before change 30175 and 250 scfm after change 30175.
TABLE D.2.—CONDITIONED-AIR DISTRIBUTION

Equipment            Trunk (see   Flue       Minimum      Flow resistance/pressure drop
                     note 703)               flow, scfm   at minimum flow (see note
                                                          701.A.1.d), in. H2O

Data cabinets        A            301B       225          0.54
                                  301C       260
                                  305B       80           0.50
                                  305C       80           0.50
                                  306B       290          0.56
                                  306C       50           0.50

Data console         A            308B       100          0.50
                                  308C       50           0.50
                                  309        0*
                                  310B       135          0.50
                                  310C       50           0.50

Control console      E            405-2A     100          1.0
                                  405-2B     100
                                  405-6A     50
                                  405-6B     50

Power buffer and     B            011        440          2.0
conversion                        012        440
                                  013-1      150
                                  013-2      150
                                  015        440

Control computer                  440BC      200          1.0
group                             440-441D   300
                                  444BC      300
                                  444-445D   250
                                  446BC      See note 705
                                  447BC      See note 705
                                  471        200

Control group                     450BC      200
                                  450-451D   200
                                  451BC      100
                                  452BC      200
                                  452-453D   200
                                  458BC      200
                                  458-459D   200
                                  459BC      150*
                                  472        150*

Power distribution   F            002BC      150
                                  003BC      150
                                  004BC      150*
                                  004D       150*

Load                 F            271BC      275          1.0
                                  271D       0*           0
                                  005BC      100*         1.0
                                  005D       0*           0

*Flow reserve capability.
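Table D.2, together with the trunk capacities in note 703, supports exactly the supplier-versus-requirement comparison that appendix E calls a compatibility analysis. The following is a minimal sketch under our own assumptions: only trunk A's flues are entered, reserve (asterisked) flues are counted at their listed values, and a negative margin would flag an incompatibility.

    # Does each trunk's minimum capacity (note 703) cover the sum of the
    # minimum flue flows it serves (table D.2)? Only trunk A is populated.
    TRUNK_CAPACITY_SCFM = {"A": 2700, "B": 1620, "C": 2300, "D": 3400, "E": 1300, "F": 1500}

    FLUE_MIN_FLOW_SCFM = {
        "A": {"301B": 225, "301C": 260, "305B": 80, "305C": 80, "306B": 290,
              "306C": 50, "308B": 100, "308C": 50, "309": 0, "310B": 135, "310C": 50},
    }

    def trunk_margin(trunk: str) -> float:
        """Capacity minus demand; a negative margin flags an incompatibility."""
        demand = sum(FLUE_MIN_FLOW_SCFM[trunk].values())
        return TRUNK_CAPACITY_SCFM[trunk] - demand

    print(trunk_margin("A"))   # 2700 - 1320 = 1380 scfm of margin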
TABLE D.3.—WATER FLOW RATE INTERFACE PARAMETERS

[Water inlet temperature: 54 °F max and 48 °F min; temperature alarm set at 56 °F ±1 °F (increasing) and 47 °F ±1 °F (decreasing); see Remarks. Working pressure: 85 psig max and 57 psig min. Test pressure: 125 psig max with binnacles to be isolated at vehicle hydrostatic test. Pressure drop: nominal differential pressure range, 13 to 23 psid ref. Water quality: dual filters required; filters to 10 μm with 98% efficiency by weight, 20 μm absolute.]

Function: Electrostatically supported gyro navigator (ESGN) and gravity sensor system (GSS) binnacle cooling
  Minimum cooling capability: 2.25-kW gain
  Water flow rate: (a) 6.0-gal/min nominal total flow for two ESGN binnacles and one GSS binnacle. The supply shall maintain constant flow of 2.0 gal/min ±10% to each binnacle. (b) A remote, low-flow alarm shall be provided for the ESGN binnacles and the GSS binnacle.
  Remarks: Reliability of water supply shall support a navigation subsystem availability of 0.97. This service requirement shall be continuously available during patrol and refit. The water temperature shall not vary by more than 6 °F when changing at the rate of 0.25 °F/sec maximum. This change shall not occur more than once per 30-min period.

Function: Reserve capability for future navigation development
  Minimum cooling capability: 3.25-kW gain
  Water flow rate: 2.6-gal/min minimum

Function: ESGN binnacle cooling
  Minimum cooling capability: 1.5-kW gain
  Water flow rate: (a) 4.0-gal/min nominal total flow for two ESGN binnacles. The supply shall maintain a constant flow of 2.0 gal/min ±10% to each binnacle. (b) A remote, low-flow alarm shall be provided for the ESGN binnacles.

Function: Reserve capability for future navigation development
  Minimum cooling capability: 4.0-kW gain
  Water flow rate: 4.5-gal/min minimum

(a) The system shall provide test connections at the inlet and outlet of each binnacle to permit periodic measurement of differential pressure.
(b) Local flow indication shall be provided for each binnacle.
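The temperature-stability remark in table D.3 is another requirement that can be checked mechanically. The sketch below is ours, not the source's: it assumes evenly spaced temperature samples, checks the 0.25 °F/sec rate limit and the 6 °F excursion limit, and leaves the once-per-30-minute bookkeeping out for brevity.

    def temp_profile_ok(samples: list[float], dt_sec: float) -> bool:
        """samples: supply water temperatures (deg F) at fixed dt_sec spacing."""
        rate_ok = all(abs(b - a) / dt_sec <= 0.25 for a, b in zip(samples, samples[1:]))
        excursion_ok = max(samples) - min(samples) <= 6.0
        return rate_ok and excursion_ok

    # 1-sec samples drifting 0.2 deg F/sec over 5 sec: within both limits.
    print(temp_profile_ok([50.0, 50.2, 50.4, 50.6, 50.8, 51.0], 1.0))   # True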
Appendix E

Compatibility Analysis
E.1 Definition

Compatibility analysis of the interface definitions contained in an ICD is a major tool of interface control. It serves a twofold purpose:

1. Demonstrates completeness of interface definition. If any interface data are missing or presented in a manner that cannot be integrated by using the ICD alone as a data source, the ICD is considered deficient.
2. Provides a record (traceability) that the interface has been examined and found to have the right form and fit. This record can then be used in evaluating the acceptability of subsequent change proposals.

E.2 Kinds of Data

The following compilation identifies the kinds of data that must be obtained for a compatibility analysis and outlines the general steps that should be followed for four categories of interface: electrical/functional, mechanical/physical, software, and supplied services:

I. Interface category—electrical/functional
  A. Data required to perform analyses
    1. The following parameters are required, considering the specific function or signal involved:
      a. Cabling and connectors
      b. Power requirements
      c. Electromagnetic interference, electromagnetic compatibility, electromagnetic radiation, and grounding requirements
      d. Functional flow and timing requirements
      e. Signal definition
      f. Digital data definition to the bit level
      g. Protocol levels
      h. Seven-layer International Standards Organization open systems interconnection stack definition or equivalent
      i. Error recovery procedures
      j. Startup and shutdown sequences
      k. Adequacy of standards used or referenced
    2. Unique requirements for an interface or a piece of equipment different from overall system requirements (i.e., the hierarchy of specifications required)
    3. Adequate definition of all signals crossing the interface. "Adequate" is difficult to define precisely but depends on the signal type (e.g., analog or digital) and the intended use. In general, the interface must show the characteristics of the isolating device (element) on each side of the interface and define the signal characteristics in engineering terms suitable for the particular type of signal.
    4. Timing and other functional interdependencies
    5. System handling of error conditions
    6. Full definition of any standards used. Most digital transmission standards have various options that must be selected; few, if any, standards define the data that are passed.
  B. Steps to be followed
    1. Verify interoperability of connectors.
    2. Size cables to loads.
    3. Determine cable compatibility with signal and environmental conditions.
    4. Define data in one document only.
    5. Determine adequacy of circuit protection devices and completeness of signal definition.
II. Interface category—mechanical/physical
  A. Type of interface—form and fit
    1. Data required to perform analysis
      a. A datum (reference) that is common to both sides of the interface (e.g., a mounting hole in one part that will mate with a hole or fastener in the other mating parts or a common mating surface of the two mating parts)
      b. Dimensions and tolerances for all features of each part provided in a manner that gives the optimum interface fit and still provides the required design functions. Optimum interface means dimensioning so that the tolerance accumulation is kept to a minimum.
    2. Steps to be followed (a worked sketch of this stack-up appears at the end of this appendix)
      a. Start with the common datum and add and subtract dimensions (adding the tolerance accumulations for each dimension) for each feature of the part or its interface.
      b. Determine the dimensional location of the interface-unique features by adding and subtracting the tolerance accumulations from resulting dimensions to achieve the worst-case maximum and minimum feature definitions.
      c. Perform the same analysis for the mating features of the interfacing part.
      d. Compare and question the compatibility of the worst-case features of the two mating parts (Will the maximum condition of one part fit within the minimum condition of the mating part?)
  B. Type of interface—structural load
    1. Data required to perform analysis
      a. A description of the loading conditions (static or dynamic) and the duration of those conditions
      b. Characteristics of the equipment involved: weight or mass; mass distribution; elastic properties; and sensitivity of elastic properties to temperature, moisture, atmospheric gas content, pressure, etc.
    2. Steps to be followed. This analysis involves placing the interfacing items in a position that produces the maximum loads while the items are interfacing. A space experiment is primarily designed for flight loads, yet it must withstand the loads developed during the launch and deployment cycles and perhaps unique loads during launch processing. The complexity of the compatibility analysis will vary depending on the types of interfacing items and environments.
      a. Attachment loads are the simplest, being a statement of the loads applied by the attaching feature (bolt) and the load capability of the component being retained (flange).
      b. Hoisting and handling loads require the calculation of bending moments or shear for various loading scenarios. Dynamic and environmental loads must also be considered. (How quickly is the load applied? What are the wind loading factors?)
      c. A more complex situation will be the loads developed during a dynamic interaction of interfacing items where different material characteristics must be considered along with the reaction characteristics of the materials (e.g., a flexible beam of varying moments of inertia supported by an elastomeric medium where the entire system is subjected to a high-velocity impulse of a few microseconds duration). Such a condition could produce loads that exceed those for which one of the interfacing items is designed. Another interfacing item may have to be redesigned so as not to jeopardize the mission of the primary item (i.e., increasing the strength of the item being supported could increase the weight).
III. Interface category—software
  A. Type of interface—software. The ICD is required to specify the functional interface between the computer program and any equipment hardware with which it must operate. Often, the supplier documentation for standard computer peripherals and terminals is adequate for this purpose. Conversely, it has been found that performance specifications governing the design of new equipment are not satisfactory for use in a functional ICD. The purpose of an ICD is to communicate equipment interface requirements to programmers in terms that the programmers readily and accurately understand and to require equipment designers to consider the effect of their designs on computer programs.
  B. Type of interface—hardware/software integration. The ICD provides an exact definition of every interface, by medium and by function, including input/output control codes, data format, polarity, range, units, bit weighting, frequency, minimum and maximum timing constraints, legal/illegal values, accuracy, resolution, and significance. Existing documentation may be referenced to further explain the effect of input/output operations on external equipment. Testing required to validate the interface designs is also specified.
IV. Interface category—supplied services
  A. Type of interface—fluid service
    1. Data required to perform analysis
      a. Type of fluid required by the equipment and type of fluid the service supplier will provide. This may be in the form of a Federal or military specification or standard for both sides or for one side of the interface.
      b. Location of the equipment/service interface (hose connection, pipe fitting, etc.)
      c. Equipment requirements at the interface location in regard to characteristics (pressure, temperature, flow rate, duty cycle, etc.)
      d. Capability of the service supplier at the interface location
      e. Manner in which the equipment can affect the capability of the service supplier (e.g., having a large backpressure that the supplier fluid must push against or a combination of series and parallel paths that the supplier fluid must pass through)
    2. Steps to be performed. Examine the supplier and equipment requirements to determine
      a. If the supplier capability meets or exceeds the equipment requirements. This may require converting a Federal/military specification or standard requirement into what is specified for the equipment.
      b. If the supplier capability meets the requirements, considering the effects resulting from the fluid passing through the mating equipment
  B. Type of interface—environmental
    1. Data required to perform analysis
      a. Conditions required for equipment to function properly. Storage, standby, and operating scenarios need to be established and defined.
      b. Supplier's capability to provide the environment specified in terms of time to reach steady state from transients resulting from uncontrollable external environments; the limits of the steady-state conditions (maximum/minimum); and monitoring features
    2. Steps to be performed. Perform analyses (e.g., thermal) under extreme and nominal environmental conditions to verify that the supplier's equipment can maintain the environment required for the equipment. The complexity of the analysis may vary depending on the types of items involved:
      a. Simple inspection, which considers the environment required by an item versus the capability of the ambient in which the item resides
      b. Complex analysis, which must consider uncontrolled external environmental inputs, the thermal properties of intermediate systems that do not contribute to the end environment but act as conduits or resistors in the model, and the interaction of the item and the system that controls the desired environment
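To make the form-and-fit steps in section E.2.II.A.2 concrete, here is a minimal sketch, ours rather than the source's, of a one-dimensional worst-case stack-up: each feature location is carried as a nominal value plus an accumulated ± tolerance from the common datum, and the mate check asks whether the maximum condition of one part fits within the minimum condition of the other. All dimensions are hypothetical.

    from dataclasses import dataclass

    @dataclass
    class Dim:
        nominal: float
        tol: float          # symmetric +/- tolerance

    def stack(dims: list[Dim]) -> Dim:
        # Chain dimensions outward from the datum; tolerances accumulate.
        return Dim(sum(d.nominal for d in dims), sum(d.tol for d in dims))

    def mates(pin_dia: Dim, hole_dia: Dim) -> bool:
        # Worst case: the largest pin must fit the smallest hole.
        return pin_dia.nominal + pin_dia.tol <= hole_dia.nominal - hole_dia.tol

    # Pin location on part 1: datum -> boss (2.000+/-0.010) -> pin (0.594+/-0.005)
    pin_loc = stack([Dim(2.000, 0.010), Dim(0.594, 0.005)])
    print(pin_loc)                                        # nominal 2.594, tol 0.015
    print(mates(Dim(0.248, 0.002), Dim(0.250, 0.001)))    # False: 0.250 > 0.249

Note how a fit that looks fine at nominal dimensions fails at the worst-case extremes; that is exactly the question step d asks the analyst to pose.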
Appendix F

Bracket System for Interfaces


Brackets are used on hardware/engineering drawings to flag or identify details controlled by the ICD. Changes cannot be made to the drawings or designs without the effects on the interface being assessed and coordinated through the ICD process.

The process uses a rating similar to that used in the problem/failure reporting bracket system with the same controls and traceability. Once a bracket has been assigned to an interface void or problem, specific analyses and actions are required for the bracketed item to be removed. The bracketed item remains in open status with assignment to the responsible cognizant subsystem or design section until (1) the corrective action or coordinated information has been developed, (2) a proper risk assessment has been performed, (3) ICD change actions have been completed, (4) adequate verification of the interface is planned, and (5) the proper approval signatures have been obtained.

The following ratings are used to establish a category of "bracket" identifiers for interface deficiencies. Any discrepancy having an A rating greater than 1 or a B rating greater than 2 will be designated a bracketed discrepancy (see figure F.1).

I. Interface deficiency rating A (S&MA impact)
  A. Rating A1: Negligible effect on interface or mission performance
    1. No appreciable change in functional capability (form, fit, and function are adequate for the mission)
    2. Minor degradation of engineering or science data
    3. Support equipment or test equipment failure but not mission-critical element failure
    4. Support-equipment- or test-equipment-induced failures
    5. Drawing errors not affecting element construction
  B. Rating A2: Significant degradation to interface or mission performance
    1. Appreciable change in functional capability
    2. Appreciable degradation of engineering or science data
    3. Significant operational difficulties or constraints
    4. Decrease in life of interfacing equipment
    5. Significant effect on interface or system safety
  C. Rating A3: Major degradation to interface or mission performance or catastrophic effect on interface or system safety
II. Interface deficiency rating B (understanding of risk)
  A. Rating B1: Effect of interface deficiency is identified by analysis or test, and resolution or corrective action is assigned and scheduled or implemented and verified. There is no possibility of recurrence.
  B. Rating B2: Effect of interface deficiency is not fully determined. However, the corrective action proposed, scheduled, or implemented is considered effective in correcting the deficiency. There is minimal possibility of recurrence and little or no residual risk.
  C. Rating B3: Effect of interface deficiency is well understood. However, the corrective changes proposed do not completely satisfy all doubts or concerns regarding the correction, and the effectiveness of corrective action is questionable. There is some possibility of recurrence with residual risk.
  D. Rating B4: Effect of interface deficiency is not well understood. Corrections have not been proposed or those proposed have uncertain effectiveness. There is some possibility of recurrence with residual risk.
Red flag on an interface or discrepancy: project task manager approval required.

Rating A (S&MA impact)     Numerical rating   Rating B (understanding of risk)

Negligible impact          1   1              Known deficiency with corrective action
                                              assigned, scheduled, and implemented

Significant degradation    2   2              Deficiency poorly defined but acceptable
                                              corrective action proposed, scheduled, and
                                              implemented (low residual risk)

Major degradation          3   3              Known deficiency but effectiveness of
                                              corrective action is unclear and does not
                                              satisfy all doubts and concerns (residual risk)

                               4              Impact not defined with confidence;
                                              corrective action with uncertain
                                              effectiveness (residual risk)

Figure F.1.—Interface deficiency rating system.
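The bracketing rule in this appendix (an A rating greater than 1 or a B rating greater than 2 designates a bracketed discrepancy) is a one-line predicate. The sketch below is ours; only the thresholds come from the text.

    def is_bracketed(rating_a: int, rating_b: int) -> bool:
        """rating_a in 1..3 (S&MA impact); rating_b in 1..4 (understanding of risk)."""
        if not (1 <= rating_a <= 3 and 1 <= rating_b <= 4):
            raise ValueError("ratings out of range")
        return rating_a > 1 or rating_b > 2

    print(is_bracketed(1, 2))   # False: negligible impact, understood risk
    print(is_bracketed(1, 3))   # True: residual risk drives the bracket
    print(is_bracketed(3, 1))   # True: major degradation drives the bracket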
Appendix G

ICD Guidelines

1. Interface control documents should not require the designer of the mating interface to assume anything. ICD's should be compatible with each other and stand alone.

2. Only the definition that affects the design of the mating interfaces need be used.

3. ICD's should not specify design solutions.

4. The ICD custodian should be independent of the design organization.

5. The ICD custodian should verify that the data being controlled by an ICD are sufficient to allow other organizations to develop the interface described by the ICD.

6. An interface control system should be in place at the beginning of system (hardware or software) development.

7. Each void should have a unique sequential identifier establishing due dates, identifying exact data to be supplied, and identifying the data supplier.
Appendix H

Glossary
baseline—The act by which the program manager or a designated authority signs an interface control document (ICD) and by that signature establishes the genuineness of the ICD as an official document defining the interface design requirements. The term "baseline" conveys the idea that the ICD is the only official definition and that this officiality comes from the technical management level. Not only is the initial version of the ICD baselined, but each change to an ICD is likewise approved.

comment issue—An issue of an ICD distributed for review and comment before a meeting of the affected parties and before baselining

custodian—The contractor or project assigned the responsibility of preparing and processing an ICD through authentication and subsequently through the change process

datums—Points, lines, planes, cylinders, and other geometric shapes assumed to be exact for the purpose of computation and from which the location or geometric relationship (form) of features of a piece of equipment can be established

electrical/functional interface—An interface that defines the interdependence of two or more pieces of equipment when the interdependence arises from the transmission of an electrical signal from one piece of equipment to another. All electrical and functional characteristics, parameters, and tolerances of one equipment design that affect another equipment design are specified.

interface—That design feature of one piece of equipment that affects a design feature of another piece of equipment. An interface can extend beyond the physical boundary between two items. (For example, the weight and center of gravity of one item can affect the interfacing item; however, the center of gravity is rarely located at the physical boundary. An electrical interface generally extends to the first isolating element rather than terminating at a series of connector pins.)

interface control—The process of (1) defining interface requirements to ensure compatibility between interrelated pieces of equipment and (2) providing an authoritative means of controlling the interface design.

interface control document (ICD)—A drawing or other documentation that depicts physical and functional interfaces of related or cofunctioning items. (The drawing format is the most common means of controlling the interface.)

interface control working group—A group convened to control and expedite interface activity between the Government, contractors, and other organizations, including resolution of interface problems and documentation of interface agreements

interface definition—The specification of the features, characteristics, and properties of a particular area of an equipment design that affect the design of another piece of equipment

interface responsibility matrix—A matrix of contractors, centers, and project organizations that specifies responsibilities for each ICD listed for a particular task. Responsibilities are designated as review and comment, technical approval, baselining, and information.

interoperability—The ability of two devices to exchange information effectively across an interface

mechanical/physical interface—An interface that defines the mechanical features, characteristics, dimensions, and tolerances of one equipment design that affect the design of another subsystem. Where a static or dynamic force exists, force transmission requirements and the features of the equipment that influence or control this force transmission are also defined. Mechanical interfaces include those material properties of the equipment that can affect the functioning of mating equipment or the system (e.g., thermal and galvanic characteristics).

software interface—The functional interface between the computer program and any equipment hardware with which it must operate. Tasking required to validate the interface designs is also specified.

supplied-services interface—Those support requirements that equipment needs to function and that are provided by an external separate source. This category of interface can be further subdivided into environmental, electrical power, and communication requirements.

technical approval—The act of certifying that the technical content in an interface document or change issue is acceptable and that the signing organization is committed to implementing the portion of the interface design under the signer's cognizance.
References

1. MIL-STD-499: Engineering Management. May 1974.
2. Blanchard, B.S.; and Fabrycky, W.J.: Systems Engineering and Analysis. Prentice-Hall Inc., 1981.
3. MIL-STD-1521B: Technical Reviews and Audits for Systems, Equipment, and Computer Software. Notice 1, Dec. 1985.
4. Kockler, F.; Withers, T.; Poodiack, J.; and Gierman, M.: Systems Engineering Management Guide. Defense Systems Management College, Fort Belvoir, VA, Jan. 1990.
5. ICD-Titan IV/Satellite Vehicle-24027, Cassini Mission. Martin Marietta Technologies, Inc., Denver, CO, Jan. 1994.
6. MIL-STD-490A: Specification Practices. Military Standard, June 1985.
7. ANSI Standard Y14.5: Dimensioning and Tolerancing.
8. DOD-STD-100: Engineering Drawing Practices.
9. SAMSO-STD 77-4: Format and Requirements for Interface Documents. Space & Missile Systems Org., Jan. 1979.
10. MIL-STD-704: Aircraft Electrical Power Characteristics.

Bibliography

AFR 65-3, AR 70-37, NAVELEXINST 4130.1, and 4130.1A: Joint Services Regulation on Configuration Management. Air Force Regulation, Naval Electronics Systems Instructions, Marine Corps Order.

AFSCP 800-7: Configuration Management. Air Force Systems Command Pamphlet.

DOD 4120.3-M: Defense Standardization and Specification Program Policies, Procedures and Instructions. Aug. 1978.

DOD 4245.7-M: Transition From Development to Production. Military Specification, Sept. 1985.

DOD 5010.19: Configuration Management. Military Specification, July 1990.

DOD-D-1000B: Drawings, Engineering and Associated Lists. Military Specification, July 1990.

DOD-STD-480B: Configuration Control—Engineering Changes, Deviations, and Waivers. July 1988. (Cancelled July 1992.)

ESMC Reg 160-1: Radiation Protection Program. Eastern Space & Missile Center, Patrick AFB, FL.

Fairley, R.E.: Software Engineering Concepts. McGraw-Hill, New York, 1985.

FED-STD-209E: Airborne Particulate Cleanliness Classes in Cleanrooms and Clean Zones. Federal Standard, Sept. 1992.

KHB 1860.1: Kennedy Space Center Ionizing Radiation Protection Program. Kennedy Space Center, FL, 1972.

MCR-86-2550: Titan IV System Contamination Control Plan. Martin Marietta Aerospace Corp., Denver, CO, 1987.

MIL-B-5087B: Bonding, Electrical and Lightning Protection for Aerospace Systems. Military Standard, Dec. 1984.

MIL-HDBK-259: Life Cycle Cost in Navy Acquisitions. Military Handbook, Apr. 1983.

MIL-N-7513F: Nomenclature Assignment, Contractor's Method for Obtaining. Military Standard, Notice 2, July 1993.

MIL-P-27401C: Propellant Pressurizing Agent, Nitrogen. Military Standard, Aug. 1988.

MIL-S-83490: Specifications, Types and Forms. Military Standard, Oct. 1968.

MIL-STD-100E: Engineering Drawing Practices. Military Standard, Sept. 1992.

MIL-STD-482A: Configuration Status Accounting Data Elements and Related Features. Military Standard, Sept. 1968.

MIL-STD-973: Configuration Management Practices for Systems, Equipment, Munitions, and Computer Software. Military Standard, 1993.

MIL-STD-1246C: Product Cleanliness Levels and Contamination Control Program. Military Standard, Apr. 1994.

MIL-STD-1388-1A: Logistic Support Analysis. Military Standard, Mar. 1991.

MIL-STD-1456: Contractor Configuration Management Plans. Sept. 1989. (Cancelled July 1992.)

MIL-STD-1528A: Manufacturing Management Program. Military Standard, Sept. 1986.

MIL-STD-1541: Electromagnetic Compatibility Requirements for Space Systems. Military Standard, Dec. 1987.

PD 699-120: Cassini Final Targeting Specification. Program Directive, NASA.

SECNAVINST 4130: Navy Configuration Management Manual. Executive Office of the Secretary of the Navy.
Training Answers

Chapter   Answers

1         1(A); 2(D); 3(C); 4a(C), 4b(A), 4c(B)

2         1(D); 2(C); 3a(B), 3b(C); 4a(C), 4b(C), 4c(C); 5a(A), 5b(A), 5c(A);
          6a(C), 6b(A), 6c(A); 7a(B), 7b(B), 7cA(i), 7cB(ii), 7cC(i); 8a(B),
          8b(A); 9a(A), 9b(A), 9c(B)

3         1a(A), 1b(B), 1c(B); 2a(B), 2b(A), 2c(B); 3a(A), 3b(B), 3c(A);
          4a(B), 4b(A), 4c(B); 5a(A), 5b(B); 6a(A), 6b(B), 6c(B), 6d(B);
          7a(B), 7b(A), 7c(A), 7d(A), 7e(A)
REPORT DOCUMENTATION PAGE (Standard Form 298; Form Approved OMB No. 0704-0188)

1. AGENCY USE ONLY: (Leave blank)
2. REPORT DATE: January 1997
3. REPORT TYPE AND DATES COVERED: Reference Publication
4. TITLE AND SUBTITLE: Training Manual for Elements of Interface Definition and Control
5. FUNDING NUMBERS: WU-323-42-02
6. AUTHOR(S): Vincent R. Lalli, Robert E. Kastner, and Henry N. Hartt
7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES): National Aeronautics and Space Administration, Lewis Research Center, Cleveland, Ohio 44135-3191
8. PERFORMING ORGANIZATION REPORT NUMBER: E-9790
9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES): National Aeronautics and Space Administration, Washington, DC 20546-0001
10. SPONSORING/MONITORING AGENCY REPORT NUMBER: NASA RP-1370
11. SUPPLEMENTARY NOTES: This manual was edited by Vincent R. Lalli, NASA Lewis Research Center; Robert E. Kastner, Vitro Corporation, Rockville, Maryland; and Henry N. Hartt, Vitro Corporation, Washington, DC. Responsible person, Vincent R. Lalli, (216) 433-2354.
12a. DISTRIBUTION/AVAILABILITY STATEMENT: Unclassified - Unlimited; Subject Category 15. This publication is available from the NASA Center for Aerospace Information, (301) 621-0390. Multiple copies are for sale by the National Technical Information Service, Springfield, VA 22161, (703) 487-4822.
12b. DISTRIBUTION CODE: (blank)
13. ABSTRACT (Maximum 200 words): The primary thrust of this manual is to ensure that the format and information needed to control interfaces between equipment are clear and understandable. The emphasis is on controlling the engineering design of the interface and not on the functional performance requirements of the system or the internal workings of the interfacing equipment. Interface control should take place, with rare exception, at the interfacing elements and no further. There are two essential sections of the manual. Chapter 2, Principles of Interface Control, discusses how interfaces are defined. It describes different types of interfaces to be considered and recommends a format for the documentation necessary for adequate interface control. Chapter 3, The Process: Through the Design Phases, provides tailored guidance for interface definition and control. This manual can be used to improve planned or existing interface control processes during system design and development. It can also be used to refresh and update the corporate knowledge base. The information presented herein will reduce the amount of paper and data required in interface definition and control processes by as much as 50 percent and will shorten the time required to prepare an interface control document. It also highlights the essential technical parameters that ensure that flight subsystems will indeed fit together and function as intended after assembly and checkout.
14. SUBJECT TERMS: Systems engineering; Configuration control; Documentation; Change notices; Interface management
15. NUMBER OF PAGES: 60
16. PRICE CODE: A04
17. SECURITY CLASSIFICATION OF REPORT: Unclassified
18. SECURITY CLASSIFICATION OF THIS PAGE: Unclassified
19. SECURITY CLASSIFICATION OF ABSTRACT: Unclassified
20. LIMITATION OF ABSTRACT: (blank)

NSN 7540-01-280-5500   Standard Form 298 (Rev. 2-89), prescribed by ANSI Std. Z39-18, 298-102
Slide Rule and Insert Assembly

Cut out slide rule outer covers on pages 3 and 7 and slide inserts on pages 5 and 9 along cut lines.

Slide rule outer covers (pages 3 and 7):
Cut out the 14 rectangular shapes, marked "cutout" (a razor blade knife is recommended).
Fold spacers back.
Note: Outer covers are assembled upside down to each other. Be certain to match the words "top" to "top."
Assemble outer covers using glue on folded spacers.
Cut out the black half-circle notches.
To reinforce the glue, staple outer covers where you see _ (three places on top and bottom).

Slide rule insert (pages 5 and 9):
Make certain that plus signs are back to back.
Paste together _ and _ to make the insert.

To assemble:
Place the insert so that _ on the cover and _ on the insert are at the bottom right corner. When flipped, that will automatically align _ on the cover and _ on the insert at the top right corner.

[Slide rule outer cover and insert cutout pages (fold and cut lines only); graphics not reproduced in this extraction.]
REPORT DOCUMENTATION PAGE (Standard Form 298; Form Approved OMB No. 0704-0188)

1. AGENCY USE ONLY: (Leave blank)
2. REPORT DATE: July 2000
3. REPORT TYPE AND DATES COVERED: Technical Paper
4. TITLE AND SUBTITLE: Reliability and Maintainability (RAM) Training
5. FUNDING NUMBERS: WU-323-93-00-00
6. AUTHOR(S): Vincent R. Lalli, Henry A. Malec, and Michael H. Packard, editors
7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES): National Aeronautics and Space Administration, John H. Glenn Research Center at Lewis Field, Cleveland, Ohio 44135-3191
8. PERFORMING ORGANIZATION REPORT NUMBER: E-11144
9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES): National Aeronautics and Space Administration, Washington, DC 20546-0001
10. SPONSORING/MONITORING AGENCY REPORT NUMBER: NASA TP-2000-207428
11. SUPPLEMENTARY NOTES: This manual was edited by Vincent R. Lalli, NASA Glenn Research Center; Henry A. Malec, Siemens Stromberg-Carlson, Albuquerque, New Mexico 87123-2840; and Michael H. Packard, Ratheon Engineers and Constructors, Cleveland, Ohio 44135. Responsible person, Vincent R. Lalli, organization code 0510, (216) 433-2354.
12a. DISTRIBUTION/AVAILABILITY STATEMENT: Unclassified - Unlimited; Subject Category: 15; Distribution: Standard. This publication is available from the NASA Center for AeroSpace Information, (301) 621-0390.
12b. DISTRIBUTION CODE: (blank)
13. ABSTRACT (Maximum 200 words): The theme of this manual is failure physics, the study of how products, hardware, software, and systems fail and what can be done about it. The intent is to impart useful information, to extend the limits of production capability, and to assist in achieving low-cost reliable products. In a broader sense the manual should do more. It should underscore the urgent need for mature attitudes toward reliability. Five of the chapters were originally presented as a classroom course to over 1000 Martin Marietta engineers and technicians. Another four chapters and three appendixes have been added. We begin with a view of reliability from the years 1940 to 2000. Chapter 2 starts the training material with a review of mathematics and a description of what elements contribute to product failures. The remaining chapters elucidate basic reliability theory and the disciplines that allow us to control and eliminate failures.
14. SUBJECT TERMS: Statistical concepts; Reliability; Maintainability; System safety; Quality assurance; Logistics; Human factors; Software performance; System effectiveness
15. NUMBER OF PAGES: 363
16. PRICE CODE: A16
17. SECURITY CLASSIFICATION OF REPORT: Unclassified
18. SECURITY CLASSIFICATION OF THIS PAGE: Unclassified
19. SECURITY CLASSIFICATION OF ABSTRACT: Unclassified
20. LIMITATION OF ABSTRACT: (blank)

NSN 7540-01-280-5500   Standard Form 298 (Rev. 2-89), prescribed by ANSI Std. Z39-18, 298-102
