NASA Reliability Practices


NASA Technical Memorandum 106313

Design for Reliability: NASA Reliability Preferred Practices for Design and Test

Vincent R. Lalli
Lewis Research Center
Cleveland, Ohio

Prepared for the Reliability and Maintainability Symposium cosponsored by ASQC, IIE, IEEE, SOLE, IES, AIAA, SSS, and SRE, Anaheim, California, January 24-27, 1994

National Aeronautics and Space Administration

(NASA-TM-106313) Design for Reliability: NASA Reliability Preferred Practices for Design and Test (NASA Lewis Research Center) 27 p. N95-13728 Unclas G3/18 0028221

PART I--NASA RELIABILITY PREFERRED PRACTICES FOR DESIGN AND TEST

Vincent R. Lalli
National Aeronautics and Space Administration
Lewis Research Center
Cleveland, Ohio 44135

SUMMARY AND PURPOSE

This tutorial summarizes reliability experience from both NASA and industry and reflects engineering practices that support current and future civil space programs. These practices were collected from various NASA field centers and were reviewed by a committee of senior technical representatives from the participating centers (members are listed at the end). The material for this tutorial was taken from the publication issued by the NASA Reliability and Maintainability Steering Committee (NASA Reliability Preferred Practices for Design and Test, NASA TM-4322, 1991). Reliability must be an integral part of the systems engineering process. Although both disciplines must be weighted equally with other technical and programmatic demands, the application of sound reliability principles will be the key to the effectiveness and affordability of America's space program. Our space programs have shown that reliability efforts must focus on the design characteristics that affect the frequency of failure. Herein, we emphasize that these identified design characteristics must be controlled by applying conservative engineering principles. This tutorial should be used to assess your current reliability techniques, thus promoting an active technical interchange between reliability and design engineering that focuses on the design margins and their potential impact on maintenance and logistics requirements. By applying these practices and guidelines, reliability organizations throughout NASA and the aerospace community will continue to contribute to a systems development process which assures that:

- Design weaknesses evident by test or analysis are independently identified and tracked.
- Operating environments are well defined and verified.
- Design criteria drive a conservative design approach.

Vincent R. Lalli has been at NASA Lewis Research Center since 1963, when he was hired as an aerospace technologist. Presently, as an adjunct to his work for the Office of Mission Safety and Assurance in design, analysis, and failure metrics, he is responsible for product assurance management and also teaches courses to assist with NASA's training needs. Mr. Lalli graduated from Case Western University with a B.S. and an M.S. in electrical engineering. In 1959, as a research assistant at Case, and later at Picatinny Arsenal, he helped to develop electronic fuses and special devices. From 1956 to 1963, he worked at TRW as a design, lead, and group engineer. Mr. Lalli is a registered engineer in Ohio and a member of Eta Kappa Nu, IEEE, IPC, ANSI, and ASME.

1.0 OVERVIEW

1.1 Applicability

The design practices that have contributed to NASA mission success represent the "best technical advice" on reliability design and test practices. These practices are not requirements but rather proven technical approaches that can enhance system reliability. This tutorial is divided into two technical sections.

Section II contains reliability practices, including design criteria, test procedures, and analytical techniques, that have been successfully applied in previous spaceflight programs. Section III contains reliability guidelines, including techniques currently applied to spaceflight projects, where insufficient information exists to certify that the technique will contribute to mission success.

1.2 Discussion

Experience from NASA's successful extended-duration space missions shows that four elements contribute to high reliability: (1) understanding stress factors imposed on flight hardware by the operating environment; (2) controlling the stress factors through the selection of conservative design criteria; (3) conducting an appropriate analysis to identify and track high stress points in the design (prior to qualification testing or flight use); and (4) selecting redundancy alternatives to provide the necessary function(s) should failure occur.
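Element (2), controlling stress through conservative design criteria, is commonly enforced with a derating check: each applied stress is compared with the part's rated capability, and the stress ratio must stay below a conservative limit. The sketch below illustrates the idea only; the part names, ratings, and derating limits are hypothetical, not values from the NASA practices.

```python
# Hypothetical derating check: flag parts whose applied stress exceeds
# a conservative fraction (the derating limit) of their rated capability.
parts = [
    # (name, applied stress, rated capability, derating limit)
    ("R12 resistor, W",       0.18, 0.25,  0.50),
    ("C3 capacitor, V",       28.0, 50.0,  0.60),
    ("Q7 transistor Tj, C",   95.0, 150.0, 0.70),
]

def derating_report(parts):
    """Return (name, stress ratio, within-limit?) for each part."""
    report = []
    for name, applied, rated, limit in parts:
        ratio = applied / rated  # fraction of rated capability in use
        report.append((name, round(ratio, 2), ratio <= limit))
    return report

for name, ratio, ok in derating_report(parts):
    print(f"{name}: stress ratio {ratio:.2f} -> {'OK' if ok else 'OVERSTRESSED'}")
```

A report like this is meant to feed element (3): the overstressed items become the high stress points to track prior to qualification testing.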

2.0 RELIABILITY PRACTICES

2.1 Introduction

The reliability design practices presented herein contributed to the success of previous spaceflight programs. The information is for use throughout NASA and the aerospace community to assist in the design and development of highly reliable equipment and assemblies. The practices include recommended analysis procedures, redundancy considerations, parts selection, environmental requirements considerations, and test requirements and procedures.

2.2 Format

The following format is used for reliability practices:

PRACTICE FORMAT DEFINITIONS

Practice: A brief statement of the practice.
Benefit: A concise statement of the technical improvement realized from implementing the practice.
Programs That Certified Usage: Identifiable programs or projects that have applied the practice.
Center to Contact for More Information: Source of additional information, usually the sponsoring NASA Center (see the steering committee list, section 4.0).
Implementation Method: A brief technical discussion, not intended to give the full details of the process but to provide adequate information for the design engineer to understand how the practice should be used.
Technical Rationale: A brief technical justification for use of the practice.
Impact of Nonpractice: A brief statement of what can be expected if the practice is avoided.
Related Practices: Identification of other topic areas in the manual that contain related information.
References: Publications that contain additional information about the practice.
SPONSOR OF PRACTICE

2.3 Document Referencing

The following is an example of the document numbering system applicable to the practices and guidelines (Part Junction Temperature, practice number PD-ED-1204):

P D ED 12 04
1. Practice
2. Design factors
3. Engineering design
4. Series 12
5. Practice 04

Key to nomenclature.--The following is an explanation of the numbering system:

Position 1: G - Guideline; P - Practice
Position 2: D - Design factors; T - Test elements
Position 3: EC - Environmental considerations; ED - Engineering design; AP - Analytical procedures; TE - Test considerations and procedures
Position 4: xx - Series number
Position 5: xx - Practice number within series

2.4 Practices as of January 1993

PD-EC-1101 Environmental Factors
PD-EC-1102 **Meteoroids/Space Debris
PD-ED-1201 EEE Parts Derating
PD-ED-1202 High-Voltage Power Supply Design and Manufacturing Practices
PD-ED-1203 Class-S Parts in High-Reliability Applications
PD-ED-1204 Part Junction Temperature
PD-ED-1205 Welding Practices for 2219 Aluminum and Inconel 718
PD-ED-1206 Power Line Filters
PD-ED-1207 Magnetic Design Control for Science Instruments
PD-ED-1208 Static Cryogenic Seals for Launch Vehicle Applications
PD-ED-1209 **Ammonia-Charged Aluminum Heat Pipes with Extruded Wicks
PD-ED-1210 *Assessment and Control of Electrical Charges
PD-ED-1211 *Combination Methods for Deriving Structural Design Loads Considering Vibro-Acoustic, etc., Responses
PD-ED-1212 Design and Analysis of Electronic Circuits for Worst-Case Environments and Part Variations
PD-ED-1213 **Electrical Shielding of Power, Signal, and Control Cables
PD-ED-1214 **Electrical Grounding Practices for Aerospace Hardware
PD-ED-1215.1 **Preliminary Design Review
PD-ED-1216 **Active Redundancy
PD-ED-1217 **Structural Laminate Composites for Space Applications
PD-ED-1218 **Application of Ablative Composites to Nozzles for Reusable Solid Rocket Motors
PD-ED-1219 **Vehicle Integration/Tolerance Buildup Practices
PD-ED-1221 **Battery Selection Practice for Aerospace Power Systems
PD-ED-1222 **Magnetic Field Restraints for Spacecraft Systems and Subsystems
PD-AP-1301 Surface Charging and Electrostatic Discharge Analysis
PD-AP-1302 *Independent Review of Reliability Analyses
PD-AP-1303 *Part Electrical Stress Analysis
PD-AP-1304 *Problem/Failure Report Independent Review/Approval
PD-AP-1305 *Risk Rating of Problem/Failure Reports
PD-AP-1306 *Thermal Analysis of Electronic Assemblies to the Piece Part Level
PD-AP-1307 **Failure Modes, Effects, and Criticality Analysis (FMECA)
PT-TE-1401 EEE Parts Screening
PT-TE-1402 Thermal Cycling
PT-TE-1403 Thermographic Mapping of PC Boards
PT-TE-1404 Thermal Test Levels
PT-TE-1405 Powered-On Vibration
PT-TE-1406 Sinusoidal Vibration
PT-TE-1407 Assembly Acoustic Tests
PT-TE-1408 Pyrotechnic Shock
PT-TE-1409 Thermal Vacuum Versus Thermal Atmospheric Test of Electronic Assemblies
PT-TE-1410 **Selection of Spacecraft Materials and Supporting Vacuum Outgassing Data
PT-TE-1411 **Heat Sinks for Parts Operated in Vacuum
PT-TE-1412 **Environmental Test Sequencing
PT-TE-1413 **Random Vibration Testing
PT-TE-1414 **Electrostatic Discharge (ESD) Test Practices

*New practices for January 1992.
**New practices for January 1993.

2.5 Typical Reliability Practice

A typical reliability practice is illustrated in this section. Environmental factors are very important in the system design, so equipment operating conditions must be identified. Systems designed to have adequate environmental strength perform well in the field and satisfy our customers. Failure to perform a detailed life-cycle environment profile can lead to overlooking environmental factors whose effect is critical to equipment reliability. Not including these factors in the environmental design criteria and test program can lead to environment-induced failures during spaceflight operations.

Environmental Factors

Practice (PD-EC-1101): Identify equipment operating conditions.

Benefit: Adequate environmental strength is incorporated into the design.

Programs That Certified Usage: SERT I and II, CTS, ACTS, space experiments, launch vehicles, space power systems, and Space Station Freedom.

Center to Contact for More Information: NASA Lewis Research Center.

Implementation Method: Develop a life-cycle environment profile. Describe anticipated events from final factory acceptance through removal from inventory. Identify significant natural and induced environments for each event. Describe environmental and stress conditions (narrative and statistical).
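The document numbering rules above are mechanical enough to capture in a few lines. The following sketch simply maps each field of an identifier such as PD-ED-1204 to its meaning; it is an illustration, not part of the NASA practice set.

```python
# Decode a NASA practice/guideline identifier, e.g. "PD-ED-1204":
# position 1: G (guideline) or P (practice); position 2: D (design factors)
# or T (test elements); positions 3-4: discipline code; then series and number.
TYPE = {"G": "Guideline", "P": "Practice"}
FACTOR = {"D": "Design factors", "T": "Test elements"}
DISCIPLINE = {
    "EC": "Environmental considerations",
    "ED": "Engineering design",
    "AP": "Analytical procedures",
    "TE": "Test considerations and procedures",
}

def decode(identifier):
    head, discipline, number = identifier.split("-")
    return {
        "type": TYPE[head[0]],
        "factor": FACTOR[head[1]],
        "discipline": DISCIPLINE[discipline],
        "series": number[:2],
        "practice": number[2:],
    }

print(decode("PD-ED-1204"))
```

For "PD-ED-1204" this yields Practice, Design factors, Engineering design, series 12, practice 04, matching the worked example in section 2.3.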

Technical Rationale: The life-cycle environments, their principal effects, and the typical failures they induce are summarized below.

High temperature--Thermal aging (oxidation, structural change, chemical reaction): insulation failure; alteration of electrical properties. Softening, melting, and sublimation: structural failure. Viscosity reduction and evaporation: loss of lubrication. Physical expansion: structural failure; increased mechanical stress; increased wear on moving parts.

Low temperature--Increased viscosity and solidification: loss of lubrication. Ice formation: alteration of electrical properties. Embrittlement: loss of mechanical strength; cracking; fracture. Physical contraction: structural failure; increased wear on moving parts.

High relative humidity--Moisture absorption: swelling; rupture of container; physical breakdown; loss of electrical strength. Chemical reaction (corrosion, electrolysis): loss of mechanical strength; interference with function; loss of electrical properties; increased conductivity of insulators.

Low relative humidity--Desiccation, embrittlement, and granulation: loss of mechanical strength; structural collapse; alteration of electrical properties; "dusting."

High pressure--Compression: structural collapse; penetration of sealing; interference with function.

Low pressure--Expansion: fracture of container; explosive expansion. Outgassing: alteration of electrical properties; loss of mechanical strength. Reduced dielectric strength of air: insulation breakdown and arc-over; corona and ozone formation.

Temperature shock--Mechanical stress: structural collapse or weakening; seal damage.

High-speed particles (nuclear irradiation)--Heating: thermal aging; oxidation. Transmutation and ionization: alteration of chemical, physical, and electrical properties; production of gases and secondary particles.

Zero gravity--Mechanical stress: interruption of gravity-dependent functions. Absence of convection cooling: aggravation of high-temperature effects.

Ozone--Chemical reactions (crazing, cracking, embrittlement, granulation): rapid oxidation; alteration of electrical and mechanical properties; interference with function. Reduced dielectric strength of air: insulation breakdown and arc-over.

Explosive decompression--Severe mechanical stress: rupture and cracking; structural collapse.

Dissociated gases--Chemical reactions (contamination) and reduced dielectric strength: alteration of physical and electrical properties; insulation breakdown and arc-over.

Solar radiation--Actinic and physicochemical reactions (embrittlement, ozone formation): surface deterioration; alteration of electrical properties; discoloration of materials; ozone formation.

Sand and dust--Abrasion and clogging: increased wear; interference with function; alteration of electrical properties.

Salt spray--Chemical reactions (corrosion, electrolysis): increased wear; loss of mechanical strength; alteration of electrical properties; surface deterioration; structural weakening; increased conductivity; interference with function.

Wind--Force application: structural collapse; interference with function; loss of mechanical strength. Deposition of materials: mechanical interference and clogging; accelerated abrasion. Heat loss (low velocity): acceleration of low-temperature effects. Heat gain (high velocity): acceleration of high-temperature effects.

Rain--Physical stress: structural collapse. Water absorption and immersion: increase in weight; electrical failure; structural weakening. Erosion: removal of protective coatings; structural weakening; surface deterioration. Corrosion: enhancement of chemical reactions.

Acceleration--Mechanical stress: structural collapse.

Vibration--Mechanical stress and fatigue: loss of mechanical strength; interference with function; increased wear; structural collapse.

Magnetic fields--Induced magnetization: interference with function; alteration of electrical properties; induced heating.

Impact of Nonpractice: Failure to perform a detailed life-cycle environment profile can lead to overlooking environmental factors whose effect is critical to equipment reliability. If these factors are not included in the environmental design criteria and test program, environment-induced failures may occur during spaceflight operations.

References:

Government
1. Reliability Prediction of Electronic Equipment. MIL-HDBK-217E, Notice 1, January 1990.
2. Reliability/Design Thermal Applications. MIL-HDBK-251, January 1978.
3. Electronic Reliability Design Handbook. MIL-HDBK-338-1A, October 1988.
4. Environmental Test Methods and Engineering Guidelines. MIL-STD-810E, July 1989.

Industry
5. Space Station Freedom Electric Power System Reliability and Maintainability Guidelines Document. EID-00866, Rocketdyne Division, Rockwell International, 1990.
6. Society of Automotive Engineers: Reliability, Maintainability, and Supportability Guidebook. SAE G-11, 1990.

3.0 RELIABILITY DESIGN GUIDELINES

3.1 Introduction

The reliability design guidelines for consideration by the aerospace community are presented herein. These guidelines contain information that represents a technically credible process applied to ongoing NASA programs and projects. Unlike a reliability design practice, a guideline lacks specific operational experience or data to validate its contribution to mission success. However, a guideline does contain information that represents current "best thinking" on a particular subject.

3.2 Format

The following format is used for reliability guidelines:

GUIDELINE FORMAT DEFINITIONS

Guideline: A brief statement of the guideline.
Benefit: A concise statement of the technical improvement realized from implementing the guideline.
Center to Contact for More Information: Source of additional information, usually the sponsoring NASA Center (see the steering committee list, section 4.0).
Implementation Method: A brief technical discussion, not intended to give the full details of the process but to provide adequate information for the design engineer to understand how the guideline should be used.
Technical Rationale: A brief technical justification for use of the guideline.
Impact of Nonpractice: A brief statement of what can be expected if the guideline is avoided.
Related Guidelines: Identification of other topic areas in the manual that contain related information.
References: Publications that contain additional information about the guideline.
SPONSOR OF GUIDELINE

3.3 Guidelines as of January 1993

GD-ED-2201 **Fastener Standardization and Selection Considerations
GD-ED-2202 **Design Considerations for Selection of Thick-Film Microelectronic Circuits
GD-ED-2203 **Design Checklists for Microcircuits
GD-AP-2301 **Earth Orbit Environmental Heating
GT-TE-2401 **EMC Guideline for Payloads, Subsystems, and Components

**New guidelines as of January 1993.

3.4 Typical Reliability Guideline

A typical reliability guideline is illustrated in this section. Environmental heating for Earth-orbiting systems is an important design consideration. Designers should use currently accepted values for the solar constant, albedo factor, and Earth radiation when calculating the heat balance of Earth orbiters. These calculations can accurately predict the thermal environment of orbiting devices. Failure to use these constants can result in an incomplete thermal analysis and grossly underestimated temperature variations of the components.

Analysis of Earth Orbit Environmental Heating

Guideline (GD-AP-2301): Use currently accepted values for the solar constant, albedo factor, and Earth radiation when calculating the heat balance of Earth orbiters. This practice provides heating rates for the blackbody case without considering spectral effects or collimation.

Benefit: The thermal environment of orbiting devices is accurately predicted.

Center to Contact for More Information: Goddard Space Flight Center.

Implementation Method: Use the following currently accepted values:

Solar constant, W/m^2: nominal, 1367.5; winter, 1422.0; summer, 1318.0
Albedo factor: nominal, 0.30; hot, 0.35; cold, 0.25
Earth-emitted energy: nominal, 241 W/m^2 (equivalent Earth temperature, 255 K)

Nominal, 1367.5 W/m^2: albedo 0.25, 0.30, 0.35 -> Earth-emitted energy 256, 239, 222 W/m^2; equivalent Earth temperature 259, 255, 250 K
Winter solstice, 1422.0 W/m^2: albedo 0.25, 0.30, 0.35 -> Earth-emitted energy 267, 249, 231 W/m^2; equivalent Earth temperature 262, 257, 253 K
Summer solstice, 1318.0 W/m^2: albedo 0.25, 0.30, 0.35 -> Earth-emitted energy 247, 231, 214 W/m^2; equivalent Earth temperature 257, 253, 248 K

Technical Rationale: Modification of the energy incident on a spacecraft due to Earth-Sun distance variation and the accuracy of the solar constant are of sufficient magnitude to be important parameters in performing a thermal analysis.

Impact of Nonpractice: Failure to use these constants results in an incomplete thermal analysis and grossly underestimated temperature variations of components.

References:
1. Leffler, J.M.: Spacecraft External Heating Variations in Orbit. AIAA Paper 87-1596, June 1987.
2. Reliability/Design, Thermal Applications. MIL-HDBK-251, 1978.
3. Incropera, F.P.; and DeWitt, D.P.: Fundamentals of Heat and Mass Transfer. Second ed., John Wiley & Sons, 1985.

4.0 NASA RELIABILITY AND MAINTAINABILITY STEERING COMMITTEE

The following members of the NASA Reliability and Maintainability Steering Committee may be contacted for more information about the practices and guidelines:

Dan Lee, Ames Research Center, MS 218-7 DQR, Moffett Field, California 94035
Jack Remez, Goddard Space Flight Center, Bldg. 6 Rm S233, Code 302, Greenbelt, Maryland 20771
Thomas Gindoff, Jet Propulsion Laboratory, California Institute of Technology, MS 301-456 SEC 521, 4800 Oak Grove Drive, Pasadena, California 91109
Nancy Steisslinger, Lyndon B. Johnson Space Center, Bldg. 45 Rm 613, Code NB23, Houston, Texas 77058
Leon Migdalski, John F. Kennedy Space Center, RT-ENG-2 KSC HQS 3548, Kennedy Space Center, Florida 32899
Salvatore Bavuso, Langley Research Center, MS 478, 5 Freeman Road, Hampton, Virginia 23665-5225
Vincent Lalli, Lewis Research Center, MS 501-4, Code 0152, 21000 Brookpark Road, Cleveland, Ohio 44135
Donald Bush, George C. Marshall Space Flight Center, CT11 Bldg. 4103, Marshall Space Flight Center, Alabama 35812
Ronald Lisk, NASA Headquarters, Code QS, Washington, DC 20546
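The relation between Earth-emitted energy and equivalent Earth temperature used in the guideline is the Stefan-Boltzmann law, E = sigma T^4. The short sketch below checks the nominal values; it is an illustration of the arithmetic, not part of the guideline itself.

```python
# Stefan-Boltzmann check: equivalent blackbody temperature for a given
# Earth-emitted flux, and albedo-reflected flux for a given solar constant.
SIGMA = 5.67e-8  # Stefan-Boltzmann constant, W/m^2/K^4

def equivalent_temperature(flux):
    """Blackbody temperature (K) that radiates the given flux (W/m^2)."""
    return (flux / SIGMA) ** 0.25

def albedo_flux(solar_constant, albedo):
    """Solar energy reflected by Earth (W/m^2) for a given albedo factor."""
    return solar_constant * albedo

print(round(equivalent_temperature(241)))  # nominal 241 W/m^2 -> 255 K
print(albedo_flux(1367.5, 0.30))           # nominal albedo heating, W/m^2
```

Running the first line reproduces the guideline's nominal pairing of 241 W/m^2 with a 255 K equivalent Earth temperature.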

PART II--RELIABILITY TRAINING

1.0 INTRODUCTION TO RELIABILITY

"Reliability" applies to systems consisting of people, machines, and written information. A system is reliable if those who need it can depend on it over a reasonable period of time and if it satisfies their needs. Of the people involved in a system, some rely on it, some keep it reliable, and some do both. Several machines comprise a system: mechanical, electrical, and electronic. The written information defines people's roles in the system: sales literature; system specifications; detailed manufacturing drawings; software, programs, and procedures; operating and repair instructions; and inventory control. Reliability engineering is the discipline that defines specific tasks done while a system is being planned, designed, manufactured, used, and improved. Outside of the usual engineering and management tasks, these tasks ensure that the people in the system attend to all those details that keep it operating reliably. Reliability engineering is necessary because, as users of rapidly changing technology and as members of large complex systems, we cannot ensure that essential details affecting reliability are not overlooked.

1.1 Period of Awakening: Failure Analysis

The theme of this tutorial is failure physics: the study of how products, hardware, software, and systems fail and what can be done about it. Training in reliability must begin with a review of mathematics and a description of the elements that contribute to product failures. Consider the following example of a failure analysis. A semiconductor diode developed a short. Analysis showed that a surge voltage was occurring occasionally, exceeding the breakdown voltage of the diode and burning it up. The problem: stress exceeding strength, a type I failure. A transistor suddenly stopped functioning. Analysis showed that aluminum metallization opened at an oxide step on the chip, the opening accelerated by the neck-down of the metallization at the step. In classical terminology, this failure, caused by a manufacturing flaw, is a random failure (type II). These two failure types are shown in figure 1. Formerly, most of the design control efforts shown in the figure were aimed at the type I failure. Although such design controls are important, most equipment failures in the field bear no relation to the results of reasonable stress analyses during design. These failures are type II (i.e., those caused by built-in flaws).

1.2 New Direction

The new direction in reliability engineering will be toward a more realistic recognition of the causes and effects of failures. The new boundaries proposed for reliability engineering are to exclude management, applied mathematics, and double checking. These functions are important and may still be performed by reliability engineers. However, reliability engineering is to be a synthesizing function devoted to flaw control. The functions presented in figure 2 relate to the following tasks:

(1) Identify flaws and stresses and rank them for priority actions.
(2) Engage the material technologists to determine the flaw failure mechanisms.
(3) Develop flaw control techniques and send information back to the engineers responsible for design, manufacture, and support planning.

Figure 1.--Two types of failure. (a) Type I failures (a design margin problem on stress/strength, fatigue, and wear): electromigration, cathode depletion, bearing wear. (b) Type II failures (a flaw problem): electromigration around a flaw, misaligned gear wear, oxide pinhole breakdown.

Figure 2.--Role of reliability engineering for the 1990's. (Block diagram: the design, manufacturing, and support-planning functions exchange environmental stress information, manufacturing flaw information, and flaw control information with reliability engineering; material technology supplies the flaw (failure) mechanisms; the system/equipment user returns information on operational conditions and receives completed systems/equipment plus a maintenance plan and test equipment.)
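Task (1) of the new direction, identifying flaws and stresses and ranking them for priority action, is often reduced to a simple severity-times-likelihood ranking. The sketch below illustrates that idea only; the flaw list and scores are hypothetical, not from the tutorial.

```python
# Hypothetical risk ranking: score each flaw/stress pair by severity and
# likelihood, then work the list from highest risk down.
flaws = [
    # (flaw, failure mechanism, severity 1-10, likelihood 1-10)
    ("oxide step neck-down", "metallization open", 9, 4),
    ("inadequate seal", "moisture ingress", 6, 7),
    ("surge voltage", "diode junction burnout", 8, 3),
]

# Sort by the severity-times-likelihood product, highest risk first.
ranked = sorted(flaws, key=lambda f: f[2] * f[3], reverse=True)
for flaw, mechanism, sev, like in ranked:
    print(f"risk {sev * like:3d}: {flaw} -> {mechanism}")
```

The ranked list then drives tasks (2) and (3): the top entries are handed to the material technologists first, and flaw control information flows back to design and manufacturing.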

The types of output expected from reliability engineering are different from those provided by traditional engineering: stress-screening regimens; failure characteristics of parts and systems; effects of environmental stresses on flaws and failures; relationship of failure mechanisms to flaws; relationship of manufacturing yield to product reliability; and flaw detection methods such as automated IC chip inspection and vibration signature monitoring.

Because flaws in an item depend on the design, manufacturing processes, quality control, parts, and materials, the distribution of flaws does not stay constant. Reliability engineering must act in a timely manner to provide flaw information and control action. It is important that customers recognize this fact and allow the proper control functions to be tailored to needs instead of demanding a one-time negotiation of what should be done for the total contract period.

1.3 Training as of June 1992

Although this tutorial considers only specific areas to exemplify the contents of a reliability program, the following contents list provides a complete training course. The list is from NASA Reference Publication 1253, "Reliability Training," available upon request from the National Technical Information Service, Springfield, Virginia; (703) 487-4650. A course evaluation form is included in the appendix.

Introduction to Reliability: Era of Mechanical Designs; Era of Electron Tubes; Era of Semiconductors; Period of Awakening; New Direction; Concluding Remarks; Reliability Training

Reliability Mathematics and Failure Physics: Mathematics Review (Notation; Manipulation of Exponential Functions; Rounding Data; Integration Formulas; Differential Formulas; Partial Derivatives; Expansion of (a + b)^n); Failure Physics; Probability Theory Fundamentals; Probability Theorems; Concept of Reliability (Reliability as Probability of Success; Reliability as Absence of Failure; Product Application K-Factors); Concluding Remarks; Reliability Training

Exponential Distribution and Reliability Models: Exponential Distribution; Failure Rate Definition; Failure Rate Dimensions; "Bathtub" Curve; Mean Time Between Failures; Calculations of Ps for Single Devices; Reliability Models (Calculation of Reliability for Series-Connected Devices; Calculation of Reliability for Devices Connected in Parallel (Redundancy); Calculation of Reliability for a Complete System); Concluding Remarks; Reliability Training

Using Failure Rate Data: Operating Failure Rates; Storage Failure Rates; Variables Affecting Failure Rates (Processing; Environments; Life Test; Part Derating); Summary of Variables Affecting Part Failure Rate Data; Improving System Reliability; Concluding Remarks; Reliability Training

Software Reliability: Categories of Software Defects; Severity of Software Defects; Software Failure Rates; Hardware Bugs Compared With Software Bugs; Manifestations of Software Failures; Software Models; Other Trends and Conclusions

Reliability Predictions: Predicting Reliability; Use of Failure Rates in Tradeoffs; Nonoperating Failures; Applications; Equipment Standardization; Allocation; Reliability Predictions as a Means of Reducing Failure Rates; Effects of Tolerance on a Product; Notes on Tolerance Accumulation: A How-To-Do-It Guide; Estimating Tolerance Effects; Examples; Concluding Remarks; Reliability Training

Software Quality Assurance: Concept of Software Quality; Software Quality Management; Software Quality Organization; Software Quality Characteristics; Software Quality Metrics; Software Quality Standards; Use of Metrics to Control Software Failure Rates; Overall Software Reliability; Concluding Remarks; Reliability Training

Reliability Management: Importance of Learning From Each Failure; Failure Reporting, Analysis, and Corrective Action; Roots of Reliability Management; Planning a Reliability Program (Establishment of Goals and Objectives; Management Philosophy; Performance Requirements; Specification Targets; Symbolic Representation; Logistics Support and Repair Activities; Field Studies; Human Reliability Analysis; Human Errors Example); General Management Considerations; Concluding Remarks; Reliability Training

Case Study--Achieving Launch Vehicle Reliability: Challenge; Description of the Launch Vehicle; Design Approach; Subsystem Reliability; Launch and Flight Reliability; Field Failure Problem; Mechanical Tests; Runup and Rundown Tests; Summary of Case Study; Concluding Remarks; Reliability Training

Probability Density Functions: Probability Density Functions; Application of Density Functions; Cumulative Probability Distribution; Normal Distribution; Normal Density Function; Properties of the Normal Distribution; Symmetrical Two-Limit Problems; One-Limit Problems; Nonsymmetrical Two-Limit Problems; Application of the Normal Distribution to Test Analyses; Presentation of Reliability Predictions to Engineering, Manufacturing, and User or Customer; Reliability Training

Reliability Testing: Demonstrating Reliability; Pc Illustrated; Ps Illustrated; Pw Illustrated; K-Factors Illustrated; Test Objectives and Methods (Test Objectives; Attribute Test Methods; Test-to-Failure Methods; Life Test Methods); Concluding Remarks; Reliability Training

Software Reliability Models: Time Domain Models; Data Domain Models; Axiomatic Models

Appendixes: A--Reliability Information on Product Assurance; B--Project Manager's Guide; C--Reliability Bibliography; Answers

2.0 RELIABILITY MATHEMATICS AND FAILURE PHYSICS

2.1 Failure Physics

When most engineers think of reliability, they think of parts, since parts are the building blocks of products. All agree that a reliable product must have reliable parts. But what makes a part reliable? When asked, nearly all engineers would say a reliable part is one purchased according to a certain source control document and bought from an approved vendor. Unfortunately, these two qualifications are not always guarantees of reliability. The following case illustrates this problem.

A clock purchased according to PD 4600008 was procured from an approved vendor foruse in the ground support equipment of a missile system and was subjected to qualification testsas part of the reliability program. These testsconsisted of high- and low-temperature,mechanicalshock,temperature shock,vibration, and humidity.The clocksfrom the then sole-source vendor failed two of the tests: low-temperatureand humidity. A failure analysis revealedthatlubricants in the clock's mechanism froze and that the seals were not adequate to protect the mechanism from humidity. A second approved vendor was selected. His clocksfailed the hlgh-temperaturetest. In the processthe dialhands and numerals turned black, making readingsimpossiblefrom a distanceof 2 feet.A third approved vendor's clocks passed all of the tests except mechanical shock,which cracked two of the cases. Ironically, the fourth approved vendor's clocks,though lessexpensive,passed allthe tests. The point of thisillustration isthat fourclocks, each designed to the same specification and procured from a qualifiedvendor, all performed differently in the same environments. Why did thishappen? The specification did not include the gear lubricantor the type of coating on the hands and numerals or the type of casematerial. Many similarexamples could be cited, ranging from requirements for glue and paint to complete assemblies and systems, and the key to answering these problems can best be stated as follows:To kuow how reliable a product is or how to design a reliable product, you must know how many ways its parts can fail and the types and magnitude of stresses that cause such failures. 
Think about this: if you knew every conceivable way a missile could fail and if you knew the type and level of stress required to produce each failure, you could build a missile that would never fail because you could eliminate

(1) As many ways of failure as possible
(2) As many stresses as possible
(3) The remaining potential failures by controlling the level of the remaining stresses

Sound simple? Well, it would be except that despite the thousands of failures observed in industry each day, we still know very little about why things fail and even less about how to control these failures. However, through systematic data accumulation and study, we learn more each day. As stated at the outset, this tutorial introduces some basic concepts of failure physics: failure modes (how failures are revealed); failure mechanisms (what produces the failure mode); and failure stresses (what activates the failure mechanisms). The theory and the practical tools available for controlling failures are presented also.

2.2 Reliability as Absence of Failure

Although the classical definition of reliability is adequate for most purposes, we are going to modify it somewhat and examine reliability from a slightly different viewpoint. Consider this definition: Reliability is the probability that the critical failure modes of a device will not occur during a specified period of time and under specified conditions when used in the manner and for the purpose intended. Essentially, this modification replaces the words "a device will operate successfully" with the words "critical failure modes . . . will not occur." This means that if all the possible failure modes of a device (ways the device can fail) and their probabilities of occurrence are known, the probability of success (or the reliability of a device) can be stated. It can be stated in terms of the probability that those failure modes critical to the performance of the device will not occur. Just as we needed a clear definition of success when using the classical definition, we must also have a clear definition of failure when using the modified definition. For example, assume that a resistor has only two failure modes: it can open or it can short. If the probability that the resistor will not short is 0.99 and the probability that it will not open is 0.9, the reliability of the resistor (or the probability that the resistor will not short or open) is given by

Rresistor = (Probability of no opens) x (Probability of no shorts) = 0.9 x 0.99 = 0.89

Note that we have multiplied the probabilities. Probability theorem 2 therefore requires that the open-failure-mode probability and the short-failure-mode probability be independent of each other. This condition is satisfied because an open-failure mode cannot occur simultaneously with a short-failure mode.
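The arithmetic of this modified definition can be sketched in a few lines of Python. This is an illustrative sketch, not part of the tutorial; only the 0.9 and 0.99 values come from the resistor example above.

```python
# Reliability as the product of per-failure-mode survival probabilities.
# Multiplication is valid only when the failure modes are independent
# (probability theorem 2, as invoked in the text).
def reliability(no_fail_probs):
    """Return the probability that none of the failure modes occur."""
    r = 1.0
    for p in no_fail_probs:
        r *= p
    return r

p_no_open = 0.9    # probability the resistor does not open
p_no_short = 0.99  # probability the resistor does not short

print(round(reliability([p_no_open, p_no_short]), 3))  # 0.891, quoted as 0.89
```

The same function applies unchanged to a device with any number of independent critical failure modes.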

2.3 Product Application

This section relates reliability (or the probability of success) to product failures.

2.3.1 Product failure modes.--In general, critical equipment failures may be classified as catastrophic part failures, tolerance failures, and wearout failures. The expression for reliability then becomes

R = Pc Pt Pw

where

Pc  probability that catastrophic part failures will not occur
Pt  probability that tolerance failures will not occur
Pw  probability that wearout failures will not occur

As in the resistor example, these probabilities are multiplied together because they are considered to be independent of each other. However, this may not always be true because an out-of-tolerance failure, for example, may evolve into or result from a catastrophic part failure. Nevertheless, in this tutorial they are considered independent, and exceptions are pointed out as required.

2.3.2 Inherent product reliability.--Consider the inherent reliability Ri of a product. Think of the expression Ri = Pc Pt Pw as representing the potential reliability of a product as described by its documentation, or let it represent the reliability inherent in the design drawings instead of the reliability of the manufactured hardware. This inherent reliability is predicated on the decisions and actions of many people. If they change, the inherent reliability could change.

Why do we consider inherent reliability? Because the facts of failure are these: When a design comes off the drawing board, the parts and materials have been selected; the tolerance, error, stress, and other performance analyses have been performed; the type of packaging is firm; the manufacturing processes and fabrication techniques have been decided; and usually the test methods and the quality acceptance criteria have been selected. At this point the design documentation represents some potential reliability that can never be increased except by a design change or good maintenance. However, the possibility exists that the actual reliability observed when the documentation is transformed into hardware will be much less than the potential reliability of the design.

To understand why this is true, consider the hardware to be a black box with a hole in both the top and bottom. Inside are potential failures that limit the inherent reliability of the design. When the hardware is operated, these potential failures fall out the bottom (i.e., operating failures are observed). The rate at which the failures fall out depends on how the box or hardware is operated. Unfortunately, we never have just the inherent failures to worry about because other types of failures are being added to the box through the hole in the top. These other failures are generated by the manufacturing, quality, and logistics functions, by the user or customer, and even by the reliability organization itself. We discuss these added failures and their contributors in the following paragraphs, but it is important to understand that, because of the added failures, the observed failures will be greater than the inherent failures of the design.

2.4 K-Factors

The other contributors to product failure just mentioned are called K-factors; they have a value between 0 and 1 and modify the inherent reliability:

Rproduct = Ri(Kq Km Kr Kl Ku)

K-factors denote probabilities that inherent reliability will not be degraded by

Km  manufacturing and fabrication and assembly techniques
Kq  quality test methods and acceptance criteria
Kr  reliability engineering activities
Kl  logistics activities
Ku  the user or customer

Any K-factor can cause reliability to go to zero. If each K-factor equals 1 (the goal), Rproduct = Ri.
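The K-factor model above can be sketched directly. Only the model form Rproduct = Ri(Kq Km Kr Kl Ku) comes from the text; the numeric K values below are illustrative assumptions.

```python
# Delivered reliability as inherent reliability degraded by K-factors,
# each between 0 and 1 (K = 1 is the goal for each contributor).
def product_reliability(r_inherent, k_factors):
    """Rproduct = Ri * Kq * Km * Kr * Kl * Ku."""
    r = r_inherent
    for k in k_factors.values():
        r *= k
    return r

r_i = 0.95  # Ri = Pc * Pt * Pw, the reliability "in the drawings" (illustrative)
ks = {"Km": 0.98, "Kq": 0.99, "Kr": 1.00, "Kl": 0.97, "Ku": 0.99}

print(round(product_reliability(r_i, ks), 4))
```

Note that a single K-factor of zero drives the product reliability to zero regardless of how good the design is, which is the point the text makes.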

2.5 Variables Affecting Failure Rates

Part failure rates are affected by (1) acceptance criteria, (2) all environments, (3) application, and (4) storage. To reduce the occurrence of part failures, we observe failure modes, learn what caused the failure (the failure stress), determine why it failed (the failure mechanism), and then take action to eliminate the failure. For example, one of the failure modes observed during a storage test was an "open" in a wet tantalum capacitor. The failure mechanism was end seal deterioration, allowing the electrolyte to leak. One obvious way to avoid this failure mode in a system that must be stored for long periods without maintenance is not to use wet tantalum capacitors. If this is impossible, the best solution would be to redesign the end seals. Further testing would be required to isolate the exact failure stress that produces the failure mechanism. Once isolated, the failure mechanism can often be eliminated through redesign or additional process controls.
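The mode/mechanism/stress chain described above can be captured as a simple record, shown here for the wet tantalum capacitor case. The record layout is illustrative, not a NASA reporting form.

```python
# A failure-physics record: mode (how the failure is revealed),
# mechanism (what produced it), stress (what activated the mechanism),
# plus the corrective action chosen.
from dataclasses import dataclass

@dataclass
class FailureRecord:
    part: str
    mode: str               # what was observed
    mechanism: str          # why it failed
    stress: str             # what activated the mechanism
    corrective_action: str  # redesign, process control, or part removal

cap = FailureRecord(
    part="wet tantalum capacitor",
    mode="open",
    mechanism="end seal deterioration allowing electrolyte leakage",
    stress="long-term storage (exact stress to be isolated by further test)",
    corrective_action="redesign end seals, or avoid the part type",
)
print(cap.mode)  # open
```

Keeping failures in this form is what lets the later sections talk about failure reporting, analysis, and corrective action as a closed loop.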

2.6 Use of Failure Rates in Tradeoffs

Failure rate tables and derating curves are useful to a designer because they enable him to make reliability tradeoffs and provide a more practical method of establishing derating requirements. For example, suppose we have two design concepts for performing some function. If the failure rate of concept A is 10 times higher than that of concept B, one can expect concept B to fail one-tenth as often as concept A. If it is desirable to use concept A for other reasons, such as cost, size, performance, or weight, the derating failure rate curves can be used to improve concept A's failure rate (e.g., select components with a lower failure rate, derate the components more, or both). An even better approach is to find ways to reduce the complexity and thus the failure rate of concept A.

Figure 3 illustrates the use of failure rate data in tradeoffs. This figure gives a failure-rate-versus-temperature curve for the electronics of a complex (over 35 000 parts) piece of ground support equipment. The curve was developed as follows: (1) A failure rate prediction was performed by using component failure rates and their application factors KA for an operating temperature of 25 °C. The resulting failure rate was chosen as a reference point. (2) Predictions were then made by using the same method for temperatures of 50, 75, and 100 °C. The ratios of these predictions to the reference point were plotted versus component operating temperature, giving the resulting curve for the equipment. This curve was then used to provide tradeoff criteria for using air-conditioning versus blowers to cool the equipment.

To illustrate, suppose the maximum operating temperatures expected are 50 °C with air-conditioning and 75 °C with blowers. Suppose further that the required failure rate for the equipment, if the equipment is to meet its reliability goal, is one failure per 50 hr. A failure rate prediction at 25 °C might indicate a failure rate of 1 per 100 hr. From the figure, note that the maximum allowable operating temperature is therefore 60 °C, since the maximum allowable failure rate ratio is 2; that is, at 60 °C the equipment failure rate will be (1/100) x 2 = 1/50, which is the required failure rate. If blowers are used for cooling, the equipment must operate at temperatures as high as 75 °C; if air-conditioning is used, the temperature need not exceed 50 °C. Therefore, air-conditioning must be used if we are to meet the reliability requirement.

Other factors must be examined before we make a final decision. Whatever type of cooling equipment is selected, the total system reliability now becomes

RT = Re Rc

Therefore, the effect on the system of the cooling equipment's reliability must be calculated. A more important consideration is the effect on system reliability should the cooling equipment fail. Because temperature control appears to be critical, loss of it may have serious system consequences. Therefore, it is too soon to rule out blowers entirely. A failure mode, effects, and criticality analysis (FMECA) must be made on both cooling methods to examine all possible failure modes and their effects on the system. Only then will we have sufficient information to make a sound decision.
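The figure 3 tradeoff can be sketched numerically: given a reference failure rate prediction and a ratio-versus-temperature curve, find the maximum allowable operating temperature. The curve points below are assumptions chosen only so that the ratio equals 2 at 60 °C, as in the worked example; the real curve comes from predictions at 25, 50, 75, and 100 °C.

```python
# Find the highest operating temperature whose failure-rate ratio
# (relative to the 25 C reference prediction) stays within the
# allowable ratio, by linear interpolation between curve points.
def max_allowable_temp(curve, allowable_ratio):
    """curve: list of (temp_C, ratio) pairs, ascending in temperature."""
    for (t0, r0), (t1, r1) in zip(curve, curve[1:]):
        if r0 <= allowable_ratio <= r1:
            return t0 + (t1 - t0) * (allowable_ratio - r0) / (r1 - r0)
    raise ValueError("allowable ratio outside curve range")

reference_rate = 1 / 100   # failures per hour, predicted at 25 C
required_rate = 1 / 50     # failure rate allowed by the reliability goal
ratio = required_rate / reference_rate   # maximum allowable ratio = 2

# Illustrative curve, shaped so the ratio is 2 at 60 C as in the text
curve = [(25, 1.0), (50, 1.5), (60, 2.0), (75, 3.0)]
print(max_allowable_temp(curve, ratio))  # 60.0
```

With 75 °C required for blowers but only 60 °C allowable, the same conclusion falls out: air-conditioning (50 °C) is the only option that meets the goal.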

2.7 Importance of Learning From Each Failure

When a product fails, a valuable piece of information about it has been generated because we have the opportunity to learn how to improve the product if we take the right actions. Failures can be classified as

(1) Catastrophic (a shorted transistor or an open wire-wound resistor)
(2) Degradation (change in transistor gain or the resistor value)
(3) Wearout (brush wear in an electric motor)

These three failure categories can be subclassified further:

(1) Independent (a shorted capacitor in a radiofrequency amplifier being unrelated to a low-emission cathode in a picture tube)
(2) Cascade (the shorted capacitor in the radiofrequency amplifier causing excessive current to flow in its transistor and burning the collector beam lead open)
(3) Common mode (uncured resin being present in motors)

Much can be learned from each failure by using these categories, good failure reporting, analysis, and a concurrence system and by taking corrective action. Failure analysis determines what caused the part to fail. Corrective action ensures that the cause is dealt with. Concurrence informs management of actions being taken to avoid another failure. These data enable all personnel to compare the part ratings with the use stresses and verify that the part is being used with a known margin.

Figure 3.--Predicted failure rate ratios versus temperature for ground support equipment (electronics).

2.8 Effects of Tolerance on a Product

Because tolerances must be expected in all manufacturing processes, some important questions to ask about the effects of tolerance on a product are

(1) How is the reliability affected?
(2) How can tolerances be analyzed, and what methods are available?
(3) What will affect the term Pt in the product reliability model?

Electrical circuits are often affected by part tolerances (circuit gains can shift up or down, and transfer function poles or zeros can shift into the right-hand s-plane, causing oscillations). Mechanical components may not fit together or may be so loose that excessive vibration causes trouble (refs. 1 to 3).

3.0 TESTING FOR RELIABILITY

3.1 Test Objectives

It can be inferred that 1000 test samples are required to demonstrate a reliability requirement of 0.999. Because of cost and time, this approach is impractical. Furthermore, the total production of a product often may not even approach 1000 items. Because we usually cannot test the total production of a product (called the product population), we must demonstrate reliability on a few samples. Thus, the main objective of a reliability test is to test an available device so that the data will allow a statistical conclusion to be reached about the reliability of similar devices that will not or cannot be tested. That is, the objective of a reliability test is not only to evaluate the specific items tested but also to provide a sound basis for predicting the reliability of similar items that will not be tested and that often have not yet been manufactured.

To know how reliable a product is one must know how many ways it can fail and the types and magnitudes of the stresses that produce such failures. This premise leads to a secondary objective of a reliability test: to produce failures in the product so that the types and magnitudes of the stresses causing such failures can be identified. Reliability tests that result in no failures provide some measure of reliability but little information about the population failure mechanisms of like devices. (The exceptions to this are not dealt with at this time.) In subsequent sections, we discuss confidence levels, attribute test, test-to-failure, and life test methods, explain how these methods meet the two test objectives, show how test results can be statistically analyzed, and introduce the subject and use of confidence limits.

3.2 Confidence Levels

Mr. Igor Bazovsky, in his book Reliability Theory and Practice (ref. 4), defines the term "confidence" in testing:

We know that statistical estimates are more likely to be close to the true value as the sample size increases. Thus, there is a close correlation between the accuracy of an estimate and the size of the sample from which it was obtained. Only an infinitely large sample size could give us a 100 percent confidence or certainty that a measured statistical parameter coincides with the true value. In this context, confidence is a mathematical probability relating the mutual positions of the true value of a parameter and its estimate. When the estimate of a parameter is obtained from a reasonably sized sample, we may logically assume that the true value of that parameter will be somewhere in the neighborhood of the estimate, to the right or to the left. Therefore, it would be more meaningful to express statistical estimates in terms of a range or interval with an associated probability or confidence that the true value lies within such interval than to express them as point estimates. This is exactly what we are doing when we assign confidence limits to point estimates obtained from statistical measurements.

In other words, rather than express statistical estimates as point estimates, it would be more meaningful to express them as a range (or interval), with an associated probability (or confidence) that the true value lies within that interval. Confidence is a statistical term and reflects the amount of risk to be taken when stating the reliability. Confidence depends on supporting data.

3.3 Attribute Test Methods

Qualification, preflight certification, and design verification tests are categorized as attribute tests (refs. 5 and 6). They are usually go/no-go tests that demonstrate whether a device is good or bad without showing how good or how bad. In a typical test, two samples are subjected to a selected level of environmental stress, usually the maximum anticipated operational limit. If both samples pass, the device is considered qualified, preflight certified, or verified for use in the particular environment involved (refs. 7 and 8). Occasionally, such tests are called tests to success because the true objective is to have the device pass the test.

In summary, an attribute test is not a satisfactory method of testing for reliability because it can only identify gross design and manufacturing problems; it is an adequate method of testing for reliability only when sufficient samples are tested to establish an acceptable level of statistical confidence.

3.4 Test-To-Failure Methods

The purpose of the test-to-failure method is to develop a failure distribution for a product under one or more types of stress. The results are used to calculate the demonstrated reliability of the device for each stress. In this case the demonstrated population reliability will usually be the Pt or the Pw product reliability term.

In this discussion of test-to-failure methods, the term "safety factor" SF is included because it is often confused with safety margin SM. Safety factor is widely used in industry to describe the assurance against failure that is built into structural products. Of the many definitions of safety factor, the most commonly used is the ratio of mean strength to the reliability boundary Rb. When we deal with materials with clearly defined, repeatable, and "tight" strength distributions, such as sheet and structural steel or aluminum, using SF presents little risk. However, when we deal with plastics, fiberglass, and other metal substitutes or processes with wide variations in strength or repeatability, using SM provides a clearer picture of what is happening (fig. 4). In most cases, we must know the safety margin to understand how accurate the safety factor may be.

In summary, test-to-failure methods can be used to develop a strength distribution that provides a good estimate of the Pt and Pw product reliability terms without the need for the large samples required for attribute tests; the results of a test-to-failure exposure of a device can be used to predict the reliability of similar devices that cannot or will not be tested; testing to failure provides a means of evaluating the failure modes and mechanisms of devices so that improvements can be made; confidence levels can be applied to the safety margins and to the resulting population reliability estimates; and the accuracy of a safety factor can be known only if the associated safety margin is known.

Figure 4.--Two structures with identical safety factors (SF = 13/10 = 1.3) but different safety margins.

3.5 Life Test Methods

Life tests are conducted to illustrate how the failure rate of a typical system or complex subsystem varies during its operating life. Such data provide valuable guidelines for controlling product reliability. They help to establish burn-in requirements, to predict spare part requirements, and to understand the need for or lack of need for a system overhaul program. Such data are obtained through laboratory life tests or from the normal operation of a fielded system.

In summary, life tests are performed to evaluate product failure rate characteristics; if failures include all causes of system failure, the failure rate of the system is the only true factor available for evaluating the system's performance; life tests at the part level require large sample sizes if realistic failure rate characteristics are to be identified; laboratory life tests must simulate the major factors that influence failure rates in a device during field operations; the use of running averages in the analysis of life data will identify burn-in and wearout regions if such exist; and failure rates are statistics and therefore are subject to confidence levels when used in making predictions.

Figure 5 illustrates what might be called a failure surface for a typical product. It shows system failure rate versus operating time and environmental stress, three parameters that describe a surface such that, given an environmental stress and an operating time, the failure rate is a point on the surface. Test-to-failure methods generate lines on the surface parallel to the stress axis; life tests generate lines on the surface parallel to the time axis. Therefore, these tests provide a good description of the failure surface and, consequently, of the reliability of a product. Attribute tests result only in a point on the surface if failures occur and a point somewhere within the volume if failures do not occur. For this reason, attribute testing is the least desirable method for ascertaining reliability.

Of course, in the case of missile flights or other events that produce go/no-go results, an attribute analysis is the only way to determine product reliability.
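The figure 4 contrast between safety factor and safety margin can be sketched numerically. A normal strength distribution is assumed, and the sigma values below are illustrative, chosen so that both structures share SF = 13/10 = 1.3 while their safety margins, and hence their defective fractions, differ sharply.

```python
# Safety factor (mean strength over reliability boundary) vs. safety
# margin (boundary distance from the mean in standard deviations), and
# the fraction of the strength distribution below the boundary.
from math import erf, sqrt

def safety_factor(mean_strength, boundary):
    return mean_strength / boundary

def safety_margin(mean_strength, boundary, sigma):
    # SM = (mean strength - reliability boundary) / standard deviation
    return (mean_strength - boundary) / sigma

def fraction_defective(sm):
    # Normal lower-tail probability below the boundary: Phi(-SM)
    return 0.5 * (1 + erf(-sm / sqrt(2)))

mean_strength, boundary = 13.0, 10.0   # SF = 1.3 for both structures
for label, sigma in (("A (wide spread)", 2.31), ("B (tight spread)", 0.75)):
    sm = safety_margin(mean_strength, boundary, sigma)
    print(label, round(safety_factor(mean_strength, boundary), 2),
          round(sm, 1), f"{fraction_defective(sm):.3%}")
```

With sigma = 2.31 the margin is about 1.3 and roughly 9.7 percent of units fall below the boundary; with sigma = 0.75 the margin is 4.0 and the defective fraction is around 0.003 percent, matching the contrast the figure caption describes.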

Figure 5.--Product failure surface.

4.0 SOFTWARE RELIABILITY

Software reliability management is highly dependent on how the relationship between quality and reliability is perceived. For the purposes of this tutorial, quality is closely related to the process, and reliability is closely related to the product. Thus, both span the life cycle. Before we can stratify software reliability, the progress of hardware reliability will be reviewed. Over the past 25 years, the industry observed (1) the initial assignment of "wizard status" to hardware reliability theory, modeling, and analysis, (2) the growth of the field, and (3) the final establishment of hardware reliability as a science. One of the major problems was aligning reliability predictions and field performance. Once that was accomplished, the wizard status was removed from hardware reliability. The emphasis in hardware reliability from now to the year 2000 will be on system failure modes and effects.

Software reliability became classified as a science for many reasons. The difficulty in assessing software reliability is analogous to the problem of assessing the reliability of a new hardware device with unknown reliability characteristics. The existence of 30 to 50 different software reliability models indicates the organization in this area. Hardware reliability began at a few companies and later became the focus of the Advisory Group on Reliability of Electronic Equipment. The field then logically progressed through different models in sequence over the years. Similarly, numerous people and companies simultaneously entered the software reliability field in their major areas: cost, complexity, and reliability. The difference is that at least 100 times as many people are now studying software reliability as those who initially studied hardware reliability. The existence of so many models and their purports tends to mask the fact that several of these models showed excellent correlations between software performance predictions and actual software field performance: the Musa model as applied to communications systems and the Xerox model as applied to office copiers. There are also reasons for not accepting software reliability as a science, and they are discussed next.

One impediment to the establishment of software reliability as a science is the tendency toward programming development philosophies such as (1) "do it right the first time" (a reliability model is not needed), (2) "quality is a programmer's development tool," or (3) "quality is the same as reliability and is measured by the number of defects in a program and not by its reliability." All of these philosophies tend to eliminate probabilistic measures because the managers consider a programmer to be a software factory whose quality output is controllable, adjustable, or both. In actuality, hardware design can be controlled for reliability characteristics better than software design can. Design philosophy experiments that failed to enhance hardware reliability are again being formulated for software design (ref. 9).

Quality and reliability are not the same. Quality is characteristic and reliability is probabilistic. Our approach draws the line between quality and reliability because quality is concerned with the development process and reliability is concerned with the operating product. Many models have been developed, and a number of the measurement models show great promise. Predictive models have been far less successful, partly because a data base (such as MIL-HDBK-217E, ref. 10) is not yet available for software. Software reliability often has to use other methods; it must be concerned with the process of software product development.

4.1 Hardware and Software Failures

Microprocessor-based products have more refined failure definitions. Four types of failure may be considered: (1) hardware catastrophic, (2) hardware transient, (3) software catastrophic, and (4) software transient. In general, the catastrophic failures require a physical or remote hardware replacement, unit restart, or a software program patch that can be made manually or remotely. The transient failure categories result in either restarts or reloads for the microprocessor-based systems, subsystems, or individual units and may or may not require further correction. A recent reliability analysis of such a system assigned ratios for these categories. Hardware transient faults were assumed to occur at 10 times the hardware catastrophic rate, and software transient faults were assumed to occur at 100 to 500 times the software catastrophic rate.

The time of day is of great concern in reliability modeling and analysis. Although hardware catastrophic failures occur at any time of the day, they often manifest themselves during busier system processing times. On the other hand, hardware and software transient failures generally occur during the busy hours. When a system's predicted reliability is close to the specified reliability, a sensitivity analysis must be performed.

4.2 Manifestations of Software Bugs

Many theories, models, and methods are available for quantifying software reliability. Nathan (ref. 11) stated,

"It is contrary to the definition of reliability to apply reliability analysis to a system that never really works. This means that the software which still has bugs in it really has never worked in the true sense of reliability in the hardware sense."

Large complex software programs used in the communications industry are usually operating with some software bugs. Thus, a reliability analysis of such software is different from a reliability analysis of established hardware. Software reliability is not alone in the need for establishing qualitative and quantitative models. In the early 1980's, work was done on a combined hardware/software reliability model. A theory for combining well-known hardware and software models in a Markov process was developed. A consideration was the topic of software bugs and errors based on experience in the telecommunications field.

To synthesize the manifestations of software bugs, some of the following hardware trends for these systems should be noted: (1) hardware transient failures increase as integrated circuits become denser; (2) hardware transient failures tend to remain constant or increase slightly with time after the burn-in; and (3) hardware (integrated circuit) catastrophic failures decrease with time after the burn-in phase. These trends affect the operational software of communications systems. If the transient failures increase, the error analysis and system security software are called into action more often. This increases the risk of misprocessing a given transaction in the communications system. A decrease in the catastrophic failure rate of integrated circuits can be significant (ref. 12). An order-of-magnitude decrease in the failure rate of 4K memory devices between the first year and the twentieth year is predicted. We also tend to oversimplify the actual situations. Even with five vendors of these 4K devices, the manufacturing quality control person may have to set up different screens to eliminate the defective devices from different vendors. Thus, the system software will see many different transient memory problems and combinations of them in operation.

Central control technology has prevailed in communications systems for 25 years. The industry has used many of its old modeling tools and applied them directly to distributed control structures. Most modeling research was performed on large duplex processors. With an evolution through forms of multiple duplex processors and load-sharing processors and on to the present forms of distributed processing architectures, the modeling tools need to be verified. With fully distributed control systems, the software reliability model must be conceptually matched to the software design in order to achieve valid predictions of reliability. The following trends can be formulated for software transient failures: (1) software transient failures decrease

as the system control architecture approaches a fully distributed structure, and (2) software transient failures increase as the processing time window allowed per function decreases (i.e., fast error entry, removal of mode checking, removal of system ready checks).

A fully distributed control structure can be configured to operate as its own error filter. In a hierarchy of processing levels, each level acts as a barrier to the level below and prevents errors or transient faults from propagating through the system. Central control structures cannot usually prevent this type of error propagation.

If the interleaving of transaction processes in a software control architecture, such as with a fully distributed architecture, is reduced, the transaction processes are less likely to fail. This is especially true as experienced on nonconsistent user systems. Another interaction concerns communications software transient failures: the faster a software program runs, the more likely it is to cause errors (such as encountered in central control architectures).

A "missing link" needs further discussion. Several methods can be used to quantify the occurrence of software bugs through their manifestations. The key is to categorize the criticality levels of software bug manifestations and estimate their respective probability-of-occurrence distributions, because a failure manifestation analysis could identify the bug criticality levels that are detrimental to the system's reliability. Table I, an example of the missing link, presents a five-level criticality index for defects. These examples indicate the flexibility of such an approach to criticality classification. If a modeling tool could additionally combine the effects of hardware, structure, software, and operator faults, it would be a powerful tradeoff tool for making system design decisions.

We can choose a decreasing, constant, or increasing software bug removal rate for systems software. Although a decreasing bug removal rate is generally encountered, each has its application to special situations and software. Systems software has the advantage of the patch type of bug removal in that certain defects can be temporarily patched and the permanent program change postponed to a more appropriate date. Thus, the defect does not affect service, but it should be treated as one of the bug manifestations and included in the overall software quality assessment.

The traditional separation of hardware and software concerns has not been overcome. Until the missing link is overcome in the software modeling of large systems, it will be impossible to achieve a satisfactory systems performance benchmark. This discussion indicates that performance modeling has not yet focused on the specific manifestations of software unreliability.

4.3 Concept of Quality

Consider the concept of quality and "doing it right the first time." The need for quality is universal. We have changed our perspective on quality from measuring defect levels per unit to the concepts of "zero defects" and the controlled process. The final measure of quality includes performance evaluation by audits, acceptable cost reduction, monitoring of the system interrupts and re-initializations, and other measurable parameters. The present viewpoint is that quality is not free. Thus, a major improvement in quality can be achieved by perfecting the design and manufacturing processes and by establishing, as soon as possible, the quality factors as defined by the customer that characterize a product.

Before software bug manifestations can be included in the software reliability process, the missing link needs to be quantified. Then we can obtain an accurate reliability model for measuring correct quality tradeoffs on a predicted performance basis.

in the design If a software

satisfaction, strive for total

management.

key to achieving

TABLE I.--CRITICALITY INDEX

Level of      Bug manifestation   Defect removal   Failure type                Failure characteristic
criticality   rate                rate
1             4 per day           1 per month      Transient                   Errors come and go
2             3 per day           1 per week       Transient                   Errors are repeated
3             2 per week          1 per month      Transient or catastrophic   Service is affected
4             1 per month         2 per year       Transient or catastrophic   System is partially down
5             1 per two years     1 per year       Catastrophic                System stops
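The index in Table I lends itself to mechanical application against observed defect data. A minimal sketch, with the thresholds taken from the table's bug-manifestation-rate column (the function name and data layout are illustrative, not from the original):

```python
# Classify a defect by its observed bug manifestation rate, following the
# five-level criticality index of Table I (rates in manifestations per day).
LEVELS = [
    # (minimum manifestations per day, level, failure type, characteristic)
    (4.0,      1, "Transient",                 "Errors come and go"),
    (3.0,      2, "Transient",                 "Errors are repeated"),
    (2.0 / 7,  3, "Transient or catastrophic", "Service is affected"),
    (1.0 / 30, 4, "Transient or catastrophic", "System is partially down"),
    (0.0,      5, "Catastrophic",              "System stops"),
]

def criticality(manifestations_per_day):
    """Return (level, failure type, characteristic) for a manifestation rate."""
    for threshold, level, ftype, characteristic in LEVELS:
        if manifestations_per_day >= threshold:
            return level, ftype, characteristic
    return 5, "Catastrophic", "System stops"

print(criticality(5.0))       # several manifestations a day: level 1
print(criticality(1.0 / 60))  # roughly once every two months: level 5
```

Note how the table pairs rare manifestation with severe failure characteristics: a defect that surfaces only once in two years sits at the catastrophic end of the index.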

Quality appears to have a third major factor in addition to product and process: the environment. People are important. They make the process or the product successful. The next step is to discuss what the process of achieving quality in software consists of and how quality management is involved. The purpose of quality management for programming products is to ensure that a preselected software quality level has been achieved on schedule and in a cost-effective manner. In developing a quality management system, the programming product's critical life-cycle-phase reviews provide the reference base for tracking the achievement of quality objectives. The International Electrotechnical Commission (IEC) system life-cycle phases presented in their guidelines for reliability and maintainability management are (1) concept and definition, (2) design and development, (3) manufacturing, installation, and acceptance, (4) operation and maintenance, and (5) disposal. In general, a phase-cost study shows the increasing cost of correcting programming defects in later phases of a programming product's life. Also, the higher the level of software quality, the more life-cycle costs are reduced.

4.4 Software Quality

The next step is to look at specific software quality items. Software quality is defined as "the achievement of a preselected software quality level within the costs, schedule, and productivity boundaries established by management" (ref. 10). However, agreement on such a definition is often difficult to achieve because software metrics vary more than those for hardware, software reliability management has focused on the product, and software quality management has focused on the process. In practice, the quality emphasis can change with respect to the specific product application environment. Different perspectives of software product quality have been presented over the years. However, in today's literature there is general agreement that the proper quality level for a particular software product should be determined in the concept and definition phase and that quality managers should monitor the project during the remaining life-cycle phases to ensure the proper quality level.

The developer of a methodology for assessing the quality of a software product must respond to the specific characteristics of the product. There can be no single quality metric. The process of assessing the quality of a software product begins with the selection of specific characteristics, quality metrics, and performance criteria. With respect to software quality, several areas of interest are (1) characteristics, (2) metrics, (3) overall metrics, and (4) standards. Areas (1) and (2) are applicable during both the design and development phase and the operation and maintenance phase. In general, area (2) is used during the design and development phase before the acceptance phase for a given software product. The following discussion will concern area (2).

4.5 Software Quality Metrics

The entire area of software measurements and metrics has been widely discussed and is the subject of many publications. Notable is the guide for software reliability measurement developed by the Institute of Electrical and Electronics Engineers (IEEE) Computer Society's working group on metrics. A basis for software quality standardization was also issued by the IEEE. Software metrics cannot be developed before the cause and effect of a defect have been established for a given product with relation to its product life cycle. A typical cause-and-effect chart for a software product includes the process indicator. At the testing stage of product development, the evolution of software quality levels can be assessed by characteristics such as freedom from error, successful test case completion, and the estimate of the software bugs remaining. For example, these process indicators can be used to predict slippage of the product delivery date and the inability to meet original design goals. When the programming product enters the qualification, installation, and acceptance phase and continues into the maintenance and enhancements phase, the concept of performance is important in the quality characteristic activity. This concept is shown in table II, where the 5 IEC system life-cycle phases have been expanded to 10 software life-cycle phases.

4.6 Concluding Remarks

This section presented a snapshot of software quality assurance today. Continuing research is concerned with the use of overall software quality metrics and better software prediction tools for determining the defect population. In addition, simulators and code generators are being further developed so that high-quality software can be produced. Process indicators are closely related to software quality, and some include them as a stage in software development. In general, such measures as (1) test cases completed versus test cases planned and (2) the number of lines of code developed versus the number expected give an indication of the overall company or corporate progress toward a quality software product. Too often,

TABLE II.--MEASUREMENTS AND PROGRAMMING PRODUCT LIFE CYCLE

[The 5 International Electrotechnical Commission (IEC) life-cycle phases have been expanded to 10 software phases.]

System life-cycle phase          Software life-cycle phase                         Order of precedence
                                                                                  Primary                Secondary
Concept and definition           Conceptual planning (1)
                                 Requirements definition (2)
                                 Product definition (3)                           Quality metrics^a
Design and development           Top-level design (4)                             Quality metrics        Process indicators^b
                                 Detailed design (5)                              Quality metrics        Process indicators
                                 Implementation (6)                               Process indicators     Quality metrics
Manufacturing and installation   Testing and integration (7)                      Process indicators     Performance measures^c
                                 Qualification, installation,
                                   and acceptance (8)                             Performance measures   Quality metrics
Operation and maintenance        Maintenance and enhancements (9)                 Performance measures   Quality metrics
Disposal                         Disposal (10)

^a Metrics: qualitative assessment, quantitative prediction, or both.
^b Indicators: month-by-month tracking of key project parameters.
^c Measures: quantitative performance assessment.

personnel are moved from one project to another, and thus the lagging projects improve but the leading projects decline in their process indicators. The life cycle for programming products should not be disrupted. Performance measures, including such criteria as the percentage of proper transactions, the number of system restarts, the number of system reloads, and the percentage of uptime, should reflect the user's viewpoint. In general, the determination of applicable quality measures for a given software product development is viewed as a specific task of the software quality assurance function. The determination of the process indicators and performance measures is a task of the software quality standards function.
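The indicators and measures named here are simple ratios over recorded counts, so tracking them month by month is mechanical. A minimal sketch (the function names and sample figures are illustrative, not from the text):

```python
def process_indicators(tests_completed, tests_planned, loc_developed, loc_expected):
    """Month-by-month process indicators: progress ratios toward the plan."""
    return {
        "test_case_completion": tests_completed / tests_planned,
        "code_development": loc_developed / loc_expected,
    }

def performance_measures(proper_transactions, total_transactions,
                         uptime_hours, total_hours):
    """User-viewpoint performance measures: proper transactions and uptime."""
    return {
        "proper_transactions_pct": 100.0 * proper_transactions / total_transactions,
        "uptime_pct": 100.0 * uptime_hours / total_hours,
    }

ind = process_indicators(tests_completed=180, tests_planned=200,
                         loc_developed=9500, loc_expected=10000)
perf = performance_measures(proper_transactions=99950, total_transactions=100000,
                            uptime_hours=719, total_hours=720)
print(ind)   # ratios below 1.0 warn of delivery-date slippage
print(perf)
```

A completion ratio drifting below 1.0 is exactly the kind of process indicator the text says can predict slippage of the product delivery date.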

5.0 RELIABILITY MANAGEMENT

To design for successful reliability and continue to provide customers with a reliable product, the following steps are necessary:
(1) Determine the reliability goals to be met.
(2) Construct a symbolic representation.
(3) Determine the logistics support and repair philosophy.
(4) Select the reliability analysis procedure.
(5) Select the data sources for failure rates and repair rates.
(6) Determine the failure rates and the repair rates.
(7) Perform the necessary calculations.
(8) Validate and verify the reliability.
(9) Measure reliability until customer shipment.
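Steps (6) and (7) reduce to simple arithmetic once the failure and repair rates are in hand. A minimal sketch for a series system of independent components with constant rates (the component values and function names are illustrative assumptions, not from the text):

```python
# Steps (6)-(7): combine component failure rates (failures/hour) and a repair
# rate into system MTBF and steady-state availability for a series system.
def series_failure_rate(failure_rates):
    """A series system fails when any component fails, so the rates add."""
    return sum(failure_rates)

def availability(failure_rate, repair_rate):
    """Steady-state availability = MTTF / (MTTF + MTTR) = mu / (lambda + mu)."""
    return repair_rate / (failure_rate + repair_rate)

lambdas = [2e-6, 5e-6, 1e-6]        # illustrative component failure rates
lam = series_failure_rate(lambdas)  # 8e-6 failures/hour
mtbf = 1.0 / lam                    # 125,000 hours
avail = availability(lam, repair_rate=0.25)  # repair rate for a 4-hour MTTR
print(mtbf, avail)
```

The series (any-failure-fails-the-system) assumption is only the simplest conventional choice; the symbolic representation built in step (2) would dictate the actual combination rules.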

5.1 Goals and Objectives

Goals must be placed into the proper perspective. Because they are often examined by using models that the producer develops, one of the weakest links in the reliability process is the modeling. Dr. John D. Spragins, an editor for the IEEE Transactions on Computers, corroborates this fact with the following statement (ref. 13):

Some standard definitions of reliability or availability, such as those based on the probability that all components of a system are operational at a given time, can be dismissed as irrelevant when studying large telecommunication networks. Many telecommunication networks are so large that the probability they are operational according to this criterion may be very nearly zero; at least one item of equipment may be down essentially all of the time. The typical user, however, does not see this unless he or she happens to


be the unlucky person whose equipment fails; the system may still operate perfectly from this user's point of view. A more meaningful criterion is one based on the reliability seen by typical users. The reliability apparent to system operators is another valid, but distinct, criterion. (Since system operators commonly consider systems down only after failures have been reported to them, and may not hear of short self-clearing outages, their reliability estimates are often higher than the values seen by users.)

Reliability objectives can be defined differently for various systems. An example from the telecommunications industry (ref. 14) is presented in table III.

5.2 Specification Targets

A system can have a detailed performance or reliability specification that is based on customer requirements. The survivability of a telecommunications network is defined as the ability of the network to perform under stress caused by cable cuts or sudden and lengthy traffic overloads and after failures including equipment breakdowns. Thus, performance and availability have been combined into a unified metric. One area of telecommunications where these principles have been applied is the design and implementation of fiber-based networks. Roohy-Laleh et al. (ref. 15) state "...the statistical observation that on the average 56 percent of the pairs in a copper cable are cut when the cable is dug up, makes the copper network 'structurally survivable.'" On the other hand, a fiber network can be assumed to be an all-or-nothing situation with 100 percent of the circuits being affected by a cable cut, failure, etc. In this case study (ref. 15), "...cross connects and allocatable capacity are utilized by the intelligent network operation system to dynamically reconfigure the network in the case of failures." Figure 6 (from ref. 16) presents a concept for specification targets.

5.3 Human Reliability

The major objectives of reliability management are to ensure that a selected reliability level for a product can be achieved on schedule in a cost-effective manner and that the customer perceives the selected reliability level. The current emphasis in reliability management is on meeting or exceeding customer expectations. We can view this as a challenge, but it should be viewed as the bridge between the user and the producer or provider. This bridge is actually "human reliability." In the past, the producer was concerned with the process and the product and found reliability measurements that addressed both. Often there was no correlation between field data, the customer's perception of reliability, and the producer's reliability metrics. Surveys then began to indicate that the customer distinguished between reliability performance, response to order placement, technical support, service quality, etc. Human reliability is defined (ref. 17) as "...the probability of accomplishing a job or task successfully by humans at any required stage in system operations within a specified minimum time limit (if the time requirement is specified)." Although customers generally are not yet requiring human reliability models in addition to the requested hardware and software reliability models, the science of human reliability is well established.

5.4 Customer

Reliability growth has been studied, modeled, and analyzed, usually from the design and development viewpoint. Seldom is the process or product studied from the customer's perspective. Furthermore, the reliability that the first customer observes with the first shipment

TABLE III.--RELIABILITY OBJECTIVES FOR TELECOMMUNICATIONS INDUSTRY

Module or system                         Objective
Telephone instrument                     Mean time between failures
Electronic key system                    Complete loss of service
                                         Major loss of service
                                         Minor loss of service
PABX                                     Complete loss of service
                                         Major loss of service
                                         Minor loss of service
                                         Mishandled calls
Traffic service position system (TSPS)   Mishandled calls
                                         System outage
Class 5 office                           System outage
Class 4 office                           Loss of service
Class 3 office                           Service degradation

[Figure 6.--Specification targets (ref. 16): a plot of performance against availability (from a1 to a2, up to 100 percent) showing regions labeled fully operational; subliminal availability, major (P1) and minor (P2); degraded operation; subliminal performance, 75 percent at load factor B; subliminal performance, 65 percent at load factor B; and unusable.]

can be quite different from the reliability that a customer will observe with a unit or system produced 5 years later, or with the last shipment. Because the customer's experience can vary with the maturity of a system, reliability growth is an important concept to customers and should be considered in their purchasing decisions. One key to reliability growth is the ability to define the goals for the product or service from the customer's perspective while reflecting the actual situation in which the customer obtains the product or service. For large telecommunications switching systems, the rule of thumb for determining reliability growth has been that often systems have been allowed to operate at a lower availability than the specified availability goal for the first 6 months to 1 year of operation (ref. 18). In addition, component part replacement rates have often been allowed to be 50 percent higher than specified for the first 6 months of operation. These allowances accommodated craftspersons' learning patterns, software patches, design errors, etc. Another key to reliability growth is to have its measurement encompass the entire life cycle of the product. The concept is not new; only here the emphasis is placed on the customer's perspective.

Reliability growth can be specified from "day 1" in product development and can be measured or controlled with a 10-year life until "day 5000." We can apply the philosophy of reliability knowledge generation principles, which is to generate reliability knowledge at the earliest possible time in the planning process and to add to this base for the duration of the product's useful life. To accurately measure and control reliability growth, we must examine the entire manufacturing life cycle. One method is the construction of a production life-cycle reliability growth chart. In certain large telecommunications systems, the long installation time allows the electronic part reliability to grow so that the customer observes both the design and the production growth. Large complex systems often offer an environment unique to each product installation, which dictates that a significant reliability growth will occur. Yet, with the difference that size and complexity impose on resultant product reliability growth, corporations with large product lines should not present overall reliability growth curves on a corporate basis but must present individual product-line reliability growth pictures to achieve total customer satisfaction.
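Reliability growth of this kind is commonly tracked with a Duane-type power-law model, in which cumulative MTBF grows with cumulative operating time. The memorandum does not prescribe a particular model, so this sketch and its parameters are illustrative only:

```python
# Duane reliability growth: cumulative MTBF(t) = MTBF_1 * t**alpha, where t is
# cumulative operating time in hours and alpha is the observed growth slope.
def cumulative_mtbf(t_hours, mtbf_at_one_hour, alpha):
    return mtbf_at_one_hour * t_hours ** alpha

# Illustrative system: roughly 50-hour MTBF early in test, growth slope 0.3.
for t in (100, 1_000, 10_000):
    print(t, round(cumulative_mtbf(t, mtbf_at_one_hour=50.0, alpha=0.3), 1))
```

Plotting such a curve per product line, rather than corporate-wide, is one way to build the individual product-line reliability growth pictures called for above.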

APPENDIX--COURSE EVALUATION

NASA SAFETY TRAINING CENTER (NSTC) COURSE EVALUATION

Name: ____________  Course Title: ____________  Sponsor: ____________
Grade (academic course only): ______  Date: ______

1. What were the strengths of this course/workshop?

2. What were the weaknesses of this course/workshop?

3. How will the skills/knowledge you gained in this course/workshop help you to perform better in your job?

4. Please give the course/workshop an overall rating.
   5 Excellent   3 Fair   1 Poor

5. Please give the instructor an overall rating.
   5 Excellent   3 Fair   1 Poor

6. Please rate the applicability of this course to your work.
   5 Excellent   3 Fair   1 Poor

7. As a customer of the NASA Safety Training Center (NSTC), how would you rate our services?
   5 Excellent   3 Fair   1 Poor
   Comments:

8. Please rate the following items (5 Excellent, 3 Fair, 1 Poor):
   1. Overall course content
   2. Achievement of course objectives
   3. Instructor's knowledge of subject
   4. Instructor's presentation methods
   5. Instructor's ability to address questions
   6. Quality of textbook/workbook (if applicable)
   7. Training facilities
   8. Time allotted for the course
   Comments:

9. Training expense other than tuition (if applicable):
   Travel (including plane fare, taxi, car rental, and tolls)
   Per Diem
   Total

10. Please send this evaluation to:
    NASA Safety Training Center
    Webb, Murray & Associates, Inc.
    1730 NASA Road One, Suite 102
    Houston, Texas 77058

THANK YOU!

REFERENCES

1. Reliability Prediction of Electronic Equipment. MIL-HDBK-217E, Jan. 1990.
2. Electronic Reliability Design Handbook. MIL-HDBK-338, vols. 1 and 2, Oct. 1988.
3. Reliability Modeling and Prediction. MIL-STD-756B, Aug. 1982.
4. Bazovsky, I.: Reliability Theory and Practice. Prentice-Hall, 1963.
5. Reliability Test Methods, Plans, and Environments for Engineering Development, Qualification, and Production. MIL-HDBK-781, July 1987.
6. Laubach, C.H.: Environmental Acceptance Testing. NASA SP-T-0023, 1975.
7. Laube, R.B.: Methods to Assess the Success of Test Programs. J. Environ. Sci., vol. 26, no. 2, Mar.-Apr. 1983, pp. 54-58.
8. Test Requirements for Space Vehicles. MIL-STD-1540B, Oct. 1982.
9. Siewiorek, D.P.; and Swarz, R.S.: The Theory and Practice of Reliable System Design. Digital Press, Bedford, MA, 1982, pp. 206-211.
10. Reliability Prediction of Electronic Equipment. MIL-HDBK-217E, Jan. 1990.
11. Nathan, I.: A Deterministic Model To Predict "Error-Free" Status of Complex Software Development. Workshop on Quantitative Software Models, IEEE, New York, 1979.
12. Schick, G.J.; and Wolverton, R.W.: An Analysis of Computing Software Reliability Models. IEEE Trans. Software Eng., vol. SE-4, no. 2, Mar. 1978, pp. 104-120.
13. Spragins, J.D., et al.: Current Telecommunication Network Reliability Models: A Critical Assessment. IEEE J. Sel. Areas Commun., vol. SAC-4, no. 7, Oct. 1986, pp. 1168-1173.
14. Malec, H.A.: Reliability Optimization in Telephone Switching Systems Design. IEEE Trans. Rel., vol. R-26, no. 3, Aug. 1977, pp. 203-208.
15. Roohy-Laleh, E., et al.: A Procedure for Designing a Low Connected Survivable Fiber Network. IEEE J. Sel. Areas Commun., vol. SAC-4, no. 7, Oct. 1986, pp. 1112-1117.
16. Jones, D.R.; and Malec, H.A.: Communications Systems Performability: New Horizons. 1989 IEEE International Conference on Communications, vol. 1, IEEE, 1989, pp. 1.4.1-1.4.9.
17. Dhillon, B.S.: Human Reliability: With Human Factors. Pergamon Press, 1986.
18. Conroy, R.A.; Malec, H.A.; and Van Goethem, J.: The Design, Applications, and Performance of the System-12 Distributed Computer Architecture. First International Conference on Computers and Applications, E.A. Parrish and S. Jiang, eds., IEEE, 1984, pp. 186-195.

REPORT DOCUMENTATION PAGE (Form Approved, OMB No. 0704-0188)

Report Date: October 1994. Report Type: Technical Memorandum.
Title and Subtitle: Design for Reliability: NASA Reliability Preferred Practices for Design and Test.
Funding Numbers: WU-323-44-19.
Author: Vincent R. Lalli.
Performing Organization: National Aeronautics and Space Administration, Lewis Research Center, Cleveland, Ohio 44135-3191. Performing Organization Report Number: E-8053.
Sponsoring/Monitoring Agency: National Aeronautics and Space Administration, Washington, D.C. 20546-0001. Agency Report Number: NASA TM-106313.
Supplementary Notes: Prepared for the Reliability and Maintainability Symposium cosponsored by ASQC, IIE, IEEE, SOLE, IES, AIAA, SSS, and SRE, Anaheim, California, January 24-27, 1994. Responsible person, Vincent R. Lalli, organization code 0152, (216) 433-2354.
Distribution/Availability Statement: Unclassified - Unlimited. Subject Category 18.

Abstract: This tutorial summarizes reliability experience from both NASA and industry and reflects engineering practices that support current and future civil space programs. These practices were collected from various NASA field centers and were reviewed by a committee of senior technical representatives from the participating centers (members are listed at the end). The material for this tutorial was taken from the publication issued by the NASA Reliability and Maintainability Steering Committee (NASA Reliability Preferred Practices for Design and Test. NASA TM-4322, 1991). Reliability must be an integral part of the systems engineering process. Although both disciplines must be weighted equally with other technical and programmatic demands, the application of sound reliability principles will be the key to the effectiveness and affordability of America's space program. Our space programs have shown that reliability efforts must focus on the design characteristics that affect the frequency of failure. Herein, we emphasize that these identified design characteristics must be controlled by applying conservative engineering principles.

Subject Terms: Design; Test; Practices; Reliability; Training; Flight proven.
Number of Pages: 27. Price Code: A03.
Security Classification (of report, of this page, of abstract): Unclassified.