Risk - Module - V1.0

This document discusses techniques for risk analysis in space systems engineering projects, including risk management, fault tree analysis, and failure mode and effects analysis. It describes risk as potential events that could negatively impact safety, performance, cost or schedule. Risk management is presented as a systematic process to identify, assess, prioritize and mitigate risks through various analytical techniques. The document provides examples of risk types, outlines the risk identification and assessment process, and discusses trigger questions to uncover potential risks.

Risk Module: Risk Management, Fault Trees and Failure Mode Effects Analysis

Space Systems Engineering, version 1.0

Space Systems Engineering: Risk Module


Module Purpose: Risk

• To understand risk, risk management, fault tree analysis and failure mode effects analysis in the context of project development
• To acknowledge that risks are inevitable and recognize that they can be reduced through systematic management and analytic techniques
• To review three techniques that are used to discover, assess, rank and mitigate risk: risk management, fault tree analysis and failure mode effects analysis


What are Risks and Risk Management?

• Risks are potential events that have negative impacts on safety or on project technical performance, cost or schedule.
• Risks are an inevitable fact of life; they can be reduced but never eliminated.
• Risk management comprises purposeful thought about the sources, magnitude and mitigation of risk, and actions directed toward its balanced reduction.
• The same tools and perspectives that are used to discover, manage and reduce risks can be used to discover, manage and increase project opportunities. This is opportunity management.


What is Risk Management?

Risk management is a continuous and iterative decision-making technique designed to improve the probability of success. It is a proactive approach that:

• Seeks or identifies risks
• Assesses the likelihood and impact of these risks
• Develops mitigation options for all identified risks
• Identifies the most significant risks and chooses which mitigation options to implement
• Tracks progress to confirm that cumulative project risk is indeed declining
• Communicates and documents the project risk status
• Repeats this process throughout the project life


Risk Management Considers the Entire Development and Operations Life of a Project

Risk Type                     Example
Technical Performance Risk    Failure to meet a spacecraft technical requirement or specification during verification
Cost Risk                     Failure to stay within a cost cap for the project
Programmatic Risk             Failure to secure long-term political support
Schedule Risk                 Failure to meet a critical launch window
Liability Risk                Spacecraft deorbits prematurely, causing damage over the debris footprint
Regulatory Risk               Failure to secure proper approvals for launch of nuclear materials
Operational Risk              Failure of spacecraft during mission
Safety Risk                   Hazardous material release while fueling during ground operations
Supportability Risk           Failure to resupply sufficient material to support human presence as planned


Every NASA Space Flight Project Begins with a Plan for Risk Management

• This plan reflects the project’s risk management philosophy:
  • Priority (criticality to long-term strategic plans)
  • National significance
  • Mission lifetime (primary baseline mission)
  • Estimated project life cycle cost
  • Launch constraints
  • In-flight maintenance feasibility
  • Alternative research opportunities or re-flight opportunities
• The risk management philosophy is reflected in a number of ways:
  • Whether single point failures are allowed
  • Whether the system is monitored continuously during operations
  • How much slack is in the development schedule
  • How technical resource margins (e.g., mass, power, MIPS) are allocated throughout the development


Other Factors to Consider in Assessing Risk (but not limited to)…

• Complexity of management and technical interfaces
• Design and test margins
• Mission criticality
• Availability and allocation of resources such as mass, power, volume, data volume, data rates, and computing resources
• Scheduling and manpower limitations
• Ability to adjust to cost and funding profile constraints
• Mission operations
• Data handling, i.e., acquisition, archiving, distribution and analysis
• Launch system characteristics
• Available facilities


Risk Identification

• Risks are identified by the development team, peer reviews, lessons from past projects and expert review.
• Lessons from past projects are captured via ‘trigger questions’, or questions that challenge a development strategy or design solution.
• The project risk status and top ten risk list are reviewed periodically (usually monthly) and at the project milestone reviews.


Example Risk Trigger Questions

• Have requirements been implemented such that a small change in requirements has the potential to cause large cost, performance or schedule system ramifications?
• Do designs or requirements push the current state-of-the-art?
• Has the concept for operating, maintaining, decommissioning or disposal of the system been adequately defined to ensure the identification of all requirements?
• Has an independent cost estimate (ICE) been performed?
• Is the schedule adequate to handle the level of requirements or objectives changes that are occurring or are likely to occur?
• Have the necessary facilities for environmental test been identified and availability problems been resolved?


More Considerations for Risk Discovery

While each space project has its unique risks, a list of the underlying sources of risks would include the following:
• Technical complexity: many design constraints or many dependent operational sequences having to occur in the right sequence and at the right time
• Organizational complexity: many independent organizations having to perform with limited coordination
• Inadequate margins or reserves
• Inadequate implementation plans
• Unrealistic schedules
• Total and year-by-year budgets mismatched to the actual implementation risks
• Over-optimistic designs pressured by mission expectations
• Limited engineering analysis and understanding due to inadequate engineering tools and models
• Limited understanding of the mission’s space environments
• Inadequately trained or inexperienced project personnel
• Inadequate processes or inadequate adherence to proven processes


Pause and Learn Opportunity

Engage the class in identifying risks for a familiar project.
• What kinds of risks are identified?
• What is the basis for their search for risks?
After the class has thought for a while, the instructor could present some trigger questions which may help discover new risks and show the value of the trigger questions.


[Cartoon: Dilbert Identifies Risks. © United Features Syndicate, Inc.]


The Benefits of Preparing for the Unexpected

Background:
On January 21, 2004 (Sol 18), Spirit abruptly ceased communicating with
mission control. The next day the rover radioed a 7.8 bit/s beep,
confirming that it had received a transmission from Earth but indicating
that the spacecraft believed it was in a fault mode.

Mars Spirit Rover Flash Memory Problem

“The thing that strikes me most about all this is how critical
it was to have that INIT_CRIPPLED command in the system.
It’s not the kind of command that you’d ever expect to use
under normal conditions on Mars. But back during the
earliest days of the project Glenn realized that someday we
might need the flexibility to deal with a broken flash file
system, and he put INIT_CRIPPLED in the system and left it
there. And when the anomaly hit, it saved the mission.”
–From “Roving Mars” by Steve Squyres, Hyperion 2005

Be prepared for the low-probability event with a huge consequence.


After Identification Risks are Assessed

• Risks are assessed by characterizing the probability that a project will experience an undesired event and the consequences, impact or severity of that event, were it to occur.
• Risks can be compared on iso-curves consisting of a likelihood measure and a consequence measure.
• Since the assessment of the likelihood and consequence of a risk is subjective and carries significant uncertainty, the characterization of risk is either qualitative (low, medium or high) or semi-quantitative (risks are captured on a 5x5 matrix).

[Figure: Likelihood (Probability), from 0.0 to 1.0, plotted against Severity of Consequence, with iso-risk curves separating the Low Risk, Medium Risk and High Risk regions]
An Example of Some Semi-Quantitative Definitions to Enable a Project to Compare and Rank Risks

Probability of Occurrence

Scale   Measure
5       Near certain to occur (80-100%)
4       Highly likely to occur (60-80%)
3       Likely to occur (40-60%)
2       Unlikely to occur (20-40%)
1       Not likely; improbable (0-20%)

Impact of Consequences

Class I, Catastrophic (Scale 5)
  Technical: A condition that may cause death or permanently disabling injury, facility destruction on the ground, or loss of crew, major systems, or vehicle during the mission
  Schedule: Launch window to be missed
  Cost: Cost overrun > 50% of planned cost

Class II, Critical (Scale 4)
  Technical: A condition that may cause severe injury or occupational illness, or major property damage to facilities, systems, equipment, or flight hardware
  Schedule: Schedule slippage causing launch date to be missed
  Cost: Cost overrun 15% to 50% of planned cost

Class III, Moderate (Scale 3)
  Technical: A condition that may cause minor injury or occupational illness, or minor property damage to facilities, systems, equipment, or flight hardware
  Schedule: Internal schedule slip that does not impact launch date
  Cost: Cost overrun 2% to 15% of planned cost

Class IV, Negligible (Scale 2)
  Technical: A condition that could cause the need for minor first aid treatment but would not adversely affect personal safety or health; damage to facilities, equipment, or flight hardware more than normal wear and tear level
  Schedule: Internal schedule slip that does not impact internal development milestones
  Cost: Cost overrun < 2% of planned cost
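The semi-quantitative definitions above can be applied mechanically once a probability or a cost overrun has been estimated. The following is a minimal Python sketch of that lookup, not part of the original module. The function names are hypothetical, and the handling of values that fall exactly on a shared boundary (for example exactly 20% or exactly 15%) is an assumption, since the table does not say which band they belong to.

```python
def likelihood_scale(probability: float) -> int:
    """Map a probability in [0, 1] to the 1-5 likelihood scale in the table."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    # Bands: 80-100% -> 5, 60-80% -> 4, 40-60% -> 3, 20-40% -> 2, 0-20% -> 1.
    # Assumption: a value on a shared boundary goes to the higher band.
    for scale, lower_bound in ((5, 0.8), (4, 0.6), (3, 0.4), (2, 0.2)):
        if probability >= lower_bound:
            return scale
    return 1


def cost_consequence_scale(overrun_fraction: float) -> int:
    """Map a cost overrun (fraction of planned cost) to the consequence scale."""
    if overrun_fraction < 0:
        raise ValueError("overrun_fraction must not be negative")
    if overrun_fraction > 0.50:    # Class I, Catastrophic
        return 5
    if overrun_fraction >= 0.15:   # Class II, Critical (15% itself is an assumption)
        return 4
    if overrun_fraction >= 0.02:   # Class III, Moderate
        return 3
    return 2                       # Class IV, Negligible


# A risk judged 70% likely that would cause a 30% cost overrun:
print(likelihood_scale(0.7), cost_consequence_scale(0.30))  # prints: 4 4
```

The same pattern would extend to the technical and schedule columns, but those are qualitative descriptions and need a human judgment rather than a threshold.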


A 5x5 Risk Matrix Provides a Quick Visual Comparison of All Project Risks

• High risk: mission success jeopardized, immediate action required
• Medium risk: review regularly, contingent action if the risk does not improve
• Low risk: watch and review periodically
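As an illustration of how a 5x5 matrix sorts risks into the three bands, here is a small Python sketch that rates each cell by the product of likelihood and consequence. The thresholds `high_at` and `medium_at` are assumptions chosen for the example; the slides do not state where the band boundaries fall, and each project defines its own.

```python
def risk_rating(likelihood: int, consequence: int,
                high_at: int = 15, medium_at: int = 6) -> str:
    """Rate one matrix cell as High, Medium or Low from likelihood x consequence."""
    for name, value in (("likelihood", likelihood), ("consequence", consequence)):
        if value not in range(1, 6):
            raise ValueError(f"{name} must be an integer from 1 to 5")
    score = likelihood * consequence
    if score >= high_at:        # assumed boundary for the High band
        return "High"
    if score >= medium_at:      # assumed boundary for the Medium band
        return "Medium"
    return "Low"


def render_matrix() -> str:
    """Draw the matrix as text: likelihood on rows (5 at top), consequence on columns."""
    rows = []
    for likelihood in range(5, 0, -1):
        cells = [risk_rating(likelihood, c)[0] for c in range(1, 6)]
        rows.append(f"{likelihood} " + " ".join(cells))
    rows.append("  1 2 3 4 5")
    return "\n".join(rows)


print(render_matrix())
```

With these assumed thresholds the top row reads `5 L M H H H`. Changing the two threshold arguments shows how sensitive the count of High risks is to where a project draws its boundaries.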


Top Risks and their Trends are Periodically Reviewed for the SOFIA Project

[Figure: SOFIA Risk Matrix, a 5x5 grid of Likelihood (1-5) versus Consequences (1-5) on which risks 1 through 8 are plotted]

Rank   Risk ID    Approach   Risk Title
1      DFRC-34    R          Landing Gear Door System Failure
2      DFRC-12    M          Sched Integration problems structure vs. avionics
3      DFRC-07    W          Cost growth for engine components
4      DFRC-24    A          Quality Control Resources insufficient
5      DFRC-01    W          Avionics software behind schedule
6      DFRC-11    R          Payload Capacity & Volume Trade-offs design issues
7      DFRC-04    R          Limited Flight Envelope, due to technical issues
8      DFRC-02    R          More flight testing may be required for Soft V&V

Legend
• Criticality (L x C): High, Med, Low
• Trend: Decreasing (Improving), Increasing (Worsening), Unchanged, New Since Last Period
• Approach: M - Mitigate, W - Watch, A - Accept, R - Research


Top Risks and their Trends are Periodically Reviewed for the Constellation SE&I

[Figure: 5x5 matrix of Likelihood (1-5) versus Consequence (1-5) on which risks 1 through 8 are plotted]

SE&I Top Risk List

Rank   Trend   ID - Title                                                       Owning Team     LIKE   SAFE   PERF   SCHED   COST
1      N       1677 - Ares I/Orion Ascent Aeroacoustic Environments             FP_SIG          4      4      5      5       5
2      N       1676 - Structural loads on CEV and LSAM during TLI               FP_SIG          4      5      5      4       4
3              1122 - Requirements Maturation                                   SE&I - PRIMO    2      0      2      2       2
4              1135 - Program Visibility for Closing the Architecture           SE&I - AT&A     3      0      4      0       4
5      N       1603 - (SRR) Abort Site Sea State Limits Launch Availability     SE&I_SOA        5      3      4      4       4
6              1125 - Software Development and Assurance                        CSI_SIG         4      3      3      3       3
7              1195 - CxP Lifecycle cost                                        SE&I_SOA        4      0      0      0       4
8              1046 - Tailoring of Human-Rating requirements                    SE&I_PTI_HR     3      0      0      3       3

LIKE is the likelihood score; SAFE, PERF, SCHED and COST are the consequence scores.

Legend
• Trend: Decreasing (Improving), Increasing (Worsening), Unchanged
• Risk level: Top Directorate Risk (TDR), Top Program Risk (TPR), Top Project Risk (TProjR)


The Status of the Most Significant Risks and Their Mitigation Options are Reviewed Periodically

• Title of risk
• Description or root cause
• Possible categorizations
  • System or subsystem
  • Cause category (technology, programmatic, cost, schedule, etc.)
  • Resources affected (budget, schedule slack, technical margins, etc.)
• Owner
• Assessment of implementation risk or mission risk
  • Likelihood: estimate of the probability of the risk event
  • Consequences: estimate of the performance, cost, safety and schedule effects
• Mitigation
  • Description, including costs of mitigation options
  • Mitigation option leverage or reduction in the assessed risk
  • Current mitigation activities
  • Current trends in risk significance (likelihood and impact)
• Significant milestones
  • Opening and closing of the window of occurrence
  • Decision points for mitigation implementation effectiveness
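One way to picture the review content listed above is as a record per risk, with mitigation options compared by their leverage. The Python sketch below is illustrative only. The class and field names are hypothetical, the likelihood and cost values are invented for the example, and "leverage" is assumed here to mean the reduction in the likelihood x consequence score per unit of mitigation cost, which is one plausible reading of the slide rather than a definition it gives.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MitigationOption:
    description: str
    cost: float               # any consistent unit, e.g. $K
    likelihood_after: int     # assessed 1-5 likelihood if the option is implemented
    consequence_after: int    # assessed 1-5 consequence if the option is implemented


@dataclass
class RiskRecord:
    title: str
    root_cause: str
    owner: str
    likelihood: int           # current 1-5 assessment
    consequence: int          # current 1-5 assessment
    options: List[MitigationOption] = field(default_factory=list)

    def score(self) -> int:
        """Current assessed risk as likelihood x consequence."""
        return self.likelihood * self.consequence

    def leverage(self, option: MitigationOption) -> float:
        """Assumed definition: score reduction per unit cost of the option."""
        if option.cost <= 0:
            raise ValueError("option cost must be positive")
        reduction = self.score() - option.likelihood_after * option.consequence_after
        return reduction / option.cost

    def best_option(self) -> Optional[MitigationOption]:
        """Option with the highest leverage, or None if no options are recorded."""
        return max(self.options, key=self.leverage, default=None)


# Hypothetical record; the numbers are invented for illustration.
record = RiskRecord(
    title="Avionics software behind schedule",
    root_cause="Requirements changes late in development",
    owner="Software lead",
    likelihood=4,
    consequence=4,
    options=[
        MitigationOption("Add two developers", cost=200.0,
                         likelihood_after=2, consequence_after=3),
        MitigationOption("Descope non-essential features", cost=50.0,
                         likelihood_after=3, consequence_after=3),
    ],
)
best = record.best_option()
print(best.description, round(record.leverage(best), 2))
```

In this example the cheaper option wins on leverage even though the more expensive one reduces the score further, which is the kind of trade the periodic review is meant to expose.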


Part 2 of Risk Module:
Fault Tree Analysis
Event Tree Analysis



Fault Tree Analysis Supports Design Decisions and Failure Investigations

• Fault Tree Analysis (FTA) uses a top-down symbolic logic model and estimates of the failure probabilities of ‘initiators’ to estimate the probability of occurrence of the pre-determined, undesirable ‘top’ event.
• An initiator is a credible undesirable event that is a contributing cause of the top event.
• ‘Cut sets’ are groups of initiators that, when they occur together, cause the top event.
• ‘Path sets’ are groups of initiators such that, if none of them occurs, the top event does not occur.
• FTA is both a design and a diagnostic tool:
  • As a design tool, FTA is used to compare alternative design solutions and the resulting top event probability.
  • As a diagnostic tool, FTA is used to investigate scenarios that may have led to the top event, leading to an estimate of the most likely cut sets.
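To make the terms above concrete, the following Python sketch evaluates a small fault tree built from AND and OR gates and lists its minimal cut sets. It is a simplified illustration, not the method of any particular tool. It assumes that initiators are statistically independent and that no initiator appears in more than one branch; the example tree and its probabilities are hypothetical.

```python
from dataclasses import dataclass
from itertools import product
from math import prod
from typing import FrozenSet, List, Tuple, Union


@dataclass(frozen=True)
class Initiator:
    name: str
    probability: float        # estimated probability that the initiator occurs


@dataclass(frozen=True)
class Gate:
    kind: str                 # "AND" or "OR"
    inputs: Tuple["Node", ...]


Node = Union[Initiator, Gate]


def probability(node: Node) -> float:
    """Probability of the node's event, assuming independent, non-repeated initiators."""
    if isinstance(node, Initiator):
        return node.probability
    inputs = [probability(child) for child in node.inputs]
    if node.kind == "AND":    # all inputs must occur
        return prod(inputs)
    if node.kind == "OR":     # at least one input occurs
        return 1.0 - prod(1.0 - p for p in inputs)
    raise ValueError(f"unknown gate kind: {node.kind}")


def cut_sets(node: Node) -> List[FrozenSet[str]]:
    """Minimal groups of initiators that together cause the node's event."""
    if isinstance(node, Initiator):
        return [frozenset([node.name])]
    child_sets = [cut_sets(child) for child in node.inputs]
    if node.kind == "OR":     # any one child's cut set is sufficient
        combined = [s for sets in child_sets for s in sets]
    elif node.kind == "AND":  # need one cut set from every child
        combined = [frozenset().union(*combo) for combo in product(*child_sets)]
    else:
        raise ValueError(f"unknown gate kind: {node.kind}")
    unique = set(combined)
    minimal = [s for s in unique if not any(other < s for other in unique)]
    return sorted(minimal, key=lambda s: (len(s), sorted(s)))


# Hypothetical top event: loss of coolant flow.
top = Gate("OR", (
    Gate("AND", (Initiator("pump A fails", 0.1), Initiator("pump B fails", 0.1))),
    Initiator("power bus fails", 0.01),
))
print(round(probability(top), 4))   # 0.0199
print(cut_sets(top))
```

Used as a design tool, the same functions can be run on two candidate trees (for example with and without the redundant pump) to compare top event probabilities.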


Fault Tree Analysis

Fault tree analysis is a graphical representation of the combination of faults that will result in the occurrence of some (undesired) top event.
In the construction of a fault tree, successive subordinate failure events are identified and logically linked to the top event.
The linked events form a tree structure connected by symbols called gates.


Refer to NASA Reference Publication 1358: System Engineering “Toolbox” for Design-Oriented Engineers

Section 3.6: Fault Tree Analysis (Handout)
Particular points:
• And/Or Gates explanation
• Example Fault Tree (Fig 3-20)


Event Trees

• Event trees can be viewed as a special case of fault trees, where the branches are all ORs weighted by their probabilities.
• Event trees are generated both in the success and failure domains.
• This technique explores system responses to an initiating “challenge” and enables assessment of the probability of an unfavorable or favorable outcome. The system challenge may be a failure or fault, an undesirable event, or a normal system operating command.
• In constructing the event tree, one traces each path to eventual success or failure.
• This technique is typically performed in phase C but may also be performed in phase B.
• See NASA Reference Publication 1358: System Engineering “Toolbox” for Design-Oriented Engineers, section 3.8, for additional discussion.
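The path-tracing idea described above can be sketched in a few lines of Python. The example below enumerates every path through a simple event tree and multiplies the branch probabilities along each one. It assumes the branch points are independent of one another and that every branch point is evaluated on every path (real event trees often stop a path at the first terminal failure). The initiating challenge, branch names and probabilities are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple

Branch = Tuple[str, float]                 # (system response, probability it succeeds)
Path = Tuple[Tuple[str, bool], ...]        # ((response name, succeeded?), ...)


def event_tree(challenge_probability: float,
               branches: Sequence[Branch]) -> List[Tuple[Path, float]]:
    """Return every path through the tree together with its probability."""
    outcomes = []
    for states in product((True, False), repeat=len(branches)):
        p = challenge_probability
        for (_, p_success), succeeded in zip(branches, states):
            p *= p_success if succeeded else 1.0 - p_success
        path = tuple((name, ok) for (name, _), ok in zip(branches, states))
        outcomes.append((path, p))
    return outcomes


# Hypothetical challenge: loss of primary power, probability 0.01 per mission.
branches = [("backup battery engages", 0.95), ("safe mode entered", 0.90)]
outcomes = event_tree(0.01, branches)

# Assumption for this example: the outcome is favorable only if every response succeeds.
favorable = sum(p for path, p in outcomes if all(ok for _, ok in path))
unfavorable = sum(p for path, p in outcomes if not all(ok for _, ok in path))

for path, p in outcomes:
    print(path, round(p, 6))
print("favorable:", round(favorable, 6), "unfavorable:", round(unfavorable, 6))
```

The path probabilities always sum to the probability of the initiating challenge, which is a useful check when building a tree by hand.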


Will the Stage Make it from Hangman’s Hill to Placer Gulch?

Station    Probability of no horses
1, 2, 3    0.2
4          0.1

[Figure: Placer Gulch event tree example, from a Safety & Mission Assurance training course by Pat Clemons of Sverdrup]
Fault Tree Analysis of the Placer Gulch Stage



Part 3 of Risk Module:
Failure Mode Effects Analysis



Failure Mode Effects Analysis

• Objective
  • To ensure all failure modes have been identified and evaluated
• Technique
  • Select a method to rank project failure modes
  • Identify failure modes, including all single point failure modes
  • Analyze failure modes and their mission effect
  • Determine those failure modes that might benefit from corrective action, e.g.,
    – Alternative designs
    – Redundancy
    – Increased reliability
  • Determine which, if any, corrective actions will be implemented


Failure Mode Effects Analysis

• FMEA is a design tool for identifying risk in the system or mission design, with the intent of mitigating those risks with design changes. The FMEA risk mitigation:
  1. Recognizes and evaluates the potential failure of a system and its effects;
  2. Identifies actions which could eliminate or reduce the chance of a potential failure occurring.
• FMEA is initiated in Phase B (Preliminary Design) and used to support design decisions in Phase C (Final Design).


Failure Mode and Effects Analysis

FMEA worksheet columns and the question each one answers:

• Item / Function: What are the functions or requirements?
• Potential Failure Mode: What can go wrong?
  - No Function
  - Partially Degraded Function
  - Intermittent Function
  - Unintended Function
• Potential Effects of Failure: What are the effects?
• Sev (severity): How bad is it?
• Class
• Potential Causes/Mechanism(s) of Failure: What are the cause(s)?
• Occur (occurrence): How often does it happen?
• Current Controls, Prevention/Detection: How can this be prevented and detected?
• Detect (detection): How good is this method at detecting it?
• RPN (risk priority number)
• Recommended Action(s): What can be done?
  - Design changes
  - Process changes
  - Special controls
  - Changes to standards, procedures, or guides
• Responsibility & Target Completion Date: Who is going to do it and when?
• Action Results (Actions Taken, Sev, Occ, Det, RPN): What did they do and what are the outcomes?
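The RPN column in the worksheet is conventionally the product of the severity, occurrence and detection ratings, and it is recomputed in the Action Results columns once corrective actions are taken. The Python sketch below shows that calculation and a ranking by RPN. The 1-10 rating range is a common convention assumed here rather than something the slide states, and the failure modes and ratings are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FailureMode:
    item: str
    mode: str
    severity: int      # how bad is it? (assumed 1-10, 10 is worst)
    occurrence: int    # how often does it happen? (assumed 1-10, 10 is most frequent)
    detection: int     # how good is detection? (assumed 1-10, 10 is least detectable)

    def __post_init__(self) -> None:
        for name in ("severity", "occurrence", "detection"):
            if getattr(self, name) not in range(1, 11):
                raise ValueError(f"{name} must be an integer from 1 to 10")

    @property
    def rpn(self) -> int:
        """Risk priority number: severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection


def rank_by_rpn(modes: List[FailureMode]) -> List[FailureMode]:
    """Highest RPN first, so corrective action effort can be prioritized."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


# Hypothetical worksheet rows.
worksheet = [
    FailureMode("Propellant valve", "Stuck closed", severity=9, occurrence=3, detection=4),
    FailureMode("Pressure sensor", "Output drift", severity=5, occurrence=6, detection=7),
    FailureMode("Harness connector", "Corrosion", severity=4, occurrence=2, detection=2),
]
for fm in rank_by_rpn(worksheet):
    print(fm.item, fm.mode, fm.rpn)

# After a corrective action (e.g. an added self-test improves detection), re-rate the row.
after_action = FailureMode("Pressure sensor", "Output drift", severity=5, occurrence=6, detection=2)
print("RPN before:", worksheet[1].rpn, "after:", after_action.rpn)  # before 210, after 60
```

A high RPN is a prompt for attention, not a complete ranking on its own: a failure mode with a very high severity rating usually warrants review even when its RPN is modest.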


Module Summary: Risk

• Risk is inevitable; risks can be reduced but not eliminated.
• Risk management is a proactive, systematic approach to assessing risks, generating alternatives and reducing cumulative project risk.
• Fault Tree Analysis is both a design and a diagnostic tool that uses estimates of the failure probabilities of initiators to estimate the probability of the pre-determined, undesirable ‘top’ event.
• Failure Mode Effects Analysis is a design tool for identifying risk in the system design, with the intent of mitigating those risks with design changes.


Backup Slides for Risk Module


Uncertainties that Plague Projects

Mission Objectives
  Uncertainties:
  • Will the baseline system satisfy the needs & objectives?
  • Are they the best ones?
  Offsets:
  • Thorough study
  • Analyses
  • Cost & schedule credibility

Technical Factors
  Uncertainties:
  • Can baseline technology achieve the objectives?
  • Can the specified technology be attained?
  • Are all the requirements known?
  Offsets:
  • Technology development plan
  • Paper studies
  • Design reviews
  • Establish performance margins
  • Engineering model test and prototyping
  • Test & evaluation

Internal Factors
  Uncertainties:
  • Can the plan and strategy meet the objectives?
  Offsets:
  • Resources (manpower skills, time, facilities)
  • Program strategy
  • Budget allocations
  • Contingency planning

External Factors
  Uncertainties:
  • Will outside influences jeopardize the project?
  Offsets:
  • Contingency
  • Robust design
Project Risk Categories

Typical Technical Risk Sources
• Physical properties
• Material properties
• Radiation properties
• Testing/Modeling
• Integration/Interface
• Software design
• Safety
• Requirement changes
• Fault detection
• Operating environment
• Proven/Unproven technology
• System complexity
• Unique/Special resources
• COTS performance
• Embedded training

Typical Programmatic Risk Sources
• Material availability
• Personnel availability
• Personnel skills
• Safety
• Security
• Environmental impact
• Communication problems
• Labor strikes
• Requirement changes
• Stakeholder advocacy
• Contractor stability
• Funding continuity and profile
• Regulatory changes

Typical Supportability Risk Sources
• Reliability and maintainability
• Training
• Operations and support
• Manpower considerations
• Facility considerations
• Interoperability considerations
• System safety
• Technical data

Typical Cost Risk Sources
• Sensitivity to technical risk
• Sensitivity to programmatic risk
• Sensitivity to supportability risk
• Sensitivity to schedule risk
• Labor rates
• Estimating error

Typical Schedule Risk Sources
• Sensitivity to technical risk
• Sensitivity to programmatic risk
• Sensitivity to supportability risk
• Sensitivity to cost risk
• Degree of concurrency
• Number of critical path items
• Estimating error
