Risk Analysis For Information and Systems Engineering: INSE 6320 - Week 6


INSE 6320 -- Week 6

Risk Analysis for Information and Systems Engineering

Reliability
Expert Opinion
Midterm Review

Dr. A. Ben Hamza

Concordia University

Reliability Theory

Let T be a random variable representing the failure time or lifetime of a physical system. For this system, the probability that it will fail by time t is:

$$F(t) = P[T \le t] = \int_0^t f(u)\,du$$

The probability of the system surviving until time t is:

$$R(t) = P[T > t] = 1 - F(t) = \int_t^\infty f(u)\,du$$

Failure rate: the probability, per unit time, that a failure will occur in the interval [t1, t2] given that a failure has not occurred before time t1. This is written as:

$$\frac{P[t_1 \le T < t_2 \mid T \ge t_1]}{t_2 - t_1} = \frac{P[t_1 \le T < t_2]}{(t_2 - t_1)\,P[T \ge t_1]} = \frac{F(t_2) - F(t_1)}{(t_2 - t_1)\,R(t_1)}$$
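The interval failure rate defined above can be evaluated numerically. The following is a minimal sketch, assuming a hypothetical lifetime distribution F(t) = 1 - exp(-t^2) chosen only for illustration (it is not from the slides); the function names are also illustrative. As the interval shrinks, the value approaches the instantaneous hazard, which for this assumed distribution is 2t.

```python
import math


def cdf(t):
    """Hypothetical lifetime CDF F(t) = 1 - exp(-t^2) (an assumption for illustration)."""
    return 1.0 - math.exp(-t * t)


def interval_failure_rate(F, t1, t2):
    """Return (F(t2) - F(t1)) / ((t2 - t1) * R(t1)), where R = 1 - F."""
    if t2 <= t1:
        raise ValueError("t2 must be greater than t1")
    survival_at_t1 = 1.0 - F(t1)
    return (F(t2) - F(t1)) / ((t2 - t1) * survival_at_t1)


if __name__ == "__main__":
    # Shrinking the interval [1, 1 + width] moves the rate toward h(1) = 2.
    for width in (1.0, 0.1, 0.001):
        print(width, interval_failure_rate(cdf, 1.0, 1.0 + width))
```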

Reliability

Reliability: The probability that an item will perform its intended function without
failure under stated conditions for a specified period of time.

Failure: The termination of the ability of the product to perform its intended function.

Reliability provides a quantitative statement of the chance that an item will operate without failure for a given period of time in the environment for which it was designed.

In its simplest and most general form, reliability is the probability of success.

To perform reliability calculations, reliability must first be defined explicitly. It is not enough to say that reliability is a probability. A probability of what?

Reliability is performance over time: the probability that something will work when you want it to.

Reliability Terms

Mean Time To Failure (MTTF) for non-repairable systems
Mean Time Between Failures (MTBF) for repairable systems
Reliability Probability (survival) R(t)
Failure Probability (cumulative distribution function) F(t) = 1 - R(t)
Failure Probability Density f(t)
Failure Rate function (hazard rate) h(t)

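Important Relationships:

$$R(t) + F(t) = 1$$

$$f(t) = h(t)\exp\left(-\int_0^t h(u)\,du\right) = \frac{dF(t)}{dt}$$

$$R(t) = 1 - F(t) = \exp\left(-\int_0^t h(u)\,du\right)$$

$$\text{MTTF} = \int_0^\infty t f(t)\,dt = \int_0^\infty R(t)\,dt$$

$$F(t) = \int_0^t f(u)\,du$$

$$h(t) = \frac{f(t)}{R(t)}$$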
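These relationships can be checked numerically. The sketch below assumes a hypothetical hazard h(t) = 2t (not from the slides), builds R(t) and f(t) from it, and computes the MTTF both as the integral of R(t) and as the integral of t f(t). The helper names, the trapezoid rule, and the truncation of the infinite integral at t = 10 are all illustrative choices.

```python
import math


def hazard(t, a=2.0):
    """Hypothetical linearly increasing hazard h(t) = a * t (an assumption)."""
    return a * t


def trapezoid(func, lo, hi, steps):
    """Composite trapezoid rule for the integral of func over [lo, hi]."""
    width = (hi - lo) / steps
    total = 0.5 * (func(lo) + func(hi))
    for i in range(1, steps):
        total += func(lo + i * width)
    return total * width


def reliability(t):
    """R(t) = exp(-integral from 0 to t of h(u) du)."""
    if t <= 0.0:
        return 1.0
    return math.exp(-trapezoid(hazard, 0.0, t, steps=200))


def density(t):
    """f(t) = h(t) * R(t)."""
    return hazard(t) * reliability(t)


def mttf_from_reliability(upper=10.0):
    """MTTF as the integral of R(t), truncated at `upper`."""
    return trapezoid(reliability, 0.0, upper, steps=2000)


def mttf_from_density(upper=10.0):
    """MTTF as the integral of t * f(t), truncated at `upper`."""
    return trapezoid(lambda t: t * density(t), 0.0, upper, steps=2000)


if __name__ == "__main__":
    print(mttf_from_reliability(), mttf_from_density())
```

For this assumed hazard R(t) = exp(-t^2), so both integrals should be close to sqrt(pi)/2.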

Failure Distribution function (or unreliability): Probability that the product fails at some time prior to t.

$$F(t) = P(T \le t)$$

Failure Density function: f(t) is the probability density of the failure time; f(t) Δt is approximately the probability that the product fails in the short interval (t, t + Δt].

$$f(t) = \frac{dF(t)}{dt} = -\frac{dR(t)}{dt} = -R'(t)$$

Reliability function: Probability that the item does not fail before time t.

$$R(t) = P(T > t) = 1 - F(t)$$

Hazard function: Measure of proneness to failure as a function of age, t.

$$h(t) = \lim_{\Delta t \to 0} \frac{P(t < T \le t + \Delta t \mid T > t)}{\Delta t} = \frac{f(t)}{R(t)} = -\frac{R'(t)}{R(t)} = -\frac{d \log R(t)}{dt}$$

Cumulative hazard: Cumulative number of failures at time t.

$$H(t) = \int_0^t h(u)\,du = -\log R(t)$$

Example: Exponential Model

Constant Failure Rate

$$h(t) = \frac{f(t)}{R(t)} = \lambda$$

$$f(t) = \lambda \exp(-\lambda t), \quad \lambda > 0,\ t \ge 0$$

$$R(t) = \exp(-\lambda t) = 1 - F(t)$$

$$R(x \mid t) = P(T > t + x \mid T > t) = \frac{e^{-\lambda (t + x)}}{e^{-\lambda t}} = e^{-\lambda x} = R(x)$$

$$\text{MTTF} = \frac{1}{\lambda}$$

$$R(\text{MTTF}) = e^{-\lambda\,\text{MTTF}} = e^{-1} = 0.367879$$

Memory-less property implies that a used unit is just as reliable as one that is new; i.e., there is no wear-out.
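For the exponential model, the memory-less property and the value of the reliability at t = MTTF can be confirmed with a few lines of code. This is a minimal sketch; the failure rate of 0.002 per hour and the function names are hypothetical.

```python
import math


def reliability(t, lam):
    """Exponential reliability R(t) = exp(-lam * t)."""
    return math.exp(-lam * t)


def conditional_reliability(x, t, lam):
    """R(x | t) = P(T > t + x | T > t) = R(t + x) / R(t)."""
    return reliability(t + x, lam) / reliability(t, lam)


lam = 0.002        # hypothetical constant failure rate, per hour
mttf = 1.0 / lam   # MTTF of the exponential model

if __name__ == "__main__":
    # A unit that has already run 5000 hours versus a new unit, over the next 100 hours.
    print(conditional_reliability(100.0, 5000.0, lam), reliability(100.0, lam))
    # Reliability at t = MTTF equals exp(-1).
    print(reliability(mttf, lam))
```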

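MTTF and MTBF

One of the measures of the system's reliability is the mean time to failure (MTTF). It should not be confused with the mean time between failures (MTBF).
We refer to the expected time between two successive failures as the MTTF when the system is non-repairable.
When the system is repairable, we refer to it as the MTBF.
For a repairable item, MTBF is the ratio of the cumulative operating time to the number of failures for that item.

$$\text{MTBF} = \text{MTTF} = \int_0^\infty R(t)\,dt = \int_0^\infty t f(t)\,dt = E(T)$$

Example (repairable system): A motor is repaired and returned to service six times during its life and provides 45,000 hours of service. Calculate MTBF.

$$\text{MTBF} = \frac{\text{Total operating time}}{\text{Number of failures}} = \frac{45000}{6} = 7500 \text{ hours}$$

Example: Weibull Model

$$f(t) = \frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta - 1} \exp\left[-\left(\frac{t}{\eta}\right)^{\beta}\right], \quad \beta > 0,\ \eta > 0,\ t \ge 0$$

$$R(t) = \exp\left[-\left(\frac{t}{\eta}\right)^{\beta}\right] = 1 - F(t)$$

$$h(t) = \frac{f(t)}{R(t)} = \frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta - 1}$$

$$\text{MTTF} = \eta \int_0^\infty t^{1/\beta} e^{-t}\,dt = \eta\,\Gamma\left(1 + \frac{1}{\beta}\right)$$

$\beta$ is the Shape Parameter and $\eta$ is the Characteristic Lifetime (the time at which the survival probability is 1/e).

Versatility of Weibull Model

Failure Rate:

$$h(t) = \frac{f(t)}{R(t)} = \frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta - 1}$$

[Figure: failure rate h(t) versus time t, showing the Early Life Region (0 < β < 1), the Constant Failure Rate Region (β = 1), and the Wear-Out Region (β > 1).]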
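The Weibull quantities can be sketched in code to show how the shape parameter controls the behaviour of the failure rate. The symbols beta (shape) and eta (characteristic life) and the numeric values are assumptions made for illustration; `math.gamma` is the standard-library gamma function. The hazard is evaluated only for t > 0, since it is unbounded at t = 0 when beta < 1.

```python
import math


def weibull_reliability(t, beta, eta):
    """R(t) = exp(-(t / eta) ** beta)."""
    return math.exp(-((t / eta) ** beta))


def weibull_hazard(t, beta, eta):
    """h(t) = (beta / eta) * (t / eta) ** (beta - 1), for t > 0."""
    return (beta / eta) * (t / eta) ** (beta - 1.0)


def weibull_mttf(beta, eta):
    """MTTF = eta * Gamma(1 + 1 / beta)."""
    return eta * math.gamma(1.0 + 1.0 / beta)


if __name__ == "__main__":
    eta = 1000.0  # hypothetical characteristic life, hours
    for beta in (0.5, 1.0, 2.0):  # early life, constant rate, wear-out
        early = weibull_hazard(100.0, beta, eta)
        late = weibull_hazard(900.0, beta, eta)
        print(beta, early, late, weibull_mttf(beta, eta))
```

With beta = 1 the hazard is constant and the MTTF equals eta, which recovers the exponential model.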


Failure Rate Function

Increasing failure rate (IFR) vs. decreasing failure rate (DFR): h(t) is increasing or h(t) is decreasing in t, respectively.

Examples
$h(t) = c$, where c is a constant
$h(t) = at$, where $a > 0$
$h(t) = \frac{1}{t + 1}$, for $t \ge 0$
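As a worked check on these three examples, the reliability function follows from each hazard through the relationship R(t) = exp(-∫ h(u) du) taken from 0 to t. The derivation below is a sketch that uses only that relationship:

```latex
\begin{align*}
h(t) = c &\;\Rightarrow\; R(t) = \exp\left(-\int_0^t c\,du\right) = e^{-ct}
  && \text{(constant failure rate)}\\
h(t) = at,\ a > 0 &\;\Rightarrow\; R(t) = \exp\left(-\int_0^t au\,du\right) = e^{-at^2/2}
  && \text{(IFR)}\\
h(t) = \frac{1}{t+1} &\;\Rightarrow\; R(t) = \exp\left(-\int_0^t \frac{du}{u+1}\right) = e^{-\ln(t+1)} = \frac{1}{t+1}
  && \text{(DFR)}
\end{align*}
```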


System Reliability Evaluation


A system (or a product) is a collection of components arranged according to a specific design in order to achieve desired functions with acceptable performance and reliability measures.

Clearly, the type of components used, their qualities, and the design configuration in which they are arranged have a direct effect on the system performance and its reliability. For example, a designer may use a smaller number of high-quality components and configure them in such a way as to result in a highly reliable system, or a designer may use a larger number of lower-quality components and configure them differently in order to achieve the same level of reliability.

Once the system is configured, its reliability must be evaluated and compared with
an acceptable reliability level. If it does not meet the required level, the system
should be redesigned and its reliability should be re-evaluated.



Reliability Block Diagram (RBD) Technique

The first step in evaluating a system's reliability is to construct a reliability block diagram, which is a graphical representation of the components of the system and how they are connected.
The purpose of the RBD technique is to represent failure and success criteria pictorially and to use the resulting diagram to evaluate system reliability.

Benefits:
The pictorial representation means that models are easily understood and therefore readily checked.
Block diagrams are used to identify the relationship between elements in the system. The overall system reliability can then be calculated from the reliabilities of the blocks using the laws of probability.
Block diagrams can be used for the evaluation of system availability provided that both the repair of blocks and failures are independent events, i.e. provided the time taken to repair a block is dependent only on the block concerned and is independent of repair to any other block.

Typical RBD configurations and related formulae

Series System

[Diagram: Input, then blocks A, B, C, ..., Z in series, then Output.]

The reliability of the system is given by

$$R(t) = R_A(t)\,R_B(t)\,R_C(t) \cdots R_Z(t)$$

The interpretation can be stated as any unit failing causes the system as a whole to fail.

Parallel System

[Diagram: blocks X and Y in parallel between Input and Output.]

The reliability of the system is given by:

$$R(t) = 1 - (1 - R_X(t))(1 - R_Y(t))$$

The units X and Y are operating in such a way that the system will survive as long as at least one of the units survives.

System Configuration Models


Typical RBD configurations and related formulae

Series/Parallel System

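When blocks such as X and Y themselves comprise sub-blocks in series, the block diagram takes the form shown below.

[Diagram: two parallel branches between Input and Output; branch X is the series chain A1, B1, C1, ..., Z1 and branch Y is the series chain A2, B2, C2, ..., Z2.]

$$R_X(t) = R_{A1}(t)\,R_{B1}(t)\,R_{C1}(t) \cdots R_{Z1}(t)$$

$$R_Y(t) = R_{A2}(t)\,R_{B2}(t)\,R_{C2}(t) \cdots R_{Z2}(t)$$

Thus, the reliability of the system is given by

$$R(t) = 1 - (1 - R_X(t))(1 - R_Y(t))$$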
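The series and parallel formulae compose directly, which makes a series/parallel system easy to evaluate in code. The sketch below assumes the blocks fail independently (the product formulae rely on this) and uses hypothetical component reliabilities at some fixed time t; the function names are illustrative.

```python
from math import prod


def series(reliabilities):
    """Series blocks: the system works only if every block works."""
    return prod(reliabilities)


def parallel(reliabilities):
    """Parallel blocks: the system fails only if every block fails."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


# Hypothetical component reliabilities at a fixed time t.
branch_x = series([0.95, 0.90])   # sub-blocks A1, B1 in series
branch_y = series([0.85, 0.80])   # sub-blocks A2, B2 in series
system = parallel([branch_x, branch_y])

if __name__ == "__main__":
    print(round(branch_x, 4), round(branch_y, 4), round(system, 4))
```

With these assumed values the two branches have reliabilities 0.855 and 0.68, and the parallel combination is approximately 0.9536, higher than either branch alone.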


Software Reliability

Basic definitions:
Software reliability: probability that the software will not cause a failure for some specified time.
Failure: divergence in expected external behavior.
Fault: cause/representation of an error, i.e., a bug.
Error: a programmer mistake (misinterpretation of specifications?).

Basic question: How to estimate the growth in software reliability as its errors are being removed?

Major issues:
testing (how much? when to stop?)
field use (# of trained personnel? support staff?)

Software reliability growth models: observe past failure history and give an estimate of the future failure behavior; about 40 models have been proposed.

Software Reliability Models

Software reliability models can be classified into many different groups; some of the more prominent (better known) groups include:

Error seeding: estimates the number of errors in a program. Errors are divided into indigenous errors and induced (seeded) errors. The unknown number of indigenous errors is estimated from the number of induced errors and the ratio of the two types of errors obtained from the testing data.

Reliability growth: measures and predicts the improvement of reliability through the testing process using a growth function to represent the process. Independent variables of the growth function could be time or the number of test cases (or testing stages), and the dependent variables can be reliability, failure rate or cumulative number of errors detected.
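The error-seeding idea can be illustrated with a simple ratio estimator. This sketch assumes that seeded and indigenous errors are equally likely to be found during testing, so the fraction of seeded errors found estimates the fraction of indigenous errors found. The slides do not give a formula, so the estimator, the function name, and the counts below are assumptions for illustration.

```python
def estimate_indigenous_errors(seeded_total, seeded_found, indigenous_found):
    """Estimate total indigenous errors as indigenous_found * seeded_total / seeded_found.

    Assumes seeded and indigenous errors are detected at the same rate.
    """
    if seeded_found <= 0:
        raise ValueError("at least one seeded error must be found")
    if seeded_found > seeded_total:
        raise ValueError("cannot find more seeded errors than were seeded")
    return indigenous_found * seeded_total / seeded_found


if __name__ == "__main__":
    # Hypothetical test campaign: 20 errors seeded, 15 of them found,
    # together with 30 indigenous errors.
    total = estimate_indigenous_errors(20, 15, 30)
    print(total, total - 30)  # estimated total and estimated remaining
```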

Reliability and Availability


A simple measure of reliability can be given as: MTBF = MTTF + MTTR, where
MTBF is mean time between failures
MTTF is mean time to failure
MTTR is mean time to repair
Availability can be defined as the probability that the system is still operating within requirements at a given point in time and can be given as:

$$\text{Availability} = \frac{\text{MTTF}}{\text{MTTF} + \text{MTTR}} \times 100\%$$

Availability is more sensitive to MTTR, which is an indirect measure of the maintainability of software.
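A short calculation shows how the availability formula responds to changes in MTTR. The figures are hypothetical and the function name is illustrative; the result is returned as a fraction rather than a percentage.

```python
def availability(mttf, mttr):
    """Availability = MTTF / (MTTF + MTTR), as a fraction between 0 and 1."""
    if mttf < 0 or mttr < 0 or mttf + mttr == 0:
        raise ValueError("MTTF and MTTR must be non-negative and not both zero")
    return mttf / (mttf + mttr)


if __name__ == "__main__":
    mttf = 990.0  # hypothetical mean time to failure, hours
    for mttr in (10.0, 5.0, 1.0):  # hypothetical mean times to repair, hours
        mtbf = mttf + mttr
        print(mttr, mtbf, round(100.0 * availability(mttf, mttr), 3))
```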


Software Reliability Models

Nonhomogeneous Poisson process (NHPP) models:
Provide an analytical framework for describing the software failure phenomenon during testing.
The main issue is to estimate the mean value function of the cumulative number of failures experienced up to a certain point in time.
A key example of this approach is the series of Musa models.

A typical measure (failures per unit time) is the failure intensity (rate) given as:

$$\lambda(t) = \frac{\#\text{ of failures in } [t, t + \Delta t]}{\Delta t}$$

where t = program CPU time (in a time shared computer) or wall clock time (in an embedded system).


Software Reliability Models

Software Reliability Growth models are generally black box: there is no easy way to account for a change in the operational profile.
Operational profile: description of the input events expected to occur in actual software operation, i.e. how it will be used in practice.
The consequence is that we are unable to go from test to field.

Many models have been proposed; perhaps the most prominent is the Musa Basic model:
Failure Intensity (FI) is the number of failures per unit time.
Assume that the decrement in the failure intensity (FI) function (the derivative with respect to the number of expected failures) is constant.
This implies that the FI is a function of the average number of failures experienced at any given point in time.

Example:
Assume a program will experience 100 failures in infinite time. It has now experienced 50 failures. The initial failure intensity was 10 failures/cpu hour.

The current failure intensity is:

$$\lambda(\mu) = \lambda_0\left[1 - \frac{\mu}{\nu_0}\right] = 10\left[1 - \frac{50}{100}\right] = 5 \text{ failures/cpu hour}$$

The number of failures experienced after 10 cpu hours is:

$$\mu(\tau) = \nu_0\left[1 - \exp\left(-\frac{\lambda_0}{\nu_0}\tau\right)\right], \qquad \mu(10) = 100\left[1 - \exp\left(-\frac{10}{100}\cdot 10\right)\right] = 100[1 - \exp(-1)] \approx 63 \text{ failures}$$

For 100 hours:

$$\mu(100) = 100\left[1 - \exp\left(-\frac{10}{100}\cdot 100\right)\right] = 100[1 - \exp(-10)] \approx 100 \text{ failures}$$
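The example can be reproduced in a few lines, which also makes it easy to try other inputs. This is a minimal sketch of the two Musa basic model formulae, failure intensity lambda0 * (1 - mu / nu0) and expected failures nu0 * (1 - exp(-lambda0 * tau / nu0)); the function and parameter names are illustrative.

```python
import math


def failure_intensity(mu, lambda0, nu0):
    """Failure intensity after mu expected failures: lambda0 * (1 - mu / nu0)."""
    return lambda0 * (1.0 - mu / nu0)


def expected_failures(tau, lambda0, nu0):
    """Expected failures by execution time tau: nu0 * (1 - exp(-lambda0 * tau / nu0))."""
    return nu0 * (1.0 - math.exp(-lambda0 * tau / nu0))


if __name__ == "__main__":
    lambda0 = 10.0  # initial failure intensity, failures per cpu hour
    nu0 = 100.0     # total failures over infinite time
    print(failure_intensity(50.0, lambda0, nu0))   # 5.0 failures per cpu hour
    print(expected_failures(10.0, lambda0, nu0))   # about 63.2 failures
    print(expected_failures(100.0, lambda0, nu0))  # just under 100 failures
```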

Musa Basic Model

$$\lambda(\mu) = \lambda_0\left[1 - \frac{\mu}{\nu_0}\right]$$

where:
$\lambda_0$ is the initial failure intensity at the start of execution.
$\mu$ is the average (expected) number of failures at any point in time.
$\nu_0$ is the total number of failures over infinite time.

The average number of failures at any point in time is given as:

$$\mu(\tau) = \nu_0\left[1 - \exp\left(-\frac{\lambda_0}{\nu_0}\tau\right)\right]$$

Expert Opinion

Expert Opinion techniques involve consultation with experts, who use their experience and understanding of the system to arrive at an estimate of its cost.
Only used when more objective techniques are not applicable
Used to corroborate or adjust objective data
Cross-check a historically based estimate
Use for high level, low fidelity estimating
Last resort

Tip: Expert opinion is the least regarded and most dangerous method, but it is seductively easy. Most lexicons do not even admit it as a technique, but it is included here for completeness.

Expert Opinion Advantages/Disadvantages

Advantages
An expert can factor in differences between past project experiences and new techniques, architectures or applications involved in the future project
Good cross-check of other estimates from a Subject Matter Expert (SME) point of view
Allows perspective on an estimate that may be overlooked without an SME

Disadvantages
Expert judgment is only as good as the estimator, who has his own biases
Completely subjective without use of other techniques
Low-to-nil credibility

How to obtain information

It helps to be a good lawyer and a good detective.
Ask clear, logical, probing questions.
Never simply ask a question and then just walk away; use the following approach:
Help the specialists think through their own answers.
    Do you mean?
    Would that be the same in another situation?
Ask questions in more than one way.
    Clarification.
    Their answers might change based on a clarification question.
Look for uncertainty in their answers.
    Was their response confident or reluctant?
Evaluate the information obtained.
    Does it make sense? Could you explain it to someone else?

What makes a good expert?

Credibility!
Someone who has the ear of the Program manager.
You should use the same person that the program manager relies upon for the most critical information.
Technical specialist or engineer who is knowledgeable about the program under question.

Main Points for Midterm Exam

What to Study
Some topics are more important than others.
Spend your time on the right stuff.
Don't waste time on topics we haven't emphasized in class.

How to Prepare for the midterm
Focus on the main topics.
Go back to your assignment. There is a lot of good review in there.
Make a list of your problem areas.
Eliminate any topics/problems not mentioned on the lecture slides.
Keep the class notes as a guideline.
Read the relevant textbook chapters on the covered topics.
Bring a calculator on the day of the exam.
Only one double-sided sheet of formulas and notes is allowed.

Midterm Exam Coverage: Lectures 1-to-5, and Assignment 1

Main Topics for Midterm Exam

Risk vs. Uncertainty

Probability of Events. Probability Distributions.

Individual Risk vs. Societal Risk

Weibull Analysis

Survival Analysis

Event Trees and F-N Curves

Fault Trees:
Block diagram for engineering systems (series, parallel, series-parallel, etc.)
Cut Sets and Minimal Cut Sets
Equivalent Fault Tree
Probability of Occurrence of Fault Events
Probability of Occurrence (System Failure) of Top Event
