Fault Tolerant Systems: Chapter 1: PRELIMINARIES

This document introduces key concepts in fault-tolerant systems. It defines faults, errors, and failures and classifies fault types. It describes different forms of redundancy used to build fault tolerance into systems, including hardware, information, time, and software redundancy. Traditional measures of fault tolerance like reliability and availability are discussed as well as network-specific measures. The outline previews topics to be covered in subsequent chapters.

Uploaded by

Kisalay Kumar

FAULT TOLERANT SYSTEMS

Chapter 1: PRELIMINARIES

OUTLINE

PRELIMINARIES
FAULT CLASSIFICATION
TYPES OF REDUNDANCY
BASIC MEASURES OF FAULT TOLERANCE
TRADITIONAL MEASURES
NETWORK MEASURES

PRELIMINARIES
Computer systems, both hardware and software, are the most complex systems ever created by human beings.
Critical applications: space shuttle, financial systems, medical instruments, etc.
Fault tolerance: techniques to tolerate faults while still delivering an acceptable level of service for the intended objectives of the system.

FAULT CLASSIFICATION
FAULT → ERROR → FAILURE
Fault: hardware defect or software/programming mistake.
Error: manifestation of a fault.
Failure: the system does not achieve its intended objective.

A fault or error may spread through the system.
Containment zone: a barrier that reduces the chance that a fault or error in one zone propagates to another.

FAULT CLASSIFICATION
FAULT CHARACTERISTICS
Permanent: a permanent defect.
Transient: the component malfunctions for some time and its functionality is restored afterward.
Intermittent: the fault oscillates between quiescent and active.

OTHER CHARACTERISTICS
Benign.
Malicious: the output appears reasonable but is incorrect.

TYPES OF REDUNDANCY
REDUNDANCY:
The property of having more of a resource than is minimally necessary to do the job.
When there is a fault, redundancy masks or works around it.

FORMS OF REDUNDANCY:
Hardware redundancy (static and dynamic): incorporate extra hardware into the design to either detect or override the effects of a failed component.

TYPES OF REDUNDANCY
FORMS OF REDUNDANCY (cont.):
Information redundancy: error detection and correction.
Time redundancy: re-execution on the same hardware or of the same program.
Software redundancy: multiple versions of a program.
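
The forms of redundancy listed above can be illustrated with a minimal Python sketch. It is an illustration under assumptions, not a method taken from these slides: the function names are hypothetical, a single even-parity bit stands in for information redundancy, run-twice-and-compare stands in for time redundancy, and a majority vote stands in for masking across redundant hardware modules or program versions.

```python
from collections import Counter
from typing import Callable, Sequence


def add_even_parity(bits: Sequence[int]) -> list[int]:
    """Information redundancy: append one bit so the total number of 1s is even."""
    return list(bits) + [sum(bits) % 2]


def parity_ok(word: Sequence[int]) -> bool:
    """Detects any odd number of flipped bits; it cannot correct them."""
    return sum(word) % 2 == 0


def run_twice_and_compare(task: Callable[[], int]) -> int:
    """Time redundancy: re-execute the same task and compare the two results."""
    first, second = task(), task()
    if first != second:
        raise RuntimeError("results disagree: a transient fault is suspected")
    return first


def majority_vote(results: Sequence[int]) -> int:
    """Masking: return the value produced by more than half of the redundant units."""
    value, count = Counter(results).most_common(1)[0]
    if count <= len(results) // 2:
        raise RuntimeError("no majority among redundant results")
    return value


if __name__ == "__main__":
    word = add_even_parity([1, 0, 1, 1])
    word[2] ^= 1                      # inject a single-bit error
    print(parity_ok(word))            # the parity check flags the corrupted word
    print(majority_vote([7, 7, 9]))   # one faulty unit out of three is outvoted
```

Parity only detects the error, whereas the vote masks it; that difference mirrors the "detect or override" wording used for hardware redundancy.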

BASIC MEASURES OF FT
MEASURE
A mathematical abstraction that expresses some relevant facet of the performance of an object. It usually captures only a subset of the object's properties.

TYPES:
Traditional.
Network.

BASIC MEASURES OF FT
RELIABILITY AND AVAILABILITY: very limited in what they can express.
Reliability R(t): probability that the system has been up (operational) continuously in the time interval [0, t].
Mean Time To Failure (MTTF): average time the system operates until a failure occurs.
Mean Time Between Failures (MTBF): average time between two consecutive failures.
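
As a rough illustration of R(t) and MTTF, the sketch below estimates both from a set of observed times to failure. The data values and function names are made up for this example; the estimate of R(t) is simply the fraction of observed units that stayed up throughout [0, t], and no particular failure distribution is assumed.

```python
from statistics import mean
from typing import Sequence


def empirical_reliability(lifetimes: Sequence[float], t: float) -> float:
    """Fraction of observed units that were up continuously over [0, t]."""
    return sum(1 for life in lifetimes if life > t) / len(lifetimes)


def estimate_mttf(lifetimes: Sequence[float]) -> float:
    """Sample mean of the observed times to failure."""
    return mean(lifetimes)


if __name__ == "__main__":
    # Hypothetical times to first failure, in hours (illustrative data only).
    lifetimes_h = [120.0, 340.0, 95.0, 410.0, 230.0]
    print(empirical_reliability(lifetimes_h, 100.0))  # 4 of the 5 units outlast 100 h
    print(estimate_mttf(lifetimes_h))
```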

BASIC MEASURES OF FT
Mean Time To Repair (MTTR): time needed to repair the system following the first failure.
MTBF = MTTF + MTTR
Availability A(t): average fraction of time over the interval [0, t] that the system is up (operational).
Point availability A_P(t): probability that the system is up at a particular time instant t.
Long-term (steady-state) availability:
A = \lim_{t \to \infty} A(t)


BASIC MEASURES OF FT
Long-term availability may be calculated from MTTF, MTBF, and MTTR:
A = \frac{MTTF}{MTBF} = \frac{MTTF}{MTTF + MTTR}

It is possible for a low-reliability system to have high availability: a system that fails every hour on average but comes back up after only a second has an MTBF of one hour (low reliability), but its availability is high: A = 3599/3600 = 0.99972.
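
The arithmetic in this example can be checked with a short Python sketch. The function name is hypothetical; the inputs follow the slide's scenario, with MTBF = 3600 s and MTTR = 1 s, so that MTTF = MTBF - MTTR = 3599 s.

```python
def steady_state_availability(mttf: float, mttr: float) -> float:
    """Long-term availability A = MTTF / (MTTF + MTTR)."""
    if mttf < 0 or mttr < 0 or mttf + mttr == 0:
        raise ValueError("MTTF and MTTR must be non-negative and not both zero")
    return mttf / (mttf + mttr)


if __name__ == "__main__":
    mtbf_s = 3600.0            # one failure per hour on average
    mttr_s = 1.0               # back up after one second
    mttf_s = mtbf_s - mttr_s   # from MTBF = MTTF + MTTR
    a = steady_state_availability(mttf_s, mttr_s)
    print(f"{a:.5f}")          # 0.99972
    # Expected downtime over a 365-day year, in seconds.
    print(round((1.0 - a) * 365 * 24 * 3600))
```

Even at this availability the system is down for roughly 8760 seconds (about 2.4 hours) per year, spread over thousands of one-second outages. That is why availability alone says little about how often service is interrupted.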

BASIC MEASURES OF FT
NETWORK MEASURES:
These focus on the network that connects the processors together.
Node and line connectivity: the minimum number of nodes and lines, respectively, that have to fail before the network becomes disconnected.
Connectivity can only distinguish between two network states: connected and disconnected. It says nothing about how the network degrades as nodes fail before, or after, becoming disconnected.
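
For small networks, node and line connectivity can be computed directly from the definition by trying every set of failed nodes or lines in order of increasing size. The sketch below does exactly that. It is a brute-force illustration with hypothetical function names, practical only for a handful of nodes, and it assumes the common convention that a complete graph on n nodes has node connectivity n - 1.

```python
from itertools import combinations
from typing import Hashable, Iterable, Sequence, Tuple

Edge = Tuple[Hashable, Hashable]


def is_connected(nodes: Iterable[Hashable], edges: Iterable[Edge]) -> bool:
    """True if the surviving nodes form one connected component (or at most one node)."""
    alive = set(nodes)
    if len(alive) <= 1:
        return True
    adj = {n: set() for n in alive}
    for u, v in edges:
        if u in alive and v in alive:
            adj[u].add(v)
            adj[v].add(u)
    start = next(iter(alive))
    seen, stack = {start}, [start]
    while stack:
        for nb in adj[stack.pop()]:
            if nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return seen == alive


def node_connectivity(nodes: Sequence[Hashable], edges: Sequence[Edge]) -> int:
    """Smallest number of node failures that disconnects the network."""
    for k in range(len(nodes) - 1):
        for failed in combinations(nodes, k):
            if not is_connected(set(nodes) - set(failed), edges):
                return k
    return max(len(nodes) - 1, 0)  # convention for complete graphs


def line_connectivity(nodes: Sequence[Hashable], edges: Sequence[Edge]) -> int:
    """Smallest number of line failures that disconnects the network."""
    for k in range(len(edges) + 1):
        for failed in combinations(edges, k):
            remaining = [e for e in edges if e not in failed]
            if not is_connected(nodes, remaining):
                return k
    return len(edges)


if __name__ == "__main__":
    ring_nodes = [0, 1, 2, 3]
    ring_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
    print(node_connectivity(ring_nodes, ring_edges))  # 2
    print(line_connectivity(ring_nodes, ring_edges))  # 2
```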

BASIC MEASURES OF FT
NETWORK MEASURES:

Both networks, N1 and N2, have the same node connectivity of 1.
But N1 is much more connected than N2: the probability of N1 being broken up is lower than for N2.
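
The point can be made concrete with a small calculation. The figure showing N1 and N2 is not available here, so the two networks below are hypothetical stand-ins chosen only because both have node connectivity 1: a "bow-tie" of two triangles sharing one node, and a five-node chain. The sketch assumes that nodes fail independently with the same probability and counts a network as broken up when its surviving nodes no longer form a single connected group.

```python
from itertools import product
from typing import Hashable, Iterable, Sequence, Tuple

Edge = Tuple[Hashable, Hashable]


def is_connected(nodes: Iterable[Hashable], edges: Iterable[Edge]) -> bool:
    """True if the surviving nodes form one connected component (or at most one node)."""
    alive = set(nodes)
    if len(alive) <= 1:
        return True
    adj = {n: set() for n in alive}
    for u, v in edges:
        if u in alive and v in alive:
            adj[u].add(v)
            adj[v].add(u)
    start = next(iter(alive))
    seen, stack = {start}, [start]
    while stack:
        for nb in adj[stack.pop()]:
            if nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return seen == alive


def disconnect_probability(nodes: Sequence[Hashable],
                           edges: Sequence[Edge],
                           p_fail: float) -> float:
    """Exact probability that the surviving nodes are disconnected,
    by enumerating every up/down pattern (feasible for small networks only)."""
    total = 0.0
    for state in product([True, False], repeat=len(nodes)):  # True = node is up
        survivors = {n for n, up in zip(nodes, state) if up}
        if is_connected(survivors, edges):
            continue
        prob = 1.0
        for up in state:
            prob *= (1.0 - p_fail) if up else p_fail
        total += prob
    return total


# Hypothetical stand-ins for the two networks (not taken from the original figure).
NODES = ["a", "b", "c", "d", "e"]
BOW_TIE = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e"), ("c", "e")]
CHAIN = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]

if __name__ == "__main__":
    p = 0.1  # assumed independent failure probability per node
    print(disconnect_probability(NODES, BOW_TIE, p))
    print(disconnect_probability(NODES, CHAIN, p))
```

In the bow-tie only the shared node can split the network, while in the chain any interior node can, so the chain has the higher probability of being broken up even though the node connectivity of both is 1.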

OUTLINE

HARDWARE FAULT TOLERANCE
INFORMATION REDUNDANCY
FAULT TOLERANT NETWORK
SOFTWARE FAULT TOLERANCE
CHECKPOINTING
CASE STUDIES
FAULT DETECTION IN CRYPTOGRAPHIC SYSTEMS
