Week09-Fault Tolerant System

Uploaded by

yzayan65

Disaster Recovery & Business Continuity Planning
Week-09: Fault Tolerant System
Topics Covered
• Introduction to fault-tolerant systems
• Fault, error, and failure
• MTTF and MTTR
• Fault-tolerance techniques
• Real-world high-availability systems
Introduction
• Reliability and availability have become
increasingly important in today's
computer-dependent world.
• To achieve the needed reliability and
availability, we need fault-tolerant computers.
Fault tolerant system
• A fault-tolerant system can tolerate faults by
detecting failures and isolating defective
modules so that the rest of the system can
continue to operate correctly.
Case Study
• According to a study on Tandem systems, the
percentage of outages caused by hardware
faults was 30% in 1985, but had decreased to
10% in 1989. Outages caused by software
faults increased in the same period, from 43%
to over 60%!
Fault, Error, Failure
• When a system or module is designed, its
behavior is specified. When in service, we can
observe its behavior. When the observed
behavior differs from the specified behavior,
we call it a failure.
• A failure occurs because of an error, caused by
a fault.
Fault, Error, Failure
• An example is a cosmic ray that discharges a
memory cell (the fault), causing an error. When
the memory cell is read, we have a memory
failure and the error becomes effective.
Module reliability
• Module reliability is statistically quantified as
mean-time-to-failure (MTTF).
• The average time it takes to repair a module
after the detection of the failure is called
mean-time-to-repair (MTTR).
Module Availability
• From MTTF and MTTR we get the module
availability, which is the ratio of service
accomplishment to elapsed time:
availability = MTTF / (MTTF + MTTR).
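As a hedged illustration of this ratio, the Python sketch below computes steady-state availability from MTTF and MTTR using the standard relation availability = MTTF / (MTTF + MTTR). The function name and the sample figures are assumptions chosen for illustration, not values from the lecture.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Fraction of elapsed time during which the module delivers service."""
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)


# Hypothetical module: fails on average every 1,000 hours, takes 1 hour to repair.
a = availability(1000.0, 1.0)
print(f"availability = {a:.4%}")
```

Shortening the repair time raises availability just as lengthening the time to failure does, which is why both figures matter.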
Module Availability
• We classify systems into different availability
classes as shown in the table. Currently, most
general-purpose systems are operating in class
3 or 4.
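The table itself is not reproduced in this text. As a rough sketch only, the code below assumes the widely used "number of nines" convention, in which the class is the count of leading nines in the availability figure (99.9% is class 3, 99.99% is class 4). That convention and the function name are assumptions and may not match the table on the slide.

```python
import math


def availability_class(availability: float) -> int:
    """Number of leading nines in the availability (assumed class convention)."""
    if not 0.0 <= availability < 1.0:
        raise ValueError("availability must be in [0, 1)")
    unavailability = 1.0 - availability
    # The small epsilon guards against floating-point error, e.g. in 1 - 0.999.
    return math.floor(-math.log10(unavailability) + 1e-9)


for a in (0.99, 0.999, 0.9999, 0.99999):
    print(a, "-> class", availability_class(a))
```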
Fault-Tolerance Techniques
• Hardware redundancy
• Information redundancy
• Software redundancy
• Time redundancy
Hardware Redundancy
• A module can be made fail-fast by
duplication.
• Two identical copies of a module are
employed, with a comparator checking the
outputs of the two copies.
• When the outputs differ, a fault is detected.
• This technique is widely used because it is
easy to realize and relatively cheap.
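The duplicate-and-compare idea can be sketched in software, although the slide describes a hardware arrangement. In this minimal Python illustration, `fail_fast`, `FaultDetected`, and the two `square` functions are hypothetical names, and the fault is injected by hand.

```python
from typing import Callable


class FaultDetected(Exception):
    """Raised when the two copies of the module disagree."""


def fail_fast(copy_a: Callable[[int], int],
              copy_b: Callable[[int], int]) -> Callable[[int], int]:
    """Wrap two copies of a module with a comparator on their outputs."""
    def module(x: int) -> int:
        out_a, out_b = copy_a(x), copy_b(x)
        if out_a != out_b:
            # Stop at once instead of passing on a possibly wrong value.
            raise FaultDetected(f"outputs differ for input {x}: {out_a} != {out_b}")
        return out_a
    return module


def square(x: int) -> int:
    return x * x


def faulty_square(x: int) -> int:
    # Hypothetical fault: wrong result for one particular input.
    return x * x + 1 if x == 3 else x * x


checked = fail_fast(square, faulty_square)
print(checked(2))
try:
    checked(3)
except FaultDetected as exc:
    print("fault detected:", exc)
```

Duplication detects a disagreement but cannot tell which copy is wrong, so the pair stops rather than guessing.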
Information Redundancy
• Information redundancy is the addition of
extra information to data to allow error
detection and correction.
• Typical forms are error-detecting codes,
error-correcting codes (ECC), and
self-checking circuits.
• Parity codes are used in most modern
computers for memory error detection.
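A single even-parity bit is the simplest error-detecting code. The sketch below is an illustration only: the function names are assumptions, and real memory parity is computed in hardware rather than in Python.

```python
def add_even_parity(data_bits: list[int]) -> list[int]:
    """Append one parity bit so the total number of 1s is even."""
    return data_bits + [sum(data_bits) % 2]


def parity_ok(word: list[int]) -> bool:
    """True when the stored word still has an even number of 1s."""
    return sum(word) % 2 == 0


word = add_even_parity([1, 0, 1, 1, 0, 0, 1, 0])
print(parity_ok(word))

word[2] ^= 1  # simulate a single-bit fault, such as a discharged memory cell
print(parity_ok(word))
```

One parity bit detects any odd number of flipped bits but cannot locate or correct them, and an even number of flips goes unnoticed. Correction requires an error-correcting code.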
Software Redundancy
• There are some important differences
between software and hardware errors.
• Software development is also a more complex
and immature art than hardware design.
• It is said that perfect software is possible —
it’s just a matter of time and money.
Software Redundancy
• There are two software fault-tolerance
techniques:
• N-version programming: Write the program N
times, then operate all N programs in parallel
and take a majority vote for each answer.
• Transactions: Write the program as a
transaction. Use a consistency check at the
end, and if the conditions are not met, restart.
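The voting step of N-version programming can be sketched as follows. This is a minimal illustration under stated assumptions: the three versions are hypothetical, one carries an injected bug, and they run one after another here rather than in parallel.

```python
from collections import Counter
from typing import Callable, Sequence


def majority_vote(versions: Sequence[Callable[[int], int]], x: int) -> int:
    """Run every version on the same input and return the majority answer."""
    answers = [version(x) for version in versions]
    answer, votes = Counter(answers).most_common(1)[0]
    if votes <= len(versions) // 2:
        raise RuntimeError(f"no majority among answers {answers}")
    return answer


# Three hypothetical, independently written versions of "double the input".
def version_a(x: int) -> int:
    return 2 * x


def version_b(x: int) -> int:
    return x + x


def version_c(x: int) -> int:
    return x << 1 if x != 5 else 11  # injected bug for one input


print(majority_vote([version_a, version_b, version_c], 5))
```

The vote masks the faulty version only while a majority of versions agree on the correct answer, which is why the versions should be developed independently.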
Software Fault Detection
• Watchdog timers and timeouts: A watchdog
daemon process can monitor an application by
periodically sending the process a signal and
checking the return value to detect whether
it is alive.
• Consistency checking/self-checking: The
programs can use assertions to check the
results of computations.
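A watchdog with a timeout can be sketched as below. Note that this is a heartbeat variant, in which the application reports in and the watchdog checks how long ago that happened, rather than the signal-and-return-value probe described above. The class name, the timeout, and the simulated clock are assumptions for illustration.

```python
import time


class Watchdog:
    """Declares an application dead if no heartbeat arrives within `timeout` seconds."""

    def __init__(self, timeout: float, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.last_beat = clock()

    def heartbeat(self) -> None:
        """Called by the monitored application to show it is still alive."""
        self.last_beat = self.clock()

    def is_alive(self) -> bool:
        """Checked periodically by the watchdog daemon."""
        return self.clock() - self.last_beat <= self.timeout


# Simulated clock so the example runs instantly; a real daemon would use time.monotonic.
now = [0.0]
dog = Watchdog(timeout=2.0, clock=lambda: now[0])

now[0] = 1.5
dog.heartbeat()
print(dog.is_alive())  # heartbeat has just arrived

now[0] = 4.0
print(dog.is_alive())  # 2.5 s have passed without a heartbeat
```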
Time Redundancy
• Hardware and information redundancy
require extra hardware.
• This can be avoided by performing an operation
several times in the same module and checking
the results, instead of performing it in parallel on
several modules and comparing the outputs.
• This reduces the amount of hardware at the
expense of using additional time.
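Time redundancy can be sketched as repeating an operation on the same module and comparing the results. The code below is only an illustration: the function names are hypothetical and the transient fault is simulated with a call counter.

```python
from typing import Callable


def run_with_time_redundancy(operation: Callable[[], int], repeats: int = 3) -> int:
    """Execute the operation `repeats` times on one module and compare the results."""
    results = [operation() for _ in range(repeats)]
    if any(r != results[0] for r in results):
        raise RuntimeError(f"results differ, fault suspected: {results}")
    return results[0]


# Hypothetical operation hit by a transient fault on its second execution only.
calls = {"count": 0}


def flaky_sum() -> int:
    calls["count"] += 1
    return 41 if calls["count"] == 2 else 42


try:
    run_with_time_redundancy(flaky_sum)
except RuntimeError as exc:
    print("detected:", exc)

print(run_with_time_redundancy(lambda: 6 * 7))
```

Repetition of this kind catches transient faults. A permanent fault that produces the same wrong result on every run would pass the comparison unnoticed.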
Fault-Tolerance in General-Purpose
Computers
• A processor contains many registers, all of which
must be covered to provide fault tolerance.
• Database transactions serve as a software technique.
High availability computer systems
• Tandem Computers
• Stratus
• MARS
• Sun Netra ft 1800
• Fault-Tolerance on Clusters
Key points
• This chapter is an introduction to fault-
tolerance concepts and systems, mainly from
the hardware point of view.
• There are four fault-tolerance techniques:
hardware, information, software, and time redundancy.
• Finally, some systems are studied as case
examples, including Tandem, Stratus, MARS,
and Sun Netra ft 1800.
