Software Fault Tolerance Methods

This document discusses concepts of dependability and software fault tolerance techniques. It defines dependability as the trustworthiness of a system and its ability to deliver services. Software faults may occur during design, so fault tolerance measures are needed. Common techniques include recovery blocks, N-version programming, consensus recovery blocks, and distributed recovery blocks. The goal of fault tolerance is to enable systems to tolerate faults through redundancy.

Uploaded by

Monil Joshi

Outline: Introduction, Concepts of Dependability, Software Fault Tolerance Techniques, Conclusion, Questions

More and more people depend on computer systems.
The growing need for computer systems also increases the need for fault-tolerant computer systems.
Interest in fault-tolerant real-time systems is increasing.

Software faults may be introduced during the design of the system.
It is virtually impossible to design and implement a completely fault-free system.
Measures must therefore be provided to detect and tolerate faults.

Dependability is defined as the trustworthiness of a system, such that reliance can be placed on the service it provides

System
A set of interacting components with a design

Service
The service delivered by a system is the behavior of that system as it affects users or other systems

Dependability Impairments
Dependability Means
Dependability Attributes

Dependability impairments are undesired, but seldom unexpected, circumstances that cause or result from undependability.
The system behaves in an unacceptable manner.
When a system failure occurs, the system no longer satisfies its specification.

System failure
A system that no longer delivers a service complying with its specification is said to suffer a system failure

Error
A system state that is liable to lead to a subsequent failure

Fault
The condition that caused the error

To assess the severity of faults and to decide on measures for removing them, a classification is useful.

Whether or not an error leads to a failure depends on a set of factors. A system that incorporates redundancy at some level may mask the error.

Failure Modes:
Failure Domain
The value of the service does not comply with the specifications

Failure Perception
Experienced by the user of the system

Failure Consequences
Different levels of severity

Not every fault leads to an error.
Not every error leads to a failure.

Faults are active when they produce errors.
Errors are detected by error detection algorithms or mechanisms.
Failures occur when an error passes through the interface of the system.
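The chain from fault to error to failure can be illustrated with a small sketch. Everything here is hypothetical and not taken from the slides: the table SQUARES, its wrong entry, and the plausibility bound of 16 are assumptions chosen only to make the three terms concrete.

```python
# Toy lookup table of squares for n in 1..4.
# Hypothetical fault: the entry for 3 should be 9.
SQUARES = {1: 1, 2: 4, 3: 90, 4: 16}


def square(n, check=True):
    """Return n squared from the table, optionally guarding the interface."""
    # The fault stays dormant until n == 3 is requested.
    # Reading the wrong entry activates it and produces an error (bad state).
    value = SQUARES[n]
    # Error detection: no square of n <= 4 can exceed 16.
    if check and value > 16:
        # Error processing: replace the erroneous state with a correct one.
        value = n * n
    # If the erroneous value is returned, the error has crossed the
    # interface and the caller observes a failure.
    return value


print(square(2, check=False))  # fault dormant, prints 4
print(square(3, check=False))  # error reaches the interface, prints 90
print(square(3))               # error detected and masked, prints 9
```

The unchecked call with n == 3 shows a failure; the checked call shows that the same fault and the same error need not lead to one.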

Dependability means are the methods and techniques that make it possible to deliver a service on which reliance can be placed, and to reach confidence in this ability

Dependable software relies on two groups of means:
Procurement (fault prevention and fault tolerance)
Methodology used to construct a dependable system

Validation (fault removal and fault forecasting)
Methodology used to ensure the dependability of a system

Fault prevention
How to prevent fault occurrence by construction

Fault tolerance
How to provide service when faults are present

Fault removal
How to minimize the presence of faults

Fault forecasting
How to estimate the creation and manifestation of faults

Since human activities are involved, these four means are goals that cannot be fully reached

Fault tolerance is the ability of an operational system to tolerate the presence of faults

Fault tolerance is carried out in four phases:
1. Error Detection
2. Damage Assessment
3. Error Processing
4. Fault Treatment

Error Detection
The detection of an erroneous state that could lead to a subsequent failure

Damage Assessment
Performed once an error has been detected, to establish more precisely the extent to which the system is damaged

Error Processing
Error recovery: an attempt to substitute the erroneous system state with one that is error-free
1. Backward recovery
2. Forward recovery

Fault Treatment
Diagnosis
Passivation
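Backward recovery can be sketched as a checkpoint taken before an operation and a rollback to it when an error is detected. The function name run_with_backward_recovery, the dictionary used as system state, and the "balance must not be negative" check are all assumptions made for this illustration.

```python
import copy


def run_with_backward_recovery(state, operation, is_valid):
    """Apply operation to state; restore the checkpoint if an error is detected."""
    checkpoint = copy.deepcopy(state)  # recovery point saved before the operation
    try:
        operation(state)
    except Exception:
        ok = False                     # an exception counts as a detected error
    else:
        ok = is_valid(state)           # explicit error detection on the new state
    if not ok:
        state.clear()
        state.update(checkpoint)       # backward recovery: return to the saved state
    return ok


def withdraw(amount):
    """Build a hypothetical operation that ignores the balance limit."""
    def operation(state):
        state["balance"] -= amount
    return operation


account = {"balance": 100}
not_negative = lambda s: s["balance"] >= 0

print(run_with_backward_recovery(account, withdraw(150), not_negative))  # prints False
print(account["balance"])                                                # prints 100
print(run_with_backward_recovery(account, withdraw(30), not_negative))   # prints True
print(account["balance"])                                                # prints 70
```

Forward recovery would instead construct a new error-free state from the erroneous one, for example by correcting the balance in place, rather than returning to an earlier state.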

Redundancy is the duplication of critical components or functions of a system with the intention of increasing the reliability of the system

A fault tolerant system is assumed to support some level of redundancy, ensuring that faults can be tolerated using the four phases

Space
Hardware redundancy, denoted H

Information
Software redundancy, denoted S

Repetition
Time redundancy, denoted T

Dependability attributes enable the expected properties of a system to be expressed, and allow the quality of the system resulting from the impairments and the means opposing them to be assessed

Dependability has four main attributes:

Availability
the extent to which a system is ready for use

Reliability
the extent to which a system continuously provides its service

Safety
the extent to which a system avoids catastrophic consequences on the environment

Security
the extent to which a system prevents unauthorized access and/or handling of information

Software fault tolerance techniques:
Recovery Block (RB)
N-Version Programming (NVP)
Consensus Recovery Block (CRB)
Distributed Recovery Block (DRB)
N Self-Checking Programming (NSCP)
Data Diversity

Basic elements of a recovery block (RB):

One primary module
A program module which performs the desired operation

Zero or more alternate modules
Perform the same desired operation in a different way

One acceptance test
A test which confirms the output of the modules

Conditions under which a module is rejected:
An error in the operation of a module is explicitly detected by the acceptance test
The module fails to terminate, which is detected by a time-out
An error is detected during execution of a module by one of the implicit error detection mechanisms
An inner recovery block has failed because all of its modules were rejected, either explicitly or implicitly
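The recovery block scheme can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the names recovery_block, RecoveryBlockFailure, primary and alternate are hypothetical, the recovery cache is modeled as a deep copy of a dictionary, and the time-out condition is not modeled.

```python
import copy


class RecoveryBlockFailure(Exception):
    """Raised when the primary and every alternate have been rejected."""


def recovery_block(state, modules, acceptance_test):
    """Try modules in order (primary first) until one passes the acceptance test."""
    checkpoint = copy.deepcopy(state)           # recovery cache
    for module in modules:
        try:
            result = module(state)
            if acceptance_test(result):         # explicit error detection
                return result
        except Exception:
            pass                                # implicit detection, or a failed inner block
        state.clear()
        state.update(copy.deepcopy(checkpoint))  # restore state before the next alternate
    raise RecoveryBlockFailure("all modules were rejected")


def primary(state):
    # Hypothetical fault: loses one element and damages the shared state.
    state["data"].pop()
    return sorted(state["data"])


def alternate(state):
    return sorted(state["data"])


def ordered_with_length(n):
    """Acceptance test: output has n elements and is in ascending order."""
    def test(out):
        return len(out) == n and all(a <= b for a, b in zip(out, out[1:]))
    return test


state = {"data": [3, 1, 2]}
print(recovery_block(state, [primary, alternate], ordered_with_length(3)))  # prints [1, 2, 3]
```

Because RecoveryBlockFailure is an ordinary exception, a recovery block nested inside a module of an outer block is treated by the outer block as a rejected module, which corresponds to the last rejection condition.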

Issues to consider with recovery blocks:
The types of faults tolerated by recovery blocks
Designing the primary and alternate modules
Designing the acceptance test
Designing the recovery cache mechanism

Basic elements of N-version programming (NVP):

The initial specification
The specification of the functionality desired of the software

N software versions
Software modules which are all independently generated from the initial specification

A decision mechanism
Decides what the final result of the computation will be, using the results from the N versions as input

A supervisory program
A software structure used to drive the N versions and the decision mechanism

In order for the decision mechanism to do its job, the outputs of the N versions must be synchronized
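A minimal sketch of N-version programming with a majority voter as the decision mechanism is given below. The names n_version_execute, NoMajorityError and the three toy versions are hypothetical, and majority voting is only one possible decision mechanism. The versions run sequentially here, so synchronization is trivial: the voter runs only after every version has finished.

```python
from collections import Counter


class NoMajorityError(Exception):
    """Raised when no result is produced by a strict majority of the versions."""


def n_version_execute(versions, x):
    """Supervisory program: run every version, then let the voter decide."""
    results = []
    for version in versions:
        try:
            results.append(version(x))
        except Exception:
            pass  # a crashed version casts no vote
    if results:
        value, votes = Counter(results).most_common(1)[0]
        if votes > len(versions) // 2:  # strict majority of all N versions
            return value
    raise NoMajorityError("versions did not reach a majority")


# Three toy versions of "absolute value"; the third has a hypothetical fault.
def version_a(x):
    return abs(x)


def version_b(x):
    return x if x >= 0 else -x


def version_c(x):
    return x  # wrong for negative inputs


print(n_version_execute([version_a, version_b, version_c], -4))  # prints 4
```

The faulty version is outvoted for negative inputs and agrees with the others for non-negative ones, so the caller never sees its wrong result.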

Issues to consider with N-version programming:
The types of faults tolerated by N-version programming
The initial specification
Generating independent versions
The decision mechanism

The consensus recovery block (CRB) is a synthesis of the original recovery block and N-version programming
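One way to picture this synthesis is a two-stage scheme: first vote on the outputs of all versions, as in NVP, and only if no agreement is reached fall back to an acceptance test applied to the outputs in order of preference, as in RB. The sketch below assumes a strict-majority voter and uses hypothetical names throughout (consensus_recovery_block, the three integer square root versions, is_integer_sqrt).

```python
import math
from collections import Counter


def consensus_recovery_block(versions, acceptance_test, x):
    """Stage 1: NVP-style vote. Stage 2: RB-style acceptance test in rank order."""
    results = []
    for version in versions:            # versions are listed in order of preference
        try:
            results.append(version(x))
        except Exception:
            continue
    if results:
        value, votes = Counter(results).most_common(1)[0]
        if votes > len(versions) // 2:  # consensus reached, no acceptance test needed
            return value
    for result in results:              # no consensus: fall back to the acceptance test
        if acceptance_test(x, result):
            return result
    raise RuntimeError("no consensus and no acceptable result")


# Toy versions of integer square root; two of them carry hypothetical faults.
def half(x):
    return x // 2                # only right for a few inputs


def one_too_many(x):
    return math.isqrt(x) + 1     # always off by one


def correct(x):
    return math.isqrt(x)


def is_integer_sqrt(x, r):
    return r * r <= x < (r + 1) * (r + 1)


versions = [half, one_too_many, correct]
print(consensus_recovery_block(versions, is_integer_sqrt, 4))   # vote succeeds, prints 2
print(consensus_recovery_block(versions, is_integer_sqrt, 16))  # test decides, prints 4
```

For input 4 two versions agree on 2, so the vote settles the result. For input 16 the outputs are 8, 5 and 4, no majority exists, and the acceptance test rejects the first two before accepting 4.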

Problems with CRB:
Includes the problems of RB
Includes the problems of NVP
Cost of implementation

The distributed recovery block (DRB) integrates software and hardware fault tolerance into one single structure.
Both the primary and the alternate modules are replicated and are resident on two or more separate nodes interconnected by a network.
Software faults are handled in the traditional recovery block fashion.
Hardware faults are handled by the backup nodes.
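The structure can be sketched as a single-process simulation with two nodes. All names are hypothetical (Node, distributed_recovery_block, largest_assuming_sorted), a crashed node is modeled by an "up" flag, and the nodes are polled sequentially although real nodes would run concurrently over a network.

```python
class Node:
    """A node holding both modules, listed in the order it tries them."""

    def __init__(self, name, modules, up=True):
        self.name = name
        self.modules = modules
        self.up = up                      # False models a hardware fault

    def try_module(self, attempt, data):
        if not self.up:
            return None                   # a crashed node delivers nothing
        try:
            return self.modules[attempt](data)
        except Exception:
            return None


def distributed_recovery_block(nodes, data, acceptance_test):
    """Return (result, node name) from the first node whose output is accepted."""
    for attempt in (0, 1):                # second attempt: each node uses its other module
        for node in nodes:                # primary node first, then the shadow node
            result = node.try_module(attempt, data)
            if result is not None and acceptance_test(data, result):
                return result, node.name
    raise RuntimeError("no node produced an acceptable result")


def largest_assuming_sorted(data):
    return data[-1]                       # hypothetical fault: wrong for unsorted input


def largest(data):
    return max(data)


def is_largest(data, result):
    return result in data and all(result >= v for v in data)


node_a = Node("A", [largest_assuming_sorted, largest])  # primary node runs the primary first
node_b = Node("B", [largest, largest_assuming_sorted])  # shadow node runs the alternate first

print(distributed_recovery_block([node_a, node_b], [1, 2, 3], is_largest))  # prints (3, 'A')
print(distributed_recovery_block([node_a, node_b], [3, 1, 2], is_largest))  # prints (3, 'B')
```

The second call shows a software fault in the primary module being covered by the alternate on the shadow node. Setting node_a.up to False would show the same shadow node covering a hardware fault.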

In N self-checking programming (NSCP), the system is divided into several self-checking components, each comprised of different variants (equivalent to alternates in RB and versions in NVP)
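A self-checking component can be built from two variants and a comparison of their results. The sketch below assumes that form; a component built from one variant plus an acceptance test is the other common form. The names SelfCheckingComponent, SelfCheckFailed and the four toy variants are hypothetical.

```python
class SelfCheckFailed(Exception):
    """Raised when the two variants of a component disagree."""


class SelfCheckingComponent:
    """Two variants plus a comparison mechanism."""

    def __init__(self, variant_a, variant_b):
        self.variant_a = variant_a
        self.variant_b = variant_b

    def run(self, x):
        a, b = self.variant_a(x), self.variant_b(x)
        if a != b:                       # comparison mechanism
            raise SelfCheckFailed("variants disagree")
        return a


def n_self_checking(components, x):
    """Deliver the result of the first component that passes its own check."""
    for component in components:         # active component first, then the spares
        try:
            return component.run(x)
        except SelfCheckFailed:
            continue
    raise RuntimeError("every self-checking component failed")


# Four toy variants of "sum of 1..n"; the second has a hypothetical fault.
def by_builtin(n):
    return sum(range(n + 1))


def by_wrong_formula(n):
    return n * n // 2


def by_formula(n):
    return n * (n + 1) // 2


def by_loop(n):
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


components = [
    SelfCheckingComponent(by_builtin, by_wrong_formula),
    SelfCheckingComponent(by_formula, by_loop),
]
print(n_self_checking(components, 5))  # prints 15
```

For n = 5 the first component computes 15 and 12, detects the disagreement itself, and the spare component delivers 15.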

Problems with NSCP:
The usual problems associated with the acceptance test
The comparison mechanism

Programs fail for special cases in the input space.
Data diversity moves the input data out of the failure domain, using two approaches:
Retry block
N-copy programming
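A retry block can be sketched as one program that is re-run on a re-expressed input whenever it fails. The sketch relies on a re-expression that does not change the correct answer: the sum of a list is the same for any ordering of its elements. The names retry_block, fragile_total and rotate, and the fault itself, are hypothetical.

```python
def retry_block(program, x, re_express, acceptance_test, max_retries=3):
    """Run program on x; on failure, re-express the input and try again."""
    for _ in range(max_retries + 1):
        try:
            result = program(x)
            if acceptance_test(result):
                return result
        except Exception:
            pass                 # treat a crash like a rejected result
        x = re_express(x)        # move the input out of the failure domain
    raise RuntimeError("retry block exhausted its retries")


def fragile_total(values):
    # Hypothetical fault: the program crashes when the first element is zero.
    if values[0] == 0:
        raise ZeroDivisionError("input lies in the failure domain")
    return sum(values)


def rotate(values):
    """Re-expression: a different ordering of the same values."""
    return values[1:] + values[:1]


def is_int(result):
    return isinstance(result, int)


print(retry_block(fragile_total, [0, 2, 3], rotate, is_int))  # prints 5
```

N-copy programming applies the same idea in the style of NVP: several copies of the one program run on differently re-expressed inputs, and a voter chooses among their results.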

Problems with data diversity:
The acceptance test
The voter

Since all fault tolerance depends on some kind of redundancy, fault-tolerant systems will always be more expensive.
The fault tolerance technique of choice is highly application dependent.
CRB and DRB are still mostly used for academic research.

Low-cost systems should use fault tolerance schemes that do not make use of hardware redundancy.
High-cost systems should use schemes such as NVP, NSCP or N-copy programming (NCP).

