Lesson 2 - Fault and Error Modelling

Fault and Error Modelling

Uploaded by

Paul Pogba Clive


1. Introduction
Faults, errors, and failures are terms often used interchangeably in computing systems, but they represent different stages of a system malfunction. Fault modeling is a critical process in fault-tolerant computing that helps us understand how faults occur, propagate, and affect system behavior. Understanding faults and errors, and knowing how to model them, is essential for improving system reliability and performance.
2. Learning Outcomes
By the end of the lesson, students should be able to:
1. Differentiate between faults, errors, and failures.
2. Model faults and errors in systems.
3. Understand the implications of fault/error modeling on system reliability.
3. Faults, Errors, and Failures
3.1 Fault
A fault is an underlying defect or flaw in a system that could potentially lead to erroneous behavior. Faults can occur due to hardware malfunctions, software bugs, human errors, or external disturbances (e.g., radiation). In software, a fault often arises from factors such as a lack of resources or a failure to follow proper development steps, with the result that the logic needed to handle error conditions is never incorporated into the application. This is an undesirable situation, and it mainly happens because of incorrectly documented steps or missing data definitions.
1. It can cause unintended behavior in an application program.
2. It may produce a warning in the program, although many faults remain silent until they are activated.
3. If a fault is left untreated, it may lead to failure of the deployed code.
4. In some cases, even a minor fault may lead to a severe error.
5. Faults can be prevented in several ways, such as adopting sound programming techniques and development methodologies, peer review, and code analysis.
3.2 Error
An error is a deviation in the system's internal state caused by a fault. Errors manifest when a system state deviates from the expected state, and they can propagate through the system, potentially leading to failures if they are not detected and corrected. For instance, in software development, an error arises when the development team or a developer misunderstands a requirement definition and that misunderstanding gets translated into buggy code.
1. Errors can be caused by wrong logic, incorrect syntax, or faulty loops, and they can affect the end-user experience.
2. An error is measured as the difference between the expected results and the actual results.
3. Errors arise for several reasons, such as design issues, coding issues, or system specification issues, and they lead to problems in the application.
3.3 Failure
A failure occurs when the system's output deviates from the expected behavior because of an error. It is the external manifestation of an error, visible to the user or to the system environment. In software, a failure typically results from one or more accumulated defects, and it can cause the loss of information in critical modules, making the system unresponsive. Such situations are relatively rare in released software because products are tested against many scenarios and test cases before release. Failures are usually detected by end users when they encounter a particular issue in the software.
1. Failure can happen due to human error, or it can be caused intentionally by an individual.
2. The term is generally used once the software has reached the production stage.
3. A failure shows up in the application when the defective part is executed.
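To make the fault → error → failure chain concrete, here is a minimal Python sketch. The budget-check scenario and the names `faulty_total`, `over_budget`, and `diagnose` are hypothetical illustrations, not part of the lesson. A deliberate off-by-one fault corrupts an internal total (an error), but that error only becomes a failure when it changes the answer the user actually sees.

```python
def faulty_total(costs):
    """Sum a list of costs; contains a deliberate fault for illustration."""
    total = 0
    # FAULT: the range stops one element early, so the last cost is skipped.
    for i in range(len(costs) - 1):
        total += costs[i]
    return total


def over_budget(costs, limit):
    """The only output a user sees: is the total above the limit?"""
    return faulty_total(costs) > limit


def diagnose(costs, limit):
    """Compare the internal state and the output against the specification."""
    correct_total = sum(costs)
    error = faulty_total(costs) != correct_total                    # internal state deviates
    failure = over_budget(costs, limit) != (correct_total > limit)  # visible output deviates
    if failure:
        return "failure"
    if error:
        return "error (masked, no failure)"
    return "fault dormant (no error)"


if __name__ == "__main__":
    for costs in ([10, 20, 0], [10, 20, 5], [10, 10, 10]):
        print(costs, "->", diagnose(costs, limit=25))
```

The same fault gives three different outcomes: with a zero last cost it stays dormant, with `[10, 20, 5]` the internal total is wrong but the answer is still correct, and with `[10, 10, 10]` the user sees a wrong answer.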

Diagram: Relationship Between Faults, Errors, and Failures (sequence: fault → error → failure)

4. Types of Faults
Faults can be classified based on their behavior and duration. The three main types are transient,
intermittent, and permanent faults:
4.1 Transient Faults
These faults occur temporarily and disappear without any corrective action. For example, power fluctuations or cosmic radiation may cause transient faults in electronic circuits, causing a circuit to misbehave temporarily and then recover without intervention.
4.2 Intermittent Faults
Intermittent faults appear and disappear at irregular intervals. They are typically seen with loose electrical connections or with faulty components that malfunction only temporarily. These faults are more challenging to detect and troubleshoot because they occur unpredictably.
4.3 Permanent Faults
Permanent faults persist until corrective action is taken. For instance, a burned-out
processor or a failed hard drive would cause permanent faults. These faults require repair or
replacement of the affected component.
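The three fault types differ mainly in when the fault is active. The Python sketch below models each type as a sequence of time steps in which the fault is either active or inactive. The helper names are hypothetical, and representing intermittent faults with a seeded random generator is an assumption made for illustration.

```python
import random


def transient(steps, start, duration):
    """Active only during one short window, then disappears on its own."""
    return [start <= t < start + duration for t in range(steps)]


def intermittent(steps, probability, seed=0):
    """Active at irregular, unpredictable times (modelled here as random)."""
    rng = random.Random(seed)
    return [rng.random() < probability for _ in range(steps)]


def permanent(steps, start):
    """Once active, stays active until the component is repaired or replaced."""
    return [t >= start for t in range(steps)]


def show(name, pattern):
    # '#' marks a time step where the fault is active, '.' where it is not.
    print(f"{name:<13}" + "".join("#" if active else "." for active in pattern))


if __name__ == "__main__":
    steps = 40
    show("transient", transient(steps, start=10, duration=3))
    show("intermittent", intermittent(steps, probability=0.2))
    show("permanent", permanent(steps, start=25))
```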
5. Fault and Error Modelling
Fault and error modeling helps simulate and analyze how faults affect a system's reliability and
performance. Fault modeling allows designers to understand the behavior of faults and develop
strategies to handle them.
5.1 Fault Models
A fault model is an abstraction used to describe different types of faults in a system. Some
common fault models include:
• Stuck-at Fault Model: This is one of the most commonly used fault models in digital circuit testing. It assumes that a signal or node in a digital circuit is "stuck" at a constant logic level, either logic 0 (stuck-at-0) or logic 1 (stuck-at-1), regardless of the inputs applied to the circuit. The model simplifies the analysis of faults in combinational and sequential circuits by focusing on these two failure modes.

1. Stuck-at-0 (s-a-0):
In this fault type, a signal or node that is supposed to change its value is stuck at logic 0. Regardless of the input combination applied, the output of that node remains at 0. Mathematically, if a node N is stuck-at-0:
N = 0 (for all input combinations)
2. Stuck-at-1 (s-a-1):
In this case, a signal or node that is supposed to change its value remains stuck at logic 1, regardless of the input conditions. For a node N stuck-at-1:
N = 1 (for all input combinations)
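A stuck-at fault is detected by any input vector that makes the faulty circuit's output differ from the fault-free output. The Python sketch below assumes a small hypothetical circuit, y = (a AND b) OR c, with an internal node n = a AND b. It injects s-a-0 and s-a-1 on n and exhaustively lists the detecting test vectors. Exhaustive enumeration is only practical for tiny circuits; it is used here purely to show the idea.

```python
from itertools import product


def circuit(a, b, c, fault=None):
    """y = (a AND b) OR c, with an optional stuck-at fault on node n."""
    n = a & b
    if fault == "n/s-a-0":
        n = 0  # node n stuck at logic 0
    elif fault == "n/s-a-1":
        n = 1  # node n stuck at logic 1
    return n | c


def detecting_tests(fault):
    """Input vectors whose output differs between the good and faulty circuit."""
    return [
        (a, b, c)
        for a, b, c in product((0, 1), repeat=3)
        if circuit(a, b, c) != circuit(a, b, c, fault)
    ]


if __name__ == "__main__":
    for fault in ("n/s-a-0", "n/s-a-1"):
        print(fault, "detected by", detecting_tests(fault))
```

Note that c must be 0 in every detecting vector: when c = 1 the OR gate forces y = 1 and hides the fault on n.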

• Byzantine Fault Model: This model describes a system in which components, such as
nodes or processors, can fail in arbitrary or malicious ways, including sending conflicting
or misleading information to different parts of the system. This type of fault is one of the
most challenging to deal with in distributed systems because it assumes that
components may not simply stop working but may actively work against the system.

In the Byzantine Fault Model, nodes in a distributed system can behave unpredictably or dishonestly. A Byzantine node may send different (possibly incorrect) information to different nodes, creating inconsistencies. The system must reach consensus despite some nodes behaving incorrectly.

This problem was originally formulated as the Byzantine Generals Problem, in which generals in different locations must agree on a common plan of action, even though some of the generals may be traitors who send contradictory or false messages to the others.
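The Byzantine Generals Problem can be explored with a small simulation. The Python sketch below implements a single round of relayed "oral messages", in the spirit of Lamport, Shostak, and Pease's OM(1) algorithm for one traitor. How traitors misbehave (a traitorous commander alternates its orders, a traitorous lieutenant flips what it relays) and the RETREAT default are assumptions chosen for illustration. With four generals the loyal lieutenants always agree; with three, a single traitor can push a loyal lieutenant away from a loyal commander's order.

```python
from collections import Counter

ATTACK, RETREAT = "ATTACK", "RETREAT"


def majority(votes, default=RETREAT):
    """Strict majority of the votes, or a fixed default if there is none."""
    value, count = Counter(votes).most_common(1)[0]
    return value if count > len(votes) / 2 else default


def flip(order):
    return RETREAT if order == ATTACK else ATTACK


def om1(order, traitors, n):
    """One relay round. Node 0 is the commander; nodes 1..n-1 are lieutenants.
    Returns the decision of each loyal lieutenant."""
    lieutenants = range(1, n)
    # Round 1: the commander sends an order to every lieutenant.
    # A traitorous commander sends conflicting orders (alternating, by assumption).
    received = {
        i: (ATTACK if i % 2 else RETREAT) if 0 in traitors else order
        for i in lieutenants
    }
    decisions = {}
    for i in lieutenants:
        if i in traitors:
            continue  # only loyal lieutenants' decisions matter
        votes = [received[i]]
        # Round 2: every other lieutenant relays the order it received;
        # a traitorous lieutenant relays the opposite order (by assumption).
        for j in lieutenants:
            if j != i:
                votes.append(flip(received[j]) if j in traitors else received[j])
        decisions[i] = majority(votes)
    return decisions


if __name__ == "__main__":
    print(om1(ATTACK, traitors={3}, n=4))  # loyal commander, traitorous lieutenant
    print(om1(ATTACK, traitors={0}, n=4))  # traitorous commander
    print(om1(ATTACK, traitors={2}, n=3))  # too few generals for one traitor
```

In the last case the loyal lieutenant sees one ATTACK and one RETREAT, has no majority, and falls back to RETREAT, disobeying its loyal commander.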

Byzantine Fault Tolerance (BFT): A system is said to be Byzantine Fault-Tolerant if it can
reach a consensus or continue functioning correctly even in the presence of Byzantine
faults. In a distributed system of n nodes, where up to f nodes can behave in a Byzantine
manner, Byzantine Fault Tolerance is achieved if the system can still function correctly
despite these faulty nodes.

Necessary Condition: 3f+1 Nodes

For Byzantine Fault Tolerance to be possible, the system must have at least 3f+1 total
nodes. This ensures that even if f nodes behave arbitrarily, the correct nodes can still
outvote the faulty ones and reach a consensus. This is often referred to as the Byzantine
Quorum Condition.

Equations

Let:
n = total number of nodes in the system
f = maximum number of Byzantine faulty nodes

The relationship between the total nodes and the maximum Byzantine faults tolerated
is expressed as: n ≥ 3f + 1

This means that for the system to tolerate f Byzantine faults, the total number of nodes
n must be at least 3f+1. For example, if the system needs to tolerate 1 Byzantine fault
(f=1), there must be at least 3×1+1 = 4 nodes in the system.
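The condition can also be turned around to ask how many Byzantine faults a given cluster tolerates: rearranging n ≥ 3f + 1 gives f ≤ (n − 1)/3, rounded down. A minimal Python sketch of both directions follows; the function names are hypothetical.

```python
def min_nodes(f: int) -> int:
    """Smallest n satisfying n >= 3f + 1."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 3 * f + 1


def max_faults(n: int) -> int:
    """Largest f with 3f + 1 <= n, i.e. floor((n - 1) / 3)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return (n - 1) // 3


if __name__ == "__main__":
    for f in range(4):
        print(f"f = {f}: need n >= {min_nodes(f)}")
    for n in (3, 4, 6, 7, 10):
        print(f"n = {n}: tolerates f = {max_faults(n)}")
```

Adding nodes only helps in steps of three: 4, 5, and 6 nodes all tolerate a single Byzantine fault, and a second one needs 7.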

Example: Your job requires you to develop a robust system capable of withstanding failures from two sources. Using the Byzantine Fault Tolerance model, compute the number of alternative, correct, complementary sources needed to keep the system running.
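A worked solution, assuming that "failures from two sources" means up to two Byzantine faulty nodes (f = 2) and that the complementary correct sources are the nodes that must remain correct:

```latex
\begin{aligned}
f &= 2 \\
n &\ge 3f + 1 = 3(2) + 1 = 7 \\
n - f &= 7 - 2 = 5 = 2f + 1 \quad \text{(correct nodes required)}
\end{aligned}
```

Under this reading, the system needs at least 7 nodes in total, of which at least 5 must be correct so that they can outvote the 2 faulty ones.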

• Transient Fault Model: This model simulates faults that appear temporarily, such as
errors caused by radiation or electrical interference. A Transient Fault (or soft error)
refers to a temporary error in a system that occurs for a short duration and does not result
in permanent damage to the hardware. These faults are typically caused by external
factors like electromagnetic interference, power fluctuations, or cosmic radiation and
are often difficult to reproduce. Unlike permanent faults, transient faults do not indicate
a failure of the system's components, and the system can recover once the disturbance
has passed.
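Because a transient fault disappears on its own, simply repeating a computation (temporal redundancy) is often enough to recover from it. The Python sketch below assumes a hypothetical adder whose result suffers a single bit-flip on selected calls only, and a retry loop that accepts a result once two consecutive attempts agree. The class and function names and the choice of bit 3 are illustrative assumptions.

```python
def flip_bit(word: int, bit: int) -> int:
    """Invert one bit of an integer, as a soft error might."""
    return word ^ (1 << bit)


class TransientFaultyAdder:
    """Adder whose result is corrupted on selected calls only.
    The hardware is not damaged: every other call is correct."""

    def __init__(self, faulty_calls, bit=3):
        self.faulty_calls = set(faulty_calls)
        self.bit = bit
        self.calls = 0

    def add(self, a: int, b: int) -> int:
        self.calls += 1
        result = a + b
        if self.calls in self.faulty_calls:
            result = flip_bit(result, self.bit)  # transient disturbance
        return result


def add_with_retry(adder, a, b, max_attempts=3):
    """Temporal redundancy: repeat until two consecutive results agree."""
    previous = adder.add(a, b)
    for _ in range(max_attempts):
        current = adder.add(a, b)
        if current == previous:
            return current
        previous = current
    raise RuntimeError("results did not stabilise")


if __name__ == "__main__":
    adder = TransientFaultyAdder(faulty_calls={1})
    print(add_with_retry(adder, 5, 7))  # first result corrupted, retry recovers
```

This scheme targets transient faults only. A permanent fault that corrupted every call in the same way would produce agreeing but wrong results, which is why permanent faults call for spatial redundancy or repair instead.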
5.2 Error Models
Error models are important for understanding how faults lead to errors in a system and how
these errors propagate through the system's components. Through analysis of these models,
engineers can design systems with appropriate fault tolerance and error detection mechanisms.
Some key error models are discussed in the following sections:
1. The Fail-Silent Model
In this model, a component that detects a fault stops all operations and provides no
output (silent failure). This model ensures that the faulty component does not propagate
errors to other components. For example, a temperature sensor in an industrial
application may stop sending readings if it detects a fault, ensuring that no erroneous
data is transmitted to the control system. This allows the system to rely on other sensors
or to take corrective actions without being misled by faulty data.
2. The Fail-Stop Model
In a fail-stop model, when a component fails, it stops functioning and signals its failure
to the rest of the system. This allows other components to take appropriate actions
based on the failure. For instance, in a distributed database system, if a server fails, it
may send a notification to other servers, indicating its unavailability. The remaining
servers can then redistribute the workload and maintain the overall system functionality.
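The practical difference between the two models above is whether the rest of the system is told about the failure. In the Python sketch below, the class names, the sensor's valid range, and the notification callback are hypothetical; a real fail-stop system needs a reliable way to deliver the notification, and a plain callback is a simplification.

```python
from typing import Callable, Iterable, List, Optional


class FailSilentSensor:
    """Fail-silent: once its self-check detects a fault, it outputs nothing."""

    def __init__(self, readings: Iterable[float], valid_range=(-40.0, 125.0)):
        self.readings = iter(readings)
        self.low, self.high = valid_range
        self.silent = False

    def read(self) -> Optional[float]:
        if self.silent:
            return None
        value = next(self.readings)
        if not self.low <= value <= self.high:  # implausible reading => fault
            self.silent = True                   # stay quiet from now on
            return None
        return value


class FailStopServer:
    """Fail-stop: on failure, halt and announce it to the other servers."""

    def __init__(self, name: str, notify: Callable[[str], None]):
        self.name = name
        self.notify = notify
        self.running = True

    def fail(self) -> None:
        self.running = False
        self.notify(self.name)  # peers learn that this server is unavailable


if __name__ == "__main__":
    sensor = FailSilentSensor([21.5, 22.0, 999.0, 23.0])
    print([sensor.read() for _ in range(4)])  # [21.5, 22.0, None, None]

    unavailable: List[str] = []
    server = FailStopServer("db-2", notify=unavailable.append)
    server.fail()
    print(unavailable)  # ['db-2']: the remaining servers can take over its work
```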
3. Byzantine Model
In this model, components may fail and exhibit arbitrary behavior, including sending
conflicting or misleading information. This model is crucial for systems where nodes can
be compromised or may act maliciously. For example, in a blockchain network, some
nodes may attempt to submit fraudulent transactions. A Byzantine Fault Tolerance (BFT)
algorithm helps ensure that even if some nodes behave incorrectly, the majority can still
reach a consensus on the correct state of the blockchain.
4. The Error Propagation Model
This model describes how errors introduced by faulty components can affect other
components. The error can propagate through the system based on its architecture and
the interactions between components. For example, in a digital circuit, a faulty flip-flop
may cause an incorrect signal to be sent to other logic gates. If that signal is used as an
input to a combination of gates, the incorrect output can propagate further, leading to
errors in multiple parts of the circuit.
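Whether an error actually propagates depends on the logic it passes through. The Python sketch below assumes a small hypothetical circuit in which a flip-flop output q feeds an AND gate and an OR gate, whose outputs are combined by an XOR gate and then inverted. It inverts q (the error) and lists which downstream signals change.

```python
def downstream(q: bool, a: bool, b: bool) -> dict:
    """Hypothetical logic fed by flip-flop output q."""
    g1 = q and a   # AND gate
    g2 = q or b    # OR gate
    g3 = g1 != g2  # XOR gate combining both paths
    out = not g3   # inverter driving the circuit output
    return {"g1": g1, "g2": g2, "g3": g3, "out": out}


def propagated(q: bool, a: bool, b: bool) -> list:
    """Signals whose value changes when the flip-flop output is wrong."""
    good = downstream(q, a, b)
    bad = downstream(not q, a, b)  # error: flip-flop holds the inverted value
    return [name for name in good if good[name] != bad[name]]


if __name__ == "__main__":
    for a, b in ((True, True), (True, False), (False, True)):
        print(f"a={a}, b={b}: error reaches {propagated(True, a, b)}")
```

Depending on the other inputs, the error reaches the output (a = b = True), stops at the XOR gate because both of its inputs flip together (a = True, b = False), or has no effect at all (a = False, b = True).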
5. Silent Data Corruption Model
In this model, data may be corrupted without any fault detection mechanisms in place,
leading to incorrect results without any indication of failure. In a storage system, data
can be silently corrupted due to a transient fault, such as a power spike.
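The danger of silent data corruption is that nothing flags it. The Python sketch below stores a record, flips one bit to mimic a transient fault, and shows that an unchecked read returns the corrupted bytes without complaint. Adding a CRC-32 checksum (via the standard-library `zlib.crc32`) is one common detection mechanism, used here as an illustrative assumption; it turns the silent corruption into a detected error, although a checksum alone cannot repair the data. The helper names are hypothetical.

```python
import zlib
from typing import Tuple

Record = Tuple[bytes, int]


def store(data: bytes) -> Record:
    """Save the data together with a CRC-32 checksum computed at write time."""
    return data, zlib.crc32(data)


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Simulate a transient fault that inverts one stored bit."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


def read_unchecked(record: Record) -> bytes:
    return record[0]  # no detection: corruption goes unnoticed


def read_checked(record: Record) -> bytes:
    data, crc = record
    if zlib.crc32(data) != crc:
        raise ValueError("checksum mismatch: stored data is corrupted")
    return data


if __name__ == "__main__":
    data, crc = store(b"balance=100")
    corrupted = (flip_bit(data, 3), crc)  # a power spike flips bit 3 of byte 0
    print(read_unchecked(corrupted))      # b'jalance=100', returned silently
    try:
        read_checked(corrupted)
    except ValueError as exc:
        print("detected:", exc)
```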

6. Fault Injection Techniques

Fault injection is a technique used to assess the reliability of a system by deliberately introducing faults into it, either in a simulated environment or on the real system under controlled conditions. This allows engineers to observe how the system responds to faults and whether the fault-tolerant mechanisms function as expected. There are two basic approaches:
1. Software-Based Fault Injection: Here, faults are injected into the software by
manipulating variables, data, or instructions.
2. Hardware-Based Fault Injection: Hardware faults are simulated by manipulating physical
components, such as cutting wires or injecting electrical disturbances.
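A software-based fault injection campaign can be very small. The Python sketch below assumes a hypothetical system under test: a computation protected by triple modular redundancy (three replicas and a majority voter). It repeatedly injects a random single-bit flip into one replica's result, in the "manipulating variables" style of software injection, and tallies whether the voter masked the fault, alongside what the same fault would do to an unprotected result.

```python
import random
from collections import Counter
from typing import Optional


def compute(x: int) -> int:
    """Hypothetical computation being protected (8-bit result)."""
    return (3 * x + 7) & 0xFF


def tmr(x: int, faulty_replica: Optional[int] = None, bit: int = 0) -> Optional[int]:
    """Run three replicas and vote; optionally flip one bit in one replica."""
    outputs = [compute(x) for _ in range(3)]
    if faulty_replica is not None:
        outputs[faulty_replica] ^= 1 << bit  # software-injected fault
    value, votes = Counter(outputs).most_common(1)[0]
    return value if votes >= 2 else None  # None: no majority


def campaign(trials: int = 1000, seed: int = 0) -> dict:
    """Inject one random single-bit fault per trial and tally the outcomes."""
    rng = random.Random(seed)
    outcomes = Counter()
    for _ in range(trials):
        x = rng.randrange(256)
        golden = compute(x)
        replica, bit = rng.randrange(3), rng.randrange(8)
        unprotected = golden ^ (1 << bit)  # the same fault without redundancy
        outcomes["unprotected wrong"] += unprotected != golden
        outcomes["tmr masked"] += tmr(x, replica, bit) == golden
    return dict(outcomes)


if __name__ == "__main__":
    print(campaign())
```

Every injected fault corrupts the unprotected result, and every one is masked by the voter. Injecting faults into two replicas in the same trial would be a natural next experiment; the voter can then no longer guarantee a correct result.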
7. Importance of Fault and Error Modelling on System Reliability
Fault and error modeling play a crucial role in predicting a system's reliability. By modeling potential faults and their effects, designers can:
• Design the system to detect and recover from errors, for example by using error-correcting codes such as Hamming codes to detect and correct bit-flip errors.
• Decide where to add redundancy (e.g., N-modular redundancy) so that the system continues to operate even when a fault occurs.
• Design testing strategies, such as test pattern generation for digital circuits, where faults are systematically introduced to ensure that error detection mechanisms work correctly.
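As a concrete instance of the first point, the Python sketch below implements the classic Hamming(7,4) code, which adds three parity bits to four data bits so that any single bit-flip can be located and corrected. The function names are hypothetical. Hamming(7,4) cannot correct two simultaneous flips: it would "correct" the wrong bit.

```python
from typing import List, Tuple


def hamming74_encode(d: List[int]) -> List[int]:
    """Encode 4 data bits into 7 bits; parity bits sit at positions 1, 2 and 4."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(code: List[int]) -> Tuple[List[int], int]:
    """Correct up to one flipped bit; return (data bits, error position or 0)."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4  # 1-based position of the bad bit
    if syndrome:
        c[syndrome - 1] ^= 1  # flip it back
    return [c[2], c[4], c[5], c[6]], syndrome


if __name__ == "__main__":
    data = [1, 0, 1, 1]
    sent = hamming74_encode(data)  # [0, 1, 1, 0, 0, 1, 1]
    received = sent.copy()
    received[4] ^= 1               # bit-flip at position 5
    print(hamming74_decode(received))  # ([1, 0, 1, 1], 5)
```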
8. Review Questions
1. Differentiate between transient, intermittent, and permanent faults.
2. Model the three system fault scenarios and discuss their effects on system performance.
