Lesson 6 - System-Level Diagnosis

System level Diagnosis

Uploaded by

Paul Pogba Clive

System-Level Diagnosis

SOE 504
Specific Objectives
By the end of this lecture, students should be able to:
1. Demonstrate an understanding of the process of diagnosing
system faults at the system level.
2. Implement techniques to isolate and diagnose faults.
Introduction to System-Level Diagnosis
System-level diagnosis refers to the identification and isolation of faults in
a complex system, which may include hardware and software components.
System-level diagnosis looks at the entire system with the aim of
discovering where the failure is occurring and how it affects overall system
functionality. It is necessary because it:
1. Helps maintain system reliability and performance.
2. Minimizes downtime by enabling quick detection and correction of faults,
and
3. Supports predictive maintenance, reducing the likelihood of system
breakdown.
Overview of System Faults
Faults can be broadly categorized into three classes, according to how
long they persist:
1. Permanent faults: occur when a component is permanently
damaged and stops functioning.
2. Transient faults: temporary malfunctions that resolve without
intervention but can still disrupt system operation.
3. Intermittent faults: occur unpredictably and are hard to diagnose
since they appear and disappear over time.
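The three categories can be told apart from a history of health checks. The following is a minimal sketch, assuming a hypothetical chronological list of booleans (True means the check failed). The function name `classify_fault` and the classification rule are illustrative assumptions, not part of the lecture; in practice a "permanent" verdict drawn from a finite observation window is only a working guess.

```python
def classify_fault(failed_checks):
    """Classify a fault from a chronological list of booleans (True = check failed).

    Illustrative rule only (an assumption, not a standard):
      - no failures                      -> "none"
      - one failure burst, still failing -> "permanent"
      - one failure burst, then recovers -> "transient"
      - several separate failure bursts  -> "intermittent"
    """
    if not any(failed_checks):
        return "none"
    # Count bursts: each transition from passing (or start) to failing.
    bursts = 0
    previous = False
    for failed in failed_checks:
        if failed and not previous:
            bursts += 1
        previous = failed
    if bursts == 1 and failed_checks[-1]:
        return "permanent"
    if bursts == 1:
        return "transient"
    return "intermittent"
```

For example, a history that fails once and recovers is reported as transient, while one that keeps flipping between pass and fail is reported as intermittent.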
Types of System Faults:
1. Hardware faults are physical failures caused by, for example, power
surges, overheating, or component wear.
2. Software faults manifest as bugs, logic errors, memory leaks, and
incorrect configurations.
3. Communication faults include network issues, signal degradation,
and protocol failures.
Process of Diagnosing System-Level Faults
The process has four stages:
Monitoring and Data Collection → Fault Isolation → Fault Identification
→ Systematic Repair and Resolution
Monitoring and Data Collection
The first step is to gather information about the system. This includes:
• Error logs: Automated logging of errors during system operation.
• Event tracing: Capturing events in the system to track down
abnormal behaviors.
• Performance metrics: Monitoring CPU usage, memory, disk I/O,
network throughput, etc., to detect anomalies.
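Detecting anomalies in a performance metric can be sketched as follows. This is a minimal illustration, assuming the metric has already been sampled into a list of numbers. The function name `find_anomalies` and the three-standard-deviation threshold are assumptions made for the example, not something the lecture prescribes.

```python
from statistics import mean, stdev

def find_anomalies(samples, threshold=3.0):
    """Return indices of samples lying more than `threshold` standard
    deviations from the mean of the whole series (a simple z-score test)."""
    if len(samples) < 2:
        return []          # not enough data to estimate spread
    mu = mean(samples)
    sigma = stdev(samples)
    if sigma == 0:
        return []          # perfectly flat series: nothing stands out
    return [i for i, x in enumerate(samples) if abs(x - mu) / sigma > threshold]

# Hypothetical CPU usage readings (percent): steady load, then one spike.
cpu_usage = [20.0] * 20 + [95.0]
print(find_anomalies(cpu_usage))  # prints [20]
```

A fixed z-score threshold is only one possible choice; real monitoring systems often compare against a baseline window or use per-metric limits.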
Fault Isolation
Once enough data has been gathered, the next step is to isolate the
fault. This step involves:
• Hypothesis generation: Based on the collected data, forming
hypotheses about which part of the system might be failing.
• Testing subsystems: Running diagnostics on suspected subsystems
or components to verify if they are the source of the fault.
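The two activities above can be sketched together: rank the hypotheses, then run a diagnostic on each suspected subsystem. This is a minimal sketch under stated assumptions. The suspicion scores, the subsystem names, and the `isolate_fault` function are all hypothetical, and each diagnostic is modeled as a callable that returns True when the subsystem is healthy.

```python
def isolate_fault(hypotheses, diagnostics):
    """Test suspected subsystems, most suspicious first; return the names
    of those whose diagnostic reports a failure.

    hypotheses:  dict of subsystem name -> suspicion score (higher = more likely)
    diagnostics: dict of subsystem name -> zero-argument callable,
                 returning True if the subsystem is healthy
    """
    ranked = sorted(hypotheses, key=hypotheses.get, reverse=True)
    faulty = []
    for name in ranked:
        check = diagnostics.get(name)
        if check is None:
            continue  # no diagnostic available for this subsystem
        if not check():
            faulty.append(name)
    return faulty

# Hypothetical scores derived from logs and metrics.
hypotheses = {"disk": 0.2, "memory": 0.7, "network": 0.1}
# Stand-in diagnostics; real ones would run actual hardware or software tests.
diagnostics = {
    "disk": lambda: True,
    "memory": lambda: False,
    "network": lambda: True,
}
print(isolate_fault(hypotheses, diagnostics))  # prints ['memory']
```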
Fault Identification
After isolating the fault, the exact nature of the failure must be
determined. This might involve:
1. Root Cause Analysis (RCA): Finding the underlying cause of the
fault rather than just the symptoms.
2. Testing scenarios: Reproducing the fault in a controlled
environment to better understand its behavior.
Systematic Repair and Resolution
Once the fault is identified, the final step is to apply a fix. It could be:
• Hardware repairs or replacements: For physical component
failures.
• Software patches or updates: For bugs or configuration issues.
• System reboots or resets: For transient issues that resolve with a
system reset.
Techniques for Fault Diagnosis
The main techniques are Built-In Self-Test (BIST), watchdog timers,
redundancy and fault tolerance, and diagnostic software tools.
Built-In Self-Test (BIST)
A mechanism where the system performs self-testing at startup or during
operation to detect internal faults. It is useful for identifying
hardware failures without needing external diagnostics.
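A self-test of this kind can be illustrated with a simple memory pattern test. The sketch below is an assumption-laden simulation, not real BIST hardware logic: a mutable byte sequence stands in for a RAM region, and the hypothetical `StuckBitMemory` class simulates a cell whose lowest bit is stuck at 1.

```python
def memory_self_test(memory):
    """Write test patterns to every address and read them back.
    Returns the list of addresses that fail. `memory` is a stand-in
    for a RAM region (any mutable, indexable byte sequence)."""
    patterns = (0x55, 0xAA)  # alternating bits: 01010101 and 10101010
    bad_addresses = []
    for address in range(len(memory)):
        saved = memory[address]           # remember the original contents
        for pattern in patterns:
            memory[address] = pattern
            if memory[address] != pattern:
                bad_addresses.append(address)
                break
        memory[address] = saved           # write the saved value back
    return bad_addresses


class StuckBitMemory:
    """Simulated RAM in which bit 0 of one address always reads as 1
    (a hypothetical fault injected for the demonstration)."""

    def __init__(self, size, stuck_address):
        self._cells = bytearray(size)
        self._stuck_address = stuck_address

    def __len__(self):
        return len(self._cells)

    def __getitem__(self, address):
        value = self._cells[address]
        return (value | 0x01) if address == self._stuck_address else value

    def __setitem__(self, address, value):
        self._cells[address] = value


print(memory_self_test(bytearray(16)))                     # prints []
print(memory_self_test(StuckBitMemory(16, stuck_address=5)))  # prints [5]
```

Two complementary patterns are used so that a bit stuck at either 0 or 1 disagrees with at least one of them.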
Watchdog Timers
These are timers that monitor software execution. If the system or a
task fails to reset the timer in a specific time frame, the system
assumes a fault and triggers corrective action (such as a system reboot).
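The watchdog mechanism can be sketched in software as follows. This is a minimal illustration, not a hardware watchdog driver: the `Watchdog` class, its method names, and the injectable clock are assumptions chosen to keep the example self-contained and easy to test.

```python
import time

class Watchdog:
    """Software watchdog: the monitored task must call kick() before the
    timeout elapses, otherwise check() triggers the corrective action."""

    def __init__(self, timeout_s, on_timeout, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.on_timeout = on_timeout  # corrective action, e.g. restart a task
        self.clock = clock            # injectable so tests need not sleep
        self.last_kick = clock()

    def kick(self):
        """Reset the timer; called regularly by a healthy task."""
        self.last_kick = self.clock()

    def check(self):
        """Called periodically by a supervisor. Returns True if the timer
        expired and the corrective action was triggered."""
        if self.clock() - self.last_kick > self.timeout_s:
            self.on_timeout()
            self.last_kick = self.clock()  # re-arm after corrective action
            return True
        return False


# Demonstration with a fake clock so the timing is deterministic.
now = [0.0]
actions = []
wd = Watchdog(5.0, lambda: actions.append("reboot"), clock=lambda: now[0])

now[0] = 3.0
wd.kick()            # task is alive at t = 3
now[0] = 7.0
print(wd.check())    # prints False (only 4 s since the last kick)
now[0] = 9.5
print(wd.check())    # prints True (6.5 s since the last kick)
print(actions)       # prints ['reboot']
```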
Redundancy and Fault Tolerance
1. N-Modular Redundancy (NMR): Involves having N identical modules
perform the same operation simultaneously. Their outputs are compared,
typically by majority voting, so that if one module fails the other
modules continue and the faulty one is identified and isolated.
2. Checkpointing and Rollback: The system regularly saves its state
(checkpoint). In the event of a failure, the system can "roll back" to
a previous state, effectively undoing the changes caused by the
fault.
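Both techniques can be sketched briefly. The example below is illustrative only: `majority_vote` assumes the module outputs are hashable values and that a strict majority defines the correct result, and `CheckpointedState` assumes the whole system state fits in a single Python object that can be deep-copied. Both names are hypothetical.

```python
import copy
from collections import Counter

def majority_vote(outputs):
    """Given the outputs of N redundant modules, return a tuple
    (agreed_value, indices_of_disagreeing_modules).
    Raises ValueError if no value has a strict majority."""
    if not outputs:
        raise ValueError("no module outputs to vote on")
    value, count = Counter(outputs).most_common(1)[0]
    if count <= len(outputs) // 2:
        raise ValueError("no majority among module outputs")
    faulty = [i for i, out in enumerate(outputs) if out != value]
    return value, faulty


class CheckpointedState:
    """Holds a state object and supports checkpoint and rollback."""

    def __init__(self, state):
        self.state = state
        self._saved = copy.deepcopy(state)   # initial checkpoint

    def checkpoint(self):
        """Save a copy of the current state."""
        self._saved = copy.deepcopy(self.state)

    def rollback(self):
        """Discard changes made since the last checkpoint."""
        self.state = copy.deepcopy(self._saved)


# Three redundant modules (N = 3); module 2 disagrees with the others.
print(majority_vote([42, 42, 41]))   # prints (42, [2])

system = CheckpointedState({"processed": 10})
system.checkpoint()
system.state["processed"] = -1       # a fault corrupts the state
system.rollback()
print(system.state)                  # prints {'processed': 10}
```

With N = 3 this voter tolerates one faulty module; tolerating more simultaneous faults requires a larger N.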
Diagnostic Software Tools
1. Performance Monitoring Tools: E.g., Task Manager (Windows) and top
(Linux), which provide real-time data on system performance.
2. Network Analyzers: Tools such as Wireshark, which help diagnose
communication faults.
3. Hardware Diagnostic Tools: For testing memory (e.g., MemTest),
hard disks (e.g., SMART diagnostics), and other hardware
components.
Review example
• A server in a data center is reported to be experiencing intermittent
slowdowns and periodic reboots. As a systems engineer, discuss the
professional approach(es) you would use to diagnose and fix the fault
for optimum data center performance.
