UNIT-3:
Distributed shared memory, Architecture, algorithms for implementing DSM, memory
coherence and protocols, design issues. Distributed Scheduling, introduction, issues in load
distributing, components of a load distributing algorithm, stability, load distributing
algorithm, performance comparison, selecting a suitable load sharing algorithm, requirements
for load distributing, task migration and associated issues. Failure Recovery and Fault
Tolerance: introduction, basic concepts, classification of failures, backward and forward error recovery, backward error recovery, recovery in concurrent systems, consistent set of checkpoints, synchronous and asynchronous checkpointing and recovery, checkpointing for distributed database systems, recovery in replicated distributed databases.
Architecture and Design Issues:
Scalability: A DSM system must scale, handling a growing number of nodes, processes, and memory accesses without significant performance degradation. As the system grows, the number of nodes involved in keeping memory coherent and the cost of synchronization both increase.
Consistency: Ensuring consistency is a critical design issue. Different consistency models (sequential consistency, release consistency, etc.) must be chosen carefully based on the application's requirements; a small write-invalidate coherence sketch follows this list.
Fault Tolerance: The DSM system must be able to continue functioning correctly if a
node fails. This requires mechanisms like replication and recovery strategies to handle
failures.
Communication Overhead: Communication between nodes (e.g., sending memory
updates) can introduce latency. Efficient communication protocols are necessary to
minimize this overhead.
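To make the coherence point concrete, here is a minimal sketch, assuming a toy Python model in which each node caches whole pages and uses write-invalidate: before a node writes a page it invalidates every remote copy, so any later read must re-fetch the new value. The class DsmNode and its methods are illustrative, not a real DSM API.

# Minimal write-invalidate DSM sketch (illustrative only: DsmNode and its
# methods are hypothetical, not a real DSM API).

class DsmNode:
    def __init__(self, node_id, peers):
        self.node_id = node_id
        self.peers = peers          # the other DsmNode objects in the system
        self.pages = {}             # page_id -> (data, valid flag)

    def read(self, page_id):
        entry = self.pages.get(page_id)
        if entry is None or not entry[1]:
            # Cache miss or invalidated copy: fetch an up-to-date version.
            data = self.fetch_valid_copy(page_id)
            self.pages[page_id] = (data, True)
        return self.pages[page_id][0]

    def write(self, page_id, data):
        # Write-invalidate: remote copies are invalidated before the write,
        # so no node can keep reading a stale value afterwards.
        for peer in self.peers:
            peer.invalidate(page_id)
        self.pages[page_id] = (data, True)

    def invalidate(self, page_id):
        if page_id in self.pages:
            old_data, _ = self.pages[page_id]
            self.pages[page_id] = (old_data, False)

    def fetch_valid_copy(self, page_id):
        # In a real DSM this is a network request to the page's owner;
        # here we simply scan the peers for a valid copy.
        for peer in self.peers:
            entry = peer.pages.get(page_id)
            if entry is not None and entry[1]:
                return entry[0]
        return None

# Two nodes sharing page 0.
a = DsmNode("A", [])
b = DsmNode("B", [])
a.peers, b.peers = [b], [a]
a.write(0, "v1")
print(b.read(0))    # fetches "v1" from A
b.write(0, "v2")    # invalidates A's cached copy
print(a.read(0))    # re-fetches and prints "v2"

A write-update protocol would instead push the new value to every copy on each write, trading more communication for cheaper reads; which choice is better depends on the sharing pattern, which is exactly the kind of trade-off the design issues above describe.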
Distributed Scheduling
Introduction:
Load Imbalance: Load imbalance occurs when some nodes in the system are overloaded while others are under-utilized, which leads to inefficiency and poor overall performance (a simple way to quantify this is sketched after this list).
Communication Overhead: Load distributing algorithms often require processes to
communicate across the network to share load information. Excessive communication
overhead can hinder performance.
Scalability: As the system scales (i.e., the number of nodes increases), the algorithms
should continue to function effectively. Some load balancing strategies may not scale
well and may become inefficient.
Dynamic Changes: The load in the system can change over time (e.g., nodes can join
or leave the system, workloads may change). The load distribution mechanism must
dynamically adjust to these changes.
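As a small illustration of the load-imbalance issue above, the following sketch (Python, with made-up utilization values) quantifies imbalance as the coefficient of variation of per-node loads; a scheduler could use such a metric to decide whether redistribution is worth its cost.

# Quantifying load imbalance (illustrative values): the coefficient of
# variation of per-node load; 0 means perfectly balanced, larger is worse.

def imbalance(loads):
    mean = sum(loads) / len(loads)
    if mean == 0:
        return 0.0
    variance = sum((x - mean) ** 2 for x in loads) / len(loads)
    return (variance ** 0.5) / mean

balanced   = [0.50, 0.52, 0.48, 0.50]     # hypothetical CPU utilizations
imbalanced = [0.95, 0.90, 0.10, 0.05]

print(imbalance(balanced))                # close to 0
print(imbalance(imbalanced))              # much larger: redistribution helps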
Components of a Load Distributing Algorithm:
1. Load Measurement: The algorithm must monitor and measure the load at each node.
This could include CPU usage, memory usage, or network bandwidth.
2. Task Assignment: Once the load is measured, tasks must be allocated to nodes. This
could involve direct assignment or task migration.
3. Task Migration: Tasks may need to be moved from overloaded nodes to
underutilized ones to balance the load. This process is known as task migration.
4. Feedback Mechanism: The system must periodically reassess the load distribution and adapt assignments to the current system state (the sketch after this list ties these four steps together).
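The sketch below ties the four components together in a few lines of Python. It is a simplified, sender-initiated scheme with a hypothetical threshold: queue length stands in for load measurement, and "migration" is just moving a task object between local queues.

# Sender-initiated load distributing sketch (illustrative threshold; a real
# system would measure CPU/memory load and migrate real process state).

HIGH = 4   # a node with more than HIGH queued tasks is considered overloaded

class Node:
    def __init__(self, name):
        self.name = name
        self.tasks = []

    def load(self):                       # 1. load measurement (queue length)
        return len(self.tasks)

def rebalance(nodes):
    for sender in nodes:
        while sender.load() > HIGH:       # transfer policy: sender is overloaded
            receiver = min(nodes, key=lambda n: n.load())
            if receiver is sender or receiver.load() >= sender.load() - 1:
                break                     # no transfer would improve the balance
            task = sender.tasks.pop()     # 2. task assignment: pick a task
            receiver.tasks.append(task)   # 3. task migration to the receiver

# 4. feedback: in practice rebalance() runs periodically as loads change.
a, b = Node("A"), Node("B")
a.tasks = [f"t{i}" for i in range(8)]
rebalance([a, b])
print(a.load(), b.load())                 # 4 4: the load has been evened out

In a real system, step 1 would sample OS-level metrics (CPU, memory, queue length) and step 3 would transfer process state across the network, which is where most of the cost of task migration lies.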
Stability:
A load distributing algorithm is stable if the overhead it adds (collecting load information, probing, transferring tasks) stays bounded as the system load grows; an unstable algorithm can end up spending more effort moving work around than executing it.
Load Distributing Algorithms:
1. Centralized Algorithms: These algorithms use a central controller that monitors the
load of all nodes and assigns tasks accordingly. While simpler, they introduce a single
point of failure and can become a bottleneck.
2. Decentralized Algorithms: In decentralized algorithms, each node makes load-balancing decisions based on local knowledge and periodic communication with other nodes. These systems are more scalable and fault-tolerant than centralized ones (see the probing sketch after this list).
3. Hybrid Algorithms: Hybrid algorithms combine elements of both centralized and
decentralized approaches. They may use a central controller for overall coordination
but delegate task distribution to individual nodes.
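To contrast with a central controller, the following sketch shows the decentralized idea: an overloaded node probes a few randomly chosen peers and hands a task to the first one that reports spare capacity, so no node needs global knowledge. PROBE_LIMIT and THRESHOLD are hypothetical tuning parameters.

import random

# Decentralized, sender-initiated probing (PROBE_LIMIT and THRESHOLD are
# hypothetical tuning parameters).

PROBE_LIMIT = 3    # how many peers an overloaded node polls
THRESHOLD   = 4    # queue length above which a node tries to shed work

def try_offload(sender, peers):
    if len(sender["queue"]) <= THRESHOLD:
        return False                      # not overloaded: nothing to do
    for peer in random.sample(peers, min(PROBE_LIMIT, len(peers))):
        # "Probe": ask only this peer for its local load.
        if len(peer["queue"]) < THRESHOLD:
            peer["queue"].append(sender["queue"].pop())
            return True
    return False                          # all probed peers were busy

nodes = [{"name": i, "queue": []} for i in range(4)]
nodes[0]["queue"] = list(range(10))       # node 0 starts out overloaded
while try_offload(nodes[0], nodes[1:]):
    pass
print([len(n["queue"]) for n in nodes])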
Performance Comparison:
Efficiency: A good load distributing algorithm must minimize the time taken to
complete tasks and ensure that nodes are neither underloaded nor overloaded.
Scalability: Algorithms that work well with a small number of nodes may not scale
effectively with more nodes. Performance must remain stable as the system grows.
Fault Tolerance: The algorithm must handle node failures gracefully, ensuring that
tasks are reassigned in the case of node unavailability.
Selecting a Suitable Load Sharing Algorithm:
Consider factors such as network topology, system size, load distribution dynamics, and fault tolerance when selecting an algorithm. Algorithms with high overhead may be unsuitable for large-scale systems, while very simple algorithms may not deliver optimal performance.
Requirements for Load Distributing:
Task Granularity: The granularity of tasks (how large or small they are) affects how easily they can be distributed or migrated across nodes.
Communication Overhead: The algorithm should minimize the overhead for
communication between nodes to avoid delays in task distribution.
Adaptability: The load distribution algorithm should adapt to changes in the system,
such as new tasks arriving or nodes failing.
Failure Recovery and Fault Tolerance:
Introduction:
Distributed systems are inherently prone to failures, and recovery from failures is an
essential part of system design. Fault tolerance ensures that the system can continue
to function even in the presence of failures. Failure recovery mechanisms ensure that
the system can return to a consistent state after a failure.
Basic Concepts:
Fault Tolerance: The ability of the system to continue operating despite failures.
Failure Recovery: The ability of the system to restore itself to a consistent state after a failure has occurred, typically by rolling back to a previously saved checkpoint, as sketched below.
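A minimal sketch of backward error recovery, assuming a single Python process that checkpoints its state to a local file with pickle (the file name and state layout are illustrative): after a crash, the process restarts from the most recent checkpoint instead of from the beginning.

import os
import pickle

CHECKPOINT = "state.ckpt"                 # illustrative file name

def save_checkpoint(state):
    # Write to a temporary file and rename, so a crash in the middle of a
    # save never leaves a half-written checkpoint behind.
    with open(CHECKPOINT + ".tmp", "wb") as f:
        pickle.dump(state, f)
    os.replace(CHECKPOINT + ".tmp", CHECKPOINT)

def restore_checkpoint():
    # Backward error recovery: roll back to the last saved consistent state.
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT, "rb") as f:
            return pickle.load(f)
    return {"next_item": 0, "total": 0}   # no checkpoint yet: start fresh

state = restore_checkpoint()
items = list(range(100))
for i in range(state["next_item"], len(items)):
    state["total"] += items[i]
    state["next_item"] = i + 1
    if i % 10 == 0:                       # checkpoint every 10 items
        save_checkpoint(state)
print(state["total"])                     # 4950, even if restarted mid-way

In a distributed system the harder problem is ensuring that the checkpoints taken at different nodes form a consistent set, so that rolling back one node does not leave another node having received a message that, after rollback, was never sent.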
Classification of Failures:
1. Crash Failures: A process or node fails by simply stopping and losing its internal state. These are the most common type of failure in distributed systems.
2. Omission Failures: A process fails to send or receive messages, for example because a network link fails or the process does not respond to requests; to the caller this is often indistinguishable from a crash (see the timeout-and-retry sketch after this list).
3. Byzantine Failures: These occur when a process behaves arbitrarily, which could
include sending incorrect data, producing false outputs, or acting maliciously.
4. Network Partitioning: The system is split into isolated groups of nodes, and communication between the groups becomes impossible even though nodes within each group may keep running.
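In practice a caller often cannot tell a crash failure from an omission failure: both show up as a missing reply. The hedged sketch below (Python; unreliable_call is a hypothetical stand-in for a real remote invocation) shows the usual response of a timeout plus a bounded number of retries, after which the peer is treated as failed. Byzantine failures are harder, because a reply may arrive but be wrong, so they call for redundancy and voting rather than simple retries.

import random
import time

# A crashed peer and a dropped message both look like a timeout to the
# caller; unreliable_call is a hypothetical stand-in for a remote invocation.

class Timeout(Exception):
    pass

def unreliable_call(request):
    if random.random() < 0.3:             # sometimes the reply is "lost"
        raise Timeout()
    return f"reply to {request}"

def call_with_retries(request, retries=3, delay=0.1):
    for _ in range(retries):
        try:
            return unreliable_call(request)
        except Timeout:
            time.sleep(delay)             # back off before retrying
    # After repeated omissions the caller assumes the peer has crashed (or is
    # unreachable behind a partition) and reports the failure upward.
    raise RuntimeError(f"peer considered failed after {retries} attempts")

print(call_with_retries("read x"))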