
III BSc (Semester – VI) Distributed Systems Unit III

UNIT III
Introduction, Design and implementation of DSM system, Granularity
and Consistency Model, Advantages of DSM, Clock Synchronization,
Event Ordering, Mutual exclusion, Deadlock, Election Algorithms.

*****************
1. What is DSM (Distributed Shared Memory)? Explain Design and
implementation of DSM System?
Introduction:
In computer science, distributed shared memory (DSM) is a form
of memory architecture where physically separated memories can be
addressed as one logically shared address space. Here, the term
"shared" does not mean that there is a single centralized memory; but
that the address space is "shared" (same physical address on two
processors refers to the same location in memory). Distributed global
address space (DGAS), is a similar term for a wide class of software and
hardware implementations, in which each node of a cluster has access
to shared memory in addition to each node's non-shared private
memory.

A distributed-memory system, often called a multicomputer,
consists of multiple independent processing nodes with local memory
modules which are connected by a general interconnection network.
Software DSM systems can be implemented in an operating system, or
as a programming library and can be thought of as extensions of the
underlying virtual memory architecture.

Methods of achieving DSM:


There are usually two methods of achieving distributed shared
memory:

Ø Hardware, such as cache coherence circuits and network
interfaces.
Ø Software.

Software DSM implementation:


There are three ways of implementing software distributed shared
memory:

Ø Page based approach using the system’s virtual memory;
Ø Shared variable approach using some routines to access shared
variables;
Ø Object based approach ideally accessing shared data through
object-oriented discipline.
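
To make the shared-variable approach concrete, here is a minimal
Python sketch (hypothetical names; the "network" is simulated inside
one process): each variable lives on an owner node, and the library
forwards reads and writes there so the messaging stays hidden from
the programmer.

    class ToyDSM:
        """Toy shared-variable DSM: reads/writes of a named variable
        are forwarded to its owner node; messaging is simulated."""
        def __init__(self, nodes):
            self.store = {n: {} for n in nodes}  # per-node local memory
            self.owner = {}                      # variable name -> owning node

        def write(self, node, name, value):
            home = self.owner.setdefault(name, node)  # first writer owns it
            self.store[home][name] = value            # "message" to the owner

        def read(self, node, name):
            home = self.owner[name]
            return self.store[home][name]             # fetched invisibly

    dsm = ToyDSM(nodes=["A", "B"])
    dsm.write("A", "x", 42)      # node A creates and owns x
    print(dsm.read("B", "x"))    # node B reads x as if it were local -> 42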


Message Passing vs DSM

Message Passing                           Distributed Shared Memory
---------------------------------------   ---------------------------------------
Variables have to be marshaled            Variables are shared directly
Cost of communication is obvious          Cost of communication is invisible
Processes are protected by having a       Processes could cause errors by
private address space                     altering shared data
Processes should execute at the           Processes may execute with
same time                                 non-overlapping lifetimes

Software DSM systems also have the flexibility to organize the
shared memory region in different ways. The page based approach
organizes shared memory into pages of fixed size. In contrast, the
object based approach organizes the shared memory region as an
abstract space for storing shareable objects of variable sizes. Another
commonly seen implementation uses a tuple space, in which the unit of
sharing is a tuple.
Shared memory architecture may involve separating memory into
shared parts distributed amongst nodes and main memory; or
distributing all memory between nodes. A coherence protocol, chosen in
accordance with a consistency model, maintains memory coherence.


2. Granularity (Parallel Computing) and Consistency Models.


The definition of granularity takes into account the communication
overhead between multiple processors or processing elements. It
defines granularity as the ratio of computation time to communication
time, wherein, computation time is the time required to perform the
computation of a task and communication time is the time required to
exchange data between processors.
Granularity is usually measured in terms of the number of
instructions executed in a particular task. Alternatively, granularity can
also be specified in terms of the execution time of a program, combining
the computation time and communication time.
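
As a quick worked example of this ratio (a sketch; the times below are
made up for illustration):

    def granularity(t_comp, t_comm):
        """G = computation time / communication time."""
        return t_comp / t_comm

    # A task that computes for 50 ms and communicates for 5 ms:
    print(granularity(0.050, 0.005))   # 10.0 -> relatively coarse-grained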
Types of parallelism:
Depending on the amount of work which is performed by a parallel
task, parallelism can be classified into three categories: fine-grained,
medium-grained and coarse-grained parallelism.
Fine-grained parallelism:
In fine-grained parallelism, a program is broken down to a large
number of small tasks. These tasks are assigned individually to many
processors. The amount of work associated with a parallel task is low
and the work is evenly distributed among the processors. Hence, fine-
grained parallelism facilitates load balancing.
As each task processes less data, the number of processors
required to perform the complete processing is high. This, in turn,
increases the communication and synchronization overhead.
Fine-grained parallelism is best exploited in architectures which
support fast communication. Shared memory architecture which has a
low communication overhead is most suitable for fine-grained
parallelism.
An example of a fine-grained system (from outside the parallel
computing domain) is the system of neurons in our brain.

Coarse-grained Parallelism:
In coarse-grained parallelism, a program is split into large tasks.
Due to this, a large amount of computation takes place in processors.
This might result in load imbalance, wherein certain tasks process the
bulk of the data while others might be idle. Further, coarse-grained
parallelism fails to exploit the parallelism in the program as most of the
computation is performed sequentially on a processor. The advantage of
this type of parallelism is low communication and synchronization
overhead.

Message-passing architecture takes a long time to communicate
data among processes which makes it suitable for coarse-grained
parallelism.


Medium-grained Parallelism
Medium-grained parallelism is defined relative to fine-grained and
coarse-grained parallelism. It is a compromise between the two, where
task size and communication time are greater than in fine-grained
parallelism and lower than in coarse-grained parallelism. Most
general-purpose parallel computers fall in this category.
Levels of Parallelism:
Granularity is closely tied to the level of processing. A program
can be broken down into 4 levels of parallelism:
Ø Instruction level
Ø Loop level
Ø Sub-routine level
Ø Program level
The highest amount of parallelism is achieved at instruction level,
followed by loop-level parallelism. At instruction and loop level, fine-
grained parallelism is achieved. Typical grain size at instruction-level is
20 instructions, while the grain-size at loop-level is 500 instructions.
At the sub-routine (or procedure) level the grain size is typically a
few thousand instructions. Medium-grained parallelism is achieved at
sub-routine level.
At program-level, parallel execution of programs takes place.
Granularity can be in the range of tens of thousands of instructions.
Coarse-grained parallelism is used at this level.
Consistency models:
Consistency models are used in distributed systems like
distributed shared memory systems or distributed data stores (such as
file systems). The system is said to support a given model if operations
on memory follow specific rules. The data consistency model specifies a
contract between programmer and system, wherein the system
guarantees that if the programmer follows the rules, memory will be
consistent and the results of memory operations will be predictable.
Types:
There are two methods to define and categorize consistency
models: issue and view.
Issue: The issue method describes the restrictions that define how a
process can issue operations.
View: The view method defines the order of operations visible to
processes.
Strict Consistency:
Strict consistency is the strongest consistency model. Under this
model, a write to a variable by any processor needs to be seen
instantaneously by all processors; in effect, all operations appear to
occur in a single global time order.


Sequential Consistency
The sequential consistency model is a weaker memory model than
strict consistency. A write to a variable does not have to be seen
instantaneously; however, writes to variables by different processors
have to be seen in the same order by all processors.
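
A small Python sketch can make this precise: sequential consistency
permits exactly the global orders that respect each process's program
order. Assuming two hypothetical processes of two operations each, we
can enumerate the permitted executions:

    from itertools import permutations

    P1 = [("W", "x", 1), ("R", "y")]   # P1: write x, then read y
    P2 = [("W", "y", 1), ("R", "x")]   # P2: write y, then read x

    def sc_executions(a, b):
        """Yield all interleavings that respect both program orders,
        i.e. the executions sequential consistency allows."""
        for perm in permutations(a + b):
            if all(perm.index(e) < perm.index(f)
                   for seq in (a, b) for e, f in zip(seq, seq[1:])):
                yield perm

    print(sum(1 for _ in sc_executions(P1, P2)))   # 6 of the 24 orders are legal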

Causal Consistency
Causal consistency weakens sequential consistency by categorizing
events into those that are causally related and those that are not. It
requires that only causally related write operations be seen in the
same order by all processes.

Processor Consistency
In order for consistency in data to be maintained and to attain
scalable processor systems where every processor has its own memory,
the Processor consistency model was derived. All processors need to be
consistent in the order in which they see writes done by one processor
and in the way they see writes by different processors to the same
location (coherence is maintained).

PRAM Consistency (also known as FIFO consistency)


In PRAM consistency, all processes view the operations of a single
process in the same order that they were issued by that process, while
operations issued by different processes can be viewed in different order
from different processes. PRAM consistency is weaker than processor
consistency. PRAM relaxes the need to maintain coherence to a location
across all its processors. Here, reads to any variable can be executed
before writes in a processor. Read before Write, Read after Read and
Write before Write ordering is still preserved in this model.

Cache Consistency
Cache consistency requires that all write operations to the same
memory location are performed in some sequential order. Cache
consistency is weaker than processor consistency and incomparable with
PRAM consistency.

Slow Consistency
In slow consistency, if a process reads a value previously written
to a memory location, it cannot subsequently read any earlier value
from that location. Writes performed by a process are immediately
visible to that process. Slow consistency is a weaker model than PRAM
and cache consistency.

3. Advantages of DSM:
Ø Scales well with a large number of nodes.
Ø Message passing is hidden.
Ø Can handle complex and large databases without replication or
sending the data to processes.
Ø Generally cheaper than using a multiprocessor system.
Ø Provides large virtual memory space.
Ø Programs are more portable due to common programming
interfaces.
Ø Shields programmers from send and receive primitives.

4. Clock Synchronization & Event Ordering:


The concept of one event happening before another in a
distributed system is examined, and is shown to define a partial
ordering of the events. A distributed algorithm is given for synchronizing
a system of logical clocks which can be used to totally order the events.
The use of the total ordering is illustrated with a method for solving
synchronization problems. The algorithm is then specialized for
synchronizing physical clocks, and a bound is derived on how far out of
synchrony the clocks can become.

Introduction:
The concept of time is fundamental to our way of thinking. It is
derived from the more basic concept of the order in which events occur.
We say that something happened at 3:15 if it occurred after our clock
read 3:15 and before it read 3:16. The concept of the temporal ordering
of events pervades our thinking about systems.
For example, in an airline reservation system we specify that a
request for a reservation should be granted if it is made before the flight
is filled. However, we will see that this concept must be carefully
reexamined when considering events in a distributed system.
A distributed system consists of a collection of distinct processes
which are spatially separated, and which communicate with one another
by exchanging messages. A network of interconnected computers, such
as the ARPANET, is a distributed system. A single computer can also be
viewed as a distributed system in which the central control unit, the
memory units, and the input-output channels are separate processes. A
system is distributed if the message transmission delay is not negligible
compared to the time between events in a single process. We will
concern ourselves primarily with systems of spatially separated
computers. However, many of our remarks will apply more generally. In
particular, a multiprocessing system on a single computer involves
problems similar to those of a distributed system because of the
unpredictable order in which certain events can occur.

In a distributed system, it is sometimes impossible to say that one
of two events occurred first. The relation "happened before" is therefore
only a partial ordering of the events in the system. We have found that
problems often arise because people are not fully aware of this fact and
its implications. In this paper, we discuss the partial ordering defined by
the "happened before" relation, and give a distributed algorithm for
extending it to a consistent total ordering of all the events. This
algorithm can provide a useful mechanism for implementing a
distributed system. We illustrate its use with a simple method for
solving synchronization problems. Unexpected, anomalous behavior can
occur if the ordering obtained by this algorithm differs from that
perceived by the user. This can be avoided by introducing real, physical
clocks. We describe a simple method for synchronizing these clocks, and
derive an upper bound on how far out of synchrony they can drift.

The Partial Ordering:
Most people would probably say that an event
a happened before an event b if a happened at an earlier time than b.
They might justify this definition in terms of physical theories of time.
However, if a system is to meet a specification correctly, then that
specification must be given in terms of events observable within the
system. If the specification is in terms of physical time, then the system
must contain real clocks. Even if it does contain real clocks, there is still
the problem that such clocks are not perfectly accurate and do not keep
precise physical time. We will therefore define the "happened before"
relation without using physical clocks.

We begin by defining our system more precisely. We assume that
the system is composed of a collection of processes. Each process
consists of a sequence of events. Depending upon the application, the
execution of a subprogram on a computer could be one event, or the
execution of a single machine instruction could be one event.

Logical Clocks:
Let’s again consider cases that involve assigning sequence
numbers (“timestamps”) to events upon which all cooperating processes
can agree. What matters in these cases is not the time of day at which
the event occurred but that all processes can agree on the order in
which related events occur. Our interest is in getting event sequence
numbers that make sense system-wide. These clocks are called logical
clocks.
If we can do this across all events in the system, we have
something called total ordering: every event is assigned a
timestamp, and every such timestamp is unique.
However, we don’t always need total ordering. If processes do not
interact then we don’t care when their events occur. If we only care
about assigning timestamps to related (causal) events then we have
something known as partial ordering.
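
A minimal logical-clock implementation, assuming Lamport's rules
(increment the counter on each local event, and take
max(local, received) + 1 when a message arrives):

    class LamportClock:
        """Lamport logical clock for one process."""
        def __init__(self):
            self.time = 0

        def tick(self):                # local event
            self.time += 1
            return self.time

        def send(self):                # timestamp for an outgoing message
            return self.tick()

        def receive(self, msg_ts):     # merge on an incoming message
            self.time = max(self.time, msg_ts) + 1
            return self.time

Comparing (timestamp, process id) pairs then breaks ties and yields
the total ordering described above.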

Physical clocks:
Most computers today keep track of the passage of time with a
battery-backed CMOS clock circuit, driven by a quartz resonator. This
allows the timekeeping to take place even if the machine is powered off.
When on, an operating system will generally program a timer circuit (a
Programmable Interval Timer, or PIT, in older Intel architectures; an
Advanced Programmable Interrupt Controller, or APIC, in newer
systems) to generate an interrupt periodically (common rates are 60 or
100 interrupts per second). The interrupt service procedure simply adds one
to a counter in memory.

5. Explain Mutual Exclusion in Distributed Systems.


Mutual exclusion is a condition in which, among a set of processes,
only one is able to access a given resource or perform a given function
at any time.

Centralized Systems
Mutual exclusion via:
ü Test & set
ü Semaphores
ü Messages
ü Monitors

Distributed Mutual Exclusion


Assume there is agreement on how a resource is identified
Ø Pass identifier with requests
Ø Create an algorithm to allow a process to obtain exclusive
access to a resource
ü Centralized Algorithm
ü Token Ring Algorithm
ü Distributed Algorithm
ü Decentralized Algorithm

Centralized Algorithm:
Ø Mimic single processor system
Ø One process elected as Coordinator
ü Request resource
ü Wait for response
ü Receive grant
ü Access resource
ü Release resource

If another process claimed resource:
Ø Coordinator does not reply until release
Ø Maintain queue
ü Service requests in FIFO order
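
A toy coordinator capturing these rules (single resource, simulated in
one process; the class and method names are hypothetical):

    from collections import deque

    class Coordinator:
        """Centralized mutual exclusion: grant, queue, release in FIFO order."""
        def __init__(self):
            self.holder = None
            self.queue = deque()

        def request(self, pid):
            if self.holder is None:
                self.holder = pid
                return "GRANT"
            self.queue.append(pid)   # no reply until a release; requester waits
            return "QUEUED"

        def release(self, pid):
            assert self.holder == pid
            self.holder = self.queue.popleft() if self.queue else None
            return self.holder       # next process granted the resource, if any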

Benefits:
Ø Fair.
ü All requests processed in order
Ø Easy to implement, understand, verify.
Problems:
Ø Process cannot distinguish being blocked from a dead Coordinator.
Ø Centralized server can be a bottleneck.

Token Ring algorithm:


Assume known group of processes
Ø Some ordering can be imposed on group
Ø Construct logical ring in software
Ø Process communicates with neighbor

Ø Initialization
ü Process 0 gets token for resource R
Ø Token circulates around ring
ü From Pi to P(i+1)mod N
Ø When process acquires token
ü Checks to see if it needs to enter critical section
ü If no, send token to neighbor
ü If yes, access resource
• Hold token until done

Ø Only one process at a time has token
ü Mutual exclusion guaranteed
Ø Order well-defined
ü Starvation cannot occur
Ø If token is lost (e.g. process died)
ü It will have to be regenerated
Ø Does not guarantee FIFO order
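
A sketch of one circulation of the token (simulated in a single
process; token loss and regeneration are not modeled):

    def token_ring_round(n, wants):
        """Pass the token once around a ring of n processes.
        wants: ids of processes that need the critical section."""
        token_at = 0
        for _ in range(n):
            if token_at in wants:
                print(f"P{token_at} enters and leaves the critical section")
            token_at = (token_at + 1) % n   # hand the token to the neighbour

    token_ring_round(5, wants={1, 3})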

Ricart & Agrawala algorithm:

Ø Distributed algorithm using reliable multicast and logical clocks
Ø Process wants to enter critical section:
ü Compose message containing:
• Identifier (machine ID, process ID)
• Name of resource
• Timestamp (totally-ordered Lamport clock)
ü Send request to all processes in group
ü Wait until everyone gives permission
ü Enter critical section / use resource
Ø When process receives request:
ü If receiver not interested:
• Send OK to sender
ü If receiver is in critical section
• Do not reply; add request to queue
ü If receiver just sent a request as well:
• Compare timestamps: received & sent messages
• Earliest wins
• If receiver is loser, send OK
• If receiver is winner, do not reply, queue
Ø When done with critical section
ü Send OK to all queued requests
Ø N points of failure
Ø A lot of messaging traffic
Ø Demonstrates that a fully distributed algorithm is possible
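
The heart of the algorithm is the receive-request rule. Here is a
sketch for one node (networking stubbed out as a list of sent OKs;
the lower (timestamp, pid) pair wins ties):

    class RANode:
        """Ricart & Agrawala request handling for a single node (sketch)."""
        def __init__(self, pid):
            self.pid = pid
            self.in_cs = False
            self.my_req = None        # (timestamp, pid) of our own request
            self.deferred = []        # requesters we will OK on exit
            self.sent_ok = []         # stand-in for real network sends

        def on_request(self, ts, sender):
            if not self.in_cs and self.my_req is None:
                self.sent_ok.append(sender)      # not interested: OK at once
            elif self.in_cs or (ts, sender) > self.my_req:
                self.deferred.append(sender)     # we win: hold the reply
            else:
                self.sent_ok.append(sender)      # requester wins the tie-break

        def on_exit(self):
            self.sent_ok.extend(self.deferred)   # release all deferred OKs
            self.deferred.clear()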

Lamport’s Mutual Exclusion:
Each process maintains a request queue
ü Contains mutual exclusion requests
Requesting critical section:
ü Process Pi sends request(i, Ti) to all nodes
ü Places the request on its own queue
ü When a process Pj receives a request, it returns a timestamped ack
Entering critical section (accessing resource):
ü Pi has received a message (ack or release) from every other
process with a timestamp larger than Ti
ü Pi’s request has the earliest timestamp in its queue
Difference from Ricart-Agrawala:
ü Everyone always responds; there is no holding back
ü A process decides to go based on whether its request is the
earliest in its queue
Releasing critical section:
ü Remove the request from its own queue
ü Send a timestamped release message
ü When a process receives a release message
• It removes the request of that process from its queue
• This may cause its own entry to have the earliest
timestamp in the queue, enabling it to access the
critical section
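
The entry condition can be written as a single predicate. A sketch,
assuming the bookkeeping described above (a queue of (timestamp, pid)
requests, plus the latest timestamp heard from each other process):

    def may_enter(my_req, queue, latest_ts):
        """Lamport mutual exclusion entry test.
        my_req:    (Ti, i), this process's own queued request
        queue:     every (timestamp, pid) request currently known
        latest_ts: latest timestamp (ack or release) from each other process"""
        return (min(queue) == my_req and
                all(t > my_req[0] for t in latest_ts.values()))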
6. Explain Deadlock in Distributed System.
A deadlock is a condition in a system where a set of processes (or
threads) have requests for resources that can never be satisfied.
Essentially, a process cannot proceed because it needs to obtain a
resource held by another process but it itself is holding a resource that
the other process needs. More formally, four conditions have to be met
for a deadlock to occur in a System:
Mutual exclusion: A resource can be held by at most one process.
Hold and wait: Processes that already hold resources can wait for
another resource.
Non-preemption: A resource, once granted, cannot be taken away.
Circular wait: Two or more processes form a cycle in which each is
waiting for a resource held by the next process in the chain.

Deadlock in distributed systems:


The same conditions for deadlock in uniprocessors apply to
distributed systems. Unfortunately, as in many other aspects of
distributed systems, they are harder to detect, avoid, and prevent. Four
strategies can be used to handle deadlock:

Ignorance: ignore the problem; assume that a deadlock will never
occur. This is a surprisingly common approach.
Detection: let a deadlock occur, detect it, and then deal with it by
aborting and later restarting a process that causes deadlock.
Prevention: make a deadlock impossible by granting requests so that
one of the necessary conditions for deadlock does not hold.
Avoidance: choose resource allocation carefully so that deadlock will
not occur. Resource requests can be honored as long as the system
remains in a safe (non-deadlock) state after resources are allocated.
Centralized Deadlock Detection:
Ø We use a centralized deadlock detection algorithm and try to
imitate the non-distributed algorithm.
ü Each machine maintains the resource graph for its own
processes and resources.
ü A centralized coordinator maintains the resource graph for
the entire system.
ü When the coordinator detects a cycle, it kills off one process
to break the deadlock.
ü In updating the coordinator’s graph, messages have to be
passed.
• Method 1) Whenever an arc is added or deleted from
the resource graph, a message has to be sent to the
coordinator.
• Method 2) Periodically, every process can send a list
of arcs added and deleted since the previous update.
• Method 3) The coordinator asks for information when it
needs it.
Distributed Deadlock Detection:
Ø The Chandy - Misra-Haas algorithm:
ü Processes are allowed to request multiple resources at once;
the growing phase of a transaction can be sped up.
ü The consequence of this change is that a process may now wait
on two or more resources at the same time.
ü When a process has to wait for some resources, a probe
message is generated and sent to the process holding the
resources. The message consists of three numbers: the
process being blocked, the process sending the message,
and the process receiving the message.
ü When the message arrives, the recipient checks to see if it
itself is waiting for any processes. If so, the message is
updated, keeping the first number unchanged and replacing the
second and third fields with the corresponding process numbers.
ü The message is then sent to the process holding the needed
resources.
ü If a message goes all the way around and comes back to the
original sender (the process that initiated the probe), a cycle
exists and the system is deadlocked.
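
A compact simulation of probe propagation (here the wait-for graph is
given up front as a dict; in a real system each edge would be
discovered by a message):

    def chandy_misra_haas(initiator, waits_for):
        """Return True if a probe started by `initiator` comes back to it.
        waits_for: dict mapping a process to the processes it waits on."""
        frontier = [(initiator, p) for p in waits_for.get(initiator, ())]
        seen = set()
        while frontier:
            sender, receiver = frontier.pop()
            if receiver == initiator:
                return True                          # probe returned: deadlock
            if (sender, receiver) not in seen:
                seen.add((sender, receiver))
                for nxt in waits_for.get(receiver, ()):
                    frontier.append((receiver, nxt))  # forward updated probe

        return False

    print(chandy_misra_haas("A", {"A": ["B"], "B": ["C"], "C": ["A"]}))  # True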

Distributed deadlock prevention:
An alternative to detecting deadlocks is to design a system so that
deadlock is impossible. We examined the four conditions for deadlock. If
we can deny at least one of these conditions then we will not have
deadlock.
Mutual exclusion
To deny this means that we will allow a resource to be held (used)
by more than one process at a time. If a resource can be shared then
there is no need for mutual exclusion and deadlock cannot occur. Too
often, however, a process requires mutual exclusion for a resource
because the resource is some object that will be modified by the
process.
Hold and wait
Denying this means that processes that hold resources
cannot wait for another resource. This typically implies that a process
should grab all of its resources at once. This is not practical either since
we cannot always predict what resources a process will need throughout
its execution.
Non-preemption
A resource, once granted, cannot be taken away. In transactional
systems, allowing preemption means that a transaction can come in and
modify data (the resource) that is being used by another transaction.
This differs from mutual exclusion since the access is not concurrent but
the same problem arises of having multiple transactions modify the
same resource. We can support this with optimistic concurrency control
algorithms that will check for out-of-order modifications at commit time
and roll back (abort) if there are potential inconsistencies.
Circular wait
Avoiding circular wait means that we ensure that a cycle of
waiting on resources does not occur. We can do this by enforcing an
ordering on granting resources and aborting transactions or denying
requests if an ordering cannot be granted.
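
In code, the usual way to enforce such an ordering is to always
acquire locks in one agreed total order. A minimal sketch (the
resource names are hypothetical):

    import threading

    locks = {"disk": threading.Lock(), "printer": threading.Lock()}

    def with_resources(names):
        """Acquire resources in one global (alphabetical) order, so a
        circular chain of waits can never form."""
        held = sorted(names)              # the agreed total order
        for n in held:
            locks[n].acquire()
        try:
            print("working with", held)
        finally:
            for n in reversed(held):
                locks[n].release()

    with_resources({"printer", "disk"})   # always "disk" first, then "printer"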

7. Explain Election Algorithms in Distributed System.

Ø All processes currently involved get together to choose a
coordinator
Ø If the coordinator crashes or becomes isolated, elect a new
coordinator
Ø If a previously crashed or isolated process, comes on line, a new
election may have to be held
Ø Wired systems
ü Bully algorithm
ü Ring algorithm
Ø Wireless systems
Ø Very large-scale systems

Bully Algorithm:
Ø Assume
ü All processes know about each other
ü Processes numbered uniquely
ü They do not know each other’s state
Ø Suppose P notices no coordinator
ü Sends election message to all higher numbered
processes
ü If no response, P takes over as coordinator
ü If any responds, P yields
Ø Suppose Q receives election message
ü Replies OK to sender, saying it will take over
ü Sends a new election message to higher numbered
processes
Ø Repeat until only one process left standing
ü Announces victory by sending message saying that it
is the coordinator

Ø Suppose R comes back on line


ü Sends a new election message to higher numbered
processes
Ø Repeat until only one process left standing
ü Announces victory by sending message saying that it
is the coordinator (if not already the coordinator)
Ø Existing (lower numbered) coordinator yields
ü Hence the term “bully”
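
A toy synchronous simulation of the election (every live process is
assumed to answer instantly; in reality, timeouts decide that there
was "no response"):

    def bully_election(alive, starter):
        """Return the coordinator chosen when `starter` begins an election.
        alive: set of live process ids."""
        p = starter
        while True:
            higher = [q for q in alive if q > p]   # ELECTION to higher ids
            if not higher:
                return p                # nobody higher answered: p wins
            p = min(higher)             # a responder takes over the election

    print(bully_election({1, 2, 4, 5}, starter=2))   # 5, the highest live id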

Alternative – Ring Algorithm:
Ø All processes organized in ring
Ø Suppose P notices no coordinator
ü Sends election message to successor with own process
number in body of message
ü (If successor is down, skip to next process, etc.)
Ø Suppose Q receives an election message
ü Adds own process number to list in message body
Ø Suppose P receives an election message with its own process
number in body
ü Changes message to coordinator message, preserving
body
ü All processes recognize highest numbered process as
new coordinator
Ø If multiple messages circulate …
ü …they will all contain same list of processes
(eventually)
Ø If process comes back on-line
ü Calls new election
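
A sketch of one election pass around the ring (all listed processes
are assumed alive; skipping dead successors is omitted):

    def ring_election(ring, starter):
        """ring: process ids in ring order. The starter circulates an
        ELECTION message collecting ids; the highest id becomes coordinator."""
        n = len(ring)
        i = ring.index(starter)
        body = [starter]                  # starter adds its own id
        j = (i + 1) % n
        while ring[j] != starter:
            body.append(ring[j])          # each successor appends its id
            j = (j + 1) % n
        return max(body)                  # announced in a COORDINATOR message

    print(ring_election([3, 6, 0, 8, 2], starter=0))   # 8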

Wireless Environments:
Ø Unreliable, and processes may move
• Network topology constantly changing
Ø Algorithm:
ü Any node starts by sending out an ELECTION message to
neighbors
ü When a node receives an ELECTION message for the first
time, it forwards to neighbors, and designates the sender as
its parent
ü It then waits for responses from its neighbors
• Responses may carry resource information
ü When a node receives an ELECTION message for the second
time, it just OKs it


[Figure: (e) the build-tree phase; (f) reporting of the best node to the source.]

Very Large Scale Networks:


Ø Sometimes more than one node should be selected
Ø Nodes organized as peers and super-peers
ü Elections held within each peer group
ü Super-peers coordinate among themselves

*****************


The following are the Important Questions from UNIT-III

1. What is DSM (Distributed Shared Memory)? Explain Design and
Implementation of DSM System?
2. Granularity (Parallel Computing) and Consistency Models.
3. Advantages of DSM
4. Clock Synchronization & Event Ordering
5. Explain Mutual Exclusion in Distributed Systems.
6. Explain Deadlock in Distributed System
7. Explain Election Algorithms in Distributed System.

*****************

Prepared by P.Y.Kumar © www.anuupdates.org