CS3551 – Distributed Computing Notes
COURSE OBJECTIVES:
UNIT I INTRODUCTION 8
Logical Time: Physical Clock Synchronization: NTP – A Framework for a System of Logical Clocks –
Scalar Time – Vector Time; Message Ordering and Group Communication: Message Ordering Paradigms
– Asynchronous Execution with Synchronous Communication – Synchronous Program Order on
Asynchronous System – Group Communication – Causal Order – Total Order; Global State and Snapshot
Recording Algorithms: Introduction – System Model and Definitions – Snapshot Algorithms for FIFO
Channels.
COURSE OUTCOMES:
Upon the completion of this course, the student will be able to
UNIT I
INTRODUCTION TO DISTRIBUTED
SYSTEMS
INTRODUCTION
Computation began on single-processor machines; such uniprocessor computing can be termed centralized computing. As the demand for greater processing capability grew, multiprocessor systems came into existence. The advent of multiprocessor systems led to the development of distributed systems with a high degree of scalability and resource sharing. Modern parallel computing can be regarded as a subset of distributed computing.
QoS parameters
A distributed system must offer the following quality of service (QoS) guarantees:
Performance
Reliability
Availability
Security
The interaction of the network layers with the operating system and middleware is shown in Fig 1.2. The middleware contains important library functions that facilitate the operation of the distributed system.
The distributed system uses a layered architecture to break down the complexity of system design. The middleware is the distributed software that drives the distributed system while providing transparency of heterogeneity at the platform level.
1.3 MOTIVATION
The following are the key driving forces behind distributed systems:
The main objective of parallel systems is to improve processing speed. They are sometimes known as multiprocessors, multicomputers, or tightly coupled systems. They refer to the simultaneous use of multiple computing resources, which can include a single computer with multiple processors, a number of computers connected by a network to form a parallel processing cluster, or a combination of both.
1. A multiprocessor system
A multiprocessor system is a parallel system in which the multiple processors have
direct access to shared memory which forms a common address space. The
architecture is shown in Figure 1.3(a). Such processors usually do not have a common
clock.
Figure 1.3 Two standard architectures for parallel systems. (a) Uniform
memory access (UMA) multiprocessor system. (b) Non-uniform memory access
(NUMA) multiprocessor.
Figure 1.4 shows two popular interconnection networks – the Omega network and the
Butterfly network, each of which is a multi-stage network formed of 2×2 switching
elements. Each 2×2 switch allows data on either of the two input wires to be switched
to the upper or the lower output wire. In a single step, however, only one data unit can
be sent on an output wire. So if the data from both the input wires is to be routed to
the same output wire in a single step, there is a collision.
The routing function from input line i to output line j considers only j and the stage number s, where s ∈ [0, log n − 1]. At a stage-s switch, if the (s + 1)-th most significant bit of j is 0, the data is routed to the upper output wire; otherwise it is routed to the lower output wire.
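A minimal Python sketch of this routing rule (the function name and interface are illustrative; n is assumed to be a power of two):

```python
# Sketch of the Omega-network routing rule described above: at a stage-s
# switch, inspect the (s+1)-th most significant bit of the destination j.
def omega_route(j: int, s: int, n: int) -> str:
    """Decide the output wire at a stage-s switch for destination j
    in an n-input Omega network (n a power of two)."""
    bits = n.bit_length() - 1          # log2(n) address bits
    msb = (j >> (bits - 1 - s)) & 1    # (s+1)-th most significant bit of j
    return "upper" if msb == 0 else "lower"

# Example: routing to output 5 (binary 101) in an 8-input network.
for stage in range(3):                 # stages 0 .. log2(n) - 1
    print(stage, omega_route(5, stage, 8))   # lower, upper, lower
```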
Butterfly network
A butterfly network links multiple computers into a high-speed network. A butterfly network with n processor nodes requires n(log n + 1) switching nodes. The interconnection pattern between a pair of adjacent stages depends not only on n but also on the stage number s. At a stage-s switch, if the (s + 1)-th MSB of j is 0, the data is routed to the upper output wire; otherwise it is routed to the lower output wire.
3. Array Processors
They are a class of processors that execute a single instruction on an entire array or table of data at the same time, rather than on single data elements, under a common clock. They are also known as vector processors. An array processor implements an instruction set in which each instruction is executed on all associated data items before moving on to the next instruction. Array elements are incapable of operating autonomously and must be driven by the control unit.
Flynn’s Taxonomy
Flynn's taxonomy is a classification of parallel computer architectures based on the number of concurrent instruction streams (single or multiple) and data streams (single or multiple) available in the architecture.
The four classes based on instruction streams and data streams are the following:
1. (SISD) single instruction, single data
2. (MISD) multiple instruction, single data
3. (SIMD) single instruction, multiple data
4. (MIMD) multiple instruction, multiple data
The multiprocessor systems are classified into two types based on coupling:
1. Loosely coupled systems
2. Tightly coupled systems
Concurrency
Concurrent programming refers to techniques for decomposing a task into subtasks that can execute in parallel, and for managing the risks that arise when the program executes more than one task at the same time.
The parallelism or concurrency in a parallel or distributed program can be measured as the ratio of the number of local (non-communication and non-shared-memory access) operations to the total number of operations, including the communication or shared memory access operations.
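A small worked illustration of this measure (the operation counts below are invented for the example):

```python
# Measure of concurrency as defined above: the fraction of operations
# that are purely local (no communication, no shared-memory access).
def concurrency_measure(local_ops: int, comm_ops: int) -> float:
    return local_ops / (local_ops + comm_ops)

# Example: 800 local operations and 200 communication/shared-memory
# operations give a concurrency of 0.8.
print(concurrency_measure(800, 200))   # 0.8
```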
Granularity
Granularity or grain size is a measure of the amount of work or computation performed by a task.
Granularity also captures the communication overhead between multiple processors or processing elements.
In this sense, granularity is defined as the ratio of computation time to communication time, where computation time is the time required to perform the computation of a task and communication time is the time required to exchange data between processors.
Parallelism can be classified into three categories based on work distribution among the parallel tasks:
1. Fine-grained: the application is partitioned into small amounts of work, leading to a low computation-to-communication ratio.
2. Coarse-grained: this has a high computation-to-communication ratio.
3. Medium-grained: here the task size and communication time are greater than in fine-grained parallelism and lower than in coarse-grained parallelism.
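A small Python sketch tying the computation-to-communication ratio to the three classes above (the numeric thresholds are illustrative assumptions, not from the notes):

```python
# Granularity as the ratio of computation time to communication time.
def granularity(compute_time: float, comm_time: float) -> float:
    return compute_time / comm_time

def classify(g: float) -> str:
    # Thresholds are illustrative only; real cutoffs depend on the system.
    if g < 1.0:
        return "fine-grained"      # low computation-to-communication ratio
    elif g < 10.0:
        return "medium-grained"
    return "coarse-grained"        # high computation-to-communication ratio

# Example: 10 ms of computation against 2 ms of communication.
print(classify(granularity(10.0, 2.0)))   # medium-grained (ratio 5.0)
```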
Programs with fine-grained parallelism are best suited for tightly coupled systems.
Classes of OS for multiprocessing systems:
Network operating systems: the operating system runs on loosely coupled processors, which are themselves running loosely coupled software.
Distributed operating systems: the operating system runs on loosely coupled processors, which are running tightly coupled software.
Multiprocessor operating systems: the operating system runs on tightly coupled processors, which are themselves running tightly coupled software.
1.4 MESSAGE-PASSING SYSTEMS VERSUS SHARED MEMORY SYSTEMS
In shared memory systems, communication among processors takes place via shared data variables and control variables for synchronization. Communication between the tasks in multiprocessor systems takes place through two main modes, message passing and shared memory, which are contrasted below:
Message passing: Processes can communicate with other processes. They can be protected from one another by having private address spaces.
Shared memory: A process does not have a private address space, so one process can alter the execution of another.

Message passing: This technique can be used in heterogeneous computers.
Shared memory: This cannot be used in heterogeneous computers.

Message passing: Synchronization between processes is through message-passing primitives.
Shared memory: Synchronization is through locks and semaphores.

Message passing: Processes communicating via message passing must execute at the same time.
Shared memory: Processes communicating through DSM may execute with non-overlapping lifetimes.

Efficiency:
Message passing: All remote data accesses are explicit, so the programmer is always aware of whether a particular operation is in-process or involves the expense of communication.
Shared memory: Any particular read or update may or may not involve communication by the underlying runtime support.
Blocking primitives
Blocking primitives wait for the message operation to complete; the execution of the invoking process is blocked.
The sending process must wait after a send until an acknowledgement is made by the receiver.
The receiving process must wait for the expected message from the sending process.
Receipt is determined by polling a common buffer or via an interrupt.
This is a form of synchronization, or synchronous communication.
A primitive is blocking if control returns to the invoking process only after the processing for the primitive completes.
Asynchronous
A Send primitive is said to be asynchronous if control returns to the invoking process after the data item to be sent has been copied out of the user-specified buffer.
It does not make sense to define asynchronous Receive primitives.
Implementing non-blocking operations is tricky.
For non-blocking primitives, a return parameter on the primitive call returns a system-generated handle, which can later be used to check the status of completion of the call.
The process can check for completion in two ways:
by checking if the handle has been flagged or posted
by issuing a Wait with a list of handles as parameters; Wait usually blocks until one of the parameter handles is posted
Fig 1.12 a) Blocking synchronous send and blocking receive
Fig 1.12 b) Non-blocking synchronous send and blocking receive
Fig 1.12 c) Blocking asynchronous send
Fig 1.12 d) Non-blocking asynchronous send
Checking for completion may be necessary if the user wants to reuse the buffer from which the data was sent.
Non-blocking Receive:
The Receive call will cause the kernel to register the call and return the handle of a
location that the user process can later check for the completion of the non-blocking
Receive operation.
This location gets posted by the kernel after the expected data arrives and is copied to
the user-specified buffer. The user process can check for the completion of the non-
blocking Receive by invoking the Wait operation on the returned handle.
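A minimal sketch of a non-blocking Receive that returns a completion handle, modeled with Python threads (the Handle class and function names are illustrative; a real kernel, not a worker thread, would post the handle):

```python
# Sketch of a non-blocking Receive with a completion handle: the call
# registers the operation and returns at once; the handle is posted when
# the expected data arrives, and can be polled or Waited upon.
import threading, queue

class Handle:
    def __init__(self):
        self._event = threading.Event()
        self.data = None
    def post(self, data):            # "kernel" marks the operation complete
        self.data = data
        self._event.set()
    def is_posted(self) -> bool:     # option 1: poll the handle
        return self._event.is_set()
    def wait(self):                  # option 2: block until posted
        self._event.wait()

def nonblocking_receive(channel: queue.Queue) -> Handle:
    handle = Handle()
    # Register the call and return immediately; delivery happens later.
    threading.Thread(target=lambda: handle.post(channel.get()),
                     daemon=True).start()
    return handle

ch = queue.Queue()
h = nonblocking_receive(ch)          # returns at once with a handle
ch.put("hello")                      # the message arrives later
h.wait()                             # Wait on the handle until posted
print(h.data)                        # hello
```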
Processor Synchrony
Processor synchrony indicates that all the processors execute in lock-step with their clocks
synchronized.
RMI vs RPC
RMI: uses an object-oriented paradigm, where the user needs to know the object and the method of the object to be invoked. RMI handles the complexities of passing the invocation from the local to the remote computer, but instead of passing a procedural call, it passes a reference to the object and the method that is being called.
RPC: is not object oriented and does not deal with objects; rather, it calls specific subroutines that are already established. With RPC, the call looks like a local call, and RPC handles the complexities involved with passing the call from the local to the remote computer.
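As an illustration that an RPC invocation "looks like a local call", here is a minimal sketch using Python's standard xmlrpc module (the notes do not prescribe any particular RPC framework; the add function and port number are invented for the example):

```python
# A remote subroutine invoked through a proxy, so the call site reads
# exactly like a local function call.
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

def add(a, b):
    return a + b                     # the remote subroutine

server = SimpleXMLRPCServer(("localhost", 8000), logRequests=False)
server.register_function(add)
threading.Thread(target=server.serve_forever, daemon=True).start()

proxy = ServerProxy("http://localhost:8000")
print(proxy.add(2, 3))               # invoked like a local call -> 5
```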
Asynchronous Execution:
Communication among processes is considered asynchronous when every communicating process can have a different observation of the order of the messages being exchanged. In an asynchronous execution:
there is no processor synchrony and there is no bound on the drift rate of processor clocks
message delays are finite but unbounded
there is no upper bound on the time taken by a process to execute a step
Synchronous Execution:
Communication among processes is considered synchronous when every process observes the same order of messages within the system. Likewise, the execution is considered synchronous when every individual process in the system observes the same total order of all the messages exchanged within it. In a synchronous execution:
processors are synchronized and the clock drift rate between any two processors is bounded
message delivery times are such that they occur in one logical step or round
there is an upper bound on the time taken by a process to execute a step
Fig 1.14: Synchronous execution
Emulating an asynchronous system by a synchronous system (A → S)
An asynchronous program can be emulated on a synchronous system fairly trivially as
the synchronous system is a special case of an asynchronous system – all
communication finishes within the same round in which it is initiated.
When all the above conditions are satisfied, it can be concluded that a → b, i.e., a and b are causally related. Consider two events c and d: if both c → d and d → c are false, they are not causally related, and c and d are said to be concurrent events, denoted c ∥ d.
A system that supports the causal ordering model satisfies the following property: for any two messages m1 and m2 sent to the same destination, if send(m1) → send(m2), then rec(m1) → rec(m2).
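One standard way to enforce this property is with vector timestamps; below is a minimal sketch of the usual delivery condition (the vector-clock mechanism is an assumption of this illustration, not spelled out in the excerpt above):

```python
# Standard vector-clock delivery condition for causal order: deliver a
# message m from sender j at process i only when
#   m.vc[j] == local_vc[j] + 1   and   m.vc[k] <= local_vc[k] for k != j.
def can_deliver(local_vc: list, msg_vc: list, sender: int) -> bool:
    if msg_vc[sender] != local_vc[sender] + 1:
        return False               # an earlier message from sender is missing
    return all(msg_vc[k] <= local_vc[k]
               for k in range(len(local_vc)) if k != sender)

# Example with three processes: having already seen one message from p0,
# a message stamped [1, 0, 1] from p2 is deliverable; without p0's
# message it must be buffered.
print(can_deliver([1, 0, 0], [1, 0, 1], sender=2))   # True
print(can_deliver([0, 0, 0], [1, 0, 1], sender=2))   # False
```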
A distributed snapshot represents a state in which the distributed system might have been. A snapshot of the system is a single configuration of the system.
The global state of a distributed system is a collection of the local states of its
components, namely, the processes and the communication channels.
The state of a process at any time is defined by the contents of processor registers,
stacks, local memory, etc. and depends on the local context of the distributed
application.
The state of a channel is given by the set of messages in transit in the channel.
The state of a channel is difficult to state formally because a channel is a distributed entity and its state depends upon the states of the processes it connects. Let SC_ij denote the state of a channel C_ij, defined as the set of messages in transit on the channel: those messages m_ij for which send(m_ij) has occurred at p_i but rec(m_ij) has not yet occurred at p_j.
A distributed snapshot should reflect a consistent state. A global state is consistent if it could have been observed by an external observer. For a successful global state recording, all states must be consistent:
If we have recorded that a process P has received a message from a process Q, then we should have also recorded that process Q actually sent that message. Otherwise, the snapshot would contain the recording of messages that have been received but never sent.
The reverse condition (Q has sent a message that P has not yet received) is allowed.
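A minimal sketch of this consistency check over recorded message sets (the message names are invented for the example):

```python
# Consistency condition above: every message recorded as received must
# also be recorded as sent; the converse (in-transit messages) is allowed.
def is_consistent(sent: set, received: set) -> bool:
    return received <= sent          # received must be a subset of sent

sent = {"m1", "m2", "m3"}
print(is_consistent(sent, {"m1", "m2"}))   # True: m3 still in transit is fine
print(is_consistent(sent, {"m1", "m4"}))   # False: m4 received but never sent
```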
The notion of a global state can be graphically represented by what is called a cut. A cut
represents the last event that has been recorded for each process.
The history of each process p_i is given by the sequence of events that occur at it: h_i = ⟨e_i^1, e_i^2, …⟩.
Each event is either an internal action of the process or the sending or receiving of a message. We denote by s_i^k the state of process p_i immediately before the kth event occurs. The state s_i in the global state S corresponding to the cut C is that of p_i immediately after the last event processed by p_i in the cut, e_i^{c_i}. The set of events {e_i^{c_i}} is called the frontier of the cut.
A cut is a set of cut events, one per node, each of which captures the state of the node on which it
occurs.
Pictorially, a cut is a line that slices the space–time diagram, and thus the set of events in the distributed computation, into a PAST and a FUTURE. The PAST contains all the events to the left of the cut and the FUTURE contains all the events to the right of the cut. For a cut C, let PAST(C) and FUTURE(C) denote the set of events in the PAST and FUTURE of C, respectively.
Consistent cut: A consistent global state corresponds to a cut in which every message
received in the PAST of the cut was sent in the PAST of that cut.
Inconsistent cut: A cut is inconsistent if a message crosses the cut from the FUTURE to the
PAST.
The term max(Past_i(e_j)) denotes the latest event at process p_i that has affected e_j. This will always be a message send event.
Fig 1.19: Past and future cones of event
A cut in a space–time diagram is a line joining an arbitrary point on each process line that slices the diagram into a PAST and a FUTURE.
The future of an event e_j, denoted by Future(e_j), contains all the events e_i that are causally affected by e_j.
Future_i(e_j) is the set of those events of Future(e_j) that occur on process p_i, and min(Future_i(e_j)) is the first event on process p_i that is affected by e_j. All events at a process p_i that occurred after max(Past_i(e_j)) but before min(Future_i(e_j)) are concurrent with e_j.