# Distributed Computing Module-1
# Definition
A distributed system is a collection of independent computers that appears to its users as a
single coherent system. A distributed system consists of components (i.e., computers) that are
autonomous, and users (be they people or programs) think they are dealing with a single system.
This means that, one way or another, the autonomous components need to collaborate.
The computers, processes, or processors are referred to as the nodes of the distributed
system. To qualify as "autonomous", the nodes must at least be equipped with their own
private control; thus, a parallel computer of the single-instruction, multiple-data (SIMD) model does
not qualify as a distributed system. To qualify as "interconnected", the nodes must be able to
exchange information.
# Motivation
The motivation for using a distributed system is some or all of the following requirements:
1. Inherently distributed computations: In many applications, such as money transfer in
banking or reaching consensus among geographically distant parties, the computation is
inherently distributed.
2. Resource sharing: Resources such as peripherals, complete data sets in databases, special
libraries, as well as data (variables/files), cannot be fully replicated at all the sites because
doing so is often neither practical nor cost-effective. Further, they cannot be placed at a single
site, because access to that site might prove to be a bottleneck. Therefore, such resources are
typically distributed across the system.
3. Access to geographically remote data and resources: In many scenarios, the data cannot
be replicated at every site participating in the distributed execution because it may be too
large or too sensitive to be replicated. For example, payroll data within a multinational
corporation is both too large and too sensitive to be replicated at every branch office/site. It
is therefore stored at a central server that can be queried by branch offices. Similarly,
special resources such as supercomputers exist only at certain locations, and to access such
supercomputers, users need to log in remotely.
4. Enhanced reliability: A distributed system has the inherent potential to provide increased
reliability because of the possibility of replicating resources and executions, as well as the
reality that geographically distributed resources are not likely to crash or malfunction at the
same time under normal circumstances. Reliability entails several aspects:
• availability, i.e., the resource should be accessible at all times;
• integrity, i.e., the value/state of the resource should be correct, in the face of concurrent
access from multiple processors, as per the semantics expected by the application;
• fault-tolerance, i.e., the ability to recover from system failures, where such failures may be
defined to occur in one of many failure models.
5. Scalability: As the processors are usually connected by a wide-area network, adding more
processors does not pose a direct bottleneck for the communication network.
6. Increased performance/cost ratio: By sharing resources and accessing geographically
remote data and resources, the performance/cost ratio is increased. Although high
throughput has not necessarily been the main objective behind using a distributed system,
any task can nevertheless be partitioned across the various computers in the distributed
system. Such a configuration provides a better performance/cost ratio than using special
parallel machines.
# Primitives for Distributed Communication
✓ Synchronous primitives: A Send or a Receive primitive is synchronous if both the Send() and
Receive() handshake with each other. The processing for the Send primitive completes only
after the invoking processor learns that the corresponding Receive primitive has also
been invoked and that the receive operation has completed. The processing for the
Receive primitive completes when the data to be received is copied into the receiver's user
buffer.
✓ Asynchronous primitives: A Send primitive is said to be asynchronous if control returns
to the invoking process after the data item to be sent has been copied out of the user-
specified buffer. It does not make sense to define asynchronous Receive primitives.
✓ Blocking primitives: A primitive is blocking if control returns to the invoking process after
the processing for the primitive (whether in synchronous or asynchronous mode)
completes.
✓ Non-blocking primitives: A primitive is non-blocking if control returns to the invoking
process immediately after invocation, even though the operation has not completed. For a
non-blocking Send, control returns to the process even before the data is copied out of the
user buffer. For a non-blocking Receive, control returns to the process even before the data
may have arrived from the sender. For non-blocking primitives, a return parameter on the
primitive call returns a system-generated handle which can later be used to check the status
of completion of the call. The process can check for the completion of the call in two ways.
First, it can keep checking (in a loop or periodically) whether the handle has been flagged or
posted. Second, it can issue a Wait with a list of handles as parameters; the Wait call usually
blocks until one of the parameter handles is posted.
✓ Blocking synchronous Send: The data gets copied from the user buffer to the kernel buffer
and is then sent over the network. After the data is copied to the receiver's system buffer
and a Receive call has been issued, an acknowledgement back to the sender causes control
to return to the process that invoked the Send operation and completes the Send.
✓ Non-blocking synchronous Send: Control returns to the invoking process as soon as
the copy of data from the user buffer to the kernel buffer is initiated. A parameter in the
non-blocking call also gets set with the handle of a location that the user process can later
check for the completion of the synchronous send operation. The location gets posted after
an acknowledgement returns from the receiver.
✓ Blocking asynchronous Send: The user process that invokes the Send is blocked until the
data is copied from the user's buffer to the kernel buffer. (For the unbuffered option, the
user process that invokes the Send is blocked until the data is copied from the user's buffer
to the network.)
✓ Non-blocking asynchronous Send: The user process that invokes the Send is blocked only
until the transfer of the data from the user's buffer to the kernel buffer is initiated. Control
returns to the user process as soon as this transfer is initiated, and a parameter in the non-
blocking call also gets set with the handle of a location that the user process can later check,
using the Wait operation, for the completion of the asynchronous Send operation. The
asynchronous Send completes when the data has been copied out of the user's buffer.
✓ Blocking Receive: The Receive call blocks until the expected data arrives and is written into
the specified user buffer; control is then returned to the user process.
✓ Non-blocking Receive: The Receive call causes the kernel to register the call and return
the handle of a location that the user process can later check for the completion of the non-
blocking Receive operation. This location gets posted by the kernel after the expected data
arrives and is copied into the user-specified buffer. The user process can check for the
completion of the non-blocking Receive by invoking the Wait operation on the returned
handle. (The sketch following this list maps these send and receive modes onto MPI calls.)
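These modes correspond closely to MPI's point-to-point calls: MPI_Ssend (blocking synchronous Send), MPI_Issend (non-blocking synchronous Send), MPI_Send/MPI_Bsend (blocking asynchronous Send), MPI_Isend (non-blocking asynchronous Send), MPI_Recv (blocking Receive), and MPI_Irecv (non-blocking Receive). The sketch below illustrates a few of these, including both ways of checking a handle for completion (polling with MPI_Test, and a blocking Wait). The mapping is an interpretation added here for illustration, not part of the original notes, and assumes an MPI implementation such as OpenMPI or MPICH.

```c
/* primitives.c -- minimal sketch of the primitives above using MPI.
 * Compile and run (assumed setup): mpicc primitives.c -o primitives
 *                                  mpirun -np 2 ./primitives
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int data = 42, recvbuf = 0;
    if (rank == 0) {
        /* Non-blocking asynchronous Send: control returns once the
         * transfer out of the user buffer is initiated; 'req' plays
         * the role of the system-generated handle described above. */
        MPI_Request req;
        MPI_Isend(&data, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);

        /* First way to check completion: poll the handle in a loop. */
        int done = 0;
        while (!done)
            MPI_Test(&req, &done, MPI_STATUS_IGNORE);

        /* Blocking synchronous Send: completes only after the receiver
         * has posted the matching Receive (the handshake). */
        MPI_Ssend(&data, 1, MPI_INT, 1, 1, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* Non-blocking Receive: register the call, obtain a handle,
         * and later block on it with Wait (second way to check). */
        MPI_Request req;
        MPI_Irecv(&recvbuf, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
        printf("rank 1 got %d via non-blocking receive\n", recvbuf);

        /* Blocking Receive: returns only after the data is in recvbuf. */
        MPI_Recv(&recvbuf, 1, MPI_INT, 0, 1, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 got %d via blocking receive\n", recvbuf);
    }

    MPI_Finalize();
    return 0;
}
```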
✓ Processor synchrony: Processor synchrony indicates that all the processors execute in
lock-step with their clocks synchronized. As this synchrony is not attainable in a distributed
system, what is more generally meant is that for a large granularity of code, usually termed
a step, the processors are synchronized. This abstraction is implemented using some form
of barrier synchronization, which ensures that no processor begins executing the next step
of code until all the processors have completed executing the previous steps of code
assigned to each of them (see the barrier sketch below).
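Barrier synchronization of the kind just described is available directly in MPI as MPI_Barrier. A minimal sketch of step-wise execution, under the same assumed MPI setup as before:

```c
/* Step-wise execution under barrier synchronization: no process starts
 * step s+1 until every process has finished step s. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (int step = 0; step < 3; step++) {
        /* ... work belonging to this step would go here ... */
        printf("process %d finished step %d\n", rank, step);

        /* The barrier implements processor synchrony at step granularity. */
        MPI_Barrier(MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}
```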
# A Distributed Program
A distributed program is composed of a set of n asynchronous processes p1, p2, ..., pi, ..., pn
that communicate by message passing over the communication network. Without loss of generality,
we assume that each process is running on a different processor. The processes do not share a
global memory and communicate solely by passing messages.
Process execution and message transfer are asynchronous: a process may execute an
action spontaneously, and a process sending a message does not wait for the delivery of the
message to be complete. The global state of a distributed computation is composed of the states of
the processes and the communication channels.
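The last sentence can be written compactly. The formulation below is a standard one, added here for precision; the symbols LS_i and SC_ij (local state of process p_i, and state of the channel from p_i to p_j) are assumptions, as the notes state this only in prose.

```latex
% Global state of a distributed computation: the collection of local
% process states together with the states of the communication channels.
% LS_i  : local state of process p_i (an assumption of this sketch)
% SC_ij : state of channel C_ij, i.e., messages in transit from p_i to p_j
GS = \Big\{ \bigcup_{i} LS_i,\; \bigcup_{i,j} SC_{ij} \Big\}
```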
The causal precedence relation induces an irreflexive partial order on the events of a distributed
computation, denoted H = (H, →), where H is the set of events.
Note that the relation → is nothing but Lamport's "happens before" relation. For any two events ei
and ej, if ei → ej, then event ej is directly or transitively dependent on event ei. (Graphically, it
means that there exists a path consisting of message arrows and process-line segments, along
increasing time, in the space-time diagram that starts at ei and ends at ej.) For example, in the
space-time diagram these notes refer to (the figure is not reproduced here), e_1^1 → e_3^3 and
e_3^3 → e_2^6, where e_i^x denotes the xth event at process pi. The relation → denotes the flow
of information in a distributed computation, and ei → ej dictates that all the information available
at ei is potentially accessible at ej. For example, in that figure, event e_2^6 has knowledge of all
the other events shown.
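For completeness, the happens-before relation that the paragraph above appeals to can be stated formally; this is the standard definition, not something the notes spell out:

```latex
% Lamport's happens-before relation on the events of a computation:
% e_i -> e_j holds iff one of the following is true.
e_i \rightarrow e_j \;\iff\;
\begin{cases}
  e_i, e_j \text{ occur at the same process and } e_i \text{ precedes } e_j, & \text{or}\\[2pt]
  e_i = \operatorname{send}(m) \text{ and } e_j = \operatorname{recv}(m)
        \text{ for some message } m, & \text{or}\\[2pt]
  \exists\, e_k:\; e_i \rightarrow e_k \text{ and } e_k \rightarrow e_j
        & \text{(transitivity)}.
\end{cases}
```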
Causal ordering (CO): This model ensures that causally related messages destined to the same
destination are delivered in an order that is consistent with their causality relation.
Causally ordered delivery of messages implies FIFO message delivery. (Note that CO ⊂ FIFO ⊂ Non-
FIFO.)
The causal ordering model considerably simplifies the design of distributed algorithms because it
provides built-in synchronization.
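Stated as a condition (a standard formulation, supplied here since the notes give it only in prose):

```latex
% Causal order (CO): for any two messages m_1 and m_2 sent to the same
% destination process, causally ordered sends imply ordered deliveries.
\operatorname{send}(m_1) \rightarrow \operatorname{send}(m_2)
  \;\Longrightarrow\;
\operatorname{deliver}(m_1) \rightarrow \operatorname{deliver}(m_2)
```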
Consistent cut: A consistent global state corresponds to a cut in which every message received in
the PAST of the cut was sent in the PAST of that cut. All messages that cross the cut from the PAST
to the FUTURE are in transit in the corresponding consistent global state. (In the figure these notes
refer to, not reproduced here, cut C2 is a consistent cut.)
Inconsistent cut: A cut is inconsistent if a message crosses the cut from the FUTURE to the PAST.
(In the same figure, cut C1 is an inconsistent cut.) The consistency condition is formalized below.
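The cut-consistency condition in symbols (a standard formulation; PAST(C) denotes the set of events on or before the cut C):

```latex
% A cut C is consistent iff every message received in PAST(C) was also
% sent in PAST(C), i.e., no message crosses from FUTURE to PAST.
C \text{ is consistent} \;\iff\;
\forall m:\; \operatorname{recv}(m) \in \mathrm{PAST}(C)
  \implies \operatorname{send}(m) \in \mathrm{PAST}(C)
```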
Asynchronous communication achieves a high degree of parallelism and non-determinism, at the
cost of the implementation complexity of buffering. Synchronous communication, on the other
hand, is simpler but offers lower performance: the occurrence of deadlocks and the frequent
blocking of events prevent it from reaching higher performance levels.