1st Unit DC
Scenario 1: assign all papers to a single faculty member.
● Arun => 500 papers => 5 days

Scenario 2: assign the papers to 5 faculty members.
● Arun => 100 papers =>
● Gopal => 100 papers =>
● Ashif => 100 papers =>
● Hari => 100 papers =>
● Murali => 100 papers =>
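The paper-grading scenario above can be sketched as a simple work partition (the faculty names and paper counts come from the example; the code itself is illustrative):

```python
# Scenario 2 above: partition 500 papers evenly across 5 faculty members.
papers = list(range(500))          # 500 answer papers to grade
faculties = ["Arun", "Gopal", "Ashif", "Hari", "Murali"]

share = len(papers) // len(faculties)
assignment = {f: papers[i * share:(i + 1) * share]
              for i, f in enumerate(faculties)}

for f in faculties:
    print(f, "grades", len(assignment[f]), "papers")
```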
Eg 1 - Online banking transactions for 10 million users
Eg 2 - Image rendering: resize, filter, color, effects
Characteristics
Relation to Computer System Components
Key Points
● The distributed software is also termed middleware.
● A distributed execution is the execution of processes across the distributed
system to collaboratively achieve a common goal. An execution is also
sometimes termed a computation or a run.
● The middleware is the distributed software that drives the distributed system,
while providing transparency of heterogeneity at the platform level.
● The middleware layer does not contain the traditional application-layer functions of
the network protocol stack, such as HTTP, mail, FTP, and telnet.
● Various primitives and calls to functions defined in various libraries of the
middleware layer are embedded in the user program code.
● There exist several libraries to choose from to invoke primitives for the more
common functions – such as reliable and ordered multicasting – of the
middleware layer.
● There are several standards, such as the Object Management Group’s (OMG)
Common Object Request Broker Architecture (CORBA) [36] and the remote
procedure call (RPC) mechanism.
Motivation/ Benefit of Distributed Computing
Key Points
1. Inherently distributed computations
a. In many applications such as money transfer in banking, or reaching consensus among parties that are geographically
distant, the computation is inherently distributed.
2. Resource sharing
a. Resources such as peripherals, complete data sets in databases, special libraries, as well as data (variables/files),
cannot be fully replicated at all the sites because it is often neither practical nor cost-effective.
b. Further, they cannot be placed at a single site because access to that site might prove to be a bottleneck.
c. Therefore, such resources are typically distributed across the system. For example, distributed databases such as DB2
partition the data sets across several servers, in addition to replicating them at a few sites for rapid access as well as
reliability.
3. Access to geographically remote data and resources
a. In many scenarios, the data cannot be replicated at every site participating in the distributed execution because it may
be too large or too sensitive to be replicated. For example, payroll data within a multinational corporation is both too
large and too sensitive to be replicated at every branch office/site. It is therefore stored at a central server which can be
queried by branch offices. Similarly, special resources such as supercomputers exist only in certain locations, and to
access such supercomputers, users need to log in remotely.
4. Enhanced reliability
a. A distributed system has the inherent potential to provide increased reliability because of the possibility of replicating
resources and executions, as well as the reality that geographically distributed resources are not likely to
crash/malfunction at the same time under normal circumstances.
b. Reliability entails several aspects:
i. availability: the resource should be accessible at all times;
ii. integrity: the value/state of the resource should be correct, in the face of concurrent access from multiple
processors, as per the semantics expected by the application;
iii. fault-tolerance: the ability to recover from system failures, where such failures may be defined to occur in
one of many failure models.
5. Increased performance/cost ratio
a. By resource sharing and accessing geographically remote data and resources, the performance/cost ratio is
increased.
b. Although higher throughput has not necessarily been the main objective behind using a distributed system,
nevertheless, any task can be partitioned across the various computers in the distributed system.
c. Such a configuration provides a better performance/cost ratio than using special parallel machines.
6. Scalability
a. As the processors are usually connected by a wide-area network, adding more processors does not pose a
direct bottleneck for the communication network.
7. Modularity and incremental expandability
a. Heterogeneous processors may be easily added into the system without affecting the performance, as long as
those processors are running the same middleware algorithms.
b. Similarly, existing processors may be easily replaced by other processors.
Distributed Vs Parallel Computing
Message Passing vs Shared Memory
Key Points
● Shared memory systems are those in which there is a (common) shared address
space throughout the system.
● Communication among processors takes place via shared data variables, and
control variables for synchronization among the processors.
● Semaphores and monitors that were originally designed for shared memory
uniprocessors and multiprocessors are examples of how synchronization can be
achieved in shared memory systems.
● All multicomputer (NUMA as well as message-passing) systems that do not have a
shared address space provided by the underlying architecture and hardware
necessarily communicate by message passing.
● For a distributed system, this abstraction is called distributed shared memory.
Implementing this abstraction has a certain cost but it simplifies the task of the
application programmer.
Emulating MP in SM
(Figure: shared address space partitioned among processes P1, P2, P3)
1. The shared address space can be partitioned into disjoint parts, one part
being assigned to each processor.
2. “Send” and “receive” operations can be implemented by writing to and reading
from the destination/sender processor’s address space, respectively.
Specifically, a separate location can be reserved as the mailbox for each
ordered pair of processes.
3. A Pi–Pj message-passing can be emulated by a write by Pi to the mailbox and
then a read by Pj from the mailbox.
4. The write and read operations need to be controlled using synchronization
primitives to inform the receiver/sender after the data has been sent/received.
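Steps 1–4 above can be sketched as follows, with one lock-protected mailbox reserved for an ordered pair of processes (the `Mailbox` class and its method names are illustrative, with Python threads standing in for processors):

```python
# Emulating message passing over shared memory: "send" writes into the
# mailbox, "receive" reads from it; a condition variable provides the
# synchronization that informs the receiver/sender (step 4 above).
import threading

class Mailbox:
    def __init__(self):
        self._slot = None
        self._cv = threading.Condition()

    def send(self, msg):                  # Pi writes into the mailbox
        with self._cv:
            while self._slot is not None: # wait until previous msg consumed
                self._cv.wait()
            self._slot = msg
            self._cv.notify_all()         # inform the receiver

    def receive(self):                    # Pj reads from the mailbox
        with self._cv:
            while self._slot is None:
                self._cv.wait()
            msg, self._slot = self._slot, None
            self._cv.notify_all()         # inform the sender the slot is free
            return msg

# One mailbox reserved for the ordered pair (P1, P2).
box = Mailbox()
threading.Thread(target=lambda: box.send("hello from P1")).start()
print(box.receive())                      # P2 reads P1's message
```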
Emulating SM in MP
1. This involves the use of “send” and “receive” operations for “write” and “read”
operations.
2. Each shared location can be modeled as a separate process;
a. “write” to a shared location is emulated by sending an update message to the corresponding
owner process;
b. a “read” to a shared location is emulated by sending a query message to the owner process.
3. The latencies involved in read and write operations may be high even when
using shared memory emulation, because each read and write is implemented
through network communication.
4. An application can of course use a combination of shared memory and
message-passing.
5. In a MIMD message-passing multicomputer system, each “processor” may be
a tightly coupled multiprocessor system with shared memory. Within the
multiprocessor system, the processors communicate via shared memory.
Between two computers, the communication is by message passing.
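The owner-process emulation in steps 1–2 can be sketched as follows (the queue-based helpers are assumptions, with threads standing in for processes):

```python
# Emulating shared memory over message passing: each shared location is
# an "owner" process; a write is an update message, a read is a query
# message, and the reply illustrates the round-trip latency of step 3.
import queue
import threading

def owner(requests):
    """Owner process for one shared location."""
    value = 0
    while True:
        op, payload, reply = requests.get()
        if op == "write":                 # update message
            value = payload
        elif op == "read":                # query message
            reply.put(value)
        elif op == "stop":
            return

requests = queue.Queue()
threading.Thread(target=owner, args=(requests,), daemon=True).start()

def write(x):
    requests.put(("write", x, None))

def read():
    reply = queue.Queue()
    requests.put(("read", None, reply))
    return reply.get()                    # latency: a full round trip

write(42)
print(read())                             # -> 42
```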
CS 3551 DISTRIBUTED COMPUTING
Primitives for Distributed Computing
(Table: communication primitives - e.g., Receive - with their function, parameters, and need)
● In computer systems, there are many ways for programs to talk to each other, like
sending messages or making remote calls. Different software products and scientific
tools use their own special ways to do this.
● For example, some big companies use their custom methods, like IBM's CICS
software. Scientists often use libraries called MPI or PVM. Commercial software
often uses a method called RPC, which lets you call functions on a different
computer like you would on your own computer.
● All these methods use something like a hidden network phone line (called "sockets")
to make these remote calls work.
● There are many types of RPC, like Sun RPC and DCE RPC. There are also other
ways to communicate, like "messaging" and "streaming."
● As software evolves, there are new methods like RMI and ROI for object-based
programs, and big standardized systems like CORBA and DCOM.
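The socket "phone line" described above can be made concrete with a minimal sketch: a toy request–reply exchange over a local socket, not a real RPC library (all names here are illustrative):

```python
# A "remote call" in miniature: the client sends a function name over a
# socket and gets a result back, as if calling a local function.
import socket
import threading

def server(sock):
    conn, _ = sock.accept()
    name = conn.recv(1024).decode()       # receive the "function name"
    conn.sendall(f"result of {name}".encode())
    conn.close()

srv = socket.socket()
srv.bind(("127.0.0.1", 0))                # OS picks a free port
srv.listen(1)
threading.Thread(target=server, args=(srv,)).start()

# The "client" invokes the remote function across the socket.
cli = socket.socket()
cli.connect(("127.0.0.1", srv.getsockname()[1]))
cli.sendall(b"add")
print(cli.recv(1024).decode())            # -> result of add
cli.close()
```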
Synchronous vs Asynchronous Execution
Consistency and replication: Replicate for performance, but it is essential that the replicated copies be kept consistent.
Naming: Easy-to-use and robust naming for identifiers and addresses is essential for locating resources and processes in a transparent and scalable manner.
Security & scalability: Involves various aspects of cryptography, secure channels, access control, key management (generation and distribution), authorization, and secure group management. The system must also be scalable to handle large loads.
Processes: Management of processes and threads at clients/servers; code migration; and the design of software and mobile agents.
Data storage and access: Schemes for data storage, and implicitly for accessing the data in a fast and scalable manner across the network, are important for efficiency.
Fault tolerance:
● Fault tolerance requires maintaining correct and efficient operation in spite of
any failures of links, nodes, and processes.
● Process resilience, reliable communication, distributed commit, checkpointing
and recovery, agreement and consensus, failure detection, and
self-stabilization are some of the mechanisms to provide fault tolerance.
2. Algorithmic Challenges (CREPT-MGW)

Group communication, multicast, and ordered message delivery:
1. A group is a collection of processes that share a common context and collaborate on a
common task within an application domain.
2. Algorithms need to be designed to enable efficient group communication and group
management, wherein processes can join and leave groups dynamically.
3. The order of delivery must be specified when multiple processes send messages concurrently.

Execution models and frameworks:
● The interleaving model and the partial order model are two widely adopted models of distributed
system executions.
● The input/output automata model [25] and TLA (temporal logic of actions) are two
other examples of models that provide different degrees of infrastructure.

Program design and verification tools:
● Methodically designed and verifiably correct programs can greatly reduce the overhead
of software design, debugging, and engineering.
● Designing mechanisms to achieve these design and verification goals is a challenge.
Time and global state:
1. The processes in the system are spread across three-dimensional physical space.
Another dimension, time, has to be superimposed uniformly across space.
2. The challenges pertain to providing accurate physical time, and to providing a variant of
time, called logical time.
3. Logical time is relative time, and eliminates the overhead of providing physical time for
applications where physical time is not required.
4. Observing the global state of the system (across space) also involves the time
dimension for consistent observation.

Monitoring distributed events and predicates:
● On-line algorithms for monitoring such predicates are hence important.
● Event streaming is used, where streams of relevant events reported from different
processes are examined collectively to detect predicates.

Graph algorithms and distributed routing algorithms:
● The distributed system is modeled as a distributed graph, and the graph algorithms form
the building blocks for a large number of higher-level communication, data
dissemination, object location, and object search functions.
● The design of efficient distributed graph algorithms is therefore of paramount importance.

World Wide Web design:
● Minimizing response time, and thereby user-perceived latency, is an important challenge.
● Efficient object search and navigation on the web is important.
Further algorithmic challenges include synchronization mechanisms, fault tolerance, load balancing, and performance.
3. Applications of distributed computing
Sensor Networks:
1. Sensors, which can measure physical properties like temperature and humidity, have become affordable
and are deployed in large numbers (over a million).
2. They report external events, not internal computer processes. These networks have various applications,
including mobile or static sensors that communicate wirelessly or through wires. Self-configuring ad-hoc
networks introduce challenges like position and time estimation.
Ubiquitous Computing:
1. Ubiquitous systems involve processors integrated into the environment, working in the background, like in
sci-fi scenarios. Examples include smart homes and workplaces.
2. These systems are essentially distributed, use wireless tech, sensors, and actuators, and can self-organize.
They often consist of many small processors in a dynamic network, connecting to more powerful resources
for data processing.
Peer-to-Peer (P2P) Computing:
1. In P2P computing, all processors interact as equals without any hierarchy, unlike client-server systems. P2P
networks are often self-organizing and may lack a regular structure.
2. They don't use central directories for name resolution. Challenges include efficient object storage and
lookup, dynamic reconfiguration, replication strategies, and addressing issues like privacy and security.
Publish-Subscribe and Content Distribution:
1. As information grows, we need efficient ways to distribute and filter it. Publish-Subscribe involves
distributing information, letting users subscribe to what interests them, and then filtering it based on user
preferences.
2. Content distribution is about sending data with specific characteristics to interested users, often used in web
and P2P settings. When dealing with multimedia, we face challenges like large data, compression, and
synchronization during storage and playback.
Data Mining Algorithms:
1. They analyze large data sets to find patterns and useful information. For example, studying
customer buying habits for targeted marketing.
2. This involves applying database and AI techniques to data. When data is distributed, as in private
banking or large-scale weather prediction, efficient distributed data mining algorithms are needed.
● A distributed program is composed of a set of n asynchronous processes p1, p2, …, pn
that communicate by message passing over the communication network.
● We assume that each process is running on a different processor.
● The processes do not share a global memory and communicate solely by passing
messages.
● Cij - the channel from process pi to process pj.
● mij - a message sent by process pi to pj.
● Processes do not share a global clock.
● Process execution and message transfer are asynchronous.
● The global state of a distributed computation is composed of the states of the processes
and the communication channels
● The state of a process is characterized by the state of its local memory and depends upon
the context. The state of a channel is characterized by the set of messages in transit in the
channel
Model for Distributed Execution
1. Execution of a process consists of a sequential execution of its actions.
2. The actions are atomic, and the actions of a process are modeled as three
types of events: internal events, message send events (send(m)), and
message receive events (rec(m)).
3. The occurrence of events changes the states of respective processes and
channels, thus causing transitions in the global system state.
4. An internal event changes the state of the process at which it occurs.
5. A send event (or a receive event) changes the state of the process that sends
(or receives) the message and the state of the channel on which the message
is sent (or received)
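The three event types and their effect on process and channel state can be sketched as follows (the `Process`/`Channel` classes are illustrative, not from the notes):

```python
# Internal events change only the process state; send/receive events also
# change the state of the channel the message travels on (points 4-5 above).
from collections import deque

class Channel:                       # state = set of messages in transit
    def __init__(self):
        self.in_transit = deque()

class Process:
    def __init__(self, name):
        self.name, self.state = name, 0

    def internal(self):              # changes only this process's state
        self.state += 1

    def send(self, m, ch):           # changes process and channel state
        self.state += 1
        ch.in_transit.append(m)

    def receive(self, ch):           # changes process and channel state
        self.state += 1
        return ch.in_transit.popleft()

c12 = Channel()                      # C12: channel from p1 to p2
p1, p2 = Process("p1"), Process("p2")
p1.send("m12", c12)                  # m12 now in transit on C12
print(p2.receive(c12))               # -> m12
```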
Causal Precedence Relation
Logical vs Physical Concurrency
● In a distributed computation, two events are logically concurrent if and only if
they do not causally affect each other.
● Physical concurrency, on the other hand, has a connotation that the events
occur at the same instant in physical time.
● Note that two or more events may be logically concurrent even though they
do not occur at the same instant in physical time.
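A standard way to test logical concurrency in practice, assuming events carry vector timestamps (vector clocks are not introduced in the notes above), is to check that neither timestamp is component-wise dominated by the other:

```python
# Two events are logically concurrent iff neither vector timestamp is
# component-wise <= the other, i.e., neither causally affects the other.
def concurrent(u, v):
    leq = all(a <= b for a, b in zip(u, v))   # u happened before (or equals) v
    geq = all(a >= b for a, b in zip(u, v))   # v happened before (or equals) u
    return not leq and not geq

print(concurrent([2, 0, 1], [1, 1, 0]))  # True: causally unrelated
print(concurrent([1, 0, 0], [2, 0, 1]))  # False: the first precedes the second
```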
Models of Communication Networks (FIFO, Non-FIFO, Causal
Ordering)
1. FIFO - each channel acts as a first-in first-out message queue, and thus message ordering is
preserved by the channel.
2. Non-FIFO - a channel acts like a set, into which the sender process adds messages and from which
the receiver process removes messages in random order.
3. Causal ordering (built-in synch) - based on Lamport’s “happens before” relation. A system that
supports the causal ordering model satisfies the following property: for any two messages mij and
mkj, if send(mij) → send(mkj), then rec(mij) → rec(mkj).
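The FIFO and non-FIFO channel models above can be sketched directly (the class names are illustrative):

```python
# A FIFO channel is a queue (ordering preserved); a non-FIFO channel is
# a bag from which messages are removed in arbitrary order.
import random
from collections import deque

class FIFOChannel:
    def __init__(self):
        self._q = deque()
    def send(self, m):
        self._q.append(m)
    def receive(self):
        return self._q.popleft()          # ordering preserved

class NonFIFOChannel:
    def __init__(self):
        self._bag = []
    def send(self, m):
        self._bag.append(m)
    def receive(self):                    # removed in random order
        return self._bag.pop(random.randrange(len(self._bag)))

fifo = FIFOChannel()
for m in ["m1", "m2", "m3"]:
    fifo.send(m)
print([fifo.receive() for _ in range(3)])  # -> ['m1', 'm2', 'm3']
```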
Global State of Distributed System
1. The state of a process at any time is defined by the contents of processor registers, stacks, local
memory, etc. and depends on the local context of the distributed application.
2. The state of a channel is given by the set of messages in transit in the channel.
3. The occurrence of events changes the states of respective processes and channels, thus causing
transitions in global system state.
a. For example, an internal event changes the state of the process at which it occurs. A send event (or a receive event) changes
the state of the process that sends (or receives) the message and the state of the channel on which the message is sent (or
received).
How to find global state?
❖ For a global snapshot to be meaningful, the states of all the components of the
distributed system must be recorded at the same instant.
❖ This will be possible if the local clocks at processes were perfectly synchronized or
there was a global system clock that could be instantaneously read by the
processes. However, both are impossible.
❖ Solution:
❖ Recording at different times will be meaningful provided every message that is
recorded as received is also recorded as sent.
❖ Basic idea is that an effect should not be present without its cause.
❖ States that don’t violate causality are called consistent global states and are
meaningful global states
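The consistency condition above (every message recorded as received must also be recorded as sent) can be checked directly; the helper below is a minimal sketch over sets of recorded message ids:

```python
# A recorded global state is consistent iff the effect (receive) is never
# present without its cause (send): received must be a subset of sent.
def is_consistent(sent, received):
    """sent/received: sets of message ids recorded in the snapshot."""
    return received <= sent               # subset test

# m1 sent and received; m2 sent but still in transit: consistent.
print(is_consistent({"m1", "m2"}, {"m1"}))        # True
# m3 recorded as received but never recorded as sent: inconsistent.
print(is_consistent({"m1"}, {"m1", "m3"}))        # False
```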
Inconsistent state: some message is recorded as received but not recorded as sent (causality is violated).
Consistent state: every message recorded as received is also recorded as sent; messages still in transit are allowed.
Strongly consistent state: a consistent state in which every message recorded as sent is also recorded as received, i.e., no messages are in transit.