CS439 - Cloud Computing

Parallel and Distributed Systems

Cloud computing builds on over 60 years of advancements in parallel and distributed systems, utilizing a client-server model where computations occur in the cloud. Key factors in cloud applications include concurrency, checkpoint-restart mechanisms, and communication protocols that manage distributed processes. Challenges such as deadlocks, race conditions, and the need for effective parallel algorithms are critical considerations in the development and operation of cloud computing systems.

The path to cloud computing
 Cloud computing is based on traditional parallel and distributed systems.
 It is the result of knowledge and wisdom accumulated over ~60 years of computing.
 Cloud applications follow the client-server model:
  o A thin client runs on the user end.
  o Computations are carried out in the cloud.
Important factors
• Concurrency
 The majority of cloud applications are data-intensive.
 They use a number of concurrently running instances.
• Checkpoint-restart mechanism
 Many cloud computations run for extended periods of time on multiple servers.
 Checkpoints are taken periodically in anticipation of the need to restart a process when one or more systems fail.
• Communication
 Communication protocols support coordination of distributed processes via communication channels that:
  o Might be noisy and unreliable.
  o Might lose messages, or deliver them duplicated, distorted, or out of order.
Parallel computing
• Parallel hardware/software systems are used to:
 Solve problems demanding resources not available on a single system.
 Reduce the time required to obtain a solution.
• The speed-up S measures the effectiveness of parallelization:
 S(N) = T(1) / T(N)
where
 T(1) is the execution time of the sequential computation, and
 T(N) is the execution time when N parallel computations are executed.
Parallel computing
• Amdahl's Law
 If α is the fraction of running time a sequential program spends on non-parallelizable segments of the computation, then the maximum achievable speed-up is
 S = 1/α
• Gustafson's Law
 The scaled speed-up with N parallel processes is
 S(N) = N + α(1 - N)
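A minimal Python sketch contrasting the two laws (illustrative, not from the slides; it uses the general Amdahl form S(N) = 1/(α + (1 - α)/N), whose limit as N grows is the 1/α bound above):

def amdahl(alpha, n):
    # General form S(N) = 1 / (alpha + (1 - alpha) / N); tends to 1/alpha.
    return 1.0 / (alpha + (1.0 - alpha) / n)

def gustafson(alpha, n):
    # Scaled speed-up S(N) = N + alpha * (1 - N).
    return n + alpha * (1 - n)

alpha = 0.1   # assume 10% of the running time is non-parallelizable
for n in (1, 10, 100, 1000):
    print(n, round(amdahl(alpha, n), 2), round(gustafson(alpha, n), 2))
# Amdahl saturates near 1/alpha = 10; Gustafson keeps growing with N.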
Concurrency
• Required by system and application software:
 Reactive systems respond to external events
  o e.g., operating system kernels, embedded systems.
 Improve performance
  o Parallel applications partition the workload and distribute it to multiple threads running concurrently.
 Support variable load and shorten the response time of distributed applications, such as
  o Transaction management systems
  o Client-server applications
Concurrency
• Concurrent execution can be challenging.
• It might lead to race conditions
 An undesirable effect in which the results of concurrent execution depend on the sequence of events; see the sketch below.
• Shared resources must be protected by locks/semaphores/monitors to ensure serial access.
• Deadlocks and livelocks are possible.
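A minimal Python sketch of a race condition and its lock-based fix (the shared counter and thread counts are illustrative assumptions, not from the slides):

import threading

counter = 0
lock = threading.Lock()

def unsafe_increment(n):
    global counter
    for _ in range(n):
        counter += 1          # read-modify-write: not atomic, races with other threads

def safe_increment(n):
    global counter
    for _ in range(n):
        with lock:            # serial access to the shared resource
            counter += 1

def run(worker):
    global counter
    counter = 0
    threads = [threading.Thread(target=worker, args=(100_000,)) for _ in range(4)]
    for t in threads: t.start()
    for t in threads: t.join()
    return counter

print(run(unsafe_increment))  # often < 400000: updates are lost
print(run(safe_increment))    # always 400000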


Concurrency
Four Coffman conditions for a deadlock:
 Mutual exclusion
  At least one resource must be non-sharable; only one process/thread may use the resource at any given time.
 Hold and wait
  At least one process/thread must hold one or more resources while waiting for others.
 No preemption
  The scheduler or a monitor should not be able to force a process/thread holding a resource to relinquish it.
 Circular wait
  Given the set of n processes/threads {P1, P2, P3, ..., Pn}: P1 waits for a resource held by P2, P2 waits for a resource held by P3, and so on, while Pn waits for a resource held by P1.

A deadlock on a resource can arise iff all of the above conditions hold simultaneously in the system.
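A minimal Python sketch (the two locks are illustrative) showing how a fixed global lock-acquisition order breaks the circular-wait condition:

import threading

lock_a, lock_b = threading.Lock(), threading.Lock()

# Deadlock-prone pattern: thread 1 acquires lock_a then lock_b while
# thread 2 acquires lock_b then lock_a; if each grabs its first lock,
# both wait forever (circular wait).

def use_both(x, y):
    # Deadlock-free: every thread acquires locks in one global order
    # (here: by object id), so a cycle of waiting threads cannot form.
    first, second = sorted((x, y), key=id)
    with first, second:
        pass  # ... critical section using both resources ...

t1 = threading.Thread(target=use_both, args=(lock_a, lock_b))
t2 = threading.Thread(target=use_both, args=(lock_b, lock_a))
t1.start(); t2.start(); t1.join(); t2.join()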
Monitor
A monitor provides special procedures to access the data in a critical section; only one process/thread may be active inside the monitor at any given time.
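A minimal monitor-style sketch in Python; the bounded buffer is an illustrative example, not from the slides, and a lock plus condition variables stand in for the monitor's entry/exit discipline:

import threading
from collections import deque

class BoundedBuffer:
    """Monitor: all access to the shared queue goes through these methods."""
    def __init__(self, capacity):
        self._items = deque()
        self._capacity = capacity
        self._lock = threading.Lock()
        self._not_full = threading.Condition(self._lock)
        self._not_empty = threading.Condition(self._lock)

    def put(self, item):
        with self._not_full:
            while len(self._items) >= self._capacity:
                self._not_full.wait()          # sleep until there is room
            self._items.append(item)
            self._not_empty.notify()

    def get(self):
        with self._not_empty:
            while not self._items:
                self._not_empty.wait()         # sleep until an item arrives
            item = self._items.popleft()
            self._not_full.notify()
            return item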
Other Challenges
• Livelock condition
 Two or more processes/threads continually change their state in response to changes in the other processes; as a result, none of the processes can complete its execution.
• Priority inversion
 Concurrently running processes/threads are often scheduled based on their priorities.
 Priority inversion means that a higher-priority process/task is indirectly preempted by a lower-priority one.
• Discovering parallelism is often challenging.
• Development of parallel algorithms requires considerable effort
 For example, solving large systems of linear equations or systems of PDEs (Partial Differential Equations) requires algorithms based on domain decomposition methods.
Parallelism
• Fine-grain parallelism
 Relatively small blocks of code can be executed in parallel without the need to communicate or synchronize with other threads or processes.
• Coarse-grain parallelism
 Large blocks of code can be executed in parallel.
• The speed-up of applications with fine-grain parallelism is considerably lower than that of coarse-grained applications
 Processor speed is orders of magnitude higher than communication speed, even on systems with a fast interconnect.
• Data parallelism
 Data is partitioned into several blocks and the blocks are processed in parallel.
• Same Program Multiple Data (SPMD)
 Data parallelism in which multiple copies of the same program run concurrently, each on a different data block; see the sketch below.
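A minimal SPMD/data-parallel sketch in Python (block size, worker count, and the sum-of-squares task are illustrative assumptions):

from multiprocessing import Pool

def process_block(block):
    # The same program is applied to each data block (SPMD).
    return sum(x * x for x in block)

if __name__ == "__main__":
    data = list(range(1_000_000))
    blocks = [data[i:i + 250_000] for i in range(0, len(data), 250_000)]
    with Pool(processes=4) as pool:
        partial = pool.map(process_block, blocks)   # blocks processed in parallel
    print(sum(partial))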
Parallelism levels
• Bit-level parallelism
 The number of bits processed per clock cycle (often called the word size)
 Increased from 4-bit to 8-bit, 16-bit, 32-bit, and 64-bit.
• Instruction-level parallelism
 Computers now use multi-stage processing pipelines to speed up execution.
• Data parallelism or loop parallelism
 Program loops can be processed in parallel.
• Task parallelism
 The problem can be decomposed into tasks that can be carried out concurrently, e.g., SPMD. Note that data dependencies cause different flows of control in individual tasks.
Distributed systems
• A collection of
 Autonomous computers
 Connected through a network
 Distribution software, called middleware, used to
  o Coordinate the computers' activities
  o Share system resources
Characteristics of distributed systems
• Users perceive the system as a single, integrated computing facility.
• Components are autonomous.
• Scheduling, resource management, and security policies are implemented by each system.
• There are multiple
 Points of control
 Points of failure
• Resources may not be accessible at all times.
• Can be scaled by adding additional resources.
• Can be designed to maintain availability even at low levels of hardware/software/network reliability.
Desirable properties of a distributed system
• Access transparency
 Local and remote resources are accessed using identical operations.
• Location transparency
 Information objects are accessed without knowing their location.
• Concurrency transparency
 Several processes run concurrently using shared information objects without interference among them.
• Replication transparency
 Multiple instances of information objects increase reliability without the knowledge of users or applications.
Desirable properties of a distributed system
• Failure transparency
 Concealment of faults.
• Migration transparency
 Information objects in the system are moved without affecting the operations performed on them.
• Performance transparency
 The system can be reconfigured based on load and quality-of-service requirements.
• Scaling transparency
 The system and applications can scale without changing the system structure and without affecting the applications.
Processes, threads, events
• Dispatchable units of work:
 A process is a program in execution.
 A thread is a light-weight process.
• State of a process/thread
 The information required to restart a suspended process/thread, e.g., the program counter and the current values of the registers.
• Event
 A change of state of a process, e.g., local or communication events.
• Process group
 A collection of cooperating processes.
 The processes cooperate and communicate to reach a common goal.
• Global state of a distributed system
 A distributed system consists of several processes and communication channels.
 The global state is the union of the states of the individual processes and the states of the communication channels.
Messages and communication channels
• A message is a structured unit of information.
• A communication channel provides the means for processes or threads to
 Communicate with one another
 Coordinate their actions by exchanging messages
 Communication is done using send(m) and receive(m) system calls, where m is a message.
• State of a communication channel
 Given two processes pi and pj, the state of the channel ξi,j from pi to pj consists of the messages sent by pi but not yet received by pj.
• Protocol
 A finite set of messages exchanged among processes to help them coordinate their actions.
Events
Space-time diagrams display local and communication events during a process lifetime:
 Local events are shown as small black circles.
 Communication events are connected by a line from the send event to the receive event.
a) All events in process p1 are local. The process is in state σ1 immediately after the occurrence of event e_1^1 and remains in that state until the occurrence of event e_1^2.
b) Two processes p1 and p2. Events e_1^2 and e_2^3 are communication events:
 o p1 sends a message to p2.
 o p2 receives the message sent by p1.
c) Three processes interact by means of communication events.
[Figure: space-time diagrams (a), (b), and (c) for one, two, and three processes; e_i^j denotes the j-th event of process pi.]
Global state of a process group
• The global states of a distributed computation with n processes form an n-dimensional lattice.
 How many paths exist to reach a given global state?
 The more paths there are, the harder it is to identify the events leading to a given state.
 Debugging is quite difficult when there is a large number of paths.
Global state of a process group
• In the case of 2 processes, the number of paths from the initial state Σ(0,0) to the state Σ(m,n) is
 N(m,n) = (m + n)! / (m! n!)
• In the 2-dimensional case, the global state Σ(m,n) can only be reached from two states, Σ(m-1,n) and Σ(m,n-1).
[Figure: (a) the lattice of global states of two processes, with the space-time diagram showing only the first 2 events per process; each state Σ(m,n) is reached from Σ(m-1,n) or Σ(m,n-1). (b) The 6 possible sequences of events leading to the state Σ(2,2).]
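The path count is easy to check in Python (an illustrative check, not from the slides; math.comb is the standard-library binomial coefficient):

import math

def paths(m, n):
    # Number of monotone lattice paths from Σ(0,0) to Σ(m,n).
    return math.comb(m + n, m)

print(paths(2, 2))   # 6, matching the six event sequences leading to Σ(2,2)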

Process Coordination - Communication protocols
• A major challenge is to guarantee that two processes will reach an agreement in case of channel failures.
• Communication protocols ensure process coordination by implementing:
 Error control mechanisms
  o Using error detection and error correction codes.
 Flow control
  o Provides feedback from the receiver; it forces the sender to transmit only the amount of data the receiver can handle.
 Congestion control
  o Ensures that the offered load of the network does not exceed the network capacity.
Process Coordination - Time & time intervals
• Process coordination requires:
 A global concept of time shared by the cooperating entities.
 The measurement of time intervals, i.e., the time elapsed between two events.
• Two events in the global history may be unrelated
 Neither one is the cause of the other.
 Such events are said to be concurrent events.
• Local timers provide relative time measurements
 An isolated system can be characterized by its history, i.e., a sequence of events.
• Global agreement on time is necessary to trigger actions that should occur concurrently.
• Timestamps are often used for event ordering
 Using a global time base constructed on local virtual clocks.
Logical clocks
• Logical clock (LC)
 An abstraction necessary to ensure the clock condition in the absence of a global clock.
• A process maps events to positive integers.
 LC(e) is the local variable associated with event e.
• Each process time-stamps the message m it sends with the value of the logical clock at the time of sending:
 TS(m) = LC(send(m))
• The rules to update the logical clock:
 LC(e) = LC + 1, if e is a local event or a send(m) event
 LC(e) = max(LC, TS(m)) + 1, if e = receive(m)
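A minimal Lamport logical-clock sketch in Python implementing the update rules above (class and method names are illustrative):

class LamportClock:
    def __init__(self):
        self.time = 0

    def local_event(self):
        self.time += 1                      # LC = LC + 1
        return self.time

    def send(self):
        self.time += 1                      # sending is itself an event
        return self.time                    # TS(m): timestamp carried by the message

    def receive(self, ts):
        self.time = max(self.time, ts) + 1  # LC = max(LC, TS(m)) + 1
        return self.time

p1, p2 = LamportClock(), LamportClock()
ts = p1.send()          # p1's clock: 1
print(p2.receive(ts))   # p2's clock jumps to max(0, 1) + 1 = 2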
Logical Clocks
[Figure: space-time diagram of processes p1, p2, and p3 with logical-clock values at each event (p1: 1 2 3 4 5 12; p2: 1 2 6 7 8 9; p3: 1 2 3 10 11); messages m1-m5 carry timestamps that force the receiving process to advance its clock.]
Message delivery rules; causal delivery
• A real-life network might reorder messages.
• First-In-First-Out (FIFO) delivery
 Messages are delivered in the same order they are sent.
• Causal delivery
 An extension of FIFO delivery.
 Used when a process receives messages from different sources.
• A communication channel typically does not guarantee FIFO delivery
 However, FIFO delivery can be enforced by attaching a sequence number to each message sent; see the sketch below.
 The sequence numbers are also used to reassemble messages out of individual packets.
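A minimal sketch of FIFO delivery enforced with sequence numbers (Python; the buffering strategy shown is an illustrative assumption):

class FifoReceiver:
    """Delivers messages from one sender in the order they were sent."""
    def __init__(self):
        self.next_seq = 0
        self.buffer = {}        # out-of-order messages, keyed by sequence number

    def on_message(self, seq, payload):
        self.buffer[seq] = payload
        delivered = []
        while self.next_seq in self.buffer:   # deliver any contiguous run
            delivered.append(self.buffer.pop(self.next_seq))
            self.next_seq += 1
        return delivered

r = FifoReceiver()
print(r.on_message(1, "b"))   # [] : message 0 has not arrived yet
print(r.on_message(0, "a"))   # ['a', 'b'] : both delivered in send order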
Atomic actions
• Parallel and distributed applications must take special precautions for handling shared resources.
• Atomic operation
 A multi-step operation that is allowed to proceed to completion without any interruption, and that does not expose the state of the system until the action is completed.
 Hiding the internal state of an atomic action reduces the number of states a system can be in.
 Hence, it simplifies the design and maintenance of the system.
• Atomicity requires hardware support:
 Test-and-Set
  o An instruction that writes to a memory location and returns the old content of that memory cell, as a single non-interruptible operation.
 Compare-and-Swap
  o An instruction that compares the contents of a memory location to a given value and, only if the two values are the same, modifies the contents of that memory location to a given new value; see the sketch below.
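A minimal Python emulation of Compare-and-Swap (illustrative; real hardware executes this as one non-interruptible instruction, and the lock below merely models that atomicity):

import threading

class AtomicCell:
    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()    # stands in for hardware atomicity

    def compare_and_swap(self, expected, new):
        """Set the value to new iff it equals expected; return the old value."""
        with self._lock:
            old = self._value
            if old == expected:
                self._value = new
            return old

cell = AtomicCell(0)
print(cell.compare_and_swap(0, 1))   # 0: swap succeeded, value is now 1
print(cell.compare_and_swap(0, 2))   # 1: swap failed, value unchanged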
Atomicity
• Before-or-after atomicity
 The effect of multiple actions is as if these actions had occurred one after another, in some order.
• A systematic approach to atomicity must address several delicate questions:
 How to guarantee that only one atomic action has access to a shared resource at any given time?
 How to return to the original state of the system when an atomic action fails to complete?
 How to ensure that the order of several atomic actions leads to consistent results?
All-or-nothing atomicity
• A transaction is either carried out successfully, or the record targeted by the transaction is returned to its original state.
• Two phases:
 Pre-commit phase
  o During this phase it should be possible to back out without leaving any trace.
  o Commit point: the transition from the first phase to the second.
  o During the pre-commit phase all steps necessary to prepare the post-commit phase must be carried out, e.g., check permissions, swap into main memory all pages that may be needed, mount removable media, and allocate stack space.
  o During this phase no results should be exposed and no irreversible actions should be carried out.
 Post-commit phase
  o Should be able to run to completion.
  o Shared resources allocated during the pre-commit phase cannot be released until after the commit point.
Storage Models
• Cell storage does not support all-or-nothing actions
 When we maintain version histories it is possible to restore the original content.
 However, we need to encapsulate the data access and provide mechanisms to implement the two phases of an atomic all-or-nothing action.
• Journal storage does precisely that; a sketch follows.
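A minimal sketch of an all-or-nothing update over cell storage using an undo journal (Python; the API and recovery policy are illustrative assumptions, not the design from the slides):

class JournaledStore:
    """Cell storage plus an undo journal, giving all-or-nothing updates."""
    def __init__(self):
        self.cells = {}          # the cell storage
        self.journal = []        # (key, old_value) records for the open action

    def write(self, key, value):
        self.journal.append((key, self.cells.get(key)))  # log the old value first
        self.cells[key] = value                          # then update the cell

    def commit(self):
        self.journal.clear()     # commit point: discard the undo records

    def abort(self):
        for key, old in reversed(self.journal):          # undo in reverse order
            if old is None:
                self.cells.pop(key, None)
            else:
                self.cells[key] = old
        self.journal.clear()

store = JournaledStore()
store.write("x", 1); store.commit()
store.write("x", 99); store.abort()
print(store.cells)   # {'x': 1}: the aborted action left no trace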
Consensus protocols
• Consensus
 The process of agreeing on one of several alternatives proposed by a number of agents.
• Consensus service
 A set of n processes.
 Clients send requests, propose a value, and wait for a response.
 The goal is to get the set of processes to reach consensus on a single proposed value.
Consensus protocols
• Consensus protocol assumptions:
 Processes run on processors and communicate through a network.
 Processors and the network may experience failures (but not complicated, e.g., Byzantine, failures).
 Processors:
  1. Operate at arbitrary speeds.
  2. Have stable storage and may rejoin the protocol after a failure.
  3. Send messages to one another.
 Network:
  1. May lose, reorder, or duplicate messages.
  2. Messages are sent asynchronously.
  3. Messages may take arbitrarily long to reach their destination.
Client-Server Paradigm
• This paradigm is based on enforced modularity
 Modules are forced to interact only by sending and receiving messages.
• A more robust design
 Clients and servers are independent modules and may fail separately.
• Servers are stateless
 They may fail and then come back up without the clients being affected, or even noticing the failure of the server.
• An attack is less likely
 It is difficult for an intruder to guess
  o The format of the messages
  o The sequence numbers of the segments, when messages are transported by TCP
Services
a) Email service
 The sender and receiver communicate asynchronously using inboxes and outboxes.
 Mail daemons run at each site.
b) Event service
 Supports coordination in a distributed environment.
 Based on the publish-subscribe paradigm; a sketch follows.
 An event producer publishes events and an event consumer subscribes to events.
 The server maintains queues for each event and delivers notifications to clients when an event occurs.
[Figure: (a) the email service with inboxes/outboxes and mail daemons; (b) the publish-subscribe event service.]
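A minimal in-process publish-subscribe sketch in Python (illustrative; a real event service queues events and notifies clients over the network):

from collections import defaultdict

class EventService:
    def __init__(self):
        self.subscribers = defaultdict(list)    # event name -> consumer callbacks

    def subscribe(self, event, callback):
        self.subscribers[event].append(callback)

    def publish(self, event, payload):
        for notify in self.subscribers[event]:  # notify every subscribed consumer
            notify(payload)

svc = EventService()
svc.subscribe("order_created", lambda p: print("consumer got:", p))
svc.publish("order_created", {"id": 42})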
WWW
• 3-way handshake
 The first 3 messages (SYN, SYN-ACK, ACK) exchanged between the client and the server establish the TCP connection; the final ACK is sent together with the first HTTP request.
• Once the TCP connection is established, the HTTP server takes time to construct the page, created on the fly, in response to the first request.
• To satisfy the second HTTP request, the HTTP server must retrieve an image from the disk.
• The response time includes
 Round Trip Time (RTT)
 Server residence time
 Data transmission time
[Figure: timeline between Browser and Web Server - TCP connection establishment (one RTT), server residence time while the page is created on the fly, data transmission, then the second HTTP request, image retrieval from disk, and image transmission.]
HTTP Communication
A Web client can:
a) Communicate directly with the server
 The Web browser's HTTP client sends the request to the HTTP server on TCP port 80 and receives the response directly.
b) Communicate through a proxy
 The browser sends the request to the proxy; the proxy forwards the request to the server and relays the server's response back to the client.
c) Use tunneling to cross the network
 Requests and responses travel through a tunnel between the HTTP client and the HTTP server on TCP port 80.
[Figure: the three patterns - direct client-server communication, communication via a proxy, and tunneling.]