Parallel and distributed computing

Computing:

Computing is the process of completing a given goal-oriented task by using computer technology.
History of computing:
Batch era – execution of a series of programs on a computer without manual intervention.
Time-sharing era – sharing of computer resources among many users by means of multiprogramming and multitasking.
Desktop era – a personal computer that provides computing power to a single user.
Network era – systems with shared memory and distributed memory.
Types of computing:
Based on how a computer handles its tasks, computing can be categorized into two types:
 Serial computing
 Parallel computing
Serial computing:
Serial computing is a type of computing where one task is executed at a time and all the tasks are executed by a single processor in sequence.
Parallel computing:
Parallel computing is a type of architecture where several processors simultaneously execute multiple smaller calculations, broken down from an overall larger, complex problem.
Task execution in parallel computing:
It refers to the process of breaking down a large, complex problem into smaller, independent, often similar parts that can be executed by multiple processors communicating via shared memory, whose results are then combined as part of an overall algorithm.
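A minimal sketch of this decomposition in Python (an illustrative choice; the data and worker count are arbitrary): a large summation is broken into independent parts, each part is summed by a separate worker process, and the partial results are combined at the end.

from multiprocessing import Pool

def partial_sum(chunk):
    # Each worker executes the same smaller calculation on its own part.
    return sum(chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    n_workers = 4
    size = len(data) // n_workers
    # Decomposition: split the problem into independent parts.
    chunks = [data[i * size:(i + 1) * size] for i in range(n_workers)]
    with Pool(n_workers) as pool:
        partials = pool.map(partial_sum, chunks)  # executed in parallel
    print(sum(partials))  # combine the partial results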
Uses of parallel computing:
Parallel computing is used in the following fields:
 HPC (High performance/productivity computing)
 Technical Computing
 Cluster computing
Multiprocessor:
It is a type of computer architecture which contains two or more processors (CPUs) that can execute multiple tasks simultaneously. These processors can have a common memory (shared memory architecture) or their own memory (distributed memory architecture).
Multicore:
It is a type of computer architecture where a single physical chip contains multiple processing units, or cores, which execute tasks independently. These cores can have common or shared memory. Typically, each core has its own cache memory, allowing for parallelism.
Principles of Parallel computing:
 Finding enough Parallelism
 Scalability
 Locality
 Coordination and synchronization
 Load balancing
 Performance modelling
Advantages of Parallel computing:
 Computing power is increased
 Performance is increased
 Scalability
 Provides concurrency
 Solve larger problems
Distributed computing:
A distributed system is a collection of independent computers that appears to its user as a single coherent
system.
Characteristics of Distributed computing:
 These systems consist of autonomous computers that appear to the user as a single computer.
 There is no shared memory, so the computers communicate by message passing.
 A single task is divided among the various computers.
Applications of distributed computing: (GDBPE)
Some of the common applications of distributed computing are as follows:
 Google
 Business
 P2P Network
 Engineering
 Defence
Approaches to distributed computing:
 Grid computing
 Cloud computing
Grid Computing:
Grid computing can be defined as a network of computers that work together to perform tasks that would be difficult for a single computer. The machines in the network work under the same protocols, acting as a single super virtual computer.
Cloud computing:
Cloud computing is a technology that allows the user to access and use computing resources such as processing power, storage or applications over the internet, instead of owning and maintaining the hardware and software; the user simply utilizes resources hosted by a cloud service provider.
Types of distributed system:
 Client / Server systems
 Peer-to-peer system
 Middleware system
 Three-tier system
 N-tier system
Client/server system:
It is the most basic communication model, where a client sends a request to the server and the server replies with an output.
Peer-to-Peer system:
This communication model works as a decentralized system in which each node acts as both client and server.
Middleware system:
It can be thought of as an application that sits between two different applications and provides services to both.
Three-tier system:
This system uses a separate layer and server for each function of a program. Here the data is stored in the middle layer rather than on the client system.
N-tier system:
It is also known as a multitier distributed system. An N-tier system can contain any number of functions in the network.
Difference between Parallel and distributed computing:
Distributed computing is often used in tandem with parallel computing. Parallel computing on a single
computer uses multiple processors to process tasks in parallel, whereas distributed parallel computing uses
multiple computing devices to process those tasks.
Issues while designing a distributed system:
For parallel and distributed programs, the design process involves three issues:
 Decomposition
 Communication
 Synchronization
Shared memory:
Shared memory is a type of memory architecture that allows multiple threads or processes to access the same memory space. In the context of parallel and distributed computing, shared memory can be used to facilitate communication and synchronization between different processes and threads.
Architecture of shared memory:

How shared memory works in parallel computing:


In a computer system with multiple processors, each processor needs a way to access the main memory and I/O devices. They do this by connecting to the main memory and I/O devices through a shared path called a bus, or through a fast-switching network. Each processor also has its own small memory, called a cache, where frequently used data is kept.
Cache coherency protocol:
To make sure that each processor sees the same data when accessing the main memory and I/O devices, the system uses a cache coherency protocol. This protocol ensures that if the data in one processor's cache changes, the other processors also update that data, so they don't end up with outdated values. This way the processors stay synchronized with each other.
Programming models:
There are different programming models and tools for working with shared memory in parallel computing. Two common ones are listed below (a minimal multithreading sketch follows the list):
 Multithreading
 OpenMP
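A minimal multithreading sketch, in Python rather than OpenMP (purely for illustration): several threads increment a counter that lives in shared memory, and a lock provides the synchronization needed to keep the updates consistent.

from threading import Thread, Lock

counter = 0    # shared memory: every thread sees the same variable
lock = Lock()  # synchronization primitive guarding the shared data

def worker(iterations):
    global counter
    for _ in range(iterations):
        with lock:  # only one thread may update the counter at a time
            counter += 1

threads = [Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 40000: the lock prevented lost updates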
Distributed memory:
Distributed memory refers to a type of parallel computing architecture where each processor has its own
private memory, and communication between processors happens through message passing. In this
architecture, the memory of one processor is not directly accessible by other processors, and communication
between processors occurs explicitly through messages that are sent and received.
Architecture of distributed memory:

Programming models:
There are different programming models and tools for working with distributed memory in parallel computing. Two common ones are listed below (a small message-passing sketch follows the list):
 Widely used standard: MPI (message passing interface)
 Others: PVM, Express etc.
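A small message-passing sketch: a real system would use MPI (for example through the mpi4py library), but the Python standard-library version below illustrates the same idea with a Pipe, since each process has its own private memory and data moves only through explicit send and receive calls.

from multiprocessing import Process, Pipe

def worker(conn):
    data = conn.recv()    # explicitly receive a message from the parent
    conn.send(sum(data))  # explicitly send the result back as a message
    conn.close()

if __name__ == "__main__":
    parent_end, child_end = Pipe()
    p = Process(target=worker, args=(child_end,))
    p.start()
    parent_end.send([1, 2, 3, 4])  # no shared memory: the list is copied
    print(parent_end.recv())       # prints 10
    p.join()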
Flynn’s Classification of computer architecture:
The crux of parallel processing and computing is the CPU. Based on the number of instruction and data streams that can be processed simultaneously, computer systems are divided into four categories:
1. Single Instruction Single Data (SISD)
2. Single Instruction Multiple Data (SIMD)
3. Multiple Instruction Single Data (MISD)
4. Multiple Instruction Multiple Data (MIMD)
Single Instruction Single Data (SISD):
An SISD computing system is a uniprocessor machine which is capable of executing a single instruction,
operating on a single data stream. In SISD, machine instructions are processed in a sequential manner and
computers adopting this model are popularly called sequential computers. Most conventional computers have SISD architecture.

Single Instruction Multiple Data (SIMD):


A SIMD system is a multiprocessor machine capable of executing the same instruction on all the CPUs but
operating on different data streams. Machines based on a SIMD model are well suited to scientific
computing since they involve lots of vector and matrix operations.
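The vectorized style below mirrors the SIMD idea in software (a sketch assuming NumPy is installed): one logical instruction, the array addition, is applied across many data elements at once instead of looping over them one pair at a time.

import numpy as np

a = np.arange(1_000_000, dtype=np.float64)
b = np.arange(1_000_000, dtype=np.float64)

# One instruction, many data elements: the whole-array addition is
# dispatched as a single vectorized operation rather than a Python loop.
c = a + b
print(c[:5])  # [0. 2. 4. 6. 8.]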

Types of SIMD:
SIMD is classified into two schemes:
• Scheme 1 – Each processor has its own local memory.
• Scheme 2 – Processors and memory modules communicate with each other via an interconnection network.

Multiple Instruction Single Data (MISD):


An MISD computing system is a multiprocessor machine capable of executing different instructions on
different PEs but all of them operating on the same dataset.

Multiple Instruction Multiple Data (MIMD):


An MIMD system is a multiprocessor machine which is capable of executing multiple instructions on
multiple data sets. Each PE in the MIMD model has separate instruction and data streams; therefore,
machines built using this model are capable of handling any kind of application.

Types of MIMD:
MIMD machines are broadly categorized into:
 Shared memory MIMD
 Distributed memory MIMD
Shared memory MIMD:
In shared memory MIMD (Multiple Instruction, Multiple Data), all the processors in the system share a
single memory space. They can directly access and communicate using this shared memory.

Distributed memory MIMD:
In distributed memory MIMD, each processor has its own private memory, and processors communicate with each other by passing messages over a network.

Difference between SIMD and MIMD:

Architecture       | Single Instruction Multiple Data (SIMD)    | Multiple Instruction Multiple Data (MIMD)
Type of processing | Same instruction on multiple data sets     | Different instructions on multiple data sets
Flexibility        | Limited                                    | High
Memory access      | Shared                                     | Separate
Best suited for    | Applications with regular data parallelism | Applications with irregular data dependencies or complex computations
Performance        | High efficiency for parallelizable tasks   | Flexible, but may be less efficient for parallelizable tasks

Fault tolerance:
The ability of a system to provide its required functionality in the presence of failures or faults.
Types of faults:
Faults are generally classified as:
 Transient faults
 Intermittent faults
 Permanent faults
Transient fault:
This type of fault occurs once and then disappears. It doesn't harm the system to any great extent but is very difficult to find. An example is when a processor halts.
Intermittent fault:
These faults occur again and again: the fault appears, disappears by itself, and then reappears. An example is when a working computer hangs.
Permanent fault:
These faults remain in the system until the faulty component is replaced with a new one. They can cause serious damage to the system. An example is a burnt-out chip.
Failure Tolerance:
It involves designing a system in such a way that it can handle any failure without significant loss of functionality or data.
Classification of failure:
Failures can be classified into five categories:
 Crash failure
 Omission failure
 Timing failure
 Response failure
 Arbitrary failure
Crash failure:
When the server halts (stops), but was working correctly until it halted.
Omission failure:
When the server fails to respond to incoming requests. There are two types of omission failure:
 Receive omission
 Send omission
Receive omission:
When the server fails to receive incoming requests.
Send omission:
When the server fails to send messages.
Timing failure:
When the server's response lies outside the specified time interval.
Response failure:
When the server's response is incorrect. There are two types of response failure:
 Value failure
 State transition failure
Value failure:
When the server responds with the wrong value.
State transition failure:
The server deviates from the correct flow of control.
Arbitrary failure:
A server may produce arbitrary responses at arbitrary times.
Failure masking:
Failure masking is a fault tolerance technique where we hide the occurrence of a failure from the other processes.
Failure masking types:
The most common approach to failure masking is redundancy, which can be categorized into the following types:
 Information redundancy
 Time redundancy
 Physical redundancy
Information redundancy:
This involves adding extra information, such as a checksum or parity bit, that helps with error detection and correction.
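As a toy illustration, the sketch below appends a single even-parity bit to a block of bits; a receiver that recomputes the parity can detect any single-bit error, though with one parity bit it cannot correct it.

def add_parity(bits):
    # Even parity: the extra bit makes the total number of 1s even.
    return bits + [sum(bits) % 2]

def parity_ok(bits_with_parity):
    # True if no single-bit error is detected.
    return sum(bits_with_parity) % 2 == 0

word = [1, 0, 1, 1]
sent = add_parity(word)  # [1, 0, 1, 1, 1]
print(parity_ok(sent))   # True: no error
sent[2] ^= 1             # flip one bit "in transit"
print(parity_ok(sent))   # False: error detected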
Time redundancy:
Retrying the operation if needed. For example, if an operation fails or an error occurs, the system can retry that operation (see the sketch below).
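A minimal retry sketch; the operation argument stands in for any action that may fail transiently, and the attempt count and delay are arbitrary choices for the example.

import time

def retry(operation, attempts=3, delay=0.1):
    # Time redundancy: repeat the operation until it succeeds
    # or the retry budget is exhausted.
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == attempts:
                raise          # give up after the final attempt
            time.sleep(delay)  # wait briefly, then try again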
Physical redundancy:
It involves adding extra hardware so that in case of any failure, the functions or operations keep working after the failure.
Process resiliency:
It refers to a system's ability to keep working when one or more individual processes fail.
Purpose:
The goal is to ensure that a failure in one part of the system doesn’t cause the entire system to stop working
or lose data.
Technique:
To achieve process resiliency, we use a technique called process replication, where identical copies of a process are kept in the form of a group. If one process fails, a copy of that process from the group can be used to keep the system operational.
Types of process groups:
Process groups can be categorized into two main types:
 Flat group
 Hierarchical group
Flat group:
In a flat group, all the processes are equal; no single process is in charge of decision making. All the processes make decisions collectively and must reach a consensus before acting.
Advantage:
There is no single point of failure, as all the processes have equal responsibility for making decisions.
Disadvantage:
Decision making is complex, as all the processes have to come to a consensus before acting.
Hierarchical group:
In a hierarchical group, there is a single process (the coordinator) that is responsible for making decisions on behalf of all the processes. The coordinator decides which action to take.

Advantage:
This structure is simple and decision making is faster, as only the coordinator is allowed to make decisions.
Disadvantage:
There is a single point of failure: if the coordinator fails, the group loses its leader.
Approaches to process group replication:
To ensure that process groups remain resilient, there are two main approaches for arranging replication:
 Primary-based protocol
 Replicated-write protocol
Primary-based protocol:
This approach uses a hierarchical setup with a primary process and backup processes. The primary handles all write operations and the backups stay synchronized with the primary.
Failure handling in the primary-based protocol:
If the primary process crashes, the backup processes run an election algorithm to select a new primary to take over (a toy election rule is sketched below).
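A toy election rule (real systems use algorithms such as the bully algorithm; the process IDs here are hypothetical): every surviving backup applies the same deterministic rule, picking the highest live ID, so they all agree on the new primary without further coordination.

def elect_new_primary(process_ids, is_alive):
    # Deterministic rule: the highest-numbered live process wins,
    # so every survivor reaches the same conclusion independently.
    candidates = [pid for pid in process_ids if is_alive(pid)]
    return max(candidates) if candidates else None

# Example: primary 5 has crashed, backups 1-4 remain.
print(elect_new_primary([1, 2, 3, 4, 5], lambda pid: pid != 5))  # 4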
Advantages:
Easy and quick decision making, as only the primary is allowed to make decisions.
Disadvantages:
The primary process is a single point of failure, though a backup can take over from it.
Replicated-write protocol:
This approach uses flat process grouping, where all processes are equal and no process is primary. The processes can perform write operations simultaneously, using a protocol to ensure data consistency.
Types of replicated write protocols:
Two types of replicated-write protocols are used:
 Active replication
 Quorum-based protocol
Active replication:
In this protocol every process performs every operation, keeping all the copies identical.
Quorum-based protocol:
A subset of processes (a quorum) must agree before any operation is performed.
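A toy majority-quorum check: with N replicas, requiring strictly more than N/2 votes before a write proceeds guarantees that any two quorums overlap in at least one replica, which is what keeps the copies consistent.

def quorum_reached(votes_for, total_replicas):
    # Majority quorum: strictly more than half must agree.
    return votes_for > total_replicas // 2

print(quorum_reached(3, 5))  # True: 3 of 5 is a majority, proceed
print(quorum_reached(2, 5))  # False: the write must wait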
Advantage:
No single point of failure: all the processes have an equal role.
Disadvantage:
Coordinating writes across multiple processes can be complex and requires more communication.
Reliable client-server communication:
Reliable client-server communication in a distributed system refers to a dependable exchange of data between the server and the client across the network.
Types of communication failure:
A communication channel may experience various types of failure:
 Crash failure
 Omission failure
 Timing failure
 Arbitrary failure
Crash failure:
The connection abruptly stops working.
Omission failure:
Messages are lost during transmission.
Timing failure:
Messages are delayed or do not arrive on time.
Arbitrary failure:
Messages may be corrupted, which may result in unexpected behaviour.
Point-to-point communication:
A point-to-point connection in a distributed system is established using a reliable transport protocol such as TCP.
Handling communication failure with TCP:
TCP handles omission failures by using acknowledgements and retransmission. When a message is lost, TCP resends it until the recipient confirms receipt.
TCP doesn't handle crash failures; in that case the system needs to automatically establish a new connection.
Remote Procedure Call (RPC):
Remote Procedure Call (RPC) is a communication paradigm in distributed systems that allows a program to
execute a procedure (function) on a remote server as if it were a local function.
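A minimal sketch using Python's standard-library XML-RPC modules (one RPC mechanism among many; the host, port and add function are chosen just for this example). The client calls add as if it were a local function, but the call actually executes on the server.

# server.py - exposes a procedure that clients can call remotely
from xmlrpc.server import SimpleXMLRPCServer

def add(x, y):
    return x + y

server = SimpleXMLRPCServer(("localhost", 8000))
server.register_function(add)  # make add() callable over RPC
server.serve_forever()

# client.py - invokes the remote procedure like a local one
import xmlrpc.client

proxy = xmlrpc.client.ServerProxy("http://localhost:8000/")
print(proxy.add(2, 3))  # executed on the server; prints 5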
Classes of RPC failure:
When something goes wrong in the RPC process, it can lead to one of the following failures:
 The client is unable to locate the server
 The request message from the client to the server is lost
 The server crashes after receiving the message
 The reply message from the server to the client is lost
 The client crashes after sending the request
Handling server crashes in RPC:
Server crashes can be handled using the following three philosophies:
 At-least-once semantics
 At-most-once semantics
 No semantics
At-least-once semantics:
This approach makes sure that the action was completed at least once.
At-most-once semantics:
This approach makes sure the action happened either once or not at all.
No semantics:
There are no guarantees about what happened. After a crash we have to check whether the action was completed or not.
Orphan:
When a client sends a request to a server and crashes before it gets the reply, the server might still be working on that request. Since the client is gone, no one is waiting for the reply. This leftover task is called an orphan.
Ways to handle orphan:
There are four ways to handle orphan tasks:
 Extermination
 Reincarnation
 Gentle reincarnation
 Expiration
Extermination:
Just stop or kill the orphan task on the server.
Reincarnation:
Each client has its own unique identifier for its session; when a new session starts, it is easy to identify an orphan task, as it doesn't match the current identifier.
Gentle reincarnation:
Similar to reincarnation, but when the new session starts it tries to find out whether the task still has an owner; if no owner is found, it stops the orphan task.
Expiration:
The server stops an orphan task if it takes too long without finishing. After a set time, it assumes the task is no longer needed.
Reliable multicasting:
It ensures that a message sent by one sender reaches all the receivers without any of them missing it.
Or
Reliable multicast services guarantee that all messages are delivered to all members of a process group
Working:
The sender assigns a unique number to each message, called a sequence number, to help keep track of messages. The sender also stores copies of these messages in a temporary area called the history buffer, so it can resend them if needed. After receiving a message, the receiver sends an acknowledgement back to the sender to confirm it. If a receiver notices a missing message, it can send a negative acknowledgement instead (sketched below).
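A toy sketch of this bookkeeping (the class and method names are made up for the example): the sender keeps every message in a history buffer keyed by sequence number, and a receiver that sees a gap answers with a negative acknowledgement naming the sequence number it still expects.

class Sender:
    def __init__(self):
        self.seq = 0
        self.history = {}  # history buffer: sequence number -> message

    def send(self, msg):
        self.seq += 1
        self.history[self.seq] = msg  # keep a copy for retransmission
        return (self.seq, msg)

    def retransmit(self, seq):
        return (seq, self.history[seq])  # answer a negative ack

class Receiver:
    def __init__(self):
        self.expected = 1  # next sequence number we should see

    def deliver(self, seq, msg):
        if seq != self.expected:
            return ("NACK", self.expected)  # a message went missing
        self.expected += 1
        return ("ACK", seq)

s, r = Sender(), Receiver()
s.send("a")                         # message 1: assume it was lost
pkt2 = s.send("b")
print(r.deliver(*pkt2))             # ('NACK', 1): receiver expected 1
print(r.deliver(*s.retransmit(1)))  # ('ACK', 1): the gap is repaired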
Distributed commit protocol:
It ensures that when an operation needs to be performed across multiple systems, either all the systems perform the operation or none of them do. This is important for consistency across the systems.
Commit protocols:
These are protocols used in distributed systems to ensure that a group of systems either all agree to perform a task or none of them do it.
Types of commit protocol:
There are three types of commit protocols:
 Single phase commit protocol
 Two phase commit protocol
 Three phase commit protocol
Single phase commit protocol:
A one-phase commit protocol involves a coordinator that communicates with the servers and simply tells each of them to perform the operation.
Two phase commit protocol:
There are two phases for the commit procedure to work (a minimal sketch follows the list):
 Voting phase – the coordinator asks all the participants whether they can proceed or not. Each participant votes "commit" or "abort".
 Decision phase – based on the votes, the coordinator decides and sends a global commit or abort message.
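A minimal sketch of the two phases; participants is a hypothetical list of objects with vote() and apply() methods, standing in for the remote systems.

def two_phase_commit(participants):
    # Phase 1 (voting): ask every participant whether it can commit.
    votes = [p.vote() for p in participants]  # each returns "commit"/"abort"
    decision = "commit" if all(v == "commit" for v in votes) else "abort"
    # Phase 2 (decision): broadcast the single global decision.
    for p in participants:
        p.apply(decision)
    return decision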
Disadvantage:
This protocol has a problem: if the coordinator crashes, participants may get stuck, unable to make a final decision, which can cause the protocol to block indefinitely.
Three phase commit protocol:
The Three-Phase Commit (3PC) protocol is an extension of the Two-Phase Commit (2PC) protocol that avoids the blocking problem.
Recovery:
Recovery is the process of fixing a problem that occurred in the system and getting it back to a correct, working state.
Ways of recovery:
There are two main ways of recovery:
 Backward recovery
 Forward recovery
Backward recovery:
This method takes the system back to its previous correct state and then continues from there. It is like hitting the undo button.
Advantage:
One major benefit of backward recovery is that it can be designed as a general-purpose solution. This means
it works across a wide variety of systems and applications, without needing to be customized for each one.
Instead, it can be implemented as part of the middleware layer in a distributed system.
Disadvantage:
Restoring a system or process to a previous state is generally a relatively costly operation in terms of
performance.
Forward recovery:
When the system has entered an erroneous state, instead of moving back to a previous, checkpointed state,
an attempt is made to bring the system into a correct new state from which it can continue to execute.
Advantage:
It tends to faster and doesn’t require to go back to the previous state.
Disadvantage:
It works only when the system knows what kinds of errors might occur. If the error is unexpected, the system may not know how to fix it.
Concurrency:
Concurrency is the task of running two or more computations over the same time interval. Two events are said to be concurrent if they occur in the same time interval.
When multiple tasks or processes run at the same time (called concurrency), we need rules to ensure they
don’t interfere with each other. Without proper control, bad things can happen, like:
 Data corruption: If two tasks try to change the same data at the same time, the data can get messed
up.
 Inconsistency: The application might end up in a weird, invalid state.
To solve these problems, we use synchronization and coordination:
 Synchronization: Makes sure tasks take turns and access shared data in an organized way.
 Coordination: Helps tasks work together smoothly, like teammates following a plan.
Concurrency is powerful because:
 It saves time by running tasks simultaneously.
 It reduces duplication of code.
 It makes programs faster and more efficient.
 It scales well for big systems like servers or distributed applications.
 It can solve more complex, real-world problems compared to single-task (sequential) programming.

There are two main ways to achieve concurrency:


1. Parallel Programming:
o Tasks are split and run on multiple processors of the same computer.
o For example, a program might divide work into four tasks, each running on a different
processor at the same time.
2. Distributed Programming:
o Tasks are split and run on different computers, sometimes working together.
o For example, a program might have three parts running on separate machines.

Four approaches to programming with concurrency:


1. Sequential Programming:
o Tasks are done one after another, with no overlap.
o Simple but not concurrent.
2. Declarative Concurrency:
o Programs focus on what to do (not how to do it), and concurrency happens automatically.
Tasks run based on when their data is ready.
3. Message-passing Concurrency:
o Tasks communicate by sending messages to each other.
o Tasks are independent and don’t share data directly, which prevents conflicts.
4. Shared-state Concurrency:
o Tasks share the same resources or data.
o This requires extra rules (synchronization) to avoid problems like data corruption.

Key concepts of concurrency models:


 Declarative programming: Focuses on logic rather than step-by-step instructions. It’s like telling
the computer "what to achieve" rather than "how to achieve it."
 Message-passing: Tasks communicate by sending and receiving messages, either instantly
(synchronously) or with a delay (asynchronously).
 Shared-state: Tasks share data directly, which requires strict rules to ensure everything stays
consistent and valid.
