Unit01-Parallel Computing Introduction
Parallel Computing
Parallel computing is a type of computing architecture in which several processors execute or process an application or computation simultaneously. It refers to the process of breaking a larger problem down into smaller, independent and often similar parts, which can be executed simultaneously by multiple processors communicating via shared memory. After processing, the results are combined as part of an overall algorithm. Parallel computing is also known as parallel processing.
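As a concrete illustration (a minimal sketch; the array, thread count and names are made up, not taken from the text), the following C program splits an array sum into two independent parts, computes the partial sums simultaneously on two POSIX threads, and then combines the results:

    /* compile with: cc -pthread sum.c */
    #include <pthread.h>
    #include <stdio.h>

    #define N 8
    static int data[N] = {1, 2, 3, 4, 5, 6, 7, 8};   /* problem data (illustrative) */
    static long partial[2];                          /* one partial result per thread */

    /* Each thread sums its own half of the array - a smaller, independent part */
    static void *partial_sum(void *arg) {
        int id = *(int *)arg;
        long s = 0;
        for (int i = id * (N / 2); i < (id + 1) * (N / 2); i++)
            s += data[i];
        partial[id] = s;
        return NULL;
    }

    int main(void) {
        pthread_t t[2];
        int id[2] = {0, 1};
        for (int i = 0; i < 2; i++)
            pthread_create(&t[i], NULL, partial_sum, &id[i]);  /* parts run simultaneously */
        for (int i = 0; i < 2; i++)
            pthread_join(t[i], NULL);
        printf("total = %ld\n", partial[0] + partial[1]);       /* results combined: 36 */
        return 0;
    }

The main levels at which parallelism is exploited are: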
1. Bit-level parallelism: increases the processor word size, which reduces the number of instructions the
processor must execute in order to perform an operation on variables larger than the word length.
2. Instruction-level parallelism: the two forms are the hardware approach and the software approach (see the sketch after these lists).
a. The hardware approach implements dynamic parallelism where the processor decides at
run-time which instructions to execute in parallel.
b. The software approach implements static parallelism where the compiler decides which
instructions to execute in parallel.
3. Task parallelism: the parallelization of computer code across multiple processors that runs several
different tasks at the same time on the same data.
4. Superword-level parallelism: a vectorization technique that can exploit parallelism of inline code. It
involves identifying scalar instructions in a large basic block that perform the same operation, and
combining them into a superword operation on a multi-word object, if dependences do not prevent
it.
Based on how frequently their subtasks need to communicate or synchronize, applications are also classified as exhibiting:
i. Fine-grained parallelism: where subtasks communicate several times per second
ii. Coarse-grained parallelism: where subtasks do not communicate several times per second
iii. Embarrassing parallelism: where subtasks rarely or never communicate
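As a small illustration of instruction-level parallelism (the values below are made up), the two multiplications in this C fragment are independent of each other, so a superscalar processor (the hardware approach) or an instruction-scheduling compiler (the software approach) may execute them in parallel, while the final addition must wait for both:

    #include <stdio.h>

    int main(void) {
        int x = 3, y = 4, p = 5, q = 6;
        /* These two multiplications do not depend on each other, so they can be
           issued in parallel by the hardware or scheduled in parallel by the compiler. */
        int a = x * y;
        int b = p * q;
        /* This addition depends on both results, so it cannot start until they finish. */
        int c = a + b;
        printf("%d\n", c);   /* prints 42 */
        return 0;
    }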
Parallel Computer
A parallel computer is a set of processors that are able to work cooperatively to solve a computational
problem. Parallel computers offer the potential to concentrate computational resources on important
computational problems. Computational resources include processors, memory, or I/O bandwidth, etc.
This definition is broad enough to include parallel supercomputers that have hundreds or thousands of processors,
networks of workstations, multiple-processor workstations, and embedded systems.
A parallel computer is simply a collection of processors, typically of the same type, interconnected in a certain
fashion to allow the coordination of their activities and the exchange of data. The processors are assumed to
be located within a small distance of one another, and are primarily used to solve a given problem jointly.
In distributed systems, by contrast, a set of possibly many different types of processors is distributed over a large
geographic area. In distributed systems, the primary goals are:
to use the available distributed resources, and
to collect information and transmit it over a network connecting the various processors
Multicomputer Model
The multicomputer is an idealized parallel computer model. Each node consists of a von Neumann machine
(a CPU and memory). A node can communicate with other nodes by sending and receiving messages over an
interconnection network. Each computer executes its own program. This program may access local memory
and may send and receive messages over the network.
The multicomputer is most similar to what is often called the distributed-memory MIMD (multiple instruction, multiple data) computer:
MIMD means that each processor can execute a separate stream of instructions on its own local data.
Distributed memory means that memory is distributed among the processors, rather than placed in a central location.
The principal difference between the multicomputer and the distributed-memory MIMD computer is the cost
of sending and receiving messages among the nodes: in a real machine, the cost of messaging between
two nodes depends on the location of the nodes and on other network traffic, whereas the idealized model treats it
as uniform. Examples of this class of machine include the IBM SP, Intel Paragon, Thinking Machines CM5, Cray T3D, Meiko CS-2, and nCUBE.
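A minimal message-passing sketch in C using MPI (illustrative only; these notes do not prescribe a particular library). Each process executes its own copy of the program, works only on its local variables, and communicates with the other node purely by sending and receiving messages:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, value;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* which node am I? */

        if (rank == 0) {
            value = 42;                                           /* data local to node 0 */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);   /* message to node 1 */
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("node 1 received %d from node 0\n", value);
        }

        MPI_Finalize();
        return 0;
    }

Run with, for example, mpirun -np 2 ./a.out so that two processes (nodes) are started.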
Multiprocessor Model
The multiprocessor, or shared-memory MIMD computer, is another important class of parallel computer. In the
idealized multiprocessor model, any processor can access any memory element in the same amount of time. Real
implementations usually introduce some form of memory hierarchy: for example, copies of frequently used data
items are stored in a cache associated with each processor, and access to this cache is much faster than access
to the shared memory. Programs developed for multicomputers can also execute efficiently on multiprocessors,
because shared memory permits an efficient implementation of message passing. Examples of this class of machine
include the Silicon Graphics Challenge, Sequent Symmetry, and many multiprocessor workstations.
Von Neumann Architecture
The von Neumann architecture is based on the stored-program computer concept, where program instructions and
data are stored in the same memory. This design is still used in most computers produced today, and it is the design
upon which many general-purpose computers are based. The key elements of the von Neumann architecture are:
data and instructions are both stored as binary digits
data and instructions are both stored in primary storage
instructions are fetched from memory one at a time and in order (serially)
the processor decodes and executes an instruction, before cycling around to fetch the next
instruction
the cycle continues until no more instructions are available
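The cycle can be sketched in C for a toy stored-program machine (the two-instruction machine, its opcodes and its memory layout are invented purely for illustration): the program and its data share one memory array, and instructions are fetched one at a time, in order, until a HALT is reached.

    #include <stdio.h>

    /* Invented opcodes for a toy stored-program machine */
    enum { HALT = 0, LOAD = 1, ADD = 2 };

    int main(void) {
        /* Program and data live in the same memory (stored-program concept):
           addresses 0-5 hold instructions, addresses 6-7 hold data. */
        int memory[8] = { LOAD, 6, ADD, 7, HALT, 0, 10, 32 };
        int pc = 0;   /* Program Counter */
        int ac = 0;   /* Accumulator    */

        for (;;) {
            int opcode  = memory[pc];       /* fetch the next instruction, in order */
            int operand = memory[pc + 1];
            pc += 2;                        /* PC now points at the following instruction */

            /* decode and execute before cycling around to fetch the next instruction */
            if (opcode == LOAD)      ac = memory[operand];
            else if (opcode == ADD)  ac += memory[operand];
            else                     break; /* HALT: no more instructions are available */
        }
        printf("AC = %d\n", ac);            /* prints 42 */
        return 0;
    }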
Registers
Registers are high speed storage areas in the CPU. All data must be stored in a register before it can be
processed.
MAR (Memory Address Register): holds the memory location of data that needs to be accessed
MDR (Memory Data Register): holds data that is being transferred to or from memory
AC (Accumulator): where intermediate arithmetic and logic results are stored
PC (Program Counter): contains the address of the next instruction to be executed
CIR (Current Instruction Register): contains the current instruction during processing
Table 1: Registers in the CPU – von Neumann architecture
Buses
Buses are the means by which data is transmitted from one part of a computer to another, connecting all
major internal components to the CPU and memory. A standard CPU system bus is comprised of a control
bus, data bus and address bus.
Address Bus: carries the addresses of data between the processor and memory
Data Bus: carries data between the processor, the memory unit and the input/output devices
Control Bus: carries control signals or commands from the CPU, and status signals from other devices, in order to control and coordinate all the activities within the computer
Table 2: Types of buses
Memory Unit
The memory unit consists of RAM, sometimes referred to as primary or main memory. Unlike secondary
memory, primary memory is faster and directly accessible by the CPU. RAM is split into partitions. Each
partition consists of an address and its contents (both in binary form). The addresses uniquely identify every
location in the memory. Loading data from permanent memory (hard drive), into the faster and directly
accessible temporary memory (RAM), allows the CPU to operate much faster.
Flynn's Taxonomy
Flynn's taxonomy distinguishes multi-processor computer architectures according to how they can be
classified along the two independent dimensions of Instruction Stream and Data Stream. Each of these
dimensions can have only one of two possible states – Single or Multiple – giving four classes: SISD, SIMD,
MISD and MIMD. The SIMD class is commonly subdivided into three further subcategories:
1. Array Processor
These receive the one (same) instruction but each parallel processing unit has its own separate and
distinct memory and register file. The modern term for an array processor is "single instruction,
multiple threads" (SIMT).
2. Pipelined Processor
These receive the one (same) instruction but then read data from a central resource, each processes
fragments of that data, then writes back the results to the same central resource. In Flynn's 1977
paper the resource is main memory. For modern CPUs the resource is now more typically the
register file. An alternative name for this type of register-based SIMD is "packed SIMD".
3. Associative Processor
These receive the one (same) instruction but in each parallel processing unit an independent
decision is made, based on data local to the unit, as to whether to perform the execution or whether
to skip it. The modern term for associative processor is "Predicated" (or masked) SIMD.
Some modern designs (GPUs in particular) take features of more than one of these subcategories. GPUs of
today are SIMT (single instruction, multiple threads) but are also associative, i.e., each processing element in
the SIMT array is also predicated.
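As a small sketch of register-based "packed SIMD" in C (assuming an x86 processor with SSE support; the data values are made up), a single instruction below adds four pairs of floats packed into 128-bit registers:

    #include <xmmintrin.h>   /* SSE intrinsics */
    #include <stdio.h>

    int main(void) {
        float a[4] = {1, 2, 3, 4};
        float b[4] = {10, 20, 30, 40};
        float c[4];

        __m128 va = _mm_loadu_ps(a);      /* load 4 floats into one 128-bit register */
        __m128 vb = _mm_loadu_ps(b);
        __m128 vc = _mm_add_ps(va, vb);   /* one instruction adds all 4 pairs (SIMD) */
        _mm_storeu_ps(c, vc);

        printf("%g %g %g %g\n", c[0], c[1], c[2], c[3]);   /* 11 22 33 44 */
        return 0;
    }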
CPU
Modern day CPUs consist of one or more cores. A core is a distinct execution unit with its own instruction
stream. Cores within a CPU may be organized into one or more sockets, each socket with its own distinct
memory. When a CPU consists of two or more sockets, the hardware infrastructure usually supports memory
sharing across sockets.
Node
A node is a standalone "computer in a box", usually comprising multiple CPUs/processors/cores,
memory, network interfaces, etc. Nodes are networked together to form a supercomputer.
Task
A task is a logically discrete section of computational work. A task is typically a program or program-like set
of instructions that is executed by a processor. A parallel program consists of multiple tasks running on
multiple processors.
Pipelining
Pipelining is the breaking of a task into steps performed by different processor units, with inputs streaming
through, much like an assembly line; a type of parallel computing.
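A small worked example with illustrative numbers: suppose a task is broken into 4 steps of equal length (1 time unit each) and 100 inputs stream through. A single unit performing all 4 steps itself needs 100 × 4 = 400 time units, whereas a 4-stage pipeline delivers its first result after 4 time units and one more result every unit after that, finishing in 4 + 99 = 103 time units.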
Shared Memory
In hardware, it describes a computer architecture where all processors have direct access to common physical
memory. As a programming model, it describes a model in which parallel tasks all have the same "picture" of memory
and can directly address and access the same logical memory locations, regardless of where the physical memory
actually exists.
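A minimal shared-memory sketch using POSIX threads (the counter, thread count and iteration count are illustrative): all four tasks directly address the same logical memory location, and a mutex coordinates their updates (a form of the synchronization described below).

    /* compile with: cc -pthread counter.c */
    #include <pthread.h>
    #include <stdio.h>

    static long counter = 0;                             /* one shared memory location */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg) {
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&lock);                   /* coordinate access */
            counter++;                                   /* directly addressed by every task */
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void) {
        pthread_t t[4];
        for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, worker, NULL);
        for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);
        printf("counter = %ld\n", counter);              /* 400000 */
        return 0;
    }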
Distributed Memory
In hardware, distributed memory refers to network-based memory access for physical memory that is not common
(i.e., not shared among the processors). As a programming model, tasks can only logically "see" local machine memory and must use
communications to access memory on other machines where other tasks are executing.
Communications
Parallel tasks typically need to exchange data. There are several ways this can be accomplished, such as
through a shared memory bus or over a network.
Synchronization
The coordination of parallel tasks in real time, very often associated with communications. Synchronization
usually involves waiting by at least one task. Therefore, it may increase the wall clock execution time of a
parallel application.
Computational Granularity
In parallel computing, granularity is a quantitative or qualitative measure of the ratio of computation to
communication.
Coarse: relatively large amounts of computational work are done between communication events
Fine: relatively small amounts of computational work are done between communication events
Observed Speedup
It is one of the simplest and most widely used indicators for measuring the performance of a parallel
program. Speedup is defined as the ratio between the "wall-clock time of serial execution" and the "wall-clock
time of parallel execution".
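For example (hypothetical timings): if the serial version of a program needs 120 seconds of wall-clock time and the parallel version runs in 20 seconds on 8 processors, the observed speedup is 120 / 20 = 6. The gap between the speedup (6) and the number of processors (8) is typically accounted for by parallel overhead, discussed next.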
Parallel Overhead
It is the execution time required to coordinate parallel tasks, as opposed to time spent doing useful work.
Parallel overhead can include factors such as Task start-up time, Synchronizations, Data communications,
Software overhead imposed by parallel languages, libraries, operating system, etc. and Task termination
time.
Massively Parallel
It refers to the hardware that comprises a given parallel system, having many processing elements. The
meaning of "many" keeps increasing, but currently, the largest parallel computers are comprised of
processing elements numbering in the hundreds of thousands to millions.
Scalability
Scalability refers to a parallel system's (hardware and/or software) ability to demonstrate a proportionate
increase in parallel speedup with the addition of more resources. Factors that contribute to scalability
include hardware (particularly memory-CPU bandwidths and network communication properties), the
application algorithm, parallel overhead, and characteristics of the specific application.
Parallel computing infrastructure is typically housed within a single datacenter where several processors are
installed in a server rack. Computation requests are distributed by the application servers in small chunks,
which are then executed simultaneously on each server.
The importance of parallel computing continues to grow with the increasing usage of multicore processors
and GPUs. GPUs work together with CPUs to increase the throughput of data and the number of concurrent
calculations within an application. Using the power of parallelism, a GPU can complete more work than a
CPU in a given amount of time.
Typical application areas include:
Engineering fields
o Statistical power grid protection tests
o Aircraft design and simulation
o Motor drive controller design methods and space robot integration, etc.
Computer gaming
Industrial market for operator training and off-line controller tuning
Global Applications
Parallel computing is now being used extensively around the world, in a wide variety of applications. Some
of them are:
Research
Finance
Logistic services
Information processing services
Aerospace
Telecommunication
Defense
Health and medicine, and so on
1000:1 or greater, depending on the relative performance of the local computer, the network, and the
mechanisms used to move data to and from the network.