Unit 4 parallel computing
In the above example of dense matrix multiplication, the computation is divided among the
available processors. Each processor operates on the data stream allocated to it and accesses
the memory unit for read and write operations. As shown in the above figure, data stream 1
is allocated to processor 1; once processor 1 finishes its computation, the result is stored in the
memory unit.
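Below is a minimal sketch of this data-parallel style in C, assuming OpenMP is available. The rows of the output matrix play the role of the data streams divided among processors; the function name matmul and the size N are illustrative, not part of the original example.

#include <omp.h>

#define N 512

/* Data-parallel sketch: the rows of the result matrix are divided
 * among the available processors, and each processor computes its
 * share independently, writing results back to memory. */
void matmul(const double A[N][N], const double B[N][N], double C[N][N])
{
    /* Each outer-loop iteration is an independent "data stream";
     * OpenMP assigns blocks of rows to the available threads. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)
                sum += A[i][k] * B[k][j];
            C[i][j] = sum;
        }
    }
}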
2. The Task Graph Model
Parallel algorithms use a task dependency graph to describe the computations they perform.
The interrelationships among tasks captured in this graph can therefore be exploited to reduce
interaction costs.
This model is effective for solving problems in which tasks are associated with a large amount
of data relative to the actual computation. Parallelism described by a task dependency graph in
which each task is independent is known as task parallelism. The task graph model is widely
used to implement parallel quicksort, a parallel algorithm based on divide and conquer.
Example: Finding the minimum number
In the above example of finding the minimum number, the task graph model works in parallel
to find the minimum of the given stream. As shown in the above figure, one process computes the
minimum of 23 and 12 and passes it on; at the same time, another process computes the minimum
of 9 and 30 and passes it on. This tree-structured approach reduces the number of sequential
comparison steps.
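The plain-C sketch below mirrors that task structure. It runs sequentially; the comments mark where the task dependency graph allows parallelism. The function name tree_min is hypothetical.

#include <stddef.h>

/* Pairwise (tree-structured) minimum. The two recursive calls operate
 * on disjoint halves of the array, so they are independent tasks in
 * the task dependency graph and could run on different processors;
 * only the final comparison depends on both results. */
int tree_min(const int *a, size_t n)
{
    if (n == 1)
        return a[0];
    /* These two sub-tasks have no mutual dependency. */
    int left  = tree_min(a, n / 2);
    int right = tree_min(a + n / 2, n - n / 2);
    return left < right ? left : right;
}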
3. Work Pool Model
The work pool model is also known as the task pool model. It uses a dynamic mapping
approach for task assignment in order to handle load balancing. Some processes or tasks are
small and require little time, whereas others are large and require more processing time. To
avoid the resulting inefficiency, load balancing is required.
A pool of tasks is created, and tasks are allocated to processes that are idle at runtime. The
work pool model can be used in the message-passing approach when the data associated with
a task is small compared to the computation it requires. In this model, tasks can be moved
around without causing much interaction overhead.
Example: Parallel tree search
In the above example, the parallel tree search uses the work pool model with four processors
working simultaneously. The four subtrees are allocated to the four processors, which carry out
the search operation.
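Here is a minimal work-pool sketch in C using POSIX threads, under the assumption that tasks are just integer indices. The shared counter next_task stands in for the task queue, and idle workers grab the next task under a mutex; the names worker, process, NUM_TASKS, and NUM_WORKERS are all illustrative.

#include <pthread.h>

#define NUM_TASKS   16
#define NUM_WORKERS 4

/* Shared pool of tasks; next_task is the dynamic-mapping index
 * that idle workers advance under a mutex. */
static int tasks[NUM_TASKS];
static int next_task = 0;
static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;

static void process(int task) { (void)task; /* placeholder for real work */ }

static void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&pool_lock);
        int t = next_task < NUM_TASKS ? next_task++ : -1;
        pthread_mutex_unlock(&pool_lock);
        if (t < 0)
            break;              /* pool is empty */
        process(tasks[t]);      /* idle worker picked up the next task */
    }
    return NULL;
}

int main(void)
{
    pthread_t w[NUM_WORKERS];
    for (int i = 0; i < NUM_TASKS; i++)
        tasks[i] = i;
    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_create(&w[i], NULL, worker, NULL);
    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_join(w[i], NULL);
    return 0;
}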
4. Master-Slave Model
The master-slave model is also known as the manager-worker model. The work is divided
among the processes. In this model there are two different types of processes: master processes
and slave processes. One or more processes act as the master, and all the remaining processes
act as slaves.
The master allocates tasks to the slave processes according to the requirements. The allocation
depends on the size of the task: if the size of a task can be determined in advance, the master
allocates it to the appropriate processes.
If the size of a task cannot be determined in advance, the master allocates some of the work to
every process at different times. The master-slave model works most efficiently when the work
must be done in different phases, with the master assigning different slaves to tasks in each
phase. The master is responsible for allocating tasks and for synchronizing the activities of the
slaves. The master-slave model is generally efficient and is used with both shared-address-space
and message-passing paradigms.
Example: Distribution of workload across multiple slave nodes by the master process
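A hedged MPI sketch of this distribution pattern follows: rank 0 plays the master, handing one integer task to each slave and collecting the results. The tags, the task payload, and the placeholder computation are all illustrative.

#include <mpi.h>
#include <stdio.h>

#define TAG_TASK 1
#define TAG_DONE 2

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        /* Master: hand one task to each slave, then collect results. */
        for (int dest = 1; dest < size; dest++) {
            int task = dest;    /* stand-in for real task data */
            MPI_Send(&task, 1, MPI_INT, dest, TAG_TASK, MPI_COMM_WORLD);
        }
        for (int src = 1; src < size; src++) {
            int result;
            MPI_Recv(&result, 1, MPI_INT, MPI_ANY_SOURCE, TAG_DONE,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("master received %d\n", result);
        }
    } else {
        /* Slave: receive a task, do the work, report back. */
        int task, result;
        MPI_Recv(&task, 1, MPI_INT, 0, TAG_TASK, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        result = task * task;   /* placeholder computation */
        MPI_Send(&result, 1, MPI_INT, 0, TAG_DONE, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}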
5. Pipeline Model
In the pipeline model, a stream of data is passed through a succession of processes, each of
which performs some task on it.
Example: Parallel LU factorization
As shown in the above diagram, the parallel LU factorization algorithm uses the pipeline model.
In this model, the producer reads the input matrix and generates the tasks required for computing
the LU factorization. The producer divides the input matrix into multiple smaller tasks and
places them in a shared task queue. The consumers then retrieve these blocks and perform the
LU factorization on each independent block.
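Below is a minimal producer-consumer sketch in C with POSIX threads, assuming a bounded shared queue: the producer enqueues block indices, and the consumer "factors" each block (a printf placeholder). The queue size, block count, and all names are illustrative.

#include <pthread.h>
#include <stdio.h>

#define QUEUE_SIZE 8
#define NUM_BLOCKS 32

/* Bounded task queue shared by the producer and consumer stages. */
static int queue[QUEUE_SIZE];
static int head, tail, count;
static int done;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t not_full  = PTHREAD_COND_INITIALIZER;
static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;

static void *producer(void *arg)
{
    (void)arg;
    /* Producer stage: split the input into blocks and queue them. */
    for (int b = 0; b < NUM_BLOCKS; b++) {
        pthread_mutex_lock(&lock);
        while (count == QUEUE_SIZE)
            pthread_cond_wait(&not_full, &lock);
        queue[tail] = b;
        tail = (tail + 1) % QUEUE_SIZE;
        count++;
        pthread_cond_signal(&not_empty);
        pthread_mutex_unlock(&lock);
    }
    pthread_mutex_lock(&lock);
    done = 1;                   /* no more blocks will arrive */
    pthread_cond_broadcast(&not_empty);
    pthread_mutex_unlock(&lock);
    return NULL;
}

static void *consumer(void *arg)
{
    (void)arg;
    /* Consumer stage: process each block as it becomes available. */
    for (;;) {
        pthread_mutex_lock(&lock);
        while (count == 0 && !done)
            pthread_cond_wait(&not_empty, &lock);
        if (count == 0 && done) {
            pthread_mutex_unlock(&lock);
            return NULL;
        }
        int b = queue[head];
        head = (head + 1) % QUEUE_SIZE;
        count--;
        pthread_cond_signal(&not_full);
        pthread_mutex_unlock(&lock);
        printf("factoring block %d\n", b);  /* placeholder work */
    }
}

int main(void)
{
    pthread_t p, c;
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}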
6. Hybrid Model
A hybrid model is a combination of more than one parallel model. The combination can be
applied sequentially or hierarchically to the different phases of a parallel algorithm. For each
phase, the model that performs that phase's task most efficiently is selected.
Example: A combination of the master-slave, work pool, and task graph models.
The Beowulf cluster that you will be writing, compiling, and running your MPI programs on is
called a distributed memory system. In this system we have a master node, a computer that you
log into. Connected to the master node is a network of several other nodes. When you run your
MPI program on the master node, the master node runs the same program on each one of the
nodes in the cluster. This way we have access to the processor and memory of each node. We
can also transfer data between nodes, giving the illusion of one giant computer. See the
illustration below for an example.
[Figure: master node connected to the other nodes through a network]
A program that runs on a node is called a process. When your program is run, a process is run
on each processor in the cluster. These processes communicate with each other using a system
of message passing. These messages are packets of data that are put into envelopes containing
routing information. The message-passing system allows us to copy data from the memory of
one process to another. Here is an illustration:
Communication of messages requires that both processes cooperate in a send and receive operation. The
transfer of data is called a send and the receiving of data by a process is called a receive.
There are two different kinds of buffers in MPI. The application buffer is where the data for
each process is held in memory; it is the address space that holds the data to be sent and
received. The system buffer is used when messages need to be stored, and whether it is used
depends on the type of communication method being used. The system buffer allows us to send
messages in asynchronous mode: an asynchronous send operation is allowed to complete even
though the receiving process may not yet have received the message. In synchronous mode, a
send completes only when the receiving process acknowledges that the message was received.
Above is an illustration that sends data from process 1 to process 2. The variable in the
application buffer is sent through the network and copied into the system buffer on the receiving
process. The data in the receiving system buffer is then copied into the process's application
buffer. There are two methods for sending and receiving, sketched in code below:
Blocking – In blocking communication, a call returns only when it is safe to proceed. A send
returns once the data in the application buffer has been copied out (for example, to a system
buffer), so the application buffer is available for reuse. A receive returns once the data has been
copied into the receive buffer and is ready to be used.
Non-blocking – In non-blocking communication, a call returns immediately without waiting for
the data transfer to complete; the program must later test or wait for completion before reusing
the buffers.
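Here is a minimal blocking send/receive example in C MPI; the value 42 and the tag 0 are arbitrary choices. MPI_Send returns once the application buffer is reusable, and MPI_Recv returns once the data has landed in it.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, value = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;
        /* Blocking send: returns once 'value' can safely be reused,
         * i.e. after the data has been copied out of the application
         * buffer (possibly into a system buffer). */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* Blocking receive: returns only after the data has arrived
         * in the application buffer 'value'. */
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("process 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}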
MPI needs a way to identify all the different processes that run in a parallel program. To do this
we have something called a rank: an integer assigned to each process when it initializes. The
programmer can use the rank to specify a destination or source for sending and receiving
messages. The rank starts at zero and increases by one for every running process. A
communicator is an object that MPI uses to group collections of processes that are allowed to
communicate with each other. All the processes available to us when we begin our MPI program
are ranked and grouped into one single communicator called MPI_COMM_WORLD.
MPI_COMM_WORLD is the default group when the MPI program is initialized; we can then
divide it into separate groups to work with.
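As a sketch of dividing MPI_COMM_WORLD, the example below uses MPI_Comm_split to place even-ranked and odd-ranked processes into two separate communicators; the even/odd "color" is just an illustrative choice.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int world_rank, world_size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    /* Split MPI_COMM_WORLD into two groups: processes with even
     * world ranks and processes with odd world ranks. Ranks are
     * renumbered from zero inside each new communicator. */
    MPI_Comm subgroup;
    MPI_Comm_split(MPI_COMM_WORLD, world_rank % 2, world_rank, &subgroup);

    int sub_rank;
    MPI_Comm_rank(subgroup, &sub_rank);
    printf("world rank %d of %d -> subgroup rank %d\n",
           world_rank, world_size, sub_rank);

    MPI_Comm_free(&subgroup);
    MPI_Finalize();
    return 0;
}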
Shared memory allows database server threads and processes to share data, which can
reduce memory usage and disk I/O.
CUDA programming
Shared memory is a CUDA memory space that all threads in a thread block can access. Every
thread in the block can read and write it, and writes become visible to the other threads in the
block (typically after a __syncthreads() barrier).
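A small CUDA C kernel sketch follows, showing a block-level sum reduction staged through shared memory. It assumes a block size of 256 threads, and the names sum_block and buf are illustrative.

__global__ void sum_block(const float *in, float *out)
{
    /* __shared__ places this array in the block's shared memory;
     * every thread in the block can read and write it.
     * Assumes the kernel is launched with 256 threads per block. */
    __shared__ float buf[256];

    int tid = threadIdx.x;
    buf[tid] = in[blockIdx.x * blockDim.x + tid];
    __syncthreads();            /* make all writes visible to the block */

    /* Tree reduction within the block, entirely in shared memory. */
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride)
            buf[tid] += buf[tid + stride];
        __syncthreads();
    }
    if (tid == 0)
        out[blockIdx.x] = buf[0];   /* one result per block */
}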
Parent and child processes
The shared flag can be set when mapping a block of memory so that it is shared between a
parent and child process. This allows the processes to communicate with each other without
using signals, pipes, or files.
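A minimal C sketch on a POSIX system is shown below: mmap with MAP_SHARED (combined here with MAP_ANONYMOUS, a common extension) creates a page that survives fork(), so the child's write is visible to the parent.

#include <stdio.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    /* MAP_SHARED | MAP_ANONYMOUS: the page is shared with any child
     * created by fork(), so writes by one process are seen by the
     * other without signals, pipes, or files. */
    int *shared = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED)
        return 1;
    *shared = 0;

    if (fork() == 0) {          /* child */
        *shared = 42;           /* visible to the parent */
        _exit(0);
    }
    wait(NULL);                 /* parent waits for the child */
    printf("parent sees %d\n", *shared);    /* prints 42 */
    munmap(shared, sizeof(int));
    return 0;
}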