
Parallel Programming
Advanced Computer Architecture
Submitted to: Dr. Saima Farhan
OUR TEAM

Farheen Fatima

Farzeen Fatima

Khansa
CONTENTS OF THIS PRESENTATION
 Introduction to Parallel Programming
 Why Parallel Programming?
 Parallel Programming Process
 Parallel Programming Models
 Parallel Programming Paradigms
 Tools and Frameworks for Parallel Programming
 Challenges in Parallel Programming
 Applications of Parallel Programming
 Best Practices for Effective Parallel Programming
 Future of Parallel Programming
 Conclusion
INTRODUCTION
Parallel programming is the process of splitting a problem into smaller tasks that can be executed at the same time – in parallel – using multiple computing resources.

It is a key technique for solving complex problems efficiently.

Examples: scientific problems, big data analysis, artificial intelligence, image processing, and weather forecasting.
Example: multiplying two matrices, A × B = C (2×2)

Problem breakdown: each output element C[0][0], C[0][1], C[1][0], C[1][1] is computed as an independent task by its own process. The partial results are then ASSEMBLED into the final matrix C.
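A minimal C++ sketch of this breakdown (assuming one thread per output element; the matrix values are illustrative, not from the slide):

```cpp
#include <array>
#include <iostream>
#include <thread>
#include <vector>

int main() {
    // Illustrative 2x2 input matrices.
    std::array<std::array<int, 2>, 2> A{{{1, 2}, {3, 4}}};
    std::array<std::array<int, 2>, 2> B{{{5, 6}, {7, 8}}};
    std::array<std::array<int, 2>, 2> C{};

    // One task per output element: each computes C[i][j] independently.
    std::vector<std::thread> tasks;
    for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 2; ++j)
            tasks.emplace_back([&, i, j] {
                C[i][j] = A[i][0] * B[0][j] + A[i][1] * B[1][j];
            });

    // "Assemble": wait for every task, then C holds the full product.
    for (auto& t : tasks) t.join();
    for (auto& row : C) std::cout << row[0] << ' ' << row[1] << '\n';
}
```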
Why Use Parallel Programming?
 Improved Performance: Speeds up computational
tasks by dividing them into smaller, parallelized
subtasks.
 Efficient Resource Utilization: Makes optimal use
of multi-core processors and high-performance
computing (HPC) architectures.
 Handles Large Data: Ideal for processing vast
datasets or performing simulations.
 Essential for Modern Applications: Powering
technologies like machine learning, climate
modeling, and real-time analytics.
Process of parallel programming
Understand the problem

Break the problem down into subproblems

Identify Communications Between Tasks

Synchronize the Sequence of Tasks

Identify Dependencies in the Sequence of Tasks

Perform Load Balancing

Write and debug the code


Parallel Programming Models
Shared Memory
 Shared Address Space: Processes share a common memory space, reading and
writing asynchronously.
 Access Control: Locks and semaphores manage access to shared memory to prevent
race conditions and deadlocks.
 Ease of Development: Programmers don’t need to define data ownership, simplifying
development.
 Performance Challenge: Managing data locality is difficult, leading to inefficiencies in
memory access and cache usage. Poor data locality increases memory usage, which
reduces performance. Controlling data locality is complex and often beyond average
users.
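As a rough illustration of the shared-memory model in C++ (the counter workload and thread count are made up for the sketch): several threads read and write the same address space, and a lock serializes access to prevent a race condition.

```cpp
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

int main() {
    long long counter = 0;      // shared state visible to every thread
    std::mutex counter_lock;    // access control for the shared memory

    auto work = [&](int iterations) {
        for (int i = 0; i < iterations; ++i) {
            std::lock_guard<std::mutex> guard(counter_lock);  // prevents a race condition
            ++counter;
        }
    };

    // Four threads read and write the same address space asynchronously.
    std::vector<std::thread> threads;
    for (int t = 0; t < 4; ++t) threads.emplace_back(work, 100000);
    for (auto& t : threads) t.join();

    std::cout << "counter = " << counter << '\n';  // 400000 with the lock in place
}
```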
Message Passing (MPI)
 A parallel programming approach where separate processes communicate only by
sending messages, not sharing memory. Each set of tasks uses its own local memory
during computation. Multiple tasks can reside on the same physical machine and/or
across an arbitrary number of machines.
 Processes exchange data through communications by sending and receiving
messages.
 Data transfer usually requires cooperative operations to be performed by each
process. For example, a send operation must have a matching receive operation.
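A minimal sketch of a matched send/receive pair using the MPI C API (assumes the program is launched with at least two processes, e.g. mpirun -np 2):

```cpp
#include <mpi.h>
#include <iostream>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int value = 0;
    if (rank == 0) {
        value = 42;  // data lives in rank 0's local memory
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);  // send to rank 1, tag 0
    } else if (rank == 1) {
        // The matching receive: without it, the send has no partner.
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        std::cout << "rank 1 received " << value << '\n';
    }

    MPI_Finalize();
}
```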
Hybrid Model
 Hybrid Model: Combines multiple programming models, like MPI with
shared memory
 Popular Hardware: Ideal for clustered multi-core systems.
 MPI + GPU: MPI tasks run on CPUs, and intensive computations are
offloaded to GPUs.
 Data Exchange: GPU and CPU exchange data using CUDA or similar
technologies.
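A sketch of one common hybrid combination, MPI between processes plus OpenMP shared-memory threads inside each process (the MPI + GPU variant above would instead offload the inner loop to CUDA); the array size is illustrative:

```cpp
#include <mpi.h>
#include <omp.h>
#include <vector>
#include <iostream>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Each MPI process owns its own chunk of data in local memory...
    std::vector<double> chunk(1000000, 1.0);

    // ...and uses shared-memory threads to process that chunk in parallel.
    double local_sum = 0.0;
    #pragma omp parallel for reduction(+ : local_sum)
    for (long i = 0; i < (long)chunk.size(); ++i) local_sum += chunk[i];

    // MPI then combines the per-process results across the cluster.
    double total = 0.0;
    MPI_Reduce(&local_sum, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) std::cout << "total = " << total << '\n';

    MPI_Finalize();
}
```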
Parallel Programming Paradigms
Data-Level
 The data-parallel model is one of the simplest parallel algorithm models.
 Tasks to be performed are identified first, then assigned to processes.
 Task assignment can be static (fixed) or semi-static (partially flexible).
 Each process performs the same task (operations are identical), but on
different pieces of data.
 The problem is divided into smaller tasks based on data partitioning.
 Data partitioning ensures:
-> All processes perform similar operations
-> Proper load balancing with uniform data distribution.
Example
1. Problem:
- You have a large array of numbers, e.g., [1, 2, 3, 4, 5, 6, 7, 8].
- The goal is to calculate the total sum of all the numbers.
2. Data Partitioning:
- Divide the array into smaller chunks to distribute the work among multiple
processes.
- For example, with 4 processes:
- Process 1 gets [1, 2]
- Process 2 gets [3, 4]
- Process 3 gets [5, 6]
- Process 4 gets [7, 8]
Example: Cont…
3. Parallel Task Execution:
- Each process performs the same operation (sum calculation) on its
assigned data.
- Process 1 calculates 1 + 2 = 3.
- Process 2 calculates 3 + 4 = 7.
- Process 3 calculates 5 + 6 = 11.
- Process 4 calculates 7 + 8 = 15.
4. Combining Results:
- The partial sums [3, 7, 11, 15] are combined (aggregated) to get the final
result:
3 + 7 + 11 + 15 = 36.
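A small C++ sketch of this exact example (four tasks, each summing its own chunk; std::async stands in for the four processes):

```cpp
#include <future>
#include <iostream>
#include <numeric>
#include <vector>

int main() {
    std::vector<int> data{1, 2, 3, 4, 5, 6, 7, 8};
    const int num_parts = 4;                      // matches the 4 processes above
    const int chunk = data.size() / num_parts;    // 2 elements per task

    // Each task runs the same operation (a sum) on its own slice of the data.
    std::vector<std::future<int>> partials;
    for (int p = 0; p < num_parts; ++p)
        partials.push_back(std::async(std::launch::async, [&, p] {
            return std::accumulate(data.begin() + p * chunk,
                                   data.begin() + (p + 1) * chunk, 0);
        }));

    // Combine the partial results: 3 + 7 + 11 + 15 = 36.
    int total = 0;
    for (auto& f : partials) total += f.get();
    std::cout << "total = " << total << '\n';
}
```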
Task-Pool-level
 Also known as the Task Pool Model.
 Dynamic mapping of tasks to processors to handle load balancing.
 Used when tasks vary in size and processing time.
 How it Works:
 Tasks are divided into a pool, which is a collection of tasks ready to be
processed.
 Idle processors in the system are assigned tasks from the pool during
runtime.
 This ensures that processors remain active and no processor is underutilized.
 Load Balancing:
Idle processors pull new tasks from the pool as they finish, so the workload stays evenly distributed.
Example
A system needs to process various tasks of different sizes (e.g.,
small data cleanup tasks and large data processing tasks).

Step 1: A pool of tasks is created, and as processors become idle, they pull tasks from the pool to process.

Step 2: This ensures all processors remain busy, and the workload is distributed evenly, leading to efficient processing.
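A rough C++ sketch of a task pool (the queue contents and worker count are illustrative): idle workers pull the next task from a shared queue until it is empty.

```cpp
#include <functional>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

int main() {
    // The task pool: a shared queue of work items of varying size.
    std::queue<std::function<void()>> pool;
    std::mutex pool_lock;
    for (int i = 0; i < 20; ++i)
        pool.push([i] { /* a small or large task would run here */ });

    // Each worker pulls the next task whenever it becomes idle, so faster
    // workers simply take more tasks (dynamic load balancing).
    auto worker = [&] {
        while (true) {
            std::function<void()> task;
            {
                std::lock_guard<std::mutex> guard(pool_lock);
                if (pool.empty()) return;
                task = std::move(pool.front());
                pool.pop();
            }
            task();
        }
    };

    std::vector<std::thread> workers;
    for (int w = 0; w < 4; ++w) workers.emplace_back(worker);
    for (auto& t : workers) t.join();
    std::cout << "all tasks processed\n";
}
```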
Master-Slave
 Also known as the Manager-Worker Model.
 Roles:
 The master process manages the tasks.
 The slave processes execute the tasks assigned by the master.
 Task Allocation:
 If the task size is known beforehand, the master allocates it to the
appropriate slaves.
 If the task size is unknown beforehand, the master assigns smaller
portions to slaves incrementally.
Responsibilities (of the master):
 Allocates tasks to slave processes.
 Synchronizes the activities of all slave processes.

Common Usage:
 Effective in systems with shared memory or message-passing communication.
 Ensures efficient task distribution and coordination.
Example: process a large collection of images (e.g., apply filters, resize images, etc.)
Master Process Role:
 The master receives the list of images to process.
 It divides the work (e.g., groups of images) and assigns tasks to slave
processes.
 For example:
 Slave 1 processes images 1–100.
 Slave 2 processes images 101–200.
 Slave 3 processes images 201–300.
Slave Process Role:
 Each slave performs the task assigned by the master (e.g., applying
filters to the images).
 Once done, slaves may send their results back to the master.
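A minimal C++ sketch of this manager-worker split for the image example (process_image is a hypothetical stand-in for the real filtering/resizing work):

```cpp
#include <iostream>
#include <thread>
#include <vector>

// Hypothetical stand-in for the real work (filtering / resizing one image).
void process_image(int image_id) { /* ... */ }

// Slave/worker role: process the contiguous range assigned by the master.
void worker(int first, int last) {
    for (int id = first; id <= last; ++id) process_image(id);
}

int main() {
    // Master role: the task size is known up front, so it splits 300 images
    // into fixed ranges and assigns one range to each slave.
    std::vector<std::thread> slaves;
    slaves.emplace_back(worker, 1, 100);     // slave 1: images 1-100
    slaves.emplace_back(worker, 101, 200);   // slave 2: images 101-200
    slaves.emplace_back(worker, 201, 300);   // slave 3: images 201-300

    // Master synchronizes the slaves and waits for their results.
    for (auto& s : slaves) s.join();
    std::cout << "all 300 images processed\n";
}
```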
Pipeline-Based
 Also called the Producer-Consumer Model.
 Data flows through a series of processes arranged in succession.
->How it works:
A single task passes through multiple processes sequentially.
Each process performs its part of the work and then sends the task to the
next process.
->Pipeline Structure:
Processes act as a chain of producers (output generators) and consumers
(input processors).
->Task Mapping:
Uses static mapping, where tasks are assigned to specific processes in the pipeline.
Example
Scenario:
Consider a pipeline for processing and analyzing log
files in a web application. The pipeline involves:
Stage 1: Reading log files (Producer).
Stage 2: Parsing logs to extract relevant information.
Stage 3: Filtering logs for specific events (e.g.,
errors, warnings).
Stage 4: Aggregating statistics (e.g., error counts).
Stage 5: Saving results to a database (Final
Consumer).
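A stripped-down sketch of two adjacent pipeline stages in C++ (a producer pushing log lines and a consumer parsing them; the full pipeline above would chain more stages the same way):

```cpp
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <queue>
#include <string>
#include <thread>

// The boundary between two stages: the producer pushes, the next stage pops.
std::queue<std::string> stage_queue;
std::mutex mtx;
std::condition_variable cv;
bool done = false;

void producer() {                          // Stage 1: "read" log lines
    for (int i = 0; i < 5; ++i) {
        std::lock_guard<std::mutex> lock(mtx);
        stage_queue.push("log line " + std::to_string(i));
        cv.notify_one();
    }
    { std::lock_guard<std::mutex> lock(mtx); done = true; }
    cv.notify_one();
}

void consumer() {                          // Stage 2: parse / filter each line
    while (true) {
        std::unique_lock<std::mutex> lock(mtx);
        cv.wait(lock, [] { return !stage_queue.empty() || done; });
        if (stage_queue.empty()) return;   // producer finished, queue drained
        std::string line = stage_queue.front();
        stage_queue.pop();
        lock.unlock();
        std::cout << "parsed: " << line << '\n';
    }
}

int main() {
    std::thread p(producer), c(consumer);
    p.join();
    c.join();
}
```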
Tools and Frameworks for Parallel Programming
MPI
A standardized library for parallel programming.
Enables applications to use multiple processors or computers to work together.
Designed for systems where each processor has its own private memory.
Communication between processors happens by sending and receiving messages.
 Portability: Works across different hardware architectures and operating systems.
 Scalability: Can scale easily to use many processors.
 Flexibility: Provides a rich set of functions for sending and receiving messages and for complex communication patterns.
Threading Building Blocks (TBB)
 Threading Building Blocks (TBB) is a C++ library developed by Intel to help developers write parallel applications.
 It focuses on task-based parallelism, making it easier to divide work across multiple threads.
 TBB allows work to be divided into smaller independent tasks that can be executed in parallel, without requiring manual management of threads.
 It automatically distributes tasks among available threads, ensuring efficient use of resources.
 TBB scales well across multi-core processors, handling dynamic workloads efficiently.
 Cross-Platform: Works across various operating systems like Windows, Linux, and macOS, supporting multi-core CPUs from different manufacturers.
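A minimal TBB sketch using the index-range overload of tbb::parallel_for (link with -ltbb; the per-element work is illustrative):

```cpp
#include <tbb/parallel_for.h>
#include <vector>
#include <iostream>

int main() {
    std::vector<float> data(1000000, 1.0f);

    // TBB splits the index range into chunks and schedules them as tasks
    // across the available worker threads; no manual thread management.
    tbb::parallel_for(std::size_t(0), data.size(), [&](std::size_t i) {
        data[i] = data[i] * 2.0f + 1.0f;   // illustrative per-element work
    });

    std::cout << data[0] << '\n';   // prints 3
}
```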
CUDA
 CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model created by NVIDIA.
 It allows programmers to use Graphics Processing Units (GPUs) for general-purpose computing.
 Utilizes the power of GPUs to speed up computationally intensive tasks.
 Ideal for tasks like simulations, machine learning, and real-time data processing.
 Integrates with C, C++, and Fortran, making it easy for developers to adopt.
 Can scale efficiently with the increasing number of GPU cores, enhancing performance as hardware improves.
 Manages different types of memory, such as global memory, constant memory, and shared memory, for optimal performance.
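A minimal CUDA C++ sketch (vector addition, compiled with nvcc): each GPU thread computes one output element, with explicit transfers between host memory and global device memory.

```cpp
#include <cuda_runtime.h>
#include <cstdio>

// Kernel: each GPU thread handles one element (same operation, different data).
__global__ void vector_add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    // Host (CPU) data.
    float *h_a = new float[n], *h_b = new float[n], *h_c = new float[n];
    for (int i = 0; i < n; ++i) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    // Global (device) memory, plus explicit CPU <-> GPU transfers.
    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes); cudaMalloc(&d_b, bytes); cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // Launch enough thread blocks to cover all n elements.
    int threads = 256, blocks = (n + threads - 1) / threads;
    vector_add<<<blocks, threads>>>(d_a, d_b, d_c, n);

    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    std::printf("c[0] = %f\n", h_c[0]);   // 3.0

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    delete[] h_a; delete[] h_b; delete[] h_c;
}
```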
Challenges
Synchronization: Ensures that tasks accessing shared resources do so in an orderly manner, avoiding conflicts (e.g., using locks, semaphores).

Load Balancing: Distributes work evenly across processors to ensure no processor is overburdened.

Communication Overhead: Processors in parallel systems often need to exchange data, which can cause communication overhead.

Debugging and Profiling: Debugging parallel systems is complex due to issues like race conditions, deadlocks, and non-deterministic behavior.
Applications
Scientific Simulation: Used in weather forecasting, climate modeling, molecular dynamics, and astrophysics.

Big Data Analysis: Involves processing vast amounts of structured and unstructured data, using technologies like Hadoop.

Machine Learning: Training deep learning models; parallel algorithms distribute tasks across multiple GPUs or CPUs for faster results.

Gaming and Graphics: Real-time rendering in video games relies on parallel processing, especially with GPUs.
Best Practices for Effective Parallel Programming
1. Understand the Problem
 Find which parts of your program can run at the same time.
 Check if using parallel programming will actually save time and effort.
2. Reduce Communication Between Tasks
 Limit how often different parts of the program need to share data.
 Use efficient ways to share information, like shared memory or quick messages.
3. Divide Work Evenly
 Make sure all tasks get a fair share of the work so no processor stays idle.
 Use methods to adjust the workload if some tasks take longer than others.
4. Prevent Conflicts
 Use tools like locks or semaphores to avoid errors when multiple tasks try to
change the same data at the same time.
5. Optimize Data Access
 Keep tasks close to the data they need to access to reduce delays.
 Make use of your computer’s memory cache for faster performance.
6. Choose the Right Tools
 Use tools and libraries that match your needs, such as:
OpenMP for shared memory.
MPI for distributed systems.

7.Test Carefully
 Look for errors in how tasks interact, especially with shared data.
 Try running the program on different computers to ensure it works well everywhere.
8.Measure Performance
 Check how much faster the program runs with parallel programming.
 Find the slow parts and improve them.
9.Plan for Growth
 Design the program so it works well even if more processors or cores are added in the
future.
10. Keep Code Simple
 Write small, reusable pieces of code for parallel tasks.
 This makes debugging and updates much easier.
Future of Parallel Programming
1. Faster and Smarter Devices
 Computers will use different processors like CPUs, GPUs, and specialized
chips together to work faster.
2. Quantum Computing
 New kinds of computers, like quantum computers, will need special parallel
programming methods to solve problems quicker.
3. Artificial Intelligence (AI)
 AI and machine learning will rely even more on parallel programming to train
smarter systems faster.
4. Easier Tools
 Tools and languages will improve to make parallel programming simpler for
everyone.
5. Saving Energy
 Parallel programming will focus on doing tasks faster while using less energy,
especially in big data centers.
6. Edge and IoT Devices
 Small devices like smart sensors and IoT gadgets will use parallel
programming to handle tasks quickly.
7. Powerful Supercomputers
 Parallel programming will power supercomputers that can handle extremely
large and complex calculations.
8. Automation
 Future tools will make it easier to write parallel programs by automatically
dividing tasks between processors.
9. New Applications
 Fields like gaming, virtual reality, blockchain, and augmented reality will
heavily use parallel programming.
In conclusion, parallel programming is the key to making computers faster and more efficient by running many tasks at the same time. It is shaping the future of technology, from AI and supercomputers to gaming, IoT, and even quantum computing. As tools and methods improve, parallel programming will become easier and more powerful, helping solve bigger problems and create smarter systems.
