3. Introduction to Parallelism

The lecture introduces the concept of parallelization, emphasizing its importance in computing efficiency and speed, particularly in processing large datasets. It covers various programming models for parallel computing, including shared and distributed memory systems, and highlights tools such as OpenMP and MPI. The session also discusses performance measures like speedup and efficiency, along with practical examples of parallel code implementation.


Lecture 3

Introduction to Parallelization

August 7, 2023
Logistics
• Office hours: 5 – 6 PM
– Chetna (RM) chetna@cse
– Muzafar (KD-213) muzafarwan@cse
– Vishal Singh (RM-505) vshlsng@cse
• Group formation
– Email by August 9
– Include names, roll numbers, email-ids
– vdeka@cse, chetna@cse

2
Parallelism Everywhere

Source: https://gilmour.com/
Why Parallel?
Task: Find the average age of Indians

India's 2020 population is estimated at 1,380,004,385 people at mid-year, according to UN data.

Time (1 human): > 40 years

Time (1 CPU): 10 s

Time (2 CPUs): 5 s

Time (4 CPUs): 3 s
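Note the trend: 2 CPUs halve the time (10 s to 5 s), but 4 CPUs take 3 s rather than the ideal 2.5 s; the gap between ideal and achieved speedup is quantified later with the speedup and efficiency measures.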
Parallelism
A parallel computer is a collection of processing
elements that communicate and cooperate to solve
large problems fast.

– Almasi and Gottlieb (1989)

5
Why Fast?

6
Basic Computing Unit

[Diagram: a CPU/core connected to memory; photo of an Intel Core i7 processing unit (Courtesy: www.intel.com)]
System – Simplified View

[Diagram: CPU cores, fast memory, main memory, and disk]
8
Multicore Era
• Intel 4004 (1971): single core, single chip
• Cray X-MP (1982): single core, multiple chips
• Hydra (1996): multiple cores, single chip
• IBM POWER4 (2001): multiple cores, multiple chips
9
Moore’s Law
The number of CPU cores per node has increased.

[Photo: Gordon Moore]
[Source: Wikipedia]
Parallel Computing

[Diagram: interconnected compute nodes, each with a disk]
Supercomputer/Cluster/Data Center
Network is the backbone for data communication
Parallel Computer

Compute nodes
Domain Decomposition

Compute nodes
Discretization

Gridded mesh for a global model [Credit: Tompkins, ICTP]

15
Data Bottleneck
[Diagram: compute nodes issue read/write requests to storage; congestion arises at the storage link]

Example: 'Age' is data
Average – Serial vs. Parallel
Serial:
    for i = 1 to N
        sum += a[i]
    avg = sum/N

Parallel (each of P processes):
    for i = 1 to N/P
        sum += a[i]
    collect sums and compute average
17
Parallel Computer

A parallel computer is a collection of processing elements that communicate and cooperate to solve large problems fast.

– Almasi and Gottlieb (1989)

18
Parallel Average
[Diagram: four processes (ranks 0–3), each on its own core with its own memory]

Process 0: for i = 0 to N/P:      sum += a[i]
Process 1: for i = N/P to 2N/P:   sum += a[i]
Process 2: for i = 2N/P to 3N/P:  sum += a[i]
Process 3: for i = 3N/P to N:     sum += a[i]
19
Parallel Code Example

// local computation at every process/thread
for (i = N/P * id ; i < N/P * (id+1) ; i++)
    localsum += a[i]

// collect localsum, add up in one of the ranks, and compute average

20
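A minimal sketch of this pattern as a plain C function (the function name block_sum is illustrative; the worker count P and worker id come from the runtime, and the final collection step is left to the OpenMP and MPI versions shown later):

// partial sum of the block owned by worker 'id' out of 'P' workers
double block_sum (const double *a, int N, int P, int id)
{
    double localsum = 0.0;
    // assumes N is divisible by P; otherwise the last worker also takes the remainder
    for (int i = (N / P) * id; i < (N / P) * (id + 1); i++)
        localsum += a[i];
    return localsum;   // partial results are then combined by one worker
}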
Performance Measure
• Speedup: S_P = Time(1 processor) / Time(P processors)

• Efficiency: E_P = S_P / P

21
Parallel Performance (Parallel Sum)
Parallel efficiency of summing 10^7 doubles

#Processes    Time (sec)    Speedup    Efficiency
1             0.025         1.0        1.00
2             0.013         1.9        0.95
4             0.010         2.5        0.63
8             0.009         2.8        0.35
12            0.007         3.6        0.30

22
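For example, reading the 4-process row of the table: S_4 = Time(1)/Time(4) = 0.025/0.010 = 2.5, and E_4 = S_4/4 = 2.5/4 ≈ 0.63, i.e. only about 63% of the ideal 4x speedup is realised.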
Ideal Speedup
[Plot: speedup vs. number of processors, showing linear, superlinear, and sublinear speedup curves]
23
Scalability Bottleneck

Performance of weather simulation application


24
Programming

25
Parallel Programming Models
Libraries: MPI, TBB, Pthreads, OpenMP, …
New languages: Haskell, X10, Chapel, …
Extensions: Coarray Fortran, UPC, Cilk, OpenCL, …

• Shared memory
– OpenMP, Pthreads, CUDA, …
• Distributed memory
– MPI, UPC, …
• Hybrid
– MPI + OpenMP, MPI + CUDA
26
Sharing Data

27
Parallel Programming Models
Shared memory programming – OpenMP, Pthreads
Distributed memory programming – MPI

Cache
Core
Process/thread
28
Shared Memory Programming
• Shared address space
• Time taken to access certain memory words is longer than others (NUMA: non-uniform memory access)
• Programming paradigms – Pthreads, OpenMP
• Need to worry about concurrent access to shared data (see the sketch after this slide)

Memory
CPU cores

29
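A minimal sketch (not from the slides) of why concurrent access matters: without the atomic directive below, threads could read and modify count at the same time and increments would be lost.

#include <omp.h>
#include <stdio.h>

int main (void)
{
    int count = 0;

    #pragma omp parallel
    {
        // without this directive, count++ is a data race
        #pragma omp atomic
        count++;
    }

    printf ("count = %d (one increment per thread)\n", count);
    return 0;
}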
Threads

From Tim Mattson’s slides 30


OpenMP (Open Multiprocessing)
• Standard for shared memory programming
– Compiler directives
– Runtime routines
– Environment variables
• OpenMP Architecture Review Board
• First released in Nov’97
• Current version 5.1 (Nov’20)

31
OpenMP Example
• Thread-based
• Fork-join model

#pragma omp parallel   // fork: spawn a default number of threads
{

}                      // join

32
OpenMP

$ gcc -fopenmp -o foo foo.c


33
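The source of foo.c is not reproduced in this extract; a minimal sketch of the kind of program compiled above, exercising the fork-join model from the previous slide, might be:

#include <omp.h>
#include <stdio.h>

int main (void)
{
    #pragma omp parallel   // fork: spawn a default number of threads
    {
        printf ("Hello from thread %d of %d\n",
                omp_get_thread_num (), omp_get_num_threads ());
    }                      // join: threads synchronise and only the master continues
    return 0;
}

The environment variable OMP_NUM_THREADS (e.g. OMP_NUM_THREADS=4 ./foo) controls how many threads the parallel region spawns.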
OpenMP

34
OpenMP

35
Output

36
OpenMP – Parallel Sum

Work on distinct data concurrently

37
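The code for this slide is not included in the extract; a minimal sketch of an OpenMP parallel sum, with the array size and placeholder data assumed here, could be:

#include <omp.h>
#include <stdio.h>

#define N 10000000   // 10^7 doubles, as in the performance table earlier

int main (void)
{
    static double a[N];
    double sum = 0.0;

    for (int i = 0; i < N; i++) a[i] = 1.0;   // placeholder data

    // each thread sums a distinct chunk of a[]; the reduction clause gives
    // every thread a private partial sum and combines them at the join
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += a[i];

    printf ("sum = %f, avg = %f\n", sum, sum / N);
    return 0;
}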
OpenMP Timing

double stime = omp_get_wtime();


#pragma omp parallel
{

}
double etime = omp_get_wtime();

38
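omp_get_wtime returns wall-clock time in seconds as a double, so the elapsed time of the parallel region is just the difference of the two readings; reporting it (printf assumed) looks like:

printf ("parallel region took %f seconds\n", etime - stime);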
Multiple Systems

Cache
Core
Process

39
Distributed Memory Systems
64 – 192 GB RAM/node

• Networked systems
• Distributed memory
  • Local memory
  • Remote memory
• Parallel file system

[Diagram: compute nodes networked into a cluster]
40
MPI (Message Passing Interface)
• Standard for message passing in a distributed
memory environment (most widely used
programming model in supercomputers)
• Efforts began in 1991 by Jack Dongarra, Tony
Hey, and David W. Walker
• MPI Forum formed in 1993
– Version 1.0: 1994
– Version 4.0: 2021

41
Process - Distinct Address Space
[Diagram: four processes, each on its own core with its own memory and local data]

42
Multiple Processes on a Single Node

From N. Karanjkar’s slides


43
Multiple Processes on Multiple Nodes

Node 1

Node 2
44
Communication using Messages
[Diagram: four processes, each with its own core, memory, and local data]

Process 0: for i = 0 to N/P:      sum += a[i]
Process 1: for i = N/P to 2N/P:   sum += a[i]
Process 2: for i = 2N/P to 3N/P:  sum += a[i]
Process 3: for i = 3N/P to N:     sum += a[i]

45
Communication using Messages
[Diagram: four processes, each with its own core, memory, and local data]

Each process executes the same instruction stream (Instruction 1, Instruction 2, …) on its own local data: SIMD

46
Message Passing
[Diagram: timeline of a message sent from Process 0 to Process 1]
47
MPI Programming

48
MPI Programming

mpicc -o program.x program.c


49
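The MPI source shown on these slides is not in the extract; a minimal program.c sketch (assumed) that initialises MPI and reports each process's rank would be:

#include <mpi.h>
#include <stdio.h>

int main (int argc, char **argv)
{
    int rank, size;

    MPI_Init (&argc, &argv);                  // start the MPI runtime
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);    // this process's id within the communicator
    MPI_Comm_size (MPI_COMM_WORLD, &size);    // total number of processes

    printf ("Hello from rank %d of %d\n", rank, size);

    MPI_Finalize ();
    return 0;
}

After compiling with mpicc as above, a typical launch is mpirun -np 4 ./program.x.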
Communication using Messages

[Diagram: as on slide 45, four processes each sum their own chunk of the data]

// local computation at every process
for (i = N/P * rank ; i < N/P * (rank+1) ; i++)
    localsum += a[i]
Collect localsum, add up at one of the ranks
50
Communication using Messages

[Diagram: processes on separate cores with separate memories exchanging messages]

51
Simplest Communication Primitives

• MPI_Send
• MPI_Recv

52
MPI Programming

// SENDER
int MPI_Send (const void *buf, int count, MPI_Datatype datatype,
              int dest, int tag, MPI_Comm comm)

// RECEIVER
int MPI_Recv (void *buf, int count, MPI_Datatype datatype,
              int source, int tag, MPI_Comm comm, MPI_Status *status)
53
MPI Programming
MPI_Comm_rank (MPI_COMM_WORLD, &myrank);

// Sender process
if (myrank == 0) /* code for process 0 */
{
strcpy (message,"Hello, there");
MPI_Send (message, strlen(message)+1, MPI_CHAR, 1, 99,
MPI_COMM_WORLD);
}

// Receiver process
else if (myrank == 1) /* code for process 1 */
{
MPI_Recv (message, 20, MPI_CHAR, 0, 99, MPI_COMM_WORLD,
&status);
printf ("received :%s\n", message);
} 54
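The fragment above assumes a few declarations and MPI setup that are not on the slide; a sketch of the surrounding skeleton would be:

#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main (int argc, char **argv)
{
    char message[20];
    int myrank;
    MPI_Status status;

    MPI_Init (&argc, &argv);

    /* ... rank query and send/receive code from the slide above ... */

    MPI_Finalize ();
    return 0;
}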
MPI – Parallel Sum
Assume the data array resides in the memory of process 0 initially
MPI_Comm_rank (MPI_COMM_WORLD, &myrank);
MPI_Comm_size (MPI_COMM_WORLD, &size);

// Sender process
if (myrank == 0) /* code for process 0 */
{
    for (int rank = 1; rank < size; rank++) {
        start = rank * (N / size);   // offset in elements, not bytes
        MPI_Send (data + start, N/size, MPI_INT, rank, 99, MPI_COMM_WORLD);
    }
}
else /* code for processes 1 … size-1 */
{
    MPI_Recv (data, N/size, MPI_INT, 0, 99, MPI_COMM_WORLD, &status);
}
55
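The slide only distributes the chunks; a sketch of the remaining steps ("collect localsum, add up at one of the ranks"), using just MPI_Send and MPI_Recv with an assumed tag of 100 and a long accumulator:

// every rank (including 0, which keeps the first chunk) sums its local chunk
long localsum = 0;
for (int i = 0; i < N/size; i++)
    localsum += data[i];

if (myrank == 0) {
    long total = localsum, partial;
    for (int rank = 1; rank < size; rank++) {
        MPI_Recv (&partial, 1, MPI_LONG, rank, 100, MPI_COMM_WORLD, &status);
        total += partial;
    }
    printf ("average = %f\n", (double) total / N);
} else {
    MPI_Send (&localsum, 1, MPI_LONG, 0, 100, MPI_COMM_WORLD);
}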
MPI Timing
double stime = MPI_Wtime();



double etime = MPI_Wtime();

56
Parallelization

57
Interpolation

58
Interpolation

59
Range/Value Query

Input: File
Output: File

60
Query on a Million Processes

Compute nodes
Unstructured Mesh

Source: COMSOL
62
Unstructured Mesh

Obayashi et al., Multi-objective Design Exploration Using Efficient Global Optimization


63
Thank You

64
