
Parallel Computing (CS 633)

January 8, 2024

Preeti Malakar
[email protected]
Logistics
• Class hours: MW 3:30 – 4:45 PM (RM 101)
• Office hour: MTW 5:00 – 5:30 PM (KD 221)
• https://www.cse.iitk.ac.in/users/cs633/2023-24-2
– Lectures will be uploaded after every class
• Announcements/uploads on
– MooKIT
– Course email alias
• Email to the instructor should always be prefixed with [CS633] in the subject
Grading Policy
Participate actively in class


Switch OFF All Devices
Assignments

• Programming assignments in C
• In a group (group size = 3)
– Send group member information by Jan 14 to
{gsarkar,madhavm}@cse.iitk.ac.in
– Clearly include names, roll numbers, and IITK email IDs
– Use the email subject [CS633 Group]
– Changes in group composition are not allowed
• The mode of submission will be explained in due course
Assignments

• Credit for early submissions (+5 / day)
– Max credit: +15 / assignment
– Only the final submission date will be considered
• Score reduction for late submissions (-3 / day)
– Max 2 late days / assignment
• None of the assignments can be completed in a day!

Plagiarism will NOT be tolerated

Use of AI tools is NOT allowed
Lecture 1

Introduction
Multicore Era
CPU timeline:
• Intel 4004 (1971): single core, single chip
• Cray X-MP (1982): single core, multiple chips
• Hydra (2000): multiple cores, single chip
• IBM POWER4 (2001): multiple cores, multiple chips
Moore’s Law (1965)
Number of transistors in a chip doubles every 18 months

[Source: Wikipedia]

“However, it must be programmed with a more complicated parallel programming model to obtain maximum performance.”
Trends

[Source: M. Frans Kaashoek, MIT]


top500.org (Nov’23)

~ $600 million
~ 7300 sq. ft.
~ 22 MW power
~ 23000 L water

green500.org (Nov’23)

Metric of interest: Performance per Watt
Top #1 Supercomputer

https://www.top500.org/resources/top-systems/
Making of a Supercomputer

Source: energy.gov
Greenest Data Centre?

Source: MIT TR 06/19

“The 149,000 square foot facility built on a hillside overlooking the UC Berkeley campus and San Francisco Bay will house one of the most energy-efficient computing centers anywhere, tapping into the region’s mild climate to cool the supercomputers at the National Energy Research Scientific Computing Center (NERSC) and eliminating the need for mechanical cooling.”

https://www.science.org/content/article/climate-change-threatens-supercomputers
Top Supercomputers from India


Supercomputing in India [topsc.cdacb.in, Jul’23]

Source: www.iitk.ac.in
Credit: Ashish Kuvelkar, CDAC

National Supercomputing Mission Sites

Big Compute
Massively Parallel Codes

Climate simulation of Earth [Credit: NASA]


Discretization

Gridded mesh for a global model [Credit: Tompkins, ICTP]

Numerical Weather Models

• Use numerical methods to solve the equations that govern atmospheric processes (see the toy sketch below)
• Are based on fluid dynamics and depend on observations of meteorological variables
• Are used to obtain nowcasts/forecasts
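As a toy illustration of what "numerical methods on a gridded mesh" means (a deliberately simplified sketch, not a weather code): an explicit finite-difference step for the 1D heat equation u_t = alpha * u_xx. Real models apply the same discretize-and-step idea to far larger systems of equations in 3D. All constants here are made up for the example.

#include <stdio.h>

/* Explicit finite-difference scheme for u_t = alpha * u_xx on a 1D grid. */
#define NX 100

int main(void) {
    double u[NX], u_new[NX];
    double alpha = 0.1, dx = 1.0 / (NX - 1);
    double dt = 0.4 * dx * dx / alpha;      /* satisfies the stability limit */

    for (int i = 0; i < NX; i++)            /* initial condition: a hot spot */
        u[i] = (i == NX / 2) ? 1.0 : 0.0;

    for (int step = 0; step < 1000; step++) {
        for (int i = 1; i < NX - 1; i++)    /* update each interior grid cell */
            u_new[i] = u[i] + alpha * dt / (dx * dx)
                             * (u[i-1] - 2.0 * u[i] + u[i+1]);
        u_new[0] = u_new[NX-1] = 0.0;       /* fixed boundary values */
        for (int i = 0; i < NX; i++)
            u[i] = u_new[i];
    }
    printf("u at centre after 1000 steps: %f\n", u[NX/2]);
    return 0;
}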
Massively Parallel Simulations

Self-healing material simulation


[Nomura et al., “Nanocarbon synthesis by high-temperature oxidation of nanoparticles”, Scientific Reports, 2016]

Massively Parallel Analysis

[Nomura et al., “Nanocarbon synthesis by high-temperature oxidation of nanoparticles”, Scientific Reports, 2016]
Massively Parallel Codes

Cosmological simulation [Credit: ANL]



Massively Parallel Analysis
Virgo Consortium

Computational Science

[Source: Culler, Singh and Gupta]


Big Data

34
Output Data
• High-energy physics: 10 PB / year (Higgs boson simulation; Source: CERN)
• Cosmology: 2 PB / simulation, scaled to 786K cores on Mira (Q Continuum simulation; Source: Salman Habib et al.)
• Climate/weather: 240 TB / simulation (Hurricane simulation; Source: NASA)
Input Data

[Credit: World Meteorological Organization]



System Architecture Trends

[Credit: Pavan Balaji@ATPESC’17]


I/O trends

NERSC I/O trends [Credit: www.nersc.gov]


Compute vs. I/O trends
[Chart: I/O vs. FLOPS for the #1 supercomputer in the Top500 list, 1997–2018; the Byte/FLOP ratio falls from roughly 1e-3 to 1e-6]
Why Parallel?

A*: 20 hours → 2 hours?
Not really.
Parallelism
A parallel computer is a collection of processing
elements that communicate and cooperate to solve
large problems fast.

– Almasi and Gottlieb (1989)

Speedup
Example – Sum of squares of N numbers
Serial:
  for i = 1 to N
    sum += a[i] * a[i]
  Cost: O(N)

Parallel (P processes):
  for i = 1 to N/P
    sum += a[i] * a[i]
  collate result
  Cost: O(N/P) + communication time
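A minimal sketch of the parallel version in C with MPI (an illustration only: it assumes N is divisible by P, fills each rank's block with dummy values instead of real data, and uses MPI_Reduce as the "collate result" step):

#include <mpi.h>
#include <stdio.h>

/* Each rank sums the squares of its local N/P elements, then
 * MPI_Reduce collates the partial sums on rank 0. */
int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);
    int rank, P;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &P);

    const int N = 1 << 20;            /* assumption: N divisible by P */
    double local_sum = 0.0, total = 0.0;

    for (int i = 0; i < N / P; i++) { /* O(N/P) local work */
        double a_i = 1.0;             /* dummy value; real code reads data */
        local_sum += a_i * a_i;
    }

    /* "collate result": the communication step */
    MPI_Reduce(&local_sum, &total, 1, MPI_DOUBLE, MPI_SUM, 0,
               MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum of squares = %.1f\n", total);
    MPI_Finalize();
    return 0;
}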
Performance Measure
• Speedup: S_P = Time(1 processor) / Time(P processors)
• Efficiency: E_P = S_P / P
Parallel Performance (Parallel Sum)
Parallel efficiency of summing 10^7 doubles:

#Processes  Time (sec)  Speedup  Efficiency
    1         0.025       1.0      1.00
    2         0.013       1.9      0.95
    4         0.010       2.5      0.63
    8         0.009       2.8      0.35
   12         0.007       3.6      0.30
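As a worked check of the definitions on the previous slide, take the P = 4 row: S_4 = Time(1) / Time(4) = 0.025 / 0.010 = 2.5, and E_4 = S_4 / 4 = 2.5 / 4 ≈ 0.63. Efficiency drops as P grows because the communication cost becomes a larger share of the total time.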
Ideal Speedup
[Plot: Speedup vs. Processors, showing linear (ideal), superlinear, and sublinear curves]
Issue – Scalability

[Source: M. Frans Kaashoek, MIT]


Scalability Bottleneck

Performance of a weather simulation application
Parallelism
A parallel computer is a collection of processing
elements that communicate and cooperate to solve
large problems fast.

– Almasi and Gottlieb (1989)

Distributed Memory Systems

• Networked systems (a cluster of nodes)
• Distributed memory
  – Local memory
  – Remote memory
• Parallel file system

[Diagram: a cluster of nodes]
Parallel Programming Models
Libraries: MPI, TBB, Pthreads, OpenMP, …
New languages: Haskell, X10, Chapel, …
Extensions: Coarray Fortran, UPC, Cilk, OpenCL, …

• Shared memory
– OpenMP, Pthreads, …
• Distributed memory
– MPI, UPC, …
• Hybrid
– MPI + OpenMP (see the sketch below)
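A minimal hybrid sketch in C (illustrative only): MPI ranks provide the distributed-memory layer across nodes, and OpenMP threads share memory within each rank. Typically built with an MPI compiler wrapper plus OpenMP, e.g. mpicc -fopenmp.

#include <mpi.h>
#include <omp.h>
#include <stdio.h>

/* Hybrid MPI + OpenMP: MPI ranks across nodes (distributed memory),
 * several OpenMP threads within each rank (shared memory). */
int main(int argc, char *argv[]) {
    int provided, rank;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    #pragma omp parallel
    {
        printf("rank %d, thread %d of %d\n",
               rank, omp_get_thread_num(), omp_get_num_threads());
    }

    MPI_Finalize();
    return 0;
}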
This course …


Large-scale Parallel Computing
• Message passing
• Parallel algorithms
• Designing parallel codes
• Performance analysis
Message Passing Paradigm

• Point-to-point (P2P) communications (see the sketch below)
• Collective communications
• Algorithms
• Performance
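A minimal sketch of the two communication styles in C with MPI (illustrative only; run with at least two ranks, e.g. mpirun -np 4): rank 0 first sends a value to rank 1 point-to-point, then broadcasts it to every rank with a collective call.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, value = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Point-to-point: rank 0 -> rank 1 */
    if (rank == 0) {
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }

    /* Collective: rank 0 broadcasts to every rank in the communicator */
    MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);
    printf("rank %d has value %d\n", rank, value);

    MPI_Finalize();
    return 0;
}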
Profiling

Parallel I/O
[Diagram: compute node racks connect to bridge nodes over 2 GB/s links (not shared), and the bridge nodes reach the I/O nodes and GPFS filesystem over a shared 4 GB/s IB network; the compute-to-I/O node ratio is 128:1]
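To make the shared parallel filesystem concrete, a minimal MPI-IO sketch (an illustration, not the course's prescribed method; the filename out.dat is made up): every rank writes its own block of one shared file at a rank-dependent offset, so no two ranks overlap.

#include <mpi.h>

/* Each rank writes one integer into a single shared file on the
 * parallel filesystem, at an offset derived from its rank. */
int main(int argc, char *argv[]) {
    int rank;
    MPI_File fh;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int data = rank;  /* stand-in for a rank's real output block */
    MPI_File_open(MPI_COMM_WORLD, "out.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY,
                  MPI_INFO_NULL, &fh);
    MPI_File_write_at(fh, (MPI_Offset)rank * sizeof(int),
                      &data, 1, MPI_INT, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    MPI_Finalize();
    return 0;
}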
Job Scheduling

[Source: Wikipedia]

[Chart: nodes, users, and jobs over time; an example of real supercomputer activity (Argonne National Laboratory Theta jobs)]
Supercomputer Activity

Reference Material

• D. E. Culler, J. P. Singh, and A. Gupta, Parallel Computer Architecture: A Hardware/Software Approach. Morgan Kaufmann, 1998.
• A. Grama, A. Gupta, G. Karypis, and V. Kumar, Introduction to Parallel Computing. 2nd Ed., Addison-Wesley, 2003.
• M. Snir, S. W. Otto, S. Huss-Lederman, D. W. Walker, and J. Dongarra, MPI: The Complete Reference, Second Edition, Volume 1, The MPI Core. The MIT Press, 1998.
• W. Gropp, Using MPI, Third Edition. The MIT Press, 2014.
• Research papers