
Lecture 1:

Why Parallelism?

CMU 15-418: Parallel Computer Architecture and Programming (Spring 2012)


One common definition

A parallel computer is a collection of processing elements
that cooperate to solve problems fast

▪ We care about performance*
▪ We’re going to use multiple processors to get it

* Note: different motivation from “concurrent programming” using pthreads in 15-213


DEMO 1
(15-418 Spring 2012’s first parallel program)



Speedup
One major motivation for using parallel processing: to achieve a speedup

For a fixed problem size:

Speedup(P processors) = Time(1 processor) / Time(P processors)
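
For example (hypothetical numbers, not from the demos): if the 1-processor run takes 12 s and the 4-processor run of the same problem takes 4 s, then Speedup(4 processors) = 12 / 4 = 3x, short of the ideal 4x.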



Class observations from demo 1

▪ Communication limited the maximum speedup achieved

▪ Minimizing the cost of communication improved speedup


- Moved students (“processors”) closer together (or let them shout)



DEMO 2
(scaling up to four processors)



Class observations from demo 2

▪ Imbalance in work assignment limited speedup
- Some processors ran out of work to do (went idle) while others were still working

▪ Improving the distribution of work improved speedup
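
In code, the same idea shows up as loop scheduling. A minimal sketch (assuming an OpenMP-capable C compiler; this is an illustrative example, not code from the lecture): when iterations have very different costs, a static block assignment leaves some threads idle, while a dynamic schedule hands out work on demand and keeps them busy.

/* Hypothetical example (not course code): iterations have very uneven cost.
   schedule(static) would give each thread one contiguous block, so the threads
   holding the cheap iterations finish early and sit idle; schedule(dynamic)
   hands out small chunks on demand, which balances the load. */
#include <stdio.h>
#include <omp.h>

static double fake_work(int i) {            /* stand-in task whose cost grows with i */
    double s = 0.0;
    for (int j = 0; j < i * 1000; j++) s += 1e-6;
    return s;
}

int main(void) {
    const int n = 2048;
    double total = 0.0;
    double start = omp_get_wtime();
    #pragma omp parallel for schedule(dynamic, 16) reduction(+:total)
    for (int i = 0; i < n; i++)
        total += fake_work(i);
    printf("total = %f, time = %.3f s\n", total, omp_get_wtime() - start);
    return 0;
}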



DEMO 3
(massively parallel execution)



Class observations from demo 3

▪ The problem I just gave you has a significant amount of communication compared to computation

▪ Communication costs can dominate a parallel computation, severely limiting speedup



Course theme 1:
Designing and writing parallel programs ... that scale!

▪ Parallel thinking
1. Decomposing work into parallel pieces
2. Assigning work to processors
3. Orchestrating communication/synchronization

▪ Abstractions for performing the above tasks
- Writing code in popular parallel programming languages
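
As a concrete illustration of the three steps above, here is a minimal sketch in C with pthreads (the library mentioned from 15-213; an illustrative example, not code from the course): the sum of an array is decomposed into contiguous pieces, each piece is assigned to one thread, and joining the threads and combining their partial sums is the orchestration.

/* Minimal illustrative sketch (assumes POSIX threads; not course code).
   Decomposition:  split the N array elements into P contiguous pieces.
   Assignment:     piece p is given to thread p.
   Orchestration:  pthread_join waits for every thread, then the partial
                   sums are combined into the final result. */
#include <pthread.h>
#include <stdio.h>

#define N 1000000
#define P 4

static float x[N];

typedef struct { int begin, end; double partial; } piece_t;

static void *sum_piece(void *arg) {
    piece_t *t = (piece_t *)arg;
    double s = 0.0;
    for (int i = t->begin; i < t->end; i++) s += x[i];
    t->partial = s;
    return NULL;
}

int main(void) {
    for (int i = 0; i < N; i++) x[i] = 1.0f;   /* dummy data */

    pthread_t threads[P];
    piece_t pieces[P];
    int chunk = N / P;
    for (int p = 0; p < P; p++) {
        pieces[p].begin = p * chunk;
        pieces[p].end   = (p == P - 1) ? N : (p + 1) * chunk;
        pthread_create(&threads[p], NULL, sum_piece, &pieces[p]);
    }

    double sum = 0.0;
    for (int p = 0; p < P; p++) {
        pthread_join(threads[p], NULL);        /* wait, then combine */
        sum += pieces[p].partial;
    }
    printf("sum = %f\n", sum);
    return 0;
}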



Course theme 2:
Parallel computer hardware implementation: how parallel
computers work

▪ Mechanisms used to implement abstractions efficiently
- Performance characteristics of implementations
- Design trade-offs: performance vs. convenience vs. cost

▪ Why do I need to know about HW?
- Because the characteristics of the machine really matter
  (recall speed of communication issues in class demos)
- Because you care about performance (you are writing parallel programs)



Course theme 3:
Thinking about efficiency
▪ FAST != EFFICIENT

▪ Just because your program runs faster on a parallel computer, that
doesn’t mean it is using the hardware efficiently
- Is 2x speedup on 10 processors a good result?

▪ Programmer’s perspective: make use of provided machine capabilities

▪ HW designer’s perspective: choosing the right capabilities to put in the
system (performance/cost, cost = silicon area?, power?, etc.)



Logistics



Logistics
▪ Kayvon’s office hours
- Tues/Thurs 1:30-2:30 PM (right after class)
- GHC 7005

▪ TAs
- Michael Papamichael
- Mike Mu

▪ Textbook
- Culler and Singh, Parallel Computer Architecture: A Hardware/Software Approach
- Yes, it’s old. But many parts are still very good.



Logistics: assignments
▪ Four programming assignments
- First assignment individual, the rest are in pairs
- Each in a different parallel programming environment

- Assignment 1: ISPC programming on Intel quad-core CPU
- Assignment 2: OpenCL programming on NVIDIA GPUs
- Assignment 3: OpenMP programming on supercomputing cluster
- Assignment 4: MPI programming on supercomputing cluster



Logistics: final project
▪ 6-week final project
▪ Done in pairs

▪ Announcing: the first annual 418 parallelism competition!
- Non-CMU judges (from Intel, NVIDIA, etc.)
- Expect non-trivial prizes... (e.g., high-end GPUs, tablets)



Logistics: grades

40% assignments
30% exams
25% project
5% class participation



Why parallelism?



Why parallelism?
▪ The answer 10 years ago
- To get performance that was faster than what clock frequency scaling would provide
- Because if you just waited until next year, your code would run faster on the next-generation CPU

▪ Parallelizing your code was not always worth the time
- Do nothing: performance doubling ~ every 18 months



End of frequency scaling



Power wall
P = CV²F
P: power
C: capacitance
V: voltage
F: frequency

▪ Higher frequencies typically require higher voltages
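- As an illustration (made-up numbers, not from the slide): raising the clock 1.5x while also raising core voltage 1.25x increases dynamic power by 1.25² × 1.5 ≈ 2.3x, for at most a 1.5x performance gain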



Power vs. core voltage
[Figure: power vs. core voltage, Pentium M]

Credit: Shimin Chin


Programmer-invisible parallelism
▪ Bit-level parallelism
- 16-bit → 32-bit → 64-bit

▪ Instruction-level parallelism (ILP)
- Two instructions that are independent can be executed simultaneously
- “Superscalar” execution



ILP example
a"="(x*x"+"y*y"+"z*z)
ILP = 3 x*x y*y z*z

ILP = 1 +

ILP = 1 +
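
The same dependence structure written out as straight-line C (an illustrative sketch, not from the slide); a superscalar processor can issue the three independent multiplies together, but the adds form a chain:

#include <stdio.h>

int main(void) {
    float x = 1.0f, y = 2.0f, z = 3.0f;

    float xx = x * x;      /* these three multiplies have no   */
    float yy = y * y;      /* dependences on one another, so   */
    float zz = z * z;      /* they can execute in parallel     */

    float t = xx + yy;     /* must wait for xx and yy */
    float a = t + zz;      /* must wait for t         */

    printf("a = %f\n", a);
    return 0;
}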



ILP scaling
[Figure: speedup vs. instruction issue capability (0–16); the speedup axis runs 0–3]



Single core performance scaling
The rate of single-thread performance scaling has decreased (essentially to 0)

1. Frequency scaling limited by power
2. ILP scaling tapped out

No more free lunch for software developers!



Why parallelism?
▪ The answer 10 years ago
- To get performance that was faster than what clock frequency scaling would provide
- Because if you just waited until next year, your code would run faster on the next-generation CPU

▪ The answer today:
- Because it is the only way to achieve significantly higher application performance for the foreseeable future



Intel Sandy Bridge (2011)
▪ Quad core CPU + GPU



NVIDIA Fermi GPU (2009)
▪ 16 processing cores



Mobile processing
▪ Power limits heavily influencing designs

- Apple A5 (in iPhone 4S and iPad 2): dual-core CPU + GPU + image processor and more
- NVIDIA Tegra: quad-core CPU + GPU + image processor...



Supercomputing
▪ Today: clusters of CPUs + GPUs
▪ Pittsburgh Supercomputing Center: Blacklight
▪ 512 eight-core Intel Xeon processors
- 4096 total cores



Summary (what we learned)
▪ Single thread performance scaling has ended
- To run faster, you will need to use multiple processing elements
- Which means you need to know how to write parallel code

▪ Writing parallel programs can be challenging
- Problem partitioning, communication, synchronization
- Knowledge of machine characteristics is important

