
CS4961 Parallel Programming: Course Details

This document summarizes a parallel programming course: 1) The course meets on Tuesdays and Thursdays from 9:10-10:30 AM in room WEB L112. 2) The instructor is Mary Hall and the TA is Sriram Aananthakrishnan; the textbook is "Principles of Parallel Programming" by Calvin Lin and Lawrence Snyder, supplemented by other readings. 3) Today's lecture gives an overview of the course, the important problems that require parallel computing, and the architectures that will be covered, such as multi-cores and GPUs. It also discusses developing high-performance parallel applications and basic parallel and distributed computing concepts.


CS4961 Parallel Programming
Lecture 1: Introduction
Mary Hall
August 25, 2009

Course Details
• Time and Location: TuTh, 9:10-10:30 AM, WEB L112
• Course Website
  - http://www.eng.utah.edu/~cs4961/
• Instructor: Mary Hall, [email protected], http://www.cs.utah.edu/~mhall/
  - Office Hours: Tu 10:45-11:15 AM; Wed 11:00-11:30 AM
• TA: Sriram Aananthakrishnan, [email protected]
  - Office Hours: TBD
• Textbook
  - “Principles of Parallel Programming,” Calvin Lin and Lawrence Snyder.
  - Also, readings and notes provided for MPI, CUDA, Locality and Parallel Algorithms.

Today’s Lecture
• Overview of course (done)
• Important problems require powerful computers …
  - … and powerful computers must be parallel.
  - Increasing importance of educating parallel programmers (you!)
• What sorts of architectures in this class
  - Multimedia extensions, multi-cores, GPUs, networked clusters
• Developing high-performance parallel applications
  - An optimization perspective

Outline
• Logistics
• Introduction
• Technology Drivers for Multi-Core Paradigm Shift
• Origins of Parallel Programming: Large-scale scientific simulations
• The fastest computer in the world today
• Why writing fast parallel programs is hard

Some material for this lecture drawn from:
Kathy Yelick and Jim Demmel, UC Berkeley
Quentin Stout, University of Michigan (see http://www.eecs.umich.edu/~qstout/parallel.html)
Top 500 list (http://www.top500.org)


Course Objectives
• Learn how to program parallel processors and systems
  - Learn how to think in parallel and write correct parallel programs
  - Achieve performance and scalability through understanding of architecture and software mapping
• Significant hands-on programming experience
  - Develop real applications on real hardware
• Discuss the current parallel computing context
  - What are the drivers that make this course timely
  - Contemporary programming models and architectures, and where the field is going

Parallel and Distributed Computing
• Parallel computing (processing):
  - the use of two or more processors (computers), usually within a single system, working simultaneously to solve a single problem.
• Distributed computing (processing):
  - any computing that involves multiple computers remote from each other, each of which has a role in a computation problem or information processing.
• Parallel programming:
  - the human process of developing programs that express what computations should be executed in parallel.

Why is Parallel Programming Important Now?
• All computers are now parallel computers (embedded, commodity, supercomputer)
  - On-chip architectures look like parallel computers
  - Languages, software development and compilation strategies originally developed for the high end (supercomputers) are now becoming important for many other domains
• Why?
  - Technology trends
• Looking to the future
  - Parallel computing for the masses demands better parallel programming paradigms
  - And more people who are trained in writing parallel programs (possibly you!)
  - How to put all these vast machine resources to the best use!

Detour: Technology as Driver for “Multi-Core” Paradigm Shift
• Do you know why most computers sold today are parallel computers?
• Let’s talk about the technology trends


Technology Trends: Microprocessor Capacity
[Figure: serial performance trends; transistor count still rising, clock speed flattening sharply. Slide source: Maurice Herlihy]
Moore’s Law:
Gordon Moore (co-founder of Intel) predicted in 1965 that the transistor density of semiconductor chips would double roughly every 18 months.

Technology Trends: Power Density Limits

The Multi-Core Paradigm Shift
What to do with all these transistors?
• Key ideas:
  - Movement away from increasingly complex processor design and faster clocks
  - Replicated functionality (i.e., parallel) is simpler to design
  - Resources more efficiently utilized
  - Huge power management advantages
All Computers are Parallel Computers.

Proof of Significance: Popular Press
• This week’s issue of Newsweek!
• Article on 25 things “smart people” should know
• See http://www.newsweek.com/id/212142


Scientific Simulation: The Third Pillar of Science
• Traditional scientific and engineering paradigm:
  1) Do theory or paper design.
  2) Perform experiments or build system.
• Limitations:
  - Too difficult -- build large wind tunnels.
  - Too expensive -- build a throw-away passenger jet.
  - Too slow -- wait for climate or galactic evolution.
  - Too dangerous -- weapons, drug design, climate experimentation.
• Computational science paradigm:
  3) Use high performance computer systems to simulate the phenomenon
  - Based on known physical laws and efficient numerical methods.

The quest for increasingly more powerful machines
• Scientific simulation will continue to push on system requirements:
  - To increase the precision of the result
  - To get to an answer sooner (e.g., climate modeling, disaster modeling)
• The U.S. will continue to acquire systems of increasing scale
  - For the above reasons
  - And to maintain competitiveness

A Similar Phenomenon in Commodity Systems
• More capabilities in software
• Integration across software
• Faster response
• More realistic graphics
• …

The fastest computer in the world today
• What is its name? RoadRunner
• Where is it located? Los Alamos National Laboratory
• How many processors does it have? ~19,000 processor chips (~129,600 “processors”)
• What kind of processors? AMD Opterons and IBM Cell/BE (in PlayStations)
• How fast is it? 1.105 Petaflop/second
  - More than one quadrillion (1 x 10^15) operations per second
• See http://www.top500.org


Example: Global Climate Modeling Problem
• Problem is to compute:
  f(latitude, longitude, elevation, time) → temperature, pressure, humidity, wind velocity
• Approach:
  - Discretize the domain, e.g., a measurement point every 10 km
  - Devise an algorithm to predict weather at time t+δt given t
• Uses:
  - Predict major events, e.g., El Nino
  - Use in setting air emissions standards
Source: http://www.epm.ornl.gov/chammp/chammp.html

High Resolution Climate Modeling on NERSC-3 – P. Duffy, et al., LLNL
[Figure: high-resolution climate model output.]
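To make the discretization concrete, here is a minimal C sketch, not from the slides, of one way to represent the state f(latitude, longitude, elevation, time) as one record per grid cell and to estimate how many cells a 10 km horizontal spacing implies. The struct layout, the 30 vertical levels, and the memory estimate are illustrative assumptions, not course material.

/* Illustrative sketch: one record per grid cell of the climate state,
 * plus a back-of-the-envelope cell count for 10 km horizontal spacing. */
#include <stdio.h>

typedef struct {
    double temperature;   /* K */
    double pressure;      /* Pa */
    double humidity;      /* kg water vapor per kg air */
    double wind[3];       /* m/s: east, north, vertical components */
} cell_state;

int main(void) {
    const double pi = 3.14159265358979;
    const double earth_radius_km = 6371.0;
    const double spacing_km = 10.0;   /* one measurement point every 10 km */
    const int vertical_levels = 30;   /* assumed number of elevation layers */

    double surface_km2 = 4.0 * pi * earth_radius_km * earth_radius_km;
    double horiz_cells = surface_km2 / (spacing_km * spacing_km);
    double total_cells = horiz_cells * vertical_levels;

    printf("~%.2e horizontal cells, ~%.2e total cells\n", horiz_cells, total_cells);
    printf("state per time step: roughly %.1f GB\n",
           total_cells * (double)sizeof(cell_state) / 1e9);
    return 0;
}

With these assumed numbers the state alone is several gigabytes per time step, which is one reason such simulations need parallel machines.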

Some Characteristics of Scientific Simulation
• Discretize physical or conceptual space into a grid
  - Simpler if regular, may be more representative if adaptive
• Perform local computations on grid
  - Given yesterday’s temperature and weather pattern, what is today’s expected temperature?
• Communicate partial results between grids
  - Contribute local weather result to understand global weather pattern.
• Repeat for a set of time steps
• Possibly perform other calculations with results
  - Given weather model, what area should evacuate for a hurricane?

Example of Discretizing a Domain
[Figure: a domain split into blocks of grid cells; one processor computes one part while another processor computes another part in parallel.]
Processors in adjacent blocks in the grid communicate their result.
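The compute-then-communicate loop described above fits in a few lines of code. The following is a minimal sketch using MPI (one of the models listed later in this lecture), assuming a 1-D grid split into equal blocks; the block size, step count, and averaging update rule are illustrative choices, not the course's actual assignment code.

/* Illustrative sketch: 1-D "temperature" grid split across MPI processes.
 * Each process updates its own block; neighbors exchange boundary (halo)
 * values every time step. */
#include <mpi.h>
#include <stdio.h>

#define LOCAL_N 1000   /* grid points owned by each process */
#define STEPS   100    /* number of time steps */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* u[0] and u[LOCAL_N+1] are halo cells holding neighbors' boundary values */
    double u[LOCAL_N + 2], unew[LOCAL_N + 2];
    for (int i = 0; i < LOCAL_N + 2; i++) u[i] = (double)rank;

    int left  = (rank == 0)        ? MPI_PROC_NULL : rank - 1;
    int right = (rank == size - 1) ? MPI_PROC_NULL : rank + 1;

    for (int t = 0; t < STEPS; t++) {
        /* Communicate partial results: exchange boundary cells with neighbors */
        MPI_Sendrecv(&u[1], 1, MPI_DOUBLE, left, 0,
                     &u[LOCAL_N + 1], 1, MPI_DOUBLE, right, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(&u[LOCAL_N], 1, MPI_DOUBLE, right, 1,
                     &u[0], 1, MPI_DOUBLE, left, 1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        /* Local computation: simple diffusion-like averaging update */
        for (int i = 1; i <= LOCAL_N; i++)
            unew[i] = 0.25 * u[i - 1] + 0.5 * u[i] + 0.25 * u[i + 1];
        for (int i = 1; i <= LOCAL_N; i++)
            u[i] = unew[i];
    }

    if (rank == 0) printf("done after %d steps on %d processes\n", STEPS, size);
    MPI_Finalize();
    return 0;
}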


Parallel Programming Complexity
An Analogy to Preparing Thanksgiving Dinner
• Enough parallelism? (Amdahl’s Law)
  - Suppose you want to just serve turkey
• Granularity
  - How frequently must each assistant report to the chef
  - After each stroke of a knife? Each step of a recipe? Each dish completed?
• Locality
  - Grab the spices one at a time, or collect the ones that are needed prior to starting a dish?
• Load balance
  - Each assistant gets a dish? Preparing stuffing vs. cooking green beans?
• Coordination and Synchronization
  - Person chopping onions for stuffing can also supply green beans
  - Start pie after turkey is out of the oven
All of these things make parallel programming even harder than sequential programming.

Finding Enough Parallelism
• Suppose only part of an application seems parallel
• Amdahl’s law
  - let s be the fraction of work done sequentially, so (1-s) is the fraction parallelizable
  - P = number of processors
  - Speedup(P) = Time(1)/Time(P) <= 1/(s + (1-s)/P) <= 1/s
• Even if the parallel part speeds up perfectly, performance is limited by the sequential part
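As a quick worked example of the bound above, here is a minimal C sketch, not from the slides, that evaluates Speedup(P) <= 1/(s + (1-s)/P) for an assumed sequential fraction s = 0.10: with 10% of the work sequential, no number of processors can deliver more than a 10x speedup.

/* Illustrative sketch: Amdahl's Law speedup bound for s = 10% sequential work. */
#include <stdio.h>

/* Upper bound on speedup with P processors when fraction s is sequential. */
double amdahl_speedup(double s, int P) {
    return 1.0 / (s + (1.0 - s) / P);
}

int main(void) {
    double s = 0.10;                          /* assumed sequential fraction */
    int procs[] = {1, 2, 8, 64, 1024};
    for (int i = 0; i < 5; i++)
        printf("P = %4d  speedup <= %.2f\n", procs[i], amdahl_speedup(s, procs[i]));
    printf("limit as P grows without bound: 1/s = %.2f\n", 1.0 / s);
    return 0;
}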

Overhead of Parallelism
• Given enough parallel work, this is the biggest barrier to getting desired speedup
• Parallelism overheads include:
  - cost of starting a thread or process
  - cost of communicating shared data
  - cost of synchronizing
  - extra (redundant) computation
• Each of these can be in the range of milliseconds (= millions of flops) on some systems
• Tradeoff: Algorithm needs sufficiently large units of work to run fast in parallel (i.e., large granularity), but not so large that there is not enough parallel work

Locality and Parallelism
[Figure: conventional storage hierarchy; each of several processors has its own cache, L2 cache, and L3 cache backed by memory, with potential interconnects between them.]
• Large memories are slow, fast memories are small
• Program should do most work on local data
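To illustrate "do most work on local data", here is a minimal C sketch, not from the slides, contrasting two loop orders over the same row-major array; the array size is an arbitrary illustrative choice. The row-by-row walk touches consecutive addresses and reuses each cache line, while the column-by-column walk strides across rows and typically runs several times slower on real hardware, even though both compute the same sum.

/* Illustrative sketch: same sum, two traversal orders, very different locality. */
#include <stdio.h>
#define N 2048

static double a[N][N];

double sum_row_major(void) {            /* good locality: unit-stride accesses */
    double s = 0.0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            s += a[i][j];
    return s;
}

double sum_col_major(void) {            /* poor locality: strides of N doubles */
    double s = 0.0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            s += a[i][j];
    return s;
}

int main(void) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            a[i][j] = 1.0;
    printf("%f %f\n", sum_row_major(), sum_col_major());
    return 0;
}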


Load Imbalance
• Load imbalance is the time that some processors in the system are idle due to
  - insufficient parallelism (during that phase)
  - unequal size tasks
• Examples of the latter
  - adapting to “interesting parts of a domain”
  - tree-structured computations
  - fundamentally unstructured problems
• Algorithm needs to balance load

Some Popular Parallel Programming Models
• Pthreads (parallel threads)
  - Low-level expression of threads, which are independent computations that can execute in parallel
• MPI (Message Passing Interface)
  - Most widely used on the very high-end machines
  - Extension to common sequential languages; expresses communication between different processes along with parallelism
• Map-Reduce (popularized by Google)
  - Map: apply the same computation to lots of different data (usually in distributed files) and produce local results
  - Reduce: compute global result from set of local results
• CUDA (Compute Unified Device Architecture)
  - Proprietary programming language for NVIDIA graphics processors
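Since Pthreads is the lowest-level model in the list above, a minimal sketch, not from the slides, may help: four threads each sum one quarter of an array, and the main thread combines the partial results. The array size, thread count, and hand-written reduction are illustrative assumptions.

/* Illustrative sketch: parallel array sum with Pthreads (compile with -lpthread). */
#include <pthread.h>
#include <stdio.h>

#define N 1000000
#define NTHREADS 4

static double data[N];

typedef struct { int lo, hi; double partial; } work_t;

static void *sum_range(void *arg) {       /* each thread sums its own block */
    work_t *w = (work_t *)arg;
    w->partial = 0.0;
    for (int i = w->lo; i < w->hi; i++)
        w->partial += data[i];
    return NULL;
}

int main(void) {
    for (int i = 0; i < N; i++) data[i] = 1.0;

    pthread_t tid[NTHREADS];
    work_t work[NTHREADS];
    int chunk = N / NTHREADS;

    for (int t = 0; t < NTHREADS; t++) {
        work[t].lo = t * chunk;
        work[t].hi = (t == NTHREADS - 1) ? N : (t + 1) * chunk;
        pthread_create(&tid[t], NULL, sum_range, &work[t]);
    }

    double total = 0.0;
    for (int t = 0; t < NTHREADS; t++) {
        pthread_join(tid[t], NULL);       /* wait, then combine partial results */
        total += work[t].partial;
    }
    printf("total = %f\n", total);        /* expect 1000000.0 */
    return 0;
}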

Summary of Lecture
• Solving the “Parallel Programming Problem”
  - Key technical challenge facing today’s computing industry, government agencies and scientists
• Scientific simulation discretizes some space into a grid
  - Perform local computations on grid
  - Communicate partial results between grids
  - Repeat for a set of time steps
  - Possibly perform other calculations with results
• Commodity parallel programming can draw from this history and move forward in a new direction
• Writing fast parallel programs is difficult
  - Amdahl’s Law: must parallelize most of the computation
  - Data Locality
  - Communication and Synchronization
  - Load Imbalance

Next Time
• An exploration of parallel algorithms and their features
• First written homework assignment
