
CS4961 Parallel Programming: Course Details

This document summarizes a parallel programming course: 1) The course meets on Tuesdays and Thursdays from 9:10-10:30 AM in room WEB L112. 2) The instructor is Mary Hall and the TA is Sriram Aananthakrishnan; the textbook is "Principles of Parallel Programming" by Calvin Lin and Lawrence Snyder, supplemented by other readings. 3) Today's lecture gives an overview of the course, the important problems that require parallel computing, and the architectures that will be covered, such as multi-cores and GPUs. It also discusses developing high-performance parallel applications and basic parallel and distributed computing concepts.


CS4961 Parallel Programming
Lecture 1: Introduction
Mary Hall
August 25, 2009

Course Details
• Time and Location: TuTh, 9:10-10:30 AM, WEB L112
• Course Website
  - http://www.eng.utah.edu/~cs4961/
• Instructor: Mary Hall, [email protected], http://www.cs.utah.edu/~mhall/
  - Office Hours: Tu 10:45-11:15 AM; Wed 11:00-11:30 AM
• TA: Sriram Aananthakrishnan, [email protected]
  - Office Hours: TBD
• Textbook
  - “Principles of Parallel Programming,” Calvin Lin and Lawrence Snyder.
  - Also, readings and notes provided for MPI, CUDA, Locality and Parallel Algorithms.

Today’s Lecture
• Overview of course (done)
• Important problems require powerful computers …
  - … and powerful computers must be parallel.
  - Increasing importance of educating parallel programmers (you!)
• What sorts of architectures in this class
  - Multimedia extensions, multi-cores, GPUs, networked clusters
• Developing high-performance parallel applications
  - An optimization perspective

Outline
• Logistics
• Introduction
• Technology Drivers for Multi-Core Paradigm Shift
• Origins of Parallel Programming: Large-scale scientific simulations
• The fastest computer in the world today
• Why writing fast parallel programs is hard

Some material for this lecture drawn from:
Kathy Yelick and Jim Demmel, UC Berkeley
Quentin Stout, University of Michigan (see http://www.eecs.umich.edu/~qstout/parallel.html)
Top 500 list (http://www.top500.org)


Course Objectives
• Learn how to program parallel processors and systems
  - Learn how to think in parallel and write correct parallel programs
  - Achieve performance and scalability through understanding of architecture and software mapping
• Significant hands-on programming experience
  - Develop real applications on real hardware
• Discuss the current parallel computing context
  - What are the drivers that make this course timely
  - Contemporary programming models and architectures, and where the field is going

Parallel and Distributed Computing
• Parallel computing (processing):
  - the use of two or more processors (computers), usually within a single system, working simultaneously to solve a single problem.
• Distributed computing (processing):
  - any computing that involves multiple computers remote from each other, each of which has a role in a computation problem or information processing.
• Parallel programming:
  - the human process of developing programs that express what computations should be executed in parallel.

Why is Parallel Programming Important Now?
• All computers are now parallel computers (embedded, commodity, supercomputer)
  - On-chip architectures look like parallel computers
  - Languages, software development and compilation strategies originally developed for the high end (supercomputers) are now becoming important for many other domains
• Why?
  - Technology trends
• Looking to the future
  - Parallel computing for the masses demands better parallel programming paradigms
  - And more people who are trained in writing parallel programs (possibly you!)
  - How to put all these vast machine resources to the best use!

Detour: Technology as Driver for “Multi-Core” Paradigm Shift
• Do you know why most computers sold today are parallel computers?
• Let’s talk about the technology trends


Technology Trends: Microprocessor Capacity
[Figure: serial performance trends; transistor count still rising, clock speed flattening sharply. Slide source: Maurice Herlihy]
Moore’s Law:
Gordon Moore (co-founder of Intel) predicted in 1965 that the transistor density of semiconductor chips would double roughly every 18 months.

Technology Trends: Power Density Limits

The Multi-Core Paradigm Shift
What to do with all these transistors?
• Key ideas:
  - Movement away from increasingly complex processor design and faster clocks
  - Replicated functionality (i.e., parallel) is simpler to design
  - Resources more efficiently utilized
  - Huge power management advantages
All Computers are Parallel Computers.

Proof of Significance: Popular Press
• This week’s issue of Newsweek!
• Article on 25 things “smart people” should know
• See http://www.newsweek.com/id/212142


Scientific Simulation: The Third Pillar of Science
• Traditional scientific and engineering paradigm:
  1) Do theory or paper design.
  2) Perform experiments or build system.
• Limitations:
  - Too difficult -- build large wind tunnels.
  - Too expensive -- build a throw-away passenger jet.
  - Too slow -- wait for climate or galactic evolution.
  - Too dangerous -- weapons, drug design, climate experimentation.
• Computational science paradigm:
  3) Use high performance computer systems to simulate the phenomenon
  - Based on known physical laws and efficient numerical methods.

The quest for increasingly more powerful machines
• Scientific simulation will continue to push on system requirements:
  - To increase the precision of the result
  - To get to an answer sooner (e.g., climate modeling, disaster modeling)
• The U.S. will continue to acquire systems of increasing scale
  - For the above reasons
  - And to maintain competitiveness

A Similar Phenomenon in Commodity Systems
• More capabilities in software
• Integration across software
• Faster response
• More realistic graphics
• …

The fastest computer in the world today
• What is its name? RoadRunner
• Where is it located? Los Alamos National Laboratory
• How many processors does it have? ~19,000 processor chips (~129,600 “processors”)
• What kind of processors? AMD Opterons and IBM Cell/BE (in PlayStations)
• How fast is it? 1.105 Petaflop/second
  - More than one quadrillion (1 x 10^15) operations per second
• See http://www.top500.org


Example: Global Climate Modeling Problem
• Problem is to compute:
  f(latitude, longitude, elevation, time) → temperature, pressure, humidity, wind velocity
• Approach:
  - Discretize the domain, e.g., a measurement point every 10 km
  - Devise an algorithm to predict weather at time t+δt given t
• Uses:
  - Predict major events, e.g., El Nino
  - Use in setting air emissions standards
Source: http://www.epm.ornl.gov/chammp/chammp.html

High Resolution Climate Modeling on NERSC-3 – P. Duffy, et al., LLNL
[Figure: high-resolution climate model output.]
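To make the discretization concrete, here is a minimal C sketch, not from the slides, of one way to represent the state f(latitude, longitude, elevation, time) as one record per grid cell and to estimate how many cells a 10 km horizontal spacing implies. The struct layout, the 30 vertical levels, and the memory estimate are illustrative assumptions, not course material.

/* Illustrative sketch: one record per grid cell of the climate state,
 * plus a back-of-the-envelope cell count for 10 km horizontal spacing. */
#include <stdio.h>

typedef struct {
    double temperature;   /* K */
    double pressure;      /* Pa */
    double humidity;      /* kg water vapor per kg air */
    double wind[3];       /* m/s: east, north, vertical components */
} cell_state;

int main(void) {
    const double pi = 3.14159265358979;
    const double earth_radius_km = 6371.0;
    const double spacing_km = 10.0;   /* one measurement point every 10 km */
    const int vertical_levels = 30;   /* assumed number of elevation layers */

    double surface_km2 = 4.0 * pi * earth_radius_km * earth_radius_km;
    double horiz_cells = surface_km2 / (spacing_km * spacing_km);
    double total_cells = horiz_cells * vertical_levels;

    printf("~%.2e horizontal cells, ~%.2e total cells\n", horiz_cells, total_cells);
    printf("state per time step: roughly %.1f GB\n",
           total_cells * (double)sizeof(cell_state) / 1e9);
    return 0;
}

With these assumed numbers the state alone is several gigabytes per time step, which is one reason such simulations need parallel machines.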

Some Characteristics of Scientific Simulation
• Discretize physical or conceptual space into a grid
  - Simpler if regular, may be more representative if adaptive
• Perform local computations on grid
  - Given yesterday’s temperature and weather pattern, what is today’s expected temperature?
• Communicate partial results between grids
  - Contribute local weather result to understand global weather pattern.
• Repeat for a set of time steps
• Possibly perform other calculations with results
  - Given weather model, what area should evacuate for a hurricane?

Example of Discretizing a Domain
[Figure: a domain split into blocks of grid cells; one processor computes one part while another processor computes another part in parallel.]
Processors in adjacent blocks in the grid communicate their result.
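The compute-then-communicate loop described above fits in a few lines of code. The following is a minimal sketch using MPI (one of the models listed later in this lecture), assuming a 1-D grid split into equal blocks; the block size, step count, and averaging update rule are illustrative choices, not the course's actual assignment code.

/* Illustrative sketch: 1-D "temperature" grid split across MPI processes.
 * Each process updates its own block; neighbors exchange boundary (halo)
 * values every time step. */
#include <mpi.h>
#include <stdio.h>

#define LOCAL_N 1000   /* grid points owned by each process */
#define STEPS   100    /* number of time steps */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* u[0] and u[LOCAL_N+1] are halo cells holding neighbors' boundary values */
    double u[LOCAL_N + 2], unew[LOCAL_N + 2];
    for (int i = 0; i < LOCAL_N + 2; i++) u[i] = (double)rank;

    int left  = (rank == 0)        ? MPI_PROC_NULL : rank - 1;
    int right = (rank == size - 1) ? MPI_PROC_NULL : rank + 1;

    for (int t = 0; t < STEPS; t++) {
        /* Communicate partial results: exchange boundary cells with neighbors */
        MPI_Sendrecv(&u[1], 1, MPI_DOUBLE, left, 0,
                     &u[LOCAL_N + 1], 1, MPI_DOUBLE, right, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(&u[LOCAL_N], 1, MPI_DOUBLE, right, 1,
                     &u[0], 1, MPI_DOUBLE, left, 1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        /* Local computation: simple diffusion-like averaging update */
        for (int i = 1; i <= LOCAL_N; i++)
            unew[i] = 0.25 * u[i - 1] + 0.5 * u[i] + 0.25 * u[i + 1];
        for (int i = 1; i <= LOCAL_N; i++)
            u[i] = unew[i];
    }

    if (rank == 0) printf("done after %d steps on %d processes\n", STEPS, size);
    MPI_Finalize();
    return 0;
}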


Parallel Programming Complexity
An Analogy to Preparing Thanksgiving Dinner
• Enough parallelism? (Amdahl’s Law)
  - Suppose you want to just serve turkey
• Granularity
  - How frequently must each assistant report to the chef
  - After each stroke of a knife? Each step of a recipe? Each dish completed?
• Locality
  - Grab the spices one at a time, or collect the ones that are needed prior to starting a dish?
• Load balance
  - Each assistant gets a dish? Preparing stuffing vs. cooking green beans?
• Coordination and Synchronization
  - Person chopping onions for stuffing can also supply green beans
  - Start pie after turkey is out of the oven
All of these things make parallel programming even harder than sequential programming.

Finding Enough Parallelism
• Suppose only part of an application seems parallel
• Amdahl’s law
  - let s be the fraction of work done sequentially, so (1-s) is the fraction parallelizable
  - P = number of processors
  - Speedup(P) = Time(1)/Time(P) <= 1/(s + (1-s)/P) <= 1/s
• Even if the parallel part speeds up perfectly, performance is limited by the sequential part
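As a quick worked example of the bound above, here is a minimal C sketch, not from the slides, that evaluates Speedup(P) <= 1/(s + (1-s)/P) for an assumed sequential fraction s = 0.10: with 10% of the work sequential, no number of processors can deliver more than a 10x speedup.

/* Illustrative sketch: Amdahl's Law speedup bound for s = 10% sequential work. */
#include <stdio.h>

/* Upper bound on speedup with P processors when fraction s is sequential. */
double amdahl_speedup(double s, int P) {
    return 1.0 / (s + (1.0 - s) / P);
}

int main(void) {
    double s = 0.10;                          /* assumed sequential fraction */
    int procs[] = {1, 2, 8, 64, 1024};
    for (int i = 0; i < 5; i++)
        printf("P = %4d  speedup <= %.2f\n", procs[i], amdahl_speedup(s, procs[i]));
    printf("limit as P grows without bound: 1/s = %.2f\n", 1.0 / s);
    return 0;
}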

Overhead of Parallelism
• Given enough parallel work, this is the biggest barrier to getting desired speedup
• Parallelism overheads include:
  - cost of starting a thread or process
  - cost of communicating shared data
  - cost of synchronizing
  - extra (redundant) computation
• Each of these can be in the range of milliseconds (= millions of flops) on some systems
• Tradeoff: Algorithm needs sufficiently large units of work to run fast in parallel (i.e., large granularity), but not so large that there is not enough parallel work

Locality and Parallelism
[Figure: conventional storage hierarchy; each of several processors has its own cache, L2 cache, and L3 cache backed by memory, with potential interconnects between them.]
• Large memories are slow, fast memories are small
• Program should do most work on local data
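To illustrate "do most work on local data", here is a minimal C sketch, not from the slides, contrasting two loop orders over the same row-major array; the array size is an arbitrary illustrative choice. The row-by-row walk touches consecutive addresses and reuses each cache line, while the column-by-column walk strides across rows and typically runs several times slower on real hardware, even though both compute the same sum.

/* Illustrative sketch: same sum, two traversal orders, very different locality. */
#include <stdio.h>
#define N 2048

static double a[N][N];

double sum_row_major(void) {            /* good locality: unit-stride accesses */
    double s = 0.0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            s += a[i][j];
    return s;
}

double sum_col_major(void) {            /* poor locality: strides of N doubles */
    double s = 0.0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            s += a[i][j];
    return s;
}

int main(void) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            a[i][j] = 1.0;
    printf("%f %f\n", sum_row_major(), sum_col_major());
    return 0;
}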


Load Imbalance
• Load imbalance is the time that some processors in the system are idle due to
  - insufficient parallelism (during that phase)
  - unequal size tasks
• Examples of the latter
  - adapting to “interesting parts of a domain”
  - tree-structured computations
  - fundamentally unstructured problems
• Algorithm needs to balance load

Some Popular Parallel Programming Models
• Pthreads (parallel threads)
  - Low-level expression of threads, which are independent computations that can execute in parallel
• MPI (Message Passing Interface)
  - Most widely used on the very high-end machines
  - Extension to common sequential languages; expresses communication between different processes along with parallelism
• Map-Reduce (popularized by Google)
  - Map: apply the same computation to lots of different data (usually in distributed files) and produce local results
  - Reduce: compute global result from set of local results
• CUDA (Compute Unified Device Architecture)
  - Proprietary programming language for NVIDIA graphics processors
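Since Pthreads is the lowest-level model in the list above, a minimal sketch, not from the slides, may help: four threads each sum one quarter of an array, and the main thread combines the partial results. The array size, thread count, and hand-written reduction are illustrative assumptions.

/* Illustrative sketch: parallel array sum with Pthreads (compile with -lpthread). */
#include <pthread.h>
#include <stdio.h>

#define N 1000000
#define NTHREADS 4

static double data[N];

typedef struct { int lo, hi; double partial; } work_t;

static void *sum_range(void *arg) {       /* each thread sums its own block */
    work_t *w = (work_t *)arg;
    w->partial = 0.0;
    for (int i = w->lo; i < w->hi; i++)
        w->partial += data[i];
    return NULL;
}

int main(void) {
    for (int i = 0; i < N; i++) data[i] = 1.0;

    pthread_t tid[NTHREADS];
    work_t work[NTHREADS];
    int chunk = N / NTHREADS;

    for (int t = 0; t < NTHREADS; t++) {
        work[t].lo = t * chunk;
        work[t].hi = (t == NTHREADS - 1) ? N : (t + 1) * chunk;
        pthread_create(&tid[t], NULL, sum_range, &work[t]);
    }

    double total = 0.0;
    for (int t = 0; t < NTHREADS; t++) {
        pthread_join(tid[t], NULL);       /* wait, then combine partial results */
        total += work[t].partial;
    }
    printf("total = %f\n", total);        /* expect 1000000.0 */
    return 0;
}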

Summary of Lecture
• Solving the “Parallel Programming Problem”
  - Key technical challenge facing today’s computing industry, government agencies and scientists
• Scientific simulation discretizes some space into a grid
  - Perform local computations on grid
  - Communicate partial results between grids
  - Repeat for a set of time steps
  - Possibly perform other calculations with results
• Commodity parallel programming can draw from this history and move forward in a new direction
• Writing fast parallel programs is difficult
  - Amdahl’s Law: must parallelize most of the computation
  - Data Locality
  - Communication and Synchronization
  - Load Imbalance

Next Time
• An exploration of parallel algorithms and their features
• First written homework assignment
