217 Lec1
Lecture 1: Introduction
Course Staff
• Professor:
Nael Abu-Ghazaleh
WCH-441, (951) 827-2347
Start your e-mail subject line with "217"
Office hours: to be announced soon; or by appointment
• Teaching Assistants:
– We will have one!
• If we can find one
– Office hours: TBA
• Class may be moving in time and space
– Sorry, I will let you know soon
Web Resources
• Course website:
http://www.cs.ucr.edu/~nael/217-f15
– Handouts and lecture slides
– Resources, announcements, projects, …
– Note: While we’ll make an effort to post announcements
on the web, we can’t guarantee it, and won’t make any
allowances for people who miss things in class.
• Piazza for discussions
– Channel for electronic announcements
– Forum for Q&A – course staff read the board, and your
classmates often have answers
• iLearn for submissions and grades
Grading
• Exam + Final: 35%
• Project: 30%, broken down as:
– Design Document: 25%
– Project Presentation: 25%
– Demo/Functionality/Performance/Report: 50%
Academic Honesty
• You are allowed and encouraged to discuss
assignments with other students in the class.
Getting verbal advice/help from people who’ve
already taken the course is also fine.
• Any reference to assignments from previous terms
or web postings is unacceptable
• Any copying of non-trivial code is unacceptable
– Non-trivial = more than a line or so
– Includes reading someone else’s code and then going off
to write your own.
Academic Honesty (cont.)
• Giving/receiving help on an exam is
unacceptable
• Penalties for academic dishonesty:
– Zero on the assignment for the first occasion
– Automatic failure of the course for repeat
offenses
– UCR academic honesty policy trumps any
instructor policies
Team Projects
• Work can be divided up between team
members in any way that works for you
• However, each team member will demo the
final checkpoint of each project individually,
and will get a separate demo grade
– This will include questions on the entire design
– Rationale: if you don’t know enough about the
whole design to answer questions on it, you
aren’t involved enough in the project
Text/Notes
1. D. Kirk and W. Hwu, “Programming Massively
Parallel Processors – A Hands-on Approach,
Second Edition”
2. CUDA by example, Sanders and Kandrot
3. Nvidia CUDA C Programming Guide
– https://docs.nvidia.com/cuda/cuda-c-programming-guide/
4. Occasional research papers
5. Lecture notes on class website
– Tentative schedule on class website
– Will try to assign reading ahead of time
Blue Waters Hardware
[Figure: one Blue Waters GPU compute node]
• NVIDIA GPU (Tesla X2090 / Kepler), connected to the host over PCIe Gen2
• Host memory: 32 GB of 1600 MT/s DDR3
• GPU memory: 6 GB GDDR5 capacity
• Gemini High Speed Interconnect between nodes
CPU and GPU have very different design philosophies
[Figure: chip diagrams side by side]
• GPU: throughput-oriented cores – many simple compute units on the chip, each with local cache, threading control, registers, and SIMD units
• CPU: latency-oriented cores – a few large cores, each with a big cache/local memory, sophisticated control logic, registers, and a SIMD unit
CPUs: Latency Oriented Design
[Figure: CPU chip with control logic, a few powerful ALUs, a large cache, and DRAM]
• Large caches
– Convert long-latency memory accesses to short-latency cache accesses
• Sophisticated control
– Branch prediction for reduced branch latency
– Data forwarding for reduced data latency
• Powerful ALUs
– Reduced operation latency
GPUs: Throughput Oriented Design
[Figure: GPU chip with many simple cores and DRAM]
• Small caches
– To boost memory throughput
• Simple control
– No branch prediction
– No data forwarding
• Energy-efficient ALUs
– Many; long latency, but heavily pipelined for high throughput
Heterogeneous parallel computing is catching on.
• Application domains: Financial Analysis, Scientific Simulation, Engineering Simulation, Data Intensive Analytics, Medical Imaging, Digital Audio Processing, Digital Video Processing, Computer Vision, Electronic Design Automation, Biomedical Informatics
Keys to Software Cost Control
[Figure: the same App running on Core A, then on a new-generation "Core A 2.0", then on more Core A cores]
• Scalability
– The same application runs efficiently on new generations of cores
– The same application runs efficiently on more of the same cores
9/25/15!
Scalability and Portability
• Performance growth with HW generations
– Increasing number of compute units
– Increasing number of threads
– Increasing vector length
– Increasing pipeline depth
– Increasing DRAM burst size
– Increasing number of DRAM channels
– Increasing data movement latency
• Portability across many different HW types
– Multi-core CPUs vs. many-core GPUs
– VLIW vs. SIMD vs. threading
– Shared memory vs. distributed memory
Keys to Software Cost Control
[Figure: the same App running on three different types of cores]
• Scalability
• Portability
– The same application runs efficiently on different types of cores
– The same application runs efficiently on systems with different organizations and interfaces
Parallelism Scalability
Algorithm Complexity and Data Scalability
Why is data scalability important?
• Any algorithm with complexity higher than linear is not data scalable
– Execution time explodes as data size grows, even for an n*log(n) algorithm
A Real Example of Data Scalability: Particle-Mesh Algorithms
Massive Parallelism - Regularity
[Figure: examples of regular (good) and irregular (bad!) parallel workloads]
Global Memory Bandwidth
[Figure: ideal vs. actually achieved memory bandwidth]
Conflicting Data Accesses Cause Serialization and Delays
• Massively parallel execution cannot afford serialization
• Contention in accessing critical data causes serialization
What is at stake?