
Lecture 01—Introduction

ECE 459: Programming for Performance

Patrick Lam

University of Waterloo

January 5, 2015

[Thanks to Jon Eyolfson for slides!]


Course Website

http://patricklam.ca/p4p/

Resources on github:
git@github.com:patricklam/p4p-2015.git

I also added everyone enrolled as of Sunday to Piazza.

Staff

Instructor
Patrick Lam [email protected] DC 2597D/DC2534

Teaching Assistants
Xi Cheng [email protected]
Morteza Nabavi [email protected]
Saeed Nejati [email protected]
Husam Suleiman [email protected]

Schedule

Lectures: January 5—April 7
MWF 9:30 AM, MC 2065
Tutorials: not used

Midterm: TBA

Office Hours

Wednesdays, 10:30-12:20, DC2597D,

or check http://patricklam.ca/in

[Academic, and other, advice also available!]

Recommended Textbook

Multicore Application Programming: For Windows, Linux, and Oracle Solaris. Darryl Gove. Addison-Wesley, 2010.

Goal

Make programs run faster!

Making Programs Faster

Two main ways:

[Two images. Credits: Chensiyuan, Wikimedia Commons, CC-BY-SA; and me.]
Making Programs Faster

Increase bandwidth (tasks per unit time); or
Decrease latency (time per task).

Examples of bandwidth/latency:
Network (connection speed/ping), traffic (lanes/speed)

Our Focus

Primarily on increasing bandwidth (more tasks/unit time).

Do tasks in parallel

Decreasing time/task usually harder, with fewer gains.

CPUs have been going towards more cores rather than raw speed.

A Bit on Improving Latency

We won’t return to these topics, but we’ll touch on them now.

Profile the code;
Do less work;
Be smarter; or
Improve the hardware.

Intermission

While working on Assignment 1, I ran into this puzzle:

[Figure: two axis-aligned rectangles, one with corners at (x0, y0), (x0 + w0, y0), and (x0, y0 + h0); the other with corners at (x1, y1) and (x1 + w1, y1).]
When do these rectangles intersect?
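One way to check this, as a rough C sketch (assuming each rectangle is given by its top-left corner, width, and height, with y growing downward; merely touching edges doesn't count here):

/* Axis-aligned rectangles overlap iff they overlap on both the x and y axes. */
struct rect { double x, y, w, h; };

int intersects(struct rect a, struct rect b) {
    return a.x < b.x + b.w && b.x < a.x + a.w &&
           a.y < b.y + b.h && b.y < a.y + a.h;
}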

Increasing Bandwidth: Parallelism

Some tasks are easy to run in parallel.

Examples: web server requests, computer graphics, brute-force searches, genetic algorithms

Others are more difficult.

Example: linked list traversal (why?)

Hardware

Use pipelining (all modern CPUs do this):
Implement this in software by splitting a task into subtasks and running the subtasks in parallel (see the sketch after this list).

Increase the number of cores/CPUs.

Use multiple connected machines.

Use specialized hardware, such as a GPU which contains hundreds of simple cores.
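A rough Pthreads sketch of the "split a task into subtasks" idea (illustrative only; the array and names are made up): two threads each sum half of an array, and the partial results are combined once both have finished. Compile with gcc -pthread.

#include <pthread.h>
#include <stdio.h>

#define N 1000000
static int data[N];

struct chunk { int lo, hi; long sum; };   /* one subtask's range and result */

static void *sum_chunk(void *arg) {
    struct chunk *c = arg;
    for (int i = c->lo; i < c->hi; i++)
        c->sum += data[i];
    return NULL;
}

int main(void) {
    for (int i = 0; i < N; i++) data[i] = 1;

    pthread_t t1, t2;
    struct chunk c1 = { 0, N / 2, 0 }, c2 = { N / 2, N, 0 };
    pthread_create(&t1, NULL, sum_chunk, &c1);   /* run the subtasks in parallel */
    pthread_create(&t2, NULL, sum_chunk, &c2);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    printf("total = %ld\n", c1.sum + c2.sum);    /* combine after both finish */
    return 0;
}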

Barriers to parallelization

Independent tasks (“embarrassingly parallel problems”) are trivial to parallelize, but dependencies cause problems.

Unable to start task until previous task finishes.

May require synchronization and combination of results.

More difficult to reason about, since execution may happen in any order.
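To make the synchronization point concrete, a sketch (names made up): when workers combine their results into shared state, the update has to be protected, here with a mutex.

#include <pthread.h>
#include <stdio.h>

static long total = 0;
static pthread_mutex_t total_lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg) {
    long partial = (long) arg;          /* stands in for real computed work */
    pthread_mutex_lock(&total_lock);
    total += partial;                   /* combining results: must be serialized */
    pthread_mutex_unlock(&total_lock);
    return NULL;
}

int main(void) {
    pthread_t t[4];
    for (long i = 0; i < 4; i++)
        pthread_create(&t[i], NULL, worker, (void *) (i + 1));
    for (int i = 0; i < 4; i++)
        pthread_join(t[i], NULL);       /* can't use total until every task finishes */
    printf("total = %ld\n", total);
    return 0;
}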

Limitations

Sequential tasks in the problem will always limit the maximum performance.

Some sequential problems may be parallelizable by reformulating the implementation.

However, no matter how many processors you have, you won’t be able to speed up the program as a whole beyond the limit set by its sequential part (known as Amdahl’s Law).
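For reference, the usual statement (not spelled out on the slide): if a fraction p of the work parallelizes perfectly over N processors, the overall speedup is

S(N) = 1 / ((1 - p) + p/N), which can never exceed 1 / (1 - p).

For example, if 90% of the program parallelizes (p = 0.9), the speedup is capped at 10x no matter how many processors you add.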

Data Race

Two processors (or threads) accessing the same data at the same time, where at least one of them writes.

For example, consider the following code:

x = 1
print x

You run it and see it prints 5.

Why? Before the print, another thread wrote a new value for x.
This is an example of a data race.
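A concrete C sketch of the same situation (illustrative only): the main thread reads x while another thread writes it, with no synchronization in between; in C this is a data race, and formally undefined behaviour.

#include <pthread.h>
#include <stdio.h>

int x = 1;

static void *racer(void *arg) {
    (void) arg;
    x = 5;                    /* unsynchronized write: races with the read below */
    return NULL;
}

int main(void) {
    pthread_t t;
    pthread_create(&t, NULL, racer, NULL);
    printf("%d\n", x);        /* may print 1 or 5, depending on timing */
    pthread_join(t, NULL);
    return 0;
}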

Deadlock

Two processors each waiting for a resource the other holds.


Consider two processors trying to get two resources:
Processor 1            Processor 2
Get Resource 1         Get Resource 2
Get Resource 2         Get Resource 1
Release Resource 2     Release Resource 1
Release Resource 1     Release Resource 2

Processor 1 gets Resource 1, then Processor 2 gets Resource 2; now they both wait for each other (deadlock).
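The same scenario as a Pthreads sketch (illustrative only), with the two resources as mutexes acquired in opposite orders:

#include <pthread.h>

static pthread_mutex_t r1 = PTHREAD_MUTEX_INITIALIZER;   /* Resource 1 */
static pthread_mutex_t r2 = PTHREAD_MUTEX_INITIALIZER;   /* Resource 2 */

static void *proc1(void *arg) {
    (void) arg;
    pthread_mutex_lock(&r1);      /* gets Resource 1 */
    pthread_mutex_lock(&r2);      /* then waits here if proc2 already holds Resource 2 */
    pthread_mutex_unlock(&r2);
    pthread_mutex_unlock(&r1);
    return NULL;
}

static void *proc2(void *arg) {
    (void) arg;
    pthread_mutex_lock(&r2);      /* gets Resource 2 */
    pthread_mutex_lock(&r1);      /* then waits here if proc1 already holds Resource 1 */
    pthread_mutex_unlock(&r1);
    pthread_mutex_unlock(&r2);
    return NULL;
}

int main(void) {
    pthread_t a, b;
    pthread_create(&a, NULL, proc1, NULL);
    pthread_create(&b, NULL, proc2, NULL);
    pthread_join(a, NULL);        /* may block forever if the threads deadlock */
    pthread_join(b, NULL);
    return 0;
}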

Objectives

Implement parallel programs which use 1) synchronization primitives and 2) asynchronous I/O

Describe and use parallel computing frameworks

Be able to investigate software and improve its performance

Use and understand specialized GPU programming languages

Assignments

1 Manual parallelization using Pthreads/async I/O

2 Automatic parallelization and OpenMP

3 Application profiling and improvement

4 GPU programming

Breakdown

40% Assignments (10% each)

10% Midterm

50% Final

Grace Days

4 grace days to use over the semester for late assignments.

No mark penalty for using grace days.

Try not to use them just because they’re there.

Homework for Wednesday

We’ll be doing exercises based on this presentation:

http://www.infoq.com/presentations/click-crash-course-modern-hardware

I’ll post the exercises on Tuesday.

Suggestions?

Just let me know
