01 - Lecture Intro To HPC

The document outlines the course CSE 455 on High Performance Computing (HPC), covering topics such as HPC fundamentals, parallel programming, and performance analysis. It emphasizes the need for HPC due to increasing computational demands and introduces various parallel architectures and programming concepts. Assessment methods include assignments, quizzes, a midterm, a major project, and a final exam.


CSE 455

High Performance
Computing
LECTURE 1
Course Outline
• High Performance Computing
• Flynn’s Taxonomy
• Interconnection Networks
• Performance Analysis of Multiprocessor Architecture
• Shared Memory Architecture
• Parallel Programming with MPI
• Parallelization Fundamentals
• Parallel Programming with OpenMP
• Graphical Processing Units (GPUs)
• Task Scheduling and Allocation
Textbooks
Assessment
Assessment Marks
Assignments 15
Quizzes 5
Midterm 20
Major Task (Project) 20
Final Exam 40
Agenda
What is High Performance Computing (HPC)?
Why do we need high performance computing?
HPC Domain of Applications
Why we’re building parallel systems.
Why we need to write parallel programs.
How do we write parallel programs?
Parallel Programming Concepts
Parallel Architectures
Parallel Algorithm Design
What is High Performance Computing (HPC)?

• Using computing resources that provide more computing power to solve a problem in a reasonable amount of time.
• These problems need large amounts of computing power for short periods of time.
• HPC systems range from workstations up to the largest supercomputers.
What is High Performance Computing (HPC)?

HPC includes work on ‘four basic building blocks’ in this course:
• Theory (numerical laws, physical models, speed-up performance, etc.)
• Technology (multi-core, supercomputers, networks, storage, etc.)
• Architecture (shared-memory, distributed-memory, interconnects, etc.)
• Software (libraries, schedulers, monitoring, applications, etc.)
Agenda
What is High Performance Computing (HPC)?
Why do we need high performance computing?
HPC Domain of Applications
Why we’re building parallel systems.
Why we need to write parallel programs.
How do we write parallel programs?
Parallel Programming Concepts
Parallel Architectures
Parallel Algorithm Design
Why we need ever-increasing
performance
◼ Computational power is increasing, but so
are our computation problems and needs.
◼ Problems we never dreamed of have been
solved because of past increases, such as
decoding the human genome.
◼ More complex problems are still waiting to
be solved.



Characteristics of Problems
Solved Using HPC
◼ It takes a long time to compute its results (needs more compute power).
◼ Needs a large quantity of resources (memory, etc.).
◼ Requires multiple runs.
◼ Time critical.



Motivational Examples
A few examples of large-scale problems for motivation.
◼ Currently that means Tera-scale or Peta-scale.

Prefix   Meaning        Decimal   Binary (approx.)
Kilo     thousand       10^3      2^10
Mega     million        10^6      2^20
Giga     billion        10^9      2^30
Tera     trillion       10^12     2^40
Peta     quadrillion    10^15     2^50
Exa      quintillion    10^18     2^60
◼ Processing speed is measured in ???

◼ One operation may take several cycles.

◼ How many operations/sec can be done on a 1 GHz processor ???

◼ Exa-scale is a billion times more than that (giga-scale). (The same prefixes apply to flops and bytes.)


Solving a linear system
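(The slide's figure is not reproduced here.) As a rough, hedged illustration that is not taken from the slide: factoring a dense n x n linear system by Gaussian elimination costs roughly (2/3)*n^3 floating-point operations, so even moderate problem sizes outgrow a single processor. The matrix size and machine speeds below are assumptions chosen purely for illustration.

    #include <stdio.h>

    int main(void) {
        double n     = 1.0e5;                    /* assumed matrix dimension (illustrative)  */
        double flops = (2.0 / 3.0) * n * n * n;  /* approximate cost of Gaussian elimination */

        double rates[]      = { 1.0e9, 1.0e12, 1.0e15 };          /* assumed machine speeds */
        const char *names[] = { "Gflop/s", "Tflop/s", "Pflop/s" };

        for (int i = 0; i < 3; i++) {
            double seconds = flops / rates[i];
            printf("n = %.0f at 1 %s: about %.2e seconds (%.2f hours)\n",
                   n, names[i], seconds, seconds / 3600.0);
        }
        return 0;
    }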
Agenda
What is High Performance Computing (HPC)?
Why do we need high performance computing?
HPC Domain of Applications
Why we’re building parallel systems.
Why we need to write parallel programs.
How do we write parallel programs?
Parallel Programming Concepts
Parallel Architectures
Parallel Algorithm Design
Climate modeling



Protein folding



Drug discovery



Energy research



Data analysis



Agenda
What is High Performance Computing (HPC)?
Why do we need high performance computing?
HPC Domain of Applications
Why we’re building parallel systems.
Why we need to write parallel programs.
How do we write parallel programs?
Parallel Programming Concepts
Parallel Architectures
Parallel Algorithm Design
Why we’re building parallel
systems.
◼ From 1986 to 2002, microprocessors were speeding like a rocket, increasing in performance by an average of 50% per year.

◼ Since then, the increase has dropped to about 20% per year.



An intelligent solution
◼ Instead of designing and building faster
microprocessors, put multiple processors
on a single integrated circuit.



Now it’s up to the programmers
◼ Adding more processors doesn’t help
much if programmers aren’t aware of
them…
◼ … or don’t know how to use them.

◼ Serial programs don’t benefit from this approach (in most cases).



Why we’re building parallel
systems
◼ Up to now, performance increases have
been attributable to increasing density of
transistors.

◼ But there are inherent problems.



A little physics lesson
◼ Smaller transistors = faster processors.
◼ Faster processors = increased power
consumption.
◼ Increased power consumption = increased
heat.
◼ Increased heat = unreliable processors.



Solution
◼ Move away from single-core systems to
multicore processors.
◼ “core” = central processing unit (CPU)

◼ Introducing parallelism!!!



Agenda
What is High Performance Computing (HPC)?
Why do we need high performance computing?
HPC Domain of Applications
Why we’re building parallel systems.
Why we need to write parallel programs.
How do we write parallel programs?
Parallel Programming Concepts
Parallel Architectures
Parallel Algorithm Design
Why we need to write parallel
programs
◼ Running multiple instances of a serial
program often isn’t very useful.
◼ Think of running multiple instances of your
favorite game.

◼ What you really want is for it to run faster.



Approaches to the serial problem
◼ Rewrite serial programs so that they’re
parallel.

◼ Write translation programs that automatically convert serial programs into parallel programs.
◼ This is very difficult to do.
◼ Success has been limited.



More problems
◼ Some coding constructs can be
recognized by an automatic program
generator, and converted to a parallel
construct.
◼ However, it’s likely that the result will be a
very inefficient program.
◼ Sometimes the best parallel solution is to
step back and devise an entirely new
algorithm.



Example
◼ Compute n values and add them together.
◼ Serial solution:
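The slide shows the serial code as an image; a minimal, self-contained C sketch of it is given below. Compute_next_value is only a stand-in for whatever actually produces each value.

    #include <stdio.h>

    /* Stand-in for whatever produces each value; it depends only on the
       index so that the sketch runs on its own. */
    double Compute_next_value(int i) {
        return (double)((7 * i) % 10);
    }

    int main(void) {
        int    n   = 24;
        double sum = 0.0;

        /* Serial solution: compute each of the n values and accumulate it. */
        for (int i = 0; i < n; i++) {
            double x = Compute_next_value(i);
            sum += x;
        }

        printf("sum = %.1f\n", sum);
        return 0;
    }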



Example (cont.)
◼ We have p cores, p much smaller than n.
◼ Each core performs a partial sum of approximately n/p values.

◼ Each core uses its own private variables and executes this block of code independently of the other cores.
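A hedged sketch of that per-core block follows; the name Partial_sum, the my_rank parameter, and the even n/p split are assumptions made for illustration, and Compute_next_value is the stand-in defined in the serial sketch above.

    /* Executed by the core whose id is my_rank (0 <= my_rank < p).
       Assumes, for simplicity, that p divides n evenly. */
    double Partial_sum(int my_rank, int p, int n) {
        int    my_n       = n / p;
        int    my_first_i = my_rank * my_n;
        int    my_last_i  = my_first_i + my_n;
        double my_sum     = 0.0;   /* private to this core */

        for (int my_i = my_first_i; my_i < my_last_i; my_i++) {
            double my_x = Compute_next_value(my_i);
            my_sum += my_x;
        }
        return my_sum;
    }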



Example (cont.)
◼ After each core completes execution of the code, its private variable my_sum contains the sum of the values computed by its calls to Compute_next_value.

◼ Ex.: with 8 cores and n = 24, the calls to Compute_next_value return:
1,4,3, 9,2,8, 5,1,1, 6,2,7, 2,5,0, 4,1,8, 6,5,1, 2,3,9



Example (cont.)
◼ Once all the cores are done computing their private my_sum, they form a global sum by sending their results to a designated “master” core, which adds them to produce the final result.
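A sketch of this naive global sum, simulated serially for illustration: the sends and receives are replaced by array indexing, and the partial sums 8, 19, 7, 15, 7, 13, 12, 14 are the ones from the 8-core example that follows.

    #include <stdio.h>

    #define P 8   /* number of cores in the example */

    int main(void) {
        /* Partial sums each core would hold (values from the example). */
        double my_sum[P] = { 8, 19, 7, 15, 7, 13, 12, 14 };

        /* The "master" core (core 0) receives every other core's my_sum and
           adds it to its own: p - 1 receives and p - 1 additions. */
        double sum = my_sum[0];
        for (int core = 1; core < P; core++)
            sum += my_sum[core];

        printf("global sum on the master core: %.0f\n", sum);   /* 95 */
        return 0;
    }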



Example (cont.)



Example (cont.)

Core 0 1 2 3 4 5 6 7
my_sum 8 19 7 15 7 13 12 14

Global sum
8 + 19 + 7 + 15 + 7 + 13 + 12 + 14 = 95

Core 0 1 2 3 4 5 6 7
my_sum 95 19 7 15 7 13 12 14



But wait!
There’s a much better way
to compute the global sum.



Better parallel algorithm
◼ Don’t make the master core do all the
work.
◼ Share it among the other cores.
◼ Pair the cores so that core 0 adds its result
with core 1’s result.
◼ Core 2 adds its result with core 3’s result,
etc.
◼ Work with odd and even numbered pairs of
cores.
Better parallel algorithm (cont.)
◼ Repeat the process now with only the
evenly ranked cores.
◼ Core 0 adds the result from core 2.
◼ Core 4 adds the result from core 6, etc.

◼ Now cores divisible by 4 repeat the process, and so forth, until core 0 has the final result.
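A sketch of this tree-structured global sum, again simulated serially for illustration (same assumed partial sums as before): in each round, every core whose rank is a multiple of 2*step adds in the value held by the core step ranks above it.

    #include <stdio.h>

    #define P 8   /* number of cores in the example */

    int main(void) {
        /* Partial sums each core would hold (values from the example). */
        double my_sum[P] = { 8, 19, 7, 15, 7, 13, 12, 14 };

        /* Round 1: 0+=1, 2+=3, 4+=5, 6+=7.  Round 2: 0+=2, 4+=6.  Round 3: 0+=4. */
        for (int step = 1; step < P; step *= 2)
            for (int rank = 0; rank + step < P; rank += 2 * step)
                my_sum[rank] += my_sum[rank + step];

        printf("global sum on core 0: %.0f\n", my_sum[0]);   /* 95 */
        return 0;
    }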



Multiple cores forming a global
sum



Analysis
◼ In the first example, the master core
performs 7 receives and 7 additions.

◼ In the second example, the master core performs 3 receives and 3 additions.

◼ The improvement is more than a factor of 2!



Analysis (cont.)
◼ The difference is more dramatic with a
larger number of cores.
◼ If we have 1000 cores:
◼ The first example would require the master to
perform 999 receives and 999 additions.
◼ The second example would only require 10
receives and 10 additions.

◼ That’s an improvement of almost a factor of 100!
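In general (implied by, though not stated on, the slide): with p cores, the naive scheme costs the master p - 1 receives and p - 1 additions, while the tree-structured scheme costs core 0 only about log2(p) of each. For example:

    p = 1000:   p - 1 = 999   vs.   ceil(log2(1000)) = 10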
Agenda
What is High Performance Computing (HPC)?
Why do we need high performance computing?
HPC Domain of Applications
Why we’re building parallel systems.
Why we need to write parallel programs.
How do we write parallel programs?
Parallel Programming Concepts
Parallel Architectures
Parallel Algorithm Design
How do we write parallel
programs?
◼ Task parallelism
◼ Partition the various tasks carried out in solving the problem among the cores.

◼ Data parallelism
◼ Partition the data used in solving the problem among the cores.
◼ Each core carries out similar operations on its part of the data (see the sketch after this list).
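A hedged sketch contrasting the two partitionings, using the exam-grading example from the slides that follow (300 exams, 15 questions); the function names and the grade stub are hypothetical and exist only for illustration.

    #include <stdio.h>

    #define N_EXAMS     300
    #define N_QUESTIONS 15

    /* Hypothetical stub standing in for the real grading work. */
    void grade(int exam, int question) { (void)exam; (void)question; }

    /* Data parallelism: this core ("TA") handles every question, but only
       on its own block of exams. */
    void grade_data_parallel(int my_first_exam, int my_last_exam) {
        for (int exam = my_first_exam; exam < my_last_exam; exam++)
            for (int q = 0; q < N_QUESTIONS; q++)
                grade(exam, q);
    }

    /* Task parallelism: this core handles only its own block of questions,
       but on every exam. */
    void grade_task_parallel(int my_first_q, int my_last_q) {
        for (int exam = 0; exam < N_EXAMS; exam++)
            for (int q = my_first_q; q < my_last_q; q++)
                grade(exam, q);
    }

    int main(void) {
        grade_data_parallel(0, 100);   /* e.g. TA#1 in the data-parallel split */
        grade_task_parallel(0, 5);     /* e.g. TA#1 in the task-parallel split */
        printf("done\n");
        return 0;
    }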



Professor P

15 questions
300 exams



Professor P’s grading assistants

TA#1, TA#2, TA#3



Division of work – data parallelism

Each TA grades 100 of the 300 exams (TA#1: 100 exams, TA#2: 100 exams, TA#3: 100 exams).


Division of work – task parallelism

TA#1 grades Questions 1 - 5, TA#2 grades Questions 6 - 10, and TA#3 grades Questions 11 - 15.


Division of work –
data parallelism



Division of work –
task parallelism

Tasks
1) Receiving
2) Addition



Agenda
What is High Performance Computing (HPC)?
Why do we need high performance computing?
HPC Domain of Applications
Why we’re building parallel systems.
Why we need to write parallel programs.
How do we write parallel programs?
Parallel Programming Concepts
Parallel Architectures
Parallel Algorithm Design
Parallel Programming Concepts
Parallel programming means explicitly indicating how different portions of the computation may be executed concurrently by different processors.
Programmers/compilers must be able to identify operations that may be performed in parallel.
Terminology
◼ Concurrent computing – a program is one
in which multiple tasks can be in progress
at any instant.
◼ Parallel computing – a program is one in which multiple tasks cooperate closely to solve a problem.
◼ Distributed computing – a program may
need to cooperate with other programs to
solve a problem.



Coordination
◼ Cores usually need to coordinate their work.
◼ Communication – one or more cores send
their current partial sums to another core.
◼ Load balancing – share the work evenly
among the cores so that one is not heavily
loaded.
◼ Synchronization – because each core works
at its own pace, make sure cores do not get
too far ahead of the rest.



Agenda
What is High Performance Computing (HPC)?
Why do we need high performance computing?
HPC Domain of Applications
Why we’re building parallel systems.
Why we need to write parallel programs.
How do we write parallel programs?
Parallel Programming Concepts
Parallel Architectures
Parallel Algorithm Design
Parallel Architectures
◼ Multi-Processor Systems
◼ Multiple-CPU computer with shared memory.
◼ The same address on different CPUs refers
to the same memory location.
◼ Centralized multiprocessor: all the primary
memory is in one place.
◼ Distributed multiprocessor: the primary
memory is distributed among the processors.

Parallel Architectures
◼ Multi-Computer Systems
◼ Has disjoint local address spaces (memory).
◼ Each CPU has direct access to its local
memory only.
◼ The same address on different CPUs refers to
different memory locations.
◼ CPUs interact with each other by passing
messages.

Agenda
What is High Performance Computing (HPC)?
Why do we need high performance computing?
HPC Domain of Applications
Why we’re building parallel systems.
Why we need to write parallel programs.
How do we write parallel programs?
Parallel Programming Concepts
Parallel Architectures
Parallel Algorithm Design
Type of parallel systems

Shared-memory: Multi-Processor
Distributed-memory: Multicomputer
Parallel Algorithm Design
A four-step process for designing parallel algorithms: Foster’s Design Methodology.
Parallel Algorithm Design
Decomposition: partitioning the problem
into tasks.
Communication: connecting tasks to each
other.
Agglomeration: reducing the number of
tasks to reduce communication overhead.
Mapping: assigning tasks to processes.
Concluding Remarks (1)
◼ The laws of physics have brought us to the
doorstep of multicore technology.
◼ Serial programs typically don’t benefit from
multiple cores.
◼ Automatic parallel program generation
from serial program code isn’t the most
efficient approach to get high performance
from multicore computers.



Concluding Remarks (2)
◼ Learning to write parallel programs
involves learning how to coordinate the
cores.
◼ Parallel programs are usually very complex and therefore require sound programming techniques and development.

