
Dependencies, Instruction Scheduling, Optimization, and Parallelism
Prof. James L. Frankel
Harvard University

Version of 6:59 PM 24-Apr-2018


Copyright © 2018, 2016 James L. Frankel. All rights reserved.
Ordering of Execution of Instructions
• Although the programmer writes the code in a particular order, the language
allows execution in another order so long as it meets the as-if constraint
• A different order may allow faster execution because of
• Delay slots
• Pipelining advantages
• Caching advantages
• Prefetching
• Multiple processing elements
• Data locality
Delay Slot
• Pre-fetching of instructions is performed by the processor so it is not
idle waiting for instructions to be read from memory
• If an unpredicted branch/jump occurs, it may cause a pipeline bubble
• Pre-fetching of instructions may not follow the execution path even if
a processor is able to correctly predict whether a branch/jump will
occur
• MIPS deals with this issue by executing one instruction that follows a
branch/jump whether or not the branch/jump occurs
• The location of that instruction following the branch/jump is referred to as
the delay slot
Delay Slot Not Evident in Our MIPS Code
• We’ve been using SPIM in a default, simplified mode
• SPIM is not emulating the delay slot feature of MIPS
• The -delayed_branches switch turns delay slot emulation on
Pipelining
• Present the CSCI E-93 MIPS Pipelining Slides
• These are not available on-line
Caching
• See CSCI E-93 Caching slides
Types of Dependencies
• Control Dependence
• Control flow of program determines what can execute when
• Data Dependence
• Definition and use of variables determines a partial ordering
Control Dependencies
• Flow-of-control statements
• If-then
• If-then-else
• For
• While
• Do-while
• Switch-case
• Function call
• Return
• Goto
• Break
• Continue
Control Dependencies
• Flow-of-control operators
• ||
• &&
• ?:
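• For example (an illustrative fragment, not from the original slides; assume
p points to a structure with an int member count and total is an int), the
right operand of && is evaluated only when the left operand is true, so it is
control-dependent on the left operand:

if (p != NULL && p->count > 0)   /* p->count is read only if p != NULL */
  total += p->count;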
Data Dependencies (1 of 3)
• True Dependence
• A variable is written and then is read

variable = …

… = variable
Data Dependencies (2 of 3)
• Output Dependence
• A variable is written and later is written again

variable = …

variable = …

• Can be removed by renaming (SSA form)
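• For example (an illustrative sketch, not from the original slides), renaming
each definition removes the output dependence:

x = a + b;   /* first write to x */
x = c + d;   /* second write to x: output dependence */

/* after renaming (SSA-style) */
x1 = a + b;
x2 = c + d;  /* the two writes are now independent */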


Data Dependencies (3 of 3)
• Anti-Dependence
• A variable is read and then written

… = variable

variable = …

• Can be removed by renaming (SSA form)
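• Similarly (an illustrative sketch, not from the original slides), renaming
the later write removes the anti-dependence:

y = x + 1;   /* read of x */
x = c + d;   /* later write to x: anti-dependence */

/* after renaming (SSA-style) */
y = x + 1;
x1 = c + d;  /* the write no longer conflicts with the earlier read */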


Complications in Determining Data
Dependence
• Array accesses require analysis of the subscript expressions
• Pointer accesses require analysis of the pointers' derivations
• In addition to aliasing other pointers, pointers can also alias variables of other
types
• Unions create aliases explicitly
Sequential Array Accesses (1 of 3)
i = 5;
j = 6;
A[j-1] = …;
… = A[i];

• Does j-1 equal i?


• Can be determined by copy propagation and constant folding
• What about across basic blocks?
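• For instance, within a single basic block the compiler can derive (a sketch
of the analysis, not additional source code):

A[j-1]  =>  A[6-1]  =>  A[5]   /* after copy propagation and constant folding */
A[i]    =>  A[5]

• The two subscripts are equal, so the read of A[i] is true-dependent on the
write to A[j-1]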
Sequential Array Accesses (2 of 3)
void f(int i, int j) {
A[j-1] = …;
… = A[i];
}

• Does j-1 equal i?


• Requires symbolic evaluation and inter-procedural analysis (i.e.,
analysis across the function call boundary)
Sequential Array Accesses (3 of 3)
void f(int i, int j) {
A[j-1] = …;
… = A[i*3];
}

• Does j-1 equal i*3?


• Requires more complicated symbolic evaluation and inter-procedural
analysis
Pointer Dereferencing (1 of 4)
int A[10], *p, *q;
p = &A[0];
q = &A[1];
*p = …;
… = *q;

• Do *p and *q alias each other?


Pointer Dereferencing (2 of 4)
int A[10], *p, *q;
p = &A[0];
q = &A[1];
*p = …;
… = *(q-1);

• Do *p and *(q-1) alias each other?


Pointer Dereferencing (3 of 4)
int A[10], *p, *q;
p = &A[0];
*p = …;
q = f(…);
… = *q;

• Do *p and *q alias each other?


Pointer Dereferencing (4 of 4)
int i, *p;
p = &i;

… = i; /* First reference to i */
*p = …;
… = i; /* Second reference to i */

• Do *p and i alias each other?


• Do both references to i need to read the value of i or could i be kept in a
register?
Unions (1 of 3)
union union_name {
int i;
float f;
} var;
var.i = …;
… = var.f;

• Do var.i and var.f alias each other?


Unions (2 of 3)
union union_name {
int i;
short s;
char c;
} var;
var.i = …;
… = var.s;

• Do var.i and var.s alias each other?


Unions (3 of 3)
union union_name {
int i;
short s[4];
char c[6];
} var;
var.i = …;
… = var.s[2];

• Do var.i and var.s[2] alias each other?
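• As a worked example, assuming a 4-byte int and a 2-byte short (sizes are
implementation-defined): var.i occupies bytes 0-3 of the union while var.s[2]
occupies bytes 4-5, so under those assumptions they do not overlap and do not
alias; var.s[0] and var.s[1] would alias var.i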


Sequential Data Dependency vs. Loop-Carried
Data Dependency
• Sequential Data Dependency is directly reflected by the program without
requiring analysis of loops

• Loop-Carried Data Dependency requires analysis of loops to be discovered
Simple Loop-Carried Data Dependence
Example
n = 5;
product = 1;
while(n > 1) {
  product = product*n;
  n--;
}

• Both n and product have sequential and loop-carried dependencies


Difficulties in Data Dependence Analysis
• Usually analysis is more difficult because of more complex data types
• Determining if a reference is to the same data as another access is the
problem of determining aliasing
• One access aliases another access if the accesses overlap data in memory
• Array accesses require analysis of the subscript expressions
• Pointer accesses require analysis of the pointers' derivations
• Unions create aliases explicitly
Loop-Level Parallelism (1 of 3)
• Compute the squares of the differences between elements in two
arrays

for(i = 0; i < n; i++) {
  Z[i] = X[i] - Y[i];
  Z[i] = Z[i] * Z[i];
}

• Contains independent iterations


Loop-Level Parallelism (2 of 3)
• Compute the squares of the differences between elements in two arrays

for(i = 0; i < n; i++)
  Z[i] = X[i] - Y[i];
for(i = 0; i < n; i++)
  Z[i] = Z[i] * Z[i];

• Also contains independent iterations, but exhibits worse data locality than
the program fragment on the previous slide
• In the previous program fragment, operations can be performed while data is still in
registers
Loop-Level Parallelism (3 of 3)
• Going back to the first fragment, with M processors and with each processor
numbered p (zero origin), the previous loop can be rewritten, as follows:

b = (n + M - 1)/M;  /* ceiling of n/M in integer arithmetic */
for(i = b*p; i < min(n, b*(p+1)); i++) {
  Z[i] = X[i] - Y[i];
  Z[i] = Z[i] * Z[i];
}

• Approximately equal size, independent iterations are created for each processor
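• As a sketch of how the per-processor loop above might be driven in practice
(one thread per processor; the pthread wrapper, M, N, and the function names
below are assumptions, not part of these slides):

#include <pthread.h>

#define M 4                           /* number of processors/threads (assumed) */
#define N 1000                        /* illustrative array size (assumed) */

double X[N], Y[N], Z[N];
int n = N;

/* Each thread executes the partitioned loop for its own processor number p. */
static void *square_diffs(void *arg) {
  int p = (int)(long)arg;             /* processor number, zero origin */
  int b = (n + M - 1)/M;              /* ceiling of n/M */
  int limit = (b*(p+1) < n) ? b*(p+1) : n;
  for (int i = b*p; i < limit; i++) {
    Z[i] = X[i] - Y[i];
    Z[i] = Z[i] * Z[i];
  }
  return NULL;
}

/* Launch one thread per processor and wait for all of them to finish. */
void square_diffs_parallel(void) {
  pthread_t t[M];
  for (long p = 0; p < M; p++)
    pthread_create(&t[p], NULL, square_diffs, (void *)p);
  for (int p = 0; p < M; p++)
    pthread_join(t[p], NULL);
}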
FORTRAN PARALLEL DO
• FORTRAN has a PARALLEL DO statement that tells the compiler there
are no dependencies across its iterations

PARALLEL DO I = 1, N
A(I) = A(I) + B(I)
ENDDO
ISO C99 restrict
• ISO C99 has the restrict type qualifier for pointers to tell the compiler
there are no aliases to access the object to which it points

void add(int n, int *restrict dest, int *restrict op1, int *restrict op2) {
  int i;
  for(i = 0; i < n; i++)
    dest[i] = op1[i] + op2[i];
}
Loop-Carried Dependence (1 of 7)
• Here is a slightly more complicated example of a loop-carried dependence:

double Z[100];
for(i = 0; i < 90; i++) {
  Z[i+10] = Z[i];
}

• Iteration 0 copies Z[0] into Z[10]


• Iteration 1 copies Z[1] into Z[11]
• …
• Iteration 9 copies Z[9] into Z[19]
• Iteration 10 copies Z[10] into Z[20] -- This is true-dependent on iteration 0
• Iteration 11 copies Z[11] into Z[21] -- This is true-dependent on iteration 1
• …
Loop-Carried Dependence (2 of 7)
• This program fragment copies the first ten locations of Z into each
successive group of ten locations, through to the end of Z
Loop-Carried Dependence (3 of 7)
• This example gives us a loop-carried dependence distance of 10
• And, a dependence direction of < (which means the direction is to a
future iteration)

• These distances and directions can be computed for each nested loop
iteration variable and for each statement in the loop

• For this example, the first 10 iterations can run with no dependencies
• Then, each iteration can run so long as the iteration 10 before it has
completed
Loop-Carried Dependence (4 of 7)
• For which values of x and y does x+10 equal y in the range 0 <= x, y < 90?
• An exact test would tell us if there exists a solution in the specified range
• An inexact test would tell us if there exists a solution, but not necessarily in
the specified range

• This is an Integer Linear Program


• Diophantine analysis can give us an exact answer
• GCD (Greatest Common Divisor) can give us an inexact answer
• But, if GCD says NO, then that is very useful information because then there is
no integer solution even outside the specified range!
Diophantine Equation
• Wikipedia: A Diophantine equation is a polynomial equation, usually
in two or more unknowns, such that only the integer solutions are
sought or studied (an integer solution is a solution such that all the
unknowns take integer values)
Loop-Carried Dependence (5 of 7)
• Here is another example of a loop-carried dependence:

double Z[100];
for(i = 0; i < 90; i++) {
  Z[i] = Z[i+10];
}

• Iteration 0 copies Z[10] into Z[0]


• Iteration 1 copies Z[11] into Z[1]
• …
• Iteration 9 copies Z[19] into Z[9]
• Iteration 10 copies Z[20] into Z[10] -- This is anti-dependent on iteration 0
• Iteration 11 copies Z[21] into Z[11] -- This is anti-dependent on iteration 1
• …
Loop-Carried Dependence (6 of 7)
• Unfortunately, these anti-dependences can’t be removed by renaming
(converting into SSA form) because they are elements of an array

• This example gives us a loop-carried dependence distance of 10


• And, a dependence direction of < (which means the direction is to a future
iteration)

• Once again, for this example, the first 10 iterations can run with no
dependencies
• Then, each iteration can run so long as the iteration 10 before it has
completed
Loop-Carried Dependence (7 of 7)
• Here is a more complicated example of a loop-carried dependence:

double A[200];
for(i = 0; i < 99; i++) {
  A[2*i + 2] = A[2*i + 1];
}

• Let’s apply the GCD test


• 2*idest + 2 = 2*iuse + 1
• 2*idest - 2*iuse = -1
• Does gcd(2, 2) divide 1?
• No; there is no dependency
Greatest Common Divisor
• The greatest common divisor of a1, a2, … , an is denoted by
gcd(a1, a2, … , an)
• It is the largest integer that evenly divides all a1 through an
• Use the Euclidean Algorithm to compute GCD; see Aho, Lam, Sethi, and
Ullman, page 820 for details on the algorithm

• Theorem 11.32 in ALSU on page 819 states that


• the linear Diophantine equation
a1x1 + a2x2 + … + anxn = c
• has an integer solution for x1, x2, … , xn if and only if gcd(a1, a2, … , an) divides c

• Signs of the a terms and of c (i.e., if any of the a terms or c are negative) are
irrelevant
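• A minimal sketch of the Euclidean Algorithm in C (illustrative only; see
ALSU page 820 for the full treatment):

#include <stdlib.h>   /* for abs */

/* Greatest common divisor via the Euclidean Algorithm; signs are irrelevant
 * to the test, so absolute values are taken first. */
int gcd(int a, int b) {
  a = abs(a);
  b = abs(b);
  while (b != 0) {
    int r = a % b;    /* remainder step */
    a = b;
    b = r;
  }
  return a;
}

• Applied to the earlier example, gcd(2, 2) is 2, which does not divide 1, so
2*idest - 2*iuse = -1 has no integer solution and there is no dependence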
Eager Evaluation
• Execute code to evaluate an expression when the result is assigned
(bound) to a variable
• This is the usual evaluation methodology used in most programming
languages
• Eager evaluation is a straightforward implementation of the program
Futures/Lazy Evaluation/Call-by-Need
• Delayed evaluation until actually needed
• Most common method of evaluation in executing Haskell programs
• Sometimes operations are performed, but only a portion of the result
is needed
• Example: array inversion, but only some elements needed
• Sometimes operations are performed, but control flow means the
result may not be used
• Side-effects (e.g., input/output) must occur when expected
• May allow infinite-size data structures to be declared
• Causes the minimal amount of computation to be performed
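• Although the slides mention Haskell, the core idea can be sketched in C with
an explicit thunk that computes its value on first use and then caches it (the
names and structure below are illustrative assumptions):

#include <stdio.h>

typedef struct {
  int evaluated;           /* has the value been computed yet? */
  int value;               /* cached (memoized) result */
  int (*compute)(void);    /* the delayed computation */
} Thunk;

static int expensive(void) {
  puts("computing...");    /* side effect shows when evaluation actually happens */
  return 42;
}

/* Force a thunk: evaluate on first use, return the cached value afterwards. */
static int force(Thunk *t) {
  if (!t->evaluated) {
    t->value = t->compute();
    t->evaluated = 1;
  }
  return t->value;
}

int main(void) {
  Thunk t = { 0, 0, expensive };
  /* Nothing is computed until the value is actually needed. */
  printf("%d\n", force(&t));   /* prints "computing..." and then 42 */
  printf("%d\n", force(&t));   /* prints 42 only; the result was cached */
  return 0;
}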
Speculative Evaluation
• Execute code in advance of being needed if resources are available
• Take advantage of idle resources
• Have result immediately available, if needed
• Side-effects (e.g., input/output) either must not occur or must be able to
be reverted or undone (e.g., changing values of variables)
• Overall more computation may be performed, but the overall time to
completion of a program can be reduced
Locality of Data to Processor
• In a multi-processor system, having data local to a processor is very
important
• Data in registers is fastest
• Data in memory is an order-of-magnitude slower
• Data accessed over a network is slower
• Data in mass storage is much slower

• It is very important to appropriately locate data in a MIMD (Multiple
Instruction, Multiple Data) computer with memory local to each processing
element
Task Parallelism
• Can run larger segments of code on separate processors
• These might be different function invocations
• These might be multiple independent loops

• Easy to exploit for small-scale parallelization

• Not as attractive for large-scale parallelization as loop iteration/data
parallelism because
• There isn’t the same degree of task parallelism
• As the size of a data set increases, task parallelism doesn’t increase
• Tasks are generally of unequal size
• Not all processors are kept busy
• Need to wait for the slowest processor
Data Parallelism
• For CPU-intensive, long-running programs, there is a higher degree of
data parallelism
• As the size of a data set increases, data parallelism increases
• Tasks are generally of equal size
• Keeps all processors busy
• No need to wait for the last processor to complete
Vector/SIMD/GPU Processors
• Same operation to multiple processing elements
• SIMD == Single Instruction Multiple Data
• Compiler needs to uncover array-like operations and dole them out to
each processor
• An equally big problem is locating the data in the appropriate
processor
• What if the data is used in different ways, so that sometimes one assignment
of data to processors is appropriate and at other times a different
assignment is appropriate?
Massively-Parallel Processor (MPP)
• Extremely large number of processors (e.g., 64K)
• Exploit parallelism in large data structures
• Intended for very time-consuming computations
• Almost all very time-consuming computations deal with massive amounts of
data
• Distribute the data among the processors
• Perform (mostly) local operations on the data

• Explore C* as an example of how to program such machines


Data Flow Computation
• Present the Jack Dennis model of Data Flow Computation
