Computer Architecture ILP - Techniques for Increasing ILP
The document discusses instruction level parallelism (ILP) and techniques to enhance it, focusing on program correctness through exception behavior and data flow preservation. Key compiler techniques include pipeline scheduling, loop unrolling, and branch prediction, which aim to optimize execution speed and reduce stalls. Advanced branch prediction methods, such as correlating and tournament predictors, are also highlighted for improving prediction accuracy in instruction execution.
Computer Architecture (PCC CS-402)
Instruction Level Parallelism: Techniques for Increasing ILP
May 12, 2025
Essential properties for program correctness
■ Enforcing dependence relations is not strictly necessary as long as we can preserve the correctness of the program.
■ Two properties critical to program correctness (and normally preserved by maintaining both data and control dependences) are:
● Preserving exception behavior: any change in instruction order must not change the order in which exceptions are raised.
● Preserving data flow: the flow of data between instructions that produce results and the instructions that consume them. Liveness: a value that will still be read is called live; a value that is no longer used is called dead.
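The liveness idea above can be made concrete with a minimal sketch of a backward liveness scan over straight-line code (this is an illustrative toy, not a full compiler data-flow analysis; the register names mirror the slides' examples).

```python
# Backward liveness scan over straight-line code: each instruction is
# modeled as (dest, sources); dest may be None for a store.
def live_sets(instrs):
    """Return the set of values live *before* each instruction."""
    live = set()                # nothing is live after the last instruction
    before = []
    for dest, srcs in reversed(instrs):
        live.discard(dest)      # the definition kills the old value
        live.update(srcs)       # the uses make the sources live
        before.append(set(live))
    return list(reversed(before))

# F0 = load; F4 = F0 + F2; store F4  (mirrors the slides' loop body)
code = [("F0", []), ("F4", ["F0", "F2"]), (None, ["F4"])]
print(live_sets(code))          # F2 live at entry; F4 live before the store
```

Any reordering that keeps every consumer after its producer leaves these live sets, and hence the data flow, intact.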
Compiler techniques for exposing ILP
■ Loop transformation techniques to optimize a program's execution speed:
● Reduce or eliminate instructions that control the loop, e.g., pointer arithmetic and "end of loop" tests on each iteration.
● Hide latencies, e.g., the delay in reading data from memory.
● Rewrite loops as a repeated sequence of similar independent statements → a space-time tradeoff.
● Reduce branch penalties.
■ Methods:
1. Pipeline scheduling.
2. Loop unrolling.
3. Branch prediction.
1. Pipeline Scheduling
■ Pipeline stall: a delay in the execution of an instruction in an instruction pipeline, inserted to resolve a hazard. The compiler can reorder instructions to reduce the number of pipeline stalls.
■ Pipeline scheduling: separate a dependent instruction from its source instruction by the pipeline latency of the source instruction.
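The effect of such reordering can be sketched with a toy stall counter (the model and latencies are illustrative, not cycle-accurate): an in-order, single-issue pipeline waits until every source register of an instruction is ready.

```python
# Illustrative producer latencies, in cycles, per opcode (assumptions).
LATENCY = {"L.D": 2, "ADD.D": 3, "S.D": 1, "DADDUI": 1, "BNE": 1}

def count_stalls(instrs):
    """Count cycles spent waiting on operands; instrs = (op, dest, srcs)."""
    ready, cycle, stalls = {}, 0, 0
    for op, dest, srcs in instrs:
        start = max([cycle] + [ready.get(r, 0) for r in srcs])
        stalls += start - cycle        # cycles stalled on not-yet-ready sources
        cycle = start + 1              # issuing takes one cycle
        if dest is not None:
            ready[dest] = start + LATENCY[op]
    return stalls

# Original loop body vs. the scheduled version (DADDUI hoisted above
# ADD.D, store offset adjusted), as on the slides.
original = [("L.D", "F0", ["R1"]), ("ADD.D", "F4", ["F0", "F2"]),
            ("S.D", None, ["F4", "R1"]), ("DADDUI", "R1", ["R1"]),
            ("BNE", None, ["R1", "R2"])]
scheduled = [("L.D", "F0", ["R1"]), ("DADDUI", "R1", ["R1"]),
             ("ADD.D", "F4", ["F0", "F2"]), ("S.D", None, ["F4", "R1"]),
             ("BNE", None, ["R1", "R2"])]
print(count_stalls(original), count_stalls(scheduled))
```

Hoisting the independent DADDUI into the load's shadow hides one cycle of latency, so the scheduled order stalls less under this model.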
2. Loop unrolling

Loop: L.D    F0,0(R1)    // F0 = array element
      DADDUI R1,R1,#-8   // decrement pointer by 8 bytes
      ADD.D  F4,F0,F2    // add scalar in F2
      S.D    F4,8(R1)    // store result
      BNE    R1,R2,Loop  // branch if R1 != R2

■ Assume the number of elements of the array with starting address in R1 is divisible by 4.
■ Unroll by a factor of 4.
■ Eliminate unnecessary instructions.
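What the transformation does can be sketched in Python (a minimal illustration of the space-time tradeoff, not the generated assembly): the loop adds a scalar to every array element, and the unrolled version performs four element operations per loop test, assuming the length is divisible by 4 as on the slide.

```python
def add_scalar(a, s):
    """Rolled loop: one loop test per element."""
    for i in range(len(a)):
        a[i] += s
    return a

def add_scalar_unrolled(a, s):
    """Unrolled by 4: one loop test per four elements (len(a) % 4 == 0)."""
    assert len(a) % 4 == 0
    for i in range(0, len(a), 4):
        a[i] += s
        a[i + 1] += s
        a[i + 2] += s
        a[i + 3] += s
    return a

print(add_scalar_unrolled([1, 2, 3, 4], 10))  # [11, 12, 13, 14]
```

The body is four times larger (space) but three quarters of the loop-control work disappears (time), which is exactly the space-time tradeoff the bullet list describes.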
Pipeline schedule the unrolled loop
■ Pipeline scheduling reduces the number of stalls:
● Each L.D requires only one cycle, so by the time the ADD.D instructions issue, F0, F6, F10, and F14 are already loaded.
● Each ADD.D requires only two cycles, so the S.D instructions can proceed without stalling.
● The array pointer is updated after the first two S.D instructions, so the loop-control BNE can proceed immediately after the last two S.D (whose offsets are adjusted to compensate).

Loop: L.D    F0,0(R1)
      L.D    F6,-8(R1)
      L.D    F10,-16(R1)
      L.D    F14,-24(R1)
      ADD.D  F4,F0,F2
      ADD.D  F8,F6,F2
      ADD.D  F12,F10,F2
      ADD.D  F16,F14,F2
      S.D    F4,0(R1)
      S.D    F8,-8(R1)
      DADDUI R1,R1,#-32
      S.D    F12,16(R1)
      S.D    F16,8(R1)
      BNE    R1,R2,Loop

Loop unrolling & scheduling summary
■ Use different registers to avoid unnecessary constraints.
■ Adjust the loop termination and iteration code.
■ Determine whether the loop iterations are independent except for the loop maintenance code; if so, unroll the loop.
■ Analyze memory addresses to determine whether loads and stores from different iterations are independent; if so, interchange loads and stores in the unrolled loop.
■ Schedule the code while ensuring correctness.
■ Limitations of loop unrolling:
● Diminishing returns: each additional unroll removes less of the remaining loop overhead.
● Growth of the code size.
● Register pressure (shortage of registers) → scheduling to increase ILP increases the number of simultaneously live values and thus the number of registers needed.

3. Branch prediction
■ Guess whether a conditional jump will be taken or not.
■ Improves the flow in the instruction pipeline.
■ The branch that is guessed to be the most likely is fetched and speculatively executed.
■ If the guess later turns out to be wrong, the speculatively executed or partially executed instructions are discarded and the pipeline restarts with the correct branch, incurring a delay.
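A common textbook scheme for this guess is a 2-bit saturating-counter predictor, sketched below (the table size, initial state, and PC hashing are illustrative assumptions): a counter of 2 or more predicts taken, and two wrong outcomes in a row are needed to flip a strong prediction.

```python
class TwoBitPredictor:
    """2-bit saturating counters indexed by (a hash of) the branch PC."""

    def __init__(self, entries=1024):
        self.table = [1] * entries       # start in "weakly not taken"
        self.entries = entries

    def predict(self, pc):
        return self.table[pc % self.entries] >= 2   # True = predict taken

    def update(self, pc, taken):
        i = pc % self.entries
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)

# A loop branch taken 9 times, then falling through: the predictor
# mispredicts only while warming up and on the final exit.
p = TwoBitPredictor()
misses = 0
for taken in [True] * 9 + [False]:
    if p.predict(0x40) != taken:
        misses += 1
    p.update(0x40, taken)
print(misses)  # 2
```

The hysteresis of the second bit is why a loop exit costs only one misprediction per execution of the loop, instead of two as with a single-bit predictor.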
Branch prediction
■ The branch predictor keeps a record of whether branches are taken or not taken. When it encounters a conditional jump that has been seen several times before, it can base the prediction on that history. The branch predictor may, for example, recognize that the conditional jump is taken more often than not, or that it is taken every second time.
■ Note: not to be confused with branch target prediction → guessing the target of a taken conditional or unconditional jump before it is computed by decoding and executing the instruction itself. Both are often combined into the same circuitry.

Advanced branch prediction
■ Correlating branch predictors (or two-level predictors): use the outcomes of the most recent branches to make the prediction.
● A correlating predictor has fewer misses than a simple predictor of the same size.
● A correlating predictor has fewer misses than a simple predictor with an unlimited number of entries.
■ Tournament predictors: run multiple predictors and hold a tournament between them; use the most successful one.
● Combine two predictors: a global-information-based predictor and a local-information-based predictor.
● Use a selector to choose between the predictors.
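The tournament structure can be sketched as follows (sizes, history length, and the dictionary-based tables are illustrative simplifications): a global predictor correlates on recent branch history, a local predictor tracks each branch in isolation, and a per-branch 2-bit selector is trained toward whichever component has been right more often.

```python
class TournamentPredictor:
    """Toy tournament predictor: global-history + local 2-bit components."""

    def __init__(self, hist_bits=4):
        self.ghist = 0                    # recent global branch outcomes
        self.mask = (1 << hist_bits) - 1
        self.global_t = {}                # (pc, history) -> 2-bit counter
        self.local_t = {}                 # pc -> 2-bit counter
        self.select = {}                  # pc -> 2-bit chooser (>= 2: global)

    def _pred(self, table, key):
        return table.get(key, 1) >= 2     # counters start weakly not taken

    def predict(self, pc):
        g = self._pred(self.global_t, (pc, self.ghist))
        l = self._pred(self.local_t, pc)
        return g if self.select.get(pc, 2) >= 2 else l

    def update(self, pc, taken):
        g = self._pred(self.global_t, (pc, self.ghist))
        l = self._pred(self.local_t, pc)
        if g != l:                        # train the selector toward the winner
            d = 1 if g == taken else -1
            self.select[pc] = min(3, max(0, self.select.get(pc, 2) + d))
        for table, key in ((self.global_t, (pc, self.ghist)),
                           (self.local_t, pc)):
            c = table.get(key, 1)         # train both component predictors
            table[key] = min(3, c + 1) if taken else max(0, c - 1)
        self.ghist = ((self.ghist << 1) | int(taken)) & self.mask

# A strictly alternating branch defeats a plain 2-bit counter, but the
# global component learns the pattern and the selector settles on it.
tp = TournamentPredictor()
for i in range(12):
    taken = i % 2 == 0
    guess = tp.predict(0x80)
    tp.update(0x80, taken)
```

This is the "taken every second time" case from the slide: the local counter keeps oscillating, while the global history distinguishes the two alternating contexts and predicts each perfectly once warmed up.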