CS 352H: Computer Systems Architecture

Topic 10: Instruction Level Parallelism (ILP)

October 6-8, 2009

University of Texas at Austin, CS352H: Computer Systems Architecture, Fall 2009, Don Fussell

This lecture discusses techniques for increasing instruction-level parallelism (ILP) in processors: deeper pipelining, multiple issue (static and dynamic), speculation, loop unrolling, and dynamic pipeline scheduling. The key techniques are executing multiple instructions simultaneously through deeper pipelining with multiple functional units, issuing instructions out of order to avoid stalls while preserving program semantics, and executing instructions speculatively to hide latencies.
Instruction-Level Parallelism (ILP)

Pipelining: executing multiple instructions in parallel.

To increase ILP:
- Deeper pipeline: less work per stage ⇒ shorter clock cycle
- Multiple issue: replicate pipeline stages ⇒ multiple pipelines; start multiple instructions per clock cycle
  - CPI < 1, so use instructions per cycle (IPC) instead
  - E.g., a 4 GHz, 4-way multiple-issue machine peaks at 16 BIPS: peak CPI = 0.25, peak IPC = 4
  - But dependencies reduce this in practice

Multiple Issue

Static multiple issue:
- Compiler groups instructions to be issued together and packages them into "issue slots"
- Compiler detects and avoids hazards

Dynamic multiple issue:
- CPU examines the instruction stream and chooses instructions to issue each cycle
- Compiler can help by reordering instructions
- CPU resolves hazards using advanced techniques at runtime

Speculation

"Guess" what to do with an instruction:
- Start the operation as soon as possible
- Check whether the guess was right
  - If so, complete the operation
  - If not, roll back and do the right thing
- Common to static and dynamic multiple issue

Examples:
- Speculate on branch outcome: roll back if the path taken is different
- Speculate on a load: roll back if the location is updated

Compiler/Hardware Speculation

Compiler can reorder instructions:
- e.g., move a load before a branch (sketched below)
- Can include "fix-up" instructions to recover from an incorrect guess

Hardware can look ahead for instructions to execute:
- Buffer results until it determines they are actually needed
- Flush buffers on incorrect speculation
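
A minimal sketch of the compiler side, with hypothetical registers and labels (not from the slides): the load is hoisted above the branch that used to precede it, hiding its latency when the guess is right.

      # Before: the load waits behind the branch
      #       beq  $t1, $t2, Else
      #       lw   $t0, 0($s0)     # long-latency load on the fall-through path
      # After: the compiler hoists the load above the branch
              lw   $t0, 0($s0)     # speculative load: starts before the branch resolves
              beq  $t1, $t2, Else  # guess: fall-through path is taken
              ...                  # right guess: $t0 is ready when needed
      Else:   ...                  # wrong guess: $t0 is simply unused here; if the
                                   # load had overwritten a live register, fix-up
                                   # code would restore it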

Speculation and Exceptions

What if an exception occurs on a speculatively executed instruction?
- e.g., a speculative load before a null-pointer check

Static speculation:
- Can add ISA support for deferring exceptions (sketched below)

Dynamic speculation:
- Can buffer exceptions until instruction completion (which may not occur)
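
A minimal sketch of deferred exceptions, using hypothetical mnemonics (lw.s, chk.s) modeled loosely on speculative-load/check designs such as IA-64; these are not real MIPS instructions:

              lw.s  $t0, 0($s0)      # hypothetical speculative load: a fault
                                     # poisons $t0 instead of raising an exception
              beq   $s0, $zero, Skip # the null-pointer check the load was hoisted above
              chk.s $t0, Fixup       # hypothetical check: if $t0 is poisoned,
                                     # branch to fix-up code that redoes the load
      Skip:   ...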

Static Multiple Issue

Compiler groups instructions into "issue packets":
- A group of instructions that can be issued in a single cycle
- Determined by the pipeline resources required

Think of an issue packet as a very long instruction specifying multiple concurrent operations ⇒ Very Long Instruction Word (VLIW).

Scheduling Static Multiple Issue

Compiler must remove some/all hazards:
- Reorder instructions into issue packets
- No dependencies within a packet
- Possibly some dependencies between packets
  - This varies between ISAs; the compiler must know the rules!
- Pad with nop if necessary

MIPS with Static Dual Issue

Two-issue packets:
- One ALU/branch instruction
- One load/store instruction
- 64-bit aligned
  - ALU/branch first, then load/store
  - Pad an unused slot with nop

Address   Instruction type   Pipeline stages
n         ALU/branch         IF  ID  EX  MEM WB
n + 4     Load/store         IF  ID  EX  MEM WB
n + 8     ALU/branch             IF  ID  EX  MEM WB
n + 12    Load/store             IF  ID  EX  MEM WB
n + 16    ALU/branch                 IF  ID  EX  MEM WB
n + 20    Load/store                 IF  ID  EX  MEM WB
MIPS with Static Dual Issue

[Figure: the static dual-issue MIPS datapath.]
Hazards in the Dual-Issue MIPS

More instructions executing in parallel means more hazards.

EX data hazard:
- Forwarding avoided stalls in the single-issue pipeline
- Now an ALU result can't be used by a load/store in the same packet:
      add $t0, $s0, $s1
      lw  $s2, 0($t0)
- These must be split into two packets, effectively a stall (sketched below)

Load-use hazard:
- Still one cycle of use latency, but it now covers two instructions
- More aggressive scheduling is required
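
A minimal sketch of the packet split for the add/lw pair above, following the slot format of the earlier slides (nop fills each unused slot):

      ALU/branch slot         Load/store slot
      add $t0, $s0, $s1       nop                # packet 1: produce $t0
      nop                     lw  $s2, 0($t0)    # packet 2: $t0 forwarded to the load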

Scheduling Example

Schedule this for the dual-issue MIPS:

Loop: lw   $t0, 0($s1)        # $t0 = array element
      addu $t0, $t0, $s2      # add scalar in $s2
      sw   $t0, 0($s1)        # store result
      addi $s1, $s1, -4       # decrement pointer
      bne  $s1, $zero, Loop   # branch if $s1 != 0

      ALU/branch              Load/store          Cycle
Loop: nop                     lw $t0, 0($s1)      1
      addi $s1, $s1, -4       nop                 2
      addu $t0, $t0, $s2      nop                 3
      bne  $s1, $zero, Loop   sw $t0, 4($s1)      4

(The sw offset becomes 4($s1) because the addi that decrements $s1 has been moved above it.)

IPC = 5/4 = 1.25 (cf. peak IPC = 2)

Loop Unrolling

Replicate the loop body to expose more parallelism:
- Reduces loop-control overhead

Use different registers per replication:
- Called "register renaming"
- Avoids loop-carried "anti-dependencies": a store followed by a load of the same register
- Aka "name dependence": reuse of a register name

(The unrolled loop is sketched below; the scheduled version follows on the next slide.)
Loop Unrolling Example

      ALU/branch              Load/store          Cycle
Loop: addi $s1, $s1, -16      lw $t0, 0($s1)      1
      nop                     lw $t1, 12($s1)     2
      addu $t0, $t0, $s2      lw $t2, 8($s1)      3
      addu $t1, $t1, $s2      lw $t3, 4($s1)      4
      addu $t2, $t2, $s2      sw $t0, 16($s1)     5
      addu $t3, $t3, $s2      sw $t1, 12($s1)     6
      nop                     sw $t2, 8($s1)      7
      bne  $s1, $zero, Loop   sw $t3, 4($s1)      8

IPC = 14/8 = 1.75

Closer to the peak of 2, but at the cost of extra registers and code size.

Dynamic Multiple Issue

"Superscalar" processors:
- CPU decides whether to issue 0, 1, 2, … instructions each cycle, avoiding structural and data hazards
- Avoids the need for compiler scheduling
  - Though it may still help
- Code semantics are ensured by the CPU

Dynamic Pipeline Scheduling

Allow the CPU to execute instructions out of order to avoid stalls, but commit results to registers in order.

Example:
      lw   $t0, 20($s2)
      addu $t1, $t0, $t2     # must wait for the lw to produce $t0
      sub  $s4, $s4, $t3     # independent of the load
      slti $t5, $s4, 20

The CPU can start the sub while the addu is waiting for the lw.

Dynamically Scheduled CPU

[Figure: block diagram of a dynamically scheduled CPU. An in-order issue unit preserves dependencies as it dispatches instructions to reservation stations, which hold pending operands. Functional-unit results are also sent to any waiting reservation stations. A reorder buffer holds results for in-order register writes and can supply operands for issued instructions.]
Register Renaming

Reservation stations and the reorder buffer effectively provide register renaming (sketched below).

On instruction issue to a reservation station:
- If an operand is available in the register file or reorder buffer:
  - It is copied to the reservation station
  - It is no longer required in the register, which can be overwritten
- If an operand is not yet available:
  - It will be provided to the reservation station by a functional unit
  - A register update may not be required
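
A minimal sketch (hypothetical code, not from the slides) of the name dependences that renaming removes; reusing $t0 creates WAR/WAW hazards even though no data flows between the two uses:

      lw   $t0, 0($s1)       # first use of the name $t0
      addu $t1, $t0, $s2     # reads $t0 (RAW: a real dependence)
      lw   $t0, 4($s1)       # reuses the name $t0: WAW with the first lw,
                             # WAR with the addu, but no real data flow
      addu $t2, $t0, $s2

With renaming, the two lw results live in different physical locations (reservation-station/reorder-buffer entries), so the second lw can execute before the addu has read its operand.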

Speculation

Predict branches and continue issuing:
- Don't commit until the branch outcome is determined

Load speculation:
- Avoid load and cache-miss delay
  - Predict the effective address
  - Predict the loaded value
  - Load before completing outstanding stores
  - Bypass stored values to the load unit (sketched below)
- Don't commit the load until the speculation is cleared
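
A minimal sketch of store-to-load bypassing (hypothetical code, not from the slides): once the addresses are known to match, the load's value can come straight from the buffered store, before the store reaches memory:

      sw   $t0, 0($s1)       # store still buffered, not yet committed to memory
      lw   $t1, 0($s1)       # same address: the store buffer can bypass $t0
                             # directly to the load unit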

Why Do Dynamic Scheduling?

Why not just let the compiler schedule code?
- Not all stalls are predictable
  - e.g., cache misses
- Can't always schedule around branches
  - Branch outcome is dynamically determined
- Different implementations of an ISA have different latencies and hazards

Does Multiple Issue Work?

Yes, but not as much as we'd like:
- Programs have real dependencies that limit ILP
- Some dependencies are hard to eliminate
  - e.g., pointer aliasing (sketched below)
- Some parallelism is hard to expose
  - Limited window size during instruction issue
- Memory delays and limited bandwidth
  - Hard to keep pipelines full
- Speculation can help if done well
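
A minimal illustration of the aliasing problem (hypothetical code, not from the slides): neither the compiler nor the hardware can safely move the load above the store unless it can prove the two addresses differ:

      sw   $t0, 0($s3)       # store through one pointer
      lw   $t1, 0($s4)       # load through another pointer; if $s4 == $s3,
                             # the load must see the stored value, so it
                             # cannot be hoisted above the sw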

Power Efficiency

Complexity of dynamic scheduling and speculation requires power. Multiple simpler cores may be better.

Microprocessor   Year  Clock rate  Pipeline  Issue  Out-of-order/  Cores  Power
                                   stages    width  speculation
i486             1989  25 MHz      5         1      No             1      5 W
Pentium          1993  66 MHz      5         2      No             1      10 W
Pentium Pro      1997  200 MHz     10        3      Yes            1      29 W
P4 Willamette    2001  2000 MHz    22        3      Yes            1      75 W
P4 Prescott      2004  3600 MHz    31        3      Yes            1      103 W
Core             2006  2930 MHz    14        4      Yes            2      75 W
UltraSparc III   2003  1950 MHz    14        4      No             1      90 W
UltraSparc T1    2005  1200 MHz    6         1      No             8      70 W
The Opteron X4 Microarchitecture

[Figure: Opteron X4 microarchitecture block diagram; 72 physical registers.]
The Opteron X4 Pipeline Flow

[Figure: pipeline flow for integer operations. FP is 5 stages longer.]

- Up to 106 RISC-ops in progress

Bottlenecks:
- Complex instructions with long dependencies
- Branch mispredictions
- Memory access delays
Concluding Remarks

Multiple issue and dynamic scheduling (ILP):
- Dependencies limit achievable parallelism
- Complexity leads to the power wall

