Computer Architecture
A Quantitative Approach, Sixth Edition
Chapter 3
Instruction-Level Parallelism
and Its Exploitation
Copyright © 2019, Elsevier Inc. All rights Reserved 1
Introduction
Pipelining became a universal technique in 1985
Overlaps execution of instructions
Exploits “Instruction Level Parallelism”
Beyond this, there are two main approaches:
Hardware-based dynamic approaches
Used in server and desktop processors
Not used as extensively in PMDs (personal mobile devices)
Compiler-based static approaches
Not as successful outside of scientific applications
Instruction-Level Parallelism
When exploiting instruction-level parallelism, the goal is to
minimize CPI (Cycles Per Instruction)
Pipeline CPI = Ideal pipeline CPI + Structural stalls + Data hazard
stalls + Control stalls
Where:
The ideal pipeline CPI is a measure of the maximum performance attainable by the
implementation
Structural hazards arise from resource conflicts when the hardware cannot support
all possible combinations of instructions simultaneously in overlapped execution.
Data hazards arise when an instruction depends on the results of a previous
instruction in a way that is exposed by the overlapping of instructions in the
pipeline.
Control hazards arise from the pipelining of branches and other instructions that
change the PC.
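The CPI equation above is additive, so it supports a quick back-of-the-envelope calculation. The stall counts below are illustrative assumptions, not figures from the text:

```python
# Hypothetical stall counts per 1000 instructions (assumed for illustration).
ideal_cpi = 1.0          # ideal pipeline: one instruction completes per cycle
structural = 50          # stall cycles from resource conflicts
data_hazard = 300        # stall cycles from data hazards
control = 150            # stall cycles from branches
instructions = 1000

pipeline_cpi = ideal_cpi + (structural + data_hazard + control) / instructions
print(pipeline_cpi)  # 1.5 -- each hazard class adds directly to achieved CPI
```

Reducing any one stall term lowers the achieved CPI toward the ideal.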
Parallelism within a basic block is limited
a straight-line code sequence with no branches in except to the entry
and no branches out except at the exit
Typical size of basic block = 3-6 instructions
Must optimize across branches
Data Dependence
The simplest and most common way to increase the
ILP is to exploit parallelism among iterations of a loop
Loop-Level Parallelism
Unroll loop statically or dynamically
As an alternative, use SIMD (vector processors and
GPUs)
Challenges:
Data dependency
Instruction j is data dependent on instruction i if:
Instruction i produces a result that may be used by instruction j, or
Instruction j is data dependent on instruction k, and instruction k
is data dependent on instruction i (a chain of dependences)
Dependent instructions cannot be executed
simultaneously
Dependencies are a property of programs
Pipeline organization determines if dependence
is detected and if it causes a stall
Data dependence conveys:
Possibility of a hazard
Order in which results must be calculated
Upper bound on exploitable instruction level
parallelism
Dependencies that flow through memory
locations are difficult to detect
Name Dependence
Two instructions use the same register or memory location
(the same “name”), but there is no flow of information
between them
Not a true data dependence, but is a problem when
reordering instructions
two types of name dependences between an
instruction i that precedes instruction j:
Antidependence: instruction j writes a register or
memory location that instruction i reads
Initial ordering (i before j) must be preserved
Output dependence: instruction i and instruction j
write the same register or memory location
Ordering must be preserved
To resolve, use register renaming techniques
Other Factors
A hazard occurs whenever:
there is a name or data dependence between
instructions, and
The instructions are close enough that the
overlap during execution would change the
order of access to the operand involved in the
dependence.
Solution: Preserve program order (program
should execute sequentially).
The goal of both software and hardware
techniques is to exploit parallelism by
preserving program order only where it
affects the outcome of the program
Consider two instructions i and j, with i preceding j in
program order. The possible data hazards are
Read after write (RAW):j tries to read a source before i writes it.
Write after write (WAW): j tries to write an operand before it is written
by i.
Write after read (WAR): j tries to write a destination before it is read by
i.
Control Dependence
Determines the ordering of instruction i with respect to a
branch instruction so that instruction i is executed in
correct program order and only when it should be
Instruction control dependent on a branch cannot be moved before
the branch so that its execution is no longer controlled by the
branch
An instruction not control dependent on a branch cannot be moved
after the branch so that its execution is controlled by the branch
Examples
• Example 1: or is data dependent on both add and sub
add x1,x2,x3
beq x4,x0,L
sub x1,x1,x6
L: …
or x7,x1,x8
• Example 2: assume x4 isn’t used after skip; it is possible to move sub before the branch
add x1,x2,x3
beq x12,x0,skip
sub x4,x5,x6
add x5,x4,x9
skip:
or x7,x8,x9
Compiler Techniques for Exposing ILP
Pipeline scheduling
Find sequences of unrelated instructions that
can be overlapped in the pipeline.
To avoid a pipeline stall, the execution of a
dependent instruction must be separated from
the source instruction by a distance in clock
cycles equal to the pipeline latency of that
source instruction.
A compiler’s ability to perform this
scheduling depends on:
Amount of ILP available in the program.
latencies of the functional units in the pipeline.
Example:
for (i=999; i>=0; i=i-1)
  x[i] = x[i] + s;
The loop is parallel: the body of each iteration is independent.
Pipeline Stalls
(figure omitted)
Loop Unrolling
Loop unrolling
Unroll by a factor of 4 (assume # elements is divisible by 4)
Eliminate unnecessary instructions
Loop: fld    f0,0(x1)
      fadd.d f4,f0,f2
      fsd    f4,0(x1)      //drop addi & bne
      fld    f6,-8(x1)
      fadd.d f8,f6,f2
      fsd    f8,-8(x1)     //drop addi & bne
      fld    f10,-16(x1)
      fadd.d f12,f10,f2
      fsd    f12,-16(x1)   //drop addi & bne
      fld    f14,-24(x1)
      fadd.d f16,f14,f2
      fsd    f16,-24(x1)
      addi   x1,x1,-32
      bne    x1,x2,Loop
Note the number of live registers vs. the original loop; 26 clock cycles.
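The effect of the unrolled RISC-V loop can be mimicked in Python as a loose analogy (my sketch, not the book's code): four load/add/store groups per trip around the loop, with a single copy of the loop overhead.

```python
def add_scalar_unrolled(x, s):
    """Unroll-by-4 version of 'for i: x[i] += s' (assumes len(x) % 4 == 0).
    Each trip does four independent load/add/store groups, mirroring the
    unrolled RISC-V loop: more work per branch, less loop overhead."""
    assert len(x) % 4 == 0
    i = 0
    while i < len(x):
        x[i]     += s   # group 1: fld / fadd.d / fsd
        x[i + 1] += s   # group 2
        x[i + 2] += s   # group 3
        x[i + 3] += s   # group 4
        i += 4          # one addi/bne pair per four elements
    return x
```

The four groups are independent, which is what lets the scheduled version on the next slide interleave them.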
Loop Unrolling/Pipeline Scheduling
Pipeline schedule the unrolled loop:
Loop: fld    f0,0(x1)
      fld    f6,-8(x1)
      fld    f10,-16(x1)
      fld    f14,-24(x1)
      fadd.d f4,f0,f2
      fadd.d f8,f6,f2
      fadd.d f12,f10,f2
      fadd.d f16,f14,f2
      fsd    f4,0(x1)
      fsd    f8,-8(x1)
      fsd    f12,-16(x1)
      fsd    f16,-24(x1)
      addi   x1,x1,-32
      bne    x1,x2,Loop
14 cycles: 3.5 cycles per element
Strip Mining
Unknown number of loop iterations?
(upper bound on the loop is unknown)
Number of iterations = n
Goal: make k copies of the loop body
Instead of a single unrolled loop, generate a
pair of consecutive loops:
First executes n mod k times
Second executes n / k times
“Strip mining”
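The two-loop structure can be sketched directly; the helper below is my illustration, not code from the text:

```python
def strip_mine(x, s, k=4):
    """Strip mining sketch: when the trip count n is unknown at compile
    time, run a scalar prologue n mod k times, then an unrolled loop
    n // k times with k elements per trip."""
    n = len(x)
    # First loop: n mod k leftover iterations, one element at a time.
    for i in range(n % k):
        x[i] += s
    # Second loop: n // k trips of the k-way unrolled body.
    for base in range(n % k, n, k):
        for j in range(k):
            x[base + j] += s
    return x
```

Every element is touched exactly once whether or not n is a multiple of k.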
Branch Prediction
Basic 2-bit predictor:
For each branch:
Predict taken or not taken
If the prediction is wrong two consecutive times, change prediction
Correlating predictor/two-level predictors:
Branch predictors use the behavior of other branches to make a
prediction.
Multiple 2-bit predictors for each branch
One for each possible combination of outcomes of preceding n
branches
(m,n) predictor: use the behavior of the last m branches to choose from 2^m n-bit predictors
Tournament predictor:
Combine correlating predictor with local predictor: choose among two
different predictors based on which predictor (local, global, or even
some time varying mix) was most effective in recent predictions.
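The basic 2-bit scheme above can be sketched as a saturating counter per branch; the initial counter value is an assumption of this sketch:

```python
class TwoBitPredictor:
    """2-bit saturating-counter predictor sketch (one counter per branch).
    States 0-1 predict not taken, 2-3 predict taken, so the prediction
    only flips after two consecutive mispredictions."""
    def __init__(self):
        self.counters = {}  # branch PC -> counter in 0..3

    def predict(self, pc):
        # Unseen branches start at 1 (weakly not taken) -- an assumption.
        return self.counters.get(pc, 1) >= 2  # True = predict taken

    def update(self, pc, taken):
        c = self.counters.get(pc, 1)
        self.counters[pc] = min(c + 1, 3) if taken else max(c - 1, 0)
```

For a loop branch that is mostly taken, one not-taken exit costs a single misprediction rather than two, which is the point of the second bit.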
(figure: gshare and tournament predictors)
Branch Prediction Performance
(figures omitted)
Tagged Hybrid Predictors
This class of branch predictors employs a
series of global predictors indexed with
different length histories.
Need a predictor for each branch and
history combination
Problem: this implies huge tables
Solution:
Use hash tables, whose hash value is based on
branch address and branch history
Longer histories may lead to increased chance of
hash collision, so use multiple tables with
increasingly shorter histories
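The hashing idea above (combining branch address with branch history to index a shared table) is the basis of gshare-style predictors. A sketch, where the table size, history length, and XOR hash are illustrative assumptions:

```python
class GsharePredictor:
    """Gshare-style global predictor sketch: the counter table is indexed
    by (PC XOR global history), so one modest table is shared by many
    (branch, history) pairs instead of one predictor per pair."""
    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        self.table = [1] * (1 << index_bits)  # 2-bit counters, weakly not taken
        self.history = 0                      # global branch history register

    def predict(self, pc):
        return self.table[(pc ^ self.history) & self.mask] >= 2

    def update(self, pc, taken):
        i = (pc ^ self.history) & self.mask
        self.table[i] = min(self.table[i] + 1, 3) if taken else max(self.table[i] - 1, 0)
        self.history = ((self.history << 1) | taken) & self.mask  # shift in outcome
```

Longer histories spread one branch across more table entries, which raises the collision risk the slide mentions; tagged predictors add tags and multiple history lengths to manage that.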
Dynamic Scheduling
Dynamic Scheduling: Rearrange order of instructions to
reduce stalls while maintaining data flow
Advantages:
code that was compiled with one pipeline in mind can run
efficiently on a different pipeline
Compiler doesn’t need to have knowledge of microarchitecture
Handles cases where dependencies are unknown at compile
time
allows the processor to tolerate unpredictable delays, such as
cache misses, by executing other code while waiting for the miss
to resolve
Disadvantage:
Substantial increase in hardware complexity
Complicates exceptions
A dynamically scheduled processor cannot change the
data flow; it tries to avoid stalling when dependences
are present.
Static pipeline scheduling by the compiler tries to
minimize stalls by separating dependent instructions
so that they will not lead to hazards
Dynamic scheduling implies:
Out-of-order execution
Out-of-order completion
Example 1:
fdiv.d f0,f2,f4
fadd.d f10,f0,f8
fsub.d f12,f8,f14
fsub.d is not dependent; it can issue before fadd.d
Example 2:
fdiv.d f0,f2,f4
fmul.d f6,f0,f8
fadd.d f0,f10,f14
fadd.d is not dependent, but the
antidependence makes it impossible to issue
earlier without register renaming
fmul.d and fadd.d: antidependence (Register
f0)
If fadd.d executes before fmul.d, the result is a
WAR hazard
Register Renaming
Example 3:
fdiv.d f0,f2,f4
fadd.d f6,f0,f8
fsd    f6,0(x1)
fsub.d f8,f10,f14
fmul.d f6,f10,f8
WAR hazards (antidependences): fsub.d writes f8, which fadd.d reads; fmul.d writes f6, which fsd reads
WAW hazard on f6: the fadd.d may finish later than the fmul.d
There are also three true data dependences:
between the fdiv.d and the fadd.d (f0),
between the fsub.d and the fmul.d (f8),
between the fadd.d and the fsd (f6).
Example 3: assume the existence of two
temporary registers, S and T.
fdiv.d f0,f2,f4
fadd.d S,f0,f8
fsd S,0(x1)
fsub.d T,f10,f14
fmul.d f6,f10,T
Now only RAW hazards remain, which can be strictly
ordered
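The S/T renaming above can be done mechanically. The sketch below applies hardware-style renaming, where every write gets a fresh name and later reads see the newest mapping; the text's hand-renamed version introduces temps only where needed (S, T), but the hazards removed are the same. The tuple encoding and the `p0, p1, …` names are my illustration:

```python
def rename(code):
    """Renaming sketch: give every destination a fresh name; remap each
    source to the newest name for that register. Removes all WAR and WAW
    hazards while preserving RAW (true) dependences."""
    mapping = {}   # architectural name -> current fresh name
    fresh = 0
    out = []
    for op, dst, srcs in code:
        srcs = tuple(mapping.get(r, r) for r in srcs)  # RAW links preserved
        mapping[dst] = f"p{fresh}"                      # fresh name per write
        fresh += 1
        out.append((op, mapping[dst], srcs))
    return out

# Example 3 encoded as (op, destination, sources); the store's destination
# is modeled as 'mem' purely for uniformity (an assumption of the sketch).
example3 = [
    ("fdiv.d", "f0",  ("f2", "f4")),
    ("fadd.d", "f6",  ("f0", "f8")),
    ("fsd",    "mem", ("f6",)),
    ("fsub.d", "f8",  ("f10", "f14")),
    ("fmul.d", "f6",  ("f10", "f8")),
]
```

After renaming, fmul.d reads the fsub.d's fresh name for f8, so the WAR on f8 and the WAW on f6 are gone; only the three RAW dependences remain.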
Tomasulo’s Approach
Tracks when operands are available: Minimize RAW
Introduces register renaming in hardware
Minimizes WAW and WAR hazards
It relies on two key principles:
dynamically determining when an instruction is ready to execute
renaming registers to avoid unnecessary hazards.
Register renaming is provided by reservation stations (RS)
The basic idea is that a RS fetches and buffers an operand
as soon as it is available, eliminating the need to get the
operand from a register
RS Contains:
The instruction
Buffered operand values (when available)
Reservation station number of instruction providing
the operand values
RS fetches and buffers an operand as soon as it
becomes available (not necessarily involving register file)
Pending instructions designate the RS to which they will
send their output
Result values broadcast on a result bus, called the common data
bus (CDB)
Only the last output updates the register file
As instructions are issued, the register specifiers are
renamed with the reservation station
May be more reservation stations than registers
Load and store buffers
Contain data and addresses, act like reservation stations
Tomasulo’s Algorithm
(figure omitted)
Three Steps:
Issue
Get next instruction from FIFO instruction queue
If an RS is available, issue the instruction to the RS, with operand
values if they are available
If operand values are not available, record which RS will produce them
If no RS is available, stall the instruction
Execute
If one or more of the operands is not yet available, monitor the
common data bus while waiting for it to be computed
When operand becomes available, store it in any reservation
stations waiting for it
When all operands are ready, execute the instruction
Loads and stores are maintained in program order through effective
address calculation
No instruction is allowed to initiate execution until all branches that
precede it in program order have completed
Write result
Write result on CDB into reservation stations and store buffers
(Stores must wait until address and value are received)
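The issue and write-result steps can be sketched as operations on a small reservation-station structure. This is my simplification of Tomasulo's algorithm (no execution timing, no load/store buffers), and the RS names like "Div1" are illustrative:

```python
class RS:
    """Reservation-station entry sketch: buffers operand values (Vj/Vk)
    when ready, or the tags (Qj/Qk) of the stations producing them."""
    def __init__(self, name, op):
        self.name, self.op = name, op
        self.Vj = self.Vk = None   # buffered operand values
        self.Qj = self.Qk = None   # tags of producing reservation stations

def issue(op, dst, src1, src2, regs, reg_status, rs_pool, name):
    """Issue step: read each ready operand from the register file, or record
    the tag of the RS that will produce it; then rename dst to this RS."""
    rs = RS(name, op)
    for src, v, q in ((src1, "Vj", "Qj"), (src2, "Vk", "Qk")):
        if src in reg_status:              # value still being computed
            setattr(rs, q, reg_status[src])
        else:                              # value available now
            setattr(rs, v, regs[src])
    reg_status[dst] = name                 # dst renamed to this RS
    rs_pool[name] = rs
    return rs

def write_result(name, value, regs, reg_status, rs_pool):
    """Write-result step: broadcast (tag, value) on the CDB. Waiting RSs
    capture the value; the register file updates only if this RS is still
    the newest writer of the renamed register."""
    for rs in rs_pool.values():
        if rs.Qj == name:
            rs.Vj, rs.Qj = value, None
        if rs.Qk == name:
            rs.Vk, rs.Qk = value, None
    for reg in [r for r, t in reg_status.items() if t == name]:
        regs[reg] = value
        del reg_status[reg]
    del rs_pool[name]
```

Issuing fdiv.d f0,f2,f4 followed by fadd.d f10,f0,f8 leaves the fadd.d holding the divider's tag for f0; the later CDB broadcast fills in the value without touching the register file in between.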
Example loop:
Loop: fld f0,0(x1)
fmul.d f4,f0,f2
fsd f4,0(x1)
addi x1,x1,8
bne x1,x2,Loop // branches if x1 != x2
Hardware-Based Speculation
Execute instructions along predicted
execution paths but only commit the
results if prediction was correct
Instruction commit: allowing an instruction
to update the register file when the instruction
is no longer speculative
Need an additional piece of hardware to
prevent any irrevocable action until an
instruction commits
i.e., updating state or taking an exception
Reorder Buffer
Reorder buffer – holds the result of an
instruction between completion and
commit
Four fields:
Instruction type: branch/store/register
Destination field: register number
Value field: output value
Ready field: completed execution?
Modify reservation stations:
Operand source is now the reorder buffer instead of the functional unit
Issue:
Allocate an RS and a ROB entry; read available
operands
Execute:
Begin execution when operand values are
available
Write result:
Write result and ROB tag on CDB
Commit:
When an instruction reaches the head of the ROB, update the
register file
When a mispredicted branch reaches the head of the ROB, flush
the ROB and restart execution at the correct target
Register values and memory values are
not written until an instruction commits
On misprediction:
Speculated entries in ROB are cleared
Exceptions:
Not recognized until it is ready to commit
Multiple Issue and Static Scheduling
To achieve CPI < 1, need to complete
multiple instructions per clock
Solutions:
Statically scheduled superscalar processors
VLIW (very long instruction word) processors
Dynamically scheduled superscalar
processors
Multiple Issue
(figure omitted)
VLIW Processors
Package multiple operations into one
instruction
Example VLIW processor:
One integer instruction (or branch)
Two independent floating-point operations
Two independent memory references
Must be enough parallelism in code to fill
the available slots
Disadvantages:
Statically finding parallelism
Code size
No hazard detection hardware
Binary code compatibility
Dynamic Scheduling, Multiple Issue, and Speculation
Modern microarchitectures:
Dynamic scheduling + multiple issue +
speculation
Two approaches:
Assign reservation stations and update
pipeline control table in half clock cycles
Only supports 2 instructions/clock
Design logic to handle any possible
dependencies between the instructions
Issue logic is the bottleneck in dynamically
scheduled superscalars
Overview of Design
(figure omitted)
Multiple Issue
Examine all the dependencies among the
instructions in the bundle
If dependencies exist in bundle, encode
them in reservation stations
Also need multiple completion/commit
To simplify RS allocation:
Limit the number of instructions of a given
class that can be issued in a “bundle”, e.g. one
FP, one integer, one load, one store
Example
Loop: ld x2,0(x1) //x2=array element
addi x2,x2,1 //increment x2
sd x2,0(x1) //store result
addi x1,x1,8 //increment pointer
bne x2,x3,Loop //branch if not last
Example (No Speculation)
(table omitted)
Example (Multiple Issue with Speculation)
(table omitted)
Branch-Target Buffer
Need high instruction bandwidth
Branch-Target buffers
Next PC prediction buffer, indexed by current PC
Branch Folding
Optimization:
Larger branch-target buffer
Add target instruction into buffer to deal with
longer decoding time required by larger buffer
“Branch folding”
Return Address Predictor
Most unconditional branches come from
function returns
The same procedure can be called from
multiple sites
Causes the buffer to potentially forget about
the return address from previous calls
Create return address buffer organized
as a stack
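Because calls and returns nest, a stack matches the return pattern exactly as long as it does not overflow. A sketch, where the depth of 8 is an illustrative assumption:

```python
class ReturnAddressStack:
    """Return-address predictor sketch: a small stack of return addresses.
    A call pushes the address of the following instruction; a return pops
    the predicted target. The fixed depth models the finite hardware buffer."""
    def __init__(self, depth=8):
        self.stack, self.depth = [], depth

    def on_call(self, return_pc):
        if len(self.stack) == self.depth:
            self.stack.pop(0)          # oldest entry lost when full
        self.stack.append(return_pc)

    def predict_return(self):
        # None models a stack-empty miss (fall back to another predictor).
        return self.stack.pop() if self.stack else None
```

Unlike a branch-target buffer entry, the stack does not "forget" a return address just because the procedure was called from a different site in between.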
Integrated Instruction Fetch Unit
Design monolithic unit that performs:
Branch prediction
Instruction prefetch
Fetch ahead
Instruction memory access and buffering
Deal with crossing cache lines
Register Renaming
Register renaming vs. reorder buffers
Instead of virtual registers from reservation stations and reorder
buffer, create a single register pool
Contains visible registers and virtual registers
Use hardware-based map to rename registers during issue
WAW and WAR hazards are avoided
Speculation recovery occurs by copying during commit
Still need a ROB-like queue to update table in order
Simplifies commit:
Record that mapping between architectural register and physical register is no
longer speculative
Free up physical register used to hold older value
In other words: SWAP physical registers on commit
Physical register de-allocation is more difficult
Simple approach: deallocate a physical register when the next
instruction that writes its mapped architecturally visible register commits
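The map table and free list can be sketched directly. This is my simplification (no commit pipeline; the freed register is just returned to the caller), with illustrative register names:

```python
class RenameMap:
    """Merged-register-file renaming sketch: a map table from architectural
    to physical registers plus a free list. Each write takes a fresh
    physical register; the previous mapping is reported so it can be freed
    when the new write commits (the simple policy described above)."""
    def __init__(self, arch_regs, num_phys):
        self.map = {r: i for i, r in enumerate(arch_regs)}   # initial identity map
        self.free = list(range(len(arch_regs), num_phys))    # unused physicals

    def rename(self, dst, srcs):
        srcs = [self.map[r] for r in srcs]   # read current mappings (RAW kept)
        old = self.map[dst]                  # freeable once the new write commits
        self.map[dst] = self.free.pop(0)     # fresh physical register for dst
        return self.map[dst], srcs, old
```

Two back-to-back writes of the same architectural register get two different physical registers, which is why WAW and WAR hazards cannot arise after renaming.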
Integrated Issue and Renaming
Combining instruction issue with register renaming:
Issue logic pre-reserves enough physical registers for the
bundle
Issue logic finds dependencies within bundle, maps registers
as necessary
Issue logic finds dependencies between current bundle and
already in-flight bundles, maps registers as necessary
How Much?
How much to speculate
Mis-speculation degrades performance and
power relative to no speculation
May cause additional misses (cache, TLB)
Prevent speculative code from causing
costlier misses (e.g. in L2)
Speculating through multiple branches
Complicates speculation recovery
Speculation and energy efficiency
Note: speculation is only energy efficient
when it significantly improves performance
Value Prediction
Value prediction: attempt to predict the value an instruction will produce
Uses:
Loads that load from a constant pool
Instruction that produces a value from a small set
of values
Not incorporated into modern processors
Similar idea--address aliasing prediction--is
used on some processors to determine if
two stores or a load and a store reference
the same address to allow for reordering
Fallacies and Pitfalls
It is easy to predict the performance/energy
efficiency of two different versions of the same
ISA if we hold the technology constant
Processors with lower CPIs / faster clock rates
will also be faster
Pentium 4 had higher clock, lower CPI
Itanium had same CPI, lower clock
Sometimes bigger and dumber is better
Pentium 4 and Itanium were advanced designs, but
could not achieve their peak instruction throughput
because of relatively small caches as compared to i7
And sometimes smarter is better than bigger and
dumber
TAGE branch predictor outperforms gshare with fewer
stored predictions
Believing that there are large amounts of ILP available, if only we had the right techniques