Parallelism Via Instructions: Instruction-Level Parallelism (ILP)

The document discusses different techniques for exploiting instruction-level parallelism (ILP) in processors, including pipelining, increasing pipeline depth, multiple issue, speculation, static multiple issue, and dynamic multiple issue. Pipelining overlaps instruction execution to exploit ILP. Increasing pipeline depth or using multiple issue allows more instructions to execute simultaneously. Speculation guesses instruction properties to enable earlier execution. Static multiple issue relies on compiler scheduling, while dynamic issue determines scheduling dynamically in hardware.


PARALLELISM VIA INSTRUCTIONS

Pipelining exploits parallelism among instructions, known as
instruction-level parallelism (ILP).

There are two primary methods for increasing the potential
amount of instruction-level parallelism:

1. Increasing the depth of the pipeline to overlap more
instructions.
Eg: divide our washer into three machines that perform the
wash, rinse, and spin steps of a traditional washer, turning
the four-stage laundry pipeline into a six-stage pipeline.
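The payoff of a deeper pipeline can be sketched with a simple timing model (an illustrative assumption, not from the slides: every stage takes the same time and the pipeline never stalls):

```python
def pipelined_time(n_tasks, n_stages, stage_time):
    # The first task takes n_stages * stage_time; every later task
    # completes one stage_time after the previous one.
    return (n_stages + n_tasks - 1) * stage_time

# The same 12 units of work per task, split into 4 stages vs 6 stages.
total_work = 12.0
four_stage = pipelined_time(100, 4, total_work / 4)  # 3.0 units per stage
six_stage = pipelined_time(100, 6, total_work / 6)   # 2.0 units per stage
print(four_stage, six_stage)  # the deeper pipeline finishes sooner
```

The model shows why deeper pipelines help: once the pipeline is full, tasks complete one stage-time apart, and shorter stages mean a shorter stage-time.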
PARALLELISM VIA INSTRUCTIONS
2. Multiple issue: replicate the internal components of the
computer so that it can launch multiple instructions in
every pipeline stage.
A multiple-issue laundry would replace our household
washer and dryer with, say, three washers and three
dryers.
There are two major ways to implement a multiple-issue
processor (in the compiler or in hardware):
static multiple issue
dynamic multiple issue
SPECULATION
Speculation is an approach that allows the compiler or
the processor to guess about the properties of an
instruction:
Check whether the guess was right.
If so, complete the operation.
If not, roll back and do the right thing.

Speculation is common to both static and dynamic multiple issue.

Eg:
Speculate on the outcome of a branch, so that instructions
after the branch can be executed earlier.
Speculate that a store that precedes a load does not refer
to the same address, which would allow the load to be
executed before the store.
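The guess, check, and roll-back pattern for a branch can be sketched in Python (an illustrative model only: the register file is a plain dict, and `r1` is a made-up register name):

```python
def run_with_speculation(predict_taken, actually_taken, state):
    # Snapshot the architectural state so a wrong guess can be undone.
    snapshot = dict(state)
    # Speculatively execute the predicted path before the branch resolves.
    if predict_taken:
        state["r1"] = state["r1"] + 1  # work from the predicted path
    # Later, the branch resolves; check whether the guess was right.
    if predict_taken != actually_taken:
        # Wrong guess: roll back and do the right thing.
        state.clear()
        state.update(snapshot)
        if actually_taken:
            state["r1"] = state["r1"] + 1
    return state

print(run_with_speculation(True, False, {"r1": 0}))  # {'r1': 0}
```

A correct guess lets the speculative work stand (earlier execution paid off); a wrong guess costs the roll-back plus re-execution.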
STATIC MULTIPLE-ISSUE
In static multiple-issue processors, the instructions to be
issued in a given clock cycle are chosen statically by the
compiler.

The set of instructions issued in a given clock cycle is
called an issue packet: in effect, one large instruction
with multiple operations.

This view led to the original name for this approach: the
Very Long Instruction Word (VLIW) approach.
STATIC MULTIPLE-ISSUE

If one instruction of the pair cannot be used, we require
that it be replaced with a nop.

Responsibilities of a static multiple-issue processor (that
is, of its compiler):
1. removing all hazards
2. scheduling the code
3. inserting no-ops
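The compiler's packing job can be sketched as a greedy pass over an instruction stream (an illustrative model: instruction kinds are plain tags, and hazard checking is deliberately omitted):

```python
NOP = ("nop",)

def pack(instrs):
    """Greedily pack a stream into two-slot issue packets:
    slot 0 holds an ALU/branch op, slot 1 a load/store.
    An unfillable slot gets a nop. (A sketch of the packing
    step only; a real compiler also removes hazards.)"""
    packets, i = [], 0
    while i < len(instrs):
        kind = instrs[i][0]
        if kind in ("alu", "branch"):
            # Try to pair with an immediately following load/store.
            if i + 1 < len(instrs) and instrs[i + 1][0] in ("load", "store"):
                packets.append((instrs[i], instrs[i + 1]))
                i += 2
            else:
                packets.append((instrs[i], NOP))
                i += 1
        else:
            # A load/store with no ALU partner: slot 0 becomes a nop.
            packets.append((NOP, instrs[i]))
            i += 1
    return packets

code = [("alu", "add"), ("load", "lw"), ("load", "lw"), ("alu", "sub")]
print(pack(code))
```

Four instructions become three packets here: one full packet and two half-empty ones padded with nops, which is exactly the waste static multiple issue tries to schedule away.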
STATIC MULTIPLE-ISSUE

Consider a simple two-issue MIPS processor that can issue
two instructions per cycle:
1. an integer ALU operation or branch
2. a load or store
Such a design is like that used in some embedded MIPS
processors.
STATIC MULTIPLE-ISSUE

To issue an ALU operation and a data transfer operation in
parallel, additional hardware is needed: extra ports in the
register file.

In one clock cycle we must be able to:
read two registers for the ALU operation and two more for a
store;
write one register for an ALU operation and one for a load.

Since the ALU is tied up for the ALU operation, we also
need a separate adder to calculate the effective address
for data transfers. Without these extra resources, our
two-issue pipeline would be hindered by structural hazards.
A STATIC TWO-ISSUE DATAPATH

HAZARDS IN THE TWO-ISSUE MIPS

Data hazard

Instructions in the same packet:
  add $t0, $s0, $s1
  lw  $s2, 0($t0)

Instructions split into two packets:
  packet 1: add $t0, $s0, $s1
  packet 2: lw  $s2, 0($t0)

In the example above we can't use the ALU result in a
load/store in the same packet, so the pair is split into
two packets, which is effectively a stall. (With single
issue, forwarding avoided such stalls.)
HAZARDS IN THE TWO-ISSUE MIPS
Use latency
The number of clock cycles between a load instruction and
an instruction that can use the result of the load without
stalling the pipeline.

Load-use hazard
We still have a one-cycle use latency, but now two
instructions issue per cycle, so more aggressive scheduling
is required.
SCHEDULING EXAMPLE

LOOP UNROLLING

There are three situations in which a data hazard can occur:
read after write (RAW), a true dependency
write after read (WAR), an anti-dependency
write after write (WAW), an output dependency

True data dependence:
i2 tries to read a source before i1 writes to it.
Eg:
i1. R2 <- R1 + R3
i2. R4 <- R2 + R3
LOOP UNROLLING
Antidependence:
Also called a name dependence.
i2 tries to write a destination before it is read by i1.
For example:
i1. R4 <- R1 + R5
i2. R5 <- R1 + R2
In any situation where i2 may finish before i1 (i.e., with
concurrent execution), it must be ensured that the new
value of register R5 is not stored before i1 has had a
chance to fetch its operands.
LOOP UNROLLING
Output dependency:
i2 tries to write an operand before it is written by i1.
A write-after-write (WAW) data hazard may occur in a
concurrent execution environment.
For example:
i1. R2 <- R4 + R7
i2. R2 <- R1 + R3
The write back (WB) of i2 must be delayed until i1 finishes
executing.

Register renaming:
The renaming of registers by the compiler or hardware to
remove antidependences.
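Register renaming can be sketched as a single pass that gives every write a fresh physical register (a minimal, SSA-style illustration; the register numbers are arbitrary):

```python
def rename(instrs, n_logical):
    """Rewrite (dest, src1, src2) triples so that every write gets a
    fresh physical register, removing WAR and WAW dependences.
    Logical registers are 0..n_logical-1; fresh names start above."""
    next_reg = n_logical
    current = {r: r for r in range(n_logical)}  # logical -> latest physical
    out = []
    for dest, s1, s2 in instrs:
        s1, s2 = current[s1], current[s2]  # reads see the latest version
        current[dest] = next_reg           # each write gets a new name
        out.append((next_reg, s1, s2))
        next_reg += 1
    return out

# The antidependence example: i1. R4 <- R1 + R5 ; i2. R5 <- R1 + R2
renamed = rename([(4, 1, 5), (5, 1, 2)], n_logical=8)
print(renamed)  # i2 now writes a fresh register, so i1's read of R5 is safe
```

After renaming, i2 writes a register that i1 never reads, so the two instructions can execute in either order; only true (RAW) dependences remain.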
LOOP UNROLLING

To eliminate dependences that are not true data
dependences, replicate the loop body to expose more
parallelism. This also reduces loop-control overhead.

Use different registers per replication, i.e., register
renaming:
to avoid loop-carried anti-dependences, where a store
followed by a load of the same register is merely reuse of
a register name.
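The idea of one register per replicated loop body can be illustrated in Python (a software analogue of the register renaming described above, not processor code; it assumes the list length is a multiple of 4 for brevity):

```python
def sum_unrolled(xs):
    """Sum a list with the loop body replicated four times, using a
    different accumulator per replication, so the four additions in
    each iteration are independent of one another."""
    a0 = a1 = a2 = a3 = 0
    for i in range(0, len(xs), 4):
        # Four independent accumulations: no dependence chain between them.
        a0 += xs[i]
        a1 += xs[i + 1]
        a2 += xs[i + 2]
        a3 += xs[i + 3]
    return a0 + a1 + a2 + a3

print(sum_unrolled(list(range(8))))  # 28
```

With a single accumulator, every addition would depend on the previous one; separate accumulators break that chain, which is exactly what distinct registers per replication buy the scheduler.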
LOOP UNROLLING

DYNAMIC MULTIPLE ISSUE
Dynamic multiple-issue processors are superscalar
processors:
The CPU decides whether to issue 0, 1, 2, ... instructions
each cycle, avoiding structural and data hazards.
This avoids the need for compiler scheduling, though
scheduling by the compiler may still help.
Code semantics are ensured by the CPU.

The CPU is allowed to execute instructions out of order to
avoid stalls, but it commits results to registers in order.
DYNAMIC MULTIPLE ISSUE

Example
lw   $t0, 20($s2)
addu $t1, $t0, $t2
sub  $s4, $s4, $t3
slti $t5, $s4, 20

The sub can start while the addu is waiting for the lw.

Dynamic pipeline scheduling:
Hardware support for reordering the order of instruction
execution so as to avoid stalls.
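The readiness check behind this reordering can be sketched in Python (an illustrative model: an instruction may start once none of its source registers is still being produced by an in-flight instruction):

```python
def ready(instr, pending):
    """An instruction may begin executing once none of its source
    registers is still being produced by an in-flight instruction."""
    return not any(src in pending for src in instr["srcs"])

# The example above: the lw is in flight, so $t0 is pending.
pending = {"$t0"}
addu = {"op": "addu", "srcs": ["$t0", "$t2"]}
sub = {"op": "sub", "srcs": ["$s4", "$t3"]}
print(ready(addu, pending), ready(sub, pending))  # False True
```

The addu must wait for $t0, but the sub reads only $s4 and $t3, so the hardware can start it early instead of stalling.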
DYNAMICALLY SCHEDULED CPU

DYNAMICALLY SCHEDULED CPU
Instruction fetch and decode unit: fetches instructions,
decodes them, and sends each instruction to a
corresponding functional unit for execution.
Each functional unit has buffers, called reservation
stations, which hold the operands and the operation.
When a reservation station contains all of its operands and
the functional unit is ready to execute, the result is
calculated and sent to any reservation stations waiting for
this particular result, as well as to the commit unit.
The commit unit buffers the result until it is safe to put
it into the register file or, for a store, into memory.
The buffer in the commit unit is called the reorder buffer.

Once a result is committed to the register file, it can be
fetched directly from there, just as in a normal pipeline.
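A reservation station's buffer-and-wake-up behavior can be sketched as follows (a minimal model; tags like `lw1` are made-up names for in-flight instructions):

```python
class ReservationStation:
    """Holds an operation plus operand slots; an operand slot is
    either a value or a ("wait", tag) marker naming the in-flight
    instruction that will produce it."""

    def __init__(self, op, operands):
        self.op = op
        self.operands = dict(operands)  # name -> value or ("wait", tag)

    def capture(self, tag, value):
        # A completing instruction broadcasts (tag, value); any slot
        # waiting on that tag captures the value.
        for name, slot in self.operands.items():
            if slot == ("wait", tag):
                self.operands[name] = value

    def is_ready(self):
        # Ready once no slot is still waiting on a producer.
        return all(not (isinstance(v, tuple) and v[0] == "wait")
                   for v in self.operands.values())

rs = ReservationStation("addu", {"a": ("wait", "lw1"), "b": 5})
print(rs.is_ready())  # False: still waiting on the load's result
rs.capture("lw1", 7)
print(rs.is_ready())  # True: both operands present
```

This captures the two roles the slide describes: buffering operands until all have arrived, and snooping the result broadcast so dependent instructions wake up without going through the register file.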
DYNAMICALLY SCHEDULED CPU
Reservation stations and the reorder buffer effectively
provide register renaming.
Out-of-order execution: the instructions can be executed in
a different order than they were fetched, which preserves
the data-flow order of the program.
In-order commit:
To make programs behave as if they were running on a
simple in-order pipeline,
the instruction fetch and decode unit is required to issue
instructions in order, which allows dependences to be
tracked,
and the commit unit is required to write results to
registers and memory in program fetch order.
