Parallelism Via Instructions: Instruction-Level Parallelism (ILP)

The document discusses different techniques for exploiting instruction-level parallelism (ILP) in processors, including pipelining, increasing pipeline depth, multiple issue, speculation, static multiple issue, and dynamic multiple issue. Pipelining overlaps instruction execution to exploit ILP. Increasing pipeline depth or using multiple issue allows more instructions to execute simultaneously. Speculation guesses instruction properties to enable earlier execution. Static multiple issue relies on compiler scheduling, while dynamic issue determines scheduling dynamically in hardware.


PARALLELISM VIA INSTRUCTIONS

Pipelining exploits parallelism among instructions, known as
instruction-level parallelism (ILP).

There are two primary methods for increasing the potential
amount of instruction-level parallelism:

1. Increasing the depth of the pipeline to overlap more
instructions.
Eg: divide our washer into three machines that perform the
wash, rinse, and spin steps of a traditional washer, turning
the four-stage laundry pipeline into a six-stage pipeline.
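The payoff of a deeper pipeline can be sketched with a simple timing model (an illustrative assumption, not from the slides: every stage takes the same time and the pipeline never stalls):

```python
def pipelined_time(n_tasks, n_stages, stage_time):
    # The first task takes n_stages * stage_time; every later task
    # completes one stage_time after the previous one.
    return (n_stages + n_tasks - 1) * stage_time

# The same 12 units of work per task, split into 4 stages vs 6 stages.
total_work = 12.0
four_stage = pipelined_time(100, 4, total_work / 4)  # 3.0 units per stage
six_stage = pipelined_time(100, 6, total_work / 6)   # 2.0 units per stage
print(four_stage, six_stage)  # the deeper pipeline finishes sooner
```

The model shows why deeper pipelines help: once the pipeline is full, tasks complete one stage-time apart, and shorter stages mean a shorter stage-time.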
PARALLELISM VIA INSTRUCTIONS
2. Multiple issue: replicate the internal components of the
computer so that it can launch multiple instructions in
every pipeline stage.
A multiple-issue laundry would replace our household
washer and dryer with, say, three washers and three
dryers.
There are two major ways to implement a multiple-issue
processor (in the compiler or in hardware):
static multiple issue
dynamic multiple issue
SPECULATION
Speculation is an approach that allows the compiler or
the processor to guess about the properties of an
instruction:
Check whether the guess was right.
If so, complete the operation.
If not, roll back and do the right thing.

Speculation is common to both static and dynamic multiple issue.

Eg:
Speculate on the outcome of a branch, so that instructions
after the branch can be executed earlier.
Speculate that a store that precedes a load does not refer
to the same address, which would allow the load to be
executed before the store.
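The guess, check, and roll-back pattern for a branch can be sketched in Python (an illustrative model only: the register file is a plain dict, and `r1` is a made-up register name):

```python
def run_with_speculation(predict_taken, actually_taken, state):
    # Snapshot the architectural state so a wrong guess can be undone.
    snapshot = dict(state)
    # Speculatively execute the predicted path before the branch resolves.
    if predict_taken:
        state["r1"] = state["r1"] + 1  # work from the predicted path
    # Later, the branch resolves; check whether the guess was right.
    if predict_taken != actually_taken:
        # Wrong guess: roll back and do the right thing.
        state.clear()
        state.update(snapshot)
        if actually_taken:
            state["r1"] = state["r1"] + 1
    return state

print(run_with_speculation(True, False, {"r1": 0}))  # {'r1': 0}
```

A correct guess lets the speculative work stand (earlier execution paid off); a wrong guess costs the roll-back plus re-execution.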
STATIC MULTIPLE-ISSUE
In static multiple-issue processors, the instructions to be
issued in a given clock cycle are chosen statically by the
compiler.

The set of instructions issued in a given clock cycle is
called an issue packet: in effect, one large instruction
with multiple operations.

This view led to the original name for this approach: the
Very Long Instruction Word (VLIW) approach.
STATIC MULTIPLE-ISSUE

If one instruction of the pair cannot be used, we require
that it be replaced with a nop.

Responsibilities of a static multiple-issue processor (that
is, of its compiler):
1. removing all hazards
2. scheduling the code
3. inserting no-ops
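The compiler's packing job can be sketched as a greedy pass over an instruction stream (an illustrative model: instruction kinds are plain tags, and hazard checking is deliberately omitted):

```python
NOP = ("nop",)

def pack(instrs):
    """Greedily pack a stream into two-slot issue packets:
    slot 0 holds an ALU/branch op, slot 1 a load/store.
    An unfillable slot gets a nop. (A sketch of the packing
    step only; a real compiler also removes hazards.)"""
    packets, i = [], 0
    while i < len(instrs):
        kind = instrs[i][0]
        if kind in ("alu", "branch"):
            # Try to pair with an immediately following load/store.
            if i + 1 < len(instrs) and instrs[i + 1][0] in ("load", "store"):
                packets.append((instrs[i], instrs[i + 1]))
                i += 2
            else:
                packets.append((instrs[i], NOP))
                i += 1
        else:
            # A load/store with no ALU partner: slot 0 becomes a nop.
            packets.append((NOP, instrs[i]))
            i += 1
    return packets

code = [("alu", "add"), ("load", "lw"), ("load", "lw"), ("alu", "sub")]
print(pack(code))
```

Four instructions become three packets here: one full packet and two half-empty ones padded with nops, which is exactly the waste static multiple issue tries to schedule away.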
STATIC MULTIPLE-ISSUE

Consider a simple two-issue MIPS processor that can issue
two instructions per cycle:
1. an integer ALU operation or branch
2. a load or store
Such a design is like that used in some embedded MIPS
processors.
STATIC MULTIPLE-ISSUE

To issue an ALU operation and a data transfer operation in
parallel, additional hardware is needed: extra ports in the
register file.

In one clock cycle we must be able to:
read two registers for the ALU operation and two more for a
store;
write one register for an ALU operation and one for a load.

Since the ALU is tied up for the ALU operation, we also
need a separate adder to calculate the effective address
for data transfers. Without these extra resources, our
two-issue pipeline would be hindered by structural hazards.
A STATIC TWO-ISSUE DATAPATH

HAZARDS IN THE TWO-ISSUE MIPS

Data hazard

Instructions in the same packet:
  add $t0, $s0, $s1
  lw  $s2, 0($t0)

Instructions split into two packets:
  packet 1: add $t0, $s0, $s1
  packet 2: lw  $s2, 0($t0)

In the example above we can't use the ALU result in a
load/store in the same packet, so the pair is split into
two packets, which is effectively a stall. (With single
issue, forwarding avoided such stalls.)
HAZARDS IN THE TWO-ISSUE MIPS
Use latency
The number of clock cycles between a load instruction and
an instruction that can use the result of the load without
stalling the pipeline.

Load-use hazard
We still have a one-cycle use latency, but now two
instructions issue per cycle, so more aggressive scheduling
is required.
SCHEDULING EXAMPLE

LOOP UNROLLING

There are three situations in which a data hazard can occur:
read after write (RAW), a true dependency
write after read (WAR), an anti-dependency
write after write (WAW), an output dependency

True data dependence:
i2 tries to read a source before i1 writes to it.
Eg:
i1. R2 <- R1 + R3
i2. R4 <- R2 + R3
LOOP UNROLLING
Antidependence:
Also called a name dependence.
i2 tries to write a destination before it is read by i1.
For example:
i1. R4 <- R1 + R5
i2. R5 <- R1 + R2
In any situation where i2 may finish before i1 (i.e., with
concurrent execution), it must be ensured that the new
value of register R5 is not stored before i1 has had a
chance to fetch its operands.
LOOP UNROLLING
Output dependency:
i2 tries to write an operand before it is written by i1.
A write-after-write (WAW) data hazard may occur in a
concurrent execution environment.
For example:
i1. R2 <- R4 + R7
i2. R2 <- R1 + R3
The write back (WB) of i2 must be delayed until i1 finishes
executing.

Register renaming:
The renaming of registers by the compiler or hardware to
remove antidependences.
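Register renaming can be sketched as a single pass that gives every write a fresh physical register (a minimal, SSA-style illustration; the register numbers are arbitrary):

```python
def rename(instrs, n_logical):
    """Rewrite (dest, src1, src2) triples so that every write gets a
    fresh physical register, removing WAR and WAW dependences.
    Logical registers are 0..n_logical-1; fresh names start above."""
    next_reg = n_logical
    current = {r: r for r in range(n_logical)}  # logical -> latest physical
    out = []
    for dest, s1, s2 in instrs:
        s1, s2 = current[s1], current[s2]  # reads see the latest version
        current[dest] = next_reg           # each write gets a new name
        out.append((next_reg, s1, s2))
        next_reg += 1
    return out

# The antidependence example: i1. R4 <- R1 + R5 ; i2. R5 <- R1 + R2
renamed = rename([(4, 1, 5), (5, 1, 2)], n_logical=8)
print(renamed)  # i2 now writes a fresh register, so i1's read of R5 is safe
```

After renaming, i2 writes a register that i1 never reads, so the two instructions can execute in either order; only true (RAW) dependences remain.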
LOOP UNROLLING

To eliminate dependences that are not true data
dependences, replicate the loop body to expose more
parallelism. This also reduces loop-control overhead.

Use different registers per replication, i.e., register
renaming:
to avoid loop-carried anti-dependences, where a store
followed by a load of the same register is merely reuse of
a register name.
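The idea of one register per replicated loop body can be illustrated in Python (a software analogue of the register renaming described above, not processor code; it assumes the list length is a multiple of 4 for brevity):

```python
def sum_unrolled(xs):
    """Sum a list with the loop body replicated four times, using a
    different accumulator per replication, so the four additions in
    each iteration are independent of one another."""
    a0 = a1 = a2 = a3 = 0
    for i in range(0, len(xs), 4):
        # Four independent accumulations: no dependence chain between them.
        a0 += xs[i]
        a1 += xs[i + 1]
        a2 += xs[i + 2]
        a3 += xs[i + 3]
    return a0 + a1 + a2 + a3

print(sum_unrolled(list(range(8))))  # 28
```

With a single accumulator, every addition would depend on the previous one; separate accumulators break that chain, which is exactly what distinct registers per replication buy the scheduler.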
LOOP UNROLLING

DYNAMIC MULTIPLE ISSUE
Dynamic multiple-issue processors are superscalar
processors:
The CPU decides whether to issue 0, 1, 2, ... instructions
each cycle, avoiding structural and data hazards.
This avoids the need for compiler scheduling, though
scheduling by the compiler may still help.
Code semantics are ensured by the CPU.

The CPU is allowed to execute instructions out of order to
avoid stalls, but it commits results to registers in order.
DYNAMIC MULTIPLE ISSUE

Example
lw   $t0, 20($s2)
addu $t1, $t0, $t2
sub  $s4, $s4, $t3
slti $t5, $s4, 20

The sub can start while the addu is waiting for the lw.

Dynamic pipeline scheduling:
Hardware support for reordering the order of instruction
execution so as to avoid stalls.
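The readiness check behind this reordering can be sketched in Python (an illustrative model: an instruction may start once none of its source registers is still being produced by an in-flight instruction):

```python
def ready(instr, pending):
    """An instruction may begin executing once none of its source
    registers is still being produced by an in-flight instruction."""
    return not any(src in pending for src in instr["srcs"])

# The example above: the lw is in flight, so $t0 is pending.
pending = {"$t0"}
addu = {"op": "addu", "srcs": ["$t0", "$t2"]}
sub = {"op": "sub", "srcs": ["$s4", "$t3"]}
print(ready(addu, pending), ready(sub, pending))  # False True
```

The addu must wait for $t0, but the sub reads only $s4 and $t3, so the hardware can start it early instead of stalling.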
DYNAMICALLY SCHEDULED CPU

DYNAMICALLY SCHEDULED CPU
Instruction fetch and decode unit: fetches instructions,
decodes them, and sends each instruction to a
corresponding functional unit for execution.
Each functional unit has buffers, called reservation
stations, which hold the operands and the operation.
When a reservation station contains all of its operands and
the functional unit is ready to execute, the result is
calculated and sent to any reservation stations waiting for
this particular result, as well as to the commit unit.
The commit unit buffers the result until it is safe to put
it into the register file or, for a store, into memory.
The buffer in the commit unit is called the reorder buffer.

Once a result is committed to the register file, it can be
fetched directly from there, just as in a normal pipeline.
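A reservation station's buffer-and-wake-up behavior can be sketched as follows (a minimal model; tags like `lw1` are made-up names for in-flight instructions):

```python
class ReservationStation:
    """Holds an operation plus operand slots; an operand slot is
    either a value or a ("wait", tag) marker naming the in-flight
    instruction that will produce it."""

    def __init__(self, op, operands):
        self.op = op
        self.operands = dict(operands)  # name -> value or ("wait", tag)

    def capture(self, tag, value):
        # A completing instruction broadcasts (tag, value); any slot
        # waiting on that tag captures the value.
        for name, slot in self.operands.items():
            if slot == ("wait", tag):
                self.operands[name] = value

    def is_ready(self):
        # Ready once no slot is still waiting on a producer.
        return all(not (isinstance(v, tuple) and v[0] == "wait")
                   for v in self.operands.values())

rs = ReservationStation("addu", {"a": ("wait", "lw1"), "b": 5})
print(rs.is_ready())  # False: still waiting on the load's result
rs.capture("lw1", 7)
print(rs.is_ready())  # True: both operands present
```

This captures the two roles the slide describes: buffering operands until all have arrived, and snooping the result broadcast so dependent instructions wake up without going through the register file.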
DYNAMICALLY SCHEDULED CPU
Reservation stations and the reorder buffer effectively
provide register renaming.
Out-of-order execution: the instructions can be executed in
a different order than they were fetched, which preserves
the data-flow order of the program.
In-order commit:
To make programs behave as if they were running on a
simple in-order pipeline,
the instruction fetch and decode unit is required to issue
instructions in order, which allows dependences to be
tracked,
and the commit unit is required to write results to
registers and memory in program fetch order.
