
Computer Architecture

(PCC CS-402)
Instruction Level Parallelism
Techniques for increasing ILP

May 12, 2025


Essential properties for program correctness
■ Enforcing every dependence is not strictly necessary,
as long as the correctness of the program is preserved.
■ Two properties critical to program correctness (and
normally preserved by maintaining both data and
control dependences) are:
● Preserving exception behavior: any change in
instruction order must not change the order in which
exceptions are raised.
● Preserving data flow: the flow of data between
instructions that produce results and those that consume them.
Liveness: data that is still needed is called live; data that is
no longer used is called dead.



Compiler techniques for exposing ILP
■ Loop transformation techniques to optimize a
program's execution speed:
● Reduce or eliminate instructions that control the loop,
e.g., pointer arithmetic and "end of loop" tests on each
iteration.
● Hide latencies, e.g., the delay in reading data from
memory.
● Rewrite loops as a repeated sequence of similar
independent statements → a space-time tradeoff.
● Reduce branch penalties.
■ Methods:
1. Pipeline scheduling.
2. Loop unrolling.
3. Branch prediction.



1. Pipeline Scheduling
■ Pipeline stall: a delay in the execution of an instruction in
an instruction pipeline in order to resolve a hazard.
The compiler can reorder instructions to reduce the
number of pipeline stalls.
■ Pipeline scheduling: separate a dependent instruction
from its source instruction by at least the pipeline latency of
the source instruction.



2. Loop unrolling

Loop: L.D    F0,0(R1)     // F0 = array element
      DADDUI R1,R1,#-8    // decrement pointer by 8 bytes
      ADD.D  F4,F0,F2     // add scalar in F2
      S.D    F4,8(R1)     // store result
      BNE    R1,R2,Loop   // branch if R1 != R2

■ Assume the number of elements in the array starting at the
address in R1 is divisible by 4.
■ Unrolled by a factor of 4.
■ Eliminate unnecessary instructions.



Loop unrolling
Loop: L.D F0,0(R1)
ADD.D F4,F0,F2
S.D F4,0(R1) % drop DADDUI & BNE
L.D F6,-8(R1)
ADD.D F8,F6,F2
S.D F8,-8(R1) % drop DADDUI & BNE
L.D F10,-16(R1)
ADD.D F12,F10,F2
S.D F12,-16(R1) % drop DADDUI & BNE
L.D F14,-24(R1)
ADD.D F16,F14,F2
S.D F16,-24(R1)
DADDUI R1,R1,#-32
BNE R1,R2,Loop



Pipeline schedule the unrolled loop
■ Pipeline scheduling reduces the number of stalls.
● The L.D latency is only one cycle, so by the time the
ADD.D instructions issue, F0, F6, F10, and F14 are already
loaded.
● The ADD.D latency is only two cycles, so the S.D
instructions can proceed without stalling.
● The array pointer is updated after the first two S.D, so
the last two stores use adjusted offsets and the loop-control
BNE can proceed immediately after them.

Loop: L.D    F0,0(R1)
      L.D    F6,-8(R1)
      L.D    F10,-16(R1)
      L.D    F14,-24(R1)
      ADD.D  F4,F0,F2
      ADD.D  F8,F6,F2
      ADD.D  F12,F10,F2
      ADD.D  F16,F14,F2
      S.D    F4,0(R1)
      S.D    F8,-8(R1)
      DADDUI R1,R1,#-32
      S.D    F12,16(R1)
      S.D    F16,8(R1)
      BNE    R1,R2,Loop
Loop unrolling & scheduling summary
■ Use different registers to avoid unnecessary
constraints.
■ Adjust the loop termination and iteration code.
■ Determine whether the loop iterations are independent,
apart from the loop maintenance code; if so, unroll the loop.
■ Analyze memory addresses to determine whether loads
and stores from different iterations are independent; if
so, interchange loads and stores in the unrolled loop.
■ Schedule the code while ensuring correctness.
■ Limitations of loop unrolling:
● Diminishing returns: each additional unroll amortizes
less of the loop overhead.
● Growth of the code size.
● Register pressure (shortage of registers): scheduling
to increase ILP increases the number of live values and
thus the number of registers needed.
3. Branch prediction
■ Guess whether a conditional jump will be taken or
not.
■ Improve the flow in the instruction pipeline.
■ The branch that is guessed to be the most likely is
then fetched and speculatively executed.
■ If the guess later proves wrong, the speculatively
executed or partially executed instructions are discarded
and the pipeline restarts at the correct branch target,
incurring a delay.



Branch prediction
■ The branch predictor keeps a record of whether
branches are taken or not taken. When it encounters a
conditional jump that has been seen several times
before, it can base the prediction on that history.
The branch predictor may, for example, recognize
that the conditional jump is taken more often than
not, or that it is taken every second time.
■ Note: not to be confused with branch target
prediction, which guesses the target of a taken conditional
or unconditional jump before it is computed by
decoding and executing the instruction itself. Both
are often combined into the same circuitry.
Advanced branch prediction
■ Correlating branch predictors (or two-level
predictors): use the outcomes of the most recent
branches to make the prediction.
● A correlating predictor has fewer misses than a simple
predictor of the same size.
● A correlating predictor has fewer misses than a simple
predictor with an unlimited number of entries.
■ Tournament predictors: run multiple predictors and a
tournament between them; use the most successful one.
● Combine two predictors:
 A predictor based on global information.
 A predictor based on local information.
● A selector chooses between the predictors.

