Computer Architecture ILP - Techniques for Increasing ILP
The document discusses instruction level parallelism (ILP) and techniques to enhance it, focusing on program correctness through exception behavior and data flow preservation. Key compiler techniques include pipeline scheduling, loop unrolling, and branch prediction, which aim to optimize execution speed and reduce stalls. Advanced branch prediction methods, such as correlating and tournament predictors, are also highlighted for improving prediction accuracy in instruction execution.
Computer Architecture (PCC CS-402)
Instruction Level Parallelism: Techniques for Increasing ILP
May 12, 2025
Essential properties for program correctness
■ Enforcing dependence relations is not strictly necessary as long as we can preserve the correctness of the program.
■ Two properties critical to program correctness (and normally preserved by maintaining both data and control dependences) are:
● Preserving exception behavior: any change in instruction order must not change the order in which exceptions are raised.
● Preserving data flow: the flow of data between instructions that produce results and the instructions that consume them. Liveness: a value that will still be read is called live; a value that is no longer used is called dead.
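The liveness idea above can be made concrete with a minimal sketch of a backward liveness scan over straight-line code (this is an illustrative toy, not a full compiler data-flow analysis; the register names mirror the slides' examples).

```python
# Backward liveness scan over straight-line code: each instruction is
# modeled as (dest, sources); dest may be None for a store.
def live_sets(instrs):
    """Return the set of values live *before* each instruction."""
    live = set()                # nothing is live after the last instruction
    before = []
    for dest, srcs in reversed(instrs):
        live.discard(dest)      # the definition kills the old value
        live.update(srcs)       # the uses make the sources live
        before.append(set(live))
    return list(reversed(before))

# F0 = load; F4 = F0 + F2; store F4  (mirrors the slides' loop body)
code = [("F0", []), ("F4", ["F0", "F2"]), (None, ["F4"])]
print(live_sets(code))          # F2 live at entry; F4 live before the store
```

Any reordering that keeps every consumer after its producer leaves these live sets, and hence the data flow, intact.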
Compiler techniques for exposing ILP
■ Loop transformation techniques to optimize a program's execution speed:
● Reduce or eliminate instructions that control the loop, e.g., pointer arithmetic and "end of loop" tests on each iteration.
● Hide latencies, e.g., the delay in reading data from memory.
● Rewrite loops as a repeated sequence of similar independent statements → a space-time tradeoff.
● Reduce branch penalties.
■ Methods:
1. Pipeline scheduling.
2. Loop unrolling.
3. Branch prediction.
1. Pipeline Scheduling
■ Pipeline stall: a delay in the execution of an instruction in an instruction pipeline, inserted to resolve a hazard. The compiler can reorder instructions to reduce the number of pipeline stalls.
■ Pipeline scheduling: separate a dependent instruction from its source instruction by the pipeline latency of the source instruction.
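The effect of such reordering can be sketched with a toy stall counter (the model and latencies are illustrative, not cycle-accurate): an in-order, single-issue pipeline waits until every source register of an instruction is ready.

```python
# Illustrative producer latencies, in cycles, per opcode (assumptions).
LATENCY = {"L.D": 2, "ADD.D": 3, "S.D": 1, "DADDUI": 1, "BNE": 1}

def count_stalls(instrs):
    """Count cycles spent waiting on operands; instrs = (op, dest, srcs)."""
    ready, cycle, stalls = {}, 0, 0
    for op, dest, srcs in instrs:
        start = max([cycle] + [ready.get(r, 0) for r in srcs])
        stalls += start - cycle        # cycles stalled on not-yet-ready sources
        cycle = start + 1              # issuing takes one cycle
        if dest is not None:
            ready[dest] = start + LATENCY[op]
    return stalls

# Original loop body vs. the scheduled version (DADDUI hoisted above
# ADD.D, store offset adjusted), as on the slides.
original = [("L.D", "F0", ["R1"]), ("ADD.D", "F4", ["F0", "F2"]),
            ("S.D", None, ["F4", "R1"]), ("DADDUI", "R1", ["R1"]),
            ("BNE", None, ["R1", "R2"])]
scheduled = [("L.D", "F0", ["R1"]), ("DADDUI", "R1", ["R1"]),
             ("ADD.D", "F4", ["F0", "F2"]), ("S.D", None, ["F4", "R1"]),
             ("BNE", None, ["R1", "R2"])]
print(count_stalls(original), count_stalls(scheduled))
```

Hoisting the independent DADDUI into the load's shadow hides one cycle of latency, so the scheduled order stalls less under this model.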
2. Loop unrolling

Loop: L.D    F0,0(R1)    // F0 = array element
      DADDUI R1,R1,#-8   // decrement pointer by 8 bytes
      ADD.D  F4,F0,F2    // add scalar in F2
      S.D    F4,8(R1)    // store result
      BNE    R1,R2,Loop  // branch if R1 != R2

■ Assume the number of elements of the array with starting address in R1 is divisible by 4.
■ Unroll by a factor of 4.
■ Eliminate unnecessary instructions.
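What the transformation does can be sketched in Python (a minimal illustration of the space-time tradeoff, not the generated assembly): the loop adds a scalar to every array element, and the unrolled version performs four element operations per loop test, assuming the length is divisible by 4 as on the slide.

```python
def add_scalar(a, s):
    """Rolled loop: one loop test per element."""
    for i in range(len(a)):
        a[i] += s
    return a

def add_scalar_unrolled(a, s):
    """Unrolled by 4: one loop test per four elements (len(a) % 4 == 0)."""
    assert len(a) % 4 == 0
    for i in range(0, len(a), 4):
        a[i] += s
        a[i + 1] += s
        a[i + 2] += s
        a[i + 3] += s
    return a

print(add_scalar_unrolled([1, 2, 3, 4], 10))  # [11, 12, 13, 14]
```

The body is four times larger (space) but three quarters of the loop-control work disappears (time), which is exactly the space-time tradeoff the bullet list describes.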
Pipeline schedule the unrolled loop
■ Pipeline scheduling reduces the number of stalls:
● Each L.D requires only one cycle, so by the time the ADD.D instructions issue, F0, F6, F10, and F14 are already loaded.
● Each ADD.D requires only two cycles, so the S.D instructions can proceed without stalling.
● The array pointer is updated after the first two S.D instructions, so the loop-control BNE can proceed immediately after the last two S.D (whose offsets are adjusted to compensate).

Loop: L.D    F0,0(R1)
      L.D    F6,-8(R1)
      L.D    F10,-16(R1)
      L.D    F14,-24(R1)
      ADD.D  F4,F0,F2
      ADD.D  F8,F6,F2
      ADD.D  F12,F10,F2
      ADD.D  F16,F14,F2
      S.D    F4,0(R1)
      S.D    F8,-8(R1)
      DADDUI R1,R1,#-32
      S.D    F12,16(R1)
      S.D    F16,8(R1)
      BNE    R1,R2,Loop

Loop unrolling & scheduling summary
■ Use different registers to avoid unnecessary constraints.
■ Adjust the loop termination and iteration code.
■ Determine whether the loop iterations are independent except for the loop maintenance code; if so, unroll the loop.
■ Analyze memory addresses to determine whether loads and stores from different iterations are independent; if so, interchange loads and stores in the unrolled loop.
■ Schedule the code while ensuring correctness.
■ Limitations of loop unrolling:
● Diminishing returns: each additional unroll removes less of the remaining loop overhead.
● Growth of the code size.
● Register pressure (shortage of registers) → scheduling to increase ILP increases the number of simultaneously live values and thus the number of registers needed.

3. Branch prediction
■ Guess whether a conditional jump will be taken or not.
■ Improves the flow in the instruction pipeline.
■ The branch that is guessed to be the most likely is fetched and speculatively executed.
■ If the guess later turns out to be wrong, the speculatively executed or partially executed instructions are discarded and the pipeline restarts with the correct branch, incurring a delay.
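A common textbook scheme for this guess is a 2-bit saturating-counter predictor, sketched below (the table size, initial state, and PC hashing are illustrative assumptions): a counter of 2 or more predicts taken, and two wrong outcomes in a row are needed to flip a strong prediction.

```python
class TwoBitPredictor:
    """2-bit saturating counters indexed by (a hash of) the branch PC."""

    def __init__(self, entries=1024):
        self.table = [1] * entries       # start in "weakly not taken"
        self.entries = entries

    def predict(self, pc):
        return self.table[pc % self.entries] >= 2   # True = predict taken

    def update(self, pc, taken):
        i = pc % self.entries
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)

# A loop branch taken 9 times, then falling through: the predictor
# mispredicts only while warming up and on the final exit.
p = TwoBitPredictor()
misses = 0
for taken in [True] * 9 + [False]:
    if p.predict(0x40) != taken:
        misses += 1
    p.update(0x40, taken)
print(misses)  # 2
```

The hysteresis of the second bit is why a loop exit costs only one misprediction per execution of the loop, instead of two as with a single-bit predictor.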
Branch prediction
■ The branch predictor keeps a record of whether branches are taken or not taken. When it encounters a conditional jump that has been seen several times before, it can base the prediction on that history. The branch predictor may, for example, recognize that the conditional jump is taken more often than not, or that it is taken every second time.
■ Note: not to be confused with branch target prediction → guessing the target of a taken conditional or unconditional jump before it is computed by decoding and executing the instruction itself. Both are often combined into the same circuitry.

Advanced branch prediction
■ Correlating branch predictors (or two-level predictors): use the outcomes of the most recent branches to make the prediction.
● A correlating predictor has fewer misses than a simple predictor of the same size.
● A correlating predictor has fewer misses than a simple predictor with an unlimited number of entries.
■ Tournament predictors: run multiple predictors and hold a tournament between them; use the most successful one.
● Combine two predictors: a global-information-based predictor and a local-information-based predictor.
● Use a selector to choose between the predictors.
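The tournament structure can be sketched as follows (sizes, history length, and the dictionary-based tables are illustrative simplifications): a global predictor correlates on recent branch history, a local predictor tracks each branch in isolation, and a per-branch 2-bit selector is trained toward whichever component has been right more often.

```python
class TournamentPredictor:
    """Toy tournament predictor: global-history + local 2-bit components."""

    def __init__(self, hist_bits=4):
        self.ghist = 0                    # recent global branch outcomes
        self.mask = (1 << hist_bits) - 1
        self.global_t = {}                # (pc, history) -> 2-bit counter
        self.local_t = {}                 # pc -> 2-bit counter
        self.select = {}                  # pc -> 2-bit chooser (>= 2: global)

    def _pred(self, table, key):
        return table.get(key, 1) >= 2     # counters start weakly not taken

    def predict(self, pc):
        g = self._pred(self.global_t, (pc, self.ghist))
        l = self._pred(self.local_t, pc)
        return g if self.select.get(pc, 2) >= 2 else l

    def update(self, pc, taken):
        g = self._pred(self.global_t, (pc, self.ghist))
        l = self._pred(self.local_t, pc)
        if g != l:                        # train the selector toward the winner
            d = 1 if g == taken else -1
            self.select[pc] = min(3, max(0, self.select.get(pc, 2) + d))
        for table, key in ((self.global_t, (pc, self.ghist)),
                           (self.local_t, pc)):
            c = table.get(key, 1)         # train both component predictors
            table[key] = min(3, c + 1) if taken else max(0, c - 1)
        self.ghist = ((self.ghist << 1) | int(taken)) & self.mask

# A strictly alternating branch defeats a plain 2-bit counter, but the
# global component learns the pattern and the selector settles on it.
tp = TournamentPredictor()
for i in range(12):
    taken = i % 2 == 0
    guess = tp.predict(0x80)
    tp.update(0x80, taken)
```

This is the "taken every second time" case from the slide: the local counter keeps oscillating, while the global history distinguishes the two alternating contexts and predicts each perfectly once warmed up.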