COA Unit-3 Slides

Uploaded by Bhuvanesh Reddy

Unit-3: Pipeline Processing

Let's say that there are four loads of dirty laundry that need to be washed, dried,
and folded. We could put the first load in the washer for 30 minutes, dry it for
40 minutes, and then take 20 minutes to fold the clothes. Then pick up the second
load and wash, dry, and fold it, and repeat for the third and fourth loads. Supposing
we started at 6 PM and worked as efficiently as possible, we would still be doing
laundry until midnight.
However, a smarter approach to the problem would be to put the second load of
dirty laundry into the washer after the first was already clean and whirling happily
in the dryer. Then, while the first load was being folded, the second load would dry,
and a third load could be added to the pipeline of laundry. Using this method, the
laundry would be finished by 9:30.
Instruction Execution

The execution of an instruction is divided into 5 subtasks:


1. Instruction fetch (IF).
2. Instruction decode (ID).
3. Operand fetch (OF).
4. Instruction Execution (IE).
5. Output store (OS).
• Instructions are executed one by one, in a non-parallel fashion.
• A single hardware component can take only one task at a time from its
input and produce the result at its output.
Drawbacks
• Only one input can be processed at a time.
• Partial or intermediate output is not possible.
Pipelining

• Pipelining is a technique where multiple instructions are
overlapped during execution.
• The pipeline is divided into stages, and these stages are connected with
one another to form a pipe-like structure.
• Pipelining improves the throughput of the system and thus increases the
overall rate at which instructions complete.
Pipelining

Execution in Pipelined Architecture

• Parallel execution of instructions takes place.
• At a particular time slot, all the instructions are in different phases.
• Instead of a single hardware component, the hardware design is split into
small components (segments).
• The segments are connected with each other through interface
registers, and they can execute multiple tasks independently and in parallel.
Example: 4-Stage Instruction Pipeline
• The processing of each instruction is divided into 4 segments:
• IF: the segment that fetches an instruction.
• ID: the segment that decodes the instruction and calculates the effective
address.
• EX: the segment that executes the instruction.
• WB: the segment that stores the result in memory.
Instruction Cycle:
• Fetch Instruction
• Decode Instruction (identify opcode and operands)
• Execute Instruction
• Write Back Result (in register/memory)
Registers Involved in Each Instruction Cycle:
• Memory Address Register (MAR): connected to the address lines of the system
bus. It specifies the address in memory for a read or write operation.
• Memory Buffer Register (MBR): connected to the data lines of the system bus. It
contains the value to be stored in memory or the last value read from memory.
• Program Counter (PC): holds the address of the next instruction to be fetched.
• Instruction Register (IR): holds the last instruction fetched.
Stages of Pipelining
• Instructions of the program execute in parallel. When one instruction goes from
the nth stage to the (n+1)th stage, another instruction goes from the (n-1)th
stage to the nth stage.
Pipelining

• Advantages
• Pipelining improves the throughput of the system.
• In every clock cycle, a new instruction finishes its
execution.
• Allows multiple instructions to be executed concurrently.
Pipelining

• Disadvantages
• The design of a pipelined processor is complex and
costly to manufacture.
• The latency of an individual instruction is higher.
Types of Pipelining
• Instruction Pipelining
• Arithmetic Pipelining
Instruction Pipelining

• An instruction pipeline reads consecutive instructions from memory while previous
instructions are being executed in other segments. The computer needs to process
each instruction with the following sequence of steps:
1. Fetch instruction from memory.
2. Decode the instruction.
3. Calculate effective address.
4. Fetch operand from memory.
5. Execute instruction.
6. Store the result in memory.
Problems with the Instruction Pipeline (Pipeline Hazards)

• A pipeline hazard occurs when some condition in the pipeline does not
permit the next instruction to continue execution in its designated clock
cycle. The major pipeline hazards are described below:
• Resource hazard (Resource conflict).
• Data hazard (data dependency).
• Branch hazard (branch difficulties).
Arithmetic Pipeline

• Used to implement floating-point operations, multiplication of
fixed-point numbers, and in scientific computation problems.
• Example of a pipeline unit for floating-point addition and
subtraction:
X = A * 10^a = 0.9504 * 10^3
Y = B * 10^b = 0.8200 * 10^2
where A and B are two fractions that represent the mantissas, and a and
b are the exponents.
Arithmetic Pipeline

• The combined operation of floating-point addition and subtraction


is divided into four segments. Each segment contains the
corresponding suboperation. The suboperations in the four
segments are:
• Compare the exponents by subtraction.
• Align the mantissas.
• Add or subtract the mantissas.
• Normalize the result.
Floating-point binary addition:
X = A * 10^a = 0.9504 * 10^3
Y = B * 10^b = 0.8200 * 10^2

1. Compare exponents by subtraction:
• The exponents are compared by subtracting them to determine their
difference. The larger exponent is chosen as the exponent of the result.
• The difference of the exponents, i.e., 3 - 2 = 1, determines how many
times the mantissa associated with the smaller exponent must be shifted
to the right.

2. Align the mantissas:
• The mantissa associated with the smaller exponent is shifted right
according to the difference of exponents determined in segment one:
Y = 0.0820 * 10^3

3. Add the mantissas:
• The two mantissas are added in segment three:
Z = X + Y = (0.9504 + 0.0820) * 10^3 = 1.0324 * 10^3

4. Normalize the result:
• After normalization, the result is written as:
Z = 0.10324 * 10^4
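The four suboperations can be sketched in Python (a simplified, non-pipelined sketch; the function and variable names are mine, mantissas and exponents are handled in base 10 as in the example, and normalization for subtraction underflow is omitted):

```python
def fp_add(a_mant, a_exp, b_mant, b_exp):
    """Four-segment floating-point addition sketch (base-10)."""
    # Segment 1: compare exponents by subtraction; the larger
    # exponent becomes the exponent of the result.
    diff = a_exp - b_exp
    result_exp = max(a_exp, b_exp)
    # Segment 2: align mantissas -- shift the mantissa with the
    # smaller exponent right by the exponent difference.
    if diff > 0:
        b_mant /= 10 ** diff
    elif diff < 0:
        a_mant /= 10 ** (-diff)
    # Segment 3: add the mantissas.
    mant = a_mant + b_mant
    # Segment 4: normalize so the mantissa is a fraction below 1.
    while abs(mant) >= 1.0:
        mant /= 10
        result_exp += 1
    return mant, result_exp

# X = 0.9504 * 10^3, Y = 0.8200 * 10^2
m, e = fp_add(0.9504, 3, 0.8200, 2)
print(round(m, 5), e)  # 0.10324 4
```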
Parameters that Determine the Performance of the Pipeline Process
• Speed-up Ratio (S_K)
• Latency (L_K)
• Efficiency (E_K)
• Throughput (H_K)
Parameters that Determine the Performance of the Pipeline Process
• Consider a K-segment pipeline with clock cycle time 'Tp'. Let there be 'n'
tasks to be completed in the pipelined processor.
• The first instruction takes 'K' cycles to come out of the pipeline, but the
other (n-1) instructions take only 1 cycle each, i.e., a total of (n-1) cycles. So, the
time taken to execute 'n' instructions in the pipelined processor is:
• ET_pipeline = (K + (n-1)) cycles
= (K + n - 1) * Tp
• For a non-pipelined processor, the execution time of 'n' instructions is:

ET_non-pipeline = n * K * Tp
Parameters that Determine the Performance of the Pipeline Process
• So, the speed-up (S) of the pipelined processor over the non-pipelined processor,
when 'N' tasks are executed on the same processor, is:

• S = (Performance of Pipelined Processor) / (Performance of Non-Pipelined Processor)

• As the performance of a processor is inversely proportional to the execution time,
we have:

S = ET_non-pipe / ET_pipe
  = [N * K * Tp] / [(K + N - 1) * Tp]
  = (N * K) / (K + N - 1)
• When the number of tasks 'N' is significantly larger than K, i.e., N >> K,
then S ≈ (N * K) / N = K,
where 'K' is the number of stages in the pipeline.
Parameters that Determine the Performance of the Pipeline Process
• Efficiency = Given Speed-up / Maximum Speed-up
= S / S_max
We know that S_max = K.

• Throughput = Number of instructions / Total time to complete the
instructions
= N / [(K + N - 1) * Tp]
Note: The cycles per instruction (CPI) for an ideal pipelined processor is 1.
• Latency = K × clock cycle time
Latency measures the time to complete a single instruction; while it does not
decrease with pipelining, pipelining allows more instructions to be completed
concurrently.
Throughput is significantly improved with pipelining because multiple instructions are
processed in parallel, leading to the completion of one instruction per clock cycle after
the pipeline is filled.
Speed-up quantifies the performance improvement achieved by pipelining, often close
to the number of pipeline stages, provided hazards are minimized.
Efficiency is a measure of how effectively the pipeline stages are utilized. It is affected
by hazards, stalls, and the balance of work between stages.
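The formulas above can be collected into a small helper (a sketch; the function name and parameters are illustrative):

```python
def pipeline_metrics(k, n, tp):
    """Performance metrics for a k-stage pipeline running n tasks
    with clock cycle time tp, using the formulas above."""
    et_pipe = (k + n - 1) * tp          # pipelined execution time
    et_nonpipe = n * k * tp             # non-pipelined execution time
    speedup = et_nonpipe / et_pipe      # S = N*K / (K + N - 1)
    efficiency = speedup / k            # E = S / S_max, with S_max = K
    throughput = n / et_pipe            # instructions per unit time
    return speedup, efficiency, throughput

s, e, h = pipeline_metrics(k=4, n=100, tp=1)
print(round(s, 3))  # 3.883 -> approaches K = 4 as n grows
```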
Space-Time Diagram

(The space-time diagram itself, segments versus clock cycles 1 to 9, is not
reproduced here.)

Number of segments K = 4
Number of tasks N = 6
Speed-up ratio = ?
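Assuming the standard formulas from the previous slides, the answer can be computed directly:

```python
# Speed-up for the example above: K = 4 segments, N = 6 tasks.
k, n = 4, 6
cycles_pipelined = k + n - 1        # 9 clock cycles
cycles_nonpipelined = n * k         # 24 clock cycles
speedup = cycles_nonpipelined / cycles_pipelined
print(cycles_pipelined, round(speedup, 2))  # 9 2.67
```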
Branch Instruction Hazard

I\T  1   2   3   4   5   6   7   8   9   10  11  12  13
1    FI  DA  FO  EX
2        FI  DA  FO  EX
3            FI  DA  FO  EX                       (Branch)
4                FI  --  --  FI  DA  FO  EX
5                            FI  DA  FO  EX
6                                FI  DA  FO  EX
7                                    FI  DA  FO  EX

Number of stalls = 2
Pipeline Hazard
• Any condition that causes a 'stall' in the pipeline operations can be called a
hazard.
• Pipeline hazards are situations that prevent the next instruction in the
instruction stream from executing during its designated clock cycle.
• Hazard occurs:
A <- 3 + A
B <- 4 * A
• No hazard:
A <- 5 * C
B <- 20 + C

(Recall: a pipeline is a technique of decomposing a sequential process into a
number of sub-processes, with each sub-process executed in a special dedicated
segment that operates concurrently with all other segments.)
Pipeline Hazards

a) Data Hazard
b) Control or Instruction Hazard
c) Structural Hazard
Pipeline Hazard
a) Data Hazards
• An instruction cannot continue because it needs a value that has not yet been generated by an
earlier instruction.
• Data hazards occur when data is used before it is ready.
• In other words, an instruction attempts to use a result before it is ready.
There are three types of data hazard:
1) RAW (Read After Write) [flow/true data dependency]
2) WAR (Write After Read) [anti-dependency]
3) WAW (Write After Write) [output dependency]
Pipeline Hazard
• Let there be two instructions I and J, such that J follows I. Then,
• RAW hazard occurs when instruction J tries to read data before instruction I writes it.
• Eg:
• I: R2 <- R1 + R3
• J: R4 <- R2 + R3
• WAR hazard occurs when instruction J tries to write data before instruction I reads it.
• Eg:
• I: R2 <- R1 + R3
• J: R3 <- R4 + R5
Pipeline Hazard
• WAW hazard occurs when instruction J tries to write its output before instruction I writes it.
Eg:
I: R2 <- R1 + R3
J: R2 <- R4 + R5

• WAR and WAW hazards occur during out-of-order execution of instructions.
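The three definitions can be expressed as set intersections over the registers each instruction reads and writes (a sketch; the function name and register-set encoding are mine):

```python
def classify_hazards(i_writes, i_reads, j_writes, j_reads):
    """Classify data hazards between instruction I and a later
    instruction J, given the register sets each reads and writes."""
    hazards = []
    if i_writes & j_reads:
        hazards.append("RAW")  # J reads what I writes (true dependency)
    if i_reads & j_writes:
        hazards.append("WAR")  # J writes what I reads (anti-dependency)
    if i_writes & j_writes:
        hazards.append("WAW")  # both write the same register (output dependency)
    return hazards

# I: R2 <- R1 + R3   J: R4 <- R2 + R3   -> RAW on R2
print(classify_hazards({"R2"}, {"R1", "R3"}, {"R4"}, {"R2", "R3"}))  # ['RAW']
```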
#Observations
• All the instructions after the ADD use the result of the ADD instruction (in R1). The ADD instruction writes the
value of R1 in the WB stage, and the SUB instruction reads the value during its ID stage (ID_sub). This
problem is called a data hazard. Unless precautions are taken to prevent it, the SUB instruction will read the
wrong value and try to use it.
• The AND instruction is also affected by this data hazard. The write of R1 does not complete until the end of cycle
5. Thus, the AND instruction that reads the registers during cycle 4 (ID_and) will receive the wrong result.
• The OR instruction can be made to operate without incurring a hazard by a simple implementation
technique: perform register file reads in the second half of the cycle, and writes in the first half. Because
both WB for ADD and ID_or for OR are performed in cycle 5, the write to the register file by ADD happens
in the first half of the cycle, and the read of the registers by OR happens in the second half.
• The XOR instruction operates properly, because its register read occurs in cycle 6, after the register write by
ADD.
Control Hazard
• In the previous lecture, we studied pipeline hazards.
• Any condition that causes a stall in the pipeline operations can be called a
hazard.
• It means that, due to some circumstances, the pipeline gets disturbed and does
not operate concurrently for some clock cycles.
• Control hazard: the situation when the pipeline cannot operate normally due to
non-sequential control flow.
Types of Control Hazards
• Branch Hazards: These occur when the processor encounters a conditional branch instruction (like an if statement or a loop). Since the outcome of a
branch (whether it will be taken or not) is not known until later in the pipeline, the processor may not know which instruction to fetch next.

• Example: If the processor encounters a branch that depends on the result of a comparison, it has to wait for the comparison to be evaluated before it
knows whether to jump to a new instruction or continue with the next one in the sequence.

• Jump Hazards: These occur when the processor encounters an unconditional jump instruction, where it must immediately change the flow of
execution to a new location. Unlike conditional branches, where the outcome depends on a condition, a jump is always taken, but the new instruction
address may not be known immediately.

• Example: An unconditional GOTO statement or a function call that transfers control to a different memory location.

• Indirect Branch Hazards: These occur when the target of a branch or jump is determined at runtime, rather than being a fixed address known in
advance.

• Example: A function call where the target address is stored in a register or memory location, making it more complex for the processor to predict
where to go next. Solution: more advanced forms of branch prediction are required to handle indirect branches.

• Call and Return Hazards: These occur when the processor executes a function call or a return instruction. When returning from a function, the
processor needs to know where to resume execution. If the return address is not immediately available, this can cause a delay.

• Example: When a function is called, the processor pushes the return address to a stack and later needs to pop it to continue execution.
Control Hazard
• Now we will look at the control (or instruction) hazard.
• We will see the conditions under which a control hazard occurs in the pipeline,
preventing concurrent/overlapped operation for some clock cycles.
• A control hazard occurs due to a branch instruction, i.e., the impact of the
branch condition on the pipeline.
• We can understand the control hazard through an example.
Example:
Memory Location   Instruction
12:  BEQ R1, R3, 36   // when R1 and R3 are equal, jump to address 36
16:  AND R2, R3, R5
20:  OR  R6, R1, R7
24:  ADD R8, R1, R9
36:  XOR R10, R1, R11
Control Hazard
• Let us assume the following 5 phases are required to execute an instruction in a
dedicated pipeline architecture, namely Instruction Fetch, Instruction Decode,
Execution, Memory access, and Write Back (writing the result into the register):
IF: Instruction Fetch
ID: Instruction Decode
EX: Execution
MEM: Memory access
WB: Write Back / store result in the register.
Timing Diagram of the Given Set of Instructions

Address  CC1  CC2  CC3  CC4  CC5  CC6  CC7  CC8
12       IF   ID   EX   MEM  WB
16            IF   ID   EX   MEM  WB
20                 IF   ID   EX   MEM  WB
24                      IF   ID   EX   MEM  WB
Control Hazard
In instruction 12:
• Whether it will jump to address location 36 or not will be known only at the 'MEM'
phase, i.e., at CC4.
• This means that by the time instruction 12 is in the 4th clock cycle, the next three
instructions at memory locations 16, 20 and 24 have already entered the pipe and are
performing operations in their respective phases.
• Under the normal pipeline concept, these instructions (located at addresses 16, 20
and 24) have entered the pipe.
• Now, if the condition (R1 = R3) turns out to be true, everything that happened with the
subsequent instructions becomes wrong, because at this point it is clear that the
instruction at location 36 should be the next instruction executed, instead of 16, 20 and 24.
• So, there is a requirement to flush out the wrongly entered/processed instructions.
• Therefore, a stall of 3 CC occurs, and the instructions at locations 16, 20 and 24 must
not execute; they are flushed out of the pipe.
Control Hazard
• According to the above example, the branch instruction decides to go to location 36 for
the next instruction in the MEM stage, i.e., in CC4.
• The three subsequent instructions that follow the branch instruction are fetched and begin
their execution as in the normal scenario, before BEQ branches to location 36.
• A common approach is not to stop the pipeline because of the branch; once the
branch condition is true, the unwanted instructions are simply flushed out of the
pipe.
• In this case, the branch penalty is 3 cycles.
Control Hazard
So:
• The instruction fetch unit of the CPU is responsible for providing a stream of instructions to
the execution unit.
• The instructions fetched by the fetch unit are in consecutive memory locations until some
special condition or branch occurs.
• A problem arises when one of the instructions is a branch instruction and execution needs
to go to some different memory location.
• In this case, all the unwanted instructions fetched into the pipeline from consecutive memory
locations are now invalid and need to be removed, i.e., flushed out of the pipe.

Memory Location   Instruction
100:  BEQ R1, R3, 120   // when R1 and R3 are equal, jump to address 120
104:  Instruction 2     (flush out)
108:  Instruction 3     (flush out)
...
120:  Instruction 10
124:  Instruction 11
Control Hazard
• This causes a stall in the pipeline until the new, corrected instructions are fetched from
memory.
• The time lost as a result of this is called the branch penalty.
• To reduce the resulting delay, dedicated hardware is incorporated in the fetch/decode
unit to identify a possible branch instruction in advance.
• This can increase the cost.
Structural Hazard
• Occurs when multiple instructions need the same resource.
• In a computer organization, common resources are used by multiple
instructions for their execution.
• These resources include memory (RAM), different kinds of registers,
the ALU, the common bus, etc.
• We have a limited number of resources and a large number of instructions.
• So, many conflicts may occur due to this situation.
• When the normal pipeline flow is disturbed because of this, it is called a
structural hazard.
Types of Structural Hazards

Hazard Type                  Description
Single-Port Memory Hazards   Occur when one instruction is trying to read from a memory
                             location while another instruction is trying to write to the
                             same memory at the same time.
Execution Unit Conflicts     Arise when multiple instructions require the same execution
                             unit, such as an ALU, simultaneously.
Bus Contention               Happens when two or more instructions try to use the same
                             bus to transfer data at the same time.
Structural Hazard

Clock Cycle  CC1  CC2  CC3  CC4  CC5  CC6  CC7  CC8
I1           IF   ID   EX   MEM  WB
I2                IF   ID   EX   MEM  WB
I3                     IF   ID   EX   MEM  WB
I4                          IF   ID   EX   MEM  WB

• For I1, focus on CC4: in the example above, I1 accesses memory (MEM) in CC4 to
load/store data.
• For I4, focus on CC4: in the same CC4, I4 is fetching its instruction from memory, so
the two instructions I1 and I4 are using the same resource at the same time.
Structural Hazard

Clock Cycle  CC1  CC2  CC3  CC4    CC5  CC6  CC7  CC8  CC9
I1           IF   ID   EX   MEM    WB
I2                IF   ID   EX     MEM  WB
I3                     IF   ID     EX   MEM  WB
I4                          STALL  IF   ID   EX   MEM  WB

• For I4: if we stall at CC4 and start I4 in CC5, the same kind of problem still
exists, because at CC5, I2 is using the memory along with I4.
• The same problem may recur in CC6 and CC7 as well.
Stalling to avoid hazards

• For all of these hazards, the simplest solution to implement is to stall. Stalling involves having the hardware
introduce a delay, or bubble, into the pipeline when a hazard is encountered until that hazard is resolved.
Drawbacks of stalling:
• Reduced efficiency and increased cycle count
• Loss of parallelism and potential impact on performance
Structural Hazard

Clock Cycle  CC1  CC2  CC3  CC4    CC5    CC6  CC7  CC8  CC9
I1           IF   ID   EX   MEM    WB
I2                IF   ID   EX     MEM    WB
I3                     IF   ID     EX     MEM  WB
I4                          STALL  STALL  IF   ID   EX   MEM

Clock Cycle  CC1  CC2  CC3  CC4    CC5    CC6    CC7  CC8  CC9  CC10  CC11
I1           IF   ID   EX   MEM    WB
I2                IF   ID   EX     MEM    WB
I3                     IF   ID     EX     MEM    WB
I4                          STALL  STALL  STALL  IF   ID   EX   MEM   WB
Data Hazard
Example
Methods of Optimizing Against Hazards – Compiler-Level
• While stalling is a universal remedy in that it can be used to resolve any pipelining hazard, the costs are high, and
stalls impact a chip's ability to perform efficiently. However, there are other methods available to resolve hazards
that help retain efficiency.
• The first we will examine are all performed in the compiler; no additional hardware needs to be added to
implement them, because the improvements are made to the code itself, not to the machine running it.
• Implemented correctly, compiler-level optimizations, as opposed to hardware-level optimizations, can provide a
solution to hazards that does not require extra power and can be performed on any hardware that implements
the simple pipeline described above.
Resolving Structural Hazards
• One approach to structural hazards is to reorder operations such that two instructions are never so close to one
another that this sort of hazard occurs. The extent to which this is possible depends largely on the hardware; a
string of successive multiplications could be difficult to organize such that a structural hazard never occurred.
Barring that, however, reordering operations to prevent this kind of hazard is often viable at compile time,
because the compiler has knowledge of how long a particular operation will occupy a shared resource.
Example of Static Instruction Reordering

• R1 ← R1 + R2
• R3 ← R3 + R3

Reordering the instructions:


• R1 ← R1 + R2
• R4 ← 1
• R5 ← 2
• R6 ← 3
• R7 ← 4
• R3 ← R3 + R3
Resolving Data Hazards

Instruction Reordering
• R1 ← R2 * R2
• R2 ← R1 + R3
Re-ordered sequence:
R1 ← R2 * R2
R4 ← R5 - R6
R2 ← R1 + R3
Reordering can sometimes be made more difficult by data hazards; reordering to avoid one hazard can result in
others, as in the following example:
• R1 ← R2 * R2
• R2 ← R1 + R3
• R4 ← R5 – R2
In this case, the subtraction's use of R2 as an operand creates an antidependence between it and the addition,
meaning that the two instructions are not independent. Normally, we might want to put an independent
instruction between the multiplication and the addition, since multiplications take longer than additions and
thus the addition would otherwise be delayed.
Resolving Control Hazards
Managing control hazards at the compiler level involves distancing the logical operation on which the
branch is based from the branch itself, and limiting the number of branches. Both of these are accomplished
by loop unrolling.
Loop unrolling essentially expands the body of a loop so that fewer branches are necessary. The following
example is adapted from Hennessy and Patterson:
for (i=1000; i>0; i=i-1)
x[i] = x[i] + s;

Unrolled loop (4 times):


for (i = 1000; i > 0; i = i - 4) {
x[i] = x[i] + s;
x[i - 1] = x[i - 1] + s;
x[i - 2] = x[i - 2] + s;
x[i - 3] = x[i - 3] + s;
}
Challenges in Loop Unrolling
• Increased code size and cache misses
• Trade-off of loop unrolling: a larger code footprint may increase cache misses
• Optimal unrolling factor
• Factors influencing the optimal unrolling factor: cache size, register pressure
Limitations of Compiler-Level Optimization

• Limitations at compile time
• Many runtime hazards, like data dependencies, cannot be fully predicted
• Examples of runtime-only hazards
• Control-flow changes, unpredictable loops
• Conclusion: hardware-level solutions are needed for complex hazards
Methods of Optimizing Against Hazards – Hardware-Level

Resolving Structural Hazards


• Resource Duplication
• Resource Pipelining
• Dynamic Scheduling
Resource Duplication

• Resource duplication offers a more direct approach to eliminating


structural hazards. By duplicating critical resources, such as providing
multiple ALUs or additional memory ports, the pipeline can handle
concurrent requests without conflicts. This eliminates the need for
stalls and ensures smoother instruction flow. However, resource
duplication increases the hardware complexity and cost of the
processor. The trade-off between performance gain and increased
cost must be carefully considered.
Resource Duplication
• One simple solution is to provide separate memory for the instructions and
for the data.
• In the Von Neumann architecture, the same memory is used for storing both the
data and the instructions, which is a significant drawback of that architecture.
• If we use the Harvard architecture, instructions and operand data are stored in
separate memories:

Von Neumann (single memory):  I1, I2, DATA1, I3, DATA2, ..., I10
Harvard (separate memories):
  Instruction memory:  I1, I2, I3, I4, ..., I10
  Data memory:         D1, D2, D3, D4, D5, ..., D10
Pipelining the Resource

• Pipelining the resource itself is another effective strategy to mitigate


structural hazards. Instead of duplicating the entire resource, it is
divided into smaller pipelined stages. This allows the resource to
handle multiple instructions concurrently, as different stages can
process different parts of the instructions. For example, a pipelined
ALU can have separate stages for operand fetching, arithmetic
operation, and result writing. This approach improves throughput
without requiring full resource duplication, but it may increase the
latency of the resource itself.
Dynamic Scheduling

• Dynamic scheduling employs sophisticated hardware mechanisms to


dynamically analyze and reorder instructions at runtime, aiming to
avoid structural hazards. Techniques like scoreboarding and
Tomasulo's algorithm track resource availability and instruction
dependencies, allowing the processor to schedule instructions out of
order to maximize resource utilization and minimize stalls. While
highly efficient, dynamic scheduling significantly increases the
complexity of the processor's control logic and can lead to higher
power consumption.
Resolving Data Hazards
• Register Renaming
• Operand Forwarding
Register Renaming: Unlocking Parallelism

• Register renaming eliminates false dependencies by mapping logical registers to physical registers.
• This technique allows multiple instructions to execute concurrently without interference.
• It frees up resources and enhances the performance of the pipeline significantly.
• Register renaming is essential for modern processors aiming for higher instruction throughput.
• It is a cornerstone strategy in overcoming data hazards.
• Example:
• Instruction 1: R1 = R2 + R3 (stores its result in R1)
• Instruction 2: R1 = R4 + R5 (also writes R1, overwriting the previous value)
• Without register renaming, the second instruction could overwrite R1 too soon, before consumers of the first
instruction's result have read it, leading to incorrect results. With register renaming:
• Instruction 1: R6 = R2 + R3 (R1 renamed to R6)
• Instruction 2: R7 = R4 + R5 (R1 renamed to R7)
• This ensures that the two instructions write different physical registers (R6 and R7), allowing parallel execution
without any conflict.
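The renaming idea can be sketched in a few lines of Python (illustrative only; real processors rename with a hardware map table and a free list of physical registers):

```python
def rename(instructions):
    """Give each logical destination register a fresh physical
    register, rewriting source operands to the latest mapping.
    Instructions are (dest, src1, src2) tuples."""
    mapping = {}       # logical register -> current physical register
    next_phys = 0
    renamed = []
    for dest, src1, src2 in instructions:
        # Sources read the most recent physical name (or the original).
        s1 = mapping.get(src1, src1)
        s2 = mapping.get(src2, src2)
        # Each write gets a brand-new physical register, removing
        # WAR/WAW (false) dependencies on the logical name.
        phys = f"P{next_phys}"
        next_phys += 1
        mapping[dest] = phys
        renamed.append((phys, s1, s2))
    return renamed

# R1 = R2 + R3 ; R1 = R4 + R5  -> the two writes get distinct registers
print(rename([("R1", "R2", "R3"), ("R1", "R4", "R5")]))
# [('P0', 'R2', 'R3'), ('P1', 'R4', 'R5')]
```

Note that true (RAW) dependencies are preserved: a later reader of R1 would receive the renamed physical register of the most recent writer.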
Operand Forwarding

Operand forwarding allows the dynamic transfer of data between pipeline stages.
This technique minimizes delays by forwarding results directly to dependent instructions.
It enhances throughput and reduces latency, letting instructions execute smoothly.
By doing so, it optimizes resource utilization within the pipeline.
• Operand forwarding is a key strategy for combating data hazards.
Resolving Control Hazards
• Branch Prediction
Branch Prediction

• Guessing whether a branch (like an if-statement) will go one way or the other, to keep the pipeline moving
without delays.
• There are two types: 1) static and 2) dynamic.
Static prediction:
Uses simple rules, like always assuming branches are "taken" or "not taken".
• Pros: Reduces pauses and keeps the system working smoothly.
• Cons: Wrong guesses lead to wasted work and lost time.
Dynamic Branch prediction

• A smarter type of branch prediction that looks at what happened in the past to make better guesses about
branches.
• Common techniques include 1-bit and 2-bit prediction tables, and more complex methods like Pattern
History Tables (PHT) or Branch History Tables (BHT).

• Pros: More accurate than basic guessing, which means fewer stalls.
• Cons: Needs extra hardware, and wrong guesses can still waste time.
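The 2-bit scheme mentioned above can be sketched as a table of saturating counters (a simplified model; the table size and PC-indexing scheme are illustrative):

```python
class TwoBitPredictor:
    """Dynamic branch prediction with 2-bit saturating counters.
    Counter states: 0,1 -> predict not taken; 2,3 -> predict taken."""

    def __init__(self, table_size=16):
        self.table = [1] * table_size  # start weakly "not taken"

    def predict(self, pc):
        return self.table[pc % len(self.table)] >= 2

    def update(self, pc, taken):
        i = pc % len(self.table)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)  # saturate at 3
        else:
            self.table[i] = max(0, self.table[i] - 1)  # saturate at 0

# A loop branch that is taken 9 times and then falls through once:
p = TwoBitPredictor()
hits = 0
for taken in [True] * 9 + [False]:
    if p.predict(0x40) == taken:
        hits += 1
    p.update(0x40, taken)
print(hits)  # 8 of 10 predictions correct
```

The two bits of hysteresis are what make a loop-closing branch mispredict only on the first iteration and the final exit, rather than twice per loop as a 1-bit predictor would.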
Consider the following 4-stage instruction pipeline, where different
instructions take different amounts of time at different stages. How many
clock cycles will be required to complete these four instructions in the
given pipeline?

     IF  ID  EX  WB
I1   2   1   2   2
I2   1   3   3   1
I3   2   2   2   2
I4   1   2   1   2
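One way to compute the answer is the finish-time recurrence finish[i][s] = max(finish[i][s-1], finish[i-1][s]) + t[i][s], which assumes an interface register between stages as described earlier (a sketch of one common model, not the only one):

```python
# Finish-time computation for a pipeline with non-uniform stage times.
# Stage times (IF, ID, EX, WB) are taken from the table above.
times = [
    [2, 1, 2, 2],  # I1
    [1, 3, 3, 1],  # I2
    [2, 2, 2, 2],  # I3
    [1, 2, 1, 2],  # I4
]

finish = []  # finish[i][s] = cycle in which instruction i leaves stage s
for i, stage_times in enumerate(times):
    row = []
    for s, t in enumerate(stage_times):
        ready = row[s - 1] if s > 0 else 0       # done with previous stage
        free = finish[i - 1][s] if i > 0 else 0  # stage vacated by i-1
        row.append(max(ready, free) + t)
    finish.append(row)

print(finish[-1][-1])  # 15 clock cycles in total
```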
Q. Consider the following program segment, which is executed in a 4-stage
pipeline: Fetch (F), Decode (D), Execute (E), Write (W).

ADD R0, R1, R2
MUL R3, R4, R6
SUB R7, R8, R9
DIV R10, R11, R12
STORE X, R13

Fetch, Decode and Write Back take 1 CC each, while Execution takes 3
cycles for MUL and DIV and 1 cycle for the remaining instructions.
Q. What is the speed-up, given the per-stage cycle counts below?

     Fetch(F)  Decode(D)  Execute(E)  Write Back(W)
I1   1         1          1           1
I2   1         1          3           1
I3   1         1          1           1
I4   1         1          3           1
I5   1         1          1           1
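Using the same finish-time recurrence as before (a sketch under the stated stage times):

```python
# Cycle count and speed-up for the 5-instruction example above
# (F/D/W take 1 cycle each; E takes 3 cycles for MUL and DIV).
times = [
    [1, 1, 1, 1],  # ADD
    [1, 1, 3, 1],  # MUL
    [1, 1, 1, 1],  # SUB
    [1, 1, 3, 1],  # DIV
    [1, 1, 1, 1],  # STORE
]

finish = []
for i, stage_times in enumerate(times):
    row = []
    for s, t in enumerate(stage_times):
        ready = row[s - 1] if s > 0 else 0       # done with previous stage
        free = finish[i - 1][s] if i > 0 else 0  # stage vacated by i-1
        row.append(max(ready, free) + t)
    finish.append(row)

pipelined = finish[-1][-1]                  # 12 cycles
non_pipelined = sum(sum(r) for r in times)  # 24 cycles
print(pipelined, non_pipelined / pipelined)  # 12 2.0
```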
Q) A CPU has a 5-stage pipeline and operates at a frequency of 1 GHz. The instruction fetch happens in the first
stage. A conditional branch instruction computes the target address and evaluates the condition in the 3rd
stage. The CPU stalls and does not fetch new instructions following a conditional branch instruction until the
branch outcome is known. Given that a program consists of 1 billion instructions, where 20% of these
instructions are conditional branch instructions, and each instruction takes 1 clock cycle on average, calculate
the total time required for the completion of the program.
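A worked solution: since the branch is resolved in the 3rd stage, each conditional branch delays the next fetch by 2 cycles.

```python
# Branch resolved in stage 3 -> 2 stall cycles per conditional branch.
freq_hz = 1e9           # 1 GHz
instructions = 1e9      # 1 billion instructions
branch_fraction = 0.20
stalls_per_branch = 2   # fetch waits through stages 2 and 3

cycles = instructions * 1 + instructions * branch_fraction * stalls_per_branch
seconds = cycles / freq_hz
print(seconds)  # 1.4
```

So the program takes 1.4 × 10^9 cycles, i.e., 1.4 seconds at 1 GHz.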
• Consider two pipeline implementations that have the same instruction structure and
support overlapping of all instructions, except for memory-related operations. If
memory operations cannot be executed simultaneously, each conflict results in one
stall cycle. In the program, 20% of the instructions involve memory-related operations.
Pipeline 1 uses single-port memory, while Pipeline 2 uses dual-port memory. If the
speed-up factors for the respective pipelines are S1 and S2, what is the value of S2/S1?
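A sketch of the solution, assuming an ideal CPI of 1 and one stall per memory instruction on the single-port pipeline:

```python
# Pipeline 1 (single-port memory): every memory instruction adds 1 stall.
# Pipeline 2 (dual-port memory): memory accesses overlap, no stalls.
mem_fraction = 0.20

cpi1 = 1 + mem_fraction * 1  # 1.2 cycles per instruction
cpi2 = 1.0                   # no memory stalls

# Each pipeline's speed-up over the same non-pipelined machine is
# proportional to 1/CPI, so the ratio S2/S1 equals cpi1/cpi2.
print(cpi1 / cpi2)  # 1.2
```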
