
Branch Hazard: Control Hazards

Control hazards occur when the outcome of a branch is not known until later in the pipeline, which can cause instructions to be flushed. There are several techniques to reduce the performance impact of control hazards: 1) Assume branches are not taken and continue fetching sequentially, flushing if a branch is actually taken. 2) Move branch execution earlier in the pipeline to reduce the number of instructions that need flushing. 3) Implement dynamic branch prediction using a branch history table to predict branch outcomes based on previous executions, reducing flushes from incorrect static predictions.

Uploaded by Jeya Sheeba A
Copyright © All Rights Reserved

Control Hazards

An instruction must be fetched at every clock cycle to sustain the pipeline, yet in our
design the decision about whether to branch doesn’t occur until the MEM pipeline stage.

This delay in determining the proper instruction to fetch is called a control hazard or
branch hazard.

Control hazards can cause a greater performance loss for the MIPS pipeline than data
hazards do.

When a branch is executed, it may or may not change the PC to something other than its
current value plus 4.

Recall that if a branch changes the PC to its target address, it is a taken branch; if it falls
through, it is not taken or untaken.
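To make the taken/untaken distinction concrete, here is a minimal next-PC sketch. It assumes the standard MIPS conditional-branch encoding, in which the 16-bit offset is a signed word count relative to PC + 4; the function names are hypothetical, chosen only for illustration.

```python
def sign_extend16(imm: int) -> int:
    """Interpret the low 16 bits of imm as a two's-complement number."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def branch_target(pc: int, imm16: int) -> int:
    """Assumed MIPS rule: target = (PC + 4) + (sign-extended offset << 2), wrapped to 32 bits."""
    return (pc + 4 + (sign_extend16(imm16) << 2)) & 0xFFFFFFFF


def next_pc(pc: int, imm16: int, taken: bool) -> int:
    """A taken branch redirects fetch to its target; an untaken one falls through to PC + 4."""
    if taken:
        return branch_target(pc, imm16)
    return (pc + 4) & 0xFFFFFFFF


# Example: a branch at address 0x400 whose offset field is 3 (words).
print(hex(next_pc(0x400, 3, taken=True)))   # 0x410
print(hex(next_pc(0x400, 3, taken=False)))  # 0x404
```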

Assume Branch Not Taken

One improvement over branch stalling is to predict that the branch will not be taken and
thus continue execution down the sequential instruction stream.

If the branch is taken, the instructions that are being fetched and decoded must be
discarded.
Execution continues at the branch target. If branches are untaken half the time, and if it
costs little to discard the instructions, this optimization halves the cost of control hazards.
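The "halves the cost" claim is simple arithmetic. The sketch below assumes an ideal CPI of 1, a 3-cycle penalty (the three instructions in IF, ID, and EX when the branch resolves in MEM), and that discarding instructions costs nothing beyond those lost cycles; the 20% branch frequency in the example is an illustrative number, not a measured one.

```python
def cpi_with_branches(branch_freq: float, penalty: int, taken_frac: float,
                      predict_not_taken: bool) -> float:
    """Effective CPI for a pipeline whose ideal CPI is 1 (assumption).

    Stalling pays `penalty` cycles on every branch; predicting not taken pays it
    only for the fraction of branches that turn out to be taken.
    """
    lost_per_branch = penalty * (taken_frac if predict_not_taken else 1.0)
    return 1.0 + branch_freq * lost_per_branch


stall = cpi_with_branches(0.20, 3, 0.5, predict_not_taken=False)
guess = cpi_with_branches(0.20, 3, 0.5, predict_not_taken=True)
print(f"stall: {stall:.2f}, predict not taken: {guess:.2f}")
```

With half the branches untaken, the extra cycles per instruction fall from 0.6 to 0.3, which is the halving described above.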

To discard instructions, we merely change the original control values to 0s, much as we
did to stall for a load-use data hazard.

The difference is that we must also change the three instructions in the IF, ID, and EX
stages when the branch reaches the MEM stage; for load-use stalls, we just change control to 0 in
the ID stage and let them percolate through the pipeline.

Discarding instructions, then, means we must be able to flush instructions in the IF, ID,
and EX stages of the pipeline.
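A toy model of that flush is sketched below. It assumes each in-flight instruction can be represented as a name plus a dictionary of control signals; the `Slot` class, `flush_younger`, and the signal names are hypothetical. A taken branch in MEM zeros the control signals of the three younger instructions, much as a load-use stall zeros control in ID, so they become bubbles that change no state.

```python
from dataclasses import dataclass, field


@dataclass
class Slot:
    """One in-flight instruction: its name and its control signals (hypothetical model)."""
    name: str
    control: dict = field(default_factory=dict)


def flush_younger(pipeline: dict) -> None:
    """Called when a branch in MEM turns out to be taken under predict-not-taken.

    Zeroing every control signal of the instructions in IF, ID, and EX turns them
    into bubbles; the branch itself (in MEM) is left untouched. The IF slot has no
    decoded control yet, so here it is simply marked as discarded.
    """
    for stage in ("IF", "ID", "EX"):
        slot = pipeline[stage]
        slot.control = {signal: 0 for signal in slot.control}
        slot.name = "bubble"


pipe = {
    "IF": Slot("lw"),                                  # fetched, not yet decoded
    "ID": Slot("and", {"RegWrite": 1}),
    "EX": Slot("or", {"RegWrite": 1, "ALUSrc": 0}),
    "MEM": Slot("beq", {"Branch": 1}),
}
flush_younger(pipe)
print({stage: slot.name for stage, slot in pipe.items()})
```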

Reducing the Delay of Branches:

One way to improve branch performance is to reduce the cost of the taken branch.

So far, we have assumed the next PC for a branch is selected in the MEM stage, but if we
move the branch execution earlier in the pipeline, then fewer instructions need to be flushed.
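The relationship between where a branch resolves and how many instructions are thrown away can be written down directly. This sketch assumes the classic five-stage pipeline fetching one instruction per cycle, with no prediction beyond falling through; the names are illustrative.

```python
STAGES = ("IF", "ID", "EX", "MEM", "WB")


def taken_branch_penalty(resolve_stage: str) -> int:
    """Wrong-path instructions fetched before a taken branch's outcome is known.

    With one fetch per cycle, a branch resolved at the end of stage k (IF = 0)
    has k younger instructions behind it, and all of them must be flushed.
    """
    return STAGES.index(resolve_stage)


for stage in ("MEM", "EX", "ID"):
    print(stage, taken_branch_penalty(stage))
```

Resolving in MEM costs three instructions (those in IF, ID, and EX); resolving in ID costs only one.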

The MIPS architecture was designed to support fast single-cycle branches that could be
pipelined with a small branch penalty.

The designers observed that many branches rely only on simple tests (equality or sign, for
example) and that such tests do not require a full ALU operation but can be done with at most a
few gates.

When a more complex branch decision is required, a separate instruction that uses an
ALU to perform a comparison is needed, a situation that is similar to the use of condition codes
for branches.

Moving the branch decision up requires two actions to occur earlier:

• Computing the branch target address
• Evaluating the branch decision

For branch equal, we would compare the two registers read during the ID stage to see if
they are equal. Equality can be tested by first exclusive ORing their respective bits and then
ORing all the results.
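The XOR-then-OR equality test can be mirrored bit by bit in software. This is only a behavioral sketch of the gate-level idea, assuming 32-bit registers; `regs_equal` is a hypothetical name.

```python
def regs_equal(a: int, b: int, width: int = 32) -> bool:
    """Behavioral model of the ID-stage equality unit.

    XOR gates: bit i of `diff` is 1 exactly when a and b differ in bit i.
    OR tree: `any_diff` becomes 1 if any XOR output is 1. Equal means any_diff == 0.
    """
    diff = (a ^ b) & ((1 << width) - 1)
    any_diff = 0
    for i in range(width):
        any_diff |= (diff >> i) & 1
    return any_diff == 0


print(regs_equal(0x1234, 0x1234), regs_equal(0x1234, 0x1235))  # True False
```

In hardware the OR is a shallow tree of gates rather than a loop, which is why the test is fast enough for ID; the loop only shows which bits are combined.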

Moving the branch test to the ID stage implies additional forwarding and hazard
detection hardware, since a branch that depends on a result still in the pipeline must continue to work
properly with this optimization. For example, to implement branch on equal (and its inverse), we
will need to forward results to the equality test logic that operates during ID. There are two
complicating factors:
1. During ID, we must decode the instruction, decide whether a bypass to the equality unit is
needed, and complete the equality comparison so that if the instruction is a branch, we can set
the PC to the branch target address. Forwarding for the operands of branches was formerly
handled by the ALU forwarding logic, but the introduction of the equality test unit in ID will
require new forwarding logic. Note that the bypassed source operands of a branch can come from
either the EX/MEM or MEM/WB pipeline registers.

2. Because the values in a branch comparison are needed during ID but may be produced later in
time, it is possible that a data hazard can occur and a stall will be needed. For example, if an
ALU instruction immediately preceding a branch produces one of the operands for the
comparison in the branch, a stall will be required, since the EX stage for the ALU instruction
will occur after the ID cycle of the branch. By extension, if a load is immediately followed by a
conditional branch that depends on the load result, two stall cycles will be needed, as the result from the
load appears at the end of the MEM cycle but is needed at the beginning of ID for the branch.
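Both complications can be sketched together: which source feeds the equality unit, and how many stall cycles a dependent branch needs. The sketch assumes forwarding from the EX/MEM and MEM/WB pipeline registers, a register file that writes in the first half of a cycle and reads in the second, and results that are ready at the end of EX for ALU instructions and at the end of MEM for loads. The `PipeReg` fields and the function names are hypothetical. The stall rule reproduces the two cases above (one stall after an ALU instruction, two after a load) and extends the same timing argument to producers further back.

```python
from dataclasses import dataclass


@dataclass
class PipeReg:
    """Destination fields carried in a pipeline register (hypothetical names)."""
    reg_write: bool
    rd: int
    value: int


def branch_operand(src: int, regfile: list, ex_mem: PipeReg, mem_wb: PipeReg) -> int:
    """Value fed to the ID-stage equality unit for source register `src`.

    The newer result (EX/MEM) takes priority over the older one (MEM/WB), and
    register $0 is never forwarded. Assumes hazard detection has already stalled
    the branch when the producer's value is not ready, e.g. a load still in MEM,
    whose EX/MEM entry holds an address rather than the loaded data.
    """
    if ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == src:
        return ex_mem.value
    if mem_wb.reg_write and mem_wb.rd != 0 and mem_wb.rd == src:
        return mem_wb.value
    return regfile[src]


# Index of the stage (IF = 0) at whose end each kind of result becomes available.
RESULT_STAGE = {"alu": 2, "load": 3}  # EX for ALU instructions, MEM for loads


def branch_stalls(producer: str, distance: int) -> int:
    """Stall cycles for a branch comparing a register written by an instruction
    `distance` positions earlier (1 = immediately preceding).

    Counting the producer's IF as cycle 0, its result is usable from cycle
    RESULT_STAGE + 1, while the branch reaches ID in cycle distance + 1.
    """
    return max(0, RESULT_STAGE[producer] - distance)


print(branch_stalls("alu", 1), branch_stalls("load", 1))  # 1 2
```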

Moving Branch Execution to the ID Stage:

Moving branch execution to the ID stage is an improvement, because it reduces the penalty of
a taken branch to only one instruction, namely, the one currently being fetched.

To flush instructions in the IF stage, we add a control line, called IF.Flush, that zeros the
instruction field of the IF/ID pipeline register.

Clearing the register transforms the fetched instruction into a nop, an instruction that has
no action and changes no state.
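The IF.Flush behavior can be modeled as a pipeline register that either latches the fetched word or clears it. This sketch assumes MIPS encoding, where the all-zero word decodes as `sll $zero, $zero, 0`, the conventional nop; the `IFID` class and `clock_if_id` function are hypothetical.

```python
from dataclasses import dataclass

NOP = 0x00000000  # all-zero MIPS word: sll $zero, $zero, 0, which changes no state


@dataclass(frozen=True)
class IFID:
    """Contents of the IF/ID pipeline register (simplified, hypothetical model)."""
    pc_plus_4: int
    instruction: int


def clock_if_id(pc_plus_4: int, fetched: int, if_flush: bool) -> IFID:
    """Value latched into IF/ID at the clock edge.

    When IF.Flush is asserted (a taken branch was resolved in ID), the instruction
    field is zeroed, so the wrong-path instruction flows down the pipeline as a nop.
    """
    return IFID(pc_plus_4, NOP if if_flush else fetched)


print(clock_if_id(0x48, 0x8C880004, if_flush=True))
```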

Dynamic Branch Prediction:

Assuming a branch is not taken is one simple form of branch prediction. In that case, we
predict that branches are untaken, flushing the pipeline when we are wrong.

For the simple five-stage pipeline, such an approach, possibly coupled with compiler-based
prediction, is probably adequate.

With deeper pipelines, the branch penalty increases when measured in clock cycles.
Similarly, with multiple issue the branch penalty increases in terms of instructions lost.

This combination means that in an aggressive pipeline, a simple static prediction scheme
will probably waste too much performance. As we mentioned in Section 4.5, with more
hardware it is possible to try to predict branch behavior during program execution.

One approach is to look up the address of the instruction to see if a branch was taken the
last time this instruction was executed and, if so, to begin fetching new instructions from the
same place as the last time. This technique is called dynamic branch prediction.

One implementation of that approach is a branch prediction buffer or branch history table.
A branch prediction buffer is a small memory indexed by the lower portion of the address
of the branch instruction. The memory contains a bit that says whether the branch was recently
taken or not.
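A minimal model of such a 1-bit buffer follows. It assumes word-aligned instructions (so the two lowest address bits are dropped before indexing), a table of 2^`index_bits` entries, and entries that start out predicting not taken; the class and method names are hypothetical. Because entries carry no tag, two branches whose addresses share the same low bits share one prediction bit.

```python
class OneBitBHT:
    """Branch prediction buffer with one 'last outcome' bit per entry (hypothetical model)."""

    def __init__(self, index_bits: int = 10):
        self.mask = (1 << index_bits) - 1
        self.bits = [False] * (1 << index_bits)  # assumption: start predicting not taken

    def _index(self, pc: int) -> int:
        # Drop the two always-zero bits of a word-aligned PC, keep the next index_bits bits.
        return (pc >> 2) & self.mask

    def predict(self, pc: int) -> bool:
        return self.bits[self._index(pc)]

    def update(self, pc: int, taken: bool) -> None:
        # Remember only the most recent outcome of whichever branch maps here.
        self.bits[self._index(pc)] = taken


bht = OneBitBHT(index_bits=4)
bht.update(0x100, True)
print(bht.predict(0x100), bht.predict(0x140))  # 0x140 aliases with 0x100 in a 16-entry table
```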

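This simple 1-bit prediction scheme has a performance shortcoming: even if a branch is
almost always taken, we can predict incorrectly twice, rather than once, when it is not taken.
The rare not-taken outcome is mispredicted and also flips the bit, so the next execution, which
is taken again, is mispredicted as well.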

In a 2-bit scheme, a prediction must be wrong twice before it is changed. The accompanying figure shows
the finite-state machine for a 2-bit prediction scheme.
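The difference between the two schemes shows up on a loop branch. The sketch below uses an n-bit saturating counter, one common way to realize the 2-bit scheme (the state machine in the figure may differ in its exact transitions); with n = 1 it reduces to the 1-bit scheme. The outcome pattern, a branch taken nine times and then not taken once, repeated ten times, is an illustrative assumption, as is starting in the strongest taken state.

```python
def count_mispredictions(outcomes, bits: int) -> int:
    """Run one branch's outcome sequence through an n-bit saturating counter.

    Predict taken when the counter is in its upper half; count up on taken,
    down on not taken, saturating at 0 and 2**bits - 1.
    """
    top = (1 << bits) - 1
    counter = top  # assumption: start in the strongest 'taken' state
    misses = 0
    for taken in outcomes:
        predict_taken = counter > top // 2
        if predict_taken != taken:
            misses += 1
        counter = min(counter + 1, top) if taken else max(counter - 1, 0)
    return misses


loop_branch = ([True] * 9 + [False]) * 10  # hypothetical loop: 9 taken, 1 not taken, x10
print(count_mispredictions(loop_branch, 1), count_mispredictions(loop_branch, 2))  # 19 10
```

On this pattern the 1-bit predictor misses twice per pass after the first, while the 2-bit counter misses only once per pass, on the loop exit.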

A branch prediction buffer can be implemented as a small, special buffer accessed with
the instruction address during the IF pipe stage.
