CONTROL HAZARD
By
Dr. Mausumi Maitra
Professor
Dept. of Information Technology
Govt. College of Engg. and Ceramic Technology
Control Hazard
A branch disrupts the normal flow of control.
Whether an instruction is a branch is known only at the end of
the decode (ID) step.
The next instruction to execute may be either
(i) the next sequential instruction, if the branch is not
taken, or
(ii) the instruction at the address specified by the branch, if the branch is
taken.
For a conditional jump instruction, whether to jump or not is known only after
the ALU operation has been executed.
Control Instructions
Unconditional Jump : JMP 2050H
Conditional Jump :
JC 2060H, JNC 2050H
JZ 3098H, JNZ 4987H
• That an instruction is a branch is known at the end of the Decode (ID) step.
• Whether the fetched instructions should be executed or not is known only at
the end of the Execution (EX) step.
• The branch address is known only after the Data Memory (MEM) step.
• If the branch is not taken, the fetched instructions are processed normally.
• If the branch is taken, the fetched instructions are not processed, and the
instruction at the branch address is fetched instead.
• In that case, all the instructions following the branch in the pipeline become
useless and are drained from the pipeline, i.e. the pipeline is flushed,
losing a number of useful cycles.
Ex: Here instruction (i+1) is a branch instruction. The hardware knows that it is a
branch only at the end of the ID step. The branch address is assigned
to the PC at the end of the MEM step.
Instruction    Clock number
               1    2    3    4    5    6    7    8    9    10   11   12
i              IF   ID   EX   MEM  WB
i+1 (branch)        IF   ID   EX   MEM  WB
i+2                      IF   ID   EX   MEM  WB
i+3                      idle idle idle IF   ID   EX   MEM
i+2: continue if the branch is not taken; discard if the branch is taken.
i+3: fetched as shown if the branch is taken.
I1
I2 (conditional branch to Ik)
I3
.
.
.
Ik (branch target)
Ik+1
The number of pipeline cycles wasted between a taken branch and its branch
target is called the delay slot, d. In the example above there is a delay of
3 cycles (clocks 3, 4 and 5), so d = 3.
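The 3-cycle delay can be reproduced with a small Python sketch of the timing table. It assumes one instruction enters IF per clock and that the target of a taken branch is fetched in the clock after the branch finishes the stage that resolves it (MEM here). The function names are illustrative only.

```python
# Sketch of the 5-stage timing table above.
# Assumption: one instruction enters IF per clock, and a taken branch's target
# can be fetched only in the clock after the branch leaves resolve_stage.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def stage_cycles(first_if_cycle):
    """Map each stage name to the clock in which the instruction occupies it."""
    return {stage: first_if_cycle + k for k, stage in enumerate(STAGES)}

def branch_penalty(branch_if_cycle, resolve_stage="MEM"):
    """Return (clock in which the target is fetched, list of wasted clocks)."""
    resolve_cycle = stage_cycles(branch_if_cycle)[resolve_stage]
    target_if = resolve_cycle + 1
    wasted = list(range(branch_if_cycle + 1, target_if))
    return target_if, wasted

# Instruction i+1 (the branch) is fetched in clock 2, as in the table.
target_if, wasted = branch_penalty(branch_if_cycle=2)
print("branch i+1 stages:", stage_cycles(2))
print("target fetched in clock", target_if, "; wasted clocks:", wasted)  # 6, [3, 4, 5]
```

The number of wasted clocks, len(wasted), is the delay d used in the formulas that follow.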
Effect of Branch Penalty on Pipeline Performance
Let
Tav = average no. of cycles per instruction
Pb = probability that an instruction is a conditional branch
Then
Tav = Pb X (average no. of cycles per branch instruction)
+ (1 – Pb) X (average no. of cycles per non-branch instruction).
Two cases:
1. Condition is true: the target path is chosen, and the no. of cycles needed for
the execution is (1 + d) cycles (d = branch penalty).
2. Condition is false: the path continues from the next sequential instruction with no
branch penalty, hence one clock cycle is required per instruction.
Thus, the average no. of cycles per branch instruction is
Pt X (1 + d) + (1 – Pt) X 1 = 1 + d Pt
where Pt = probability that the target path is chosen.
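Since each non-branch instruction takes one cycle,
Tav = Pb X [Pt X (1 + d) + (1 – Pt) X 1] + (1 – Pb) X 1 = 1 + d Pb Pt
Effective pipeline throughput
H = 1 / Tav = 1 / (1 + d Pb Pt)
When Pb = Pt = 0 (more generally, whenever Pb X Pt = 0), H = 1, i.e. one
instruction completes per cycle.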
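A minimal Python sketch of the two formulas above (the function names are illustrative). It evaluates Tav in its long form and checks numerically that it agrees with the closed form 1 / (1 + d Pb Pt); the parameter values in the loop are arbitrary test points, not measurements.

```python
def average_cycles(pb, pt, d):
    """Tav = Pb*[Pt*(1+d) + (1-Pt)*1] + (1-Pb)*1, which simplifies to 1 + d*Pb*Pt."""
    branch_cycles = pt * (1 + d) + (1 - pt) * 1   # average cycles per branch
    return pb * branch_cycles + (1 - pb) * 1      # non-branch instructions take 1 cycle

def throughput(pb, pt, d):
    """Effective pipeline throughput H = 1 / Tav (instructions per cycle)."""
    return 1 / average_cycles(pb, pt, d)

# Check the long form against the closed form 1 / (1 + d*Pb*Pt).
for pb, pt, d in [(0.0, 0.0, 3), (0.15, 0.8, 3), (0.2, 0.5, 1)]:
    closed = 1 / (1 + d * pb * pt)
    assert abs(throughput(pb, pt, d) - closed) < 1e-12
    print(f"Pb={pb}, Pt={pt}, d={d}: H = {throughput(pb, pt, d):.3f}")
```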
Ex : Let the percentage of unconditional branches in a set of typical
programs be 5% and that of conditional branches be 15%. Assume
that 80% of the conditional branches are taken. Calculate the % loss
of speed due to branches.
Soln.
Take the branch penalty d = 3 cycles, as in the pipeline above, and assume that
every unconditional branch incurs it.
Without branches, no. of cycles per instruction = 1
Average delay cycles due to unconditional branches = 3 X 0.05 = 0.15
Average delay cycles due to conditional branches = 3 X 0.15 X 0.8 = 0.36
Therefore, total delay = 0.15 + 0.36 = 0.51 cycles
Therefore, Tav = 1 + 0.51 = 1.51 cycles.
Pipeline throughput H = 1 / Tav = 1 / 1.51 = 0.662
The pipeline operates at 66.2% of its maximum throughput.
Therefore, % loss of speed due to branches = 100% – 66.2% = 33.8%
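The arithmetic of the solution can be checked with a few lines of Python, under the same assumptions as the solution: d = 3 cycles, every unconditional branch pays the full penalty, and every other instruction takes one cycle.

```python
# Worked example, with the solution's assumptions.
D = 3             # branch penalty in cycles
P_UNCOND = 0.05   # fraction of instructions that are unconditional branches
P_COND = 0.15     # fraction of instructions that are conditional branches
P_TAKEN = 0.80    # fraction of conditional branches that are taken

delay = D * P_UNCOND + D * P_COND * P_TAKEN   # 0.15 + 0.36 = 0.51
t_av = 1 + delay                              # 1.51 cycles per instruction
h = 1 / t_av                                  # about 0.662
loss_percent = (1 - h) * 100                  # about 33.8 %

print(f"delay = {delay:.2f}, Tav = {t_av:.2f}, H = {h:.3f}, loss = {loss_percent:.1f}%")
```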
Thus it is essential to reduce pipeline delay when branches
occur.
Two methods to avoid control hazards:
1. Hardware Method
The primary objective is to find out the branch address as
early as possible. A separate ALU is placed in the ID step of the
pipeline to compute the effective jump address at the earliest.
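As a rough sketch of why the hardware method pays off, assume (this is not stated above) that the separate ALU in ID resolves both the branch address and the branch condition by the end of ID, so the target can be fetched in the next clock and the penalty drops from d = 3 to d = 1 in this pipeline. Reusing the branch mix of the worked example:

```python
def loss_of_speed(d, p_uncond=0.05, p_cond=0.15, p_taken=0.80):
    """Percentage loss of speed due to branches for a branch penalty of d cycles.

    Default probabilities are the branch mix of the worked example.
    """
    t_av = 1 + d * p_uncond + d * p_cond * p_taken
    return (1 - 1 / t_av) * 100

# d = 3: branch address known after MEM (the pipeline above).
# d = 1: assumed penalty if a separate ALU resolves the branch in ID.
for where, d in [("after MEM", 3), ("in ID", 1)]:
    print(f"branch resolved {where}: d = {d}, loss = {loss_of_speed(d):.1f}%")
```

Under these assumptions the loss falls from about 33.8% to about 14.5%.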
2. Software Method
The compiler rearranges the statements of the assembly language
program so that the statement following the branch
statement, called the delay slot, is always executed once, without
affecting the correctness of the program.
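Original Program        Rearranged Program
    .                       .
    .                       .
    ADD A,B                 ADD A,B
    STO C                   STO C
    ADD D,F                 SUB B,C
    SUB B,C                 JMI X
    JMI X                   ADD D,F
    .                       .
X   .                   X   .
JMI X has been placed before ADD D,F. While JMI X is being decoded, ADD D,F
(now in the delay slot) is allowed to complete without changing the meaning of
the program: SUB B,C, which sets the condition tested by JMI, still executes
just before the branch, and ADD D,F does not use or modify B or C.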
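The check behind this rearrangement can be sketched as a small delay-slot filler in Python. This is an illustration under assumed semantics, not the exact compiler algorithm: each instruction carries hypothetical read/write sets, "ADD X,Y" is taken to mean X <- X + Y, "SUB B,C" to mean B <- B - C, "STO C" to store register A into C, and JMI is modelled as testing the sign of B, the result of SUB B,C. An instruction may move into the delay slot if it does not produce the branch condition and has no read/write conflict with the instructions it moves past.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instr:
    text: str
    reads: frozenset = frozenset()
    writes: frozenset = frozenset()

def instr(text, reads="", writes=""):
    """Hypothetical helper: single-letter register names given as strings."""
    return Instr(text, frozenset(reads), frozenset(writes))

def independent(a, b):
    """True if a and b have no RAW, WAR or WAW conflict and may be swapped."""
    return not (a.writes & b.reads or a.reads & b.writes or a.writes & b.writes)

def fill_delay_slot(block):
    """Move one instruction from before the final branch into its delay slot.

    block: straight-line instructions ending in the branch.
    Falls back to a NOP when no instruction can be moved safely.
    """
    *body, branch = block
    for j in range(len(body) - 1, -1, -1):
        cand = body[j]
        if cand.writes & branch.reads:
            continue  # cand produces the branch condition: keep it before the branch
        if all(independent(cand, later) for later in body[j + 1:]):
            return body[:j] + body[j + 1:] + [branch, cand]
    return body + [branch, Instr("NOP")]

# Assumed semantics for the example program (see the lead-in).
program = [
    instr("ADD A,B", reads="AB", writes="A"),
    instr("STO C", reads="A", writes="C"),
    instr("ADD D,F", reads="DF", writes="D"),
    instr("SUB B,C", reads="BC", writes="B"),
    instr("JMI X", reads="B"),
]
for ins in fill_delay_slot(program):
    print(ins.text)
```

Scanning backwards from the branch, SUB B,C is rejected because it produces the condition, and ADD D,F is the first instruction that can safely move past SUB B,C, which gives the rearranged program above.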
Original Program        Rearranged Program
Y   ADD A,B             Y   ADD A,B
    ADD C,D             Y+1 ADD C,D
    .                       .
    .                       .
    JMI Y                   JMI Y+1
    .                       ADD A,B
    .                       .
    .
Here the delay slot is filled by the target instruction of the branch (ADD A,B),
and the branch now jumps to Y+1. If the probability of the branch being taken is
high, this procedure is very effective. When the branch is not taken, the compiler
must arrange for the effect of the statement executed in the delay slot to be
undone. (Backward branching)
Original Program        Rearranged Program
    .                       .
    .                       .
    ADD A,B                 ADD A,B
    JMI Z                   JMI Z+1
    .                       ADD C,D
    .                       .
Z   ADD C,D                 .
    .                   Z+1 .
    .                       .
Forward branching
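Both the backward and the forward rearrangements follow one recipe: copy the branch target into the delay slot, label the instruction after the target (Y+1, Z+1) and retarget the branch to it. A minimal Python sketch, assuming a program is a list of (label, text) pairs and that the instruction after the target is unlabeled; fill_from_target and the placeholder NEXT are hypothetical names. The sketch keeps the original target instruction in place, since other code may still jump to it, and it does not model undoing the delay-slot instruction when the branch is not taken.

```python
def fill_from_target(program, branch_idx):
    """Fill the delay slot of the branch at branch_idx with a copy of its target.

    program: list of (label, text) pairs; the branch text is "<opcode> <label>",
    e.g. "JMI Y". Returns a new list and leaves the input unchanged.
    """
    label, branch_text = program[branch_idx]
    opcode, target = branch_text.split()
    t = next(i for i, (lab, _) in enumerate(program) if lab == target)
    if t + 1 == branch_idx:
        raise ValueError("target immediately precedes the branch")
    next_label, next_text = program[t + 1]
    if next_label:
        raise ValueError("instruction after the target already has a label")
    new_target = f"{target}+1"
    result = list(program)
    result[t + 1] = (new_target, next_text)                 # label Y+1 / Z+1
    result[branch_idx] = (label, f"{opcode} {new_target}")  # branch skips the copy
    result.insert(branch_idx + 1, ("", program[t][1]))      # copy into the delay slot
    return result

backward = [("Y", "ADD A,B"), ("", "ADD C,D"), ("", "JMI Y")]
# "NEXT" stands in for the instruction after Z, which the slide elides.
forward = [("", "ADD A,B"), ("", "JMI Z"), ("Z", "ADD C,D"), ("", "NEXT")]
for prog, idx in [(backward, 2), (forward, 1)]:
    for lab, text in fill_from_target(prog, idx):
        print(f"{lab:5}{text}")
    print()
```

The indices t and branch_idx are computed before the copy is inserted, so the relabelling is correct for both the backward case (target before the branch) and the forward case (target after it).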