CH-1 1 Pipelining
UNIT - IV
CONTENT
1. Introduction of Pipelining
2. Instruction Pipelining
3. Arithmetic Pipelining
4. Pipelining Hazards
5. Numerical on Pipelining
1. INTRODUCTION OF PIPELINING
INTRODUCTION
Pipelined Execution
Fetch Fetching the instruction Fetching the instruction Fetching the instruction
Decode Decoding the instruction Decoding the instruction Decoding the instruction
Execute Executing the instruction Memory Access for operands Calculate the effective address
Write Back Executing the instruction of the operand
4-stage pipelining
PERFORMANCE MEASURES FOR THE GOODNESS OF A PIPELINE
◼ To formulate performance measures for the goodness of a pipeline in processing a series of
instructions, a space-time chart (called a Gantt chart) is used.
◼ In this chart, the vertical axis represents the segments (four in this case) and the
horizontal axis represents time. Each subunit takes the same time T to perform its task,
so T is known as the unit time.
SPEED-UP S(N)
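The speed-up can be read off the space-time chart. The sketch below assumes n segments, m tasks, and a common unit time T per segment, so that a non-pipelined unit needs n × T per task while the pipeline finishes m tasks in (n + m - 1) × T. The function names are illustrative only.

```python
def pipeline_cycles(n_segments: int, m_tasks: int) -> int:
    """Unit-time cycles needed to finish m tasks on an n-segment pipeline."""
    return n_segments + m_tasks - 1


def speedup(n_segments: int, m_tasks: int) -> float:
    """S(n) = sequential time / pipelined time = (m * n) / (n + m - 1)."""
    return (m_tasks * n_segments) / pipeline_cycles(n_segments, m_tasks)


if __name__ == "__main__":
    # With 4 segments the speed-up climbs toward 4 as the task count grows.
    for m in (1, 4, 100, 10_000):
        print(m, round(speedup(4, m), 3))
```

As m grows, S(n) approaches n, the number of segments.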
There are mainly two types of pipelining:
1. Instruction Pipelining
2. Arithmetic Pipelining
2. INSTRUCTION PIPELINING
INSTRUCTION PIPELINING
◼ Instruction pipelines are used to divide the task of executing a stream of instructions into
subtasks to be executed in different pipeline segments to improve the throughput of the
computer system.
◼ For example, if we have a stream of instructions, then one segment of the pipeline can
read the instructions while another segment can decode the previous instruction. In this
way, more than one instruction will be handled simultaneously by the computer system
which will improve its throughput. The instruction pipeline is most efficient when the
instruction cycle is divided into segments of equal duration.
3. ARITHMETIC PIPELINING
ARITHMETIC PIPELINING
◼ Arithmetic pipelines are used to divide an arithmetic task into subtasks to be executed in
different pipeline segments.
◼ The main purpose is to speed up the arithmetic operations.
◼ Pipeline arithmetic units are usually found in very high-speed computers. They are used
to implement floating-point operations, multiplication of fixed-point numbers, and similar
computations encountered in scientific problems.
◼ Floating-point operations are easily decomposed into suboperations.
◼ A pipeline multiplier is essentially an array multiplier, with special adders designed to minimize the carry
propagation time through the partial products.
ARITHMETIC PIPELINING: FLOATING POINT ADDER
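◼ Two floating-point numbers are represented in the form
X = A × 2^a
Y = B × 2^b
where A and B are the mantissas and a and b are the exponents.
[Figure: pipeline for floating-point addition. The exponents a, b and the mantissas A, B enter the pipeline; segment 2 chooses the exponent and aligns the mantissa.]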
◼ The following numerical example may clarify the suboperations performed in each
segment. For simplicity, we use decimal numbers, although the figure refers to binary numbers.
1. Consider the two normalized floating-point numbers:
X = 0.9504 × 10^3
Y = 0.8200 × 10^2
2. The two exponents are subtracted in the first segment to obtain 3 - 2 = 1. The larger
exponent, 3, is chosen as the exponent of the result.
3. The next segment shifts the mantissa of Y to the right to obtain
X = 0.9504 × 10^3
Y = 0.0820 × 10^3
EXAMPLE: FLOATING POINT ADDER
4. This aligns the two mantissas under the same exponent. The addition of the two mantissas
in segment 3 produces the sum Z = 1.0324 × 10^3.
5. The sum is adjusted by normalizing the result so that it has a fraction with a nonzero first
digit. This is done by shifting the mantissa once to the right and incrementing the
exponent by one to obtain the normalized sum
Z = 0.10324 × 10^4.
6. The comparator, shifter, adder-subtractor, incrementer, and decrementer in the
floating-point pipeline are implemented with combinational circuits. Suppose that the
time delays of the four segments are t1 = 60 ns, t2 = 70 ns, t3 = 100 ns, t4 = 80 ns, and the
interface registers have a delay of tr = 10 ns.
7. The clock cycle is chosen to be tp = t3 + tr = 110 ns.
8. An equivalent nonpipelined floating-point adder-subtractor will have a delay time
tn = t1 + t2 + t3 + t4 + tr = 320 ns.
9. In this case the pipelined adder has a speedup of 320/110 ≈ 2.9 over the nonpipelined
adder.
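The four segments of the example and the timing figures can be traced in a short Python sketch. The operand values X = 0.9504 × 10^3 and Y = 0.8200 × 10^2 are an assumption, chosen to be consistent with the sum Z = 1.0324 × 10^3 quoted above. `fp_add` and `adder_speedup` are illustrative names, and the decimal arithmetic mirrors the worked example rather than real binary hardware.

```python
from decimal import Decimal


def fp_add(x, y):
    """Add two decimal floating-point numbers given as (mantissa, exponent)."""
    (ma, ea), (mb, eb) = x, y
    # Segment 1: compare the exponents by subtraction, keep the larger one.
    diff = ea - eb
    exp = max(ea, eb)
    # Segment 2: shift the mantissa with the smaller exponent to the right.
    if diff >= 0:
        mb = mb.scaleb(-diff)
    else:
        ma = ma.scaleb(diff)
    # Segment 3: add the aligned mantissas.
    total = ma + mb
    # Segment 4: normalize so the first fraction digit is nonzero.
    while abs(total) >= 1:
        total = total.scaleb(-1)
        exp += 1
    while total != 0 and abs(total) < Decimal("0.1"):
        total = total.scaleb(1)
        exp -= 1
    return total, exp


def adder_speedup(segment_delays_ns, register_delay_ns):
    """Return (clock cycle, non-pipelined delay, speedup) for the adder."""
    tp = max(segment_delays_ns) + register_delay_ns  # slowest segment sets the clock
    tn = sum(segment_delays_ns) + register_delay_ns  # all segments in series
    return tp, tn, tn / tp


if __name__ == "__main__":
    print(fp_add((Decimal("0.9504"), 3), (Decimal("0.8200"), 2)))
    print(adder_speedup([60, 70, 100, 80], 10))
```

The clock cycle is set by the slowest segment, which is why balancing the segment delays matters more than shortening any one fast segment.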
4. PIPELINING HAZARDS
PIPELINING HAZARDS
◼ Pipeline hazards are situations that prevent the next instruction in the instruction stream
from executing during its designated clock cycles.
◼ Any condition that causes a stall in the pipeline operations can be called a hazard.
1. Data Hazards
2. Structural Hazards
3. Control Hazards
1. DATA HAZARDS
◼ A data hazard is any condition in which either the source or the destination operands of an
instruction are not available at the time expected in the pipeline. As a result, some
operation has to be delayed, and the pipeline stalls.
◼ Data hazards are divided into three types according to the order in which READ or WRITE
operations are performed on the register:
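RAW (Read After Write): This is when one instruction makes use of data produced by a previous instruction, so it
must not read the register before that instruction has written it.
Example (illustrative),
ADD X0, X1, X2
SUB X4, X0, X3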
WAR (Write After Read): This occurs when the second instruction writes to a register before the first instruction
has read it, which is a race condition. In a pipeline with a simple structure this is uncommon. WAR can, however,
occur in some machines with complex and special instructions.
Example,
ADD X2, X1, X0
SUB X0, X3, X4
WAW (Write After Write): This is a situation where two instructions in the pipeline at the same time write the same
register; the writes must take place in the same sequence in which the instructions were issued.
Example,
ADD X0, X1, X2
SUB X0, X4, X5
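The three cases can be told apart mechanically by comparing register names. The sketch below assumes the destination-first operand order used in the examples above (`ADD dest, src1, src2`). `parse` and `classify_hazards` are illustrative helpers, not part of any real toolchain.

```python
def parse(text):
    """Split 'ADD X2, X1, X0' into ('ADD', 'X2', 'X1', 'X0')."""
    opcode, operands = text.split(maxsplit=1)
    return (opcode, *[reg.strip() for reg in operands.split(",")])


def classify_hazards(first, second):
    """Return the data hazards between two instructions in program order."""
    _, dest1, *srcs1 = parse(first)
    _, dest2, *srcs2 = parse(second)
    hazards = set()
    if dest1 in srcs2:   # second reads what first writes
        hazards.add("RAW")
    if dest2 in srcs1:   # second writes what first reads
        hazards.add("WAR")
    if dest1 == dest2:   # both write the same register
        hazards.add("WAW")
    return hazards


if __name__ == "__main__":
    print(classify_hazards("ADD X2, X1, X0", "SUB X0, X3, X4"))
    print(classify_hazards("ADD X0, X1, X2", "SUB X0, X4, X5"))
```

A pair of instructions can show more than one hazard at once, which is why the function returns a set.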
DATA HAZARDS CLASSIFICATION
◼ Important Note:
WAW and WAR hazards can occur only when instructions are executed in parallel or out of order.
They arise because the compiler has allotted the same register number to different instructions, and
they are avoidable.
The situation is fixed either by having the compiler rename one of the registers or by delaying the
update of a register until the appropriate value has been produced.
Modern CPUs incorporate not only parallel execution with multiple ALUs but also out-of-order
issue and execution of instructions, along with pipelines of many stages.
1. DATA HAZARDS
◼ Solution 1: Insert three bubbles at the IF stage of the SUB instruction. This allows the ID (instruction
decode) stage of SUB to take place at t6. As a result, all subsequent instructions in the pipe are similarly delayed.
◼ Solution 2: Data forwarding. Data forwarding is the process of sending a result directly to the
functional unit that needs it: the result is transferred from one unit’s output to another’s input. The goal is to
have the result ready for the next instruction as soon as possible.
2. STRUCTURAL HAZARDS
◼ Hardware resource conflicts among the instructions in the pipeline cause structural
hazards. The resource involved may be memory, a general-purpose register (GPR), or an ALU.
3. CONTROL HAZARDS
◼ A control hazard occurs when the decision to execute an instruction is based on
the result of another instruction, such as a conditional branch, which checks the
resultant value of a condition.
◼ The branch and jump instructions decide the program flow by loading the appropriate
location into the Program Counter (PC). The PC holds the address of the next instruction to be
fetched and executed by the CPU. Consider the following sequence of instructions.
SOLUTION FOR CONTROL HAZARDS
1. Stall:
Stall the pipeline as soon as a branch instruction is decoded, and do not allow any further instruction
fetch (IF) until the branch is resolved. Stalling always reduces throughput. According to statistics, at
least 30% of the instructions in a program are branches; with stalling, the pipeline effectively operates
at 50% capacity.
2. Prediction:
Consider a for or a while loop that is repeated 100 times. The loop body runs 100 times without the
exit branch condition being met, and the program leaves the loop only on the 101st test. It is
therefore better to let the pipeline run its course and flush (undo) the speculatively fetched
instructions when the branch condition is finally met. This has less impact on the pipeline’s
throughput than stalling.
3. Dynamic Branch Prediction :
A history record is maintained with the help of a Branch Target Buffer (BTB). The BTB is a kind of cache whose
entries each hold the PC address of a branch instruction and the corresponding branch target
address. An entry is maintained for every branch instruction encountered.
4. Reordering Instructions:
Delayed branching entails reordering the instructions so that safe and useful instructions, which are
unaffected by the result of the branch, fill the slots immediately after the branch and execute while
the branch is being resolved. If no such instructions are available, NOPs are used. The compiler
implements the delayed branch.
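To compare the prediction schemes, the following Python sketch counts mispredictions for a single loop branch. It assumes a two-bit saturating counter per BTB entry, which is one common scheme and is not stated in the text above. The class name, the PC value, and the target address are hypothetical.

```python
class TwoBitPredictor:
    """BTB-like table: branch PC -> [target address, 2-bit saturating counter]."""

    def __init__(self):
        self.table = {}

    def predict(self, pc):
        entry = self.table.get(pc)
        # Unknown branches and counter values 0-1 predict "not taken".
        return entry is not None and entry[1] >= 2

    def update(self, pc, taken, target):
        entry = self.table.setdefault(pc, [target, 0])
        entry[1] = min(3, entry[1] + 1) if taken else max(0, entry[1] - 1)


def mispredictions(predictor, pc, target, outcomes):
    """Count outcomes that differ from the prediction made just before them."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken, target)
    return misses


if __name__ == "__main__":
    # Exit branch of a loop: condition not met 100 times, met on the 101st test.
    exit_branch = [False] * 100 + [True]
    print(mispredictions(TwoBitPredictor(), 0x40, 0x80, exit_branch))
    # Loop-closing branch: taken 100 times, then falls through once.
    back_branch = [True] * 100 + [False]
    print(mispredictions(TwoBitPredictor(), 0x40, 0x10, back_branch))
```

In both cases only a handful of the 101 branch outcomes are mispredicted, whereas stalling would pay a penalty on every one of them.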
5. NUMERICAL ON PIPELINING
QUESTION 1
In certain scientific computations it is necessary to perform the arithmetic operation (Ai + Bi)(Ci +
Di) with a stream of numbers. Specify a pipeline configuration to carry out this task. List the
contents of all registers in the pipeline for i = 1 through 6.
Solution:
QUESTION 2
Determine the number of clock cycles that it takes to process 200 tasks in a six-segment
pipeline.
Solution:
n = 6 segments
m = 200 tasks
(n + m – 1) = 6 + 200 – 1 = 205 cycles
QUESTION 3
Draw a space-time diagram for a six-segment pipeline showing the time it takes to process eight
tasks.
Solution:
(n + m – 1) = 6 + 8 – 1 = 13 cycles
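The space-time diagram for this case can also be generated programmatically. The sketch below labels the segments S1..S6 and the tasks T1..T8 (labels chosen here for illustration) and places task j in segment s during cycle j + s - 1.

```python
def space_time_chart(n_segments, m_tasks):
    """Return one text row per segment; columns are clock cycles."""
    cycles = n_segments + m_tasks - 1
    rows = []
    for seg in range(1, n_segments + 1):
        cells = []
        for cycle in range(1, cycles + 1):
            task = cycle - seg + 1  # task occupying this segment in this cycle
            cells.append(f"T{task}" if 1 <= task <= m_tasks else "--")
        rows.append(f"S{seg}: " + " ".join(cells))
    return rows


if __name__ == "__main__":
    for row in space_time_chart(6, 8):
        print(row)
```

Each row has 13 columns, and the last task T8 leaves segment S6 in the final column.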
QUESTION 4
A non-pipeline system takes 50 ns to process a task. The same task can be processed in a
six-segment pipeline with a clock cycle of 10 ns. Determine the speedup ratio of the
pipeline for 100 tasks. What is the maximum speedup that can be achieved?
Solution:
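One way to work this out is with the usual speedup ratio, S = (m × tn) / ((n + m - 1) × tp), where tn is the non-pipelined time per task, tp the pipeline clock cycle, n the number of segments, and m the number of tasks. The sketch below is a check under that assumption; the function names are illustrative.

```python
def speedup_ratio(tn_ns, tp_ns, n_segments, m_tasks):
    """Non-pipelined time for m tasks divided by pipelined time for m tasks."""
    return (m_tasks * tn_ns) / ((n_segments + m_tasks - 1) * tp_ns)


def max_speedup(tn_ns, tp_ns):
    """Limit of the speedup ratio as the number of tasks grows without bound."""
    return tn_ns / tp_ns


if __name__ == "__main__":
    # 100 tasks: 5000 ns non-pipelined versus (6 + 99) * 10 = 1050 ns pipelined.
    print(round(speedup_ratio(50, 10, 6, 100), 2))  # 4.76
    print(max_speedup(50, 10))                      # 5.0
```

The maximum is 5 rather than 6 because the non-pipelined unit takes 50 ns per task, which is less than six clock cycles of 10 ns.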
QUESTION 5
The pipeline in the figure has the following propagation times: 40 ns for the operands to be read from memory into
registers R1 and R2, 45 ns for the signal to propagate through the multiplier, 5 ns for the register transfer time
into R3, and 15 ns to add the two numbers into R5.
Solution: