Pipelining - Computer Architecture and Organization
Instruction pipelining is similar to the use of an assembly line in a manufacturing
plant. An assembly line takes advantage of the fact that a product goes through
various stages of production. By laying the production process out in an assembly
line, products at various stages can be worked on simultaneously. This process is
also referred to as pipelining, because, as in a pipeline, new inputs are accepted at
one end before previously accepted inputs appear as outputs at the other end.
It should be clear that this process will speed up instruction execution. If the fetch and execute
stages were of equal duration, the instruction cycle time would be halved. However, if we look
more closely at this pipeline (Figure 14.9b), we will see that this doubling of execution rate is
unlikely for two reasons:
The execution time will generally be longer than the fetch time. Execution will involve reading
and storing operands and the performance of some operation. Thus, the fetch stage may have to
wait for some time before it can empty its buffer.
A conditional branch instruction makes the address of the next instruction to be fetched unknown.
Thus, the fetch stage must wait until it receives the next instruction address from the execute
stage. The execute stage may then have to wait while the next instruction is fetched.
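To put rough numbers on the first of these reasons: with a two-stage pipeline, the clock must accommodate the slower stage, so the speedup falls short of 2. A minimal sketch, assuming hypothetical stage times of 20 ns for fetch and 30 ns for execute (these figures are illustrative, not from the text):

    # Hypothetical stage latencies; the ratio, not the values, is the point.
    fetch_ns, execute_ns = 20, 30
    n = 1000                                   # instructions executed

    sequential = n * (fetch_ns + execute_ns)   # no overlap at all
    # Pipelined: the effective cycle time is set by the slower stage.
    pipelined = (fetch_ns + execute_ns) + (n - 1) * max(fetch_ns, execute_ns)

    print(round(sequential / pipelined, 2))    # ~1.67, not the ideal 2.0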
Guessing can reduce the time loss from the second reason. A simple rule is the following: When a
conditional branch instruction is passed on from the fetch to the execute stage, the fetch stage
fetches the next instruction in memory after the branch instruction. Then, if the branch is not
taken, no time is lost. If the branch is taken, the fetched instruction must be discarded and a new
instruction fetched.
While these factors reduce the potential effectiveness of the two-stage pipeline, some
speedup occurs. To gain further speedup, the pipeline must have more stages. Let us
consider the following decomposition of the instruction processing.
Fetch instruction (FI): Read the next expected instruction into a buffer.
Decode instruction (DI): Determine the opcode and the operand specifiers.
Calculate operands (CO): Calculate the effective address of each source operand. This
may involve displacement, register indirect, indirect, or other forms of address
calculation.
Fetch operands (FO): Fetch each operand from memory. Operands in registers need not
be fetched.
Execute instruction (EI): Perform the indicated operation and store the result, if any, in
the specified destination operand location.
Write operand (WO): Store the result in memory.
With this decomposition, the various stages will be of more nearly equal duration.
For the sake of illustration, let us assume equal duration. Using this assumption,
Figure 14.10 shows that a six-stage pipeline can reduce the execution time for 9
instructions from 54 time units (9 instructions × 6 stages, executed strictly in
sequence) to 14 time units.
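The 14-unit figure is an instance of the general pipeline timing formula: with k stages of equal duration and n instructions, the first instruction completes after k time units and one more completes every unit thereafter, for k + (n - 1) units in total. A short sketch that reproduces the layout of Figure 14.10 (an approximation of the chart, not a copy of it):

    STAGES = ["FI", "DI", "CO", "FO", "EI", "WO"]   # the six-stage decomposition
    N = 9                                           # instructions, as in Figure 14.10

    total = len(STAGES) + (N - 1)                   # 6 + 8 = 14 time units
    print("total time units:", total)

    # Instruction i (0-based) occupies stage s during time unit i + s + 1.
    for i in range(N):
        row = [".."] * total
        for s, name in enumerate(STAGES):
            row[i + s] = name
        print(f"I{i + 1}: " + " ".join(row))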
Several comments are in order: The diagram assumes that each instruction goes
through all six stages of the pipeline. This will not always be the case. For
example, a load instruction does not need the WO stage. However, to simplify the
pipeline hardware, the timing is set up assuming that each instruction requires all
six stages. Also, the diagram assumes that all of the stages can be performed in
parallel. In particular, it is assumed that there are no memory conflicts. For
example, the FI, FO, and WO stages involve a memory access. The diagram
implies that all these accesses can occur simultaneously. Most memory systems
will not permit that. However, the desired value may be in cache, or the FO or WO
stage may be null. Thus, much of the time, memory conflicts will not slow down
the pipeline.
Several other factors serve to limit the performance enhancement. If the six stages
are not of equal duration, there will be some waiting involved at various pipeline
stages, as discussed before for the two-stage pipeline. Another difficulty is the
conditional branch instruction, which can invalidate several instruction fetches. A
similar unpredictable event is an interrupt. Figure 14.11 illustrates the effects of
the conditional branch, using the same program as Figure 14.10. Assume that
instruction 3 is a conditional branch to instruction 15. Until the instruction is
executed, there is no way of knowing which instruction will come next. The
pipeline, in this example, simply loads the next instruction in sequence (instruction
4) and proceeds. In Figure 14.10, the branch is not taken, and we get the full
performance benefit of the enhancement. In Figure 14.11, the branch is taken. This
is not determined until the end of time unit 7. At this point, the pipeline must be
cleared of instructions that are not useful. During time unit 8, instruction 15 enters
the pipeline. No instructions complete during time units 9 through 12; this is the
performance penalty incurred because we could not anticipate the branch.
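A minimal sketch of the Figure 14.11 scenario, under the same six-stage model (the schedule helper below is an assumption used for illustration): instruction 3 enters the pipeline at time unit 3, resolves its branch in the EI stage at time 7, the speculatively fetched I4 through I7 are flushed, and I15 enters at time 8.

    STAGE_NAMES = ["FI", "DI", "CO", "FO", "EI", "WO"]

    def schedule(fetch_time):
        # Map each stage to the time unit an instruction occupies it,
        # given the time unit at which it is fetched.
        return {name: fetch_time + s for s, name in enumerate(STAGE_NAMES)}

    i3 = schedule(3)                       # I3 enters FI at time 3
    print("branch resolved at time", i3["EI"])        # 7

    flushed = [f"I{k}" for k in range(4, 8)]          # I4..I7 discarded
    i15 = schedule(i3["EI"] + 1)           # target fetched at time 8
    print("flushed:", flushed)
    print("I15 completes at time", i15["WO"])         # 13; nothing completes 9-12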
Figure 14.12 indicates the logic needed for pipelining to account for branches and
interrupts.
Other problems arise that did not appear in our simple two-stage organization. The
CO stage may depend on the contents of a register that could be altered by a
previous instruction that is still in the pipeline. Other such register and memory
conflicts could occur. The system must contain logic to account for this type of
conflict.
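As a sketch of what such interlock logic has to decide (the register-set representation is an assumption for illustration, not the book's design): before the CO stage reads a register, check whether any older instruction still in the pipeline has yet to write that register, and stall if so.

    def must_stall(co_source_regs, pending_dest_regs):
        # co_source_regs: registers the CO stage is about to read.
        # pending_dest_regs: registers that older, still-in-flight
        # instructions will write but have not yet written.
        return any(r in pending_dest_regs for r in co_source_regs)

    # An older instruction will write R1; address calculation reads R1 -> stall.
    print(must_stall({"R1", "R6"}, {"R1"}))   # True
    print(must_stall({"R2", "R6"}, {"R1"}))   # False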
To clarify pipeline operation, it might be useful to look at an alternative depiction.
Figures 14.10 and 14.11 show the progression of time horizontally across the
figures, with each row showing the progress of an individual instruction. Figure
14.13 shows the same sequence of events, with time progressing vertically down the
figure, and each row showing the state of the pipeline at a given point in time. In
Figure 14.13a (which corresponds to Figure 14.10), the pipeline is full at time 6,
with 6 different instructions in various stages of execution, and remains full
through time 9; we assume that instruction I9 is the last instruction to be executed.
In Figure 14.13b (which corresponds to Figure 14.11), the pipeline is full at times
6 and 7. At time 7, instruction 3 is in the execute stage and executes a branch to
instruction 15. At this point, instructions I4 through I7 are flushed from the
pipeline, so that at time 8, only two instructions are in the pipeline, I3 and I15.
In the previous subsection, we mentioned some of the situations that can result in
less than optimal pipeline performance. In this subsection, we examine this issue
in a more systematic way. Chapter 16 revisits this issue, in more detail, after we
have introduced the complexities found in superscalar pipeline organizations.
A pipeline hazard occurs when the pipeline, or some portion of the pipeline, must
stall because conditions do not permit continued execution. Such a pipeline stall
is also referred to as a pipeline bubble. There are three types of hazards: resource,
data, and control.
A data hazard occurs when two instructions in sequence access the same register or
memory location and the overlap in the pipeline would produce a different result
than strict sequential execution. There are three types of data hazards:
• Read after write (RAW), or true dependency: An instruction modifies a location,
and a subsequent instruction reads that location. A hazard occurs if the read takes
place before the write is complete.
• Write after read (WAR), or antidependency: An instruction reads a location, and a
subsequent instruction writes to it. A hazard occurs if the write takes place before
the read.
• Write after write (WAW), or output dependency: Two instructions both write
to the same location. A hazard occurs if the write operations take place in the
reverse order of the intended sequence.
The example of Figure 14.16 is a RAW hazard. The other two hazards are best
discussed in the context of the superscalar organization covered in Chapter 16.
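The three classes can be stated compactly in terms of the locations each instruction reads and writes (the set-based encoding below is an assumption for illustration, not the book's notation):

    def classify(i_reads, i_writes, j_reads, j_writes):
        # i is the earlier instruction, j the later one.
        found = []
        if i_writes & j_reads:
            found.append("RAW")   # j reads what i writes (true dependency)
        if i_reads & j_writes:
            found.append("WAR")   # j writes what i reads (antidependency)
        if i_writes & j_writes:
            found.append("WAW")   # both write the same place (output dependency)
        return found

    # ADD R1 <- R2 + R3, then SUB R4 <- R1 - R5: a RAW hazard on R1.
    print(classify({"R2", "R3"}, {"R1"}, {"R1", "R5"}, {"R4"}))   # ['RAW']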
A control hazard, also known as a branch hazard, occurs when the pipeline makes
the wrong decision on a branch prediction and therefore brings instructions into
the pipeline that must subsequently be discarded. We discuss approaches to
dealing with control hazards next.
A variety of approaches have been taken for dealing with conditional branches:
• Multiple streams
• Prefetch branch target
• Loop buffer
• Branch prediction (sketched below)
• Delayed branch
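Of these, branch prediction is the easiest to sketch in isolation. One widely used scheme (not detailed in this excerpt) keeps a 2-bit saturating counter per branch, so the prediction flips only after two consecutive mispredictions:

    class TwoBitPredictor:
        # States 0-1 predict not taken; states 2-3 predict taken.
        def __init__(self):
            self.counters = {}                       # branch address -> state 0..3

        def predict(self, addr):
            return self.counters.get(addr, 0) >= 2   # True means predict taken

        def update(self, addr, taken):
            state = self.counters.get(addr, 0)
            # Saturate at 0 and 3 so a single anomalous outcome does not
            # flip a strongly biased branch.
            self.counters[addr] = min(state + 1, 3) if taken else max(state - 1, 0)

    p = TwoBitPredictor()
    for outcome in [True, True, False, True]:        # a mostly taken loop branch
        print("predict taken:", p.predict(0x40), "actual:", outcome)
        p.update(0x40, outcome)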