Module 5 Notes Bcs302[1]
PIPELINING
8.1 Basic Concepts
Pipelining is a particularly effective way of organizing concurrent activity in a
computer system.
Consider how the idea of pipelining can be used in a computer. The processor executes
a program by fetching and executing instructions, one after the other. Let Fi and Ei refer to the
fetch and execute steps for instruction Ii. Execution of a program consists of a sequence of fetch
and execute steps, as shown in Figure 8.1a.
Now consider a computer that has two separate hardware units, one for fetching
instructions and another for executing them, as shown in Figure 8.1b.
The instruction fetched by the fetch unit is deposited in an intermediate storage buffer,
B1. This buffer is needed to enable the execution unit to execute the instruction while the fetch
unit is fetching the next instruction. The results of execution are deposited in the destination
location specified by the instruction.
The computer is controlled by a clock whose period is such that the fetch and execute
steps of any instruction can each be completed in one clock cycle. Operation of the computer
proceeds as in Figure 8.1c.
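The overlap of fetch and execute steps can be illustrated with a short Python sketch. It assumes, as the text does, that every fetch and every execute step fits in one clock cycle; the function name `two_stage_schedule` is our own illustrative choice, not something from the notes.

```python
# Minimal sketch of the two-stage fetch/execute overlap (Figure 8.1c style).
# Assumption: each fetch step and each execute step takes exactly one cycle,
# and buffer B1 holds one fetched instruction between the two units.

def two_stage_schedule(n):
    """Return (cycle, fetch_step, execute_step) tuples for n instructions."""
    rows = []
    for cycle in range(1, n + 2):          # n instructions need n + 1 cycles
        fetch = f"F{cycle}" if cycle <= n else "-"
        # The execution unit works on the instruction that the fetch unit
        # deposited in B1 during the previous cycle.
        execute = f"E{cycle - 1}" if cycle >= 2 else "-"
        rows.append((cycle, fetch, execute))
    return rows

for cycle, fetch, execute in two_stage_schedule(4):
    print(f"cycle {cycle}: fetch unit {fetch:>2}, execute unit {execute:>2}")
```

For four instructions, the sequential scheme of Figure 8.1a would need eight cycles (two per instruction), while the overlapped scheme finishes in five.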
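In the four-stage pipeline of Figure 8.2, with stages Fetch (F), Decode (D), Execute (E), and Write (W), interstage buffers pass information from one stage to the next. During clock cycle 4, for example, buffer B1 holds instruction I3, which was fetched in cycle 3 and is being decoded by the
instruction-decoding unit.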
Buffer B2 holds both the source operands for instruction I2 and the specification of the
operation to be performed.
Buffer B3 holds the results produced by the execution unit and the destination information
for instruction I1.
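The buffer contents listed above (which correspond to clock cycle 4) follow a simple pattern when the pipeline runs without stalls. The sketch below assumes an ideal four-stage pipeline with an unbounded stream of instructions, one cycle per stage, and instruction Ii entering the Fetch stage in cycle i; the function names are hypothetical.

```python
# Sketch: what each stage and each interstage buffer holds in a given cycle
# of an ideal four-stage pipeline (no stalls, one cycle per stage).

STAGES = ["F", "D", "E", "W"]

def stage_activity(cycle):
    """Map each stage to the instruction number it works on (None if idle)."""
    activity = {}
    for offset, stage in enumerate(STAGES):
        instr = cycle - offset            # instruction Ii enters F in cycle i
        activity[stage] = instr if instr >= 1 else None
    return activity

def buffer_contents(cycle):
    """B1 feeds D, B2 feeds E, B3 feeds W, so each holds what that stage uses."""
    act = stage_activity(cycle)
    return {"B1": act["D"], "B2": act["E"], "B3": act["W"]}

print(stage_activity(4))   # {'F': 4, 'D': 3, 'E': 2, 'W': 1}
print(buffer_contents(4))  # {'B1': 3, 'B2': 2, 'B3': 1}
```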
Each stage in a pipeline is expected to complete its operation in one clock cycle. Hence,
the clock period should be sufficiently long to complete the task being performed in any stage.
If different units require different amounts of time, the clock period must allow the longest task
to be completed. A unit that completes its task early is idle for the remainder of the clock period.
Hence, pipelining is most effective in improving performance if the tasks being performed in
different stages require about the same amount of time.
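This consideration is particularly important for the instruction fetch step, which is assigned
one clock period in Figure 8.2a. The clock cycle must be equal to or greater than the time
needed to complete a fetch operation. Because an access to the main memory can take much longer
than an operation performed inside the processor, fetching instructions directly from the main
memory would force a long clock period on every stage.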
The use of cache memories solves the memory access problem. In particular, when a cache
is included on the same chip as the processor, access time to the cache is usually the same as
the time needed to perform other basic operations inside the processor. This makes it possible
to divide instruction fetching and processing into steps that are more or less equal in duration.
Each of these steps is performed by a different pipeline stage, and the clock period is chosen
to correspond to the longest one.
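The rule that the clock period must fit the slowest stage can be checked numerically. This is a sketch only: the stage delays below are hypothetical values in nanoseconds, chosen to contrast a fetch from main memory with a fetch from an on-chip cache, and the helper names are our own.

```python
# Sketch: the clock period must fit the slowest stage, and faster stages idle.
# The stage delays are HYPOTHETICAL values in nanoseconds, picked only to
# contrast fetching from main memory with fetching from an on-chip cache.

def clock_period(delays):
    """The clock period is set by the longest stage delay."""
    return max(delays.values())

def idle_time(delays):
    """Time each stage spends idle in every clock period."""
    period = clock_period(delays)
    return {stage: period - d for stage, d in delays.items()}

main_memory_fetch = {"F": 10.0, "D": 2.0, "E": 2.0, "W": 2.0}  # hypothetical
cache_fetch = {"F": 2.0, "D": 2.0, "E": 2.0, "W": 2.0}         # hypothetical

for name, delays in [("main memory", main_memory_fetch), ("cache", cache_fetch)]:
    print(f"{name}: period = {clock_period(delays)} ns, idle = {idle_time(delays)}")
```

With the slow fetch, the D, E, and W units sit idle for most of each cycle; with equal stage delays, no unit waits.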
Once the pipeline is full, the pipelined processor in Figure 8.2 completes the processing of one
instruction in each clock cycle. Sequential operation needs four clock cycles per instruction, one
for each step, so the rate of instruction processing is four times that of sequential operation.
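The factor of four is a steady-state figure. A small sketch, assuming one cycle per stage and no stalls, counts total cycles for n instructions on a k-stage processor: n times k when executed sequentially, and k + n - 1 when pipelined (k cycles to fill the pipeline, then one completion per cycle). The function names are illustrative.

```python
# Sketch: total clock cycles for n instructions on a k-stage processor,
# assuming one cycle per stage and no stalls.

def sequential_cycles(n, k):
    # Each instruction goes through all k steps before the next one starts.
    return n * k

def pipelined_cycles(n, k):
    # k cycles for the first instruction to fill the pipeline,
    # then one instruction completes in every following cycle.
    return k + n - 1

for n in (1, 4, 100, 10_000):
    ratio = sequential_cycles(n, 4) / pipelined_cycles(n, 4)
    print(f"n={n:>6}: sequential={sequential_cycles(n, 4)}, "
          f"pipelined={pipelined_cycles(n, 4)}, speedup={ratio:.3f}")
```

The ratio approaches 4 only as n grows, because the first k - 1 cycles are spent filling the pipeline.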
For a variety of reasons, one of the pipeline stages may not be able to complete its
processing task for a given instruction in the time allotted. For example, stage E in the four-
stage pipeline of Figure 8.2b is responsible for arithmetic and logic operations, and one clock
cycle is assigned for this task. Although this may be sufficient for most operations, some
operations, such as divide, may require more time to complete.
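Figure 8.3 shows an example in which the operation specified in instruction I2 requires
three cycles to complete, from cycle 4 through cycle 6. Thus, in cycles 5 and 6, the Write stage
must be told to do nothing, because it has no data to work with. Meanwhile, the information in
buffer B2 must remain intact until the Execute stage has completed its operation. The Decode
stage therefore cannot pass on its result, and the Fetch stage cannot overwrite buffer B1. As a
result, steps D4 and F5 must be postponed as shown.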
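The stall in Figure 8.3 can be reproduced with a small scheduling sketch. Its assumptions are ours, not the notes': four stages F, D, E, W; each interstage buffer holds one instruction; and a stage stays blocked until the next stage accepts its instruction. The names `schedule` and `print_diagram` are hypothetical.

```python
# Sketch of the Figure 8.3 stall. Assumptions: stages F, D, E, W; one
# instruction per interstage buffer; a stage is blocked until the next
# stage accepts the instruction it holds.

STAGES = ["F", "D", "E", "W"]

def schedule(durations):
    """durations[i][s] = cycles instruction i+1 spends in stage s.
    Returns start[i][s], the cycle in which instruction i+1 begins stage s."""
    n, k = len(durations), len(STAGES)
    start = [[0] * k for _ in range(n)]
    for i in range(n):
        for s in range(k):
            earliest = 1
            if s > 0:   # the previous stage of this instruction must be done
                earliest = start[i][s - 1] + durations[i][s - 1]
            if i > 0:   # the stage must have finished the previous instruction
                earliest = max(earliest, start[i - 1][s] + durations[i - 1][s])
                if s + 1 < k:  # ...and that instruction must have moved on
                    earliest = max(earliest, start[i - 1][s + 1])
            start[i][s] = earliest
    return start

def print_diagram(durations):
    """Print one row per instruction, one column per clock cycle."""
    start = schedule(durations)
    for i, row in enumerate(start):
        cells = {}
        for s, name in enumerate(STAGES):
            for c in range(row[s], row[s] + durations[i][s]):
                cells[c] = f"{name}{i + 1}"
        line = " ".join(f"{cells.get(c, ''):>3}" for c in range(1, max(cells) + 1))
        print(f"I{i + 1}: {line}")

# Five instructions; the Execute step of I2 takes three cycles (cycles 4-6).
durations = [[1, 1, 1, 1] for _ in range(5)]
durations[1][2] = 3
print_diagram(durations)
```

Under these assumptions, W2 starts in cycle 7, leaving the Write stage idle in cycles 5 and 6, and both D4 and F5 move to cycle 7.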
Pipelined operation in Figure 8.3 is said to have been stalled for two clock cycles. Normal
pipelined operation resumes in cycle 7. Any condition that causes the pipeline to stall is called
a hazard. We have just seen an example of a data hazard. A data hazard is any condition in
which either the source or the destination operands of an instruction are not available at the
time expected in the pipeline. As a result, some operation has to be delayed, and the pipeline
stalls.
Figure 8.3: Effect of an execution operation taking more than one clock cycle.
The pipeline may also be stalled because of a delay in the availability of an instruction.
For example, this may be a result of a miss in the cache, requiring the instruction to be fetched
from the main memory. Such hazards are often called control hazards or instruction hazards.
The effect of a cache miss on pipelined operation is illustrated in Figure 8.4.
Instruction I1 is fetched from the cache in cycle 1, and its execution proceeds normally.
However, the fetch operation for instruction I2, which is started in cycle 2, results in a cache
miss. The instruction fetch unit must now suspend any further fetch requests and wait for I2 to
arrive. We assume that instruction I2 is received and loaded into buffer B1 at the end of cycle
5. The pipeline resumes its normal operation at that point.
An alternative representation of the operation of a pipeline in the case of a cache miss is
shown in Figure 8.4b. This figure gives the function performed by each pipeline stage in each
clock cycle.
Note that the Decode unit is idle in cycles 3 through 5, the Execute unit is idle in cycles 4
through 6, and the Write unit is idle in cycles 5 through 7. Such idle periods are called stalls.
They are also often referred to as bubbles in the pipeline. Once created as a result of a delay
in one of the pipeline stages, a bubble moves downstream until it reaches the last unit.
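The stage-by-cycle view of Figure 8.4b, and the way the bubble drifts one stage per cycle, can be sketched as follows. The sketch assumes that only the Fetch step can take more than one cycle and that I2's fetch occupies cycles 2 through 5, so it arrives in B1 at the end of cycle 5 as stated above; the helper names are our own.

```python
# Sketch of Figure 8.4b: which instruction each stage handles in each cycle
# when the fetch of I2 misses in the cache. Assumption: only Fetch can take
# more than one cycle; D, E, and W each take one cycle.

STAGES = ["F", "D", "E", "W"]

def stage_table(fetch_cycles):
    """fetch_cycles[i] = cycles needed to fetch instruction i+1.
    Returns {stage: {cycle: instruction_number}}."""
    table = {stage: {} for stage in STAGES}
    cycle = 1
    for i, length in enumerate(fetch_cycles, start=1):
        for c in range(cycle, cycle + length):
            table["F"][c] = i
        done = cycle + length - 1            # last fetch cycle of Ii
        for offset, stage in enumerate(STAGES[1:], start=1):
            table[stage][done + offset] = i  # D, E, W follow one cycle apart
        cycle = done + 1                     # the next fetch starts right after
    return table

def idle_cycles(busy):
    """Cycles between a stage's first and last use in which it does nothing."""
    return [c for c in range(min(busy), max(busy) + 1) if c not in busy]

table = stage_table([1, 4, 1])   # I1 hits, I2 misses (cycles 2-5), I3 hits

def cell(stage, c):
    return f"I{table[stage][c]}" if c in table[stage] else "-"

for stage in STAGES:
    row = " ".join(f"{cell(stage, c):>3}" for c in range(1, 10))
    print(f"{stage}: {row}   idle: {idle_cycles(table[stage])}")
```

The idle ranges for D, E, and W come out as cycles 3-5, 4-6, and 5-7, one cycle later at each stage, which is the downstream motion of the bubble.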
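Another kind of stall, called a structural hazard, arises when two instructions require the same
hardware resource at the same time. Suppose, for example, that two instructions need to write their
results into the register file in cycle 6. Even though the instructions and their data are all
available, the pipeline is stalled because one hardware resource, the register file, cannot handle
two operations at once.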
If the register file had two input ports, that is, if it allowed two simultaneous write
operations, the pipeline would not be stalled.
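The effect of the number of register-file write ports can be sketched by granting writes in program order, at most a fixed number per cycle. The requested write-back cycles below are hypothetical and only recreate the situation of two writes wanted in cycle 6; the function `grant_writes` is our own illustration.

```python
# Sketch of the register-file structural hazard: writes are granted in
# program order, at most `ports` per clock cycle. The requested cycles are
# HYPOTHETICAL; they only recreate "two writes wanted in cycle 6".

from collections import Counter

def grant_writes(requested, ports):
    """Return the cycle in which each write is actually performed."""
    used = Counter()           # writes already granted in each cycle
    granted = []
    last = 0
    for cycle in requested:
        c = max(cycle, last)   # keep writes in program order
        while used[c] >= ports:
            c += 1             # register file busy: this write waits a cycle
        used[c] += 1
        granted.append(c)
        last = c
    return granted

requested = [5, 6, 6, 7]       # hypothetical write-back cycles of I1..I4
for ports in (1, 2):
    granted = grant_writes(requested, ports)
    delays = [g - r for g, r in zip(granted, requested)]
    print(f"{ports} write port(s): granted {granted}, delays {delays}")
```

With one port, the second write in cycle 6 slips to cycle 7 and pushes the following write back as well; with two ports, every write happens when requested.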
In general, structural hazards are avoided by providing sufficient hardware resources on
the processor chip.