Pipelining - Computer Architecture and Organization

Instruction pipelining improves processor performance by dividing instruction processing into multiple stages that can operate in parallel. A multi-stage pipeline allows new instructions to begin processing before previous instructions have finished. However, pipeline hazards like resource conflicts, data dependencies, and control hazards can reduce the benefits of pipelining by stalling the pipeline. Common pipeline hazards include instructions competing for functional units like ALUs, instructions depending on results not yet available, and conditional branches altering the instruction flow. More advanced techniques attempt to mitigate these hazards to better utilize the parallelism of pipelined processors.


Instruction pipelining is similar to the use of an assembly line in a manufacturing
plant. An assembly line takes advantage of the fact that a product goes through
various stages of production. By laying the production process out in an assembly
line, products at various stages can be worked on simultaneously. This process is
also referred to as pipelining, because, as in a pipeline, new inputs are accepted at
one end before previously accepted inputs appear as outputs at the other end.

To apply this concept to instruction execution, we must recognize that, in fact, an
instruction has a number of stages. Figure 14.5, for example, breaks the
instruction cycle up into 10 tasks, which occur in sequence. Clearly, there should
be some opportunity for pipelining.
As a simple approach, consider subdividing instruction processing into two stages: fetch
instruction and execute instruction. There are times during the execution of an instruction when
main memory is not being accessed. This time could be used to fetch the next instruction in
parallel with the execution of the current one. Figure 14.9a depicts this approach. The pipeline has two
independent stages. The first stage fetches an instruction and buffers it. When the second stage is
free, the first stage passes it the buffered instruction. While the second stage is executing the
instruction, the first stage takes advantage of any unused memory cycles to fetch and buffer the
next instruction. This is called instruction prefetch or fetch overlap. Note that this approach,
which involves instruction buffering, requires more registers. In general, pipelining requires
registers to store data between stages.
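
The speedup from this overlap can be estimated with a short calculation. The following Python sketch is an illustration, not from the text, and assumes the idealized case in which fetch and execute each take exactly one time unit:

# Minimal sketch of two-stage fetch/execute overlap (idealized timings).
# Assumes every fetch and every execute takes exactly one time unit.

def two_stage_time(n):
    # After the first fetch fills the buffer, one instruction completes
    # per time unit, since the next fetch overlaps the current execute.
    return 1 + n

def sequential_time(n):
    # Without pipelining, fetch and execute happen strictly in series.
    return 2 * n

print(two_stage_time(9))   # 10 time units with overlap
print(sequential_time(9))  # 18 time units without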

It should be clear that this process will speed up instruction execution. If the fetch and execute
stages were of equal duration, the instruction cycle time would be halved. However, if we look
more closely at this pipeline (Figure 14.9b), we will see that this doubling of execution rate is
unlikely for two reasons:

The execution time will generally be longer than the fetch time. Execution will involve reading
and storing operands and the performance of some operation. Thus, the fetch stage may have to
wait for some time before it can empty its buffer.

A conditional branch instruction makes the address of the next instruction to be fetched unknown.
Thus, the fetch stage must wait until it receives the next instruction address from the execute
stage. The execute stage may then have to wait while the next instruction is fetched.

Guessing can reduce the time loss from the second reason. A simple rule is the following: When a
conditional branch instruction is passed on from the fetch to the execute stage, the fetch stage
fetches the next instruction in memory after the branch instruction. Then, if the branch is not
taken, no time is lost. If the branch is taken, the fetched instruction must be discarded and a new
instruction fetched.
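
As a rough illustration of this guessing rule, the following Python sketch (hypothetical, with made-up branch outcomes) counts the fetches that are wasted when branches are taken:

# Sketch of the "fetch the next sequential instruction" rule: the prefetched
# instruction is useful when the branch is not taken and discarded otherwise.

def wasted_fetches(branch_outcomes):
    # branch_outcomes: one boolean per conditional branch (True = taken).
    return sum(1 for taken in branch_outcomes if taken)

print(wasted_fetches([False, True, False, True, True]))  # 3 discarded fetches
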
While these factors reduce the potential effectiveness of the two-stage pipeline, some
speedup occurs. To gain further speedup, the pipeline must have more stages. Let us
consider the following decomposition of the instruction processing.

Fetch instruction (FI): Read the next expected instruction into a buffer.

Decode instruction (DI): Determine the opcode and the operand specifiers.

Calculate operands (CO): Calculate the effective address of each source operand. This
may involve displacement, register indirect, indirect, or other forms of address
calculation.

Fetch operands (FO): Fetch each operand from memory. Operands in registers need not
be fetched.

Execute instruction (EI): Perform the indicated operation and store the result, if any, in
the specified destination operand location.

Write operand (WO): Store the result in memory.

With this decomposition, the various stages will be of more nearly equal duration.
For the sake of illustration, let us assume equal duration. Using this assumption,
Figure 14.10 shows that a six-stage pipeline can reduce the execution time for 9
instructions from 54 time units to 14 time units.
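
The arithmetic behind this figure generalizes: with k stages of equal duration and n instructions, the pipeline needs k time units to fill and then completes one instruction per unit. A minimal Python sketch of this standard formula:

# Time units for n instructions on a k-stage pipeline with equal stage times.
def pipelined_time(n, k):
    return k + (n - 1)   # k units to fill the pipeline, then 1 completion/unit

def unpipelined_time(n, k):
    return n * k         # each instruction passes through all k stages alone

print(pipelined_time(9, 6))    # 14 time units, as in Figure 14.10
print(unpipelined_time(9, 6))  # 54 time units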

Several comments are in order: The diagram assumes that each instruction goes
through all six stages of the pipeline. This will not always be the case. For
example, a load instruction does not need the WO stage. However, to simplify the
pipeline hardware, the timing is set up assuming that each instruction requires all
six stages. Also, the diagram assumes that all of the stages can be performed in
parallel. In particular, it is assumed that there are no memory conflicts. For
example, the FI, FO, and WO stages involve a memory access. The diagram
implies that all these accesses can occur simultaneously. Most memory systems
will not permit that. However, the desired value may be in cache, or the FO or WO
stage may be null. Thus, much of the time, memory conflicts will not slow down
the pipeline.
Several other factors serve to limit the performance enhancement. If the six stages
are not of equal duration, there will be some waiting involved at various pipeline
stages, as discussed before for the two-stage pipeline. Another difficulty is the
conditional branch instruction, which can invalidate several instruction fetches. A
similar unpredictable event is an interrupt. Figure 14.11 illustrates the effects of
the conditional branch, using the same program as Figure 14.10. Assume that
instruction 3 is a conditional branch to instruction 15. Until the instruction is
executed, there is no way of knowing which instruction will come next. The
pipeline, in this example, simply loads the next instruction in sequence (instruction
4) and proceeds. In Figure 14.10, the branch is not taken, and we get the full
performance benefit of the enhancement. In Figure 14.11, the branch is taken. This
is not determined until the end of time unit 7. At this point, the pipeline must be
cleared of instructions that are not useful. During time unit 8, instruction 15 enters
the pipeline. No instructions complete during time units 9 through 12; this is the
performance penalty incurred because we could not anticipate the branch.
Figure 14.12 indicates the logic needed for pipelining to account for branches and
interrupts.
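
The four lost time units can be derived directly from the pipeline structure. In this six-stage design the branch is resolved in the EI stage, so every instruction fetched between the branch's FI and its EI is discarded when the branch is taken. A small Python sketch of that accounting (an illustration, not the book's logic):

# Branch penalty for the six-stage pipeline of Figure 14.11: instructions
# fetched while the branch moves from FI to EI must be flushed if it is taken.
FI, DI, CO, FO, EI, WO = range(6)

def taken_branch_penalty(resolve_stage=EI):
    # One wasted fetch per stage separating FI from the resolving stage.
    return resolve_stage - FI

print(taken_branch_penalty())  # 4 lost cycles: time units 9 through 12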

Other problems arise that did not appear in our simple two-stage organization. The
CO stage may depend on the contents of a register that could be altered by a
previous instruction that is still in the pipeline. Other such register and memory
conflicts could occur. The system must contain logic to account for this type of
conflict.
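
One simple form such interlock logic can take is a register comparison: stall if any register the CO stage wants to read will be written by an instruction still in flight. The Python sketch below is purely illustrative; the register names and the check itself are assumptions, not the text's design:

# Illustrative interlock check: stall the CO stage if a source register is
# the destination of an instruction that has not yet written its result back.

def must_stall(co_sources, in_flight_destinations):
    return any(reg in in_flight_destinations for reg in co_sources)

print(must_stall({"R1", "R3"}, {"R3"}))  # True: R3 not yet written back
print(must_stall({"R1"}, {"R3"}))        # False: no conflict
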
To clarify pipeline operation, it might be useful to look at an alternative depiction.
Figures 14.10 and 14.11 show the progression of time horizontally across the
figures, with each row showing the progress of an individual instruction. Figure
14.13 shows the same sequence of events, with time progressing vertically down the
figure, and each row showing the state of the pipeline at a given point in time. In
Figure 14.13a (which corresponds to Figure 14.10), the pipeline is full at time 6,
with 6 different instructions in various stages of execution, and remains full
through time 9; we assume that instruction I9 is the last instruction to be executed.
In Figure 14.13b, (which corresponds to Figure 14.11), the pipeline is full at times
6 and 7. At time 7, instruction 3 is in the execute stage and executes a branch to
instruction 15. At this point, instructions I4 through I7 are flushed from the
pipeline, so that at time 8, only two instructions are in the pipeline, I3 and I15.
In the previous subsection, we mentioned some of the situations that can result in
less than optimal pipeline performance. In this subsection, we examine this issue
in a more systematic way. Chapter 16 revisits this issue, in more detail, after we
have introduced the complexities found in superscalar pipeline organizations.

A pipeline hazard occurs when the pipeline, or some portion of the pipeline, must
stall because conditions do not permit continued execution. Such a pipeline stall
is also referred to as a pipeline bubble. There are three types of hazards: resource,
data, and control.

A resource hazard occurs when two (or more) instructions that are already in the
pipeline need the same resource. The result is that the instructions must be
executed in serial rather than parallel for a portion of the pipeline. A resource
hazard is sometimes referred to as a structural hazard.

Let us consider a simple example of a resource hazard. Assume a simplified five-stage
pipeline, in which each stage takes one clock cycle. Figure 14.15a shows the
ideal case, in which a new instruction enters the pipeline each clock cycle. Now
assume that main memory has a single port and that all instruction fetches and data
reads and writes must be performed one at a time. Further, ignore the cache. In this
case, an operand read from or write to memory cannot be performed in parallel
with an instruction fetch. This is illustrated in Figure 14.15b, which assumes that
the source operand for instruction I1 is in memory, rather than a register.
Therefore, the fetch instruction stage of the pipeline must idle for one cycle before
beginning the instruction fetch for instruction I3. The figure assumes that all other
operands are in registers.
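
A rough way to account for this cost, assuming one idle cycle per memory-resident operand (a simplification of the figure, not an exact model):

# Sketch of the single-port memory conflict of Figure 14.15b for a five-stage
# pipeline: each operand fetched from memory steals a cycle from instruction fetch.

def cycles_with_single_port(n_instructions, memory_operand_count):
    ideal = 5 + (n_instructions - 1)       # fill the pipe, then 1 per cycle
    return ideal + memory_operand_count    # one stall per memory operand

print(cycles_with_single_port(4, 0))  # 8 cycles: the ideal case of Figure 14.15a
print(cycles_with_single_port(4, 1))  # 9 cycles when I1's operand is in memory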

Another example of a resource conflict is a situation in which multiple instructions
are ready to enter the execute instruction phase and there is a single ALU. One
solution to such resource hazards is to increase available resources, such as
having multiple ports into main memory and multiple ALU units.
A data hazard occurs when there is a conflict in the access of an operand location.
In general terms, we can state the hazard in this form: Two instructions in a
program are to be executed in sequence and both access a particular memory or
register operand. If the two instructions are executed in strict sequence, no
problem occurs. However, if the instructions are executed in a pipeline, then it is
possible for the operand value to be updated in such a way as to produce a
different result than would occur with strict sequential execution. In other words,
the program produces an incorrect result because of the use of pipelining.

There are three types of data hazards:

• Read after write (RAW), or true dependency: An instruction modifies a register or
memory location and a succeeding instruction reads the data in that memory or
register location. A hazard occurs if the read takes place before the write
operation is complete.

• Write after read (WAR), or antidependency: An instruction reads a register or
memory location and a succeeding instruction writes to the location. A hazard
occurs if the write operation completes before the read operation takes place.

• Write after write (WAW), or output dependency: Two instructions both write
to the same location. A hazard occurs if the write operations take place in the
reverse order of the intended sequence.

The example of Figure 14.16 is a RAW hazard. The other two hazards are best
discussed in the context of superscalar organization, discussed in Chapter 16.
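
A RAW hazard is easy to demonstrate concretely. In the following Python sketch (a made-up two-instruction sequence, not Figure 14.16 itself), the second instruction computes the wrong value if it reads its source before the first instruction's write completes:

# Program order: ADD R1 <- R2 + R3, then SUB R4 <- R1 - 2.
regs = {"R1": 0, "R2": 5, "R3": 7, "R4": 0}

stale_r1 = regs["R1"]                  # value a too-early read would see
regs["R1"] = regs["R2"] + regs["R3"]   # ADD writes R1 = 12
regs["R4"] = regs["R1"] - 2            # correct in-order read: R4 = 10

print(regs["R4"])      # 10, the sequential result
print(stale_r1 - 2)    # -2, the wrong result if SUB reads R1 before the write
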
A control hazard, also known as a branch hazard, occurs when the pipeline makes
the wrong decision on a branch prediction and therefore brings instructions into
the pipeline that must subsequently be discarded. We discuss approaches to
dealing with control hazards next.

One of the major problems in designing an instruction pipeline is assuring a steady
flow of instructions to the initial stages of the pipeline. The primary impediment,
as we have seen, is the conditional branch instruction. Until the instruction is
actually executed, it is impossible to determine whether the branch will be taken or
not.

A variety of approaches have been taken for dealing with conditional branches:

Multiple streams
Prefetch branch target
Loop buffer
Branch prediction (sketched after this list)
Delayed branch
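
As one concrete instance of the branch prediction approach listed above, many processors use a 2-bit saturating counter per branch. The Python sketch below illustrates the idea; it is a generic textbook scheme, not a mechanism described in these figures:

# 2-bit saturating counter: states 0-3, predict taken when the counter is 2 or 3.
class TwoBitPredictor:
    def __init__(self):
        self.counter = 2  # start in "weakly taken" (an arbitrary choice)

    def predict(self):
        return self.counter >= 2

    def update(self, taken):
        # Move toward 3 on taken, toward 0 on not taken, saturating at the ends.
        self.counter = min(3, self.counter + 1) if taken else max(0, self.counter - 1)

p = TwoBitPredictor()
for outcome in [True, True, False, True]:
    print("predict taken:", p.predict(), "actual:", outcome)
    p.update(outcome)

Because two consecutive wrong outcomes are needed to flip the prediction, the counter is robust to the single not-taken iteration at the end of a loop.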
