
mywbut.com

Pipelining
Introduction to Pipelining

It is observed that organizational enhancements to the CPU can improve performance. We have
already seen that the use of multiple registers rather than a single accumulator, and the use of
cache memory, improve performance considerably. Another organizational approach, which is
quite common, is instruction pipelining.

Pipelining is a particularly effective way of organizing parallel activity in a computer system. The
basic idea is very simple. It is frequently encountered in manufacturing plants, where pipelining
is commonly known as an assembly line operation.

By laying the production process out in an assembly line, products at various stages can be
worked on simultaneously. This process is also referred to as pipelining because, as in a
pipeline, new inputs are accepted at one end before previously accepted inputs appear as
outputs at the other end.

To apply the concept of pipelining to instruction execution, it is required to break an instruction
into different tasks. Each task will be executed in a different processing element of the CPU.

As we know, there are two distinct phases of instruction execution: one is instruction fetch
and the other is instruction execution. Therefore, the processor executes a program by
fetching and executing instructions, one after another.

Let Fi and Ei refer to the fetch and execute steps for instruction Ii. Execution of a program
consists of a sequence of fetch and execute steps, as shown in Figure 9.1.

Figure 9.1: Sequential Execution.


Now consider a CPU that has two separate hardware units, one for fetching instructions and
another for executing them.

The instruction fetched by the fetch unit is stored in an intermediate storage buffer.

The results of execution are stored in the destination location specified by the instruction.

For simplicity it is assumed that fetch and execute steps of any instruction can be completed in
one clock cycle.

The operation of the computer proceeds as follows:

• In the first clock cycle, the fetch unit fetches an instruction (instruction I1, step
F1) and stores it in buffer B1 at the end of the clock cycle.

• In the second clock cycle, the instruction fetch unit proceeds with the fetch
operation for instruction I2 (step F2).

• Meanwhile, the execution unit performs the operation specified by instruction
I1, which is already fetched and available in the buffer (step E1).

• By the end of the second clock cycle, the execution of instruction I1 is
completed and instruction I2 is available.

• Instruction I2 is stored in buffer B1, replacing I1, which is no longer needed.

• Step E2 is performed by the execution unit during the third clock cycle, while
instruction I3 is being fetched by the fetch unit.


• Both the fetch and execute units are kept busy all the time, and one instruction is
completed after each clock cycle except the first clock cycle.

• If a long sequence of instructions is executed, the instruction completion rate
will be twice that achievable by sequential operation with only one unit that
performs both fetch and execute.
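As a quick sanity check (a sketch of our own, not from the text), the two-unit overlap can be expressed as a pair of cycle-count formulas:

```python
# Cycle counts for n instructions, assuming fetch and execute each
# take exactly one clock cycle (as in the text's simplification).

def sequential_cycles(n):
    # one combined unit: fetch, then execute, for every instruction
    return 2 * n

def two_stage_pipeline_cycles(n):
    # fetch of instruction i+1 overlaps the execution of instruction i,
    # so after the first fetch one instruction completes per cycle
    return n + 1

print(sequential_cycles(100))          # 200
print(two_stage_pipeline_cycles(100))  # 101
```

For a long instruction sequence the ratio approaches 2, the twofold completion rate claimed above.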

The basic idea of instruction pipelining, with its hardware organization, is shown in Figure 9.2.

Figure 9.2: Hardware Organization

The pipelined execution of the fetch and execute cycles is shown in Figure 9.3.

Figure 9.3: Pipeline Execution.


The processing of an instruction need not be divided into only two steps. To gain further
speedup, the pipeline must have more stages.

Let us consider the following decomposition of the instruction execution:

• Fetch Instruction (FI): Read the next expected instruction into a buffer.

• Decode Instruction (DI): Determine the opcode and the operand specifiers.

• Calculate Operands (CO): Calculate the effective address of each source operand.

• Fetch Operands (FO): Fetch each operand from memory.

• Execute Instruction (EI): Perform the indicated operation.

• Write Operand (WO): Store the result in memory.

There will be six different stages for these six subtasks. For the sake of simplicity, let us assume
equal duration for all the subtasks. If the six stages are not of equal duration, there
will be some waiting involved at various pipeline stages.

The timing diagram for the execution of instructions in pipelined fashion is shown in Figure
9.4.


Figure 9.4: Timing sequence in pipeline execution.

From this timing diagram it is clear that the total execution time of 8 instructions in this 6-stage
pipeline is 13 time units. The first instruction completes after 6 time units, and thereafter
one instruction completes in each time unit.

Without the pipeline, the total time required to complete 8 instructions would have been 48 (6 × 8)
time units. Therefore, there is a speedup in pipeline processing, and the speedup is related to
the number of stages.
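The 13 and 48 time-unit figures can be reproduced with a general counting rule (our own sketch; k and n denote the number of stages and instructions):

```python
def pipelined_time(k, n):
    # the first instruction fills all k stages; afterwards one
    # instruction completes per time unit
    return k + (n - 1)

def sequential_time(k, n):
    # without a pipeline each instruction occupies all k stages in turn
    return k * n

print(pipelined_time(6, 8))   # 13
print(sequential_time(6, 8))  # 48
```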


Pipeline Performance

The cycle time τ of an instruction pipeline is the time needed to advance a set of instructions
one stage through the pipeline. The cycle time can be determined as

    τ = max[τi] + d = τm + d,    1 ≤ i ≤ k

where

τm = maximum stage delay (delay through the stage which experiences the largest delay),
k = number of stages in the instruction pipeline,
d = time delay of a latch, needed to advance signals and data from one stage to the next.

In general, the time delay d is equivalent to a clock pulse and τm >> d.

Now suppose that n instructions are processed and these instructions are executed one after
another. The total time Tk required to execute all n instructions is

    Tk = [k + (n − 1)] τ

A total of k cycles are required to complete the execution of the first instruction, and the
remaining (n − 1) instructions require (n − 1) cycles.

The time T1 required to execute n instructions without the pipeline is

    T1 = n k τ

because each instruction takes k cycles to execute.

The speedup factor for the instruction pipeline compared to execution without the pipeline is
defined as

    Sk = T1 / Tk = n k τ / [k + (n − 1)] τ = n k / [k + (n − 1)]


In general, the number of instructions executed is much higher than the number of stages
in the pipeline. So, as n tends to ∞, we have

    Sk → k

i.e., we have a k-fold speedup; the speedup factor is a function of the number of stages in the
instruction pipeline.
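A small numeric check of the speedup formula (a sketch under the same unit-delay assumptions as above):

```python
def speedup(k, n, tau=1.0):
    # Sk = T1 / Tk = (n * k * tau) / ((k + n - 1) * tau)
    t1 = n * k * tau
    tk = (k + (n - 1)) * tau
    return t1 / tk

for n in (10, 100, 10000):
    print(round(speedup(6, n), 2))  # 4.0, 5.71, 6.0
```

As n grows, the speedup approaches k = 6, the number of stages.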

Though it has been seen that the speedup is proportional to the number of stages in the pipeline,
in practice the speedup is less due to some practical reasons. The factors that affect
pipeline performance are discussed next.

Effect of Intermediate storage buffer:

Consider a pipeline processor, which processes each instruction in four steps:

F: Fetch, Read the instruction from the memory

D: Decode, decode the instruction and fetch the source operand (S)

O: Operate, perform the operation

W: Write, store the result in the destination location.

The hardware organization of this four-stage pipeline processor is shown in the Figure 9.5.

Figure 9.5: Four stage pipeline processor


In the preceding section we have seen that the speedup of a pipeline processor is related to the
number of stages in the pipeline, i.e., the greater the number of stages in the pipeline, the faster
the execution rate.

But the organization of the stages of a pipeline is a complex task, and it affects the performance
of the pipeline.

Problems related to a larger number of stages:

At each stage of the pipeline, there is some overhead involved in moving data from buffer to
buffer and in performing various preparation and delivery functions. This overhead can
appreciably lengthen the total execution time of a single instruction.

The amount of control logic required to handle memory and register dependencies and to
optimize the use of the pipeline increases enormously with the number of stages.

Apart from the hardware organization, there are some other reasons which may affect the
performance of the pipeline.

(A) Unequal time requirement to complete a subtask:

Consider the four-stage pipeline with processing steps Fetch, Decode, Operate and Write.

Stage 3 of the pipeline is responsible for arithmetic and logic operations, and in general one
clock cycle is assigned for this task.

Although this may be sufficient for most operations, some operations, like divide, may
require more time to complete.

Figure 9.6 shows the effect of an operation that takes more than one clock cycle to complete
in the operate stage.


Clock cycle    1    2    3    4    5    6    7    8    9
I1             F1   D1   O1   W1
I2                  F2   D2   O2   O2   O2   W2
I3                       F3   D3   D3   D3   O3   W3
I4                            F4   F4   F4   D4   O4   W4

Figure 9.6: Operation takes more than one clock cycle

The operate stage for instruction I2 takes 3 clock cycles to perform the specified operation.
Clock cycles 4 to 6 are required to perform this operation, and so the write stage is doing nothing
during clock cycles 5 and 6, because no data is available to write.

Meanwhile, the information in buffer B2 must remain intact until the operate stage has
completed its operation.

This means that stage 2 and stage 1 are blocked from accepting new instructions because the
information in B1 cannot be overwritten by a new fetch instruction.

The contents of B1, B2 and B3 must always change at the same clock edge.

Due to that reason, the pipeline operation is said to have been stalled for two clock cycles. Normal
pipeline operation resumes in clock cycle 7.

Whenever the pipeline stalls, some degradation in performance occurs.
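The stall cost can be counted with a simple rule of thumb (our own sketch): every cycle a stage spends beyond its single allotted cycle delays all instructions behind it by one cycle.

```python
def cycles_with_stalls(k, op_cycles):
    # op_cycles[i] = clock cycles the operate stage needs for instruction i;
    # each cycle beyond the first stalls the whole pipeline by one cycle
    n = len(op_cycles)
    extra_cycles = sum(c - 1 for c in op_cycles)
    return k + (n - 1) + extra_cycles

# Figure 9.6 scenario: 4-stage pipeline, I2's operate step takes 3 cycles
print(cycles_with_stalls(4, [1, 3, 1, 1]))  # 9
```

With no multi-cycle operations the same call gives the ideal k + (n − 1) = 7 cycles, so the two extra cycles are exactly the stall shown in Figure 9.6.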

Role of cache memory:

The use of cache memory solves the memory access problem.


Occasionally, a memory request results in a cache miss. This causes the pipeline stage that
issued the memory request to take a much longer time to complete its task, and in this case the
pipeline stalls. The effect of a cache miss in pipeline processing is shown in Figure 9.7.

Figure 9.7: Effect of cache miss in pipeline processing (timing diagram for instructions I1 to I3 over clock cycles 1 to 9)

Clock cycle    1    2    3    4    5    6    7    8    9    10
F: Fetch       F1   F2   F2   F2   F2   F3   F4
D: Decode           D1   idle idle idle D2   D3
O: Operate               O1   idle idle idle O2   O3
W: Write                      W1   idle idle idle W2   W3

Figure 9.8: Function performed by each stage as a function of time.

The function performed by each stage as a function of time is shown in Figure 9.8.

In this example, instruction I1 is fetched from the cache in cycle 1 and its execution proceeds
normally.


The fetch operation for instruction I2, which starts in cycle 2, results in a cache miss.
The instruction fetch unit must now suspend any further fetch requests and wait for I2 to
arrive.

We assume that instruction I2 is received and loaded into buffer B1 at the end of cycle 5. It
appears that the cache memory used here is four times faster than the main memory.

The pipeline resumes its normal operation at that point, and it will remain in normal operation
mode for some time, because a cache miss generally transfers a whole block from main memory to
the cache.

From the figure, it is clear that the decode unit, operate unit and write unit remain idle for three
clock cycles.

Such idle periods are sometimes referred to as bubbles in the pipeline.

Once created as a result of a delay in one of the pipeline stages, a bubble moves downstream
until it reaches the last unit. A pipeline cannot stall as long as the instructions and data being
accessed reside in the cache. This is facilitated by providing separate on-chip instruction and
data caches.

Dependency Constraints:

Consider the following program that contains two instructions, I1 followed by I2:

I1: A ← A + 5

I2: B ← 3 × A

When this program is executed in a pipeline, the execution of I2 can begin before the
execution of I1 completes. The pipeline execution is shown in Figure 9.9.

Clock cycle    1    2    3    4    5    6
I1             F1   D1   O1   W1
I2                  F2   D2   O2   W2

Figure 9.9: Pipeline execution of two instructions

In clock cycle 3, the specified operation of instruction I1, i.e. the addition, takes place, and only
at that time does the new updated value of A become available. But in clock cycle 3, instruction I2
is fetching the operand that is required for its operation. Since the operation of instruction I1
takes place only in clock cycle 3, instruction I2 will get the old value of A, not the updated
value, and will produce a wrong result. Consider that the initial value of A is 4. Proper
execution will produce the result

B = 27

But due to the pipeline action, we will get the result

B = 12
Due to the data dependency, these two instructions cannot be performed in parallel.

Therefore, no two operations that depend on each other can be performed in parallel. For
correct execution, it is required to satisfy the following:

• The operation of the fetch stage must not depend on the operation performed
during the same clock cycle by the execution stage.

• The operation of fetching an instruction must be independent of the execution


results of the previous instruction.

• The dependency of data arises when the destination of one instruction is used as
a source in a subsequent instruction.
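The wrong result in the example above can be reproduced directly (A = 4 initially, as in the text):

```python
# I1: A <- A + 5, followed by I2: B <- 3 * A, with A initially 4.
A = 4

# Correct (sequential) execution: I2 sees the updated A.
A_updated = A + 5            # I1 completes: A = 9
B_correct = 3 * A_updated    # I2: B = 27

# Pipelined execution without interlocks: I2 fetches A in the same
# clock cycle in which I1 is still computing it, so it reads stale A.
B_wrong = 3 * A              # B = 12

print(B_correct, B_wrong)    # 27 12
```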


Branching

In general, when we are executing a program, the next instruction to be executed is brought
from the next memory location. Therefore, in a pipeline organization, we fetch instructions
one after another.

But in case of conditional branch instruction, the address of the next instruction to be fetched
depends on the result of the execution of the instruction.

Since the execution of the next instruction depends on the previous branch instruction, it may
sometimes be required to invalidate several instruction fetches. Consider the instruction
execution sequence of Figure 9.10.

Figure 9.10: Instruction execution sequence (instructions I1, I2, ... over time steps 1 to 10)

In this instruction sequence, consider that I3 is a conditional branch instruction.

The result of the instruction will be available at clock cycle 5. But by that time the fetch unit has
already fetched instructions I4 and I5.

If the branch condition is false, the branch won't take place, and the next instruction to be
executed is I4, which is already fetched and available for execution.

Now consider the case when the condition is true: we have to execute instruction I10. After clock
cycle 5, it is known that the branch condition is true and instruction I10 has to be executed.


But the processor has already fetched instructions I4 and I5. It is required to invalidate these
two fetched instructions, and the pipeline must be loaded with the new destination instruction I10.

Due to this reason, the pipeline will stall for some time. The time lost due to a branch instruction
is often referred to as the branch penalty.

Time     1    2    3    4    5    6    7    8    9    10
I1       F1   D1   O1   W1
I2            F2   D2   O2   W2
I3                 F3   D3   O3   W3
I4                      F4   D4
I5                           F5
I10                               F10  D10  O10  W10
I11                                    F11  D11  O11  W11

Figure 9.11: The effect when the branch takes place

The effect of the branch being taken is shown in Figure 9.11. Because the branch is taken, the
instructions I4 and I5, which have already been fetched, are not executed, and the new
instruction I10 is fetched at clock cycle 6.

There is no effective output in clock cycles 7 and 8, and so the branch penalty is 2.

The branch penalty depends on the number of stages in the pipeline: more stages result in a
larger branch penalty.
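The cost of a taken branch can be folded into the cycle-count rule used earlier (a sketch of our own; the penalty value of 2 is the one read off Figure 9.11):

```python
def total_cycles(k, n_completed, taken_branches, penalty):
    # ideal pipelined time plus a fixed penalty per taken branch
    return k + (n_completed - 1) + taken_branches * penalty

# Figure 9.11: 4-stage pipeline, 5 instructions complete (I1-I3, I10, I11),
# one taken branch with a penalty of 2 -> last result written in cycle 10
print(total_cycles(4, 5, 1, 2))  # 10
```

Without the taken branch the same five instructions would finish in cycle 8, which is exactly the two-cycle penalty visible in the figure.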

Dealing with Branches:


One of the major problems in designing an instruction pipeline is assuring a steady flow of
instructions to the initial stages of the pipeline.

The primary problem is the conditional branch instruction: until the instruction is actually
executed, it is impossible to determine whether the branch will be taken or not.

A variety of approaches have been taken for dealing with conditional branches:

• Multiple streams
• Prefetch branch target
• Loop buffer
• Branch prediction
• Delayed branch

Multiple streams

A single pipeline suffers a penalty for a branch instruction because it must choose one of two
instructions to fetch next and sometimes it may make the wrong choice.

A brute-force approach is to replicate the initial portions of the pipeline and allow the pipeline
to fetch both instructions, making use of two streams.

There are two problems with this approach.

• With multiple pipelines there are contention delays for access to the registers
and to memory

• Additional branch instructions may enter the pipeline (either stream) before the
original branch decision is resolved. Each such instruction needs an additional
stream.

Prefetch Branch target

When a conditional branch is recognized, the target of the branch is prefetched, in addition to
the instruction following the branch. This target is then saved until the branch instruction is
executed. If the branch is taken, the target has already been prefetched.

Loop Buffer:


A loop buffer is a small, very high speed memory maintained by the instruction fetch stage of
the pipeline, containing the most recently fetched instructions, in sequence.

If a branch is to be taken, the hardware first checks whether the branch target is within the
buffer. If so, the next instruction is fetched from the buffer.

The loop buffer has three benefits:

1. With the use of prefetching, the loop buffer will contain some instructions
sequentially ahead of the current instruction fetch address. Thus, instructions
fetched in sequence will be available without the usual memory access time.

2. If a branch occurs to a target just a few locations ahead of the address of the
branch instruction, the target will already be in the buffer. This is useful for the
common occurrence of IF-THEN and IF-THEN-ELSE sequences.

3. This strategy is particularly well suited for dealing with loops, or iterations; hence
the name loop buffer. If the loop buffer is large enough to contain all the
instructions in a loop, then those instructions need to be fetched from memory
only once, for the first iteration. For subsequent iterations, all the needed
instructions are already in the buffer.

The loop buffer is similar in principle to a cache dedicated to instructions. The differences are
that the loop buffer only retains instructions in sequence, and it is much smaller in size and hence
lower in cost.
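The hit test described above can be sketched as follows (a hypothetical structure with illustrative names, not a real design):

```python
class LoopBuffer:
    """Holds the addresses of the most recently fetched instructions,
    in sequence, as the loop buffer described in the text does."""

    def __init__(self, size):
        self.size = size
        self.addrs = []          # buffered instruction addresses, in order

    def record_fetch(self, addr):
        self.addrs.append(addr)
        if len(self.addrs) > self.size:
            self.addrs.pop(0)    # keep only the most recent `size` fetches

    def hit(self, target):
        # a taken branch can be served from the buffer only if the
        # target is among the recently fetched instructions
        return target in self.addrs

buf = LoopBuffer(size=8)
for addr in range(100, 112):     # fetch 12 sequential instructions
    buf.record_fetch(addr)
print(buf.hit(108))  # True: short backward branch into the loop body
print(buf.hit(100))  # False: that address has slipped out of the buffer
```

Because only the most recent fetches are retained, a loop small enough to fit in the buffer hits on every iteration after the first, while a more distant target misses.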

Branch Prediction:

Various techniques can be used to predict whether a branch will be taken or not. The most
common techniques are:

• Predict never taken

• Predict always taken

• Predict by opcode


• Taken/not taken switch

• Branch history table.

The first three approaches are static; they do not depend on the execution history up to the
time of the conditional branch instruction.

The latter two approaches are dynamic; they depend on the execution history.

Predict never taken always assumes that the branch will not be taken and continues to fetch
instructions in sequence.

Predict always taken assumes that the branch will be taken and always fetches the branch target.

In these two approaches it is also possible to minimize the effect of a wrong decision.

If the fetch of an instruction after the branch will cause a page fault or protection violation, the
processor halts its prefetching until it is sure that the instruction should be fetched.

Studies analyzing program behaviour have shown that conditional branches are taken more
than 50% of the time, and so if the cost of prefetching from either path is the same,
then always prefetching from the branch target address should give better performance than
always prefetching from the sequential path.

However, in a paged machine, prefetching the branch target is more likely to cause a page fault
than prefetching the next instruction in the sequence and so this performance penalty should
be taken into account.

The predict-by-opcode approach makes the decision based on the opcode of the branch instruction.
The processor assumes that the branch will be taken for certain branch opcodes and not for
others. Studies reported by Lilja showed that the success rate is greater than 75% with this
strategy.

Lilja, D., "Reducing the Branch Penalty in Pipelined Processors," Computer, July 1988.


Dynamic branch strategies attempt to improve the accuracy of prediction by recording the
history of conditional branch instructions in a program. Schemes to maintain the history
information:

• One or more bits can be associated with each conditional branch instruction that
reflect the recent history of the instruction.

• These bits are referred to as a taken/not taken switch that directs the processor
to make a particular decision the next time the instruction is encountered.

• Generally these history bits are not associated with the instruction in main
memory; that would unnecessarily increase the size of the instruction. With a single
bit we can record whether the last execution of this instruction resulted in a branch
or not.

• With only one bit of history, an error in prediction will occur twice for each use
of the loop: once for entering the loop, and once for exiting.

If two bits are used, they can record the result of the last two instances of the execution of
the associated instruction.

Since the history information is not kept in main memory, it can be kept in a temporary
high-speed memory.
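The two-bit scheme is commonly realized as a saturating counter; here is a sketch (our own, under the usual convention that states 0–1 predict not taken and 2–3 predict taken) showing that a loop now costs only one misprediction per execution rather than two:

```python
def predict(counter):
    # states 2 and 3 predict taken; states 0 and 1 predict not taken
    return counter >= 2

def update(counter, taken):
    # saturating increment/decrement on the 2-bit counter
    return min(counter + 1, 3) if taken else max(counter - 1, 0)

counter = 3                      # branch has been taken repeatedly
mispredicts = 0
# one execution of a loop: the branch is taken 9 times, then falls
# through once at loop exit
for taken in [True] * 9 + [False]:
    if predict(counter) != taken:
        mispredicts += 1
    counter = update(counter, taken)

print(mispredicts)  # 1: only the loop exit is mispredicted
```

After the exit, the counter sits at 2, so the branch is still predicted taken when the loop is entered again; with a single history bit both the exit and the next entry would mispredict.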

One possibility is to associate these bits with any conditional branch instruction that is in a
cache. When the instruction is replaced in the cache, its history is lost.

Another possibility is to maintain a small table for recently executed branch instructions with
one or more bits in each entry.

The branch history table is a small cache memory associated with the instruction fetch stage of
the pipeline. Each entry in the table consists of three elements:

• The address of the branch instruction.

• Some number of history bits that record the state of use of that instruction.


• Information about the target instruction: it may be the address of the target
instruction, or it may be the target instruction itself.
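The three-element entry listed above can be sketched as a small table keyed by the branch instruction's address (illustrative names and addresses, not a real design):

```python
branch_history_table = {}

def bht_lookup(branch_addr):
    # the fetch stage probes the table with the branch's own address
    return branch_history_table.get(branch_addr)

def bht_update(branch_addr, taken, target_addr):
    # create the entry on first use, then update the 2-bit history
    # and remember the most recent target address
    entry = branch_history_table.setdefault(
        branch_addr, {"history": 0, "target": target_addr})
    entry["history"] = (min(entry["history"] + 1, 3) if taken
                        else max(entry["history"] - 1, 0))
    entry["target"] = target_addr

bht_update(0x400, taken=True, target_addr=0x380)
entry = bht_lookup(0x400)
print(entry["target"] == 0x380, entry["history"])  # True 1
```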

Delayed Branch:

Time     1      2      3      4      5
Ij       Fj     Ej
Ij+1            Fj+1
Ik                     Fk     Ek
Ik+1                          Fk+1   Ek+1

Figure 9.12: Instructions in pipeline with branch instruction.

Figure 9.12 shows the execution of instructions in a pipeline where instruction Ij is a
branch instruction.

The processor begins fetching instruction Ij+1 before it determines whether the current
instruction, Ij, is a branch instruction.

When execution of Ij is completed and a branch must be made, the processor must discard
the instruction Ij+1 that was fetched, and now fetch the instruction Ik at the branch target.

The location following a branch instruction is called a branch delay slot. There may be more
than one branch delay slot, depending on the time it takes to execute a branch instruction.

The instructions in the delay slots are always fetched and at least partially executed before the
branch decision is made and the branch target address is computed.

Delayed branching is a technique to minimize the penalty incurred as a result of conditional
branch instructions.


The instructions in the delay slots are always fetched, so we can arrange for the instructions in
the delay slots to be fully executed whether or not the branch is taken.

The objective is to place useful instructions in these slots. If no useful instructions can be placed
in the delay slots, these slots must be filled with NOP (no operation) instructions.

While filling the delay slots with instructions, it is required to maintain the original
semantics of the program.

For example, consider the code segment given in Figure 9.13.

I1 LOOP Shift_left R1

I2 Decrement R2

I3 Branch_if≠0 LOOP

I4 NEXT Add R1,R3

Figure 9.13: Original Program Loop

Here register R2 is used as a counter to determine the number of times the contents of register
R1 are shifted left.

Consider a processor with a two-stage pipeline and one delay slot. During the execution phase
of the branch instruction I3, the fetch unit will fetch instruction I4. Only after evaluating the
branch condition will it be clear whether instruction I1 or I4 is to be executed next.

The nature of the code segment says that execution will remain in the loop for a number of
iterations depending on the initial value of R2, and when R2 becomes zero, it will come out of
the loop and execute instruction I4. During the loop execution, every iteration there is a wrong
fetch of instruction I4. The code segment can be reorganized without disturbing the original
meaning of the program. The reordered code segment is shown in Figure 9.14.


LOOP Decrement R2

Branch_if≠0 LOOP

Shift_left R1

NEXT Add R1,R3

Figure 9.14: Reordered instructions for program loop

In this case, the shift instruction is fetched while the branch instruction is being executed.

After evaluating the branch condition, the processor fetches the instruction at LOOP or at NEXT,
depending on whether the branch condition is true or false, respectively.

In either case, it completes execution of the shift instruction.

Logically, the program is executed as if the branch instruction were placed after the shift
instruction. That is, branching takes place one instruction later than where the branch
instruction appears in the instruction sequence in memory; hence the name "delayed
branch".
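The claim that the reordering preserves the program's meaning can be checked by simulating both loops (a sketch of our own; registers are modeled as Python integers, and a 16-bit mask for the shift is our assumption):

```python
def original_loop(r1, r2, r3):
    """Figure 9.13: shift R1 left R2 times, then add R3."""
    while True:
        r1 = (r1 << 1) & 0xFFFF  # Shift_left R1
        r2 -= 1                  # Decrement R2
        if r2 == 0:              # Branch_if!=0 LOOP: fall through when zero
            break
    return r1 + r3               # Add R1, R3

def reordered_loop(r1, r2, r3):
    """Figure 9.14: the shift sits in the branch delay slot."""
    while True:
        r2 -= 1                  # Decrement R2
        taken = r2 != 0          # branch condition is evaluated here...
        r1 = (r1 << 1) & 0xFFFF  # ...but the delay-slot shift always runs
        if not taken:
            break
    return r1 + r3               # Add R1, R3

print(original_loop(1, 4, 10))   # 26: (1 << 4) + 10
print(reordered_loop(1, 4, 10))  # 26: same result, fewer wasted fetches
```

The delay-slot shift executes on every pass, including the final one, which is exactly why moving it after the branch is safe here.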

Figure 9.15 shows the execution timing for the last two passes through the loop of reordered
instructions.

Figure 9.16 shows the execution timing for the last two passes through the loop of the original
program loop.


Time        1    2    3    4    5    6    7    8
Decrement   F    E
Branch           F    E
Shift                 F    E
Decrement                  F    E
Branch                          F    E
Shift                                F    E
Add                                       F    E

Figure 9.15: Execution timing for last two passes through the loop of reordered instructions.

Time        1    2    3    4    5    6    7    8    9
Shift       F    E
Decrement        F    E
Branch                F    E
Add                        F    *
Shift                           F    E
Decrement                            F    E
Branch                                    F    E
Add                                            F    E

Figure 9.16: Execution timing for last two passes through the loop of the original program loop.
* Note: execution unit idle.

Problems

Q 1: Explain the concept of instruction pipeline.

Q 2: How do you evaluate the performance enhancement of a pipelined processor with k stages
with respect to a processor without a pipeline?

Q 3: Why is a two-stage instruction pipeline unlikely to cut the instruction cycle time in half,
compared with the use of no pipeline?

Q 4: What is branch penalty?

Q 5: What do you mean by static branch strategies and dynamic branch strategies to deal with
branches in pipeline processor?

Q 6: What is a branch history table and how is it used to deal with branches?

Q 7: Explain the concept of delayed branching technique.

Q 8: What is a loop buffer? How is a loop buffer used to handle branching in a pipelined
processor?

