Lecture 7 - PIPELINING
Introduction
It is observed that organizational enhancements to the CPU can improve performance. We have
already seen that the use of multiple registers rather than a single accumulator, and the use of cache
memory, improve performance considerably. Another organizational approach, which is quite
common, is instruction pipelining.
Pipelining is a particularly effective way of organizing parallel activity in a computer system. The
basic idea is very simple. It is frequently encountered in manufacturing plants, where pipelining is
commonly known as an assembly line operation.
By laying the production process out in an assembly line, products at various stages can be worked
on simultaneously. This process is also referred to as pipelining, because, as in a pipeline, new
inputs are accepted at one end before previously accepted inputs appear as outputs at the other end.
To apply the concept of pipelining to instruction execution, it is required to break the instruction
into different tasks. Each task is then executed by a different processing element of the CPU.
As we know, there are two distinct phases of instruction execution: one is instruction fetch and
the other is instruction execution. The processor therefore executes a program by fetching and
executing instructions, one after another.
Let Fi and Ei refer to the fetch and execute steps for instruction Ii. Execution of a program then
consists of a sequence of fetch and execute steps, as shown in the figure on the next slide.
Now consider a CPU that has two separate hardware units, one for fetching instructions and another for executing
them.
The instruction fetched by the fetch unit is stored in an intermediate storage buffer B1. The results of execution are
stored in the destination location specified by the instruction.
For simplicity, it is assumed that the fetch and execute steps of any instruction can each be completed in one clock cycle.
The operation of the computer proceeds as follows:
In the first clock cycle, the fetch unit fetches an instruction (instruction I1, step F1) and stores it in buffer
B1 at the end of the clock cycle.
In the second clock cycle, the instruction fetch unit proceeds with the fetch operation for instruction I2
(step F2).
Meanwhile, the execution unit performs the operation specified by instruction I1, which has already been
fetched and is available in buffer B1 (step E1).
By the end of the second clock cycle, the execution of instruction I1 is completed and instruction I2 is
available.
Instruction I2 is stored in buffer B1, replacing I1, which is no longer needed.
Step E2 is performed by the execution unit during the third clock cycle, while instruction I3 is being
fetched by the fetch unit.
Both the fetch and execute units are kept busy all the time, and after the first clock cycle one
instruction is completed in each clock cycle.
If a long sequence of instructions is executed, the completion rate of instruction execution will be twice
that achievable by sequential operation with only one unit that performs both fetch and execute.
The basic idea of instruction pipelining, with its hardware organization, is shown in the figure on the next slide.
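This overlap can be made concrete with a short simulation. The following sketch is ours, not part of the lecture material: it models the two-stage pipeline under the one-cycle-per-step assumption and prints which steps are active in each clock cycle.

    # Illustrative sketch of the two-stage pipeline (Fetch, Execute).
    # Instruction Ii is fetched in cycle i and executed in cycle i+1,
    # so the fetch of I(i+1) overlaps with the execution of Ii.
    n = 4  # number of instructions (an assumed value for the example)
    for cycle in range(1, n + 2):
        active = []
        if cycle <= n:
            active.append(f"F{cycle}")      # fetch unit works on instruction I(cycle)
        if cycle >= 2:
            active.append(f"E{cycle - 1}")  # execute unit works on I(cycle-1)
        print(f"cycle {cycle}: " + ", ".join(active))

Running this prints one completed instruction per cycle from cycle 2 onward: n instructions finish in n + 1 cycles instead of 2n.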
The processing of an instruction need not be divided into only two steps. To gain further speedup, the pipeline
must have more stages.
Let us consider the following decomposition of the instruction execution:
Fetch Instruction (FI): Read the next expected instruction into a buffer.
Decode Instruction (DI): Determine the opcode and the operand specifiers.
Calculate Operands (CO): Calculate the effective address of each source operand.
Fetch Operands (FO): Fetch each operand from memory.
Execute Instruction (EI): Perform the indicated operation.
Write Operand (WO): Store the result in memory.
There will be six different stages for these six subtasks. For the sake of simplicity, let us assume that all the
subtasks take equal time. If the six stages are not of equal duration, there will be some waiting involved at
various pipeline stages.
The timing diagram for the execution of instruction in pipeline fashion is shown in the figure on the next slide.
From this timing diagram it is clear that the total execution time of 8 instructions in this 6-stage
pipeline is 13 time units. The first instruction completes after 6 time units, and thereafter one
instruction completes in each time unit. Without pipelining, the total time required to complete the 8
instructions would have been 48 (6 × 8) time units. Therefore, there is a speedup due to pipelined
processing, and the speedup is related to the number of stages.
Pipeline Performance
Consider a k-stage pipeline executing n instructions, with one time unit per stage. The first
instruction completes after k time units, and each of the remaining n − 1 instructions completes one
time unit later, so the total time is Tk = k + (n − 1). Executed sequentially, the same n instructions
would take T1 = n × k time units. The speedup factor is therefore

S = T1 / Tk = (n × k) / (k + n − 1)

As n becomes large, S approaches k, i.e. we have a k-fold speedup; the speedup factor is a function
of the number of stages in the instruction pipeline.
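The formula can be checked against the 6-stage, 8-instruction example above with a few lines of Python (an illustrative sketch; the function name is ours):

    def pipeline_time(k, n):
        # total time units for n instructions in a k-stage pipeline,
        # assuming one time unit per stage and no stalls
        return k + n - 1

    k, n = 6, 8
    t_pipe = pipeline_time(k, n)           # 13, matching the timing diagram
    t_seq = k * n                          # 48 time units without pipelining
    print(t_pipe, t_seq, t_seq / t_pipe)   # speedup of about 3.7, approaching k = 6 as n grows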
Though it has been seen that the speedup is proportional to the number of stages in the pipeline, in
practice the speedup is less, for some practical reasons. The factors that affect pipeline performance
are discussed next.
Dependency Constraints:
Consider the following program that contains two instructions, I1 followed by I2
I1 : A← A + 5
I2 : B← 3 * A
When this program is executed in a pipeline, the execution of I2 can begin before the execution of
I1 completes.
The pipeline execution is shown below.
In clock cycle 3, the specific operation of instruction I1, i.e. the addition, takes place, and only then
does the new, updated value of A become available. But in the same clock cycle 3, instruction I2 is
fetching the operand required for its operation. Since the operation of instruction I1 is only taking
place in clock cycle 3, instruction I2 will fetch the old value of A, not the updated value, and will
produce a wrong result. Consider that the initial value of A is 4.
Proper (sequential) execution produces the result B = 27:
I1: A ← A + 5 = 4 + 5 = 9
I2: B ← 3 × A = 3 × 9 = 27
Pipelined execution, in which I2 fetches A before I1 has written it back, produces B = 12:
I1: A ← A + 5 = 4 + 5 = 9
I2: B ← 3 × A = 3 × 4 = 12
Due to this data dependency, the two instructions cannot be performed in parallel.
Therefore, no two operations that depend on each other can be performed in parallel. For correct
execution, the following must be satisfied:
The operation of the fetch stage must not depend on the operation performed during the
same clock cycle by the execution stage.
The operation of fetching an instruction must be independent of the execution results of the
previous instruction.
The dependency of data arises when the destination of one instruction is used as a source in
a subsequent instruction.
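A tiny sketch (ours, not from the lecture) that mimics this overlapped timing makes the hazard concrete: if I2 reads A before I1 has written it back, the stale value is used.

    A = 4
    # Pipelined (incorrect) overlap: I2's operand fetch happens in cycle 3,
    # before I1's result has been written back.
    operand_for_I2 = A       # I2 fetches the old value of A
    A = A + 5                # I1 writes back: A = 9
    B = 3 * operand_for_I2   # I2 executes with the stale value
    print(A, B)              # prints 9 12, instead of the correct 9 27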
Branching
In general, when we are executing a program, the next instruction to be executed is brought from the
next memory location. Therefore, in a pipelined organization, we fetch instructions one after
another.
But in case of conditional branch instruction, the address of the next instruction to be fetched
depends on the result of the execution of the instruction.
Since the execution of the next instruction depends on the outcome of the branch instruction, it may
sometimes be necessary to invalidate several instruction fetches. Consider the instruction execution
sequence shown in the figure on the next slide.
Because of this, the pipeline will stall for some time. The time lost due to a branch instruction is
often referred to as the branch penalty.
The effect of the branch being taken is shown in the figure on the previous slide. Because the branch
is taken, instructions I4 and I5, which have already been fetched, are not executed, and a new
instruction I10 is fetched in clock cycle 6.
There is no effective output in clock cycles 7 and 8, and so the branch penalty is 2. The branch
penalty depends on the number of stages in the pipeline: more stages result in a larger branch
penalty.
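The cost of branches can be folded into the earlier timing formula. The following sketch is an illustration of ours under simple assumptions: every taken branch stalls the pipeline for a fixed number of penalty cycles, and nothing else stalls.

    def time_with_branches(k, n, taken_branches, penalty):
        # k + n - 1 time units for the pipeline itself, plus a fixed
        # penalty for every taken branch (a simplified model)
        return (k + n - 1) + taken_branches * penalty

    # 6-stage pipeline, 100 instructions, 15 taken branches, penalty of 2
    print(time_with_branches(6, 100, 15, 2))   # 135 time units instead of 105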
Multiple Streams
A single pipeline suffers a penalty for a branch instruction because it must choose one of two
instructions to fetch next and sometimes it may make the wrong choice.
A brute-force approach is to replicate the initial portions of the pipeline and allow the pipeline to
fetch both instructions, making use of two streams.
There are two problems with this approach:
With multiple pipelines there are contention delays for access to the registers and to memory.
Additional branch instructions may enter the pipeline (either stream) before the original
branch decision is resolved. Each such instruction needs an additional stream.
Prefetch Branch Target
When a conditional branch is recognized, the target of the branch is prefetched, in addition to the
instruction following the branch. This target is then saved until the branch instruction is executed. If
the branch is taken, the target has already been prefetched.
Loop Buffer:
A loop buffer is a small, very high-speed memory maintained by the instruction fetch stage of the
pipeline, containing the most recently fetched instructions, in sequence. If a branch is to be
taken, the hardware first checks whether the branch target is within the buffer. If so, the next
instruction is fetched from the buffer.
The loop buffer has three benefits:
1. With the use of prefetching, the loop buffer will contain some instructions sequentially ahead of
the current instruction fetch address. Thus, instructions fetched in sequence will be available
without the usual memory access time.
2. If a branch occurs to a target just a few locations ahead of the address of the branch instruction,
the target will already be in the buffer. This is useful for the common occurrence of IF-THEN and
IF-THEN-ELSE sequences.
3. This strategy is particularly well suited for dealing with loops, or iterations; hence the name loop
buffer. If the loop buffer is large enough to contain all the instructions in a loop, then those
instructions need to be fetched from memory only once, for the first iteration. For subsequent
iterations, all the needed instructions are already in the buffer.
The loop buffer is similar in principle to a cache dedicated to instructions. The differences are that
the loop buffer only retains instructions in sequence and is much smaller in size and hence lower in
cost.
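A minimal sketch of the loop-buffer check (our illustration in Python; real hardware does this with address comparators): the fetch stage keeps the few most recently fetched instructions, and a branch is serviced from the buffer when its target address falls inside it.

    class LoopBuffer:
        def __init__(self, size):
            self.size = size
            self.instrs = {}   # address -> instruction word, most recent fetches

        def fill(self, addr, instr):
            self.instrs[addr] = instr
            if len(self.instrs) > self.size:
                del self.instrs[min(self.instrs)]   # evict the oldest address

        def fetch(self, target):
            # a hit means the branch target is already buffered,
            # so no memory access is needed
            return self.instrs.get(target)

    buf = LoopBuffer(size=4)
    for addr in range(100, 104):
        buf.fill(addr, f"instr@{addr}")
    print(buf.fetch(101))   # hit: a loop branch back into the buffer
    print(buf.fetch(50))    # miss: None, must fetch from memory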
Branch Prediction:
Various techniques can be used to predict whether a branch will be taken or not. The most common
techniques are:
Predict never taken
Predict always taken
Predict by opcode
Taken/not taken switch
Branch history table.
The first three approaches are static: they do not depend on the execution history up to the time of
the conditional branch instruction. The latter two approaches are dynamic: they depend on the
execution history.
Predict never taken always assumes that the branch will not be taken and continues to fetch
instructions in sequence. Predict always taken assumes that the branch will be taken and always
fetches the branch target. In these two approaches it is also possible to minimize the effect of a
wrong decision.
If the fetch of an instruction after the branch will cause a page fault or protection violation, the
processor halts its prefetching until it is sure that the instruction should be fetched. Studies
analyzing program behaviour have shown that conditional branches are taken more than 50% of the
time, and so if the cost of prefetching from either path is the same, then always prefetching from the
branch target address should give better performance than always prefetching from the sequential
path.
However, in a paged machine, prefetching the branch target is more likely to cause a page fault than
prefetching the next instruction in the sequence and so this performance penalty should be taken
into account.
The predict-by-opcode approach makes the decision based on the opcode of the branch instruction.
The processor assumes that the branch will be taken for certain branch opcodes and not for others.
Reported studies show a success rate greater than 75% with this strategy.
Dynamic branch strategies attempt to improve the accuracy of prediction by recording the history of
conditional branch instructions in a program. One scheme to maintain the history information is as follows:
One or more bits can be associated with each conditional branch instruction that reflect the
recent history of the instruction.
These bits are referred to as a taken/not taken switch that directs the processor to make a
particular decision the next time the instruction is encountered.
Generally these history bits are not stored with the instruction in main memory, since that
would unnecessarily increase the size of the instruction. With a single bit we can record
whether or not the last execution of this instruction resulted in a branch.
With only one bit of history, an error in prediction will occur twice for each use of the loop:
once on entering the loop, and once on exiting it.
If two bits are used, they can be used to record the result of the last two instances of the execution
of the associated instruction.
Since the history information is not kept in main memory, it can be kept in a temporary high-speed
memory. One possibility is to associate these bits with any conditional branch instruction that is in a
cache; when the instruction is replaced in the cache, its history is lost. Another possibility is to
maintain a small table for recently executed branch instructions, with one or more bits in each entry.
The branch history table is a small cache memory associated with the instruction fetch stage of the
pipeline. Each entry in the table consists of three elements:
The address of the branch instruction.
Some number of history bits that record the state of use of that instruction.
Information about the target instruction: this may be the address of the target instruction, or
the target instruction itself.
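The table and the two-bit history scheme fit together naturally. The following is an illustrative sketch of ours: a branch history table keyed by branch address, where each entry holds a two-bit saturating counter and a cached target address.

    class BranchHistoryTable:
        # counter values: 0, 1 -> predict not taken; 2, 3 -> predict taken
        def __init__(self):
            self.table = {}   # branch address -> (2-bit counter, target address)

        def predict(self, addr):
            counter, target = self.table.get(addr, (1, None))
            return counter >= 2, target   # (taken?, where to fetch from)

        def update(self, addr, taken, target):
            counter, _ = self.table.get(addr, (1, None))
            if taken:
                counter = min(counter + 1, 3)   # saturate at strongly taken
            else:
                counter = max(counter - 1, 0)   # saturate at strongly not taken
            self.table[addr] = (counter, target)

    bht = BranchHistoryTable()
    for outcome in [True, True, False, True]:   # a loop branch, mostly taken
        bht.update(0x40, outcome, 0x10)
    print(bht.predict(0x40))   # (True, 16): two bits tolerate one not-taken outcome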
Consider that instruction Ij is a branch instruction. The processor begins fetching instruction
Ij+1 before it determines whether the current instruction, Ij, is a branch instruction.
When execution of Ij is completed and a branch must be made, the processor must discard the
instruction that was fetched and instead fetch the instruction at the branch target.
The location following a branch instruction is called a branch delay slot. There may be more than
one branch delay slot, depending on the time it takes to execute a branch instruction.
The instructions in the delay slots are always fetched and at least partially executed before the
branch decision is made and the branch target address is computed.
Delayed branching is a technique to minimize the penalty incurred as a result of conditional branch
instructions. The instructions in the delay slots are always fetched, so we can arrange for the
instructions in the delay slots to be fully executed whether or not the branch is taken. The objective
is to place useful instructions in these slots. If no useful instructions can be placed in the delay
slots, the slots must be filled with NOP (no operation) instructions. While filling up the delay slots
with instructions, it is required to maintain the original semantics of the program.
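The code segment discussed below appears as a figure in the original slides and is not reproduced here; a representative version consistent with the description (our reconstruction) is:

    LOOP  I1: Shift_left R1    (shift the contents of R1 left by one bit)
          I2: Decrement R2     (decrement the loop counter)
          I3: Branch≠0 LOOP    (branch back to LOOP if R2 is not zero)
    NEXT  I4: ...              (first instruction after the loop)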
Here register R2 is used as a counter to determine the number of times the contents of register R1
are shifted left. Consider a processor with a two-stage pipeline and one delay slot. During the
execution phase of instruction I3, the fetch unit will fetch instruction I4. Only after evaluating the
branch condition will it be clear whether instruction I1 or I4 is to be executed next.
The nature of the code segment is that execution remains in the loop for a number of iterations
depending on the initial value of R2; when R2 becomes zero, execution leaves the loop and
continues with instruction I4. During the loop execution, there is a wrong fetch of instruction I4 on
every iteration. The code segment can be reorganized, without disturbing the original meaning of
the program, as shown below.
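A reorganized version (again our reconstruction, consistent with the description that follows) places the shift instruction in the delay slot after the branch:

    LOOP  Decrement R2
          Branch≠0 LOOP
          Shift_left R1        (delay slot: always fetched and executed)
    NEXT  ...                  (first instruction after the loop)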
In this case, the shift instruction is fetched while the branch instruction is being executed. After
evaluating the branch condition, the processor fetches the instruction at LOOP or at NEXT,
depending on whether the branch condition is true or false, respectively.
In either case, it completes execution of the shift instruction. Logically, the program is executed as
if the branch instruction were placed after the shift instruction. That is, branching takes place one
instruction later than where the branch instruction appears in the instruction sequence in memory;
hence the name “delayed branch”.