Pipe Lining

Pipelining is a technique used in computer processors to overlap the execution of instructions to improve performance. It works by breaking down the execution of each instruction into multiple sequential stages, allowing new instructions to begin execution before previous ones have finished. While pipelining reduces the average time per instruction, it can introduce pipeline hazards if instructions interact in a way that violates dependencies between stages. The document discusses pipelining concepts and the 5-stage RISC instruction pipeline model.

Uploaded by

Amit Yadav

Pipelining: Basic and Intermediate Concepts

Computer Architecture: A Quantitative Approach by Hennessy and Patterson
Appendix A
(adapted from J. Rhinelander’s slides)
What Is A Pipeline?
• Pipelining is used by virtually all modern
microprocessors to enhance performance by
overlapping the execution of instructions.
• A common analogue for a pipeline is a factory
assembly line. Assume that there are three stages:
1. Welding
2. Painting
3. Polishing
• For simplicity, assume that each task takes one hour.
ENGR9861 Winter
2007 RV
What Is A Pipeline?
• If a single person were to work on the product, it
would take three hours to produce one product.
• If we had three people, one person could work on each
stage. Upon completing their stage, they would pass
the product on to the next person (since each stage
takes one hour, there is no waiting).
• We could then produce one product per hour,
assuming the assembly line has been filled.
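The assembly-line arithmetic can be sketched in a few lines of Python. This is only an illustration under the slide's assumptions (three stages, one hour each); the function name `total_hours` and its parameters are hypothetical.

```python
def total_hours(products: int, stages: int = 3, hours_per_stage: int = 1,
                pipelined: bool = True) -> int:
    """Hours needed to finish `products` items on the assembly line."""
    if products == 0:
        return 0
    if pipelined:
        # The first product needs `stages` hours to fill the line;
        # after that, one product completes every hour.
        return (stages + products - 1) * hours_per_stage
    # One worker does every stage of every product in sequence.
    return products * stages * hours_per_stage


print(total_hours(10, pipelined=False))  # single worker
print(total_hours(10, pipelined=True))   # three workers, one per stage
```

For ten products the single worker needs 30 hours, while the filled line needs 12, which approaches one product per hour as the number of products grows.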

Characteristics Of Pipelining
• If the stages of a pipeline are not balanced and one
stage is slower than another, the throughput of the
entire pipeline is limited by the slowest stage.
• In terms of a pipeline within a CPU, each instruction
is broken up into different stages. Ideally, if the stages
are balanced (all stages are ready to start at the same
time and take an equal amount of time to execute), the
time taken per instruction (pipelined) is defined as:

Time per instruction (unpipelined) / Number of stages

Characteristics Of Pipelining
• The previous expression is ideal. We will see later that
there are many ways in which a pipeline cannot
function in a perfectly balanced fashion.
• In terms of a CPU, the implementation of pipelining
has the effect of reducing the average instruction time,
therefore reducing the average CPI.
• EX: If each instruction in a microprocessor takes 5
clock cycles (unpipelined) and we have a 4-stage
pipeline, the ideal average CPI with the pipeline will
be 1.25.
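The ideal expression from the previous slide can be checked numerically. The sketch below simply divides the unpipelined time by the number of stages; the function name is hypothetical and the result holds only for a perfectly balanced pipeline with no stalls.

```python
def ideal_pipelined_time(unpipelined_time: float, num_stages: int) -> float:
    """Ideal time (or CPI) per instruction for a perfectly balanced pipeline."""
    return unpipelined_time / num_stages


# The slide's example: 5 clock cycles unpipelined, 4-stage pipeline.
print(ideal_pipelined_time(5, 4))  # 1.25
```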

RISC Instruction Set Basics
(from Hennessy and Patterson)
• Properties of RISC architectures:
– All ops on data apply to data in registers and typically
change the entire register (32-bits or 64-bits).
– The only ops that affect memory are load/store
operations. Memory to register, and register to memory.
– Load and store ops on data less than a full size of a
register (32, 16, 8 bits) are often available.
– Usually instructions are few in number (this can be
relative) and are typically one size.

RISC Instruction Set Basics
Types Of Instructions
• ALU Instructions:
– Arithmetic operations either take two registers as
operands or take one register and a sign-extended
immediate value as an operand. The result is stored in a
third register.
– Logical operations (AND, OR, XOR) do not usually
differentiate between 32-bit and 64-bit.
• Load/Store Instructions:
– Usually take a register (base register) as an operand and a
16-bit immediate value. The sum of the two forms
the effective address. A second register acts as the
destination in the case of a load operation.
RISC Instruction Set Basics
Types Of Instructions (continued)
– In the case of a store operation the second register
contains the data to be stored.
• Branches and Jumps
– Conditional branches are transfers of control. A taken
branch causes a sign-extended immediate value (the offset) to
be added to the current program counter.
• Appendix A has a more detailed description of the
RISC instruction set. Also the inside back cover has a
listing of a subset of the MIPS64 instruction set.

RISC Instruction Set Implementation
• We first need to look at how instructions in the MIPS64
instruction set are implemented without pipelining. We’ll
assume that any instruction of the subset of MIPS64 can be
executed in at most 5 clock cycles.
• The five clock cycles will be broken up into the following steps:
– Instruction Fetch Cycle
– Instruction Decode/Register Fetch Cycle
– Execution Cycle
– Memory Access Cycle
– Write-Back Cycle
Instruction Fetch (IF) Cycle
• The value in the PC represents an address in memory.
The MIPS64 instructions are all 32 bits in length.
Figure 2.27 shows how the 32 bits (4 bytes) are
arranged depending on the instruction.
• First we load the 4 bytes at that address from memory into the CPU.
• Second we increment the PC by 4, because memory is
byte-addressed and each instruction is 4 bytes long. The PC now
holds the address of the next sequential instruction. (Is it
certain that this is the next instruction to execute?)

Instruction Decode (ID)/Register Fetch Cycle
• Decode the instruction and at the same time read
the values of the registers involved. As the registers are
being read, perform an equality test in case the instruction
decodes as a branch or jump.
• The offset field of the instruction is sign-extended
in case it is needed. The possible branch target
address is computed by adding the sign-extended
offset to the incremented PC. The branch can be
completed at this stage if the equality test is true and
the instruction decoded as a branch.

Instruction Decode (ID)/Register Fetch Cycle (continued)
• The instruction can be decoded in parallel with reading the
registers because the register specifiers are at fixed
locations in the instruction format.

Execution (EX)/Effective Address Cycle
• If a branch or jump did not occur in the previous
cycle, the arithmetic logic unit (ALU) can execute the
instruction.
• At this point the instruction falls into one of three
types:
– Memory Reference: ALU adds the base register and the
offset to form the effective address.
– Register-Register: ALU performs the arithmetic, logical,
or other operation specified by the opcode.
– Register-Immediate: ALU performs the operation on
the register and the sign-extended immediate value.
Memory Access (MEM) Cycle
• If a load, the effective address computed from the
previous cycle is referenced and the memory is read.
The actual data transfer to the register does not occur
until the next cycle.
• If a store, the data from the register is written to the
effective address in memory.

Write-Back (WB) Cycle
• Occurs with Register-Register ALU instructions or
load instructions.
• A simple operation: whether the instruction is a
register-register operation or a memory load, the
resulting data is written to the appropriate register.

Looking At The Big Picture
• Overall, the most time that a non-pipelined
instruction can take is 5 clock cycles. Below is a
summary:
– Branch: 2 clock cycles
– Store: 4 clock cycles
– Other: 5 clock cycles
• EX: Assuming branch instructions account for 12% of
all instructions and stores account for 10%, what is the
average CPI of a non-pipelined CPU?
ANS: 0.12*2 + 0.10*4 + 0.78*5 = 4.54
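The average CPI is a weighted sum over the instruction mix. A minimal sketch of that calculation follows; the function name `average_cpi` and the list-of-pairs layout are illustrative choices, while the frequencies and cycle counts are the ones given on the slide.

```python
def average_cpi(mix):
    """Weighted average CPI for (frequency, cycles) pairs that sum to 1.0."""
    return sum(freq * cycles for freq, cycles in mix)


# Branches: 12% at 2 cycles, stores: 10% at 4 cycles, everything else at 5.
mix = [(0.12, 2), (0.10, 4), (0.78, 5)]
print(round(average_cpi(mix), 2))  # 4.54
```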
The Classical RISC 5 Stage Pipeline

• In an ideal case, to implement a pipeline we just need
to start a new instruction at each clock cycle.
• Unfortunately there are many problems with trying to
implement this. Obviously we cannot have the ALU
performing an ADD operation and a MULTIPLY at
the same time. But if we look at each stage of
instruction execution as being independent, we can see
how instructions can be “overlapped”.
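The overlap can be made concrete by laying out which stage each instruction occupies in each clock cycle. The sketch below assumes the ideal case (a new instruction starts every cycle, no stalls); the helper names `pipeline_diagram` and `render` are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(num_instructions: int) -> list:
    """Return one row per instruction; row[c] is the stage occupied in cycle c."""
    total_cycles = num_instructions + len(STAGES) - 1
    rows = []
    for i in range(num_instructions):
        row = [""] * total_cycles
        for s, name in enumerate(STAGES):
            # Instruction i enters IF in cycle i and advances one stage per cycle.
            row[i + s] = name
        rows.append(row)
    return rows


def render(rows) -> str:
    """Format the rows as fixed-width text, one instruction per line."""
    lines = []
    for n, row in enumerate(rows):
        cells = " ".join(f"{cell:<3}" for cell in row).rstrip()
        lines.append(f"instr {n}: {cells}")
    return "\n".join(lines)


print(render(pipeline_diagram(3)))
```

Reading any one column of the output shows that, in a given cycle, each instruction is in a different stage, so no stage is used by two instructions at once.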

Problems With The Previous Figure

• The memory is accessed twice during each clock cycle. This
problem is avoided by using separate data and instruction
caches.
• It is important to note that if the clock period is the same for a
pipelined processor and a non-pipelined processor, the memory
system must deliver five times the bandwidth.
• Another problem that we can observe is that the registers are
accessed twice every clock cycle. To avoid a resource
conflict we perform the register write in the first half of the
cycle and the read in the second half of the cycle.

Problems With The Previous Figure (continued)

• We write in the first half so that a value written by one
instruction can be read in the same cycle by a later
instruction in the pipeline.
• A third problem arises with the interaction of the pipeline
with the PC. We use an adder to increment the PC by the end
of IF. Within ID we may branch and modify the PC. How does
this affect the pipeline?
• The use of pipeline registers gives the CPU the storage it
needs to implement the pipeline. Remember that the
previous figure has only one resource use in each stage.

Pipeline Hazards
• The performance gain from using pipelining occurs
because we can start the execution of a new
instruction each clock cycle. In a real implementation
this is not always possible.
• Another important note is that in a pipelined
processor, a particular instruction still takes at least as
long to execute as it would on a non-pipelined processor.
• Pipeline hazards prevent the execution of the next
instruction during the appropriate clock cycle.

Types Of Hazards
• There are three types of hazards in a pipeline:
– Structural Hazards: created when the data path
hardware in the pipeline cannot support all of the
overlapped instructions in the pipeline.
– Data Hazards: created when an instruction depends on the
result of an earlier instruction that is still in the
pipeline.
– Control Hazards: created by the pipelining of branches
and other instructions that change the PC.
A Hazard Will Cause A Pipeline Stall

• Some performance expressions involving a realistic
pipeline in terms of CPI. It is assumed that the clock
period is the same for pipelined and unpipelined
implementations.
Speedup = CPI Unpipelined / CPI Pipelined
= Pipeline Depth / (1 + Stalls per Ins)
= Ave Ins Time Unpipelined / Ave Ins Time Pipelined

A Hazard Will Cause A Pipeline Stall (continued)

• We can look at pipeline performance in terms of a
faster clock cycle time as well:

\[ \text{Speedup} = \frac{\text{CPI unpipelined}}{\text{CPI pipelined}} \times \frac{\text{Clock cycle time unpipelined}}{\text{Clock cycle time pipelined}} \]

\[ \text{Clock cycle time pipelined} = \frac{\text{Clock cycle time unpipelined}}{\text{Pipeline Depth}} \]

\[ \text{Speedup} = \frac{1}{1 + \text{Pipeline stalls per Ins}} \times \text{Pipeline Depth} \]
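The last speedup expression is easy to evaluate for different stall rates. The sketch below assumes, as the formula does, an ideal pipelined CPI of 1 and perfectly balanced stages; the function name `pipeline_speedup` is hypothetical and the stall rates are illustrative values, not measurements.

```python
def pipeline_speedup(depth: int, stalls_per_instruction: float) -> float:
    """Speedup = pipeline depth / (1 + pipeline stall cycles per instruction)."""
    return depth / (1.0 + stalls_per_instruction)


for stalls in (0.0, 0.25, 1.0):
    print(stalls, pipeline_speedup(5, stalls))
```

With no stalls a 5-stage pipeline reaches the full speedup of 5; at 0.25 stall cycles per instruction it drops to 4, and at one stall cycle per instruction it is halved to 2.5.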
Dealing With Structural Hazards

• Structural hazards result from the CPU data path not
having enough resources to service all of the
overlapping instructions.
• Suppose a processor can perform only one register access
(a read or a write) in one clock cycle. This would cause a
conflict between the ID and WB stages.
• Assume that there are not separate instruction and data
caches, and only one memory access can occur during
one clock cycle. A hazard would be caused between the
IF and MEM stages.

Dealing With Structural Hazards

• A structural hazard is dealt with by inserting a stall or pipeline
bubble into the pipeline. This means that for that clock cycle,
nothing happens for that instruction. This effectively “slides”
that instruction, and subsequent instructions, back by one clock cycle.
• This increases the average CPI.
• EX: Assume that you need to compare two processors, one with
a structural hazard that occurs 40% of the time, causing a one-cycle stall.
Assume that the processor with the hazard has a clock rate 1.05
times that of the processor without the hazard. How fast is
the processor with the hazard compared to the one without the
hazard?

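Dealing With Structural Hazards (continued)

\[ \text{Speedup} = \frac{\text{CPI no haz}}{\text{CPI haz}} \times \frac{\text{Clock cycle time no haz}}{\text{Clock cycle time haz}} \]

\[ \text{Speedup} = \frac{1}{1 + 0.4 \times 1} \times \frac{1}{1/1.05} = 0.75 \]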
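The same calculation can be written out step by step. This sketch assumes an ideal CPI of 1 for the processor without the hazard and a one-cycle stall per hazard, as in the expression above; the function and parameter names are hypothetical.

```python
def relative_speed(stall_fraction: float, stall_cycles: int,
                   clock_rate_ratio: float) -> float:
    """Speed of the processor with the hazard relative to the one without.

    clock_rate_ratio is (clock rate with hazard) / (clock rate without),
    which equals (cycle time without hazard) / (cycle time with hazard).
    """
    cpi_no_hazard = 1.0
    cpi_hazard = 1.0 + stall_fraction * stall_cycles
    return (cpi_no_hazard / cpi_hazard) * clock_rate_ratio


print(round(relative_speed(0.4, 1, 1.05), 2))  # 0.75
```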

Dealing With Structural Hazards (continued)

• We can see that even though the clock speed of the
processor with the hazard is a little faster, the speedup
is still less than 1.
• Therefore the hazard has quite an effect on the
performance.
• Sometimes computer architects will opt to design a
processor that exhibits a structural hazard. Why?
– A: The improvement to the processor data path is too costly.
– B: The hazard occurs rarely enough that the processor will still
perform to specifications.
Data Hazards (A Programming Problem?)

• We haven’t looked at assembly programming in detail
at this point.
• Consider the following operations:
DADD R1, R2, R3
DSUB R4, R1, R5
AND R6, R1, R7
OR R8, R1, R9
XOR R10, R1, R11

Pipeline Registers

What are the problems?

Data Hazard Avoidance
• In this trivial example, we cannot expect the programmer to
reorder his/her operations, assuming this is the only code we
want to execute.
• Data forwarding can be used to solve this problem.
• To implement data forwarding we need to bypass the normal
flow through the pipeline registers:
– Output from the EX/MEM and MEM/WB pipeline registers must be fed back
into the ALU input.
– We need routing hardware that detects when the next instruction
depends on the write of a previous instruction.
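The detection step can be sketched in software for the DADD/DSUB/AND/OR/XOR sequence shown earlier. The sketch assumes the 5-stage pipeline described in these slides, where the register file is written in the first half of a cycle and read in the second half; the tuple layout and the function name `classify_consumers` are hypothetical, and the sketch ignores any intervening instruction that overwrites the same register.

```python
def classify_consumers(program, producer_index=0):
    """Say how each later reader of the producer's destination gets its value.

    Each instruction is (opcode, destination, sources).
    """
    producer_dest = program[producer_index][1]
    result = {}
    for idx in range(producer_index + 1, len(program)):
        op, _dest, sources = program[idx]
        if producer_dest not in sources:
            continue
        distance = idx - producer_index
        if distance <= 2:
            # The reader reaches EX before the producer has written back.
            result[op] = "needs forwarding"
        elif distance == 3:
            # Producer's WB and reader's ID share a cycle: write first, then read.
            result[op] = "split-cycle register file"
        else:
            result[op] = "no hazard"
    return result


program = [
    ("DADD", "R1", ("R2", "R3")),
    ("DSUB", "R4", ("R1", "R5")),
    ("AND", "R6", ("R1", "R7")),
    ("OR", "R8", ("R1", "R9")),
    ("XOR", "R10", ("R1", "R11")),
]
print(classify_consumers(program))
```

Under these assumptions DSUB and AND need forwarded values, OR is handled by the split-cycle register file, and XOR reads R1 after it has been written.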

General Data Forwarding
• It is easy to see how data forwarding can be used by
drawing out the pipelined execution of each
instruction.
• Now consider the following instructions:

DADD R1, R2, R3
LD R4, 0(R1)
SD R4, 12(R1)

Problems
• Can data forwarding prevent all data hazards?
• NO!
• The following operations will still cause a data hazard.
This happens because the loaded value is not available until
the end of the MEM stage of the load, which is too late to forward
to the EX stage of the instruction that immediately follows.
LD R1, 0(R2)
DSUB R4, R1, R5
AND R6, R1, R7
OR R8, R1, R9

Problems
• We can avoid the hazard by using a pipeline interlock.
• The pipeline interlock will detect when data
forwarding will not be able to get the data to the next
instruction in time.
• A stall is introduced until the instruction can get the
appropriate data from the previous instruction.
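A pipeline interlock of this kind can be approximated in software by looking for a load followed immediately by an instruction that reads the loaded register. The sketch below assumes full forwarding and the 5-stage pipeline from these slides, so each such pair costs one stall cycle; the tuple layout and the function name are hypothetical, and the special case of a store that could receive the loaded value by MEM-to-MEM forwarding is not modeled.

```python
def count_load_use_stalls(program) -> int:
    """Count one-cycle stalls caused by a load feeding the very next instruction.

    Each instruction is (opcode, destination, sources).
    """
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, cur_sources = cur
        if prev_op == "LD" and prev_dest in cur_sources:
            # The loaded value arrives at the end of MEM, one cycle too late
            # for the next instruction's EX stage, so the interlock stalls once.
            stalls += 1
    return stalls


program = [
    ("LD", "R1", ("R2",)),
    ("DSUB", "R4", ("R1", "R5")),
    ("AND", "R6", ("R1", "R7")),
    ("OR", "R8", ("R1", "R9")),
]
print(count_load_use_stalls(program))  # 1
```

Only DSUB is stalled; AND and OR are far enough behind the load that forwarding or the register file delivers R1 in time.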

Control Hazards
• Control hazards are caused by branches in the code.
• Remember that during the IF stage the PC is
incremented by 4 in preparation for the IF cycle
of the next instruction.
• What happens if a branch is taken and we
aren’t simply incrementing the PC by 4?
• The easiest way to deal with the occurrence of a
branch is to perform the IF stage again once the
branch is detected during ID.
Performing IF Twice
• We take a big performance hit by repeating the
instruction fetch whenever a branch occurs. Note that this
happens whether the branch is taken or not. This
guarantees that the PC will get the correct value.

(earlier instruction)  IF  ID  EX  MEM WB
branch                     IF  ID  EX  MEM WB
(branch successor)             IF  IF  ID  EX  MEM WB

Performing IF Twice
• This method will work, but as always in computer
architecture we should try to make the most common
operation fast and efficient.
• With MIPS64, branch instructions are quite common.
• By performing IF twice we will encounter a
performance hit of between 10% and 30%.
• Next class we will look at some methods for dealing
with control hazards.
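One way to read the 10% to 30% figure is as the extra cycles added when every branch costs one repeated IF. The sketch below makes that assumption explicit: a base CPI of 1, a one-cycle penalty per branch, and branch frequencies of 10% and 30%, which are illustrative values rather than numbers taken from the slides. The function name is hypothetical.

```python
def cpi_with_branch_stalls(branch_fraction: float, penalty_cycles: int = 1,
                           base_cpi: float = 1.0) -> float:
    """CPI when every branch adds `penalty_cycles` stall cycles."""
    return base_cpi + branch_fraction * penalty_cycles


for branch_fraction in (0.10, 0.30):
    cpi = cpi_with_branch_stalls(branch_fraction)
    slowdown = (cpi - 1.0) * 100
    print(f"{branch_fraction:.0%} branches -> CPI {cpi:.2f} ({slowdown:.0f}% more cycles)")
```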

Control Hazards (other solutions)

• The following solutions assume that we are dealing
with static branches, meaning that the action taken
for each branch is fixed and does not change during execution.
• We already saw the first example: we stall the pipeline
until the branch is resolved (in our case we repeated
the IF stage until the branch resolved and modified the
PC).
• The next two examples will always make an
assumption about the branch instruction.

Control Hazards (other solutions)

• What if we treat every branch as “not taken”?
Remember that not only do we read the registers
during ID, but we also perform an equality test to decide
whether we need to branch or not.
• We can improve performance by assuming that the
branch will not be taken.
• In this case we can simply load in the next
instruction (PC+4) and continue. The complexity
arises when the branch evaluates and we end up
needing to actually take the branch.

Control Hazards (other solutions)

• If the branch is actually taken we need to clear the
pipeline of any code loaded in from the “not-taken”
path.
• Likewise we can assume that the branch is always
taken. Does this work in our “5-stage” pipeline?
• No, the branch target is computed during the ID cycle.
Some processors will have the target address
computed in time for the IF stage of the next
instruction so there is no delay.

Control Hazards (other solutions)

• The “branch-not taken” scheme is the same as performing the IF
stage a second time in our 5-stage pipeline if the branch is taken.
• If the branch is not taken, there is no performance degradation.
• The “branch taken” scheme is of no benefit in our case because we
evaluate the branch target address in the ID stage.
• The fourth method for dealing with a control hazard is to
implement a “delayed” branch scheme.
• In this scheme an instruction that is useful and not dependent on
whether the branch is taken or not is inserted into the pipeline
immediately after the branch. It is the job of the compiler to
choose this instruction for the branch delay slot.
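The benefit of the “branch-not taken” scheme over always repeating IF can be estimated with a simple CPI model. The sketch below assumes a base CPI of 1 and a one-cycle penalty; the branch frequency (20%) and the fraction of branches that are taken (60%) are hypothetical inputs chosen only for illustration, and the function names are not from the slides.

```python
def cpi_always_stall(branch_fraction: float, penalty: int = 1) -> float:
    """Every branch pays the penalty (IF is always repeated)."""
    return 1.0 + branch_fraction * penalty


def cpi_predict_not_taken(branch_fraction: float, taken_fraction: float,
                          penalty: int = 1) -> float:
    """Only taken branches pay the penalty; untaken branches cost nothing."""
    return 1.0 + branch_fraction * taken_fraction * penalty


# Hypothetical workload: 20% branches, 60% of them taken.
print(round(cpi_always_stall(0.20), 2))             # 1.2
print(round(cpi_predict_not_taken(0.20, 0.60), 2))  # 1.12
```

The gap between the two numbers grows with the fraction of branches that fall through, which is why the scheme costs nothing when a branch is not taken.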
How To Implement a Pipeline
• From page A-29 in the text we will now look at the
data path implementation of a 5 stage pipeline.

Multi-clock Operations
• Sometimes operations require more than one clock
cycle to complete. Examples are:
– Floating Point Multiply
– Floating Point Divide
– Floating Point Add
• We can assume that there is hardware available on the
processor for performing the operations.
• Assume that the FP Mul and Add are fully pipelined,
and the divide is un-pipelined.

Avoiding Structural Hazards
• The multiplier and the adder are fully pipelined. The
divider is not pipelined at all.
• Take a look at Figure A.34 for a good example of how
pipelining will function in the case of longer
instruction execution. The author assumes a single
floating point register write port.
• Structural hazards are avoided in the ID stage by
tracking, with bits in a shift register, when already-issued
instructions will use the write port. Incoming
instructions can then check to see if they should stall.

Instruction Level Parallelism (ILP)
Chapter 3
• The reason why we can implement pipelining in a
microprocessor is instruction level parallelism.
• Since operations can be overlapped in execution, they
exhibit ILP.
• ILP within a single basic block is limited, so it is mostly
exploited across branches. A “basic block” is a block of
code that has no branches in except at the start and no
branches out except at the end.
• In MIPS, an average basic block is 4-7
instructions.
Dependences and Hazards

A high proportion of the instructions executed in a loop are loop management instructions
that operate on the induction variable (the next example should give a clearer picture).

KEY IDEA: Eliminating this overhead could significantly increase the
performance of the loop.

We’ll use the following loop as our example:

for (i = 1000; i > 0; i--) {
    x[i] = x[i] + constant;
}

Dependences and Hazards
for (i = 1000; i > 0; i--) {
    x[i] = x[i] + constant;
}

Loop: L.D    F0,0(R1)    ; F0 = array elem.
      ADD.D  F4,F0,F2    ; add scalar in F2
      S.D    F4,0(R1)    ; store result
      DADDUI R1,R1,#-8   ; decrement ptr
      BNE    R1,R2,Loop  ; branch if R1 != R2
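The loop overhead can be quantified directly from the MIPS code: three instructions do the useful work (L.D, ADD.D, S.D) and two manage the loop (DADDUI, BNE). The sketch below computes the overhead fraction and shows how it would shrink if the loop body were replicated several times per iteration. Loop unrolling is named here as one possible way to eliminate the overhead and is an assumption of this sketch, as are the function and parameter names.

```python
def overhead_fraction(body_instructions: int = 3, overhead_instructions: int = 2,
                      unroll_factor: int = 1) -> float:
    """Fraction of executed instructions spent on loop management."""
    total = body_instructions * unroll_factor + overhead_instructions
    return overhead_instructions / total


for factor in (1, 2, 4):
    print(factor, round(overhead_fraction(unroll_factor=factor), 3))
```

As written, the loop spends 2 of every 5 instructions (40%) on the induction variable and the branch; with four copies of the body per iteration that share would fall to 2 of 14.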

Dependences and Hazards

• Data Dependence:
– Instruction j is data dependent on instruction i if
instruction i produces a result that instruction j uses, or if
instruction j is data dependent on an instruction k that is
itself data dependent on instruction i.
• Name Dependence:
– Occurs when two instructions use the same register or
memory location, but there is no flow of data between
the instructions. Instruction order must be preserved.
  – Antidependence: j writes to a location that i reads.
  – Output Dependence: two instructions write to the same
  location.
Dependences and Hazards

• Types of data hazards:
– RAW: read after write
– WAW: write after write
– WAR: write after read
• We have already seen a RAW hazard. WAW hazards
occur due to output dependence.
• WAR hazards (due to antidependence) do not usually occur
in a simple pipeline because reads happen early (in ID) and
writes happen late (in WB).
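The three hazard types can be told apart by comparing the registers two instructions read and write. The sketch below classifies the potential hazards between an earlier and a later instruction; whether a real hazard occurs still depends on the pipeline timing. The tuple layout and the function name `classify_hazards` are hypothetical.

```python
def classify_hazards(first, second) -> set:
    """Potential hazards when `first` precedes `second` in program order.

    Each instruction is (destination, sources); destination may be None.
    """
    dest1, sources1 = first
    dest2, sources2 = second
    hazards = set()
    if dest1 is not None and dest1 in sources2:
        hazards.add("RAW")  # second reads what first writes
    if dest2 is not None and dest2 in sources1:
        hazards.add("WAR")  # second writes what first reads
    if dest1 is not None and dest1 == dest2:
        hazards.add("WAW")  # both write the same location
    return hazards


div_d = ("F0", ("F2", "F4"))    # DIV.D F0,F2,F4
add_d = ("F6", ("F0", "F8"))    # ADD.D F6,F0,F8
sub_d = ("F8", ("F10", "F14"))  # SUB.D F8,F10,F14
mul_d = ("F6", ("F10", "F8"))   # MUL.D F6,F10,F8

print(classify_hazards(div_d, add_d))
print(classify_hazards(add_d, sub_d))
print(classify_hazards(add_d, mul_d))
```

For these instructions the pairs come out as RAW (on F0), WAR (on F8) and WAW (on F6) respectively.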
Control Dependence
• Assume we have the following piece of code:
if p1 {
    S1
}

if p2 {
    S2
}

• S1 is control dependent on p1 and S2 is control dependent on p2.

Control Dependence
• Control dependences have the following properties:
– An instruction that is control dependent on a branch
cannot be moved in front of the branch, so that the
branch no longer controls it.
– An instruction that is not control dependent on a branch
cannot be moved after the branch, so that the branch
controls it.

Dynamic Scheduling
• The previous example that we looked at was an
example of a statically scheduled pipeline.
• Instructions are fetched and then issued. If the user’s
code has a data dependence, it is hidden by forwarding
where possible.
• If the dependence cannot be hidden, a stall occurs.
• Dynamic scheduling is an important technique in
which the hardware reorders instruction execution to reduce
stalls while both the dataflow and exception behavior of the
program are maintained.
Dynamic Scheduling (continued)

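• Data dependence can cause stalling in a pipeline that
has “long” execution times for instructions that have
dependences.
• EX: Consider this code (.D is floating point):

DIV.D F0,F2,F4
ADD.D F10,F0,F8
SUB.D F12,F8,F14

• ADD.D must wait for the result of DIV.D. In an in-order
pipeline SUB.D stalls behind it, even though SUB.D depends on
neither instruction.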

Dynamic Scheduling (continued)

• Longer execution times of certain floating point
operations give the possibility of WAW and WAR
hazards. EX:

DIV.D F0, F2, F4
ADD.D F6, F0, F8
SUB.D F8, F10, F14
MUL.D F6, F10, F8

Dynamic Scheduling (continued)

• If we want to execute instructions out of order in
hardware (if they are not dependent etc…) we need to
modify the ID stage of our 5 stage pipeline.
• Split ID into the following stages:
– Issue: Decode instructions, check for structural hazards.
– Read Operands: Wait until no data hazards, then read
operands.
• IF still precedes ID and will store the instruction into a
register or queue.

Still More Dynamic Scheduling
• Tomasulo’s algorithm was invented by Robert
Tomasulo and was used in the IBM 360/91.
• The algorithm will avoid RAW hazards by executing
an instruction only when its operands are available.
WAR and WAW hazards are avoided by register
renaming.

Original code              After register renaming
DIV.D F0,F2,F4             DIV.D F0,F2,F4
ADD.D F6,F0,F8             ADD.D Temp,F0,F8
S.D F6,0(R1)               S.D Temp,0(R1)
SUB.D F8,F10,F14           SUB.D Temp2,F10,F14
MUL.D F6,F10,F8            MUL.D F6,F10,Temp2
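The renaming idea can be sketched as a small software pass. This is not Tomasulo's hardware mechanism (which uses reservation stations); it is only an illustration in which every destination is given a fresh name and later readers are redirected to it. Unlike the table above, which renames only the conflicting registers, this sketch renames every destination. The names `rename_registers` and `T0`, `T1`, ... are hypothetical.

```python
from itertools import count


def rename_registers(program):
    """Give every destination a fresh name and redirect later reads to it.

    Each instruction is (opcode, destination, sources); destination may be None.
    """
    latest_name = {}  # architectural register -> most recent fresh name
    fresh = (f"T{n}" for n in count())
    renamed = []
    for op, dest, sources in program:
        # Sources read the newest name of their register, preserving RAW order.
        new_sources = tuple(latest_name.get(s, s) for s in sources)
        new_dest = dest
        if dest is not None:
            new_dest = next(fresh)
            latest_name[dest] = new_dest
        renamed.append((op, new_dest, new_sources))
    return renamed


program = [
    ("DIV.D", "F0", ("F2", "F4")),
    ("ADD.D", "F6", ("F0", "F8")),
    ("S.D", None, ("F6", "R1")),
    ("SUB.D", "F8", ("F10", "F14")),
    ("MUL.D", "F6", ("F10", "F8")),
]
for instruction in rename_registers(program):
    print(instruction)
```

After renaming, no two instructions write the same name and SUB.D no longer writes a register that ADD.D reads, so the WAW and WAR conflicts disappear while the true (RAW) dependences remain.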

Branch Prediction In Hardware

• Data hazards can be overcome by dynamic hardware
scheduling; control hazards also need to be addressed.
• Branch prediction is extremely useful in repetitive
branches, such as loops.
• A simple branch predictor can be implemented using
a small amount of memory indexed by the lower order bits of
the address of the branch instruction.
• Each entry in the memory only needs to contain one bit,
representing whether the branch was recently taken or not.
Branch Prediction In Hardware

• If the branch is taken the bit is set to 1. The next time
the branch instruction is fetched we will know that the
branch was taken last time and we can assume that the branch
will be taken again.
• This scheme adds some “history” to our previous
discussion on “branch taken” and “branch not taken”
control hazard avoidance.
• This single bit method will fail at least 20% of the
time. Why?
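One common way to see where a figure like 20% comes from is to simulate a loop branch that is taken nine times and then falls through once. That pattern is an assumption chosen for illustration, not something stated on the slide, and the function name is hypothetical.

```python
def one_bit_mispredictions(outcomes, initial_prediction: bool = False) -> int:
    """Count mispredictions of a 1-bit predictor that repeats the last outcome."""
    prediction = initial_prediction
    misses = 0
    for taken in outcomes:
        if taken != prediction:
            misses += 1
        prediction = taken  # remember only the most recent outcome
    return misses


# A loop branch: taken 9 times, then not taken once; the loop runs 10 times.
loop_branch = ([True] * 9 + [False]) * 10
misses = one_bit_mispredictions(loop_branch)
print(misses, len(loop_branch), misses / len(loop_branch))
```

The predictor is wrong twice per pass: once on the loop exit, and once more on the first iteration of the next pass because the exit flipped the bit. That gives 2 misses per 10 branches even though the branch is taken 90% of the time.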

2-bit Prediction Scheme
• This method is more reliable than using a single bit to
represent whether the branch was recently taken or
not.
• The use of a 2-bit predictor will allow branches that
favor taken (or not taken) to be mispredicted less often
than the one-bit case.
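A 2-bit scheme is commonly built as a saturating counter: the prediction changes only after two consecutive mispredictions. The sketch below assumes that design, counter values 0 to 3 with values of 2 or more meaning "predict taken", and an illustrative loop branch that is taken nine times and then not taken once. The function name and the starting counter value are hypothetical.

```python
def two_bit_mispredictions(outcomes, counter: int = 3) -> int:
    """Count mispredictions of a 2-bit saturating counter (0..3)."""
    misses = 0
    for taken in outcomes:
        predicted_taken = counter >= 2
        if predicted_taken != taken:
            misses += 1
        # Move toward 3 on taken, toward 0 on not taken, saturating at the ends.
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return misses


loop_branch = ([True] * 9 + [False]) * 10
print(two_bit_mispredictions(loop_branch))             # starts "strongly taken"
print(two_bit_mispredictions(loop_branch, counter=0))  # starts "strongly not taken"
```

Starting from the strongly-taken state the counter mispredicts only the loop exit, 10 times in 100 branches; starting from strongly-not-taken it needs two extra misses to warm up, for 12 in total. Either way the single loop exit no longer flips the prediction for the next pass.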

Branch Predictors
• Increasing the size of the branch predictor memory improves
its effectiveness only up to a point.
• We also need to address the effectiveness of the
scheme used. Just increasing the number of bits in the
predictor doesn’t do very much either.
• Some other predictors include:
– Correlating Predictors
– Tournament Predictors

Branch Predictors
• Correlating predictors use the history of a local
branch AND some overall information on how other
branches are behaving to predict whether the branch
will be taken or not.
• Tournament predictors are even more sophisticated in
that they use multiple predictors, local and global,
and choose between them with a selector to improve accuracy.


PLC: Programmable Logic Controller – Arktika.: EXPERIMENTAL PRODUCT BASED ON CPLD.
From Everand
PLC: Programmable Logic Controller – Arktika.: EXPERIMENTAL PRODUCT BASED ON CPLD.
Franco Mario
No ratings yet
Preliminary Specifications: Programmed Data Processor Model Three (PDP-3) October, 1960
From Everand
Preliminary Specifications: Programmed Data Processor Model Three (PDP-3) October, 1960
Digital Equipment Corporation
No ratings yet
Computer Science II Essentials
From Everand
Computer Science II Essentials
Randall Raus
No ratings yet

