Chapter 6 Pipelining


Uploaded by

vinoliamanohar


PIPELINING

Chapter 6

Basic concepts
Speed of execution of programs can be improved in two ways:
- Faster circuit technology to build the processor and the memory.
- Arrange the hardware so that a number of operations can be performed simultaneously. The number of operations performed per second is increased although the elapsed time needed to perform any one operation is not changed.

Pipelining is an effective way of organizing concurrent activity in a computer system to improve the speed of execution of programs.

Basic concepts (Contd.,)

Processor executes a program by fetching and executing instructions one after the other. This is known as sequential execution. If Fi refers to the fetch step, and Ei refers to the execution step of instruction Ii, then sequential execution looks like:
I1: F1 E1, then I2: F2 E2, then I3: F3 E3

What if the execution of one instruction is overlapped with the fetching of the next one?

Basic concepts (Contd.,)

Computer has two separate hardware units, one for fetching instructions and one for executing instructions. Instruction is fetched by instruction fetch unit and deposited in an intermediate buffer B1. Buffer enables the instruction execution unit to execute the instruction while the fetch unit is fetching the next instruction. Results of the execution are deposited in the destination location specified by the instruction.
Instruction fetch unit -> Interstage buffer B1 -> Execution unit

Basic concepts (Contd.,)

Computer is controlled by a clock whose period is such that the fetch and execute steps of any instruction can each be completed in one clock cycle.
- First clock cycle: Fetch unit fetches an instruction I1 (F1) and stores it in B1.
- Second clock cycle: Fetch unit fetches an instruction I2 (F2), and execution unit executes instruction I1 (E1).
- Third clock cycle: Fetch unit fetches an instruction I3 (F3), and execution unit executes instruction I2 (E2).
- Fourth clock cycle: Execution unit executes instruction I3 (E3).

Clock cycle    1    2    3    4
I1             F1   E1
I2                  F2   E2
I3                       F3   E3

Basic concepts (Contd.,)

Suppose the processing of an instruction is divided into four steps:
- F Fetch: Read the instruction from the memory.
- D Decode: Decode the instruction and fetch the source operands.
- E Execute: Perform the operation specified by the instruction.
- W Write: Store the result in the destination location.

There are four distinct hardware units, one for each of the steps. Information is passed from one unit to the next through an interstage buffer, so three interstage buffers connect the four units. As an instruction progresses through the pipeline, the information needed by the downstream units must be passed along.

F: Fetch instruction -> B1 -> D: Decode instruction and fetch operands -> B2 -> E: Execute operation -> B3 -> W: Write results
(B1, B2 and B3 are the interstage buffers.)

Basic concepts (Contd.,)

Clock cycle    1    2    3    4    5    6    7
I1             F1   D1   E1   W1
I2                  F2   D2   E2   W2
I3                       F3   D3   E3   W3
I4                            F4   D4   E4   W4

Clock cycle 1: F1
Clock cycle 2: D1, F2
Clock cycle 3: E1, D2, F3
Clock cycle 4: W1, E2, D3, F4
Clock cycle 5: W2, E3, D4
Clock cycle 6: W3, E4
Clock cycle 7: W4
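The per-cycle listing above can be generated mechanically. The following is a minimal Python sketch, assuming an ideal pipeline in which every stage takes one clock cycle and nothing stalls; the function names `pipeline_schedule` and `ideal_cycle_count` are illustrative and not part of the slides.

```python
STAGES = ["F", "D", "E", "W"]  # Fetch, Decode, Execute, Write


def pipeline_schedule(num_instructions, stages=STAGES):
    """Map each clock cycle to the steps active in it (ideal pipeline, no stalls)."""
    schedule = {}
    for i in range(1, num_instructions + 1):
        for offset, stage in enumerate(stages):
            # Instruction Ii enters the first stage in cycle i and advances one stage per cycle.
            schedule.setdefault(i + offset, []).append(f"{stage}{i}")
    return schedule


def ideal_cycle_count(num_instructions, num_stages=len(STAGES)):
    """Cycles needed to complete all instructions when the pipeline never stalls."""
    return num_stages + num_instructions - 1


if __name__ == "__main__":
    for cycle, steps in sorted(pipeline_schedule(4).items()):
        print(f"Clock cycle {cycle}: {', '.join(steps)}")
```

With four instructions the sketch needs seven cycles, whereas strictly sequential execution of four four-step instructions would need sixteen.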

Basic concepts (Contd.,)

[Figure: the four-stage timing diagram for I1 to I4 over clock cycles 1 to 7, repeated from the previous slide.]

During clock cycle 4:
- Buffer B1 holds instruction I3, which was fetched in cycle 3 and is being decoded by the instruction-decoding unit.
- Buffer B2 holds the source and destination operands for instruction I2. It also holds the information needed for the Write step (W2) of instruction I2. This information will be passed to the stage W in the following clock cycle.
- Buffer B3 holds the results produced by the execution unit and the destination information for instruction I1.

Basic concepts (Contd.,)

Role of cache memory
The potential increase in performance achieved by using pipelining is proportional to the number of pipeline stages. This increase can be achieved only if pipelined operation can be sustained without interruption throughout program execution. If pipelined operation cannot be sustained without interruption, the pipeline is said to stall. A condition that causes the pipeline to stall is called a hazard.

Basic concepts (Contd.,)

Data Hazard
Execution of the instruction occurs in the E stage of the pipeline. Execution of most arithmetic and logic operations would take only one clock cycle. However, some operations such as division would take more time to complete. For example, the operation specified in instruction I2 takes three cycles to complete, from cycle 4 to cycle 6.

Clock cycle    1    2    3    4    5    6    7    8    9
I1             F1   D1   E1   W1
I2                  F2   D2   E2   E2   E2   W2
I3                       F3   D3             E3   W3
I4                            F4   D4             E4   W4
I5                                 F5   D5             E5

Figure 8.3. Effect of an execution operation taking more than one clock cycle.

Basic concepts (Contd.,)

Control or instruction hazard
Pipeline may be stalled because an instruction is not available at the expected time. For example, while fetching an instruction a cache miss may occur, and hence the instruction may have to be fetched from the main memory. Fetching the instruction from the main memory takes much longer than fetching the instruction from the cache. Thus, the fetch step of the instruction cannot be completed in one cycle. For example, suppose the fetching of instruction I2 results in a cache miss, so that F2 takes 4 clock cycles instead of 1.

Clock cycle    1    2    3    4    5    6    7    8    9
I1             F1   D1   E1   W1
I2                  F2   F2   F2   F2   D2   E2   W2
I3                                      F3   D3   E3   W3

Basic concepts (Contd.,)

Structural hazard
Two instructions require the use of a hardware resource at the same time. Most common case is in access to the memory:
- One instruction needs to access the memory as part of the Execute or Write stage.
- Other instruction is being fetched.
- If instructions and data reside in the same cache unit, only one instruction can proceed and the other is delayed.

Many processors have separate data and instruction caches to avoid this delay. In general, structural hazards can be avoided by providing sufficient resources on the processor chip.

Basic concepts (Contd.,)

Clock cycle            1    2    3    4    5    6    7
I1                     F1   D1   E1   W1
I2 (Load X(R1),R2)          F2   D2   E2   M2   W2
I3                               F3   D3   E3        W3
I4                                    F4   D4        E4
I5                                         F5        D5

Memory address X+[R1] is computed in step E2 in cycle 4, memory access takes place in cycle 5, and the operand read from the memory is written into register R2 in cycle 6. Execution of instruction I2 thus takes two clock cycles, 4 and 5. In cycle 6, both instructions I2 and I3 require access to the register file. The pipeline is stalled because the register file cannot handle two operations at once.

Data hazards
Data hazard is a situation in which the pipeline is stalled because the data to be operated on are delayed. Consider two instructions:
I1: A = 3 + A
I2: B = 4 x A
If A = 5, and I1 and I2 are executed sequentially, B = 32. In a pipelined processor, the execution of I2 can begin before the execution of I1 is completed. The value of A used in the execution of I2 would then be the original value of 5, leading to an incorrect result. Thus, instruction I2 depends on instruction I1, because the data used by I2 depends on the results generated by I1. Results obtained using sequential execution of instructions should be the same as the results obtained from pipelined execution. When two instructions depend on each other, they must be performed in the correct order.
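The incorrect result described above can be reproduced with ordinary arithmetic. The following Python sketch is only an illustration: the function names are hypothetical, and the second function simply models I2 reading A before I1 has written it, which is what would happen in a pipeline with no hazard handling.

```python
def run_sequential(a):
    """I1 completes before I2 reads A."""
    a = 3 + a          # I1: A = 3 + A
    b = 4 * a          # I2: B = 4 x A, uses the updated A
    return a, b


def run_overlapped_without_interlock(a):
    """I2 reads its operand before I1 has written its result (no hazard handling)."""
    stale_a = a        # I2 fetches the old value of A during its decode step
    a = 3 + a          # I1 writes A afterwards
    b = 4 * stale_a    # I2 computes with the stale value
    return a, b


if __name__ == "__main__":
    print(run_sequential(5))                    # A = 8, B = 32
    print(run_overlapped_without_interlock(5))  # A = 8, B = 20 (wrong)
```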

Data hazards (contd..)

Clock cycle             1    2    3    4    5    6    7    8    9
I1 (Mul R2, R3, R4)     F1   D1   E1   W1
I2 (Add R5, R4, R6)          F2   D2        D2A  E2   W2
I3                                F3             D3   E3   W3
I4                                     F4             D4   E4   W4

Mul instruction places the results of the multiply operation in register R4 at the end of clock cycle 4. Register R4 is used as a source operand in the Add instruction. Hence the Decode Unit decoding the Add instruction cannot proceed until the Write step of the first instruction is complete. Data dependency arises because the destination of one instruction is used as a source in the next instruction.

Operand forwarding
Data hazard occurs because the destination of one instruction is used as the source in the next instruction. Hence, instruction I2 has to wait for the data to be written in the register file by the Write stage at the end of step W1. However, these data are available at the output of the ALU once the Execute stage completes step E1. Delay can be reduced or even eliminated if the result of instruction I1 can be forwarded directly for use in step E2. This is called operand forwarding.
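The saving from operand forwarding can be counted directly. The sketch below is a simplified model, not the slides' hardware: it assumes the four-stage F, D, E, W pipeline with one cycle per stage, that without forwarding the dependent instruction can read its operand in decode only in the cycle after the producer's Write step, and that with forwarding it can execute in the cycle after the producer's Execute step. The function name `stall_cycles` is hypothetical.

```python
def stall_cycles(distance, forwarding):
    """Stall cycles for a consumer that follows its producer by `distance` instructions.

    Assumed model: F, D, E, W stages, one cycle each, producer fetched in cycle 1.
    """
    if distance < 1:
        raise ValueError("the consumer must come after the producer")
    producer_fetch = 1
    consumer_fetch = producer_fetch + distance
    if forwarding:
        # Result leaves the ALU at the end of the producer's E step.
        ready = (producer_fetch + 2) + 1      # first cycle the value can be used
        wanted = consumer_fetch + 2           # consumer's E step without stalls
    else:
        # Result reaches the register file at the end of the producer's W step.
        ready = (producer_fetch + 3) + 1
        wanted = consumer_fetch + 1           # consumer's D step without stalls
    return max(0, ready - wanted)


if __name__ == "__main__":
    print(stall_cycles(1, forwarding=False))  # back-to-back Mul/Add: 2 stall cycles
    print(stall_cycles(1, forwarding=True))   # with forwarding: 0 stall cycles
```

Under these assumptions the back-to-back Mul and Add pair stalls for two cycles without forwarding and not at all with it.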

Handling data dependency in software

Data dependency may be detected by the hardware while decoding the instruction:
Control hardware may delay reading of a register by an appropriate number of clock cycles, until its contents become available. The pipeline stalls for that many clock cycles.

Detecting data dependencies and handling them can also be accomplished in software.
Compiler can introduce the necessary delay by inserting an appropriate number of NOP instructions. For example, if a two-cycle delay is needed between two instructions, then two NOP instructions can be inserted between them:
I1: Mul R2, R3, R4
    NOP
    NOP
I2: Add R5, R4, R6
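The compiler pass described above can be sketched in a few lines. This is a minimal Python illustration, assuming instructions are tuples of the form (opcode, source..., destination) with the destination last, as in Mul R2, R3, R4, and assuming a fixed required delay; the function name `insert_nops` and the tuple encoding are hypothetical.

```python
NOP = ("NOP",)


def insert_nops(program, delay=2):
    """Insert NOPs so that at least `delay` instructions separate a producer from its consumer."""
    if delay <= 0:
        return list(program)
    out = []
    for instruction in program:
        sources = instruction[1:-1]           # operands before the destination
        needed = 0
        # Look back over the last `delay` emitted slots for a producer of one of our sources.
        for back, prev in enumerate(reversed(out[-delay:]), start=1):
            if prev != NOP and prev[-1] in sources:
                needed = max(needed, delay - back + 1)
        out.extend([NOP] * needed)
        out.append(instruction)
    return out


if __name__ == "__main__":
    program = [("Mul", "R2", "R3", "R4"), ("Add", "R5", "R4", "R6")]
    for instruction in insert_nops(program):
        print(" ".join(instruction))
```

For the Mul and Add pair the sketch emits Mul, NOP, NOP, Add. A real compiler would first try to move useful independent instructions into those slots.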

Side effects
Data dependencies are explicit and easy to detect if a register specified as the destination in one instruction is used as a source in the subsequent instruction. However, some instructions also modify registers that are not specified as the destination. For example, in the autoincrement and autodecrement addressing modes, the source register is modified as well. When a location other than the one explicitly specified in the instruction as a destination location is affected, the instruction is said to have a side effect. Another example of a side effect is the condition code flags, which implicitly record the results of the previous instruction; these results may be used in the subsequent instruction.

Side effects (contd..)

I1: Add R3, R4
I2: AddWithCarry R2, R4
Instruction I1 sets the carry flag and instruction I2 uses the carry flag, leading to an implicit dependency between the two instructions. Instructions with side effects can lead to multiple data dependencies. This results in a significant increase in the complexity of the hardware or software needed to handle the dependencies. Side effects should be kept to a minimum in instruction sets designed for execution on pipelined hardware.

Instruction hazards
Instruction fetch units fetch instructions and supply the execution units with a steady stream of instructions. If the stream is interrupted then the pipeline stalls. Stream of instructions may be interrupted because of a cache miss or a branch instruction.

Instruction hazards (contd..)

Clock cycle      1    2    3    4    5    6
I1               F1   E1
I2 (Branch)           F2   E2
I3                         F3   X
Ik                              Fk   Ek
Ik+1                                 Fk+1 Ek+1

The execution unit is idle in clock cycle 4, so the pipeline stalls for one clock cycle. Time lost as a result of a branch instruction is called the branch penalty. Here the branch penalty is one clock cycle.

Instruction hazards (contd..)

Branch penalty depends on the length of the pipeline, and may be higher for a longer pipeline. For a four-stage pipeline:
- Branch target address is computed in stage E2.
- Instructions I3 and I4 have to be discarded.
- Execution unit is idle for 2 clock cycles.
- Branch penalty is 2 clock cycles.

Clock cycle      1    2    3    4    5    6    7    8
I1               F1   D1   E1   W1
I2 (Branch)           F2   D2   E2
I3                         F3   D3   X
I4                              F4   X
Ik                                   Fk   Dk   Ek   Wk
Ik+1                                      Fk+1 Dk+1 Ek+1

Instruction hazards (contd..)

Branch penalty can be reduced by computing the branch target address earlier in the pipeline. Instruction fetch unit has special hardware to identify a branch instruction after the instruction is fetched. Branch target address can then be computed in the Decode stage (D2), rather than in the Execute stage (E2). Branch penalty is only one clock cycle.

Clock cycle      1    2    3    4    5    6    7
I1               F1   D1   E1   W1
I2 (Branch)           F2   D2
I3                         F3   X
Ik                              Fk   Dk   Ek   Wk
Ik+1                                 Fk+1 Dk+1 Ek+1
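The effect of the branch penalty on total run time can be estimated with simple arithmetic. The sketch below assumes an otherwise ideal pipeline in which each taken branch adds a fixed number of lost cycles; the function name `cycles_with_branches` is hypothetical.

```python
def cycles_with_branches(num_instructions, num_stages=4, taken_branches=0, branch_penalty=0):
    """Total clock cycles for an otherwise ideal pipeline with a fixed penalty per taken branch."""
    ideal = num_stages + num_instructions - 1
    return ideal + taken_branches * branch_penalty


if __name__ == "__main__":
    # Four completed instructions (I1, I2, Ik, Ik+1) with one taken branch.
    print(cycles_with_branches(4, taken_branches=1, branch_penalty=2))  # target computed in E: 9
    print(cycles_with_branches(4, taken_branches=1, branch_penalty=1))  # target computed in D: 8
```

Moving the target address computation from the Execute stage to the Decode stage saves one cycle per taken branch in this model.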

Instruction hazards (contd..)

F: Fetch instruction (Instruction fetch unit) -> Instruction queue -> D: Dispatch/Decode unit -> E: Execute instruction -> W: Write results

- Fetch unit fetches instructions before they are needed and stores them in a queue.
- Queue can hold several instructions.
- Dispatch unit takes instructions from the front of the queue and dispatches them to the Execution unit. Dispatch unit also decodes the instruction.

Instruction hazards (contd..)

[Figure: branch timing in the presence of an instruction queue, for instructions I1, I2, I3, I4, I5 (Branch), I6, Ik and Ik+1 over clock cycles 1 to 10. Queue length in cycles 1 to 10: 1, 1, 1, 1, 2, 3, 2, 1, 1, 1. The execution step E1 of instruction I1 occupies cycles 3 to 5.]

I5 is a branch instruction with target instruction Ik. Ik is fetched in cycle 7, and I6 is discarded. However, this does not stall the pipeline, since I4 is dispatched from the queue in that cycle.

I2, I3, I4 and Ik are executed in successive clock cycles. Fetch unit computes the branch address concurrently with the execution of other instructions. This is called branch folding.

Conditional branches and branch prediction


Delayed branch (contd..)

Delayed branching can minimize the penalty incurred as a result of conditional branch instructions. Since the instructions in the delay slots are always fetched and at least partially executed, it is better to arrange for them to be fully executed whether or not the branch is taken.
If we are able to place useful instructions in these slots, then they will always be executed whether or not the branch is taken.

If we cannot place useful instructions in the branch delay slots, then we can fill these slots with NOP instructions.

Delayed branch (contd..)

(a) Original program loop
LOOP    Shift_left    R1
        Decrement     R2
        Branch=0      LOOP
NEXT    Add           R1,R3

(b) Reordered instructions
LOOP    Decrement     R2
        Branch=0      LOOP
        Shift_left    R1
NEXT    Add           R1,R3

Branch prediction
To reduce the branch penalty associated with conditional branches, we can predict whether the branch will be taken. Simplest form of branch prediction:
- Assume that the branch will not take place.
- Continue to fetch instructions in sequential execution order.
- Until the branch condition is evaluated, instruction execution along the predicted path must be done speculatively.

Determine a priori whether a branch will be taken or not depending on the expected program behavior.
For example, a branch instruction at the end of the loop causes a branch to the start of the loop for every pass through the loop except the last one. Better performance can be achieved if this branch is always predicted as taken.

Branch prediction (contd.,)

Branch prediction decision is the same every time an instruction is executed.
This is static branch prediction.

Branch prediction decision may change depending on the execution history.

This is dynamic branch prediction.

Branch prediction (contd.,)

In dynamic branch prediction the processor hardware assesses the likelihood of a given branch being taken by keeping track of branch decisions every time that instruction is executed. Simplest form of execution history used in predicting the outcome of a given branch instruction is the result of the most recent execution of that instruction.
Processor assumes that the next time the instruction is executed, the result is likely to be the same. For example, if the branch was taken the last time the instruction was executed, then the branch is likely to be taken this time as well.

Branch prediction (contd.,)

Branch prediction algorithm may be described as a machine with two states:
- LT: Branch is likely to be taken
- LNT: Branch is likely not to be taken

Let the initial state of the machine be LNT. When the branch instruction is executed, and if the branch is taken, the machine moves to state LT. If the branch is not taken, it remains in state LNT. When the same branch instruction is executed the next time, the branch is predicted as taken if the state of the machine is LT, else it is predicted as not taken.

State transitions (BT = branch taken, BNT = branch not taken):
- LNT: on BT move to LT; on BNT stay in LNT
- LT: on BT stay in LT; on BNT move to LNT
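The two-state scheme above amounts to predicting that a branch will do whatever it did last time. A minimal Python sketch follows, assuming branch outcomes are given as a list of booleans (True meaning taken); the function names are hypothetical.

```python
def predict_two_state(outcomes, initial_state="LNT"):
    """Return the prediction made before each execution of one branch instruction."""
    state = initial_state
    predictions = []
    for taken in outcomes:
        predictions.append(state == "LT")    # predict taken only in state LT
        state = "LT" if taken else "LNT"     # remember the most recent outcome
    return predictions


def count_mispredictions(outcomes, predictions):
    """Number of executions where the prediction differed from the actual outcome."""
    return sum(p != t for p, t in zip(predictions, outcomes))


if __name__ == "__main__":
    loop_branch = [True, True, True, False]  # loop-closing branch: taken three times, then exits
    guesses = predict_two_state(loop_branch)
    print(guesses, count_mispredictions(loop_branch, guesses))
```

For a loop-closing branch this scheme mispredicts twice per run of the loop: once on the first pass and once on the exit.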

Branch prediction (contd.,)

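[Figure: four-state prediction machine; transitions between the states are labelled BT (branch taken) and BNT (branch not taken).]

- ST: Strongly likely to be taken
- LT: Likely to be taken
- LNT: Likely not to be taken
- SNT: Strongly likely not to be taken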

Influence on Instruction Sets

Overview
Some instructions are much better suited to pipeline execution than others.
- Addressing modes
- Condition code flags

Addressing Modes
Addressing modes include simple ones and complex ones. In choosing the addressing modes to be implemented in a pipelined processor, we must consider the effect of each addressing mode on instruction flow in the pipeline:
- Side effects
- The extent to which complex addressing modes cause the pipeline to stall
- Whether a given mode is likely to be used by compilers

Recall (Figure 8.5. Effect of a Load instruction on pipeline timing):
Load X(R1), R2
Load (R1), R2

Complex Addressing Mode

Load (X(R1)), R2

[Figure (a) Complex addressing mode, clock cycles 1 to 7: the Load instruction computes X + [R1], then reads [X + [R1]], then reads [[X + [R1]]]; the result is forwarded to the next instruction.]

Simple Addressing Mode

Add #X, R1, R2
Load (R2), R2
Load (R2), R2

[Figure (b) Simple addressing mode: the Add computes X + [R1], the first Load reads [X + [R1]], the second Load reads [[X + [R1]]], followed by the next instruction.]
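The two instruction sequences above leave the same value in R2. The Python sketch below checks this on a toy machine, assuming memory is modelled as a dictionary from address to contents and registers as a dictionary from name to value; the function names and the sample addresses are hypothetical.

```python
def load_complex(memory, regs, x):
    """Load (X(R1)), R2: R2 <- [[X + [R1]]] in a single instruction."""
    regs = dict(regs)
    regs["R2"] = memory[memory[x + regs["R1"]]]
    return regs


def load_simple(memory, regs, x):
    """The same effect using three simple instructions."""
    regs = dict(regs)
    regs["R2"] = x + regs["R1"]        # Add  #X, R1, R2
    regs["R2"] = memory[regs["R2"]]    # Load (R2), R2
    regs["R2"] = memory[regs["R2"]]    # Load (R2), R2
    return regs


if __name__ == "__main__":
    memory = {108: 200, 200: 42}       # location 108 holds a pointer to location 200
    regs = {"R1": 100, "R2": 0}
    print(load_complex(memory, regs, 8)["R2"], load_simple(memory, regs, 8)["R2"])
```

Each simple instruction needs at most one memory access, which is what lets it flow through the pipeline without holding a stage for several cycles.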

Addressing Modes
In a pipelined processor, complex addressing modes do not necessarily lead to faster execution.
- Advantage: reducing the number of instructions / program space
- Disadvantage: cause pipeline to stall / more hardware to decode / not convenient for compiler to work with
- Conclusion: complex addressing modes are not suitable for pipelined execution.

Addressing Modes
Good addressing modes should have the following properties:
- Access to an operand does not require more than one access to the memory
- Only load and store instructions access memory operands
- The addressing modes used do not have side effects
Examples: register, register indirect, index

Condition Codes
If an optimizing compiler attempts to reorder instructions to avoid stalling the pipeline when branches or data dependencies between successive instructions occur, it must ensure that reordering does not cause a change in the outcome of a computation. The dependency introduced by the condition-code flags reduces the flexibility available for the compiler to reorder instructions.

Condition Codes
Figure 8.17. Instruction reordering.

(a) A program fragment
Add         R1,R2
Compare     R3,R4
Branch=0    ...

(b) Instructions reordered
Compare     R3,R4
Add         R1,R2
Branch=0    ...

Condition Codes
Two conclusions:
- To provide flexibility in reordering instructions, the condition-code flags should be affected by as few instructions as possible.
- The compiler should be able to specify in which instructions of a program the condition codes are affected and in which they are not.

Datapath and Control Considerations

Original Design

Pipelined Design
Changes relative to the original design:
- Separate instruction and data caches
- PC is connected to IMAR
- DMAR
- Separate MDR
- Buffers for ALU
- Instruction queue
- Instruction decoder output

Operations that can be performed independently:
- Reading an instruction from the instruction cache
- Incrementing the PC
- Decoding an instruction
- Reading from or writing into the data cache
- Reading the contents of up to two registers
- Writing into one register in the register file
- Performing an ALU operation

Superscalar operation
Pipelining enables multiple instructions to be executed concurrently by dividing the execution of an instruction into several stages. A complementary approach is to equip the processor with multiple processing units to handle several instructions in parallel in each stage. If a processor has multiple processing units, then several instructions can start execution in the same clock cycle.
Processor is said to use multiple issue.

These processors are known as superscalar processors.

Superscalar operation (contd.,)

[Figure: superscalar execution of I1 (Fadd), I2 (Add), I3 (Fsub) and I4 (Sub) over clock cycles 1 to 7. I1 passes through F1, D1, E1A, E1B, E1C and W1 in cycles 1 to 6.]

Out-of-order execution
Instructions are dispatched in the same order as they appear in the program. However, they may complete execution out of order.
Dependencies among instructions need to be handled correctly, so that this does not lead to any problems.

What if during the execution of an instruction an exception occurs and one or more of the succeeding instructions have been executed to completion?
For example, the execution of instruction I1 may cause an exception after instruction I2 has completed execution and written its results to the destination location.

If a processor permits succeeding instructions to complete execution and write to the destination locations, before knowing whether the prior instructions cause exceptions, it is said to allow imprecise exceptions.

Out-of-order execution(Contd.,)
Clock cycle    1    2    3    4    5    6    7
I1 (Fadd)      F1   D1   E1A  E1B  E1C  W1
I2 (Add)       F2   D2   E2             W2
I3 (Fsub)           F3   D3   E3A  E3B  E3C  W3
I4 (Sub)            F4   D4             E4   W4

To guarantee a consistent state when exceptions occur, the results of execution must be written to the destination locations strictly in the program order. Step W2 must be delayed until cycle 6, when I1 enters the write stage. Integer unit must retain the results of I2 until cycle 6, and cannot accept another instruction until then. If an exception occurs during an instruction, then all subsequent instructions that may have been partially executed are discarded. This is known as a precise exception.

Execution completion
It is beneficial to allow out-of-order execution, so that the execution unit is freed up to execute other instructions. However, instructions must be completed in program order to allow precise exceptions. These requirements are conflicting. It is possible to resolve the conflict by allowing the execution to proceed and writing the results into temporary registers. The contents of the temporary registers are transferred to permanent registers in correct program order.

Execution completion (Contd.,)

A special control unit called commitment unit is needed to ensure in-order commitment when out-of-order execution is allowed. Commitment unit has a queue called reorder buffer to determine which instructions should be committed next:
Instructions are entered in the queue strictly in the program order as they are dispatched for execution.

When an instruction reaches the head of the queue and its execution has been completed:
Results are transferred from temporary registers to permanent registers. All resources assigned to this instruction are released. The instruction is said to have retired.

Instructions are retired strictly in program order, though they may be completed out-of-order.
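The in-order commitment rule above can be illustrated with a small queue. The following Python sketch is a simplified model and not a description of any particular processor: the class name `ReorderBuffer`, its methods, and the use of a dictionary entry as the temporary register are all assumptions made for illustration.

```python
from collections import deque


class ReorderBuffer:
    """Toy commitment unit: instructions complete in any order but retire in program order."""

    def __init__(self):
        self.queue = deque()      # entries in program (dispatch) order
        self.registers = {}       # permanent registers

    def dispatch(self, name, dest):
        """Enter an instruction in the queue as it is dispatched."""
        entry = {"name": name, "dest": dest, "done": False, "value": None}
        self.queue.append(entry)
        return entry

    def complete(self, entry, value):
        """Record a result in the entry's temporary storage; may happen out of order."""
        entry["value"] = value
        entry["done"] = True

    def retire(self):
        """Commit finished instructions from the head of the queue, stopping at the first unfinished one."""
        retired = []
        while self.queue and self.queue[0]["done"]:
            entry = self.queue.popleft()
            self.registers[entry["dest"]] = entry["value"]
            retired.append(entry["name"])
        return retired


if __name__ == "__main__":
    rob = ReorderBuffer()
    i1 = rob.dispatch("I1", "R1")
    i2 = rob.dispatch("I2", "R2")
    rob.complete(i2, 7)           # I2 finishes first
    print(rob.retire())           # nothing retires: I1 is still at the head
    rob.complete(i1, 3)
    print(rob.retire(), rob.registers)
```

Because I2's result stays in temporary storage until I1 retires, an exception raised by I1 would find the permanent registers untouched by I2.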

Dispatch Operation
When dispatch decisions are made, the dispatch unit must ensure that all the resources needed for the execution of an instruction are available, and it reserves the resources needed. What if instructions are dispatched out of order? A deadlock may occur.

Performance Considerations

Overview
The execution time T of a program that has a dynamic instruction count N is given by:

$$T = \frac{N \times S}{R}$$

where S is the average number of clock cycles it takes to fetch and execute one instruction, and R is the clock rate. Instruction throughput is defined as the number of instructions executed per second:

$$P_s = \frac{R}{S}$$
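The two relations for execution time and throughput are easy to evaluate numerically. The Python sketch below uses the same symbols N, S and R; the sample values (one million instructions, S = 1.25, a 500 MHz clock) are made up for illustration.

```python
def execution_time(n, s, r):
    """T = N * S / R, in seconds (N instructions, S cycles per instruction, R cycles per second)."""
    return n * s / r


def throughput(s, r):
    """P_s = R / S, in instructions per second."""
    return r / s


if __name__ == "__main__":
    n, s, r = 1_000_000, 1.25, 500e6   # hypothetical program and clock rate
    print(execution_time(n, s, r))     # seconds
    print(throughput(s, r))            # instructions per second
```

Stalls raise S above 1, which lengthens T and lowers the throughput even though the clock rate is unchanged.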

Overview
An n-stage pipeline has the potential to increase the throughput by n times. However, the only real measure of performance is the total execution time of a program. Higher instruction throughput will not necessarily lead to higher performance. Two questions regarding pipelining:
- How much of this potential increase in instruction throughput can be realized in practice?
- What is a good value of n?

Number of Pipeline Stages

Since an n-stage pipeline has the potential to increase the throughput by n times, how about we use a 10,000-stage pipeline?
- As the number of stages increases, the probability of the pipeline being stalled increases.
- The inherent delay in the basic operations limits how far the clock period can be reduced.
- Hardware considerations (area, power, complexity)
