
Chapter 6

SIMD Machines:
Pipeline System
Pipelining is a hardware (H/W) implementation technique in which multiple instructions are overlapped in execution. It is a particularly effective way of organizing concurrent activities in a computer system.

Aside:
Sequential execution of instructions is done by using a single H/W circuit:
Fetch I → Decode I → Execute I → Write the result of instruction I
Pipelining Properties
- The number of operations performed per second is increased, even though the elapsed time needed to perform any one operation is not changed.
- The computer pipelining system is divided into n stages.
- Each stage completes a part of an instruction in parallel.
- The stages are connected one to the next to form a pipe.
Example: Consider a 4-stage pipeline system as shown in Fig. (6.1)

Fig. (6.1) Hardware organization of a 4-stage pipeline system


The four stages are:
F: Fetch- To read (fetch) an instruction from the memory.
D: Decode- Decode the instruction and fetch the source operands.
E: Execute- Perform the operation specified by the instruction.
W: Write- Store the result in destination location.
In this configuration, interstage buffer registers (B1, B2, and B3) are placed between each pair of stages so that the result computed by one stage can serve as an input to the next stage during the next period.
Example: Consider a 4-stage pipeline system with four instructions in progress at any given time. This means four distinct hardware units (stages) are needed, as shown in Fig. (6.2). These units must be capable of performing their tasks simultaneously and without interfering with one another. Information is passed from one unit to the next through a storage buffer.

Fig. (6.2) Instruction execution diagram of a 4-stage pipeline system (ideal case)
Notes
1) Each stage in a pipeline is expected to complete its operation in one clock cycle. Hence,
the clock period should be sufficiently long to complete the task being performed in any
stage.

• If different units require different amounts of time, the clock period must allow the longest task to be completed.
• A stage that completes its task early is idle for the remainder of the clock period.
2) The results obtained when instructions are executed in a pipelined processor are identical
to those obtained when the same instructions are executed sequentially.
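The ideal timing of Fig. (6.2) can be sketched in a few lines of Python (the names and structure here are illustrative, not from the text): in a hazard-free pipeline, instruction i (counting from 1) occupies stage s (counting from 1) during clock cycle i + s − 1.

```python
# Ideal n-stage pipeline: instruction i occupies stage s in cycle i + s - 1
# (both counted from 1). A sketch of the timing in Fig. (6.2).
STAGES = ["F", "D", "E", "W"]

def ideal_schedule(n_instructions, stages=STAGES):
    """Map each instruction to the clock cycle in which it uses each stage."""
    return {
        i: {stage: i + s for s, stage in enumerate(stages)}
        for i in range(1, n_instructions + 1)
    }

sched = ideal_schedule(4)
print(sched[1])  # {'F': 1, 'D': 2, 'E': 3, 'W': 4}
print(sched[4])  # {'F': 4, 'D': 5, 'E': 6, 'W': 7}
```

Note how I1 completes in cycle 4 (the pipeline depth) and each later instruction completes one cycle after the previous one; this one-completion-per-cycle behavior is the source of the speedup discussed next.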
PIPELINE PERFORMANCE
∙ In the ideal case, a pipelined processor completes the processing of one instruction in each clock cycle, which means that the rate of instruction processing is n times that of sequential operation, where n is the number of pipeline stages.

∙ The potential increase in performance resulting from pipelining is proportional to the number of
pipeline stages.

Fig. Sequential processor vs. pipelined processor: in the sequential processor, Fetch + Decode + Execute + Write complete for one instruction before the next begins; in the pipelined processor these phases overlap.
Pipeline Hazards
For a variety of reasons, one of the pipeline stages may not be able to complete its processing task for a given instruction in the time allotted. This is called a "hazard".
- Hazards prevent the next instruction in the instruction stream from executing during its designated clock cycle.
- Hazards reduce performance from the ideal speedup gained by pipelining because they cause pipeline stalls to be inserted.

There are three classes of pipeline hazards: Data hazards, Control hazards, and Structural hazards.
1) Data Hazards
A data hazard is a situation in which the pipeline is stalled because the data to be operated on are delayed for some reason, such as a data dependency.
There are two types of data dependency:
a) Explicit
b) Implicit
Example (Explicit data dependency): Consider a program that contains the following instructions. The data dependency just described arises when the destination of one instruction is used as a source in the next instruction. Draw the instruction execution diagram.
I1: Mul R2,R3,R4    ; R4 ← R2 * R3
I2: Add R5,R4,R6    ; R6 ← R4 + R5
I3: Mov R5, M[addr] ; M[addr] ← R5
:
The data dependency is on R4: I1 (Mul) writes R4 and I2 (Add) reads it.

Fig. (6.3) Pipeline stalled by data dependency between D2 and W1
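The explicit dependency above can be detected mechanically. A minimal Python sketch (the tuple format and function name are my own, not from the text), using this chapter's destination-last convention:

```python
# Instruction as (op, src1, src2, dest) -- destination operand given last,
# matching the convention used in this chapter.
def has_data_hazard(producer, consumer):
    """True if `consumer` reads the register that `producer` writes."""
    return producer[3] in (consumer[1], consumer[2])

i1 = ("Mul", "R2", "R3", "R4")
i2 = ("Add", "R5", "R4", "R6")
i3 = ("Mov", "R5", None, "M[addr]")

print(has_data_hazard(i1, i2))  # True  -- I2 reads R4, which I1 writes
print(has_data_hazard(i1, i3))  # False -- I3 does not read R4
```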


Example: (Implicit data dependency)
Consider a program that contains the following instructions. Here the dependency does not appear in the register operands: instruction I2 implicitly depends on I1 through the carry flag (CY) that I1 sets. Draw the instruction execution diagram.
I1: Add R1,R2,R3 ; R3 ← R1 + R2
I2: Adc R4,R5,R6 ; R6 ← R4 + R5 + CY   (data dependence through the carry flag CY)
I3: Mov R5, R1   ; R1 ← R5
:
Eliminating Data Hazards: There are two approaches to eliminating data hazards.
A) H/W Handling (OPERAND FORWARDING)
- If the hardware of the pipeline system is arranged so that the result of the source instruction I1 (produced in E1) is forwarded directly for use in the execution stage of the destination instruction (e.g., E2), the data hazard can be eliminated.

Fig. (6.4a) shows a part of the processor datapath involving the ALU and the register file. The registers SRC1 and SRC2 constitute the interstage buffers needed for pipelined operation, as illustrated in Fig. (6.4b).

I1: Mul R2,R3,R4   Source instruction
I2: Add R5,R4,R6   Destination instruction

Fig. (6.4b) Operand forwarding technique in a pipelined processor: stages F (fetch instruction), D (decode instruction and fetch operands), E (execute, ALU), and W (write), with interstage buffers B1, B2, and B3; registers SRC1 and SRC2 feed the ALU, RSLT holds its result, and a forwarding path routes RSLT back to the ALU inputs.


- As shown in Fig. (6.4b), registers SRC1 and SRC2 are part of buffer B2 and RSLT is part of B3.
- The two multiplexers connected at the inputs to the ALU allow the data on the destination bus to be
selected instead of the contents of either the SRC1 or SRC2 register.
- After decoding instruction I2 and detecting the data dependency, a decision is made to use data forwarding. The operand not involved in the dependency, register R5, is read and loaded into register SRC1 in clock cycle 3.
- In the next clock cycle, the product produced by instruction I1 is available in register RSLT, and because of the forwarding connection, it can be used in step E2. Hence, execution of I2 proceeds without interruption.
B) HANDLING DATA HAZARDS IN SOFTWARE
In this case, the compiler can introduce the two-cycle delay needed between the instructions I1
and I2 by inserting NOP (No-operation) instructions, as follows:

I1: Mul R2,R3,R4
    NOP
    NOP            ← two-cycle delay covering the data dependency on R4
I2: Add R5,R4,R6

If the responsibility for detecting such dependencies is left entirely to the software, the
compiler must insert the NOP instructions to obtain a correct result.
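Such a compiler pass can be sketched as follows (a simplified illustration in the same hypothetical destination-last tuple format; it only checks adjacent instruction pairs):

```python
NOP = ("NOP", None, None, None)

def insert_nops(program, delay=2):
    """Insert `delay` NOPs after any instruction whose destination register
    is read by the immediately following instruction.
    Instructions are (op, src1, src2, dest) tuples, destination last."""
    out = []
    for cur, nxt in zip(program, program[1:] + [None]):
        out.append(cur)
        if nxt is not None and cur[3] in (nxt[1], nxt[2]):
            out.extend([NOP] * delay)
    return out

prog = [("Mul", "R2", "R3", "R4"), ("Add", "R5", "R4", "R6")]
print([ins[0] for ins in insert_nops(prog)])
# ['Mul', 'NOP', 'NOP', 'Add']
```

A real compiler would prefer to fill the delay slots with useful independent instructions rather than NOPs whenever it can find them.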

2- Instruction (or Control) Hazards


The pipelined processor may also be stalled because of a delay in the availability of an instruction. An instruction hazard occurs when there is a cache miss, which requires the instruction to be fetched from main memory. A control hazard may occur because of branch instructions.
a) Effect of a cache miss: The effect of a cache miss on pipelined operation is illustrated in Fig. (6.5). Assume here that there is a cache miss in fetching instruction I2. Instruction I1 is fetched from the cache in cycle 1, and its execution proceeds normally. However, the fetch operation for instruction I2, which starts in cycle 2, results in a cache miss.

Fig. (6.5) Instruction execution diagram for a pipelined processor stalled by a cache miss in F2

The instruction fetch stage must now suspend any further fetch requests and wait for instruction I2 to arrive. We assume that instruction I2 is received and loaded into buffer B1 at the end of cycle 5. The pipeline resumes its normal operation at that point.

b) Effect of Branch instructions (consider Unconditional branch only)


A branch instruction may also cause the pipeline system to stall. The time lost as a result of a branch
instruction is often referred to as the branch penalty.

For example, Fig. (6.6a) shows the effect of a branch instruction (I2: branch to Ik) on a four-stage pipeline system.
- Assume that the branch address is computed in step E2. Instructions I3 and I4 must be discarded, and the target instruction, Ik, is fetched in clock cycle 5. Thus, the branch penalty is two clock cycles.

Fig. (6.6a) Pipelined processor stalled by branch timing


Reducing the branch penalty requires the branch address to be computed earlier in the pipeline.
- Typically, the instruction fetch unit has dedicated hardware to identify a branch instruction and
compute the branch target address as quickly as possible after an instruction is fetched.

- With this additional hardware, both of these tasks can be performed in step D2, leading to the sequence of events shown in Fig. (6.6b). In this case, the branch penalty is only one clock cycle.

Fig. (6.6b) Instruction execution diagram after reducing the branch penalty
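The two cases in Figs. (6.6a) and (6.6b) follow one rule: the penalty equals the number of pipeline stages entered after fetch but before the branch target is known. A tiny sketch (stage names as in this chapter; the function itself is illustrative):

```python
STAGES = ("F", "D", "E", "W")

def branch_penalty(resolve_stage, stages=STAGES):
    """Cycles lost to an unconditional branch = number of instructions
    fetched (and later discarded) before the target address is computed
    in `resolve_stage`."""
    return stages.index(resolve_stage)

print(branch_penalty("E"))  # 2 -> target computed in Execute (Fig. 6.6a)
print(branch_penalty("D"))  # 1 -> target computed in Decode  (Fig. 6.6b)
```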
Reducing Instruction Hazards
- Using an Instruction Queue and Prefetching
In general, either a cache miss or a branch instruction stalls the pipeline for one or more clock cycles.
- To reduce the effect of these stalls, many processors employ sophisticated fetch units that can fetch instructions before they are needed and put them in a queue.
* Typically, this queue, called the instruction queue, can store several instructions.
* The fetch unit attempts to keep the instruction queue filled at all times, to reduce the impact of occasional delays when fetching instructions.
* Further, the fetch unit must have sufficient decoding and processing capability to recognize and execute branch instructions.
- A separate unit, which we call the dispatch unit, takes instructions from the front of the queue and sends them to the execution unit. This leads to the organization shown in Fig. (6.7). The dispatch unit also performs the decoding function.
Fig. (6.7) Use of an instruction queue in the hardware organization

What is the effect of the instruction queue and prefetching on pipeline hazards?
∙ When the pipeline stalls because of a data hazard (for example), the dispatch unit is not able to
issue instructions from the instruction queue. However, the fetch unit continues to fetch
instructions and add them to the queue.

∙ Conversely, if there is a delay in fetching instructions because of a cache miss, the dispatch unit
continues to issue instructions from the instruction queue.

3- Structural hazard
This is a situation in which two instructions require the use of a given hardware resource at the same time. The most common case in which a structural hazard may arise is in access to memory. One instruction may need to access memory as part of the Execute or Write stage while another instruction is being fetched. If instructions and data reside in the same cache unit, only one instruction can proceed and the other is delayed. Many processors use separate instruction and data caches to avoid this delay.
(In the conflict shown in Fig. 6.8, both stages require the use of the same data bus at the same time.)

Fig. 6.8. Structural hazards are avoided by providing sufficient hardware resources on the processor chip
Example: Consider the following sequence of instructions

I1 Add 0A,R0,R1
I2 Mul 3,R2,R3
I3 And 3A,R2,R4
I4 Add R0,R2,R5
I5 Sub R5, R4,R4
I6 Mov R5, [3000]
I7 Mov R2, [2500]

In all instructions, the destination operand is given last. Initially, registers R0 and R2 contain 14 and 5B (hex), respectively. These instructions are executed in a computer that has a four-stage pipeline. Assume that the first instruction is fetched in clock cycle 1, that an instruction fetch normally requires only one clock cycle, and that there is a cache miss in fetching instruction I2. Note that the time needed to fetch an instruction in the case of a cache miss is 5 clock cycles.
(a) Draw the instruction execution diagram.
(b) Give the contents of the interstage buffers, B1, B2, and B3, during clock cycles 2 and 10.
(a) Instruction execution diagram. F2 extends over cycles 2-6 because of the cache miss; I5 stalls one cycle in D (shown as D5 D5) because it needs R5, which I4 writes in W4.

Clock   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
I1      F1  D1  E1  W1
I2          F2  F2  F2  F2  F2  D2  E2  W2
I3                              F3  D3  E3  W3
I4                                  F4  D4  E4  W4
I5                                      F5  D5  D5  E5  W5
I6                                          F6      D6  E6  W6
I7                                              F7      D7  E7  W7

Fig. Instruction execution diagram

(b) Contents of the interstage buffers:

Clock cycle 2:  B1: I1 (fetched)   B2: I1 (after decode step)   B3: nothing
Clock cycle 10: B1: I6 (fetched)   B2: I5 (after decode step)   B3: R5 = 6F (the result of executing I4: R5 ← R0 + R2 = 14 + 5B)
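The timing of part (a) can be checked with a small simulator (an illustrative sketch of my own: it models only the variable fetch time, not the data-hazard stall of I5):

```python
def schedule(fetch_cycles, n_stages=4):
    """done[i][s] = clock cycle in which instruction i completes stage s.
    A stage starts once the instruction's previous stage and the previous
    instruction's use of the same stage have both finished."""
    done = []
    for i, f in enumerate(fetch_cycles):
        row = []
        for s in range(n_stages):
            duration = f if s == 0 else 1   # only fetch can be stretched
            after_prev_stage = row[s - 1] if s > 0 else 0
            after_prev_inst = done[i - 1][s] if i > 0 else 0
            start = max(after_prev_stage, after_prev_inst) + 1
            row.append(start + duration - 1)
        done.append(row)
    return done

# I1 fetches normally; I2 suffers the cache miss (fetch takes 5 cycles).
d = schedule([1, 5, 1, 1])
print(d[0])  # [1, 2, 3, 4]   -> I1: F1 D1 E1 W1
print(d[1])  # [6, 7, 8, 9]   -> I2: fetch spans cycles 2-6, then D2 E2 W2
print(d[3])  # [8, 9, 10, 11] -> I4 executes (E4) in cycle 10, as in part (b)
```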
Homework: Consider the following sequence of instructions

I1 Mov 09,R0
Loop: I2 INC R0
I3 Add R0,R1,R1
I4 Adc R2,R0,R2
I5 OR R2, 89, R4
I6 Jmp Loop
I7 Mov R1, [2500]
I8 Mov [2000], R3

In all instructions, the destination operand is given last. Initially, registers R1 and R2 contain 18 and 20, respectively. These instructions are executed in a computer that has a four-stage pipeline. Assume that the first instruction is fetched in clock cycle 1, that an instruction fetch normally requires only one clock cycle, and that there is a cache miss in fetching instruction I5. Note that the time needed to fetch an instruction in the case of a cache miss is 5 clock cycles, and that the first-level cache is of the split type.

(a) Draw the instruction execution diagram.


(b) Give the contents of the interstage buffers, B1, B2, and B3, during clock cycles 7 and 13.
Performance Measurements
The execution time, T, of a program is given by

T = (N × S) / R

where
N: dynamic instruction count,
S: the average number of clock cycles the processor takes to fetch and execute one instruction, and
R: the clock rate.
Note: This simple model assumes that instructions are executed one after the other, with no overlap.

Instruction Throughput: the number of instructions executed per second (used to measure the speed of a pipelined system).
A- Sequential Processor Throughput
For sequential execution, the throughput Ps is given by

Ps = R / S
B- Pipeline Throughput
1) Effect of a Unified Cache
Let TI be the time between two successive instruction completions.
Note: For sequential execution, TI = S.
However, in the absence of hazards, a pipelined processor completes the execution of one instruction each clock cycle; thus,

TI = 1 cycle

A cache miss stalls the pipeline by an amount equal to the cache miss penalty. This means that the value of TI increases by an amount equal to the cache miss penalty for the instruction in which the miss occurs.
Note: a cache miss can occur for either an instruction fetch or a data access.
Consider a computer that has a unified cache for both instructions and data, and let d be the percentage of instructions that refer to data operands in memory. The average increase in the value of TI as a result of cache misses is given by

δmiss = ((1 − hi) + d(1 − hd)) × Mp

where hi and hd are the hit ratios for instructions and data, respectively, and Mp is the cache miss penalty in clock cycles. The pipelined throughput is then Pp = R / (1 + δmiss).
2) Effect of Two-Level Caches
Reducing the cache miss penalty is particularly worthwhile in a pipelined processor. This can be achieved by introducing a secondary cache between the primary, on-chip cache and main memory. A miss in the primary cache for which the required block is found in the secondary cache introduces a penalty Ms. In the case of a miss in the secondary cache, the full penalty Mp is incurred. Assuming a hit rate hs in the secondary cache, the average increase in TI is

δmiss = ((1 − hi) + d(1 − hd)) × (hs·Ms + (1 − hs)·Mp)

Example 1: A typical computer system has a clock period of 1.25 ns and a unified cache for instructions and data. Assume that 33% of the instructions access data in memory, the instruction hit rate is 95%, the data hit rate is 92%, and the miss penalty is 16 clock cycles. Determine the following:
- Pipelined processor throughput.
- Non-pipelined processor throughput.
Solution: clock rate R = 1/clock period = 1/1.25 ns = 800 MHz
With 33% of the instructions accessing data in memory, δmiss is

δmiss = ((1 − 0.95) + 0.33(1 − 0.92)) × 16
      = (0.05 + 0.0264) × 16 = 1.2224 cycles

Taking this delay into account, the pipelined processor's throughput would be

Pp = 800 / (1 + 1.2224) = 360 MIPS

- Non-pipelined processor throughput:

Ps = 800 / (4 + 1.2224) = 153.2 MIPS

Pipeline performance = Pp / Ps = 360 / 153.2 ≈ 2.4
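The arithmetic of Example 1 can be reproduced directly (the variable names below are mine):

```python
# Example 1: unified cache.
R = 800              # clock rate in MHz (1 / 1.25 ns)
hi, hd = 0.95, 0.92  # instruction and data hit rates
d, M = 0.33, 16      # data-access fraction, miss penalty (cycles)
S = 4                # cycles per instruction on the sequential machine

delta = ((1 - hi) + d * (1 - hd)) * M  # average stall cycles per instruction
Pp = R / (1 + delta)                   # pipelined throughput (MIPS)
Ps = R / (S + delta)                   # sequential throughput (MIPS)

print(round(delta, 4))    # 1.2224
print(round(Pp))          # 360
print(round(Ps, 1))       # 153.2
print(round(Pp / Ps, 2))  # 2.35 (i.e. approximately 2.4)
```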

Example 2: Consider a processor that uses a 4-stage pipeline and two levels of cache (L1 and L2), with a clock period of 1.25 ns.
The L1 cache is a unified cache for instructions and data, with 33% of the instructions accessing data in memory. Assume that the instruction hit rate is 95%, the data hit rate is 92%, and the miss penalty from main memory is 16 clock cycles. The time needed to transfer an 8-word block from the L2 cache is 9 ns, the penalty for a miss served by the L2 cache is 5 clock cycles, and the L2 cache hit rate is 91%. Determine the following:

- Pipelined processor throughput.


- Non-pipelined processor throughput.
- Pipeline Performance
Solution: clock rate R = 1/clock period = 1/1.25 ns = 800 MHz
With 33% of the instructions accessing data in memory, δmiss is

δmiss = ((1 − 0.95) + 0.33(1 − 0.92)) × (0.91 × 5 + (1 − 0.91) × 16)
      = (0.05 + 0.0264) × (4.55 + 1.44)
      = 0.0764 × 5.99 = 0.458 cycles

Taking this delay into account:

Pipelined processor throughput:

Pp = 800 / (1 + 0.458) = 549 MIPS

Non-pipelined processor throughput:

Ps = 800 / (4 + 0.458) = 179.5 MIPS

Pipeline performance = Pp / Ps = 549 / 179.5 = 3.06
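Similarly for Example 2 (again with my own variable names; this confirms that δmiss ≈ 0.458 cycles is the value consistent with the 549 MIPS figure):

```python
# Example 2: two-level cache.
R = 800                   # clock rate in MHz
hi, hd, d = 0.95, 0.92, 0.33
hs, Ms, Mp = 0.91, 5, 16  # L2 hit rate, L2-served penalty, main-memory penalty
S = 4

delta = ((1 - hi) + d * (1 - hd)) * (hs * Ms + (1 - hs) * Mp)
Pp = R / (1 + delta)
Ps = R / (S + delta)

print(round(delta, 3))    # 0.458
print(round(Pp))          # 549
print(round(Ps, 1))       # 179.5
print(round(Pp / Ps, 2))  # 3.06
```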


Superscalar Operation
A more aggressive approach is to equip the processor with multiple processing units to handle several instructions in parallel in each processing stage.
- With this arrangement, several instructions start execution in the same clock cycle, and the processor is said to use multiple issue.
- Such processors are capable of achieving an instruction execution throughput of more than one instruction per cycle. They are known as superscalar processors. Many modern high-performance processors use this approach.
Example: Consider a processor with two execution units, one for integer and one for floating-point operations. The instruction fetch unit is capable of reading two instructions at a time and storing them in the instruction queue, as shown in Fig. 9.

In each clock cycle, the dispatch unit retrieves and decodes up to two instructions from the front of the queue. If there is one integer instruction, one floating-point instruction, and no hazards, both instructions are dispatched in the same clock cycle.

Fig. 9 A superscalar processor with two execution units.
The pipelined instruction execution timing diagram is shown in Fig. 10. The blue shading indicates operations in the floating-point unit. The floating-point unit takes three clock cycles to complete the floating-point operation specified in I1. The integer unit completes execution of I2 in one clock cycle.

Fig. 10 An example of instruction execution flow in the processor of Fig. 9, assuming no hazards are encountered.
