Pipeline: A Simple Implementation of A RISC Instruction Set

Pipeline allows for the overlapping execution of multiple instructions by breaking down the execution process into discrete stages. At each stage, a different instruction is being worked on. This improves performance by allowing new instructions to begin execution before previous instructions have finished. However, pipeline hazards can occur when instructions interact in ways that prevent proper parallel execution, such as structural hazards from shared resources or data hazards when an instruction needs to read a value before it is written. Various techniques are used to address hazards and maximize the performance benefits of pipelining.

Uploaded by

akhilesh

Pipeline

Pipelining is an implementation technique that exploits parallelism among the instructions in a sequential instruction stream. It allows the execution of multiple instructions to overlap.
A pipeline is like an assembly line: each step, or pipeline stage, completes a part of an instruction. Each stage of the pipeline operates on a separate instruction. Instructions enter at one end, progress through the stages, and exit at the other end. If the stages are perfectly balanced (assuming ideal conditions), then the time per instruction on the pipelined processor is given by the ratio:
Time per instruction on unpipelined machine / Number of pipeline stages
Under these conditions, the speedup from pipelining equals the number of pipeline stages. In practice, the pipeline stages are not perfectly balanced and pipelining involves some overhead. Therefore, the speedup is always less than the number of pipeline stages.
Pipelining yields a reduction in the average execution time per instruction. If the processor is assumed to take one (long) clock cycle per instruction, then pipelining decreases the clock cycle time. If the processor is assumed to take multiple clock cycles per instruction, then pipelining reduces the CPI.
A simple implementation of a RISC instruction set
Without pipelining, every instruction in this RISC implementation takes at most 5 clock cycles. The 5 clock cycles are:
1. Instruction fetch (IF) cycle:
Send the content of the program counter (PC) to memory and fetch the current instruction from memory. Update the PC to the next sequential instruction:
New PC ← [PC] + 4, since each instruction is 4 bytes long.

2. Instruction decode / register fetch (ID) cycle:
Decode the instruction and read the registers named by its source register specifiers from the register file. Decoding is done in parallel with reading the registers, which is possible because the register specifiers are at fixed locations in a RISC architecture. This technique is known as fixed-field decoding. In addition, this cycle involves:
* Performing an equality test on the registers as they are read, for a possible branch.
* Sign-extending the offset field of the instruction in case it is needed.
* Computing the possible branch target address.

3. Execution / effective address (EX) cycle:
The ALU operates on the operands prepared in the previous cycle and performs one of the following functions, depending on the instruction type.
* Memory reference: Effective address ← [Base register] + offset
* Register-register ALU instruction: the ALU performs the operation specified in the instruction using the values read from the register file.
* Register-immediate ALU instruction: the ALU performs the operation specified in the instruction using the first value read from the register file and the sign-extended immediate.
4. Memory access (MEM) cycle:
For a load instruction, memory is read using the effective address. For a store instruction, the data from the second register read is written to memory using the effective address.
5. Write-back (WB) cycle:
Write the result into the register file, whether it comes from the memory system (for a load instruction) or from the ALU (for an ALU instruction).
Five-stage pipeline for a RISC processor
Each instruction takes at most 5 clock cycles to execute:
* Instruction fetch cycle (IF)
* Instruction decode / register fetch cycle (ID)
* Execution / effective address cycle (EX)
* Memory access cycle (MEM)
* Write-back cycle (WB)
The execution of an instruction, made up of the above subtasks, can be pipelined. Each of the clock cycles from the previous section becomes a pipe stage: a cycle in the pipeline. A new instruction can be started on each clock cycle, which results in the execution pattern shown in Figure 2.1. Although each instruction takes 5 clock cycles to complete, during each clock cycle the hardware initiates a new instruction and executes some part of five different instructions, as illustrated in Figure 2.1.
Instruction #       Clock number
                    1     2     3     4     5     6     7     8     9
Instruction i       IF    ID    EX    MEM   WB
Instruction i+1           IF    ID    EX    MEM   WB
Instruction i+2                 IF    ID    EX    MEM   WB
Instruction i+3                       IF    ID    EX    MEM   WB
Instruction i+4                             IF    ID    EX    MEM   WB

Figure 2.1 Simple RISC pipeline. On each clock cycle, another instruction is fetched.
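The overlapped execution pattern of Figure 2.1 can be generated mechanically. The following is a minimal sketch, assuming an ideal pipeline with no stalls; the function name pipeline_diagram and the stage labels are illustrative choices, not part of the original notes.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_diagram(num_instructions):
    """Return one row per instruction; row[c] is the stage occupied in clock c+1."""
    total_cycles = num_instructions + len(STAGES) - 1
    rows = []
    for i in range(num_instructions):
        row = [""] * total_cycles
        for s, stage in enumerate(STAGES):
            # Instruction i enters IF in clock i+1 and advances one stage per clock.
            row[i + s] = stage
        rows.append(row)
    return rows

if __name__ == "__main__":
    for i, row in enumerate(pipeline_diagram(5)):
        print(f"i+{i}: " + " ".join(f"{cell:<4}" for cell in row))
```

Reading any single column of the result shows which stage each instruction occupies in that clock cycle. From clock cycle 5 onward the pipeline is full and all five stages are busy at once.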
Each stage of the pipeline must be independent of the other stages. Also, two different operations cannot be performed with the same data path resource in the same clock cycle. For example, a single ALU cannot be used to compute the effective address and perform a subtract operation during the same clock cycle. An adder is therefore provided in stage 1 to compute the new PC value, and an ALU in stage 3 to perform the arithmetic indicated in the instruction (see Figure 2.2). Overlapping instructions in the pipeline should not create conflicts. In other words, the functional units of each stage need to be independent of the functional units of the other stages. Three observations reduce the risk of conflict:
* Separate instruction and data memories at the level of the L1 cache eliminate the conflict for a single memory that would otherwise arise between instruction fetch and data access.
* The register file is accessed during two stages, namely ID (read) and WB (write). The hardware should allow a maximum of two reads and one write every clock cycle.
* To start a new instruction every cycle, it is necessary to increment and store the PC every cycle.

Figure 2.2 Diagram indicating the cycle (Cycle 1 to Cycle 5) and functional unit of each stage.

Buffers or registers are introduced between successive stages of the pipeline so that at the end of a clock cycle the results from one stage are stored into a register (see Figure 2.3). During the next clock cycle, the next stage uses the content of these buffers as input. Figure 2.4 visualizes the pipeline activity.

Figure 2.3 Functional units of 5 stage Pipeline. IF/ID is a buffer between IF and ID stage.

Basic Performance issues in Pipelining

Pipelining increases the CPU instruction throughput, but it does not reduce the execution time of an individual instruction. In fact, pipelining slightly increases the execution time of each instruction because of the overhead in the control of the pipeline. Pipeline overhead arises from the combination of pipeline register delays and clock skew. Imbalance among the pipe stages also reduces performance, since the clock can run no faster than the time needed for the slowest pipeline stage.
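The effect of overhead and stage imbalance can be illustrated with a short calculation. This is a sketch under stated assumptions: the stage latencies and the 1 ns overhead are hypothetical numbers chosen for illustration, and the unpipelined machine is assumed to take the sum of the stage latencies per instruction.

```python
def pipelined_clock_ns(stage_latencies_ns, overhead_ns):
    """Clock period: slowest stage plus pipeline-register delay and clock skew."""
    return max(stage_latencies_ns) + overhead_ns

def throughput_speedup(stage_latencies_ns, overhead_ns):
    """Unpipelined time per instruction divided by the pipelined clock period."""
    return sum(stage_latencies_ns) / pipelined_clock_ns(stage_latencies_ns, overhead_ns)

def instruction_latency_ns(stage_latencies_ns, overhead_ns):
    """Time for one instruction to pass through every stage of the pipeline."""
    return len(stage_latencies_ns) * pipelined_clock_ns(stage_latencies_ns, overhead_ns)

balanced = [10, 10, 10, 10, 10]      # hypothetical stage latencies, 50 ns total
unbalanced = [10, 8, 12, 14, 6]      # same 50 ns total, slowest stage is 14 ns

print(throughput_speedup(balanced, 0))      # ideal case: equals the number of stages
print(throughput_speedup(balanced, 1))      # overhead alone lowers the speedup
print(throughput_speedup(unbalanced, 1))    # imbalance lowers it further
print(instruction_latency_ns(balanced, 1))  # 55 ns, longer than the unpipelined 50 ns
```

With these numbers the throughput improves in every case, yet a single instruction takes longer to complete than on the unpipelined machine, which is the point made in the paragraph above.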

Figure 2.4 Pipeline activity

Pipeline Hazards
A hazard is a situation that prevents the next instruction in the instruction stream from executing during its designated clock cycle. Hazards reduce the pipeline performance.
Hazards may cause the pipeline to stall. When an instruction is stalled, all the instructions issued later than the stalled instruction are also stalled. Instructions issued earlier than the stalled instruction continue in the normal way. No new instructions are fetched during the stall.

Performance with Pipeline stall

A stall causes the pipeline performance to degrade from the ideal performance. The performance improvement from pipelining is given by:
Speedup = Average instruction time unpipelined / Average instruction time pipelined
Speedup = (CPI unpipelined * Clock cycle unpipelined) / (CPI pipelined * Clock cycle pipelined)
CPI pipelined = Ideal CPI + Pipeline stall clock cycles per instruction
CPI pipelined = 1 + Pipeline stall clock cycles per instruction
Assume that:
i) the cycle time overhead of pipelining is ignored
ii) the stages are balanced
With these assumptions,
Clock cycle unpipelined = Clock cycle pipelined
Therefore,
Speedup = CPI unpipelined / CPI pipelined
Speedup = CPI unpipelined / (1 + Pipeline stall cycles per instruction)
If all the instructions take the same number of cycles, equal to the number of pipeline stages (the depth of the pipeline), then
CPI unpipelined = Pipeline depth
Speedup = Pipeline depth / (1 + Pipeline stall cycles per instruction)
If there are no pipeline stalls,
Pipeline stall cycles per instruction = 0
Therefore,
Speedup = Pipeline depth
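The final speedup formula is easy to evaluate for different stall rates. The sketch below assumes the same simplifications as the derivation (balanced stages, no cycle time overhead); the function name and the sample stall rate of 0.25 are illustrative.

```python
def pipeline_speedup(depth, stall_cycles_per_instruction):
    """Speedup = pipeline depth / (1 + pipeline stall cycles per instruction)."""
    if stall_cycles_per_instruction < 0:
        raise ValueError("stall cycles per instruction cannot be negative")
    return depth / (1.0 + stall_cycles_per_instruction)

print(pipeline_speedup(5, 0))     # 5.0: no stalls, speedup equals the depth
print(pipeline_speedup(5, 0.25))  # 4.0: one stall every four instructions on average
```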
Types of hazard
The three types of hazard are:
1. Structural hazard
2. Data hazard
3. Control hazard

Structural hazard
Structural hazards arise from resource conflicts, when the hardware cannot support all possible combinations of instructions simultaneously in overlapped execution. If some combination of instructions cannot be accommodated because of resource conflicts, the processor is said to have a structural hazard.
A structural hazard arises when some functional unit is not fully pipelined or when some resource has not been duplicated enough to allow all combinations of instructions in the pipeline to execute.
For example, suppose a single memory is shared for data and instructions. When an instruction makes a data memory reference, it conflicts with the instruction fetch of a later instruction (as shown in Figure 2.5a). This causes a hazard, and the pipeline stalls for 1 clock cycle.

Figure 2.5a The load instruction and instruction i+3 both access memory in clock cycle 4.

Instruction #       Clock number
                    1     2     3     4     5     6     7     8     9     10
Load instruction    IF    ID    EX    MEM   WB
Instruction i+1           IF    ID    EX    MEM   WB
Instruction i+2                 IF    ID    EX    MEM   WB
Instruction i+3                       Stall IF    ID    EX    MEM   WB
Instruction i+4                                   IF    ID    EX    MEM   WB

Figure 2.5b A bubble is inserted in clock cycle 4.

A pipeline stall is commonly called a pipeline bubble, or simply a bubble.
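The stall caused by a shared memory can be reproduced with a small model. This is a sketch, assuming a single memory port, a five-stage pipeline in which MEM occurs 3 cycles after IF, and no other source of stalls; the function name fetch_cycles and the instruction kinds are hypothetical.

```python
def fetch_cycles(kinds):
    """Return the clock cycle in which each instruction is fetched.

    kinds is a list such as ["load", "alu", ...]. A fetch is delayed while
    the single memory port is busy with the MEM stage of an earlier load/store.
    """
    memory_busy = set()   # clock cycles in which a data access uses the memory
    cycle = 0
    fetches = []
    for kind in kinds:
        cycle += 1
        while cycle in memory_busy:   # structural hazard: insert a bubble
            cycle += 1
        fetches.append(cycle)
        if kind in ("load", "store"):
            memory_busy.add(cycle + 3)  # IF at cycle c means MEM at cycle c+3
    return fetches

print(fetch_cycles(["load", "alu", "alu", "alu", "alu"]))  # [1, 2, 3, 5, 6]
```

The fourth instruction is fetched in clock cycle 5 instead of cycle 4, matching the bubble in Figure 2.5b.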
Data Hazard
Consider the pipelined execution of the following instruction sequence (the timing diagram is shown in Figure 2.6):
DADD R1, R2, R3
DSUB R4, R1, R5
AND R6, R1, R5
OR R8, R1, R9
XOR R10, R1, R11
The DADD instruction writes the value of R1 in its WB stage (clock cycle 5), but the DSUB instruction reads the value during its ID stage (clock cycle 3). This problem is called a data hazard.
DSUB may read the wrong value if precautions are not taken. The AND instruction reads the register during clock cycle 4 and also receives the wrong value.
The XOR instruction operates properly, because its register read occurs in clock cycle 6, after DADD writes in clock cycle 5. The OR instruction also operates without incurring a hazard, because register file reads are performed in the second half of the cycle whereas writes are performed in the first half.
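The cycle arithmetic above can be checked with a short script. This is a sketch assuming no forwarding, one instruction issued per clock, and a register file that writes in the first half of a cycle and reads in the second half. The tuple layout (opcode, destination, sources) is an illustrative encoding, and the check ignores the case where an intervening instruction overwrites the same register.

```python
program = [
    ("DADD", "R1", ["R2", "R3"]),
    ("DSUB", "R4", ["R1", "R5"]),
    ("AND", "R6", ["R1", "R5"]),
    ("OR", "R8", ["R1", "R9"]),
    ("XOR", "R10", ["R1", "R11"]),
]

def raw_hazards(program):
    """List (producer, consumer, register) triples where the consumer reads
    the register in ID before the producer has written it in WB."""
    hazards = []
    for p, (p_op, dest, _) in enumerate(program):
        wb_cycle = p + 5              # fetched in clock p+1, WB four clocks later
        for c in range(p + 1, len(program)):
            c_op, _, sources = program[c]
            id_cycle = c + 2          # fetched in clock c+1, ID one clock later
            # A read in the same cycle as the write is safe (write first half,
            # read second half), so only strictly earlier reads are hazards.
            if dest in sources and id_cycle < wb_cycle:
                hazards.append((p_op, c_op, dest))
    return hazards

print(raw_hazards(program))  # [('DADD', 'DSUB', 'R1'), ('DADD', 'AND', 'R1')]
```

Only DSUB and AND are flagged. OR reads R1 in the same cycle in which DADD writes it, and XOR reads it one cycle later.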
Minimizing data hazard by Forwarding
The DADD instruction produces the value of R1 at the end of clock cycle 3. The DSUB instruction requires this value only during clock cycle 4. If the result can be moved from the pipeline register where DADD stores it to the point where DSUB needs it (the input of the ALU), then the need for a stall can be avoided. Using a simple hardware technique called data forwarding (also called bypassing or short-circuiting), data can be made available from the output of the ALU to the point where it is required (the input of the ALU) at the beginning of the immediately following clock cycle.
Forwarding works as follows:
i) The ALU result from both the EX/MEM and MEM/WB pipeline registers is always fed back to the ALU inputs.
ii) If the forwarding hardware detects that a previous ALU operation has written the register that is a source for the current ALU operation, control logic selects the forwarded result as the ALU input rather than the value read from the register file.
Forwarded results are required not only from the immediately preceding instruction, but also from an instruction that started 2 cycles earlier. The result of the i-th instruction must therefore also be forwarded to the (i+2)-th instruction.
Forwarding can be generalized to include passing a result directly to the functional unit that requires it.
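The selection made by the forwarding control logic can be sketched as a small function. This is an illustrative model, not the actual hardware: the dictionary keys "writes_reg" and "dest" are hypothetical names for the fields held in the EX/MEM and MEM/WB pipeline registers. The nearer result (EX/MEM) takes priority because it is the more recent write.

```python
def select_alu_input(src_reg, ex_mem, mem_wb):
    """Decide where one ALU source operand should come from.

    ex_mem and mem_wb describe the instructions one and two stages ahead:
    {"writes_reg": bool, "dest": register name or None}.
    """
    # R0 is hardwired to zero in MIPS, so it never needs forwarding.
    if src_reg == "R0":
        return "REGFILE"
    if ex_mem["writes_reg"] and ex_mem["dest"] == src_reg:
        return "EX/MEM"      # result of the immediately preceding instruction
    if mem_wb["writes_reg"] and mem_wb["dest"] == src_reg:
        return "MEM/WB"      # result of the instruction two cycles earlier
    return "REGFILE"

dadd = {"writes_reg": True, "dest": "R1"}
dsub = {"writes_reg": True, "dest": "R4"}
empty = {"writes_reg": False, "dest": None}

print(select_alu_input("R1", dadd, empty))  # DSUB in EX: forward from EX/MEM
print(select_alu_input("R1", dsub, dadd))   # AND in EX: forward from MEM/WB
print(select_alu_input("R5", dsub, dadd))   # R5 was not written: use register file
```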
Data Hazard requiring stalls
LD R1, 0(R2)
DADD R3, R1, R4
AND R5, R1, R6
OR R7, R1, R8
The pipelined data path for these instructions is shown in the timing diagram (Figure 2.7).

Instruction         Clock number
                    1     2     3     4     5     6     7     8     9
LD R1, 0(R2)        IF    ID    EX    MEM   WB
DADD R3, R1, R4           IF    ID    EX    MEM   WB
AND R5, R1, R6                  IF    ID    EX    MEM   WB
OR R7, R1, R8                         IF    ID    EX    MEM   WB

LD R1, 0(R2)        IF    ID    EX    MEM   WB
DADD R3, R1, R4           IF    ID    Stall EX    MEM   WB
AND R5, R1, R6                  IF    Stall ID    EX    MEM   WB
OR R7, R1, R8                         Stall IF    ID    EX    MEM   WB

Figure 2.7 The top half shows why a stall is needed. The bottom half shows the stall inserted to solve the problem.

The LD instruction gets the data from memory at the end of clock cycle 4. Even with forwarding, the data from the LD instruction can be made available no earlier than clock cycle 5. The DADD instruction requires the result of the LD instruction at the beginning of clock cycle 4. This demands data forwarding in negative time, which is not possible. Hence, the situation calls for a pipeline stall.
The result from the LD instruction can be forwarded from the pipeline register to the AND instruction, which begins 2 clock cycles after the LD instruction.
The load instruction has a delay, or latency, that cannot be eliminated by forwarding alone. It is necessary to stall the pipeline by 1 clock cycle. Hardware called a pipeline interlock detects a hazard and stalls the pipeline until the hazard is cleared. The pipeline interlock preserves the correct execution pattern by introducing a stall, or bubble. The CPI for the stalled instruction increases by the length of the stall. Figure 2.7 shows the pipeline before and after the stall.
The stall causes the DADD instruction to move 1 clock cycle later in time. Forwarding to the AND instruction now goes through the register file, and forwarding is not required at all for the OR instruction. No instruction is started during clock cycle 4.
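The check performed by a pipeline interlock for this load-use case can be sketched as follows. This is a simplified model assuming full forwarding, so that only an instruction immediately following a load can need a stall; the function names and the (opcode, destination, sources) tuple layout are illustrative.

```python
def load_use_stall(id_ex_is_load, id_ex_dest, if_id_sources):
    """True if the instruction in ID must wait one cycle for a load now in EX."""
    return id_ex_is_load and id_ex_dest in if_id_sources

def count_load_use_stalls(program):
    """Count one-cycle stalls for a list of (opcode, dest, sources) tuples."""
    stalls = 0
    for previous, current in zip(program, program[1:]):
        if load_use_stall(previous[0] == "LD", previous[1], current[2]):
            stalls += 1
    return stalls

program = [
    ("LD", "R1", ["R2"]),
    ("DADD", "R3", ["R1", "R4"]),
    ("AND", "R5", ["R1", "R6"]),
    ("OR", "R7", ["R1", "R8"]),
]

print(count_load_use_stalls(program))  # 1: only DADD directly follows the load
```

AND and OR also read R1, but they are far enough behind the load that forwarding or the register file supplies the value without a further stall.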
Control Hazard
When a branch is executed, it may or may not change the content of the PC. If a branch is taken, the content of the PC is changed to the target address. If a branch is not taken, the PC is not changed to the target address and execution continues with the next sequential instruction.

The simplest way of dealing with branches is to redo the fetch of the instruction following a branch. The first IF cycle is essentially a stall, because it never performs useful work.
One stall cycle for every branch yields a performance loss of 10% to 30%, depending on the branch frequency.
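The 10% to 30% figure follows from the CPI formula if branches make up roughly 10% to 30% of the instructions executed. The sketch below assumes an ideal CPI of 1 and branch stalls as the only stalls; the branch frequencies are sample values, not measurements.

```python
def cpi_with_branch_stalls(branch_frequency, penalty_cycles=1, ideal_cpi=1.0):
    """CPI = ideal CPI + branch frequency * stall cycles per branch."""
    if not 0.0 <= branch_frequency <= 1.0:
        raise ValueError("branch_frequency must be between 0 and 1")
    return ideal_cpi + branch_frequency * penalty_cycles

def performance_loss(branch_frequency, penalty_cycles=1, ideal_cpi=1.0):
    """Fractional increase in execution time over the stall-free pipeline."""
    cpi = cpi_with_branch_stalls(branch_frequency, penalty_cycles, ideal_cpi)
    return (cpi - ideal_cpi) / ideal_cpi

for freq in (0.10, 0.20, 0.30):
    print(f"{freq:.0%} branches: CPI {cpi_with_branch_stalls(freq):.2f}, "
          f"loss {performance_loss(freq):.0%}")
```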
Reducing the branch penalties
There are many methods for dealing with the pipeline stalls caused by branch delay:
1. Freeze or flush the pipeline, holding or deleting any instructions after the branch until the branch destination is known. It is a simple scheme; the branch penalty is fixed and cannot be reduced by software.
2. Treat every branch as not taken, simply allowing the hardware to continue as if the branch were not executed. Care must be taken not to change the processor state until the branch outcome is known.
Instructions are fetched as if the branch were a normal instruction. If the branch is taken, it is necessary to turn the fetched instruction into a no-op instruction and restart the fetch at the target address. Figure 2.8 shows the timing diagram of both situations.

Instruction                 Clock number
                            1     2     3     4     5     6     7     8     9
Untaken branch instruction  IF    ID    EX    MEM   WB
Instruction i+1                   IF    ID    EX    MEM   WB
Instruction i+2                         IF    ID    EX    MEM   WB
Instruction i+3                               IF    ID    EX    MEM   WB
Instruction i+4                                     IF    ID    EX    MEM   WB

Taken branch instruction    IF    ID    EX    MEM   WB
Instruction i+1                   IF    Idle  Idle  Idle  Idle
Branch target                           IF    ID    EX    MEM   WB
Branch target+1                               IF    ID    EX    MEM   WB
Branch target+2                                     IF    ID    EX    MEM   WB

Figure 2.8 The predicted-not-taken scheme and the pipeline sequence when the branch is untaken (top) and taken (bottom).

3. Treat every branch as taken. As soon as the branch is decoded and the target address is computed, begin fetching and executing at the target. This scheme is advantageous only if the branch target is known before the branch outcome.
For both the predicted-taken and predicted-not-taken schemes, the compiler can improve performance by organizing the code so that the most frequent path matches the hardware choice.
4. Delayed branch. This technique was commonly used in early RISC processors.
In a delayed branch, the execution sequence with a branch delay of one is:
Branch instruction
Sequential successor 1
Branch target if taken
The sequential successor is in the branch delay slot, and it is executed irrespective of whether or not the branch is taken. The pipeline behavior with a branch delay is shown in Figure 2.9. Processors with delayed branches normally have a single-instruction delay. The compiler has to make the successor instruction valid and useful. There are three ways in which the delay slot can be filled by the compiler.

Instruction                     Clock number
                                1     2     3     4     5     6     7     8     9
Untaken branch instruction      IF    ID    EX    MEM   WB
Branch delay instruction (i+1)        IF    ID    EX    MEM   WB
Instruction i+2                             IF    ID    EX    MEM   WB
Instruction i+3                                   IF    ID    EX    MEM   WB
Instruction i+4                                         IF    ID    EX    MEM   WB

Taken branch instruction        IF    ID    EX    MEM   WB
Branch delay instruction (i+1)        IF    ID    EX    MEM   WB
Branch target                               IF    ID    EX    MEM   WB
Branch target+1                                   IF    ID    EX    MEM   WB
Branch target+2                                         IF    ID    EX    MEM   WB

Figure 2.9 Timing diagram of the pipeline, showing that the behavior of a delayed branch is the same whether or not the branch is taken.
The limitations on delayed branches arise from:
i) Restrictions on the instructions that can be scheduled into delay slots.
ii) The ability to predict at compile time whether a branch is likely to be taken or not taken.
The delay slot can be filled by choosing an instruction:
* From before the branch instruction
* From the target address
* From the fall-through path
The principle of scheduling the branch delay is shown in Figure 2.10.

Figure 2.10 Scheduling the Branch delay

What makes pipelining hard to implement?

Dealing with exceptions: Overlapping of instructions makes it more difficult to know whether an instruction can safely change the state of the CPU. In a pipelined CPU, the execution of an instruction extends over several clock cycles. While this instruction is executing, another instruction may raise an exception that forces the CPU to abort the instructions in the pipeline before they complete.

Types of exceptions:
The term exception is used here to cover the terms interrupt, fault and exception. I/O device requests, page faults, invoking an OS service from a user program, integer arithmetic overflow, memory protection violations, hardware malfunctions and power failure are different classes of exception.
Individual events have important characteristics that determine what action is needed for that exception.
i) Synchronous versus asynchronous:
If the event occurs at the same place every time the program is executed with the same data and memory allocation, the event is synchronous. Asynchronous events are caused by devices external to the CPU and memory. Such events can usually be handled after the completion of the current instruction.
ii) User requested versus coerced:
User-requested exceptions are predictable and can always be handled after the current instruction has completed. Coerced exceptions are caused by some hardware event that is not under the control of the user program. Coerced exceptions are harder to implement because they are not predictable.
iii) User maskable versus user nonmaskable:
If an event can be masked by a user task, it is user maskable. Otherwise it is user nonmaskable.
iv) Within versus between instructions:
Exceptions that occur within instructions are usually synchronous, since the instruction triggers the exception. It is harder to implement exceptions that occur within instructions than those between instructions, since the instruction must be stopped and restarted. Asynchronous exceptions that occur within instructions arise from catastrophic situations and always cause program termination.
v) Resume versus terminate:
If the program's execution continues after the interrupt, it is a resuming event; otherwise it is a terminating event. It is easier to implement exceptions that terminate execution.

Stopping and restarting execution:
The most difficult exceptions have 2 properties:
1. They occur within instructions.
2. They must be restartable.
For example, a page fault must be restartable and requires the intervention of the OS. Thus the pipeline must be safely shut down, so that the instruction can be restarted in the correct state. If the restarted instruction is not a branch, then we will continue to fetch the sequential successors and begin their execution in the normal fashion.
Restarting is usually implemented by saving the PC of the instruction at which to restart. Pipeline control can take the following steps to save the pipeline state safely:
i) Force a trap instruction into the pipeline on the next IF.
ii) Until the trap is taken, turn off all writes for the faulting instruction and for all instructions that follow in the pipeline. This prevents any state changes for instructions that will not be completed before the exception is handled.
iii) After the exception handling routine receives control, it immediately saves the PC of the faulting instruction. This value will be used to return from the exception later.
NOTE:
1. With pipelining, multiple exceptions may occur in the same clock cycle because there are multiple instructions in execution.
2. Handling exceptions becomes still more complicated when instructions are allowed to execute out of order.
Pipeline implementation
Every MIPS instruction can be implemented in at most 5 clock cycles.
1. Instruction fetch cycle (IF)
IR ← Mem[PC];
NPC ← PC + 4;
Operation: send out the PC and fetch the instruction from memory into the instruction register (IR). Increment the PC by 4 to address the next sequential instruction.

2. Instruction decode / register fetch cycle (ID)
A ← Regs[rs];
B ← Regs[rt];
Imm ← sign-extended immediate field of IR;
Operation: decode the instruction and access the register file to read the registers (rs and rt). A and B are temporary registers. The operands are kept ready for use in the next cycle.
Decoding is done concurrently with reading the registers. The MIPS ISA has fixed-length instructions, and the register specifier fields are at fixed locations.

3. Execution / effective address cycle (EX)
One of the following operations is performed, depending on the instruction type.
* Memory reference:
ALU output ← A + Imm;
Operation: the ALU adds the operands to compute the effective address and places the result into the register ALU output.
* Register-register ALU instruction:
ALU output ← A func B;
Operation: the ALU performs the operation specified by the function code on the values in registers A and B.
* Register-immediate ALU instruction:
ALU output ← A op Imm;
Operation: the contents of register A and register Imm are combined by the operation op, and the result is placed in the temporary register ALU output.
* Branch:
ALU output ← NPC + (Imm << 2);
Cond ← (A == 0);
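The register transfers of the IF, ID and EX cycles listed above can be mirrored in a few lines of code. This is a rough sketch under stated assumptions: memory and the register file are modelled as Python dictionaries, the instruction is assumed to be already split into its rs, rt and 16-bit immediate fields, and all function names are hypothetical.

```python
def sign_extend_16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def instruction_fetch(mem, pc):
    ir = mem[pc]     # IR <- Mem[PC]
    npc = pc + 4     # NPC <- PC + 4
    return ir, npc

def instruction_decode(regs, rs, rt, imm16):
    a = regs[rs]                   # A <- Regs[rs]
    b = regs[rt]                   # B <- Regs[rt]
    imm = sign_extend_16(imm16)    # Imm <- sign-extended immediate field
    return a, b, imm

def execute(kind, a, b, imm, npc, func=None):
    """Return (ALU output, Cond) for the given instruction type."""
    if kind == "mem":
        return a + imm, None              # effective address
    if kind == "reg-reg":
        return func(a, b), None           # A func B
    if kind == "reg-imm":
        return func(a, imm), None         # A op Imm
    if kind == "branch":
        return npc + (imm << 2), a == 0   # branch target and condition
    raise ValueError(f"unknown instruction type: {kind}")

regs = {"R1": 0, "R2": 100}
a, b, imm = instruction_decode(regs, "R2", "R1", 0x0008)
print(execute("mem", a, b, imm, npc=4))   # (108, None)
```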
