EC Chapter2 2014

1) The document describes the basic operation of a MIPS processor implementation, including the fetch, decode, and execute stages. 2) The key steps are fetching instructions from memory and incrementing the program counter, decoding the instruction to read registers and determine the operation, and executing the different instruction types by performing arithmetic/logical operations, memory accesses, or branch comparisons. 3) Explicit control signals are needed to determine when to write the register file or memory, since write-back does not occur on every clock cycle.

Computer Architecture 14-15

Chapter 2: Enhancing performance with pipelining

[Adapted from Computer Organization and Design, 4th Edition, Patterson & Hennessy, © 2009, MK]

Chapter 2. Enhancing performance with pipelining 1 Dept. of Computer Architecture, UMA, Oct 2014
Introduction
CPU performance factors
Instruction count
- Determined by ISA and compiler
CPI and Cycle time
- Determined by CPU hardware

We will examine two MIPS implementations
- A simplified version
- A more realistic pipelined version
Simple subset, shows most aspects
- Memory reference: lw, sw
- Arithmetic/logical: add, sub, and, or, slt
- Control transfer: beq, j
Review: MIPS (RISC) Design Principles
Simplicity favors regularity
fixed size instructions
small number of instruction formats
opcode always the first 6 bits

Smaller is faster
limited instruction set
limited number of registers in register file
limited number of addressing modes

Make the common case fast


arithmetic operands from the register file (load-store machine)
allow instructions to contain immediate operands

Good design demands good compromises


three instruction formats
The Processor: Datapath & Control
Our implementation of the MIPS is simplified
memory-reference instructions: lw, sw
arithmetic-logical instructions: add, sub, and, or, slt
control flow instructions: beq, j

Generic implementation (Fetch → Decode → Exec cycle):
use the program counter (PC) to supply the instruction address and fetch the instruction from memory (and update the PC: PC = PC+4)
decode the instruction (and read registers)
execute the instruction

All instructions (except j) use the ALU after reading the registers

How? memory-reference? arithmetic? control flow?
Aside: Clocking Methodologies
The clocking methodology defines when data in a state
element is valid and stable relative to the clock
State elements - a memory element such as a register
Edge-triggered – all state changes occur on a clock edge
Typical execution
read contents of state elements -> send values through
combinational logic -> write results to one or more state elements
[Diagram: state element 1 → combinational logic → state element 2, all within one clock cycle]


Assumes state elements are written on every clock
cycle; if not, need explicit write control signal
write occurs only when both the write control is asserted and the
clock edge occurs
Fetching Instructions
Fetching instructions involves
reading the instruction from the Instruction Memory
updating the PC value to be the address of the next
(sequential) instruction

[Diagram: the PC supplies the Read Address of the Instruction Memory; an Add unit computes PC + 4 each clock to update the PC]

PC is updated every clock cycle, so it does not need an explicit write control signal – just a clock signal
Reading from the Instruction Memory is a combinational activity, so it doesn't need an explicit read control signal
Decoding Instructions
Decoding instructions involves
sending the fetched instruction’s opcode and function field
bits to the control unit

[Diagram: the opcode and funct bits feed the Control Unit; Read Addr 1 and Read Addr 2 from the instruction index the Register File, producing Read Data 1 and Read Data 2]
reading two values from the Register File


- Register File addresses are contained in the instruction

Executing R Format Operations
R format operations (add, sub, slt, and, or)
R-type: | op (31-26) | rs (25-21) | rt (20-16) | rd (15-11) | shamt (10-6) | funct (5-0) |
perform operation (op and funct) on values in rs and rt
store the result back into the Register File (into location rd)

[Diagram: Read Data 1 and Read Data 2 feed the ALU (under ALU control, producing overflow and zero outputs); the ALU result is written back through Write Addr/Write Data under the RegWrite signal]
Note that Register File is not written every cycle (e.g. sw), so
we need an explicit write control signal for the Register File
Executing Load and Store Operations
Load and store operations involve
compute memory address by adding the base register (read from
the Register File during decode) to the 16-bit signed-extended
offset field in the instruction
store value (read from the Register File during decode) written to
the Data Memory
load value, read from the Data Memory, written to the Register File
[Diagram: the ALU adds Read Data 1 (the base register) to the sign-extended 16-bit offset to form the Data Memory Address; Read Data 2 drives the memory's Write Data (MemWrite for sw), and the memory's Read Data (MemRead for lw) is written back to the Register File under RegWrite]

Executing Branch Operations
Branch operations involve
compare the operands read from the Register File during decode
for equality (zero ALU output)
compute the branch target address by adding the updated PC to
the 16-bit signed-extended offset field in the instr
[Diagram: the ALU compares Read Data 1 and Read Data 2, sending its zero output to the branch control logic; a separate Add unit computes the branch target address from PC+4 plus the sign-extended offset shifted left 2]
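The branch-target arithmetic above can be sketched in a few lines of Python (the function name and calling convention are ours, not part of the slides):

```python
def branch_target(pc, offset16):
    """Branch target = (PC + 4) + (sign-extended 16-bit offset << 2)."""
    offset16 &= 0xFFFF                 # keep only the 16-bit immediate field
    if offset16 & 0x8000:              # sign-extend negative offsets
        offset16 -= 1 << 16
    return (pc + 4 + (offset16 << 2)) & 0xFFFFFFFF
```

For example, a beq at 0x00400000 with offset +3 targets 0x00400010 (three instructions past PC+4), and an offset field of 0xFFFE (-2) branches backward.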
Executing Jump Operations
Jump operation involves
replace the lower 28 bits of the PC with the lower 26 bits of the
fetched instruction shifted left by 2 bits

[Diagram: the low 26 bits of the fetched instruction are shifted left 2 to give 28 bits, then concatenated with the top 4 bits of PC+4 to form the jump address]
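The jump-address construction described above can be expressed directly with bit operations (function name is ours):

```python
def jump_target(pc_plus4, target26):
    """Replace the low 28 bits of PC+4 with the 26-bit target field shifted left 2."""
    return (pc_plus4 & 0xF0000000) | ((target26 << 2) & 0x0FFFFFFF)
```

Note the top 4 bits of PC+4 survive, so a jump can only reach targets within the current 256 MB region.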
Creating a Single Datapath from the Parts
Assemble the datapath segments and add control lines
and multiplexors as needed
Single cycle design – fetch, decode and execute each instruction in one clock cycle
no datapath resource can be used more than once per
instruction, so some must be duplicated (e.g., separate
Instruction Memory and Data Memory, several adders)
multiplexors needed at the input of shared elements with
control lines to do the selection
write signals to control writing to the Register File and Data
Memory

Cycle time is determined by length of the longest path

Fetch, R, and Memory Access Portions

[Diagram: the combined fetch, R-type, and memory-access datapath – Instruction Memory, Register File, ALU, and Data Memory – with control signals RegWrite, ALUSrc, ALU control, MemWrite, MemtoReg, and MemRead]
Adding the Control
Selecting the operations to perform (ALU, Register File
and Memory read/write)
Controlling the flow of data (multiplexor inputs)
R-type: | op (31-26) | rs (25-21) | rt (20-16) | rd (15-11) | shamt (10-6) | funct (5-0) |
I-type: | op (31-26) | rs (25-21) | rt (20-16) | address offset (15-0) |
J-type: | op (31-26) | target address (25-0) |

Observations
- op field always in bits 31-26
- addr of registers to be read are always specified by the rs field (bits 25-21) and rt field (bits 20-16); for lw and sw rs is the base register
- addr. of register to be written is in one of two places – in rt (bits 20-16) for lw; in rd (bits 15-11) for R-type instructions
- offset for beq, lw, and sw always in bits 15-0
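The field positions listed above can be extracted with shifts and masks; a small Python sketch (the function and key names are ours):

```python
def decode(instr):
    """Split a 32-bit MIPS instruction word into the fields named above."""
    return {
        "op":     (instr >> 26) & 0x3F,   # bits 31-26
        "rs":     (instr >> 21) & 0x1F,   # bits 25-21
        "rt":     (instr >> 16) & 0x1F,   # bits 20-16
        "rd":     (instr >> 11) & 0x1F,   # bits 15-11 (R-type only)
        "shamt":  (instr >> 6)  & 0x1F,   # bits 10-6  (R-type only)
        "funct":  instr & 0x3F,           # bits 5-0   (R-type only)
        "imm16":  instr & 0xFFFF,         # bits 15-0  (I-type)
        "target": instr & 0x3FFFFFF,      # bits 25-0  (J-type)
    }
```

For example, the word 0x012A4020 encodes add $8, $9, $10: op = 0, rs = 9, rt = 10, rd = 8, funct = 0x20.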
Single Cycle Datapath with Control Unit
[Diagram: the complete single-cycle datapath; the Control Unit decodes Instr[31-26] to produce RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, and ALUOp, while PCSrc (Branch AND zero) selects the next PC; the ALU control decodes Instr[5-0]]
R-type Instruction Data/Control Flow
[Diagram: the same single-cycle datapath, highlighting the paths active for an R-type instruction – instruction fetch, two register reads, the ALU operation, and write-back to rd]
Load Word Instruction Data/Control Flow
[Diagram: the same datapath, highlighting the paths active for lw – base register read, sign-extended offset into the ALU, Data Memory read, and write-back to rt]
Branch Instruction Data/Control Flow
[Diagram: the same datapath, highlighting the paths active for beq – two register reads compared by the ALU, with the zero output and the branch adder selecting the next PC via PCSrc]

Adding the Jump Operation
[Diagram: the datapath extended with a Jump control signal and an extra mux that selects the jump address – Instr[25-0] shifted left 2 and concatenated with PC+4[31-28] – as the next PC]
Instruction Times (Critical Paths)
What is the clock cycle time assuming negligible
delays for muxes, control unit, sign extend, PC access,
shift left 2, wires, setup and hold times except:
Instruction and Data Memory (200 ps)
ALU and adders (200 ps)
Register File access (reads or writes) (100 ps)

Instr. | I Mem | Reg Rd | ALU Op | D Mem | Reg Wr | Total
R-type |       |        |        |       |        |
load   |       |        |        |       |        |
store  |       |        |        |       |        |
beq    |       |        |        |       |        |
jump   |       |        |        |       |        |
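One way to work the table: sum the delays of the stages each instruction actually uses. The totals below match the per-instruction times given in the pipeline-performance table later in the chapter (the variable names are ours):

```python
# Stage delays from the slide, in picoseconds
IMEM, REG, ALU, DMEM = 200, 100, 200, 200

paths = {
    "R-type": IMEM + REG + ALU + REG,          # 600 ps: fetch, read, ALU, write
    "load":   IMEM + REG + ALU + DMEM + REG,   # 800 ps: uses every unit
    "store":  IMEM + REG + ALU + DMEM,         # 700 ps: no register write
    "beq":    IMEM + REG + ALU,                # 500 ps: compare only
    "jump":   IMEM,                            # 200 ps: fetch only
}
cycle_time = max(paths.values())  # single-cycle clock must fit the slowest: lw
```

So the single-cycle clock must be at least 800 ps, set by lw.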
Single Cycle Disadvantages & Advantages
Uses the clock cycle inefficiently – the clock cycle must
be timed to accommodate the slowest instruction
especially problematic for more complex instructions like
floating point multiply

[Timing diagram: two 800 ps clock cycles; lw uses its whole cycle, while sw finishes early and the remainder of its cycle is waste]

May be wasteful of area, since some functional units (e.g., adders) must be duplicated because they cannot be shared during a clock cycle
but
Is simple and easy to understand
How Can We Make It Faster?
Start fetching and executing the next instruction before
the current one has completed
Pipelining – (all?) modern processors are pipelined for
performance
Remember the performance equation:
CPU time = CPI * IC * ClockCycleTime

Under ideal conditions and with a large number of


instructions, the speedup from pipelining is
approximately equal to the number of pipe stages
A five stage pipeline is nearly five times faster because the ClockCycleTime is nearly one-fifth as long

Fetch (and execute) more than one instruction at a time


Superscalar processing – stay tuned

Pipelining Analogy

Pipelined laundry: overlapping execution


Parallelism improves performance

Four loads:
Speedup = 8/3.5 = 2.3
Non-stop:
Speedup = 2n/(0.5n + 1.5) ≈ 4 = number of stages

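The laundry numbers above can be checked directly (each load takes 2 hours done sequentially; pipelined, a finished load emerges every 0.5 hours once the 4-stage pipeline fills):

```python
n, stages = 4, 4
sequential = 2.0 * n                   # 8 hours for four loads back to back
pipelined = 0.5 * (n + stages - 1)     # 3.5 hours: fill time plus one load/step
speedup = sequential / pipelined       # 8 / 3.5 ≈ 2.3
# As n grows, 2n / (0.5n + 1.5) approaches 4, the number of stages
```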
The Five Stages of Load Instruction
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5

lw IFetch Dec Exec Mem WB

IFetch: Instruction Fetch and Update PC


Dec: Registers Fetch and Instruction Decode
Exec: Execute R-type; calculate memory address
Mem: Read/write the data from/to the Data Memory
WB: Write the result data into the register file

A Pipelined MIPS Processor
Start the next instruction before the current one has
completed
improves throughput - total amount of work done in a given time
instruction latency (execution time, delay time, response time -
time from the start of an instruction to its completion) is not
reduced
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8

lw IFetch Dec Exec Mem WB

sw IFetch Dec Exec Mem WB

R-type IFetch Dec Exec Mem WB

- clock cycle (pipeline stage time) is limited by the slowest stage


- for some stages don’t need the whole clock cycle (e.g., WB)
- for some instructions, some stages are wasted cycles (i.e.,
nothing is done during that cycle for that instruction)
Pipeline Performance

Assume time for stages is


100ps for register read or write
200ps for other stages

Compare pipelined datapath with single-cycle


datapath
Instr    | Instr fetch | Register read | ALU op | Memory access | Register write | Total time
lw       | 200 ps      | 100 ps        | 200 ps | 200 ps        | 100 ps         | 800 ps
sw       | 200 ps      | 100 ps        | 200 ps | 200 ps        |                | 700 ps
R-format | 200 ps      | 100 ps        | 200 ps |               | 100 ps         | 600 ps
beq      | 200 ps      | 100 ps        | 200 ps |               |                | 500 ps

Pipeline Performance

[Figure: instruction sequence timing – single-cycle (Tc = 800 ps) vs pipelined (Tc = 200 ps)]

Pipeline Performance
Single Cycle Implementation (CC = 800 ps):
[Timing: lw then sw, each taking a full 800 ps cycle, with waste in the sw cycle]
Pipeline Implementation (CC = 200 ps):
[Timing: lw, sw, and an R-type overlapped in IFetch–Dec–Exec–Mem–WB stages, with a new instruction starting every 200 ps]

To complete an entire instruction in the pipelined case takes 1000 ps (as compared to 800 ps for the single cycle case). Why?
How long does each take to complete 1,000,000 adds?
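One way to work the 1,000,000-adds question, using the cycle times from this slide (a sketch; we assume no stalls and count 4 fill cycles for the 5-stage pipeline):

```python
n = 1_000_000
single_cycle_ps = n * 800          # every instruction takes the full 800 ps cycle
pipelined_ps = (n + 4) * 200       # 4 extra cycles to fill the 5-stage pipeline
speedup = single_cycle_ps / pipelined_ps
# speedup approaches 800/200 = 4, not 5, because the stages are unbalanced
```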
Pipeline Speedup
If all stages are balanced
i.e., all take the same time

Time between instructions (pipelined) = Time between instructions (nonpipelined) / Number of stages

If not balanced, speedup is less


Speedup due to increased throughput
Latency (time for each instruction) does not decrease

Pipelining the MIPS ISA
What makes it easy
all instructions are the same length (32 bits)
- can fetch in the 1st stage and decode in the 2nd stage
few instruction formats (three) with symmetry across formats
- can begin reading register file in 2nd stage
memory operations occur only in loads and stores
- can use the execute stage to calculate memory addresses
each instruction writes at most one result (i.e., changes the
machine state) and does it in the last few pipeline stages (MEM
or WB)
operands must be aligned in memory so a single data transfer
takes only one data memory access

MIPS Pipeline Datapath Additions/Mods
State registers between each pipeline stage to isolate them
[Diagram: the datapath divided into IF:IFetch, ID:Dec, EX:Execute, MEM:MemAccess, and WB:WriteBack stages, with IF/ID, ID/EX, EX/MEM, and MEM/WB state registers between the stages, all clocked by the System Clock]
MIPS Pipeline Control Path Modifications
All control signals can be determined during Decode
and held in the state registers between pipeline stages
[Diagram: the pipelined datapath with control signals generated during Decode and carried forward in the ID/EX, EX/MEM, and MEM/WB registers – RegDst, ALUOp, and ALUSrc for EX; Branch, MemRead, and MemWrite for MEM; RegWrite and MemtoReg for WB; PCSrc comes from the branch condition]
Pipeline Control
IF Stage: read Instr Memory (always asserted) and write
PC (on System Clock)
ID Stage: no optional control signals to set

     | EX Stage                     | MEM Stage              | WB Stage
     | RegDst ALUOp1 ALUOp0 ALUSrc  | Brch MemRead MemWrite  | RegWrite MemtoReg
R    | 1      1      0      0      | 0    0       0         | 1        0
lw   | 0      0      0      1      | 0    1       0         | 1        1
sw   | X      0      0      1      | 0    0       1         | 0        X
beq  | X      0      1      0      | 1    0       0         | 0        X

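The truth table above can be transcribed as a lookup table; a Python sketch (names are ours; None stands in for a don't-care X):

```python
control = {
    "R":   dict(RegDst=1, ALUOp1=1, ALUOp0=0, ALUSrc=0, Branch=0,
                MemRead=0, MemWrite=0, RegWrite=1, MemtoReg=0),
    "lw":  dict(RegDst=0, ALUOp1=0, ALUOp0=0, ALUSrc=1, Branch=0,
                MemRead=1, MemWrite=0, RegWrite=1, MemtoReg=1),
    "sw":  dict(RegDst=None, ALUOp1=0, ALUOp0=0, ALUSrc=1, Branch=0,
                MemRead=0, MemWrite=1, RegWrite=0, MemtoReg=None),
    "beq": dict(RegDst=None, ALUOp1=0, ALUOp0=1, ALUSrc=0, Branch=1,
                MemRead=0, MemWrite=0, RegWrite=0, MemtoReg=None),
}
```

Note that the don't-cares are exactly the signals that only matter when a write happens: RegDst and MemtoReg are irrelevant when RegWrite is 0.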
Graphically Representing MIPS Pipeline

IM → Reg → ALU → DM → Reg

Can help with answering questions like:


How many cycles does it take to execute this code?
What is the ALU doing during cycle 4?
Is there a hazard, why does it occur, and how can it be fixed?

Why Pipeline? For Performance!
Time (clock cycles)
[Diagram: five instructions (Inst 0 – Inst 4) overlapped, each flowing through IM, Reg, ALU, DM, Reg, with a new instruction starting every cycle; the first few cycles are the time to fill the pipeline]
Once the pipeline is full, one instruction is completed every cycle, so CPI = 1
Can Pipelining Get Us Into Trouble?
Yes: Pipeline Hazards
structural hazards: attempt to use the same resource by two
different instructions at the same time
data hazards: attempt to use data before it is ready
- An instruction’s source operand(s) are produced by a prior
instruction still in the pipeline
control hazards: attempt to make a decision about program
control flow before the condition has been evaluated and the
new PC target address calculated
- branch and jump instructions, exceptions

Can usually resolve hazards by waiting
- pipeline control must detect the hazard and take action to resolve hazards
Structure Hazards
Conflict for use of a resource
In MIPS pipeline with a single memory
Load/store requires data access
Instruction fetch would have to stall for that cycle
- Would cause a pipeline “bubble”

Hence, pipelined datapaths require separate


instruction/data memories
Or separate instruction/data caches

A Single Memory Would Be a Structural Hazard
Time (clock cycles)
[Diagram: lw followed by Inst 1–4 in a pipeline with a single memory; in the cycle where lw reads its data from memory, a later instruction is reading its instruction from the same memory – a structural hazard]
Fix with separate instr and data memories (I$ and D$)
How About Register File Access?
Time (clock cycles)
[Diagram: add $1,… followed three instructions later by add $2,$1,…; the write and the read of $1 fall in the same cycle]
Fix register file access hazard by doing reads in the second half of the cycle and writes in the first half
[Annotations mark the clock edge that controls register writing and the clock edge that controls loading of the pipeline state registers]
Data Hazards
Dependencies backward in time cause hazards

[Diagram: add $1,… followed by sub $4,$1,$5; and $6,$1,$7; or $8,$1,$9; xor $4,$1,$5 – the earlier readers need $1 before the add has written it back]

Read before write data hazard


Data Hazards (R inst.)
Dependencies backward in time cause hazards

[Diagram: the same sequence – add $1,… followed by sub $4,$1,$5; and $6,$1,$7; or $8,$1,$9; xor $4,$1,$5]

Read before write data hazard


Data Hazards (loads)
Dependencies backward in time cause hazards

[Diagram: lw $1,4($2) followed by sub $4,$1,$5; and $6,$1,$7; or $8,$1,$9; xor $4,$1,$5 – the loaded value is not available until after the lw's MEM stage]

Load-use data hazard


Control Hazards
Branch determines flow of control
Fetching next instruction depends on branch outcome
Pipeline can’t always fetch correct instruction
- Still working on ID stage of branch

In MIPS pipeline
Need to compare registers and compute target early in the
pipeline
Add hardware to do it in ID stage

Control Hazards
Dependencies backward in time cause control
hazards in branch instructions

[Diagram: beq followed by lw, Inst 3, and Inst 4 – the instructions after the branch are fetched before the branch outcome and target are known]

Other Pipeline Structures Are Possible
What about the (slow) multiply operation?
Make the clock twice as slow or …
let it take two cycles (since it doesn’t use the DM stage)
[Diagram: IM – Reg – ALU/MUL – Reg, with the MUL taking two cycles in place of the unused DM stage]

What if the data memory access is twice as slow as


the instruction memory?
make the clock twice as slow or …
let data memory access take two cycles (and keep the same clock rate)
[Diagram: IM – Reg – ALU – DM1 – DM2 – Reg]

Other Sample Pipeline Alternatives

ARM7: three stages (IM – Reg – EX), covering PC update and IM access; decode and reg access; and ALU op, shift/rotate, DM access, and commit result (write back)

XScale: a deeper pipeline (IM1 – IM2 – Reg – SHFT – ALU – DM1 – DM2/Reg), covering PC update and BTB access with the start of IM access; IM access; decode with register access and shift/rotate; the ALU op and start of DM access; DM access and exception handling; and DM write with reg write
Summary
All modern day processors use pipelining
Pipelining doesn’t help latency of single task, it helps
throughput of entire workload
Potential speedup: a CPI of 1 and a fast CC
Pipeline rate limited by slowest pipeline stage
Unbalanced pipe stages makes for inefficiencies
The time to “fill” pipeline and time to “drain” it can impact
speedup for deep pipelines and short code runs
Must detect and resolve hazards
Stalling negatively affects CPI (makes CPI greater than the ideal of 1)

Review: Data Hazards
Read before write data hazard

Value of $1 10 10 10 10 10/-20 -20 -20 -20 -20

[Diagram: add $1,… followed by sub $4,$1,$5; and $6,$1,$7; or $8,$1,$9; xor $4,$1,$5 – the value of $1 stays 10 until the add's WB in cycle 5, when it becomes -20]

One Way to “Fix” a Data Hazard: Detention

Can fix data hazard by waiting – stall – but impacts CPI
[Diagram: add $1,… followed by two stall cycles, after which sub $4,$1,$5 and and $6,$1,$7 proceed with the written value of $1]

Data Hazards: Detention

An instruction depends on completion of data access by a previous instruction
add $s0, $t0, $t1
sub $t2, $s0, $t3

Another Way to “Fix” a Data Hazard: Forwarding
Fix data hazards by forwarding results as soon as they are available to where they are needed
[Diagram: add $1,… followed by sub, and, or, xor; the add's ALU result is forwarded directly to the ALU inputs of sub and and, while or and xor read $1 normally]

Data Hazards: Forwarding (aka Bypassing)

Use result when it is computed


Don’t wait for it to be stored in a register
Requires extra connections in the datapath

Data Hazards: Forwarding (aka Bypassing)
Take the result from the earliest point that it exists in any
of the pipeline state registers and forward it to the
functional units (e.g., the ALU) that need it that cycle
For ALU functional unit: the inputs can come from any
pipeline register rather than just from ID/EX by
adding multiplexors to the inputs of the ALU
connecting the Rd write data in EX/MEM or MEM/WB to either (or both) of the EX stage's Rs and Rt ALU mux inputs
adding the proper control hardware to control the new muxes
Other functional units may need similar forwarding logic
(e.g., the DM)
With forwarding can achieve a CPI of 1 even in the
presence of data dependencies

Forwarding Illustration

[Diagram: add $1,… followed by sub $4,$1,$5 and and $6,$7,$1; EX forwarding supplies $1 to the sub from EX/MEM, and MEM forwarding supplies $1 to the and from MEM/WB]

Data Forwarding Control Conditions
1. EX Forward Unit – forwards the result from the previous instr. to either input of the ALU:
   if (EX/MEM.RegWrite
       and (EX/MEM.RegisterRd != 0)
       and (EX/MEM.RegisterRd = ID/EX.RegisterRs))
     ForwardA = 10
   if (EX/MEM.RegWrite
       and (EX/MEM.RegisterRd != 0)
       and (EX/MEM.RegisterRd = ID/EX.RegisterRt))
     ForwardB = 10

2. MEM Forward Unit – forwards the result from the second previous instr. to either input of the ALU:
   if (MEM/WB.RegWrite
       and (MEM/WB.RegisterRd != 0)
       and (MEM/WB.RegisterRd = ID/EX.RegisterRs))
     ForwardA = 01
   if (MEM/WB.RegWrite
       and (MEM/WB.RegisterRd != 0)
       and (MEM/WB.RegisterRd = ID/EX.RegisterRt))
     ForwardB = 01
Yet Another Complication!
Another potential data hazard can occur when there is a
conflict between the result of the WB stage instruction
and the MEM stage instruction – which should be
forwarded?

[Diagram: add $1,$1,$2 followed by add $1,$1,$3 and add $1,$1,$4 – both the EX/MEM and the MEM/WB results target $1, and the most recent (EX/MEM) one must be forwarded]

Corrected Data Forwarding Control Conditions
1. EX Forward Unit – forwards the result from the previous instr. to either input of the ALU:
   if (EX/MEM.RegWrite
       and (EX/MEM.RegisterRd != 0)
       and (EX/MEM.RegisterRd = ID/EX.RegisterRs))
     ForwardA = 10
   if (EX/MEM.RegWrite
       and (EX/MEM.RegisterRd != 0)
       and (EX/MEM.RegisterRd = ID/EX.RegisterRt))
     ForwardB = 10

2. MEM Forward Unit – forwards the result from the previous or second previous instr. to either input of the ALU:
   if (MEM/WB.RegWrite
       and (MEM/WB.RegisterRd != 0)
       and (EX/MEM.RegisterRd != ID/EX.RegisterRs)
       and (MEM/WB.RegisterRd = ID/EX.RegisterRs))
     ForwardA = 01
   if (MEM/WB.RegWrite
       and (MEM/WB.RegisterRd != 0)
       and (EX/MEM.RegisterRd != ID/EX.RegisterRt)
       and (MEM/WB.RegisterRd = ID/EX.RegisterRt))
     ForwardB = 01
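The corrected conditions above can be transcribed into executable form; a Python sketch (latches are plain dicts, names are ours; 0b00 selects the register file, 0b10 the EX/MEM latch, 0b01 the MEM/WB latch, per the slide's encodings):

```python
def forward(ex_mem, mem_wb, id_ex):
    """Compute ForwardA and ForwardB from the three pipeline latches."""
    fA = fB = 0b00
    # EX forward unit: result of the previous instruction wins
    if ex_mem["RegWrite"] and ex_mem["Rd"] != 0:
        if ex_mem["Rd"] == id_ex["Rs"]:
            fA = 0b10
        if ex_mem["Rd"] == id_ex["Rt"]:
            fB = 0b10
    # MEM forward unit: only when the newer EX/MEM result does not also match
    if (mem_wb["RegWrite"] and mem_wb["Rd"] != 0
            and ex_mem["Rd"] != id_ex["Rs"]
            and mem_wb["Rd"] == id_ex["Rs"]):
        fA = 0b01
    if (mem_wb["RegWrite"] and mem_wb["Rd"] != 0
            and ex_mem["Rd"] != id_ex["Rt"]
            and mem_wb["Rd"] == id_ex["Rt"]):
        fB = 0b01
    return fA, fB
```

In the triple-add example, with $1 pending in both latches, the EX/MEM copy is selected for the Rs input, matching the slide's intent.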
Datapath with Forwarding Hardware
[Diagram: the pipelined datapath with a Forward Unit that compares EX/MEM.RegisterRd and MEM/WB.RegisterRd against ID/EX.RegisterRs and ID/EX.RegisterRt, driving the new muxes on the ALU inputs]
Load-Use Data Hazard

Can’t always avoid stalls by forwarding


If value not computed when needed
Can’t forward backward in time!

Forwarding with Load-use Data Hazards

[Diagram: lw $1,4($2) followed by sub $4,$1,$5; and $6,$1,$7; or $8,$1,$9; xor $4,$1,$5 – the loaded value exists only after the lw's MEM stage, too late for the sub's EX stage even with forwarding]

Forwarding with Load-use Data Hazards

[Diagram: lw $1,4($2); a one-cycle stall; then sub $4,$1,$5; and $6,$1,$7; or $8,$1,$9; xor $4,$1,$5 – after the stall, forwarding from MEM/WB supplies $1 to the sub]

Will still need one stall cycle even with forwarding


Load-use Hazard Detection Unit
Need a Hazard detection Unit in the ID stage that inserts
a stall between the load and its use
1. ID Hazard detection Unit:
if (ID/EX.MemRead
and ((ID/EX.RegisterRt = IF/ID.RegisterRs)
or (ID/EX.RegisterRt = IF/ID.RegisterRt)))
stall the pipeline

The first line tests to see if the instruction now in the EX stage is a lw; the next two lines check to see if the destination register of the lw matches either source register of the instruction in the ID stage (the load-use instruction)
After this one cycle stall, the forwarding logic can handle
the remaining data hazards
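The detection condition above translates nearly line-for-line into code. A minimal Python sketch, with illustrative argument names standing in for the pipeline-register fields:

```python
def load_use_hazard(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """True when the lw in EX writes a register that the instruction in ID
    reads, so the pipeline must stall for one cycle."""
    return id_ex_mem_read and id_ex_rt in (if_id_rs, if_id_rt)

# lw $1,4($2) in EX while sub $4,$1,$5 is in ID -> must stall
print(load_use_hazard(True, 1, 1, 5))   # True
# an independent instruction in ID -> no stall needed
print(load_use_hazard(True, 1, 6, 7))   # False
```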
Hazard/Stall Hardware

Along with the Hazard Unit, we have to implement the stall:
- Prevent the instructions in the IF and ID stages from progressing down the pipeline – done by preventing the PC register and the IF/ID pipeline register from changing
  - The Hazard Detection Unit controls the writing of the PC (PC.Write) and IF/ID (IF/ID.Write) registers
- Insert a "bubble" between the lw instruction (in the EX stage) and the load-use instruction (in the ID stage), i.e., insert a noop in the execution stream
  - Set the control bits in the EX, MEM, and WB control fields of the ID/EX pipeline register to 0 (noop). The Hazard Unit controls the mux that chooses between the real control values and the 0's.
- Let the lw instruction and the instructions after it in the pipeline (before it in the code) proceed normally down the pipeline
Adding the Hazard/Stall Hardware

[Datapath diagram: a Hazard Unit added in the ID stage, fed by ID/EX.MemRead and ID/EX.RegisterRt. It controls PC.Write and IF/ID.Write and the mux that zeroes the control signals entering the ID/EX pipeline register.]
Code Scheduling to Avoid Stalls

Reorder code to avoid use of a load result in the next instruction.
C code for A = B + E; C = B + F;

Original (13 cycles):
    lw   $t1, 0($t0)
    lw   $t2, 4($t0)
    (stall)
    add  $t3, $t1, $t2
    sw   $t3, 12($t0)
    lw   $t4, 8($t0)
    (stall)
    add  $t5, $t1, $t4
    sw   $t5, 16($t0)

Scheduled (11 cycles):
    lw   $t1, 0($t0)
    lw   $t2, 4($t0)
    lw   $t4, 8($t0)
    add  $t3, $t1, $t2
    sw   $t3, 12($t0)
    add  $t5, $t1, $t4
    sw   $t5, 16($t0)
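The 13- and 11-cycle counts can be checked with a small model. Assuming full forwarding, one stall per load immediately followed by a dependent instruction, and total cycles = instruction count + 4 pipeline-fill cycles + stalls, a Python sketch (the (op, dest, sources) encoding is an assumption made for the example):

```python
def cycles(prog):
    """prog: list of (op, dest, sources). One stall per load-use pair."""
    stalls = sum(1 for prev, cur in zip(prog, prog[1:])
                 if prev[0] == 'lw' and prev[1] in cur[2])
    return len(prog) + 4 + stalls  # 4 cycles to fill the 5-stage pipeline

original = [('lw', '$t1', ['$t0']), ('lw', '$t2', ['$t0']),
            ('add', '$t3', ['$t1', '$t2']), ('sw', None, ['$t3', '$t0']),
            ('lw', '$t4', ['$t0']), ('add', '$t5', ['$t1', '$t4']),
            ('sw', None, ['$t5', '$t0'])]
scheduled = [('lw', '$t1', ['$t0']), ('lw', '$t2', ['$t0']),
             ('lw', '$t4', ['$t0']), ('add', '$t3', ['$t1', '$t2']),
             ('sw', None, ['$t3', '$t0']), ('add', '$t5', ['$t1', '$t4']),
             ('sw', None, ['$t5', '$t0'])]
print(cycles(original), cycles(scheduled))  # 13 11
```

Moving the third lw up hides both load-use latencies behind independent instructions, which is exactly what the compiler scheduling on the slide accomplishes.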
Control Hazards

When the flow of instruction addresses is not sequential (i.e., PC = PC + 4); incurred by change-of-flow instructions:
- Unconditional branches (j, jal, jr)
- Conditional branches (beq, bne)
- Exceptions

Possible approaches:
- Stall (impacts CPI)
- Move the decision point as early in the pipeline as possible, thereby reducing the number of stall cycles
- Delay the decision (requires compiler support)
- Predict and hope for the best!

Control hazards occur less frequently than data hazards, but there is nothing as effective against control hazards as forwarding is for data hazards.
Datapath Branch and Jump Hardware

[Datapath diagram: the jump address is formed from PC+4[31-28] concatenated with the instruction's 26-bit field shifted left 2, and a Jump mux selects it as the next PC; the branch target adder feeds the PCSrc mux, selected by the Branch control.]
Jumps Incur One Stall

Jumps are not decoded until ID, so one flush is needed.
To flush, assert IF.Flush to zero the instruction field of the IF/ID pipeline register (turning it into a noop).

[Pipeline diagram: fix the jump hazard by waiting – flush. The j instruction is followed by one flushed slot; j target then enters the pipeline.]

Fortunately, jumps are very infrequent – only 3% of the SPECint instruction mix.
Two “Types” of Stalls

- Noop instruction (or bubble) inserted between two instructions in the pipeline (as done for load-use situations)
  - Keep the instructions earlier in the pipeline (later in the code) from progressing down the pipeline for a cycle ("bounce" them in place with write control signals)
  - Insert the noop by zeroing the control bits in the pipeline register at the appropriate stage
  - Let the instructions later in the pipeline (earlier in the code) progress normally down the pipeline
- Flushes (or instruction squashing), where an instruction in the pipeline is replaced with a noop instruction (as done for instructions located sequentially after j instructions)
  - Zero the control bits for the instruction to be flushed
Supporting ID Stage Jumps

[Datapath diagram: the jump address mux and the zeroing of the fetched instruction in IF/ID are driven from the ID stage, so a jump costs a single flushed slot.]
One Way to “Fix” a Control Hazard: Detention

Fix the branch hazard by waiting – flush – but this affects CPI.

[Pipeline diagram: with the branch decision made late in the pipeline, beq is followed by three flushed slots before beq target and Inst 3 can enter the pipeline.]
Another Way to “Fix” a Control Hazard

Move the branch decision hardware back to as early in the pipeline as possible – i.e., during the decode cycle.

[Pipeline diagram: with the branch decision made in the ID stage, beq is followed by only one flushed slot before beq target and Inst 3.]
Reducing the Delay of Branches

- Move the branch decision hardware back to the EX stage
  - Reduces the number of stall (flush) cycles to two
  - Adds an and gate and a 2x1 mux to the EX timing path
- Add hardware to compute the branch target address and evaluate the branch decision in the ID stage
  - Reduces the number of stall (flush) cycles to one (as with jumps)
    - But now need to add forwarding hardware in the ID stage
  - Computing the branch target address can be done in parallel with the RegFile read (done for all instructions – only used when needed)
  - Comparing the registers can't be done until after the RegFile read, so comparing and updating the PC adds a mux, a comparator, and an and gate to the ID timing path
- For deeper pipelines, branch decision points can be even later in the pipeline, incurring more stalls
ID Branch Forwarding Issues

- MEM/WB "forwarding" is taken care of by the normal RegFile write-before-read operation:
      WB   add3 $1,
      MEM  add2 $3,
      EX   add1 $4,
      ID   beq $1,$2,Loop
      IF   next_seq_instr
- Need to forward from the EX/MEM pipeline stage to the ID comparison hardware for cases like:
      WB   add3 $3,
      MEM  add2 $1,
      EX   add1 $4,
      ID   beq $1,$2,Loop
      IF   next_seq_instr

  if (IDcontrol.Branch
      and (EX/MEM.RegisterRd != 0)
      and (EX/MEM.RegisterRd = IF/ID.RegisterRs))
    ForwardC = 1
  if (IDcontrol.Branch
      and (EX/MEM.RegisterRd != 0)
      and (EX/MEM.RegisterRd = IF/ID.RegisterRt))
    ForwardD = 1

  This forwards the result from the second previous instruction to either input of the compare.
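The two forwarding conditions above can be sketched as a small function. Signal names follow the slide's pseudocode; the function itself is an illustration, not the reference design:

```python
def id_branch_forwards(branch, ex_mem_rd, if_id_rs, if_id_rt):
    """ForwardC/ForwardD: steer the EX/MEM result into the ID compare inputs
    when the branch reads the register that instruction is about to write."""
    forward_c = branch and ex_mem_rd != 0 and ex_mem_rd == if_id_rs
    forward_d = branch and ex_mem_rd != 0 and ex_mem_rd == if_id_rt
    return forward_c, forward_d

# add2 writing $1 is in MEM while beq $1,$2,Loop is in ID:
# forward to the Rs input of the compare, not the Rt input
print(id_branch_forwards(True, 1, 1, 2))  # (True, False)
```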
ID Branch Forwarding Issues, con’t

If the instruction immediately before the branch produces one of the branch source operands, then a stall needs to be inserted (between the beq and add1), since the EX stage ALU operation occurs at the same time as the ID stage branch compare operation:
      WB   add3 $3,
      MEM  add2 $4,
      EX   add1 $1,
      ID   beq $1,$2,Loop
      IF   next_seq_instr
- "Bounce" the beq (in ID) and next_seq_instr (in IF) in place (the ID Hazard Unit deasserts PC.Write and IF/ID.Write)
- Insert a stall between the add in the EX stage and the beq in the ID stage by zeroing the control bits going into the ID/EX pipeline register (done by the ID Hazard Unit)

If the branch is found to be taken, then flush the instruction currently in IF (IF.Flush).
Supporting ID Stage Branches

[Datapath diagram: the branch target adder, a register Compare unit, and the Branch control are moved to the ID stage; IF.Flush squashes the instruction in IF on a taken branch, and the Hazard Unit plus two Forwarding Units supply and stall the compare inputs as needed.]
Delayed Branches

If the branch hardware has been moved to the ID stage, then we can eliminate all branch stalls with delayed branches, defined as always executing the next sequential instruction after the branch instruction – the branch takes effect after that next instruction.
- The MIPS compiler moves an instruction to immediately after the branch that is not affected by the branch (a safe instruction), thereby hiding the branch delay

With deeper pipelines, the branch delay grows, requiring more than one delay slot:
- Delayed branches have lost popularity compared to more expensive but more flexible (dynamic) hardware branch prediction
- Growth in available transistors has made hardware branch prediction relatively cheaper
Scheduling Branch Delay Slots

A. From before the branch:
       add $1,$2,$3
       if $2=0 then
         delay slot
   becomes
       if $2=0 then
         add $1,$2,$3

B. From the branch target:
       sub $4,$5,$6
       ...
       add $1,$2,$3
       if $1=0 then
         delay slot
   becomes
       add $1,$2,$3
       if $1=0 then
         sub $4,$5,$6

C. From the fall-through:
       add $1,$2,$3
       if $1=0 then
         delay slot
       sub $4,$5,$6
   becomes
       add $1,$2,$3
       if $1=0 then
         sub $4,$5,$6

- A is the best choice: it fills the delay slot and reduces IC
- In B and C, the sub instruction may need to be copied, increasing IC
- In B and C, it must be okay to execute sub when the branch fails
Branch Prediction

Static branch prediction:
- Based on typical branch behavior
- Example: loop and if-statement branches
  - Predict backward branches taken
  - Predict forward branches not taken

Dynamic branch prediction:
- Hardware measures actual branch behavior
  - e.g., record recent history of each branch
- Assume future behavior will continue the trend
  - When wrong, stall while re-fetching, and update history
Static Branch Prediction

Resolve branch hazards by assuming a given outcome and proceeding without waiting to see the actual branch outcome.

1. Predict not taken – always predict branches will not be taken; continue to fetch from the sequential instruction stream; only when a branch is taken does the pipeline stall
- If taken, flush the instructions after the branch (earlier in the pipeline):
  - in the IF, ID, and EX stages if the branch logic is in MEM – three stalls
  - in the IF and ID stages if the branch logic is in EX – two stalls
  - in the IF stage if the branch logic is in ID – one stall
- Ensure that those flushed instructions haven't changed the machine state – automatic in the MIPS pipeline, since machine-state-changing operations are at the tail end of the pipeline (MemWrite in MEM or RegWrite in WB)
- Restart the pipeline at the branch destination
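The CPI impact of predict-not-taken is simple arithmetic: a base CPI of 1 plus the flush penalty paid on each taken branch. A sketch with assumed frequencies (20% branches, 60% of them taken – illustrative numbers, not measurements from the course):

```python
def cpi(branch_freq, taken_frac, penalty):
    """Base CPI of 1 plus stall cycles lost to taken (mispredicted) branches."""
    return 1.0 + branch_freq * taken_frac * penalty

# Penalty depends on the stage where branches resolve (see the list above)
for stage, penalty in [('MEM', 3), ('EX', 2), ('ID', 1)]:
    print(stage, round(cpi(0.20, 0.60, penalty), 2))  # 1.36, 1.24, 1.12
```

This is why moving the branch decision to ID matters: under these assumed frequencies it cuts the branch overhead from 0.36 extra cycles per instruction to 0.12.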
Flushing with Misprediction (Not Taken)

[Pipeline diagram: beq $1,$2,2 at address 4 is predicted not taken; sub $4,$1,$5 at address 8 is fetched and then flushed when the branch resolves taken; execution resumes with and $6,$1,$7 at address 16 and or $8,$1,$9 at address 20.]

To flush the IF stage instruction, assert IF.Flush to zero the instruction field of the IF/ID pipeline register (transforming it into a noop).
Branching Structures

Predict not taken works well for "top of the loop" branching structures:

    Loop: beq $1,$2,Out
          1st loop instr
          ...
          last loop instr
          j Loop
    Out:  fall out instr

But such loops have jumps at the bottom of the loop to return to the top of the loop – and incur the jump stall overhead.

Predict not taken doesn't work well for "bottom of the loop" branching structures:

    Loop: 1st loop instr
          2nd loop instr
          ...
          last loop instr
          bne $1,$2,Loop
          fall out instr
Static Branch Prediction, con’t

Resolve branch hazards by assuming a given outcome and proceeding.

2. Predict taken – predict branches will always be taken
- Predict taken always incurs one stall cycle (if the branch destination hardware has been moved to the ID stage)
- Is there a way to "cache" the address of the branch target instruction?

As the branch penalty increases (for deeper pipelines), a simple static prediction scheme will hurt performance. With more hardware, it is possible to try to predict branch behavior dynamically during program execution.

3. Dynamic branch prediction – predict branches at run-time using run-time information
Dynamic Branch Prediction

A branch prediction buffer (aka branch history table (BHT)) in the IF stage, addressed by the lower bits of the PC, contains bit(s) passed to the ID stage through the IF/ID pipeline register that tell whether the branch was taken the last time it was executed.
- A prediction bit may predict incorrectly (it may be a wrong prediction for this branch this iteration, or it may come from a different branch with the same low-order PC bits), but that doesn't affect correctness, just performance
  - The branch decision occurs in the ID stage after determining that the fetched instruction is a branch and checking the prediction bit(s)
- If the prediction is wrong, flush the incorrect instruction(s) in the pipeline, restart the pipeline with the right instruction, and invert the prediction bit(s)
  - A 4096-bit BHT varies from 1% misprediction (nasa7, tomcatv) to 18% (eqntott)
Branch Target Buffer

The BHT predicts when a branch is taken, but does not tell where it's taken to!
- A branch target buffer (BTB) in the IF stage caches the branch target address, but we also need to fetch the next sequential instruction. The prediction bit in IF/ID selects which "next" instruction will be loaded into IF/ID at the next clock edge
  - Would need a two-read-port instruction memory
- Or the BTB can cache the branch-taken instruction while the instruction memory is fetching the next sequential instruction

If the prediction is correct, stalls can be avoided no matter which direction branches go.
1-bit Prediction Accuracy

A 1-bit predictor will be incorrect twice when not taken. Assume predict_bit = 0 to start (indicating branch not taken) and loop control at the bottom of the loop code:

    Loop: 1st loop instr
          2nd loop instr
          ...
          last loop instr
          bne $1,$2,Loop
          fall out instr

1. First time through the loop, the predictor mispredicts the branch since the branch is taken back to the top of the loop; invert the prediction bit (predict_bit = 1)
2. As long as the branch is taken (looping), the prediction is correct
3. Exiting the loop, the predictor again mispredicts the branch since this time the branch is not taken, falling out of the loop; invert the prediction bit (predict_bit = 0)

For 10 times through the loop we have an 80% prediction accuracy for a branch that is taken 90% of the time.
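The 80% figure can be reproduced with a few lines of simulation. A minimal sketch of a 1-bit predictor run on a loop branch that is taken 9 times and then falls out:

```python
def one_bit_accuracy(outcomes):
    """Simulate a 1-bit predictor; returns the fraction of correct predictions."""
    predict, correct = False, 0          # start predicting not taken
    for taken in outcomes:
        correct += (predict == taken)
        predict = taken                  # a miss flips the prediction bit
    return correct / len(outcomes)

outcomes = [True] * 9 + [False]          # 10 trips: taken 90% of the time
print(one_bit_accuracy(outcomes))        # 0.8 (wrong on first and last trip)
```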
2-bit Predictors

A 2-bit scheme can give 90% accuracy, since a prediction must be wrong twice before the prediction is changed.

[FSM diagram: states 11 and 10 Predict Taken; states 01 and 00 Predict Not Taken. A taken branch moves the counter toward 11, a not-taken branch toward 00; the BHT also stores the initial FSM state.]

    Loop: 1st loop instr
          2nd loop instr
          ...
          last loop instr
          bne $1,$2,Loop
          fall out instr

For the loop above: right 9 times per pass, wrong on the loop fall-out, and right again on the 1st iteration of the next pass.
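The 2-bit saturating counter can be simulated the same way; starting in the strongly-taken state, it is wrong only on the loop fall-out, giving 90% on the same outcome stream:

```python
def two_bit_accuracy(outcomes, state=3):
    """2-bit saturating counter: states 0,1 predict not taken; 2,3 predict taken."""
    correct = 0
    for taken in outcomes:
        correct += ((state >= 2) == taken)
        # saturate: taken moves toward 3, not taken toward 0
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct / len(outcomes)

outcomes = [True] * 9 + [False]
print(two_bit_accuracy(outcomes))  # 0.9: right 9 times, wrong only on fall-out
```

After the fall-out the counter drops only to the weakly-taken state, so it still predicts taken on the first iteration of the next pass – the single misprediction per loop exit that the slide describes.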
Summary

All modern-day processors use pipelining for performance (a CPI of 1 and a fast CC).
- Pipeline clock rate is limited by the slowest pipeline stage – so designing a balanced pipeline is important
- Must detect and resolve hazards
  - Structural hazards – resolved by designing the pipeline correctly
  - Data hazards
    - Stall (impacts CPI)
    - Forward (requires hardware support)
  - Control hazards – put the branch decision hardware in as early a stage in the pipeline as possible
    - Stall (impacts CPI)
    - Delay decision (requires compiler support)
    - Static and dynamic prediction (requires hardware support)
- Pipelining complicates exception handling