Chapter 4

This document summarizes the microarchitecture of a MIPS CPU. It describes the execution stages of instruction fetch, decode, execute, memory access, and write back. It explains the datapath components including instruction memory, registers file, ALU, and data memory. It shows how the controller extracts bits from instructions to control the multiplexers and functional units. Finally, it provides examples of identifying values for functional units when executing sample instructions.
COMPUTER ARCHITECTURE

Chapter 4: Microarchitecture

Computer Architecture – CSE – HCMIU 1


Introduction
• CPU performance factors
– Instruction count
• Determined by ISA and compiler
– CPI and Cycle time
• Determined by CPU hardware
• We will examine two MIPS implementations
– A simplified version: CPI = 1
– A more realistic pipelined version: CPI ≈ 1
• Simple subset, shows most aspects
– Memory reference: lw, sw
– Arithmetic/logical: add, sub, and, or, slt
– Control transfer: beq, j
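A minimal sketch of how these factors combine, assuming the standard relation CPU time = instruction count × CPI × cycle time. The function name `cpu_time_ps`, the 1,000-instruction workload, and the two clock periods are hypothetical values chosen only for illustration.

```python
def cpu_time_ps(instruction_count, cpi, cycle_time_ps):
    """CPU time in picoseconds = instruction count x CPI x cycle time."""
    return instruction_count * cpi * cycle_time_ps


# Hypothetical workload: the same 1,000 instructions with CPI = 1 in both
# cases, but two different (illustrative) clock periods.
slow_clock = cpu_time_ps(1_000, cpi=1, cycle_time_ps=800)
fast_clock = cpu_time_ps(1_000, cpi=1, cycle_time_ps=200)
print(slow_clock, fast_clock, slow_clock / fast_clock)
```

With instruction count and CPI held fixed, execution time scales directly with the cycle time.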
Computer Architecture (c) Cuong Pham-Quoc@HCMUT 2
Instruction execution

1. PC → instruction memory (cache), fetch instruction


2. Register numbers → registers file, read registers
3. Depending on instruction class
3.1. Use ALU to calculate
• Arithmetic result → done
• Memory address for load/store → 3.2
• Branch target address → 3.3
3.2. Access data memory for load/store
3.3. PC ← target address or PC + 4



Execution stages

1. Instruction fetch (IF): PC → instruction address


2. Instruction decode (ID): register operands → register file
3. Execute (EXE):
– Load/store: compute a memory address
– Arithmetic/logical: compute an arithmetic/logical result
4. Memory access (MEM):
– Load: read data memory
– Store: write data memory
5. Write back (WB):
– Store a result into the register file
Datapath - controller

• Datapath: contains information that is operated on by a functional unit
– Instruction memory: contains instructions (IF)
– Registers file: 32 32-bit registers (ID & WB)
– ALU: calculates arithmetic and logical operations (EXE)
– Data memory: contains data (MEM)
• Control signals: used for multiplexer selection or for directing the operation of a functional unit
– Control unit
– Multiplexer
Datapath overview



Multiplexer - MUX

S=0→C=A
S=1→C=B
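
The selection rule above can be sketched in a few lines. This is only a behavioral model; the function name `mux2` is hypothetical and not part of the slides.

```python
def mux2(a, b, s):
    """2-to-1 multiplexer: the output C equals A when S = 0 and B when S = 1."""
    if s not in (0, 1):
        raise ValueError("select signal must be 0 or 1")
    return b if s == 1 else a


print(mux2(10, 20, 0))  # select input A
print(mux2(10, 20, 1))  # select input B
```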



Controller overview



Building datapath

• Hardware components:
[Figure: hardware components labeled by stage: IF, ID, ID & WB, EXE, MEM]
Instruction fetch

• Main operations: fetching the next instruction from Instruction memory
– PC → instruction address
– Instruction memory → instruction (32 bits)
– PC ← PC + 4 (using the Add component)
• Results:
– 32-bit machine instruction (Instruction)
– Address of the next instruction in PC
Instruction decode
• Main operations: Extract machine instructions
– R-format, branch, and store: rs, rt → Registers → values
– I-format: rs → Registers → values & immediate → sign-extend
• Results:
– Values (32-bit) for the next stage


Instruction execution

• Main operation
– Calculate the arithmetic operations/address of memory/register comparison
• R-format and bne&beq: both operands collected from Registers
• I-format (except bne&beq): one operand collected from Registers and one from sign-extend
• Results:
– Arithmetic result/memory address (ALU-result)
– Comparison result (zero-output)


Memory access

• Main operations
– Load: get data from Data memory
– Store: write data from Registers to Data memory
• Results:
– Load: values of Data memory (Read data)
– Store: N/A


Write back

• Main operations:
– Write values back to Registers (arithmetic/load)
• Result:
– N/A



Full datapath
Assume that the processor is executing lw $t0, 100($t1) (stored at address 100) and that each register stores a value equal to 2 times its number (e.g., $s0 stores 2 × 16 = 32). Identify the values of the inputs/outputs of the main functional units.


Example
• Question: Assume that the processor is executing
add $s0, $s1, $s2
– Identify values of functional units’ inputs/outputs
• Answer:
– The machine code is: 000000_10001_10010_10000_00000_100000₂
– Instruction memory:
• Instruction address = PC (word address where we store the above instruction)
• Instruction = machine code
– Registers:
• Read register 1 = 10001₂ => Read data 1 = content ($s1)
• Read register 2 = 10010₂ => Read data 2 = content ($s2)
• Write register = 10000₂ & Write data = content ($s1) + content ($s2)
– …sign-extend: input = 10000_00000_100000
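
The machine code in this answer can be reproduced with a short sketch. It assumes the standard MIPS R-format layout (opcode, rs, rt, rd, shamt, funct) and register numbers $s0 = 16, $s1 = 17, $s2 = 18; the helper names `encode_r_type` and `show_fields` are hypothetical.

```python
def encode_r_type(rs, rt, rd, shamt, funct, opcode=0):
    """Pack the six R-format fields into one 32-bit word."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def show_fields(word):
    """Format a 32-bit word as opcode_rs_rt_rd_shamt_funct in binary."""
    bits = f"{word:032b}"
    cuts = [0, 6, 11, 16, 21, 26, 32]
    return "_".join(bits[a:b] for a, b in zip(cuts, cuts[1:]))


# add $s0, $s1, $s2: rd = $s0 (16), rs = $s1 (17), rt = $s2 (18), funct = 32
word = encode_r_type(rs=17, rt=18, rd=16, shamt=0, funct=0b100000)
print(show_fields(word))  # 000000_10001_10010_10000_00000_100000
print(hex(word))          # 0x2328020
```

The low 16 bits of the word (rd, shamt, funct) are what reach the sign-extend unit, even though an R-format instruction does not use that result.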
Building controller

• Extracting bits from 32-bit instructions


– Read register 1, Read register 2, and Write registers
– Sign-extend
– Control block
• Building a Control block:
– Handling multiplexers
– Handling control signals of functional units
• RegWrite
• MemRead
• MemWrite
• …



Extracting bits

• Registers:
– Read register 1 ⇐ rs (instruction[25:21])
– Read register 2 ⇐ rt (instruction[20:16])
– Write register ⇐ rt/rd → need a multiplexer
• Sign-extend ⇐ address (instruction[15:0])

Format       31:26      25:21   20:16   15:11   10:6    5:0
R-type       0          rs      rt      rd      shamt   funct
Load/Store   35 or 43   rs      rt      address (15:0)
Branch       4          rs      rt      address (15:0)
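
A minimal sketch of the bit extraction described above, using the bit ranges from the format table. The function names `decode` and `sign_extend16` are hypothetical, and the sample word is the encoding of lw $t0, 100($t1) under the standard MIPS register numbering ($t0 = 8, $t1 = 9).

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def decode(instr):
    """Split a 32-bit instruction word into its fields."""
    return {
        "opcode": (instr >> 26) & 0x3F,   # instruction[31:26]
        "rs": (instr >> 21) & 0x1F,       # instruction[25:21]
        "rt": (instr >> 16) & 0x1F,       # instruction[20:16]
        "rd": (instr >> 11) & 0x1F,       # instruction[15:11]
        "shamt": (instr >> 6) & 0x1F,     # instruction[10:6]
        "funct": instr & 0x3F,            # instruction[5:0]
        "imm": sign_extend16(instr),      # instruction[15:0], sign-extended
    }


fields = decode(0x8D280064)  # lw $t0, 100($t1)
print(fields["opcode"], fields["rs"], fields["rt"], fields["imm"])
```

All fields are extracted unconditionally, as in the hardware; the control signals decide later which of them are actually used.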



Datapath with bit-selection



Control block

• Main function: handling multiplexers and functional units’ control signals
– ALU: two levels of decoding
• Level 1: 6 bit opcode → 2 bit ALUop
• Level 2: 2 bit ALUop (+ 6 bit function field) → 4 bit ALU operation
– Muxes and control signals of functional units (except ALU): 6 bit opcode
• Predefined ALU operation values

Operator   ALU Operation
and        0000
or         0001
add        0010
sub        0110
slt        0111

• ALUop:
– opcode → add operator → ALUop = 00
– opcode → sub operator → ALUop = 01
– opcode → unknown → ALUop = 10
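
The second decoding level can be sketched as a small lookup. The ALUop and ALU operation values are the ones listed above; the funct codes (add = 100000, sub = 100010, and = 100100, or = 100101, slt = 101010) are the standard MIPS encodings, and the names `alu_control` and `FUNCT_TO_ALU_OPERATION` are hypothetical.

```python
# funct field (instruction[5:0]) -> 4-bit ALU operation, used when ALUop = 10
FUNCT_TO_ALU_OPERATION = {
    0b100000: 0b0010,  # add
    0b100010: 0b0110,  # sub
    0b100100: 0b0000,  # and
    0b100101: 0b0001,  # or
    0b101010: 0b0111,  # slt
}


def alu_control(alu_op, funct=0):
    """Level-2 decoding: 2-bit ALUop (+ 6-bit funct) -> 4-bit ALU operation."""
    if alu_op == 0b00:   # opcode already implies an addition (e.g. lw, sw)
        return 0b0010
    if alu_op == 0b01:   # opcode already implies a subtraction (e.g. beq)
        return 0b0110
    if alu_op == 0b10:   # operator unknown from the opcode: consult funct
        return FUNCT_TO_ALU_OPERATION[funct]
    raise ValueError("unsupported ALUop")


print(f"{alu_control(0b00):04b}")            # lw/sw address calculation
print(f"{alu_control(0b10, 0b101010):04b}")  # slt
```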
Full microarchitecture
[Figure: single-cycle datapath with the Control unit (input: instruction[31:26]) and ALU Control (inputs: 2-bit ALUOp and instruction[5:0]; output: 4-bit ALU Operation). Labeled control signals: PCSrc, Branch, RegWrite, RegDst, ALUSrc, MemRead, MemWrite, MemtoReg. Labeled components: PC, two Add units, Shift left 2, Instruction memory, Registers, Sign extend (16 bit → 32 bit), ALU (zero, ALU result), Data memory, and multiplexers.]


Exercise
• Question: assume that registers store values of two times their numbers, for
instance $s1 stores values of 17 × 2 = 34, please identify:
1. Which functional units contribute to the processing of following instructions
2. Values of inputs/outputs of functional units when processing following
instructions
3. Values of control signals when processing following instructions
– add $t0, $t1, $t2
– addi $s0, $s1, 100
– lw $s0, 100($s2) # memory word at address 136 stores values of 2021
– sw $s0, 100($s2)
– beq $s1, $s0, L1



Unconditional jump instructions

• Chapter 2: update PC with concatenation of
– Top 4 bits of old PC
– 26-bit jump address
– 00
PC = {PC[31:28], address, 00}
• One more way to update PC => Need a multiplexer & an extra control signal decoded from opcode
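
The concatenation PC = {PC[31:28], address, 00} can be written with masks and shifts. This is a sketch only; the function name `jump_target` and the sample values are hypothetical.

```python
def jump_target(pc, address26):
    """Build the jump target: top 4 bits of the PC, the 26-bit address, then 00."""
    top4 = pc & 0xF0000000                      # PC[31:28]
    shifted = (address26 & 0x03FFFFFF) << 2     # 26-bit address followed by 00
    return top4 | shifted


# Hypothetical example: PC = 0x40000010, 26-bit jump address = 0x100
print(hex(jump_target(0x40000010, 0x100)))  # 0x40000400
```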



Datapath with Jumps added



Performance issue

• Simplified version (CPI = 1): every instruction executed in only one cycle
– Longest delay determines clock period
– What is the longest instruction?
• Instruction memory → register file → ALU → data memory
→ register file
• Not feasible to vary period for different instructions
• We will improve performance by pipelining



Pipelining analogy

• Pipeline laundry:
– Overlapping execution
– Improving performance
(time for entire group)
• Four loads
– Speed-up = 2.3×
– Not impressive
• Non-stop (#loads → ∞)
– Speed-up?
– Number of stages
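
The 2.3× figure can be reproduced with a short calculation, assuming each load passes through four equal-length stages (the stage count is an assumption taken from the classic laundry analogy, not stated on the slide). The function name `pipeline_speedup` is hypothetical.

```python
def pipeline_speedup(num_loads, num_stages):
    """Speed-up of an ideal pipeline with equal-length stages."""
    sequential_steps = num_loads * num_stages        # one load at a time
    pipelined_steps = num_stages + (num_loads - 1)   # loads overlap
    return sequential_steps / pipelined_steps


print(round(pipeline_speedup(4, 4), 1))          # four loads
print(round(pipeline_speedup(1_000_000, 4), 3))  # many loads: approaches 4
```

As the number of loads grows, the speed-up approaches the number of stages.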
MIPS pipeline

• Five stages, one step per stage


1. Instruction fetch (IF): PC → instruction address
2. Instruction decode (ID): register operands → register file
3. Execute (EXE):
• Load/store: compute a memory address
• Arithmetic/logical: compute an arithmetic/logical result
4. Memory access (MEM):
• Load: read data memory
• Store: write data memory
5. Write back (WB):
• Store a result into the register file
Pipeline performance

• Assume time for stages is


– 100ps for register read or write (ID & WB)
– 200ps for other stages (IF, EXE, & MEM)
• Compare pipelined datapath with single-cycle datapath

Instr      Instruction fetch   Register read   ALU op   Memory access   Register write   Total time
lw         200ps               100ps           200ps    200ps           100ps            800ps
sw         200ps               100ps           200ps    200ps                            700ps
R-format   200ps               100ps           200ps                    100ps            600ps
beq        200ps               100ps           200ps                                     500ps
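
A small sketch that recomputes the totals in the table and derives both clock periods from them. The dictionary names are hypothetical; the latencies are the ones assumed above.

```python
# Stage latencies in picoseconds, as assumed above.
STAGE_PS = {"IF": 200, "ID": 100, "EXE": 200, "MEM": 200, "WB": 100}

# Stages each instruction class actually needs.
STAGES_USED = {
    "lw": ["IF", "ID", "EXE", "MEM", "WB"],
    "sw": ["IF", "ID", "EXE", "MEM"],
    "R-format": ["IF", "ID", "EXE", "WB"],
    "beq": ["IF", "ID", "EXE"],
}

totals = {name: sum(STAGE_PS[s] for s in stages) for name, stages in STAGES_USED.items()}

# Single-cycle: the clock must fit the slowest instruction.
single_cycle_period = max(totals.values())
# Pipelined: the clock must fit the slowest stage.
pipelined_period = max(STAGE_PS.values())

print(totals)
print(single_cycle_period, pipelined_period)
```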


Pipeline performance (cont.)
[Figure: execution timelines, single-cycle (Tc = 800ps) versus pipelined (Tp = 200ps)]


Pipeline speed-up

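• If all stages are balanced:
– speed-up = number of pipe stages

$$\text{speed-up} = \frac{\text{time}_{\text{single-cycle}}}{\text{time}_{\text{pipelined}}} = \frac{\text{time between instructions}_{\text{single-cycle}}}{\text{time between instructions}_{\text{pipelined}}}$$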
• If not balanced, speedup is less


• Source of speedup
– Throughput increased
– Latency (time for each instruction) does not decrease
• Sometimes increased
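
A sketch of these points using the numbers from the pipeline performance slides (800ps single-cycle period, 200ps pipelined period, five stages). It assumes an ideal pipeline with no stalls; the function names are hypothetical.

```python
def single_cycle_time_ps(n, period_ps=800):
    """Time to run n instructions when each takes one long cycle."""
    return n * period_ps


def pipelined_time_ps(n, stages=5, period_ps=200):
    """Time to run n instructions in an ideal pipeline (no stalls)."""
    return (stages + n - 1) * period_ps


def speedup(n):
    return single_cycle_time_ps(n) / pipelined_time_ps(n)


print(pipelined_time_ps(1))          # latency of one instruction: 1000ps, above 800ps
print(round(speedup(1_000_000), 3))  # approaches 800/200 = 4, not the 5 stages
```

Because the stages are not balanced (100ps versus 200ps), the speed-up is bounded by 4 rather than by the number of stages, and the latency of a single instruction grows from 800ps to 1000ps.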
Pipeline datapath



Instructions execution

Multi-cycle pipeline diagram

How can we keep data when stages are not balanced?
Wholesale market example

• Trucks move forward when completed
– Accident at the Apple store
– Barriers can help: open every hour (cycle)

weight:   100kg        1ton     1ton     100kg
time:     15 minutes   1 hour   1 hour   15 minutes
Pipeline registers
Barriers in wholesale markets ⇔ Registers in digi

[Figure: pipelined datapath with the pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB placed between the stages. Labeled signals: PCSrc, RegDst, RegWrite, ALUSrc, ALUOp, MemRead, MemWrite, MemtoReg.]


Example

• Question: Given the following MIPS sequence:


1. lw $s0, 20($s1)
2. sub $t2, $s2, $s3
3. add $t3, $s3, $s4
4. lw $t4, 24($s1)
5. add $t5, $s5, $s6
Assume that the sequence is executed by a 5-stage pipelined
MIPS processor
a) Draw a multi-cycle pipeline diagram for the sequence
b) Analyze the 5th cycle with the datapath diagram in the
previous slide
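
A minimal sketch for both parts of the question, assuming an ideal 5-stage pipeline in which a new instruction enters every cycle (this sequence has no data dependencies between its instructions, so no stalls are modeled). The names `stage_at`, `diagram`, and `snapshot` are hypothetical.

```python
STAGES = ["IF", "ID", "EXE", "MEM", "WB"]

PROGRAM = [
    "lw $s0, 20($s1)",
    "sub $t2, $s2, $s3",
    "add $t3, $s3, $s4",
    "lw $t4, 24($s1)",
    "add $t5, $s5, $s6",
]


def stage_at(index, cycle):
    """Stage occupied in `cycle` (1-based) by instruction `index` (0-based), or None."""
    offset = cycle - 1 - index
    return STAGES[offset] if 0 <= offset < len(STAGES) else None


def diagram(program):
    """One text row per instruction, one column per clock cycle."""
    total_cycles = len(program) + len(STAGES) - 1
    rows = []
    for index, text in enumerate(program):
        cells = [(stage_at(index, c) or "").ljust(4) for c in range(1, total_cycles + 1)]
        rows.append(f"{text:<20}" + " ".join(cells).rstrip())
    return rows


def snapshot(program, cycle):
    """(instruction, stage) pairs for every instruction active in `cycle`."""
    pairs = [(text, stage_at(i, cycle)) for i, text in enumerate(program)]
    return [(text, stage) for text, stage in pairs if stage is not None]


print("\n".join(diagram(PROGRAM)))
print(snapshot(PROGRAM, 5))
```

In cycle 5 every stage is occupied, which is why that cycle is the natural one to analyze against the datapath.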
Multi-cycle pipeline diagram



Single-cycle pipeline diagram
[Figure: pipelined datapath during one cycle, with the instructions shown above the stages from left to right: add $t5, $s5, $s6; lw $t4, 24($s1); add $t3, $s3, $s4; sub $t2, $s2, $s3; lw $s0, 20($s1)]
Anything wrong?


Corrected pipeline datapath
[Figure: corrected pipelined datapath with pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB; the RegDst multiplexer selecting between instruction[20:16] and instruction[15:11] is drawn after the ID/EX register, and its output feeds the Write register input through the later pipeline registers]


Example of lw - IF
[Figure: corrected pipelined datapath with lw in the IF stage]


Example of lw - ID
[Figure: corrected pipelined datapath with lw in the ID stage]


Example of lw - EXE
[Figure: corrected pipelined datapath with lw in the EXE stage]


Example of lw - MEM
[Figure: corrected pipelined datapath with lw in the MEM stage]


Example of lw - WB
[Figure: corrected pipelined datapath with lw in the WB stage]


Control signals

• Control signals travel across pipeline registers

[Figure: the Control block produces three groups of signals, EX, M, and WB. ID/EX carries EX, M, and WB; EX/MEM carries M and WB; MEM/WB carries WB. Signals shown: RegDst, ALUOp (2 bit), ALUSrc, Branch, MemRead, MemWrite, RegWrite, MemtoReg.]


Exercise

• Question: Given the following MIPS sequence:


lw $s0, 20($s1)
sub $t2, $s2, $s3
add $t3, $s3, $s4
lw $t4, 24($s1)
add $t5, $s5, $s6
Assume that the sequence is executed by a 5-stage pipelined
MIPS processor
a) Identify values of control signals in cycle 5 at the functional
units
b) Identify values of control signals in cycle 5 at the Control block



Full pipeline micro-architecture
[Figure: pipelined datapath with the Control block (input: instruction[31:26]) whose EX, M, and WB signal groups travel through ID/EX, EX/MEM, and MEM/WB. Labeled signals: PCSrc, Branch, RegWrite, RegDst, ALUSrc, ALUOp, MemRead, MemWrite, MemtoReg.]


Hazards

• Situations that prevent starting the next instruction in the next cycle
• Structure hazard
– A required resource is busy
• Data hazard
– Need to wait for previous instruction to complete its data
read/write
• Control hazard
– Deciding on control action depends on previous instruction



Structure hazards

• Conflict for use of a resource
– e.g., in the laundry process, B forgot to move clothes from the washing machine to the dryer
• Should be eliminated entirely
– Stall the pipe for that cycle
– May repeat many times
• MIPS processors already solved all structure hazards
– Separate instruction and data memories (caches)
– Register reads and writes use different ports


Data hazards

• An instruction depends on completion of data access by a previous instruction
sub $t0, $t2, $t3
add $t3, $t0, $t1
• No issue in the simplified (single-cycle) version
• A problem in the pipelined version


Control hazards

• Branch determines flow of control


– Fetching next instruction depends on branch outcome
• Pipeline updates PC in the MEM stage
– Still working on ID stage of branch



Data hazards solutions
1. Code rescheduling
– Done by compiler (software level)
– Sometimes cannot find solutions
2. Delay or stalls insertion
– Done by hardware
– Always can find solutions
– Increase execution time & need extra hardware resources
3. Forwarding
– Done by hardware
– Cannot find solutions in a special case
– Requires extra hardware resources
Code re-scheduling

• When do data hazards not occur?
– When the read happens after, or in the same cycle as, the write (RAW)
• Find and swap data-independent instructions
Example
• Question: given the following sequence of MIPS instructions
Original sequence            After re-scheduling
1: lw $t1, 0($t0)            1: lw $t1, 0($t0)
2: lw $t2, 4($t0)            2: lw $t2, 4($t0)
3: add $t3, $t1, $t2         5: lw $t4, 8($t0)
4: sw $t5, 12($t0)           7: addi $t6, $t0, 4
5: lw $t4, 8($t0)            3: add $t3, $t1, $t2
6: add $t5, $t1, $t4         4: sw $t5, 12($t0)
7: addi $t6, $t0, 4          6: add $t5, $t1, $t4

– Identify data hazards and solve by code re-scheduling


• Answer:
– The third and the sixth instructions (add $t3, $t1, $t2 and add $t5, $t1, $t4)
have data hazards
– Move two data-independent instructions 5 and 7 to right after the second
instruction
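
The answer can be checked mechanically. The sketch below assumes no forwarding and a register file that can be written and read in the same cycle, so a hazard exists only when a source register was written by one of the two immediately preceding instructions. The tuple layout (label, destination, sources) and the name `raw_hazards` are hypothetical.

```python
def raw_hazards(program, window=2):
    """Labels of instructions that read a register written by one of the
    `window` preceding instructions (no forwarding assumed)."""
    hazards = []
    for j, (label, _dest, sources) in enumerate(program):
        recent = program[max(0, j - window):j]
        if any(dest is not None and dest in sources for _, dest, _ in recent):
            hazards.append(label)
    return hazards


ORIGINAL = [
    (1, "$t1", ("$t0",)),        # lw  $t1, 0($t0)
    (2, "$t2", ("$t0",)),        # lw  $t2, 4($t0)
    (3, "$t3", ("$t1", "$t2")),  # add $t3, $t1, $t2
    (4, None, ("$t5", "$t0")),   # sw  $t5, 12($t0) writes no register
    (5, "$t4", ("$t0",)),        # lw  $t4, 8($t0)
    (6, "$t5", ("$t1", "$t4")),  # add $t5, $t1, $t4
    (7, "$t6", ("$t0",)),        # addi $t6, $t0, 4
]

# Same instructions in the re-scheduled order 1, 2, 5, 7, 3, 4, 6.
by_label = {entry[0]: entry for entry in ORIGINAL}
RESCHEDULED = [by_label[n] for n in (1, 2, 5, 7, 3, 4, 6)]

print(raw_hazards(ORIGINAL))     # [3, 6]
print(raw_hazards(RESCHEDULED))  # []
```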



Stalls insertion

• When do data hazards not occur?
– When the read happens after, or in the same cycle as, the write (RAW)
• Delay instructions that use data:
– ID stage of the instruction using data ≡ WB stage of the instruction producing data
Example

• Questions: given the following MIPS sequence of instructions


1: lw $t1, 0($t0)
2: lw $t2, 4($t0)
3: add $t3, $t1, $t2
4: sw $t3, 12($t0)

– Identify data hazards and solve them by the stalls insertion method;
how many cycles needed for the sequence?
• Answer:
– Data hazards: (2) - (3) & (3) - (4)
– Insert two stalls for the 3rd instruction & two stalls for the 4th instruction
– 12 cycles needed
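
The cycle count can be reproduced with a small scheduling sketch. It assumes no forwarding, and that an instruction may decode (ID) in the same cycle in which its producer writes back (WB); each instruction is described by a hypothetical (destination, sources) tuple, and the function expects a non-empty program.

```python
def schedule_without_forwarding(program):
    """Return (ID cycle of each instruction, total cycles) for a 5-stage pipeline."""
    id_cycles = []
    wb_cycle_of = {}  # register -> WB cycle of its most recent producer
    for dest, sources in program:
        # Without stalls, ID comes one cycle after the previous instruction's ID.
        earliest = id_cycles[-1] + 1 if id_cycles else 2
        # ID may not happen before every source register has been written back.
        ready = max((wb_cycle_of.get(reg, 0) for reg in sources), default=0)
        id_cycle = max(earliest, ready)
        id_cycles.append(id_cycle)
        if dest is not None:
            wb_cycle_of[dest] = id_cycle + 3  # ID -> EXE -> MEM -> WB
    return id_cycles, id_cycles[-1] + 3


PROGRAM = [
    ("$t1", ("$t0",)),        # lw  $t1, 0($t0)
    ("$t2", ("$t0",)),        # lw  $t2, 4($t0)
    ("$t3", ("$t1", "$t2")),  # add $t3, $t1, $t2
    (None, ("$t3", "$t0")),   # sw  $t3, 12($t0)
]

id_cycles, total = schedule_without_forwarding(PROGRAM)
stalls = total - (len(PROGRAM) + 4)  # an ideal pipeline needs n + 4 cycles
print(id_cycles, total, stalls)
```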
Example (cont.)

[Figure: multi-cycle pipeline diagram over CK1 to CK12, each instruction passing through IM, REG, ALU, DM, REG. Two stall cycles are inserted for add $t3, $t1, $t2 and two more for sw $t3, 12($t0).]


Forwarding

• When is data available for use at the earliest?
– Right after the data is produced (EXE stage & MEM stage)
• When is data processed?
– Apply operators: EXE stage
• Use data right after it is created
Example

• Question: given the following MIPS sequence


1: sub $s2, $s1, $s3
2: and $s7, $s2, $s5
3: or $s8, $s6, $s2
4: add $s0, $s2, $s2
5: sw $s5, 100($s2)

– Analyze data dependencies & identify data hazards


• Answer:
– $s2 produced by the 1st instruction is used by all other instructions
– Data hazards: 2nd & 3rd instructions
Example (cont.)

• Answer (cont.):



Forwarding

• Consider the previous sequence of MIPS instructions


– Data can be forwarded
                   CC1  CC2  CC3  CC4  CC5  CC6  CC7  CC8  CC9
sub $s2,$s1,$s3    IM   REG  ALU  DM   REG
and $s7,$s2,$s5         IM   REG  ALU  DM   REG
or $s8,$s6,$s2               IM   REG  ALU  DM   REG
add $s0,$s2,$s2                   IM   REG  ALU  DM   REG
sw $s5,100($s2)                        IM   REG  ALU  DM   REG


Load-use data hazards

• When the instruction producing data is a load, can data be forwarded?
– It cannot be forwarded to the immediately following instruction, since the data is produced later
– Delay:
• 1 stall with forwarding or 2 stalls without forwarding
Detecting data hazards

• EXE hazard versus MEM hazard


– EXE hazard: forward from the EX/MEM register
• Destination register in the MEM stage ≡ one of the source
registers of the instruction in EXE
– MEM hazard: forward from the MEM/WB register
• Destination register in the WB stage ≡ one of the source
registers of the instruction in EXE
• Passing register numbers along the pipe
– ID/EX.rs & ID/EX.rt: first & second sources
– EX/MEM.rd; MEM/WB.rd: destinations
EX hazard conditions

1a. ID/EX.rs = EX/MEM.rd
1b. ID/EX.rt = EX/MEM.rd (does not occur when an I-format instruction is in the EXE stage)

[Figure: register numbers passed along the pipeline. Instruction[25:21] and instruction[20:16] are held in ID/EX as ID/EX.rs and ID/EX.rt; the RegDst multiplexer selects between instruction[20:16] (0) and instruction[15:11] (1), and the selected destination is carried as EX/MEM.rd and MEM/WB.rd.]


MEM hazard conditions

• Consider the following MIPS sequence of instructions


1: add $s0, $s0, $s1
2: add $s0, $s0, $s2
3: add $s0, $s0, $s3

• Both EX and MEM hazards seem to occur
– The EX hazard is the correct one to use: EX/MEM holds the most recent value
• MEM hazard conditions
2a. (ID/EX.rs = MEM/WB.rd) & (ID/EX.rs != EX/MEM.rd)
2b. (ID/EX.rt = MEM/WB.rd) & (ID/EX.rt != EX/MEM.rd)
(does not occur when an I-format instruction is in the EXE stage)
Datapath with forwarding
[Figure: pipelined datapath with a Forwarding unit. Inputs to the Forwarding unit: ID/EX.rs, ID/EX.rt, EX/MEM.rd, MEM/WB.rd, EX/MEM.RegWrite, MEM/WB.RegWrite. Outputs: F1 and F2, the select signals of the multiplexers in front of the two ALU operands (oprd 1 and oprd 2). The EX control group is annotated as (1) ALUSrc, (2) RegDst, (3) ALUop.]


Forwarding control values

Mux values (binary)   Source    Explanation
F1 = 00               ID/EX     The first ALU operand comes from the registers file
F1 = 10               EX/MEM    The first ALU operand is forwarded from the prior ALU result
F1 = 01               MEM/WB    The first ALU operand is forwarded from data memory or an earlier ALU result
F2 = 00               ID/EX     The second ALU operand comes from the registers file
F2 = 10               EX/MEM    The second ALU operand is forwarded from the prior ALU result
F2 = 01               MEM/WB    The second ALU operand is forwarded from data memory or an earlier ALU result


Forwarding conditions
• EX hazard
– 1a: if (EX/MEM.RegWrite and (EX/MEM.rd ≠ 0) and (EX/MEM.rd =
ID/EX.rs)) F1 = 10
– 1b: if (EX/MEM.RegWrite and (EX/MEM.rd ≠ 0) and (EX/MEM.rd =
ID/EX.rt)) F2 = 10
• MEM hazard
– 2a: if (MEM/WB.RegWrite and (MEM/WB.rd ≠ 0) and not (EX/
MEM.RegWrite and (EX/MEM.rd ≠ 0) and (EX/MEM.rd = ID/EX.rs))
and (MEM/WB.rd = ID/EX.rs)) F1 = 01
– 2b: if (MEM/WB.RegWrite and (MEM/WB.rd ≠ 0) and not (EX/
MEM.RegWrite and (EX/MEM.rd ≠ 0) and (EX/MEM.rd = ID/EX.rt))
and (MEM/WB.rd = ID/EX.rt)) F2 = 01
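
The four conditions can be sketched as one function that returns the two mux selects. This is a behavioral model only; the function name `forwarding_unit` and its parameter names are hypothetical. Checking the EX hazard first reproduces the "not (EX hazard)" clause of conditions 2a and 2b.

```python
def forwarding_unit(id_ex_rs, id_ex_rt,
                    ex_mem_reg_write, ex_mem_rd,
                    mem_wb_reg_write, mem_wb_rd):
    """Return (F1, F2) for the instruction currently in the EXE stage."""

    def select(source):
        ex_hazard = ex_mem_reg_write and ex_mem_rd != 0 and ex_mem_rd == source
        mem_hazard = mem_wb_reg_write and mem_wb_rd != 0 and mem_wb_rd == source
        if ex_hazard:    # conditions 1a / 1b: most recent result wins
            return 0b10
        if mem_hazard:   # conditions 2a / 2b: only when there is no EX hazard
            return 0b01
        return 0b00      # no forwarding: operand comes from the registers file

    return select(id_ex_rs), select(id_ex_rt)


# Third instruction of: add $s0,$s0,$s1 / add $s0,$s0,$s2 / add $s0,$s0,$s3
# In EXE: rs = $s0 (16), rt = $s3 (19); EX/MEM.rd = MEM/WB.rd = $s0 (16).
print(forwarding_unit(16, 19, True, 16, True, 16))  # (2, 0): F1 = 10, F2 = 00
```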
Detecting Load-use data hazards

• Check when the instruction that uses the data is decoded in the ID stage
– The instruction that produces the data (a load) is in the EXE stage
– The sooner the better, since a stall must be inserted
• Load-use hazard when
– ID/EX.MemRead and ((ID/EX.rt = IF/ID.rs) or (ID/EX.rt = IF/ID.rt))
• How to stall the pipeline?
– Force control signals in ID/EX register to 0 ⇒ EXE, MEM, and WB do nothing
– Prevent updating PC and IF/ID register
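
The detection condition above translates directly into a predicate. A minimal sketch; the function name `load_use_hazard` is hypothetical, and register numbers follow the standard MIPS numbering ($t1 = 9, $t2 = 10, $t3 = 11).

```python
def load_use_hazard(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """True when the instruction in ID needs the register that the load
    currently in EXE has not yet read from data memory."""
    return bool(id_ex_mem_read) and id_ex_rt in (if_id_rs, if_id_rt)


# lw $t2, 4($t0) is in EXE (rt = 10); add $t3, $t1, $t2 is in ID (rs = 9, rt = 10).
print(load_use_hazard(True, 10, 9, 10))   # True: stall one cycle
# Same registers, but the instruction in EXE is not a load.
print(load_use_hazard(False, 10, 9, 10))  # False: forwarding alone is enough
```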
[Figure: pipelined datapath with the Forwarding unit and a Hazard detection unit. Inputs to the Hazard detection unit: ID/EX.MemRead, ID/EX.rt, and IF/ID.rs & rt. Its LU-hazard output drives a Write signal and a multiplexer that selects between the Control unit outputs and 0 for the control signals entering ID/EX.]


Branch hazards solutions

• Predict outcome of branch
– Only stall if prediction is wrong

[Figure: multi-cycle pipeline diagram over CC1 to CC9 for the sequence below, each instruction passing through IM, REG, ALU, DM, REG]
beq $s1,$s2, L1
and $s1, $s2, $s3
add $t3, $s3, $s4
sub $s3, $s3, $s4
L1: lw $t4, 24($s1)


Branch prediction

• Static branch prediction


– Based on typical branch behavior
– Example: loop and if-statement branches
• Predict backward branches taken
• Predict forward branches not taken
• Dynamic branch prediction
– Hardware measures actual branch behavior
• e.g., record recent history of each branch
– Assume future behavior will continue the trend
• When wrong, stall while re-fetching, and update history
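
One common way to "record recent history of each branch" is a 2-bit saturating counter per branch address. The slides do not name a specific scheme, so the sketch below is an assumption chosen for illustration; the class name, the initial counter value, and the sample branch address are hypothetical.

```python
class TwoBitPredictor:
    """Per-branch 2-bit saturating counter: 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, initial=1):
        self.initial = initial
        self.counters = {}  # branch address -> counter in 0..3

    def predict(self, branch_pc):
        return self.counters.get(branch_pc, self.initial) >= 2

    def update(self, branch_pc, taken):
        counter = self.counters.get(branch_pc, self.initial)
        counter = min(3, counter + 1) if taken else max(0, counter - 1)
        self.counters[branch_pc] = counter


def count_mispredictions(outcomes, branch_pc=0x400):
    """Run one branch through the predictor and count wrong predictions."""
    predictor = TwoBitPredictor()
    misses = 0
    for taken in outcomes:
        if predictor.predict(branch_pc) != taken:
            misses += 1
        predictor.update(branch_pc, taken)
    return misses


# A loop branch that is taken nine times and then falls through once.
loop_branch = [True] * 9 + [False]
print(count_mispredictions(loop_branch))  # 2
```

The predictor is wrong on the first iteration (before any history exists) and on the loop exit; the remaining eight predictions follow the trend correctly.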
Concluding remarks

• ISA influences design of datapath and control


• Datapath and control influence design of ISA
• Pipelining improves instruction throughput using
parallelism
– More instructions completed per second
– Latency for each instruction not reduced
• Hazards: structural, data, control



The end
