Processor

Computer Architecture

Assoc. Prof. Nguyễn Trí Thành, PhD


UNIVERSITY OF ENGINEERING AND TECHNOLOGY
FACULTY OF INFORMATION TECHNOLOGY
DEPARTMENT OF INFORMATION SYSTEMS
[email protected]

Processor on MIPS

CPU – RAM interconnection

Instruction Execution

• PC → instruction memory, fetch instruction
• Decode the instruction
• Execute the instruction
  - Depending on instruction class
  - Use ALU to calculate
      - Arithmetic result
      - Memory address for load/store
      - Branch target address
  - Access data memory for load/store
  - PC ← target address or PC + 4

Convention for Registers

Name      Register number   Usage
$zero     0                 the constant value 0
$v0-$v1   2-3               values for results and expression evaluation
$a0-$a3   4-7               arguments
$t0-$t7   8-15              temporaries
$s0-$s7   16-23             saved
$t8-$t9   24-25             more temporaries
$gp       28                global pointer
$sp       29                stack pointer
$fp       30                frame pointer
$ra       31                return address

Register 1, called $at, is reserved for the assembler; registers 26-27, called $k0 and $k1, are reserved for the operating system.

Instruction type in MIPS

• arithmetic-logic instructions: add, sub, or, and
      add $s1,$s2,$s3      R    $s1 = $s2 + $s3
      sub $s1,$s2,$s3      R    $s1 = $s2 – $s3
      or  $s1,$t2,$t3      R    $s1 = $t2 | $t3
      and $s1,$t2,$t3      R    $s1 = $t2 & $t3
• memory-reference instructions: lw, sw
      lw $s1,100($s2)      I    $s1 = Memory[$s2+100]
      sw $s1,100($s2)      I    Memory[$s2+100] = $s1
• control-flow instructions: bne, beq, j
      bne $s4,$s5,Lab1     I    if $s4 != $s5 goto Lab1
      beq $s4,$s5,Lab2     I    if $s4 == $s5 goto Lab2
      j Lab3               J    goto Lab3

ALU

[Figure: 8-bit ALU datapath with a register file, multiplexers Mux1 and Mux2, a demultiplexer, and a Flags output; the instruction word is split into the fields Operation, Operand1, Operand2, and Destination.]

Implementing MIPS
• We're ready to look at an implementation of the MIPS instruction set
• Simplified to contain only
  - arithmetic-logic instructions: add, sub, and, or, slt
  - memory-reference instructions: lw, sw
  - control-flow instructions: beq, j

R-Format:   op (6 bits) | rs (5 bits) | rt (5 bits) | rd (5 bits) | shamt (5 bits) | funct (6 bits)
I-Format:   op (6 bits) | rs (5 bits) | rt (5 bits) | offset (16 bits)
J-Format:   op (6 bits) | address (26 bits)

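The three formats share the same bit positions for op, rs, and rt, so one routine can pull every field out of a 32-bit word. The following is a minimal sketch, not part of the original slides; the function name `decode_fields` is hypothetical, and the sample word is the encoding of `add $t0, $s1, $s2`.

```python
def decode_fields(word: int) -> dict:
    """Split a 32-bit MIPS instruction word into the fields of the R, I and J formats."""
    return {
        "op": (word >> 26) & 0x3F,       # bits 31-26, all formats
        "rs": (word >> 21) & 0x1F,       # bits 25-21, R and I formats
        "rt": (word >> 16) & 0x1F,       # bits 20-16, R and I formats
        "rd": (word >> 11) & 0x1F,       # bits 15-11, R format
        "shamt": (word >> 6) & 0x1F,     # bits 10-6,  R format
        "funct": word & 0x3F,            # bits 5-0,   R format
        "offset": word & 0xFFFF,         # bits 15-0,  I format
        "address": word & 0x03FFFFFF,    # bits 25-0,  J format
    }


fields = decode_fields(0x02324020)  # add $t0, $s1, $s2
print(fields["op"], fields["rs"], fields["rt"], fields["rd"], fields["shamt"], fields["funct"])
```

Which fields are meaningful depends on the opcode; the decoder extracts all of them and leaves the choice to the control logic.
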
ALU Control

• ALU used for
  - Load/Store: F = add
  - Branch: F = subtract
  - R-type: F depends on funct field

ALU control   Function
0000          AND
0001          OR
0010          add
0110          subtract
0111          set-on-less-than
1100          NOR

ALU Control
• Assume 2-bit ALUOp derived from opcode
  - Combinational logic derives ALU control

opcode   ALUOp   Operation          funct    ALU function       ALU control
lw       00      load word          XXXXXX   add                0010
sw       00      store word         XXXXXX   add                0010
beq      01      branch equal       XXXXXX   subtract           0110
R-type   10      add                100000   add                0010
                 subtract           100010   subtract           0110
                 AND                100100   AND                0000
                 OR                 100101   OR                 0001
                 set-on-less-than   101010   set-on-less-than   0111

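The table above is a small combinational function of ALUOp and funct. A minimal Python sketch of that mapping follows; the names `FUNCT_TO_ALU` and `alu_control` are hypothetical, and only the rows listed in the table are covered.

```python
# funct field -> 4-bit ALU control, used only when ALUOp = 10 (R-type)
FUNCT_TO_ALU = {
    0b100000: 0b0010,  # add
    0b100010: 0b0110,  # subtract
    0b100100: 0b0000,  # AND
    0b100101: 0b0001,  # OR
    0b101010: 0b0111,  # set-on-less-than
}


def alu_control(alu_op: int, funct: int = 0) -> int:
    """Return the 4-bit ALU control value for a 2-bit ALUOp and a 6-bit funct field."""
    if alu_op == 0b00:      # lw / sw: funct is a don't-care, the ALU adds
        return 0b0010
    if alu_op == 0b01:      # beq: funct is a don't-care, the ALU subtracts
        return 0b0110
    if alu_op == 0b10:      # R-type: the operation is selected by funct
        return FUNCT_TO_ALU[funct]
    raise ValueError(f"unsupported ALUOp {alu_op:02b}")


print(f"{alu_control(0b10, 0b101010):04b}")  # R-type set-on-less-than
```
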
R-format Example

op        rs       rt       rd       shamt    funct
6 bits    5 bits   5 bits   5 bits   5 bits   6 bits

add $t0, $s1, $s2

special   $s1      $s2      $t0      0        add
0         17       18       8        0        32
000000    10001    10010    01000    00000    100000

00000010001100100100000000100000 (base 2) = 02324020 (base 16)

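The encoding above can be reproduced by shifting each field to its bit position and OR-ing the results. This is an illustrative sketch; the helper name `encode_r` is an assumption, not something defined in the slides.

```python
def encode_r(rs: int, rt: int, rd: int, shamt: int, funct: int, op: int = 0) -> int:
    """Pack the six R-format fields into one 32-bit instruction word."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


# add $t0, $s1, $s2  ->  rs = $s1 = 17, rt = $s2 = 18, rd = $t0 = 8, funct = 32
word = encode_r(rs=17, rt=18, rd=8, shamt=0, funct=32)
print(f"{word:032b}")
print(f"{word:08x}")
```
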
Hexadecimal

• Base 16
  - Compact representation of bit strings
  - 4 bits per hex digit

0  0000    4  0100    8  1000    c  1100
1  0001    5  0101    9  1001    d  1101
2  0010    6  0110    a  1010    e  1110
3  0011    7  0111    b  1011    f  1111

• Example: eca8 6420
  - 1110 1100 1010 1000 0110 0100 0010 0000

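The digit-by-digit expansion in the example can be checked mechanically. A small sketch, with the hypothetical helper name `hex_to_bits`:

```python
def hex_to_bits(hex_string: str) -> str:
    """Expand each hex digit into its 4-bit group, separated by spaces."""
    digits = hex_string.replace(" ", "")
    return " ".join(f"{int(digit, 16):04b}" for digit in digits)


print(hex_to_bits("eca8 6420"))
```
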
MIPS I-format Instructions

op        rs       rt       constant or address
6 bits    5 bits   5 bits   16 bits

• Immediate arithmetic and load/store instructions
  - rt: destination or source register number
  - Constant: –2^15 to +2^15 – 1
  - Address: offset added to base address in rs
• Design Principle 4: Good design demands good compromises
  - Different formats complicate decoding, but allow 32-bit instructions uniformly
  - Keep formats as similar as possible

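The range –2^15 to +2^15 – 1 follows from reading the 16-bit field as a two's-complement number, which is what sign extension does. A minimal sketch, assuming the hypothetical helper name `sign_extend16`:

```python
def sign_extend16(value: int) -> int:
    """Interpret the low 16 bits of value as a two's-complement signed integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


print(sign_extend16(0x7FFF))  # largest constant
print(sign_extend16(0x8000))  # smallest constant
print(sign_extend16(0xFFFF))  # all ones
```
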
Overview: Processor Implementation Styles
• Single Cycle
  - perform each instruction in 1 clock cycle
  - clock cycle must be long enough for slowest instruction; therefore,
  - disadvantage: only as fast as slowest instruction
• Multi-Cycle
  - break fetch/execute cycle into multiple steps
  - perform 1 step in each clock cycle
  - advantage: each instruction uses only as many cycles as it needs
• Pipelined
  - execute each instruction in multiple steps
  - perform 1 step / instruction in each clock cycle
  - process multiple instructions in parallel – assembly line

Breaking instructions into steps
• Our goal is to break up the instructions into steps so that
  - each step takes one clock cycle
  - the amount of work to be done in each step/cycle is about equal
  - each cycle uses each major functional unit at most once, so that such units do not have to be replicated
  - functional units can be shared between different cycles within one instruction
• Data at the end of one cycle that is to be used in the next must be stored!

Breaking instructions into steps
• We break instructions into the following potential execution steps. Not all instructions require all the steps; each step takes one clock cycle.
  1. Instruction fetch and PC increment (IF)
  2. Instruction decode and register fetch (ID)
  3. Execution, memory address computation, or branch completion (EX)
  4. Memory access or R-type instruction completion (MEM)
  5. Memory read completion (WB)
• Each MIPS instruction takes 3 to 5 cycles (steps)

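Because each instruction stops at the step where it completes, the cycle count depends only on the instruction class. The sketch below assumes the completion points named in the step list (branches and jumps in EX, R-type and stores in MEM, loads in WB); the table `CYCLES` and the function `total_cycles` are hypothetical names.

```python
# Number of multicycle steps each instruction class needs before it completes
CYCLES = {
    "beq": 3,     # completes in EX
    "j": 3,       # completes in EX
    "r-type": 4,  # completes in MEM (register write)
    "sw": 4,      # completes in MEM (memory write)
    "lw": 5,      # completes in WB (register write from MDR)
}


def total_cycles(instruction_classes) -> int:
    """Sum the cycles of a straight-line sequence of instruction classes."""
    return sum(CYCLES[name] for name in instruction_classes)


print(total_cycles(["lw", "r-type", "sw"]))  # 5 + 4 + 4
```
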
Instruction types
• Consider the code:

    lw $t2, 0($t3)
    lw $t3, 4($t3)
    beq $t2, $t3, Label    #assume not equal
    add $t5, $t2, $t3
    sw $t5, 8($t3)
    Label: j 32

• Identify the type of each instruction.

Step 1: Instruction Fetch & PC Increment (IF)
• Use PC to get instruction and put it in the instruction register.
  Increment the PC by 4 and put the result back in the PC.
• Can be described succinctly using RTL (Register-Transfer Language):
        IR = Memory[PC];
        PC = PC + 4;

Step 2: Instruction Decode and Register Fetch (ID)
• Read registers rs and rt in case we need them.
  Compute the branch address in case the instruction is a branch.
• RTL:
        A = Reg[IR[25-21]];
        B = Reg[IR[20-16]];
        ALUOut = PC + (sign-extend(IR[15-0]) << 2);

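The speculative branch-target computation in this step can be sketched in a few lines. This is an illustration under stated assumptions: the PC passed in is the value already incremented by 4 in Step 1, addresses are 32 bits wide, and the names `sign_extend16` and `branch_target` are hypothetical.

```python
def sign_extend16(value: int) -> int:
    """Interpret the low 16 bits of value as a two's-complement signed integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def branch_target(pc_after_fetch: int, ir: int) -> int:
    """ALUOut = PC + (sign-extend(IR[15-0]) << 2), wrapped to 32 bits."""
    return (pc_after_fetch + (sign_extend16(ir) << 2)) & 0xFFFFFFFF


# Hypothetical values: PC already incremented to 0x00400004
print(hex(branch_target(0x00400004, 0x0003)))  # forward by 3 words
print(hex(branch_target(0x00400004, 0xFFFE)))  # backward by 2 words
```
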
Step 3: Execution, Address Computation or Branch Completion (EX)
• ALU performs one of four functions depending on instruction type
  - memory reference:
        ALUOut = A + sign-extend(IR[15-0]);
  - R-type:
        ALUOut = A op B;
  - branch (instruction completes):
        if (A==B) PC = ALUOut;
      - branch destination address = (PC + 4) + (4 * offset)
  - jump (instruction completes):
        PC = PC[31-28] || (IR[25-0] << 2)

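The jump case concatenates the upper four bits of the PC with the 26-bit address field shifted left by two. A minimal sketch follows; the function name `jump_target` and the sample values are assumptions chosen for illustration, and the PC is taken to be the value already incremented in Step 1.

```python
def jump_target(pc: int, ir: int) -> int:
    """PC = PC[31-28] || (IR[25-0] << 2)."""
    upper = pc & 0xF0000000              # PC[31-28], kept in place
    lower = (ir & 0x03FFFFFF) << 2       # 26-bit address field as a 28-bit byte offset
    return upper | lower


# Hypothetical example: a jump word whose address field is 0x20, fetched near 0x10000008
print(hex(jump_target(0x10000008, 0x08000020)))
```
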
Step 4: Memory access or R-type Instruction Completion (MEM)
• Again depending on instruction type:
  - Loads and stores access memory
      - load
            MDR = Memory[ALUOut];
      - store (instruction completes)
            Memory[ALUOut] = B;
  - R-type (instruction completes)
        Reg[IR[15-11]] = ALUOut;

Step 5: Memory Read Completion (WB)
• Again depending on instruction type:
  - Load writes back (instruction completes)
        Reg[IR[20-16]] = MDR;

Important: From a datapath (or control) point of view, nothing prevents eliminating Step 5 by performing
        Reg[IR[20-16]] = Memory[ALUOut];
for loads in Step 4. This would eliminate the MDR as well.
This is not done because, to keep the steps balanced in length, the design allows each step to contain at most one ALU operation, or one register access, or one memory access.

Summary of Instruction Execution

Step     Step name                           Action
1: IF    Instruction fetch                   All: IR = Memory[PC]
                                                  PC = PC + 4
2: ID    Instruction decode/register fetch   All: A = Reg[IR[25-21]]
                                                  B = Reg[IR[20-16]]
                                                  ALUOut = PC + (sign-extend(IR[15-0]) << 2)
3: EX    Execution, address computation,     R-type: ALUOut = A op B
         branch/jump completion              Memory-reference: ALUOut = A + sign-extend(IR[15-0])
                                             Branches: if (A == B) then PC = ALUOut
                                             Jumps: PC = PC[31-28] || (IR[25-0] << 2)
4: MEM   Memory access or R-type completion  R-type: Reg[IR[15-11]] = ALUOut
                                             Load: MDR = Memory[ALUOut]
                                             Store: Memory[ALUOut] = B
5: WB    Memory read completion              Load: Reg[IR[20-16]] = MDR

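The summary table can be turned into a small behavioural model that applies the RTL of each step and counts cycles. The sketch below is only an illustration, not the slides' implementation: memory is a Python dict keyed by byte address and shared by instructions and data, all function and constant names are hypothetical, and the opcode and funct values are the standard MIPS encodings for lw, sw, beq, j and the five R-type operations.

```python
def sign_extend16(value):
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def to_signed32(value):
    return value - (1 << 32) if value & 0x80000000 else value


LW, SW, BEQ, J, RTYPE = 0x23, 0x2B, 0x04, 0x02, 0x00
R_OPS = {  # funct field -> ALU operation
    0x20: lambda a, b: a + b,
    0x22: lambda a, b: a - b,
    0x24: lambda a, b: a & b,
    0x25: lambda a, b: a | b,
    0x2A: lambda a, b: int(to_signed32(a) < to_signed32(b)),
}


def run_multicycle(memory, regs, pc, count):
    """Run `count` instructions step by step; return (final PC, cycles used)."""
    cycles = 0
    for _ in range(count):
        ir = memory[pc]                                  # Step 1: IR = Memory[PC]
        pc = (pc + 4) & 0xFFFFFFFF                       #         PC = PC + 4
        op, rs, rt, rd = ir >> 26, (ir >> 21) & 31, (ir >> 16) & 31, (ir >> 11) & 31
        a, b = regs[rs], regs[rt]                        # Step 2: A, B
        imm = sign_extend16(ir)
        alu_out = (pc + (imm << 2)) & 0xFFFFFFFF         #         branch target
        cycles += 3                                      # IF, ID and EX always happen
        if op == BEQ:                                    # Step 3: branch completes
            if a == b:
                pc = alu_out
        elif op == J:                                    # Step 3: jump completes
            pc = (pc & 0xF0000000) | ((ir & 0x03FFFFFF) << 2)
        elif op == RTYPE:
            alu_out = R_OPS[ir & 0x3F](a, b) & 0xFFFFFFFF
            cycles += 1                                  # Step 4: register write
            if rd != 0:                                  # $zero stays 0
                regs[rd] = alu_out
        elif op in (LW, SW):
            alu_out = (a + imm) & 0xFFFFFFFF             # Step 3: address
            cycles += 1                                  # Step 4: memory access
            if op == SW:
                memory[alu_out] = b
            else:
                mdr = memory[alu_out]
                cycles += 1                              # Step 5: write back
                if rt != 0:
                    regs[rt] = mdr
        else:
            raise ValueError(f"unsupported opcode {op:#04x}")
    return pc, cycles


def i_type(op, rs, rt, imm):
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def r_type(rs, rt, rd, funct):
    return (rs << 21) | (rt << 16) | (rd << 11) | funct


T2, T3, T5 = 10, 11, 13
memory = {
    0: i_type(LW, T3, T2, 0),       # lw  $t2, 0($t3)
    4: r_type(T2, T2, T5, 0x20),    # add $t5, $t2, $t2
    8: i_type(SW, T3, T5, 4),       # sw  $t5, 4($t3)
    100: 21,                        # data word (hypothetical value)
}
regs = [0] * 32
regs[T3] = 100
final_pc, cycles = run_multicycle(memory, regs, pc=0, count=3)
print(final_pc, cycles, memory[104])
```

The model collapses the first three steps into one update because every instruction performs them; only the per-class tail (Steps 4 and 5) changes the cycle count.
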
Multicycle Execution Step (1):
Instruction Fetch
IR = Memory[PC];
PC = PC + 4;

[Figure, animated over several slides: multicycle datapath with PC, Memory (ADDR, RD, WD; MemRead, MemWrite), IR, MDR, Registers (RN1, RN2, WN, WD, RD1, RD2; RegWrite), A, B, ALU (Operation, Zero) and ALUOut. Later frames add the constant 4 at the ALU input and the label PC + 4.]

Multicycle Execution Step (2):
Instruction Decode & Register Fetch
A = Reg[IR[25-21]]; (A = Reg[rs])
B = Reg[IR[20-16]]; (B = Reg[rt])
ALUOut = PC + (sign-extend(IR[15-0]) << 2)

[Figure, animated over several slides: the same multicycle datapath. Frames add the labels Reg[rs] and Reg[rt] at the register-file outputs, and the final frame labels the ALU result as the Branch Target Address.]

Multicycle Execution Step (3):
Memory Reference Instructions
ALUOut = A + sign-extend(IR[15-0]);

[Figure, animated over several slides: the same multicycle datapath; the final frame labels the ALU result as the Mem. Address.]

Multicycle Execution Step (3):
ALU Instruction (R-Type)
ALUOut = A op B

[Figure, animated over several slides: the same multicycle datapath; the final frame labels the ALU result as the R-Type Result.]

Multicycle Execution Step (3):
Branch Instructions
if (A == B) PC = ALUOut;

[Figure, animated over several slides: the same multicycle datapath; ALUOut holds the Branch Target Address computed in Step 2, and the final frame shows the Branch Target Address being routed to the PC.]

Multicycle Execution Step (3):
Jump Instruction
PC = PC[31-28] concat (IR[25-0] << 2)

[Figure, animated over several slides: the same multicycle datapath; ALUOut still holds the Branch Target Address, and the final frame shows the Jump Address being routed to the PC.]

Multicycle Execution Step (4):
Memory Access - Read (lw)
MDR = Memory[ALUOut];

[Figure, animated over several slides: the same multicycle datapath; ALUOut supplies the Mem. Address, and the final frames show the Mem. Data arriving in the MDR.]

73
Multicycle Execution Step (4):
Memory Access - Write (sw)
Memory[ALUOut] = B;

I Instruction I
R
5 5 5
Reg[rs] Operation
3
PC MemWrite RN1 RN2 WN
ADDR RD1 A Zero
Registers
Memory M ALU
RD D WD ALU
OUT
PC + 4 R
WD RD2 B
MemRead RegWrite
Reg[rt]

74
Multicycle Execution Step (4):
Memory Access - Write (sw)
Memory[ALUOut] = B;

I Instruction I
R
5 5 5
Reg[rs] Operation
3
PC MemWrite RN1 RN2 WN
ADDR RD1 A Zero
Registers
Memory M ALU
RD D WD ALU
OUT
PC + 4 R
WD RD2 B
MemRead RegWrite
Reg[rt]

75
Multicycle Execution Step (4):
Memory Access - Write (sw)
Memory[ALUOut] = B;

I Instruction I
R
5 5 5
Reg[rs] Operation
3
PC MemWrite RN1 RN2 WN
ADDR RD1 A Zero
Registers
Memory M ALU
RD D WD ALU
OUT
PC + 4 R
WD RD2 B
MemRead RegWrite
Reg[rt]

76
Multicycle Execution Step (4):
Memory Access - Write (sw)
Memory[ALUOut] = B;

I Instruction I
R
5 5 5
Reg[rs] Operation
3
PC MemWrite RN1 RN2 WN
ADDR RD1 A Zero
Registers
Memory M ALU
RD D WD ALU
OUT
PC + 4 R
WD RD2 B
MemRead RegWrite
Reg[rt]

77
Multicycle Execution Step (4):
Memory Access - Write (sw)
Memory[ALUOut] = B;

I Instruction I
R
5 5 5
Reg[rs] Operation
3
PC MemWrite RN1 RN2 WN
ADDR RD1 A Zero
Registers
Memory M ALU
RD D WD ALU
OUT
PC + 4 R
WD RD2 B
MemRead RegWrite
Reg[rt]

78
Multicycle Execution Step (4):
Memory Access - Write (sw)
Memory[ALUOut] = B;

I Instruction I
R
5 5 5
Reg[rs] Operation
3
PC MemWrite RN1 RN2 WN
ADDR RD1 A Zero
Registers
Memory M ALU
RD D WD ALU
OUT
PC + 4 R
WD RD2 B
MemRead RegWrite
Reg[rt]

79
Multicycle Execution Step (4):
Memory Access - Write (sw)
Memory[ALUOut] = B;

I Instruction I
R
5 5 5
Reg[rs] Operation
3
PC MemWrite RN1 RN2 WN
ADDR RD1 A Zero
Registers
Memory M ALU
RD D WD ALU
OUT
PC + 4 R
WD RD2 B
MemRead RegWrite
Reg[rt]

80
Multicycle Execution Step (4):
Memory Access - Write (sw)
Memory[ALUOut] = B;

I Instruction I
R
5 5 5
Reg[rs] Operation
3
PC MemWrite RN1 RN2 WN
ADDR RD1 A Zero
Registers
Memory M ALU
RD D WD ALU
OUT
PC + 4 R
WD RD2 B
MemRead RegWrite
Reg[rt]

81
Multicycle Execution Step (4):
Memory Access - Write (sw)
Memory[ALUOut] = B;

I Instruction I
R
5 5 5
Reg[rs] Operation
3
PC MemWrite RN1 RN2 WN
ADDR RD1 A Zero
Registers
Memory M ALU
RD D WD ALU
OUT
PC + 4 R
WD RD2 B
MemRead RegWrite
Reg[rt]

82
Multicycle Execution Step (4):
Memory Access - Write (sw)
Memory[ALUOut] = B;

I Instruction I
R
5 5 5
Reg[rs] Operation
3
PC MemWrite RN1 RN2 WN
ADDR RD1 A Zero
Registers
Memory M ALU
RD D WD ALU
OUT
PC + 4 R
WD RD2 B
MemRead RegWrite
Reg[rt]

83
Multicycle Execution Step (4):
Memory Access - Write (sw)
Memory[ALUOut] = B;

I Instruction I
R
5 5 5
Reg[rs] Operation
3
PC MemWrite RN1 RN2 WN
ADDR RD1 A Zero
Registers
Memory M ALU
RD D WD ALU
OUT
PC + 4 R
WD RD2 B
MemRead RegWrite
Reg[rt]

84
Multicycle Execution Step (4):
Memory Access - Write (sw)
Memory[ALUOut] = B;

I Instruction I
R
5 5 5
Reg[rs] Operation
3
PC MemWrite RN1 RN2 WN
ADDR RD1 A Zero
Registers
Memory M ALU
RD D WD ALU
OUT
PC + 4 R
WD RD2 B
MemRead RegWrite
Reg[rt]

85
Multicycle Execution Step (4):
ALU Instruction (R-Type)
Reg[IR[15-11]] = ALUOUT

I Instruction I
R
5 5 5
Reg[rs] Operation
3
R-Type
PC MemWrite RN1 RN2 WN
Result
ADDR RD1 A Zero
Registers
Memory M ALU
RD D WD ALU
OUT
PC + 4 R
WD RD2 B
MemRead RegWrite
Reg[rt]

86
Multicycle Execution Step (4):
ALU Instruction (R-Type)
Reg[IR[15-11]] = ALUOUT

I Instruction I
R
5 5 5
Reg[rs] Operation
3
R-Type
PC MemWrite RN1 RN2 WN
Result
ADDR RD1 A Zero
Registers
Memory M ALU
RD D WD ALU
OUT
PC + 4 R
WD RD2 B
MemRead RegWrite
Reg[rt]

87
Multicycle Execution Step (4):
ALU Instruction (R-Type)
Reg[IR[15-11]] = ALUOUT

I Instruction I
R
5 5 5
Reg[rs] Operation
3
R-Type
PC MemWrite RN1 RN2 WN
Result
ADDR RD1 A Zero
Registers
Memory M ALU
RD D WD ALU
OUT
PC + 4 R
WD RD2 B
MemRead RegWrite
Reg[rt]

88
Multicycle Execution Step (4):
ALU Instruction (R-Type)
Reg[IR[15-11]] = ALUOUT

I Instruction I
R
5 5 5
Reg[rs] Operation
3
R-Type
PC MemWrite RN1 RN2 WN
Result
ADDR RD1 A Zero
Registers
Memory M ALU
RD D WD ALU
OUT
PC + 4 R
WD RD2 B
MemRead RegWrite
Reg[rt]

89
Multicycle Execution Step (4):
ALU Instruction (R-Type)
Reg[IR[15-11]] = ALUOUT

I Instruction I
R
5 5 5
Reg[rs] Operation
3
R-Type
PC MemWrite RN1 RN2 WN
Result
ADDR RD1 A Zero
Registers
Memory M ALU
RD D WD ALU
OUT
PC + 4 R
WD RD2 B
MemRead RegWrite
Reg[rt]

90
Multicycle Execution Step (4):
ALU Instruction (R-Type)
Reg[IR[15-11]] = ALUOUT

I Instruction I
R
5 5 5
Reg[rs] Operation
3
R-Type
PC MemWrite RN1 RN2 WN
Result
ADDR RD1 A Zero
Registers
Memory M ALU
RD D WD ALU
OUT
PC + 4 R
WD RD2 B
MemRead RegWrite
Reg[rt]

91
Multicycle Execution Step (4):
ALU Instruction (R-Type)
Reg[IR[15-11]] = ALUOUT

I Instruction I
R
5 5 5
Reg[rs] Operation
3
R-Type
PC MemWrite RN1 RN2 WN
Result
ADDR RD1 A Zero
Registers
Memory M ALU
RD D WD ALU
OUT
PC + 4 R
WD RD2 B
MemRead RegWrite
Reg[rt]

92
Multicycle Execution Step (4):
ALU Instruction (R-Type)
Reg[IR[15-11]] = ALUOUT

I Instruction I
R
5 5 5
Reg[rs] Operation
3
R-Type
PC MemWrite RN1 RN2 WN
Result
ADDR RD1 A Zero
Registers
Memory M ALU
RD D WD ALU
OUT
PC + 4 R
WD RD2 B
MemRead RegWrite
Reg[rt]

93
Multicycle Execution Step (4):
ALU Instruction (R-Type)
Reg[IR[15-11]] = ALUOUT

I Instruction I
R
5 5 5
Reg[rs] Operation
3
R-Type
PC MemWrite RN1 RN2 WN
Result
ADDR RD1 A Zero
Registers
Memory M ALU
RD D WD ALU
OUT
PC + 4 R
WD RD2 B
MemRead RegWrite
Reg[rt]

94
Multicycle Execution Step (5):
Memory Read Completion (lw)
Reg[IR[20-16]] = MDR;

[Figure, animated over several slides: the same multicycle datapath with the labels Mem. Address and Mem. Data; the path from the MDR to the register-file write-data input is traced frame by frame.]

Quizzes
• Suppose PC=1024, $t2=10, $t1=23, $t3=71; run each of the following instructions step by step

    lw $t2, 0($t3)
    sw $t1, 4($t3)
    beq $t2, $t3, 18
    add $t1, $t2, $t3
    bne $t1, $t2, -8
    j 85

Simple Questions
• How many cycles will it take to execute this code?

    lw $t2, 0($t3)
    lw $t3, 4($t3)
    beq $t2, $t3, Label    #assume not equal
    add $t5, $t2, $t3
    sw $t5, 8($t3)
    Label: j 85

• What is going on during the 8th cycle of execution?
• In what cycle does the actual addition of $t2 and $t3 take place?

[Figure: clock time-line]

Datapath with Control I

[Figure: MIPS datapath with PC, instruction memory, register file, sign extend (16 to 32 bits), shift left 2, ALU, ALU control, data memory, two adders (PC + 4 and branch target), and multiplexors. A new multiplexor, controlled by RegDst, selects between Instruction [20–16] and Instruction [15–11] as the Write register. Control signals shown: PCSrc, RegWrite, RegDst, ALUSrc, ALUOp, MemRead, MemWrite, MemtoReg.]

Adding control to the MIPS Datapath III (and a new multiplexor to select field to specify destination register): what are the functions of the 9 control signals?

Control Signals

Signal name   Effect when deasserted / Effect when asserted
RegDst        Deasserted: The register destination number for the Write register comes from the rt field (bits 20-16)
              Asserted: The register destination number for the Write register comes from the rd field (bits 15-11)
RegWrite      Deasserted: None
              Asserted: The register on the Write register input is written with the value on the Write data input
ALUSrc        Deasserted: The second ALU operand comes from the second register file output (Read data 2)
              Asserted: The second ALU operand is the sign-extended, lower 16 bits of the instruction
PCSrc         Deasserted: The PC is replaced by the output of the adder that computes the value of PC + 4
              Asserted: The PC is replaced by the output of the adder that computes the branch target
MemRead       Deasserted: None
              Asserted: Data memory contents designated by the address input are put on the first Read data output
MemWrite      Deasserted: None
              Asserted: Data memory contents designated by the address input are replaced by the value of the Write data input
MemtoReg      Deasserted: The value fed to the register Write data input comes from the ALU
              Asserted: The value fed to the register Write data input comes from the data memory

Effects of the seven control signals

Datapath with Control II

[Figure: the MIPS datapath with a Control unit added. Instruction [31–26] feeds the Control unit, which drives RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc and RegWrite; Instruction [5–0] feeds the ALU control.]

MIPS datapath with the control unit: input to control is the 6-bit instruction opcode field, output is seven 1-bit signals and the 2-bit ALUOp signal

Datapath with Control II (cont.)

PCSrc cannot be set directly from the opcode: zero test outcome is required

[Figure: the same datapath with control unit as on the previous slide.]

Determining control signals for the MIPS datapath based on instruction opcode

Instruction   RegDst   ALUSrc   MemtoReg   RegWrite   MemRead   MemWrite   Branch   ALUOp1   ALUOp0
R-format      1        0        0          1          0         0          0        1        0
lw            0        1        1          1          1         0          0        0        0
sw            X        1        X          0          0         1          0        0        0
beq           X        0        X          0          0         0          1        0        1
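
The table above is a pure function of the opcode, so it can be sketched as a lookup. The following is an illustrative model, not the slides' hardware: the names `CONTROL`, `OPCODES`, `main_control` and `pc_src` are hypothetical, don't-care entries (X) are represented as `None`, and PCSrc is assumed to be the AND of Branch and the ALU Zero output, which is the usual way to combine the zero test with the opcode-derived signal.

```python
# Control signals per instruction class; None marks a don't-care (X) entry
CONTROL = {
    "R-format": dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,
                     MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),
    "lw":       dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,
                     MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),
    "sw":       dict(RegDst=None, ALUSrc=1, MemtoReg=None, RegWrite=0,
                     MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),
    "beq":      dict(RegDst=None, ALUSrc=0, MemtoReg=None, RegWrite=0,
                     MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),
}

# 6-bit opcode (Instruction [31-26]) -> instruction class
OPCODES = {0b000000: "R-format", 0b100011: "lw", 0b101011: "sw", 0b000100: "beq"}


def main_control(opcode: int) -> dict:
    """Return the control signals the main control unit asserts for an opcode."""
    return CONTROL[OPCODES[opcode]]


def pc_src(branch: int, zero: int) -> int:
    """PCSrc needs the ALU zero test as well as the opcode-derived Branch signal."""
    return branch & zero


signals = main_control(0b000100)          # beq
print(signals["ALUOp"], pc_src(signals["Branch"], zero=1), pc_src(signals["Branch"], zero=0))
```
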
Implementation: Main Control Block

Signal name | R-format | lw | sw | beq
Inputs
Op5         | 0        | 1  | 1  | 0
Op4         | 0        | 0  | 0  | 0
Op3         | 0        | 0  | 1  | 0
Op2         | 0        | 0  | 0  | 1
Op1         | 0        | 1  | 1  | 0
Op0         | 0        | 1  | 1  | 0
Outputs
RegDst      | 1        | 0  | x  | x
ALUSrc      | 0        | 1  | 1  | 0
MemtoReg    | 0        | 1  | x  | x
RegWrite    | 1        | 1  | 0  | 0
MemRead     | 0        | 1  | 0  | 0
MemWrite    | 0        | 0  | 1  | 0
Branch      | 0        | 0  | 0  | 1
ALUOp1      | 1        | 0  | 0  | 0
ALUOp0      | 0        | 0  | 0  | 1
Truth table for main control signals

[Figure: main control PLA. Inputs Op5–Op0; product terms R-format, lw, sw, beq; outputs RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, ALUOp1, ALUOp0.]
Main control PLA (programmable logic array): the principle underlying PLAs is that any logical expression can be written as a sum-of-products
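
The truth table can be read as a two-level AND/OR circuit. The following Python sketch is only an illustration of that reading, not the slides' implementation: the function name `main_control` and the choice to resolve the don't-care entries (x) to 0 are assumptions made here.

```python
# Opcode values for the four instruction classes in the truth table (Op5..Op0).
OPCODES = {"R-format": 0b000000, "lw": 0b100011, "sw": 0b101011, "beq": 0b000100}


def main_control(opcode):
    """Return the nine control outputs for a 6-bit opcode.

    Assumption for illustration: don't-care outputs (x) are driven to 0.
    """
    op = [(opcode >> i) & 1 for i in range(6)]   # op[i] is input bit Op_i
    n = [1 - bit for bit in op]                  # complemented inputs

    # AND plane: one product term per instruction class.
    r_format = n[5] & n[4] & n[3] & n[2] & n[1] & n[0]
    lw = op[5] & n[4] & n[3] & n[2] & op[1] & op[0]
    sw = op[5] & n[4] & op[3] & n[2] & op[1] & op[0]
    beq = n[5] & n[4] & n[3] & op[2] & n[1] & n[0]

    # OR plane: each output is the OR of the product terms that assert it.
    return {
        "RegDst": r_format,
        "ALUSrc": lw | sw,
        "MemtoReg": lw,
        "RegWrite": r_format | lw,
        "MemRead": lw,
        "MemWrite": sw,
        "Branch": beq,
        "ALUOp1": r_format,
        "ALUOp0": beq,
    }
```

For example, `main_control(OPCODES["lw"])` asserts ALUSrc, MemtoReg, RegWrite and MemRead and nothing else, which matches the lw column of the table.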
FSM Control: Implementation
[Figure: combinational control logic block. Inputs: instruction register opcode field (Op5–Op0) and state register (S3–S0). Outputs: PCWrite, PCWriteCond, IorD, MemRead, MemWrite, IRWrite, MemtoReg, PCSource, ALUOp, ALUSrcB, ALUSrcA, RegWrite, RegDst, and the next state NS3–NS0.]

Four state bits are required for 10 states

High-level view of FSM implementation: inputs to the combinational logic block are
the current state number and instruction opcode bits; outputs are the next state
number and control signals to be asserted for the current state
FSM Control: PLA Implementation
[Figure: PLA. Inputs: Op5–Op0 and S3–S0. Outputs: PCWrite, PCWriteCond, IorD, MemRead, MemWrite, IRWrite, MemtoReg, PCSource1, PCSource0, ALUOp1, ALUOp0, ALUSrcB1, ALUSrcB0, ALUSrcA, RegWrite, RegDst, NS3, NS2, NS1, NS0.]

Upper half is the AND plane that computes all the products. The products are carried
to the lower OR plane by the vertical lines. The sum term for each output is given by
the corresponding horizontal line
E.g., IorD = S0.S1.S2'.S3' + S0.S1'.S2.S3' (a prime denotes the complemented input)
FSM Control: ROM
Implementation
 ROM (Read Only Memory)
 values of memory locations are fixed ahead of time
 A ROM can be used to implement a truth table
 if the address is m bits, we can address $2^m$ entries in the ROM
 outputs are the bits of the entry the address points to

Example ROM with m = 3 address bits and n = 4 output bits:

address | output
0 0 0   | 0 0 1 1
0 0 1   | 1 1 0 0
0 1 0   | 1 1 0 0
0 1 1   | 1 0 0 0
1 0 0   | 0 0 0 0
1 0 1   | 0 0 0 1
1 1 0   | 0 1 1 0
1 1 1   | 0 1 1 1

The size of an m-input n-output ROM is $2^m \times n$ bits – such a ROM can
be thought of as an array of size $2^m$ with each entry in the array being
n bits
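
The "array of size $2^m$" view can be sketched directly in Python. The contents below are copied from the example table; the names `ROM`, `rom_read` and `rom_size_bits` are hypothetical helpers introduced only for this illustration.

```python
# Example ROM from the table: m = 3 address bits, n = 4 output bits.
# Index = address, value = the 4 output bits of that entry.
ROM = [0b0011, 0b1100, 0b1100, 0b1000, 0b0000, 0b0001, 0b0110, 0b0111]


def rom_read(address):
    """Return the n-bit entry the address points to."""
    return ROM[address]


def rom_size_bits(m, n):
    """Total storage of an m-input, n-output ROM: 2^m entries of n bits each."""
    return (2 ** m) * n
```

Here `rom_size_bits(3, 4)` gives 32 bits. Under the same formula, a ROM with 10 address bits (6 opcode bits plus 4 state bits) and 20 output bits (the 16 control lines plus NS3–NS0 listed for the FSM) would need `rom_size_bits(10, 20)` = 20480 bits.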
Microprogramming
 Microprogramming is a method of specifying FSM control that
resembles a programming language – textual rather than graphical
 this is appropriate when the FSM becomes very large, e.g., if the
instruction set is large and/or the number of cycles per instruction is
large
 in such situations graphical representation becomes difficult as
there may be thousands of states and even more arcs joining them
 a microprogram is a specification: the implementation is by ROM or PLA
 A microprogram is a sequence of microinstructions
 each microinstruction has eight fields (label + 7 functional)
 Label: used to control microcode sequencing
 ALU control: specify operation to be done by ALU
 SRC1: specify source for first ALU operand
 SRC2: specify source for second ALU operand
 Register control: specify read/write for register file
 Memory: specify read/write for memory
 PCWrite control: specify the writing of the PC
 Sequencing: specify choice of next microinstruction
Example: CPI in a multicycle
CPU
 Assume
 the control design of the previous slide
 An instruction mix of 22% loads, 11% stores, 49% R-type operations,
16% branches, and 2% jumps
 What is the CPI assuming each step requires 1 clock cycle?

 Solution:
 Number of clock cycles from previous slide for each instruction class:
 loads 5, stores 4, R-type instructions 4, branches 3, jumps 3
 CPI = CPU clock cycles / instruction count
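$= \sum_i (\text{instruction count}_{\text{class } i} \times \text{CPI}_{\text{class } i}) / \text{instruction count}$
$= \sum_i (\text{instruction count}_{\text{class } i} / \text{instruction count}) \times \text{CPI}_{\text{class } i}$
$= 0.22 \times 5 + 0.11 \times 4 + 0.49 \times 4 + 0.16 \times 3 + 0.02 \times 3$
$= 4.04$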
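
The weighted sum can be checked with a few lines of Python. This is a minimal sketch; the dictionary and function names are chosen here for illustration, while the cycle counts and instruction mix are the ones given in the example.

```python
# Clock cycles per instruction class in the multicycle design.
CYCLES = {"load": 5, "store": 4, "r_type": 4, "branch": 3, "jump": 3}

# Instruction mix (fraction of executed instructions per class).
MIX = {"load": 0.22, "store": 0.11, "r_type": 0.49, "branch": 0.16, "jump": 0.02}


def average_cpi(mix, cycles):
    """CPI = sum over classes of (fraction of instructions) x (cycles per class)."""
    return sum(mix[name] * cycles[name] for name in mix)


cpi = average_cpi(MIX, CYCLES)
print(f"CPI = {cpi:.2f}")
```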
Performance in MIPS
 Given the CPI from the previous slide (4.04)
 The CPU clock is 2 GHz (1 GHz = $10^9$ Hz)
 What is MIPS (Millions of Instructions Per Second)?
 What is the MIPS rating of the above system?

 Solution:
 1 second has $2 \times 10^9$ cycles
 MIPS = $(2 \times 10^9 / 4.04) / 10^6 = 2 \times 10^3 / 4.04 \approx 495$

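
The same calculation as a small Python sketch; `mips_rating` is a hypothetical helper name used only here.

```python
def mips_rating(clock_hz, cpi):
    """Millions of instructions per second = (cycles per second / CPI) / 10^6."""
    instructions_per_second = clock_hz / cpi
    return instructions_per_second / 1e6


rating = mips_rating(clock_hz=2e9, cpi=4.04)
print(f"MIPS rating = {rating:.0f}")
```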
Enhancing Performance
with Pipelining

Pipelining
 Start work ASAP!! Do not waste time!
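[Figure: laundry timelines from 6 PM to 2 AM, task order on the vertical axis. Not pipelined: each load is washed, dried, folded and stored before the next load starts. Pipelined: successive loads overlap, each starting one stage after the previous one.]

Assume 30 min. each task – wash, dry, fold, store – and that
separate tasks use separate hardware and so can be overlapped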
Pipelined vs. Single-Cycle
Instruction Execution: the Plan
[Figure, single-cycle: lw $1, 100($0); lw $2, 200($0); lw $3, 300($0) in program order. Each instruction (instruction fetch, Reg, ALU, data access, Reg) takes 8 ns and the next one starts only when it finishes.]
[Figure, pipelined: the same three instructions, each starting 2 ns after the previous one and passing through five 2 ns stages.]

Assume 2 ns for memory access, ALU operation; 1 ns for register access:
therefore, single cycle clock 8 ns; pipelined clock cycle 2 ns.
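
The timing in the two diagrams can be reproduced with a short Python sketch. The stage latencies are the ones assumed on the slide; the function names, and the model in which the pipelined clock equals the slowest stage, are assumptions made for this illustration.

```python
# Stage latencies in ns: 2 ns for memory access and ALU, 1 ns for register access.
STAGE_NS = {"IF": 2, "ID": 1, "EX": 2, "MEM": 2, "WB": 1}


def single_cycle_time(n_instructions):
    """Every instruction takes one long cycle covering all five steps."""
    return n_instructions * sum(STAGE_NS.values())


def pipelined_time(n_instructions):
    """Clock is set by the slowest stage; the first instruction needs one
    cycle per stage, each later one finishes one cycle after its predecessor."""
    clock = max(STAGE_NS.values())
    return (len(STAGE_NS) + n_instructions - 1) * clock


print(single_cycle_time(3), pipelined_time(3))
```

For the three lw instructions this gives 24 ns against 14 ns. For long instruction sequences the ratio approaches 8 / 2 = 4 rather than 5, because the stages are not balanced.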
Pipelining: Keep in Mind
 Pipelining does not reduce the latency of a single
task; it increases the throughput of the entire workload
 Pipeline rate is limited by the longest stage
 potential speedup = number of pipe stages
 unbalanced lengths of pipe stages reduce
speedup
 Time to fill the pipeline and time to drain it – when
there is slack in the pipeline – reduce
speedup

Example Problem
 Problem: for the laundry, fill in the following table when
1. the stage lengths are 30, 30, 30, 30 min., resp.
2. the stage lengths are 20, 20, 60, 20 min., resp.

Person | Unpipelined finish time | Pipeline 1 finish time | Ratio unpipelined to pipeline 1 | Pipeline 2 finish time | Ratio unpipelined to pipeline 2
1      |                         |                        |                                 |                        |
2      |                         |                        |                                 |                        |
3      |                         |                        |                                 |                        |
4      |                         |                        |                                 |                        |

 Come up with a formula for pipeline speed-up!

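
One way to check your answers is to simulate the laundry. The sketch below is a hedged illustration under stated assumptions, not the intended solution method: each stage serves one person at a time, a load may wait between stages, and unpipelined means a person starts only after the previous person has finished all four stages. The function name `finish_times` is introduced here.

```python
def finish_times(stage_minutes, people, pipelined):
    """Finish time (in minutes from the start) of each person's laundry."""
    if not pipelined:
        total = sum(stage_minutes)
        return [total * (k + 1) for k in range(people)]

    stage_free = [0] * len(stage_minutes)   # time at which each stage is next free
    finishes = []
    for _ in range(people):
        t = 0                               # time this load is ready for the next stage
        for s, length in enumerate(stage_minutes):
            start = max(t, stage_free[s])   # wait for the load and for the stage
            t = start + length
            stage_free[s] = t
        finishes.append(t)
    return finishes


for stages in ([30, 30, 30, 30], [20, 20, 60, 20]):
    base = finish_times(stages, 4, pipelined=False)
    pipe = finish_times(stages, 4, pipelined=True)
    print(stages, base, pipe, [round(b / p, 2) for b, p in zip(base, pipe)])
```

Under these assumptions the second pipeline finishes one person every 60 minutes after the first, which shows the rate being limited by the longest stage.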
Pipelining MIPS
 What makes it easy with MIPS?
 all instructions are the same length
 so fetch and decode stages are similar for all instructions
 just a few instruction formats
 simplifies instruction decode and makes it possible in one stage
 memory operands appear only in loads/stores
 so memory access can be deferred to exactly one later stage
 operands are aligned in memory
 one data transfer instruction requires one memory access stage

Pipelining MIPS
 What makes it hard?
 structural hazards: different instructions, at different stages,
in the pipeline want to use the same hardware resource
 control hazards: succeeding instruction, to put into pipeline,
depends on the outcome of a previous branch instruction,
already in pipeline
 data hazards: an instruction in the pipeline requires data to
be computed by a previous instruction still in the pipeline

 Before actually building the pipelined datapath and control we first briefly examine these potential
hazards individually…
Structural Hazards
 Structural hazard: inadequate hardware to simultaneously support
all instructions in the pipeline in the same clock cycle
 E.g., suppose single – not separate – instruction and data memory
in pipeline below with one read port
 then a structural hazard between first and fourth lw instructions

[Figure: pipelined execution of lw $1, 100($0); lw $2, 200($0); lw $3, 300($0); lw $4, 400($0), each starting 2 ns after the previous one. Hazard if single memory: the data access of the first lw and the instruction fetch of the fourth lw fall in the same cycle.]

 MIPS was designed to be pipelined: structural hazards are easy to avoid!
Control Hazards
 Control hazard: need to make a decision based on the
result of a previous instruction still executing in the pipeline
 Solution 1: Stall the pipeline; add hardware to perform the zero test
for the branch in the ID step
[Figure, pipeline stall: add $4, $5, $6; beq $1, $2, 40; lw $3, 300($0). The beq is fetched 2 ns after the add; the lw is fetched after one bubble, 4 ns after the beq.]
Note that the branch outcome is computed in the ID stage with added hardware (later…)
Control Hazards
 Solution 2 Predict branch outcome
 e.g., predict branch-not-taken :
[Figure, prediction success: add $4, $5, $6; beq $1, $2, 40; lw $3, 300($0) flow through the pipeline 2 ns apart with no stall.]
[Figure, prediction failure: after add $4, $5, $6 and beq $1, $2, 40, the slot following the branch becomes a bubble in every stage, and or $7, $8, $9 is fetched 4 ns after the beq.]
Prediction failure: undo (= flush) lw
Control Hazards
 Solution 3: Delayed branch: always execute the sequentially next
instruction, with the branch taking effect after a one-instruction delay –
it is the compiler’s job to find an instruction that can be put in the slot and is
independent of the branch outcome
 MIPS does this – but it is an option in SPIM (Simulator -> Settings)
[Figure: beq $1, $2, 40; add $4, $5, $6 (delayed branch slot); lw $3, 300($0); or $t0, $t1, $t2, each starting 2 ns after the previous one with no bubble.]

Data Hazards
 Data hazard: instruction needs data from the result of a
previous instruction still executing in pipeline
 Solution: Forward data if possible…

[Figure: instruction pipeline diagram for add $s0, $t0, $t1 with stages IF ID EX MEM WB; shade indicates use: left = write, right = read.]
[Figure: add $s0, $t0, $t1 followed by sub $t2, $s0, $t3. Without forwarding (blue line) data has to go back in time; with forwarding (red line) data is available in time.]

Data Hazards
 Forwarding may not be enough
 e.g., if an R-type instruction following a load uses the result of the load –
called load-use data hazard
[Figure: lw $s0, 20($t1) followed immediately by sub $t2, $s0, $t3. Without a stall it is impossible to provide input to the sub instruction in time.]
[Figure: the same pair with one bubble between them. With a one-stage stall, forwarding can get the data to the sub instruction in time.]

Reordering Code to Avoid
Pipeline Stall (Software Solution)
 Example:
lw $t0, 0($t1)
lw $t2, 4($t1)
sw $t2, 0($t1)    # data hazard: $t2 is used right after it is loaded
sw $t0, 4($t1)

 Reordered code:
lw $t0, 0($t1)
lw $t2, 4($t1)
sw $t0, 4($t1)    # interchanged
sw $t2, 0($t1)    # interchanged

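
The effect of the reordering can be checked mechanically. The sketch below is an illustration only: the tuple encoding `(opcode, destination, sources)` and the function `count_load_use_stalls` are assumptions made here, and the model charges one stall whenever the instruction right after a load reads the loaded register, as on the load-use slide.

```python
def count_load_use_stalls(program):
    """Count load-use stalls in a list of (opcode, dest, sources) tuples."""
    stalls = 0
    for prev, curr in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, curr_sources = curr
        if prev_op == "lw" and prev_dest in curr_sources:
            stalls += 1
    return stalls


original = [
    ("lw", "$t0", ["$t1"]),          # lw $t0, 0($t1)
    ("lw", "$t2", ["$t1"]),          # lw $t2, 4($t1)
    ("sw", None, ["$t2", "$t1"]),    # sw $t2, 0($t1)  uses $t2 right after the load
    ("sw", None, ["$t0", "$t1"]),    # sw $t0, 4($t1)
]
reordered = [original[0], original[1], original[3], original[2]]

print(count_load_use_stalls(original), count_load_use_stalls(reordered))
```

In this model the original order incurs one stall and the reordered code none.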
Pipelined Datapath
 We now move to actually building a pipelined datapath
 First recall the 5 steps in instruction execution
1. Instruction Fetch & PC Increment (IF)
2. Instruction Decode and Register Read (ID)
3. Execution or calculate address (EX)
4. Memory access (MEM)
5. Write result into register (WB)
 Review: single-cycle processor
 all 5 steps done in a single clock cycle
 dedicated hardware required for each step

 What happens if we break the execution into multiple cycles, but keep
the extra hardware?
Review - Single-Cycle Datapath
“Steps”

[Figure: single-cycle datapath (PC, instruction memory, register file, sign extend, shift left 2, ALU, data memory, two adders and the muxes) divided from left to right into five steps:
IF: Instruction Fetch
ID: Instruction Decode
EX: Execute / Address Calc.
MEM: Memory Access
WB: Write Back]
Pipelined Datapath – Key Idea

 What happens if we break the execution into multiple cycles, but keep the extra hardware?
 Answer: We may be able to start executing a new
instruction at each clock cycle - pipelining
 …but we shall need extra registers to hold data
between cycles – pipeline registers

Pipelined Datapath

[Figure: the single-cycle datapath with four pipeline registers inserted between the five steps: IF/ID, ID/EX, EX/MEM, MEM/WB.]

Pipeline registers wide enough to hold data coming in:
IF/ID: 64 bits
ID/EX: 128 bits
EX/MEM: 97 bits
MEM/WB: 64 bits

Only data flowing right to left may cause hazard…, why?
Bug in the Datapath

[Figure: pipelined datapath with the IF/ID, ID/EX, EX/MEM and MEM/WB registers; the write register number (WN) input of the register file is taken directly from the instruction in IF/ID.]

Write register number comes from another later instruction!

Corrected Datapath

[Figure: pipelined datapath in which the 5-bit destination register number travels through ID/EX, EX/MEM and MEM/WB and returns to the WN input of the register file. Register widths: IF/ID 64 bits, ID/EX 133 bits, EX/MEM 102 bits, MEM/WB 69 bits.]

Destination register number is also passed through ID/EX, EX/MEM
and MEM/WB registers, which are now wider by 5 bits
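
The widths can be tallied from the values each pipeline register carries. The field breakdown below is an assumption based on the usual textbook datapath (the slides give only the totals), and all names are introduced here for illustration.

```python
REG_NUMBER_BITS = 5  # width of a MIPS register number

# Assumed contents of each pipeline register before the fix (widths in bits).
BASE_FIELDS = {
    "IF/ID": {"PC + 4": 32, "instruction": 32},
    "ID/EX": {"PC + 4": 32, "read data 1": 32, "read data 2": 32,
              "sign-extended immediate": 32},
    "EX/MEM": {"branch target": 32, "Zero": 1, "ALU result": 32,
               "read data 2": 32},
    "MEM/WB": {"memory read data": 32, "ALU result": 32},
}

# Registers that also carry the destination register number after the fix.
CARRIES_DESTINATION = ("ID/EX", "EX/MEM", "MEM/WB")


def register_width(name, corrected):
    """Width in bits of one pipeline register, before or after the fix."""
    width = sum(BASE_FIELDS[name].values())
    if corrected and name in CARRIES_DESTINATION:
        width += REG_NUMBER_BITS
    return width


for name in BASE_FIELDS:
    print(name, register_width(name, False), register_width(name, True))
```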
Pipelined Example
 Consider the following instruction sequence:
lw $t0, 10($t1)
sw $t3, 20($t4)
add $t5, $t6, $t7
sub $t8, $t9, $t10

Single-Clock-Cycle Diagrams:
Clock Cycles 1 to 8

[Figure: the corrected pipelined datapath, redrawn for each clock cycle with the instruction occupying each stage labeled above it.]

Clock cycle | IF  | ID  | EX  | MEM | WB
1           | LW  |     |     |     |
2           | SW  | LW  |     |     |
3           | ADD | SW  | LW  |     |
4           | SUB | ADD | SW  | LW  |
5           |     | SUB | ADD | SW  | LW
6           |     |     | SUB | ADD | SW
7           |     |     |     | SUB | ADD
8           |     |     |     |     | SUB

Alternative View –
Multiple-Clock-Cycle Diagram
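Time axis: CC 1 to CC 8

Instruction        | CC 1 | CC 2 | CC 3 | CC 4 | CC 5 | CC 6 | CC 7 | CC 8
lw $t0, 10($t1)    | IM   | REG  | ALU  | DM   | REG  |      |      |
sw $t3, 20($t4)    |      | IM   | REG  | ALU  | DM   | REG  |      |
add $t5, $t6, $t7  |      |      | IM   | REG  | ALU  | DM   | REG  |
sub $t8, $t9, $t10 |      |      |      | IM   | REG  | ALU  | DM   | REG
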
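
A multiple-clock-cycle diagram for an ideal pipeline (no stalls) follows a simple rule: instruction i enters stage s in clock cycle i + s + 1. A minimal Python sketch of that rule, with a hypothetical function name:

```python
STAGES = ["IM", "REG", "ALU", "DM", "REG"]


def multi_cycle_diagram(instructions):
    """Return (instruction text, list of per-cycle stage labels) rows,
    assuming one instruction issues per cycle and there are no stalls."""
    total_cycles = len(instructions) + len(STAGES) - 1
    rows = []
    for i, text in enumerate(instructions):
        cells = [""] * total_cycles
        for s, stage in enumerate(STAGES):
            cells[i + s] = stage        # instruction i is in stage s in cycle i + s + 1
        rows.append((text, cells))
    return rows


program = ["lw $t0, 10($t1)", "sw $t3, 20($t4)",
           "add $t5, $t6, $t7", "sub $t8, $t9, $t10"]
for text, cells in multi_cycle_diagram(program):
    print(f"{text:<20}" + " ".join(f"{c:<4}" for c in cells))
```

Four instructions through five stages need 4 + 5 - 1 = 8 clock cycles, matching CC 1 to CC 8 above.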
Notes
 One significant difference in the execution of an R-type
instruction between multicycle and pipelined
implementations:
 register write-back for the R-type instruction is the 5th
(the last write-back) pipeline stage vs. the 4th stage for
the multicycle implementation. Why?
 think of structural hazards when writing to the register
file…
 Worth repeating: the essential difference between the
pipeline and multicycle implementations is the insertion of
pipeline registers to decouple the 5 stages
 The CPI of an ideal pipeline (no stalls) is 1. Why?

Simple Example: Comparing
Performance
 Compare performance for multicycle and pipelined datapaths using the
gcc instruction mix
 assume 2 ns for memory access
 assume gcc instruction mix 23% loads, 13% stores, 19% branches,
2% jumps, 43% ALU
 for pipelined execution assume
 50% of the loads are followed immediately by an instruction that uses
the result of the load; such a load takes 2 clock cycles (1 stall cycle)
 25% of branches are mispredicted; the delay on misprediction is 1 clock
cycle
 jumps always incur 1 clock cycle delay so their average time is 2 clock
cycles

Simple Example: Comparing
Performance
 Multicycle: average instruction time 8.04 ns
 Pipelined:
 loads use 1 cc (clock cycle) when no load-use dependency
and 2 cc when there is dependency – given 50% of loads
are followed by dependency the average cc per load is:
0.5*1+0.5*2=1.5
 stores use 1 cc each
 branches use 1 cc when predicted correctly and 2 cc when
not – given 25% mis-prediction average cc per branch is
0.75*1+0.25*2=1.25
 jumps use 2 cc each
 ALU instructions use 1 cc each
 therefore, average CPI is
$1.5 \times 23\% + 1 \times 13\% + 1.25 \times 19\% + 2 \times 2\% + 1 \times 43\% = 1.1825$
 therefore, average instruction time is $1.1825 \times 2 = 2.365$ ns
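
The comparison can be reproduced in a few lines of Python. This is a sketch under the slide's assumptions; the names are introduced here, and the multicycle cycle counts are the ones used in the earlier CPI example (loads 5, stores 4, ALU 4, branches 3, jumps 3).

```python
CLOCK_NS = 2.0
MIX = {"load": 0.23, "store": 0.13, "branch": 0.19, "jump": 0.02, "alu": 0.43}

MULTICYCLE_CYCLES = {"load": 5, "store": 4, "branch": 3, "jump": 3, "alu": 4}
PIPELINED_CYCLES = {
    "load": 0.5 * 1 + 0.5 * 2,       # half of the loads stall for one cycle
    "store": 1,
    "branch": 0.75 * 1 + 0.25 * 2,   # a quarter of the branches are mispredicted
    "jump": 2,
    "alu": 1,
}


def average_cpi(mix, cycles):
    """Weighted average of cycles per instruction class."""
    return sum(mix[name] * cycles[name] for name in mix)


multicycle_ns = average_cpi(MIX, MULTICYCLE_CYCLES) * CLOCK_NS
pipelined_ns = average_cpi(MIX, PIPELINED_CYCLES) * CLOCK_NS
print(f"multicycle {multicycle_ns:.2f} ns, pipelined {pipelined_ns:.3f} ns, "
      f"speedup {multicycle_ns / pipelined_ns:.2f}")
```

With these numbers the pipelined design is about 3.4 times faster per instruction than the multicycle one.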
Summary
 Techniques described in this chapter to design datapaths and
control are at the core of all modern computer architecture
 Multicycle datapaths offer two great advantages over single-cycle
 functional units can be reused within a single instruction if they are
accessed in different cycles – reducing the need to replicate
expensive logic
 instructions with shorter execution paths can complete quicker by
consuming fewer cycles
 Modern computers, in fact, take the multicycle paradigm to a
higher level to achieve greater instruction throughput:
 pipelining (next topic) where multiple instructions execute
simultaneously by having cycles of different instructions overlap in
the datapath
 the MIPS architecture was designed to be pipelined
