Processor
Processor on MIPS
CPU – RAM interconnection
Instruction Execution
Instruction type in MIPS
arithmetic-logic instructions: add, sub, or, and
add $s1,$s2,$s3    R    $s1 = $s2 + $s3
sub $s1,$s2,$s3    R    $s1 = $s2 - $s3
or  $s1,$t2,$t3    R    $s1 = $t2 | $t3
and $s1,$t2,$t3    R    $s1 = $t2 & $t3
memory-reference instructions: lw, sw
lw $s1,100($s2)    I    $s1 = Memory[$s2+100]
sw $s1,100($s2)    I    Memory[$s2+100] = $s1
control-flow instructions: bne, beq, j
bne $s4,$s5,Lab1   I    if $s4 != $s5 goto Lab1
beq $s4,$s5,Lab2   I    if $s4 == $s5 goto Lab2
j Lab3             J    goto Lab3
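ALU
[Figure: a simple datapath built from a register file, an ALU with flags, two input multiplexers (Mux1, Mux2) feeding operands A and B, and an output demultiplexer (DeMux); buses are marked 8 and 4 bits wide]
Instruction fields: Operation, Operand1, Operand2, Destination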
Implementing MIPS
We're ready to look at an implementation of the MIPS
instruction set
Simplified to contain only
arithmetic-logic instructions: add, sub, and, or, slt
memory-reference instructions: lw, sw
control-flow instructions: beq, j
I-Format: op (6 bits) | rs (5 bits) | rt (5 bits) | offset (16 bits)
J-Format: op (6 bits) | address (26 bits)
ALU Control
R-format Example
op      rs      rt      rd      shamt   funct
6 bits  5 bits  5 bits  5 bits  5 bits  6 bits
0       17      18      8       0       32
000000  10001   10010   01000   00000   100000
00000010001100100100000000100000 (base 2) = 02324020 (base 16)
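The field packing in the R-format example can be checked with a short sketch. The function name `encode_r_type` is hypothetical (it is not part of any MIPS toolchain); it only shifts each field into place using the widths listed above.

```python
def encode_r_type(op, rs, rt, rd, shamt, funct):
    """Pack the six R-format fields into one 32-bit word (illustrative helper)."""
    fields = [(op, 6), (rs, 5), (rt, 5), (rd, 5), (shamt, 5), (funct, 6)]
    word = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"field value {value} does not fit in {width} bits")
        # Make room for the next field, then place it in the low bits.
        word = (word << width) | value
    return word


word = encode_r_type(op=0, rs=17, rt=18, rd=8, shamt=0, funct=32)
print(f"{word:032b}")  # binary form of the instruction word
print(f"{word:08x}")   # hexadecimal form of the instruction word
```

The two printed lines should match the binary and hexadecimal values on the slide.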
Hexadecimal
Base 16
Compact representation of bit strings
4 bits per hex digit
0 0000 4 0100 8 1000 c 1100
1 0001 5 0101 9 1001 d 1101
2 0010 6 0110 a 1010 e 1110
3 0011 7 0111 b 1011 f 1111
Example: eca8 6420
1110 1100 1010 1000 0110 0100 0010 0000
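The 4-bits-per-digit rule can be tried out with a minimal sketch; `hex_to_bits` is a name chosen here for illustration.

```python
def hex_to_bits(hex_string):
    """Expand each hex digit into its 4-bit group, separated by spaces."""
    digits = hex_string.replace(" ", "")
    return " ".join(f"{int(digit, 16):04b}" for digit in digits)


print(hex_to_bits("eca8 6420"))
```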
MIPS I-format Instructions
op rs rt constant or address
6 bits 5 bits 5 bits 16 bits
Multi-Cycle
break fetch/execute cycle into multiple steps
Breaking instructions into steps
We break instructions into the following potential
execution steps – not all instructions require all the
steps – each step takes one clock cycle
1. Instruction fetch and PC increment (IF)
2. Instruction decode and register fetch (ID)
3. Execution, memory address computation, or branch completion
(EX)
4. Memory access or R-type instruction completion (MEM)
5. Memory read completion (WB)
lw $t2, 0($t3)
lw $t3, 4($t3)
beq $t2, $t3, Label #assume not equal
add $t5, $t2, $t3
sw $t5, 8($t3)
Label: j 32
Step 1: Instruction Fetch & PC Increment (IF)
Use PC to get the instruction and put it in the instruction register.
Increment the PC by 4 and put the result back in the PC.
RTL:
IR = Memory[PC];
PC = PC + 4;
Step 2: Instruction Decode and Register Fetch (ID)
Read registers rs and rt in case we need them.
Compute the branch address in case the instruction is a branch.
RTL:
A = Reg[IR[25-21]];
B = Reg[IR[20-16]];
ALUOut = PC + (sign-extend(IR[15-0]) << 2);
Step 3: Execution, Address Computation or Branch Completion (EX)
ALU performs one of four functions depending
on instruction type
memory reference:
ALUOut = A + sign-extend(IR[15-0]);
R-type:
ALUOut = A op B;
branch (instruction completes):
if (A==B) PC = ALUOut;
jump (instruction completes):
PC = PC[31-28] || (IR[25-0] << 2);
branch destination address = (PC + 4) + (4 * offset), where PC is the address of the branch instruction
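The address arithmetic of Steps 2 and 3 can be sketched as follows. The helper names are hypothetical, and the 32-bit wrap-around mask is an assumption about how the PC register behaves.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def branch_target(branch_pc, offset16):
    """(PC + 4) + (4 * offset), where branch_pc is the branch's own address."""
    return (branch_pc + 4 + (sign_extend16(offset16) << 2)) & 0xFFFFFFFF


def jump_target(pc_plus_4, addr26):
    """PC[31-28] concatenated with the 26-bit address field shifted left by 2."""
    return (pc_plus_4 & 0xF0000000) | ((addr26 & 0x03FFFFFF) << 2)


print(branch_target(1024, 18))      # forward branch
print(branch_target(1024, 0xFFFE))  # offset field encodes -2
print(jump_target(1028, 85))
```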
Step 5: Memory Read Completion (WB)
Again depending on instruction type:
Load writes back (instruction completes)
Reg[IR[20-16]] = MDR;
Important: There is no reason from a datapath (or control)
point of view that Step 5 cannot be eliminated by performing
Reg[IR[20-16]] = Memory[ALUOut];
for loads in Step 4. This would eliminate the MDR as well.
The reason this is not done is that, to keep steps
balanced in length, the design restriction is to allow
each step to contain at most one ALU operation, or one
register access, or one memory access.
Summary of Instruction Execution
Step 1: IF (Instruction fetch), all instructions
    IR = Memory[PC]
    PC = PC + 4
Step 2: ID (Instruction decode/register fetch), all instructions
    A = Reg[IR[25-21]]
    B = Reg[IR[20-16]]
    ALUOut = PC + (sign-extend(IR[15-0]) << 2)
Step 3: EX (Execution, address computation, branch/jump completion)
    R-type:            ALUOut = A op B
    Memory-reference:  ALUOut = A + sign-extend(IR[15-0])
    Branches:          if (A == B) then PC = ALUOut
    Jumps:             PC = PC[31-28] || (IR[25-0] << 2)
Step 4: MEM (Memory access or R-type completion)
    R-type:            Reg[IR[15-11]] = ALUOut
    Load:              MDR = Memory[ALUOut]
    Store:             Memory[ALUOut] = B
Step 5: WB (Memory read completion)
    Load:              Reg[IR[20-16]] = MDR
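The summary can be turned into a small step-by-step interpreter. This is only a sketch under stated assumptions: instructions arrive already decoded as Python dicts (the keys `kind`, `rs`, `rt`, `rd`, `imm`, `addr` are made up for illustration), memory is a dict keyed by byte address, and only add, lw, sw, beq, and j are handled.

```python
def sign_extend16(value):
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def run_instruction(instr, pc, regs, mem):
    """Run one decoded instruction through the multicycle steps.

    Returns (new_pc, cycles_used). regs is a list of 32 integers.
    """
    kind = instr["kind"]
    # Step 1 (IF): IR = Memory[PC]; PC = PC + 4
    ir = instr
    pc = pc + 4
    # Step 2 (ID): read rs and rt, compute the branch target in case it is needed
    a = regs[ir.get("rs", 0)]
    b = regs[ir.get("rt", 0)]
    alu_out = pc + (sign_extend16(ir.get("imm", 0)) << 2)
    # Step 3 (EX): branches and jumps complete here
    if kind == "beq":
        if a == b:
            pc = alu_out
        return pc, 3
    if kind == "j":
        return (pc & 0xF0000000) | (ir["addr"] << 2), 3
    if kind == "add":
        alu_out = a + b
        # Step 4: R-type completion, Reg[rd] = ALUOut
        regs[ir["rd"]] = alu_out & 0xFFFFFFFF
        return pc, 4
    if kind in ("lw", "sw"):
        alu_out = a + sign_extend16(ir["imm"])
        if kind == "sw":
            mem[alu_out] = b          # Step 4: Memory[ALUOut] = B
            return pc, 4
        mdr = mem[alu_out]            # Step 4: MDR = Memory[ALUOut]
        regs[ir["rt"]] = mdr          # Step 5: Reg[rt] = MDR
        return pc, 5
    raise ValueError(f"unsupported instruction kind: {kind}")


regs = [0] * 32
regs[11] = 72                         # $t3 is register 11
mem = {72: 5}
print(run_instruction({"kind": "lw", "rs": 11, "rt": 10, "imm": 0}, 1024, regs, mem))
```

Modelling IR, A, B, ALUOut, and MDR as local variables keeps the sketch short; a cycle-accurate model would keep them as state between clock edges.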
Multicycle Execution Step (1):
Instruction Fetch
IR = Memory[PC];
PC = PC + 4;
[Figure: multicycle datapath with PC, Memory (ADDR, RD, WD, MemRead, MemWrite), IR, MDR, Registers (RN1, RN2, WN, WD, RD1, RD2, RegWrite), A, B, ALU (Operation, Zero), and ALUOut. The animation highlights the fetch path: Memory[PC] is loaded into IR while the ALU adds 4 to the PC, and PC + 4 is written back to the PC.]
Multicycle Execution Step (2):
Instruction Decode & Register Fetch
A = Reg[IR[25-21]]; (A = Reg[rs])
B = Reg[IR[20-16]]; (B = Reg[rt])
ALUOut = PC + (sign-extend(IR[15-0]) << 2);
[Figure: multicycle datapath. The animation highlights the register file reads, with Reg[rs] latched into A and Reg[rt] into B, while the ALU computes the branch target address from PC + 4 and the shifted, sign-extended offset.]
Multicycle Execution Step (3):
Memory Reference Instructions
ALUOut = A + sign-extend(IR[15-0]);
[Figure: multicycle datapath. The animation highlights A (Reg[rs]) and the sign-extended offset entering the ALU, which produces the memory address in ALUOut.]
Multicycle Execution Step (3):
ALU Instruction (R-Type)
ALUOut = A op B;
[Figure: multicycle datapath. The animation highlights A (Reg[rs]) and B (Reg[rt]) entering the ALU, which produces the R-type result in ALUOut.]
Multicycle Execution Step (3):
Branch Instructions
if (A == B) PC = ALUOut;
[Figure: multicycle datapath. The animation highlights the ALU comparing A (Reg[rs]) with B (Reg[rt]); the branch target address held in ALUOut is written to the PC when the Zero output indicates equality.]
Multicycle Execution Step (3):
Jump Instruction
PC = PC[31-28] concat (IR[25-0] << 2)
[Figure: multicycle datapath. The animation highlights the jump address, formed from PC[31-28] and IR[25-0] shifted left by 2, being written to the PC.]
Multicycle Execution Step (4):
Memory Access - Read (lw)
MDR = Memory[ALUOut];
[Figure: multicycle datapath. The animation highlights the memory address in ALUOut driving the Memory ADDR input with MemRead asserted; the memory data read is latched into MDR.]
Multicycle Execution Step (4):
Memory Access - Write (sw)
Memory[ALUOut] = B;
[Figure: multicycle datapath. The animation highlights the memory address in ALUOut driving the Memory ADDR input and B (Reg[rt]) driving the Memory WD input, with MemWrite asserted.]
Multicycle Execution Step (4):
ALU Instruction (R-Type)
Reg[IR[15-11]] = ALUOut;
[Figure: multicycle datapath. The animation highlights the R-type result in ALUOut being written to the register selected by the rd field, with RegWrite asserted.]
Multicycle Execution Step (5):
Memory Read Completion (lw)
Reg[IR[20-16]] = MDR;
[Figure: multicycle datapath. The animation highlights the memory data in MDR being written to the register selected by the rt field, with RegWrite asserted.]
Quizzes
Suppose PC = 1024, $t2 = 10, $t1 = 23, $t3 = 71. Run each of the
following instructions step by step:
lw $t2, 0($t3)
sw $t1, 4($t3)
beq $t2, $t3, 18
add $t1, $t2, $t3
bne $t1, $t2,-8
j 85
Simple Questions
How many cycles will it take to execute this code?
lw $t2, 0($t3)
lw $t3, 4($t3)
beq $t2, $t3, Label #assume not equal
add $t5, $t2, $t3
sw $t5, 8($t3)
Label: j 85
Clock time-line
In what cycle does the actual addition of $t2 and $t3 take place?
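One way to check an answer is to add up the per-class cycle counts of the multicycle design (loads 5, stores 4, R-type 4, branches 3, jumps 3). The sketch below assumes the branch is not taken, so all six instructions execute in program order; the function names are made up for illustration.

```python
# Cycles per instruction class in the multicycle design.
CYCLES = {"lw": 5, "sw": 4, "add": 4, "beq": 3, "j": 3}

# The code sequence in program order (beq assumed not taken).
PROGRAM = ["lw", "lw", "beq", "add", "sw", "j"]


def total_cycles(program):
    """Total clock cycles when instructions run one after another."""
    return sum(CYCLES[op] for op in program)


def cycle_of_step(program, index, step):
    """Clock cycle (1-based) in which instruction `index` performs `step`."""
    return sum(CYCLES[op] for op in program[:index]) + step


print(total_cycles(PROGRAM))
# The addition happens in step 3 (EX) of the add, which is instruction 3.
print(cycle_of_step(PROGRAM, 3, 3))
```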
Datapath with Control I
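[Figure: single-cycle datapath with control signals added. PCSrc selects between the PC + 4 adder and the branch-target adder (offset shifted left 2); a new multiplexor selects the destination register field; RegWrite and ALUOp are also shown.]
Adding control to the MIPS Datapath III (and a new multiplexor to select the field that
specifies the destination register): what are the functions of the 9 control signals?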
Control Signals (effect when deasserted = 0, effect when asserted = 1)
RegDst
    0: The register destination number for the Write register comes from the rt field (bits 20-16)
    1: The register destination number for the Write register comes from the rd field (bits 15-11)
RegWrite
    0: None
    1: The register on the Write register input is written with the value on the Write data input
ALUSrc
    0: The second ALU operand comes from the second register file output (Read data 2)
    1: The second ALU operand is the sign-extended, lower 16 bits of the instruction
PCSrc
    0: The PC is replaced by the output of the adder that computes the value of PC + 4
    1: The PC is replaced by the output of the adder that computes the branch target
MemRead
    0: None
    1: Data memory contents designated by the address input are put on the first Read data output
MemWrite
    0: None
    1: Data memory contents designated by the address input are replaced by the value of the Write data input
MemtoReg
    0: The value fed to the register Write data input comes from the ALU
    1: The value fed to the register Write data input comes from the data memory
[Figure: MIPS datapath with the control unit. Instruction [31-26] feeds Control, which outputs RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, and RegWrite; Instruction [5-0] feeds ALU control.]
MIPS datapath with the control unit: input to control is the 6-bit instruction
opcode field, output is seven 1-bit signals and the 2-bit ALUOp signal
PCSrc cannot be set directly from the opcode: the zero test outcome is required
Determining control signals for the MIPS datapath based on instruction opcode
                    R-format  lw   sw   beq
Inputs   Op3           0      0    1    0
         Op2           0      0    0    1
         Op1           0      1    1    0
         Op0           0      1    1    0
Outputs  RegDst        1      0    x    x
         ALUSrc        0      1    1    0
         MemtoReg      0      1    x    x
         RegWrite      1      1    0    0
         MemRead       0      1    0    0
         MemWrite      0      0    1    0
[Figure: PLA implementation with one column per instruction class (R-format, lw, sw, beq) and outputs RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, ALUOp1, ALUOp0]
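The truth table maps directly onto a lookup structure. The sketch below is illustrative only: don't-care entries (x) are stored as `None`, the full 6-bit opcodes are the standard MIPS encodings (the table above shows only Op3 to Op0), and `pc_src` assumes the usual arrangement in which PCSrc is the AND of a Branch control signal and the ALU Zero output.

```python
# Control outputs per instruction class, copied from the truth table.
# None stands for a don't-care (x) entry.
CONTROL = {
    "R-format": dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1, MemRead=0, MemWrite=0),
    "lw": dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1, MemRead=1, MemWrite=0),
    "sw": dict(RegDst=None, ALUSrc=1, MemtoReg=None, RegWrite=0, MemRead=0, MemWrite=1),
    "beq": dict(RegDst=None, ALUSrc=0, MemtoReg=None, RegWrite=0, MemRead=0, MemWrite=0),
}

# Standard MIPS opcodes (assumption: Op5 and Op4 are not shown in the table).
OPCODES = {0b000000: "R-format", 0b100011: "lw", 0b101011: "sw", 0b000100: "beq"}


def control_for(opcode):
    """Look up the control signal settings for a 6-bit opcode."""
    return CONTROL[OPCODES[opcode]]


def pc_src(branch, zero):
    """PCSrc needs the zero test outcome as well as the opcode-derived signal."""
    return branch & zero


print(control_for(0b100011))
print(pc_src(1, 0), pc_src(1, 1))
```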
Implementation
[Figure: control logic block. Inputs: the instruction register opcode field (Op5-Op0) and the state register (S3-S0). Outputs: datapath control signals (including MemWrite, IRWrite, MemtoReg, PCSource, ALUOp, ALUSrcB, ALUSrcA, RegWrite, RegDst) and the next state (NS3-NS0).]
High-level view of FSM implementation: inputs to the combinational logic block are
the current state number and instruction opcode bits; outputs are the next state
number and control signals to be asserted for the current state
FSM Control: PLA Implementation
[Figure: PLA with inputs Op5-Op0 and S3-S0 and outputs PCWrite, PCWriteCond, IorD, MemRead, MemWrite, IRWrite, MemtoReg, PCSource1, PCSource0, ALUOp1, ALUOp0, ALUSrcB1, ALUSrcB0, ALUSrcA, RegWrite, RegDst, NS3, NS2, NS1, NS0]
The upper half is the AND plane that computes all the products. The products are carried
to the lower OR plane by the vertical lines. The sum term for each output is given by
the corresponding horizontal line
E.g., IorD = S0.S1.S2'.S3' + S0.S1'.S2.S3' (where ' marks a complemented bit)
FSM Control: ROM
Implementation
ROM (Read Only Memory)
values of memory locations are fixed ahead of time
A ROM can be used to implement a truth table
if the address is m bits, we can address 2^m entries in the ROM
outputs are the bits of the entry the address points to
[Figure: ROM block with an m-bit address input and an n-bit output; example with m = 3, n = 4]
address   output
0 0 0     0 0 1 1
0 0 1     1 1 0 0
0 1 0     1 1 0 0
0 1 1     1 0 0 0
1 0 0     0 0 0 0
1 0 1     0 0 0 1
1 1 0     0 1 1 0
1 1 1     0 1 1 1
The size of an m-input n-output ROM is 2^m x n bits. Such a ROM can
be thought of as an array of size 2^m with each entry in the array being
n bits
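The array view of a ROM can be written down directly. This sketch stores the m = 3, n = 4 example table as a Python list; `rom_read` and `rom_size_bits` are illustrative names.

```python
M, N = 3, 4  # address width and output width of the example ROM

# Entry i holds the n-bit output for address i, taken from the example table.
ROM = [0b0011, 0b1100, 0b1100, 0b1000, 0b0000, 0b0001, 0b0110, 0b0111]


def rom_read(address):
    """Return the fixed n-bit entry stored at the given m-bit address."""
    if not 0 <= address < len(ROM):
        raise IndexError(f"address {address} is outside the {M}-bit range")
    return ROM[address]


def rom_size_bits(m, n):
    """An m-input, n-output ROM holds 2^m entries of n bits each."""
    return (2 ** m) * n


print(f"{rom_read(0b110):0{N}b}")
print(rom_size_bits(M, N))
```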
Microprogramming
Microprogramming is a method of specifying FSM control that
resembles a programming language: textual rather than graphical
this is appropriate when the FSM becomes very large, e.g., if the
instruction set is large and/or the number of cycles per instruction is
large
in such situations graphical representation becomes difficult as
there may be thousands of states and even more arcs joining them
a microprogram is a specification; implementation is by ROM or PLA
A microprogram is a sequence of microinstructions
each microinstruction has eight fields (label + 7 functional)
Label: used to control microcode sequencing
ALU control: specify operation to be done by ALU
SRC1: specify source for first ALU operand
SRC2: specify source for second ALU operand
Register control: specify read/write for register file
Memory: specify read/write for memory
PCWrite control: specify the writing of the PC
Sequencing: specify choice of next microinstruction
Example: CPI in a multicycle
CPU
Assume
the control design of the previous slide
An instruction mix of 22% loads, 11% stores, 49% R-type operations,
16% branches, and 2% jumps
What is the CPI assuming each step requires 1 clock cycle?
Solution:
Number of clock cycles from previous slide for each instruction class:
loads 5, stores 4, R-type instructions 4, branches 3, jumps 3
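\[
\mathrm{CPI} = \frac{\text{CPU clock cycles}}{\text{instruction count}}
= \frac{\sum_i \left(\text{instruction count}_i \times \mathrm{CPI}_i\right)}{\text{instruction count}}
= \sum_i \frac{\text{instruction count}_i}{\text{instruction count}} \times \mathrm{CPI}_i
\]
\[
= 0.22 \times 5 + 0.11 \times 4 + 0.49 \times 4 + 0.16 \times 3 + 0.02 \times 3 = 4.04
\]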
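The weighted sum is easy to recompute. The dictionary names below are chosen for illustration; the numbers are the instruction mix and per-class cycle counts from the example.

```python
# Fraction of executed instructions in each class.
MIX = {"loads": 0.22, "stores": 0.11, "r_type": 0.49, "branches": 0.16, "jumps": 0.02}

# Clock cycles needed by each class in the multicycle design.
CYCLES = {"loads": 5, "stores": 4, "r_type": 4, "branches": 3, "jumps": 3}


def cpi(mix, cycles):
    """Average cycles per instruction: sum of fraction times cycles per class."""
    return sum(mix[name] * cycles[name] for name in mix)


print(round(cpi(MIX, CYCLES), 2))
```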
Performance in MIPS
Given the result on the previous slide (CPI = 4.04)
The CPU clock is 2 GHz (1 GHz = 10^9 Hz)
What is MIPS (Million Instructions Per Second)?
What is the MIPS of the above system?
Solution:
1 second has 2 x 10^9 cycles
MIPS = (2 x 10^9 / 4.04) / 10^6 = 2 x 10^3 / 4.04 ≈ 495
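The same calculation as a small sketch; `mips_rating` is an illustrative name, and the inputs are the 2 GHz clock and the CPI of 4.04 from the example.

```python
def mips_rating(clock_hz, cpi):
    """Millions of instructions per second for a given clock rate and CPI."""
    instructions_per_second = clock_hz / cpi
    return instructions_per_second / 1e6


print(round(mips_rating(2e9, 4.04)))
```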
Enhancing Performance
with Pipelining
Pipelining
Start work ASAP!! Do not waste time!
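[Figure: laundry timeline from 6 PM to 2 AM with loads in task order (A, B, ...), not pipelined: each load finishes before the next one starts]
Assume 30 min. each task (wash, dry, fold, store) and that
separate tasks use separate hardware and so can be overlapped
[Figure: the same timeline, pipelined: a new load starts as soon as the previous one leaves the first stage]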
Pipelined vs. Single-Cycle
Instruction Execution: the Plan
[Figure: program execution order against time (2 to 18 ns) for lw $1, 100($0); lw $2, 200($0); lw $3, 300($0). Single-cycle: each instruction passes through instruction fetch, Reg, ALU, data access, and Reg in 8 ns, and instructions start 8 ns apart. Pipelined: the same five stages take 2 ns each, and instructions start 2 ns apart.]
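The timing in the figure can be reproduced with two small formulas. This is a sketch that assumes an ideal pipeline with no stalls, 8 ns per single-cycle instruction, and five 2 ns stages; the function names are illustrative.

```python
def single_cycle_time(instructions, instruction_ns):
    """Each instruction runs to completion before the next one starts."""
    return instructions * instruction_ns


def pipelined_time(instructions, stages, stage_ns):
    """The first instruction fills the pipeline; each later one adds one stage time."""
    return (stages + instructions - 1) * stage_ns


for n in (3, 1000):
    unpipelined = single_cycle_time(n, 8)
    pipelined = pipelined_time(n, 5, 2)
    print(n, unpipelined, pipelined, round(unpipelined / pipelined, 2))
```

For long instruction streams the ratio approaches 8 ns / 2 ns = 4 rather than the stage count of 5, because the pipeline clock is set by the slowest stage.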
Pipelining: Keep in Mind
Pipelining does not reduce the latency of a single
task; it increases the throughput of the entire workload
Pipeline rate is limited by the longest stage
potential speedup = number of pipe stages
unbalanced lengths of pipe stages reduce
speedup
Time to fill the pipeline and time to drain it (when
there is slack in the pipeline) reduces
speedup
Example Problem
Problem: for the laundry, fill in the following table when
1. the stage lengths are 30, 30, 30, 30 min., resp.
2. the stage lengths are 20, 20, 60, 20 min., resp.
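A short sketch for experimenting with the two cases. It assumes identical loads, one machine per stage, and that a load may wait between stages until the next machine is free; under those assumptions the pipelined finish time is the time for one load plus one longest-stage delay for every additional load. The function names and the choice of four loads are illustrative only.

```python
def sequential_minutes(loads, stage_minutes):
    """Every load finishes all stages before the next load starts."""
    return loads * sum(stage_minutes)


def pipelined_minutes(loads, stage_minutes):
    """First load takes the full time; each later load is paced by the longest stage."""
    return sum(stage_minutes) + (loads - 1) * max(stage_minutes)


for stages in ([30, 30, 30, 30], [20, 20, 60, 20]):
    seq = sequential_minutes(4, stages)
    pipe = pipelined_minutes(4, stages)
    print(stages, seq, pipe, round(seq / pipe, 2))
```

With balanced stages the speedup grows toward the number of stages as more loads are added; with the 60-minute stage it is capped at 120 / 60 = 2.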
Pipelining MIPS
What makes it hard?
structural hazards: different instructions, at different stages,
in the pipeline want to use the same hardware resource
control hazards: succeeding instruction, to put into pipeline,
depends on the outcome of a previous branch instruction,
already in pipeline
data hazards: an instruction in the pipeline requires data to
be computed by a previous instruction still in the pipeline
[Figure: pipelined execution of lw $1, 100($0); lw $2, 200($0); lw $3, 300($0); lw $4, 400($0), each passing through instruction fetch, Reg, ALU, data access, and Reg at 2 ns per stage. Label: "Hazard if single memory".]
[Figure: pipeline stall, with instruction starts spaced 2 ns and 4 ns apart]
Control Hazards
Solution 2: Predict branch outcome
e.g., predict branch-not-taken:
[Figure: prediction success. add $4, $5, $6; beq $1, $2, 40; lw $3, 300($0) enter the pipeline 2 ns apart and all complete.]
[Figure: prediction failure: undo (= flush) lw. add $4, $5, $6 and beq $1, $2, 40 are followed by a bubble in every stage; or $7, $8, $9 starts 4 ns after beq.]
Control Hazards
Solution 3 Delayed branch: always execute the instruction that sequentially
follows the branch, so the branch takes effect after a one-instruction delay.
It is the compiler's job to find an instruction that can be put in the slot and
is independent of the branch outcome
MIPS does this, but it is an option in SPIM (Simulator -> Settings)
[Figure: delayed-branch pipeline diagram. Recovered rows: lw $3, 300($0) and or $t0, $t1, $t2, each starting 2 ns after the previous instruction.]
Data Hazards
Data hazard: instruction needs data from the result of a
previous instruction still executing in pipeline
Solution Forward data if possible…
Instruction pipeline diagram (shade indicates use: left = write, right = read):
add $s0, $t0, $t1    IF ID EX MEM WB
[Figure: add $s0, $t0, $t1 followed by sub $t2, $s0, $t3, each passing through IF ID EX MEM WB. Without forwarding (blue line), data has to go back in time; with forwarding (red line), data is available in time.]
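The timing argument behind forwarding can be sketched as a small calculation. It assumes the five-stage pipeline and that the register file is written in the first half of a cycle and read in the second half (the left = write, right = read convention of the diagram). The function and its parameters are illustrative.

```python
STAGE_CYCLE = {"IF": 0, "ID": 1, "EX": 2, "MEM": 3, "WB": 4}


def stalls_needed(producer_op, distance=1, forwarding=True):
    """Stall cycles for a consumer issued `distance` instructions after the producer."""
    if forwarding:
        # The value exists at the end of EX (ALU op) or at the end of MEM (load),
        # and the consumer needs it at the start of its own EX stage.
        ready_after = STAGE_CYCLE["MEM"] if producer_op == "lw" else STAGE_CYCLE["EX"]
        needed_in = distance + STAGE_CYCLE["EX"]
        return max(0, ready_after + 1 - needed_in)
    # Without forwarding the consumer reads the register file in ID, which can
    # share a cycle with the producer's WB (write first half, read second half).
    ready_in = STAGE_CYCLE["WB"]
    needed_in = distance + STAGE_CYCLE["ID"]
    return max(0, ready_in - needed_in)


print(stalls_needed("add", forwarding=False))  # add followed by a dependent sub
print(stalls_needed("add", forwarding=True))
print(stalls_needed("lw", forwarding=True))    # load followed by a dependent sub
```

Under these assumptions the add/sub pair needs two stall cycles without forwarding and none with it, while a load followed immediately by a dependent instruction still needs one.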
Data Hazards
Forwarding may not be enough
e.g., if an R-type instruction following a load uses the result of the load –
called load-use data hazard
[Figure: lw $s0, 20($t1) followed immediately by sub $t2, $s0, $t3, each passing through IF ID EX MEM WB. Without a stall it is impossible to provide input to the sub instruction in time.]
[Figure: the same lw followed by a bubble. With a one-stage stall, forwarding can get the data to the sub instruction in time.]
Reordered code:
lw $t0, 0($t1)
lw $t2, 4($t1)
sw $t0, 4($t1)    # interchanged
sw $t2, 0($t1)    # interchanged
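A sketch of how a compiler or student might count load-use stalls in a short sequence. It makes a simplifying assumption: any instruction that reads a register loaded by the immediately preceding lw costs one stall, including a store of that value. The parser handles only the instruction forms used here, and the order with the two stores swapped back is inferred from the "interchanged" label rather than taken from the slide.

```python
def parse(instr):
    """Return (op, destination, sources) for lw, sw, and three-register forms."""
    op, rest = instr.split(None, 1)
    args = [a.strip() for a in rest.split(",")]
    if op in ("lw", "sw"):
        base = args[1][args[1].index("(") + 1 : -1]  # register inside offset(base)
        if op == "lw":
            return op, args[0], [base]
        return op, None, [args[0], base]  # sw reads both the data and base registers
    return op, args[0], args[1:]


def load_use_stalls(program):
    """Count instructions that read the result of the lw directly before them."""
    stalls = 0
    prev_op, prev_dest = None, None
    for instr in program:
        op, dest, sources = parse(instr)
        if prev_op == "lw" and prev_dest in sources:
            stalls += 1
        prev_op, prev_dest = op, dest
    return stalls


reordered = ["lw $t0, 0($t1)", "lw $t2, 4($t1)", "sw $t0, 4($t1)", "sw $t2, 0($t1)"]
stores_swapped_back = [reordered[0], reordered[1], reordered[3], reordered[2]]

print(load_use_stalls(["lw $s0, 20($t1)", "sub $t2, $s0, $t3"]))
print(load_use_stalls(stores_swapped_back))
print(load_use_stalls(reordered))
```

Under this model the lw/sub pair and the swapped-back order each cost one stall, and the reordered code costs none, since the store after the second lw no longer uses $t2.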
Pipelined Datapath
We now move to actually building a pipelined datapath
First recall the 5 steps in instruction execution
1. Instruction Fetch & PC Increment (IF)
2. Instruction Decode and Register Read (ID)
3. Execution or address calculation (EX)
4. Memory access (MEM)
5. Write result into register (WB)
Review: single-cycle processor
all 5 steps done in a single clock cycle
dedicated hardware required for each step
What happens if we break the execution into multiple cycles, but keep
the extra hardware?
Review - Single-Cycle Datapath
[Figure, built up over several slides: the single-cycle datapath (PC, instruction memory, register file, ALU, data memory, sign-extend unit, multiplexers, and the adders for PC + 4 and the branch target) divided into five "steps": IF (Instruction Fetch), ID (Instruction Decode), EX (Execute / Address Calc.), MEM (Memory Access), WB (Write Back).]
Pipelined Datapath – Key Idea
Pipelined Datapath
[Figure, repeated over several slides: the datapath (PC, instruction memory, register file, ALU, data memory, sign-extend unit, multiplexers) as the starting point for pipelining.]
Pipelined Datapath
Pipeline registers
[Figure, built up over several slides: pipeline registers inserted into the datapath between stages. Recovered register labels: IF/ID, ID/EX.]
[Figure, built up over several slides: the pipeline registers annotated with their widths, left to right: 64 bits, 128 bits, 97 bits, 64 bits.]
[Figure: pipelined datapath with an additional 5-bit signal carried through the pipeline registers back to the register file write port (WN). Register widths, left to right: 64 bits, 133 bits, 102 bits, 69 bits.]
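The register widths labeled in the datapath figures (64, 128, 97, 64 bits, then 64, 133, 102, 69 bits) can be accounted for field by field. The breakdown below is an interpretation, not something stated on the slides: it assumes 32-bit words, a 1-bit Zero flag, and a 5-bit write register number carried along with the instruction in the second set of widths.

```python
WORD = 32     # bits in a data word, an address, or an instruction
REG_NUM = 5   # bits needed to name one of 32 registers

# Assumed contents of each pipeline register, datapath signals only (no control).
BASE_FIELDS = {
    "IF/ID": {"PC+4": WORD, "instruction": WORD},
    "ID/EX": {"PC+4": WORD, "RD1": WORD, "RD2": WORD, "sign-extended immediate": WORD},
    "EX/MEM": {"branch target": WORD, "Zero": 1, "ALU result": WORD, "RD2": WORD},
    "MEM/WB": {"memory read data": WORD, "ALU result": WORD},
}


def register_widths(carry_write_register):
    """Width in bits of each pipeline register.

    If carry_write_register is True, the write register number travels with
    the instruction through every register after IF/ID.
    """
    widths = {}
    for name, fields in BASE_FIELDS.items():
        width = sum(fields.values())
        if carry_write_register and name != "IF/ID":
            width += REG_NUM
        widths[name] = width
    return widths


print(register_widths(carry_write_register=False))
print(register_widths(carry_write_register=True))
```

The extra 5 bits matter because by the time a load reaches write-back, the instruction in IF/ID is a later one; the destination register number has to accompany the data.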
Single-Clock-Cycle Diagrams: Clock Cycles 1 to 8
[Figures: one datapath snapshot per clock cycle, with the instruction occupying each stage labeled above it.]
Clock cycle   IF    ID    EX    MEM   WB
1             LW    -     -     -     -
2             SW    LW    -     -     -
3             ADD   SW    LW    -     -
4             SUB   ADD   SW    LW    -
5             -     SUB   ADD   SW    LW
6             -     -     SUB   ADD   SW
7             -     -     -     SUB   ADD
8             -     -     -     -     SUB
Alternative View: Multiple-Clock-Cycle Diagram
Time axis: CC 1 through CC 8
lw $t0, 10($t1)    IM REG ALU DM REG (CC 1 to CC 5)
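A multiple-clock-cycle diagram can be generated mechanically. The sketch below uses the lw, sw, add, sub sequence of the single-clock-cycle diagrams and assumes an ideal pipeline with no stalls; the stage labels follow the lw row above.

```python
STAGES = ["IM", "REG", "ALU", "DM", "REG"]


def multi_cycle_diagram(instructions):
    """Return (n_cycles, rows); each row maps clock cycles to stage names."""
    n_cycles = len(instructions) + len(STAGES) - 1
    rows = []
    for i, name in enumerate(instructions):
        cells = [""] * n_cycles
        for s, stage in enumerate(STAGES):
            cells[i + s] = stage  # instruction i enters the pipeline in cycle i + 1
        rows.append((name, cells))
    return n_cycles, rows


n_cycles, rows = multi_cycle_diagram(["lw", "sw", "add", "sub"])
print(" " * 6 + "".join(f"CC{c:<4}" for c in range(1, n_cycles + 1)))
for name, cells in rows:
    print(f"{name:<6}" + "".join(f"{cell:<6}" for cell in cells))
```

Four instructions through five stages take 4 + 5 - 1 = 8 clock cycles, which matches the CC 1 to CC 8 time axis.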
Notes
One significant difference in the execution of an R-type
instruction between multicycle and pipelined
implementations:
register write-back for an R-type instruction happens in the 5th
(last, write-back) pipeline stage vs. the 4th step in
the multicycle implementation. Why?
think of structural hazards when writing to the register
file…
Worth repeating: the essential difference between the
pipeline and multicycle implementations is the insertion of
pipeline registers to decouple the 5 stages
The CPI of an ideal pipeline (no stalls) is 1. Why?
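The register write-back question in the notes can be explored with a short sketch. It assumes one instruction is issued per cycle and that the register file has a single write port; the function name and the stage numbering (1 to 5) are illustrative.

```python
def register_write_conflicts(program, write_stage_of):
    """Return (cycle, earlier_op, later_op) for writes that land in the same cycle.

    program is a list of mnemonics issued one per cycle starting at cycle 1;
    write_stage_of maps a mnemonic to the stage (1..5) in which it writes the
    register file. Mnemonics missing from the map write no register.
    """
    writer_in_cycle = {}
    conflicts = []
    for i, op in enumerate(program):
        if op not in write_stage_of:
            continue
        cycle = (i + 1) + write_stage_of[op] - 1
        if cycle in writer_in_cycle:
            conflicts.append((cycle, writer_in_cycle[cycle], op))
        writer_in_cycle[cycle] = op
    return conflicts


# R-type writes in stage 4 (as in the multicycle design): clash with the lw before it.
print(register_write_conflicts(["lw", "add"], {"lw": 5, "add": 4}))
# Every instruction writes in stage 5: no clash.
print(register_write_conflicts(["lw", "add"], {"lw": 5, "add": 5}))
```

If an R-type instruction wrote in stage 4, it would reach the register file in the same cycle as a preceding load. Making every instruction write in stage 5 avoids that hazard at no cost in throughput.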
Simple Example: Comparing Performance
Compare performance for multicycle and pipelined datapaths using the
gcc instruction mix
assume 2 ns for memory access
assume gcc instruction mix: 23% loads, 13% stores, 19% branches,
2% jumps, 43% ALU
for pipelined execution assume
50% of the loads are followed immediately by an instruction that uses
the result of the load; such a load takes 2 clock cycles (1 stall cycle)
25% of branches are mispredicted; delay on misprediction is 1 clock
cycle
jumps always incur 1 clock cycle delay, so their average time is 2 clock
cycles
Simple Example: Comparing Performance (continued)
Multicycle: average instruction time 8.04 ns
Pipelined:
loads use 1 cc (clock cycle) when there is no load-use dependency
and 2 cc when there is one; given that 50% of loads
are followed by a dependent instruction, the average cc per load is
0.5 × 1 + 0.5 × 2 = 1.5
stores use 1 cc each
branches use 1 cc when predicted correctly and 2 cc when
not; given 25% misprediction, the average cc per branch is
0.75 × 1 + 0.25 × 2 = 1.25
jumps use 2 cc each
ALU instructions use 1 cc each
therefore, average CPI is
1.5 × 23% + 1 × 13% + 1.25 × 19% + 2 × 2% + 1 × 43% = 1.1825
therefore, average instruction time is 1.1825 × 2 ns = 2.365 ns
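The arithmetic above can be checked with a few lines. The pipelined cycle counts come directly from the example. The multicycle counts (5, 4, 3, 3, 4 cycles for loads, stores, branches, jumps, ALU) are an assumption based on the five-step multicycle design, chosen because they reproduce the stated 8.04 ns.

```python
CLOCK_NS = 2.0
MIX = {"load": 0.23, "store": 0.13, "branch": 0.19, "jump": 0.02, "alu": 0.43}

# Assumed cycles per instruction class in the multicycle implementation.
MULTICYCLE_CYCLES = {"load": 5, "store": 4, "branch": 3, "jump": 3, "alu": 4}

# Average cycles per instruction class in the pipelined implementation.
PIPELINED_CYCLES = {
    "load": 0.5 * 1 + 0.5 * 2,       # half of the loads stall for one cycle
    "store": 1,
    "branch": 0.75 * 1 + 0.25 * 2,   # a quarter of branches are mispredicted
    "jump": 2,
    "alu": 1,
}


def average_cpi(cycles):
    """Weight each class's cycle count by its share of the instruction mix."""
    return sum(MIX[kind] * cycles[kind] for kind in MIX)


multicycle_ns = average_cpi(MULTICYCLE_CYCLES) * CLOCK_NS
pipelined_ns = average_cpi(PIPELINED_CYCLES) * CLOCK_NS
print(f"multicycle: {multicycle_ns:.3f} ns per instruction")
print(f"pipelined:  CPI {average_cpi(PIPELINED_CYCLES):.4f}, {pipelined_ns:.3f} ns per instruction")
print(f"speedup:    {multicycle_ns / pipelined_ns:.2f}x")
```

Under these assumptions the pipelined design is roughly 3.4 times faster than the multicycle one on this mix, short of the ideal factor because of the load, branch, and jump penalties.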
Summary
Techniques described in this chapter to design datapaths and
control are at the core of all modern computer architecture
Multicycle datapaths offer two great advantages over single-cycle datapaths:
functional units can be reused within a single instruction if they are
accessed in different cycles – reducing the need to replicate
expensive logic
instructions with shorter execution paths can complete quicker by
consuming fewer cycles
Modern computers, in fact, take the multicycle paradigm to a
higher level to achieve greater instruction throughput:
pipelining (next topic) where multiple instructions execute
simultaneously by having cycles of different instructions overlap in
the datapath
the MIPS architecture was designed to be pipelined