Carnegie Mellon

Multi-cycle MIPS Processor

Design of Digital Circuits 2014

Srdjan Capkun
Frank K. Gürkaynak

http://www.syssec.ethz.ch/education/Digitaltechnik_14

Adapted from Digital Design and Computer Architecture, David Money Harris & Sarah L. Harris ©2007 Elsevier

What Will We Learn?

• What are the problems of the single-cycle processor?
• Multi-cycle architecture for MIPS
• Determining the performance of the multi-cycle processor

Single-cycle MIPS Processor

• Single-cycle microarchitecture:
  + simple
  - cycle time limited by longest instruction (lw)
  - two adders/ALUs and two memories

Single-cycle Performance
[Figure: single-cycle MIPS datapath and control unit (Instruction Memory, Register File, ALU, Data Memory, Sign Extend, PCPlus4 and PCBranch adders), annotated with control signal values.]

Delay of Individual Components (Example)

Element               Parameter   Delay (ps)
Register clock-to-Q   tpcq_PC     30
Register setup        tsetup      20
Multiplexer           tmux        25
ALU                   tALU        200
Memory read           tmem        250
Register file read    tRFread     150
Register file setup   tRFsetup    20

Processor Performance
• How fast is my program?
  ◦ Every program consists of a series of instructions
  ◦ Each instruction needs to be executed
• So how fast are my instructions?
  ◦ Instructions are realized on the hardware
  ◦ They can take one or more clock cycles to complete
  ◦ Cycles per Instruction = CPI
• How much time is one clock cycle?
  ◦ The critical path determines how much time one cycle requires = clock period
  ◦ 1 / clock period = clock frequency = how many cycles can be done each second

Processor Performance
• Now as a general formula
  ◦ Our program consists of executing N instructions
  ◦ Our processor needs CPI cycles for each instruction
  ◦ The maximum clock speed of the processor is f, and the clock period is therefore T = 1/f
• Our program will execute in

  N × CPI × (1/f) = N × CPI × T seconds
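
The formula above can be sketched in a few lines of Python. The function name `execution_time` and the sample workload are illustrative assumptions, not values from the slides.

```python
def execution_time(n_instructions: float, cpi: float, clock_hz: float) -> float:
    """Return the run time in seconds: N x CPI x T, with T = 1 / f."""
    period_s = 1.0 / clock_hz          # clock period T
    return n_instructions * cpi * period_s


# Hypothetical workload: 1 billion instructions, CPI of 2, 1 GHz clock.
base = execution_time(1e9, 2.0, 1e9)

# Each of the three factors scales the run time linearly.
half_cpi = execution_time(1e9, 1.0, 1e9)      # halving CPI halves the time
double_clock = execution_time(1e9, 2.0, 2e9)  # doubling f halves the time
print(base, half_cpi, double_clock)
```

Because the three factors multiply, an improvement in one of them only pays off if it does not make another one worse, which is the trade-off the rest of the lecture explores.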


How can I Make the Program Run Faster?

N × CPI × (1/f)
• Reduce the number of instructions
  ◦ Make instructions that ‘do’ more (CISC)
  ◦ Use better compilers
• Use fewer cycles to perform each instruction
  ◦ Simpler instructions (RISC)
  ◦ Use multiple units/ALUs/cores in parallel
• Increase the clock frequency
  ◦ Find a ‘newer’ technology to manufacture
  ◦ Redesign time-critical components
  ◦ Adopt pipelining

Multi-cycle MIPS Processor

• Single-cycle microarchitecture:
  + simple
  - cycle time limited by longest instruction (lw)
  - two adders/ALUs and two memories

• Multi-cycle microarchitecture:
  + higher clock speed
  + simpler instructions run faster
  + reuse expensive hardware on multiple cycles
  - sequencing overhead paid many times

• Same design steps: datapath & control

What Do We Want To Optimize

• Single-cycle architecture uses two memories
  ◦ One memory stores instructions, the other data
  ◦ We want to use a single memory (smaller size)
• Single-cycle architecture needs three adders
  ◦ ALU, PC, branch address calculation
  ◦ We want to use the ALU for all operations (smaller size)
• In the single-cycle architecture all instructions take one cycle
  ◦ The most complex operation slows down everything!
  ◦ Divide all instructions into multiple steps
  ◦ Simpler instructions can take fewer cycles (average case may be faster)

Consider lw instruction
• For an instruction such as: lw $t0, 0x20($t1)
• We need to:
  ◦ Read the instruction from memory
  ◦ Then read $t1 from the register array
  ◦ Add the immediate value (0x20) to calculate the memory address
  ◦ Read the content of this address
  ◦ Write this content to the register $t0
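
The five steps above can be traced on a toy machine. This is only a sketch: the dictionary-based `registers` and `memory`, the symbolic instruction tuple, and the sample values are all hypothetical and chosen for readability.

```python
# Toy machine state (hypothetical layout, not from the slides).
registers = {"$t0": 0, "$t1": 0x100}
memory = {
    0x000: ("lw", "$t0", 0x20, "$t1"),  # the instruction, kept symbolic
    0x120: 1234,                         # data word at $t1 + 0x20
}


def execute_lw(pc: int) -> None:
    _, rt, imm, rs = memory[pc]   # 1. read the instruction from memory
    base = registers[rs]          # 2. read $t1 from the register array
    address = base + imm          # 3. add the immediate to form the address
    data = memory[address]        # 4. read the content of this address
    registers[rt] = data          # 5. write this content to $t0


execute_lw(0x000)
print(registers["$t0"])  # 1234
```

Note that steps 1 and 4 both read the same `memory`, which is exactly why a multi-cycle design can share one memory between instructions and data.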


Multi-cycle Datapath: instruction fetch

• First consider executing lw
• STEP 1: Fetch instruction

[Figure: multi-cycle datapath for instruction fetch, showing the PC, the shared Instr / Data Memory, the instruction register enabled by IRWrite, and the Register File.]

lw reads from the memory location [rs] + imm and writes the value to register [rt].

I-Type
op       rs       rt       imm
6 bits   5 bits   5 bits   16 bits
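
As an illustration of the I-type layout, the fields can be pulled out of a 32-bit word with shifts and masks. The helper name `decode_i_type` is hypothetical; the opcode of lw (35) and the register numbers ($t0 = 8, $t1 = 9) come from the standard MIPS encoding rather than from the slides.

```python
def decode_i_type(word: int) -> dict:
    """Split a 32-bit I-type instruction into op, rs, rt and imm."""
    imm = word & 0xFFFF                      # bits 15:0
    # Sign-extend the 16-bit immediate, as the Sign Extend unit does.
    sign_imm = imm - 0x10000 if imm & 0x8000 else imm
    return {
        "op": (word >> 26) & 0x3F,           # bits 31:26
        "rs": (word >> 21) & 0x1F,           # bits 25:21
        "rt": (word >> 16) & 0x1F,           # bits 20:16
        "imm": imm,
        "sign_imm": sign_imm,
    }


# lw $t0, 0x20($t1) encodes as 0x8D280020.
fields = decode_i_type(0x8D280020)
print(fields)
```

The bit ranges in the comments match the 25:21, 20:16 and 15:0 labels that appear on the datapath figures in the following slides.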

Multi-cycle Datapath: lw register read

[Figure: datapath with the register read added. Instr[25:21] (rs) drives Register File port A1, and RD1 is stored in register A. Control signal shown: IRWrite.]

Multi-cycle Datapath: lw immediate

[Figure: datapath with the Sign Extend unit added. Instr[15:0] is sign-extended to SignImm. Control signal shown: IRWrite.]

Multi-cycle Datapath: lw address

[Figure: datapath with the ALU added. SrcA and SrcB feed the ALU, and ALUResult is stored in register ALUOut. Control signals shown: IRWrite, ALUControl2:0.]

Multi-cycle Datapath: lw memory read

[Figure: datapath with the memory read added. A multiplexer controlled by IorD selects the memory address Adr, and the value read is stored in register Data. Control signals shown: IorD, IRWrite, ALUControl2:0.]

Multi-cycle Datapath: lw write register

[Figure: datapath with the register write added. Instr[20:16] (rt) drives Register File port A3, and Data drives WD3. Control signals shown: IorD, IRWrite, RegWrite, ALUControl2:0.]

Multi-cycle Datapath: increment PC

[Figure: datapath with the PC increment added. The ALUSrcA multiplexer selects SrcA, and the ALUSrcB multiplexer (inputs 00, 01, 10, 11) selects SrcB, with the constant 4 on input 01. Control signals shown: PCWrite, IorD, IRWrite, RegWrite, ALUSrcA, ALUSrcB1:0, ALUControl2:0.]

Multi-cycle Datapath: sw
• Write data in rt to memory

[Figure: datapath extended for sw. Instr[20:16] (rt) drives Register File port A2, and RD2 is stored in register B, which supplies the memory write data WD. Control signals shown: PCWrite, IorD, MemWrite, IRWrite, RegWrite, ALUSrcA, ALUSrcB1:0, ALUControl2:0.]

Multi-cycle Datapath: R-type Instructions

• Read from rs and rt
• Write ALUResult to register file
• Write to rd (instead of rt)

[Figure: datapath extended for R-type instructions. The RegDst multiplexer selects Instr[20:16] or Instr[15:11] for A3, and the MemtoReg multiplexer selects the value written to WD3. Control signals shown: PCWrite, IorD, MemWrite, IRWrite, RegDst, MemtoReg, RegWrite, ALUSrcA, ALUSrcB1:0, ALUControl2:0.]

Multi-cycle Datapath: beq

• Determine whether values in rs and rt are equal
• Calculate branch target address:
  BTA = (sign-extended immediate << 2) + (PC + 4)
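
A minimal sketch of the branch target calculation above. The function name `branch_target` and the sample addresses are illustrative assumptions; the result is masked to 32 bits to mimic the width of the PC.

```python
def branch_target(pc: int, imm16: int) -> int:
    """BTA = (sign-extended immediate << 2) + (PC + 4), as a 32-bit value."""
    # Sign-extend the 16-bit immediate field.
    sign_imm = imm16 - 0x10000 if imm16 & 0x8000 else imm16
    return ((sign_imm << 2) + (pc + 4)) & 0xFFFFFFFF


# Forward branch: skip 3 instructions past the one after the branch.
forward = branch_target(0x00400000, 0x0003)
# Backward branch with immediate -1: jumps back to the branch itself.
backward = branch_target(0x00400010, 0xFFFF)
print(hex(forward), hex(backward))
```

The shift by 2 converts a word offset into a byte offset, which is why the datapath figure contains a `<<2` block after Sign Extend.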
[Figure: datapath extended for beq. SignImm shifted left by 2 feeds the ALUSrcB multiplexer, the PCSrc multiplexer selects the next PC, and PCEn is derived from PCWrite, Branch, and the ALU Zero flag. Control signals shown: IorD, MemWrite, IRWrite, RegDst, MemtoReg, RegWrite, ALUSrcA, ALUSrcB1:0, ALUControl2:0, Branch, PCWrite, PCSrc.]

Complete Multi-cycle Processor

[Figure: complete multi-cycle processor. The Control Unit takes Op (Instr[31:26]) and Funct (Instr[5:0]) and drives PCWrite, Branch, IorD, MemWrite, IRWrite, PCSrc, ALUControl2:0, ALUSrcB1:0, ALUSrcA, RegWrite, MemtoReg, and RegDst.]

Control Unit
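[Figure: Control Unit block diagram.]
• Main Controller (FSM), input Opcode5:0
  ◦ Multiplexer selects: MemtoReg, RegDst, IorD, PCSrc, ALUSrcB1:0, ALUSrcA
  ◦ Register enables: IRWrite, MemWrite, PCWrite, Branch, RegWrite
  ◦ ALUOp1:0, passed to the ALU Decoder
• ALU Decoder, inputs ALUOp1:0 and Funct5:0, output ALUControl2:0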
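
The slide shows only the ports of the ALU Decoder. A possible truth table, taken from the Harris & Harris textbook that these slides adapt and therefore an assumption here, can be sketched as follows. It is consistent with the Fetch state on the following slides, where ALUOp = 00 appears together with ALUControl = 010.

```python
# Assumed funct-to-ALUControl mapping for R-type instructions
# (textbook encoding; not listed on the slide itself).
FUNCT_TO_ALU_CONTROL = {
    0b100000: 0b010,  # add
    0b100010: 0b110,  # sub
    0b100100: 0b000,  # and
    0b100101: 0b001,  # or
    0b101010: 0b111,  # slt
}


def alu_decoder(alu_op: int, funct: int) -> int:
    """Map ALUOp1:0 and Funct5:0 to ALUControl2:0."""
    if alu_op == 0b00:
        return 0b010          # add (fetch, address calculation, addi)
    if alu_op == 0b01:
        return 0b110          # subtract (beq comparison)
    # ALUOp = 10: the operation is chosen by the funct field.
    return FUNCT_TO_ALU_CONTROL[funct]


print(format(alu_decoder(0b00, 0), "03b"))         # 010
print(format(alu_decoder(0b10, 0b100010), "03b"))  # 110
```

Splitting the decoder from the main FSM keeps the state machine small: the FSM only needs to say "add", "subtract", or "look at funct".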

Main Controller FSM: Fetch

S0: Fetch (entered on Reset)
    IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite

[Figure: complete multi-cycle datapath annotated with the control signal values for the Fetch state.]

Main Controller FSM: Decode

S0: Fetch (entered on Reset)
    IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite
S1: Decode (from S0)

[Figure: complete multi-cycle datapath annotated with the control signal values for the Decode state.]

Main Controller FSM: Address Calculation

S0: Fetch (entered on Reset)
    IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite
S1: Decode (from S0)
S2: MemAdr (from S1 when Op = LW or Op = SW)
    ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00

[Figure: complete multi-cycle datapath annotated with the control signal values for the MemAdr state.]

Main Controller FSM: lw

S0: Fetch (entered on Reset)
    IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite
S1: Decode (from S0)
S2: MemAdr (from S1 when Op = LW or Op = SW)
    ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
S3: MemRead (from S2 when Op = LW)
    IorD = 1
S4: Mem Writeback (from S3)
    RegDst = 0, MemtoReg = 1, RegWrite

Main Controller FSM: sw

S0: Fetch (entered on Reset)
    IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite
S1: Decode (from S0)
S2: MemAdr (from S1 when Op = LW or Op = SW)
    ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
S3: MemRead (from S2 when Op = LW)
    IorD = 1
S4: Mem Writeback (from S3)
    RegDst = 0, MemtoReg = 1, RegWrite
S5: MemWrite (from S2 when Op = SW)
    IorD = 1, MemWrite

Main Controller FSM: R-Type

S0: Fetch (entered on Reset)
    IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite
S1: Decode (from S0)
S2: MemAdr (from S1 when Op = LW or Op = SW)
    ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
S3: MemRead (from S2 when Op = LW)
    IorD = 1
S4: Mem Writeback (from S3)
    RegDst = 0, MemtoReg = 1, RegWrite
S5: MemWrite (from S2 when Op = SW)
    IorD = 1, MemWrite
S6: Execute (from S1 when Op = R-type)
    ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
S7: ALU Writeback (from S6)
    RegDst = 1, MemtoReg = 0, RegWrite

Main Controller FSM: beq

S0: Fetch (entered on Reset)
    IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite
S1: Decode (from S0)
    ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
S2: MemAdr (from S1 when Op = LW or Op = SW)
    ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
S3: MemRead (from S2 when Op = LW)
    IorD = 1
S4: Mem Writeback (from S3)
    RegDst = 0, MemtoReg = 1, RegWrite
S5: MemWrite (from S2 when Op = SW)
    IorD = 1, MemWrite
S6: Execute (from S1 when Op = R-type)
    ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
S7: ALU Writeback (from S6)
    RegDst = 1, MemtoReg = 0, RegWrite
S8: Branch (from S1 when Op = BEQ)
    ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCSrc = 1, Branch

Complete Multi-cycle Controller FSM

[Figure: complete multi-cycle controller FSM, combining the states built up on the preceding slides.]

Main Controller FSM: addi

S0: Fetch (entered on Reset)
    IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite
S1: Decode (from S0)
    ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
S2: MemAdr (from S1 when Op = LW or Op = SW)
    ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
S3: MemRead (from S2 when Op = LW)
    IorD = 1
S4: Mem Writeback (from S3)
    RegDst = 0, MemtoReg = 1, RegWrite
S5: MemWrite (from S2 when Op = SW)
    IorD = 1, MemWrite
S6: Execute (from S1 when Op = R-type)
    ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
S7: ALU Writeback (from S6)
    RegDst = 1, MemtoReg = 0, RegWrite
S8: Branch (from S1 when Op = BEQ)
    ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCSrc = 1, Branch
S9: ADDI Execute (from S1 when Op = ADDI)
    ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
S10: ADDI Writeback (from S9)
    RegDst = 0, MemtoReg = 0, RegWrite

Extended Functionality: j

[Figure: datapath extended for j. PCSrc is widened to two bits (PCSrc1:0), and the new multiplexer input 10 is PCJump, formed from PC[31:28] and Instr[25:0] shifted left by 2. Control signals shown: IorD, MemWrite, IRWrite, RegDst, MemtoReg, RegWrite, ALUSrcA, ALUSrcB1:0, ALUControl2:0, Branch, PCWrite, PCSrc1:0.]
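
The jump address shown in the datapath combines the upper four bits of the PC with the shifted 26-bit address field. A minimal sketch, where the function name `jump_target` and the sample values are illustrative assumptions:

```python
def jump_target(pc: int, addr26: int) -> int:
    """PCJump = {PC[31:28], addr[25:0] << 2}."""
    upper = pc & 0xF0000000                  # PC[31:28], kept in place
    lower = (addr26 & 0x03FFFFFF) << 2       # 26-bit field becomes bits 27:0
    return upper | lower


# A jump field of 0x0100000 within the lowest 256 MB region.
print(hex(jump_target(0x00400004, 0x0100000)))  # 0x400000
```

Because only 28 bits come from the instruction, a j instruction can only reach targets inside the same 256 MB region as the PC.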

Control FSM: j
S0: Fetch (entered on Reset)
    IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 00, IRWrite, PCWrite
S1: Decode (from S0)
    ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
S2: MemAdr (from S1 when Op = LW or Op = SW)
    ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
S3: MemRead (from S2 when Op = LW)
    IorD = 1
S4: Mem Writeback (from S3)
    RegDst = 0, MemtoReg = 1, RegWrite
S5: MemWrite (from S2 when Op = SW)
    IorD = 1, MemWrite
S6: Execute (from S1 when Op = R-type)
    ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
S7: ALU Writeback (from S6)
    RegDst = 1, MemtoReg = 0, RegWrite
S8: Branch (from S1 when Op = BEQ)
    ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCSrc = 01, Branch
S9: ADDI Execute (from S1 when Op = ADDI)
    ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
S10: ADDI Writeback (from S9)
    RegDst = 0, MemtoReg = 0, RegWrite
S11: Jump (from S1 when Op = J)
    PCSrc = 10, PCWrite
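
The state transitions of the controller can be sketched as a small table-driven function that counts how many cycles each instruction class needs. The names `DISPATCH`, `next_state` and `cycles` are hypothetical, and the code assumes that every final state returns to S0 (Fetch), which the extracted diagram text does not spell out.

```python
# First execution state chosen in Decode (S1) for each instruction class.
DISPATCH = {"lw": "S2", "sw": "S2", "r-type": "S6",
            "beq": "S8", "addi": "S9", "j": "S11"}

# States with a single unconditional successor.
FIXED = {"S0": "S1", "S3": "S4", "S6": "S7", "S9": "S10"}


def next_state(state: str, op: str) -> str:
    if state == "S1":
        return DISPATCH[op]
    if state == "S2":
        return "S3" if op == "lw" else "S5"
    # Assumption: S4, S5, S7, S8, S10 and S11 all go back to Fetch.
    return FIXED.get(state, "S0")


def cycles(op: str) -> int:
    """Count the states visited from Fetch until the FSM returns to Fetch."""
    state, count = "S0", 0
    while True:
        count += 1
        state = next_state(state, op)
        if state == "S0":
            return count


for op in DISPATCH:
    print(op, cycles(op))
```

Walking the table this way gives 5 cycles for lw, 4 for sw, R-type and addi, and 3 for beq and j.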

Multi-cycle Performance
• Instructions take different numbers of cycles:
  ◦ 3 cycles: beq, j
  ◦ 4 cycles: R-type, sw, addi
  ◦ 5 cycles: lw
• CPI is a weighted average, e.g. for the SPECINT2000 benchmark:
  ◦ 25% loads
  ◦ 10% stores
  ◦ 11% branches
  ◦ 2% jumps
  ◦ 52% R-type
• Average CPI = (0.11 + 0.02)(3) + (0.52 + 0.10)(4) + (0.25)(5) = 4.12
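
The weighted average can be checked with a few lines of Python. The dictionary names are illustrative; the percentages and cycle counts are the ones listed on the slide.

```python
# Instruction mix (fraction of executed instructions) and cycles per class.
mix = {"lw": 0.25, "sw": 0.10, "beq": 0.11, "j": 0.02, "r-type": 0.52}
cycles_per_class = {"lw": 5, "sw": 4, "beq": 3, "j": 3, "r-type": 4}

# Average CPI is the sum of (fraction x cycles) over all classes.
average_cpi = sum(mix[k] * cycles_per_class[k] for k in mix)
print(round(average_cpi, 2))  # 4.12
```

Loads are only a quarter of the mix but contribute 1.25 of the 4.12 cycles, so the longest instruction still weighs heavily on the average.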

Multi-cycle Performance
• Multi-cycle critical path:

  Tc = tpcq + tmux + max(tALU + tmux, tmem) + tsetup

[Figure: complete multi-cycle datapath and control unit.]

Multicycle Performance Example

Tc = tpcq_PC + tmux + max(tALU + tmux, tmem) + tsetup

   = [30 + 25 + max(200 + 25, 250) + 20] ps
   = [30 + 25 + 250 + 20] ps
   = 325 ps
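
The same calculation as a short sketch, using the delays from the component delay table earlier in the deck. The dictionary name `delay_ps` is an illustrative choice.

```python
# Component delays in picoseconds, from the delay table.
delay_ps = {"tpcq_PC": 30, "tsetup": 20, "tmux": 25, "tALU": 200, "tmem": 250}

# The slower of the two candidate paths (ALU plus mux, or memory) sets the period.
longest_stage = max(delay_ps["tALU"] + delay_ps["tmux"], delay_ps["tmem"])
tc_ps = delay_ps["tpcq_PC"] + delay_ps["tmux"] + longest_stage + delay_ps["tsetup"]
print(tc_ps)  # 325
```

With these numbers the memory access (250 ps) is slower than the ALU path (225 ps), so the memory determines the clock period.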

Multi-cycle Performance Example

• For a program with 100 billion instructions executing on a multi-cycle MIPS processor
  ◦ CPI = 4.12
  ◦ Tc = 325 ps
• Execution Time = (# instructions) × CPI × Tc
                 = (100 × 10^9)(4.12)(325 × 10^-12)
                 = 133.9 seconds
• This is slower than the single-cycle processor (92.5 seconds). Why?
  ◦ Not all steps are the same length
  ◦ Sequencing overhead for each step (tpcq + tsetup = 50 ps)
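
A rough numerical comparison of the two designs. The single-cycle clock period is not given on this slide; the sketch derives it from the quoted 92.5 seconds under the assumption that the single-cycle processor has CPI = 1. Variable names are illustrative.

```python
N = 100e9                 # instructions
CPI_MULTI = 4.12
TC_MULTI_PS = 325
T_SINGLE_S = 92.5         # single-cycle run time quoted on the slide
OVERHEAD_PS = 50          # tpcq + tsetup, paid once per clock cycle

t_multi_s = N * CPI_MULTI * TC_MULTI_PS * 1e-12

# Assumption: CPI = 1 for the single-cycle design, so Tc = time / N.
tc_single_ps = T_SINGLE_S / N * 1e12

# Sequencing overhead per instruction: once per cycle in each design.
overhead_multi_ps = CPI_MULTI * OVERHEAD_PS
overhead_single_ps = 1 * OVERHEAD_PS

print(f"multi-cycle run time: {t_multi_s:.1f} s")
print(f"implied single-cycle Tc: {tc_single_ps:.0f} ps")
print(f"slowdown: {t_multi_s / T_SINGLE_S:.2f}x")
print(f"overhead per instruction: {overhead_multi_ps:.0f} ps vs {overhead_single_ps} ps")
```

Under these assumptions the multi-cycle design spends roughly four times as much time per instruction on register overhead, and every step is stretched to the length of the slowest one (the 250 ps memory access).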


Review: Single-Cycle MIPS Processor

[Figure: single-cycle MIPS processor with control unit, including the Jump control signal and the PCJump path. Control signals shown: Jump, MemtoReg, MemWrite, Branch, ALUControl2:0, PCSrc, ALUSrc, RegDst, RegWrite.]

Review: Multicycle MIPS Processor

[Figure: multi-cycle MIPS processor with control unit, including the PCJump path. Control signals shown: PCWrite, Branch, PCEn, IorD, PCSrc, MemWrite, ALUControl2:0, IRWrite, ALUSrcB1:0, ALUSrcA, RegWrite, MemtoReg, RegDst.]

What Have We Learned?

• A more ‘realistic’ architecture
  ◦ Shared data and program memory
  ◦ A single ALU for all operations
• Multi-cycle: operations take different numbers of cycles
  ◦ Simpler operations take fewer steps
  ◦ More complex operations take more steps
  ◦ Average CPI
• Bottom line
  ◦ Smaller
  ◦ More complex control
  ◦ Not necessarily faster (overhead)