Digital Design and Computer Architecture: ARM® Edition
Digital Design and Computer Architecture: ARM® Edition © 2015 Chapter 6 <1>
Topics
• Introduction
• Performance Analysis
• Single-Cycle Processor
Introduction
• Microarchitecture: how to implement an architecture in hardware
• Processor:
– Datapath: functional blocks
– Control: control signals
Microarchitecture
• Multiple implementations for a single architecture:
– Single-cycle: Each instruction executes in a single cycle
– Multicycle: Each instruction is broken up into a series of shorter steps
– Pipelined: Each instruction is broken up into a series of steps, and multiple instructions execute at once
Processor Performance
• Program execution time
Execution Time = (# instructions)(cycles/instruction)(seconds/cycle)
• Definitions:
– CPI: Cycles/instruction
– clock period: seconds/cycle
– IPC: instructions/cycle = 1/CPI
• Challenge is to satisfy constraints of:
– Cost
– Power
– Performance
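The execution-time product can be checked numerically. The following is a minimal sketch; the function names and the workload numbers are illustrative assumptions, not values from the slides.

```python
def execution_time(instructions: int, cpi: float, clock_period_s: float) -> float:
    """Execution time = (# instructions) x (cycles/instruction) x (seconds/cycle)."""
    return instructions * cpi * clock_period_s


def ipc(cpi: float) -> float:
    """Instructions per cycle is the reciprocal of cycles per instruction."""
    return 1.0 / cpi


# Hypothetical workload: one million instructions, CPI of 1, 1 ns clock period.
example_time_s = execution_time(1_000_000, 1.0, 1e-9)
```

Reducing any one of the three factors shortens execution time, provided the other two do not grow to compensate.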
ARM Processor
• Consider a subset of ARM instructions:
– Data-processing instructions:
• ADD, SUB, AND, ORR
• with register and immediate Src2, but no shifts
– Memory instructions:
• LDR, STR
• with positive immediate offset
– Branch instructions:
• B
Architectural State Elements
• Determines everything about a processor:
– Architectural state:
• 16 registers (including PC)
• Status register
– Memory
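As a rough software analogy, the architectural state can be held in one small data structure. This is a sketch only: the class name `ArchState`, the sparse-dictionary memory, and storing the raw PC in slot 15 are modeling assumptions, not part of the hardware design.

```python
from dataclasses import dataclass, field

MASK32 = 0xFFFFFFFF


@dataclass
class ArchState:
    # 16 registers, R0-R15; index 15 holds the PC in this model (assumption).
    regs: list = field(default_factory=lambda: [0] * 16)
    # 4-bit status register (the ARM condition flags N, Z, C, V).
    status: int = 0
    # Memory modeled as a sparse dict from byte address to 32-bit word (assumption).
    mem: dict = field(default_factory=dict)

    @property
    def pc(self) -> int:
        return self.regs[15]

    def write_reg(self, index: int, value: int) -> None:
        """Write a register, truncating the value to 32 bits."""
        self.regs[index] = value & MASK32
```

Everything an instruction can observe or change lives in these three fields.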
ARM Architectural State Elements
[Figure: ARM architectural state elements, all clocked by CLK: the 32-bit program counter (PC' in, PC out), the 4-bit Status register, the instruction memory (A, RD), the register file (A1, A2, A3, WD3, R15, WE3, RD1, RD2), and the data memory (A, WD, WE, RD)]
Single-Cycle ARM Processor
• Datapath
• Control
Single-Cycle Datapath: LDR fetch
STEP 1: Fetch instruction
[Figure: the PC drives the instruction memory address input A; the memory's RD output is Instr]
Single-Cycle Datapath: LDR Reg Read
STEP 2: Read source operands from RF
[Figure: Instr[19:16] drives register file address A1 as RA1; the operand appears on RD1]
Single-Cycle Datapath: LDR Immed.
STEP 3: Extend the immediate
[Figure: Instr[11:0] feeds the Extend unit, which outputs ExtImm]
Single-Cycle Datapath: LDR Address
STEP 4: Compute the memory address
Control signals: ALUControl = 00
[Figure: the ALU takes SrcA from RD1 and SrcB from ExtImm; ALUResult drives the data memory address input A]
Single-Cycle Datapath: LDR Mem Read
STEP 5: Read data from memory and write it back to the register file
Control signals: RegWrite = 1, ALUControl = 00
[Figure: ReadData from the data memory is routed to register file input WD3; Instr[15:12] drives A3; RegWrite drives WE3]
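Steps 1 to 5 for LDR can be traced in software. The sketch below is a behavioral model under stated assumptions: memories are Python dicts keyed by byte address, and the helper names `bits` and `execute_ldr` are hypothetical. The bit fields follow the slides: Instr[19:16] is the base register, Instr[11:0] the immediate, Instr[15:12] the destination.

```python
MASK32 = 0xFFFFFFFF


def bits(word: int, hi: int, lo: int) -> int:
    """Return the bit field word[hi:lo] as an unsigned integer."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def execute_ldr(pc: int, regs: list, imem: dict, dmem: dict) -> None:
    instr = imem[pc]                         # STEP 1: fetch instruction
    src_a = regs[bits(instr, 19, 16)]        # STEP 2: read base register on RD1
    ext_imm = bits(instr, 11, 0)             # STEP 3: zero-extend the 12-bit immediate
    alu_result = (src_a + ext_imm) & MASK32  # STEP 4: ALUControl = 00 (add)
    read_data = dmem[alu_result]             # STEP 5: read data memory ...
    regs[bits(instr, 15, 12)] = read_data    # ... and write back (RegWrite = 1)


# LDR R2, [R1, #8] encodes as 0xE5912008.
regs = [0] * 16
regs[1] = 0x1000
execute_ldr(0, regs, {0: 0xE5912008}, {0x1008: 0xDEADBEEF})
```

After the call, `regs[2]` holds the word stored at address 0x1008.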
Single-Cycle Datapath: PC Increment
STEP 6: Determine address of next instruction
Control signals: RegWrite = 1, ALUControl = 00
[Figure: an adder computes PCPlus4 = PC + 4, which feeds PC']
Single-Cycle Datapath: Access to PC
STEP 7: The PC can be the source or destination of an instruction
• Source: R15 must be available in the register file, where it reads as PC + 8
• Destination: the result must be writable to the PC
[Figure: a second adder computes PCPlus8 = PCPlus4 + 4, which drives the register file's R15 input]
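The PC handling can be sketched as three small functions. The function names are hypothetical, and the assumption that PCSrc = 1 selects Result for PC' is taken from the control-signal values shown for the branch instruction rather than stated on this slide.

```python
MASK32 = 0xFFFFFFFF


def pc_adders(pc: int) -> tuple:
    """Two chained +4 adders produce PCPlus4 and PCPlus8."""
    pc_plus4 = (pc + 4) & MASK32
    pc_plus8 = (pc_plus4 + 4) & MASK32
    return pc_plus4, pc_plus8


def read_register(regs: list, index: int, pc_plus8: int) -> int:
    """Reading R15 returns PC + 8; R0-R14 come from storage."""
    return pc_plus8 if index == 15 else regs[index]


def next_pc(pc_plus4: int, result: int, pc_src: int) -> int:
    """PC' multiplexer: Result when PCSrc = 1 (assumed), else PCPlus4."""
    return result if pc_src else pc_plus4
```

For example, with PC = 0x100 an instruction that reads R15 sees 0x108, and the next fetch comes from 0x104 unless the instruction writes the PC.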
Single-Cycle Datapath: STR
Write the data in Rd to memory
[Figure: Instr[15:12] drives register file address A2 as RA2; RD2 becomes WriteData on the data memory's WD input; ALUResult supplies the address]
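A behavioral sketch of STR, under the same modeling assumptions as a dict-based memory keyed by byte address. The names `bits` and `execute_str` are hypothetical, and MemWrite = 1 with no register write-back is assumed for a store.

```python
MASK32 = 0xFFFFFFFF


def bits(word: int, hi: int, lo: int) -> int:
    """Return the bit field word[hi:lo] as an unsigned integer."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def execute_str(instr: int, regs: list, dmem: dict) -> None:
    base = regs[bits(instr, 19, 16)]        # RD1: base register
    ext_imm = bits(instr, 11, 0)            # zero-extended 12-bit offset
    address = (base + ext_imm) & MASK32     # ALUResult
    write_data = regs[bits(instr, 15, 12)]  # RD2, addressed by RA2 = Instr[15:12]
    dmem[address] = write_data              # MemWrite = 1 (assumed for stores)


# STR R2, [R1, #8] encodes as 0xE5812008.
regs = [0] * 16
regs[1], regs[2] = 0x1000, 0xCAFEF00D
dmem = {}
execute_str(0xE5812008, regs, dmem)
```

The address calculation is identical to LDR; only the direction of the data transfer changes.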
Single-Cycle Datapath: Data-processing 1
With immediate Src2:
• Read from Rn and Imm8 (ImmSrc chooses the zero-extended Imm8 instead of Imm12)
• Write ALUResult to the register file
• Write to Rd
Control signals: PCSrc = 0, RegWrite = 1, ImmSrc = 0, ALUControl = varies, MemWrite = 0, MemtoReg = 0
[Figure: datapath extended so that Result (ALUResult or ReadData, selected by MemtoReg) is written to register file input WD3; the ALU also outputs ALUFlags]
Single-Cycle Datapath: Data-processing 2
With register Src2:
• Read from Rn and Rm (instead of Imm8)
• Write ALUResult to the register file
• Write to Rd
Control signals: PCSrc = 0, RegSrc = 0, RegWrite = 1, ImmSrc = X, ALUSrc = 0, ALUControl = varies, MemWrite = 0, MemtoReg = 0
[Figure: multiplexers added on RA2 (Instr[3:0] or Instr[15:12], selected by RegSrc) and on SrcB (RD2 or ExtImm, selected by ALUSrc)]
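The two data-processing cases differ only in where SrcB comes from. The sketch below models that choice; the ALUControl encodings other than 00 (add) are an assumption, as are the function and parameter names.

```python
MASK32 = 0xFFFFFFFF

# ALUControl encoding: 00 = ADD appears on the slides; 01, 10, 11 are assumed.
ALU_OPS = {
    0b00: lambda a, b: a + b,  # ADD
    0b01: lambda a, b: a - b,  # SUB
    0b10: lambda a, b: a & b,  # AND
    0b11: lambda a, b: a | b,  # ORR
}


def data_processing(regs, rn, rd, alu_control, alu_src, rm=0, imm8=0):
    """Execute one data-processing instruction without shifts or rotations."""
    src_a = regs[rn]
    # ALUSrc = 0 selects the register operand Rm; ALUSrc = 1 selects ExtImm.
    src_b = (imm8 & 0xFF) if alu_src else regs[rm]
    alu_result = ALU_OPS[alu_control](src_a, src_b) & MASK32
    regs[rd] = alu_result  # RegWrite = 1, MemtoReg = 0
    return alu_result


regs = [0] * 16
regs[1], regs[2] = 10, 3
diff = data_processing(regs, rn=1, rd=0, alu_control=0b01, alu_src=0, rm=2)  # SUB R0, R1, R2
total = data_processing(regs, rn=1, rd=3, alu_control=0b00, alu_src=1, imm8=5)  # ADD R3, R1, #5
```

Results are truncated to 32 bits, so a negative difference wraps to its two's complement representation.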
Single-Cycle Datapath: B
• Calculate branch target address:
BTA = (ExtImm) + (PC + 8)
ExtImm = Imm24 << 2, sign-extended to 32 bits
Control signals: PCSrc = 1, RegSrc = x1, RegWrite = 0, ImmSrc = 10, ALUSrc = 1, ALUControl = 00, MemWrite = 0, MemtoReg = 0
[Figure: a multiplexer on RA1 selects Instr[19:16] (0) or 15 (1); Instr[23:0] feeds the Extend unit; Result is routed to PC']
Single-Cycle Datapath: ExtImm
[Figure: branch datapath, with the Extend unit producing ExtImm from Instr[23:0] under control of ImmSrc]
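The Extend unit's three behaviors can be sketched as one function. ImmSrc = 10 for the branch immediate is shown in the control signals; the codes 00 for Imm8 and 01 for Imm12 are assumptions, as are the function names.

```python
MASK32 = 0xFFFFFFFF


def extend(instr_23_0: int, imm_src: int) -> int:
    """Produce the 32-bit ExtImm from Instr[23:0]."""
    if imm_src == 0b00:  # data-processing: zero-extended Imm8 (assumed code)
        return instr_23_0 & 0xFF
    if imm_src == 0b01:  # LDR/STR: zero-extended Imm12 (assumed code)
        return instr_23_0 & 0xFFF
    if imm_src == 0b10:  # B: Imm24 << 2, sign-extended
        imm24 = instr_23_0 & 0xFFFFFF
        if imm24 & 0x800000:  # bit 23 is the sign bit
            imm24 -= 1 << 24
        return (imm24 << 2) & MASK32
    raise ValueError("unsupported ImmSrc")


def branch_target(pc: int, imm24: int) -> int:
    """BTA = ExtImm + (PC + 8), truncated to 32 bits."""
    return (extend(imm24, 0b10) + pc + 8) & MASK32
```

An Imm24 of 0xFFFFFE (that is, -2) gives a branch target equal to the branch's own address, since -8 cancels the PC + 8 offset.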
Single-Cycle ARM Processor
[Figure: complete single-cycle processor. Control Unit inputs: Cond (Instr[31:28]), Op (Instr[27:26]), Funct (Instr[25:20]), Rd (Instr[15:12]), and Flags (ALUFlags). Control Unit outputs: PCSrc, MemtoReg, MemWrite, ALUControl, ALUSrc, ImmSrc, RegWrite, RegSrc]
Review: ALU
Single-Cycle Control: PC Logic
PCS = 1 if PC is written by an instruction or branch (B):
PCS = ((Rd == 15) & RegW) | Branch
[Figure: single-cycle processor with Control Unit, which generates PCSrc from Cond, Op, Funct, Rd, and Flags]
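The PCS equation translates directly into a Boolean function. This is a minimal sketch; the function and parameter names are illustrative.

```python
def pcs(rd: int, reg_w: bool, branch: bool) -> bool:
    """PCS = ((Rd == 15) & RegW) | Branch."""
    return bool(((rd == 15) and reg_w) or branch)
```

For instance, a data-processing instruction that writes R15 sets PCS, while the same instruction writing R3 does not.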
Review: Processor Performance
Program Execution Time
= (#instructions)(cycles/instruction)(seconds/cycle)
= (# instructions) × CPI × Tc
Single-Cycle Performance
Single-Cycle Performance Example
Element               Parameter   Delay (ps)
Register clock-to-Q   tpcq_PC     40
Register setup        tsetup      50
Multiplexer           tmux        25
ALU                   tALU        120
Decoder               tdec        70
Memory read           tmem        200
Register file read    tRFread     100
Register file setup   tRFsetup    60
Tc1 = ?
Tc1 = tpcq_PC + 2 tmem + tdec + tRFread + tALU + 2 tmux + tRFsetup
= [40 + 2(200) + 70 + 100 + 120 + 2(25) + 60] ps
= 840 ps
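The cycle-time sum can be reproduced from the delay table. In this sketch the dictionary keys mirror the table's parameter names; the instruction count used for the execution time is a hypothetical workload, and CPI = 1 follows from the single-cycle design.

```python
DELAYS_PS = {
    "tpcq_PC": 40, "tsetup": 50, "tmux": 25, "tALU": 120,
    "tdec": 70, "tmem": 200, "tRFread": 100, "tRFsetup": 60,
}


def single_cycle_period_ps(d: dict) -> int:
    """Tc1 = tpcq_PC + 2 tmem + tdec + tRFread + tALU + 2 tmux + tRFsetup."""
    return (d["tpcq_PC"] + 2 * d["tmem"] + d["tdec"] + d["tRFread"]
            + d["tALU"] + 2 * d["tmux"] + d["tRFsetup"])


def execution_time_s(instructions: int, cpi: float, tc_ps: float) -> float:
    """Execution time in seconds for a clock period given in picoseconds."""
    return instructions * cpi * tc_ps * 1e-12


tc1 = single_cycle_period_ps(DELAYS_PS)
# Hypothetical program of one billion instructions at CPI = 1.
runtime_s = execution_time_s(10**9, 1, tc1)
```

The two memory accesses account for 400 ps of the period, so memory delay dominates the cycle time in this example.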