
Digital Design and Computer Architecture: ARM® Edition

Sarah L. Harris and David Money Harris

Digital Design and Computer Architecture: ARM® Edition © 2015 Chapter 6 <1>
Topics

• Introduction
• Performance Analysis
• Single-Cycle Processor

Introduction
• Microarchitecture: how to implement an architecture in hardware
• Processor:
– Datapath: functional blocks
– Control: control signals

Microarchitecture
• Multiple implementations for a single architecture:
– Single-cycle: Each instruction executes in a single cycle
– Multicycle: Each instruction is broken up into a series of shorter steps
– Pipelined: Each instruction is broken up into a series of steps, and multiple instructions execute at once

Processor Performance
• Program execution time
Execution Time =
(#instructions)(cycles/instruction)(seconds/cycle)

• Definitions:
– CPI: Cycles/instruction
– clock period: seconds/cycle
– IPC: instructions/cycle = 1/CPI
• Challenge is to satisfy constraints of:
– Cost
– Power
– Performance
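
The execution-time product above can be checked numerically. The following is a minimal Python sketch; the function names and the sample workload (instruction count, CPI, clock period) are illustrative assumptions, not values from the slides.

```python
def execution_time(num_instructions: int, cpi: float, clock_period_s: float) -> float:
    """Execution time = (#instructions) x (cycles/instruction) x (seconds/cycle)."""
    return num_instructions * cpi * clock_period_s


def ipc(cpi: float) -> float:
    """Instructions per cycle is the reciprocal of cycles per instruction."""
    return 1.0 / cpi


if __name__ == "__main__":
    # Hypothetical workload: 1 billion instructions, CPI of 1, 1 ns clock period.
    t = execution_time(1_000_000_000, 1.0, 1e-9)
    print(f"execution time = {t:.3f} s")
```

Halving any one of the three factors halves the execution time, which is why the design constraints trade off against each other.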
ARM Processor
• Consider subset of ARM instructions:
– Data-processing instructions:
• ADD, SUB, AND, ORR
• with register and immediate Src2, but no shifts
– Memory instructions:
• LDR, STR
• with positive immediate offset
– Branch instructions:
• B

Architectural State Elements
• Determines everything about a processor:
– Architectural state:
• 16 registers (including PC)
• Status register
– Memory
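
One way to picture the architectural state is as a small record. This is only a sketch: the class and field names are hypothetical, the status register is modeled as the four ARM condition flags (N, Z, C, V), and memory is modeled as a sparse dictionary rather than real hardware.

```python
from dataclasses import dataclass, field


@dataclass
class ArchState:
    """Architectural state for the ARM subset: registers, status flags, memory."""
    # 16 registers; index 15 is the PC. Names here are illustrative.
    regs: list = field(default_factory=lambda: [0] * 16)
    # Status register modeled as the four condition flags N, Z, C, V.
    flags: dict = field(default_factory=lambda: {"N": 0, "Z": 0, "C": 0, "V": 0})
    # Sparse memory model: byte address -> 32-bit word.
    memory: dict = field(default_factory=dict)

    @property
    def pc(self) -> int:
        """The program counter is register 15."""
        return self.regs[15]
```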

ARM Architectural State Elements

[Figure: ARM architectural state elements. Program counter register (PC', PC), instruction memory (A, RD), register file (A1, A2, A3, WD3, R15, WE3, RD1, RD2), data memory (A, WD, WE, RD), and 4-bit Status register.]

Single-Cycle ARM Processor
• Datapath
• Control

Single-Cycle Datapath: LDR fetch
STEP 1: Fetch instruction
[Figure: datapath for step 1. PC addresses the instruction memory (A), which outputs Instr (RD).]

Single-Cycle Datapath: LDR Reg Read
STEP 2: Read source operands from RF
[Figure: datapath for step 2. Instr19:16 drives RA1, the A1 input of the register file.]

Single-Cycle Datapath: LDR Immed.
STEP 3: Extend the immediate
[Figure: datapath for step 3. Instr11:0 feeds the Extend block, which outputs ExtImm.]

Single-Cycle Datapath: LDR Address
STEP 4: Compute the memory address
Control signals: ALUControl = 00
[Figure: datapath for step 4. The ALU takes SrcA and SrcB and outputs ALUResult.]

Single-Cycle Datapath: LDR Mem Read
STEP 5: Read data from memory and write it back to the register file
Control signals: RegWrite = 1, ALUControl = 00
[Figure: datapath for step 5. The data memory outputs ReadData; Instr15:12 drives A3.]

Single-Cycle Datapath: PC Increment
STEP 6: Determine address of next instruction
Control signals: RegWrite = 1, ALUControl = 00
[Figure: datapath for step 6. An adder adds 4 to PC to produce PCPlus4.]

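The six LDR steps can be traced in software. The sketch below is a behavioral model under stated assumptions, not the hardware itself: the function name is hypothetical, both memories are modeled as dictionaries keyed by byte address, a read of R15 returns PC + 8, and Rd is assumed not to be R15.

```python
MASK32 = 0xFFFFFFFF


def ldr_step(pc, regs, imem, dmem):
    """Execute one LDR Rd, [Rn, #imm12] and return the next PC.

    regs is a list of 16 ints (entry 15 is ignored; the PC is passed in),
    imem and dmem map byte addresses to 32-bit words. Names are illustrative.
    """
    instr = imem[pc]                                      # Step 1: fetch Instr
    rn = (instr >> 16) & 0xF                              # Instr[19:16] -> RA1
    src_a = (pc + 8) & MASK32 if rn == 15 else regs[rn]   # Step 2: read RD1
    ext_imm = instr & 0xFFF                               # Step 3: zero-extend Instr[11:0]
    alu_result = (src_a + ext_imm) & MASK32               # Step 4: ALUControl = 00 (add)
    read_data = dmem[alu_result]                          # Step 5: read data memory
    rd = (instr >> 12) & 0xF                              # Instr[15:12] -> A3
    regs[rd] = read_data                                  # write back (RegWrite = 1)
    return (pc + 4) & MASK32                              # Step 6: PCPlus4


if __name__ == "__main__":
    regs = [0] * 16
    regs[1] = 0x100
    # 0xE5912004 has Rn = 1, Rd = 2, imm12 = 4.
    next_pc = ldr_step(0, regs, {0: 0xE5912004}, {0x104: 0xDEADBEEF})
    print(hex(regs[2]), next_pc)
```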
Single-Cycle Datapath: Access to PC
STEP 6: PC can be source/destination of instruction
• Source: R15 (PC+8) available in Register File
• Destination: Be able to write result to PC
Control signals: PCSrc = 1, RegWrite = 1, ALUControl = 00
[Figure: datapath with a second adder producing PCPlus8 for the register file's R15 input, and a multiplexer controlled by PCSrc driving PC'.]

Single-Cycle Datapath: STR
Write data in Rd to memory
Control signals: PCSrc = 0, RegWrite = 0, ALUControl = 00, MemWrite = 1
[Figure: datapath for STR. RA2 drives A2; RD2 is WriteData, the WD input of the data memory.]

Single-Cycle Datapath: Data-processing 1
With immediate Src2:
• Read from Rn and Imm8 (ImmSrc chooses the zero-extended Imm8 instead of Imm12)
• Write ALUResult to register file
• Write to Rd
Control signals: PCSrc = 0, RegWrite = 1, ImmSrc = 0, ALUControl = varies, MemWrite = 0, MemtoReg = 0
[Figure: datapath for immediate Src2. The ALU outputs ALUFlags; a MemtoReg multiplexer selects ALUResult (0) or ReadData (1) as Result.]

Single-Cycle Datapath: Data-processing 2
With register Src2:
• Read from Rn and Rm (instead of Imm8)
• Write ALUResult to register file
• Write to Rd
Control signals: PCSrc = 0, RegSrc = 0, RegWrite = 1, ImmSrc = X, ALUSrc = 0, ALUControl = varies, MemWrite = 0, MemtoReg = 0
[Figure: datapath for register Src2. A RegSrc multiplexer selects Instr3:0 (0) or Instr15:12 (1) for RA2; an ALUSrc multiplexer selects RD2 (0) or ExtImm (1) for SrcB.]

Single-Cycle Datapath: B
• Calculate branch target address:
BTA = (ExtImm) + (PC + 8)
ExtImm = Imm24 << 2 and sign-extended
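
The branch target calculation can be sketched in Python as follows. The function name is hypothetical, and the model assumes 32-bit wraparound arithmetic.

```python
MASK32 = 0xFFFFFFFF


def branch_target(pc: int, instr: int) -> int:
    """Return BTA = ExtImm + (PC + 8) for a B instruction word."""
    imm24 = instr & 0xFFFFFF
    # Sign-extend the 24-bit field, then shift left by 2 (multiply by 4).
    if imm24 & 0x800000:
        imm24 -= 1 << 24
    ext_imm = (imm24 << 2) & MASK32
    return (ext_imm + pc + 8) & MASK32


if __name__ == "__main__":
    # imm24 = 2 at PC = 0x100: BTA = 8 + 0x108.
    print(hex(branch_target(0x100, 0xEA000002)))
```

An imm24 of -2 yields ExtImm = -8, so the target equals the address of the branch itself.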

Control signals: PCSrc = 1, RegSrc = x1, RegWrite = 0, ImmSrc = 10, ALUSrc = 1, ALUControl = 00, MemWrite = 0, MemtoReg = 0
[Figure: datapath for B. A RegSrc multiplexer selects Instr19:16 (0) or 15 (1) for RA1; Instr23:0 feeds the Extend block.]

Single-Cycle Datapath: ExtImm

[Figure: single-cycle datapath with the Extend block producing ExtImm from Instr23:0 under control of ImmSrc.]

ImmSrc1:0   ExtImm                              Description
00          {24'b0, Instr7:0}                   Zero-extended imm8
01          {20'b0, Instr11:0}                  Zero-extended imm12
10          {6{Instr23}, Instr23:0, 2'b00}      Sign-extended imm24, shifted left by 2
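
The Extend block can be modeled as a small function. This is a sketch with a hypothetical function name; for the imm24 case it assumes two zero bits are appended to reach 32 bits, consistent with ExtImm = Imm24 << 2 on the branch slide.

```python
def extend(instr: int, imm_src: int) -> int:
    """Model the Extend block: return the 32-bit ExtImm for a given ImmSrc."""
    if imm_src == 0b00:                          # zero-extended imm8
        return instr & 0xFF
    if imm_src == 0b01:                          # zero-extended imm12
        return instr & 0xFFF
    if imm_src == 0b10:                          # sign-extended imm24, times 4
        imm24 = instr & 0xFFFFFF
        sign = 0x3F if imm24 & 0x800000 else 0   # six copies of Instr[23]
        return (sign << 26) | (imm24 << 2)
    raise ValueError(f"undefined ImmSrc: {imm_src:#04b}")


if __name__ == "__main__":
    print(hex(extend(0xEAFFFFFE, 0b10)))
```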

Single-Cycle ARM Processor
[Figure: complete single-cycle ARM processor. The Control Unit takes Cond (Instr31:28), Op (Instr27:26), Funct (Instr25:20), Rd (Instr15:12), and Flags (ALUFlags) and produces PCSrc, MemtoReg, MemWrite, ALUControl, ALUSrc, ImmSrc, RegWrite, and RegSrc for the datapath.]

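The bit ranges labeled in the figure can be extracted from an instruction word as shown below. This is a sketch: the function name and dictionary keys are hypothetical, and the field names follow the labels used on the slides.

```python
def decode_fields(instr: int) -> dict:
    """Slice a 32-bit instruction word into the fields labeled in the figure."""
    return {
        "cond":  (instr >> 28) & 0xF,    # Instr[31:28]
        "op":    (instr >> 26) & 0x3,    # Instr[27:26]
        "funct": (instr >> 20) & 0x3F,   # Instr[25:20]
        "rn":    (instr >> 16) & 0xF,    # Instr[19:16]
        "rd":    (instr >> 12) & 0xF,    # Instr[15:12]
        "rm":    instr & 0xF,            # Instr[3:0]
        "imm12": instr & 0xFFF,          # Instr[11:0]
        "imm24": instr & 0xFFFFFF,       # Instr[23:0]
    }


if __name__ == "__main__":
    print(decode_fields(0xE5912004))
```

Which of the overlapping fields (rm, imm12, imm24) is meaningful depends on the instruction type, which the control unit determines from Op and Funct.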
Review: ALU
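
A behavioral sketch of a 32-bit ALU for the four data-processing operations in this subset follows. Only ALUControl = 00 for addition appears on the datapath slides; the remaining encodings (01 = subtract, 10 = AND, 11 = OR), the flag order (N, Z, C, V), and the function name are assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF


def alu(src_a: int, src_b: int, alu_control: int):
    """Return (ALUResult, ALUFlags) with flags as an (N, Z, C, V) tuple."""
    a, b = src_a & MASK32, src_b & MASK32
    carry = overflow = 0
    if alu_control in (0b00, 0b01):
        # Subtraction is addition of the two's complement: a + ~b + 1.
        operand = b if alu_control == 0b00 else (~b & MASK32)
        total = a + operand + (alu_control & 1)
        result = total & MASK32
        carry = total >> 32
        # Overflow: both operands share a sign that differs from the result's.
        overflow = (((a ^ result) & (operand ^ result)) >> 31) & 1
    elif alu_control == 0b10:
        result = a & b
    elif alu_control == 0b11:
        result = a | b
    else:
        raise ValueError("ALUControl must be a 2-bit value")
    negative = result >> 31
    zero = int(result == 0)
    return result, (negative, zero, carry, overflow)
```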

Single-Cycle Control: PC Logic
PCS = 1 if PC is written by an instruction or branch (B):
PCS = ((Rd == 15) & RegW) | Branch
[Figure: complete single-cycle ARM processor with Control Unit; PCSrc selects the next PC.]

If instruction is executed: PCSrc = PCS
Else PCSrc = 0 (i.e., PC = PC + 4)

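The PC logic above reduces to a two-line function. In this sketch the function name and the `cond_ex` parameter (true when the instruction's condition is satisfied and it executes) are hypothetical names.

```python
def pc_src(rd: int, reg_w: bool, branch: bool, cond_ex: bool) -> int:
    """Return PCSrc: 1 when an executed instruction writes the PC, else 0."""
    # PCS = ((Rd == 15) & RegW) | Branch
    pcs = (rd == 15 and reg_w) or branch
    # PCSrc follows PCS only if the instruction is executed; 0 selects PC + 4.
    return int(pcs and cond_ex)


if __name__ == "__main__":
    print(pc_src(rd=15, reg_w=True, branch=False, cond_ex=True))
```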
Review: Processor Performance
Program Execution Time
= (#instructions)(cycles/instruction)(seconds/cycle)
= #instructions × CPI × Tc

Single-Cycle Performance

Tc is limited by the critical path (LDR)


Single-Cycle Performance
• Single-cycle critical path:
Tc1 = tpcq_PC + tmem + tdec + max[tmux + tRFread, tsext + tmux] + tALU + tmem + tmux + tRFsetup

• Typically, the limiting paths are memory, ALU, and register file:
Tc1 = tpcq_PC + 2tmem + tdec + tRFread + tALU + 2tmux + tRFsetup

Single-Cycle Performance Example
Element                 Parameter   Delay (ps)
Register clock-to-Q     tpcq_PC     40
Register setup          tsetup      50
Multiplexer             tmux        25
ALU                     tALU        120
Decoder                 tdec        70
Memory read             tmem        200
Register file read      tRFread     100
Register file setup     tRFsetup    60

Tc1 = ?

Single-Cycle Performance Example
Tc1 = tpcq_PC + 2tmem + tdec + tRFread + tALU + 2tmux + tRFsetup
= [40 + 2(200) + 70 + 100 + 120 + 2(25) + 60] ps
= 840 ps
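
The same arithmetic can be reproduced from the full critical-path expression, including the max[] term. In this sketch the delays come from the example table; the sign-extension delay `t_sext` is not listed there, so the 35 ps used below is a hypothetical value, and the function and dictionary names are illustrative.

```python
# Delays in picoseconds, taken from the example table.
DELAYS_PS = {
    "tpcq_PC": 40, "tsetup": 50, "tmux": 25, "tALU": 120,
    "tdec": 70, "tmem": 200, "tRFread": 100, "tRFsetup": 60,
}


def tc1(d: dict, t_sext: int) -> int:
    """Single-cycle critical path (LDR), using the full expression with max[]."""
    return (d["tpcq_PC"] + d["tmem"] + d["tdec"]
            + max(d["tmux"] + d["tRFread"], t_sext + d["tmux"])
            + d["tALU"] + d["tmem"] + d["tmux"] + d["tRFsetup"])


if __name__ == "__main__":
    # t_sext = 35 ps is a hypothetical value; the table does not list it.
    print(tc1(DELAYS_PS, t_sext=35), "ps")
```

Any extend delay up to tRFread leaves the register file read as the longer branch of the max[] term, so the result matches the simplified expression.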
