ARM CPU Cores
A72 has a 3.5 times performance gain over A15. Pair it with A53 to get a big.LITTLE
architecture. A73 was introduced in 2016.
3 GHz on a 10 nm SoC and 2.8 GHz on a 16 nm SoC are achievable with A73.
© G.N. Khan ARM Processors/Cores – EE8205: Embedded Computer Systems Page: 5
Cortex-A75
Cortex-A75 can execute up to 3 instructions per clock cycle.
A75 boasts 7 execution units: two load/stores, two NEON & FPU, a BPU and
two integer cores.
[Figure labels: Thumb-2; A64 (AArch64)]
[Figure: ARM datapath organization: address register, PC incrementer, register bank (with PC), instruction decode & control, multiply register, A bus, B bus, ALU bus, barrel shifter, ALU, data bus D[31:0]]
[Figure: pipeline timing diagram; time axis with cycles 1, 2, 3]
• Latency
Time it takes for an instruction to get through the pipeline.
• Throughput
Number of instructions executed per time period.
Pipelining increases throughput without reducing latency.
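The latency and throughput definitions above can be checked with a small calculation. This is a minimal sketch that assumes an ideal pipeline (one cycle per stage, no stalls or flushes); the function name `pipeline_timing` and the figures of 100 instructions and 3 stages are illustrative choices, not values from the slides.

```python
def pipeline_timing(n_instructions, n_stages, cycle_time=1.0):
    """Ideal pipeline model: every stage takes one cycle, no stalls."""
    # Latency: time for ONE instruction to pass through all stages.
    latency = n_stages * cycle_time
    # Pipelined: the first instruction needs n_stages cycles,
    # then one further instruction completes every cycle.
    total_pipelined = (n_stages + n_instructions - 1) * cycle_time
    # Unpipelined: each instruction occupies the whole datapath in turn.
    total_unpipelined = n_stages * n_instructions * cycle_time
    # Throughput: instructions completed per unit of time.
    throughput = n_instructions / total_pipelined
    return latency, total_pipelined, total_unpipelined, throughput


lat, t_pipe, t_seq, thr = pipeline_timing(n_instructions=100, n_stages=3)
print(lat, t_pipe, t_seq, round(thr, 3))
```

In this model the per-instruction latency stays at 3 cycles, while the total time for 100 instructions drops from 300 cycles to 102, so throughput approaches one instruction per cycle.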
ARM CPU Features: Modified RISC
Multiple Load and Store Operation
Reduce the penalty of data accesses during a stall in the pipeline
Multiple load/store instructions can move any of the ARM registers
to and from memory and update the memory address register
automatically after the transfer.
• This not only allows one instruction to transfer many words of
data (in a single bus burst), but it also reduces the number of
instructions needed to transfer data.
• This makes ARM code smaller than that of other 32-bit CPUs.
• These instructions can specify an update of the base address
register with a new address after (or even before) the transfer.
RISC CPU architectures would normally use a second instruction (add or
subtract) to form the next address in a sequence.
ARM does it automatically with a single bit in the instruction, again a
useful saving in code size.
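The effect of a multiple-load instruction with base register update can be modelled in a few lines. This is a rough sketch of the behaviour of something like `LDMIA r0!, {r1-r3}`, assuming a word-addressed dictionary as memory; the helper name `ldmia`, the memory contents and the addresses are hypothetical.

```python
def ldmia(memory, regs, base, reg_list, writeback=True):
    """Model of LDMIA base{!}, {reg_list}: load words from ascending addresses."""
    addr = regs[base]
    for r in sorted(reg_list):      # lowest-numbered register gets the lowest address
        regs[r] = memory[addr]
        addr += 4                   # one 32-bit word is 4 bytes
    if writeback:                   # the '!' form updates the base register
        regs[base] = addr


# Hypothetical memory image: three words starting at 0x1000.
memory = {0x1000: 11, 0x1004: 22, 0x1008: 33}
regs = [0] * 16
regs[0] = 0x1000

ldmia(memory, regs, base=0, reg_list=[1, 2, 3])
print(regs[1], regs[2], regs[3], hex(regs[0]))
```

One call transfers three words and leaves the base register pointing past the block, which is the work a plain load/store architecture would spread over three loads and an add.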
ARM CPU (More) Features
All instructions are conditionally executed:
• A very useful feature
• Loads, stores, procedure calls and returns, and all other operations
can execute conditionally after some prior instruction to set the
condition code flags
• Any ALU instruction may set the flags
• This eliminates short forward branches in ARM code
• It also improves code density, avoids flushing the pipeline for
branches, and increases execution performance
▪ Most other architectures have conditional branch instructions
▪ These follow a test or compare instruction to control the flow of
execution through the program
▪ Some architectures also have a conditional move instruction,
allowing data to be conditionally transferred between registers
[Figure: ARM debug architecture within a system on chip: host system, EmbeddedICE controller, JTAG port, JTAG TAP, EmbeddedICE, embedded trace macrocell, and ARM core with data, address and control signals]
ARM9TDMI pipeline stages:
instruction fetch | r. read, decode | shift/ALU | data memory access | reg write
[Figure: barrel shifter in the datapath: sign extender, registers R0 – R15 (R15 is the pc), operands Rn and Rm, barrel shifter on the Rm path, MAC, ALU, result bus back to Rd]
• The barrel shifter shifts or rotates an instruction operand before the value enters the ALU
• Extensively used in signal processing application programs
Multiplying R2 by 4, written informally as MUL R1, R2, #4 (LSL R1, R2, #2),
followed by ADD R5, R1, R4 translates to the single instruction ADD R5, R4, R2, LSL #2
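The folding of a shift into the ADD can be checked numerically. This is a small sketch assuming 32-bit unsigned register values; `add_shifted` is a hypothetical helper that models `ADD Rd, Rn, Rm, LSL #shift`, and the register contents are arbitrary. Note that a left shift by 2 multiplies by 4.

```python
MASK32 = 0xFFFFFFFF  # registers are 32 bits wide


def add_shifted(rn, rm, shift):
    """Model of ADD Rd, Rn, Rm, LSL #shift."""
    return (rn + ((rm << shift) & MASK32)) & MASK32


r2, r4 = 5, 100                      # arbitrary example values

# Two-step version: shift (multiply by 4) into R1, then add.
r1 = (r2 << 2) & MASK32
two_step = (r1 + r4) & MASK32

# One-step version: the barrel shifter handles the shift inside the ADD.
one_step = add_shifted(r4, r2, 2)

print(two_step, one_step)
```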
ARM7: Programming Model
[Figure: register set r0 – r15, where r15 is the PC, and the 32-bit CPSR (bits 31 to 0) holding the N, Z, C, V flags]
• Word is 32 bits long.
• Word can be divided into four 8-bit bytes.
• ARM addresses can be 32 bits long.
• Address refers to byte.
Address 4 starts at byte 4.
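Byte addressing of 32-bit words can be illustrated with a tiny memory model. This sketch assumes little-endian byte order (ARM cores can be configured for either endianness); the helpers `store_word` and `load_word` and the stored value are hypothetical.

```python
import struct

memory = bytearray(16)               # 16 bytes = four 32-bit words


def store_word(addr, value):
    """Store a 32-bit word at a byte address (little-endian assumed)."""
    memory[addr:addr + 4] = struct.pack("<I", value)


def load_word(addr):
    """Load a 32-bit word from a byte address (little-endian assumed)."""
    return struct.unpack_from("<I", memory, addr)[0]


store_word(4, 0x11223344)            # the word at address 4 ...
bytes_of_word = list(memory[4:8])    # ... occupies bytes 4, 5, 6 and 7
print([hex(b) for b in bytes_of_word], hex(load_word(4)))
```

Consecutive words therefore sit at addresses 0, 4, 8, 12, and the word at address 0 is untouched by the store to address 4.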
[Program status registers, bits 31 to 0]
APSR   N (bit 31), Z (bit 30), C (bit 29), V (bit 28), Q (bit 27)
IPSR   0 or exception # (bits 8 to 0)
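Reading the APSR flags out of a 32-bit status value is a matter of bit extraction. This is a minimal sketch using the architectural bit positions N=31, Z=30, C=29, V=28, Q=27; the function name `decode_apsr` and the sample value are illustrative.

```python
# Architectural APSR flag positions.
APSR_BITS = {"N": 31, "Z": 30, "C": 29, "V": 28, "Q": 27}


def decode_apsr(value):
    """Return each APSR flag as 0 or 1 from a 32-bit status word."""
    return {name: (value >> bit) & 1 for name, bit in APSR_BITS.items()}


# Sample value with bits 30 and 29 set, e.g. after comparing two equal values.
flags = decode_apsr(0x60000000)
print(flags)
```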
Examples
SSAT R7, #16, R7, LSL #4
; Logical shift left value in R7 by 4, then
; saturate it as a signed 16-bit value and
; write it back to R7
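The shift-then-saturate behaviour of the SSAT example can be modelled as follows. This is a sketch under the assumption that the shifted operand is treated as a 32-bit signed value before saturation; `ssat` and `to_signed32` are hypothetical helper names, and the second return value stands in for the condition under which the Q flag would be set.

```python
def to_signed32(x):
    """Interpret the low 32 bits of x as a two's complement value."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def ssat(value, bits, lsl=0):
    """Model of SSAT Rd, #bits, Rm, LSL #lsl.

    Returns (result, saturated); saturated is True when clamping occurred.
    """
    shifted = to_signed32(value << lsl)          # shift first, within 32 bits
    lo = -(1 << (bits - 1))                      # e.g. -32768 for 16 bits
    hi = (1 << (bits - 1)) - 1                   # e.g. +32767 for 16 bits
    result = max(lo, min(hi, shifted))
    return result, result != shifted


print(ssat(100, 16, 4))       # fits in 16 bits after the shift
print(ssat(0x1000, 16, 4))    # too large: clamps to the maximum
print(ssat(-0x1000, 16, 4))   # too small: clamps to the minimum
```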
BL label    n/a    PC ← PC + imm; LR ← rtn adr    Subroutine call
Shift Instructions
<shift>    Meaning                             Notes
LSL #n     Logical shift left by n bits        Zero fills; 0 ≤ n ≤ 31
LSR #n     Logical shift right by n bits       Zero fills; 1 ≤ n ≤ 32
ASR #n     Arithmetic shift right by n bits    Sign extends; 1 ≤ n ≤ 32
ROR #n     Rotate right by n bits              1 ≤ n ≤ 31
RRX        Rotate right w/C by 1 bit
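The shift types in the table can be modelled on 32-bit values as below. This is a sketch of the result values only (the carry-out that the hardware also produces is modelled just for RRX); the function names mirror the mnemonics but are otherwise ordinary Python helpers.

```python
MASK32 = 0xFFFFFFFF


def lsl(x, n):
    """Logical shift left: zeros enter from the right."""
    return (x << n) & MASK32


def lsr(x, n):
    """Logical shift right: zeros enter from the left."""
    return (x & MASK32) >> n


def asr(x, n):
    """Arithmetic shift right: the sign bit is replicated."""
    x &= MASK32
    if x & 0x80000000:
        x -= 1 << 32                 # reinterpret as a negative number
    return (x >> n) & MASK32


def ror(x, n):
    """Rotate right: bits leaving the right re-enter on the left."""
    x &= MASK32
    n %= 32
    return ((x >> n) | (x << (32 - n))) & MASK32


def rrx(x, carry):
    """Rotate right by 1 through carry: returns (result, new carry)."""
    x &= MASK32
    return (carry << 31) | (x >> 1), x & 1


print(hex(lsl(0x80000001, 1)), hex(lsr(0x80000000, 31)),
      hex(asr(0x80000000, 4)), hex(ror(0x12345678, 8)), rrx(0x3, 1))
```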
Architectures
• IT (If-Then) is a 16-bit Thumb instruction available in ARMv6T2 and above.
• In ARM code, IT is a pseudo-instruction that does not generate any code.
IT Examples
    ITTE    NE            ; IT can be omitted
    ANDNE   r0,r0,r1      ; 16-bit AND, not ANDS
    ADDSNE  r2,r2,#1      ; 32-bit ADDS (16-bit ADDS doesn’t set flags in IT)
    MOVEQ   r2,r3         ; 16-bit MOV

    ITT     EQ
    MOVEQ   r0,r1
    BEQ     dloop         ; branch at end of IT block is permitted

    ITT     EQ
    MOVEQ   r0,r1
    BKPT    #1            ; BKPT always executes
    ADDEQ   r0,r0,#1

Incorrect example
    IT      NE
    ADD     r0,r0,r1      ; syntax error: no condition code used in IT
if (a == 0) b = 1 ;

Using a conditional branch:
        LDR   R0,A
        CMP   R0,#0
        BNE   L1
        LDR   R0,=1
        STR   R0,B
L1:     …

- or -

Using an IT block:
        LDR   R0,A
        CMP   R0,#0
        ITT   EQ
        LDREQ R0,=1
        STREQ R0,B
L1:     …

[Flowchart: A = 0? Yes: B ← 1; No: continue at L1]
Non-Conditional Method
        CMP   r2, #5      ; if (r2 <= 5) r2 = 10; else r2 = 1
        BGT   t_else
        MOV   r2, #10
        B     t_end       ; skip the else part
t_else: MOV   r2, #1
t_end:
[Figure: exception stack frame, increasing addresses from SP: R0 (at SP), R1, ..., Return Address, PSR, then Old SP]
Exception return occurs when in Handler Mode and one of the following instructions
is executed:
Tail Chaining
[Figure: interrupt latency timelines without tail-chaining and with tail-chaining; labels: 12 cycles, latency: 17 cycles]
NVIC INTERRUPTS
Navigation System
iPhone based on
ARM1176JZ
Communication Interfaces
• Allow data exchange between independent electronic modules in
the car, as well as remote sub modules.
• High Speed (up to 1 Mbps) CAN (Controller Area Network) is a 2-wire,
fault-tolerant differential bus.
• It serves as the main vehicle bus type for connecting the various
electronic modules in the car with each other.
• LIN (Local Interconnect Network) supports low-speed (up to 20 kbps)
single-wire bus networks, used to communicate with remote sub-functions
of the infotainment system.
The main load types in a Cluster are the stepper motors that operate
the gauges and the various indicator and back light sources.
• The stepper motor drivers are typically integrated in the µC.
• LED drivers are typically multi-channel devices with serial interfaces to
the µC or Darlington arrays.