ch2 2
Outline
ARM processor core
Memory hierarchy
ARM Processor Core
ARM7TDMI Processor Core
Current low-end ARM core for applications like
digital mobile phones
TDMI
– T: Thumb, 16-bit compressed instruction set
– D: on-chip Debug support, enabling the processor to halt
in response to a debug request
– M: enhanced Multiplier, which yields a full 64-bit result with high
performance
– I: EmbeddedICE hardware
Von Neumann architecture
3-stage pipeline
CPI ~ 1.9
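As a rough worked example of what CPI ~ 1.9 means, the sketch below converts an instruction count into clock cycles and execution time. The instruction count and the 40 MHz clock are made-up illustration values, not figures from these slides.

```python
def execution_time_s(instructions: int, cpi: float, clock_hz: float) -> float:
    """Execution time = instructions x cycles-per-instruction / clock frequency."""
    return instructions * cpi / clock_hz


# Hypothetical workload (assumed values): 1,000,000 instructions at 40 MHz.
INSTRUCTIONS = 1_000_000
CPI = 1.9            # from the slide
CLOCK_HZ = 40e6      # assumption for illustration only

total_cycles = INSTRUCTIONS * CPI
time_s = execution_time_s(INSTRUCTIONS, CPI, CLOCK_HZ)
print(f"{total_cycles:.0f} cycles, {time_s * 1e3:.1f} ms")
```

With a CPI of 1.0 the same workload would need about half as many cycles, which is the gap a deeper pipeline or a Harvard-style memory interface tries to close.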
ARM7TDMI Block Diagram
[Figure: ARM7TDMI block diagram. Blocks: processor core, Embedded ICE, bus splitter, JTAG TAP controller, scan chain 0 and scan chain 2. Labelled signals: A[31:0], Din[31:0], Dout[31:0], opc, r/w, mreq, trans, mas[1:0], extern0, extern1, other signals.]
ARM7TDMI Core Diagram
ARM7TDMI Interface Signals (1/4)
[Figure: ARM7TDMI interface signals. Clock control: mclk, wait, eclk. Configuration: bigend. Address and data: A[31:0], Din[31:0], Dout[31:0].]
ARM7TDMI Interface Signals (2/4)
Clock control
– All state changes within the processor are controlled by mclk, the
memory clock
– Internal clock = mclk AND \wait
– eclk clock output reflects the clock used by the core
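The clock gating above can be sketched in a few lines. This is a minimal model, assuming \wait is an active-low wait input (called nwait here) and that it only changes while mclk is low; the sample waveforms are invented for illustration.

```python
def internal_clock(mclk: int, nwait: int) -> int:
    """Internal clock = mclk AND nwait (nwait low stretches the core clock)."""
    return mclk & nwait


def count_core_cycles(mclk_samples, nwait_samples) -> int:
    """Count rising edges of the gated clock, i.e. cycles the core actually sees."""
    rising_edges = 0
    previous = 0
    for mclk, nwait in zip(mclk_samples, nwait_samples):
        clock = internal_clock(mclk, nwait)
        if clock == 1 and previous == 0:
            rising_edges += 1
        previous = clock
    return rising_edges


# Hypothetical waveforms: four mclk pulses, with nwait held low across the second.
mclk_samples = [0, 1, 0, 1, 0, 1, 0, 1]
nwait_samples = [1, 1, 0, 0, 1, 1, 1, 1]
core_cycles = count_core_cycles(mclk_samples, nwait_samples)
print(core_cycles)
```

Memory supplies four clock pulses here, but the core only advances on three of them; the suppressed pulse is how a slow memory inserts a wait state.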
Memory interface
– 32-bit address A[31:0], bidirectional data bus D[31:0], separate data
out Dout[31:0], data in Din[31:0]
– seq indicates that the memory address will be sequential to that used
in the previous cycle
ARM7TDMI Interface Signals (3/4)
– Lock indicates that the processor should keep the bus to ensure the
atomicity of the read and write phases of a swap (SWP) instruction
– \r/w, read or write
– mas[1:0], encodes the memory access size: byte, half-word or word
– bl[3:0], externally controlled enables on latches on each of the 4
bytes on the data input bus
MMU interface
– \trans (translation control), 0: user mode, 1: privileged mode
– \mode[4:0], bottom 5 bits of the CPSR (inverted)
– Abort, disallow access
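The two MMU-facing outputs can be derived from the CPSR as sketched below. This is a simplified model: it uses the standard ARM encoding of User mode (10000 in binary) in CPSR[4:0] and ignores any instruction that forces a user-mode access from a privileged mode. The function name is hypothetical.

```python
USER_MODE = 0b10000  # CPSR[4:0] encoding of User mode


def mmu_signals(cpsr: int):
    """Return (\\mode[4:0], \\trans) for a given CPSR value (simplified model)."""
    mode = cpsr & 0x1F                      # bottom 5 bits of the CPSR
    nmode = ~mode & 0x1F                    # \mode[4:0] is the inverted mode field
    ntrans = 0 if mode == USER_MODE else 1  # 0: user mode, 1: privileged mode
    return nmode, ntrans


print(mmu_signals(0b10000))  # User mode
print(mmu_signals(0b10011))  # Supervisor mode
```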
State
– T bit, whether the processor is currently executing ARM or Thumb
instructions
Configuration
– Bigend, big-endian or little-endian
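What Bigend selects can be shown with a 32-bit word laid out byte by byte. A minimal sketch; the function name and the example word are illustrative only.

```python
def word_to_bytes(word: int, bigend: bool) -> bytes:
    """Lay out a 32-bit word in memory order, lowest byte address first."""
    return word.to_bytes(4, "big" if bigend else "little")


word = 0x11223344
print(word_to_bytes(word, bigend=False).hex())  # least significant byte at the lowest address
print(word_to_bytes(word, bigend=True).hex())   # most significant byte at the lowest address
```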
ARM7TDMI Interface Signals (4/4)
Interrupt
– \fiq, fast interrupt request, higher priority
– \irq, normal interrupt request
– isync, allows the interrupt synchronizer to be bypassed
Initialization
– \reset, starts the processor from a known state, executing from
address 0x00000000
ARM7TDMI characteristics
Memory Access
The ARM7 is a Von Neumann,
load/store architecture, i.e.,
– A single 32-bit data bus carries both
instructions and data.
– Only the load/store instructions (and SWP)
access memory.
Memory is addressed as a 32 bit
address space
Data types can be 8-bit bytes, 16-bit
half-words or 32-bit words; memory may be
seen as a line of bytes folded into 4-byte
words
Words must be aligned to 4 byte
boundaries, and half-words to 2 byte
boundaries.
Always ensure that the memory controller
supports all three access sizes
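The alignment rule can be checked with a one-line test per access size. A minimal sketch; the table and function names are illustrative.

```python
# Access sizes in bytes for the three data types named above.
ACCESS_SIZES = {"byte": 1, "half-word": 2, "word": 4}


def is_aligned(address: int, kind: str) -> bool:
    """An access is aligned when its address is a multiple of its size."""
    return address % ACCESS_SIZES[kind] == 0


for address in (0x1000, 0x1002, 0x1003):
    ok = [kind for kind in ACCESS_SIZES if is_aligned(address, kind)]
    print(hex(address), ok)
```

Address 0x1002 is a valid half-word address but not a valid word address, and 0x1003 is valid only for byte accesses.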
ARM Memory Interface
Sequential (S cycle)
– (nMREQ, SEQ) = (0, 1)
– The ARM core requests a transfer to or from an address which is either the
same, or one word or one-half-word greater than the preceding address.
Non-sequential (N cycle)
– (nMREQ, SEQ) = (0, 0)
– The ARM core requests a transfer to or from an address which is unrelated to
the address used in the preceding cycle.
Internal (I cycle)
– (nMREQ, SEQ) = (1, 0)
– The ARM core does not require a transfer, as it is performing an internal
function, and no useful prefetching can be performed at the same time.
Coprocessor register transfer (C cycle)
– (nMREQ, SEQ) = (1, 1)
– The ARM core wishes to use the data bus to communicate with a
coprocessor, but does not require any action by the memory system.
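The four cycle types amount to a two-bit decode of (nMREQ, SEQ), which a memory controller might model as below. A minimal sketch; the trace is invented for illustration.

```python
# (nMREQ, SEQ) -> cycle type, as listed above.
CYCLE_TYPES = {
    (0, 0): "N",  # non-sequential memory access
    (0, 1): "S",  # sequential memory access
    (1, 0): "I",  # internal cycle, no transfer
    (1, 1): "C",  # coprocessor register transfer
}


def cycle_type(nmreq: int, seq: int) -> str:
    """Decode one bus cycle from its nMREQ and SEQ values."""
    return CYCLE_TYPES[(nmreq, seq)]


def count_memory_cycles(trace) -> int:
    """Only N and S cycles require the memory system to do anything."""
    return sum(1 for nmreq, seq in trace if cycle_type(nmreq, seq) in ("N", "S"))


# Hypothetical trace: one N cycle, two S cycles, then an I and a C cycle.
trace = [(0, 0), (0, 1), (0, 1), (1, 0), (1, 1)]
print([cycle_type(*signals) for signals in trace], count_memory_cycles(trace))
```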
Cached ARM7TDMI Macrocells
[Figure: cached ARM7TDMI macrocell. Blocks: ARM core, EmbeddedICE & JTAG, CP15, MMU, inst. & data cache, write buffer, AMBA interface. Labels: virtual address, physical address, inst. & data, AMBA address, AMBA data.]
ARM710T
– 8K unified write-through cache
– Full memory management unit supporting virtual memory
– Write buffer
ARM720T
– As ARM710T but with WinCE support
ARM740T
– 8K unified write-through cache
– Memory protection unit
– Write buffer
Processor Core Vs CPU Core
Processor Core
– The engine that fetches instructions and executes them
– E.g.: ARM7TDMI, ARM9TDMI, ARM9E-S
CPU Core
– Consists of the ARM processor core
[Figure: ARM710T. Labels: virtual address, physical address, instructions & data, AMBA interface, AMBA address, AMBA data.]
Memory Hierarchy
Memory Size and Speed
[Figure: memory size versus speed for the levels of the hierarchy, including main memory.]
Caches (1/2)
A cache memory is a small, very fast memory that
retains copies of recently used memory values.
It is usually implemented on the same chip as the
processor.
Caches work because programs normally display
the property of locality, which means that at any
particular time they tend to execute the same
instructions many times on the same areas of data.
An access to an item which is in the cache is called
a hit, and an access to an item which is not in the
cache is a miss.
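Hits, misses and locality can be seen in a toy simulation. The sketch below assumes a direct-mapped cache with 8 lines of 16 bytes; those parameters and the class name are illustrative choices, not something these slides specify.

```python
class DirectMappedCache:
    """Toy cache that tracks only which memory line each slot holds."""

    def __init__(self, num_lines: int = 8, line_bytes: int = 16):
        self.num_lines = num_lines
        self.line_bytes = line_bytes
        self.tags = [None] * num_lines
        self.hits = 0
        self.misses = 0

    def access(self, address: int) -> bool:
        """Return True on a hit; on a miss, load the line and return False."""
        line_number = address // self.line_bytes
        index = line_number % self.num_lines
        tag = line_number // self.num_lines
        if self.tags[index] == tag:
            self.hits += 1
            return True
        self.tags[index] = tag
        self.misses += 1
        return False


cache = DirectMappedCache()
# A loop that touches the same four words ten times (good locality).
for _ in range(10):
    for address in (0x100, 0x104, 0x108, 0x10C):
        cache.access(address)
print(cache.hits, cache.misses)
```

All four words fall in the same 16-byte line, so only the very first access misses; every later access is a hit.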
Caches (2/2)
A processor can have one of the following two
organizations:
– A unified cache
• This is a single cache for both instructions and data
– Separate instruction and data caches
• This organization is sometimes called a modified Harvard
architecture
Unified instruction and data cache
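[Figure: unified instruction and data cache. Blocks: processor with registers, a single cache holding copies of instructions and copies of data, and a memory holding instructions and data, spanning addresses 00..00 to FF..FF (hexadecimal). Labels: address, instructions and data.]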
Separate data and instruction caches
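[Figure: separate data and instruction caches. Blocks: processor with registers, an instruction cache holding copies of instructions, a data cache holding copies of data, and a memory spanning addresses 00..00 to FF..FF (hexadecimal). Labels: address, instructions, data.]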
Cache Write Strategies
Write-through
– All write operations are passed to main memory
Write-through with buffered write
– All write operations are still passed to main memory and
the cache updated as appropriate, but instead of slowing
the processor down to main memory speed the write
address and data are stored in a write buffer which can
accept the write information at high speed.
Copy-back (write-back)
– The cache is not kept coherent with main memory
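The difference in main-memory traffic between the strategies can be sketched as follows. This toy model assumes the cache is large enough that nothing is evicted before an explicit flush, and it ignores timing, so a write buffer would change only how fast write-through proceeds, not the number of memory writes. All names and values are illustrative.

```python
def write_through(writes):
    """Every write is passed to main memory; returns (memory, memory write count)."""
    memory = {}
    memory_writes = 0
    for address, value in writes:
        memory[address] = value
        memory_writes += 1
    return memory, memory_writes


def copy_back(writes):
    """Writes update only the cache; dirty entries reach memory at flush time."""
    cache = {}
    for address, value in writes:
        cache[address] = value  # main memory is stale for this address until flushed
    memory = dict(cache)        # flush: one memory write per dirty address
    return memory, len(cache)


# Hypothetical write sequence: address 0x20 is written three times.
writes = [(0x20, 1), (0x20, 2), (0x24, 7), (0x20, 3)]
print(write_through(writes))
print(copy_back(writes))
```

Both strategies leave memory in the same final state, but copy-back needs two memory writes where write-through needs four, at the cost of memory being out of date until the flush.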
Summary (1/2)
ARM Processor Family
Summary (2/2)
Memory hierarchy
– Unified cache/Separate instruction and data cache
– Write-through with buffered write