Intro To ARM Cortex-M3 (CM3) and LPC17xx MCU: Outline
Outline
• ARM Cortex-M3 processor
• NXP LPC17xx microcontroller unit (MCU)
Cortex-M3 Processor
• Greater performance efficiency: more work can be done
without increasing the frequency or power requirements
– Implements the new Thumb-2 instruction set architecture
• 70% more efficient per MHz than an ARM7TDMI-S processor
executing Thumb instructions
• 35% more efficient than the ARM7TDMI-S processor executing ARM instructions for Dhrystone
benchmark
Cortex-M3 Processor Architecture
• Harvard architecture: it uses separate interfaces to fetch
instructions (Inst) and to access data (Data)
• Processor is not memory starved: it permits accessing data
and instruction memories simultaneously
• From CM3 perspective, everything looks like memory
– Only differentiates between instruction fetches and data
accesses
• Interface between CM3 and manufacturer specific
hardware is through three memory buses:
– ICode, DCode, and System (for peripherals), which are
defined to access different regions of memory
Cortex-M3 Processor
• Cortex-M3 is a load/store architecture with three basic
types of instructions
• register-to-register operations for processing data
• memory operations that move data between memory and registers
• control flow operations
Instruction Prefetch & Execution
Processor Modes
Operating Modes
Exceptions
Processor Register Set
Program Memory Model
• RAM for an executing program is divided into three regions
– Data in RAM are allocated during the link process and initialized by
startup code at reset
– The (optional) heap is managed at runtime by library code
implementing functions such as malloc and free, which are part of
the standard C library
– The stack is managed at runtime by compiler generated code which
generates per-procedure-call stack frames containing local variables and
saved registers
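The three regions can be illustrated with a small C sketch. The names used here (boot_count, scratch, make_buffer, sum_locals) are hypothetical and chosen only for illustration; where the linker actually places each object depends on the toolchain and linker script.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Data region: placed by the linker, initialized by startup code at reset. */
uint32_t boot_count = 3;      /* initialized data (typically .data) */
uint32_t scratch[4];          /* zero-initialized data (typically .bss) */

/* Heap: managed at runtime by the C library (malloc/free). */
uint8_t *make_buffer(size_t n, uint8_t fill) {
    assert(n > 0);
    uint8_t *p = malloc(n);
    if (p != NULL) {
        memset(p, fill, n);   /* fill the freshly allocated block */
    }
    return p;                 /* the caller is responsible for free() */
}

/* Stack: locals live in a per-call frame set up by compiler-generated code. */
uint32_t sum_locals(uint32_t a, uint32_t b) {
    uint32_t tmp[2] = { a, b };   /* lives in this call's stack frame */
    return tmp[0] + tmp[1] + boot_count;
}
```

On a small MCU the heap is often left out entirely, which is why the slide marks it as optional.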
Cortex-M3 Memory Address Space
• ARM Cortex-M3 processor has a single 4 GB address space
Memory Map
Instruction Set Architecture (ISA)
• Instruction set
– Addressing modes
– Word size
– Data formats
– Operating modes
– Condition codes
Major Elements of ISA
(Figure: 32-bit datapath elements annotated with example instructions)
– mov r0, #1
– ld r1, [r0,#5] (r1 = mem((r0)+5))
– subs r2, #1
– bne loop
Endianness
Addressing: Big Endian vs Little Endian
• Endianness: ordering of bytes within a word
– Little endian: numeric significance increases with increasing
memory addresses (least significant byte first)
– Big endian: the opposite, most significant byte first
– MIPS is traditionally big endian, x86 is little endian
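A minimal C sketch of the byte-ordering idea follows. The function names (is_little_endian, store_be32, swap32) are hypothetical helpers written for this illustration, not part of any library mentioned in these slides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if the machine running this code stores the least significant
   byte at the lowest address (little endian), otherwise 0. */
int is_little_endian(void) {
    uint32_t word = 0x11223344u;
    uint8_t first;
    memcpy(&first, &word, 1);      /* read the byte at the lowest address */
    return first == 0x44;
}

/* Store a 32-bit value in big-endian byte order, independent of the host. */
void store_be32(uint8_t out[4], uint32_t v) {
    assert(out != NULL);
    out[0] = (uint8_t)(v >> 24);   /* most significant byte first */
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

/* Reverse the byte order of a 32-bit word (little <-> big endian). */
uint32_t swap32(uint32_t v) {
    return ((v >> 24) & 0x000000FFu) |
           ((v >> 8)  & 0x0000FF00u) |
           ((v << 8)  & 0x00FF0000u) |
           (v << 24);
}
```

Writing bytes explicitly with shifts, as store_be32 does, gives the same result on either kind of host, which is usually preferable to casting pointers.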
Instruction Encoding
• Instructions are encoded in machine language
opcodes
Traditional ARM instructions
• Fixed length of 32 bits
• Commonly take two or three operands
• Process data held in registers
• Shift & ALU operation in single clock cycle
• Access memory with load and store instructions only
– Load/Store multiple register
• Can be extended to execute conditionally by
adding the appropriate suffix
• Affect the CPSR status flags by adding the 'S' suffix
to the instruction
16bit Thumb-2
• Some of the changes used to reduce the length of
the instructions from 32 bits to 16 bits
– reduce the number of bits used to identify the register
• fewer registers can be used
– reduce the number of bits used for the immediate value
• smaller number range
– remove options such as 'S'
• make it default for some instructions
– remove conditional fields (N, Z, V, C)
– no conditional executions (except branch)
– remove the optional shift (and no barrel shifter
operation)
• introduce dedicated shift instructions
– remove some of the instructions
• more restricted coding
Thumb-2 Implementation
32bit Instruction Encoding
Thumb Instruction Set
• See 4_THUMB_Instr_Set_pt3.pdf included in lab1_files.zip
Updating the APSR
• SUB Rx, Ry
– Rx = Rx - Ry
– APSR unchanged
• SUBS Rx, Ry
– Rx = Rx - Ry
– APSR updated: N or Z bits might be set (C and V are updated as well)
• ADD Rx, Ry
– Rx = Rx + Ry
– APSR unchanged
• ADDS Rx, Ry
– Rx = Rx + Ry
– APSR updated: C or V bits might be set (N and Z are updated as well)
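The flag behavior of the S-suffixed forms can be modeled in C. This is a sketch for illustration only: the Flags type and the adds/subs function names are hypothetical, and the model follows the ARM convention that C after a subtraction means "no borrow".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Software model of the four APSR condition flags. */
typedef struct { bool n, z, c, v; } Flags;

/* Model of ADDS: returns a + b and updates all four flags. */
uint32_t adds(uint32_t a, uint32_t b, Flags *f) {
    assert(f != NULL);
    uint32_t r = a + b;
    f->n = (r >> 31) & 1u;                      /* result negative */
    f->z = (r == 0);                            /* result zero */
    f->c = (r < a);                             /* unsigned carry out */
    f->v = ((~(a ^ b) & (a ^ r)) >> 31) & 1u;   /* signed overflow */
    return r;
}

/* Model of SUBS: returns a - b and updates all four flags. */
uint32_t subs(uint32_t a, uint32_t b, Flags *f) {
    assert(f != NULL);
    uint32_t r = a - b;
    f->n = (r >> 31) & 1u;
    f->z = (r == 0);
    f->c = (a >= b);                            /* C set means no borrow */
    f->v = (((a ^ b) & (a ^ r)) >> 31) & 1u;    /* signed overflow */
    return r;
}
```

The plain SUB and ADD forms compute the same result but leave the flags as they were, which is what lets a later conditional branch test an earlier comparison.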
Conditional Execution
Conditional Execution and Flags
ARM Instruction Set
Data Processing Instructions
• Arithmetic operations:
– ADD, ADC, SUB, SBC, RSB, RSC
• Bit-wise logical operations:
– AND, EOR, ORR, BIC
• Register movement operations:
– MOV, MVN
• Comparison operations:
– TST, TEQ, CMP, CMN
Multiply Instructions
• Integer multiplication (32-bit result)
• Long integer multiplication (64-bit result)
• Built in Multiply Accumulate Unit (MAC)
• Multiply and accumulate instructions add
product to running total
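In C, these operations correspond to ordinary multiplications at different widths. The sketch below uses hypothetical function names (mul32, mul_long, dot_product); which instruction a compiler actually emits for each (for example MUL, SMULL, or SMLAL) depends on the compiler and its options.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit multiply: only the low 32 bits of the product are kept. */
uint32_t mul32(uint32_t a, uint32_t b) {
    return a * b;
}

/* Long multiply: 32 x 32 -> full 64-bit signed product. */
int64_t mul_long(int32_t a, int32_t b) {
    return (int64_t)a * (int64_t)b;
}

/* Multiply-accumulate: each product is added to a 64-bit running total. */
int64_t dot_product(const int32_t *x, const int32_t *y, size_t n) {
    assert(x != NULL && y != NULL);
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += (int64_t)x[i] * (int64_t)y[i];
    }
    return acc;
}
```

The difference between mul32 and mul_long shows why the long form exists: a product that does not fit in 32 bits is silently truncated by the short form.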
Addressing Modes
• Offset Addressing
– Offset is added or subtracted from base register
– Result used as effective address for memory access
– [<Rn>, <offset>]
• Pre-indexed Addressing
– Offset is applied to base register
– Result used as effective address for memory access
– Result written back into base register
– [<Rn>, <offset>]!
• Post-indexed Addressing
– The address from the base register is used as the EA
– The offset is applied to the base and then written back
– [<Rn>], <offset>
<offset> options
• An immediate constant
– #10
• An index register
– <Rm>
• A shifted index register
– <Rm>, LSL #<shift>
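The difference between the three modes is only in when (and whether) the base register is updated. A small C model makes this concrete; the Machine type and the ldr_* function names are hypothetical and exist only for this sketch, with memory simulated by a word array.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS 8u

/* Tiny simulated machine: a word array as memory plus one base register. */
typedef struct {
    uint32_t mem[MEM_WORDS];
    uint32_t rn;                 /* base register, holds a byte address */
} Machine;

/* Read the word at a byte address (must be aligned and in range). */
uint32_t load_word(const Machine *m, uint32_t addr) {
    assert(addr % 4u == 0 && addr / 4u < MEM_WORDS);
    return m->mem[addr / 4u];
}

/* LDR Rt, [Rn, #off] : offset addressing, Rn is left unchanged. */
uint32_t ldr_offset(Machine *m, int32_t off) {
    return load_word(m, m->rn + (uint32_t)off);
}

/* LDR Rt, [Rn, #off]! : pre-indexed, Rn is updated before the access. */
uint32_t ldr_pre(Machine *m, int32_t off) {
    m->rn += (uint32_t)off;
    return load_word(m, m->rn);
}

/* LDR Rt, [Rn], #off : post-indexed, Rn is updated after the access. */
uint32_t ldr_post(Machine *m, int32_t off) {
    uint32_t value = load_word(m, m->rn);
    m->rn += (uint32_t)off;
    return value;
}
```

Post-indexed addressing is the natural fit for walking an array: each load returns the current element and leaves the base register pointing at the next one.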
Block Transfer Instructions
Swap Instruction
Modifying the Status Registers
Software Interrupt
Branching Instructions
• Branch (B):
– jumps forwards/backwards up to 32 MB
• Branch with link (BL):
– same as B, and also saves the return address (PC+4) in LR
• Suitable for function call/return
• Condition codes for conditional branches
IF-THEN Instruction
Barrier instructions
Unified Assembly Language
• UAL supports generation of either Thumb-2 or
ARM instructions from the same source code
– same syntax for both the Thumb code and ARM code
– enable portability of code for different ARM
processor families
• Interpretation of code type is based on the directive
listed in the assembly file
• Example:
– For GNU Assembler, the directive for UAL is
.syntax unified
– For ARM assembler, the directive for UAL is
THUMB
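Example
data:
.byte 0x12, 20, 0x20, -1
func:
mov r0, #0 @ r0 = loop counter
mov r4, #0 @ r4 = running sum
movw r1, #:lower16:data @ r1 = address of data (lower half)
movt r1, #:upper16:data @ r1 = address of data (upper half)
top: ldrb r2, [r1], #1 @ r2 = *r1, then r1 += 1 (post-indexed)
add r4, r4, r2 @ sum += r2
add r0, r0, #1 @ counter += 1
cmp r0, #4 @ all four bytes processed?
bne top @ if not, repeat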
From ARM Architecture Reference Manual
Example 2
int counter;
int Counter_Inc(void) {
return counter++;
}
Resulting (annotated) assembly language with corresponding machine code:
Counter_Inc:
0: f240 0300 movw r3, #:lower16:counter // r3 = &counter
4: f2c0 0300 movt r3, #:upper16:counter
8: 6818 ldr r0, [r3, #0] // r0 = *r3
a: 1c42 adds r2, r0, #1 // r2 = r0 + 1
c: 601a str r2, [r3, #0] // *r3 = r2
e: 4770 bx lr // return r0
• Key points:
– Cortex-M3 utilizes a mixture of 32-bit and 16-bit
instructions (mostly the latter) and the core interacts
with memory solely through load and store instructions
– While there are instructions that load/store groups of
registers (in multiple cycles) there are no instructions
that directly operate on memory locations
How does an assembly language program get turned into an executable program image?
(Figure: build flow)
• Assembly files (.s) → as (assembler) → Object files (.o)
• Object files (.o) + Linker script (.ld, memory layout) → ld (linker) → Executable image file
• Executable image file → Binary program file (.bin)
• Executable image file → Disassembled code (.lst)
_start:
.word STACK_TOP, start
start:
movs r0, #10
movs r1, #0
loop:
adds r1, r0
subs r0, #1
bne loop
deadloop:
b deadloop
.end
What information does the disassembled file provide?
all:
	arm-none-eabi-as -mcpu=cortex-m3 -mthumb example1.s -o example1.o
	arm-none-eabi-ld -Ttext 0x0 -o example1.out example1.o
	arm-none-eabi-objcopy -Obinary example1.out example1.bin
	arm-none-eabi-objdump -S example1.out > example1.lst

example1.s:
.equ STACK_TOP, 0x20000800
.text
.syntax unified
.thumb
.global _start
.type start, %function
_start:
.word STACK_TOP, start
start:
movs r0, #10
movs r1, #0
loop:
adds r1, r0
subs r0, #1
bne loop
deadloop:
b deadloop
.end

example1.lst:
example1.out: file format elf32-littlearm

Disassembly of section .text:

00000000 <_start>:
0: 20000800 .word 0x20000800
4: 00000009 .word 0x00000009

00000008 <start>:
8: 200a movs r0, #10
a: 2100 movs r1, #0

0000000c <loop>:
c: 1809 adds r1, r1, r0
e: 3801 subs r0, #1
10: d1fc bne.n c <loop>

00000012 <deadloop>:
12: e7fe b.n 12 <deadloop>
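Several details of the listing can be checked by hand. The second vector-table word is 0x00000009 rather than 0x00000008 because the address of a Thumb function is stored with bit 0 set. The encoding d1fc is a 16-bit conditional branch whose 8-bit signed offset is counted in halfwords from the instruction address plus 4. The C sketch below reproduces these calculations and the loop's result; the function names are hypothetical and the decoder handles only this one 16-bit conditional-branch format.

```c
#include <assert.h>
#include <stdint.h>

/* Vector-table entry for a Thumb function: code address with bit 0 set. */
uint32_t thumb_vector_entry(uint32_t code_addr) {
    assert((code_addr & 1u) == 0);       /* Thumb code is halfword aligned */
    return code_addr | 1u;
}

/* Target of a 16-bit conditional branch (bits 7..0 hold a signed offset). */
uint32_t bcond_target(uint32_t insn_addr, uint16_t insn) {
    int32_t imm8 = (int32_t)(insn & 0xFFu);
    if (imm8 >= 128) {
        imm8 -= 256;                     /* sign-extend the 8-bit field */
    }
    return insn_addr + 4u + (uint32_t)(imm8 * 2);
}

/* C model of the assembly loop: r1 accumulates r0 while r0 counts down. */
uint32_t run_example1(void) {
    uint32_t r0 = 10, r1 = 0;
    do {
        r1 += r0;                        /* adds r1, r0 */
        r0 -= 1;                         /* subs r0, #1 */
    } while (r0 != 0);                   /* bne loop */
    return r1;
}
```

The model confirms what the program computes: r1 ends up holding 10 + 9 + ... + 1 before execution settles in deadloop.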
Elements of an assembly program?
.equ STACK_TOP, 0x20000800 /* Equates symbol to value */
.text /* Tells AS to assemble region */
.syntax unified /* Means language is ARM UAL */
.thumb /* Means ARM ISA is Thumb */
.global _start /* .global exposes symbol */
/* _start label is the beginning */
/* ...of the program region */
.type start, %function /* Specifies start is a function */
/* start label is reset handler */
_start:
.word STACK_TOP, start /* Inserts word 0x20000800 */
/* Inserts word (start) */
start:
movs r0, #10
movs r1, #0
loop:
adds r1, r0
subs r0, #1
bne loop
deadloop:
b deadloop
.end
How does a mixed C/Assembly program get turned into an executable program image?
(Figure: build flow)
• C files (.c) → gcc (compile + link) → Object files (.o)
• Assembly files (.s) → as (assembler) → Object files (.o)
• Object files (.o) + Library object files (.o) + Linker script (.ld, memory layout) → ld (linker) → Executable image file
• Executable image file → Binary program file (.bin)
• Executable image file → Disassembled code (.lst)
Nested Vectored Interrupt Controller (NVIC)
• Provides key system control registers including the
System Timer (SysTick) that provides a regular timer
interrupt
• Provision for a built-in timer across the Cortex-M3
family has the significant advantage of making
operating system code highly portable – all operating
systems need at least one core timer for time-slicing
• Registers used to control the NVIC are defined to
reside at address 0xE000E000 and are defined by the
Cortex-M3 specification
• These registers are accessed with the system bus
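As a sketch of how the SysTick timer is typically set up, the code below computes the reload value for a desired tick rate. The SysTick register addresses and the 24-bit counter width are defined by the Cortex-M3 architecture; the macro and function names are hypothetical, and the function only does the arithmetic so that it can run on any host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* SysTick registers inside the 0xE000E000 system control space. */
#define SYSTICK_CTRL_ADDR   0xE000E010u   /* control and status */
#define SYSTICK_LOAD_ADDR   0xE000E014u   /* reload value */
#define SYSTICK_VAL_ADDR    0xE000E018u   /* current value */
#define SYSTICK_MAX_RELOAD  0x00FFFFFFu   /* the counter is 24 bits wide */

/* Compute the reload value for tick_hz interrupts per second.
   Returns 1 and stores the value on success, 0 if it does not fit. */
int systick_reload(uint32_t core_hz, uint32_t tick_hz, uint32_t *reload) {
    assert(reload != NULL && tick_hz != 0);
    uint32_t cycles = core_hz / tick_hz;  /* core clock cycles per tick */
    if (cycles == 0 || cycles - 1u > SYSTICK_MAX_RELOAD) {
        return 0;
    }
    *reload = cycles - 1u;                /* counter counts reload..0 */
    return 1;
}
```

On the target, the computed value would be written to the register at SYSTICK_LOAD_ADDR through a volatile pointer before enabling the timer in the control register.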
Outline
• ARM Cortex-M3 processor
• NXP LPC17xx microcontroller unit (MCU)
Basic Processor Based System
While there is significant overlap between the families
and their peripherals, there are also important differences
In the lab of this course we focus on NXP's LPC17xx
family
LPC17xx
• LPC17xx (of NXP) is an ARM Cortex-M3 based microcontroller
• The Cortex-M3 is also the basis for microcontrollers from
other manufacturers including TI, ST, Toshiba, Atmel, etc.
• LPC1768 operates at up to a 100 MHz CPU frequency
• Sophisticated clock system
• Peripherals include:
– up to 512 kB of flash memory, up to 64 kB of data memory
– Ethernet MAC
– a USB interface that can be configured as either Host, Device, or
OTG
– 8 channel general purpose DMA controller
– 4 UARTs, 2 CAN channels, 2 SSP controllers, SPI interface
– 3 I2C interfaces, 2-input plus 2-output I2S interface
– 8 channel 12-bit ADC, 10-bit DAC, motor control PWM
– Quadrature Encoder interface, 4 general purpose timers,
– 6-output general purpose PWM
– ultra-low power RTC with separate battery supply
– up to 70 general purpose I/O pins
LPC1768
• LPC1768 microcontrollers are based on the Cortex-M3
processor with a set of peripherals distributed across three
buses: the Advanced High-performance Bus (AHB) and its two
Advanced Peripheral Bus (APB) sub-buses, APB0 and APB1.
• These peripherals:
– are controlled by the CM3 core with load and store
instructions that access memory mapped registers
– can "interrupt" the core to request attention through peripheral
specific interrupt requests routed through the NVIC
• Data transfers between peripherals and memory can
be automated using DMA
• Labs will cover among others:
– basic peripheral configuration (e.g., lab1 illustrates
GPIO General Purpose I/O peripherals)
– how interrupts can be used to build effective software
– how to use DMA to improve performance and allow
processing to proceed in parallel with data transfer
LPC1768
• Peripherals are "memory-mapped"
– core interacts with the peripheral hardware by reading and writing peripheral
"registers" using load and store instructions
• The various peripheral registers are documented in the user and reference
manuals
– documentation includes bit-level definitions of the various registers and info on
how to interpret those bits
– actual physical addresses are also found in the reference manuals
• Examples of base addresses for several peripherals (see page 14 of the
LPC17xx user manual):
0x40010000 UART1
0x40020000 SPI
0x40028000 GPIO interrupts
0x40034000 ADC
…
• No real need for a programmer to look up all these values as they
are defined in the library file lpc17xx.h as:
LPC_UART1_BASE
LPC_SPI_BASE
LPC_GPIOINT_BASE
LPC_ADC_BASE
…
LPC1768
• Typically, each peripheral has:
• control registers to configure the peripheral
• status registers to determine the
current peripheral status
• data registers to read data from and
write data to the peripheral
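The usual control/status/data pattern can be sketched in C as follows. DemoPeriph, its bit definitions, and demo_send are hypothetical and do not correspond to any real LPC17xx peripheral; the struct is an ordinary object here so the sketch can run on a host, whereas on the target it would be a pointer to the peripheral's base address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical peripheral with one register of each kind. */
typedef struct {
    volatile uint32_t CR;   /* control register: configures the peripheral */
    volatile uint32_t SR;   /* status register: reports current state */
    volatile uint32_t DR;   /* data register: data to and from the peripheral */
} DemoPeriph;

#define DEMO_CR_ENABLE (1u << 0)   /* assumed enable bit */
#define DEMO_SR_READY  (1u << 0)   /* assumed ready-for-data bit */

/* Enable the peripheral, wait (bounded) until it is ready, write one byte.
   Returns 1 on success, 0 if the ready bit never appeared. */
int demo_send(DemoPeriph *p, uint8_t byte, unsigned max_polls) {
    assert(p != NULL);
    p->CR |= DEMO_CR_ENABLE;                   /* configure via control */
    while ((p->SR & DEMO_SR_READY) == 0) {     /* poll status */
        if (max_polls-- == 0) {
            return 0;                          /* give up after max_polls */
        }
    }
    p->DR = byte;                              /* transfer via data */
    return 1;
}
```

The volatile qualifier matters: it forces every access in the polling loop to become a real load, so the compiler cannot assume the status value is unchanged.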
LPC1768
• In addition to providing the addresses of the
peripherals, lpc17xx.h also provides C language level
structures that can be used to access each
peripheral.
• For example, the SPI and GPIO ports are defined by
the following register structures:
typedef struct
{
__IO uint32_t SPCR;
__I uint32_t SPSR;
__IO uint32_t SPDR;
__IO uint32_t SPCCR;
uint32_t RESERVED0[3];
__IO uint32_t SPINT;
} LPC_SPI_TypeDef;
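The position of each member in such a structure determines the register's offset from the peripheral base address, which is why the reserved padding array is needed. The sketch below mirrors the SPI layout under a different name (SpiRegsSketch, hypothetical) and spells out the access qualifiers as plain volatile and volatile const, which is an assumption about how __IO and __I are defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host-side mirror of the SPI register layout shown above. */
typedef struct {
    volatile uint32_t       SPCR;          /* offset 0x00 */
    volatile const uint32_t SPSR;          /* offset 0x04, read-only */
    volatile uint32_t       SPDR;          /* offset 0x08 */
    volatile uint32_t       SPCCR;         /* offset 0x0C */
    uint32_t                RESERVED0[3];  /* pads 0x10..0x1B */
    volatile uint32_t       SPINT;         /* offset 0x1C */
} SpiRegsSketch;

/* Absolute address of a register, given the base and a member offset. */
uint32_t spi_reg_address(uint32_t base, size_t offset) {
    assert(offset < sizeof(SpiRegsSketch));
    return base + (uint32_t)offset;
}
```

For example, spi_reg_address(0x40020000u, offsetof(SpiRegsSketch, SPINT)) gives the address of the interrupt register when the SPI block is based at 0x40020000.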
LPC1768
typedef struct
{
union {
__IO uint32_t FIODIR;
struct {
__IO uint16_t FIODIRL;
__IO uint16_t FIODIRH;
};
struct {
__IO uint8_t FIODIR0;
__IO uint8_t FIODIR1;
__IO uint8_t FIODIR2;
__IO uint8_t FIODIR3;
};
};
uint32_t RESERVED0[3];
union {
__IO uint32_t FIOMASK;
struct {
__IO uint16_t FIOMASKL;
__IO uint16_t FIOMASKH;
};
struct {
__IO uint8_t FIOMASK0;
__IO uint8_t FIOMASK1;
__IO uint8_t FIOMASK2;
__IO uint8_t FIOMASK3;
};
};
union {
__IO uint32_t FIOPIN;
struct {
__IO uint16_t FIOPINL;
__IO uint16_t FIOPINH;
};
struct {
__IO uint8_t FIOPIN0;
__IO uint8_t FIOPIN1;
__IO uint8_t FIOPIN2;
__IO uint8_t FIOPIN3;
};
};
union {
__IO uint32_t FIOSET;
struct {
__IO uint16_t FIOSETL;
__IO uint16_t FIOSETH;
};
struct {
__IO uint8_t FIOSET0;
__IO uint8_t FIOSET1;
__IO uint8_t FIOSET2;
__IO uint8_t FIOSET3;
};
};
union {
__O uint32_t FIOCLR;
struct {
__O uint16_t FIOCLRL;
__O uint16_t FIOCLRH;
};
struct {
__O uint8_t FIOCLR0;
__O uint8_t FIOCLR1;
__O uint8_t FIOCLR2;
__O uint8_t FIOCLR3;
};
};
} LPC_GPIO_TypeDef;
LPC1768
• The register addresses of the various ports are defined
in the library (see lpc17xx.h):
#define LPC_APB0_BASE (0x40000000UL)
…
#define LPC_UART1_BASE (LPC_APB0_BASE + 0x10000)
#define LPC_SPI_BASE (LPC_APB0_BASE + 0x20000)
#define LPC_GPIOINT_BASE (LPC_APB0_BASE + 0x28080)
#define LPC_ADC_BASE (LPC_APB0_BASE + 0x34000)
…
#define LPC_GPIO1 ((LPC_GPIO_TypeDef *) LPC_GPIO1_BASE)
…
…
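The base-plus-offset scheme can be checked with a few lines of C. The SKETCH_ macros below copy the values shown above under different names so they do not clash with the real header. The apb0_slot helper is hypothetical and assumes each APB0 peripheral occupies a 16 kB slot, which should be confirmed against the LPC17xx user manual.

```c
#include <assert.h>
#include <stdint.h>

/* Copies of the header values above, renamed for this sketch. */
#define SKETCH_APB0_BASE     (0x40000000UL)
#define SKETCH_UART1_BASE    (SKETCH_APB0_BASE + 0x10000)
#define SKETCH_SPI_BASE      (SKETCH_APB0_BASE + 0x20000)
#define SKETCH_GPIOINT_BASE  (SKETCH_APB0_BASE + 0x28080)
#define SKETCH_ADC_BASE      (SKETCH_APB0_BASE + 0x34000)

/* Assumed size of one peripheral slot on APB0 (16 kB). */
#define SKETCH_APB_SLOT_SIZE (0x4000UL)

/* Index of the APB0 peripheral slot that contains the given address. */
unsigned long apb0_slot(unsigned long addr) {
    assert(addr >= SKETCH_APB0_BASE);
    return (addr - SKETCH_APB0_BASE) / SKETCH_APB_SLOT_SIZE;
}
```

Because the macros are plain integer expressions, the compiler folds them into constants, so using the symbolic names costs nothing at run time.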
Memory
• On-chip flash memory system
– Up to 512 kB of on-chip flash memory
– Flash memory accelerator maximizes performance for
use with the two fast AHB-Lite buses
– Can be used for both code and data storage
• On-chip Static RAM
– Up to 64 kB of on-chip static RAM memory
– Up to 32 kB of SRAM, accessible by the CPU and all
three DMA controllers, is on a higher-speed bus
– Devices with more than 32 kB SRAM have two
additional 16 kB SRAM blocks
LPC17xx system memory map