
Embedded Systems with ARM Cortex-M Microcontrollers in Assembly Language and C

Chapter 1
Computer and Assembly Language

Dr. Yifeng Zhu


Electrical and Computer Engineering
University of Maine

Spring 2018

1
Embedded Systems

2
Amazon Warehouse

Kiva Robot

3
Assembly Programs

http://www.andysinger.com/

4
Why do we learn Assembly?
• Assembly isn’t “just another language”.
  • It helps you understand how the processor works.
• Hand-written assembly can run faster than compiled high-level code. Performance-critical code may need to be written in assembly.
  • Use profiling tools to find the performance bottleneck and rewrite that code section in assembly.
  • Latency-sensitive applications, such as aircraft controllers
  • Standard C compilers do not use some operations available on ARM processors, such as ROR (Rotate Right) and RRX (Rotate Right Extended).
• Hardware/processor-specific code
  • Processor booting code
  • Device drivers
  • A test-and-set atomic assembly instruction can be used to implement locks and semaphores.
• Cost-sensitive applications
  • Embedded devices where the size of code is limited, such as washing machine controllers and automobile controllers
• The best applications are written by those who have mastered assembly language or fully understand the low-level implementation of the high-level language statements they are choosing.
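To make the ROR remark concrete: C has no rotate operator, so a rotate has to be spelled out with shifts. The sketch below is one common way to write it; the function name `rotate_right32` is illustrative and not from the slides. Whether a given compiler turns this into a single ROR instruction depends on the compiler and its options, so treat that as something to check in the generated assembly rather than a guarantee.

```c
#include <stdint.h>

/* Rotate a 32-bit value right by n bit positions.
 * Bits shifted out on the right re-enter on the left.
 * The count is reduced modulo 32, so n = 0 and n = 32 both return value. */
uint32_t rotate_right32(uint32_t value, unsigned n)
{
    n &= 31u;                                            /* keep the count in 0..31 */
    return (value >> n) | (value << ((32u - n) & 31u));  /* never shifts by 32 */
}
```

For example, rotating 0x12345678 right by 8 moves the low byte 0x78 to the top, giving 0x78123456.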
5
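The test-and-set bullet on the slide above can also be sketched in C. This is a minimal spinlock built on the C11 `atomic_flag` type from `<stdatomic.h>`, assuming a C11 toolchain; the `SpinLock` type and the `spinlock_*` function names are made up for this example. How the compiler maps the atomic operation onto processor instructions depends on the target.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* A spinlock is a single flag: clear = unlocked, set = locked. */
typedef struct {
    atomic_flag locked;
} SpinLock;

void spinlock_init(SpinLock *lock)
{
    atomic_flag_clear(&lock->locked);
}

/* Try once: atomically set the flag and report whether it was clear before.
 * Returns true if the caller now owns the lock. */
bool spinlock_try_acquire(SpinLock *lock)
{
    return !atomic_flag_test_and_set_explicit(&lock->locked, memory_order_acquire);
}

/* Busy-wait until the lock is obtained. */
void spinlock_acquire(SpinLock *lock)
{
    while (!spinlock_try_acquire(lock)) {
        /* spin */
    }
}

void spinlock_release(SpinLock *lock)
{
    atomic_flag_clear_explicit(&lock->locked, memory_order_release);
}
```

The key property is that testing the old value and setting the new one happen as one indivisible step, so two contenders can never both see "unlocked".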
Why ARM processors?
• As of 2005, 98% of the more than one billion mobile phones sold each year used ARM processors.
• As of 2009, ARM processors accounted for approximately 90% of all embedded 32-bit RISC processors.
• In 2010 alone, 6.1 billion ARM-based processors were shipped, representing 95% of smartphones, 35% of digital televisions and set-top boxes, and 10% of mobile computers.
• As of 2014, over 50 billion ARM processors have been produced.

6
iPhone 5 Teardown

The A6 processor is the first Apple System-on-Chip (SoC) to use a custom design, based off the ARMv7 instruction set.

http://www.ifixit.com
7
iPhone 6 Teardown

The A8 processor is a 64-bit ARM-based SoC. It supports the ARM A64, A32, and T32 instruction sets.

http://www.ifixit.com
8
iPhone 7 Teardown

A10 processor:
• 64-bit system on chip (SoC)
• ARMv8-A core

9
Apple Watch
• Apple S1 Processor
  • 32-bit ARMv7-A compatible
  • Number of cores: 1
  • CMOS technology: 28 nm
  • L1 cache: 32 KB data
  • L2 cache: 256 KB
  • GPU: PowerVR SGX543

10
Kindle Fire HD

Texas Instruments OMAP 4460 dual-core processor

http://www.ifixit.com
11
Fitbit Flex Teardown

STMicroelectronics 32L151C6 Ultra Low Power ARM Cortex-M3 Microcontroller

Nordic Semiconductor nRF8001 Bluetooth Low Energy Connectivity IC

www.ifixit.com
12
Samsung Galaxy Gear

• STMicroelectronics STM32F401B ARM Cortex-M4 MCU with 128 KB Flash

source: ifixit.com

13
Pebble Smartwatch

• STMicroelectronics STM32F205RE ARM Cortex-M3 MCU, with a maximum speed of 120 MHz

source: ifixit.com
14
Oculus VR

• Facebook’s $2 Billion Acquisition Of Oculus in 2014
• STMicroelectronics STM32F072VB ARM Cortex-M0 32-bit RISC Core Microcontroller

source: ifixit.com
15
HTC Vive

STMicroelectronics 32F072R8 ARM Cortex-M0 Microcontroller

source: ifixit.com
16
Nest Learning Thermostat

• STMicroelectronics STM32L151VB ultra-low-power 32 MHz ARM Cortex-M3 MCU

source: ifixit.com
17
Samsung Gear Fit Fitness Tracker

• STMicroelectronics STM32F439ZI 180 MHz, 32-bit ARM Cortex-M4 CPU

source: ifixit.com

18
Memory
• Memory is arranged as a series of “locations”
  • Each location has a unique “address”
  • Each location holds a byte (byte-addressable)
  • e.g. the memory location at address 0x080001B0 contains the byte value 0x70, i.e., 112
• The number of locations in memory is limited
  • e.g. 4 GB of RAM
  • 1 Gigabyte (GB) = 2^30 bytes
  • 2^32 locations = 4,294,967,296 locations!
• Values stored at each location can represent either program data or program instructions
  • e.g. the value 0x70 might be the code used to tell the processor to add two values together

Memory
Data (8 bits)   Address (32 bits)
                0xFFFFFFFF
                ...
70              0x080001B0
BC              0x080001AF
18              0x080001AE
01              0x080001AD
A0              0x080001AC
                ...
                0x00000000
19
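As a small illustration of byte-addressable memory, the sketch below models the five bytes from the diagram as a C array and looks one up by its address. The names `BASE_ADDR`, `slide_bytes`, `read_byte` and `address_count` are invented for this example; only the byte values and addresses come from the slide.

```c
#include <stdint.h>

#define BASE_ADDR 0x080001ACu   /* lowest address shown in the diagram */

/* Byte values from the diagram, in order of increasing address. */
static const uint8_t slide_bytes[] = { 0xA0, 0x01, 0x18, 0xBC, 0x70 };

/* Return the byte stored at addr.
 * addr must lie in 0x080001AC..0x080001B0, the only range modelled here. */
uint8_t read_byte(uint32_t addr)
{
    return slide_bytes[addr - BASE_ADDR];
}

/* Number of distinct locations an address of the given width can name.
 * address_bits must be at most 63. */
uint64_t address_count(unsigned address_bits)
{
    return (uint64_t)1 << address_bits;
}
```

With a 32-bit address, `address_count(32)` gives the 4,294,967,296 locations quoted on the slide, and `read_byte(0x080001B0)` gives 0x70, which is 112 in decimal.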
Computer Architecture
• Von-Neumann: Instructions and data are stored in the same memory.
• Harvard: Data and instructions are stored in separate memories.
20
ARM Cortex-M Series Family

Von-Neumann: Instructions and data are stored in the same memory.
  Core              Architecture
  ARM Cortex-M0     ARMv6-M
  ARM Cortex-M0+    ARMv6-M
  ARM Cortex-M1     ARMv6-M
  ARM Cortex-M23    ARMv8-M

Harvard: Data and instructions are stored in separate memories.
  Core              Architecture
  ARM Cortex-M3     ARMv7-M
  ARM Cortex-M4     ARMv7E-M
  ARM Cortex-M7     ARMv7E-M
  ARM Cortex-M33    ARMv8-M

22
Levels of Program Code
C Program --(Compile)--> Assembly Program --(Assemble)--> Machine Program

C Program
    int main(void){
        int i;
        int total = 0;
        for (i = 0; i < 10; i++) {
            total += i;
        }
        while(1); // Dead loop
    }

Machine Program
    0010000100000000
    0010000000000000
    1110000000000001
    0100010000000001
    0001110001000000
    0010100000001010
    1101110011111011
    1011111100000000
    1110011111111110

C Program
• High-level language
• Level of abstraction closer to problem domain
• Provides for productivity and portability

Assembly Program
• Assembly language
• Textual representation of instructions

Machine Program
• Hardware representation
• Binary digits (bits)
• Encoded instructions and data

23
See a Program Run

C Code
    int main(void){
        int a = 0;
        int b = 1;
        int c;
        c = a + b;
        return 0;
    }

Assembly Code (produced by the compiler)
    MOVS r1, #0x00     ; int a = 0
    MOVS r2, #0x01     ; int b = 1
    ADDS r3, r1, r2    ; c = a + b
    MOVS r0, #0x00     ; set return value
    BX   lr            ; return

Machine Code
    In Binary          In Hex
    0010000100000000   2100   ; MOVS r1, #0x00
    0010001000000001   2201   ; MOVS r2, #0x01
    0001100010001011   188B   ; ADDS r3, r1, r2
    0010000000000000   2000   ; MOVS r0, #0x00
    0100011101110000   4770   ; BX lr
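The hex values in the machine code table can be reproduced by packing register numbers and immediates into fixed bit fields. The sketch below does this for the three instruction forms used on this slide. The function names are invented for illustration, and the bit layouts are the 16-bit Thumb encodings as best understood here; they should be checked against the ARM architecture reference manual before being relied on.

```c
#include <stdint.h>

/* MOVS Rd, #imm8   ->  00100 ddd iiiiiiii   (Rd in r0-r7, imm8 in 0-255) */
uint16_t encode_movs_imm(unsigned rd, unsigned imm8)
{
    return (uint16_t)(0x2000u | ((rd & 7u) << 8) | (imm8 & 0xFFu));
}

/* ADDS Rd, Rn, Rm  ->  0001100 mmm nnn ddd  (all registers in r0-r7) */
uint16_t encode_adds_reg(unsigned rd, unsigned rn, unsigned rm)
{
    return (uint16_t)(0x1800u | ((rm & 7u) << 6) | ((rn & 7u) << 3) | (rd & 7u));
}

/* BX Rm            ->  010001110 mmmm 000   (Rm in r0-r15; lr is r14) */
uint16_t encode_bx(unsigned rm)
{
    return (uint16_t)(0x4700u | ((rm & 15u) << 3));
}
```

Calling `encode_adds_reg(3, 1, 2)` yields 0x188B and `encode_bx(14)` yields 0x4770, matching the table.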
24
Processor Registers
• Fastest way to read and write
• Registers are within the processor chip
• Each register stores a 32-bit value
• STM32L has
  • R0-R12: 13 general-purpose registers
  • R13: Stack pointer (Shadow of MSP or PSP)
  • R14: Link register (LR)
  • R15: Program counter (PC)
  • Special registers (xPSR, BASEPRI, PRIMASK, etc.)

General Purpose Registers (32 bits each)
  Low Registers:   R0, R1, R2, R3, R4, R5, R6, R7
  High Registers:  R8, R9, R10, R11, R12
  R13 (SP):        R13 (MSP), R13 (PSP)
  R14 (LR)
  R15 (PC)

Special Purpose Registers (32 bits each)
  xPSR, BASEPRI, PRIMASK, FAULTMASK, CONTROL

25
Program Execution
• Program Counter (PC) is a register that holds the memory address of the next instruction to be fetched from the memory.

Execution cycle (repeats):
1. Fetch instruction at PC address
2. Decode the instruction
3. Execute the instruction

Memory
Data   Address
4770   0x080001B4
2000   0x080001B2
188B   0x080001B0   <- PC
2201   0x080001AE
2100   0x080001AC

PC = 0x080001B0
Instruction = 188B or 2000188B or 8B180020
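One way to read the three instruction values above is as the same bytes viewed in three ways: a 16-bit halfword read at the PC, a 32-bit word read at the PC, and the raw bytes in storage order. The sketch below assumes little-endian storage (low byte at the lower address), so the four bytes at 0x080001B0 through 0x080001B3 would be 8B 18 00 20. That byte order and the function names are assumptions made for this example.

```c
#include <stdint.h>

/* Read a 16-bit halfword stored little-endian (low byte at the lower address). */
uint16_t read_halfword_le(const uint8_t *bytes)
{
    return (uint16_t)(bytes[0] | (bytes[1] << 8));
}

/* Read a 32-bit word stored little-endian. */
uint32_t read_word_le(const uint8_t *bytes)
{
    return (uint32_t)bytes[0]
         | ((uint32_t)bytes[1] << 8)
         | ((uint32_t)bytes[2] << 16)
         | ((uint32_t)bytes[3] << 24);
}
```

Given the bytes 8B 18 00 20, the halfword read returns 0x188B (the ADDS instruction) and the word read returns 0x2000188B (ADDS in the low half, the following MOVS in the high half).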

26
Three-stage pipeline:
Fetch, Decode, Execution
• Pipelining allows hardware resources to be fully utilized
• One 32-bit instruction or two 16-bit instructions can be fetched.

Pipeline of 32-bit instructions

27
Three-stage pipeline:
Fetch, Decode, Execution

Clock ->
Instruction i        Fetch    Decode   Execution
Instruction i + 1             Fetch    Decode   Execution
Instruction i + 2                      Fetch    Decode   Execution
Instruction i + 3                               Fetch    Decode   Execution

Pipeline of 16-bit instructions
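The benefit of overlapping the stages can be put in numbers. The sketch below compares cycle counts with and without pipelining under idealized assumptions that are not stated on the slide: every stage takes exactly one clock cycle and there are no stalls or branches. The function names are illustrative.

```c
/* Cycles needed when each instruction finishes all stages before the next starts. */
unsigned long cycles_unpipelined(unsigned long instructions, unsigned stages)
{
    return instructions * stages;
}

/* Cycles for an ideal pipeline: the first instruction takes `stages` cycles,
 * then one further instruction completes on every following cycle. */
unsigned long cycles_pipelined(unsigned long instructions, unsigned stages)
{
    if (instructions == 0) {
        return 0;
    }
    return stages + (instructions - 1);
}
```

For four instructions and three stages, this gives 12 cycles without pipelining and 6 cycles with it.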


28
Machine codes are stored in memory

CPU (ALU and registers)
Register    Value
r15 (pc)
r14 (lr)
r13 (sp)
r12 to r4
r3
r2
r1
r0

Memory
Data   Address
       0xFFFFFFFF
4770   0x080001B4
2000   0x080001B2
188B   0x080001B0
2201   0x080001AE
2100   0x080001AC
       0x00000000
29
Fetch Instruction: pc = 0x080001AC
Decode Instruction: 2100 = MOVS r1, #0x00

CPU (ALU and registers)
Register    Value
r15 (pc)    0x080001AC
r14 (lr)
r13 (sp)
r12 to r4
r3
r2
r1
r0

Memory
Data   Address
       0xFFFFFFFF
4770   0x080001B4
2000   0x080001B2
188B   0x080001B0
2201   0x080001AE
2100   0x080001AC
       0x00000000
30
Execute Instruction:
MOVS r1, #0x00

CPU (ALU and registers)
Register    Value
r15 (pc)    0x080001AC
r14 (lr)
r13 (sp)
r12 to r4
r3
r2
r1          0x00000000
r0

Memory
Data   Address
       0xFFFFFFFF
4770   0x080001B4
2000   0x080001B2
188B   0x080001B0
2201   0x080001AE
2100   0x080001AC
       0x00000000
31
Fetch Next Instruction: pc = pc + 2
Decode & Execute: 2201 = MOVS r2, #0x01

CPU (ALU and registers)
Register    Value
r15 (pc)    0x080001AE
r14 (lr)
r13 (sp)
r12 to r4
r3
r2          0x00000001
r1          0x00000000
r0

Memory
Data   Address
       0xFFFFFFFF
4770   0x080001B4
2000   0x080001B2
188B   0x080001B0
2201   0x080001AE
2100   0x080001AC
       0x00000000
32
Fetch Next Instruction: pc = pc + 2
Decode & Execute: 188B = ADDS r3, r1, r2

CPU (ALU and registers)
Register    Value
r15 (pc)    0x080001B0
r14 (lr)
r13 (sp)
r12 to r4
r3          0x00000001
r2          0x00000001
r1          0x00000000
r0

Memory
Data   Address
       0xFFFFFFFF
4770   0x080001B4
2000   0x080001B2
188B   0x080001B0
2201   0x080001AE
2100   0x080001AC
       0x00000000
33
Fetch Next Instruction: pc = pc + 2
Decode & Execute: 2000 = MOVS r0, #0x00

CPU (ALU and registers)
Register    Value
r15 (pc)    0x080001B2
r14 (lr)
r13 (sp)
r12 to r4
r3          0x00000001
r2          0x00000001
r1          0x00000000
r0          0x00000000

Memory
Data   Address
       0xFFFFFFFF
4770   0x080001B4
2000   0x080001B2
188B   0x080001B0
2201   0x080001AE
2100   0x080001AC
       0x00000000
34
Fetch Next Instruction: pc = pc + 2
Decode & Execute: 4770 = BX lr

CPU (ALU and registers)
Register    Value
r15 (pc)    0x080001B4
r14 (lr)
r13 (sp)
r12 to r4
r3          0x00000001
r2          0x00000001
r1          0x00000000
r0          0x00000000

Memory
Data   Address
       0xFFFFFFFF
4770   0x080001B4
2000   0x080001B2
188B   0x080001B0
2201   0x080001AE
2100   0x080001AC
       0x00000000
35
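The fetch, decode and execute walk-through on the preceding slides can be replayed in software. The following is a minimal sketch of a simulator that understands only the three instruction forms in this program. The `Cpu` type, `step`, `run` and `PROGRAM_BASE` are names invented here. As simplifications, condition flags are ignored and `pc` holds the address of the instruction being fetched, as drawn on the slides.

```c
#include <stdint.h>

typedef struct {
    uint32_t r[16];   /* r[13] = sp, r[14] = lr, r[15] = pc */
} Cpu;

/* The five halfwords from the slides, stored from address 0x080001AC upward. */
#define PROGRAM_BASE 0x080001ACu
static const uint16_t program[] = { 0x2100, 0x2201, 0x188B, 0x2000, 0x4770 };

/* Fetch, decode and execute one instruction.
 * Returns 1 to keep going, 0 after BX or an encoding this sketch does not know. */
int step(Cpu *cpu)
{
    uint16_t insn = program[(cpu->r[15] - PROGRAM_BASE) / 2];   /* fetch */
    cpu->r[15] += 2;                                             /* pc = pc + 2 */

    if ((insn & 0xF800u) == 0x2000u) {            /* MOVS Rd, #imm8 */
        cpu->r[(insn >> 8) & 7u] = insn & 0xFFu;
        return 1;
    }
    if ((insn & 0xFE00u) == 0x1800u) {            /* ADDS Rd, Rn, Rm */
        cpu->r[insn & 7u] = cpu->r[(insn >> 3) & 7u] + cpu->r[(insn >> 6) & 7u];
        return 1;
    }
    if ((insn & 0xFF87u) == 0x4700u) {            /* BX Rm: jump to the address in Rm */
        cpu->r[15] = cpu->r[(insn >> 3) & 15u] & ~1u;
        return 0;
    }
    return 0;
}

/* Run from the first instruction until BX.
 * Returns the number of instructions executed, including the BX. */
int run(Cpu *cpu)
{
    int count = 1;
    cpu->r[15] = PROGRAM_BASE;
    while (step(cpu)) {
        count++;
    }
    return count;
}
```

Starting from zeroed registers with a hypothetical return address in `lr`, `run` executes five instructions and leaves r0 = 0, r1 = 0, r2 = 1 and r3 = 1, the same register values the slides build up step by step.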
Example:
Calculate the Sum of an Array

    int a[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    int total;

    int main(void){
        int i;
        total = 0;
        for (i = 0; i < 10; i++) {
            total += a[i];
        }
        while(1);
    }

36
Example:
Calculate the Sum of an Array

Instruction Memory (Flash), starting memory address 0x08000000
    int main(void){
        int i;
        total = 0;
        for (i = 0; i < 10; i++) {
            total += a[i];
        }
        while(1);
    }

Data Memory (RAM), starting memory address 0x20000000
    int a[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    int total;

CPU
I/O Devices

37
Example:
Calculate the Sum of an Array

Instruction Memory (Flash), starting memory address 0x08000000
    int main(void){
        int i;
        total = 0;
        for (i = 0; i < 10; i++) {
            total += a[i];
        }
        while(1);
    }

Machine code                               Assembly
0010 0001 0000 0000                               MOVS r1, #0x00
0100 1010 0000 1000                               LDR r2, = total_addr
0110 0000 0001 0001                               STR r1, [r2, #0x00]
0010 0000 0000 0000                               MOVS r0, #0x00
1110 0000 0000 1000                               B Check
0100 1001 0000 0111                        Loop:  LDR r1, = a_addr
1111 1000 0101 0001 0001 0000 0010 0000           LDR r1, [r1, r0, LSL #2]
0100 1010 0000 0100                               LDR r2, = total_addr
0110 1000 0001 0010                               LDR r2, [r2, #0x00]
0100 0100 0001 0001                               ADD r1, r1, r2
0100 1010 0000 0011                               LDR r2, = total_addr
0110 0000 0001 0001                               STR r1, [r2, #0x00]
0001 1100 0100 0000                               ADDS r0, r0, #1
0010 1000 0000 1010                        Check: CMP r0, #0x0A
1101 1011 1111 0100                               BLT Loop
1011 1111 0000 0000                               NOP
1110 0111 1111 1110                        Self:  B Self
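The control flow of the assembly listing (jump to the check first, then loop back while the counter is below 10) can be mirrored in C with explicit labels. This is only a reading aid written for this example, not compiler output. The function name `sum_like_assembly` is invented, and it returns the sum instead of spinning forever at `Self`.

```c
int a[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
int total;

/* Same steps as the assembly listing, one C statement per group of instructions. */
int sum_like_assembly(void)
{
    int i;                  /* plays the role of r0 */

    total = 0;              /* MOVS r1, #0 ; LDR r2, =total_addr ; STR r1, [r2] */
    i = 0;                  /* MOVS r0, #0 */
    goto check;             /* B Check */
loop:
    total = a[i] + total;   /* load a[i], load total, ADD, store total */
    i = i + 1;              /* ADDS r0, r0, #1 */
check:
    if (i < 10) {           /* CMP r0, #0x0A */
        goto loop;          /* BLT Loop */
    }
    return total;           /* the slide's program loops at "Self: B Self" here */
}
```

Note that the listing reloads and stores `total` in memory on every iteration, which is why `total` is kept as a global here and not held in a local variable.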

38
Example:
Calculate the Sum of an Array

Data Memory (RAM)
    int a[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    int total;

Assume the starting memory address of the data memory is 0x20000000

Memory address   Memory
in bytes         content   Variable
0x20000000       0x0001    a[0] = 0x00000001
0x20000002       0x0000
0x20000004       0x0002    a[1] = 0x00000002
0x20000006       0x0000
0x20000008       0x0003    a[2] = 0x00000003
0x2000000A       0x0000
0x2000000C       0x0004    a[3] = 0x00000004
0x2000000E       0x0000
0x20000010       0x0005    a[4] = 0x00000005
0x20000012       0x0000
0x20000014       0x0006    a[5] = 0x00000006
0x20000016       0x0000
0x20000018       0x0007    a[6] = 0x00000007
0x2000001A       0x0000
0x2000001C       0x0008    a[7] = 0x00000008
0x2000001E       0x0000
0x20000020       0x0009    a[8] = 0x00000009
0x20000022       0x0000
0x20000024       0x000A    a[9] = 0x0000000A
0x20000026       0x0000
0x20000028       0x0000    total = 0x00000000
0x2000002A       0x0000
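The addresses in the table follow a simple rule: each `int` takes 4 bytes, so element `i` sits 4 × i bytes past the start of the array. The sketch below computes those addresses under the slide's assumption that data memory starts at 0x20000000; the macro and function names are invented for this example.

```c
#include <stdint.h>

#define DATA_BASE    0x20000000u   /* start of data memory assumed on the slide */
#define ELEMENT_SIZE 4u            /* each int occupies 4 bytes in the table */
#define ARRAY_LENGTH 10u

/* Address of a[index]: base address plus index times the element size. */
uint32_t element_address(uint32_t index)
{
    return DATA_BASE + index * ELEMENT_SIZE;
}

/* In the table, total is placed directly after the last array element. */
uint32_t total_address(void)
{
    return DATA_BASE + ARRAY_LENGTH * ELEMENT_SIZE;
}
```

For instance, `element_address(9)` gives 0x20000024 and `total_address()` gives 0x20000028, matching the table.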
39
Loading Code and Data into Memory

• Stack is mandatory
• Heap is used only if
dynamic allocation (e.g.
malloc, calloc) is used.
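To illustrate the difference between the two regions, here is a short sketch in which a local variable lives on the stack while a buffer obtained from `malloc` lives on the heap. The function name `sum_with_heap` is invented for this example, and it assumes a C library with a working heap is available on the target.

```c
#include <stdlib.h>

/* Sum 1..n using a temporary heap buffer.
 * n must be at least 1. Returns -1 if the allocation fails. */
long sum_with_heap(size_t n)
{
    long total = 0;                              /* local variable: on the stack */
    int *values = malloc(n * sizeof *values);    /* dynamic allocation: on the heap */
    if (values == NULL) {
        return -1;
    }
    for (size_t i = 0; i < n; i++) {
        values[i] = (int)(i + 1);
    }
    for (size_t i = 0; i < n; i++) {
        total += values[i];
    }
    free(values);                                /* heap memory is released explicitly */
    return total;
}
```

A program that never calls `malloc` or `calloc` does not need a heap at all, but every function call still uses the stack for return addresses and local variables.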

42
View of a Binary Program

43
44
from st.com
45
from st.com
46
STM32L4
from st.com
47
Memory Map

48
