Chapter 01 See Program Running

The document provides an introduction to embedded systems using ARM Cortex-M microcontrollers, focusing on the importance of assembly language for performance-critical applications. It discusses the architecture of ARM processors, the significance of assembly programming, and the structure of computer systems, including memory organization and instruction execution. Additionally, it outlines the hierarchy of programming languages and the role of compilers and interpreters in program execution.

Embedded Systems with ARM Cortex-M Microcontrollers in Assembly Language and C

Chapter 1
Computer and Assembly Language

Dr. Yifeng Zhu


Electrical and Computer Engineering
University of Maine

Spring 2021

Embedded Systems

Amazon Warehouse

Kiva Robot

Assembly Programming

Embedded Systems
Performance, performance, performance!
Cost, cost, cost!

source: http://www.andysinger.com/

Why should we learn Assembly?
• Assembly isn’t “just another language”.
- It helps you understand how the processor works.
• A carefully written assembly program can run faster than compiled high-level code.
- Performance-critical code may need to be written in assembly.
- Use profiling tools to find the performance bottleneck and rewrite that code section in assembly.
- Latency-sensitive applications, such as aircraft controllers
- Standard C has no direct operator for some operations available on ARM processors, such as ROR (Rotate Right) and RRX (Rotate Right Extended).
• Hardware/processor-specific code
- Processor booting code
- Device drivers
- Compiler, assembler, linker
- A test-and-set atomic assembly instruction can be used to implement locks and semaphores.
• Cost-sensitive applications
- Embedded devices where the size of code is limited, such as washing machine controllers and automobile controllers

Why ARM processor
• ARM: Acorn RISC Machine
• In 2010 alone, 6.1 billion ARM-based processors were shipped, representing 95% of smartphones, 35% of digital televisions and set-top boxes, and 10% of mobile computers.
• As of 2019, 150 billion ARM processors have been produced.

[Figure: microcontroller peripherals: ADC, DAC, USB 2.0, touch sensing, USART, SPI, I2C, advanced timers, motor control, LCD driver]

iPhone 11
A13 Bionic:
• 64-bit system on chip (SoC)
• 6 ARMv8.3-A cores
• Introduced on Sept. 10, 2019
source: ifixit.com

Amazon Echo (Alexa)
• Texas Instruments DM3725 Digital Media Processor
• ARM Cortex-A8

Kindle HD Fire
• Texas Instruments OMAP 4460 dual-core processor
source: http://www.ifixit.com

Fitbit Flex Teardown
• STMicroelectronics 32L151C6 Ultra Low Power ARM Cortex-M3 Microcontroller
• Nordic Semiconductor nRF8001 Bluetooth Low Energy Connectivity IC
source: www.ifixit.com

Samsung Galaxy Gear
• STMicroelectronics STM32F401B ARM Cortex-M4 MCU with 128KB Flash
source: ifixit.com

Pebble Smartwatch
• STMicroelectronics STM32F205RE ARM Cortex-M3 MCU, with a maximum speed of 120 MHz
source: ifixit.com

Oculus VR
• Facebook’s $2 Billion Acquisition of Oculus in 2014
• STMicroelectronics STM32F072VB ARM Cortex-M0 32-bit RISC Core Microcontroller
source: ifixit.com

HTC Vive
• STMicroelectronics 32F072R8 ARM Cortex-M0 Microcontroller
source: ifixit.com

Nest Learning Thermostat
• STMicroelectronics STM32L151VB ultra-low-power 32 MHz ARM Cortex-M3 MCU
source: ifixit.com

Samsung Gear Fit Fitness Tracker
• STMicroelectronics STM32F439ZI 180 MHz, 32-bit ARM Cortex-M4 CPU
source: ifixit.com

Memory
• Memory is arranged as a series of “locations”.
- Each location has a unique “address”.
- Each location holds a byte (byte-addressable).
- e.g. the memory location at address 0x080001B0 contains the byte value 0x70, i.e., 112.
• The number of locations in memory is limited.
- e.g. 4 GB of RAM
- 1 Gigabyte (GB) = 2^30 bytes
- 2^32 locations = 4,294,967,296 locations!
• Values stored at each location can represent either program data or program instructions.
- e.g. the value 0x70 might be the code used to tell the processor to add two values together.

Address (32 bits)   Data (8 bits)
0xFFFFFFFF          ...
0x080001B0          70
0x080001AF          BC
0x080001AE          18
0x080001AD          01
0x080001AC          A0
0x00000000          ...

Computer Architecture
• Von Neumann: instructions and data are stored in the same memory.
• Harvard: data and instructions are stored in separate memories.

ARM Cortex-M Series Family
• Von Neumann (instructions and data are stored in the same memory)
- ARM Cortex-M0 (ARMv6-M)
- ARM Cortex-M0+ (ARMv6-M)
- ARM Cortex-M1 (ARMv6-M)
- ARM Cortex-M23 (ARMv8-M)
• Harvard (data and instructions are stored in separate memories)
- ARM Cortex-M3 (ARMv7-M)
- ARM Cortex-M4 (ARMv7E-M)
- ARM Cortex-M7 (ARMv7E-M)
- ARM Cortex-M33 (ARMv8-M)

INTRODUCTION
Introduction
• Digital computer:
- Machine that can solve problems by carrying out instructions.

• Program:
- Sequence of instructions for performing a certain task.

• Machine language:
- Instructions that can be executed by the hardware of a computer.
Add two numbers.
Check a number to see if it is zero.
Copy a piece of data from one memory location to
another.
- Primitive instructions are as simple as possible.

Machine programs are difficult and tedious for people to use.


Programming Languages
Design a new set of instructions that is more convenient for people.
• Two different languages:
- Language L0 executed by computer.
- Language L1 used by people.
• Translation
- Replace L1 program (the source) by an equivalent L0 program (the target).
- The computer executes the generated L0 program.
• Interpretation
- Write an L0 program that takes L1 programs as inputs and executes them.
- The computer executes the L0 program (the interpreter).

Fundamental techniques of executing programs in other languages.


Translation and Interpretation

• Translation: the entire program is converted in advance.
- Execution of translated programs is faster.

• Interpretation: the program is converted in “real time” (online), while it runs.

A language translator is also called a compiler.


Virtual Machine
Avoid thinking in terms of translation or interpretation.

• Imagine a virtual machine.


- Hypothetical computer M1 whose machine language is L1.
- If such a machine can be constructed, no need for machine executing L0.

• People can write programs for virtual machine.


- M1 may be built in hardware.
- L1 programs may be translated to L0 programs.
- L1 programs may be interpreted by L0 program.

We can neglect implementation of M1 when writing L1 programs.



Language Layers
For practical implementation, L0 and L1 should not be too different.
• L1 may be a bit better than L0.
- L1 is far from ideal for most applications.

• Continue language building process.


- Invent language L2 that is a bit more people-oriented than L1.
- Implement virtual machine M2 on top of M1.

A hierarchy of language layers is constructed.


A Multi-Level Machine

Overview
How can we manage the complexity of computer systems?
• A computer can be regarded as a hierarchy of levels.
- Each level performs some well-defined function.
- Each level is implemented on top of the next lower level.
- The lowest level is the physical level (hardware).

• Each level serves as a layer of abstraction.


- Its implementation is not interesting to the upper layers.
- It hides the details of the lower layers from the upper layers.

We can understand a computer on different layers of abstraction.


• Introduction.
- Overview and historic development.

• Computer systems organization.


- The components of a computer and their interconnection.

• The hierarchy of levels.


1. Digital logic: implementation of computation by digital devices.
2. Microarchitecture: structure of a computer processor.
3. Instruction set architecture: operations provided by a computer processor.
4. Operating system: extension of the instruction set by additional services.
5. Assembly language: generation of executable programs from textual descriptions.

• Outlook.
- Compilation: the generation of assembly programs from high-level languages.
- Computer networks: the interconnection of computer systems to distributed systems.
Contemporary Multi-Level Machine

Levels of Program Code

C Program -> (compile) -> Assembly Program -> (assemble) -> Machine Program

C Program:
int main(void){
    int i;
    int total = 0;
    for (i = 0; i < 10; i++) {
        total += i;
    }
    while(1); // Dead loop
}

Machine Program:
0010000100000000
0010000000000000
1110000000000001
0100010000000001
0001110001000000
0010100000001010
1101110011111011
1011111100000000
1110011111111110

• High-level language
- Level of abstraction closer to problem domain
- Provides for productivity and portability
• Assembly language
- Textual representation of instructions
- Human-readable format instructions
• Hardware representation
- Binary digits (bits)
- Encoded instructions and data
- Computer-readable format instructions

See a Program Run

C Code:
int main(void){
    int a = 0;
    int b = 1;
    int c;
    c = a + b;
    return 0;
}

Assembly Code (produced by the compiler):
MOVS r1, #0x00    ; int a = 0
MOVS r2, #0x01    ; int b = 1
ADDS r3, r1, r2   ; c = a + b
MOVS r0, #0x00    ; set return value
BX   lr           ; return

Machine Code (produced by the assembler):
In Binary          In Hex
0010000100000000   2100   ; MOVS r1, #0x00
0010001000000001   2201   ; MOVS r2, #0x01
0001100010001011   188B   ; ADDS r3, r1, r2
0010000000000000   2000   ; MOVS r0, #0x00
0100011101110000   4770   ; BX lr

Processor Registers
• Fastest way to read and write
• Registers are within the processor chip
• A register stores a 32-bit value
• ARM Cortex-M has
- R0-R12: 13 general-purpose registers
- R13: Stack pointer (shadow of MSP or PSP)
- R14: Link register (LR)
- R15: Program counter (PC)
- Special registers (xPSR, BASEPRI, PRIMASK, etc.)

Register file (each register is 32 bits):
- Low registers (general purpose): R0-R7
- High registers (general purpose): R8-R12
- R13 (SP), banked as R13 (MSP) and R13 (PSP)
- R14 (LR)
- R15 (PC)
- Special-purpose registers: xPSR, BASEPRI, PRIMASK, FAULTMASK, CONTROL

Program Execution
• Program Counter (PC) is a register that holds the memory address of the next instruction to be fetched from the memory.

1. Fetch instruction at PC address
2. Decode the instruction
3. Execute the instruction

Memory (address: data)
0x080001B4: 4770
0x080001B2: 2000
0x080001B0: 188B
0x080001AE: 2201
0x080001AC: 2100

PC = 0x080001B0
Instruction = 188B or 2000188B or 8B180020

Three-stage pipeline: Fetch, Decode, Execution
• Pipelining allows hardware resources to be fully utilized.
• One 32-bit instruction or two 16-bit instructions can be fetched.

[Figure: pipeline of 32-bit instructions]

[Figure: pipeline of 16-bit instructions. On successive clock cycles, instructions i, i + 1, i + 2, ... each pass through Instruction Fetch, Instruction Decode, and Instruction Execution, with the stages of consecutive instructions overlapping.]

Machine codes are stored in memory

Memory (address: data)
0xFFFFFFFF
0x080001B4: 4770
0x080001B2: 2000
0x080001B0: 188B
0x080001AE: 2201
0x080001AC: 2100
0x00000000

CPU: registers r0-r12, r13 (sp), r14 (lr), r15 (pc), and the ALU

Fetch Instruction: pc = 0x080001AC
Decode Instruction: 2100 = MOVS r1, #0x00
Registers: pc = 0x080001AC

Execute Instruction: MOVS r1, #0x00
Registers: pc = 0x080001AC, r1 = 0x00000000

Fetch Next Instruction: pc = pc + 2
Registers: pc = 0x080001AE, r1 = 0x00000000
• Thumb-2 consists of a mix of 16- and 32-bit instructions.
• In reality, we always fetch 4 bytes from the instruction memory (either one 32-bit instruction or two 16-bit instructions).
• To simplify the demo, we assume we only fetch 2 bytes from the instruction memory in this example.

Fetch Next Instruction: pc = pc + 2
Decode & Execute: 2201 = MOVS r2, #0x01
Registers: pc = 0x080001AE, r1 = 0x00000000, r2 = 0x00000001

Fetch Next Instruction: pc = pc + 2
Decode & Execute: 188B = ADDS r3, r1, r2
Registers: pc = 0x080001B0, r1 = 0x00000000, r2 = 0x00000001, r3 = 0x00000001

Fetch Next Instruction: pc = pc + 2
Decode & Execute: 2000 = MOVS r0, #0x00
Registers: pc = 0x080001B2, r0 = 0x00000000, r1 = 0x00000000, r2 = 0x00000001, r3 = 0x00000001

Fetch Next Instruction: pc = pc + 2
Decode & Execute: 4770 = BX lr
Registers: pc = 0x080001B4, r0 = 0x00000000, r1 = 0x00000000, r2 = 0x00000001, r3 = 0x00000001

Realities
• In the previous example, PC is incremented by 2.

Well, I lied!

Realities
• PC is always incremented by 4.
- Each time, 4 bytes are fetched from the instruction memory.
- It is either two 16-bit instructions or one 32-bit instruction.

If bits [15:11] = 11101, 11110, or 11111, then it is the first half-word of a 32-bit instruction.
Otherwise, it is a 16-bit instruction.

Example: Calculate the Sum of an Array

int a[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
int total;

int main(void){
    int i;
    total = 0;
    for (i = 0; i < 10; i++) {
        total += a[i];
    }
    while(1);
}

Example: Calculate the Sum of an Array

Instruction Memory (Flash), starting memory address 0x08000000:
int main(void){
    int i;
    total = 0;
    for (i = 0; i < 10; i++) {
        total += a[i];
    }
    while(1);
}

Data Memory (RAM), starting memory address 0x20000000:
int a[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
int total;

[Figure: the CPU connected to the instruction memory, the data memory, and the I/O devices]

Example: Calculate the Sum of an Array

Instruction Memory (Flash), starting memory address 0x08000000

Machine code                               Assembly
0010 0001 0000 0000                               MOVS r1, #0x00
0100 1010 0000 1000                               LDR  r2, =total_addr
0110 0000 0001 0001                               STR  r1, [r2, #0x00]
0010 0000 0000 0000                               MOVS r0, #0x00
1110 0000 0000 1000                               B    Check
0100 1001 0000 0111                        Loop:  LDR  r1, =a_addr
1111 1000 0101 0001 0001 0000 0010 0000           LDR  r1, [r1, r0, LSL #2]
0100 1010 0000 0100                               LDR  r2, =total_addr
0110 1000 0001 0010                               LDR  r2, [r2, #0x00]
0100 0100 0001 0001                               ADD  r1, r1, r2
0100 1010 0000 0011                               LDR  r2, =total_addr
0110 0000 0001 0001                               STR  r1, [r2, #0x00]
                                                  ADDS r0, r0, #1
                                           Check: CMP  r0, #0x0A
                                                  BLT  Loop
                                                  NOP

Example: Calculate the Sum of an Array

Data Memory (RAM)
int a[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
int total;

Assume the starting memory address of the data memory is 0x20000000.

Memory address (in bytes)   Memory content   Variable
0x20000054                  0x00000000
0x20000050                  0x00000000
0x2000004C                  0x00000000
0x20000048                  0x00000000
0x20000044                  0x00000000
0x20000040                  0x00000000
0x2000003C                  0x00000000
0x20000038                  0x00000000
0x20000034                  0x00000000
0x20000030                  0x00000000
0x2000002C                  0x00000000
0x20000028                  0x00000000       total = 0x00000000
0x20000024                  0x0000000A       a[9] = 0x0000000A
0x20000020                  0x00000009       a[8] = 0x00000009
0x2000001C                  0x00000008       a[7] = 0x00000008
0x20000018                  0x00000007       a[6] = 0x00000007
0x20000014                  0x00000006       a[5] = 0x00000006
0x20000010                  0x00000005       a[4] = 0x00000005
0x2000000C                  0x00000004       a[3] = 0x00000004
0x20000008                  0x00000003       a[2] = 0x00000003
0x20000004                  0x00000002       a[1] = 0x00000002
0x20000000                  0x00000001       a[0] = 0x00000001

Loading Code and Data into Memory

• Stack is mandatory.
• Heap is used only if dynamic allocation (e.g. malloc, calloc) is used.

View of a Binary Program

STM32F401RE Nucleo-64 Board
from st.com

STM32L476 Nucleo-64 Board
from st.com

Host Target Development

Target System
- Software and hardware debugged in test phase while tethered.
- Standalone in deployment phase.
- Processor and peripherals
- Real Time System: embedded processor(s), memory, sensors, actuators, communication devices

Host Computer
- Systems software developed on host system and downloaded to target system
- Integrated Development Environment (IDE): compiler (cross), assembler, linker, debugger, flash programmer

USB Cable (between host and target)
- Download
- Debug
- Power

Software Compilation Toolchain

Embedded Development Suite

[Figure] from st.com

STM32L4
from st.com

Memory Map