Chapter 01 See Program Running
Chapter 1
Computer and Assembly Language
Spring 2021
Embedded Systems
Amazon Warehouse
Kiva Robot
Assembly Programming
Embedded Systems
Performance, performance, performance!
Cost, cost, cost!
http://www.andysinger.com/
Why should we learn Assembly?
Assembly isn’t “just another language”.
Helps you understand how the processor works
Assembly programs can run faster than code written in a high-level language.
Performance-critical code may need to be written in assembly.
Use profiling tools to find the performance bottleneck and rewrite that code section in assembly
Latency-sensitive applications, such as aircraft controllers
C has no rotate operator, so compilers may not always use some operations available on ARM processors, such as ROR (Rotate Right) and RRX (Rotate Right Extended).
Hardware/processor-specific code
Processor booting code
Device drivers
Compiler, assembler, linker
A test-and-set atomic assembly instruction can be used to implement locks and semaphores.
Cost-sensitive applications
Embedded devices where code size is limited, such as washing machine controllers and automobile controllers
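The point about ROR and RRX can be made concrete. The sketch below writes both rotations in portable C, assuming the fixed-width `uint32_t` from `<stdint.h>`; the names `rotr32` and `rrx32` are hypothetical. A rotate has to be spelled as two shifts and an OR, and RRX needs the carry flag passed in and out explicitly, because C cannot see the processor's flags.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate right by n bits (n taken modulo 32), the operation ROR performs.
 * Masking the shift counts keeps both shifts below 32, which avoids
 * undefined behavior when n is 0 or a multiple of 32. */
uint32_t rotr32(uint32_t x, unsigned n)
{
    n &= 31u;
    return (x >> n) | (x << ((32u - n) & 31u));
}

/* Rotate right extended by one bit, the operation RRX performs: the
 * carry flag enters at bit 31 and the old bit 0 becomes the new carry.
 * C cannot read the processor's carry flag, so it is passed explicitly. */
uint32_t rrx32(uint32_t x, unsigned carry_in, unsigned *carry_out)
{
    *carry_out = x & 1u;
    return (x >> 1) | ((uint32_t)(carry_in & 1u) << 31);
}
```

Whether `rotr32` becomes a single ROR instruction depends on the compiler and optimization level; inspecting the generated assembly is the reliable way to find out.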
Why ARM processor
ARM: Acorn RISC Machine
As of 2019, 150 billion ARM processors have been produced
[Figure: an ARM-based microcontroller with peripherals such as USART, SPI, I2C, advanced timers, motor control, and an LCD driver]
iPhone 11
A13 Bionic:
• 64-bit system on chip (SoC)
• 6 ARMv8.3-A cores
• Introduced on Sept. 10, 2019
source: ifixit.com
Amazon Echo (Alexa)
Texas Instruments DM3725 Digital Media Processor
ARM Cortex-A8
Kindle Fire HD
Texas Instruments OMAP 4460 dual-core processor
source: http://www.ifixit.com
Fitbit Flex Teardown
STMicroelectronics 32L151C6 Ultra Low Power ARM Cortex-M3 Microcontroller
Nordic Semiconductor nRF8001 Bluetooth Low Energy Connectivity IC
source: www.ifixit.com
Samsung Galaxy Gear
STMicroelectronics STM32F401B ARM Cortex-M4 MCU with 128KB Flash
source: ifixit.com
Pebble Smartwatch
STMicroelectronics STM32F205RE ARM Cortex-M3 MCU, with a maximum speed of 120 MHz
source: ifixit.com
Oculus VR
STMicroelectronics 32F072R8 ARM Cortex-M0 Microcontroller
source: ifixit.com
Nest Learning Thermostat
STMicroelectronics STM32F439ZI 180 MHz, 32-bit ARM Cortex-M4 CPU
source: ifixit.com
Memory
Memory is arranged as a series of “locations”
Each location has a unique “address”
Each location holds a byte (byte-addressable)
e.g. the memory location at address 0x080001B0 contains the byte value 0x70, i.e., 112
The number of locations in memory is limited
e.g. 4 GB of RAM
1 Gigabyte (GB) = 2^30 bytes
2^32 locations = 4,294,967,296 locations!
Values stored at each location can represent either program data or program instructions
e.g. the value 0x70 might be the code used to tell the processor to add two values together
[Figure: memory drawn as a column of 8-bit data at 32-bit addresses, from 0x00000000 at the bottom to 0xFFFFFFFF at the top]
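The arithmetic behind "4 GB" can be checked in a few lines of C. This is a minimal sketch; `addressable_bytes` is a hypothetical helper, and it assumes byte addressing as described above, so an n-bit address selects one of 2^n bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Number of byte locations reachable with an address that is `bits` wide.
 * With byte addressing each address names exactly one byte, so the count
 * is 2^bits. Valid for 1 <= bits <= 63. */
uint64_t addressable_bytes(unsigned bits)
{
    assert(bits >= 1 && bits <= 63);
    return (uint64_t)1 << bits;
}
```

A 32-bit address therefore covers 0x00000000 through 0xFFFFFFFF: `addressable_bytes(32)` is 4,294,967,296, which is 4 × 2^30 bytes, i.e. 4 GB.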
Computer Architecture
Von-Neumann: Instructions and data are stored in the same memory.
Harvard: Data and instructions are stored in separate memories.
ARM Cortex-M Series Family
[Figure: ARM Cortex-M processors, grouped by Von-Neumann and Harvard architecture]
Introduction
• Digital computer:
- Machine that can solve problems by carrying out instructions.
• Program:
- Sequence of instructions for performing a certain task.
• Machine language:
- Instructions that can be executed by the hardware of a computer.
Add two numbers.
Check a number to see if it is zero.
Copy a piece of data from one memory location to another.
- Primitive instructions are as simple as possible.
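Language Layers
L0 is the machine language that the hardware executes directly; L1 is a more convenient language whose programs are translated or interpreted into L0.
For practical implementation, L0 and L1 should not be too different.
• L1 may be a bit better than L0.
- But L1 is still far from ideal for most applications.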
Overview
How can we manage the complexity of computer systems?
• A computer can be regarded as a hierarchy of levels.
- Each level performs some well-defined function.
- Each level is implemented on top of the next lower level.
- The lowest level is the physical level (hardware).
• Outlook.
- Compilation: the generation of assembly programs from high-level languages.
- Computer networks: the interconnection of computer systems into distributed systems.
[Figure: A contemporary multi-level machine]
Levels of Program Code
C Program --(Compile)--> Assembly Program --(Assemble)--> Machine Program

C Program:
int main(void){
    int i;
    int total = 0;
    for (i = 0; i < 10; i++)
    {
        total += i;
    }
    while(1); // Dead loop
}

Machine Program:
0010000100000000
0010000000000000
1110000000000001
0100010000000001
0001110001000000
0010100000001010
1101110011111011
1011111100000000
1110011111111110

High-level language: level of abstraction closer to the problem domain; provides for productivity and portability.
Assembly language: textual representation of instructions; human-readable format instructions.
Hardware representation: binary digits (bits); encoded instructions and data; computer-readable format instructions.
See a Program Run
C Code:
int main(void){
    int a = 0;
    int b = 1;
    int c;
    c = a + b;
    return 0;
}

(compiler)

Assembly Code:
MOVS r1, #0x00   ; int a = 0
MOVS r2, #0x01   ; int b = 1
ADDS r3, r1, r2  ; c = a + b
MOVS r0, #0x00   ; set return value
BX   lr          ; return

(assembler)

Machine Code:
In Binary           In Hex
0010000100000000    2100 ; MOVS r1, #0x00
0010001000000001    2201 ; MOVS r2, #0x01
0001100010001011    188B ; ADDS r3, r1, r2
0010000000000000    2000 ; MOVS r0, #0x00
0100011101110000    4770 ; BX lr
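Machine code is just bit fields, and the five half-words above can be rebuilt by hand. This sketch assumes the 16-bit Thumb layouts for MOVS (immediate), ADDS (register) and BX given in the ARMv7-M Architecture Reference Manual; the helper names are hypothetical and cover only these three instructions, with low registers for MOVS and ADDS.

```c
#include <assert.h>
#include <stdint.h>

/* Field layouts (bit 15 on the left):
 *   MOVS Rd, #imm8   -> 00100 ddd iiiiiiii
 *   ADDS Rd, Rn, Rm  -> 0001100 mmm nnn ddd
 *   BX   Rm          -> 010001110 mmmm 000   */
uint16_t enc_movs_imm(unsigned rd, unsigned imm8)
{
    assert(rd < 8 && imm8 < 256);          /* low register, 8-bit immediate */
    return (uint16_t)(0x2000u | (rd << 8) | imm8);
}

uint16_t enc_adds_reg(unsigned rd, unsigned rn, unsigned rm)
{
    assert(rd < 8 && rn < 8 && rm < 8);    /* low registers only */
    return (uint16_t)(0x1800u | (rm << 6) | (rn << 3) | rd);
}

uint16_t enc_bx(unsigned rm)
{
    assert(rm < 16);                       /* any register, lr is r14 */
    return (uint16_t)(0x4700u | (rm << 3));
}
```

For example, `enc_adds_reg(3, 1, 2)` puts r2 in bits [8:6], r1 in bits [5:3] and r3 in bits [2:0], giving 0x188B, and `enc_bx(14)` gives 0x4770 for BX lr.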
Processor Registers
Fastest way to read and write
Registers are within the processor chip
A register stores a 32-bit value
ARM Cortex-M has
R0-R12: 13 general-purpose registers (R0-R7: low registers; R8-R12: high registers)
R13: Stack pointer (Shadow of MSP or PSP)
R14: Link register (LR)
R15: Program counter (PC)
Special registers (xPSR, BASEPRI, PRIMASK, FAULTMASK, CONTROL)
[Figure: the 32-bit register file, with R13 (SP) banked as MSP or PSP, next to the special-purpose registers]
Program Execution
Program Counter (PC) is a register that holds the memory address of the next instruction to be fetched from the memory.
1. Fetch the instruction at the PC address
2. Decode the instruction
3. Execute the instruction

Memory Address   Memory
0x080001B4       4770
0x080001B2       2000
0x080001B0       188B
0x080001AE       2201
0x080001AC       2100

PC = 0x080001B0
Instruction = 188B or 2000188B or 8B180020
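The three candidate values for "Instruction" differ only in how the bytes are viewed. Assuming the bytes at 0x080001B0 to 0x080001B3 are 8B, 18, 00, 20 (the last form above), a little-endian load, which is how ARMv7-M fetches instructions, assembles them into the 32-bit word 0x2000188B, whose lower half-word 0x188B is the instruction at the PC. The helper names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Little-endian loads: the byte at the lowest address is the least
 * significant byte of the result. */
uint16_t load_half_le(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

uint32_t load_word_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

With `const uint8_t mem[4] = {0x8B, 0x18, 0x00, 0x20};` standing for addresses 0x080001B0 to 0x080001B3, `load_word_le(mem)` is 0x2000188B, `load_half_le(mem)` is 0x188B (ADDS r3, r1, r2) and `load_half_le(mem + 2)` is 0x2000 (MOVS r0, #0x00).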
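Three-stage pipeline:
Fetch, Decode, Execution
Pipelining allows hardware resources to be fully utilized: while one instruction executes, the next one is decoded and the one after that is fetched.
One 32-bit instruction or two 16-bit instructions can be fetched at a time.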
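A back-of-the-envelope count shows why the pipeline helps. The sketch assumes an ideal pipeline (one clock cycle per stage, no stalls, no branches), which real code does not always achieve. Without pipelining, n instructions on a k-stage machine take n * k cycles; with pipelining, the first instruction takes k cycles and each later one finishes one cycle after the previous one, for k + n - 1 cycles. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Each instruction passes through all stages before the next one starts. */
uint64_t cycles_unpipelined(uint64_t n, uint64_t stages)
{
    return n * stages;
}

/* Fill the pipeline once (stages cycles), then one instruction completes
 * every cycle. */
uint64_t cycles_pipelined(uint64_t n, uint64_t stages)
{
    return n == 0 ? 0 : stages + (n - 1);
}
```

For a five-instruction program on a three-stage pipeline this gives 15 cycles without pipelining and 7 with it.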
Well, I lied!
Realities
PC is always incremented by 4.
Each time, 4 bytes are fetched from the instruction memory.
These 4 bytes hold either two 16-bit instructions or one 32-bit instruction.
If bits [15:11] of a half-word are 11101, 11110, or 11111, then it is the first half-word of a 32-bit instruction.
Otherwise, it is a 16-bit instruction.
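The bits [15:11] rule translates directly into a test on each half-word. The sketch below (hypothetical function names) applies it, then walks a half-word stream, advancing two half-words for a 32-bit instruction and one otherwise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if hw is the first half-word of a 32-bit Thumb instruction,
 * i.e. bits [15:11] are 0b11101, 0b11110 or 0b11111. */
bool is_thumb32_prefix(uint16_t hw)
{
    unsigned top5 = hw >> 11;
    return top5 == 0x1Du || top5 == 0x1Eu || top5 == 0x1Fu;
}

/* Count instructions in n half-words: a 32-bit instruction consumes two
 * half-words, a 16-bit instruction consumes one. */
size_t count_thumb_instructions(const uint16_t *hw, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; count++)
        i += is_thumb32_prefix(hw[i]) ? 2 : 1;
    return count;
}
```

For example, 0xF851 (bits [15:11] = 11111) starts a 32-bit instruction, while 0x188B (bits [15:11] = 00011) is a complete 16-bit instruction.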
Example:
Calculate the Sum of an Array
int a[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
int total;

int main(void){
    int i;
    total = 0;
    for (i = 0; i < 10; i++) {
        total += a[i];
    }
    while(1);
}
Example:
Calculate the Sum of an Array
Instruction Memory (Flash), starting memory address 0x08000000: holds the code of main()
Data Memory (RAM), starting memory address 0x20000000: holds
int a[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
int total;
[Figure: the CPU connected to the instruction memory, the data memory, and I/O devices]
Example:
Calculate the Sum of an Array
Instruction Memory (Flash), starting memory address 0x08000000:
Machine code                  Assembly
0010 0001 0000 0000           MOVS r1, #0x00
0100 1010 0000 1000           LDR  r2, =total_addr
0110 0000 0001 0001           STR  r1, [r2, #0x00]
0010 0000 0000 0000           MOVS r0, #0x00
1110 0000 0000 1000           B    Check
0100 1001 0000 0111   Loop:   LDR  r1, =a_addr
1111 1000 0101 0001           LDR  r1, [r1, r0, LSL #2]
0001 0000 0010 0000
0100 1010 0000 0100           LDR  r2, =total_addr
0110 1000 0001 0010           LDR  r2, [r2, #0x00]
0100 0100 0001 0001           ADD  r1, r1, r2
0100 1010 0000 0011           LDR  r2, =total_addr
0110 0000 0001 0001           STR  r1, [r2, #0x00]
                              ADDS r0, r0, #1
                      Check:  CMP  r0, #0x0A
                              BLT  Loop
                              NOP
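The control flow of this listing (jump to the test first, loop body above it) can be mirrored in C with goto. This is an illustrative transcription, not compiler output: `r0`, `r1` and `r2` are ordinary C variables standing in for the registers, the globals stand in for the data at a_addr and total_addr, and the function name is hypothetical.

```c
#include <assert.h>

int a[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
int total;

void sum_array_asm_style(void)
{
    int r0, r1, r2;

    r1 = 0;                  /* MOVS r1, #0x00                 */
    total = r1;              /* STR  r1, [r2, #0x00]           */
    r0 = 0;                  /* MOVS r0, #0x00  (i = 0)        */
    goto Check;              /* B    Check                     */
Loop:
    r1 = a[r0];              /* LDR  r1, [r1, r0, LSL #2]      */
    r2 = total;              /* LDR  r2, [r2, #0x00]           */
    r1 = r1 + r2;            /* ADD  r1, r1, r2                */
    total = r1;              /* STR  r1, [r2, #0x00]           */
    r0 = r0 + 1;             /* ADDS r0, r0, #1  (i++)         */
Check:
    if (r0 < 10) goto Loop;  /* CMP r0, #0x0A ; BLT Loop (signed) */
}
```

Putting the comparison at the bottom means each iteration runs one conditional branch (BLT) rather than a test at the top plus an unconditional jump back.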
Example:
Calculate the Sum of an Array
Data Memory (RAM)
int a[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
int total;
Assume the starting memory address of the data memory is 0x20000000.

Memory address (in bytes)   Memory content
0x20000054                  0x00000000
0x20000050                  0x00000000
0x2000004C                  0x00000000
0x20000048                  0x00000000
0x20000044                  0x00000000
0x20000040                  0x00000000
0x2000003C                  0x00000000
0x20000038                  0x00000000
0x20000034                  0x00000000
0x20000030                  0x00000000
0x2000002C                  0x00000000
0x20000028                  0x00000000   total = 0x00000000
0x20000024                  0x0000000A   a[9] = 0x0000000A
0x20000020                  0x00000009   a[8] = 0x00000009
0x2000001C                  0x00000008   a[7] = 0x00000008
0x20000018                  0x00000007   a[6] = 0x00000007
0x20000014                  0x00000006   a[5] = 0x00000006
0x20000010                  0x00000005   a[4] = 0x00000005
0x2000000C                  0x00000004   a[3] = 0x00000004
0x20000008                  0x00000003   a[2] = 0x00000003
0x20000004                  0x00000002   a[1] = 0x00000002
0x20000000                  0x00000001   a[0] = 0x00000001
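The addresses in this table follow one rule: with 4-byte int elements starting at 0x20000000, a[i] sits at base + 4 * i, which the assembly computes with `r0, LSL #2`. A minimal sketch; `DATA_BASE` and `element_addr` are hypothetical names, and the base address is the one assumed on this slide.

```c
#include <assert.h>
#include <stdint.h>

#define DATA_BASE 0x20000000u  /* assumed start of data memory (RAM) */

/* Byte address of element i of an int array starting at base, with
 * 4-byte elements: shifting left by 2 multiplies i by 4. */
uint32_t element_addr(uint32_t base, uint32_t i)
{
    return base + (i << 2);
}
```

`element_addr(DATA_BASE, 9)` gives 0x20000024 for a[9], and `element_addr(DATA_BASE, 10)` gives 0x20000028, the address the table shows for total, which in this layout directly follows the array.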
Loading Code and Data into Memory
• Stack is mandatory
• Heap is used only if dynamic allocation (e.g., malloc, calloc) is used.
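A small C function shows which storage touches the stack and which touches the heap. This is a sketch; `heap_sum` is a hypothetical name, and whether the local variables actually live on the stack or only in registers is up to the compiler. The heap is involved only because `calloc` is called; a program with no such call needs no heap at all.

```c
#include <assert.h>
#include <stdlib.h>

/* Sum 1..n using a scratch buffer taken from the heap.
 * Returns -1 if the allocation fails. */
long heap_sum(size_t n)
{
    int *buf = calloc(n, sizeof *buf);  /* heap allocation */
    if (buf == NULL)
        return -1;

    long local_sum = 0;                 /* automatic variable (stack or register) */
    for (size_t i = 0; i < n; i++) {
        buf[i] = (int)(i + 1);
        local_sum += buf[i];
    }
    free(buf);                          /* give the block back to the heap */
    return local_sum;
}
```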
View of a Binary Program
STM32F401RE Nucleo-64 Board
from st.com
STM32L476 Nucleo-64 Board
from st.com
Host Target Development
Target System (processor and peripherals), a Real Time System:
- Embedded processor(s)
- Memory
- Sensors
- Actuators
- Communication Devices
Host Computer, running an Integrated Development Environment (IDE):
- Compiler (Cross)
- Assembler
- Linker
- Debugger
- Flash Programmer
The host and the target are connected by a USB cable, used to:
- Download
- Debug
- Power
Software Compilation Toolchain
Embedded Development Suite
from st.com
STM32L4
from st.com
Memory Map