Chapter 08 ARM Subroutines 2 Stack Preserve Environment Edited
Chapter 8
Preserve Environment via Stack
Spring 2020
2
Stack
A Last-In-First-Out (LIFO) memory model
Only the most recently added item can be accessed
This item is also called the top of the stack
Key operations:
push (add item to stack)
pop (remove top item from stack)
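The two key operations can be sketched in C as follows. This is a minimal illustration of the LIFO idea only, not of the Cortex-M hardware stack; the type `Stack`, the capacity of 16 items, and the function names are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_CAPACITY 16   /* hypothetical size, chosen for the example */

/* A small LIFO stack of 32-bit items; `count` is the number of stored items. */
typedef struct {
    uint32_t items[STACK_CAPACITY];
    int count;
} Stack;

/* push: add an item on top of the stack. */
static void stack_push(Stack *s, uint32_t value) {
    assert(s->count < STACK_CAPACITY);   /* no room left otherwise */
    s->items[s->count++] = value;
}

/* pop: remove and return the most recently pushed item (the top). */
static uint32_t stack_pop(Stack *s) {
    assert(s->count > 0);                /* nothing to pop otherwise */
    return s->items[--s->count];
}
```

Pushing 1, 2, 3 and then popping three times yields 3, 2, 1: the last item in is the first item out.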
Tower of Hanoi
STACK: Last In First Out
http://en.wikipedia.org/wiki/File:Tower_of_Hanoi_4.gif
5
Stack Growth Convention: Ascending vs Descending
Ascending stack: the stack grows upwards, toward higher memory addresses (0xFFFFFFFF)
Descending stack: the stack grows downwards, toward lower memory addresses (0x00000000)
In both conventions, PUSH moves the stack top away from the stack base and POP moves it back
Full stack: SP points to the last item pushed onto the stack
Empty stack: SP points to the next free space on the stack
7
Cortex-M Stack
Cortex-M uses a full descending stack (the stack grows downwards from the stack base)
Example: PUSH/POP {r0,r6,r3}
Stack pointer (SP, aka R13)
decremented on PUSH: SP = SP - 4 * # of registers
incremented on POP: SP = SP + 4 * # of registers
SP starts at 0x20000200 for STM32-
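The SP arithmetic above can be checked with a short C sketch. The helper names `sp_after_push` and `sp_after_pop` are made up for this illustration; the starting value 0x20000200 is the one quoted on the slide.

```c
#include <assert.h>
#include <stdint.h>

/* SP after PUSH of n registers on a full descending stack: SP - 4 * n. */
static uint32_t sp_after_push(uint32_t sp, unsigned n) {
    assert(n >= 1 && n <= 16);   /* a register list holds 1 to 16 registers */
    return sp - 4u * n;
}

/* SP after POP of n registers: SP + 4 * n. */
static uint32_t sp_after_pop(uint32_t sp, unsigned n) {
    assert(n >= 1 && n <= 16);
    return sp + 4u * n;
}
```

For PUSH {r0,r6,r3} with SP = 0x20000200, three registers are stored, so `sp_after_push(0x20000200, 3)` gives 0x200001F4; the matching POP brings SP back to 0x20000200.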
8
Full Descending Stack
PUSH {register_list}
equivalent to:
STMDB SP!, {register_list}
DB: Decrement Before
9
Stack Implementation
Stack Name             Push                                 Pop
                       Equivalent       Alternative         Equivalent       Alternative
Full Descending (FD)   STMFD SP!,list   STMDB SP!,list      LDMFD SP!,list   LDMIA SP!,list
Empty Descending (ED)  STMED SP!,list   STMDA SP!,list      LDMED SP!,list   LDMIB SP!,list
Full Ascending (FA)    STMFA SP!,list   STMIB SP!,list      LDMFA SP!,list   LDMDA SP!,list
Empty Ascending (EA)   STMEA SP!,list   STMIA SP!,list      LDMEA SP!,list   LDMDB SP!,list
10
Typical Usage of Stack
Why do we need a stack?
Saving the original contents of the processor's registers at the beginning of a subroutine (the contents are restored at the end of the subroutine)
Storing local variables in a subroutine
Passing extra arguments to a subroutine
Saving the processor's registers upon an interrupt
11
Stack
PUSH {Rd}
SP = SP-4 ⟶ descending stack
(*SP) = Rd ⟶ full stack
• The order in which registers are listed in the register list does not matter.
• When pushing multiple registers, they are automatically sorted by register number, and the lowest-numbered register is stored to the lowest memory address, i.e. it is stored last.
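The ordering rule can be turned into a small address calculation. The sketch below is an illustration under assumed names (`count_regs`, `push_slot`); the register list is represented as a 16-bit mask in which bit n stands for rn, so the order in which registers are written in the list cannot even be expressed.

```c
#include <assert.h>
#include <stdint.h>

/* Count the registers in `mask` (bit n set means rn is in the list). */
static unsigned count_regs(uint16_t mask) {
    unsigned n = 0;
    for (int i = 0; i < 16; i++)
        if (mask & (1u << i)) n++;
    return n;
}

/* Address at which PUSH {mask} stores register `reg`,
   given that SP == sp before the push (full descending stack). */
static uint32_t push_slot(uint32_t sp, uint16_t mask, int reg) {
    assert(reg >= 0 && reg < 16 && (mask & (1u << reg)));
    uint32_t new_sp = sp - 4u * count_regs(mask);          /* SP after the push */
    uint16_t lower = (uint16_t)(mask & ((1u << reg) - 1u)); /* listed registers below reg */
    return new_sp + 4u * count_regs(lower);                /* lowest register sits at new SP */
}
```

With SP = 0x20000200 and the list {r3, r1, r7, r2}, this places r1 at 0x200001F0, r2 at 0x200001F4, r3 at 0x200001F8 and r7 at 0x200001FC, regardless of how the list was written.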
12
Stack
POP {Rd}
Rd = (*SP) ⟶ full stack
SP = SP + 4 ⟶ Stack shrinks
• The order in which registers are listed in the register list does not matter.
• When popping multiple registers, they are automatically sorted by register number, and the lowest-numbered register is loaded from the lowest memory address, i.e. it is loaded first.
13
Full Descending Stack
PUSH {r3, r1, r7, r2}
Largest-numbered register is pushed first.
POP {r3, r1, r7, r2}
Smallest-numbered register is popped first.
17
Example: swap R1 & R2
Register values across the animation steps (memory diagram not recoverable):
Steps 1-3: R1 = 0x11111111, R2 = 0x22222222
Step 4:    R1 = 0x22222222, R2 = 0x22222222
Step 5:    R1 = 0x22222222, R2 = 0x11111111
22
Quiz
Are the values of R1 and R2 swapped?
Answer: No.
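The instructions shown on the swap and quiz slides did not survive extraction, so the following C sketch rests on two assumptions: that the working swap uses four single-register instructions (PUSH {r1}, PUSH {r2}, POP {r1}, POP {r2}), which matches the register values listed in the example, and that the quiz asks about PUSH {r1, r2} followed by POP {r2, r1}. The `Cpu` type and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define WORDS 16

typedef struct {
    uint32_t r[16];        /* r0..r15 */
    uint32_t mem[WORDS];   /* simulated stack memory, index 0 = lowest address */
    int sp;                /* index into mem; full descending, starts at WORDS */
} Cpu;

/* PUSH {list}: bit n of mask set means rn is listed; highest register is stored first. */
static void push(Cpu *c, uint16_t mask) {
    for (int n = 15; n >= 0; n--)
        if (mask & (1u << n)) {
            assert(c->sp > 0);
            c->mem[--c->sp] = c->r[n];
        }
}

/* POP {list}: lowest-numbered register is loaded first, from the lowest address. */
static void pop(Cpu *c, uint16_t mask) {
    for (int n = 0; n <= 15; n++)
        if (mask & (1u << n)) {
            assert(c->sp < WORDS);
            c->r[n] = c->mem[c->sp++];
        }
}

#define R1 (1u << 1)
#define R2 (1u << 2)

/* PUSH {r1, r2} then POP {r2, r1}: the same register set both times, so nothing moves. */
static void swap_attempt_with_lists(Cpu *c) {
    push(c, R1 | R2);
    pop(c, R2 | R1);
}

/* Separate single-register instructions keep their program order, so this swaps. */
static void swap_with_single_pushes(Cpu *c) {
    push(c, R1);
    push(c, R2);
    pop(c, R1);   /* r1 receives the old r2 */
    pop(c, R2);   /* r2 receives the old r1 */
}
```

Under these assumptions the list version leaves R1 and R2 unchanged, because the register list is an unordered set, while the single-register version exchanges them.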
23
Subroutine
A subroutine, also called a function or a procedure:
single-entry, single-exit
returns to the caller after it exits
When a subroutine is called, the Link Register (LR) holds the
memory address of the next instruction to be executed after
the subroutine exits.
24
Link Register
32 bits
25
Call a Subroutine
26
Calling a Subroutine
BL label
Notes:
label is the name of the subroutine
The assembler/linker translates label to a memory address
After the call, LR holds the return address (the address of the instruction following the call)
Subroutine/Callee:
foo PROC
    ...
    MOV r4, #10
    ...
    BX LR
    ENDP
27
Exiting a Subroutine
Branch and Exchange: BX LR
PC = LR
Subroutine/Callee:
foo PROC
    ...
    MOV r4, #10
    ...
    BX LR
    ENDP
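The call and return mechanism can be modelled with two registers. This C sketch is a simplification under stated assumptions: BL is taken to be a 4-byte instruction, so the return address is PC + 4, and the Thumb state bit that real Cortex-M hardware keeps in bit 0 of LR is ignored. The `Core` type, the function names and the addresses used in the test are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t pc, lr; } Core;

/* BL target: save the address of the following instruction in LR, then branch. */
static void bl(Core *c, uint32_t target) {
    assert(target % 2 == 0);   /* simplified model: plain even code addresses */
    c->lr = c->pc + 4;         /* assumes a 4-byte BL encoding */
    c->pc = target;
}

/* BX LR: return to the caller by copying LR into PC. */
static void bx_lr(Core *c) {
    c->pc = c->lr;
}
```

If a BL at address 0x0800013C calls a subroutine at 0x0800014C, LR becomes 0x08000140, and the subroutine's BX LR resumes execution there.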
28
ARM Procedure Call Standard
Register   Usage                             Preserved by Subroutine   Notes
r0         Argument 1 and return value       No       If return has 64 bits, then r0:r1 hold it. If argument 1 has 64 bits, r0:r1 hold it.
r1         Argument 2                        No
r2         Argument 3                        No       If the return has 128 bits, r0-r3 hold it.
r3         Argument 4                        No       If more than 4 arguments, use the stack.
r4         General-purpose V1                Yes      Variable register 1 holds a local variable.
r5         General-purpose V2                Yes      Variable register 2 holds a local variable.
r6         General-purpose V3                Yes      Variable register 3 holds a local variable.
r7         General-purpose V4                Yes      Variable register 4 holds a local variable.
r8         General-purpose V5                Yes      Variable register 5 holds a local variable.
r9         Platform specific/V6              Yes/No   Usage is platform-dependent.
r10        General-purpose V7                Yes      Variable register 7 holds a local variable.
r11        General-purpose V8                Yes      Variable register 8 holds a local variable.
r12 (IP)   Intra-procedure-call register     No       It holds intermediate values between a procedure and the sub-procedure it calls.
r13 (SP)   Stack pointer                     Yes      SP has to be the same after a subroutine has completed.
r14 (LR)   Link register                     No       LR does not have to contain the same value after a subroutine has completed.
r15 (PC)   Program counter                   N/A      Do not directly change PC.
29
Caller-saved Registers vs Callee-saved Registers
Caller-saved registers (R0-R3):
• Not saved by subroutine
• Hold arguments/result
Callee-saved registers (R4-R11, R13 (SP)):
• Caller expects their values to be retained
• Callee must save and restore them if the callee modifies them
[Figure: Cortex-M register file, 32 bits each. Low registers R0-R7, high registers R8-R12, R13 (SP, banked as MSP/PSP), R14 (LR), R15 (PC); special-purpose registers xPSR, BASEPRI, PRIMASK, FAULTMASK, CONTROL]
Caller expects callee does not modify r4! Callee should preserve r4!
31
Preserve Runtime Environment via Stack
32
Preserve Runtime Environment via Stack
What is wrong in foo()?
Caller Program:
    MOV r4, #100
    ...
    BL foo
    ...
    ADD r4, r4, #1
Subroutine foo:
foo PROC
    PUSH {r4}
    ...
    MOV r4, #10
    ...
    BL bar
    ...
    POP {r4}
    BX LR
    ENDP
Subroutine bar:
bar PROC
    ...
    BX LR
    ENDP
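One way to see the problem in foo is to track LR through the nested call. The C sketch below is a simplified model with hypothetical code addresses (`CALL_FOO`, `FOO`, `CALL_BAR`, `FOO_EXIT`, `BAR`) and the assumption that BL occupies 4 bytes; it only follows PC and LR, since r4 is already handled correctly by PUSH {r4} and POP {r4}.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t pc, lr; } Core;

static void bl(Core *c, uint32_t target) {
    assert(target % 2 == 0);
    c->lr = c->pc + 4;       /* return address overwrites whatever LR held */
    c->pc = target;
}

static void bx_lr(Core *c) { c->pc = c->lr; }

enum {                       /* hypothetical code addresses */
    CALL_FOO = 0x08000100,   /* "BL foo" in the caller */
    FOO      = 0x08000200,   /* entry of foo */
    CALL_BAR = 0x08000210,   /* "BL bar" inside foo */
    FOO_EXIT = 0x08000218,   /* "BX LR" at the end of foo */
    BAR      = 0x08000300    /* bar, reduced here to a single "BX LR" */
};

/* Runs caller -> foo -> bar -> foo's BX LR and returns the PC foo "returns" to. */
static uint32_t run_foo_without_saving_lr(void) {
    Core c = { CALL_FOO, 0 };
    bl(&c, FOO);             /* LR = CALL_FOO + 4, the correct return address */
    c.pc = CALL_BAR;
    bl(&c, BAR);             /* LR is overwritten with CALL_BAR + 4 */
    bx_lr(&c);               /* bar returns into foo, as intended */
    c.pc = FOO_EXIT;
    bx_lr(&c);               /* foo's BX LR uses the overwritten LR */
    return c.pc;
}
```

In this model foo ends up branching to `CALL_BAR + 4`, an address inside foo itself, instead of `CALL_FOO + 4` in the caller: the inner BL destroyed the only copy of foo's return address.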
34
Preserve Runtime Environment via Stack
What is wrong in foo()? Solution #1
Caller Program:
    MOV r4, #100
    ...
    BL foo
    ...
    ADD r4, r4, #1
Subroutine foo:
foo PROC
    PUSH {r4, LR}
    ...
    MOV r4, #10
    ...
    BL bar
    ...
    POP {r4, LR}
    BX LR
    ENDP
Subroutine bar:
bar PROC
    ...
    BX LR
    ENDP
35
Stacks and Subroutines
Preserve Runtime Environment via Stack
What is wrong in foo()? Solution #2
Caller Program:
    MOV r4, #100
    ...
    BL foo
    ...
    ADD r4, r4, #1
Subroutine foo:
foo PROC
    PUSH {r4, LR}
    ...
    MOV r4, #10
    ...
    BL bar
    ...
    POP {r4, PC}     ; loads the saved LR directly into PC, so this also returns
    ; BX LR is no longer needed after POP {r4, PC}
    ENDP
Subroutine bar:
bar PROC
    ...
    BX LR
    ENDP
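The effect of saving LR on the stack and popping it into PC can be sketched in C. As before, the code addresses and the `Core` type are hypothetical, BL is assumed to be 4 bytes long, and the stack is modelled as a small array indexed by `sp`.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t pc, lr, r4;
    uint32_t stack[8];       /* index 0 = lowest address */
    int sp;                  /* full descending: starts at 8 (empty) */
} Core;

static void bl(Core *c, uint32_t target) { c->lr = c->pc + 4; c->pc = target; }

enum {                       /* hypothetical code addresses */
    CALL_FOO = 0x08000100, FOO = 0x08000200,
    CALL_BAR = 0x08000210, BAR = 0x08000300
};

/* Simulates caller -> foo -> bar with foo using PUSH {r4, LR} ... POP {r4, PC}. */
static Core run_foo_saving_lr(void) {
    Core c = { .pc = CALL_FOO, .r4 = 100, .sp = 8 };   /* caller: MOV r4, #100 */
    bl(&c, FOO);
    assert(c.sp >= 2);
    c.stack[--c.sp] = c.lr;  /* PUSH {r4, LR}: LR (r14) goes to the higher address */
    c.stack[--c.sp] = c.r4;  /* r4 goes to the lowest address */
    c.r4 = 10;               /* MOV r4, #10 */
    c.pc = CALL_BAR;
    bl(&c, BAR);             /* overwrites LR, but a copy is on the stack */
    c.pc = c.lr;             /* bar: BX LR */
    c.r4 = c.stack[c.sp++];  /* POP {r4, PC}: r4 is restored first */
    c.pc = c.stack[c.sp++];  /* then the saved return address lands in PC */
    return c;
}
```

After the simulated POP {r4, PC}, PC is `CALL_FOO + 4`, r4 is back to 100, and the stack index is where it started, which is exactly what the caller expects.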
37
Subroutine Calling Another Subroutine
Function SQ
Function MAIN Function QUAD
38
Subroutine Calling Another Subroutine
Function QUAD
39
Example: R0 = R0^4
Program (address, label, instruction):
0x08000138        MOV R0,#2
0x0800013C        BL QUAD
0x08000140        B ENDL
0x08000144  SQ    MUL R0,R0
0x08000148        BX LR
0x0800014C  QUAD  PUSH {LR}
0x08000150        BL SQ
0x08000154        BL SQ
0x08000158        POP {LR}
0x0800015C        BX LR
Stack memory shown: 0x20000200 (xxxxxxxx), 0x200001FC, 0x200001F8
Execution trace, one row per slide (PC is the next instruction to execute; "-" means no value shown yet):
Step  PC          R0    SP          LR          mem[0x200001FC]
1     0x08000138  -     0x20000200  -           -
2     0x0800013C  0x02  0x20000200  -           -
3     0x0800014C  0x02  0x20000200  0x08000140  -
4     0x08000150  0x02  0x200001FC  0x08000140  0x08000140
5     0x08000144  0x02  0x200001FC  0x08000154  0x08000140
6     0x08000148  0x04  0x200001FC  0x08000154  0x08000140
7     0x08000154  0x04  0x200001FC  0x08000154  0x08000140
8     0x08000144  0x04  0x200001FC  0x08000158  0x08000140
9     0x08000148  0x10  0x200001FC  0x08000158  0x08000140
10    0x08000158  0x10  0x200001FC  0x08000158  0x08000140
11    0x0800015C  0x10  0x20000200  0x08000140  0x08000140
12    0x08000140  0x10  0x20000200  0x08000140  0x08000140
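The walk-through can be replayed with a tiny instruction-by-instruction model in C. This is a sketch under assumptions: each instruction is taken to be 4 bytes, only the one stack word at 0x200001FC is modelled, and the POP {LR} at 0x08000158 is inferred from the register values on the slides (SP returns to 0x20000200 and LR to 0x08000140 at that point) rather than read from the listing. The `Core` type and the functions `step` and `run` are names invented for this example.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t r0, sp, lr, pc;
    uint32_t stack_word;     /* the single word stored at 0x200001FC */
} Core;

/* Executes the instruction at c->pc; returns 0 once "B ENDL" is reached. */
static int step(Core *c) {
    switch (c->pc) {
    case 0x08000138: c->r0 = 2; c->pc += 4; break;               /* MOV R0,#2 */
    case 0x0800013C: c->lr = c->pc + 4; c->pc = 0x0800014C; break; /* BL QUAD */
    case 0x08000140: return 0;                                   /* B ENDL */
    case 0x08000144: c->r0 *= c->r0; c->pc += 4; break;          /* SQ: MUL R0,R0 */
    case 0x08000148: c->pc = c->lr; break;                       /* BX LR */
    case 0x0800014C:                                             /* QUAD: PUSH {LR} */
        c->sp -= 4;
        assert(c->sp == 0x200001FCu);
        c->stack_word = c->lr;
        c->pc += 4;
        break;
    case 0x08000150:                                             /* BL SQ */
    case 0x08000154:                                             /* BL SQ */
        c->lr = c->pc + 4; c->pc = 0x08000144; break;
    case 0x08000158:                                             /* POP {LR} (inferred) */
        assert(c->sp == 0x200001FCu);
        c->lr = c->stack_word;
        c->sp += 4;
        c->pc += 4;
        break;
    case 0x0800015C: c->pc = c->lr; break;                       /* BX LR */
    default: assert(0 && "unexpected PC"); return 0;
    }
    return 1;
}

/* Runs from the first instruction until "B ENDL" and returns the final state. */
static Core run(void) {
    Core c = { 0, 0x20000200u, 0, 0x08000138u, 0 };
    while (step(&c)) { }
    return c;
}
```

The final state of this model has R0 = 0x10 (2 to the fourth power), SP = 0x20000200 and PC = LR = 0x08000140, in line with the last step of the walk-through.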
51
Stack Pointer (SP)
• SP is the shadow of MSP (Main SP) or PSP (Process SP)
• If there is no embedded OS, PSP is not used
• Determined by the ASP (Active SP) bit in the CONTROL register (ASP is always 0 in handler mode).
  • 0 = MSP (default)
  • 1 = PSP
CONTROL Register: bits 31-3 are reserved; ASP is bit 1
[Figure: Cortex-M register file, 32 bits each. Low registers R0-R7, high registers R8-R12, R13 (SP, banked as MSP/PSP), R14 (LR), R15 (PC); special-purpose registers xPSR, BASEPRI, PRIMASK, FAULTMASK, CONTROL]
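The selection between MSP and PSP can be expressed as a small C function. This is a sketch, not a description of how the hardware is built: the `Core` structure and the `handler_mode` flag are assumptions made for the example, and the ASP bit is placed at bit 1 of CONTROL (ARM's own documentation calls this bit SPSEL).

```c
#include <assert.h>
#include <stdint.h>

#define CONTROL_ASP (1u << 1)   /* bit 1 of CONTROL: 0 = MSP, 1 = PSP */

typedef struct {
    uint32_t msp, psp;          /* the two banked stack pointers */
    uint32_t control;           /* CONTROL register value */
    int handler_mode;           /* nonzero while handling an exception */
} Core;

/* Returns the banked stack pointer that "SP" (R13) currently refers to. */
static uint32_t active_sp(const Core *c) {
    assert(c);
    if (c->handler_mode)
        return c->msp;          /* handler mode always uses MSP */
    return (c->control & CONTROL_ASP) ? c->psp : c->msp;
}
```

With CONTROL = 0 the function returns MSP, matching the default; setting the ASP bit in thread mode switches it to PSP, and entering handler mode switches it back to MSP.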
53
Passing Arguments into a Subroutine
32-bit arguments: R0 = Argument 1, R1 = Argument 2, R2 = Argument 3, R3 = Argument 4
64-bit arguments: R1(MSB32):R0(LSB32) = 64-bit Argument 1, R3(MSB32):R2(LSB32) = 64-bit Argument 2
128-bit argument: R3(MSB32):R2:R1:R0(LSB32)
Extra arguments are pushed onto the stack by the caller. The caller is responsible for popping them off the stack after the subroutine returns.
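How a 64-bit argument is divided between two registers can be shown in C. The sketch assumes the layout given above, with the least significant 32 bits in R0 and the most significant 32 bits in R1; `RegPair`, `split64` and `join64` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t r0, r1; } RegPair;

/* Split a 64-bit argument: R0 = LSB32, R1 = MSB32. */
static RegPair split64(uint64_t arg) {
    RegPair p = { (uint32_t)(arg & 0xFFFFFFFFu), (uint32_t)(arg >> 32) };
    return p;
}

/* Rebuild the 64-bit value from the register pair R1:R0. */
static uint64_t join64(RegPair p) {
    return ((uint64_t)p.r1 << 32) | p.r0;
}
```

For the value 0x1122334455667788, R0 would hold 0x55667788 and R1 would hold 0x11223344.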
54
Passing Arguments into a Subroutine
Subroutine
Return Value
Register R0
55
Passing 4 Arguments
s = sum(1, 2, 3, 4);
Caller Callee
56
Passing Extra Arguments via Stack
int32_t sum(int32_t a, int32_t b, int32_t c, int32_t d,
            int32_t h, int32_t i, int32_t j, int32_t k);
s = sum(1, 2, 3, 4, 5, 6, 7, 8);
Caller:
    MOVS r0, #5
    MOVS r1, #6
    MOVS r2, #7
    MOVS r3, #8
    PUSH {r0, r1, r2, r3}    ; h, i, j, k go on the stack
    MOVS r0, #1
    MOVS r1, #2
    MOVS r2, #3
    MOVS r3, #4
    BL sum
    ...                      ; use the result in r0 here
    POP {r0, r1, r2, r3}     ; remove the 4 stack arguments (overwrites r0-r3)
Callee (uses only caller-saved registers):
sum PROC
    EXPORT sum
    ADD r0, r0, r1           ; add a + b
    ADD r0, r0, r2           ; add c
    ADD r0, r0, r3           ; add d
    LDRD r1,r2, [sp]         ; r1=mem[sp],r2=mem[sp+4]
    ADD r0, r0, r1           ; add h
    ADD r0, r0, r2           ; add i
    LDRD r1,r2, [sp, #8]     ; r1=mem[sp+8],r2=mem[sp+12]
    ADD r0, r0, r1           ; add j
    ADD r0, r0, r2           ; add k
    BX LR
    ENDP
Callee (uses r5 and r6, so it saves and restores them):
sum PROC
    EXPORT sum
    PUSH {r5, r6, lr}
    ADD r0, r0, r1           ; add a + b
    ADD r0, r0, r2           ; add c
    ADD r0, r0, r3           ; add d
    LDRD r5,r6, [sp, #12]    ; r5=mem[sp+12],r6=mem[sp+16]
    ADD r0, r0, r5           ; add h
    ADD r0, r0, r6           ; add i
    LDRD r5,r6, [sp, #20]    ; r5=mem[sp+20],r6=mem[sp+24]
    ADD r0, r0, r5           ; add j
    ADD r0, r0, r6           ; add k
    POP {r5, r6, pc}
    ENDP
58
Passing Extra Arguments via Stack
int32_t sum(int32_t a, int32_t b, int32_t c, int32_t d,
            int32_t h, int32_t i, int32_t j, int32_t k);
s = sum(1, 2, 3, 4, 5, 6, 7, 8);
Caller:
    MOVS r0, #5
    MOVS r1, #6
    MOVS r2, #7
    MOVS r3, #8
    PUSH {r0, r1, r2, r3}
    MOVS r0, #1
    MOVS r1, #2
    MOVS r2, #3
    MOVS r3, #4
    BL sum
    ...
    POP {r0, r1, r2, r3}
Stack after PUSH {r0, r1, r2, r3}:
Old SP   <xxxxxxxx>
SP+12    0x00000008
SP+8     0x00000007
SP+4     0x00000006
SP       0x00000005
59
Passing Extra Arguments via Stack
int32_t sum(int32_t a, int32_t b, int32_t c, int32_t d,
            int32_t h, int32_t i, int32_t j, int32_t k);
s = sum(1, 2, 3, 4, 5, 6, 7, 8);
Callee:
sum PROC
    EXPORT sum
    PUSH {r5, r6, lr}
    ADD r0, r0, r1           ; add a + b
    ADD r0, r0, r2           ; add c
    ADD r0, r0, r3           ; add d
    LDRD r5,r6, [sp, #12]    ; r5=mem[sp+12],r6=mem[sp+16]
    ADD r0, r0, r5           ; add h
    ADD r0, r0, r6           ; add i
    LDRD r5,r6, [sp, #20]    ; r5=mem[sp+20],r6=mem[sp+24]
    ADD r0, r0, r5           ; add j
    ADD r0, r0, r6           ; add k
    POP {r5, r6, pc}
    ENDP
Stack after PUSH {r5, r6, lr}:
Old SP   <xxxxxxxx>
SP+24    0x00000008
SP+20    0x00000007
SP+16    0x00000006
SP+12    0x00000005
SP+8     lr
SP+4     r6
SP       r5
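The stack offsets used by the callee can be checked with a C model of the same layout. This is only a sketch: the function names `sum_callee` and `call_sum` are invented, the saved r5, r6 and lr slots are filled with placeholder zeros, and the array index stands in for the byte offset divided by 4.

```c
#include <assert.h>
#include <stdint.h>

/* Callee view: a..d arrive in r0-r3; h, i, j, k are read from the stack.
   `sp` points at the stack top after PUSH {r5, r6, lr}, so the caller's
   words start 3 words (12 bytes) above it. */
static int32_t sum_callee(int32_t r0, int32_t r1, int32_t r2, int32_t r3,
                          const int32_t *sp) {
    assert(sp);
    r0 += r1;      /* add a + b */
    r0 += r2;      /* add c */
    r0 += r3;      /* add d */
    r0 += sp[3];   /* [sp, #12] = h */
    r0 += sp[4];   /* [sp, #16] = i */
    r0 += sp[5];   /* [sp, #20] = j */
    r0 += sp[6];   /* [sp, #24] = k */
    return r0;     /* result is returned in r0 */
}

/* Caller view: build the stack image from the slide and call the callee. */
static int32_t call_sum(void) {
    /* index 0 is the lowest address: saved r5, r6, lr, then 5, 6, 7, 8 */
    int32_t stack[7] = { 0, 0, 0, 5, 6, 7, 8 };
    return sum_callee(1, 2, 3, 4, stack);
}
```

Reading the four extra arguments at offsets 12, 16, 20 and 24 and adding them to the register arguments gives 36, the sum of 1 through 8.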
60
Summary
ARM Cortex-M uses a full descending stack
How to pass arguments into a subroutine?
Each 8-, 16- or 32-bit argument is passed via r0, r1, r2, r3
Extra arguments are passed via the stack
What registers should be preserved?
Caller-saved registers vs callee-saved registers
How to preserve the running environment for the caller?
Via the stack
61