Writing Optimized C Code For Micro Controller Applications
By Wilson Chan
Toshiba America Electronics Components, Inc.
Email: wilson.chan@taec.toshiba.com
INTRODUCTION
PROGRAMMING MODEL
Some microcontrollers do not have hardware support for a C stack. If you plan to
develop your embedded applications in C, you should select a microcontroller with a
stack-based architecture. If the microcontroller has dedicated address-specifying/index
registers, they will also help the C compiler to generate more efficient code.
General-purpose registers (8-bit x 8):
    W  A
    B  C
    D  E
    H  L
Special-purpose registers (16-bit): IX, IY, SP, PC
PSW flags: JF, ZF, CF, HF, SF, VF
C Language Program
    int i, a, b, c;
    test(){
        i = 1 + 2;
        a = 1;
        b = a + 2;
        c = a + b;
    }
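The listing above appears here without its accompanying description. Assuming it illustrates constant folding and constant propagation (a common compile-time optimization, and an assumption on our part), the sketch below shows the same assignments as written and as a compiler could reduce them. The function names test_original and test_folded are hypothetical and used only for illustration.

```c
int i, a, b, c;

/* As written in the source: every right-hand side is known at compile time. */
void test_original(void)
{
    i = 1 + 2;
    a = 1;
    b = a + 2;
    c = a + b;
}

/* What folding and propagation can reduce it to: plain constant stores. */
void test_folded(void)
{
    i = 3;
    a = 1;
    b = 3;
    c = 4;
}
```

Both functions leave the four globals with identical values; the second form needs no run-time arithmetic.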
This optimization method removes, at compile time, code that has no effect on the
program's result, such as an assignment to a variable that is never read afterward.
Consider the C program example in Figure 3. With dead-code elimination, the C
compiler eliminates the C statement in line 3.
C Language Program
1 int test(){
2 int a;
3 a = 1;
4 return 0;
5 }
Without Optimization
    _test:
        ld WA,0x1
        xor WA,WA
        ret
With Optimization
    _test:
        xor WA,WA
        ret
This optimization method replaces expensive operations with less expensive ones.
Consider the C program example in Figure 4. Multiplying an integer by 2 is most
efficiently done with a left shift. Without optimization, the generated code calls a
multiplication function supplied by the C run-time library, which takes much longer
than a left-shift operation.
C Language Program
    int i;
    test() {
        i *= 2;
    }
Without Optimization
    _test:
        ld BC,0x2
        ld WA,(_i)
        cal C87C_muli
        ld (_i),WA
        ret
With Optimization
    _test:
        ld IY,_i
        ld WA,(IY)
        shlca WA
        ld (IY),WA
        ret
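The equivalence that strength reduction relies on can be checked at source level. The following is a minimal sketch with hypothetical helper names; unsigned operands are used so that the shift is well defined for every input. In practice this rewrite is usually best left to the compiler.

```c
/* Multiply form: may become a run-time library call on a small MCU. */
unsigned int times_two_mul(unsigned int x)
{
    return x * 2u;
}

/* Shift form: the cheaper operation that produces the same result. */
unsigned int times_two_shift(unsigned int x)
{
    return x << 1;
}
```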
This optimization method reduces the number of operations by reusing the result of
the first operation in subsequent statements that contain the same operation. Consider
the C program example in Figure 5. With optimization, the sub-expression i + 1 is
calculated once instead of twice.
C Language Program
    int test(int *a, int *b, int i)
    {
        return(a[i+1] + b[i+1]);
    }
Without Optimization
    _test:
        ld WA,(SP+0x7)
        inc WA
        shlca WA
        ld IX,WA
        add IX,(SP+0x3)
        ld WA,(SP+0x7)
        inc WA
        shlca WA
        ld DE,WA
        add DE,(SP+0x5)
        ld WA,(DE)
        add WA,(IX)
        ret
With Optimization
    _test:
        ld WA,(SP+0x7)
        inc WA
        shlca WA
        ld IX,WA
        add IX,(SP+0x3)
        ld DE,WA
        add DE,(SP+0x5)
        ld WA,(DE)
        add WA,(IX)
        ret
C Language Program
    int a[10], b, c;
    test(){
        int i;
        for (i = 0; i < 10; i++)
            a[i] = b + c;
    }
Without Optimization
    _test:
        ld BC,0x0
        cmp BC,0xa
        j sge,L1
    L2:
        ld WA,BC
        shlca WA
        ld DE,WA
        add DE,_a
        ld WA,(_b)
        add WA,(_c)
        ld (DE),WA
        inc BC
        cmp BC,0xa
        j slt,L2
    L1:
        ret
With Optimization
    _test:
        ld BC,(_b)
        add BC,(_c)
        ld WA,_a
        ld DE,WA
        add WA,0x14
    L2:
        ld (DE),BC
        inc DE
        inc DE
        cmp DE,WA
        j lt,L2
        ret
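This listing appears here without its description. Reading the shorter of the two assembly versions, the compiler computes b + c once before the loop and steps a pointer through the array instead of recomputing the index (0x14 is 20 bytes, which corresponds to ten 2-byte ints). A source-level sketch of the same transformation follows; the function names are hypothetical.

```c
int a[10], b, c;

/* As written: b + c and the address of a[i] are re-evaluated on every pass. */
void fill_naive(void)
{
    int i;
    for (i = 0; i < 10; i++)
        a[i] = b + c;
}

/* Hand-hoisted: the invariant sum is computed once and a pointer replaces a[i]. */
void fill_hoisted(void)
{
    int sum = b + c;
    int *p = a;
    int *end = a + 10;

    while (p < end)
        *p++ = sum;
}
```

An optimizing compiler will normally do this on its own, as the listing shows; writing it by hand mainly helps when optimization is limited or disabled.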
This optimization method reduces the number of memory accesses, thus speeding
up program execution. Consider the C program example in Figure 7. With optimization,
the number of memory accesses is reduced from three to one. Also, the resultant code
occupies less memory (12 bytes versus 20 bytes).
C Language Program
    int a;
    int test() {
        if (a != 1)
            return a;
        else
            return a - 1;
    }
Without Optimization
    _test:
        ld WA,(_a)
        cmp WA,0x1
        j t,L1
        ld WA,(_a)
        ret
    L1:
        ld WA,(_a)
        dec WA
        ret
With Optimization
    _test:
        ld WA,(_a)
        cmp WA,0x1
        j t,L1
        ret
    L1:
        dec WA
        ret
Switch statements are very common in C programs. This optimization calls for
the C compiler to analyze the nature of the case values in the switch statement and then
decide on the optimum way to implement it. Consider the C program examples in
Figure 8. The C compiler implements the switch statement in C program 1 as a series
of compare and branch instructions. For C program 2, the C compiler uses a different
coding style to improve code efficiency.
C Language Program 1
    unsigned char j;
    test1(unsigned char i) {
        switch(i) {
        case 1:
            j = 1;
            break;
        case 2:
            j = 2;
            break;
        case 3:
            j = 3;
            break;
        default:
            break;
        }
    }
C Language Program 2
    unsigned char j;
    test2(unsigned char i) {
        switch(i) {
        case 1:
            j = 1;
            break;
        case 2:
            j = 2;
            break;
        case 3:
            j = 3;
            break;
        case 4:
            j = 4;
            break;
        default:
            break;
        }
    }
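The code generated for the second switch statement is not shown here. One common technique for dense, consecutive case values is a jump table; whether this compiler uses exactly that is an assumption. The sketch below expresses the idea at source level with an array of function pointers. All names other than j are hypothetical.

```c
unsigned char j;

/* One small handler per case label. */
static void case1(void) { j = 1; }
static void case2(void) { j = 2; }
static void case3(void) { j = 3; }
static void case4(void) { j = 4; }

typedef void (*handler_t)(void);

/* Table indexed by (case value - 1). */
static const handler_t jump_table[4] = { case1, case2, case3, case4 };

/* One range check and one indexed call replace a chain of compares. */
void test2_table(unsigned char i)
{
    if (i >= 1 && i <= 4)
        jump_table[i - 1]();
    /* any other value: default case, do nothing */
}
```

The cost of the dispatch stays the same no matter how many consecutive cases are added, whereas a compare-and-branch chain grows with each case.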
C Language Program
    unsigned char a;
    test( ) {
        a &= ~0x1;
        a |= 0x4;
    }
With Optimization
    _test:
        ld IY,_a
        clr (IY).0
        set (IY).2
        ret
Level   Function
0       Minimum optimization (default):
        stack release absorption, branch instruction optimization,
        deletion of unnecessary instructions.
1       Basic block optimization:
        copy propagation within restricted ranges,
        common subexpression elimination within restricted ranges.
2       Optimization beyond basic blocks:
        copy propagation across whole functions,
        common subexpression elimination across whole functions.
3       Maximum optimization:
        loop optimization and other miscellaneous optimizations.
Format
    -XS
Function
    Requests output with minimum object code size.
Description
    When this option is specified, some optimizations are skipped. When it is
    not specified, the compiler by default generates code that favors execution
    speed.
Many microcontrollers have more than one memory space. For example, one
memory space may be accessible with an 8-bit offset, another may require a 16-bit
offset, and yet another may require an address space modifier. You can decrease
program size by explicitly locating frequently used variables in the memory space
that requires the fewest bytes for addressing.
Example:
#pragma section const
const int ix = 3000;
char y[] = { 'A','B' };
static char z = 4;
For temporary variables, do not declare them as global variables. Rather, declare
them as auto variables.
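A minimal sketch of this guideline follows. The checksum routine and all names in it are hypothetical and chosen only for illustration; the benefit actually obtained depends on the compiler and the target.

```c
unsigned char checksum_tmp;   /* hypothetical global used only as scratch space */

/* Discouraged: the running total lives in a global, so updates typically go through memory. */
unsigned char checksum_global(const unsigned char *buf, unsigned char len)
{
    unsigned char k;

    checksum_tmp = 0;
    for (k = 0; k < len; k++)
        checksum_tmp += buf[k];
    return checksum_tmp;
}

/* Preferred: the running total is an auto variable that the compiler can keep in a register. */
unsigned char checksum_auto(const unsigned char *buf, unsigned char len)
{
    unsigned char tmp = 0;
    unsigned char k;

    for (k = 0; k < len; k++)
        tmp += buf[k];
    return tmp;
}
```

Both functions return the same value; the second also leaves no permanently allocated scratch byte behind.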
When global variables are passed as arguments in a function, use the arguments,
not the global variables, in expressions within the function.
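As an illustration of this guideline, the sketch below shows a function that receives a global as an argument. The global scale and both function names are hypothetical.

```c
int scale;   /* hypothetical global */

/* Discouraged: the caller passes scale as factor, yet the body still reads the global. */
int apply_scale_global(int x, int factor)
{
    (void)factor;               /* argument is ignored */
    return x * scale + scale;   /* global is read inside the expression */
}

/* Preferred: the body uses only the argument. */
int apply_scale_arg(int x, int factor)
{
    return x * factor + factor;
}
```

Called as apply_scale_arg(5, scale), the second form gives the same result while letting the compiler work with the argument value it already has.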
For global variables that are accessed frequently in a function, copy them into
auto variables and use the auto variables within the function. Consider the C program
examples in Figure 16. C program 1 produces 30 bytes of machine code, whereas
C program 2 generates 18 bytes of machine code.
C Language Program 1
    unsigned char *a, j;
    test( ) {
        for (j=0; j<100; j++)
            *a++ = 0;
    }
Generated code for Program 1
    _test:
        ld (_j),0x0
    L2:
        ld IY,_a
        ld DE,(IY)
        ld WA,(IY)
        inc WA
        ld (IY),WA
        ld (DE),0x0
        inc (_j)
        cmp (_j),0x64
        j lt,L2
        ret
C Language Program 2
    unsigned char *a, j;
    test( ) {
        unsigned char *c, i;
        c = a;
        for (i=0; i<100; i++)
            *c++ = 0;
    }
Generated code for Program 2
    _test:
        ld BC,(_a)
        xor A,A
    L6:
        ld DE,BC
        inc BC
        ld (DE),0x0
        inc A
        cmp A,0x64
        j lt,L6
        ret
Use unsigned data types whose size matches the natural width of the
microcontroller's registers. Also, use the smallest data type that can get the job done. For
example, if you write a C program for an 8-bit microcontroller, use the unsigned char
data type for loop control, array subscripts, and bit-field members. If
the C compiler enforces ANSI C's integer promotion rule by default, specify an option to
disable it. Otherwise, this rule will enlarge the program size.
Example:
    struct field {
        unsigned char a:1;
        unsigned char b:3;
        unsigned char c:3;
        unsigned char d:1;
    };
    struct field array[10];

    void fcn( ) {
        unsigned char i;
        for (i=0; i < 10; i++) {
            array[i].b = 5;
            array[i].c = 7;
        }
    }
Generated code:
    _fcn:
        ld IY,_array
        ld IX,IY
        ld BC,IY
        add IY,0xa
    L4:
        ld DE,BC
        ld A,(DE)
        and A,0x8f
        or A,0x50
        ld (DE),A
        or (IX),0xe
        inc BC
        inc IX
        cmp IX,IY
        j lt,L4
        ret
Example:
    char a_src[41] = {"Hello"};
    char a_des[41];

    void testcpy(void) {
        register char *p_src = a_src;
        register char *p_des = a_des;

        while (*p_src)
            *p_des++ = *p_src++;
        *p_des = '\0';
    }
Generated code:
    _testcpy:
        push HL
        ld IY,_a_src
        ld IX,_a_des
        j L3
    L2:
        ld DE,IX
        inc IX
        ld HL,IY
        inc IY
        ld A,(HL)
        ld (DE),A
    L3:
        cmp (IY),0x0
        j f,L2
        ld (IX),0x0
        pop HL
        ret
Example:
    int g1, g2, g3, g4, sum;

    int __adecl RegParm( int p1, int p2, int p3, int p4) {
        g1 = p1;
        g2 = p2;
        g3 = p3;
        g4 = p4;
        return(p1+p2);
    }

    void test( ) {
        sum = RegParm(2, -2, 88, -88);
    }
Generated code:
    .RegParm:
        ld (_g1),WA
        ld (_g2),BC
        ld (_g3),DE
        ld DE,(SP+0x3)
        ld (_g4),DE
        add WA,BC
        pop DE
        pop BC
        j DE
    _test:
        ld WA,0xffa8
        push WA
        ld DE,0x58
        ld WA,0x2
        ld BC,0xfffe
        cal .RegParm
        ld (_sum),WA
        ret
Expressions involving these data types often result in calls to run-time library
functions in the generated code. For the example in Figure 20, the C compiler generates
a call to the run-time library function _fld_ff to support the floating-point type.
Example:
    float array4[10], vf;
    fcn1( ) {
        unsigned char i;
        for (i=0; i < 10; i++)
            array4[i] = vf;
    }
Generated code:
    _fcn1:
        push DE
        push HL
        ld (SP+0x4),0x0
    L2:
        ld A,(SP+0x4)
        ld W,0x0
        shlca WA
        shlca WA
        ld BC,_array4
        ld HL,WA
        add HL,BC
        ld BC,_vf
        ld WA,HL
        cal ._fld_ff
        inc (SP+0x4)
        cmp (SP+0x4),0xa
        j lt,L2
        pop HL
        pop DE
        ret
The C compiler follows the ANSI C data type promotion rules when processing
expressions that involve different data types, which results in extra object code and
execution time. In expressions that contain char and int, char is promoted to int. In
expressions that contain both signed and unsigned integers of the same width, the signed
operand is converted to unsigned. In expressions that mix float and double, float is
converted to double. For expressions that contain only char, if the C compiler enforces
ANSI C's integer promotion rule by default, specify an option to disable it. Otherwise,
this rule will enlarge the program size.
Example:
    char c1;
    int i1, i2, i3;

    fcn1( ) {
        i1 = c1 + i3;
    }
    fcn2( ) {
        i1 = i2 + i3;
    }
Generated code:
    _fcn1:
        ld A,(_c1)
        test A.7
        subb W,W
        add WA,(_i3)
        ld (_i1),WA
        ret
    _fcn2:
        ld WA,(_i2)
        add WA,(_i3)
        ld (_i1),WA
        ret
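The effect of the conversion rules on results, not just on code size, can be seen in a short sketch. The function names are hypothetical, and signed char is used instead of plain char because the signedness of plain char is implementation-defined.

```c
/* The char operand is widened to int (sign-extended) before the addition. */
int add_char_int(signed char ch, int n)
{
    return ch + n;
}

/* Mixing int and unsigned int: the signed operand is converted to unsigned
   before the comparison, so a negative value becomes a very large one. */
int signed_less_than_unsigned(int s, unsigned int u)
{
    return s < u;
}
```

With these definitions, add_char_int(-1, 10) yields 9, while signed_less_than_unsigned(-1, 1u) yields 0 because -1 is converted to the largest unsigned value before the comparison. Keeping the operands of an expression in one type avoids both the extra widening code and this kind of surprise.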
Avoid using recursive functions with many arguments and auto variables, and
functions with variable-length argument lists. If a structure or an array must be passed
to a function, pass a pointer to the data instead of passing the data on the stack.
Example:
    struct s1 {
        char *text;
        int count;
    };
    extern struct s1 ays1[5];
    int sum;

    int SumCount( struct s1 *p1, int p2) {
        int i, j;
        for (i=0, j=0; i < p2; i++, p1++)
            j += p1->count;
        return(j);
    }

    void test3( ) {
        sum = SumCount(ays1, (sizeof(ays1)/sizeof(struct s1)));
    }
Generated code:
    _SumCount:
        ld IX,(SP+0x3)
        xor IY,IY
        ld WA,IY
        ld DE,(SP+0x5)
        cmp DE,0x0
        j sle,L1
    L2:
        ld BC,(IX+0x2)
        add WA,BC
        inc IY
        add IX,0x4
        cmp IY,DE
        j slt,L2
    L1:
        ret
    _test3:
        ld WA,0x5
        push WA
        ld WA,_ays1
        push WA
        cal _SumCount
        ld SP,SP+0x4
        ld (_sum),WA
        ret
CONCLUSION