Getting The Least Out of Your C Compiler: Class #508, Embedded Systems Conference San Francisco 2001
Jakob Engblom
IAR Systems
Box 23051
SE-750 23 Uppsala
Sweden
email: [email protected]
Using only the internal program and data memory of a microcontroller can save large costs
in embedded systems design. This requires, however, that the program fits into the memory—
which is not always easy to accomplish.
This article discusses how to help a modern, highly optimizing C compiler generate small
code, while maintaining the portability and readability advantages offered by C. In order to
facilitate an understanding of what a compiler likes and does not like, we will give an inside
view on how a compiler operates.
Many established truths and tricks are invalidated when using modern compilers. We will
demonstrate some of the more common mistakes and how to avoid them, and give a catalog
of good coding techniques. An important conclusion is that code that is easy for a human to
understand is usually also compiler friendly, contrary to hacker tradition.
1 Introduction
A C compiler is a basic tool for most embedded systems programmers. It is the tool by which the ideas and
algorithms in your application (expressed as C source code) are transformed into machine code executable by
your target processor. To a large extent, the C compiler determines how large the executable code for the
application will be.
The C language is well suited for low-level programming. It was designed for coding operating systems,
which has left it with powerful pointer handling, bit manipulation capabilities not found in other high-level
languages, and target-dependent type sizes that allow the best possible code to be generated for a given target.
The semantics of C are specified by the ISO/ANSI C Standard [1]. The standard does an admirable job of
specifying the language without unduly constraining an implementation of the language. For example,
compared to Java, the C standard gives the compiler writer some flexibility in the size of types and the precise
order and implementation of calculations. The result is that there are C compilers for all available processors,
from the humblest 8-bitter to the proudest supercomputers.
A compiler performs many transformations on a program in order to generate the best possible code.
Examples of such transformations are storing values in registers instead of memory, removing code which
does nothing useful, reordering computations in a more efficient order, and replacing arithmetic operations by
cheaper operations.
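As an illustration of the last kind of transformation, the sketch below pairs arithmetic as a programmer might write it with the cheaper operations a compiler may substitute. The function names are hypothetical, and the operands are assumed to be unsigned so that the two forms are exactly equivalent.

```c
/* Hypothetical helper names, for illustration only. */

/* As written in the source: multiply, divide, and modulo by powers of two. */
unsigned int times_eight_written(unsigned int x) { return x * 8u; }
unsigned int div_four_written(unsigned int x)    { return x / 4u; }
unsigned int mod_sixteen_written(unsigned int x) { return x % 16u; }

/* Cheaper equivalents a compiler may generate on its own:
   shifts and a mask instead of multiply, divide, and modulo. */
unsigned int times_eight_reduced(unsigned int x) { return x << 3; }
unsigned int div_four_reduced(unsigned int x)    { return x >> 2; }
unsigned int mod_sixteen_reduced(unsigned int x) { return x & 15u; }
```

Since the compiler can make this substitution itself, the source can keep the form that states the intent most clearly.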
To most programmers of embedded systems, the case that a program does not quite fit into the available
memory is a familiar phenomenon. Recoding parts of an application in assembly language or throwing out
functionality may seem to be the only alternatives, while the solution could be as simple as rewriting the C
code in a more compiler-friendly manner.
In order to write code that is compiler friendly, you need to have a working understanding of compilers. Some
simple changes to a program, like changing the data type of a frequently-accessed variable, can have a big
impact on code size while other changes have no effect at all. Having an idea of what a compiler can and
cannot do makes such optimization work much easier.
This paper (and associated presentation) will try to convey a feeling for how a modern C compiler works and
how you can help it compile your code in the best possible way.
2 Modern C Compilers
Assembly programs specify what should be calculated, how, and the precise order in which the calculations should be carried out.
A C program, on the other hand, only specifies the calculations that should be performed. With some
restrictions, the order and the technique used to realize the calculations are up to the compiler.
The compiler will look at the code and try to understand what is being calculated. It will then generate the
best possible code, given the information it has managed to obtain, locally within a single statement and also
across entire functions and sometimes even whole programs.
All code that is not considered useful—according to the definition in the previous section—is removed. This
removal of unreachable or useless computations can cause some unexpected effects. An important example is
that empty loops are completely discarded, making “empty delay loops” useless. The code shown below
stopped working properly when upgrading to a modern compiler that removed useless computations:
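A minimal sketch of such a delay loop, with hypothetical function names and loop counts, might look like the first function here. The second function shows one common workaround that is not taken from the text: declaring the counter volatile, which obliges the compiler to perform every access to it.

```c
/* Hypothetical delay routines; names and counts are illustrative. */

/* The loop computes nothing that is used afterwards, so an optimizing
   compiler is free to remove it entirely. */
void delay_naive(void)
{
    unsigned int i;
    for (i = 0; i < 10000u; i++)
        ;   /* empty body */
}

/* With a volatile counter, every read and write of i must be performed,
   so the loop cannot be discarded. Returns the final counter value. */
unsigned int delay_volatile(unsigned int count)
{
    volatile unsigned int i;
    for (i = 0; i < count; i++)
        ;   /* empty body */
    return i;
}
```

Even the volatile version gives a delay that depends on the compiler and the clock frequency, so a hardware timer is usually the more robust choice when one is available.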
Note that a compiler cannot in general make function calls into common subexpressions. Two subsequent
calls to the same function with the same argument will generate two function calls, since the compiler does
not in general know what happens inside the called function (for example, it might perform side-effects). If
you intend to evaluate a function only once, write it once!
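The difference can be sketched as follows. The function read_sensor() and its call counter are hypothetical; the counter stands in for any side effect the compiler cannot see past.

```c
/* Hypothetical function with a side effect: it counts how often it is called. */
unsigned int call_count = 0;

unsigned int read_sensor(unsigned int channel)
{
    call_count++;                 /* side effect visible outside the function */
    return channel * 3u + 1u;
}

/* Written twice: the compiler must in general emit two calls. */
unsigned int sum_twice(unsigned int ch)
{
    return read_sensor(ch) + read_sensor(ch);
}

/* Written once: a single call, with the result reused from a local. */
unsigned int sum_once(unsigned int ch)
{
    unsigned int v = read_sensor(ch);
    return v + v;
}
```

Both functions return the same value here, but sum_twice() performs the call, and its side effect, twice.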
2.8 Linker
The linker should be considered an integral part of the compilation system, since there are some optimizations
that are performed in the linker. The most basic embedded-systems linker should remove all unused functions
and variables from a program, and only add those parts of the standard libraries that are actually used. The
granularity at which program parts are discarded varies, from files or library modules down to individual
functions or even snippets of code. The smaller the granularity, the better the linker.
Some linkers also perform post-compilation transformations on the program. A favorite is to extend the
low-level transformation of breaking out common code sequences to work across the whole program, with
corresponding potential gains in code compression.
Program 1 gets slightly smaller with speed optimization, while program 2 is considerably larger, an effect we
traced to the fact that the inliner was lucky on program 1.
The conclusion is that one should always try to compile a program with different optimization settings and see
what happens. Some compilers allow you to adjust the aggressiveness of individual optimizations—explore
that possibility, especially for inlining.
It is often worthwhile to use different compilation settings for different files in a project: put the code that
must run very quickly into a separate file and compile that for minimal execution time (maximum speed), and
the rest of the code for minimal code size. This will give a small program, which is still fast enough where it
matters. Some compilers allow different optimization settings for different functions in the same source file
using #pragma directives.
Footnote 1: Unless some operand has a larger range than int, which might be the case for long int.
small 8-bit micro. A portable solution to the data size problem is to define a set of types with minimal
guaranteed range, and then change the definition depending on the target. For example, “best_int8_t” is
guaranteed to cover the range -128 to +127, but might be bigger than 8 bits if that is more efficient:
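One way such a definition might look is sketched below. The configuration macro TARGET_PREFERS_16BIT and the helper clamp_to_int8() are hypothetical; only the type name best_int8_t comes from the text.

```c
/* TARGET_PREFERS_16BIT is a hypothetical macro that a project would define
   for targets where 16-bit operations are cheaper than 8-bit ones. */
#if defined(TARGET_PREFERS_16BIT)
typedef signed short best_int8_t;   /* wider than needed, but more efficient */
#else
typedef signed char best_int8_t;    /* smallest type covering -128 to +127 */
#endif

/* Hypothetical helper: clamps a value into the guaranteed range, so the
   result is valid whichever definition of best_int8_t is active. */
best_int8_t clamp_to_int8(int v)
{
    if (v < -128) return (best_int8_t)-128;
    if (v > 127)  return (best_int8_t)127;
    return (best_int8_t)v;
}
```

Code that relies only on the guaranteed range behaves the same under either definition. On C99 compilers, the int_least8_t and int_fast8_t types in <stdint.h> serve a similar purpose.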
This solution should not be used when the actual size of the data is important (reading and writing I/O, for
example).
The conclusion is that, if you think about a value as never going below zero, make it unsigned. If the purpose
of a variable is to manipulate it as bits, make it unsigned. Otherwise, operations like right shifting and
masking might do strange things to the value of the variable.
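A small sketch of the shifting problem, using hypothetical function names: extracting the high nibble of a byte is well defined for an unsigned value, while the result of right-shifting a negative signed value is implementation-defined in C.

```c
/* Hypothetical helpers, for illustration only. */

/* Unsigned: zeros are shifted in from the left, so 0xF0 gives 0x0F. */
unsigned char high_nibble(unsigned char v)
{
    return (unsigned char)(v >> 4);
}

/* Signed: for negative v the result is implementation-defined. Many
   compilers shift in copies of the sign bit, so a value such as -16
   typically yields -1 rather than 0x0F. */
signed char high_nibble_signed(signed char v)
{
    return (signed char)(v >> 4);
}
```

For non-negative inputs the two functions agree; the surprises appear only once the top bit is set.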
If a small change to a program causes a big change in program size, look at the library functions included
after linking. Floating-point and 32-bit integer libraries are especially insidious: they can creep in through
C's implicit casts.
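The sketch below shows how a single character in a constant can change the arithmetic that is performed. The function names are hypothetical. An unsuffixed constant such as 1.5 has type double, so the float operand is converted and the multiplication is carried out in double precision; on a small target this can pull in the double-precision library routines.

```c
/* Hypothetical scaling helpers, for illustration only. */

/* 1.5 is a double constant: f is converted to double, the product is
   computed as a double, and the result is converted back to float. */
float scale_with_double(float f)
{
    return f * 1.5;
}

/* 1.5f is a float constant: the whole expression stays in float. */
float scale_with_float(float f)
{
    return f * 1.5f;
}
```

The type of each expression can be checked with sizeof: sizeof(f * 1.5) equals sizeof(double), while sizeof(f * 1.5f) equals sizeof(float).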
Another way to shrink the code of your program is to use limited versions of standard functions. For instance,
the standard printf() is a very big function. Unless you really need the full functionality, you should use a
limited version that only handles basic formatting or ignores floating point. Note that this should be done at
link time: the source code is the same, but a simpler version is linked. Because the first argument to
printf() is a string, and can be provided as a variable, it is not possible for the compiler to automatically
figure out which parts of the function your program needs.
A good way to avoid implicit casts is to use consistent typedefs for all types used in the application.
Otherwise, it is easy to start mixing different-size integers or other types. Some lint-type tools can also be
used to check for type consistency.
4 Facilitating Optimizations
The previous section listed some items about selecting and using data types that should be quite obvious. This
section will deal with the somewhat more complex issue of how to write your code so that it is easy to
understand for the compiler, and so that optimization is allowed to the greatest possible extent.
The basic principle is that “the compiler is your friend”. But it is not an AI system that can understand what
you are doing, and it is often a rather dumb tool. If you understand the strengths and weaknesses of
the compiler, you will write much better code.
short a; short a;
char b = highbyte(a); char b = highbyte(a);
Footnote 2: Note that in C++, reference parameters (“foo(int &)”) can introduce pointers to variables in a calling
function without the syntax of the call showing that the address of a variable is taken.
The old way to declare a function before calling it (Kernighan & Ritchie or “K&R” style) was to leave the
parameter list empty, like “extern void foo()”. This is not a proper ANSI prototype and will not help
code generation. Unfortunately, few compilers warn about this by default.
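The two declaration styles can be contrasted in a short sketch; the function add8() is hypothetical. Without a prototype the compiler knows nothing about the parameter types, so each argument undergoes the default argument promotions (char and short to int, float to double) before the call. With a prototype, the arguments can be passed in their declared, smaller types.

```c
/* Old-style declaration, shown only as a comment: no parameter information,
   so arguments would be promoted to int at every call site.

   extern unsigned char add8();
*/

/* ANSI prototype: the compiler knows both parameters are unsigned char. */
unsigned char add8(unsigned char a, unsigned char b);

/* Hypothetical function: 8-bit addition that wraps around at 256. */
unsigned char add8(unsigned char a, unsigned char b)
{
    return (unsigned char)(a + b);
}
```

Writing “(void)” for a function without parameters, as in “extern void foo(void)”, turns the declaration into a proper prototype.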
Example
unsigned char gGlobal;   /* global variable */
void foo(int x)
{
    unsigned char ctemp;
    ctemp = gGlobal;     /* should go into register */
    ...
    /* Calculations involving ctemp, i.e. gGlobal */
    bar(z);              /* does not read or write gGlobal, otherwise error */
    /* More calculations on ctemp */
    ...
    gGlobal = ctemp;     /* make sure to remember the result */
}
s.A = a;              s.A = a;
s.B = bar(b);         s.C = c;
s.C = c;              s.D = c;
s.D = c;              s.B = bar(b);
s.E = baz(d);         s.E = baz(d);
}                     }
Note that grouping function calls has no effect if the functions become inlined.
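A self-contained version of the two orderings is sketched below. The struct layout, the bodies of bar() and baz(), and the function names are assumptions made for illustration; only the assignment sequences follow the fragment above.

```c
/* Hypothetical struct and callees, for illustration only. */
struct S { int A, B, C, D, E; };

int bar(int x) { return x + 1; }
int baz(int x) { return x * 2; }

/* Calls interleaved with plain assignments: c must be kept alive
   across the call to bar(). */
struct S fill_interleaved(int a, int b, int c, int d)
{
    struct S s;
    s.A = a;
    s.B = bar(b);
    s.C = c;
    s.D = c;
    s.E = baz(d);
    return s;
}

/* Calls grouped at the end: the plain assignments are finished
   before the first call is made. */
struct S fill_grouped(int a, int b, int c, int d)
{
    struct S s;
    s.A = a;
    s.C = c;
    s.D = c;
    s.B = bar(b);
    s.E = baz(d);
    return s;
}
```

Both functions produce the same struct; any difference lies in how many values the compiler has to preserve across the calls, which depends on the compiler and the target.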
Another example is the use of conditional values in calculations. The “clever” code will result in larger
machine code, since the generated code will contain the same test as the straightforward code, and adds a
temporary variable to hold the one or zero to add to str. The straightforward code can use a simple
increment operation rather than a full addition, and does not require the generation of any intermediate results.
Since clever code almost never compiles better than straightforward code, why write clever code? From a
maintenance standpoint, writing simpler and more understandable code is definitely the method of choice [4].
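The contrast described above can be sketched as follows. The function names and the task, skipping an optional leading plus sign in a string, are assumptions made for illustration.

```c
/* Hypothetical helpers that skip an optional leading '+' in a string. */

/* “Clever”: the result of the comparison (one or zero) is used as a number
   and added to the pointer. */
const char *skip_plus_clever(const char *str)
{
    return str + (*str == '+');
}

/* Straightforward: test, then increment. */
const char *skip_plus_plain(const char *str)
{
    if (*str == '+')
        str++;
    return str;
}
```

Both versions return the same pointer for every input; the straightforward one simply states the intent in the form that both the compiler and a human reader handle best.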
Note that this conflicts with some other advice in this paper about grouping function calls, so check and see
which change gives the best effects. Accessing in order is probably less important, but it all depends on your
particular program, compiler, and platform.
5 Summary
This paper has tried to give an idea of how a modern C compiler works. Based on this, we have also given
practical tips for how you can write code that is easy to compile and that will allow your executable code to
be made smaller.
A compiler is a very complex system with highly non-linear behavior, where a seemingly small change in the
source code can have big effects on the assembly code generated.
The basis for the compilation process is that the compiler should be able to understand what your code is
supposed to do, in order to perform the operations in the best possible way for a given target. As a general
rule, code that is easy to understand for a fellow human programmer—and thus easy to maintain and port—is
also easier to compile efficiently.
References
[1] ISO/IEC 9899:1999 Programming languages – C or American National Standard for Information
Systems - Programming Language - C, ANSI X3.159-1989.
[2] Jakob Engblom: Why SpecINT95 Should Not Be Used To Benchmark Embedded Systems Tools, in
Proceedings of the ACM SIGPLAN Workshop on Languages, Compilers, and Tools for Embedded
Systems (LCTES) 1999, ACM Press, May 1999.
[3] Johan Runeson, Sven-Olof Nyström, and Jan Sjödin: Optimizing Code Size through Procedural
Abstraction, Poster presented at the ACM SIGPLAN Workshop on Languages, Compilers, and Tools for
Embedded Systems (LCTES) 2000, Vancouver, Canada, June 2000.
[4] Steve Maguire: Writing Solid Code, Microsoft Press, Redmond, Washington, 1993.