Programming in C and C++: 10. Undefined Behaviour and Optimisations
The Optimisation Tradeoff
The C Abstract Machine
Examples of Sequence Points in C
Execution of the Abstract Machine
Freedom to Optimize
Outside of the abstract machine, the C standard leaves plenty of room for
compilers to optimize source code to take advantage of specific hardware.
Implementation-defined Behaviour
Implementation-defined behaviour must be documented by each implementation.
It is typically fixed by the target hardware architecture and the operating
system’s Application Binary Interface (ABI).
Examples of implementation-defined behaviour include:
- Number of bits in a byte (at least 8, and exactly 8 on every
  modern system, but the PDP-10 had 36 bits per byte!)
- sizeof(int), which is commonly 4 bytes (32 bits) but can be 2 or 8 on
  some platforms
- char a = (char)123456; (if char is signed, the result of this
  out-of-range conversion)
- Results of some bitwise operations on signed integers
- Result of converting a pointer to an integer or vice versa
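Compiler warnings help spot accidental dependency on this behaviour:
    test.c:3:13: warning: shift count >= width of type
          [-Wshift-count-overflow]
        int x = a >> 64;
(Strictly, a shift count at least as large as the width of the type is
undefined rather than implementation-defined, but the warning shows the kind
of diagnostic compilers provide.)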
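Rather than assuming particular values, a program can query these properties through <limits.h>. The following is a minimal sketch; the function names int_storage_bits and char_is_signed are illustrative and not part of the lecture material.

```c
#include <limits.h>
#include <stddef.h>

/* Number of bits of storage an int occupies on this implementation.
   The standard only guarantees that int can hold at least 16 bits. */
size_t int_storage_bits(void) {
    return sizeof(int) * CHAR_BIT;
}

/* Returns 1 if plain char is a signed type here, 0 if it is unsigned. */
int char_is_signed(void) {
    return CHAR_MIN < 0;
}
```

Code that genuinely needs a fixed width is usually better served by the exact-width types in <stdint.h>, such as int32_t.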
Unspecified Behaviour
The standard allows two or more outcomes and the compiler need not document
which one it picks. It can even vary its choice within the same program to
maximise the effectiveness of its optimisations. For example:
- Evaluation order of arguments in a function call.
- The order in which side effects take place when not otherwise
  explicitly specified by the standard.
- Whether a call to an inline function uses the inline or external
  definition.
- The memory layout of storage for function arguments.
- The order and contiguity of storage allocated by successive calls to
  the malloc, calloc or realloc functions.
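The first item in the list can be observed directly. In the sketch below (all names are hypothetical), each argument expression records a tag as it runs. The result of the call is always the same, but the recorded order may legitimately differ between compilers or optimisation levels.

```c
#include <stddef.h>

static char trace[3];       /* room for two tags and a terminator */
static size_t trace_len;

/* Records tag in the trace and passes value through unchanged. */
static int note(char tag, int value) {
    if (trace_len + 1 < sizeof trace) {
        trace[trace_len++] = tag;
        trace[trace_len] = '\0';
    }
    return value;
}

static int sum(int a, int b) { return a + b; }

/* Always returns 3; the trace afterwards is "ab" or "ba". */
int unspecified_order_demo(void) {
    trace_len = 0;
    trace[0] = '\0';
    return sum(note('a', 1), note('b', 2));
}

/* Read-only view of the order recorded by the last demo call. */
const char *order_trace(void) {
    return trace;
}
```

Because the two calls to note cannot interleave, this is unspecified rather than undefined behaviour: either order is valid, and correct code must not depend on which one occurs.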
Undefined Behaviour
There is a long list of undefined behaviours in the C standard (Annex J.2).
Some examples include:
- Between two sequence points, an object is modified more than once,
  or is modified and the prior value is read other than to determine the
  value to be stored
      int i = 0;
      f(i++, i++); // what arguments will f be called with?
- Conversion of a pointer to an integer type produces a value outside
  the range that can be represented
      char *p = malloc(10);
      short x = (short)p; // what if the address does not fit in a short?
- The initial character of an identifier is a universal character name
  designating a digit
      char \u0031morething;
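The first example becomes well defined once each modification of i sits in its own full expression, so that a sequence point separates the two increments. The sketch below assumes a stand-in definition of f, chosen only so that the arguments it receives can be checked.

```c
/* Hypothetical stand-in for f: encodes both arguments in the result. */
int f(int first, int second) {
    return first * 10 + second;
}

int call_f_with_successive_values(void) {
    int i = 0;
    int first = i++;    /* i is modified once in this full expression */
    int second = i++;   /* the earlier increment has already completed */
    return f(first, second);    /* always f(0, 1) */
}
```

The price is two named temporaries, which also makes the intended order of the increments explicit to the reader.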
Undefined Behaviour (cont.)
The value of a pointer to an object whose lifetime has ended is used.
We already encountered this back in lecture 4:
    #include <stdio.h>

    char *unary(unsigned short s) {
        char local[s+1];            // automatic storage
        int i;
        for (i = 0; i < s; i++) local[i] = '1';
        local[s] = '\0';
        return local;               // local's lifetime ends on return
    }

    int main(void) {
        printf("%s\n", unary(6));   // What does this print?
        return 0;
    }
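One way to repair unary is to give the result a lifetime that outlasts the call. The sketch below uses heap allocation; the name unary_heap and the convention that the caller frees the result are choices made for this example, not part of the original listing.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated string of s '1' characters, or NULL if the
   allocation fails. The caller owns the result and must free() it. */
char *unary_heap(unsigned short s) {
    char *result = malloc((size_t)s + 1);
    if (result == NULL)
        return NULL;
    memset(result, '1', s);
    result[s] = '\0';
    return result;
}
```

An alternative with no allocation is to have the caller pass in a buffer and its size, which moves the lifetime decision to the caller.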
Undefined Behaviour (cont.)
Use of a variable before it has been initialised.
    #include <stdio.h>

    int main(int argc, char **argv) {
        int i;                      // never initialised
        while (i < 10) {            // reads an indeterminate value
            printf("%d\n", i);
            i++;
        }
    }
Accessing out-of-bounds memory.
    char *buf = malloc(10);
    buf[10] = '\0';                 // valid indices are 0 to 9
Dereferencing a NULL pointer or wild pointer (e.g. after calling free).
    char *buf = malloc(10);
    buf[0] = 'a';                   // what if buf is NULL?
    free(buf);
    buf[0] = '\0';                  // buf has been freed
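The last two fragments can be made safe with small helpers that check the pointer, stay within bounds and clear the pointer once it is freed. This is only a sketch; terminate_buffer and release_buffer are hypothetical names.

```c
#include <stddef.h>
#include <stdlib.h>

/* Writes a terminator into the last valid slot of buf.
   Returns 0 on success, -1 if buf is NULL or has no room. */
int terminate_buffer(char *buf, size_t size) {
    if (buf == NULL || size == 0)
        return -1;
    buf[size - 1] = '\0';   /* valid indices are 0 .. size - 1 */
    return 0;
}

/* Frees *buf and resets it to NULL so a stale use can be detected. */
void release_buffer(char **buf) {
    free(*buf);
    *buf = NULL;
}
```

Resetting the pointer does not make a later dereference valid, but it turns a silent use-after-free into a NULL that checks such as the one in terminate_buffer can catch.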
Security Issues due to Undefined Behaviour
    #include <err.h>
    #include <stdlib.h>
    #include <unistd.h>

    // fd and process_packet() are assumed to be declared elsewhere.
    void read_from_network(int size) {
        // Catch integer overflow.
        if (size > size+1)
            errx(1, "packet too big");

        char *buf = malloc(size+1);
        if (buf == NULL)
            errx(1, "out of memory");

        read(fd, buf, size);
        // ... error checking on read.

        buf[size] = 0;
        process_packet(buf);
        free(buf);
    }
Security Issues due to Undefined Behaviour (cont.)
    void read_from_network(int size) {
        // Signed overflow is undefined, so the compiler assumes it never
        // happens: size > size+1 is always false. Optimize it out!
        // if (size > size+1)
        //     errx(1, "packet too big");

        char *buf = malloc(size+1);
        if (buf == NULL)
            errx(1, "out of memory");

        read(fd, buf, size);
        // ... error checking on read.

        buf[size] = 0;
        process_packet(buf);
        free(buf);
    }
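The overflow test survives optimisation if it is phrased so that no overflowing arithmetic is ever evaluated. The following is a sketch of one such check, comparing against INT_MAX before adding; packet_buffer_length is a hypothetical helper, not part of the original function.

```c
#include <limits.h>
#include <stddef.h>

/* Computes the buffer length needed for a packet of size bytes plus a
   terminator. Returns 0 on success, or -1 if size is negative or if
   size + 1 would not fit in an int. */
int packet_buffer_length(int size, size_t *length) {
    if (size < 0 || size == INT_MAX)
        return -1;
    *length = (size_t)size + 1;
    return 0;
}
```

Because the comparison against INT_MAX has defined behaviour for every input, the compiler has no licence to remove it.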
Compiler View on Undefined Behaviour
21 / 27
Compiler View on Undefined Behaviour (cont.)
Consider a simplified program fragment, where function definitions:
- terminate for every input
- run in a single thread
- have infinite computing resources.
22 / 27
“Always-Defined” Functions
    #include <stdint.h>

    // report_integer_math_error() is assumed to be declared elsewhere.
    int32_t
    safe_div_int32_t (int32_t a, int32_t b)
    {
        if ((b == 0) || ((a == INT32_MIN) && (b == -1))) {
            report_integer_math_error();
            return 0;
        } else {
            return a / b;
        }
    }
23 / 27
“Sometimes-Defined” Functions
    int32_t
    safe_div_int32_t (int32_t a, int32_t b)
    {
        return a / b;
    }
    // function call is undefined iff
    // ((b == 0) || ((a == INT32_MIN) && (b == -1)))
24 / 27
“Always-Undefined” Functions
    int32_t
    safe_div_int32_t (int32_t a, int32_t b)
    {
        bool check_overflow;
        if (check_overflow &&
            ((b == 0) || ((a == INT32_MIN) && (b == -1)))) {
            report_integer_math_error();
            return 0;
        } else {
            return a / b;
        }
    }
Why is this variant of safe_div_int32_t undefined for all inputs?
25 / 27
Case Analysis in the Linux Kernel
    static void __devexit agnx_pci_remove (struct pci_dev *pdev)
    {
        struct ieee80211_hw *dev = pci_get_drvdata(pdev);
        struct agnx_priv *priv = dev->priv;   // dev is dereferenced here...

        if (!dev) return;                     // ...but only checked here
        // ... do stuff using dev ...
    }
This code gets a pointer to a device struct, tests it for null and then uses it.
But the pointer is dereferenced before the null check! An optimising
compiler (e.g. gcc at -O2 or higher) performs the following case analysis:
- if dev == NULL then dev->priv has undefined behaviour, so the compiler
  may do anything, including skipping the check.
- if dev != NULL then the null pointer check will not fail.
In both cases the null pointer check is dead code and may be deleted.
End Result: The !dev check is optimised away → security issue
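The repair is to test the pointer before the first dereference, so that no path reaches dev->priv with a null dev. The sketch below uses simplified stand-in types (hw_device and priv_data are hypothetical, not the kernel's definitions) to show the ordering.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types in the listing above. */
struct priv_data { int id; };
struct hw_device { struct priv_data *priv; };

/* Returns the device's private data, or NULL when dev itself is NULL. */
struct priv_data *device_priv(struct hw_device *dev) {
    if (dev == NULL)        /* test first */
        return NULL;
    return dev->priv;       /* dereference only after the test */
}
```

With the check first, the compiler can no longer infer that dev is non-null at that point, so the test must be kept.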
26 / 27
Living with Undefined Behaviour
Long term: Unsafe programming languages will only be used to build
safer abstractions (e.g. Java or ML are bootstrapped in C/C++).
Short term: No straightforward answer, as a combination of tools and
techniques is needed.
- Use multiple compilers (e.g. both clang and gcc) and enable extensive
  compiler warnings (-Wall -Wextra).
- Use static analysis tools (like clang-analyzer or Coverity) to spot
  errors, or dynamic analysis engines like Valgrind.
- Modern clang and gcc have “undefined behaviour sanitizers” that
  detect and report many classes of undefined behaviour at run time via
  the -fsanitize=undefined command line flag.
- Avoid “sometimes-defined” functions by checking all inputs at the
  point of use.
- Use high-quality third-party libraries that obey these rules.
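As an illustration of checking inputs at the point of use, the sketch below wraps a left shift so that an out-of-range shift count is rejected instead of invoking undefined behaviour. The name checked_shl_u32 and its error convention are assumptions made for this example.

```c
#include <limits.h>
#include <stdint.h>

/* Shifts value left by count bits and stores the result in *out.
   Returns 0 on success, -1 if count is not below the width of uint32_t. */
int checked_shl_u32(uint32_t value, unsigned count, uint32_t *out) {
    if (count >= sizeof(uint32_t) * CHAR_BIT)
        return -1;
    *out = value << count;
    return 0;
}
```

An unchecked value << count with a count of 32 or more is the kind of error that a build with -fsanitize=undefined reports when the offending line executes.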
“Be very careful, use good tools, and hope for the best.”
– John Regehr, http://blog.regehr.org/archives/213