Part4 SW SEC Memory Corruption
SECURITY
MEMORY CORRUPTION
Outline
Part 1: Essence of the problem
• Buffer overflow
• Other memory corruption problems
• Code injection vs code reuse
• Spotting the problem
• Format string attacks
• Preventing format string attacks
Part 3: Attacks
• Return-to-libc attacks
• Return-Oriented Programming (ROP)
Overview
1. How do memory corruption flaws work?
2. What can be the impact?
3. How can we spot such problems in C(++) code?
4. What can ‘the platform’ do about it?
   i.e., the compiler, system libraries, hardware, OS, ...
5. What can the programmer do about it?
Essence of the problem
Undefined behaviour: anything can happen
    // assign values to array elements
    buffer[0] = a;
    buffer[1] = z;
    buffer[2] = b;
    buffer[3] = c;
- In C, array indexing starts from 0, so buffer[4] refers to the 5th element of the array buffer, which doesn’t exist because the array has a length of 4.
- Attempting to modify this element, i.e., buffer[4], will result in accessing memory beyond the bounds of the array, leading to undefined behavior.
Therefore, assigning a value to buffer[4] can overwrite memory locations adjacent to the array buffer and cause memory corruption or segmentation faults.
Undefined behaviour: anything can happen
Suppose in a C program you have an array of length 4
    char buffer[4];
What happens if the statement below is executed?
    buffer[4] = 'a';
If the attacker can control the value 'a',
then anything that the attacker wants may happen.
Note: segmentation faults in C++ are runtime errors that happen when a program attempts to access memory that it is not allowed to access.
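One way to avoid the out-of-bounds write above is to check the index against the array length before every store. The sketch below is only an illustration; the helper name checked_store and its return convention are assumptions, not part of the slides.

```c
#include <stddef.h>

/* Hypothetical helper: stores value at buf[index] only if index is in bounds.
   Returns 0 on success, -1 if the write would fall outside the array. */
int checked_store(char *buf, size_t size, size_t index, char value)
{
    if (buf == NULL || index >= size) {
        return -1;   /* reject the write instead of corrupting adjacent memory */
    }
    buf[index] = value;
    return 0;
}
```

With char buffer[4], checked_store(buffer, sizeof buffer, 4, 'a') is rejected, while indices 0 to 3 are accepted. C does not insert such checks itself, which is the efficiency trade-off behind the problem.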
Solution to this problem
• For efficiency
Buffer overflow
Other memory corruption problems
Errors with pointers and with dynamic memory (the heap)
Memory corruption problems in C:
1. Buffer overflows
2. Pointer-related errors
• Who has written a C(++) program that uses pointers?
• Who ever had such a program crashing?
• Who has ever written a C(++) program that uses dynamic memory, i.e., malloc & free?
• Who ever had such a program crashing?
Spot all (potential) defects
    void f (){
        char* buf, buf1;
        buf = malloc(100);
        buf[0] = 'a';        // possible null dereference (if malloc failed)
        ...
        free(buf);
        buf[0] = 'b';        // use-after-free; buf[0] points to de-allocated memory
        buf1 = malloc(100);  // memory leak; pointer buf1 to this memory is lost & memory is never freed
        buf[0] = 'c';
    }
"Dereference" refers to the act of accessing the value stored at the memory location pointed to by a pointer.
Spot all (potential) defects
3. Free Memory: When you are done using the allocated memory, free it with the free function to prevent memory leaks.
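The allocate, check, use and free steps can be sketched as follows. The helper names duplicate_string and release_string are hypothetical and chosen only for this illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap-allocated copy of text, or NULL on failure.
   The caller owns the result and must release it. */
char *duplicate_string(const char *text)
{
    size_t n = strlen(text) + 1;      /* +1 for the null terminator */
    char *copy = malloc(n);
    if (copy == NULL) {
        return NULL;                  /* malloc failed: never dereference the result */
    }
    memcpy(copy, text, n);
    return copy;
}

/* Frees the buffer and clears the caller's pointer, so the freed memory
   can no longer be reached through this pointer by accident. */
void release_string(char **p)
{
    free(*p);
    *p = NULL;
}
```

Clearing the pointer after free is a common convention rather than a complete defence: other copies of the pointer can still dangle.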
How does classic buffer overflow work?
aka smashing the stack
Process memory layout
[Figure: process memory layout, starting at low addresses with the program code (.text), showing the stack-vs-heap layout inside memory and the stack (buffer)]
Stack layout
The stack consists of Activation Records (ARs):
    AR main()   ...
    AR f()      x
                return address
                buf[4..7]
                buf[0..3]
    void f(int x) {
        char buf[8];
        gets(buf);
    }
    void main() {
        f(…); …
    }
    void format_hard_disk(){…}
Stack overflow attack - case 1
What if gets() reads more than 8 bytes?
The extra bytes overwrite the return address, so the attacker can jump to an arbitrary point in the code, e.g., format_hard_disk().
    AR main()   ...
    AR f()      x
                return address   (overwritten)
                buf[4..7]
                buf[0..3]
Stack overflow attack - case 2
What if gets() reads more than 8 bytes?
Attacker can jump to his own code (aka shell code)
    AR main()   ...
    AR f()      x
                return address   (overwritten to point to the shell code)
                exec /bin/sh     (shell code written into buf)
Code injection vs code reuse
The two attack scenarios in these examples:
(1) is a code reuse attack
    attacker corrupts return address to point to existing code
    In the example, format_hard_disk
(2) is a code injection attack
    attacker inserts his own shell code in a buffer and corrupts return address to point to this code
    In the example, exec('/bin/sh')
    This is the classic buffer overflow attack
    [Smashing the stack for fun and profit, Aleph One, 1996]
Spotting the problem
Reminder: C chars & strings
str:  'h' 'e' 'l' 'l' 'o' '\0'
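As a small illustration of the reminder above: the string "hello" occupies six bytes, five characters plus the terminating '\0', and library functions such as strlen rely on that terminator being present. The helper is_terminated below is hypothetical and only for this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the first size bytes of buf
   contain a null terminator, else 0. */
int is_terminated(const char *buf, size_t size)
{
    return memchr(buf, '\0', size) != NULL;
}
```

For char str[] = "hello", sizeof str is 6 while strlen(str) is 5. A char array holding the five letters without the terminator is not a valid C string, and passing it to strlen or strcpy reads past its end.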
Example: gets
char buf[20];
gets(buf);      // reads user input until the first end-of-line (EoL) or end-of-file (EoF),
                // no matter how many bytes that is
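Because gets cannot be told how large the buffer is, a bounded read is the usual replacement. The sketch below uses the standard fgets; the wrapper name read_line and its return convention are assumptions for illustration.

```c
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads at most size-1 characters of one line from in
   into buf and always null-terminates it.
   Returns 0 on success, -1 on end-of-file, error, or an unusable size. */
int read_line(FILE *in, char *buf, size_t size)
{
    if (size == 0 || size > INT_MAX || fgets(buf, (int)size, in) == NULL) {
        return -1;
    }
    buf[strcspn(buf, "\n")] = '\0';   /* strip the trailing newline, if any */
    return 0;
}
```

With char buf[20], read_line(stdin, buf, sizeof buf) stores at most 19 characters; the rest of an over-long line stays in the input stream instead of overflowing buf.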
Example: strcpy
char dest[20];
strcpy(dest, src); // copies string src to dest
Spot the defect! (1)
    char src[9];
    char dest[9];
    char* base_url = "www.ab.cd";
    strncpy(src, base_url, 9);   // copies base_url to src
    strcpy(dest, src);           // copies src to dest
base_url is 10 chars long, incl. its null terminator, so src will not be null-terminated,
so strcpy will overrun the buffer dest.
NB: a strongly typed programming language would guarantee that strings are always null-terminated, without the programmer having to worry about this...
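One possible repair, sketched under the assumption that truncating an over-long string is acceptable, is to leave room for the terminator and write it explicitly. The helper name copy_string is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dest (capacity dest_size),
   truncating if necessary and always null-terminating.
   Returns 0 if the whole string fit, -1 if it was truncated. */
int copy_string(char *dest, size_t dest_size, const char *src)
{
    if (dest_size == 0) {
        return -1;
    }
    strncpy(dest, src, dest_size - 1);
    dest[dest_size - 1] = '\0';   /* strncpy does not add this when src is too long */
    return strlen(src) < dest_size ? 0 : -1;
}
```

Copying "www.ab.cd" into a 9-byte buffer this way yields the terminated string "www.ab.c" and a return value of -1, so the caller can detect the truncation rather than silently working with an unterminated array.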
Spot the defect! (2)
    char *buf;
    int len;
    ...
    if (len < 0)
    {
        error ("negative length");
        return;
    }
    buf = malloc(MAX(len,1024));
    read(fd,buf,len);
What if the malloc() fails? (because we are out of memory)
Are memory address spaces allocated? Yes
Are memory address spaces deallocated? No
Are memory address spaces full? Yes
Are memory address spaces being used at the current moment? Yes
37
Spot the defect! (2)
    char *buf;
    int len;
    ...
    if (len < 0)
    {
        error ("negative length");
        return;
    }
    buf = malloc(MAX(len,1024));
Better still
    char *buf;
    int len;
    ...
    if (len < 0)
    {
        error ("negative length");
        return;
    }
    buf = calloc(MAX(len,1024), 1);   // calloc takes an element count and an element size
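The difference between malloc and calloc can be sketched as follows: calloc takes the number of elements and the size of one element as separate arguments, and returns zero-initialised memory. The helper name new_zeroed_ints is an assumption for this illustration.

```c
#include <stdlib.h>

/* Hypothetical helper: allocates count ints, all initialised to zero.
   Returns NULL if the allocation fails; the caller must free() the result. */
int *new_zeroed_ints(size_t count)
{
    /* calloc(number of elements, size of each element) */
    return calloc(count, sizeof(int));
}
```

A caller still has to check the result for NULL before using it, exactly as with malloc.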
Spot the defect! (3)
    #define MAX_BUF 256
    void BadCode (char* in)
    {   short len;
        char buf[MAX_BUF];
        len = strlen(in);
1. Buffer overflow
   If the length of in is greater than or equal to the size of buf, the strcpy function will cause a buffer overflow. This leads to security vulnerabilities.
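A related hazard in this snippet is that strlen returns a size_t, which is stored in a short. The sketch below assumes a typical platform with a 16-bit short, where an out-of-range value wraps (the conversion is implementation-defined); the function names are hypothetical. It shows how a length check on the short can pass for a very long input.

```c
#include <stddef.h>

#define MAX_BUF 256

/* Flawed check: the length is first stored in a short, as in the slide.
   With a 16-bit short, a length above 32767 typically wraps to a
   negative value, which then compares as smaller than MAX_BUF. */
int fits_flawed(size_t length)
{
    short len = (short)length;
    return len < MAX_BUF;
}

/* Safer check: keep the length in size_t, the type strlen returns. */
int fits_checked(size_t length)
{
    return length < MAX_BUF;
}
```

On such a platform fits_flawed(40000) reports that the input fits, while fits_checked(40000) correctly rejects it.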
Exercise 1
    #include <stdlib.h>
    int main() {
        int *arr;
        int size = 10;
        // Allocate memory for an array of 10 integers, initialized to zero
        arr = (int *)malloc(size, sizeof(int));
        // Use the allocated memory
        // ...
        return 0;
    }
• Spot ONE (1) defect.
• What is the root cause?
• Propose a solution.
Exercise 1
    #include <stdlib.h>
    int main() {
        int *arr;
        int size = 10;
Memory corruption problems in C:
1. Buffer overflows
2. Pointer-related errors
3. Memory leakage
Exercise 2
#define MAX_BUF 10
46
Exercise 3
    void allocateMemory() {
        int* ptr = (int*)malloc(sizeof(int));
        *ptr = 10;
        // Free allocated memory to prevent memory leak
        free(ptr);
    }
Exercise 4
#include <stdio.h>
NB absence of language-level security
In a safer programming language than C/C++,
the programmer would not have to worry about
• writing past array bounds
(because you will get an IndexOutOfBoundsException
instead)
• implicit conversions from signed to unsigned
integers
(because the type system/compiler would forbid this
or warn)
• malloc possibly returning null
(because you'd get an OutOfMemoryException
instead)
• malloc not initialising memory
(because language could always ensure default
initialisation)
• ...
50
Spot the defect! (4)
#include <stdio.h>
51
Exercise 5
    #include <stdio.h>
    int main(int argc, char* argv[])
    {   if (argc > 1)
            printf(argv[1]);
        return 0;
    }
• Spot ONE (1) defect.
• What is the root cause?
• Propose a solution.
Memory corruption problems in C:
1. Buffer overflows
2. Pointer-related errors
3. Memory leakage
4. Format string attacks
Exercise 5
    #include <stdio.h>
    int main(int argc, char* argv[]) {
        if (argc > 1) {
            // Use printf with a format specifier (%s) to print user-supplied input as a string
            printf("%s\n", argv[1]);
        }
        return 0;
    }
• The defect is that the code is vulnerable to format string attacks.
• The root cause is passing the user-supplied input argv[1] directly to printf as the format string, without any format specifier.
• The solution is to use printf with a format specifier such as %s to specify that the input should be treated as a string.
Format string attacks
New type of memory corruption discovered in 2000
• Strings can contain special characters, eg %s in the examples below
  Such strings are called format strings
  No attack:
      printf("Cannot find file %s", filename);
  An attack:
      printf("Cannot find file %s");
• What can happen if we execute
      printf(string)
  where string is user-supplied?
  Esp. if it contains special characters, eg %s, %x, %n, %hn?
Format string attacks
Suppose attacker can feed malicious input string s to printf(s). This can:
• Read the stack
  %x reads and prints bytes from the stack, so input such as
      %x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x...
  will dump the stack, including passwords, keys,… stored on the stack, as output
To exploit format string attacks and read arbitrary memory,
what output does a carefully crafted format string produce
when the input is “\xEF\xCD\xCD\xAB %x%x...%x%s”?
d
Preventing format string attacks
• Always replace printf(str)
  with printf("%s", str)
• Compilers can warn about risky format strings,
  eg gcc has (far too many?) command line options for this:
      -Wformat -Wformat-nonliteral -Wformat-security ...
See https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=format+string
to see how common format string problems still are
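The effect of the fixed "%s" format can be sketched with snprintf, which follows the same formatting rules as printf but writes into a buffer. The helper name format_untrusted is an assumption for this illustration.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats untrusted text into out (capacity out_size).
   The untrusted text is passed as an argument to a fixed "%s" format,
   so any % directives inside it are copied literally, not interpreted. */
int format_untrusted(char *out, size_t out_size, const char *untrusted)
{
    return snprintf(out, out_size, "%s", untrusted);
}
```

Given the input "%x%x%n", the output buffer contains exactly those six characters: nothing is read from the stack and nothing is written through %n.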
Recap: buffer overflows
• Buffer overflow is #1 weakness in C and C++ programs
  – because these languages are not memory-safe
• Tricky to spot
• Typical cause: programming with arrays, pointers, and strings
  – esp. library functions for null-terminated strings
• Related attacks:
  • Format string attack: another way of corrupting the stack
  • Integer overflows: often a stepping stone to getting a buffer to overflow
    • but just the integer overflow can already have a security impact; e.g., think of banking software
Platform-level defences
Platform-level defences (PLDs) are protective measures implemented at the level of the underlying platform or operating system (OS) to mitigate vulnerabilities and potential attacks.
Platform-level defences
• PLDs are defences that the compiler, hardware, or OS can take, without the programmer having to know.
• Some defences may need OS & hardware support
• Some defences cause overhead
  – if the overhead is unacceptable in production code, we can still use it when testing
• Some defences may break binary compatibility
  – eg if a compiler adds extra book-keeping & checks, then all libraries may need to be re-compiled with that compiler
Note: a library is binary compatible if a program linked dynamically to a former version of the library continues running with newer versions of the library without the need to recompile.
PLDs (now standard on many platforms)
1. Stack canaries
2. Address space layout randomization (ASLR)
3. Non-executable memory (NX, W⊕X)
1. Stack canaries
• A dummy value - stack canary or cookie - is written on the stack in
front of the return address and checked when function returns
• A careless stack overflow will overwrite the canary, which can
then be detected
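The canary idea can be modelled in plain C with a toy frame. This is only a simulation under stated assumptions: the struct toy_frame, the fixed CANARY_VALUE and the function names are hypothetical, and a real compiler places a per-run random canary on the actual stack.

```c
#include <stddef.h>
#include <string.h>

#define CANARY_VALUE 0x00C0FFEEu   /* illustrative only; real canaries are random per run */

/* Toy model of a stack frame: the canary sits between the buffer
   and the value it protects (a stand-in for the return address). */
struct toy_frame {
    char buf[8];
    unsigned int canary;
    unsigned int return_address;
};

/* Simulates an unchecked copy into buf by writing n bytes from the start
   of the frame. Treating the struct as raw bytes keeps this sketch
   well-defined C, unlike a real overflow. */
void unchecked_copy(struct toy_frame *frame, const void *input, size_t n)
{
    if (n > sizeof *frame) {
        n = sizeof *frame;
    }
    memcpy(frame, input, n);
}

/* The check that would be inserted just before the function returns. */
int canary_intact(const struct toy_frame *frame)
{
    return frame->canary == CANARY_VALUE;
}
```

Copying 8 bytes leaves the canary untouched; copying 16 bytes overwrites both the canary and the stand-in return address, and canary_intact then reports the corruption.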
Further improvements
1. More variation in canary values: eg not fixed values hardcoded in
binary but a random value chosen for each execution.
2. Better still, XOR the return address into the canary value
3. Include a null byte in the canary value, because C string
functions cannot write nulls inside strings
65
Further improvements
4. Re-order elements on the stack to reduce the potential impact of
overruns
• swapping parameters buf and fp on stack changes whether
overrunning buf can corrupt fp
• which is especially dangerous if fp is a function pointer
• hence it is safer to allocate array buffers ‘above’ all other
local variables
First introduced by IBM’s ProPolice.
66
Windows 2003 Stack Protection
Nice example of the ways in which things can go wrong...
• Enabled with /GS command line option in Visual Studio
• When canary is corrupted, control is transferred to an exception
handler
• Exception handler information is stored ...
on the stack!
• Attacker can corrupt the exception handler info on the stack, in the
process corrupt the canaries, and then let Stack Protection
mechanism transfer control to a malicious exception handler
67
Q: What countermeasure is proposed to mitigate the exploitation
of corrupted exception handler information and canaries in the
stack protection mechanism?
A: Only allow transfer of control to registered exception handlers.
2. Address space layout randomization (ASLR)
• Attackers can still analyse the memory layout on their own laptop, but will have to determine the offsets used on the victim’s machine to carry out an attack
3. Non-eXecutable memory (NX , W⊕X, DEP)
Distinguish
• X: executable memory (for storing code)
• W: writeable, non-executable memory (for storing data)
and let processor refuse to execute non-executable code
Limitation: this technique does not work for JIT (Just In Time)
compilation, where e.g. JavaScript is compiled to machine code at
run time.
70
Q: How does the use of a shadow stack (SS) enhance platform-level defenses
against buffer overflow attacks?
A: Use of SS enhances PLDs against buffer overflow attacks by providing copies of
return addresses to check for corruption.
Attacks: Defeating NX
Defeating NX: Return-to-libc attacks
With NX, code injection attacks are no longer possible,
but code reuse attacks still are... (see the slide "Code injection vs code reuse").
Return-Oriented Programming (ROP)
Next stage in evolution of attacks, as people removed or protected
dangerous libc calls such as system()
74
Q: What is Return-Oriented Programming (ROP) and how does it work?
A: Return-Oriented Programming (ROP) is an advanced attack technique where
attackers chain together small snippets of existing code, known as gadgets, to
form a malicious program.
More advanced defences
Goals / Building blocks of attacks
• Code corruption attack
  Overwrite the original program code in memory;
  impossible with W⊕X
• Control-flow hijack attack
  Overwrite a code pointer, eg return address, jump address,
  function pointer, or pointer in the virtual function table (vtable) of a C++ object
• Data-only attack
  Overwrite some data, eg bool isAdmin;
• Information leak
  Only reading some data; recall Heartbleed attack on TLS
Q: Which attack type involves overwriting a
code pointer, such as a return address or
function pointer?
a) Control-flow hijack attack
b) Data-only attack
c) Code corruption attack
d) Information leak
A: a
Classification of defences
• Probabilistic methods
Basic idea: add randomness to make attacks harder
– in location where certain data is located (eg ASLR),
or in the way data is represented in memory (eg pointer
encryption)
• Memory Safety
Basic idea: do additional bookkeeping & add runtime checks to
prevent some illegal memory access
79
More randomness: Pointer Encryption (PointGuard)
Category: probabilistic methods
• Many buffer overflow attacks involve corrupting pointers: pointers to data or code pointers
• To complicate this: store pointers encrypted in main memory, unencrypted in registers
  – simple & fast encryption scheme: XOR with a fixed value, randomly chosen when a process starts
• Attacker can still corrupt encrypted pointers in memory, but these will not decrypt to predictable values
  – Encryption is used here to ensure the integrity of pointers. Normally NOT a good idea, but here it works.
• Next step: Data Space Randomisation (DSR)
  – encrypt not just pointers, but store all data encrypted in memory
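The XOR scheme can be sketched as below. This is a simplified illustration, not PointGuard's actual implementation: the names pointer_key, encrypt_pointer and decrypt_pointer are hypothetical, and the key is a fixed constant here, whereas the scheme described above picks it at random when the process starts.

```c
#include <stdint.h>

/* Hypothetical per-process key; fixed here purely for illustration. */
static uintptr_t pointer_key = (uintptr_t)0x5A5A5A5Au;

/* Value stored in memory: the pointer XORed with the key. */
uintptr_t encrypt_pointer(void *p)
{
    return (uintptr_t)p ^ pointer_key;
}

/* Value loaded into a register just before the pointer is used. */
void *decrypt_pointer(uintptr_t stored)
{
    return (void *)(stored ^ pointer_key);
}
```

If an attacker overwrites the stored value with a raw address, decrypt_pointer yields that address XORed with the key, which the attacker cannot predict without knowing the key.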
More memory safety
Category: memory safety
Additional book keeping of meta-data & extra runtime checks to prevent illegal memory access
Different possibilities
• add information to pointer about size of memory chunks it points to (fat pointers)
• add information to memory chunks about their size (spatial safety with object bounds)
• …
Fat pointers
Category: memory safety
The compiler
• records size information for all pointers
• adds runtime checks for pointer arithmetic & array indexing
[Figure: a pointer p refers to some data; a fat pointer stores p together with the size of that data]
Downsides
• Considerable execution time overhead
• Not binary compatible – ie all code needs to be compiled to add this book keeping for all pointers
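A hand-written approximation of a fat pointer is sketched below. In a real scheme the compiler would generate the metadata and checks; the struct fat_ptr and the functions fat_advance and fat_write are hypothetical names for this illustration.

```c
#include <stddef.h>

/* A fat pointer: the current pointer plus the base and size of its chunk. */
struct fat_ptr {
    char  *ptr;
    char  *base;
    size_t size;
};

/* Checked pointer arithmetic: advances p by n bytes if the result stays
   within the chunk or one past its end. Returns 0 on success, -1 otherwise. */
int fat_advance(struct fat_ptr *p, size_t n)
{
    size_t used = (size_t)(p->ptr - p->base);
    if (n > p->size - used) {
        return -1;
    }
    p->ptr += n;
    return 0;
}

/* Checked store: writes value only if p points inside the chunk. */
int fat_write(struct fat_ptr p, char value)
{
    size_t used = (size_t)(p.ptr - p.base);
    if (used >= p.size) {
        return -1;
    }
    *p.ptr = value;
    return 0;
}
```

Every arithmetic step and every access pays for a comparison, which is where the execution time overhead listed under the downsides comes from.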
Control Flow Integrity (CFI)
Extra bookkeeping & checks to spot unexpected control flow
• Dynamic return integrity
Stack canaries, or shadow stack that keeps copies of all return
addresses, providing extra check against corruption of return
addresses
• Static control flow integrity (Static CFI)
  Idea: determine the control flow graph (cfg) and monitor jumps in the control flow to spot deviant behavior
  If f() never calls g(), because g() does not even occur in the code of f(), then a call from f() to g() is suspect, as is a return from g() to f()
  This can detect Return-to-libc and ROP attacks
Q: Which type of attacks can Static Control
Flow Integrity (Static CFI) detect?
a) Injection attacks
b) Format string attacks
c) Return-to-libc and ROP attacks
d) Code reuse attack
A: c
Static control flow integrity
    void f() {
        ... ; g();
        ... ; g();
        ... ; h();
        ...
    }
    void g(){ ..h();}
    void h(){ ... }
[Figure: control flow graph with call edges from f() to g(), from f() to h() and from g() to h(), and the corresponding return edges]
Before and/or after every control transfer (function call or return) we could check if it is legal – ie. allowed by the cfg
Some weird returns would still be allowed
• eg if we call h() from g(), and the return is to f(), this would be allowed by the static cfg
• Additional dynamic return integrity check can narrow this down to actual call site – using recorded call site on shadow stack
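The static check can be sketched as a lookup in a table of allowed call edges. The table below encodes the example's cfg (f calls g and h, g calls h); the enum, table and function names are assumptions for this illustration, and real CFI implementations instrument the binary rather than use an explicit table like this.

```c
enum func_id { FUNC_F, FUNC_G, FUNC_H, FUNC_COUNT };

/* allowed_call[caller][callee] is 1 if the static cfg contains that call edge. */
static const int allowed_call[FUNC_COUNT][FUNC_COUNT] = {
    [FUNC_F] = { [FUNC_G] = 1, [FUNC_H] = 1 },
    [FUNC_G] = { [FUNC_H] = 1 },
};

/* A call is legal if the cfg has an edge from caller to callee. */
int call_allowed(enum func_id caller, enum func_id callee)
{
    return allowed_call[caller][callee];
}

/* A return is legal if return_to is a function that may call callee. */
int return_allowed(enum func_id callee, enum func_id return_to)
{
    return allowed_call[return_to][callee];
}
```

Note that return_allowed(FUNC_H, FUNC_F) is 1 even when h() was actually called from g(): this is the weird return that only a dynamic check, such as a shadow stack, can rule out.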
Downsides of static control flow integrity checks
Exam questions: you should be able to
• Explain how simple buffer overflows work & what root causes are
• Spot a simple buffer overflow, memory-allocation problem,
format string attack, or integer overflow in some C code
• Explain how countermeasures - such as stack canaries, non-
executable memory, ASLR, CFI, bounds checkers, pointer
encryption, … - work
• Explain why they might not always work