Lecture 1
Acknowledgment
} Slides are adapted from Dr. Akrami
} Also, other resources available online, which are cited within the slides
Memory layout
Memory Layout Refresher
} How is program data laid out in memory?
All programs are stored in memory
architecture.
} In a 32-bit system, memory addresses are
32 bits long, which means the address space
has 2^32 bytes (4 GB) of memory.
All programs are stored in memory
[Figure: the 4 GB address space, from 0x00000000 up to 0xffffffff]
All programs are stored in memory
} During program execution, the memory space is
divided into four main sections:
[Figure: C memory layout]
1. Text/Code Section: This section stores the
program's executable instructions, which are the
machine code generated by the compiler and linker.
2. Static Section: This section holds constant values,
static variables, and global variables. These variables
have a fixed memory location throughout the
program's execution.
3. Heap: This section is used for dynamic memory
allocation. When a program requests memory using
functions like malloc in C, it is allocated from the heap.
The heap grows upwards in memory, so newer
allocations typically have higher addresses.
4. Stack: This section is used for storing local variables
and function call information. The stack grows
downwards in memory, with older function calls having
higher addresses.
Source: https://textbook.cs161.org/memory-safety/x86.html
The instructions themselves are in memory
[Figure: the Text segment sits at the low end of the address space, between 0x00000000 and 0xffffffff (4G)]
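Location of data areas
[Figure: the address space from 0x00000000 (0) up to 0xffffffff (4G); regions listed from high to low addresses]
} cmdline & env: set when the process starts
} Stack (runtime): int f() { int x; … }
} Heap (runtime): malloc(sizeof(long));
} Uninit’d data (known at compile time): static int x;
} Init’d data (known at compile time): static const int y=10;
} Text (known at compile time)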
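The slide's example declarations can be checked on a real machine. The C sketch below records one address from each region: a function for Text, two file-scope statics for the data areas, a malloc'd block, and a local variable. It assumes a typical hosted system. The actual addresses, their relative order, and whether they change between runs (for example under address-space layout randomization) depend on the OS, compiler, and linker. `struct layout`, `sample_layout`, and `print_layout` are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

static int init_data = 10;   /* initialized static: "Init'd data" */
static int uninit_data;      /* uninitialized static: "Uninit'd data" (BSS) */

/* One address (as an integer) from each region of the address space. */
struct layout {
    uintptr_t text;   /* a function */
    uintptr_t data;   /* an initialized static */
    uintptr_t bss;    /* an uninitialized static */
    uintptr_t heap;   /* a block returned by malloc */
    uintptr_t stack;  /* a local variable */
};

struct layout sample_layout(void)
{
    int local = 0;                         /* lives in this call's stack frame */
    long *dynamic = malloc(sizeof(long));  /* lives on the heap */
    struct layout l;

    /* Converting a function pointer to an integer is implementation-defined. */
    l.text  = (uintptr_t)&sample_layout;
    l.data  = (uintptr_t)&init_data;
    l.bss   = (uintptr_t)&uninit_data;
    l.heap  = (uintptr_t)dynamic;
    l.stack = (uintptr_t)&local;

    free(dynamic);                         /* only the address was needed */
    return l;
}

void print_layout(const struct layout *l)
{
    assert(l != NULL);
    printf("stack: 0x%" PRIxPTR "\n", l->stack);
    printf("heap : 0x%" PRIxPTR "\n", l->heap);
    printf("bss  : 0x%" PRIxPTR "\n", l->bss);
    printf("data : 0x%" PRIxPTR "\n", l->data);
    printf("text : 0x%" PRIxPTR "\n", l->text);
}
```

Calling `struct layout l = sample_layout(); print_layout(&l);` on a typical x86 Linux machine usually prints the stack address highest and the text address lowest, in line with the slide. The C standard does not guarantee that order.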
Memory allocation
Stack and heap grow in opposite directions
} As more memory is needed in the heap, it grows toward higher addresses,
whereas as more memory is needed for the stack, it grows downward toward lower addresses.
[Figure: heap toward the 0x00000000 end, stack toward the 0xffffffff end; push 1, push 2, and push 3 each move the stack pointer down toward the heap, and return moves it back up]
} Heap memory is apportioned by the OS and managed in-process by malloc.
Registers
[Figure: x86 registers. Source: https://www.cs.virginia.edu/~evans/cs216/guides/x86.html]
Registers and Memory
} Note that the top of the current stack frame is the highest address
associated with the current stack frame, and the bottom of the stack frame
is the lowest address associated with the current stack frame.
} The values stored in these three registers, EIP, EBP, and ESP, typically
represent memory addresses.
} When we say a register "points" to a memory location, we mean that the
address stored within the register refers to a specific memory address.
} For example, if EIP points to 0xDEADBEEF, it indicates that the value
0xDEADBEEF is stored in the EIP register, and this value can be interpreted
as an address to access a particular memory location.
Pushing and Popping Stack
} To save a value in the stack:
1) Allocating space on the stack by decreasing the ESP register, and
2) Storing the value in the newly allocated space.
} The x86 push instruction simplifies this process by performing
both steps in a single operation.
Pushing and Popping Stack
} To remove a value from the stack, we use the x86 pop instruction.
} This instruction performs two actions:
1) It copies the top value of the stack (the value at ESP) into a specified register, and
2) It increments the ESP register (by 4 bytes for a 32-bit value) to remove that value from the stack.
} Note that popping does not erase the value from memory: the bytes are still there.
} However, because ESP has been incremented, the popped value now lies below the current stack
pointer and is considered undefined memory.
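The push and pop sequences can be modelled in a few lines of C. This is a toy simulation, not real x86. Memory is an array of 32-bit words, and one slot stands for the 4 bytes by which a 32-bit push or pop moves ESP. Here esp is an array index. `sim_stack`, `sim_push`, and `sim_pop` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_SLOTS 16   /* illustrative size */

/* Downward-growing toy stack; esp == STACK_SLOTS means "empty". */
struct sim_stack {
    uint32_t mem[STACK_SLOTS];
    size_t esp;
};

void sim_init(struct sim_stack *s)
{
    for (size_t i = 0; i < STACK_SLOTS; i++)
        s->mem[i] = 0;
    s->esp = STACK_SLOTS;
}

/* push: 1) decrement esp to allocate a slot, 2) store the value there. */
int sim_push(struct sim_stack *s, uint32_t value)
{
    if (s->esp == 0)
        return 0;              /* no room left */
    s->esp -= 1;
    s->mem[s->esp] = value;
    return 1;
}

/* pop: 1) copy the top value out, 2) increment esp.
 * The slot is not cleared; the old value stays in memory below esp. */
int sim_pop(struct sim_stack *s, uint32_t *out)
{
    assert(out != NULL);
    if (s->esp == STACK_SLOTS)
        return 0;              /* nothing to pop */
    *out = s->mem[s->esp];
    s->esp += 1;
    return 1;
}
```

After pushing 1, 2, and 3 and popping once, the pop yields 3, and esp sits two slots below the empty position. The slot that held 3 still contains 3. The popped value is not erased; it now lies below esp, where it counts as undefined memory.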
Stack and function calls
} How does the program use the stack while it is running?
Function Calls
} When a function is invoked, the stack allocates additional memory to store
local variables and other function-specific data.
} This allocated space, located at lower memory addresses, is known as a
stack frame.
} Once the function completes, this space is released, making it available for
future function calls.
} A function call involves the caller and the callee.
} The caller initiates the function call, and the program's execution
temporarily shifts to the callee.
} Once the callee finishes its execution, control is returned back to the caller,
allowing it to resume its execution.
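On GCC or Clang, you can check directly that a callee's frame sits at lower addresses than its caller's. This sketch relies on two compiler extensions, `__builtin_frame_address` and `__attribute__((noinline))`. It also assumes a downward-growing stack, as on x86. It is a demonstration rather than portable C. `callee_frame` and `callee_frame_is_lower` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* GCC/Clang extension: __builtin_frame_address(0) gives the address of the
 * current function's frame. noinline keeps this function in its own frame. */
__attribute__((noinline))
uintptr_t callee_frame(void)
{
    return (uintptr_t)__builtin_frame_address(0);
}

/* Returns 1 if the callee's frame is at a lower address than the caller's,
 * as expected when the stack grows downward. */
int callee_frame_is_lower(void)
{
    uintptr_t caller = (uintptr_t)__builtin_frame_address(0);
    uintptr_t callee = callee_frame();
    assert(caller != 0 && callee != 0);
    return callee < caller;
}
```

On x86, `callee_frame_is_lower()` returns 1, because the callee's frame is created below the caller's.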
Function Calls
} When making a function call in x86, we need to adjust the values of three
key registers:
1. EIP (Instruction Pointer): Initially pointing to the caller's instructions,
EIP must be updated to point to the first instruction of the callee.
2. EBP (Base Pointer): EBP, which points to the top of the caller's stack
frame, needs to be updated to point to the top of the newly created stack
frame for the callee.
3. ESP (Stack Pointer): Similarly, ESP, which points to the bottom of the
caller's stack frame, must be adjusted to point to the bottom of the callee's
stack frame.
Function Calls
} When the function returns, the registers must be restored to the values they had before the call.
} To make this possible, the original values of EIP and EBP are saved on the stack. Once the function finishes,
these saved values are restored, allowing the program
to resume execution from where it left off in the caller function.
} The complete process of calling a function and returning involves 11 steps.
} Example: main is the caller function and foo is the callee function.
void foo(int a, int b);
int main(void) {
foo(1, 2);
return 0;
}
Function Calls
• The stack before the function foo is called.
• ebp and esp point to the top and bottom of the caller stack frame.
Function Calls
1. Push the arguments on the stack.
Function Calls
2. Push the old eip (rip) on the stack.
Function Calls
3. Update eip.
Function Calls
4. Push the old ebp (sfp) on the stack.
Function Calls
5. Move ebp down.
Function Calls
6. Move esp down.
Function Calls
7. Execute the function.
Function Calls
Inside a function
int sum(int x, int y) {
int total;
total = x + y;
return total;
}
Called as sum(2, 5). Stack layout, from high memory to low memory (each slot is 4 bytes):
5 (y): to access 5, use ebp + 12 bytes
2 (x): to access 2, use ebp + 8 bytes
Ret Add (return address, rip): ebp + 4 bytes
old EBP: the slot EBP points to (EBP: stack base pointer)
total: to access total, use ebp - 4 bytes, written -4(%ebp)
ESP: stack pointer, at the low-memory end of the frame
} Arguments are pushed in reverse order of the code.
} Local variables are pushed in the same order as they appear in the code.
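The offsets in the sum example follow a simple pattern. The helpers below compute them under the same simplifying assumption as the figure: each argument and each local fills exactly one 4-byte slot. Real compilers may reorder, pad, or omit locals. `arg_offset` and `local_offset` are hypothetical names, not part of any library.

```c
#include <assert.h>

#define SLOT 4   /* bytes per stack slot on 32-bit x86 */

/* Argument i (0-based, left to right in the call) sits above the saved
 * %ebp (at ebp + 0) and the return address (at ebp + 4). */
int arg_offset(int i)
{
    assert(i >= 0);
    return 2 * SLOT + i * SLOT;
}

/* Local j (0-based, in declaration order) sits below %ebp. */
int local_offset(int j)
{
    assert(j >= 0);
    return -(j + 1) * SLOT;
}
```

For sum(2, 5), `arg_offset(0)` gives 8 for x and `arg_offset(1)` gives 12 for y. `local_offset(0)` gives -4 for total. These match ebp + 8, ebp + 12, and -4(%ebp).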
Function Calls
8. Move esp up (esp = ebp), discarding the local variables.
Function Calls
9. Restore the old ebp (sfp).
Function Calls
10. Restore the old eip (rip).
Function Calls
11. Remove arguments from the stack.
Function Calls
• You might notice that we saved the old values of EIP and EBP during the
function call, but not the old value of ESP.
• This is because the ESP register naturally adjusts itself as values are pushed
onto and popped off the stack.
• As we push arguments and local variables onto the stack, ESP is
decremented.
• Conversely, as we pop values off the stack, ESP is incremented.
• This automatic adjustment of ESP eliminates the need to save its old value
before the function call.
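The eleven steps can be traced end to end with a toy model. The sketch below is a simulation under stated assumptions, not real machine code. Memory is an array of 4-byte words, and ebp and esp are word indices, so ebp + 2 here corresponds to ebp + 8 bytes. The callee foo(1, 2) is assumed to have a single local that receives a + b. `struct cpu` and `call_foo` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS 64   /* illustrative memory size, in 4-byte words */

struct cpu {
    uint32_t mem[MEM_WORDS];
    uint32_t eip, ebp, esp;   /* ebp and esp are indices into mem */
};

static void push(struct cpu *c, uint32_t v)
{
    assert(c->esp > 0);
    c->mem[--c->esp] = v;
}

static uint32_t pop(struct cpu *c)
{
    assert(c->esp < MEM_WORDS);
    return c->mem[c->esp++];
}

/* Simulates main calling foo(1, 2). */
void call_foo(struct cpu *c, uint32_t return_addr, uint32_t foo_addr)
{
    push(c, 2);              /* 1. push arguments, last one first */
    push(c, 1);
    push(c, return_addr);    /* 2. push the old eip (rip) */
    c->eip = foo_addr;       /* 3. update eip */
    push(c, c->ebp);         /* 4. push the old ebp (sfp) */
    c->ebp = c->esp;         /* 5. move ebp down */
    c->esp -= 1;             /* 6. move esp down: room for one local */
    /* 7. execute foo: local = a + b, with a at ebp + 2 and b at ebp + 3 */
    c->mem[c->ebp - 1] = c->mem[c->ebp + 2] + c->mem[c->ebp + 3];
    c->esp = c->ebp;         /* 8. move esp up, discarding the local */
    c->ebp = pop(c);         /* 9. restore the old ebp */
    c->eip = pop(c);         /* 10. restore the old eip */
    c->esp += 2;             /* 11. remove the two arguments */
}
```

Start from esp = 60 and ebp = 62. After `call_foo(&c, 101, 500)`, esp, ebp, and eip are 60, 62, and 101: the caller's registers are back, even though esp itself was never saved. The sum 3 is still in mem[55], below esp. On real x86, the call instruction performs steps 2-3. The leave instruction (equivalent to mov %ebp, %esp; pop %ebp) performs steps 8-9. The ret instruction performs step 10, and the caller performs step 11 by adding 8 to %esp.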
Basic stack layout
void func(char *arg1, int arg2, int arg3)
{
char loc1[4];
int loc2;
...
}
Stack layout (addresses increase to the right, toward 0xffffffff):
… loc2 loc1 ??? ??? arg1 arg2 arg3 caller’s data
} Local variables are pushed in the same order as they appear in the code
} Arguments are pushed in reverse order of the code
The local variable allocation is ultimately up to the compiler: Variables could be allocated in any
order, or not allocated at all and stored only in registers,
depending on the optimization level used.
Accessing variables
void func(char *arg1, int arg2, int arg3)
{
... Q: Where is (this) loc2?
loc2++;
...
}
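Stack layout (addresses increase to the right, toward 0xffffffff):
… loc2 loc1 ??? ??? arg1 arg2 arg3 caller’s data
} loc2 might live at, say, 0xbffff323, but we can’t know its absolute address at compile time
} But we can know its relative address: loc2 is always 8 bytes before the ???s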
Accessing variables
void func(char *arg1, int arg2, int arg3)
{
... Q: Where is (this) loc2?
loc2++;
... A: -8(%ebp)
}
Stack layout (addresses increase to the right, toward 0xffffffff):
… loc2 loc1 ??? ??? arg1 arg2 arg3 caller’s data
Returning from functions
int main()
{
...
func("Hey", 10, -3);
... Q: How do we restore %ebp?
}
Stack layout (addresses increase to the right, toward 0xffffffff):
… loc2 loc1 ??? ??? arg1 arg2 arg3 caller’s data
Returning from functions
int main()
{
...
Q: How do we restore %ebp?
func("Hey", 10, -3);
...
}
[Figure: stack frame for func. The old %ebp is saved in the first ??? slot (right after loc1), and the current %ebp points to it; %esp points to the bottom of the frame.]
Returning from functions
int main()
{
...
func("Hey", 10, -3);
... Q: How do we resume here?
}
[Figure: stack frame for func: … loc2 loc1 | old %ebp | ??? | arg1 arg2 arg3 | caller’s data, with %ebp pointing to the saved old %ebp]
The instructions themselves are in memory
[Figure: %eip points to the next instruction to execute, inside the Text segment near the 0x00000000 end of the 4G address space]
Returning from functions
int main()
{
...
func("Hey", 10, -3);
... Q: How do we resume here?
}
[Figure: stack frame for func: … loc2 loc1 | old %ebp | old %eip | arg1 arg2 arg3 | caller’s data; the saved old %eip (the return address) tells the program where to resume]
Stack and functions: Summary
} Calling function:
} 1.Push arguments onto the stack (in reverse)
} 2.Push the return address, i.e., the address of the instruction you
want to run after control returns to you (%eip)
} 3.Jump to the function’s address
} Called function:
} 4.Push the old frame pointer onto the stack (%ebp)
} 5.Set frame pointer (%ebp) to where the end of the stack is right
now (%esp)
} 6.Push local variables onto the stack
} Returning function:
} 7.Reset the previous stack frame: %esp = current %ebp, %ebp = (old
%ebp)
} 8.Jump back to return address: %eip = 4(%esp)
Buffer overflows
Buffer overflows from 10,000 ft
} Buffer =
} Contiguous memory associated with a variable or field
} Common in C
} All strings are (NUL-terminated) arrays of chars
} Overflow =
} Put more into the buffer than it can hold
What is a buffer overflow?
} A buffer overflow is a bug that affects low-level code, typically in
C and C++, with significant security implications
} Often the result is just a crash, but an attacker can arrange the conditions that trigger
the bug so that the program does much worse
} Steal private information (e.g., Heartbleed)
} Corrupt valuable information
} Run code of the attacker’s choice
Why study them?
} Buffer overflows are still relevant today
} C and C++ are still popular
} Buffer overflows still occur with regularity
C and C++ still very popular
Critical systems in C/C++
} Most OS kernels and utilities
} fingerd, X windows server, shell
History of buffer overflows
} Morris worm
} Propagated across machines (too aggressively, thanks to a bug)
} One way it propagated was a buffer overflow attack against a
vulnerable version of fingerd on VAXes
} Sent a special string to the finger daemon, which caused it to execute code that
created a new worm copy
} Didn’t check OS: caused Suns running BSD to crash
} End result: $10-100M in damages, probation, community service
Morris now a professor at MIT
History of buffer overflows (cont.)
} CodeRed
} Exploited an overflow in the MS-IIS server
} 300,000 machines infected in 14 hours
History of buffer overflows (cont.)
} SQL Slammer
} Exploited an overflow in the MS-SQL server
} 75,000 machines infected in 10 minutes
What we’ll do
} Understand how these attacks work, and how to defend against
them
Note about terminology
} I use the term buffer overflow to mean any access of a buffer
outside of its allotted bounds
} Could be an over-read, or an over-write
} Could be during iteration (“running off the end”) or by direct access (e.g.,
by pointer arithmetic)
} Out-of-bounds access could be to addresses that precede or follow the
buffer
} Others sometimes use different terms
} They might reserve buffer overflow to refer only to actions that write
beyond the bounds of a buffer
} Contrast with terms buffer underflow (write prior to the start), buffer overread
(read past the end), out-of-bounds access, etc.
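Under this definition, a buffer overflow can touch memory on either side of the buffer, so a bounds check has to reject both cases. The hypothetical helper below does this for reads by index. It takes a signed index so that positions before the start of the buffer can be expressed.

```c
#include <assert.h>
#include <stddef.h>

/* Reads buf[idx] into *out only if idx lies in [0, len).
 * Returns 0 (leaving *out unchanged) for either kind of out-of-bounds
 * access: before the start (idx < 0) or at/after the end (idx >= len). */
int checked_read(const char *buf, size_t len, ptrdiff_t idx, char *out)
{
    assert(buf != NULL && out != NULL);
    if (idx < 0 || (size_t)idx >= len)
        return 0;
    *out = buf[idx];
    return 1;
}
```

For a 4-byte buffer, index -1 (an access before the start) and index 4 (an over-read) are both refused. Index 3 is allowed.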
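Benign outcome
void func(char *arg1)
{
char buffer[4];
strcpy(buffer, arg1);
...
}
int main()
{
char *mystr = "AuthMe!";
func(mystr);
...
}
} strcpy copies all 8 bytes of "AuthMe!" (including the terminating NUL) into the 4-byte buffer; the extra bytes 4d 65 21 00 overwrite the saved %ebp
} SEGFAULT (0x0021654d) (during subsequent access)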
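Security-relevant outcome
void func(char *arg1)
{
int authenticated = 0;
char buffer[4];
strcpy(buffer, arg1);
if(authenticated) { ... }
}
int main()
{
char *mystr = "AuthMe!";
func(mystr);
...
}
Stack after strcpy (addresses increase to the right):
buffer = 'A' 'u' 't' 'h' | authenticated = 4d 65 21 00 | %ebp | %eip | &arg1
} authenticated now holds the bytes of "Me!" plus the terminating NUL, which is nonzero, so the check succeeds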
Could it be worse?
void func(char *arg1)
{
char buffer[4];
...
strcpy(buffer, arg1);
}
[Figure: a long attacker-supplied string (labelled “CODE!”) fills buffer and overwrites the saved %ebp, the saved %eip, and &mystr: all ours!]
} strcpy will let you write as much as you want (until a ‘\0’)
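One way to avoid the unbounded copy is to pass the destination size along and let a bounded function do the copy. The sketch below uses the standard snprintf. `copy_bounded` is a hypothetical helper name. Truncation is only one possible policy; rejecting over-long input is another.

```c
#include <assert.h>
#include <stdio.h>

/* Bounded replacement for strcpy(dst, src): snprintf writes at most
 * dst_len bytes, always NUL-terminates when dst_len > 0, and returns the
 * length the full string would have needed.
 * Returns 1 if src fit completely, 0 if it was truncated. */
int copy_bounded(char *dst, size_t dst_len, const char *src)
{
    assert(dst != NULL && dst_len > 0 && src != NULL);
    int needed = snprintf(dst, dst_len, "%s", src);
    return needed >= 0 && (size_t)needed < dst_len;
}
```

In func, `copy_bounded(buffer, sizeof buffer, arg1)` stores "Aut" for the input "AuthMe!" and returns 0. Nothing past the 4-byte buffer (authenticated, the saved %ebp, or the saved %eip) is touched.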
Aside: User-supplied strings
} These examples provide their own strings
Code Injection
Code Injection: Main idea
void func(char *arg1)
{
char buffer[4];
sprintf(buffer, arg1);
...
}
[Figure: the overflow fills buffer with injected code and overwrites the saved %eip]
Challenge 1: Loading code into memory
} It must be the machine code instructions (i.e., already
compiled and ready to run)
What code to run?
Shellcode
#include <stdio.h>
#include <unistd.h>
int main( ) {
char *name[2];
name[0] = "/bin/sh";
name[1] = NULL;
execve(name[0], name, NULL);
}
Assembly (part of): pushl %eax
Machine code: "\x50"
Challenge 2:
Getting injected code to run
[Figure: making %eip point to the injected code]
Recall: memory layout summary
} Calling function:
} 1.Push arguments onto the stack (in reverse)
} 2.Push the return address, i.e., the address of the instruction you
want to run after control returns to you
} 3.Jump to the function’s address
} Called function:
} 4.Push the old frame pointer onto the stack (%ebp)
} 5.Set frame pointer (%ebp) to where the end of the stack is right
now (%esp)
} 6.Push local variables onto the stack
} Returning function:
} 7.Reset the previous stack frame: %esp = %ebp, %ebp = (%ebp)
} 8.Jump back to return address: %eip = 4(%esp)
Hijacking the saved %eip
[Figure: overflowing the buffer past the saved %ebp to overwrite the saved %eip]
Challenge 3:
Finding the return address
} If we don’t have access to the code, we don’t know how far the
buffer is from the saved %ebp
Improving our chances: nop sleds
} nop is a single-byte instruction (it just moves to the next instruction)
[Figure: a sled of nops placed before the injected code; jumping anywhere in the sled will work]
Putting it all together
[Figure: padding, followed by a good guess for the address written over the saved %eip]
Other memory exploits
Other attacks
} The code injection attack we have just considered is called stack
smashing
} The term was coined by Aleph One in 1996
Heap overflow
} Stack smashing overflows a stack allocated buffer
Heap overflow
typedef struct _vulnerable_struct {
char buff[MAX_LEN];
int (*cmp)(char*,char*);
} vulnerable;
int foo(vulnerable* s, char* one, char* two)
{
strcpy( s->buff, one ); /* copy one into buff */
strcat( s->buff, two ); /* append two to buff */
return s->cmp( s->buff, "file://foobar" );
}
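One way to close this hole is to build the combined string with a bounded function and refuse input that does not fit. The sketch below uses the standard snprintf. The value MAX_LEN = 32, the tag `vulnerable_struct`, and the names `foo_checked` and `cmp_exact` are assumptions for illustration, not part of the slide.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_LEN 32   /* assumed size: the slide leaves MAX_LEN unspecified */

typedef struct vulnerable_struct {
    char buff[MAX_LEN];
    int (*cmp)(char *, char *);
} vulnerable;

/* Example comparator (hypothetical), used only to exercise foo_checked. */
int cmp_exact(char *a, char *b)
{
    return strcmp(a, b);
}

/* Bounded variant of foo: writes one followed by two into buff only if the
 * result fits, so the cmp pointer stored after buff is never overwritten.
 * Returns 0 and stores the comparison result in *result on success,
 * or -1 if the combined string would not fit. */
int foo_checked(vulnerable *s, const char *one, const char *two, int *result)
{
    assert(s != NULL && s->cmp != NULL && result != NULL);
    int needed = snprintf(s->buff, sizeof s->buff, "%s%s", one, two);
    if (needed < 0 || (size_t)needed >= sizeof s->buff)
        return -1;   /* too long: refuse instead of overflowing */
    *result = s->cmp(s->buff, "file://foobar");
    return 0;
}
```

Because snprintf never writes more than `sizeof s->buff` bytes, an over-long `one` or `two` is rejected. `s->cmp` still points at the original comparator.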
Heap overflow variants
} Overflow into the C++ object vtable
} C++ objects (that contain virtual functions) are represented using a
vtable, which contains pointers to the object’s methods
} This table is analogous to s->cmp in our previous example, and a similar
sort of attack will work
} Overflow into adjacent objects
} Where buff is not collocated with a function pointer, but is allocated
near one on the heap
} Overflow heap metadata
} Hidden header just before the pointer returned by malloc
} Flow into that header to corrupt the heap itself
} Get the malloc implementation to do your dirty work for you!
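Integer overflow
void vulnerable()
{
char **response;
int i;
int nresp = packet_get_int(); /* attacker-controlled: can be HUGE */
if (nresp > 0) {
response = malloc(nresp*sizeof(char*)); /* the multiplication can wrap around to a small size */
for (i = 0; i < nresp; i++)
response[i] = packet_get_string(NULL); /* overflow: writes past the small allocation */
}
}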
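You can prevent the wrap-around by checking the multiplication before calling malloc. In this sketch, `mul_fits` and `alloc_responses` are hypothetical helpers. The check divides SIZE_MAX by the element size to find the largest count whose byte size still fits in a size_t.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns 1 if a * b can be computed in size_t without wrapping around. */
int mul_fits(size_t a, size_t b)
{
    return b == 0 || a <= SIZE_MAX / b;
}

/* Checked version of the allocation in vulnerable(): reject non-positive
 * counts and counts whose byte size would wrap, instead of handing malloc
 * a truncated size. */
char **alloc_responses(int nresp)
{
    if (nresp <= 0 || !mul_fits((size_t)nresp, sizeof(char *)))
        return NULL;
    return malloc((size_t)nresp * sizeof(char *));
}
```

With a 64-bit size_t, an int count can never wrap here. With a 32-bit size_t, as on the systems in this lecture, nresp = 1073741824 would make nresp * 4 wrap to 0, and mul_fits rejects that count.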
Corrupting data
} The attacks we have shown so far affect code
} Return addresses and function pointers
Read overflow
} Rather than permitting writing past the end of a buffer, a bug
could permit reading past the end
Read overflow
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
int main() {
char buf[100], *p;
int i, len;
while (1) {
p = fgets(buf,sizeof(buf),stdin); /* Read integer */
if (p == NULL) return 0;
len = atoi(p);
p = fgets(buf,sizeof(buf),stdin); /* Read message */
if (p == NULL) return 0;
for (i=0; i<len; i++) /* Echo back (partial) message */
if (!iscntrl(buf[i])) putchar(buf[i]);
else putchar('.');
printf("\n");
}
}
} len may exceed the actual message length!
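A minimal fix is to cap the echo length at what fgets actually stored. The sketch below assumes that buf is NUL-terminated, which fgets guarantees. `echo_len` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Number of bytes that may safely be echoed: never more than the message
 * actually holds, whatever length the client claimed. */
size_t echo_len(const char *buf, int claimed)
{
    assert(buf != NULL);
    if (claimed < 0)
        return 0;
    size_t actual = strlen(buf);
    return (size_t)claimed < actual ? (size_t)claimed : actual;
}
```

Write the loop as `for (i = 0; i < (int)echo_len(buf, len); i++)`. The third request in the transcript (25 for "hello") then echoes only the bytes of the current message, not stale buffer contents.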
Sample transcript
% ./echo-server
24
every good boy does fine
ECHO: |every good boy does fine|
10
hello there
ECHO: |hello ther|
} OK: input length < buffer size
25
hello
ECHO: |hello..here..y does fine.|
} BAD: length > size! Leaked data
Heartbleed
} The Heartbleed bug was a read overflow
in exactly this style
} Format specifiers
} Position in string indicates stack argument to print
} Kind of specifier indicates type of the argument
} %s = string
} %d = integer
} etc.
What’s the difference?
void safe()
{
char buf[80];
if(fgets(buf, sizeof(buf), stdin)==NULL)
return;
printf("%s",buf);
}
void vulnerable()
{
char buf[80];
if(fgets(buf, sizeof(buf), stdin)==NULL)
return;
printf(buf); /* Attacker controls the format string */
}
printf implementation
int i = 10;
printf("%d %p\n", i, &i);
Stack when printf is called (addresses increase to the right, from 0x00000000 toward 0xffffffff):
… %ebp %eip &fmt 10 &i
Back to our example
void vulnerable()
{
char buf[80];
if(fgets(buf, sizeof(buf), stdin)==NULL)
return;
printf(buf);
}
Input: "%d %x"
Stack when printf is called (addresses increase to the right, from 0x00000000 toward 0xffffffff):
… %ebp %eip &fmt | caller’s stack frame
Format string vulnerabilities
} printf("100% dave");
} Prints stack entry 4 bytes above saved %eip
} printf("%s");
} Prints bytes pointed to by that stack entry
} printf("%d %d %d %d …");
} Prints a series of stack entries as integers
} printf("%08x %08x %08x %08x …");
} Same, but nicely formatted hex
} printf("100% no way!")
} WRITES the number 3 to address pointed to by stack entry (“% n” stores the count of characters printed so far: the 3 characters of “100”)
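Each conversion in the format string makes printf fetch one more stack entry, whether or not the caller pushed one. The hypothetical helper below gives a rough count of how many entries a format string would consume. It simplifies on purpose: `*` widths and precisions, which consume extra arguments, are not counted.

```c
#include <assert.h>
#include <stddef.h>

/* Rough count of the arguments printf would fetch for fmt: every '%'
 * starts a conversion except the escape "%%" (a literal '%'). */
size_t count_conversions(const char *fmt)
{
    assert(fmt != NULL);
    size_t n = 0;
    for (const char *p = fmt; *p != '\0'; p++) {
        if (*p != '%')
            continue;
        if (p[1] == '%') {   /* "%%": literal percent sign, no argument */
            p++;
            continue;
        }
        if (p[1] == '\0')
            break;           /* lone trailing '%' */
        n++;
    }
    return n;
}
```

For the attacker input "%d %x", count_conversions returns 2. printf(buf) would read two stack entries that the caller never pushed. "100% dave" counts as one conversion ("% d"). A check like this can flag suspicious input, but the real fix is the one in safe(): never pass untrusted data as the format string.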
Why is this a buffer overflow?
} We should think of this as a buffer overflow in the sense that
} The stack itself can be viewed as a kind of buffer
} The size of that buffer is determined by the number and size of the
arguments passed to a function
Vulnerability prevalence
http://web.nvd.nist.gov/view/vuln/statistics
Time to switch hats
Software Security
Questions