Ring 0x00: Basics of Windows Shellcode Writing Table of Contents
Table of contents
Introduction
Find the DLL base address
Find the function address
Call the function
Write the shellcode
Test the shellcode
Resources
Introduction
This tutorial is for 32-bit x86 shellcode. Windows shellcode is a lot harder to write than shellcode for Linux, and you’ll see why. First we need a basic understanding of the Windows architecture, which is shown below. Take a good look at it. Everything above the dividing line is in User mode and everything below is in Kernel mode.
Unlike on Linux, Windows applications don’t invoke system calls directly (the system call numbers are undocumented and change between Windows versions). Instead, they use functions from the Windows API (WinAPI), which internally call functions from the Native API (NtAPI), which in turn use system calls. The Native API functions are undocumented, implemented in ntdll.dll and, as can be seen from the picture above, the lowest level of abstraction for User mode code.
The documented functions of the Windows API are implemented in kernel32.dll, advapi32.dll, gdi32.dll and others. The base services (like working with file systems, processes, devices, etc.) are provided by kernel32.dll.
So to write shellcode for Windows, we’ll need to use functions from WinAPI or NtAPI. But how do we do that?
ntdll.dll and kernel32.dll are so important that they are loaded into every Win32 process.
To demonstrate this I used the ListDlls tool from the Sysinternals suite.
I also wrote a little assembly program that does nothing, and it has three loaded DLLs.
Notice the base addresses of the DLLs. They are the same across processes, because each DLL is loaded into physical memory only once and then mapped into every process that needs it. This is done to save memory. However, those addresses differ across machines and across reboots, because ASLR randomizes them at boot time.
This means that the shellcode must first find where in memory the DLL we’re looking for is located, and then find the address of the exported function that we’re going to use.
The shellcode I’m going to write is simple: its only job is to execute calc.exe. To accomplish this I’ll use the WinExec function, which takes only two arguments (the command line to run and a flag that controls how the window is shown) and is exported by kernel32.dll.
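Find the DLL base address
On 32-bit Windows, the FS segment register points to the Thread Environment Block (TEB) of the current thread. One of the fields of the TEB is a pointer to the Process Environment Block (PEB) structure, which holds information about the process. The pointer to the PEB is 0x30 bytes after the start of the TEB (fs:0x30).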
0x0C bytes from its start, the PEB contains a pointer to the PEB_LDR_DATA structure, which provides information about the loaded DLLs. It holds the heads of three doubly linked lists, two of which are particularly interesting for our purposes. One of them is InInitializationOrderModuleList, which holds the DLLs in the order of their initialization, and the other is InMemoryOrderModuleList, which holds the DLLs in the order they appear in memory. The head of the latter is stored 0x14 bytes from the start of the PEB_LDR_DATA structure, and its first field (Flink) points to the first entry of the list. The base address of each DLL is stored 0x10 bytes after its InMemoryOrderModuleList link.
In Windows versions before Windows 7, the first two DLLs in InInitializationOrderModuleList were ntdll.dll and kernel32.dll, but from Windows 7 onwards the second DLL is kernelbase.dll.
The second and third entries in InMemoryOrderModuleList are ntdll.dll and kernel32.dll (the first entry is the executable itself). This holds for all Windows versions (at the time of writing) and is the preferred method, because it’s more portable.
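So to find the address of kernel32.dll we must traverse several in-memory structures. The steps to do so are:
1. Get the address of the PEB (TEB + 0x30, i.e. fs:0x30).
2. Get the address of PEB_LDR_DATA (PEB + 0x0C).
3. Get the first entry of InMemoryOrderModuleList (PEB_LDR_DATA + 0x14); this is the executable itself.
4. Follow the link at offset 0x00 to the second entry, ntdll.dll.
5. Follow the link at offset 0x00 again to the third entry, kernel32.dll.
6. Read the base address of kernel32.dll (entry + 0x10).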
They say a picture is worth a thousand words, so I made one to illustrate the process. Open it in a new
tab, zoom and take a good look.
If a picture is worth a thousand words, then an animation is worth (Number_of_frames * 1000) words.
When learning about Windows shellcode (and assembly in general), WinREPL is really useful to see the
result after every assembly instruction.
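The pointer chasing described above can also be simulated in portable C, which may help when following it step by step in WinREPL. This is a sketch under stated assumptions: a byte array stands in for a 32-bit address space, rd32 plays the role of “mov reg, [addr]”, and find_kernel32_base, build_fake_process and every address in the fake layout are hypothetical, chosen only for the demo. On a real system the TEB address would come from the FS register.

```c
#include <assert.h>
#include <stdint.h>

/* A fake 32-bit address space: every "address" is an offset into mem[]. */
enum { MEM_SIZE = 0x1000 };
static uint8_t mem[MEM_SIZE];

/* Equivalent of "mov reg, [addr]": read a little-endian DWORD. */
static uint32_t rd32(uint32_t addr) {
    assert(addr <= MEM_SIZE - 4);
    return (uint32_t)mem[addr] | (uint32_t)mem[addr + 1] << 8 |
           (uint32_t)mem[addr + 2] << 16 | (uint32_t)mem[addr + 3] << 24;
}

static void wr32(uint32_t addr, uint32_t value) {
    assert(addr <= MEM_SIZE - 4);
    for (int i = 0; i < 4; i++)
        mem[addr + i] = (uint8_t)(value >> (8 * i));
}

/* TEB -> PEB -> PEB_LDR_DATA -> InMemoryOrderModuleList -> 3rd entry -> base */
uint32_t find_kernel32_base(uint32_t teb) {
    uint32_t peb   = rd32(teb + 0x30);  /* TEB + 0x30: pointer to the PEB             */
    uint32_t ldr   = rd32(peb + 0x0C);  /* PEB + 0x0C: pointer to PEB_LDR_DATA        */
    uint32_t exe   = rd32(ldr + 0x14);  /* Flink of the list head: the executable     */
    uint32_t ntdll = rd32(exe);         /* Flink at offset 0: second entry, ntdll.dll */
    uint32_t k32   = rd32(ntdll);       /* third entry, kernel32.dll                  */
    return rd32(k32 + 0x10);            /* base address, 0x10 bytes after the link    */
}

/* Hypothetical layout: all addresses and base values are made up, and only the
   Flink pointers that the walk follows are filled in. */
void build_fake_process(uint32_t teb) {
    const uint32_t peb = 0x100, ldr = 0x200;
    const uint32_t exe = 0x300, ntdll = 0x400, k32 = 0x500;  /* list links */
    wr32(teb + 0x30, peb);
    wr32(peb + 0x0C, ldr);
    wr32(ldr + 0x14, exe);              /* list head -> first entry          */
    wr32(exe, ntdll);
    wr32(ntdll, k32);
    wr32(k32, ldr + 0x14);              /* last entry links back to the head */
    wr32(exe + 0x10, 0x00400000);       /* fake base addresses               */
    wr32(ntdll + 0x10, 0x77000000);
    wr32(k32 + 0x10, 0x76000000);
}
```

After build_fake_process(0x40), calling find_kernel32_base(0x40) follows the same six steps and returns the fake kernel32.dll base 0x76000000.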
Find the function address
A Relative Virtual Address (RVA) is an address relative to the base address of the PE executable when it’s loaded in memory, so the actual address is the base address plus the RVA (RVAs are not equal to the file offsets when the executable is on disk!).
In the PE format, the RVA of the PE signature is stored at the constant RVA 0x3C (the e_lfanew field of the DOS header). The signature itself is the four bytes “PE\0\0” (50 45 00 00), i.e. the DWORD 0x00004550.
0x78 bytes after the PE signature (in a 32-bit PE file) is the RVA of the Export Table.
0x14 bytes from the start of the Export Table is the number of functions that the DLL exports.
0x1C bytes from the start of the Export Table is the RVA of the Address Table, which holds the function addresses (as RVAs).
0x20 bytes from the start of the Export Table is the RVA of the Name Pointer Table, which holds the RVAs of the names (strings) of the functions.
0x24 bytes from the start of the Export Table is the RVA of the Ordinal Table, which holds, for each name, the position of the function in the Address Table.
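The lookup described by these offsets can be sketched in C against a synthetic image held in a byte buffer. All names here (find_export_rva, build_fake_image) and the layout of the fake image are hypothetical. One deliberate difference from the offsets listed above: the loop is bounded by NumberOfNames (0x18 bytes from the start of the Export Table) instead of the number of functions at 0x14, because the Name Pointer Table has only NumberOfNames entries, which can be fewer than the number of functions when some are exported by ordinal only. The ordinals in the fake image are swapped on purpose to show why the Ordinal Table lookup is needed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Little-endian reads at an RVA; the real address is simply base + RVA. */
static uint32_t rd32(const uint8_t *base, uint32_t rva) {
    const uint8_t *p = base + rva;
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint16_t rd16(const uint8_t *base, uint32_t rva) {
    const uint8_t *p = base + rva;
    return (uint16_t)(p[0] | p[1] << 8);
}

/* Returns the RVA of the exported function `name`, or 0 if it is not found. */
uint32_t find_export_rva(const uint8_t *base, const char *name) {
    uint32_t pe = rd32(base, 0x3C);                   /* RVA of the PE signature */
    if (memcmp(base + pe, "PE\0\0", 4) != 0)
        return 0;
    uint32_t exp_dir   = rd32(base, pe + 0x78);       /* RVA of the Export Table */
    uint32_t num_names = rd32(base, exp_dir + 0x18);  /* NumberOfNames           */
    uint32_t addr_tbl  = rd32(base, exp_dir + 0x1C);  /* Address Table RVA       */
    uint32_t name_tbl  = rd32(base, exp_dir + 0x20);  /* Name Pointer Table RVA  */
    uint32_t ord_tbl   = rd32(base, exp_dir + 0x24);  /* Ordinal Table RVA       */
    for (uint32_t i = 0; i < num_names; i++) {
        const char *fn = (const char *)base + rd32(base, name_tbl + 4 * i); /* 4-byte entries */
        if (strcmp(fn, name) == 0) {
            uint16_t ord = rd16(base, ord_tbl + 2 * i);                     /* 2-byte entries */
            return rd32(base, addr_tbl + 4 * (uint32_t)ord);
        }
    }
    return 0;
}

/* Writers used only to build the fake image below. */
static void wr32(uint8_t *b, uint32_t off, uint32_t v) {
    for (int i = 0; i < 4; i++)
        b[off + i] = (uint8_t)(v >> (8 * i));
}

static void wr16(uint8_t *b, uint32_t off, uint16_t v) {
    b[off] = (uint8_t)v;
    b[off + 1] = (uint8_t)(v >> 8);
}

/* Hypothetical 0x400-byte image exporting "Sleep" and "WinExec"; all RVAs are made up. */
void build_fake_image(uint8_t *img) {
    memset(img, 0, 0x400);
    wr32(img, 0x3C, 0x80);             /* RVA of the PE signature   */
    memcpy(img + 0x80, "PE\0\0", 4);
    wr32(img, 0x80 + 0x78, 0x100);     /* Export Table at RVA 0x100 */
    wr32(img, 0x100 + 0x14, 2);        /* number of functions       */
    wr32(img, 0x100 + 0x18, 2);        /* NumberOfNames             */
    wr32(img, 0x100 + 0x1C, 0x200);    /* Address Table             */
    wr32(img, 0x100 + 0x20, 0x220);    /* Name Pointer Table        */
    wr32(img, 0x100 + 0x24, 0x240);    /* Ordinal Table             */
    wr32(img, 0x200, 0x1111);          /* function 0                */
    wr32(img, 0x204, 0x2222);          /* function 1                */
    wr32(img, 0x220, 0x260);           /* name 0 -> "Sleep"         */
    wr32(img, 0x224, 0x270);           /* name 1 -> "WinExec"       */
    wr16(img, 0x240, 1);               /* "Sleep" is function 1     */
    wr16(img, 0x242, 0);               /* "WinExec" is function 0   */
    memcpy(img + 0x260, "Sleep", 6);
    memcpy(img + 0x270, "WinExec", 8);
}
```

In this fake image, find_export_rva(img, "WinExec") returns 0x1111: the name is found at position 1 of the Name Pointer Table, and the Ordinal Table maps that position to entry 0 of the Address Table.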
.loop:
    mov edi, [ebp-10h]        ; edi = var16 = Address of Name Pointer Table
    mov esi, [ebp-4]          ; esi = var4 = "WinExec\x00"
    xor ecx, ecx
    jz start.found
.found:
    ; the counter (eax) now holds the position of WinExec
.end:
    add esp, 26h              ; clear the stack
    pop ebp
    ret
The instruction “mov ebx, [fs:0x30]” assembles to “64 8b 1d 30 00 00 00”, which contains three null bytes. A way to avoid this is to write it as:
format PE console
use32
entry start
start:
    push eax                  ; Save all registers
    push ebx
    push ecx
    push edx
    push esi
    push edi
    push ebp
.loop:
    mov edi, [ebp-10h]        ; edi = var16 = Address of Name Pointer Table
    mov edi, [edi+eax*4]      ; Entries in Name Pointer Table are 4 bytes
                              ; edi = RVA Nth entry = Address of Name Table
    jz start.found
.found:
    ; the counter (eax) now holds the position of WinExec
.end:
I opened it in IDA to show you a better visualization. The version shown in IDA doesn’t save all the registers; I added that later, but was too lazy to make new screenshots.
Use fasm to assemble it, then disassemble it and extract the opcodes. We got lucky and there are no null bytes.
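Checking for null bytes by eye is error-prone, so a tiny helper can do it instead. In this sketch, count_null_bytes is a hypothetical helper. The byte arrays are the encoding of “mov ebx, [fs:0x30]”, the first opcode bytes of the shellcode as listed further down, and “xor esi, esi / mov ebx, [fs:esi+0x30]”, which is one common null-free way to load the PEB pointer (an assumption, not necessarily the rewrite used in this shellcode).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count the 0x00 bytes in a run of opcodes; string-copying functions such as
   strcpy stop at the first one, which would truncate the payload. */
size_t count_null_bytes(const uint8_t *code, size_t len) {
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        if (code[i] == 0x00)
            n++;
    return n;
}

/* mov ebx, [fs:0x30]: FS prefix, opcode, ModRM, 32-bit displacement 0x00000030 */
static const uint8_t mov_fs_disp32[] = {0x64, 0x8b, 0x1d, 0x30, 0x00, 0x00, 0x00};

/* xor esi, esi / mov ebx, [fs:esi+0x30]: 8-bit displacement, no nulls (assumed rewrite) */
static const uint8_t mov_fs_esi_disp8[] = {0x31, 0xf6, 0x64, 0x8b, 0x5e, 0x30};

/* The first opcode bytes of the shellcode, as listed in the article */
static const uint8_t shellcode_head[] = {
    0x50, 0x53, 0x51, 0x52, 0x56, 0x57, 0x55, 0x89, 0xe5, 0x83, 0xec, 0x18, 0x31, 0xf6,
    0x56, 0x6a, 0x63, 0x66, 0x68, 0x78, 0x65, 0x68, 0x57, 0x69, 0x6e, 0x45, 0x89};
```

The 32-bit displacement 0x30 is what produces the three nulls; with a register base and an 8-bit displacement the same load fits in a single non-zero byte.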
When I started learning about shellcode writing, one of the things that confused me was that in the disassembled output the jump instructions use absolute addresses (for example, look at address 401070: “je 0x40107c”), which got me wondering how this works at all. The addresses will be different across processes and across systems, so the shellcode would jump to some arbitrary code at a hardcoded address. That’s definitely not portable! As it turns out, though, the disassembler shows absolute addresses only for convenience; in reality the instructions use relative addresses.
Look again at the instruction at address 401070 (“je 0x40107c”): its opcodes are “74 0a”, where 74 is the opcode for je and 0a is the operand (it’s not an address, it’s a signed 8-bit displacement!). When the jump executes, the EIP register already points to the next instruction at address 401072; adding the operand gives 401072 + 0a = 40107c, which is the address shown by the disassembler. So there’s the proof that the instructions use relative addressing and the shellcode will be portable.
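The same arithmetic can be written as a small C function. short_jump_target is a hypothetical name, and the sketch only covers 2-byte short jumps (one opcode byte followed by a signed 8-bit displacement), like “74 0a”.

```c
#include <assert.h>
#include <stdint.h>

/* Target of a 2-byte short jump such as "74 0a" (je rel8): the CPU adds the
   signed 8-bit operand to EIP, which already points at the next instruction. */
uint32_t short_jump_target(uint32_t insn_addr, uint8_t rel8) {
    uint32_t next_eip = insn_addr + 2;          /* opcode byte + operand byte */
    int32_t displacement = (int8_t)rel8;        /* 0x0a = +10, 0xf6 = -10     */
    return next_eip + (uint32_t)displacement;   /* wraps modulo 2^32, like EIP */
}
```

Because the target depends only on where the instruction itself sits, the same “74 0a” loaded at 0x120000 jumps to 0x12000c: the distance is preserved, which is what makes the shellcode position-independent. A byte such as 0xf6 is a backward jump of 10 bytes.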
50 53 51 52 56 57 55 89 e5 83 ec 18 31 f6 56 6a 63 66 68 78 65 68 57 69 6e 45 89
Length in bytes:
>>> len(shellcode)
200
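#include <stdio.h>

// sc must hold all 200 bytes of the shellcode; only the first bytes are shown here.
unsigned char sc[] = "\x50\x53\x51\x52\x56\x57\x55\x89\xe5\x83\xec\x18" /* ... */;

int main()
{
    // Treat the byte array as a function and call it
    ((void (*)())sc)();
    return 0;
}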
To run it successfully in Visual Studio, you’ll have to compile it with some protections disabled:
Security Check: Disabled (/GS-)
Data Execution Prevention (DEP): No (/NXCOMPAT:NO)
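The test program needs the extracted opcodes in the sc array as a C string literal, and writing the \x escapes by hand is tedious. Here is a small helper sketch; to_c_escapes is a hypothetical name, and the caller is assumed to provide an output buffer of 4 * len + 1 characters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Write the opcodes as the body of a C string literal ("\x50\x53..."), ready
   to paste into the sc array. out must have room for 4 * len + 1 characters. */
void to_c_escapes(const uint8_t *code, size_t len, char *out) {
    out[0] = '\0';
    for (size_t i = 0; i < len; i++)
        snprintf(out + 4 * i, 5, "\\x%02x", (unsigned)code[i]);  /* 4 chars + NUL */
}
```

For the bytes 50 53 89 e5 the helper produces the text \x50\x53\x89\xe5, which can go straight between the quotes of the sc declaration.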
Edit 0x00:
One of the commenters, Nathu, told me about a bug in my shellcode. If you run it on an OS other than Windows 10, you’ll notice that it doesn’t work. This is a good opportunity to challenge yourself: try to fix it on your own by debugging the shellcode and googling what may cause such behaviour. It’s an interesting issue :)
In case you can’t fix it (or don’t want to), you can find the correct shellcode and the reason for the bug below…
EXPLANATION:
Depending on the compiler options, programs may align the stack to 2-, 4- or larger byte boundaries (always a power of 2). Also, some functions might expect the stack to be aligned in a certain way. The alignment is done for optimisation reasons, and you can read a good explanation about it here: Stack Alignment.
If you tried to debug the shellcode, you’ve probably noticed that the problem was with the WinExec function, which returned the “ERROR_NOACCESS” error code, although it should have access to calc.exe!
If you read this msdn article, you’ll see the following: “Visual C++ generally aligns data on natural boundaries based on the target processor and the size of the data, up to 4-byte boundaries on 32-bit processors, and 8-byte boundaries on 64-bit processors”. I assume the same alignment settings were used for building the system DLLs.
Because we’re executing code for the 32-bit architecture, the WinExec function probably expects the stack to be aligned to a 4-byte boundary. With this natural alignment, a 2-byte variable is stored at an address that’s a multiple of 2, and a 4-byte variable at an address that’s a multiple of 4. For example, take two variables, 2 bytes and 4 bytes in size. If the 2-byte variable is at address 0x0004, then the 4-byte variable will be placed at address 0x0008, which leaves 2 bytes of padding after the 2-byte variable. This is also the reason why the memory allocated on the stack for local variables is sometimes larger than necessary.
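The padding in this example comes from rounding an address up to the next multiple of the alignment. align_up and padding_before are hypothetical helpers, and the sketch assumes the alignment is a power of 2 (which it always is).

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of align (align must be a power of 2). */
uint32_t align_up(uint32_t addr, uint32_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/* Bytes of padding needed so that the next variable starts at a multiple of align. */
uint32_t padding_before(uint32_t addr, uint32_t align) {
    return align_up(addr, align) - addr;
}
```

The 2-byte variable at 0x0004 ends at 0x0006; align_up(0x0006, 4) gives 0x0008 for the 4-byte variable, and padding_before(0x0006, 4) gives the 2 bytes of padding.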
The part of the shellcode where the ‘WinExec’ string is pushed on the stack messes up the alignment: the 2-byte “push word” (66 68 78 65 in the opcodes above) moves ESP by only 2 bytes and leaves it misaligned, which causes WinExec to fail.
The reason it works on Windows 10 is probably that WinExec no longer requires the stack to be aligned.
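To see the misalignment concretely, here is a small C model of the 32-bit stack. It assumes ESP is 4-byte aligned before the string is built; the prologue in the opcodes listed earlier pushes seven registers and subtracts 0x18 (52 bytes in total), which keeps a 4-byte alignment. The pushes are taken from those opcodes: 56 (push esi, with esi = 0), 6a 63, 66 68 78 65 and 68 57 69 6e 45. The array and the push and push_winexec_string helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of the stack: ESP starts 4-byte aligned, each push moves it down by the
   operand size and stores the value in little-endian order. */
enum { STACK_SIZE = 64 };
static uint8_t stack[STACK_SIZE];
static uint32_t esp = STACK_SIZE;             /* offset into stack[], aligned */

static void push(uint32_t value, int size) {  /* size: 2 (push word) or 4 */
    esp -= (uint32_t)size;
    for (int i = 0; i < size; i++)
        stack[esp + i] = (uint8_t)(value >> (8 * i));
}

/* The pushes that build "WinExec" in the listed opcodes. */
void push_winexec_string(void) {
    push(0, 4);            /* 56          push esi (esi = 0) -> 00 00 00 00  */
    push(0x63, 4);         /* 6a 63       push 0x63 -> 63 00 00 00 ("c")    */
    push(0x6578, 2);       /* 66 68 78 65 push word 0x6578 -> 78 65 ("xe")  */
    push(0x456e6957, 4);   /* 68 57 69 6e 45 -> 57 69 6e 45 ("WinE")        */
}
```

After these four pushes the string at ESP reads “WinExec”, but ESP has moved by 14 bytes, so ESP % 4 is 2 instead of 0.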
Edit 0x01:
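Although it works when used in a compiled binary, the previous change produces a null byte, which is a problem when the shellcode is used to exploit a buffer overflow (string functions such as strcpy stop copying at the first null byte). The null byte is caused by the instruction “push 636578h”, which assembles to “68 78 65 63 00”: the immediate 0x00636578 is stored in little-endian byte order, so its zero high byte ends up last.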
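The encoding rule can be checked with a few lines of C: “push imm32” is opcode 68 followed by the immediate in little-endian order, so any immediate containing a zero byte puts a 00 into the shellcode. encode_push_imm32 is a hypothetical helper; the test reproduces the two push encodings that appear in this article.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Encode "push imm32": opcode 68 followed by the immediate in little-endian
   byte order. Returns 1 if the encoding contains a null byte, 0 otherwise. */
int encode_push_imm32(uint32_t imm, uint8_t out[5]) {
    out[0] = 0x68;
    for (int i = 0; i < 4; i++)
        out[1 + i] = (uint8_t)(imm >> (8 * i));
    return memchr(out, 0x00, 5) != NULL;
}
```

Encoding 0x636578 yields 68 78 65 63 00 (with a null), while 0x456e6957 (“WinE”) yields 68 57 69 6e 45, the bytes seen in the opcode listing.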
The version below should work and should not produce null bytes:
Resources
For the pictures of the TEB, PEB, etc. structures I consulted several resources, because the official documentation at MSDN is either non-existent, incomplete or just plain wrong. Mainly I used ntinternals, but I got confused by some other resources I found before that. I’ll list even the wrong resources, so that if you stumble on them, you won’t get confused (like I did).
[0x04] I took inspiration from this blog, which has great illustrations, but uses the older technique with InInitializationOrderModuleList (which still works for ntdll.dll, but not for kernel32.dll).
http://blog.the-playground.dk/2012/06/understanding-windows-shellcode.html
[0x05] I took the information for the TEB, PEB, PEB_LDR_DATA and LDR_MODULE from here (they are actually the same as the ones used in resource 0x04, but it’s always good to fact-check :) ).
https://undocumented.ntinternals.net/
[0x07] PEB structure from the official documentation. It is correct, though some fields are shown as Reserved, which is why I used resource 0x05 (it has their names listed).
https://msdn.microsoft.com/en-us/library/windows/desktop/aa813706.aspx
[0x08] Another resource for the PEB structure. This one is wrong: if you count the byte offset to PPEB_LDR_DATA, it’s way more than 12 (0x0C) bytes.
https://www.nirsoft.net/kernel_struct/vista/PEB.html
[0x09] PEB_LDR_DATA structure. It’s from the official documentation and clearly WRONG: pointers to the other two linked lists are missing.
https://msdn.microsoft.com/en-us/library/windows/desktop/aa813708.aspx
[0x0a] PEB_LDR_DATA structure. Also wrong: UCHAR is 1 byte, so counting the byte offset to the linked lists produces the wrong offset.
https://www.nirsoft.net/kernel_struct/vista/PEB_LDR_DATA.html
[0x0b] Explains the “new” and portable way to find the kernel32.dll address.
http://blog.harmonysecurity.com/2009_06_01_archive.html