How To Write Your Own Packer
by BigBoote
Vol. 1, No. 2, 2006
Abstract:
Why write your own packer when there are so many existing ones to choose from? Well, aside from making your executables smaller,
packing is a good way to quickly and easily obfuscate your work.
© CodeBreakers Journal, http://www.CodeBreakersJournal.com
CodeBreakers Magazine – Vol. 1, No. 2, 2006
1 Intro
Why write your own packer when there are so many existing ones to choose from? Well, aside from making your executables smaller, packing is a good way to quickly and easily obfuscate your work. Existing well-known packers either have an explicit 'unpack' function, or there are readily available procdump scripts for generating an unpacked version.

Since this document has quickly exploded in length I'm going to break it up into separate installments. In this installment I will cover the qualitative aspects of producing a packer. I'll discuss what you're getting into and how the packer is structured in general. I'll briefly discuss some pitfalls, and I'll give some links to technical information you will need to be familiar with before going into the next installments.

In the next two installments I'll go into details of how to implement the components of the packer and how I usually go about producing them.

... produced took about 1.5 weeks to produce including research and bug fixing. Subsequent ones took far less since I had already done the hard part, which is figuring out how. Hopefully this document will save you that time as well!

You do not have to use assembler for the most part. If you can part with supporting some esoteric features, you won't have to use it at all. All of that is relevant for the decompression stub only anyway. The packer can be in Logo or Object-Oriented COBOL if you like.

OK, enough of the blahblahblah, on to technical stuff....

3 Big Picture
Simple. Executable is analyzed, transformed, and an extra piece of code is attached which gets invoked instead of the original program. This piece is called a 'stub' and decompresses the image to its original location. Then it jumps to that original location. But you know this already.

Sounds simple, but there are pitfalls that await you. Some of these include:

• Support for simplified Thread Local Storage, which is key in supporting Delphi applications
• Support for code relocation fixups in dlls if you care about packing dlls. Recall ActiveX controls are dlls too, as are other common things you might be interested in packing
• Support for some stuff that must be available even in the compressed form. This includes some of your resources and export names in dlls
• Dealing with bound imports
• Working around bugs in Microsoft code. There is an infamous one relating to OLE and the resource section. Many packers do not accommodate this and this is important for ActiveX support.

horse's mouth. Dry. Accurate.

5 Next Step
OK, after you've gotten familiar with those, we can start to write some code. I'm going to save that for the next installments (probably two). They will detail:

• Making the Unpacker Stub
The stub has several responsibilities aside from the obvious decompression. It also has to perform duties normally done by the Windows loader.

• Making the Packer Application
The packer application does all the hard work. This makes sense when you realize the stub is supposed to do as little as possible to have a minimum impact on runtime.

7 A Stub That Runs
It's useful to remember that your decompression stub is actually a parasite onto a program that was never expecting it to be there. As such, you should try to minimize your impact on the runtime environment in your packer. I had mentioned before that you could make a packer in Logo or Object-Oriented COBOL, and that really was only partially true. You can make the packer application that way fer sure and you might even be able to make the unpacker that way sometimes but you will really be much happier with C/C++/ASM for the stub part. I personally like C++. Anyway, it will be smaller. If you don't care about the size, still using stuff like Delphi or VB for the stub would be problematic because it hoists
in subtle stuff like TLS and runtimes, and they don't have primitives needed to thunk over to the original program. Plus it can hose COM stuff that the original app isn't expecting. So let's assume the unpacker will be in the lower-level languages I spoke of and take solace that this is pretty straightforward code, and that the packer still can be in whatever.

Since the stub is a parasite, and since it will have to be located in a spot at the original application's convenience, we will have to be relocating it dynamically in the packer application. To help with this we will make the stub with relocation records. These are usually used for dlls when they can't be loaded at their preferred address. We will make use of them when binding the stub to the original application.

Setting Up Projects - and now for something completely different
OK, I am going to take a brief break from code and technological stuff and talk about project configuration. Normally I wouldn't since that's a personal choice, however this time I will because things I talk about later will be dependent upon some of the configuration assumptions. In real life you don't have to do it this way, but let's temporarily pretend we are and at the end of this series you'll know how you might like to do it differently.

Big picture is that there will be two projects, producing two distinct executables: the packer stub and the packer application. Their configuration will be significantly different.

We are going to do a bit of legerdemain with the stub project which will be explained later, but for now, configure a boilerplate project for your stub thusly:

• Produce a DLL
• Use static multithreaded runtime libraries

add any defines or header paths your compressor lib will be needing
/FAcs generate listing files with source, assembly, and machine code
/Fa(filename) specify where these listings go
remove /GZ compiler-generated stack checks
remove any debug options, it won't help us where we're going
these options are probably available as checkboxes, so you won't have to manually add them.

The gist is that we are not going to have normal debug capabilities so we turn off that stuff. Instead, we will be relying on the listing of the compiler-generated assembly to see code and the linker-generated mapfile to see actual addresses. All this is interesting stuff in any project really, but it is all we have for debugging in this one.

If you build now you should get a linker error complaining about an unresolved external symbol __DllMainCRTStartup@12. This is good! If you don't get that then the default libs are coming in. The symbol is possibly different for Borland stuff. Other errors probably mean something else needs to be fixed; this is the only one you should get for Microsoft's compiler.

9 Runtime dependencies
You cannot assume what runtime dependencies the original app has. Thus, you cannot make calls to funky dlls (vbrunX.dll, etc). You have no idea if they are there. You will do well to statically link your runtime library. You will do much (much) better, however, to not link any runtime libraries at all! ASM coders will take delight in this fact already, because they are hardcore, but this need not dissuade the C/C++ coders who are accustomed to malloc() strcmp() new std::vector<> or such. All this is doable. You will just have to provide your own implementation of these functions. Fortunately, this is pretty easy since you can call native functions exported by Kernel32.dll. /That/ dll is certainly present, and certainly one that is already used by the original app you are packing so feel free to use it when you like.

10 Making a Trivial C Runtime to Link Instead of the Proper One
Replacing the C Runtime might sound scary but remember we only want to implement what is necessary; this will turn out to be a small set of things. The linker will help you figure out what these are. Recall that we turned off default library searching with the /nodefaultlib switch (or equivalent for your linker, that's for Microsoft's). If you configured as I suggested above, we've got a linker error already: __DllMainCRTStartup@12. We'll fix that one first.

Discard your boilerplate DllMain. Replace it with:

BOOL WINAPI _DllMainCRTStartup ( HANDLE, DWORD, LPVOID )
{
    //(program will go here)
    return TRUE;
}

This should resolve the linker error and will be our entry point. The place our program will ultimately go is indicated by the comment. Ultimately we'll never hit the 'return TRUE' statement; it's just there to make the compiler happy, and the function signature is what it is to make the linker happy.

If you want to be more arty, you can do the following:

#pragma comment ( linker, "/entry:\"StubEntryPoint\"" )
void __declspec ( naked ) StubEntryPoint() {
    //(program will go here)
}

which is syntactically clearer.

This is cosmetic so don't feel bad if you can't find the equivalent pragmas for your compiler/linker. Also, this perverts what the compiler normally thinks about and I have seen it crash randomly. I have found when the compiler gets in a crashing mood, that putting in:

__asm nop

in a couple places seems to get it back on track. Ain't that a laugh?! Whatever...

As code is added, you should periodically build. The linker will add more and more complaints like above and we will have to implement the underlying methods the compiler is emitting references to. Here's a tip: when you installed your dev tools, you may have had the option to install the source to the C Runtime. It will be helpful in some cases since you can cut and paste some parts. In particular, a function:

extern "C" __declspec ( naked ) void _chkstk(void)

is sometimes emitted by the compiler quietly (if you have a large array on the stack, like for a buffer). Just cut-and-paste that one; it's funky.

FYI, I typically have to implement:

memcpy
memset
memcmp
malloc
free
realloc
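
To give a feel for how small these replacements can be, here is a minimal sketch of the three memory primitives in portable C++. The stub_ prefixes are hypothetical and exist only so the sketch can be compiled and tested next to a normal C runtime; in the stub itself the functions would carry the standard names so the linker can resolve the references the compiler emits. malloc, free and realloc are left out here; one plausible approach (an assumption, not something stated above) is to forward them to the process heap functions exported by Kernel32.dll.

```cpp
#include <cstddef>

// Byte-wise copy. Like the standard memcpy, the regions must not overlap.
extern "C" void* stub_memcpy(void* dst, const void* src, std::size_t count) {
    unsigned char* d = static_cast<unsigned char*>(dst);
    const unsigned char* s = static_cast<const unsigned char*>(src);
    for (; count != 0; --count) {
        *d++ = *s++;
    }
    return dst;
}

// Fill 'count' bytes with the low byte of 'value'.
extern "C" void* stub_memset(void* dst, int value, std::size_t count) {
    unsigned char* d = static_cast<unsigned char*>(dst);
    for (; count != 0; --count) {
        *d++ = static_cast<unsigned char>(value);
    }
    return dst;
}

// Compare as unsigned bytes; returns <0, 0 or >0 like the standard memcmp.
extern "C" int stub_memcmp(const void* a, const void* b, std::size_t count) {
    const unsigned char* pa = static_cast<const unsigned char*>(a);
    const unsigned char* pb = static_cast<const unsigned char*>(b);
    for (; count != 0; --count, ++pa, ++pb) {
        if (*pa != *pb) {
            return (*pa < *pb) ? -1 : 1;
        }
    }
    return 0;
}
```

None of this is fast, but the stub only runs once at startup, so simple and small is probably the better trade here.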
organized into sections, which have a location and size. This information is stored in the section headers, which describe where the sections go in memory (relative to the load address).

To do this properly, we will be needing to know our load address. If we are a stub for an exe we can simply do a GetModuleHandle(NULL) and the returned module handle is the base load address. This won't work for a dll however. The module handle for the dll is on the stack. We can write some code to get it, or we can choose not to do the 'arty entry point' and it is referenceable as a parameter (do not attempt to reference those parameters if it is the stub for an exe unless you are fond of crashes).

What the Hell is that? Well, it creates a special data section for our global variables. Dirty little secret about the linker is that it sorts the section names lexically, and discards the portion at the '$' and after. By naming the section '.A$A' we will be forcing the global vars structure to be at the very beginning of the data section, which will be easy for the packing application to locate. Next, we will merge some sections with the following linker options. You can put these on the link line, or you can be fancier and place them in the code with a pragma (if your tools support such). I think putting them in the pragma makes it more obvious from the code standpoint that the stuff is fragile and should be handled carefully if changes
void decompress_original_data() {
    void* pvCompData = (void*) ( gev.RVA_compressed_data_start + load_address );
    initialize_compressor ( pvCompData, gev.compressed_data_size );

    decompress_data ( &origdirinfo, sizeof(origdirinfo) );

    int section_count;
    decompress_data ( &section_count, sizeof(section_count) );

    for ( int i = 0; i < section_count; ++i ) {
        section_header hdr;
        decompress_data ( &hdr, sizeof(hdr) );
        void* pvOrigLoc = (void*) ( hdr.RVA_location + load_address );
        decompress_data ( pvOrigLoc, hdr.size );
    }

    cleanup_compressor();
}

This will be called in the main entry point of the stub right after computing the actual load address.

That's it! What could be easier? Well, notice that we're using a stream model for our compressor. Most compression libraries come pretty close to implementing that but you have to do ever so slightly more to make it that simple. I wrap up my compressors in a class so that they all implement the above interface to make things simple like above. Swapping out compressors then just means making a new adaptor class. The rest of the stub need not be touched to put in different compressors/encryptors.

Now that all the original data is decompressed into its original location, we have to do stuff that the Windows loader normally does. This includes relocation fixups, imports lookup, and TLS initialization/thunking.

• whiz through the records getting the DWORD at the address they indicate and add the offset

Pretty straightforward. The format of the relocation records is a little bit odd and is structured the way it is presumably for size considerations. The records are organized as a series of chunks of records, one chunk per page. The records in the chunk reference an offset into the page. Additionally, for padding consideration there are records that are essentially no-ops and should be ignored. Pseudocode follows:

void perform_relocations () {
    //see if no relocation records
    if ( origdirinfo[IMAGE_DIRECTORY_ENTRY_BASERELOC].VirtualAddress == 0 )
        return;

    //compute offset
    IMAGE_DOS_HEADER* dos_header = (IMAGE_DOS_HEADER*) load_address;
    IMAGE_NT_HEADERS32* nt_hdr = (IMAGE_NT_HEADERS32*) &((unsigned char*)load_address)[dos_header->e_lfanew];
    DWORD reloc_offset = load_address - nt_hdr->OptionalHeader.ImageBase;

    //if we're where we want to be, nothing further to do
    if ( reloc_offset == 0 )
        return;

    //gotta do it, compute the start
    IMAGE_BASE_RELOCATION* ibr_current = (IMAGE_BASE_RELOCATION*) ( origdirinfo[IMAGE_DIRECTORY_ENTRY_BASERELOC].VirtualAddress + load_address );

    //compute the end
    IMAGE_BASE_RELOCATION* ibr_end = (IMAGE_BASE_RELOCATION*) &((unsigned char*)ibr_current)[origdirinfo[IMAGE_DIRECTORY_ENTRY_BASERELOC].Size];
        ibr_current = (IMAGE_BASE_RELOCATION*) &((unsigned char*)ibr_current)[ibr_current->SizeOfBlock];
    }
}

This is the majority of what is needed to support DLLs. There is a little bit more discussed later. Given that this is so straightforward, I'm a little surprised at the number of packers out there that do not support DLLs.

The next major thing we have to do is to resolve all the imports. This is only a little more involved than the relocation records.

16 Resolving Imports
Resolving the imports consists of walking through the Import Address Table of the original application and doing GetProcAddress to resolve the imports. This is so similar to the relocation record logic that I won't do a pseudocode example. Details of these structures are given in the links provided in the first installment. The structures all start at:

origdirinfo[IMAGE_DIRECTORY_ENTRY_IMPORT].VirtualAddress

There are a couple caveats I should mention however:

• The structures are wired together via RVA pointers. These need to have the load_address added to make a real pointer
• The pointers in the structure to strings are real pointers. These _do_not_ need the load_address added. Relocation processing will have already fixed these up.

Don't forget about importing by ordinal. You will know this is happening because the pointer to the string will have the high bit set ( (ptr & 0x80000000) != 0 ).

Borland and Microsoft linkers do different things, so you have to be prepared to get the string from either of two different spots. Basically, there are two parallel arrays, the ImportNameTable which you get from:

IMAGE_IMPORT_MODULE_DIRECTORY.dwImportNameListRVA

and the ImportAddressTable which you get from:

IMAGE_IMPORT_MODULE_DIRECTORY.dwIATPortionRVA

The ImportNameTable is optional. Borland doesn't use it. If it is present, you should use it to get the name of the function and GetProcAddress() its pointer (the IMAGE_IMPORT_MODULE_DIRECTORY.dwModuleNameRVA has the name of the dll you will need to LoadLibrary() on). Once you get the address, you stick it in the parallel location in the ImportAddressTable array. You do this for each member.

In the case when the ImportNameTable is not present, however, as with Borland's linker, you must get the address of the function name from the ImportAddressTable itself. Then you overwrite it with the function address.

It is important to use the ImportNameTable in preference to the ImportAddressTable because of a thing called 'bound executables'. If you want to test your work on a bound executable, consider that notepad.exe is bound.

After processing each DLL you may or may not wish to do a FreeLibrary. It's going to depend on how you implement your packer application. We'll discuss that in the next installment, and it relates to 'merged imports'. For now, suffice it to say that if you perform merged imports, you can call FreeLibrary, but if you do not, you must not call it. You might want to put the call in and comment it out while developing until you have merged imports implemented. Merged imports are important for properly supporting TLS that potentially exists in implicitly loaded DLLs. This leads into the final responsibility for the stub, which is handling TLS support.

17 Supporting TLS
Thread Local Storage, or TLS, is a handy programming mechanism. We don't care mostly, since we're not using it, but the original application to be packed might be using it indeed. In fact, Delphi always uses it, and so if we're going to support packing Delphi apps, we better accommodate it.

TLS fundamentally is done via API calls. In general, you allocate an 'index' which you store in a global variable. With this index you can get a DWORD value specific to each thread. Normally you use this value to store a pointer to a hunk of memory you allocate once per thread. Because people thought this was tedious, a special mechanism was created to make it easier. Consequently, you can write code like this:

__declspec ( thread ) int tls_int_value = 0;

and each thread can access its distinct instance by name like any other variable. I don't know if there is an official name for this form of TLS, so I'll call it 'simplified TLS'. This is done in cooperation with the operating system, and there are structures within the PE file that make it happen. Those structures are contained in a chunk that
For this, I add two items to the external global packer data structure:

struct GlobalExternVars
{
    //(other stuff we previously described)
    IMAGE_TLS_DIRECTORY tls_original;
    IMAGE_TLS_DIRECTORY tls_proxy;
};

The packer application will copy the original data to tls_original for our use at runtime. tls_proxy will be almost an exact copy, except two items will not be modified from the stub:

tls_proxy.AddressOfIndex
tls_proxy.AddressOfCallBacks

In the stub we will initialize the AddressOfIndex to point to a normal global DWORD variable, and we will initialize AddressOfCallBacks to point to an array of function pointers in the stub. The function pointers array is a list of things that is called whenever a new thread is created. It is intended to be used for user defined initialization of the TLS objects. Alas, no compiler I have seen has ever used them. Moreover, on the Windows 9x line, these functions are not even called. Still, we support it in case one day they are used. We point the AddressOfCallBacks to an array of two items, one pointing to a function of our implementation, and the second being NULL to indicate the end of the list.

There will be a global DWORD for the TLS slot:

DWORD TLS_slot_index;

The TLS callback function must be of the form:

extern "C" void NTAPI TLS_callback ( PVOID DllHandle, DWORD Reason, PVOID Reserved );

also you add two global booleans indicating that it is safe

extern "C" void NTAPI TLS_callback ( PVOID DllHandle, DWORD Reason, PVOID Reserved ) {
    if ( safe_to_callback_tls ) {
        //after decompression: forward to the original application's callbacks
        PIMAGE_TLS_CALLBACK* ppfn = (PIMAGE_TLS_CALLBACK*) gev.tls_original.AddressOfCallBacks;
        if ( ppfn ) {
            while ( *ppfn ) {
                (*ppfn) ( DllHandle, Reason, Reserved );
                ++ppfn;
            }
        }
    } else {
        //before decompression: remember the parameters for later
        delayed_tls_callback = true;
        TLS_dll_handle = DllHandle;
        TLS_reason = Reason;
        TLS_reserved = Reserved;
    }
}

This will provide a place for the OS to store the slot info, which we will later restore, and if it does call thunks then we will capture the parameters for later when we will invoke the original thunks after decompression. Again, this is all done because the OS will be doing this stuff before we have a chance to decompress. After we decompress, we pass the call straight to the original application.

We handle this last step like so:

void FinalizeTLSStuff() {
    if ( origdirinfo[IMAGE_DIRECTORY_ENTRY_TLS].VirtualAddress != 0 ) {
        *gev.tls_original.AddressOfIndex = TLS_slot_index;
        void* pvTLSData;
        __asm
        {
            mov ecx, DWORD PTR TLS_slot_index
            mov edx, DWORD PTR fs:[02ch]
            mov ecx, DWORD PTR [edx+ecx*4]
            mov pvTLSData, ecx
        }
        int size = gev.tls_original.EndAddressOfRawData - gev.tls_original.StartAddressOfRawData;
        memcpy ( pvTLSData, (void*)
as such since we are going to snip out interesting pieces. You could just as easily use a tool to spew just the interesting pieces to binary resources, or encode them as static data in a C source file. This choice is per taste and we are going to choose the resource approach. We are also going to be a command-line app. So...

Configure your project as a command-line (console) application. Create a RC file and include a resource that is the stub 'dll' produced by your previous project. That's really it for configuration. I'm sure that will be a welcome simplification after having set up the stub project!

22.2 Utility Code
There are going to be some things that are simple, but very tedious, and you will probably like to produce some machinery to tend to these tasks.

One such task relates to translating addresses. We have to do this in a couple places for different reasons, so you might consider making some sort of general purpose address translator. It will need to handle several distinct ranges of addresses being mapped independently to other ranges. In practical terms, there won't be a huge number of range mappings (like about 5), so if you want to just keep a list of range mappings and do a linear search no one will chastise you.

Another tedious thing (I find) is reading little bits and pieces from the original executable file. This is particularly true when navigating a network of objects since you have to run along pointer paths. To make this much more bearable I use a memory-mapped file for the original executable. Read-only access is fine since we won't be altering the original (BTW, if for some reason you do want to write to the mapped image, but not disrupt your original file, remember you can map it copy-on-write. I've done this for some protectors.) I don't use this approach for the output file, however, because most of that will be sequential write.

Lastly, I would like to reiterate that the pointers in the executable are RVA's. This means you will need to do _two_ things to transform them to real pointers. First, if you've mapped the image to an address, you will need to add that base address. The stub 'dll' compiled in as a resource will be accessed through a memory address once we LockResource() on it. That address is the base address. Now, that's all you have to do on a running module (i.e. one the OS loader mapped in), but that's not all we have to do. The second thing we have to do is consider the file and section alignment of the executable (do _not_ assume they are the same). The net result of this is that there will need to be an adjustment on a per-section basis to the resultant pointer. Again, this is not necessary for a module loaded by the OS loader into memory since it has mapped the sections appropriately.

So, I would further suggest creating a utility class that incorporates the address translator mentioned earlier (along with logic to initialize it) that can provide translated access from RVA to physical pointer for regions within a PE file. Stick in an RVA, get out a physical memory pointer. We can use this device for both the memory-mapped original, and also for the resource-loaded stub. You don't have to do this but it will make your life easier. This is a plus because it's already going to get a little harder as it is. You may wish to throw in a couple other convenient PE-specific items, like pointers to the image headers. We'll be using various fields in these headers at several points throughout the packing process.

One other thing that will make you happier in the long run is to produce some sort of wrapper for your compression library of choice. In doing so you can both simplify use of the library and also be able to swap out a different compressor should you choose. For example:

class Compressor {
public:
    Compressor ( HANDLE hFile ); //create; write to given file at current file pos
    void InsertData ( const void *pv, int size ); //stick some uncompressed data in
    void Finish(); //finish any pending writes
    DWORD CompressedCount(); //count of output (compressed) data
};

This sort of interface I have found to be suitable for all compression libraries I have considered, though of course I wouldn't use it for things other than this exe packer.

Other than that you might like to make a general purpose resource tree walker, but we'll discuss that later in the implementation. Making this part generic is mostly useful if you wish to reuse it in other projects.

With that being said, we are ready to move onto the...

23 Basic Tasks
Here are the fundamental things the packer will need to do.

• Determine Size of original
• Setup new section(s); modify originals
• Create and add stub outside this region
• Preserve export info
• Fixup TLS stuff
• Relocate the Relocations
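
Before tackling those tasks, the address translator suggested under Utility Code can be sketched roughly as follows. This is only one possible shape, in portable C++ with no Windows headers; the names RangeMapping, AddressTranslator, AddRange, RvaToFileOffset and RvaToPointer are all hypothetical. The assumption is that you add one range per section, taking the RVA start from the section header's VirtualAddress and the file start from its PointerToRawData, with the size limited to the bytes actually present in the file.

```cpp
#include <cstdint>
#include <vector>

// One contiguous mapping: RVAs in [rva_start, rva_start + size) correspond
// to file offsets starting at file_start.
struct RangeMapping {
    std::uint32_t rva_start;
    std::uint32_t size;
    std::uint32_t file_start;
};

class AddressTranslator {
public:
    void AddRange(std::uint32_t rva_start, std::uint32_t size,
                  std::uint32_t file_start) {
        ranges_.push_back({rva_start, size, file_start});
    }

    // Linear search over the handful of ranges. Returns true and stores the
    // file offset in 'out' when the RVA falls inside a known range.
    bool RvaToFileOffset(std::uint32_t rva, std::uint32_t& out) const {
        for (const RangeMapping& r : ranges_) {
            if (rva >= r.rva_start && rva - r.rva_start < r.size) {
                out = r.file_start + (rva - r.rva_start);
                return true;
            }
        }
        return false;
    }

    // "Stick in an RVA, get out a pointer": 'base' is where the raw file
    // image starts in memory (a mapped view or a locked resource).
    // Returns nullptr for an RVA outside every range.
    const void* RvaToPointer(const void* base, std::uint32_t rva) const {
        std::uint32_t offset = 0;
        if (!RvaToFileOffset(rva, offset)) {
            return nullptr;
        }
        return static_cast<const unsigned char*>(base) + offset;
    }

private:
    std::vector<RangeMapping> ranges_;
};
```

The same object works for both the memory-mapped original and the resource-loaded stub; only the base pointer and the ranges differ. Returning nullptr for an unmapped RVA is a design choice made for this sketch so that a bad pointer shows up early rather than as a wild read.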
IMAGE_NT_HEADERS::OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].VirtualAddress
IMAGE_NT_HEADERS::OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].Size

to where we stuck it and how big it became. Finally align up the buffer to a DWORD boundary for further appends, and you're done with this part.

FYI, we are 50% through our todo list. And we haven't compressed any data yet! It's all downhill from here...

24.7 Do Stub Fixups and Relocating the Relocations
At this point most of the stub's stuff has been built up and we can fixup its pointers to reflect the fact that we have extracted and moved its original components. This task is very similar to the relocation fixups performed in the stub. The difference is in computing the delta to apply.

In normal relocation, like what the stub performs, there is only one delta. This is because the image as a whole moves, and all items are relocated by the same amount. In our case, different sections have moved differently, and thus each item must be treated as having its own delta. The delta in this case is computed as the change between the RVA of the original item to be fixed up (RVAFixupOrig) and the RVA of the item after it has been moved (RVAFixupDest). The item at RVAFixupDest must then be adjusted by this delta.

Since this translated RVAFixupDest is a relocated relocation, I save it into an array of DWORDs for the next step. This saves me from going through the relocation structure twice.

Recall from installment 2 that the relocation records are stored in chunks, one chunk per page, and as 16-bit records that are essentially offsets into the page. I refer you to installment 2 for details, and to the references in installment 1. Suffice it to say, we travel along our now sorted array, emitting chunk headers whenever a page change is detected and emitting 16-bit records otherwise. A page is on a 4096 boundary for 32-bit PE files so you can AND the address with 0xfffff000 to find its page value, and you can AND the address with 0x00000fff to find its offset for the relocation record. Also take care that when you detect a page change, you will possibly need to pad to a 32-bit boundary by adding a no-op relocation record (IMAGE_REL_BASED_ABSOLUTE).

After processing all records set the

IMAGE_NT_HEADERS::OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_BASERELOC].VirtualAddress
IMAGE_NT_HEADERS::OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_BASERELOC].Size

to reflect this new chunk that we added. We should already be aligned to a 32-bit boundary.

24.8 Setup for TLS Stuff
If the original application used TLS we need to set some things up so that the stub can help out. This is fairly straightforward. Especially if there is none!

TLS information is communicated to the stub through the public data. Way back, when we were appending the data section, we took note of the index that starts the data section. Also, since we built the stub to have that structure at the beginning, now we can cast the address of the buffer offset by the index to a pointer to the public structure. Again, we can't stow this pointer since
whenever we append to the buffer we risk reallocating memory, but we can recompute the pointer as needed from the index between appends.

Anyway, if there is no TLS, as evidenced by:

IMAGE_NT_HEADERS::OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_TLS].VirtualAddress

being 0, then we can simply clear out the copy of the tls directory in the public data (we called it tls_original in installment 2).

If there _is_ TLS, then we copy the original TLS directory structure to the tls_original in the public data, and copy over a few items to the tls_proxy:

SizeOfZeroFill
Characteristics
StartAddressOfRawData
EndAddressOfRawData

Note, the addresses do not need to be translated (shock-of-shocks) because they reference data in the original application, which we have not moved. The stub only accesses that data _after_ it decompresses it.

development in order to isolate problems. Not useful otherwise.

OK, assuming you have implemented the wrapper interface I suggested in Utility Code, above, we are ready to do some compressing! Well almost. The compression of the original data could be large, so I prefer not to do it to memory and rather directly compress to the output file (ergo the HANDLE constructor argument in the Compressor class). So we must compute the file position of where this data goes.

We zeroed the size of the original PE sections, so the first real one is our new stub section. We need to compute the file offset to this new section (PointerToRawData).

You should make a copy of the original IMAGE_NT_HEADERS if you haven't already. We will manipulate it to reflect our output. Let's call it nthdrDest and initialize it to the original exe's values. Then calculate:

nthdrDest.FileHeader.NumberOfSections = (new section count)
int nSectionHeadersPos = IMAGE_DOS_HEADER::e_lfanew + sizeof(IMAGE_NT_HEADERS);
int nFirstSectionPos = nSectionHeadersPos + (new section count) * sizeof(IMAGE_SECTION_HEADER);

Setup:
InsertData ( IMAGE_NT_HEADERS::OptionalHeader.DataDirectory, sizeof(IMAGE_NT_HEADERS::OptionalHeader.DataDirectory) );
DWORD dwSectionCount = IMAGE_NT_HEADERS::FileHeader.NumberOfSections;
InsertData ( &dwSectionCount, sizeof(dwSectionCount) );
for each section IMAGE_SECTION_HEADER
    InsertData ( &IMAGE_SECTION_HEADER::VirtualAddress, sizeof(IMAGE_SECTION_HEADER::VirtualAddress) );
    InsertData ( &IMAGE_SECTION_HEADER::SizeOfRawData, sizeof(IMAGE_SECTION_HEADER::SizeOfRawData) );
    InsertData ( (actual pointer to original data), IMAGE_SECTION_HEADER::SizeOfRawData );

In other words, we are pushing the RVA of where the data goes, the physical (uncompressed) size, and then the physical data. We do this for each section of the original.

When we are done we invoke Finish() on the compressor to flush any remaining data not written.

We get the number of actual compressed bytes with CompressedCount(). This we add to the size of the buffer we were building and store it in the SizeOfRawData field of the section header for the stub.

Finally, get a pointer to the structure containing the public data (this is why we didn't write this out until now). Set the value of the stub entry point (after translating, of course), the RVA of the start of the compressed data (which is the RVA of the stub + the size of the stub buffer) and the size of the compressed data (which we got from the compressor when done).

Then seek back to the position PointerToRawData we just computed and write out the stub buffer. Basically we just concatenated the two in reverse order.

Finished with generating and writing out the stub!

24.10 Processing the Resource Directory

Processing the resource directory is a strictly optional task. It is a bit tedious. Benefits of processing include preserving the ever-important application icon and

The 'fun' that awaits is similar to what we did for exports earlier in that we walk a structure and optionally copy stuff over, adjusting the pointer when we do and leaving it pointing to the original data in the compressed section otherwise.

The difference is that this structure is more complex, with more objects and a more complex decision on what to keep. First let me briefly tell you what you want to keep uncompressed, because that's the easy part to know and the tedious part to figure out experimentally. You will want to keep uncompressed the following resources:

• first RT_ICON should be kept
• first RT_GROUP_ICON should be kept
• first RT_VERSION should be kept
• first "TYPELIB" should be kept
• all "REGISTRY" should be kept

OK, that being said, keep in mind that resources are a multilevel tree of directories. You need to keep track of what level you are at to make your comparisons in order to determine whether to keep a resource or not. Also, as a perceived convenience, all the fixed-size structures are coalesced at the beginning with variable-length ones afterwards. This means all the directory structures are at the beginning, with things like string identifiers and resource data afterwards.

I do a similar thing as with the stub and build this section in memory with a managed array of bytes. Once it is constructed I write it out later.

You can walk the tree once to find where this boundary between fixed and variable-sized data lies, then copy the fixed data verbatim. It's interesting to note that most of the pointers in this section are relative to the section itself, and thus do not require translation. The exception to this is the pointers to the actual resource data, which are RVAs.

Walk the tree a second time and append all the string identifiers. Adjust the pointers to these strings, keeping in mind that they are _not_ RVAs, but are rather relative offsets into the resource section.

Walk the tree a third time and copy over the resource chunks for the resource types of interest described above. Keep in mind that these actually _are_ RVAs, so
you will need to add the RVA of the beginning of this section. What is that? Well, it is the RVA of the last section, plus its size, aligned up to the IMAGE_NT_HEADERS::OptionalHeader.SectionAlignment. The resource chunks should be aligned between appends.

Set up the section header for this additional section. It _must_ have the name .rsrc. Set the VirtualAddress of this section to the RVA we just computed. Set up the PointerToRawData in a similar manner, except use the last section's PointerToRawData + SizeOfRawData and align the result up by the value of IMAGE_NT_HEADERS::OptionalHeader.FileAlignment instead. Set the SizeOfRawData to the size of the resulting chunk, and the VirtualSize to the same. You can align these values up if you like.

Similar to what we did with the stub, seek to the PointerToRawData and write out the data in the buffer we've been building.

Finally, set:

IMAGE_NT_HEADERS::OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_RESOURCE].VirtualAddress
IMAGE_NT_HEADERS::OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_RESOURCE].Size

and we are done with that.

24.11 Dotting I's and Crossing T's

There are some details that will need to be fixed up before writing the rest of the stuff out. Mostly this has to do with the various directory entries, but let's not forget the entry point address!

The entry point is computed as the stub 'dll's entry point after being translated with the translation device I hope you created.

The image size needs to be recomputed as the last section's VirtualAddress plus its VirtualSize (the PE format expects SizeOfImage to be a multiple of SectionAlignment, so round up if needed).

Most of the directory entries need to be copied over from the stub 'dll' after being passed through the translator. Exceptions include the Resource directory. If you processed resources you should point it to the new section you created. If you did not, leave it as it was in the original. Resources will be available at runtime, but not to Explorer or OLE (or ResHacker).

If you made exports/relocations, set up those entries (that was discussed earlier).

Some directory entries should definitely be zeroed out:

• IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT: kiss it goodbye
• IMAGE_DIRECTORY_ENTRY_IAT: expunge it
• IMAGE_DIRECTORY_ENTRY_DEBUG: (we don't really have bugs, anyway)

Seriously, though, the first two are used by the loader and will cause crashes if left in place. Removing them harms nothing. The last one might be nice, but the debugger can't get to the data until after the application is running, which is too late.

Writing out the Remainder

• Copy over the original DOS stub.
• Write out the modified PE header.

Position to the section header offset we computed (nSectionHeadersPos). Loop through the section headers we have been keeping on hand and write them out. If you have a modified resource section, take care to rename the original and make the new one be named .rsrc to work around the Microsoft OLE automation bug.

Close your file.

25 Beyond Packers

I think it's useful to consider the big picture of what a packer is, because subsets of the technology can be used for different applications. For instance, we bound new code and data to an arbitrary executable that was not designed to host it, without damaging the original program. This is like an exe binder. Discard the compression and a lot of the manipulation of directories and you can produce one. Similarly, one could retain some of the directory manipulations, like with the imports, and fashion a protector of sorts to resist reverse engineering. Other extended applications may come to mind as well.

26 Conclusions

I hope you found some useful information in this article. I enjoyed having the opportunity to write it.
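The header fix-ups from 24.11 can be sketched roughly as follows. This is only an illustration under stated assumptions: DirEntry, SectionInfo, OptionalHeaderLite, AlignUp and FinishHeader are made-up names standing in for the winnt.h structures so that the code compiles without the Windows headers. The directory indices are the standard IMAGE_DIRECTORY_ENTRY_* values. The sketch recomputes the image size from the last section, rounds it up to SectionAlignment, and zeroes the bound import, IAT and debug directory entries.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Minimal stand-ins for the winnt.h structures (hypothetical names).
// Only the fields this sketch touches are present.
struct DirEntry    { std::uint32_t VirtualAddress; std::uint32_t Size; };
struct SectionInfo { std::uint32_t VirtualAddress; std::uint32_t VirtualSize; };

// Same numeric values as winnt.h's IMAGE_DIRECTORY_ENTRY_* constants.
constexpr int kDirDebug       = 6;
constexpr int kDirBoundImport = 11;
constexpr int kDirIat         = 12;
constexpr int kDirCount       = 16;

struct OptionalHeaderLite {
    std::uint32_t SectionAlignment;
    std::uint32_t SizeOfImage;
    DirEntry      DataDirectory[kDirCount];
};

// Round value up to the next multiple of alignment (a power of two).
std::uint32_t AlignUp(std::uint32_t value, std::uint32_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}

// Recompute SizeOfImage and clear the directory entries that must go.
// Assumes 'sections' is ordered by address, so the last element ends the image.
void FinishHeader(OptionalHeaderLite& opt,
                  const std::vector<SectionInfo>& sections) {
    assert(!sections.empty());
    const SectionInfo& last = sections.back();
    opt.SizeOfImage = AlignUp(last.VirtualAddress + last.VirtualSize,
                              opt.SectionAlignment);
    for (int index : {kDirBoundImport, kDirIat, kDirDebug}) {
        opt.DataDirectory[index] = DirEntry{0, 0};
    }
}
```

With real headers you would apply the same two steps to your working copy of IMAGE_NT_HEADERS (nthdrDest) just before writing it out. The other directory entries are left alone here because they are handled by the translation step.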