Executable File Analysis (Windows Forensic Analysis) Part 2
API structures. With this and other resources, we can understand the structure of a PE file, delve into its depths, and extract information
that may be of use to us during an investigation.
A PE file can be broken down into several areas of interest (I hesitate to say "sections," as we will be using this term for a specific
purpose in our discussion). The first, and perhaps most important, part of a PE file (if not the most important, then one of the best bits of
geek trivia) is the file signature. For executable files on Windows systems, the file signature consists of the letters MZ, found in the first
two bytes of the file. As noted earlier in the topic, these two letters are the initials of Mark Zbikowski
(http://en.wikipedia.org/wiki/Mark_Zbikowski), the Microsoft architect credited with designing the executable file format. However,
as you’ll see, it takes much more than those two letters and an ".exe" at the end of the file name to make a file executable.
Mark’s initials are the signature for a 64-byte structure called the IMAGE_DOS_HEADER. The important elements of this structure
are the first two bytes (the "magic number" 0x5a4d in little-endian hexadecimal format, or MZ) and the last DWORD (4-byte) value,
which is referred to as e_lfanew. This value is defined in the ntimage.h header file as the file address (offset) of the new EXE header; that
is, the offset at which we should find the signature for the beginning of the IMAGE_NT_HEADERS structure. The e_lfanew value
points to the location of the PE header, enabling Windows to properly execute the image file. Figure 6.2 illustrates these values from an
executable file opened in a hex editor.
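The two values described above can be read with a few lines of Python. This is a minimal sketch rather than a complete parser: the function name read_e_lfanew and the 64-byte sample buffer are made up for illustration, and the sample is synthetic data, not a real executable.

```python
import struct

DOS_HEADER_SIZE = 64     # size of the IMAGE_DOS_HEADER structure
E_LFANEW_OFFSET = 0x3C   # the last DWORD of the 64-byte header


def read_e_lfanew(data: bytes) -> int:
    """Check the MZ signature and return the e_lfanew value."""
    if len(data) < DOS_HEADER_SIZE:
        raise ValueError("too short to hold an IMAGE_DOS_HEADER")
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew is stored as a little-endian unsigned 32-bit value
    (e_lfanew,) = struct.unpack_from("<I", data, E_LFANEW_OFFSET)
    return e_lfanew


# Synthetic header: "MZ", 58 bytes of padding, then e_lfanew = 0xB8
sample = b"MZ" + b"\x00" * 58 + struct.pack("<I", 0xB8)
print(hex(read_e_lfanew(sample)))  # 0xb8
```

For a real file, the first 64 bytes read from disk in binary mode would take the place of sample.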
In the example illustrated in Figure 6.2, the IMAGE_NT_HEADERS structure should be located at offset 0xB8 (184 in decimal
notation) within the file. The IMAGE_NT_HEADERS structure consists of a signature and two additional structures, IMAGE_FILE_HEADER
and IMAGE_OPTIONAL_HEADER. The signature for a PE header is, sensibly enough, "PE" followed by two zero values
(the signature value is a DWORD, or four bytes in length, and appears as "PE\00\00"), and is illustrated in Figure 6.3.
Size     Field                    Description
2 bytes  Machine                  Designates the architecture type of the computer; the program can be run only on a system that emulates this type
2 bytes  Number of Sections       Designates how many sections (IMAGE_SECTION_HEADERS) are included in the PE file
4 bytes  TimeDateStamp            The time and date that the linker created the image, in UNIX time format (i.e., number of seconds since midnight, 1 Jan 1970). This normally indicates the system time on the programmer’s computer when he compiled the executable
4 bytes  Pointer to Symbol Table  Offset to the symbol table (0 if no COFF symbol table exists)
4 bytes  Number of Symbols        Number of symbols in the symbol table
2 bytes  Size of Optional Header  Size of the IMAGE_OPTIONAL_HEADER structure; determines whether the structure is for a 32-bit or 64-bit architecture
2 bytes  Characteristics          Flags designating various characteristics of the file
For forensic investigators, the TimeDateStamp value may be of significance when investigating an executable file, as it shows when the
linker created the image file (investigators should also be aware that this value can be modified with a hex editor without having any
effect on the execution of the file itself). This normally indicates the system time on the programmer’s computer when the programmer
compiled the executable and may be a clue as to when this program was constructed. When performing analysis of the file, the number of
sections that are reported in the IMAGE_FILE_HEADER structure should match the number of sections within the file. Also, if the file
extension has been altered, the Characteristics value will provide some clues as to the true nature of the file; for instance, within the
Characteristics value illustrated in Figure 6.4, if the IMAGE_FILE_DLL flag is set (i.e., 0x2000), the executable file is a dynamic link
library (DLL) and cannot be run directly. One class of files that usually occurs as DLLs is browser helper objects, or BHOs. These are
DLLs that are loaded by Internet Explorer and can provide all manner of functionality. In some instances, these DLLs are legitimate
(such as the BHO used to load Adobe’s Acrobat Reader when a PDF file is accessed via the browser), but in many cases these BHOs may
be spyware or adware. The MSDN page for the IMAGE_FILE_HEADER provides a list of possible constant values that can comprise
the Characteristics field.
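As a rough illustration of the fields discussed above, the following Python sketch unpacks the 20-byte IMAGE_FILE_HEADER that follows the PE signature, converts TimeDateStamp to a UTC date, and tests the IMAGE_FILE_DLL flag. The function name parse_file_header and the sample bytes are hypothetical; the sample is synthetic and only mimics a header.

```python
import struct
from datetime import datetime, timezone

# Machine, NumberOfSections, TimeDateStamp, PointerToSymbolTable,
# NumberOfSymbols, SizeOfOptionalHeader, Characteristics (20 bytes in total)
FILE_HEADER = struct.Struct("<HHIIIHH")
IMAGE_FILE_DLL = 0x2000


def parse_file_header(data: bytes, e_lfanew: int) -> dict:
    """Parse the IMAGE_FILE_HEADER located right after the PE signature."""
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("PE signature not found at e_lfanew")
    (machine, section_count, timestamp, _symbol_table, _symbol_count,
     optional_size, characteristics) = FILE_HEADER.unpack_from(data, e_lfanew + 4)
    return {
        "machine": machine,
        "number_of_sections": section_count,
        # UNIX time: seconds since midnight, 1 Jan 1970 (UTC)
        "time_date_stamp": datetime.fromtimestamp(timestamp, tz=timezone.utc),
        "size_of_optional_header": optional_size,
        "characteristics": characteristics,
        "is_dll": bool(characteristics & IMAGE_FILE_DLL),
    }


# Synthetic example: signature followed by made-up header values
sample = b"PE\x00\x00" + FILE_HEADER.pack(0x014C, 3, 0, 0, 0, 0xE0, 0x2102)
info = parse_file_header(sample, 0)
print(info["number_of_sections"], info["time_date_stamp"].isoformat(), info["is_dll"])
```

Because the timestamp can be edited without affecting execution, the decoded date is a clue to be corroborated, not proof of when the file was built.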
The value that gives the size of the IMAGE_OPTIONAL_HEADER structure (http://msdn.microsoft.com/en-gb/library/ms680339.aspx)
is important for file analysis, as it tells you whether the optional header is for a 32-bit or a 64-bit
application. This value corresponds to the "magic number" of the IMAGE_OPTIONAL_HEADER structure, which is located in the first
two bytes of the structure; a value of 0x10b indicates a 32-bit executable image, a value of 0x20b indicates a 64-bit executable image,
and a value of 0x107 indicates a ROM image. In our discussion, we will focus on the IMAGE_OPTIONAL_HEADER32 structure for a
32-bit executable image. Figure 6.5 illustrates the IMAGE_OPTIONAL_HEADER of a sample application viewed in PEView.
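The magic number check can be sketched in Python as follows. The sketch assumes the optional header begins immediately after the 4-byte PE signature and the 20-byte IMAGE_FILE_HEADER; the function name optional_header_kind and the sample buffer are illustrative only.

```python
import struct

OPTIONAL_MAGIC = {
    0x10B: "PE32 (32-bit)",
    0x20B: "PE32+ (64-bit)",
    0x107: "ROM image",
}


def optional_header_kind(data: bytes, e_lfanew: int) -> str:
    """Classify the optional header by the magic number in its first two bytes."""
    # Skip the 4-byte signature and the 20-byte IMAGE_FILE_HEADER
    offset = e_lfanew + 4 + 20
    (magic,) = struct.unpack_from("<H", data, offset)
    return OPTIONAL_MAGIC.get(magic, f"unknown (0x{magic:x})")


# Synthetic buffer: 24 placeholder bytes, then the magic number 0x10b
sample = bytes(24) + struct.pack("<H", 0x10B)
print(optional_header_kind(sample, 0))  # PE32 (32-bit)
```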
As you saw earlier, the size of the IMAGE_OPTIONAL_HEADER structure is stored in the IMAGE_FILE_HEADER structure. The
optional header itself contains several values that may be useful for certain, detailed analyses of executable files. This level of analysis
is beyond the scope of this topic.
However, a value of interest within the IMAGE_OPTIONAL_HEADER is the Subsystem value, which tells the operating system which
subsystem is required to run the image. Microsoft even provides a Knowledge Base article (90493,
http://support.microsoft.com/kb/90493) that describes how to determine the subsystem of an application and includes sample code.
Note that the MSDN page of the IMAGE_OPTIONAL_HEADER structure provides several more possible values for the Subsystem
than the Knowledge Base article.
Another value that investigators will be interested in is the AddressOfEntryPoint value within the IMAGE_OPTIONAL_HEADER.
This is a pointer to the entry point function relative to the image base address. For executable files, this is where the code for the
application begins. The importance of this value will become apparent later in this topic.
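Both values sit at fixed positions inside the optional header: AddressOfEntryPoint at offset 16 and Subsystem at offset 68. The Python sketch below reads them from a buffer that starts at the beginning of the optional header. The function name, the partial subsystem list, and the sample values are assumptions made for illustration; the MSDN page lists the full set of subsystem constants.

```python
import struct

# Partial list only; see the MSDN page for the remaining values
SUBSYSTEMS = {1: "NATIVE", 2: "WINDOWS_GUI", 3: "WINDOWS_CUI"}

ENTRY_POINT_OFFSET = 16  # AddressOfEntryPoint, DWORD
SUBSYSTEM_OFFSET = 68    # Subsystem, WORD


def entry_point_and_subsystem(optional_header: bytes) -> tuple:
    """Return (entry point RVA, subsystem name) from an optional header."""
    (entry_rva,) = struct.unpack_from("<I", optional_header, ENTRY_POINT_OFFSET)
    (subsystem,) = struct.unpack_from("<H", optional_header, SUBSYSTEM_OFFSET)
    return entry_rva, SUBSYSTEMS.get(subsystem, f"other ({subsystem})")


# Synthetic optional header with made-up values
sample = bytearray(96)
struct.pack_into("<H", sample, 0, 0x10B)                    # magic: PE32
struct.pack_into("<I", sample, ENTRY_POINT_OFFSET, 0x1000)  # entry point RVA
struct.pack_into("<H", sample, SUBSYSTEM_OFFSET, 3)         # console subsystem
entry, subsystem = entry_point_and_subsystem(bytes(sample))
print(hex(entry), subsystem)  # 0x1000 WINDOWS_CUI
```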
Figure 6.6 shows four of the 16 data directories available in the sample application. The values listed are the locations or offsets within
the PE file where the information is located. For instance, the first line in Figure 6.6 shows the file offset of the IMPORT table entry
(0x138), the value stored at that location (0x78004), and the name of the value (RVA). From the information visible in Figure 6.6, we
can see that the sample application has both an IMPORT table and a RESOURCE table.
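A hedged Python sketch of reading the data directories follows. It assumes a 32-bit (PE32) optional header, in which the directory count is stored at offset 92 and the 8-byte entries (RVA, then size) begin at offset 96. Only the first three directory names are listed, and apart from the 0x78004 value taken from the text, the sample numbers are invented.

```python
import struct

DIRECTORY_NAMES = ["EXPORT", "IMPORT", "RESOURCE"]  # first three of the 16 entries

COUNT_OFFSET = 92        # NumberOfRvaAndSizes in a PE32 optional header
DIRECTORIES_OFFSET = 96  # first data directory entry in a PE32 optional header


def list_data_directories(optional_header: bytes) -> dict:
    """Return {name: (rva, size)} for the named directories that are present."""
    (magic,) = struct.unpack_from("<H", optional_header, 0)
    if magic != 0x10B:
        raise ValueError("this sketch handles PE32 optional headers only")
    (count,) = struct.unpack_from("<I", optional_header, COUNT_OFFSET)
    found = {}
    for index, name in enumerate(DIRECTORY_NAMES[:count]):
        rva, size = struct.unpack_from("<II", optional_header, DIRECTORIES_OFFSET + index * 8)
        if rva:  # an RVA of zero means the directory is absent
            found[name] = (rva, size)
    return found


# Synthetic optional header holding an IMPORT and a RESOURCE directory
sample = bytearray(96 + 16 * 8)
struct.pack_into("<H", sample, 0, 0x10B)
struct.pack_into("<I", sample, COUNT_OFFSET, 16)
struct.pack_into("<II", sample, DIRECTORIES_OFFSET + 1 * 8, 0x78004, 0x50)
struct.pack_into("<II", sample, DIRECTORIES_OFFSET + 2 * 8, 0x7A000, 0x400)
directories = list_data_directories(bytes(sample))
for name, (rva, size) in directories.items():
    print(name, hex(rva), hex(size))
```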
Tip::
An RVA is used within an executable file when an address of a variable (for example) needs to be specified but hardcoded addresses
cannot be used. This is because the executable image will not be loaded into the same location in memory on every system. RVAs are
used because of the need to be able to specify locations in memory that are independent of the location where the file is loaded. An RVA
is essentially an offset in memory, relative to where the file is loaded. The formula for computing the RVA is as follows:
RVA = VA - Load Address
To obtain the actual memory address (a.k.a. the Virtual Address, or VA), simply add the Load Address to the RVA.
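The arithmetic is simple enough to show in a couple of lines of Python. The load address 0x400000 below is an assumed example value, and the function names are illustrative.

```python
def rva_to_va(load_address: int, rva: int) -> int:
    """Virtual address = load address + relative virtual address."""
    return load_address + rva


def va_to_rva(load_address: int, va: int) -> int:
    """Relative virtual address = virtual address - load address."""
    if va < load_address:
        raise ValueError("virtual address lies below the load address")
    return va - load_address


# Assumed load address of 0x400000 and the RVA 0x78004 from Figure 6.6
print(hex(rva_to_va(0x400000, 0x78004)))   # 0x478004
print(hex(va_to_rva(0x400000, 0x478004)))  # 0x78004
```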
The final portion of the PE file that is of interest to us at this point is the IMAGE_SECTION_HEADER
(http://msdn.microsoft.com/en-us/library/ms680341.aspx) structures. The IMAGE_FILE_HEADER structure contains a value that
specifies the number of sections that should be in a PE file, and therefore the number of IMAGE_SECTION_HEADER structures that
need to be read. The IMAGE_SECTION_HEADER structures are 40 bytes in size, and contain the name of the section (eight characters
in length), information about the size of the section both on disk and in memory, and the characteristics of the section (i.e., whether the
section can be read, written to, executed, etc.). Figure 6.7 illustrates the structure of an IMAGE_SECTION_HEADER.
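The 40-byte layout lends itself to a short Python sketch. The function below takes the raw section table and the section count from the IMAGE_FILE_HEADER; its name, the dictionary keys, and the sample ".text" entry are made up for illustration, and only the read, write, and execute flags are decoded.

```python
import struct

# Name[8], VirtualSize, VirtualAddress, SizeOfRawData, PointerToRawData,
# PointerToRelocations, PointerToLinenumbers, NumberOfRelocations,
# NumberOfLinenumbers, Characteristics (40 bytes in total)
SECTION_HEADER = struct.Struct("<8sIIIIIIHHI")
PERMISSIONS = {0x20000000: "execute", 0x40000000: "read", 0x80000000: "write"}


def parse_section_headers(table: bytes, count: int) -> list:
    """Parse `count` consecutive IMAGE_SECTION_HEADER structures."""
    sections = []
    for index in range(count):
        (name, virtual_size, virtual_address, raw_size, raw_pointer,
         _relocs, _lines, _reloc_count, _line_count,
         characteristics) = SECTION_HEADER.unpack_from(table, index * SECTION_HEADER.size)
        sections.append({
            # Names shorter than eight characters are padded with zero bytes
            "name": name.rstrip(b"\x00").decode("ascii", errors="replace"),
            "virtual_size": virtual_size,
            "virtual_address": virtual_address,
            "raw_size": raw_size,
            "raw_pointer": raw_pointer,
            "permissions": [label for bit, label in PERMISSIONS.items()
                            if characteristics & bit],
        })
    return sections


# Synthetic single-entry section table
sample = SECTION_HEADER.pack(b".text", 0x1000, 0x1000, 0x1000, 0x400, 0, 0, 0, 0, 0x60000020)
sections = parse_section_headers(sample, 1)
print(sections[0]["name"], sections[0]["permissions"])
```

Comparing the number of headers that parse cleanly against the count in the IMAGE_FILE_HEADER is one way to apply the consistency check mentioned earlier.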
Tip::
One thing to keep in mind when viewing the section names is that there are no hard and fast requirements as to what section names
should or can be. The section name is nothing more than a series of characters (up to eight) that can be anything. Rather than ".text", the
section name could be "timmy". Changing the name does not affect the functionality of the PE file. In fact, some malware authors will
edit and modify the section names, perhaps to throw off inexperienced malware analysts. Most "normal" programs have names such as
.code, .data, .rsrc, or .text. System programs may have names such as PAGE, PAGEDATA, and so forth. Although these names are
normal, a malware author can easily rename the sections in a malicious program so that they appear innocuous. Some section names can
be associated with packers and cryptors directly. For example, a program with section names beginning with UPX (such as UPX0 and
UPX1) has most likely been processed with the UPX packer. We will discuss this at greater length later in this topic.
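A simple way to apply this tip is to compare section names against a short list. The Python sketch below is only a heuristic under stated assumptions: the list of common names is taken from the tip above and is far from complete, the function name is hypothetical, and because names can be changed freely, a clean result proves nothing.

```python
# Names mentioned in the tip above; an incomplete, illustrative list
COMMON_NAMES = {".code", ".data", ".rsrc", ".text", "PAGE", "PAGEDATA"}


def review_section_names(names) -> dict:
    """Return a note for each section name that deserves a closer look."""
    notes = {}
    for name in names:
        if name.upper().startswith("UPX"):
            notes[name] = "possible UPX packing"
        elif name not in COMMON_NAMES:
            notes[name] = "uncommon name"
    return notes


notes = review_section_names(["UPX0", "UPX1", ".rsrc", "timmy"])
for name, note in notes.items():
    print(name, "->", note)
```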
All of the PE file information is also available via pedump.exe. The section information in Figure 6.7 appears as follows when
viewed via pedump.exe:
As you can see, there is no significant difference in the information available via the two tools. The virtual size and address information
determines how the executable image file will "look" when in memory, and the "raw data" information applies to the executable image
file as it exists on disk.
IMPORT Tables
It’s very rare these days that an application is written completely from scratch. Most programs are constructed by accessing the Windows
application program interface (API) through various functions made available in libraries (DLLs) on the system. Microsoft provides a
great number of DLLs that offer access to ready-made functions for creating windows, menus, dialogs, sockets, and just about any
widget, object, and construct on the system. There is no need to create any of these completely by hand when creating an application or
program.
That being the case, when programs are written and then compiled and linked into executable image files, information about the DLLs
and functions accessed by that program needs to be available to the operating system when the application is running. This information is
maintained in the IMPORT table and the IMPORT ADDRESS table of the executable file.
Note
A while back, I had the opportunity to work on a project that involved determining whether an executable file had network capabilities. I
had done some work examining applications to determine whether they had capabilities of either a network server (listened for
connections, like a Trojan backdoor) or client (made connections to servers, like an IRCbot), but with this project the goal was to
automate the process. So, we started by examining available DLLs to determine which of them provided networking functionality (i.e.,
wininet.dll, ws2_32.dll, etc.), and then we determined which functions provided the core functionality in question. Once we had that
information, we could automate the process by parsing the PE file structures, locating the IMPORT table and determining which DLLs
and functions were used. One thing to keep in mind, however, is that reading the IMPORT table of a malware executable file may not be
that easy if the file is obfuscated in some manner.
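The final matching step of such a project might look like the Python sketch below. It assumes the imported DLL names have already been extracted from the IMPORT table by some other means; the set of networking DLLs contains only the two named above and would need to be extended, and the function name is hypothetical.

```python
# The two DLLs named in the text; a real list would be longer
NETWORK_DLLS = {"wininet.dll", "ws2_32.dll"}


def network_related_imports(imported_dlls) -> list:
    """Return the imported DLLs that appear in the networking list."""
    # DLL names are compared case-insensitively
    lowered = {name.lower() for name in imported_dlls}
    return sorted(lowered & NETWORK_DLLS)


print(network_related_imports(["KERNEL32.dll", "WS2_32.dll", "USER32.dll"]))
# ['ws2_32.dll']
```

An empty result does not rule out network capability, since a program can load DLLs at run time without listing them in its IMPORT table.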
The pedump.exe tool provides easy access to the IMPORT table information, by locating the import data directory and parsing the
structures to determine the DLLs and the functions the application uses. Example output from pedump.exe appears as follows: Import
Table:
As you can see, the sample application imports several functions from kernel32.dll. Although the DLL actually provides a number of
functions that are available for use (see the "EXPORT Table" section later in this topic), this example executable imports functions such
as GetSystemTimeAsFileTime() and CreateFileA() for use. Microsoft provides a good deal of information regarding many of the
available functions, so you can do research online to see what various functions are meant to do. For example, the
GetSystemTimeAsFileTime() function retrieves the current system time as a 64-bit FILETIME object, and the returned value represents
the number of 100-nanosecond intervals since 1 Jan 1601, in Coordinated Universal Time (UTC).
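To make the FILETIME format concrete, the following Python sketch converts a 64-bit FILETIME value to a UTC date. The function name is illustrative, and the conversion drops any fraction below one microsecond because Python's datetime does not store it.

```python
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(filetime: int) -> datetime:
    """Convert a count of 100-nanosecond intervals since 1 Jan 1601 to UTC."""
    # Ten 100-nanosecond intervals make one microsecond
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)


# 116444736000000000 is the FILETIME value for midnight, 1 Jan 1970
print(filetime_to_datetime(116444736000000000).isoformat())
# 1970-01-01T00:00:00+00:00
```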
Tip::
You can look up Microsoft API functions via MSDN. I keep a link to the Microsoft Advanced Search page on my browser toolbar for
quick access. Typing in the name of the function I’m interested in, such as GetSystemTimeAsFileTime, provides me not only with
information about the API function, but also with important ancillary information.
Seeing what functions an application imports gives you a general clue as to what it does (and does not do). For example, if the
application does not import any of the DLLs that contain networking code, either as low-level socket functions or higher-level Internet
APIs, it is unlikely that the application is a backdoor or that it can be used to transmit information off the system and onto the Internet.
This is a useful technique, one that I have used to provide information and answer questions about an application. I was once given an
executable image and asked whether it was or had the capability of being a network backdoor. After documenting the file, I took a look at
the IMPORT table and saw that none of the imported DLLs provided networking capabilities. I took my analysis a step further by
looking at the functions that were imported and found that although several provided mathematical functionality, none provided
networking capability.
Another useful tool for viewing the information regarding DLLs and functions required by an application is the Dependency Walker
tool, also known as depends.exe, available from the Web site of the same name. Figure 6.8 illustrates an excerpt of the Dependency
Walker GUI, with the sample application dcode.exe open in the Dependency Walker.
The Dependency Walker tool allows you to see not only the DLLs and functions that an executable imports—be it an .exe or a .dll file—
but also the functions exported by DLLs. We will discuss the EXPORT table a bit more in the next section.
The Dependency Walker tool also has a useful profiling function, which allows you to set specific parameters for how a module or
application will be profiled, and then to launch the application to see which modules (DLLs) will be loaded. This allows you to trace the
various DLL function calls and returned values as the application runs. This can be useful in detecting modules that are dynamically
loaded but aren’t listed in the IMPORT tables of other modules, or for determining why an "application failed to initialize properly" error
is reported. However, this falls outside the scope of static analysis, as it requires the file to be run.