Reverse Engineering Techniques Used For Malware Analysis
DISSERTATION THESIS
Scientific coordinator:
Prof. ION SMEUREANU
Student:
IOAN-CRISTIAN IACOB
BUCHAREST
2015
Table of contents:
I. Introduction
A. Malware History
B. Malware Classification
1. Viruses
2. Worms
3. Trojans
4. Bots
5. Rootkits
C. Malware Propagation Techniques
1. Web browsing
2. USB thumb drives
3. Email Spear Phishing
4. Watering Hole
5. Zero-Day Exploits (Java, Office documents, Flash)
D. Malware Goals
1. Monetising Malware
2. Espionage and Sabotage
E. The Goal of Malware Analysis
II. Types of Malware Analysis and Tools
A. Dynamic Malware Analysis
1. Sysinternals
2. OllyDbg, WinDbg and ImmDbg
3. Sandboxing
B. Static Malware Analysis
1. IDA
III. Malware Strategy and Defence
A. Anti-Reverse Techniques
1. Anti-Disassembly Techniques
2. Anti-Debugging Techniques
B. Malware Behaviour
1. Process Hollowing
Techniques for Malware Reverse Engineering
I. Introduction
A. Malware History
Malware is short for malicious software. Malware can be any unwanted software used
to gain unauthorised access, disrupt computer operation, or gather sensitive information.
Malware is defined by its malicious intent, acting against the interests of the computer
user.
Before Internet access became affordable to most computer users, viruses spread by
infecting floppy disks. The malware inserted a copy of itself into the machine code
instructions of executables, and it depended on users exchanging software on floppy disks
and other removable media in order to spread. With few computer networks available,
viruses were written mostly for fun rather than for information theft. As a result they were
very noisy, and users immediately knew they were infected.
The first malware to spread over computer networks originated on multitasking Unix
systems. SunOS and VAX BSD systems were the targets of the first well-known worm, the
Morris worm, released in 1988. It propagated by exploiting security holes (vulnerabilities)
in network server programs and ran itself as a separate process. Since then, malware has
evolved towards information gathering, which requires it to remain on a system for longer
periods in order to collect as much data as possible. Consequently, malware authors moved
from noisy, attention-seeking behaviour to a stealthier and more obscure approach.
Today, because of its popularity, Microsoft's Windows operating system is the preferred
target for malware writers, although some malware is also written for Linux and other Unix
systems. [1]
B. Malware Classification
1. Viruses
A virus is a type of malware that replicates by inserting a copy of itself into other
executables. Since most viruses are attached to executables, a virus may reside on a system
without being active until a user runs the infected program. When that program is
executed, the virus runs first and, once it has done its job, passes execution to the
legitimate code.
The virus spreads when the infected file is transferred between computers, for example
over networks or by e-mail. [2]
2. Worms
In contrast to viruses, worms propagate by exploiting vulnerabilities in network
services, which means they can spread without human interaction. After infecting a host,
the worm can pivot from it to reach other network segments that would otherwise be
inaccessible from the original starting point.
3. Trojans
The Trojan was named after the horse used by the Greeks to infiltrate Troy. It is
designed to resemble legitimate software and trick users into launching it. After the
system has been compromised, the Trojan can gather and steal intellectual property, change
or delete files, open backdoors for attackers, etc.
This kind of malware spreads through human interaction rather than by infecting other files
or propagating through the network.
4. Bots
The term bot comes from the word robot, an autonomous mechanism or piece of software
that is able to interact with its environment, providing services
Once these vulnerabilities have been found by legitimate researchers, they are
documented and reported to the software developers so that they can be patched. Known
vulnerabilities are catalogued and publicly available at cve.mitre.org. CVE (Common
Vulnerabilities and Exposures) is a dictionary of publicly known information security
vulnerabilities and exposures. [3]
D. Malware Goals
1. Monetising Malware
In recent years, the underground economy associated with malware and the subversion
of Internet-connected systems has diversified extensively. [4]
a) Credential Theft (credit cards, bank accounts, email credentials)
In the early days of malware, the main purpose was notoriety, but as the Internet has
grown, malware writers have concentrated their efforts on making money. One of their
strategies, after infecting computer environments, is to steal user credentials for banking
websites, system accounts, FTP, email, etc. These credentials are sent over the Internet to
the attacker, who can sell them on the black market or use them directly. [5]
Examples include Citadel, SpyEye, Pony, Zeus, Carberp and Dyre.
b) Rogue Software (fake AV, Battery Boosters)
Rogue software is a misleading type of software that simulates the user interface of a
legitimate application with the intent of dropping malware onto the computer or persuading
the user to pay for it.
c) Black SEO and SPAM
Black SEO is a practice that increases a page's rank in search engines through means
that violate the search engines' terms of service. [6] This is accomplished by renting botnet
infrastructure to entities that want a higher page rank. The bot master commands the bots
to search for and access the target web pages, thereby increasing their rank.
SPAM is the practice of sending unsolicited emails containing advertisements or
attachments to users. By clicking the links in the message body, users can be redirected to
phishing websites or to sites that host malware.
This type of activity also has its own black market, and bot masters can rent their
infrastructure to other individuals. Payments are made using cryptocurrencies to preserve the
anonymity of both parties.
d) Pay per Install (install other malware)
Pay-per-install is a technique used by bot masters to make money by renting or selling
their infrastructure of compromised hosts to other entities. These entities then install their
own malware on the hosts, thereby further expanding their own botnets.
e) Ransomware (CryptoLocker)
Ransomware is the type of malware that restricts users' access to their data by
encrypting critical parts of it and then demanding a ransom, to be paid by the victim to the
malware author, in order to re-enable access. [7]
2. Espionage and Sabotage
Advanced Persistent Threat (APT) is a set of stealthy and continuous hacking
processes, often orchestrated by humans, targeting a specific entity. APTs usually target
organizations and/or nations for business or political motives, and they require a high
degree of covertness over a long period of time. As the name implies, an APT consists of
three major components: advanced, persistent and threat. The advanced component signifies
the sophisticated techniques used by malware to exploit vulnerabilities in systems. The
persistent component suggests that an external command and control server is continuously
monitoring and extracting data from a specific target. The threat component indicates human
involvement in orchestrating the attack (Musa, n.d.).
a) Stuxnet (sabotage)
Stuxnet is an advanced worm, discovered in July 2010, that was designed to infect
industrial programmable logic controllers (PLCs). It infected at least 22 manufacturing
sites and had a major impact on Iran's nuclear enrichment program, but it also infected a
U.S. manufacturing plant. Stuxnet is the first malware known to target industrial processes,
and the cost of eliminating it is not negligible.
PLCs allow the automation of electromechanical processes, such as the operation of the
centrifuges used to separate nuclear material. Stuxnet spread by exploiting four zero-day
vulnerabilities and was able to compromise Iranian PLCs, collecting information on
industrial systems and disrupting the uranium enrichment process. Stuxnet reportedly
ruined almost 20% of Iran's nuclear centrifuges.
Stuxnet has three modules: a worm that executes all routines related to the main
payload of the attack; a link file that automatically executes the propagated copies of the worm;
and a rootkit component responsible for hiding all malicious files and processes, preventing
detection of the presence of Stuxnet.
Stuxnet first infected a computer via an infected USB thumb drive and then propagated
through the network, scanning for Siemens Step 7 software. In the absence of the targeted
software and controllers, the malware remained dormant. When it reached a targeted
computer, it installed a rootkit in the Siemens software that modified the parameters sent to
the machinery while reporting normal operating parameters to the user.
b) Red October (espionage)
Red October was a cyberespionage malware program discovered in October 2012 and
uncovered in January 2013 by Kaspersky Lab. The malware was reportedly operating
worldwide for up to five years prior to discovery, transmitting information ranging from
diplomatic secrets to personal information, including from mobile devices. The primary vectors
used to install the malware were emails containing attached documents that exploited
vulnerabilities in Microsoft Word and Excel. Later, a webpage was found that exploited a
known vulnerability in the Java browser plugin. Red October was termed an advanced
cyberespionage campaign intended to target diplomatic, governmental and scientific research
organizations worldwide.
After the campaign was revealed, domain registrars and hosting companies shut down
as many as 60 domains used by the malware creators to receive information. The attackers
shut down their operation as well. [8]
c) Regin (espionage)
According to major antivirus companies, Regin displays a degree of technical
competence that is rarely seen, and it has been used to spy on governments, infrastructure
operators, businesses, researchers and private individuals.
Regin was revealed by several antivirus companies in November 2014. Figure 1
illustrates the main countries targeted by this campaign. [9]
Figure 1: Main countries targeted by Regin (Russia, Saudi Arabia, Mexico, Ireland, India,
Afghanistan, Iran, Belgium and others)
Antivirus companies have been unable to determine the attack vector used. Regin has
been compared to Stuxnet and is thought to have been developed by well-funded teams of
developers, possibly a government, as a targeted multi-purpose data collection tool. [10]
The attack comprises several stages, each of which is encrypted or hidden except the
first, which is the initial dropper of the malware. This modular approach enables the
attacker to customise the attack scenario, and the multiple stages hinder reverse engineers
by forcing them to spend more time reconstructing the puzzle from fewer clues. A thorough
analysis of the attack is possible only once all five stages have been acquired. In addition,
some of the components were never written to disk and resided only in memory to evade
disk forensics.
The first stage starts a chain of events, downloading, decrypting and executing the next
stage, up to the fifth and final one. Figure 2 displays these stages in cascading order. [11]
d) Flame (espionage)
Flame is a modular computer malware discovered in 2012. The malware was used for
targeted cyber espionage in Middle Eastern countries.
Its discovery was announced on 28 May 2012 by the MAHER Centre of the Iranian National
Computer Emergency Response Team (CERT), Kaspersky Lab and CrySyS Lab of the
Budapest University of Technology and Economics.
Flame can spread to other systems over a local network or via USB thumb drives. It
can record audio, screenshots, keyboard activity and network traffic. The program also
records Skype conversations and can turn infected computers into Bluetooth beacons which
attempt to download contact information from nearby Bluetooth-enabled devices. This data,
along with locally stored documents, is sent on to one of several command and control servers
that are scattered around the world. The program then awaits further instructions from these
servers. [12]
Malware analysis is the process of taking apart executable code and studying its
behaviour. During reverse engineering, the analyst must stay focused on finding answers to
the questions above.
The reverse engineer must create a safe and isolated environment in which to conduct
the malware analysis. Isolated environments are mandatory in order to prevent accidental
damage to production systems. One solution is to build a network that is physically
separated from the corporate network and from the Internet, with its own network services,
hosts and software. Several tools can be used to simulate the Internet and other
communication protocols. Further techniques for this purpose are detailed later in this
thesis.
There are two approaches to disassembling and analysing software: dynamic analysis
and static analysis. Both are important, neither is better than the other, and they are used
together to compare results and confirm observed behaviour.
The objective of this paper is to describe how a malware sample should be analysed, how
to approach heavily obfuscated code, how to redirect the code flow to expose behaviour that
occurs only under certain conditions, how to understand detection mechanisms, etc.
The last tool covered in this paper is TCPView. This utility, illustrated in Figure 5,
displays the connections created and maintained by each process and can also close those
connections or terminate the owning process. It is often used to view the network activity
and network destinations of malware. Results from TCPView can be combined with network
traffic captured by tools like Wireshark or tcpdump to get a more detailed picture of the
malware's network activity.
3. Sandboxing
In computer science, a sandbox is an isolated and controlled environment in which
untrusted code is run to observe its behaviour and visualise its network activity. The
isolation prevents malicious software from damaging the host operating system. After the
software has finished executing and the logs have been acquired, the sandbox
automatically reverts to a clean state.
Sandboxes are used to automate malware analysis and are designed to scale easily,
enabling fast triage of hundreds of malware samples. These tools must implement a series of
techniques to evade detection, simulate human activity and execute as much of the malware
sample's code as possible. Modern malware includes routines that detect the presence of
debuggers, sandboxed environments, virtual machines and other monitoring tools installed
on the system, and it can also include logic bombs to evade dynamic analysis. Logic bombs
are routines that check whether a condition has been met and only then run the malicious
code. These checks can include calendar dates, specific keystrokes, the time elapsed since
installation, etc.
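To make the idea concrete, the Python sketch below models a hypothetical logic-bomb trigger that fires only after a calendar date has been reached and once a minimum time has elapsed since installation. The function name, dates and delay are illustrative assumptions, not values taken from a real sample. It shows why a sandbox run lasting only a few minutes observes nothing unless the sandbox manipulates the clock.

```python
from datetime import datetime, timedelta


def trigger_met(now: datetime, installed_at: datetime,
                activation_date: datetime, min_elapsed: timedelta) -> bool:
    """Hypothetical logic-bomb condition: fire only after a calendar date
    AND after a minimum delay since installation."""
    date_reached = now >= activation_date
    waited_long_enough = now - installed_at >= min_elapsed
    return date_reached and waited_long_enough


installed = datetime(2015, 3, 1, 12, 0)   # hypothetical installation time
activation = datetime(2015, 4, 1)         # hypothetical activation date
delay = timedelta(days=7)                 # hypothetical minimum dwell time

# Short sandbox run: five minutes after installation, before the activation date.
print(trigger_met(installed + timedelta(minutes=5), installed, activation, delay))  # False
# Sandbox that has moved the clock past both conditions.
print(trigger_met(datetime(2015, 4, 2), installed, activation, delay))             # True
```

Sandboxes that fast-forward time or change the system date try to satisfy exactly this kind of condition, while an analyst working statically can locate the comparison and patch it.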
There are two types of sandboxes: emulated and virtualised.
A software emulator is a program that simulates the functionality of another
application or piece of hardware. Emulators can collect detailed information about the
execution of a particular application and produce a report based on it. Emulators can also
be developed to run software written for other CPU architectures, such as Android
applications running on ARM processors. A major disadvantage of emulators is their lower
performance, caused by the stacked software layers.
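To illustrate why emulators can collect such detailed data, the Python sketch below interprets a hypothetical toy instruction set and records every executed instruction and every intercepted "API" call. The opcodes and the "Sleep" label are invented for the example; a real emulator decodes actual machine code for the emulated CPU.

```python
def emulate(program: list[tuple], max_steps: int = 10_000):
    """Interpret a hypothetical toy instruction set, logging what executes."""
    regs = {"A": 0, "B": 0}
    trace: list[tuple[int, str]] = []   # (program counter, opcode) of each step
    api_calls: list[str] = []           # "API" calls intercepted by the emulator
    pc = 0
    for _ in range(max_steps):          # step limit guards against endless loops
        if pc >= len(program):
            break
        op, *args = program[pc]
        trace.append((pc, op))
        next_pc = pc + 1
        if op == "MOV":                 # MOV reg, imm
            regs[args[0]] = args[1]
        elif op == "DEC":               # DEC reg
            regs[args[0]] -= 1
        elif op == "JNZ":               # JNZ reg, target: jump if reg != 0
            if regs[args[0]] != 0:
                next_pc = args[1]
        elif op == "API":               # stand-in for an OS call; only logged
            api_calls.append(args[0])
        elif op == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode {op!r}")
        pc = next_pc
    return regs, trace, api_calls


sample = [("MOV", "A", 3), ("API", "Sleep"), ("DEC", "A"), ("JNZ", "A", 1), ("HALT",)]
regs, trace, api_calls = emulate(sample)
print(api_calls)   # ['Sleep', 'Sleep', 'Sleep']
print(len(trace))  # 11
```

Because every instruction passes through the interpreter loop, collecting a complete trace is straightforward, and the same loop is the source of the performance penalty described above.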
With virtualisation, a program runs directly on the underlying system hardware. The
hypervisor manages the access of the different virtual machines to the underlying hardware,
so that they are isolated and independent of each other. However, because programs inside
a virtual machine are executed on the actual hardware, detailed data collection is difficult
to achieve. An advantage of this approach is that programs run at the native speed of the
host hardware; in addition, virtualisation software lets the user create a snapshot (a restore
point) and revert to it when needed. Malware analysts are advised to deploy physically
isolated environments and to emulate the Internet to enable optimal analysis using dynamic
techniques.
This fragment was disassembled using linear disassembly, which produced an inaccurate
interpretation caused by overlapping code. The instruction at address 0x00401005 is a
conditional jump to address 0x0040100E, and the disassembler does not process both paths
because they overlap. More precisely, XOR EAX,EAX clears EAX and sets the zero flag, so
the JZ (jump if zero) instruction at address 0x00401005 is always taken; the disassembler,
however, has decoded the fall-through branch, which is never executed. This
misinterpretation can be corrected by manually instructing the disassembler to parse code
starting at a specific address, which in this case is 0x0040100E. Figure 9 displays the correct
code flow.
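The effect can be reproduced with a toy decoder. The Python sketch below uses a hypothetical instruction set with x86-like opcodes and lengths; it is not a real x86 decoder, and its addresses are invented rather than taken from Figure 9. Decoding from the start of the fragment swallows the real code into a fake five-byte CALL formed by the rogue byte, whereas decoding from the jump target, as the analyst does manually, recovers the real instructions.

```python
# Hypothetical toy instruction set: opcode -> (mnemonic, length in bytes).
TOY_ISA = {
    0x31: ("XOR", 2),   # XOR reg, reg
    0x74: ("JZ", 2),    # JZ rel8
    0xE8: ("CALL", 5),  # CALL rel32
    0x6A: ("PUSH", 2),  # PUSH imm8
    0xC3: ("RET", 1),
}


def linear_sweep(code: bytes, base: int, start: int) -> list[tuple[int, str]]:
    """Decode instructions back to back from `start`, ignoring control flow."""
    listing = []
    off = start - base
    while off < len(code):
        mnemonic, size = TOY_ISA.get(code[off], ("DB", 1))  # unknown byte -> data
        listing.append((base + off, mnemonic))
        off += size
    return listing


BASE = 0x401000
CODE = bytes([
    0x31, 0xC0,   # 0x401000 XOR  (always sets the zero flag)
    0x74, 0x01,   # 0x401002 JZ   0x401005 (always taken)
    0xE8,         # 0x401004 rogue byte, never executed
    0x6A, 0x00,   # 0x401005 PUSH 0
    0x6A, 0x01,   # 0x401007 PUSH 1
    0xC3,         # 0x401009 RET
])

for addr, mnem in linear_sweep(CODE, BASE, BASE):
    print(hex(addr), mnem)   # XOR, JZ, CALL (swallows both PUSHes), RET
for addr, mnem in linear_sweep(CODE, BASE, 0x401005):
    print(hex(addr), mnem)   # PUSH, PUSH, RET
```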
After the code logic has been corrected, the instructions have a precise meaning and
can be easily read by the analyst. A call to the library function stricmp() and its parameters
can now be observed. Further adjustments can be made to sanitise the code. Because the
branching was there only to mislead the analysis and had no effect on the program's
behaviour, in Figure 10 the instructions between addresses 0x00401005 and 0x0040100D
have been replaced with NOP instructions.
After this final modification, the code flow is easier to read, which also helps any future
analysts who may review this project.
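The patch itself amounts to overwriting a byte range with the one-byte x86 NOP opcode 0x90. The Python sketch below applies it to a raw byte buffer standing in for the loaded code; the buffer contents and the helper name are hypothetical, while the address range is the one from Figure 10.

```python
NOP = 0x90  # one-byte x86 NOP opcode


def nop_range(image: bytearray, base: int, first: int, last: int) -> None:
    """Overwrite the bytes at virtual addresses first..last (inclusive) with NOPs."""
    if not base <= first <= last < base + len(image):
        raise ValueError("address range lies outside the buffer")
    count = last - first + 1
    image[first - base:first - base + count] = bytes([NOP]) * count


# Hypothetical 16-byte buffer loaded at 0x00401000; patch the Figure 10 range.
code = bytearray(range(16))
nop_range(code, 0x00401000, 0x00401005, 0x0040100D)
print(code.hex())
```

The bounds check matters because slice assignment on a bytearray silently grows or shrinks the buffer if the range is wrong, which would shift every following instruction.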
preserving intellectual property. Malware writers use this technique to elude analysis of
malicious code, conceal its purpose and bypass antivirus scans. The same principle has also
been applied to hardware devices for the same purposes.
Obfuscation is applied in virtually all programming languages, including C/C++, Java,
.NET, JavaScript, PHP, Python and assembly. It is most frequently applied using automated
tools that scramble the code, but there are also manual, labour-intensive approaches that
achieve more complex output in limited scenarios.
Obfuscation does not alter the functionality of the software, only the way its code looks.
Figure 11 shows an example of an obfuscated Windows binary, most probably created for
amusement, and Figure 12 shows the running process of that binary.
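One of the simplest obfuscation transforms, widely used to hide strings, is a repeating-key XOR. The Python sketch below illustrates it under that assumption (the URL and key are made up): the obfuscated bytes no longer contain the plaintext literal that a strings scan or an antivirus signature would match, yet the program's behaviour is unchanged because the same routine restores the string at run time.

```python
def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Repeating-key XOR; applying it twice with the same key restores the input."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


plain = b"http://example.invalid/gate.php"   # hypothetical string to hide
key = b"\x5a\x13"                            # hypothetical key
hidden = xor_bytes(plain, key)               # what would be stored in the binary

print(plain in hidden)              # False: the literal no longer appears
print(xor_bytes(hidden, key))       # b'http://example.invalid/gate.php'
```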
Code packing is another method to compress and obfuscate the executable file. Code
encryption follows the same approach but adds a symmetric key to the algorithm. This is
achieved by applying mathematical operations to the compiled code using special software
called a packer. The newly generated executable includes a routine (usually at the
beginning of the code) that decompresses the original binary in memory and then passes
execution to the decompressed code. Free packers can be downloaded from the Internet,
while others require a licence and usually offer a stronger obfuscation algorithm or even
multiple algorithms. Depending on the complexity of the packing algorithm, reverse
engineers can identify the unpacking routine, note the memory addresses passed to it as
parameters, and locate the call or jump to the newly unpacked code; from there, the
unpacked code can be dumped from memory to disk.
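Compressed or encrypted data is close to random, so packed sections tend to show unusually high byte entropy, and many triage tools use this as a first hint that a sample is packed. The Python sketch below computes Shannon entropy in bits per byte; the 7.0 threshold is an assumed rule of thumb rather than a fixed standard, and random bytes stand in for a packed section.

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())


def looks_packed(data: bytes, threshold: float = 7.0) -> bool:
    """Heuristic: high entropy suggests compressed or encrypted content.
    The 7.0 threshold is an assumed rule of thumb, not a standard value."""
    return shannon_entropy(data) > threshold


repetitive = bytes(100) + b"\x55\x8b\xec" * 50   # zero padding plus repeated bytes
random_like = os.urandom(4096)                    # stand-in for a packed section
print(round(shannon_entropy(repetitive), 2), looks_packed(repetitive))
print(round(shannon_entropy(random_like), 2), looks_packed(random_like))
```

High entropy is only an indicator: embedded compressed resources such as images also score high, so the result should be confirmed by locating the unpacking routine.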
There are many well-known packers, such as UPX, PECompact, ASPack, Themida and FSG.
In the following paragraphs UPX will be used as an example. Ultimate Packer for
eXecutables (UPX) [14] is one of the most popular packers because it is easy to use and free
to download. The executable code, data, Import Address Table (IAT) and Import Descriptor
Table (IDT) are compressed into a single section called UPX1, as Figure 14 illustrates. UPX0
is a placeholder for the unpacked code, UPX1 is the container of the packed code and UPX2
contains the unpacking routine.
Figure 14: Sections of the original executable (PE header, .text, .data) and of the
UPX-packed executable (PE header, UPX0, UPX1, UPX2)
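A first triage step is often to look at the section names. The Python sketch below reads them directly from the PE section table using the documented header offsets (e_lfanew at 0x3C, the 20-byte COFF file header, then 40-byte section headers); the helper names and the example path are hypothetical. Section names are trivial to change, so a UPX0/UPX1 match is only a hint, and its absence proves nothing.

```python
import struct


def section_names(pe: bytes) -> list[str]:
    """Return the section names listed in a PE file's section table."""
    if pe[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    e_lfanew = struct.unpack_from("<I", pe, 0x3C)[0]       # offset of the PE signature
    if pe[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    coff = e_lfanew + 4                                      # IMAGE_FILE_HEADER
    num_sections = struct.unpack_from("<H", pe, coff + 2)[0]
    opt_size = struct.unpack_from("<H", pe, coff + 16)[0]    # SizeOfOptionalHeader
    table = coff + 20 + opt_size                             # first IMAGE_SECTION_HEADER
    names = []
    for i in range(num_sections):
        raw = pe[table + 40 * i: table + 40 * i + 8]         # 8-byte Name field
        names.append(raw.rstrip(b"\0").decode("ascii", errors="replace"))
    return names


def looks_upx_packed(pe: bytes) -> bool:
    """Heuristic only: default UPX output keeps the UPX0/UPX1 section names."""
    return {"UPX0", "UPX1"} <= set(section_names(pe))


# Usage (hypothetical path):
# with open("sample.exe", "rb") as f:
#     data = f.read()
# print(section_names(data), looks_upx_packed(data))
```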
As expected, when the packed executable is launched, the unpacking routine performs the
reverse operation. Figure 15 shows that the entry point of the packed executable is inside
UPX2 and that, after the unpacking routine has finished, execution must jump back to the
original entry point of the application. The unpacking routine also rebuilds the Import
Address Table (IAT) and Import Descriptor Table (IDT) of the original executable.
Figure 15: Entry point of the packed executable in UPX2 and original entry point of the
unpacked code, with a zoomed-in view
Once this jump has been identified, the packed executable can be opened in a debugger
(such as OllyDbg or WinDbg) and run up to the jump, after which the process can be dumped
from memory so that analysis can continue on the unpacked code.
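Because the original code is restored into UPX0, the jump back to the original entry point typically targets an address far away, inside that section (in 32-bit UPX stubs it usually appears shortly after a POPAD). A simple way to shortlist candidates is to scan the stub for JMP rel32 instructions (opcode 0xE9) whose destination falls inside UPX0. The Python sketch below does this on raw bytes; all addresses are hypothetical, and because 0xE9 can also occur inside other instructions, every hit must be confirmed in the debugger.

```python
import struct


def find_tail_jumps(stub: bytes, stub_va: int,
                    upx0_start: int, upx0_end: int) -> list[tuple[int, int]]:
    """Find JMP rel32 (0xE9) instructions whose target lies in [upx0_start, upx0_end)."""
    hits = []
    for off in range(len(stub) - 4):                      # opcode + 4 displacement bytes
        if stub[off] != 0xE9:
            continue
        rel = struct.unpack_from("<i", stub, off + 1)[0]  # signed 32-bit displacement
        target = stub_va + off + 5 + rel                  # relative to the next instruction
        if upx0_start <= target < upx0_end:
            hits.append((stub_va + off, target))
    return hits


STUB_VA = 0x409000                     # hypothetical address of the stub bytes
OEP = 0x401234                         # hypothetical original entry point in UPX0
rel = OEP - (STUB_VA + 1 + 5)          # JMP placed right after a one-byte POPAD (0x61)
stub = b"\x61" + b"\xE9" + struct.pack("<i", rel)
print([(hex(a), hex(t)) for a, t in find_tail_jumps(stub, STUB_VA, 0x401000, 0x405000)])
```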
Figure 19 displays a function that takes a number and returns the product of that
number and 42. IDA is unable to deduce any meaningful information about this function
because it has been tricked by a rogue RETN instruction. Notice that it has not detected the
presence of an argument to this function. The first three instructions accomplish the task of
jumping to the real start of the function. To repair this example, the first three instructions
can be patched with NOP instructions and the function boundaries adjusted to cover the real
function. This technique is often used to elude analysis and is easy to repair.
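One common encoding of this trick, used here as an assumption since Figure 19 may differ in detail, is `call $+5` (E8 00 00 00 00), `add dword [esp], 5` (83 04 24 05) and `retn` (C3). The CALL pushes the address of the ADD, the ADD moves that return address five bytes forward to just past the RETN, and the RETN therefore "returns" into the real function body. The Python sketch below locates this exact 10-byte sequence and replaces it with NOPs, which is the repair described above; other variants of the trick would need other patterns, and the helper name is hypothetical.

```python
# call $+5 ; add dword [esp], 5 ; retn   (one common form of the trick)
RETN_TRICK = bytes.fromhex("E800000000" "83042405" "C3")
NOP = 0x90


def patch_retn_trick(code: bytearray) -> list[int]:
    """NOP out every occurrence of RETN_TRICK; return the patched offsets."""
    hits = []
    pos = code.find(RETN_TRICK)
    while pos != -1:
        code[pos:pos + len(RETN_TRICK)] = bytes([NOP]) * len(RETN_TRICK)
        hits.append(pos)
        pos = code.find(RETN_TRICK, pos + len(RETN_TRICK))
    return hits


# Hypothetical function: the trick followed by the real body (mov eax, [esp+4] ...).
func = bytearray(RETN_TRICK + bytes.fromhex("8B442404"))
print(patch_retn_trick(func), func.hex())
```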
2. Anti-Debugging Techniques
This subchapter aims to classify and present some of the most popular anti-debugging
techniques used on Windows NT-based operating systems.
Figure 20 IsDebuggerPresent
In some scenarios the programmer writes a large number of such checks in the code,
which would force the reverse engineer to patch each one. A simpler way to defeat all of
these debugger checks is to modify a flag in the Process Environment Block (PEB) of the
running process. Internally, IsDebuggerPresent() checks a flag named BeingDebugged
(Figure 22) in the PEB, which resides in the user-mode address space of the process. If the
flag is TRUE, the function also returns TRUE; therefore, after attaching a debugger to the
application, the analyst can set the flag to FALSE and a global patch is applied (Figure 23).
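The Python sketch below models that logic on a byte buffer standing in for the PEB. In both the 32-bit and the 64-bit PEB, BeingDebugged is the byte at offset 0x2; everything else here (the buffer and the helper names) is a hypothetical illustration, since in practice the byte is edited in the debugged process's memory from within the debugger.

```python
BEING_DEBUGGED = 0x2  # offset of PEB.BeingDebugged (same in 32- and 64-bit PEBs)


def is_debugger_present(peb: bytes) -> bool:
    """Mirror of what IsDebuggerPresent() evaluates: PEB.BeingDebugged != 0."""
    return peb[BEING_DEBUGGED] != 0


def clear_being_debugged(peb: bytearray) -> None:
    """The 'global patch': every later check now sees no debugger."""
    peb[BEING_DEBUGGED] = 0


peb = bytearray(0x20)
peb[BEING_DEBUGGED] = 1            # set by the system when a debugger is attached
print(is_debugger_present(peb))    # True
clear_being_debugged(peb)
print(is_debugger_present(peb))    # False
```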
b) Timing Checks
The theory behind timing checks is that executing a section of code, especially a small
one, should require only a small amount of time. Therefore, if a timed section of code takes
longer than a set limit, a debugger is most likely attached and someone is stepping through
the code. This type of check has many small variations, and the most common example uses
the IA-32 RDTSC instruction. Other variants use different timing functions, such as GetTime,
GetTickCount (illustrated in Figure 24) and QueryPerformanceCounter (also illustrated in
Figure 24). [15]
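The pattern is the same whichever timer is used: read a clock, run the section, read the clock again and compare the difference with a threshold. The Python sketch below expresses it with time.perf_counter_ns standing in for RDTSC or QueryPerformanceCounter; the 50 ms threshold and the function names are arbitrary assumptions. Making the clock a parameter is a design choice for testing, and it also mirrors how an analyst defeats such checks: by controlling what the timer returns or by patching the comparison.

```python
import time
from typing import Callable


def timing_check(section: Callable[[], None], threshold_ns: int,
                 clock: Callable[[], int] = time.perf_counter_ns) -> bool:
    """Return True if `section` took longer than threshold_ns, the condition a
    timing-based anti-debugging check treats as 'probably being stepped'."""
    start = clock()
    section()
    return clock() - start > threshold_ns


def small_section() -> None:
    sum(range(100))     # a few microseconds of work when run normally


print(timing_check(small_section, threshold_ns=50_000_000))  # normally False
```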
If the software detects software breakpoints, the reverse engineer can use hardware
breakpoints instead, bearing in mind that only four hardware breakpoints can be set at a
time because they rely on special debug registers within the processor. Another solution is
to patch the executable if there are only a few checks. If a breakpoint needs to be set inside
an API function but the packer searches that API code for breakpoints, the analyst can
instead set the breakpoint on the Unicode version of the API, which is eventually called by
the ANSI version of that function. [16]
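Software breakpoints are implemented by overwriting the first byte of an instruction with the INT3 opcode 0xCC, so a naive check simply scans code for that byte, as in the Python sketch below (the address and the helper name are hypothetical). Because 0xCC also occurs legitimately as an operand byte or as padding between functions, such a scan can give false positives, and because hardware breakpoints leave the code untouched, they are invisible to it.

```python
INT3 = 0xCC  # one-byte software breakpoint opcode written by debuggers


def find_int3(code: bytes, base: int) -> list[int]:
    """Return the virtual addresses of every 0xCC byte in `code`."""
    return [base + i for i, b in enumerate(code) if b == INT3]


# Hypothetical API prologue (mov edi,edi; push ebp; mov ebp,esp) with a
# breakpoint replacing the first byte of "mov ebp,esp".
prologue = bytes.fromhex("8BFF55" "CC" "EC")
print([hex(a) for a in find_int3(prologue, 0x7C800000)])  # ['0x7c800003']
```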
B. Malware Behaviour
This subchapter aims to classify and present some of the most popular techniques
used by malware authors to hide malicious activity from the user, intercept Windows API
calls, modify code at run time, etc.
1. Process Hollowing
Process hollowing is another technique used to hide the presence of a malicious
process. A bootstrap application creates a seemingly legitimate process (e.g. svchost.exe)
in a suspended state; the legitimate image is then unmapped and replaced with the code that
is to be hidden from the user. If the preferred image base of the new image does not match
that of the old image, the new image must be rebased. Once the new image is loaded in
memory, the EAX register of the suspended thread is set to the entry point. The process is
then resumed and the entry point of the new image is executed.
To successfully perform process hollowing the source image must meet a few
requirements:
1. To increase compatibility, the subsystem of the source image should be set to
Windows.
2. The compiler should use the static version of the run-time library to remove the
dependence on the Visual C++ runtime DLL. This can be achieved by using the
/MT or /MTd compiler options.
3. Either the preferred base address (assuming it has one) of the source image
must match that of the destination image, or the source must contain a
relocation table and the image needs to be rebased to the address of the
destination. For compatibility reasons the rebasing route is preferred. The
/DYNAMICBASE or /FIXED:NO linker options can be used to generate a
relocation table.
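Rebasing means adding the difference between the actual and the preferred base address to every absolute address listed in the relocation table. The Python sketch below handles only the 32-bit IMAGE_REL_BASED_HIGHLOW entries (type 3) and the padding entries (type 0) found in typical 32-bit images, working on a flat byte buffer that stands in for the mapped image; 64-bit images use IMAGE_REL_BASED_DIR64 (type 10) instead, which this sketch deliberately rejects. The function name and example values are hypothetical.

```python
import struct

IMAGE_REL_BASED_ABSOLUTE = 0   # padding entry, nothing to do
IMAGE_REL_BASED_HIGHLOW = 3    # 32-bit absolute address to fix up


def apply_relocations(image: bytearray, reloc: bytes,
                      preferred_base: int, new_base: int) -> int:
    """Patch a mapped 32-bit image for loading at new_base; return the fix-up count."""
    delta = (new_base - preferred_base) & 0xFFFFFFFF
    pos = fixups = 0
    while pos + 8 <= len(reloc):
        page_rva, block_size = struct.unpack_from("<II", reloc, pos)  # IMAGE_BASE_RELOCATION
        if block_size < 8:
            break                                                      # malformed or terminator
        for i in range((block_size - 8) // 2):
            entry = struct.unpack_from("<H", reloc, pos + 8 + 2 * i)[0]
            rel_type, offset = entry >> 12, entry & 0x0FFF             # 4-bit type, 12-bit offset
            if rel_type == IMAGE_REL_BASED_HIGHLOW:
                rva = page_rva + offset
                value = struct.unpack_from("<I", image, rva)[0]
                struct.pack_into("<I", image, rva, (value + delta) & 0xFFFFFFFF)
                fixups += 1
            elif rel_type != IMAGE_REL_BASED_ABSOLUTE:
                raise NotImplementedError(f"relocation type {rel_type} not handled")
        pos += block_size
    return fixups


image = bytearray(0x2000)
struct.pack_into("<I", image, 0x1004, 0x00401010)           # absolute pointer in the image
reloc = struct.pack("<IIHH", 0x1000, 12, 0x3004, 0x0000)    # HIGHLOW at RVA 0x1004 + padding
print(apply_relocations(image, reloc, 0x00400000, 0x10000000))   # 1
print(hex(struct.unpack_from("<I", image, 0x1004)[0]))           # 0x10001010
```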
Figure 29 exemplifies the steps taken to achieve a successful hollowing. First, the target
process is launched in a suspended state by passing the CREATE_SUSPENDED flag to
CreateProcess(). Once the process has been created, its memory space can be modified.
Next, the base address of the destination image is located by querying the process with
NtQueryInformationProcess() to acquire the address of the process environment block
(PEB), which holds the image base address. The destination image is then unmapped,
typically with NtUnmapViewOfSection(). Next, a new block of memory is allocated for the
source image. The size of the block is determined by the SizeOfImage member of the source
image's optional header. To simplify the code, authors usually flag the entire block as
PAGE_EXECUTE_READWRITE.
After the memory has been allocated, the new image is copied into the destination
process memory with WriteProcessMemory(), starting with its portable executable (PE)
headers. Following that, the data of each section is copied. Applying the proper memory
protection options to the different sections makes the hollowing harder to detect. Finally,
the thread is resumed, executing the entry point of the source image. [17]
Figure 29: Process hollowing of svchost.exe by Hollower.exe: CreateProcess() in a suspended
state, VirtualAllocEx(), WriteProcessMemory() for the PE header and the sections (.text,
.data), ResumeThread()
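The copy step is essentially a change of layout: the headers are copied first, then each section's raw data is placed at its virtual address rather than at its file offset. The Python sketch below performs that layout on a local buffer that stands in for the block allocated in the remote process, with each slice assignment corresponding to one WriteProcessMemory() call. The function name and section values are hypothetical, and per-section memory protection is not modelled.

```python
def map_image(file_data: bytes, size_of_headers: int, size_of_image: int,
              sections: list[tuple[int, int, int]]) -> bytearray:
    """Lay out a PE file the way it appears once loaded in memory.

    `sections` holds (VirtualAddress, PointerToRawData, SizeOfRawData) per section.
    """
    if size_of_headers > min(len(file_data), size_of_image):
        raise ValueError("headers do not fit")
    image = bytearray(size_of_image)                        # stands in for VirtualAllocEx(SizeOfImage)
    image[:size_of_headers] = file_data[:size_of_headers]   # first write: the PE headers
    for va, raw_ptr, raw_size in sections:                  # then one write per section
        chunk = file_data[raw_ptr:raw_ptr + raw_size]
        if len(chunk) != raw_size or va + raw_size > size_of_image:
            raise ValueError(f"section at RVA {va:#x} does not fit")
        image[va:va + raw_size] = chunk                     # placed at its RVA, not its file offset
    return image


headers, text, data = b"H" * 0x200, b"T" * 0x200, b"D" * 0x200
pe_file = headers + text + data                               # hypothetical file layout
image = map_image(pe_file, 0x200, 0x3000, [(0x1000, 0x200, 0x200), (0x2000, 0x400, 0x200)])
print(len(image), image[0x1000:0x1004], image[0x2000:0x2004])
```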
This technique most commonly targets svchost.exe, but it can also be used with other
Windows processes. svchost.exe is a popular choice because many instances of it normally
run at the same time, as can be seen in Task Manager, so an additional one easily escapes the
user's attention. Other Windows processes run as a single instance, and an experienced user
could easily spot two identical processes where only one should be present. Figure 30 shows
a screenshot of the pseudocode view of a malware sample that uses process hollowing to hide
its malicious process.
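The same reasoning can be turned around for triage. The Python sketch below flags processes that appear more than once although they are expected to run as a single instance; the contents of SINGLE_INSTANCE and the function name are assumptions to be adapted to the Windows version being examined. As the paragraph above explains, a hollowed svchost.exe would not be caught by this check at all.

```python
from collections import Counter

# Assumed list of images that normally run as a single instance; adjust per system.
SINGLE_INSTANCE = {"lsass.exe", "services.exe"}


def duplicated_singletons(process_names: list[str]) -> list[str]:
    """Return expected-single-instance images that appear more than once."""
    counts = Counter(name.lower() for name in process_names)
    return sorted(name for name in SINGLE_INSTANCE if counts[name] > 1)


snapshot = ["System", "services.exe", "lsass.exe", "svchost.exe", "svchost.exe",
            "svchost.exe", "LSASS.EXE", "explorer.exe"]
print(duplicated_singletons(snapshot))  # ['lsass.exe']
```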
2. Process/DLL Injection
Code injection is a technique used by programmers to run code within the address
space of another process, and there are multiple ways to achieve it. These techniques influence
the behaviour of the targeted program by adding new components to the running process. They
are often used to add malicious behaviour to processes, but also legitimately, for example to
apply Windows hot patches that do not require an operating system reboot.
The injected code can hook system function calls, steal identifiable information or
perform other malicious activity in the name of a legitimate process. Code injection is therefore
a way of hiding malicious code from automated and human detection: incident response
personnel usually search for malicious processes with odd disk paths, while injected code runs
inside a process with a legitimate path. This impersonation also helps to bypass restrictions that
the operating system enforces on a particular process. Note that the injecting process needs an
appropriate level of privileges on the system before it can manipulate another program's
memory. [18]
Code injection can be performed in both user mode and kernel mode. The most popular
methods used to achieve user-mode injection are Windows API functions, AppInit_DLLs and
Detours, also known as function hooking. [18]
The Windows API functions used to achieve process injection are:
OpenProcess() - used to attach to the running process;
VirtualAllocEx() - used to allocate memory inside the process;
WriteProcessMemory() - used to copy the code into the process memory at the address
allocated in the previous step;
CreateRemoteThread() - used to execute the copied code in a new thread of the target process.
DLL injection steps (figure), Injector.exe acting on Injected.exe: Step 1, Attach - OpenProcess();
Step 2, Allocate memory - VirtualAllocEx(); Step 3, Write code and determine address -
WriteProcessMemory(); Step 4, Execute - CreateRemoteThread()
In the first step, a handle to the target process is acquired so that injector.exe can
interact with injected.exe. This is achieved by calling OpenProcess() and requesting the access
rights needed for the next steps. The second step allocates memory in the targeted process so
that code can be copied into it: VirtualAllocEx() takes the amount of memory to allocate as one
of its parameters and reserves new space of the desired length. Step three copies the new code
into the process with WriteProcessMemory(). Most execution functions take a memory address
to start from, so that address must be identified. In the classic DLL-injection variant, the data
written into the target is the path of the DLL and the start address is that of LoadLibraryA(),
obtained with GetProcAddress(); because kernel32.dll is mapped at the same address in every
process of a session, this address is also valid in the target, where LoadLibraryA() then loads the
DLL. Finally, the new code is executed in a separate thread. CreateRemoteThread() is probably
the most commonly used function because it is very reliable, but alternatives such as
NtCreateThreadEx() or QueueUserAPC() can be used to avoid detection.
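In the classic variant, the remote thread starts directly at LoadLibraryA() with the address of the written DLL path as its parameter, which gives analysts a simple indicator. The Python sketch below flags such threads in a captured trace. The `RemoteThread` record, the `loader_exports` map (export addresses, which an analyst would resolve beforehand) and the `written_strings` map of data captured from WriteProcessMemory() are all hypothetical inputs assumed for illustration.

```python
from dataclasses import dataclass


@dataclass
class RemoteThread:
    """Hypothetical record of a CreateRemoteThread() call captured in a sandbox trace."""
    source_pid: int
    target_pid: int
    start_address: int  # lpStartAddress argument
    parameter: int      # lpParameter argument


def classic_dll_injections(threads, loader_exports, written_strings):
    """Flag remote threads that start directly at a DLL-loading export.

    loader_exports:  {address: export name}, e.g. the resolved address of LoadLibraryA
    written_strings: {(pid, address): text} captured from WriteProcessMemory() calls
    Returns (source_pid, target_pid, export name, DLL path or None) tuples.
    """
    findings = []
    for thread in threads:
        if thread.source_pid == thread.target_pid:
            continue  # a thread created in the caller's own process is not injection
        export = loader_exports.get(thread.start_address)
        if export is None:
            continue
        # The thread parameter should point at the DLL path written into the target.
        dll_path = written_strings.get((thread.target_pid, thread.parameter))
        findings.append((thread.source_pid, thread.target_pid, export, dll_path))
    return findings


if __name__ == "__main__":
    exports = {0x76A01000: "LoadLibraryA"}  # hypothetical address
    written = {(200, 0x5000): r"C:\temp\payload.dll"}  # hypothetical captured data
    events = [RemoteThread(100, 200, 0x76A01000, 0x5000)]
    print(classic_dll_injections(events, exports, written))
```

Threads that start elsewhere, for example in freshly allocated memory, are not reported by this check and need a separate rule.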
Figure 32 - WriteFile function flow chart (user mode / kernel mode; kernel-mode functions:
KiFastCallEntry(), NtWriteFile(), IopSynchronousServiceTail(), IofCallDriver())
The term kernel-mode function refers to any function that resides in the kernel space of
the operating system; for a user application to call one of these, it must enter kernel space via
the SYSENTER instruction. The first three functions can be hooked from user mode, while the
others require a kernel-mode driver. A hook placed at any point in the flow chart can intercept
and tamper with the data that passes through it, as can be seen in Figure 33.
Figure 33 - WriteFile function flow chart with a HookingFunction() inserted into the call chain
(KiFastCallEntry(), NtWriteFile(), IopSynchronousServiceTail(), IofCallDriver())
There are multiple types of hooks, categorised by the type of function or table to which
the hook is applied. The main types are summarised below.
IAT Hooks - The Import Address Table (IAT) is specific to each application and contains
pointers to the API functions it imports. Because DLL functions may reside at different
addresses from one run to another, the application calls them indirectly through this table.
When the application is executed, the loader places a pointer to each required DLL function at
the appropriate slot in the IAT. If an application injects code into another process, it can modify
the addresses in that process's IAT and receive control every time one of the affected functions
is called.
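A straightforward check is to verify that every IAT slot points into the DLL it is imported from. The Python sketch below assumes the analyst has already extracted the IAT contents and the list of loaded modules (for example from a memory dump); the dictionary layouts are hypothetical. Forwarded exports legitimately resolve into a different DLL, so a pointer into another module is only reported as a lead, while a pointer into memory that belongs to no module is far more suspicious.

```python
def owner_module(address, modules):
    """Return the module whose [base, base + size) range contains address, else None."""
    for name, (base, size) in modules.items():
        if base <= address < base + size:
            return name
    return None


def check_iat(iat, modules):
    """Compare every IAT slot with the address range of the DLL it is imported from.

    iat:     {(dll name, function name): address currently stored in the slot}
    modules: {dll name: (base address, image size)} for the loaded modules
    """
    findings = []
    for (dll, function), address in iat.items():
        owner = owner_module(address, modules)
        if owner is None:
            findings.append((dll, function, address, "points to memory outside any module"))
        elif owner != dll:
            # May be a legitimate forwarded export, so this is only a lead to investigate.
            findings.append((dll, function, address, f"points into {owner}"))
    return findings


if __name__ == "__main__":
    loaded = {"kernel32.dll": (0x10000000, 0x100000)}  # hypothetical layout
    slots = {("kernel32.dll", "WriteFile"): 0x00400500}  # hypothetical hooked slot
    for finding in check_iat(slots, loaded):
        print(finding)
```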
Inline hooking is a method of gaining control when a function is called, before the called
function has completed execution. The flow of execution is redirected by modifying the first
few bytes of the target function. A common method is to overwrite the first five bytes of the
function with a jump to malicious code; the malicious function can then read the original
function arguments and do whatever it desires. If the malicious function needs the results of
the original function, it can execute the five bytes that were replaced and then jump five bytes
into the original function, skipping the injected jump and thus avoiding an infinite loop. The
concept is exemplified in Figure 34.
Figure 34 - Inline hook
Original function: mov EAX, 0x88; mov EDX, 0x7FEE0000; call dword ptr ds:[edx]; ...; retn 0x2C
Hooked function: jmp maliciousCode; mov EDX, 0x7FEE0000; call dword ptr ds:[edx]; ...; retn 0x2C
Malicious code: // do bad stuff; ...; jmp originalBytes
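When an analyst finds a function whose first byte has been replaced, the destination of the five-byte jump can be decoded directly. The Python sketch below assumes 32-bit code and a hook that uses the E9 rel32 form (opcode E9 followed by a signed 32-bit displacement relative to the end of the instruction); other redirection styles, such as push/ret pairs or indirect jumps, are not handled. The addresses in the example are hypothetical.

```python
import struct

JMP_REL32 = 0xE9  # opcode of the 5-byte near jump: E9 + signed 32-bit displacement


def jump_target(code: bytes, address: int):
    """Return the absolute target if code starts with a 5-byte E9 jump, else None.

    address is the virtual address of code[0]; the displacement is relative
    to the end of the 5-byte instruction, i.e. to address + 5.
    """
    if len(code) < 5 or code[0] != JMP_REL32:
        return None
    (displacement,) = struct.unpack_from("<i", code, 1)  # little-endian, signed
    return (address + 5 + displacement) & 0xFFFFFFFF  # wrap to a 32-bit address


if __name__ == "__main__":
    # Unhooked prologue from Figure 34: mov eax, 0x88 (B8 88 00 00 00) is not a jump.
    print(jump_target(bytes.fromhex("B888000000"), 0x7C90D000))  # prints None
    # Hypothetical hooked prologue jumping 0x1000 bytes past the instruction.
    hooked = b"\xE9" + struct.pack("<i", 0x1000)
    print(hex(jump_target(hooked, 0x7C90D000)))  # prints 0x7c90e005
```

The `mov EAX, 0x88` instruction in Figure 34 is itself exactly five bytes long, which is why it can be replaced by the jump without splitting an instruction.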
In user mode, inline hooks are usually located inside functions exported by a DLL. The
most effective way of detecting and bypassing these hooks is to compare each DLL in memory
against its original code. An application would need to get a list of the loaded DLLs, find the
original files, and align and load their sections into memory. With this clean copy in memory,
the application can parse the export address table and match each exported function with its
counterpart in the copied DLL. To bypass hooks, the application can then either restore the
overwritten code using the bytes from the newly loaded DLL or resolve imports in the newly
loaded DLL and use it instead. This technique of bypassing DLL hooks effectively amounts to
writing a custom implementation of LoadLibrary().
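The comparison step can be sketched as a byte-level diff between a function (or section) read from the running process and the same region of the clean copy. The Python sketch below assumes both buffers cover the same virtual address range and that the clean copy has already been relocated to the same base; otherwise relocated addresses would show up as false positives. Reading the memory and mapping the file are outside the scope of the sketch.

```python
def modified_ranges(in_memory: bytes, clean: bytes, base: int):
    """Return (address, clean bytes, current bytes) for each run of differing bytes.

    in_memory: bytes read from the running process
    clean:     the same region of the DLL reloaded from disk (already relocated)
    base:      virtual address of the first byte of both buffers
    """
    if len(in_memory) != len(clean):
        raise ValueError("regions must have the same length")
    ranges = []
    start = None  # offset where the current run of differences began
    for offset, (current, original) in enumerate(zip(in_memory, clean)):
        if current != original:
            if start is None:
                start = offset
        elif start is not None:
            ranges.append((base + start, clean[start:offset], in_memory[start:offset]))
            start = None
    if start is not None:  # a run that reaches the end of the region
        ranges.append((base + start, clean[start:], in_memory[start:]))
    return ranges


if __name__ == "__main__":
    original = bytes.fromhex("B888000000BA0000EE7F")  # mov eax, 0x88; mov edx, 0x7FEE0000
    hooked = bytes.fromhex("E910203040") + original[5:]  # first five bytes replaced
    for address, before, after in modified_ranges(hooked, original, 0x1000):
        print(hex(address), before.hex(), after.hex())
```

Restoring the hook, as described above, would amount to writing each reported `before` slice back at its address.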
In kernel mode, inter-modular jumps are uncommon. Hooks in ntoskrnl can therefore usually
be detected by disassembling each instruction of each function and looking for jumps or calls
that point outside ntoskrnl and into other modules. The method described for user mode can
also be applied here.
SSDT Hooks - The System Service Dispatch Table (SSDT) is a table of pointers to the
various Zw/Nt functions that are callable from user mode. A malicious application can replace
pointers in the SSDT with pointers to its own code. All pointers in the SSDT should point inside
ntoskrnl; a pointer that leads outside ntoskrnl is a strong indicator that the entry has been
hooked. However, a rootkit can also modify ntoskrnl.exe in memory and slip code into unused
space, in which case the pointer still points within ntoskrnl.
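Both checks from the paragraph above can be expressed in a few lines: a range check against the ntoskrnl image, and, to catch code slipped into ntoskrnl itself, a comparison with a trusted baseline. The Python sketch assumes 32-bit Windows, where the SSDT holds absolute addresses (64-bit Windows stores encoded offsets instead), and assumes the baseline comes from the same kernel build rebased to the same load address. Acquiring the table from kernel memory is not shown.

```python
def check_ssdt(ssdt, kernel_base, kernel_size, baseline=None):
    """Return {index: reason} for suspicious SSDT entries.

    ssdt:     service routine addresses read from the live table
    baseline: optional expected addresses (same kernel build, same load address)
    """
    kernel_end = kernel_base + kernel_size
    findings = {}
    for index, pointer in enumerate(ssdt):
        if not kernel_base <= pointer < kernel_end:
            findings[index] = "outside ntoskrnl"
        elif baseline is not None and pointer != baseline[index]:
            # Catches code slipped into unused space inside ntoskrnl itself.
            findings[index] = "inside ntoskrnl but differs from baseline"
    return findings


if __name__ == "__main__":
    live = [0x80501000, 0xF7A00000, 0x80502000]  # hypothetical table contents
    expected = [0x80501000, 0x80501800, 0x80502100]  # hypothetical clean baseline
    print(check_ssdt(live, 0x80400000, 0x200000, expected))
```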
IRP Hooks - Each loaded driver contains a table of 28 function pointers that other drivers
can invoke through IoCallDriver(); the pointers correspond to operations such as read and write
(IRP_MJ_READ/IRP_MJ_WRITE). These pointers can easily be replaced by other drivers.
Generally, all IRP major function pointers of a driver should point to code within the driver's
address space. There are scenarios in which this rule is legitimately broken, but checking it is
nevertheless a good step towards detecting malicious drivers that have redirected the IRP major
functions of legitimate drivers to their own code.
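The rule above can be sketched as a check of a driver's MajorFunction table against its own image range. The Python sketch assumes the 28 pointers and the driver's base and size have already been read from memory; the `allowed` set is a hypothetical way to whitelist the legitimate exceptions mentioned above, such as entries a driver does not handle, which point to a default handler in the kernel. Only a few IRP_MJ codes are named; the rest are printed numerically.

```python
IRP_MJ_MAXIMUM_FUNCTION = 0x1B  # highest major function code, so the table has 28 entries

# Names of a few major function codes (as defined in wdm.h); others are shown numerically.
IRP_MJ_NAMES = {
    0x00: "IRP_MJ_CREATE",
    0x02: "IRP_MJ_CLOSE",
    0x03: "IRP_MJ_READ",
    0x04: "IRP_MJ_WRITE",
    0x0E: "IRP_MJ_DEVICE_CONTROL",
}


def foreign_dispatch_entries(major_function, driver_base, driver_size, allowed=frozenset()):
    """List MajorFunction entries that point outside the driver's own image.

    major_function: the 28 dispatch pointers read from the DRIVER_OBJECT
    allowed:        addresses known to be legitimate outside the driver (hypothetical list)
    """
    if len(major_function) != IRP_MJ_MAXIMUM_FUNCTION + 1:
        raise ValueError("expected 28 MajorFunction entries")
    driver_end = driver_base + driver_size
    findings = []
    for code, pointer in enumerate(major_function):
        if driver_base <= pointer < driver_end or pointer in allowed:
            continue
        findings.append((IRP_MJ_NAMES.get(code, f"IRP_MJ 0x{code:02X}"), pointer))
    return findings


if __name__ == "__main__":
    default_handler = 0x80500000  # hypothetical kernel default handler
    table = [default_handler] * 28
    table[0x03] = 0xF8001200  # read handler inside the driver
    table[0x04] = 0xFA000000  # write handler redirected elsewhere
    print(foreign_dispatch_entries(table, 0xF8000000, 0x10000, {default_handler}))
```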
Different drivers in the driver stack can be patched, e.g. ntfs.sys (partition level) or the
lower-level atapi.sys (sector level).
Figure 35 - DKOM
The best approach to finding a hidden process is to create a tool that reads and parses
kernel space and searches for EPROCESS structures that no other structure points to. The same
technique can be applied to a dump of the RAM contents, parsed offline on an uninfected
machine. After a hidden process has been discovered, the analyst can dump the process from
memory and continue the analysis by disassembling its code.
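One common way to apply this idea is a cross-view comparison: EPROCESS structures found by scanning memory are compared with those reachable by walking the ActiveProcessLinks list, which DKOM unlinks hidden processes from. The Python sketch assumes both views have already been extracted (for example from a RAM dump) into the hypothetical structures shown. Terminated processes whose structures have not yet been freed can also appear in the scan, so each result should still be checked, for example against its exit time.

```python
def hidden_processes(scanned, linked):
    """Cross-view comparison of two views of the process list.

    scanned: {EPROCESS address: (pid, image name)} found by scanning memory
    linked:  set of EPROCESS addresses reached by walking ActiveProcessLinks
    Returns the (pid, image name) pairs present in memory but missing from the list.
    """
    return sorted(info for address, info in scanned.items() if address not in linked)


if __name__ == "__main__":
    scan = {0x8100: (4, "System"), 0x8200: (668, "svchost.exe"), 0x8300: (1337, "hidden.exe")}
    walk = {0x8100, 0x8200}  # hypothetical addresses reached through the list
    print(hidden_processes(scan, walk))  # prints [(1337, 'hidden.exe')]
```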
b) Driver dispatch routines hooking
To hook a driver's dispatch routines, the DRIVER_OBJECT structure of the target driver must
first be obtained, commonly through functions such as IoGetDeviceObjectPointer(),
IoGetLowerDeviceObject() and IoGetDeviceAttachmentBaseRef(); entries of its dispatch table are
then replaced with the rootkit's own functions. Typically, disk drivers are hooked to filter access
to the files and sectors that contain the malware's code.
5. Domain Generation Algorithm
Domain Generation Algorithms (DGAs) are used by malware authors to generate domains
from a seed derived from a date value. The malware uses these domains to connect to its
command and control centre. The malware author runs the same algorithm to register domains
ahead of time, and after their time period has expired the domains are dropped. The large
number of domains generated by this technique makes it difficult for law enforcement to shut
down botnets effectively: the bots only attempt to contact the domains that are valid for the
current time period, so taking down the domains of one period does not stop the bots from
reaching the domains of the next. This technique was popularised by the Conficker worm.
Typically, these algorithms use a large array of words and generate a domain by choosing
two or three words from the array, concatenating them and appending a top-level domain at the
end (e.g. [word1][word2].com). In other cases, the DGA generates domains of a certain length
composed of numbers and letters that do not form meaningful words and look random.
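A minimal sketch of the word-based scheme, assuming a hypothetical eight-word list, a seed equal to the date's ordinal number and Python's `random.Random` as the generator. It illustrates the structure only and is not the algorithm of any particular malware family.

```python
import datetime
import random

# Hypothetical word list; real samples embed their own, usually much larger, lists.
WORDS = ("blue", "river", "stone", "market", "cloud", "garden", "silver", "north")


def word_dga(day: datetime.date, count: int, words=WORDS, tld: str = ".com"):
    """Generate `count` domains made of two or three words followed by `tld`."""
    rng = random.Random(day.toordinal())  # the date is the seed
    domains = []
    for _ in range(count):
        n_words = rng.choice((2, 3))
        domains.append("".join(rng.choice(words) for _ in range(n_words)) + tld)
    return domains


if __name__ == "__main__":
    for domain in word_dga(datetime.date(2015, 7, 4), 5):
        print(domain)
```

Python's generator was chosen only for readability. Real samples implement their own pseudo-random routine, and a reverse engineer replicating a DGA has to reproduce that routine exactly, otherwise the generated domains will not match those contacted by the bots.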
Third parties (law enforcement or other attackers) can use these algorithms to their
advantage. They can reverse engineer the malware, find the DGA function, replicate its
functionality, and generate and register the domains before the bot master does. By doing this,
law enforcement can impersonate the command and control centre and send the bots commands
to uninstall themselves, while other attackers can use the same approach to install their own
malware and steal the botnet from the original bot master. Registering the generated domains in
advance in order to capture bots is called DNS sinkholing.
Over the years, malware authors have learned how to circumvent DNS sinkholes by adding
to their malware a routine that verifies digitally signed commands. If a received command
cannot be verified and authenticated, the bot drops it and continues its activity.
The diagram in Figure 36 shows how the Dyreza banking malware generates its domains
from the current date and a number in the hardcoded range [0, 333). The example illustrates the
DGA for July 4th 2015, with the number 16 as input. Since each number in the range yields one
domain, the algorithm generates 333 domains every day.
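The date-plus-counter structure, and the way a defender could pre-compute domains for sinkholing, can be sketched as follows. This is explicitly not Dyreza's algorithm: the SHA-256 of a "date|number" string, the 12-character length and the `.cc` top-level domain are hypothetical choices, made only to produce the random-looking names described earlier.

```python
import datetime
import hashlib


def hash_dga(day: datetime.date, number: int, length: int = 12, tld: str = ".cc") -> str:
    """Derive one random-looking domain from the date and a number (hypothetical scheme)."""
    seed = f"{day.isoformat()}|{number}".encode()  # e.g. b"2015-07-04|16"
    return hashlib.sha256(seed).hexdigest()[:length] + tld


def domains_for_day(day: datetime.date, count: int = 333):
    """One domain per number in [0, count), mirroring the range [0, 333) in the text."""
    return [hash_dga(day, number) for number in range(count)]


def sinkhole_candidates(start: datetime.date, days: int, count: int = 333):
    """Domains a defender could register in advance for the next `days` days."""
    result = {}
    for offset in range(days):
        day = start + datetime.timedelta(days=offset)
        result[day] = domains_for_day(day, count)
    return result


if __name__ == "__main__":
    print(hash_dga(datetime.date(2015, 7, 4), 16))
    upcoming = sinkhole_candidates(datetime.date(2015, 7, 4), 3)
    print(sum(len(domains) for domains in upcoming.values()))  # prints 999
```

With 333 domains per day, even a few days of look-ahead means hundreds of registrations, which is part of what makes sinkholing a DGA costly for defenders.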
Now that the memory page has been made writable, the code can start modifying itself.
Assuming that a NOP instruction is present earlier in the code (Figure 38), the code shown in
Figure 39 changes it into inc %ebx.
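At the byte level, the change amounts to replacing the one-byte NOP opcode (0x90) with the one-byte 32-bit encoding of inc ebx (0x43). The Python sketch below only simulates this on a `bytearray`; it does not execute code or change page protections, which the real sample must do first as described above. The two-byte buffer is a hypothetical example.

```python
NOP = 0x90      # one-byte no-operation
INC_EBX = 0x43  # one-byte "inc ebx" in 32-bit x86 (0x43 is a REX prefix in 64-bit mode)


def patch_nop_to_inc_ebx(code: bytearray, offset: int) -> None:
    """Overwrite the NOP at `offset` with inc ebx, simulating the self-modification."""
    if code[offset] != NOP:
        raise ValueError(f"expected NOP at offset {offset}, found {code[offset]:#04x}")
    code[offset] = INC_EBX


if __name__ == "__main__":
    buffer = bytearray(b"\x90\xc3")  # nop; ret
    patch_nop_to_inc_ebx(buffer, 0)
    print(buffer.hex())  # prints 43c3
```

Because the patched instruction has the same length as the original one, the following instructions keep their addresses, which keeps the example simple; larger rewrites have to respect instruction boundaries.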
This is a very simple example with no major effect on the executed code, but the technique
can be scaled to much greater capabilities. Combined with debugger-detection checks, it allows a
malware author to make the malware dramatically change its execution or simply delete its
malicious parts, making reverse engineering of the software much more difficult.
7. Persistence
After a computer has been infected, the malware has to maintain control over the machine
by implementing techniques that guarantee its activity after a reboot. The most commonly used
techniques in the Windows operating system involve the registry and the start-up folder.
Investigating these persistence locations in the Windows Registry and start-up folders is a
frequent technique employed by forensic investigators to identify malicious software installed
on a host. Besides these common and easy-to-use techniques, there are others that do not leave
the usual registry or start-up traces behind: a DLL simply placed in a specific directory with a
specific name is loaded by the operating system without any further checks. There are a total of
1032 path and name combinations in which a 32-bit DLL can be placed to be loaded
automatically at boot. 64-bit DLLs have a much wider range of possible paths and names,
because more 64-bit processes run at boot time on 64-bit Windows.
a) DLL Search Order Hijacking
When an application requires a DLL, either by statically importing it or by calling
LoadLibrary(), the Windows operating system searches for that DLL in a predefined sequence of
locations. Figure 40 displays the order in which the search is performed. An important point to
keep in mind from this figure is that the first place searched is the directory from which the
executable itself was loaded. However, if the requested DLL name is listed in the \\.\KnownDlls
object, the DLL is always loaded from the System32 folder. This object is populated at boot time
using data from the registry key HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Session
Manager\KnownDLLs.
Figure 40 - DLL search order: directory of the application, current directory, system directory,
Windows directory
KnownDlls contains a list of about 30 of the most used DLLs. ws2_32.dll, the DLL used
for networking, is present in that list, so the Windows operating system always loads it from
system32, regardless of the path from which the application is launched. This security feature,
applied to the most important system DLLs, prevents an attacker from loading its own
ws2_32.dll instead of the original one. However, some DLLs important to the operating system
are not present in KnownDlls, and for them the DLL search order can be hijacked. Two of these
DLLs are iphlpapi.dll and mswsock.dll. [23]
Executables inside system32 are not susceptible to this attack, because the first directory
searched, their own, is system32 itself; malware authors must rely on other techniques to
compromise them. Explorer.exe, however, is a critical executable that resides in the Windows
directory and requires DLLs that are not in KnownDlls. This allows an attacker to place a
malicious ntshrui.dll next to explorer.exe, where it is found before the legitimate copy in
system32. This is more of a forensic scenario, but the malware analyst should also be aware of it.
[23]
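The resolution logic can be simulated to check which copy of a DLL a given executable would load. The Python sketch below is a simplification under stated assumptions: it checks the KnownDLLs list first, then follows the four locations shown in Figure 40, and ignores the 16-bit system directory, the PATH directories and other loader features. The default order matches Figure 40; on modern Windows, SafeDllSearchMode is enabled by default and moves the current directory after the Windows directory, which `safe_search_mode=True` reproduces. All paths are hypothetical examples.

```python
import ntpath


def resolve_dll(dll_name, app_dir, current_dir, system_dir, windows_dir,
                known_dlls, existing_files, safe_search_mode=False):
    """Return the lower-case path the DLL would be loaded from, or None.

    known_dlls:     lower-case DLL names listed under KnownDLLs
    existing_files: lower-case full paths of the files present on disk
    """
    name = dll_name.lower()
    if name in known_dlls:
        return ntpath.join(system_dir.lower(), name)  # search order is bypassed
    if safe_search_mode:
        order = (app_dir, system_dir, windows_dir, current_dir)
    else:
        order = (app_dir, current_dir, system_dir, windows_dir)  # order of Figure 40
    for directory in order:
        candidate = ntpath.join(directory.lower(), name)
        if candidate in existing_files:
            return candidate
    return None


if __name__ == "__main__":
    system32, windows = r"C:\Windows\System32", r"C:\Windows"
    files = {r"c:\windows\ntshrui.dll", r"c:\windows\system32\ntshrui.dll"}
    # explorer.exe lives in C:\Windows, so the planted copy next to it wins.
    print(resolve_dll("ntshrui.dll", windows, r"C:\Users\user", system32, windows,
                      {"ws2_32.dll"}, files))  # prints c:\windows\ntshrui.dll
```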
This problem has long existed in the Windows operating system and may persist in the
future because of compatibility requirements of older software.
Table of figures:
References:
1. http://en.wikipedia.org/wiki/Malware#History_of_viruses_and_worms
2. http://www.cisco.com/web/about/security/intelligence/virus-worm-diffs.html
3. https://cve.mitre.org/
4. https://software.imdea.org/~juanca/papers/ppi_usenixsec11.pdf
5. http://securityintelligence.com/3-ways-steal-corporate-credentials/#.VTU8WfmUd8E
6. http://www.wordstream.com/black-hat-seo
7. http://en.wikipedia.org/wiki/Ransomware
8. http://www.kaspersky.com/about/news/virus/2013/Kaspersky_Lab_Identifies_Operation_Red_October_an_Advanced_Cyber_Espionage_Campaign_Targeting_Diplomatic_and_Government_Institutions_Worldwide
9. http://www.symantec.com/connect/blogs/regin-top-tier-espionage-tool-enables-stealthy-surveillance
10. http://en.wikipedia.org/wiki/Regin_(malware)#cite_note-intercept20041124-3
11. http://www.symantec.com/connect/blogs/regin-top-tier-espionage-tool-enables-stealthy-surveillance
12. http://en.wikipedia.org/wiki/Flame_%28malware%29
13. Practical Malware Analysis, by Michael Sikorski and Andrew Honig
14. http://upx.sourceforge.net/
15. http://www.codeproject.com/Articles/30815/An-Anti-Reverse-Engineering-Guide
16. https://www.blackhat.com/presentations/bh-usa-07/Yason/Whitepaper/bh-usa-07-yason-WP.pdf
17. http://www.autosectools.com/process-hollowing.pdf
18. https://www.blackhat.com/presentations/bh-usa-07/Butler_and_Kendall/Presentation/bh-usa-07-butler_and_kendall.pdf
19. http://blog.opensecurityresearch.com/2013/01/windows-dll-injection-basics.html
20. http://blogs.cisco.com/security/talos/threat-spotlight-dyre
21. http://malwaremusings.com/2012/10/13/self-modifying-code-changing-memory-protection/
22. https://www.hex-rays.com/products/ida/support/download.shtml
23. http://arstechnica.com/security/2015/05/gpu-based-rootkit-and-keylogger-offer-superior-stealth-and-computing-power/
24. http://seclab.stanford.edu/websec/chromium/chromium-security-architecture.pdf