Challenges of Malware Analysis: Obfuscation Techniques
INTERNATIONAL JOURNAL OF INFORMATION SECURITY SCIENCE
Singh et al., Vol.7, No.3
compromised by malware for many reasons, such as:

• To harm the computer system.
• For financial gain.
• To steal confidential or private data.
• To turn systems into bots.
• To make services unavailable to the system.

If we compare traditional malware with new malware, it becomes clear why new malware is so hard to detect. Traditional malware was broad, known, open and one-time, whereas current malware is highly targeted, zero-day, stealthy and persistent, as shown in Figure 2 [[6]]. Attackers keep programming new types of malware and their variants to compromise the security of computer systems.

Figure 2. Comparison between Traditional (past) and Advanced Malware (present).

Today, malware is very specific to a particular goal, whether to disrupt the working of a system or something else, such as stealing important data. To evade malware detectors, new variants are created using various obfuscation techniques. In addition, encoding (encryption, base64) and packing techniques create complex malicious software such as polymorphic, metamorphic and packed malware [[7]], which can defeat malware detection. Analyzing such malware is therefore very hard and time-consuming.

The output of a malware analysis system must allow security organizations to update their malware defence software, so that it can keep pace with the growth of malware and thwart new malware.

The rest of the paper is organized as follows: Section 2 describes the malware analysis methods. Section 3 introduces the anti-static malware analysis techniques. Section 4 presents the dynamic malware analysis obfuscation techniques. Section 5 discusses countermeasures to some anti-analysis techniques. Finally, the paper is concluded.

2. Malware Analysis

Malware analysis is categorized into two main types, static and dynamic, which are described as follows.

2.1 Static Malware Analysis

Static analysis is a basic yet powerful way to analyze malware without running it. In this process, the code of the malware is examined to extract useful information, and malware detection software (antivirus, IDSs, etc.) is designed on the basis of that information. The extracted information can be the signature of the malware file, program structure, executable format, instruction opcodes, etc. Static analysis requires the code of the binary. Therefore, reverse engineering is done to convert the executable malware file into assembly code. Various disassemblers are used to transform binary files into assembly code, such as OllyDbg, IDA Pro [[4]] and Capstone. These disassemblers produce assembly language code, not the source code in which the malware was originally written. The assembly code is then investigated to find a structure or pattern of malicious activity that can be used to detect the malware file and its variants. Examining thousands of lines of assembly code is a tedious job. To ease this problem, various alternatives are followed, such as breaking the program into parts or grouping them on the basis of functionality. Additionally, code obfuscation techniques make the analyst’s job harder. Malware writers use various obfuscation techniques such as code encryption, reordering the program
instructions and dead code insertion to evade malware analysis [[7]].

2.2 Dynamic Malware Analysis

Dynamic analysis is also known as behavioural analysis. It is based on running the malware file and monitoring its interaction with the computer system. For analysis purposes, the malware is run in a controlled environment. In other words, malware files are executed in a virtual environment, because running the malware on the host system would harm it. A virtual environment is created using virtualization tools such as VirtualBox or VMware. A dynamic analysis environment can also be built using emulators and hypervisors [[14]]. When a malware file runs in a monitored environment, various activities are observed, such as the creation of new files, deletion of system or user files, new log entries, registry entries, URLs accessed, data transmitted, etc. Based on these activities, the file is considered benign or malicious. Files that could not be disassembled or examined properly during static analysis can be analyzed in the virtual environment to learn their behaviour. The approaches used in dynamic analysis are explained as follows.

2.2.1 Tracking the flow of information

When malware programs are investigated, it is necessary to know how information is processed by the program. In static analysis, the code of the malware is examined to interpret the flow of information from one instruction to another or from one block to another. However, it is a tedious job because a program consists of thousands of lines of code. This interpretation also depends entirely on the analyst’s ability to follow the flow of information statically, without running the malware. Therefore, running malware is analyzed in a virtual environment (VirtualBox) or in a sandbox (Cuckoo, Norman Sandbox, CWSandbox) in order to get an adequate view of the flow of information. It is done in three basic ways:

• Tainting the sources and sinks
• Address dependencies
• Control flow dependencies

In the tainting approach, labels are assigned to registers or identifiers [[20]]. A data element that is assigned a label is called a tainted source. A variable also becomes tainted if it is assigned from a tainted source. As shown in Figure 3, the variable k is tainted because it may call or trigger suspicious activity. Any instruction that processes a tainted register is detected as a malicious action. The malware file is detected on the basis of the taint information.

Figure 3. Variable k is tainted because it may call or trigger the suspicious activity.

In address dependency, address tainting is used to observe sensitive information leakage [[21]]. Rather than tainting only data variables, address dependency also tracks the flow of information indirectly (through an address held in a pointer). In the example of Figure 4, pointer k is tainted. It is the base pointer used to access the array, and the fifth element is assigned to variable C through this tainted pointer. When a tainted pointer is assigned the address of a register, dereferencing the tainted pointer is detected as a malevolent action, as shown in Figure 4.

Figure 4. Pointer k is tainted.

Moreover, control flow dependency is also used to track the flow of information. In a program, instructions depend on other instructions, and other instructions depend on them in turn. Based on how instructions execute, the analysis evaluates
the flow of data in order to learn about any suspicious event.

2.2.2 Monitoring the function calls

Function call monitoring is the second most used dynamic analysis approach, in which malware programs are monitored to learn which functions they call [[21]]. A malware program can call various types of functions: API (Application Programming Interface) calls, system calls and Windows native calls [[21]]. For example, malware calls functions such as CreateFile, DeleteFile and GetProcAddress. This helps to identify malware files. Malware detection systems are designed on the basis of the order of function calls to detect malware and classify it into proper categories. The process used to intercept function calls is known as hooking.

3. Anti-Static Analysis Methods

Obfuscation means making something unclear or obscure so that it is not understandable. Malware writers therefore use several obfuscation techniques to evade analysis. Since ancient times, various camouflages have been used to hide the actual information. For example, when a king had to send information to another king, secret and hidden methods were used to keep the data confidential. The purpose of these approaches was to keep important information secret. In the modern computer era, various algorithms are used for the confidentiality, integrity and authentication of data. Similarly, malware developers use obfuscation techniques to conceal malicious code and bypass malware detection systems (antivirus). Obfuscation techniques can be divided into two categories: anti-static and anti-dynamic analysis techniques. In this section, the most commonly used anti-static obfuscation methods are explained.

3.1 Change the order of the code

Changing the order of program instructions is a simple obfuscation approach [[8]]. Unconditional jump statements are inserted into the program code to change the order of the code without affecting the actual behaviour of the malware program, as shown in Figure 5. In this example it seems very simple to find the original order, but for hundreds of lines of code it becomes cumbersome for the analyst to find the actual order.

Figure 5. (a) Original x86 assembly code; (b) reordered code using unconditional jump instructions.

3.2 Redundant Data Insertion

Malware writers insert dead code into the program to create a new version of the same malware, simply to increase the analyst’s overhead [[19]]. This approach can evade signature-based detection systems: when redundant code is inserted, a different signature is generated. This form of obfuscation affects static analysis only, because in static analysis it is difficult to identify dead code, which contributes nothing to the working of the malicious software. Investigating the dead code is thus extra overhead for the malware analyst. The redundant or dead code does not affect the original purpose of the malicious software. Unconditional jump statements are used to bypass the redundant code block, which retains the original execution order of the malware. Moreover, a series of NOP (the no-operation instruction in x86) statements is inserted into the malware to create a new variant, as shown in Figure 6.
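As a rough illustration of why inserted NOP instructions defeat a byte-level signature, the sketch below hashes a toy three-instruction x86 sequence before and after padding it with NOP bytes (0x90). Using SHA-256 as the "signature", the sample instructions, and the helper names `assemble`, `insert_nops` and `strip_nops` are all assumptions made for this example; they are not taken from the paper or from any particular detector.

```python
import hashlib

NOP = b"\x90"  # single-byte x86 NOP instruction


def assemble(instructions):
    """Concatenate per-instruction byte strings into one code blob."""
    return b"".join(instructions)


def signature(code):
    """Toy 'signature': SHA-256 over the raw bytes (an assumption)."""
    return hashlib.sha256(code).hexdigest()


def insert_nops(instructions, count=2):
    """Create a variant by adding `count` NOPs after every instruction."""
    variant = []
    for ins in instructions:
        variant.append(ins)
        variant.extend([NOP] * count)
    return variant


def strip_nops(instructions):
    """Normalise a variant by dropping instructions that are plain NOPs."""
    return [ins for ins in instructions if ins != NOP]


# Hypothetical sample, kept as a list so NOPs land on instruction boundaries.
original = [
    bytes.fromhex("b801000000"),  # mov eax, 1
    bytes.fromhex("bb02000000"),  # mov ebx, 2
    bytes.fromhex("01d8"),        # add eax, ebx
]
variant = insert_nops(original)

same_before = signature(assemble(original)) == signature(assemble(variant))
same_after = signature(assemble(original)) == signature(assemble(strip_nops(variant)))
print("signatures match before normalisation:", same_before)
print("signatures match after stripping NOPs:", same_after)
```

The stripping step works here only because the toy program is already split into instructions. On a real binary the instruction boundaries must first be recovered by a disassembler, and semantic NOPs (instructions that do nothing without being the NOP opcode) are not caught by this simple filter.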
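The code reordering idea of Section 3.1 can be sketched with a toy interpreter: the blocks of a small program are laid out in a different order and stitched together with unconditional jumps, so the layout on disk changes while the executed sequence stays the same. The instruction tuples, label names and the `run` helper below are hypothetical and only model the control flow; they do not execute real x86 code.

```python
def run(program):
    """Run a toy program and return the executed non-jump instructions.

    Each entry is (label, opcode, operand). 'jmp' transfers control to a
    label, 'halt' stops execution, and everything else is merely recorded.
    """
    labels = {ins[0]: i for i, ins in enumerate(program) if ins[0]}
    trace, pc = [], 0
    while pc < len(program):
        _, op, arg = program[pc]
        if op == "jmp":
            pc = labels[arg]
        elif op == "halt":
            break
        else:
            trace.append((op, arg))
            pc += 1
    return trace


def layout(program):
    """Return the instructions in file order, ignoring jumps and halt."""
    return [(op, arg) for _, op, arg in program if op not in ("jmp", "halt")]


original = [
    (None, "push", "eax"),
    (None, "mov", "eax, 1"),
    (None, "call", "payload"),
    (None, "halt", None),
]

# Same instructions, stored in reverse order and linked by jumps.
reordered = [
    (None, "jmp", "L1"),
    ("L3", "call", "payload"),
    (None, "halt", None),
    ("L2", "mov", "eax, 1"),
    (None, "jmp", "L3"),
    ("L1", "push", "eax"),
    (None, "jmp", "L2"),
]

print("same behaviour:", run(original) == run(reordered))
print("same layout:   ", layout(original) == layout(reordered))
```

Following the jumps, as `run` does, recovers the original order. That is the intuition behind control-flow-based countermeasures, which work on the order of execution and not on the order of the bytes in the file.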
b. Oligomorphic

Oligomorphism implies few structures. It is a Greek term combining two words: oligo (i.e. a small number of) and morphe, meaning form. Oligomorphic malware overcomes the limitation of simple encrypted malware, in which the same decryptor is used to create the copies of the malware file [[19]]. In oligomorphic malware, the decryptor changes into a different form each time it decrypts the malware file, while retaining the same semantics. Win95/Memorial was an oligomorphic malware that was capable of creating 96 variants of the original file. The problem with oligomorphic malware is that only a limited number of decryptors can be made. Consequently, a malware detector can use this weakness to detect every possible variant of the malware file.

d. Metamorphic

Igor Muttik defined metamorphic malware as: “Metamorphics are body-polymorphics”. Metamorphic malware does not have a constant body and a decryptor, because it does not use any encryption or packing technique to thwart analysis [[14], [23]]. Such malware transforms its binary code dynamically to evade detection. Unlike oligomorphic and polymorphic malware, it does not reveal a constant body in memory. Metamorphic malware takes another form during runtime in memory, which is why it is known as dynamic code obfuscated malware.

As early as 1998, Vecna, a malware writer, implemented the metamorphic malware Win95/Regswap, which exchanges the registers used in its code, as shown in Figure 10.
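The register-exchange idea can be sketched as follows. The instruction listing, the register mapping and the helpers `swap_registers` and `normalize` are hypothetical and are not a reproduction of Figure 10. The sketch assumes a detector that abstracts register names away: two variants that differ only in register choice then compare equal, even though their text (and hence any exact-match signature) differs.

```python
import re

REGISTERS = ("eax", "ebx", "ecx", "edx", "esi", "edi")
REG_PATTERN = re.compile(r"\b(" + "|".join(REGISTERS) + r")\b")


def swap_registers(code, mapping):
    """Return a variant of `code` with registers renamed via `mapping`."""
    def rename(match):
        reg = match.group(1)
        return mapping.get(reg, reg)
    return [REG_PATTERN.sub(rename, ins) for ins in code]


def normalize(code):
    """Replace each register by a placeholder numbered by first use."""
    seen = {}

    def placeholder(match):
        reg = match.group(1)
        return seen.setdefault(reg, "R%d" % len(seen))

    return [REG_PATTERN.sub(placeholder, ins) for ins in code]


# Hypothetical code fragment (assembly text, not real malware).
original = [
    "pop edx",
    "mov edi, 0004h",
    "mov esi, ebp",
    "mov eax, 000Ch",
    "add edx, 0088h",
    "mov ebx, [edx]",
]

# One possible permutation of the general-purpose registers.
mapping = {"edx": "eax", "edi": "ebx", "esi": "edx", "eax": "edi", "ebx": "esi"}
variant = swap_registers(original, mapping)

print("identical text:      ", variant == original)
print("identical normalised:", normalize(variant) == normalize(original))
```

Because only operands change while the sequence of operations stays fixed, this form of metamorphism is comparatively weak. Variants that also reorder or substitute instructions would not be matched by such a simple normalisation.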
when the analyst is using an actual host machine to learn the actual behaviour of the malware. A debugger can be detected in the following ways:

4.3.1 API detection

In the MS Windows operating system, API calls can be used to detect debugging tools. A malware writer can write a small piece of code that checks the BeingDebugged flag in Windows in order to detect a debugger, as shown in Figure 11.

Figure 11. Example of detection code used by malware.

There are many API functions commonly used by malware to learn whether it is being analyzed, such as CheckRemoteDebuggerPresent() and OutputDebugString() [[8], [4], [19], [23]].

4.3.2 Services and handles

Services and handles are used by various malware. Many debuggers have core services whose presence malicious software can use to identify them. SoftICE is a well-known kernel-level debugger; malware can use its service NTICE to detect its presence.

4.3.3 Signature of Debuggers

This is a very simple and effective anti-analysis approach: detecting the presence of a debugger by its signature and address. For example, 83 3D 1B 01 was the signature of an old version of OllyDbg.

4.4 Browser Based Fingerprints

In an analysis environment, the browser is also vulnerable and can be exploited by malware in order to confirm the detection environment [[33], [34]]. There are certain discrepancies in features of the JavaScript language, such as exception handling or parsing, that can lead to detection of the analysis environment, because the browser behaves differently in an analysis environment than on a host operating system. Also, ActiveX behaves differently in a browser in a virtual or emulated environment, which can serve as a fingerprint for detection. In [[33]], two other browser features, HTML parsing and the Document Object Model, are shown to be detectable in an emulated environment.

5. Countermeasures to Some Anti-Analysis Techniques

In this section, countermeasures to the anti-analysis techniques are discussed: countermeasures for redundant code insertion, reordering of the actual malware code, and packed malware.

5.1 Countermeasure for Redundant Code

In practice, the ClamAV anti-virus program provides a solution for NOP instructions [[17]]. This technique concentrates only on viral byte sequences, and semantic NOP byte instructions are overlooked. It is highly dependent on the regular expressions and wildcards used; a poor choice can bring about a high false positive rate. Christodorescu et al. proposed a normalization approach in which NOP and semantic NOP instructions are identified and removed by examining the code. However, this technique is not effective if further obfuscation techniques are used in the malware file: if the malware writer has used an additional obfuscation technique along with NOP instructions, it fails to handle the NOP redundancy. In that case, it is not possible to disassemble the malware code accurately. Besides, checking whether a piece of code is a semantic NOP is undecidable [[4]].

5.2 Countermeasure for Reordering of Code

Christodorescu et al. (2007) proposed an approach that uses a CFG (control flow graph) invariant to detect and remove the reordering of a malware program. Using the invariant CFG, the code is reordered back into the order it had before the
first reordering. However, this technique requires the malware code to be disassembled properly so that the CFG can be built correctly.

5.3 Countermeasure for Packer

Revealing malware protected by archivers is not very hard, as the reverse techniques are available and easily accessible. Defeating packers, by contrast, is substantially more troublesome. First of all, one needs to recognize them reliably. This can be accomplished either by searching for section names inside the packed malware program, which can reveal the packer (e.g. UPX0 and UPX1 if the UPX packer is used), or by looking for other markers, for example few library imports or unusual section sizes (e.g. the size of raw data is 0 while the virtual size is non-zero). The other way is to unpack the packed malware program in order to access the code that represents the actual behaviour of the malware. There are three basic options for unpacking [[15]]:

a. Static: Automated Unpacking

This approach deals with packed malware without running it and uses an automated tool for unpacking. The most commonly used packing tools are UPack, UPX, NSPack, FSG, ASPack, etc. There are various tools, such as PEiD, PE Explorer and PEview, that are capable of unpacking malware files packed with these tools. These tools restore the malware executable to its original form (unpack it) without running the malware file. However, malware writers can use several anti-unpacking mechanisms to evade unpacking, such as data encoding (e.g. base64 coding), encryption and anti-disassembly techniques (multilevel instructions, abuse of pointers and exception handlers) [[23], [24]]. Consequently, unpacking packed malware is a big challenge for the analyst [[15], [23]].

b. Dynamic: Automated Unpacking

In dynamic unpacking, the malware file is executed. When the unpacking routine unpacks the malware file, the original import table is reconstructed. The big hurdle of this approach is finding the beginning of the original code (the original entry point) and the end of the unpacking routine. It requires hard work and expertise. Unfortunately, this is a difficult issue to handle automatically. Consequently, manual intervention is needed to determine the start of the original malicious code.

c. Manual Unpacking

It is not an easy task to find the Original Entry Point (OEP) of malware programs. It requires a lot of hard work and a thorough understanding of the packing tools in order to gain insight into the packed malware file. Unfortunately, there is no such method that can determine the entry point of a packed file.

6. Conclusion

Analysis of malware is a very tedious task. Obfuscation is one of the major factors affecting the analysis of malware. There are two basic ways to analyze malware: signature-based (without executing the file) and behaviour-based (running the file, mostly in a controlled environment). After studying various research papers and whitepapers by security experts, it has been shown that signature-based detection techniques have become obsolete. Signature-based detection techniques also cannot detect new malware. The second alternative is behaviour-based analysis, in which malware files are executed to capture behavioural artifacts. There is also a possibility that complex obfuscated malware can cheat execution environments such as sandboxes and debuggers by not executing its actual behaviour. Even though behaviour-based detection systems are far better than signature-based malware detection systems, they are slow compared to signature-based systems. Therefore, the time consumed is also a big concern when the system must be scanned and a decision given almost instantly. By combining the strengths of both analysis techniques, integrated malware detection systems can provide a solution to both problems