Practical Malware Analysis Based On Sandboxing
Abstract: The past years have shown an increase in both the number and sophistication of cyber-attacks targeting Windows and Linux operating systems. Traditional network security solutions such as firewalls are incapable of detecting and stopping these attacks. In this paper, we describe our distributed firewall solution Distfw and its integration with a sandbox for malware analysis and detection. We demonstrate the effectiveness and shortcomings of such a solution. We use Cuckoo to perform automated analysis of malware samples and compare the results with the ones from manual analysis. We discover that Cuckoo provides similar results in a considerably smaller amount of time.

Keywords: malware, network security, sandbox, malware analysis

I. INTRODUCTION

A threat summary report by F-Secure informs us of an alarmingly low rate of malware detection and mitigation worldwide: 15-20 malware blocked per 10000 users [1]. More and more criminal organizations change their profile and move to cyber-crime because of the low risks involved in cyber-attacks and the fast profits they bring. Due to this trend, many companies are subject to malware attacks, ranging from drive-by attacks to sophisticated targeted attacks.

In order to counter this trend, both security communities and companies have put effort into developing methods to protect their assets, using security products: breach detection appliances, web and email security platforms, etc.

The days when cyber criminals exploited computers and servers using only a couple of scripts that they would share amongst themselves are gone. Now cyber criminals are using specially tailored tools designed to bypass or evade our defenses. It would be a colossal task to manually analyze every suspicious piece of software that exists, therefore automated malware analysis can be very useful.

In this paper, we present our distributed firewall, called Distfw, implemented using iptables for filtering traffic and IPsec for securing the communication. We integrated Distfw with a sandbox for automatically analyzing malicious applications.

We integrated the Cuckoo sandbox solution into our distributed firewall and performed automated experimental evaluation of malware samples. In addition, we performed manual step-by-step malware analysis on the same samples and discovered similar information about the behavior of the executable. From our experimental evaluation, we can conclude that Cuckoo provides similar details regarding the behavior of the malware in a considerably smaller amount of time than manual analysis.

This paper is structured as follows: Section II presents the background and some of the related work, Section III describes the design and implementation of Distfw, Section IV includes details about the integration of Distfw with Cuckoo, Section V describes the experimental evaluation including manual analysis and automatic analysis using Cuckoo, and Section VI presents the conclusions and future work.

II. BACKGROUND AND RELATED WORK

In this section, we explore the types of malware analysis, define the concept of sandbox systems, explain the advantages and disadvantages of sandbox solutions and describe available sandbox systems.

A. Malware Analysis

By analyzing a malware sample, one can determine a lot of useful information: IPs of Command and Control (C&C) servers, indicators of compromise, file access, whether the malware was packed, whether its code is obfuscated, and whether it spreads on the network. All this information can help an investigator determine the impact of the attack: was it a targeted attack or just a drive-by malware attack? The sophistication of the attack can point out whether the attacker is an individual, an organized cyber-crime group, or even a national security entity.

In order to perform malware analysis, several methods are available [2][3][4]: static analysis, memory analysis, dynamic analysis and automatic analysis.

Static analysis – this method consists of obtaining information about the malware without executing it. We may obtain the strings, detect packers and observe certain operations using the disassembled version.

Memory analysis – this method allows investigating the memory of the infected system in order to reveal hidden information about the malware, such as DLLs, hidden network connections, etc.

Dynamic analysis or behavioral analysis – examining the malware’s interaction with the host system at runtime. This includes analyzing the way the malware interacts with the file system, with the network, processes, etc. This method requires
Authorized licensed use limited to: University of Portsmouth. Downloaded on February 28,2024 at 12:54:32 UTC from IEEE Xplore. Restrictions apply.
an isolated environment, in which the malware is launched and its behavior is monitored.

Automatic malware analysis – this is usually done via sandbox systems. There are many reasons for using an automated malware analysis system, the most important being the ability to uncover artifacts about the malware quickly. Usually, analyzing a malware sample requires a lot of effort and skill from the examiner. Even though automated analysis does not always produce the same level of detail, it is a very good starting point in analyzing suspicious files.

B. Sandbox Systems

A sandbox is a security platform for running unknown executables in a dedicated environment without the risk of affecting the production systems. Basically, sandboxes are virtualized environments that simulate live systems to ensure that the tested executable runs in a way that is almost the same, if not identical, to the real environment. Similarly, security sandboxes are used to execute suspicious files in a safe environment in order to analyze their behavior and to provide information regarding attacks to security officers.

Sandbox systems allow monitoring suspicious executable files in an isolated environment while eliminating the risk of compromising live systems. Another important aspect is that sandboxes eliminate a lot of human effort derived from complex and lengthy tasks such as disassembling the executable in order to understand its purpose. This method allows a security administrator without extensive training in malware analysis to perform a triage of suspicious files and only send confirmed malware for analysis.

Nowadays, most of the security products on the market use one or more types of sandboxing for behavior analysis. Most of them are closed source (proprietary), but some notable solutions are provided as open source.

Some of the well-known sandbox systems available are Cuckoo Sandbox and Zerowine. Cuckoo Sandbox [5] is an open source malicious code behavioral analysis system that consists of two components: 1) a Cuckoo Host system, which handles the execution and analysis, and 2) Analysis Guests, which are isolated virtual machines where the malware is executed and from which results are sent back to the Cuckoo Host. Analysis is done using packages, which are scripts that define automated tasks that the Cuckoo Host should perform during the analysis of a target application. Moreover, Cuckoo supports URL analysis in the guest machines, adding the possibility to determine whether the website that the user is accessing is malicious or not.

Zerowine [6] is an open source system that dynamically analyzes the behavior of target applications using Wine. The disadvantage of this solution is that it only analyzes Windows applications and it does so in an emulated environment (Wine).

It should be noted that there are also websites that allow users to submit files for analysis, eliminating the need for dedicated hardware for deployment and usage of dedicated sandbox systems. However, this method does not provide the best results, as some malware targets systems that have certain applications installed or, in the case of Microsoft Windows, specific registry keys.

The most popular websites that provide such services are Anubis and Malwr. Anubis [7] is an online platform that allows a user to submit Windows executables or Android APKs for analysis.

Malwr [8] is an online platform developed by the same team that designed Cuckoo Sandbox. Users are allowed to submit files or URLs for analysis. Additionally, users can view reports of other submitted files if the original submitter configured the analysis report as public.

III. DISTFW DESIGN AND IMPLEMENTATION

We implemented a distributed firewall, called Distfw [9][10]. In the design of this firewall, we have tried to meet the requirements of a distributed firewall, as stated by Steve Bellovin [11]:

• Policy language: the policy language includes the commands given to the scriptable firewall provided by the operating system. In our solution, this is accomplished using iptables commands.

• System management: This is provided by implementing a master/client framework.

• Safe distribution: The security policy is distributed securely to the clients using IPSec.

The main components of the Distfw architecture are the master node and the client nodes. The master node is responsible for the deployment and configuration of openswan on client nodes and for log file integration from all clients. The master node is also responsible for the deployment of iptables rules based on the company policy but also according to sandbox malware analysis of URLs accessed by users or applications submitted for analysis.

The master/client framework is based on a series of scripts, implemented using bash and Expect, which reside on the management machine. The functionalities offered by these scripts are summarized in the following:

• Adding a client machine to the framework

• Adding iptables rules to a client

• Listing iptables rules running on a client

• Capturing URLs accessed by users

A. Adding a client machine to the framework

Based on an IP address and initial credentials (user/password with elevated privileges) to remotely access the client machine via SSH, the script adds a user distfw on the client machine. After that, it modifies the /etc/sudoers file, in order to allow the distfw user to manipulate iptables.

For obvious security reasons, it is recommended to limit the use of the root user as much as possible, and delegate privileges instead. For this reason, a more elegant solution was
to create a user, which will be used only to manipulate iptables on the client machine.

Next, the master node checks whether openswan is installed on the client node, and if not, it automatically installs the package.

Following this step, the master node generates a configuration file for the communication via IPSec with the client node and deploys it on the client node.

The last step of this process is to perform an initial configuration of iptables on the client machine. This is done via a pre-defined list of rules, which are meant to lock down the system. The lockdown uses the 3 pre-defined iptables chains:

• INPUT – all traffic destined to the client machine is dropped, with one exception: SSH connections generated from the management machine.

• FORWARD – all traffic that is supposed to pass through the client machine is dropped. Considering the fact that most clients in this framework are intended to be either end-user machines or servers (web, email, etc.), we consider that there is no real need to allow traffic to be forwarded.

• OUTPUT – only traffic marked as related or established is allowed to pass, everything else is dropped.

This initial lockdown is performed in order to prevent any other traffic to or from the client until it is secured with the iptables rules dictated by the security policy.

The last step of this script is related to creating a new chain of rules, called distfw. From this point on, this is the chain that is used to process traffic related to the client machine.

One of the reasons behind the creation of this chain is to protect the communication channel provided by the INPUT chain, allowing the client to communicate with the management machine. In this case, all further rules are added in the distfw chain of rules, while the INPUT, OUTPUT and FORWARD chains are not modified from now on.

An important consequence of this is that, even if by mistake we send the client a rule that would block the SSH connection with the management machine, this rule will never trigger, and communication with the management machine will not be lost.

While the issue of getting locked out might not seem a big deal when it comes to client machines that are in your campus LAN, it can be a serious problem when it comes to client machines that are in a different geographic location.

B. Adding iptables rules to a client

This script is used for adding a firewall rule for a remote client. The script will prompt the administrator to type in the iptables rule to add and the IP address of the client machine where it will be applied. This is useful when only one or two rules need to be applied.

However, if there are numerous IPs that have to be blocked on several clients, this solution does not scale. To prevent this kind of situation, the script allows the administrator to load a file containing iptables rules and send the rules to the client to be applied.

Finally, the configuration is saved on the client using iptables-save. This is done in order to provide a fallback in case the client machine powers off.

C. Listing iptables rules running on a client

This script is used for displaying the rules configured on a certain client. The script prompts the administrator to type the IP address of the client, and after that, it prints out the active configuration of iptables on that client.

Similar to the methods described above, the distfw user created earlier is used to connect via SSH over the IPSec VPN tunnel to the client node. Using expect, the client’s iptables rules are printed out on the management node.

D. Capturing URLs accessed by users

The script launches httpry, an application designed to monitor HTTP traffic, in the background. It records the URLs accessed by users and periodically sends them to the master node. In our case, they are introduced in the sandbox system for analysis.

This method is implemented in order to analyze potentially malicious URLs used by hackers for drive-by downloads. The script records HTTP requests (GET, POST, etc.) and then filters the results until only the fully qualified domain name is left, which is sent for analysis.

IV. CUCKOO INTEGRATION

In our implementation, we chose to integrate the Cuckoo sandbox into our Distfw distributed firewall solution. The main reason for this choice was the fact that Cuckoo allows guest machines using VirtualBox, KVM or VMware, permitting the analysis of files and applications on most operating systems. Moreover, Cuckoo facilitates the analysis of URLs, thus enabling the administrator to determine whether the websites accessed by the users are legitimate or not. All analysis results are stored in a database, and can be later used for reporting or retrospective analysis.

We installed the Cuckoo sandbox on the same machine that is responsible for managing the distributed firewall (Figure 1). The idea behind this was to integrate the benefits of automated analysis and to use those results in the distributed firewall. Cuckoo relies heavily on Python and there are some Python applications necessary to properly run Cuckoo (e.g. Magic, Pydeep, Yara, Pefile, etc.).

Considering that 93% of malicious programs involved in web attacks are executed via malicious URLs [12], we chose to integrate the sandbox in the distributed firewall implementation and automatically analyze URL requests. In order to achieve this, we created a script that listens for URL requests, saves them to a file and sends them over the existing
IPSec VPN channel to the distributed firewall manager machine. Then each URL is submitted to Cuckoo for analysis.

A. Manual analysis

In order to perform the manual analysis, we created a Windows XP virtual machine, which includes Wireshark, DumpIt [13], Volatility [14] and Ida [15] for disassembly.

The first step is to launch the malware and monitor its network activity. However, neither Wireshark nor the output of the netstat command on the guest reports any network activity. This can mean two things. Either the malware is in an idle state because it detected that it is executed in a virtual environment, or its activity is hidden from the winpcap driver. In order to determine which is the case, we launch Wireshark in the host environment. Now, we can see that the guest machine is actually making connections to an IP address: 95.211.99.27 with the destination port set to 81.

With this information, we can conclude that the application is connecting to its C&C server. However, there are still a lot of questions to be answered: What does the malware do? How does the malware hide its connections? What type of malware is it?

In order to answer these questions, we follow up with a memory analysis of the malware. We begin by dumping the entire contents of RAM with DumpIt and then load the dump in Volatility (an open source multi-platform framework for memory analysis). We choose to run the following jobs: connections, pstree, dlllist and dlldump.

As we look at the output of the connections job in Figure 2, it confirms what Wireshark has already pointed out to us on the host machine: that the malware has network activity, even though it is not visible on the guest machine.

Fig. 3. Output of pstree job in Volatility

The next step is to list the DLLs used by the adbreader.exe application. This is done by issuing the dlllist job. The result can be seen in Figure 4. With this information alone we cannot tell which DLL is part of the malware. However, we can submit the DLLs to an antivirus check to determine which one is part of the malware, and we find that module.132.2498da0.40000.dll is actually the malware itself, while the rest of the DLLs are harmless.

Fig. 4. Output of dlllist in Volatility

At this point, we have managed to identify the file that is responsible for the infection, and the IP address of the C&C server, but we still do not know what the malware actually
does. To answer this question we proceed to load the identified file in a disassembler and analyze the code.

First, we need to know which programming language was used to write the malicious code. Loading adbreader.exe in Ida, we discover that the application was written in Delphi, as we can see from Figure 5. This information is useful for the analysis of the target DLL.

In addition, we find the channel name is "jobs", and the "NICK" is set to be generated in a random fashion each time it connects to the server. The format of the nickname is presented in Figure 9: "n[%s|%s]%s" (for example n[USA|XP]395455), where the number is randomly generated each time based on the processor tick count (clock cycles), XP is the operating system version, and n[USA] is the same each time the malware connects to the server.
Fig. 11. Cuckoo analysis report

The report also provides a list of strings (Figure 12) found in the application.

Fig. 12. Information extracted by the sandbox

It is important to notice that the sandbox does not state in the report whether the analyzed file is malicious or legitimate. This conclusion remains to be drawn by the analyst, as in the case of a manual analysis. However, the very short time required to analyze files, in conjunction with the large amount of information provided to the analyst, can help in investigating suspicious files.

VI. CONCLUSIONS

Considering the increasing number of cyber-attacks, the number of files that need to be analyzed and the time cost of analyzing a single file, we consider that the implementation of an automated analysis system is paramount in order to increase security within a network.

We designed and implemented a distributed firewall for the Linux operating system, using iptables for filtering, IPSec for securing network communication and a sandbox to automatically analyze URLs accessed by users in order to detect malicious applications. The sandbox was integrated in the distributed firewall solution in order to provide an automated mechanism for the detection of potential malware applications.

We presented how the Cuckoo solution can be integrated into our distributed firewall and evaluated its functionality by analyzing a malware sample. We also performed manual malware analysis of the same sample. The automated Cuckoo solution produces similar results in considerably less time.

The malware analysis solution proposed in this paper represents the extension of the Distfw scripts functionality [9][10]. We consider that the addition of the sandbox analysis and the extension of functionality of the Distfw scripts is a major leap forward in putting together a security enforcement solution for networks.

Considering that the results of the sandbox analysis have been promising, we plan to provide a method that will detect downloaded email attachments and automatically submit them for analysis. A Verizon data breach report for 2013 [16] reveals that e-mail attachments are used as a main vector of attack in almost 80% of attacks related to espionage. Considering this alarmingly high number of attacks using email attachments, we consider that automated attachment analysis will be an enhancement to our solution.

VII. BIBLIOGRAPHY

[1] Mobile Threat Report Q1, http://www.f-secure.com/static/doc/labs_global/Research/Mobile_Threat_Report_Q1_2014.pdf (Last Access: August 2014)
[2] E. Skoudis, L. Zeltser, “Malware: Fighting Malicious Code”, Prentice Hall, 2003.
[3] C. H. Malin, E. Casey, J. M. Aquilina, “Malware Forensics Field Guide for Linux Systems”, Syngress, 2014.
[4] M. Sikorski, A. Honig, “Practical Malware Analysis: The Hands-On Guide to Dissecting Malicious Software”, No Starch Press, 2012.
[5] Cuckoo Sandbox documentation, http://docs.cuckoosandbox.org/en/latest/ (Last Access: August 2014)
[6] Zero Wine: Malware Behavior Analysis, http://zerowine.sourceforge.net/ (Last Access: August 2014)
[7] Anubis Malware Analysis Tool, https://anubis.iseclab.org/?action=about (Last Access: August 2014)
[8] Malwr, https://malwr.com/about/ (Last Access: August 2014)
[9] M. Vasilescu, A distributed firewall implementation, Master Thesis, University Politehnica of Bucharest, 2014.
[10] M. Potoroaca, Implementation of a distributed firewall over Windows platforms, Master Thesis, University Politehnica of Bucharest, 2014.
[11] S. Ioannidis, A.D. Keromytis, S.M. Bellovin, and J.M. Smith, Implementing a distributed firewall, ACM Conference on Computer and Communications Security, 2000.
[12] Kaspersky Security Bulletin 2013, http://www.securelist.com/en/analysis/204792318/Kaspersky_Security_Bulletin_2013_Overall_statistics_for_2013 (Last Access: August 2014)
[13] Dumpit, http://www.aldeid.com/wiki/Dumpit (Last Access: August 2014)
[14] Volatility, https://code.google.com/p/volatility/ (Last Access: August 2014)
[15] Ida disassembler, https://www.hex-rays.com/products/ida/ (Last Access: August 2014)
[16] Data Breach Investigations Report 2014, www.verizonenterprise.com/DBIR/2014/reports/rp_Verizon-DBIR-2014_en_xg.pdf (Last Access: August 2014)