
Digital Investigation 22 (2017) S48–S56

Contents lists available at ScienceDirect

Digital Investigation
journal homepage: www.elsevier.com/locate/diin

DFRWS 2017 USA – Proceedings of the Seventeenth Annual DFRWS USA

Insights gained from constructing a large scale dynamic analysis platform

Cody Miller a, Dae Glendowne b, Henry Cook c, DeMarcus Thomas b,*, Chris Lanclos b, Patrick Pape b

a Babel Street, 1818 Library St., Reston, VA, USA
b Distributed Analytics and Security Institute, 2 Research Boulevard, Starkville, MS, USA
c Green Mountain Technology, 5860 Ridgeway Center Parkway, Suite 401, Memphis, TN, USA

* Corresponding author. E-mail address: [email protected] (D. Thomas).

Abstract

Keywords: Malware; Dynamic analysis; Cuckoo sandbox

As the number of malware samples found increases exponentially each year, there is a need for systems that can dynamically analyze thousands of malware samples per day. These systems should be reliable, scalable, and simple to use by other systems and malware analysts. When handling thousands of malware samples, reprocessing even a small percentage of them due to errors can be devastating; a reliable system avoids wasting resources by reducing the number of errors.

In this paper, we describe our scalable dynamic analysis platform, perform experiments on the platform, and provide lessons we have learned through the process. The platform uses Cuckoo sandbox for dynamic analysis and is improved to process malware as quickly as possible without losing valuable information. Experiments were performed to improve the configuration of the system's components and to help improve the accuracy of the dynamic analysis. The lessons learned presented in this paper may aid others in the development of similar dynamic analysis systems.

© 2017 The Author(s). Published by Elsevier Ltd. on behalf of DFRWS. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
http://dx.doi.org/10.1016/j.diin.2017.06.007

Introduction

As the arms race between malware creators and security professionals progresses, new adaptations are needed, and made, every year. One such adaptation on the security side is malware behavior identification via dynamic analysis. AV-Test indicates that the total number of new malware samples has increased significantly in the past five years, from under 100 million in 2012 to over 500 million in 2016. Due to this increasing number of new malware distributed annually, dynamic analysis systems must be able to process tens of thousands of malware samples per day. In order to meet such a large quota, and remain manageable, these systems need to be reliable, scalable, and convenient for their users. In the effort to create such a system, there are many turning points at which a decision impacts performance and efficiency. It is important to consider all the viable options at these turning points, which can be an overwhelming task for an already complex system. The research presented in this paper seeks to improve understanding of these choices by presenting our system design and the decisions that were made to ensure high performance and quality of dynamic analysis.

The main software component of the system described in this paper is the open-source dynamic malware analysis platform Cuckoo Sandbox, developed by Guarnieri et al. (2013). Due to Cuckoo Sandbox's broad range of virtualization software support and customization (a result of it being open source and of its extensive plugin support), a multitude of customized systems can be developed around it. The system presented in this paper is just one such iteration, optimized and backed by testing the options at various decision points. Some of the optimization areas focused on in this research include identifying performance bottlenecks, efficient machine/software configurations for virtualization, and malware execution time limit tradeoffs. Along with the optimization solutions, we also discuss how the presented system is scalable due to our distribution scheme and database storage. Along with detailed explanations of the above areas, within the context of our developed system, this paper provides lessons learned throughout the process which can aid in the future development and improvement of similar systems.

The growth in malware samples found each year puts more expectations on an already taxing responsibility as a digital


forensics examiner. The only way that digital forensics examiners will be able to respond to the enormous amount of malware being developed is through more sophisticated tools, developed through the lessons of past and current tools. The first contribution of this research is the scalable dynamic analysis platform itself, which could be used by digital forensics examiners to respond to the sheer amount of malware being developed yearly. Secondly, this paper discusses the different experiments that were used to optimize the platform. Lastly, in the process of developing the scalable dynamic analysis platform, lessons were discovered that should be taken into consideration by future digital forensic tool developers. These lessons could be just as important as the scalable tool itself because of the ever-changing digital environment.

Related work

According to Kruegel (2014), three main aspects of a dynamic analysis system must hold for it to be effective: visibility, resistance to detection, and scalability. Visibility is the ability to effectively monitor the activity of a sample in an analysis environment. With the increased environmental awareness of malware samples, a sandbox must also be effective at hiding its presence to avoid identification. In addition to these, the ability to process samples at a high rate is a requirement due to the sheer volume of malware samples being produced. Kirat et al. (2011) compare the processing and restore speeds for varying analysis environments, and compare the number of samples that can be processed within a minute. In their experiments, they compared results from BareBox, VirtualBox, and QEMU for automated analysis of 100 samples. The results show the ability to run 2.69, 2.57, and 3.74 samples per minute, respectively, when the samples were executed for 15 s each. Blue Coat Malware Analysis S400/S500 is a sandbox solution that claims the ability to process 12,000 samples daily.

When considering a dynamic analysis system, one must also consider the method that tools use to extract information. Guarnieri et al. (2013) suggested that most systems monitor API calls (system calls) to obtain an idea of what may be occurring in the system. In addition to this, some systems also monitor the steps between API calls (Kruegel, 2014), perform taint analysis to monitor information as it propagates through the system (Song et al., 2008), execute samples multiple times with varying OSs and system configurations to identify environment sensitivity (Song et al., 2008; Provataki and Katos, 2013), perform hardware emulation (Kruegel, 2014), use bare-metal systems to avoid evasive techniques (Kirat et al., 2011; Kirat et al., 2014), incorporate integration with memory analysis frameworks (Guarnieri et al., 2013), etc. Also, when considering an analysis system, the pros and cons of open-source vs. closed-source projects must be evaluated.

An additional aspect to consider for any dynamic analysis environment is selecting an optimal execution time per sample. Keragala (2016) stated that samples can exhibit stalling behavior to defeat the time-out limits of systems whose timeouts are not properly selected. Several works did not state a specific execution period, but their analysis reports showed execution times ranging between 2 and 3 min (Provataki and Katos, 2013; Vasilescu et al., 2014; Rieck et al., 2011). Lengyel et al. (2014) selected an arbitrary duration of 60 s. To our knowledge, there has only been a single published work (Kasama, 2014) which performed an empirical evaluation of the optimal execution time for samples in a dynamic analysis system. However, this work had a limited number of samples (5,697) and captured API calls. The experiment in this paper is meant to confirm the results presented by Kasama.

System overview

Cuckoo Sandbox

Cuckoo Sandbox is an automated dynamic analysis sandbox created by Guarnieri et al. (2013). Cuckoo allows the submission of files to be run in an isolated environment. Cuckoo first reverts a VM to a base snapshot (one that is not affected by malware), then it runs the malware on the VM. While the malware sample is running, Cuckoo collects information about what it does in the sandbox, such as API calls, network traffic, files dropped, etc. Cuckoo's collected information will be referred to as a Cuckoo sample for the remainder of this paper. Spengler created Spender-sandbox (Cuckoo 1.3), a modified version of Cuckoo 1.2 that adds a number of features and bug fixes. "Cuckoo modified" (a branch separate from the main Cuckoo version branch) was selected over Cuckoo 1.2 because the modified version adds, among other things, new hooks and signatures.

Cuckoo nodes

To process malware, five hosts running ESXi 5.5.0 were used. The hardware varies slightly between the hosts: two of them have 16 physical cores, three of them have 20 physical cores, and all five have 128 Gib of RAM. Each host also has an adapter connected to an isolated network that the hosts share. A virtual machine (VM) running CentOS 7 and Cuckoo (i.e., a Cuckoo node) is on each host and has 64 Gib of RAM and 28 virtual cores. The Cuckoo nodes each have 20 Cuckoo agent VMs within them. Altogether there are 100 Cuckoo agent VMs managed by the five Cuckoo nodes. This setup of VMs inside of a VM was chosen because, according to Kortchinsky (2009) and Wojtczuk and Rutkowska, malware can, though it is unlikely, escape the agent VM and attack the host; keeping the agent inside another VM adds another layer of isolation.

Cuckoo agents

The driving research behind implementing this system focuses primarily on malware that targets the 32-bit version of Windows 7. Therefore, Windows 7 32-bit virtual machines were used for the Cuckoo agents; QEMU version 2.5.1 was used as the virtualization architecture. The agent virtual machines have a new installation of Windows 7, 512 Mib of RAM, 1 CPU core, Adobe Reader 11, and Python 2.7 installed, and have the Windows firewall and UAC disabled. All the agent VMs used the network adapter on the node that is connected to the isolated network of VM hosts.

INetSim

All the agent VMs were connected to an isolated network that does not have access to the Internet. INetSim, created by Hungenberg and Eckert, was used to spoof various Internet services such as DNS, HTTP, and SMTP. It runs on its own VM on the same network as the agent VMs. Gilboy (2016) stated that INetSim can improve malware execution, as it can trick malware that requires the Internet. However, this does not help if the malware needs external resources, such as a command and control server, as INetSim does not provide external (Internet) resources to the isolated network.

Results server

The results server is a VM that provided a way to get the Cuckoo samples from the Cuckoo nodes directly, without using Cuckoo's built-in API to fetch the results, thus improving transfer and

processing speed. Each of the Cuckoo nodes has an NFS-connected drive that maps Cuckoo's storage (the directory where Cuckoo places Cuckoo samples) to a mount point on the results server. The results server compresses the samples and transfers them to long term storage. The long term storage is not on the isolated network, which makes it possible to process the results using resources outside the isolated network.
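As a concrete illustration, the move to long-term storage can be as simple as tar-compressing each finished analysis directory found on the NFS mount. The sketch below is a minimal version of this step under assumed paths and naming; it is not the authors' actual script.

```python
import tarfile
from pathlib import Path

# Assumed locations: NFS mounts of each node's Cuckoo storage and the
# long-term storage destination. Both paths are illustrative.
NFS_MOUNTS = [Path("/mnt/cuckoo-node1/storage/analyses")]
LONG_TERM = Path("/data/long-term")

def archive_analysis(analysis_dir: Path) -> Path:
    """Compress one finished Cuckoo analysis directory to long-term storage."""
    dest = LONG_TERM / f"{analysis_dir.name}.tar.gz"
    with tarfile.open(dest, "w:gz") as tar:
        # Store the directory under its analysis id rather than its full path.
        tar.add(analysis_dir, arcname=analysis_dir.name)
    return dest

if __name__ == "__main__":
    for mount in NFS_MOUNTS:
        for analysis in mount.iterdir():
            if analysis.is_dir():
                print("archived:", archive_analysis(analysis))
```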
Database

To improve the stability and efficiency of the system, a database was used to store the malware and to act as a central location in the malware processing pipeline. The database was used to track the malware as it transitions through the following states: submission, Cuckoo processing, and complete. When the malware was submitted to the database, it was marked as being ready for Cuckoo processing. Along with the binary, other bookkeeping information was submitted, such as hashes, destination on long term storage, current status ('queued' initially), and the priority of the sample. The database also allowed the system to be expandable; other systems can connect to the database to get information about samples and append new information.
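These records were kept in CouchDB (see the Database lesson later in the paper). A minimal sketch of what one sample document might look like follows; the field names are illustrative assumptions rather than the authors' actual schema.

```python
import hashlib

def new_sample_record(binary: bytes, storage_path: str, priority: int = 0) -> dict:
    """Build a database document for a newly submitted sample.

    The status field walks through the states described above:
    'queued' -> 'processing' -> 'complete' (or 'failed', with the error kept).
    """
    return {
        "md5": hashlib.md5(binary).hexdigest(),
        "sha256": hashlib.sha256(binary).hexdigest(),
        "storage_destination": storage_path,  # where the Cuckoo sample will land
        "priority": priority,
        "status": "queued",
        "error": None,
    }

record = new_sample_record(b"MZ...", "/data/long-term/sample1.tar.gz", priority=1)
assert record["status"] == "queued"
```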
Scaling cuckoo

In order to process as much malware as our hardware can handle, our Cuckoo system needed to be scalable. Cuckoo provided a distribution utility to handle scaling; we chose to extend this script to add additional features and flexibility. Our version adds, among other things, the ability to update our database with the status and details of each sample, compress samples to long term storage, and connect to Cuckoo nodes on different subnets, and it also fixes some scalability issues. The extended version we created ran on the results server and used the existing Cuckoo API and mounted storage paths to compress and store the Cuckoo samples on long term storage. The script retrieved binaries stored in the database and submitted them to the Cuckoo nodes. It did this by keeping track of how many samples a Cuckoo node was processing and how many were already processed, using Cuckoo's REST API. The distribution script supported any number of Cuckoo nodes as long as they were running Cuckoo's API and had mounted the storage path on the results server. While the long term storage's bandwidth is not saturated, additional Cuckoo nodes can be added.

The distribution script submitted 25 samples to each node; when a sample was submitted, it was marked as 'processing' in the database. The script submitted each sample under a random filename that did not contain the letters 'r', 's', or 'm' (so that it cannot contain the names of most hashes, such as MD5 or SHA); this was done because Singh expressed that some malware may check its own filename before executing fully. The script monitored the samples using Cuckoo's API on each Cuckoo node to determine when a sample had finished
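The filename rule is easy to get subtly wrong (hexadecimal digests contain no 'r', 's', or 'm', but the rule must also hold for the generated name itself). A minimal sketch of such a generator follows; the alphabet filtering is the point, while the length and extension are arbitrary assumptions.

```python
import random
import string

# Letters that could make the name resemble a hash label ('md5', 'sha', ...).
FORBIDDEN = set("rsm")

def random_submit_name(length: int = 12) -> str:
    """Random filename for submission that avoids the letters r, s, and m."""
    alphabet = [c for c in string.ascii_lowercase + string.digits
                if c not in FORBIDDEN]
    return "".join(random.choice(alphabet) for _ in range(length)) + ".exe"

name = random_submit_name()
assert not FORBIDDEN & set(name.rsplit(".", 1)[0])  # stem is free of r/s/m
```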
processing (meaning Cuckoo had generated reports and was done with the binary). When a Cuckoo node finished processing five samples, five more were added. The number of samples queued on the Cuckoo nodes was kept small (the number of agent VMs plus five) so that samples with a high priority are not put in a large queue behind low priority samples. When a sample was finished, the Cuckoo sample was moved to long-term storage. The binary file, memory file, Cuckoo reports, and dropped files of the Cuckoo sample were compressed as they were moved. The sample was then marked as 'complete' in the database. If the sample failed at any point in the process, the error was stored and the sample was marked as failed in the database. Samples that encountered an error were manually inspected to determine the cause of the error. The full list of errors we encountered is detailed in Dynamic analysis issues.

Experiments

Distribution time

This experiment aimed to determine how the distribution script we created affects the overall speed of processing samples. The distribution script saved a timestamp when it started processing a sample and another when it was finished. This experiment used the difference of these two times to determine the time added to each sample. This time delta started when the sample was completed in Cuckoo and ended when the sample was on long-term storage.

Experiment
The time delta was calculated for 118,055 samples in the database. This experiment was designed to test the speed of the distribution and not to explore the Cuckoo samples generated. Therefore, the dataset used in this experiment contains all our processed samples from various datasets gathered between 2014 and 2016. All the samples were completed using the distribution script.

Conclusion
Fig. 1 displays a histogram of the time deltas for the samples. Most of the samples take between 50 and 150 s, with an average of 114 s. The distribution script can process up to 60 samples in parallel, which means that, on average, the time added per sample is 1.9 s. We consider this an acceptable amount of time for each sample.

Fig. 1. Distribution time taken for 118,055 samples.
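The arithmetic behind the 1.9 s figure is 114 s of wall-clock overhead divided across 60 parallel transfers. A small sketch of the computation, assuming each database record carries the two timestamps the script saved (the field names are illustrative):

```python
from datetime import datetime

PARALLEL_TRANSFERS = 60  # the distribution script's parallelism

def delta_seconds(record: dict) -> float:
    """Overhead between Cuckoo finishing a sample and it reaching storage."""
    start = datetime.fromisoformat(record["dist_start"])
    end = datetime.fromisoformat(record["dist_end"])
    return (end - start).total_seconds()

records = [
    {"dist_start": "2016-05-01T10:00:00", "dist_end": "2016-05-01T10:01:54"},
    {"dist_start": "2016-05-01T10:00:10", "dist_end": "2016-05-01T10:02:04"},
]
average = sum(delta_seconds(r) for r in records) / len(records)  # 114.0 here
print(average / PARALLEL_TRANSFERS)                              # ~1.9 s per sample
```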

Cuckoo machinery

The purpose of this experiment was to determine which machinery is most efficient to use on the Cuckoo nodes. The machinery is the virtualization architecture that runs the Cuckoo agents. Cuckoo 1.3 (Spengler's version of Cuckoo 1.2) supports physical, KVM, VirtualBox, VMware, ESXi, vSphere, and XenServer machinery. Physical machines were not chosen because we have servers that can support more than one agent at a time; using a server for one agent would waste resources. We chose not to use ESXi, vSphere, and XenServer because they host the Cuckoo agents directly, removing one layer of isolation. Several issues with VMware, such as VM corruption and speed, formed the primary inspiration for this experiment; we wanted to test whether QEMU would be a more stable choice. This test was originally performed using VMware 10 and QEMU 0.12. During that testing, several of the VMware 10 Cuckoo agents became corrupt and had to be replaced regularly. The test was performed again with VMware 12 and QEMU 2.5.1.

Experiment
To determine the performance of VMware and QEMU, 20,000 samples from VirusShare.com were processed with Cuckoo on both platforms with 20 agent VMs. VirusShare.com is a service created by Roberts that provides anti-virus detections for uploaded malware samples. The actual samples were not the focus of this experiment, but rather the stability of the virtualization architecture running Cuckoo's agents. Each host VM was configured to be identical (same OS, system resources, and configurations). Cuckoo was also configured identically on both platforms: to only run the samples and not process the results (no reports were generated), and to not dump the agent VM memory. Using VMware, Cuckoo processed the samples in 70 h 6 min (6847 samples per day). Using QEMU 2.5.1, Cuckoo processed the samples in 30 h 14 min (15,862 samples per day). VMware crashed three times during the processing of the 20,000 samples. These crashes caused Cuckoo to stop processing samples, and it had to be manually restarted. The times between each crash were removed from the total time.

Conclusion
QEMU ran the 20,000 samples 2.3 times faster than VMware and was more stable. Due to these results, and QEMU being a free and open source solution, QEMU was chosen as the virtualization technology for our Cuckoo implementation.
Best execution timeout

Cuckoo sandbox has a timeout option that sets the maximum time a sample is allowed to run. After this timeout is hit, Cuckoo terminates the analysis. If terminate_processes is set in the Cuckoo configuration, Cuckoo keeps monitoring the malware beyond this timeout while it waits for the processes to terminate; if terminate_processes is not set, Cuckoo ends the analysis immediately. The larger this timeout, the longer the samples can run; however, the higher this value, the fewer samples can be processed per day. This is because some malware never terminates itself, or loops for an extended period of time. While Guarnieri et al. (2013), Kirat et al. (2011), Lengyel et al. (2014), and others gave a default value for this timeout, they rarely explained why they chose that particular value.

The cumulative percentage of calls was calculated using the enhanced section of the Cuckoo report. Cuckoo generates this section by grouping API calls that are similar. For example, MoveFileWithProgressW, CopyFileA, and DeleteFileW are grouped together in the 'file' group. Table 1 lists each of the eight groups and some example API calls. The full list of groups and their corresponding API calls can be found in the Cuckoo-modified GitHub repository under modules, processing, and then behavior.py. The enhanced section was chosen over the raw API calls because it requires fewer transformations to display visually. The cumulative percent was calculated by determining the percent of calls executed at each second, for each enhanced group, by all malware in the dataset.

This experiment used the calls of malware from a single processing of each sample. This does not mean that the malware executed along all code paths. However, a single execution of the malware gives a starting point in determining the timeout.

Table 1
Enhanced groups.

Group         Description                                                      Example APIs
file          File based operations such as copy, read, delete, write, move   CopyFileA, DeleteFileA, NtWriteFile
dir           Directory based operations such as delete, create               RemoveDirectoryA, CreateDirectoryW
library       Loading and using libraries (such as DLLs)                      LoadLibraryA, LdrLoadDll, LdrGetDllHandle
windowname    Retrieving a handle to a window                                 FindWindowA, FindWindowExA
registry      Registry based operations such as set, create, read, delete     RegSetValueExA, RegCreateKeyExA, RegDeleteKeyA
windowshook   Windows API hooking                                             SetWindowsHookExA
service       Service based operations such as start, modify, delete          StartServiceA, ControlService, DeleteService

Experiment
The samples in this experiment were analyzed using Cuckoo with a timeout of 5 min and with Cuckoo set to not terminate processes. Due to the longer timeout, a smaller unique dataset containing 30,346 samples gathered from VirusShare.com was used. This dataset contained a random sampling of malware between 2014 and 2016. The cumulative percentage of API calls per second was calculated on the dataset. After 125 s, all the enhanced groups had completed at least 90% of their calls. By 1132 s, 100% of all the groups' calls were completed. A graph of the cumulative percent is shown in Fig. 2.

Fig. 2. Cumulative percent of enhanced groups. The dashed line represents the average cumulative percent for all enhanced groups.

Conclusion
When choosing this timeout, you need to decide what loss of information you can afford (if any) to get the performance you desire. Based on this information, 125 s was chosen for our Cuckoo timeout.

This experiment also had the side effect of providing details about what kinds of API calls happen at different times during the execution of malware. According to this experiment, the SetWindowsHook and FindWindow APIs are most often called during the first few seconds. Most service APIs (start, control, and delete) are called within 20 s after execution. File operations typically get called last, 40 s after the malware started.
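A sketch of the per-second cumulative-percent computation described above is shown below. It assumes each enhanced event from a Cuckoo-modified report has been reduced to a category and a timestamp in seconds relative to the start of execution; the exact report layout varies between Cuckoo versions, so treat the input shape as an assumption.

```python
from collections import defaultdict

def cumulative_percent(events, horizon=300):
    """Percent of each group's events seen by each second of execution.

    `events` is an iterable of (group, second) pairs, e.g. ("file", 41.2),
    extracted from the report's enhanced section.
    """
    per_group = defaultdict(list)
    for group, second in events:
        per_group[group].append(second)

    curves = {}
    for group, seconds in per_group.items():
        total = len(seconds)
        curves[group] = [
            100.0 * sum(1 for s in seconds if s <= t) / total
            for t in range(horizon + 1)
        ]
    return curves

curves = cumulative_percent([("file", 41.0), ("file", 55.0), ("service", 19.0)])
print(curves["service"][20])  # 100.0: all 'service' calls seen by t = 20 s
```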

Anti-VM

This experiment aided us in our decision to use QEMU over VMware. Various popular anti-VM techniques were used to determine how well VM detection worked on the Cuckoo agent VMs. For this experiment, non-sandbox-related detections (for example, Cuckoo detections) were ignored, as they are not in the scope of the experiment. While hardening the agent VMs was not the focus of this experiment, it is something to consider when deciding whether to use QEMU or VMware.

Experiment
Pafish (Deepen), an open source project focused on identifying sandboxes and analysis environments, uses common malware techniques to fingerprint systems. It was run via Cuckoo on both QEMU 2.5.1 and VMware Workstation 12 and detected the following virtualization identifiers on both QEMU and VMware:

- The correct CPU vendor ("AuthenticAMD" for QEMU and "GenuineIntel" for VMware) and Windows version (6.1 build 7601)
- A "VM CPU", by checking the RDTSC CPU timestamp counter and by checking the hypervisor bit in the CPUID
- Under 60 Gib disk, under 2 Gib RAM, and fewer than 2 CPU cores

Pafish also detected the following for VMware only, two of which are sketched below:

- A VMware device-mapped SCSI registry key (HKLM/HARDWARE/DEVICEMAP/Scsi/Scsi Port 1/Scsi Bus 0/Target Id 0/Logical Unit Id 0 "Identifier")
- A VMware MAC address starting with 00:50:56
- A VMware WMI Win32 BIOS serial number
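For illustration, here is a minimal Python version of two of the VMware checks in the style of Pafish (the SCSI registry identifier and the MAC address prefix). It must run inside the Windows guest, and the logic mirrors publicly known techniques rather than Pafish's exact code.

```python
import uuid
import winreg  # Windows-only standard library module

SCSI_KEY = (r"HARDWARE\DEVICEMAP\Scsi\Scsi Port 1\Scsi Bus 0"
            r"\Target Id 0\Logical Unit Id 0")

def scsi_identifier_mentions_vmware() -> bool:
    """True if the SCSI 'Identifier' registry value names VMware."""
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, SCSI_KEY) as key:
            identifier, _ = winreg.QueryValueEx(key, "Identifier")
            return "VMWARE" in identifier.upper()
    except OSError:
        return False  # key absent: nothing to detect

def mac_is_vmware() -> bool:
    """True if the primary MAC address uses VMware's 00:50:56 prefix."""
    mac = uuid.getnode()          # 48-bit MAC as an integer
    return mac >> 24 == 0x005056  # compare the vendor (OUI) bytes

if __name__ == "__main__":
    print("SCSI identifier names VMware:", scsi_identifier_mentions_vmware())
    print("VMware MAC prefix:", mac_is_vmware())
```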
Conclusion
Some of the VM detections had simple fixes, such as the VMware registry key (by altering the key), the MAC address (by changing the MAC), and the disk/RAM/CPU core limits (by increasing these resources). Others required more research to determine how they could be concealed. The results of this comparison showed that QEMU had less detectable virtualization under basic detection techniques. This conclusion provided another reason to choose QEMU over VMware for our system.
Hardware specification

This experiment aimed to give an estimate of the hardware resources required to run 20 Cuckoo agent VMs using QEMU. When the Cuckoo nodes were first set up, the number of resources to give them was unknown. The resources were overestimated, and each Cuckoo node was given 64 Gib of RAM and 28 CPU cores, which was an over-estimation by a significant amount.

Experiment
A Cuckoo node was monitored while it processed samples on all 20 Cuckoo agent VMs. It was determined that, on average, the agents used a fourth of the RAM they were given and Cuckoo's processing utility used 2 Gib of RAM per parallel process. QEMU used no more CPU cores than half the number of agents running. Similarly, the processing utility used no more total CPU cores than half the number of parallel processes configured.

Conclusion
The equations defined here use the following key terms:

- R: minimum Gib of RAM required for each Cuckoo node without over-committing RAM
- C: Gib of RAM to dedicate to Cuckoo's main process
- O_R: Gib of RAM to give the OS and other programs
- O_C: CPU cores to give the OS and other programs
- P: number of parallel processes Cuckoo's processing utility uses
- agentcount: number of agents on each node
- agentram: Gib of RAM to give each agent

R = C + O_R + 2P + (agentcount × agentram) / 4    (1)

Equation (1) was derived to estimate the minimum Gib of RAM to give each Cuckoo node. It allows Cuckoo's main process to have RAM dedicated to it, 2 Gib dedicated to each process of Cuckoo's processing utility, and a fourth of the maximum RAM the agents can use. The main Cuckoo process is not memory intensive, and it is recommended to provide it no more than 1 Gib. For our system, we chose O_R = 4 Gib.

Cores = C + O_C + (agentcount + P) / 2    (2)

To estimate the minimum number of CPU cores each Cuckoo node needs, Equation (2) was derived; here C denotes the cores dedicated to Cuckoo's main process. This allows Cuckoo to have dedicated cores, and the agents and processing utility to have cores equal to half the number of threads they create.
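Plugging in this system's own numbers (20 agents at 512 Mib each, C = 1 Gib, O_R = 4 Gib) gives a feel for the sizing: with an assumed P = 4, Equation (1) yields 1 + 4 + 8 + (20 × 0.5)/4 = 15.5 Gib, far below the 64 Gib originally allocated. A small sketch of both equations, with the assumed values as illustrative defaults:

```python
def min_ram_gib(C=1.0, O_R=4.0, P=4, agentcount=20, agentram=0.5):
    """Equation (1): minimum Gib of RAM per Cuckoo node.

    C, O_R, agentcount, and agentram follow the paper; P = 4 is an assumed
    processing-utility parallelism, not a value the paper reports.
    """
    return C + O_R + 2 * P + (agentcount * agentram) / 4

def min_cores(C=1, O_C=2, agentcount=20, P=4):
    """Equation (2): minimum CPU cores per Cuckoo node (O_C assumed)."""
    return C + O_C + (agentcount + P) / 2

print(min_ram_gib())  # 15.5 Gib -- well under the 64 Gib first allocated
print(min_cores())    # 15.0 cores -- well under the 28 first allocated
```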

Improving execution

This experiment was designed to determine whether, by changing specific things in the agent VM, the malware would run differently or have more compatibility with the agent. The experiment targets malware that needs a framework, or that checks for previous user activity and system usage, as described in Deepen and Willems.

Experiment
Ten Cuckoo agents were used to process a dataset containing 10,000 randomly chosen samples gathered from VirusShare.com between 2015 and 2016. The dataset was run on the agents with different agent configurations using Cuckoo 1.3. The base configuration was the same agent listed in Section Cuckoo agents. The hardened configuration was the base configuration with the following additional changes:

Added Documents
- "My Documents" has 5 JPGs, 1 txt, 5 PDFs, and 3 data files
- "My Music" has 3 MP3s
- "My Pictures" has 6 JPGs and 1 GIF
- "My Videos" has 4 MP4s

New Programs
- Firefox 38.0.5, Notepad++ v7, VLC 2.2.4, 7-Zip 16.02, Adobe Flash Player 10.1.4, Java 6.45

New Frameworks
- Microsoft Visual C++ 2005, 2008, 2010, 2012, 2013, and 2015 redistributables
- Microsoft .NET 3.5 and 4.6.1 frameworks

Recent Documents/Programs
- All the added documents were opened multiple times. Each new program was run multiple times.

Running Programs
- Windows explorer
- Notepad
- All update services for new software were disabled

Many of these changes were added as a result of the previous experiments and the lessons learned from them. These changes were made to make the agent appear as if it has been, and is being, used by an actual user (a sketch of scripting such activity appears below). The agent was left running for 30 min while programs and documents were manually opened and closed. The running programs were still running when Cuckoo started analysis on the malware. To evaluate the differences between the configurations, specific items in the Cuckoo sample of each configuration were compared. API calls of the malware were inspected to determine if the malware executed differently, and the network activity was examined to see if the malware operated differently over the network.
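As an illustration of the "used machine" idea, decoy documents and recent-file traces can be seeded with a short script run inside the agent before the snapshot is taken. This sketch only writes decoy files and opens them to populate the recent-documents list; it is an assumed helper, not the authors' actual setup procedure.

```python
import os
import subprocess
from pathlib import Path

# Decoy files per user folder, loosely mirroring the hardened configuration.
DECOYS = {
    "Documents": ["notes1.txt"] + [f"report{i}.pdf" for i in range(5)],
    "Music": [f"track{i}.mp3" for i in range(3)],
    "Pictures": [f"photo{i}.jpg" for i in range(6)] + ["meme.gif"],
    "Videos": [f"clip{i}.mp4" for i in range(4)],
}

def seed_user_activity(home: Path) -> None:
    for folder, names in DECOYS.items():
        target = home / folder
        target.mkdir(parents=True, exist_ok=True)
        for name in names:
            path = target / name
            path.write_bytes(os.urandom(4096))  # non-empty, non-trivial content
            # Opening each file with its default handler populates the
            # user's recent-documents list (Windows guest only).
            subprocess.Popen(["cmd", "/c", "start", "", str(path)])

if __name__ == "__main__":
    seed_user_activity(Path.home())
```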
required frameworks being installed on the hardened configura-
Conclusion tion. This experiment also shows that malware on the hardened
Of the 10,000 samples, 9014 ran completely on the base configuration executed more code. Further research is required to
configuration compared to 9421 that ran on the hardened config- determine the exact reason the malware behaved differently.
uration. A complete run was achieved when the malware executed
properly and Cuckoo was able to capture at least one API call.
Samples that did not completely run on both the normal and Lessons learned
hardened were removed, leaving 8834 samples. The 1166 samples
that did not have a complete run were examined and it was Virtualization architecture
determined that 363 samples immediately exited with no hooked
APIs called (the malware ran properly but decided to exit), 474 had Lesson: Choose an appropriate dynamic analysis platform. There
a Cuckoo error unrelated to the sample, and the reason for the are many dynamic analysis platforms available. Egele et al. (2012)
remaining 329 could not be determined. Of the 10,000 samples, 23 provide an extensive survey of dynamic analysis systems and
of them required an external DLL that was not on the virtual ma- techniques. This serves as a good starting point for identifying
chine. By having frameworks installed 705 more samples were able many of the current systems and the techniques they use. We
to run completely. 63% of these were Microsoft .NET and 37% were considered several dynamic analysis systems that have appeared in
Microsoft Visual Cþþ. the literature.
Table 2 shows the changes in the types of API calls performed by We initially filtered the list of dynamic analysis systems based
the sample in the base configuration versus the hardened one. Call on their availability and level of active development. First, a system
type is the label Cuckoo groups API calls by. Percentage of samples that was strictly web-based, requiring us to submit samples to a
remote server to receive results was not considered acceptable for
the purposes of this work as the goal of this work was to construct a
Table 2 scalable platform for analyzing large quantities of malware.
Changes in types of API calls in the base versus the hardened configurations. Therefore, a version of the system must be available for download
Call type Percentage Average percent Maximum either in source code or binary form so that it could be installed and
of samples increase percent increase operated locally.
reg 32.82% 24.63% 85.35%
misc 37.96% 8.95% 90.77%
process 24.11% 5.37% 86.44% Table 3
file 30.58% 5.23% 88.03% Differences in configuration (base minus hardened).
sleep 41.1% 2.23% 92.73%
Name Minimum Maximum Average Samples Samples
network 7.79% 1.70% 23.43%
increase increase increase increased decreased
socket 6.55% 1.67% 42.63%
reg_native 28.40% 1.34% 86.77% Read Key 2744 3231 309.09 3982 985
crypto 4.10% 1.09% 91.18% Read File 41 250 4.62 3947 716
special 30.39% 0.70% 48.05% Write Key 30 2516 1.24 1615 265
window 26.26% 0.43% 90.45% Write File 27 159 1.04 2103 136
sync 35.64% 0.42% 36.06% Mutexes 83 124 0.73 2537 554
thread 37.74% 0.37% 17.02% Delete Key 7 52 0.49 1116 41
services 10.81% 0.30% 26.86% Delete File 7 328 0.34 1618 52
other 0.00% 0.00% 0.0% Signatures 6 14 0.19 1549 295

Lessons learned

Virtualization architecture

Lesson: Choose an appropriate dynamic analysis platform. There are many dynamic analysis platforms available. Egele et al. (2012) provide an extensive survey of dynamic analysis systems and techniques, which serves as a good starting point for identifying many of the current systems and the techniques they use. We considered several dynamic analysis systems that have appeared in the literature.

We initially filtered the list of dynamic analysis systems based on their availability and level of active development. First, a system that was strictly web-based, requiring us to submit samples to a remote server to receive results, was not considered acceptable for the purposes of this work, as the goal was to construct a scalable platform for analyzing large quantities of malware. Therefore, a version of the system must be available for download, in either source code or binary form, so that it can be installed and operated locally.

Second, the system must be under active development. Malware analysis is a constantly evolving area requiring consistent effort to stay relevant. Some of the systems we considered had source code available but had not been updated in five or more years.

Third, the system should be freely available.

- Anubis (Analysis of unknown binaries) – web-based; currently offline
- CWSandbox (Willems et al., 2007) – commercial
- Norman Sandbox (Norman sandbox analyzer) – commercial
- Joebox (Joebox) – web-based
- Ether (Dinaburg et al., 2008) – not updated since 2009
- WILDCat (Vasudevan and Yerraballi, 2006, 2005, 2004) – no source code/binary
- Panorama (Yin et al., 2007) – no source code/binary
- TEMU (Song et al., 2008) – not updated since 2009

PANDA and Cuckoo Sandbox both met these criteria. PANDA (Dolan-Gavitt et al., 2015) captures all instructions within the agent VM at the virtualization architecture level (outside the agent operating system). This low-level instruction data provides everything Cuckoo Sandbox provides; however, it is in a raw format. PANDA was not as mature as Cuckoo at the time and lacked plugins to convert this raw data into the information made readily available by Cuckoo. It also lacked the robust reporting engine within Cuckoo. PANDA plugins could have been created to accomplish this, but due to time constraints, Cuckoo Sandbox was chosen.

After deciding on Cuckoo, we had to decide which version of Cuckoo Sandbox we would use. The three primary choices available at the time were Cuckoo 1.2 (Guarnieri), Cuckoo 2.0rc1 (Guarnieri), and Cuckoo-modified (Spengler) (a.k.a. Cuckoo version 1.3), a modified community version of Cuckoo Sandbox 1.2. We did not choose 2.0rc1 because at the time it had just been released and was still in the process of fixing a number of serious bugs, and we required something more stable. However, Cuckoo 2.0rc1 does offer some benefits over the other two choices as far as functionality goes, and, for future work, we may consider using a Cuckoo 2.0 build. The key factor when choosing between 1.2 and 1.3 was that 1.3 provided support for some features that we required from the sandbox. These features, as stated by Spengler, are detailed below:

- Fully-normalized file and registry names
- Correlating API calls to malware call chains
- Anti-anti-sandbox and anti-anti-VM techniques built in
- The ability to restore removed hooks
- Additional signatures and hooks
- Automatic malware clustering using Malheur
- MIST format converter for Cuckoo logs

All of these features were provided in version 1.3, whereas only a subset of the functionality was available in version 1.2. Version 1.2 provided limited options for dealing with malware that uses anti-sandbox techniques, whereas version 1.3 provided a significant number of anti-anti-sandbox techniques. Version 1.3 also provided many more signatures than 1.2, and the signatures available in version 1.3 were more sophisticated than those present in version 1.2. Over the duration of our project, the number of available signatures in version 1.3 has outgrown what is available in version 1.2, and it also includes a number of signatures that were included in the 2.0rc1 release.
Dynamic analysis issues

Lesson: Check and truly understand your analysis. Cuckoo has some stability issues that cause Cuckoo samples to be inconsistent between runs. We encountered issues where running a sample once through Cuckoo would produce an invalid Cuckoo sample, while running the same sample again would produce a valid one. An invalid Cuckoo sample could have any of the following issues: missing files, missing data in the report, corrupt reports, and/or Cuckoo errors. These issues led us to thoroughly check Cuckoo samples at different stages. In this process, the Cuckoo samples were checked, and invalid samples were marked as invalid in the database so they could be deliberately queued again in another run.
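Such a validity check can be mechanical. The sketch below shows the kind of test we mean: it verifies that an analysis directory contains a parseable report with at least one process that made API calls. The directory layout follows Cuckoo 1.x conventions, but treat the exact paths and keys as assumptions.

```python
import json
from pathlib import Path

def is_valid_cuckoo_sample(analysis_dir: Path) -> bool:
    """Heuristic validity check for one Cuckoo analysis directory."""
    report_path = analysis_dir / "reports" / "report.json"
    if not report_path.is_file():
        return False  # missing report file
    try:
        report = json.loads(report_path.read_text())
    except ValueError:
        return False  # corrupt report
    processes = report.get("behavior", {}).get("processes", [])
    # Require at least one hooked API call in at least one process.
    return any(proc.get("calls") for proc in processes)

if __name__ == "__main__":
    bad = [d for d in Path("storage/analyses").iterdir()
           if d.is_dir() and not is_valid_cuckoo_sample(d)]
    print(f"{len(bad)} invalid samples to re-queue")
```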
Our first iteration resulted in losing over 10% of Cuckoo samples due to various errors. Some of these issues were due to bugs in the distribution script, which were then corrected. Some of the errors were solved by switching from Cuckoo version 1.2 to 1.3; however, not all of the errors have been resolved. In a dataset of 24,463 samples gathered from VirusShare.com, there were 899 errors of this type (3.67% of samples). After manual inspection, the following explanations for the errors were found:

Error: Report.json missing processes or process calls

- Missing files – (164 samples) these samples needed external resources, such as a DLL.
- Missing frameworks – (14 samples) these samples needed a framework, such as MSVCP70, MSVCP90, MSVCP100, MSVCP110, or Cygwin, that was not installed.
- Different OS – (131 samples) these samples called Windows API functions that differed from the agent's Windows 7 OS. These samples caused the Windows error: invalid procedure entry point.
- Corrupt/Invalid – (2 samples) these samples caused the Windows errors "not a valid Windows 32 bit application" or "cannot be run in 32 bit mode". Note that all these samples had a valid PE header and a PE_TYPE header option value of 267 (32-bit).
- Application crashed – (18 samples) these samples crashed on startup, causing a Windows application crash.

Error: Cuckoo error – The package "modules.packages.exe" start function raised an error: Unable to execute the initial process, analysis aborted.

- Corrupt/Invalid – (217 samples) these samples caused the Windows errors "not a valid Windows 32 bit application" or "cannot be run in 32 bit mode". Note that all these samples had a valid PE header and a PE_TYPE header option value of 267 (32-bit).
- Side-by-side configuration – (10 samples) these samples caused the Windows error: side-by-side configuration incorrect.

Error: Cuckoo error – system restarted unexpectedly

- Restarts – (7 samples) in Cuckoo 1.3, if the malware restarts the VM, Cuckoo will only capture a partial execution.
- Unknown – (35 samples) these samples did not restart the VM upon manual inspection; it is unknown why Cuckoo raised this error.

Error: Cuckoo error – The analysis hit the critical timeout, terminating.

- Unknown – (256 samples) these samples appeared to be error free, but it is possible that they are not complete analyses.

Error: Various other errors

- False errors – (80 samples) some errors, such as missing Cuckoo sample files, may have been caused by things other than Cuckoo. When these samples were run through Cuckoo again, they did not produce errors.

Another issue we experienced was that some of our tests used what Cuckoo calls duration in the reports. We determined later that this number does not actually represent the duration that the malware ran. Instead, this duration represents the time Cuckoo took to completely process the sample, which includes time before and after the malware started and ended. The actual malware duration we used was the time between the first API call and the last API call. We also found that some API call sequences contained a large gap in time. This was due to Windows updating the system time after Cuckoo started the analysis, a confirmed bug within Cuckoo. The solution is to disable NTP on the guest.
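Computing the actual run time from the report is a short reduction over the call timestamps. A sketch, again assuming the Cuckoo 1.x report structure with per-call timestamp strings (adjust the format string if your reports differ):

```python
import json
from datetime import datetime

def malware_duration_seconds(report_path: str) -> float:
    """Time between the first and last hooked API call in a Cuckoo report."""
    with open(report_path) as fh:
        report = json.load(fh)
    stamps = [
        datetime.strptime(call["timestamp"], "%Y-%m-%d %H:%M:%S,%f")
        for proc in report.get("behavior", {}).get("processes", [])
        for call in proc.get("calls", [])
    ]
    if not stamps:
        return 0.0
    return (max(stamps) - min(stamps)).total_seconds()

# Usage sketch:
# duration = malware_duration_seconds("storage/analyses/1/reports/report.json")
```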
Improving analysis performance

Lesson: Disable unnecessary functionality. Several processing modules that Cuckoo enables by default were found to be unnecessary. The following modules were disabled because they slow the Cuckoo nodes, and their features can be calculated from the Cuckoo sample at a later date: the memory, dropped files, static, and strings modules. Another tweak, recommended by the Cuckoo developers, was to use Cuckoo's processing utility. By default, the main Cuckoo process (cuckoo.py) handles both submitting the samples to the agents and processing the results. We found that cuckoo.py became unstable and its performance dropped while samples were submitted and results were processed. This issue was fixed by separating the processing using Cuckoo's processing utility script, processing.py.

Database

Lesson: Use a database. Using a database provided a simple way to automate sample processing. Originally, samples were manually submitted to Cuckoo from a directory, which made it hard to keep track of processed samples. Now samples are added to the database and automatically submitted to Cuckoo. In the development of this tool, CouchDB, a NoSQL database, was used for two reasons: the large volume of data we would have, and the ability to scale out the architecture, which is an important aspect of this platform. However, the database did not solve all our data issues; since most of the Cuckoo sample files are large, it is troublesome to store them in the database. This led us to create links in the database to these files on the shared file system, which meant that every machine and analyst that processed these files also had to be connected to the file system. Another issue is that each piece of our system had to know the exact structure of the database, so changing the structure meant changing each system.

Future work

Now that thousands of samples are automatically processed per day, there are a number of additional improvements that can be made. First, the submission of samples for Cuckoo generation is currently a manual process. In the future, a REST API will be constructed that handles much of the sample submission. Cuckoo generation would then not need to know the exact structure of the database as long as it follows the API guidelines.

Currently, the system only supports Windows 7 32-bit. In the future, any Cuckoo-supported OS and additional agent configurations will be added. This could be done by setting a flag in the database for which agent the sample should be run on; the distribution script would then submit to the correct Cuckoo node and agent. Specific agents will be configured to have a different set of programs installed or to be completely different OS/architectures.

Moving forward, there is also a need for a processing component for extracting information from each Cuckoo sample. A scalable, extendable plugin framework will be developed to allow feature extraction from any part of the Cuckoo sample. This framework will use the same database as the Cuckoo generation process and will be able to automatically extract features from completed samples. Once the processing framework is complete, we will use the features extracted from Cuckoo samples to support future machine learning classification and clustering work.

Conclusion

In conclusion, we developed a dynamic analysis platform, performed various experiments on the platform to optimize it, and provided valuable lessons we have learned throughout the process. The experiments performed on this platform not only helped make key decisions in the platform's development, but also provided insight into best practices and choices for the virtualization software and machinery behind other dynamic analysis systems. The dynamic analysis platform presented is able to process any number of malware samples per day, bound only by the computational resources available. The platform submitted and processed samples to and from a database, allowing other systems to be connected. Other systems can share information about samples and append to a sample record, allowing further, more complex sample analysis.

References

Analysis of unknown binaries [online].
AV-Test. Malware statistics [online].
Blue Coat malware analysis S400/S500 [online].
Deepen, D. Malicious Documents Leveraging New Anti-VM and Anti-sandbox Techniques [online].
Dinaburg, A., Royal, P., Sharif, M., Lee, W., 2008. Ether: malware analysis via hardware virtualization extensions. In: Proceedings of the 15th ACM Conference on Computer and Communications Security. ACM, pp. 51–62.
Dolan-Gavitt, B., Hodosh, J., Hulin, P., Leek, T., Whelan, R., 2015. Repeatable reverse engineering with PANDA. ACM Press, pp. 1–11. http://dx.doi.org/10.1145/2843859.2843867.
Egele, M., Scholte, T., Kirda, E., Kruegel, C., 2012. A survey on automated dynamic malware-analysis techniques and tools. ACM Comput. Surv. (CSUR) 44 (2), 6.
Gilboy, M.R., 2016. Fighting Evasive Malware with DVasion (PhD thesis). University of Maryland.
Guarnieri, C., Tanasi, A., Bremer, J., Mark, S., 2013. The Cuckoo Sandbox. URL https://www.cuckoosandbox.org/.
Guarnieri, C. Cuckoo Sandbox 1.2 [online].
Guarnieri, C. Cuckoo Sandbox 2.0-RC1 [online].
Hungenberg, T., Eckert, M. INetSim. URL http://www.inetsim.org/.
Joebox: A Secure Sandbox Application for Windows to Analyse the Behaviour of Malware [online].
Kasama, T., 2014. A Study on Malware Analysis Leveraging Sandbox Evasive Behaviors (PhD thesis). Yokohama National University.
Keragala, D., 2016. Detecting Malware and Sandbox Evasion Techniques.
Kirat, D., Vigna, G., Kruegel, C., 2011. BareBox: efficient malware analysis on bare-metal. In: Proceedings of the 27th Annual Computer Security Applications Conference. ACM, pp. 403–412.
Kirat, D., Vigna, G., Kruegel, C., 2014. BareCloud: bare-metal analysis-based evasive malware detection. In: 23rd USENIX Security Symposium. USENIX Association. OCLC: 254320948.
Kortchinsky, K., 2009. CLOUDBURST – A VMware Guest to Host Escape Story. BlackHat, USA.
Kruegel, C., 2014. Full system emulation: achieving successful automated dynamic analysis of evasive malware. In: Proc. BlackHat USA Security Conference, pp. 1–7.
Lengyel, T.K., Maresca, S., Payne, B.D., Webster, G.D., Vogl, S., Kiayias, A., 2014. Scalability, fidelity and stealth in the DRAKVUF dynamic malware analysis system. In: Proceedings of the 30th Annual Computer Security Applications Conference. ACM, pp. 386–395.
Norman sandbox analyzer [online].
Provataki, A., Katos, V., 2013. Differential malware forensics. Digit. Investig. 10 (4), 311–322. http://dx.doi.org/10.1016/j.diin.2013.08.006.
Rieck, K., Trinius, P., Willems, C., Holz, T., 2011. Automatic analysis of malware behavior using machine learning. J. Comput. Secur. 19 (4), 639–668.
Roberts, J.-M. VirusShare [online].
Singh, A. Defeating Darkhotel Just-in-time Decryption [online].
Song, D., Brumley, D., Yin, H., Caballero, J., Jager, I., Kang, M.G., Liang, Z., Newsome, J., Poosankam, P., Saxena, P., 2008. BitBlaze: a new approach to computer security via binary analysis. In: International Conference on Information Systems Security. Springer, pp. 1–25.
Spengler, B. Spender-sandbox. URL https://github.com/spender-sandbox/.
Vasilescu, M., Gheorghe, L., Tapus, N., 2014. Practical malware analysis based on sandboxing. In: 2014 RoEduNet Conference 13th Edition: Networking in Education and Research Joint Event RENAM 8th Conference. IEEE, pp. 1–6.
Vasudevan, A., Yerraballi, R., 2004. Sakthi: a retargetable dynamic framework for binary instrumentation. In: Hawaii International Conference in Computer Sciences.
Vasudevan, A., Yerraballi, R., 2005. Stealth breakpoints. In: Computer Security Applications Conference, 21st Annual. IEEE, p. 10.
Vasudevan, A., Yerraballi, R., 2006. Cobra: fine-grained malware analysis using stealth localized-executions. In: Security and Privacy, 2006 IEEE Symposium on. IEEE, p. 15.
Willems, C. Sandbox Evasion Techniques – Part 2 [online].
Willems, C., Holz, T., Freiling, F., 2007. Toward automated dynamic malware analysis using CWSandbox. IEEE Secur. Priv. 5 (2).
Wojtczuk, R., Rutkowska, J. Following the White Rabbit: Software Attacks Against Intel VT-d Technology. Invisible Things Lab (ITL), Tech. Rep.
Yin, H., Song, D., Egele, M., Kruegel, C., Kirda, E., 2007. Panorama: capturing system-wide information flow for malware detection and analysis. In: Proceedings of the 14th ACM Conference on Computer and Communications Security. ACM, pp. 116–127.
