ICIIS


Windows Malware Detection Based on Cuckoo Sandbox Generated Report Using Machine Learning Algorithm
Shiva Darshan S.L.¹, Ajay Kumara M.A.², and Jaidhar C.D.³
Department of Information Technology
National Institute of Technology Karnataka
Surathkal, Mangalore, India
[email protected] , [email protected] , [email protected]

Abstract: Malicious software, or malware, has grown rapidly, and many anti-malware defensive solutions have failed to detect unknown malware since most of them rely on signature-based techniques. Such a technique detects malware based on a pre-defined signature and performs poorly when attempting to classify unseen malware that is capable of evading detection using various code obfuscation techniques. This growing evasion capability of new and unknown malware needs to be countered by analyzing the malware dynamically in a sandbox environment, since the sandbox provides an isolated environment for analyzing the behavior of the malware. In this paper, the malware is executed in the Cuckoo sandbox to obtain its run-time behavior. At the end of the execution, the Cuckoo sandbox reports the system calls invoked by the malware during execution. However, this report is in JSON format and has to be converted to MIST format to extract the system calls. The collected system calls are structured in the form of N-Grams, which help to build the classifier by using Information Gain (IG) as a feature selection technique. A comprehensive experiment was conducted to identify the best-fit classifier among the chosen classifiers, namely Bayesian-Logistic-Regression, SPegasos, IB1, Bagging, Part, and J48, as defined within the WEKA tool. From the experimental results, the overall best performance for all the selected top N-Grams (200, 400, and 600) goes to SPegasos, with the highest accuracy, highest True Positive Rate (TPR), and lowest False Positive Rate (FPR).

Keywords: Sandbox, Malware Detection, Machine Learning, Hypervisor, Virtual machine, N-Gram Feature Extraction.

I. INTRODUCTION

Malware, also known as malicious software, is malicious code developed with the intention of damaging the function of a system. Malware has the capacity to disrupt normal operation by infecting the system or network [1]. It enters a system either through multiple media or gets downloaded into the system disguised as a genuine application. Once it gets into the system, it checks for vulnerabilities and infects the system if the system is highly vulnerable. Generally, anti-malware defensive solutions are signature dependent and run inside the host machines. They are inadequate to thwart the emerging advanced malware attacks.

Computerized malware examination frameworks (or sandboxes) [2] [3] are among the most recent security innovations used to detect malware based on behavior traits. Such frameworks allow an unknown malware to execute in an isolated environment and screen its run-time behavior. Such frameworks have been in use as a major aspect of the manual investigation process for a while; they are progressively utilized as a primary component of the automated malware detection approach. The main upside of the automated malware detection technique is that it is able to recognize unseen malware on the basis of the observed activities gathered during the execution of the malware. The majority of sandboxes observe the behavior of a user-mode process at the system call interface. System calls are routines through which a user-level process interacts with the operating system to perform its desired task. These tasks include reading data from files, delivering packets across the network, and accessing entries in the registry. By looking deeper into the execution of a program, a lot more interesting information can be gathered.

This paper presents a classic approach to the detection of malware by extracting only the system calls (i.e., the operation field) from the Malware Instruction Set (MIST) reports that were obtained by applying the MIST conversion process to the run-time behavioral reports of malware produced by the Cuckoo sandbox. Further, the extracted system calls are used to generate sequences of N-Grams of a specified length, such as N=2, N=3, and N=4, and the Information Gain (IG) feature selection method is then adopted to calculate a score for each N-Gram. Later, the top N-Grams are selected based on the highest IG score. The selected top N-Grams are processed by the classifier for classification.

The rest of the paper is organized as follows. In Section II, we study the background of the MIST instruction and its representation. In Section III, we review earlier research on detecting malicious executables. In Section IV, we describe our proposed approach. In Section V, experimental results are discussed. Finally, the conclusion is drawn in Section VI.

II. BACKGROUND

The prime task of a malware detection system is to identify known as well as unknown malware and defend the integrity of the system while performing its function. The analysis of malware can be performed in two ways, i.e.,



Authorized licensed use limited to: Penn State University. Downloaded on November 15,2020 at 02:14:45 UTC from IEEE Xplore. Restrictions apply.

code analysis and behavior analysis. Code analysis is generally achieved in a static way by obtaining a complete overview of the software. A major limitation of the code analysis technique is that it is often hindered by evasion techniques such as binary packers, polymorphism, and anti-debug techniques. In behavior analysis, the malware behavior is monitored while it is running on a host system. Behavior-based malware analysis is an efficient way of observing the actions of the malware, and several existing monitoring tools provide a behavioral report [3]. Generally, behavior-based malware analysis tools execute a malware sample in an isolated environment to obtain accurate system-level behavior by monitoring and recording the system calls invoked by the malware. A summary of the observed behavior of the malware sample is tabulated in the analysis report. Monitoring suites such as Anubis and CWSandbox produce the behavior report in a textual or XML-based format that provides the system-level behavior of the malware, including system call details. A human analyst can easily analyze textual or XML-based formats, but they are unsuitable for further automatic analysis due to a negative impact on the runtime of the analysis. XML representations are inappropriate for finding generic behavioral patterns. Unlike XML, textual representations are difficult to process owing to aggregation and can even increase the size of the report. In contrast to the textual and XML-based formats, MIST is used to record all system-level behavior in a form in which the system call arguments are organized in different levels of blocks (Fig. 1).

Fig. 1: MIST representation of system call.

The first field, category, denotes the type of system call, and the second field, operation, represents a particular system call. In each MIST instruction, the type of the argument block and its size depend on the particular system call. The MIST representation is optimized for effective and efficient analysis of malware behavior using machine learning algorithms [4].

III. RELATED WORK

Several approaches that perform dynamic malware analysis using sandbox technology have been proposed in the literature. Willems et al. [5] developed an open source tool called CWSandbox that allows a malware sample to execute either in a native environment or in a virtual Windows environment. Monitoring of the API calls is accomplished by the hook functions of the analysis component. DRAKVUF [6] is another dynamic malware analysis system that performs in-depth trace analysis of the execution of malware, including modern stealthy kernel rootkits, by intercepting the kernel heap allocation of the targeted system. In addition, DRAKVUF efficiently addresses the challenges in detecting the system call interception by other sandbox systems [5]. On the other hand, virtualization-based sandbox techniques [2] [7] play a vital role by examining the manipulated structure of the operating system that is caused by the types and behavior of new variants of malware.

Cuckoo [3] is another malware analysis system, which provides a detailed behavior report of a Windows executable file when it is executed inside an isolated environment. Cuckoo can analyze many different malicious files (executables, document exploits, etc.) and malicious web sites in a virtualized environment. Cuckoo is able to trace the API calls and general behavior of the input file and can easily be integrated within an existing framework. Current sandbox-based systems [8] [9] are sufficient for reporting the behavioral activity of an input executable file in the form of a behavioral report. However, an accurate examination of the malware based on the sandbox-generated report involves extensive manual analysis. In addition, the sandbox also provides a report for benign executable files on the monitored machine. In such cases, precisely distinguishing actual malware activities from those of benign executable applications is a challenging task. The sandbox report is available only in an unstructured form, which makes it difficult to precisely extract the actual semantic information (e.g., system calls). Rieck et al. [4] made an attempt at effective detection of malware based on the invoked system call sequence. The collected system call sequence is structured in the form of N-Grams, and the N-Gram feature extraction technique is widely used for different input sources [10] [11] [12]. In another work, Tesauro et al. [13] applied the idea of N-Grams as features for malware detection. The N-Grams were selected from the most frequent classes in malware and benign files. The N-Grams perform better when the experiment is carried out with a larger feature set. Recent reports have shown that feature selection based on IG has produced the best results in classifying malicious executable files from benign executable files [10].

Machine learning algorithms are regarded as a promising technique for accurately distinguishing malware from benign executable files. Kolter and Maloof [14] describe a machine learning approach to classify the malicious executables that appear in the wild by encoding N-Grams as features for classification. An automated behavior-based malware analysis framework using machine learning techniques was proposed in [15]; it converts the report generated by the sandbox into MIST format to identify unknown malware with similar behavior.

In our work, we have used the Cuckoo sandbox to gather the system-level behavior of the executable files. The system call sequences triggered by the executable files (processes) are extracted from the report generated by the Cuckoo sandbox. The IG feature selection technique is employed to choose the best features to construct the Final Feature Vector (FFV). A machine learning algorithm is employed to classify malware executable files from benign executable files based on the FFV.
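As a rough illustration of the pipeline summarized above, the following sketch forms N-Grams from system call sequences and ranks them by Information Gain, the score defined in Eq. (1). It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the toy traces in the usage note, the base-2 logarithm, and the reading of P(v, C) as a joint probability over all training files are all assumptions made for illustration.

```python
import math
from collections import Counter


def ngrams(calls, n):
    """Return the set of unique N-Grams (tuples of n consecutive calls)."""
    return {tuple(calls[i:i + n]) for i in range(len(calls) - n + 1)}


def information_gain(present_benign, present_malware, total_benign, total_malware):
    """IG of one N-Gram from its per-class document frequencies.

    present_* : number of files of that class containing the N-Gram.
    total_*   : number of files of that class in the training set.
    """
    total = total_benign + total_malware
    # Joint counts for v = 1 (present) / 0 (absent) and class B / M.
    joint = {
        (1, "B"): present_benign,
        (0, "B"): total_benign - present_benign,
        (1, "M"): present_malware,
        (0, "M"): total_malware - present_malware,
    }
    p_class = {"B": total_benign / total, "M": total_malware / total}
    p_value = {
        1: (present_benign + present_malware) / total,
        0: (total - present_benign - present_malware) / total,
    }
    ig = 0.0
    for (v, c), count in joint.items():
        p_joint = count / total
        if p_joint > 0:  # a zero joint probability contributes nothing
            ig += p_joint * math.log2(p_joint / (p_value[v] * p_class[c]))
    return ig


def top_k_ngrams(benign_traces, malware_traces, n, k):
    """Rank all observed N-Grams by IG and return the k highest-scoring ones."""
    benign_sets = [ngrams(t, n) for t in benign_traces]
    malware_sets = [ngrams(t, n) for t in malware_traces]
    # Document frequency: in how many files of each class an N-Gram occurs.
    df_b = Counter(g for s in benign_sets for g in s)
    df_m = Counter(g for s in malware_sets for g in s)
    scores = {
        g: information_gain(df_b[g], df_m[g], len(benign_sets), len(malware_sets))
        for g in set(df_b) | set(df_m)
    }
    # Non-increasing IG order; ties are broken lexicographically.
    return sorted(scores, key=lambda g: (-scores[g], g))[:k]
```

For example, calling `top_k_ngrams(benign_traces, malware_traces, n=3, k=200)` on two lists of system call name sequences would return the 200 highest-scoring 3-Grams; an N-Gram that appears in every file of one class and in no file of the other receives the maximum score of 1 bit when the classes are balanced.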




IV. PROPOSED WORK

Our proposed work distinguishes malware files from benign files on the basis of the system call sequence, which is structured using a heuristic method called N-Gram analysis. It adopts the IG technique to compute the IG score for each N-Gram and extracts the top N-Grams (features) based on the highest IG score in order to prepare the FFV that is needed for classification. Fig. 2 depicts an overview of the architecture of the proposed work.

Fig. 2: System Architecture of the proposed work.

A. Behavior analysis

Since the Cuckoo sandbox functions at the hypervisor as a separate entity, it examines the behavior of malware running on VMs and produces the behavioral analysis report of the running executables in JavaScript Object Notation (JSON) format.

B. Conversion process

The analysis reports obtained in JSON format are pre-processed to obtain the MIST representation, since it is a preferred format that uses a smaller file size and reduces processing time. Since our approach is specific to the monitored system calls, we are concerned only with the operation field (the system call, as shown in Fig. 1) of the MIST files to generate the N-Gram (4 bytes) files, as shown in Fig. 3.

Fig. 3: Snippet of N-Gram extraction using MIST file.

To generate the N-Gram files, we follow these steps:
• System call extraction,
• N-Gram generation,
• Sorting of N-Grams, and
• Duplicate removal

In the first step, system call extraction, we select only the operation field, i.e., the system calls, of all the benign MIST files (1, 2, ..., 10, 11, ..., n) and all the malware MIST files (1, 2, ..., 10, 11, ..., n), as shown in Fig. 4, since these files record all system-level behavior. The extracted operation fields are stored in a text file and grouped in sequence to form N-Grams of variable length, i.e., N=2, N=3, N=4, etc. The longer the N-Gram, the better the characteristics it represents. A snippet of the extraction is shown in Fig. 3. We have grouped N-Grams of length four bytes while forming the N-Grams in the second step, the generation phase. In the third step, the formed N-Grams are sorted in descending order to get the highest-order sequence of N-Grams. After the sorting operation, in the fourth step, any duplicates observed are removed to obtain unique N-Grams. The unique N-Grams can be employed for better feature selection and also provide better classification.

(a) Steps to generate Benign N-Gram Files.
(b) Steps to generate Malware N-Gram Files.
Fig. 4: System call extraction phase.

The above steps are a prerequisite for the feature selection approach, since it cannot be performed without the N-Gram formation. The formed benign N-Gram files [B1, B2, B3, ..., Bn] and malware N-Gram files [M1, M2, M3, ..., Mn] must undergo a union operation over all benign N-Gram files [B1 ∪ B2 ∪ B3 ∪ ... ∪ Bn] and over all malware N-Gram files [M1 ∪ M2 ∪ M3 ∪ ... ∪ Mn]. After the union operation, the benign union N-Gram files and malware union N-Gram files must be sorted in non-increasing order, and any duplicates observed must be removed to obtain unique benign N-Gram files and unique malware N-Gram files. The occurrences of each unique benign N-Gram in the benign N-Gram files are observed and tabulated as the N-Gram frequency table for the benign class, and in the same way, the occurrences




of each unique malware N-Gram in the malware N-Gram files are observed and tabulated as the N-Gram frequency table for the malware class.

Fig. 5: N-Gram frequency table for benign class and malware class with feature contingency table.

The feature contingency table is then prepared based on the values in the N-Gram frequency tables for the benign category and the malware category, as depicted in Fig. 5. The feature contingency table is used to calculate the Information Gain [10]. Information Gain is computed by the following equation,

\[
IG(\text{N-Gram}) = \sum_{v_{\text{N-Gram}} \in \{0,1\}} \; \sum_{C \in \{C_i\}} P(v_{\text{N-Gram}}, C) \, \log \frac{P(v_{\text{N-Gram}}, C)}{P(v_{\text{N-Gram}}) \, P(C)} \tag{1}
\]

where C is one of the two categories, benign or malware, and $v_{\text{N-Gram}}$ is the value of the N-Gram. $v_{\text{N-Gram}} = 1$ indicates that the N-Gram is present in the benign N-Gram files or the malware N-Gram files, and $v_{\text{N-Gram}} = 0$ otherwise. $P(v_{\text{N-Gram}}, C)$ is the proportion of N-Gram files in C in which the N-Gram takes on the value $v_{\text{N-Gram}}$. $P(v_{\text{N-Gram}})$ is the proportion of benign or malware N-Gram files in the entire training set in which the N-Gram takes the value $v_{\text{N-Gram}}$. $P(C)$ is the proportion of the data set belonging to category C. The N-Grams are organized in non-increasing order based on the IG score, and the top K N-Grams are extracted as the best features for classification.

C. Instruction Converter

The instruction converter converts the extracted features into an ARFF (Attribute-Relation File Format) file. ARFF is an ASCII text file that describes a list of instances sharing a set of attributes. This is an important step because the classifiers of the WEKA tool used in our approach work with ARFF files.

V. EXPERIMENT RESULTS

Our experimental data consists of 3000 benign MIST files and 3100 malware MIST files. The malware MIST files belong to four different families: Swizzor (1000), Basun (1000), AutoIt (1000), and Kelihos Trojan (100). Among the four malware families considered, the first three were collected from a public source¹, and the remaining 100 malware MIST files were obtained by applying the MIST conversion process to the run-time behavioral reports produced by the Cuckoo sandbox after injecting the Kelihos Trojan. As explained earlier, we extracted N-Grams of different sizes (2 bytes, 3 bytes, and 4 bytes) to measure which N-Gram size achieves the best detection rate. A separate experiment was conducted for each N-Gram size. The N-Grams are sorted in decreasing order based on the IG score, and any duplicate N-Gram found is removed. The class-wise document frequency was determined for each N-Gram to prepare the contingency table. The IG method is used to calculate a score for each N-Gram, and the top K N-Grams are determined based on the highest IG score. Experiments were conducted for different values of K, namely 200, 400, and 600. Further, the best features were drawn at each K value for the different N-Gram lengths. The best features were pre-processed through the instruction converter to prepare ARFF files for the selected N-Grams. The ARFF files were submitted to the WEKA tool for classification. A wide set of experiments was conducted to determine which classifier achieved the best malware detection rate with a low False Positive Rate (FPR). We evaluated the performance of several classification algorithms available in the WEKA tool.

¹ https://github.com/rieck/malheur/tree/master/data

Our objective was to find the best classification algorithm among the several available in the WEKA tool. From that perspective, we selected six classifiers among the eight different categories provided in the WEKA tool. The six classifiers chosen were Bayesian-Logistic-Regression, SPegasos, IB1, Bagging, Part, and J48, classified under the Bayes, functions, lazy, meta, rules, and trees categories of WEKA, respectively. For evaluation purposes, we measured and tabulated the values of True Positive Rate (TPR), False Positive Rate (FPR), Precision, Recall, F-measure, ROC Area, and Accuracy for all six chosen classifiers, as shown in TABLE I and TABLE II.

Two experiments were carried out. In the first experiment, we considered N-Grams of three bytes in order to select the top N-Grams based on the highest IG score. The top 200, 400, and 600 N-Grams were selected. From the experimental observation, as shown in Fig. 6, the highest accuracy was 89.77% for 200 N-Grams, 90.03% for 400 N-Grams, and 89.88% for 600 N-Grams, yielded by the SPegasos classifier (Fig. 6a). The highest TPR of 0.898 for 200 N-Grams, 0.9 for 400 N-Grams, and 0.899 for 600 N-Grams was produced by the SPegasos classifier (Fig. 6b). The lowest FPR of 0.102 for 200 N-Grams, 0.1 for 400 N-Grams, and 0.101 for 600 N-Grams was given by the SPegasos classifier (Fig. 6c). Receiver Operating Characteristic (ROC) curves are mainly used to compare the classification capability of the different algorithms. Among the classifiers tested in this work, it was observed that the SPegasos classifier attained the best results.

Similarly, in the second experiment, N-Grams of length four bytes were analyzed, and the results for highest accuracy




TABLE I: WEKA Classification results for N-Gram Length 3 bytes.


N-Gram Length= 3 N-Gram Length= 3 N-Gram Length= 3
Selected Top N-Grams = 200 Selected Top N-Grams = 400 Selected Top N-Grams = 600
Classifier C1 C2 C3 C4 C5 C6 C1 C2 C3 C4 C5 C6 C1 C2 C3 C4 C5 C6
0.894 0.902 0.881 0.912 0.899 0.896 0.882 0.894 0.874 0.903 0.877 0.886 0.882 0.904 0.874 0.908 0.888 0.874 B
TPR 0.895 0.887 0.885 0.88 0.886 0.899 0.906 0.899 0.882 0.885 0.9 0.915 0.91 0.89 0.882 0.885 0.881 0.923 M
0.894 0.894 0.883 0.896 0.893 0.898 0.894 0.896 0.878 0.894 0.889 0.9 0.896 0.897 0.878 0.897 0.885 0.899 W
0.105 0.113 0.115 0.12 0.114 0.101 0.094 0.101 0.118 0.115 0.1 0.085 0.09 0.11 0.118 0.115 0.119 0.077 B
FPR 0.106 0.098 0.119 0.088 0.101 0.104 0.118 0.106 0.126 0.097 0.123 0.114 0.118 0.096 0.126 0.092 0.112 0.126 M
0.106 0.106 0.117 0.104 0.108 0.102 0.106 0.104 0.122 0.106 0.112 0.1 0.104 0.103 0.122 0.103 0.115 0.101 W
0.895 0.888 0.885 0.884 0.888 0.899 0.903 0.898 0.881 0.887 0.897 0.912 0.908 0.891 0.881 0.888 0.882 0.919 B
Precision 0.894 0.9 0.881 0.909 0.897 0.897 0.885 0.894 0.875 0.901 0.88 0.889 0.885 0.903 0.875 0.906 0.887 0.88 M
0.894 0.894 0.883 0.897 0.893 0.898 0.894 0.896 0.878 0.894 0.889 0.901 0.896 0.897 0.878 0.897 0.885 0.9 W
0.894 0.902 0.881 0.912 0.899 0.896 0.882 0.894 0.874 0.903 0.877 0.886 0.882 0.904 0.874 0.908 0.888 0.874 B
Recall 0.895 0.887 0.885 0.88 0.886 0.899 0.906 0.899 0.882 0.885 0.9 0.915 0.91 0.89 0.882 0.885 0.881 0.923 M
0.894 0.894 0.883 0.896 0.893 0.898 0.894 0.896 0.878 0.894 0.889 0.9 0.896 0.897 0.878 0.897 0.885 0.899 W
0.894 0.895 0.883 0.898 0.893 0.898 0.893 0.896 0.878 0.895 0.887 0.899 0.894 0.898 0.877 0.898 0.885 0.896 B
F-measure 0.894 0.893 0.883 0.894 0.892 0.898 0.895 0.897 0.879 0.893 0.89 0.902 0.897 0.896 0.878 0.895 0.884 0.901 M
0.894 0.894 0.883 0.896 0.892 0.898 0.894 0.896 0.878 0.894 0.888 0.9 0.896 0.897 0.878 0.897 0.885 0.899 W
0.968 0.971 0.883 0.896 0.965 0.898 0.968 0.972 0.878 0.894 0.959 0.9 0.966 0.972 0.878 0.897 0.955 0.899 B
ROC Area 0.968 0.971 0.883 0.896 0.965 0.898 0.968 0.972 0.878 0.894 0.959 0.9 0.966 0.972 0.878 0.897 0.955 0.899 M
0.968 0.971 0.883 0.896 0.965 0.898 0.968 0.972 0.878 0.894 0.959 0.9 0.966 0.972 0.878 0.897 0.955 0.899 W
Accuracy (%) 89.43 89.42 88.30 89.62 89.25 89.77 89.40 89.63 87.82 89.40 88.85 90.03 89.60 89.68 87.78 89.67 88.47 89.88
TPR: True Positive Rate, FPR: False Positive Rate, C1: J48, C2: Bagging, C3: Ib1, C4: Bayesian Logistic Regression, C5: Part, C6: Spegasos,
B: Benign, M: Malware, W: Weighted Average

TABLE II: WEKA Classification results for N-Gram Length 4 bytes.


N-Gram Length = 4 N-Gram Length = 4 N-Gram Length = 4
Selected Top N-Grams = 200 Selected Top N-Grams = 400 Selected Top N-Grams = 600
Classifier C1 C2 C3 C4 C5 C6 C1 C2 C3 C4 C5 C6 C1 C2 C3 C4 C5 C6
0.899 0.902 0.88 0.9 0.899 0.921 0.879 0.904 0.881 0.904 0.885 0.9 0.881 0.907 0.881 0.903 0.88 0.894 B
TPR 0.885 0.885 0.878 0.878 0.887 0.88 0.907 0.887 0.873 0.878 0.9 0.891 0.904 0.887 0.882 0.877 0.887 0.905 M
0.892 0.894 0.879 0.889 0.893 0.9 0.893 0.896 0.877 0.891 0.893 0.896 0.893 0.897 0.882 0.89 0.884 0.9 W
0.115 0.115 0.122 0.122 0.113 0.12 0.093 0.113 0.127 0.122 0.1 0.109 0.096 0.113 0.118 0.123 0.113 0.095 B
FPR 0.101 0.098 0.12 0.1 0.101 0.079 0.121 0.096 0.119 0.096 0.115 0.1 0.119 0.093 0.119 0.097 0.12 0.106 M
0.108 0.106 0.121 0.111 0.107 0.1 0.107 0.104 0.123 0.109 0.108 0.104 0.108 0.103 0.118 0.11 0.117 0.101 W
0.886 0.884 0.879 0.881 0.889 0.887 0.904 0.889 0.874 0.881 0.898 0.892 0.901 0.889 0.882 0.88 0.886 0.904 B
Precision 0.898 0.918 0.88 0.898 0.898 0.9 0.882 0.902 0.88 0.902 0.887 0.899 0.884 0.905 0.881 0.9 0.881 0.895 M
0.892 0.901 0.879 0.889 0.893 0.894 0.893 0.896 0.877 0.891 0.893 0.896 0.893 0.897 0.882 0.89 0.884 0.9 W
0.899 0.921 0.88 0.9 0.899 0.902 0.879 0.904 0.881 0.904 0.885 0.9 0.881 0.907 0.881 0.903 0.88 0.894 B
Recall 0.885 0.88 0.878 0.878 0.887 0.885 0.907 0.887 0.873 0.878 0.9 0.891 0.904 0.887 0.882 0.877 0.887 0.905 M
0.892 0.9 0.879 0.889 0.893 0.894 0.893 0.896 0.877 0.891 0.893 0.896 0.893 0.897 0.882 0.89 0.884 0.9 W
0.893 0.902 0.879 0.89 0.894 0.895 0.891 0.897 0.877 0.892 0.892 0.896 0.891 0.898 0.882 0.891 0.883 0.899 B
F-measure 0.891 0.898 0.879 0.888 0.892 0.893 0.894 0.895 0.877 0.89 0.893 0.895 0.894 0.896 0.882 0.888 0.884 0.9 M
0.892 0.9 0.879 0.889 0.893 0.894 0.893 0.896 0.877 0.891 0.892 0.896 0.892 0.897 0.882 0.89 0.883 0.899 W
0.964 0.97 0.879 0.889 0.964 0.894 0.964 0.972 0.877 0.891 0.963 0.896 0.965 0.972 0.882 0.89 0.956 0.9 B
ROC Area 0.964 0.97 0.879 0.889 0.964 0.894 0.964 0.972 0.877 0.891 0.963 0.896 0.965 0.972 0.882 0.89 0.956 0.9 M
0.964 0.97 0.879 0.889 0.964 0.894 0.964 0.972 0.877 0.891 0.963 0.896 0.965 0.972 0.882 0.89 0.956 0.9 W
Accuracy (%) 89.20 89.37 87.93 88.92 89.30 90.03 89.27 89.57 87.7 89.1 89.25 89.57 89.20 89.68 88.17 88.97 88.35 89.95
TPR: True Positive Rate, FPR: False Positive Rate, C1: J48, C2: Bagging, C3: Ib1, C4: Bayesian Logistic Regression, C5: Part, C6: Spegasos,
B: Benign, M: Malware, W: Weighted Average

were 90.03% for 200 N-Grams, 89.57% for 400 N-Grams, and 89.95% for 600 N-Grams with respect to the SPegasos classifier (Fig. 7a). The highest TPR was 0.9 for 200 N-Grams, 0.896 for 400 N-Grams, and 0.9 for 600 N-Grams, obtained by the SPegasos classifier (Fig. 7b). The lowest FPR was 0.1 for 200 N-Grams, 0.104 for 400 N-Grams, and 0.101 for 600 N-Grams, produced by the SPegasos classifier (Fig. 7c). From the visual inspection of Fig. 6 and Fig. 7, we can conclude that the SPegasos classifier turned out to be the best and ensured better classification for both N-Gram lengths, three and four.
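The evaluation measures reported in TABLE I and TABLE II can all be derived from the confusion-matrix counts of a binary classifier. The sketch below shows one way to compute them; it is illustrative only, the function names are hypothetical, and the paper does not publish the underlying counts. The weighted average is assumed to weight each class by its number of instances, which is how the W rows are read here.

```python
def _ratio(numerator, denominator):
    """Safe division that returns 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fn, fp, tn):
    """Per-class measures, treating one class as 'positive'.

    tp: positives classified as positive, fn: positives missed,
    fp: negatives classified as positive, tn: negatives correctly rejected.
    """
    tpr = _ratio(tp, tp + fn)          # True Positive Rate (equals Recall)
    fpr = _ratio(fp, fp + tn)          # False Positive Rate
    precision = _ratio(tp, tp + fp)
    f_measure = _ratio(2 * precision * tpr, precision + tpr)
    accuracy = _ratio(tp + tn, tp + fn + fp + tn)
    return {
        "TPR": tpr,
        "FPR": fpr,
        "Precision": precision,
        "Recall": tpr,
        "F-measure": f_measure,
        "Accuracy": accuracy,
    }


def weighted_average(value_benign, value_malware, n_benign, n_malware):
    """Average of a per-class measure, weighted by class size (assumed)."""
    return _ratio(
        value_benign * n_benign + value_malware * n_malware,
        n_benign + n_malware,
    )
```

Under the assumption that all 3000 benign and 3100 malware files are evaluated, `weighted_average(0.896, 0.899, 3000, 3100)` gives about 0.898, which is consistent with the weighted TPR listed for SPegasos in TABLE I with the top 200 N-Grams.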




[Figure: four panels plotting (a) Accuracy, (b) TPR, (c) FPR, and (d) ROC Area against the number of selected top N-Grams for the six classifiers.]

Fig. 6: Graphical representation of the evaluation measures (a) Accuracy, (b) True Positive Rate, (c) False Positive Rate, and (d) ROC Area when the N-Gram length is three bytes.
[Figure: four panels plotting (a) Accuracy, (b) TPR, (c) FPR, and (d) ROC Area against the number of selected top N-Grams for the six classifiers.]

Fig. 7: Graphical representation of the evaluation measures (a) Accuracy, (b) True Positive Rate, (c) False Positive Rate, and (d) ROC Area when the N-Gram length is four bytes.
VI. CONCLUSION

In order to detect the malicious activities of malware, behavior analysis of the executable file (process), based on the system calls invoked by the input file during execution, has been employed. The gathered system call sequence is chunked into N-Grams, and each N-Gram is treated as a feature. The IG feature selection method was used to choose the best features based on the highest IG score, and the selected features were used to prepare the FFV needed by the classifier. The experiments were performed using different classifiers available in the WEKA tool. From the experimental observations, it was found that the best classifier among the six chosen in this work is SPegasos, since it achieved the highest accuracy, highest TPR, and lowest FPR compared to the others. SPegasos achieved a better detection rate for the different feature set sizes of 200, 400, and 600. Our future work will aim to develop a multiprocessing model able to compute IG scores for larger N-Gram datasets.

REFERENCES

[1] A. Shabtai, R. Moskovitch, Y. Elovici, and C. Glezer, “Detection of malicious code by applying machine learning classifiers on static features: A state-of-the-art survey,” Information Security Technical Report, vol. 14, no. 1, pp. 16–29, 2009.
[2] Anubis: Analyzing Unknown Binaries, http://analysis.iseclab.org/.
[3] Cuckoo Sandbox, https://cuckoosandbox.org/.
[4] K. Rieck, P. Trinius, C. Willems, and T. Holz, “Automatic analysis of malware behavior using machine learning,” Journal of Computer Security, vol. 19, no. 4, pp. 639–668, 2011.
[5] C. Willems, T. Holz, and F. Freiling, “Toward automated dynamic malware analysis using cwsandbox,” IEEE Security and Privacy, vol. 5, no. 2, pp. 32–39, 2007.
[6] T. K. Lengyel, S. Maresca, B. D. Payne, G. D. Webster, S. Vogl, and A. Kiayias, “Scalability, fidelity and stealth in the drakvuf dynamic malware analysis system,” in Proceedings of the 30th Annual Computer Security Applications Conference. ACM, 2014, pp. 386–395.
[7] M. Neugschwandtner, C. Platzer, P. M. Comparetti, and U. Bayer, “Danubis–dynamic device driver analysis based on virtual machine introspection,” in International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment. Springer, 2010, pp. 41–60.
[8] Y. Qiao, Y. Yang, J. He, C. Tang, and Z. Liu, “Cbm: free, automatic malware analysis framework using api call sequences,” in Knowledge Engineering and Management. Springer, 2014, pp. 225–236.
[9] J. Shi, Y. Yang, C. Li, and X. Wang, “Spems: A stealthy and practical execution monitoring system based on vmi,” in International Conference on Cloud Computing and Security. Springer, 2015, pp. 380–389.
[10] D. K. S. Reddy and A. K. Pujari, “N-gram analysis for computer virus detection,” Journal in Computer Virology, vol. 2, no. 3, pp. 231–239, 2006.
[11] S. Jain and Y. K. Meena, “Byte level n–gram analysis for malware detection,” in Computer Networks and Intelligent Computing. Springer, 2011, pp. 51–59.
[12] H. Parvin, B. Minaei, H. Karshenas, and A. Beigi, “A new n-gram feature extraction-selection method for malicious code,” in International Conference on Adaptive and Natural Computing Algorithms. Springer, 2011, pp. 98–107.
[13] G. J. Tesauro, J. O. Kephart, and G. B. Sorkin, “Neural networks for computer virus recognition,” IEEE expert, vol. 11, no. 4, pp. 5–6, 1996.
[14] J. Z. Kolter and M. A. Maloof, “Learning to detect and classify malicious executables in the wild,” Journal of Machine Learning Research, vol. 7, no. Dec, pp. 2721–2744, 2006.
[15] K. Rieck, T. Holz, C. Willems, P. Düssel, and P. Laskov, “Learning and classification of malware behavior,” in International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment. Springer, 2008, pp. 108–125.

 
