
Multi Level Ransomware Detection Framework

This document proposes a multi-level ransomware detection framework that uses natural language processing and machine learning techniques. The framework analyzes ransomware at the dynamic link library, function call, and assembly instruction levels using supervised machine learning algorithms. N-gram probabilities, term frequencies, and inverse document frequencies are used to generate feature sets from the ransomware binaries. Experiments show that ransomware detection accuracy decreases as the value of N increases in the n-gram language model. Logistic regression outperformed other classifiers, achieving a 98.59% detection rate when combining features from all levels.


A Multi-Level Ransomware Detection Framework using Natural Language Processing and Machine Learning
Subash Poudyal, Dipankar Dasgupta, Zahid Akhtar, Kishor Datta Gupta
Department of Computer Science
The University of Memphis
Memphis, TN, USA
{spoudyal, ddasgupt, zmomin, kgupta1}@memphis.edu

Abstract: Ransomware attacks in recent years have proved expensive due to the significant damage and obstruction they have caused in sectors such as health, insurance, business, and education. Several malware detection methods have been proposed to uncover different malware families, but the problem remains unsolved because malware evolves continuously. In this work, we propose a multi-level big data mining framework combining reverse engineering, Natural Language Processing (NLP), and Machine Learning (ML) approaches. The framework analyzes ransomware at different levels (i.e., dynamic link library, function call, and assembly instruction level) via different supervised ML algorithms. Apache Spark was employed for faster processing of the large generated feature set. A Portable Executable (PE) parser and the objdump tool on Linux were used to get the raw data from the ransomware and normal binaries, which were processed further using our custom-built NLP processing. The n-gram probabilities, term frequency, and inverse document frequency (TF-IDF) were used to generate the final feature sets. Experiments with different N values of the n-gram language model show that ransomware detection accuracy tends to decrease as the value of N increases. Among the five chosen supervised classifiers, logistic regression outperformed the others, with a detection rate of 98.59% for TF-IDF trigram features at the combined multi-level, which is an improvement over the individual levels.

Index Terms: Ransomware, Ransomware detection, Reverse Engineering, NLP, N-gram language model, N-gram probability, TF-IDF, Big data, Apache Spark, Machine learning, DLL, Function call, Assembly instructions.

I. INTRODUCTION

Ransomware attacks have been an increasing trend in recent years. Various government and non-government organizations have been affected, especially in education, health, business, research, and insurance. Techniques such as social engineering, password cracking, and network attacks have been applied to take control of the user's machine and resources so as to cause further damage and disruption. The high cost of damage caused by ransomware attacks is due to downtime of the live system, disruption of normal business, the cost of forensic investigation, restoration cost, loss due to reputational harm, and cyber security training cost. Ransomware uses cryptographic algorithms to encrypt and lock the system, and communicates with the command and control server. It forces users to pay in cryptocurrency to get their original files back, but recovery of the original files is not guaranteed. Ransomware takes advantage of a system's vulnerabilities, such as the Windows SMB (Server Message Block) Remote Code Execution Vulnerability, CVE-2017-0144 [3], to encrypt and lock the user's system.

Ransomware is mainly classified into two categories: crypto ransomware and locker ransomware. Crypto ransomware encrypts the files in an infected computer system and holds them until payment is made via bitcoin. Payment through bitcoin hides the identity of the malware writer. Paying the ransom generally makes decryption of the encrypted files possible so that users can get the original readable data back. Locker ransomware, on the other hand, only locks the files. Users can recover the locked files by physically moving the hard drive to a safe location or system.

According to a cyber security business report [2], the estimated cost of damages caused by ransomware attacks will be $11.5 billion by the end of 2019. In 2019, ransomware is projected to attack a business every 14 seconds. In June 2019, two Florida cities, Riviera Beach and Lake City, became victims of ransomware attacks. Their city councils agreed to pay $600,000 and $500,000, respectively, to get their data back [6].

Much of the work in ransomware analysis uses dynamic analysis, which involves running an executable file in VirtualBox or a sandbox environment. The malware is executed in a safe environment that does not pose harm to the host system, but this approach has notable limitations. Some ransomware samples do not run in a virtual environment and do not show their real behavior. Moreover, the command line arguments cannot be known. These limitations can make ransomware analysis and detection ineffective. Several researchers [22, 27, 12, 10] have used NLP techniques such as n-grams and TF-IDF to construct feature vectors of function calls or opcodes and then perform machine learning training and classification. Since feature vector construction at multiple levels is missing from this body of work, we are motivated to perform the analysis at multiple levels. We argue that in order to have a detailed
analysis of ransomware, static analysis using NLP and ML techniques proves more efficient. This paper leverages these approaches and proposes a multi-level ransomware detection framework to analyze and detect ransomware, as shown in Figure 1.

The remainder of this paper is organized as follows. Related work is presented in Section II. The proposed multi-level framework is discussed in Section III. The workflow of the detector engine, which is the major detection component of the proposed framework, is presented in Section IV. The dataset and experiments are discussed in Section V. The paper ends with the conclusion and future work in Section VI.

II. RELATED WORK

Different approaches have been proposed to analyze and detect ransomware. Trung et al. [22] used methods such as n-gram, doc2vec, and TF-IDF to convert API (application programming interface) sequences to numeric vectors, which are then supplied to machine learning classifiers. In another paper, Trung et al. [23] used a memory-augmented neural network in combination with the malware's API call sequences. They used word2vec to convert API sequences to numeric vectors before feeding them to the one-shot learning network. Hanqi Zhang et al. [27] took opcode sequences from ransomware samples and transformed them into n-gram sequences, achieving a best accuracy of 91.43%. Geden and Happa [12] used the Cuckoo sandbox environment to perform dynamic analysis and capture the API calls. A class-wise approach was applied to multi-class malware family identification; for four ransomware families, 96.05% accuracy was achieved. Similarly, Canfora et al. [10] investigated opcode-based n-grams of Android malware and claimed an accuracy of 97%. Nial et al. [15] performed static analysis of raw opcode sequences of Android malware using a deep convolutional neural network. Frequency analysis of DLLs and assembly instructions using machine learning techniques was done by Poudyal et al. [18]. Wang et al. [24] extracted text-level features from the HTTP flows generated by mobile apps to develop a malware detection model. Xin et al. [25] used RNN-based autoencoders, which process the API calls of malware to obtain a low-dimensional representation.

Canzanese et al. [11] analyzed system call traces using an n-gram language model and TF-IDF to detect malicious processes. They claimed that their proposed system would alert the user if unintended behaviours, such as host modifications, are observed. Alsulami et al. [9] proposed a lightweight behavioral malware detection technique that leverages Microsoft Windows prefetch files. They used n-gram, TF-IDF, and feature dimensionality reduction with SVM and logistic regression classifiers. Raff et al. [19] explored raw byte sequences of malware with neural networks to improve malware detection rates. They claimed that their technique overcomes issues such as the brittle features of the n-gram model. Wu et al. [26] proposed DroidDolphin, which uses APIMonitor, SVM, and Hadoop clusters. The APIMonitor tool captures API sequences from Android executables; an n-gram model then generates the features that are fed to the SVM model.

Unlike other works, our paper explores multiple levels, namely DLL, function call, and assembly instruction, while disassembling the binaries to create n-gram sequences and calculate their probability scores and TF-IDF scores, which generate the feature vectors for machine learning classifiers. We also analyze the trigram sequences at different levels to gain insight into the distinguishing characteristics of ransomware and benign samples.

III. PROPOSED METHODOLOGY

The proposed methodology is a multi-level ransomware detection framework that comprises six major components: DLL tracker, function call tracker, assembly instruction tracker, detector engine, action engine, and passive analyzer. The framework is run in an active mode so as to analyze the given binaries at three levels, as shown in Figure 1. It is initiated with the detection counter (dc) set to zero. The framework tracks the detection rate at each level, going from the DLL level to the assembly instruction level, hence the name multi-level framework. At level 1, the DLL tracker interacts with the detector engine; the framework then moves to the second level with the function call tracker, and finally to the assembly instruction tracker. Each major component is described in the sections below.

A. DLL tracker
The DLL tracker analyzes the DLLs of a given binary using the detector engine, as shown in Figure 1, and calculates its classification accuracy. The detector engine is explained in detail in Section IV. The detection counter is incremented by one if the accuracy is greater than or equal to the defined threshold value. The threshold value is set by the expert user or the security team. For our experiment, we used a threshold of 80%.

B. Function call tracker
The function call tracker analyzes the function calls of a given binary. It also uses the detector engine and calculates the classification accuracy. The detection counter is incremented if the accuracy obtained is greater than or equal to the defined threshold value.

C. Assembly instruction tracker
The assembly instruction tracker works similarly to the DLL and function call trackers. The difference is that the detection counter's value is evaluated here. If that value is greater than or equal to one, the action engine is triggered; otherwise the passive analyzer comes into play.
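The counter logic of the three trackers can be pictured with a minimal Python sketch. It assumes the detector engine has already produced one accuracy value per level; the function name `route_binary`, the level keys, and the returned labels are hypothetical names chosen for illustration, not the authors' implementation.

```python
from typing import Dict, Tuple

# Levels are visited in the order described in Section III.
LEVELS = ("dll", "function_call", "assembly")


def route_binary(accuracy_by_level: Dict[str, float],
                 threshold: float = 80.0) -> Tuple[int, str]:
    """Return the detection counter (dc) and the component that takes over.

    accuracy_by_level holds the detector engine's accuracy (in percent) for
    each level; how those values are obtained is outside this sketch.
    """
    dc = 0  # detection counter starts at zero
    for level in LEVELS:
        # Each tracker increments dc when its accuracy meets the threshold.
        if accuracy_by_level[level] >= threshold:
            dc += 1
    # The assembly instruction tracker evaluates dc after the last level.
    next_component = "action_engine" if dc >= 1 else "passive_analyzer"
    return dc, next_component


# Example call with made-up accuracy values for one binary.
example = route_binary({"dll": 81.5, "function_call": 98.0, "assembly": 79.0})
```

With the 80% threshold used in the experiments, a single level at or above the threshold is enough to hand the binary to the action engine.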
Fig. 1. Multi-level ransomware detection framework

D. Detector engine
The detector engine is a vital component of the proposed multi-level framework, as it consists of reverse engineering, pre-processing, big data analysis, natural language processing methods, and machine learning classifiers. Each tracker, at each level, makes use of this engine to identify whether the given executable is benign or ransomware. This engine is explained in detail with a block diagram in Section IV.

E. Action engine
The action engine is responsible for incident handling and response. When the detection counter's value is greater than or equal to one, the action engine analyzes its further action and alerts the user or system about the detection. Immediate or preventative actions are implemented via either manual or automatic inspectors. The details of the action engine are beyond the scope of this paper; more related details can be found in references [14, 20, 5, 8].

F. Passive analyzer
When an executable or binary file is excluded by the action engine, the system continues to monitor it using the passive analyzer. The passive analyzer generates the signature of the binary and updates its detection database. The security admin may further escalate the analysis of a particular binary using behaviour analysis techniques such as system monitoring, file access analyzers, and so on. The details of the passive analyzer are, again, out of the scope of this paper; more related details can be found in references [13, 21, 17].

IV. WORKFLOW OF DETECTOR ENGINE

The detector engine works in two phases, feature generation and machine learning prediction, as shown in Figure 2. Each phase conducts various operations, which are described in the sections below.

A. Reverse engineering and pre-processor
Reverse engineering is the process of deconstructing a binary executable so as to generate the assembly opcodes and analyze them to fulfil some meaningful objective. The life cycle of a binary executable is shown in Figure 3.

The program source code, written in C or another programming language, is compiled, which involves steps from a lexical analyzer to a code optimizer. The object files generated are linked into a binary file. The loader consists of OS loaders and dynamic link libraries, which resolve the references in the code so that it finally becomes a running executable. The goal of the reverse engineering process is to recover the functionality of the original source code as closely as possible. We reverse engineered the ransomware and normal executables using the PE parser tool [4] and the objdump disassembler.

The PE parser tool is used to get the DLLs and function calls used by the ransomware and normal samples, while the objdump tool is used to get the assembly instructions associated with each executable sample. The pre-processor component processes the code segments generated by the PE parser and the objdump tool.

In this paper, we deal with Windows portable executable files. The PE file format is a data structure that holds the information required for the Windows operating system loader to handle the program code [16]. It is used by Windows executables, object code, and DLLs.

B. Multi-level Extractor
The multi-level extractor tool collects the DLLs, function calls, and assembly instructions used in sequence for a given sample from the processed data of the pre-processor. Each extractor type is briefly explained below.

1) DLL Extractor: A dynamic link library (DLL) is a library that contains code and data that can be used by more than one program at the same time. The main benefits of DLLs are code re-usability and efficient memory usage. A DLL can be user defined or entity/Microsoft defined, as shown in Figure 5.
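As a rough illustration of what the multi-level extractor hands to the later stages, the sketch below turns a parsed import table into an ordered DLL sequence and an ordered function call sequence. The input layout (a list of DLL name and imported-function pairs), the helper names, and the sample values are assumptions made for illustration; the paper does not describe the PE parser's output format.

```python
from typing import List, Sequence, Tuple

# Hypothetical parsed import table: (DLL file name, imported function names),
# listed in the order the entries appear in the binary.
ImportTable = Sequence[Tuple[str, Sequence[str]]]


def dll_sequence(imports: ImportTable) -> List[str]:
    """Return DLL names in order, lower-cased and without the .dll suffix."""
    names = []
    for dll_name, _functions in imports:
        name = dll_name.lower()
        if name.endswith(".dll"):
            name = name[: -len(".dll")]
        names.append(name)
    return names


def function_call_sequence(imports: ImportTable) -> List[str]:
    """Return imported function names in order, flattened across DLLs."""
    return [fn for _dll, functions in imports for fn in functions]


# Made-up import table for demonstration only.
sample_imports = [
    ("KERNEL32.dll", ["LoadLibraryW", "GetCommandLineA"]),
    ("USER32.dll", ["ReleaseDC", "GetDC"]),
    ("ADVAPI32.dll", ["RegSetValueW"]),
]
dlls = dll_sequence(sample_imports)
calls = function_call_sequence(sample_imports)
```

Keeping the order of entries matters here, because the n-gram features built later depend on which item follows which.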
Fig. 2. Workflow of detector engine

Fig. 3. Life cycle of a binary file

Fig. 4. Hierarchy of windows DLL

Figure 4 shows the hierarchy of Windows DLLs. The Windows API set is the superset, which consists of one or more application programming interfaces (APIs). Each API is a header file, with or without interfaces, which consists of API functions. A DLL puts these API functions into effect and can be considered a bridge between user space and kernel space. The DLL extractor component parses the output of the PE parser and lists the DLLs used by the given executable.

2) Function call Extractor: A function call is a piece of code that actually has lines of instructions that make an impact on the system or user. Each DLL, whether implicitly or explicitly linked, consists of both import and export functions, as shown in Figure 5.

Each of those functions consists of a number of function calls and system calls. While disassembling the binaries, function calls and system calls appear together in sequence. System calls are considered a special type of function call, so the latter is used more often as a common term. In our experiment we used the function call sequences, which include system calls as well.

3) Assembly Instruction Extractor: An assembly instruction is a low-level machine instruction, also called machine code, that can be directly executed by a computer's central processing unit (CPU). Each assembly instruction causes the CPU to perform a specific task, such as add, subtract, jump, xor, and so on. Each function call or system call is implemented via assembly instructions, as shown in the hierarchy in Figure 5.

C. NLP Schemes
NLP language models have proved useful in recommendation systems, text classification, speech recognition, and so on. In our paper, we have exploited some popular concepts, but applied them to a unique problem domain: a multi-level analysis model of ransomware detection. In this work, the NLP schemes are composed of three methods, n-gram generation, n-gram probability, and TF-IDF, which are described below.

1) N-gram Generator: An n-gram model is a type of probabilistic language model that predicts the next possible item in a sequence. An n-gram is a contiguous sequence of n items from a given sample of a text or speech corpus. In our experiment it is the DLL, function call, and assembly instruction corpus. The items can be phonemes, letters, words, DLLs, function calls, opcodes, or base pairs, depending upon the type of application considered. The value of N can be 1, 2, 3, 4, or any other positive integer. The value of N depends upon the problem domain and the type and nature of the dataset. In text processing tasks, a smaller N usually decreases the classification performance heavily, while a larger N is not considered too relevant. In our experiment we choose different values of
N ranging from 2 to 6 so as to analyze the detection accuracy of ransomware.

The n-gram generation component is responsible for generating the possible sequences of n-grams for a given value of N. We consider only the unique set of n-gram sequences.

Fig. 5. Hierarchy of function calls and assembly instructions in a DLL

2) N-gram Probability scoring: The n-gram probability scoring component is fed with the n-gram sequences obtained from the n-gram generation component. We apply the Markov assumption by considering only the immediately preceding N-1 words. For an n-gram, the conditioning history has length n − 1:

$$p(w_i \mid w_1, \ldots, w_{i-1}) = p(w_i \mid w_{i-n+1}, \ldots, w_{i-1})$$

• unigram: $p(w_i)$
• bigram: $p(w_i \mid w_{i-1})$ (Markov process)
• trigram: $p(w_i \mid w_{i-2}, w_{i-1})$

We can estimate n-gram probabilities by counting relative frequency on a training corpus:

$$\hat{p}(w_b \mid w_a) = \frac{c(w_a, w_b)}{c(w_a)}$$

Here N is the total number of words in the training set (so the unigram estimate is $\hat{p}(w_a) = c(w_a)/N$), and $c(\cdot)$ denotes the count of the word or word sequence in the training data.

For our experiment dataset, the n-gram sequences are the DLL, function call, and assembly instruction sequences, and N is their corresponding total number of sequences in the corpus. The probability score for each n-gram sequence is stored in a feature database.

3) TF-IDF: TF-IDF is the product of term frequency (TF) and inverse document frequency (IDF). Term frequency is simply the number of occurrences of a particular n-gram sequence in a binary sample, whereas the IDF is given as:

$$\mathrm{IDF}(\text{ngram}) = \log_e \frac{\text{Total no. of binaries}}{\text{No. of binaries with ngram in it}}$$

D. Machine learning prediction engine
The output of the feature database is fed to the machine learning prediction engine. The dataset contains the feature vector values of each binary. Processing millions of assembly instructions takes polynomial time using a traditional programming approach, so we adopted a big data computing framework and used Apache Spark to train and test our labelled dataset.

E. Resampling
Resampling methods involve repeatedly drawing samples and reanalyzing the model. We selected K-fold cross-validation for resampling because of its wide acceptance in the research community. K-fold is a statistical method used to compare and select a model for a predictive modeling problem.

F. Supervised Classifiers
We applied various supervised machine learning algorithms to the labeled ransomware and benign dataset. We use Naive Bayes, Logistic Regression, SVM, Random Forest, and Decision Tree, leveraging the Spark MLlib library.

G. Model Fitting and Evaluation
We used supervised machine learning classifiers along with the training and test datasets. The model accepts only those classifiers whose accuracy is greater than or equal to the given threshold value.

V. EXPERIMENTS

In this section we discuss dataset collection, the experimental protocol, and the experimental results.

A. Dataset
The dataset for the experiment was collected from various sources, such as VirusTotal and the open source malware repository theZoo [7]. We used 292 ransomware binaries and the same number of benign executables for our experiment.

B. Experimental protocol
In this work, we use Apache Spark for big data processing of the n-gram sequences. Apache Spark provides the MLlib library [1] to implement various machine learning algorithms.

1) Cluster configuration: We used an Apache Spark cluster with the following configuration. There are 4 data nodes and 1 name node, each with 16 GB RAM, 8 cores, the Ubuntu 16.04.3 operating system, and a 1 TB disk. Hadoop version 2.7.3 and Spark 2.3 are used.

2) Feature table: Table I shows the number of distinct n-gram sequence features at different levels.

TABLE I
NUMBER OF UNIQUE N-GRAM FEATURES AT MULTI-LEVEL

N-gram (N) | DLL | Function call | Assembly
2 | 2035 | 24,416 | 71,999
3 | 2797 | 29,226 | 1,539,769
4 | 2874 | 30,483 | 7,198,017
5 | 2842 | 30,962 | 12,038,570
6 | 2746 | 31,196 | 14,147,291

The total number of features differs for different N values of n-grams. As the value of N increases, the number of features also increases at the function call and assembly levels. It is slightly irregular at the DLL level. From this table, we can claim that there is less overlapping of features as we increase the value of N.
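The feature generation steps of Section IV-C (n-gram generation, relative-frequency probability scoring, and TF-IDF with a natural-log IDF) can be sketched in plain Python as follows. This is a single-machine illustration under stated assumptions, not the Spark pipeline used in the experiments: the function names and the toy DLL sequences are hypothetical, and each binary is assumed to be already reduced to a list of tokens.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

NGram = Tuple[str, ...]


def ngrams(tokens: Sequence[str], n: int) -> List[NGram]:
    """Contiguous n-item windows over a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_probabilities(tokens: Sequence[str], n: int) -> Dict[NGram, float]:
    """Relative-frequency estimate c(history, w) / c(history) per n-gram.

    The history is the first n-1 items of the n-gram; for n = 1 the history
    is empty and the estimate reduces to c(w) / N.
    """
    grams = ngrams(tokens, n)
    gram_counts = Counter(grams)
    history_counts = Counter(g[:-1] for g in grams)
    return {g: c / history_counts[g[:-1]] for g, c in gram_counts.items()}


def tf_idf(samples: Dict[str, Sequence[str]], n: int) -> Dict[str, Dict[NGram, float]]:
    """TF-IDF per binary: TF is the raw n-gram count in that binary,
    IDF is ln(total binaries / binaries containing the n-gram)."""
    counts = {name: Counter(ngrams(toks, n)) for name, toks in samples.items()}
    doc_freq = Counter(g for c in counts.values() for g in c)
    total = len(samples)
    return {
        name: {g: tf * math.log(total / doc_freq[g]) for g, tf in c.items()}
        for name, c in counts.items()
    }


# Toy DLL sequences (made up for illustration).
toy_samples = {
    "a": ["ntdll", "kernel32", "user32", "kernel32", "user32", "advapi32"],
    "b": ["ntdll", "kernel32", "comctl32"],
}
bigram_probs = ngram_probabilities(toy_samples["a"], 2)
bigram_tfidf = tf_idf(toy_samples, 2)
```

In sample "a", user32 follows kernel32 both times kernel32 appears as a history, so that bigram scores 1.0, while a bigram present in every binary gets an IDF of zero and therefore carries no weight in the TF-IDF features.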
All in all, we report the performance of the proposed framework in terms of accuracy. Our prior published work [18] reported other performance metrics and results, including false positives, which are not compared here.

C. Experimental Results
We performed five experiments at different levels (Tables II to VI). We also analyzed the top ten trigrams based on their n-gram probability scores. More details are provided in the following sections.

1) Performance analysis of ML malware detectors: The first three experiments, shown in Tables II, III, and IV, are based on n-gram probability scores, while Table V is based on n-gram TF-IDF scores.

TABLE II
EXPERIMENT 1: MACHINE LEARNING ALGORITHMS ACCURACY EVALUATION FOR N-GRAM PROBABILITIES AT DLL LEVEL

Machine learning algorithm | N=2 | N=3 | N=4 | N=5 | N=6
Naive Bayes | 82.19 | 75.34 | 73.97 | 73.28 | 72.43
Logistic Regression | 89.55 | 88.43 | 85.44 | 84.93 | 82.70
SVM | 88.52 | 86.98 | 85.1 | 84.58 | 82.87
Random Forest | 86.64 | 85.27 | 85.27 | 84.76 | 83.4
Decision Tree | 81.67 | 78.59 | 72.6 | 71.4 | 71.4

TABLE III
EXPERIMENT 2: MACHINE LEARNING ALGORITHMS ACCURACY EVALUATION FOR N-GRAM PROBABILITIES AT FUNCTION LEVEL

Machine learning algorithm | N=2 | N=3 | N=4 | N=5 | N=6
Naive Bayes | 85.62 | 80.39 | 79.73 | 79.08 | 79.08
Logistic Regression | 93.25 | 92.06 | 91.28 | 92.81 | 91.50
SVM | 92.16 | 81.52 | 69.02 | 60.86 | 57.06
Random Forest | 91.50 | 91.06 | 89.97 | 85.94 | 82.02
Decision Tree | 74.83 | 72.54 | 65.68 | 65.68 | 61.11

TABLE IV
EXPERIMENT 3: MACHINE LEARNING ALGORITHMS ACCURACY EVALUATION FOR N-GRAM PROBABILITIES AT ASSEMBLY LEVEL

Machine learning algorithm | N=2 | N=3 | N=4 | N=5 | N=6
Naive Bayes | 76.77 | 74.13 | 72.44 | 70.98 | 70.01
Logistic Regression | 78.43 | 80.24 | 79.11 | 77.5 | 76.8
SVM | 76.91 | 76.95 | 75.49 | 75.2 | 73.19
Random Forest | 80.1 | 80.056 | 79.88 | 79.7 | 78.46
Decision Tree | 79.82 | 79.68 | 76.66 | 75.04 | 73.2

TABLE V
EXPERIMENT 4: LOGISTIC REGRESSION ACCURACY EVALUATION FOR N-GRAM TF-IDF AT MULTI LEVEL

Level | N=2 | N=3 | N=4 | N=5 | N=6
Dll | 93.36 | 81.52 | 69.02 | 60.86 | 57.06
Function call | 96.08 | 98.04 | 90.20 | 84.31 | 72.55
Assembly Instruction | 77.14 | 83.33 | 81.67 | 80 | 80

At the DLL level, the highest accuracy is 89.55% for logistic regression at N=2. SVM has the second-best performance with 88.52% at N=2. Accuracy decreases as the value of N increases. At the function call level, the highest accuracy is 93.25% at N=2 for the logistic regression classifier. SVM follows with 92.16%. A similar trend is observed at the assembly level, where logistic regression with 80.24% accuracy at N=3 is the best observed detection accuracy.

Table V shows the performance evaluation for n-gram TF-IDF at multi-level using logistic regression. Since the above three experiments showed the best performance for logistic regression, we evaluated the TF-IDF feature set of n-grams using this classifier. The highest accuracy achieved is 93.36% at N=2 for the DLL level, 98.04% at N=3 for the function call level, and 83.33% at N=3 for the assembly level. The average accuracy across the three levels is 88.86% at N=2 and 87.63% at N=3.

Table VI shows the accuracy at combined multi-level using logistic regression. The result shows improved accuracy, which is a gain of the combined multi-level analysis. The highest accuracy is achieved at N=3 with 98.59%, followed by 97.13% at N=2.

TABLE VI
EXPERIMENT 5: LOGISTIC REGRESSION ACCURACY EVALUATION FOR N-GRAM TF-IDF AT COMBINED MULTI LEVEL

Level | N=2 | N=3 | N=4 | N=5 | N=6
Dll, Function call and Assembly | 97.13 | 98.59 | 90.45 | 85.11 | 72.58

2) Analysis of top 10 trigrams at different levels for ransomware and normal binaries: Table VII shows the top ten trigram sequences for ransomware and normal executables along with their n-gram probability scores. At the DLL level, a trigram sequence with a score of 1.0 signifies that the sequence is certain to be called in that order. The trigram sequence ntdll, kernel32, comctl32 is expected to occur starting with ntdll. Kernel32, which is part of that sequence, can be the starting item of other trigrams. There is a 40% probability that the sequence kernel32, user32, advapi32 will occur starting with kernel32. The top trigram sequences for ransomware and benign binaries differ significantly. For example, kernel32, user32, advapi32 has a score of 0.40 in ransomware samples, whereas in benign samples the trigram sequence is different and has a different score: advapi32, kernel32, user32 with a score of 0.192 (not shown in the table). This type of distinguishing behaviour is seen with other trigrams as well.

TABLE VII
TOP 10 TRIGRAM SEQUENCES AT DIFFERENT LEVELS

Ransomware binaries
DLL trigram | Score | Function call trigram | Score | Assembly instruction trigram | Score
ntdll, kernel32, comctl32 | 1.0 | 0CReaderWriterLock, 0CSingleList, 0CSmallSpinLock | 1.0 | sha256msg2, xor, or | 0.33
msdart, mlang, midimap | 0.5 | InSendMessageEx, DialogBoxParamA, SetMenuItemBitmaps | 1.0 | addss, mov, mov | 0.33
msdart, mlang, advapi32 | 0.5 | TabbedTextOutW, ReleaseDC, GetDC | 1.0 | vpmacssww, push, daa | 0.33
dsauth, gdi32, mstask | 0.5 | AddAccessDeniedAce, AreAnyAccessesGranted, GetCommandLineA | 1.0 | wrpkru, cld, mov | 0.33
kernel32, user32, advapi32 | 0.40 | vbaVarSub, CIcos, adj_fptan | 1.0 | kmovd, pushf, lds | 0.33
wtsapi32, psapi, msvcrt | 0.33 | DbgPrint, LdrGetProcedureAddress, RtlInitAnsiString | 1.0 | vpxorq, xchg, ror | 0.33
winhttp, comctl32, shlwapi | 0.33 | BuildSecurityDescriptorW, RegSetValueW, RegConnectRegistryA | 0.5 | vfrczpd, in, and | 0.33
msimg32, iphlpapi, oledlg | 0.33 | DuplicateToken, CreateServiceA, SetSecurityDescriptorOwner | 0.5 | mulss, xchg, aas | 0.33
midimap, icmp, mfcsubs | 0.33 | LdrGetProcedureAddress, RtlInitAnsiString, LoadLibraryW | 0.5 | mwait, je, push | 0.33
msacm32, kernel32, glu32 | 0.33 | lopen, LoadLibraryW, GetConsoleCP | 0.375 | vtestps, imul, add | 0.33

Normal binaries
DLL trigram | Score | Function call trigram | Score | Assembly instruction trigram | Score
api-ms-win-core-crt-l1-1-0, api-ms-win-core-crt-l2-1-0, api-ms-win-core-libraryloader-l1-2-0 | 1.0 | SetupDiGetDeviceInstanceIdW, SetupDiDestroyDeviceInfoList, SetupDiEnumDeviceInfo | 1.0 | sgdtd, jne, push | 0.33
dnssd, ws2_32, kernel32 | 1.0 | SkciInitialize, SkciQueryInformation, SkciTransferVersionResource | 1.0 | vcmpltps, add, sub | 0.33
iumcrypt, api-ms-win-core-heap-obsolete-l1-1-0, api-ms-win-eventing-cp-l1-1-0 | 0.5 | UnregisterPowerSettingNotification, DispatchMessageW, MsgWaitForMultipleObjects | 1.0 | vpmacsdqh, enter, in | 0.33
ntdsapi, logoncli, rpcrt4 | 0.5 | SkciQueryInformation, SkciTransferVersionResource, SkciValidateDynamicCodePages | 0.5 | cmpxchg8b, retf, lock | 0.33
esent, ntdll, api-ms-win-core-file-l1-1-0 | 0.5 | ChooseFontW, GetSaveFileNameW, InitCommonCtrlEx | 0.5 | vpminuw, cwde, pop | 0.33
tapi32, gdi32, user32 | 0.33 | AddSIDToBoundaryDescriptor, CreateBoundaryDescript, CreatePrivateNamespaceW | 0.33 | vcvtsd2usi, dec, jge | 0.33
mshtml, urlmon, msiso | 0.33 | DeleteBoundaryDescriptor, OpenPrivateNamespaceW, GetSecurityDescriptorDacl | 0.33 | vpshaw, ret, movabs | 0.33
mswsock, ws2_32, winmm | 0.33 | EnterCriticalPolicySection, DeviceIoControl, GetSystemTimeAsFileTime | 0.33 | pinsrb, test, je | 0.33
dpx, ntdll, ole32 | 0.33 | LogonUserExW, WaitServiceState, EncodePointer | 0.33 | vpminsd, xor, rex | 0.33
kerbclientshared, ntlmshared, msasn1 | 0.33 | RtlAddAccessDeniedAce, NtOpenKey, NtQueryKey | 0.33 | vfnmsubpd, jrcxz, jge | 0.33

The function call level n-grams occur with different n-gram sequences in ransomware samples. The sequence AddAccessDeniedAce, AreAnyAccessesGranted, GetCommandLineA, with a score of 1.0 in ransomware samples, is sure to happen and is different from the sequence RtlAddAccessDeniedAce, NtOpenKey, NtQueryKey, with a score of 0.33 in normal samples. We can clearly observe the different sequence dependency of function calls in ransomware and normal binaries.

Similar to the n-gram patterns described for the above two levels, the n-gram sequences at the assembly level exhibit distinguishing behaviour. The trigram pattern sha256msg2, xor, or is seen in ransomware with a score of 0.33, while a different pattern, vpminsd, xor, rex, with a score of 0.33, is seen in normal binaries. Though we found the same instruction (xor) in both samples, the sequences were different, with the same or different scores. So, we claim that these distinct sequences build the unique feature set that achieves high detection rates using different machine learning classifiers.
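The multi-level averages quoted for Table V (88.86% at N=2 and 87.63% at N=3) are plain means over the three levels, which the short sketch below reproduces from the table's values. The variable names are illustrative only.

```python
# Logistic regression accuracy (%) for n-gram TF-IDF, copied from Table V.
TABLE_V = {
    "dll":           [93.36, 81.52, 69.02, 60.86, 57.06],
    "function_call": [96.08, 98.04, 90.20, 84.31, 72.55],
    "assembly":      [77.14, 83.33, 81.67, 80.00, 80.00],
}
N_VALUES = [2, 3, 4, 5, 6]

# Mean accuracy across the three levels for each value of N, rounded to 2 places.
average_by_n = {
    n: round(sum(row[i] for row in TABLE_V.values()) / len(TABLE_V), 2)
    for i, n in enumerate(N_VALUES)
}

# True when the average falls at every step from N=2 to N=6.
averages = [average_by_n[n] for n in N_VALUES]
strictly_decreasing = all(a > b for a, b in zip(averages, averages[1:]))
```

On these values the average falls at every step as N grows, even though the function call and assembly levels individually peak at N=3.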

Figure 6 shows the accuracy of the n-gram TF-IDF features at all
three levels, together with the average accuracy across the
levels. Among the three levels, the function call level achieved
high accuracy: its accuracy at N=3 is about 2% higher than at
N=2, and it then decreases smoothly for higher values of N. The
DLL level shows a steep decline: the detection rate of 93.36% at
N=2 drops to 81.52% at N=3 and finally to 57.06% at N=6, a faster
decrease than at the other two levels. The assembly level follows
a different pattern: accuracy improves at N=3, then decreases
slightly and levels off at N=5 and N=6. Generalizing from these
patterns, the average line shows that accuracy tends to decrease
as the value of N increases.

Fig. 6. Logistic regression accuracy for N-gram TF-IDF at multi-level
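
To make the n-gram TF-IDF features behind Figure 6 concrete, the following sketch computes them in plain Python. It assumes raw term counts for TF and the smoothed IDF log((m + 1) / (df + 1)) used by Spark MLlib, where m is the number of documents; this section does not state the exact weighting that was used. The helper names and the toy API-call sequences are hypothetical. The sketch also counts how many distinct n-grams occur in more than one sequence, since longer n-grams are generally shared by fewer samples, which may contribute to the lower accuracy at higher N.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the list of consecutive n-token tuples in `tokens`."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_tfidf(docs, n):
    """Return one {ngram: tf * idf} dict per document."""
    counts = [Counter(ngrams(doc, n)) for doc in docs]
    m = len(docs)
    # Document frequency: number of documents containing each n-gram.
    df = Counter(g for c in counts for g in c)
    # Smoothed IDF (assumed weighting, as in Spark MLlib).
    idf = {g: math.log((m + 1) / (d + 1)) for g, d in df.items()}
    return [{g: tf * idf[g] for g, tf in c.items()} for c in counts]


def shared_ngrams(docs, n):
    """Count distinct n-grams that occur in at least two documents."""
    df = Counter(g for doc in docs for g in set(ngrams(doc, n)))
    return sum(1 for d in df.values() if d >= 2)


# Toy function-call sequences (assumption, for illustration only).
docs = [
    ["LoadLibraryW", "GetProcAddress", "CreateFileW", "WriteFile", "CloseHandle"],
    ["LoadLibraryW", "GetProcAddress", "CreateFileW", "ReadFile", "CloseHandle"],
    ["CreateFileW", "WriteFile", "CloseHandle", "ExitProcess"],
]

features = ngram_tfidf(docs, 2)
overlap = {n: shared_ngrams(docs, n) for n in (1, 2, 3, 4)}
```

On this toy data `overlap` falls from 5 shared unigrams to 0 shared 4-grams, so the feature vectors of different samples have less and less in common as N grows.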
VI. CONCLUSION AND FUTURE WORKS

In this work, we proposed a multi-level ransomware detection
framework on a big data platform that leverages techniques from
NLP, machine learning, and reverse engineering. We analyzed
ransomware at different levels of code, moving from DLLs to
function calls and then to assembly instructions, to better
understand its various components and payloads. For faster
processing we used an Apache Spark computing environment, but a
general-purpose computer can also be used. The empirical results
of the multi-level analysis are encouraging enough to motivate
further research on detecting emerging ransomware effectively.
Our contributions can be summarized as follows:
• We designed a framework for multi-level analysis that
utilizes DLLs, function calls, and assembly instructions
while exploiting NLP schemes and machine learning
classifiers.
• We explored the distinguishing n-gram sequences at each
level for ransomware binary samples. These differences in
n-gram sequences yielded a good feature database that
improves the detection rate at the different levels.
• The multi-level analysis produces better detection results
than the individual levels. The highest detection accuracy
for n-gram TF-IDF is 98.59% at N=3, followed by 97.13% at
N=2.
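
One simple way to combine the three levels into a single feature set is to prefix each n-gram feature with its level name and merge the resulting mappings. This is only a sketch of that idea, under the assumption that each level yields a sparse {feature: weight} mapping; how the levels were actually combined is not specified in this section, and the function name, level keys, and weights below are hypothetical.

```python
def combine_levels(per_level):
    """Merge per-level feature dicts into one, namespacing keys by level."""
    combined = {}
    for level, feats in per_level.items():
        for ngram, weight in feats.items():
            # The level prefix keeps identical n-grams from different levels apart.
            combined[(level, ngram)] = weight
    return combined


# Hypothetical TF-IDF weights for one binary (assumption, for illustration only).
sample = {
    "dll": {("kernel32", "user32", "gdi32"): 0.41},
    "function": {("CreateFileW", "WriteFile", "CloseHandle"): 0.29},
    "assembly": {("xor", "push", "call"): 0.18, ("push", "call", "mov"): 0.05},
}

vector = combine_levels(sample)
```

The merged mapping can then be passed to any classifier that accepts sparse features, so that all three levels contribute to a single decision.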
In the future, we plan to conduct experiments that apply deep
learning techniques to the combined features of our multi-level
analysis. We also plan to compare the performance of our
framework with that of other relevant frameworks. In addition,
we will explore program obfuscation techniques such as junk code
insertion, randomization to slow down the encryption process,
polymorphic code, and multi-threaded attacks.
R EFERENCES [25] X. Wang and S. M. Yiu. A multi-task learning model for malware
classification with useful file access pattern from api call sequence. arXiv
[1] Apache spark mlib. https://spark.apache.org/mllib/. preprint arXiv:1610.05945, 2016.
[2] Cso cybersecurity business report. https://www.csoonline.com/article/- [26] W.-C. Wu and S.-H. Hung. Droiddolphin: a dynamic android malware
3237674/ransomware-damage-costs-predicted-to-hit-115b-by- detection framework using big data and machine learning. In Proceed-
2019.html. ings of the 2014 Conference on Research in Adaptive and Convergent
[3] National vulnerability database. https://nvd.nist.gov/vuln/detail/CVE- Systems, pages 247–252. ACM, 2014.
2017-0144. 2017. [27] H. Zhang, X. Xiao, F. Mercaldo, S. Ni, F. Martinelli, and A. K. Sangaiah.
[4] Pe-parse tool. https://github.com/trailofbits/pe-parse. Classification of ransomware families with machine learning based on
[5] Ransomware: How to prevent being attacked and recover after an n-gram of opcodes. Future Generation Computer Systems, 90:211–221,
attack. https://www.backblaze.com/blog/complete-guide-ransomware/. 2019.
April, 2019.
[6] Second florida city pays giant ransom to ransomware gang in
a week. https://www.zdnet.com/article/second-florida-city-pays-giant-
ransom-to-ransomware-gang-in-a-week/. June, 2019.
[7] Thezoo, make the possibility of malware analysis open and available to
the public. https://github.com/ytisf/theZoo.
[8] What to do if you’re infected by ransomware.
https://www.tomsguide.com/us/ransomware-what-to-do-next,news-
25107.html. June, 2017.
[9] B. Alsulami, A. Srinivasan, H. Dong, and S. Mancoridis. Lightweight
behavioral malware detection for windows platforms. In 2017 12th Inter-
national Conference on Malicious and Unwanted Software (MALWARE),
pages 75–81. IEEE, 2017.
