EntropLyzer: Android Malware Classification and Characterization Using Entropy Analysis of Dynamic Characteristics

2021 Reconciling Data Analytics, Automation, Privacy, and Security: A Big Data Challenge (RDAAPS) | 978-1-7281-6937-8/20/$31.00 ©2021 IEEE | DOI: 10.1109/RDAAPS48126.2021.9452002
David Sean Keyes, Faculty of Computer Science, University of New Brunswick, Fredericton, Canada
Beiqi Li, Canadian Institute for Cybersecurity, University of New Brunswick, Fredericton, Canada
Gurdip Kaur, Canadian Institute for Cybersecurity, University of New Brunswick, Fredericton, Canada
Arash Habibi Lashkari, Canadian Institute for Cybersecurity, University of New Brunswick, Fredericton, Canada
Francois Gagnon, Canadian Centre for Cyber Security, Ottawa, Canada
Frédéric Massicotte, Canadian Centre for Cyber Security, Ottawa, Canada

Abstract—The unmatched threat of Android malware has tremendously increased the need for analyzing prominent malware samples. There are remarkable efforts in static and dynamic malware analysis using static features and API calls, respectively. Nonetheless, there is a gap in classifying Android malware by analyzing its behavior using multiple dynamic characteristics. This paper proposes EntropLyzer, an entropy-based behavioral analysis technique for classifying the behavior of 12 prominent Android malware categories and 147 malware families taken from the CCCS-CIC-AndMal2020 dataset. This work uses six classes of dynamic characteristics (memory, API, network, logcat, battery, and process) to classify and characterize Android malware. Results reveal that the entropy-based analysis successfully determines the behavior of all malware categories and most of the malware families before and after rebooting the emulator.

Index Terms—android malware, entropy analysis, malware behavior, malware classification, malware characterization

I. INTRODUCTION

Android is the leading platform that provides users with a high-performance operating system that runs today's most popular mobile devices. According to a report published by the International Data Corporation (IDC), Android dominated the market with 85% of global market share in the last quarter of 2020 [5]. The annual global shipment volume of Android is expected to grow by 150 million units in 2021 [4]. The unprecedented upsurge in Android's dominance in the marketplace makes it challenging to detect the rising number of Android malware samples. As of March 2020, the total number of new Android malware samples amounted to 482,579 per month [3].

The only way to curb the menace of Android malware is to detect and analyze it statically or dynamically. Static malware analysis extracts features from the AndroidManifest.xml file, which contains static features such as permissions, intents, API calls, and broadcast receivers and providers. Nevertheless, static analysis cannot detect malware samples that trigger themselves only on sensing a specific running environment. This gives importance to dynamic malware analysis. This paper proposes EntropLyzer, an entropy-based malware behavior analysis that inspects the behavioral change of recent malware families taken from the CCCS-CIC-AndMal2020 dataset. The main contributions of this paper are as follows:
• We perform dynamic analysis of Android malware samples taken from the CCCS-CIC-AndMal2020 dataset. It comprises 12 Android malware categories and 147 malware families that are dynamically analyzed in an emulated environment. We extract six classes of features: memory, API, network, battery, logcat, and process.
• We present EntropLyzer, an entropy-based dynamic behavioral analysis for Android malware detection. It utilizes Shannon entropy to identify behavioral changes in malware families based on six classes of features extracted from dynamic malware analysis.
• We analyze the patterns obtained from behavioral analysis to understand the nature of malware categories. These patterns are visualized to classify the behavior of malware families.

The rest of the paper is organized as follows: Section II reviews the related works on dynamic Android malware analysis. Section III details the dataset. Section IV describes the proposed methodology and is followed by the experimental architecture in Section V. The experimental results and analyses are discussed in Section VI. Finally, Section VII concludes the paper and highlights future work.

Authorized licensed use limited to: BOURNEMOUTH UNIVERSITY. Downloaded on June 30,2021 at 10:18:35 UTC from IEEE Xplore. Restrictions apply.
II. RELATED WORKS

This section summarizes prominent dynamic Android malware analysis techniques. A comprehensive review of dynamic malware analysis techniques is conducted in [27], and a taxonomy of mobile malware and attack vectors is presented in [29]. Several dynamic malware analysis works emphasize the use of specific features to detect malware, such as opcode sequences [28], intents [13], websites visited by the Android apps [36], overprivileged permissions [18], ontology-based characteristics shared by Android malware apps to leverage their presence on a device [25], and the dominance tree of API calls to find similar patterns in Android apps [7]. TraceDroid Analysis Platform [35] dynamically analyzes Android applications using a comprehensive tracing scheme. It reveals more information about Android app execution even in the emulated environment. A bag-of-words permission model [24] performs well on static analysis and is claimed to do so for dynamic analysis as well.

CANDYMAN [22] classifies Android malware families using dynamic analysis and Markov chains. It combines the information from various state transitions with the state frequency distribution to discriminate 179 malware families captured from the Drebin dataset. Zhang et al. [39] also use the Drebin dataset, extracting n-grams and applying feature hashing to attribute malware families with high accuracy. Cypider [19] is a novel fingerprinting technique that generates a fingerprint for similar malicious instances that share common features and groups them in a malicious community.

Some dynamic malware analysis techniques utilize input format to detect malware. DL-Droid [8] detects Android malware through stateful input generation by using 30,000 applications (benign and malware) on real devices. Olukoya et al. [26] gather meta-data related to unstructured input and combine it with decision trees to determine malware with different characteristics. Feature engineering is also used to map API calls to certain features and aggregate them to find the frequency of a feature [32]. Extracted malware patterns are combined with artificially generated patterns to increase the detection rate [17]. Recently, VizMal was developed to visualize dynamic traces of malware analysis [12].

DroidChain [37] combines static analysis with a behavior chain model. It addresses malware with a matrix approach by using four malware models: privacy leakage, SMS financial charges, malware installation, and privacy escalation. The state-based method is merged with the random-based method to evaluate code coverage capacity within a dynamic analysis system using real devices [38]. MalDAE [15] correlates static and dynamic API sequences to fuse and map them semantically to construct a malware detection framework. A tree augmented Naive Bayes [31] employs the conditional dependencies among relevant static and dynamic features. The authors also proposed GSDroid, a graph-based feature extraction approach to detect malware. The hybrid analysis is further merged with system calls and extracted features to achieve high accuracy [33].

There are significant efforts in detecting malware using entropy-based approaches. A representative botnet dataset is used to detect anomalous patterns of malicious activities performed by bot traffic using Shannon, Renyi, and Tsallis entropy-based approaches [10]. Structural entropy is used to detect Android malware with high precision, identifying both a malware application and its Android malware family [11]. Entropy values are also used to classify malware packed with unknown packing algorithms [9]. An entropy-based distance measure is used to discriminate the degree of metamorphism in four malware families [30] and ransomware samples [23].

To summarize, dynamic malware analysis techniques and entropy-based measures are used separately to detect malicious behavior. This paper integrates both methods to present an entropy-based malware behavior analysis for a comparatively larger dataset. We investigate the behavior change of 147 Android malware families that belong to 12 Android malware categories, based on six classes of features extracted from dynamic analysis.

III. DATASET

We collaborated with the Canadian Centre for Cyber Security (CCCS) [1] to generate a new dataset, CCCS-CIC-AndMal2020, which includes 400K Android apps (200K benign and 200K malware).

Fig. 1: Total Number of Malware Samples in Dataset

• Malware Data:
CCCS shared their real-world collected Android malware samples for our analysis. We used VirusTotal [6] to identify the malware family and labeled the dataset by requiring a consensus of 70% of the antivirus engines, to make the labels reliable. We searched for similar malware samples to group the samples in the dataset by similar characteristics. Finally, we obtained the Android malware data distribution with 14 malware categories: Adware, Backdoor, FileInfector, No Category, Potentially Unwanted Apps (PUA), Ransomware, Riskware, Scareware, Trojan, Trojan-Banker, Trojan-Dropper, Trojan-SMS, Trojan-Spy, and Zero Day, as shown in Fig. 1. We used 12 malware categories (excluding

No Category and Zero Day) for dynamic malware analysis owing to incomplete data in these categories. Further, every malware category contains several malware families. A malware family is a variant of a malware category that uses the same base code and exhibits the same basic behavior as the malware category.
• Benign Data:
For benign Android apps, we used the Androzoo dataset [21], which collects samples from different sources including the official Android market, Google Play, Anzhi, AppChina, 1mobile, and the Genome project dataset. A weekly updated list containing all the detailed information about the apps is created. We also collected 200K benign Android apps from CCCS. We used the mentioned malware and benign samples to perform static analysis [16]. For dynamic malware analysis, we used malicious samples only.
• Similarity:
We used DroidKin [14] to detect similarity among various Android malware categories and families. Although every malware category has its unique behavior and characteristics, some malware samples bear resemblance to other malware categories. To illustrate, Riskware families resemble Adware, Backdoor, and Trojan samples, as shown in Fig. 2. The similarity graph can be used to create a taxonomy of Zero Day and No Category malware samples.

Fig. 2: Similarity Graph

IV. PROPOSED METHODOLOGY

This section presents EntropLyzer, the proposed methodology to analyze the dynamic behavior of Android malware families using entropy analysis. Fig. 3 presents the step-by-step procedure to analyze malware behavior.

Fig. 3: Proposed Methodology for EntropLyzer

A. Step 1: Dynamic Malware Analysis
Android APK files are dynamically executed in a sandbox environment. The detailed architecture comprising the different components used to dynamically analyze Android malware samples is discussed in Section V. At the end of this step, we extracted a total of 141 dynamic characteristics: 23 memory features, 105 API features, two battery features, four network features, one process feature, and six logcat features. A complete list of dynamic characteristics is presented in Table II at the end of the paper.

B. Step 2: Rebooting
In this work, we found that some malware samples execute only after the emulator is rebooted. After analyzing malware samples and extracting dynamic characteristics, the emulator is rebooted and the process is repeated from the beginning to capture the dynamic characteristics again. Thus, we captured two CSV files: (1) before rebooting the emulator,

and (2) after rebooting the emulator. We refer to these files as the before reboot and after reboot files throughout the rest of the paper.

C. Step 3: Shannon Entropy
Entropy is a measure of randomness or uncertainty of a variable [34]. It is mathematically represented as:

\[ H(X) = -\sum_{i=1}^{n} P(x_i) \log P(x_i) \tag{1} \]

where H(X) is the entropy of a variable X, \sum denotes the sum over the variable's possible values, and P(x_i) is the probability of occurrence of the possible outcome x_i of variable X, where i indexes the outcomes and varies between 1 and n.

We computed the Shannon entropy for every malware sample before and after rebooting the emulator to understand its behavior changes. To compute entropy, we treated every record (malware sample) in the dataset as a one-dimensional labeled array holding integer data (values of all features extracted under six classes). Entropy for every record is computed by using equation (1), replacing X with record[i,:], as follows:

\[ H(\mathrm{record}[i,:]) = -\sum_{i=1}^{n} P(\mathrm{record}[i,:]) \log P(\mathrm{record}[i,:]) \tag{2} \]

This step involves two tasks. In the first task (step 3a), we computed the entropy value for every malware sample in the before reboot and after reboot files. This computation allows us to assign an entropy value to every malware sample. In the second task (step 3b), we computed the Shannon entropy on all features for each malware category in the before reboot and after reboot files. This lets us observe the behavioral changes of malware samples for every dynamic characteristic.

D. Step 4: Malware Family Detection
We applied Naive Bayes (NB), Support Vector Machine (SVM), Random Forest (RF), and Decision Tree (DT) machine learning (ML) classifiers to perform multi-class classification of 12 malware categories. We tried various training and testing set combinations to achieve the best results and finally split the dataset into an 80% training set and a 20% testing set. We found that the RF classifier outperforms the other ML classifiers at classifying malware families. Thus, we applied it to obtain the malware family classification of all malware categories using the same training and testing split.

E. Step 5: Malware Behavior Analysis
We used the entropy values to analyze behavioral changes in each malware family and category. First, to analyze the behavioral changes in malware families, we compared the entropy of each malware family in the before reboot and after reboot files. Second, to analyze behavioral changes in malware categories, we compared the entropy values of all features in the before reboot and after reboot files. The results of these comparisons are plotted to visualize the behavioral changes.

V. ARCHITECTURAL DETAILS

We used a virtual environment on a server to analyze the Android malware categories as presented in the previous section. Since the dataset contains 200K malware samples, we configured a virtual server farm consisting of 30 virtual machines to dynamically analyze the samples in parallel. Fig. 4 presents the architecture of our server farm. It comprises three primary components: (1) Database, (2) Android sandbox, and (3) Android emulator.
1) Database: A new malware sample is submitted from the host machine through an SSH command. It is stored in the MongoDB database and managed by a job management script. A newly submitted job in the database is forwarded to the job scheduler on the analysis server, which schedules its execution in the emulator. The database is responsible for storing malware hashes and the analysis status for the sandbox.
2) Analysis Server (or Android Sandbox): It is the most important component in the architecture. It manages emulator instances, executes individual analysis tasks including simple interactions (calls, send SMS, etc.), and invokes other components such as Frida and DroidBot. Once a new job is submitted in the database, the analysis server creates a temporary job directory, starts the emulator, uploads the Frida server to the emulator, and starts calculating values for the dynamic characteristics. After collecting all defined dynamic characteristics, the sandbox reboots the emulator and captures the dynamic characteristics again; the two captures are stored in the before reboot and after reboot files, respectively.
3) Android Emulator: It is responsible for emulating the Android operating system (OS), interfacing with the analysis server (via telnet console, ADB services, and files), and collecting network traffic (pcap dump). The analysis life cycle of malware samples is controlled by the sandbox. The Frida server under the Android OS collects API logs from the apps under test and uses the MobSF payload to attach to a specific emulator spawned by the sandbox. It dumps network traffic via the telnet console and simulates the basic phone usage scenario (phone call, SMS, GPS toggle, web browsing, etc.).

In addition to the main components described above, a shared folder is created on a storage server to store Android malware samples and dynamic analysis results. It is important to mention that we were unable to execute some malware samples from the dataset. Some Android malware samples had no entry point, and some stopped abruptly. Therefore, the number of malware samples that we executed successfully is considerably smaller than the number in the dataset. The total number of malware samples for each family before and after rebooting the emulator is shown in Fig. 5.

VI. ANALYSIS AND DISCUSSION

This section presents major findings and discussion of results.
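As an illustration of step 3a in Section IV-C, the per-record entropy of equation (2) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: P is estimated as the relative frequency of each distinct value within a record, the logarithm is base 2, and the function name `shannon_entropy` and the two example records are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence


def shannon_entropy(record: Sequence[int]) -> float:
    """Shannon entropy (in bits) of the values held by one record.

    Assumption: P(x) is the relative frequency of value x among the
    record's feature values; the paper does not state the estimator
    or the logarithm base.
    """
    n = len(record)
    if n == 0:
        return 0.0
    counts = Counter(record)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Hypothetical feature rows for one sample (illustrative values only).
before_reboot = [0, 0, 3, 3, 7, 7, 12, 12]   # four equally frequent values
after_reboot = [0, 0, 0, 0, 0, 0, 12, 12]    # one dominant value

print(shannon_entropy(before_reboot))  # 2.0 bits
print(shannon_entropy(after_reboot))   # lower: the record is less varied
```

A record whose feature values are spread evenly over many distinct values scores high, while a record dominated by one value scores low, so a shift between the two files signals a change in the sample's observed activity.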

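The family-level comparison of step 5 (Section IV-E) can be sketched in the same spirit. The sketch assumes that each file has already been reduced to (family, entropy) pairs and that families are compared by their mean entropy; both the aggregation and the names `mean_entropy_by_family` and `entropy_shift` are hypothetical choices, since the paper only states that the entropies are compared and plotted.

```python
from collections import defaultdict
from statistics import mean


def mean_entropy_by_family(rows):
    """Average per-sample entropy for each family.

    `rows` is an iterable of (family, entropy) pairs.
    """
    grouped = defaultdict(list)
    for family, entropy in rows:
        grouped[family].append(entropy)
    return {family: mean(values) for family, values in grouped.items()}


def entropy_shift(before_rows, after_rows):
    """Mean entropy after reboot minus mean entropy before reboot.

    Only families present in both files are compared.
    """
    before = mean_entropy_by_family(before_rows)
    after = mean_entropy_by_family(after_rows)
    return {f: after[f] - before[f] for f in before.keys() & after.keys()}


# Illustrative values only; family names are placeholders.
before_rows = [("family_a", 2.0), ("family_a", 3.0), ("family_b", 4.0)]
after_rows = [("family_a", 3.5), ("family_b", 1.0), ("family_b", 2.0),
              ("family_c", 2.2)]

print(entropy_shift(before_rows, after_rows))
```

A positive shift means the family's records became more varied after the reboot, and a negative shift means they became less varied.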
Fig. 4: Architecture used to Dynamically Analyze Android Malware

TABLE I: Comparison of ML Classification

Classifier   Precision   Recall   F1-Score
NB           0.412       0.171    0.138
RF           0.769       0.764    0.759
DT           0.984       0.983    0.983
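The three columns of Table I can be related to the counts in a confusion matrix as sketched below. The paper does not state how the per-class scores are averaged, so the macro average used here is an assumption; the function names and the 2x2 example matrix are hypothetical.

```python
def per_class_metrics(confusion):
    """Precision, recall, and F1 for each class.

    confusion[i][j] is the number of samples of true class i that were
    predicted as class j.
    """
    k = len(confusion)
    metrics = []
    for c in range(k):
        tp = confusion[c][c]
        fp = sum(confusion[r][c] for r in range(k)) - tp  # predicted c, wrong
        fn = sum(confusion[c]) - tp                       # true c, missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics.append((precision, recall, f1))
    return metrics


def macro_average(metrics):
    """Unweighted mean of (precision, recall, F1) over all classes."""
    k = len(metrics)
    return tuple(sum(m[i] for m in metrics) / k for i in range(3))


# Illustrative two-class confusion matrix (not taken from the paper).
confusion = [[8, 2],
             [1, 9]]
print(macro_average(per_class_metrics(confusion)))
```

High precision corresponds to few false positives and high recall to few false negatives, which is how the values in Table I are interpreted in the discussion.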

Fig. 5: Distribution of Malware Samples after Dynamic Analysis

A. Malware Category and Family Classification
We used ML classifiers to classify malware categories and families. The results of malware category classification are presented in Table I. The decision tree clearly outperforms the other classifiers by a significant margin. It gives 0.984 for precision and 0.983 for recall and F1-score. These values of precision and recall indicate that the classifier reports a negligible number of false positives and false negatives. Further, a closer look at these results confirms that memory (heap size, views, heap allocation), API (message digest update, get device ID), and network (total bytes transmitted and received) features contribute the most to the classification process.

For classifying malware families, we observed that random forest performs better than decision tree. Because the training data contains 147 malware families and a random forest builds an ensemble of trees whereas a decision tree classifier builds only one, random forest turns out to be the better classifier for malware families. The results of malware family classification are presented in Fig. 6. We classified 22 Adware, 10 Backdoor, 4 File Infector, 8 PUA, 7 Ransomware, 18 Riskware, 3 Scareware, 37 Trojan, 10 Trojan Banker, 9 Trojan Dropper, 9 Trojan SMS, and 10 Trojan Spy families, for a total of 147 malware families obtained after dynamically executing the malware samples. By inspecting the diagonal values in all the sub-figures of Fig. 6, we observe that the majority of malware families are classified correctly despite having very few samples. However, there is a prominent exception in the Ransomware category, where 50 samples of the congur family are incorrectly classified as the slocker family.

B. Entropy-based Behavior Analysis for Malware Categories
To dig deeper into the entropy distribution for malware categories, we plotted the kernel density estimate (KDE) of the Shannon entropy for the different malware categories, as visualized in Fig. 7. We used KDE because it best suits the type of data in our dataset. On investigating the before

Fig. 6: Classification of Malware Families under different Malware Categories using Random Forest. Panels: (a) Adware, (b) Trojan, (c) Backdoor, (d) File Infector, (e) PUA, (f) Ransomware, (g) Riskware, (h) Scareware, (i) Trojan Banker, (j) Trojan Dropper, (k) Trojan SMS, (l) Trojan Spy.
reboot and after reboot entropy of malware categories, we observe that the entropy of every malware category changes after rebooting the emulator. There are two spikes in entropy values, corresponding to the Riskware and Trojan categories, as compared to other malware categories. The entropy density of Trojan Banker and Trojan Dropper is also higher than that of the rest of the malware categories.

Adware samples primarily alter the message digest and attempt to encode and decode base64 format. Based on dynamic characteristics, Adware samples undergo massive changes in memory (PssClean, PrivateClean, Views, ViewRootImpl), API (registerReceiver, digest update, base64 encode and decode, notify, sessions), network (total transmitted/received packets/bytes), and logcat features. Backdoor samples access cryptographic keys and update the message digest.

File Infector samples retrieve system properties and access API sessions as two significant activities. PUA samples mainly communicate with other devices and access shared memory pages. Ransomware samples attempt to encode and decode base64 format and share memory pages with other processes.

Looking deeper into Riskware behavior, significant alterations in memory features (before and after reboot) account for the spiked entropy change in Riskware samples. Although there are important modifications in API, network, and logcat features before and after reboot, these modifications are much smaller than the difference in memory features before and after reboot.

Scareware samples try to get system properties and access API sessions in addition to communicating with network devices. For Trojan samples, the most significant changes are witnessed in memory (PssClean, ViewRootImpl, AppContexts, PrivateClean), API (execSQL, openDatabase, loadUrl, registerReceiver, digest, notify, registerContentObserver), network, and logcat features. In particular, the average change in Memory PssClean values after reboot is 373% higher than before reboot. Overall, this indicates that Trojan samples share an extremely large number of memory pages across processes, attempt to access the database, and communicate with other devices in the network.

Major entropy changes in Trojan Banker samples are observed due to memory (PssClean, PrivateClean, AppContexts), API (loadUrl, registerReceiver, handleReceiver, getDeviceId, getSimSerialNumber, registerContentObserver), network, and logcat features. Trojan Banker tries to retrieve the smartphone's device identifier and SIM serial number in addition to communicating with network devices and accessing memory pages shared with other processes.

Trojan Dropper's behavior is impacted by memory (PssClean, PrivateClean), API (SecretKeySpec, digest, digest update, registerContentObserver, sessions), network, and logcat features. These samples are observed to modify cipher keys, update the message digest in API sessions, and communicate with other devices in the network.

Trojan SMS samples are primarily involved in sending messages to other devices. There are significantly larger changes in memory (PssClean, PrivateClean, Views, ViewRootImpl), API (registerContentObserver, sessions), network, and logcat features. For Trojan Spy, significant alterations are due to memory (PssClean, PrivateClean, Views), API (sessions), network, and logcat features.

Overall, we conclude that memory (PssClean, PrivateClean), network (total transmitted/received packets/bytes), and logcat features predominantly change before and after reboot for all malware categories and contribute to a large number of behavioral changes in malware samples. We observed hardly any changes in battery features for any malware category. Further, the specific malware behavior of each category is also defined based on the changes in dynamic characteristics before and after a reboot.

C. Entropy-based Behavior Analysis for Malware Families
After analyzing the behavioral changes for malware categories, we plotted the KDE of the entropy for malware families under each malware category, as shown in Fig. 8. These plots give a deeper insight into the entropy distribution of malware families. For plotting these entropy curves, we excluded malware families with fewer than ten samples so that the curves are representative of the identified malware behavior. Moreover, we analyzed the behavior of prominent malware families with spiked entropy values. As evident from the behavior analysis of the different malware categories, all malware families communicate with other devices in the network, share memory pages with other processes, and perform logcat activities.
(1) Adware: The Adware malware category has three malware families highlighted in Fig. 8: appad, mobclick, and adend. Appad performs multiple functions. It accesses the database and executes SQLite queries, opens files and writes into them, starts services, registers receivers, creates threads for interprocess communication, accesses cipher keys and updates the message digest, gets the device ID, and verifies from device information whether a debugger is connected. In addition to all the aforementioned activities, mobclick sends broadcast messages whilst adend mainly initiates new activities.
(2) Backdoor: The kmin family gets the subscriber ID of the device as its main functionality.
(3) File Infector: Gudex and tachi are the two main families under File Infector that fetch the network country ISO. Additionally, tachi gets system properties to fetch the IP address of the wifi device, while gudex updates the message digest and sends text messages.
(4) PUA: Umpay executes database queries. Scamapp opens URL connections and input files, starts new activities, and gets the network country ISO. Apptrack opens input and output files and gets the network operator.
(5) Ransomware: There are six major entropy value changes, in the koler, lockscreen, congur, masnu, jisut, and slocker families. Koler loads URLs and adds web interfaces, and gets the device ID, SIM serial number, and system properties. Lockscreen, congur, slocker, and jisut obtain system properties and API sessions. Masnu accesses the database and executes queries. Additionally, it updates the message digest and identifies the secret key.

Fig. 7: Distribution of Entropy for Malware Categories. Panels: (a) Before Reboot, (b) After Reboot.

(6) Riskware: A significant behavior change is observed in all malware families under this category. Jiagu accesses the wakelock service of the device to keep it awake. Mobilepay starts new activities and fetches the message digest. Smsreg is one of the largest malware families and primarily executes database queries. Triada gets the SIM serial number and accesses cryptographic keys. Wificrack also accesses cipher keys. Dnotua updates the message digest.
(7) Scareware: It is the smallest category in the dataset, with the fewest malware families. Fakeapp updates the message digest whilst avpass opens a URL connection.
(8) Trojan: Wkload gets the device ID, subscriber ID, SIM operator name, and network operator name. Rootnik starts new services and gets system properties. Qysly opens file input and a URL connection. Gappusin starts new activities and accesses the message digest. Autoins mainly fetches the message digest.
(9) Trojan Banker: Minimob is the only family that undergoes major entropy changes. It starts new activities and services, and gets the installed packages on the device.
(10) Trojan Dropper: Ramnit initiates a new service while ztorg executes SQLite queries and opens input files.
(11) Trojan SMS: The entropy of jsmsshider decreases considerably after rebooting the emulator. Based on the captured dynamic values, it sends text messages and accesses the wakelock service of the device to keep it awake.
(12) Trojan Spy: There are two primary families under Trojan Spy: qqspy and spynote. Qqspy gets system properties and API sessions, whilst spynote spies on the network country ISO and gets system properties as its two major activities.

VII. CONCLUSION AND FUTURE SCOPE

Android malware dominates the Internet with a massive increase in the number of new samples every day. Although there are notable efforts to analyze these samples and eliminate them, the menace continues. This paper introduced EntropLyzer, an entropy-based behavioral analysis technique for classifying Android malware categories and families in the CCCS-CIC-AndMal2020 dataset. We performed dynamic malware analysis in a virtual environment to extract six classes of dynamic characteristics: memory, API, network, logcat, battery, and process. As apparent from the results, we classified all malware categories with 0.984 precision and 0.983 recall using the decision tree classifier. Further, we were able to compute the entropy and analyze the behavior of all malware categories and most of the malware families before and after rebooting the emulator.

However, this work has some limitations. First, the dynamic analysis is performed in an emulator. Some malware samples are able to detect the emulated environment and do not execute. Therefore, we obtained fewer malware samples after dynamic analysis than the actual number of malware samples in the dataset. Second, since the number of malware samples was reduced after dynamic analysis, we excluded from the behavior analysis the malware families that contain fewer than ten samples, so that the entropy curves are representative of the analyzed behavior for that malware category. To overcome these limitations, dynamic analysis can be performed on real smartphone devices to analyze more malware samples.

AVAILABILITY

The source code for the Android app dynamic analyzer and the list of extracted features are publicly available on GitHub [20], and the dataset will be made publicly available at [2] after publication of this work.

ACKNOWLEDGMENT

We thank the Mitacs Globalink Program for providing the Research Internship (GRI) opportunity and the Harrison McCain Young Scholar Foundation funds from the University of New Brunswick (UNB) for supporting this project. We also thank CCCS for sharing the Android apps for the CCCS-CIC-AndMal2020 dataset with us.

Fig. 8: Comparison of Entropy for Malware Families before (left) and after (right) reboot. Panels: (a) Adware, (b) Backdoor, (c) File Infector, (d) PUA, (e) Ransomware, (f) Riskware, (g) Scareware, (h) Trojan, (i) Trojan Banker, (j) Trojan Dropper, (k) Trojan SMS, (l) Trojan Spy.
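The before/after comparison in Fig. 8 can be illustrated with a minimal sketch. It assumes that entropy is the Shannon entropy [34] of the empirical distribution of one dynamic feature's values across the samples of a family; the exact computation used by EntropLyzer may differ. The function names, the input layout, and the `MIN_SAMPLES` constant are hypothetical; the threshold of ten mirrors the rule of skipping families with fewer than ten samples.

```python
import math
from collections import Counter

# Assumption: families with fewer than ten samples are skipped,
# following the limitation described in the text.
MIN_SAMPLES = 10


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    # H = sum over distinct values of p * log2(1 / p), with p = count / total.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def entropy_change(before, after):
    """Return (H_before, H_after, H_after - H_before) for one feature."""
    h_before = shannon_entropy(before)
    h_after = shannon_entropy(after)
    return h_before, h_after, h_after - h_before


def family_entropy_changes(families):
    """Compute the entropy change per family.

    `families` maps a family name to a pair (before_values, after_values),
    holding one observed feature value per sample (hypothetical layout).
    """
    results = {}
    for name, (before, after) in families.items():
        if len(before) < MIN_SAMPLES or len(after) < MIN_SAMPLES:
            continue  # too few samples for a representative entropy curve
        results[name] = entropy_change(before, after)
    return results
```

A negative third element of a result tuple would correspond to a family whose entropy drops after reboot, as reported for jsmsshider.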

REFERENCES
[1] Canadian centre for cyber security, 2020, https://cyber.gc.ca/en/.
[2] Cccs-cic-andmal-2020, 2020, https://www.unb.ca/cic/datasets/andmal2020.html.
[3] Development of new android malware worldwide from june 2016 to march 2020, 2020, https://www.statista.com/statistics/680705/global-android-malware-volume/statisticContainer.
[4] Global smartphone shipments to witness double-digit growth in 2021, 2020, https://www.androidheadlines.com/2020/10/global-smartphone-shipments-to-witness-double-digit-growth-in-2021.html.
[5] Smartphone market share, 2020, https://www.idc.com/promo/smartphone-market-share/os.
[6] Virustotal, 2020, https://www.virustotal.com/gui/home/upload.
[7] Shahid Alam, Soltan Abed Alharbi, and Serdar Yildirim, Mining nested flow of dominant APIs for detecting android malware, Computer Networks 167 (2020), 107026.
[8] Mohammed K. Alzaylaee, Suleiman Y. Yerima, and Sakir Sezer, DL-Droid: Deep learning based android malware detection using real devices, Computers & Security 89 (2020).
[9] Munkhbayar Bat-Erdene, Hyundo Park, Hongzhe Li, Heejo Lee, and Mahn-Soo Choi, Entropy analysis to classify unknown packing algorithms for malware detection, International Journal of Information Security 16 (2017), 227–248.
[10] Przemysław Bereziński, Bartosz Jasiul, and Marcin Szpyrka, An entropy-based network anomaly detection method, Entropy 17(4) (2015), 2367–2408.
[11] Gerardo Canfora, Francesco Mercaldo, and Corrado Aaron Visaggio, An hmm and structural entropy based detector for android malware: An empirical study, Computers & Security 61 (2016), 1–18.
[12] Andrea De Lorenzo, Fabio Martinelli, Eric Medvet, Francesco Mercaldo, and Antonella Santone, Visualizing the outcome of dynamic analysis of Android malware with VizMal, Journal of Information Security and Applications 50 (2020).
[13] Ali Feizollah, Nor Badrul Anuar, Rosli Salleh, Guillermo Suarez-Tangil, and Steven Furnell, AndroDialysis: Analysis of Android Intent Effectiveness in Malware Detection, Computers & Security 65 (2017), 121–134.
[14] Hugo Gonzalez, Natalia Stakhanova, and Ali A. Ghorbani, Droidkin: Lightweight detection of android apps similarity, International Conference on Security and Privacy in Communication Networks (2014).
[15] Weijie Han, Jingfeng Xue, Yong Wang, Lu Huang, Zixiao Kong, and Limin Mao, MalDAE: Detecting and explaining malware based on correlation and fusion of static and dynamic characteristics, Computers & Security 83 (2019), 208–233.
[16] hidden3 hidden1, hidden2 and hidden4, Didroid: Android malware classification and characterization using deep image learning, Proceedings of the 10th International Conference on Communication and Network Security, Japan (2020).
[17] Manel Jerbi, Zaineb Chelly Dagdia, Slim Bechikh, Mohamed Makhlouf, and Lamjed Ben Said, On the use of artificial malicious patterns for android malware detection, Computers & Security 92 (2020), 17–43.
[18] Abdullah Talha Kabakus and Ibrahim Alper Dogru, An in-depth analysis of Android malware using hybrid techniques, Digital Investigation 24 (2018), 25–33.
[19] ElMouatez Karbab, Mourad Debbabi, Abdelouahid Derhab, and Djedjiga Mouheb, Scalable and robust unsupervised android malware fingerprinting using community-based network partitioning, Computers & Security 96 (2020), 19–32.
[20] Arash Lashkari, Android dynamic analyser, Dec 2020, https://github.com/ahlashkari/backup-AndroidDynamicAnalyser.
[21] Li Li, Jun Gao, Médéric Hurier, Pingfan Kong, Tegawendé F. Bissyandé, Alexandre Bartel, Jacques Klein, and Yves Le Traon, Androzoo++: Collecting millions of android apps and their metadata for the research community, Proceedings of the 13th International Conference on Mining Software Repositories (2017), 468–471.
[22] Alejandro Martín, Víctor Rodríguez-Fernández, and David Camacho, CANDYMAN: Classifying Android malware families by modelling dynamic traces with Markov chains, Engineering Applications of Artificial Intelligence 74 (2018), 121–133.
[23] Timothy McIntosh, Julian Jang-Jaccard, Paul Watters, and Teo Susnjak, The inadequacy of entropy-based ransomware detection, Neural Information Processing, Communications in Computer and Information Science, Springer, Cham 1143 (2019), 181–189.
[24] Nikola Milosevic, Ali Dehghantanha, and Kim Kwang Raymond Choo, Machine learning aided Android malware classification, Computers and Electrical Engineering 61 (2017), 266–274.
[25] Luiz C. Navarro, Alexandre K.W. Navarro, André Grégio, Anderson Rocha, and Ricardo Dahab, Leveraging ontologies and machine-learning techniques for malware analysis into Android permissions ecosystems, Computers & Security 78 (2018), 429–453.
[26] Oluwafemi Olukoya, Lewis Mackenzie, and Inah Omoronyia, Towards using unstructured user input request for malware detection, Computers & Security 93 (2020).
[27] Ori Or-Meir, Nir Nissim, Yuval Elovici, and Lior Rokach, Dynamic malware analysis in the modern era - A state of the art survey, ACM Computing Surveys 52 (2019).
[28] Abdurrahman Pektaş and Tankut Acarman, Learning to detect Android malware via opcode sequences, Neurocomputing 396 (2020), 599–608.
[29] Attia Qamar, Ahmad Karim, and Victor Chang, Mobile malware attacks: Review, taxonomy & future directions, Future Generation Computer Systems 97 (2019), 887–909.
[30] Esmaeel Radkani, Sattar Hashemi, Alireza Keshavarz-Haddad, and Maryam Amir Haeri, An entropy-based distance measure for analyzing and detecting metamorphic malware, Applied Intelligence 48 (2018), 1536–1546.
[31] Surendran Roopak, Thomas Tony, and Emmanuel Sabu, Gsdroid: Graph signal based compact feature representation for android malware detection, Expert Systems with Applications 159 (2020).
[32] Arindaam Roy, Divjeet Singh Jas, Gitanjali Jaggi, and Kapil Sharma, Android Malware Detection based on Vulnerable Feature Aggregation, Procedia Computer Science 173 (2020), 345–353.
[33] Dina Saif, S. M. El-Gokhy, and E. Sallam, Deep Belief Networks-based framework for malware detection in Android systems, Alexandria Engineering Journal 57 (2018), 4049–4057.
[34] Claude E. Shannon, A mathematical theory of communication, The Bell System Technical Journal 27 (1948), 379–423.
[35] Victor Van Der Veen, Herbert Bos, and Christian Rossow, Dynamic Analysis of Android Malware, Internet & Web Technology Master thesis, VU University Amsterdam (2013), 106.
[36] Shanshan Wang, Zhenxiang Chen, Qiben Yan, Ke Ji, Lizhi Peng, Bo Yang, and Mauro Conti, Deep and broad URL feature mining for android malware detection, Information Sciences 513 (2020), 600–613.
[37] Zhaoguo Wang, Chenglong Li, Zhenlong Yuan, Yi Guan, and Yibo Xue, DroidChain: A novel Android malware detection method based on behavior chains, Pervasive and Mobile Computing 32 (2016), 3–14.
[38] Suleiman Y. Yerima, Mohammed K. Alzaylaee, and Sakir Sezer, Machine learning-based dynamic analysis of Android apps with improved code coverage, Eurasip Journal on Information Security 2019 (2019), 1–24.
[39] Li Zhang, Vrizlynn L.L. Thing, and Yao Cheng, A scalable and extensible framework for android malware detection and family attribution, Computers & Security 80 (2019), 120–133.

TABLE II: Extracted Dynamic Features
Category Feature
Memory PssTotal, PssClean, SharedDirty, PrivateDirty, SharedClean, PrivateClean, SwapPssDirty, HeapSize,
HeapAlloc, HeapFree, Views, ViewRootImpl, AppContexts, Activities, Assets, AssetManagers,
LocalBinders, ProxyBinders, ParcelMemory, ParcelCount, DeathRecipients, OpenSSLSockets, WebViews
Network TotalReceivedBytes, TotalReceivedPackets, TotalTransmittedBytes, TotalTransmittedPackets
Battery wakelock, service
Logcat verbose, debug, info, warning, error, total
Process total
API Process android.os.Process start, Process android.app.ActivityManager killBackgroundProcesses
Process android.os.Process killProcess
Command java.lang.Runtime exec, Command java.lang.ProcessBuilder start
JavaNativeInterface java.lang.Runtime loadLibrary
JavaNativeInterface java.lang.Runtime load
WebView android.webkit.WebView loadUrl, WebView android.webkit.WebView loadData
WebView android.webkit.WebView loadDataWithBaseURL
WebView android.webkit.WebView addJavascriptInterface
WebView android.webkit.WebView evaluateJavascript, WebView android.webkit.WebView postUrl
WebView android.webkit.WebView postWebMessage, WebView android.webkit.WebView savePassword
WebView android.webkit.WebView setHttpAuthUsernamePassword
WebView android.webkit.WebView getHttpAuthUsernamePassword
WebView android.webkit.WebView setWebContentsDebuggingEnabled, FileIO libcore.io.IoBridge open
FileIO android.content.ContextWrapper openFileInput
FileIO android.content.ContextWrapper openFileOutput
FileIO android.content.ContextWrapper deleteFile
Database android.content.ContextWrapper openOrCreateDatabase
Database android.content.ContextWrapper databaseList
Database android.content.ContextWrapper deleteDatabase
Database android.database.sqlite.SQLiteDatabase execSQL
Database android.database.sqlite.SQLiteDatabase insert
Database android.database.sqlite.SQLiteDatabase deleteDatabase
Database android.database.sqlite.SQLiteDatabase getPath
Database android.database.sqlite.SQLiteDatabase insertOrThrow
Database android.database.sqlite.SQLiteDatabase insertWithOnConflict
Database android.database.sqlite.SQLiteDatabase openDatabase
Database android.database.sqlite.SQLiteDatabase openOrCreateDatabase
Database android.database.sqlite.SQLiteDatabase query
Database android.database.sqlite.SQLiteDatabase queryWithFactory
Database android.database.sqlite.SQLiteDatabase rawQuery
Database android.database.sqlite.SQLiteDatabase rawQueryWithFactory
Database android.database.sqlite.SQLiteDatabase update
Database android.database.sqlite.SQLiteDatabase updateWithOnConflict
Database android.database.sqlite.SQLiteDatabase compileStatement
Database android.database.sqlite.SQLiteDatabase create
IPC android.content.ContextWrapper sendBroadcast
IPC android.content.ContextWrapper sendStickyBroadcast
IPC android.content.ContextWrapper startActivity, IPC android.content.ContextWrapper startService
IPC android.content.ContextWrapper stopService, IPC android.content.ContextWrapper registerReceiver
Binder android.app.ContextImpl registerReceiver, Binder android.app.ActivityThread handleReceiver
Binder android.app.Activity startActivity
Crypto javax.crypto.spec.SecretKeySpec init, Crypto javax.crypto.Cipher doFinal
Crypto-Hash java.security.MessageDigest digest, Crypto-Hash java.security.MessageDigest update
DeviceInfo android.net.wifi.WifiInfo getBSSID, DeviceInfo android.net.wifi.WifiInfo getIpAddress
DeviceInfo android.net.wifi.WifiInfo getNetworkId
DeviceInfo android.telephony.TelephonyManager getSimCountryIso
DeviceInfo android.telephony.TelephonyManager getSimSerialNumber
DeviceInfo android.telephony.TelephonyManager getNetworkCountryIso
DeviceInfo android.telephony.TelephonyManager getDeviceSoftwareVersion
DeviceInfo android.os.Debug isDebuggerConnected
DeviceInfo android.content.pm.PackageManager getInstallerPackageName
DeviceInfo android.content.pm.PackageManager getInstalledApplications
DeviceInfo android.content.pm.PackageManager getInstalledModules
DeviceInfo android.content.pm.PackageManager getInstalledPackages
Network java.net.URL openConnection, Network org.apache.http.impl.client.AbstractHttpClient execute
Network com.android.okhttp.internal.huc.HttpURLConnectionImpl getInputStream
Network com.android.okhttp.internal.http.HttpURLConnectionImpl getInputStream
DexClassLoader dalvik.system.BaseDexClassLoader findResource
DexClassLoader dalvik.system.BaseDexClassLoader findResources
DexClassLoader dalvik.system.BaseDexClassLoader findLibrary
DexClassLoader dalvik.system.DexFile loadDex, DexClassLoader dalvik.system.DexFile loadClass
DexClassLoader dalvik.system.DexClassLoader init
Base64 android.util.Base64 decode, Base64 android.util.Base64 encode
Base64 android.util.Base64 encodeToString
SystemManager android.app.ApplicationPackageManager setComponentEnabledSetting
SystemManager android.app.NotificationManager notify
SystemManager android.telephony.TelephonyManager listen
SystemManager android.content.BroadcastReceiver abortBroadcast
SMS android.telephony.SmsManager sendTextMessage
SMS android.telephony.SmsManager sendMultipartTextMessage
DeviceData android.content.ContentResolver query
DeviceData android.content.ContentResolver registerContentObserver
DeviceData android.content.ContentResolver insert, DeviceData android.content.ContentResolver delete
DeviceData android.accounts.AccountManager getAccountsByType
DeviceData android.accounts.AccountManager getAccounts
DeviceData android.location.Location getLatitude, DeviceData android.location.Location getLongitude
DeviceData android.media.AudioRecord startRecording
DeviceData android.media.MediaRecorder start, DeviceData android.os.SystemProperties get
DeviceData android.app.ApplicationPackageManager getInstalledPackages
sessions
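
As an illustration of how the API rows of Table II might be handled programmatically, the sketch below groups a few feature names by their subcategory. It assumes each API feature follows the flattened "<subcategory> <class> <method>" layout seen in the table; the trailing "sessions" row does not fit this layout and is left out. The function names and the sample list are hypothetical.

```python
from collections import defaultdict

# A few API feature names copied from Table II in their flattened
# "<subcategory> <class> <method>" form.
API_FEATURES = [
    "Process android.os.Process start",
    "Command java.lang.Runtime exec",
    "WebView android.webkit.WebView loadUrl",
    "WebView android.webkit.WebView postUrl",
    "Crypto-Hash java.security.MessageDigest digest",
    "SMS android.telephony.SmsManager sendTextMessage",
]


def parse_api_feature(name):
    """Split one feature name into (subcategory, class name, method)."""
    # Assumption: exactly three whitespace-separated fields per feature.
    subcategory, class_name, method = name.split()
    return subcategory, class_name, method


def group_by_subcategory(names):
    """Map each subcategory to the list of 'class.method' calls it covers."""
    groups = defaultdict(list)
    for name in names:
        subcategory, class_name, method = parse_api_feature(name)
        groups[subcategory].append(f"{class_name}.{method}")
    return dict(groups)
```

Grouping in this way makes it easier to relate the family behaviours described in the text (for example, sending text messages) to the subcategory that records them (here, SMS).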
