IPAnalyzer
https://doi.org/10.1007/s11042-024-18511-6
Abstract
Android malware has been growing in scale and complexity, spurred by the unabated uptake
of smartphones worldwide. Millions of malicious Android applications have been detected
in the past few years, posing severe threats like system damage, information leakage, etc.
This calls for novel approaches to mitigate the growing threat of Android malware. Among
various detection schemes, permission and intent-based ones have been widely proposed in
the literature. However, many permission and intent patterns are similar in normal and
malware datasets. Such high similarity in both datasets’ permission and intent patterns
motivates us to rank them to find the distinguishing features. Hence, we have proposed a
novel Android malware detection system named IPAnalyzer that first ranks the permissions
and intents with a frequency-based Chi-square test. Then, the system applies a novel detec-
tion algorithm that combines ranked permissions and intents and involves various machine
learning and deep learning classifiers. As a result, the proposed system gives the best set
of permissions and intents with higher detection accuracy as an output. The experimental
results highlight that our proposed approach can effectively detect Android malware with
98.49% detection accuracy, achieved with the combination of the top six permissions and
top six intents. Furthermore, our experiments demonstrate that the proposed system with the
Chi-square ranking is better than other statistical tests like Mutual Information and Pearson
Correlation Coefficient. Moreover, the proposed model can detect Android malware with better
accuracy and fewer features than various state-of-the-art techniques for Android
malware detection.
Yash Sharma
Anshul Arora
1 Department of Applied Mathematics, Delhi Technological University, Delhi -110042, India
Multimedia Tools and Applications
1 Introduction
Initially, mobile phones were commonly used only for voice calls and short
text messages. However, over time, mobile phones’ popularity and utility have grown
beyond just communication. These days mobile phones are used for almost everything and
contain personal data ranging from our social media account details to our most valuable
bank account details. Experts predicted mobile internet would eventually overtake desktop,
and they were right. Market share data for December 2020 to December
2021 show that 55% of worldwide online visits came from mobile, 43% from desktop, and 2% from
tablets.1 The world witnessed a significant hike in mobile device usage in the past ten years.
A little over 5.3 billion people use mobile phones worldwide, and a considerable percentage
rely on smartphones, each having at least 40 applications. As per the sources, by the end of
the year 2022, more than 200 billion apps will be downloaded from virtual app stores.2
Various mobile OS exist in the market, such as Android, iOS, Windows, etc. Among
them, Android is the most popular mobile OS due to its open-source, free-to-use nature and
numerous feature-rich apps. In a brief period, it has become the world’s most popular mobile
platform, used by 82% of smartphone users. Due to this rise in popularity, Android
became a direct target for malware developers.
Malware developers have been developing Android malware to exploit users’ data and damage
smartphones. Some of the most common malware threats in the market are mobile
ransomware and phishing. Mobile ransomware holds the user’s personal information hostage
by encrypting files on a mobile device and then demands a ransom payment to
restore the user’s access to their data. Phishing, in contrast, reaches mobile users through emails
that carry a malicious link or a malware-laden attachment. If we look at the data
provided by AV-ATLAS, a website that offers real-time threat analysis and statistics,3 the
total number of malware in the world has increased from 82,798,906 in 2012 to 989,851,238
in 2022. According to a recent report by the Kaspersky Security Network, the first quarter
of 2022 was no better: 6,463,414 mobile malware, adware, and
riskware attacks were detected.4 These concerning numbers show that there is a dire
need for robust Android malware detection systems.
A closer look at the current situation shows that malware attacks stem mainly
from three sources: (a) app markets, an easy distribution gateway for malware developers;
(b) users, through drive-by downloads; and (c) developers, through weak code. The users need to keep
their eyes open and minds sharp while installing any application, as, during the installation,
every application must declare its permissions, intents, and other application accesses. Still,
at times the user might not have enough knowledge or expertise to understand the negative
impact of allowing all permission access requests [1].
Motivation Permissions and intents are the key components present within the manifest
file of any Android application. These static features have been widely used in the literature
1 https://kinsta.com/mobile-vs-desktop-market-share/
2 https://www.cybertalk.org/2022/06/10/10-eye-opening-mobile-malware-statistics-to-know/
3 https://portal.av-atlas.org/malware
4 https://dataprot.net/statistics/malware-statistics/
for Android malware detection. However, there are many similarities in the permission and
intent patterns of normal and malicious apps. Tables 1 and 2, respectively, summarize the top
20 permissions and intents based on their frequency in the normal and malware dataset. We
have collected 77,000 normal apps and an equal number of malware apps from Androzoo.
More details about the dataset are covered in upcoming sections. Furthermore, we have
extracted permissions and intents from the manifest files of corresponding applications. As
seen in Table 1, 13 of the top 20 permissions are common in both normal and malware
datasets. Similarly, Table 2 highlights that seven out of the top 20 intents are common in
both datasets. Such similarity in these features across both datasets motivates us to rank the
features to propose an efficient detection model with distinguishing features. Moreover, the
Android operating system has more than 150 permissions; if we use all of them as features,
irrelevant features will hamper detection accuracy. Hence feature reduction is a key process
in developing a detection algorithm. Moreover, the field of Android security revolves around
accuracy: the more accurately a system detects malware, the better the system, and
the best accuracy can only be obtained by using the best set of features. Hence, feature ranking
is the key aspect of our research.
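As a rough illustration of the extraction step mentioned above, the following sketch reads permissions and intent actions from a manifest that has already been decoded to plain XML (the manifest inside an APK is stored in a binary form, so a decoding tool is assumed to have been run first). The function name `extract_features` and the sample manifest are hypothetical and are not taken from the paper.

```python
import xml.etree.ElementTree as ET

# Namespace that Android uses for the android:name attribute.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"


def extract_features(manifest_xml):
    """Return (permissions, intents) declared in a decoded manifest string."""
    root = ET.fromstring(manifest_xml)
    # Requested permissions are declared in <uses-permission> elements.
    permissions = {
        node.get(ANDROID_NS + "name")
        for node in root.iter("uses-permission")
        if node.get(ANDROID_NS + "name")
    }
    # Intent actions are declared in <action> children of <intent-filter>.
    intents = set()
    for intent_filter in root.iter("intent-filter"):
        for action in intent_filter.findall("action"):
            name = action.get(ANDROID_NS + "name")
            if name:
                intents.add(name)
    return permissions, intents


# Hypothetical manifest used only to exercise the function.
SAMPLE = """<manifest xmlns:android="http://schemas.android.com/apk/res/android" package="com.example.app">
  <uses-permission android:name="android.permission.INTERNET"/>
  <uses-permission android:name="android.permission.SEND_SMS"/>
  <application>
    <receiver android:name=".BootReceiver">
      <intent-filter>
        <action android:name="android.intent.action.BOOT_COMPLETED"/>
      </intent-filter>
    </receiver>
  </application>
</manifest>"""

permissions, intents = extract_features(SAMPLE)
print(sorted(permissions))
print(sorted(intents))
```

Collecting these sets per app and counting how often each name occurs in the normal and malware datasets would give frequency tables of the kind summarized in Tables 1 and 2.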
Several related works, such as [2], [3], and [4], have used permissions as the main feature
for detecting Android malware. In more detail, Şahın
et al. [2] used multiple linear regression methods with permissions as inputs
and concluded their paper by comparing the results of their proposed
permission-based classifiers with the machine learning ones. Alsoghyer et al. [3] worked on
developing a detection model based upon the frequently used permissions in both normal
and malware datasets, followed by applying the machine learning algorithms. In contrast,
Shrivastava et al. [4] started with the same approach of using permission frequency but instead
used it to calculate a particular risk score to classify applications as normal or malware. The
authors in [5] and Idrees et al. [6] worked on combining two feature types, permissions and
intents. More specifically, the authors in [5] developed a malware classification system that
classified applications as normal or malware by observing the feature frequency, whereas in
[6], the authors started with a similar approach of monitoring the frequency of most requested
features but later used it to make a detection matrix as a part of the malware detection
system.
None of the above works used the key concept of ranking the features and hence missed
the feature reduction step, which could have enhanced the quality of their results. In several
other related works, such as [7] and [8], the authors built a detection system using the ranking
of features, be it permissions or permissions and intents combined. More specifically, Li et al.
[7] used a frequency method to rank the permissions that are used in only one type of dataset, either
the normal or the malware application set. Khariwal et al. [8] also
worked on ranking the features, but they took it a step further by including the ranking of
intents and the combined ranking of intents and permissions obtained from the Information
Gain score in their research work. However, both works were implemented on a smaller set
of malware applications than the large malware dataset used in our proposed work.
More importantly, our work outperforms both of them in terms of detection accuracy while
using fewer ranked features.
Table 1 Top 20 most frequently requested permissions from both normal and malware datasets with their corresponding frequency
Permissions Normal frequency Permissions Malware frequency
We aim to build a robust and efficient static analysis-based Android malware detection system
capable of identifying malicious behavior of applications on Android smartphones. At the
same time, we aim to fulfill this objective using the smallest and best-performing combination
of features drawn from the two most commonly used static feature types, i.e., permissions
and intents. Instead of using just one feature type, we have opted for a hybrid approach
to choose the best pairs of permissions and intents, for two reasons. Firstly,
we believe generalizing a theory needs more than one tested scenario before it can be
called a fact. Similarly, to prove the robustness of our proposed algorithm, we checked it on
both permissions and intents (including various possible pairing combinations) as well as
on three testing datasets of different timelines and sources. Secondly, experimental results
indicate that combining different feature types can yield higher detection accuracy than
using either of them individually, and, as mentioned earlier, a malware detection model is only
as good as its detection accuracy.
Combining intents and permissions as features can be a simple yet effective approach
to detecting malicious applications. Therefore, this work aims to analyze permissions and
intents while taking their frequency as input and further ranking them using a statistical Chi-
square test. The following research questions emerge in the light of proposing a detection
model based on the ranking of permissions and intents:
1. Why do we need to rank the permissions and intents, and subsequently, why is feature
reduction needed instead of feeding all the features as inputs?
2. How to incorporate feature ranking, i.e., how to rank the permissions and intents?
3. How to frame a detection approach based on the ranking of permissions and intents?
We are motivated to answer these questions with a vision to develop an Android malware
detector, named IPAnalyzer, based on the combination of ranked permissions and intents. We
have used a frequency-based Chi-square test to rank permissions and intents. We chose
the Chi-square test because of its advantages, such as its robustness to the data
distribution and its comparatively straightforward computation. Moreover, the Chi-square
test can handle data whose parametric assumptions cannot be met, irrespective of two-group
or multiple-group studies. Further, we have proposed a novel detection algorithm that uses
ranked permissions and intents and applies various machine learning and deep learning
techniques to detect Android malware effectively. The work proposed in this paper employs
a mix of old and recent datasets for evaluation. Our detection results are better than many
state-of-the-art techniques proposed in the existing literature. Moreover, our experiments
demonstrate that the proposed Chi-square-based feature ranking gives us better accuracy
than the Mutual Information and Pearson Correlation Coefficient rankings used in
[9], which we evaluated on the same dataset of normal and malicious apps.
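To make the idea of combining ranked feature types concrete, the minimal sketch below encodes one app as a binary vector over the top-k permissions and top-k intents. The function name, the example rankings, and the value of k are illustrative assumptions; the merging algorithm actually proposed in this paper is described in its methodology section and may differ.

```python
def build_feature_vector(app_permissions, app_intents, ranked_permissions, ranked_intents, k):
    """Encode an app as 0/1 flags over the top-k permissions and top-k intents."""
    top_permissions = ranked_permissions[:k]
    top_intents = ranked_intents[:k]
    vector = [int(name in app_permissions) for name in top_permissions]
    vector += [int(name in app_intents) for name in top_intents]
    return vector


# Hypothetical rankings, best feature first (not the rankings reported in the paper).
RANKED_PERMISSIONS = ["SEND_SMS", "READ_PHONE_STATE", "INTERNET"]
RANKED_INTENTS = ["BOOT_COMPLETED", "MAIN", "VIEW"]

vector = build_feature_vector(
    app_permissions={"INTERNET", "SEND_SMS"},
    app_intents={"MAIN"},
    ranked_permissions=RANKED_PERMISSIONS,
    ranked_intents=RANKED_INTENTS,
    k=2,
)
print(vector)
```

Vectors of this form can then be passed to any standard classifier; varying k for each feature type is one simple way to search for a small, accurate feature set.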
Contributions The main contributions of this research are highlighted below:
• Firstly, we ranked the permissions and intents in order of their absolute frequency difference
between the malware and normal datasets and used these values as a prerequisite for
the Chi-square test.
• Next, we applied the frequency-based Chi-square test on the permissions and intents and
ranked them based on the F-score given as an output by the test.
• We proposed a novel algorithm to merge the individual rankings of permissions and
intents to develop an efficient Android malware detection system.
• We observed that the detection results of the proposed approach are relatively better
than various state-of-the-art techniques existing in the literature for Android malware
detection.
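The frequency-based ranking described in the contributions can be illustrated with a standard Pearson Chi-square statistic on a 2x2 contingency table (feature present or absent versus malware or normal). This is a sketch under that assumption; the exact variant used by IPAnalyzer may differ, and the frequencies below are invented for illustration.

```python
def chi_square_2x2(freq_malware, freq_normal, n_malware, n_normal):
    """Pearson Chi-square statistic for one feature from its class frequencies."""
    a, b = freq_malware, freq_normal          # apps that request the feature
    c, d = n_malware - a, n_normal - b        # apps that do not request it
    n = n_malware + n_normal
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0                            # feature is constant, so uninformative
    return n * (a * d - b * c) ** 2 / denominator


def rank_features(frequencies, n_malware, n_normal):
    """Sort features by decreasing Chi-square score."""
    scores = {
        name: chi_square_2x2(in_malware, in_normal, n_malware, n_normal)
        for name, (in_malware, in_normal) in frequencies.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Invented counts: (apps requesting it in malware set, in normal set), 100 apps each.
FREQUENCIES = {
    "INTERNET": (95, 93),
    "SEND_SMS": (60, 5),
    "READ_CONTACTS": (40, 20),
}

RANKED = rank_features(FREQUENCIES, n_malware=100, n_normal=100)
for name, score in RANKED:
    print(f"{name}: {score:.2f}")
```

In this toy example a permission requested almost equally by both classes (INTERNET) receives a score near zero, while one requested mostly by malware (SEND_SMS) ranks first, which is the behavior that motivates ranking over raw frequency.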
Organization The rest of the paper is structured as follows. First, we discuss the related
work in Sect. 2. Then, we explain, in detail, the proposed methodology in Sect. 3. Finally,
we present the results from the proposed model in Sect. 4 and conclude the paper with future
work directions in Sect. 5.
2 Related work
This section reviews the related works proposed in the literature for Android malware
detection. Some dynamic Android malware detection techniques have been proposed in
the literature using dynamic features like network traffic [10, 11], system calls [12], and cryptographic
and network operations [13]. However, because our proposed model analyzes
static features, i.e., permissions and intents, for malware detection, we have focused on dis-
cussing the works that have used permissions, intents, and other static features for malware
detection. We divide the related works into two categories: detection techniques based upon
feature ranking and methods that have not applied any feature ranking for malware detection.
First, we discuss the methods that have applied a feature ranking technique for Android
malware detection. The authors in [7] introduced a significant permission identification model
named SigPid. Firstly, a pruning process identified permissions that are highly requested and
rarely requested by malware apps. After normalizing the data, the support value for each
permission was used to calculate each permission’s rate. The rate values lie between -1
and 1; two ranked lists were created, one in ascending order and the other in descending
order. Then the authors took the top value from both lists to calculate TPR, FPR, recall,
precision, and F-measure. Similarly, the top three values were taken from both lists, and all
the above metrics were re-evaluated. The authors repeated this process until the
smallest set of permissions giving the best results was found. Permissions with
very low support values were then discarded to improve the classification process.
The authors in [14] presented an improved version of a pre-existing method and named it
Term Frequency-Inverse Document Frequency Class Frequency (TF-IDFCF) by adding the
aspect of class frequency of a particular feature. In the end, they trained multiple classifiers
using ML algorithms and checked the efficiency of their proposed work using the WEKA tool.
The authors in [15] took a machine learning-based approach, mainly using a Bayesian
Classifier to detect malicious behavior in applications. They took permissions, API calls, and
system commands as features for their approach, ranked them on their Mutual Information
score, and later fed the values to the Bayesian Classifier for the training and testing phase.
Finally, they concluded their work by showing the detection results, which showed that their
approach can achieve great detection rates and that not all features are needed to provide
optimum results. In [16], the authors divided their focus between permissions and other code-
based features to be fed into the Bayesian Classifier. First, they extracted the most relevant
permissions using the Mutual Information gain. Then, to further improve their detection, they
did the same for top code-based features and thirdly for a combination of permissions and
code-based features. Their results concluded that code-based features and a mix of both give better
accuracy and detection results than permission-based features alone.
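The Mutual Information ranking used in [15] and [16] can be sketched for a binary feature and a binary class label as follows. This is the textbook definition computed from empirical counts, not the cited authors' code, and the toy columns are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(feature, labels):
    """Mutual information (in bits) between two equally long discrete sequences."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    feature_counts = Counter(feature)
    label_counts = Counter(labels)
    total = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        p_x = feature_counts[x] / n
        p_y = label_counts[y] / n
        total += p_xy * log2(p_xy / (p_x * p_y))
    return total


# Toy data: 1 = malware, 0 = normal; feature columns mark whether an app requests a permission.
LABELS = [1, 1, 0, 0]
PERFECT_FEATURE = [1, 1, 0, 0]      # requested by malware only
USELESS_FEATURE = [1, 0, 1, 0]      # requested equally by both classes

print(mutual_information(PERFECT_FEATURE, LABELS))
print(mutual_information(USELESS_FEATURE, LABELS))
```

Sorting features by this score in decreasing order and keeping the top entries gives a Mutual Information ranking comparable in spirit to the one described above.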
The authors in [17] considered the frequency of permissions in the benign and malicious
datasets to rank them according to their usage. Further, to reduce the permission set, they
used a support-based method and applied various values of “k” as the support
threshold to choose the most significant set of permissions. Later, they combined the
best set of permissions for each app with the nine network traffic features that performed
best in their previous research to perform hybrid detection. According to their
results, using permission and network traffic features together as a hybrid vector gives better
accuracy than using either alone.
Wang et al. [9] studied the permission-induced risk in Android applications by ranking the
features using Mutual Information, T-test, and Correlation Coefficient, followed by identify-
ing the risky permission subsets from the rankings by utilizing Principal Component Analysis
and Sequential Forward Selection. They further applied various machine learning algorithms
to evaluate the usefulness of identified top risky permissions.
Rathore et al. [18] emphasized the importance of feature reduction and ranking over
using a large number of features by proposing a feature reduction approach
that involved a wide variety of classifiers and feature sets, including permissions, intents,
and opcode sequences, as well as mutually exclusive and merged feature spaces. They managed
to reduce the feature size by up to 90%, which slightly lowered the original detection accuracy
but also cut the training and test time. Chaudhary et al. [19]
compared the complete feature set against a reduced set.
The reduced set was obtained with Chi-square, their preferred feature reduction technique
due to its numerous merits. With the complete set of permissions
and intents, they used a CNN. They observed better performance
and lower overhead with the reduced set than with the complete one.
Manzil et al. [20] proposed a novel feature selection approach that selects the most relevant
set of permissions and intents using the Hamming distance of each feature and a threshold
technique. Lastly, they used various machine learning, deep learning, and ensemble learning
classifiers to detect the presence of Android ransomware and ransomware families. Seyfari
et al. [21] used numerous feature types such as permissions, intents, API calls, services,
receivers, and activities. To deal with the large number of features, they proposed combining
the SA algorithm with fuzzy logic, which can search the
neighborhood for a solution. Lastly, they tested
their approach using traditional ML-based classifiers. Anupama et al. [22] presented a hybrid
approach using permissions and system calls, both individually and in combination to build
a detection model using various machine learning and deep learning classifiers. Moreover,
they analyzed the negative impact of evasion and poisoning attacks on the performance and
robustness of the classifiers.
Mahindru et al. [23] chose ANN as the preferred type of classifier for their research. In
particular, they used Self Organising Maps (SOM) to learn and detect malware behavior
using permissions, API calls, user ratings, and the number of app downloads as features.
They used six feature selection techniques to improve the detection accuracy and reduce
the number of features used. Mahindru et al. [24] extracted features such as permissions
and API calls to build a detection system using an LSSVM (Least Square Support Vector
Machine) learning approach with three distinct kernel functions i.e., linear, radial basis, and
polynomial. The authors used ten distinct feature selection and feature ranking techniques to
deal with the dimensionality issue and reduce the feature set.
To the best of our knowledge, no other work has ranked static features of permissions
and intents with a frequency-based Chi-square test to detect Android malware. In this work,
different from other works, we ranked the features with a frequency-based Chi-square test
and proposed a novel algorithm to detect Android malware with the best features.
In the literature, many techniques exist for Android malware detection that have not applied
any feature ranking test, i.e., they do not aim to rank the features. In this subsection, we
review such detection methods. In [25], ten-fold cross-validation is applied after creating a
feature vector by removing the permissions that are never used. The authors applied linear
regression to the training data to remove the permissions with a coefficient value close to
zero. Talha et al. [26] presented a permission-based Android malware detection system named
APK Auditor. It consists of three components: an APK auditor client, a signature database,
and a central server. The client made an analysis request on whether the application can be
trusted. The signature database handled the application permissions, services, and receivers.
Finally, the central server calculated the permission malware scores and, subsequently, the application’s
malware score.
The authors in [27] extracted a set of 123 dynamic permissions from 11,000 Android applications.
The collected APK packages were run in the BlueStacks emulator. Finally,
permissions were extracted by running Java code and were divided into safe and unsafe
permissions. Ultimately, they evaluated the performance of machine learning classifiers on
the dataset.
In [28], the App perm analyzer software examined the manifest and code permissions
separately. Manifest permissions were extracted from the AndroidManifest.xml file, and code
permissions were found by searching the decoded source code of the APK file. The authors
created a feature vector and evaluated six different types of scores. Once the scores were
calculated, appropriate threshold limits were set according to the accuracy, sensitivity, and
specificity. Hence, applications below the threshold value were identified as benign, whereas
those above or equal to the threshold value were detected as malware. The authors in [29]
proposed an Android malware detection model based on improved Naïve Bayes classification.
They determined the value of Pearson Correlation Coefficient “r” and deleted the permissions
whose value “r” was less than the threshold “ρ” and derived the new permission set. Further,
they got the improved detection model by clustering based on information theory.
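The correlation-based pruning described for [29] can be sketched as follows: compute the Pearson Correlation Coefficient r between each permission column and the class label, and drop permissions whose correlation falls below a threshold ρ. Comparing the absolute value of r against ρ is an assumption made here so that strongly negative correlations are also kept; the cited work may compare r directly. All names and data are illustrative.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no information about the label
    return covariance / sqrt(var_x * var_y)


def filter_permissions(columns, labels, rho):
    """Keep permissions whose |r| with the label is at least rho (assumed rule)."""
    return [
        name for name, column in columns.items()
        if abs(pearson_r(column, labels)) >= rho
    ]


# Toy data: 1 = malware, 0 = normal.
LABELS = [1, 1, 0, 0]
COLUMNS = {
    "SEND_SMS": [1, 1, 0, 0],   # perfectly correlated with the label
    "CAMERA": [1, 0, 1, 0],     # uncorrelated
    "INTERNET": [1, 1, 1, 1],   # constant
}

KEPT = filter_permissions(COLUMNS, LABELS, rho=0.5)
print(KEPT)
```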
The authors in [30] discussed an approach based on sequence alignment, a technique
from bioinformatics. This work treated each permission as a DNA element and determined
permission patterns for normal and malicious samples, identifying similarities between
applications by evaluating a similarity score and setting a threshold. In [31], the authors
described a monitoring tool to keep track of permission requests from various applications.
The monitoring system used the Broadcast Receiver and intent object to detect
update events. The app’s name and installation time were saved along with the requested
permissions; hence, when a new file was generated asking for a new set of permissions, they
could be easily identified. Ultimately, they matched the permission-set patterns of known malware
applications against the permission sets in the testing dataset to classify applications
as malicious. Ilham et al. [32] described a novel approach based on permissions. The authors
applied filter feature selection algorithms and machine learning algorithms to classify appli-
cations in WEKA. The work proposed in [33] analyzed a permission-based Android malware
system in which they proposed a permission weight approach, namely Relevance Frequency.
They applied their proposed approach to various machine learning algorithms and concluded
by comparing the results of their study with the existing or previous methods.
In [34], the authors claimed to introduce a new approach called permission maps (Perm-
maps) that could combine information related to the Android permissions with their severity
level. In the end, Convolutional Neural Network (CNN) techniques were used to classify
several malware types. Xiong et al. [35] used the permission patterns dominant in either the
malware dataset or the clean dataset as weak classifiers in their proposed ensemble classifier,
Enclamald. Permission patterns present in both datasets were also used, but only
when their support degrees differed significantly. An unknown application was fed to the
classifier, and after computing the scores of the weak classifiers with a discrimination coefficient,
the application was categorized as normal or malware. The authors in [36] introduced
a two-layered malware security and detection model by improving the Random Forest Algo-
rithm in the first layer after submitting the fuzzy sets. In the second layer, they mined the
sensitive cluster of permissions to analyze the fuzzy sets using the Apriori Algorithm. The
work proposed in [37] mainly used two feature extraction algorithms, Sequential
Forward Selection (SFS) and Principal Component Analysis (PCA), to identify the type of
permissions and handled malicious applications by limiting the permissions
that seemed dangerous using a centralized algorithm.
Amer et al. [38] created an ensemble model based on multiple machine
learning classifiers to train and test the given data and subsequently categorize the
apps as malware or benign. The authors emphasized the efficiency and robustness of their model,
which outperformed previous works in terms of accuracy. The authors
in [39] used feature reduction techniques such as Information gain, Relief, and Gain Ratio to
take only the most influential set of permissions out of the entire collection. Further, to detect
malware in the dataset, supervised classifiers were used. Pondugula et al. [40] built a
sequential neural network model with three hidden layers, trained and tested it on the permission data,
and used it to classify the applications. First, a threshold was set on the sigmoid output of
the output layer. Then, all the applications exceeding that threshold were considered malware.
Wang et al. [41] introduced a new approach, Multilevel Permission Extraction (MPE),
comprising three methods. The first, Association Rule Mining (ARM), computes
the support and confidence values between two permissions and, given a
threshold, can be used to delete some permissions. The second, Principal Component Analysis
(PCA), is another feature reduction method, used together with the third, Deep Cross Network (DCN),
to identify the top features. As a result, the authors observed that minimal features were
required to achieve better accuracy, precision, recall, and F score. In [42], the authors discussed
a new approach where they included the custom permissions, i.e., the permissions that
had an impact on detecting malicious applications and the built-in permissions. After that,
they removed the permissions that were never used in any of the applications and created
a permission Used-at-least-once list (Permissions UALO), giving the best accuracy after
applying the Random Forest classifier. In [43], the authors set rules to identify applications
as benign or malware using permission combinations. Then, they formulated a k-map tool to
list the permissions more often used by malware and repeated the same step for double,
triple, and quadruple values of k. Finally, they concluded the paper with results
showing that their proposed method produces low false positive and false negative rates.
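The support and confidence values that the association-rule step of MPE [41] relies on can be computed as in the sketch below. These are the standard definitions from association rule mining; the toy app list, the function names, and the pruning remark are illustrative assumptions rather than details taken from the cited work.

```python
def support(apps, items):
    """Fraction of apps that request every permission in `items`."""
    wanted = set(items)
    return sum(wanted <= app for app in apps) / len(apps)


def confidence(apps, antecedent, consequent):
    """Estimated probability that an app requesting `antecedent` also requests `consequent`."""
    antecedent_support = support(apps, [antecedent])
    if antecedent_support == 0:
        return 0.0
    return support(apps, [antecedent, consequent]) / antecedent_support


# Toy dataset: each app is the set of permissions it requests.
APPS = [
    {"READ_SMS", "SEND_SMS"},
    {"READ_SMS", "SEND_SMS"},
    {"READ_SMS"},
    {"CAMERA"},
]

print(support(APPS, ["READ_SMS", "SEND_SMS"]))
print(confidence(APPS, "READ_SMS", "SEND_SMS"))
print(confidence(APPS, "SEND_SMS", "READ_SMS"))
```

When the confidence of a rule between two permissions exceeds a chosen threshold, one of the two is largely redundant, which is one plausible reading of how such rules can be used to delete permissions.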
Enck et al. [44] defined a Kirin security service that worked over smartphones using
its original Kirin security language to detect malware applications by using specific rules
based on the severity and possible harmful usage of the permissions. A new application was
fed to the Kirin security application, and subsequently, it displayed the risk ratings to the
users using Kirin security rules. In [45], the authors focused on producing risk scores for
each application after using permissions as a feature. The proposed method performs a quantitative
security risk assessment based on permission request patterns. They calculated
the risk score for each app using Bayes’ theorem and the impact of permissions, which they
calculated by setting a threshold for the ROC curve. Finally, they concluded their paper by
comparing it with the existing works and emphasizing its merits, namely that it can alleviate
the overprivileged problem. The authors in [46] focused on risk scoring the applications using
the Naïve Bayes method and its advanced modifications and mixture models. According to
the authors, their approach can be used as a feedback mechanism for the developers as they
might get an idea of which permissions to keep or drop to make their app less risky. To report
their results, they used Receiver Operating Characteristic (ROC) curves to compare a randomly selected app
with a particular risk value being used as an indicator of a malicious app. They concluded
the paper by stating that Naïve Bayes with Informative Priors works best while ranking the
apps and risk scoring.
In [47], the authors introduced a framework that utilized Natural Language Processing
(NLP) techniques called WHYPER, which reads application descriptions to inform the user
why the application needs a particular permission. For this, they analyzed
snapshots of Android application descriptions, parsed them using an NLP parser, and
produced annotated descriptions with the help of a semantic engine. Samra et al. [48]
worked on making clusters of two categories of Android applications, namely business and
tool, using the K-means Clustering algorithm. The clustering algorithm uses permissions as
features extracted from the XML files of the applications. The detection results indicated that
their clustering technique could efficiently detect malware.
The authors in [49], after extracting permissions as features from various applications,
used Information Gain to select the k best features. Then, on the selected features, they
applied the K-means clustering algorithm, classified the apps using a decision tree, and concluded
the paper by showing their detection results after using several machine learning algorithms.
In [50], the authors extracted permissions from various applications, studied their frequency,
and observed that the chances of malware asking for a single permission are comparatively
higher than the normal apps. Further, they applied various machine learning algorithms with
different values of k in k-fold cross-validation to note down the accuracy values and called
their whole approach Permission Usage to detect Malware in Android (PUMA). Moonsamy
et al. [51] emphasized the importance of promoting the utility of “Used” as well as “Required”
permissions. They first used the Biclustering method to visualize the permissions. Then, to
exploit rare yet unique as well as frequently requested permissions, they proposed the
Contrast Permission Pattern Mining (CPPM) method, in which they reduced the dataset to
contrasting permission patterns by taking the support score for each feature. Finally, they
selected the permissions with the maximum support difference between the normal and
malicious datasets.
The authors in [52] presented the Appguard system, which has proven helpful in cus-
tomizing security policies on untrusted applications. Whenever a new app gets installed,
its proposed model asks the client to secure it, and then it installs a new modified app after
rewriting its policies and deleting the old app. Wu et al. [53] considered static information for
discussing Android applications’ behavior, namely permissions, deployment of components,
intent message passing, and API calls. After extracting all the features, they formulated the
feature vector table to apply the K-means and EM algorithms, and they used the Singular Value
Decomposition (SVD) method to decide the number of clusters. Finally, they concluded
the paper with their detection results when they applied the KNN and Naïve Bayes algorithms over
the dataset. In [54], the authors divided the permissions into four groups: Android, custom,
dangerous, and all permissions. Further, they used this division to calculate eight permission
pair scores, four each for normal and malicious. In the end, per-pair scores for normal and
malware apps were used to classify the app as benign or dangerous. Finally, they concluded
their paper by stating the detection results with machine learning algorithms such as Random
Forest and Stacking Ensemble Learning (SEL).
The authors in [55] used a different approach. Rather than using individual permission
patterns, they focused on permission pairs to form graphs using permissions as vertices and the
frequency of their occurrence as weights for edges for malware detection. They used three
different malware datasets and a single normal dataset. Further, they used an edge-eliminating
method to remove the features/permissions that don’t positively affect accuracy, which helped
reduce detection time and space used. Alsoghyer et al. [3] developed a malware detection
model based on the frequency of permissions by using the most occurring permissions in
the used datasets. Later, they built a machine learning-based predictive model used to
discriminate between normal and malware applications. The authors in [56] utilized the binary
table of zeroes and ones, which shows the absence or presence of individual permissions in
any particular app, to calculate the first four moments after using Kernel Density Estimation.
After calculating different moment values for three datasets of normal, malware and a dataset
of the top three malware families, they concluded that a significant difference exists between
the calculated moment values for benign and other datasets. In [4], the authors worked towards
formulating a risk score for each permission and intent based on their frequency in a particular
application. Then after observing the different risk scores for applications, they divided the
score into ranges, namely low-risk, high-risk, and medium-risk. In the end, they concluded
their paper by analyzing their results using Sensitivity analysis.
Zhu et al. [57] chose static analysis to build their detection model. They used permissions,
API calls, and other hardware features to be fed into their CNN-based multi-head Squeeze and
Excitation Residual block (MSer) and staged it to construct a deep network proposed by them
called MSerNet. Rathore et al. [58] proposed a defense strategy against adversarial malware
by themselves constructing a couple of robust evasion attacks on permissions and intents-
based models, capable of affecting the detection accuracy of the traditional classifiers gravely.
Thereafter, they introduced a defence system namely MalVpatch that improves the detection
accuracy and at the same time drastically enhances the adversarial robustness of malware
detection models. Keyvanpour et al. [59] mainly used three feature selection techniques on
their extracted features, namely permissions, API calls, intents, and hardware components.
They reduced the feature set with frequency-based, RF weight-based, and feature group
frequency-based methods and then fed the results to various machine learning classifiers.
Ravi et al. [60] began by emphasizing the popularity of CNN-based models in the field
of Android malware detection. Following that, they employed 26 CNN-based pre-trained
models to carry out their image-based Android malware detection. Later they fused their
CNN-based model with a couple of machine learning-based classifiers to build a robust and
generalizable model. Kaithal et al. [61] aimed at improving the detection accuracy of tra-
ditional ML-based classifier decisions by combining it with the concept of Buffalo Fitness.
Subsequently, they proposed a novel mill approach, namely the African Buffalo-based decision
tree algorithm, capable of detecting malware with high accuracy, precision, and recall. Lee
et al. [62] emphasized the importance of using permissions frequency as a means of detect-
ing Android malware and hence, classified normal and malicious apps through ML-based
detection techniques based on the frequency of permission of Android apps. The authors con-
cluded and confirmed that the accuracy was improved upon including the frequency of the
top 20 permissions based on the importance of feature information. Wu et al. [63] deployed
Off framework using DDQN algorithm based upon Recurrent Neural Network (RNN) as
the decision network to sequentially select features. Their approach is capable of exhibiting
high performance without any human intervention and fills the disadvantages of traditional
ranking-based feature selection algorithms. Ibrahim et al. [64] used a variety of static fea-
tures such as permissions, API calls, receivers, and services, and even proposed a couple of
new features namely, file size and fuzzy hash values. To process the huge variety of features
and to make them suitable enough for feeding into their API deep learning model, they kept
the count of the opcode sequences, processed the fuzzy hash value using Gated recurrent
unit (GRU), and clustered the permissions and other features using neural network layers.
Lastly, they compared the efficiency of their deep learning model with several other machine
learning classifiers.
A. T. Kabakus [65] proposed a neural network-based model which unlike the other models
in the literature took one-dimensional data as input to train and further test the approach. The
features utilized by the authors were intents, API calls, and permissions, the most popular
feature in the related work. Wang et al. [66] started by analyzing permission sequences to
build a static detection model capable of text-based binary classification. To further classify
the malware family they extracted memory features to build the object reference graph for
each malware. Their experimental results indicate that their approach is capable of resisting
obfuscation attacks with high accuracy. Yuan et al. [67] proposed a broad learning approach
that can also be considered as a flat neural network with two hidden layers. Their main goal
was to build a lightweight on-device detection method using features such as permissions,
intent actions, and API calls while keeping full or incremental training directly on only
mobile devices. The experimental results show that they outperform the shallow learning-
based traditional machine learning models as well as some deep learning ones.
To the best of our knowledge, no other work has ranked the static permissions and intents
with the frequency-based Chi-square test. In this work, we have first ranked the permissions
and intents with frequency-based Chi-square and then proposed a novel detection algorithm
that provides better detection accuracy with the best set of permissions and intents. We
describe our proposed methodology in the next section.
3 System design
In this section, we explain our proposed methodology in detail. Figure 1 summarizes a brief yet
complete idea of our proposed model IPAnalyzer, which is divided mainly into two modules.
In the first module, named as Ranking module, we extract the permissions and intents from
the training dataset and aim to rank them using a frequency-based Chi-Square test. Such a
ranking will help us eliminate irrelevant features that negatively affect the detection accuracy.
In the detection module, we propose a novel algorithm that applies machine learning and deep
learning techniques to get the best features that can provide higher detection accuracy. We
implemented the machine learning and deep learning classifiers with the Python programming
language [68]. Further, we have used the Android Asset Packaging Tool (AAPT2) tool5 to
extract the list of permissions and intents from normal as well as malware applications. We
deployed them on a desktop system with the configuration of 8 GB RAM, i5-1135G7 CPU,
Windows 11 OS.
The following subsections discuss in detail both modules of the proposed model.
5 https://developer.android.com/studio/command-line/aapt2
3.1.1 Dataset
To begin with, we needed a vast dataset of normal and malware applications to conduct our
research. For this purpose, we downloaded 77,000 normal and 77,000 malware applications
from Androzoo [69]. The market used by Androzoo for normal applications is Google Play
Store, whereas the malware apps are from various sources such as PlayDrone, appchina,
anzhi, and VirusShare. To create the normal dataset from Androzoo, we retained only those
apps that had a VirusTotal6 detection score of zero, i.e., the apps that have been detected as
malware by none of the antiviruses on VirusTotal. Further, for the malware dataset, we retained
only those applications with a detection score of at least five, i.e., the apps detected as malware
by at least five antiviruses on VirusTotal.
The package file format most popularly used by the Android Operating System is the Android
Package Kit (APK). The APK file is a compressed file of several sub-files and folders that
include essential information such as the application’s permissions, intents, etc. Java Program-
ming Language is the most common language used to write the Android application. After
that, Java source codes are compiled and converted into byte codes. However, byte codes can’t
run directly in the Android environment. Hence, they are further converted into executable
Dalvik bytecodes. Compressing files, folders, and important information inside an APK is
called compilation, whereas extracting information from the kit is called decompilation.
Amongst several important files present inside the bundle, one is the AndroidManifest.xml
file, which contains two of the most important features we use in our detection model, i.e.,
permissions and intents, in addition to several other features. Details about permissions and
intents are summarized below.
6 https://www.virustotal.com/gui/home/upload
1. Android Permissions- The Android permission check system requires application developers
to declare the list of permissions that an application needs to invoke the Android API
successfully. Hence, the manifest file contains the list of all Android permissions required
to run the application efficiently. A permission is declared using the <uses-permission> tag
within the manifest file. For example, Fig. 2 shows a snapshot of the AndroidManifest.xml
file of the “WhatsApp Messenger” app, which requires permissions such as
“READ_PHONE_STATE”, “READ_PHONE_NUMBERS”, “RECEIVE_SMS”, “VIBRATE” and
“AUTHENTICATE_ACCOUNTS” to execute on Android smartphones.
2. Android Intents- An Intent is a messaging object a developer can use to request an action
from another app component. For example, Fig. 3 shows a snapshot of the AndroidManifest.xml
file of the “WhatsApp Messenger” app, which requires intents such as “REQUEST” and
“DEFAULT” to execute on Android smartphones.
We have used the Android Asset Packaging Tool (AAPT2) tool7 to extract the list of
permissions and intents from normal as well as malware applications.
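The extraction step can be illustrated with a minimal sketch. It does not use AAPT2; it assumes the manifest has already been decoded to plain-text XML, and the sample manifest, the helper names `short_name` and `extract_features`, and the choice to collect both `<action>` and `<category>` names as intents are assumptions made only for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace prefix that ElementTree attaches to android:* attributes.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

# Hypothetical, hand-written manifest used only for illustration.
SAMPLE_MANIFEST = """<manifest
    xmlns:android="http://schemas.android.com/apk/res/android"
    package="com.example.demo">
  <uses-permission android:name="android.permission.READ_PHONE_STATE"/>
  <uses-permission android:name="android.permission.RECEIVE_SMS"/>
  <application>
    <activity android:name=".MainActivity">
      <intent-filter>
        <action android:name="android.intent.action.MAIN"/>
        <category android:name="android.intent.category.DEFAULT"/>
      </intent-filter>
    </activity>
  </application>
</manifest>"""


def short_name(qualified):
    """Keep only the last dotted component, e.g. 'RECEIVE_SMS'."""
    return qualified.rsplit(".", 1)[-1]


def extract_features(manifest_xml):
    """Return (permissions, intents) declared in a decoded manifest."""
    root = ET.fromstring(manifest_xml)
    permissions = set()
    for element in root.iter("uses-permission"):
        name = element.get(ANDROID_NS + "name")
        if name:
            permissions.add(short_name(name))
    intents = set()
    for intent_filter in root.iter("intent-filter"):
        for child in intent_filter:
            name = child.get(ANDROID_NS + "name")
            # Assumption: both actions and categories count as intents.
            if child.tag in ("action", "category") and name:
                intents.add(short_name(name))
    return sorted(permissions), sorted(intents)


permissions, intents = extract_features(SAMPLE_MANIFEST)
print(permissions)
print(intents)
```

Inside a real APK the manifest is stored in a binary XML format, so a tool such as AAPT2 is still needed to obtain the decoded form that this sketch consumes.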
After extracting the list of permissions and intents from all the applications of our dataset, we
create the feature vector tables for their representation. The extracted features are represented
with One Hot Encoding method8 to generate a feature vector for each app in our dataset. The
feature vector formulated for each app is of the binary type, with a “1” for the permissions and
intents the application requests and a “0” for the permissions and intents that are not present
within that app. In this way, we create two separate vector tables, one each for permissions
and intents, represented by PVT and IVT, respectively. For instance, if there are a total of
five permissions, say <P1, ..., P5>, and five intents, say <I1, ..., I5>, in the system, and
any application A_j has permissions P1, P2, P5 and intents I3, I4, I5, then the app A_j is
represented as “11001” and “00111” in PVT and IVT, respectively.
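The encoding in this example can be reproduced with a short sketch. The helper name `encode` is hypothetical, and the feature names P1 to P5 and I1 to I5 are the placeholders used in the text.

```python
def encode(app_features, vocabulary):
    """Binary vector: 1 if the app requests the feature, else 0."""
    present = set(app_features)
    return [1 if feature in present else 0 for feature in vocabulary]


all_permissions = ["P1", "P2", "P3", "P4", "P5"]
all_intents = ["I1", "I2", "I3", "I4", "I5"]

# Application A_j from the example above.
pvt_row = encode(["P1", "P2", "P5"], all_permissions)
ivt_row = encode(["I3", "I4", "I5"], all_intents)

print("".join(map(str, pvt_row)))  # 11001
print("".join(map(str, ivt_row)))  # 00111
```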
We observe that some features have a high frequency in normal or malware datasets. The
frequency difference between the malware and normal dataset for any feature can give us
valuable insights for feature ranking. Therefore, before applying the Chi-Square method to
rank the features, we first compute the absolute frequency difference of each permission and
intent between the normal and malware datasets and then assign weights to all the permissions
and intents based on this difference. For instance, if there are “x” features, the feature
with the highest absolute frequency difference is assigned a weight of one, and the feature
with the lowest frequency difference is assigned a weight of “x”. The process is repeated
separately for permissions and intents.
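A minimal sketch of this weighting step follows. The function name `assign_weights` and the frequency counts are hypothetical, and breaking ties alphabetically is an assumption, since the text does not say how equal differences are ordered.

```python
def assign_weights(normal_freq, malware_freq):
    """Weight 1 goes to the largest absolute frequency difference."""
    features = set(normal_freq) | set(malware_freq)
    difference = {
        f: abs(normal_freq.get(f, 0) - malware_freq.get(f, 0))
        for f in features
    }
    # Assumption: ties are broken alphabetically by feature name.
    ordered = sorted(features, key=lambda f: (-difference[f], f))
    return {f: rank for rank, f in enumerate(ordered, start=1)}


# Hypothetical frequency counts, for illustration only.
normal_freq = {"P1": 900, "P2": 400, "P3": 50}
malware_freq = {"P1": 100, "P2": 350, "P3": 650}

weights = assign_weights(normal_freq, malware_freq)
print(weights)  # P1 -> 1 (diff 800), P3 -> 2 (diff 600), P2 -> 3 (diff 50)
```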
After assigning the weights to all the features, for every occurrence of “1” for each
permission and intent, we replace “1” by its corresponding weight in both PVT and IVT. For
7 https://developer.android.com/studio/command-line/aapt2
8 https://scikit-learn.org/stable/modules/sklearn.preprocessing.OneHotEncoder.html
instance, again, consider the same app A_j, which was initially represented as “11001” and
“00111” in PVT and IVT, respectively. Suppose the weights for P1, P2, and P5 are α1, α2, and α5,
respectively, and the weights for I3, I4, and I5 are β3, β4, and β5; then A_j is now represented
as “α1 α2 0 0 α5” and “0 0 β3 β4 β5” in PVT and IVT, respectively.
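The replacement of ones by weights can be sketched as follows. The helper name `apply_weights` and the numeric weights are hypothetical stand-ins for α1, ..., α5.

```python
def apply_weights(binary_row, vocabulary, weights):
    """Replace every 1 in a binary row by the weight of its feature."""
    return [
        weights[feature] if bit == 1 else 0
        for bit, feature in zip(binary_row, vocabulary)
    ]


all_permissions = ["P1", "P2", "P3", "P4", "P5"]
# Hypothetical weights playing the role of alpha_1 ... alpha_5.
permission_weights = {"P1": 3, "P2": 1, "P3": 5, "P4": 4, "P5": 2}

binary_row = [1, 1, 0, 0, 1]  # app A_j in the binary table
weighted_row = apply_weights(binary_row, all_permissions, permission_weights)
print(weighted_row)  # [3, 1, 0, 0, 2]
```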
We have used a statistical Chi-Square test to rank the features. Such a ranking helps eliminate
irrelevant features, and their removal will help improve detection accuracy. The Chi-Square
statistic is used to determine whether the variables of different categories defined are inde-
pendent of each other. It can also be used to measure the significant difference between
variables and their expected values. The Chi-square test is specifically designed to assess
the independence between two categorical variables. This makes it particularly suitable for
feature ranking/selection when dealing with categorical or discrete data, similar to the dataset
used in our work. Moreover, Chi-square does not require equality of variances among the
study groups or homoscedasticity in the data, nor does it form any prior assumptions about
the distribution, making it a suitable method for feature ranking even in the presence of data
with missing values or measurement errors. The Chi-square formula [70] is defined in the
below equation.
$$\chi_c^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$
where:
c = degrees of freedom,
O = observed value(s), and
E = expected value(s)
This test’s null hypothesis says there is no link between the original and expected data.
The alternate hypothesis states that the actual and expected data depend on each other. For a
basic chi-square test of independence, where n denotes the number of observations and k is
the number of categories, the computational complexity is generally considered to be O(nk).
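As a worked illustration of the statistic, the sketch below computes it for a small contingency table. The function name `chi_square` and the counts are hypothetical, and the sketch assumes every expected count is non-zero.

```python
def chi_square(observed):
    """Chi-square statistic of a contingency table (list of rows)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected count under independence of rows and columns.
            e = row_totals[i] * col_totals[j] / total
            statistic += (o - e) ** 2 / e
    return statistic


# Hypothetical counts: rows = (normal, malware),
# columns = (feature present, feature absent).
table = [[30, 70],
         [80, 20]]

statistic = chi_square(table)
print(round(statistic, 2))  # 50.51
```

Here the expected counts are 55 and 45 in each row, so each row contributes 625/55 + 625/45, and a large statistic indicates that the feature is far from independent of the class.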
We apply the Chi-square test on the two feature vector tables we have formulated for our
training dataset, i.e., PVT and IVT. The F score9 that comes after applying the Chi-square
test on categorical data can be used very efficiently to select the best set of features,
amongst all features, by ranking them from the highest to the lowest F score value. The feature
that can better distinguish normal and malware datasets will have a higher F-score value.
We apply this ranking technique separately on permissions and intents, and we get, as an
output of this module, two ranked lists, P_List and I_List, one each for permissions and intents,
respectively.
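The ranking step can be sketched in plain Python. This is not the authors' code: it assumes the score is computed per feature by comparing the per-class sums of the feature values with the sums expected from the class proportions, which is how the scikit-learn `chi2` scorer cited in the footnote is understood to work. The names `chi2_scores` and `rank_features` and the toy table are hypothetical.

```python
def chi2_scores(rows, labels):
    """Per-feature chi-square score of non-negative features vs. class."""
    n_samples = len(rows)
    n_features = len(rows[0])
    feature_total = [sum(r[j] for r in rows) for j in range(n_features)]
    scores = [0.0] * n_features
    for cls in sorted(set(labels)):
        class_rows = [r for r, y in zip(rows, labels) if y == cls]
        class_prob = len(class_rows) / n_samples
        for j in range(n_features):
            observed = sum(r[j] for r in class_rows)
            expected = class_prob * feature_total[j]
            if expected > 0:  # skip features that never occur
                scores[j] += (observed - expected) ** 2 / expected
    return scores


def rank_features(names, rows, labels):
    """Feature names with scores, sorted from highest to lowest score."""
    scored = zip(names, chi2_scores(rows, labels))
    return sorted(scored, key=lambda pair: -pair[1])


# Toy table: label 0 = normal, label 1 = malware (illustrative values).
names = ["P1", "P2", "P3"]
rows = [[1, 1, 0], [1, 1, 0], [1, 0, 0], [1, 1, 0],
        [0, 1, 1], [0, 1, 1], [0, 0, 1], [1, 1, 1]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

ranked = rank_features(names, rows, labels)
print([name for name, _ in ranked])  # ['P3', 'P1', 'P2']
```

P3 occurs only in the malware rows and therefore ranks first, while P2 occurs equally often in both classes and receives a score of zero.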
9 https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.chi2.html
We have used several machine learning and deep learning classifiers [71] in our detection
approach. Mainly, we have applied nine widely used classifiers, such as Decision Trees,
Random Forest, and Support Vector Machine, in our experiments. Next, we briefly describe
these nine classifiers.
Machine learning classifiers
1) Decision Trees (DT): It comes from the family of supervised learning classifiers and
is one of the most commonly used algorithms. As the name suggests, a Decision tree builds
classification or regression models in the form of a tree structure with the training time
complexity of O(n*log(n)*d) where n denotes the number of training points in the training
set and d is the dimensionality of the data. It follows the Iterative Dichotomiser 3 (ID3)
algorithm structure for determining the split. To form the trees, DT uses entropy, i.e., the
degree of uncertainty in the randomness of elements and information gain.
2) Random Forest (RF): Random forest classifier is an ensemble algorithm based on
bagging, i.e., bootstrap aggregation. It can also be defined as an improved version of the
decision tree, as it eliminates the disadvantage of overfitting the training set. It takes the
average of all the predictions, which cancels out the biases, and at the same time, a Random
forest adds additional randomness to the model while growing the trees. Similar to the
complexity of DT, RF also has the complexity of O(n*log(n)*d*k) just including an extra
factor k, i.e., the number of decision trees.
3) Bagging Classifier (BC): A bagging classifier considers the vote or average of all the
individual predictions given by each base classifier fitted on a random subset by aggregating
them and producing a meta-estimator. The main advantage of using such a meta-estimator is
that it can lower the black-box estimator’s variance by bringing randomization into its work.
4) GaussianNB (NB): The naive Bayes classifier is based on Bayes’ theorem with independence
assumptions between predictors (i.e., it assumes that the presence of a feature in a
class is unrelated to any other feature). Even if these features depend on each other, or upon
the existence of the other features, all of these predictors function independently with the
train time complexity of O(n*d).
5) Logistic Regression (LR): This type of machine learning classifier falls under the
category of supervised learning technique. Using the help of independent variables, it can
predict categorical dependent variables such as yes or no, true or false, etc with the train time
complexity of O(n*d). For example, instead of fitting a regression line in Logistic regression,
we fit an “S” shaped logistic function, which predicts two maximum values (0 or 1).
6) Support Vector Machine (SVM): The primary goal of the SVM algorithm is to construct
an optimal line, decision boundary, or hyperplane that effectively divides the n-dimensional
space into distinct classes, facilitating the straightforward placement of new data points
into the correct category for future instances. SVM accomplishes this by utilizing extreme
data points, often referred to as support vectors, to aid in the creation of the hyperplane.
Consequently, this algorithm is named Support Vector Machine. The training complexity of
nonlinear SVM is generally between O(n^2) and O(n^3), with n the number of training instances.
Deep learning classifiers
7) Artificial Neural Network (ANN): The working of ANN is similar to the working of
nerve cells in the human brain. An artificial neural network has three or more layers that
are interconnected. The first layer consists of input neurons. Those neurons send data to the
deeper layers, sending the final output data to the last output layer.
the detection accuracy, say D_Acc. The maximum accuracy, say D_Max, is initialized to zero.
At every iteration, we compare D_Acc and D_Max. If the accuracy at the current iteration, i.e.,
D_Acc, is higher than D_Max, we set D_Max to D_Acc and proceed to the next iteration.
In the next iteration, we select the top two ranked permissions and intents and find the
detection accuracy on the testing data by considering these four features, i.e., D_Acc for the
current iteration. Again, we compare D_Max and D_Acc, and if D_Acc is higher than D_Max,
we proceed to the next iteration to select the top three ranked permissions and intents. The
algorithm continues the same way and terminates when the detection accuracy does not
improve further. At the stage when D_Acc is not higher than D_Max, we return D_Max and the
best set of permissions and intents. From the proposed approach, the best set of features will
always contain equal numbers of permissions and intents. Overall, the computational complexity
of the proposed malware detection algorithm can be expressed as O((N_PList + N_IList) * M * f(N)),
where N_PList is the number of permissions in P_List, N_IList is the number of intents in I_List,
M is the maximum number of permissions or intents in the testing dataset, and f(N) is the
time complexity of the machine learning algorithm used. In the result section, we have shown
that this best set of permissions and intents obtained from the above procedure outperforms
other sets in terms of detection accuracy. This answers our research question 3, i.e., how to
frame a detection approach based on the ranking of permissions and intents. We describe the
results obtained from the proposed approach in the next section.
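The iterative procedure described above can be sketched as follows. This is an illustrative reading of the text, not the authors' Algorithm 1: the function name `select_best_features`, the `evaluate` callback (standing in for training a classifier and measuring accuracy on the testing data), and the stub accuracies are all hypothetical.

```python
def select_best_features(p_list, i_list, evaluate):
    """Grow the top-k permissions and top-k intents until accuracy stalls.

    p_list, i_list: ranked feature names, best first.
    evaluate: callable mapping a feature list to a detection accuracy.
    """
    d_max = 0.0
    best_features = []
    for k in range(1, min(len(p_list), len(i_list)) + 1):
        features = p_list[:k] + i_list[:k]
        d_acc = evaluate(features)
        if d_acc > d_max:
            d_max, best_features = d_acc, features
        else:
            break  # accuracy did not improve: stop and return the best set
    return d_max, best_features


# Stub evaluator with made-up accuracies, indexed by k (features per list).
stub_accuracy = {1: 0.90, 2: 0.95, 3: 0.97, 4: 0.96}


def evaluate(features):
    return stub_accuracy[len(features) // 2]


p_list = ["P1", "P2", "P3", "P4"]
i_list = ["I1", "I2", "I3", "I4"]

d_max, best = select_best_features(p_list, i_list, evaluate)
print(d_max, best)  # 0.97 ['P1', 'P2', 'P3', 'I1', 'I2', 'I3']
```

Because one permission and one intent are added per iteration, the returned set always contains the same number of permissions and intents, matching the property stated in the text.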
In this section, we showcase and discuss the experimental results obtained from the proposed
IPAnalyzer model. We point out that we have separate datasets for training and testing. As
described in Sect. 3.1.1, we have 77,000 applications, each in the normal and malware cate-
gory. Out of them, we use 56,000 normal apps and 56,000 malware apps in the ranking module.
The remaining 21,000 normal and 21,000 malware apps are used in the detection module.
We name this dataset TESTING DATASET-1. Additionally, we tested our approach on an
unknown dataset containing 788 normal apps from Androzoo and an equal number of mal-
ware apps from Koodous10 and named it TESTING DATASET-2. Then, we also considered
a third dataset called TESTING DATASET-3, which contains recent and stealthier malware
samples detected in 2022. In the upcoming subsections, first, we discuss the ranking obtained
from the frequency-based Chi-Square test, and after that, we describe the detection results
on TESTING DATASET-1, TESTING DATASET-2, and TESTING DATASET-3. Further,
we also compare our proposed work with similar works in Android malware detection.
10 https://koodous.com/
Table 3 Top 10 permissions along with their corresponding weights
Permissions Weights allotted according to ranking Normal frequency Malware frequency Absolute difference
Intents Weights allotted according to ranking Malware frequency Normal frequency Absolute difference
frequency difference in both datasets. Similarly, we can acknowledge the weights of other
top permissions from the table. The permission named “SET_WALLPAPER” had the low-
est frequency difference of 10 and hence, had the highest weight of 129, amongst all 129
permissions.
As can be seen from Table 4, the intent named “USER_PRESENT” is assigned the weight
of one as it has the highest frequency difference in both datasets. Similarly, we can acknowl-
edge the weights of other top intents from the table. The intent named “MAIN” had the lowest
frequency difference of 87 and hence, had the highest weight of 79, amongst all 79 intents.
To identify the distinguishing features, we separately applied the statistical Chi-Square test
on PVT and IVT. The Chi-Square test, as its output, calculates the corresponding F score
values for all the features. Further, we used these F score values to rank the features such
that the feature with the highest F score value is the top-ranked feature and hence, the most
distinguishing one. Tables 5 and 6 summarize the top ten permissions and intents according
to their F-scores obtained from the Chi-Square test. This answers our research question 2,
i.e., how to rank the permissions and intents to identify the distinguishing features among
them.
Table 5 highlights that the permission named “BIND_GET_INSTALL_REFERRER_SERVICE”
is the most distinguishing permission with the highest F-score. Similarly, we can infer
rankings of other permissions based on their F-scores from the table. The permission named
“SET_WALLPAPER” had the lowest F score value of 13.803 amongst all permissions and
hence, is the least distinguishing permission. Similarly, Table 6 highlights that the intent
named “CONNECTION” is the most distinguishing intent with the highest F-score. Similarly,
we can infer rankings of other intents based on their F-scores from the table. The intent named
“MAIN” had the lowest F-score value of 0.9848 and hence, is the least distinguishing intent.
In the following subsection, we present the detection results obtained with the proposed
model.
In this section, we discuss the detection results, i.e., the accuracy obtained from our proposed
approach over the TESTING DATASET-1. For comparison, we perform three experiments,
considering 1) permissions alone, 2) intents alone, and 3) both permissions and intents com-
bined. We discuss these results in upcoming subsections.
First, we apply the proposed detection algorithm (Algorithm 1) with permissions alone. The
algorithm will give the best permissions with higher accuracy as an output. Table 7 sum-
marizes the detection results when we use permissions alone for detection. The table can be
understood as follows. With the top-ranked permission, i.e.,
“BIND_GET_INSTALL_REFERRER_SERVICE”, we get 95.55% accuracy with several machine learning classifiers. We
call this the first iteration, then we move to the next iteration when we consider the top
two ranked permissions, i.e., combining “BIND_GET_INSTALL_REFERRER_SERVICE”
with “JPUSH_MESSAGE” for detection and repeat the process mentioned above. In
this iteration, we get an accuracy of 96.96% from several machine learning classifiers.
As discussed in Algorithm 1, we proceed to the next iteration whenever the detection
accuracy increases from the previous iteration. Hence, we consider the top three
permissions and repeat the entire procedure. The procedure terminates when we observe
a decrease in the detection accuracy. As shown in Table 7, we achieved the highest
detection accuracy of 97.70% on the tenth iteration, i.e., upon adding the top ten
permissions, namely “BIND_GET_INSTALL_REFERRER_SERVICE”, “JPUSH_MESSAGE”,
“RESTART_PACKAGES”, “SEND_SMS”, “RECEIVE_SMS”, “READ_SMS”,
“CHANGE_CONFIGURATION”, “RECEIVE_USER_PRESENT”, “BROADCAST_PACKAGE_INSTALL”,
and “BROADCAST_PACKAGE_REPLACED”. From the next iteration, we observe that the detection
accuracy starts decreasing. Thus, the highest accuracy we obtain when applying
the proposed Algorithm 1 only on permissions is 97.70%.
Table 7 Detection results with proposed approach considering only permissions
Detection accuracy using various machine learning and deep learning classifiers (in %)
Permissions used                   DT     RF     ANN    BC     NB     LR     MLP    SVM    DNN
Bind_get_install_referrer_service  95.55  95.55  94.29  95.55  95.55  95.55  94.60  94.60  94.60
Jpush_message                      96.69  96.69  51.71  96.69  96.69  96.69  96.96  96.96  96.96
Restart_packages                   96.86  96.86  60.68  96.86  96.86  96.86  96.79  96.79  96.79
Send_SMS                           97.06  97.06  66.59  97.06  97.06  97.06  97.05  97.05  97.25
Receive_SMS                        97.42  97.42  70.83  97.42  97.42  97.42  97.25  97.25  97.25
Read_SMS                           97.42  97.42  74.46  97.42  97.42  97.42  97.46  97.46  97.46
Change_configuration               97.29  97.29  77.15  97.29  97.29  97.29  97.27  97.27  97.27
Receive_user_present               97.39  97.39  70.16  97.39  97.39  97.39  97.41  97.41  97.41
Broadcast_package_install          97.60  97.60  71.63  97.60  91.65  97.60  97.50  97.50  97.50
Broadcast_package_replaced         97.70  97.70  74.54  97.70  92.16  97.70  97.58  97.58  97.58
Broadcast_sticky                   97.20  97.20  91.70  97.20  91.91  97.20  97.14  97.14  97.14
Process_outgoing_calls             97.32  97.32  78.10  97.32  92.55  97.32  97.22  97.22  97.22
Next, we apply the proposed detection algorithm (Algorithm 1) with intents alone. The algo-
rithm will give the best intents with higher accuracy as an output. Table 8 summarizes the
detection results when we use intents alone for detection. The table can be understood as
follows. With the top-ranked intent, i.e., “CONNECTION”, we get 95.27% accuracy with
several classifiers. We call this the first iteration, then we move to the next iteration when
we consider the top two ranked intents, i.e., combining “CONNECTION” with
“DaemonService” for detection and repeating the abovementioned process. In this iteration, we note
that we get an accuracy of 95.35% from several machine learning classifiers. As discussed
in Algorithm 1, we proceed to the next iteration whenever the detection accuracy increases
from the previous iteration. The procedure terminates when we observe a decrease
in the detection accuracy. As shown in Table 8, we achieved the highest detection accuracy
of 95.35% on the second iteration, i.e., upon adding the top two intents, namely “CONNECTION”
and “DaemonService”. From the next iteration, we observe that the detection accuracy starts
decreasing. Thus, the highest accuracy we obtain when applying the proposed Algorithm 1
only on intents is 95.35%.
Table 8 Detection results with proposed approach considering only intents
Detection accuracy using various machine learning and deep learning classifiers (in %)
Intents used           DT     RF     ANN    BC     NB     LR     MLP    SVM    DNN
Connection             95.27  95.27  94.33  95.27  95.27  95.27  95.21  95.21  95.21
Daemonservice          95.35  95.35  65.32  95.35  95.35  95.35  95.30  95.30  95.30
Notification_received  95.27  95.27  72.99  95.27  95.27  95.27  95.32  95.32  95.32
Notification_opened    95.20  95.20  75.78  95.20  95.20  95.20  95.05  95.05  95.05
Table 9 Detection results with proposed approach considering the combination of permission and intents
Permissions and intents used Detection accuracy using various machine learning and
deep learning classifiers (in %)
DT RF ANN BC NB LR MLP SVM DNN
Bind_get_install_referrer_service and connection 96.56 96.56 56.65 96.56 96.56 96.56 96.55 96.55 96.55
Jpush_message and Daemonservice 97.90 97.90 70.34 97.90 97.90 97.90 97.92 97.92 97.92
Restart_packages and notification_received 98.02 98.02 97.90 98.02 98.02 98.02 97.92 97.92 97.92
Send_SMS and notification_opened 98.32 98.32 82.65 98.32 98.32 98.32 98.14 90.84 98.14
Receive_SMS and message_received 98.41 98.41 85.38 98.41 98.41 98.41 98.29 98.29 98.29
Read_SMS and start_from_agoo 98.49 98.49 86.84 98.49 98.49 98.49 98.35 98.35 98.35
Change_configuration and report 98.27 98.27 88.37 98.27 88.92 98.27 88.26 88.26 98.26
Receive_user_present and command 98.26 98.26 90.18 98.26 90.32 98.26 90.21 90.21 98.29
and intents irrespective of the number of iterations. For instance, from Table 9, we get the
highest accuracy of 98.49% with the combination of six permissions and six intents, i.e., 12
features. Hence, to cross-check our approach, we compare the detection accuracy of features
in other combinations of 12, such as five permissions with seven intents, four permissions
with eight intents, three permissions with nine intents, and two permissions with ten intents,
and vice versa. Moreover, we consider other combinations when the total number of features
used differs from 12, i.e., 10, 11, 13, 14, and 15. Finally, we summarize all these results
in Table 10. From the table, we observe that the best set of six permissions and six intents
obtained from our proposed approach proves to be better in terms of detection accuracy
than other combinations of permissions and intents. Hence, our model outperforms different
combinations of permissions and intents, and we find that we get the highest accuracy of
98.49% with the top six permissions and top six intents combined.
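The alternative splits used for this cross-check can be enumerated mechanically. The sketch below is illustrative only: the helper name `splits` and the lower bound of two features per type are assumptions taken from the examples listed in the text (two permissions with ten intents, and vice versa).

```python
from typing import List, Tuple


def splits(total: int, minimum: int = 2) -> List[Tuple[int, int]]:
    """All (permissions, intents) counts that sum to `total`."""
    return [(p, total - p) for p in range(minimum, total - minimum + 1)]


chosen = (6, 6)  # the combination selected by the proposed approach
alternatives = [s for s in splits(12) if s != chosen]
print(alternatives)
```

The same helper can be called with totals of 10, 11, 13, 14, and 15 to list the other feature budgets considered in the comparison.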
Table 10 Comparison of proposed frequency-based chi-square test in terms of detection accuracy upon using
different combinations of features
Combination of features Total number of features used Detection accuracy (in %)
Table 11 summarizes the detection results when we use all permissions and intents for
detection without applying any feature ranking technique. The table can be read as follows.
When we consider all the permissions simultaneously without Chi-Square-based feature
ranking, the highest detection accuracy obtained is 78.64%, whereas the highest detection
accuracy obtained with all intents is 67.18%. Such a low detection accuracy highlights the
importance of ranking permissions and intents, because ranking helps us eliminate the
irrelevant features that can hamper detection accuracy. This answers our research question 1,
i.e., why we need to rank permissions and intents. We have applied the frequency-based
Chi-Square test to rank permissions and intents in our work. However, statistical tests such
as Mutual Information and the Pearson Correlation Coefficient have been used in other works,
such as [9], for Android malware detection. Hence, we next compare the performance of the
proposed frequency-based Chi-Square test with Mutual Information and the Pearson Correlation
Coefficient. Table 12 lists the top ten permissions and intents ranked with Mutual
Information and the Pearson Correlation Coefficient.
Table 11 Detection results considering all features for TESTING DATASET-1 without applying the proposed
approach
Features used Detection accuracy using various machine learning and
deep learning classifiers (in %)
DT RF ANN BC NB LR MLP SVM DNN
All permissions 74.64 74.64 69.55 78.64 69.60 69.60 44.95 69.80 69.71
All intents 67.18 67.18 55.11 67.18 54.28 55.15 50.25 64.26 50.26
For the comparison, we ranked permissions and intents using Mutual Information and
Pearson’s Correlation Coefficient and further applied Algorithm 1 on TESTING DATASET-
1 to obtain their corresponding detection accuracies. First, we apply the proposed detection
algorithm (Algorithm 1), only on permissions, after ranking them using Mutual Information
and Pearson’s Correlation Coefficient. The proposed algorithm, i.e., Algorithm 1, will give
the best set of permissions with higher accuracy as an output. The results are summarized
in Table 13. From the table, we observe that we get the highest accuracy of 97.61% with
only one permission, namely “MOUNT_UNMOUNT_FILESYSTEMS”, when we rank the
permissions with Mutual Information. With Pearson’s Correlation Coefficient, we get the
highest accuracy of 96.02% again with only one permission, namely “RECEIVE_SMS”. With
our proposed frequency-based Chi-Square test on permissions, we get the highest accuracy
of 97.70% with ten permissions. Therefore, on TESTING DATASET-1, the frequency-based
Chi-Square test is better than both Mutual Information and Pearson Correlation Coefficient
when we rank permissions with these techniques. Moreover, as seen in Table 9, we get the
highest accuracy of 98.49% from the proposed model with the frequency-based Chi-Square
test on the combination of permissions and intents, which is higher than the accuracy obtained
from Pearson Coefficient and Mutual Information. Hence, our model outperforms Mutual
Information and Pearson Correlation Coefficient on permissions.
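To make the three ranking criteria concrete, the sketch below scores a binary feature (present or absent in an app) against the binary label (malware or normal) from a 2×2 table of counts. It uses the textbook chi-square statistic, the Pearson correlation of two binary variables (the phi coefficient), and mutual information in nats. These are assumptions for illustration: the frequency-based Chi-Square test used in this work may be defined differently, and the counts below are invented, not taken from the datasets.

```python
import math
from typing import Dict, Tuple

# (malware with feature, malware without, normal with, normal without)
Counts = Tuple[int, int, int, int]


def chi_square(counts: Counts) -> float:
    """Textbook chi-square statistic of a 2x2 contingency table."""
    a, b, c, d = counts
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def pearson(counts: Counts) -> float:
    """Pearson correlation between feature presence and the label."""
    a, b, c, d = counts
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return (a * d - b * c) / math.sqrt(denom) if denom else 0.0


def mutual_information(counts: Counts) -> float:
    """Mutual information (in nats) between feature and label."""
    a, b, c, d = counts
    n = a + b + c + d
    class_totals = (a + b, c + d)    # malware, normal
    feature_totals = (a + c, b + d)  # present, absent
    cells = ((a, 0, 0), (b, 0, 1), (c, 1, 0), (d, 1, 1))
    total = 0.0
    for count, cls, feat in cells:
        if count:  # empty cells contribute nothing
            expected = class_totals[cls] * feature_totals[feat] / n
            total += (count / n) * math.log(count / expected)
    return total


# Hypothetical counts for 100 malware and 100 normal apps.
COUNTS: Dict[str, Counts] = {
    "INTERNET": (95, 5, 90, 10),
    "SEND_SMS": (60, 40, 5, 95),
    "READ_SMS": (40, 60, 10, 90),
}

ranking = sorted(COUNTS, key=lambda name: chi_square(COUNTS[name]), reverse=True)
print(ranking)
```

A feature that is common in both classes, such as the hypothetical `INTERNET` row, scores low under all three criteria, which is the situation that motivates ranking in the first place.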
Next, we apply the proposed detection algorithm (Algorithm 1) only on intents, after
ranking them using Mutual Information and Pearson’s Correlation Coefficient. The algorithm
outputs the set of top-ranked intents that gives the highest accuracy. The results are
summarized in Table 14. From the table, we observe that we get the highest accuracy of
95.35% with only two intents, namely “USER_PRESENT” and “PACKAGE_REMOVED”, when we rank the
intents with Mutual Information. With Pearson’s Correlation Coefficient, by contrast, we get
the highest accuracy of 63.45% with only one intent, namely “Main”. With our proposed
frequency-based Chi-Square test on intents, we get an accuracy of 95.35% with two intents,
the same as that obtained from Mutual Information and better than that from the Pearson
Coefficient. Moreover, as seen in Table 9, we get the highest accuracy of 98.49% from the
proposed model with the frequency-based Chi-Square test on the combination of permissions
and intents, which is higher than the accuracy obtained from the Pearson Coefficient and
Mutual Information applied on intents.
Table 12 Top 10 permissions and intents ranked using Mutual Information and Correlation Coefficient
Permissions Intents
Mutual information Correlation coefficient Mutual information Correlation coefficient
Table 13 Comparison of frequency-based Chi-square test with mutual information and Pearson coefficient on permissions
Approach used Number of permissions used Detection accuracy using various machine learning and
deep learning classifiers (in %)
DT RF ANN BC NB LR MLP SVM DNN
Frequency-based Chi-square test (our approach) 10 97.70 97.70 74.54 97.70 92.16 97.70 97.58 97.58 97.58
Mutual information [9] 01 97.43 97.43 95.67 97.43 25.6 97.43 97.61 97.61 97.61
Correlation coefficient [9] 01 96.02 96.02 94.52 96.02 19.75 96.02 95.46 95.46 95.46
Table 14 Comparison of frequency-based Chi-square test with mutual information and Pearson coefficient on intents
Approach used Number of intents used Detection accuracy using various machine learning and
deep learning classifiers (in %)
DT RF ANN BC NB LR MLP SVM DNN
Frequency-based Chi-Square (our approach) 02 95.35 95.35 65.32 95.35 95.35 95.35 95.30 95.30 95.30
Mutual information [9] 02 95.35 95.35 95.13 95.35 95.35 95.35 96.12 96.12 96.12
Correlation coefficient [9] 01 61.96 61.96 63.45 61.96 38.04 61.96 62.12 62.12 62.12
Table 15 Comparison of frequency-based Chi-Square test with Mutual Information and Pearson Coefficient on the combination of permission and intents
Approach used Number of PERMISSION-INTENT pairs used Detection accuracy using various machine learning and
deep learning classifiers (in %)
DT RF ANN BC NB LR MLP SVM DNN
Frequency-based Chi-square 06 98.49 98.49 86.84 98.49 98.49 98.49 98.35 98.35 98.35
Mutual information [9] 01 96.82 96.82 96.59 96.82 96.82 96.82 96.64 96.64 96.64
Correlation coefficient [9] 01 63.86 63.86 39.88 63.86 39.88 63.86 63.55 63.55 63.55
Now, we apply the proposed detection algorithm (Algorithm 1) on the combination of
permissions and intents, after ranking them using Mutual Information and Pearson’s
Correlation Coefficient. The algorithm outputs the combined set of permissions and intents
that gives the highest accuracy. The results are summarized in Table 15. From the table, we
observe that we get the highest accuracy of 96.82% with only one pair of permission and
intent, namely “MOUNT_UNMOUNT_FILESYSTEMS” and “USER_PRESENT”, when we rank permissions and
intents with Mutual Information. With Pearson’s Correlation Coefficient, by contrast, we get
the highest accuracy of 63.86%, again with only one pair, namely “RECEIVE_SMS” and “MAIN”.
With our proposed frequency-based Chi-Square test on the combination of permissions and
intents, we get the highest accuracy of 98.49% with six permissions and six intents. Hence,
our model outperforms Mutual Information and the Pearson Correlation Coefficient on the
combination of permissions and intents.
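The way pairs accumulate across iterations can be sketched as below: iteration k uses the top k permissions together with the top k intents. The helper name `paired_prefix` is hypothetical, and the two lists are the Chi-Square rankings as they appear in Table 9.

```python
from typing import List, Sequence

RANKED_PERMISSIONS = [
    "BIND_GET_INSTALL_REFERRER_SERVICE",
    "JPUSH_MESSAGE",
    "RESTART_PACKAGES",
    "SEND_SMS",
    "RECEIVE_SMS",
    "READ_SMS",
]
RANKED_INTENTS = [
    "CONNECTION",
    "DaemonService",
    "NOTIFICATION_RECEIVED",
    "NOTIFICATION_OPENED",
    "MESSAGE_RECEIVED",
    "START_FROM_AGOO",
]


def paired_prefix(
    permissions: Sequence[str], intents: Sequence[str], k: int
) -> List[str]:
    """Features used at iteration k: top-k permissions plus top-k intents."""
    if not 1 <= k <= min(len(permissions), len(intents)):
        raise ValueError("k must be between 1 and the shorter list length")
    return list(permissions[:k]) + list(intents[:k])


first_pair = paired_prefix(RANKED_PERMISSIONS, RANKED_INTENTS, 1)
best_set = paired_prefix(RANKED_PERMISSIONS, RANKED_INTENTS, 6)
print(first_pair, len(best_set))
```

With k = 6 this yields the 12 features behind the 98.49% result, while k = 1 corresponds to the single-pair case reported for Mutual Information and the Pearson Correlation Coefficient, albeit with their own rankings.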
As described at the beginning of Sect. 4, we have 788 malware apps from Koodous, which are
different from the Androzoo dataset, and we name this dataset TESTING DATASET-2. In
this section, we discuss the detection results, i.e., the accuracy obtained when we apply our
proposed approach to TESTING DATASET-2. Again, for comparison, we perform three
experiments, considering 1) permissions alone, 2) intents alone, and 3) both permissions and
intents combined.
First, we apply the proposed detection algorithm (Algorithm 1) with permissions alone.
The algorithm outputs the set of top-ranked permissions that gives the highest accuracy.
Table 16 summarizes the detection results when we use permissions alone for detection on
TESTING DATASET-2. The table can be read as follows. With the top-ranked permission, i.e.,
“BIND_GET_INSTALL_REFERRER_SERVICE”, we get 95.29% accuracy with several classifiers. We
call this the first iteration. In the next iteration, we consider the top two ranked
permissions, i.e., we combine “BIND_GET_INSTALL_REFERRER_SERVICE” with “JPUSH_MESSAGE” for
detection and repeat the process. In this iteration, we get an accuracy of 95.55% from
several classifiers. As discussed in Algorithm 1, we proceed to the next iteration whenever
the detection accuracy increases over the previous iteration. Hence, we consider the top
three permissions and repeat the entire procedure. The procedure terminates when we observe
a decrease in the detection accuracy. As shown in Table 16, we achieve the highest detection
accuracy in the ninth iteration, i.e., with the top nine permissions, namely
“BIND_GET_INSTALL_REFERRER_SERVICE”, “JPUSH_MESSAGE”, “RESTART_PACKAGES”, “SEND_SMS”,
“RECEIVE_SMS”, “READ_SMS”, “CHANGE_CONFIGURATION”, “RECEIVE_USER_PRESENT”, and
“BROADCAST_PACKAGE_INSTALL”, we get an accuracy of 97.40%. From the next iteration, the
detection accuracy starts decreasing. Hence, the highest accuracy obtained when we apply the
proposed Algorithm 1 only on permissions is 97.40%.
Table 16 Detection results with proposed approach considering only permissions
Permissions used Detection accuracy using various machine learning and
deep learning classifiers (in %)
DT RF ANN BC NB LR MLP SVM DNN
Bind_get_install_referrer_service 95.29 95.29 94.12 95.29 95.29 95.29 94.88 94.88 94.88
Jpush_message 95.55 95.55 93.34 94.40 95.55 95.55 94.84 94.84 93.84
Restart_packages 95.75 95.75 94.41 95.75 95.70 95.70 94.44 94.44 75.61
Send_SMS 96.20 96.20 83.21 96.20 96.20 96.20 92.44 92.44 85.95
Receive_SMS 96.70 96.70 90.56 96.70 96.70 96.70 90.33 90.33 93.72
Read_SMS 96.95 96.95 92.48 96.95 96.80 96.95 92.98 92.98 96.34
Change_configuration 97.08 97.08 93.54 97.08 97.08 97.08 88.12 88.12 97.69
Receive_user_present 97.29 97.29 93.79 97.29 97.29 97.29 88.31 88.31 88.31
Broadcast_package_install 97.40 97.40 93.82 97.40 97.35 97.35 93.05 88.01 93.05
Broadcast_package_replaced 97.20 97.20 93.82 97.20 97.20 97.14 92.75 92.75 93.82
Broadcast_sticky 97.20 97.20 94.46 97.20 97.14 97.20 92.63 92.92 92.63
Process_outgoing_calls 97.06 97.06 96.72 97.02 97.14 97.02 93.08 93.08 93.08
Table 17 Detection results with proposed Approach considering only intents
Intents used Detection accuracy using various machine learning and
deep learning classifiers (in %)
DT RF ANN BC NB LR MLP SVM DNN
Connection 94.46 94.46 84.31 94.20 94.20 94.20 86.32 86.32 86.32
Daemonservice 94.70 94.70 90.22 94.70 94.62 94.62 91.62 91.62 91.62
Notification_received 94.98 76.92 94.90 94.98 94.98 94.90 90.05 90.05 90.05
Notification_opened 95.28 78.10 95.28 95.28 95.28 95.28 94.44 94.44 94.44
Message_received 95.46 85.42 95.40 95.40 95.40 95.46 86.44 86.44 86.44
Start_from_agoo 95.78 95.78 89.07 95.78 95.70 95.70 88.75 88.75 88.75
Report 95.55 95.55 92.91 95.55 95.55 95.50 92.90 92.90 92.90
Command 95.08 95.10 93.33 95.08 95.10 95.10 92.24 92.24 92.24
Next, we apply the proposed detection algorithm (Algorithm 1) with intents alone. The
algorithm outputs the set of top-ranked intents that gives the highest accuracy. Table 17
summarizes the detection results when we use intents alone on TESTING DATASET-2. The table
can be read as follows. With the top-ranked intent, i.e., “CONNECTION”, we get 94.46%
accuracy with several classifiers. We call this the first iteration. In the next iteration,
we consider the top two ranked intents, i.e., we combine “CONNECTION” with “DaemonService”
for detection and repeat the process. In this iteration, we get an accuracy of 94.70% from
several classifiers. As discussed in Algorithm 1, we proceed to the next iteration whenever
the detection accuracy increases over the previous iteration. Hence, we consider the top
three intents and repeat the entire procedure. The process terminates when we observe a
decrease in the detection accuracy. As shown in Table 17, we achieve the highest detection
accuracy in the sixth iteration, i.e., with the top six intents, namely “CONNECTION”,
“DaemonService”, “NOTIFICATION_RECEIVED”, “NOTIFICATION_OPENED”, “MESSAGE_RECEIVED”, and
“START_FROM_AGOO”, we get an accuracy of 95.78%. From the next iteration, the detection
accuracy starts decreasing. Hence, the highest accuracy obtained when we apply the proposed
Algorithm 1 only on intents is 95.78%.
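The reading of Table 17 described above can be reproduced with a few lines of code: take the best classifier in each row (one row per iteration), then locate the iteration with the highest value. The function name `best_per_iteration` is hypothetical, and the numbers are copied from Table 17; nothing is trained here.

```python
from typing import Dict, List, Tuple

CLASSIFIERS = ["DT", "RF", "ANN", "BC", "NB", "LR", "MLP", "SVM", "DNN"]

# One row per iteration; the key is the intent added in that iteration.
TABLE_17: Dict[str, List[float]] = {
    "CONNECTION": [94.46, 94.46, 84.31, 94.20, 94.20, 94.20, 86.32, 86.32, 86.32],
    "DaemonService": [94.70, 94.70, 90.22, 94.70, 94.62, 94.62, 91.62, 91.62, 91.62],
    "NOTIFICATION_RECEIVED": [94.98, 76.92, 94.90, 94.98, 94.98, 94.90, 90.05, 90.05, 90.05],
    "NOTIFICATION_OPENED": [95.28, 78.10, 95.28, 95.28, 95.28, 95.28, 94.44, 94.44, 94.44],
    "MESSAGE_RECEIVED": [95.46, 85.42, 95.40, 95.40, 95.40, 95.46, 86.44, 86.44, 86.44],
    "START_FROM_AGOO": [95.78, 95.78, 89.07, 95.78, 95.70, 95.70, 88.75, 88.75, 88.75],
    "REPORT": [95.55, 95.55, 92.91, 95.55, 95.55, 95.50, 92.90, 92.90, 92.90],
    "COMMAND": [95.08, 95.10, 93.33, 95.08, 95.10, 95.10, 92.24, 92.24, 92.24],
}


def best_per_iteration(table: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Best (classifier, accuracy) for each row, in iteration order."""
    results = []
    for row in table.values():
        idx = max(range(len(row)), key=row.__getitem__)  # first maximum wins ties
        results.append((CLASSIFIERS[idx], row[idx]))
    return results


per_iteration = best_per_iteration(TABLE_17)
peak_index = max(range(len(per_iteration)), key=lambda i: per_iteration[i][1])
peak = (peak_index + 1, *per_iteration[peak_index])
print(peak)
```

The peak falls in the sixth iteration, in line with the 95.78% reported for the top six intents.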
Table 18 Detection results with proposed approach considering the combination of permission and intents
Permissions and intents used Detection accuracy using various machine learning and
deep learning classifiers (in %)
DT RF ANN BC NB LR MLP SVM DNN
Bind_get_install_referrer_service and connection 95.68 95.68 92.61 95.60 95.68 95.60 94.02 94.02 92.52
Jpush_message and Daemonservice 95.82 95.82 75.87 95.80 95.82 95.82 97.50 97.50 97.50
Restart_packages and notification_received 96.24 96.24 58.97 96.24 96.24 96.24 98.07 98.07 90.09
Send_SMS and notification_opened 96.45 96.45 82.60 96.45 96.45 96.45 97.25 97.25 97.25
Receive_SMS and message_received 96.72 96.72 89.85 96.72 96.72 96.72 90.22 90.22 90.22
Read_SMS and start_from_agoo 97.28 97.28 92.66 97.28 97.28 97.28 92.65 92.90 92.65
Change_configuration and report 97.82 97.82 93.16 97.82 97.82 97.82 87.74 87.82 87.74
Receive_user_present and command 98.18 98.18 93.43 98.18 98.18 98.18 87.72 87.72 87.72
Broadcast_package_install and service 98.02 98.02 93.45 98.02 98.02 98.02 87.51 87.51 87.51
Broadcast_package_replaced and election 97.64 97.64 93.56 97.64 97.64 97.64 87.21 87.21 87.21
“COMMAND”, we get the highest accuracy of 98.18%. From the next iteration, we observe
that the detection accuracy starts decreasing. We observe that we get the highest accuracy
of 98.18% when we apply the proposed Algorithm 1 on the combination of permissions and
intents.
Table 19 summarizes the detection results on TESTING DATASET-2 when we use all per-
missions and intents for detection without applying any feature ranking technique. The table
can be understood as follows. On considering all the permissions simultaneously without
utilizing Chi-Square-based feature ranking, we observe that the highest detection accuracy
obtained is 74.28%, whereas the highest detection accuracy obtained while considering all
intents is 86.78%. We have applied the frequency-based Chi-Square test to rank permissions
and intents in our work. Next, we compare the performance of the proposed frequency-
based Chi-Square test with the Mutual Information and Pearson correlation Coefficient on
TESTING DATASET-2.
For the comparison, we ranked permissions and intents using Mutual Information and
Pearson’s Correlation Coefficient and then applied Algorithm 1 on TESTING DATASET-2 to
obtain the corresponding detection accuracies. First, we apply the proposed detection
algorithm (Algorithm 1) only on permissions, after ranking them using Mutual Information
and Pearson’s Correlation Coefficient. The detection results are summarized in Table 20.
The table shows that we get the highest accuracy of 92.05% with six permissions, namely
“MOUNT_UNMOUNT_FILESYSTEMS”, “READ_PHONE_STATE”, “CHANGE_WIFI_STATE”, “GET_TASKS”,
“SYSTEM_ALERT_WINDOW”, and “READ_LOGS”, when we rank the permissions with Mutual
Information. With Pearson’s Correlation Coefficient, by contrast, we get the highest
accuracy of 64.50% with seven permissions, namely “RECEIVE_SMS”, “QUERY_ALL_PACKAGES”,
“INTERNET”, “CHANGE_BADGE”, “MAPS_RECEIVE”, “SYSTEM_OVERLAY_WINDOW”, and
“REQUEST_IGNORE_BATTERY_OPTIMIZATIONS”. With our proposed frequency-based Chi-Square test
on permissions, we get the highest accuracy of 97.40% with nine permissions. Therefore,
on TESTING DATASET-2, the frequency-based Chi-Square test is better than both Mutual
Information and the Pearson Correlation Coefficient when we rank permissions with these
techniques.
Next, we apply the proposed detection algorithm (Algorithm 1) on TESTING DATASET-
2, with intents alone, after ranking them using Mutual Information and Pearson’s Corre-
lation Coefficient. The proposed algorithm, i.e., Algorithm 1, will give the best intents
with higher accuracy as an output. The results are summarized in Table 21. From
the table, we observe that we get the highest accuracy of 95.53% with five intents,
namely “USER_PRESENT”, “PACKAGE_REMOVED”, “DEFAULT”, “PUSH_TIME”
Table 19 Detection results considering all features for TESTING DATASET-2 without applying the proposed
approach
Features used Detection accuracy using various machine learning and
deep learning classifiers (in %)
DT RF ANN BC NB LR MLP SVM DNN
All permissions 74.28 74.28 66.96 74.21 59.97 59.97 60.62 52.50 52.71
All intents 86.78 86.70 66.80 86.55 69.43 65.22 66.93 68.83 64.54
Table 20 Comparison of frequency-based Chi-Square test with mutual information and Pearson coefficient on permissions
Approach used Number of permissions used Detection accuracy using various machine learning and
deep learning classifiers (in %)
DT RF ANN BC NB LR MLP SVM DNN
Frequency-based Chi-Square (our approach) 09 97.40 97.40 93.82 97.40 97.35 97.35 93.05 88.01 93.05
Mutual information [9] 06 92.05 92.05 92.03 92.05 92.05 92.05 92.10 92.10 92.09
Correlation coefficient [9] 07 64.50 64.50 54.59 64.50 61.83 61.83 26.38 26.28 60.12
Table 21 Comparison of frequency-based Chi-Square test with Mutual Information and Pearson Coefficient on intents
Approach used Number of Intents used Detection accuracy using various machine learning and
deep learning classifiers (in %)
DT RF ANN BC NB LR MLP SVM DNN
Frequency-based Chi-Square (our approach) 06 95.78 95.78 89.07 95.78 95.70 95.70 88.75 88.75 88.75
Mutual information [9] 05 95.14 95.14 57.97 95.14 95.53 95.53 60.12 60.12 58.88
Correlation coefficient [9] 09 85.45 85.45 79.82 85.45 78.87 81.61 80.56 80.56 80.52
The applications in both the datasets, i.e., TESTING DATASET-1 and TESTING DATASET-
2 were collected over the period from 2016 to 2022. In this subsection, we discuss the results
obtained from testing our proposed approach over a new and more recent dataset, i.e., on
1000 malicious applications downloaded from Androzoo that were detected in 2022. Again,
we perform three experiments, considering 1) permissions alone, 2) intents alone, and 3)
both permissions and intents combined.
First, we apply the proposed detection algorithm (Algorithm 1) with permissions alone.
Table 23 summarizes the detection results when we use permissions alone for detection on the
recent dataset. With the top-ranked permission, i.e., “BIND_GET_INSTALL_REFERRER_SERVICE”,
we get an accuracy of 96.84% with one of the classifiers. In the next iteration, we consider
the top two ranked permissions, i.e., we combine “BIND_GET_INSTALL_REFERRER_SERVICE” with
“JPUSH_MESSAGE” for detection. In this iteration, we get an increased accuracy of 97.02%.
Next, we consider the top three permissions and repeat the entire procedure. The procedure
terminates when we observe a decrease in the detection accuracy. As shown in Table 23, we
achieve the highest detection accuracy in the third iteration, i.e., with the top three
permissions, namely “BIND_GET_INSTALL_REFERRER_SERVICE”, “JPUSH_MESSAGE”, and
“RESTART_PACKAGES”, we get an accuracy of 97.13%. From the next iteration, the detection
accuracy starts decreasing. Hence, the highest accuracy obtained when we apply the proposed
Algorithm 1 on the recent dataset with permissions alone is 97.13%.
Table 22 Comparison of frequency-based Chi-Square test with Mutual Information and Pearson Coefficient on the combination of permission and intents
Approach used Number of permission-intent pairs used Detection accuracy using various machine learning and
deep learning classifiers (in %)
DT RF ANN BC NB LR MLP SVM DNN
Frequency-based Chi-Square (our approach) 08 98.18 98.18 93.43 98.18 98.18 98.18 87.72 87.72 87.72
Mutual information [9] 01 97.29 97.29 97.29 97.29 97.29 97.29 97.22 97.22 97.22
Correlation coefficient [9] 02 85.54 85.54 80.01 85.54 80.25 85.54 79.80 79.80 84.03
Table 23 Detection results with proposed approach considering only permissions
Permissions used Detection accuracy using various machine learning and
deep learning classifiers (in %)
DT RF ANN BC NB LR MLP SVM DNN
Bind_get_install_referrer_service 96.06 96.06 96.84 96.06 96.06 96.06 96.66 96.66 96.65
Jpush_message 97.02 97.02 60.93 97.02 97.02 97.02 96.65 96.65 96.65
Restart_packages 97.13 97.13 53.28 97.13 97.13 97.13 96.51 96.51 96.51
Send_SMS 94.50 94.50 52.10 94.50 94.50 94.50 95.76 95.76 95.77
Receive_SMS 95.50 95.50 51.30 95.50 95.50 95.50 95.02 95.02 95.03
Read_SMS 92.85 92.85 50.90 92.85 92.85 92.85 94.60 94.60 94.62
Next, we apply the proposed approach to the recent dataset with intents alone. The
algorithm outputs the set of top-ranked intents that gives the highest accuracy. Table 24
summarizes the detection results when we use intents alone for detection. With the
top-ranked intent, i.e., “CONNECTION”, we get 96.03% accuracy. In the next iteration, we
consider the top two ranked intents, i.e., we combine “CONNECTION” with “DaemonService” for
detection and repeat the process, and we get an accuracy of 96.26%. As shown in Table 24, we
achieve the highest detection accuracy in the sixth iteration, i.e., with the top six
intents, namely “CONNECTION”, “DaemonService”, “NOTIFICATION_RECEIVED”,
“NOTIFICATION_OPENED”, “MESSAGE_RECEIVED”, and “START_FROM_AGOO”, we get an accuracy of
97.89%. From the next iteration, the detection accuracy starts decreasing. Hence, the
highest accuracy obtained when we apply the proposed Algorithm 1 on the recent dataset with
intents alone is 97.89%.
Further, we apply the proposed approach to the recent dataset with the combination of
permissions and intents. The algorithm outputs the set of permissions and intents that
gives the highest accuracy. Table 25 summarizes the detection results. With the top-ranked
pair, i.e., “BIND_GET_INSTALL_REFERRER_SERVICE” and “CONNECTION”, we get 96.19% accuracy.
In the next iteration, we consider the top two ranked pairs of permissions and intents,
i.e., we combine “BIND_GET_INSTALL_REFERRER_SERVICE” and “CONNECTION” with “JPUSH_MESSAGE”
and “DaemonService”, and we get an increased accuracy of 98.42%. Next, we consider the top
three pairs of permissions and intents and repeat the entire procedure. We achieve the
highest detection accuracy in the third iteration, i.e., with the top three permissions,
namely “BIND_GET_INSTALL_REFERRER_SERVICE”, “JPUSH_MESSAGE”, and “RESTART_PACKAGES”, and
the top three intents, namely “CONNECTION”, “DaemonService”, and “NOTIFICATION_RECEIVED”,
we get an accuracy of 98.74%. Hence, we conclude that the proposed approach can detect
recent malware samples with a high accuracy of 98.74%.
In this section, we compare the performance of our proposed model with other similar works
of Android malware detection that have used permissions or intents as features. Table 26
summarizes this comparison. As seen from the table, our work outperforms all these works
in terms of detection accuracy. Some works have ranked the permissions based on frequency
or with tests like Mutual Information and Pearson Correlation Coefficient. Some other works
have applied feature selection techniques with linear regression or Naive Bayes, whereas
some authors have used permissions in pairs for Android malware detection. Only two works,
i.e., Li et al. [7] and Wang et al. [9], have used a larger number of normal applications in
their analysis than ours. However, their dataset size for malware apps is smaller than ours.
Moreover, our work outperforms them in terms of detection accuracy. Hence, our proposed
model is better than many state-of-the-art techniques presented in the literature for Android
malware detection.
4.7 Limitations
Now, we describe a few limitations of the proposed approach. The proposed model ranks
permissions and intents for malware detection, and hence, the model is a static detection
technique.
Table 24 Detection results with proposed approach considering only intents
Intents used Detection accuracy using various machine learning and
deep learning classifiers (in %)
DT RF ANN BC NB LR MLP SVM DNN
Connection 96.03 96.03 85.32 96.03 96.03 96.03 84.32 92.50 84.32
Daemonservice 96.26 96.26 91.23 96.26 96.26 96.26 90.05 95.80 90.05
Notification_received 97.19 97.19 94.66 97.19 97.19 97.19 94.76 96.50 94.76
Notification_opened 97.42 97.42 95.30 97.42 97.42 97.42 95.22 96.46 95.22
Message_received 97.55 97.55 95.44 97.55 97.55 97.55 95.40 95.40 95.40
Start_from_agoo 97.89 97.89 88.97 97.89 97.89 97.89 89.21 90.20 89.21
Report 97.40 97.40 84.14 97.51 97.40 97.40 84.04 84.04 84.04
Command 97.27 97.27 81.22 97.29 97.27 97.26 80.45 80.45 80.45
Table 25 Detection results with proposed approach considering the combination of permissions and intents
Permissions and intents used Detection accuracy using various machine learning and
deep learning classifiers (in %)
DT RF ANN BC NB LR MLP SVM DNN
Bind_get_install_referrer_service and connection 96.19 96.19 73.09 96.19 96.19 96.19 97.42 97.42 97.42
Jpush_message and daemonservice 97.35 97.35 72.78 97.35 97.35 97.35 98.42 98.42 98.42
Restart_packages and notification_received 98.74 98.74 67.42 98.74 89.37 98.74 97.60 97.60 97.60
Send_SMS and notification_opened 98.17 98.17 70.14 98.17 87.24 98.17 97.26 97.26 97.26
Receive_SMS and message_received 96.75 96.75 73.60 96.75 96.75 96.75 97.05 97.05 97.05
Table 26 Comparison of proposed work with the existing literature based on malware detection using permissions and intents
Related work | Feature selection/feature ranking technique used | Dataset size (normal / malware) | Detection accuracy (in %) | Number of best features
Li et al. [7] | Permissions ranking based on frequency | 310,926 / 62,838 | 93.62 | 22 permissions
Khariwal et al. [8] | Ranked features using Information gain | 1,414 / 1,714 | 94.73 | 37 features
Wang et al. [9] | Permissions ranking with Mutual Information, Correlation Coefficient and T-test | 310,926 / 4,868 | 94.62 | 40 permissions
Yerima et al. [16] | Mutual Information gain based permissions and code based features | 1,000 / 1,000 | 97.7 | 15 permissions
Chaudhary et al. [19] | Chi-Square as a feature reduction technique | 5065 / 426 | 96.4 | –
Mahindru et al. [23] | Feature selection using Chi-Square, Gain Ratio, Filtered Subset selection, Information feature, LR analysis, PCA | 5,00,000 | 98.2 |
Sahin et al. [25] | Feature selection with Linear regression | 1,000 / 1,000 | 96.1 | 27 permissions
Talha et al. [26] | Risk score calculated for each app | 1,853 / 6,909 | 88.28 | –
Dougru et al. [28] | Permission groups score calculated, to sum up, app’s risk score | 5,554 / 5,554 | 96.19 | –
Shang et al. [29] | Naive Bayes and Pearson Correlation Coefficient | 945 / 1,725 | 86.54 | –
Tchakounte et al. [30] | Sequence alignment based similarity score | 534 / 534 | 79.58 | –
Kato et al. [54] | Similarity score between malware and normal permission pairs | 11,500 / 19,000 | 97.3 | –
Arora et al. [55] | Normal and malicious graphs of permission pairs | 7,533 / 7,533 | 95.44 | –
IPAnalyzer (Proposed model) | Permissions and Intents ranking with Frequency-based Chi-Square | 77,000 / 77,000 | 98.49 | 12 features
In this work, we proposed a novel static technique to detect Android malware using the
combination of ranked permissions and intents. Initially, we ranked the permissions and
intents separately based on their frequency difference in normal and malware datasets.
Subsequently, we ranked the features using a frequency-based statistical Chi-Square test.
Finally, we proposed a novel algorithm with machine learning and deep learning techniques to
merge the two ranked lists and find the best combination of permissions and intents. Our
experimental results demonstrate that the proposed model achieves a detection accuracy of
98.49% with 12 features, i.e., the top six permissions combined with the top six intents.
Furthermore, the results show that our proposed method is better than many state-of-the-art
techniques for Android malware detection in terms of detection accuracy and the number of
features used. In our future work, we will extend the analysis to other manifest file
components such as broadcast receivers, activities, and services. We will also aim to
integrate dynamic analysis to detect stealthier malware and colluding apps.
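As an illustration of how an app could be mapped onto the 12 selected features, the sketch below reads a decoded, plain-text AndroidManifest.xml and emits a binary vector over the top six permissions and top six intents named in Table 9. Several points are assumptions rather than details from this work: the helper names are hypothetical, features are matched on the final dot-separated segment of each name in a case-insensitive way, and the sample manifest is invented. Manifests inside an APK are stored as binary XML and would need to be decoded first.

```python
import xml.etree.ElementTree as ET
from typing import List, Set

# Attribute key for android:name once the namespace prefix is expanded.
ANDROID_NAME = "{http://schemas.android.com/apk/res/android}name"

TOP_PERMISSIONS = [
    "BIND_GET_INSTALL_REFERRER_SERVICE", "JPUSH_MESSAGE", "RESTART_PACKAGES",
    "SEND_SMS", "RECEIVE_SMS", "READ_SMS",
]
TOP_INTENTS = [
    "CONNECTION", "DAEMONSERVICE", "NOTIFICATION_RECEIVED",
    "NOTIFICATION_OPENED", "MESSAGE_RECEIVED", "START_FROM_AGOO",
]


def short_names(root: ET.Element, tag: str) -> Set[str]:
    """Upper-cased last segment of android:name for every `tag` element."""
    return {
        el.get(ANDROID_NAME, "").rsplit(".", 1)[-1].upper()
        for el in root.iter(tag)
    }


def feature_vector(manifest_xml: str) -> List[int]:
    """12-element 0/1 vector: six permissions followed by six intents."""
    root = ET.fromstring(manifest_xml.strip())
    permissions = short_names(root, "uses-permission")
    intents = short_names(root, "action")
    return [int(p in permissions) for p in TOP_PERMISSIONS] + [
        int(i in intents) for i in TOP_INTENTS
    ]


# Invented manifest used only to exercise the function.
SAMPLE_MANIFEST = """
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.app">
  <uses-permission android:name="android.permission.SEND_SMS"/>
  <uses-permission android:name="android.permission.READ_SMS"/>
  <application>
    <receiver android:name=".Receiver">
      <intent-filter>
        <action android:name="cn.jpush.android.intent.CONNECTION"/>
      </intent-filter>
    </receiver>
  </application>
</manifest>
"""

vector = feature_vector(SAMPLE_MANIFEST)
print(vector)
```

The same traversal could be pointed at `receiver`, `activity`, or `service` elements to collect the other manifest components mentioned as future work.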
Author Contributions There are equal contributions in this research from all the authors of this article.
Data Availability The datasets generated during and/or analysed during the current study are available from
the corresponding author on reasonable request.
Declarations
Ethical approval This article does not contain any studies with human participants or animals performed by
any of the authors.
Conflict of interest The authors declare that they have no conflict of interest.
References
1. Felt AP, Ha E, Egelman S, Haney A, Chin E, Wagner D (2012) Android permissions: user attention,
comprehension, and behavior. In: Proceedings of the eighth symposium on usable privacy and security,
pp 1–14
2. Şahin DÖ, Akleylek S, Kılıç E (2022) Linregdroid: detection of android malware using multiple linear
regression models-based classifiers. IEEE Access 10:14246–14259
3. Alsoghyer S, Almomani I (2020) On the effectiveness of application permissions for android ransomware
detection. In: 2020 6th conference on data science and machine learning applications (CDMA), pp 94–99.
IEEE
4. Shrivastava G, Kumar P (2019) Sensdroid: analysis for malicious activity risk of android application.
Multimed Tools Appl 78(24):35713–35731
5. Idrees F, Rajarajan M, Chen TM, Rahulamathavan Y, Naureen A (2017) Andropin: correlating android
permissions and intents for malware detection. In: 2017 8th IEEE annual information technology,
electronics and mobile communication conference (IEMCON), pp 394–399. IEEE
6. Idrees F, Rajarajan M (2014) Investigating the android intents and permissions for malware detection. In:
2014 IEEE 10th international conference on wireless and mobile computing, networking and
communications (WiMob), pp 354–358. IEEE
7. Li J, Sun L, Yan Q, Li Z, Srisa-An W, Ye H (2018) Significant permission identification for machine-
learning-based android malware detection. IEEE Trans Industr Inf 14(7):3216–3225
8. Khariwal K, Singh J, Arora A (2020) Ipdroid: android malware detection using intents and permissions.
In: 2020 Fourth world conference on smart trends in systems, security and sustainability (WorldS4), pp
197–202. IEEE
9. Wang W, Wang X, Feng D, Liu J, Han Z, Zhang X (2014) Exploring permission-induced risk in android
applications for malicious application detection. IEEE Trans Inf Forensics Secur 9(11):1869–1882
10. Arora A, Peddoju SK (2017) Minimizing network traffic features for android mobile malware detection.
In: Proceedings of the 18th international conference on distributed computing and networking, pp 1–10
11. Shabtai A, Tenenboim-Chekina L, Mimran D, Rokach L, Shapira B, Elovici Y (2014) Mobile malware
detection through analysis of deviations in application network behavior. Computers & Security 43:1–18
12. Singh L, Hofmann M (2017) Dynamic behavior analysis of android applications for malware detection.
In: 2017 International conference on intelligent communication and computational techniques (ICCT),
pp 1–7. IEEE
13. Feng P, Ma J, Sun C, Xu X, Ma Y (2018) A novel dynamic android malware detection system with
ensemble learning. IEEE Access 6:30996–31011
14. Sahal AA, Alam S, Soğukpinar I (2018) Mining and detection of android malware based on permissions.
In: 2018 3rd International conference on computer science and engineering (UBMK), pp 264–268. IEEE
15. Yerima SY, Sezer S, McWilliams G, Muttik I (2013) A new android malware detection approach using
bayesian classification. In: 2013 IEEE 27th international conference on advanced information networking
and applications (AINA), pp 121–128. IEEE
16. Yerima SY, Sezer S, McWilliams G (2014) Analysis of bayesian classification-based approaches for
android malware detection. IET Inf Secur 8(1):25–36
17. Upadhayay M, Sharma A, Garg G, Arora A (2021) Rpndroid: android malware detection using ranked
permissions and network traffic. In: 2021 Fifth World conference on smart trends in systems security and
sustainability (WorldS4), pp 19–24. IEEE
18. Rathore H, Kharat A, Manickavasakam A, Sahay SK, Sewak M (2023) Malefficient10%: a novel feature
reduction approach for android malware detection. In: International conference on broadband
communications, networks and systems, pp 72–92. Springer
19. Chaudhary M, Masood A (2023) Realmalsol: real-time optimized model for android malware detection
using efficient neural networks and model quantization. Neural Computing and Applications
35(15):11373–11388
20. Rahima Manzil HH, Naik SM (2023) Android ransomware detection using a novel hamming distance
based feature selection. J Comput Virology and Hacking Techniques 1–23
21. Seyfari Y, Meimandi A (2023) A new approach to android malware detection using fuzzy logic-based
simulated annealing and feature selection. Multimed Tools Appl 1–25
22. Anupama M, Vinod P, Visaggio CA, Arya M, Philomina J, Raphael R, Pinhero A, Ajith K, Mathiyalagan
P (2022) Detection and robustness evaluation of android malware classifiers. J Comput Virology Hacking
Techniq 18(3):147–170
23. Mahindru A, Sangal A (2022) Somdroid: android malware detection by artificial neural network trained
using unsupervised learning. Evol Intel 15(1):407–437
24. Mahindru A, Sangal A (2021) Fsdroid:-a feature selection technique to detect malware from android
using machine learning techniques: Fsdroid. Multimed Tools Appl 80:13271–13323
25. Şahin DÖ, Kural OE, Akleylek S, Kılıç E (2021) A novel permission-based android malware detection
system using feature selection based on linear regression. Neural Computing and Applications, 1–16
26. Talha KA, Alper DI, Aydin C (2015) Apk auditor: permission-based android malware detection system.
Digit Investig 13:1–14
27. Mahindru A, Singh P (2017) Dynamic permissions based android malware detection using machine
learning techniques. In: Proceedings of the 10th innovations in software engineering conference, pp
202–210
28. Doğru İA, Önder M (2020) Appperm analyzer: malware detection system based on android permissions
and permission groups. Int J Software Eng Knowl Eng 30(03):427–450
29. Shang F, Li Y, Deng X, He D (2018) Android malware detection method based on naive bayes and
permission correlation algorithm. Clust Comput 21(1):955–966
30. Tchakounté F, Wandala AD, Tiguiane Y (2019) Detection of android malware based on sequence
alignment of permissions. Int J Comput (IJC) 35(1):26–36
31. Ju S-h, Seo H-s, Kwak J (2016) Research on android malware permission pattern using permission
monitoring system. Multimed Tools Appl 75:14807–14817
32. Ilham S, Abderrahim G, Abdelhakim BA (2018) Permission based malware detection in android devices.
In: Proceedings of the 3rd International conference on smart city applications, pp 1–6
33. Şahin DÖ, Kural OE, Akleylek S, Kılıç E (2018) New results on permission based static analysis for
android malware. In: 2018 6th International symposium on digital forensic and security (ISDFS), pp 1–4.
IEEE
34. D’Angelo G, Palmieri F, Robustelli A (2022) A federated approach to android malware classification
through perm-maps. Clust Comput 25(4):2487–2500
35. Xiong P, Wang X, Niu W, Zhu T, Li G (2014) Android malware detection with contrasting permission
patterns. China Communications 11(8):1–14
36. Lu T, Hou S (2018) A two-layered malware detection model based on permission for android. In: 2018
IEEE International conference on computer and communication engineering technology (CCET), pp
239–243. IEEE
37. Kavitha K, Salini P, Ilamathy V (2016) Exploring the malicious android applications and reducing risk
using static analysis. In: 2016 International conference on electrical, electronics, and optimization
techniques (ICEEOT), pp 1316–1319. IEEE
38. Amer E (2021) Permission-based approach for android malware analysis through ensemble-based voting
model. In: 2021 International mobile, intelligent, and ubiquitous computing conference (MIUCC), pp
135–139. IEEE
39. Chakravarty S et al (2020) Feature selection and evaluation of permission-based android malware
detection. In: 2020 4th International conference on trends in electronics and informatics (ICOEI)(48184), pp
795–799. IEEE
40. Sirisha P, Anuradha T et al (2019) Detection of permission driven malware in android using deep learning
techniques. In: 2019 3rd International conference on electronics, communication and aerospace
technology (ICECA), pp 941–945. IEEE
41. Wang Z, Li K, Hu Y, Fukuda A, Kong W (2019) Multilevel permission extraction in android applications
for malware detection. In: 2019 International conference on computer, information and telecommunication
systems (CITS), pp 1–5. IEEE
42. Park J, Kang M, Cho S-j, Han H, Suh K (2020) Analysis of permission selection techniques in machine
learning-based malicious app detection. In: 2020 IEEE Third international conference on artificial
intelligence and knowledge engineering (AIKE), pp 92–99. IEEE
43. Liang S, Du X (2014) Permission-combination-based scheme for android mobile malware detection. In:
2014 IEEE International conference on communications (ICC), pp 2301–2306. IEEE
44. Enck W, Ongtang M, McDaniel P (2009) On lightweight mobile phone application certification. In:
Proceedings of the 16th ACM conference on computer and communications security, pp 235–245
45. Wang Y, Zheng J, Sun C, Mukkamala S (2013) Quantitative security risk assessment of android
permissions and applications. In: Data and applications security and privacy XXVII: 27th Annual IFIP WG 11.3
Conference, DBSec 2013, Newark, NJ, USA, July 15-17, 2013. Proceedings 27, pp 226–241. Springer
46. Peng H, Gates C, Sarma B, Li N, Qi Y, Potharaju R, Nita-Rotaru C, Molloy I (2012) Using probabilistic
generative models for ranking risks of android apps. In: Proceedings of the 2012 ACM conference on
computer and communications security, pp 241–252
47. Pandita R, Xiao X, Yang W, Enck W, Xie T (2013) WHYPER: towards automating risk assessment of
mobile applications. In: 22nd USENIX security symposium (USENIX Security 13), pp 527–542
48. Samra AAA, Yim K, Ghanem OA (2013) Analysis of clustering technique in android malware
detection. In: 2013 seventh international conference on innovative mobile and internet services in ubiquitous
computing, pp 729–733. IEEE
49. Zarni Aung WZ (2013) Permission-based android malware detection. Int J Sci Technol Res 2(3):228–234
50. Sanz B, Santos I, Laorden C, Ugarte-Pedrero X, Bringas PG, Álvarez G (2013) Puma: permission usage
to detect malware in android. In: International joint conference CISIS’12-ICEUTE’12-SOCO’12 special
sessions, pp 289–298. Springer
51. Moonsamy V, Rong J, Liu S (2014) Mining permission patterns for contrasting clean and malicious
android applications. Futur Gener Comput Syst 36:122–132
52. Backes M, Gerling S, Hammer C, Maffei M, Styp-Rekowsky P (2013) Appguard–enforcing user
requirements on android apps. In: Tools and algorithms for the construction and analysis of systems: 19th
international conference, TACAS 2013, held as part of the european joint conferences on theory and
practice of software, ETAPS 2013, Rome, Italy, March 16-24, 2013. Proceedings 19, pp 543–548. Springer
53. Wu D-J, Mao C-H, Wei T-E, Lee H-M, Wu K-P (2012) Droidmat: android malware detection through
manifest and api calls tracing. In: 2012 seventh asia joint conference on information security, pp 62–69.
IEEE
54. Kato H, Sasaki T, Sasase I (2021) Android malware detection based on composition ratio of permission
pairs. IEEE Access 9:130006–130019
55. Arora A, Peddoju SK, Conti M (2019) Permpair: android malware detection using permission pairs. IEEE
Trans Inf Forensics Secur 15:1968–1982
56. Saleem MS, Mišić J, Mišić VB (2020) Examining permission patterns in android apps using kernel density
estimation. In: 2020 international conference on computing, networking and communications (ICNC),
pp 719–724. IEEE
57. Zhu H-j, Gu W, Wang L-m, Xu Z-c, Sheng VS (2023) Android malware detection based on multi-head
squeeze-and-excitation residual network. Expert Syst Appl 212:118705
58. Rathore H, Nandanwar A, Sahay SK, Sewak M (2023) Adversarial superiority in android malware
detection: lessons from reinforcement learning based evasion attacks and defenses. Forensic Sci Int: Digital
Investigation 44:301511
59. Keyvanpour MR, Barani Shirzad M, Heydarian F (2023) Android malware detection applying feature
selection techniques and machine learning. Multimed Tools Appl 82(6):9517–9531
60. Ravi V, Chaganti R (2023) Efficientnet deep learning meta-classifier approach for image-based android
malware detection. Multimed Tools Appl 82(16):24891–24917
61. Kaithal PK, Sharma V (2023) A novel efficient optimized machine learning approach to detect malware
activities in android applications. Multimed Tools Appl 1–18
62. Lee S-A, Yoon A-R, Lee J-W, Lee K (2022) An android malware detection system using a knowledge-
based permission counting method. JOIV: Int J Inform Vis 6(1):138–144
63. Wu Y, Li M, Zeng Q, Yang T, Wang J, Fang Z, Cheng L (2023) Droidrl: feature selection for android
malware detection with reinforcement learning. Computers & Security 128:103126
64. İbrahim M, Issa B, Jasser MB (2022) A method for automatic android malware detection based on static
analysis and deep learning. IEEE Access 10:117334–117352
65. Kabakus AT (2022) Droidmalwaredetector: a novel android malware detection framework based on
convolutional neural network. Expert Syst Appl 206:117833
66. Wang H, Zhang W, He H (2022) You are what the permissions told me! android malware detection based
on hybrid tactics. J Inform Sec Appl 66:103159
67. Yuan W, Jiang Y, Li H, Cai M (2019) A lightweight on-device detection method for android malware.
IEEE Trans Sys Man Cybernetics: Syst 51(9):5600–5611
68. Python W (2021) Python. Python releases for windows 24
69. Allix K, Bissyandé TF, Klein J, Le Traon Y (2016) Androzoo: collecting millions of android apps for the
research community. In: Proceedings of the 13th international conference on mining software repositories,
pp 468–471
70. Franke TM, Ho T, Christie CA (2012) The chi-square test: often used and more often misinterpreted. Am
J Eval 33(3):448–458
71. Witten IH, Frank E (2002) Data mining: practical machine learning tools and techniques with java
implementations. ACM SIGMOD Rec 31(1):76–77
72. Fushiki T (2011) Estimation of prediction error by using k-fold cross-validation. Stat Comput 21:137–146
Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under
a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted
manuscript version of this article is solely governed by the terms of such publishing agreement and applicable
law.