
A_Survey_on_Android_Malware_Detection_Techniques_Using_Supervised_Machine_Learning

This paper provides a comprehensive review of machine learning techniques for detecting Android malware, highlighting the security weaknesses of the Android operating system compared to iOS. It discusses various detection methods, including static, dynamic, and hybrid approaches, while identifying research gaps and proposing future directions for improvement. The findings emphasize the importance of developing effective malware detection tools to protect user privacy and data security on Android devices.


Received 24 September 2024, accepted 15 October 2024, date of publication 24 October 2024, date of current version 27 November 2024.


Digital Object Identifier 10.1109/ACCESS.2024.3485706

A Survey on Android Malware Detection Techniques Using Supervised Machine Learning
SAFA J. ALTAHA, AHMED ALJUGHAIMAN, AND SONIA GUL
Department of Computer Networks and Communications, College of Computer Sciences and Information Technology, King Faisal University, Al-Ahsa 31982,
Saudi Arabia
Corresponding author: Safa J. Altaha ([email protected])
This work was supported by the Deanship of Scientific Research, Vice Presidency for Graduate Studies and Scientific Research, King
Faisal University, Saudi Arabia, under Grant KFU242126.

ABSTRACT Android’s open-source nature has contributed to the platform’s rapid growth and its widespread
adoption. However, this widespread adoption of the Android operating system (OS) has also attracted the
attention of malicious actors who develop malware targeting these devices. Android malware threatens users’
privacy, data security, and overall device performance. Machine learning (ML) plays a significant role in
malware analysis and detection because it can process huge amounts of data, identify complex patterns, and
adjust to changing threats. The purpose of this paper is to provide a comprehensive review of the existing
research on ML-based techniques used to detect and analyze Android malware. In this paper, the security
weaknesses in Android OS are explored and the reasons why these weaknesses do not exist in the iPhone
operating system (iOS) are discussed. Further, the authors examine the existing studies that have been
proposed by researchers and outline their strengths and limitations. The findings reveal that the existing
studies use different ML models, features, and detection techniques, including static, dynamic, and
hybrid approaches. Moreover, directions for future research and potential areas that require more attention
and improvement in this field are highlighted.

INDEX TERMS Android, Android malware, malware detection, supervised machine learning.

The associate editor coordinating the review of this manuscript and approving it for publication was S. M. Abdur Razzak.
2024 The Authors. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
VOLUME 12, 2024
S. J. Altaha et al.: Survey on Android Malware Detection Techniques Using Supervised ML

I. INTRODUCTION
In today’s interconnected world, mobile devices such as smartphones and tablets have become an integral part of our daily lives. They have changed the way people communicate with others, access information, entertain themselves, and perform daily activities. Android, being one of the most popular mobile operating systems (OSs), offers a wide range of features and applications that enhance productivity and entertainment. As the popularity and ubiquity of Android devices continue to rise, so does the risk of malicious software that targets these platforms. Android malware threatens users’ privacy, data security, and overall device performance. With the increasing prevalence of Android smartphones that store sensitive information like personal data, banking details, and passwords, these devices have become an appealing target for cybercriminals [1]. From 2009 to 2020, Android OS has been identified as the most widely used OS on mobile devices, accounting for 72.95% of total usage worldwide. The iPhone operating system (iOS) comes in second place with a usage percentage of 26.27%. Other OSs, such as Windows Mobile, constitute the remaining 0.78% [2]. Figure 1 illustrates the distribution of usage across mobile device OSs. A substantial amount of malware targets the Android platform since it is the most widely used OS [3]. Therefore, the effective detection and mitigation of Android malware have become essential to ensure a safe and secure user experience.

FIGURE 1. Percentage of usage of various mobile operating systems.

Malware can be defined as any software that intentionally executes malicious payloads on victims’ machines (computers, smartphones, computer networks, etc.) [4]. Android malware can take various forms, such as viruses, worms, spyware, ransomware, and Trojans. Once a device is infected, this can lead to data theft, unauthorized access, financial loss, privacy breaches, and device malfunction. Malware exhibits several key characteristics, including the ability to replicate, propagate, self-execute, and corrupt computer systems. The corruption of a computer system can have detrimental effects on the integrity, confidentiality, and availability of information. Replication is a critical feature found in many types of malware, as it ensures its persistence and spread. In certain malware instances, excessive replication can lead to the depletion of computer resources, such as hard disk space and Random Access Memory (RAM) [5].
Obfuscation is a technique employed by malware programmers to make their malicious code difficult to analyze and comprehend. The primary objective of obfuscation is to conceal the malicious behavior of the malware. By employing obfuscation techniques, malware authors aim to obstruct the efforts of security researchers, analysts, and antivirus software in detecting and understanding the true nature and intent of the malware [6]. Other types of malware can be easily detected and eliminated using antivirus software. These software solutions maintain a database of virus signatures, which are unique binary patterns associated with malicious code. When files are suspected of being infected, they are scanned for the presence of any virus signatures. This detection method was effective until malware authors began creating polymorphic and metamorphic malware variants. Polymorphic malware is designed to change its code with each infection or execution, thereby making it difficult for signature-based detection methods to identify it. Metamorphic malware goes a step further by not merely changing its appearance but also modifying its underlying code structure with each iteration [7]. These types of malware employ encryption techniques to evade signature-based detection, thus making it more challenging for antivirus software to identify and remove them [8]. Other detection techniques, such as heuristic and behavior-based methods, have become more prominent as they focus on identifying suspicious behaviors and patterns rather than relying on known signatures. Heuristic-based detection uses a set of rules and patterns to identify potential malware, even if the specific signature is not yet known. This method analyzes the characteristics and actions of a program to determine the presence of malicious behavior. In contrast, behavior-based detection focuses on monitoring the runtime behavior of a program to detect anomalies or suspicious activities. By tracking and analyzing the program’s interactions with the system, network, and other resources, behavior-based detection can identify malware that signature-based detection may fail to detect.
Machine Learning (ML) algorithms can significantly help in classifying Android applications as malware or benign. By using features and patterns within the applications, ML algorithms can learn from labeled datasets and make accurate predictions on unseen application samples [9]. In this paper, the authors describe Android malware detection techniques and present a systematic literature review (SLR) of existing related works. Moreover, this paper proposes future research directions and works that need to be explored by the researchers. This research paper attempts to address the following questions:
1) What vulnerabilities in Android OS expose it to malicious activities?
2) What are the security characteristics and weaknesses of Google Play and App Store?
3) Which approaches are most frequently utilized by researchers to detect malware in Android OS?
4) Which static features are mainly used by researchers to detect malware in Android OS?
5) Which ML model is the most effective for detecting malware in Android OS?

A. RESEARCH MOTIVATION
There is a pressing requirement for reliable, scalable, and robust malware detection tools for Android devices, as malware can compromise user privacy, steal sensitive information stored on the device such as passwords and bank data, and cause financial losses through activities such as identity theft or fraudulent transactions. By detecting malware, users can maintain the security and integrity of their personal information and mitigate potential harm.
The main motivation of this research study is to help developers build effective malware analysis systems for Android OS based on the identified research gaps and shortcomings from recent works to efficiently detect malicious applications, prevent cyber threats, protect users’ data, and enhance overall cybersecurity. In addition, the aim is to identify vulnerabilities and other security issues in the Android platform and distinguish between the Google Play Store and iOS App Store in the context of utilized approaches for application review and security.

B. RESEARCH CONTRIBUTIONS AND RELATED REVIEWS
This paper aims to present an SLR of existing works related to ML-based Android malware detection. The goal is to provide a clear and comprehensive view of the state of Android malware detection using supervised ML models in recent years. This paper first highlights the inherent security weaknesses of the Android OS, which make it more vulnerable to malware threats compared to other platforms, such as iOS. Then, the security practices and policies applied by the Play Store are discussed and compared with the security measures taken by the App Store for iOS. The main contributions of this paper are summarized below:
• Provide a guide for researchers to identify the vulnerabilities that may lead to system compromise in Android OS and to distinguish between the application security and review processes of the Google Play Store and the iOS App Store.
• Perform an SLR on ML-based techniques for Android malware detection and categorize the reviewed papers based on the analysis approaches employed.
• Identify the static features that make malware analysis in Android OS more effective.
• Identify the most effective ML model to detect malware in Android OS.
• Highlight major findings and research gaps, and provide future research directions in Android malware detection using ML models.
Several previous surveys and literature reviews have focused on Android malware detection research. For example, Pan et al. [10] provided a comprehensive SLR with a focus on static analysis approaches for Android malware detection. Their paper identified ML models and statistical models as methods to be used for the detection of Android malware. However, ML-based detection was not their main focus, and they reviewed only methods based on static features. Chowdhury et al. [11] reviewed a small number of studies published from 2015 to 2023. They only summarized the reviewed works and did not report the model accuracy or the ML models of those studies. Wang et al. [12] provided a good systematic review, but their main focus was only on deep learning models. Kouliaridis and Kambourakis [13] presented a well-organized SLR paper encompassing the three analysis approaches with a bias toward approaches that used static analysis. Senanayake et al. [14] conducted an extensive survey, covering over 100 studies from 2016 to 2021. However, it lacks sufficient background information.
To address the limitations of the related studies, the authors present a survey of past studies that have utilized the three main analysis approaches: static, dynamic, and hybrid. Details regarding the methodologies, ML models, and results of each of the reviewed papers are provided. Future research directions to enhance this field are also highlighted. In addition, a comprehensive background related to this field is presented.
The remainder of this paper is organized in the following manner: Section II presents the background related to Android OS, malware analysis approaches, and techniques. Section III describes the security model of Android OS and the difference between Android and iOS in terms of security. Section IV provides the search strategy. Section V discusses and compares the related studies. Section VI discusses and highlights the major findings. Section VII presents directions for future research. Section VIII discusses the threats to the validity of this study. Finally, Section IX concludes this work.

FIGURE 2. The versions of Android operating system.

II. BACKGROUND
A. EVOLUTION OF ANDROID OPERATING SYSTEM
The journey of Android has been marked by a series of key milestones, each of which influenced the overall platform’s development. In this section, the authors present the evolution of the Android OS, tracing its roots from the early days of Android Inc. up to 2022. Examining the major Android releases and the introduction of new features and capabilities gives a better understanding of the factors that have contributed to Android’s rise. Android Inc. developed Android OS in 2003. Subsequently, Google purchased it in 2005; in November 2007, the first release was launched [15]. Figure 2 presents the key milestones of Android OS. The following points discuss the main features of each milestone [16]:
• In September 2008, Android officially released Version 1.0. It included limited features, such as web browsing and connection to an online email server.
• On April 27, 2009, Android 1.5 (Cupcake) was released. It included the ability to record and play back videos, copy and paste, post a movie to YouTube, and view usage history.
• On September 15, 2009, the Donut version was released. It had a few new features, such as voice and text entry search, better camera access, and text-to-speech engine support.
• On October 26, 2009, a new version named Eclair was released. This included Microsoft Exchange email support, Bluetooth 2.1, and SMS and MMS services.


• On May 20, 2010, the Froyo version was released with a few improvements related to speed, memory, and performance optimization and support for the Android Cloud to Device Messaging service.
• On December 6, 2010, the Gingerbread version was released with a better user experience and user-friendly interface, as well as support for Near Field Communication (NFC).
• The Honeycomb version was released on February 22, 2011, for the first Android tablet based on the Linux kernel 2.6.36.
• On October 19, 2011, the Ice Cream Sandwich version was released with multiple features, such as graphics improvements, spell-checking, and improved camera performance.
• On June 27, 2012, multiple improvements were made, including 4K resolution support, a smoother user interface, and multiple user accounts.
• On September 3, 2013, numerous new features were included, such as wireless printing abilities, sensor batching, screen recording capability, and improved application compatibility.
• On November 12, 2014, big enhancements were made, including a redesigned user interface, improved battery life, and support for multiple SIM cards.
• On May 28, 2015, a new version was released with a few improvements, such as fingerprint reader support, runtime permission requests, and USB-C connectivity.
• On March 9, 2016, a major version called Nougat was released. It included encryption abilities, support manager APIs, screen zoom, multi-window support, battery usage alerts, and more.
• On August 21, 2017, a new version was released that included many features related to API, themes, and notifications.
• On August 6, 2018, the Pie version was officially released. In this version, simple features were added, such as a screenshot button and the battery percentage, and the clock was shifted to the left of the notification bar.
• Android 10 was the first version to be released with a numerical name. On September 3, 2019, Android 10 was launched. It could access location in the background, perform authentication using fingerprint, and included WPA3 Wi-Fi security.
• On September 8, 2020, a new Android version was released and included simple features related to privacy and security.
• On October 19, 2021, Android 12 was launched. It included features related to user interface and system services such as WindowManager and system server.
• On February 10, 2022, Android 13 was released. Many functions were enhanced, including the ability to choose which user can access which applications. To safeguard privacy, none of the applications included any sensitive information.

B. ANDROID ARCHITECTURE
Android OS is based on the Linux kernel and is developed by Google. It is primarily designed for smartphones, tablets, and smartwatches. Figure 3 depicts the architecture of the Android OS, which mainly consists of four layers based on the needs of the device: Linux Kernel, Native Library, Application Framework, and Applications. All layers work with each other to ensure the system runs securely and effectively. Applications on Android devices, which are usually written in the Java programming language, are executed within Dalvik Virtual Machines (DVM). While building Android applications, a substantial amount of the code is copied and pasted, leading to spaghetti code, which implies that this code is complex and difficult to understand and maintain [17].

FIGURE 3. Architecture of Android operating system.

The responsibilities of each layer are listed below:
• The Linux kernel layer is the basic layer of the architecture that is responsible for power, device, and memory management. The developers do not interact directly with this layer as it is responsible for the underlying infrastructure.
• The native library layer contains libraries written in C or C++. There are different libraries that provide support in building user interface application frameworks, drawing graphics, and accessing databases. Android Runtime includes an environment that enables the applications to be effectively executed. Moreover, DVM includes a runtime feature that is used during installation to compile the system codes [18].
• At the heart of the application framework layer are application programming interface (API) libraries, like user interface, telephony, resources, locations, content providers, and other APIs that are necessary for building the system and developing daily-use applications, such as the calendar and phone call [19]. This layer contains the most important components that are used to build the applications:
1) View system for building the graphic components of the application.
2) Activity manager responsible for user interaction with applications.
3) Location manager to specify the user’s location using GPS systems.
4) Telephony manager responsible for providing calling ability.
5) Resource manager responsible for accessing layout files and graphics.
6) Content provider for sharing data between applications in the devices.
7) Notification manager notifying the user of application events [17].
• Applications are located at the top layer in the Android OS. The users interact directly with this layer. It comprises third-party applications, such as home, contact, short message service (SMS), games, and web browsers. Most Android applications that are located in this layer are written in Java or Kotlin programming languages [2], [20].

C. ANDROID MALWARE HISTORY
The open nature of the Android platform is a significant factor in the progression of Android malware. The open-source model allows for OS modifications by manufacturers, thereby making the source code vulnerable to attackers. Over the years, the complexity and sophistication of Android malware have increased, with cybercriminals developing increasingly stealthy and harmful variants to target Android users. Table 1 presents a summary of Android malware evolution from 2010 to 2020.
In 2010, AndroidOS.DroidSMS.A emerged as the first Android mobile malware. It took advantage of the newly added SMS services and sent an SMS and charged the victim without his/her knowledge. Tap Snake is a spyware that was discovered in the same year. This malware can send the device’s location data along with recorded phone conversations to a remote malicious server. In 2011, DroidDream was discovered with the ability to root the victims’ devices to steal users’ sensitive information [21]. Boker is another type of Trojan that propagates via SMS; it automatically installs once the user receives an SMS. In 2012, Opfake and Fakeinst were the most prominent Android Trojans discovered by Kaspersky. These types of malware use more sophisticated techniques, such as drive-by downloads and updated attacks, thus making them more difficult to detect and affecting the victims’ devices [21], [22]. FakeDefender is a type of ransomware that was found in 2013. It displays fake information to prompt users to buy fake security applications [23].
Thereafter, Obad malware was discovered. It enables attackers to take the role of device administrator in the background by exploiting zero-day vulnerabilities. NotCompatible.CA is a Trojan that recognizes emulated environments and sandboxes, which allows it to bypass security measures. Acecard and 888.apk are banking Trojans that were discovered in 2015. They steal victims’ data related to their bank accounts and sniff banking transaction packets during SMS alerts and transaction commands [21]. HummingBad targeted Android, as well as iOS. It is a rootkit that steals sensitive information by installing malicious applications in the background [24]. Two ransomware families, AndroidOS.Fusob and Xbot, were discovered in 2016. They steal banking account information and obtain remote access to the device, demanding ransom. In 2017, the Ztorg rooting malware appeared, which obtains superuser privileges. In 2018, a new backdoor called Chamois was discovered. It stole the OAuth token. TimpDoor emerged in 2019 with the ability to propagate rapidly; it infects Android victims via SMS. Cerberus intercepts calls, whereas XHelper redirects the user to fake websites and attempts to steal their credentials [21], [23].

TABLE 1. Evaluation of android malware.

D. MALWARE DETECTION TECHNIQUES
A malware detection program is a computational function that operates within a specific domain that comprises a collection of application programs and a collection of malicious and benign programs. Its purpose is to analyze the programs within the program set and determine whether each program is benign (normal) or malware (malicious) [5]. By examining the characteristics, behavior, or code of each program, the detection program aims to accurately classify them and identify any potential malware threats within the domain. The malware detection process typically involves three stages: malware analysis, feature extraction, and classification. Malware analysis is an important process that aims to understand the content and behavior of malicious software. It involves examining and studying the malware to ascertain its functionality, purpose, and potential impact.
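The view of a detection program as a function that maps each program to a benign or malware label through three stages can be illustrated with a minimal sketch. This is not an implementation from any surveyed study: the App record, the FEATURES list, and the threshold rule standing in for a trained ML classifier are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class App:
    """Hypothetical stand-in for an application sample."""
    name: str
    permissions: set = field(default_factory=set)
    api_calls: set = field(default_factory=set)


# Hypothetical indicators, chosen only for illustration.
FEATURES = ["SEND_SMS", "READ_CONTACTS", "getDeviceId", "sendTextMessage"]


def analyze(app):
    """Stage 1, malware analysis: collect observable traits of the sample."""
    return app.permissions | app.api_calls


def extract_features(traits):
    """Stage 2, feature extraction: encode traits as a fixed-length binary vector."""
    return [1 if feature in traits else 0 for feature in FEATURES]


def classify(vector, threshold=2):
    """Stage 3, classification: a threshold rule standing in for a trained ML model."""
    return "malware" if sum(vector) >= threshold else "benign"


def detect(app):
    """The detection function: program in, label out."""
    return classify(extract_features(analyze(app)))


samples = [
    App("flashlight", {"CAMERA"}, {"setTorchMode"}),
    App("sms_sender", {"SEND_SMS", "READ_CONTACTS"}, {"sendTextMessage"}),
]
labels = {app.name: detect(app) for app in samples}
print(labels)
```

In a supervised ML setting, the threshold rule in classify would be replaced by a model fitted on labeled feature vectors, while the first two stages would stay structurally the same.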


Once the malware has been analyzed, relevant features are the program as malware or benign. Even though
extracted to capture its distinguishing characteristics. These the program’s code is being modified, the program’s
features can be derived from various aspects of the malware, behavior is mostly the same. Hence, most of unknown
such as its code, behavior, network communication, or file malware is detected using this method [4]. However, it is
structure. In the classification stage, ML or other classi- worth mentioning that certain malware binaries may not
fication algorithms are applied to categorize the malware function correctly within a protected environment due to
into different classes, such as benign or malicious. This anti-analysis techniques [27]. Thus, there is a possibility
classification is based on the extracted features and patterns of misclassifying malware samples as benign in such
observed during the malware analysis stage [4]. cases.
While behavior-based and anomaly-based detection
share a common focus on system behavior, the key
difference is the reference point. Behavior-based detec-
tion looks for patterns that are characteristic of known
malware, while anomaly-based detection identifies devi-
ations from normal, expected behavior [28].
FIGURE 4. Malware detection techniques. • Heuristic-based detection uses rules, algorithms, or pat-
terns to identify potential malware based on general
Based on Naseer et al. [25], malware detection techniques characteristics or behaviors exhibited by malicious
can be categorized into different types, as depicted in software. For example, if a user or program attempts
Figure 4. Below are the definitions of each type: to remove files that are needed by the OS, it could be
• Signature-based detection, also known as pattern match- malicious [29].
ing, involves comparing the characteristics or signatures This technique makes educated guesses regarding the
of known malware with the files or processes being presence of malware when specific signatures are not
analyzed. In signature-based detection, scanner software available. Heuristic-based techniques primarily leverage
examines the information of a file and checks it against ML and data mining strategies in the context of malware
a database of virus signatures. These virus signatures detection [25].
are essentially specific patterns or sequences of code • Statistical-based detection relies on the application
that are characteristic of known malware. If a file’s of statistical and mathematical techniques to analyze
signature matches one in the database, it is flagged as system activities, such as network connections, memory
malicious [5]. usage, system calls [5]. In addition, it considers sta-
An example of signature-based detection can be buffer tistical metrics, such as the median, mean, mode, and
overflows. Because buffer overflows include shellcodes, standard deviation. For example, it is expected to initiate
the strategy is to maintain a database of known malicious a TCP connection with one or a few TCP SYN packets.
shellcode patterns and alert if a shellcode is found in any If a host sends a hundred TCP SYN packets in a short
request [26]. period, it may be considered a malicious activity.
• Anomaly-based systems in malware detection are Statistical-based detection analyzes large volumes of
designed to identify any type of computer misuse data—such as network traffic, system logs, and user
that deviates from the normal activities of a computer behavior—to identify patterns and anomalies. In con-
system. In contrast, signature-based systems detect trast, heuristic-based detection relies on a set of
malware based on specific patterns or fingerprints predefined rules or patterns [28], [30].
stored in their databases. Anomaly-based systems detect Table 2 presents the advantages and disadvantages of
malicious software by monitoring system activities and these approaches. The detection of malware continues to
classifying them as either normal or abnormal based on introduce challenges due to its inherent complexity both in
predefined criteria. This approach enables the detection theory and practice. Malware creators apply sophisticated
of novel or unknown threats that do not have specific techniques, including obfuscation, to make the detection
signatures, thereby making it a valuable technique in process highly challenging. Consequently, the ability to
identifying new and emerging malware [5]. effectively detect malware remains a significant problem in
For example, a credit card company uses anomaly-based the field of cybersecurity [4], [6].
detection to monitor credit card usage patterns by
customers. When a customer makes an unusually large E. APPROACHES TO MALWARE ANALYSIS
purchase or a purchase in a new location, the algorithm Static, dynamic, and hybrid analysis are the most commonly
identifies this anomaly and sends an alert to contact the used techniques in Android malware detection. Static analy-
customer. sis involves examining the code and resources of an Android
• Behavior-based detection examines the behavior of the application without directly executing the program. It is
program using certain monitoring tools and classifies applied by analyzing the AndroidManifest.xml file, smali

VOLUME 12, 2024 173173


S. J. Altaha et al.: Survey on Android Malware Detection Techniques Using Supervised ML

TABLE 2. Comparison of malware detection techniques.

files (which contain the application's bytecode), and a set of static features, such as permissions, API calls, and Dalvik opcodes (the low-level bytecode instructions executed by Android's virtual machine, Dalvik). These artifacts can be obtained by decompiling the APK files. This approach offers several advantages, including shorter analysis time and lower computational requirements compared to dynamic analysis methods. By using static features, it becomes possible to quickly assess the application's security and detect known malware signatures or common patterns. However, it may be less effective in detecting sophisticated malware that relies on dynamic behavior [31], [32].
Further, dynamic analysis involves executing the applications and observing their behavior in real time, either directly on a real device or within a sandbox environment. This method focuses on monitoring the behavior of the application at run-time. By running the application, analysts can observe its interactions with the device, network, and user data. They capture logs and analyze the network traffic generated by the application. While dynamic analysis offers a more comprehensive understanding of an application's behavior, it can be more time-consuming and computationally demanding than static analysis. Nevertheless, it is highly effective in detecting sophisticated malware that employs evasion techniques or exhibits malicious behavior only at run-time [31], [33].
In hybrid analysis, the malware is first examined using static analysis techniques and then using a dynamic analysis approach. This two-step process involves analyzing the malware's code and structure without execution (static analysis) and then executing the malware and observing its behavior in a controlled environment (dynamic analysis). By combining both static and dynamic analysis, a more comprehensive analysis of the malware is achieved [34], [35]. Table 3 presents the differences between static, dynamic, and hybrid malware analysis in terms of their advantages and disadvantages.

TABLE 3. Comparison between Android malware analysis approaches.

III. EXPLORING THE SECURITY WEAKNESSES OF ANDROID
The Android security model relies heavily on permission-based mechanisms to control access to sensitive resources and user data. Before installing an app, users are presented with a list of permissions that the application requires; they can choose to grant or deny those permissions. The developers have to declare all the required permissions for the application in the AndroidManifest.xml file to be granted by the users. This mechanism is intended to ensure that applications have access only to the resources they genuinely need. The requested permissions help users prevent applications from misusing resources, but users often have too little knowledge to determine whether granting a permission is harmful [36]. For example, requesting network access, including Wi-Fi, and sending short messages is common for many legitimate applications, but certain malware may exploit these permissions to perform malicious activities, such as stealing bandwidth or sensitive information. Therefore, determining whether an application is malicious solely based on its permission requests can be challenging for users.
In Android, permissions are categorized into two main levels: normal permissions and dangerous permissions [37]. Normal permissions are low-risk permissions that grant access to resources or capabilities that are not sensitive by nature. Examples of normal permissions include accessing the network state, the device's vibration feature, or the device's battery statistics. Dangerous permissions are high-risk permissions that involve accessing sensitive user data or performing potentially harmful actions. When an application requests dangerous
permission, the user is explicitly prompted to grant or deny the permission during the installation process or at run-time. Examples of dangerous permissions include accessing the user's contacts, reading short messages, and accessing the device's camera [38], [39]. These permission groups are organized based on a device's capabilities or features. Tables 4 and 5 list normal and dangerous permission groups along with their associated permissions [37]. The normal permissions table lists permissions that are generally considered low-risk, such as setting alarms, changing wallpapers, and managing Bluetooth connections. These permissions are automatically granted to applications during installation without the user's explicit consent. The dangerous permissions table includes more sensitive permissions, such as accessing the microphone, contacts, calendar, and SMS/MMS messages. These permissions have a higher potential for privacy and security risks, and applications must explicitly request them from the user, who can choose either to grant or deny them. Understanding these permission groups is important for both developers and users to ensure that applications use only the necessary permissions and that users are aware of the potential impact of granting specific permissions to an application.

TABLE 4. List of some normal Android permissions.

TABLE 5. List of some dangerous Android permissions.

There are security aspects associated with Android OS that make it more vulnerable compared to iOS. Android's open-source nature enables developers to modify and enhance the code, thereby fostering a vibrant ecosystem of third-party applications and innovations [40]. This openness has contributed to the platform's rapid growth and widespread adoption, making it a dominant force in the mobile market. Android users download over 1.5 billion applications and games from Google Play each month [40]. However, this widespread adoption of the Android OS has also attracted the attention of malicious actors who develop and distribute malware that specifically targets these devices. Below are weaknesses found in Android OS and a discussion of why these weaknesses do not exist in iOS:
• For Android OS, an extra antivirus program needs to be installed to avoid malware effects on the mobile device, particularly for applications downloaded from sources other than Google Play [41]. Using external sources increases the risk of malicious applications that convert developed software into viruses. On the other hand, iOS users do not need to install any antivirus solutions because the only place to get applications is the App Store and everything on the App Store is carefully checked to ensure it does not contain any malicious code [41].
• For accepting and releasing an application on the App Store, iOS developers are required to register with Apple. Apple has a licensing agreement in place, which involves testing each application submitted by third-party developers for any privacy or security violations. If the application complies with the licensing agreement and does not violate any privacy or security guidelines, it is accepted, digitally signed, and made available for download on the App Store. Similar to Apple, Google Play also requires applications to be digitally signed. However, the process of digitally signing an application for Android is different. In contrast to Apple, developers do not have to register with Google Play or obtain signing certificates issued by Google. Android developers can create as many signing certificates as they need for their applications without being monitored by Google. To distribute their applications through Google Play, developers are required to pay $25 using a credit card. This fee serves as a verification process and helps ensure a certain level of accountability for application submissions. However, there is a possibility that attackers use stolen credit cards to pay and distribute malicious applications [42], [43].
• For storing application data, devices can use either built-in or external memory. For Android, users can use both. This raises a few security issues and makes it difficult for the applications to access the data, which leads to slow processing. From a security perspective, having both built-in and external memory can increase the attack surface for potential threats. Data stored in external memory is more vulnerable to viruses and
unauthorized access compared to built-in memory. This can cause risks if sensitive information is stored on the external storage [44], [45].
• Android applications can be downloaded from Google Play and other unknown sources. In contrast, iOS applications can only be downloaded from the App Store, which forces the applications to undergo various security checks. For Android, the ability to download applications from sources other than the official Google Play Store leads to security risks, as applications downloaded from unknown sources may contain malware, spyware, or other forms of malicious software that can compromise user data [42], [45].
• In Android, the permission system is designed to provide users with control over the access that applications have to the resources. In iOS, users are generally not directly involved in granting or denying individual permissions to applications during the installation process [42], [46].
Table 6 summarizes the differences between Android OS and iOS from a security perspective.

TABLE 6. Comparison of security between Android and iOS.

IV. SEARCH STRATEGY
In this section, the authors describe the methodology used to guide this study, such as the search string, data sources, and inclusion criteria for the relevant papers.

A. DATA SOURCES
Various research data sources were used to explore the relevant research papers, as listed in Table 7. Google Scholar and IEEE Xplore offer a huge amount of literature, including articles, theses, books, conference papers, and preprints from various disciplines and sources. IEEE Xplore is a leading digital library specifically focused on engineering, computer science, and technology-related research. ResearchGate offers a Q&A feature and discussion forums on which researchers can ask questions, seek advice, and engage in topic-specific discussions. However, it only displays a maximum of 100 results per search.

TABLE 7. Data sources.

B. SEARCH STRING
These sources are used to search for existing literature, using keywords and certain Boolean operators such as "Android malware analysis," ("Android malware detection" AND "Machine learning") and ("Malware in Android" AND "supervised learning"). Each platform offers valuable resources and provides a variety of logical tools that facilitate the search process.

C. INCLUSION CRITERIA AND SCREENING PROCESS
This review is guided by the PRISMA flow diagram. PRISMA is an abbreviation for Preferred Reporting Items for Systematic Reviews and Meta-Analyses; the diagram used here is the one for new systematic reviews that focus on searches of databases and registers.

FIGURE 5. Selection of papers by PRISMA.

The inclusion criteria were that publications needed to have a direct focus on Android malware and ML classifiers and related techniques. In addition to relevance, a few more criteria were considered for inclusion, such as being published from 2017 to 2024 and being written in English. The selection process of papers by PRISMA is illustrated in Figure 5. The identification process involved searching multiple data sources, including Google Scholar, ResearchGate, and IEEE Xplore.
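The inclusion criteria described in Section IV-C (topic relevance, publication between 2017 and 2024, and English language) can be read as a simple filter over search results. The following is a minimal sketch under stated assumptions, not the authors' screening procedure: the `Record` fields, the `on_topic` flag (standing in for a reviewer's relevance judgment), and the three sample titles are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One search result; every field name here is hypothetical."""
    title: str
    year: int
    language: str
    on_topic: bool  # reviewer's judgment: focuses on Android malware and ML classifiers


def meets_inclusion_criteria(record: Record) -> bool:
    """Apply the three criteria named in the text: topic, year range, language."""
    return (
        record.on_topic
        and 2017 <= record.year <= 2024
        and record.language == "English"
    )


# Made-up sample records, used only to exercise the filter.
records = [
    Record("Permission-based Android malware detection with RF", 2021, "English", True),
    Record("Early Android malware classifier", 2015, "English", True),
    Record("ML for spam filtering", 2022, "English", False),
]
included = [r.title for r in records if meets_inclusion_criteria(r)]
print(included)
```

In a real review the relevance flag would come from manual screening of titles, abstracts, and full texts, so this filter only captures the mechanical part of the criteria.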

A total of 18,650 records were identified from Google Scholar, 100 from ResearchGate, and 926 from IEEE Xplore. Before the screening stage, 15,612 duplicate records were omitted, along with 3,612 records omitted for other reasons, such as very long text, paid or access-restricted papers, and studies not matching the predetermined publication types, such as book chapters. After screening, 410 records remained, of which 213 were excluded. Of the remaining 197 reports sought for retrieval, 65 were not retrieved. Papers not relevant to Android malware and ML, as well as documents written in languages other than English, were removed; moreover, a few papers were excluded because the full text was not available. Finally, 40 papers were selected.

V. RELATED LITERATURE
In recent years, extensive research has been conducted on analyzing and detecting mobile malware. Security vendors and researchers commonly employ three methods for extracting features from mobile applications: static analysis, dynamic analysis, and hybrid analysis (a combination of static and dynamic approaches). This section discusses recent studies related to these detection approaches.

A. STATIC ANALYSIS
Bourebaa and Benmohamed [47] presented a model that used a DL algorithm. They incorporated features such as permissions and APIs to develop an automated and efficient method for detecting Android malware. Their approach yielded a detection accuracy of 88.9%. They used fully connected neural networks (NNs) as the ML model, which consist of neurons organized in interconnected layers. The first layer receives the input, which passes through the hidden layers, and the last layer produces the desired result. The authors used 512 neurons in the input layer, 8 neurons in the hidden layer, and 3 output nodes that produced the result.
Dhalaria and Gandotra [48] proposed an effective framework that combines static features and utilizes ML classifiers. Three types of static features were extracted: API calls, permissions, and intents. API calls were extracted from classes.dex, while permissions and intents were extracted from AndroidManifest.xml. The test results indicate that the combination of features outperforms individual features. Additionally, the random forest (RF) and k-nearest neighbors (K-NN) classifiers achieved the highest accuracy.
Lee et al. [31] investigated the impact of using genetic algorithms and information gain on the performance of malware detection systems. Moreover, the authors compared the performance and the time taken to build the model for each feature selection method. They found that the genetic algorithm achieved better accuracy; however, it consumed more time.
Arslan et al. [49] introduced a static-based methodology to distinguish between malicious and benign applications. The core of this approach is assessing the permissions requested by applications in their AndroidManifest.xml files and determining how many of these permissions are used in the source code. Applications that request unnecessary permissions, which are not essential for their functionality, are categorized as risky and a suspicion value is assigned to them. Consequently, applications with suspicion values higher than a pre-determined mean value are classified as malicious applications.
Akbar et al. [50] proposed a malware detection system based on the permissions requested by the applications. To enhance the classification of malware, additional metrics such as permission rate and smali size are collected for both benign and malware training datasets. They applied a genetic algorithm to the malware detection approach to obtain the most optimized features and an efficient approach. The proposed approach comprises three modules: data preparation, decoding the AndroidManifest.xml, and classification; each module contains several steps. They concluded that feature selection reduces the computational complexity.
Gao et al. [51] proposed Android malware detection and family classification based on the graph convolutional network (GCN). First, they extracted the frequency and patterns of API usage within the applications. Then, they performed API embedding. This process represents the APIs as numerical vectors, where each API is assigned a unique vector representation. Thereafter, they measured the distance or similarity between API embeddings. APIs that are used in similar contexts or exhibit similar patterns of co-occurrence will have smaller distances between their embeddings. Next, applications and APIs are mapped into a heterogeneous graph, and "App-API" and "API-API" edges are built. Finally, the GCN model is trained and unlabeled applications are classified.
Sangal and Verma [52] used CICInvesAndMal2019 as a dataset and extracted Android permissions and intents as a feature set for malware detection. The features were selected based on the principal component analysis (PCA) reduction technique. The implementation of the feature reduction technique revealed that it is possible to achieve higher accuracy in detection while minimizing processing overhead. By reducing the number of features used in the detection process, the computational resources required for analysis and classification decreased.
Balcioglu [53] proposed a method for malware analysis and detection. This method involved examining static attributes including manifest permissions, API call signatures, intent filters, command signatures, and binaries. Naive Bayes, K-NN, and multilayer perceptron were the ML models used in this work for training and validation, along with PCA, which is used to reduce the number of attributes required to accurately characterize each app. The conducted experiment revealed that multilayer
perceptron has the highest level of accuracy, but requires a long time for training compared to the other ML models used in this experiment. Moreover, Balcioglu concluded that the two attributes that distinguish malware from benign applications are the features for reading the phone state and accessing the Internet.
Yumlembam et al. [54] proposed malware detection for applications in Android-based IoT devices. For the classification, they used graph NNs, a type of DL that provides graph-based classification. They used permission and intent features for training and testing.
Li et al. [55] presented a static analysis method, based on program genes, to identify malware applications on Android devices. A program gene refers to the smallest unit within a program in which the static components are dynamically expressed. These genes are composed of short sequences of assembly instructions that are designed to perform particular tasks or fulfill specific functions. They enable modularity and reusability in programming. After extracting the program genes, the authors used the information gain method for feature selection. Following this, they employed the Word2Vec technique to express the semantic abstraction of the selected features. Finally, these selected features are used in the training and testing.
Khariwal et al. [56] presented a static technique for detecting Android malware by extracting intents and permissions. Initially, the permissions and intents were ranked using the information gain method. Subsequently, the goal was to identify the optimal combination of permissions and intents that could yield improved accuracy by employing various ML algorithms. The experimental results confirmed that the proposed approach of combining permissions and intents yielded higher detection accuracy compared to using permissions or intents alone.
Smmarwar et al. [57] proposed a framework that provides a secure and sustainable environment for malware detection. They used permissions, intents, and API calls as the static features to classify the Android application. The proposed model performed accurately in malware detection; in contrast, it achieved an accuracy of less than 85% for malware family classification.
Elayan and Mustafa [58] presented a model for Android malware detection using a gated recurrent unit, which is a type of DL. They extracted two static features, permissions and API calls, from applications. First, the authors compared the performance of multiple ML models, such as support vector machine (SVM), decision tree (DT), and RF. They then compared the results of these models to a gated recurrent unit that contains three blocks. They concluded that DL outperforms the traditional ML models.
Kesani [59] aimed to test the performance of different ML models, including K-NN, Gaussian Naive Bayes, SVM, DT, RF, logistic regression, and sequential neural network (SNN). The author utilized only permission features to detect the presence of malware in Android applications. SNN achieved the highest accuracy of 98.82% and the Gaussian Naive Bayes model obtained the lowest accuracy.
Mahindru et al. [60] introduced a feature selection framework that is based on univariate logistic regression and multivariate linear regression. Then, the relevant features are used as input to six ensemble ML techniques, which are gradient descent with momentum (GDM), gradient descent method with adaptive learning rate (GDA), Levenberg-Marquardt (LM), quasi-Newton (NM), gradient descent (GD), and deep neural network (DNN). The authors utilized permissions and API calls for detecting Android malware. They obtained an accuracy of 98.8%.
Odat and Yaseen [61] contributed a new dataset that is based on three datasets: Drebin, Malgenome, and MalDroid2020. To evaluate the proposed model, multiple ML models were used and compared based on their accuracy. Permissions and API calls were extracted from the APK file using Apktool and then fed into the ML models. The results revealed that the RF model outperformed the other ML models.
Aamir et al. [62] proposed a novel deep learning approach for Android malware detection that used CNN as a classification method. The authors converted the APK file information into a two-dimensional image representation because CNN performs better with visual data. Because the dataset is imbalanced, the use of oversampling or undersampling techniques is required, but the authors did not discuss the use of these techniques or how the dataset was balanced.
Alhussen [63] leveraged long short-term memory (LSTM) and NN for detecting malware in Android applications. Because the dataset was imbalanced, the author applied the synthetic minority over-sampling technique to balance the number of benign and malware samples. The results revealed that LSTM outperformed NN.

B. DYNAMIC ANALYSIS
Hashem El Fiky et al. [64] proposed an ML approach for the dynamic analysis of Android malware for detecting and identifying Android malware categories. The major dynamic characteristics that were extracted in this approach are network, memory, battery, logcat, and process. The authors used many ML models and compared their accuracy and effectiveness. They noted that the best classifier is RF, which achieves over 96% accuracy.
Abuthawabeh [1] presented a supervised learning-based model that can enhance the depth and accuracy of the malware detection process. He used three feature selection algorithms: RF, recursive feature elimination (RFE), and LightGBM. Next, three ML algorithms were selected for evaluation: DT, RF, and extra-trees. Finally, a comparison among these algorithms was made. The results revealed that the extra-trees classifier achieved the highest weighted accuracy among the classifiers, at 87.75%, for malware detection.
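To make the recursive feature elimination (RFE) idea mentioned in the discussion of Abuthawabeh [1] more concrete, the sketch below repeatedly fits a model and drops the feature with the smallest absolute weight. It is only an illustration under stated assumptions and not the cited study's implementation: a plain perceptron is swapped in as the estimator to keep the example dependency-free, and the feature names and the four toy samples are hypothetical.

```python
from typing import List, Sequence


def train_perceptron(X: Sequence[Sequence[float]], y: Sequence[int],
                     epochs: int = 20) -> List[float]:
    """Fit a plain perceptron; labels are +1 (malware) and -1 (benign)."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(epochs):
        for row, label in zip(X, y):
            score = bias + sum(w * v for w, v in zip(weights, row))
            if label * score <= 0:  # misclassified or undecided: update
                weights = [w + label * v for w, v in zip(weights, row)]
                bias += label
    return weights


def rfe(X: Sequence[Sequence[float]], y: Sequence[int],
        names: Sequence[str], keep: int) -> List[str]:
    """Refit, drop the feature with the smallest |weight|, repeat until `keep` remain."""
    names = list(names)
    rows = [list(r) for r in X]
    while len(names) > keep:
        weights = train_perceptron(rows, y)
        worst = min(range(len(names)), key=lambda j: abs(weights[j]))
        del names[worst]
        for r in rows:
            del r[worst]
    return names


# Hypothetical toy data: one row per app, one 0/1 column per requested permission.
feature_names = ["SEND_SMS", "INTERNET", "VIBRATE"]
X = [[1, 1, 0], [0, 1, 0], [1, 1, 0], [0, 1, 0]]
y = [1, -1, 1, -1]
selected = rfe(X, y, feature_names, keep=1)
print(selected)
```

In this toy set only the first column separates the two classes, so it is the one that survives. In practice, an established implementation such as the one in scikit-learn, paired with a stronger estimator, would normally be used instead of a hand-written loop.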

Further, Zulkifli et al. [65] presented a detection technique based on network traffic for Android OS. In this technique, a packet capture application, such as tpacketcapture, was run during the execution of the tested sample to capture the network traffic generated by the samples. Thereafter, the behavior of the samples was analyzed. The captured file was sent to Wireshark to be analyzed. Some of the network traffic features that were extracted and analyzed from the running tested application are average packet size, ratio of incoming to outgoing bytes, average number of bytes received per second, and average number of bytes sent per flow.
Mahindru and Sangal [66] proposed a framework that detects malware from Android applications by performing dynamic analysis. Their model is trained using the dynamic behavior of real-world applications that were collected from different promised repositories, and the experiment was performed on over 500,000 Android applications. They explored four types of ML models that are not widely used: farthest first clustering, a nonlinear ensemble decision tree forest approach, multilayer perceptron, and a DL algorithm. They extracted features from the tested applications while running them in an emulator. Each application is represented as an 1844-dimensional Boolean vector, where "1" implies that the application requires the specified feature and "0" implies that the feature is not required.
Thangaveloo et al. [67] provided a dynamic analysis technique for Android malware detection called DATDroid. Five dynamic features, including system calls, system call process errors and time, CPU usage, memory, and network packets, are extracted. A few sub-features are extracted from each main feature; for example, under system calls, there are a few sets of features, such as errors, total calls, total errors, and so on. Numerous tools were used in this project to extract the required features. For example, the Android Debug Bridge shell is used to collect the CPU usage during execution and Tcpdump is used to capture the network packets while the applications are running on the virtual machine. Moreover, Wireshark is used to analyze the captured traffic. The experimental results in this project showed an overall accuracy of 91.7%, with lower false positive rates compared to the benchmarked method.
Wang et al. [68] introduced a lightweight method that combines dynamic analysis and ML and is capable of identifying Android malware. The proposed method can effectively identify malicious network behavior by analyzing network traffic. It focuses on analyzing hypertext transfer protocol (HTTP) requests and transmission control protocol (TCP) flows to determine if an application is malicious. Furthermore, it provides clear indications regarding the specific malware family to which the application belongs. Additionally, the method offers a detailed explanation for its findings.
Bibi et al. [69] proposed a technique that is designed specifically to detect sophisticated Android malware. The authors used a gated recurrent unit, which is a type of DL. They evaluated the performance of their technique using standard performance metrics as well as by comparing it to other similar works. They concluded that the efficient and timely detection of their technique aids in mitigating and ceasing attacks.
Bhati and Kaushal [70] proposed an approach that extracts all the system calls made by the application at run time. Then, all the collected system calls are analyzed in order to classify the application as malware or benign. The approach was implemented using DT and RF. Moreover, they compared these models in terms of accuracy; RF achieved a better accuracy level than DT.
Haidros and Naik [71] presented a framework consisting of three modules. In the first module, they used the Monkey tool to extract the dynamic features from an application. Then, the second module involved selecting only the important features using the Chi-square method to train and detect the applications. The last module, responsible for the detection of the malware, used seven different ML models. They compared the performance of these models and concluded that AdaBoost had the best accuracy level.
Xiong and Zhang [72] argued that the existing approaches with a single ML model have limitations in generalizing among multiple malware behaviors. Hence, they proposed a multimodel approach that combines logistic regression, DT, and K-NN. The authors utilized dynamic features such as TCP packet information. Their experiment revealed that integrated models outperform single models.

C. HYBRID ANALYSIS
García et al. [73] proposed a new tool for the extraction of both static and dynamic features from Android applications for malware detection. This tool performs code inspection in order to retrieve a wide set of characteristics and processes all the information collected. The features utilized are organized into three different categories (pre-static, static, and dynamic). Pre-static features include information that is extracted without code inspection, such as file name and package name. The static features included in this tool are API calls, activities, opcodes, services, and permissions. The final set of features is extracted by monitoring the execution of the application in a controlled environment.
Wen and Yu [74] presented a system for malware detection that extracted features based on both static and dynamic analysis. They used a feature selection approach using PCA and Relief to reduce the dimensionality of the features. Subsequently, a classification model is built using SVM. The experimental results demonstrated that the system offers an effective method for detecting Android malware.
Taher et al. [75] offered a hybrid approach that combines static and dynamic malware analysis to provide a full view
of the threat. In the feature selection phase of their approach, static characteristics, such as command strings, API calls, intents, and permissions, were extracted. Additionally, dynamic analysis of the malware involved extracting elements such as cryptographic activities, dynamic approvals, system calls, and information leakage.
Alzaylaee et al. [76] proposed a system called DL-Droid. It is a DL system that detects malicious Android applications by static and dynamic analysis. In the system, the authors only extracted Android permissions statically before each run and then extracted the API calls that the application invokes dynamically at run time. The system was evaluated using 31,125 samples and achieved higher accuracy than traditional ML classifier-based Android malware detection frameworks. Their study revealed that DL-Droid achieved up to a 97.8% detection rate with dynamic features only and a 99.6% detection rate with a combination of both static and dynamic features.
Asim et al. [77] proposed an approach that is particularly designed for detecting Trojan horse malware using both static and dynamic features. The authors extracted over 20 features related to different categories, such as network usage, CPU usage, and permissions. SVM obtained the best performance among the employed ML models.
Mantoo and Khurana [78] presented a hybrid approach of static, dynamic, and intrinsic features-based malware detection using k-NN and logistic regression ML algorithms. They used 20 features, including the API calls that the system invokes during execution as the dynamic analysis, the size of the application as intrinsic features, and permission-related features as static analysis. The static features were extracted using apktool. The dynamic features are extracted by installing each application in the restricted environment of Genny Motion Studio. Both k-NN and logistic regression showed an accuracy of 97.5%.
Kouliaridis et al. [79] introduced an automated hybrid analysis tool that extracted groups of static and dynamic features to analyze the behavior of an application on the Android platform. A total of six feature categories were investigated, including permissions, intents, API calls, network traffic, inter-app communication, and Java classes.
Among these categories, only permissions, intents, and API calls apply to static analysis. The authors found that the API calls category tended to enhance the performance of
Additionally, the Java classes category also achieved notably

extract package features, permission features, component features, and triggering mechanisms from the software. Dynamic analysis tools are employed to capture the software's dynamic behavioral characteristics and, subsequently, the static and dynamic features are formatted. Finally, the feature eigenvectors are processed using ML algorithms in two stages, which results in the classification of the software as malicious or benign. The accuracy of malware detection and malicious family classification is 95.9% and 94.8%, respectively.
Atacak et al. [81] presented a malware detection model intended to improve the detection accuracy and reduce the required time. The detection in this method is based on permission information from the applications. The proposed model reached an accuracy of 92%.
Xu et al. [82] proposed a framework that involved extracting network traffic features as dynamic features and converting them to two-dimensional images. Moreover, the approach included the use of program code as the static feature. Each of these features is processed and analyzed individually, and then the results of the analysis are used to classify the application and categorize the malware.
Table 8 presents a summary of the related studies that addressed Android malware detection techniques with their corresponding contributions and limitations.

VI. RESULTS AND DISCUSSION
This section wraps up several key findings based on the studies reviewed in the previous section.

A. ANALYSIS APPROACHES
As indicated by Venkatraman and Alazab [85], static analysis is considered faster and more effective than dynamic analysis due to its advantages in capturing information related to structural properties, such as byte sequence "signatures" and anomalies in file content. Dynamic analysis can be effective by utilizing run-time information, such as running processes or employing control flow graphs, which can be less susceptible to obfuscated malware. Moreover, dynamic malware analysis requires the user to install and execute the application, potentially impacting user data. Malicious behavior may occur during the application's execution or even immediately after installation, potentially compromising the user's sensitive information.
Based on the reviewed studies, it is identified that 50% of the related studies used static analysis as a malware detection approach, 25% used hybrid analysis, and 25% used the
high average importance scores. The experiment for this dynamic analysis technique. This is presented in Figure 6.
study was done over three different datasets in order to show The increased use of static analysis demonstrates its high
that this hybrid analysis of Android applications can greatly effectiveness as an application analysis approach. Moreover,
improve the detection capabilities of a malware detection the resource consumption involved in static analysis is lower
model. when compared with the other two methods. In a study
Yang et al. [80] proposed a two-stage mechanism for conducted by Gorment et al. [86] on malware detection for
detecting and classifying Android malware using ML algo- different platforms, it was found that 53.3% of the studies
rithms. The approach involves utilizing static analysis to focused on static analysis. In contrast, dynamic analysis

173180 VOLUME 12, 2024


S. J. Altaha et al.: Survey on Android Malware Detection Techniques Using Supervised ML

TABLE 8. Summary of reviewed papers.


accounted for 28.9% and hybrid analysis for 17.8%. This also highlights the significant effectiveness and attractiveness of static analysis for malware detection, not only for Android OS but also across various platforms.

FIGURE 6. Malware analysis techniques used in the research papers covered in this survey.

TABLE 9. Static features used in studies covered in this survey.

B. EXTRACTED STATIC FEATURES
Static analysis involves the examination of source code without executing the application. It involves analyzing the code structure, syntax, and semantics to detect the presence of malware. In this subsection, the most widely extracted static features in the reviewed studies are discussed and defined. Table 9 presents the static features extracted by the reviewed papers that proposed static-based malware detection. Three main features were most frequently used by researchers, as enumerated below:
• Permissions features: Android applications request various permissions to access certain sensitive resources or data in order to perform their functional requirements and complete specific actions on a mobile device. Users have to allow the permissions requested by applications [87]. Malicious applications often request excessive or unnecessary permissions, which can indicate suspicious behavior.
• Intent: Intent is a fundamental component of Android’s inter-application communication system [88]. When one activity within an application needs to communicate with another activity, it generates an intent. An intent serves as a message or request that encapsulates the important information that the sending activity wants to communicate to the receiving activity [89]. ML algorithms can consider intent features, such as intent filters and action names. Intent filters define the types of intents an application can respond to, and action names specify the actions the application can perform [90], [91].
• API calls: API calls are important for Android applications because they allow the application to interact with devices [92]. An API call typically involves sending a request to a server to fetch or send data. This can include retrieving information from a database or submitting data to be processed and stored [93].

FIGURE 7. Extracted static features in the reviewed papers.

TABLE 10. Accuracy achieved by each paper.

ML algorithms can analyze permission features, which provide information regarding the access and capabilities requested by the applications. Certain permissions are commonly associated with malicious behavior, such as accessing the exact location of the device or reading the details of SMS messages. ML algorithms can learn to recognize these patterns and classify applications as malicious based on suspicious permission requests [94]. Moreover, certain combinations of intent filters and action names can indicate malicious behavior. Further, by analyzing the sequence and patterns of API calls made by an application, ML algorithms can identify suspicious behavior that may indicate malware. Permission features are the most widely used features, followed by API calls and intent features, as illustrated in Figure 7. Other features include those that were used by only one study, such as binaries and program genes.

C. PERFORMANCE OF ML ALGORITHMS
Ten ML algorithms were widely used and achieved the best results in the reviewed studies. Figure 8 illustrates the percentage use of these ML algorithms, each of which attained the highest accuracy in certain papers compared to other ML models. The analysis reveals that the RF algorithm is the most commonly used, followed by DT and SVM with similar usage percentages. RF is effective for malware detection and is robust against overfitting [95], [96], which could explain its popularity in the analyzed studies.
Even though RF is the most commonly used algorithm overall, it does not guarantee the highest accuracy in every scenario. As shown in Table 10, the highest accuracy rates were achieved by other algorithms, such as logistic regression with 100% accuracy [79], CNN with 99.92% accuracy [62], and DT with 99.6% accuracy [76] for malware detection. The choice of algorithm should be based on a thorough evaluation and consideration of the specific dataset and problem domain.
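One way to make such an evaluation concrete is to score every candidate model on the same cross-validation folds and to record running time alongside accuracy. The sketch below is a minimal illustration under stated assumptions: the two classifiers (a majority-class baseline and a 1-nearest-neighbour rule with Hamming distance) are simple stand-ins rather than the RF, DT, or SVM models of the reviewed studies, and `TOY_DATA` is a made-up set of binary feature vectors.

```python
import time
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[int]
Dataset = List[Tuple[Vector, int]]  # (binary feature vector, label); 1 = malware, 0 = benign


def majority_class(train: Dataset) -> Callable[[Vector], int]:
    """Baseline: always predict the most frequent training label (ties go to 1)."""
    positives = sum(label for _, label in train)
    label = 1 if positives * 2 >= len(train) else 0
    return lambda x: label


def nearest_neighbour(train: Dataset) -> Callable[[Vector], int]:
    """1-NN: predict the label of the training vector with the smallest Hamming distance."""
    def predict(x: Vector) -> int:
        _, label = min(train, key=lambda item: sum(a != b for a, b in zip(item[0], x)))
        return label
    return predict


def cross_validate(data: Dataset, fit, k: int = 4) -> Tuple[float, float]:
    """Return (accuracy over all held-out samples, elapsed seconds) for k folds."""
    folds = [data[i::k] for i in range(k)]
    correct = 0
    start = time.perf_counter()
    for i, test_fold in enumerate(folds):
        # Train on every fold except the held-out one.
        train = [sample for j, fold in enumerate(folds) if j != i for sample in fold]
        predict = fit(train)
        correct += sum(predict(x) == y for x, y in test_fold)
    elapsed = time.perf_counter() - start
    return correct / len(data), elapsed


# Made-up data: malware vectors set the first two bits, benign ones the last two.
TOY_DATA: Dataset = [
    ((1, 1, 0, 0), 1), ((0, 0, 1, 1), 0), ((1, 1, 1, 0), 1), ((0, 0, 0, 1), 0),
    ((0, 0, 1, 0), 0), ((1, 1, 0, 1), 1), ((0, 0, 1, 1), 0), ((1, 1, 0, 0), 1),
]

for name, fit in [("majority baseline", majority_class), ("1-NN (Hamming)", nearest_neighbour)]:
    accuracy, seconds = cross_validate(TOY_DATA, fit)
    print(f"{name}: accuracy={accuracy:.2f}, time={seconds:.6f}s")
```

Because every model sees identical folds, differences in the scores come from the models rather than from the split. On a real dataset, the same loop could also collect precision and recall, which matter when the classes are imbalanced.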

FIGURE 8. ML algorithms used in the analyzed papers.

Although recently proposed studies report high detection accuracy, only a few of them took into consideration the time taken to train and test the models. For example, in the work proposed by Balcioglu et al. [53], the accuracy achieved was close to 99%, but the time taken was very long. Bibi et al. [69] reported a long detection time compared to similar studies. Therefore, it is essential to strike a balance between achieving high accuracy and considering the time aspect in malware detection approaches. To ensure a good user experience, it is important to build a fast detection system while maintaining a high level of accuracy.

D. SUMMARY OF MAJOR FINDINGS AND LIMITATIONS
The aim of this section is to provide a summary of the major findings from the SLR on Android malware detection using supervised ML models, discussing the significant contributions and conclusions of the reviewed studies. Additionally, the limitations identified throughout the literature are discussed. The following points outline the main findings and key limitations uncovered during the SLR:
1) The high percentage of static analysis-based approaches may indicate their widespread adoption and effectiveness in malware detection.
2) The most commonly used static features for malware detection are permissions, intents, and API calls.
3) Among the ML algorithms used, RF is the most commonly used, followed by DT and SVM. RF is effective for malware detection and robust against overfitting, which could explain its popularity.
4) It is important to balance achieving high accuracy and considering the time aspect in malware detection approaches. Certain studies have achieved close to 99% accuracy, but with long detection times, which is a critical factor for practical implementation.
5) Dataset issues are identified as the most common limitation in the SLR. This is not only because of the size of the datasets used; the use of outdated datasets


also contributed to this issue. An old dataset may not reflect current trends and behaviors of malware.
6) A few of the studies used imbalanced datasets, which leads to trained models that are biased toward the majority class.
7) While accuracy is an important metric for evaluating the performance of ML models, it should not be the only focus. Other aspects, such as resource consumption and the time taken to train and test, should also be considered to provide a comprehensive evaluation of the ML models.
8) Despite the high accuracy rates achieved in many studies, the majority of them did not test their ML models in real-world scenarios to evaluate the actual performance of the models and the generalization of the dataset. Evaluating the ML models using real-world data, rather than just the training and testing datasets, is important to assess their robustness and effectiveness in applications.
9) Android devices typically have limited CPU, memory, and battery resources compared to desktops. Highly accurate ML models for malware detection, particularly ensemble models, may have a large number of parameters and require significant computational resources. Running these complex models directly on Android devices may not be feasible due to the limited resources available. In addition, this may cause latency in malware detection, which can impact the user experience.
10) The ML models should be able to generalize well and detect not only known malware but also new, previously unseen threats. This requires the models to learn the underlying patterns and characteristics of malware, rather than relying solely on specific signatures.

E. REGULATORY AND ETHICAL IMPLICATIONS
The integration of ML algorithms in malware detection and analysis introduces not just technological advancements but also regulatory and ethical considerations. Malware detection often relies on analyzing large datasets of applications and user behavior, which may include sensitive user data and personal information. Malware detection systems must analyze the system in depth, from reading the content of all files to observing the behavior of all processes. Hence, developers and researchers must ensure strict compliance with data privacy regulations, such as the General Data Protection Regulation (GDPR), when collecting, storing, and processing this data for ML-based malware detection [98]. Appropriate measures should be taken to anonymize and protect user data, thereby minimizing the risk of personal information leaks or misuse.
Moreover, depending on the jurisdiction, the deployment of Android malware detection systems may be subject to specific regulations, such as those related to data protection, algorithmic decision-making, and the use of AI in security applications. Researchers and developers must ensure that their solutions comply with all relevant regulations and industry standards to mitigate legal and ethical risks.

VII. RECOMMENDATIONS FOR FUTURE RESEARCH DIRECTIONS
There is still room for further research to enhance the effectiveness, efficiency, and resilience of ML-based approaches for malware detection. In this section, a set of recommendations for future research directions in the field of ML-based Android malware detection is presented. These recommendations aim to address existing challenges and push the boundaries of detection capabilities. Exploring these research directions can advance the state of the art in Android malware detection and lead to better solutions. Based on this study, the authors suggest additional future investigation in the following directions:

A. EXPLORING OTHER STATIC FEATURES
The existing literature has mainly focused on utilizing traditional static features, such as API calls, permissions, and file metadata, for Android malware detection. However, there are still more static features that need to be addressed and explored by researchers for malware detection in Android OS. An example is opcode features, which can be found in the classes.dex file [99]. The opcode sequences represent the low-level instructions executed by the application on the target platform. Furthermore, code comments could be used for malware detection because developers may leave comments that contain hints related to their malicious activities. However, a key challenge faced by researchers in exploring these novel feature sets is the lack of publicly available datasets that include these attributes.
To address this gap, future research should focus on the development of comprehensive datasets that capture a diverse set of static features, including opcode sequences and code comments, in addition to traditionally utilized attributes.

B. EXPLAINABLE MACHINE LEARNING TECHNIQUES
While NN and other complex ML models have demonstrated impressive performance in Android malware detection, they often act as black boxes, making it challenging to understand how they classify an Android application. This lack of transparency is a significant challenge, as it can affect the trust in and reliability of these models, particularly in high-risk domains such as cybersecurity.
To address this issue, researchers have been focusing more on the field of explainable ML (XML). XML techniques aim to develop models that not only provide accurate predictions but also offer interpretable explanations for their decisions. In the context of Android malware detection, XML approaches can help bridge the gap between model accuracy and model interpretability. Moreover, a clear


understanding of the reasons underlying the model’s decisions enables researchers to assess its generalization capabilities and develop more robust malware detection systems.

C. DEEP LEARNING NEEDS TO BE UTILIZED MORE IN ANDROID SECURITY
While DL models have shown outstanding performance in multiple domains, such as computer vision, their application in Android malware detection is still in the early stages and needs further exploration. However, deploying DL models on resource-constrained devices introduces challenges related to resource consumption and model size. In addition, the lack of comprehensive and high-quality datasets of Android malware samples is another challenge.
The authors encourage researchers to make further advances in utilizing DL models and to address the related challenges to enable practical deployment of DL-based malware detection on Android.

D. DETECTION OF COLLUDING APPLICATIONS
Colluding applications are a group of Android applications that work together to perform malicious activities that they are unable to perform independently. These applications may communicate and exchange information covertly, thereby posing a significant threat to the overall security and privacy of Android devices [100]. This is considered an emerging threat because traditional Android malware detection systems typically scan applications individually to determine if they are malicious. However, in this case, the malicious activity is conducted by a group of applications colluding with each other. Traditional detectors are unable to identify this type of threat, as they are not designed to detect the collaborative nature of the malicious behavior.
To address this issue, a new model is needed that can effectively detect these colluding applications. Unfortunately, there has been very little research conducted in this area, and there is a lack of datasets available to support the development of this model.

VIII. THREATS TO VALIDITY
Every research study, including an SLR, faces potential threats to the validity of its findings. In this section, the threats to validity encountered while conducting this SLR are discussed.
Construct validity is about the collection of primary studies. In this paper, a few errors may have occurred in the screening process based on the inclusion and exclusion criteria. A cross-checking method was employed to help validate the final set of included studies and further mitigate the risk of errors in the study selection.
Internal validity is related to extracting and analyzing data, that is, to the soundness of the applied review process. There was a heavy workload during the process of data extraction and data analysis. Hence, the data was cross-checked and reviewed until all authors agreed on the comparison results. However, the possibility of certain residual errors in data extraction and analysis cannot be eliminated.
External validity deals with the summary of the results obtained from the primary studies. The review collected research publications from 2017 to 2024 to provide a comprehensive overview of Android malware detection using ML models up to the present time. ML models and techniques for malware detection have increased significantly during this period due to recent advances in malware and software security. The trends in this field could vary across different periods. Thus, the analysis presented in the review may not fully capture studies conducted before this period.

IX. CONCLUSION
With the increasing popularity and widespread use of Android devices, the risk of malicious software that targets these platforms is also on the rise. Android malware presents dangers to user privacy, data security, and the overall performance of devices. Efficient and accurate Android malware detection not only protects individual users’ privacy and data security but also preserves the integrity of enterprise networks and sensitive information. The security model of Android relies heavily on a permission-based model in which the developer declares the required permissions and the user grants them. Drawing on recent research, the security weaknesses related to Android applications and the Google Play review process were highlighted and summarized.
This paper presented an SLR of existing works related to ML-based malware detection. After conducting the review, static analysis was identified as the most widely used analysis approach for malware detection in Android applications, accounting for 50% of the reviewed papers. Another key finding is that permission features were the most used static features, followed by API call features. Moreover, the authors found that RF was the most used ML model for Android malware detection and classification. Even though it is the most commonly used model, the highest accuracy was achieved with other ML models, such as logistic regression and DT. Based on the findings from existing studies in this field, it is concluded from this SLR that Android malware detection using ML models still faces challenges and limitations. Dataset issues were identified as the most significant limitation in the reviewed papers. Another limitation was the lack of model evaluation in real-world scenarios, which leaves model robustness and dataset generalization unverified. To address these challenges, future research directions are provided. The findings of the review contribute to a deeper understanding of the current state of the field and can serve as a foundation for researchers to build upon.

REFERENCES
[1] M. K. A. Abuthawabeh, “Android malware detection based on network traffic using CICAndMal2017 dataset,” Ph.D. dissertation, Sci. Inf. Syst. Secur. Digit. Criminology, Princess Sumaya Univ. Technol., Amman, Jordan, 2019.

[2] S. Garg and N. Baliyan, “Comparative analysis of Android and iOS from security viewpoint,” Comput. Sci. Rev., vol. 40, May 2021, Art. no. 100372. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1574013721000125
[3] A. B. Yilmaz, Y. S. Taspinar, and M. Koklu, “Classification of malicious Android applications using naive Bayes and support vector machine algorithms,” Int. J. Intell. Syst. Appl. Eng., vol. 10, no. 2, pp. 269–274, 2022.
[4] Ö. A. Aslan and R. Samet, “A comprehensive review on malware detection approaches,” IEEE Access, vol. 8, pp. 6249–6271, 2020. [Online]. Available: https://api.semanticscholar.org/CorpusID:210692077
[5] I. A. Saeed, A. Selamat, and A. M. A. Abuagoub, “A survey on malware and malware detection systems,” Int. J. Comput. Appl., vol. 67, no. 16, pp. 25–31, Apr. 2013.
[6] R. Tahir, “A study on malware and malware detection techniques,” Int. J. Educ. Manage. Eng., vol. 8, no. 2, pp. 20–30, Mar. 2018.
[7] A. Sharma and S. K. Sahay, “Evolution and detection of polymorphic and metamorphic malwares: A survey,” 2014, arXiv:1406.7061.
[8] P. Vinod, R. Jaipur, V. Laxmi, and M. Gaur, “Survey on malware detection methods,” in Proc. 3rd Hackers’ Workshop Comput. Internet Secur. (IITKHACK), Mar. 2009, pp. 74–79.
[9] N. Milosevic, A. Dehghantanha, and K.-K.-R. Choo, “Machine learning aided Android malware classification,” Comput. Electr. Eng., vol. 61, pp. 266–274, Jul. 2017. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0045790617303087
[10] Y. Pan, X. Ge, C. Fang, and Y. Fan, “A systematic literature review of Android malware detection using static analysis,” IEEE Access, vol. 8, pp. 116363–116379, 2020.
[11] N.-U.-R. Chowdhury, A. Haque, H. Soliman, M. S. Hossen, T. Fatima, and I. Ahmed, “Android malware detection using machine learning: A review,” in Proc. Intell. Syst. Conf. Amsterdam, The Netherlands: Springer, 2023, pp. 507–522.
[12] Z. Wang, Q. Liu, and Y. Chi, “Review of Android malware detection based on deep learning,” IEEE Access, vol. 8, pp. 181102–181126, 2020.
[13] V. Kouliaridis and G. Kambourakis, “A comprehensive survey on machine learning techniques for Android malware detection,” Information, vol. 12, no. 5, p. 185, Apr. 2021.
[14] J. Senanayake, H. Kalutarage, and M. O. Al-Kadri, “Android mobile malware detection using machine learning: A systematic review,” Electronics, vol. 10, no. 13, p. 1606, Jul. 2021.
[15] M. Haris, B. Jadoon, M. Yousaf, and F. Khan, “Evolution of Android operating system: A review,” in Proc. 2nd Int. Conf. Adv. Res., 2017, pp. 1–11.
[16] R. Sain. (2024). Android OS History and Versions. Accessed: Jul. 2024. [Online]. Available: https://www.naukri.com/code360/library/Android-os-history-and-versions
[17] R. Vaidya. Android Architecture: Layers and Important Components. Accessed: Nov. 26, 2023. [Online]. Available: https://www.elluminatiinc.com/android-architecture/n
[18] N. Rahimi, J. Nolen, and B. Gupta, “Android security and its rooting—A possible improvement of its security architecture,” J. Inf. Secur., vol. 10, no. 2, pp. 91–102, 2019.
[19] Y. Chen, “Research on Android architecture and application development,” J. Phys., Conf. Ser., vol. 1992, no. 2, Aug. 2021, Art. no. 022168.
[20] P. Uttarwar, R. P. Tidke, D. S. Dandwate, and U. J. Tupe, “A literature review on Android—A mobile operating system,” Int. Res. J. Eng. Technol., vol. 8, no. 1, pp. 1–6, 2021.
[21] M. Ashawa and S. Morris, “Analysis of mobile malware: A systematic review of evolution and infection strategies,” J. Inf. Secur. Cybercrimes Res., vol. 4, no. 2, pp. 103–131, Dec. 2021.
[22] M. Yousefi-Azar, L. G. C. Hamey, V. Varadharajan, and S. Chen, “Malytics: A malware detection scheme,” IEEE Access, vol. 6, pp. 49418–49431, 2018.
[23] T. Trieu, “Android malware analysis,” M.S. thesis, Inf. Technol., Metropolia Univ. Appl. Sci., Finland, 2021.
[24] F. Martinelli, F. Mercaldo, V. Nardone, A. Santone, and G. Vaglini, “Model checking and machine learning techniques for HummingBad mobile malware detection and mitigation,” Simul. Model. Pract. Theory, vol. 105, Dec. 2020, Art. no. 102169.
[25] M. Naseer, J. F. Rusdi, N. M. Shanono, S. Salam, Z. B. Muslim, N. A. Abu, and I. Abadi, “Malware detection: Issues and challenges,” J. Phys., Conf. Ser., vol. 1807, no. 1, Apr. 2021, Art. no. 012011.
[26] M. Salman. (2024). What is Signature-Based Detection? Accessed: Jul. 2024. [Online]. Available: https://www.educative.io/answers/what-is-signature-based-detection
[27] M. Botacin, F. Ceschin, R. Sun, D. Oliveira, and A. Grégio, “Challenges and pitfalls in malware research,” Comput. Secur., vol. 106, Jul. 2021, Art. no. 102287.
[28] Z. Bazrafshan, H. Hashemi, S. M. H. Fard, and A. Hamzeh, “A survey on heuristic malware detection techniques,” in Proc. 5th Conf. Inf. Knowl. Technol., May 2013, pp. 113–120.
[29] What is Heuristic Analysis? Accessed: Jul. 2024. [Online]. Available: https://www.fortinet.com/resources/cyberglossary/heuristic-analysis
[30] X. Meng, “An integrated networkbased mobile botnet detection system,” Ph.D. dissertation, Univ. London, London, U.K., 2018.
[31] J. Lee, H. Jang, S. Ha, and Y. Yoon, “Android malware detection using machine learning with feature selection based on the genetic algorithm,” Mathematics, vol. 9, no. 21, p. 2813, Nov. 2021. [Online]. Available: https://www.mdpi.com/2227-7390/9/21/2813
[32] K. Liu, S. Xu, G. Xu, M. Zhang, D. Sun, and H. Liu, “A review of Android malware detection approaches based on machine learning,” IEEE Access, vol. 8, pp. 124579–124607, 2020.
[33] M. Y. Wong and D. Lie, “Intellidroid: A targeted input generator for the dynamic analysis of Android malware,” in Proc. NDSS, vol. 16, 2016, pp. 21–24.
[34] N. Tarar, S. Sharma, and C. R. Krishna, “Analysis and classification of Android malware using machine learning algorithms,” in Proc. 3rd Int. Conf. Inventive Comput. Technol. (ICICT), Nov. 2018, pp. 738–743.
[35] V. Rao and K. Hande, “A comparative study of static, dynamic and hybrid analysis techniques for Android malware detection,” Int. J. Eng. Develop. Res., vol. 5, no. 2, pp. 1433–1436, 2017.
[36] N. Peiravian and X. Zhu, “Machine learning for Android malware detection using permission and API calls,” in Proc. IEEE 25th Int. Conf. Tools Artif. Intell., Nov. 2013, pp. 300–305.
[37] M. Alazab, M. Alazab, A. Shalaginov, A. Mesleh, and A. Awajan, “Intelligent mobile malware detection using permission requests and API calls,” Future Gener. Comput. Syst., vol. 107, pp. 509–521, Jun. 2020.
[38] K. Sowndarajan and S. Binu, “Static analysis tool for identification of permission misuse by Android applications,” Int. J. Appl. Eng. Res., vol. 12, no. 24, pp. 15169–15178, 2017.
[39] M. Bardus, M. A. Daccache, N. Maalouf, R. A. Sarih, and I. H. Elhajj, “Data management and privacy policy of COVID-19 contact-tracing apps: Systematic review and content analysis,” JMIR mHealth uHealth, vol. 10, no. 7, Jul. 2022, Art. no. e35195.
[40] R. Singh. (2014). An Overview of Android Operating System and Its Security Features. [Online]. Available: https://api.semanticscholar.org/CorpusID:11973006
[41] M. S. Ahmad, N. E. Musa, R. Nadarajah, R. Hassan, and N. E. Othman, “Comparison between Android and iOS operating system in terms of security,” in Proc. 8th Int. Conf. Inf. Technol. Asia (CITA), Jul. 2013, pp. 1–4.
[42] I. Mohamed and D. Patel, “Android vs iOS security: A comparative study,” in Proc. 12th Int. Conf. Inf. Technol.-New Generat., Apr. 2015, pp. 725–730.
[43] C. Nachenberg, “A window into mobile device security,” Symantec Secur. Response, Mountain View, CA, USA, Tech. Rep., 2011.
[44] T.-M. Grønli, J. Hansen, G. Ghinea, and M. Younas, “Mobile application platform heterogeneity: Android vs windows phone vs iOS vs Firefox OS,” in Proc. IEEE 28th Int. Conf. Adv. Inf. Netw. Appl., May 2014, pp. 635–641.
[45] S. Karthick and S. Binu, “Android security issues and solutions,” in Proc. Int. Conf. Innov. Mech. Ind. Appl. (ICIMIA), Feb. 2017, pp. 686–689.
[46] D. Geneiatakis, I. N. Fovino, I. Kounelis, and P. Stirparo, “A permission verification approach for Android mobile applications,” Comput. Secur., vol. 49, pp. 192–205, Mar. 2015.
[47] F. Bourebaa and M. Benmohammed, “A deep neural network model for malware detection,” Int. J. Informat. Appl. Math., vol. 4, no. 1, pp. 1–14, 2021.
[48] M. Dhalaria and E. Gandotra, “A framework for detection of Android malware using static features,” in Proc. IEEE 17th India Council Int. Conf. (INDICON), Dec. 2020, pp. 1–7.
[49] R. S. Arslan, I. A. Doğru, and N. Barişçi, “Permission-based malware detection system for Android using machine learning techniques,” Int. J. Softw. Eng. Knowl. Eng., vol. 29, no. 1, pp. 43–61, Jan. 2019.


[50] F. Akbar, M. Hussain, R. Mumtaz, Q. Riaz, A. W. A. Wahab, and K.-H. Jung, ‘‘Permissions-based detection of Android malware using machine learning,’’ Symmetry, vol. 14, no. 4, p. 718, Apr. 2022.
[51] H. Gao, S. Cheng, and W. Zhang, ‘‘GDroid: Android malware detection and classification with graph convolutional network,’’ Comput. Secur., vol. 106, Jul. 2021, Art. no. 102264.
[52] A. Sangal and H. K. Verma, ‘‘A static feature selection-based Android malware detection using machine learning techniques,’’ in Proc. Int. Conf. Smart Electron. Commun. (ICOSEC), Sep. 2020, pp. 48–51.
[53] Y. Balcioglu, ‘‘Malware analysis for effective Android malware detection,’’ in Proc. Int. Anatolian Congr. Sci. Res., Mar. 2023, pp. 1–9.
[54] R. Yumlembam, B. Issac, S. M. Jacob, and L. Yang, ‘‘IoT-based Android malware detection using graph neural network with adversarial defense,’’ IEEE Internet Things J., vol. 10, no. 10, pp. 8432–8444, May 2023.
[55] Q. Li, G. Chen, and B. Li, ‘‘Android malware detection based on program genes,’’ Secur. Commun. Netw., vol. 2023, pp. 1–11, Apr. 2023.
[56] K. Khariwal, J. Singh, and A. Arora, ‘‘IPDroid: Android malware detection using intents and permissions,’’ in Proc. 4th World Conf. Smart Trends Syst., Secur. Sustainability (WorldS4), Jul. 2020, pp. 197–202.
[57] S. K. Smmarwar, G. P. Gupta, S. Kumar, and P. Kumar, ‘‘An optimized and efficient Android malware detection framework for future sustainable computing,’’ Sustain. Energy Technol. Assessments, vol. 54, Dec. 2022, Art. no. 102852.
[58] O. N. Elayan and A. M. Mustafa, ‘‘Android malware detection using deep learning,’’ Proc. Comput. Sci., vol. 184, pp. 847–852, Jan. 2021.
[59] R. S. Kesani, ‘‘Android malware detection using machine learning,’’ M.S. thesis, Dept. Comput. Sci., Blekinge Inst. Technol., Valhallavägen, 2024.
[60] A. Mahindru, H. Arora, A. Kumar, S. K. Gupta, S. Mahajan, S. Kadry, and J. Kim, ‘‘PermDroid a framework developed using proposed feature selection approach and machine learning techniques for Android malware detection,’’ Sci. Rep., vol. 14, no. 1, p. 10724, May 2024.
[61] E. Odat and Q. M. Yaseen, ‘‘A novel machine learning approach for Android malware detection based on the co-existence of features,’’ IEEE Access, vol. 11, pp. 15471–15484, 2023.
[62] M. Aamir, M. W. Iqbal, M. Nosheen, M. U. Ashraf, A. Shaf, K. A. Almarhabi, A. M. Alghamdi, and A. A. Bahaddad, ‘‘AMDDLmodel: Android smartphones malware detection using deep learning model,’’ PLoS ONE, vol. 19, no. 1, Jan. 2024, Art. no. e0296722.
[63] A. Alhussen, ‘‘Advanced Android malware detection through deep learning optimization,’’ Eng., Technol. Appl. Sci. Res., vol. 14, no. 3, pp. 14552–14557, Jun. 2024.
[64] A. H. E. Fiky, A. E. Shenawy, and M. A. Madkour, ‘‘Android malware category and family detection and identification using machine learning,’’ 2021, arXiv:2107.01927.
[65] A. Zulkifli, I. R. A. Hamid, W. M. Shah, and Z. Abdullah, ‘‘Android malware detection based on network traffic using decision tree algorithm,’’ in Proc. 3rd Int. Conf. Recent Adv. Soft Comput. Data Mining (SCDM). Johor, Malaysia: Springer, Feb. 2018, pp. 485–494.
[66] A. Mahindru and A. L. Sangal, ‘‘MLDroid—Framework for Android malware detection using machine learning techniques,’’ Neural Comput. Appl., vol. 33, no. 10, pp. 5183–5240, May 2021.
[67] R. Thangaveloo, W. W. Jing, C. K. Leng, and J. Abdullah, ‘‘DATDroid: Dynamic analysis technique in Android malware detection,’’ Int. J. Adv. Sci., Eng. Inf. Technol., vol. 10, no. 2, pp. 536–541, Mar. 2020.
[68] S. Wang, Z. Chen, Q. Yan, B. Yang, L. Peng, and Z. Jia, ‘‘A mobile malware detection method using behavior features in network traffic,’’ J. Netw. Comput. Appl., vol. 133, pp. 15–25, May 2019. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1084804518304028
[69] I. Bibi, A. Akhunzada, J. Malik, J. Iqbal, A. Musaddiq, and S. Kim, ‘‘A dynamic DL-driven architecture to combat sophisticated Android malware,’’ IEEE Access, vol. 8, pp. 129600–129612, 2020.
[70] T. Bhatia and R. Kaushal, ‘‘Malware detection in Android based on dynamic analysis,’’ in Proc. Int. Conf. Cyber Secur. Protection Digit. Services (Cyber Security), Jun. 2017, pp. 1–6.
[71] H. H. R. Manzil and S. M. Naik, ‘‘DynaMalDroid: Dynamic analysis-based detection framework for Android malware using machine learning techniques,’’ in Proc. Int. Conf. Knowl. Eng. Commun. Syst. (ICKES), Dec. 2022, pp. 1–6.
[72] S. Xiong and H. Zhang, ‘‘A multi-model fusion strategy for Android malware detection based on machine learning algorithms,’’ J. Comput. Sci. Res., vol. 6, no. 2, pp. 1–11, Jun. 2024.
[73] A. Martín García, R. Lara-Cabrera, and D. Camacho, ‘‘A new tool for static and dynamic Android malware analysis,’’ in Proc. 13th Int. FLINS Conf. (FLINS), Sep. 2018, pp. 509–516.
[74] L. Wen and H. Yu, ‘‘An Android malware detection system based on machine learning,’’ AIP Conf. Proc., vol. 1864, no. 1, 2017, Art. no. 020136.
[75] F. Taher, O. AlFandi, M. Al-kfairy, H. A. Hamadi, and S. Alrabaee, ‘‘DroidDetectMW: A hybrid intelligent model for Android malware detection,’’ Appl. Sci., vol. 13, no. 13, p. 7720, Jun. 2023.
[76] M. K. Alzaylaee, S. Y. Yerima, and S. Sezer, ‘‘DL-Droid: Deep learning based Android malware detection using real devices,’’ Comput. Secur., vol. 89, Feb. 2020, Art. no. 101663.
[77] S. Ullah, T. Ahmad, A. Buriro, N. Zara, and S. Saha, ‘‘TrojanDetector: A multi-layer hybrid approach for trojan detection in Android applications,’’ Appl. Sci., vol. 12, no. 21, p. 10755, Oct. 2022.
[78] B. A. Mantoo and S. S. Khurana, ‘‘Static, dynamic and intrinsic features based Android malware detection using machine learning,’’ in Proceedings of ICRIC, P. K. Singh, A. K. Kar, Y. Singh, M. H. Kolekar, and S. Tanwar, Eds., Cham, Switzerland: Springer, 2020, pp. 31–45.
[79] V. Kouliaridis, G. Kambourakis, D. Geneiatakis, and N. Potha, ‘‘Two anatomists are better than one—Dual-level Android malware detection,’’ Symmetry, vol. 12, no. 7, p. 1128, Jul. 2020. [Online]. Available: https://www.mdpi.com/2073-8994/12/7/1128
[80] F. Yang, Y. Zhuang, and J. Wang, ‘‘Android malware detection using hybrid analysis and machine learning technique,’’ in Proc. Int. Conf. Cloud Comput. Secur., Jun. 2017, pp. 565–575.
[81] İ. Atacak, K. Kılıç, and İ. A. Doğru, ‘‘Android malware detection using hybrid ANFIS architecture with low computational cost convolutional layers,’’ PeerJ Comput. Sci., vol. 8, p. e1092, Sep. 2022.
[82] P. Xu, C. Eckert, and A. Zarras, ‘‘hybrid-Falcon: Hybrid pattern malware detection and categorization with network traffic and program code,’’ 2021, arXiv:2112.10035.
[83] R. Srinivasan, S. Karpagam, M. Kavitha, and R. Kavitha, ‘‘An analysis of machine learning-based Android malware detection approaches,’’ J. Phys., Conf. Ser., vol. 2325, no. 1, Aug. 2022, Art. no. 012058.
[84] K. Sai, N. Navya, T. Sreya, and S. Ali, ‘‘Android malware detection using genetic algorithm based optimized feature selection and machine learning,’’ Int. J. Recent Develop. Sci. Technol., vol. 7, no. 2, pp. 60–72, 2023.
[85] S. Venkatraman and M. Alazab, ‘‘Use of data visualisation for zero-day malware detection,’’ Secur. Commun. Netw., vol. 2018, pp. 1–13, Dec. 2018.
[86] N. Z. Gorment, A. Selamat, L. K. Cheng, and O. Krejcar, ‘‘Machine learning algorithm for malware detection: Taxonomy, current challenges and future directions,’’ IEEE Access, vol. 11, pp. 141045–141089, 2023.
[87] S. Ramachandran, A. Dimitri, M. Galinium, M. Tahir, I. V. Ananth, C. H. Schunck, and M. Talamo, ‘‘Understanding and granting Android permissions: A user survey,’’ in Proc. Int. Carnahan Conf. Secur. Technol. (ICCST), Oct. 2017, pp. 1–6.
[88] A. Pathak. Exploring Android Intents. Accessed: Mar. 8, 2024. [Online]. Available: https://medium.com/@myofficework000/intents-in-android-713da59ee700
[89] M. W. Afridi, T. Ali, T. Alghamdi, T. Ali, and M. Yasar, ‘‘Android application behavioral analysis through intent monitoring,’’ in Proc. 6th Int. Symp. Digit. Forensic Secur. (ISDFS), Mar. 2018, pp. 1–8.
[90] K. Efimov and R. Onitza-Klugman. Exploring Intent-Based Android Security Vulnerabilities on Google Play. Accessed: Dec. 8, 2023. [Online]. Available: https://snyk.io/blog/exploring-android-intent-based-security-vulnerabilities-google-play/
[91] Intents and Intent Filters. Accessed: Feb. 2024. [Online]. Available: https://stuff.mit.edu/afs/sipb/project/android/docs/guide/components/intents-filters.html
[92] W. Wang, M. Zhao, Z. Gao, G. Xu, H. Xian, Y. Li, and X. Zhang, ‘‘Constructing features for detecting Android malicious applications: Issues, taxonomy and directions,’’ IEEE Access, vol. 7, pp. 67602–67631, 2019.
[93] J. Fernando. (Oct. 2016). What is an API? How to Call an API From Android? Accessed: Feb. 2024. [Online]. Available: https://droidmentor.com/api-call-api-android/
[94] P. Papaioannou. How Malicious Applications Abuse Android Permissions. Accessed: Dec. 8, 2023. [Online]. Available: https://blog.nviso.eu/2021/09/01/how-malicious-applications-abuse-android-permissions/


[95] V. Kabade, R. Hooda, C. Raj, Z. Awan, A. S. Young, M. S. Welgampola, and M. Prasad, ‘‘Machine learning techniques for differential diagnosis of vertigo and dizziness: A review,’’ Sensors, vol. 21, no. 22, p. 7565, Nov. 2021.
[96] Zach. Decision Tree vs. Random Forests: What’s the Difference? Accessed: Nov. 29, 2023. [Online]. Available: https://www.statology.org/decision-tree-vs-random-forest/
[97] A. Fatima, R. Maurya, M. K. Dutta, R. Burget, and J. Masek, ‘‘Android malware detection using genetic algorithm based optimized feature selection and machine learning,’’ in Proc. 42nd Int. Conf. Telecommun. Signal Process. (TSP), Budapest, Hungary, 2019, pp. 220–223.
[98] S. Kayode, ‘‘Navigating regulatory and legal challenges in AI and ML-powered cybersecurity: A comprehensive analysis,’’ Tech. Rep., 2023.
[99] Q. Wu, X. Zhu, and B. Liu, ‘‘A survey of Android malware static detection technology based on machine learning,’’ Mobile Inf. Syst., vol. 2021, pp. 1–18, Mar. 2021.
[100] F. I. Abro, M. Rajarajan, T. M. Chen, and Y. Rahulamathavan, ‘‘Android application collusion demystified,’’ in Proc. Int. Conf. Future Netw. Syst. Secur. Gainesville, FL, USA: Springer, 2017, pp. 176–187.

SAFA J. ALTAHA received the bachelor’s degree in computer science and information technology, in 2019. She is currently pursuing the master’s degree in cybersecurity with King Faisal University. She has a strong interest in various areas, including machine learning, digital forensics, cybersecurity, and business continuity. In addition to her studies, she works as a Business Continuity Analyst, where she applies her knowledge and skills to ensure the resilience and security of organizations in the face of cyber threats. Her role involves implementing strategies and measures to protect critical business operations and data. Prior to her role as a Business Continuity Analyst, she was a Cybersecurity Analyst, ensuring compliance to international cybersecurity standards within her organization.

AHMED ALJUGHAIMAN received the B.S. degree in computer and information technology, specializing in computer networks and information security, from Indiana University–Purdue University Indianapolis, in 2011, the master’s degree in network security from DePaul University, in 2013, the master’s degree in information assurance from the University of Colorado at Colorado Springs, in 2019, and the Ph.D. degree in security engineering from the University of Colorado at Colorado Springs, in 2021. He is currently an Assistant Professor with the College of Computer Sciences and Information Technology, King Faisal University, Saudi Arabia. His research interests include underwater wireless sensor networks, underwater communications, software-defined networks, cybersecurity, computer networks, terrestrial wireless sensor networks, network protocols, network security, the Internet of Things, blockchain, and unmanned aerial vehicles.

SONIA GUL is a dedicated Educator and a Scholar with more than 15 years of expertise at prestigious educational establishments in Saudi Arabia, New Zealand, and Asia, instructing pupils from diverse social and cultural contexts. Her teaching and research expertise lies in the practical aspects of computer networks, telecommunications, wireless communications, emergency communications, software engineering, data structures and algorithms, project management, information and communication technology (ICT), and operating systems. She possesses strong communication skills and employs effective teaching and research methods to create an engaging learning environment. Her capability to function in a managerial capacity or as a member of teaching and research teams demonstrates a track record of effectively meeting timelines and successful project closures.
