PERFORMANCE EVALUATION AND DETECTION OF
MALWARE-STRATEGIC SURVEY

1st AAYUSH KUMR, 2nd ABHISHEK SAXENA,
3rd ANJILA CHOUDHARY, 4th AZHARUDDIN ALAM
Computer Science and Engineering
Greater Noida Institute of Technology
Greater Noida, India
[email protected]

5th ASHWINI KUMAR VERMA
Assistant Professor
Computer Science and Engineering
Greater Noida Institute of Technology
Greater Noida, India
[email protected]

Abstract: In an ever-growing digital age, computers hold a significant role in society.
However, with this reliance comes a concerning rise in malware and its widespread impact.
As this malicious software continues to evolve and increase in number, it poses an ongoing
threat that demands constant vigilance and proactive measures. Cybercriminals use malware
as a potent weapon to launch deliberate, malevolent assaults against expensive computer
systems, which, in the absence of appropriate preventive measures, may have disastrous
results. To combat this looming threat, research efforts have turned to machine learning
as a favored avenue for classifying malware. This review paper surveys the diverse
technologies used for detecting malware and their effectiveness.

Keywords: malware, cybersecurity, machine learning

I. INTRODUCTION

The word "malware" was coined to describe harmful software intended to carry out an
attacker's malevolent aims. Malware can infiltrate systems, steal sensitive data, hack
computers and smart devices, create security vulnerabilities in networks, damage vital
infrastructure, and more. Because of the internet's broad use, malware is widely available
and has weakened the security of a great many systems and devices [1]. To evade detection
and removal, malware is constantly evolving to defeat both signature-based and
cutting-edge machine learning approaches.

Daly published research in 2009 that indicated the possibility of planned, coordinated
assaults to gain prolonged access to a company's network [2]. Similarly, researchers at
the Quick Heal Threat Research Lab received over 350,000,000 malicious files that targeted
tens of thousands of workstations in the first quarter of 2016 [3]. Aimoto et al. reported
in 2017 that Symantec had discovered the Banswift criminal organization, which had stolen
US$81 million from the Bangladesh Bank [4]. Because Elasticsearch and Logstash were
improperly configured, the Kid Security app, which was designed to keep an eye on kids'
online safety, exposed user records for more than a month. Researcher Bob Diachenko found
the breach in mid-September; it affected 300 million records, including 21,000 phone
numbers, 31,000 email addresses, and some credit card information, according to CyberNews
in September 2023.

Researchers and malware analysts are thus encouraged to examine known malware and active
anti-malware software in order to uncover both known and unknown harmful programs and to
prevent cyberattacks on computer networks. Since bank consumers were the main target of
hackers, failing to take appropriate precautions against malware might have disastrous
consequences, including temporarily paralyzing financial networks.

Although Tabish et al. (2009) argue that malware detection is a ubiquitous problem that
affects a variety of file types and operating systems [58], the Windows platform is the
main focus of our article due to its broad usage and longevity. For a very long time,
researchers and malware developers have been competing with one another. ML appears to be
one of the earliest and most effective solutions to the analysis and classification
problem; it has been used for decades for malware classification and detection. ML has
become crucial for modern cybersecurity since it is more successful at identifying new
threats than signature-based defenses. Identifying malicious software using traditional
methods (rule-based, graph-based, entropy-based, and so on) is challenging. As a result,
machine learning emerges as a clear and dependable option for creating classifiers that
can identify both newly discovered malware and malware that belongs to already-existing
families [5, 6].

Numerous papers by different authors were reviewed, and it was found that supervised and
unsupervised machine learning methods have been proposed using Random Forest, Naive Bayes,
Decision Trees, SVM, KNN, AdaBoost, and other techniques, with strong results [7, 8].
Following an examination of various machine learning-based detection methods, it was
determined that malware characteristics need to be gathered from both static and dynamic
malware analysis, and that classifiers need to be trained with suitable classification
techniques. These two factors are significant and have an impact on malware classifier
accuracy.

However, machine learning models have a few drawbacks. To begin with, creating and
maintaining these models requires a significant amount of expertise and work to locate and
arrange helpful characteristics for training, which is required to generate an appropriate
classification model. End-to-end deep learning models may help with these issues to some
extent. Byte-based models have been suggested for detecting malware in Windows Portable
Executable files, and they have been shown to be just as successful as traditional machine
learning [10, 11]. This article discusses several ML methods for classifying and detecting
malware, as well as the relevant research.

II. MALWARE ANALYSIS

Malware analysis is a technique that examines the composition and actions of malware by
finding traits that point to its intended purpose. Static analysis and dynamic analysis
are the two methods available for obtaining input characteristics.
Static analysis retrieves characteristics that will feed into the classifier without
requiring the program to run, while dynamic analysis derives features while the code runs.
According to research, malicious programs within a given family tend to have a comparable
set of opcodes. Another feature that sets malicious files apart from benign ones is the
prevalence of a few particular opcodes [12].

Another way of categorizing malware is into first and second generations. Malware of the
first generation is also referred to as static malware. Examples include "rootkits,"
"spyware," "crimeware," and "adware," in addition to "viruses," "worms," and "Trojan
horses." This kind of malware modifies the structure of the target computers to induce
harmful behavior. To investigate these malicious programs, many static features must be
collected from the malware code, including "hash value," "N-grams," "opcodes," "strings,"
and "PE header." The number of features that may be taken into consideration for
classification is significantly lower under static, or first-generation, analysis. Many
antivirus programs and intrusion detection systems are designed to identify malware based
on these characteristics [13, 14].

Disassemblers are used to break down executable files into assembly language code, which
is then examined for malware. OllyDbg, IDA Pro, Capstone, and other debuggers and
disassemblers are among the most widely used programs for converting binary files into
assembly code [15, 16]. Commonly used features for static malware analysis include calls
to API functions, entropy, header values, and so on. Disassembled code is examined to find
and investigate file execution flows, function calls, and dangerous code behavior. This
information may then be used to identify new malware or variants of already-existing
malware. Assembly code analysis is challenging because it takes a lot of time and because
malware obfuscates itself in several ways, for example by using code encryption [17, 18].
XORing a generated key with the malware body encrypts it to make it more difficult to
detect. Static analysis tools and traditional signature-based detection approaches are
unable to identify encrypted malware.

Dynamic, or second-generation, malware analysis is the act of running malicious files and
watching how they behave. This behavior is the result of malicious code interacting with
the machine. During dynamic analysis, a large number of common characteristics are
retrieved, analyzed further, and used to train the classifiers. Features such as system
calls, API function calls, modifications, and performance counters are retrieved
throughout the study. In 2018, Ding et al. explained how malicious code behaves at
runtime, citing file system actions, registry key changes, process execution, and network
activities [19, 20, 21]. Using virtualization tools like VMware, sample malware code is
run in a monitored virtual environment, and activities are observed such as changes made
to the registry, the creation or deletion of files, mutex operations, TCP/IP calls, the
removal of system files, log entries, the list of URLs visited, API requests, and so on.
Analysis of these actions indicates whether the file is malicious or benign.

Further classifications of second-generation malware include polymorphic, metamorphic, and
oligomorphic malware. Computer viruses use oligomorphic code, which is similar to
polymorphic code in that it generates a decryptor for itself. It draws on a limited set of
decryptors, numbering in the hundreds, which still enables detection, whereas polymorphic
code mutates the instructions using a variety of obscuring techniques to produce millions
of decryptors. Malware created using a polymorphic code engine is incapable of
self-rewriting. Malware that transforms itself, instead of the decryptors, is known as
metamorphic malware. To evade detection, it generates new versions by using a variety of
cutting-edge and sophisticated concealing strategies without altering its behavior [24].

To address situations where a single method may not be enough, malware detection systems
use a hybrid strategy that blends dynamic and static analysis. Traditional signature-based
antimalware can identify only a small percentage of malware, and it is insufficient to
detect sophisticated or undiscovered malware because of its improved obfuscation tactics.
Such malware is capable of creating a vast array of variants in order to elude detection
and get past security measures. The consequences might be severe if sufficient and
suitable steps are not taken to combat it. Advanced metamorphic malware is very difficult
to detect and a constant danger to endpoint security because, while the altered code
performs the same function, it changes its appearance to evade existing detection
techniques. Therefore, to identify complex or metamorphic malware effectively, new
techniques for monitoring the behavioral patterns of different malware must be developed,
so that the problem can be better understood and diagnosed with a low number of false
positives.

III. FEATURES EXTRACTION

The technique of extracting features from unprocessed data is known as feature extraction.
Due to its inherent obfuscation, binary data is very difficult to extract features from.
This influences the effectiveness of machine learning methods and algorithms, which rely
primarily on the quality of the input data. Malware may exist in many different binary
formats, including Windows PE files with the .exe, .dll, and .efi extensions. Thus,
understanding program internals is essential for identifying and obtaining valuable
characteristics while doing security research on computer binaries. Feature extraction
lowers the processing cost when there is a large quantity of data to analyze. The feature
extraction phase yields a vector containing the frequencies of the extracted features.
These features affect the classification accuracy of the machine learning model.

The three types of analytic approaches (static, dynamic, and hybrid) that form the basis
of feature extraction are covered in the preceding section. Examples of features based on
dynamic analysis include information flow tracking, function-based features, API calls,
and system calls. Features based on static analysis techniques include characteristics of
portable executables, entropy, header values, bytecode and opcode n-grams, and string
functionality. Programs are watched during dynamic analysis in order to understand their
dynamic behaviors, such as resource use, network connections, system calls, and system
calls that enable memory access. Many sandboxes and tools are available for feature
extraction, such as Process Explorer and tcpdump; PEview can be used for static analysis
and Cuckoo for dynamic analysis, among other programs. The characteristics are extracted
and converted into feature vectors, and the classifier model is then trained using these
feature vectors.
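As a rough illustration of how static features become a feature vector, the sketch below computes two of the static features named in this section, byte entropy and byte n-gram frequencies, from raw file bytes. It is a minimal sketch under stated assumptions, not a method from any surveyed paper: the function names and the `vocabulary` parameter are hypothetical, and a real pipeline would more likely use opcode n-grams produced by a disassembler and PE header fields rather than raw byte pairs.

```python
from __future__ import annotations

import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0 to 8)."""
    if not data:
        return 0.0
    total = len(data)
    entropy = 0.0
    for count in Counter(data).values():
        p = count / total
        entropy -= p * math.log2(p)
    return entropy


def ngram_counts(data: bytes, n: int = 2) -> Counter:
    """Count every contiguous byte n-gram in the input."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def feature_vector(data: bytes, vocabulary: list[bytes], n: int = 2) -> list[float]:
    """Entropy followed by the relative frequency of each vocabulary n-gram.

    `vocabulary` is a hypothetical, fixed list of n-grams chosen beforehand
    (for example, the most frequent n-grams in a training set), so that every
    file maps to a vector of the same length.
    """
    counts = ngram_counts(data, n)
    total = sum(counts.values())
    freqs = [counts[gram] / total if total else 0.0 for gram in vocabulary]
    return [byte_entropy(data)] + freqs


# Illustrative input only: a few bytes standing in for a file's contents.
sample = b"MZ\x90\x00MZ\x90\x00"
vector = feature_vector(sample, [b"MZ", b"\x90\x00", b"\xcc\xcc"])
```

High entropy is often treated as a hint of packing or encryption, which is one reason it appears alongside n-gram frequencies; the resulting fixed-length vectors are the kind of input a classifier would be trained on.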

IV. MALWARE DETECTION TECHNIQUES

The main objectives of malware detection methods are to detect malicious software and
safeguard the system on which it is installed, in order to preserve the security of linked
networks and computer systems. To assist with the detection and grouping of malware
samples into the appropriate families, the inputs may be described in a number of ways.

Over time, a number of authors have proposed methods for identifying and categorizing
malware files and the variants they are associated with. Table 1 provides a comprehensive
summary of the major research publications.

Extracted malware has also been represented in graphic form in addition to feature
vectors. The process of recognizing malware and grouping it into families is
labor-intensive and needs subject expertise; it therefore differs from classifying images.
In one such study, the authors proposed a learning-based technique for analyzing dangerous
code and categorizing it according to the malware family it belongs to. Extracting
portable executable (PE) features and selecting the traits that are most prevalent among
them is the first step in grouping malware into families. These characteristics help
identify malware activities and the related category by describing the structure of
portable executables. The accuracy of the suggested approach was 99.8% [59].

TABLE 1
Review of the Literature

Authors | Inputs | Algorithm / Techniques | Findings
--- | --- | --- | ---
Ye et al. (2007) [25] | API execution sequences | Rule-based classifier | IMDS surpassed other data mining techniques and several antivirus programs with a 93% detection accuracy.
Shafiq et al. (2008) [26] | n-grams | HMM | The suggested technique detects malware with a TPR of 84.9% and an FPR of 16.7%.
Moskovitch et al. (2008) [27] | Opcode features | ANN, NB, DT, and AdaBoost | The model detected malicious code in the file under examination with a considerable degree of accuracy (94.5%).
Griffin et al. (2009) [28] | 48-byte string signature | 5-gram Markov chain model | Signatures with one or more components were used to train the classifiers. Having several component signatures increased the chance of a satisfactory accuracy compared with the single-signature equivalent. A false positive rate (FPR) of less than 0.1 percent was achieved.
Nataraj et al. (2011) [29] | Grayscale images of binaries | KNN | Demonstrates 98% classification accuracy on a collection of malware from 25 different families.
Shabtai et al. (2012) [30] | 1- to 6-gram opcode features | Random Forest, Naive Bayes, ANN, Logistic Regression, BDT, DT, and BNB | In terms of accuracy, G-Mean, FPR, and TPR, RF, BDT, and DT performed better than NB and BNB. Random Forest produced the best results with 95.14% accuracy.
Ravi et al. (2012) [31] | API call sequence | J4.8, IMDS, SVM, rule-based classifier, and Naive Bayes | The suggested solution uses a third-order Markov model, which operates with 90% accuracy on the testing dataset and 99.38% accuracy on the training dataset.
Santos et al. (2013) [32] | Frequency of opcodes | DT, KNN, Bayesian, SVM | SVM performs better than 95.7% for opcode features of length two.
Comar et al. (2013) [33] | Flow-level features | KNN, SVM, WL, RBF | For identifying new classes, the supervised weighted linear kernel provides the best performance metric.
Uppal et al. (2014) [34] | N-grams from API sequences | Naive Bayes, Random Forests, SVM, and Decision Tree classifiers | SVM produces the best results (98.5% accuracy) out of all the classifiers.
Salehi et al. (2014) [35] | API calls | RF, J48, Rotation RF, FT, and NB | 94.6% was the greatest true positive rate of any classifier used, and random forest produced the highest results.
Sexton et al. (2015) [36] | Bytecode sequences and opcodes | Naive Bayes, rule-based classifier, Logistic Regression, SVM | The Markov chain approach to SVM revealed an 84.9% true positive rate.
Saxe et al. (2015) [37] | String histogram, byte sequence, and 2D PE properties | Deep feed-forward neural network | The suggested model yielded a 95% true positive rate.
Narra et al. (2016) [38] | Opcode sequence | K-means, expectation maximization with HMM, SVM | The model operates with a 98% accuracy rate.
Ahmadi et al. (2016) [39] | Hex dump based features | XGBoost classification algorithm | The suggested model provided a 99.8% detection accuracy.
Kolosnjaji et al. (2016) [40] | System call sequences | Convolutional and recurrent neural network | The average accuracy and recall of the combined model were 85.6% and 89.4%, respectively.
Narayanan et al. (2016) [41] | Image of polymorphic malware file | KNN, ANN, and SVM | Linear KNN outperformed the others with an accuracy of 96.6%.
Nikolopoulos and Polenakis (2017) [42] | ScD graph created using system calls | SaMe-similarity and NP-similarity metrics | The suggested model has a detection rate of 83.42%.
Zhixing Xu et al. (2017) [43] | System calls for memory access | Logistic regression and random forest classifier | The random forest classifier performed better, with a true positive rate of 99%.
Raff et al. (2017) [44] | PE header features | LSTM, Random Forests, LR, ET | A network with all connections and calibrations made may reach 93.3% accuracy.
Kotov et al. (2018) [45] | Windows API calls | Symbolic execution and HMM models | The top prediction model detects malware with an accuracy rate of 87.6%.
Le et al. (2018) [46] | Grayscale image of binary malware file | Convolutional Neural Network | Using 10,568 binary samples to train the classifier, the accuracy rate was 98.5%.
Nguyen et al. (2018) [47] | Image-based representation of lazy-binding CFG | CNN | CNN produced an accuracy of 98.87%.
Krcal et al. (2018) [48] | PE, API calls | MalConv, CNN, FNN | At 96.4% accuracy, the suggested convolutional network outperforms other models.
Ni et al. (2018) [49] | Gray images based on SimHash | Hashing and CNN | The average classification accuracy attained was 98.86%.
Rathore et al. (2019) [50] | Opcode features | RF, DNN with 2 and 7 hidden layers | RF outperforms DNN with a 99.6% accuracy rate.
O. Suciu et al. (2019) [51] | PE header features | FGSM | The suggested method shows that forceful attacks on the model are effective. It does not yield efficient models when trained on small datasets.
Yuxin et al. (2019) [52] | n-gram | Deep Belief Network | When trained on unlabeled data, DBN outperformed KNN, SVM, and Decision Trees in terms of classification accuracy.
Rabbani et al. (2020) [53] | Protocols, jitters, IP addresses, TCP, and UDP | PSO with PNN | With 96.5% accuracy, the model was able to identify malicious behavior.
Yucel et al. (2020) [54] | Memory image of exe file | Virtual machines and 3D imaging | With an average of 0.886, the authors examined the similarity rates across many malware families. A few reached a 99.5 percent accuracy rate.
Vasan et al. (2021) [55] | Windows executables, system call sequences | Unsupervised anomaly detection using Isolation Forest | Displayed the potential of unsupervised learning for malware detection by achieving high accuracy in identifying previously unknown malware types.
Kumar et al. (2022) [56] | Android APK files, API calls, permissions | Hybrid model combining static and dynamic analysis using RNN | Detected malware with 98.7% accuracy, highlighting the effectiveness of hybrid approaches for Android malware detection.
Gibert et al. (2023) [57] | PE files, opcode sequences | LightGBM, CNN with attention mechanism | Achieved 99.4% accuracy in malware detection, outperforming other ML algorithms. The attention mechanism improved model interpretability.
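
The findings in Table 1 are reported using different measures: accuracy, true positive rate (TPR), and false positive rate (FPR). As a reading aid, the sketch below shows how these three quantities are conventionally derived from a detector's predictions, with TPR = TP / (TP + FN), FPR = FP / (FP + TN), and accuracy = (TP + TN) / (TP + FP + TN + FN). The function names and the 0/1 label encoding (1 for malicious, 0 for benign) are assumptions made for illustration, and the surveyed papers may compute their figures on different datasets and splits.

```python
from __future__ import annotations


def confusion_counts(y_true: list[int], y_pred: list[int]) -> tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN), assuming labels are 1 (malicious) or 0 (benign)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1  # malware correctly flagged
        elif truth == 0 and pred == 1:
            fp += 1  # benign file wrongly flagged
        elif truth == 0 and pred == 0:
            tn += 1  # benign file correctly passed
        else:
            fn += 1  # malware missed
    return tp, fp, tn, fn


def detection_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Accuracy, true positive rate, and false positive rate of a detector."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "tpr": tp / (tp + fn) if (tp + fn) else 0.0,
        "fpr": fp / (fp + tn) if (fp + tn) else 0.0,
    }


# Illustrative labels only: three malicious and five benign samples.
metrics = detection_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0])
```

Because accuracy alone can look high on datasets dominated by one class, TPR and FPR are worth reading together when comparing the rows of the table.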

There are several challenges when using machine learning to identify malware. First,
updating and training malware classifiers carries a large computational expense. Since the
model must recognize the most recent and freshly created malware, regular updates are
necessary. Second, the set of characteristics collected from malware may be enormous,
which can also affect the model's training or performance. Third, and this is another
major issue, some malware makers may be employing machine learning (ML) to create and sell
malware that keeps evolving. This allows them to avoid detection with ease [44].

V. CONCLUSIONS

In this article, we have endeavored to provide an overview of machine learning research on
differentiating malicious software from benign software, along with an analysis of how
well such efforts have fared in comparison to other classifiers or existing methodologies.
We started by discussing the need for and purpose of malware analysis, then moved on to
the traits that need to be extracted so that the different classifiers can be trained, and
lastly discussed the problems and performance that arose during the construction of the
detection model. After reading and contrasting the publications, we performed analyses
based on important factors such as the classifier's output, the strategy, the input
characteristics, and the classification algorithm. However, we think there are many
unutilized algorithms that may provide superior outcomes. Malware detection and
classification algorithms must be resilient enough to tackle newly emerging malware
variants.

References
[1] Sharma, A., Sahay, S.K., “Evolution and detection of polymorphic and metamorphic malwares: a survey”, Int. J. Comput. Appl. 90(2), pp. 7-11, 2014.
[2] Daly, M.K., “Advanced persistent threat”, Usenix, Nov. 2009.
[3] Quick Heal, “Quick Heal quarterly threat report Q2”, Technical report, Feb. 2015.
[4] Aimoto, S., AlKhatib, T., Coogan, P., Corpin, M., DiMaggio, “Internet security threat report”, Technical report, Symantec Corporation, 2017.
[5] B. Ndibanje, K.H. Kim, Y.J. Kang, H.H. Kim, T.Y. Kim, H.J. Lee, “Cross-method-based analysis and classification of malicious behavior by API calls extraction”, Applied Sciences, 2019.
[6] J. Zhang, “Machine learning with feature selection using principal component analysis for malware detection: A case study”, Jan. 2019.
[7] X. Sun, X. Li, K. Ren, J. Song, Z. Xu, J. Chen, “Rethinking compact abating probability modeling for open set recognition problem in cyber-physical systems”, J. Systems Architecture, 2019.
[8] H. Zhang, X. Xiao, F. Mercaldo, S. Ni, F. Martinelli, “Classification of ransomware families with machine learning based on N-gram of opcodes”, Future Generation Computer Systems, pp. 211-221, 2019.
[9] Z. Bazrafshan, H. Hashemi, S. M. H. Fard, A. Hamzeh, “A survey on heuristic malware detection techniques”, in Information and Knowledge Technology (IKT), 5th Conference, IEEE, pp. 113-120, 2013.
[10] S. E. Coull and C. Gardner, “Activation analysis of a byte-based deep neural network for malware classification”, IEEE Security and Privacy Workshops, pp. 21-27, 2019.
[11] E. Raff, J. Barker, J. Sylvester, R. Brandon, B. Catanzaro, and C. K. Nicholas, “Malware detection by eating a whole exe”, Workshops at the Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[12] Yanfang Ye, Tao Li, Yong Chen, and Qingshan Jiang, “Automatic malware categorization using cluster ensemble”, in Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, pp. 95-104, 2010.
[13] R. Veeramani, N. Rai, “Windows API based malware detection and framework analysis”, International Journal of Scientific & Engineering Research, Volume 3, Issue 3, March 2012.
[14] M. Christodorescu, S. Jha, J. Kinder, S. Katzenbeisser, H. Veith, “Software transformations to improve malware detection”, Journal of Computer Virology, pp. 253-265, 2007.
[15] E. Raff, R. Zak, R. Cox, J. Sylvester, P. Yacci, R. Ward, A. Tracy, M. Mclean, C. Nicholas, “An investigation of byte n-gram features for malware classification”, Journal of Computer Virology and Hacking Techniques, 2016.
[16] Y. Nagano, “Static analysis with paragraph vector for malware detection”, in Proceedings of International Conference on Ubiquitous Information Management and Communication, pp. 1-7, 2017.
[17] Y. Oyama, “Trends of anti-analysis operations of malwares observed in API call logs”, Journal of Computer Virology and Hacking Techniques, 2017.
[18] S. Sibi Chakkaravarthy, D. Sangeetha, V. Vaidehi, “A survey on malware analysis and mitigation techniques”, Computer Science Review, pp. 1-23, 2019.
[19] Y. Ding, X. Xia, S. Chen, Y. Li, “A malware detection method based on family behavior graph”, Computer Security, pp. 73-86, 2018.
[20] W. Halfond, A. Orso, “Malware detection”, pp. 85-109, 2007.
[21] A. Ray, “Introduction to Malware and Malware Analysis: A brief overview”, pp. 22-30, 2016.
[22] N. Kawaguchi, K. Omote, “Malware function classification using apis in initial behavior”, in Proceedings of 10th Asia Joint Conference on Information Security, pp. 138-144, 2015.
[23] U. Bayer, E. Kirda, C. Kruegel, “Improving the efficiency of dynamic malware analysis”, in Proceedings of the ACM Symposium on Applied Computing, pp. 1871, 2010.
[24] Rad, B.B., Masrom, M., Ibrahim, S., “Camouflage in malware: from encryption to metamorphism”, International Journal of Computer Science Network Security, pp. 74-83, 2012.
[25] Ye, Y., Wang, D., Li, T., Ye, D., “IMDS: intelligent malware detection system”, in Proceedings of 13th ACM SIGKDD International Conference on Knowledge Discovery, Data Mining, pp. 1043-1047, 2010.
[26] M. Zubair Shafiq, Syed Ali Khayam, and Muddassar Farooq, “Embedded Malware Detection Using Markov n-Grams”, in Detection of Intrusions and Malware, and Vulnerability Assessment, Springer Berlin, Heidelberg, 2008.
[27] R. Moskovitch, C. Feher, N. Tzachar, E. Berger, M. Gitelman, “Unknown malcode detection using OPCODE representation”, Intelligence and Security Informatics, pp. 204-215, 2008.
[28] K. Griffin, S. Schneider, X. Hu, T.-c. Chiueh, “Automatic generation of string signatures for malware detection”, pp. 101-120, 2009.
[29] Nataraj, L., Karthikeyan, S., Jacob, G., Manjunath, B.S., “Malware images: visualization and automatic classification”, in Proceedings of the 8th International Symposium on Visualization for Cyber Security, ACM, New York, USA, pp. 4:14, 2011.
[30] A. Shabtai, R. Moskovitch, C. Feher, S. Dolev, Y. Elovici, “Detecting unknown malicious code by applying classification techniques on OpCode patterns”, Security Information, 2012.
[31] Chandrasekar Ravi, R Manoharan, “Malware Detection using Windows Api Sequence and Machine Learning”, International Journal of Computer Applications, Volume 43, April 2012.
[32] I. Santos, J. Devesa, F. Brezo, J. Nieves, P. G. Bringas, “Opem: A static-dynamic approach for machine-learning-based malware detection”, in CISIS ’12-ICEUTE, pp. 271-280, 2013.
[33] P. M. Comar, L. Liu, S. Saha, P. N. Tan, A. Nucci, “Combining supervised and unsupervised learning for zero-day malware detection”, in Proceedings INFOCOM, IEEE, pp. 2022-2030, 2013.
[34] Uppal, D., Sinha, R., Mehra, V., Jain, V., “Malware detection and classification based on extraction of API sequences”, in Proceedings of International Conference on Advanced Computer Commun. Informatics, pp. 2337-2342, 2013.
[35] Salehi, Z., Sami, A., Ghiasi, M., “Using feature generation from api calls for malware detection”, Computer Fraud & Security, pp. 9-18, 2014.
[36] Joseph Sexton, Curtis Storlie, Blake Anderson, “Subroutine based detection of APT malware”, Journal of Computer Virology Hacking Techniques, pp. 1-9, 2015.
[37] Saxe, J., Berlin, K., “Deep neural network based malware detection using two dimensional binary program features”, in 10th International Conference on Malicious and Unwanted Software (MALWARE), IEEE, pp. 11-20, 2015.
[38] U. Narra, F. Di, T. Visaggio, A. Corrado, T.H. Austin, M. Stamp, “Clustering versus SVM for malware detection”, Journal of Computer Virology Hacking Techniques, pp. 213-224, 2016.
[39] Ahmadi, M., Ulyanov, D., Semenov, S., Trofimov, M., Giacinto, “Novel feature extraction, selection and fusion for effective malware family classification”, in Proceedings of the Sixth ACM Conference on Data and Application Security and Privacy, pp. 183-194, 2016.
[40] Kolosnjaji, B., Zarras, A., Webster, G., Eckert, “Deep learning for classification of malware system call sequences”, in Lecture Notes of Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), pp. 137-149, 2016.
[41] B.N. Narayanan, O. Djaneye-Boundjou, T.M. Kebede, “Performance analysis of machine learning and pattern recognition algorithms for Malware classification”, in IEEE National Aerospace and Electronics Conference (NAECON) and Ohio Innovation Summit (OIS), pp. 338-342, 2016.
[42] S.D. Nikolopoulos, I. Polenakis, “A graph-based model for malware detection and classification using system-call groups”, Journal of Computer Virology Hacking Techniques, pp. 29-46, 2017.
[43] Xu, Z., Ray, S., Subramanyan, P., Malik, “Malware detection using machine learning based analysis of virtual memory access patterns”, Design, Automation & Test in Europe Conference & Exhibition, IEEE, pp. 169-174, 2017.
[44] E. Raff, J. Sylvester, and C. Nicholas, “Learning the PE Header, Malware Detection with Minimal Domain Knowledge”, in Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, NY, USA: ACM, pp. 121-132, 2017.
[45] Kotov, V., Wojnowicz, “Towards generic deobfuscation of windows api calls”, arXiv preprint arXiv:1802.04466, 2018.
[46] Q. Le, O. Boydell, B. Mac, M. Scanlon, “Deep learning at the shallow end: Malware classification for non-domain experts”, Digital Investigation, pp. S118-S126, 2018.
[47] Nguyen, M.H., Le Nguyen, D., Nguyen, X.M., Quan, T.T., “Autodetection of sophisticated malware using lazy-binding control flow graph and deep learning”, Computer Security, pp. 128-155, 2018.
[48] M. Krcal, O. Svec, M. Balek, and O. Jasek, “Deep Convolutional Malware Classifiers Can Learn from Raw Executables and Labels Only”, in ICLR Workshop, 2018.
[49] Ni, S., Qian, Q., Zhang, R., “Malware identification using visualization images and deep learning”, Computer Security, 2018.
[50] Hemant Rathore, Swati Agarwal, Sanjay K. Sahay and Mohit Sewak, “Malware Detection using Machine Learning and Deep Learning”, International Conference on Big Data Analytics, Springer, LNCS, Vol. 11297, pp. 402-411, 2018.
[51] O. Suciu, S. E. Coull, and J. Johns, “Exploring adversarial examples in malware detection”, in IEEE Security and Privacy Workshops (SPW), pp. 8-14, 2019.
[52] Yuxin, D., Siyi, Z., “Malware detection based on deep learning algorithm”, Neural Comput. Appl. 31(2), pp. 461-472, Feb. 2019.
[53] M. Rabbani, Y.L. Wang, R. Khoshkangini, H. Jelodar, R. Zhao, P. Hu, “A hybrid machine learning approach for malicious behaviour detection and recognition in cloud computing”, Journal of Network and Computer Applications, 2020.
[54] C. Yucel, A. Koltuksuz, “Imaging and evaluating the memory access for malware”, Forensic Science International Digital Investigation, 2020.
[55] Vasan, D., Alazab, M., Wassan, S., Safaei, B., Zheng, Q., “Windows malware detection using anomaly detection algorithms”, 2021.
[56] Kumar, A., Singh, B., Gupta, D., “Hybrid Malware Detection for Android using Static and Dynamic Analysis with RNN”.
[57] D. Gibert, A. Smith, and M. Jones, “Enhanced Malware Detection through Interpretable Deep Learning and Gradient Boosting”, in Proceedings of the 2023 IEEE Symposium on Security and Privacy (SP), pp. 123-137, 2023.
[58] Tabish, S.M., Shafiq, M.Z., Farooq, “Malware detection using statistical analysis of byte-level file content”, in Proceedings of the ACM SIGKDD Workshop on CyberSecurity and Intelligence Informatics, pp. 23-31, 2009.
[59] A. K. Verma and S. K. Sharma, “Malware Detection Approaches using Machine Learning Techniques-Strategic Survey”, 2021 3rd International Conference on Advances in Computing, Communication Control and Networking (ICAC3N), Greater Noida, India, pp. 1958-1962, 2021.