A Comprehensive Survey On Identification of Malware Types and Malware Classification Using Machine Learning Techniques

This document presents a comprehensive survey on malware identification and classification using machine learning techniques, highlighting the increasing sophistication of malware and the challenges faced by security experts. It categorizes various types of malware and discusses traditional and advanced machine learning methods for detection, including static, dynamic, and memory analysis. The paper aims to provide insights into the current trends, limitations, and future research directions in the field of malware detection.

Proceedings of the Second International Conference on Smart Electronics and Communication (ICOSEC).

IEEE Xplore Part Number: CFP21V90-ART; ISBN: 978-1-6654-3368-6

A Comprehensive Survey on Identification of Malware Types and Malware Classification Using Machine Learning Techniques

2021 2nd International Conference on Smart Electronics and Communication (ICOSEC) | 978-1-6654-3368-6/21/$31.00 ©2021 IEEE | DOI: 10.1109/ICOSEC51865.2021.9591763
Nagababu Pachhala1, S. Jothilakshmi2, Bhanu Prakash Battula3
1 Research Scholar, Department of IT, Annamalai University, Annamalainagar, 608002, Tamil Nadu, India
2 Associate Professor, Department of IT, Annamalai University, Annamalainagar, 608002, Tamil Nadu, India
3 Professor, Department of CSE, KKR & KSR Institute of Technology & Sciences, Guntur-522017, Andhra Pradesh, India
1 [email protected]  2 [email protected]  3 [email protected]

Abstract
Malware is malicious code that affects the user or device and allows an attacker to do significant harm to the machine. Malware increases in number and severity with each passing day, posing a major danger to the security of the Internet. This is a never-ending fight between security experts and malware producers, with the sophistication of malware increasing at the same rate as technological advancement. Current state-of-the-art research focuses on the development and use of machine learning methods for malware detection, owing to the capacity of these techniques to keep up with malware evolution and with the speed of technological advancement. The purpose of this study is to provide a systematic and comprehensive review of machine learning methods for malware detection, with a special emphasis on deep learning techniques, in order to aid in the identification of malware. The paper's primary contributions are: (i) it provides a comprehensive description of the methods and features used in a traditional machine learning workflow for malware detection and classification; (ii) it examines the challenges and limitations of traditional machine learning; (iii) it examines recent trends and progress in the field, with a particular emphasis on deep learning approaches; (iv) it addresses the research problems and unresolved obstacles associated with state-of-the-art methods; and (v) it discusses the future directions of study in the field. The survey results give researchers a better understanding of malware detection and of the new advances and research paths being explored by the scientific community to combat the issue.

Keywords: Malicious Software, System Damage, Antivirus Software, Malware Types, Malware Detection, Static Analysis, Dynamic Analysis, Machine Learning.

I. INTRODUCTION
Malware is the general term for harmful software. It is dangerous code that affects the user or the computer and lets an attacker damage the machine. Malware takes the form of viruses, Trojans, rootkits, ransomware, worms, botnets, spyware, adware, keyloggers, etc., and an extensive array of their families spreads online every day. According to a study conducted by the AV-Test Institute, 350,000 new hazardous applications and programmes are reported every day. The malware statistics documented and registered for 2020 count 897 million malicious samples, each of which is categorised and stored.

Today, the world is evolving into a digital age [1] where cyber technology is an integral part of everyday life. The use of computers and the Internet includes computing, access to knowledge, and technologies such as the Internet of Things (IoT), cryptocurrency, etc. Today's world discusses the digital economy for cyber collections [2]; such deep computer involvement and many other innovations present the digital world with new challenges. Malicious software, also known as malware, is a program intended to target computer systems through device hijacking, file deletion, theft, spamming, and further malware downloads. The list of malicious activities is extensive and grows rapidly with new entries.

With the tremendous growth rate of cybercrime, it is clearly unreasonable to study and grasp the enormous volume of malware [3] [4] manually. Analysts are aided by the fact that malware developers generate very little original software and primarily reuse existing code and code patterns in new malware. The biggest downside and challenge for analysts is the malware operation to inherit the patterns and similarities between the ways
in question. To take advantage of the similarities and expected trends in malware, the anti-malware industry has begun to use machine learning, in which machines are trained to discover and recognise the inherited patterns. Machine learning and malware detection are fields with several overlaps.

As the Internet grows quickly, malware is now one of the biggest cyber hazards. Any program that performs malicious actions such as information stealing, snooping, etc. may be referred to as malware. Kaspersky Lab [5] described malware as a "computer program designed to infect and multiply harm to a legitimate user's computer." Although the diversity of malware is rising, anti-virus scanners cannot meet the security needs, which leads to millions of hosts being attacked. In 2019, according to Kaspersky Labs, 12,989,287 hosts had been targeted and separate malware items were found. In particular, Juniper Research (2016) forecasts that the cost of data breaches will rise worldwide to $3.7 billion in 2020.

Furthermore, due to the high availability of attacking resources on the Internet, the degree of competence necessary for malware creation decreases. A high level of anti-detection techniques and the ability to purchase black-market malware give everyone a chance to become an attacker [6], regardless of skill level. Current studies have shown that more and more attacks are produced by script kiddies or are automated. Therefore, malware protection of computer systems for individual users and companies is one of the most critical cyber security tasks, since even a single attack may cause data compromise and significant loss. The need for reliable and prompt detection methods [7] is dictated by massive losses and repeated attacks. Current static and dynamic processes, especially when dealing with zero-day attacks, do not provide efficient detection. Machine learning-based methods can also be used. Figure 1 shows several malware detection approaches.

Figure 1: Approaches for malware detection

Malware is divided into two groups: malware of the first generation and malware [8] of the second generation. The malware category depends on how the device, its functionality, and the process are affected. In the first generation, the malware structure remains the same. In contrast, second-generation malware changes its form while maintaining its activity, resulting in a new variant after each iteration. It is hard to detect and isolate this complex property of the malware. Signature-based, heuristic-based, normalization, and machine learning techniques are the most effective for malware detection. Machine learning has become a well-known solution for malware defenders in recent years.

II. TYPES OF MALWARE AND DIFFERENT MALWARE ANALYSIS MODELS
2.1 Malware types

It is helpful to define the problem so that malware methodologies and reasoning are better understood. Depending on its function, malware may be divided into different groups. The types of malware are:

Virus: This is the simplest type of malware. It is any piece of software that is loaded and launched without user permission while replicating itself or modifying other software.

Worm: This form of malware is very much like a virus. The difference is that a worm can propagate to other machines across the network.

Trojan: This malware class describes malware types that appear as legitimate software. Social engineering is therefore the usual propagation vector for this class, making people believe that they are downloading legitimate apps.

Adware: The only aim of this type of malware is to display ads on the computer. Adware may also be viewed as a subset of spyware, and its aim is to create revenue for its developers.

Spyware: As the name suggests, spyware is malware that spies on the user. Typical spyware practices include monitoring the search history to deliver custom advertising and tracking activities in order to sell the data to third parties.

Rootkit: Its functionality allows the intruder to access data with higher permissions than permitted, for example by giving administrative access to an unauthorized user. Rootkits always hide their presence and often go unnoticed, making them extremely challenging to find and remove.

Backdoor: A backdoor is a kind of malware that gives attackers an alternative way to access the device. It does no harm by itself but provides attackers with a wider attack surface. Backdoors are therefore rarely used on their own; they usually precede other malware attacks.

Keylogger: This malware class aims at logging all user-pressed keys and thus storing the data, including
passwords, bank card numbers, and other sensitive data.

Ransomware: This malware is meant to encrypt all the data on the computer and asks the victim to transfer a certain amount of money to get the decryption key. Typically, a ransomware-infected computer is "frozen," so the user cannot access any file, and the screen is used to display the attackers' demands.

2.2 Different malware analysis models

There are three types of malware analysis: static, dynamic, and memory malware analysis, represented in Figure 2.

Malware Analysis
Static Analysis: without executing malicious code. Tools: PEiD, Yara, OllyDbg, etc.
Dynamic Analysis: while executing malicious code. Tools: Detours, Cuckoo Sandbox, etc.
Memory Analysis: after executing malicious code. Tools: Volatility, Pin tools, etc.

Figure 2: Types of Malware Analysis

Static Malware Analysis
Inspecting or checking malicious code without executing it is called static malware analysis. It is a signature-focused form of malware analysis. Features such as metadata strings, code, and imported libraries are extracted from the dormant malware and used in the feature selection or feature extraction step of machine classification. The file types for static malware analysis are most likely exe, DLL, documents, assembly code, byte code, etc.; static features are extracted from these file types as the output. For static malware analysis, tools like PEiD, ssdeep, pafish, Yara, strings, IDA Pro, OllyDbg, OllyDump, and many more can be used.

Dynamic Malware Analysis
To eliminate the malware or stop it from spreading to other systems, dynamic analysis helps by running the malware sample and analysing its activity on the system. Dynamic malware analysis tools used to extract the malicious code's dynamic features include CWSandbox, Anubis, CAT, TRACKTRAK, etc.

Memory Malware Analysis
The examination of malicious code [9] [10] after it has been executed is known as memory malware analysis. Memory analysis features include shared resources, application programs, hooking detection, network services, rootkit links, hidden objects, injected code, etc. Memory analysis tools include Volatility, Pin tools, Valgrind, etc.

This survey aims to review and systematize existing literature to promote malware analysis using machine learning techniques.

III. MACHINE LEARNING MODELS

Machine learning can be described as a discipline of artificial intelligence concerned with the analysis and implementation of algorithms that can be trained with data. It is a method of data analysis that can be used for making predictions from data.

3.1 Broad categories of ML techniques:
Supervised learning
Supervised learning is a machine learning task that tries to infer a function from a large set of labelled data. Investigators often classify their findings using the term "diagnosis." Some well-known examples of supervised learning algorithms are Bayesian networks, decision trees, k-Nearest Neighbour (KNN), Support Vector Machine (SVM) and Artificial Neural Networks (ANN).

Unsupervised learning
Unsupervised learning seeks to discover hidden structure in unlabelled data. Clustering is its most common form. Some commonly used unsupervised learning algorithms are k-means clustering, hierarchical clustering, neighbourhood-based methods, and the self-organizing map (SOM).

Reinforcement learning
Reinforcement learning involves an agent that interacts with a dynamic environment to attain a specific objective.

3.2 Different types of machine learning classifiers
3.2.1 Single classifiers
An intrusion detection system that is designed using only one machine learning algorithm or approach is known as a single classifier. Some of the commonly used single classifiers are as follows.

Decision Tree
Decision tree is a kind of classification algorithm that learns to build a model from an already known
categorised dataset. Each item in the data is described by its attribute values. The classification can be seen as a mapping between a set of attributes and a specific class. Decision trees are one example of a classification algorithm [2]. Decision tree is very powerful, simple and easy to implement.

K-Nearest Neighbours
K-Nearest Neighbours is an old and simple method to categorise samples [11]. It computes the approximate distance between different points on the input vectors. Each unlabelled point is then assigned to the class of its K nearest neighbours. K-Nearest Neighbours is also called instance-based learning.

SVM
Boser, Guyon and Vapnik introduced SVM at COLT-92. It is a supervised learning method that can be applied to classification and regression analysis. SVM for intrusion detection is based on the idea of decision planes, which define decision boundaries. A decision plane separates a set of objects with different class memberships. The labelled training items are split into a normal class of objects, with the remainder treated as attack entities. SVM detects intrusion on the basis of recognized attack patterns [6].

ANN
An ANN is a system that simulates the functionality of human brain neurons [12]. Neural networks are organised into input, hidden and output layers. Layers are composed of numerous interlinked 'nodes', each with an 'activation function'. Patterns are presented to the network through the 'input layer'. This layer then communicates with the hidden layers, where processing is done by a system of weighted 'connections'. The hidden layers then connect to the output layer, which produces the output (the detection result).

Naïve Bayes
The Naïve Bayes model applies Bayes' theorem with independence assumptions between predictors [13]. This means that the predictors are assumed to be conditionally independent of one another given the class. Building a Naïve Bayes model is easy and needs no complex iterative parameter estimation. This makes it very useful for very large datasets. Naïve Bayes classifiers are fast compared to more sophisticated methods.

Fuzzy Logic
Dr. Lotfi Zadeh introduced fuzzy logic in 1965 at UC Berkeley. The idea of fuzzy logic is based on "degrees of truth" rather than the usual "true or false" (0 or 1). The truth value in fuzzy logic is a real number ranging from 0 to 1. In fuzzy space, an object is allowed to belong to several classes at the same time. This makes fuzzy logic a good option for intrusion detection. Because of the unpredictable character of intrusions, fuzzy logic plays a prominent role in identifying harmful events. It also reduces false alarm rates [14].

Genetic Algorithms
A genetic algorithm is a programming approach that imitates biological evolution to solve problems [15]. Many biological concepts such as mutation, crossover, inheritance and selection are mimicked and used to derive solutions. Darwin's survival of the fittest is used to obtain an optimal solution from a large population of candidates [16].

3.2.2 Hybrid Classifiers
A hybrid classifier is one in which more than one machine learning algorithm is combined to improve the detection efficiency of an Intrusion Detection System (IDS). In some cases, it involves both clustering-based and classification-based techniques, i.e., both supervised and unsupervised learning. A hybrid classifier helps in both intrusion and anomaly detection [17].

3.2.3 Ensemble Classifier
Ensemble learning is a method of combining several base classifiers into a new classifier that performs better than any individual base classifier. An ensemble classifier produces a better classification result with acceptable solutions for many applications [18].

Table-1: Comparison of different ML models in IDS

Machine learning approach: Decision Tree
Merits: Works well with datasets. High detection accuracy. Decision tree construction doesn't require any knowledge. Decision tree representation is easy to understand.
Demerits: Trees that are formed from numeric datasets are not easy to analyse or understand. The output attribute should be categorical. Output attribute is restricted to one. Algorithms used are unstable.

Machine learning approach: k-Nearest neighbour
Merits: Quick calculation time. Simple algorithm to interpret.
Demerits: Needs large storage requirements. Slow in classifying test tuples. Highly susceptible to the curse of dimensionality.

Machine learning approach: Support vector machine
Merits: Greater learning potential for small samples. Rate of decision and training is high. Insensitive to the extent of data inputs. Accuracy is very high. Can model even very complex decision boundaries which are not linear. Compared to other techniques it is less likely to suffer from overfitting.
Demerits: Training can be extensive and prolonged. The attack detected doesn't provide extra information about the intrusion, as a binary type of classifier is used. High algorithmic complexity. Extensive memory requirements.

Machine learning approach: Artificial neural networks
Merits: Can generalize from limited, noisy and incomplete data. Does not require expert knowledge to find new attacks or intrusions. Requires less formal statistical training. Implicitly detects complex non-linear associations between dependent and independent variables. Exhibits more tolerance towards noisy data.
Demerits: The process of training is very slow, hence not suitable for detection in real time. Overfitting is likely to occur during training of the neural network. Greater computational burden. Prone to overfitting and requires long training time.

Machine learning approach: Naïve Bayes
Merits: It is simple and straightforward to apply. It doesn't require as much data for training. It can deal with both continuous and discrete data. It can handle many predictors and data points. It is quick and may be used to make predictions in real time.
Demerits: Features that are continuous are complex to handle. Classifiers are not effective when the prior knowledge is wrong. Lack of available probability data. Its findings seem to be like those obtained from threshold-based systems. High computational effort is required. Data attributes are assumed to be conditionally independent.

Machine learning approach: Fuzzy Logic
Merits: Reasoning is approximate rather than exact. Can be used powerfully against port scans. Linguistic variables are used. Permits vague inputs. Allows indistinct thresholds. Reconciles conflicting objects. Rule base or fuzzy sets can be smoothly amended.
Demerits: Involves more consumption of resources. Updating rules dynamically is difficult. Building a model from a fuzzy system is tiring. Demands more fine tuning and simulation before operation.

Machine learning approach: Genetic Algorithms
Merits: Have the ability to derive the best classification rule and optimal parameters. Biologically inspired and employs an evolutionary algorithm. It applies to a broad range of optimization problems. It handles problems with multiple solutions. Easily adapted to existing models.
Demerits: Genetic Algorithm cannot assure constant optimization response time. Over-fitting problems. No global optimum.

Table-1 describes different machine learning models with their advantages and limitations in handling malware data.

IV. LITERATURE SURVEY
The malware detection architecture proposed by Gupta et al. [1] produced a dataset of 0.7 million files that contains 0.18 million clean files and 0.61 million samples of malware from sources such as VXHeaven, Nothing, VirusShare, etc. After collecting malware samples, automated malware analysis is carried out using Cuckoo sandboxes to evaluate the malware's actions during execution. The data collection involves a series of Python scripts. The findings are in JavaScript Object Notation (JSON) format. Python in Apache Spark extracts the static and dynamic features from the JSON reports.

Burnap et al. [2] developed a new approach for categorising files using self-organizing maps and for reducing overfitting in the course of training. The data set was compiled with the VirusTotal API. In and
under different conditions, Random Forest, BayesNet, MLP and Support Vector Machine classifiers are used. The malware samples from the VirusTotal dataset are first evaluated over 15 cross-validation cycles, with Random Forest reaching 96% precision.

AlAhmadi et al. [3] proposed a new technique for malware classification. It is a three-phase process. In the first stage, malware variants are fed into network traffic and extracted thereafter. The variant families are built and encoded using the network change, and the input instability is verified. When similarities are derived from the malware family, the sequence is extracted and the flow values are compared using binary similarity, Levenshtein distance, cosine similarity, inter-flow distance, and N-flow mining, which together form the outcome of the second phase. In the third phase, a model is trained and created for profile extraction characteristics. Machine learning methods like KNN and Random Forest are utilised in the classification process. Finally, it is noted that the proposed classifier achieves 95.5 percent accuracy.

Khan et al. [4] presented a malware detection system. The methodologies for malware sensing are employed in remote and regional analyses. A file is checked as malicious or benign with the help of signatures. Various anti-virus tools are utilised for the analysis of malware and APIs during isolated inspections. Analysis includes the use of anti-virtual machines, anti-debuggers, analysis of URLs, string analysis and packing. Ronen et al. [5] describe the standard data set that was released as a malware classification challenge for a Kaggle competition. Ye et al. [6] presented a survey of intelligent malware detection using data mining techniques. They describe feature extraction and classification as the two crucial phases in the analysis and detection of malware. They reviewed research activities from 2011 to 2016, including issues related to malware identification and data mining solutions.

Wang et al. [7] introduced the design and implementation of a sandbox, extractor and classifier. Three main steps are considered, covering the tasks of collectors, mining workers and classifiers. The PinFWSandbox module in the collector collects dynamic data and log file data and passes them to the extractor stage, along with the static analysis and execution results. The extractor extracts all static characteristics, as well as dynamic instructions and features. The classifier integrates all models, including the product of individual model classifications, system call output classifications, and instantaneous classification with dynamic outcomes.

Pai et al. [8] used clustering algorithms to classify malware. Static characteristics and their ratings are derived from the opcode sequences. These static characteristics are used to classify malware using the K-means, Expectation-Maximization, and Hidden Markov algorithms. Among the clustering algorithms, Expectation-Maximization gives the best accuracy. Makandar et al. [9] summarize malware analysis and detection technology for various malware types.

An automatic system for detecting unknown malware samples using neural networks is given by Kosmidis et al. [10]. The malware is classified by perceptron, decision tree, nearest centroid, stochastic gradient, multilayer perceptron, and random forest algorithms. For Random Forest, average accuracy and test time are also taken into account as parameters. Sahay et al. [11] grouped malware executables using optimal K-means clustering and used these groups as training features for detection. They concluded that the proposed approach provides 78 percent accuracy in finding unknown malware. Ahmadi et al. [12] used the Microsoft malware data collection and extracted hex-dump-based features and features from the disassembled data. XGBoost classification algorithms were used for classification. The authors registered an accuracy of 91.8 percent.

Drew et al. [13] performed polymorphic malware classification using the Super Threaded Reference-Free Alignment-Free N-sequence Decoder (STRAND). The Algorithm State Machines (ASM) sequence model was presented in their method, and the accuracy obtained by cross-validation was more than 98.59 percent. Souri et al. [14] analysed how traditional protection measures are not sufficient given the volume and diversity of malware variants. Dynamic analysis of malware during runtime may secure the model from malicious programming. This article proposed a framework to analyse malware [15] behaviour automatically using machine learning.

Hashemi et al. [16] proposed an entirely new Windows-platform-based solution for polymorphic malware detection. Polymorphic computer viruses are much more advanced and challenging to discover than their original versions, and it takes a lot of time to catch them. A two-stage approach is used to evaluate them: the first stage creates the API call sequences for both known and unknown malware; the second performs sequence restructuring and computes the distance between the two data points.

Souri et al. [17] proposed practical methods for constructing attack scenarios by correlating alerts through the preconditions and effects of intrusions. Their approach is based on the observation that alerts in a sequence of attacks are not isolated but linked to multiple phases, with earlier stages preparing for the later ones. They suggested a formal structure that represents alerts using the idea of hyper-alerts [18] with their conditions and consequences.
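
The hyper-alert idea can be illustrated with a minimal Python sketch. It assumes that each alert carries a set of prerequisite predicates and a set of consequence predicates, and that two alerts are linked when an earlier alert's consequence matches a later alert's prerequisite. The class name `HyperAlert`, the function `correlate`, and the example alerts and predicates are hypothetical and are not taken from [17] or [18].

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class HyperAlert:
    """One alert with the conditions it needs and the effects it may have."""
    name: str
    timestamp: float
    prerequisites: frozenset = field(default_factory=frozenset)
    consequences: frozenset = field(default_factory=frozenset)


def correlate(alerts):
    """Return (earlier, later) name pairs where a consequence of the
    earlier alert satisfies a prerequisite of the later alert."""
    ordered = sorted(alerts, key=lambda a: a.timestamp)
    edges = []
    for i, earlier in enumerate(ordered):
        for later in ordered[i + 1:]:
            strictly_later = later.timestamp > earlier.timestamp
            if strictly_later and earlier.consequences & later.prerequisites:
                edges.append((earlier.name, later.name))
    return edges


# Hypothetical alerts; names and predicates are illustrative assumptions.
alerts = [
    HyperAlert("port_scan", 1.0, frozenset(), frozenset({"open_port_known"})),
    HyperAlert("exploit", 2.0, frozenset({"open_port_known"}),
               frozenset({"root_access"})),
    HyperAlert("unrelated_login_failure", 2.5,
               frozenset({"valid_username_known"}), frozenset()),
    HyperAlert("data_exfiltration", 3.0, frozenset({"root_access"}), frozenset()),
]

scenario = correlate(alerts)
# scenario == [("port_scan", "exploit"), ("exploit", "data_exfiltration")]
```

The resulting pairs form a chain from the scan to the exfiltration, while the unrelated alert stays isolated, which is the kind of multi-phase linkage the paragraph above describes. A real correlation engine would also need typed predicates with arguments (hosts, ports) rather than plain strings.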

Several malware detectors have to disassemble malicious code to generate assembly code for analysis. Palumbo et al. [19] described scenarios in which malware masks instructions to avoid static analysis. They investigate static detection techniques by modelling the dynamic usage of the stack, which is used in metamorphic viruses. All the virus stages will be avoided if memory stacks are used in a more sophisticated way. Many viruses [20], for example, contain obscured call instructions that break static-analysis disassembly.

Narayanan et al. [21] established two common issues with behavioural block identification strategies and circumvention of monitoring points. For any commercial AV solution today, all these areas pose problems. The switch to the disc processor eliminates the difficulties of circumvention and allows partial solutions to the other issues. One feature that differs from host-level signature detection is that behaviour detectors only detect malware and do not detect other anomalies.

Wu et al. [22] explored a smartphone malware detection model modelled on a biological immune system, using both static and dynamic malware analysis. Due to the precisely assessed vector coding, the static and dynamic features are distinguished and antigens are created. In addition, 34 malware samples and 25 benign files were compiled as study samples.

Bat-Erdene et al. [23] introduced a technique to characterize the packing algorithms of unknown packers. Firstly, they estimated the entropy of a given executable and converted the entropy estimates of a particular memory region into symbolic representations. They used symbolic aggregate approximation, which is considered to be viable for large data conversions. Secondly, the symbolic representations are classified using supervised learning classification techniques, i.e., Naive Bayes and Support Vector Machines.

V. PROPOSED MODEL

The key advantage of behavioural detection systems is that all harmful files are identified by their call behaviour, which improves the prediction of malware. The key disadvantage of signature-based identification methods is the runtime overhead. There are three major platforms in the target environment: embedded systems, Windows and smartphones. Most behaviour detection research papers employed the mobile environment to mirror the approach for malware detection. However, traffic from genuine networks can involve any number of unknown attacks that could affect the test findings. Researchers and developers should join a single tested network that has different attacks, including those that have been implemented, and that has less background traffic. Therefore, a system is proposed for the detection and prevention of unknown malware attacks. The disadvantages indicated must be overcome by developing an efficient extraction model which enhances the accuracy of malware detection. To categorise malware and prevent it, thereby providing system security, an extractor model must be built for the rule set.

VI. CONCLUSION
With malicious software an increasing security threat, malware detection continues to be an essential research issue. A survey of current malware identification models using machine learning techniques was carried out. Malware detection systems have been compared and evaluated on the basis of a number of critical aspects, including classification approaches, analytical methodology, dataset number, precision and analysis. Research on malware detection has already shown that machine learning can classify malware correctly. It is hoped that more constructive learning methods are developed with machine learning, ensemble learning and deep learning. These contributions can be linked to important fields of inquiry. A new mix of aims, features and algorithms can be investigated in order to increase accuracy above the existing state of the art. Moreover, as some classes of algorithms have never been used for some purposes, new ways can be offered for further research. An investigation of malware can provide other ideas to be pursued. The entire field of study focuses on the development of appropriate malware testing standards. This paper provides a brief survey on the models available for malware detection. The new idea of malware analysis economics can drive future research routes when establishing a malware testing environment where appropriate tuning methods can be provided to balance conflicting metrics and improve the security levels in the network.

REFERENCES

[1]. Gupta, D., & Rani, R. (2018). Big Data Framework for Zero Day Malware Detection. Cybernetics and Systems, 49(2), 103-121.
[2]. Burnap, P., French, R., Turner, F., & Jones, K. (2018). Malware classification using self organizing feature maps and machine activity data. Computers & Security, 73, 399-410.
[3]. AlAhmadi, B. A., & Martinovic, I. (2018, May). MalClassifier: Malware family classification using network flow sequence behaviour. In APWG Symposium on Electronic Crime Research (eCrime), 2018 (pp. 1-13). IEEE.
[4]. Khan, M. H., & Khan, I. R. (2017). Malware Detection and Analysis. International Journal of Advanced Research in Computer Science, 8(5).
[5]. Ronen, R., Radu, M., Feuerstein, C., Yom-Tov, E., & Ahmadi, M. (2018). Microsoft Malware Classification Challenge. arXiv preprint arXiv:1802.10135.
[6]. Ye, Y., Li, T., Adjeroh, D., & Iyengar, S. S. (2017). A survey on malware detection using data mining techniques. ACM Computing Surveys (CSUR), 50(3), 41.
[7]. Wang, C., Ding, J., Guo, T., & Cui, B. (2017, November). A Malware Detection Method Based on Sandbox, Binary Instrumentation and Multidimensional Feature Extraction. In International Conference on Broadband and Wireless Computing, Communication and Applications (pp. 427-438). Springer, Cham.
[8]. Pai, S., Di Troia, F., Visaggio, C. A., Austin, T. H., & Stamp, M. (2017). Clustering for malware classification. Journal of Computer Virology and Hacking Techniques, 13(2), 95-107.
[9]. Makandar, A., & Patrot, A. (2017). Overview of malware analysis and detection. In IJCA proceedings on national conference on knowledge, innovation in technology and engineering, NCKITE (Vol. 1, pp. 35-40).
[10]. Kosmidis, K., & Kalloniatis, C. (2017, September). Machine Learning and Images for Malware Detection and Classification. In Proceedings of the 21st Pan-Hellenic Conference on Informatics (p. 5). ACM.
[11]. S. K. Sahay and A. Sharma, “Grouping the Executables to Detect Malwares with High Accuracy,” Procedia Computer Science, vol. 78, no. June, pp. 667–674, 2016.
[12]. M. Ahmadi, D. Ulyanov, S. Semenov, M. Trofimov, and G. Giacinto, “Novel Feature Extraction, Selection and Fusion for Effective Malware Family Classification,” ACM Conference Data Application Security Priv., pp. 183–194, 2016.
[13]. J. Drew, M. Hahsler, and T. Moore, “Polymorphic malware detection using sequence classification methods and ensembles,” EURASIP J. Inf. Secur., vol. 2017, no. 1, p. 2, 2017.
[14]. Souri A, Norouzi M, Asghari P (2017) An analytical automated refinement approach for structural modeling large-scale codes using reverse engineering. Int J Inf Technol 9:329–333. https://doi.org/10.1007/s41870-017-0050-7
[15]. Souri A, Navimipour NJ, Rahmani AM (2017) Formal verification approaches and standards in the cloud computing: a comprehensive and systematic review. Comput Stand Interfaces. https://doi.org/10.1016/j.csi.2017.11.007
[16]. Hashemi H, Azmoodeh A, Hamzeh A, Hashemi S (2017) Graph embedding as a new approach for unknown malware detection. J Comput Virol Hacking Tech 13:153–166. https://doi.org/10.1007/s11416-016-0278-y
[17]. Souri A, Asghari P, Rezaei R (2017) Software as a service based CRM providers in the cloud computing: challenges and technical issues. J Serv Sci Res 9:219–237. https://doi.org/10.1007/s12927-017-0011-5
[18]. Chowdhury M, Rahman A, Islam R (2018) Malware analysis and detection using data mining and machine learning classification. In: Abawajy J, Choo K-KR, Islam R (eds) International conference on applications and techniques in cyber security and intelligence: applications and techniques in cyber security and intelligence. Springer International Publishing, Cham, pp 266–274
[19]. Palumbo P, Sayfullina L, Komashinskiy D, Eirola E, Karhunen J (2017) A pragmatic android malware detection procedure. Comput Secur 70:689–701. https://doi.org/10.1016/j.cose.2017.07.013
[20]. Mohamed GAN, Ithnin NB (2018) SBRT: API signature behaviour based representation technique for improving metamorphic malware detection. In: Saeed F, Gazem N, Patnaik S, Saed Balaid AS, Mohammed F (eds) Recent trends in information and communication technology. Proceedings of the 2nd international conference of reliable information and communication technology (IRICT 2017). Springer International Publishing, Cham, pp 767–777
[21]. Kumar, S. A., Babu, E. S., Nagaraju, C., & Gopi, A. P. (2015). An empirical critique of on-demand routing protocols against rushing attack in MANET. International Journal of Electrical and Computer Engineering, 5(5).
[22]. Narayanan A, Chandramohan M, Chen L, Liu Y (2017) A multi-view context-aware approach to Android malware detection and malicious code localization. Empir Softw Eng. https://doi.org/10.1007/s10664-017-9539-8
[23]. Wu B, Lu T, Zheng K, Zhang D, Lin X (2014) Smartphone malware detection model based on artificial immune system. China Commun 11:86–92.
[24]. Bat-Erdene M, Park H, Li H, Lee H, Choi MS (2017) Entropy analysis to classify unknown packing algorithms for malware detection. Int J Inf Secur 16(3):227–248.
[25]. Cui B, Jin H, Carullo G, Liu Z (2015) Service-oriented mobile malware detection system based on mining strategies. Pervasive Mob Comput 24:101–116.
[26]. Fan Y, Ye Y, Chen L (2016) Malicious sequential pattern mining for automatic malware detection. Expert Syst Appl 52:16–25. https://doi.org/10.1016/j.eswa.2016.01.002
[27]. Martín A, Menéndez HD, Camacho D (2016) MOCDroid: multi-objective evolutionary classifier for Android malware detection. Soft Comput 21:7405–7415.
[28]. Sarada, K., Narayana, V. L., Gopi, P., & Pavani, V. (2020). An iterative group based anomaly detection method for secure data communication in networks. Journal of Critical Reviews, 7(6), 208-212.
[29]. Gopi, A., Babu, E. S., Raju, C. N., & Kumar, S. A. (2015). Designing an Adversarial Model Against Reactive and Proactive Routing Protocols in MANETS: A Comparative Performance Study. International Journal of Electrical & Computer Engineering (2088-8708), 5(5).
[30]. Shakya, Subarna, Lalitpur Nepal Pulchowk, and S. Smys. “Anomalies Detection in Fog Computing Architectures Using Deep Learning.” Journal: Journal of Trends in Computer Science and Smart Technology March 2020, no. 1 (2020): 46-55.
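
The entropy-based packing analysis attributed to Bat-Erdene et al. in the related work can be illustrated with a small sketch. It is not the authors' implementation: the function names, the non-overlapping 256-byte window, the four-letter alphabet and the eight segments are all assumptions chosen for illustration. The sketch computes Shannon entropy over windows of a byte string and converts the resulting series into a symbolic word with symbolic aggregate approximation (SAX). The subsequent classification step (Naive Bayes or Support Vector Machines) is omitted.

```python
import math
from collections import Counter

# Standard Gaussian breakpoints for a 4-symbol SAX alphabet (quartiles of N(0, 1)).
BREAKPOINTS = (-0.6745, 0.0, 0.6745)
ALPHABET = "abcd"


def shannon_entropy(block: bytes) -> float:
    """Shannon entropy of a byte block, in bits per byte (0.0 to 8.0)."""
    if not block:
        return 0.0
    total = len(block)
    return sum(-(n / total) * math.log2(n / total)
               for n in Counter(block).values())


def entropy_series(data: bytes, window: int = 256) -> list:
    """Entropy of consecutive, non-overlapping windows (window size is an assumption)."""
    return [shannon_entropy(data[i:i + window])
            for i in range(0, len(data), window)]


def sax(series: list, segments: int = 8) -> str:
    """SAX: z-normalise, average into `segments` pieces, map each mean to a letter."""
    n = len(series)
    if n == 0:
        raise ValueError("series must not be empty")
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    z = [(x - mean) / std if std > 0 else 0.0 for x in series]
    word = []
    for s in range(segments):
        lo, hi = s * n // segments, (s + 1) * n // segments
        # Fall back to the nearest value when the series is shorter than `segments`.
        chunk = z[lo:hi] or [z[min(lo, n - 1)]]
        avg = sum(chunk) / len(chunk)
        # The number of breakpoints below the segment mean selects the symbol.
        word.append(ALPHABET[sum(avg > b for b in BREAKPOINTS)])
    return "".join(word)


# Hypothetical input: a low-entropy region followed by a high-entropy region,
# loosely mimicking plain data followed by compressed or encrypted data.
sample = bytes(1024) + bytes(range(256)) * 4
print(sax(entropy_series(sample)))  # aaaadddd
```

In this toy input the jump from `a` to `d` marks the transition from the low-entropy region to the high-entropy region. A classifier would take such symbolic words, or features derived from them, as its input.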


