Proceedings of the Second International Conference on Smart Electronics and Communication (ICOSEC).

IEEE Xplore Part Number: CFP21V90-ART; ISBN: 978-1-6654-3368-6

A Comprehensive Survey on Identification of Malware Types and Malware Classification Using Machine Learning Techniques

2021 2nd International Conference on Smart Electronics and Communication (ICOSEC) | 978-1-6654-3368-6/21/$31.00 ©2021 IEEE | DOI: 10.1109/ICOSEC51865.2021.9591763
Nagababu Pachhala (1), S. Jothilakshmi (2), Bhanu Prakash Battula (3)
(1) Research Scholar, Department of IT, Annamalai University, Annamalainagar, 608002, Tamil Nadu, India
(2) Associate Professor, Department of IT, Annamalai University, Annamalainagar, 608002, Tamil Nadu, India
(3) Professor, Department of CSE, KKR & KSR Institute of Technology & Sciences, Guntur-522017, Andhra Pradesh, India
(1) [email protected] (2) [email protected] (3) [email protected]

Abstract
Malware is malicious code that affects the user or device and allows an attacker to do significant harm to the machine. Malware increases in number and severity with each passing day, posing a major danger to the security of the Internet. This is a never-ending fight between security experts and malware producers, with the sophistication of malware increasing at the same rate as technological advancement. Current state-of-the-art research focuses on the development and use of machine learning methods for malware detection, owing to the capacity of these techniques to keep pace with malware evolution and with the speed of technological advancement. The purpose of this study is to provide a systematic and comprehensive review of machine learning methods for malware detection, with a special emphasis on deep learning techniques, in order to aid in the identification of malware. The paper's primary contributions are: (i) it provides a comprehensive description of the methods and features used in a traditional machine learning workflow for malware detection and classification; (ii) it examines the challenges and limitations of traditional machine learning; (iii) it examines recent trends and progress in the field, with a particular emphasis on deep learning approaches; (iv) it addresses the research problems and unresolved obstacles associated with state-of-the-art methods; and (v) it discusses the future directions of study in the field. The survey results give researchers a better knowledge of malware detection and of the new advances and research paths being explored by the scientific community to combat the issue.

Keywords: Malicious Software, System Damage, Antivirus Software, Malware Types, Malware Detection, Static Analysis, Dynamic Analysis, Machine Learning.

I. INTRODUCTION

Malware is the defining term for harmful software. It is dangerous code that affects the user or the computer and lets an attacker damage the machine. Malware includes viruses, Trojans, rootkits, ransomware, worms, botnets, spyware, adware, keyloggers, etc., and an extensive array of their families spreads online every day. According to a study conducted by the AV-Test Institute, 350,000 new hazardous applications and programmes are reported every day. The malware statistics registered 897 million malicious programs in 2020, with each harmful sample categorised and stored properly.

Today, the world is evolving into a digital age [1] in which cyber technology is an integral part of everyday life. Computers and the Internet are used for computing, for access to knowledge, and for technologies such as the Internet of Things (IoT), cryptocurrency, etc. Today's world discusses the digital economy for cyber collections [2]; such deep involvement of computers, along with many other innovations, presents the digital world with new challenges. Malicious software, also known as malware, is a malicious program intended to target computer systems for purposes such as device hijacking, file deletion, theft, spamming, and downloading further malware. The list of malicious activities is extensive and grows rapidly and frequently with new entries.

With the tremendous growth rate of cybercrime, it is clearly unreasonable to study and understand the enormous volume of malware [3] [4] manually. Analysts are aided by the fact that very little malware is original: developers primarily reuse code and code patterns to create new malware. The biggest downside and challenge for analysts is the malware operation to inherit the patterns and similarities between the ways
in question. To take advantage of the similarities and expected patterns in malware, the anti-malware industry has begun to use machine learning, in which machines are trained to discover and recognise the inherited patterns. Machine learning and malware detection are distinct fields with several overlaps.

As the Internet is growing quickly, malware is now one of the biggest cyber hazards. All programmes that perform malicious actions, such as information stealing, snooping, etc., may be referred to as malware. Kaspersky Lab [5] described malware as a "computer program designed to infect and multiply harm to a legitimate user's computer." Although the diversity of malware is rising, anti-virus scanners cannot meet the security needs, which leads to millions of hosts being attacked. According to Kaspersky Labs, 12,989,287 hosts were targeted in 2019 and unique malware items were found. In particular, Juniper Research (2016) forecasts that the cost of data breaches will rise worldwide to $3.7 billion in 2020.

Furthermore, due to the high availability of attack resources on the Internet, the degree of competence necessary for malware creation is decreasing. A high level of anti-detection techniques and the ability to purchase malware on the black market give everyone a chance to become an attacker [6], regardless of skill level. Current studies show that more and more attacks are produced by script kiddies or are automated. Therefore, malware protection of computer systems for individual users and companies is one of the most critical cyber security tasks, since even a single attack may compromise data and cause considerable loss. The need for reliable and prompt detection methods [7] is dictated by massive losses and repeated attacks. Current static and dynamic processes do not provide efficient detection, especially when dealing with zero-day attacks. Machine learning-based methods can also be used. Figure 1 shows several malware detection approaches.

Figure 1: Approaches for malware detection

Malware is divided into two groups: malware of the first generation and malware [8] of the second generation. The category depends on how the device, its functionality, and its processes are affected. In first-generation malware, the structure of the malware remains the same. In second-generation malware, by contrast, the behaviour is preserved while the form of the malware changes, producing a new variant after each iteration. This complex property makes the malware hard to detect and isolate. Signature-based, heuristic-based, normalization, and machine learning techniques are the most effective techniques for malware detection. Machine learning has become a well-known solution for malware defenders in recent years.

II. TYPES OF MALWARE AND DIFFERENT MALWARE ANALYSIS MODELS

2.1 Malware types

It is helpful to identify the problem so that malware methodologies and reasoning are better understood. Depending on its function, malware may be divided into different groups. The types of malware are:

Virus: This is the simplest type of malware. It is any piece of software that is loaded and launched without the user's permission while reproducing itself or infecting (modifying) other software.

Worm: This form of malware is very much like a virus. The difference is that a worm can propagate to other machines across the network.

Trojan: This class describes malware that appears to be legitimate software. Social engineering is therefore the usual propagation vector for this class, making people believe that they are downloading legitimate applications.

Adware: The only aim of this type of malware is to display advertisements on the computer. Adware may also be viewed as a subset of spyware, and its aim is to create revenue for its developers.

Spyware: As the name suggests, malware that performs espionage can be called spyware. Typical spyware practices include monitoring search history in order to deliver custom advertising, and tracking activities in order to sell the data to third parties afterwards.

Rootkit: Its functionality enables the intruder to access data with higher permissions than are allowed, for example by granting administrative access to an unauthorized user. Rootkits constantly hide themselves and often go unnoticed, making them extremely challenging to find and remove.

Backdoor: A backdoor is a kind of malware that gives attackers an additional way to access the device. It does no harm by itself but provides attackers with a wider attack surface. Backdoors are therefore rarely used on their own; they usually precede other malware attacks.

Keylogger: This malware class aims at logging all keys pressed by the user and thus storing the data, including
passwords, bank card numbers, and other sensitive data.

Ransomware: This malware is meant to encrypt all the data on the computer and asks the victim to transfer a certain amount of money to obtain the decryption key. Typically, a ransomware-infected computer is "frozen," so the user cannot access any file, and the screen is used to display the attackers' demands.

2.2 Different malware analysis models

There are three types of malware analysis: static, dynamic, and memory malware analysis, represented in Figure 2.

[Figure 2 diagram: Malware Analysis branches into Static Analysis (without executing malicious code; tools: PEiD, Yara, OllyDbg, etc.), Dynamic Analysis (while executing malicious code; tools: Detours, Cuckoo Sandbox, etc.), and Memory Analysis (after executing malicious code; tools: Volatility, Pin tools, etc.)]

Figure 2: Types of Malware Analysis

Static Malware Analysis
Examining or checking malicious code without executing it is called static malware analysis. It is a signature-based form of malware analysis. Static features such as metadata, strings, code, and imported libraries are extracted from the dormant malware and used in the feature selection or feature extraction step of machine learning classification. The files subjected to static malware analysis are most likely to be exe files, DLLs, documents, assembly code, byte code, etc.; static features are extracted from these file types as the output. For static malware analysis, tools like PEiD, ssdeep, pafish, Yara, strings, IDA Pro, OllyDbg, OllyDump, and many more can be used.

Dynamic Malware Analysis
Dynamic analysis runs the malware sample and analyses its activity on the system, which helps to eliminate the malware or stop it from spreading to other systems. Dynamic malware analysis is used to extract the dynamic features of the malicious code, with tools including CWSandbox, Anubis, CAT, TRACKTRAK, etc.

Memory Malware Analysis
Examining the malicious code [9] [10] after it has been executed is known as memory malware analysis. Memory analysis features include shared resources, application programs, hooking detection, network services, rootkit links, hidden objects, injected code, etc. Memory analysis tools include Volatility, Pin tools, Valgrind, etc.

This survey aims to review and systematize existing literature to promote malware analysis using machine learning techniques.

III. MACHINE LEARNING MODELS

Machine learning is a scientific discipline within artificial intelligence that concerns the analysis and implementation of algorithms that can be trained with data. It is a method of data analysis that can be used for making predictions from data.

3.1 Broad categories of ML techniques

Supervised learning
Supervised learning is a machine learning task that tries to infer a function from a large set of labelled data. Investigators often classify their findings using the term "diagnosis." Some well-known examples of supervised learning algorithms are Bayesian networks, decision trees, k-Nearest Neighbour (KNN), Support Vector Machine (SVM) and Artificial Neural Networks (ANN).

Unsupervised learning
Unsupervised learning seeks to discover hidden structure in unlabelled data. It is also known as clustering. Some commonly used unsupervised learning algorithms are k-means clustering, hierarchical clustering, neighbourhood-based methods, and the self-organizing map (SOM).

Reinforcement learning
Reinforcement learning involves an application that interacts with a dynamic environment to attain a specific objective.

3.2 Different types of machine learning classifiers

3.2.1 Single classifiers
An intrusion detection system that is designed using only one machine learning algorithm or approach is known as a single classifier. Some of the commonly used single classifiers are as follows.

Decision Tree
Decision tree is a kind of classification algorithm that learns to build a model from an already known
categorised dataset. Each item in the data is described by its attribute values. Classification can be seen as a mapping from a set of attributes to a particular class. Decision trees are one example of a classification algorithm [2]. The decision tree is very powerful, simple and easy to implement.

K-Nearest Neighbours
K-Nearest Neighbours is an old and simple method for categorising samples [11]. It computes the approximate distance between different points of the input vectors. Each unlabelled point is then assigned to the class of its K nearest neighbours. K-Nearest Neighbours is also called instance-based learning.

SVM
Boser, Guyon and Vapnik introduced SVM at COLT-92. It is a supervised learning method that can be applied to classification and regression analysis. SVM for intrusion detection is based on the idea of decision planes, which define decision boundaries. A decision plane separates sets of objects with different class memberships. The labelled training items are identified as the normal class of objects and the remaining items as attacks. SVM detects intrusions on the basis of recognized attack patterns [6].

ANN
An ANN works as a system that simulates the functionality of neurons in the human brain [12]. Neural networks are organised into input, hidden and output layers. Layers are composed of numerous interlinked 'nodes', each of which contains an 'activation function'. Patterns are presented to the network through the 'input layer'. This layer then communicates with the hidden layers, where processing is done through a system of weighted 'connections'. The hidden layers then connect to the output layer, which displays the output (the detection result).

Naïve Bayes
The Naïve Bayes model applies Bayes' theorem with independence assumptions between predictors [13]. That is, the predictors are assumed to be independent of one another given the class. Building a Naïve Bayes model is easy and needs no complex iterative parameter estimation, which makes it very useful for very large datasets. Naïve Bayes classifiers are said to work fast when compared to more sophisticated methods.

Fuzzy Logic
Dr. Lotfi Zadeh introduced fuzzy logic in the 1960s at UC Berkeley. The idea of fuzzy logic is based on "degrees of truth" rather than the usual "true or false" (0 or 1). The truth value in fuzzy logic is a real number varying from 0 to 1. In fuzzy space, an object is allowed to belong to several different classes at the same time. This makes fuzzy logic a good option for intrusion detection. Because of the unpredictable character of intrusions, fuzzy logic plays a prominent role in identifying harmful events. It also reduces the false alarm rate [14].

Genetic Algorithms
A genetic algorithm is a programming approach that imitates biological evolution to solve problems [15]. Many biological concepts such as mutation, crossover, inheritance and selection are mimicked and used to derive solutions. Darwin's survival of the fittest is used to obtain an optimal solution from a large population of candidates [16].

3.2.2 Hybrid Classifiers
A hybrid classifier is one in which more than one machine learning algorithm is combined to improve the detection efficiency of an Intrusion Detection System (IDS). In some cases, it involves both clustering-based and classification-based techniques, i.e. both supervised and unsupervised learning. A hybrid classifier helps in both intrusion and anomaly detection [17].

3.2.3 Ensemble Classifier
Ensemble learning is a method of creating various base classifiers from which a new classifier is derived that performs better than any individual base classifier. An ensemble classifier produces a better classification result with acceptable solutions for many applications [18].

Table-1: Comparison of different ML models in IDS

| Machine learning approach | Merits | Demerits |
|---|---|---|
| Decision Tree | Works well with datasets. High detection accuracy. Decision tree construction does not require any knowledge. Decision tree representation is easy to understand. | Trees that are formed from many datasets are not easy to analyse or understand. The output attribute must be categorical. Output attribute is restricted to one. The algorithms used are unstable. |
| k-Nearest neighbour | Quick calculation time. Simple algorithm to interpret. | Large storage requirements. Slow in classifying test tuples. Highly susceptible to the curse of dimensionality. |
| Support vector machine | Good learning ability with small samples. Rate of decision and training is high. Insensitive to the dimensionality of the input data. Accuracy is very high. Can model even very complex decision boundaries which are not linear. Compared to other techniques it is less likely to suffer from overfitting. | Training can be extensive and prolonged. Because a binary classifier is used, the detected attack comes with no extra information about the intrusion. High algorithmic complexity. Extensive memory requirements. |
| Artificial neural networks | Able to generalise from limited, unclear and incomplete data. Does not require expert knowledge to find new attacks or intrusions. Requires less formal statistical training. Implicitly detects complex non-linear associations between dependent and independent variables. Exhibits more tolerance towards noisy data. | The process of training is very slow, hence not suitable for real-time detection. Overfitting is likely to occur during training of the neural network. Greater computational burden. Prone to overfitting and requires long training time. |
| Naïve Bayes | It is simple and straightforward to apply. It does not require as much data for training. It can deal with both continuous and discrete data. It can handle many predictors and data points. It is quick and may be used to make predictions in real time. | Features that are continuous are complex to handle. Cannot produce good classifiers when the prior knowledge is wrong. Lack of available probability data. Its findings seem to be like those obtained from threshold-based systems. High computational effort is required. Data attributes are conditionally independent. |
| Fuzzy Logic | Reasoning is approximate rather than precise. Effective against port scans. Linguistic variables are used. Permits imprecise inputs. Allows fuzzy thresholds. Reconciles conflicting objectives. The rule base or fuzzy sets can be easily modified. | High resource consumption. Rules are difficult to update dynamically. Building a model from a fuzzy system is laborious. Demands more fine adaptation and simulation before use. |
| Genetic Algorithms | Able to derive the best classification rules and optimal parameters. Biologically inspired and employs an evolutionary algorithm. Applicable to optimization problems. Solves problems with multiple solutions. Easily transferred to existing models. | Cannot assure constant optimization response time. Over-fitting problems. No global optimum. |

Table-1 describes different machine learning models with their advantages and limitations in handling malware data.

IV. LITERATURE SURVEY

The malware detection architecture proposed by Gupta et al. [1] produced a dataset of 0.7 million files that contains 0.18 million clean files and 0.61 million samples of malware, from sources such as VXHeaven, Nothing, VirusShare, etc. After collecting malware samples, automated malware analysis is carried out using Cuckoo sandboxes to evaluate the malware's actions during execution. The data collection involves a series of Python scripts. The findings are in the JavaScript Object Notation (JSON) format. Python in Apache Spark extracts the static and dynamic features from the JSON reports.

Burnap et al. [2] developed a new approach for categorising files using self-organizing maps and for reducing overfitting in the course of the training. The data set was compiled using the VirusTotal API. In and
under different conditions, Random Forest, BayesNet, MLP and Support Vector Machine classifiers are used. The malware samples from the VirusTotal dataset are initially examined over 15 cross-validation cycles, and Random Forest produces results with 96% precision.

AlAhmadi et al. [3] proposed a new technique for malware classification. It is a three-phase process. In the first stage, malware variants are fed into network traffic and extracted thereafter. The variant families are built and encoded using the network change, and the input instability is verified. When similarities are derived from the malware family, the sequence is extracted and the flow values are compared using binary similarity, Levenshtein distance, cosine similarity, inter-flow distance, and N-flow mining, which are taken as the outcome of the second phase. In the third phase, a model is trained and created for profile extraction characteristics. Machine learning methods like KNN and Random Forest are utilised in the classification process. Finally, it is noted that the proposed classifier achieves 95.5 percent accuracy.

A malware detection system was presented by Khan et al. [4]. The methodologies for malware sensing are employed in remote and regional analyses. Signatures are used to verify whether a file is malicious or benign. Various anti-virus tools are utilised for the analysis of malware and APIs during isolated inspections. Analysis includes the use of anti-virtual machines, anti-debuggers, analysis of URLs, string analysis and packaging. Ronen et al. [5] include the standard data set, which has been announced by numerous malware as a challenge for the Kaggle competition. Ye et al. [6] submitted an investigation into malware detection using smart malware detection technologies for data mining. They depict feature extraction and classification as the two crucial phases in the analysis and detection of malware. They reviewed research activities from 2011 to 2016, including issues related to malware identification and data mining solutions.

Wang et al. [7] introduced the design and implementation of a sandbox, an extractor and a classifier. Three main steps are considered, corresponding to the tasks of collectors, extractors and classifiers. The PinFWSandbox module in the collector collects dynamic data and log file data and passes them to the extractor stage, as well as static analysis and passionate performance. The extractor extracts all static characteristics, as well as dynamic instructions and features. The classifier integrates all models, including the product of individual model classifications, system call output classifications, and instantaneous classification with dynamic outcomes.

Pai et al. [8] used clustering algorithms to classify malware. Static characteristics and their ratings are derived from the opcode sequences. These static characteristics are used to classify malware using the K-means, Expectation-Maximization, and Hidden Markov algorithms. Among the clustering algorithms, Expectation-Maximization provides the best accuracy. Makandar et al. [9] summarize malware analysis and detection technology with various malware types.

An automatic system for detecting unknown malware samples using neural networks is given by Kosmidis et al. [10]. The malware is classified by perceptron, decision tree, nearest centroid, stochastic gradient, multilayer perceptron, and random forest algorithms. For Random Forest, average accuracy results and test time are also taken into account as parameters. Sahay et al. [11] grouped malware-dependent executables using optimal K-means clustering, and these groups were used as training features for detection. They concluded that the proposed approach provides 78 percent accuracy in finding unknown malware. Ahmadi et al. [12] used the malware data collection and the hex dumping-based features of Microsoft, and extracted them from disassembled data. GBoost classification algorithms were used for classification. The authors registered an accuracy of 91.8 percent.

Drew et al. [13] performed polymorphic malware classification using the Super Threaded Reference-Free Alignment-Free N-sequence Decoder (STRAND). The Algorithm State Machines (ASM) sequence model was presented in their method, and the precision obtained by cross-validation was more than 98.59 percent. Souri et al. [14] analysed how traditional security measures are insufficient given the volume and diversity of programming versions. Dynamic analysis of malware during runtime may secure the model from malicious programming. This article proposed a framework to analyse malware [15] behaviour automatically using machine learning.

Hashemi et al. [16] proposed an entirely new Windows platform based solution to polymorphic malware detection. Polymorphic computer viruses are much more advanced and challenging to discover than their original versions, and it takes a lot of time to catch them. A two-stage approach is used to evaluate them: the first stage is creating the API call sequences for both known and unknown malware; the second is sequence restructuring and computing the distance between the two data points.

Souri et al. [17] proposed practical methods for developing attack scenarios by correlating warnings through intrusion preconditions and effects. Their approach is based on the observation that alarms in a sequence of attacks are not isolated but linked to multiple phases, with earlier stages preparing for the later steps. They suggested a formal structure that would represent warnings using the idea of hyper-alerts [18] with their conditions and consequences.
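
Several of the approaches surveyed in this section compare sequences, for example the Levenshtein and cosine measures listed for AlAhmadi et al. [3] and the distance between API call sequences in Hashemi et al. [16]. As a minimal sketch of what such measures compute, the Python below compares two call traces. The trace contents, the function names, and the choice of call-frequency vectors for the cosine measure are illustrative assumptions and are not taken from the cited papers.

```python
from collections import Counter
from math import sqrt


def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute all cost 1)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # keep or substitute
        prev = curr
    return prev[-1]


def cosine_similarity(a, b):
    """Cosine similarity between the call-frequency vectors of two sequences."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[k] * cb[k] for k in ca)
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


# Hypothetical API call traces for two samples (names chosen for illustration only).
trace_a = ["CreateFile", "WriteFile", "RegSetValue", "Connect"]
trace_b = ["CreateFile", "WriteFile", "Connect", "Send"]

print("edit distance:", levenshtein(trace_a, trace_b))
print("cosine similarity:", round(cosine_similarity(trace_a, trace_b), 3))
```

The edit distance is sensitive to the order of calls, whereas the frequency-based cosine measure ignores order, which is one reason the two are often used side by side.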

Several malware detectors must disassemble malicious code into assembly code for analysis.
Palumbo et al. [19] described scenarios in which malware masks instructions to avoid static
analysis. They investigated static detection techniques that model the dynamic usage of the
stack, which is used in metamorphic viruses. All the virus stages will be avoided if memory stacks
are used in a more sophisticated way. Many viruses [20], for example, contain ambiguous call
instructions intended to break static analysis.

Narayanan et al. [22] identified two common issues concerning behaviour-blocking detection
strategies and the circumvention of monitoring points. All these areas pose problems for any
commercial AV solution today. Moving detection to the disk processor eliminates the
circumvention problem and allows partial solutions to the other issues. One feature that
distinguishes it from host-level signature detection is that behaviour detectors detect only
malware and do not detect other anomalies.

Wu et al. [23] explored a smartphone malware detection model based on an artificial immune
system, using both static and dynamic malware analysis. Using a vector encoding, the static and
dynamic features are distinguished and antigens are created. In addition, 34 malware samples and
25 benign files were collected as study samples.

Bat-Erdene et al. [24] introduced a technique to classify the packing algorithms of unknown
packers. First, they measured the entropy of a given executable and converted the entropy values
of a particular memory region into symbolic representations. They used symbolic aggregate
approximation, which is considered effective for large data conversion. Second, the resulting
symbolic representations are classified using supervised learning methods, i.e., Naive Bayes and
Support Vector Machines.

V. PROPOSED MODEL

The key advantage of behavioural detection systems is that all harmful files are identified by
their call behaviour, which improves the prediction of malware. The key disadvantage of the
identification methods based on the signature is the runtime overhead. There are three major
platforms in the target environment: embedded systems, Windows and smartphones. Most behaviour
detection research papers used the mobile environment to demonstrate their approach to malware
detection. However, traffic from real networks may contain any number of unknown attacks that
could affect the test results. Researchers and developers should therefore use a single test
network that contains different attacks, including those that have already been implemented, and
that has less background traffic. Therefore, a system is proposed for the detection and
prevention of unknown malware attacks. The disadvantages indicated must be overcome by developing
an efficient extraction model which enhances the accuracy of malware detection. To categorise
malware and to prevent it from compromising system security, an extractor model must be built for
the rule set.

VI. CONCLUSION

Because malicious software is an increasing security threat, malware detection continues to be an
essential research issue. A survey of current malware identification models that use machine
learning techniques was carried out. Malware detection systems have been compared and evaluated
on the basis of a number of critical aspects, including classification approaches, analytical
methodology, dataset size, precision and analysis. Research on malware detection has already
shown that machine learning can classify malware correctly. It is hoped that more constructive
learning methods will be developed with machine learning, ensemble learning and deep learning.
These contributions can be linked to important fields of inquiry. A new mix of aims, features and
algorithms can be investigated in order to increase accuracy above the existing state of the art.
Moreover, as some classes of algorithms have never been used for some purposes, new directions
can be offered for further research. An investigation of malware can provide other ideas to be
pursued. The entire field of study focuses on the development of appropriate malware testing
standards. This paper provides a brief survey on the models available for malware detection. The
new idea of malware analysis economics can drive future research routes when establishing a
malware testing environment where appropriate tuning methods can be provided to balance
conflicting metrics and improve the security levels in the network.

REFERENCES

[1]. Gupta, D., & Rani, R. (2018). Big Data Framework for Zero Day Malware Detection. Cybernetics and Systems, 49(2), 103-121.
[2]. Burnap, P., French, R., Turner, F., & Jones, K. (2018). Malware classification using self organizing feature maps and machine activity data. Computers & Security, 73, 399-410.
[3]. AlAhmadi, B. A., & Martinovic, I. (2018, May). MalClassifier: Malware family classification using network flow sequence behaviour. In APWG Symposium on Electronic Crime Research (eCrime), 2018 (pp. 1-13). IEEE.
[4]. Khan, M. H., & Khan, I. R. (2017). Malware Detection and Analysis. International Journal of Advanced Research in Computer Science, 8(5).

[5]. Ronen, R., Radu, M., Feuerstein, C., Yom-Tov, E., & Ahmadi, M. (2018). Microsoft Malware Classification Challenge. arXiv preprint arXiv:1802.10135.
[6]. Ye, Y., Li, T., Adjeroh, D., & Iyengar, S. S. (2017). A survey on malware detection using data mining techniques. ACM Computing Surveys (CSUR), 50(3), 41.
[7]. Wang, C., Ding, J., Guo, T., & Cui, B. (2017, November). A Malware Detection Method Based on Sandbox, Binary Instrumentation and Multidimensional Feature Extraction. In International Conference on Broadband and Wireless Computing, Communication and Applications (pp. 427-438). Springer, Cham.
[8]. Pai, S., Di Troia, F., Visaggio, C. A., Austin, T. H., & Stamp, M. (2017). Clustering for malware classification. Journal of Computer Virology and Hacking Techniques, 13(2), 95-107.
[9]. Makandar, A., & Patrot, A. (2017). Overview of malware analysis and detection. In IJCA proceedings on national conference on knowledge, innovation in technology and engineering, NCKITE (Vol. 1, pp. 35-40).
[10]. Kosmidis, K., & Kalloniatis, C. (2017, September). Machine Learning and Images for Malware Detection and Classification. In Proceedings of the 21st Pan-Hellenic Conference on Informatics (p. 5). ACM.
[11]. S. K. Sahay and A. Sharma, “Grouping the Executables to Detect Malwares with High Accuracy,” Procedia Computer Science, vol. 78, no. June, pp. 667–674, 2016.
[12]. M. Ahmadi, D. Ulyanov, S. Semenov, M. Trofimov, and G. Giacinto, “Novel Feature Extraction, Selection and Fusion for Effective Malware Family Classification,” ACM Conference Data Application Security Priv., pp. 183–194, 2016.
[13]. J. Drew, M. Hahsler, and T. Moore, “Polymorphic malware detection using sequence classification methods and ensembles,” EURASIP J. Inf. Secur., vol. 2017, no. 1, p. 2, 2017.
[14]. Souri A, Norouzi M, Asghari P (2017) An analytical automated refinement approach for structural modeling large-scale codes using reverse engineering. Int J Inf Technol 9:329–333. https://doi.org/10.1007/s41870-017-0050-7
[15]. Souri A, Navimipour NJ, Rahmani AM (2017) Formal verification approaches and standards in the cloud computing: a comprehensive and systematic review. Comput Stand Interfaces. https://doi.org/10.1016/j.csi.2017.11.007
[16]. Hashemi H, Azmoodeh A, Hamzeh A, Hashemi S (2017) Graph embedding as a new approach for unknown malware detection. J Comput Virol Hacking Tech 13:153–166. https://doi.org/10.1007/s11416-016-0278-y
[17]. Souri A, Asghari P, Rezaei R (2017) Software as a service based CRM providers in the cloud computing: challenges and technical issues. J Serv Sci Res 9:219–237. https://doi.org/10.1007/s12927-017-0011-5
[18]. Chowdhury M, Rahman A, Islam R (2018) Malware analysis and detection using data mining and machine learning classification. In: Abawajy J, Choo K-KR, Islam R (eds) International conference on applications and techniques in cyber security and intelligence: applications and techniques in cyber security and intelligence. Springer International Publishing, Cham, pp 266–274
[19]. Palumbo P, Sayfullina L, Komashinskiy D, Eirola E, Karhunen J (2017) A pragmatic android malware detection procedure. Comput Secur 70:689–701. https://doi.org/10.1016/j.cose.2017.07.013
[20]. Mohamed GAN, Ithnin NB (2018) SBRT: API signature behaviour based representation technique for improving metamorphic malware detection. In: Saeed F, Gazem N, Patnaik S, Saed Balaid AS, Mohammed F (eds) Recent trends in information and communication technology. Proceedings of the 2nd international conference of reliable information and communication technology (IRICT 2017). Springer International Publishing, Cham, pp 767–777
[21]. Kumar, S. A., Babu, E. S., Nagaraju, C., & Gopi, A. P. (2015). An empirical critique of on-demand routing protocols against rushing attack in MANET. International Journal of Electrical and Computer Engineering, 5(5).
[22]. Narayanan A, Chandramohan M, Chen L, Liu Y (2017) A multi-view context-aware approach to Android malware detection and malicious code localization. Empir Softw Eng. https://doi.org/10.1007/s10664-017-9539-8
[23]. Wu B, Lu T, Zheng K, Zhang D, Lin X (2014) Smartphone malware detection model based on artificial immune system. China Commun 11:86–92.
[24]. Bat-Erdene M, Park H, Li H, Lee H, Choi MS (2017) Entropy analysis to classify unknown packing algorithms for malware detection. Int J Inf Secur 16(3):227–248.
[25]. Cui B, Jin H, Carullo G, Liu Z (2015) Service-oriented mobile malware detection system based on mining strategies. Pervasive Mob Comput 24:101–116.
[26]. Fan Y, Ye Y, Chen L (2016) Malicious sequential pattern mining for automatic malware detection. Expert Syst Appl 52:16–25. https://doi.org/10.1016/j.eswa.2016.01.002
[27]. Martín A, Menéndez HD, Camacho D (2016) MOCDroid: multi-objective evolutionary classifier for Android malware detection. Soft Comput 21:7405–7415.
[28]. Sarada, K., Narayana, V. L., Gopi, P., & Pavani, V. (2020). An iterative group based anomaly detection method for secure data communication in networks. Journal of Critical Reviews, 7(6), 208-212.
[29]. Gopi, A., Babu, E. S., Raju, C. N., & Kumar, S. A. (2015). Designing an Adversarial Model Against Reactive and Proactive Routing Protocols in MANETS: A Comparative Performance Study. International Journal of Electrical & Computer Engineering (2088-8708), 5(5).
[30]. Shakya, Subarna, Lalitpur Nepal Pulchowk, and S. Smys. “Anomalies Detection in Fog Computing Architectures Using Deep Learning.” Journal: Journal of Trends in Computer Science and Smart Technology March 2020, no. 1 (2020): 46-55.
