Computers & Security 126 (2023) 103084

Contents lists available at ScienceDirect

Computers & Security


journal homepage: www.elsevier.com/locate/cose

MCTVD: A malware classification method based on three-channel visualization and deep learning

Huaxin Deng a, Chun Guo a,∗, Guowei Shen a, Yunhe Cui a, Yuan Ping b
a State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang 550025, PR China
b School of Information Engineering, Xuchang University, Xuchang 461000, PR China

Article info
Article history: Received 25 August 2022; Revised 17 November 2022; Accepted 27 December 2022
Keywords: Malware classification; Malware visualization; Markov transfer matrix; Deep learning; Convolutional neural network

Abstract
With the rapid increase in the number of malware samples, the detection and classification of malware have become more challenging. In recent years, many malware classification methods based on malware visualization and deep learning have been proposed. However, the malware images generated by these methods do not retain the semantic and statistical properties of malware while also having a small and uniform size. This article gives definitions of extracted content and filling mode to characterize the critical factors for the malware visualization task and proposes a new malware visualization method based on assembly instructions and Markov transfer matrices to characterize malware. On this basis, a malware classification method based on three-channel visualization and deep learning (MCTVD) is proposed. In MCTVD, the malware image has a small and uniform size, and the convolutional neural network has few convolutional and pooling layers. Experimental results show that MCTVD can achieve an accuracy of 99.44% on Microsoft's public malware dataset under 10-fold cross-validation and thus could be a highly competitive candidate for malware classification.
© 2022 Elsevier Ltd. All rights reserved.

1. Introduction

Malware is any type of software that harms or exploits the normal operation of a system. In recent years, with the rapid development of the internet and computer technologies, the number of malware samples has increased year by year. As reported by AV-TEST (AV-TEST), the total number of malware samples was 1218.68 million as of 2022. Malware classification is a necessary task for malware analysis. It distinguishes different malware families to better understand the capabilities of malware variants from the same family, and thus it can reduce the work of security analysts and facilitate their research on new malware or malware variants (Gibert et al., 2020). Unfortunately, classifying malware efficiently and accurately is a challenging task.

To improve the efficiency of classifying malware, traditional machine learning methods, such as decision tree (DT), naive Bayes (NB), support vector machine (SVM), and k-nearest neighbor (KNN), have been widely used in a variety of malware classification methods (Pachhala et al., 2021). However, such methods are limited by complex feature engineering and by difficulty in processing large amounts of data, which keeps their performance unsatisfactory. Deep learning methods can overcome these shortcomings of machine learning. A convolutional neural network (CNN) is a deep learning approach that has proven very effective in tackling problems such as image recognition and classification (Basha et al., 2020). Correspondingly, deep learning has also been introduced into malware detection and classification (Kargarnovin et al., 2022; Li et al., 2022; Wang et al., 2019b; Yadav and Tokekar, 2021), and several malware classification approaches based on image visualization and deep learning have been proposed in recent years. This type of method turns malware classification into an image classification problem. Its feasibility lies in the fact that when different malware samples from the same family are converted into images, they appear similar in texture and layout (Verma et al., 2020).

The generated malware image is the core of this type of malware classification method. However, how to generate a high-quality image from a malware file has not been discussed in depth. The most widely used malware visualization method uses malware binaries directly as input, converting every 8 bits (one byte) into one pixel to generate a grayscale image. This requires compression or truncation to keep the image size uniform when training with CNNs, which inevitably loses effective information from the original binary file during the conversion. A few malware visualization methods use opcode n-grams extracted from Windows Portable Executable (PE) files as pixels in the generated malware images. Such methods generally use only the frequency of the unique combination of n consecutive opcodes to visualize malware, ignoring the information about the quality of opcodes and about the operands in assembly instructions. Therefore, this article explores the quality of malware images in terms of "extracted content" and "filling mode", both defined in Section 3, and proposes a malware visualization method that generates a high-quality malware image.

∗ Corresponding author.
E-mail address: [email protected] (C. Guo).
https://doi.org/10.1016/j.cose.2022.103084
0167-4048/© 2022 Elsevier Ltd. All rights reserved.

To achieve this goal, we analyze the critical factors in generating an image from malware and use the information extracted from the assembly instructions (containing opcodes and operands) of a PE file to generate a three-channel image. A malware classification method based on three-channel visualization and deep learning (MCTVD) is then designed. MCTVD extracts the sequence of assembly instructions from the code section (also known as the ".text" section) of malware and uses the transfer probabilities of the unique combination of every 2 consecutive letters or numbers in that sequence to construct the three-channel image. This image contains richer information about assembly instructions than the grayscale image and the opcode n-gram image, which helps improve the accuracy of malware classification. In addition, it does not require compressing or truncating the generated images to unify their sizes. The main contributions of this article are as follows:
1) A three-channel malware visualization method based on assembly instructions and Markov transfer matrices is proposed. The extracted content and filling mode are defined to characterize the critical factors for the malware visualization task, and a new malware visualization method is then proposed. The image generated by this method retains the information about the assembly instructions in the code section of malware while having a reduced and equal size, which helps improve the accuracy and efficiency of malware classification.
2) A CNN is designed to effectively classify the three-channel images generated by our malware visualization method. Compared with common CNNs, such as AlexNet (Krizhevsky et al., 2012), VGG16 (Simonyan and Zisserman, 2014), and VGG19 (Simonyan and Zisserman, 2014), our presented architecture has fewer convolutional and fully connected layers, which reduces training time.
3) A malware classification method called MCTVD, which combines the three-channel images with our presented CNN, is proposed. Experiments on a public dataset from Microsoft Corporation show that MCTVD is superior to the traditional grayscale image-based, byte-level Markov-based, and RGB color image-based methods in terms of accuracy and macro F1-score.
The rest of this article is organized as follows. Section 2 gives a brief introduction to current malware classification methods. Section 3 describes the motivations of our proposed method. The proposed MCTVD is detailed in Section 4. In Section 5, experimental results of our method are presented and compared with other works. Finally, Section 6 summarizes the work of the article.

2. Related work

Over the last 20 years, researchers have proposed many malware classification methods based on machine learning technologies. They can be roughly divided into two types: methods based on traditional machine learning and methods based on malware images and deep learning.

2.1. Malware classification methods based on traditional machine learning

These methods rely on handcrafted features based on expert knowledge, and their feature engineering process is generally time-consuming (Gibert et al., 2020). They can be classified into two categories according to the kinds of features used: static and dynamic.

2.1.1. Static feature-based methods

Static features are those that can be obtained without running the malware, such as byte sequences (Yousefi-Azar et al., 2018), opcode sequences (Yeboah et al., 2021; Zhang et al., 2019), API calls (Soni et al., 2022), and function call graphs (FCGs) (Hassen and Chan, 2017). Shalaginov et al. (2018) conducted an in-depth survey of different machine learning methods for classifying the static characteristics of Windows PE files. A framework to detect malicious applications and to categorize benign applications with an ensemble of multiple classifiers, namely, SVM, KNN, NB, classification and regression tree (CART), and random forest (RF), was proposed in (Wang et al., 2018). This framework extracts as many as 2,374,340 static features that fall into 11 types (Restricted API calls, Suspicious API calls, and so on) from each APK file and chooses the top-ranked 34,630 static features for detection and categorization. Zhang et al. (2019) proposed an opcode sequence-based ransomware classification method. This method first converts the opcode sequences of ransomware samples into n-gram sequences and then uses a vector of the term frequency values of the n-gram features as the feature vector. Finally, five machine learning methods are used to perform ransomware classification. Soni et al. (2022) proposed a malware classification method using features extracted from API calls and opcode sequences. After extracting the features, four machine learning algorithms, NB, logistic regression, RF, and SVM, are used to classify malware. Hassen and Chan (2017) proposed an FCG vector representation based on function clustering, which yields significant performance gains when used for malware classification.

2.1.2. Dynamic feature-based methods

These features are obtained by dynamic analysis, which observes the interaction between malware and the system by executing the malware in a controlled environment. Registry changes, memory writes, and API call traces are commonly used as dynamic features of malware. Amer and Zelinka (2020) grouped API functions with similar contextual characteristics into clusters by studying the contextual relationships between API functions in malware. Their work showed a clear difference between the API call sequences of malware and those of benign software. San et al. (2019) proposed a malware family classification system that extracts the prominent API features of 11 malware families from a Cuckoo sandbox. Xiao et al. (2020) proposed a graph repartition algorithm to extract fragment behaviors from original API call graphs and then obtained the crucial N-order subgraph for malware detection and classification. An association rule-based malware classification method using common subsequences of API calls was proposed in (D'Angelo et al., 2021). This method exploits the probabilities of transitioning between two API invocations in the call sequence.

2.2. Malware classification methods based on malware images and deep learning

The image-based malware classification method was first introduced by Nataraj et al. (2011), who used the binary content of malware to generate a grayscale image, applied GIST to extract texture features, and used KNN to classify malware. A few years later, image-based malware classification methods using machine learning were also proposed (Ghouti and Imam, 2020), but they still require a complex feature engineering process. With the rapid development of deep learning technology in recent years


and its excellent performance in image classification, malware classification methods based on malware images and deep learning have become a research hotspot in the malware classification field. This type of approach can eliminate much of the feature engineering work and obtain good classification accuracy (Yuan et al., 2020).

Among the existing malware classification methods based on malware images and deep learning, a majority use the grayscale image converted directly from the binary sequence of malware (Cui et al., 2018; Pinhero et al., 2021; Yan et al., 2018; Zhao et al., 2020). For instance, Cui et al. (2018) converted the binary sequence of malware into grayscale images, which were then classified by a CNN. Since only images with uniform sizes can be directly applied to CNNs, this type of method requires compression or truncation of the grayscale images when they are trained with CNNs. Although Yuan et al. (2020) also used binary sequences of malware, they converted the binary sequence into a Markov image with a fixed size of 256 × 256 through the byte transfer probability matrix and then trained a deep CNN to classify malware. Their Markov image contains the binary information and global structure of malware while ignoring specific semantic information. The literature has shown that opcode sequences can represent program behaviors (Jian et al., 2021). Correspondingly, opcode sequences have also been used to generate images in some malware classification methods based on malware images and deep learning (Ni et al., 2018; Zhang et al., 2016). Zhang et al. (2016) collected opcode sequences from binary files in the dataset and used them to construct images. However, the generated images are generally very sparse because a single sample contains a limited number of opcodes. In addition to grayscale images, other forms of malware images have recently been proposed for malware classification. For instance, Wang et al. (2019a) converted the byte sequence into an RGB color image; this conversion maps every 8 bits, in sequence, to an integer RGB value. An RGB image named "CoLab image" was proposed by Xiao et al. (2021). The CoLab image uses colored label boxes to mark the sections of malware, and a malware classification method based on the CoLab image, VGG16, and SVM was constructed. Yuan et al. (2022) converted a malware binary file into a multidimensional Markov image. This image combines several byte transfer probability matrices and contains richer information about the byte distribution of a malware binary file than the Markov image proposed by Yuan et al. (2020).

Different from existing work, this article uses the sequence of assembly instructions as the extracted content and the transfer probabilities of the unique combination of every 2 consecutive letters or numbers in that sequence as the filling mode to generate a three-channel image from malware. In this way, the byte distribution of assembly instructions, the dependencies of an opcode on its previous opcode, and the quality of each opcode are used to characterize malware. The image thus provides richer information about the assembly instructions of malware than binary and opcode sequences.

3. Motivation

Since malware classification methods based on malware images and deep learning achieve excellent performance, this article focuses on designing a method of this type. One key problem for such a method is how to convert malware into an image, because a deep learning model requires a data representation from which it can effectively extract key features. As shown in Fig. 1, there are two critical factors for generating an image from malware: what content is extracted from the malware and how that content is filled into the image.

Definition 1 (Extracted Content): Extracted content is the substance that is extracted from a malware file to prepare for filling an image.

Definition 2 (Filling Mode): The filling mode is the method of filling the extracted content into an image.

Extracted content determines the collection of available information that can be used to fill an image. A binary sequence, an opcode sequence, and a sequence of assembly instructions are instances of extracted content. The filling mode gives the specific

Fig. 1. Extracted content and filling mode for generating a malware image.


[Fig. 2 image: side-by-side IDA listings of .text code blocks from two samples of the same family, starting at .text:004027B5 and .text:0040695E. Both blocks open with push ebp; mov ebp, esp; sub esp, 50h, continue with a run of mov [ebp+var_xx], immediate stores, and end with mov edx/ecx, [ebp+arg_4]; and edx/ecx, 0FFFF0000h; cmp edx/ecx, 0; a short jz; mov [ebp+var_xx], 0; and a short jmp. Labels mark the parts shared by the two blocks, the opcodes, register operands, immediate operands, and the code address and absolute address of memory operands. The relative addresses of the final jmp targets are equal: 40281F − 402816 = 9 and 4069C8 − 4069BF = 9.]

Fig. 2. Comparison of the assembly instructions in code blocks of two malware samples from the same family.

pixel values and the size of the generated image, and thus it determines what information in the extracted content, and how much of it, the generated image can provide. Filling modes can be divided into two classes: nonuniform-sized and uniform-sized. When the generated images differ in size after filling and thus cannot be directly applied to a CNN, the filling mode belongs to the nonuniform-sized class; otherwise, it belongs to the uniform-sized class. Using every 8 binary digits as one pixel and the stable values used in (Fu et al., 2018) are instances of nonuniform-sized filling modes, whereas the Markov transition probability and the SimHash value are examples of uniform-sized filling modes. Therefore, the extracted content and filling mode undoubtedly influence whether a generated image is of high quality, i.e., whether it provides sufficient valuable information about its original malware file to distinguish generated images belonging to different malware families.

3.1. Extracted content

Binary and opcode sequences are the two most common types of extracted content used to generate malware images. On the one hand, the binary sequence preserves binary information and the global structure of a malware file while ignoring the specific semantic information contained in the code section. On the other hand, the opcode sequence preserves part of the information in the assembly instructions of malware. The literature has shown that the opcode sequence is better than the binary sequence for analyzing malware files (Manavi and Hamzeh, 2017; Raff et al., 2018). However, the opcode sequence lacks the operands, which participate in the execution of assembly instructions as the objects of the various operations; thus, it contains only partial information about the assembly instructions of malware.

The address space of a PE file is flat, and its code and data are stored in different sections in a certain format. Usually, the data in different sections are logically related. For example, the code section stores the program code of a PE file, while the ".data" section stores the data variables of the program. Program code is the core of the program, and different malware samples from the same family generally have similar program code. To preserve more information about the assembly instructions of malware, the extracted content used in this article is the sequence of assembly instructions in the code section of a malware file. An assembly instruction consists of an opcode and zero or more operands. In a PE file, the code section is a block whose main content consists of assembly instructions. Therefore, we focus on the code section of the PE file rather than on the whole PE file and discard the other sections, such as the ".data" section. Fig. 2 shows the assembly instructions in code blocks of two malware samples from the same family; the instructions of the two samples are clearly very similar in terms of opcodes and execution sequence. There are three main types of operands: immediate operands, register operands, and memory operands. The similarity of the first two types can be observed directly, while the similarity of memory operands is better observed through their relative addresses than through their absolute addresses. The relative address of a memory operand is obtained by subtracting the code address from the absolute address; in Fig. 2, 40281F − 402816 = 9 in one sample and 4069C8 − 4069BF = 9 in the other. Intuitively, using the sequence of assembly instructions as the extracted content preserves richer information about the assembly instructions of malware than binary and opcode sequences.

3.2. Filling mode

As mentioned above, there are two types of filling modes: nonuniform-sized and uniform-sized. The most commonly used grayscale image uses every 8 binary digits as one pixel, so its filling mode belongs to the nonuniform-sized class. The size of the malware image generated by this filling mode varies with the malware size. Therefore, byte truncation or image scaling is required to unify the sizes of malware images when


Fig. 3. MCTVD framework.

training with a CNN. However, part of the original binary information of malware can be lost when the sizes of grayscale images are unified (Yuan et al., 2020). In addition, when a malware variant has relocation sections, the similarity between the variant's image and the image of its original malware is low. The stable values of "section entropy" and "section size" used in (Fu et al., 2018) are another filling mode belonging to the nonuniform-sized class.

For uniform-sized filling modes, the SimHash value and the Markov transition probability are the most commonly used approaches. In (Ni et al., 2018), SimHash was used to convert the opcode sequences of different malware samples into images of equal size, but SimHash only generates short values; thus, the image converted from SimHash values usually requires interpolation, which may introduce meaningless padding information. With the Markov transition probability, the size of the generated image is fixed, so the image can be used directly as input to a CNN. In (Manavi and Hamzeh, 2017), the frequency of the unique combination of every 2 consecutive opcodes extracted from the opcode sequence of malware is used to directly form a Markov image. Since there are many distinct opcodes, the generated Markov images are large and sparse. The effective information in such an image may be too sparse, which creates difficulties in training a CNN (Sun and Qian, 2021). Although restricting detection to a fixed subset of opcodes can address this problem, it leads to a partial loss of useful information.

To avoid losing useful information by discarding bytes or opcodes and to alleviate the sparseness of the generated image, we use the transfer probabilities of the unique combination of every 2 consecutive letters or numbers extracted from the sequence of assembly instructions to generate Markov images. Punctuation in the sequence of assembly instructions is omitted because it only separates operands. There are only 62 different letters and numbers, which keeps the matrices small and uniform in size. Most new malware derives from known malware with some code differences (Sun and Qian, 2021), so it can be inferred that malware of the same family has highly similar code structures. Specifically, when generating our three-channel image, we use the transfer probability of the unique combination of 2 consecutive letters or numbers, the transfer probability of the unique combination of the first letters of 2 consecutive opcodes, and the transfer probability of the unique combination of the last 2 consecutive letters in each opcode. These (approximately) represent the byte distribution of the assembly instructions, the dependencies of an opcode on its previous opcode, and the quality of each opcode. Compared with the previous filling modes, this filling mode provides richer information about the assembly instructions in a malware file and has a uniform size, so it does not lose useful information about the assembly instructions and is suitable for classifying variants with relocation sections. After the three-channel image is generated, a CNN is used to train the malware classification model. According to current research, CNNs can discover the local features of images and are a good choice for image classification. Since the proposed three-channel image is only 62 × 62 × 3, a CNN with few fully connected layers is designed to classify the malware images. The details are introduced in Section 4.

4. MCTVD method

This section describes how MCTVD classifies malware based on the content of Section 3. The overall framework of MCTVD is shown in Fig. 3. It consists of three steps, described below.


Fig. 4. Schematic diagram of the three-channel image used in MCTVD.
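Step 2 (Section 4.2) is concrete enough to sketch in code. The following minimal pure-Python sketch builds the three 62 × 62 channels $M_1$, $M_2$, $M_3$ from a list of instruction strings by applying Formula (1). It is an illustration under our own assumptions, not the authors' implementation: the states $s_0, \ldots, s_{61}$ are ordered as A to Z, a to z, then 0 to 9; the opcode is taken to be the first whitespace-separated token of an instruction; the instructions are concatenated into one stream, so $M_1$ also counts transitions across instruction boundaries; single-character opcodes contribute nothing to $M_3$; and a state with no outgoing transition keeps an all-zero row. The input is assumed to have already gone through Step 1 (IDA's ".text:" address prefixes removed, memory operands rewritten as relative addresses), which the sketch does not perform. The names ALPHABET, INDEX, transfer_matrix, adjacent_pairs, and three_channel_image are hypothetical.

```python
import string
from typing import Iterable, List, Sequence, Tuple

# Hypothetical state order (an assumption): s0..s25 = 'A'..'Z',
# s26..s51 = 'a'..'z', s52..s61 = '0'..'9'.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}
N_STATES = len(ALPHABET)  # 62

Matrix = List[List[float]]


def transfer_matrix(pairs: Iterable[Tuple[str, str]]) -> Matrix:
    """Formula (1): P[m][n] = f(m, n) / (sum over n of f(m, n)).

    Pairs containing a character outside the 62 states are ignored, and a
    state with no outgoing transition keeps an all-zero row (assumption).
    """
    counts = [[0] * N_STATES for _ in range(N_STATES)]
    for a, b in pairs:
        if a in INDEX and b in INDEX:
            counts[INDEX[a]][INDEX[b]] += 1
    matrix: Matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def adjacent_pairs(items: Sequence[str]) -> List[Tuple[str, str]]:
    """Every pair of neighbouring items, in order."""
    return list(zip(items, items[1:]))


def three_channel_image(instructions: List[str]) -> List[Matrix]:
    """Return [M1, M2, M3], each 62 x 62, for a list of instruction strings."""
    # Channel 1: drop punctuation and spaces, keep the letter/digit stream.
    stream = [ch for ch in "".join(instructions) if ch in INDEX]
    m1 = transfer_matrix(adjacent_pairs(stream))

    # Assumption: the opcode is the first whitespace-separated token.
    opcodes = [ins.split()[0] for ins in instructions if ins.split()]

    # Channel 2: first letters of every 2 consecutive opcodes.
    m2 = transfer_matrix(adjacent_pairs([op[0] for op in opcodes]))

    # Channel 3: the last two letters of each opcode form one transition.
    m3 = transfer_matrix((op[-2], op[-1]) for op in opcodes if len(op) >= 2)
    return [m1, m2, m3]
```

For the first three instructions of Fig. 2 ("push ebp", "mov ebp, esp", "sub esp, 50h"), $M_2$ holds only the transitions p to m and m to s, each with probability 1, while the row of $M_1$ for "p" splits evenly (0.2 each) over u, m, e, s, and 5. The three matrices could then be stacked into a 62 × 62 × 3 array (for example with numpy.stack(..., axis=-1)) before being fed to a CNN; whether the article rescales the probabilities to pixel intensities is not stated in this section.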

Step 1: Sequence extraction. The assembly instructions in the code section of malware are extracted from executable files with the help of static analysis tools such as IDA Pro. In the later steps, these instructions are treated as one sequence of assembly instructions.

Step 2: Malware visualization. The sequence of assembly instructions obtained in Step 1 is used to generate three Markov matrices, which together form a 62 × 62 × 3 three-channel image.

Step 3: Model training and classification of malware images. The three-channel images are used to build a malware classification model with our presented CNN. The model then assigns the three-channel images converted from the samples to be classified to malware families.

4.1. Sequence extraction

The sequence extraction step extracts a sequence of assembly instructions as the extracted content for the three-channel image of MCTVD. In a PE file, the code section stores the code to be executed at run time. To obtain more useful and nonredundant features, MCTVD focuses on the assembly instructions in the code section of malware. Since these instructions cannot be read directly from the malware binary, a third-party analysis tool such as IDA Pro is needed to convert the malware files into assembly files. After obtaining the assembly file, MCTVD extracts the assembly instructions in its code section and combines them into a sequence of assembly instructions. When generating this sequence, the opcodes, immediate operands, and register operands of the extracted instructions are retained directly. Memory operands are replaced by their relative addresses, obtained as their absolute addresses minus their code addresses. Compared with the binary and opcode sequences, the sequence of assembly instructions contains richer information about the assembly instructions of malware.

4.2. Malware visualization

In the malware visualization step, MCTVD uses the sequence of assembly instructions obtained in the previous step to generate a three-channel image. Specifically, we use the transfer probabilities of the uppercase letters, lowercase letters, and digits in the sequence of assembly instructions as the pixel values. Punctuation is omitted because it only separates operands or helps the computer parse human-written assembly code, and such irrelevant information may cause difficulties in the learning phase. The uppercase and lowercase letters and digits of the sequence of assembly instructions can be represented as a byte stream $S$. If each letter or digit is regarded as a state, then each element in stream $S$ has 62 possible states, namely 26 uppercase letters, 26 lowercase letters, and 10 digits, denoted $s_0, s_1, \ldots, s_{61}$. We assume that the next state depends only on the current state; hence, the sequence of assembly instructions can be regarded as a Markov chain, i.e.,

$$P(s_{i+1} \mid s_0, \ldots, s_i) = P(s_{i+1} \mid s_i).$$

The transfer probability $P_{m,n}$ from the previous state $m$ to the subsequent state $n$ is computed by Formula (1):

$$P_{m,n} = P(n \mid m) = \frac{f(m, n)}{\sum_{n=0}^{61} f(m, n)} \tag{1}$$

where $f(m, n)$ is the frequency of moving from state $m$ to state $n$ ($m, n \in \{0, 1, \ldots, 61\}$).

A transfer probability matrix with $n$ states has $n^2$ transfer probabilities and is an $n \times n$ matrix. Therefore, the matrix $M$ used in MCTVD has a small and uniform size (62 × 62). As shown in Formula (2), the element $p_{i,j}$ in row $i$ and column $j$ of $M$ represents the transfer probability from state $i$ to state $j$.

$$M = \begin{bmatrix} p_{0,0} & p_{0,1} & \cdots & p_{0,61} \\ p_{1,0} & p_{1,1} & \cdots & p_{1,61} \\ \vdots & \vdots & \ddots & \vdots \\ p_{61,0} & p_{61,1} & \cdots & p_{61,61} \end{bmatrix} \tag{2}$$

To ensure that the generated malware image provides rich information about the assembly instructions of malware, MCTVD constructs a three-channel image from three transfer probability matrices generated from the sequence of assembly instructions, with each matrix used as one channel. A schematic diagram of the three-channel image used in MCTVD is shown in Fig. 4.

The transfer probability of the unique combination of every 2 consecutive letters or numbers is used to fill a matrix $M_1$, which forms the first channel of the three-channel image. It reflects the byte distribution of the assembly instructions of a malware file. Two-tuple opcodes can reflect the dependencies of an opcode on its previous opcode. To generate the second channel, the transfer probabilities of the unique combination of the first letters of every 2 consecutive opcodes in the sequence of assembly instructions are used as the pixel values, because they can approximately substitute for the transfer probabilities of 2-tuple opcodes. These transfer probabilities fill a 62 × 62 matrix $M_2$, so that its size is the same as that of $M_1$. The quality of each opcode is a useful statistical characteristic of a PE file, and we add this property to our malware image. Specifically, the transfer probability of the unique combination of the last two letters in each opcode of the sequence of assembly instructions is used to approximately substitute for the quality of that opcode in a PE file. A 62 × 62 matrix $M_3$ filled with these transition probabilities forms the third channel. Finally, a 62 × 62 × 3 matrix $M = [M_1 \mid M_2 \mid M_3]$ is constructed and used to form a three-channel image. Fig. 5 shows the malware images

belonging to two malware families that were generated using MCTVD. As shown in Fig. 5, images generated from malware of the same family have similar pixels and colors, while images generated from malware of different families differ significantly in some areas. These commonalities and differences suggest that a CNN trained on the three-channel images could distinguish different malware families. The process of generating the three-channel images is given in Algorithm 1.

Fig. 5. Malware images generated by MCTVD.

Algorithm 1 Three-channel image generation.
Input: PE software E.
Output: Three-channel image M.
1: A ⇐ assembly language file extracted from sample E
2: C ⇐ code section obtained from A
3: for each line li in C do
4:     if li contains an assembly instruction then
5:         Rmo ⇐ calculate the relative address of the memory operand in the assembly instruction
6:         Aai ⇐ store the opcode, immediate operands, register operands, and Rmo
7:     end if
8: end for
9: /* Fai is the set of the first letters of every 2 consecutive opcodes in Aai; Eai is the set of the last two letters of each opcode in Aai */
10: PAai ⇐ remove punctuation from Aai, leaving only letters and digits
11: Fai ⇐ store the first letters of every 2 consecutive opcodes from Aai
12: Eai ⇐ store the last two letters of each opcode from Aai
13: M1 ⇐ Markov matrix generated from PAai
14: M2 ⇐ Markov matrix generated from Fai
15: M3 ⇐ Markov matrix generated from Eai
16: M ⇐ three-channel image constructed from M1, M2, M3
17: return M

4.3. Model training and classification of malware images

This step trains a model to classify malware images into families. In recent years, deep learning models have proven effective in many fields, and some of them, such as AlexNet, VGG16, and VGG19, perform well in image recognition. However, as mentioned above, the three-channel image generated in step 2 is only 62 × 62 × 3. Such small images make a complex model such as VGG16 ill-suited to our three-channel images, because it is difficult to train and prone to overfitting. To address this problem, we construct a CNN (shown in Fig. 6) with fewer layers than VGG16, VGG19, or AlexNet; compared with these network structures, our presented architecture reduces the number of both the convolutional and the fully connected layers. In our presented architecture, the convolution kernel of each convolutional layer is 3 × 3. With the other parameters (stride = 1, padding = ‘same’), each convolutional layer keeps the same width and height as the previous layer. The number of convolution kernels doubles from layer to layer, starting at 64 and ending at 512. The pooling layers use max pooling with a 2 × 2 pool and a stride of 2 × 2 (the default). Each pooling layer halves the width and height of the feature map (rounding down), from 62 to 31, 15, 7, and finally 3. To speed up training, ReLU is used as the activation function of both the convolutional layers and the fully connected layer. The ReLU function is given by Formula 3.

$$
h(x) = \begin{cases} x, & x > 0 \\ 0, & x \le 0 \end{cases} \qquad (3)
$$

After convolution and pooling, the data are flattened into a one-dimensional vector by the flatten function. Our presented architecture then includes a fully connected layer with an output size of 1024, followed by a softmax layer whose number of neurons equals the number of malware families in the training dataset. The softmax function, which is used for multiclass classification, is given by Formula 4.

$$
\mathrm{softmax}(y_i) = \frac{\exp(a_i)}{\sum_{j=1}^{n} \exp(a_j)} \qquad (4)
$$

Table 1 compares our presented architecture with AlexNet, VGG16, and VGG19. Our presented architecture has fewer convolutional and fully connected layers than these network structures and, correspondingly, requires less training time than AlexNet, VGG16, and VGG19 (see Section 5 for details). During training, the Adam optimizer is used to learn the parameters, and the network is trained with the cross-entropy loss.

5. Experimental evaluation

5.1. Dataset and experimental environment

The malware dataset used to evaluate MCTVD was derived from a malware classification contest held by Microsoft on Kaggle in 2015 (Ronen et al., 2018). Since 2016, it has been the most widely used benchmark dataset in static malware analysis. The dataset consists of two separate parts: a training dataset and a test dataset. The training dataset contains nine malware families with a total of 10,868 samples. The test dataset contains 10,873 samples, but their labels are not publicly available. Therefore, as in most of the literature, we used only the training dataset (hereafter referred to as the Microsoft dataset) to obtain experimental results. Table 2 lists the sample distribution of the Microsoft dataset. For each sample, the dataset provides two
file formats: malware binary files with the suffix “.bytes” (binary stream files without PE headers) and the corresponding assembly files with the suffix “.asm” decompiled by IDA Pro. MCTVD used only the assembly files. Note that 61 samples were excluded from the MCTVD experiments because their assembly files did not contain the code section.

To verify the effectiveness of MCTVD, we used stratified 10-fold cross-validation. That is, the dataset was divided into 10 subsets of roughly equal size; each subset was used in turn as the test data, while the remaining subsets were used as the training data.

MCTVD was implemented in Python 3 and trained on Ubuntu 18.04. Experiments were conducted on a machine with an Intel(R) Xeon(R) Gold 5220 CPU, one NVIDIA GeForce RTX 2080 Ti GPU, and 251 GB of RAM. Table 3 lists the parameters used for MCTVD and the other network structures and methods in the experiment.

Fig. 6. Network structure of our presented architecture used in MCTVD.

Table 1
Comparison of our presented architecture with AlexNet, VGG16, and VGG19 in the network structure.

Network                     Convolutional layers  Pooling layers  Fully connected layers  Total
AlexNet                     5                     3               3                       11
VGG16                       13                    5               3                       21
VGG19                       16                    5               3                       24
Our presented architecture  4                     4               1                       9

Table 2
Sample distribution of the Microsoft dataset.

Malware family   Malware type         Number of samples
Ramnit           Worm                 1541
Lollipop         Adware               2478
Kelihos_ver3     Backdoor             2942
Vundo            Trojan               475
Simda            Backdoor             42
Tracur           Trojan downloader    751
Kelihos_ver1     Backdoor             398
Obfuscator.ACY   Obfuscated malware   1228
Gatak            Backdoor             1013
Total                                 10868

5.2. Evaluation metrics

Four commonly used evaluation metrics were used to assess the classification performance of MCTVD: accuracy, precision, recall, and F1-score. They are calculated by Formulas 5–8 from true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN).

$$
\mathrm{Accuracy} = \frac{\sum_{i \in \{1,\dots,k\}} TP_i}{N} \qquad (5)
$$

$$
\mathrm{Precision} = \frac{\sum_{i \in \{1,\dots,k\}} TP_i}{\sum_{i \in \{1,\dots,k\}} (TP_i + FP_i)} \qquad (6)
$$

$$
\mathrm{Recall} = \frac{\sum_{i \in \{1,\dots,k\}} TP_i}{\sum_{i \in \{1,\dots,k\}} (TP_i + FN_i)} \qquad (7)
$$

$$
\mathrm{F1\text{-}score} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \qquad (8)
$$

where N denotes the number of samples in the dataset with k malware families.

For a specific malware family i ∈ {1, 2, ..., k}, TP_i and FN_i denote the numbers of samples that actually belong to family i and are, respectively, correctly predicted as family i and not predicted as family i; TN_i and FP_i denote the numbers of samples that do not belong to family i and are, respectively, correctly not predicted as family i and incorrectly predicted as family i. Furthermore, we used the receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC)
to show more details about the performance of the models. In addition, the training time in seconds (for building the model in the model training step) was used to measure the efficiency of training the models.

Table 3
Parameters of different network structures and methods in the experiment.

Network structure or method   Optimizer   Learning rate   Decay rate   Batch size   Epochs
AlexNet                       Adam        3e-4            2e-7         16           250
VGG16                         Adam        3e-4            2e-7         16           250
VGG19                         Adam        3e-4            2e-7         16           250
GDMC                          Adam        3e-4            2e-7         16           250
MDMC                          Adam        3e-4            2e-7         16           250
RGBDMC                        Adam        3e-4            2e-7         16           250
MalCVS                        Adam        1e-3            5e-4         256          74
MulMarkov                     Adam        3e-4            2e-7         16           250
MCTVD                         Adam        3e-4            2e-7         16           250

Table 4
Results of our presented architecture, AlexNet, VGG16, and VGG19 under 10-fold cross-validation.

Network                     Accuracy  Macro precision  Macro recall  Macro F1-score
AlexNet                     0.9924    0.9806           0.9872        0.9838
VGG16                       0.9915    0.9763           0.9805        0.9783
VGG19                       0.9900    0.9586           0.9764        0.9665
Our presented architecture  0.9944    0.9944           0.9913        0.9929

Because the families in the Microsoft dataset are imbalanced (for example, the Simda family has only 42 samples, whereas the Kelihos_ver3 family has 2942), we used macro averaging, i.e., the unweighted mean of each evaluation metric over the individual families, calculated by Formula 9, where q is the number of malware families.

$$
\mathrm{Macro\_metrics} = \frac{1}{q} \sum_{i=1}^{q} \mathrm{metrics}_i \qquad (9)
$$
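To make Formulas 5–9 concrete, the following Python sketch computes them from a k × k confusion matrix such as those shown in Figs. 11 and 14. It assumes that rows are true families and columns are predicted families, and it applies Formula 9 to per-family precision, recall, and F1-score, so macro F1 is taken as the mean of the per-family F1 values. The text does not spell out that last choice, and the function names are ours rather than part of MCTVD. One property is worth noting: in single-label classification, every misclassified sample counts once as an FP (for the predicted family) and once as an FN (for its true family). As a result, the pooled sums in Formulas 6 and 7 both reduce to the accuracy of Formula 5, and it is the macro averages that reveal weak performance on small families such as Simda.

```python
from typing import Dict, List

ConfusionMatrix = List[List[int]]  # cm[i][j]: samples of true family i predicted as family j


def accuracy(cm: ConfusionMatrix) -> float:
    """Formula 5: correctly classified samples divided by N."""
    n = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / n


def pooled_precision(cm: ConfusionMatrix) -> float:
    """Formula 6: sum of TP_i over the sum of (TP_i + FP_i) across all families."""
    k = len(cm)
    tp = sum(cm[i][i] for i in range(k))
    fp = sum(cm[r][i] for i in range(k) for r in range(k) if r != i)
    return tp / (tp + fp)


def per_family(cm: ConfusionMatrix) -> Dict[str, List[float]]:
    """Precision, recall, and F1-score of each family i."""
    k = len(cm)
    out: Dict[str, List[float]] = {"precision": [], "recall": [], "f1": []}
    for i in range(k):
        tp = cm[i][i]
        fp = sum(cm[r][i] for r in range(k)) - tp  # predicted as i, but another family
        fn = sum(cm[i]) - tp                         # family i, predicted as another family
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        out["precision"].append(p)
        out["recall"].append(r)
        out["f1"].append(2 * p * r / (p + r) if p + r else 0.0)
    return out


def macro(values: List[float]) -> float:
    """Formula 9: unweighted mean over the q families."""
    return sum(values) / len(values)


if __name__ == "__main__":
    cm = [[8, 2, 0],
          [1, 9, 0],
          [0, 0, 5]]
    m = per_family(cm)
    print(accuracy(cm), pooled_precision(cm))  # 0.88 0.88
    print(macro(m["precision"]), macro(m["recall"]), macro(m["f1"]))
```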

5.3. Experimental results

5.3.1. Large training dataset test

A. The results of our presented architecture and other network structures

A CNN is designed in Section 4.3 to classify our three-channel images. In this section, we compare our presented architecture with three well-known network structures, i.e., AlexNet, VGG16, and VGG19, on the same three-channel images. To avoid random errors, 10-fold cross-validation is used to compare the different network structures, and the value of each evaluation metric in this section is averaged over the 10 folds. The results obtained by our presented architecture, AlexNet, VGG16, and VGG19 on the whole Microsoft dataset with 10-fold cross-validation are given in Table 4.

On the one hand, as shown in Table 4, the classification accuracies obtained by all four network structures on the Microsoft dataset, with its samples converted to our three-channel images, are higher than 99%. This suggests that our three-channel images separate malware families well. On the other hand, Table 4 shows that our presented architecture outperforms AlexNet, VGG16, and VGG19 in terms of accuracy, macro precision, macro recall, and macro F1-score. In addition, Fig. 7 shows that the training time of our presented architecture is less than that of AlexNet, VGG16, and VGG19, which confirms its lower time consumption during training. The ROC curves and AUC values of the different network structures are shown in Fig. 8; our presented architecture and VGG16 perform better than AlexNet and VGG19.

Fig. 7. Training times of different network structures under 10-fold cross-validation.

Fig. 8. ROC curves and AUC values of different network structures under 10-fold cross-validation.

B. Comparative evaluation

To better assess the performance of MCTVD, we took the following methods as our baselines: the traditional grayscale image-based method (GDMC) (Nataraj et al., 2011), the byte-level method based on Markov images and deep learning (MDMC) (Yuan et al., 2020), the RGB color image-based method (RGBDMC) (Wang et al., 2019a), the CoLab image-based method (MalCVS) (Xiao et al., 2021), and the multidimensional Markov image-based method (MulMarkov) (Yuan et al., 2022). The grayscale image used in GDMC is the most widely used malware image. MDMC generates the malware image from the Markov transfer probability matrix and uses a deep CNN to classify the generated images. The malware image generated by RGBDMC is an RGB image converted from binary sequences. MalCVS is a malware classification method using CoLab images (also RGB images), a pretrained VGG16, and an SVM. MulMarkov is a malware classification method based on multidimensional Markov images and deep learning. In the experiment, GDMC used our presented architecture in single-channel mode, RGBDMC used our presented architecture to classify the RGB images, and MulMarkov used the CNN proposed in (Su et al., 2018) to classify the multidimensional Markov images (dimension = 3). The images of GDMC, MDMC, RGBDMC, MalCVS, and MulMarkov were compressed to 256 × 256, 256 × 256, 256 × 256 × 3, 224 × 224 × 3, and 256 × 256 × 3, respectively.

Table 5
Results obtained by different methods under 10-fold cross-validation.

Method      Accuracy  Macro precision  Macro recall  Macro F1-score
GDMC        0.9237    0.8948           0.8452        0.8649
MDMC        0.9826    0.9563           0.9422        0.9487
RGBDMC      0.9410    0.9198           0.8650        0.8839
MalCVS      0.9891    0.9821           0.9748        0.9763
MulMarkov   0.9915    0.9902           0.9628        0.9747
MCTVD       0.9944    0.9944           0.9913        0.9929

Fig. 9. Training times of different methods under 10-fold cross-validation.

Fig. 10. ROC curves and AUC values of different methods under 10-fold cross-validation.

As shown in Table 5, among the six methods, MCTVD obtained the best accuracy, macro precision, macro recall, and macro F1-score. Notably, the accuracy of MCTVD is 5.34 percentage points higher than that of RGBDMC, which uses the same CNN structure; this indicates that the malware image generated by MCTVD is markedly superior to the RGB image generated by RGBDMC for malware classification. Regarding training time (Fig. 9), MalCVS requires far less training time than the other five methods because it relies on a pretrained model and a traditional machine learning algorithm. Among the remaining five methods, the training time of MCTVD is less than that of GDMC, MDMC, RGBDMC, and MulMarkov. One main reason is that the image generated by MCTVD is only 62 × 62 × 3, which is smaller than the images generated by these four methods. As for the ROC curves and AUC values shown in Fig. 10, MCTVD obtains the highest AUC value among all the methods.

The confusion matrices obtained by the six methods are shown in Fig. 11; they give the detailed accuracy of each method on each of the nine malware families. From Fig. 11, we can observe that MCTVD outperforms GDMC, MDMC, RGBDMC, and MalCVS on all nine malware families, and is inferior to MulMarkov on three of the nine malware families.

To further assess the accuracy of MCTVD, we compare it with other state-of-the-art malware classification methods. For fairness, only methods that used the same dataset and data split (the whole Kaggle Microsoft training dataset under 10-fold cross-validation) and that rely on a single modality of data (.bytes or .asm) were selected. More specifically, methods (Drew et al., 2016), (Narayanan et al., 2016), (Drew et al., 2017), and (Hassen and Chan, 2017) are based on static features and traditional machine learning; methods (Lin and Yeh, 2022), (Gibert, Mateu, Planes, Vicens, 2018), (Ding et al., 2020), and (Gibert et al., 2018a) are based on static features and deep learning; and methods (Kim et al., 2017), (Kim and Cho, 2022), (Gibert et al., 2019), and (Ren et al., 2020) can be classified as methods based on malware images and deep learning. Table 6 compares the average accuracies of the different methods under 10-fold cross-validation on the Microsoft dataset. As shown in Table 6, MCTVD obtains a higher average accuracy than the other methods, which makes it a highly competitive candidate for malware classification.

5.3.2. Small training dataset test

To evaluate the performance of MCTVD when only limited training samples are available, this section presents an experiment in which MCTVD is trained on a small training dataset. To avoid random errors, a specific 5-fold cross-validation was used: in each fold, 20% of the samples in the whole dataset were used as the training data, and the remaining 80% were used as the test data. The value of each evaluation metric is averaged over the five folds.
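This protocol mirrors ordinary 5-fold cross-validation: each fold trains on one fifth of the data and tests on the other four fifths. The Python sketch below shows one way to generate such splits. The text only says that 20% of the samples were used for training in each fold. Stratifying by family, making the five training fifths disjoint, and the fixed shuffling seed are our assumptions, and `small_train_folds` is a hypothetical helper rather than part of MCTVD.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def small_train_folds(labels: Sequence[str], k: int = 5,
                      seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, test_indices) pairs. The training set of fold i is
    the i-th of k disjoint, family-stratified parts (about 1/k of the samples);
    the test set is everything else."""
    by_family: Dict[str, List[int]] = defaultdict(list)
    for idx, family in enumerate(labels):
        by_family[family].append(idx)

    rng = random.Random(seed)
    parts: List[List[int]] = [[] for _ in range(k)]
    position = 0
    for family in sorted(by_family):
        members = by_family[family]
        rng.shuffle(members)
        # Deal samples round-robin so every part receives about 1/k of each family.
        for idx in members:
            parts[position % k].append(idx)
            position += 1

    folds = []
    for i in range(k):
        train = sorted(parts[i])
        test = sorted(idx for j in range(k) if j != i for idx in parts[j])
        folds.append((train, test))
    return folds


if __name__ == "__main__":
    labels = ["Ramnit"] * 50 + ["Lollipop"] * 30 + ["Simda"] * 20
    for train, test in small_train_folds(labels):
        print(len(train), len(test))  # 20 80 for every fold
```

Dealing samples round-robin across families keeps each training fifth close to the family proportions of the whole dataset. This matters for tiny families such as Simda, which would otherwise risk being absent from some training folds.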

Fig. 11. Confusion matrices obtained by different methods under 10-fold cross-validation.

Table 7 compares the six methods under 5-fold cross-validation on the Microsoft dataset. Table 7 shows that even when the training dataset accounts for only 20% of the samples and the test dataset for 80%, the accuracy of MCTVD still reaches 98.72%, which is higher than that of the other five methods. MCTVD also achieves the best macro recall, macro precision, and macro F1-score under 5-fold cross-validation. As for the ROC curves and AUC values of the different methods given in Fig. 12, MCTVD obtains the highest AUC value under 5-fold cross-validation. Fig. 13 gives the training times of the different methods. Similar to the results under 10-fold cross-validation, Fig. 13 shows that MCTVD requires less training time than GDMC, MDMC, RGBDMC, and MulMarkov, but more than MalCVS, because MalCVS relies on a pretrained model and a traditional machine learning algorithm. The resulting confusion matrices of the
average results obtained by the six methods under 5-fold cross-validation are given in Fig. 14. They show that MCTVD performs better than GDMC, MDMC, and RGBDMC on all nine malware families, and is inferior to MulMarkov and MalCVS on only one and two of the nine malware families, respectively.

Table 6
Comparison of the average accuracies of different methods under 10-fold cross-validation on the Microsoft dataset.

Method                                  Input                           Algorithm                Accuracy
(Drew et al., 2016)                     Byte sequence                   Strand                   97.41%
(Narayanan et al., 2016)                Byte image                      Linear kNN               96.6%
(Drew et al., 2017)                     Opcode sequence                 Strand                   98.59%
(Hassen and Chan, 2017)                 Function call graph             Ensembling multiple RF   99.3%
(Lin and Yeh, 2022)                     Bit sequence                    1D CNN                   96.32%
(Gibert, Mateu, Planes, Vicens, 2018)   Structural entropy              CNN                      98.28%
(Ding et al., 2020)                     Opcode sequence                 Self-attention           98.48%
(Gibert et al., 2018a)                  Byte sequence                   Residual network         98.61%
(Kim et al., 2017)                      Byte image                      tGAN                     96.39%
(Kim and Cho, 2022)                     Byte sequence                   VAE+1D CNN+LSTM          97.47%
(Gibert et al., 2019)                   Byte image                      CNN                      97.5%
(Ren et al., 2020)                      Markov image                    VGG16+SVM                99.08%
MCTVD                                   Assembly instruction sequence   CNN                      99.44%

Table 7
Results obtained by different methods under 5-fold cross-validation.

Method      Accuracy  Macro precision  Macro recall  Macro F1-score
GDMC        0.8116    0.7258           0.6648        0.6820
MDMC        0.9620    0.9167           0.9014        0.9086
RGBDMC      0.8673    0.8248           0.7326        0.7520
MalCVS      0.9741    0.9545           0.9363        0.9436
MulMarkov   0.9756    0.9641           0.9318        0.9449
MCTVD       0.9872    0.9789           0.9584        0.9674

Table 8
Models in the ablation experiment.

Method      First channel   Second channel   Third channel
MCTVD-F     •               ◦                ◦
MCTVD-S     ◦               •                ◦
MCTVD-T     ◦               ◦                •
MCTVD-FS    •               •                ◦
MCTVD-FT    •               ◦                •
MCTVD-ST    ◦               •                •
MCTVD       •               •                •

Note: • means the channel is included; ◦ means it is not included.

Fig. 12. ROC curves and AUC values of different methods under 5-fold cross-validation.

5.3.3. Ablation experiment

To explore how much each channel of the three-channel image contributes to the final result, we designed ablation experiments with single-channel and two-channel images and report their results in this subsection. Since each single channel of the three-channel image is a grayscale image, our presented architecture was switched to single-channel mode for the single-channel experiments. Each two-channel image was derived from the three-channel image by filling one channel with 0s. The other experimental parameters were the same as in the previous section. The settings of the seven comparison models are shown in Table 8.
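To make the inputs in Table 8 concrete, the following Python sketch builds the three transition-probability channels described in Section 4.2 (Algorithm 1, steps 10–16) from a short, already preprocessed instruction sequence and then derives the seven ablation inputs. Several details are our assumptions rather than statements from the text: the order of the 62 states (digits, then A–Z, then a–z), that the first whitespace-separated token of each line is the opcode, that rows without outgoing transitions stay all-zero, and the channel-first list layout (an image library would typically store 62 × 62 × 3). It is an illustration, not the authors' implementation.

```python
import string
from typing import Dict, Iterable, List, Sequence, Tuple

# Assumed ordering of the 62 states: 0-9, A-Z, a-z.
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase
INDEX: Dict[str, int] = {ch: i for i, ch in enumerate(ALPHABET)}
N = len(ALPHABET)  # 62
Matrix = List[List[float]]


def transition_matrix(pairs: Iterable[Tuple[str, str]]) -> Matrix:
    """Count state-to-state transitions, then normalize each row to probabilities."""
    counts = [[0] * N for _ in range(N)]
    for a, b in pairs:
        if a in INDEX and b in INDEX:
            counts[INDEX[a]][INDEX[b]] += 1
    matrix: Matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def three_channels(instructions: Sequence[str]) -> List[Matrix]:
    """Return [M1, M2, M3] for a list of preprocessed instruction strings."""
    # M1: consecutive letters/digits of the whole sequence (punctuation and spaces dropped).
    stream = [ch for line in instructions for ch in line if ch in INDEX]
    m1 = transition_matrix(zip(stream, stream[1:]))
    opcodes = [line.split()[0] for line in instructions if line.split()]
    # M2: first letters of every two consecutive opcodes.
    firsts = [op[0] for op in opcodes]
    m2 = transition_matrix(zip(firsts, firsts[1:]))
    # M3: the last two letters of each opcode (one transition per opcode).
    m3 = transition_matrix((op[-2], op[-1]) for op in opcodes if len(op) >= 2)
    return [m1, m2, m3]


# Table 8: channels kept by each model (0 = first, 1 = second, 2 = third).
ABLATION: Dict[str, Tuple[int, ...]] = {
    "MCTVD-F": (0,), "MCTVD-S": (1,), "MCTVD-T": (2,),
    "MCTVD-FS": (0, 1), "MCTVD-FT": (0, 2), "MCTVD-ST": (1, 2),
    "MCTVD": (0, 1, 2),
}


def ablation_input(channels: List[Matrix], keep: Tuple[int, ...]) -> List[Matrix]:
    """Single-channel models receive one grayscale matrix; the other models keep
    three channels, with every dropped channel filled with 0s."""
    if len(keep) == 1:
        return [channels[keep[0]]]
    zeros = [[0.0] * N for _ in range(N)]
    return [channels[c] if c in keep else zeros for c in range(3)]


if __name__ == "__main__":
    seq = ["push ebp", "mov ebp, esp", "sub esp, 10h", "pop ebp", "retn"]
    channels = three_channels(seq)
    p, m = INDEX["p"], INDEX["m"]
    print(channels[1][p][m])  # 0.5: opcode first letter 'p' is followed by 'm' once and 'r' once
    print({name: len(ablation_input(channels, keep)) for name, keep in ABLATION.items()})
```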
Fig. 15 gives the performance of the different models used in
the ablation experiment under 10-fold cross-validation on the Mi-
crosoft dataset. On the one hand, it shows that among the three
channels, the first channel contributes most to the classification ac-
curacy. On the other hand, each channel contributes to the classi-
fication accuracy because each two-channel image obtains higher
accuracy than the single-channel images constituting them, and
the three-channel image obtains the best accuracy. In practice, the
number of channels n influences the form and quality of the
generated images, so n should be neither too small nor too large.
When a very small value of n is used, the effective information
contained in the generated image may not be enough to make
highly accurate malware classification. On the other hand, if n is
too large, the computation requirement in training a CNN will in-
crease and even lead to less accurate outcomes when some of the
channels are filled by invalid properties.
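How much the network itself grows with n can be estimated from the architecture described in Section 4.3. The Python sketch below recomputes the feature-map sizes and trainable-parameter counts under our reading of that description. It assumes four convolution/max-pooling blocks (Table 1), 3 × 3 kernels with biases, 2 × 2 max pooling with stride 2 and no padding (which reproduces the stated shrinkage from 62 to 3), a 1024-unit fully connected layer, and a nine-way softmax for the Microsoft dataset. It is a reconstruction, not code from the paper, and it counts parameters only, not training time.

```python
from typing import List, Sequence, Tuple


def feature_map_sizes(side: int = 62, blocks: int = 4) -> List[int]:
    """3x3 'same' convolutions keep the side length; each 2x2, stride-2 max pool
    without padding maps a side s to (s - 2) // 2 + 1, i.e. floor(s / 2)."""
    sizes = [side]
    for _ in range(blocks):
        side = (side - 2) // 2 + 1
        sizes.append(side)
    return sizes


def parameter_count(n_channels: int, n_classes: int = 9, side: int = 62,
                    filters: Sequence[int] = (64, 128, 256, 512),
                    dense: int = 1024) -> Tuple[int, int]:
    """Return (parameters of the first convolutional layer, total parameters)."""
    conv_params = []
    c_in = n_channels
    for f in filters:
        conv_params.append(3 * 3 * c_in * f + f)  # kernel weights + one bias per kernel
        c_in = f
    flat = feature_map_sizes(side, len(filters))[-1] ** 2 * c_in  # 3 * 3 * 512 = 4608
    fc = flat * dense + dense            # fully connected layer
    out = dense * n_classes + n_classes  # softmax layer
    return conv_params[0], sum(conv_params) + fc + out


if __name__ == "__main__":
    print(feature_map_sizes())  # [62, 31, 15, 7, 3]
    for n in (1, 2, 3):
        print(n, parameter_count(n))
```

Under these assumptions, only the first convolution depends on n, adding 3 × 3 × 64 = 576 weights per extra channel. Most of the roughly 6.28 million parameters sit in the fully connected layer fed by the 4,608-element flattened vector. In this architecture, then, adding a channel barely changes the model size; its cost lies mainly in building the extra 62 × 62 matrix and in the first layer's convolutions.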

Fig. 13. Training times of different methods under 5-fold cross-validation.

Fig. 14. Confusion matrices obtained by different methods under 5-fold cross-validation.

6. Conclusion

Malware images are the key to malware classification methods based on malware images and deep learning. To analyze the quality of the generated images, definitions of extracted content and filling mode are proposed to characterize the critical factors for the malware visualization task; both need to be considered when generating images from malware. In addition, a three-channel malware visualization method is proposed to improve malware classification accuracy. The three-channel image uses the sequence of assembly instructions in the code section of the PE file as the extracted content and has a small and uniform size. This malware image does not lose useful information from the extracted content because it requires neither truncation nor compression. Based on the three-channel image and a CNN with only a few convolutional and fully connected layers, a malware classification method called MCTVD is constructed. A series of experiments was conducted on the widely used Microsoft dataset to evaluate the performance of MCTVD. Experimental results show that MCTVD achieves high accuracy with both large and small training datasets. In the future, we will further explore how the number of channels used to generate malware images affects malware classification, and use API calls or FCG features as extracted content to generate high-quality malware images.

Fig. 15. The accuracies of the different models used in the ablation experiment under 10-fold cross-validation.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

CRediT authorship contribution statement

Huaxin Deng: Methodology, Software, Data curation, Writing – original draft. Chun Guo: Conceptualization, Methodology, Formal analysis, Funding acquisition, Writing – review & editing. Guowei Shen: Formal analysis, Investigation, Funding acquisition, Resources. Yunhe Cui: Investigation, Validation, Writing – review & editing. Yuan Ping: Methodology, Writing – review & editing.

Data Availability

Data will be made available on request.

Acknowledgments

The authors thank the anonymous referees for their valuable comments and suggestions, which improved the technical content and the presentation of the article. This work is supported by the National Natural Science Foundation of China under Grant No. 62162009, the Science and Technology Foundation of Guizhou Province under Grant No. [2020]1Y268, the Guizhou Major Special Science and Technology Project under Grant No. 20183001, and the Key Technologies R&D Program of He’nan Province under Grant Nos. 212102210084 and 222102210048.

References

Amer, E., Zelinka, I., 2020. A dynamic windows malware detection and prediction method based on contextual understanding of API call sequence. Comput. Secur. 92, 101760.
AV-TEST, Av-test, 2022. https://www.av-test.org/en/statistics/malware/. Online. Accessed: 24 August 2022.
Basha, S.S., Dubey, S.R., Pulabaigari, V., Mukherjee, S., 2020. Impact of fully connected layers on performance of convolutional neural networks for image classification. Neurocomputing 378, 112–119.
Cui, Z., Xue, F., Cai, X., Cao, Y., Wang, G.-g., Chen, J., 2018. Detection of malicious code variants based on deep learning. IEEE Trans. Ind. Inf. 14 (7), 3187–3196.
Ding, Y., Wang, S., Xing, J., Zhang, X., Oi, Z., Fu, G., Qiang, Q., Sun, H., Zhang, J., 2020. Malware classification on imbalanced data through self-attention. In: 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom). IEEE, pp. 154–161.
Drew, J., Hahsler, M., Moore, T., 2017. Polymorphic malware detection using sequence classification methods and ensembles. EURASIP J. Inform. Secur. 2017 (1), 1–12.
Drew, J., Moore, T., Hahsler, M., 2016. Polymorphic malware detection using sequence classification methods. In: 2016 IEEE Security and Privacy Workshops (SPW). IEEE, pp. 81–87.
D’Angelo, G., Ficco, M., Palmieri, F., 2021. Association rule-based malware classification using common subsequences of API calls. Appl. Soft Comput. 105, 107234.
Fu, J., Xue, J., Wang, Y., Liu, Z., Shan, C., 2018. Malware visualization for fine-grained classification. IEEE Access 6, 14510–14523.
Ghouti, L., Imam, M., 2020. Malware classification using compact image features and multiclass support vector machines. IET Inf. Secur. 14 (4), 419–429.
Gibert, D., Mateu, C., Planes, J., 2018. An end-to-end deep learning architecture for classification of malware’s binary content. In: International Conference on Artificial Neural Networks. Springer, pp. 383–391.
Gibert, D., Mateu, C., Planes, J., 2020. The rise of machine learning for detection and classification of malware: research developments, trends and challenges. J. Netw. Comput. Appl. 153, 102526.
Gibert, D., Mateu, C., Planes, J., Vicens, R., 2018. Classification of malware by using structural entropy on convolutional neural networks. In: Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, Louisiana, USA, February 2-7, 2018, AAAI Press, pp. 7759–7764.
Gibert, D., Mateu, C., Planes, J., Vicens, R., 2019. Using convolutional neural networks for classification of malware represented as images. J. Comput. Virol. Hacking Tech. 15 (1), 15–28.
Hassen, M., Chan, P.K., 2017. Scalable function call graph-based malware classification. In: Proceedings of the Seventh ACM on Conference on Data and Application Security and Privacy, pp. 239–248.
Jian, Y., Kuang, H., Ren, C., Ma, Z., Wang, H., 2021. A novel framework for image-based malware detection with a deep neural network. Comput. Secur. 109, 102400.
Kargarnovin, O., Sadeghzadeh, A.M., Jalili, R., 2022. Mal2GCN: a robust malware detection approach using deep graph convolutional networks with non-negative weights. arXiv preprint arXiv:2108.12473.
Kim, J.-Y., Bu, S.-J., Cho, S.-B., 2017. Malware detection using deep transferred generative adversarial networks. In: International Conference on Neural Information Processing. Springer, pp. 556–564.
Kim, J.-Y., Cho, S.-B., 2022. Obfuscated malware detection using deep generative model based on global/local features. Comput. Secur. 112, 102501.
Krizhevsky, A., Sutskever, I., Hinton, G.E., 2012. Imagenet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 25, 1106–1114.
Li, C., Cheng, Z., Zhu, H., Wang, L., Lv, Q., Wang, Y., Li, N., Sun, D., 2022. DMalNet: dynamic malware analysis based on API feature engineering and graph learning. Comput. Secur. 122, 102872.
Lin, W.-C., Yeh, Y.-R., 2022. Efficient malware classification by binary sequences with one-dimensional convolutional neural networks. Mathematics 10 (4), 608.
Manavi, F., Hamzeh, A., 2017. A new method for malware detection using opcode visualization. In: 2017 Artificial Intelligence and Signal Processing Conference (AISP). IEEE, pp. 96–102.
Narayanan, B.N., Djaneye-Boundjou, O., Kebede, T.M., 2016. Performance analysis of machine learning and pattern recognition algorithms for malware classification. In: 2016 IEEE National Aerospace and Electronics Conference (NAECON) and Ohio Innovation Summit (OIS). IEEE, pp. 338–342.
Nataraj, L., Karthikeyan, S., Jacob, G., Manjunath, B.S., 2011. Malware images: visualization and automatic classification. In: Proceedings of the 8th International Symposium on Visualization for Cyber Security, pp. 1–7.
Ni, S., Qian, Q., Zhang, R., 2018. Malware identification using visualization images and deep learning. Comput. Secur. 77, 871–885.
Pachhala, N., Jothilakshmi, S., Battula, B.P., 2021. A comprehensive survey on identification of malware types and malware classification using machine learning techniques. In: 2021 2nd International Conference on Smart Electronics and Communication (ICOSEC). IEEE, pp. 1207–1214.
Pinhero, A., Anupama, M., Vinod, P., Visaggio, C.A., Aneesh, N., Abhijith, S., AnanthaKrishnan, S., 2021. Malware detection employed by visualization and deep neural network. Comput. Secur. 105, 102247.
Raff, E., Barker, J., Sylvester, J., Brandon, R., Catanzaro, B., Nicholas, C.K., 2018. Malware detection by eating a whole EXE. In: Workshops at the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, Louisiana, USA, February 2-7, 2018, AAAI Press, pp. 268–276.
Ren, Z., Chen, G., Lu, W., 2020. Malware visualization methods based on deep convolution neural networks. Multimed. Tools Appl. 79 (15), 10975–10993.
Ronen, R., Radu, M., Feuerstein, C., Yom-Tov, E., Ahmadi, M., 2018. Microsoft malware classification challenge. arXiv preprint arXiv:1802.10135.
San, C.C., Thwin, M.M.S., Htun, N.L., 2019. Malicious software family classification using machine learning multi-class classifiers. In: Computational Science and Technology. Springer, pp. 423–433.
Shalaginov, A., Banin, S., Dehghantanha, A., Franke, K., 2018. Machine learning aided static malware analysis: a survey and tutorial. In: Cyber Threat Intelligence. Springer, pp. 7–45.
Simonyan, K., Zisserman, A., 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.

Soni, H., Kishore, P., Mohapatra, D.P., 2022. Opcode and API based machine learning framework for malware classification. In: 2022 2nd International Conference on Intelligent Technologies (CONIT). IEEE, pp. 1–7.
Su, J., Vasconcellos, D.V., Prasad, S., Sgandurra, D., Feng, Y., Sakurai, K., 2018. Lightweight classification of IoT malware based on image recognition. In: 2018 IEEE 42nd Annual Computer Software and Applications Conference (COMPSAC), Vol. 2. IEEE, pp. 664–669.
Sun, G., Qian, Q., 2021. Deep learning and visualization for identifying malware families. IEEE Trans. Dependable Secure Comput. 18 (1), 283–295. doi:10.1109/TDSC.2018.2884928.
Verma, V., Muttoo, S.K., Singh, V., 2020. Multiclass malware classification via first- and second-order texture statistics. Comput. Secur. 97, 101895.
Wang, S.-w., Zhou, G., Lu, J.-c., Zhang, F.-j., 2019. A novel malware detection and classification method based on capsule network. In: International Conference on Artificial Intelligence and Security. Springer, pp. 573–584.
Wang, W., Li, Y., Wang, X., Liu, J., Zhang, X., 2018. Detecting android malicious apps and categorizing benign apps with ensemble of classifiers. Future Gener. Comput. Syst. 78, 987–994.
Wang, W., Zhao, M., Wang, J., 2019. Effective android malware detection with a hybrid model based on deep autoencoder and convolutional neural network. J. Ambient Intell. Humaniz. Comput. 10 (8), 3035–3043.
Xiao, M., Guo, C., Shen, G., Cui, Y., Jiang, C., 2021. Image-based malware classification using section distribution information. Comput. Secur. 110, 102420.
Xiao, F., Sun, Y., Du, D., Li, X., Luo, M., 2020. A novel malware classification method based on crucial behavior. Math. Probl. Eng. 2020, 6804290.
Yadav, B., Tokekar, S., 2021. Recent innovations and comparison of deep learning techniques in malware classification: a review. Int. J. Inform. Secur. Sci. 9 (4), 230–247.
Yan, J., Qi, Y., Rao, Q., 2018. Detecting malware with an ensemble method based on deep neural network. Secur. Commun. Netw. 2018, 7247095.
Yeboah, P.N., Amuquandoh, S.K., Musah, H.B.B., 2021. Malware detection using ensemble n-gram opcode sequences. Int. J. Interact. Mob. Technol. 15 (24), 19–31.
Yousefi-Azar, M., Hamey, L., Varadharajan, V., Chen, S., 2018. Learning latent byte-level feature representation for malware detection. In: International Conference on Neural Information Processing. Springer, pp. 568–578.
Yuan, B., Wang, J., Liu, D., Guo, W., Wu, P., Bao, X., 2020. Byte-level malware classification based on Markov images and deep learning. Comput. Secur. 92, 101740.
Yuan, B., Wang, J., Wu, P., Qing, X., 2022. IoT malware classification based on lightweight convolutional neural networks. IEEE Internet Things J. 9 (5), 3770–3783. doi:10.1109/JIOT.2021.3100063.
Zhang, H., Xiao, X., Mercaldo, F., Ni, S., Martinelli, F., Sangaiah, A.K., 2019. Classification of ransomware families with machine learning based on N-gram of opcodes. Future Gener. Comput. Syst. 90, 211–221.
Zhang, J., Qin, Z., Yin, H., Ou, L., Hu, Y., 2016. IRMD: malware variant detection using opcode image recognition. In: 2016 IEEE 22nd International Conference on Parallel and Distributed Systems (ICPADS). IEEE, pp. 1175–1180.
Zhao, Y., Cui, W., Geng, S., Bo, B., Feng, Y., Zhang, W., 2020. A malware detection method of code texture visualization based on an improved faster RCNN combining transfer learning. IEEE Access 8, 166630–166641. doi:10.1109/ACCESS.2020.3022722.

Huaxin Deng received the BS degree in computer science and technology from Chongqing University of Science and Technology, China, in 2019. He is currently pursuing the MS degree in computer science and technology at Guizhou University. His recent research interests include information security and malware classification.

Chun Guo received the PhD degree in information security from Beijing University of Posts and Telecommunications in July 2014. He is currently an Associate Professor in the College of Computer Science and Technology, Guizhou University, PR China. His research interests include data mining, intrusion detection and malware detection.

Guowei Shen received his PhD degree from Harbin Engineering University. He is currently a Professor at Guizhou University. His main research interests include big data, computer networks and cybersecurity.

Yunhe Cui received his PhD degree from Southwest Jiaotong University, Chengdu, Sichuan, PR China. He is currently a Lecturer at Guizhou University, Guiyang, Guizhou, PR China. His research interests include software-defined networking, network security, traffic engineering, swarm intelligence algorithms, data centers, edge computing and cloud computing.

Yuan Ping received the BS degree in electronics and information engineering from Southwest Normal University in 2003, the MS degree in mathematics from He’nan University in 2008, and the PhD degree in information security from the Beijing University of Posts and Telecommunications in 2012. He was a Visiting Scholar with the School of Computing and Informatics, University of Louisiana at Lafayette, and with the Department of Computing Science, University of Alberta. He is currently a Professor with Xuchang University. His research interests include machine learning, public key cryptography, data privacy and security, and cloud and edge computing.