
This article has been accepted for publication in IEEE Access.

This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2022.3198072


An Attention Mechanism for Combination of CNN and VAE for Image-Based Malware Classification
TUAN VAN DAO, HIROSHI SATO AND MASAO KUBO
Department of Computer Science, National Defense Academy, 1-10-20 Hashirimizu, Yokosuka, Kanagawa, Japan
Corresponding author: Tuan Van Dao (e-mail: [email protected]).

ABSTRACT Malware is increasing dramatically in both number and complexity. Several techniques and methodologies have been proposed to detect and neutralize malicious software. However, traditional methods based on the signatures or behaviors of malware often require considerable computational time and resources for feature engineering. Recent studies have applied machine learning to the problems of identifying and classifying malware families. Combining many state-of-the-art techniques has become popular, but choosing an appropriate and efficient combination is still a problem. Classification performance has been significantly improved using complex neural network architectures. However, the more complex the network, the more resources it requires. This paper proposes a novel lightweight architecture that combines a small Convolutional Neural Network (CNN) with an advanced Variational Autoencoder (VAE), enhanced by channel and spatial attention mechanisms. In various experiments on the unbalanced and balanced Malimg datasets, the proposed model outperforms other cutting-edge techniques while keeping the processing time low.

INDEX TERMS Malware Classification, Variational Autoencoder, channel attention, spatial attention,
latent representation, information security.

I. INTRODUCTION
The Internet has become an essential part of our lives. However, while providing excellent service, it also raises many security threats. Malware is a powerful tool for an attacker to intrude, sabotage, and control a target indirectly as a remote administration tool through the Internet. The abuse of various malware has a significant impact on cyber-security and threatens individuals, society, and countries [1], [2]. Authors of malware mix different evasion techniques such as user interaction, environment awareness, obfuscation, code compression, and code encryption to change the appearance of existing malicious code and bypass the Anti-virus System and Intrusion Detection System (IDS). However, the new variants often still have the same malicious intentions and characteristics as the original malware.

There are two malware detection and analysis techniques: static analysis and dynamic analysis. Static analysis investigates the malware without executing it [3]. This type of analysis utilizes various information, such as Application Programming Interface (API) calls, the entropy of files, and strings that are all embedded in the raw bytes of the Portable Executable (PE) [4]. The main limitation of static analysis is that it is not sufficient in the case of code obfuscation and zero-day malware. In addition, the analysis will be time-consuming if malware is mixed up with many disruptive methods.

On the other hand, dynamic analysis investigates the malware as it is executed in simulated environments like sandboxes or virtual machines [5]. Unlike static analysis, this analysis does not require disassembling, decompressing, or unpacking the PE file in advance to obtain the malware's features. Its main limitation is that it may not always uncover malicious behavior because some malware can detect virtual environments and change its behavior. Moreover, because of the rapid development of many automatic malware creation tools [6], these methods cannot keep up with the speed of malware generation.

Machine learning has become more potent because its highly developed algorithms can solve most problems encountered in almost every field. Several methods extract elements from malicious software, such as API calls [7],

VOLUME 4, 2016 1

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

T.V. Dao, H. Sato, M. Kubo et al.: A Novel Combination of Light-weight Deep Learning Model for Image-Based Malware Classification

[8], and feed them into machine learning models. Some of them take advantage of Natural Language Processing (NLP) to handle string elements for detection [9] and classification tasks [10]. Existing malware classification research uses machine learning techniques like Support Vector Machine (SVM) [11], K-Nearest Neighbor [12], and Random Forest [13].

Another alternative to the machine learning-based method for malware classification is the vision-based approach [14-39]. Although attackers use obfuscation techniques to achieve spoofing, malware variants from the same family still maintain similar code and data order, although these may not appear in the same location. Convolutional Neural Networks (CNN) can extract the common features of a family. Conti et al. proposed a method to visualize malware binaries as grayscale images and noticed that visual analysis of a malware binary helps distinguish various regions of data in the image [15]. The advantage of malware visualization analysis is that it does not require any decompiler or dynamic running environment. Moreover, malware samples are converted into RGB (Red, Green, Blue) images in [16] by encoding and arranging bytes from binary files. A color image can carry more information than a grayscale image.

The growth of high-performance computing, coupled with huge CNN architectures, has made it possible to process images at a higher level of complexity. However, recent studies indicate that a simple network structure with fewer parameters gives relatively satisfactory results and can be applied to low-profile devices like IoT devices [17] or smartphones [18]. Taking advantage of different well-known CNN architectures, transfer learning is also applied to image-based malware classification [19], [24], [25], [28], [30]. By using pre-trained CNNs and fine-tuning them, several CNNs can extract richer features than simple ones [19].

Another approach that can be used to extract the features of an image is the Autoencoder (AE). AE is an unsupervised deep learning algorithm with a unique neural network structure. AE transforms the input into an output with minimal reconstruction error and can work with small data. However, AE often falls into overfitting, and organizing its latent space is a complex problem. VAE was then introduced as an autoencoder whose training is regularised to avoid overfitting and to ensure that the latent space has suitable properties that enable a generative process. While VAE can represent global features through the latent space, CNN captures local features through small kernels. The combination of VAE and CNN promises to obtain an overall feature of the object [32]. However, this combination has not yet achieved the expected performance.

Attention mechanisms [20] have been a significant breakthrough in deep learning. They have been widely used in image recognition, NLP, and speech recognition. However, few studies on malware classification are based on attention mechanisms in terms of computer vision. Moreover, compared to multi-head attention [20], the attention considered here is designed for feed-forward CNNs and can be applied at every convolutional block in deep networks.

So far, recent studies have focused mainly on the depth and width of neural networks and on increasing the number of features, but have not yet focused on enriching the quality of object features. This paper aims to gather as many worthwhile features as possible while keeping the model architecture small by utilizing CNN and combining it with a new type of Variational Autoencoder enhanced by the attention mechanism, which we call "AVAE". The AVAE provides more discriminative features by mapping the original feature space to a latent representation and refining it.

The main contribution of this paper is an image-based malware classification system built through feature synthesis from VAE, CNN, and attention mechanism. Because the processing depends only on images, the system does not require in-depth knowledge of the malware or of the environment to determine its behavior. Moreover, some classifiers can give the result in under a second, so our model can be applied in real-time countermeasures against malware.

The rest of the paper is organized as follows: Section II discusses the related work concerning some popular and recent techniques in malware detection and classification. Section III illustrates the proposed model in detail. Section IV evaluates the performance of the proposed approach. Finally, we summarize our work in Section V.

II. RELATED WORK
In this section, we investigate various recent studies on image-based malware classification, ranging from models with simple structures to complex ones; some hybrid models with different structural combinations have achieved high performance in malware classification.

Nataraj et al. were the first to propose an approach for visualizing and classifying malware using image processing techniques [12]. They visualized malware as a gray-scale image based on the observation that images of the same class were very similar in layout and texture. They used the GIST descriptor, based on wavelet decomposition of an image, as a feature extractor and k-nearest neighbor (kNN) as a classifier. The paper achieved an accuracy of 97.18% on the dataset it introduced, Malimg, which contains 9,339 malware samples from 25 different malware families. Other feature descriptors, such as HOG and HOC+GIST, have also been applied [22]. However, this method is not suitable for processing a massive amount of malware because of its high computational cost. Naeem et al. [23] utilized a new type of feature descriptor by combining and balancing collective local and global feature vectors. As a result, they achieved a high classification rate of 98% on the Malimg dataset.

Current research focuses on building complex network models with deep CNNs, for example, more than ten Conv layers [2], VGG16 in [24], VGG19 in [25], or a combination of multiple CNN architectures [19]. On the other hand, [26] minimizes parameters to speed up training. Their model achieves an accuracy less than 1% lower than the state-of-the-art result while reducing by 99.7%

the number of trainable parameters of the best model in the comparison.

Verma et al. [27] try to enrich the extracted malware features by concatenating CNN features with 35 other statistical texture features. Many CNNs require high-resolution images for training. The input image size of these networks is usually around 224x224 to 299x299 [28]. The larger the size, the higher the computational cost. Roseline et al. [29] employed lightweight CNNs with merely three convolutional layers with an increasing depth of 16, 32 and 64. The model is optimized by Adam and uses the categorical cross-entropy loss, and the input image is resized to 32x32. With this setting, [29] achieved an accuracy of 97.68% after 50 epochs.

Rezende et al. [30] transferred the first 49 layers of ResNet-50 trained on ImageNet to the malware classification task. The frozen layers can be seen as learned feature extraction layers. The authors replaced the last 1,000-way fully connected softmax layer with a 25-way one, matching the number of classes in the Malimg dataset. After 750 epochs, the paper reached an average accuracy of 98.62% with 10-fold cross-validation. They also compared features extracted from a Deep CNN (DCNN) with GIST features using the same kNN classifier. The experimental result showed that ResNet-50 performed better than handcrafted GIST by 0.52%, with 98.00% and 97.48%, respectively.

Vasan et al. [19] utilized an ensemble of CNNs. They assumed that different CNNs provide different semantic representations of the image; therefore, higher-quality features are extracted than with traditional methods. VGG16 and ResNet-50 pre-trained on ImageNet were fine-tuned for malware images. This ensemble method achieves high detection accuracy with a low false rate.

Anandhi et al. [21] introduced another type of deep CNN with densely connected networks (DenseNet). DenseNet comprises dense blocks, a composite function, and a transition layer. This architecture addresses the vanishing gradient problem, which is caused by the shrinking of the gradient through a deep network. The authors used DenseNet201, which is 201 layers deep, and achieved an accuracy of 98.97% on the original Malimg dataset and 99.36% by combining similar families, C2LOP and Swizzor.

Çayır et al. [26] proposed a simple architecture called Random CapsNet Forest, rather than engineering complex CNN architectures. This model contains capsules similar to autoencoders, with each capsule learning how to represent an instance of a given class. Although the proposed method does not use data augmentation, data resampling, transfer learning, or a weighted loss function, it still achieved acceptable results with an accuracy of 98.72%.

Nisa et al. [31] combine the features extracted from pre-trained AlexNet and Inception-V3. These fused features are then classified using different classifiers such as SVM, kNN, and Decision Tree (DT). [31] achieved an accuracy of 98.7% on the Malimg dataset. The result improved to 99.3% when augmentation was applied to turn Malimg into a balanced dataset.

Lee et al. [1] illustrate the effectiveness of autoencoders by applying multiple AEs. Each AE model classifies only one type of malware and is trained using only samples from the corresponding family. As a result, the authors achieve an accuracy of 94.03% for a system in which all AEs share the same network structure and 97.75% with various AEs. Moreover, the model achieves a 0.46% improvement, from 97.75% to 98.21%, when similar classes are combined. However, the model still misclassifies quite a lot of samples, showing that AE has not been effective in extracting the characteristics of image-based malware.

Burks et al. [32] inserted a VAE into a handcrafted Residual Network (RN); the resulting accuracy of 85% is 2% and 6% higher than that of the original RN and a Generative Adversarial Network (GAN) model, respectively.

Awan et al. [25] applied a spatial convolutional attention, called dynamic spatial convolution, to the VGG19 network. This attention uses global average pooling (GAP), rescales the GAP output with a lambda layer, and feeds it into a dropout layer with a rate of 0.25 before the fully connected layer; Softmax serves as the classifier, as is traditional for CNNs. The performance was evaluated on the Malimg dataset and reached an accuracy of 97.68%. Ma et al. [33] applied the attention mechanism [20] in a handcrafted architecture with five parts: Input layer, Local Attention layer, Global Attention layer, Dense layer, and Output layer. Compared with other methods, the combination of the attention mechanism and CNN achieved the best classification accuracy of 96.09% on Microsoft's Kaggle dataset.

B. N. Narayanan et al. [42] state that each malicious program belonging to a family has a distinct pattern. The authors use Principal Component Analysis (PCA) for linear dimensionality reduction, which saves computational time at the cost of losing some valuable information. As a result, the performance obtained is still far behind that of CNN.

V. S. P. Davuluru et al. [43] indicate a trade-off between computational time and model complexity. The authors also highlight the advantages of using CNN as a feature extractor. Using SVM instead of the original CNN classifier (softmax) can overcome the drawback of a limited, unbalanced dataset.

B. N. Narayanan et al. [44] proposed a novel approach that fuses a Natural Language Processing (NLP)-based approach, LSTM, with image-based approaches, including a simple CNN, AlexNet, ResNet, and VGG16, into a single simple architecture. The combination of several different CNN feature extractors is also somewhat similar to the characteristics of the DenseNet model, which concatenates intermediate layers [21]. The authors extract 9 features from each of those architectures, compiling a suite of 45 in total. Choosing the appropriate features from the total number of features in each architecture also becomes an optimization problem for two different architectures. Besides, recent malware is obfuscated, and the obtained opcode sequence will be entangled with a lot of noise, leading to limitations in finding

relationships between words and in the quality of the LSTM embedding. As a result, it will affect the assembled architecture. Besides, when observing malware visualizations from the Microsoft Malware Classification Challenge (BIG 2015) dataset, it can be seen that different families have distinctly different images that the naked eye can distinguish, and the number of families is not too large. In the Malimg dataset, by contrast, there are up to 25 families, and several malware samples from different families look the same and cannot be distinguished by the human eye. Therefore, with data of higher complexity, an additional refinement mechanism is needed; in this study, we focus on filtering and selecting essential features so that data with high similarity can be processed even if the naked eye cannot distinguish it.

FIGURE 1. The structure of a binary file
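
As a concrete illustration of the visualization idea that runs through the studies above, where a binary such as the one in Figure 1 is read as a gray-scale image with one byte per pixel, the following is a minimal sketch. The function name `bytes_to_grayscale`, the default width of 64 pixels, and the zero padding of the last row are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np


def bytes_to_grayscale(data: bytes, width: int = 64) -> np.ndarray:
    """Read a binary blob as a 2-D grayscale image, one byte per pixel."""
    if width <= 0:
        raise ValueError("width must be positive")
    # Each byte becomes one pixel value in [0, 255] (0: black, 255: white).
    pixels = np.frombuffer(data, dtype=np.uint8)
    # Number of rows needed to hold all bytes (ceiling division).
    height = -(-pixels.size // width)
    # Assumption: the unused tail of the last row is padded with zeros.
    image = np.zeros(height * width, dtype=np.uint8)
    image[: pixels.size] = pixels
    return image.reshape(height, width)


# Hypothetical 10-byte "binary": bytes 0x00..0x09, four pixels per row.
demo = bytes_to_grayscale(bytes(range(10)), width=4)
```

For a real sample, the bytes would simply be read from the file in binary mode and passed to the same function; any resizing to a fixed network input size would be a separate step.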

III. PROPOSED METHOD


A. IMAGE REPRESENTATION FOR MALWARE
To visualize a malware sample as an image, we interpret every byte as one pixel of the image. Notice that binary files are the hexadecimal representation of the PE of malware in Figure 1. The first row is the offset of the memory address. The second one represents the pairs of hexadecimal digits. Each hexadecimal pair is treated as a single decimal number, which serves as a pixel value of the image. The resulting values must be organized as a 2-D array, and they are in the range [0, 255] (0: black, 255: white). The size of the image depends on the size of the binary file. Table 1 presents the different heights of malware images for different sizes of malware files while the width of the images is fixed. Table 1 also illustrates that converting malware into grayscale images does not require a long time; common malicious codes less than 1 MB in size take no more than 0.01 s to convert.

We then convert the grayscale images into three-channel RGB images by replicating the grayscale channel three times. Figure 2 illustrates some of the malware images from the Malimg dataset, which Nataraj et al. [12] created. It can be observed that images from a given family are similar while being distinct from those of a different family. New variants are often created by changing a small part of the code. Therefore, if the predecessor is reused, the result is very similar. Furthermore, by converting malware into an image, it is possible to detect the small changes while keeping the comprehensive structure of samples belonging to the same family.

TABLE 1. Image height for different malware file sizes

File size        Image height   Conversion time (ms)
<10 kB           32             0.105
10 kB-30 kB      64             0.312
30 kB-60 kB      128            0.428
60 kB-100 kB     256            0.571
100 kB-200 kB    384            0.748
200 kB-500 kB    512            0.665
500 kB-1 MB      768            0.814
>1 MB            1024           2.85

FIGURE 2. Samples from the Malimg dataset

B. VARIATIONAL AUTOENCODER
VAE [34] is a variant of the autoencoder (AE) that also consists of an encoder and a decoder. The autoencoder is trained solely to encode and decode with as little loss (reconstruction loss) as possible, no matter how the latent space is organized. Therefore, it is hard to guarantee that the encoder will organize the latent space sensibly. Moreover, AE often faces an overfitting problem, which causes irregularity in the latent space. On the other hand, the VAE applies a Gaussian probability density qϕ(z|x) that makes the encoder return a distribution over the latent space. VAE tackles the latent space irregularity problem by adding to the loss function a regularisation term over that returned distribution to ensure a better organization of the latent space.

Let ϕ = (W, b) and θ = (W', b'). The loss function of VAE includes two terms as follows:

$$ l_{VAE}(x_i, \theta, \phi) = -\mathbb{E}_{q_\phi(z \mid x_i)}\left[\log p_\theta(x_i \mid z)\right] + D_{KL}\left(q_\phi(z \mid x_i) \,\|\, p(z)\right) \qquad (1) $$

The first term is the expected negative log-likelihood of the i-th data point. This term is also called the reconstruction error (RE) of VAE since it forces the decoder to learn to reconstruct the input data. The second term is the Kullback-Leibler (KL) divergence between the encoder's distribution

Symbol               Meaning
W                    Weight matrix of encoder
W'                   Weight matrix of decoder
b                    Bias vector of encoder
b'                   Bias vector of decoder
ϕ                    Parameter for training encoder
θ                    Parameter for training decoder
x                    Training dataset
z                    Representation of the input sample
x_i                  The i-th data point
qϕ                   Encoder
pθ                   Decoder
l_VAE(x_i, θ, ϕ)     Loss function of VAE for a data point x_i
g                    Deterministic function
K                    Number of samples that are utilized to reparameterize z

qϕ(z|x) and the expected distribution p(z). This divergence measures the relation of q and p [34]. In the VAE, p(z) is specified as a standard normal distribution with mean zero and unit standard deviation, denoted as N(0, 1). If the encoder outputs representations z that differ from the standard normal distribution, it receives a penalty in the loss. Because gradient descent cannot backpropagate through a randomly sampled variable z, the loss function of the VAE is re-parameterized as follows:

$$ l_{VAE}(x_i, \theta, \phi) = -\frac{1}{K}\sum_{k=1}^{K} \log p_\theta(x_i \mid z_{i,k}) + D_{KL}\left(q_\phi(z \mid x_i) \,\|\, p(z)\right) \qquad (2) $$

where $z_{i,k} = g_\phi(\epsilon_{i,k}, x_i)$ and $\epsilon_{i,k} \sim N(0, 1)$.

After training, the latent layers of VAE can be utilized for a classification task. The original data is then passed through the encoder part of VAE to generate the latent representation.

C. ATTENTION MECHANISM
The structure of the attention module is described in Figure 3. There are two sequential sub-modules: the Channel Attention Module (CAM) and the Spatial Attention Module (SAM). The former decomposes the input tensor into two vectors generated by Global Average Pooling and Global Max Pooling, which are fed into a multi-layer perceptron with one hidden layer. After that, both vectors are merged by element-wise summation. The latter applies Max Pooling and Average Pooling across channels, then concatenates the results, followed by a convolution layer to generate a spatial attention map.

Through the attention mechanism, the model can learn what and where to emphasize or suppress, and it refines intermediate features effectively [40]. In this paper, we apply both CAM and SAM, a combination called the Convolutional Block Attention Module (CBAM) [40], in the encoder part of VAE. We name the result Attention of Variational Autoencoder (AVAE).

D. FEATURE COMBINATION AND CLASSIFICATION
Fig. 4 illustrates the architecture of our system. We utilize a lightweight CNN with merely two convolutional layers with a kernel size of 32, followed by 64. Before flattening the pooled feature map, we apply dropout with a rate of 0.5 to avoid overfitting. Moreover, we use Adam as a fine-tuning optimizer with a minimal learning rate of 0.001. In the AVAE model, we insert CBAM in turn between convolutional layers. As the latent representation, we use the mean vector, a dense layer µ with the latent dimension set to 100. We concatenate these extracted features with a fully connected layer of the CNN. Both the CNN model and the AVAE model are trained on low-resolution images with a size of 64x64, and the number of epochs is 50.

We utilize early stopping to finish training when there is no improvement after five epochs. We use typical machine learning classifiers to evaluate our system. To evaluate our method, we utilize 10-fold Cross-Validation. One of the ten subsamples is held out as validation data, and the remaining nine subsamples are used as training data. This process is repeated ten times, with each of the ten subsamples used once as validation data. The average of the ten results is taken as the quality of the method.

IV. EXPERIMENTAL RESULTS
A. DATASET
This study evaluates our model using the Malimg dataset, consisting of 9,339 malware samples from 25 different families. Table 2 shows the number of malware samples in each class. It is clear that the Malimg dataset is unbalanced; 2,949 images represent the Allaple.A malware family, while merely 80 images are present in the Skintrim.N family. Imbalanced datasets are a common problem in machine learning in general, and in computer vision in particular [28], [35], [36]. Furthermore, imbalanced data harms the performance of CNNs because it causes underfitting and overfitting [37]. There are two standard methods to deal with imbalanced class distributions: oversampling and undersampling. Instead of adding more samples to under-represented malware families, [32] utilized image augmentation, which generates new data from the classes with a smaller population in the dataset. However, augmentation has an extremely high computational cost. In this study, we adopt undersampling to balance the Malimg dataset. Specifically, following [38], we reduce the number of malware samples in every family to that of the smallest family, Skintrim.N. The balanced dataset therefore contains 2,000 samples, less than one-fourth of the original Malimg dataset.

B. CLASSIFICATION RESULT
We utilized some standard classifiers for the unbalanced Malimg dataset. The result is shown in Table 3. The Random Forest (RF) classifier achieves the highest accuracy of 99.40%, while Nearest Centroid runs fastest with merely 0.11 seconds with an accuracy difference of 1.26% compared to RF in the 10-fold Cross-Validation. Table 8 depicts a confusion matrix that gives the detailed performance of the proposed method using the Random Forest classifier. As can be seen, 22 out of 24 families attain F-scores greater than 90%, while Swizzor.gen!E and Swizzor.gen!I attain 88.1% and 89.2%, respectively.

The results on the balanced Malimg dataset are shown in Table 4. Even though the number of data is reduced dramatically,

FIGURE 3. The structure of CBAM [40]
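
The two sub-modules of Figure 3 can be sketched as a forward pass in plain NumPy. This is a minimal sketch, not the authors' implementation: the sigmoid gating and the ReLU hidden layer follow the original CBAM formulation [40], while the function names, the weight shapes (`w1`, `w2`, `kernel`), and the use of random values in place of learned weights are assumptions made for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(x, w1, w2):
    """x: (C, H, W); w1: (C // r, C) and w2: (C, C // r) form the shared MLP."""
    avg = x.mean(axis=(1, 2))                      # global average pooling -> (C,)
    mx = x.max(axis=(1, 2))                        # global max pooling -> (C,)
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)   # one hidden layer with ReLU
    scale = sigmoid(mlp(avg) + mlp(mx))            # element-wise summation, then gate
    return x * scale[:, None, None]


def spatial_attention(x, kernel):
    """x: (C, H, W); kernel: (2, k, k) with odd k, applied to the [max, avg] maps."""
    pooled = np.stack([x.max(axis=0), x.mean(axis=0)])   # pooling across channels
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))
    h, w = x.shape[1:]
    attn = np.empty((h, w))
    for i in range(h):                             # naive "same" convolution
        for j in range(w):
            attn[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return x * sigmoid(attn)[None, :, :]


def cbam(x, w1, w2, kernel):
    """Channel attention followed by spatial attention."""
    return spatial_attention(channel_attention(x, w1, w2), kernel)


# Random stand-ins for a feature map and learned weights (assumption).
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 6, 6))
w1, w2 = rng.standard_normal((2, 8)), rng.standard_normal((8, 2))
kernel = rng.standard_normal((2, 7, 7))
refined = cbam(x, w1, w2, kernel)
```

Because both gates lie in (0, 1), the module can only rescale features, which is why the refined map keeps the shape of its input and can be dropped between convolutional layers.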

FIGURE 4. An overview of proposed method
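
The last stage of Figure 4, concatenating the two feature sets and scoring a classical classifier with 10-fold cross-validation, can be sketched as follows. This is an illustrative sketch under stated assumptions: the features are synthetic stand-ins rather than outputs of a trained CNN or AVAE encoder, the CNN feature width of 32 and the five classes are hypothetical, and a hand-written nearest-centroid classifier is used only to keep the example dependency-free.

```python
import numpy as np


def nearest_centroid_fit(X, y):
    """Store one mean feature vector (centroid) per class."""
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids


def nearest_centroid_predict(model, X):
    """Assign each sample to the class of its closest centroid."""
    classes, centroids = model
    dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dist.argmin(axis=1)]


def cross_val_accuracy(X, y, n_folds=10, seed=0):
    """Hold out each fold once for validation and average the accuracies."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    scores = []
    for k in range(n_folds):
        val = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        model = nearest_centroid_fit(X[train], y[train])
        scores.append(np.mean(nearest_centroid_predict(model, X[val]) == y[val]))
    return float(np.mean(scores))


# Synthetic stand-ins (assumption): 5 classes with 40 samples each.
rng = np.random.default_rng(0)
y = np.repeat(np.arange(5), 40)
cnn_features = rng.standard_normal((200, 32)) + y[:, None]   # hypothetical CNN features
avae_mu = rng.standard_normal((200, 100)) + y[:, None]       # latent mean vector, dimension 100
features = np.concatenate([cnn_features, avae_mu], axis=1)   # feature combination
accuracy = cross_val_accuracy(features, y)
```

Any of the classifiers compared in the paper could take the place of the nearest-centroid functions; only the fit and predict calls inside the fold loop would change.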

we still achieve a high accuracy of 98.40% when using the RF classifier. The result shows that our method can extract crucial features of image-based malware. Compared with the previous study [38], the accuracy of our method drops by only 1% when moving from the unbalanced to the balanced dataset, whereas that of [38] drops by 4%, four times as much. The results on the unbalanced Malimg dataset are compared with the results of other studies using the same dataset in Table 7.

As shown in Table 7, the lightweight CNN proposed by Roseline et al. [29] has merely 0.83M parameters, but its result improves only slightly, by 0.31% (from 97.18% to 97.49%), over the result reported when the dataset was first introduced by Nataraj et al. [12]. This shows that using only a few parameters does not necessarily extract enough features of the object. On the other hand, utilizing a model with an enormous number of parameters, such as ResNet-50 [30] and VGG19 [25], improved the result slightly; however, it requires more computational power. Nevertheless, using a sufficient number of parameters, our lightweight proposed model improves accuracy significantly and saves computational cost. Moreover, classifying each malicious code takes only 0.01 s on average.

Complex architectures such as [25], [30], [32] require high image quality and computational processing capacity. The reason for using complex networks is that the deep layers are expected to extract specific features, such as ears and eyes in image processing tasks concerned with humans. On the other hand, the shallow layers focus on overall image features such as the edges of objects. For example, in Fig. 2, many uncomplicated elements can be found by observing the simple grayscale images of malware samples. Therefore, we focus on the first layers to extract adequate features with a smaller image size of 64x64, while still ensuring high accuracy.

The Malimg dataset contains many samples processed through obfuscation techniques such as encryption and packing. Among them, malware samples belonging to Adialer.C, Autorun.K, Lolyda.AT, Malex.gen!J, VB.AT, Yuner.A are

packed with the same packing process, which gives them similar structures and patterns. As a result, analysts often have difficulty distinguishing them. However, our method can process these samples directly without unpacking, with corresponding accuracies of 100%, 100%, 100%, 99.26%, 99.75%, and 100%, respectively. The experiment indicates that our method is robust against these specific obfuscation attacks.

Moreover, despite achieving high overall classification accuracy, many studies have encountered an obstacle in classifying two family variants, Swizzor.gen!E and Swizzor.gen!I, which are highly similar and difficult to distinguish. The accuracy for both families, compared with other authors, is shown in Table 6. We achieve the best result, with 87.5% and 87.9% accuracy, respectively.

TABLE 2. Original Malimg Dataset

Class  Family name      No. of samples  Percentage (%)
0      Adialer.C        122             1.31
1      Agent.FYI        116             1.24
2      Allaple.A        2949            31.58
3      Allaple.L        1591            17.04
4      Alueron.gen!J    198             2.12
5      Autorun.K        106             1.14
6      C2LOP.gen!g      200             2.14
7      C2LOP.P          146             1.56
8      Dialplatform.B   177             1.89
9      Dontovo.A        162             1.73
10     Fakerean         381             4.08
11     Instantaccess    431             4.62
12     Lolyda.AA1       213             2.28
13     Lolyda.AA2       184             1.97
14     Lolyda.AA3       123             1.32
15     Lolyda.AT        159             1.70
16     Malex.gen!J      136             1.46
17     Obfuscator.AD    142             1.52
18     Rbot!gen         158             1.69
19     Skintrim.N       80              0.86
20     Swizzor.gen!E    128             1.37
21     Swizzor.gen!I    132             1.41
22     VB.AT            408             4.37
23     Wintrim.BX       97              1.04
24     Yuner.A          800             8.57

TABLE 3. Performance comparison of various classifiers on the unbalanced Malimg dataset

Classifier            Accuracy (%)  Time (s)
Decision Tree         98.12         7.00
k-Nearest Neighbors   99.30         10.49
Naive Bayes           98.16         0.32
Nearest Centroid      98.82         0.08
Random Forest         99.40         32.52
SVM                   98.23         12.65

TABLE 4. Performance comparison of various classifiers on the balanced Malimg dataset. The best configuration was highlighted with bold characters.

Classifier            Accuracy (%)  Time (s)
Decision Tree         94.15         2.43
k-Nearest Neighbors   98.35         0.68
Naive Bayes           96.75         0.10
Nearest Centroid      97.60         0.05
Random Forest         98.40         10.95
SVM                   93.75         2.19

TABLE 5. Comparison of accuracy on both the unbalanced and balanced Malimg datasets with previous work

Study                   Accuracy (%), unbalanced Malimg  Accuracy (%), balanced Malimg
Yajamanam et al. [38]   97.00                            93.00
This paper              99.40                            98.40

TABLE 6. Comparison of accuracy on the two frequently misclassified families

Studies                 Accuracy (%), Swizzor.gen!E  Accuracy (%), Swizzor.gen!I
Yajamanam et al. [38]   51.0                         36.0
Naeem et al. [23]       30.0                         50.0
Roseline et al. [29]    70.0                         45.0
Çayır et al. [26]       56.3                         68.8
Verma et al. [27]       87.5                         81.8
Awan et al. [25]        48.0                         56.0
V. Anandhi et al. [21]  84.2                         52.5
This paper              87.5                         87.9

V. CONCLUSION
Recent studies have developed huge, complex neural network models for malware analysis to obtain desirable features. However, they demand more resources than the average system can provide. Therefore, this paper focuses on building simple, lightweight models while still ensuring high classification results. We propose a feature selection method called AVAE. AVAE consists of a small CNN, a variational autoencoder, and an attention mechanism.

Experimental results show that our method can classify malware families efficiently. Our method achieves the best accuracy of 99.40% with the Random Forest classifier, while Nearest Centroid reaches nearly 99% in under a second. Furthermore, with merely 80 images of each family, our method achieves a high accuracy of 98.40%, which matters in practice because some new families lack data. The total time to convert malicious code into an image (for common malicious code under 1 MB in size) and classify it is merely 0.02 s. Based on these results, we think our method is applicable to existing systems.

Another advantage of our method is that it can distinguish similar malware families with high accuracy even when they are packed. Therefore, our proposed method can help malware analysts reduce the time needed to classify variants. Furthermore, once the malware family is identified, it is possible to know the typical characteristics, the intended use, and the impact of the malware on the target.

In the latent space of a VAE, the global features are organized in a more structured way than in an AE. However, the importance of the individual elements has not been considered. In this study, we further emphasize the importance of the attention mechanism in selecting and evaluating weights for the VAE, to help the model acquire important features in the latent space. At the same time, to ensure feature diversity, we combined a lightweight CNN

TABLE 7. Comparison with existing state-of-the-art algorithms

Studies Year Techniques Accuracy(%) Number of parameters(M)


Nataraj et al. [12] 2011 GIST feature + kNN 97.18 (-)
Garcia et al. [13] 2016 ANN + Random Forest 95.26 (-)
Agarap [11] 2017 GRU-SVM 84.92 (-)
Rezende et al. [30] 2017 ResNet-50 + Softmax 98.62 25.56
Yajamanam et al. [38] 2018 Deep learning + Softmax 97.00 (-)
Naeem et al. [23] 2019 Local Feature Extraction + Global Feature Extraction 98.40 (-)
Burks et al. [32] 2019 ResNet-18 + VAE 85.00 12.46
Roseline et al. [29] 2020 Lightweight CNNs 97.49 0.83
Çayır et al. [26] 2020 Capsule Networks + Softmax 98.72 (-)
Verma et al. [27] 2020 Combine first-order and second-order statistical texture features 98.58 (-)
Awan et al. [25] 2021 VGG19 + Spatial Convolutional Attention 97.68 143.67
Nisa et al. [31] 2021 SFTA + Cosine kNN 98.70 88.26
Moussas et al. [41] 2021 Image and file features, ANN 99.13 (-)
Lee et al. [1] 2021 Multiple Autoencoders 97.75 23.81
V. Anandhi et al. [21] 2021 Gabor filter + DenseNet Markov 98.97 7.98
This paper 2022 Lightweight CNN + AVAE 99.40 3.62
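The "Number of parameters(M)" column in Table 7 counts trainable weights in millions. As a minimal sketch of how such a figure is obtained, the snippet below counts the parameters of a small CNN for 64x64 grayscale input. The layer sizes are hypothetical and chosen only for illustration; they are not the architecture proposed in this paper, and the helper names `conv2d_params` and `dense_params` are assumptions introduced here.

```python
def conv2d_params(in_channels, out_channels, kernel_size, bias=True):
    """Trainable parameters of a 2D convolution with a square kernel."""
    weights = kernel_size * kernel_size * in_channels * out_channels
    return weights + (out_channels if bias else 0)


def dense_params(in_features, out_features, bias=True):
    """Trainable parameters of a fully connected layer."""
    return in_features * out_features + (out_features if bias else 0)


# Hypothetical layer stack (NOT the paper's model): two 3x3 convolutions,
# each assumed to be followed by 2x2 pooling (64x64 -> 32x32 -> 16x16),
# then a hidden dense layer and a 25-way output (one unit per Malimg family).
layers = [
    ("conv1", conv2d_params(1, 32, 3)),
    ("conv2", conv2d_params(32, 64, 3)),
    ("dense", dense_params(64 * 16 * 16, 128)),
    ("output", dense_params(128, 25)),
]

total = sum(count for _, count in layers)
for name, count in layers:
    print(f"{name:>6}: {count:>9,d}")
print(f" total: {total:>9,d} ({total / 1e6:.2f}M)")
```

In this toy stack the first dense layer dominates the total, which is one reason a smaller input size such as 64x64 keeps a model light.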

TABLE 8. Unbalanced Malimg dataset confusion matrix for 10-fold cross validation using RF classifier

Class 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 Precision Recall F1 Score


0 122 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.000 1.000 1.000
1 0 116 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.000 1.000 1.000
2 0 0 2948 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.000 1.000 1.000
3 0 0 0 1591 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.999 1.000 1.000
4 0 0 0 0 198 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.000 1.000 1.000
5 0 0 0 0 0 106 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.000 1.000 1.000
6 0 0 0 1 0 0 193 3 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0.949 0.930 0.935
7 0 0 0 0 0 0 1 144 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0.952 0.990 0.969
8 0 0 0 0 0 0 0 0 175 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 1.000 0.988 0.994
9 0 0 0 0 0 0 0 0 0 162 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.000 1.000 1.000
10 0 0 0 0 0 0 1 0 0 0 379 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1.000 0.994 0.998
11 0 0 0 0 0 0 0 0 0 0 0 431 0 0 0 0 0 0 0 0 0 0 0 0 0 1.000 1.000 1.000
12 0 0 0 0 0 0 0 0 0 0 0 0 213 0 0 0 0 0 0 0 0 0 0 0 0 0.991 1.000 0.995
13 0 0 0 0 0 0 0 0 0 0 0 0 2 182 0 0 0 0 0 0 0 0 0 0 0 1.000 0.989 0.994
14 0 0 0 0 0 0 0 0 0 0 0 0 0 0 122 0 0 0 0 0 0 0 1 0 0 1.000 0.992 0.996
15 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 158 0 0 0 0 0 0 1 0 0 1.000 0.994 0.997
16 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 135 0 0 0 0 0 0 0 0 0.993 0.993 0.993
17 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 142 0 0 0 0 0 0 0 1.000 1.000 1.000
18 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 158 0 0 0 0 0 0 0.994 1.000 0.997
19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 80 0 0 0 0 0 1.000 1.000 1.000
20 0 0 0 0 0 0 1 3 0 0 0 0 0 0 0 0 0 0 0 0 112 12 0 0 0 0.899 0.876 0.881
21 0 0 0 0 0 0 3 2 0 0 0 0 0 0 0 0 0 0 1 0 10 116 0 0 0 0.908 0.877 0.892
22 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 407 0 0 0.991 0.998 0.995
23 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 97 0 0.982 1.000 0.990
24 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 800 1.000 1.000 1.000
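The per-family accuracies reported for Swizzor.gen!E and Swizzor.gen!I in Table 6 can be recovered from the corresponding rows of Table 8. The sketch below assumes that rows are true classes and columns are predicted classes, which is consistent with the row sums matching the sample counts in Table 2. The function name `per_class_recall` is introduced here for illustration only. The values computed from the aggregated counts may differ slightly from the Recall column of Table 8.

```python
def per_class_recall(row, class_index):
    """Fraction of samples of one true class that were predicted correctly."""
    total = sum(row)
    return row[class_index] / total if total else 0.0


# Rows 20 (Swizzor.gen!E) and 21 (Swizzor.gen!I) copied from Table 8.
row_20 = [0, 0, 0, 0, 0, 0, 1, 3, 0, 0, 0, 0, 0,
          0, 0, 0, 0, 0, 0, 0, 112, 12, 0, 0, 0]
row_21 = [0, 0, 0, 0, 0, 0, 3, 2, 0, 0, 0, 0, 0,
          0, 0, 0, 0, 0, 1, 0, 10, 116, 0, 0, 0]

recall_20 = per_class_recall(row_20, 20)
recall_21 = per_class_recall(row_21, 21)

print(f"Swizzor.gen!E: {recall_20 * 100:.1f}%")  # 112 of 128 samples
print(f"Swizzor.gen!I: {recall_21 * 100:.1f}%")  # 116 of 132 samples
```

Most of the remaining errors in these two rows are confusions between the two Swizzor variants themselves (12 and 10 samples, respectively).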

to capture lower-range features. Unlike face or animal images in ImageNet, images generated from malicious code do not contain many complex factors that require a deep CNN. Combining the two complementary models helps us acquire rich and diverse characteristics of the object.

In future work, we will build a new malware dataset with recent malicious code. Additionally, we will apply the proposed method to an IDS to enhance its capacity for detecting and classifying potential dangers in cybersecurity.

In this paper, we have built a model focusing on the problem of classifying malware with a simple but effective architecture. We think it may be possible to apply our method to standard image classification even when data are lacking.

REFERENCES

[1] J. Lee and J. Lee, “A Classification System for Visualized Malware Based on Multiple Autoencoder Models”, IEEE Access, vol. 9, pp. 144786–144795, Oct. 2021. DOI:10.1109/ACCESS.2021.3122083.
[2] G. Xiao, J. Li, Y. Chen, K. Li, “MalFCS: An effective malware classification framework with automated feature extraction based on deep convolutional neural networks”, J. Parallel Distrib. Comput., vol. 141, pp. 49–58, 2020. DOI:10.1016/j.jpdc.2020.03.012.
[3] A. Moser, C. Kruegel, and E. Kirda, “Limits of static analysis for malware detection”, Twenty-third Annual Computer Security Applications Conference, pp. 421–430, 2007. DOI:10.1109/ACSAC.2007.21.
[4] M. Wagner, F. Fischer, R. Luh, A. Haberson, A. Rind, D.A. Keim and W. Aigner, “A Survey of Visualization Systems for Malware Analysis”, Eurographics Conference on Visualization (EuroVis), 2015. DOI:10.2312/eurovisstar.20151114.
[5] M. Egele, T. Scholte, E. Kirda and C. Kruegel, “A survey on automated dynamic malware-analysis techniques and tools”, ACM Computing Surveys, vol. 44, no. 6, pp. 1–42, 2012. DOI:10.1145/2089125.2089126.
[6] Y. Ye, T. Li, D. Adjeroh and S. Iyengar, “A Survey on Malware Detection Using Data Mining Techniques”, Computer Science Review, vol. 50, no. 41, pp. 1–40, 2017. DOI:10.1145/3073559.
[7] L. Liu, B.S. Wang, B. Yu, and Q.X. Zhong, “Automatic malware classification and new malware detection using machine learning”, Frontiers Inf. Technol. Electron. Eng., vol. 18, no. 9, pp. 1336–1347, Sep. 2017. DOI:10.1631/FITEE.1601325.
[8] Q. Qian and M. Tang, “Dynamic API call sequence visualisation for malware classification”, IET Inf. Secur., vol. 13, no. 4, pp. 367–377, Oct. 2018. DOI:10.1049/iet-ifs.2018.5268.
[9] M. Mimura, “An Improved Method of Detecting Macro Malware on an Imbalanced Dataset”, IEEE Access, vol. 8, pp. 204709–204717, Nov. 2020. DOI:10.1109/ACCESS.2020.3037330.
[10] K. Tran and H. Sato, “NLP-based approaches for malware classification from API sequences”, 21st Asia Pacific Symposium on Intelligent and Evolutionary Systems (IES), Nov. 2017. DOI:10.1109/IESYS.2017.8233569.
[11] A.M. Agarap, “Towards Building an intelligent Anti-Malware System: A Deep Learning Approach using Support Vector Machine (SVM) for Malware Classification”, arXiv preprint 2017, arXiv:1801.00318.
[12] L. Nataraj, S. Karthikeyan, G. Jacob and B.S. Manjunath, “Malware images: visualization and automatic classification”, Proceedings of the 8th International Symposium on Visualization for Cyber Security, 2011. DOI:10.1145/2016904.2016908.
[13] F.C.C. Garcia and F.P. Muga II, “Random Forest for Malware Classification”, arXiv preprint 2016, arXiv:1609.07770.
[14] L. Nataraj, S. Karthikeyan and B.S. Manjunath, “SATTVA: SpArsiTy inspired classificaTion of malware Variants”, Proceedings of the 3rd ACM Workshop on Information Hiding and Multimedia Security, pp. 135–140, 2015. DOI:10.1145/2756601.2756616.
[15] G. Conti, E. Dean, M. Sinda, B. Sangster, “Visual reverse engineering of binary and data files”, Visualization for Computer Security, 5th International Workshop, VizSec, Jan. 2008.
[16] D.L. Vu, T.K. Nguyen, T.V. Nguyen, T.N. Nguyen, F. Massacci and P.H. Phung, “HIT4Mal: Hybrid image transformation for malware classification”, Transactions on Emerging Telecommunications Technologies, vol. 31, no. 5, Nov. 2019. DOI:10.1002/ett.3789.
[17] H. Naeem, F. Ullah, M.R. Naeem, S. Khalid, D. Vasan, S. Jabbar, S. Saeed, “Malware detection in industrial internet of things based on hybrid image visualization and deep learning model”, Ad Hoc Networks, vol. 105, no. 1, May 2020. DOI:10.1016/j.adhoc.2020.102154.
[18] Y. Ding, X. Zhang, J. Hu, W. Xu, “Android malware detection method based on bytecode image”, Journal of Ambient Intelligence and Humanized Computing, 2020. DOI:10.1007/s12652-020-02196-4.
[19] D. Vasan, M. Alazab, S. Wassan, B. Safaei, Q. Zheng, “Image-Based malware classification using ensemble of CNN architectures (IMCEC)”, Computers and Security, vol. 92, May 2020, Art. no. 101748. DOI:10.1016/j.cose.2020.101748.
[20] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, “Attention Is All You Need”, in Proc. NIPS, pp. 1–11, 2017.
[21] V. Anandhi, P. Vinod, V.G. Menon, “Malware visualization and detection using DenseNet”, Personal and Ubiquitous Computing, July 2021. DOI:10.1007/s00779-021-01581-w.
[22] A. Bozkir, E. Tahillioglu, M. Aydos and I. Kara, “Catch them alive: A malware detection approach through memory forensics, manifold learning and computer vision”, Computers and Security, vol. 103, Apr. 2021, Art. no. 102166.
[23] H. Naeem, B. Guo, M.R. Naeem, F. Ullah, H. Aldabbas, M.S. Javed, “Identification of malicious code variants based on image visualization”, Computers and Electrical Engineering, vol. 76, pp. 225–237, Apr. 2019. DOI:10.1016/j.compeleceng.2019.03.015.
[24] E. Rezende, G. Ruppert, T. Carvalho, A. Theophilo, F. Ramos, P. de Geus, “Malicious software classification using VGG16 deep neural network’s bottleneck features”, Information Technology - New Generations, pp. 51–59, Jan. 2018.
[25] M. Awan, M. Mohammed, A. Yasin, A. Zain, “Image-Based Malware Classification Using VGG19 Network and Spatial Convolutional Attention”, Electronics, vol. 10, no. 19, Oct. 2021, Art. no. 2444. DOI:10.3390/electronics10192444.
[26] A. Çayır, U. Ünal, H. Dağ, “Random CapsNet forest model for imbalanced malware type classification task”, Computers and Security, vol. 102, 2021, Art. no. 102133.
[27] V. Verma, S.K. Muttoo, V.B. Singh, “Multiclass malware classification via first and second order texture statistics”, Computers and Security, vol. 97, 2020, Art. no. 101895.
[28] W. Shafai, I. Almomani and A. AlKhayer, “Visualized Malware Multi-Classification Framework Using Fine-Tuned CNN-Based Transfer Learning Models”, Applied Sciences, vol. 11, no. 14, 2021, Art. no. 6446. DOI:10.3390/app11146446.
[29] A. Roseline, G. Hari, S. Geetha, R. Krishnamurthy, “Vision-Based Malware Detection and Classification Using Lightweight Deep Learning Paradigm”, in Computer Vision and Image Processing, pp. 62–73, 2020.
[30] E. Rezende, G. Ruppert, T. Carvalho, F. Ramos, P. De Geus, “Malicious software classification using transfer learning of RESNET-50 deep neural network”, in Proceedings 16th IEEE International Conference on Machine Learning and Applications, Dec. 2017. DOI:10.1109/ICMLA.2017.00-19.
[31] M. Nisa, J.H. Shah, S. Kanwal, M. Raza, M.A. Khan, R. Damaševičius, T. Blažauskas, “Hybrid malware classification method using segmentation-based fractal texture analysis and deep convolution neural network features”, Applied Sciences, vol. 10, July 2020, Art. no. 4966. DOI:10.3390/app10144966.
[32] R. Burks, K.A. Islam, J. Li, Y. Lu, “Data augmentation with generative models for improved malware detection: a comparative study”, The IEEE 10th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference, Oct. 2019. DOI:10.1109/UEMCON47517.2019.8993085.
[33] X. Ma, S. Guo, H. Li, Z. Pan, “How to Make Attention Mechanisms More Practical in Malware Classification”, IEEE Access, Oct. 2019. DOI:10.1109/ACCESS.2019.2948358.
[34] D.P. Kingma and M. Welling, “Auto-encoding variational Bayes”, arXiv preprint 2013, arXiv:1312.6114.
[35] Ramasubramanian and H. Shanmugasundaram, “A Review on Classification of Data Imbalance using BigData”, International Journal of Managing Information Technology, vol. 13, no. 03, pp. 09–22, Aug. 2021. DOI:10.5121/ijmit.2021.13302.
[36] F. Thabtah, S. Hammoud, F. Kamalov and A. Gonsalves, “Data imbalance in classification: Experimental evaluation”, Information Sciences, vol. 513, no. 3, Nov. 2019. DOI:10.1016/j.ins.2019.11.004.
[37] K.S. Kancherla, S. Mukkamala, “Image visualization based malware detection”, in Proceedings of the 2013 IEEE Symposium on Computational Intelligence in Cyber Security (CICS), April 2013. DOI:10.1109/CICYBS.2013.6597204.
[38] S. Yajamanam, V.R.S. Selvin, F.D. Troia, M. Stamp, “Deep learning versus gist descriptors for image-based malware classification”, 2nd International Workshop on Formal Methods for Security Engineering, pp. 553–561, Jan. 2018. DOI:10.5220/0006685805530561.
[39] D. Gibert, C. Mateu, J. Planes, R. Vicens, “Using convolutional neural networks for classification of malware represented as images”, Journal of Computer Virology and Hacking Techniques, vol. 15, no. 1, pp. 15–28. DOI:10.1007/s11416-018-0323-0.
[40] S. Woo, J. Park, J.Y. Lee, I. Kweon, “CBAM: Convolutional Block Attention Module”, in Computer Vision – ECCV 2018, pp. 3–19, Sep. 2018.
[41] V. Moussas, A. Andreatos, “Malware Detection Based on Code Visualization and Two-Level Classification”, Information, Mar. 2021. DOI:10.3390/info12030118.
[42] B.N. Narayanan, O. Djaneye-Boundjou and T.M. Kebede, “Performance Analysis of Machine Learning and Pattern Recognition Algorithms for Malware Classification”, 2016 IEEE National Aerospace and Electronics Conference (NAECON) and Ohio Innovation Summit (OSI), Dayton, OH, 2016, pp. 338–342.
[43] V.S.P. Davuluru, B.N. Narayanan and E.J. Balster, “Convolutional Neural Networks as Classification Tools and Feature Extractors for Distinguishing Malware Programs”, 2019 IEEE National Aerospace and Electronics Conference (NAECON), Dayton, OH, USA, 2019, pp. 273–278.
[44] B.N. Narayanan and V.S.P. Davuluru, “Ensemble Malware Classification System using Deep Neural Networks”, Electronics, 2020, 9 (5), 721.


VAN TUAN DAO was born in Thai Binh province, Viet Nam, in 1992. He received the B.E. and M.E. degrees from the Department of Computer Science, National Defense Academy of Japan, in 2016 and 2018, respectively. He is currently pursuing the Ph.D. degree in information security. His main research interests include computer vision, artificial intelligence, cyber security and machine learning.

HIROSHI SATO is an Associate Professor of the Department of Computer Science at the National Defense Academy in Japan. He obtained a degree in Physics from Keio University in Japan and degrees of Master and Doctor of Engineering from Tokyo Institute of Technology in Japan. He was previously a Research Associate at the Department of Mathematics and Information Sciences at Osaka Prefecture University in Japan. His research interests include agent-based simulation, evolutionary computation, and artificial intelligence. Dr. Sato is a member of the Japanese Society for Artificial Intelligence (JSAI), the Society of Instrument and Control Engineers (SICE), and The Institute of Electronics, Information and Communication Engineers (IEICE).

MASAO KUBO is an Associate Professor of the Department of Computer Science at the National Defense Academy in Japan. He graduated from the Precision Engineering Department, Hokkaido University, in 1991. He received his PhD degree in Computer Science from Hokkaido University in 1996.
