Catch Them Alive: Malware Detection
Ahmet Selman Bozkir a,∗, Ersan Tahillioglu b, Murat Aydos a, Ilker Kara c
a Department of Computer Engineering, Hacettepe University, Turkey
b ASELSAN Inc., Turkey
c Department of Medical Services and Techniques, Eldivan Medical Services Vocational School, Çankırı Karatekin University, Turkey
Article info

Article history: Received 6 June 2020; Revised 16 November 2020; Accepted 28 December 2020; Available online 2 January 2021

Keywords: Memory forensics; Memory dump; Machine learning; Computer vision; Malware detection; Manifold learning

Abstract

The everlasting increase in the usage of information systems and online services has triggered the birth of a new type of malware that is more dangerous and harder to detect. In particular, according to recent reports, the new type of fileless malware infects the victims’ devices without leaving a persistent trace (i.e. a file) on hard drives. Moreover, existing static malware detection methods in the literature often fail to detect sophisticated malware utilizing various obfuscation and encryption techniques. Our contribution in this study is twofold. First, we present a novel approach to recognize malware by capturing the memory dump of suspicious processes, which can be represented as an RGB image. In contrast to the conventional approaches followed by static and dynamic methods existing in the literature, we aimed to obtain and use memory data to reveal visual patterns that can be classified by employing computer vision and machine learning methods in a multi-class open-set recognition regime. Second, we have applied a state-of-the-art manifold learning scheme named UMAP to improve the detection of unknown malware files through binary classification. Throughout the study, we have employed our novel dataset covering 4294 samples in total, including 10 malware families along with the benign executables. We obtained their memory dumps and converted them to RGB images by applying 3 different rendering schemes. In order to generate their signatures (i.e. feature vectors), we utilized GIST and HOG (Histogram of Oriented Gradients) descriptors as well as their combination. The obtained signatures were then classified via the machine learning algorithms J48, RBF kernel-based SMO, Random Forest, XGBoost and linear SVM. According to the results of the first phase, we have achieved prediction accuracy up to 96.39% by employing the SMO algorithm on the combined GIST+HOG feature vectors. Besides, the UMAP based manifold learning strategy has improved the accuracy of the unknown malware recognition models by up to 12.93%, 21.83% and 20.78% on average for the Random Forest, linear SVM and XGBoost algorithms respectively. Moreover, on a commercially available standard desktop computer, the suggested approach takes only 3.56 s for analysis on average. The results show that our vision based scheme provides an effective protection mechanism against malicious applications.
© 2021 Elsevier Ltd. All rights reserved.
∗ Corresponding author.
E-mail address: [email protected] (A.S. Bozkir).
https://doi.org/10.1016/j.cose.2020.102166
0167-4048/© 2021 Elsevier Ltd. All rights reserved.
2 computers & security 103 (2021) 102166
Chen (2018) has carried out the transfer learning technique on the malware detection problem via the InceptionV1 deep learning architecture along with grayscale malware images. Throughout the study, the author performed the experiments over the well known Malimg dataset (Nataraj et al., 2011) having 25 malware classes and the Microsoft Malware Classification Dataset (2015) including 9 classes. The results obtained from a multi-class classification regime show that the proposed method together with the softmax classifier has achieved accuracy up to 99.25% with a 0.03% false positive rate for the “Malimg” dataset. Besides, the obtained accuracy has surpassed the compared classification schemes involving SVM, Random Forest, Naïve Bayes, etc. Chen (2018) has also tested binary classification on a dataset having 16,518 benign and 10,639 malware files and reported 99.67% accuracy. Nevertheless, we question these experiments since (1) the malware files smaller than 5 KB were discarded in favor of decreasing false positives and (2) the testing strategy was carried out against already known classes. Instead, the training and testing sets should be mutually exclusive except for the benign family. Additionally, the author pointed out the vulnerability of the method against code obfuscation.

Similarly, Vasan et al. (2020) have taken two deep learning architectures, namely VGG16 and ResNet50, and fine-tuned them via the Malimg dataset to create an ensemble of classifiers. They first applied the transfer learning strategy and reduced the deep features obtained at the end of the deep learning models through the dimension reduction technique named principal component analysis (PCA). Vasan et al. (2020) have reported that they reduced the number of dimensions by 90%. According to their results, they have achieved 99% accuracy for unpacked malware and 98% for packed ones. However, their study has been conducted by using a dataset having 25 classes in a closed-set recognition manner. Thus, the convolutional neural network models they have trained do not recognize benign samples.

As another study, Yuan et al. (2020) have followed a different way and proposed a solution based on byte-level malware classification through a deep convolutional VGG16 model on Markov images. To achieve this, they have converted the binary files into Markov images by considering the transfer probability matrices. They have tested their proposal on the Microsoft Malware and Drebin datasets (involving 10 classes) and obtained average accuracy rates of 99.26% and 97.36% respectively.

Tam et al. (2015) have taken memory snapshots on an emulator running Android 4.4. By using the Volatility Framework, the libraries and strings used by processes were extracted to be utilized as features. In this way, they have proposed a scalable and portable environment to investigate Android memory. Nevertheless, this study suffers from the small size of the dataset they used.

Dai et al. (2018) have proposed a method that classifies malware files through training artificial neural networks on HOG features extracted from memory dump data. They have transformed memory dump files into grayscale images recorded in PNG format. As the size of dump data is variable, the sizes of the images have been variable as well. For this reason, the authors have resized them via bi-cubic interpolation to give them a width of 4096 pixels if the file size is greater than 4 MB. As a result, they have reached an accuracy of up to 96.7%. Though their study is the closest work in the literature to ours, our approach involves several differences such as the descriptors and the dataset employed. Furthermore, we have used 3-channel images rather than grayscale counterparts and evaluated different resizing options.

Another study that is close to a part of our work is the paper of Sharma and Raglin (2018). In their work, the authors conducted several experiments to measure the efficiency of linear and nonlinear manifold learning algorithms including PCA, Isomap, Diffusion Maps, Laplacian Eigenmaps and t-SNE for the task of clustering. The authors have found that nonlinear methods perform better and that t-SNE outperforms the other techniques in almost all aspects. The aforementioned study is the first work that has applied manifold learning in the problem domain.

It should be noted that the literature of malware detection also involves quite distinctive approaches such as those presented in (Zhang et al., 2014, Zhang et al., 2016). In the former, the authors have suggested a novel network traffic reasoning approach in order to identify anomalies regarding request-level traffic structures and semantic triggering relations. In this way, they explored a new way to detect even zero-day malware attacks via their new concept, so-called “triggering relation discovery”. Their experiments, based on a 6GB+ dataset and conducted with SVM, Bayesian network and Naïve Bayes algorithms, have shown that their proposal reaches up to 100% detection accuracy along with serving a scalable solution against DNS bots, spyware and data exfiltration malware. The latter (Zhang et al., 2016), on the other hand, applies the idea presented in (Zhang et al., 2014) to the Android malware recognition problem by constructing a triggering relation model for dynamic analysis of HTTP traffic generated by Android apps. In other words, in (Zhang et al., 2016), the researchers aim to distinguish malicious network requests from benign apps by constructing triggering relation graphs and exploring the root triggers for malicious applications. Their experiments conducted with a 14GB+ dataset have shown that the use of triggering relations belonging to network traffic yields 98.2% accuracy.

3. Materials and methods

In this part, we first introduce the dataset we have collected. Next, the employed image descriptors will be demonstrated briefly due to space limitations.

3.1. Dataset

It is a well known fact that, for data-driven studies, a well curated and correct dataset is vital. When the literature of pattern recognition based malware detection is reviewed, it can be observed that there exists a limited number of datasets such as “Microsoft Malware Classification Challenge” (Microsoft Malware Classification Dataset 2015) (9 imbalanced classes) and “Malimg” (Nataraj et al., 2011) (25 balanced classes). Bozkir et al. (2019) have recently published another brand new dataset called “Malevis”
of gradient computation. Given a point (x, y), HOG first computes the gradient values in both horizontal and vertical directions by employing the [−1, 0, 1] and [−1, 0, 1]^T kernel templates (Dai et al., 2018). As reported in (Dalal and Triggs, 2005), these templates give the best results for pedestrian classification. Next, as stated in (Dai et al., 2018), the gradient magnitude and orientation are calculated from the gradients in the x and y directions via the formulas given in (3) and (4) below:

grad = \sqrt{G_x(x, y)^2 + G_y(x, y)^2}    (3)

\alpha(x, y) = \tan^{-1}\left( \frac{G_x(x, y)}{G_y(x, y)} \right)    (4)

In essence, to normalize the contrast among the neighborhood regions, the image is divided into a determined number of equally sized cells where 8 × 8 or 16 × 16 cell groupings construct a block. Note that these blocks consist of overlapping cells to contribute to the contrast normalization scheme. Throughout the normalization, the obtained local feature vectors belonging to each cell are cascaded based on the voting result. As a result, parameters such as cell size, block size and image size directly affect the dimension of the feature vectors. Therefore it is crucial to have a predefined input image size.

3.4. UMAP - Uniform Manifold approximation and projection for dimension reduction

In this sub-section, we first briefly outline what manifold learning is and concisely introduce the method of UMAP (Uniform Manifold Approximation and Projection). Furthermore, we also present the reasons behind its selection along with describing its pros and cons compared to other dimension reduction techniques.

Sharma and Raglin (2018) describe the concept of the manifold as a topological space that locally resembles the Euclidean space while involving a more complex global structure. More precisely, from the local perspective, manifolds can be seen as Euclidean spaces having homeomorphic neighborhoods for every point in n-dimensional space. Nonetheless, these points might not be homeomorphic from the perspective of the global structure. Spheres, toruses, planes, or folded sheets can be given as examples of manifolds. According to (Sharma and Raglin, 2018), given a dataset having a d-dimensional feature space, revealing the manifold structure lying inside the data is called manifold learning.

Manifold learning or dimension reduction (DR) methods deal with discovering new low dimensional embeddings by transforming/mapping the high dimensional data such that closer data points lie together in the new coordinate system whereas the distant points remain distant, while also preserving the underlying parameters. In essence, one significant intuition behind this idea results from two facts: (a) many correlated and overlapping features exist in the datasets and (b) the necessity of avoiding this complexity to achieve a simplified and nonoverlapping representation by preserving the underlying parameters that govern the data (Sharma and Raglin, 2018). One practical implication of these methods is to visualize high dimensional data in 2D or 3D space in order to better discover and investigate the underlying structures, clusters and neighborhoods. From this point of view, manifold learning can be considered as a data transformation tool that employs linear or nonlinear dimensionality reduction techniques.

In this study, we have employed a state-of-the-art dimension reduction and manifold learning technique named UMAP. UMAP is a relatively new and powerful manifold learning and dimension reduction method that is built on Riemannian geometry and algebraic topology together with fuzzy simplicial sets (McInnes et al., 2018). Apart from manifold learning, it also presents many useful features such as clustering, metric learning, visualization and inverse transformation of embeddings. It should be kept in mind that there exists a vast number of DR methods in the literature such as PCA (Jackson, 2005), Laplacian Eigenmaps (Belkin and Niyogi, 2002), Isomap (Tenenbaum et al., 2000), Diffusion Map (Coifman and Lafon, 2006) and t-SNE (Maaten and Hinton, 2008). Among these, t-SNE is accepted as the most widely used method (Becht et al., 2019, Ali et al., 2019) and it is often utilized to visualize high dimensional data in 2D or 3D space. Fundamentally, the goal of t-SNE is to discover the patterns by calculating the probability distributions (i.e. Student t-distribution) found in the pairs of high dimensional data points and reflecting them into the lower dimensional space in a way that keeps the distributions as similar as possible via Kullback-Leibler divergence (Sharma and Raglin, 2018). Though it is a widely used and powerful method, it has several shortcomings such as (1) preserving only local neighborhoods (i.e. discarding the global relationships and thus failing to explore the “big picture”), (2) slow computation that causes scalability problems when large datasets are analyzed, (3) high memory consumption when a large perplexity value is used and (4) the lack of a learned transformer function to embed new cases when needed. Sharma and Raglin (2018) have applied the above listed DR methods (except UMAP) in the problem domain of malware detection and found that the best embedding could be achieved by the use of t-SNE. Nonetheless, recent studies (McInnes et al., 2018, Becht et al., 2019, Ali et al., 2019) comparing UMAP and t-SNE in various fields and datasets report the superiority of UMAP with detailed benchmarks. Although both of these methods follow the same or similar methodologies, the main reason behind this finding is that UMAP not only considers the local connectedness but also attempts to preserve the data’s global structure by utilizing Laplacian Eigenmaps (Kobak and Linderman, 2019).

At its core, the UMAP algorithm initially builds a high dimensional graph structure by utilizing a concept called the “fuzzy simplicial complex”, which is analogous to a weighted graph whose edge weights represent the probability that two data points (i.e. vertices) are related (Coenen and Pearce, 2020). The algorithm decides to connect a data point to another based on whether they overlap within a volumetric range controlled by a diameter parameter. Thus, the selection of the diameter becomes a key component for the optimal trade-off between having very tiny clusters and gluing unrelated points together. UMAP handles this critical stage by picking a diameter locally, considering the distance to the nth nearest neighbor of each point in the higher dimensional space; the graph gets fuzzier, and the likelihood of connections decreases, as the gluing diameter increases (Coenen and Pearce, 2020). Note that the use of the number of neighbors (NN) is a subtle difference of UMAP compared to the perplexity parameter used by t-SNE. Moreover, UMAP removes the normalization steps for both high and low dimensional probabilities, which results in a speedup compared to t-SNE. Since UMAP is based on the graph structure, optimization of the layout is an essential step and this is achieved by the Stochastic Gradient Descent algorithm. The UMAP algorithm rests on some advanced mathematical background and the reader is referred to (McInnes et al., 2018) for further reading.
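To make the idea of a locally chosen diameter more concrete, the following is a minimal, simplified sketch and not the implementation used in this study. It assumes the smooth nearest-neighbor formulation described in (McInnes et al., 2018): for each point, the distance to its closest neighbor (called `rho` here) is subtracted, and a per-point scale (`sigma`) is found by bisection so that the edge weights sum to log2(k). The function names, the toy points and the value of k are hypothetical and chosen only for illustration.

```python
import math


def knn_distances(points, k):
    """For each point, return the sorted distances to its k nearest neighbors."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[:k])
    return result


def membership_weights(dists, n_iter=64, tol=1e-9):
    """Turn one point's sorted neighbor distances into fuzzy edge weights.

    rho is the distance to the closest neighbor, so that edge gets weight 1.
    sigma is a per-point scale found by bisection so that the weights sum
    to log2(k); it plays the role of the locally chosen diameter.
    """
    k = len(dists)
    rho = dists[0]
    target = math.log2(k)

    def total(sigma):
        return sum(math.exp(-max(0.0, d - rho) / sigma) for d in dists)

    lo, hi, sigma = 0.0, math.inf, 1.0
    for _ in range(n_iter):
        current = total(sigma)
        if abs(current - target) < tol:
            break
        if current > target:      # weights too large: shrink the scale
            hi = sigma
            sigma = (lo + hi) / 2.0
        else:                     # weights too small: grow the scale
            lo = sigma
            sigma = sigma * 2.0 if math.isinf(hi) else (lo + hi) / 2.0
    return [math.exp(-max(0.0, d - rho) / sigma) for d in dists]


# Toy 2D data (hypothetical); real feature vectors would be high dimensional.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0), (5.0, 1.0)]
k = 3
weights = [membership_weights(d) for d in knn_distances(points, k)]
```

Each inner list in `weights` holds the fuzzy connection strengths from one point to its k nearest neighbors. Points in sparse regions receive a larger `sigma`, which is how the neighborhood size adapts to local density.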
In this study, we have incorporated UMAP to (1) improve the binary classification performance in the unknown malware recognition problem by obtaining more discriminative lower dimensional embeddings via its supervised metric learning support, which enables warping the dataset with the help of given class labels; and (2) visualize the latent space to investigate and explore the learned lower dimensional embeddings in 2D space. As pointed out by Ali et al. (2019), dimension reduction is a way to improve the efficiency of revealing patterns in datasets. To be more precise about the first argument, UMAP enables the creation of a transformer that is trained in a supervised manner through class labels regardless of whether it is a binary or multi-class classification task. In other words, it learns how to embed the samples belonging to different classes from the high dimensional feature space into the lower dimensional one by also skewing the target feature space. In this regard, we can obtain a well-separated set of new embeddings for each class that could be more isolating and discriminative, which yields better and more robust accuracy rates. From this perspective, UMAP behaves similarly to variational autoencoders (VAE). However, the experiments of (McInnes et al., 2018) clearly show the superiority of UMAP over VAE. Consequently, we have attempted to investigate the use of UMAP for the first time in the problem domain and explored its contribution to the unknown malware detection problem, which was treated in the second phase of our study.
The Procdump tool enables users to monitor processes and obtain dump files. It is a command-line utility that was first made available for the Microsoft Windows operating system; Linux support was provided later on. One of the advantages of Procdump is that it also enables us to capture the virtual address space occupied by processes.

The PE files involved in our Dumpware10 dataset were run in the Windows Sandbox environment shipped with the 1903 version of Windows 10 to get their contents located in both physical and virtual address space. In this way, we also protected our physical system against malicious activities. To obtain the dump files, we have executed Procdump by providing the -ma and -w startup arguments. The first argument “-ma” tells the tool to gather the full memory dump belonging to the target process whereas “-w” forces the tool to wait until the specified process is launched. In our experiments, we first ran the Procdump tool and then introduced an intentional delay of 5000 ms. As the third stage, the PE file was run and the dumping operation finished. The rationale behind the intentional delay is just to ensure that both Procdump and the specified malicious application are active. Note that the extension of the generated dump files changes according to the OS of the host. For instance, the Windows version of Procdump creates .dmp files while the Linux version generates dump files having the .vmcore extension. Next, we have placed all the .dmp files into their respective folders according to their classes.

Another important point is that the size of dump files varies due to several factors (e.g. dependent DLL files). The size of the dump files generated through the PE files of our corpus ranges between 10 MB and 100 MB. As explained clearly in (Dai et al., 2018), each process consists of several regions in memory space to store different blocks such as DLLs, environment variables, process heap, thread stack, data segment and text segment. Moreover, according to (Korkin and Nesterov, 2015), operating systems employ the Address Space Layout Randomization (ASLR) scheme, meaning that those blocks are located randomly and fragmented, especially when the size of available memory is relatively small. Thus, to collect more consistent and uniform memory dumps we have allocated large enough memory (>16 GB) for the Sandbox environment.

4.2. Image representation

The proliferation of the binary-to-image conversion invented by Nataraj et al. (2011) has inspired many new anti-malware studies. This kind of visualization of binary files has filled the gap between computer vision and the byte level sequences of executables. Throughout the study, to achieve a suitable representation, we have fundamentally (a) employed RGB based encoding to convert dump files into images and (b) experimented with various column width schemes during these byte-to-image renderings.

The content size of the memory dump data we collected throughout the study is very large and highly variable. Therefore, unlike other studies such as (Nataraj et al., 2011, Dai et al., 2018, Yuan et al., 2020) applying grayscale encoding, we have preferred RGB encoding to generate the images from malware dump files. Note that, in (Bozkir et al., 2019), the authors have followed the same way for generating malware images from static malware files. In this regard, each pixel represents 3 sequential bytes yielding a color. The main motivation of this approach is to store more bytes in each row of the image for better consistency in terms of byte-level alignment. In this way, more information can be placed into each row of the image such that visual similarities belonging to similar samples will be more apparent and easily identifiable. Second, RGB based encoding makes the images more compact since the number of pixels is reduced to one third, yielding less distortion during the subsequent image resizing. Nonetheless, it should be noted that, compared to 8-bit grayscale encoding, this approach could involve disadvantages when byte-level variations come into prominence among memory dumps. For the byte-to-image conversion, we first updated and modified the Python script called “bin2png” published at https://github.com/ESultanik/bin2png. Next, we have run this script to create the source input images. The mentioned script was written in Python 2.7; we modified it so that it runs under Python 3. Instead of the lossy JPG format, we preferred the lossless PNG format.

There exist various byte-to-image rendering schemes in the malware detection literature. For instance, Dai et al. (2018) converted memory dump data into .png encoded grayscale images having a constant width of 2048 or 4096 pixels according to the file size. However, to the best of our knowledge, none of these studies has conducted a comparative study on the way that these renderings affect classification accuracy. In other words, here we examine how one should select the column width for the analysis. As can be seen in Fig. 2, we have selected 4 different column width schemes: (1) 224px, (2) 300px, (3) 4096px, and finally (4) the square root scheme. As their names suggest, the first 3 schemes correspond to the initial image widths regardless of the memory dump file size. The last scheme (i.e. square root) follows a different strategy. Instead of a pre-determined image width, we have computed the square root of (dump file size)/3. Here, we have divided the byte size of the dump by 3 due to the 3-channel color coding and adjusted the edge of the square in a way that it will contain zero-padding elements if necessary. In this way, we obtained 4 different kinds of input images for representing the same training and validation sets. Therefore, as a research question, we also explored the effect of initial image renderings as illustrated in Fig. 3.

The initial rendered images are quite large even though RGB encoding has been preferred. Further, as can be seen in Fig. 3, the images having 224px width yielded the longest vertical edge. This issue restricts the efficiency of computer vision methods and extends the running time of the visual feature extraction algorithms. Thus, we have transformed them so that they form a square sized image. Technically speaking, we resized the images along their vertical axis via the Lanczos interpolation shipped with the OpenCV library (OpenCV Tutorials, OpenCV 2019). Rather than the bi-cubic interpolation used in (Dai et al., 2018), we have employed a better quality interpolation method called Lanczos to reduce the information loss during this unavoidable resize operation. With Lanczos interpolation, the images can be downscaled considering the 8 × 8 neighborhood in image pixels.
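The column width schemes described in this sub-section can be illustrated with a short sketch. It is not the modified bin2png script used in the study; it only assumes, as stated in the text, that every 3 sequential bytes form one RGB pixel, that a fixed width (e.g. 224, 300 or 4096) or the square root scheme determines the row length, and that the last row is zero-padded. The function names `rgb_image_shape` and `bytes_to_rgb_rows` are hypothetical.

```python
import math


def rgb_image_shape(n_bytes, width=None):
    """Return (width, height, padding_bytes) for an RGB rendering.

    width=None selects the square root scheme: the edge is the smallest
    integer whose square holds all pixels.
    """
    if n_bytes <= 0:
        raise ValueError("dump must contain at least one byte")
    n_pixels = (n_bytes + 2) // 3            # 3 bytes per pixel, rounded up
    if width is None:
        width = math.isqrt(n_pixels)
        if width * width < n_pixels:
            width += 1
    height = (n_pixels + width - 1) // width  # rows needed, rounded up
    padding = width * height * 3 - n_bytes    # zero bytes appended at the end
    return width, height, padding


def bytes_to_rgb_rows(data, width=None):
    """Render raw bytes as rows of (R, G, B) tuples with zero padding."""
    w, h, pad = rgb_image_shape(len(data), width)
    padded = data + bytes(pad)
    pixels = [tuple(padded[i:i + 3]) for i in range(0, len(padded), 3)]
    return [pixels[r * w:(r + 1) * w] for r in range(h)]


# Tiny stand-in for a memory dump: 10 bytes with values 0..9.
rows = bytes_to_rgb_rows(bytes(range(10)))
```

With this sketch, a 10 MB dump rendered at a width of 224 pixels comes out about 15,600 rows tall, which illustrates why the narrow schemes produce very long images that must be resized before feature extraction. The Lanczos resize step itself is not covered here.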
The rationale behind the square size transformation is 4.3. Visual feature extraction
described as follows. As is known, machine learning meth-
ods require equal length feature vectors to make classifica- To extract possible discriminative visual patterns in memory
tions, clustering, and so on. Therefore, in substantial num- dump images, we have performed two different visual feature
ber of cases, any computer vision based method must pro- descriptors namely GIST and HOG. For getting GIST descrip-
duce equal-sized feature vectors (i.e. signatures) to represent tors we have employed the Python package named “leargist”
the images. As pointed out in the next sub-section, we extract (Pyleargist 2019). This implementation has been selected due
GIST and HOG based image descriptors during the signature to not only being fast but also enables us to compute colored
generation stage. Apart from the GIST, due to the nature of the GIST descriptors having 960-dimensional vectors. Throughout
algorithm of Histogram of Oriented Gradients, it needs to pro- the study, we have used a computer equipped with an Intel
cess on equal-sized images for producing features having the 8750 processor and 24 GB memory. The time needed for the
same length. GIST feature computation has been measured as 1.67 s on av-
Otherwise, some important parameters in the HOG com- erage.
putation such as cell size, number of cells in each block yield We have also computed the HOG features of the memory
varying sized feature vectors. Therefore, it indispensably be- dump images by using “Skimage.features” Python package.
comes an obligation to have a canonical image size. It should This package provides a useful set of descriptors such as HOG,
also be noted that resizing the image into square size un- SIFT, and Local Binary Patterns. During the HOG feature gen-
doubtedly causes loss of visual information to a certain ex- eration, we have first resized the input images into 256 × 256.
tent since this transformation distorts the image. Neverthe- This operation can be considered as downscaling for 300 × 300
less, the previous studies (Dai et al., 2018, Bozkir et al., 2019, images whereas it can be thought of as an upscaling process
Yuan et al., 2020) and our present work show that this loss for 224 × 224 images. Following this, we have set the cell_size
is not that significant and the discriminative visual cues and parameter as 32 along with orientations as 9 while we have de-
patterns could still exist. To this end, we have taken two im- fined the cell_per_block argument as 2 × 2. Besides, we have
portant advantages: (1) we “equalized” images in terms of employed “L2-Normsys” block normalization scheme. As a re-
size to be used via HOG descriptors and (2) we made the sult, we have obtained 1764d feature vectors. The time needed
data ready to be modeled with modern convolutional neural for the HOG feature computation has been measured as 1.12 s
networks. on average.
Consequently, we have transformed the byte sequences of
the PEs into discriminative visual elements that can be called 4.4. Classification via machine learning
memory dump images. What is more, is we have investigated
the outcome of different pre-processing techniques and com- As a result of the feature extraction procedure, vectors rep-
pared them. resenting the characteristic of the images were obtained. To
classify the memory dumps according to their visual descrip-
10 computers & security 103 (2021) 102166
Fig. 4 – Several examples of RGB based memory dump images (224 × 224 renderings) belonging to different malware
families and benign samples. The first three rows depict images of various malware classes such as “Allaple”, “BrowseFox”
and “InstallCore”. The last row shows samples belonging to benign PEs. The renderings demonstrate the intra-class and
inter-class similarities and variations.
tors, we have employed 5 well know machine learning meth- 4.5. Evaluation metrics
ods: (1) Random Forest, (2) XGBoost, (3) Linear SVM, (4) Se-
quential Minimal Optimization, and (5) J48. Three out of these Throughout the experimental studies, to fairly and quanti-
five techniques are in tree-based techniques whereas two of them are involved in structured learning methods. In this way, we aimed to examine whether our problem domain can be learned by tree-based methods and kernel machines. This investigation is important since we observe a high inter-class similarity between some different malware families, as can be seen in Fig. 4. On the other hand, Fig. 4 also depicts the intra-class variations within the same malware class.
For an efficient and effective approach, handling this kind of inter-class similarity and intra-class variation is crucial. Moreover, we believe that this problem must be addressed in any kind of machine learning task. We employed Weka (General documentation, Weka 2019) to build the models that we evaluated. The exception is XGBoost, for which we wrote a Python script because the method is not available in the vanilla Weka distribution. During the experiments carried out with Random Forest (RF), XGBoost, and J48, we kept the default parameters. For the linear SVM and SMO based modeling, we set the C value to 10, and we selected the RBF kernel for the SMO based learning process. In this way, we aimed to observe whether the feature space occupied by HOG and GIST descriptors is better separated via linear or Gaussian kernels.
It should also be kept in mind that our problem lies in an open-set classification scheme where the benign class also exists. Therefore, having highly discriminative representations and classifying them accurately is vital, since most of the samples will come from benign processes in the wild.
tatively assess the performance of the built machine learning models, we have used several metrics such as accuracy rate. These metrics have been widely used in the literature and provide detailed evaluations of the proposed approaches (Sharma and Sahay, 2014, Eroglu et al., 2019 and Dai et al., 2018). In this sub-section we briefly introduce them as follows:
TP (i.e. true positives) is the number of artifacts correctly classified as members of the positive class, whereas FP (i.e. false positives) denotes the number of cases where the model wrongly predicts the positive class. Similarly, TN (i.e. true negatives) shows the number of outcomes where the model determines the ground-truth negative class examples correctly. On the other hand, FN (i.e. false negatives) is the number of predictions where the model wrongly assigns the negative class to actual positive cases. According to these base definitions, the metrics accuracy, precision, recall, and F1-score are defined in (5), (6), (7), and (8), respectively.
\[ \text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \tag{5} \]
\[ \text{Precision} = \frac{TP}{TP + FP} \tag{6} \]
\[ \text{Recall} = \frac{TP}{TP + FN} \tag{7} \]
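As a small illustration of these definitions, the following sketch computes the metrics from raw confusion counts. The function name and the example counts are hypothetical and not taken from the study. The F1-score is taken here as the usual harmonic mean of precision and recall, which is an assumption in the sense that only accuracy, precision and recall are written out explicitly in this part of the text.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, precision, recall and F1 from confusion counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 as the harmonic mean of precision and recall (assumed standard form).
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts, chosen only to exercise the formulas.
m = binary_metrics(tp=90, fp=10, tn=80, fn=20)
print(m)
```

The guards against zero denominators are a design choice for the sketch; an evaluation tool may instead report such cases as undefined.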
computers & security 103 (2021) 102166 11
Table 2 – GIST-based performance comparison for the various classifiers and row length settings. The best configuration is
highlighted in bold.
ML.Algorithm Initial Column Width Feature Accuracy FPR Precision Recall F1-Score
Random Forest 224 pixels GIST 86.06% 0.017 0.867 0.861 0.857
SMO (RBF) 224 pixels GIST 87.45% 0.014 0.880 0.875 0.875
SVM (Linear) 224 pixels GIST 85.36% 0.016 0.858 0.854 0.853
XGBoost 224 pixels GIST 88.26% 0.011 0.885 0.882 0.881
J48 224 pixels GIST 72.70% 0.029 0.731 0.727 0.726
Random Forest 300 pixels GIST 89.08% 0.013 0.892 0.892 0.891
SMO (RBF) 300 pixels GIST 91.40% 0.010 0.915 0.914 0.914
SVM (Linear) 300 pixels GIST 88.03% 0.014 0.879 0.880 0.879
XGBoost 300 pixels GIST 88.61% 0.011 0.888 0.886 0.884
J48 300 pixels GIST 74.09% 0.027 0.756 0.741 0.746
Random Forest 4096 pixels GIST 93.14% 0.09 0.931 0.931 0.932
SMO (RBF) 4096 pixels GIST 94.65% 0.006 0.948 0.947 0.947
SVM (Linear) 4096 pixels GIST 92.91% 0.08 0.928 0.929 0.928
XGBoost 4096 pixels GIST 91.63% 0.008 0.919 0.916 0.917
J48 4096 pixels GIST 79.55% 0.02 0.804 0.797 0.797
Random Forest Square root scheme GIST 88.03% 0.016 0.885 0.880 0.878
SMO (RBF) Square root scheme GIST 91.75% 0.009 0.917 0.918 0.916
SVM (Linear) Square root scheme GIST 88.73% 0.012 0.887 0.887 0.885
XGBoost Square root scheme GIST 88.96% 0.011 0.889 0.889 0.888
J48 Square root scheme GIST 71.19% 0.033 0.724 0.712 0.716
Table 3 – HOG-based performance comparison for the various classifiers and row length settings. The best configuration is
highlighted in bold.
ML.Algorithm Initial Column Width Feature Accuracy FPR Precision Recall F1-Score
Random Forest 224 pixels HOG 86.87% 0.017 0.874 0.869 0.866
SMO (RBF) 224 pixels HOG 90.84% 0.010 0.910 0.909 0.909
SVM (Linear) 224 pixels HOG 85.71% 0.015 0.859 0.859 0.859
XGBoost 224 pixels HOG 87.45% 0.012 0.875 0.874 0.872
J48 224 pixels HOG 67.71% 0.036 0.687 0.677 0.679
Random Forest 300 pixels HOG 86.06% 0.018 0.861 0.861 0.861
SMO (RBF) 300 pixels HOG 89.31% 0.012 0.894 0.893 0.892
SVM (Linear) 300 pixels HOG 85.48% 0.016 0.861 0.855 0.856
XGBoost 300 pixels HOG 84.90% 0.015 0.848 0.849 0.846
J48 300 pixels HOG 70.84% 0.031 0.726 0.708 0.712
Random Forest 4096 pixels HOG 88.61% 0.015 0.892 0.886 0.884
SMO (RBF) 4096 pixels HOG 92.68% 0.008 0.927 0.927 0.926
SVM (Linear) 4096 pixels HOG 88.85% 0.012 0.888 0.889 0.888
XGBoost 4096 pixels HOG 90.12% 0.010 0.904 0.901 0.900
J48 4096 pixels HOG 70.84% 0.033 0.717 0.708 0.710
Random Forest Square root scheme HOG 88.05% 0.014 0.896 0.889 0.886
SMO (RBF) Square root scheme HOG 89.79% 0.012 0.901 0.898 0.898
SVM (Linear) Square root scheme HOG 85.13% 0.017 0.857 0.851 0.853
XGBoost Square root scheme HOG 88.15% 0.012 0.882 0.881 0.880
J48 Square root scheme HOG 66.89% 0.037 0.683 0.669 0.674
Table 4 – GIST+HOG-based performance comparison for the various classifiers and row length settings. The best configuration is
highlighted in bold.
ML.Algorithm Initial Column Width Feature Accuracy FPR Precision Recall F1-Score
Random Forest 224 pixels GIST+HOG 89.89% 0.010 0.899 0.899 0.897
SMO (RBF) 224 pixels GIST+HOG 94.54% 0.006 0.946 0.945 0.945
SVM (Linear) 224 pixels GIST+HOG 92.10% 0.009 0.923 0.921 0.921
XGBoost 224 pixels GIST+HOG 91.86% 0.008 0.922 0.918 0.918
J48 224 pixels GIST+HOG 73.28% 0.028 0.741 0.733 0.735
Random Forest 300 pixels GIST+HOG 90.59% 0.012 0.908 0.906 0.904
SMO (RBF) 300 pixels GIST+HOG 93.14% 0.008 0.932 0.931 0.931
SVM (Linear) 300 pixels GIST+HOG 90.24% 0.010 0.904 0.902 0.902
XGBoost 300 pixels GIST+HOG 89.69% 0.010 0.897 0.896 0.895
J48 300 pixels GIST+HOG 76.42% 0.025 0.781 0.764 0.770
Random Forest 4096 pixels GIST+HOG 92.91% 0.009 0.933 0.929 0.929
SMO (RBF) 4096 pixels GIST+HOG 96.39% 0.004 0.965 0.964 0.964
SVM (Linear) 4096 pixels GIST+HOG 93.26% 0.007 0.932 0.933 0.932
XGBoost 4096 pixels GIST+HOG 93.61% 0.006 0.936 0.936 0.935
J48 4096 pixels GIST+HOG 80.37% 0.022 0.813 0.804 0.806
Random Forest Square root scheme GIST+HOG 91.17% 0.011 0.915 0.912 0.911
SMO (RBF) Square root scheme GIST+HOG 95.35% 0.005 0.954 0.954 0.953
SVM (Linear) Square root scheme GIST+HOG 92.21% 0.008 0.922 0.922 0.922
XGBoost Square root scheme GIST+HOG 90.82% 0.009 0.911 0.908 0.909
J48 Square root scheme GIST+HOG 73.86% 0.028 0.759 0.739 0.746
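Tables 2 to 4 report a single FPR, precision, recall and F1 value per configuration although the task is multi-class. The sketch below shows one common way such figures are obtained: one-vs-rest counts per class, averaged with class support as the weight (the convention Weka prints as "Weighted Avg."). Whether the tables use exactly this averaging is an assumption, and the confusion matrix is a toy example, not data from the study.

```python
from typing import Dict, List


def weighted_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Support-weighted one-vs-rest metrics.

    cm[i][j] counts samples of true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    scores = {"fpr": 0.0, "precision": 0.0, "recall": 0.0, "f1": 0.0}
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                        # class k missed
        fp = sum(cm[i][k] for i in range(n)) - tp   # others labelled as k
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        fpr = fp / (fp + tn) if fp + tn else 0.0
        weight = (tp + fn) / total                  # class support share
        scores["fpr"] += weight * fpr
        scores["precision"] += weight * precision
        scores["recall"] += weight * recall
        scores["f1"] += weight * f1
    scores["accuracy"] = sum(cm[k][k] for k in range(n)) / total
    return scores


# Toy 3-class confusion matrix (hypothetical values).
toy_cm = [[8, 1, 1], [2, 7, 1], [0, 2, 8]]
result = weighted_metrics(toy_cm)
print(result)
```

Under this averaging the weighted recall always equals the accuracy, which matches the pattern in the tables, where the Recall column mirrors the Accuracy column in most rows.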
To avoid overfitting, we have set the C (cost) parameter to 10. Thus, we can infer that the problem in this domain is highly non-linear, since the best results have been achieved with non-linear kernels. This finding is also in line with the scores achieved by J48 and RF/XGBoost methods. As J48 is a simpler decision tree method compared to RF/XGBoost, its generalization capacity limits the accuracy obtained during inference. Apparently, the use of more than one single tree (i.e. Random Forest, XGBoost) has contributed to separating the highly complex feature space (i.e. manifold).
(5) Fig. 5 depicts the confusion matrix of our best classifier, which is the SMO (with radial basis kernel) classifier along with the use of GIST+HOG features. As can be observed from Fig. 5, the benign class has been found to be the most difficult class to predict. This has not come as a surprise from our perspective. We believe that this finding is mainly related to the high variance existing in benign-ware. Also, the structural similarity among the benign-ware is naturally low. In theory, the open-set recognition problems generally tend to have relatively low accuracy scores regarding the “other” or “unknown” class
Fig. 6 – The workflow of the “unknown malware detection” phase through the use of UMAP dimension reduction and
binary classification. (For interpretation of the references to colour in this figure legend, the reader is referred to the web
version of this article.)
ture. Another hyper-parameter is the initial distance metric, which is Euclidean by default.
As pointed out by Becht et al. (2019), the parameter selection of UMAP is important for obtaining embeddings that are useful for different purposes. Hence, we conducted numerous experiments to find the optimal hyperparameters. In our experiments, we executed a grid search that tests the n_components parameter with the values 4, 5, 10, 25, 50 and 100. Furthermore, we sought the optimal min_dist parameter with values ranging from 0.15 to 1 in steps of 0.05. Apart from these, the n_neighbors parameter was tested against values ranging from 5 to 100. Throughout the investigations, we also tested various metrics such as Euclidean, Manhattan, Chebyshev, Cosine, Jaccard and Dice.
According to the results of these experiments, our best classification results were obtained by setting n_components = 4, n_neighbors = 55, min_dist = 1 and metric = ‘Manhattan’. UMAP applies manifold learning in two different modes: (1) supervised and (2) unsupervised. To acquire more separable classes, we preferred the supervised mode. In this way, UMAP could learn the topological structure of the classes and embed them into a more separable low-dimensional feature space while preserving the intra-class distances controlled by the aforementioned hyperparameters.
Table 8 shows the results after the application of UMAP. The findings from the experiments are listed below:
1. All classifiers become able to correctly classify their training samples.
2. Evaluation metrics such as accuracy have risen in different ratios (i.e. Fold 1: 63.25% → 89.95%, Fold 2: 84.24% → 89.27%, Fold 3: 82.82% → 84.95%). More importantly, almost all the classifiers perform equally well on test sets. We also computed the performance gain in terms of accuracy for each fold, and the gains range from 2.13% to 36.98%.
3. Along with these, we calculated the weighted averages of the accuracy gains for each classification algorithm. In that regard, the 4-dimensional embeddings created by UMAP made RF gain 12.93% on average. For this particular problem and dataset, the performance of linear SVM and XGBoost increased by 21.83% and 20.78%, respectively.
4. Building an embedding transformer object takes 33 s overall on the test computer equipped with an Intel i7 8700 processor.
5. We have depicted the 2D projections of the train/test splits for each fold by presenting their original (raw) feature space and their corresponding new embeddings. Fig. 7 shows the visualizations of each fold in a separate column. The top two projections illustrate the training data (before and after UMAP), whereas the lower two projections present the test data before and after UMAP. In the upper part of the figure, the clear separation between the lower dimensional embeddings of the two classes can be easily seen for each fold. In the lower part of the figure (i.e. test data), the class separation improves for all the folds. More precisely, the orange-colored points (i.e. Benign samples) moved away from the blue-colored points (i.e. Malware samples) in a way that they build new clusters, although some remain as outliers.
6. To explore in detail the progress reflected in the classification results, we also presented the new precision, recall and F1-scores for the benign and malware classes for each fold. The outcomes are provided in Table 9. According to the results, all the F1-scores, which show the harmonic mean of the precision and recall values, have risen to a certain extent. Beyond this, it seems that UMAP not only improves
Table 8 – Classification performance of various ML methods on each fold after application of UMAP based supervised
manifold learning.
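The per-fold accuracy gain discussed in this section is a simple difference between the accuracy after and before UMAP. The sketch below works this out for the three before/after pairs quoted in the text; the variable names are illustrative only.

```python
# Accuracy (%) before and after UMAP, as quoted in the text for each fold.
before = {"Fold 1": 63.25, "Fold 2": 84.24, "Fold 3": 82.82}
after = {"Fold 1": 89.95, "Fold 2": 89.27, "Fold 3": 84.95}

# Gain in percentage points, rounded to two decimals to hide float noise.
gains = {fold: round(after[fold] - before[fold], 2) for fold in before}

for fold, gain in gains.items():
    print(f"{fold}: +{gain:.2f}")
```

The smallest of these differences, for Fold 3, coincides with the lower end of the gain range reported in the text (2.13%).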
Fig. 7 – The UMAP based 2D visualizations of train/test distributions in each fold that are obtained before and after
dimension reduction. (M: Malware, B: Benign-ware). (For interpretation of the references to colour in this figure legend, the
reader is referred to the web version of this article.)
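The hyperparameter search described in this section can be sketched as a plain enumeration of the stated value ranges. The step of 5 for n_neighbors is an assumption, since the text only gives the range 5 to 100, and the helper names are hypothetical. The `embed` helper relies on the third-party umap-learn package and is shown only to indicate how a supervised fit is typically invoked there; it is not the authors' script and is not called below.

```python
from itertools import product

N_COMPONENTS = [4, 5, 10, 25, 50, 100]
# 0.15, 0.20, ..., 1.00; built from integers to avoid float drift.
MIN_DIST = [(15 + 5 * i) / 100 for i in range(18)]
# Range 5..100 is from the text; the step of 5 is an assumption.
N_NEIGHBORS = list(range(5, 101, 5))
METRICS = ["euclidean", "manhattan", "chebyshev", "cosine", "jaccard", "dice"]


def parameter_grid():
    """Yield every hyperparameter combination as a keyword dictionary."""
    for n_comp, min_dist, n_nb, metric in product(
            N_COMPONENTS, MIN_DIST, N_NEIGHBORS, METRICS):
        yield {"n_components": n_comp, "min_dist": min_dist,
               "n_neighbors": n_nb, "metric": metric}


def embed(params, x_train, y_train, x_test):
    """Supervised UMAP: fit with labels, then project both splits."""
    import umap  # umap-learn, imported lazily so the grid runs without it
    reducer = umap.UMAP(**params)
    reducer.fit(x_train, y=y_train)  # passing y enables supervised mode
    return reducer.transform(x_train), reducer.transform(x_test)


grid = list(parameter_grid())
# Best setting reported in the text.
best_reported = {"n_components": 4, "min_dist": 1.0,
                 "n_neighbors": 55, "metric": "manhattan"}
print(len(grid), best_reported in grid)
```

In practice each combination would be scored by training a classifier on the embedded training split and evaluating it on the embedded test split, keeping the setting with the highest accuracy.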
obtained with UMAP shows that it is an appropriate method that can be used in the domain to contribute both to the problem of class imbalance and to feature visualization. In this sense, we suggested an approach following volatile memory forensics to build a robust scheme against fileless malware. Thus, memory dumps of processes have been utilized as the main source of information. To support the study findings, we have prepared and published a publicly available dataset having instances for 10 malware families as well as benign samples. Moreover, we have explored the impact of various image rendering schemes and found that a 4096px column width yields the best results. The results indicated that, when transforming memory dumps to initial images, larger columns (i.e. larger width values) result in better representations. Furthermore, the late fusion of GIST and HOG features yields the best outcome.
Consequently, the proposed approach, with 96.39% accuracy, shows promising results while also offering a reasonable detection time of 3.56 s. We believe that volatile memory forensics will gain more popularity in the near future due to the increase in fileless malware and similar threats. As future work, to increase the accuracy and to obtain an end-to-end analysis platform, we plan on investigating the abilities of convolutional neural networks equipped with self-attention mechanisms to reveal more discriminative representations as well as to achieve better generalization capabilities. Another interesting idea that we plan to test is the inverse transformation feature of UMAP, which might enable us to generate synthetic image features analogous to the decoding of auto-encoders.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
CRediT authorship contribution statement
Ahmet Selman Bozkir: Conceptualization, Methodology, Software, Formal analysis, Writing - original draft. Ersan Tahillioglu: Data curation, Software. Murat Aydos: Project administration, Validation, Supervision, Writing - review & editing. Ilker Kara: Validation, Writing - review & editing.
References
Ali M, Jones MW, Xie X, Williams M. TimeCluster: dimension reduction applied to temporal data for visual analytics. Vis. Comput. 2019;35.
Becht E, McInnes L, Healy J, Dutertre C-A, Kwork I-W-H, Ng LG, Ginhoux F, Newell EW. Dimensionality reduction for visualizing single-cell data using UMAP. Nat. Biotechnol. 2019;37.
Belkin M, Niyogi P. In: Advances in neural information processing systems. Laplacian eigenmaps and spectral techniques for embedding and clustering; 2002.
Bozkir AS, Akcapinar Sezer E. In: 4th International Symposium on Digital Forensic and Security (ISDFS). Use of HOG Descriptors in Phishing Detection. Arkansas, USA; 2016.
Bozkir AS, Cankaya AO, Aydos M. In: Signal Processing and Applications Congress, SIU, Sivas. Utilization and Comparison of Convolutional Neural Networks in Malware Recognition; 2019.
Chen L, “Deep Transfer Learning for Static Malware Classification”, arXiv:1812.07606, 2018.
Cheng Y, Fan W, Huang W, An J. A Shellcode Detection Method Based on Full Native API Sequence and Support Vector Machine, vol. 242. IOP Publishing; 2017.
Coenen A, Pearce A, “Understanding UMAP”, https://pair-code.github.io/understanding-umap/, Technical Note, (Available online at 27.5.2020), 2019.
Coifman RR, Lafon S. Diffusion maps. Appl. Comput. Harmon. Anal. 2006;21(1).
Dai Y, Li H, Qian Y, Lu X. A malware classification method based on memory dump grayscale image. Digital Investigation 2018;27:30–7.
Dai Y, Li H, Qian Y, Yang R, Zheng M. SMASH: a Malware Detection Method Based on Multi-Feature Ensemble Learning. IEEE Access 2019;7:112588–97.
Dalal N, Triggs B. Histograms of oriented gradients for human detection, vol. 1; 2005. p. 886–93.
Eroglu E, Bozkir AS, Aydos M. Brand Recognition of Phishing Web Pages via Global Image Descriptors. Eur. J. Sci. Technol. 2019 Special Issue.
General documentation, Weka. (2019). [Online]. Available: https://www.cs.waikato.ac.nz/ml/weka/.
Gibert D, Mateu C, Planes J. The rise of machine learning for detection and classification of malware: research developments, trends and challenges. J. Network Comp. Appl. 2020;153.
Jackson JE. A User’s Guide to Principal Components, 587. John Wiley & Sons; 2005.
Kobak D, Linderman GC. UMAP does not preserve global structure any better than t-SNE when using the same initialization; 2019. bioRxiv preprint 2019.12.19.877522.
Korkin I, Nesterov I. In: The ADFSL Conference on Digital Forensics, Security and Law. Applying memory forensics to rootkit detection; 2015.
Korkin I, Nesterow I. In: 11th ADFSL Conference on Digital Forensics, Security and Law. Acceleration of Statistical Detection of Zero-day Malware in the Memory Dump Using Cuda-Enabled GPU Hardware; 2016.
Maaten Lvd, Hinton G. Visualizing data using t-SNE. J. Mach. Lear. Res. 2008;9.
Malware Statistics, the AV-TEST Institute. (2019). [Online]. Available: https://www.av-test.org/en/statistics/malware/.
McInnes L, Healy J, Melville J, “UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction”, arXiv preprint arXiv:1802.03426, 2018.
Microsoft Malware Classification Dataset, (2015). [Online]. Available: https://www.kaggle.com/c/malware-classification.
Nataraj L, Karthikeyan S, Jacob G, Manjunath BS. Malware images: visualization and automatic classification. Proceedings of the 8th international symposium on visualization for cyber security. ACM, 2011.
Nissim N, Lahav O, Cohen A, Elovici Y, Rokach L. Volatile memory analysis using the MinHash method for efficient and secured detection of malware in private cloud. Computers & Security 2019;87.
Oliva A, Torralba A. Modeling the shape of the scene: a holistic representation of the spatial envelope. Int. J. Comput. Vis. 2001;42:145–75.
OpenCV Tutorials, OpenCV. (2019). [Online]. Available: https://opencv.org/.
Or-Meir O, Nissim N, Elovici Y, Rokach L. Dynamic Malware Analysis in the Modern Era – A State of the Art Survey. ACM Comput. Surv. 2019;52(5).
Oujaoura M, Minaoui M, Fakir M, El Ayachi R, Bencharef O. Recognition of Isolated Printed Tifinagh Characters. Int. J. Comput. Appl. 2014;85.
ProcDump v9.0, Microsoft. (2019). [Online]. Available: https://docs.microsoft.com/en-us/sysinternals/downloads/procdump.
Process Explorer v16.31, Microsoft. (2019). [Online]. Available: https://docs.microsoft.com/en-us/sysinternals/downloads/process-explorer.
Process Monitor v3.53, Microsoft. (2019). [Online]. Available: https://docs.microsoft.com/en-us/sysinternals/downloads/procmon.
Pyleargist, (2019). [Online]. Available: https://pypi.org/project/pyleargist/.
Rezende E, Ruppert G, Carvalho T, Theophilo A, Ramos F, de Geus P. In: Advances in Intelligent Systems and Computing. Malicious Software Classification Using VGG16 Deep Neural Network’s Bottleneck Features; 2018.
Santos I, Devesa J, Brezo F, Nieves J, Bringas PG. In: International Joint Conference CISIS’12-ICEUTE 12-SOCO 12 Special Sessions. Opem: A static-dynamic approach for machine-learning-based malware detection. Berlin, Heidelberg: Springer; 2013.
Shaid SZM, Maarof MA. In: 2014 International Symposium on Biometrics and Security Technologies (ISBAST). Malware behavior image for malware variant identification. IEEE; 2014.
Sharma PK, Raglin A. In: 17th IEEE International Conference on Machine Learning and Applications. Efficacy of Nonlinear Manifold Learning in Malware Image Pattern Analysis; 2018.
Sharma A, Sahay SK. Evolution and detection of polymorphic and metamorphic malwares: a survey. Int. J. Comput. Appl. 2014;90:7–11.
Shijo PV, Salim A. Integrated static and dynamic analysis for malware detection. Procedia Comput. Sci. 2015;46:804–11.
Sihwail R, Omar K, Zainol Ariffin KA, Al Afghani S. Malware detection approach based on artifacts in memory image and dynamic analysis. Appl. Sci. 2019;9.18:3680.
Tam K, Edwards N, Cavallaro L. In: Engineering Secure Software and Systems (ESSoS) Doctoral Symposium. Detecting Android malware using memory image forensics; 2015.
Tenenbaum JB, De Silva V, Langford JC. A global geometric framework for nonlinear dimensionality reduction. Science 2000;290(5500).
TDIMon, Mark Russinovich & Bryce Cogswell. (2019). [Online]. Available: https://sysinternals.d4rk4.ru/Utilities/TdiMon.html.
The Malevis Dataset, (2019). [Online]. Available: https://web.cs.hacettepe.edu.tr/~selman/malevis/.
Vasan D, Alazab M, Wassan S, Safaei B, Zheng Q. Image-Based Malware Classification using Ensemble of CNN Architectures (IMCEC). Computers & Security 2020;92.
Yajamanam S, Selvin VRS, Di Troia F, Stamp M. Deep learning versus gist descriptors for image-based malware classification. Icissp 2018:553–61.
Yuan B, Wang J, Liu D, Gao W, Wu P, Bao X. Byte-level Malware Classification Based on Markov Images and Deep Learning. Computers & Security 2020;92.
Zhang H, Yao D(Daphne), Ramakrishnan N. Detection of stealthy malware activities with traffic causality and scalable triggering relation discovery. Proc. of the 9th ACM symposium on information, computer and communications security (ASIA CCS’14), 2014.
Zhang H, Yao DD, Ramakrishnan N. Causality-based Sensemaking of Network Traffic for Android Application Security. Proc. of the 2016 ACM Workshop on Artificial Intelligence and Security (AISec’2016), 2016.
AHMET S. BOZKIR was born in Muğla Turkey, in 1983. He received a B.S. degree in computer engineering from Eskişehir Osmangazi University in 2002. Besides, he received M.Sc. and Ph.D. degrees in computer engineering from the Hacettepe University in 2009 and 2016 respectively. He is currently working as a Research Assistant at Hacettepe University Multimedia Information Laboratory (HUMIR). He has more than 33 studies covering fields such as Information Security, Human-Computer Interaction, Information Retrieval, Machine Learning and Engineering Geology.
ERSAN TAHILLIOGLU works as an embedded software engineer in ASELSAN Corporation. He has received his B.S. degree in computer engineering from Hacettepe University in 2016. He is currently an M.Sc student in the same department. His research interests include topics such as computer security and machine learning. His current research specifically focuses on malware detection utilization of memory forensic and machine learning.
MURAT AYDOS. Dr. Murat Aydos received the B.Sc. degree from Yildiz Technical University (Turkey) in 1991, and M.S. degree from Electrical and Computer Engineering Department, Oklahoma State University (USA), in 1996. He completed his Ph.D. study at Oregon State University, Electrical Engineering and Computer Science Department in June 2001. Dr. Aydos joined Informatics Institute at Hacettepe University in April 2013. He is the Head of Information Security Division at the Informatics Institute. Dr. Aydos is the author/co-author of more than 30 technical publications focusing on the applications of Cryptographic Primitives, Information & Data Security Mechanisms.
ILKER KARA has been an Assist. Prof. in the Department of Medical Services and Techniques, Eldivan Medical Services Vocational School in Çankırı Karatekin University since 2019. He has also been a part-time lecturer in the Computer Engineering Department of Hacettepe University since 2017. Kara completed his Ph.D. at Gazi University as of 2015. His research interests cover digital investigation, malware analysis and internet security. He has actively collaborated with researchers from several other disciplines such as computer science and forensics security in particular. He is currently working as the head of Information Security Division at the Informatics Institute. Besides, Dr. Kara authored more than 20 technical publications focusing on the applications of Cyber Security, Malware Analysis; Data Security Mechanisms.