Catch Them Alive: Malware Detection

1) The document presents a novel approach for malware detection that analyzes memory dumps of suspicious processes converted to RGB images and applies computer vision and machine learning techniques. 2) The approach achieved up to 96.39% accuracy classifying malware families using machine learning on features extracted from memory dump images. 3) Applying manifold learning to the features further improved unknown malware detection for some classifiers, increasing accuracy up to 21.83% on average.
computers & security 103 (2021) 102166

Available online at www.sciencedirect.com

journal homepage: www.elsevier.com/locate/cose

Catch them alive: A malware detection approach through memory forensics, manifold learning and computer vision

Ahmet Selman Bozkir a,∗, Ersan Tahillioglu b, Murat Aydos a, Ilker Kara c
a Department of Computer Engineering, Hacettepe University, Turkey
b ASELSAN Inc., Turkey
c Department of Medical Services and Techniques, Eldivan Medical Services Vocational School, Çankırı Karatekin University, Turkey

Article info

Article history: Received 6 June 2020; Revised 16 November 2020; Accepted 28 December 2020; Available online 2 January 2021

Keywords: Memory forensics; Memory dump; Machine learning; Computer vision; Malware detection; Manifold learning

Abstract

The ever-increasing use of information systems and online services has triggered the birth of new types of malware that are more dangerous and harder to detect. In particular, according to recent reports, the new type of fileless malware infects victims' devices without leaving a persistent trace (i.e. a file) on hard drives. Moreover, existing static malware detection methods in the literature often fail to detect sophisticated malware utilizing various obfuscation and encryption techniques. Our contribution in this study is twofold. First, we present a novel approach to recognize malware by capturing the memory dump of suspicious processes, which can be represented as an RGB image. In contrast to the conventional approaches followed by static and dynamic methods existing in the literature, we aimed to obtain and use memory data to reveal visual patterns that can be classified by employing computer vision and machine learning methods in a multi-class open-set recognition regime. Second, we have applied a state-of-the-art manifold learning scheme named UMAP to improve the detection of unknown malware files through binary classification. Throughout the study, we have employed our novel dataset covering 4294 samples in total, including 10 malware families along with benign executables. We obtained their memory dumps and converted them to RGB images by applying 3 different rendering schemes. In order to generate their signatures (i.e. feature vectors), we utilized GIST and HOG (Histogram of Oriented Gradients) descriptors as well as their combination. The obtained signatures were then classified via the machine learning algorithms J48, RBF kernel-based SMO, Random Forest, XGBoost and linear SVM. According to the results of the first phase, we have achieved prediction accuracy up to 96.39% by employing the SMO algorithm on the combined GIST+HOG feature vectors. Besides, the UMAP based manifold learning strategy has improved the accuracy of the unknown malware recognition models by up to 12.93%, 21.83% and 20.78% on average for the Random Forest, linear SVM and XGBoost algorithms respectively. Moreover, on a commercially available standard desktop computer, the suggested approach takes only 3.56 s for analysis on average. The results show that our vision based scheme provides an effective protection mechanism against malicious applications.
© 2021 Elsevier Ltd. All rights reserved.


∗ Corresponding author.
E-mail address: [email protected] (A.S. Bozkir).
https://doi.org/10.1016/j.cose.2020.102166
0167-4048/© 2021 Elsevier Ltd. All rights reserved.

1. Introduction

Computers and other associated computing systems (e.g. mobile devices, IoT devices) that we constantly use have become inevitable and rising components of our daily lives, which also attracts the attention of cyber-attackers. Thus, computer systems that are used frequently in sectors such as the defense industry, health, entertainment, banking, and education have been a target for various malicious activities such as illegal profit acquisition, information theft, and denial of service. Gibert et al. (2020) state that the combat between cyber-attackers and security professionals has turned into an arms race, since attackers develop new techniques to evade or subvert anti-malware solutions while security experts enhance their methods and schemes. According to the AV-Test Institute report, as of 2018, 856 million malware samples had been released (Malware Statistics, the AV-TEST Institute 2019).

To mitigate this problem, several approaches have been proposed that can be categorized under three branches, namely static, dynamic, and memory-based (Sihwail et al., 2019) (see Fig. 1). Static methods differ from the others by requiring no runtime analysis based on the execution of suspicious files. Further, this type of analysis utilizes various features such as strings, opcodes, API calls, byte sequences and control flow charts, all of which are revealed from the raw bytes of portable executables (PEs) (Gibert et al., 2020). In this context, it can be deduced that static methods work with signatures extracted from the features listed above. According to Gibert et al. (2020), a signature is an algorithm or a unique hash that differentiates a specific malware from the others. As there is no need for execution, static methods require far fewer computational resources and provide a fast recognition scheme. However, approaches relying on static analysis often present severe vulnerabilities when code obfuscation and encryption (Or-Meir et al., 2019) come into prominence. Moreover, they can be easily evaded by malware authors when dead code insertion and register reassignment come into play. Thus, their recognition accuracy is prone to unpredictable decreases.

In contrast to static analysis, dynamic analysis relies on capturing discriminative patterns from the behavior of suspicious files when they are executed in environments such as sandboxes and virtual machines. More precisely, analysis of function calls (Cheng et al., 2017), detection of harmful activities, and manipulation of Windows registry entries can be listed as the main items that dynamic analysis focuses on. One drawback of this approach is that the dynamic analysis of malware usually requires many more resources (i.e. memory and CPU usage) than static analysis. Moreover, some portable executable (PE) files require decompression and unpacking in order to reveal discriminative features. Nonetheless, its performance in terms of accuracy is generally reported to be higher (Santos et al., 2013). Furthermore, in their comprehensive survey, Or-Meir et al. (2019) have listed some vulnerabilities of dynamic analysis, such as malware sensing the presence of analysis tools to hide its behavior (i.e. by tracing debuggers and signatures created by tools), privilege escalation of the malicious executable, nested virtualization and logic bombs. Likewise, dynamic methods require specific tools for tracing suspicious file activities. Several tools have been designed to extract features for dynamic analysis-based methods, such as Process Monitor (Process Monitor v3.53, Microsoft 2019), Process Explorer (Process Explorer v16.31, Microsoft 2019), TDIMon (Bryce Cogswell 2019), RegMon, and Wireshark.

Apart from the pros and cons of static and dynamic schemes, memory-based analysis (i.e. volatile memory forensics) is an efficient way of detecting malware that has been gaining more attention in recent years. In principle, volatile memory forensics (VMF) involves two key stages: (a) acquisition and (b) analysis. Acquisition refers to converting data stored in physical or virtual memory into memory dump files via kernel drivers or emulators, whereas the analysis step aims to reveal useful information to detect the presence or behavior of malware by smart techniques such as machine learning (Or-Meir et al., 2019). From this perspective, volatile memory forensics presents several key advantages that motivated us for this study. As stated in (Dai et al., 2018), data in memory will be inescapably exposed under supervision. Thus, although malicious processes can hide via encryption and packing, during execution all processes become "naked" from the examination point of view, since they must expose most of their vital information (e.g. registers, code and data segments) in order to run. In this regard, volatile memory forensics enables the detection of malware by only examining its state in the system RAM. Moreover, another very important advantage of VMF is its robustness to fileless malware, which avoids detection by leaving no evidence or presence on hard drives (Or-Meir et al., 2019). This new kind of malware resides in the victim's memory until it is terminated or the victim's system shuts down. Due to the lack of a footprint, detection of fileless malware is nearly impossible for typical signature-based anti-virus applications.
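The fragility of hash-style static signatures discussed above can be illustrated with a minimal Python sketch. It is an illustration only and not part of the study: the byte strings and the helper name `signature` are hypothetical placeholders, SHA-256 merely stands in for "a unique hash", and a single inserted no-op byte stands in for dead code insertion.

```python
import hashlib


def signature(payload: bytes) -> str:
    """Return a SHA-256 digest, used here as a stand-in for a hash-based signature."""
    return hashlib.sha256(payload).hexdigest()


# Hypothetical byte strings; 0x90 is the x86 NOP opcode, used here as "dead code".
original = bytes.fromhex("5589e583ec10c9c3")
padded = original[:3] + b"\x90" + original[3:]

# A toy signature database that only knows the original sample.
known_signatures = {signature(original)}

print(signature(original) in known_signatures)  # the unmodified sample matches
print(signature(padded) in known_signatures)    # the padded variant no longer matches
```

A single inserted byte is enough to change the digest entirely, which is why approaches that inspect the running process in memory do not depend on the on-disk bytes staying the same.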

Fig. 1 – Commonly employed malware detection methods in the literature.



Taking all these into account, we propose a vision-based approach employing full memory dumps extracted from processes as the source of information and global image descriptors such as GIST (Oliva and Torralba, 2001) and Histogram of Oriented Gradients (HOG) (Dalal and Triggs, 2005) for the discovery of discriminative visual patterns. Further, we use 5 different machine learning methods to classify whether the suspicious process is benign or malware. In this way, we suggest a scheme that is more robust against code obfuscation than other analysis methods. Moreover, in contrast to static analysis, we do not need decompression and unpacking for the analysis.

In the next phase, we have attempted to classify unknown malware samples through binary classification. To do so, we have split the dataset into 3 folds, each having various known malware and benign samples that were used to train classifiers to recognize the remaining unknown malware files. At this point, we have hypothesized and tested our belief that the code structure and organization of malware files are more similar to other malware types than to benign files.

To this end, this study mainly presents 5 contributions, listed below in no particular order:

• The proposed method in this study focuses on memory analysis that is based on capturing memory dumps which could reflect malware that has been equipped with obfuscation or encryption.
• We have examined the outcome of different byte-to-image rendering schemes in terms of accuracy. To be more precise, we have applied 4 different target image resolutions ranging from 224 to 4096 pixels. Unlike other studies that analyze grayscale images, such as (Nataraj et al., 2011, Dai et al., 2018), we have used 3-channel RGB images to represent memory dump files.
• Instead of using single image descriptors, we have used GIST and HOG descriptors both separately and together in a manner of information fusion.
• We have applied the state-of-the-art manifold learning and dimension reduction technique called UMAP for the first time in the problem domain and evaluated its contribution to the unknown malware detection problem.
• We proposed a new dataset containing 10 unique malware families + a benign executable class, yielding 11 classes. Besides, the dataset we collected involves 3433 training samples as well as 861 validation instances, reaching 4294 in total.

This paper is organized as follows: In Section 2, we review several related studies. Section 3 introduces the details of the proposed dataset and briefly describes the GIST and HOG descriptors. Afterwards, Section 4 details the following sub-procedures: (a) how we have conducted memory dumping operations, (b) how we rendered RGB images, (c) how we obtained discriminative visual features by utilizing GIST and HOG descriptors. Further, Section 5 presents the experimental results and a comparison of classification performance across 5 machine learning algorithms. Finally, Section 6 concludes the study and explains possible future directions.

2. Related work

Currently, there exists a vast number of studies on every aspect of malware detection. In this section, some of them that focus on static analysis are briefly reviewed.

In the study proposed by Shijo and Salim (2015), printable strings (PIS) were extracted from the binary codes and an SVM based learning scheme was used to classify malware files, yielding an accuracy of 95.88%. Santos et al. (2013) carried out a study comparing both static and dynamic schemes. Opcodes were extracted and represented by term frequency (tf) and frequency of occurrence. In addition, they benefited from features based on dynamic analysis. According to their results, dynamic malware detection performs better than static analysis.

Bozkir et al. (2019) evaluated several convolutional neural networks on the classification of persistent malware files. For that purpose, they converted the raw binary bytes of portable executables (PEs) into colored images. As the dataset, they utilized the Malevis Dataset (The Malevis Dataset 2019) involving 12,394 malware files that have been split into 8750 for the training set and 3644 for the test set. According to the experiments they performed, they reported an accuracy of 97.48% by using the "DenseNet" architecture.

In their study, Yajamanam et al. (2018) investigated and compared the pros and cons of deep learning architectures and GIST descriptors in the domain of malware recognition through image feature extraction. Their findings from rigorous experiments show that these two methods perform equally well in terms of accuracy. Nonetheless, according to (Yajamanam et al., 2018), deep learning methodologies come into prominence by discarding (a) the necessity of extracting hand-crafted features and (b) a considerable amount of processing time.

Nissim et al. (2019) have developed a framework to detect ransomware and RATs on virtual machines that are hosted on cloud systems. They employed the MinHash method to analyze similarities among binaries that are created through volatile memory dumps. In their solution, Nissim et al. (2019) have implemented a custom similarity classifier which is based on the MinHash technique and the Jaccard coefficient. To improve the detection quality and reduce the processing time, they have leveraged locality sensitive hashing. The results show that they have achieved a 100% true positive rate along with very low false positive rates of 1.8% for ransomware and 0% for RATs. The superiority of their method stems from analyzing the static information after it has been loaded into volatile memory, which makes it possible to analyze unpacked, unencrypted clear data.

Shaid and Maarof (2014) executed malware files within a virtual machine environment, collected user-level API calls and categorized these calls manually according to their suspiciousness levels. Later on, they visualized those calls as suspicious and non-suspicious ones with hot and cold colors respectively. In this way, they visualized suspicious files based on the behavior they exhibit. Hence, malware behavior can be identified, and behavioral images can be used to introduce new ways of detecting malware.
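The MinHash and Jaccard-coefficient similarity reviewed above in connection with Nissim et al. (2019) can be sketched as follows. This is a minimal, generic illustration and not their implementation: the shingle length `k`, the number of hash functions, the use of salted BLAKE2b as the hash family, and the toy "dump" byte strings are all assumptions made for the example.

```python
import hashlib
from typing import List, Set


def shingles(data: bytes, k: int = 4) -> Set[bytes]:
    """Return the set of overlapping k-byte substrings of a byte string."""
    return {data[i:i + k] for i in range(len(data) - k + 1)}


def jaccard(a: Set[bytes], b: Set[bytes]) -> float:
    """Exact Jaccard coefficient |A intersect B| / |A union B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def minhash(items: Set[bytes], num_hashes: int = 128) -> List[int]:
    """MinHash signature of a non-empty set: per seeded hash, keep the minimum value."""
    sig = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(4, "little")  # a different salt gives a different hash function
        sig.append(min(
            int.from_bytes(
                hashlib.blake2b(item, digest_size=8, salt=salt).digest(), "little")
            for item in items))
    return sig


def estimate_similarity(sig_a: List[int], sig_b: List[int]) -> float:
    """Fraction of matching signature positions, an estimate of the Jaccard coefficient."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


# Toy stand-ins for two memory dumps that share most of their content.
dump_a = bytes(range(256)) * 4
dump_b = dump_a[:900] + b"\x00" * 124

set_a, set_b = shingles(dump_a), shingles(dump_b)
print("exact Jaccard:   ", jaccard(set_a, set_b))
print("MinHash estimate:", estimate_similarity(minhash(set_a), minhash(set_b)))
```

The signatures have a fixed length regardless of dump size, so comparing two dumps costs `num_hashes` integer comparisons instead of a full set intersection; locality sensitive hashing builds on the same signatures to avoid comparing every pair.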

Chen (2018) has applied the transfer learning technique to the malware detection problem via the InceptionV1 deep learning architecture along with grayscale malware images. Throughout the study, the author performed the experiments over the well known Malimg dataset (Nataraj et al., 2011) having 25 malware classes and the Microsoft Malware Classification Dataset (2015) including 9 classes. The results obtained from a multi-class classification regime show that the proposed method together with the softmax classifier has achieved accuracy up to 99.25% with a 0.03% false positive rate for the "Malimg" dataset. Besides, the obtained accuracy has surpassed the compared classification schemes involving SVM, Random Forest, Naïve Bayes, etc. Chen (2018) has also tested binary classification on a dataset having 16,518 benign and 10,639 malware files and reported 99.67% accuracy. Nevertheless, we question their experiments since (1) they have discarded the malware files smaller than 5 KB in favor of decreasing false positives and (2) the testing strategy has been carried out against already known classes. Instead, the training and testing sets should be mutually exclusive except for the benign family. Additionally, the author pointed out the vulnerability of the method against code obfuscation.

Similarly, Vasan et al. (2020) have taken two deep learning architectures, namely VGG16 and ResNet50, and fine-tuned them via the Malimg dataset to create an ensemble of classifiers. They first applied the transfer learning strategy and reduced the deep features obtained at the end of the deep learning models through the dimension reduction technique named principal component analysis (PCA). Vasan et al. (2020) have reported that they reduced the number of dimensions by 90%. According to their results, they have achieved 99% accuracy for unpacked malware and 98% for packed ones. However, their study has been conducted by using a dataset having 25 classes in a closed-set recognition manner. Thus, the convolutional neural network models they have trained do not recognize benign samples.

As another study, Yuan et al. (2020) have followed a different path and proposed a solution based on byte-level malware classification through a deep convolutional VGG16 model on Markov images. To achieve this, they have converted the binary files into Markov images by considering the transfer probability matrices. They have tested their proposal on the Microsoft Malware and Drebin datasets (involving 10 classes) and obtained average accuracy rates of 99.26% and 97.36% respectively.

Tam et al. (2015) have taken memory snapshots on an emulator equipped with Android version 4.4. By using the Volatility Framework, the libraries and strings used by processes were extracted to be utilized as features. In this way, they have proposed a scalable and portable environment to investigate Android memory. Nevertheless, this study suffers from the small size of the dataset they used.

Dai et al. (2019) have proposed a method that classifies malware files through training artificial neural networks on HOG features extracted from memory dump data. Yet, they have transformed memory dump files into grayscale images recorded in PNG format. As the size of dump data is variable, the sizes of the images have been variable as well. For this reason, the authors have resized them via bi-cubic interpolation to make them 4096 pixels wide if the file size is greater than 4 MB. As a result, they have reached an accuracy of up to 96.7%. Though their study is the closest work in the literature to ours, our approach involves several differences such as the descriptors and the dataset employed. Furthermore, we have used 3-channel images rather than grayscale counterparts and evaluated different resizing options.

Another study that is close to a part of our work is the paper of Sharma and Raglin (2018). In their work, the authors conducted several experiments to measure the efficiency of linear and nonlinear manifold learning algorithms including PCA, Isomap, Diffusion Maps, Laplacian Eigenmaps and t-SNE for the task of clustering. The authors have found that nonlinear methods perform better and t-SNE outperforms the other techniques in almost all aspects. The aforementioned study is the first work that has applied manifold learning in the problem domain.

It should be noted that the literature of malware detection also involves very unique approaches such as those presented in (Zhang et al., 2014, Zhang et al., 2016). In the former, the authors have suggested a novel network traffic reasoning approach in order to identify anomalies regarding request-level traffic structures and semantic triggering relations. In this way, they explored a new way to detect even zero-day malware attacks via their new concept called "triggering relation discovery". Their experiments, based on a 6GB+ dataset and conducted with SVM, Bayesian network and Naïve Bayes algorithms, have clearly shown that their proposal reaches up to 100% detection accuracy along with serving as a scalable solution against DNS bots, spyware and data exfiltration malware. The latter (Zhang et al., 2016), on the other hand, applies the idea presented in (Zhang et al., 2014) to the Android malware recognition problem by constructing a triggering relation model for dynamic analysis of HTTP traffic generated by Android apps. In other words, in (Zhang et al., 2016), the researchers aim to distinguish malicious network requests from benign apps by constructing triggering relation graphs and exploring the root triggers for malicious applications. Their experiments conducted with a 14GB+ dataset have shown that the use of triggering relations belonging to network traffic yields 98.2% accuracy.

3. Materials and methods

In this part, we first introduce the dataset we have collected. Next, the employed image descriptors are described briefly due to space limitations.

3.1. Dataset

It is a well known fact that, for data-driven studies, a well-curated and correct dataset is vital. When the literature of pattern recognition based malware detection is reviewed, it can be observed that there exists a limited number of datasets such as "Microsoft Malware Classification Challenge" (Microsoft Malware Classification Dataset 2015) (9 imbalanced classes) and "Malimg" (Nataraj et al., 2011) (25 balanced classes). Bozkir et al. (2019) have recently published another brand new dataset called "Malevis"

(The Malevis Dataset 2019). However, we observed 2 important shortcomings in these datasets. The first one is that they do not contain any available PE files due to security concerns. The second is that these datasets involve various features or assets belonging only to malware classes, which results in a lack of benign samples. In this regard, it is obvious that the above-mentioned datasets could be beneficial for closed-set inference tasks. To be more specific, without any negative sample (i.e. benign processes) the problem of learning becomes a closed-set problem, whereas we aimed at creating a model that is also capable of determining whether the suspicious process is malicious or not. The existing datasets, therefore, have not been appropriate for our research.

As an attempt to build such a suitable dataset, we cooperated with an information security company located in Turkey. The mentioned company has a special team and system to collect numerous malware samples (i.e. PEs) and they shared the real malware PEs with us. To the received PEs belonging to 10 different malware families, we also added an adequate number of benign executables (PEs) mostly taken from the Windows operating system. Note that the malware classes involved in the dataset were randomly selected. Furthermore, we have added an adequate number of benign samples chosen from digitally signed Microsoft Windows applications. As a result, we built the "Dumpware10" dataset covering 4294 portable executables which are grouped under 11 categories (10 malware + 1 benign class). Apart from having 608 benign instances, the Dumpware10 dataset involves 3686 malware samples collected from different families. The distribution and properties of the dataset are presented in Table 1. It should be noted that we have partitioned the whole dataset into a training set and a validation set, using a split of 80% / 20%. The proposed dataset is publicly available for non-commercial purposes and can be accessed via the link https://web.cs.hacettepe.edu.tr/~selman/dumpware10/

Table 1 – The existing malware families and their categories in the dataset.

Class          Category   # of Training   # of Val   Total
Adposhel       Adware     364             93         457
Allaple.A      Worm       349             88         437
Amonetize      Adware     349             87         436
AutoRun-PU     Worm       158             38         196
BrowseFox      Adware     152             38         190
Dinwod!rfn     Trojan     98              29         127
InstallCore.C  Adware     376             91         467
MultiPlug      Adware     390             98         488
VBA            Virus      399             100        499
Vilsel         Trojan     311             78         389
Benign Files   –          487             121        608
Total                     3433            861        4294

3.2. Gist descriptors

Proposed by Oliva and Torralba (2001), GIST descriptors were first designed to capture and represent scene pictures in a holistic manner. To be more precise, GIST features provide a low dimensional and discriminative image vector that can be used to identify any image (Oujaoura et al., 2014). In essence, the GIST algorithm generates a representation (i.e. spatial envelope) by analyzing the dominant spatial structure of an image by considering 5 different perceptual dimensions (i.e. openness, roughness, naturalness, expansion, and ruggedness) (Oliva and Torralba, 2001). It should be noted that, apart from scene classification (Oliva and Torralba, 2001), GIST descriptors have been successfully employed in various fields (e.g. recognition of phishing web pages (Eroglu et al., 2019) and printed Tifinagh characters (Oujaoura et al., 2014)) as well as malware classification (Nataraj et al., 2011). Thus, in this study, we mainly used GIST descriptors since GIST (a) is a global image descriptor and (b) can work regardless of the image size.

By utilizing a pre-determined number of Gabor filters G at different orientations O and scales S, GIST produces a low dimensional feature vector. In order to decrease the computational complexity, the whole image is first scaled and then divided into B × B blocks, yielding vectors containing B × B × G × O × S dimensions. In fact, for the computation of the GIST descriptor, the provided image is first divided into B × B blocks to avoid loss of information and to reduce computational load (Eroglu et al., 2019). As introduced in (1) and (2), Gabor filters having various scales and orientations are computed for each block.

$$G_{\theta_i}^{s} = C \exp\left(-\frac{x_{\theta_i}^{2} + y_{\theta_i}^{2}}{2\sigma^{2(s-1)}}\right) \exp\left(2\pi j\,(u_0 x_{\theta_i} + v_0 y_{\theta_i})\right) \tag{1}$$

$$x_{\theta_i} = x\cos\theta_i + y\sin\theta_i, \qquad y_{\theta_i} = -x\sin\theta_i + y\cos\theta_i \tag{2}$$

By default, for a 3-channel RGB image, the standard GIST descriptor divides the whole surface into 4 × 4 spatial cells, each of which is represented by two finer scales of 8 orientations along with one coarser scale of 4 orientations. Consequently, a 960-dimensional vector (3 × (4 × 4) × (8 + 8 + 4)) is obtained.

3.3. Histogram of oriented gradients

Suggested by (Dalal and Triggs, 2005), Histogram of Oriented Gradients (HOG) is a computer vision based method that enables us to capture local object appearance or visual cues by utilizing the distribution of intensity gradients and edge directions. Although HOG focuses on local patches, the concatenation of processed image blocks/patches enables researchers to use it as a global image descriptor. Since its proposal, HOG features have been utilized in numerous fields such as pedestrian detection (Dalal and Triggs, 2005), phishing identification (Eroglu et al., 2019; Bozkir and Akcapinar Sezer, 2016) and shape representation. Though it is a well known and powerful method, the shortcoming of the HOG descriptor is that all images must have the same size in order to obtain a canonical feature vector representation. Nevertheless, we have chosen HOG features as an auxiliary method since they can capture visual cues of the whole image from both local and global perspectives.

Computation of a HOG vector for an image fundamentally requires the following stages: (1) calculation of gradients, (2) orientation binning and (3) block normalization. Several different types of derivative masks are employed in the phase

of gradient computation. Given a point (x, y), HOG first computes the gradient values in both horizontal and vertical directions by employing the [−1, 0, 1] and [−1, 0, 1]^T kernel templates (Dai et al., 2018). As reported in (Dalal and Triggs, 2005), these templates yield the best results for pedestrian classification. Next, as stated in (Dai et al., 2018), the gradient magnitude and orientation at each point are calculated from the horizontal and vertical gradients G_x and G_y via the formulas given in (3) and (4) below:

$$\mathrm{grad}(x, y) = \sqrt{G_x(x, y)^{2} + G_y(x, y)^{2}} \tag{3}$$

$$\alpha(x, y) = \tan^{-1}\left(\frac{G_x(x, y)}{G_y(x, y)}\right) \tag{4}$$

In essence, to normalize the contrast among the neighborhood regions, the image is divided into a determined number of equally sized cells where 8 × 8 or 16 × 16 cell groupings construct a block. Note that these blocks consist of overlapping cells to contribute to the contrast normalization scheme. Throughout the normalization, the obtained local feature vectors belonging to each cell are cascaded based on the voting result. As a result, parameters such as cell size, block size and image size directly affect the dimension of the feature vectors. Therefore it is crucial to have a predefined input image size.

3.4. UMAP - Uniform Manifold approximation and projection for dimension reduction

In this sub-section, we first briefly outline what manifold learning is and concisely introduce the method of UMAP (Uniform Manifold Approximation and Projection). Furthermore, we also present the reasons behind its selection along with its pros and cons compared to other dimension reduction techniques.

Sharma and Raglin (2018) describe the concept of the manifold as a topological space that locally resembles the Euclidean space while involving a more complex global structure. More precisely, from the local perspective, manifolds can be seen as Euclidean spaces having homeomorphic neighborhoods for every point in n-dimensional space. Nonetheless, these

to visualize high dimensional data in 2D or 3D space in order to better discover and investigate the underlying structures, clusters and neighborhoods. From this point of view, manifold learning can be considered as a data transformation tool that employs linear or nonlinear dimensionality reduction techniques.

In this study, we have employed a state-of-the-art dimension reduction and manifold learning technique named UMAP. UMAP is a relatively new and powerful manifold learning and dimension reduction method that is built on Riemannian geometry and algebraic topology together with fuzzy simplicial sets (McInnes et al., 2018). Apart from manifold learning, it also presents many useful features such as clustering, metric learning, visualization and inverse transformation of embeddings. It should be kept in mind that there exists a vast number of DR… methods in the literature such as PCA (Jackson, 2005), Laplacian Eigenmaps (Belkin and Niyogi, 2002), Isomap (Tenenbaum et al., 2000), Diffusion Map (Coifman and Lafon, 2006) and t-SNE (Maaten and Hinton, 2008). Among these, t-SNE is accepted as the most widely used method (Becht et al., 2019, Ali et al., 2019) and it is often utilized to visualize high dimensional data in 2D or 3D space. Fundamentally, the goal of t-SNE is to discover patterns by calculating the probability distributions (i.e. Student t-distribution) found in the pairs of high dimensional data points and reflecting them into the lower dimensional space in a way that keeps the distributions as similar as possible in terms of Kullback-Leibler divergence (Sharma and Raglin, 2018). Though it is a widely-used and powerful method, it has several shortcomings such as (1) preserving only local neighborhoods (i.e. discarding the global relationships, which prevents exploring the "big picture"), (2) slow computation that causes scalability problems when large datasets are analyzed, (3) high memory consumption when a large perplexity value is used and (4) the lack of a learned transformer function to embed new cases when needed. Sharma and Raglin (2018) have applied the above listed DR… methods (except UMAP) in the problem domain of malware detection and found that the best embedding could be achieved by the use of t-SNE. Nonetheless, the recent studies (McInnes et al., 2018, Becht et al., 2019, Ali et al., 2019) comparing UMAP and t-SNE in various fields and datasets report the superiority of UMAP with detailed
points might not be homeomorphic from the perspective of the benchmarks. Although both of these methods follow same or
global structure. Spheres, toruses, planes, or folded sheets can similar methodologies, the main reason behind this finding
be given as examples of manifolds. According to (Sharma and is that UMAP not only considers the local connectedness but
Raglin, 2018), given a dataset having a d-dimensional feature also attempts to preserve data’s global structure by utilizing
space, revealing the manifold structure lying inside the data Laplacian Eigenmaps (Kobak and Linderman, 2019)
is called as manifold learning. At its core, the UMAP algorithm initially builds a high di-
Manifold learning or dimension reduction (DR…) methods mensional graph structure by utilizing a concept so-called
deal with discovering new low dimensional embeddings by “fuzzy simplical complex” as an analogy to a weighted graph
transforming/mapping the high dimensional data such that having the weights of edges represent the degree of proba-
the distances among closer data points lie together in the bility of two data points (i.e. vertex) are related (Coenen and
new coordinate system whereas the distant points remain dis- Pearce, 2020). The algorithm decides to connect a data point
tant by also preserving the parameters. In essence, one sig- to another based on whether they overlap within a volumet-
nificant intuition behind this idea results from two facts: (a) ric range controlled by a diameter parameter. Thus, the selec-
many co-related and overlapping features exist in the datasets tion of the diameter becomes a key component for the opti-
and (b) the necessity of avoiding this complexity to achieve a mal trade-off ranging from having very tiny clusters to glu-
simplified and nonoverlapping representation by preserving ing unnecessary points. At this point, UMAP handles this crit-
the underlying parameters that govern the data (Sharma and ical stage by picking a diameter locally by considering the dis-
Raglin, 2018). One practical implication of these methods is tance of nth nearest neighbor of each point in higher dimen-
computers & security 103 (2021) 102166 7

sional space and the graph gets fuzzier along with lowering the
likelihood of connections while the gluing diameter increases
(Coenen and Pearce, 2020). Note that, employment of the num-
ber of neighbors (NN) is a subtle difference of UMAP compared
to the perplexity parameter used by t-SNE. Moreover, UMAP
removes the normalization steps for both high and low di-
mensional probabilities which results in speedup compared to
t-SNE. Since UMAP is based on the graph structure, optimiza-
tion of the layout is an essential step and this is achieved by
the Stochastic Gradient Descent algorithm. There exists some
advanced mathematical background for the UMAP algorithm
and the reader is suggested to review the study (McInnes et al.,
2018) for further reading.
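The locally chosen diameter described above can be illustrated with a small standard-library sketch. It is only a simplified illustration, not the UMAP implementation: the function names are hypothetical, and the exponential decay used for the edge weight is an assumed toy kernel rather than UMAP's exact membership function.

```python
import math


def kth_neighbor_distances(points, k):
    """Return, for each point, the Euclidean distance to its k-th nearest neighbour.

    This plays the role of the per-point local scale ("diameter") in the text.
    """
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < number of points")
    scales = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        others = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scales.append(others[k - 1])
    return scales


def toy_edge_weight(distance, local_scale):
    """Assumed toy kernel: weight 1 at distance 0, decaying relative to the local scale."""
    if local_scale == 0:
        return 1.0 if distance == 0 else 0.0
    return math.exp(-distance / local_scale)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
    print(kth_neighbor_distances(pts, k=2))
```

A point in a sparse region receives a larger local scale, so the same absolute distance yields a stronger connection there than in a dense region, which is the intuition behind choosing the diameter per point.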
In this study, we have incorporated UMAP to (1) improve the binary classification performance in the unknown malware recognition problem by obtaining more discriminative lower dimensional embeddings via its supervised metric learning support, which warps the dataset with the help of the given class labels; and (2) visualize the latent space to investigate and explore the learned lower dimensional embeddings in 2D space. As pointed out by Ali et al. (2019), dimension reduction is a way to improve the efficiency of revealing patterns in datasets. To be more precise about the first argument, UMAP makes it possible to create a transformer that is trained in a supervised manner through class labels, regardless of whether the task is binary or multi-class classification. In other words, it learns how to embed the samples belonging to different classes from the high dimensional feature space into the lower dimensional one while also skewing the target feature space. In this regard, we can obtain a well-separated set of new embeddings for each class that is more isolating and discriminative, which yields better and more robust accuracy rates. From this perspective, UMAP behaves similarly to variational autoencoders (VAE). However, the experiments of (McInnes et al., 2018) clearly show the superiority of UMAP over VAE. Consequently, we have investigated the use of UMAP for the first time in the problem domain and explored its contribution to the unknown malware detection problem, which is treated in the second phase of our study.
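The supervised use of UMAP described above can be sketched with the third-party umap-learn package. This is a minimal sketch under stated assumptions, not the authors' code: the helper names are hypothetical, and the parameter values (n_neighbors=15, n_components=2, random_state=42) are illustrative defaults rather than the settings used in the study.

```python
def fit_supervised_umap(train_features, train_labels,
                        n_components=2, n_neighbors=15, random_state=42):
    """Fit a UMAP reducer whose embedding is guided by class labels."""
    # Imported lazily so this module still loads where umap-learn is missing.
    import umap  # third-party package: pip install umap-learn

    reducer = umap.UMAP(
        n_components=n_components,
        n_neighbors=n_neighbors,
        random_state=random_state,
    )
    # Passing y switches UMAP into its supervised (label-aware) mode.
    reducer.fit(train_features, y=train_labels)
    return reducer


def embed(reducer, features):
    """Project unseen feature vectors with the already fitted reducer."""
    return reducer.transform(features)
```

The fitted reducer acts as the learned transformer mentioned in the text: new samples are embedded with `embed(reducer, new_features)` without refitting, which is the capability t-SNE lacks.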

Fig. 2 – The overall workflow of the proposed approach’s first phase. Notice that our first phase is composed of four subsequent phases.

4. The approach

In this section, the workflow and system of our whole approach are explained in detail. The overall study is composed of two phases, and the procedure followed in each phase is described in a separate section. The workflow of the first phase is depicted in Fig. 2 in detail. The second phase is illustrated in Fig. 5.

4.1. Gathering memory data

If the malware detection studies conducted in recent years are examined, it can be seen that data located in volatile memory is often used, since memory dumps can store information related to the behavior and structure of portable executables (PE). Moreover, in most cases, memory dumps are robust to obfuscation and packing techniques. By definition, memory dumping is the extraction of temporary data that exists in the physical and virtual memory of the computer. The memory dump is also referred to under different names such as core dump or system dump, and it is mostly used by software developers during the debug phases. Through memory dumping, the data of all processes, or of a specific process, in physical memory can be extracted. When the target process is dumped, the layouts of thread stacks, text segments, data segments, heap areas and DLL calls of the process can be revealed. There are quite a few tools for memory dumping. During the study, we have utilized Procdump v9.0 (ProcDump v9.0, Microsoft 2019), which is a command-line application developed by Microsoft.

The Procdump tool enables users to monitor processes and obtain dump files. As a command-line utility, it was first made available for the Microsoft Windows operating system, and Linux support was provided later on. One of the advantages of Procdump is that it also enables us to capture the virtual address space occupied by processes.
The PE files involved in our Dumpware10 dataset were run in the Windows Sandbox environment shipped with the 1903 version of Windows 10 to get their contents located in both physical and virtual address space. In this way, we also protected our physical system against malicious activities. To obtain the dump files, we have executed Procdump with the -ma and -w startup arguments. The first argument “-ma” tells the tool to gather the full memory dump belonging to the target process, whereas “-w” forces the tool to wait for the process until it is run. In our experiments, we first ran the Procdump tool and then introduced an intentional delay of 5000 ms. As the third stage, the PE file was run and the dumping operation finished. The rationale behind the intentional delay is just to ensure that both Procdump and the specified malicious application are active. Note that the extension of the generated dump files changes according to the OS of the host. For instance, the Windows version of Procdump creates .dmp files while the Linux version generates dump files having the .vmcore extension. Next, we have placed all the .dmp files into their respective folders according to their classes.
Another important point is that the size of dump files varies due to several factors (e.g. dependent DLL files). The size of the dump files generated from the PE files of our corpus ranges from 10 MB to 100 MB. As explained clearly in (Dai et al., 2018), each process consists of several regions in memory space that store different blocks such as DLLs, environmental variables, process heap, thread stack, data segment and text segment. Moreover, according to (Korkin and Nesterov, 2015), operating systems employ the Address Space Layout Randomization (ASLR) scheme, meaning that those blocks are located randomly and fragmented, especially when the size of available memory is relatively small. Thus, to collect more consistent and uniform memory dumps, we have allocated a large enough memory (>16 GB) for the Sandbox environment.

4.2. Image representation

The proliferation of binary-to-image conversion invented by Nataraj et al. (2011) has inspired many new anti-malware studies. This kind of visualization of binary files has filled the gap between computer vision and the byte level sequences of executables. Throughout the study, to achieve a suitable representation, we have fundamentally (a) employed RGB based encoding to convert dump files into images and (b) experimented with various column width schemes during these byte-to-image renderings.
The memory dump data we collected throughout the study is very large and highly variable in size. Therefore, unlike other studies such as (Nataraj et al., 2011, Dai et al., 2018, Yuan et al., 2020) applying grayscale encoding, we have preferred RGB encoding to generate the images from malware dump files. Note that, in (Bozkir et al., 2019), the authors followed the same way for generating malware images from static malware files. In this regard, each pixel represents 3 sequential bytes yielding a color. The main motivation of this approach is to store more bytes in each row of the image for better consistency in terms of byte-level alignment. In this way, more information can be placed into each row of the image such that visual similarities belonging to similar samples will be more apparent and easily identifiable. Second, RGB based encoding makes the images more compact since the number of pixels is reduced to one third, which yields less distortion during the subsequent image resizing. Nonetheless, it should be noted that, compared to 8-bit grayscale encoding, this approach could involve disadvantages when byte-level variations come into prominence among memory dumps. For the byte-to-image conversion, we first updated and modified the Python script called “bin2png” published at https://github.com/ESultanik/bin2png. Next, we ran this script to create the source input images. The mentioned script was written in Python 2.7; we modified it so that it runs under Python 3. Meanwhile, instead of the lossy JPG format, we preferred the lossless PNG format.
There exist various byte-to-image rendering schemes in the malware detection literature. For instance, Dai et al. (2018) converted memory dump data into .png encoded gray-scale images having a constant width of 2048 or 4096 pixels according to the file size. However, to the best of our knowledge, none of these studies has conducted a comparative study on the way that these renderings affect classification accuracy. In other words, here we examine how one must select the column width for the analysis. As can be seen in Fig. 2, we have selected 4 different column width schemes: (1) 224px, (2) 300px, (3) 4096px, and finally (4) the square root scheme. As their names suggest, the first 3 schemes correspond to fixed initial image widths regardless of the memory dump file size. The last scheme (i.e. square root) follows a different strategy. Instead of a pre-determined image width, we have computed the square root of (dump file size)/3. Here, the byte size is divided by 3 due to the 3-channel color coding, and the edge of the square is adjusted in a way that it will contain zero-padding elements if necessary. In this way, we obtained 4 different kinds of input images representing the same training and validation sets. Therefore, as a research question, we also explored the effect of initial image renderings as illustrated in Fig. 3.
The initial rendered images are quite large even though RGB encoding has been preferred. Further, as can be seen in Fig. 3, the images having 224px width yielded the longest vertical edge. This issue restricts the efficiency of computer vision methods and extends the running time of the visual feature extraction algorithms. Thus, we have transformed them in a way that they form a square sized image. Technically speaking, we resized the images along their vertical axis via the Lanczos interpolation shipped with the OpenCV library (OpenCV Tutorials, OpenCV 2019). Rather than the bi-cubic interpolation used in (Dai et al., 2018), we have employed a better quality interpolation method called Lanczos to reduce the information loss during this unavoidable resize operation. With Lanczos interpolation, the images are downscaled by considering the 8 × 8 neighborhood of image pixels.
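The RGB rendering schemes of this sub-section (three consecutive bytes per pixel, a fixed column width or a square-root width, zero padding where needed) can be sketched with the standard library alone. This is an illustrative sketch and not the modified bin2png script: the function names are hypothetical, and writing the rows to a PNG file and the Lanczos resize are deliberately left out.

```python
import math


def choose_width(num_bytes, fixed_width=None):
    """Return the row width in pixels for a dump of num_bytes bytes.

    fixed_width models the 224/300/4096 schemes; None selects the square-root scheme.
    """
    if fixed_width is not None:
        return fixed_width
    num_pixels = math.ceil(num_bytes / 3)  # three bytes form one RGB pixel
    if num_pixels <= 1:
        return 1
    # Smallest integer side whose square can hold num_pixels pixels.
    return math.isqrt(num_pixels - 1) + 1


def bytes_to_rgb_rows(data, width, square=False):
    """Pack raw bytes into rows of (R, G, B) tuples, zero-padding the tail."""
    num_pixels = max(1, math.ceil(len(data) / 3))
    height = width if square else math.ceil(num_pixels / width)
    if width * height < num_pixels:
        raise ValueError("image is too small to hold the data")
    padded = data + bytes(width * height * 3 - len(data))
    rows = []
    for r in range(height):
        start = r * width * 3
        row_bytes = padded[start:start + width * 3]
        rows.append([tuple(row_bytes[i:i + 3]) for i in range(0, len(row_bytes), 3)])
    return rows


if __name__ == "__main__":
    dump = bytes(range(13))  # stand-in for the contents of a .dmp file
    side = choose_width(len(dump))
    for row in bytes_to_rgb_rows(dump, side, square=True):
        print(row)
```

With a fixed width the image height grows with the dump size, whereas the square-root scheme keeps the initial rendering square at the cost of a width that differs from sample to sample.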

Fig. 3 – Initial image building and resizing schemes we have applied.

The rationale behind the square size transformation is as follows. As is known, machine learning methods require equal length feature vectors to perform classification, clustering, and so on. Therefore, in a substantial number of cases, any computer vision based method must produce equal-sized feature vectors (i.e. signatures) to represent the images. As pointed out in the next sub-section, we extract GIST and HOG based image descriptors during the signature generation stage. Unlike GIST, the Histogram of Oriented Gradients algorithm, by its nature, needs to process equal-sized images to produce features having the same length.
Otherwise, some important parameters in the HOG computation, such as the cell size and the number of cells in each block, yield feature vectors of varying size. Therefore, it becomes an obligation to have a canonical image size. It should also be noted that resizing the image into a square undoubtedly causes loss of visual information to a certain extent since this transformation distorts the image. Nevertheless, the previous studies (Dai et al., 2018, Bozkir et al., 2019, Yuan et al., 2020) and our present work show that this loss is not that significant and the discriminative visual cues and patterns still exist. To this end, we have gained two important advantages: (1) we “equalized” the images in terms of size so that they can be used with HOG descriptors and (2) we made the data ready to be modeled with modern convolutional neural networks.
Consequently, we have transformed the byte sequences of the PEs into discriminative visual elements that can be called memory dump images. What is more, we have investigated the outcome of different pre-processing techniques and compared them.

4.3. Visual feature extraction

To extract possible discriminative visual patterns in memory dump images, we have employed two different visual feature descriptors, namely GIST and HOG. For getting GIST descriptors we have employed the Python package named “leargist” (Pyleargist 2019). This implementation has been selected not only because it is fast but also because it enables us to compute colored GIST descriptors as 960-dimensional vectors. Throughout the study, we have used a computer equipped with an Intel 8750 processor and 24 GB memory. The time needed for the GIST feature computation has been measured as 1.67 s on average.
We have also computed the HOG features of the memory dump images by using the “skimage.feature” Python package. This package provides a useful set of descriptors such as HOG, SIFT, and Local Binary Patterns. During the HOG feature generation, we have first resized the input images to 256 × 256. This operation can be considered as downscaling for 300 × 300 images whereas it can be thought of as an upscaling process for 224 × 224 images. Following this, we have set the cell size to 32 × 32 pixels and the number of orientations to 9, while we have defined the number of cells per block as 2 × 2. Besides, we have employed the “L2-Hys” block normalization scheme. As a result, we have obtained 1764-dimensional feature vectors: a 256 × 256 image contains 8 × 8 cells and therefore 7 × 7 overlapping blocks, each contributing 2 × 2 × 9 = 36 values. The time needed for the HOG feature computation has been measured as 1.12 s on average.

4.4. Classification via machine learning

As a result of the feature extraction procedure, vectors representing the characteristics of the images were obtained. To classify the memory dumps according to their visual descriptors, we have employed 5 well-known machine learning methods: (1) Random Forest, (2) XGBoost, (3) Linear SVM, (4) Sequential Minimal Optimization, and (5) J48. Three of these five techniques are tree-based whereas the other two are kernel machines. In this way, we aimed to examine whether our problem domain can be learned by tree-based methods and kernel machines. This investigation is important since we observe a high inter-class similarity between some different malware families, as can be seen in Fig. 4. On the other hand, Fig. 4 also depicts the intra-class variations belonging to the same malware class.

Fig. 4 – Several examples of RGB based memory dump images (224 × 224 renderings) belonging to different malware families and benign samples. The first three rows depict images of various malware classes such as “Allaple”, “BrowseFox” and “InstallCore”. The last row shows samples belonging to benign PEs. The renderings demonstrate the intra-class and inter-class similarities and variations.

For an efficient and effective approach, handling this kind of inter-class and intra-class variations/similarities is crucial. Moreover, we believe that this problem must be addressed for any kind of machine learning task. We employed the Weka (General documentation, Weka 2019) software to build the models that we evaluated. In particular, we have written a Python script for XGBoost based modeling due to the lack of this method in the vanilla Weka distribution. During the experiments carried out with Random Forest (RF), XGBoost, and J48, we have kept the default parameters. For the linear SVM and SMO based modeling, we have set the C value to 10. In particular, we have selected the RBF kernel for the SMO based learning process. In this way, we have aimed to observe whether the feature space occupied by HOG and GIST descriptors can be better separated via linear or Gaussian kernel tricks.
It should also be kept in mind that our problem lies in an open-set classification scheme where the benign class also exists. Therefore, having highly discriminant representations and classifying them accurately is vital since most of the samples will come from benign processes in the wild.

4.5. Evaluation metrics

Throughout the experimental studies, to fairly and quantitatively assess the performance of the built machine learning models, we have used several metrics such as the accuracy rate. These metrics have been widely used in the literature and provide detailed evaluations of the proposed approaches (Sharma and Sahay, 2014, Eroglu et al., 2019 and Dai et al., 2018). In this sub-section we briefly introduce them as follows:
TP (i.e. true positives) is the number of artifacts correctly classified as members of the positive class whereas FP (i.e. false positives) denotes the number of cases where the model wrongly predicts the positive class. Similarly, TN (i.e. true negatives) shows the number of outcomes where the model determines the ground-truth negative class examples correctly. On the other hand, FN (i.e. false negatives) is the number of actual positive cases that the model incorrectly predicts as negative. Based on these definitions, the metrics called accuracy, precision, recall and F1-score are defined in (5), (6), (7), and (8) respectively.

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \tag{5}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{6}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{7}$$
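The confusion-matrix metrics of this sub-section (accuracy, precision, recall, together with the F1-score and false positive rate that the paper also reports) can be computed as in the following sketch. The function name is hypothetical, and returning 0.0 for an empty denominator is an assumed convention, not something the paper specifies.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall, F1-score and FPR from confusion counts."""

    def ratio(numerator, denominator):
        # Assumed convention: an undefined ratio is reported as 0.0.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "fpr": ratio(fp, fp + tn),
    }


if __name__ == "__main__":
    # Illustrative counts only; they are not taken from the paper's experiments.
    for name, value in classification_metrics(tp=90, fp=10, tn=880, fn=20).items():
        print(f"{name}: {value:.3f}")
```

For a multi-class problem such as the one studied here, these quantities are typically computed per class in a one-vs-rest fashion and then averaged; which averaging the authors used is not stated in this section.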

Table 2 – GIST based performance comparison for the various classifiers and row length settings. The best configuration is SMO (RBF) with an initial column width of 4096 pixels.

ML.Algorithm Initial Column Width Feature Accuracy FPR Precision Recall F1-Score
Random Forest 224 pixels GIST 86.06% 0.017 0.867 0.861 0.857
SMO (RBF) 224 pixels GIST 87.45% 0.014 0.880 0.875 0.875
SVM (Linear) 224 pixels GIST 85.36% 0.016 0.858 0.854 0.853
XGBoost 224 pixels GIST 88.26% 0.011 0.885 0.882 0.881
J48 224 pixels GIST 72.70% 0.029 0.731 0.727 0.726
Random Forest 300 pixels GIST 89.08% 0.013 0.892 0.892 0.891
SMO (RBF) 300 pixels GIST 91.40% 0.010 0.915 0.914 0.914
SVM (Linear) 300 pixels GIST 88.03% 0.014 0.879 0.880 0.879
XGBoost 300 pixels GIST 88.61% 0.011 0.888 0.886 0.884
J48 300 pixels GIST 74.09% 0.027 0.756 0.741 0.746
Random Forest 4096 pixels GIST 93.14% 0.09 0.931 0.931 0.932
SMO (RBF) 4096 pixels GIST 94.65% 0.006 0.948 0.947 0.947
SVM (Linear) 4096 pixels GIST 92.91% 0.08 0.928 0.929 0.928
XGBoost 4096 pixels GIST 91.63% 0.008 0.919 0.916 0.917
J48 4096 pixels GIST 79.55% 0.02 0.804 0.797 0.797
Random Forest Square root scheme GIST 88.03% 0.016 0.885 0.880 0.878
SMO (RBF) Square root scheme GIST 91.75% 0.009 0.917 0.918 0.916
SVM (Linear) Square root scheme GIST 88.73% 0.012 0.887 0.887 0.885
XGBoost Square root scheme GIST 88.96% 0.011 0.889 0.889 0.888
J48 Square root scheme GIST 71.19% 0.033 0.724 0.712 0.716

$$F1\text{-}\mathrm{score} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{8}$$

Moreover, we have also used the false positive rate (FPR) metric, which is formalized below in (9). The FPR measures the proportion of cases that are wrongly predicted as positive although they are actually negative, obtained by dividing the false positives by the actual negative cases.

$$\mathrm{FPR} = \frac{FP}{\text{Actual Negatives}} = \frac{FP}{FP + TN} \tag{9}$$

5. Experiments and results

In this section, we introduce our two-phased experiments and their results along with related discussions.

5.1. Phase 1 - Identification of known malware families and benign samples

In this phase, we investigated the performance of GIST and HOG features employed together with different machine learning schemes in the problem domain. These features were previously used in static malware analysis methods (Nataraj et al., 2011, Dai et al., 2018). Nevertheless, we have tested whether their individual and fused usage fits the problem of dump image classification.
As mentioned before, another concern of this study is to test and discover the correct image rendering scheme for the problem domain. We, therefore, have prepared 4 versions of the dataset with different image renderings: (1) 224px, (2) 300px, (3) 4096px and (4) the square root scheme. The square root scheme first takes the size of the file and computes the optimal side length of the square to fit the content of the file.
Throughout the first phase, we also explored the outcome of the late fusion of the obtained feature vectors. In other words, we concatenated GIST and HOG descriptors to test whether this kind of aggregation yields better classification accuracy. To measure the performance of the machine learning models we have created so far, we have utilized various performance metrics such as accuracy, false positive rate (FPR), precision, recall, and F1-score (the harmonic mean of precision and recall). According to the conducted experiments, the following findings have been obtained:

(1) Compared to HOG, the GIST descriptor turns out to provide better results. This shows that the GIST descriptor can generate more discriminative representations than HOG. As listed in Table 2, we achieved an accuracy of 94.65% by using GIST features and the SMO (RBF) classifier.
(2) HOG features have been outperformed by GIST in most of the cases. Table 3 shows the classification results related to HOG features and several machine learning methods. Accordingly, we achieved at most 92.68% accuracy and a 0.008 false positive rate. The best combination we discovered is HOG + SMO (RBF). Likewise, the initial image rendering having a 4096 pixel width has been identified as the best configuration.
(3) One of the hypotheses that we proposed was that the combination of GIST + HOG features yields a better classification scheme, since we fuse these two sources of information. According to the results listed in Table 4, this fusion has increased the accuracy and recall scores while reducing the FPR. Therefore, we can conclude that the merging of different visual features yields better classification results regardless of the underlying machine learning method that we have employed.
(4) Among the ML methods used, we found that SMO with the radial basis kernel has outperformed the other methods.

Table 3 – HOG based performance comparison for the various classifiers and row length settings. The best configuration is SMO (RBF) with an initial column width of 4096 pixels.

ML.Algorithm Initial Column Width Feature Accuracy FPR Precision Recall F1-Score
Random Forest 224 pixels HOG 86.87% 0.017 0.874 0.869 0.866
SMO (RBF) 224 pixels HOG 90.84% 0.010 0.910 0.909 0.909
SVM (Linear) 224 pixels HOG 85.71% 0.015 0.859 0.859 0.859
XGBoost 224 pixels HOG 87.45% 0.012 0.875 0.874 0.872
J48 224 pixels HOG 67.71% 0.036 0.687 0.677 0.679
Random Forest 300 pixels HOG 86.06% 0.018 0.861 0.861 0.861
SMO (RBF) 300 pixels HOG 89.31% 0.012 0.894 0.893 0.892
SVM (Linear) 300 pixels HOG 85.48% 0.016 0.861 0.855 0.856
XGBoost 300 pixels HOG 84.90% 0.015 0.848 0.849 0.846
J48 300 pixels HOG 70.84% 0.031 0.726 0.708 0.712
Random Forest 4096 pixels HOG 88.61% 0.015 0.892 0.886 0.884
SMO (RBF) 4096 pixels HOG 92.68% 0.008 0.927 0.927 0.926
SVM (Linear) 4096 pixels HOG 88.85% 0.012 0.888 0.889 0.888
XGBoost 4096 pixels HOG 90.12% 0.010 0.904 0.901 0.900
J48 4096 pixels HOG 70.84% 0.033 0.717 0.708 0.710
Random Forest Square root scheme HOG 88.05% 0.014 0.896 0.889 0.886
SMO (RBF) Square root scheme HOG 89.79% 0.012 0.901 0.898 0.898
SVM (Linear) Square root scheme HOG 85.13% 0.017 0.857 0.851 0.853
XGBoost Square root scheme HOG 88.15% 0.012 0.882 0.881 0.880
J48 Square root scheme HOG 66.89% 0.037 0.683 0.669 0.674

Table 4 – GIST+HOG based performance comparison for various classifiers and row length settings. The best configuration is SMO (RBF) with an initial column width of 4096 pixels.

ML.Algorithm Initial Column Width Feature Accuracy FPR Precision Recall F1-Score
Random Forest 224 pixels GIST+HOG 89.89% 0.010 0.899 0.899 0.897
SMO (RBF) 224 pixels GIST+HOG 94.54% 0.006 0.946 0.945 0.945
SVM (Linear) 224 pixels GIST+HOG 92.10% 0.009 0.923 0.921 0.921
XGBoost 224 pixels GIST+HOG 91.86% 0.008 0.922 0.918 0.918
J48 224 pixels GIST+HOG 73.28% 0.028 0.741 0.733 0.735
Random Forest 300 pixels GIST+HOG 90.59% 0.012 0.908 0.906 0.904
SMO (RBF) 300 pixels GIST+HOG 93.14% 0.008 0.932 0.931 0.931
SVM (Linear) 300 pixels GIST+HOG 90.24% 0.010 0.904 0.902 0.902
XGBoost 300 pixels GIST+HOG 89.69% 0.010 0.897 0.896 0.895
J48 300 pixels GIST+HOG 76.42% 0.025 0.781 0.764 0.770
Random Forest 4096 pixels GIST+HOG 92.91% 0.009 0.933 0.929 0.929
SMO (RBF) 4096 pixels GIST+HOG 96.39% 0.004 0.965 0.964 0.964
SVM (Linear) 4096 pixels GIST+HOG 93.26% 0.007 0.932 0.933 0.932
XGBoost 4096 pixels GIST+HOG 93.61% 0.006 0.936 0.936 0.935
J48 4096 pixels GIST+HOG 80.37% 0.022 0.813 0.804 0.806
Random Forest Square root scheme GIST+HOG 91.17% 0.011 0.915 0.912 0.911
SMO (RBF) Square root scheme GIST+HOG 95.35% 0.005 0.954 0.954 0.953
SVM (Linear) Square root scheme GIST+HOG 92.21% 0.008 0.922 0.922 0.922
XGBoost Square root scheme GIST+HOG 90.82% 0.009 0.911 0.908 0.909
J48 Square root scheme GIST+HOG 73.86% 0.028 0.759 0.739 0.746

To avoid overfitting, we have set the C (cost) parameter to 10. We can thus infer that the problem in this domain is highly non-linear, since the best results have been achieved with non-linear kernels. This finding is also in line with the scores achieved by the J48 and RF/XGBoost methods. As J48 is a simpler decision tree method compared to RF/XGBoost, its generalization capacity limits the accuracy obtained during inference. Apparently, the use of more than one single tree (i.e. Random Forest, XGBoost) has helped to separate the highly complex feature space (i.e. manifold).
(5) Fig. 5 depicts the confusion matrix of our best classifier, which is the SMO classifier (with radial basis kernel) used along with GIST+HOG features. As can be observed from Fig. 5, the benign class has been found to be the most difficult class to predict. This has not been a surprise from our perspective. We believe that this finding is mainly related to the high variance existing in benignware. Also, the structural similarity among benignware is naturally low. In theory, open-set recognition problems generally tend to have relatively low accuracy scores regarding the “other” or “unknown” class.

Table 5 – Comparison of our best classifier with other reference works.

Study Accuracy Precision Recall F1
Nataraj et al., 2011 91.40% 0.915 0.914 0.915
Dai et al., 2018 94.54% 0.946 0.945 0.945
Rezende et al., 2018 96.93% 0.970 0.969 0.969
Our best 96.39% 0.964 0.964 0.964

library and trained the network for 50 epochs with a decaying learning rate of 0.001 and a batch size of 8. The training process has been done on a computer having an Nvidia Geforce 1050TI graphics card. According to the comparative results provided in Table 5, our work outperforms the first two approaches in terms of accuracy, precision, recall and F1-score. Rezende et al. (2018), which employs deep learning, has slightly outperformed our scheme. In this regard, our approach shows a competitive performance compared to (Rezende et al., 2018).
In our view, the late fusion of GIST and HOG features enriches the information gained from executable images compared to the use of a single feature. Furthermore, it is a well-known fact that deep convolutional neural networks (CNN) present a superior performance in almost all aspects of computer vision. Nevertheless, they require a large number of samples in order to avoid the overfitting problem. Besides, the nature of the malware detection problem is highly related to capturing distinguishing patterns from the images.

Fig. 5 – The confusion matrix obtained from our best classifier (SMO) which utilizes GIST and HOG descriptors (Acc: 0.9639, F1-Score: 0.964). The benign class has been found to have the most misclassifications.

(6) We also inferred that the way of building the initial image rendering highly affects the classification accuracy. In this study, we have applied 4 different input image sizes for the problem. In line with the suggestion made by (Dai et al., 2018), 4096 pixels wide rows produce the best results in all experimental studies. Moreover, compared to 224 pixels wide images, the input images initially having a width of 300 pixels yield better results in most of the cases.
(7) As an option, we have also investigated the impact of size In this regard, similar to the findings of (Yajamanam et al.,
settings that are highly coupled with the square root of 2018), we argue that conventional CNN architectures do not in-
memory dump file size. However, as the results indicate, troduce much accuracy gain since the image variances among
this approach performs the worst results. Combining with malware families do not source from well-known distortions
the previous finding, we can conclude that the byte se- and transformations such as rotation, scaling and other types
quences should be tightly aligned to obtain more discrim- of affine transformations. On the other hand, it is notewor-
inative visual representations. At this point, we infer that thy that CNNs enables us to have much faster computation
the renderings obtained by the square rooting technique pipelines along with end-to-end learning.
hurt the structural similarity and visual patterns such that
the file sizes distort the canonical representation in which 5.2. Phase 2 - Detection of unknown malware and
a visual descriptor highly benefits from while extracting benign-ware through supervised manifold learning
structural patterns.
(8) The overall execution of the pipeline involving memory As is known, new types of malware families or a variant of
dumping, image rendering, feature classification and clas- known malware are published every day. Therefore, it is be-
sification took 3.56 s on average on a standard Intel Core coming more crucial to detect zero-day attacks in time. This
i7 PC equipped with 24 GB of memory and SSD harddrive. fact, in essence, has become the main motivation of this part
Thus, we believe that this duration enables our suggestion of the study. In the next phase of our study, therefore, we
to be a viable solution to be used in anti-malware systems. looked for an answer to the question of “Can we detect a sus-
picious process as malware even it is not possible to know it’s
We have also compared our best results with three other family?”. The nature of this question, in essence, leads us to
studies such as (Nataraj et al., 2011, Dai et al., 2018 and the binary classification.
Rezende et al., 2018). The study of (Nataraj et al., 2011) em- To find out a robust answer, we have designed a completely
ployed pure use of GIST descriptors along with machine learn- different experiment than the first phase. Since our aim is to
ing methods whereas the (Dai et al., 2018) utilized HOG de- correctly identify unknown processes, we have reconfigured
scriptors to be classified by the multi-layer perceptron model. the train and test sets according to this need. The Dump-
Likewise, Rezende et al. (2018) suggested the use of deep con- ware10 dataset which we have created contains 10 malware
volutional neural networks via VGG16 architecture. None of classes and benign instances. We, thus, decided to split our
these studies have publicly available source codes to bench- malware families such that 3 out of 10 families represent the
mark. So, we have manually coded according to the definitions unknown test portion. Accordingly, we targeted to classify the
in the aforementioned papers. To compare with the work of known unknowns through the information learned from the re-
(Rezende et al., 2018), we have utilized the official Pytorch v1.4 maining malware existing in the training data.
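The family-level hold-out used in this phase can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' script: the `(sample_id, family)` record layout, the `family_holdout_split` helper and the `"Benign"` label are hypothetical names, and the toy data merely stands in for Dumpware10. Families listed in `held_out` appear only in the test set, benign samples are divided at random in the same train/test proportion as the malware samples, and all labels are collapsed to "malware" or "benign".

```python
import random

BENIGN = "Benign"  # hypothetical label name for the benign class


def family_holdout_split(samples, held_out, seed=0):
    """Split (sample_id, family) pairs so that held-out families are test-only.

    Benign samples are divided at random in the same train/test proportion
    as the malware samples. Labels are collapsed to "malware" / "benign".
    """
    held_out = set(held_out)
    malware = [(sid, fam) for sid, fam in samples if fam != BENIGN]
    benign = [sid for sid, fam in samples if fam == BENIGN]

    train = [(sid, "malware") for sid, fam in malware if fam not in held_out]
    test = [(sid, "malware") for sid, fam in malware if fam in held_out]

    # The share of malware that went to training decides the benign share.
    train_ratio = len(train) / len(malware)
    rng = random.Random(seed)
    rng.shuffle(benign)
    cut = round(len(benign) * train_ratio)
    train += [(sid, "benign") for sid in benign[:cut]]
    test += [(sid, "benign") for sid in benign[cut:]]
    return train, test


# Toy data: 10 hypothetical families with 5 samples each, plus 10 benign samples.
families = [f"family_{i}" for i in range(10)]
samples = [(f"{fam}_{j}", fam) for fam in families for j in range(5)]
samples += [(f"benign_{j}", BENIGN) for j in range(10)]

train, test = family_holdout_split(samples, held_out=families[:3])
```

Rotating which three families are passed as `held_out` would give the different folds; the classifier never sees a test family during training, which is what makes the test samples "unknown".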

Table 6 – Binary classification performance of various ML methods on each fold (columns 3–6: training data; columns 7–10: testing data).

ML Algorithm Fold Accuracy Precision Recall F1 Accuracy Precision Recall F1
Random Forest Fold 1 100% 1 1 1 62.32% 0.593 0.690 0.550
SVM (Linear) Fold 1 96.54% 0.961 0.893 0.923 63.25% 0.582 0.666 0.549
XGBoost Fold 1 100% 1 1 1 63.19% 0.598 0.701 0.559
Random Forest Fold 2 100% 1 1 1 84.24% 0.715 0.824 0.746
SVM (Linear) Fold 2 96.86% 0.955 0.911 0.932 52.29% 0.560 0.622 0.473
XGBoost Fold 2 100% 1 1 1 59.40% 0.589 0.683 0.531
Random Forest Fold 3 100% 1 1 1 82.82% 0.688 0.769 0.713
SVM (Linear) Fold 3 96.64% 0.957 0.900 0.926 77.40% 0.652 0.760 0.668
XGBoost Fold 3 100% 1 1 1 76.24% 0.649 0.764 0.662
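Table 6 reports one precision, recall and F1 value per classifier, while Table 7 breaks the scores down per class. As a minimal sketch of how such scores relate under class imbalance, the snippet below derives per-class and macro-averaged values from a binary confusion matrix. Macro averaging is an assumption here (the averaging scheme is not stated in the text), `binary_report` is a hypothetical helper, and the counts are made up for illustration rather than taken from the experiments.

```python
def binary_report(tp_m, fn_m, tp_b, fn_b):
    """Per-class and macro-averaged precision/recall/F1 for malware vs. benign.

    tp_m / fn_m: malware samples predicted as malware / as benign.
    tp_b / fn_b: benign samples predicted as benign / as malware.
    """
    def prf(tp, fp, fn):
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        return p, r, f1

    # Benign samples predicted as malware are false positives for malware,
    # and malware samples predicted as benign are false positives for benign.
    malware = prf(tp_m, fp=fn_b, fn=fn_m)
    benign = prf(tp_b, fp=fn_m, fn=fn_b)
    macro = tuple((m + b) / 2 for m, b in zip(malware, benign))
    accuracy = (tp_m + tp_b) / (tp_m + fn_m + tp_b + fn_b)
    return {"accuracy": accuracy, "malware": malware, "benign": benign, "macro": macro}


# Hypothetical counts for an imbalanced test fold (not taken from the paper).
report = binary_report(tp_m=900, fn_m=100, tp_b=60, fn_b=40)
```

With counts like these, overall accuracy stays high because the majority class dominates, while the precision of the minority class is much lower; this is the kind of gap that a per-class breakdown makes visible.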

To investigate the generalization capability as well, we have split the dataset into 3 folds in a way that each fold involves 7 malware families for training and 3 malware families for testing. The benign-ware set, including 608 samples in total, has also been split according to the ratio of train/test samples in each fold via random selection. Consequently, the sizes of the train/test splits of the folds have become as follows: Fold 1 (Train: 2691, Test: 1603), Fold 2 (Train: 3380, Test: 914) and Fold 3 (Train: 2744, Test: 1549). As the next step, we changed the class labels to “malware” and “benign” in both train and test sets. With this modification, we have converted the multi-class problem into a binary classification scheme. For the classification stage, we have selected linear SVM, XGBoost and Random Forest algorithms having the same parameters as the ones we used in the previous phase.

The results of the experiments are given in Table 6. As can be seen from the results, the Random Forest algorithm has been found to be the most successful classifier. Moreover, the XGBoost and Random Forest algorithms have predicted the training cases 100% correctly. However, the linear SVM (C = 10) models have failed to identify all the training cases.

For Fold 1, the best scheme has been obtained through the linear SVM, which achieved a 63.25% accuracy rate along with an F1-score of 0.549. For Fold 2, the Random Forest algorithm has achieved the best performance with an accuracy score of 84.24% and an F1-score of 0.746. Similarly, in Fold 3, RF again outperformed the other models, reaching 82.82% accuracy and an F1-score of 0.713.

Moreover, another key finding that emerges is the variance in the metrics between Fold 1 and the two others. The results of Fold 1 are significantly lower than those of the other folds. As a whole, the results listed in Table 6 are far lower than the scores that we obtained during phase 1. The actual reason for this outcome is highly related to the imbalance of the classes. Table 7 presents the precision, recall and F1-scores of the predicted benign and malware classes with the most successful algorithms in each fold, along with the sample counts. Moreover, Fig. 6 also shows the malware splits in each fold with their names.

Table 7 – Precision, recall and F1-scores of the predicted benign and malware classes with the most successful algorithms in each fold.

Fold Class Precision Recall F1 Support
Fold 1 Benign 0.24 0.71 0.35 227
Fold 1 Malware 0.93 0.62 0.74 1376
Fold 2 Benign 0.47 0.80 0.59 130
Fold 2 Malware 0.96 0.85 0.90 784
Fold 3 Benign 0.43 0.69 0.53 220
Fold 3 Malware 0.94 0.85 0.89 1329

In light of this new evidence, we can infer that (a) benign examples are generally misclassified and (b) the high number of malware samples biases the classifiers in favor of the malware class. We can also conclude that the structural similarities between the benign and malware samples make it hard to separate them in the high-dimensional feature space.

At this point, since the problem becomes an imbalanced class distribution problem, there exist several countermeasures that can be taken, such as minority class oversampling, majority class downsampling or synthetic sample generation for the minority class (i.e. benign). However, these methods have different shortcomings. We, therefore, applied the UMAP manifold learning and dimension reduction method for the first time in the literature to overcome this problem through its supervised metric learning feature. With this feature, we first aimed to enhance the classification performance by obtaining more discriminative and separable lower-dimensional embeddings, making UMAP learn the actual distances among the samples so that it can warp the lower-dimensional feature space. Secondly, we visualized the latent space of embeddings to investigate and explore the class distributions on a 2D plane.

Fig. 6 – The workflow of the “unknown malware detection” phase through the use of UMAP dimension reduction and binary classification. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

To obtain lower-dimensional embeddings through UMAP, we have created a Python 3.6 script that imports both UMAP version 0.4.4 and the sklearn libraries. To run, UMAP requires several hyper-parameters such as n_components (i.e. the number of dimensions of the target lower-dimensional embeddings), n_neighbors (i.e. the parameter that balances the preservation of local versus global structure) and min_dist (i.e. the parameter that dictates how tightly the algorithm packs the data points). The min_dist hyper-parameter, ranging from 0 to 1, selects the permitted minimum distance among the points in the target embeddings. Thus, the larger the value of min_dist, the broader the preservation of the manifold structure. Another hyper-parameter is the initial distance metric, which is Euclidean by default.

As pointed out by Becht et al. (2019), the parameter selection of UMAP is important to have useful embeddings for different purposes. Hence, we have conducted numerous experiments to find the optimal hyperparameters. In our experiments, we executed a grid-search algorithm that tests the n_components parameter with the values of 4, 5, 10, 25, 50 and 100. Furthermore, we have also sought the optimal min_dist parameter with values ranging from 0.15 to 1 with an interval of 0.05. Apart from these, the n_neighbors parameter is tested against values ranging from 5 to 100. Throughout the investigations, we also tested various metrics such as Euclidean, Manhattan, Chebyshev, Cosine, Jaccard and Dice.

According to the results of these experiments, we found out that our best classification results were obtained by setting n_components = 4, n_neighbors = 55, min_dist = 1 and metric = ‘Manhattan’. UMAP applies manifold learning in two different modes: (1) supervised and (2) unsupervised. To acquire more separable classes, we have preferred the supervised mode. In this way, UMAP could learn the topological structure of the classes and embed them into a more separable low-dimensional feature space by preserving the intra-class distances controlled by the aforementioned hyperparameters.

Table 8 shows the results after the application of UMAP. The findings from the experiments are listed below:

1. All classifiers become able to correctly classify their training samples.
2. Evaluation metrics such as accuracy have risen in different ratios (i.e. Fold 1: 63.25% → 89.95%, Fold 2: 84.24% → 89.27%, Fold 3: 82.82% → 84.95%). More importantly, almost all the classifiers perform equally well on test sets. We also computed the performance gain in terms of accuracy for each fold, and the gains range from 2.13% to 36.98%.
3. Along with these, we calculated the weighted averages of the accuracy gains for each classification algorithm. In that regard, the 4-dimensional embeddings created by UMAP made RF gain 12.93% on average. It is found that, for this particular problem and dataset, the performance of linear SVM and XGBoost increased by 21.83% and 20.78%, respectively.
4. Building an embedding transformer object takes 33 s overall on the test computer equipped with an Intel i7 8700 processor.
5. We have depicted the 2D projections of the train/test splits for each fold by presenting their original (raw) feature space and their corresponding new embeddings. Fig. 7 shows the visualizations of each fold in a separate column. The top two projections illustrate the training data (before and after UMAP), whereas the lower two projections present the test data before and after UMAP. In the upper part of the figure, the clear separation between the lower-dimensional embeddings of the two classes can be easily seen for each fold. If we investigate the lower part (i.e. test data) of the figure, we can see that the class separation improves for all the folds. More precisely, the orange-colored points (i.e. benign samples) moved away from the blue-colored points (i.e. malware samples) in a way that they build new clusters, although some remain as outliers.
6. To better explore the progress reflected in the classification results, we also present the new precision, recall and F1-scores for the benign and malware classes for each fold. The outcomes are provided in Table 9. According to the results, all the F1-scores, i.e. the harmonic means of the precision and recall values, have risen to a certain extent. Beyond this, it seems that UMAP not only improves
the scores for the minority class (i.e. benign) but also enables the samples of the majority class to reorganize so that they are mapped in a more concise and clustered fashion.

Table 8 – Classification performance of various ML methods on each fold after the application of UMAP-based supervised manifold learning (columns 3–6: training data; columns 7–11: testing data).

ML Algorithm Fold Accuracy Precision Recall F1 Accuracy Precision Recall F1 Accuracy Gain
Random Forest Fold 1 100% 1 1 1 89.95% 0.816 0.726 0.760 +27.63%
SVM (Linear) Fold 1 100% 1 1 1 89.95% 0.816 0.726 0.760 +26.70%
XGBoost Fold 1 100% 1 1 1 89.95% 0.816 0.726 0.760 +26.76%
Random Forest Fold 2 100% 1 1 1 89.27% 0.783 0.761 0.771 +5.03%
SVM (Linear) Fold 2 100% 1 1 1 89.27% 0.783 0.761 0.771 +36.98%
XGBoost Fold 2 100% 1 1 1 89.38% 0.786 0.761 0.772 +29.97%
Random Forest Fold 3 100% 1 1 1 84.95% 0.705 0.753 0.724 +2.13%
SVM (Linear) Fold 3 100% 1 1 1 84.95% 0.705 0.753 0.724 +7.55%
XGBoost Fold 3 100% 1 1 1 84.95% 0.705 0.753 0.724 +8.71%
Random Forest Weight. Avg. – – – – – – – – +12.93%
SVM (Linear) Weight. Avg. – – – – – – – – +21.83%
XGBoost Weight. Avg. – – – – – – – – +20.78%

Table 9 – Precision, recall and F1-scores of the predicted benign and malware classes for each fold after the application of UMAP.

Fold Class Precision Recall F1 Support
Fold 1 Benign 0.71 0.48 0.58 227
Fold 1 Malware 0.92 0.97 0.94 1376
Fold 2 Benign 0.64 0.58 0.60 130
Fold 2 Malware 0.93 0.95 0.94 784
Fold 3 Benign 0.48 0.62 0.54 220
Fold 3 Malware 0.93 0.89 0.91 1329

6. Discussion

Similar to the work of Dai et al. (2018), our proposed scheme mainly relies on memory dump files. The memory dump content that we deal with in this study is the full memory dump, which involves the data located both in the actual physical memory space and in the virtual memory space on the disk. The employment of memory dump files as the source of information presents two important advantages: (1) capturing the exposed binary content to analyze the process even if it is a packed or encrypted file and (2) detecting a new and sophisticated kind of malware, so-called fileless malware, which injects itself into the system memory without leaving any footprint on disk drives. On the other hand, obtaining memory dumps poses some challenges. First of all, it takes some time, the duration of which depends on different environmental parameters, and it requires the suspicious process to be started on the OS, which would put the system at risk. Therefore, many security software vendors prefer to employ virtual machines or sandboxes to cope with this risk. Nonetheless, this increases the total time to extract a memory dump and moves the system away from real-time operation. In order to overcome this problem, some approaches such as the use of Nvidia CUDA-enabled GPU hardware for memory dumping have been proposed by Korkin and Nesterow (2016). Attempts of this type to reduce the time to acquire dump files will lead solutions towards better, close to real-time detection. On the other hand, it is known that some smart malware types shut themselves down immediately when they detect that they are run in a sandbox. This situation brings about other challenges for memory dump based malware detection strategies that need efforts and solutions.

In this study, we have employed a new manifold learning based strategy (i.e. UMAP) to improve unknown malware detection performance, and the cross-validation based analysis shows that it really improves the accuracy score while enhancing the precision and recall rates of both the benign and malware classes. The current experiments show that UMAP does not create any bias on the classifiers we recreated. However, our dataset named Dumpware10 is limited to 10 malware classes, and we believe that our experimental setups require larger datasets to support this argument more strongly.

Another noteworthy aspect is the potential of deep learning methods that we can benefit from. As we highlighted in the previous sections, the well-known and widely used convolutional neural architecture named VGG16 does not contribute to the accuracy as we had expected. The authors of the study (Bozkir et al., 2019) conducted experiments on the Malevis dataset (The Malevis Dataset, 2019) covering 25 malware classes by utilizing several contemporary CNN architectures having different numbers of layers, such as AlexNet, ResNets, Inception and DenseNets. The results of the study demonstrate that there is no large-margin difference among the architectures used in the problem domain. As a result, the use of deep CNNs contributes to the problem by providing an end-to-end approach rather than hand-crafted feature extraction. Nonetheless, we believe that state-of-the-art strategies such as CNNs equipped with attention mechanisms may contribute to the problem domain when a large number of malware classes come into play. The rationale behind this argument results
from our observations on malware images and the ability of attention mechanisms in modern CNNs to focus on where to look in the images and automatically discover the most discriminative regions. The study of Chen (2018) explores the distinguishing parts of malware images via superpixels, a segmentation method for images, and shows that not every pixel contributes to the classification performance. Therefore, for better separation among the classes, future studies need to identify the essential and discriminative regions by discarding the unnecessary visual information, a problem that resembles the curse of dimensionality.

Fig. 7 – The UMAP-based 2D visualizations of train/test distributions in each fold that are obtained before and after dimension reduction. (M: Malware, B: Benign-ware). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

7. Conclusion

In this study, we have conducted a two-phase study to identify and detect malware and benign executables by utilizing two image descriptors, namely GIST and HOG, to analyze memory dump images. Moreover, in the second phase, we employed the state-of-the-art manifold learning technique named UMAP for the first time in the field to improve the robustness of classifiers, which are meant to better decide whether a suspicious process in a computer’s memory is malicious or not. The experiments conducted on the embeddings
obtained with UMAP show that it is an appropriate method that can be used in the domain to contribute both to the problem of class imbalance and to feature visualization. In this sense, we suggested an approach following volatile memory forensics to build a robust scheme against fileless malware. Thus, memory dumps of processes have been utilized as the main source of information. To support the study findings, we have prepared and published a publicly available dataset having instances for 10 malware families and also including benign samples. Moreover, we have explored the impact of various image rendering schemes and found that a 4096 px column width yields the best results. The results indicated that, when transforming memory dumps to initial images, larger columns (i.e. larger width values) result in better representations. Furthermore, the late fusion of GIST and HOG features yields the best outcome.

Consequently, the proposed approach, with 96.39% accuracy, shows promising results while also offering a reasonable detection time of 3.56 s. We believe that volatile memory forensics will gain more popularity in the near future due to the increase in fileless malware and similar threats.

As future work, to increase the accuracy and to obtain an end-to-end analysis platform, we plan on investigating the abilities of convolutional neural networks equipped with self-attention mechanisms to reveal more discriminative representations as well as to achieve better generalization capabilities. Another interesting idea that we plan to test is the inverse transformation feature of UMAP, which might enable us to generate synthetic image features analogous to the decoding of auto-encoders.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

CRediT authorship contribution statement

Ahmet Selman Bozkir: Conceptualization, Methodology, Software, Formal analysis, Writing - original draft. Ersan Tahillioglu: Data curation, Software. Murat Aydos: Project administration, Validation, Supervision, Writing - review & editing. Ilker Kara: Validation, Writing - review & editing.

References

Ali M, Jones MW, Xie X, Williams M. TimeCluster: dimension reduction applied to temporal data for visual analytics. Vis. Comput. 2019;35.
Becht E, McInnes L, Healy J, Dutertre C-A, Kwok I-W-H, Ng LG, Ginhoux F, Newell EW. Dimensionality reduction for visualizing single-cell data using UMAP. Nat. Biotechnol. 2019;37.
Belkin M, Niyogi P. In: Advances in neural information processing systems. Laplacian eigenmaps and spectral techniques for embedding and clustering; 2002.
Bozkir AS, Akcapinar Sezer E. In: 4th International Symposium on Digital Forensic and Security (ISDFS). Use of HOG Descriptors in Phishing Detection. Arkansas, USA; 2016.
Bozkir AS, Cankaya AO, Aydos M. In: Signal Processing and Applications Congress, SIU, Sivas. Utilization and Comparison of Convolutional Neural Networks in Malware Recognition; 2019.
Chen L. “Deep Transfer Learning for Static Malware Classification”, arXiv:1812.07606, 2018.
Cheng Y, Fan W, Huang W, An J. A Shellcode Detection Method Based on Full Native API Sequence and Support Vector Machine, vol. 242. IOP Publishing; 2017.
Coenen A, Pearce A. “Understanding UMAP”, https://pair-code.github.io/understanding-umap/, Technical Note (available online at 27.5.2020), 2019.
Coifman RR, Lafon S. Diffusion maps. Appl. Comput. Harmon. Anal. 2006;21(1).
Dai Y, Li H, Qian Y, Lu X. A malware classification method based on memory dump grayscale image. Digital Investigation 2018;27:30–7.
Dai Y, Li H, Qian Y, Yang R, Zheng M. SMASH: a Malware Detection Method Based on Multi-Feature Ensemble Learning. IEEE Access 2019;7:112588–97.
Dalal N, Triggs B. Histograms of oriented gradients for human detection, vol. 1; 2005. p. 886–93.
Eroglu E, Bozkir AS, Aydos M. Brand Recognition of Phishing Web Pages via Global Image Descriptors. Eur. J. Sci. Technol. 2019 Special Issue.
General documentation, Weka. (2019). [Online]. Available: https://www.cs.waikato.ac.nz/ml/weka/.
Gibert D, Mateu C, Planes J. The rise of machine learning for detection and classification of malware: research developments, trends and challenges. J. Network Comp. Appl. 2020;153.
Jackson JE. A User’s Guide to Principal Components, 587. John Wiley & Sons; 2005.
Kobak D, Linderman GC. UMAP does not preserve global structure any better than t-SNE when using the same initialization; 2019. bioRxiv preprint 2019.12.19.877522.
Korkin I, Nesterov I. In: The ADFSL Conference on Digital Forensics, Security and Law. Applying memory forensics to rootkit detection; 2015.
Korkin I, Nesterow I. In: 11th ADFSL Conference on Digital Forensics, Security and Law. Acceleration of Statistical Detection of Zero-day Malware in the Memory Dump Using Cuda-Enabled GPU Hardware; 2016.
Maaten Lvd, Hinton G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008;9.
Malware Statistics, the AV-TEST Institute. (2019). [Online]. Available: https://www.av-test.org/en/statistics/malware/.
McInnes L, Healy J, Melville J. “UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction”, arXiv preprint arXiv:1802.03426, 2018.
Microsoft Malware Classification Dataset, (2015). [Online]. Available: https://www.kaggle.com/c/malware-classification.
Nataraj L, Karthikeyan S, Jacob G, Manjunath BS. Malware images: visualization and automatic classification. Proceedings of the 8th international symposium on visualization for cyber security. ACM, 2011.
Nissim N, Lahav O, Cohen A, Elovici Y, Rokach L. Volatile memory analysis using the MinHash method for efficient and secured detection of malware in private cloud. Computers & Security 2019;87.
Oliva A, Torralba A. Modeling the shape of the scene: a holistic representation of the spatial envelope. Int. J. Comput. Vis. 2001;42:145–75.
OpenCV Tutorials, OpenCV. (2019). [Online]. Available: https://opencv.org/.
Or-Meir O, Nissim N, Elovici Y, Rokach L. Dynamic Malware Analysis in the Modern Era – A State of the Art Survey. ACM Comput. Surv. 2019;52(5).
Oujaoura M, Minaoui M, Fakir M, El Ayachi R, Bencharef O. Recognition of Isolated Printed Tifinagh Characters. Int. J. Comput. Appl. 2014;85.
ProcDump v9.0, Microsoft. (2019). [Online]. Available: https://docs.microsoft.com/en-us/sysinternals/downloads/procdump.
Process Explorer v16.31, Microsoft. (2019). [Online]. Available: https://docs.microsoft.com/en-us/sysinternals/downloads/process-explorer.
Process Monitor v3.53, Microsoft. (2019). [Online]. Available: https://docs.microsoft.com/en-us/sysinternals/downloads/procmon.
Pyleargist, (2019). [Online]. Available: https://pypi.org/project/pyleargist/.
Rezende E, Ruppert G, Carvalho T, Theophilo A, Ramos F, de Geus P. In: Advances in Intelligent Systems and Computing. Malicious Software Classification Using VGG16 Deep Neural Network’s Bottleneck Features; 2018.
Santos I, Devesa J, Brezo F, Nieves J, Bringas PG. In: International Joint Conference CISIS’12-ICEUTE 12-SOCO 12 Special Sessions. OPEM: A static-dynamic approach for machine-learning-based malware detection. Berlin, Heidelberg: Springer; 2013.
Shaid SZM, Maarof MA. In: 2014 International Symposium on Biometrics and Security Technologies (ISBAST). Malware behavior image for malware variant identification. IEEE; 2014.
Sharma PK, Raglin A. In: 17th IEEE International Conference on Machine Learning and Applications. Efficacy of Nonlinear Manifold Learning in Malware Image Pattern Analysis; 2018.
Sharma A, Sahay SK. Evolution and detection of polymorphic and metamorphic malwares: a survey. Int. J. Comput. Appl. 2014;90:7–11.
Shijo PV, Salim A. Integrated static and dynamic analysis for malware detection. Procedia Comput. Sci. 2015;46:804–11.
Sihwail R, Omar K, Zainol Ariffin KA, Al Afghani S. Malware detection approach based on artifacts in memory image and dynamic analysis. Appl. Sci. 2019;9(18):3680.
Tam K, Edwards N, Cavallaro L. In: Engineering Secure Software and Systems (ESSoS) Doctoral Symposium. Detecting Android malware using memory image forensics; 2015.
Tenenbaum JB, De Silva V, Langford JC. A global geometric framework for nonlinear dimensionality reduction. Science 2000;290(5500).
TDIMon, Mark Russinovich & Bryce Cogswell. (2019). [Online]. Available: https://sysinternals.d4rk4.ru/Utilities/TdiMon.html.
The Malevis Dataset, (2019). [Online]. Available: https://web.cs.hacettepe.edu.tr/~selman/malevis/.
Vasan D, Alazab M, Wassan S, Safaei B, Zheng Q. Image-Based Malware Classification using Ensemble of CNN Architectures (IMCEC). Computers & Security 2020;92.
Yajamanam S, Selvin VRS, Di Troia F, Stamp M. Deep learning versus gist descriptors for image-based malware classification. ICISSP 2018:553–61.
Yuan B, Wang J, Liu D, Gao W, Wu P, Bao X. Byte-level Malware Classification Based on Markov Images and Deep Learning. Computers & Security 2020;92.
Zhang H, Yao D(Daphne), Ramakrishnan N. Detection of stealthy malware activities with traffic causality and scalable triggering relation discovery. Proc. of the 9th ACM symposium on information, computer and communications security (ASIA CCS’14), 2014.
Zhang H, Yao DD, Ramakrishnan N. Causality-based Sensemaking of Network Traffic for Android Application Security. Proc. of the 2016 ACM Workshop on Artificial Intelligence and Security (AISec’2016), 2016.

AHMET S. BOZKIR was born in Muğla, Turkey, in 1983. He received a B.S. degree in computer engineering from Eskişehir Osmangazi University in 2002. Besides, he received M.Sc. and Ph.D. degrees in computer engineering from Hacettepe University in 2009 and 2016, respectively. He is currently working as a Research Assistant at Hacettepe University Multimedia Information Laboratory (HUMIR). He has more than 33 studies covering fields such as Information Security, Human-Computer Interaction, Information Retrieval, Machine Learning and Engineering Geology.

ERSAN TAHILLIOGLU works as an embedded software engineer at ASELSAN Corporation. He received his B.S. degree in computer engineering from Hacettepe University in 2016. He is currently an M.Sc. student in the same department. His research interests include topics such as computer security and machine learning. His current research specifically focuses on malware detection utilizing memory forensics and machine learning.

MURAT AYDOS. Dr. Murat Aydos received the B.Sc. degree from Yildiz Technical University (Turkey) in 1991, and the M.S. degree from the Electrical and Computer Engineering Department, Oklahoma State University (USA), in 1996. He completed his Ph.D. study at Oregon State University, Electrical Engineering and Computer Science Department, in June 2001. Dr. Aydos joined the Informatics Institute at Hacettepe University in April 2013. He is the Head of the Information Security Division at the Informatics Institute. Dr. Aydos is the author/co-author of more than 30 technical publications focusing on the applications of Cryptographic Primitives, Information & Data Security Mechanisms.

ILKER KARA has been an Assist. Prof. in the Department of Medical Services and Techniques, Eldivan Medical Services Vocational School at Çankırı Karatekin University since 2019. He has also been a part-time lecturer in the Computer Engineering Department of Hacettepe University since 2017. Kara completed his Ph.D. at Gazi University in 2015. His research interests cover digital investigation, malware analysis and internet security. He has actively collaborated with researchers from several other disciplines, such as computer science and forensics security in particular. He is currently working as the head of the Information Security Division at the Informatics Institute. Besides, Dr. Kara authored more than 20 technical publications focusing on the applications of Cyber Security, Malware Analysis; Data Security Mechanisms.
