Robust Alcode Detection

This document presents a new method for detecting malicious PDF files using image visualization and processing techniques. PDF files are first converted to grayscale images using existing visualization techniques. Image features representing the visual characteristics of benign and malicious PDF files are then extracted from the images. Finally, machine learning classifiers are trained on the extracted features to classify new PDF files as benign or malicious. The proposed method is evaluated on a dataset of real PDF malware and shown to be more robust against evasion techniques than existing learning-based PDF malware detection methods.


2019 2nd International Conference on Data Intelligence and Security (ICDIS)

Robust PDF Malware Detection with Image Visualization and Processing Techniques

Andrew Corum
Department of Computer Science
Calvin College
Grand Rapids, MI 49546
[email protected]

Donovan Jenkins, Jun Zheng
Department of Computer Science and Engineering
New Mexico Institute of Mining and Technology
Socorro, NM 87801
[email protected], [email protected]

978-1-7281-2080-5/19/$31.00 ©2019 IEEE
DOI 10.1109/ICDIS.2019.00024

Abstract—PDF, as one of the most popular document file formats, has been frequently utilized as a vector by attackers to convey malware due to its flexible file structure and the ability to embed different kinds of content. In this paper, we propose a new learning-based method to detect PDF malware using image visualization and processing techniques. The PDF files are first converted to grayscale images using image visualization techniques. Then various image features representing the distinct visual characteristics of PDF malware and benign PDF files are extracted. Finally, learning algorithms are applied to create the classification models to classify a new PDF file as malicious or benign. The performance of the proposed method was evaluated using the Contagio PDF malware dataset. The results show that the proposed method is a viable solution for PDF malware detection. It is also shown that the proposed method is more robust in resisting reverse mimicry attacks than the state-of-the-art learning-based method.

Index Terms—PDF malware, malware detection, image visualization, feature extraction

I. INTRODUCTION

PDF (Portable Document Format) is a file format invented by Adobe for presenting, exchanging and archiving documents that is independent of hardware, software, and operating systems. As one of the most used file formats, PDF documents have become one of the major vectors for malware attacks. This is mainly due to the flexibility of the PDF file structure and the ability to embed different kinds of content such as JavaScript code, encoded streams and image objects. These features can be exploited by attackers to embed malware in PDF files using tools like Metasploit [1], [2]. For example, it was reported that currently popular ransomware can be hidden inside PDF documents to launch attacks [3].

Various PDF malware detection techniques have been proposed to address the challenges of PDF malware attacks, including keyword-based techniques, tree-based techniques, code-based techniques and learning-based techniques [1]. An essential step for learning-based PDF malware detection techniques is feature extraction. Two popular kinds of features used in the current learning-based techniques are JavaScript-based features and structural features [1]. PJScan [4] and Lux0R [5] are two examples that use features related to the embedded JavaScript code to detect malicious behavior. Other techniques like PDF Slayer [6], PDFRate [7] and Hidost [8] rely on features extracted from the logical structure of PDF files to detect malicious behavior.

In this paper, we propose a learning-based PDF malware detection method with image visualization and processing, a completely different approach for feature extraction. Using the image visualization approaches proposed in [9] and [10], the PDF files are first converted to grayscale images. Then image features can be extracted to represent the distinct visual characteristics of PDF malware and benign PDF files.

The rest of this paper is organized as follows. In Section II, we present background information about the PDF file structure, PDF malware and attacks to evade learning-based PDF malware detection. An overview of the proposed PDF malware detection method with image visualization and processing is presented in Section III. The details of image visualization of PDF files and feature extraction using image processing are introduced in Sections IV and V, respectively. The performance evaluation results of the proposed method are provided in Section VI. Finally, conclusions are drawn in Section VII.

II. BACKGROUND

A. PDF File Structure

A PDF file contains at least four parts: a header, a body, a cross-reference table, and a trailer. The header contains information on the PDF version used to create the file. The body section contains all the objects the file needs. These objects hold text, images, formatting information, content location, scripts, and more. There are two types of objects: direct and indirect. Indirect objects are typically dictionaries and are referenced in the cross-reference table. These objects can call other indirect objects and direct objects. The cross-reference table simply lists the names and locations of all the indirect objects. The trailer section contains the name and location of the first object to be rendered (typically called "/Root").

B. PDF Malware

There are a number of ways to exploit the PDF file format due to its powerful capability of embedding different kinds of content. For example, a popular attack is to inject malicious JavaScript code into a PDF file using specific APIs. The malicious code can then be used to exploit certain vulnerabilities of the PDF reader. Attackers can also embed malicious Shockwave Flash (SWF) files and ActionScript code in a PDF file. This can be used to exploit the vulnerabilities of the Flash interpreter of the PDF reader. Early versions of PDF readers allowed the execution of arbitrary binary files specified in a PDF document, which could be exploited by remote attackers to launch malicious attacks, as reported in CVE-2010-1240.
C. Evasion of Learning-based PDF Malware Detection

To avoid detection by learning-based approaches, attackers can use various attacks to evade the classifiers. Two of these attacks are mimicry [5], [7] and reverse mimicry [11] attacks. Mimicry attacks inject benign PDF objects into a malicious PDF file to make the malicious file appear benign. To conduct a mimicry attack, the attacker is assumed to know the threshold that the classifier uses to distinguish a benign PDF file from a malicious one, and injects benign code until the threshold is crossed. In contrast, reverse mimicry attacks work by controlling the amount of malicious code injected into a benign PDF file. The attacker injects just enough malicious code to achieve the attack goal without crossing the classifier's threshold and being classified as malicious. Figure 1 illustrates the mimicry and reverse mimicry attacks.

Fig. 1: Illustration of the (a) mimicry and (b) reverse mimicry attacks. Each panel shows the benign and malicious regions separated by the decision boundary.

III. OVERVIEW OF PROPOSED METHOD

The proposed method for detecting PDF malware with image visualization and processing consists of the following steps, as shown in Figure 2:
• Image visualization: As the preprocessing step, the PDF files are converted to grayscale images using two image visualization techniques, byte plot [9] and Markov plot [10].
• Image feature extraction: To characterize the converted grayscale images, image features are extracted. This is an essential step in building the classification model that classifies PDF files as malicious or benign. In this study, we consider two kinds of image features, keypoint descriptors and texture features.
• Learning and classification: By representing each PDF file as a feature vector, a learning algorithm is applied to create a classification model which can be used to classify new PDF files as malicious or benign. Three popular learning algorithms, Random Forests (RF), Decision Trees (DT) and K-Nearest Neighbor (KNN), are adopted in our study.

Fig. 2: Proposed PDF malware detection method with image visualization and processing

IV. IMAGE VISUALIZATION OF PDF FILES

In this section, we introduce how to visualize PDF files as grayscale images using byte plot [9] and Markov plot [10].

A. Byte Plot

Byte plot is a simple and effective method proposed in [9] to visualize malware binaries as grayscale images. To create a byte plot for a PDF file, the PDF binary is read byte by byte sequentially. Each byte is written directly as a pixel in a grayscale image, with possible values ranging from 0 to 255. The bytes are placed on the image following scan lines, from left to right. To determine the image size, a method from [9] is used in which the image width is determined by the size of the PDF file, as shown in Table I. The height of the image is variable and also depends on the PDF file size. Figure 3 shows an example of a byte plot image created from a PDF file.

TABLE I: Image Width vs. PDF File Size [9]

PDF File Size Range    Image Width
<10 kB                 32
10 kB – 30 kB          64
30 kB – 60 kB          128
60 kB – 100 kB         256
100 kB – 200 kB        384
200 kB – 500 kB        512
500 kB – 1000 kB       768
>1000 kB               1024

Fig. 3: An example of a byte plot of a PDF file

B. Markov Plot

The Markov plot is an alternative way to visualize a PDF file as a grayscale image [10]. A Markov plot is created by representing the byte stream of a PDF file as the state transition matrix of a 256-state Markov chain. The state space of the Markov chain is the set of possible values of a byte, $S = \{0, 1, \ldots, 255\}$. The state transition matrix $P$ of the Markov chain is given by $\{P_{i,j} \mid 0 \le i \le 255, 0 \le j \le 255\}$. $P_{i,j}$ is the transition probability of a byte value $i$ to a byte value $j$ in the PDF byte stream, which is calculated as shown in Equation (1).

$$P_{i,j} = \frac{w_{i,j}}{\sum_{k=0}^{255} w_{i,k}} \tag{1}$$

where $w_{i,j}$ is the number of transitions from byte value $i$ to byte value $j$ in the PDF byte stream.

To display it as a grayscale bitmap image, the state transition matrix $P$ is scaled as $I = \frac{255}{\max(P)} P$. Note that the image size of the Markov plot is the same (256 × 256) for all PDF files. An example of a PDF Markov plot is shown in Figure 4.

Fig. 4: An example of a Markov plot of a PDF file

V. FEATURE EXTRACTION USING IMAGE PROCESSING

After preprocessing the PDF files using visualization techniques, we extracted features from the images, which are used as input to build the classification models. In this study, we consider two kinds of image features: keypoint descriptors and texture features.
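The byte plot construction of Section IV-A can be illustrated with a short sketch. It is not the authors' implementation: the function names are hypothetical, 1 kB is assumed to be 1024 bytes, the handling of sizes that fall exactly on a boundary in Table I is a guess, and the last row is zero-padded so that the image is rectangular.

```python
import math

# Width lookup from Table I: (upper size limit in kB, image width in pixels).
# How the boundary sizes (exactly 10 kB, 30 kB, ...) are binned is an assumption.
WIDTH_TABLE = [(10, 32), (30, 64), (60, 128), (100, 256),
               (200, 384), (500, 512), (1000, 768)]
MAX_WIDTH = 1024  # used for files of 1000 kB and above


def image_width(num_bytes: int) -> int:
    """Pick the byte plot width for a file of the given size (1 kB = 1024 bytes assumed)."""
    size_kb = num_bytes / 1024
    for upper_kb, width in WIDTH_TABLE:
        if size_kb < upper_kb:
            return width
    return MAX_WIDTH


def byte_plot(data: bytes) -> list:
    """Lay the bytes out row by row, left to right, one byte per grayscale pixel."""
    width = image_width(len(data))
    height = math.ceil(len(data) / width)
    # Zero-pad the final row so the image is rectangular (padding is an assumption).
    padded = data + bytes(width * height - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


if __name__ == "__main__":
    sample = b"%PDF-1.4\n" + bytes(range(256)) * 2  # stand-in for a PDF's raw bytes
    image = byte_plot(sample)
    print(len(image), "rows x", len(image[0]), "columns")
```

The nested list can be handed to any imaging library as an 8-bit grayscale array; only the width rule and the scan-line layout come from the text.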

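Equation (1) and the scaling step of the Markov plot in Section IV-B can be sketched in the same way. The function name is hypothetical, and two details are assumptions not stated in the text: rows for byte values that never occur are left at zero, and scaled values are rounded to the nearest integer.

```python
def markov_plot(data: bytes) -> list:
    """Return a 256 x 256 grayscale image built from byte-to-byte transitions."""
    # w[i][j] counts how often byte value i is immediately followed by byte value j.
    w = [[0] * 256 for _ in range(256)]
    for cur, nxt in zip(data, data[1:]):
        w[cur][nxt] += 1

    # Equation (1): normalise each row by its total count to obtain P[i][j].
    # Rows for byte values that never occur stay at zero (an assumption).
    p = []
    for row in w:
        total = sum(row)
        p.append([count / total for count in row] if total else [0.0] * 256)

    # Scale so that the largest probability maps to 255: I = 255 / max(P) * P.
    p_max = max(max(row) for row in p)
    if p_max == 0:  # empty or single-byte input gives an all-black image
        return [[0] * 256 for _ in range(256)]
    return [[round(255 * value / p_max) for value in row] for row in p]
```

Because the output is always 256 × 256, files of any size map to feature images of identical shape, which is the property the text highlights.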
A. Keypoint Descriptors

Keypoint descriptors have been used for object detection and classification. We studied two widely-used keypoint descriptors: Scale Invariant Feature Transform (SIFT) [12] and Oriented FAST and Rotated BRIEF (ORB) [13].

1) SIFT: The SIFT descriptor was developed by Lowe [12] and is widely used for feature extraction in computer vision applications. SIFT can be broken down into four main steps: detection of scale-space extrema, keypoint localization, keypoint orientation, and keypoint description.

First, SIFT detects scale-space extrema (local maxima and minima) within the image. These extrema are the starting points for finding the image's keypoints. The SIFT algorithm employs the Difference of Gaussians (DoG) as a close approximation of the Laplacian of Gaussian (LoG) to detect the local extrema over scale and space. The potential keypoints found in the first step are then thresholded with a threshold of 0.03 [12] to remove low-contrast ones. Since DoG often flags edges as keypoints while we are only interested in corners, these unwanted keypoints are removed by setting a threshold for the curvature at an extremum, which is calculated using the Hessian matrix.

After the array of keypoints has been trimmed down, the SIFT algorithm assigns an orientation to each keypoint. This is done by applying a histogram of oriented gradients with 36 bins. Finally, the keypoint descriptors can be created using a 16 × 16 neighborhood around each keypoint. This neighborhood is divided into 16 blocks, and in each block another orientation histogram with 8 bins is taken. Hence a total of 128 bins (16 blocks × 8 bins) are available as a descriptor for each keypoint. After this entire process is complete, we are left with a feature vector of size NumKeypoints × 128. Figure 5 is an example of keypoints that SIFT identified on the byte plot representation of a PDF.

Fig. 5: Keypoints identified by SIFT on a byte plot

2) ORB: The computation of SIFT is slow due to its complexity and descriptor vector length. ORB provides a fast alternative to SIFT [13], which employs the Oriented Features from Accelerated Segment Test (FAST) keypoint detector and the Rotated Binary Robust Independent Elementary Features (BRIEF) descriptor. Because SIFT is significantly slower to compute and is similar to ORB, we only included ORB in the performance evaluation.

FAST was proposed in [14] to find corner keypoints efficiently. ORB applies FAST with a few modifications. To detect keypoints, ORB uses a radius of 9 rather than 3 around the center pixel c. ORB also applies FAST on a multi-scale pyramid of the image and sorts the keypoints using the Harris corner measure [15]. The keypoints are then thresholded to obtain the targeted number of n points. In our study, n is set to 32. The FAST method does not return any information on corner orientation, so ORB uses the location of the intensity centroid to find the corner's orientation. The intensity centroid is defined in [16] as

$$C = \left( \frac{m_{1,0}}{m_{0,0}}, \frac{m_{0,1}}{m_{0,0}} \right) \tag{2}$$

where the moment $m_{i,j}$ is

$$m_{i,j} = \sum_{p} p_x^i \, p_y^j \, I(p) \tag{3}$$

for each pixel $p \in P$ in the corner. Finally, the orientation of the corner is $\theta = \operatorname{atan2}(m_{0,1}, m_{1,0})$.

The BRIEF descriptor was proposed in [17]. It is a fast and efficient binary descriptor, but it lacks rotation invariance. BRIEF works by taking a smoothed patch of the image, $p$. Then this patch is tested at n different paired keypoints found by the FAST detector around the image. Each pair of points is tested using Equation (4), where $I(a)$ is the intensity of point $a$ in the smoothed patch $p$.

$$\tau(p, a, b) = \begin{cases} 1, & I(a) < I(b) \\ 0, & I(a) \ge I(b) \end{cases} \tag{4}$$

The BRIEF feature vector, $f_n$, is made up of the n pairs of points tested in this way, as shown in Equation (5).

$$f_n(p) = \tau(p, a_0, b_0), \tau(p, a_1, b_1), \ldots, \tau(p, a_{n-1}, b_{n-1}) \tag{5}$$

However, in order to account for rotated keypoints, ORB uses a rotated BRIEF (rBRIEF) descriptor [13]. The rBRIEF descriptor is $g_n(p, \theta) = f_n(p)$ where the paired locations $a, b \in S_\theta$. The set $S_\theta = R_\theta S$, where $S$ is the matrix of paired locations,

$$S = \begin{pmatrix} a_0 & a_1 & \cdots & a_{n-1} \\ b_0 & b_1 & \cdots & b_{n-1} \end{pmatrix},$$

and $R_\theta$ is the rotation matrix,

$$R_\theta = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}.$$

Note that ORB uses n = 256 bits, or 32 bytes, as the size of the BRIEF feature vector, and the keypoint rotation, $\theta$, is only in discrete multiples of $2\pi/30$.

B. Texture Features

Texture features have been widely used to characterize images for various computer vision and pattern recognition applications including image segmentation, biomedical image analysis and object recognition. In this study, we consider three methods for extracting texture features from images: local binary patterns (LBP), local entropy, and Gabor filters.
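The orientation step of ORB in Section V-A, Equations (2) and (3) followed by the atan2 formula, can be worked through on a small patch. This is a minimal sketch under assumptions: the function name is hypothetical, the patch is a square list of rows with odd side length, and coordinates are measured from the patch centre.

```python
import math


def patch_orientation(patch: list) -> float:
    """Angle in radians from the patch centre towards its intensity centroid."""
    half = len(patch) // 2  # assumes a square patch with odd side length
    m00 = m10 = m01 = 0.0
    for row_idx, row in enumerate(patch):
        for col_idx, intensity in enumerate(row):
            x, y = col_idx - half, row_idx - half  # coordinates relative to the centre
            m00 += intensity       # m_{0,0}
            m10 += x * intensity   # m_{1,0}
            m01 += y * intensity   # m_{0,1}
    if m00 == 0:
        return 0.0  # all-black patch: orientation is undefined, 0 is an arbitrary choice
    # The centroid is C = (m10 / m00, m01 / m00); the angle only needs m01 and m10.
    return math.atan2(m01, m10)
```

A single bright pixel to the right of the centre gives an angle of 0, and one directly below it gives π/2, since image rows grow downward.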

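The binary test of Equation (4), the descriptor of Equation (5) and the rotation of test locations by $R_\theta$ described for ORB in Section V-A can be sketched as follows. The function names are hypothetical, points are (x, y) tuples indexed as patch[y][x], and rounding rotated locations to the nearest pixel is an illustrative choice rather than how ORB implements it. The caller has to keep rotated points inside the patch.

```python
import math


def brief_descriptor(patch, pairs):
    """Equations (4)-(5): one bit per point pair, 1 when I(a) < I(b), else 0."""
    return [1 if patch[ay][ax] < patch[by][bx] else 0
            for (ax, ay), (bx, by) in pairs]


def rotate_pairs(pairs, theta, center):
    """Apply the rotation matrix R_theta to every test location about `center`."""
    cx, cy = center
    cos_t, sin_t = math.cos(theta), math.sin(theta)

    def rotate(point):
        x, y = point[0] - cx, point[1] - cy
        # Nearest-pixel rounding is an illustrative choice.
        return (round(cx + x * cos_t - y * sin_t),
                round(cy + x * sin_t + y * cos_t))

    return [(rotate(a), rotate(b)) for a, b in pairs]


if __name__ == "__main__":
    patch = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    pairs = [((0, 0), (2, 2)), ((2, 2), (0, 0))]
    steered = rotate_pairs(pairs, math.pi, (1, 1))
    print(brief_descriptor(patch, pairs), brief_descriptor(patch, steered))
```

Rotating the sampling pattern instead of the image is what makes the descriptor cheap to steer: the patch is read once at the rotated coordinates.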
1) LBP: LBP is a typical method used to extract texture features from images [18]. LBP looks at p pixels on the boundary of a bilinear circle with a radius of r around each pixel of the image. LBP compares the intensity of the center pixel c, $I(c)$, with the intensity of each outer pixel k (k = 0, 1, ..., p − 1), $I(k)$. Using the Heaviside step function, LBP checks if the outer pixels have higher or lower intensity than the center pixel. Equation (6) combines each of these comparisons into a single grayscale value, representing the local texture at the center pixel. Once each pixel of the image has been converted in this way, a 64-bin histogram of the image can be made. Each bin of the histogram represents the occurrences of a group of similar local binary textures found throughout the image. These bins are then used as features to build the classification model. In this study, we use a kernel of radius r = 32 and p = 64 pixels.

$$LBP_{r,p}(c) = \sum_{k=0}^{p-1} H(I(k) - I(c)) \cdot 2^k \tag{6}$$

$$H(n) = \begin{cases} 1, & n \ge 0 \\ 0, & n < 0 \end{cases} \tag{7}$$

2) Local Entropy: Local entropy is a method proposed in [19] for texture segmentation and thresholding. For a center pixel c, a disk of radius 5 and area A is constructed. The probability assigned to a pixel k with the intensity value $I(k)$, relative to the local neighborhood, is calculated using Equation (8). This probability is then used in a modified version of Shannon's entropy $H(c)$, shown in Equation (9). $H(c)$ is computed for every pixel in the image, and these values are compiled into a histogram (similar to what was done for LBP in Section V-B1). In this study, we used a histogram with 16 bins, which results in a feature vector of size 16.

$$P_k = \frac{I(k)}{A} \tag{8}$$

$$H(c) = -\sum_{k=0}^{A-1} P_k \log_2 P_k \tag{9}$$

3) Gabor Filter: Gabor filters have been used for feature extraction, texture segmentation, and object recognition. Interestingly, these filters behave similarly to the mammalian visual cortex, hence they can be used to recognize textures in an image [20]. In this study, we applied a bank of 24 different Gabor kernels with four different orientations ($\theta = \{0, \frac{\pi}{8}, \frac{\pi}{4}, \frac{3\pi}{8}\}$), three different standard deviations ($\sigma = \{1, 2, 3\}$), and two different frequencies ($f = c\lambda^{-1} = \{0.05, 0.25\}$). These values define the Gabor kernel, $G_{\lambda,\theta,\sigma}$, as shown in Equation (10).

$$G_{\lambda,\theta,\sigma}(x, y) = e^{-(x'^2 + \gamma^2 y'^2)/2\sigma^2} \cos\left(2\pi \frac{x'}{\lambda} + \phi\right) \tag{10}$$

where

$$x' = x\cos\theta + y\sin\theta, \quad y' = -x\sin\theta + y\cos\theta, \quad \gamma = 0.5, \quad \phi = 0.$$

After convolving an image with the Gabor kernels, different features can be extracted [20]. In our study, we extracted two features: global mean ($\mu = \frac{\sum_{N} I(k)}{N}$ for all N pixels in the image) and variance ($\sigma^2 = \frac{\sum_{N} (I(k) - \mu)^2}{N}$). Using these two measures per filtered image (24 × 2 = 48 values), and an additional 16-bin intensity histogram, a feature vector of size 64 was obtained in our study.

VI. PERFORMANCE EVALUATION

The dataset used for performance evaluation was obtained from Contagio, which contains 10,980 malicious and 9,000 benign PDF files [21]. We split the Contagio dataset into two partitions. The first partition consists of 6,000 benign and 6,000 malicious PDF files. This partition was used for testing the performance of different combinations of plot type, image features and classifiers. The second partition consists of the remaining 3,000 benign and 4,980 malicious PDF files, which was used to compare the proposed approach with popular antivirus scanners. The performance metric used for evaluation is the F1 score, which is calculated using Equation (11), where Precision and Recall are calculated with Equations (12) and (13), respectively.

$$F_1 = 2 \times \frac{Precision \times Recall}{Precision + Recall} \tag{11}$$

$$Precision = \frac{\text{true positives}}{\text{true positives} + \text{false positives}} \tag{12}$$

$$Recall = \frac{\text{true positives}}{\text{true positives} + \text{false negatives}} \tag{13}$$

With the first partition of the dataset, we evaluated the performance of different plot types, image features and classifiers using 10-fold cross-validation. Tables II and III show the results for byte plot and Markov plot, respectively. There are several observations from the results. Byte plot is a better choice than Markov plot, as it outperforms the latter for every combination of feature and classifier. It can be seen that keypoint descriptors are not suitable for our task, as their performance is much worse than that of texture features. The best method is byte plot with local entropy and RF, which achieves an F1 score of 0.9924.

TABLE II: 10-fold cross validation results of byte plot

       ORB      LBP      Gabor    Local Entropy
RF     0.8816   0.9869   0.9918   0.9924
DT     0.8821   0.9758   0.9855   0.9874
KNN    0.9188   0.9825   0.9900   0.9851

After training the two best-performing methods for byte plot and for Markov plot with the first partition of the dataset, we compared their performance with some popular antivirus scanners using the second partition of the dataset.
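Equations (6) and (7) for LBP in Section V-B can be illustrated on a tiny image. This is a simplified sketch, not the authors' configuration: the function names are hypothetical, neighbours are sampled at the nearest pixel and clamped at the border (the text describes a bilinear circle), and the defaults r = 1, p = 8 are chosen so the codes stay small, whereas the study uses r = 32 and p = 64. The equal-width binning into 64 bins is also an assumption.

```python
import math


def lbp_code(img, cx, cy, r=1, p=8):
    """Equation (6) at pixel (cx, cy), sampling p points on a circle of radius r."""
    height, width = len(img), len(img[0])
    code = 0
    for k in range(p):
        angle = 2 * math.pi * k / p
        # Nearest-pixel sampling clamped to the image border (a simplification;
        # the text describes a bilinear circle).
        x = min(max(round(cx + r * math.cos(angle)), 0), width - 1)
        y = min(max(round(cy - r * math.sin(angle)), 0), height - 1)
        if img[y][x] >= img[cy][cx]:  # Heaviside step H(I(k) - I(c)), Equation (7)
            code += 1 << k            # weight 2^k
    return code


def lbp_histogram(img, r=1, p=8, bins=64):
    """Histogram of LBP codes over the whole image, using equal-width bins."""
    hist = [0] * bins
    for cy in range(len(img)):
        for cx in range(len(img[0])):
            code = lbp_code(img, cx, cy, r, p)
            hist[(code * bins) >> p] += 1  # maps [0, 2^p) onto [0, bins)
    return hist
```

The histogram, rather than the per-pixel codes, is what serves as the feature vector, so images of different sizes still yield vectors of the same length.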

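The Gabor bank of Section V-B, Equation (10) with the listed orientations, standard deviations and frequencies, can be generated with a few lines. This is a sketch under assumptions: the function names are hypothetical, the constant c in f = cλ⁻¹ is taken to be 1 so that λ = 1/f, and the kernel extent of three standard deviations is an arbitrary choice. The convolution step itself is omitted.

```python
import math


def gabor_kernel(lam, theta, sigma, gamma=0.5, phi=0.0, half_size=None):
    """Sample Equation (10) on a square grid centred on the origin."""
    if half_size is None:
        half_size = int(3 * sigma)  # kernel extent is an assumption
    kernel = []
    for y in range(-half_size, half_size + 1):
        row = []
        for x in range(-half_size, half_size + 1):
            x_rot = x * math.cos(theta) + y * math.sin(theta)
            y_rot = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(x_rot ** 2 + gamma ** 2 * y_rot ** 2) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * x_rot / lam + phi))
        kernel.append(row)
    return kernel


def mean_and_variance(image):
    """Global mean and variance over all pixels of a (filtered) image."""
    pixels = [value for row in image for value in row]
    n = len(pixels)
    mu = sum(pixels) / n
    return mu, sum((value - mu) ** 2 for value in pixels) / n


# The bank from the text: 4 orientations x 3 standard deviations x 2 frequencies.
# lambda = 1 / f assumes c = 1.
bank = [gabor_kernel(1 / f, theta, sigma)
        for theta in (0, math.pi / 8, math.pi / 4, 3 * math.pi / 8)
        for sigma in (1, 2, 3)
        for f in (0.05, 0.25)]
```

Applying `mean_and_variance` to each of the 24 filtered images gives the 48 values that, together with the 16-bin intensity histogram, make up the 64-element feature vector described in the text.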
TABLE III: 10-fold cross validation results of Markov plot

       ORB      LBP      Gabor    Local Entropy
RF     0.8584   0.9766   0.9854   0.9399
DT     0.7022   0.9615   0.9696   0.9025
KNN    0.4477   0.9752   0.9400   0.8987

The performance of those antivirus scanners on the second partition of the dataset was obtained through VirusTotal [22], an online virus scan service with over 70 antivirus scanners. Table IV shows the detection results of our methods and the antivirus scanners. Our best method, byte plot with Gabor filter and RF, achieves an F1 score of 0.9948. It outperforms popular antivirus scanners such as Symantec (0.9898), Microsoft (0.9733) and nProtect (0.8887). Some other scanners like McAfee, AVG and Avast achieve slightly better performance than our methods. The results demonstrate that the proposed method is a viable solution for PDF malware detection, especially considering that the Contagio dataset has been available for 7 years, which is plenty of time for it to make its way into the databases of those antivirus scanners.

TABLE IV: Performance comparison of the proposed method with popular antivirus scanners

Method                             F1 score
McAfee                             0.9979
AVG                                0.9964
Avast                              0.9959
Byte plot + Gabor + RF             0.9948
Byte plot + Local Entropy + RF     0.9937
Markov plot + Gabor + RF           0.9912
Symantec                           0.9898
Markov plot + LBP + RF             0.9736
Microsoft                          0.9733
nProtect                           0.8887

We also tested our method against reverse mimicry attacks. We obtained a publicly available dataset of 1,500 reverse mimicry attacks from [23]. The dataset contains the injections of three different types of content: embedded executable, embedded JavaScript, and embedded PDF file. Each type of malicious content was injected into 500 benign PDF files. We compare our method with PDF Slayer [6], which is a state-of-the-art machine learning-based approach for PDF malware detection. Both methods were trained with the full Contagio dataset and tested with the reverse mimicry dataset. The results in terms of the number of detected samples are shown in Table V. Although both methods struggle at detecting reverse mimicry attacks, our method is much more robust than PDF Slayer against the attacks.

TABLE V: Detection results of the proposed method and PDF Slayer against reverse mimicry attacks

Method                             EXE Embedding   JS Embedding   PDF Embedding
Byte plot + Gabor + RF             95              176            111
Byte plot + Local entropy + RF     70              162            85
PDF Slayer                         52              30             3

VII. CONCLUSIONS

PDF is one of the most widely-used document file formats nowadays. It has a flexible file structure and can embed different kinds of content such as JavaScript code, encoded streams and image objects. This also makes it a primary target of malware attacks. In this paper, we propose a new learning-based PDF malware detection method which uses image visualization and processing techniques. The PDF files are converted to grayscale images using two image visualization techniques, byte plot and Markov plot. Image features including keypoint descriptors and texture features are then extracted to represent the distinct visual characteristics of the images. The proposed method was evaluated using a PDF malware dataset from Contagio. Our preliminary results show that the proposed method achieves good performance compared with popular antivirus scanners, which demonstrates that the method is a viable solution for PDF malware detection. As learning-based methods are susceptible to evasion attacks, we tested the proposed method against reverse mimicry attacks. It can be seen from the results that the proposed method is more robust than the state-of-the-art learning-based method in resisting reverse mimicry attacks.

In the future, the performance of the proposed method can be further improved by investigating more image features such as wavelet-based texture features [24]. Feature selection can also be applied to select a subset of relevant image features from both keypoint descriptors and texture features to improve the performance. Advanced machine learning algorithms such as deep learning models [25] could also be investigated in future studies.

ACKNOWLEDGMENT

This work was supported by the National Science Foundation under award number CNS-1757945.

REFERENCES

[1] D. Maiorca and B. Biggio, “Digital investigation of PDF files: unveiling traces of embedded malware,” https://arxiv.org/abs/1707.05102.
[2] J. Park and H. Kim, “K-depth mimicry attack to secretly embed shellcode to PDF files,” in: Proceedings of International Conference on Information Science and Applications (ICISA 2017), 2017, pp. 388–395.
[3] https://www.cybersecurity-insiders.com/cyber-attack-with-ransomware-hidden-inside-pdf-documents/
[4] P. Laskov and N. Srndic, “Static detection of malicious JavaScript-bearing PDF documents,” in: Proceedings of the 27th Annual Computer Security Applications Conference (ACSAC 2011), 2011, pp. 373–382.
[5] I. Corona, D. Maiorca, D. Ariu, and G. Giacinto, “Lux0R: Detection of malicious PDF-embedded Javascript code through discriminant analysis of API references,” in: Proceedings of the 7th Workshop on Artificial Intelligence and Security (AISec 2014), 2014, pp. 47–57.
[6] D. Maiorca, G. Giacinto, and I. Corona, “A pattern recognition system for malicious pdf files detection,” in: Proceedings of the 8th International Conference on Machine Learning and Data Mining in Pattern Recognition, 2012, pp. 510–524.
[7] C. Smutz and A. Stavrou, “Malicious PDF detection using meta-data and structural features,” in: Proceedings of the 28th Annual Computer Security Applications Conference (ACSAC 2012), 2012, pp. 239–248.
[8] N. Srndic and P. Laskov, “Hidost: A static machine-learning based detector of malicious files,” EURASIP Journal on Information Security, Dec. 2016, 2016:22.
[9] L. Nataraj, S. Karthikeyan, G. Jacob, B. S. Manjunath, “Malware images: Visualization and automatic classification,” in: Proceedings of the 8th International Symposium on Visualization for Cyber Security, VizSec ’11, 2011, pp. 4:1–4:7.
[10] K. Kancherla, “Image processing techniques for malware detection,” Ph.D. thesis, 2015.
[11] D. Maiorca, I. Corona, and G. Giacinto, “Looking at the bag is not enough to find the bomb: An evasion of structural methods for malicious PDF files detection,” in: Proceedings of the 8th ACM SIGSAC Symposium on Information, Computer and Communications Security, 2013.
[12] D. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, vol. 60, no. 2, 2004, pp. 91–110.
[13] E. Rublee, V. Rabaud, K. Konolige, and G. Bradski, “ORB: an efficient alternative to SIFT or SURF,” in: Proceedings of the 2011 International Conference on Computer Vision (ICCV ’11), 2011, pp. 2564–2571.
[14] E. Rosten and T. Drummond, “Machine learning for high-speed corner detection,” in: Proceedings of the European Conference on Computer Vision, 2006.
[15] C. Harris and M. Stephens, “A combined corner and edge detector,” in: Proceedings of Fourth Alvey Vision Conference, 1988, pp. 147–151.
[16] P. Rosin, “Measuring corner properties,” Computer Vision and Image Understanding, vol. 73, no. 2, 1999, pp. 291–307.
[17] M. Calonder, V. Lepetit, C. Strecha, and P. Fua, “BRIEF: binary robust independent elementary features,” in: Proceedings of Computer Vision – ECCV 2010, pp. 778–792.
[18] M. Pietikainen, A. Hadid, G. Zhao, and T. Ahonen, “Local binary patterns for still images,” in: Computer Vision Using Local Binary Patterns, Springer, 2011, pp. 13–47.
[19] C. Yan, N. Sang, and T. Zhang, “Local entropy-based transition region extraction and thresholding,” Pattern Recognition Letters, vol. 24, no. 16, 2003, pp. 2935–2941.
[20] Y. Cho, S. Bae, Y. Jin, K. M. Irick, and V. Narayanan, “Exploring gabor filter implementations for visual cortex modeling on FPGA,” in: Proceedings of the 21st International Conference on Field Programmable Logic and Applications, 2011, pp. 311–316.
[21] Contagio, 2013, http://contagiodump.blogspot.com/2013/03/16800-clean-and-11960-malicious-files.html
[22] Virus Total, http://www.virustotal.com/
[23] https://pralab.diee.unica.it/en/pdf-reverse-mimicry/
[24] Y. Huang, V. Bortoli, F. Zhou, and J. Gilles, “Review of wavelet-based unsupervised texture segmentation, advantage of adaptive wavelets,” IET Image Processing, vol. 12, no. 9, 2018, pp. 1626–1638.
[25] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” Communications of the ACM, vol. 60, no. 6, Jun. 2017, pp. 84–90.

No ratings yet
Malware Application Detection Using Machine Learning
8 pages
Malware Detection Using Machine Leaning
No ratings yet
Malware Detection Using Machine Leaning
9 pages
Malicious PDF Files Detecting and Analyzing
No ratings yet
Malicious PDF Files Detecting and Analyzing
26 pages
Malware Detection Using Frequency Domain-Based Image Visualization and Deep Learning
No ratings yet
Malware Detection Using Frequency Domain-Based Image Visualization and Deep Learning
10 pages
Malware Detection Using Machine Learning
No ratings yet
Malware Detection Using Machine Learning
11 pages
PDF Analysis System Using Yara
No ratings yet
PDF Analysis System Using Yara
9 pages
1742747318200
No ratings yet
1742747318200
37 pages
Final Synopsis To Submit
No ratings yet
Final Synopsis To Submit
10 pages
Malware Detection Using Machine Learning and Deep Learning
No ratings yet
Malware Detection Using Machine Learning and Deep Learning
10 pages
A - Multi-Strategy - Adversarial - Attack - Method - For - Deep - Learning - Based - Malware - Detectors
No ratings yet
A - Multi-Strategy - Adversarial - Attack - Method - For - Deep - Learning - Based - Malware - Detectors
5 pages
2406.03831v1
No ratings yet
2406.03831v1
8 pages
First Review B19
No ratings yet
First Review B19
24 pages
ULBP-RF: A Hybrid Approach For Malware Image Classification: Abstract
No ratings yet
ULBP-RF: A Hybrid Approach For Malware Image Classification: Abstract
5 pages
IET Information Security - 2020 - Ghouti - Malware Classification Using Compact Image Features and Multiclass Support
No ratings yet
IET Information Security - 2020 - Ghouti - Malware Classification Using Compact Image Features and Multiclass Support
11 pages
Automated Classification and Analysis of Internet Malware: (Mibailey, Jonojono, Janderse, Zmao, Farnam) @umich - Edu
No ratings yet
Automated Classification and Analysis of Internet Malware: (Mibailey, Jonojono, Janderse, Zmao, Farnam) @umich - Edu
20 pages
Dissertation Proposal Template
No ratings yet
Dissertation Proposal Template
9 pages
Malware Detection Using Convolutional Neural Network, A Deep Learning Framework: Comparative Analysis
No ratings yet
Malware Detection Using Convolutional Neural Network, A Deep Learning Framework: Comparative Analysis
14 pages
3
No ratings yet
3
3 pages
Adversarial ML PDF
No ratings yet
Adversarial ML PDF
1 page
Suscipit Consequat Eros Non Porttitor. Sed A: Government MCA College, Ahmedabad
No ratings yet
Suscipit Consequat Eros Non Porttitor. Sed A: Government MCA College, Ahmedabad
1 page
Mini Project
No ratings yet
Mini Project
11 pages
Image Collection Exploration: Unveiling Visual Landscapes in Computer Vision
From Everand
Image Collection Exploration: Unveiling Visual Landscapes in Computer Vision
Fouad Sabry
No ratings yet
Practical Guide to Vegetable Oil Processing 2nd Edition Monoj K. Gupta download pdf
100% (1)
Practical Guide to Vegetable Oil Processing 2nd Edition Monoj K. Gupta download pdf
65 pages
Mind Map Physical Chem
100% (1)
Mind Map Physical Chem
1 page
Nursing Practice Ii - Questions
No ratings yet
Nursing Practice Ii - Questions
17 pages
Chapter 5: Analytical Thinking and Creative Thinking
No ratings yet
Chapter 5: Analytical Thinking and Creative Thinking
7 pages
Cable Modem Motorola SBV5121 - User Guide
100% (2)
Cable Modem Motorola SBV5121 - User Guide
58 pages
Sparepart S 35 MC
100% (1)
Sparepart S 35 MC
12 pages
My Knight The Lioness
100% (1)
My Knight The Lioness
221 pages
Dynasand 1
No ratings yet
Dynasand 1
2 pages
SSINA - Designer's Handbook - Stainless Steel Fasteners
No ratings yet
SSINA - Designer's Handbook - Stainless Steel Fasteners
23 pages
Midterm English 9
No ratings yet
Midterm English 9
3 pages
SKKL 2 Nov
No ratings yet
SKKL 2 Nov
44 pages
Download Complete Speech and Language Processing An Introduction to Natural Language Processing Computational Linguistics and Speech Recognition Second Edition Daniel Jurafsky PDF for All Chapters
No ratings yet
Download Complete Speech and Language Processing An Introduction to Natural Language Processing Computational Linguistics and Speech Recognition Second Edition Daniel Jurafsky PDF for All Chapters
29 pages
Rapt 12
No ratings yet
Rapt 12
16 pages
Electrical Circuits and Internal Resistance (MCQ Only)
100% (1)
Electrical Circuits and Internal Resistance (MCQ Only)
12 pages
PRO Ex 30: Exterior Sheen Emilsion
No ratings yet
PRO Ex 30: Exterior Sheen Emilsion
2 pages
Week 3-1 Recipe Management
0% (1)
Week 3-1 Recipe Management
19 pages
Elna (Radial Thru-Hole) RJJ Series
No ratings yet
Elna (Radial Thru-Hole) RJJ Series
3 pages
Private Cars Add On Covers
No ratings yet
Private Cars Add On Covers
3 pages
Geometry m2 Topic B Lesson 8 Teacher
No ratings yet
Geometry m2 Topic B Lesson 8 Teacher
15 pages
USG For Critical Care PDF
No ratings yet
USG For Critical Care PDF
13 pages
Traffic Engineering: Lecturer: Dr. Nasreen A. Hussein
No ratings yet
Traffic Engineering: Lecturer: Dr. Nasreen A. Hussein
17 pages
Order Granting Plaintiffs' Fourth Motion To Enforce Settlement Agreement
No ratings yet
Order Granting Plaintiffs' Fourth Motion To Enforce Settlement Agreement
16 pages
Blasting & Painting Report
No ratings yet
Blasting & Painting Report
2 pages
0201 Intact Stability Essem 2 (Rev-01) PDF
No ratings yet
0201 Intact Stability Essem 2 (Rev-01) PDF
92 pages
Magic: Exclusive Interview World Showcase
100% (3)
Magic: Exclusive Interview World Showcase
63 pages
Augmented Reality and Learning in Science Museums: Susan A. Yoon, Joyce Wang and Karen Elinich
No ratings yet
Augmented Reality and Learning in Science Museums: Susan A. Yoon, Joyce Wang and Karen Elinich
13 pages
Tutorial On Molecular Visualisation: Version: 17th January 2012
No ratings yet
Tutorial On Molecular Visualisation: Version: 17th January 2012
11 pages
LESSON 2 - TRANSMUTATION - Louise Peralta - 11 - Fairness
No ratings yet
LESSON 2 - TRANSMUTATION - Louise Peralta - 11 - Fairness
2 pages
Question Bank Sem 4
100% (1)
Question Bank Sem 4
3 pages

978-1-7281-2080-5/19/$31.00 ©2019 IEEE 108

TABLE I: Image Width vs. PDF File Size [9]

Fig. 4: An example of a Markov plot of a PDF file

Fig. 3: An example of a byte plot of a PDF file

The BRIEF feature vector, f_n, is made up of each n pairs of

[12] D. Lowe, “Distinctive image features from scale-invariant keypoints,”
