

Malicious PDF Files Detection Using Structural and Javascript Based Features

Conference Paper · May 2017

DOI: 10.1007/978-981-10-6544-6_14



Sonal Dabral¹, Amit Agarwal², Manish Mahajan³, Sachin Kumar⁴

¹,³ Computer Science & Engineering, Graphic Era University, Dehradun
² Computer Science & Engineering, Indian Institute of Technology Roorkee
⁴ Centre for Transportation Systems, Indian Institute of Technology Roorkee
{sonaldabral26, amitagrawal1909, manish.mhajn, sachinagnihotri16}@gmail.com

Abstract. Malicious PDF files have recently come to be considered one of the most
dangerous threats to system security. The flexible, code-bearing nature of the PDF
format enables attackers to execute malicious code on a computer system in order to
exploit its user. Security vendors have developed many solutions to protect users'
systems, but these remain inadequate. In this paper, we propose a machine learning
method for detecting malicious PDF files. The proposed method extracts features from
the PDF file structure and from embedded JavaScript code, leveraging an advanced
parsing mechanism. Instead of looking for a specific attack inside the content of the
PDF, which is a complex procedure, we extract features that are often used in attacks.
Moreover, we present experimental evidence for the choice of learning algorithm,
which provides remarkably high accuracy compared with other existing methods.
Keywords: Machine learning, PDF, JavaScript, Malware.

1 Introduction

The Portable Document Format (PDF) is an electronic document format released in 1993
by Adobe Systems Inc. that allows documents to be published and exchanged [1]. PDF is
now very popular because it is the preferred means of exchanging documents between
organizations and individuals such as students and professionals. Due to its high
popularity, flexible structure and versatile functionality, it has become a popular
vector for distributing malware, used in attacks ranging from the server side to the
client side. Attackers' interest has recently shifted from server-side to client-side
attacks, because these give them a good opportunity to exploit client applications
(e.g. PDF readers) that are not up to date. The goal is to take advantage of users' lack
of security knowledge by fooling them into opening a malicious PDF document with
applications found on most users' computers [2].

One of the most popular client applications for reading and exchanging documents is Adobe Reader, and attackers may
exploit specific vulnerabilities of the reader application. In addition to exploiting the PDF reader's vulnerabilities,
attackers also take advantage of many advanced PDF features, such as /Launch, which can automatically run an embedded
script to manage OS-specific events, or /GoTo and /URI, which can automatically open remote resources on the internet
[3]. Attackers often use JavaScript code to divert the usual execution flow to malicious code, which can be done with
techniques such as buffer overflow, heap spraying and Return-Oriented Programming (ROP) [4]. To bypass detection,
attackers mainly use advanced encryption techniques so that they can easily hide the malicious code or embedded files
in the PDF [1].
Recent academic work on malicious PDF file detection falls into two categories: dynamic and static. The first group
detects malicious JavaScript code within PDF files using dynamic analysis, static analysis, or both [5][6][7]. The
second group consists of structure-based approaches that detect malicious PDFs through static analysis [4][9]. The
advantage of structural methods over JavaScript analysis is that they can detect non-JavaScript attacks and are not
affected by code obfuscation, because they do not focus on analyzing the content itself. However, further research
showed that attackers can evade such systems through deliberate attacks [10]. Therefore, work has focused again on the
detection of malicious JavaScript code [11].
This paper proposes a machine learning method for malicious PDF file detection in which we combine a PDF structure
feature vector with a JavaScript feature vector, extracted from the PDF file structure and from the JavaScript embedded
in the PDF file, respectively. The set of PDF structure features includes general characteristics of the PDF structure
as well as dynamic characteristics expressed through keywords such as /JavaScript, /OpenAction and /URI; the JavaScript
features are obtained from the JavaScript code in the PDF file. Because recent research shows that the vast majority of
PDF-related vulnerabilities rely on JavaScript, we also analyze the JavaScript code inside the PDF file. However, instead
of looking for a specific attack inside the JavaScript code, we extract features of the code that can be used to conduct
an attack through JavaScript. The extraction is carried out efficiently with the PDF analysis tool Origami, which
overcomes the parsing weaknesses present in prior work and provides significant features to the classifier for effective
detection of malicious PDF files. We evaluated different ensemble machine learning techniques to choose the classifier
for our experiment; a good choice of ensemble classifier gives a significant improvement in malicious PDF file detection.

1.1 The PDF File Structure

A PDF file is a hierarchical structure of objects that are logically connected to each other. The file structure
determines how objects are stored in the file and how they are accessed and updated [1]. A PDF file consists of four
parts, shown in Fig. 1:
• Header: the version number of the PDF specification used by the file.
• Body: the largest part of the file, which contains all the PDF objects, including the data or information shown to
the user.
• Cross-reference table (CRT): gives the position of every indirect object; each object is represented by one entry in
the table.
• Trailer: gives the location of the CRT and information about the root object.
Fig. 1. An example of PDF file structure
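The four parts listed above can be seen in a minimal file. The following Python sketch uses a hypothetical helper, `build_minimal_pdf`, which is not part of the authors' system. It writes a one-page skeleton by hand and records the byte offset of each object so that the cross-reference table and the trailer point to the right places. The object contents are illustrative assumptions, and the result is a structural skeleton rather than a complete, reader-tested document.

```python
"""Write a minimal PDF by hand to show header, body, xref table and trailer.

Illustrative sketch only (hypothetical helper, not the authors' tooling).
"""


def build_minimal_pdf() -> bytes:
    # Header: the PDF version used by the file.
    header = b"%PDF-1.7\n"

    # Body: indirect objects written as "N 0 obj ... endobj".
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] >>",
    ]

    out = bytearray(header)
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset later recorded in the xref table
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"

    # Cross-reference table: free entry 0, then one 20-byte entry per object.
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for off in offsets:
        out += b"%010d 00000 n \n" % off

    # Trailer: names the root object and points back at the xref table.
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)
```

A reader starts at the end of the file: `startxref` gives the offset of the table, and each table entry gives the offset of one object in the body.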

2 Related Work

The increased prevalence of malicious documents has generated interest in techniques for the malware analysis of such
documents over the years. Previous research has focused on two methods for malicious document detection: static and
dynamic. Li et al. and Shafiq et al. [12, 13] present methods for detecting embedded malcode in Word documents through
static analysis using n-grams, and a novel dynamic run-time test was also introduced; however, these approaches remain
limited by the size of the malcode. In particular, this work was not designed for PDF files; it focuses on other file
formats such as DOC and EXE. Detection can be evaded with modern obfuscation methods such as AES encryption [1], and
vulnerabilities can be exploited with techniques such as heap spraying and Return-Oriented Programming (ROP) [4]. These
exploitation techniques are carried out with JavaScript code embedded in the PDF file, which is why researchers have
mainly targeted the JavaScript code in PDF files.
Laskov and Šrndić [6] developed PJScan, a tool based on static analysis that detects malicious PDF documents through
lexical analysis of JavaScript code. They used a machine learning approach, a One-Class Support Vector Machine, to
automatically generate models from the available data for classifying test data. However, this approach showed a lower
detection rate and cannot analyze obfuscated code that behaves maliciously only at execution time. To overcome this
limitation, Snow et al. [14] proposed ShellOS, which uses dynamic analysis to detect code injection attacks at run time.
It relies on hardware virtualization, which provides fast and precise analysis of code and also enables the detection of
obfuscated code.
Moreover, Tzermias et al. [7] demonstrated that antivirus systems are not very effective at detecting malicious PDF
documents. To build a more reliable detection system, they combined static and dynamic analysis and introduced MDScan,
a standalone malicious PDF file scanner that focuses specifically on vulnerabilities. Schmitt et al. [15] adopted a
similar approach and presented PDF Scrutinizer, a tool that detects current malicious PDF files with a low false-positive
rate; it mainly focuses on JavaScript-based attacks.
Dynamic analysis of JavaScript code can be computationally expensive and complex. To reduce cost and increase speed,
research focused again on static analysis.
Maiorca et al. [4] introduced PDF Malware Slayer (PDFMS), a static tool that analyzes the structure of PDF files through
keywords and their occurrences. They tested Naive Bayes, SVM, J48 and Random Forest classifiers, and Random Forest
provided the highest accuracy. However, the approach has some structural weaknesses.
Analyzing the structure of the PDF instead of looking for specific content provided a higher detection rate. However,
more recent work by Maiorca et al. [10] showed that such detectors can be bypassed because of the complexity of the
parsing mechanism. Because of these structural weaknesses, work focused again on the analysis of malicious JavaScript
code. Corona et al. [11] presented Lux0R ("Lux 0n discriminant References"), a new approach to malicious JavaScript
detection that characterizes JavaScript code by its API references. Liu et al. [16] introduced a context-aware approach
for detecting malicious JavaScript in PDF files based on static document instrumentation and run-time behavior
monitoring.

3 Materials and Methods

In this section, we describe a machine learning method based on static analysis in which we combine the PDF structure
feature vector with the JavaScript feature vector, extracted from the PDF file structure and from the JavaScript code
embedded in the PDF file, respectively. Our system architecture is shown in Fig. 2.
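Combining the two vectors requires a fixed column order, so that each position means the same thing for every file, and a default value for files that contain no JavaScript. The sketch below assumes the parser returns two dictionaries of feature counts; the function name `combine` and the exact feature lists (drawn from the features named in Section 3.2) are illustrative assumptions, not the authors' code.

```python
"""Concatenate structure and JavaScript features into one fixed-order vector.

Sketch under assumptions: the dict inputs stand in for the parser output
(hypothetical interface); feature names follow Section 3.2.
"""
from typing import Dict, List

# The column order must be identical for every file so that a classifier
# always sees the same feature in the same position.
STRUCTURE_FEATURES = ["/JS", "/JavaScript", "/GoTo", "/GoToR", "/GoToE",
                      "/OpenAction", "/Launch", "/SubmitForm"]
JS_FEATURES = ["eval_length", "max_string", "stringcount", "replace",
               "substring", "eval", "fromCharCode", "setTimeout",
               "document.write", "document.createElement"]


def combine(structure: Dict[str, float], js: Dict[str, float]) -> List[float]:
    """Structure features first, then JavaScript features; missing ones become 0."""
    return ([float(structure.get(name, 0)) for name in STRUCTURE_FEATURES]
            + [float(js.get(name, 0)) for name in JS_FEATURES])
```

Filling absent JavaScript features with zeros keeps files without scripts in the same vector space as files with them, which is what allows a single classifier to be trained on the combined dataset.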

3.1 Dataset Used

We collected a dataset of malicious and benign PDF files from real, up-to-date samples: 4807 malicious files and 3745
benign files. The malicious samples were collected from Contagiodump [9], a popular repository that contains information
about trending vulnerabilities and attacks in PDF files. The benign samples were collected through the Yahoo search
engine API. Files collected from websites come with no assurance that they are not malicious, and malicious files in the
benign dataset would distort the results of the experiments. To reduce this risk, the whole benign dataset was scanned
with antivirus software.
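When reading the accuracy figures in Section 4, it helps to keep the class balance of this dataset in mind. With 4807 malicious and 3745 benign files, a trivial classifier that labels every file as malicious would already reach the following accuracy; this majority-class baseline is a derived reference point, not an experimental result:

```latex
N = 4807 + 3745 = 8552, \qquad
\text{baseline accuracy} = \frac{4807}{8552} \approx 0.562
```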

3.2 Feature Extraction

To extract features, we developed a parser that builds on the Origami tool. This tool performs a deep scan of PDF files
to extract the features that attackers most often use to hide malicious behavior. We adopted it because it extracts
features more reliably than alternatives such as PdfID [17], which analyzes the PDF file without considering its logical
structure and may therefore let attackers evade detection with simple manipulations.
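One way a purely lexical scan can be evaded is through PDF name escapes: since PDF 1.2, any character in a name may be written as `#` followed by two hexadecimal digits, so `/J#61vaScript` denotes the same name as `/JavaScript`. The hypothetical functions below contrast a plain substring count with a count over decoded names. This is a minimal sketch: it ignores string and stream context, which a full parser such as Origami handles, and it is not a description of how PdfID itself behaves.

```python
import re

# A PDF name may encode any byte as "#" followed by two hex digits (PDF 1.2+),
# so /J#61vaScript and /JavaScript denote the same name.
_HEX_ESCAPE = re.compile(rb"#([0-9A-Fa-f]{2})")
# A name starts with "/" and runs until whitespace or a PDF delimiter.
_NAME = re.compile(rb"/[^\s()<>\[\]{}/%]*")


def decode_name(raw: bytes) -> bytes:
    """Replace every #xx escape in a PDF name with the byte it encodes."""
    return _HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)


def naive_count(data: bytes, keyword: bytes) -> int:
    """Plain substring search, as a purely lexical scanner might do."""
    return data.count(keyword)


def normalized_count(data: bytes, keyword: bytes) -> int:
    """Tokenize names, decode their escapes, then compare whole names."""
    return sum(1 for name in _NAME.findall(data) if decode_name(name) == keyword)
```

For example, on `b"<< /S /J#61vaScript /JS (app.alert(1)) >>"` the naive count of `/JavaScript` is 0, while the normalized count is 1.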

Fig. 2. Architecture of our system. [Figure: malicious and benign PDF files → parser (Origami) → feature extractor (structure analysis and EJSA*) → feature vector → classifier (ensemble methods: AdaBoostM1, Bagging, Stacking).]
* EJSA: Embedded JavaScript Analysis

For the extraction of features, we analyze each PDF file in the following two ways:

1) Structure Analysis

In this phase, the parser analyzes the structure of the PDF file and searches for features that are significant for
labeling a PDF file as malicious. This yields the set of features and their number of occurrences.
Based on previous research [18, 19], the following features are often suspicious and are frequently used by attackers:
• JavaScript: JavaScript code can be embedded directly into an object within the PDF. Most malicious PDFs use
JavaScript to exploit JavaScript vulnerabilities or to create heap sprays. The /JS and /JavaScript keywords indicate
the use of JavaScript in the PDF.
• Actions: A number of keys, such as /GoTo, /GoToR and /GoToE, specify an action to be performed, for example
activating a hypertext link.
• Triggers: Attackers can use a number of different triggers to execute harmful content within the document. A common
triggering mechanism is an action specified by the /OpenAction key in the root object of the PDF file; the object
that /OpenAction points to may be part of the attack.
• Launch: A /Launch action can start an application or open or print a document, to manage OS-specific events.
Attackers may misuse this feature to steal an organization's confidential data whenever a user opens the suspicious
PDF file.
• Form Action: PDF readers allow the /SubmitForm action, which sends data from the client to a server. To take
advantage of weaknesses in the victim's browser, this action can send a request to a malicious site whose response
is automatically shown in the victim's browser and can cause it to malfunction.
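The structural features above amount to counting occurrences of specific name objects. The following sketch, a hypothetical `structural_features` function whose keyword list is taken from the bullets above, counts whole names outside streams and also inside streams that inflate with zlib, since keywords can be hidden in Flate-compressed streams. Matching whole names keeps `/GoTo` from also counting `/GoToR`. It does not decode `#xx` name escapes, other stream filters, or encryption, all of which a real parser must handle.

```python
import re
import zlib
from collections import Counter
from typing import Dict

KEYWORDS = ["/JS", "/JavaScript", "/GoTo", "/GoToR", "/GoToE",
            "/OpenAction", "/Launch", "/SubmitForm"]

# A name runs until whitespace or a PDF delimiter, so /GoTo never matches /GoToR.
_NAME = re.compile(rb"/[^\s()<>\[\]{}/%]*")
# Stream body between "stream" and "endstream" (the lookbehind skips "endstream").
_STREAM = re.compile(rb"(?<!end)stream\r?\n(.*?)\r?\nendstream", re.S)


def structural_features(data: bytes) -> Dict[str, int]:
    counts: Counter = Counter()
    # 1) Names outside stream bodies (raw stream bytes are blanked out first).
    counts.update(_NAME.findall(_STREAM.sub(b"stream\nendstream", data)))
    # 2) Names inside streams that decompress as zlib/Flate data.
    for match in _STREAM.finditer(data):
        try:
            inflated = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate-compressed or corrupt: skipped in this sketch
        counts.update(_NAME.findall(inflated))
    return {key: counts[key.encode()] for key in KEYWORDS}
```

Blanking stream bodies before the first pass avoids counting random byte sequences in compressed data as names.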

2) JavaScript Code Analysis

Our parser extracts the objects containing JavaScript from the body of the PDF file. It then extracts the embedded
JavaScript code and searches it for features that are often associated with carrying out an attack. Based on previous
studies [6, 19, 20], our system uses the following set of features:
• eval_length: the length of the longest string passed to an eval() call; malicious scripts use eval() to interpret
code dynamically.
• max_string: the length of the longest string. Strings that malware writers use for shellcode are very long compared
with the strings used in legitimate JavaScript.
• stringcount: the number of strings defined in the script. To obfuscate a script, malware writers break strings into
many small pieces.
• replace: the number of uses of the JavaScript replace() function, which is often used to obfuscate code in malicious
scripts.
• substring: the number of uses of the JavaScript substring() function, which is mostly used to obfuscate code.
• eval: the number of uses of the JavaScript eval() function, which malicious scripts use to interpret JavaScript code
dynamically.
• fromCharCode: use of String.fromCharCode(), which converts Unicode values to characters and is mostly used to
obfuscate code.
• setTimeout(): can be used in place of eval() to run arbitrary JavaScript code after a given timeout.
• document.write and document.createElement: indicate dynamic code execution.
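Several of these features can be approximated lexically. The sketch below, a hypothetical `js_features` function, counts call sites with regular expressions and measures quoted string literals. It is an approximation under stated assumptions: comments, regular-expression literals and template strings are not handled, and `eval_length` only measures a string literal passed directly to `eval()`, not a variable argument such as `eval(s)`.

```python
import re
from typing import Dict

# Single- or double-quoted string literals, allowing backslash escapes.
_STRING = re.compile(r'"(?:[^"\\\n]|\\.)*"' + r"|'(?:[^'\\\n]|\\.)*'")
# eval() called directly with a string literal as its first argument.
_EVAL_LITERAL = re.compile(r"\beval\s*\(\s*(" + _STRING.pattern + r")")


def _calls(code: str, name: str) -> int:
    """Count call sites such as name( ... ), allowing spaces before '('."""
    return len(re.findall(r"\b" + re.escape(name) + r"\s*\(", code))


def js_features(code: str) -> Dict[str, int]:
    strings = [s[1:-1] for s in _STRING.findall(code)]  # strip the quotes
    eval_args = [len(s) - 2 for s in _EVAL_LITERAL.findall(code)]
    return {
        "eval_length": max(eval_args, default=0),
        "max_string": max((len(s) for s in strings), default=0),
        "stringcount": len(strings),
        "replace": _calls(code, "replace"),
        "substring": _calls(code, "substring"),
        "eval": _calls(code, "eval"),
        "fromCharCode": _calls(code, "fromCharCode"),
        "setTimeout": _calls(code, "setTimeout"),
        "document.write": _calls(code, "document.write"),
        "document.createElement": _calls(code, "document.createElement"),
    }
```

A lexical approach like this is cheap, but an obfuscated script can hide calls behind computed property names, which is one reason the features are used as statistical signals rather than exact signatures.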

3.3 Classification

To classify PDF files, the extracted features are fed to a classifier, which can be built with any learning algorithm.
However, previous researchers have combined the predictions of multiple learners to produce better results than any
individual learning algorithm could produce [8]. Accordingly, we tested the ensemble methods Adaptive Boosting
(AdaBoostM1), Bagging and Stacking [8]. These methods combine several base classifiers into a stronger and more precise
classifier: boosting by weighted voting over models trained in sequence, bagging by voting over models trained on
bootstrap samples, and stacking by training a meta-classifier on the predictions of the base models. As the weak model
we use a simple decision tree (J48; supervised learning approach, Quinlan, 1996), because an ensemble of trees is more
robust than a single tree. In addition, we provide extensive experimental evidence on which ensemble method improves
the accuracy on our dataset.
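For the boosting case, the weighting mentioned above can be written out. The formulation below is the standard AdaBoost.M1 algorithm of Freund and Schapire, on which Weka's AdaBoostM1 is based; it is given as background rather than as a detail reported in this paper. At round t the weak learner h_t (here a J48 tree) is trained on instance weights w_i^(t) that sum to one (initially 1/N); correctly classified instances are down-weighted by β_t, Z_t renormalizes the weights, and boosting stops if ε_t ≥ 1/2.

```latex
\begin{aligned}
\varepsilon_t &= \sum_{i:\,h_t(x_i)\neq y_i} w_i^{(t)}, \qquad
\beta_t = \frac{\varepsilon_t}{1-\varepsilon_t}, \\
w_i^{(t+1)} &= \frac{w_i^{(t)}\,\beta_t^{\mathbf{1}[h_t(x_i)=y_i]}}{Z_t}, \\
H(x) &= \arg\max_{y} \sum_{t:\,h_t(x)=y} \log\frac{1}{\beta_t}.
\end{aligned}
```

Trees with lower weighted error receive a larger vote log(1/β_t) in the final classifier H, which is the "particular weight" that distinguishes boosting from the equal-weight voting of bagging.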
4 Results and Discussion

In this section, we present two experiments. The first demonstrates the feature extraction process. The second provides
experimental evidence as to which classification method improves detection accuracy: first, only the PDF structure
features were run through different classifiers; then we measured how much the accuracy improved when the JavaScript
features were combined with the structure features. Furthermore, we compare the performance of the proposed method with
previously developed tools for malicious PDF detection.

4.1 Experiment 1: Feature Extraction

The goal of this experiment is to extract the feature vector from each PDF file. The Origami tool performs a deep scan
of PDF files to extract the features that attackers often use. After scanning each PDF file in the malicious and benign
datasets, we obtained the results shown in Fig. 3.

Fig. 3. Structure-based feature extraction results

After extracting the structure feature vector, we found that a large number of the malicious PDF files used JavaScript to
perform malicious actions: around 92.3% of the malicious samples in our dataset contained JavaScript. We therefore
performed JavaScript feature extraction with the Origami tool. The results are shown in Fig. 4.

Fig. 4. JavaScript-based feature extraction results

4.2 Experiment 2: Detection accuracy

Our test was conducted with AdaBoostM1 (as the boosting ensemble), Bagging (as the bagging ensemble) and Stacking with
two learning algorithms (J48 and IBk, with Logistic Regression as the meta-classifier), using 10-fold cross-validation
repeated 10 times. We report our results in terms of confusion matrices (the numbers of benign and malicious files that
were classified correctly and incorrectly).
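This evaluation protocol can be approximated outside Weka. The sketch assumes scikit-learn as a substitute: `DecisionTreeClassifier` (CART) with `min_samples_leaf=2` stands in for J48 (C4.5, whose default minimum leaf size is 2), `KNeighborsClassifier(n_neighbors=1)` stands in for IBk with its default k = 1, and both single-model ensembles use 10 members, mirroring Weka's default of 10 iterations. The helper names `build_ensembles` and `evaluate` are hypothetical, and this substitute will not reproduce the paper's numbers.

```python
"""Three ensembles evaluated with repeated stratified 10-fold cross-validation.

Assumption: scikit-learn replaces Weka; CART replaces J48 and k-NN replaces IBk.
"""
from typing import Dict

import numpy as np
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def build_ensembles(seed: int = 0) -> Dict[str, object]:
    # Stand-in for J48; each ensemble clones it, so sharing the instance is safe.
    tree = DecisionTreeClassifier(min_samples_leaf=2, random_state=seed)
    return {
        # Boosting: trees trained in sequence on reweighted data.
        "AdaBoostM1": AdaBoostClassifier(tree, n_estimators=10, random_state=seed),
        # Bagging: trees trained on bootstrap samples, combined by voting.
        "Bagging": BaggingClassifier(tree, n_estimators=10, random_state=seed),
        # Stacking: tree and 1-NN base learners, logistic regression on top.
        "Stacking": StackingClassifier(
            estimators=[("tree", tree), ("knn", KNeighborsClassifier(n_neighbors=1))],
            final_estimator=LogisticRegression(max_iter=1000),
        ),
    }


def evaluate(X: np.ndarray, y: np.ndarray, n_repeats: int = 10,
             seed: int = 0) -> Dict[str, float]:
    """Mean accuracy of each ensemble over 10-fold CV repeated n_repeats times."""
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=n_repeats, random_state=seed)
    return {
        name: float(cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean())
        for name, model in build_ensembles(seed).items()
    }
```

With an `(n_files, n_features)` matrix from the extraction step, `evaluate(X, y)` returns the mean accuracy of each ensemble over the resulting 100 folds.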
First, the structure feature vector dataset was run through the different classifiers, which gives the following results:

Table 1. Results with structure features

                                           AdaBoostM1   Bagging    Stacking
True positives (malicious as malicious)    4498         4471       4493
False negatives (malicious as benign)      309          336        314
True negatives (benign as benign)          2990         2936       2976
False positives (benign as malicious)      755          809        767
TP rate (weighted average)                 0.876        0.866      0.873
FP rate (weighted average)                 0.141        0.152      0.144
ROC area                                   0.945        0.934      0.940
Detection accuracy                         87.5584%     86.6113%   87.3363%

We then tested how well the complete feature vector dataset (structure features and JavaScript features) performed at
the classification task. The results are shown in Table 2.

Table 2. Results with complete features (structure features and JavaScript-based features)

                                           AdaBoostM1   Bagging    Stacking
True positives (malicious as malicious)    4753         4742       4744
False negatives (malicious as benign)      54           65         63
True negatives (benign as benign)          3666         3603       3670
False positives (benign as malicious)      79           142        75
TP rate (weighted average)                 0.984        0.976      0.984
FP rate (weighted average)                 0.017        0.0287     0.017
ROC area                                   0.998        0.993      0.995
Detection accuracy                         98.4448%     97.5795%   98.3863%

As Table 2 shows, combining the structure feature vector with the JavaScript feature vector gives better detection
accuracy than the structure features alone; for AdaBoostM1, accuracy rises from 87.5584% to 98.4448%.
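The summary rows of Tables 1 and 2 can be recomputed from the four counts in each column. In the AdaBoostM1 column of Table 1, for example, 4498 + 309 = 4807 malicious files and 2990 + 755 = 3745 benign files. Assuming that the TP and FP rates are class-size-weighted averages over both classes (the convention of Weka's summary output), the hypothetical `summarize` function below reproduces the reported values. Under this convention the weighted TP rate always equals the accuracy.

```python
"""Recompute accuracy and weighted TP/FP rates from confusion-matrix counts.

Assumption: rates are averaged over both classes, weighted by class size.
"""
from typing import Dict


def summarize(mal_ok: int, mal_as_benign: int,
              ben_ok: int, ben_as_mal: int) -> Dict[str, float]:
    n_mal = mal_ok + mal_as_benign
    n_ben = ben_ok + ben_as_mal
    n = n_mal + n_ben
    accuracy = (mal_ok + ben_ok) / n
    # Per-class FP rate: how often the *other* class is mistaken for this one.
    fpr_mal = ben_as_mal / n_ben
    fpr_ben = mal_as_benign / n_mal
    # Size-weighted averages over the two classes.
    weighted_tpr = (n_mal * (mal_ok / n_mal) + n_ben * (ben_ok / n_ben)) / n
    weighted_fpr = (n_mal * fpr_mal + n_ben * fpr_ben) / n
    return {"accuracy": accuracy, "weighted_tpr": weighted_tpr,
            "weighted_fpr": weighted_fpr}
```

For the AdaBoostM1 column of Table 1, `summarize(4498, 309, 2990, 755)` gives an accuracy of about 0.8756 and a weighted FP rate of about 0.141, matching the table.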
To evaluate the proposed method, we compare it with previously developed tools for malicious PDF detection: Wepawet,
PDFMS, PJScan, MDScan and PDF Scrutinizer. The results are shown in Table 3; for each method, we report the true
positive rate (TPR) and the false positive rate (FPR). Our system achieves a higher TPR than Wepawet, PJScan, MDScan and
PDF Scrutinizer.

Table 3. Comparison between the proposed method and other tools

System             TPR      FPR
Proposed method    0.984    0.017
Wepawet            0.8892   0.032
PJScan             0.7194   0.011
MDScan             0.8934   0
PDFMS              0.9955   0.0251
PDF Scrutinizer    0.9      0

PJScan, MDScan and PDF Scrutinizer show the smallest FPR, but their detection rates are much lower than those of the
other tools. PDFMS shows the highest TPR but a higher FPR than the proposed method. The proposed method also outperforms
Wepawet in terms of both TPR and FPR. Overall, these results indicate that the proposed method offers a better trade-off
between TPR and FPR than the other tools.
5 Conclusions

In the past few years, malicious PDF files have become one of the most critical threats, providing a very effective
attack vector for malware writers. In this paper, we proposed a machine learning method for malicious PDF file
detection. Instead of relying only on the structural properties of the PDF file, we also used JavaScript-based features
to improve detection accuracy. In addition, we provided experimental evidence as to which learning algorithm improves
detection accuracy. Finally, we compared our method with other academic tools, and its high detection accuracy
indicates that it compares favorably with them.

References

[1] Adobe, "PDF Reference, Adobe Portable Document Format Version 1.7", 2006.
[2] Symantec, "Malware security report: protecting your business, customers, and the bottom line", Symantec (2010).
[3] Filiol, E., Blonce, A. and Frayssignes, L. Portable document format (PDF) security analysis and malware threats.
J. Comput. Virol., pp. 75-86 (2007).
[4] Maiorca, D., Giacinto, G. and Corona, I. A pattern recognition system for malicious PDF files detection. In
International Workshop on Machine Learning and Data Mining in Pattern Recognition, pp. 510-524 (2012).
[5] Esparza, J. M. Obfuscation and (non-)detection of malicious PDF files. In S21Sec e-crime (2011).
[6] Laskov, P. and Šrndić, N. Static detection of malicious JavaScript-bearing PDF documents. In Proceedings of the
27th Annual Computer Security Applications Conference, pp. 373-382, December (2011).
[7] Tzermias, Z., Sykiotakis, G., Polychronakis, M. and Markatos, E. P. Combining static and dynamic analysis for the
detection of malicious documents. In Proceedings of the Fourth European Workshop on System Security, p. 4 (2011).
[8] Tiwari, A. and Prakash, A. Improving classification of J48 algorithm using bagging, boosting and blending ensemble
methods on SONAR dataset using Weka. International Journal of Engineering and Technical Research, 2, pp. 207-209
(2014).
[9] Mila, "Contagio Malware Dump." [Online]. Available:
http://contagiodump.blogspot.in/2010/08/Malicious-documents-archive-for.html [Accessed 10 October 2014].
[10] Maiorca, D., Corona, I. and Giacinto, G. Looking at the bag is not enough to find the bomb: an evasion of
structural methods for malicious PDF files detection. In Proceedings of the 8th ACM SIGSAC Symposium on Information,
Computer and Communications Security, pp. 119-130 (2013).
[11] Corona, I., Maiorca, D., Ariu, D. and Giacinto, G. Lux0R: Detection of malicious PDF-embedded JavaScript code
through discriminant analysis of API references. In Proceedings of the 2014 Workshop on Artificial Intelligent and
Security Workshop, pp. 47-57. ACM, November (2014).
[12] Li, W.-J., Stolfo, S., Stavrou, A., Androulaki, E. and Keromytis, A. D. A study of malcode-bearing documents. In
Proc. of the 4th Int. Conf. on Detection of Intrusions and Malware, and Vulnerability Assessment (2007).
[13] Shafiq, M. Z., Khayam, S. A. and Farooq, M. Embedded malware detection using Markov n-grams. In International
Conference on Detection of Intrusions and Malware, and Vulnerability Assessment, pp. 88-107. Springer Berlin
Heidelberg, July (2008).
[14] Snow, K. Z., Krishnan, S., Monrose, F. and Provos, N. SHELLOS: Enabling fast detection and forensic analysis of
code injection attacks. In USENIX Security Symposium, pp. 183-200, August (2011).
[15] Schmitt, F., Gassen, J. and Gerhards-Padilla, E. PDF Scrutinizer: Detecting JavaScript-based attacks in PDF
documents. In Tenth Annual International Conference on Privacy, Security and Trust (PST), pp. 104-111. IEEE, July
(2012).
[16] Liu, D., Wang, H. and Stavrou, A. Detecting malicious JavaScript in PDF through document instrumentation. In 44th
IFIP International Conference on Dependable Systems and Networks (DSN), pp. 100-111. IEEE (2014).
[17] Stevens, D., "PDF Tools". [Online]. Available: http://blog.didierstevens.com/programs/pdf-tools/
[18] Stevens, D., "Malicious PDF analysis e-book". [Online]. Available:
http://didierstevens.com/files/data/malicious-pdf-analysis-ebook.zip, Sept 2010 [Accessed 22 September 2015].
[19] Kittilsen, J. "Detecting malicious PDF documents." Master's thesis, Gjøvik, Norway, pp. 11-12, December (2011).
[20] Cova, M., Kruegel, C. and Vigna, G. "Detection and analysis of drive-by-download attacks and malicious JavaScript
code." In Proceedings of the International Conference on World Wide Web, pp. 281-290, July (2010).


DG. Junior
No ratings yet
Accelerated Computing with HIP
From Everand
Accelerated Computing with HIP
Yifan Sun
4.5/5 (2)
Darklang Development and Deployment: The Complete Guide for Developers and Engineers
From Everand
Darklang Development and Deployment: The Complete Guide for Developers and Engineers
William Smith
No ratings yet
Sourcegraph Essentials: The Complete Guide for Developers and Engineers
From Everand
Sourcegraph Essentials: The Complete Guide for Developers and Engineers
William Smith
No ratings yet
Idris Unleashed: Type-Driven Development and Theorem Proving in Functional Programming
From Everand
Idris Unleashed: Type-Driven Development and Theorem Proving in Functional Programming
Robert Johnson
No ratings yet
Strapi Development and Best Practices: Definitive Reference for Developers and Engineers
From Everand
Strapi Development and Best Practices: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
OpenAI Development Guide: Definitive Reference for Developers and Engineers
From Everand
OpenAI Development Guide: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
Metasploit Techniques and Workflows: Definitive Reference for Developers and Engineers
From Everand
Metasploit Techniques and Workflows: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
Valgrind Essentials: Definitive Reference for Developers and Engineers
From Everand
Valgrind Essentials: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
Comprehensive Guide to Blackboard Learn: Definitive Reference for Developers and Engineers
From Everand
Comprehensive Guide to Blackboard Learn: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
Comprehensive Guide to Dash Applications: Definitive Reference for Developers and Engineers
From Everand
Comprehensive Guide to Dash Applications: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
CodeIgniter Development Essentials: Definitive Reference for Developers and Engineers
From Everand
CodeIgniter Development Essentials: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
Podio Technical Implementation Guide: Definitive Reference for Developers and Engineers
From Everand
Podio Technical Implementation Guide: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
Efficient Project Collaboration with Freedcamp: Definitive Reference for Developers and Engineers
From Everand
Efficient Project Collaboration with Freedcamp: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
Learning Microsoft Windows Server 2012 Dynamic Access Control
From Everand
Learning Microsoft Windows Server 2012 Dynamic Access Control
Jochen Nickel
No ratings yet
Veracode Essentials: Definitive Reference for Developers and Engineers
From Everand
Veracode Essentials: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
Cloud Computing: Master the Concepts, Architecture and Applications with Real-world examples and Case studies
From Everand
Cloud Computing: Master the Concepts, Architecture and Applications with Real-world examples and Case studies
Ruchi Doshi
No ratings yet

Malicious PDF Files Detection 2017

Uploaded by

Malicious PDF Files Detection 2017

Uploaded by

See discussions, stats, and author profiles for this publication at: https://www.researchgate.

Conference Paper · May 2017

Amit Agarwal Manish Mahajan

SEE PROFILE SEE PROFILE

The user has requested enhancement of the downloaded file.

Sonal Dabral1, Amit Agarwal2, Manish Mahajan3,Sachin Kumar4
