
Analyzing Malicious PDF Files - Part 12

Uploaded by

tw626

0

1
00:00:11,160 --> 00:00:11,910
Hello everyone.
1

2
00:00:11,940 --> 00:00:16,230
Let us start analyzing a bunch of malicious PDF files.
2

3
00:00:16,310 --> 00:00:23,270
So again we'll be coming back to our FLARE suite of tools collection.
3

4
00:00:23,310 --> 00:00:29,280
You can see there is a folder for PDF. When you go inside you will find that there
is a tool called PDF
4

5
00:00:29,280 --> 00:00:38,070
parser which is critical in parsing the complete PDF file and extracting malicious
artifacts.
5

6
00:00:38,140 --> 00:00:43,460
Then there is a shortcut for pdf-parser and there is a shortcut for pdfid.
6

7
00:00:43,480 --> 00:00:47,450
So these are basically the compiled executables of the actual program.
7

8
00:00:47,490 --> 00:00:51,670
If you go inside pdf-parser, you'll see that it's simply a Python file.
8

9
00:00:51,700 --> 00:00:57,810
We can run it just like we ran the other OLE file analysis tools in the
previous videos.
9

10
00:00:58,240 --> 00:01:05,020
So the original pdfid tool is not here. So we can just click on the properties of the
shortcut and we
10

11
00:01:05,020 --> 00:01:08,240
can see where exactly this shortcut is pointing.
11
12
00:01:08,450 --> 00:01:11,880
We can see that it's in Program Files/pdfid
12

13
00:01:12,010 --> 00:01:18,550
So I don't really enjoy running the shortcuts. It's always better to run the Python
program directly
13

14
00:01:18,640 --> 00:01:23,110
so that in case there is any error you can look at it and try to resolve it.
14

15
00:01:23,110 --> 00:01:25,480
So let's quickly go to program files.
15

16
00:01:27,000 --> 00:01:36,010
Pdfid and just copy it and move it to the FLARE folder.
16

17
00:01:36,080 --> 00:01:41,510
So you now have both pdf-parser and pdfid in the same FLARE directory.
17

18
00:01:41,510 --> 00:01:43,640
So pdfid is more of a
18

19
00:01:45,100 --> 00:01:51,650
meta information tool which gives you a bunch of information about the PDF file.
19

20
00:01:51,770 --> 00:01:57,260
For example, how many pages there are, whether there is any JavaScript inside it,
and things like that.
20

21
00:01:57,260 --> 00:02:02,870
Whereas PDF-parser is more of a dynamic parsing of the PDF file.
21

22
00:02:02,900 --> 00:02:06,410
So let's begin with using pdfid.
22

23
00:02:06,410 --> 00:02:07,350
For the first
23

24
00:02:12,680 --> 00:02:22,600
So I will come to my pdfid directory and my files are stored in course files/PDF
files/PDF examples
24

25
00:02:22,610 --> 00:02:24,350
I have three examples here.
25

26
00:02:24,350 --> 00:02:28,700
So we'll be using them one by one.
26

27
00:02:28,700 --> 00:02:37,420
We will pass
>python pdfid.py
followed by the location of the file.
27

28
00:02:39,500 --> 00:02:43,360
So once you press enter it will give us a bunch of information.
28

29
00:02:43,360 --> 00:02:47,660
For example this PDF file has 26 objects inside it.
29

30
00:02:47,660 --> 00:02:53,240
If you remember from our previous discussion, we talked about how the body of a
PDF file
30

31
00:02:53,240 --> 00:02:57,920
consists of different objects, and all those objects will begin with
31

32
00:02:58,010 --> 00:03:01,650
'obj' and end with 'endobj'.
32

33
00:03:01,880 --> 00:03:08,780
So there are 26 objects and 26 end-objects, so all the objects are being closed properly.
33
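
The obj/endobj counting described above can be sketched in a few lines of Python. This is only a rough illustration of the idea, not how pdfid itself is implemented: the keyword list, the `count_keywords` helper and the tiny `sample` document are all assumptions made up for this example.

```python
import re

# A small, hypothetical subset of keywords chosen for illustration.
KEYWORDS = [b"obj", b"endobj", b"stream", b"endstream",
            b"/JS", b"/JavaScript", b"/OpenAction"]

def count_keywords(data: bytes) -> dict:
    """Count standalone occurrences of each keyword in raw PDF bytes."""
    counts = {}
    for kw in KEYWORDS:
        # Reject matches glued to letters on either side, so that
        # "obj" is not also counted inside "endobj".
        pattern = rb"(?<![A-Za-z])" + re.escape(kw) + rb"(?![A-Za-z])"
        counts[kw.decode()] = len(re.findall(pattern, data))
    return counts

# A made-up two-object document, not one of the course samples.
sample = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Type /Catalog /OpenAction 2 0 R >>\nendobj\n"
    b"2 0 obj\n<< /S /JavaScript /JS (app.alert(1)) >>\nendobj\n"
    b"%%EOF\n"
)

counts = count_keywords(sample)
print(counts)
# Every object that is opened should also be closed.
print("balanced:", counts["obj"] == counts["endobj"])
```

A plain byte scan like this misses names hidden by obfuscation or compression, so treat it as a way to understand the counts, not as a replacement for the real tool.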

34
00:03:08,990 --> 00:03:15,500
There are nine streams. Again, the body of the PDF file contains streams and
these streams have the
34

35
00:03:15,500 --> 00:03:17,060
data.
35

36
00:03:17,240 --> 00:03:18,990
Then there is one cross-reference.
36

37
00:03:19,010 --> 00:03:21,690
There is one trailer, one startxref.
37

38
00:03:21,770 --> 00:03:23,790
There are three pages.
38

39
00:03:23,960 --> 00:03:27,050
There is one JavaScript as well.
39

40
00:03:27,050 --> 00:03:33,800
There is a /JS tag and it has been picked up by pdfid as
40

41
00:03:33,800 --> 00:03:35,100
well.
41

42
00:03:35,200 --> 00:03:37,190
There is an open action as well.
42

43
00:03:37,190 --> 00:03:42,640
So what I mean by open action here is that once you launch the PDF file, whatever
is
43

44
00:03:42,650 --> 00:03:46,370
marked as open action will be immediately executed.
44

45
00:03:46,670 --> 00:03:55,060
So it's very important to understand all these meta properties that we have got
from pdfid.
45

46
00:03:55,160 --> 00:04:00,710
We already know a bunch of them but there are some of them which are new and the
important ones are things
46

47
00:04:00,710 --> 00:04:04,860
like JS, JavaScript, AA, OpenAction,
47

48
00:04:04,920 --> 00:04:05,720
XFA, URI.
48

49
00:04:05,720 --> 00:04:11,930
So URI again tells us if there is any URI present inside the PDF. The
embedded file
49

50
00:04:11,930 --> 00:04:12,470
tells us
50

51
00:04:12,470 --> 00:04:20,540
whether there is any embedded file, for example an executable or a Flash file, inside
the PDF. So the interesting
51

52
00:04:20,540 --> 00:04:22,340
parts here are the JavaScript entries.
52

53
00:04:22,370 --> 00:04:28,550
We know that this file contains JavaScript and there is an open action that is
performed as well, which
53

54
00:04:28,550 --> 00:04:33,980
means that as soon as we launch the PDF, the PDF tries to do something
without asking
54

55
00:04:33,980 --> 00:04:37,410
you for any kind of permission.
55
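
The kind of first-pass triage described here can be sketched as a small Python check over keyword counts. This is a hedged illustration only: the `WATCHLIST`, the `triage` and `objects_balanced` helpers, and the dictionary layout are assumptions invented for this example, and the numbers are the ones read out for the first sample (26 objects, 9 streams, 3 pages, one /JS, one /OpenAction). Anything not mentioned is simply left out and treated as zero.

```python
# Keywords worth a closer look; the list is an assumption based on
# the names mentioned in this walkthrough.
WATCHLIST = ["/JS", "/JavaScript", "/AA", "/OpenAction",
             "/XFA", "/URI", "/EmbeddedFile"]

def triage(counts: dict) -> list:
    """Return the watchlist keywords that appear at least once."""
    return [kw for kw in WATCHLIST if counts.get(kw, 0) > 0]

def objects_balanced(counts: dict) -> bool:
    """True when every opened object has a matching endobj."""
    return counts.get("obj", 0) == counts.get("endobj", 0)

# Counts as read out for the first example; missing keys count as zero.
first_sample = {"obj": 26, "endobj": 26, "stream": 9,
                "/Page": 3, "/JS": 1, "/OpenAction": 1}

print("flagged:", triage(first_sample))
print("balanced:", objects_balanced(first_sample))
```

A flagged keyword is only a hint about where to look next; it does not prove that the file is malicious.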

56
00:04:37,410 --> 00:04:47,260
All you have to do is just open that PDF itself. Let us run it for our second file as
well.
56

57
00:04:47,260 --> 00:04:49,410
For this file we get something similar.
57

58
00:04:49,510 --> 00:04:54,770
There are 12 objects, two streams, and it has two pages.
58

59
00:04:54,910 --> 00:05:03,580
And again it has JavaScript inside it and it performs an open action as well, and there
is no embedded file
59

60
00:05:03,820 --> 00:05:06,950
and there is no URI inside that PDF file.
60

61
00:05:08,580 --> 00:05:13,110
Let us try with our third example
61

62
00:05:13,230 --> 00:05:18,060
We have eight objects, one stream, one page.
62

63
00:05:18,060 --> 00:05:19,470
There is no JavaScript.
63

64
00:05:19,470 --> 00:05:25,900
In this case there is one XFA and no URI. That's it.
64

65
00:05:25,920 --> 00:05:31,950
So this is how we first collect some kind of static information of the PDF file
using pdfid and
65

66
00:05:31,950 --> 00:05:37,230
this can help us in making again some heuristic analysis of the PDF file by looking
at the number
66

67
00:05:37,230 --> 00:05:43,020
of pages, whether it has some JavaScript in it or not, whether it's performing some
open action or not, and
67

68
00:05:43,020 --> 00:05:44,350
things like that.
68

69
00:05:44,370 --> 00:05:50,940
So once we have some kind of static heuristics about the PDF file, the next thing
that we can do
69

70
00:05:50,940 --> 00:05:56,710
is we can start using pdf-parser to actually look into these elements.
