Analyzing Malicious PDF Files - Part 21
1
00:00:01,940 --> 00:00:03,660
So let's go to pdf-parser.
2
00:00:10,640 --> 00:00:21,670
We will again run it as
>python pdf-parser.py
and then give the location of the PDF file,
3
00:00:21,670 --> 00:00:27,100
example1.pdf. Press Enter and it throws a bunch of results at us.
4
00:00:27,100 --> 00:00:37,150
So the first output of pdf-parser is nothing but a complete raw dump of the PDF
file.
5
00:00:37,150 --> 00:00:46,730
You can see that it begins with the PDF magic bytes (%PDF-1.4), which tell us that it's a PDF file
of version 1.4.
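The header check that the dump makes visible can also be done in a few lines of code. This is a minimal Python sketch, assuming the file has been read as raw bytes; the 1024-byte search window is an assumption of this sketch, not a pdf-parser setting:

```python
import re

# "%PDF-" followed by a major.minor version number, e.g. b"%PDF-1.4".
HEADER_RE = re.compile(rb"%PDF-(\d\.\d)")


def pdf_version(data: bytes, window: int = 1024):
    """Return the version from the %PDF- header, or None if no header is found.

    Only the first `window` bytes are searched (an assumption of this sketch),
    so a few junk bytes before the header are tolerated.
    """
    match = HEADER_RE.search(data[:window])
    return match.group(1).decode("ascii") if match else None


# Hypothetical in-memory sample; for a real file use open(path, "rb").read().
sample = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
print(pdf_version(sample))  # 1.4
```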
6
00:00:46,760 --> 00:00:49,290
Then we have objects inside it.
7
00:00:49,340 --> 00:00:54,960
If you keep scrolling down, you can see there is one object that contains
a stream
8
00:00:58,590 --> 00:01:00,200
As you move down,
9
00:01:00,270 --> 00:01:02,060
there is another object.
10
00:01:02,070 --> 00:01:09,000
This might seem suspicious, but you have to look at what exactly is there
inside this particular
11
00:01:09,000 --> 00:01:09,510
dictionary.
12
00:01:09,510 --> 00:01:16,440
So it seems to be a font-related element, where this PDF has some specific font
settings;
13
00:01:16,470 --> 00:01:17,970
these are basically
14
00:01:18,000 --> 00:01:24,680
the hex representation of the value of that font.
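PDF hexadecimal strings are written between angle brackets, two hex digits per byte. A minimal Python sketch for turning one back into bytes; the sample value is hypothetical and is not the font data from example1.pdf:

```python
def decode_pdf_hex_string(token: str) -> bytes:
    """Decode a PDF hexadecimal string such as "<48656C6C6F>" into raw bytes."""
    body = token.strip()
    if body.startswith("<") and body.endswith(">"):
        body = body[1:-1]
    # Whitespace between the hex digits is ignored.
    digits = "".join(body.split())
    # With an odd number of digits, the missing final digit is taken as 0.
    if len(digits) % 2:
        digits += "0"
    return bytes.fromhex(digits)


print(decode_pdf_hex_string("<48 65 6C 6C 6F>"))  # b'Hello'
```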
15
00:01:24,720 --> 00:01:31,680
So it's not really critical in terms of the maliciousness of the file. You
can scroll further down.
16
00:01:34,010 --> 00:01:38,060
So the objects that contain streams can be of interest.
17
00:01:38,210 --> 00:01:45,650
But as you can see, these objects are referenced, so we have to look at who actually
is trying to reference
18
00:01:45,650 --> 00:01:55,340
them, or whether they are actually being referenced at all or are just
placeholders.
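Checking whether an object is referenced at all can be automated by collecting every indirect reference of the form `N G R`. A minimal Python sketch; the regular expressions are a simplification (they ignore object streams and compressed cross-reference data), and the sample objects are hypothetical:

```python
import re

# "N G obj ... endobj" (simplified: no handling of object streams).
OBJ_RE = re.compile(rb"(\d+)\s+\d+\s+obj\b(.*?)endobj", re.DOTALL)
# An indirect reference such as "23 0 R".
REF_RE = re.compile(rb"(\d+)\s+\d+\s+R\b")


def unreferenced_objects(data: bytes) -> list:
    """Return the numbers of objects that no other object references."""
    bodies = {int(m.group(1)): m.group(2) for m in OBJ_RE.finditer(data)}
    referenced = set()
    for number, body in bodies.items():
        for ref in REF_RE.finditer(body):
            if int(ref.group(1)) != number:  # ignore self-references
                referenced.add(int(ref.group(1)))
    return sorted(set(bodies) - referenced)


SAMPLE = b"""1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj
2 0 obj << /Type /Pages /Kids [] /Count 0 >> endobj
7 0 obj << /Length 4 >> stream
test
endstream endobj
"""

print(unreferenced_objects(SAMPLE))  # [1, 7]
```

The catalog (object 1 here) is normally referenced from the trailer's /Root entry rather than from another object, so it shows up in this list too; only objects like 7, which nothing points to, are candidates for placeholders.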
19
00:01:55,390 --> 00:02:05,020
So if you move down, object 24 tells us that it basically contains JavaScript,
and the JavaScript
20
00:02:05,020 --> 00:02:12,560
is executing a URL-encoded string with unescape. If you move further down, there is object 25.
21
00:02:12,620 --> 00:02:18,430
That's just about the title of the PDF, and after that we have the end of the file.
22
00:02:22,680 --> 00:02:23,000
OK.
23
00:02:23,030 --> 00:02:30,740
To quickly search for anything inside the PDF, pdf-parser gives us the option
'-s',
24
00:02:30,830 --> 00:02:35,960
with this parameter, you can search for any string inside the PDF.
25
00:02:36,110 --> 00:02:39,460
Let's say I want to look for 'javascript'.
26
00:02:41,860 --> 00:02:45,550
So it gives me all the locations where JavaScript appears.
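The '-s' search boils down to scanning each indirect object for a string. A minimal Python sketch of that idea, not pdf-parser's own code: it matches case-insensitively (a choice made here), and the sample objects are hypothetical, loosely modelled on the chain described in this video:

```python
import re

# Simplified "N G obj ... endobj" matcher; object streams are not handled.
OBJ_RE = re.compile(rb"(\d+)\s+\d+\s+obj\b(.*?)endobj", re.DOTALL)


def search_objects(data: bytes, needle: str) -> list:
    """Return the numbers of indirect objects whose text contains needle (case-insensitive)."""
    target = needle.lower().encode("latin-1")
    return [int(m.group(1)) for m in OBJ_RE.finditer(data)
            if target in m.group(2).lower()]


# Hypothetical objects, not the contents of example1.pdf.
SAMPLE = b"""23 0 obj << /Names [(x) 24 0 R] >> endobj
24 0 obj << /S /JavaScript /JS (app.alert(1)) >> endobj
25 0 obj << /Title (Demo) >> endobj
26 0 obj << /JavaScript 23 0 R >> endobj
"""

print(search_objects(SAMPLE, "javascript"))  # [24, 26]
```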
27
00:02:45,640 --> 00:02:51,880
For example, object number 24 contains JavaScript, and it has the actual script as
well.
28
00:02:52,800 --> 00:03:03,010
And there is another object, object 26, which contains a dictionary that is calling
the
29
00:03:03,010 --> 00:03:07,620
JavaScript and referencing object number 23.
30
00:03:07,660 --> 00:03:14,090
So let us see what exactly is there in object number 26.
31
00:03:14,110 --> 00:03:22,370
I think it is going to be the same data that we see here, but let's run '-o', which
is for object,
32
00:03:22,520 --> 00:03:25,090
and pass it the object number, which is 26.
33
00:03:25,250 --> 00:03:29,580
So if we press Enter, it gives us the content of object number 26.
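Selecting one object, as '-o' does, and seeing which objects it points to can be sketched the same way. This minimal Python version is an illustration, not pdf-parser's implementation: it returns an object's body together with the object numbers it references, using a simplified regex parser and hypothetical sample data:

```python
import re

# Simplified matchers for "N G obj ... endobj" and "N G R".
OBJ_RE = re.compile(rb"(\d+)\s+\d+\s+obj\b(.*?)endobj", re.DOTALL)
REF_RE = re.compile(rb"(\d+)\s+\d+\s+R\b")


def select_object(data: bytes, number: int):
    """Return (body, referenced object numbers) for object `number`, or None if absent."""
    for m in OBJ_RE.finditer(data):
        if int(m.group(1)) == number:
            body = m.group(2).strip()
            refs = [int(r.group(1)) for r in REF_RE.finditer(body)]
            return body, refs
    return None


# Hypothetical objects, not the contents of example1.pdf.
SAMPLE = b"""23 0 obj << /Names [(x) 24 0 R] >> endobj
26 0 obj << /JavaScript 23 0 R >> endobj
"""

body, refs = select_object(SAMPLE, 26)
print(body.decode())  # << /JavaScript 23 0 R >>
print(refs)           # [23]
```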
34
00:03:29,780 --> 00:03:37,700
So again, object number 26 says that it's trying to call a JavaScript that is
35
00:03:37,710 --> 00:03:38,420
at object number 23.
36
00:03:38,420 --> 00:03:42,460
So let's go to object 23 and see what's there.
37
00:03:43,730 --> 00:03:46,990
So object 23 is interesting here.
38
00:03:47,060 --> 00:03:49,190
It's not really doing anything.
39
00:03:49,190 --> 00:03:52,370
It is just referencing object number 24.
40
00:03:53,200 --> 00:03:56,600
And you know what is there in object 24.
41
00:03:57,700 --> 00:04:00,340
It's our JavaScript that we just saw.
42
00:04:00,340 --> 00:04:09,220
So this is basically one way in which malware authors create a chain of indirect references so that
the
43
00:04:09,220 --> 00:04:14,280
PDF tools are not able to quickly recognize where the JavaScript is located.
44
00:04:14,470 --> 00:04:22,330
So, as you saw, object 26 was referencing object number 23,
45
00:04:22,340 --> 00:04:25,610
and object number 23 referenced object number 24.
46
00:04:25,620 --> 00:04:30,870
And it was object 24 that actually contained the JavaScript inside it.
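The indirection from object 26 to 23 to 24 can be followed mechanically: start at one object and keep following references until an object with a /JS entry turns up. A minimal Python sketch, assuming a simple regex-based parser is good enough for the file at hand; the sample objects are hypothetical, and a visited set guards against genuine reference cycles:

```python
import re
from collections import deque

# Simplified matchers for "N G obj ... endobj" and "N G R".
OBJ_RE = re.compile(rb"(\d+)\s+\d+\s+obj\b(.*?)endobj", re.DOTALL)
REF_RE = re.compile(rb"(\d+)\s+\d+\s+R\b")


def find_javascript(data: bytes, start: int):
    """Breadth-first walk of indirect references from object `start`.

    Returns the path of object numbers ending at the first object that holds
    a /JS entry, or None if no such object is reachable.
    """
    bodies = {int(m.group(1)): m.group(2) for m in OBJ_RE.finditer(data)}
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        body = bodies.get(path[-1], b"")
        if b"/JS" in body:
            return path
        for ref in REF_RE.finditer(body):
            target = int(ref.group(1))
            if target not in visited:  # skip objects already seen (cycles)
                visited.add(target)
                queue.append(path + [target])
    return None


# Hypothetical objects, not the contents of example1.pdf.
SAMPLE = b"""23 0 obj << /Names [(x) 24 0 R] >> endobj
24 0 obj << /S /JavaScript /JS (app.alert(1)) >> endobj
26 0 obj << /JavaScript 23 0 R >> endobj
"""

print(find_javascript(SAMPLE, 26))  # [26, 23, 24]
```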
47
00:04:31,390 --> 00:04:33,100
So we have the JavaScript here.
48
00:04:33,220 --> 00:04:35,980
Now it's a simple unescape script.
49
00:04:36,040 --> 00:04:42,220
All you have to do is add a document.write to it and you can see what
exactly this JavaScript
50
00:04:42,220 --> 00:04:53,230
translates into. Let us quickly analyze another example.
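Instead of running the script with document.write, the unescape step can also be decoded statically. A minimal Python sketch that mimics JavaScript's global unescape() for %XX and %uXXXX escapes; the encoded sample strings are hypothetical:

```python
import re

# %uXXXX (a UTF-16 code unit) or %XX (a single byte value), as unescape() accepts.
_ESCAPE_RE = re.compile(r"%u([0-9A-Fa-f]{4})|%([0-9A-Fa-f]{2})")


def js_unescape(text: str) -> str:
    """Mimic JavaScript's unescape(); malformed escapes are left untouched."""
    return _ESCAPE_RE.sub(lambda m: chr(int(m.group(1) or m.group(2), 16)), text)


def unescape_to_bytes(text: str) -> bytes:
    """Return the decoded code units as little-endian bytes.

    Handy when %u sequences carry binary data (e.g. shellcode) rather than text.
    """
    return js_unescape(text).encode("utf-16-le", "surrogatepass")


print(js_unescape("%61%6C%65%72%74%28%31%29"))  # alert(1)
print(unescape_to_bytes("%u9090%u4141").hex())  # 90904141
```

Decoding statically avoids executing anything from the sample, which is the safer default when the payload is unknown.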
51
00:04:53,260 --> 00:04:57,880
Again, it's a pretty long output, and we already have a result from PDFiD
52
00:04:57,890 --> 00:05:01,450
showing that example2.pdf also contains JavaScript.
53
00:05:01,570 --> 00:05:02,900
So let us search for that.
54
00:05:08,290 --> 00:05:09,090
OK.
55
00:05:09,110 --> 00:05:18,640
So it's saying that there is a script that references an action.
56
00:05:18,810 --> 00:05:24,120
So let's search for 'action' here,
57
00:05:24,150 --> 00:05:25,820
to see what exactly it does.
58
00:05:25,890 --> 00:05:34,710
OK, so if we look at the referenced action, this JavaScript is trying to launch
cmd.exe. From there
59
00:05:34,860 --> 00:05:43,810
it goes to the home drive and checks whether template.pdf exists on the desktop
60
00:05:43,810 --> 00:05:45,260
or not.
61
00:05:45,280 --> 00:05:49,370
If that file exists, it executes it.
62
00:05:50,270 --> 00:05:55,070
So this is what this JavaScript is trying to do: it's basically a launch action. As soon as you
open
63
00:05:55,070 --> 00:05:55,980
that PDF,
64
00:05:56,060 --> 00:05:59,560
this is the particular portion of the script that gets executed.
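The /F (file) and /P (parameters) entries of a Windows launch action can be pulled out programmatically. A minimal Python sketch, assuming the action dictionary has already been extracted as text; the literal-string reader handles nested parentheses and backslash escapes only in a simplified way, and the sample action is hypothetical and harmless:

```python
import re


def read_literal_string(text: str, start: int) -> str:
    """Read the PDF literal string whose opening '(' is at text[start].

    Balanced parentheses are kept; for a backslash escape the following
    character is copied as-is (a simplification: \\n, octal codes, etc.
    are not translated).
    """
    if text[start] != "(":
        raise ValueError("expected '(' at start")
    depth, out, i = 0, [], start
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            out.append(text[i + 1])
            i += 2
            continue
        if ch == "(":
            depth += 1
            if depth > 1:
                out.append(ch)
        elif ch == ")":
            depth -= 1
            if depth == 0:
                return "".join(out)
            out.append(ch)
        else:
            out.append(ch)
        i += 1
    raise ValueError("unterminated literal string")


def launch_parameters(action: str) -> dict:
    """Return the /F and /P strings of a /Launch action, if present."""
    if "/Launch" not in action:
        return {}
    found = {}
    for key in ("F", "P"):
        match = re.search(r"/" + key + r"\s*\(", action)
        if match:
            found[key] = read_literal_string(action, match.end() - 1)
    return found


# Hypothetical, harmless action dictionary.
ACTION = "<< /S /Launch /Win << /F (cmd.exe) /P (/c if exist x.pdf (echo found)) >> >>"
print(launch_parameters(ACTION))
# {'F': 'cmd.exe', 'P': '/c if exist x.pdf (echo found)'}
```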
65
00:05:59,560 --> 00:06:05,870
So that is how we follow the trail and try to understand what the JavaScript is
trying to do.