When Malware Is Packing Heat
Davide Balzarotti and Giovanni Vigna

USENIX Enigma 2018


Packing

Researchers often have a limited understanding of the complexity of runtime packers

AV software often misclassifies benign packed samples as malicious

We all love ML, but in the presence of packing it just learns the wrong thing

[Figure: Layer 1, Layer 2, Layer 3, ..., Layer n; legend: packer code, application code]


Complexity Classes
[Class I] a single unpacking routine is executed before transferring control to the unpacked program
[Class II] multiple unpacking layers are executed sequentially and lead to the original code at the end
[Class III] intermediate layers are executed in loops
[Class IV] the packer code is interleaved with the execution of the unpacked program
[Class V] pieces of the original program are unpacked on-demand
[Class VI] only a single fragment of the original program (as little as a single instruction) is unpacked in memory at any moment in time
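One way to make the notion of an unpacking layer concrete is to track which code wrote which code in an execution trace. The sketch below is a toy illustration, not the authors' tooling. It assumes a trace of hypothetical `(pc, written_address)` pairs, numbers code already present on disk as layer 0, and gives any instruction at an address written at run time the layer of its writer plus one. That numbering convention is an assumption made for this example.

```python
def deepest_layer(trace):
    """Return the deepest unpacking layer observed in a toy execution trace.

    trace: iterable of (pc, written_address) pairs, one per executed
    instruction; written_address is None when the instruction writes nothing.
    This trace format is hypothetical and chosen only for illustration.
    """
    writer_layer = {}  # address -> layer of the instruction that last wrote it
    deepest = 0
    for pc, target in trace:
        # Code never written at run time is layer 0; otherwise it sits one
        # layer above the code that produced it.
        layer = writer_layer[pc] + 1 if pc in writer_layer else 0
        deepest = max(deepest, layer)
        if target is not None:
            writer_layer[target] = layer
    return deepest


# Single unpacking routine, then the unpacked program runs (Class I style).
single_layer = [(0x10, 0x100), (0x11, 0x101), (0x100, None), (0x101, None)]

# Two unpacking layers executed in sequence (Class II style).
two_layers = [(0x10, 0x100), (0x100, 0x200), (0x200, None)]

print(deepest_layer(single_layer), deepest_layer(two_layers))
```

Counting layers alone only separates the first two classes. Telling Classes III to VI apart would also need the order and timing of writes and executions, which this sketch does not model.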
[Figure: Off-The-Shelf Packers, Custom Malware Packers]
[Figure credit: Ange Albertini 2009-2010, Creative Commons Attribution, http://corkami.blogspot.com]
Why Does Packing Matter?
§ Dynamic analysis techniques (e.g., sandboxes) have been introduced to deal with packing…

§ …but static analysis techniques are more efficient!
An Experiment
§ Benign programs from Windows OSs (XP, Vista, 7, NT)
  § 7983 samples
§ Packed with 4 different packers
  § 16663 samples
§ Submitted to VirusTotal
  § Looking for 10+ detections
§ See: http://sarvamblog.blogspot.com/2013/05/nearly-70-of-packed-windows-system.html
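The "10+ detections" criterion can be read as a simple threshold over per-sample detection counts. The following is a minimal sketch under that reading. The function name, the input format, and the example counts are all hypothetical, and the code does not query VirusTotal.

```python
def false_positive_rate(detection_counts, threshold=10):
    """Percentage of benign samples flagged by at least `threshold` engines.

    detection_counts: one integer per benign sample, the number of AV engines
    that reported it as malicious (hypothetical input format).
    """
    if not detection_counts:
        raise ValueError("need at least one sample")
    flagged = sum(1 for n in detection_counts if n >= threshold)
    return 100.0 * flagged / len(detection_counts)


# Made-up counts for four benign packed samples; two reach the threshold.
example_counts = [0, 3, 10, 25]
print(false_positive_rate(example_counts))
```

Because every input sample is benign by construction, any sample that reaches the threshold counts as a false positive.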
Results

§ UPX: 0% False Positives

§ BEP: 72.78% False Positives

§ NsPack: 98.72% False Positives

§ Upack: 99.88% False Positives

Packing = Malware?
§ False Positives
§ Dataset Pollution
How Did We Get Here?
§ Machine learning is increasingly used to perform malware detection
§ The misclassification of packed binaries is the result of learning the wrong thing…

§ Let’s take a step back!
What Is Machine Learning?
§ “Machine learning explores the study and construction of algorithms that can learn from and perform predictive analysis on data”
https://en.wikipedia.org/wiki/Machine_learning
Why Machine Learning?

§ Supports data analysis

§ Supports characterization

§ Supports classification

Machine Learning

[Figure: Round? Has >3 sides?]
[Figure: Reds are bad. Blues, greens, oranges are good. What about greys?]
Pitfalls in Machine Learning

Another Experiment

[Table: column headers not recoverable from the source]
55   2845  2327  5173  94.75   0.02   35.913
60   3103  2069  5173  94.712  0.032  36.667
65   3362  1810  5173  94.982  0.052  36.41
70   3621  1551  5173  94.934  0.052  38.252
75   3879  1293  5173  95.466  0.071  34.602
80   4138  1034  5173  95.533  0.074  35.822
85   4397  775   5173  95.756  0.055  35.656
90   4655  517   5173  95.407  0.234  36.128
95   4914  258   5173  96.123  0.076  49.711
100  5173  0     5173  96.839  0.008  52.451

Insight: When most malware is packed, packing is what is actually learned
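The insight can be reproduced on a toy scale. The sketch below is an illustration only, not the experiment from the talk. It trains a one-feature classifier (a decision stump) on made-up data in which every malicious sample is packed and no benign sample is. The feature names `packed` and `suspicious_api` and all the sample counts are hypothetical.

```python
def train_stump(samples):
    """Pick the single binary feature that best matches the training labels."""
    features = samples[0][0].keys()

    def accuracy(feature):
        return sum(1 for x, y in samples if x[feature] == y) / len(samples)

    return max(features, key=accuracy)


def predict(feature, x):
    """Predict 1 (malicious) exactly when the learned feature is set."""
    return x[feature]


# Made-up training set: label 1 = malicious, 0 = benign.
# All malware is packed; packing hides the suspicious API use in 2 of 10.
malware = ([({"packed": 1, "suspicious_api": 1}, 1)] * 8
           + [({"packed": 1, "suspicious_api": 0}, 1)] * 2)
benign = ([({"packed": 0, "suspicious_api": 0}, 0)] * 9
          + [({"packed": 0, "suspicious_api": 1}, 0)] * 1)

learned = train_stump(malware + benign)

# A benign program that happens to be packed.
benign_packed = {"packed": 1, "suspicious_api": 0}
print(learned, predict(learned, benign_packed))
```

On this data the `packed` feature separates the training set perfectly, so the stump selects it and then flags the benign packed sample as malicious. The classifier has learned packing, not malicious behavior.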


Conclusions
§ Applying machine learning to packed malware might lead to the detection of packing (and not the detection of malicious behavior), resulting in false positives
§ Desensitization caused by false positives
§ Pollution of datasets
§ Sophisticated dynamic unpacking and analysis is necessary
Questions?

Icon: process by Roman from the Noun Project
Machine learning picture: https://xkcd.com/1838/