Student Presentation on Java Obfuscation

The document discusses Java obfuscation as a technique to transform software programs, making source code and binaries difficult to understand while preserving functionality. It outlines the compilation phases of Java programs, the reasons decompilation is feasible for Java, and various obfuscation methods such as name obfuscation, data obfuscation, and control flow obfuscation. The document also provides examples of decompilation and obfuscation, highlighting the challenges and techniques involved in protecting Java code from reverse engineering.

Java Obfuscation

Software Engineering Masterclass

Supervisor: Peter King


Second reader: Andrew Ireland

Heriot-Watt University 2011–2012

Pierre-André Saulais 1
Outline
● First lecture
● A – Compilation
● B – Decompilation
● Second lecture
● C – Obfuscation
● D – Software

What is an obfuscator?
● Tool that can transform software programs
● Transforms source code (e.g. JavaScript)
● Transforms program binaries (e.g. Java)
● Hinders understanding of how the program works
● Renders the source code illegible to readers (JS)
● Hinders reverse-engineering by decompilers (Java)
● Must not change the program's functionality

Compilation phases – Overview
● A compiler can be seen as:
● Black box that transforms source code to programs
● Pipeline with several stages
● Compilation is all about transformations
● "What does the program look like at each compilation stage?"

Compilation phases – Front-end
● Converts source code to AST and then IR
● Tasks: scanning, parsing, semantic analysis
● Source code (Text)
● Understandable by people, too high level for CPUs
● Abstract Syntax Tree (AST)
● Data structure that is easier to process than text
● 1:1 mapping to source code constructs
● Decompilation is trivial (iterate the tree)

Compilation phases – Middle-end
● Transforms IR
● Tasks: run optimisation passes on IR
● Intermediate Representation (IR)
● Often linear (e.g. Java bytecode, abbreviated BC) or graph-based
● More difficult than ASTs to decompile but feasible
● Lower-level than source but not ISA-specific
● More focused on implementation details
● 'if', 'for', etc. constructs replaced by branches and basic blocks (BBs)
● Sub-expressions replaced by values

Compilation phases – Back-end
● Converts IR to machine code
● Tasks: instruction selection, instruction scheduling, register allocation
● Machine code
● Much more difficult than IR to decompile
● The code has gone through many transformations, e.g.:
● Inlining: several function bodies 'mashed together'
● Instruction scheduling: reordering of instructions
● Register allocation: one register holds N different variables over time
● Very low-level and platform-specific

Why is decompiling possible?
● Before Java, most programming languages were compiled to native executables (machine code)
● Using optimising compilers
● e.g. C, C++, FORTRAN, etc.
● Decompiling such executables back to readable source is generally not feasible
● Why is decompiling Java programs feasible?
● Java (and C#) programs are only partially compiled
● Next: comparison of both program life-cycles

C program life-cycle: compilation
● Source files (.c) are compiled to object files (.o)
● Objects contain highly optimised machine code
● Machine code is ISA-specific (e.g. X86, PPC, ARM)
● Very low-level
● Non-portable
● Difficult to map to source code

Java program life-cycle: compilation
● Source files (.java) are compiled to .class files
● Class files contain high-level "Java Byte Code"
● In non-optimised, canonical form (using javac)
● Easily mapped to source code
● Java BC is portable
● Executed by the Java Virtual Machine
● Several different ISAs are supported
● "Write once, run anywhere"

C program life-cycle: distribution
● Object files can be linked into binaries:
● Executables (.exe)
● Dynamically loaded libraries (.dll, .so)
● Before distribution, compilation is over
● No further processing needed
● C programs often distributed as binaries
● Requires binaries for each supported ISA

Java program life-cycle: distribution
● Java programs often distributed as .jar files
● Zip file containing .class files and metadata
● However, .class files are not fully compiled
● At runtime, the JVM loads each class's BC and either:
● Interprets it
● No up-front processing, but slow execution
● Compiles it to machine code
● Using the HotSpot Just-in-Time compiler
● Heavy processing, but faster execution

Decompilation examples – 1

Java source

public int sum(int[] samples)
{
    int count = 0;
    int temp;
    for(int i = 0;
        i < samples.length;
        i++)
    {
        temp = samples[i];
        count += temp;
    }
    return count;
}

Corresponding Java BC

00: iconst_0
01: istore_2
02: iconst_0
03: istore 4
05: iload 4
07: aload_1
08: arraylength
09: if_icmpge 27
12: aload_1
13: iload 4
15: iaload
16: istore_3
17: iload_2
18: iload_3
19: iadd
20: istore_2
21: iinc 4, 1
24: goto 5
27: iload_2
28: ireturn

Decompilation examples – 2

Java source

public int sum(int[] samples)
{
    int count = 0;
    int temp;
    for(int i = 0;
        i < samples.length;
        i++)
    {
        temp = samples[i];
        count += temp;
    }
    return count;
}

Decompiled (using JAD)

public int sum(int ai[])
{
    int i = 0;
    for(int k = 0;
        k < ai.length;
        k++)
    {
        int j = ai[k];
        i += j;
    }
    return i;
}

Decompilation examples – 3

Obfuscated BC (using JBCO)

00: lconst_0
01: lstore_2
02: iconst_0
03: i2l
04: ldc2_w #32
07: land
08: lconst_0
09: ldc2_w #26
12: land
13: lxor
14: lstore 4
16: iconst_0
17: istore_0
18: aload_1
19: arraylength
20: i2l
21: bipush 32
23: lshl
24: ldc2_w #26
27: land
28: lload 4
30: ldc2_w #32
33: land
34: lxor
35: lstore 4
37: iload_0
38: lload 4
40: ldc2_w #26
43: land
44: bipush 32
46: lshr
47: l2i
48: if_icmpge 100
...
(27 more instructions)
...
92: lstore 4
94: iinc 0, 1
97: goto 18
100: lload 4
102: ldc2_w #32
105: land
106: l2i
107: ireturn

Decompilation examples – 4

Decompiled (using JAD)


public int sum(int ai[]) {
long l = 0L, l1 = (long)0 & 0xffffffffL ^ 0L & 0xffffffff00000000L;
this = 0;
do {
l1= (long)ai.length << 32
& 0xffffffff00000000L ^ l1 & 0xffffffffL;
if(this < (int)((l1 & 0xffffffff00000000L) >> 32)) {
l = (long)ai[this] & 0xffffffffL ^ l & 0xffffffff00000000L;
l1 = (long)((int)(l1 & 0xffffffffL) + (int)(l & 0xffffffffL))
& 0xffffffffL ^ l1 & 0xffffffff00000000L;
this++;
} else {
return (int)(l1 & 0xffffffffL);
}
} while(true);
}

Java BC – Overview
● Java's IR language
● Linear (like an abstract assembly language)
● Compact (most instructions take 1-5 bytes)
● High-level (compared to machine code)
● BC deals with object references, field names, etc
● MC deals with pointers, offsets, etc
● BC is type-safe while MC generally isn't

Java BC – A stack-based IR
● Most instructions operate on a special stack
● Some instructions push values (e.g. iload)
● Some instructions pop values (e.g. ireturn)
● Some pop inputs and push outputs (e.g. iadd)
● Example (x = 2, y = 3)
int sum(int x, int y) {
    return x + y;
}
0: iload_1    ; [] → [2]
1: iload_2    ; [2] → [2 3]
2: iadd       ; [2 3] → [5]
3: ireturn    ; [5] → []
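The stack behaviour in this example can be sketched with a toy interpreter. This is only an illustration, not how a real JVM is implemented: the class name StackDemo, the string-based instruction encoding and the run method are assumptions made for the sketch, and only the four instructions of the example are handled.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy interpreter for the four instructions of sum(int x, int y).
// Hypothetical sketch: a real JVM decodes bytes, not strings.
class StackDemo {
    // locals[0] stands in for 'this', locals[1] = x, locals[2] = y.
    static int run(String[] code, int[] locals) {
        Deque<Integer> stack = new ArrayDeque<>();   // the operand stack
        for (String insn : code) {
            switch (insn) {
                case "iload_1":                      // push local 1
                    stack.push(locals[1]);
                    break;
                case "iload_2":                      // push local 2
                    stack.push(locals[2]);
                    break;
                case "iadd": {                       // pop two inputs, push their sum
                    int right = stack.pop();
                    int left = stack.pop();
                    stack.push(left + right);
                    break;
                }
                case "ireturn":                      // pop the result and return it
                    return stack.pop();
                default:
                    throw new IllegalArgumentException("unsupported: " + insn);
            }
        }
        throw new IllegalStateException("code fell off the end without ireturn");
    }

    public static void main(String[] args) {
        String[] code = {"iload_1", "iload_2", "iadd", "ireturn"};
        System.out.println(run(code, new int[] {0, 2, 3}));   // prints 5
    }
}
```

Each case mirrors one of the three instruction kinds listed above: a pure push, a pure pop, and a pop-then-push.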

Java BC – Locals
● Each method has an array of locals, which contains:
1) The special this value (by convention, index = 0; instance methods only)
2) The method's n arguments (index in [1; n])
3) The method's local variables (index > n)
int add_five(int x) {
    int a = 5;
    return x + a;
}
0: iconst_5
1: istore_2
2: iload_1
3: iload_2
4: iadd
5: ireturn

Java BC – 'if' statement
● Implemented as conditional branch to 'else' block
● The 'then' block is executed with fall-through
int sign(int x) {
    int n;
    if(x >= 0)
        n = 1;
    else
        n = 0;
    return n;
}
00: iload_1     ; if(x < 0)
01: iflt 9      ;   goto 'if.else'
04: iconst_1    ; 'if.then' block
05: istore_2
06: goto 11     ; goto 'if.end'
09: iconst_0    ; 'if.else' block
10: istore_2
11: iload_2     ; 'if.end'
12: ireturn

Java BC – 'for' statement
● One conditional branch to exit the loop
● One unconditional branch to the loop start
int loop(int x) {
    int i;
    for(i = 0;
        i < x;
        i++) {
    }
    return i;
}
00: iconst_0
01: istore_2        ; i = 0
02: iload_2         ; 'lp.start'
03: iload_1         ; if (i >= x)
04: if_icmpge 13    ;   goto 'lp.end'
07: iinc 2, 1       ; i++
10: goto 2          ; goto 'lp.start'
13: iload_2         ; 'lp.end'
14: ireturn

Classification of transformations
● Three main classes of transformations [1]
● Layout obfuscation (marked 'L' in the following slides)
● Changes how symbols are named
● Data obfuscation (noted 'D')
● Changes how data is stored
● Control obfuscation (noted 'C')
● Transforms the Control Flow Graph

Debugging information removal (L)
● javac can produce .class files with
● Local variable maps (local index → variable name; requires '-g' or '-g:vars')
● Line number maps (instruction → source line; generated by default)
● All of this can be turned off with the '-g:none' option (no obfuscation tool needed); the '-O' option is not documented for current javac versions
● This does make the decompiled code more difficult to understand

Name obfuscation (L)
● Changes the name of each class/method/field to a new, meaningless name
● Can use Java keywords, operators or spaces (legal in BC, illegal in Java source)
● Methods with different signatures can share names (method overloading)
● The public API is usually left untouched (though this is not required)
● All uses of a renamed symbol have to be updated
● Even outside of the class/JAR (e.g. dependencies)
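As an illustration of name sharing through overloading, here is a hand-written sketch of what a small class could look like after renaming. The original names mentioned in the comments (square, area, label) and the wrapper class RenamedDemo are invented for the example; a real obfuscator works on the BC and may pick names that are not even valid Java identifiers.

```java
// Hypothetical result of name obfuscation, written by hand as valid Java.
class RenamedDemo {
    // Assumed original: class Shapes with square(int), area(int, int), label(String).
    // After renaming, the class, all three methods and their parameters are called 'a'.
    static class a {
        static int a(int a) {
            return a * a;            // was square(side)
        }

        static int a(int a, int b) {
            return a * b;            // was area(width, height)
        }

        static String a(String a) {
            return "[" + a + "]";    // was label(text)
        }
    }

    public static void main(String[] args) {
        // The compiler picks the overload from the argument types alone.
        System.out.println(a.a(3));      // prints 9
        System.out.println(a.a(3, 4));   // prints 12
        System.out.println(a.a("x"));    // prints [x]
    }
}
```

The behaviour is unchanged, but a reader of the decompiled code can no longer tell the three methods apart by name.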

Local variable access obfuscation (D)
● Changes the order of the locals index table
● Especially useful if the decompiler assumes that this has index zero
● Such decompilers can produce invalid Java code: this = this + 1;
● Packs small variables into larger variables
● Access to these variables requires bit masking and shifting operations
● Cf. earlier decompilation example
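The packing idea can be sketched at source level as follows. This is a simplified, hand-written version of the technique applied to the earlier sum() example, not the output of JBCO; the class name PackedLocals and the helper methods are assumptions made for readability, and an obfuscator would emit the masks and shifts inline.

```java
// Sketch of variable packing: two int variables share a single long.
class PackedLocals {
    static final long LOW = 0xffffffffL;            // bits 0..31
    static final long HIGH = 0xffffffff00000000L;   // bits 32..63

    static long setLow(long packed, int v) {
        return (packed & HIGH) | (v & LOW);         // keep high half, replace low half
    }

    static long setHigh(long packed, int v) {
        return (packed & LOW) | ((long) v << 32);   // keep low half, replace high half
    }

    static int getLow(long packed) {
        return (int) (packed & LOW);
    }

    static int getHigh(long packed) {
        return (int) (packed >>> 32);
    }

    // Same behaviour as sum(int[] samples): 'count' lives in the low half,
    // 'temp' in the high half of one long.
    static int sum(int[] samples) {
        long packed = 0L;
        for (int i = 0; i < samples.length; i++) {
            packed = setHigh(packed, samples[i]);                        // temp = samples[i]
            packed = setLow(packed, getLow(packed) + getHigh(packed));   // count += temp
        }
        return getLow(packed);
    }

    public static void main(String[] args) {
        System.out.println(sum(new int[] {1, 2, 3}));   // prints 6
    }
}
```

Every read or write of a packed variable now costs extra mask and shift instructions, which is consistent with the slowdown reported for data transformations later in the slides.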

String encryption (D)
● String constants used in the class are scrambled using symmetric encryption
● Example: the string "Wrong password" can be encrypted as "Cgxzw5cuofh{j1"
● These strings are decrypted at run-time
● Requires the decryption code to be in the BC
● Adds confusion to the decompiled code but is generally straightforward to reverse
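A minimal sketch of the idea, assuming a repeating-key XOR as the symmetric cipher. The cipher, the key and the class name StringCrypt are assumptions for illustration; they are not the scheme that produced the scrambled string in the example above, and real obfuscators use their own schemes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Sketch of string encryption with a repeating-key XOR (illustrative only).
class StringCrypt {
    // Hypothetical key; it has to ship inside the program, which is the weakness.
    private static final byte[] KEY = {0x13, 0x37, 0x42};

    // XOR is its own inverse, so the same routine encrypts and decrypts.
    private static byte[] xor(byte[] data) {
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (byte) (data[i] ^ KEY[i % KEY.length]);
        }
        return out;
    }

    // Build time: the obfuscator would replace the string constant with this value.
    static String encrypt(String plain) {
        byte[] scrambled = xor(plain.getBytes(StandardCharsets.UTF_8));
        return Base64.getEncoder().encodeToString(scrambled);
    }

    // Run time: a helper left in the BC restores the original string.
    static String decrypt(String stored) {
        byte[] scrambled = Base64.getDecoder().decode(stored);
        return new String(xor(scrambled), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String stored = encrypt("Wrong password");
        System.out.println(stored);            // unreadable constant seen by a decompiler
        System.out.println(decrypt(stored));   // prints Wrong password
    }
}
```

Because decrypt and KEY are both present in the distributed program, an attacker can simply call the helper, which is why this transformation is easy to reverse.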

Control Flow Graph
● A function can be broken down into Basic Blocks (BBs)
● Longest sequence of instructions with no branch, except as the last instruction
● CFG: graph with BBs as nodes, branches as edges
entry:
iconst_0
istore_2
lp.start:
iload_2
iload_1
if_icmpge @lp.end
lp.body:
iinc 2, 1
goto @lp.start
lp.end:
iload_2
ireturn
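The split into basic blocks shown above can be computed with the usual "leader" rule: the first instruction, every branch target, and every instruction that follows a branch or return starts a new block. The sketch below applies it to the loop example; the Insn class and its fields are a simplified, hypothetical instruction model, not a real bytecode library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Sketch: splitting a linear instruction list into basic blocks.
class BasicBlocks {
    // Hypothetical instruction model: byte offset, mnemonic, branch target (-1 if none).
    static final class Insn {
        final int offset;
        final String op;
        final int target;

        Insn(int offset, String op, int target) {
            this.offset = offset;
            this.op = op;
            this.target = target;
        }

        boolean endsBlock() {
            return target >= 0 || op.endsWith("return");
        }
    }

    static List<List<Insn>> split(List<Insn> code) {
        // Collect the offsets of all leaders.
        TreeSet<Integer> leaders = new TreeSet<>();
        if (!code.isEmpty()) {
            leaders.add(code.get(0).offset);
        }
        for (int i = 0; i < code.size(); i++) {
            Insn insn = code.get(i);
            if (insn.target >= 0) {
                leaders.add(insn.target);                 // branch target
            }
            if (insn.endsBlock() && i + 1 < code.size()) {
                leaders.add(code.get(i + 1).offset);      // instruction after a branch
            }
        }
        // Start a new block at every leader.
        List<List<Insn>> blocks = new ArrayList<>();
        for (Insn insn : code) {
            if (leaders.contains(insn.offset)) {
                blocks.add(new ArrayList<>());
            }
            blocks.get(blocks.size() - 1).add(insn);
        }
        return blocks;
    }

    // The BC of loop(int x) from the 'for' statement slide.
    static List<Insn> loopExample() {
        List<Insn> code = new ArrayList<>();
        code.add(new Insn(0, "iconst_0", -1));
        code.add(new Insn(1, "istore_2", -1));
        code.add(new Insn(2, "iload_2", -1));
        code.add(new Insn(3, "iload_1", -1));
        code.add(new Insn(4, "if_icmpge", 13));
        code.add(new Insn(7, "iinc", -1));
        code.add(new Insn(10, "goto", 2));
        code.add(new Insn(13, "iload_2", -1));
        code.add(new Insn(14, "ireturn", -1));
        return code;
    }

    public static void main(String[] args) {
        for (List<Insn> block : split(loopExample())) {
            System.out.println("block at " + block.get(0).offset + ", " + block.size() + " instructions");
        }
    }
}
```

For the loop example this yields four blocks starting at offsets 0, 2, 7 and 13, matching entry, lp.start, lp.body and lp.end.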
Changing the Control Flow Graph (C)
● Changes to the CFG can make the code non-reducible (i.e. impossible to express in Java)
● Using goto (can be used in BC but not in Java)
● Loop unrolling
● Inlining methods (removes method boundaries)
● Methods with inlined calls contain the code of their callees (which don't have to be in the same class)
● This cannot be used to remove public methods (unless whole-application obfuscation is used)

Inserting no-op/dead code (C)
● To confuse the reader and/or decompiler
● Adding code that does nothing when executed, or
is never executed ('dead code')
● Changing the recognized instruction patterns
● Only works with javac-specific decompilers
● E.g. adding random instructions after return
● Using instructions usually not found in .class files
● E.g. jsr, pop
● This usually makes the code non-reducible
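A source-level sketch of no-op and dead code, applied to the earlier sign() example. It guards the dead branch with an always-false condition (an "opaque predicate"), which is a common way to do this but is an assumption here, not something the slides describe; real tools insert such code at the BC level, where it can also be made impossible to express in Java. The class and method names are invented.

```java
// Sketch: no-op and dead code added to sign(), written as valid Java.
class DeadCodeDemo {
    static int signPlain(int x) {
        return x >= 0 ? 1 : 0;
    }

    static int signObfuscated(int x) {
        // No-op code: executed, but the result never influences the return value.
        int junk = x ^ 0x5a5a;
        junk += junk << 3;

        // x * (x + 1) is always even (also after int overflow), so this
        // condition is always false and the branch body is dead code.
        if ((x * (x + 1)) % 2 != 0) {
            return junk;
        }
        return x >= 0 ? 1 : 0;
    }

    public static void main(String[] args) {
        for (int x : new int[] {-7, 0, 42, Integer.MAX_VALUE}) {
            System.out.println(signPlain(x) == signObfuscated(x));   // prints true each time
        }
    }
}
```

The compiler cannot prove that the condition is false, so the extra branch survives in the BC and shows up in the decompiled code.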
General issues with obfuscation
● Merely makes decompiling more difficult
● Functional information is still there
● With enough effort, what the program does can be
understood
● Makes debugging more complicated
● Complex transformations have costs
● Higher probability of introducing subtle bugs
● Control and data transformations usually cause a real performance loss

Functional issues
● Name obfuscation (e.g. renaming methods)
● Can break code that uses reflection or JNI
● Reflection is used by serialization, dependency injection and ORM frameworks (e.g. Spring, Hibernate)
● Can break bug reporting (mangled exception trace)
● Renaming should be deterministic across builds (or upgrades could break)
● Adding illegal code that is never executed
● Could break BC verification with newer versions of the JVM
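The reflection problem can be reproduced in a few lines. In this sketch the class names, the greet method and the renamed forms 'a' and 'b' are all hypothetical; the point is only that a lookup by the source-level name fails once the method has been renamed.

```java
import java.lang.reflect.Method;

// Sketch: a reflective lookup by name breaks after renaming.
class ReflectionBreak {
    public static class Original {
        public String greet() {
            return "hello";
        }
    }

    // The same class as a renaming obfuscator might leave it (hypothetical names).
    public static class a {
        public String b() {
            return "hello";
        }
    }

    // Looks the method up by its source-level name, as a framework would.
    // Returns null if no method with that name exists any more.
    static String callGreet(Object target) {
        try {
            Method m = target.getClass().getMethod("greet");
            return (String) m.invoke(target);
        } catch (NoSuchMethodException e) {
            return null;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(callGreet(new Original()));   // prints hello
        System.out.println(callGreet(new a()));          // prints null
    }
}
```

This is why obfuscators that support such frameworks need a way to exclude selected names from renaming.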

Performance issues
● Control/data transforms can slow down the code
● Comparison using 'Scimark2' benchmark

Tool          Base   JBCO     JBCO      JBCO   JBCO    JBCO    ProGuard
Transform.    N/A    Layout   Control   Data   L+C+D   Speed   Layout
Composite     100%   100%     94%       48%    43%     74%     100%
FFT           100%   105%     100%      46%    37%     68%     98%
LU            100%   99%      100%      36%    31%     88%     101%
Monte Carlo   100%   101%     37%       20%    15%     22%     101%
SOR           100%   100%     100%      73%    70%     73%     100%
Sparse MM     100%   100%     101%      59%    54%     81%     100%
BC size       1.00   3.23     3.60      6.67   7.35    5.64    0.66

Who uses obfuscation?
● Mostly companies
● They don't want others to steal their work
● By reusing algorithms and data structures
● By using evaluation software indefinitely
● To protect Intellectual Property
● To prevent loss of revenue

Alternatives
● Hiding the source code behind an interface
● Web app/service (e.g. server-side programming)
● Ahead-of-Time compiler (e.g. GCJ)
● Compiles Java code to machine code
● Requires AOT compilation for all dependencies
● Native 'core' library that contains critical code
● Executed through JNI
● Usually faster, but complicates deployment

Java decompilers – javac-specific
● These tools rely on patterns found in code produced by the official javac compiler
● Easily tripped by data/control flow obfuscation
● Examples
● JAD 1.5.8g (http://www.varaneckas.com/jad)
● JD-GUI 0.3.3 (http://java.decompiler.free.fr/)
● JODE 1.1.2-pre1 (http://jode.sourceforge.net/)

Java decompilers – Dava
● http://www.sable.mcgill.ca/dava/
● Free and open source software (LGPL licence)
● Based on the Soot framework (McGill University)
● Tool-agnostic decompiler
● Does not depend on 'canonical' javac output
● Obfuscation-specific transformations
● Advanced flow-analysis to simplify code output [2]

Java obfuscators – ProGuard
● http://proguard.sourceforge.net/
● Free and open source (GPL licence)
● Main features
● Name obfuscation (with stack trace recovery)
● BC shrinker (removes unused code)
● Unsupported
● Control/data flow obfuscation
● String encryption

Java obfuscators – JBCO
● http://www.sable.mcgill.ca/JBCO/
● Also based on the LGPL Soot framework
● Features
● Name, control and data flow obfuscation
● Configurable obfuscation passes
● Unsupported
● String encryption
● Stack trace recovery

Java obfuscators – Commercial
● "Second-generation" obfuscators
● Features
● Name, control and data flow obfuscation
● String encryption, stack trace recovery
● Examples
● KlassMaster (Zelix)
● Allatori Java Obfuscator (Smardec)
● DashO (PreEmptive Solutions)

Conclusion
● Protecting programs is an issue for companies
● Obfuscators help with this issue, to some extent
● They also bring issues of their own

References / Further Reading
● [1] Taxonomy of Obfuscating Transformations (Christian Collberg)
● [2] Programmer-friendly Decompiled Java (Naeem N., Hendren L.)
http://www.sable.mcgill.ca/publications/techreports/sable-tr-2006-2.pdf
● [3] Protect Your Java Code – Through Obfuscators and Beyond
http://www.excelsior-usa.com/articles/java-obfuscators.html
● [4] Decompiling Java, Chapter 10 (Godfrey Nolan)
http://www.apress.com/downloadable/download/sample/sample_id/1127/
● [5] Covert Java: Techniques for Decompiling [...] (Alex Kalinovsky)
http://www.informit.com/articles/printerfriendly.aspx?p=174368

Questions?

for(void catch 11 final = 0; catch 11 final < |=;) {


void while if = |= - catch 11 final;
if(while if > void)
while if = void;
private01200126013D.readFully(_fld013D013D0120import,
catch 11 final, while if);
catch 11 final += while if;
if(= != null) {
const = (float)catch 11 final / (float)|=;
=.setColor(getForeground());
=.fillRect(0, size().height - 4, (int)(const *
size().width), 4);
}
}
