Student Presentation on Java Obfuscation
Pierre-André Saulais
Outline
● First lecture
● A – Compilation
● B – Decompilation
● Second lecture
● C – Obfuscation
● D – Software
What is an obfuscator?
● Tool that can transform software programs
● Transforms source code (e.g. JavaScript)
● Transforms program binaries (e.g. Java)
● Hinders understanding of how the program works
● Renders the source code illegible to readers (JS)
● Prevent reverse-engineering by decompilers (Java)
● Must not change the program's functionality
Compilation phases – Overview
● A compiler can be seen as:
● Black box that transforms source code to programs
● Pipeline with several stages
● Compilation is all about transformations
● "What does the program look like at each compilation stage?"
Compilation phases – Front-end
● Converts source code to AST and then IR
● Tasks: scanning, parsing, semantic analysis
● Source code (Text)
● Understandable by people, too high level for CPUs
● Abstract Syntax Tree (AST)
● Data structure that is easier to process than text
● 1:1 mapping to source code constructs
● Decompilation is trivial (iterate the tree)
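As a rough illustration of why decompiling an AST is trivial, the sketch below builds a tiny expression tree and recovers source text by walking it. The names Node, Num, Var, Add and AstDemo are hypothetical and are not taken from any real compiler.

```java
// Minimal AST sketch: every node maps 1:1 to a source construct,
// so printing the tree recovers the source text.
interface Node {
    String toSource();
}

final class Num implements Node {
    private final int value;
    Num(int value) { this.value = value; }
    public String toSource() { return Integer.toString(value); }
}

final class Var implements Node {
    private final String name;
    Var(String name) { this.name = name; }
    public String toSource() { return name; }
}

final class Add implements Node {
    private final Node left;
    private final Node right;
    Add(Node left, Node right) { this.left = left; this.right = right; }
    // Recursively print both operands around the operator.
    public String toSource() {
        return "(" + left.toSource() + " + " + right.toSource() + ")";
    }
}

final class AstDemo {
    public static void main(String[] args) {
        Node tree = new Add(new Var("x"), new Add(new Var("y"), new Num(1)));
        System.out.println(tree.toSource()); // prints "(x + (y + 1))"
    }
}
```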
Compilation phases – Middle-end
● Transforms IR
● Tasks: run optimisation passes on IR
● Intermediate Representation (IR)
● Often linear (e.g. Java BC) or graph-based
● More difficult than ASTs to decompile but feasible
● Lower-level than source but not ISA-specific
● More focused on implementation details
● 'if', 'for', etc. constructs replaced by branches and basic blocks (BBs)
● Sub-expressions replaced by values
Compilation phases – Back-end
● Converts IR to machine code
● Tasks: instruction selection, instruction
scheduling, register allocation
● Machine code
● Much more difficult than IR to decompile
● The code has gone through many transformations, e.g.:
● Inlining: several function bodies 'mashed together'
● Instruction scheduling: reordering of instructions
● Register allocation: one register holds N different variables over time
● Very low-level and platform-specific
Why is decompiling possible?
● Before Java, most programming languages were
compiled to native executables (machine code)
● Using optimising compilers
● e.g. C, C++, FORTRAN, etc
● Decompiling such executables is not feasible
● Why is decompiling Java programs feasible?
● Java (and C#) programs are partially compiled
● Next: comparison of both program life-cycles
C program life-cycle: compilation
● Source files (.c) are compiled to object files (.o)
● Objects contain highly optimised machine code
● Machine code is ISA-specific (e.g. X86, PPC, ARM)
● Very low-level
● Non-portable
● Difficult to map to source code
Java program life-cycle: compilation
● Source files (.java) are compiled to .class files
● Class files contain high-level "Java Byte Code" (BC)
● In non-optimised, canonical form (using javac)
● Easily mapped to source code
● Java BC is portable
● Executed by the Java Virtual Machine
● Several different ISAs are supported
● "Write once, run anywhere"
C program life-cycle: distribution
● Object files can be packed into binaries:
● Executables (.exe)
● Dynamically loaded libraries (.dll, .so)
● Before distribution, compilation is over
● No further processing needed
● C programs often distributed as binaries
● Requires binaries for each supported ISA
Java program life-cycle: distribution
● Java programs often distributed as .jar files
● Zip file containing .class files and metadata
● However, .class files are not fully compiled
● At runtime, the JVM loads the classes' BC and either:
● Interprets it
● No processing, but slow execution
● Compiles it to machine code
● Using the HotSpot Just-in-Time compiler
● Heavy processing, but faster execution
Decompilation examples – 1 to 4
[Figures: four decompilation examples]
Java BC – Overview
● Java's IR language
● Linear (like an abstract assembly language)
● Compact (most instructions take 1-5 bytes)
● High-level (compared to machine code)
● BC deals with object references, field names, etc
● Machine code (MC) deals with pointers, offsets, etc
● BC is type-safe while MC generally isn't
Java BC – A stack-based IR
● Most instructions operate on a special stack
● Some instructions push values (e.g. iload)
● Some instructions pop values (e.g. ireturn)
● Some pop inputs and push outputs (e.g. iadd)
● Example (x = 2, y = 3), with the stack before → after each instruction:
    int sum(int x, int y) {
        return x + y;
    }
    0: iload_1     ; [] → [2]
    1: iload_2     ; [2] → [2 3]
    2: iadd        ; [2 3] → [5]
    3: ireturn     ; [5] → []
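The stack behaviour of the sum example can be mimicked with a toy interpreter. This is only a sketch under simplifying assumptions: instructions are plain strings, only the four opcodes of the example are handled, and the StackDemo class and its run method are hypothetical names, not part of any JVM.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy interpreter for the four instructions used by sum(int, int).
// It only mimics the operand stack; a real JVM is far more involved.
final class StackDemo {
    static int run(String[] code, int[] locals) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (String insn : code) {
            switch (insn) {
                case "iload_1": stack.push(locals[1]); break; // push local #1
                case "iload_2": stack.push(locals[2]); break; // push local #2
                case "iadd": {
                    int b = stack.pop();                      // pop both inputs
                    int a = stack.pop();
                    stack.push(a + b);                        // push the output
                    break;
                }
                case "ireturn": return stack.pop();           // pop the result
                default: throw new IllegalArgumentException("unknown: " + insn);
            }
        }
        throw new IllegalStateException("missing ireturn");
    }

    public static void main(String[] args) {
        String[] sum = { "iload_1", "iload_2", "iadd", "ireturn" };
        int[] locals = { 0, 2, 3 }; // index 0 stands in for 'this'
        System.out.println(run(sum, locals)); // prints 5
    }
}
```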
Java BC – Locals
● Each method has an array of locals, which contains:
1) The special this value (instance methods only; by convention, index = 0)
2) The method's n arguments (index in [1; n])
3) The method's local variables (index > n)
    int add_five(int x) {
        int a = 5;
        return x + a;
    }
    0: iconst_5
    1: istore_2
    2: iload_2
    3: iload_1
    4: iadd
    5: ireturn
Java BC – 'if' statement
● Implemented as conditional branch to 'else' block
● The 'then' block is executed with fall-through
    int sign(int x) {
        int n;
        if(x >= 0)
            n = 1;
        else
            n = 0;
        return n;
    }
    00: iload_1
    01: iflt 9       ; if(x < 0) goto 'if.else'
    04: iconst_1     ; 'if.then' block
    05: istore_2
    06: goto 11      ; goto 'if.end'
    09: iconst_0     ; 'if.else' block
    10: istore_2
    11: iload_2      ; 'if.end' block
    12: ireturn
Classification of transformations
● Three main classes of transformations [1]
● Layout obfuscation (noted 'L' in next slides)
● Changes how symbols are named
● Data obfuscation (noted 'D')
● Changes how data is stored
● Control obfuscation (noted 'C')
● Transforms the Control Flow Graph
Debugging information removal (L)
● javac can produce .class files with:
● Local variable maps (local index → variable name), with '-g'
● Line number maps (instruction → source line), by default
● Both can be turned off with the '-g:none' option (no
obfuscation tool needed); current javac versions ignore '-O'
● This does make the decompiled code more
difficult to understand
Name obfuscation (L)
● Changes the name of each class/method/field to a
unique, random name
● Can use Java keywords, operators or spaces
● Methods with different signatures can share
names (method overloading)
● The public API is usually left untouched (though this is not required)
● All uses of names have to be changed
● Even outside of the class/JAR (e.g. dependencies)
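To make the renaming idea concrete, here is a hand-written approximation of what a renamer's output might look like at source level. The Account class and its members are hypothetical, and a real obfuscator would rewrite the names in the .class files rather than in Java source.

```java
// Before: descriptive names reveal intent.
final class Account {
    private int balance;
    void deposit(int amount) { balance += amount; }
    int getBalance() { return balance; }
}

// After (approximation): the field and both methods are all called 'a'.
// This is legal because fields and methods live in separate namespaces
// and the two methods have different parameter lists (overloading).
final class A {
    private int a;
    void a(int b) { a += b; }
    int a() { return a; }
}

final class RenameDemo {
    public static void main(String[] args) {
        Account account = new Account();
        account.deposit(10);
        A a = new A();
        a.a(10);
        // Same behaviour, different names.
        System.out.println(account.getBalance() == a.a()); // prints true
    }
}
```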
Local variable access obfuscation (D)
● Changes the order of the locals index table
● Especially useful if the decompiler assumes that
this has index zero
● Such decompilers can produce non-reducible code:
this = this + 1;
● Packs small variables into larger variables
● Access to these variables requires bit masking and
shifting operations
● Cf. earlier decompilation example
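The packing of small variables can be sketched as follows, assuming two 16-bit values stored in one int. The PackDemo class and its method names are made up for illustration; an obfuscator would emit the equivalent masking and shifting instructions directly in the bytecode.

```java
// Two 16-bit values stored in one int:
// 'lo' lives in bits 0-15 and 'hi' in bits 16-31.
final class PackDemo {
    static int pack(int hi, int lo) {
        return (hi << 16) | (lo & 0xFFFF);
    }

    static int hi(int packed) { return packed >>> 16; }    // upper half, unsigned
    static int lo(int packed) { return packed & 0xFFFF; }  // lower half

    // Increment only the low variable, leaving the high one untouched.
    static int incrementLo(int packed) {
        return (packed & 0xFFFF0000) | ((lo(packed) + 1) & 0xFFFF);
    }

    public static void main(String[] args) {
        int packed = pack(7, 41);
        packed = incrementLo(packed);
        System.out.println(hi(packed) + " " + lo(packed)); // prints "7 42"
    }
}
```

What was a simple increment of one variable becomes a mask, an add and an OR, which is the kind of noise a reader of the decompiled code has to see through.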
String encryption (D)
● String constants used in the class are scrambled
using symmetric encryption
● Example: the string "Wrong password" can be
encrypted as "Cgxzw5cuofh{j1"
● These strings are decrypted at run-time
● Requires the decryption code to be in the BC
● Adds confusion to the decompiled code but
generally straightforward to reverse
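A minimal sketch of the idea, assuming a repeating-key XOR as the symmetric scheme. This is not the scheme that produced the example string above, and the key and class name are invented for illustration.

```java
// Toy "string encryption": XOR each character with a repeating key.
// XOR is its own inverse, so one method both scrambles and unscrambles.
final class StringCryptDemo {
    // Hypothetical key; it has to be embedded in the program itself.
    private static final String KEY = "k3y";

    static String xor(String text) {
        char[] out = new char[text.length()];
        for (int i = 0; i < out.length; i++) {
            out[i] = (char) (text.charAt(i) ^ KEY.charAt(i % KEY.length()));
        }
        return new String(out);
    }

    public static void main(String[] args) {
        // Done once, at obfuscation time.
        String scrambled = xor("Wrong password");
        // Only the scrambled constant would appear in the class file;
        // the program calls xor() again at run-time to recover the text.
        System.out.println(xor(scrambled)); // prints "Wrong password"
    }
}
```

Because both the key and the decryption routine ship with the program, anyone who finds xor() can apply it to every constant, which is why this transformation is easy to reverse.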
Control Flow Graph
● A function can be broken down into Basic Blocks
● Longest sequence of instructions entered only at the top, with a branch only at the end
● CFG: graph with BBs as nodes, branches as edges
entry:
iconst_0
istore_2
lp.start:
iload_2
iload_1
if_icmpge @lp.end
lp.body:
iinc 2, 1
goto @lp.start
lp.end:
iload_2
ireturn
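The listing above can be written out as a graph by hand. The sketch below stores its CFG as an adjacency map and gives a Java loop with the same shape; CfgDemo, buildCfg and count are hypothetical names, and count is made static for brevity, whereas the listing's local indices (x at 1, the counter at 2) correspond to an instance method.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class CfgDemo {
    // Basic blocks as nodes, branches and fall-throughs as edges.
    static Map<String, List<String>> buildCfg() {
        Map<String, List<String>> cfg = new LinkedHashMap<>();
        cfg.put("entry", Arrays.asList("lp.start"));             // fall-through
        cfg.put("lp.start", Arrays.asList("lp.end", "lp.body")); // taken / not taken
        cfg.put("lp.body", Arrays.asList("lp.start"));           // back edge (the loop)
        cfg.put("lp.end", Collections.<String>emptyList());      // ireturn: no successor
        return cfg;
    }

    // A Java loop with the same control flow as the listing.
    static int count(int x) {
        int i = 0;          // entry
        while (i < x) {     // lp.start
            i++;            // lp.body
        }
        return i;           // lp.end
    }

    public static void main(String[] args) {
        for (Map.Entry<String, List<String>> e : buildCfg().entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
        System.out.println(count(5)); // prints 5
    }
}
```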
Changing the Control Flow Graph (C)
● Changes to the CFG can make the code non-
reducible (i.e. impossible to express in Java)
● Using goto (can be used in BC but not in Java)
● Loop unrolling
● Inlining methods (removes method boundaries)
● After inlining, a method contains the code of all of its
callees (which do not have to be in the same class)
● This cannot be used to remove public methods
(unless whole-application obfuscation is used)
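Inlining is easiest to see at source level. In this sketch the helper square is copied into its caller, so the name and the method boundary that documented the intent disappear. The class and method names are illustrative only, and an obfuscator would perform this on the bytecode.

```java
final class InlineDemo {
    // Before: the helper's name and boundary document the intent.
    static int square(int v) {
        return v * v;
    }

    static int distanceSquared(int dx, int dy) {
        return square(dx) + square(dy);
    }

    // After: the callee's body has been copied into the caller.
    static int distanceSquaredInlined(int dx, int dy) {
        return dx * dx + dy * dy;
    }

    public static void main(String[] args) {
        System.out.println(distanceSquared(3, 4));        // prints 25
        System.out.println(distanceSquaredInlined(3, 4)); // prints 25
    }
}
```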
Inserting no-op/dead code (C)
● To confuse the reader and/or decompiler
● Adding code that does nothing when executed, or
is never executed ('dead code')
● Changing the recognized instruction patterns
● Only works with javac-specific decompilers
● E.g. adding random instructions after return
● Using instructions usually not found in .class files
● E.g. jsr, pop
● This usually makes the code non-reducible
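A source-level approximation of no-op and dead code is shown below. It uses a condition that is always false (often called an opaque predicate) to guard code that never runs. The bytecode-only tricks listed above, such as instructions after a return or jsr, cannot be written in Java source, so this is only an analogy; the class and method names are hypothetical.

```java
final class DeadCodeDemo {
    static int original(int x) {
        return x + 1;
    }

    static int obfuscated(int x) {
        int noise = x ^ x; // no-op: always 0
        // x * (x + 1) is a product of two consecutive integers, so it is
        // always even. Overflow does not change the lowest bit, so this
        // branch is never taken.
        if ((x * (x + 1)) % 2 != 0) {
            return -x;     // dead code
        }
        return x + 1 + noise;
    }

    public static void main(String[] args) {
        boolean same = true;
        for (int x = -1000; x <= 1000; x++) {
            same &= original(x) == obfuscated(x);
        }
        System.out.println(same); // prints true
    }
}
```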
General issues with obfuscation
● Merely makes decompiling more difficult
● Functional information is still there
● With enough effort, what the program does can be
understood
● Makes debugging more complicated
● Complex transformations have costs
● Higher probability of introducing subtle bugs
● Usually there is a real performance loss
Functional issues
● Name obfuscation (e.g. renaming methods)
● Can break code that uses reflection or JNI
● Reflection is used by serialization, dependency injection
and ORM frameworks (e.g. Spring, Hibernate)
● Can break bug reporting (mangled exception trace)
● Should be deterministic (or upgrades could break)
● Adding illegal code that is never executed
● Could break BC verification with newer versions of
the JVM
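The reflection problem can be reproduced in a few lines. In this sketch, Renamed stands in for a class whose method was renamed from a hypothetical 'computeTotal' to 'a'; code that still looks the method up by its old name, as reflective frameworks do, no longer finds it.

```java
import java.lang.reflect.Method;

final class ReflectionDemo {
    // Stand-in for a class after name obfuscation:
    // the method used to be called 'computeTotal'.
    static final class Renamed {
        public int a() { return 42; }
    }

    // Looks up a no-argument method by a name stored as a string.
    static boolean hasMethod(Class<?> type, String name) {
        try {
            Method m = type.getDeclaredMethod(name);
            return m != null;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasMethod(Renamed.class, "computeTotal")); // prints false
        System.out.println(hasMethod(Renamed.class, "a"));            // prints true
    }
}
```

The string "computeTotal" is ordinary data, so a renamer cannot tell that it refers to a method unless it is configured to keep that name.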
Performance issues
● Control/data transforms can slow down the code
● Comparison using 'Scimark2' benchmark
Who uses obfuscation?
● Mostly companies
● They don't want others to steal their work
● By reusing algorithms and data structures
● By using evaluation software indefinitely
● To protect Intellectual Property
● To prevent loss of revenue
Alternatives
● Hiding the source code behind an interface
● Web app/service (e.g. server-side programming)
● Ahead-of-Time compiler (e.g. GCJ)
● Compiles Java code to machine code
● Requires AOT compilation for all dependencies
● Native 'core' library that contains critical code
● Executed through JNI
● Usually faster, but complicates deployment
Java decompilers – javac-specific
● These tools rely on patterns found in code
produced by the official javac compiler
● Easily tripped by data/control flow obfuscation
● Examples
● JAD 1.5.8g (http://www.varaneckas.com/jad)
● JD-GUI 0.3.3 (http://java.decompiler.free.fr/)
● JODE 1.1.2-pre1 (http://jode.sourceforge.net/)
Java decompilers – Dava
● http://www.sable.mcgill.ca/dava/
● Free and open source software (LGPL licence)
● Based on the Soot framework (McGill University)
● Tool-agnostic decompiler
● Does not depend on 'canonical' javac output
● Obfuscation-specific transformations
● Advanced flow-analysis to simplify code output [2]
Java obfuscators – Proguard
● http://proguard.sourceforge.net/
● Free and open source (GPL licence)
● Main features
● Name obfuscation (with stack trace recovery)
● BC shrinker (removes unused code)
● Unsupported
● Control/data flow obfuscation
● String encryption
Java obfuscators – JBCO
● http://www.sable.mcgill.ca/JBCO/
● Also based on the LGPL Soot framework
● Features
● Name, control and data flow obfuscation
● Configurable obfuscation passes
● Unsupported
● String encryption
● Stack trace recovery
Java obfuscators – Commercial
● "Second-generation" obfuscators
● Features
● Name, control and data flow obfuscation
● String encryption, stack trace recovery
● Examples
● KlassMaster (Zelix)
● Allatori Java Obfuscator (Smardec)
● DashO (PreEmptive Solutions)
Conclusion
● Protecting programs is an issue for companies
● Obfuscators help with this issue, to some extent
● They also bring issues of their own
References / Further Reading
● [1] Taxonomy of Obfuscating Transformations (Christian Collberg)
● [2] Programmer-friendly Decompiled Java (Naeem N., Hendren L.)
http://www.sable.mcgill.ca/publications/techreports/sable-tr-2006-2.pdf
● [3] Protect Your Java Code – Through Obfuscators and Beyond
http://www.excelsior-usa.com/articles/java-obfuscators.html
● [4] Decompiling Java, Chapter 10 (Godfrey Nolan)
http://www.apress.com/downloadable/download/sample/sample_id/1127/
● [5] Covert Java: Techniques for Decompiling [...] (Alex Kalinovsky)
http://www.informit.com/articles/printerfriendly.aspx?p=174368
Questions?