STATIC ANALYSIS OF JAVA ENTERPRISE APPLICATIONS
FRAMEWORKS AND CACHES, THE ELEPHANTS IN THE ROOM
Anastasios Antoniadis (University of Athens), Nikos Filippakis (CERN), Paddy Krishnan (Oracle Labs Australia),
Raghavendra Ramesh (ConsenSys), Nicholas Allen (Oracle Labs Australia), Yannis Smaragdakis (University of Athens)
ENTERPRISE APPLICATIONS + STATIC ANALYSIS =
• Big success in the Java world
• Overlooked failure of the Java static analysis world
• Popular Java points-to analysis frameworks: Soot, WALA, Doop
– Virtually zero coverage out of the box
– Open challenges: completeness, precision, scalability
CHALLENGES OF JAVA ENTERPRISE APPLICATIONS
COMPLETENESS
• Web Frameworks
– Layers of abstraction ease development
• Dynamic techniques
– e.g., Dependency Injection
– Configurability (annotations, xml)
– Custom implementations of JavaEE
• Supporting each framework individually is unsustainable
“Where do I start from?”
ANOTHER CHALLENGE:
FRAMEWORK CACHES
• Frameworks employ caches to achieve transparent performance
– Identity map pattern
– Caching of views, beans
– Heterogeneous data structures
• Detrimental to analysis scalability and precision
– Complexity of analyzing java.util.* maps
– Alternative back-end data structures
– Erased generics
JackEE TO THE RESCUE
THIS PAPER’S CONTRIBUTIONS
– General-yet-customizable framework modeling
• Completeness
– An impressive average in-app reachability: 58%
– 14.48% in Doop
– Virtually zero in Soot, WALA
– Sound-modulo-analysis modeling of map structures
• 6x speedup on average
• Higher precision on multiple metrics
– Evaluated on popular real-world benchmarks
• A benchmarking suite for the future!
FRAMEWORK-AGNOSTIC MODELING
• JackEE’s modeling of frameworks
– Declarative implementation
• Extends Doop
– Defines a common simplified vocabulary
– Processes program inputs (incl. annotations, XML)
– Produces framework-independent outputs
• Entry points - Discovery and exercise
• Bean objects - Generation and interconnection
JackEE’S GENERALIZED VOCABULARY
Inputs
• Class annotations
• Method annotations
• Field annotations
• XML configuration files
Outputs
Loose JavaEE terms
• Servlet(c : Class)
• Controller(c : Class)
• Interceptor(c : Class)
• Bean(c : Class)
• BeanFieldInjection(c : Class, f : Field, o : Value)
General points-to terms
• EntryPointClass(c : Class)
• ExercisedEntryPointMethod(m : Method)
• GeneratedObject(o : Value, c : Class)
• JackEE’s outputs
– Used by the points-to analysis
– Use the points-to analysis information to infer further points-to (mutual recursion)
SAMPLE USE OF VOCABULARY
ENTRY POINT DISCOVERY RULES
• Subtyping, annotations, xml configuration
– In-app servlet discovery
Servlet(class) :- ConcreteApplicationClass(class),
SubtypeOf(class, "javax.servlet.GenericServlet").
– Spring controller discovery
Controller(class) :- ConcreteApplicationClass(class),
Class_Annotation(class,"org.spring...@Controller").
– Interceptor discovery (e.g., Spring authentication providers)
Interceptor(class) :- XMLNode(file, nodeId, _, _, "authentication-provider"),
XMLNodeAttr(file, nodeId, _, _, providerId),
Bean_Id(class, providerId).
– Entry point accumulation rule
EntryPointClass(class) :- Servlet(class);
Controller(class);
Interceptor(class).
– Addresses: completeness
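The discovery rules above can be read as plain set computations. The following is a minimal Java sketch of that reading, not JackEE's implementation: the `ClassFacts` type, the method name, and the sample inputs are hypothetical, the controller annotation is passed in as a parameter, and the accumulation step is assumed to be a union of the three categories.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative only: mirrors the shape of the discovery rules, not JackEE's code. */
final class EntryPointDiscoverySketch {

    /** Hypothetical fact record for one concrete application class. */
    static final class ClassFacts {
        final String name;
        final Set<String> supertypes;
        final Set<String> annotations;

        ClassFacts(String name, List<String> supertypes, List<String> annotations) {
            this.name = name;
            this.supertypes = new HashSet<>(supertypes);
            this.annotations = new HashSet<>(annotations);
        }
    }

    /** Returns every class matched by the servlet, controller, or interceptor rule. */
    static Set<String> entryPointClasses(List<ClassFacts> classes,
                                         String controllerAnnotation,
                                         List<String> authProviderBeanIds,
                                         Map<String, String> beanIdToClass) {
        Set<String> entryPoints = new TreeSet<>();
        for (ClassFacts c : classes) {
            // Servlet rule: the class is a subtype of javax.servlet.GenericServlet.
            if (c.supertypes.contains("javax.servlet.GenericServlet")) {
                entryPoints.add(c.name);
            }
            // Controller rule: the class carries the controller annotation.
            if (c.annotations.contains(controllerAnnotation)) {
                entryPoints.add(c.name);
            }
        }
        // Interceptor rule: bean ids referenced by authentication-provider XML nodes.
        for (String beanId : authProviderBeanIds) {
            String className = beanIdToClass.get(beanId);
            if (className != null) {
                entryPoints.add(className);
            }
        }
        return entryPoints;
    }
}
```

A class only needs to match one rule to become an entry point; bean ids with no known class are skipped rather than guessed.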
SAMPLE USE OF VOCABULARY
WIRING TOGETHER BEANS
• Dependency injection patterns through annotations/XML configuration
– Field injection discovery
FieldInjection(class, field, beanObject) :- Field_Annotation(field, "@Inject"),
Field_DeclaringType(field, class),
Bean_Id(bean, field),
GeneratedObject(beanObject, bean).
– Wiring beans together
ObjectFieldPointsTo(object, field, beanObject) :- FieldInjection(class, field, beanObject),
Value_Type(object, class).
– Addresses: completeness
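To see why the wiring rules are needed, it helps to look at what a container does at run time. The sketch below is a toy container, assuming a hypothetical `@Inject` annotation and the convention that a bean id equals the field name (as in the `Bean_Id(bean, field)` atom above). The field is only ever written reflectively, so a points-to analysis that follows ordinary assignments sees it as empty unless a rule like `ObjectFieldPointsTo` supplies the value.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Map;

/** Illustrative only: a toy container, not any real framework's API. */
final class FieldInjectionSketch {

    /** Hypothetical stand-in for a framework's injection annotation. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Inject {}

    static class Repository {}

    static class Service {
        @Inject Repository repository; // never assigned in application code
    }

    /** Sets each @Inject field to the bean whose id equals the field name. */
    static void inject(Object target, Map<String, Object> beansById) {
        for (Field field : target.getClass().getDeclaredFields()) {
            if (!field.isAnnotationPresent(Inject.class)) {
                continue;
            }
            Object bean = beansById.get(field.getName());
            if (bean == null) {
                continue; // no bean registered under this id
            }
            try {
                field.setAccessible(true);
                field.set(target, bean); // the only write to the field is reflective
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
    }
}
```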
JackEE POINTS-TO
A RECURSIVE RELATIONSHIP
• JackEE uses points-to information to infer further points-to information
– Case in point:
bean = context.getBean("beanId");
VarPointsTo(local, beanObject) :- GetBeanInvocation(invocation),
ActualParam(0, invocation, actualParam),
VarPointsTo(actualParam, beanId),
Bean_Id(bean, beanId),
GeneratedObject(beanObject, bean),
AssignReturnValue(invocation, local).
• Addresses: completeness
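The mutual recursion in the getBean rule can be illustrated with a small fixpoint loop. This is a simplified sketch under stated assumptions, not Doop's or JackEE's evaluation strategy: variables and abstract values are plain strings, `moves` models assignments, and all names are hypothetical. The bean object only becomes known once the string constant has flowed to the call's argument, and that new fact can in turn feed further assignments.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative only: naive fixpoint over a tiny, hypothetical fact base. */
final class GetBeanResolutionSketch {

    /**
     * @param initialPointsTo variable -> abstract values known up front
     * @param moves           pairs {to, from} modelling "to = from"
     * @param getBeanCalls    pairs {argumentVar, resultVar} modelling "resultVar = getBean(argumentVar)"
     * @param beanIdToObject  bean id -> abstract bean object
     */
    static Map<String, Set<String>> solve(Map<String, Set<String>> initialPointsTo,
                                          List<String[]> moves,
                                          List<String[]> getBeanCalls,
                                          Map<String, String> beanIdToObject) {
        Map<String, Set<String>> pointsTo = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : initialPointsTo.entrySet()) {
            pointsTo.put(e.getKey(), new TreeSet<>(e.getValue()));
        }
        boolean changed = true;
        while (changed) { // repeat until no rule derives a new fact
            changed = false;
            for (String[] move : moves) {
                changed |= addAll(pointsTo, move[0], valuesOf(pointsTo, move[1]));
            }
            for (String[] call : getBeanCalls) {
                Set<String> beans = new TreeSet<>();
                for (String value : valuesOf(pointsTo, call[0])) {
                    String beanObject = beanIdToObject.get(value);
                    if (beanObject != null) {
                        beans.add(beanObject); // points-to facts produce new points-to facts
                    }
                }
                changed |= addAll(pointsTo, call[1], beans);
            }
        }
        return pointsTo;
    }

    private static Set<String> valuesOf(Map<String, Set<String>> pointsTo, String var) {
        Set<String> values = pointsTo.get(var);
        return values == null ? Collections.<String>emptySet() : values;
    }

    private static boolean addAll(Map<String, Set<String>> pointsTo, String var, Set<String> values) {
        if (values.isEmpty()) {
            return false;
        }
        Set<String> target = pointsTo.get(var);
        if (target == null) {
            target = new TreeSet<>();
            pointsTo.put(var, target);
        }
        return target.addAll(new ArrayList<>(values)); // copy guards against self-assignment
    }
}
```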
WHAT ABOUT CACHES?
SCALABILITY CHALLENGES
• Blowup in java.util points-to (2objH)
– The most precise practical analysis
– Most of it attributed to maps
“Wait, but why?”
• Lots of internal complexity in maps
• Points-to evaluated for all backend data structures
– Map “treeification” optimization
• Red-Black tree backend
– Significant needless overhead
2objH computation time (% of total)
Benchmark          java.util time   non-java.util time
alfresco           72               28
bitbucket-server   76               24
opencms            46               54
pybbs              69               31
shopizer           64               36
SpringBlog         68               32
WebGoat            70               30
WHAT ABOUT CACHES?
PRECISION CHALLENGES
• java.util.* maps feature a double-dispatch-like pattern
final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
    …
    else if (p instanceof TreeNode)
        e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
    …
}
• Degrades the precision of most context-sensitive analyses, e.g., 2objH
• putVal context is information-rich:
[<HashMap allocator receiver object>, <HashMap object>]
• putTreeVal context is not:
[<HashMap object>, <TreeNode object>]
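The two contexts above can be reproduced with a deliberately simplified model of a 2-object-sensitive analysis with one element of heap context. In this sketch (all names hypothetical), an abstract object is its allocation site plus the allocation site of the receiver that created it, and a method is analyzed under the context [heap context of receiver, receiver]. Two maps created by different owners get distinct contexts for a call on the map itself, but calls on their tree nodes collapse into one context.

```java
/** Illustrative only: a toy model of object-sensitive contexts, not Doop's implementation. */
final class ObjectSensitivitySketch {

    /** Abstract heap object: allocation site plus one element of heap context. */
    static final class AbstractObject {
        final String allocSite;
        final String heapContext; // allocation site of the receiver that allocated this object

        AbstractObject(String allocSite, String heapContext) {
            this.allocSite = allocSite;
            this.heapContext = heapContext;
        }
    }

    /** Models "new" at allocSite inside a method whose receiver is allocator. */
    static AbstractObject allocate(String allocSite, AbstractObject allocator) {
        return new AbstractObject(allocSite, allocator.allocSite);
    }

    /** Context under which a method invoked on receiver is analyzed. */
    static String calleeContext(AbstractObject receiver) {
        return "[" + receiver.heapContext + ", " + receiver.allocSite + "]";
    }

    public static void main(String[] args) {
        AbstractObject cacheA = new AbstractObject("CacheA", "main");
        AbstractObject cacheB = new AbstractObject("CacheB", "main");
        AbstractObject mapA = allocate("HashMap", cacheA);
        AbstractObject mapB = allocate("HashMap", cacheB);

        // putVal runs on the map: the owner is still visible in the context.
        System.out.println(calleeContext(mapA)); // [CacheA, HashMap]
        System.out.println(calleeContext(mapB)); // [CacheB, HashMap]

        // putTreeVal runs on a node allocated by the map: the owner is gone.
        System.out.println(calleeContext(allocate("TreeNode", mapA))); // [HashMap, TreeNode]
        System.out.println(calleeContext(allocate("TreeNode", mapB))); // [HashMap, TreeNode]
    }
}
```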
SOUND-MODULO-ANALYSIS MODELING
• Given
S : the analysis
C : the code entity
C’: the sound-modulo-analysis model of C
Then S(C’) must model all the dynamic behaviors of C
• C’ can make the analysis either more or less precise (under a fixed S)
– In our case C’ significantly improves precision!
SOUND-MODULO-ANALYSIS MODELING OF MAPS
HASHMAP SNIPPET
• S: flow-insensitive, array-insensitive, path-flow-insensitive analysis
[Figure: HashMap snippet shown side by side, original JDK code vs. JackEE model]
SOUND-MODULO-ANALYSIS MODELING
• Captures all the original behaviors of Java maps (e.g., exceptions)
• Removes complexity from the most-reused part of the library
– Greatest complexity-removal factor: treeification elimination
• Treeification converts the map’s bins to red-black trees
– Fewer local aliases
– With simpler code, context sensitivity retains greater precision
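As a rough picture of what is left once treeification is removed, the following is a hypothetical chained hash map with linked bins only. It is a sketch for illustration, not JackEE's replacement class and not the JDK's code: the class name, the fixed capacity, and the omission of resizing are all simplifying assumptions. There is no `TreeNode` type and no dispatch from the map into node methods, so every operation is analyzed in the map's own context.

```java
import java.util.Objects;

/** Illustrative only: a chained map without tree bins; not the JDK's or JackEE's code. */
final class SimplifiedMapSketch<K, V> {

    /** Singly linked bin entry; the only node kind in this model. */
    private static final class Node<K, V> {
        final K key;
        V value;
        final Node<K, V> next;

        Node(K key, V value, Node<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Node<K, V>[] table = newTable(16); // fixed capacity: resizing omitted for brevity
    private int size;

    @SuppressWarnings("unchecked")
    private static <K, V> Node<K, V>[] newTable(int capacity) {
        return (Node<K, V>[]) new Node[capacity];
    }

    private int indexFor(Object key) {
        return (key == null ? 0 : key.hashCode() & 0x7fffffff) % table.length;
    }

    /** Associates value with key and returns the previous value, or null if there was none. */
    V put(K key, V value) {
        int index = indexFor(key);
        for (Node<K, V> n = table[index]; n != null; n = n.next) {
            if (Objects.equals(n.key, key)) {
                V old = n.value;
                n.value = value;
                return old;
            }
        }
        table[index] = new Node<>(key, value, table[index]); // prepend to the bin
        size++;
        return null;
    }

    /** Returns the value mapped to key, or null if the key is absent. */
    V get(Object key) {
        for (Node<K, V> n = table[indexFor(key)]; n != null; n = n.next) {
            if (Objects.equals(n.key, key)) {
                return n.value;
            }
        }
        return null;
    }

    int size() {
        return size;
    }
}
```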
EVALUATION
Benchmarks
• 8 well-known free applications
• 7/8 open-source (available on GitHub)
System
• 2x12-core Intel Xeon
• 640GB RAM
• Analyses running on 16 threads
Three axes of evaluation
• Completeness
• Speed
• Precision
A REAL-WORLD BENCHMARKING SUITE!
Benchmark Description Gitstar Organization/User Rank
alfresco CMS 2,088 550 1,947
bitbucket-server On-premise version of Bitbucket N/A N/A N/A
dotCMS CMS 612 400 4,624
opencms CMS 522 400 5,092
pybbs Website building framework 1,109 524 3,895
shopizer e-commerce framework 1,643 1.6k 4,040
SpringBlog Blog system 1,548 716 3,568
WebGoat The popular OWASP app 3,701 1.5k 1,234
IMPRESSIVE COMPLETENESS
HIGHER THAN PLAIN OLD BENCHMARKS (DACAPO)!
• JackEE averages 58.04% in-app reachability
– Doop averages 42.89% in-app reachability for DaCapo
• The established standard!
• Without JackEE Doop averages 14.48% in-app reachability
– 1.8% for alfresco
– 0.0% for pybbs
[Figure: app reachable methods (%) per benchmark (alfresco, bitbucket, dotCMS, opencms, pybbs, shopizer, SpringBlog, WebGoat); series: Doop reachable, JackEE reachable]
SPEED
JackEE ORIGINAL JDK 8 VS SOUND-MODULO-ANALYSIS JDK 8
• Average speedup for mod-2objH: 5.9x
– Just by replacing HashMap, LinkedHashMap, ConcurrentHashMap!
PRECISION
JackEE ORIGINAL JDK 8 VS SOUND-MODULO-ANALYSIS JDK 8
Benchmark          Avg. vpt size reduction   Avg. app vpt size reduction   # CallGraphEdge reduction
alfresco           24.2%                     19.4%                         7.4%
bitbucket-server   42.3%                     25.7%                         8.1%
dotCMS             N/A                       N/A                           N/A
opencms            13.3%                     8.2%                          1.8%
pybbs              33.7%                     24.3%                         8.9%
shopizer           30.3%                     27.0%                         6.0%
SpringBlog         44.6%                     28.7%                         8.6%
WebGoat            30.2%                     6.0%                          4.4%
Average            28.7%                     19.9%                         6.5%
CONCLUSION
• JackEE’s contributions
– Automatic, declarative and extendable framework modeling
• Impressive completeness
– Sound-modulo-analysis modeling of maps
• Maintains soundness
• Achieves high scalability
• Significantly improves precision
AAAND CUT!