Static Analysis of Java Enterprise Applications

This paper presents JackEE, a framework for modeling Java enterprise applications to improve static analysis. JackEE provides customizable modeling of frameworks to increase analysis completeness. It also models maps in a sound but simplified way to improve precision and scalability.

STATIC ANALYSIS OF JAVA ENTERPRISE APPLICATIONS
FRAMEWORKS AND CACHES, THE ELEPHANTS IN THE ROOM

Anastasios Antoniadis (University of Athens)
Nikos Filippakis (CERN)
Paddy Krishnan (Oracle Labs Australia)
Raghavendra Ramesh (ConsenSys)
Nicholas Allen (Oracle Labs Australia)
Yannis Smaragdakis (University of Athens)
ENTERPRISE APPLICATIONS + STATIC ANALYSIS =
• Big success in the Java world
• Overlooked failure of the Java static analysis world
• Popular Java points-to analysis frameworks: Soot, WALA, Doop
– Virtually zero coverage out of the box
– Completeness:
– Precision:
– Scalability:

2
CHALLENGES OF JAVA ENTERPRISE APPLICATIONS
COMPLETENESS
• Web Frameworks
– Layers of abstraction ease development
• Dynamic techniques
– e.g., Dependency Injection
– Configurability (annotations, xml)
– Custom implementations of JavaEE
• Supporting each framework individually is unsustainable
“Where do I start from?”

3
ANOTHER CHALLENGE:
FRAMEWORK CACHES

Frameworks employ caches to achieve transparent performance


Identity map pattern
Heterogeneous data structures
Caching of views, beans

Detrimental to analysis scalability and precision


Complexity of analyzing java.util.* Maps
Erased generics
Alternative back-end data structures

4
JackEE TO THE RESCUE
THIS PAPER’S CONTRIBUTIONS

– General-yet-customizable framework modeling


• Completeness
– An impressive average in-app reachability: 58%
– 14.48% in Doop
– Virtually zero in Soot, WALA
– Sound-modulo-analysis modeling of map structures
• 6x speedup on average
• Higher precision on multiple metrics
– Evaluated on popular real-world benchmarks
• A benchmarking suite for the future!

5
FRAMEWORK-AGNOSTIC MODELING
• JackEE’s modeling of frameworks
– Declarative implementation
• Extends Doop
– Defines a common simplified vocabulary
– Processes program inputs (incl. annotations, xml)
– Produces framework-independent outputs
• Entry points - Discovery and exercise
• Bean objects - Generation and interconnection

6
JackEE’S GENERALIZED VOCABULARY

Inputs
• Class annotations
• Method annotations
• Field annotations
• XML configuration files

Outputs
Loose JavaEE terms
• Servlet(c : Class)
• Controller(c : Class)
• Interceptor(c : Class)
• Bean(c : Class)
General points-to terms
• EntryPointClass(c : Class)
• ExercisedEntryPointMethod(m : Method)
• BeanFieldInjection(c : Class, f : Field, o : Value)
• GeneratedObject(o : Value, c : Class)

• JackEE’s outputs
– Used by the points-to analysis
– Use the points-to analysis information to infer further points-to (mutual recursion)
7
SAMPLE USE OF VOCABULARY
ENTRY POINT DISCOVERY RULES
• Subtyping, annotations, xml configuration
– In-app servlet discovery
Servlet(class) :- ConcreteApplicationClass(class),
SubtypeOf(class, "javax.servlet.GenericServlet").

– Spring controller discovery


Controller(class) :- ConcreteApplicationClass(class),
Class_Annotation(class,"org.spring...@Controller").
– Interceptor discovery (e.g., Spring authentication providers)
Interceptor(class) :- XMLNode(file, nodeId, _, _, "authentication-provider"),
XMLNodeAttr(file, nodeId, _, _, providerId),
Bean_Id(class, providerId).

– Entry point accumulation rule


EntryPointClass(class) :- Servlet(class).
EntryPointClass(class) :- Controller(class).
EntryPointClass(class) :- Interceptor(class).
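
The discovery rules above can be approximated in plain Java to make the intended logic concrete. This is only a sketch under assumptions: the `ClassFacts` holder, the method name, and the way the controller annotation and provider ids are passed in are all hypothetical and are not part of JackEE or Doop. A class is treated as an entry point when any one of the servlet, controller, or interceptor conditions holds.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical in-memory stand-in for the facts that the Datalog rules consume. */
final class EntryPointSketch {
    /** Facts about one concrete application class; beanId may be null. */
    static final class ClassFacts {
        final String name;
        final Set<String> supertypes;
        final Set<String> annotations;
        final String beanId;

        ClassFacts(String name, Set<String> supertypes, Set<String> annotations, String beanId) {
            this.name = name;
            this.supertypes = supertypes;
            this.annotations = annotations;
            this.beanId = beanId;
        }
    }

    /** Collects every class matched by the servlet, controller, or interceptor condition. */
    static Set<String> entryPointClasses(List<ClassFacts> classes,
                                         String controllerAnnotation,
                                         Set<String> authProviderIds) {
        Set<String> entryPoints = new LinkedHashSet<>();
        for (ClassFacts c : classes) {
            // Servlet: subtype of javax.servlet.GenericServlet.
            boolean servlet = c.supertypes.contains("javax.servlet.GenericServlet");
            // Controller: carries the controller annotation (name supplied by the caller).
            boolean controller = c.annotations.contains(controllerAnnotation);
            // Interceptor: its bean id is referenced by an authentication-provider XML node.
            boolean interceptor = c.beanId != null && authProviderIds.contains(c.beanId);
            if (servlet || controller || interceptor) {
                entryPoints.add(c.name);
            }
        }
        return entryPoints;
    }
}
```

Keeping the three conditions separate and combining them with a logical "or" mirrors the accumulation step: each discovery rule contributes entry points on its own.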

8
– Completeness:
SAMPLE USE OF VOCABULARY
WIRING TOGETHER BEANS

• Dependency injection patterns through annotations/XML configuration


– Field injection discovery
FieldInjection(class, field, beanObject) :- Field_Annotation(field, "@Inject"),
Field_DeclaringType(field, class),
Bean_Id(bean, field),
GeneratedObject(beanObject, bean).

– Wiring beans together


ObjectFieldPointsTo(object, field, beanObject) :- FieldInjection(class, field, beanObject),
Value_Type(object, class).
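
As a rough illustration of the two wiring rules above, the following Java sketch joins the same kinds of facts by hand. All names here (`BeanWiringSketch`, `InjectedField`, the map parameters) are hypothetical, and the sketch assumes, as the slide's rule suggests, that the injected field is matched against the bean id directly.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical hand-written join over the facts used by the field-injection rules. */
final class BeanWiringSketch {
    /** A field carrying an injection annotation, with its declaring class. */
    static final class InjectedField {
        final String name;
        final String declaringClass;

        InjectedField(String name, String declaringClass) {
            this.name = name;
            this.declaringClass = declaringClass;
        }
    }

    /** Returns facts of the form "object.field -> beanObject". */
    static Set<String> objectFieldPointsTo(List<InjectedField> injectedFields,
                                           Map<String, String> beanClassById,
                                           Map<String, String> generatedObjectByClass,
                                           Map<String, String> typeByObject) {
        Set<String> facts = new TreeSet<>();
        for (InjectedField field : injectedFields) {
            // FieldInjection: resolve the field to a bean class, then to its generated object.
            String beanClass = beanClassById.get(field.name);
            if (beanClass == null) continue;
            String beanObject = generatedObjectByClass.get(beanClass);
            if (beanObject == null) continue;
            // ObjectFieldPointsTo: every abstract object of the declaring class receives the bean.
            for (Map.Entry<String, String> entry : typeByObject.entrySet()) {
                if (entry.getValue().equals(field.declaringClass)) {
                    facts.add(entry.getKey() + "." + field.name + " -> " + beanObject);
                }
            }
        }
        return facts;
    }
}
```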

– Completeness:

9
JackEE POINTS-TO
A RECURSIVE RELATIONSHIP

• JackEE uses points-to information to infer further points-to information


– Case in point:
bean = context.getBean("beanId");

VarPointsTo(local, beanObject) :- GetBeanInvocation(invocation),
ActualParam(0, invocation, actualParam),
VarPointsTo(actualParam, beanId),
Bean_Id(bean, beanId),
GeneratedObject(beanObject, bean),
AssignReturnValue(invocation, local).
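
The mutual recursion in this rule can be seen in a small fixpoint loop. The Java sketch below is a toy solver, not Doop's evaluation strategy: the classes `Assign` and `GetBeanCall`, the `solve` method, and the string encoding of values are all assumptions made for illustration. Points-to facts about the argument of `getBean` produce new points-to facts for the result, which then propagate further.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical toy solver showing points-to facts feeding the getBean rule and back. */
final class GetBeanFixpointSketch {
    /** An assignment `to = from` between local variables. */
    static final class Assign {
        final String from;
        final String to;
        Assign(String from, String to) { this.from = from; this.to = to; }
    }

    /** A call `ret = context.getBean(arg)`, reduced to the two variables that matter. */
    static final class GetBeanCall {
        final String arg;
        final String ret;
        GetBeanCall(String arg, String ret) { this.arg = arg; this.ret = ret; }
    }

    static Map<String, Set<String>> solve(Map<String, Set<String>> initial,
                                          List<Assign> assignments,
                                          List<GetBeanCall> calls,
                                          Map<String, String> beanObjectById) {
        Map<String, Set<String>> pts = new HashMap<>();
        initial.forEach((var, values) -> pts.put(var, new HashSet<>(values)));
        boolean changed = true;
        while (changed) { // iterate until no rule derives a new fact
            changed = false;
            for (Assign a : assignments) {
                List<String> from = new ArrayList<>(pts.getOrDefault(a.from, Set.of()));
                changed |= pts.computeIfAbsent(a.to, k -> new HashSet<>()).addAll(from);
            }
            for (GetBeanCall call : calls) {
                for (String value : new ArrayList<>(pts.getOrDefault(call.arg, Set.of()))) {
                    // If the argument may hold a known bean id, the result may hold that bean.
                    String beanObject = beanObjectById.get(value);
                    if (beanObject != null) {
                        changed |= pts.computeIfAbsent(call.ret, k -> new HashSet<>()).add(beanObject);
                    }
                }
            }
        }
        return pts;
    }
}
```

In this sketch the bean id reaches the call only through an assignment, so the `getBean` rule cannot fire until ordinary points-to propagation has run, and the bean object it produces is in turn propagated by the ordinary rules.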

• Completeness:
10
WHAT ABOUT CACHES?
SCALABILITY CHALLENGES
• Blowup in java.util points-to (2objH)
– The most precise practical analysis
– Most of it attributed to maps
“Wait, but why?”
• Lots of internal complexity in maps
• Points-to evaluated for all backend data structures
– Map “treeification” optimization
• Red-Black tree backend
– Significant needless overhead

2objH computation time (% of total)
Benchmark         java.util time  non-java.util time
alfresco          72              28
bitbucket-server  76              24
opencms           46              54
pybbs             69              31
shopizer          64              36
SpringBlog        68              32
WebGoat           70              30
11
WHAT ABOUT CACHES?
PRECISION CHALLENGES
• java.util.* maps feature a double-dispatch-like pattern
final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict)
{
    ...
    else if (p instanceof TreeNode)
        e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
    ...
}
• Degrades the precision of most context-sensitive analyses, e.g., 2objH

• putVal context is information-rich:


[<HashMap allocator receiver object>, <HashMap object>]
• putTreeVal context is not:
[<HashMap object>, <TreeNode object>]
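
To see why the second context carries less information, the context construction can be sketched in a few lines of Java. This is a simplified model of a 2-object-sensitive analysis with a one-element heap context, written for illustration only; the class and method names are hypothetical, and the sketch assumes the tree node is allocated in a method running under the `putVal` context.

```java
import java.util.List;

/** Hypothetical model of how 2objH builds calling contexts from receiver objects. */
final class TwoObjHContextSketch {
    /** An abstract object: allocation site plus a one-element heap context. */
    static final class AbstractObject {
        final String site;
        final String heapContext;

        AbstractObject(String site, String heapContext) {
            this.site = site;
            this.heapContext = heapContext;
        }
    }

    /** Context of a method invoked on `receiver`: [receiver's heap context, receiver]. */
    static List<String> calleeContext(AbstractObject receiver) {
        return List.of(receiver.heapContext, receiver.site);
    }

    /** An object allocated under `context` keeps only the last context element. */
    static AbstractObject allocate(String site, List<String> context) {
        return new AbstractObject(site, context.get(context.size() - 1));
    }
}
```

Under this model, two maps from the same allocation site created by different owner objects get distinct `putVal` contexts, but the tree nodes they allocate share one heap context, so the `putTreeVal` contexts coincide and the owner information is lost.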
12
SOUND-MODULO-ANALYSIS MODELING
• Given
S : the analysis
C : the code entity
C’: the sound-modulo-analysis model of C
Then S(C’) must model all the dynamic behaviors of C

• C’ can make the analysis either more or less precise than C (under fixed S)


– In our case C’ significantly improves precision!

13
SOUND-MODULO-ANALYSIS MODELING OF MAPS
HASHMAP SNIPPET

• S: flow-insensitive, array-insensitive, path-flow-insensitive analysis

[Figure: HashMap code snippet, original JDK version next to the JackEE model]
14
SOUND-MODULO-ANALYSIS MODELING
• Captures all the original behaviors of Java maps (e.g., exceptions)
• Removes complexity from the most-reused part of the library
– Greatest complexity-removal factor: treeification elimination
• Treeification converts the map’s bins to red-black trees
– Fewer local aliases
– Code simplification lets context sensitivity retain greater precision
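
To give a feel for what a map without treeification looks like, here is a deliberately small Java sketch. It is not JackEE's replacement code and makes no claim to preserve every behavior of java.util.HashMap (there is no resizing, iteration, or exception behavior here); the class name `SimpleMapModel` and its fixed table size are assumptions. It only shows that the same key/value flow can be expressed with plain linked bins and no red-black tree backend.

```java
import java.util.Objects;

/** Hypothetical, heavily simplified hash map with linked bins only (no tree bins). */
final class SimpleMapModel<K, V> {
    private static final class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;

        Node(K key, V value, Node<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = (Node<K, V>[]) new Node[16];

    /** Stores the mapping and returns the previous value, or null if there was none. */
    public V put(K key, V value) {
        int i = index(key);
        for (Node<K, V> n = table[i]; n != null; n = n.next) {
            if (Objects.equals(n.key, key)) {
                V old = n.value;
                n.value = value;
                return old;
            }
        }
        table[i] = new Node<>(key, value, table[i]); // prepend to the bin's chain
        return null;
    }

    /** Returns the value mapped to the key, or null if the key is absent. */
    public V get(Object key) {
        for (Node<K, V> n = table[index(key)]; n != null; n = n.next) {
            if (Objects.equals(n.key, key)) {
                return n.value;
            }
        }
        return null;
    }

    private int index(Object key) {
        return key == null ? 0 : (key.hashCode() & 0x7fffffff) % table.length;
    }
}
```

With a single node type and no `instanceof TreeNode` branch, every value stored in the map flows through methods whose receiver is the map itself, which is the property the precision discussion relies on.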

15
EVALUATION

Benchmarks
• 8 well-known free applications
• 7/8 open-source (available on GitHub)

System
• 2x12-core Intel Xeon
• 640GB RAM
• Analyses running on 16 threads

Three axes of evaluation
• Completeness
• Speed
• Precision

16
A REAL-WORLD BENCHMARKING SUITE!
Benchmark         Description                                        Gitstar Organization/User Rank
alfresco          CMS                              2,088   550       1,947
bitbucket-server  On-premise version of Bitbucket  N/A     N/A       N/A
dotCMS            CMS                              612     400       4,624
opencms           CMS                              522     400       5,092
pybbs             Website building framework       1,109   524       3,895
shopizer          e-commerce framework             1,643   1.6k      4,040
SpringBlog        Blog system                      1,548   716       3,568
WebGoat           The popular OWASP app            3,701   1.5k      1,234

17
IMPRESSIVE COMPLETENESS
HIGHER THAN PLAIN OLD BENCHMARKS (DACAPO)!
• JackEE averages 58.04% in-app reachability
– Doop averages 42.89% in-app reachability for DaCapo
• The established standard!
• Without JackEE Doop averages 14.48% in-app reachability
– 1.8% for alfresco
– 0.0% for pybbs

[Chart: app reachable methods % (0% to 100%) for alfresco, bitbucket, dotCMS, opencms, pybbs, shopizer, SpringBlog, WebGoat; series: Doop reachable, JackEE reachable]

18
SPEED
JackEE ORIGINAL JDK 8 VS SOUND-MODULO-ANALYSIS JDK 8

• Average speedup for mod-2objH: 5.9x


– Just by replacing HashMap, LinkedHashMap, ConcurrentHashMap!
19
PRECISION
JackEE ORIGINAL JDK 8 VS SOUND-MODULO-ANALYSIS JDK 8

Benchmark         Avg. vpt size reduction  Avg. app vpt size reduction  # CallGraphEdge reduction
alfresco          24.2%                    19.4%                        7.4%
bitbucket-server  42.3%                    25.7%                        8.1%
dotCMS            N/A                      N/A                          N/A
opencms           13.3%                    8.2%                         1.8%
pybbs             33.7%                    24.3%                        8.9%
shopizer          30.3%                    27.0%                        6.0%
SpringBlog        44.6%                    28.7%                        8.6%
WebGoat           30.2%                    6.0%                         4.4%
Average           28.7%                    19.9%                        6.5%

20
CONCLUSION

• JackEE’s contributions
– Automatic, declarative and extendable framework modeling
• Impressive completeness
– Sound-modulo-analysis modeling of maps
• Maintains soundness
• Achieves high scalability
• Significantly improves precision

21
AAAND CUT!

22
