Type-Based Analysis and Applications

Jens Palsberg
Purdue University
Dept. of Computer Science
West Lafayette, IN 47907
[email protected]

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
PASTE'01, June 18–19, 2001, Snowbird, Utah, USA.
Copyright 2001 ACM 1–58113–413–4/01/0006 ...$5.00.

ABSTRACT
Type-based analysis is an approach to static analysis of programs that has been studied for more than a decade. A type-based analysis assumes that the program type checks, and the analysis takes advantage of that. This paper examines the state of the art of type-based analysis, and it surveys some of the many software tools that use type-based analysis. Most of the surveyed tools use types as discriminators, while most of the theoretical studies use type and effect systems. We conclude that type-based analysis is a promising approach to achieving both provable correctness and good performance with a reasonable effort.

1. INTRODUCTION
This paper is a survey of the theory and practice of type-based analysis. It tries to answer the following questions:

• What is a type-based analysis?
• What are the advantages of type-based analysis?
• Is type-based analysis competitive with other approaches to static analysis?
• Which tools use type-based analysis?
• What is the current spectrum of type-based analyses?

As background for examining these questions, let us begin with a brief overview of some of the past successes and future challenges of the broader field of static analysis.

Traditionally, optimizing compilers were the main consumers of static analyses. Classical examples of static analyses are liveness analysis (for doing, e.g., register allocation) and data-flow analysis (for doing, e.g., common-subexpression elimination). Many textbooks on compiler design, including [2, 4, 44], contain substantial coverage of how to define, implement, and use static analyses in compilers. There are books devoted entirely to static analysis, including [48], and there are annual international conferences, such as the Static Analysis Symposium, for presentation and discussion of advances in the area.

Nowadays, static analysis is also used by tools for various software engineering tasks such as program understanding [74, 28], debugging [14], testing [57], and reverse engineering [22]. Among the international conferences that cover the application of static analysis to the area of software engineering is the International Symposium on Software Testing and Analysis.

In the coming years, there are new challenges for static analysis. For example, there is an increasing need for verifying key properties of software, including real-time properties, security-related behavior, and power consumption. The emerging paradigm [12] of combining model extraction (based on static analysis) and model checking is promising for this purpose. Moreover, the notion of dynamic class loading that has been popularized by Java has led to increased interest in run-time compilation and therefore also in highly efficient static analyses. Finally, work on scalable static analysis is needed to enable the application of static analysis to larger and larger programs.

Does the field of static analysis have a realistic hope of being able to help address the software problems of today and tomorrow? The properties that need to be analyzed are increasingly complex, and the programs being analyzed are ever larger. When push comes to shove, will static analysis measure up? I believe the answer to both questions is Yes.

One of the key reasons for optimism comes from the field of programming language design and the growing popularity of static type checking [10]. In the 1990s, most new software was written in languages such as C [33], C++ [17], and Java [23] which all feature varying degrees of static type checking. In particular, the type system of Java has received considerable attention, and for substantial subsets of Java, there are automatically-checked proofs of type soundness, e.g., [50]. The trend of typeful programming seems to continue, and types are now also being used in the intermediate languages of compilers, including the Java VML, and even in assembly languages [41, 40]. Traditionally, compilers would apply static analyses to untyped intermediate representations of programs, and so these analyses worked for all programs, whether typable or not. Now, an increasing number of static analyses are defined on statically-typed representations of programs, such as Java bytecodes. Of course, a static analysis can simply ignore the types, and some of them do. However, the presence of types has led researchers
to ask:

Question: Can a static analysis take advantage of the fact that the program type checks?

In particular, can the types help with defining more complicated analyses, can they help with reasoning about the correctness of an analysis, and can they help with making the static analyses more efficient? These questions have been asked since, at least, the 1980s, and after the work of many researchers, it can be concluded that the answer to each of the questions is Yes. As a result, there is an emerging field of type-based analysis:

Terminology: A type-based analysis assumes that the program type checks, and the analysis takes advantage of that.

This paper examines the state of the art of type-based analysis. We start with an example that illustrates what type-based analysis is and isn't, and we then discuss the advantages of type-based analysis, survey some of the many software tools that use type-based analysis, and map out the landscape of type-based analysis.

2. EXAMPLE
Let us consider a classical static-analysis problem: flow analysis for the λ-calculus. We will present four well-known static analyses for solving this problem: one that does not rely on types, and three type-based analyses. The goal is to illustrate various advantages of type-based analysis.

We use $x$ to range over program variables, we use $l$ to range over labels, and we use the following grammar to define a language of λ-terms:

$$e ::= x \mid \lambda^l x.e \mid e_1\,e_2.$$

The goal of a flow analysis of a program $E$ is to approximate, for each subexpression $e$ of $E$, the set of labels (called flow set) of the abstractions $\lambda^l x.e'$ that are the possible values of $e$. The flow analysis must be conservative, that is, if $\lambda^l x.e'$ is a possible value of $e$, then $l$ must be in the flow set for $e$.

As a running example, we will use the λ-term

$$F = ((\lambda^1 f.\lambda^2 x.f\,x)(\lambda^3 a.a))(\lambda^4 b.b).$$

If we do β-reduction of $F$, then we get:

$$\begin{aligned}
F &\to_\beta (\lambda^2 x.((\lambda^3 a.a)\,x))\,(\lambda^4 b.b) \\
  &\to_\beta (\lambda^3 a.a)\,(\lambda^4 b.b) \\
  &\to_\beta \lambda^4 b.b.
\end{aligned}$$

Hence, $\lambda^4 b.b$ is a possible value of $F$, so any sound flow analysis of $F$ must produce a flow set for $F$ that contains the label 4.

2.1 0-CFA
One can define a flow analysis for a λ-term $E$ by using a flow graph in which the nodes are the expressions occurring in $E$. The edges in the flow graph are generated from the following four rules (taken from [27]):

$$\lambda^l x.e \to \lambda^l x.e \tag{1}$$

$$\frac{e_1 \to \lambda^l x.e}{x \to e_2} \quad (e_1\,e_2 \text{ occurs in } E) \tag{2}$$

$$\frac{e_1 \to \lambda^l x.e}{e_1\,e_2 \to e} \quad (e_1\,e_2 \text{ occurs in } E) \tag{3}$$

$$\frac{e_1 \to e_2 \qquad e_2 \to e_3}{e_1 \to e_3} \tag{4}$$

This analysis is conservative and works for all λ-terms. The idea is that if we analyze an expression $E$, then, for any subexpression $e$ of $E$, the flow set for $e$ is the set of abstractions $\lambda^l x.e'$ such that there is an edge $e \to \lambda^l x.e'$ in the flow graph. For the running example $F$, we can use Rules (1)–(4) to generate the edges:

$$\begin{aligned}
& f \to \lambda^3 a.a \\
& (\lambda^1 f.\lambda^2 x.f\,x)(\lambda^3 a.a) \to \lambda^2 x.f\,x \\
& F \to f\,x \to a \to x \to \lambda^4 b.b,
\end{aligned}$$

so by transitivity (Rule (4)), we have $F \to \lambda^4 b.b$.

The above analysis uses a style that is widely known as 0-CFA [59]. It can be executed in $O(n^3)$ time, where $n$ is the size of the program [55], and can be proved correct with respect to arbitrary β-reduction [52].

2.2 A Simple Type System
Below we present three type-based analyses that all assume that the program being analyzed is simply typed, that is, it obeys the following type discipline. We use $\alpha$ to range over type variables, and we use the following grammar to define types:

$$t ::= \alpha \mid t \to t.$$

A type environment is a partial function from program variables to types, and we use the notation $A[x : t]$ to denote a type environment which maps $x$ to $t$, and otherwise maps $y$ to $A(y)$ when $x \neq y$. The type rules are:

$$A \vdash x : t \quad (A(x) = t) \tag{5}$$

$$\frac{A[x : s] \vdash e : t}{A \vdash \lambda^l x.e : s \to t} \tag{6}$$

$$\frac{A \vdash e_1 : s \to t \qquad A \vdash e_2 : s}{A \vdash e_1\,e_2 : t} \tag{7}$$

For the running example $F$, we can use Rules (5)–(7) to construct a type derivation which contains the judgments:

$$\begin{aligned}
\emptyset &\vdash \lambda^1 f.\lambda^2 x.f\,x : ((\alpha \to \alpha) \to (\alpha \to \alpha)) \to ((\alpha \to \alpha) \to (\alpha \to \alpha)) \\
\emptyset[f : (\alpha \to \alpha) \to (\alpha \to \alpha)] &\vdash \lambda^2 x.f\,x : (\alpha \to \alpha) \to (\alpha \to \alpha) \\
\emptyset &\vdash \lambda^3 a.a : (\alpha \to \alpha) \to (\alpha \to \alpha) \\
\emptyset &\vdash \lambda^4 b.b : \alpha \to \alpha \\
\emptyset &\vdash F : \alpha \to \alpha.
\end{aligned}$$

2.3 A Type and Effect System
The first type-based analysis is a so-called type and effect system. It uses the types and the type rules in a rather direct way (much like in [26]). The idea is to annotate the function types with a flow set $\varphi$. Thus, annotated types are defined by the grammar:

$$t ::= \alpha \mid t \xrightarrow{\varphi} t.$$
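
The edge rules of Section 2.1 are easy to experiment with. The following is a minimal Python sketch, not the cubic-time algorithm of [55]: it simply applies Rules (1)–(4) repeatedly until no new edge appears. The class and function names (`Var`, `Lam`, `App`, `zero_cfa`, `flow_set`) are illustrative assumptions that do not come from the paper, and the sketch assumes that bound variables have distinct names, as they do in the running example F.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    label: int
    param: str
    body: object


@dataclass(frozen=True)
class App:
    fn: object
    arg: object


def subterms(e):
    """Yield every expression occurring in e, including e itself."""
    yield e
    if isinstance(e, Lam):
        yield from subterms(e.body)
    elif isinstance(e, App):
        yield from subterms(e.fn)
        yield from subterms(e.arg)


def zero_cfa(program):
    """Return the set of flow-graph edges generated by Rules (1)-(4)."""
    terms = set(subterms(program))
    apps = [t for t in terms if isinstance(t, App)]
    # Rule (1): every abstraction flows to itself.
    edges = {(t, t) for t in terms if isinstance(t, Lam)}
    changed = True
    while changed:  # naive fixpoint iteration (hypothetical, unoptimized)
        new = set()
        for app in apps:
            for src, dst in edges:
                if src == app.fn and isinstance(dst, Lam):
                    new.add((Var(dst.param), app.arg))  # Rule (2)
                    new.add((app, dst.body))            # Rule (3)
        for a, b in edges:
            for c, d in edges:
                if b == c:
                    new.add((a, d))                     # Rule (4)
        changed = not new <= edges
        edges |= new
    return edges


def flow_set(edges, e):
    """Labels of the abstractions that e has an edge to."""
    return {dst.label for src, dst in edges
            if src == e and isinstance(dst, Lam)}


# Running example F = ((λ1 f.λ2 x.f x)(λ3 a.a))(λ4 b.b)
inner = App(Lam(1, "f", Lam(2, "x", App(Var("f"), Var("x")))),
            Lam(3, "a", Var("a")))
F = App(inner, Lam(4, "b", Var("b")))

edges = zero_cfa(F)
print(flow_set(edges, F))  # {4}
```

On the running example the sketch reproduces the edge from F to λ4 b.b derived in Section 2.1, so the computed flow set for F is {4}.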
The revised type rules are:

$$A \vdash x : t \quad (A(x) = t) \tag{8}$$

$$\frac{A[x : s] \vdash e : t}{A \vdash \lambda^l x.e : s \xrightarrow{\varphi} t} \quad (l \in \varphi) \tag{9}$$

$$\frac{A \vdash e_1 : s \xrightarrow{\varphi} t \qquad A \vdash e_2 : s}{A \vdash e_1\,e_2 : t} \tag{10}$$

Notice that Rule (9) enforces that the function type $s \xrightarrow{\varphi} t$ "remembers" the label $l$ by having the side condition $l \in \varphi$. The idea is that if we have the judgment $A \vdash e : s \xrightarrow{\varphi} t$, then the flow set for $e$ is $\varphi$. For the running example $F$, we can use Rules (8)–(10) to construct a type derivation which contains the judgments:

$$\begin{aligned}
\emptyset &\vdash \lambda^1 f.\lambda^2 x.f\,x : ((\alpha \xrightarrow{\{4\}} \alpha) \xrightarrow{\{3\}} (\alpha \xrightarrow{\{4\}} \alpha)) \xrightarrow{\{1\}} ((\alpha \xrightarrow{\{4\}} \alpha) \xrightarrow{\{2\}} (\alpha \xrightarrow{\{4\}} \alpha)) \\
\emptyset[f : (\alpha \xrightarrow{\{4\}} \alpha) \xrightarrow{\{3\}} (\alpha \xrightarrow{\{4\}} \alpha)] &\vdash \lambda^2 x.f\,x : (\alpha \xrightarrow{\{4\}} \alpha) \xrightarrow{\{2\}} (\alpha \xrightarrow{\{4\}} \alpha) \\
\emptyset &\vdash \lambda^3 a.a : (\alpha \xrightarrow{\{4\}} \alpha) \xrightarrow{\{3\}} (\alpha \xrightarrow{\{4\}} \alpha) \\
\emptyset &\vdash \lambda^4 b.b : \alpha \xrightarrow{\{4\}} \alpha \\
\emptyset &\vdash F : \alpha \xrightarrow{\{4\}} \alpha,
\end{aligned}$$

so the flow set for $F$ is $\{4\}$.

2.4 A Sparse-Flow-Graph Approach
The second type-based analysis for a λ-term $E$ uses a sparse flow graph and avoids transitive closure [27]. All potential nodes in the flow graph are defined by the grammar:

$$n ::= e \mid \mathrm{dom}(n) \mid \mathrm{ran}(n),$$

where $e$ occurs in the program being analyzed. The edges in the flow graph are generated from the rules (taken from [27]):

$$x \to \mathrm{dom}(\lambda^l x.e) \quad (\lambda^l x.e \text{ occurs in } E) \tag{11}$$

$$\mathrm{ran}(\lambda^l x.e) \to e \quad (\lambda^l x.e \text{ occurs in } E) \tag{12}$$

$$e_1\,e_2 \to \mathrm{ran}(e_1) \quad (e_1\,e_2 \text{ occurs in } E) \tag{13}$$

$$\mathrm{dom}(e_1) \to e_2 \quad (e_1\,e_2 \text{ occurs in } E) \tag{14}$$

$$\frac{n_1 \to n_2 \qquad n \to \mathrm{ran}(n_1)}{\mathrm{ran}(n_1) \to \mathrm{ran}(n_2)} \tag{15}$$

$$\frac{n_1 \to n_2 \qquad n \to \mathrm{dom}(n_2)}{\mathrm{dom}(n_2) \to \mathrm{dom}(n_1)} \tag{16}$$

The idea is that if we analyze an expression $E$, then, for any subexpression $e$ of $E$, the flow set for $e$ is the set of abstractions $\lambda^l x.e'$ such that there is a path $e \to^* \lambda^l x.e'$ in the flow graph. For the running example $F$, we can use Rules (11)–(16) to generate the edges:

$$\begin{aligned}
& f \to \mathrm{dom}(\lambda^1 f.\lambda^2 x.f\,x) \to \lambda^3 a.a \\
& (\lambda^1 f.\lambda^2 x.f\,x)(\lambda^3 a.a) \to \mathrm{ran}(\lambda^1 f.\lambda^2 x.f\,x) \to \lambda^2 x.f\,x \\
& F \to \mathrm{ran}((\lambda^1 f.\lambda^2 x.f\,x)(\lambda^3 a.a)) \to \mathrm{ran}(\mathrm{ran}(\lambda^1 f.\lambda^2 x.f\,x)) \\
& \quad \to \mathrm{ran}(\lambda^2 x.f\,x) \to f\,x \to \mathrm{ran}(f) \\
& \quad \to \mathrm{ran}(\mathrm{dom}(\lambda^1 f.\lambda^2 x.f\,x)) \to \mathrm{ran}(\lambda^3 a.a) \to a \\
& \quad \to \mathrm{dom}(\lambda^3 a.a) \to \mathrm{dom}(\mathrm{dom}(\lambda^1 f.\lambda^2 x.f\,x)) \to \mathrm{dom}(f) \\
& \quad \to x \to \mathrm{dom}(\lambda^2 x.f\,x) \to \mathrm{dom}(\mathrm{ran}(\lambda^1 f.\lambda^2 x.f\,x)) \\
& \quad \to \mathrm{dom}((\lambda^1 f.\lambda^2 x.f\,x)(\lambda^3 a.a)) \\
& \quad \to \lambda^4 b.b,
\end{aligned}$$

so the flow set for $F$ is $\{4\}$.

The building of the flow graph may diverge for some λ-terms. It can be shown that if a λ-term is simply typed, then the flow graph will be finite, sparse, and built in finite time, and the produced flow information will be the same as that produced by 0-CFA. Moreover, if the sizes of the types are independent of the size of the program, as is often the case in practice, then the flow information can be computed in $O(n^2)$ time, which is an improvement over the $O(n^3)$ time spent by 0-CFA.

2.5 A Types-as-Discriminators Approach
The third type-based analysis uses the types as discriminators. To compute a flow set for an expression $e$ in a program $E$, the analysis concentrates on the type of $e$, and asks which abstractions in $E$ have the same type as $e$. The set of labels of those abstractions is the flow set for $e$.

For the running example $F$, the type of $F$ itself is $\alpha \to \alpha$. There is exactly one abstraction in $F$ which has type $\alpha \to \alpha$, namely $\lambda^4 b.b$, so the flow set for $F$ is $\{4\}$.

3. ADVANTAGES OF TYPE-BASED ANALYSIS
The example above illustrates the main advantages of type-based analysis, all revolving around the issues of simplicity, efficiency, and correctness. We discuss these advantages in more detail here and we consider whether type-based analysis is competitive with other approaches to static analysis.

3.1 Simplicity
Types provide an infrastructure on top of which analyses can be built. For example, the type and effect system in Section 2 illustrates the idea of annotated types, that is, the decoration of types with static information. Conceptually, the starting point for a type-based analysis is a type derivation for a program, not just a syntax tree. Such a type derivation is a convenient basis for designing static analyses. If the goal is to design a type and effect system, then each type rule provides a localized setting for thinking about the analysis of a single language construct. If the goal is to design an analysis using types as discriminators, then the type derivation contains the needed types.

3.2 Efficiency
Many researchers have observed that executing a static analysis on a typical statically-typed program tends to be faster and give qualitatively better results than executing the same style of analysis on a typical dynamically-typed
program. The reason seems to be that statically-typed programs are inherently more structured and therefore easier to analyze. The field of type-based analysis goes further by trying to get additional benefits from the types. The sparse-flow-graph approach above is an example of how the mere existence of types can help with computing static information faster. The types-as-discriminators approach is particularly efficient for a language with declared types because the needed type information is readily available in the program text.

3.3 Correctness
The correctness of a type system with respect to a semantics is usually phrased as a type soundness theorem: well-typed programs cannot go wrong [38]. The correctness of a type and effect system can similarly be phrased as a type soundness theorem; the correctness of the analysis is subsumed by the correctness of the annotated-type system. There is a well-understood method for proving type soundness [46, 77] based on proving type preservation and progress, and this method usually carries over to type and effect systems.

3.4 Competitiveness
Among the main approaches to static analysis are data flow analysis, constraint-based analysis, and abstract interpretation [48]. Many researchers, including Nielson, Nielson, and Hankin [48], have observed that there are important similarities between these approaches. Many type inference problems that are specified using type rules can be turned into equivalent constraint problems that are suitable for algorithmic considerations. Similarly, one can view a type and effect system as a specification of an analysis, which in turn can be transformed into a constraint problem. The type and effect system may be easier to formulate and reason about, and the constraint problem may be more appropriate when designing an algorithm to carry out the analysis.

One inherent advantage of type-based analysis is that it enables the definition of abstract domains in terms of types. The types-as-discriminators approach can be viewed as doing that by dividing the abstractions into equivalence classes based on the types.

The types can be of help when comparing two type-based analyses for the same language. The types are a lingua franca that is "spoken" by both analyses, and this may be of help when trying to identify similarities and differences.

4. TOOLS THAT USE TYPE-BASED ANALYSIS
We now survey some of the tools that successfully use type-based analysis. The tools work on programs written in C++ [17], Java [23], Modula 3 [11, 45], and Standard ML [39].

4.1 Method Inlining
In an object-oriented program we may have a virtual call site $e.m(\ldots)$. If a static analysis can determine a conservative approximation of the set of methods that can be invoked, then a compiler may be able to inline the call. One of the fundamental type-based analyses of object-oriented programs for doing that is the Class Hierarchy Analysis (CHA) of Dean, Grove, and Chambers [13]. In the terminology of Section 2, CHA uses types as discriminators to achieve good precision. We will use the notation $\mathit{StaticType}(e)$ to denote the static type of the expression $e$, $\mathit{SubTypes}(t)$ to denote the set of declared subtypes of type $t$, and the notation $\mathit{StaticLookup}(C, m)$ to denote the definition (if any) of a method with name $m$ that one finds when starting a static method lookup in the class $C$. For the virtual call site $e.m(\ldots)$, and each class $C \in \mathit{SubTypes}(\mathit{StaticType}(e))$ where $\mathit{StaticLookup}(C, m) = M'$, CHA determines that $M'$ is a method that can be invoked. Notice how the static type of $e$ is used to restrict attention to only some of the classes in the program.

We can extend CHA to take class-instantiation information into account. The result is known as rapid type analysis (RTA), and was first described by Bacon and Sweeney [5, 6]. The idea is to first collect the set $S$ of all classes $C$ for which there is an occurrence of "new C()" in the program. Then, for the virtual call site $e.m(\ldots)$, and each class $C \in \mathit{SubTypes}(\mathit{StaticType}(e))$ where $\mathit{StaticLookup}(C, m) = M'$ and $C \in S$, RTA determines that $M'$ is a method that can be invoked. Notice that a class is only taken into account if at least one object of that class may exist at run time.

One can go further and associate a single distinct set (like $S$) with each class, method, and/or field in an application [71]. If one associates a set with each expression, then the result is 0-CFA [54].

Sundaresan et al. [64] use a combination of a type-based analysis (either CHA or RTA) and a more traditional 0-CFA-like analysis in the following way. First they perform, say, RTA to determine a call graph approximation, and then they use a 0-CFA-like technique to propagate class information along the edges of that call graph. This turns out to be fast and give good results.

A weakness of RTA and other whole-program analyses is that they have problems with library-based applications for which the code of the library is not available at the time of the analysis. To overcome that, the application extractor Jax features a specification language that allows users to specify, at a high level, how to extract a library-based application [65]. The idea is that a specification tells Jax what to expect from the library.

The Swift compiler by Ghemawat, Randall, and Scales [21] has a front end which compiles Java to a typed intermediate representation that uses annotated Java types. The annotations can express such things as: the value is known to be an object of exactly a particular class (and not a subclass), the value is an array with a particular constant size, and the value is known to be non-null. The backend of the compiler uses the annotations for method inlining and other optimizations.

4.2 Application Extraction
For the purpose of extracting applications, the goal is to compute a conservative approximation of the set of methods that are reachable from the main method. It is straightforward to extend the basic formulations of CHA and RTA with a form of reachability analysis. The following set-constraint formulation of a version of CHA, borrowed from [71], uses a single set variable $R$ (for "reachable methods") that ranges over sets of methods. The constraints are derived from the program text in the following way:

1. $\mathit{main} \in R$ ($\mathit{main}$ denotes the main method)
2. For each method $M$, each virtual call site $e.m(\ldots)$ occurring in $M$, and each class $C \in \mathit{SubTypes}(\mathit{StaticType}(e))$ where $\mathit{StaticLookup}(C, m) = M'$:
$$(M \in R) \Rightarrow (M' \in R).$$

Intuitively, the first constraint reads "the main method is reachable," and the second constraint reads: "if a method is reachable, and a virtual method call $e.m(\ldots)$ occurs in the body of that method, then every method with name $m$ that is inherited by a subtype of the static type of $e$ is also reachable." It is straightforward to show that there is a least set $R$ that satisfies the constraints, and a solution procedure that computes that set. The reason for computing the least $R$ that satisfies the constraints is that this maximizes the complement of $R$, i.e., the set of unreachable methods that can be removed safely.

RTA extended with reachability analysis uses both a set variable $R$ ranging over sets of methods, and a set variable $S$ which ranges over sets of class names. The variable $S$ approximates the set of classes for which objects are created during a run of the program. The constraints:

1. $\mathit{main} \in R$ ($\mathit{main}$ denotes the main method)
2. For each method $M$, each virtual call site $e.m(\ldots)$ occurring in $M$, and each class $C \in \mathit{SubTypes}(\mathit{StaticType}(e))$ where $\mathit{StaticLookup}(C, m) = M'$:
$$(M \in R) \wedge (C \in S) \Rightarrow (M' \in R).$$
3. For each method $M$, and for each "new C()" occurring in $M$:
$$(M \in R) \Rightarrow (C \in S).$$

Intuitively, the second constraint refines the corresponding constraint of CHA by insisting that $C \in S$, and the third constraint reads: "$S$ contains the classes that are instantiated in a reachable method."

RTA is easy to implement, scales well, and has been shown to compute call graphs that are significantly more precise than those computed by CHA [5]. There are several whole-program analysis systems that rely on RTA to compute call graphs (e.g., the Jax application extractor of [70]).

It turns out that for detecting unreachable methods, the inexpensive RTA does almost as well as when using a distinct set with each class, method, and/or field in an application [71].

4.3 Redundant-Load Elimination
Redundant-load elimination is a compile-time optimization that combines loop-invariant code motion and common-subexpression elimination. Since it is trying to reorder statements that may do pointer accesses, redundant-load elimination can benefit from alias information. Two access paths are said to be possible aliases if they may refer to the same variable. Diwan, McKinley, and Moss [16] have presented three type-based alias analyses, all based on the idea of using types as discriminators. The most basic one, called TypeDecl, observes that two expressions $e_1$ and $e_2$ cannot be aliases if

$$\mathit{SubTypes}(\mathit{StaticType}(e_1)) \cap \mathit{SubTypes}(\mathit{StaticType}(e_2)) = \emptyset.$$

Their second analysis, called FieldTypeDecl, further distinguishes expressions based on observations such as: two expressions $e_1.f$ and $e_2.g$ cannot be aliases if $f \neq g$.

Their third analysis, called Selectively Merge Type References (SMTypeRefs), goes further by including a type-based flow analysis. The idea is that two expressions $e_1$ and $e_2$ cannot be aliases if the program never assigns an object of type $\mathit{StaticType}(e_1)$ to a reference of type $\mathit{StaticType}(e_2)$, or vice versa. Thus, the type-based flow analysis records the types involved in all assignments, parameter passings, and return of results, and computes an approximation of the possible flow between references of different types.

Experiments [16] show that both FieldTypeDecl and SMTypeRefs are good bases for doing redundant-load elimination, while TypeDecl seems to be too imprecise to get good results.

Fink, Knobe, and Sarkar [18] used a flow-sensitive version of FieldTypeDecl in their implementation of redundant-load and dead-store elimination.

Hosking, Nystrom, Whitlock, Cutts, and Diwan [29] have presented an approach to partial-redundancy elimination which uses the FieldTypeDecl approach to type-based alias analysis. Their experience with the approach is mixed, although they conclude that the main problem is not the alias analysis but the isolation between their optimizer and the underlying execution environment.

4.4 Encapsulation Checking
In a Java package, there may be classes with the property that no object of those classes will escape the package. In other words, the objects of those classes are encapsulated in the package. Bokowski and Vitek [9] called such classes confined, and they presented an extension of Java in which one can specify that a class is confined. Grothoff, Palsberg, and Vitek [24] presented a type-based analysis for identifying confined classes in Java bytecode. Their analysis is defined using constraints, which, in turn, rely on a flow analysis to determine a call-graph approximation. They use a type-based flow analysis akin to the SMTypeRefs mentioned above. In effect, the combined analysis does a low-cost escape analysis which turns out to identify a high number of confined classes in a large benchmark suite.

4.5 Race Detection
In a multi-threaded program, a race condition occurs when two threads manipulate a shared data structure simultaneously, without synchronization. This can result in unexpected program behavior. To avoid it, one can use a programming discipline where each data structure is protected with a lock that can be held by at most one thread at a time. Flanagan and Freund [20] have presented a type-based analysis that detects race conditions in Java programs. The analysis is presented as a type and effect system. The current implementation requires adding some type annotations to the Java code. It remains open whether the type annotations can be computed by an analysis.

4.6 Memory Management
Tofte and Talpin [72] suggested that call-by-value functional languages can be implemented using regions for memory management. The idea is that, at run time, the store consists of a stack of regions. Region inference is a type-based analysis, presented as a type and effect system, which determines where regions can be allocated and deallocated. Birkedal, Tofte, and Vejlstrup [8] presented an implementation for Standard ML which demonstrates that region
inference can result in significant space savings, in comparison with more traditional memory management based on garbage collection. Moreover, the region-based system can compete on speed with a garbage-collection-based system.

5. OTHER TYPE-BASED ANALYSES
There is a large number of published type and effect systems for such tasks as side-effect analysis [37, 32, 66, 76], binding-time analysis [49], strictness analysis [35, 36, 78, 3, 31], totality analysis [62, 63, 61], callability analysis [67, 68], flow analysis [43, 42, 7, 30, 75, 73, 15, 56, 53], trust analysis [51], secure information flow analysis [60], closure conversion [25], resource allocation in compilers [69], continuation allocation [58], dependency analysis [1], communication analysis [48], and elimination of useless variables [34, 19]. Many of them have been proved correct, most have not yet been implemented for a full-fledged programming language, although some have been implemented for a toy language, and some still need an algorithm for performing the analysis. Nielson and Nielson [47] present the overall methodology behind type and effect systems, and they discuss the major design decisions, including whether or not to incorporate subtyping, subeffecting, polymorphism, and polymorphic recursion.

6. CONCLUSION
Most of the surveyed tools use types as discriminators, while most of the theoretical studies use type and effect systems. To enable a better comparison of the different approaches, future research may attempt a further formalization of the techniques used in current tools, and larger-scale implementations and experiments with published type and effect systems. Ideally, a static analysis should come with both a proof of correctness and convincing experimental results. Type-based analysis is a promising approach to achieving both with a reasonable effort.

Further information about type-based analysis and links to many of the cited papers are available from:

http://www.cs.purdue.edu/homes/palsberg/tba/

Acknowledgments
Thanks to Tony Hosking for helpful discussions, and to John Field and the other PASTE 2001 organizers for encouragement. Palsberg was supported by a National Science Foundation Faculty Early Career Development Award, CCR–9734265, by CERIAS (Center for Education and Research in Information Assurance and Security), and by IBM.

REFERENCES
[1] Martín Abadi, Anindya Banerjee, Nevin Heintze, and Jon Riecke. A core calculus of dependency. In Proceedings of POPL'99, 26th Annual SIGPLAN–SIGACT Symposium on Principles of Programming Languages, pages 147–160, 1999.
[2] Alfred V. Aho, Ravi I. Sethi, and Jeffrey D. Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley, Reading, MA, second edition, 1986.
[3] Torben Amtoft. Minimal thunkification. In Proceedings of WSA'93, 3rd International Workshop on Static Analysis, pages 218–229. Springer-Verlag (LNCS 724), 1993.
[4] Andrew W. Appel. Modern Compiler Implementation in Java. Cambridge University Press, 1998.
[5] David F. Bacon and Peter F. Sweeney. Fast static analysis of C++ virtual function calls. In Proceedings of the Eleventh Annual Conference on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA'96), pages 324–341, San Jose, CA, 1996. SIGPLAN Notices 31(10).
[6] David Francis Bacon. Fast and Effective Optimization of Statically Typed Object-Oriented Languages. PhD thesis, Computer Science Division, University of California, Berkeley, December 1997. Report No. UCB/CSD-98-1017.
[7] Anindya Banerjee. A modular, polyvariant and type-based closure analysis. In Proceedings of ICFP'97, ACM International Conference on Functional Programming, pages 1–10, 1997.
[8] Lars Birkedal, Mads Tofte, and Magnus Vejlstrup. From region inference to von Neumann machines via region representation inference. In Proceedings of POPL'96, 23rd Annual SIGPLAN–SIGACT Symposium on Principles of Programming Languages, pages 171–183, 1996.
[9] Boris Bokowski and Jan Vitek. Confined types. In Proceedings of the Fourteenth Annual Conference on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA'99), pages 82–96, Denver, CO, 1999.
[10] Luca Cardelli. Type systems. In CRC Handbook of Computer Science and Engineering, chapter 103, pages 2208–2236. CRC Press, 1997.
[11] Luca Cardelli, Jim Donahue, Mick Jordan, Bill Kalsow, and Greg Nelson. The Modula-3 type system. In Sixteenth Symposium on Principles of Programming Languages, pages 202–212, 1989.
[12] James C. Corbett, Matthew B. Dwyer, John Hatcliff, Shawn Laubach, Corina S. Pasareanu, Robby, and Hongjun Zheng. Bandera: Extracting finite-state models from Java source code. In Proceedings of ICSE'00, 22nd International Conference on Software Engineering, pages 439–448, 2000.
[13] J. Dean, D. Grove, and C. Chambers. Optimization of object-oriented programs using static class hierarchy analysis. In W. Olthoff, editor, Proceedings of the Ninth European Conference on Object-Oriented Programming (ECOOP'95), pages 77–101, Aarhus, Denmark, August 1995. Springer-Verlag.
[14] David Detlefs, K. Rustan Leino, Greg Nelson, and James Saxe. Extended static checking. Technical Report 159, Compaq Systems Research Center, 1998.
[15] Allyn Dimock, Robert Muller, Franklyn Turbak, and J. B. Wells. Strongly typed flow-directed representation transformations. In Proceedings ICFP '97, International Conference on Functional Programming, ACM SIGPLAN Notices 32(8), pages 11–24, 1997.
[16] Amer Diwan, Kathryn McKinley, and Eliot Moss. Type-based alias analysis. In Proceedings of PLDI'98, ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 106–117, 1998.
[17] Margaret A. Ellis and Bjarne Stroustrup. The Annotated C++ Reference Manual. Addison-Wesley, 1990.
[18] Stephen Fink, Kathleen Knobe, and Vivek Sarkar. Unified analysis of array and object references in strongly typed languages. In Proceedings of SAS’00, 7th International Static Analysis Symposium, pages 155–174. Springer-Verlag (LNCS 1824), 2000.
[19] Adam Fischbach and John Hannan. Type systems and algorithms for useless-variable elimination. In Proceedings of PADO’01, Symposium on Programs as Data Objects, 2001. To appear.
[20] Cormac Flanagan and Stephen Freund. Type-based race detection for Java. In Proceedings of PLDI’00, ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 219–232, 2000.
[21] Sanjay Ghemawat, Keith Randall, and Daniel Scales. Field analysis: Getting useful and low-cost interprocedural information. In Proceedings of PLDI’00, ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 334–344, 2000.
[22] Rajeev Gopal and Stephan R. Schach. Using automatic program decomposition techniques in software maintenance tools. In Proceedings of ICSM’89, International Conference on Software Maintenance, pages 132–141, 1989.
[23] James Gosling, Bill Joy, and Guy Steele. The Java Language Specification. Addison-Wesley, 1996.
[24] Christian Grothoff, Jens Palsberg, and Jan Vitek. Encapsulating objects with confined types. In Proceedings of OOPSLA’01, ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages and Applications, October 2001. To appear.
[25] John Hannan. Type systems for closure conversions. In Proceedings of Workshop on Types for Program Analysis, pages 48–62, 1995.
[26] Nevin Heintze. Control-flow analysis and type systems. In Proceedings of SAS’95, International Static Analysis Symposium, pages 189–206. Springer-Verlag (LNCS 983), Glasgow, Scotland, September 1995.
[27] Nevin Heintze and David McAllester. Linear-time subtransitive control flow analysis. In Proceedings of ACM SIGPLAN 1997 Conference on Programming Language Design and Implementation, pages 261–272, 1997.
[28] Susan Horwitz, Thomas Reps, and David Binkley. Interprocedural slicing using dependence graphs. ACM Transactions on Programming Languages and Systems, 12(1):26–60, 1990.
[29] Antony L. Hosking, Nathaniel Nystrom, David Whitlock, Quintin Cutts, and Amer Diwan. Partial redundancy elimination for access path expressions. Software – Practice & Experience, 31(6):577–600, 2001.
[30] Suresh Jagannathan, Andrew Wright, and Stephen Weeks. Type-directed flow analysis for typed intermediate languages. In Proceedings of SAS’97, International Static Analysis Symposium. Springer-Verlag, 1997.
[31] Thomas Jensen. Inference of polymorphic and conditional strictness properties. In Proceedings of POPL’98, 25th Annual SIGPLAN–SIGACT Symposium on Principles of Programming Languages, pages 209–221, San Diego, California, January 1998.
[32] Pierre Jouvelot and David Gifford. Algebraic reconstruction of types and effects. In Proceedings of POPL’91, SIGPLAN–SIGACT Symposium on Principles of Programming Languages, pages 303–310, 1991.
[33] Brian W. Kernighan and Dennis M. Ritchie. The C Programming Language. Prentice-Hall, 1978.
[34] Naoki Kobayashi. Type-based useless variable elimination. In Proceedings of PEPM’00, ACM Symposium on Partial Evaluation and Semantics-Based Program Manipulation, pages 84–93, 2000.
[35] Tsung-Min Kuo and Prateek Mishra. On strictness and its analysis. In Proceedings of POPL’87, SIGPLAN–SIGACT Symposium on Principles of Programming Languages, pages 144–155, 1987.
[36] Tsung-Min Kuo and Prateek Mishra. Strictness analysis: A new perspective based on type inference. In Proceedings of Conference on Functional Programming Languages and Computer Architecture, pages 260–272, 1989.
[37] John Lucassen and David Gifford. Polymorphic effect systems. In Proceedings of POPL’88, SIGPLAN–SIGACT Symposium on Principles of Programming Languages, pages 47–57, 1988.
[38] Robin Milner. A theory of type polymorphism in programming. Journal of Computer and System Sciences, 17:348–375, 1978.
[39] Robin Milner, Mads Tofte, and Robert Harper. The Definition of Standard ML. MIT Press, 1990.
[40] Greg Morrisett, Karl Crary, Neal Glew, Dan Grossman, Richard Samuels, Frederick Smith, David Walker, Stephanie Weirich, and Steve Zdancewic. Talx86: A realistic typed assembly language. ACM Workshop on Compiler Support for System Software, May 1999.
[41] Greg Morrisett, David Walker, Karl Crary, and Neal Glew. From system F to typed assembly language. In Proceedings of POPL’98, 25th Annual SIGPLAN–SIGACT Symposium on Principles of Programming Languages, pages 85–97, 1998.
[42] Christian Mossin. Exact flow analysis. In Proceedings of SAS’97, International Static Analysis Symposium, pages 250–264. Springer-Verlag (LNCS ), 1997.
[43] Christian Mossin. Flow Analysis of Typed Higher-Order Languages. PhD thesis, DIKU, University of Copenhagen, 1997.
[44] Steven Muchnick. Advanced Compiler Design and Implementation. Morgan Kaufmann, 1997.
[45] Greg Nelson. Systems Programming with Modula-3. Prentice Hall, 1991.
[46] Flemming Nielson. The typed lambda-calculus with first-class processes. In Proceedings of PARLE, pages 357–373, April 1989.
[47] Flemming Nielson and Hanne Riis Nielson. Type and effect systems. In Correct System Design, pages 114–136, 1999.
[48] Flemming Nielson, Hanne Riis Nielson, and Chris Hankin. Principles of Program Analysis. Springer-Verlag, 1999.
[49] Hanne R. Nielson and Flemming Nielson. Automatic binding time analysis for a typed λ-calculus. Science of Computer Programming, 10:139–176, 1988.
[50] Tobias Nipkow and David von Oheimb. Javalight is type-safe – definitely. In Proceedings of POPL’98, 25th Annual SIGPLAN–SIGACT Symposium on Principles of Programming Languages, pages 161–170, San Diego, California, January 1998.
[51] Peter Ørbæk and Jens Palsberg. Trust in the λ-calculus. Journal of Functional Programming, 7(6):557–591, November 1997. Preliminary version in Proceedings of SAS’95, International Static Analysis Symposium, Springer-Verlag (LNCS 983), pages 314–330, Glasgow, Scotland, September 1995.
[52] Jens Palsberg. Closure analysis in constraint form. ACM Transactions on Programming Languages and Systems, 17(1):47–62, January 1995. Preliminary version in Proceedings of CAAP’94, Colloquium on Trees in Algebra and Programming, Springer-Verlag (LNCS 787), pages 276–290, Edinburgh, Scotland, April 1994.
[53] Jens Palsberg and Christina Pavlopoulou. From polyvariant flow information to intersection and union types. Journal of Functional Programming, to appear. Preliminary version in Proceedings of POPL’98, 25th Annual SIGPLAN–SIGACT Symposium on Principles of Programming Languages, pages 197–208, San Diego, California, January 1998.
[54] Jens Palsberg and Michael I. Schwartzbach. Object-oriented type inference. In Proceedings of OOPSLA’91, ACM SIGPLAN Sixth Annual Conference on Object-Oriented Programming Systems, Languages and Applications, pages 146–161, Phoenix, Arizona, October 1991.
[55] Jens Palsberg and Michael I. Schwartzbach. Object-Oriented Type Systems. John Wiley & Sons, 1994.
[56] Jakob Rehof and Manuel Fähndrich. Type-based flow analysis: From polymorphic subtyping to cfl-reachability. In Proceedings of POPL’01, 28th Annual SIGPLAN–SIGACT Symposium on Principles of Programming Languages, pages 54–66, 2001.
[57] Debra J. Richardson. TAOS: Testing with analysis and oracle support. In International Symposium on Software Testing and Analysis, pages 138–153, 1994.
[58] Zhong Shao and Valery Trifonov. Type-directed continuation allocation. In ACM Workshop on Types in Compilation, pages 116–136, Kyoto, Japan, March 1998.
[59] Olin Shivers. Control-Flow Analysis of Higher-Order Languages. PhD thesis, CMU, May 1991. CMU–CS–91–145.
[60] Geoffrey Smith and Dennis Volpano. Secure information flow in multi-threaded imperative language. In Proceedings of POPL’98, 25th Annual SIGPLAN–SIGACT Symposium on Principles of Programming Languages, pages 355–364, 1998.
[61] Kirsten Solberg. Annotated Type Systems for Program Analysis. PhD thesis, University of Aarhus, 1995.
[62] Kirsten Solberg, Hanne Riis Nielson, and Flemming Nielson. Strictness and totality analysis. In Proceedings of SAS’94, International Static Analysis Symposium, pages 408–422. Springer-Verlag (LNCS 864), 1994.
[63] Kirsten Solberg, Hanne Riis Nielson, and Flemming Nielson. Strictness and totality analysis with conjunction. In Proceedings of TAPSOFT’95, Theory and Practice of Software Development, pages 501–515. Springer-Verlag (LNCS 915), Aarhus, Denmark, May 1995.
[64] Vijay Sundaresan, Laurie Hendren, Chrislain Razafimahefa, Raja Vallée-Rai, Patrick Lam, Etienne Gagnon, and Charles Godin. Practical virtual method call resolution for Java. In Proceedings of the Fifteenth Annual Conference on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA’00), pages 264–280, Minneapolis, Minnesota, 2000.
[65] Peter F. Sweeney and Frank Tip. Extracting library-based object-oriented applications. In Proceedings of the Eighth International Symposium on the Foundations of Software Engineering (FSE-8), pages 98–107, November 2000.
[66] Jean-Pierre Talpin and Pierre Jouvelot. The type and effect discipline. Information and Computation, 111:245–296, 1994. A preliminary version was presented at LICS’92.
[67] Yan Mei Tang and Pierre Jouvelot. Separate abstract interpretation for control-flow analysis. In Proceedings of TACS’94, Theoretical Aspects of Computing Software, pages 224–243. Springer-Verlag (LNCS 789), 1994.
[68] Yan Mei Tang and Pierre Jouvelot. Effect systems with subtyping. In Proceedings of PEPM’95, ACM Symposium on Partial Evaluation and Semantics-Based Program Manipulation, pages 45–53. ACM Press, 1995.
[69] Peter Thiemann. Formalizing resource allocation in a compiler. In ACM Workshop on Types in Compilation, pages 178–194, Kyoto, Japan, March 1998.
[70] Frank Tip, Chris Laffra, Peter F. Sweeney, and David Streeter. Practical experience with an application extractor for Java. In Proceedings of the Fourteenth Annual Conference on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA’99), pages 292–305, Denver, CO, 1999. SIGPLAN Notices 34(10).
[71] Frank Tip and Jens Palsberg. Scalable propagation-based call graph construction algorithms. In Proceedings of OOPSLA’00, ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages and Applications, pages 281–293, Minneapolis, Minnesota, October 2000.
[72] Mads Tofte and Jean-Pierre Talpin. Region-based memory management. Information and Computation, 132(2):109–176, 1997.
[73] Franklyn Turbak, Allyn Dimock, Robert Muller, and J. B. Wells. Compiling with polymorphic and polyvariant flow types. In ACM SIGPLAN Workshop on Types in Compilation, June 1997. http://www.cs.bc.edu/~muller/postscript/tic97.ps.Z.
[74] Mark Weiser. Program slicing. IEEE Transactions on Software Engineering, 10(4):352–357, July 1984.
[75] J. B. Wells, Allyn Dimock, Robert Muller, and Franklyn Turbak. A calculus with polymorphic and polyvariant flow types. Journal of Functional Programming. To appear.
[76] Andrew Wright. Typing references by effect inference. In Proceedings of ESOP’92, European Symposium on Programming, pages 473–491. Springer-Verlag (LNCS 582), 1992.
[77] Andrew Wright and Matthias Felleisen. A syntactic approach to type soundness. Information and Computation, 115(1):38–94, 1994.
[78] David A. Wright. A new technique for strictness analysis. In Proceedings of TAPSOFT’91, pages 235–258. Springer-Verlag (LNCS 494), 1991.
