Type Sensitive Control Flow Analysis
John Reppy
University of Chicago
4 We discuss handling polymorphic types and user-defined type constructors in Section 6.
5 In a language with sum types, deconstruction would be replaced by a case expression.
unknown value:

    U(T)         = T̂
    U(τ1 → τ2)   = (τ1 → τ2)^
    U(⟨τ1, τ2⟩)  = ⟨U(τ1), U(τ2)⟩

Note that for unknown pair values, we preserve the fact that they have a pair structure. Lastly, the ⊤ value is used to cut off expansion of recursive types as described below.

We define the join of two approximate values as follows:

    ⊥ ∨ v̂ = v̂
    v̂ ∨ ⊥ = v̂
    C(v̂1) ∨ C(v̂2) = C(v̂1 ∨ v̂2)
    ⟨v̂1, v̂2⟩ ∨ ⟨v̂1′, v̂2′⟩ = ⟨v̂1 ∨ v̂1′, v̂2 ∨ v̂2′⟩
    F ∨ F′ = F ∪ F′
    ⊤ ∨ v̂ = ⊤
    τ̂ ∨ v̂ = τ̂
    v̂ ∨ τ̂ = τ̂
    v̂ ∨ ⊤ = ⊤

Note that this operation is not total, but it is defined for any two approximate values of the same type and we show in Section 5 that it preserves types.

One technical complication is that we need to keep our approximate values finite. For example, consider the following pathological example:

    abstype T = C of T with fun f (x) = C x end

If we are not careful, our analysis might diverge computing ever larger approximations of C^∞(⊥) as the result of f. To avoid this problem, we define a limit on the depth of approximations for recursive types as follows:

    ⌈⊥⌉_𝒞            = ⊥
    ⌈C^(τ→T)(v̂)⌉_𝒞  = ⊤                    if C ∈ 𝒞
                     = C(⌈v̂⌉_(𝒞∪{C}))      if C ∉ 𝒞
    ⌈⟨v̂1, v̂2⟩⌉_𝒞    = ⟨⌈v̂1⌉_𝒞, ⌈v̂2⌉_𝒞⟩
    ⌈F⌉_𝒞            = F
    ⌈τ̂⌉_𝒞            = τ̂

where 𝒞 ⊂ DATACON is a set of constructors. We write ⌈v̂⌉ for ⌈v̂⌉_∅. We use ⊤ to cut off the expansion of approximate values instead of T̂, because the approximation of escaping values of type T may not be an accurate approximation of the nested values. This definition does not allow nested applications of the same constructor. For example, the analysis will be forced to approximate the escaping values of type T by C(⊤) in the above example.

3.3 CFA

Our analysis algorithm computes a triple of approximations: A = (𝒱, ℛ, 𝒯), where

    𝒱 ∈ VAR → VALUE^      variable approximation
    ℛ ∈ FUNID → VALUE^    function-result approximation
    𝒯 ∈ ABSTY → VALUE^    escaping abstract-value approximation

Our 𝒱 approximation corresponds to Serrano’s A. The ℛ approximation records an approximation of function results for each known function; this approximation is used in lieu of analysing a function’s body when the function is already being analysed and is needed to guarantee termination. We use the 𝒯 approximation to interpret abstract values of the form T̂.

We present the analysis algorithm using SML syntax extended with mathematical notation such as set operations, and the ∨ operation on approximate values. We use the notation [[e]] to denote that “e” is an object-language syntactic form and 𝒱[x ↦ v] to denote the functional update of an approximation (likewise for ℛ and 𝒯).

Our unit of analysis is a single abstype definition. We use LVAR to denote the set of variables defined in the definition, GVAR to denote variables defined elsewhere, and VAR = LVAR ∪ GVAR for all variables defined or mentioned in the definition being analyzed. We denote the known function identifiers by FUNID ⊂ LVAR (i.e., those variables that are defined by function bindings). These include the top-level function bindings in the definition, as well as any nested function definitions. Our algorithm analyses the function definitions in the declaration repeatedly until a fixed point is reached. The initial approximations map local variables, function results, and abstract types to ⊥, and map global variables and external types to unknown values.

    fun cfa [[abstype T = C of τ with fb1 ··· fbn end]] =
          let
            fun iterate A0 = let
                  val A1 = cfaFB (A0, fb1)
                  ···
                  val An = cfaFB (An−1, fbn)
                  in
                    if (A0 ≠ An)
                      then iterate An
                      else A0
                  end
            let 𝒱 = {x ↦ ⊥ | x ∈ LVAR \ FUNID}
                  ∪ {f ↦ {f} | f ∈ FUNID}
                  ∪ {x ↦ U(τ) | x^τ ∈ GVAR}
            let ℛ = {f ↦ ⊥ | f ∈ FUNID}
            let 𝒯 = {T ↦ ⊥} ∪ {S ↦ Ŝ | S ∈ (ABSTY \ {T})}
          in
            iterate (𝒱, ℛ, 𝒯)
          end

The cfaFB function analyses a function binding in the abstype definition by “applying” the function to the unknown value of the function’s argument type. The result of the application is then recorded as escaping.

    fun cfaFB (A, [[fun f (x^τ) = e]]) = let
          val (A, v̂) = applyFun ({}, A, f, U(τ))
          in
            escape ({}, A, v̂)
          end

The applyFun function analyses the application of a known function f to an approximate value v̂. The first argument to applyFun is a set M ∈ 2^FUNID of known functions that are currently being analysed; if f is in this set, then we use the approximation ℛ instead of recursively analysing f’s body. This mechanism is necessary to guarantee termination when analysing recursive functions. We assume the existence of a helper function bindingOf that maps known function names to their bindings in the source. Once we have computed the approximate result (r) of evaluating the function’s body, we add that information to the result approximation.

    fun applyFun (M, A as (𝒱, ℛ, 𝒯), f, v) =
          if f ∈ M
            then (A, ℛ(f))
            else let
              val [[fun f (x) = e]] = bindingOf (f)
              val 𝒱 = 𝒱[x ↦ ⌈𝒱(x) ∨ v⌉]
              val ((𝒱, ℛ, 𝒯), r) =
                    cfaExp (M ∪ {f}, (𝒱, ℛ, 𝒯), [[e]])
              val ℛ = ℛ[f ↦ ⌈ℛ(f) ∨ r⌉]
              in
                ((𝒱, ℛ, 𝒯), r)
              end
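The join ∨ and the depth limit ⌈·⌉ that applyFun applies to every update can be sketched in Python as follows. This is only an illustration under stated assumptions: the tuple encoding of approximate values and the names `join`, `limit`, `con`, `pair`, and `unk` are hypothetical and not from the paper. The sketch lets ⊤ win when ⊤ meets an unknown value, and it leaves ⊤ unchanged under the limit, two cases the definitions do not spell out.

```python
# Hypothetical encoding of approximate values (not from the paper):
BOT = ("bot",)   # ⊥
TOP = ("top",)   # ⊤

def con(c, v):   # C(v): constructor application
    return ("con", c, v)

def pair(a, b):  # ⟨v1, v2⟩
    return ("pair", a, b)

def unk(ty):     # the unknown value of type ty
    return ("unk", ty)

def tag(v):
    """Sets of known functions are frozensets; everything else is a tagged tuple."""
    return "funs" if isinstance(v, frozenset) else v[0]

def join(a, b):
    """Join of two approximate values of the same type."""
    if a == BOT:
        return b
    if b == BOT:
        return a
    if a == TOP or b == TOP:          # assumption: ⊤ absorbs unknown values too
        return TOP
    if tag(a) == "unk":
        return a
    if tag(b) == "unk":
        return b
    if tag(a) == "funs" and tag(b) == "funs":
        return a | b
    if tag(a) == "con" and tag(b) == "con" and a[1] == b[1]:
        return con(a[1], join(a[2], b[2]))
    if tag(a) == "pair" and tag(b) == "pair":
        return pair(join(a[1], b[1]), join(a[2], b[2]))
    raise ValueError("join is only defined for values of the same type")

def limit(v, seen=frozenset()):
    """Depth limit: cut off a nested application of an already-seen constructor."""
    if tag(v) == "con":
        _, c, inner = v
        if c in seen:
            return TOP
        return con(c, limit(inner, seen | {c}))
    if tag(v) == "pair":
        return pair(limit(v[1], seen), limit(v[2], seen))
    return v                          # ⊥, function sets, unknowns (and ⊤, by assumption)

# Simplified model of the pathological example: each round wraps the
# previous approximation in C, joins, and applies the depth limit.
r = BOT
while True:
    nxt = limit(join(r, con("C", r)))
    if nxt == r:
        break
    r = nxt
```

In this simplified model the loop stabilises at C(⊤), matching the text's claim for the pathological example; without `limit` the approximations would keep growing.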
The escape function records the fact that a value escapes into the wild. If the value has an abstract type, then it is added to the approximation of wild values for the type; if it is a set of known functions, then we apply each function in the set to the appropriate top value; and if it is a pair, we record that its subcomponents are escaping. The escape function also takes the set of currently active functions as its first argument, since it needs to pass this value to the applyFun function.

    fun escape (_, (𝒱, ℛ, 𝒯), C(v̂)) =
          (𝒱, ℛ, 𝒯[T ↦ ⌈𝒯(T) ∨ C(v̂)⌉])
      | escape (M, A, F) = let
          fun esc (f^(τ1 → τ2), A) = let
                val (A, v̂) = applyFun (M, A, f, U(τ1))
                in A end
          in
            fold esc A F
          end
      | escape (M, A, ⟨v̂1, v̂2⟩) = let
          val A1 = escape (M, A, v̂1)
          val A2 = escape (M, A1, v̂2)
          in A2 end
      | escape (_, A, v̂) = A

Expressions are analysed by the cfaExp function, whose code is given in Figure 2. This function takes the set of active functions, an approximation triple, and a syntactic expression as arguments and returns updated approximations and a value that approximates the result of the expression. For function applications, we use the apply helper function (discussed below) and for value deconstruction, we use the decon helper function, which handles the deconstruction of approximate values and their binding to variables. When the value is unknown (i.e., T̂), then we use the 𝒯 approximation to determine the value being deconstructed.

    fun cfaExp (M, A as (𝒱, ℛ, 𝒯), [[x]]) = (A, 𝒱(x))
      | cfaExp (M, A, [[let x = e1 in e2]]) = let
          val ((𝒱, ℛ, 𝒯), v̂) = cfaExp (M, A, [[e1]])
          val 𝒱 = 𝒱[x ↦ ⌈𝒱(x) ∨ v̂⌉]
          in
            cfaExp (M, (𝒱, ℛ, 𝒯), [[e2]])
          end
      | cfaExp (M, A, [[fun f (x) = e1 in e2]]) =
          cfaExp (M, A, [[e2]])
      | cfaExp (M, A, [[e1 e2]]) = let
          val (A, v̂1) = cfaExp (M, A, [[e1]])
          val (A, v̂2) = cfaExp (M, A, [[e2]])
          in
            apply (M, A, v̂1, v̂2)
          end
      | cfaExp (M, A, [[C e]]) = let
          val (A, v̂) = cfaExp (M, A, [[e]])
          in
            (A, C(v̂))
          end
      | cfaExp (M, A, [[let C x = e1 in e2]]) = let
          val ((𝒱, ℛ, 𝒯), v̂) = cfaExp (M, A, [[e1]])
          val 𝒱 = decon (𝒱, 𝒯, [[C x]], v̂)
          in
            cfaExp (M, (𝒱, ℛ, 𝒯), [[e2]])
          end
      | cfaExp (M, A, [[⟨e1, e2⟩]]) = let
          val (A, v̂1) = cfaExp (M, A, [[e1]])
          val (A, v̂2) = cfaExp (M, A, [[e2]])
          in
            (A, ⟨v̂1, v̂2⟩)
          end
      | cfaExp (M, A, [[fst(e)]]) = (
          case cfaExp (M, A, [[e]])
           of (A, ⟨v̂1, v̂2⟩) => (A, v̂1)
            | (A, v̂) => (A, v̂)
          (* end case *))
      | cfaExp (M, A, [[snd(e)]]) = (
          case cfaExp (M, A, [[e]])
           of (A, ⟨v̂1, v̂2⟩) => (A, v̂2)
            | (A, v̂) => (A, v̂)
          (* end case *))

Figure 2. CFA for expressions

    fun decon (𝒱, 𝒯, [[C(x)]], C(v̂)) = 𝒱[x ↦ ⌈𝒱(x) ∨ v̂⌉]
      | decon (𝒱, 𝒯, [[C^(τ→T)(x)]], T̂) = (case 𝒯(T)
           of T̂ => 𝒱[x ↦ ⌈𝒱(x) ∨ U(τ)⌉]
            | v̂ => decon (𝒱, 𝒯, [[C(x)]], v̂)
          (* end case *))
      | decon (𝒱, 𝒯, [[C(x)]], ⊥) = 𝒱
      | decon (𝒱, 𝒯, [[C(x)]], ⊤) = 𝒱[x ↦ ⊤]

The apply function records the fact that an approximate function value is being applied to an approximate argument. When the approximation is a set of known functions, then we apply each function in the set to the argument and compute the join of the results. When the function is unknown (i.e., a top value), then the argument is marked as escaping and the result is the top value for the function’s range.

    fun apply (M, A, F, arg) = let
          fun applyf (f, (A, res)) = let
                val (A, v̂) = applyFun (M, A, f, arg)
                in
                  (A, res ∨ v̂)
                end
          in
            fold applyf (A, ⊥) F
          end
      | apply (M, A, (τ1 → τ2)^, v̂) = let
          val A = escape (M, A, v̂)
          in
            (A, τ̂2)
          end
      | apply (M, A, ⊥, v̂) = (A, ⊥)
      | apply (M, A, ⊤, v̂) = let
          val A = escape (M, A, v̂)
          in
            (A, ⊤)
          end

4. An example

To illustrate the analysis, we step through its application to the following small example:⁶

    abstype t = C of (int * int)
    with
      fun new x = C(1, x)
      fun pick y = let C(z) = y in fst(z)
    end

This code has two functions and three other local variables:

    FUNID = {new, pick}
    LVAR  = {x, y, z} ∪ FUNID
    GVAR  = {}

6 This example is a variation of the one given in Section 2. We have also taken the liberty of adding integer constants to our language and to the representation of approximate values.
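As a small illustration, the sets FUNID, LVAR, and GVAR of this example determine the initial approximations through the set comprehensions in the cfa function. The Python sketch below writes that step out under stated assumptions: the function `initial_approximations`, the string encoding of ⊥, and the string encoding of unknown values are hypothetical, and the simplified `unknown` helper only handles base and abstract type names, not function or pair types.

```python
BOT = "⊥"  # hypothetical encoding of the bottom approximation

def unknown(ty):
    """Simplified U(τ) for a base or abstract type name (assumption: no structured types)."""
    return f"{ty}^"

def initial_approximations(lvar, funid, gvar, abstype, other_abstypes=()):
    """Build the initial (V, R, T) triple following the definitions in cfa.

    gvar maps each global variable to the name of its type.
    """
    V = {x: BOT for x in lvar - funid}                  # local variables start at ⊥
    V.update({f: frozenset({f}) for f in funid})        # known functions map to themselves
    V.update({x: unknown(ty) for x, ty in gvar.items()})  # globals are unknown
    R = {f: BOT for f in funid}                         # no results observed yet
    T = {abstype: BOT}                                  # nothing has escaped yet
    T.update({s: unknown(s) for s in other_abstypes})   # external abstract types are unknown
    return V, R, T

# The sets of the running example.
FUNID = {"new", "pick"}
LVAR = {"x", "y", "z"} | FUNID
GVAR = {}

V0, R0, T0 = initial_approximations(LVAR, FUNID, GVAR, "t")
```

Here `V0` maps x, y, and z to ⊥ and each function name to its singleton set, `R0` maps both functions to ⊥, and `T0` maps t to ⊥.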
The initial approximations are

    A0 = (𝒱0, ℛ0, 𝒯0)
    𝒱0 = {new ↦ {new}, pick ↦ {pick}} ∪ {x ↦ ⊥, y ↦ ⊥, z ↦ ⊥}
    ℛ0 = {f ↦ ⊥ | f ∈ FUNID}
    𝒯0 = {t ↦ ⊥} ∪ {S ↦ Ŝ | S ≠ t}

The cfa function will apply cfaFB to each of the bindings. We start with new and compute

    (A1, v̂1) = applyFun({}, A0, new, int^)

Computing this application of applyFun involves computing

    cfaExp ({new}, (𝒱1, ℛ0, 𝒯0), [[C(1, x)]])

where 𝒱1 = 𝒱0[x ↦ int^]. The result of analyzing the body of new will be ((𝒱1, ℛ0, 𝒯0), C(1, int^)), thus we have

    A1 = (𝒱1, ℛ1, 𝒯0)
    ℛ1 = {new ↦ C(1, int^), pick ↦ ⊥}
    v̂1 = C(1, int^)

The last step in analyzing new is to compute

    escape ({}, A1, v̂1)

which results in an enriched approximation that records the escaping value of type t.

    A2 = (𝒱1, ℛ1, 𝒯1)
    𝒯1 = {t ↦ C(1, int^)}

Now we are ready to analyse the pick function binding, which means computing

    (A3, v̂3) = applyFun({}, A2, pick, t̂)

Let 𝒱2 = 𝒱1[y ↦ t̂], then we evaluate the body of pick with the initial approximations (𝒱2, ℛ1, 𝒯1). The interesting case is when cfaExp gets to the deconstruction of the value bound to y. In this case, we must compute

    decon (𝒱2, 𝒯1, [[C(z)]], t̂)

This case is handled by the second clause of the decon function, which applies 𝒯1 to t, producing C(1, int^), which results in the recursive call of decon

    decon (𝒱2, 𝒯1, [[C(z)]], C(1, int^))

The recursive call is handled by the first clause of decon, which returns the augmented value approximation

    𝒱3 = 𝒱2[z ↦ (1, int^)]

This approximation will be used in the analysis of [[fst(z)]], which will produce 1 as its approximate result. Thus, the result of applyFun on pick is

    A3 = (𝒱3, ℛ2, 𝒯1)
    ℛ2 = ℛ1[pick ↦ 1]
    v̂3 = 1

The escape function will not change the approximations in this case, so A3 is the result of the first iteration over the function bindings. It is also the fixed point of the analysis for this example, so the final result is:

    𝒱(x) = int^
    𝒱(y) = C(1, int^)
    𝒱(z) = (1, int^)
    ℛ(new) = C(1, int^)
    ℛ(pick) = 1
    𝒯(t) = C(1, int^)

5. Correctness of the analysis

The correctness of our analysis can be judged on several dimensions. First, there is the question of safety: does the analysis compute an approximation of the actual computation? Second is the question of whether the algorithm terminates. The third correctness issue is the question of whether the approximations computed by the analysis are faithful to the type of the program. This question is important, since our analysis is guided by type information in a number of situations. We discuss the first and third of these questions in the remainder of this section.

5.1 Safety

We postulate that our analysis is safe, i.e., that if it computes a value approximation 𝒱, then for any variable x ∈ dom(𝒱) and any execution of the program, the values taken on by x will be covered by the approximate value 𝒱(x). One can formalize this statement in terms of a collecting semantics [CC77, Shi91] and prove it using standard techniques, but we will make a less formal argument here. The core of our analysis is the well-known 0-CFA, which has been proven correct in a number of papers, but we have extended this analysis with the 𝒯 approximation for tracking abstract values that escape to the wild. For a given abstract-type definition

    abstype T = C of τ in ... end

the ML type system restricts the scope of C to the “...”; thus, values of type T can only be constructed/deconstructed in the body of the definition. Therefore, we claim that if the 𝒯 approximation computed by our analysis is “safe,” then our analysis is correct. There are two aspects to the safety of 𝒯: first, does it correctly approximate the values that escape and second, does the analysis correctly identify all possible places where escaping values could reenter the definition? There are only two ways for values to escape the definition: they can be returned in the result of one of the operations or they can be passed as an argument to an external or unknown function.⁷ The first of these cases is covered by the call to escape in cfaFB, while the second is covered by the call to escape in apply. Thus, we claim that any escaping abstract value in any possible execution will be covered by the 𝒯 approximation computed by our analysis. There are also only two ways for escaping values to enter the definition: they can be passed in an argument to one of the definition’s operations or they can be returned by a call to an external or unknown function. In both of these situations, we use the approximate value U(τ) to represent unknown values of type τ. If such a value propagates to a deconstruction site, then we use the 𝒯 approximation to determine the values bound to the pattern variables.

5.2 Termination

The question of termination has been addressed by Serrano [Ser95] (the bounding of the sizes of abstract values is crucial to the termi-

7 If our language had references, then assignment would be another way for values to escape.
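    ⊢ fb1 : Ok  ···  ⊢ fbn : Ok
    ------------------------------------------------------
    ⊢ abstype T = C^(τ→T) of τ with fb1 ··· fbn end : Ok

    ⊢ e : τ2
    -------------------------------------
    ⊢ fun f^(τ1→τ2) (x^τ1) = e : Ok

    ⊢ x^τ : τ

    ⊢ e1 : τ1    ⊢ e2 : τ2
    -------------------------------
    ⊢ let x^τ1 = e1 in e2 : τ2

    ⊢ fun f (x) = e1 : Ok    ⊢ e2 : τ
    -----------------------------------
    ⊢ fun f (x) = e1 in e2 : τ

    ⊢ e1 : τ2 → τ    ⊢ e2 : τ2
    ----------------------------
    ⊢ e1 e2 : τ


    ⊢ τ : Type
    -----------
    ⊢ ⊥ : τ

    ⊢ v̂ : τ
    -------------------
    ⊢ C^(τ→T)(v̂) : T

    ⊢ v̂1 : τ1    ⊢ v̂2 : τ2
    --------------------------
    ⊢ ⟨v̂1, v̂2⟩ : τ1 × τ2

    ⊢ f : τ → τ′ for all f ∈ F
    ----------------------------
    ⊢ F : τ → τ′

    ⊢ τ̂ : τ

    ⊢ τ : Type
    -----------
    ⊢ ⊤ : τ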
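The typing rules for approximate values read naturally as a recursive check. The Python sketch below is one possible rendering under stated assumptions: the tuple encoding of values and types, the function `has_type`, and the tables `con_sigs` and `fun_types` are hypothetical, and the well-formedness premise ⊢ τ : Type is taken for granted rather than checked.

```python
BOT = ("bot",)   # ⊥
TOP = ("top",)   # ⊤

def has_type(v, ty, con_sigs, fun_types):
    """Return True if the approximate value v has type ty.

    con_sigs maps a constructor name to (argument type, abstract result type);
    fun_types maps a known function name to its type. Both tables are
    hypothetical inputs for this sketch.
    """
    if v == BOT or v == TOP:
        return True                                   # ⊥ and ⊤ inhabit every type
    if isinstance(v, frozenset):
        return all(fun_types[f] == ty for f in v)     # every f in F has the type
    kind = v[0]
    if kind == "unk":
        return v[1] == ty                             # the unknown value of τ has type τ
    if kind == "con":
        arg_ty, res_ty = con_sigs[v[1]]
        return res_ty == ty and has_type(v[2], arg_ty, con_sigs, fun_types)
    if kind == "pair":
        return (isinstance(ty, tuple) and ty[0] == "pair"
                and has_type(v[1], ty[1], con_sigs, fun_types)
                and has_type(v[2], ty[2], con_sigs, fun_types))
    return False

# Hypothetical signature in the style of the running example: C : int × int → t.
con_sigs = {"C": (("pair", "int", "int"), "t")}
fun_types = {"new": ("fun", "int", "t")}

value = ("con", "C", ("pair", ("unk", "int"), ("unk", "int")))
ok = has_type(value, "t", con_sigs, fun_types)
```

A check of this shape is one way to test, on concrete values, the property that joins and updates in the analysis stay within a single type.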
    U(α) = ⊤
    U(τ⃗ T) = (τ⃗ T)^

8 Another analysis technique might also work, but our type-sensitive CFA fits well with common CML programming idioms.
    abstype serv = S of (int * int chan) chan
    in
      fun new () = let
            val reqCh = channel()
            fun server v = let
                  val (req, replCh) = recv reqCh
                  in
                    send(replCh, v);
                    server req
                  end
            in
              spawn (server 0);
              S reqCh
            end
      fun call (S ch, v) = let
            val replCh = channel()
            in
              send (ch, (v, replCh));
              recv replCh
            end
    end

Figure 5. A simple service with an abstract client-server protocol

by an abstract type, and the call function allows clients to request the service. We have developed an analysis that can detect that the server’s request channel (reqCh) is used by potentially many different senders, but by only one receiver, and that the reply channel allocated for a given request (replCh) is used only once [Xia05, RX06]. These properties allow the optimizer to replace the general channel operations with more specialized ones.

Continuing with the example, Figure 6 illustrates the flow of the server’s request channel from its allocation site in the new function to its two use sites (the recv in new and the send in call). Note that while the unknown clients of the service may store a serv value in data structures, etc., they may not access the internal representation and perform operations directly on the request channel. Because the server’s request channel is embedded in an abstract value, our analysis is able to determine its use sites and we can classify it as a known channel. The analysis will also identify replCh as a known channel, which allows the communication topology of this example to be accurately approximated.

[Figure 6: the new and call functions of Figure 5 shown next to a box labelled “Unknown clients”.]

7.2 Perlin noise

Procedural generation of geometry and textures in computer graphics applications often uses noise functions to introduce variability and produce more realistic images. One popular technique, which was developed by Ken Perlin, defines a mapping from ℝ^n to the unit interval, such that values that are close in ℝ^n will be mapped to values that are close (i.e., the noise has local similarity, but global randomness) [PH89, Per02]. To compute the noise function, we first divide up a unit n-cube into equal sized cells and pre-compute a random gradient vector in ℝ^n for each of the cell corners. Then the noise function is computed by mapping a point in ℝ^n to a cell and interpolating between the 2^n gradient vectors at the corners of the cell. In a typical application, the noise function is sampled frequently and so its efficiency is paramount. The standard implementation of Perlin’s noise function uses a pair of precomputed arrays to allow fast calculation of the noise function. One of these arrays is an array of 256 random gradient vectors in ℝ^n; the second is a random permutation array of byte-sized indices. The permutation array is used to map cell corners to indices in the gradient-vector array. In a language like SML, the performance of the noise function can be significantly reduced by the overhead of array bounds checking,⁹ so array-bounds-check elimination is an important optimization for this application. To make this example more concrete, Figure 7 sketches the code for 1D Perlin noise. The key point about this code is that the array subscript operations used to compute u and v can be statically eliminated as long as we know that the perm and grad arrays have 256 elements. Using our type-sensitive CFA, we can map the arrays used in noise back to the allocation sites in mkNoise and thus enable array-bounds-check elimination.

    abstype noise = N of {
        perm : Word8Array.array,
        grad : real array
      }
    in
      fun mkNoise () = let
            val perm = Word8Array.array(256, 0w0)
            val grad = Array.array(256, 0.0)
            ...
            in
              N{perm = perm, grad = grad}
            end
      fun noise (N{perm, grad}, x) = let
            val t = x + 512.0
            val b0 = (floor t) mod 256
            val b1 = (b0 + 1) mod 256
            val r0 = t - Real.realFloor t
            val r1 = r0 - 1.0
            val sX = sCurve r0
            fun get i =
                  Array.sub(grad,
                    Word8.toInt(
                      Word8Array.sub(perm, i)))
            val u = r0 * get b0
            val v = r1 * get b1
            in
              lerp (sX, u, v)
            end
    end

Figure 7. 1D Perlin Noise

8. Related work

The application of control-flow analysis for higher-order functional languages dates back to Shivers’ seminal work on control-flow analysis for Scheme [Shi88, Shi91]. Many variations of this approach have been published including Serrano’s algorithm on which we base the presentation in this paper [Ser95].

There is a significant body of work that falls into the intersection of type systems and program analysis. Some researchers have used control-flow analysis to compute type information for untyped languages [Shi91], while others have used type systems for program analysis [Pal01, Jen02].

Perhaps the most closely related work has been on using type information to guide analyses. For example, Jagannathan et al. devised a flow analysis for a typed intermediate language as one might find in an ML compiler. Their analysis uses type information to control polyvariance in the analysis and they prove that the analysis respects the type system [WJ98]. Saha et al. used type information to improve the performance of a demand-driven implementation of CFA [SHO98] in the SML/NJ compiler. Lastly, Diwan et al. used type information to improve alias analysis for Modula-3 programs [DMM98]. We are not aware of any existing algorithms that use type abstraction to track values leaving and re-entering the unit of analysis as we do.

9 For example, in the two-dimensional case, the noise function does 10 array subscript operations.
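Returning to the 1D Perlin noise code of Section 7.2, the bounds argument can be checked by a direct transliteration into Python with the bounds tests written out as assertions. This is a sketch under stated assumptions: the bodies of `s_curve` and `lerp` are not given in the paper and are filled in here with a common smoothstep and linear interpolation, and the seeded random initialisation in `mk_noise` stands in for the elided part of mkNoise.

```python
import math
import random

SIZE = 256  # both arrays are allocated with 256 elements in mkNoise

def mk_noise(seed=0):
    """Allocate the permutation and gradient arrays (initialisation is an assumption)."""
    rng = random.Random(seed)
    p = list(range(SIZE))
    rng.shuffle(p)
    perm = bytes(p)                                   # byte-sized indices, each below 256
    grad = [rng.uniform(-1.0, 1.0) for _ in range(SIZE)]
    return perm, grad

def s_curve(t):
    """Hypothetical smoothing curve (smoothstep); not specified in the paper."""
    return t * t * (3.0 - 2.0 * t)

def lerp(t, a, b):
    """Hypothetical linear interpolation between a and b."""
    return a + t * (b - a)

def noise(perm, grad, x):
    t = x + 512.0
    b0 = math.floor(t) % SIZE          # always in 0..255
    b1 = (b0 + 1) % SIZE               # always in 0..255
    r0 = t - math.floor(t)
    r1 = r0 - 1.0
    sx = s_curve(r0)

    def get(i):
        # These are the two checks that become redundant once both
        # arrays are known to have 256 elements.
        assert 0 <= i < len(perm)
        j = perm[i]                    # a byte, so 0 <= j <= 255
        assert 0 <= j < len(grad)
        return grad[j]

    u = r0 * get(b0)
    v = r1 * get(b1)
    return lerp(sx, u, v)

perm, grad = mk_noise()
samples = [noise(perm, grad, k * 0.37) for k in range(-2000, 2000)]
```

The assertions never fire because the index into `perm` is reduced modulo 256 and every entry of `perm` is a byte; the only fact missing for a compiler is the array lengths, which is what mapping the arrays back to their allocation sites supplies.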