Static Semantics
15-411: Compiler Design
Frank Pfenning
Lecture 12
October 3, 2013
1 Introduction
After lexing and parsing, a compiler will usually apply elaboration to translate the
parse tree to a high-level intermediate form often called abstract syntax. Then we
verify that the abstract syntax satisfies the requirements of the static semantics.
Sometimes, there is some ambiguity whether a given condition should be enforced
by the grammar, by elaboration, or while checking the static semantics. We will not
be concerned with details of attribution, but with how to describe and then implement
various static semantic conditions. The principal properties to verify for C0 and
the sublanguages discussed in this course are:
• Initialization: variables must be defined before they are used.
• Proper returns: functions that return a value must have an explicit return state-
ment on every control flow path starting at the beginning of the function.
• Types: the program must be well-typed.
Type checking is frequently discussed in the literature, so we use initialization as
our running example and discuss typing at the end, in Section 9.
2 Abstract Syntax
We will use a slightly restricted form of the abstract syntax in Lecture 10 on IR trees,
with the addition of variable declarations with their scope. This fragment exhibits
all the relevant features for the purposes of the present lecture.
Expressions e ::= n | x | e1 ⊕ e2 | e1 && e2
Statements s ::= assign(x, e) | if(e, s1 , s2 ) | while(e, s)
| return(e) | nop | seq(s1 , s2 ) | decl(x, τ, s)
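\[
\frac{\mathsf{use}(e_1, x)}{\mathsf{use}(e_1 \oplus e_2, x)}
\qquad
\frac{\mathsf{use}(e_2, x)}{\mathsf{use}(e_1 \oplus e_2, x)}
\]
\[
\frac{\mathsf{use}(e_1, x)}{\mathsf{use}(e_1 \mathbin{\&\&} e_2, x)}
\qquad
\frac{\mathsf{use}(e_2, x)}{\mathsf{use}(e_1 \mathbin{\&\&} e_2, x)}
\]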
We see already here that use(e, x) is a so-called may-property: x may be used
during the evaluation of e, but it is not guaranteed to be actually used. For example,
the expression y > 0 && x > 0 may or may not actually use x. The expression
false && x > 0 will actually never use x, and yet we flag it as possibly being used.
This is appropriate: we would like to raise an error if there is a possibility that
an uninitialized variable may be used. Because determining this in general is undecidable,
we need to approximate it. Our approximation essentially says that any
variable occurring in an expression may be used. The rules above express this more
formally.
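As an illustration, the use judgment can be sketched as a recursive function in Python. The tuple encoding of expressions is a hypothetical choice made for this example, and the base cases for variables and constants (a variable uses itself, a constant uses nothing) are assumed rather than taken from the rules shown above.

```python
# Expressions as tagged tuples (hypothetical encoding, not from the lecture):
#   ("num", n) | ("var", x) | ("binop", e1, e2) | ("and", e1, e2)

def use(e, x):
    """May-property: True if variable x occurs anywhere in expression e."""
    tag = e[0]
    if tag == "num":
        return False              # assumed: a constant uses no variable
    if tag == "var":
        return e[1] == x          # assumed: a variable uses itself
    if tag in ("binop", "and"):
        # one rule per operand: x is used if it is used on either side
        return use(e[1], x) or use(e[2], x)
    raise ValueError(f"unknown expression: {e!r}")

# A stand-in for "false && x > 0": the left operand is a constant,
# yet x is still flagged as possibly used.
e = ("and", ("num", 0), ("binop", ("var", "x"), ("num", 0)))
print(use(e, "x"))
print(use(e, "y"))
```

The function never tries to evaluate the left operand of &&, which is exactly the approximation described above.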
For a language to be usable, it is important that the rules governing the static
semantics are easy to understand for the programmer and have some internal co-
herence. While it might make sense to allow false && x > 0 in particular, what is
the general rule? Designing programming languages and their static semantics is
difficult and requires a good balance of formal understanding of the properties of
programming languages and programmer’s intuition.
Once we have defined use for expressions, we should consider statements.
Does an assignment x = e use x? Our prior experience with liveness analysis
for register allocation on abstract machine code would say: only if it is used in e.
We stay consistent with this intuition and terminology and write live(s, x) for the
judgment that x is live in s. This means the value of x is relevant to the execution
of s.
Before we specify liveness, we should specify when a variable is defined. This is
because, for example, the variable x is not live before the statement x = 3, because
its current value does not matter for this statement, or any subsequent statement.
We write def(s, x) if the execution of statement s will define x. This is an example
of a must-property: we want to be sure that whenever s executes (and completes
normally, without returning from the current function or raising an exception of
some form), then x has been defined.
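\[
\frac{}{\mathsf{def}(\mathsf{assign}(x, e), x)}
\qquad
\text{no rule for } \mathsf{def}(\mathsf{assign}(y, e), x),\ y \neq x
\]
\[
\frac{\mathsf{def}(s_1, x) \quad \mathsf{def}(s_2, x)}{\mathsf{def}(\mathsf{if}(e, s_1, s_2), x)}
\qquad
\text{no rule for } \mathsf{def}(\mathsf{while}(e, s), x)
\]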
The last two rules clearly illustrate that def(s, x) is a must-property: a conditional
only defines a variable if it is defined along both branches, and a while loop
does not define any variable (since the body may never be executed).
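\[
\text{no rule for } \mathsf{def}(\mathsf{nop}, x)
\qquad
\frac{\mathsf{def}(s_1, x)}{\mathsf{def}(\mathsf{seq}(s_1, s_2), x)}
\qquad
\frac{\mathsf{def}(s_2, x)}{\mathsf{def}(\mathsf{seq}(s_1, s_2), x)}
\]
\[
\frac{\mathsf{def}(s, x) \quad y \neq x}{\mathsf{def}(\mathsf{decl}(y, \tau, s), x)}
\]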
The side condition on the last rule applies because s is the scope of y. If we have
already checked variable scoping, then in the particular case of C0, y could not be
equal to x because that would have led to an error earlier. However, even in this
case it may be less error-prone to simply check the condition even if it might be
redundant.
A strange case arises for return statements. Since a return statement never completes
normally, any subsequent statements are unreachable. It is therefore permissible
to claim that all variables currently in scope have been defined. We capture
this by simply stating that return(e) defines any variable.
def(return(e), x)
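The rules for def can be transcribed case by case. The following Python sketch uses a hypothetical tuple encoding of statements and the name defines (since def is a Python keyword); it is one possible reading of the rules, not the course's reference code.

```python
# Statements as tagged tuples (hypothetical encoding, not from the lecture):
#   ("assign", x, e) | ("if", e, s1, s2) | ("while", e, s) | ("return", e)
#   ("nop",) | ("seq", s1, s2) | ("decl", y, tau, s)

def defines(s, x):
    """Must-property: True if x is defined whenever s completes normally."""
    tag = s[0]
    if tag == "assign":
        return s[1] == x                      # only the assigned variable
    if tag == "if":
        return defines(s[2], x) and defines(s[3], x)   # both branches
    if tag == "while":
        return False                          # body may never execute
    if tag == "nop":
        return False
    if tag == "seq":
        return defines(s[1], x) or defines(s[2], x)    # either part suffices
    if tag == "decl":
        return s[1] != x and defines(s[3], x) # side condition y != x
    if tag == "return":
        return True                           # never completes normally
    raise ValueError(f"unknown statement: {s!r}")

cond = ("var", "c")
one_branch = ("if", cond, ("assign", "x", ("num", 1)), ("nop",))
both_paths = ("if", cond, ("assign", "x", ("num", 1)), ("return", ("num", 0)))
print(defines(one_branch, "x"))
print(defines(both_paths, "x"))
```

The second example shows the effect of the return rule: the else branch never completes normally, so it does not prevent x from counting as defined after the conditional.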
4 Liveness
We now lift the use(e, x) property to statements, written as live(s, x) (x is live in s).
Liveness is again a may-property.
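\[
\frac{\mathsf{use}(e, x)}{\mathsf{live}(\mathsf{assign}(y, e), x)}
\]
\[
\frac{\mathsf{use}(e, x)}{\mathsf{live}(\mathsf{return}(e), x)}
\qquad
\text{no rule for } \mathsf{live}(\mathsf{nop}, x)
\qquad
\frac{\mathsf{live}(s, x) \quad y \neq x}{\mathsf{live}(\mathsf{decl}(y, \tau, s), x)}
\]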
5 Initialization
Given liveness, we can now say when proper initialization is violated: if a variable
is live at the site of its declaration. That means that its value would be used somewhere
before it is defined. Assume we have a program p and we write “s in p” if s is
a statement appearing in p. Then the following rule captures the general condition.
\[
\frac{\mathsf{decl}(x, \tau, s) \text{ in } p \quad \mathsf{live}(s, x)}{\mathsf{error}}
\]
Unlike the previous rules in the lecture, this one should be read from the premises
to the conclusion. In this way it is similar to our rules for liveness from Lecture 4.
This brings out an important distinction when we try to convert the specifica-
tion rules into an implementation. We have to decide if the rules should be read
from the premises to the conclusion, or from the conclusion to the premises. Some-
times, the same property can be specified in different directions. For example, we
can define a predicate init which verifies that all variables are properly initialized
and which works from the conclusion to the premises with the following schema.
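\[
\frac{}{\mathsf{init}(\mathsf{nop})}
\qquad
\frac{\mathsf{init}(s_1) \quad \mathsf{init}(s_2)}{\mathsf{init}(\mathsf{seq}(s_1, s_2))}
\]
\[
\frac{\mathsf{init}(s) \quad \neg\mathsf{live}(s, x)}{\mathsf{init}(\mathsf{decl}(x, \tau, s))}
\qquad
\text{(other rules omitted)}
\]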
The omitted rules just verify each substatement so that all declarations in the pro-
gram are checked in the end.
Here we assume a boolean constant ⊤ (for true) and boolean operators conjunction
A ∧ B and negation ¬A in the functional language; later we might use
disjunction A ∨ B and falsehood ⊥. When we call live(s, x) we assume that it is a
similar function.
This function is now a transcription of the rules for the live judgment. In this
process we sometimes have to combine multiple rules into a single case of the function
definition (as, for example, for seq(s1 , s2 )).
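\[
\begin{aligned}
\mathsf{live}(\mathsf{nop}, x) &= \bot \\
\mathsf{live}(\mathsf{seq}(s_1, s_2), x) &= \mathsf{live}(s_1, x) \lor (\neg\mathsf{def}(s_1, x) \land \mathsf{live}(s_2, x)) \\
\mathsf{live}(\mathsf{decl}(y, \tau, s), x) &= y \neq x \land \mathsf{live}(s, x)
\end{aligned}
\]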
...
We still have to write functions for predicates def(s, x) and use(e, x), but these
are a straightforward exercise now.
• init(δ, s, δ′): assuming all the variables in δ are defined when s is reached, then
after its execution all the variables in δ′ will be defined.
• We write δ ⊢ s ⇒ δ′ for init(δ, s, δ′).
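\[
\frac{}{\delta \vdash \mathsf{nop} \Rightarrow \delta}
\qquad
\frac{\delta \vdash s_1 \Rightarrow \delta_1 \quad \delta_1 \vdash s_2 \Rightarrow \delta_2}{\delta \vdash \mathsf{seq}(s_1, s_2) \Rightarrow \delta_2}
\]
\[
\frac{\delta \vdash e}{\delta \vdash \mathsf{assign}(x, e) \Rightarrow \delta \cup \{x\}}
\qquad
\frac{\delta \vdash e \quad \delta \vdash s_1 \Rightarrow \delta_1 \quad \delta \vdash s_2 \Rightarrow \delta_2}{\delta \vdash \mathsf{if}(e, s_1, s_2) \Rightarrow \delta_1 \cap \delta_2}
\]
\[
\frac{\delta \vdash e \quad \delta \vdash s \Rightarrow \delta'}{\delta \vdash \mathsf{while}(e, s) \Rightarrow \delta}
\qquad
\frac{\delta \vdash s \Rightarrow \delta'}{\delta \vdash \mathsf{decl}(y, \tau, s) \Rightarrow \delta' - \{y\}}
\]
\[
\frac{\delta \vdash e}{\delta \vdash \mathsf{return}(e) \Rightarrow \{x \mid x \text{ in scope}\}}
\]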
It is worth reading these rules carefully to make sure you understand them.
The last one is somewhat problematic, since we don’t have enough information
to know which declarations we are in the scope of. We should generalize
our judgment to Γ ; δ ⊢ s ⇒ δ′, where Γ is the context containing all variables
currently in scope. Usually, we have
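\[
\Gamma ::= \cdot \mid \Gamma, x{:}\tau
\]
\[
\frac{\delta \vdash e}{\Gamma ; \delta \vdash \mathsf{return}(e) \Rightarrow \{x \mid x \in \mathrm{dom}(\Gamma)\}}
\]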
and we would systematically add Γ to all other judgments. We again leave this as
an exercise.
In these judgments we have traded the complexity of traversing statements
multiple times for the complexity of maintaining variable sets.
8 Modes of Judgments
If we consider the judgment δ ⊢ e there is nothing new to consider: we would
translate this to a function
In order to handle return(e), we probably should also pass in a second set of declared
variables or a context. We could also avoid returning a boolean by just returning
an optional set of defined variables, or raise an exception in case we discover
a variable that is used but not defined.
Examining the rules shows that we will need to be able to add variables to and
remove variables from sets, as well as compute intersections. Otherwise, the code
should be relatively straightforward.
Before we actually start this coding, we should go over the inference rules to
make sure we always have enough information to compute the output δ′ given the
inputs δ and s. This is the purpose of mode checking. Let’s go over one example:
\[
\frac{\delta \vdash s_1 \Rightarrow \delta_1 \quad \delta_1 \vdash s_2 \Rightarrow \delta_2}{\delta \vdash \mathsf{seq}(s_1, s_2) \Rightarrow \delta_2}
\]
Initially, we know the input δ and s = seq(s1 , s2 ). This means we also know s1 and
s2 . We cannot yet compute δ2 , since the required input δ1 in the second premise
is unknown. But we can compute δ1 from the first premise since we know δ and
s1 at this point. This gives us δ1 and we can now compute δ2 from the second
premise and return it in the conclusion.
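With the modes settled (Γ, δ, and s are inputs, δ′ is the output), the judgment can be sketched as a single-pass function. The Python below is a sketch under several assumptions: the tuple encoding of the syntax, the names InitError, vars_of, check_exp, and init, the reading of δ ⊢ e as "every variable occurring in e is in δ", and the absence of shadowing (as in C0). It raises an exception rather than returning a boolean.

```python
class InitError(Exception):
    """Raised when a variable may be used before it is defined."""

def vars_of(e):
    # Expressions: ("num", n) | ("var", x) | ("binop", e1, e2) | ("and", e1, e2)
    if e[0] == "num":
        return set()
    if e[0] == "var":
        return {e[1]}
    return vars_of(e[1]) | vars_of(e[2])

def check_exp(delta, e):
    """Assumed reading of 'delta |- e': all variables of e are in delta."""
    missing = vars_of(e) - delta
    if missing:
        raise InitError(f"possibly uninitialized: {sorted(missing)}")

def init(gamma, delta, s):
    """gamma; delta |- s => delta'. Returns delta' or raises InitError."""
    tag = s[0]
    if tag == "nop":
        return delta
    if tag == "seq":
        delta1 = init(gamma, delta, s[1])      # first premise yields delta1
        return init(gamma, delta1, s[2])       # which feeds the second premise
    if tag == "assign":
        check_exp(delta, s[2])
        return delta | {s[1]}
    if tag == "if":
        check_exp(delta, s[1])
        return init(gamma, delta, s[2]) & init(gamma, delta, s[3])
    if tag == "while":
        check_exp(delta, s[1])
        init(gamma, delta, s[2])               # check the body, discard its output
        return delta
    if tag == "decl":
        y = s[1]
        return init(gamma | {y}, delta, s[3]) - {y}
    if tag == "return":
        check_exp(delta, s[1])
        return gamma                           # everything in scope counts as defined
    raise ValueError(f"unknown statement: {s!r}")

good = ("decl", "x", "int",
        ("seq", ("assign", "x", ("num", 3)), ("return", ("var", "x"))))
bad = ("decl", "x", "int", ("return", ("var", "x")))
print(init(frozenset(), frozenset(), good))
try:
    init(frozenset(), frozenset(), bad)
except InitError as err:
    print("error:", err)
```

The seq case follows the mode analysis above literally: δ1 is computed from the first premise before the second premise can be attempted.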
9 Typing Judgments
Arguably the most important judgment on programs is whether they are well-
typed. We have already introduced the context (or type environment) Γ that assigns
types to variables. The typing judgment for expressions
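\[
\Gamma \vdash e : \tau
\]
verifies that the expression e is well-typed with type τ, assuming the variables are
typed as prescribed by Γ. Most of the rules are straightforward; we show a couple.
\[
\frac{\Gamma(x) = \tau}{\Gamma \vdash x : \tau}
\qquad
\frac{}{\Gamma \vdash n : \mathsf{int}}
\qquad
\frac{}{\Gamma \vdash \mathsf{true} : \mathsf{bool}}
\qquad
\frac{}{\Gamma \vdash \mathsf{false} : \mathsf{bool}}
\]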
Typing for statements is slightly more complex. Statements are executed for
their effects, but statements in the body of a function also ultimately return a
value. We write
\[
\Gamma \vdash s : [\tau]
\]
to express that s is well-typed in context Γ. If s returns (using a return(e) statement),
then e must be of type τ . We use this to check that no matter how a function returns,
the returned value is always of the correct type.
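\[
\frac{\Gamma(x) = \tau' \quad \Gamma \vdash e : \tau'}{\Gamma \vdash \mathsf{assign}(x, e) : [\tau]}
\qquad
\frac{\Gamma \vdash e : \mathsf{bool} \quad \Gamma \vdash s_1 : [\tau] \quad \Gamma \vdash s_2 : [\tau]}{\Gamma \vdash \mathsf{if}(e, s_1, s_2) : [\tau]}
\]
\[
\frac{\Gamma \vdash e : \mathsf{bool} \quad \Gamma \vdash s : [\tau]}{\Gamma \vdash \mathsf{while}(e, s) : [\tau]}
\qquad
\frac{\Gamma \vdash e : \tau}{\Gamma \vdash \mathsf{return}(e) : [\tau]}
\]
\[
\frac{}{\Gamma \vdash \mathsf{nop} : [\tau]}
\qquad
\frac{\Gamma \vdash s_1 : [\tau] \quad \Gamma \vdash s_2 : [\tau]}{\Gamma \vdash \mathsf{seq}(s_1, s_2) : [\tau]}
\]
\[
\frac{\Gamma, x{:}\tau' \vdash s : [\tau]}{\Gamma \vdash \mathsf{decl}(x, \tau', s) : [\tau]}
\]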
where chk_exp(e, τ) would simply synthesize a type τ′ for e and compare it to τ.
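The typing rules can be read from the conclusion to the premises and sketched as a checker. In the Python below, chk_exp follows the description just given; syn_exp, chk_stm, TypeCheckError, and the tuple encoding are hypothetical names chosen for this sketch. The typing of e1 ⊕ e2 as an int operation on int operands and of && as a bool operation on bool operands is an assumption, since those rules are not among the ones shown.

```python
class TypeCheckError(Exception):
    """Raised when an expression or statement is not well-typed."""

def syn_exp(gamma, e):
    """Synthesize the type of e in context gamma (a dict from names to types)."""
    tag = e[0]
    if tag == "num":
        return "int"
    if tag in ("true", "false"):
        return "bool"
    if tag == "var":
        if e[1] not in gamma:
            raise TypeCheckError(f"undeclared variable {e[1]}")
        return gamma[e[1]]
    # Assumed operator types: binop is int*int->int, "and" is bool*bool->bool.
    operand = "int" if tag == "binop" else "bool"
    chk_exp(gamma, e[1], operand)
    chk_exp(gamma, e[2], operand)
    return operand

def chk_exp(gamma, e, tau):
    """Synthesize a type for e and compare it to the expected type tau."""
    tau1 = syn_exp(gamma, e)
    if tau1 != tau:
        raise TypeCheckError(f"expected {tau}, found {tau1}")

def chk_stm(gamma, s, tau):
    """Check gamma |- s : [tau], where tau is the function's return type."""
    tag = s[0]
    if tag == "assign":
        if s[1] not in gamma:
            raise TypeCheckError(f"undeclared variable {s[1]}")
        chk_exp(gamma, s[2], gamma[s[1]])
    elif tag in ("if", "while"):
        chk_exp(gamma, s[1], "bool")
        for sub in s[2:]:
            chk_stm(gamma, sub, tau)
    elif tag == "return":
        chk_exp(gamma, s[1], tau)
    elif tag == "seq":
        chk_stm(gamma, s[1], tau)
        chk_stm(gamma, s[2], tau)
    elif tag == "decl":
        chk_stm({**gamma, s[1]: s[2]}, s[3], tau)   # extend context for the scope
    elif tag != "nop":
        raise ValueError(f"unknown statement: {s!r}")

prog = ("decl", "x", "int",
        ("seq", ("assign", "x", ("num", 3)),
                ("return", ("binop", ("var", "x"), ("num", 1)))))
chk_stm({}, prog, "int")          # succeeds silently
try:
    chk_stm({}, prog, "bool")     # wrong return type
except TypeCheckError as err:
    print("type error:", err)
```

Copying the context at each declaration keeps the sketch simple; an implementation concerned with efficiency might use a persistent map instead.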
Questions
1. Write out the rules for proper returns: along each control flow path starting
at the beginning of a function, there must be a return statement. Clearly, this
is a must-property.