
Lecture Notes on

Static Semantics
15-411: Compiler Design
Frank Pfenning

Lecture 12
October 3, 2013

1 Introduction
After lexing and parsing, a compiler will usually apply elaboration to translate the
parse tree to a high-level intermediate form often called abstract syntax. Then we
verify that the abstract syntax satisfies the requirements of the static semantics.
Sometimes, there is some ambiguity whether a given condition should be enforced
by the grammar, by elaboration, or while checking the static semantics. We will not
be concerned with details of attribution, but with how to describe and then implement
various static semantic conditions. The principal properties to verify for C0 and
the sublanguages discussed in this course are:
• Initialization: variables must be defined before they are used.
• Proper returns: functions that return a value must have an explicit return state-
ment on every control flow path starting at the beginning of the function.
• Types: the program must be well-typed.
Type checking is frequently discussed in the literature, so we use initialization as
our running example and discuss typing at the end, in Section 9.

2 Abstract Syntax
We will use a slightly restricted form of the abstract syntax in Lecture 10 on IR trees,
with the addition of variable declarations with their scope. This fragment exhibits
all the relevant features for the purposes of the present lecture.
Expressions e ::= n | x | e1 ⊕ e2 | e1 && e2
Statements s ::= assign(x, e) | if(e, s1 , s2 ) | while(e, s)
| return(e) | nop | seq(s1 , s2 ) | decl(x, τ, s)

LECTURE NOTES OCTOBER 3, 2013



3 Definition and Use


Initialization guarantees that every variable is defined before it is used. The natural
way to specify this is in two parts: when a variable is defined, and when it is
used. An error is signaled if we cannot show that every variable in the program is
defined before it is used. As usual, this property is an approximation of what actual
behaviors can be exhibited at runtime.
First, we define when a variable is used in an expression, written as use(e, x).
This is entirely straightforward, since we have a clear separation of expressions
and statements in our language.

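$$
\text{no rule for } \mathsf{use}(n, x)
\qquad
\frac{}{\mathsf{use}(x, x)}
\qquad
\text{no rule for } \mathsf{use}(y, x),\ y \neq x
$$

$$
\frac{\mathsf{use}(e_1, x)}{\mathsf{use}(e_1 \oplus e_2, x)}
\qquad
\frac{\mathsf{use}(e_2, x)}{\mathsf{use}(e_1 \oplus e_2, x)}
$$

$$
\frac{\mathsf{use}(e_1, x)}{\mathsf{use}(e_1 \mathbin{\&\&} e_2, x)}
\qquad
\frac{\mathsf{use}(e_2, x)}{\mathsf{use}(e_1 \mathbin{\&\&} e_2, x)}
$$
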
We see already here that use(e, x) is a so-called may-property: x may be used
during the evaluation of e, but it is not guaranteed to be actually used. For example,
the expression y > 0 && x > 0 may or may not actually use x. The expression
false && x > 0 will actually never use x, and yet we flag it as possibly being used.
This is appropriate: we would like to raise an error if there is a possibility that
an uninitialized variable may be used. Because determining this in general is undecidable,
we need to approximate it. Our approximation essentially says that any
variable occurring in an expression may be used. The rules above express this more
formally.
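
The use rules can be transcribed almost literally into a recursive function. The following is a minimal sketch in Python, not part of the original notes; the constructor names Num, Var, and BinOp are assumptions of this sketch, and the comparison > is simply treated as one more binary operator ⊕.

```python
from collections import namedtuple

# Expression forms from Section 2; the constructor names are this sketch's own.
Num = namedtuple("Num", "value")              # n
Var = namedtuple("Var", "name")               # x
BinOp = namedtuple("BinOp", "op left right")  # e1 ⊕ e2 and e1 && e2


def use(e, x):
    """May-property use(e, x): True if the variable named x occurs in e."""
    if isinstance(e, Num):
        return False                 # no rule for use(n, x)
    if isinstance(e, Var):
        return e.name == x           # use(x, x); no rule for use(y, x), y != x
    if isinstance(e, BinOp):
        # Either premise suffices, so the two rules become a disjunction.
        return use(e.left, x) or use(e.right, x)
    raise TypeError(f"not an expression: {e!r}")


# y > 0 && x > 0: x may or may not be evaluated, but it counts as used.
cond = BinOp("&&", BinOp(">", Var("y"), Num(0)), BinOp(">", Var("x"), Num(0)))
print(use(cond, "x"), use(cond, "z"))
```

Running the sketch prints True False: x occurs in the expression and is therefore flagged as possibly used, while z does not occur at all.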
For a language to be usable, it is important that the rules governing the static
semantics are easy to understand for the programmer and have some internal co-
herence. While it might make sense to allow false && x > 0 in particular, what is
the general rule? Designing programming languages and their static semantics is
difficult and requires a good balance of formal understanding of the properties of
programming languages and programmer’s intuition.
Once we have defined use for expressions, we should consider statements.
Does an assignment x = e use x? Our prior experience with liveness analysis
for register allocation on abstract machine code would say: only if it is used in e.
We stay consistent with this intuition and terminology and write live(s, x) for the
judgment that x is live in s. This means the value of x is relevant to the execution
of s.
Before we specify liveness, we should specify when a variable is defined. This is
because, for example, the variable x is not live before the statement x = 3, because
its current value does not matter for this statement, or any subsequent statement.
We write def(s, x) if the execution of statement s will define x. This is an example
of a must-property: we want to be sure that whenever s executes (and completes
normally, without returning from the current function or raising an exception of
some form), then x has been defined.

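$$
\frac{}{\mathsf{def}(\mathsf{assign}(x, e), x)}
\qquad
\text{no rule for } \mathsf{def}(\mathsf{assign}(y, e), x),\ y \neq x
$$

$$
\frac{\mathsf{def}(s_1, x) \quad \mathsf{def}(s_2, x)}{\mathsf{def}(\mathsf{if}(e, s_1, s_2), x)}
\qquad
\text{no rule for } \mathsf{def}(\mathsf{while}(e, s), x)
$$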

The last two rules clearly illustrate that def(s, x) is a must-property: A conditional
only defines a variable if it is defined along both branches, and a while loop
does not define any variable (since the body may never be executed).

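$$
\text{no rule for } \mathsf{def}(\mathsf{nop}, x)
\qquad
\frac{\mathsf{def}(s_1, x)}{\mathsf{def}(\mathsf{seq}(s_1, s_2), x)}
\qquad
\frac{\mathsf{def}(s_2, x)}{\mathsf{def}(\mathsf{seq}(s_1, s_2), x)}
$$

$$
\frac{\mathsf{def}(s, x) \quad y \neq x}{\mathsf{def}(\mathsf{decl}(y, \tau, s), x)}
$$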

The side condition on the last rule applies because s is the scope of y. If we have
already checked variable scoping, then in the particular case of C0, y could not be
equal to x because that would have led to an error earlier. However, even in this
case it may be less error-prone to simply check the condition even if it might be
redundant.
A strange case arises for return statements. Since a return statement never completes
normally, any subsequent statements are unreachable. It is therefore permissible
to claim that all variables currently in scope have been defined. We capture
this by simply stating that return(e) defines any variable.

$$
\frac{}{\mathsf{def}(\mathsf{return}(e), x)}
$$


4 Liveness
We now lift the use(e, x) property to statements, written as live(s, x) (x is live in s).
Liveness is again a may-property.

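$$
\frac{\mathsf{use}(e, x)}{\mathsf{live}(\mathsf{assign}(y, e), x)}
$$

$$
\frac{\mathsf{use}(e, x)}{\mathsf{live}(\mathsf{if}(e, s_1, s_2), x)}
\qquad
\frac{\mathsf{live}(s_1, x)}{\mathsf{live}(\mathsf{if}(e, s_1, s_2), x)}
\qquad
\frac{\mathsf{live}(s_2, x)}{\mathsf{live}(\mathsf{if}(e, s_1, s_2), x)}
$$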

We observe that liveness is indeed a may-property, since a variable is live in a conditional
if it is used in the condition or live in one or more of the branches. Similarly,
if a variable is live in the body of a loop, it is live before because the loop body may
be executed.
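
$$
\frac{\mathsf{use}(e, x)}{\mathsf{live}(\mathsf{while}(e, s), x)}
\qquad
\frac{\mathsf{live}(s, x)}{\mathsf{live}(\mathsf{while}(e, s), x)}
$$

$$
\frac{\mathsf{use}(e, x)}{\mathsf{live}(\mathsf{return}(e), x)}
\qquad
\text{no rule for } \mathsf{live}(\mathsf{nop}, x)
\qquad
\frac{\mathsf{live}(s, x) \quad y \neq x}{\mathsf{live}(\mathsf{decl}(y, \tau, s), x)}
$$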

In some way the most interesting case is a sequence of statements, seq(s1 , s2 ). If a
variable is live in s2 it is only live in the composition if it is not defined in s1 !

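$$
\frac{\mathsf{live}(s_1, x)}{\mathsf{live}(\mathsf{seq}(s_1, s_2), x)}
\qquad
\frac{\neg\mathsf{def}(s_1, x) \quad \mathsf{live}(s_2, x)}{\mathsf{live}(\mathsf{seq}(s_1, s_2), x)}
$$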
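
As a small worked illustration of the sequence rules (the two example statements are chosen here and do not appear in the original notes), consider seq(assign(y, 1), return(x)). The assignment defines y but not x, so the second rule applies and x is live:

$$
\frac{\neg\mathsf{def}(\mathsf{assign}(y, 1), x)
      \qquad
      \dfrac{\dfrac{}{\mathsf{use}(x, x)}}{\mathsf{live}(\mathsf{return}(x), x)}}
     {\mathsf{live}(\mathsf{seq}(\mathsf{assign}(y, 1), \mathsf{return}(x)), x)}
$$

For seq(assign(x, 1), return(x)), by contrast, neither rule applies: def(assign(x, 1), x) holds, which blocks the second rule, and the first rule would need live(assign(x, 1), x), that is use(1, x), for which there is no rule. So x is not live in that sequence.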

5 Initialization
Given liveness, we can now say when proper initialization is violated: if a variable
is live at the site of its declaration. That means that its value would be used somewhere
before it is defined. Assume we have a program p and we write “s in p” if s is
a statement appearing in p. Then the following rule captures the general condition.

$$
\frac{\mathsf{decl}(x, \tau, s) \text{ in } p \quad \mathsf{live}(s, x)}{\mathsf{error}}
$$

Unlike the previous rules in the lecture, this one should be read from the premises
to the conclusion. In this way it is similar to our rules for liveness from Lecture 4.
This brings out an important distinction when we try to convert the specifica-
tion rules into an implementation. We have to decide if the rules should be read
from the premises to the conclusion, or from the conclusion to the premises. Some-
times, the same property can be specified in different directions. For example, we
can define a predicate init which verifies that all variables are properly initialized
and which works from the conclusion to the premises with the following schema.

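$$
\frac{}{\mathsf{init}(\mathsf{nop})}
\qquad
\frac{\mathsf{init}(s_1) \quad \mathsf{init}(s_2)}{\mathsf{init}(\mathsf{seq}(s_1, s_2))}
$$

$$
\frac{\mathsf{init}(s) \quad \neg\mathsf{live}(s, x)}{\mathsf{init}(\mathsf{decl}(x, \tau, s))}
\qquad
\text{(other rules omitted)}
$$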

The omitted rules just verify each substatement so that all declarations in the pro-
gram are checked in the end.

6 From Judgments to Functions


We now focus on the special case that the inference rules are to be read bottom-up.
Starting with the judgments we ultimately want to verify, consider init(s). When
we start this, s is known and we are trying to determine if there is a deduction of
init(s) given the rules we have put down. If there is such a deduction, we succeed.
If not, we issue an error message. We can model this as a function returning a
boolean, or a function returning no interesting value but raising an exception in
case the property is violated.

init : stm → bool

Now each of the rules becomes a case in the function definition.


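$$
\begin{aligned}
\mathsf{init}(\mathsf{nop}) &= \top \\
\mathsf{init}(\mathsf{seq}(s_1, s_2)) &= \mathsf{init}(s_1) \wedge \mathsf{init}(s_2) \\
\mathsf{init}(\mathsf{decl}(x, \tau, s)) &= \mathsf{init}(s) \wedge \neg\mathsf{live}(s, x) \\
&\;\;\vdots
\end{aligned}
$$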

Here we assume a boolean constant ⊤ (for true) and boolean operators
conjunction A ∧ B and negation ¬A in the functional language; later we might use
disjunction A ∨ B and falsehood ⊥. When we call live(s, x) we assume that it is a
similar function.

live : stm × var → bool

This function is now a transcription of the rules for the live judgment. In this
process we sometimes have to combine multiple rules into a single case of the function
definition (as, for example, for seq(s1 , s2 )).


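$$
\begin{aligned}
\mathsf{live}(\mathsf{nop}, x) &= \bot \\
\mathsf{live}(\mathsf{seq}(s_1, s_2), x) &= \mathsf{live}(s_1, x) \vee (\neg\mathsf{def}(s_1, x) \wedge \mathsf{live}(s_2, x)) \\
\mathsf{live}(\mathsf{decl}(y, \tau, s), x) &= y \neq x \wedge \mathsf{live}(s, x) \\
&\;\;\vdots
\end{aligned}
$$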

We still have to write functions for predicates def(s, x) and use(e, x), but these
are a straightforward exercise now.

def : stm × var → bool


use : exp × var → bool

The whole translation was relatively straightforward, primarily because the
rules were well-designed, and because we always had enough information to just
write a boolean function.
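
Putting the four boolean functions together gives a complete, if naive, initialization checker. The sketch below is in Python and is only one possible rendering, not the course's reference implementation. The constructor names are assumptions of this sketch, def is called defines because def is a Python keyword, and the cases of init for if, while, assign, and return fill in the "omitted rules" under the assumption that they just check each substatement.

```python
from collections import namedtuple

# Abstract syntax of Section 2 (constructor names are this sketch's own).
Num = namedtuple("Num", "value")
Var = namedtuple("Var", "name")
BinOp = namedtuple("BinOp", "op left right")       # covers ⊕ and &&
Assign = namedtuple("Assign", "var exp")
If = namedtuple("If", "cond then orelse")
While = namedtuple("While", "cond body")
Return = namedtuple("Return", "exp")
Nop = namedtuple("Nop", "")
Seq = namedtuple("Seq", "first second")
Decl = namedtuple("Decl", "var type body")


def use(e, x):
    if isinstance(e, Var):
        return e.name == x
    if isinstance(e, BinOp):
        return use(e.left, x) or use(e.right, x)
    return False                                   # numerals use nothing


def defines(s, x):                                 # def(s, x), a must-property
    if isinstance(s, Assign):
        return s.var == x
    if isinstance(s, If):
        return defines(s.then, x) and defines(s.orelse, x)
    if isinstance(s, Seq):
        return defines(s.first, x) or defines(s.second, x)
    if isinstance(s, Decl):
        return s.var != x and defines(s.body, x)
    if isinstance(s, Return):
        return True                                # return defines everything
    return False                                   # while, nop: no rule


def live(s, x):                                    # live(s, x), a may-property
    if isinstance(s, Assign):
        return use(s.exp, x)
    if isinstance(s, If):
        return use(s.cond, x) or live(s.then, x) or live(s.orelse, x)
    if isinstance(s, While):
        return use(s.cond, x) or live(s.body, x)
    if isinstance(s, Return):
        return use(s.exp, x)
    if isinstance(s, Seq):
        return live(s.first, x) or (not defines(s.first, x) and live(s.second, x))
    if isinstance(s, Decl):
        return s.var != x and live(s.body, x)
    return False                                   # nop: no rule


def init(s):
    if isinstance(s, Seq):
        return init(s.first) and init(s.second)
    if isinstance(s, Decl):
        return init(s.body) and not live(s.body, s.var)
    if isinstance(s, If):                          # assumed omitted rule
        return init(s.then) and init(s.orelse)
    if isinstance(s, While):                       # assumed omitted rule
        return init(s.body)
    return True                                    # assign, return, nop


# int x; x = 1; return x + 1;
ok = Decl("x", "int", Seq(Assign("x", Num(1)), Return(BinOp("+", Var("x"), Num(1)))))
# int x; return x + 1;   (x is live at its declaration)
bad = Decl("x", "int", Return(BinOp("+", Var("x"), Num(1))))
print(init(ok), init(bad))
```

The sketch prints True False. Note how live for a sequence calls both live and defines on the first statement, which is the repeated traversal discussed in the next section.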

7 Maintaining Set of Variables


What we have done above is a perfectly adequate implementation of initialization
checking. But we might also try to rewrite it in order to limit the number of traversals
of the statements. For example, in

live(seq(s1 , s2 ), x) = live(s1 , x) ∨ (¬def(s1 , x) ∧ live(s2 , x))

we may traverse s1 twice: once to check if x is live in s1 , and once to see if x is
defined in s1 . In general, we might traverse statements multiple times, namely for
each variable declaration in whose scope it lies. This in itself is not a performance
bug, but let’s see how one might change it.
One way this can often be done is to notice that for any statement s, there could
be multiple variables x such that live(s, x) or def(s, x) holds. We can try to combine
these into a set. We denote a set of variables with δ and define the following two
judgments:

• init(δ, s, δ′): assuming all the variables in δ are defined when s is reached, then
after its execution all the variables in δ′ will be defined.

• use(δ, e): e will only use variables defined in δ.

As a common convention, we isolate assumptions on the left-hand side of a turnstile
symbol and write these:

• δ ⊢ s ⇒ δ′ for init(δ, s, δ′).

• δ ⊢ e for use(δ, e).


From the previous rules we develop the following:

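$$
\frac{}{\delta \vdash \mathsf{nop} \Rightarrow \delta}
\qquad
\frac{\delta \vdash s_1 \Rightarrow \delta_1 \quad \delta_1 \vdash s_2 \Rightarrow \delta_2}
     {\delta \vdash \mathsf{seq}(s_1, s_2) \Rightarrow \delta_2}
$$

$$
\frac{\delta \vdash e}{\delta \vdash \mathsf{assign}(x, e) \Rightarrow \delta \cup \{x\}}
\qquad
\frac{\delta \vdash e \quad \delta \vdash s_1 \Rightarrow \delta_1 \quad \delta \vdash s_2 \Rightarrow \delta_2}
     {\delta \vdash \mathsf{if}(e, s_1, s_2) \Rightarrow \delta_1 \cap \delta_2}
$$

$$
\frac{\delta \vdash e \quad \delta \vdash s \Rightarrow \delta'}
     {\delta \vdash \mathsf{while}(e, s) \Rightarrow \delta}
\qquad
\frac{\delta \vdash s \Rightarrow \delta'}
     {\delta \vdash \mathsf{decl}(y, \tau, s) \Rightarrow \delta' - \{y\}}
$$

$$
\frac{\delta \vdash e}{\delta \vdash \mathsf{return}(e) \Rightarrow \{x \mid x \text{ in scope}\}}
$$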

It is worth reading these rules carefully to make sure you understand them.
The last one is somewhat problematic, since we don’t have enough information
to know which declarations we are in the scope of. We should generalize
our judgment to Γ ; δ ⊢ s ⇒ δ′, where Γ is the context containing all variables
currently in scope. Usually, we have

Γ ::= · | Γ, x:τ

Then the last rule might become

$$
\frac{\delta \vdash e}{\Gamma ; \delta \vdash \mathsf{return}(e) \Rightarrow \{x \mid x \in \mathrm{dom}(\Gamma)\}}
$$

and we would systematically add Γ to all other judgments. We again leave this as
an exercise.
In these judgments we have traded the complexity of traversing statements
multiple times for the complexity of maintaining variable sets.

8 Modes of Judgments
If we consider the judgment δ ⊢ e there is nothing new to consider: we would
translate this to a function

use : set var × exp → bool

On the other hand, it does not work to translate δ ⊢ s ⇒ δ′ as

init : set var × stm × set var → bool


This is because, in general, we do not know δ′ before we start out. We need to
compute it as part of building the deduction! So we need to implement

init : set var × stm → bool × set var

In order to handle return(e), we probably should also pass in a second set of declared
variables or a context. We could also avoid returning a boolean by just returning
an optional set of defined variables, or raise an exception in case we discover
a variable that is used but not defined.
Examining the rules shows that we will need to be able to add variables to and
remove variables from sets, as well as compute intersections. Otherwise, the code
should be relatively straightforward.
Before we actually start this coding, we should go over the inference rules to
make sure we always have enough information to compute the output δ′ given the
inputs δ and s. This is the purpose of mode checking. Let’s go over one example:

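$$
\frac{\delta \vdash s_1 \Rightarrow \delta_1 \quad \delta_1 \vdash s_2 \Rightarrow \delta_2}
     {\delta \vdash \mathsf{seq}(s_1, s_2) \Rightarrow \delta_2}
$$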

Initially, we know the input δ and s = seq(s1 , s2 ). This means we also know s1 and
s2 . We cannot yet compute δ2 , since the required input δ1 in the second premise
is unknown. But we can compute δ1 from the first premise since we know δ and
s1 at this point. This gives us δ1 and we can now compute δ2 from the second
premise and return it in the conclusion.

init(δ, seq(s1 , s2 )) = let δ1 = init(δ, s1 ) in init(δ1 , s2 )
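
The same mode analysis extends to the remaining rules. The following Python sketch is one way the set-based judgment might be implemented, under several assumptions that are not in the original notes: the constructor names are hypothetical, the context Γ is represented only by the set of variable names in scope (gamma), errors are signaled by raising a hypothetical InitError instead of returning a boolean, and shadowing is assumed to have been ruled out already.

```python
from collections import namedtuple

# Abstract syntax of Section 2 (constructor names are this sketch's own).
Num = namedtuple("Num", "value")
Var = namedtuple("Var", "name")
BinOp = namedtuple("BinOp", "op left right")
Assign = namedtuple("Assign", "var exp")
If = namedtuple("If", "cond then orelse")
While = namedtuple("While", "cond body")
Return = namedtuple("Return", "exp")
Nop = namedtuple("Nop", "")
Seq = namedtuple("Seq", "first second")
Decl = namedtuple("Decl", "var type body")


class InitError(Exception):
    """A variable may be used before it is defined (hypothetical error type)."""


def check_exp(delta, e):
    """δ ⊢ e: every variable occurring in e must be in delta."""
    if isinstance(e, Var):
        if e.name not in delta:
            raise InitError(f"{e.name} may be used before it is defined")
    elif isinstance(e, BinOp):
        check_exp(delta, e.left)
        check_exp(delta, e.right)


def init(gamma, delta, s):
    """Γ; δ ⊢ s ⇒ δ′: inputs gamma, delta, s; output δ′ (or InitError)."""
    if isinstance(s, Nop):
        return delta
    if isinstance(s, Seq):
        delta1 = init(gamma, delta, s.first)       # first premise yields δ1
        return init(gamma, delta1, s.second)       # which feeds the second
    if isinstance(s, Assign):
        check_exp(delta, s.exp)
        return delta | {s.var}
    if isinstance(s, If):
        check_exp(delta, s.cond)
        return init(gamma, delta, s.then) & init(gamma, delta, s.orelse)
    if isinstance(s, While):
        check_exp(delta, s.cond)
        init(gamma, delta, s.body)                 # body is checked, δ′ dropped
        return delta
    if isinstance(s, Decl):
        return init(gamma | {s.var}, delta, s.body) - {s.var}
    if isinstance(s, Return):
        check_exp(delta, s.exp)
        return gamma                               # {x | x ∈ dom(Γ)}
    raise TypeError(f"not a statement: {s!r}")


def accepts(s):
    """Check a closed statement starting from empty Γ and δ."""
    try:
        init(frozenset(), frozenset(), s)
        return True
    except InitError:
        return False


# int x; x = 1; return x;
ok = Decl("x", "int", Seq(Assign("x", Num(1)), Return(Var("x"))))
# int x; if (1) x = 1; else nop; return x;   (x defined on one branch only)
bad = Decl("x", "int", Seq(If(Num(1), Assign("x", Num(1)), Nop()), Return(Var("x"))))
print(accepts(ok), accepts(bad))
```

The sketch prints True False. Each statement is traversed once, and the only set operations needed are union, intersection, and difference, as anticipated above.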

9 Typing Judgments
Arguably the most important judgment on programs is whether they are well-
typed. We have already introduced the context (or type environment) Γ that assigns
types to variables. The typing judgment for expressions

Γ ⊢ e : τ

verifies that the expression e is well-typed with type τ , assuming the variables are
typed as prescribed by Γ. Most of the rules are straightforward; we show a couple.

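$$
\frac{\Gamma(x) = \tau}{\Gamma \vdash x : \tau}
\qquad
\frac{}{\Gamma \vdash n : \mathsf{int}}
\qquad
\frac{}{\Gamma \vdash \mathsf{true} : \mathsf{bool}}
\qquad
\frac{}{\Gamma \vdash \mathsf{false} : \mathsf{bool}}
$$

$$
\frac{\Gamma \vdash e_1 : \mathsf{int} \quad \Gamma \vdash e_2 : \mathsf{int}}
     {\Gamma \vdash e_1 + e_2 : \mathsf{int}}
\qquad
\frac{\Gamma \vdash e_1 : \mathsf{bool} \quad \Gamma \vdash e_2 : \mathsf{bool}}
     {\Gamma \vdash e_1 \mathbin{\&\&} e_2 : \mathsf{bool}}
$$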


Typing for statements is slightly more complex. Statements are executed for
their effects, but statements in the body of a function also ultimately return a
value. We write
Γ ⊢ s : [τ]
to express that s is well-typed in context Γ. If s returns (using a return(e) statement),
then e must be of type τ. We use this to check that no matter how a function returns,
the returned value is always of the correct type.

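$$
\frac{\Gamma(x) = \tau' \quad \Gamma \vdash e : \tau'}
     {\Gamma \vdash \mathsf{assign}(x, e) : [\tau]}
\qquad
\frac{\Gamma \vdash e : \mathsf{bool} \quad \Gamma \vdash s_1 : [\tau] \quad \Gamma \vdash s_2 : [\tau]}
     {\Gamma \vdash \mathsf{if}(e, s_1, s_2) : [\tau]}
$$

$$
\frac{\Gamma \vdash e : \mathsf{bool} \quad \Gamma \vdash s : [\tau]}
     {\Gamma \vdash \mathsf{while}(e, s) : [\tau]}
\qquad
\frac{\Gamma \vdash e : \tau}{\Gamma \vdash \mathsf{return}(e) : [\tau]}
$$

$$
\frac{}{\Gamma \vdash \mathsf{nop} : [\tau]}
\qquad
\frac{\Gamma \vdash s_1 : [\tau] \quad \Gamma \vdash s_2 : [\tau]}
     {\Gamma \vdash \mathsf{seq}(s_1, s_2) : [\tau]}
$$

$$
\frac{\Gamma, x{:}\tau' \vdash s : [\tau]}
     {\Gamma \vdash \mathsf{decl}(x, \tau', s) : [\tau]}
$$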

In the last rule for declarations, we might prohibit shadowing of variables by
requiring that x ∉ dom(Γ). Alternatively, we could stipulate that the rightmost
occurrence of x in Γ is the one considered when calculating Γ(x). It is also possible
that we already know that no conflict can occur, since shadowing may have been
ruled out during elaboration already.

10 Modes for Typing


When implementing type-checking, we need to decide on a mode for the judgment.
Clearly, we want the context Γ and the expression e or statement s to be known,
but what about the type?
We first look at expression typing, Γ ⊢ e : τ. Can we always know τ? Perhaps
in our small language fragment from this lecture, but not in L3. For example, if we
check an expression e1 == e2 : bool, we may know the type bool but we do not
know the types of e1 or e2 (they could be bool or int). Similarly, if we have an expression
used as a statement, we do not know the type of the expression. Therefore, we
should implement a function that takes the context Γ and e as input and synthesizes
a type τ such that Γ ⊢ e : τ (if such a type exists, and fails otherwise). The resulting
type τ can then be compared to a given type if that is known. Of course, you
should go through the rules and verify that one can indeed always synthesize a
type.
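
A type-synthesizing function for expressions might look as follows. This is a minimal Python sketch under assumptions of its own: the constructor names and the IllTyped exception are hypothetical, Γ is a dictionary from variable names to the strings "int" and "bool", and the rule used for == (both operands must synthesize the same type) is an assumption about L3 rather than something stated in these notes.

```python
from collections import namedtuple

# Expression forms (constructor names are this sketch's own).
Num = namedtuple("Num", "value")              # n
Bool = namedtuple("Bool", "value")            # true, false
Var = namedtuple("Var", "name")               # x
BinOp = namedtuple("BinOp", "op left right")  # e1 op e2


class IllTyped(Exception):
    """Raised when no type can be synthesized (hypothetical error type)."""


def syn_exp(gamma, e):
    """Return the type tp such that gamma ⊢ e : tp, or raise IllTyped."""
    if isinstance(e, Num):
        return "int"
    if isinstance(e, Bool):
        return "bool"
    if isinstance(e, Var):
        if e.name not in gamma:
            raise IllTyped(f"undeclared variable {e.name}")
        return gamma[e.name]
    if isinstance(e, BinOp):
        t1 = syn_exp(gamma, e.left)
        t2 = syn_exp(gamma, e.right)
        if e.op == "+" and t1 == t2 == "int":
            return "int"
        if e.op == "&&" and t1 == t2 == "bool":
            return "bool"
        if e.op == "==" and t1 == t2:
            # The operand type is not known in advance; it is synthesized.
            return "bool"
        raise IllTyped(f"operator {e.op} applied to {t1} and {t2}")
    raise IllTyped(f"not an expression: {e!r}")


gamma = {"x": "int", "b": "bool"}
print(syn_exp(gamma, BinOp("==", Var("x"), Num(1))))
print(syn_exp(gamma, BinOp("&&", Var("b"), Bool(True))))
```

Both calls print bool. In the first, the operand type int is discovered by synthesis, which is exactly the information a checking-only function would not have.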


For the typing of statements Γ ⊢ s : [τ] the situation is slightly different. Because
τ is the return type of the function in which s occurs, we will know τ instead of
having to synthesize it. We say we check the statement s against the return type τ.
Therefore, if we assume that functions raise an exception if an expression or
statement is not well-typed, we might have functions such as

syn_exp : ctx × exp → tp

chk_stm : ctx × stm × tp → unit

For convenience, we might also write a function

chk_exp : ctx × exp × tp → unit

where chk_exp(Γ, e, τ) would simply synthesize a type τ′ for e and compare it to τ.
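
To make the division of labor between synthesis and checking concrete, here is a Python sketch of the three functions for the rules shown in Section 9. It is only one possible rendering: the constructor names, the IllTyped exception, and the well_typed helper are hypothetical, Γ is a dictionary, and the declaration case picks the option of prohibiting shadowing.

```python
from collections import namedtuple

# Abstract syntax (constructor names are this sketch's own).
Num = namedtuple("Num", "value")
Bool = namedtuple("Bool", "value")
Var = namedtuple("Var", "name")
BinOp = namedtuple("BinOp", "op left right")
Assign = namedtuple("Assign", "var exp")
If = namedtuple("If", "cond then orelse")
While = namedtuple("While", "cond body")
Return = namedtuple("Return", "exp")
Nop = namedtuple("Nop", "")
Seq = namedtuple("Seq", "first second")
Decl = namedtuple("Decl", "var type body")


class IllTyped(Exception):
    """Raised when an expression or statement is not well-typed."""


def syn_exp(gamma, e):
    """Synthesize the type of e; only the rules of Section 9 are covered."""
    if isinstance(e, Num):
        return "int"
    if isinstance(e, Bool):
        return "bool"
    if isinstance(e, Var) and e.name in gamma:
        return gamma[e.name]
    if isinstance(e, BinOp):
        t1, t2 = syn_exp(gamma, e.left), syn_exp(gamma, e.right)
        if e.op == "+" and t1 == t2 == "int":
            return "int"
        if e.op == "&&" and t1 == t2 == "bool":
            return "bool"
    raise IllTyped(f"cannot synthesize a type for {e!r}")


def chk_exp(gamma, e, tp):
    """Synthesize a type for e and compare it to the expected type tp."""
    actual = syn_exp(gamma, e)
    if actual != tp:
        raise IllTyped(f"expected {tp}, found {actual}")


def chk_stm(gamma, s, tp):
    """Check Γ ⊢ s : [tp], where tp is the enclosing function's return type."""
    if isinstance(s, Nop):
        return
    if isinstance(s, Seq):
        chk_stm(gamma, s.first, tp)
        chk_stm(gamma, s.second, tp)
    elif isinstance(s, Assign):
        if s.var not in gamma:
            raise IllTyped(f"undeclared variable {s.var}")
        chk_exp(gamma, s.exp, gamma[s.var])
    elif isinstance(s, If):
        chk_exp(gamma, s.cond, "bool")
        chk_stm(gamma, s.then, tp)
        chk_stm(gamma, s.orelse, tp)
    elif isinstance(s, While):
        chk_exp(gamma, s.cond, "bool")
        chk_stm(gamma, s.body, tp)
    elif isinstance(s, Return):
        chk_exp(gamma, s.exp, tp)
    elif isinstance(s, Decl):
        if s.var in gamma:                         # this sketch prohibits shadowing
            raise IllTyped(f"{s.var} is already declared")
        chk_stm({**gamma, s.var: s.type}, s.body, tp)
    else:
        raise TypeError(f"not a statement: {s!r}")


def well_typed(s, tp):
    try:
        chk_stm({}, s, tp)
        return True
    except IllTyped:
        return False


# int x; x = 1; return x + 1;
body = Decl("x", "int", Seq(Assign("x", Num(1)), Return(BinOp("+", Var("x"), Num(1)))))
print(well_typed(body, "int"), well_typed(body, "bool"))
```

The sketch prints True False: the same body is accepted for a function returning int and rejected for one returning bool, because the return type is an input to chk_stm and is never synthesized.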

Questions
1. Write out the rules for proper returns: along each control flow path starting
at the beginning of a function, there must be a return statement. Clearly, this
is a must-property.

2. Write some cases in the functions for type-checking.
