École Polytechnique

CSC 52064 – Compilation

Jean-Christophe Filliâtre

abstract syntax, semantics

Jean-Christophe Filliâtre CSC 52064 – Compilation abstract syntax, semantics 1


meaning

how to define the meaning of programs?

most of the time, we are satisfied with an informal description, in natural language (ISO norm, standard, reference book, etc.)

yet it is imprecise, sometimes even ambiguous



informal semantics

The Java programming language guarantees that the operands of operators appear to be evaluated in a specific evaluation order, namely, from left to right.
It is recommended that code not rely crucially on this specification.



today

1. abstract syntax
2. formal semantics
• big-step operational semantics
• interpreter
• small-step operational semantics
3. application
• correctness of a compiler



formal semantics

formal semantics gives a mathematical characterization of the computations defined by a program

useful to make tools (interpreters, compilers, etc.)

necessary to reason about programs



raises another question

what is a program?

as a syntactic object (sequence of characters), it is too complex to apprehend

that’s why we switch to abstract syntax



abstract syntax

source
  ↓ lexical analysis
stream of tokens
  ↓ parsing
abstract syntax tree
  ↓ semantic analysis
abstract syntax + symbol table
  ↓ code production
assembly code
  ↓ assembler (as)
machine language
  ↓ linking (ld)
executable



abstract syntax

the texts
2*(x+1)
and
(2 * ((x) + 1))
and
2 * /* I double */ ( x + 1 )

all map to the same abstract syntax tree

      ×
     / \
    2   +
       / \
      x   1



notation

we define an abstract syntax using a grammar

e ::= c        constant
    | x        variable
    | e + e    addition
    | e × e    multiplication
    | ...

reads “an expression, noted e, is
• either a constant c,
• or a variable x,
• or the addition of two expressions,
• etc.”



notation

notation e1 + e2 of the abstract syntax borrows the symbol of the concrete syntax

but we could have picked something else, e.g. Add(e1, e2), +(e1, e2), etc.



abstract syntax in Java

we use classes to build abstract syntax trees, as follows:


enum Binop { Add, Mul, ... }

abstract class Expr {}


class Cte extends Expr { int n; }
class Var extends Expr { String x; }
class Bin extends Expr { Binop op; Expr e1, e2; }
...
(constructors are omitted)

expression 2 * (x + 1) is then represented as

new Bin(Mul, new Cte(2), new Bin(Add, new Var("x"), new Cte(1)))
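
a minimal sketch of these classes with the omitted constructors spelled out; the helper show and the class AstDemo are illustrative additions, not part of the lecture's code, and the operators are written Binop.Mul and Binop.Add since no static import is assumed

```java
// Abstract syntax trees for expressions, with explicit constructors.
enum Binop { Add, Mul }

abstract class Expr {}

class Cte extends Expr {
    final int n;
    Cte(int n) { this.n = n; }
}

class Var extends Expr {
    final String x;
    Var(String x) { this.x = x; }
}

class Bin extends Expr {
    final Binop op;
    final Expr e1, e2;
    Bin(Binop op, Expr e1, Expr e2) { this.op = op; this.e1 = e1; this.e2 = e2; }
}

class AstDemo {
    // Hypothetical helper: renders a tree in prefix form, one node per constructor.
    static String show(Expr e) {
        if (e instanceof Cte) return Integer.toString(((Cte) e).n);
        if (e instanceof Var) return ((Var) e).x;
        Bin b = (Bin) e;
        return b.op + "(" + show(b.e1) + ", " + show(b.e2) + ")";
    }

    public static void main(String[] args) {
        // 2 * (x + 1): the parentheses shape the tree but are not a node of it
        Expr e = new Bin(Binop.Mul, new Cte(2),
                         new Bin(Binop.Add, new Var("x"), new Cte(1)));
        System.out.println(show(e)); // prints Mul(2, Add(x, 1))
    }
}
```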



abstract syntax in OCaml

we use algebraic data types to build abstract syntax trees, as follows:


type binop = Add | Mul | ...

type expr =
| Cte of int
| Var of string
| Bin of binop * expr * expr
| ...

expression 2 * (x + 1) is then represented as

Bin (Mul, Cte 2, Bin (Add, Var "x", Cte 1))



parentheses

there is no constructor for parentheses in abstract syntax

in concrete syntax 2 * (x + 1), parentheses are used to build this tree

      ×
     / \
    2   +
       / \
      x   1

rather than this one

        +
       / \
      ×   1
     / \
    2   x

(the lecture on parsing will explain how)


syntactic sugar

we call syntactic sugar a construct of concrete syntax that does not exist
in abstract syntax

it is thus translated in terms of other constructs of abstract syntax (typically during parsing)

examples:
• in C, expression a[i] is syntactic sugar for *(a+i)
• in Java, expression x -> {...} is sugar for the construction of an
object in some anonymous class that implements Function
• in OCaml, expression [e1 ; e2 ; ...; en ] is sugar for
e1 :: e2 :: ... :: en :: []



semantics

formal semantics is defined over abstract syntax

there are many approaches


• axiomatic semantics
• denotational semantics
• semantics by translation
• operational semantics



axiomatic semantics
also called Floyd-Hoare logic
(Robert Floyd, Assigning meanings to programs, 1967
Tony Hoare, An axiomatic basis for computer programming, 1969)

defines programs by means of their properties; we introduce a triple

{P} i {Q}

meaning “if formula P holds before the execution of statement i, then formula Q holds after the execution”

example:
{x ≥ 0} x := x + 1 {x > 0}
example of rule:

{P[x ← E ]} x := E {P(x)}



denotational semantics

denotational semantics maps each program expression e to its denotation [[e]], a mathematical object that represents the computation denoted by e

example: arithmetic expressions with a single variable x

e ::= x | n | e + e | e * e | ...

the denotation is a function that maps the value of x to the value of the expression

[[x]] = x ↦ x
[[n]] = x ↦ n
[[e1 + e2]] = x ↦ [[e1]](x) + [[e2]](x)
[[e1 * e2]] = x ↦ [[e1]](x) × [[e2]](x)
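
the four equations can be transcribed almost literally, as a sketch, using Java functions as denotations; the class names X, Num, Plus, Times and DenotationDemo are assumptions made for this illustration, and Java's int arithmetic (which wraps around) stands in for mathematical integers

```java
import java.util.function.IntUnaryOperator;

// e ::= x | n | e + e | e * e, with a single variable x
abstract class Expr {
    // [[e]]: the function that maps the value of x to the value of e
    abstract IntUnaryOperator denote();
}

class X extends Expr {
    IntUnaryOperator denote() { return x -> x; }
}

class Num extends Expr {
    final int n;
    Num(int n) { this.n = n; }
    IntUnaryOperator denote() { return x -> n; }
}

class Plus extends Expr {
    final Expr e1, e2;
    Plus(Expr e1, Expr e2) { this.e1 = e1; this.e2 = e2; }
    IntUnaryOperator denote() {
        IntUnaryOperator d1 = e1.denote(), d2 = e2.denote();
        return x -> d1.applyAsInt(x) + d2.applyAsInt(x);
    }
}

class Times extends Expr {
    final Expr e1, e2;
    Times(Expr e1, Expr e2) { this.e1 = e1; this.e2 = e2; }
    IntUnaryOperator denote() {
        IntUnaryOperator d1 = e1.denote(), d2 = e2.denote();
        return x -> d1.applyAsInt(x) * d2.applyAsInt(x);
    }
}

class DenotationDemo {
    public static void main(String[] args) {
        // [[2 * (x + 1)]] is computed once, then applied to a value of x
        Expr e = new Times(new Num(2), new Plus(new X(), new Num(1)));
        IntUnaryOperator f = e.denote();
        System.out.println(f.applyAsInt(20)); // prints 42
    }
}
```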



semantics by translation

(also called Strachey semantics)

we can define the semantics of a language by means of its translation to another language for which the semantics is already defined



example

an esoteric language whose syntax consists of 8 characters and whose semantics is defined by translation to the C language

command      translation to C
(prelude)    char array[30000] = {0};
             char *ptr = array;
>            ++ptr;
<            --ptr;
+            ++*ptr;
-            --*ptr;
.            putchar(*ptr);
,            *ptr = getchar();
[            while (*ptr) {
]            }
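
read as a program, this table is a tiny translator; the sketch below emits the C text command by command (the class name EsotericToC is hypothetical, ignoring any other character is an assumption, and the output is only the body of a C program, without the enclosing main function or the include of stdio.h)

```java
class EsotericToC {
    // Translates a program one command at a time, following the table.
    static String translate(String program) {
        StringBuilder out = new StringBuilder();
        out.append("char array[30000] = {0};\n"); // prelude
        out.append("char *ptr = array;\n");
        for (char c : program.toCharArray()) {
            switch (c) {
                case '>': out.append("++ptr;\n"); break;
                case '<': out.append("--ptr;\n"); break;
                case '+': out.append("++*ptr;\n"); break;
                case '-': out.append("--*ptr;\n"); break;
                case '.': out.append("putchar(*ptr);\n"); break;
                case ',': out.append("*ptr = getchar();\n"); break;
                case '[': out.append("while (*ptr) {\n"); break;
                case ']': out.append("}\n"); break;
                default: break; // assumption: any other character is ignored
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints the prelude followed by the translation of each command
        System.out.print(translate("+[-]."));
    }
}
```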



operational semantics

operational semantics describes the sequence of elementary computations from the expression to its outcome (its value)

it operates directly over abstract syntax

two kinds of operational semantics


• “natural semantics” or “big steps”

e↠v

• “reduction semantics” or “small steps”

e → e1 → e2 → · · · → v



while language

let us illustrate big-step operational semantics on a tiny fragment of C

e ::= expression
| n constant (signed 32-bit integer)
| x variable
| e op e binary operator (+, <, ...)

s ::= statement
| x=e; assignment
| if (e) s else s conditional
| while (e) s loop
| {s ... s } block



example

a = 0;
b = 1;
while (b < 100) {
b = a+b;
a = b-a;
}



big steps operational semantics of while

we seek to define a relation between some expression e and a value v

e↠v

here, values are limited to integers

v ::= value
| n integer value (signed 32-bit integer)

caveat: with most languages, values do not coincide with constants



example

in Java (or Python, OCaml, etc.), a value may be an address, even if we do not have addresses among the literal constants of the language

int[] a = new int[4];
...
int[] b = a;
b[2] = 42;
...

(figure: a and b point to the same array, whose contents go from 0 1 2 3 to 0 1 42 3)

(more about this in lecture 5)



value of a variable

the value of a variable is given by an environment E (a function from variables to values)

we are going to define a relation

E, e ↠ v

that reads “in environment E , expression e has value v ”



example

in environment
E = {a ↦ 34, b ↦ 55}
the expression
a+b
has value
89
which we write
E , a + b ↠ 89



inference rules

a relation may be defined as the smallest relation satisfying a set of rules with no premises (axioms), written

    ---
     P

and a set of rules with premises, written

    P1   P2   ...   Pn
    ------------------
            P

these are called inference rules



example

we can define the relation Even(n) with two rules

    -------
    Even(0)

and

    Even(n)
    -----------
    Even(n + 2)

that reads as follows

on the one hand Even(0)
on the other hand ∀n. Even(n) ⇒ Even(n + 2)

the smallest relation satisfying these two properties coincides with the property “n is an even natural number”:
• even natural numbers are included, by induction
• if odd numbers were included, we could remove the smallest



derivation tree

a derivation is a tree whose internal nodes are rules with premises and
whose leaves are axioms

example:

    -------
    Even(0)
    -------
    Even(2)
    -------
    Even(4)

the set of derivations characterizes the smallest relation satisfying the inference rules
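
as an illustration, a derivation of Even(n) can be searched for by reading the two rules from conclusion to premise; this is a sketch, and the class EvenDerivation and its method derive are names chosen here, not part of the lecture

```java
import java.util.ArrayList;
import java.util.List;

class EvenDerivation {
    // Returns the derivation of Even(n), listed from the axiom down to the
    // conclusion, or null when no rule applies (so Even(n) is not derivable).
    static List<String> derive(int n) {
        if (n == 0) {                       // axiom: Even(0)
            List<String> d = new ArrayList<>();
            d.add("Even(0)");
            return d;
        }
        if (n >= 2) {                       // rule: from Even(n - 2) deduce Even(n)
            List<String> d = derive(n - 2);
            if (d == null) return null;
            d.add("Even(" + n + ")");
            return d;
        }
        return null;                        // n = 1 or n < 0: no rule has this conclusion
    }

    public static void main(String[] args) {
        System.out.println(derive(4)); // prints [Even(0), Even(2), Even(4)]
        System.out.println(derive(3)); // prints null
    }
}
```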



semantics of expressions
• a constant n has value n

    --------
    E, n ↠ n

• a variable x has a value if E(x) is defined

    x ∈ dom(E)
    -----------
    E, x ↠ E(x)

• an addition e1 + e2 has a value if e1 has a value n1, if e2 has a value n2, and if n1 + n2 does not overflow

    E, e1 ↠ n1    E, e2 ↠ n2    n = n1 + n2    −2^31 ≤ n < 2^31
    ------------------------------------------------------------
    E, e1 + e2 ↠ n

• etc.
example

with E = {a ↦ 34, b ↦ 55}, we have

    a ∈ dom(E)      b ∈ dom(E)
    ----------      ----------
    E, a ↠ 34       E, b ↠ 55       89 = 34 + 55
    --------------------------------------------
    E, a + b ↠ 89

note: one can see such a tree as a proof



expressions without value

there are expressions e for which there is no value v such that E , e ↠ v

examples:
• x + 1 with a variable x not defined in E

• 2000000000 + 1000000000 because of an overflow



expressions without value

these are two different situations

• the case of an undefined variable is detected during type checking (see lecture 4) and the program is rejected

• the case of a (signed) arithmetic overflow is an undefined behavior (UB) in the C language

the program is accepted, compiled, and executed, but the compiler is free to do whatever it wants when a UB occurs



example

on the code
bool f(int x) {
return x+1 < 10;
}
the compiler gcc produces
xorl %eax, %eax
cmpl $8, %edi
setle %al
ret
which means it computes x <= 8

when x is 2^31 − 1, the function returns false even if x+1 would be −2^31 (if it was computed) and thus x+1 < 10 would be true
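
for contrast, Java defines int addition to wrap around on overflow, so the same function written in Java has a single possible result; a small sketch (the class name WrapDemo is chosen for this illustration)

```java
class WrapDemo {
    // Same body as the C function f; in Java, x + 1 wraps around modulo 2^32.
    static boolean f(int x) {
        return x + 1 < 10;
    }

    public static void main(String[] args) {
        // Integer.MAX_VALUE is 2^31 - 1, and Integer.MAX_VALUE + 1 is -2^31
        System.out.println(f(Integer.MAX_VALUE)); // prints true
        System.out.println(f(9));                 // prints false
    }
}
```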



semantics of statements

a statement may modify the value of some variables (through assignments)

to define the semantics of a statement s, we thus introduce the relation

E, s ↠ E′

that reads “in environment E, the evaluation of statement s terminates and leads to environment E′”



semantics of statements 1/2

• if e has a value, then the assignment evaluates and adds/replaces variable x

    E, e ↠ v
    --------------------
    E, x=e; ↠ E{x ↦ v}

• if the test e has a value, and if the corresponding branch evaluates, then if evaluates

    E, e ↠ n ≠ 0    E, s1 ↠ E1
    ---------------------------
    E, if (e) s1 else s2 ↠ E1

    E, e ↠ 0    E, s2 ↠ E2
    ---------------------------
    E, if (e) s1 else s2 ↠ E2



example

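with E = {a ↦ 21}, we have

                                 E, 2 ↠ 2    E, a ↠ 21
                                 ----------------------
    E, a ↠ 21    E, 0 ↠ 0        E, 2 × a ↠ 42
    ---------------------        ------------------------
    E, a > 0 ↠ true              E, a=2 × a; ↠ {a ↦ 42}
    -----------------------------------------------------
    E, if (a > 0) a=2 × a; else { } ↠ {a ↦ 42}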



semantics of statements 2/2

• a block evaluates if its statements evaluate in order

    ----------
    E, { } ↠ E

    E, s1 ↠ E1    E1, { s2 ... } ↠ E2
    -----------------------------------
    E, { s1 s2 ... } ↠ E2

• a loop evaluates if it terminates

    E, e ↠ 0
    -------------------
    E, while (e) s ↠ E

    E, e ↠ n ≠ 0    E, s ↠ E1    E1, while (e) s ↠ E2
    ---------------------------------------------------
    E, while (e) s ↠ E2



statement without evaluation

there are statements s that do not evaluate

example: while (1) { }

(and many other examples of statements involving expressions without a value)



induction on the derivation

to establish a property of a relation defined by a set of inference rules, one can reason by structural induction on the derivation, i.e. one can use the induction hypothesis on any sub-derivation

equivalently, one can say that we perform an induction over the height of the derivation

in practice, we proceed by induction on the derivation and by case on the last rule of the derivation



example

Proposition (evaluation is deterministic)

If E, e ↠ v and E, e ↠ v′ then v = v′.

by induction over the derivations of E, e ↠ v and E, e ↠ v′

case of an addition e = e1 + e2

       (D1)            (D2)
    E, e1 ↠ n1    E, e2 ↠ n2
    -------------------------
    E, e1 + e2 ↠ v

       (D1′)           (D2′)
    E, e1 ↠ n1′    E, e2 ↠ n2′
    ---------------------------
    E, e1 + e2 ↠ v′

with v = n1 + n2 and v′ = n1′ + n2′

by IH we have n1 = n1′ and n2 = n2′ and thus v = v′

(other cases are similar or simpler)



example

Proposition (evaluation is deterministic)

If E, s ↠ E′ and E, s ↠ E′′ then E′ = E′′.

exercise: do this proof

remark: in the case of rule

    E, e ↠ n ≠ 0    E, s ↠ E1    E1, while (e) s ↠ E2
    ---------------------------------------------------
    E, while (e) s ↠ E2

it is clear that induction is performed on the size of the derivation and not on the size of the statement (which does not decrease)



determinism

an evaluation relation is not necessarily deterministic

example: we add a primitive random to draw an integer 0 or 1 at random, with the rule

    0 ≤ n < 2
    ----------------
    E, random() ↠ n

then we have E , random() ↠ 0 as well as E , random() ↠ 1



interpreter

we can code an interpreter following the rules of the natural semantics

let’s do it in Java



abstract syntax

as explained earlier
enum Binop { Add, ... }

abstract class Expr {}


class Ecte extends Expr { int n; }
class Evar extends Expr { String x; }
class Ebin extends Expr { Binop op; Expr e1, e2; }

abstract class Value {}


class Vint extends Value { int n; }
...
(constructors are omitted)



abstract syntax

similarly for statements


abstract class Stmt {}
class Sassign extends Stmt { String x; Expr e; }
class Sif extends Stmt { Expr e; Stmt s1, s2; }
class Swhile extends Stmt { Expr e; Stmt s; }
class Sblock extends Stmt { List<Stmt> l; }



evaluation of an expression

let’s start with relation


E, e ↠ v

the environment E is represented by a class


class Environment {
HashMap<String, Value> vars = new HashMap<>();
}



evaluation of an expression

one solution is to declare a method


abstract class Expr {
abstract Value eval(Environment env);
}
and then to define it within any sub-class



evaluation of an expression

E, n ↠ n
class Ecte extends Expr {
Value eval(Environment env) { return new Vint(n); }
}
x ∈ dom(E)
-----------
E, x ↠ E(x)
class Evar extends Expr {
Value eval(Environment env) {
Value v = env.vars.get(x);
if (v == null)
throw new Error("unbound variable " + x);
return v;
}
}
evaluation of an expression

E, e1 ↠ n1    E, e2 ↠ n2    n = n1 + n2    −2^31 ≤ n < 2^31
------------------------------------------------------------    etc.
E, e1 + e2 ↠ n

class Ebin extends Expr {


Value eval(Environment env) {
Value v1 = e1.eval(env), v2 = e2.eval(env);
switch (op) {
case Add:
return new Vint(v1.asInt() + v2.asInt());
...
}
}
}
(we could check the absence of arithmetic overflow)
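
one possible way to perform that check, as a sketch: compute the exact sum on 64 bits and test the side condition of the rule (the class CheckedAdd is a name chosen here; the standard method Math.addExact offers a similar check and throws an ArithmeticException instead)

```java
class CheckedAdd {
    // Side condition of the rule: n = n1 + n2 with -2^31 <= n < 2^31.
    static int add(int n1, int n2) {
        long n = (long) n1 + (long) n2; // exact sum, computed on 64 bits
        if (n < Integer.MIN_VALUE || n > Integer.MAX_VALUE)
            throw new Error("arithmetic overflow");
        return (int) n;
    }

    public static void main(String[] args) {
        System.out.println(add(34, 55)); // prints 89
        try {
            add(2000000000, 1000000000); // no value according to the semantics
        } catch (Error e) {
            System.out.println(e.getMessage()); // prints arithmetic overflow
        }
    }
}
```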
evaluation failure

the method eval dynamically fails on an expression involving an undefined variable

we could have detected this error statically with typing (see lecture 4)

statically = at compile time
dynamically = during execution



evaluation of a statement

we proceed similarly for statements by adding a method in class Stmt


abstract class Stmt {
abstract void eval(Environment env);
}
that we define within any sub-class

eval returns nothing, but it mutates the environment



evaluation of a statement

----------
E, { } ↠ E

E, s1 ↠ E1    E1, { s2 ... } ↠ E2
-----------------------------------
E, { s1 s2 ... } ↠ E2

class Sblock extends Stmt {


void eval(Environment env) {
for (Stmt s: l)
s.eval(env);
}
}



evaluation of a statement

E, e ↠ v
--------------------
E, x=e; ↠ E{x ↦ v}

class Sassign extends Stmt {


void eval(Environment env) {
env.vars.put(x, e.eval(env));
}
}
(the environment is a mutable data structure)



evaluation of a statement

E, e ↠ n ≠ 0    E, s1 ↠ E1
---------------------------
E, if (e) s1 else s2 ↠ E1

E, e ↠ 0    E, s2 ↠ E2
---------------------------
E, if (e) s1 else s2 ↠ E2

class Sif extends Stmt {


void eval(Environment env) {
if (e.eval(env).asInt() != 0)
s1.eval(env);
else
s2.eval(env);
}
}



evaluation of a statement

E, e ↠ n ≠ 0    E, s ↠ E1    E1, while (e) s ↠ E2
---------------------------------------------------
E, while (e) s ↠ E2

E, e ↠ 0
-------------------
E, while (e) s ↠ E

class Swhile extends Stmt {


void eval(Environment env) {
while (e.eval(env).asInt() != 0)
s.eval(env);
}
}
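
putting the pieces together, here is a self-contained sketch of the interpreter, run on the earlier program a = 0; b = 1; while (b < 100) { b = a+b; a = b-a; }; to stay short it makes a few assumptions that differ from the slides: values are plain int instead of Value/Vint, Sif is left out, the operators Sub and Lt are added, and InterpDemo is a hypothetical driver class

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;

enum Binop { Add, Sub, Lt }

class Environment { HashMap<String, Integer> vars = new HashMap<>(); }

abstract class Expr { abstract int eval(Environment env); }
class Ecte extends Expr {
    final int n;
    Ecte(int n) { this.n = n; }
    int eval(Environment env) { return n; }
}
class Evar extends Expr {
    final String x;
    Evar(String x) { this.x = x; }
    int eval(Environment env) {
        Integer v = env.vars.get(x);
        if (v == null) throw new Error("unbound variable " + x);
        return v;
    }
}
class Ebin extends Expr {
    final Binop op; final Expr e1, e2;
    Ebin(Binop op, Expr e1, Expr e2) { this.op = op; this.e1 = e1; this.e2 = e2; }
    int eval(Environment env) {
        int v1 = e1.eval(env), v2 = e2.eval(env);
        switch (op) {
            case Add: return v1 + v2;          // wraps around silently on overflow
            case Sub: return v1 - v2;
            default:  return v1 < v2 ? 1 : 0;  // Lt, with C-style truth values
        }
    }
}

abstract class Stmt { abstract void eval(Environment env); }
class Sassign extends Stmt {
    final String x; final Expr e;
    Sassign(String x, Expr e) { this.x = x; this.e = e; }
    void eval(Environment env) { env.vars.put(x, e.eval(env)); }
}
class Swhile extends Stmt {
    final Expr e; final Stmt s;
    Swhile(Expr e, Stmt s) { this.e = e; this.s = s; }
    void eval(Environment env) { while (e.eval(env) != 0) s.eval(env); }
}
class Sblock extends Stmt {
    final List<Stmt> l;
    Sblock(Stmt... l) { this.l = Arrays.asList(l); }
    void eval(Environment env) { for (Stmt s : l) s.eval(env); }
}

class InterpDemo {
    // Builds the abstract syntax of the example program and evaluates it.
    static Environment run() {
        Expr a = new Evar("a"), b = new Evar("b");
        Stmt prog = new Sblock(
            new Sassign("a", new Ecte(0)),
            new Sassign("b", new Ecte(1)),
            new Swhile(new Ebin(Binop.Lt, b, new Ecte(100)),
                new Sblock(
                    new Sassign("b", new Ebin(Binop.Add, a, b)),
                    new Sassign("a", new Ebin(Binop.Sub, b, a)))));
        Environment env = new Environment();
        prog.eval(env);
        return env;
    }

    public static void main(String[] args) {
        Environment env = run();
        System.out.println("a = " + env.vars.get("a") + ", b = " + env.vars.get("b"));
        // prints a = 89, b = 144
    }
}
```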



interpreter in OCaml

we can do the same in OCaml

pattern matching plays the role of dynamic methods


let rec eval env = function
| Ecte v ->
v
| Evar x ->
(try Hashtbl.find env x
with Not_found -> failwith ("unbound variable " ^ x))
| Ebin (op, e1, e2) ->
(match op, eval env e1, eval env e2 with
| Add, Vint n1, Vint n2 -> Vint (n1 + n2)
| ...
| _ -> failwith "illegal operands")



brief comparison functional/object programming

what distinguishes

type expr = Ecte of value | Evar of string | ...

from

abstract class Expr {...} class Ecte extends Expr {...}

?

in OCaml, the code of eval is a single function and it covers all cases

in Java, it is scattered in all classes



brief comparison functional/object programming

         horizontal extension    vertical extension
         = adding a case         = adding a function
Java     easy                    painful
         (one file)              (several files)
OCaml    painful                 easy
         (several files)         (one file)



another way of writing the Java code

the Java code may be organized differently, with


• classes for the abstract syntax on one side,
• a class for the interpreter on the other side

to do that, we can use the visitor pattern



visitor 1/3

we start by introducing an interface for the interpreter


interface Interpreter {
Value interp(Ecte e);
Value interp(Evar e);
Value interp(Ebin e);
}

note: we use Java’s overloading to give all these methods the same name



visitor 2/3

in class Expr, we provide a method accept to apply the interpreter


abstract class Expr {
abstract Value accept(Interpreter i);
}
class Ecte extends Expr {
Value accept(Interpreter i) { return i.interp(this); }
}
class Evar extends Expr {
Value accept(Interpreter i) { return i.interp(this); }
}
class Ebin extends Expr {
Value accept(Interpreter i) { return i.interp(this); }
}
this is the only intrusion in the classes of abstract syntax



visitor 3/3
finally, we can code the interpreter in a separate class, that implements
interface Interpreter
class Interp implements Interpreter {
  Environment env = new Environment();
  // interface methods are public, so their implementations must be declared public
  public Value interp(Ecte e) {
    return new Vint(e.n);
  }
  public Value interp(Ebin e) {
    Value v1 = e.e1.accept(this), v2 = e.e2.accept(this);
    switch (e.op) {
      case Add:
        return new Vint(v1.asInt() + v2.asInt());
      ...
    }
  }
  ...
}
general-purpose visitor

the interface Interpreter is specific to our needs

we could instead provide a general-purpose visitor


interface Visitor {
void visit(Ecte e);
void visit(Evar e);
void visit(Ebin e);
}
so that we can use it for other purposes, e.g. printing

methods visit return nothing, but this is not an issue (see this week’s lab)
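
one way to see why returning nothing is not an issue: the visitor can accumulate its result in a field; the sketch below uses the Visitor interface for printing (the classes Printer and VisitorDemo, and the accept method taking a Visitor, are assumptions made for this illustration, not the lab's solution)

```java
enum Binop { Add, Mul }

interface Visitor {
    void visit(Ecte e);
    void visit(Evar e);
    void visit(Ebin e);
}

abstract class Expr { abstract void accept(Visitor v); }
class Ecte extends Expr {
    final int n;
    Ecte(int n) { this.n = n; }
    void accept(Visitor v) { v.visit(this); }
}
class Evar extends Expr {
    final String x;
    Evar(String x) { this.x = x; }
    void accept(Visitor v) { v.visit(this); }
}
class Ebin extends Expr {
    final Binop op; final Expr e1, e2;
    Ebin(Binop op, Expr e1, Expr e2) { this.op = op; this.e1 = e1; this.e2 = e2; }
    void accept(Visitor v) { v.visit(this); }
}

// Hypothetical printing visitor: the result accumulates in the field sb.
class Printer implements Visitor {
    final StringBuilder sb = new StringBuilder();
    public void visit(Ecte e) { sb.append(e.n); }
    public void visit(Evar e) { sb.append(e.x); }
    public void visit(Ebin e) {
        sb.append('(');
        e.e1.accept(this);
        sb.append(e.op == Binop.Add ? " + " : " * ");
        e.e2.accept(this);
        sb.append(')');
    }
}

class VisitorDemo {
    static String print(Expr e) {
        Printer p = new Printer();
        e.accept(p);
        return p.sb.toString();
    }

    public static void main(String[] args) {
        Expr e = new Ebin(Binop.Mul, new Ecte(2),
                          new Ebin(Binop.Add, new Evar("x"), new Ecte(1)));
        System.out.println(print(e)); // prints (2 * (x + 1))
    }
}
```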



weaknesses of natural semantics

natural semantics makes no distinction between programs that crash, such as

x + 1

with an undefined variable x, and programs whose evaluation does not terminate, such as

while (1) { }



small-step operational semantics

small-step operational semantics remedies this by introducing a notion of elementary computation E1, s1 → E2, s2, which we iterate

then we can distinguish

1. successful termination

E, s → E1, s1 → E2, s2 → ··· → E′, { }

2. evaluation stuck on En, sn with sn ≠ { }

E, s → E1, s1 → E2, s2 → ··· → En, sn

3. non-terminating evaluation

E, s → E1, s1 → E2, s2 → ···



remark

we can keep our big-step semantics for expressions, since expressions always terminate

but for a more complex language (say, with function calls in expressions),
we would use small-step semantics for expressions as well



small-step operational semantics for while

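    E, e ↠ v
    --------------------------
    E, x=e; → E{x ↦ v}, { }

    --------------------------------
    E, { { } s ... } → E, { s ... }

    E, s1 → E1, s1′
    --------------------------------------
    E, { s1 s2 ... } → E1, { s1′ s2 ... }

    E, e ↠ n ≠ 0
    -----------------------------
    E, if (e) s1 else s2 → E, s1

    E, e ↠ 0
    -----------------------------
    E, if (e) s1 else s2 → E, s2

    E, e ↠ n ≠ 0
    --------------------------------------
    E, while (e) s → E, { s while (e) s }

    E, e ↠ 0
    ------------------------
    E, while (e) s → E, { }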
example

  {}, { x=40; while (x < 42) x=x + 1; }
→ {x ↦ 40}, { { } while (x < 42) x=x + 1; }
→ {x ↦ 40}, while (x < 42) x=x + 1;
→ {x ↦ 40}, { x=x + 1; while (x < 42) x=x + 1; }
→ {x ↦ 41}, { { } while (x < 42) x=x + 1; }
→ {x ↦ 41}, while (x < 42) x=x + 1;
→ {x ↦ 41}, { x=x + 1; while (x < 42) x=x + 1; }
→ {x ↦ 42}, { { } while (x < 42) x=x + 1; }
→ {x ↦ 42}, while (x < 42) x=x + 1;
→ {x ↦ 42}, {}
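
the small-step rules can be turned into a function step that performs one elementary computation, iterated until { } is reached; this is a sketch under several assumptions: expressions are Java lambdas evaluated in one big step, the environment is updated in place, a one-statement block { s } is identified with s (as the trace above does when it writes the loop without braces), and the names SmallStep, step, run and demo are chosen here

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Shortcut (assumption): expressions are lambdas, evaluated in one big step.
interface Expr { int eval(Map<String, Integer> env); }

abstract class Stmt {}
class Sassign extends Stmt {
    final String x; final Expr e;
    Sassign(String x, Expr e) { this.x = x; this.e = e; }
}
class Sif extends Stmt {
    final Expr e; final Stmt s1, s2;
    Sif(Expr e, Stmt s1, Stmt s2) { this.e = e; this.s1 = s1; this.s2 = s2; }
}
class Swhile extends Stmt {
    final Expr e; final Stmt s;
    Swhile(Expr e, Stmt s) { this.e = e; this.s = s; }
}
class Sblock extends Stmt {
    final List<Stmt> l;
    Sblock(List<Stmt> l) { this.l = l; }
}

class SmallStep {
    static boolean isSkip(Stmt s) {
        return s instanceof Sblock && ((Sblock) s).l.isEmpty();
    }

    // Convention (assumption): a one-statement block { s } is identified with s.
    static Stmt block(List<Stmt> l) {
        return l.size() == 1 ? l.get(0) : new Sblock(l);
    }

    // One step E, s -> E', s': env is updated in place and s' is returned.
    static Stmt step(Map<String, Integer> env, Stmt s) {
        if (s instanceof Sassign) {
            Sassign a = (Sassign) s;
            env.put(a.x, a.e.eval(env));
            return new Sblock(new ArrayList<>());
        }
        if (s instanceof Sif) {
            Sif i = (Sif) s;
            return i.e.eval(env) != 0 ? i.s1 : i.s2;
        }
        if (s instanceof Swhile) {
            Swhile w = (Swhile) s;
            if (w.e.eval(env) == 0) return new Sblock(new ArrayList<>());
            return new Sblock(Arrays.asList(w.s, w));
        }
        Sblock b = (Sblock) s;
        if (b.l.isEmpty()) throw new Error("no step from { }");
        List<Stmt> rest = new ArrayList<>(b.l.subList(1, b.l.size()));
        Stmt head = b.l.get(0);
        if (!isSkip(head)) rest.add(0, step(env, head)); // { s1 s2 ... } -> { s1' s2 ... }
        return block(rest);                              // otherwise { { } s ... } -> { s ... }
    }

    // Iterates step until { } is reached; returns the number of steps taken.
    static int run(Map<String, Integer> env, Stmt s) {
        int steps = 0;
        while (!isSkip(s)) { s = step(env, s); steps++; }
        return steps;
    }

    // { x=40; while (x < 42) x=x+1; } started in the empty environment
    static int[] demo() {
        Map<String, Integer> env = new HashMap<>();
        Stmt incr = new Sassign("x", m -> m.get("x") + 1);
        Stmt loop = new Swhile(m -> m.get("x") < 42 ? 1 : 0, incr);
        Stmt prog = new Sblock(Arrays.asList(new Sassign("x", m -> 40), loop));
        int steps = run(env, prog);
        return new int[] { steps, env.get("x") };
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println(r[0] + " steps, x = " + r[1]); // prints 9 steps, x = 42
    }
}
```

a non-terminating program makes run loop forever, which is precisely the third case that the small-step semantics lets us tell apart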



alternative

we could replace the two rules for while with the following rule

E , while (e) s → E , if (e) { s while (e) s } else { }



equivalence

Proposition (equivalence of the two semantics)

The two operational semantics are equivalent on programs whose evaluation terminates, i.e.

E, s ↠ E′ if and only if E, s →⋆ E′, { }

(where →⋆ is the reflexive transitive closure of →).



proof

Proposition (big steps imply small steps)

If E, s ↠ E′, then E, s →⋆ E′, { }.

by induction on the derivation E, s ↠ E′ and by case on the last rule

• case of { s1 s2 ... }

    E, s1 ↠ E1    E1, { s2 ... } ↠ E2
    -----------------------------------
    E, { s1 s2 ... } ↠ E2

then E, s1 →⋆ E1, { } by IH
consequently,

E, { s1 s2 ... } →⋆ E1, { { } s2 ... }    (small steps)
                 →  E1, { s2 ... }
                 →⋆ E2, { }               (IH)



proof

• case of while (e) s

if

    E, e ↠ n ≠ 0    E, s ↠ E1    E1, while (e) s ↠ E2
    ---------------------------------------------------
    E, while (e) s ↠ E2

then

E, while (e) s →  E, { s while (e) s }
               →⋆ E1, { { } while (e) s }    (IH + block rule)
               →  E1, { while (e) s }
               →⋆ E2, { }                    (IH)

exercise: do the other cases



small steps imply big steps
Lemma
If E1, s1 → E2, s2 ↠ E′, then E1, s1 ↠ E′.

by induction over ↠

• case s1 = { u1 v1 ... }
  • case u1 = { }
    we have E1, { { } v1 ... } → E1, { v1 ... } ↠ E′ and thus

        E1, { } ↠ E1    E1, { v1 ... } ↠ E′
        -------------------------------------
        E1, { { } v1 ... } ↠ E′

  • case u1 ≠ { }
    E1, { u1 v1 ... } → E2, { u2 v1 ... } ↠ E′ i.e. E1, u1 → E2, u2 and

        E2, u2 ↠ E2′    E2′, { v1 ... } ↠ E′
        --------------------------------------
        E2, { u2 v1 ... } ↠ E′

    by IH we deduce

        E1, u1 ↠ E2′    E2′, { v1 ... } ↠ E′
        --------------------------------------
        E1, { u1 v1 ... } ↠ E′

(do the other cases)
small steps imply big steps

we deduce
Proposition (small steps imply big steps)
If E, s →⋆ E′, { }, then E, s ↠ E′.

proof: we have

E, s → E1, s1 → E2, s2 → ··· → En, sn → E′, { }

but E′, { } ↠ E′ so En, sn ↠ E′ by the lemma above, then En−1, sn−1 ↠ E′ by the same lemma, etc., until we get E, s ↠ E′
(induction on n, the number of steps)



first extension

let us add pointers to our fragment of C



first extension

this means extending the notion of value, of expressions, and of statements

v ::= value
| n integer value (signed 32-bit integer)
| ℓ memory address

e ::= ... expression
| malloc(4) allocate memory
| *e read from memory

s ::= ... statement
| *e=e; write to memory



modeling the memory

to include the memory in our semantics, we extend the semantics relation as follows,

M, E, e ↠ v

where M is a function from addresses (ℓ) to integers (n)



evaluation of expressions

we add a rule to allocate memory

    ℓ an address that is not in M
    -----------------------------
    M, E, malloc(4) ↠ ℓ

and another to read from memory

    M, E, e ↠ ℓ    ℓ in M
    ----------------------
    M, E, *e ↠ M(ℓ)



evaluation of statements

we add a rule to write to memory

    M, E, e1 ↠ ℓ    ℓ in M    M, E, e2 ↠ n
    ----------------------------------------
    M, E, *e1=e2; ↠ M{ℓ ↦ n}, E



remark

as it is defined, our semantics makes a clear distinction between integers and addresses

in particular, an expression such as **e has no value

we could accommodate such expressions, but at the price of a much more complex semantics



second extension

let us add functions to our fragment of C

to make things simpler, functions do not return any value; but they are useful anyway:
void f(int x, int *p) {
while (x) {
*p = *p + x;
x = x - 1;
}
}

besides, we only have local variables



second extension

s ::= ... statement
| f(e, ..., e); function call

p ::= d . . . d program

d ::= void f (x, . . . , x) s function definition



semantics of a function call

for a function call to evaluate, we require that


• the function exists, with the right number of parameters
• all arguments evaluate to values
• the body of the function evaluates

    there exists a function void f(x1, ..., xn) s
    M, E, ei ↠ vi    M, {x1 ↦ v1, ..., xn ↦ vn}, s ↠ M′, E′
    ---------------------------------------------------------
    M, E, f(e1, ..., en); ↠ M′, E



remarks

    there exists a function void f(x1, ..., xn) s
    M, E, ei ↠ vi    M, {x1 ↦ v1, ..., xn ↦ vn}, s ↠ M′, E′
    ---------------------------------------------------------
    M, E, f(e1, ..., en); ↠ M′, E

note how
• the function body s is evaluated in a new environment, with only
variables xi and whose final value E ′ is discarded
• we return to the environment E from which we started, unchanged
• the memory, on the contrary, is possibly modified



evaluation order

here, the evaluation order of the function arguments is not significant, since the evaluation of an expression has no side effects

in the (real) C language, however, arguments may have effects

f(i++, j++)

and it is not specified in which order they are evaluated
(in the terminology of the C standard, this is an unspecified behavior)



call by value

we have evaluated function arguments before the call,
and we passed their values

we call this call by value

this is what C does, but this is not the only option (see lecture 5)



in the lecture notes

the lecture notes also contain the operational semantics for Mini-ML

e ::= x identifier
| c constant (1, 2, . . . , true, . . . )
| op primitive (+, ×, fst, . . . )
| fun x → e anonymous function
| e e application
| (e, e) pair
| let x = e in e local binding

(section 2.2, page 20)
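
as an illustration, the grammar above can be turned into an abstract syntax tree with one class per production. The following Python sketch uses class and field names (Var, Const, Op, Fun, App, Pair, Let, show) that are assumptions for this example, not names from the lecture notes.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Var:                 # x
    name: str

@dataclass(frozen=True)
class Const:               # c
    value: Union[int, bool]

@dataclass(frozen=True)
class Op:                  # op
    name: str

@dataclass(frozen=True)
class Fun:                 # fun x -> e
    param: str
    body: "Expr"

@dataclass(frozen=True)
class App:                 # e e
    fn: "Expr"
    arg: "Expr"

@dataclass(frozen=True)
class Pair:                # (e, e)
    left: "Expr"
    right: "Expr"

@dataclass(frozen=True)
class Let:                 # let x = e in e
    name: str
    bound: "Expr"
    body: "Expr"

Expr = Union[Var, Const, Op, Fun, App, Pair, Let]

def show(e):
    """Print an expression back in concrete syntax, fully parenthesized."""
    if isinstance(e, Var):
        return e.name
    if isinstance(e, Const):
        return str(e.value).lower() if isinstance(e.value, bool) else str(e.value)
    if isinstance(e, Op):
        return e.name
    if isinstance(e, Fun):
        return f"(fun {e.param} -> {show(e.body)})"
    if isinstance(e, App):
        return f"({show(e.fn)} {show(e.arg)})"
    if isinstance(e, Pair):
        return f"({show(e.left)}, {show(e.right)})"
    if isinstance(e, Let):
        return f"(let {e.name} = {show(e.bound)} in {show(e.body)})"
    raise TypeError(f"not an expression: {e!r}")

# let x = 1 in fst (x, true)
example = Let("x", Const(1), App(Op("fst"), Pair(Var("x"), Const(True))))
text = show(example)
```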



application

correctness of a compiler



correctness

a compiler must respect the semantics

if the input language is equipped with a semantics $\rightarrow_s$ and the target
language with a semantics $\rightarrow_m$, and if some expression e is compiled to
C(e), then we must have “a commutative diagram”:

$$\begin{array}{ccc}
e & \rightarrow_s & v \\
\downarrow & & \approx \\
C(e) & \rightarrow_m & v'
\end{array}$$

where $v \approx v'$ states that values $v$ and $v'$ coincide



minimalist example

let us consider arithmetic expressions with no variables

e ::= n | e + e

and let us show the correctness of a very simple compiler to x86-64


that uses the stack to store intermediate computations



input language

we set a small-step semantics for the input language

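$$\frac{n = n_1 + n_2}{n_1 + n_2 \rightarrow n}
\qquad
\frac{e_1 \rightarrow e_1'}{e_1 + e_2 \rightarrow e_1' + e_2}
\qquad
\frac{e_2 \rightarrow e_2'}{n_1 + e_2 \rightarrow n_1 + e_2'}$$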
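
the three rules translate directly into a one-step function; the sketch below is in Python and assumes a hypothetical encoding where an expression is either an int or a tuple ("add", e1, e2). The names step and normalize are chosen for this example.

```python
def step(e):
    """Return e' such that e -> e', or None when e is a value n."""
    if isinstance(e, int):
        return None                       # values do not reduce
    _, e1, e2 = e
    if isinstance(e1, int) and isinstance(e2, int):
        return e1 + e2                    # rule 1: n1 + n2 -> n
    if not isinstance(e1, int):
        return ("add", step(e1), e2)      # rule 2: reduce the left operand
    return ("add", e1, step(e2))          # rule 3: left operand is already n1

def normalize(e):
    """Iterate -> until a value is reached; return all intermediate expressions."""
    trace = [e]
    while True:
        nxt = step(trace[-1])
        if nxt is None:
            return trace
        trace.append(nxt)

# (1 + 2) + (3 + 4)
trace = normalize(("add", ("add", 1, 2), ("add", 3, 4)))
```

the trace shows that the rules force a left-to-right strategy: the left operand is fully reduced to 3 before the right operand is touched, then 3 + 7 reduces to 10.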



target language

similarly, we set a small-step semantics for the target language

m ::= movq $n, r
    | addq $n, r  | addq r, r
    | movq (r), r | movq r, (r)
r ::= %rdi | %rsi | %rsp

a state gathers the values of registers, R,
and the contents of the memory, M

$$R ::= \{\texttt{\%rdi} \mapsto n;\ \texttt{\%rsi} \mapsto n;\ \texttt{\%rsp} \mapsto n\}
\qquad
M ::= \mathbb{N} \rightarrow \mathbb{Z}$$

we then define the semantics of an instruction m using a relation

$$R, M, m \xrightarrow{m} R', M'$$



target language

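the relation $R, M, m \xrightarrow{m} R', M'$ is defined as follows:

$$\begin{array}{lcl}
R, M, \texttt{movq}\ \$n, r & \xrightarrow{m} & R\{r \mapsto n\},\ M \\
R, M, \texttt{addq}\ \$n, r & \xrightarrow{m} & R\{r \mapsto R(r) + n\},\ M \\
R, M, \texttt{addq}\ r_1, r_2 & \xrightarrow{m} & R\{r_2 \mapsto R(r_1) + R(r_2)\},\ M \\
R, M, \texttt{movq}\ (r_1), r_2 & \xrightarrow{m} & R\{r_2 \mapsto M(R(r_1))\},\ M \\
R, M, \texttt{movq}\ r_1, (r_2) & \xrightarrow{m} & R,\ M\{R(r_2) \mapsto R(r_1)\}
\end{array}$$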
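
these five rules can be sketched as a small machine simulator in Python. The tuple encoding of instructions (tags "movq_imm", "addq_imm", "addq", "load", "store") and the names exec_instr and run are assumptions of this example; the memory M, a total function in the slides, is modeled here as a dict where missing addresses read as 0.

```python
def exec_instr(R, M, m):
    """One step R, M, m -> R', M'. Returns fresh dicts, inputs are not mutated."""
    op = m[0]
    if op == "movq_imm":                  # movq $n, r
        _, n, r = m
        return {**R, r: n}, M
    if op == "addq_imm":                  # addq $n, r
        _, n, r = m
        return {**R, r: R[r] + n}, M
    if op == "addq":                      # addq r1, r2
        _, r1, r2 = m
        return {**R, r2: R[r1] + R[r2]}, M
    if op == "load":                      # movq (r1), r2
        _, r1, r2 = m
        return {**R, r2: M.get(R[r1], 0)}, M
    if op == "store":                     # movq r1, (r2)
        _, r1, r2 = m
        return R, {**M, R[r2]: R[r1]}
    raise ValueError(f"unknown instruction {m!r}")

def run(R, M, code):
    """Execute a sequence of instructions (the reflexive transitive closure)."""
    for m in code:
        R, M = exec_instr(R, M, m)
    return R, M

R0 = {"%rdi": 0, "%rsi": 0, "%rsp": 1000}
program = [
    ("movq_imm", 5, "%rdi"),              # movq $5, %rdi
    ("addq_imm", -8, "%rsp"),             # addq $-8, %rsp
    ("store", "%rdi", "%rsp"),            # movq %rdi, (%rsp)
    ("load", "%rsp", "%rsi"),             # movq (%rsp), %rsi
    ("addq", "%rsi", "%rdi"),             # addq %rsi, %rdi
]
R1, M1 = run(R0, {}, program)
```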



compiler

the final value of an expression is stored in %rdi

code(n)       = movq $n, %rdi

code(e1 + e2) = code(e1)
                addq $-8, %rsp
                movq %rdi, (%rsp)
                code(e2)
                movq (%rsp), %rsi
                addq $8, %rsp
                addq %rsi, %rdi



correctness of the compiler

we seek to prove that if

$$e \rightarrow^\star n$$

and if

$$R, M, \mathit{code}(e) \xrightarrow{m}{}^\star R', M'$$

then $R'(\texttt{\%rdi}) = n$

we proceed by structural induction on e



correctness of the compiler

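we show a stronger property (an invariant), namely

if $e \rightarrow^\star n$ and $R, M, \mathit{code}(e) \xrightarrow{m}{}^\star R', M'$ then

$$\begin{cases}
R'(\texttt{\%rdi}) = n \\
R'(\texttt{\%rsp}) = R(\texttt{\%rsp}) \\
\forall a \ge R(\texttt{\%rsp}),\ M'(a) = M(a)
\end{cases}$$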



correctness of the compiler

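• case e = n

  we have $e \rightarrow^\star n$ (in zero steps) and code(e) = movq $n, %rdi
  and the result is immediate

• case e = e1 + e2

  we have $e \rightarrow^\star n_1 + e_2 \rightarrow^\star n_1 + n_2 \rightarrow n$
  with $e_1 \rightarrow^\star n_1$, $e_2 \rightarrow^\star n_2$ and $n = n_1 + n_2$
  thus we can invoke the induction hypothesis on e1 and e2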



correctness of the compiler
code                 state      properties

                     R, M
code(e1)             R1, M1     by induction hypothesis
                                R1(%rdi) = n1 and R1(%rsp) = R(%rsp)
                                ∀a ≥ R(%rsp), M1(a) = M(a)
addq $-8, %rsp
movq %rdi, (%rsp)    R1′, M1′   R1′ = R1{%rsp ↦ R(%rsp) − 8}
                                M1′ = M1{R(%rsp) − 8 ↦ n1}
code(e2)             R2, M2     by induction hypothesis
                                R2(%rdi) = n2 and R2(%rsp) = R(%rsp) − 8
                                ∀a ≥ R(%rsp) − 8, M2(a) = M1′(a)
movq (%rsp), %rsi
addq $8, %rsp
addq %rsi, %rdi      R′, M2     R′(%rdi) = n1 + n2
                                R′(%rsp) = R(%rsp) − 8 + 8 = R(%rsp)
                                ∀a ≥ R(%rsp),
                                M2(a) = M1′(a) = M1(a) = M(a)
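
the invariant can also be checked experimentally; the sketch below, in Python, is a test on random expressions and in no way a replacement for the proof. The encodings (expressions as int or ("add", e1, e2), instructions as tagged tuples) and all function names are assumptions of this example, and memory is modeled as a dict where missing addresses read as 0.

```python
import random

def value(e):
    """The integer n such that e ->* n."""
    return e if isinstance(e, int) else value(e[1]) + value(e[2])

def code(e):
    """The compilation scheme, with instructions encoded as (tag, a, b)."""
    if isinstance(e, int):
        return [("movq_imm", e, "%rdi")]
    return (code(e[1])
            + [("addq_imm", -8, "%rsp"), ("store", "%rdi", "%rsp")]
            + code(e[2])
            + [("load", "%rsp", "%rsi"), ("addq_imm", 8, "%rsp"),
               ("addq", "%rsi", "%rdi")])

def run(R, M, instrs):
    """Execute instrs from state R, M on copies; return the final state."""
    R, M = dict(R), dict(M)
    for op, a, b in instrs:
        if op == "movq_imm":      # movq $a, b
            R[b] = a
        elif op == "addq_imm":    # addq $a, b
            R[b] += a
        elif op == "addq":        # addq a, b
            R[b] += R[a]
        elif op == "load":        # movq (a), b
            R[b] = M.get(R[a], 0)
        elif op == "store":       # movq a, (b)
            M[R[b]] = R[a]
    return R, M

def check_invariant(e, R, M):
    """The three conclusions of the invariant, for one expression and one state."""
    R2, M2 = run(R, M, code(e))
    sp = R["%rsp"]
    return (R2["%rdi"] == value(e)
            and R2["%rsp"] == sp
            and all(M2.get(a, 0) == M.get(a, 0)
                    for a in set(M) | set(M2) if a >= sp))

def random_expr(rng, depth):
    if depth == 0 or rng.random() < 0.3:
        return rng.randint(-100, 100)
    return ("add", random_expr(rng, depth - 1), random_expr(rng, depth - 1))

rng = random.Random(0)
R0 = {"%rdi": 0, "%rsi": 0, "%rsp": 4096}
M0 = {4096: 7, 4104: -1, 4088: 99}        # 4088 lies below %rsp and may be overwritten
results = [check_invariant(random_expr(rng, 5), R0, M0) for _ in range(200)]
```

note that the cell at address 4088 is not covered by the invariant: the compiled code is free to use the memory below %rsp, which is exactly what the quantification over a ≥ R(%rsp) allows.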



in the large

such a proof can be done for a realistic compiler

example: CompCert, an optimizing compiler from C to PowerPC, ARM,
RISC-V, and x86, has been formally verified using the Coq proof assistant

see http://compcert.inria.fr/



next

• lab 2 (rooms 31 and 32)
  • a mini-Python interpreter
  • in Java or OCaml (your choice)
  • take your time to read and understand the code that is provided

• lecture 3
  • parsing

> ./mini-python tests/good/pascal.py
*
**
***
****
*****
******
*******
*000000*
**00000**
***0000***
****000****
*****00*****
******0******
**************
*000000*000000*
**00000**00000**
***0000***0000***
****000****000****
*****00*****00*****
******0******0******
*********************

