Cours2 en Handout
Jean-Christophe Filliâtre
1. abstract syntax
2. formal semantics
• big-step operational semantics
• interpreter
• small-step operational semantics
3. application
• correctness of a compiler
what is a program?
source
↓ lexical analysis
stream of tokens
↓ parsing
abstract syntax tree
↓ semantic analysis
abstract syntax + symbol table
↓ code production
assembly code
↓ assembler (as)
machine language
↓ linking (ld)
executable
the texts
2*(x+1)
and
(2 * ((x) + 1))
and
2 * /* I double */ ( x + 1 )
all represent the same abstract syntax tree
e ::= c constant
| x variable
| e + e addition
| e × e multiplication
| ...
but we could have picked something else, e.g. Add(e1, e2), +(e1, e2), etc.
new Bin(Mul, new Cte(2), new Bin(Add, new Var("x"), new Cte(1)))
type expr =
| Cte of int
| Var of string
| Bin of binop * expr * expr
| ...
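To make the Java expression above concrete, here is a minimal sketch of classes it could be built from. The wrapper class AstSketch, the field names and the toString methods are assumptions made for illustration; only the constructor names Cte, Var and Bin come from the handout.

```java
// Sketch of an abstract syntax for expressions; names beyond Cte/Var/Bin are hypothetical.
class AstSketch {
    enum Binop { Add, Mul }

    static abstract class Expr { }

    // Constant n
    static class Cte extends Expr {
        final int n;
        Cte(int n) { this.n = n; }
        @Override public String toString() { return Integer.toString(n); }
    }

    // Variable x
    static class Var extends Expr {
        final String x;
        Var(String x) { this.x = x; }
        @Override public String toString() { return x; }
    }

    // Binary operation e1 op e2
    static class Bin extends Expr {
        final Binop op;
        final Expr e1, e2;
        Bin(Binop op, Expr e1, Expr e2) { this.op = op; this.e1 = e1; this.e2 = e2; }
        // Prints the tree fully parenthesized, so the shape of the tree is visible.
        @Override public String toString() {
            String sym = (op == Binop.Add) ? "+" : "*";
            return "(" + e1 + " " + sym + " " + e2 + ")";
        }
    }
}
```

Whatever spacing, parentheses or comments the concrete text contained, the tree is the same: only constants, variables and operators remain.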
we call syntactic sugar a construct of concrete syntax that does not exist
in abstract syntax
examples:
• in C, expression a[i] is syntactic sugar for *(a+i)
• in Java, expression x -> {...} is sugar for the construction of an
object in some anonymous class that implements Function
• in OCaml, expression [e1; e2; ...; en] is sugar for
e1 :: e2 :: ... :: en :: []
{P} i {Q}
example:
{x ≥ 0} x := x + 1 {x > 0}
example of rule:
{P[x ← E]} x := E {P(x)}
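As a worked instance, the rule can be connected to the example triple. The choices P(x) ≡ x > 0 and E ≡ x + 1 are made here for illustration, and the last step uses the usual rule of consequence (strengthening the precondition), which is assumed rather than stated above.

```latex
\[
P(x) \;\equiv\; x > 0, \qquad E \;\equiv\; x + 1
\]
\[
\{P[x \leftarrow E]\}\; x := E \;\{P(x)\}
\quad\text{instantiates to}\quad
\{x + 1 > 0\}\; x := x + 1 \;\{x > 0\}
\]
\[
x \ge 0 \;\Longrightarrow\; x + 1 > 0
\quad\text{hence}\quad
\{x \ge 0\}\; x := x + 1 \;\{x > 0\}
\]
```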
e ::= x | n | e + e | e * e | . . .
the denotation is a function that maps the value of x to the value of the
expression
[[x]] = x ↦ x
[[n]] = x ↦ n
[[e1 + e2]] = x ↦ [[e1]](x) + [[e2]](x)
[[e1 * e2]] = x ↦ [[e1]](x) × [[e2]](x)
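The four equations can be read as a recursive function that returns a function. The sketch below assumes int arithmetic and uses hypothetical class names (DenotationSketch, X, N, Add, Mul); it is one possible transcription, not the lecture's code.

```java
import java.util.function.IntUnaryOperator;

// Denotational sketch for e ::= x | n | e + e | e * e (single variable x).
class DenotationSketch {
    static abstract class Expr { }
    static class X extends Expr { }
    static class N extends Expr { final int n; N(int n) { this.n = n; } }
    static class Add extends Expr {
        final Expr e1, e2;
        Add(Expr e1, Expr e2) { this.e1 = e1; this.e2 = e2; }
    }
    static class Mul extends Expr {
        final Expr e1, e2;
        Mul(Expr e1, Expr e2) { this.e1 = e1; this.e2 = e2; }
    }

    // [[e]] : maps the value of x to the value of e
    static IntUnaryOperator denote(Expr e) {
        if (e instanceof X) return x -> x;                       // [[x]] = x ↦ x
        if (e instanceof N) { int n = ((N) e).n; return x -> n; } // [[n]] = x ↦ n
        if (e instanceof Add) {
            IntUnaryOperator d1 = denote(((Add) e).e1), d2 = denote(((Add) e).e2);
            return x -> d1.applyAsInt(x) + d2.applyAsInt(x);
        }
        if (e instanceof Mul) {
            IntUnaryOperator d1 = denote(((Mul) e).e1), d2 = denote(((Mul) e).e2);
            return x -> d1.applyAsInt(x) * d2.applyAsInt(x);
        }
        throw new IllegalArgumentException("unknown expression");
    }
}
```

Note that denote does all the traversal of the tree once; the returned function only performs arithmetic.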
command    translation to C
(prelude)  char array[30000] = {0};
           char *ptr = array;
>          ++ptr;
<          --ptr;
+          ++*ptr;
-          --*ptr;
.          putchar(*ptr);
,          *ptr = getchar();
[          while (*ptr) {
]          }
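The table is itself a semantics by translation, and it can be sketched as a tiny translator. The class and method names below are hypothetical, and the choice to skip any character outside the eight commands is an assumption (the table says nothing about other characters).

```java
// Sketch: translate a program of the eight-command language into C text,
// one table row per command.
class BfToC {
    static String translate(String program) {
        StringBuilder out = new StringBuilder();
        // prelude
        out.append("char array[30000] = {0};\n");
        out.append("char *ptr = array;\n");
        for (char c : program.toCharArray()) {
            switch (c) {
                case '>': out.append("++ptr;\n"); break;
                case '<': out.append("--ptr;\n"); break;
                case '+': out.append("++*ptr;\n"); break;
                case '-': out.append("--*ptr;\n"); break;
                case '.': out.append("putchar(*ptr);\n"); break;
                case ',': out.append("*ptr = getchar();\n"); break;
                case '[': out.append("while (*ptr) {\n"); break;
                case ']': out.append("}\n"); break;
                default: break; // assumption: any other character is ignored
            }
        }
        return out.toString();
    }
}
```

For instance, translate("[-]") produces the prelude followed by a C loop that decrements the current cell until it is zero.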
e↠v
e → e1 → e2 → · · · → v
e ::= expression
| n constant (signed 32-bit integer)
| x variable
| e op e binary operator (+, <, ...)
s ::= statement
| x=e; assignment
| if (e) s else s conditional
| while (e) s loop
| { s ... s } block
a = 0;
b = 1;
while (b < 100) {
b = a+b;
a = b-a;
}
e↠v
v ::= value
| n integer value (signed 32-bit integer)
E, e ↠ v
in environment
E = {a ↦ 34, b ↦ 55}
the expression
a+b
has value
89
which we write
E, a + b ↠ 89
───
 P
and a set of rules with premises written
P1  P2  ...  Pn
───────────────
       P
                 Even(n)
───────   and   ───────────
Even(0)         Even(n + 2)
the smallest relation satisfying these two properties coincides with the
property “n is an even natural number”:
• every even natural number is included, by induction
• if some odd number were included, we could remove the smallest one
  and still satisfy both rules, contradicting minimality
a derivation is a tree whose internal nodes are rules with premises and
whose leaves are axioms
example:
───────
Even(0)
───────
Even(2)
───────
Even(4)
────────
E, n ↠ n

 x ∈ dom(E)
───────────
E, x ↠ E(x)
• etc.
Jean-Christophe Filliâtre CSC 52064 – Compilation abstract syntax, semantics 30
example
 a ∈ dom(E)      b ∈ dom(E)
───────────     ───────────
E, a ↠ 34       E, b ↠ 55       89 = 34 + 55
────────────────────────────────────────────
               E, a + b ↠ 89
examples:
• x + 1 with a variable x not defined in E
on the code
bool f(int x) {
return x+1 < 10;
}
the compiler gcc produces
xorl %eax, %eax
cmpl $8, %edi
setle %al
ret
which means it computes x <= 8
when x is 2^31 − 1, the function returns false, even though x+1 would be −2^31
(if it were computed) and thus x+1 < 10 would be true
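The two readings can be compared in a language where the overflow is defined. In C, signed overflow is undefined behavior, which is what allows gcc to pick x <= 8; Java, by contrast, specifies that int addition wraps around. The sketch below (class and method names are hypothetical) shows that the two formulas agree everywhere except at 2^31 − 1.

```java
// Sketch: the source expression versus what gcc computes, under Java's
// wrap-around int arithmetic.
class OverflowSketch {
    // x+1 < 10 with wrap-around addition
    static boolean viaAddition(int x) { return x + 1 < 10; }

    // the comparison gcc emits for the C function
    static boolean viaComparison(int x) { return x <= 8; }
}
```

With x = Integer.MAX_VALUE, viaAddition returns true (x + 1 wraps to Integer.MIN_VALUE) while viaComparison returns false; for every other int the two agree.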
E, s ↠ E′

      E, e ↠ v
────────────────────
E, x=e; ↠ E{x ↦ v}

E, e ↠ n ≠ 0    E, s1 ↠ E1         E, e ↠ 0    E, s2 ↠ E2
───────────────────────────        ──────────────────────────
E, if (e) s1 else s2 ↠ E1          E, if (e) s1 else s2 ↠ E2
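                         E, 2 ↠ 2    E, a ↠ 21
                         ─────────────────────
E, a ↠ 21    E, 0 ↠ 0        E, 2 × a ↠ 42
─────────────────────    ────────────────────────
  E, a > 0 ↠ true        E, a=2 × a; ↠ {a ↦ 42}
─────────────────────────────────────────────────
   E, if (a > 0) a=2 × a; else { } ↠ {a ↦ 42}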
     E, e ↠ 0
────────────────────
E, while (e) s ↠ E

E, e ↠ n ≠ 0    E, s ↠ E1    E1, while (e) s ↠ E2
──────────────────────────────────────────────────
             E, while (e) s ↠ E2
equivalently, one can say that we perform an induction over the height of
the derivation
let’s do it in Java
as explained earlier
enum Binop { Add, ... }
E, n ↠ n
class Ecte extends Expr {
Value eval(Environment env) { return new Vint(n); }
}
 x ∈ dom(E)
───────────
E, x ↠ E(x)
class Evar extends Expr {
Value eval(Environment env) {
Value v = env.vars.get(x);
if (v == null)
throw new Error("unbound variable " + x);
return v;
}
}
evaluation of an expression
we could have detected this error statically with typing (see lecture 4)
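                     E, s1 ↠ E1    E1, { s2 ... } ↠ E2
───────────         ───────────────────────────────────
E, { } ↠ E               E, { s1 s2 ... } ↠ E2

      E, e ↠ v
────────────────────
E, x=e; ↠ E{x ↦ v}

E, e ↠ n ≠ 0    E, s1 ↠ E1         E, e ↠ 0    E, s2 ↠ E2
───────────────────────────        ──────────────────────────
E, if (e) s1 else s2 ↠ E1          E, if (e) s1 else s2 ↠ E2

E, e ↠ n ≠ 0    E, s ↠ E1    E1, while (e) s ↠ E2
──────────────────────────────────────────────────
             E, while (e) s ↠ E2

     E, e ↠ 0
────────────────────
E, while (e) s ↠ E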
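The rules above translate almost line by line into an interpreter. The following is a minimal sketch, not the lecture's code: the wrapper class BigStepSketch, the use of a mutable Map for E, and plain int values (instead of a Value class) are all assumptions made to keep it short. It runs the example program with a, b and the while loop shown earlier.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// A minimal big-step interpreter sketch; all names are illustrative.
class BigStepSketch {
    enum Binop { Add, Sub, Lt }

    // Expressions: E, e ↠ v, with int values (0 is false, non-zero is true).
    static abstract class Expr { abstract int eval(Map<String, Integer> env); }
    static class Cte extends Expr {
        final int n;
        Cte(int n) { this.n = n; }
        int eval(Map<String, Integer> env) { return n; }
    }
    static class Var extends Expr {
        final String x;
        Var(String x) { this.x = x; }
        int eval(Map<String, Integer> env) {
            Integer v = env.get(x);
            if (v == null) throw new Error("unbound variable " + x);
            return v;
        }
    }
    static class Bin extends Expr {
        final Binop op; final Expr e1, e2;
        Bin(Binop op, Expr e1, Expr e2) { this.op = op; this.e1 = e1; this.e2 = e2; }
        int eval(Map<String, Integer> env) {
            int v1 = e1.eval(env), v2 = e2.eval(env);
            switch (op) {
                case Add: return v1 + v2;
                case Sub: return v1 - v2;
                default:  return v1 < v2 ? 1 : 0; // Lt
            }
        }
    }

    // Statements: E, s ↠ E′; env is updated in place and becomes E′.
    static abstract class Stmt { abstract void exec(Map<String, Integer> env); }
    static class Assign extends Stmt {
        final String x; final Expr e;
        Assign(String x, Expr e) { this.x = x; this.e = e; }
        void exec(Map<String, Integer> env) { env.put(x, e.eval(env)); }
    }
    static class If extends Stmt {
        final Expr e; final Stmt s1, s2;
        If(Expr e, Stmt s1, Stmt s2) { this.e = e; this.s1 = s1; this.s2 = s2; }
        void exec(Map<String, Integer> env) {
            if (e.eval(env) != 0) s1.exec(env); else s2.exec(env);
        }
    }
    static class While extends Stmt {
        final Expr e; final Stmt s;
        While(Expr e, Stmt s) { this.e = e; this.s = s; }
        // The recursive while rule, unrolled into a Java loop.
        void exec(Map<String, Integer> env) {
            while (e.eval(env) != 0) s.exec(env);
        }
    }
    static class Block extends Stmt {
        final List<Stmt> body;
        Block(Stmt... ss) { this.body = Arrays.asList(ss); }
        void exec(Map<String, Integer> env) { for (Stmt s : body) s.exec(env); }
    }

    // a = 0; b = 1; while (b < 100) { b = a+b; a = b-a; }
    static Map<String, Integer> runExample() {
        Expr a = new Var("a"), b = new Var("b");
        Stmt prog = new Block(
            new Assign("a", new Cte(0)),
            new Assign("b", new Cte(1)),
            new While(new Bin(Binop.Lt, b, new Cte(100)),
                new Block(
                    new Assign("b", new Bin(Binop.Add, a, b)),
                    new Assign("a", new Bin(Binop.Sub, b, a)))));
        Map<String, Integer> env = new HashMap<>();
        prog.exec(env);
        return env;
    }
}
```

The final environment maps a to 89 and b to 144. A non-terminating program simply makes exec loop forever, which reflects the fact that no derivation exists for it.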
what distinguishes
type expr = Cte of value | Evar of string | ...
in OCaml, the code of eval is a single function and it covers all cases
note: we use Java’s overloading to give all these methods the same name
E, s → E1, s1 → E2, s2 → · · · → E′, { }
E, s → E1, s1 → E2, s2 → · · · → En, sn
3. non-terminating evaluation
E, s → E1, s1 → E2, s2 → · · ·
but for a more complex language (say, with function calls in expressions),
we would use small-step semantics for expressions as well
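        E, e ↠ v
─────────────────────────
E, x=e; → E{x ↦ v}, { }

                                           E, s1 → E1, s1′
────────────────────────────      ──────────────────────────────────
E, { { } s ... } → E, { s ... }      E, { s1 s2 ... } → E1, { s1′ s2 ... }

       E, e ↠ n ≠ 0                        E, e ↠ 0
─────────────────────────────     ─────────────────────────────
E, if (e) s1 else s2 → E, s1      E, if (e) s1 else s2 → E, s2

           E, e ↠ n ≠ 0
────────────────────────────────────
E, while (e) s → E, { s while (e) s }

       E, e ↠ 0
────────────────────────
E, while (e) s → E, { }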
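Each small-step rule becomes one case of a step function, and evaluation is the iteration of step until the empty block is reached. The sketch below is an illustration under stated assumptions: expressions are modelled as Java functions from the environment to an int (a hypothetical shortcut standing for E, e ↠ v), the environment is a mutable Map, and all class names are invented here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// A small-step sketch for statements; expressions stay big-step, as in the rules.
class SmallStepSketch {
    // Hypothetical shortcut: an expression is any function from environment to int.
    interface Expr { int eval(Map<String, Integer> env); }

    static abstract class Stmt { }
    static class Assign extends Stmt {
        final String x; final Expr e;
        Assign(String x, Expr e) { this.x = x; this.e = e; }
    }
    static class If extends Stmt {
        final Expr e; final Stmt s1, s2;
        If(Expr e, Stmt s1, Stmt s2) { this.e = e; this.s1 = s1; this.s2 = s2; }
    }
    static class While extends Stmt {
        final Expr e; final Stmt s;
        While(Expr e, Stmt s) { this.e = e; this.s = s; }
    }
    static class Block extends Stmt {
        final List<Stmt> body;
        Block(Stmt... ss) { this.body = Arrays.asList(ss); }
        Block(List<Stmt> ss) { this.body = ss; }
    }

    // { } is the only final statement.
    static boolean isDone(Stmt s) {
        return s instanceof Block && ((Block) s).body.isEmpty();
    }

    // One step E, s → E′, s′: env is updated in place, s′ is returned.
    static Stmt step(Map<String, Integer> env, Stmt s) {
        if (s instanceof Assign) {
            Assign a = (Assign) s;
            env.put(a.x, a.e.eval(env));
            return new Block();
        }
        if (s instanceof If) {
            If i = (If) s;
            return i.e.eval(env) != 0 ? i.s1 : i.s2;
        }
        if (s instanceof While) {
            While w = (While) s;
            return w.e.eval(env) != 0 ? new Block(w.s, w) : new Block();
        }
        Block b = (Block) s;
        if (b.body.isEmpty()) throw new IllegalStateException("{ } cannot step");
        Stmt first = b.body.get(0);
        List<Stmt> rest = new ArrayList<>(b.body.subList(1, b.body.size()));
        // { { } s ... } → { s ... }, otherwise step inside the first statement
        if (!isDone(first)) rest.add(0, step(env, first));
        return new Block(rest);
    }

    // Iterates step until { } is reached; returns the number of steps taken.
    static int run(Map<String, Integer> env, Stmt s) {
        int steps = 0;
        while (!isDone(s)) { s = step(env, s); steps++; }
        return steps;
    }

    // a = 0; b = 1; while (b < 100) { b = a+b; a = b-a; }
    static Stmt example() {
        return new Block(
            new Assign("a", env -> 0),
            new Assign("b", env -> 1),
            new While(env -> env.get("b") < 100 ? 1 : 0,
                new Block(
                    new Assign("b", env -> env.get("a") + env.get("b")),
                    new Assign("a", env -> env.get("b") - env.get("a")))));
    }
}
```

Because run makes the intermediate configurations explicit, it could be stopped after any number of steps, which is what makes small-step semantics suitable for talking about non-terminating or stuck evaluations.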
example
we could replace the two rules for while with the following rule
E, s ↠ E′ if and only if E, s →⋆ E′, { }
E, { s1 s2 ... } →⋆ E1, { { } s2 ... } (small steps)
→ E1, { s2 ... }
→⋆ E2, { } (IH)
if
E, e ↠ n ≠ 0    E, s ↠ E1    E1, while (e) s ↠ E2
──────────────────────────────────────────────────
             E, while (e) s ↠ E2
then
we deduce
Proposition (small steps imply big steps)
If E, s →⋆ E′, { }, then E, s ↠ E′.
proof: we have
E, s → E1, s1 → E2, s2 → · · · → En, sn → E′, { }
v ::= value
| n integer value (signed 32-bit integer)
| ℓ memory address
p ::= d . . . d program
note how
• the function body s is evaluated in a new environment, with only
variables xi and whose final value E ′ is discarded
• we return to the environment E from which we started, unchanged
• the memory, on the contrary, is possibly modified
this is what C does, but this is not the only option (see lecture 5)
the lecture notes also contain the operational semantics for Mini-ML
e ::= x identifier
| c constant (1, 2, ..., true, ...)
| op primitive (+, ×, fst, ...)
| fun x → e anonymous function
| e e application
| (e, e) pair
| let x = e in e local binding
correctness of a compiler
  e     →⋆     v
  ↓            ≈
C(e)   →ᵐ⋆     v′
e ::= n | e + e
n = n1 + n2          e1 → e1′                 e2 → e2′
───────────    ───────────────────    ───────────────────
n1 + n2 → n    e1 + e2 → e1′ + e2     n1 + e2 → n1 + e2′
the relation R, M, m →ᵐ R′, M′ is defined as follows:
R, M, movq $n, r     →ᵐ  R{r ↦ n}, M
R, M, addq $n, r     →ᵐ  R{r ↦ R(r) + n}, M
R, M, addq r1, r2    →ᵐ  R{r2 ↦ R(r1) + R(r2)}, M
R, M, movq (r1), r2  →ᵐ  R{r2 ↦ M(R(r1))}, M
R, M, movq r1, (r2)  →ᵐ  R, M{R(r2) ↦ R(r1)}
code(e1 + e2) = code(e1)
                addq $-8, %rsp
                movq %rdi, (%rsp)
                code(e2)
                movq (%rsp), %rsi
                addq $8, %rsp
                addq %rsi, %rdi
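The compilation scheme and the machine semantics can be tried out together. The sketch below is an illustration, not the lecture's code: instructions are kept as (op, src, dst) strings, the machine state is a pair of Java maps, the stack pointer starts at an arbitrary address 0, and values are Java long integers. The case code(n) = movq $n, %rdi is the one used in the proof for constants.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: compile e ::= n | e + e and run the result on a tiny machine model.
class CompileSketch {
    static abstract class Expr { }
    static class Cte extends Expr { final long n; Cte(long n) { this.n = n; } }
    static class Add extends Expr {
        final Expr e1, e2;
        Add(Expr e1, Expr e2) { this.e1 = e1; this.e2 = e2; }
    }

    // Reference semantics of the source language.
    static long eval(Expr e) {
        if (e instanceof Cte) return ((Cte) e).n;
        Add a = (Add) e;
        return eval(a.e1) + eval(a.e2);
    }

    // One instruction "op src, dst", operands in assembly syntax.
    static class Instr {
        final String op, src, dst;
        Instr(String op, String src, String dst) { this.op = op; this.src = src; this.dst = dst; }
        @Override public String toString() { return op + " " + src + ", " + dst; }
    }

    // code(e): leaves the value of e in %rdi, using the stack for temporaries.
    static void code(Expr e, List<Instr> out) {
        if (e instanceof Cte) {
            out.add(new Instr("movq", "$" + ((Cte) e).n, "%rdi"));
            return;
        }
        Add a = (Add) e;
        code(a.e1, out);
        out.add(new Instr("addq", "$-8", "%rsp"));
        out.add(new Instr("movq", "%rdi", "(%rsp)"));
        code(a.e2, out);
        out.add(new Instr("movq", "(%rsp)", "%rsi"));
        out.add(new Instr("addq", "$8", "%rsp"));
        out.add(new Instr("addq", "%rsi", "%rdi"));
    }

    static List<Instr> compile(Expr e) {
        List<Instr> out = new ArrayList<>();
        code(e, out);
        return out;
    }

    // Executes the five instruction forms of the machine relation; returns %rdi.
    static long run(List<Instr> prog) {
        Map<String, Long> R = new HashMap<>(); // registers
        Map<Long, Long> M = new HashMap<>();   // memory
        R.put("%rsp", 0L); // assumption: arbitrary initial stack pointer
        for (Instr i : prog) {
            long v; // value of the source operand
            if (i.src.startsWith("$")) v = Long.parseLong(i.src.substring(1));
            else if (i.src.startsWith("(")) v = M.get(R.get(i.src.substring(1, i.src.length() - 1)));
            else v = R.get(i.src);
            if (i.dst.startsWith("(")) {
                // movq r1, (r2): the only form with a memory destination
                M.put(R.get(i.dst.substring(1, i.dst.length() - 1)), v);
            } else if (i.op.equals("addq")) {
                R.put(i.dst, R.get(i.dst) + v);
            } else {
                R.put(i.dst, v); // movq into a register
            }
        }
        return R.get("%rdi");
    }
}
```

For 1 + (2 + 3), compile produces 13 instructions and run leaves 6 in %rdi, the same value as eval. Testing on examples is of course no substitute for the proof by induction, which covers every expression.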
• case e = n
we have e →⋆ n and code(e) = movq $n, %rdi and the result is
immediate
• case e = e1 + e2
we have e →⋆ n1 + e2 →⋆ n1 + n2 with e1 →⋆ n1 and e2 →⋆ n2
thus we can invoke the induction hypothesis on e1 and e2
see http://compcert.inria.fr/