Lec02 Programming Language Specification
Lecture Objectives
Language Processors: Why do we need them?
[Figure: layers from programmer to hardware: Concepts and Ideas ("Compute surface area of a triangle?"), Java Program, JVM Interpreter, X86 Processor (0101001001...).]
Programming Language Specification
• Why?
– A communication device between people who need to have
a common understanding of the PL:
• language designer, language implementer, user
• What to specify?
– Specify what is a ‘well formed’ program
• syntax
• contextual constraints (also called static semantics):
– scoping rules
– type rules
– Specify what is the meaning of (well formed) programs
• semantics (also called runtime semantics)
• How to specify?
– Formal specification: use some kind of
precisely defined formalism
– Informal specification: description in English.
How do we start?
Lexemes
Tokens
Tokens are the categories of lexemes.
• Identifiers: Names chosen by the programmer.
val, xdot, y
Tokens (Contd.)
• Integers: 2 1000 -20
• Floating-point: 2.0 -0.010 .02
• Symbols: $ # @ { } << >> [ ]
• Strings: “x” “He said, I love Compilers”
• Comments: /* Hi and Bye */
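The token categories above can be sketched as a small tokenizer that groups lexemes into tokens. This is only an illustration: the regular expressions, the names TOKEN_SPEC and tokenize, and the choice of plain double quotes for strings are assumptions of this sketch, not part of the lecture.

```python
import re

# Token categories follow the slides; the patterns are assumptions of this sketch.
TOKEN_SPEC = [
    ("COMMENT",    r"/\*.*?\*/"),
    ("STRING",     r'"[^"\n]*"'),
    ("FLOAT",      r"-?\d*\.\d+"),
    ("INTEGER",    r"-?\d+"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("SYMBOL",     r"<<|>>|[$#@{}\[\]]"),
    ("SKIP",       r"\s+"),
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
    re.DOTALL,  # lets a comment span several lines
)

def tokenize(text):
    """Return (token, lexeme) pairs; raise ValueError on an unknown character."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":      # whitespace separates lexemes but is not a token
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

For example, tokenize("val 1000 .02") pairs the lexeme val with IDENTIFIER, 1000 with INTEGER and .02 with FLOAT. FLOAT is listed before INTEGER so that 2.0 is not split into an integer followed by stray characters.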
Token Structure (Example)
What do we do with tokens?
Grammars
Context Free Grammar
Grammar, Formally
A grammar G of a programming language is a 4-tuple (quadruple),
G = (T, N, S, P) where:
T is a finite set of terminal symbols
N is a finite set of non-terminal symbols
S is the start symbol
P is a finite set of production rules
Example:
<assign> → <ident> = <expr>
<ident> → A | B | C
<expr> → <ident> + <expr>
       | <ident> * <expr>
       | ( <expr> )
       | <ident>
T = { =, A, B, C, *, +, (, ) }
N = { <assign>, <ident>, <expr> }
S = <assign>
P = { <assign> → <ident> = <expr>, <ident> → A | B | C,
<expr> → <ident> + <expr> | <ident> * <expr> | ( <expr> ) | <ident> }
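The example grammar can be written down directly as data. The sketch below stores T, N, S and P as Python values and checks a few consistency conditions; the dictionary layout of P and the helper name is_well_formed are assumptions made for illustration.

```python
# The example grammar as a 4-tuple G = (T, N, S, P).
T = {"=", "A", "B", "C", "*", "+", "(", ")"}
N = {"<assign>", "<ident>", "<expr>"}
S = "<assign>"
# Each nonterminal maps to its alternatives; each alternative is a list of symbols.
P = {
    "<assign>": [["<ident>", "=", "<expr>"]],
    "<ident>":  [["A"], ["B"], ["C"]],
    "<expr>":   [["<ident>", "+", "<expr>"],
                 ["<ident>", "*", "<expr>"],
                 ["(", "<expr>", ")"],
                 ["<ident>"]],
}

def is_well_formed(T, N, S, P):
    """Check basic consistency conditions of a context-free grammar."""
    if T & N:              # terminals and nonterminals must be disjoint
        return False
    if S not in N:         # the start symbol must be a nonterminal
        return False
    if set(P) - N:         # every left-hand side must be a nonterminal
        return False
    # every right-hand-side symbol must be a known terminal or nonterminal
    return all(sym in T or sym in N
               for alternatives in P.values()
               for rhs in alternatives
               for sym in rhs)
```

Here is_well_formed(T, N, S, P) holds for the example, while replacing S by a symbol outside N makes the check fail.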
Production rules
Backus Naur Form (BNF)
• Useful for describing the syntax of programming languages
[Figure: the if-else statement in Java written as a production (the structuring rule for if-else), with its tokens (terminals) and nonterminals labelled.]
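Logical OR in BNF
Tokens: + - 0 1 2 3 4 5 6 7 8 9
Nonterminals: list digit
list → list + digit
list → list - digit
list → digit
digit → 0
digit → 1
digit → 2
digit → 3
digit → 4
digit → 5
digit → 6
digit → 7
digit → 8
digit → 9
With the OR symbol (|), rules that share a left-hand side combine into one:
list → list + digit | list - digit | digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9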
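As a rough illustration of the list grammar, the sketch below recognizes a sentence such as 9-5+2 and computes its value. The function name evaluate_list is hypothetical, and turning the left-recursive list rules into a loop is a choice of this sketch, not something the slides prescribe.

```python
DIGITS = "0123456789"

def evaluate_list(text):
    """Recognize a sentence of the list grammar and return its value.

    The left-recursive rules  list -> list + digit | list - digit | digit
    are implemented as a loop, so operators are grouped from the left.
    """
    symbols = [ch for ch in text if not ch.isspace()]
    if not symbols or symbols[0] not in DIGITS:
        raise SyntaxError("a list must start with a digit")
    value = int(symbols[0])
    pos = 1
    while pos < len(symbols):
        op = symbols[pos]
        has_digit = pos + 1 < len(symbols) and symbols[pos + 1] in DIGITS
        if op not in ("+", "-") or not has_digit:
            raise SyntaxError(f"expected '+ digit' or '- digit' at symbol {pos}")
        digit = int(symbols[pos + 1])
        value = value + digit if op == "+" else value - digit
        pos += 2
    return value
```

Because the loop consumes one operator and one digit at a time from the left, 9-5+2 is read as (9-5)+2. Inputs such as 95 or 9- are rejected, since no sequence of rule applications produces them.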
Recursive Rules in BNF
Extended BNF
• [ ] Optional element:
<if_stmt> ::= if (<logic_expr>) <stmt> [ else <stmt>]
<real_num> ::= [<int_num>] . <int_num>
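The optional element in <real_num> maps naturally onto an optional group in a regular expression. The sketch below assumes that <int_num> is one or more decimal digits, which the slide does not define; the names INT_NUM, REAL_NUM and is_real_num are likewise illustrative.

```python
import re

# <real_num> ::= [<int_num>] . <int_num>
# Assumption for this sketch: <int_num> is one or more decimal digits.
INT_NUM = r"[0-9]+"
REAL_NUM = re.compile(rf"(?:{INT_NUM})?\.{INT_NUM}")   # (?: ... )? plays the role of [ ]

def is_real_num(text):
    """Return True if text is a sentence of <real_num>."""
    return REAL_NUM.fullmatch(text) is not None
```

Under that assumption both 3.14 and .02 are accepted, because the leading <int_num> is optional, while 3. is rejected because the trailing <int_num> is required.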
[Figure: parse tree for 9-5+2 built from the list and digit rules.]
Derivation
A derivation is the mechanism by which the rules of a grammar
are applied repeatedly to generate a sentence.
At each step, a nonterminal is replaced by the right-hand side (RHS)
of one of its rules, until the whole sentence has been generated.
Example sentence: A = B * C
[Figure: two groupings of 9 - 5 + 2: (9 - 5) + 2 and 9 - (5 + 2).]
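A derivation of A = B * C with the <assign> grammar can be traced step by step. In the sketch below the helper derive and the list of steps are hypothetical; the steps are chosen by hand, and the helper neither checks them against the grammar nor enforces a particular derivation order.

```python
def derive(start, steps):
    """Apply (nonterminal, right-hand side) steps, one replacement at a time.

    Each step replaces the leftmost occurrence of the named nonterminal.
    Returns every sentential form, from the start symbol to the final sentence.
    """
    form = [start]
    history = [" ".join(form)]
    for nonterminal, rhs in steps:
        i = form.index(nonterminal)     # leftmost occurrence of this nonterminal
        form[i:i + 1] = rhs             # replace it by the rule's right-hand side
        history.append(" ".join(form))
    return history

# Hand-picked rule applications that generate A = B * C.
steps = [
    ("<assign>", ["<ident>", "=", "<expr>"]),
    ("<ident>",  ["A"]),
    ("<expr>",   ["<ident>", "*", "<expr>"]),
    ("<ident>",  ["B"]),
    ("<expr>",   ["<ident>"]),
    ("<ident>",  ["C"]),
]
forms = derive("<assign>", steps)
for form in forms:
    print(form)
```

With these steps the leftmost nonterminal happens to be replaced each time, so the printed sequence is a leftmost derivation that ends in A = B * C.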
Example 1: 9-5+2
Example 2: A = B*C+A
[Figure: parse trees for A = B*C+A, each rooted at <assign> with children <ident> = <expr>.]
Contextual Constraints
Syntax rules alone are not enough to specify the format of
well-formed programs.
Example 1 (Scope Rules):
let const m~2
in putint(m + x)        Undefined! (x is not declared)
Example 2 (Type Rules):
let const m~2 ;
    var n:Boolean
in begin
    n := m<4;
    n := n+1            Type error!
end
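The two examples can be mimicked by a small checker over expression trees. Everything here is an assumption for illustration: the tuple encoding of expressions, the names type_of and check_assign, the type Integer for const m~2, and the rule that + and < both take Integer operands.

```python
def type_of(expr, env):
    """Return the type of expr, or raise on a scope or type violation.

    env maps each declared identifier to its type (the current scope).
    """
    kind = expr[0]
    if kind == "int":                   # integer literal, e.g. ("int", 4)
        return "Integer"
    if kind == "id":                    # identifier, e.g. ("id", "m")
        name = expr[1]
        if name not in env:
            raise NameError(f"{name} is undefined")                    # scope rule
        return env[name]
    if kind in ("+", "<"):              # binary operator on two Integers
        _, left, right = expr
        for operand in (left, right):
            if type_of(operand, env) != "Integer":
                raise TypeError(f"operand of {kind} must be Integer")  # type rule
        return "Integer" if kind == "+" else "Boolean"
    raise ValueError(f"unknown expression kind {kind!r}")

def check_assign(name, expr, env):
    """Check  name := expr : the variable is declared and the types agree."""
    if name not in env:
        raise NameError(f"{name} is undefined")
    if type_of(expr, env) != env[name]:
        raise TypeError(f"cannot assign to {name}: type mismatch")
```

With env = {"m": "Integer"}, checking m + x raises NameError, as in Example 1. With env = {"m": "Integer", "n": "Boolean"}, the assignment n := m<4 passes and n := n+1 raises TypeError, as in Example 2. Both programs are syntactically valid, so only these contextual checks reject them.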
Semantics
Specification of semantics is concerned with specifying the
“meaning” of well-formed programs.
Terminology:
Expressions are evaluated and yield values (and may or may not
perform side effects).
Commands are executed and perform side effects.
Declarations are elaborated to produce bindings.
Side effects:
• change the values of variables
• perform input/output
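The terminology can be made concrete with a toy interpreter. This is a simplified sketch under stated assumptions: programs are encoded as tuples, a single dictionary serves as both the set of bindings and the variable store, and the names evaluate, elaborate and execute are illustrative.

```python
def evaluate(expr, store):
    """Expressions are evaluated and yield a value (no side effects in this sketch)."""
    kind = expr[0]
    if kind == "int":
        return expr[1]
    if kind == "id":
        return store[expr[1]]
    if kind == "+":
        return evaluate(expr[1], store) + evaluate(expr[2], store)
    if kind == "<":
        return evaluate(expr[1], store) < evaluate(expr[2], store)
    raise ValueError(f"unknown expression kind {kind!r}")

def elaborate(decl, store):
    """Declarations are elaborated to produce bindings (name -> value)."""
    _, name, expr = decl                # ("const", name, expr)
    store[name] = evaluate(expr, store)

def execute(cmd, store, output):
    """Commands are executed and perform side effects."""
    if cmd[0] == "assign":              # side effect: change the value of a variable
        store[cmd[1]] = evaluate(cmd[2], store)
    elif cmd[0] == "putint":            # side effect: perform output
        output.append(evaluate(cmd[1], store))
    else:
        raise ValueError(f"unknown command {cmd[0]!r}")

store, output = {}, []
elaborate(("const", "m", ("int", 2)), store)                             # const m~2
execute(("assign", "n", ("<", ("id", "m"), ("int", 4))), store, output)  # n := m<4
execute(("putint", ("+", ("id", "m"), ("int", 1))), store, output)       # putint(m + 1)
```

After these three steps the store binds m to 2 and n to True, and the output list holds the single value 3. Only execute changes the store's variables or produces output, which matches the description of commands as the constructs that perform side effects.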
The End