Language Specification & Compiler Construction: Hanspeter Mössenböck University of Linz
1. Overview
1.1 Motivation
1.2 Structure of a Compiler
1.3 Grammars
1.4 Chomsky's Classification of Grammars
1.5 The MicroJava Language
Also useful for general software development
  Reading syntactically structured command-line arguments
  Reading structured data (e.g. XML files, part lists, image files, ...)
  Searching in hierarchical namespaces
  Interpretation of command codes
  ...
1.2 Structure of a Compiler
[Figure: phases of a compiler; recoverable labels: token stream, Statement, syntax tree, machine code]
Interpreter with a virtual machine (VM)
  source code is translated into the code of a virtual machine (VM)
  the VM interprets this code, simulating the physical machine
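The idea of a VM interpreting translated code can be sketched in a few lines of Java. The instruction set below (PUSH, ADD, MUL) is hypothetical and chosen only for illustration; it is not the instruction set of the MicroJava VM.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical three-instruction stack machine, for illustration only.
class TinyVM {
    static final int PUSH = 0;  // PUSH n: push the constant n
    static final int ADD = 1;   // pop two values, push their sum
    static final int MUL = 2;   // pop two values, push their product

    // Interprets the code array and returns the value left on top of the stack.
    static int run(int[] code) {
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0;  // program counter
        while (pc < code.length) {
            int op = code[pc++];
            switch (op) {
                case PUSH: stack.push(code[pc++]); break;
                case ADD: stack.push(stack.pop() + stack.pop()); break;
                case MUL: stack.push(stack.pop() * stack.pop()); break;
                default: throw new IllegalStateException("unknown opcode " + op);
            }
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        // Code that a compiler might emit for 2 + 3 * 4
        int[] code = {PUSH, 2, PUSH, 3, PUSH, 4, MUL, ADD};
        System.out.println(run(code));
    }
}
```

The interpreter loop fetches an opcode, decodes it in the switch, and executes it on an operand stack, which is the software counterpart of what a physical machine does in hardware.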
1.3 Grammars
What is a grammar?
Example
Statement = "if" "(" Condition ")" Statement ["else" Statement].
Four components
  terminal symbols: are atomic
    "if", ">=", ident, number, ...
  nonterminal symbols: are decomposed into smaller units
  productions: rules how to decompose nonterminals
    Statement = Designator "=" Expr ";".
    Designator = ident ["." ident].
    ...
  start symbol: topmost nonterminal
    Java
EBNF Notation
Extended Backus-Naur form for writing grammars
  John Backus: developed the first Fortran compiler
  Peter Naur: edited the Algol60 report
Productions
  Statement = "write" ident "," Expression ";" .
  left-hand side: Statement
  right-hand side: "write" ident "," Expression ";"
  "write" is a literal, ident is a terminal symbol, Expression is a nonterminal symbol
  the final "." terminates a production
  by convention
    terminal symbols start with lower-case letters
    nonterminal symbols start with upper-case letters
Metasymbols
  symbol  meaning                 example      equivalent to
  |       separates alternatives  a | b | c    a or b or c
  (...)   groups alternatives     a (b | c)    ab | ac
  [...]   optional part           [a] b        ab | b
  {...}   iterative part          {a} b        b | ab | aab | aaab | ...
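One way to read the metasymbols is by how they would show up in hand-written recognizer code. The Java sketch below is only an illustration under simplifying assumptions: terminals are single characters, the whole input is a String, and the class and method names are made up for this example.

```java
// Recognizers for three EBNF expressions from the table above,
// assuming single-character terminals.
class EbnfDemo {
    // {a} b : the iterative part becomes a while loop
    static boolean matchesIteration(String s) {
        int pos = 0;
        while (pos < s.length() && s.charAt(pos) == 'a') pos++;  // {a}
        return pos == s.length() - 1 && s.charAt(pos) == 'b';     // b, then end of input
    }

    // [a] b : the optional part becomes an if
    static boolean matchesOption(String s) {
        int pos = 0;
        if (pos < s.length() && s.charAt(pos) == 'a') pos++;      // [a]
        return pos == s.length() - 1 && s.charAt(pos) == 'b';     // b, then end of input
    }

    // a (b | c) : alternatives become a choice on the next symbol
    static boolean matchesGroup(String s) {
        if (s.length() != 2 || s.charAt(0) != 'a') return false;  // a
        char next = s.charAt(1);
        return next == 'b' || next == 'c';                         // (b | c)
    }
}
```

For instance, matchesIteration accepts "b", "ab" and "aab", while matchesOption accepts only "b" and "ab".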
[Figure: syntax diagrams for Expr, Term and Factor]
Terminal symbols
  simple terminal symbols: "+", "-", "*", "/", "(", ")" (just 1 instance)
  terminal classes: ident, number (multiple instances)
Nonterminal symbols
  Expr, Term, Factor
Start symbol
  Expr
First(Factor) =
First(Term) =
First(Expr) =
Follow(Expr) =
  Where does Expr occur on the right-hand side of a production? What terminal symbols can follow there?
Follow(Term) =
Follow(Factor) =
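As a rough sketch of how First and Follow sets can be computed mechanically, the Java program below runs a fixed-point iteration over the small grammar Expr = Term {"+" Term}. Term = Factor {"*" Factor}. Factor = id | "(" Expr ")". Its results apply to that grammar only. The helper nonterminals ExprRest and TermRest, the "eof" marker and all class and method names are assumptions made for this example; the iterations are rewritten as recursive rules with an empty alternative.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class FirstFollow {
    // Each nonterminal maps to its alternatives; an empty list is the empty string.
    // ExprRest and TermRest are helper nonterminals replacing the {...} iterations.
    static final Map<String, List<List<String>>> G = new LinkedHashMap<>();
    static {
        G.put("Expr", List.of(List.of("Term", "ExprRest")));
        G.put("ExprRest", List.of(List.of("+", "Term", "ExprRest"), List.<String>of()));
        G.put("Term", List.of(List.of("Factor", "TermRest")));
        G.put("TermRest", List.of(List.of("*", "Factor", "TermRest"), List.<String>of()));
        G.put("Factor", List.of(List.of("id"), List.of("(", "Expr", ")")));
    }

    static final Set<String> nullable = new HashSet<>();  // nonterminals deriving the empty string
    static final Map<String, Set<String>> first = new HashMap<>();
    static final Map<String, Set<String>> follow = new HashMap<>();

    static boolean isNullable(String sym) { return nullable.contains(sym); }

    // First set of a single symbol: a terminal starts only with itself.
    static Set<String> firstOf(String sym) {
        return G.containsKey(sym) ? first.get(sym) : Set.of(sym);
    }

    // Repeats the set-propagation rules until nothing changes any more.
    static void compute(String start) {
        for (String nt : G.keySet()) {
            first.put(nt, new TreeSet<>());
            follow.put(nt, new TreeSet<>());
        }
        follow.get(start).add("eof");  // the start symbol can be followed by end of input
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Map.Entry<String, List<List<String>>> e : G.entrySet()) {
                String lhs = e.getKey();
                for (List<String> rhs : e.getValue()) {
                    // First: walk the right-hand side while its prefix can vanish
                    boolean allNullable = true;
                    for (String sym : rhs) {
                        changed |= first.get(lhs).addAll(firstOf(sym));
                        if (!isNullable(sym)) { allNullable = false; break; }
                    }
                    if (allNullable) changed |= nullable.add(lhs);
                    // Follow: for every nonterminal, look at what comes after it
                    for (int i = 0; i < rhs.size(); i++) {
                        String sym = rhs.get(i);
                        if (!G.containsKey(sym)) continue;
                        boolean restNullable = true;
                        for (int j = i + 1; j < rhs.size(); j++) {
                            changed |= follow.get(sym).addAll(firstOf(rhs.get(j)));
                            if (!isNullable(rhs.get(j))) { restNullable = false; break; }
                        }
                        if (restNullable) changed |= follow.get(sym).addAll(follow.get(lhs));
                    }
                }
            }
        }
    }

    public static void main(String[] args) {
        compute("Expr");
        for (String nt : G.keySet())
            System.out.println(nt + ": First = " + first.get(nt) + ", Follow = " + follow.get(nt));
    }
}
```

The Follow rule in the code answers exactly the two questions asked above: where a nonterminal occurs on a right-hand side, and which terminal symbols can follow there.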
Empty String
The string that contains no symbol (denoted by ε).
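Derivation
  α ⇒ β (direct derivation): one nonterminal symbol (NTS) in α is replaced by the right-hand side of one of its productions, giving for example Term + ident * Factor
  α ⇒* β (indirect derivation): α ⇒ γ1 ⇒ γ2 ⇒ ... ⇒ γn ⇒ β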
Recursion
A production is recursive if
  X ⇒* ω1 X ω2
Can be used to represent repetitions and nested structures

Direct recursion
  X ⇒ ω1 X ω2
  Left recursion      X = b | X a.        X ⇒ X a ⇒ X a a ⇒ X a a a ⇒ b a a a a a ...
  Right recursion     X = b | a X.        X ⇒ a X ⇒ a a X ⇒ a a a X ⇒ ... a a a a a b
  Central recursion   X = b | "(" X ")".  X ⇒ (X) ⇒ ((X)) ⇒ (((X))) ⇒ (((... (b)...)))
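Right and central recursion carry over directly to recursive methods. The following Java sketch assumes that the input is given as a String of single-character terminals; the class and method names are invented for this illustration.

```java
class RecursionDemo {
    // Right recursion  X = b | a X.   accepts b, ab, aab, ...
    static boolean rightRec(String s) {
        if (s.equals("b")) return true;                    // alternative b
        return s.startsWith("a") && rightRec(s.substring(1));  // alternative a X
    }

    // Central recursion  X = b | "(" X ")".   accepts b, (b), ((b)), ...
    static boolean centralRec(String s) {
        if (s.equals("b")) return true;                    // alternative b
        return s.length() >= 2 && s.startsWith("(") && s.endsWith(")")
            && centralRec(s.substring(1, s.length() - 1));     // alternative "(" X ")"
    }
}
```

The left-recursive form is left out on purpose: translated the same way, the method would call itself before consuming any input.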
Indirect recursion
  X ⇒* ω1 X ω2
Example
  Expr = Term {"+" Term}.
  Term = Factor {"*" Factor}.
  Factor = id | "(" Expr ")".
  Expr ⇒ Term ⇒ Factor ⇒ "(" Expr ")"
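The indirect recursion of this grammar becomes visible when each nonterminal is turned into one method of a recursive-descent recognizer. The Java sketch below is a minimal illustration, not a full parser: it assumes that id is a single lower-case letter and that the input contains no blanks, and all names are chosen for this example.

```java
// Recognizer for:
//   Expr = Term {"+" Term}.  Term = Factor {"*" Factor}.  Factor = id | "(" Expr ")".
// Assumption: id is a single lower-case letter, input has no blanks.
class ExprParser {
    private final String input;
    private int pos = 0;

    ExprParser(String input) { this.input = input; }

    // Next input character, or '\0' at the end of the input.
    private char peek() { return pos < input.length() ? input.charAt(pos) : '\0'; }

    private void expr() {                 // Expr = Term {"+" Term}.
        term();
        while (peek() == '+') { pos++; term(); }
    }

    private void term() {                 // Term = Factor {"*" Factor}.
        factor();
        while (peek() == '*') { pos++; factor(); }
    }

    private void factor() {               // Factor = id | "(" Expr ")".
        if (Character.isLowerCase(peek())) {
            pos++;
        } else if (peek() == '(') {
            pos++;
            expr();                       // indirect recursion: Expr, Term, Factor, Expr
            if (peek() != ')') throw new IllegalArgumentException("')' expected at " + pos);
            pos++;
        } else {
            throw new IllegalArgumentException("id or '(' expected at " + pos);
        }
    }

    // Returns true if the whole string is a sentence of the grammar.
    static boolean accepts(String s) {
        ExprParser p = new ExprParser(s);
        try {
            p.expr();
            return p.pos == s.length();   // all input must be consumed
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```

A call such as ExprParser.accepts("a+b*(c+d)") goes through expr, term and factor and re-enters expr for the parenthesized part.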
In the left-recursive production X = b | X a. both alternatives start with b, so a top-down parser cannot decide which one to choose.
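Another example
  E = T | E "+" T.
The phrases that can be derived from E are T, T "+" T, T "+" T "+" T, ... Thus the rule can be written iteratively:
  E = T {"+" T}.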
1.4 Chomsky's Classification of Grammars
Classification of Grammars
Due to Noam Chomsky (1956)
Grammars are classified by the form of their productions α = β.
class 0  Unrestricted grammars (α and β arbitrary)
         Recognized by Turing machines
class 1  Context-sensitive grammars (|α| ≤ |β|)
         e.g: a X = a b c.
         Recognized by linear bounded automata
class 2  Context-free grammars (α = NT, β ≠ ε)
         e.g: X = a b c.
         Recognized by push-down automata
class 3  Regular grammars (α = NT, β = T or T NT)
         e.g: X = b | b Y.
         Recognized by finite automata
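To make the link between regular grammars and finite automata concrete, here is a small Java sketch of an automaton for the class 3 example X = b | b Y. Since no rule for Y is given there, the sketch assumes the hypothetical rule Y = c | c Y., so the accepted language is a b followed by any number of c's. The class and method names are likewise assumptions.

```java
class RegularDemo {
    // Finite automaton for X = b | b Y. with the assumed rule Y = c | c Y.
    // State 0: start; state 1: a sentence has been read (accepting).
    static boolean accepts(String s) {
        int state = 0;
        for (char ch : s.toCharArray()) {
            if (state == 0 && ch == 'b') state = 1;       // X = b ...
            else if (state == 1 && ch == 'c') state = 1;  // Y = c ...
            else return false;                            // no transition: reject
        }
        return state == 1;
    }
}
```

The automaton needs only the current state and the next input symbol, with no stack, which is what separates class 3 from class 2.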
1.5 The MicroJava Language
all numbers are of type int
all character constants are of type char (may contain \r and \n)
Keywords
  ... return void
Operators
  * / %
  > >= < <=
  [ ] { }
  , .
Comments
  // ... eol
Types
  int, char, arrays, classes
Declarations
ConstDecl  = "final" Type ident "=" (number | charConst) ";".
VarDecl    = Type ident {"," ident} ";".
MethodDecl = (Type | "void") ident "(" [FormPars] ")" {VarDecl} Block.
Type       = ident [ "[" "]" ].
FormPars   = Type ident {"," Type ident}.
ActPars
no constructors
Execution
  java MJ.Run myProg.obj -debug
  (myProg.obj → interpreter)
Decoding
  java MJ.Decode myProg.obj
  (myProg.obj → decoder → myProg.code)