
Language Specification & Compiler Construction

Hanspeter Mössenböck, University of Linz


http://ssw.jku.at/Misc/CC/
Textbook: N. Wirth: Compiler Construction, Addison-Wesley, 1996. http://www-old.oberon.ethz.ch/WirthPubl/CBEAll.pdf

1. Overview

1.1 Motivation
1.2 Structure of a Compiler
1.3 Grammars
1.4 Chomsky's Classification of Grammars
1.5 The MicroJava Language

Why should I learn about compilers?

- It's part of the general background of a software engineer:
  - How do compilers work?
  - How do computers work? (instruction set, registers, addressing modes, run-time data structures, ...)
  - What machine code is generated for certain language constructs? (efficiency considerations)
  - What is good language design?
- Opportunity for a non-trivial programming project
- Also useful for general software development:
  - Reading syntactically structured command-line arguments
  - Reading structured data (e.g. XML files, part lists, image files, ...)
  - Searching in hierarchical namespaces
  - Interpretation of command codes
  - ...

1.2 Structure of a Compiler

Dynamic Structure of a Compiler

character stream
    v a l   =   1 0   *   v a l   +   i

        | lexical analysis (scanning)
        v

token stream (token number and token value)
    token:         ident  assign  number  times  ident  plus  ident
    token number:  1      3       2       4      1      5     1
    token value:   "val"          10             "val"        "i"

        | syntax analysis (parsing)
        v

syntax tree
    Statement
    ├─ ident ("val")
    ├─ "="
    └─ Expression
       ├─ Term
       │  ├─ number (10)
       │  ├─ "*"
       │  └─ ident ("val")
       ├─ "+"
       └─ ident ("i")


        | semantic analysis (type checking, ...)
        v

intermediate representation
    syntax tree, symbol table, ...

        | optimization
        | code generation
        v

machine code
    const 10
    load 1
    mul
    ...
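The scanning step can be sketched in a few lines of Java. This is a minimal illustration, not the MicroJava scanner: the names `TinyScanner` and `Token` are hypothetical, only the characters that occur in the example statement are recognized, and the token numbers (ident = 1, number = 2, assign = 3, times = 4, plus = 5) are those of the example token stream.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical token: kind is the token number, value holds the name or digits (null for operators).
record Token(int kind, String value) {}

class TinyScanner {
    // Token numbers as used in the example token stream.
    static final int IDENT = 1, NUMBER = 2, ASSIGN = 3, TIMES = 4, PLUS = 5;

    static List<Token> scan(String src) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char ch = src.charAt(i);
            if (Character.isWhitespace(ch)) {
                i++;                                   // blanks only separate tokens
            } else if (Character.isLetter(ch)) {       // name: letter {letter | digit}
                int start = i;
                while (i < src.length() && Character.isLetterOrDigit(src.charAt(i))) i++;
                tokens.add(new Token(IDENT, src.substring(start, i)));
            } else if (Character.isDigit(ch)) {        // number: digit {digit}
                int start = i;
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                tokens.add(new Token(NUMBER, src.substring(start, i)));
            } else {                                   // single-character operators
                int kind = switch (ch) {
                    case '=' -> ASSIGN;
                    case '*' -> TIMES;
                    case '+' -> PLUS;
                    default -> throw new IllegalArgumentException("unexpected character: " + ch);
                };
                tokens.add(new Token(kind, null));
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : scan("val = 10 * val + i")) System.out.println(t);
    }
}
```

Running `main` prints seven tokens, one per line, in the same order as the example token stream.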

Compiler versus Interpreter

Compiler: translates to machine code
    source code → scanner → parser → ... → code generator → machine code → loader

Interpreter: executes source code "directly"
    source code → scanner → parser → interpretation
    Statements in a loop are scanned and parsed again and again.

Variant: interpretation of intermediate code
    source code → compiler ... → intermediate code (e.g. Java bytecode) → VM
    The source code is translated into the code of a virtual machine (VM).
    The VM interprets this code, simulating the physical machine.
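Interpretation of intermediate code can be illustrated with a tiny stack machine. This is a sketch under assumptions: the opcodes `CONST`, `LOAD`, `STORE`, `ADD`, `MUL`, the names `TinyVM`, `Op`, `Instr`, and the slot numbers chosen for `val` and `i` are hypothetical and only loosely modeled on the `const 10 load 1 mul ...` fragment; the real MicroJava VM has its own instruction set.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical instruction set of a tiny stack machine (not MicroJava's real one).
enum Op { CONST, LOAD, STORE, ADD, MUL }

record Instr(Op op, int arg) {
    Instr(Op op) { this(op, 0); }   // instructions without an operand
}

class TinyVM {
    // Executes the instructions one after another, simulating a machine with
    // an expression stack and an array of local variable slots.
    static void run(Instr[] code, int[] locals) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (Instr in : code) {
            switch (in.op()) {
                case CONST -> stack.push(in.arg());
                case LOAD  -> stack.push(locals[in.arg()]);
                case STORE -> locals[in.arg()] = stack.pop();
                case ADD   -> stack.push(stack.pop() + stack.pop());
                case MUL   -> stack.push(stack.pop() * stack.pop());
            }
        }
    }

    public static void main(String[] args) {
        // val = 10 * val + i, assuming val lives in slot 1 and i in slot 2
        Instr[] code = {
            new Instr(Op.CONST, 10), new Instr(Op.LOAD, 1), new Instr(Op.MUL),
            new Instr(Op.LOAD, 2), new Instr(Op.ADD), new Instr(Op.STORE, 1)
        };
        int[] locals = {0, 3, 4};          // slot 0 unused, val = 3, i = 4
        run(code, locals);
        System.out.println(locals[1]);     // prints 34
    }
}
```

The `for` loop plays the role of the VM's fetch-execute cycle. A VM for a full language also needs jump instructions, so it would step through the code with an explicit program counter instead of a plain loop.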

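Static Structure of a Compiler

- parser & semantic analysis: the "main program"; directs the whole compilation
- scanner: provides tokens from the source code
- code generation: generates machine code
- symbol table: maintains information about declared names and types

[Diagram: the parser is at the center and uses the scanner, the symbol table, and the code generator; the legend distinguishes "uses" arrows from data flow.]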
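A minimal sketch of the symbol table's job: keeping declared names together with their kind and type, in nested scopes. The names `SymbolTable` and `Entry` and the string-based kinds and types are hypothetical simplifications; the MicroJava compiler's own classes in the SymTab package are not reproduced here.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical entry: a declared name with its kind (const, var, ...) and type name.
record Entry(String name, String kind, String type) {}

class SymbolTable {
    // One map per open scope; the innermost scope is at the head of the deque.
    private final Deque<Map<String, Entry>> scopes = new ArrayDeque<>();

    void openScope()  { scopes.push(new HashMap<>()); }
    void closeScope() { scopes.pop(); }

    // Declares a name in the innermost scope; declaring it twice there is an error.
    Entry insert(String name, String kind, String type) {
        Map<String, Entry> current = scopes.peek();
        if (current == null) throw new IllegalStateException("no open scope");
        if (current.containsKey(name)) throw new IllegalStateException(name + " declared twice");
        Entry e = new Entry(name, kind, type);
        current.put(name, e);
        return e;
    }

    // Searches from the innermost to the outermost scope; null means undeclared.
    Entry find(String name) {
        for (Map<String, Entry> scope : scopes) {
            Entry e = scope.get(name);
            if (e != null) return e;
        }
        return null;
    }

    public static void main(String[] args) {
        SymbolTable tab = new SymbolTable();
        tab.openScope();                        // global scope
        tab.insert("size", "const", "int");
        tab.insert("val", "var", "Table");
        tab.openScope();                        // scope of method main
        tab.insert("x", "var", "int");
        System.out.println(tab.find("x"));      // found in the inner scope
        System.out.println(tab.find("size"));   // found in the outer scope
        tab.closeScope();
        System.out.println(tab.find("x"));      // null: x is no longer visible
    }
}
```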

1.3 Grammars

What is a grammar?

Example
    Statement = "if" "(" Condition ")" Statement ["else" Statement].

Four components
- terminal symbols are atomic, e.g. "if", ">=", ident, number, ...
- nonterminal symbols are decomposed into smaller units, e.g. Statement, Condition, Type, ...
- productions are rules describing how to decompose nonterminals, e.g.
      Statement = Designator "=" Expr ";".
      Designator = ident ["." ident].
      ...
- the start symbol is the topmost nonterminal, e.g. Java

EBNF Notation
Extended Backus-Naur form for writing grammars
(John Backus: developed the first Fortran compiler; Peter Naur: edited the Algol60 report)

Productions
    Statement = "write" ident "," Expression ";" .
    - left-hand side: Statement; right-hand side: everything between "=" and the final "."
    - literal terminal symbols: "write", ",", ";"
    - terminal symbol: ident
    - nonterminal symbols: Statement, Expression
    - "." terminates a production

By convention, terminal symbols start with lower-case letters and nonterminal symbols start with upper-case letters.

Metasymbols
    |       separates alternatives    a | b | c    a or b or c
    (...)   groups alternatives       a (b | c)    ab | ac
    [...]   optional part             [a] b        ab | b
    {...}   iterative part            {a} b        b | ab | aab | aaab | ...

Example: Grammar for Arithmetic Expressions

Productions
    Expr   = ["+" | "-"] Term {("+" | "-") Term}.
    Term   = Factor {("*" | "/") Factor}.
    Factor = ident | number | "(" Expr ")".

Terminal symbols
    simple terminal symbols:  "+", "-", "*", "/", "(", ")"   (just one instance each)
    terminal classes:         ident, number                  (multiple instances)

Nonterminal symbols
    Expr, Term, Factor

Start symbol
    Expr
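A recursive-descent evaluator is one way to see how the three productions work together: each nonterminal becomes a method, an optional part `[...]` becomes an `if`, and an iterative part `{...}` becomes a `while` loop. The class `ExprEvaluator` is hypothetical, and the sketch assumes that identifiers get their values from a map and that "/" means integer division; it is not taken from the MicroJava compiler.

```java
import java.util.Map;

// Recursive-descent evaluator for
//   Expr   = ["+" | "-"] Term {("+" | "-") Term}.
//   Term   = Factor {("*" | "/") Factor}.
//   Factor = ident | number | "(" Expr ")".
class ExprEvaluator {
    private final String src;
    private final Map<String, Integer> vars;   // assumed source of identifier values
    private int pos = 0;

    private ExprEvaluator(String src, Map<String, Integer> vars) { this.src = src; this.vars = vars; }

    static int evaluate(String src, Map<String, Integer> vars) {
        ExprEvaluator p = new ExprEvaluator(src, vars);
        int value = p.expr();
        if (p.peek() != '\0') throw new IllegalArgumentException("unexpected input at " + p.pos);
        return value;
    }

    // Skips blanks and returns the next character without consuming it ('\0' at the end).
    private char peek() {
        while (pos < src.length() && src.charAt(pos) == ' ') pos++;
        return pos < src.length() ? src.charAt(pos) : '\0';
    }

    // Expr = ["+" | "-"] Term {("+" | "-") Term}.
    private int expr() {
        boolean negate = false;
        if (peek() == '+' || peek() == '-') negate = src.charAt(pos++) == '-';
        int value = term();
        if (negate) value = -value;
        while (peek() == '+' || peek() == '-') {
            char op = src.charAt(pos++);
            int rhs = term();
            value = op == '+' ? value + rhs : value - rhs;
        }
        return value;
    }

    // Term = Factor {("*" | "/") Factor}.
    private int term() {
        int value = factor();
        while (peek() == '*' || peek() == '/') {
            char op = src.charAt(pos++);
            int rhs = factor();
            value = op == '*' ? value * rhs : value / rhs;
        }
        return value;
    }

    // Factor = ident | number | "(" Expr ")".
    private int factor() {
        char ch = peek();
        int start = pos;
        if (Character.isLetter(ch)) {
            while (pos < src.length() && Character.isLetterOrDigit(src.charAt(pos))) pos++;
            Integer v = vars.get(src.substring(start, pos));
            if (v == null) throw new IllegalArgumentException("undeclared name at " + start);
            return v;
        } else if (Character.isDigit(ch)) {
            while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
            return Integer.parseInt(src.substring(start, pos));
        } else if (ch == '(') {
            pos++;                                         // consume "("
            int value = expr();
            if (peek() != ')') throw new IllegalArgumentException("')' expected at " + pos);
            pos++;                                         // consume ")"
            return value;
        }
        throw new IllegalArgumentException("Factor expected at " + pos);
    }

    public static void main(String[] args) {
        Map<String, Integer> vars = Map.of("val", 3, "i", 4);
        System.out.println(evaluate("10 * val + i", vars));   // 34
        System.out.println(evaluate("-(1 + 2) * 3", vars));   // -9
    }
}
```

Note that the optional sign applies to the first Term only, exactly as the Expr production says, so "-2 * 3 + 1" evaluates to -5.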

Terminal Start Symbols of Nonterminals

What are the terminal symbols with which a nonterminal can start?
    Expr   = ["+" | "-"] Term {("+" | "-") Term}.
    Term   = Factor {("*" | "/") Factor}.
    Factor = ident | number | "(" Expr ")".

    First(Factor) = ident, number, "("
    First(Term)   = First(Factor) = ident, number, "("
    First(Expr)   = "+", "-", First(Term) = "+", "-", ident, number, "("
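First sets can also be computed mechanically by iterating until nothing changes any more. The sketch assumes a BNF rewrite of the grammar in which hypothetical helper nonterminals `Sign`, `ExprRest`, `Addop`, `TermRest`, and `Mulop` stand in for the EBNF options and repetitions (an empty alternative means ε); the class name `FirstSets` is also hypothetical.

```java
import java.util.*;

class FirstSets {
    // Assumed BNF rewrite of the expression grammar; an empty alternative stands for ε.
    static Map<String, List<List<String>>> grammar() {
        Map<String, List<List<String>>> g = new LinkedHashMap<>();
        g.put("Expr",     List.of(List.of("Sign", "Term", "ExprRest")));
        g.put("Sign",     List.of(List.of("+"), List.of("-"), List.of()));
        g.put("ExprRest", List.of(List.of("Addop", "Term", "ExprRest"), List.of()));
        g.put("Addop",    List.of(List.of("+"), List.of("-")));
        g.put("Term",     List.of(List.of("Factor", "TermRest")));
        g.put("TermRest", List.of(List.of("Mulop", "Factor", "TermRest"), List.of()));
        g.put("Mulop",    List.of(List.of("*"), List.of("/")));
        g.put("Factor",   List.of(List.of("ident"), List.of("number"), List.of("(", "Expr", ")")));
        return g;
    }

    // Nonterminals that can derive the empty string (fixed-point iteration).
    // An alternative qualifies if all its symbols are already known to be nullable;
    // terminals never are, and the empty alternative trivially is.
    static Set<String> nullable(Map<String, List<List<String>>> g) {
        Set<String> result = new HashSet<>();
        boolean changed = true;
        while (changed) {
            changed = false;
            for (var e : g.entrySet())
                for (List<String> alt : e.getValue())
                    if (!result.contains(e.getKey()) && result.containsAll(alt)) {
                        result.add(e.getKey());
                        changed = true;
                    }
        }
        return result;
    }

    // First(X) for every nonterminal X, grown until no set changes.
    static Map<String, Set<String>> first(Map<String, List<List<String>>> g) {
        Set<String> nullableNts = nullable(g);
        Map<String, Set<String>> first = new HashMap<>();
        for (String nt : g.keySet()) first.put(nt, new TreeSet<>());
        boolean changed = true;
        while (changed) {
            changed = false;
            for (var e : g.entrySet())
                for (List<String> alt : e.getValue())
                    changed |= first.get(e.getKey()).addAll(firstOfSeq(alt, g, first, nullableNts));
        }
        return first;
    }

    // First of a symbol sequence: collect First of each symbol until one cannot vanish.
    static Set<String> firstOfSeq(List<String> seq, Map<String, List<List<String>>> g,
                                  Map<String, Set<String>> first, Set<String> nullableNts) {
        Set<String> result = new TreeSet<>();
        for (String sym : seq) {
            if (!g.containsKey(sym)) { result.add(sym); break; }   // terminal symbol
            result.addAll(first.get(sym));
            if (!nullableNts.contains(sym)) break;
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> first = first(grammar());
        for (String nt : List.of("Factor", "Term", "Expr"))
            System.out.println("First(" + nt + ") = " + first.get(nt));
    }
}
```

For this grammar the computed sets agree with the hand-derived ones: First(Factor) and First(Term) contain ident, number, and "(", and First(Expr) additionally contains "+" and "-".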

Terminal Successors of Nonterminals

Which terminal symbols can follow a nonterminal in the grammar?
    Expr   = ["+" | "-"] Term {("+" | "-") Term}.
    Term   = Factor {("*" | "/") Factor}.
    Factor = ident | number | "(" Expr ")".

    Follow(Expr)   = ")", eof
        Where does Expr occur on the right-hand side of a production? What terminal symbols can follow there?
    Follow(Term)   = "+", "-", Follow(Expr) = "+", "-", ")", eof
    Follow(Factor) = "*", "/", Follow(Term) = "*", "/", "+", "-", ")", eof
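Follow sets can be computed in the same fixed-point style. To keep the sketch short, it assumes a BNF rewrite of the grammar with hypothetical helper nonterminals `Sign`, `ExprRest`, `Addop`, `TermRest`, and `Mulop` (an empty alternative means ε), and it takes the First sets and the set of nonterminals that can derive ε as given inputs instead of computing them. The class name `FollowSets` and the marker string `eof` are likewise assumptions.

```java
import java.util.*;

class FollowSets {
    static final String EOF = "eof";

    // Assumed BNF rewrite of the expression grammar; an empty alternative stands for ε.
    static final Map<String, List<List<String>>> GRAMMAR = new LinkedHashMap<>();
    static {
        GRAMMAR.put("Expr",     List.of(List.of("Sign", "Term", "ExprRest")));
        GRAMMAR.put("Sign",     List.of(List.of("+"), List.of("-"), List.of()));
        GRAMMAR.put("ExprRest", List.of(List.of("Addop", "Term", "ExprRest"), List.of()));
        GRAMMAR.put("Addop",    List.of(List.of("+"), List.of("-")));
        GRAMMAR.put("Term",     List.of(List.of("Factor", "TermRest")));
        GRAMMAR.put("TermRest", List.of(List.of("Mulop", "Factor", "TermRest"), List.of()));
        GRAMMAR.put("Mulop",    List.of(List.of("*"), List.of("/")));
        GRAMMAR.put("Factor",   List.of(List.of("ident"), List.of("number"), List.of("(", "Expr", ")")));
    }

    // First sets and nullable nonterminals of this grammar, taken as given.
    static final Map<String, Set<String>> FIRST = Map.of(
        "Expr", Set.of("+", "-", "ident", "number", "("), "Sign", Set.of("+", "-"),
        "ExprRest", Set.of("+", "-"), "Addop", Set.of("+", "-"),
        "Term", Set.of("ident", "number", "("), "TermRest", Set.of("*", "/"),
        "Mulop", Set.of("*", "/"), "Factor", Set.of("ident", "number", "("));
    static final Set<String> NULLABLE = Set.of("Sign", "ExprRest", "TermRest");

    static Map<String, Set<String>> follow(String start) {
        Map<String, Set<String>> follow = new HashMap<>();
        for (String nt : GRAMMAR.keySet()) follow.put(nt, new TreeSet<>());
        follow.get(start).add(EOF);                          // the start symbol is followed by eof
        boolean changed = true;
        while (changed) {
            changed = false;
            for (var e : GRAMMAR.entrySet()) {
                for (List<String> alt : e.getValue()) {
                    for (int i = 0; i < alt.size(); i++) {
                        String b = alt.get(i);
                        if (!GRAMMAR.containsKey(b)) continue;   // Follow only for nonterminals
                        // Add First of the symbols after b, as long as they can vanish.
                        boolean restNullable = true;
                        for (int j = i + 1; j < alt.size() && restNullable; j++) {
                            String sym = alt.get(j);
                            if (GRAMMAR.containsKey(sym)) {
                                changed |= follow.get(b).addAll(FIRST.get(sym));
                                restNullable = NULLABLE.contains(sym);
                            } else {
                                changed |= follow.get(b).add(sym);
                                restNullable = false;
                            }
                        }
                        // If everything after b can vanish, Follow(left-hand side) follows b too.
                        if (restNullable && !b.equals(e.getKey()))
                            changed |= follow.get(b).addAll(follow.get(e.getKey()));
                    }
                }
            }
        }
        return follow;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> follow = follow("Expr");
        for (String nt : List.of("Expr", "Term", "Factor"))
            System.out.println("Follow(" + nt + ") = " + follow.get(nt));
    }
}
```

The helper nonterminals only pass sets along (for example, Follow(ExprRest) equals Follow(Expr)), so the results for Expr, Term, and Factor match the hand-derived Follow sets.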

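Strings and Derivations

String
A finite sequence of symbols from an alphabet.
Alphabet: all terminal and nonterminal symbols of a grammar.
Strings are denoted by Greek letters ($\alpha$, $\beta$, $\gamma$, ...), e.g.
    $\alpha$ = ident + number
    $\beta$ = - Term + Factor * number

Empty string
The string that contains no symbol (denoted by $\varepsilon$).

Derivation
    $\alpha \Rightarrow \beta$ (direct derivation): a nonterminal symbol (NTS) in $\alpha$ is replaced by the right-hand side of a production of that NTS, e.g.
        Term + Factor * Factor  $\Rightarrow$  Term + ident * Factor
    $\alpha \Rightarrow^{*} \beta$ (indirect derivation):
        $\alpha \Rightarrow \gamma_1 \Rightarrow \gamma_2 \Rightarrow \dots \Rightarrow \gamma_n \Rightarrow \beta$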
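A direct derivation step is a simple list operation: one nonterminal occurrence is replaced by the right-hand side of one of its productions. The class `Derivation` and the method `derive` are hypothetical names for this sketch, which represents strings as lists of symbol names and reuses the example from the text.

```java
import java.util.ArrayList;
import java.util.List;

class Derivation {
    // One direct derivation step alpha => beta: the nonterminal nts at index pos
    // of alpha is replaced by the right-hand side rhs of one of its productions.
    static List<String> derive(List<String> alpha, int pos, String nts, List<String> rhs) {
        if (!alpha.get(pos).equals(nts))
            throw new IllegalArgumentException(nts + " expected at position " + pos);
        List<String> beta = new ArrayList<>(alpha.subList(0, pos));
        beta.addAll(rhs);
        beta.addAll(alpha.subList(pos + 1, alpha.size()));
        return beta;
    }

    public static void main(String[] args) {
        List<String> alpha = List.of("Term", "+", "Factor", "*", "Factor");
        List<String> beta = derive(alpha, 2, "Factor", List.of("ident"));    // Factor = ident
        List<String> gamma = derive(beta, 4, "Factor", List.of("number"));   // Factor = number
        // alpha => beta is a direct derivation; alpha =>* gamma takes two steps.
        System.out.println(String.join(" ", alpha) + "  =>  " + String.join(" ", beta)
                + "  =>  " + String.join(" ", gamma));
    }
}
```

Chaining calls to `derive` is exactly the indirect derivation $\alpha \Rightarrow \gamma_1 \Rightarrow \dots \Rightarrow \beta$: each call is one direct step.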

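Recursion
A production is recursive if $X \Rightarrow^{*} \omega_1 X \omega_2$.
Recursion can be used to represent repetitions and nested structures.

Direct recursion: $X \Rightarrow \omega_1 X \omega_2$
    Left recursion:     X = b | X a.          $X \Rightarrow X a \Rightarrow X a a \Rightarrow X a a a \Rightarrow \dots \Rightarrow b a a a \dots a$
    Right recursion:    X = b | a X.          $X \Rightarrow a X \Rightarrow a a X \Rightarrow a a a X \Rightarrow \dots \Rightarrow a \dots a a a b$
    Central recursion:  X = b | "(" X ")".    $X \Rightarrow (X) \Rightarrow ((X)) \Rightarrow (((X))) \Rightarrow \dots \Rightarrow (((\dots(b)\dots)))$

Indirect recursion: $X \Rightarrow^{*} \omega_1 X \omega_2$
Example
    Expr = Term {"+" Term}.
    Term = Factor {"*" Factor}.
    Factor = id | "(" Expr ")".

    $\mathrm{Expr} \Rightarrow \mathrm{Term} \Rightarrow \mathrm{Factor} \Rightarrow (\,\mathrm{Expr}\,)$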

How to Remove Left Recursion

Left recursion cannot be handled in top-down parsing:
    X = b | X a.
Both alternatives start with b, so the parser cannot decide which one to choose.

Left recursion can always be transformed into iteration:
    $X \Rightarrow^{*} b\,a\,a\,a\,a \dots a$        X = b {a}.

Another example
    E = T | E "+" T.
What phrases can be derived?
    $E \Rightarrow T$
    $E \Rightarrow E + T \Rightarrow T + T$
    $E \Rightarrow E + T \Rightarrow E + T + T \Rightarrow T + T + T$
    $E \Rightarrow \dots \Rightarrow E + T + T + T \Rightarrow \dots$
Thus
    E = T {"+" T}.
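The iterative form translates directly into a loop in a top-down parser. In the sketch below, T is simplified to a single identifier token, and the names `LeftRecursionDemo`, `e`, `t`, and `parse` are hypothetical. The loop still builds the left-nested structure that the left-recursive rule E = T | E "+" T describes, without the parser ever having to choose between two alternatives that start with the same symbols.

```java
import java.util.List;

class LeftRecursionDemo {
    // A naive procedure for E = E "+" T would call e() again before consuming
    // any token and recurse forever; the iterative form E = T {"+" T} avoids that.
    private final List<String> tokens;
    private int pos = 0;

    private LeftRecursionDemo(List<String> tokens) { this.tokens = tokens; }

    // E = T {"+" T}. Returns the left-nested form ((T + T) + T) ... as a string.
    private String e() {
        String result = t();
        while (pos < tokens.size() && tokens.get(pos).equals("+")) {
            pos++;                                          // consume "+"
            result = "(" + result + " + " + t() + ")";
        }
        return result;
    }

    // T: one identifier token (a simplification for this sketch).
    private String t() {
        if (pos >= tokens.size() || tokens.get(pos).equals("+"))
            throw new IllegalArgumentException("T expected at token " + pos);
        return tokens.get(pos++);
    }

    static String parse(List<String> tokens) {
        LeftRecursionDemo p = new LeftRecursionDemo(tokens);
        String tree = p.e();
        if (p.pos != tokens.size()) throw new IllegalArgumentException("unexpected token at " + p.pos);
        return tree;
    }

    public static void main(String[] args) {
        System.out.println(parse(List.of("a", "+", "b", "+", "c")));   // ((a + b) + c)
    }
}
```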

1.4 Chomsky's Classification of Grammars

Classification of Grammars
Due to Noam Chomsky (1956)

Grammars are sets of productions of the form $\alpha = \beta$.

class 0: Unrestricted grammars ($\alpha$ and $\beta$ arbitrary)
    e.g. X = a X b | Y c Y.   aYc = d.   dY = bb.
    $X \Rightarrow aXb \Rightarrow aYcYb \Rightarrow dYb \Rightarrow bbb$
    Recognized by Turing machines

class 1: Context-sensitive grammars ($|\alpha| \le |\beta|$)
    e.g. a X = a b c.
    Recognized by linear bounded automata

class 2: Context-free grammars ($\alpha$ = NT, $\beta \ne \varepsilon$)
    e.g. X = a b c.
    Recognized by push-down automata

class 3: Regular grammars ($\alpha$ = NT, $\beta$ = T or T NT)
    e.g. X = b | b Y.
    Recognized by finite automata

Only classes 2 and 3 are relevant in compiler construction.
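The difference between classes 2 and 3 shows up with central recursion such as X = b | "(" X ")": recognizing it means remembering an unbounded number of open parentheses, which a finite automaton cannot do, while a push-down automaton can keep them on its stack. The Java sketch below mimics such a stack for this one grammar; `PushdownSketch` is a hypothetical name, and the code is a hand-written recognizer rather than a general push-down automaton.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class PushdownSketch {
    // Accepts exactly the strings derivable from X = b | "(" X ")".
    static boolean accepts(String input) {
        Deque<Character> stack = new ArrayDeque<>();
        int i = 0;
        // Push one marker for every opening parenthesis.
        while (i < input.length() && input.charAt(i) == '(') {
            stack.push('(');
            i++;
        }
        // The innermost X must be b.
        if (i >= input.length() || input.charAt(i) != 'b') return false;
        i++;
        // Pop one marker for every closing parenthesis.
        while (i < input.length() && input.charAt(i) == ')') {
            if (stack.isEmpty()) return false;
            stack.pop();
            i++;
        }
        // Accept only if all input was read and every "(" was matched.
        return i == input.length() && stack.isEmpty();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"b", "((b))", "((b)", "(b))", "()"})
            System.out.println(s + " -> " + accepts(s));
    }
}
```

For this particular language a counter would do the same job; the stack is used here because it mirrors how a push-down automaton works.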

1.5 The MicroJava Language

Sample MicroJava Program

    program P
      final int size = 10;
      class Table {
        int[] pos;
        int[] neg;
      }
      Table val;
    {
      void main()
        int x, i;
      { //---------- initialize val ----------
        val = new Table;
        val.pos = new int[size];
        val.neg = new int[size];
        i = 0;
        while (i < size) {
          val.pos[i] = 0; val.neg[i] = 0;
          i = i + 1;
        }
        //---------- read values ----------
        read(x);
        while (x != 0) {
          if (x >= 0) val.pos[x] = val.pos[x] + 1;
          else if (x < 0) val.neg[-x] = val.neg[-x] + 1;
          read(x);
        }
      }
    }

Notes on the program:
- program P: main program; no separate compilation
- class Table: classes (without methods)
- Table val: global variables
- int x, i: local variables

Lexical Structure of MicroJava

Names           ident = letter {letter | digit}.
Numbers         number = digit {digit}.          all numbers are of type int
Char constants  charConst = '\'' char '\''.      all character constants are of type char (may contain \r and \n)
                no strings
Keywords        program  class  if  else  while  read  print  return  void  final  new
Operators       +  -  *  /  %  ==  !=  >  >=  <  <=  (  )  [  ]  {  }  =  ;  ,  .
Comments        // ... eol
Types           int  char  arrays  classes
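The token classes above can be recognized by a straightforward hand-written scanner. The sketch below is hypothetical (`MjLexSketch`, and the "kind:text" token strings are inventions for illustration, not the MicroJava compiler's Scanner class); it approximates letter and digit with Java's `Character.isLetter` and `Character.isDigit`, and it treats an escaped character constant such as '\n' as two characters between the quotes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class MjLexSketch {
    static final Set<String> KEYWORDS = Set.of("program", "class", "if", "else", "while",
            "read", "print", "return", "void", "final", "new");

    // Two-character operators come first so that "==" is not split into "=" "=".
    static final List<String> OPERATORS = List.of("==", "!=", ">=", "<=",
            "+", "-", "*", "/", "%", ">", "<", "=", "(", ")", "[", "]", "{", "}", ";", ",", ".");

    // Returns the tokens of src as "kind:text" strings; blanks and comments are skipped.
    static List<String> scan(String src) {
        List<String> out = new ArrayList<>();
        int i = 0, n = src.length();
        while (i < n) {
            char ch = src.charAt(i);
            if (Character.isWhitespace(ch)) {
                i++;
            } else if (src.startsWith("//", i)) {                 // comment up to end of line
                while (i < n && src.charAt(i) != '\n') i++;
            } else if (Character.isLetter(ch)) {                  // ident = letter {letter | digit}.
                int start = i;
                while (i < n && Character.isLetterOrDigit(src.charAt(i))) i++;
                String word = src.substring(start, i);
                out.add((KEYWORDS.contains(word) ? "keyword:" : "ident:") + word);
            } else if (Character.isDigit(ch)) {                   // number = digit {digit}.
                int start = i;
                while (i < n && Character.isDigit(src.charAt(i))) i++;
                out.add("number:" + src.substring(start, i));
            } else if (ch == '\'') {                              // charConst = '\'' char '\''.
                int close = (i + 1 < n && src.charAt(i + 1) == '\\') ? i + 3 : i + 2;
                if (close >= n || src.charAt(close) != '\'')
                    throw new IllegalArgumentException("bad character constant at " + i);
                out.add("charConst:" + src.substring(i, close + 1));
                i = close + 1;
            } else {                                              // operators and delimiters
                String op = null;
                for (String candidate : OPERATORS)
                    if (src.startsWith(candidate, i)) { op = candidate; break; }
                if (op == null)
                    throw new IllegalArgumentException("invalid character '" + ch + "' at " + i);
                out.add("op:" + op);
                i += op.length();
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(scan("while (x != 0) { read(x); } // read values"));
    }
}
```

Keywords are scanned as names first and then looked up in a set, which keeps the name-recognition code in one place.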

Syntactical Structure of MicroJava

Programs
    Program    = "program" ident {ConstDecl | VarDecl | ClassDecl} "{" {MethodDecl} "}".
    e.g.  program P ... declarations ... { ... methods ... }

Declarations
    ConstDecl  = "final" Type ident "=" (number | charConst) ";".
    VarDecl    = Type ident {"," ident} ";".
    MethodDecl = (Type | "void") ident "(" [FormPars] ")" {VarDecl} Block.
    Type       = ident [ "[" "]" ].            just one-dimensional arrays
    FormPars   = Type ident {"," Type ident}.

Statements
    Block     = "{" {Statement} "}".
    Statement = Designator ( "=" Expr ";" | "(" [ActPars] ")" ";" )
              | "if" "(" Condition ")" Statement ["else" Statement]
              | "while" "(" Condition ")" Statement
              | "return" [Expr] ";"
              | "read" "(" Designator ")" ";"              input from System.in
              | "print" "(" Expr ["," number] ")" ";"      output to System.out
              | Block
              | ";".
    ActPars   = Expr {"," Expr}.

Expressions
    Condition  = Expr Relop Expr.
    Relop      = "==" | "!=" | ">" | ">=" | "<" | "<=".
    Expr       = ["-"] Term {Addop Term}.
    Term       = Factor {Mulop Factor}.
    Factor     = Designator [ "(" [ActPars] ")" ]
               | number
               | charConst
               | "new" ident [ "[" Expr "]" ]          no constructors
               | "(" Expr ")".
    Designator = ident { "." ident | "[" Expr "]" }.
    Addop      = "+" | "-".
    Mulop      = "*" | "/" | "%".

The MicroJava Compiler

Package structure
    MJ
        Compiler.java
        Scanner.java
        Parser.java
        ...
        Run.java
        Decode.java
        SymTab
            Tab.java
            Obj.java
            Struct.java
            Scope.java
        CodeGen
            Code.java
            Item.java
            Decoder.java

Compilation of a MicroJava program
    java MJ.Compiler myProg.mj
    myProg.mj → compiler → myProg.obj

Execution
    java MJ.Run myProg.obj -debug
    myProg.obj → interpreter

Decoding
    java MJ.Decode myProg.obj
    myProg.obj → decoder → myProg.code
