0% found this document useful (0 votes)
46 views39 pages

Ch2 Modified

1) The document describes building a simple compiler by defining a programming language syntax using context-free grammar, developing a predictive parser, and implementing syntax-directed translation to generate intermediate code. 2) A context-free grammar consists of tokens, nonterminals, productions, and a start symbol. Productions specify rewriting rules to derive strings from the grammar. 3) Derivations and parse trees represent the structure of strings according to the grammar. Derivations apply productions to replace nonterminals, and parse trees visually depict the structure.

Uploaded by

Hassnain Abbas
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
46 views39 pages

Ch2 Modified

1) The document describes building a simple compiler by defining a programming language syntax using context-free grammar, developing a predictive parser, and implementing syntax-directed translation to generate intermediate code. 2) A context-free grammar consists of tokens, nonterminals, productions, and a start symbol. Productions specify rewriting rules to derive strings from the grammar. 3) Derivations and parse trees represent the structure of strings according to the grammar. Derivations apply productions to replace nonterminals, and parse trees visually depict the structure.

Uploaded by

Hassnain Abbas
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 39

1

A SIMPLE SYNTAX-
DIRECTED TRANSLATOR

Chapter 2
2

Building a Simple Compiler


• Building our compiler involves:
– Defining the syntax of a programming language
– Develop a source code parser: for our compiler
we will use predictive parsing
– Implementing syntax directed translation to
generate intermediate code
3

Syntax Definition
• Context-free grammar is a 4-tuple with
– A set of tokens (terminal symbols)
– A set of nonterminals
– A set of productions
– A designated start symbol
4

Example Grammar

Context-free grammar for simple expressions:

G = <{list,digit}, {+,-,0,1,2,3,4,5,6,7,8,9}, P, list>

with productions P =

list  list + digit

list  list - digit

list  digit

digit  0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
5

Derivation
• Given a CF grammar we can determine the
set of all strings (sequences of tokens)
generated by the grammar using derivation
– We begin with the start symbol
– In each step, we replace one nonterminal in the
current sentential form with one of the right-
hand sides of a production for that nonterminal
6

Derivation for the Example


Grammar

list
 list + digit
 list - digit + digit
 digit - digit + digit
 9 - digit + digit
 9 - 5 + digit
9-5+2

This is an example leftmost derivation, because we replaced


the leftmost nonterminal (underlined) in each step.
7

Derivation for the Example


Rightmost Grammar
Likewise, a rightmost derivation replaces the rightmost
nonterminal in each step
list
P=  digit - list
list  digit + list  digit - digit + list
 digit - digit + digit
list  digit - list
 digit - digit + 2
list  digit  digit - 5 + 2
digit  0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 9-5+2
8

Parse Trees
• The root of the tree is labeled by the start symbol
• Each leaf of the tree is labeled by a terminal
(=token) or 
• Each interior node is labeled by a nonterminal
• If A  X1 X2 … Xn is a production, then node A has
immediate children X1, X2, …, Xn where Xi is a
(non)terminal or  ( denotes the empty string)
9

Parse Tree for the Example


Grammar
Parse tree of the string 9-5+2 using grammar G

list

list digit

list digit

digit
The sequence of
9 - 5 + 2 leafs is called the
yield of the parse tree
The Two Derivations for x – 2 * y
Rule Sentential Form Rule Sentential Form
— Expr — Expr
1 Expr Op Expr 1 Expr Op Expr
3 <id,x> Op Expr 3 Expr Op <id,y>
5 <id,x> – Expr 6 Expr * <id,y>
1 <id,x> – Expr Op Expr 1 Expr Op Expr * <id,y>
2 <id,x> – <num,2> Op Expr 2 Expr Op <num,2> * <id,y>
6 <id,x> – <num,2> * Expr 5 Expr – <num,2> * <id,y>
3 <id,x> – <num,2> * <id,y> 3 <id,x> – <num,2> * <id,y>

Leftmost derivation Rightmost derivation

In both cases, Expr * id – num * id


• The two derivations produce different parse trees
• The parse trees imply different evaluation orders!
Derivations and Parse Trees
G
Leftmost derivation
Rule Sentential Form
— Expr E
1 Expr Op Expr
3 <id,x> Op Expr
5 <id,x> – Expr E Op E
1 <id,x> – Expr Op Expr
2 <id,x> – <num,2> Op Expr
6 <id,x> – <num,2> * Expr x – Op E
E
3 <id,x> – <num,2> * <id,y>

This evaluates as x – ( 2 * y ) 2 y
*
Derivations and Parse Trees
G
Rightmost derivation
Rule Sentential Form
— Expr E
1 Expr Op Expr
3 Expr Op <id,y>
6 Expr * <id,y> E Op E
1 Expr Op Expr * <id,y>
2 Expr Op <num,2> * <id,y>
5 Expr – <num,2> * <id,y> E Op E * y
3 <id,x> – <num,2> * <id,y>

x – 2
This evaluates as ( x – 2 ) * y
13

Ambiguity

Consider the following context-free grammar:

G = <{string}, {+,-,0,1,2,3,4,5,6,7,8,9}, P, string>

with production P =

string  string + string | string - string | 0 | 1 | … | 9


This grammar is ambiguous, because more than one parse tree
represents the string 9-5+2
14

Ambiguity (cont’d)

string string

string string string

string string string string string

9 - 5 + 2 9 - 5 + 2
15

Associativity of Operators
Left-associative operators have left-recursive productions
left  left + term | term
String a+b+c has the same meaning as (a+b)+c
Right-associative operators have right-recursive productions
right  term = right | term
String a=b=c has the same meaning as a=(b=c)
Operators on the same line have the same associativity and
precedence:
left-associative: + -

left-associative: */
16

Precedence of Operators
Operators with higher precedence “bind more tightly”
expr  expr + term | expr – term | term
term  term * factor | term / factor | factor
factor  number | ( expr )
number  0|1|2|…….|9
String 2+3*5 has the same meaning as 2+(3*5)
expr
expr term
term term factor
factor factor number
number number
2 + 3 * 5
17

Syntax of Statements

stmt  id := expr
| if expr then stmt
| if expr then stmt else stmt
| while expr do stmt
| begin opt_stmts end
opt_stmts  stmt ; opt_stmts
|
18

Syntax-Directed Translation
• Uses a CF grammar to specify the syntactic
structure of the language
• AND associates a set of attributes with the
terminals and nonterminals of the grammar
• AND associates with each production a set of
semantic rules to compute values of attributes
• A parse tree is traversed and semantic rules
applied: after the tree traversal(s) are completed,
the attribute values on the nonterminals contain
the translated form of the input
19

Synthesized and Inherited


Attributes
• An attribute is said to be …
– synthesized if its value at a parse-tree node is
determined from the attribute values at the children of
the node
– Suppose a node N in a parse tree is labeled by the
grammar symbol X . We write X.a to denote the value
of attribute a of X at that node.
– inherited if its value at a parse-tree node is determined
by the parent (by enforcing the parent’s semantic rules)
20

Example Attribute Grammar


Syntax-directed definition for infix to postfix translation

String concat operator


Production Semantic Rule
expr  expr1 + term expr.t := expr1.t // term.t // “+”
expr  expr1 - term expr.t := expr1.t // term.t // “-”
expr  term expr.t := term.t
term  0 term.t := “0”
term  1 term.t := “1”
… …
term  9 term.t := “9”
21

Example Annotated Parse Tree

expr.t = “95-2+”

expr.t = “95-” term.t = “2”

expr.t = “9” term.t = “5”

term.t = “9”

9 - 5 + 2
22

Depth-First Traversals
procedure visit(n : node);
begin
for each child m of n, from left to right do
visit(m);
evaluate semantic rules at node n
end
23

Depth-First Traversals (Example)

expr.t = “95-2+”

expr.t = “95-” term.t = “2”

expr.t = “9” term.t = “5”

term.t = “9”

9 - 5 + 2 Note: all attributes are


of the synthesized type
24

Translation Schemes
• A translation scheme is a CF grammar embedded
with semantic actions
• When drawing a parse tree for a translation scheme,
we indicate an action by constructing an extra child
for it, connected by a dashed line to the node that
corresponds to the head of the production.

rest  + term { print(“+”) } rest


rest
Embedded
semantic action
+ term { print(“+”) } rest
25

Example Translation Scheme

expr  expr + term { print(“+”) }


expr  expr - term { print(“-”) }
expr  term
term  0 { print(“0”) }
term  1 { print(“1”) }
… …
term  9 { print(“9”) }
26

Example Translation Scheme


(cont’d)

expr
{ print(“+”) }
expr + term
{ print(“2”) }
{ print(“-”) }
- term 2
expr
{ print(“5”) }
term 5
{ print(“9”) }
9
Translates 9-5+2 into postfix 95-2+
27

Parsing
• Parsing = process of determining if a string of
tokens can be generated by a grammar
• For any CF grammar there is a parser that takes at
most O(n3) time to parse a string of n tokens
• Top-down parsing “constructs” a parse tree from
root to leaves
• Bottom-up parsing “constructs” a parse tree from
leaves to root
28

Predictive Parsing
• Recursive descent parsing is a top-down parsing
method
– Each nonterminal has one (recursive) procedure that is
responsible for parsing the nonterminal’s syntactic
category of input tokens
– When a nonterminal has multiple productions, each
production is implemented in a branch of a selection
statement based on input look-ahead information
• Predictive parsing is a special form of recursive
descent parsing where we use one lookahead
token to unambiguously determine the parse
operations
29

Example Predictive Parser


(Grammar)

type  simple
| ^ id
| array [ simple ] of type
simple  integer
| char
| num dotdot num
30

Example Predictive Parser


(Execution Step 1)

Check lookahead
type()
and call match

match(‘array’)

Input: array [ num dotdot num ] of integer

lookahead
31

Example Predictive Parser


(Execution Step 2)
type()

match(‘array’) match(‘[’)

Input: array [ num dotdot num ] of integer

lookahead
32

Example Predictive Parser


(Execution Step 3)
type()

match(‘array’) match(‘[’) simple()

match(‘num’)

Input: array [ num dotdot num ] of integer

lookahead
33

Example Predictive Parser


(Execution Step 4)
type()

match(‘array’) match(‘[’) simple()

match(‘num’) match(‘dotdot’)

Input: array [ num dotdot num ] of integer

lookahead
34

Example Predictive Parser


(Execution Step 5)
type()

match(‘array’) match(‘[’) simple()

match(‘num’) match(‘dotdot’) match(‘num’)

Input: array [ num dotdot num ] of integer

lookahead
35

Example Predictive Parser


(Execution Step 6)
type()

match(‘array’) match(‘[’) simple() match(‘]’)

match(‘num’) match(‘dotdot’) match(‘num’)

Input: array [ num dotdot num ] of integer

lookahead
36

Example Predictive Parser


(Execution Step 7)
type()

match(‘array’) match(‘[’) simple() match(‘]’) match(‘of’)

match(‘num’) match(‘dotdot’) match(‘num’)

Input: array [ num dotdot num ] of integer

lookahead
37

Example Predictive Parser


(Execution Step 8)
type()

match(‘array’) match(‘[’) simple() match(‘]’) match(‘of’) type()

match(‘num’) match(‘dotdot’) match(‘num’) simple()

match(‘integer’)
Input: array [ num dotdot num ] of integer

lookahead
38

Adding a Lexical Analyzer


• Typical tasks of the lexical analyzer:
– Remove white space and comments
– Encode constants as tokens
– Recognize keywords
– Recognize identifiers and store identifier names
in a global symbol table
39

The Lexical Analyzer “lexer”

Lexical analyzer
y := 31 + 28*x
lexan()

<id, “y”> <assign, > <num, 31> <‘+’, > <num, 28> <‘*’, > <id, “x”>

token
(lookahead)
tokenval Parser
(token attribute) parse()

You might also like