CPSC 388 - Compiler Design and Construction: Parsers - Context Free Grammars

This document discusses parsers and context-free grammars (CFGs) for compiler design. It introduces parsers as tools for recognizing more types of languages than regular expressions or finite state automata. CFGs are used to define parsers and consist of terminals, non-terminals, productions, and a start symbol. Productions specify rewrite rules with non-terminals on the left and sequences of terminals/non-terminals on the right. The document provides examples of CFGs for boolean expressions, assignments, and IF statements. It discusses ambiguous grammars and how to avoid ambiguity through precedence and associativity rules in the grammar.

Uploaded by

Kashif Raffat

CPSC 388 – Compiler Design and Construction

Parsers – Context Free Grammars


Announcements
• HW3 due via Sakai
• Solution to HW3
• Homework: HW4 assigned today, due next Friday, via Sakai
• Reading: 4.1-4.4
• Progress Report Grades due Sept 28th


Compilers
Source Code → Lexical Analyzer (Scanner) → Token Stream → Syntax Analyzer (Parser) → Abstract Syntax Tree → Semantic Analyzer → Intermediate Code Generator → Optimizer → Code Generator
Scanner
Source Code:

position = initial + rate * 60 ;

Corresponding Tokens:

IDENT(position)
ASSIGN
IDENT(initial)
PLUS
IDENT(rate)
TIMES
INT-LIT(60)
SEMI-COLON
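
The scanning step above can be sketched in a few lines of Python. This is only an illustrative sketch: the token names come from the slide, but the regular expressions and the function name `scan` are assumptions made for this example.

```python
import re

# Token names follow the slide; the patterns themselves are assumed.
TOKEN_SPEC = [
    ("IDENT", r"[A-Za-z_]\w*"),
    ("INT-LIT", r"\d+"),
    ("ASSIGN", r"="),
    ("PLUS", r"\+"),
    ("TIMES", r"\*"),
    ("SEMI-COLON", r";"),
]


def scan(source):
    """Turn a source string into a list of token names (hypothetical helper)."""
    tokens = []
    pos = 0
    while pos < len(source):
        if source[pos].isspace():
            pos += 1  # whitespace separates tokens but is not a token
            continue
        for name, pattern in TOKEN_SPEC:
            match = re.compile(pattern).match(source, pos)
            if match:
                # Identifiers and literals carry their text; others do not.
                if name in ("IDENT", "INT-LIT"):
                    tokens.append(f"{name}({match.group()})")
                else:
                    tokens.append(name)
                pos = match.end()
                break
        else:
            raise ValueError(f"unexpected character {source[pos]!r} at {pos}")
    return tokens


print(scan("position = initial + rate * 60 ;"))
```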
Example Parse
Source Code:
position = initial + rate * 60 ;

Abstract-Syntax Tree:

          =
         / \
 position   +
           / \
    initial   *
             / \
         rate   60

• Interior nodes are operators.
• A node's children are operands.
• Each subtree forms a "logical unit", e.g., the subtree with * at its root shows that because multiplication has higher precedence than addition, this operation must be performed as a unit (not initial+rate).
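
One way to see that each subtree is a logical unit is to write the tree down as data and evaluate it bottom-up. The sketch below assumes hypothetical class names `Leaf` and `Node` and made-up values for `initial` and `rate`; none of these come from the slides.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    value: Union[str, int]  # identifier name or integer literal


@dataclass
class Node:
    op: str
    left: object
    right: object


# The AST for: position = initial + rate * 60 ;
AST = Node("=", Leaf("position"),
           Node("+", Leaf("initial"),
                Node("*", Leaf("rate"), Leaf(60))))


def evaluate(tree, env):
    """Evaluate a subtree; each operator waits for both operand subtrees."""
    if isinstance(tree, Leaf):
        return env[tree.value] if isinstance(tree.value, str) else tree.value
    if tree.op == "=":
        env[tree.left.value] = evaluate(tree.right, env)
        return env[tree.left.value]
    left = evaluate(tree.left, env)
    right = evaluate(tree.right, env)
    return left + right if tree.op == "+" else left * right


env = {"initial": 10, "rate": 2}  # assumed sample values
evaluate(AST, env)
print(env["position"])  # rate * 60 is computed first, then added to initial
```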
Limitations of RE and FSA
• Regular Expressions and Finite State Automata cannot express all languages
• For example, the language that consists of all strings of balanced parentheses: () and ((())) and (((((())))))
• Parsers can recognize more types of languages than RE or FSA
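
A finite state automaton has a fixed number of states, so it cannot keep track of an arbitrarily deep nesting level. The sketch below recognizes balanced parentheses with an unbounded counter, which is exactly the extra memory an FSA lacks. The function name `is_balanced` is an assumption for illustration.

```python
def is_balanced(text):
    """Return True if text consists only of properly nested parentheses."""
    depth = 0  # number of currently open parentheses (unbounded)
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a close with no matching open
                return False
        else:
            return False  # only parentheses belong to this language
    return depth == 0


print(is_balanced("((()))"), is_balanced("(()"))
```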
Parsers
• Input: sequence of tokens
• Output: a representation of the program
  • Often an AST, but could be other things
• Also finds syntax errors
• CFGs (Context Free Grammars) are used to define the parser
Context Free Grammars

stmt → if ( expr ) stmt else stmt

The whole line is a rule (or production). Terminals: if, (, ), else. Non-terminals: stmt, expr.


Context Free Grammar
CFGs consist of:

Σ – Set of Terminals (use tokens from scanner)


N – Set of Non-Terminals (variables)
P – Set of Productions (also called rules)
S – the Start Non-Terminal (one on left of
first rule if not specified)
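
The four components can be written down directly as a data structure. The sketch below is one possible encoding, not a standard one: the class name `CFG`, its field names, and the `is_well_formed` check are all assumptions for illustration. The example grammar holds just the single if-statement rule shown earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CFG:
    terminals: frozenset     # Σ
    nonterminals: frozenset  # N
    productions: tuple       # P: pairs (left-hand side, right-hand side tuple)
    start: str               # S

    def is_well_formed(self):
        """Check the basic constraints on the four components."""
        symbols = self.terminals | self.nonterminals
        return (
            self.start in self.nonterminals
            and not (self.terminals & self.nonterminals)
            and all(
                lhs in self.nonterminals and all(s in symbols for s in rhs)
                for lhs, rhs in self.productions
            )
        )


IF_GRAMMAR = CFG(
    terminals=frozenset({"if", "(", ")", "else"}),
    nonterminals=frozenset({"stmt", "expr"}),
    productions=(
        ("stmt", ("if", "(", "expr", ")", "stmt", "else", "stmt")),
    ),
    start="stmt",
)

print(IF_GRAMMAR.is_well_formed())
```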
Productions

stmt → if ( expr ) stmt else stmt

Left of →: a single non-terminal
Right of →: a sequence of zero or more terminals and non-terminals
Example with Boolean Expressions
• “true” and “false” are boolean expressions
• If exp1 and exp2 are boolean expressions, then so are:
  • exp1 || exp2
  • exp1 && exp2
  • ! exp1
  • ( exp1 )
Corresponding CFG
bexp → TRUE
bexp → FALSE
bexp → bexp OR bexp
bexp → bexp AND bexp
bexp → NOT bexp
bexp → LPAREN bexp RPAREN
UPPERCASE names represent tokens (thus terminals)
lowercase names represent non-terminals
CFG for Assignments
• Here is a CFG for simple assignment statements (can only assign boolean expressions to identifiers):
stmt → ID ASSIGN bexp SEMICOLON
CFG for simple IF statements
Combine these CFGs and add 2 more rules for
simple IF statements:
1. stmt → IF LPAREN bexp RPAREN stmt
2. stmt → IF LPAREN bexp RPAREN stmt ELSE stmt
3. stmt → ID ASSIGN bexp SEMICOLON
4. bexp → TRUE
5. bexp → FALSE
6. bexp → bexp OR bexp
7. bexp → bexp AND bexp
8. bexp → NOT bexp
9. bexp → LPAREN bexp RPAREN
You Try It
Write a context-free grammar for the
language of very simple while loops
(in which the loop body only contains
one statement) by adding a new
production with nonterminal stmt on
the left-hand side.
CFG Languages
• The language defined by a context-free grammar is the set of strings (sequences of terminals) that can be derived from the start nonterminal.
• Think of productions as rewriting rules:

Set cur_seq = starting non-terminal
While (non-terminal, X, exists in cur_seq):
    Select production with X on left of “→”
    Replace X with right portion of selected production

• Try it with the given CFG
What Strings are in Language
1. stmt → IF LPAREN bexp RPAREN stmt
2. stmt → IF LPAREN bexp RPAREN stmt ELSE stmt
3. stmt → ID ASSIGN bexp SEMICOLON
4. bexp → TRUE
5. bexp → FALSE
6. bexp → bexp OR bexp
7. bexp → bexp AND bexp
8. bexp → NOT bexp
9. bexp → LPAREN bexp RPAREN

Set cur_seq = starting non-terminal
While (non-terminal, X, exists in cur_seq):
    Select production with X on left of “→”
    Replace X with right portion of selected production
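
To explore which strings are in the language, the rewriting loop can be run exhaustively instead of by hand. The sketch below tries every production at each step and collects all terminal strings up to a length limit. The function name `strings_up_to` and the length cutoff are assumptions for illustration. The cutoff is safe here only because no production in this grammar has an empty right side, so a sequence never gets shorter.

```python
from collections import deque

# Productions 1-9 from the slide, grouped by left-hand side.
GRAMMAR = {
    "stmt": [
        ["IF", "LPAREN", "bexp", "RPAREN", "stmt"],
        ["IF", "LPAREN", "bexp", "RPAREN", "stmt", "ELSE", "stmt"],
        ["ID", "ASSIGN", "bexp", "SEMICOLON"],
    ],
    "bexp": [
        ["TRUE"],
        ["FALSE"],
        ["bexp", "OR", "bexp"],
        ["bexp", "AND", "bexp"],
        ["NOT", "bexp"],
        ["LPAREN", "bexp", "RPAREN"],
    ],
}


def strings_up_to(max_len, start="stmt"):
    """Collect every derivable terminal string with at most max_len tokens."""
    results = set()
    seen = {(start,)}
    queue = deque([(start,)])
    while queue:
        cur_seq = queue.popleft()
        # Find the leftmost non-terminal X in cur_seq, if any.
        idx = next((i for i, s in enumerate(cur_seq) if s in GRAMMAR), None)
        if idx is None:
            results.add(" ".join(cur_seq))  # only terminals remain
            continue
        # Try every production with X on the left of the arrow.
        for rhs in GRAMMAR[cur_seq[idx]]:
            new_seq = cur_seq[:idx] + tuple(rhs) + cur_seq[idx + 1:]
            if len(new_seq) <= max_len and new_seq not in seen:
                seen.add(new_seq)
                queue.append(new_seq)
    return results


for s in sorted(strings_up_to(5)):
    print(s)
```

Raising the limit shows the language growing quickly, since the recursive productions for bexp and stmt can be applied any number of times.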
Try It Again
exp → exp PLUS term
exp → exp MINUS term
exp → term
term → term TIMES factor
term → term DIVIDE factor
term → factor
factor → LPAREN exp RPAREN
factor → ID
What is the language?
Leftmost and Rightmost Derivations
• A derivation is a leftmost derivation if it is always the leftmost nonterminal that is chosen to be replaced.
• It is a rightmost derivation if it is always the rightmost one.
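
The two derivation orders can be compared on the exp/term/factor grammar from the previous slide. In this sketch the function name `derive` and the convention of selecting productions by list index are assumptions for illustration. Both orders reach the same string ID PLUS ID, but pass through different intermediate sequences.

```python
GRAMMAR = {
    "exp": [["exp", "PLUS", "term"], ["exp", "MINUS", "term"], ["term"]],
    "term": [["term", "TIMES", "factor"], ["term", "DIVIDE", "factor"], ["factor"]],
    "factor": [["LPAREN", "exp", "RPAREN"], ["ID"]],
}


def derive(choices, leftmost=True, start="exp"):
    """Apply one production per step and return every intermediate sequence.

    choices[k] is the index (into the production list of the nonterminal
    being replaced at step k) of the production to apply.
    """
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        positions = [i for i, s in enumerate(form) if s in GRAMMAR]
        i = positions[0] if leftmost else positions[-1]
        form[i:i + 1] = GRAMMAR[form[i]][choice]
        steps.append(" ".join(form))
    return steps


print("leftmost: ", " => ".join(derive([0, 2, 2, 1, 2, 1], leftmost=True)))
print("rightmost:", " => ".join(derive([0, 2, 1, 2, 2, 1], leftmost=False)))
```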
Derivation Notation
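• E => a means E derives a in exactly one step
• E =>* a means E derives a in zero or more steps
• E =>+ a means E derives a in one or more steps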
Parse Trees
Start with the start nonterminal.
Repeat:
    choose a leaf nonterminal X
    choose a production X → alpha
    the symbols in alpha become the children of X in the tree
until there are no more leaf nonterminals left.
The derived string is formed by reading the leaf nodes from left to right.
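
The tree-building procedure above can be sketched as follows, using the stmt/bexp grammar from earlier. The names `Node`, `build_tree`, and `derived_string` are assumptions for illustration. The sketch always expands the leftmost leaf nonterminal, and it relies on this grammar having no ε productions, so a nonterminal without children is always an unexpanded leaf.

```python
GRAMMAR = {
    "stmt": [
        ["IF", "LPAREN", "bexp", "RPAREN", "stmt"],
        ["IF", "LPAREN", "bexp", "RPAREN", "stmt", "ELSE", "stmt"],
        ["ID", "ASSIGN", "bexp", "SEMICOLON"],
    ],
    "bexp": [
        ["TRUE"], ["FALSE"],
        ["bexp", "OR", "bexp"], ["bexp", "AND", "bexp"],
        ["NOT", "bexp"], ["LPAREN", "bexp", "RPAREN"],
    ],
}


class Node:
    def __init__(self, symbol):
        self.symbol = symbol
        self.children = []


def leaf_nonterminals(node):
    """Yield unexpanded nonterminal leaves, left to right."""
    if node.symbol in GRAMMAR and not node.children:
        yield node
    for child in node.children:
        yield from leaf_nonterminals(child)


def build_tree(choices, start="stmt"):
    """Expand the leftmost leaf nonterminal once per entry in choices."""
    root = Node(start)
    for choice in choices:
        leaf = next(leaf_nonterminals(root))
        leaf.children = [Node(s) for s in GRAMMAR[leaf.symbol][choice]]
    if next(leaf_nonterminals(root), None) is not None:
        raise ValueError("tree still has unexpanded nonterminals")
    return root


def derived_string(node):
    """Read the leaf symbols from left to right."""
    if not node.children:
        return [node.symbol]
    return [s for child in node.children for s in derived_string(child)]


tree = build_tree([2, 2, 0, 1])
print(" ".join(derived_string(tree)))
```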
Ambiguous Grammars
• Consider the grammar
exp → exp PLUS exp
exp → exp MINUS exp
exp → exp TIMES exp
exp → exp DIVIDE exp
exp → INT_LIT
• Construct a parse tree for 3-4/2
• Is there more than one parse tree?
• If there is more than one parse tree for a string, then the grammar is ambiguous
• Ambiguity causes problems with parsing (what is the correct structure?)
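
The ambiguity can be checked mechanically by enumerating every parse tree the grammar allows. The sketch below writes each tree as a fully parenthesized string. The function name `parse_trees` and the use of the characters + - * / in place of the PLUS, MINUS, TIMES, and DIVIDE tokens are assumptions for illustration.

```python
OPERATORS = {"+", "-", "*", "/"}


def parse_trees(tokens):
    """Return all parse trees under exp → exp OP exp | INT_LIT."""
    if len(tokens) == 1:
        # exp → INT_LIT
        return [tokens[0]] if tokens[0].isdigit() else []
    trees = []
    # exp → exp OP exp: try every operator as the root of the tree.
    for i in range(1, len(tokens) - 1):
        if tokens[i] not in OPERATORS:
            continue
        for left in parse_trees(tokens[:i]):
            for right in parse_trees(tokens[i + 1:]):
                trees.append(f"({left} {tokens[i]} {right})")
    return trees


for tree in parse_trees("3 - 4 / 2".split()):
    print(tree)
```

Two trees come back for 3-4/2, one grouping the subtraction first and one grouping the division first, so the grammar is ambiguous.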
Precedence in Grammars
To write a grammar whose parse trees
express precedence correctly, use a
different nonterminal for each
precedence level. Start by writing a
rule for the operator(s) with the
lowest precedence ("-" in our case),
then write a rule for the operator(s)
with the next lowest precedence, etc:
Precedence in Grammars
exp → exp MINUS exp | term
term → term DIVIDE term | factor
factor → INT_LIT | LPAREN exp RPAREN
• Now try constructing multiple parse trees for 3-4/2
• Grammar is still ambiguous. Look at associativity. Construct two parse trees for 5-3-2.
Recursion on CFGs
• A grammar is recursive in nonterminal X if: X derives a sequence of symbols that includes an X.
• A grammar is left recursive in X if: X derives a sequence of symbols that starts with an X.
• A grammar is right recursive in X if: X derives a sequence of symbols that ends with an X.
• For left associativity, use left recursion.
• For right associativity, use right recursion.
Ambiguity Removed in CFG
exp → exp MINUS term | term
term → term DIVIDE factor | factor
factor → INT_LIT | LPAREN exp RPAREN
• One level for each order of operation
• Left recursion for left associativity
• Now try constructing 2 parse trees for 5-3-2.
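
With the unambiguous grammar, each string has exactly one tree, so a parser can build it deterministically. The sketch below is a small hand-written recursive-descent parser, which is a technique swapped in for illustration rather than anything these slides prescribe. A recursive-descent function cannot call itself first on a left-recursive rule without looping forever, so the left recursion in exp and term is replaced by a loop that builds the same left-leaning tree. The function name `parse` and the - / ( ) characters standing in for the tokens are assumptions.

```python
def parse(tokens):
    """Return the unique parse of tokens as a parenthesized string."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def exp():  # exp → exp MINUS term | term
        tree = term()
        while peek() == "-":
            advance()
            tree = f"({tree} - {term()})"  # grows to the left
        return tree

    def term():  # term → term DIVIDE factor | factor
        tree = factor()
        while peek() == "/":
            advance()
            tree = f"({tree} / {factor()})"
        return tree

    def factor():  # factor → INT_LIT | LPAREN exp RPAREN
        tok = peek()
        if tok is not None and tok.isdigit():
            return advance()
        if tok == "(":
            advance()
            tree = exp()
            if peek() != ")":
                raise SyntaxError("expected )")
            advance()
            return tree
        raise SyntaxError(f"unexpected token {tok!r}")

    tree = exp()
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing input")
    return tree


print(parse("5 - 3 - 2".split()))
print(parse("3 - 4 / 2".split()))
```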
You Try it
• Construct a grammar for arithmetic expressions with addition, multiplication, exponentiation (right assoc.), subtraction, division, and unary negative.
exp → exp PLUS exp |
exp MINUS exp |
exp TIMES exp |
exp DIVIDE exp |
exp POW exp |
MINUS exp |
LPAREN exp RPAREN |
INT_LIT
Solution
exp → exp PLUS term |
exp MINUS term |
term
term → term TIMES factor |
term DIVIDE factor |
factor
factor → exponent POW factor |
exponent
exponent → MINUS exponent |
final
final → INT_LIT |
LPAREN exp RPAREN
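
The precedence and associativity encoded in the solution grammar can be observed by evaluating expressions with it. The sketch below has one function per nonterminal. As an implementation choice not taken from the slides, the left-recursive rules are handled with loops, while the right-recursive POW rule is handled with a recursive call. The function name `evaluate`, the space-separated input format, and the characters + - * / ^ ( ) standing in for the tokens are assumptions.

```python
def evaluate(text):
    """Evaluate a space-separated expression using the solution grammar."""
    tokens = text.split()
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def exp():  # exp → exp PLUS term | exp MINUS term | term
        value = term()
        while peek() in ("+", "-"):
            op = take()
            rhs = term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term():  # term → term TIMES factor | term DIVIDE factor | factor
        value = factor()
        while peek() in ("*", "/"):
            op = take()
            rhs = factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor():  # factor → exponent POW factor | exponent
        base = exponent()
        if peek() == "^":
            take()
            return base ** factor()  # right recursion: right associative
        return base

    def exponent():  # exponent → MINUS exponent | final
        if peek() == "-":
            take()
            return -exponent()
        return final()

    def final():  # final → INT_LIT | LPAREN exp RPAREN
        tok = peek()
        if tok is not None and tok.isdigit():
            return int(take())
        if tok == "(":
            take()
            value = exp()
            if peek() != ")":
                raise SyntaxError("expected )")
            take()
            return value
        raise SyntaxError(f"unexpected token {tok!r}")

    value = exp()
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing input")
    return value


print(evaluate("2 ^ 3 ^ 2"))   # grouped as 2 ^ (3 ^ 2)
print(evaluate("- 2 ^ 2"))     # grouped as (-2) ^ 2 under this grammar
```

Note that in this grammar unary minus sits below POW, so it binds more tightly than exponentiation.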
You Try It
• Write a grammar for the language of boolean expressions, with two possible operands: true and false, and three possible operators: and, or, and not. Add nonterminals so that or has lowest precedence, then and, then not. Finally, change the grammar to reflect the fact that both and and or are left associative.
bexp → TRUE
bexp → FALSE
bexp → bexp OR bexp
bexp → bexp AND bexp
bexp → NOT bexp
bexp → LPAREN bexp RPAREN
List Grammars
• Several types of lists can be created using CFGs
• One or more x's (without any separator or terminator):
1. xList → X | xList xList
2. xList → X | xList X
3. xList → X | X xList
• One or more x's, separated by commas:
1. xList → X | xList COMMA xList
2. xList → X | xList COMMA X
3. xList → X | X COMMA xList
List Grammars
• One or more x's, each x terminated by a semi-colon:
  • You Try It
1. xList → X SEMICOLON | xList xList
2. xList → X SEMICOLON | xList X SEMICOLON
3. xList → X SEMICOLON | X SEMICOLON xList
• Zero or more x's (without any separator or terminator):
1. xList → ε | X | xList xList
2. xList → ε | X | xList X
3. xList → ε | X | X xList
List Grammars
• Zero or more x's, each terminated by a semi-colon:
1. xList → ε | X SEMICOLON | xList xList
2. xList → ε | X SEMICOLON | xList X SEMICOLON
3. xList → ε | X SEMICOLON | X SEMICOLON xList
• Zero or more x's, separated by commas:
  • You Try It
1. xList → ε | nonEmptyXList
nonEmptyXList → X | X COMMA nonEmptyXList
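
The last grammar translates almost line for line into a recognizer, with one function per nonterminal. The function names below are assumptions for illustration. The input is a list of token names.

```python
def is_x_list(tokens):
    """xList → ε | nonEmptyXList"""
    if not tokens:
        return True  # the ε alternative
    return is_non_empty_x_list(tokens)


def is_non_empty_x_list(tokens):
    """nonEmptyXList → X | X COMMA nonEmptyXList"""
    if not tokens or tokens[0] != "X":
        return False
    if len(tokens) == 1:
        return True  # the single-X alternative
    # Otherwise a COMMA and another non-empty list must follow.
    return tokens[1] == "COMMA" and is_non_empty_x_list(tokens[2:])


print(is_x_list(["X", "COMMA", "X"]))  # well-formed list
print(is_x_list(["X", "COMMA"]))       # trailing comma is rejected
```

Splitting off nonEmptyXList is what keeps a trailing or leading comma out of the language while still allowing the empty list.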
CFGs for Whole Languages
• To write a grammar for a whole programming language, break down the problem into pieces. For example, think about a Java program: a program consists of one or more classes:
program → classlist
classlist → class | class classlist
CFGs for Whole Languages
• A class is the word "class", optionally preceded by the word "public", followed by an identifier, followed by an open curly brace, followed by the class body, followed by a closing curly brace:
class → PUBLIC CLASS ID LCURLY classbody RCURLY |
CLASS ID LCURLY classbody RCURLY
CFGs for Whole Languages
• A class body is a list of zero or more field and/or method definitions:
classbody → ε | deflist
deflist → def | def deflist

And So On…
