[Week 3] Syntax Analysis (Derivation)

Uploaded by

ikechukwujo45
CSC 318 – COMPILER CONSTRUCTION I (2-UNITS)

Week Three:
Syntax Analysis and Context-Free Grammar (CFG)

The CSC318 Team [2020|2021]


Recommended Texts for this Course
1. Watson, Des (2017). A Practical Approach to Compiler Construction, Springer. [Main Text]

2. Campbell, Bill; Iyer, Swami; and Akbal-Delibas, Bahar (2013). Introduction to Compiler Construction in a Java World, Taylor & Francis Group.

3. Seidl, Helmut; Wilhelm, Reinhard; and Hack, Sebastian (2012). Compiler Design: Analysis and Transformation, Springer.

4. Grune, Dick; Reeuwijk, Kees van; Bal, Henri E.; Jacobs, Ceriel J. H.; and Langendoen, Koen (2012). Modern Compiler Design, Second Edition, Springer.

5. Reis, Anthony Dos (2011). Compiler Construction Using Java, JavaCC, and Yacc, Wiley.

Recap: From Description to Implementation

✓ Lexical analysis (Scanning): Identify logical pieces of the description.
✓ Syntax analysis (Parsing): Identify how those pieces relate to each other.
✓ Semantic analysis: Identify the meaning of the overall structure.
✓ IR Generation: Design one possible structure.
✓ IR Optimization: Simplify the intended structure.
✓ Code Generation: Fabricate the structure.
✓ Optimization: Improve the resulting structure.

Tokens Versus Terminals
❖ In a compiler, the lexical analyzer reads the characters of the source program, groups them into lexically meaningful units called lexemes, and produces as output tokens representing these lexemes.

❖ A token consists of two components, a token name and an attribute value.

❖ The token names are abstract symbols that are used by the parser for syntax analysis.

❖ Often, we shall call these token names terminals, since they appear as terminal symbols in the grammar for a programming language.

❖ The attribute value, if present, is a pointer to the symbol table that contains additional information about the token.

In-Class Exercise 1

int main ()
{
int a = 0;
cout << a << endl;
return 1;
}
• Given the sample code above, list and describe the tokens.

Tokens for In-Class Exercise 1


➢ Reserved word (int)
➢ Identifier (main)
➢ Left parenthesis
➢ Right parenthesis
➢ Left curly brace
➢ Reserved word (int)
➢ Identifier (a)
➢ Equals
➢ Integer (0)
➢ Semi-colon
➢ Identifier (cout)
➢ Double left angle bracket
➢ Identifier (a)
➢ Double left angle bracket
➢ Identifier (endl)
➢ Semi-colon
➢ Reserved word (return)
➢ Integer (1)
➢ Semi-colon
➢ Right curly brace
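
The token list for the exercise can be reproduced mechanically. The sketch below is a minimal regular-expression scanner, not a full C++ lexer. The token names (RESERVED, IDENTIFIER, SHIFT_LEFT and so on) are made up for this illustration, and the sketch assumes that int and return are treated as reserved words.

```python
import re

# Hypothetical token categories for this sketch; a real C++ lexer has many more.
KEYWORDS = {"int", "return"}
TOKEN_SPEC = [
    ("INTEGER", r"\d+"),
    ("NAME", r"[A-Za-z_]\w*"),
    ("SHIFT_LEFT", r"<<"),
    ("EQUALS", r"="),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("LBRACE", r"\{"),
    ("RBRACE", r"\}"),
    ("SEMICOLON", r";"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Group characters into lexemes and return (token name, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at offset {pos}")
        kind, lexeme = match.lastgroup, match.group()
        pos = match.end()
        if kind == "SKIP":
            continue  # whitespace separates lexemes but is not a token
        if kind == "NAME":
            kind = "RESERVED" if lexeme in KEYWORDS else "IDENTIFIER"
        tokens.append((kind, lexeme))
    return tokens


SOURCE = """int main ()
{
int a = 0;
cout << a << endl;
return 1;
}"""
TOKENS = tokenize(SOURCE)
for kind, lexeme in TOKENS:
    print(kind, lexeme)
```

Each printed pair is a token name together with its lexeme; in a real compiler the lexeme of an identifier would usually be replaced by a pointer into the symbol table.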

Tree Terminology
❖ Tree data structures figure prominently in compiling.
• A tree consists of one or more nodes. Nodes may have labels, which typically will be grammar symbols. When we draw a tree, we often represent the nodes by these labels only.
• Exactly one node is the root. All nodes except the root have a unique parent; the root has no parent. When we draw trees, we place the parent of a node above that node and draw an edge between them. The root is then the highest (top) node.
• If node N is the parent of node M, then M is a child of N. The children of one node are called siblings. They have an order, from the left, and when we draw trees, we order the children of a given node in this manner.
• A node with no children is called a leaf. Other nodes (those with one or more children) are interior nodes.
• A descendant of a node N is either N itself, a child of N, a child of a child of N, and so on, for any number of levels. We say node N is an ancestor of node M if M is a descendant of N.

Definition of Grammars
A context-free grammar (CFG) has four components:

1. A set of terminal symbols, sometimes referred to as "tokens." The terminals are the elementary symbols
of the language defined by the grammar.

2. A set of nonterminals, sometimes called "syntactic variables." Each nonterminal represents a set of
strings of terminals, in a manner we shall describe.

3. A set of productions, where each production consists of a nonterminal, called the head or left side of the
production, an arrow, and a sequence of terminals and/or nonterminals, called the body or right side of
the production. The intuitive intent of a production is to specify one of the written forms of a construct; if
the head nonterminal represents a construct, then the body represents a written form of the construct.

4. A designation of one of the nonterminals as the START symbol.



Syntax Definition
❖ A context-free grammar is a 4-tuple with
➢ A set of tokens (terminal symbols), T
➢ A set of nonterminals, N
➢ A set of productions, P, each of the form
   $X \rightarrow Y_1 Y_2 \ldots Y_n$
   where $X \in N$ and $Y_i \in T \cup N \cup \{\varepsilon\}$
➢ A designated start symbol S (a nonterminal)


Example CFGs: Simple Arithmetic Expressions

In English:
❖ An integer is an arithmetic expression.
❖ If exp1 and exp2 are arithmetic expressions, then so are the following:
• exp1 - exp2
• exp1 / exp2
• ( exp1 )

The corresponding CFG             Writing the tokens as symbols
• exp → INTLITERAL                E → intlit
• exp → exp MINUS exp             E → E - E
• exp → exp DIVIDE exp            E → E / E
• exp → LPAREN exp RPAREN         E → ( E )
Example of a Grammar
❖ Context-free grammar for simple expressions:

G = <{list,digit}, {+,-,0,1,2,3,4,5,6,7,8,9}, P, list>

with productions P =
list → list + digit
list → list - digit
list → digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
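
The four components of the example grammar map directly onto plain data structures. The following is a minimal sketch, assuming a representation where each production body is a list of symbols; the names NONTERMINALS, TERMINALS, PRODUCTIONS, START and the helper is_well_formed are chosen for this illustration only.

```python
# The example grammar G = <{list, digit}, {+, -, 0..9}, P, list> as Python data.
NONTERMINALS = {"list", "digit"}
TERMINALS = set("+-0123456789")
PRODUCTIONS = {
    "list": [["list", "+", "digit"], ["list", "-", "digit"], ["digit"]],
    "digit": [[d] for d in "0123456789"],
}
START = "list"


def is_well_formed(nonterminals, terminals, productions, start):
    """Check the structural conditions of the 4-tuple definition."""
    if start not in nonterminals:
        return False  # the start symbol must be a nonterminal
    if nonterminals & terminals:
        return False  # the two symbol sets must be disjoint
    symbols = nonterminals | terminals
    for head, bodies in productions.items():
        if head not in nonterminals:
            return False  # every production head is a nonterminal
        for body in bodies:
            if any(symbol not in symbols for symbol in body):
                return False  # bodies use only known grammar symbols
    return True


print(is_well_formed(NONTERMINALS, TERMINALS, PRODUCTIONS, START))
```

An empty body (an empty list) would stand for an ε-production in this representation; the example grammar has none.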

Derivation
❖ Given a context-free grammar, we can determine the set of all strings (sequences of tokens) generated by the grammar using derivation.

❖ A derivation is a sequence of sentential forms, S ⇒ … ⇒ … ⇒ …, in which each step applies one production.

• We begin with the start symbol, S.
• In each step, we replace one nonterminal in the current sentential form with one of the right-hand sides of a production for that nonterminal.
• A derivation can be drawn as a tree:
  o The start symbol is the tree’s root.
  o For a production X → Y1 … Yn, add children Y1 … Yn to node X.
Derivation for the Example Grammar
list
⇒ list + digit
⇒ list - digit + digit
⇒ digit - digit + digit
⇒ 9 - digit + digit
⇒ 9 - 5 + digit
⇒ 9 - 5 + 2
This is an example of a leftmost derivation, because we replaced the leftmost nonterminal in each step.
Likewise, a rightmost derivation replaces the rightmost nonterminal in each step.
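
The leftmost derivation of 9 - 5 + 2 can be replayed step by step. This is a small sketch, assuming the list/digit grammar is stored as a dictionary from nonterminal to bodies; the function name leftmost_derivation and the CHOICES list are illustrative, and the caller supplies which production to apply at each step.

```python
PRODUCTIONS = {
    "list": [["list", "+", "digit"], ["list", "-", "digit"], ["digit"]],
    "digit": [[d] for d in "0123456789"],
}


def leftmost_derivation(start, choices):
    """Apply each chosen production body to the leftmost nonterminal."""
    form = [start]
    steps = [" ".join(form)]
    for body in choices:
        # Locate the leftmost nonterminal in the current sentential form.
        index = next(i for i, symbol in enumerate(form) if symbol in PRODUCTIONS)
        head = form[index]
        if body not in PRODUCTIONS[head]:
            raise ValueError(f"{head} -> {' '.join(body)} is not a production")
        form = form[:index] + body + form[index + 1:]
        steps.append(" ".join(form))
    return steps


CHOICES = [
    ["list", "+", "digit"],
    ["list", "-", "digit"],
    ["digit"],
    ["9"],
    ["5"],
    ["2"],
]
STEPS = leftmost_derivation("list", CHOICES)
print("\n=> ".join(STEPS))
```

Replacing the `next(...)` search with one that scans from the right would give a rightmost derivation instead.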
Derivation: An example

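❖ CFG:
   E → id
   E → E + E
   E → E * E
   E → ( E )

❖ Derivation:
   E ⇒ E + E
     ⇒ E * E + E
     ⇒ id * E + E
     ⇒ id * id + E
     ⇒ id * id + id

• Is the string id * id + id in the language defined by the grammar? Yes: the derivation above generates it from the start symbol E.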
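
For a small grammar without ε-productions, the membership question can also be answered by brute force. The sketch below searches breadth-first over leftmost derivations and discards any sentential form that is longer than the target or whose terminal prefix disagrees with it. The function name derives is an assumption of this example, and the approach is only practical for short strings; real parsers use far more efficient algorithms.

```python
from collections import deque

PRODUCTIONS = {
    "E": [["id"], ["E", "+", "E"], ["E", "*", "E"], ["(", "E", ")"]],
}


def derives(start, target):
    """Return True if some leftmost derivation from start yields target."""
    target = tuple(target)
    seen = {(start,)}
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        if form == target:
            return True
        index = next((i for i, s in enumerate(form) if s in PRODUCTIONS), None)
        if index is None:
            continue  # only terminals left, but not the target string
        if form[:index] != target[:index]:
            continue  # the terminal prefix can never change again
        for body in PRODUCTIONS[form[index]]:
            new_form = form[:index] + tuple(body) + form[index + 1:]
            # Without epsilon-productions a form never gets shorter.
            if len(new_form) <= len(target) and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return False


print(derives("E", ["id", "*", "id", "+", "id"]))
print(derives("E", ["id", "+", "*", "id"]))
```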
Syntax Analyzer (Parser)

❖ Groups tokens together to form grammatical phrases


• Builds a structure to capture the program
(Abstract Syntax Tree)
• Interior nodes – operators
• Children - operands
o Example: a = a * 5;

Parse Trees

• The root of the tree is labeled by the start symbol.

• Each leaf of the tree is labeled by a terminal (= token) or ε.

• Each interior node is labeled by a nonterminal.

• If A → X1 X2 … Xn is a production, then node A has immediate children X1, X2, …, Xn, where each Xi is a terminal, a nonterminal, or ε (ε denotes the empty string).


Parse Tree for the Example Grammar

Parse tree of the string 9-5+2 using grammar G:

list
├── list
│   ├── list
│   │   └── digit
│   │       └── 9
│   ├── -
│   └── digit
│       └── 5
├── +
└── digit
    └── 2

The sequence of leaves, 9 - 5 + 2, is called the yield of the parse tree.
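
The yield can be computed by reading the leaves from left to right. A minimal sketch, assuming the parse tree is encoded as nested tuples of the form (label, child, child, ...) with plain strings as leaves; the encoding and the name tree_yield are choices made for this example.

```python
# Parse tree for 9-5+2 under the list/digit grammar.
TREE = (
    "list",
    ("list",
     ("list", ("digit", "9")),
     "-",
     ("digit", "5")),
    "+",
    ("digit", "2"),
)


def tree_yield(node):
    """Collect the leaf labels of a parse tree in left-to-right order."""
    if isinstance(node, str):
        return [node]  # a leaf is a terminal symbol
    _label, *children = node
    leaves = []
    for child in children:
        leaves.extend(tree_yield(child))
    return leaves


print(" ".join(tree_yield(TREE)))
```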
Ambiguity

❖ Consider the following context-free grammar:

G = <{string}, {+,-,0,1,2,3,4,5,6,7,8,9}, P, string>

with productions P =
string → string + string | string - string | 0 | 1 | … | 9

This grammar is ambiguous, because more than one parse tree represents the string 9-5+2.
Ambiguity (cont’d)

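Parse tree 1, grouping (9 - 5) + 2:

string
├── string
│   ├── string
│   │   └── 9
│   ├── -
│   └── string
│       └── 5
├── +
└── string
    └── 2

Parse tree 2, grouping 9 - (5 + 2):

string
├── string
│   └── 9
├── -
└── string
    ├── string
    │   └── 5
    ├── +
    └── string
        └── 2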
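
The ambiguity can be checked by enumerating every parse tree of 9-5+2 under the string grammar. The sketch below tries each operator as the root of the tree and recurses on both sides; the tuple encoding (left, operator, right) and the function names are assumptions of this illustration. Evaluating the trees shows why ambiguity matters: the two trees give different values.

```python
def parse_trees(tokens):
    """Return all parse trees for string -> string + string | string - string | digit."""
    if len(tokens) == 1:
        return [tokens[0]] if tokens[0].isdigit() else []
    trees = []
    for i in range(1, len(tokens) - 1):
        if tokens[i] in ("+", "-"):
            # Treat the operator at position i as the root of the tree.
            for left in parse_trees(tokens[:i]):
                for right in parse_trees(tokens[i + 1:]):
                    trees.append((left, tokens[i], right))
    return trees


def evaluate(tree):
    """Compute the arithmetic value that a parse tree denotes."""
    if isinstance(tree, str):
        return int(tree)
    left, op, right = tree
    if op == "+":
        return evaluate(left) + evaluate(right)
    return evaluate(left) - evaluate(right)


TREES = parse_trees(list("9-5+2"))
for tree in TREES:
    print(tree, "=", evaluate(tree))
```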

Associativity of Operators
• Left-associative operators have left-recursive productions

left → left + term | term

String a+b+c has the same meaning as (a+b)+c

• Right-associative operators have right-recursive productions

right → term = right | term

String a=b=c has the same meaning as a=(b=c)
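
The shape of the production decides the shape of the tree. In the sketch below, the left-recursive rule is realised as a loop that keeps extending the tree on the left, while the right-recursive rule is realised as direct recursion. Both functions are illustrative helpers that assume a well-formed token list alternating operands and operators.

```python
def parse_left(tokens):
    """left -> left + term | term, built so that the tree nests to the left."""
    tree = tokens[0]
    for i in range(1, len(tokens), 2):
        tree = (tree, tokens[i], tokens[i + 1])
    return tree


def parse_right(tokens):
    """right -> term = right | term, built so that the tree nests to the right."""
    if len(tokens) == 1:
        return tokens[0]
    return (tokens[0], tokens[1], parse_right(tokens[2:]))


print(parse_left(["a", "+", "b", "+", "c"]))
print(parse_right(["a", "=", "b", "=", "c"]))
```

The first result groups as (a+b)+c and the second as a=(b=c), matching the two statements above.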
Precedence of Operators
❖ Operators with higher precedence “bind more tightly”
▪ expr → expr + term | term
▪ term → term * factor | factor
▪ factor → number | ( expr )
▪ String 2+3*5 has the same meaning as 2+(3*5)

The parse tree:

expr
├── expr
│   └── term
│       └── factor
│           └── number
│               └── 2
├── +
└── term
    ├── term
    │   └── factor
    │       └── number
    │           └── 3
    ├── *
    └── factor
        └── number
            └── 5
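
One way to see the precedence levels at work is a small recursive-descent evaluator with one function per nonterminal. This is a sketch under stated assumptions, not the method of the course text: the left-recursive productions are rewritten as loops (a naive recursive function for expr → expr + term would never terminate), and the class and function names are invented for this example.

```python
import re


def tokenize(text):
    # Keeps numbers, '+', '*', '(' and ')'; any other character is ignored.
    return re.findall(r"\d+|[+*()]", text)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        # expr -> expr + term | term, with the left recursion turned into a loop
        value = self.term()
        while self.peek() == "+":
            self.advance()
            value += self.term()
        return value

    def term(self):
        # term -> term * factor | factor
        value = self.factor()
        while self.peek() == "*":
            self.advance()
            value *= self.factor()
        return value

    def factor(self):
        # factor -> number | ( expr )
        token = self.advance()
        if token == "(":
            value = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        if token is not None and token.isdigit():
            return int(token)
        raise SyntaxError(f"unexpected token {token!r}")


def evaluate(text):
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError("unexpected trailing input")
    return value


print(evaluate("2+3*5"))
print(evaluate("(2+3)*5"))
```

Because term is called from inside expr, a multiplication is always completed before the surrounding addition, which is exactly what "binds more tightly" means.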
Syntax of Statements
stmt → id := expr
     | if expr then stmt
     | if expr then stmt else stmt
     | while expr do stmt
     | begin opt_stmts end
opt_stmts → stmt ; opt_stmts
          | ε

Syntax-Directed Translation (SDT)


• Uses a CFG to specify the syntactic structure of the language.

• AND associates a set of attributes with the terminals and nonterminals of the grammar.

• AND associates with each production a set of semantic rules to compute values of attributes.

• A parse tree is traversed and semantic rules applied:
  o After the computations are completed, the attributes contain the translated form of the input.

Synthesized and Inherited Attributes

• An attribute is said to be:


o synthesized if its value at a parse-tree node is
determined from the attribute values at the children
of the node
o inherited if its value at a parse-tree node is
determined by the parent (by enforcing the parent’s
semantic rules)
Example Attribute Grammar

(// is the string concatenation operator)

Production               Semantic Rule
expr → expr1 + term      expr.t := expr1.t // term.t // “+”
expr → expr1 - term      expr.t := expr1.t // term.t // “-”
expr → term              expr.t := term.t
term → 0                 term.t := “0”
term → 1                 term.t := “1”
…                        …
term → 9                 term.t := “9”
Example: Annotated Parse Tree

Annotated parse tree for 9 - 5 + 2:

expr.t = “95-2+”
├── expr.t = “95-”
│   ├── expr.t = “9”
│   │   └── term.t = “9”
│   │       └── 9
│   ├── -
│   └── term.t = “5”
│       └── 5
├── +
└── term.t = “2”
    └── 2
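
The synthesized attribute t can be computed by a bottom-up walk over the parse tree, applying the semantic rule of the production used at each node. A minimal sketch, assuming the same nested-tuple tree encoding (label, children...) as an illustration device; Python's + on strings plays the role of the // concatenation operator.

```python
# Parse tree for 9-5+2 under the expr/term grammar.
TREE = (
    "expr",
    ("expr",
     ("expr", ("term", "9")),
     "-",
     ("term", "5")),
    "+",
    ("term", "2"),
)


def synthesize_t(node):
    """Return the attribute t of a node, computed from its children."""
    label = node[0]
    if label == "term":
        return node[1]  # term -> digit : term.t := the digit itself
    if len(node) == 2:
        return synthesize_t(node[1])  # expr -> term : expr.t := term.t
    _label, left, operator, right = node
    # expr -> expr1 op term : expr.t := expr1.t // term.t // op
    return synthesize_t(left) + synthesize_t(right) + operator


print(synthesize_t(TREE))
```

The result is the postfix form of the input, so this attribute grammar translates infix expressions into postfix notation.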

In-Class Exercise

❖ Consider the context-free grammar

G = < {S}, {a, −, ∗}, P, S >
with productions P: S → S − | S S ∗ | a

▪ Show how the string a − a ∗ − a ∗ can be generated by G (using derivation).
September 7, 2017 CSC318 - Compiler Construction (Omogbadegun, 2017)
The Structure of a Modern Compiler

Source Code
  → Lexical Analysis
  → Syntax Analysis
  → Semantic Analysis
  → IR Generation
  → IR Optimization
  → Code Generation
  → Optimization
  → Machine Code
Running example, traced through each phase:

while (y < z)
{
    int x = a + b;
    y += x;
}
Lexical Analysis: token stream

T_While
T_LeftParen
T_Identifier y
T_Less
T_Identifier z
T_RightParen
T_OpenBrace
T_Int
T_Identifier x
T_Assign
T_Identifier a
T_Plus
T_Identifier b
T_Semicolon
T_Identifier y
T_PlusAssign
T_Identifier x
T_Semicolon
T_CloseBrace
Syntax Analysis: abstract syntax tree

While
├── <
│   ├── y
│   └── z
└── Sequence
    ├── =
    │   ├── x
    │   └── +
    │       ├── a
    │       └── b
    └── =
        ├── y
        └── +
            ├── y
            └── x
Semantic Analysis: abstract syntax tree annotated with types

While : void
├── < : bool
│   ├── y : int
│   └── z : int
└── Sequence : void
    ├── = : int
    │   ├── x : int
    │   └── + : int
    │       ├── a : int
    │       └── b : int
    └── = : int
        ├── y : int
        └── + : int
            ├── y : int
            └── x : int
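
The type annotations can be derived by a bottom-up pass over the abstract syntax tree. The sketch below is a toy type checker, assuming all five variables are declared int and using a simplified rule set (+ yields int, < yields bool, = yields the type of its operands, Sequence and While yield void). The tuple encoding, the SYMBOL_TYPES table and the rules are assumptions made for this illustration, not a full language definition.

```python
# Hypothetical symbol table: every variable in the example is an int.
SYMBOL_TYPES = {name: "int" for name in "abxyz"}

AST = (
    "While",
    ("<", "y", "z"),
    ("Sequence",
     ("=", "x", ("+", "a", "b")),
     ("=", "y", ("+", "y", "x"))),
)


def type_of(node):
    """Return the type of an AST node, raising TypeError on a mismatch."""
    if isinstance(node, str):
        return SYMBOL_TYPES[node]  # a leaf is a variable name
    op, *children = node
    child_types = [type_of(child) for child in children]
    if op == "+":
        if child_types != ["int", "int"]:
            raise TypeError("'+' expects two int operands")
        return "int"
    if op == "<":
        if child_types != ["int", "int"]:
            raise TypeError("'<' expects two int operands")
        return "bool"
    if op == "=":
        if child_types[0] != child_types[1]:
            raise TypeError("both sides of '=' must have the same type")
        return child_types[0]
    if op == "Sequence":
        return "void"
    if op == "While":
        if child_types[0] != "bool":
            raise TypeError("loop condition must be bool")
        return "void"
    raise ValueError(f"unknown node kind {op!r}")


print(type_of(AST))
```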
IR Generation:

Loop: x = a + b
      y = x + y
      _t1 = y < z
      if _t1 goto Loop

IR Optimization:

      x = a + b
Loop: y = x + y
      _t1 = y < z
      if _t1 goto Loop
Code Generation:

      add $1, $2, $3
Loop: add $4, $1, $4
      slt $6, $1, $5
      beq $6, Loop
LECTURE 3: FURTHER READING & ASSIGNMENT

Kindly study Chapter 4 of the main text (Des Watson), pages 75-91, for further background on the lecture topic.


Assignment
❖ Consider the grammar G = < {S}, {a, b}, P, S > with productions
P: S → aSbS | bSaS | ε

o Show that this grammar is ambiguous by finding an example string of terminals a and b that results in two or more parse trees.
