[Week 3] Syntax Analysis (Derivation)

CSC 318 – COMPILER CONSTRUCTION I (2-UNITS)

Week Three:
Syntax Analysis and Context-Free Grammar (CFG)

The CSC318 Team [2020|2021]


Recommended Texts for this Course
1. Watson, Des (2017). A Practical Approach to Compiler Construction. Springer. [Main text]

2. Campbell, Bill; Iyer, Swami; and Akbal-Delibas, Bahar (2013). Introduction to Compiler Construction in a Java World. Taylor & Francis Group.

3. Seidl, Helmut; Wilhelm, Reinhard; and Hack, Sebastian (2012). Compiler Design: Analysis and Transformation. Springer.

4. Grune, Dick; Reeuwijk, Kees van; Bal, Henri E.; Jacobs, Ceriel J.H.; and Langendoen, Koen (2012). Modern Compiler Design, Second Edition. Springer.

5. Reis, Anthony Dos (2011). Compiler Construction Using Java, JavaCC, and Yacc. Wiley.

Recap: From Description to Implementation

✓ Lexical analysis (Scanning): Identify logical pieces of the description.
✓ Syntax analysis (Parsing): Identify how those pieces relate to each other.
✓ Semantic analysis: Identify the meaning of the overall structure.
✓ IR Generation: Design one possible structure.
✓ IR Optimization: Simplify the intended structure.
✓ Code Generation: Fabricate the structure.
✓ Optimization: Improve the resulting structure.

Tokens Versus Terminals
❖ In a compiler, the lexical analyzer reads the characters of the source program, groups them into lexically meaningful units called lexemes, and produces as output tokens representing these lexemes.

❖ A token consists of two components, a token name and an attribute value.

❖ The token names are abstract symbols that are used by the parser for syntax analysis.

❖ Often, we shall call these token names terminals, since they appear as terminal symbols in the grammar for a programming language.
❖ The attribute value, if present, is a pointer to the symbol-table entry that contains additional information about the token.
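A minimal C++ sketch of the token/attribute split described above, assuming a symbol table that stores one entry per distinct identifier lexeme. The names TokenName, Token, SymbolTable and makeIdToken are hypothetical, and the attribute is modelled as an index into the table rather than a raw pointer (an index stays valid when the table grows).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical token names; a real lexer defines its own set.
enum class TokenName { Id, Number, Assign, Semicolon };

// A token = (token name, attribute value). For an identifier the attribute
// is an index into the symbol table, standing in for a pointer to the entry.
struct Token {
    TokenName name;
    int attribute;  // symbol-table index for Id; -1 when there is no attribute
};

// One entry per distinct lexeme; later phases would add type, scope, etc.
struct SymbolEntry {
    std::string lexeme;
};

class SymbolTable {
public:
    // Returns the index of the entry for `lexeme`, creating it on first use.
    int install(const std::string& lexeme) {
        auto it = index_.find(lexeme);
        if (it != index_.end()) return it->second;
        entries_.push_back(SymbolEntry{lexeme});
        const int idx = static_cast<int>(entries_.size()) - 1;
        index_.emplace(lexeme, idx);
        return idx;
    }
    const SymbolEntry& at(int idx) const {
        assert(idx >= 0 && static_cast<std::size_t>(idx) < entries_.size());
        return entries_[static_cast<std::size_t>(idx)];
    }
    std::size_t size() const { return entries_.size(); }

private:
    std::vector<SymbolEntry> entries_;
    std::unordered_map<std::string, int> index_;
};

// Builds the token for an identifier lexeme, installing the lexeme if needed.
Token makeIdToken(SymbolTable& table, const std::string& lexeme) {
    return Token{TokenName::Id, table.install(lexeme)};
}
```

With this design, two occurrences of the same identifier produce tokens that share one attribute value, so every later phase reaches the same symbol-table information for both.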

In-Class Exercise 1

int main ()
{
int a = 0;
cout << a << endl;
return 1;
}
• Given the sample code above, list and describe the tokens.

Tokens for In-Class Exercise 1

➢ Reserved word (int)
➢ Identifier (main)
➢ Left parenthesis
➢ Right parenthesis
➢ Left curly brace
➢ Reserved word (int)
➢ Identifier (a)
➢ Equals
➢ Integer (0)
➢ Semi-colon
➢ Identifier (cout)
➢ Double left angle bracket
➢ Identifier (a)
➢ Double left angle bracket
➢ Identifier (endl)
➢ Semi-colon
➢ Reserved word (return)
➢ Integer (1)
➢ Semi-colon
➢ Right curly brace

Tree Terminology
❖ Tree data structures figure prominently in compiling.
• A tree consists of one or more nodes. Nodes may have labels, which typically will be grammar symbols. When we draw a tree, we often represent the nodes by these labels only.
• Exactly one node is the root. All nodes except the root have a unique parent; the root has no parent. When we draw trees, we place the parent of a node above that node and draw an edge between them. The root is then the highest (top) node.
• If node N is the parent of node M, then M is a child of N. The children of one node are called siblings. They have an order, from the left, and when we draw trees, we order the children of a given node in this manner.
• A node with no children is called a leaf. Other nodes, those with one or more children, are interior nodes.
• A descendant of a node N is either N itself, a child of N, a child of a child of N, and so on, for any number of levels. We say node N is an ancestor of node M if M is a descendant of N.
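These definitions map directly onto a small data structure. The C++ sketch below assumes heap-allocated nodes owned by their parent; Node, addChild and isAncestor are hypothetical names, not part of any library.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// An ordered, labelled tree node. Children are stored left to right, and
// every node except the root records its unique parent.
struct Node {
    std::string label;                            // typically a grammar symbol
    Node* parent = nullptr;                       // nullptr only for the root
    std::vector<std::unique_ptr<Node>> children;  // siblings, in left-to-right order

    explicit Node(std::string l) : label(std::move(l)) {}

    // Appends a new rightmost child and returns a pointer to it.
    Node* addChild(const std::string& l) {
        children.push_back(std::make_unique<Node>(l));
        children.back()->parent = this;
        return children.back().get();
    }
    bool isRoot() const { return parent == nullptr; }
    bool isLeaf() const { return children.empty(); }  // otherwise an interior node
};

// True if n is an ancestor of m, i.e. m is a descendant of n. Following the
// definition above, every node counts as its own descendant and ancestor.
bool isAncestor(const Node* n, const Node* m) {
    for (const Node* cur = m; cur != nullptr; cur = cur->parent)
        if (cur == n) return true;
    return false;
}
```

Storing the parent pointer makes the ancestor test a simple walk up from M, which terminates at the root because only the root has no parent.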

Definition of Grammars
A context-free grammar (CFG) has four components:

1. A set of terminal symbols, sometimes referred to as "tokens." The terminals are the elementary symbols
of the language defined by the grammar.

2. A set of nonterminals, sometimes called "syntactic variables." Each nonterminal represents a set of
strings of terminals, in a manner we shall describe.

3. A set of productions, where each production consists of a nonterminal, called the head or left side of the
production, an arrow, and a sequence of terminals and/or nonterminals, called the body or right side of
the production. The intuitive intent of a production is to specify one of the written forms of a construct; if
the head nonterminal represents a construct, then the body represents a written form of the construct.

4. A designation of one of the nonterminals as the START symbol.



Syntax Definition
❖ A context-free grammar is a 4-tuple with
➢ A set of tokens (terminal symbols), T
➢ A set of nonterminals, N
➢ A set of productions, P, each of the form
$X \to Y_1 Y_2 \cdots Y_n$, where $X \in N$ and $Y_i \in T \cup N \cup \{\varepsilon\}$
➢ A designated start symbol S (a nonterminal)
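The 4-tuple can be written down directly as a data structure. This C++ sketch (the type and function names are hypothetical) also checks the conditions in the definition: S ∈ N, every head is in N, and every body symbol is in T ∪ N. It additionally assumes that T and N are disjoint, and it encodes a production X → ε as an empty body.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

using Symbol = std::string;

// A production X -> Y1 Y2 ... Yn; an empty body encodes X -> epsilon.
struct Production {
    Symbol head;
    std::vector<Symbol> body;
};

// The 4-tuple: terminals T, nonterminals N, productions P, start symbol S.
struct Grammar {
    std::set<Symbol> terminals;
    std::set<Symbol> nonterminals;
    std::vector<Production> productions;
    Symbol start;
};

// Checks the conditions of the definition (disjointness of T and N is assumed).
bool isWellFormed(const Grammar& g) {
    for (const Symbol& t : g.terminals)
        if (g.nonterminals.count(t) != 0) return false;       // T and N must not overlap
    if (g.nonterminals.count(g.start) == 0) return false;     // S must be in N
    for (const Production& p : g.productions) {
        if (g.nonterminals.count(p.head) == 0) return false;  // head X must be in N
        for (const Symbol& y : p.body)                        // each Yi must be in T or N
            if (g.terminals.count(y) == 0 && g.nonterminals.count(y) == 0) return false;
    }
    return true;
}
```

For example, the list/digit grammar of this lecture (terminals +, - and the ten digits; start symbol list) passes the check, while the same grammar with a terminal such as + declared as the start symbol does not.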


Example CFGs: Simple Arithmetic Expressions

In English:
❖ An integer is an arithmetic expression.
❖ If exp1 and exp2 are arithmetic expressions, then so are the following:
• exp1 - exp2
• exp1 / exp2
• ( exp1 )
The corresponding CFG, with tokens written in upper case (left) and the same rules in shorthand (right):
• exp → INTLITERAL             E → intlit
• exp → exp MINUS exp          E → E - E
• exp → exp DIVIDE exp         E → E / E
• exp → LPAREN exp RPAREN      E → ( E )
Example of a Grammar
❖ Context-free grammar for simple expressions:

G = <{list,digit}, {+,-,0,1,2,3,4,5,6,7,8,9}, P, list >

with productions P =
list → list + digit
list → list - digit
list → digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
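Unfolding the left-recursive productions for list shows that every string in L(G) is a single digit followed by zero or more pairs "+ digit" or "- digit". The C++ sketch below (the name isList is hypothetical) relies on that observation to test membership with one left-to-right scan, assuming the input contains no blanks.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// True if s (no blanks) is in L(G) for the list grammar:
// one digit, then any number of "+digit" or "-digit" pairs.
bool isList(const std::string& s) {
    auto isDigit = [](char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; };
    if (s.empty() || !isDigit(s[0])) return false;      // list -> digit
    for (std::size_t i = 1; i < s.size(); i += 2) {      // list -> list + digit | list - digit
        if (s[i] != '+' && s[i] != '-') return false;    // expect an operator...
        if (i + 1 >= s.size() || !isDigit(s[i + 1])) return false;  // ...then a digit
    }
    return true;
}
```

So isList("9-5+2") is true, whereas "95" (two adjacent digits) and "9-" (a dangling operator) are rejected.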

Derivation
❖ Given a CF grammar, we can determine the set of all strings (sequences of tokens) generated by the grammar using derivation.

❖ A derivation is a sequence of steps S ⇒ α1 ⇒ α2 ⇒ … ⇒ w, in which each step applies one production and w is a string of terminals.
• We begin with the start symbol, S.
• In each step, we replace one nonterminal in the current sentential form with one of the right-hand sides of a production for that nonterminal.

❖ A derivation can be drawn as a tree:
• The start symbol is the tree's root.
• For each production X → Y1 … Yn applied in the derivation, add children Y1 … Yn to the node labeled X.
Derivation for the Example Grammar
list
⇒ list + digit
⇒ list - digit + digit
⇒ digit - digit + digit
⇒ 9 - digit + digit
⇒ 9 - 5 + digit
⇒ 9 - 5 + 2
This is an example of a leftmost derivation, because the leftmost nonterminal is replaced in each step.
Likewise, a rightmost derivation replaces the rightmost nonterminal in each step.
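Both derivation orders can be replayed mechanically. In this C++ sketch a sentential form is a vector of grammar symbols, and step (a hypothetical helper) replaces the leftmost or rightmost nonterminal using a production chosen by the caller. The caller still has to pick the right production at each step; automating that choice is exactly the parser's job.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

using Form = std::vector<std::string>;  // a sentential form: a sequence of grammar symbols

// One derivation step: find the leftmost (or rightmost) nonterminal in `form`
// and replace it by `body`, i.e. apply the production head -> body there.
Form step(const Form& form, const std::set<std::string>& nonterminals,
          const std::string& head, const Form& body, bool leftmost) {
    const std::size_t n = form.size();
    std::size_t pos = n;  // n means "no nonterminal found"
    for (std::size_t k = 0; k < n; ++k) {
        const std::size_t i = leftmost ? k : n - 1 - k;
        if (nonterminals.count(form[i])) { pos = i; break; }
    }
    if (pos == n || form[pos] != head)
        throw std::invalid_argument("production does not apply at this position");
    Form next(form.begin(), form.begin() + static_cast<std::ptrdiff_t>(pos));
    next.insert(next.end(), body.begin(), body.end());
    next.insert(next.end(), form.begin() + static_cast<std::ptrdiff_t>(pos) + 1, form.end());
    return next;
}

// Joins the symbols with blanks, e.g. {"list", "+", "digit"} -> "list + digit".
std::string show(const Form& f) {
    std::string s;
    for (std::size_t i = 0; i < f.size(); ++i) {
        if (i > 0) s += ' ';
        s += f[i];
    }
    return s;
}
```

Calling step with leftmost set to false and the productions list → list + digit, digit → 2, list → list - digit, digit → 5, list → digit, digit → 9 yields the rightmost derivation list ⇒ list + digit ⇒ list + 2 ⇒ list - digit + 2 ⇒ list - 5 + 2 ⇒ digit - 5 + 2 ⇒ 9 - 5 + 2.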
Derivation: An example

❖ CFG:
E → id
E → E+E
E → E*E
E → (E)

❖ Derivation:
E ⇒ E+E
  ⇒ E*E+E
  ⇒ id*E+E
  ⇒ id*id+E
  ⇒ id*id+id

• Is the string id * id + id in the language defined by the grammar?
Syntax Analyzer (Parser)

❖ Groups tokens together to form grammatical phrases.
• Builds a structure to capture the program (an Abstract Syntax Tree, AST):
• Interior nodes are operators.
• Children are operands.
o Example: a = a * 5;
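A minimal C++ sketch of such a tree for the example a = a * 5;, with operators at interior nodes and operands as their children. The Ast type and the node and toPrefix helpers are hypothetical; a real compiler would typically use a separate node type per construct.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// An AST node: interior nodes hold an operator, leaves hold an operand.
struct Ast {
    std::string value;                 // "=", "*", "a", "5", ...
    std::unique_ptr<Ast> left, right;  // both null for a leaf
};

// Creates a node; omit the children to create a leaf.
std::unique_ptr<Ast> node(std::string value, std::unique_ptr<Ast> left = nullptr,
                          std::unique_ptr<Ast> right = nullptr) {
    auto n = std::make_unique<Ast>();
    n->value = std::move(value);
    n->left = std::move(left);
    n->right = std::move(right);
    return n;
}

// Prints the tree operator-first, e.g. "(= a (* a 5))".
std::string toPrefix(const Ast& t) {
    if (!t.left && !t.right) return t.value;  // operand (leaf)
    assert(t.left && t.right && "operators in this sketch are binary");
    return "(" + t.value + " " + toPrefix(*t.left) + " " + toPrefix(*t.right) + ")";
}
```

Building node("=", node("a"), node("*", node("a"), node("5"))) gives the tree for a = a * 5;, and toPrefix prints it as (= a (* a 5)); the semicolon and other punctuation do not appear in the AST.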

Parse Trees

• The root of the tree is labeled by the start symbol.

• Each leaf of the tree is labeled by a terminal (token) or ε.

• Each interior node is labeled by a nonterminal.

• If A → X1 X2 … Xn is the production used at an interior node labeled A, then that node has immediate children labeled X1, X2, …, Xn, where each Xi is a terminal, a nonterminal, or ε (ε denotes the empty string).


Parse Tree for the Example Grammar

Parse tree of the string 9-5+2 using grammar G:

list
├── list
│   ├── list
│   │   └── digit
│   │       └── 9
│   ├── -
│   └── digit
│       └── 5
├── +
└── digit
    └── 2

The sequence of leaves read from left to right, 9 - 5 + 2, is called the yield of the parse tree.
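The tree above can be built by hand and its yield recovered by reading the leaves left to right. In this C++ sketch PNode, add, collectYield and tree952 are hypothetical names; since grammar G has no ε-productions, every leaf is a terminal.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// A parse-tree node: a grammar-symbol label and ordered children.
struct PNode {
    std::string label;
    std::vector<std::unique_ptr<PNode>> kids;
};

// Appends a child labelled `label` to `parent` and returns it.
PNode* add(PNode* parent, const std::string& label) {
    parent->kids.push_back(std::make_unique<PNode>());
    parent->kids.back()->label = label;
    return parent->kids.back().get();
}

// The yield: leaf labels concatenated from left to right.
void collectYield(const PNode* n, std::string& out) {
    if (n->kids.empty()) { out += n->label; return; }
    for (const auto& k : n->kids) collectYield(k.get(), out);
}

// Builds the parse tree of 9-5+2 under grammar G (root labelled list).
std::unique_ptr<PNode> tree952() {
    auto root = std::make_unique<PNode>();
    root->label = "list";
    PNode* inner = add(root.get(), "list");  // list -> list + digit
    add(root.get(), "+");
    add(add(root.get(), "digit"), "2");      // digit -> 2
    PNode* innermost = add(inner, "list");   // list -> list - digit
    add(inner, "-");
    add(add(inner, "digit"), "5");           // digit -> 5
    add(add(innermost, "digit"), "9");       // list -> digit, digit -> 9
    return root;
}
```

Each call to add mirrors one production applied at a node, so the construction follows the derivation of 9-5+2 step by step.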
Ambiguity

❖ Consider the following context-free grammar:

G = <{string}, {+,-,0,1,2,3,4,5,6,7,8,9}, P, string>

with productions P =
string → string + string | string - string | 0 | 1 | … | 9

This grammar is ambiguous, because more than one parse tree represents the string 9-5+2.
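Ambiguity (cont’d)

Parse tree 1, grouping (9-5)+2:
string
├── string
│   ├── string
│   │   └── 9
│   ├── -
│   └── string
│       └── 5
├── +
└── string
    └── 2

Parse tree 2, grouping 9-(5+2):
string
├── string
│   └── 9
├── -
└── string
    ├── string
    │   └── 5
    ├── +
    └── string
        └── 2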
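One way to see the ambiguity concretely is to count parse trees. The C++ sketch below (the name countTrees is hypothetical) uses dynamic programming over substrings: a single digit has one tree, and a longer substring has, for every + or - that could be the root's operator, the product of the counts for its left and right parts. It assumes single-digit operands and no blanks.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Counts parse trees of s under string -> string + string | string - string | 0 | ... | 9.
// c[i][j] = number of parse trees whose yield is s[i..j].
long long countTrees(const std::string& s) {
    const int n = static_cast<int>(s.size());
    if (n == 0) return 0;
    std::vector<std::vector<long long>> c(static_cast<std::size_t>(n),
                                          std::vector<long long>(static_cast<std::size_t>(n), 0));
    for (int len = 1; len <= n; ++len) {
        for (int i = 0; i + len - 1 < n; ++i) {
            const int j = i + len - 1;
            if (len == 1) {                              // string -> digit
                c[i][j] = std::isdigit(static_cast<unsigned char>(s[i])) ? 1 : 0;
                continue;
            }
            for (int k = i + 1; k < j; ++k)              // s[k] as the root's operator
                if (s[k] == '+' || s[k] == '-')
                    c[i][j] += c[i][k - 1] * c[k + 1][j];
        }
    }
    return c[0][n - 1];
}
```

For 9-5+2 it finds exactly two trees, the groupings (9-5)+2 and 9-(5+2); with three operators, as in 9-5+2-1, the count rises to 5, and it keeps growing with longer inputs.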
Associativity of Operators
• Left-associative operators have left-recursive productions:

left → left + term | term

The string a+b+c has the same meaning as (a+b)+c.

• Right-associative operators have right-recursive productions:

right → term = right | term

The string a=b=c has the same meaning as a=(b=c).
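The grouping chosen by the grammar matters as soon as an operator is not associative. To make this visible, the C++ sketch below evaluates a chain of subtractions such as 9-5-2 under both rule shapes: a loop that folds from the left mirrors left → left - term | term, and recursion on the remainder mirrors a right-recursive rule right → term - right | term. That second rule is hypothetical (in practice - is left-associative); the function names are illustrative, and the input is assumed to be single digits separated by '-'.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Left-recursive shape: ((9 - 5) - 2). s is e.g. "9-5-2".
int evalLeft(const std::string& s) {
    assert(!s.empty());
    int acc = s[0] - '0';
    for (std::size_t i = 1; i + 1 < s.size(); i += 2)  // s[i] is '-', s[i+1] a digit
        acc -= s[i + 1] - '0';
    return acc;
}

// Right-recursive shape: (9 - (5 - 2)).
int evalRight(const std::string& s, std::size_t i = 0) {
    assert(i < s.size());
    const int term = s[i] - '0';
    if (i + 1 >= s.size()) return term;     // right -> term
    return term - evalRight(s, i + 2);      // right -> term - right
}
```

On 9-5-2 the left-associative reading gives (9-5)-2 = 2, while the right-associative one gives 9-(5-2) = 6.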
Precedence of Operators
❖ Operators with higher precedence “bind more tightly”:
▪ expr → expr + term | term
▪ term → term * factor | factor
▪ factor → number | ( expr )
▪ The string 2+3*5 has the same meaning as 2+(3*5).

The parse tree for 2+3*5:
expr
├── expr
│   └── term
│       └── factor
│           └── number (2)
├── +
└── term
    ├── term
    │   └── factor
    │       └── number (3)
    ├── *
    └── factor
        └── number (5)
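A recursive-descent evaluator makes this precedence visible: term, which handles *, is called from inside expr, which handles +, so * is grouped first. This C++ sketch (class and method names are hypothetical) assumes input without blanks. Because a recursive-descent parser cannot use a left-recursive rule such as expr → expr + term directly, each one is rewritten here as an equivalent loop, expr → term { + term }, which still groups to the left.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

class Evaluator {
public:
    explicit Evaluator(std::string src) : s_(std::move(src)) {}

    long long run() {
        const long long v = expr();
        if (pos_ != s_.size()) throw std::runtime_error("unexpected trailing input");
        return v;
    }

private:
    std::string s_;
    std::size_t pos_ = 0;

    char peek() const { return pos_ < s_.size() ? s_[pos_] : '\0'; }

    // expr -> term { + term }
    long long expr() {
        long long v = term();
        while (peek() == '+') { ++pos_; v += term(); }
        return v;
    }
    // term -> factor { * factor }
    long long term() {
        long long v = factor();
        while (peek() == '*') { ++pos_; v *= factor(); }
        return v;
    }
    // factor -> number | ( expr )
    long long factor() {
        if (peek() == '(') {
            ++pos_;
            const long long v = expr();
            if (peek() != ')') throw std::runtime_error("expected ')'");
            ++pos_;
            return v;
        }
        if (!std::isdigit(static_cast<unsigned char>(peek())))
            throw std::runtime_error("expected a number");
        long long v = 0;
        while (std::isdigit(static_cast<unsigned char>(peek())))
            v = v * 10 + (s_[pos_++] - '0');
        return v;
    }
};
```

Evaluator("2+3*5").run() returns 17, i.e. 2+(3*5), while Evaluator("(2+3)*5").run() returns 25 because the parentheses force factor → ( expr ).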
Syntax of Statements
stmt → id := expr
     | if expr then stmt
     | if expr then stmt else stmt
     | while expr do stmt
     | begin opt_stmts end
opt_stmts → stmt ; opt_stmts
          | ε

Syntax-Directed Translation (SDT)

• Uses a CFG to specify the syntactic structure of the language,
• AND associates a set of attributes with the terminals and nonterminals of the grammar,
• AND associates with each production a set of semantic rules to compute the values of attributes.
• A parse tree is traversed and the semantic rules are applied:
o After the computations are completed, the attributes contain the translated form of the input.

Synthesized and Inherited Attributes

• An attribute is said to be:
o synthesized if its value at a parse-tree node is determined from the attribute values at the children of the node (and possibly at the node itself);
o inherited if its value at a parse-tree node is determined from the attribute values at the node's parent, its siblings, and the node itself, using the semantic rule of the production applied at the parent.
Example Attribute Grammar

Here // denotes the string concatenation operator.

Production              Semantic Rule
expr → expr1 + term     expr.t := expr1.t // term.t // “+”
expr → expr1 - term     expr.t := expr1.t // term.t // “-”
expr → term             expr.t := term.t
term → 0                term.t := “0”
term → 1                term.t := “1”
…                       …
term → 9                term.t := “9”
Example: Annotated Parse Tree

expr.t = “95-2+”
├── expr.t = “95-”
│   ├── expr.t = “9”
│   │   └── term.t = “9”
│   │       └── 9
│   ├── -
│   └── term.t = “5”
│       └── 5
├── +
└── term.t = “2”
    └── 2
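Because every attribute in this scheme is synthesized, expr.t can be computed bottom-up without building the tree explicitly. In the C++ sketch below (the name toPostfix is hypothetical), the variable t plays the role of expr1.t: the left-recursive rule expr → expr1 op term is unfolded into a left-to-right loop, and each iteration applies the semantic rule expr.t := expr1.t // term.t // op. Inputs are assumed to be single digits joined by + or - with no blanks.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Computes expr.t for inputs such as "9-5+2".
std::string toPostfix(const std::string& s) {
    // term -> d   with   term.t := "d"
    auto termT = [&s](std::size_t i) -> std::string {
        if (i >= s.size() || !std::isdigit(static_cast<unsigned char>(s[i])))
            throw std::invalid_argument("expected a digit");
        return std::string(1, s[i]);
    };
    std::string t = termT(0);                         // expr -> term: expr.t := term.t
    for (std::size_t i = 1; i < s.size(); i += 2) {
        const char op = s[i];
        if (op != '+' && op != '-') throw std::invalid_argument("expected + or -");
        t = t + termT(i + 1) + op;                    // expr.t := expr1.t // term.t // op
    }
    return t;
}
```

For 9-5+2 the loop produces “9”, then “95-”, then “95-2+”, the same values that annotate the expr nodes of the tree from the bottom up.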

In-Class Exercise

❖ Consider the context-free grammar

G = < {S},{a,−, ∗}, P, S >
with productions P: S → S− | SS∗ | a

▪ Show how the string a − a ∗ − a ∗ can be generated by G (using a derivation).
September 7, 2017 CSC318 - Compiler Construction (Omogbadegun, 2017)
The Structure of a Modern Compiler

Source Code → Lexical Analysis → Syntax Analysis → Semantic Analysis → IR Generation → IR Optimization → Code Generation → Optimization → Machine Code

Running example (source code):

while (y < z)
{
    int x = a + b;
    y += x;
}

Lexical analysis produces the token stream:
T_While
T_LeftParen
T_Identifier y
T_Less
T_Identifier z
T_RightParen
T_OpenBrace
T_Int
T_Identifier x
T_Assign
T_Identifier a
T_Plus
T_Identifier b
T_Semicolon
T_Identifier y
T_PlusAssign
T_Identifier x
T_Semicolon
T_CloseBrace

Syntax analysis builds the abstract syntax tree:
While
├── <
│   ├── y
│   └── z
└── Sequence
    ├── =
    │   ├── x
    │   └── +
    │       ├── a
    │       └── b
    └── =
        ├── y
        └── +
            ├── y
            └── x

Semantic analysis annotates the tree with types:
While : void
├── < : bool
│   ├── y : int
│   └── z : int
└── Sequence : void
    ├── = : int
    │   ├── x : int
    │   └── + : int
    │       ├── a : int
    │       └── b : int
    └── = : int
        ├── y : int
        └── + : int
            ├── y : int
            └── x : int

IR generation produces three-address code:
Loop: x = a + b
      y = x + y
      _t1 = y < z
      if _t1 goto Loop

IR optimization moves the loop-invariant assignment x = a + b out of the loop:
      x = a + b
Loop: y = x + y
      _t1 = y < z
      if _t1 goto Loop

Code generation emits assembly code (with x in $1, a in $2, b in $3, y in $4, z in $5):
      add $1, $2, $3
Loop: add $4, $1, $4
      slt $6, $4, $5
      bne $6, $0, Loop
LECTURE 3: FURTHER READING & ASSIGNMENT

Kindly study Chapter 4 of the main textbook by Des Watson (pages 75–91) for further background on the lecture topic.


Assignment
❖ Consider the grammar G = < {S},{a,b}, P, S > with productions
P: S → aSbS | bSaS | ε

o Show that this grammar is ambiguous by finding an example string of terminals a and b that has two or more parse trees.
