67% found this document useful (3 votes)
366 views

Syntax Analysis

This document discusses syntax analysis, which is the second phase of a compiler. It analyzes the structure of a program by parsing its source code according to the language's formal grammar. A formal grammar precisely defines the valid syntax of a language and allows for automated parsing. There are two main types of parsers - top-down parsers which work from the top of the parse tree downward, and bottom-up parsers which work from the bottom up. Context-free grammars are commonly used to represent programming languages and allow generating parse trees to represent the syntactic structure of a program.

Uploaded by

syeda rabia
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
67% found this document useful (3 votes)
366 views

Syntax Analysis

This document discusses syntax analysis, which is the second phase of a compiler. It analyzes the structure of a program by parsing its source code according to the language's formal grammar. A formal grammar precisely defines the valid syntax of a language and allows for automated parsing. There are two main types of parsers - top-down parsers which work from the top of the parse tree downward, and bottom-up parsers which work from the bottom up. Context-free grammars are commonly used to represent programming languages and allow generating parse trees to represent the syntactic structure of a program.

Uploaded by

syeda rabia
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 46

Compiler Construction

[SWE-310]
Syntax Analysis
Compiler Construction [SWE-310] Syntax Analysis
Syntax Analysis

 2nd phase of a compiler.

 Also called parsing or hierarchal analysis.

 The grammar of a language defines the


correct form for sentences in that
language.
Compiler Construction [SWE-310] Syntax Analysis
Benefits of Syntax or Grammar

 Grammars offer following significant benefits


for both language designers and compiler
writers:
 A grammar gives a precise, yet easy-to-
understand, syntactic specification of a
programming language.
 From certain classes of grammars, a compiler
writer can construct automatically an efficient
parser that determines the syntactic structure
of a source program.
Compiler Construction [SWE-310] Syntax Analysis
Benefits of Syntax or Grammar

 The structure imparted to a language by a


properly designed grammar is useful for
translating source programs into correct object
code and for detecting errors.
 A grammar allows a language to be evolved or
developed iteratively, by adding new
constructs to perform new tasks. These new
constructs can be integrated more easily into
an implementation that follows the
grammatical structure of the language.
Compiler Construction [SWE-310] Syntax Analysis
The Role of Syntax Analyzer

 It involves grouping the tokens of the


source program into grammatical phrases
that are used by the compiler to synthesize
output.
 A language must exhibit structure.
 Just as the structure of English prose.
 Formal grammars are used to describe
structure of computer program.
Compiler Construction [SWE-310] Syntax Analysis
The Role of Syntax Analyzer
Compiler Construction [SWE-310] Syntax Analysis
Types of Languages

 There are two types of languages:


Natural languages
Formal or Computer languages
 E.g.#1 Natural Language
<Sentences>  <noun phrase> <verb phrase>
<noun phrase><adjective> <noun phrase>
<noun phrase>  <noun>
<noun>  boy
<adjective>  little
Compiler Construction [SWE-310] Syntax Analysis
Types of Languages

 E.g.#2 Formal or Computer Language

<expression> <expression> + <expression>


<expression> <expression> * <expression>
<expression> ( <expression> )
<expression> id
Compiler Construction [SWE-310] Syntax Analysis
Types of Languages

 E.g.#3 Show that ( id + id ) * id is a valid


expression
 <expression> <expression> * <expression>

 <expression> (<expression>) * <expression>


 <expression> (<expression> + <expression>) * <expression>

 <expression> (<expression> + <expression>) * id

 <expression> (<expression> + id ) * id

 <expression> ( id + id ) * id
Compiler Construction [SWE-310] Syntax Analysis
Types of Grammars

 There are four (4) types of grammars as


defined by Noam Chomsky in 1950’s.

Level Grammar
3 Regular
2 Context-Free
1 Context-Sensitive
0 Unrestricted
Compiler Construction [SWE-310] Syntax Analysis
Context-Free Grammar (CFG)

 Allthe computer languages are expressed


by Context-Free Grammar (CFG).
 A CFG, ‘G’ is defined as:
G = { T, N, S, P}, where,
T: Set of Terminals
N:Set of Non-terminals
S: Start Symbol
P: Productions
Compiler Construction [SWE-310] Syntax Analysis
Context-Free Grammar (CFG)

1. T: Set of Terminals
Terminals are the basic symbols from which
strings are formed.
The word “token" is a synonym for
“terminal“.
E.g. if, then, else, *, ;
Compiler Construction [SWE-310] Syntax Analysis
Context-Free Grammar (CFG)

2. N: Set of Non-terminals
 Non-terminals are syntactic variables that
denote sets of strings.
 impose a hierarchal structure on the
language.
Compiler Construction [SWE-310] Syntax Analysis
Context-Free Grammar (CFG)

3. S: Start Symbol
 One non-terminals is distinguished as start
symbol.
Compiler Construction [SWE-310] Syntax Analysis
Context-Free Grammar (CFG)
4. P: Productions
 Productions specify the manner in which the terminals and non-
terminals can be combined to form strings.
 Each production consists of:
(a) A nonterminal called the head or left side of the production; this
production defines some of the strings denoted by the head.
(b) The symbol  . Sometimes ::= has been used in place of the
arrow.
(c) A body or right side consisting of zero or more terminals and non-
terminals. The components of the body describe one way in which
strings of the nonterminal at the head can be constructed.
Compiler Construction [SWE-310] Syntax Analysis
Context-Free Grammar (CFG)
 E.g. The CFG of simple arithmetic expression can be
defined by following productions:
expr  expr op term
expr  ( expr )
expr  - expr
expr  id
Op  +
Op  -
Op  *
Op  /
Op  ↑
Compiler Construction [SWE-310] Syntax Analysis
Notational Conventions for CFG

1. Terminals:
Compiler Construction [SWE-310] Syntax Analysis
Notational Conventions for CFG

2. Non-terminals:
Compiler Construction [SWE-310] Syntax Analysis
Notational Conventions for CFG

3. Grammar Symbols:

 4. Strings of Terminals:
Compiler Construction [SWE-310] Syntax Analysis
Notational Conventions for CFG

5. Strings of Grammar Symbols:

 6. Alternatives for Non-terminals:


Compiler Construction [SWE-310] Syntax Analysis
Notational Conventions for CFG

7. Start Symbol:


Compiler Construction [SWE-310] Syntax Analysis
Notational Conventions for CFG

E.g. Using above notations and shorthand, the


grammar of previous example can be rewritten
as:
 E  E A E | ( E ) | - E | id
A+|-|*|/|↑
Compiler Construction [SWE-310] Syntax Analysis
Process of Defining Grammar as a Language

There are two processes by which a grammar


can be defined as a language:
Derivations
Parse Trees
Compiler Construction [SWE-310] Syntax Analysis
Process of Defining Grammar as a Language

1. Derivation:
 Production is a re-writing rule in which the
non-terminal on the left is replaced by the
strings on the right side of productions.
 E.g. E  - E, states that E with a minus sign
is also an expression. OR
 E ==> - E, implies E derives – E
 Also,
 E ==> - E ==> - ( E ) ==> - ( id )
Compiler Construction [SWE-310] Syntax Analysis
Derivation

 Generalized definition of derivation would be:


α𝐴𝛽 ==> α𝛾𝛽 ,
if A ==> 𝛾
Also,
if α1 ==> α2 ==> ----- ==> αn, then
α1 ==> αn
 ==> : Derives in one step
 ==>* : Derives in zero or more steps
 ==>+ : Derives in one or more steps
Compiler Construction [SWE-310] Syntax Analysis
Derivation
 If S ==>* α, where S is the start symbol of a grammar G, then it
can be deduce that α is a sentential form of G.
 Note that a sentential form may contain both terminals and
non-terminals, and may be empty.
 A sentence of G is a sentential form with no non-terminals.
 The language generated by a grammar is its set of sentences.
 Thus, a string of terminals w is in L(G), the language generated
by G, if and only if w is a sentence of G (or S ==>* w ).
 A language that can be generated by a grammar is said to be a
context-free language.
 If two grammars generate the same language, the grammars
are said to be equivalent.
Compiler Construction [SWE-310] Syntax Analysis
Derivation
 E.g. Prove that - ( id + id) is a sentence of grammar of previous
example.
 E ==> - E ==> - ( E ) ==> - ( E+ E ) ==> - (id + E ) ==> - ( id + id )
 At each step of derivation, two choices are available:
Leftmost replacement of non-terminal
Rightmost replacement of non-terminal
 Since the above derivation is leftmost, it can be re-written as:
 E ==>lm - E ==>lm - ( E ) ==>lm - ( E+ E ) ==>lm
- (id + E ) ==>lm - ( id + id )
 Similarly, a rightmost derivation would be:
 E ==>rm - E ==>rm - ( E ) ==>rm - ( E+ E ) ==>rm
- (id + E ) ==>rm - ( id + id )
Compiler Construction [SWE-310] Syntax Analysis
Types of Parsers

 Parsers that work on leftmost derivations


are called Top-down Parsers.

 Parsers that work on rightmost derivations


are called Bottom-up Parsers.
Compiler Construction [SWE-310] Syntax Analysis
Parse Trees

2. Parse Trees:
A graphical representation for a derivation that
filters out the choice regarding replacement order
of non-terminals.
Each interior node of a parse tree represents the
application of a production.
The interior node is labeled with the non-terminal
in the head of the production.
The children of the node are labeled, from left to
right, by the symbols in the body of the
production by which the non-terminal are
replaced during the derivation.
Compiler Construction [SWE-310] Syntax Analysis
Parse Trees
 E.g. Construct parse tree for the following derivation:
E ==> - E ==> - ( E ) ==> - ( E+ E ) ==> - (id + E ) ==> - ( id + id )
Compiler Construction [SWE-310] Syntax Analysis
Parse Trees

E.g. The sentence id + id * id has two


distinct leftmost derivations derivation:
Compiler Construction [SWE-310] Syntax Analysis
Parse Trees

E.g. The corresponding parse trees for the


sentence id + id * id are:

a+(b*c) (a+b)*c
 Which of the above parse tree is right?
Compiler Construction [SWE-310] Syntax Analysis
Parse Tree Examples

E.g.#1 Construct parse tree for the expression


id + id * id using the following grammar:
EE+T
TT+F
F  ( E ) | id
where,
E: Expression
T: Term
F: Factor
Compiler Construction [SWE-310] Syntax Analysis
Parse Tree Examples

The corresponding parse tree for the


expression id + id * id is given below:
Compiler Construction [SWE-310] Syntax Analysis
Parse Tree Examples

E.g.#2 Construct parse tree for the expression


id * num + id – num / id using the following
grammar:
E  E Top T | T
Top  + | -
T  T Fop F | F
Fop  * | /
F  ( E ) | id | - F | num
Compiler Construction [SWE-310] Syntax Analysis
Parse Tree Examples

id * num + id – num / id
Compiler Construction [SWE-310] Syntax Analysis
Parse Tree Examples

E.g.#3 - id * ( num + num ) / id


Compiler Construction [SWE-310] Syntax Analysis
Ambiguity

A grammar that produces more then one


parse tree for some sentence is called
ambiguous.

OR

Ambiguous grammars produce more then


one leftmost or rightmost derivation for
same sentence.
Compiler Construction [SWE-310] Syntax Analysis
Eliminating Ambiguity

E.g. Consider the following grammar:

 Here other stands for any other


statement.
Compiler Construction [SWE-310] Syntax Analysis
Eliminating Ambiguity

 The above grammar is ambiguous since it produces two parse


trees for the following statement:
if E1 then if E2 then S1 else S2
 Parse tree#1
Compiler Construction [SWE-310] Syntax Analysis
Eliminating Ambiguity

if E1 then if E2 then S1 else S2


 Parse Tree#1

 Parse Tree#2
Compiler Construction [SWE-310] Syntax Analysis
Eliminating Ambiguity

In all the programming languages 1st parse


tree is preferred.
The unambiguous grammar can be re-
written by applying general rule that:

“Match each else with the closest previous


unmatched then”
Compiler Construction [SWE-310] Syntax Analysis
Eliminating Ambiguity

Thus, the new grammar would be:


Compiler Construction [SWE-310] Syntax Analysis
A Sample Grammar

 The formal grammar of VSL (Very Simple Language) is


presented below:
Compiler Construction [SWE-310] Syntax Analysis
A Sample Grammar
Compiler Construction [SWE-310] Syntax Analysis
A Sample Grammar

You might also like