UNIT 3 Syntax Analysis–Part1
HARSHITA SHARMA
Syllabus
• Syntax analysis:
• Specification of syntax using grammar.
• Top-down parsing
• recursive-descent
• predictive.
• Bottom-up parsing
• shift-reduce
• SLR
• CLR
• LALR
• Parser generator.
INTRODUCTION
Role of the Parser
• A parser for a grammar is a program that takes as input a string w (the sequence of
tokens obtained from the lexical analyzer) and produces as output either a parse tree
for w, if w is a valid sentence of the grammar, or an error message indicating that w
is not a valid sentence of the grammar.
• The goal of the parser is to determine the syntactic validity of a source string. If
the string is valid, a tree is built for use by the subsequent phases of the compiler.
• The tree reflects the sequence of derivations or reductions used during parsing; hence
it is called a parse tree.
• If the string is invalid, the parser has to issue diagnostic messages identifying the
nature and cause of the errors in the string. Every elementary subtree in the parse
tree corresponds to a production of the grammar.
There are two ways of identifying an elementary subtree:
1. By deriving a string from a non-terminal or
2. By reducing a string of symbols to a non-terminal.
Types of Parsers
The two types of parsers employed are:
a. Top-down parsers, which build parse trees from the top (root) to the bottom (leaves).
b. Bottom-up parsers, which build parse trees from the leaves and work up to the root.
USE OF GRAMMAR
• By design, every programming language has precise rules that prescribe the
syntactic structure of well-formed programs.
• In C, for example, a program is made up of functions, a function out of
declarations and statements, a statement out of expressions, and so on.
• The syntax of programming language constructs can be specified by context-
free grammars or BNF (Backus-Naur Form) notation.
• Grammars offer significant benefits for both language designers and
compiler writers.
Advantages of using a Grammar
• A grammar gives a precise, yet easy-to-understand, syntactic specification of a programming
language.
• From certain classes of grammars, we can construct automatically an efficient parser that
determines the syntactic structure of a source program. As a side benefit, the parser-
construction process can reveal syntactic ambiguities and trouble spots that might have
slipped through the initial design phase of a language.
• The structure imparted to a language by a properly designed grammar is useful for
translating source programs into correct object code and for detecting errors.
• A grammar allows a language to be evolved or developed iteratively, by adding new
constructs to perform new tasks. These new constructs can be integrated more easily into an
implementation that follows the grammatical structure of the language.
SYNTAX ERROR HANDLING
Errors at various levels
• Lexical errors
• Syntax Errors
• Semantic Errors
• Logical Errors
• The precision of parsing methods allows syntactic errors to be detected very efficiently.
Several parsing methods, such as the LL and LR methods, detect an error as soon as
possible; that is, when the stream of tokens from the lexical analyzer cannot be parsed
further according to the grammar for the language. More precisely, they have the viable-
prefix property, meaning that they detect that an error has occurred as soon as they see a
prefix of the input that cannot be completed to form a string in the language.
• Another reason for emphasizing error recovery during parsing is that many errors
appear syntactic, whatever their cause, and are exposed when parsing cannot
continue. A few semantic errors, such as type mismatches, can also be detected
efficiently; however, accurate detection of semantic and logical errors at compile
time is in general a difficult task.
• Goals of error handler:
• Report errors
• Recover from errors
• Minimal overhead
ERROR RECOVERY STRATEGIES
Panic Mode Recovery
• With this method, on discovering an error, the parser discards input symbols
one at a time until one of a designated set of synchronizing tokens is found.
The synchronizing tokens are usually delimiters, such as semicolon or },
whose role in the source program is clear and unambiguous.
• The compiler designer must select the synchronizing tokens appropriate for
the source language. While panic-mode correction often skips a considerable
amount of input without checking it for additional errors, it has the
advantage of simplicity, and, unlike some methods to be considered later, is
guaranteed not to go into an infinite loop.
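• As a minimal sketch of the idea in Python (the token values and the function name here
are illustrative assumptions, not part of any particular parser):

    # Minimal sketch of panic-mode recovery; token names are assumed.
    SYNC_TOKENS = {";", "}"}          # designated synchronizing tokens

    def synchronize(tokens, pos):
        """On an error at tokens[pos], discard input symbols one at a time
        until a synchronizing token is found, then resume just past it."""
        while pos < len(tokens) and tokens[pos] not in SYNC_TOKENS:
            pos += 1                  # skip (and do not re-check) this symbol
        return pos + 1                # position after the synchronizing token

    # Usage: on a syntax error at position p, continue parsing from
    # synchronize(tokens, p). The loop always terminates, since pos only grows.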
Phrase Level Recovery
• On discovering an error, a parser may perform local correction on the remaining input; that
is, it may replace a prefix of the remaining input by some string that allows the parser to
continue.
• A typical local correction is to replace a comma by a semicolon, delete an extraneous
semicolon, or insert a missing semicolon. The choice of the local correction is left to the
compiler designer. Of course, we must be careful to choose replacements that do not lead to
infinite loops, as would be the case, for example, if we always inserted something on the
input ahead of the current input symbol.
• Phrase-level replacement has been used in several error-repairing compilers, as it can correct
any input string. Its major drawback is the difficulty it has in coping with situations in which
the actual error has occurred before the point of detection.
Error Productions
• By anticipating common errors that might be encountered, we can augment
the grammar for the language at hand with productions that generate the
erroneous constructs.
• A parser constructed from a grammar augmented by these error productions
detects the anticipated errors when an error production is used during
parsing.
• The parser can then generate appropriate error diagnostics about the
erroneous construct that has been recognized in the input.
Global Correction
• Ideally, we would like a compiler to make as few changes as possible in processing an
incorrect input string. There are algorithms for choosing a minimal sequence of changes to
obtain a globally least-cost correction. Given an incorrect input string x and grammar G,
these algorithms will find a parse tree for a related string y, such that the number of
insertions, deletions, and changes of tokens required to transform x into y is as small as
possible.
• Unfortunately, these methods are in general too costly to implement in terms of time and
space, so these techniques are currently only of theoretical interest. Do note that a closest
correct program may not be what the programmer had in mind. Nevertheless, the notion of
least-cost correction provides a yardstick for evaluating error-recovery techniques, and has
been used for finding optimal replacement strings for phrase-level recovery.
GRAMMAR PREREQUISITES
Context Free Grammar
• Inherently recursive structures of a programming language are defined by a context-
free Grammar.
• A context-free grammar is a 4-tuple G = (V, T, P, S).
• Here, V is a finite set of non-terminals (syntactic variables),
• T is a finite set of terminals (in our case, this will be the set of tokens),
• P is a finite set of production rules of the following form:
A → α, where A is a non-terminal and α is a string of terminals and non-terminals
(possibly the empty string),
• S is the start symbol (one of the non-terminals).
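• As a small illustration, the arithmetic-expression grammar used later in this unit can
be encoded directly from this definition. The dict-of-lists representation below is one
convenient, assumed choice, not a standard format:

    # One way to encode G = (V, T, P, S) in Python; the representation is an
    # illustrative choice reused in the sketches later in this unit.
    V = {"E", "T", "F"}                       # non-terminals
    T = {"+", "*", "(", ")", "id"}            # terminals (tokens)
    P = {                                     # productions: A -> list of bodies
        "E": [["E", "+", "T"], ["T"]],
        "T": [["T", "*", "F"], ["F"]],
        "F": [["(", "E", ")"], ["id"]],
    }
    S = "E"                                   # start symbol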
Example
Using notational conventions
Language of a Grammar
• L(G) is the language of G (the language generated by G) which is a set of
sentences.
• A sentence of L(G) is a string of terminal symbols of G. If S is the start symbol of
G, then ω is a sentence of L(G) iff S ⇒* ω, where ω is a string of terminals of G. If
G is a context-free grammar, L(G) is a context-free language.
• Two grammars G1 and G2 are equivalent if they generate the same language, i.e.
L(G1) = L(G2).
• Consider a derivation S ⇒* α. If α contains non-terminals, it is called a sentential
form of G. If α does not contain non-terminals, it is called a sentence of G.
Derivations
• The construction of a parse tree can be made precise by taking a derivational view,
in which productions are treated as rewriting rules. Beginning with the start symbol,
each rewriting step replaces a nonterminal by the body of one of its productions.
• This derivational view corresponds to the top-down construction of a parse tree,
but the precision afforded by derivations will be especially helpful when bottom-up
parsing is discussed.
• As we shall see, bottom-up parsing is related to a class of derivations known as
"rightmost" derivations, in which the rightmost nonterminal is rewritten at each step.
Derivations
• In general, a derivation step is
αAβ ⇒ αγβ
where αAβ is a sentential form, there is a production rule A → γ in the grammar, and α
and β are arbitrary strings of terminal and non-terminal symbols.
• α1 ⇒ α2 ⇒ ... ⇒ αn (we say αn derives from α1, or α1 derives αn).
• ⇒* denotes "derives in zero or more steps"; ⇒+ denotes "derives in one or more steps".
• At each derivation step, we can choose any of the non-terminals in the sentential
form of G for the replacement.
Leftmost Derivation
• If we always choose the left-most non-terminal in each derivation step, the derivation
is called a left-most derivation.
• Example:
• E → E + E | E - E | E * E | E / E | - E
• E → (E)
• E → id
• Leftmost derivation:
• E ⇒ E + E ⇒ E * E + E ⇒ id * E + E ⇒ id * id + E ⇒ id * id + id
• The derived string w = id*id+id consists of all terminal symbols.
Rightmost Derivation
• If we always choose the right-most non-terminal in each derivation step, the
derivation is called a right-most derivation.
• Example 1 (same grammar as previous slide):
• E ⇒ E + E ⇒ E + E * E ⇒ E + E * id ⇒ E + id * id ⇒ id + id * id
• Strings that appear in a leftmost derivation are called left-sentential forms.
• Strings that appear in a rightmost derivation are called right-sentential forms.
• Sentential forms: Given a grammar G with start symbol S, if S ⇒* α, where α may
contain non-terminals or terminals, then α is called a sentential form of G.
Question
• Given grammar G : E → E+E | E*E | ( E ) | - E | id
Sentence to be derived : – (id+id).
Derive using both leftmost and rightmost derivation.
Solution
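• Leftmost derivation:
E ⇒ - E ⇒ - ( E ) ⇒ - ( E + E ) ⇒ - ( id + E ) ⇒ - ( id + id )
• Rightmost derivation:
E ⇒ - E ⇒ - ( E ) ⇒ - ( E + E ) ⇒ - ( E + id ) ⇒ - ( id + id )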
Yield / Frontier of a Tree
• Each interior node of a parse tree is a non-terminal; the children of a node can be
terminals or non-terminals. Reading the leaves of the parse tree from left to right
gives a sentential form; this sentential form is called the yield or frontier of
the tree.
PARSE TREE
• A parse tree is a graphical representation of a derivation that filters out the order in
which productions are applied to replace nonterminals.
• Each interior node of a parse tree represents the application of a production.
• The interior node is labeled with the nonterminal A in the head of the production;
the children of the node are labeled, from left to right, by the symbols in the body
of the production by which this A was replaced during the derivation.
• The leaves of a parse tree are terminal symbols.
• A parse tree can be seen as a graphical representation of a derivation.
Question
• Draw parse tree for the input –(id+id).
Solution
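• The parse tree for –(id+id), corresponding to the derivation above:

    E
    ├── -
    └── E
        ├── (
        ├── E
        │   ├── E
        │   │   └── id
        │   ├── +
        │   └── E
        │       └── id
        └── )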
Sequence of Parse Trees
AMBIGUITY
What is it?
• A grammar that produces more than one parse tree for some sentence is said
to be ambiguous. Put another way, an ambiguous grammar is one that
produces more than one leftmost derivation or more than one rightmost
derivation for the same sentence.
• For most parsers, it is desirable that the grammar be made unambiguous, for
if it is not, we cannot uniquely determine which parse tree to select for a
sentence. In other cases, it is convenient to use carefully chosen ambiguous
grammars, together with disambiguating rules that "throw away" undesirable
parse trees, leaving only one tree for each sentence.
Example Question
• Consider the grammar: E -> E + E | E * E | ( E ) | id
• Derive the two distinct derivations and parse trees for id+id*id.
Solution
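• The two distinct leftmost derivations:
• E ⇒ E + E ⇒ id + E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id
• E ⇒ E * E ⇒ E + E * E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id
• The first derivation groups the input as id+(id*id) (the usual precedence); the second
groups it as (id+id)*id. The corresponding parse trees differ, so the grammar is
ambiguous.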
Example
• To disambiguate the grammar E → E+E | E*E | E^E | id | (E), we can use precedence of operators as
follows:
• ^ (right to left)
• /,* (left to right)
• -,+ (left to right)
• We get the following unambiguous grammar:
• E → E+T | T
• T → T*F | F
• F → G^F | G
• G → id | (E)
Verifying languages generated by a Grammar
• A proof that a grammar G generates a language L has two parts: show that
every string generated by G is in L, and conversely that every string in L can
indeed be generated by G.
Every balanced string is derivable from S
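• For the standard balanced-parentheses grammar S → (S)S | ε (assumed here), the proof is
by induction on the length of the string: a nonempty balanced string can be written as
(x)y, where (x) ends at the match of its first parenthesis and x and y are themselves
balanced and shorter, hence derivable from S; then S ⇒ (S)S ⇒* (x)y.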
CFGs vs Regular Expressions
• Grammars are a more powerful notation than regular expressions. Every
construct that can be described by a regular expression can be described by a
grammar, but not vice-versa. Alternatively, every regular language is a
context-free language, but not vice-versa.
Constructing Grammar from an NFA
Example
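• The standard construction: for each state qi of the NFA, introduce a non-terminal Ai;
for each transition from qi to qj on input a, add the production Ai → a Aj; and for
each accepting state qi, add Ai → ε. The start symbol is the non-terminal for the
start state.
• As a small illustrative instance (this automaton is assumed for illustration): for a
two-state NFA for a*b, where state 0 loops on a and moves to accepting state 1 on b,
we get
• A0 → a A0 | b A1
• A1 → ε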
Language describable by a CFG but not by an RE
• L = {a^n b^n | n >= 1}
• A finite automaton cannot keep count, hence no regular expression is possible.
• Grammar: S → aSb | ab
WRITING SUITABLE GRAMMARS
• Grammars are capable of describing most, but not all, of the syntax of
programming languages.
• For instance, the requirement that identifiers be declared before they are
used, cannot be described by a context-free grammar.
• Therefore, the sequences of tokens accepted by a parser form a superset of
the programming language; subsequent phases of the compiler must analyze
the output of the parser to ensure compliance with rules that are not
checked by the parser.
Why use RE to describe LA when CFG is better?
1. Separating the syntactic structure of a language into lexical and non lexical parts
provides a convenient way of modularizing the front end of a compiler into two
manageable-sized components.
2. The lexical rules of a language are frequently quite simple, and to describe them we
do not need a notation as powerful as grammars.
3. Regular expressions generally provide a more concise and easier-to-understand
notation for tokens than grammars.
4. More efficient lexical analyzers can be constructed automatically from regular
expressions than from arbitrary grammars.
Removing Ambiguity
Question
• Draw parse tree(s) for the input:
• if E1 then if E2 then S1 else S2
Removing ambiguity
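• Under the grammar used later in this unit (S → iEtS | iEtSeS | a, with i = if,
t = then, e = else), the input above has two parse trees: the else can attach to
either if.
• One standard disambiguation, which matches each else with the closest unmatched
then, is:
• S → M | U
• M → iEtMeM | a
• U → iEtS | iEtMeU
• where M derives only "matched" statements and U only "unmatched" ones.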
Eliminating Left Recursion
• A grammar is said to be left recursive if it has a non-terminal A such that there is a
derivation A ⇒+ Aα for some string α. Top-down parsing methods cannot handle
left-recursive grammars.
• Hence, left recursion can be eliminated as follows:
• If there is a production A → Aα | β, it can be replaced with the sequence of two
productions
• A → βA’
• A’ → αA’ | ε
• without changing the set of strings derivable from A.
Example Question
• Consider the following grammar for arithmetic expressions and eliminate the left recursion:
• E → E+T | T
• T → T*F | F
• F → (E) | id
Solution
• First eliminate the left recursion for E:
• E → TE’
• E’ → +TE’ | ε
• Then eliminate it for T:
• T → FT’
• T’ → *FT’ | ε
• Thus the grammar obtained after eliminating left recursion is
• E → TE’
• E’ → +TE’ | ε
• T → FT’
• T’ → *FT’ | ε
• F → (E) | id
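• To see why this transformation matters, here is a minimal recursive-descent sketch of
the transformed grammar in Python; with the original left-recursive E → E+T, the
method E() would call itself forever on its first token. The class and its token
handling are illustrative assumptions:

    # Minimal recursive-descent sketch for the left-recursion-free grammar.
    class Parser:
        def __init__(self, tokens):
            self.tokens = tokens
            self.pos = 0

        def peek(self):
            return self.tokens[self.pos] if self.pos < len(self.tokens) else None

        def eat(self, tok):
            if self.peek() != tok:
                raise SyntaxError(f"expected {tok}, got {self.peek()}")
            self.pos += 1

        def E(self):                    # E -> T E'
            self.T()
            self.Eprime()

        def Eprime(self):               # E' -> + T E' | epsilon
            if self.peek() == "+":
                self.eat("+")
                self.T()
                self.Eprime()

        def T(self):                    # T -> F T'
            self.F()
            self.Tprime()

        def Tprime(self):               # T' -> * F T' | epsilon
            if self.peek() == "*":
                self.eat("*")
                self.F()
                self.Tprime()

        def F(self):                    # F -> ( E ) | id
            if self.peek() == "(":
                self.eat("(")
                self.E()
                self.eat(")")
            else:
                self.eat("id")

    # Usage: Parser(["id", "+", "id", "*", "id"]).E() parses without error.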
Algorithm to eliminate left recursion
1. Arrange the non-terminals in some order A1, A2, . . . , An.
2. for i := 1 to n do begin
       for j := 1 to i-1 do begin
           replace each production of the form Ai → Aj γ
           by the productions Ai → δ1 γ | δ2 γ | . . . | δk γ,
           where Aj → δ1 | δ2 | . . . | δk are all the current Aj-productions;
       end
       eliminate the immediate left recursion among the Ai-productions
   end
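• A direct transcription of this algorithm into Python is sketched below, assuming the
dict-of-productions grammar representation shown earlier. Like the textbook algorithm,
it assumes the grammar has no cycles or ε-productions; the function name and the "ε"
marker are illustrative choices.

    EPSILON = "ε"

    def eliminate_left_recursion(grammar, order):
        # Work on a copy so the input grammar is left untouched.
        g = {a: [list(p) for p in prods] for a, prods in grammar.items()}
        for i, ai in enumerate(order):
            # Step 2a: substitute earlier non-terminals Aj (j < i) that
            # appear at the front of an Ai-production.
            for aj in order[:i]:
                new_prods = []
                for prod in g[ai]:
                    if prod and prod[0] == aj:
                        for delta in g[aj]:
                            new_prods.append(delta + prod[1:])
                    else:
                        new_prods.append(prod)
                g[ai] = new_prods
            # Step 2b: eliminate immediate left recursion among Ai-productions.
            alphas = [p[1:] for p in g[ai] if p and p[0] == ai]
            if alphas:
                betas = [p for p in g[ai] if not p or p[0] != ai]
                prime = ai + "'"
                g[ai] = [beta + [prime] for beta in betas]
                g[prime] = [alpha + [prime] for alpha in alphas] + [[EPSILON]]
        return g

    # Usage, on the expression grammar encoded earlier:
    g = {"E": [["E", "+", "T"], ["T"]],
         "T": [["T", "*", "F"], ["F"]],
         "F": [["(", "E", ")"], ["id"]]}
    print(eliminate_left_recursion(g, ["E", "T", "F"]))
    # -> E -> T E',  E' -> + T E' | ε,  T -> F T',  T' -> * F T' | ε,  F unchanged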
Indirect left recursion
• A grammar is said to have indirect left recursion if, starting from a non-terminal, it
is possible to derive, in two or more steps, a sentential form that begins with that
same non-terminal.
• For example: A → Br, B → Cd, C → At,
• where A, B, C are non-terminals and r, d, t are terminals. Here, starting with A, we
can derive a string beginning with A again: A ⇒ Br ⇒ Cdr ⇒ Atdr.
Example
• A1 → A2 A3
• A2 → A3 A1 | b
• A3 → A1 A1 | a
• where A1, A2, A3 are non-terminals and a, b are terminals.
Solution
• Identify the productions that can cause indirect left recursion. In our case:
A3 → A1 A1 | a.
• Substitute the A1-productions wherever A1 appears at the start of a production of A3:
substituting A1 → A2 A3 gives A3 → A2 A3 A1 | a.
• Substituting A2 → A3 A1 | b in turn gives A3 → A3 A1 A3 A1 | b A3 A1 | a, which now
has immediate left recursion; eliminating it yields
• A3 → b A3 A1 A3’ | a A3’
• A3’ → A1 A3 A1 A3’ | ε
Left Factoring
• Left factoring is a grammar transformation that is useful for producing a
grammar suitable for predictive parsing. When it is not clear which of two
alternative productions to use to expand a non-terminal A, we can rewrite
the A-productions to defer the decision until we have seen enough of the
input to make the right choice.
• If there is any production A → αβ1 | αβ2 , it can be rewritten as
• A → αA’
• A’ → β1 | β2
Example
• Consider the grammar G (where i = if, t = then, e = else): S → iEtS | iEtSeS | a
• E → b
• Left factored, this grammar becomes
• S → iEtSS’ | a
• S’ → eS | ε
• E→b
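• A minimal Python sketch of one left-factoring step, using the same dict-of-productions
representation assumed earlier (the function name and "ε" marker are illustrative):

    EPSILON = "ε"

    def left_factor_once(a, prods):
        """Factor the longest prefix shared by two or more A-productions.
        Returns a dict of updated productions (possibly adding A')."""
        # Find the longest prefix shared by at least two alternatives.
        best = []
        for p in prods:
            for q in prods:
                if p is q:
                    continue
                k = 0
                while k < len(p) and k < len(q) and p[k] == q[k]:
                    k += 1
                if k > len(best):
                    best = p[:k]
        if not best:
            return {a: prods}                 # nothing to factor
        prime = a + "'"
        factored, others = [], []
        for p in prods:
            (factored if p[:len(best)] == best else others).append(p)
        # Tails after the common prefix; an empty tail becomes ε.
        tails = [p[len(best):] or [EPSILON] for p in factored]
        return {a: others + [best + [prime]], prime: tails}

    # Example: S -> iEtS | iEtSeS | a  becomes  S -> a | iEtS S',  S' -> ε | eS
    print(left_factor_once("S",
          [["i","E","t","S"], ["i","E","t","S","e","S"], ["a"]]))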