
Compiler Design

Chapter Three: Syntax Analysis


The objectives of this chapter are as follows:
❖ Explain the basic role of the parser (syntactic analyzer).

❖ Describe Context-Free Grammars (CFGs) and their representation format.

❖ Discuss the different derivation formats: Leftmost derivation, Rightmost derivation and
Non-Leftmost, Non-Rightmost derivations

❖ Be familiar with CFG shorthand techniques.

❖ Describe Parse Tree and its structure.

❖ Discuss ambiguous grammars and how to deal with ambiguity from CFGs.

❖ Explain the Extended Backus-Naur Form (EBNF).

The Role of the Parser


❖ The parser obtains a string of tokens from the lexical analyzer, as shown in the figure below,
and verifies that the string of token names can be generated by the grammar for the source
language.

❖ The parser is expected to report any syntax errors and to recover from commonly occurring
errors to continue processing the remainder of the program.

❖ Conceptually, for well-formed programs, the parser constructs a parse tree and passes it to the
rest of the compiler for further processing. In fact, the parse tree need not be constructed
explicitly, since checking and translation actions can be interspersed with parsing. Thus, the
parser and the rest of the front end could well be implemented by a single module.

Fig. Position of parser in compiler model

❖ Therefore, the parser performs context-free syntax analysis, guides context-sensitive analysis,
constructs an intermediate representation, produces meaningful error messages, and attempts
error correction.
❖ A grammar gives a precise, yet easy-to-understand, syntactic specification of a programming
language.
• From certain classes of grammars, we can construct automatically an efficient parser
that determines the syntactic structure of a source program.
• As a side benefit, the parser-construction process can reveal syntactic ambiguities and
trouble spots that might have slipped through the initial design phase of a language.
• The structure imparted to a language by a properly designed grammar is useful for
translating source programs into correct object code and for detecting errors.
❖ A grammar allows a language to be evolved or developed iteratively, by adding new constructs
to perform new tasks.
❖ These new constructs can be integrated more easily into an implementation that follows the
grammatical structure of the language.
❖ There are three general types of parsers for grammars: universal, top-down, and bottom-up.
• Universal parsing methods such as the Cocke-Younger-Kasami algorithm and Earley's
algorithm can parse any grammar (Read more on these).
❖ These general methods are, however, too inefficient to use in production compilers.
❖ The methods commonly used in compilers can be classified as being either top-down or
bottom-up.
• Top-Down Methods: - As implied by their names, top-down methods build parse trees
from the top (root) to the bottom (leaves).
• Bottom-up methods: - start from the leaves and work their way up to the root to build
the parse tree.
• In either case, the input to the parser is scanned from left to right, one symbol at a time.
❖ The most efficient top-down and bottom-up methods work only for sub-classes of grammars,
but several of these classes, particularly, LL and LR grammars, are expressive enough to
describe most of the syntactic constructs in modern programming languages.

Error Handling
Common programming errors include:
• Lexical errors, syntactic errors, semantic errors, and logical errors. The type of error
handled in this phase of compilation is the syntactic error.

Error handler goals


• Report the presence of errors clearly and accurately
• Recover from each error quickly enough to detect subsequent errors
• Add minimal overhead to the processing of correct programs


Common error-recovery strategies include:


1. Panic-mode recovery: - Discard input symbols one at a time until one of a designated set of
synchronizing tokens is found.
2. Phrase-level recovery: - Replace a prefix of the remaining input with some string that allows
the parser to continue.
3. Error productions: - Augment the grammar with productions that generate the erroneous
constructs.
4. Global correction: - Choose a minimal sequence of changes to obtain a globally least-cost
correction.
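As an illustration, strategy 1 (panic-mode recovery) can be sketched in Python. This is a minimal sketch; the token list and the synchronizing set `SYNC_TOKENS` are illustrative assumptions, not part of the notes.

```python
# Minimal sketch of panic-mode recovery (token list and synchronizing set
# are hypothetical). On a syntax error, the parser discards input symbols
# one at a time until a synchronizing token appears, then resumes past it.
SYNC_TOKENS = {";", "}"}  # hypothetical synchronizing set

def panic_mode_recover(tokens, pos):
    """Return the position just after the next synchronizing token at or
    after `pos`, or len(tokens) if none is found."""
    while pos < len(tokens) and tokens[pos] not in SYNC_TOKENS:
        pos += 1  # discard one input symbol
    return pos + 1 if pos < len(tokens) else pos

# Example: an error detected at position 2 skips ahead to just past the ';'.
tokens = ["id", "=", "@", "id", "+", "id", ";", "id"]
print(panic_mode_recover(tokens, 2))  # 7
```

The trade-off is that everything between the error and the synchronizing token is discarded, so errors in the skipped region go unreported.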

Context-Free Grammars (CFGs)


❖ A CFG is used as a tool to describe the syntax of a programming language.

❖ A CFG includes 4 components:

1. A set of terminals T, which are the tokens of the language


o Terminals are the basic symbols from which strings are formed.
o The term "token name" is a synonym for "terminal".

2. A set of non-terminals N
o Non-terminals are syntactic variables that denote sets of strings.
o The sets of strings denoted by non-terminals help define the language generated by
the grammar.
o Non-terminals impose a hierarchical structure on the language that is key to syntax
analysis and translation

3. A set of rewriting rules R.


o The left-hand side (head) of each rewriting rule is a single non-terminal.
o The right-hand side (body) of each rewriting rule is a string of terminals and/or non-
terminals

4. A special non-terminal S ∈ N, which is the start symbol. The productions for the start
symbol are listed first.

❖ Just as regular expressions generate strings of characters, CFGs generate strings of tokens.

❖ A string of tokens is generated by a CFG in the following way:


1. The initial string is the start symbol S.
2. While there are non-terminals left in the string:
a. Pick any non-terminal A in the string.
b. Replace a single occurrence of A in the string with the right-hand side of any
rule that has A as the left-hand side.


3. Repeat step 2 until all symbols in the string are terminals.

Example 1: A grammar that defines simple arithmetic expressions:


Terminals = {id, +, -, *, /, (, )}
Non-Terminals = {expression, term, factor}
Start Symbol = expression
Rules = expression → expression + term
→ expression - term
→ term
term → term * factor
→ term / factor
→ factor
factor → ( expression )
→ id
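The string-generation procedure above can be sketched in Python using the Example 1 grammar. The dictionary representation and the `generate` helper are illustrative assumptions, not a prescribed implementation.

```python
import random

# Sketch of the generation procedure, applied to the Example 1 grammar.
# Representation (assumed for illustration): a dict mapping each
# non-terminal to the list of its rule bodies.
GRAMMAR = {
    "expression": [["expression", "+", "term"],
                   ["expression", "-", "term"],
                   ["term"]],
    "term": [["term", "*", "factor"],
             ["term", "/", "factor"],
             ["factor"]],
    "factor": [["(", "expression", ")"], ["id"]],
}

def generate(start, max_steps=200):
    """Repeatedly replace some non-terminal occurrence with the body of
    one of its rules, until only terminals remain (or we give up)."""
    string = [start]
    for _ in range(max_steps):
        nonterms = [i for i, sym in enumerate(string) if sym in GRAMMAR]
        if not nonterms:
            return " ".join(string)      # all symbols are terminals
        i = random.choice(nonterms)      # step a: pick any non-terminal
        body = random.choice(GRAMMAR[string[i]])
        string[i:i + 1] = body           # step b: replace it with a rule body
    return None  # derivation did not finish within max_steps

random.seed(0)
print(generate("expression"))
```

Because the grammar is recursive, a random derivation need not terminate quickly; the `max_steps` bound is only a guard for the sketch.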
Notational Conventions
1. These symbols are terminals:
A. Lowercase letters early in the alphabet, such as a, b, c.
B. Operator symbols such as +, *, and so on.
C. Punctuation symbols such as parentheses, comma, and so on.
D. The digits 0, 1, ... ,9.
E. Boldface strings such as id or if, each of which represents a single terminal symbol.

2. These symbols are non-terminals:


A. Uppercase letters early in the alphabet, such as A, B, C.
B. The letter S, which, when it appears, is usually the start symbol.
C. Lowercase, italic names such as expr or stmt.
D. Uppercase letters may be used to represent non-terminals for particular constructs.
For example, the non-terminals for expressions, terms, and factors are often represented
by E, T, and F, respectively.
3. Uppercase letters late in the alphabet, such as X, Y, Z, represent grammar symbols; that is,
either non-terminals or terminals.

4. Lowercase letters late in the alphabet, chiefly u, v, ..., z, represent (possibly empty) strings of
terminals.

5. Lowercase Greek letters, α, β, γ for example, represent (possibly empty) strings of grammar
symbols.
❖ Thus, a generic production can be written as A → α, where A is the head and α the
body.
6. A set of productions A → α₁, A → α₂, A → α₃, …, A → αₖ with a common head A (call them
A-productions) may be written A → α₁|α₂|α₃|...|αₖ. Call α₁, α₂, α₃, ..., αₖ the alternatives for A.

7. Unless stated otherwise, the head of the first production is the start symbol.


Example 2: - Using these conventions, the grammar of Example 1 can be rewritten concisely as:
E→ E+T|E-T|T
T→ T*F|T/F|F
F → ( E ) | id

Derivations
❖ A derivation is a description of how a string is generated from the start symbol of a
grammar.
❖ The construction of a parse tree can be made precise by taking a derivational view, in which
productions are treated as rewriting rules. Beginning with the start symbol, each rewriting
step replaces a nonterminal by the body of one of its productions.
❖ For a general definition of derivation, consider a nonterminal A in the middle of a sequence
of grammar symbols, as in 𝛼𝐴𝛽, where 𝛼 𝑎𝑛𝑑 𝛽 are arbitrary strings of grammar symbols.
o Suppose 𝐴 → 𝛾 is a production. Then, we write 𝛼𝐴𝛽 ⇒ 𝛼𝛾𝛽.
o The symbol ⇒ means, "derives in one step."
❖ Example 3: Use the CFG below to perform the derivations in example 4 & 5.
Terminals = {id, num, while, do, print, >, {, }, ;, (, ) }
Non-Terminals = { S, E, B, L }
Rules = (1) S → print(E);
(2) S → while (B) do S
(3) S → { L }
(4) E → id
(5) E → num
(6) B → E > E
(7) L → S
(8) L → SL
Start Symbol = S
Leftmost Derivations
 A string of terminals and non-terminals α that can be derived from the initial symbol of the
grammar is called a sentential form
 Thus the strings “{ SL }”, “while(id>E) do S”, and “print(id);” of the above example
are all sentential forms.
 A derivation is “leftmost” if, at each step in the derivation, the leftmost non-terminal in
the sentential form is the one selected for replacement.
 A sentential form that occurs in a leftmost derivation is called a left-sentential form.
Example 4: We can use leftmost derivations to generate while (id > num) do print(id); from
the above CFG (example 3) as follows:
S → while(B) do S
→ while(E>E) do S
→ while(id>E) do S
→ while(id>num) do S
→ while(id>num) do print(E);
→ while(id>num) do print(id);
Rightmost Derivations
 A rightmost derivation chooses the rightmost non-terminal to replace at each step.
Example 5: Generate while (num > num) do print(id); from CFG in example 3
S → while(B) do S
→ while(B) do print(E);
→ while(B) do print(id);
→ while(E>E) do print(id);
→ while(E>num) do print(id);
→ while(num>num) do print(id);
Non-Leftmost, Non-Rightmost Derivations
 Some derivations are neither leftmost nor rightmost, such as:
S → while(B) do S
→ while(E>E) do S
→ while(E>E) do print(E);
→ while(E>id) do print(E);
→ while(num>id) do print(E);
→ while(num>id) do print(num);
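A single rewriting step of any of these kinds can be sketched in Python using the grammar of example 3. The list-based sentential form and the `rewrite` helper are illustrative assumptions.

```python
# Illustrative sketch of one derivation step (grammar of example 3).
# A sentential form is a list of symbols; NONTERMS identifies non-terminals.
NONTERMS = {"S", "E", "B", "L"}

def rewrite(form, head, body, leftmost=True):
    """Replace the leftmost (or rightmost) non-terminal in `form` with
    `body`. The chosen non-terminal must equal `head`, the rule's LHS."""
    positions = [i for i, sym in enumerate(form) if sym in NONTERMS]
    i = positions[0] if leftmost else positions[-1]
    assert form[i] == head, "selected non-terminal does not match rule head"
    return form[:i] + body + form[i + 1:]

# First two steps of the leftmost derivation in example 4:
form = ["S"]
form = rewrite(form, "S", ["while", "(", "B", ")", "do", "S"])
form = rewrite(form, "B", ["E", ">", "E"])  # leftmost: B precedes the second S
print(" ".join(form))  # while ( E > E ) do S
```

Passing `leftmost=False` would instead pick the rightmost non-terminal, giving the rightmost derivation of example 5.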
CFG Shorthand
 We can combine two rules of the form S → α and S → β to get the single rule S → α | β
Example 6: CFG in example 3 can be shortened as follows
Terminals = {id, num, while, do, print, >, {, }, ;, (, ) }
Non-Terminals = { S, E, B, L }
Rules = S → print(E); | while (B) do S | { L }
E → id | num
B→E>E
L → S | SL
Start Symbol = S

Parse Trees
➢ A parse tree is a graphical representation of a derivation that filters out the order in which
productions are applied to replace non-terminals.
 Each interior node of a parse tree represents the application of a production.
 The interior node is labeled with the nonterminal A in the head of the production; the
children of the node are labeled, from left to right, by the symbols in the body of the
production by which this A was replaced during the derivation.
➢ We start with the initial symbol S of the grammar as the root of the tree
 The children of the root are the symbols that were used to rewrite the initial symbol in
the derivation.
 The internal nodes of the parse tree are non-terminals
 The children of each internal node N are the symbols on the right-hand side of a rule
that has N as the left-hand side (e.g. B → E > E where E > E is the right-hand side and
B is the left-hand side of the rule)
➢ Terminals are leaves of the tree.
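These points can be illustrated with a minimal tree structure in Python. The `Node` class is an assumption for the sketch, not a prescribed representation.

```python
# Minimal parse-tree sketch (the Node class is a hypothetical representation).
# An interior node holds the non-terminal at the head of a production; its
# children are the symbols of the body; leaves are terminals.
class Node:
    def __init__(self, symbol, children=None):
        self.symbol = symbol
        self.children = children or []

    def leaves(self):
        """Return the terminal symbols at the leaves, left to right
        (the string derived by the tree)."""
        if not self.children:
            return [self.symbol]
        out = []
        for child in self.children:
            out.extend(child.leaves())
        return out

# Tree for B => E > E => id > E => id > num, using the rules of example 3:
tree = Node("B", [Node("E", [Node("id")]),
                  Node(">"),
                  Node("E", [Node("num")])])
print(" ".join(tree.leaves()))  # id > num
```

Reading the leaves left to right recovers the derived string, which is why the parse tree "filters out" the order in which productions were applied.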
Examples 7 and 8 below use the grammar E → E + E | E * E | ( E ) | - E | id.
Example 7: - ( id + id )
E ⇒ - E ⇒ - ( E ) ⇒ - ( E + E ) ⇒ - ( id + E ) ⇒ - ( id + id )

Example 8: id + id * id
E ⇒ E + E ⇒ E + E * E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id

Fig. Two parse trees, (a) and (b), for the string in example 8

Ambiguous Grammars
 A grammar is ambiguous if there is at least one string derivable from the grammar that has
more than one different parse tree, or more than one leftmost derivation, or more than one
rightmost derivation
 The string in example 8 has two parse trees, (a) and (b), so the grammar used there is
ambiguous.
 Ambiguous grammars are bad, because the parse trees don’t tell us the exact meaning of the
string.
 For example, looking at example 8 again: in parse tree (a) the string means id + (id * id),
but in parse tree (b) it means (id + id) * id. This is why we call the grammar “ambiguous”.
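To make the difference concrete, suppose the three ids take the values 2, 3, and 4 (hypothetical numbers, chosen only for illustration): the two parse trees evaluate to different results.

```python
# Illustrative: the two parse trees for id + id * id mean different things.
# Suppose the ids are 2, 3, and 4 (hypothetical values).
a, b, c = 2, 3, 4
tree_a = a + (b * c)   # parse tree (a): * groups tighter, 2 + (3 * 4)
tree_b = (a + b) * c   # parse tree (b): + groups tighter, (2 + 3) * 4
print(tree_a, tree_b)  # 14 20
```

A compiler must produce exactly one of these values, which is why an ambiguous grammar is unusable as-is for translation.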

We need to change the grammar to fix this problem. How? We may rewrite the grammar as follows:
Terminals = {id, +, -, *, /, (, )}
Non-Terminals = {E, T, F }
Start Symbol = E
Rules = E → E + T
E→ E-T
E→ T
T→ T*F
T→ T/F
T→ F
F → id
F → (E)
Fig. A parse tree for id * (id + id)

Review Exercises
Note: attempt all questions individually.
Submit your answer on [email protected]
Due date: January 29, 2025 G.C.

1. Consider the context-free grammar: S → S S + | S S * | a and the string aa + a*.


a) Give a leftmost derivation for the string.
b) Give a rightmost derivation for the string.
c) Give a parse tree for the string.
d) Is the grammar ambiguous or unambiguous? Justify your answer.
e) Describe the language generated by this grammar.

2. Consider the following grammar


Terminals = { a, b }
Non-Terminals = {S, T, F }
Start Symbol = S
Rules = S→ TF
T→ T T T
T→ a
F→ aFb
F→ b

Which of the following strings are derivable from the grammar? Give the parse tree for each
derivable string.
i. ab iv. aaabb
ii. aabbb v. aaaabb
iii. aba vi. aabb

3. Show that the following CFGs are ambiguous by giving two parse trees for the same string?
3.1) Terminals = { a, b }
Non-Terminals = { S, T }
Start Symbol = S
Rules = S → S T S
S → b
T → a T
T → ε
3.2) Terminals = { if, then, else, print, id }
Non-Terminals = { S, T }
Start Symbol = S
Rules = S → if id then S T
S → print id
T → else S
T → ε

4. Construct a CFG for each of the following:


a. All integers with sign (Example: +3, -3)
b. The set of all strings over { (, ), [, ] } which form balanced parentheses. That is, (), ()(),
((()())()), [()()] and ([()[]()]) are in the language but )( , ][ , (() and ([ are not.
c. The set of all strings over {num, +, -, *, /} which are legal binary post-fix expressions.
Thus num num +, num num num + *, and num num - num * are all in the language, while
num *, num * num, and num num num - are not in the language.
d. Are your CFGs in a, b and c ambiguous?
