0% found this document useful (0 votes)

39 views29 pages

Chapter 3 Syntax Analysis (Parsing)

Uploaded by

ejigu

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PPTX, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

39 views29 pages

Chapter 3 Syntax Analysis (Parsing)

Uploaded by

ejigu

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PPTX, PDF, TXT or read online on Scribd

Chapter Three

Syntax Analysis(Parsing)

Compiler Design

1
Objective

At the end of this session students will be able to:

 Understand the basic roles of Parser(syntactic Analyzer).

 Understand context-Free Grammars(CFGs) and their representation format.

 Understand the different derivation formats: Leftmost derivation, Rightmost

derivation and Non-Leftmost, Non-Rightmost derivations

 Be familiar with CFG shorthand techniques.

 Understand Parse Tree and its structure.

 Understand ambiguous grammars and how to deal with ambiguity from CFGs.

 Understand the Extended Backus Naur Form

 Understand the Javacc Parser Generator and its Structure.

2
The Role of the Parser

Source token
Lexical Rest of Intermediate
program Parser Parse tree representation
Analyzer Front End
getNextToken

Symbol
table

Parser
 The parser is expected to report any
performs context-free syntax
syntax errors in an intelligible fashion and
analysis
to recover from commonly occurring
guides context-sensitive analysis
errors to continue processing the
constructs an intermediate
remainder of the program.
representation
 Conceptually, for well-formed programs,
produces meaningful error
3
the parser constructs a parse tree and passes
messages
Contd.
 The parser obtains a string of tokens from the lexical analyzer, as shown in the

above Figure and verifies that the string of token names can be generated by the
grammar for the source language.
 A grammar gives a precise, yet easy-to-understand, syntactic specification of a
programming language.
 From certain classes of grammars, we can construct automatically an efficient

parser that determines the syntactic structure of a source program.

 As a side benefit, the parser-construction process can reveal syntactic
ambiguities and trouble spots that might have slipped through the initial design
phase of a language.
 The structure imparted to a language by a properly designed grammar is

useful for translating source programs into correct object code and for
detecting errors.
 A grammar allows a language to be evolved or developed iteratively, by adding
4
new constructs to perform new tasks.
Contd.
 There are three general types of parsers for grammars: universal, top-down, and

bottom-up.
1. Universal parsing methods such as the Cocke-Younger-Kasami algorithm and
Earley's algorithm can parse any grammar [Read more on this].
 These general methods are, however, too inefficient to use in production

compilers.
 The methods commonly used in compilers can be classified as being either top-

down or bottom-up.
2. Top-Down Methods:- As implied by their names, top-down methods build
parse trees from the top (root ) to the bottom (leaves ) .
3. Bottom-up methods:- start from the leaves and work their way up to the
root to build the parse tree .
 In either case, the input to the parser is scanned from left to right, one symbol at

5 a time
 The most efficient top-down and bottom-up methods work only for sub-classes
Error Handling
Common Programming Errors include:
Lexical errors, Syntactic errors, Semantic errors and logical Errors
Error handler goals
Report the presence of errors clearly and accurately
Recover from each error quickly enough to detect subsequent errors
Add minimal overhead to the processing of correct programs
Common Error-Recovery Strategies includes:
1. Panic mode recovery:- Discard input symbol one at a time until one of designated
set of synchronization tokens is found.
2. Phrase level recovery:- Replacing a prefix of remaining input by some string that
allows the parser to continue.
3. Error productions:- Augment the grammar with productions that generate the
erroneous constructs
4. Global correction:- Choosing minimal sequence of changes to obtain a globally
6
least-cost correction
Context-Free Grammars (CFGs)
 CFG is used as a tool to describe the syntax of a programming language.
 A CFG includes 4 components:

1. A set of terminals T, which are the tokens of the language

 Terminals are the basic symbols from which strings are formed.
 The term "token name" is a synonym for "terminal"
2. A set of non-terminals N
 Non-terminals are syntactic variables that denote sets of strings.
 The sets of strings denoted by non-terminals help define the language
generated by the grammar.
 Non-terminals impose a hierarchical structure on the language that is
key to syntax analysis and translation
3. A set of rewriting rules R.
 The left-hand side (head) of each rewriting rule is a single non-terminal.
 The right-hand side (body) of each rewriting rule is a string of terminals
and/or non-terminals
7
4. A special non-terminal S Є N, which is the start symbol
Contd.
 Just as regular expression generate strings of characters, CFG generate strings of tokens
 A string of tokens is generated by a CFG in the following way:
1. The initial input string is the start symbol S
2. While there are non-terminals left in the string:
i. Pick any non-terminal in the input string A
ii. Replace a single occurrence of A in the string with the right-hand side of any rule that
has A as the left-hand side
iii. Repeat 1 and 2 until all elements in the string are terminals
Example:
Terminals = { id, num, if, then, else, print, =, {, }, ;, (, ) }
Non-Terminals = { S, E, B, L }
Rules = (1) S  print(E);
(2) S  while (B) do S
(3) S  { L }
(4) E  id
(5) E  num
(6) B  E > E
(7) L  S
8 (8) L  SL
Start Symbol = S
Contd.
Example 3: A grammar that defines simple arithmetic expressions:

Terminals = { id, +, -, *, /, (, ) }

Non-Terminals = {expression, term, factor } Example 4:

Start Symbol = expression 1. expression  expression + expression

2. expression  expression – expression
Rules = expression  expression + term
3. expression  expression * expression
 expression – term
4. expression  expression / expression
 term 5. expression  num
term  term* factor
expression  expression + expression
 term/factor
® expression * expression +
 factor expression
factor  ( expression ) ® num * expression + expression

 id ® num * num+ expression

9
® num * num+ num
Conventions
1. These symbols are terminals:

A. Lowercase letters early in the alphabet, such as a, b, c.

B. Operator symbols such as +, *, and so on .

C. Punctuation symbols such as parentheses , comma, and so on.

D. The digits 0, 1, ... ,9 .

E. Boldface strings such as id or if, each of which represents a single terminal

symbol.
2. These symbols are non-terminals:

i. Uppercase letters early in the alphabet, such as A, B, C.

ii. The letter S, which, when it appears, is usually the start symbol.
iii. Lowercase, italic names such as expr or stmt.
iv. Uppercase letters may be used to represent non-terminals for the constructs.
For example:- non terminals for expressions, terms, and factors are often
represented by E, T, and F, respectively.
10
3. Uppercase letters late in the alphabet , such as X, Y, Z, represent grammar symbols;
Contd.
4. Lowercase letters late in the alphabet , chiefly u, v, ... ,z , represent (possibly
empty) strings of terminals.
5. Lowercase Greek letters ,,, for example, represent (possibly empty) strings of
grammar symbols.
 Thus, a generic production can be written as A , where A is the head and  the

body.
6. A set of productions A 1, A 2, A 3,..., A k with a common head A (call

them A-productions), may be written A 1|A 2|A 3|...|A k.

 Call 1, 2, 3,...,k the alternatives for A

7. Unless stated otherwise, the head of the first production is the start symbol.
• The notational
Example:- Using these conventions , the grammar of Example conventions
4 of tell us #that
slide 9 can be
E,T, and F are non-
rewritten concisely as: terminals, with E the start
symbol.
E E+ T|E-T|T • The remaining symbols
11 are terminals
T T*F|T/F|F
Derivations
 A derivation is a description of how a string is generated from the start symbol of a
grammar.
1. A leftmost derivation always picks the leftmost non-terminal to replace (see slide 13)
2. A rightmost derivation always picks the rightmost non-terminal to replace( see slide 14)
 Some derivations are neither leftmost nor rightmost (see slide 15)
 For example: Use the CFG below to generate print (id);
Terminals = { id, num, if, then, else, print, =, {, }, ;, (, ) }
Non-Terminals = { S, E, B, L }
Rules = (1) S  print(E);
(2) S  while (B) do S
(3) S  { L }
(4) E  id
(5) E  num
(6) B  E > E
(7) L  S
12
(8) L  SL
Leftmost Derivations
 A string of terminals and non-terminals α that can be derived from the initial symbol of the
grammar is called a sentential form
 Thus the strings “{ S L }”, “while(id>E) do S”, and print(E>id)” of the above example re
all sentential forms
 A derivation is “leftmost” if, at each step in the derivation, the leftmost non-terminal is
selected to replace
 All of the above examples are leftmost derivations
 A sentential form that occurs in a leftmost derivation is called a left-sentential form
Example 1: We can use leftmost derivations to generate while(id > num) do print(id); from this
CFG as follows: Example 2: We also can generate { print(id);
S  while(B) do S print(num); } from the CFG as follows:
 while(E>E) do S S{L}
{SL}
 while(id>E) do S
 { print(E); L }
 while(id>num) do S  { print(id); L }
 while(id>num) do print(E);  { print(id); S }
 while(id>num) do print(id);  { print(id); print(E); }
13
 { print(id); print(num); }
Rightmost Derivations

 Is a derivation technique that chooses the rightmost non-terminal to

replace

Example 1: To generate while(num > num) do print(id);

S  while(B) do S
Example 2: Try to derivate { print(num); print(id); }
 while(B) do print(E);
from S
S{L}
 while(B) do print(id);
{SL}
 while(E>E) do print(id);
{SS}
 while(E>num) do print(id);  { S print(E); }

 while(num>num) do print(id);  { S print(id); }

 { print(E); print(id); }
 { print(num); print(id); }
14
Non-Leftmost, Non-Rightmost Derivations

 Some derivations are neither  Some strings that are not derivable from this CFG,

leftmost or rightmost, such as: such as:

1. print(id)
S  while(B) do S
2. { print(id); print(id) }
 while(E>E) do S
3. while (id) do print(id);
 while(E>E) do print(E);
4. print(id > id);

 while(E>id) do print(E); 1 & 2: no ; to terminate statements.

 while(num>id) do print(E); 3: the id in while (id) is not derivable from B.

4: id > id is not derivable from E.

 while(num>id) do
15
print(num);
CFG Shorthand
 We can combine two rules of the form S  α and S  β to get the single rule S  α│β

Example:

Terminals = { id, num, if, then, else, print, =, {, }, ;, (, ) }

Non-Terminals = { S, E, B, L }

Rules = S  print(E); | while (B) do S | { L }

E  id | num

BE>E

L  S | SL

Start Symbol = S

16
Parse Trees
 A parse tree is a graphical representation of a derivation that filters out the order in

which productions are applied to replace non-terminals .

 Each interior node of a parse tree represents the application of a production.

 The interior node is labeled with the nonterminal A in the head of the

production; the children of the node are labeled, from left to right, by the symbols in

the body of the production by which this A was replaced during the derivation .

 We start with the initial symbol S of the grammar as the root of the tree

 The children of the root are the symbols that were used to rewrite the initial symbol in the

derivation

 The internal nodes of the parse tree are non-terminals

 The children of each internal node N are the symbols on the right-hand side of a rule that has

N as the left-hand side (e.g. B  E > E where E > E is the right-hand side and B is the left-
17
hand side of the rule)
Examples
Example 1: -(id+id)
E => -E => -(E) => -(E+E) => -(id+E)=>-(id+id)

Example 2: (id+id*id)
E => E+E => E+E*E =>(E+id*E) => (E+id*id)=>(id+id*id)

18 a) b)
Ambiguous Grammars
 A grammar is ambiguous if there is at least one string derivable from the grammar that

has more than one different parse tree, or more than one leftmost derivation, or more
than one rightmost derivation
 Example 2 of slide 18 has two parse trees(parse tree a and b) that are ambiguous

grammars.
 Ambiguous grammars are bad, because the parse trees don’t tell us the exact meaning of the

string.
 For example, in Example 2 of the previous slide, in Fig a. the string means id*(id+id),

but in Fig. b, the string means (id*id)+id.

E This is why we call it “ambiguous”.
T
We need to change the grammar to fix this problem. How? We may rewrite the grammar
T * F
as follows:
F
Terminals = { id, +, -, *, /, (, ) } ( E )
Non-Terminals = {E, T, F } id
Start Symbol = E E + T

Rules = E E +T T F

E E -T F
id
19
E T id
A parse tree for id*id(id+id)
Contd.
We need to make sure that all additions appear higher in the tree than multiplications

(Why?)
How can we do this?

 Once we replace an E with E*E using single rule 4, we don’t want to rewrite any of the

Es we’ve just created using rule 2, since that would place an addition (+) lower in the

tree than a multiplication (*)

 Let’s create a new non-terminal T for multiplication and division

 T will generate strings of id’s multiplied or divided together, with no additions or

subtractions.

 Then we can modify E to generate strings of T’s added together

 This modified grammar is shown at slide no. 19.

20  However, this grammar is still ambiguous. It is impossible to generate a parse tree

Contd.
 Consider the string id+id+id, which has two parse trees, as shown at example 2 of slide

18:

id+id+id = (id+id)+id or

= id+(id+id) are all ok

id-id-id = (id-id)-id

!= id-(id-id) but this is wrong

 We would like addition and subtraction to have leftmost association as above

 In other words, we need to make sure that the right sub-tree of an addition or

subtraction is not another addition or subtraction

 We modified the parse tree of example 2 of slide 18 by the CFG and parse tree shown
21
at slide no. 19 to generate an unambiguous CFG and parse tree.
Extended Backus Naur Form (EBNF)
 Another term for a CFG is a Backus Naur Form (BNF).
 There is an extension to BNF notation, called Extended Backus Naur Form, or EBNF
 EBNF rules allow us to mix and match CFG notation and regular expression notation
in the right-hand side of CFG rules
 For example, consider the following CFG, which describes simpleJava statement
blocks and stylized simpleJava print statements:

1. S  { B } We could express the same language using an EBNF as

2. S  print(id) follows:
3. B  S ; C 1. S  { B }
4. C  S ; C 2. S  print”(“id”)”
5. C  ε 3. B  (S;)+
 Rules 3, 4, and 5 in the Note
above grammar describe a  In Rule 2, when we want a parenthesis to appear
series of one or more in EBNF, we need to surround it with quotation
statements S, terminated by marks.
22
semicolons
 But in Rule 3, the pair of parenthesis is for the
Exercise
1. Consider the following grammar
Terminals = { a, b } Which of the following strings are derivable from
Non-Terminals = {S, T, F }
Start Symbol = S the grammar? Give the parse tree for derivable
Rules = S TF strings? iv. aaabb
T T T T i. ab v. aaaabb
T a
ii. aabb vi. aabbb
F aFb
F b iii. aba

2. Show that the following CFGs are ambiguous by giving two parse trees for the same
2.2) Terminals = { if, then, else, print, id }
string?
Non-Terminals = {S, T}
2.1) Terminals = { a, b }
Start Symbol = S
Non-Terminals = {S, T}
Rules = S if id then S T
Start Symbol = S S print id
Rules = S STS T else S
S b T ε
23
T aT
Contd.
3. Construct a CFG for each of the following:

[Link] integers with sign (Example: +3, -3)

[Link] set of all strings over { (, ), [, ]} which form balanced parenthesis. That is,

(). ()(), ((()())()), [()()] and ([()[]()]) are in the language but )( , ][ , (() and ([ are

not.

[Link] set of all string over {num, +, -, *, /}which are legal binary post-fix

expressions. Thus numnum+, num num num + *, num num – num * are all in

the language, while num*, num*num and num num num – are not in the

language.

24 [Link] your CFGs in a, b and c ambiguous?

JavaCC-A LL(K) Parser Generator
 JavaCC can be used to build recursive dependent parser (RDP) as well as lexical
analyzer.
 It takes as input an EBNF grammar for a language and creates a suite mutually
recursive methods that implement a LL(K), Left-to-right, Leftmost-derivation, k-
symbol lookahead parser for the language.
JavaCC File Format Revisited ( Javacc .jj file have the following format)
Options
{
// Code to set various options flags
}
PARSER_BEGIN(simple)
public class simple
{ /* Extra parser method definitions go here, often a main progress
which drives the parse*/
}
PARSER_END(simple)
TOKEN_MGR_DECLS:
{
// Declarations used by the lexical analyzer
25 }
/* Token Rules and actions */
JavaCC Rules
 JavaCC uses Extended Brakus Naur Form (EBNF) rules to describe the grammar of the
language to be parsed.
JavaCC Rules Format is shown below:
void nonTerminal():
{ /* Java Declarations /* }
{
/* Rule Definition */
}
 The Java declarations block will be used when we add actions to the JavaCC rules for
building abstract syntax Tree.
 Since we are only using JavaCC to determine if a program is syntactically correct, we
will have the Java declaration block blank.
 The Rule Definition segment defines the right-hand side of the rule that we are writing.
 Non-terminals in these rules will represent function calls, and thus will be
followed by ().
26
 Terminal names will be enclosed in < and >.
Example
The following CFG rule for a subset of simpleJava statements:
S  while (E) S
S  V=E;
Would be represented by the JavaCC rule as:
void statement():
{ }
{
<WHILE><LEFTPARENTHESIS>expression()<RIGHTPARENTHESIS> statement()
| variable <ASSIGNOP>expression <SEMICOLON>
} Terminals = { num, +, -, *, / }
void expression(): Non-Terminals = {E, T, F }
{ } Start Symbol = E
{
Rules = E + E T
<PLUS> expression() expression()
| <MINUS> expression() expression() E - E E
| <TIMES> expression() expression() E T
| <DIVIDED> expression() expression() T * T F
| <INTEGER_LITERAL> T - T F
} T F
27 F num
A CFG for prefix arithmetic
Contd.

There are a few things to note about the form of JavaCC rules:

 In CFGs, we have followed the common convention of using uppercase letters for non-

terminals, and lowercase variables for terminals.

 JavaCC uses the reverse conversion, i.e. uppercase letters for terminals and

lowercase letters for non-terminals.

 JavaCC non-terminals are usually not a single letter, but amore meaningful identifier.

 All terminals(token names) are inside < and >

 Non-terminals represent method calls in the generator parser, hence the () after each

28 non-terminal in the rule.

Question?

Chapter 3 Syntax Analysis (Parsing)
No ratings yet
Chapter 3 Syntax Analysis (Parsing)
29 pages
Chapter 3 Syntax Analysis I
No ratings yet
Chapter 3 Syntax Analysis I
27 pages
Chapter 3 - Syntax Analysis Part One
No ratings yet
Chapter 3 - Syntax Analysis Part One
10 pages
Chapter 3 - Syntax Analysis
No ratings yet
Chapter 3 - Syntax Analysis
9 pages
Parser Lec1
No ratings yet
Parser Lec1
20 pages
Syntax Analysis and Parser Techniques
No ratings yet
Syntax Analysis and Parser Techniques
171 pages
Chapter 3
No ratings yet
Chapter 3
41 pages
Compiler Design CS - 4
No ratings yet
Compiler Design CS - 4
70 pages
Chapter 3 - Syntax Analysis
No ratings yet
Chapter 3 - Syntax Analysis
160 pages
Class Three
No ratings yet
Class Three
74 pages
Compiler Design 3
No ratings yet
Compiler Design 3
140 pages
CH2-1 To CH2-3
No ratings yet
CH2-1 To CH2-3
79 pages
Chapter 3
No ratings yet
Chapter 3
77 pages
Lesson 3: Syntax Analysis: Risul Islam Rasel
No ratings yet
Lesson 3: Syntax Analysis: Risul Islam Rasel
106 pages
Lecture 03
No ratings yet
Lecture 03
36 pages
Compiler Design - Syntax Analysis
No ratings yet
Compiler Design - Syntax Analysis
11 pages
Unit 2
No ratings yet
Unit 2
168 pages
G52Cmp Compilers: Syntax Analysis
No ratings yet
G52Cmp Compilers: Syntax Analysis
36 pages
Context-Free Grammar and Parsing Overview
No ratings yet
Context-Free Grammar and Parsing Overview
19 pages
Compiler Syntax & Yacc Guide
No ratings yet
Compiler Syntax & Yacc Guide
21 pages
Syntax and Semantics in Programming
100% (2)
Syntax and Semantics in Programming
50 pages
CD Chapter III-1
No ratings yet
CD Chapter III-1
77 pages
Chapter - Three
No ratings yet
Chapter - Three
139 pages
2.2 - Syntax Analysis (Upto Top-Down Parsing)
No ratings yet
2.2 - Syntax Analysis (Upto Top-Down Parsing)
91 pages
Syntax Analysis and Parsing Guide
No ratings yet
Syntax Analysis and Parsing Guide
95 pages
Syntax Analyzer and Parsing Techniques
No ratings yet
Syntax Analyzer and Parsing Techniques
38 pages
CD Unit 2
100% (1)
CD Unit 2
20 pages
CC-Lec 5 Week 5 Cfgs
No ratings yet
CC-Lec 5 Week 5 Cfgs
29 pages
(Week 4) Syntax Analysis (CFG)
No ratings yet
(Week 4) Syntax Analysis (CFG)
50 pages
Multimedia Application L4
No ratings yet
Multimedia Application L4
42 pages
Chapter 4 - Context-Free Grammars and Languages
No ratings yet
Chapter 4 - Context-Free Grammars and Languages
60 pages
2nd Phase Syntax Analyzer - 1
No ratings yet
2nd Phase Syntax Analyzer - 1
136 pages
Compiler Syntax Analysis Overview
No ratings yet
Compiler Syntax Analysis Overview
18 pages
CH2 1
No ratings yet
CH2 1
27 pages
Compiler - Design - Module3
No ratings yet
Compiler - Design - Module3
19 pages
Module 2 C D Notes
No ratings yet
Module 2 C D Notes
21 pages
Chapter 2
No ratings yet
Chapter 2
47 pages
2-Role of Parser and Parse Tree-02!08!2024
No ratings yet
2-Role of Parser and Parse Tree-02!08!2024
69 pages
Syntax Analysis for Programmers
No ratings yet
Syntax Analysis for Programmers
58 pages
Unit 2 Notes
No ratings yet
Unit 2 Notes
29 pages
CS6109 Module 4
No ratings yet
CS6109 Module 4
36 pages
Role of Parse1
No ratings yet
Role of Parse1
20 pages
Unit-2 Syntax Analysis
No ratings yet
Unit-2 Syntax Analysis
27 pages
Syntax Analysis and Parsing Techniques
No ratings yet
Syntax Analysis and Parsing Techniques
56 pages
Syntax Analysis
No ratings yet
Syntax Analysis
90 pages
Chapter - Three: Syntax Analysis
No ratings yet
Chapter - Three: Syntax Analysis
100 pages
Chapter-3 So Far
No ratings yet
Chapter-3 So Far
50 pages
Parsing ME Modified
No ratings yet
Parsing ME Modified
168 pages
Copch 3
No ratings yet
Copch 3
90 pages
Compiler Design Chapter-3
0% (1)
Compiler Design Chapter-3
177 pages
Lecture 1 Introduction DR Raheel 19022024 032426pm
No ratings yet
Lecture 1 Introduction DR Raheel 19022024 032426pm
32 pages
Compiler Design Lec-Three Syntax Analysis
No ratings yet
Compiler Design Lec-Three Syntax Analysis
60 pages
Chapter 4 Syntax Analysis
No ratings yet
Chapter 4 Syntax Analysis
95 pages
Chapter - 2 - Finite State Automata - Part - 3
No ratings yet
Chapter - 2 - Finite State Automata - Part - 3
50 pages
Syntax Analysis (Part-I)
No ratings yet
Syntax Analysis (Part-I)
88 pages
Context-Free Grammars and Parsing
No ratings yet
Context-Free Grammars and Parsing
7 pages
CD Unit Ii
No ratings yet
CD Unit Ii
38 pages
Unit 2
No ratings yet
Unit 2
86 pages
Syntax Analysis
No ratings yet
Syntax Analysis
47 pages
01 4021 Course Details 2024s
No ratings yet
01 4021 Course Details 2024s
35 pages
FINAL
No ratings yet
FINAL
19 pages
Pava Specs
No ratings yet
Pava Specs
19 pages
C++ Basics Summer Training Report
No ratings yet
C++ Basics Summer Training Report
20 pages
Sage X3 Sandbox Configuration Guide
No ratings yet
Sage X3 Sandbox Configuration Guide
8 pages
AZ 140T00A ENU Powerpoint 01
No ratings yet
AZ 140T00A ENU Powerpoint 01
22 pages
Cbse Board Practical Examination-2024-25 (Information Technology)
No ratings yet
Cbse Board Practical Examination-2024-25 (Information Technology)
1 page
CIT315
No ratings yet
CIT315
2 pages
Brocade 5000 SAN Switch
No ratings yet
Brocade 5000 SAN Switch
10 pages
HP Sssu Commands
No ratings yet
HP Sssu Commands
11 pages
Form GNL-1 Help
No ratings yet
Form GNL-1 Help
12 pages
Msce Computer Studies Notes
100% (1)
Msce Computer Studies Notes
61 pages
Chart
No ratings yet
Chart
9 pages
Essential Home Cybersecurity Tips
No ratings yet
Essential Home Cybersecurity Tips
2 pages
Ebenezer Viswas Gunaseelan (RESUME)
No ratings yet
Ebenezer Viswas Gunaseelan (RESUME)
1 page
Lab-2 3
No ratings yet
Lab-2 3
3 pages
Pve Admin Guide 8.2
No ratings yet
Pve Admin Guide 8.2
613 pages
Torrent User Guide 6.0
No ratings yet
Torrent User Guide 6.0
526 pages
Class 12 Computer Technology Full Textbook English Medium 2022
No ratings yet
Class 12 Computer Technology Full Textbook English Medium 2022
256 pages
How CGI Scripting Works: Web Servers
No ratings yet
How CGI Scripting Works: Web Servers
11 pages
C01002512H 07 08 16 en
No ratings yet
C01002512H 07 08 16 en
270 pages
Computer Application (COM 113)
No ratings yet
Computer Application (COM 113)
10 pages
The Futureof Enterpriseresourceplanning ERPHarnessing Artificial Intelligence
No ratings yet
The Futureof Enterpriseresourceplanning ERPHarnessing Artificial Intelligence
6 pages
Lesson 8 Quiz
No ratings yet
Lesson 8 Quiz
3 pages
Low-Cost Portable Air Quality System
No ratings yet
Low-Cost Portable Air Quality System
5 pages
TCL Grade 4 Piano
100% (2)
TCL Grade 4 Piano
27 pages
Aoa QB
No ratings yet
Aoa QB
2 pages
Unified Test in Math 8
No ratings yet
Unified Test in Math 8
14 pages
2010 GROHE Ondus Brochure CEE
No ratings yet
2010 GROHE Ondus Brochure CEE
56 pages
Auma User Manual
No ratings yet
Auma User Manual
100 pages

Chapter 3 Syntax Analysis (Parsing)

Uploaded by

Chapter 3 Syntax Analysis (Parsing)

Uploaded by

Chapter Three

At the end of this session students will be able to:

 Understand the basic roles of Parser(syntactic Analyzer).

 Understand context-Free Grammars(CFGs) and their representation format.

 Understand the different derivation formats: Leftmost derivation, Rightmost

derivation and Non-Leftmost, Non-Rightmost derivations

 Be familiar with CFG shorthand techniques.

 Understand Parse Tree and its structure.

 Understand the Extended Backus Naur Form

 Understand the Javacc Parser Generator and its Structure.

parser that determines the syntactic structure of a source program.

1. A set of terminals T, which are the tokens of the language

Non-Terminals = {expression, term, factor } Example 4:

Start Symbol = expression 1. expression  expression + expression

 id ® num * num+ expression

A. Lowercase letters early in the alphabet, such as a, b, c.

B. Operator symbols such as +, *, and so on .

C. Punctuation symbols such as parentheses , comma, and so on.

D. The digits 0, 1, ... ,9 .

E. Boldface strings such as id or if, each of which represents a single terminal

i. Uppercase letters early in the alphabet, such as A, B, C.

them A-productions), may be written A 1|A 2|A 3|...|A k.

 Call 1, 2, 3,...,k the alternatives for A

 Is a derivation technique that chooses the rightmost non-terminal to

Example 1: To generate while(num > num) do print(id);

 while(num>num) do print(id);  { S print(id); }

leftmost or rightmost, such as: such as:

 while(E>id) do print(E); 1 & 2: no ; to terminate statements.

 while(num>id) do print(E); 3: the id in while (id) is not derivable from B.

4: id > id is not derivable from E.

Terminals = { id, num, if, then, else, print, =, {, }, ;, (, ) }

Rules = S  print(E); | while (B) do S | { L }

which productions are applied to replace non-terminals .

 Each interior node of a parse tree represents the application of a production.

 The internal nodes of the parse tree are non-terminals

but in Fig. b, the string means (id*id)+id.

tree than a multiplication (*)

 Let’s create a new non-terminal T for multiplication and division

 T will generate strings of id’s multiplied or divided together, with no additions or

 Then we can modify E to generate strings of T’s added together

 This modified grammar is shown at slide no. 19.

20  However, this grammar is still ambiguous. It is impossible to generate a parse tree

= id+(id+id) are all ok

!= id-(id-id) but this is wrong

 We would like addition and subtraction to have leftmost association as above

subtraction is not another addition or subtraction

1. S  { B } We could express the same language using an EBNF as

[Link] integers with sign (Example: +3, -3)

24 [Link] your CFGs in a, b and c ambiguous?

terminals, and lowercase variables for terminals.

lowercase letters for non-terminals.

 All terminals(token names) are inside < and >

28 non-terminal in the rule.

You might also like