System Software

The document discusses scanning and parsing in programming language grammars. It describes lexical analysis, tokenization, regular expressions, finite automata, context-free grammars, ambiguity, top-down and bottom-up parsing. Scanning converts the source code into tokens which are then parsed using various parsing techniques.

Uploaded by

nityamparesh

Chapter 6.

Scanning and Parsing

• Programming Language Grammars
• Classification of Grammar
• Ambiguity in Grammatical Specification
• Scanning
• Parsing
• Top Down Parsing
• Bottom up Parsing
• Language Processor Development Tools
– LEX
– YACC

Reference Book: System Programming by D M Dhamdhere, McGraw Hill Publication

Recall: The Structure of a Compiler
Scanner
• The scanner begins the analysis of the source program by reading the input, character by character, and grouping characters into individual words and symbols (tokens).
• RE (Regular Expression)
• NFA (Non-deterministic Finite Automaton)
• DFA (Deterministic Finite Automaton)
• LEX
[Figure: compiler structure with the stages Source Language, Scanner, Tokens, Parsing, Interm. Code, Optimization, Machine Code Gen., and the note "Today we start"]
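Recall: The role of Scanner and Parser
[Figure: Source program → Scanner → Parser → to semantic analysis. The parser calls getNextToken and the scanner returns a token; both the scanner and the parser use the symbol table.]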
Programming Language Grammars
• The lexical and syntactic features of a programming language are
specified by its grammar.

• A language L can be considered to be a collection of valid sentences.

• Each sentence can be looked upon as a sequence of words, and each
word as a sequence of letters or graphic symbols acceptable in L.

• A language specified in this manner is known as a formal language.

• A formal language grammar is a set of rules which precisely specify
the sentences of L.
Programming Language Grammars
[Figure: alphabets → words / strings → sentences / statements → language]
Programming Language Grammars

- Terminal Symbols, Alphabet and Strings

• The alphabet of L is represented by the Greek
symbol Σ.
• For example, Σ = {a, b, …, z, 0, 1, …, 9}
• A string is a finite sequence of symbols, e.g.
α = axy
Programming Language Grammars
- Productions
• Also called a rewriting rule; it has the form
A nonterminal symbol ::= String of Ts and NTs
e.g.:
<Noun Phrase> ::= <Article> <Noun>
<Article> ::= a| an | the
<Noun> ::= boy | apple
Programming Language Grammars

- Grammar
• A grammar G of a language L_G is a quadruple
(Σ, SNT, S, P) where
– Σ is the alphabet
– SNT is the set of NT’s
– S is the distinguished symbol
– P is the set of productions
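
The quadruple can be written down directly as a small data structure. The sketch below is only an illustration: the class name `Grammar`, the helper `is_well_formed`, and the choice to treat whole words (rather than single letters) as terminal symbols are assumptions made here, not part of the slides.

```python
from dataclasses import dataclass

@dataclass
class Grammar:
    alphabet: frozenset      # Σ: terminal symbols (whole words here, by assumption)
    nonterminals: frozenset  # SNT: set of nonterminal symbols
    start: str               # S: distinguished symbol
    productions: dict        # P: nonterminal -> list of alternatives (tuples of symbols)

# The noun-phrase productions shown earlier, packed into the quadruple.
noun_phrase = Grammar(
    alphabet=frozenset({"a", "an", "the", "boy", "apple"}),
    nonterminals=frozenset({"<Noun Phrase>", "<Article>", "<Noun>"}),
    start="<Noun Phrase>",
    productions={
        "<Noun Phrase>": [("<Article>", "<Noun>")],
        "<Article>": [("a",), ("an",), ("the",)],
        "<Noun>": [("boy",), ("apple",)],
    },
)

def is_well_formed(g):
    """Check that S is an NT, Σ and SNT are disjoint, and every symbol used is declared."""
    symbols = g.alphabet | g.nonterminals
    return (
        g.start in g.nonterminals
        and not (g.alphabet & g.nonterminals)
        and set(g.productions) <= g.nonterminals
        and all(sym in symbols
                for alts in g.productions.values()
                for alt in alts
                for sym in alt)
    )

print(is_well_formed(noun_phrase))
```

Keeping Σ and SNT as separate sets makes it easy to tell, for any symbol in a sentential form, whether it can still be rewritten.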
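Programming Language Grammars
- Derivation
• Let production P1 of grammar G be of the
form P1: A ::= α
and let β be a string such that β = γAθ.
• Replacing A by α in β is a derivation according to P1.
The resulting string is γαθ, written β ⇒ γαθ.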
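Programming Language Grammars
- Reduction
• Reduction is the inverse of derivation: given a production P1: A ::= α
and a string σ = γαθ, replacing α by A in σ yields γAθ.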
Programming Language Grammars

- Example
<Sentence> ::= <Noun Phrase> <Verb Phrase>
<Noun Phrase> ::= <Article> <Noun>
<Verb Phrase> ::= <Verb> <Noun Phrase>
<Article> ::= a| an| the
<Noun> ::= boy | apple
<Verb> ::= ate
Programming Language Grammars
- Derivation of the sentence "the boy ate an apple"
<Sentence>
⇒ <Noun Phrase> <Verb Phrase>
⇒ <Article> <Noun> <Verb Phrase>
⇒ <Article> <Noun> <Verb> <Noun Phrase>
⇒ <Article> <Noun> <Verb> <Article> <Noun>
⇒ the <Noun> <Verb> <Article> <Noun>
⇒ the boy <Verb> <Article> <Noun>
⇒ the boy ate <Article> <Noun>
⇒ the boy ate an <Noun>
⇒ the boy ate an apple
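
A derivation of this sentence can also be produced mechanically. The following is a minimal sketch, assuming the example grammar above and always rewriting the leftmost nonterminal, so the order of steps differs slightly from the slide. The names `PRODUCTIONS` and `derive_leftmost` are illustrative choices, not taken from the source.

```python
PRODUCTIONS = {
    "<Sentence>": [["<Noun Phrase>", "<Verb Phrase>"]],
    "<Noun Phrase>": [["<Article>", "<Noun>"]],
    "<Verb Phrase>": [["<Verb>", "<Noun Phrase>"]],
    "<Article>": [["a"], ["an"], ["the"]],
    "<Noun>": [["boy"], ["apple"]],
    "<Verb>": [["ate"]],
}

def derive_leftmost(start, target):
    """Return the sentential forms of a leftmost derivation of target, or None."""
    def step(form):
        # Find the leftmost nonterminal; everything before it is terminal.
        for i, sym in enumerate(form):
            if sym in PRODUCTIONS:
                break
        else:
            # No nonterminal left: the form is a sentence.
            return [form] if form == target else None
        # The terminal prefix can no longer change, so it must match the target.
        # No alternative is empty, so a form never gets shorter.
        if form[:i] != target[:i] or len(form) > len(target):
            return None
        for alt in PRODUCTIONS[sym]:
            rest = step(form[:i] + alt + form[i + 1:])
            if rest is not None:
                return [form] + rest
        return None
    return step([start])

forms = derive_leftmost("<Sentence>", "the boy ate an apple".split())
for form in forms:
    print(" ".join(form))
```

Each printed line differs from the previous one by the application of exactly one production, which is the definition of a derivation step.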
Classification of Grammars
• Venn diagram of grammar types (each type contains the types listed below it):
Type 0 – Phrase-structure
Type 1 – Context-Sensitive
Type 2 – Context-Free
Type 3 – Regular
TYPE – 0 GRAMMARS
• These grammars, known as phrase structure
grammars, contain productions of the form
α ::= β
- Where both α and β can be strings of Ts and NTs.
• Such productions permit arbitrary substitution
of strings during derivation or reduction, hence
they are not relevant to the specification of
programming languages.
TYPE – 1 GRAMMARS
• These grammars are known as context sensitive
grammars because their productions specify that
derivation or reduction of strings can take place only in
specific contexts.
• A Type – 1 production has the form
α A β ::= α π β
• Thus, a string π in a sentential form can be replaced by
‘A’ only when it is enclosed by the strings α and β.
• These grammars are also not particularly relevant for
PL specification since recognition of PL constructs is not
context sensitive in nature.
TYPE – 2 GRAMMARS
• These grammars impose no context requirements on derivations
or reductions.
• A typical Type – 2 production is of the form
A ::= π
which can be applied independent of its context.
• These grammars are therefore known as context free grammars
(CFG).
• CFGs are ideally suited for programming language specification.
• Two best known uses of Type – 2 grammars in PL specification:
• ALGOL-60 specification
• Pascal specification.
TYPE – 3 GRAMMARS
• Type – 3 grammars (Regular Grammars) are characterized by
productions of the form
A ::= tB | t or A ::= Bt | t
• These productions also satisfy the requirements of Type – 2
grammars.
• The specific form of the RHS alternatives- namely a single T or a
string containing a single T and a single NT- gives some practical
advantages in scanning.
• The use of Type – 3 productions is restricted to the specification of
lexical units, e.g. identifiers, constants, labels, etc.
TYPE – 3 GRAMMARS
• The productions for <constant> and <identifier> in a
grammar are in fact Type – 3 in nature.
• This is seen more clearly when the production for <id> is
written in the form Bt | t, as
<id> ::= l | <id> l | <id> d
where l and d stand for a letter and a digit respectively.
• Type – 3 grammars are also known as linear grammars
or regular grammars.
• These are further categorized into left-linear and
right-linear grammars depending on whether the NT in the
RHS alternatives appears at the extreme left or extreme
right.
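
The practical advantage for scanning is that a Type – 3 production can be checked in a single left-to-right pass with no stack. The sketch below assumes the production <id> ::= l | <id> l | <id> d with l restricted to ASCII letters and d to decimal digits; the function name `is_identifier` is an illustrative choice.

```python
import string

LETTERS = set(string.ascii_letters)  # l: assumed to mean ASCII letters
DIGITS = set(string.digits)          # d: decimal digits

def is_identifier(s):
    """Recognize s by reducing left to right with <id> ::= l | <id> l | <id> d."""
    # The first character must reduce to <id> via the alternative <id> ::= l.
    if not s or s[0] not in LETTERS:
        return False
    # Every later character extends the current <id> via <id> l or <id> d.
    for ch in s[1:]:
        if ch not in LETTERS and ch not in DIGITS:
            return False
    return True

for text in ["x1", "total", "1x", "a_b"]:
    print(text, is_identifier(text))
```

Because the production is left-linear, the recognizer only ever needs to remember one fact (whether an <id> has been seen so far), which is exactly the behaviour of a finite automaton.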
Ambiguity In Grammatical Specification
• Ambiguity implies the possibility of different interpretations of a
source string.
• In natural language, ambiguity may concern the meaning or
syntax category of a word, or the syntactic structure of a construct.
• For example, a word can have multiple meanings or can be both a noun and a
verb, and a sentence can have multiple syntactic structures.
• Bank – river bank or financial bank
• Formal language grammars avoid ambiguity at the level of a lexical
unit or a syntax category.
• This is achieved by the simple rule that identical strings cannot
appear on the RHS of more than one production in the grammar.
Ambiguity In Grammatical Specification

• Existence of ambiguity at the level of the syntactic
structure of a string would mean that more than one
parse tree can be built for the string.
• Example:
<exp> ::= <id>|<exp>+<exp>|<exp>*<exp>
<id> ::= a|b|c
• Two parse trees exist for the source string a+b*c
according to this grammar-
• a+b is first reduced to <exp>
• b*c is first reduced to <exp>.
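
The number of parse trees can be counted directly. The sketch below assumes the grammar <exp> ::= <id> | <exp>+<exp> | <exp>*<exp> with <id> ::= a | b | c and input without spaces. The function name `count_parse_trees` is illustrative, and the counting scheme (try every operator as the top-level split) is one possible approach, not a method given in the slides.

```python
from functools import lru_cache

def count_parse_trees(s):
    """Count parse trees of s under <exp> ::= <id> | <exp>+<exp> | <exp>*<exp>."""
    @lru_cache(maxsize=None)
    def exp(i, j):
        # Number of distinct ways the substring s[i:j] can be derived from <exp>.
        n = 1 if j - i == 1 and s[i] in "abc" else 0  # <exp> ::= <id>
        # Try each operator strictly inside the substring as the root of the tree.
        for k in range(i + 1, j - 1):
            if s[k] in "+*":
                n += exp(i, k) * exp(k + 1, j)
        return n
    return exp(0, len(s))

print(count_parse_trees("a+b*c"))
print(count_parse_trees("a+b"))
```

A count greater than 1 for any string shows that the grammar is ambiguous; a count of 1 for one particular string says nothing about the grammar as a whole.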
Ambiguity In Grammatical Specification

• A grammar is ambiguous if some string is derived ambiguously.

• A string is derived ambiguously if it has more than one leftmost
derivation (equivalently, more than one parse tree).
Typical example:
Production Rule
E → 0 | 1 | E+E | EE
String: 01+1
Leftmost Derivation      Rightmost Derivation
E ⇒ E+E                  E ⇒ EE
  ⇒ EE+E                   ⇒ EE+E
  ⇒ 0E+E                   ⇒ EE+1
  ⇒ 01+E                   ⇒ E1+1
  ⇒ 01+1                   ⇒ 01+1
• The first derivation builds a tree rooted at E+E and the second a tree
rooted at EE, so the string 01+1 has two parse trees.
Ambiguity In Grammatical Specification
• Ambiguity and Parse Trees
• The ambiguity of 01+1 is shown by the two different parse trees:
Parse Tree 1 (root E+E):
        E
      / | \
     E  +  E
    / \    |
   E   E   1
   |   |
   0   1
Parse Tree 2 (root EE):
      E
     / \
    E   E
    |  /|\
    0 E + E
      |   |
      1   1
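Ambiguity In Grammatical Specification
• Example - I
S → AB | aaB
A → a | Aa
B → b
Check whether the given grammar is ambiguous.

String: aab
Parse Tree 1: S → AB, with A ⇒ Aa ⇒ aa and B → b
Parse Tree 2: S → aaB, with B → b
Two parse trees exist for aab, so the grammar is ambiguous.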
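
One way to check a string for ambiguous derivation is to enumerate all of its leftmost derivations. The sketch below assumes the grammar of Example - I (S → AB | aaB, A → a | Aa, B → b), with uppercase letters as nonterminals. The names `RULES` and `leftmost_derivations` are illustrative, and the length-based pruning relies on the assumption that no rule has an empty right-hand side.

```python
RULES = {"S": ["AB", "aaB"], "A": ["a", "Aa"], "B": ["b"]}

def leftmost_derivations(target, start="S"):
    """Return every leftmost derivation of target as a list of sentential forms."""
    found = []

    def expand(form, path):
        # Locate the leftmost nonterminal, if any.
        i = next((k for k, c in enumerate(form) if c in RULES), None)
        if i is None:
            if form == target:
                found.append(path)
            return
        # Prune: no rule shrinks a form, and the terminal prefix is already final.
        if len(form) > len(target) or form[:i] != target[:i]:
            return
        for rhs in RULES[form[i]]:
            new = form[:i] + rhs + form[i + 1:]
            expand(new, path + [new])

    expand(start, [start])
    return found

for derivation in leftmost_derivations("aab"):
    print(" => ".join(derivation))
```

Two distinct leftmost derivations are printed for aab, one starting with S ⇒ AB and one with S ⇒ aaB, which matches the two parse trees.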
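Ambiguity In Grammatical Specification
• Example - II
E → E + E
E → E * E
E → id
Check whether the given grammar is ambiguous.

String: id + id * id
Parse Tree 1: root E → E + E, with id * id derived from the right operand
Parse Tree 2: root E → E * E, with id + id derived from the left operand
Two parse trees exist for id + id * id, so the grammar is ambiguous.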
Scanning

Tokens, Patterns and Lexemes

• A token is a pair consisting of a token name and an optional token
value.
• A pattern is a description of the form that the lexemes of a
token may take.
• A lexeme is a sequence of characters in the source
program that matches the pattern for a token.
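
The three terms can be seen together in a small scanner. This is a minimal sketch using Python's `re` module; the token names (`NUMBER`, `ID`, `OP`, `SKIP`) and their patterns are hypothetical choices for illustration and do not come from the slides.

```python
import re

# Hypothetical token names paired with their patterns (regular expressions).
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID", r"[A-Za-z][A-Za-z0-9]*"),
    ("OP", r"[+\-*/=]"),
    ("SKIP", r"\s+"),  # whitespace is matched but produces no token
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Yield (token name, lexeme) pairs; raise if no pattern matches."""
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        if m.lastgroup != "SKIP":
            # m.group() is the lexeme; m.lastgroup names the pattern it matched.
            yield m.lastgroup, m.group()
        pos = m.end()

for token in tokenize("x1 = y + 42"):
    print(token)
```

Here `ID` is a token name, `[A-Za-z][A-Za-z0-9]*` is its pattern, and `x1` and `y` are lexemes that match it.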
