
System Software

The document discusses scanning and parsing in programming language grammars. It describes lexical analysis, tokenization, regular expressions, finite automata, context-free grammars, ambiguity, and top-down and bottom-up parsing. Scanning converts the source code into tokens, which are then parsed using various parsing techniques.

Uploaded by nityamparesh
Copyright © All Rights Reserved

Chapter 6.

Scanning and Parsing


• Programming Language Grammars
• Classification of Grammar
• Ambiguity in Grammatical Specification
• Scanning
• Parsing
• Top Down Parsing
• Bottom up Parsing
• Language Processor Development Tools
  – LEX
  – YACC

Reference Book: System Programming by D M Dhamdhere, McGraw Hill Publication


Recall: The Structure of a Compiler
[Figure: Source → Scanner → Tokens → Parsing → Intermediate Code → Optimization → Code Gen. → Machine Language]
• Scanner: the scanner begins the analysis of the source program by reading the input, character by character, and grouping characters into individual words and symbols (tokens).
• Scanner topics: RE (Regular Expression), NFA (Non-deterministic Finite Automata), DFA (Deterministic Finite Automata), LEX
• Today we start: Parsing
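Recall: The role of Scanner and Parser
[Figure: Source program → Scanner → token → Parser → to semantic analysis; the Parser requests each token from the Scanner with getNextToken, and both the Scanner and the Parser use the Symbol table.]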
Programming Language Grammars
• The lexical and syntactic features of a programming language are
specified by its grammar.

• A language L can be considered to be a collection of valid sentences.

• Each sentence can be looked upon as a sequence of words, and each word as a sequence of letters or graphic symbols acceptable in L.

• A language specified in this manner is known as a formal language.


• A formal language grammar is a set of rules which precisely specify
the sentences of L.
Programming Language Grammars
Alphabet → Words / Strings → Sentences / Statements → Language
Programming Language Grammars

- Terminal Symbols, Alphabet and Strings


• The alphabet of L is represented by the Greek symbol Σ.
• For example, Σ = {a, b, …, z, 0, 1, …, 9}.
• A string is a finite sequence of symbols, e.g. α = axy.
Programming Language Grammars
- Productions
• Also called a rewriting rule
A nonterminal symbol ::= String of T's and NT's (T = terminal symbol, NT = nonterminal symbol)
e.g.:
<Noun Phrase> ::= <Article> <Noun>
<Article> ::= a | an | the
<Noun> ::= boy | apple
Programming Language Grammars

- Grammar
• A grammar G of a language LG is a quadruple
(Σ, SNT, S, P) where
– Σ is the alphabet
– SNT is the set of NT’s
– S is the distinguished symbol
– P is the set of productions
Programming Language Grammars
- Derivation
• Let production P1 of grammar G be of the
form P1: A ::= α
and let β be a string such that β = γAθ.
• Then replacing A by α in β constitutes a derivation according to P1:
γAθ ⇒ γαθ
Programming Language Grammars
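- Reduction
• Reduction is the inverse of derivation. With P1: A ::= α as above, let σ be a string such that σ = γαθ; replacing α by A in σ constitutes a reduction according to P1, giving γAθ.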
Programming Language Grammars

- Example
<Sentence> ::= <Noun Phrase> <Verb Phrase>
<Noun Phrase> ::= <Article> <Noun>
<Verb Phrase> ::= <Verb> <Noun Phrase>
<Article> ::= a | an | the
<Noun> ::= boy | apple
<Verb> ::= ate
Programming Language Grammars
<Sentence>
<Noun Phrase> <Verb Phrase>
<Article> <Noun> <Verb Phrase>
<Article> <Noun> <Verb> <Noun Phrase>
the <Noun> <Verb> <Article> <Noun>
the boy <Verb> <Article> <Noun>
the boy ate <Article> <Noun>
the boy ate an <Noun>
the boy ate an apple
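The derivation above can be replayed mechanically. The sketch below is a minimal Python illustration, assuming a hypothetical encoding of the example grammar as a dictionary (keys are the NTs, values are lists of RHS alternatives) and a hypothetical `choices` list naming the alternative used at each step. Unlike the slide, which expands symbols in a convenient order, it always rewrites the leftmost NT; it reaches the same sentence.

```python
# Hypothetical encoding of the example grammar G = (Σ, SNT, S, P):
# the dict keys form SNT, the dict values form P, START is S,
# and every other symbol (a plain word) is a terminal in Σ.
GRAMMAR = {
    "<Sentence>": [["<Noun Phrase>", "<Verb Phrase>"]],
    "<Noun Phrase>": [["<Article>", "<Noun>"]],
    "<Verb Phrase>": [["<Verb>", "<Noun Phrase>"]],
    "<Article>": [["a"], ["an"], ["the"]],
    "<Noun>": [["boy"], ["apple"]],
    "<Verb>": [["ate"]],
}
START = "<Sentence>"


def is_nonterminal(symbol):
    return symbol in GRAMMAR


def leftmost_derivation(choices):
    """Rewrite the leftmost NT once per entry of `choices`.

    Each entry is the index of the RHS alternative to apply at that step.
    Returns every sentential form, starting with the distinguished symbol.
    """
    form = [START]
    forms = [form]
    for choice in choices:
        # Locate the leftmost NT A, so that form = gamma A theta.
        pos = next((i for i, s in enumerate(form) if is_nonterminal(s)), None)
        if pos is None:
            raise ValueError("no nonterminal left to rewrite")
        alpha = GRAMMAR[form[pos]][choice]
        # Derivation step: gamma A theta => gamma alpha theta.
        form = form[:pos] + alpha + form[pos + 1:]
        forms.append(form)
    return forms


# Usage: derive "the boy ate an apple".
for step in leftmost_derivation([0, 0, 2, 0, 0, 0, 0, 1, 1]):
    print(" ".join(step))
```

Each printed line is one sentential form; the last one contains only terminals and is therefore a sentence of the language.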
Classification of Grammars
• Venn diagram of grammar types (nested, each class containing the ones below it):
Type 0 – Phrase-structure Grammars
  Type 1 – Context-Sensitive
    Type 2 – Context-Free
      Type 3 – Regular
TYPE – 0 GRAMMARS
• These grammars, known as phrase structure
grammars, contain productions of the form
α ::= β
- Where both α and β can be strings of Ts and NTs.
• Such productions permit arbitrary substitution
of strings during derivation or reduction, hence they are not
relevant to the specification of programming languages.
TYPE – 1 GRAMMARS
• These grammars are known as context sensitive
grammars because their productions specify that
derivation or reduction of strings can take place only in
specific contexts.
• A Type – 1 production has the form
α A β ::= α π β
• Thus, a string π in a sentential form can be replaced by
'A' (or vice versa) only when it is enclosed by the strings α and β.
• These grammars are also not particularly relevant for
programming language (PL) specification, since recognition of PL
constructs is not context sensitive in nature.
TYPE – 2 GRAMMARS
• These grammars impose no context requirements on derivations
or reductions.
• A typical Type – 2 production is of the form
A ::= π
• Which can be applied independent of its context.
• These grammars are therefore known as context free grammars
(CFG).
• CFGs are ideally suited for programming language specification.
• The two best-known uses of Type – 2 grammars in PL specification are:
  – the ALGOL-60 specification
  – the Pascal specification.
TYPE – 3 GRAMMARS
• Type – 3 grammars (Regular Grammar) are characterized by
productions of the form
A ::= tB | t or A ::= Bt | t
• These productions also satisfy the requirements of Type – 2
grammars.
• The specific form of the RHS alternatives, namely a single T or a
string containing a single T and a single NT, gives some practical
advantages in scanning.
• The use of Type – 3 productions is restricted to the specification of
lexical units, e.g. identifiers, constants, labels, etc.
TYPE – 3 GRAMMARS
• The productions for <constant> and <identifier> in the
grammar are in fact Type – 3 in nature.
• For example, the production for <id> has the form Bt | t:
<id> ::= l | <id> l | <id> d
• where l and d stand for a letter and a digit respectively.
• Type – 3 grammars are also known as linear grammars
or regular grammars.
• These are further categorized into left-linear and right-linear
grammars depending on whether the NT in the RHS alternatives
appears at the extreme left or the extreme right.
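Because every RHS of a Type – 3 grammar contains at most one NT, the grammar maps directly onto a finite automaton, which is one of the practical advantages for scanning mentioned above. The sketch below hand-translates the <id> productions into a DFA in Python. The state names and the `is_identifier` helper are hypothetical, and it assumes that "letter" and "digit" mean ASCII letters and digits.

```python
# States of a DFA hand-derived from  <id> ::= l | <id> l | <id> d
START, IN_ID, REJECT = "start", "in_id", "reject"


def char_class(ch):
    """Map a character to the grammar's terminal classes: l, d, or other."""
    if ch.isascii() and ch.isalpha():
        return "l"
    if ch.isascii() and ch.isdigit():
        return "d"
    return "other"


# Transition table: (state, class) -> next state; missing entries reject.
TRANSITIONS = {
    (START, "l"): IN_ID,   # <id> ::= l
    (IN_ID, "l"): IN_ID,   # <id> ::= <id> l
    (IN_ID, "d"): IN_ID,   # <id> ::= <id> d
}
ACCEPTING = {IN_ID}


def is_identifier(text):
    state = START
    for ch in text:
        state = TRANSITIONS.get((state, char_class(ch)), REJECT)
        if state == REJECT:
            return False
    return state in ACCEPTING
```

For instance, `is_identifier("x1")` is true while `is_identifier("1x")` is false, because no production lets an <id> begin with a digit.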
Ambiguity In Grammatical Specification
• Ambiguity implies the possibility of different interpretations of a
source string.
• In natural language, ambiguity may concern the meaning or
syntax category of a word, or the syntactic structure of a construct.
• For example, a word can have multiple meanings or can be both a noun and a
verb, and a sentence can have multiple syntactic structures.
• Bank – river bank or financial bank
• Formal language grammars avoid ambiguity at the level of a lexical
unit or a syntax category.
• This is achieved by the simple rule that identical strings cannot
appear on the RHS of more than one production in the grammar.
Ambiguity In Grammatical Specification

• Existence of ambiguity at the level of the syntactic
structure of a string would mean that more than one
parse tree can be built for the string.
• Example:
<exp> ::= <id>|<exp>+<exp>|<exp>*<exp>
<id> ::= a|b|c
• Two parse trees exist for the source string a+b*c
according to this grammar:
  – one in which a+b is first reduced to <exp>
  – one in which b*c is first reduced to <exp>.
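The parse trees can be enumerated by trying each operator in the string as the one applied last (the root of the tree) and recursively parsing the operands on either side. The following Python sketch does this for the <exp> grammar above; the function name `parse_trees` and the bracketed output format are illustrative choices, and it assumes single-character tokens.

```python
from functools import lru_cache

IDS = {"a", "b", "c"}   # <id> ::= a | b | c
OPS = {"+", "*"}        # operators of <exp>+<exp> and <exp>*<exp>


@lru_cache(maxsize=None)
def parse_trees(tokens):
    """Return all parse trees of `tokens` (a tuple of one-char tokens) under
    <exp> ::= <id> | <exp>+<exp> | <exp>*<exp>, as bracketed strings."""
    trees = []
    if len(tokens) == 1 and tokens[0] in IDS:
        trees.append(f"<exp>[<id>[{tokens[0]}]]")        # <exp> ::= <id>
    # Try every operator position k as the root; both sides must be non-empty.
    for k in range(1, len(tokens) - 1):
        if tokens[k] in OPS:
            for left in parse_trees(tokens[:k]):
                for right in parse_trees(tokens[k + 1:]):
                    trees.append(f"<exp>[{left} {tokens[k]} {right}]")
    return tuple(trees)


for tree in parse_trees(tuple("a+b*c")):
    print(tree)
```

For a+b*c this yields exactly two trees, one per case listed above; with three operators, as in a+b*c+a, there are already five.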
Ambiguity In Grammatical Specification

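• A grammar is ambiguous if some strings are derived ambiguously.
• A string is derived ambiguously if it has more than one leftmost
derivation.
Typical example:
Production rule: E → 0 | 1 | E+E | EE
String: 01+1
Leftmost derivation: E ⇒ E+E ⇒ EE+E ⇒ 0E+E ⇒ 01+E ⇒ 01+1
Rightmost derivation: E ⇒ EE ⇒ EE+E ⇒ EE+1 ⇒ E1+1 ⇒ 01+1
• The two derivations begin with different productions (E → E+E versus E → EE), so they correspond to different parse trees. The second tree also has a leftmost derivation, E ⇒ EE ⇒ 0E ⇒ 0E+E ⇒ 01+E ⇒ 01+1, so 01+1 has two leftmost derivations and the grammar is ambiguous.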
Ambiguity In Grammatical Specification
• Ambiguity and Parse Trees
• The ambiguity of 01+1 is shown by the two different parse trees (bracketed form: each node is followed by its children in [ ]):
Parse Tree 1: E[ E[ E[0] E[1] ] + E[1] ]
Parse Tree 2: E[ E[0] E[ E[1] + E[1] ] ]
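Ambiguity In Grammatical Specification
• Example - I
S → AB | aaB
A → a | Aa
B → b
Check whether the given grammar is ambiguous.
String: aab
Parse Tree 1: S[ A[ A[a] a ] B[b] ]
Parse Tree 2: S[ a a B[b] ]
• Since aab has two parse trees, the grammar is ambiguous.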
Ambiguity In Grammatical Specification
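• Example - II
E → E+E
E → E*E
E → id
Check whether the given grammar is ambiguous.
String: id + id * id
Parse Tree 1: E[ E[id] + E[ E[id] * E[id] ] ]
Parse Tree 2: E[ E[ E[id] + E[id] ] * E[id] ]
• Since id + id * id has two parse trees, the grammar is ambiguous.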
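Both exercises can be checked by counting parse trees. The Python sketch below is a generic, memoized counter in the spirit of chart parsing (a swapped-in technique, not a method from the reference book): for each RHS it tries every way of splitting the input among the RHS symbols. It assumes no ε-productions and no cycles of unit productions (such as A → B → A), both of which hold for Examples I and II; the grammar encoding and function names are hypothetical.

```python
from functools import lru_cache


def count_parse_trees(grammar, start, tokens):
    """Count parse trees deriving `tokens` from `start`.

    `grammar` maps each NT to a list of RHS alternatives (lists of symbols);
    any symbol that is not a key is a terminal. Assumes no epsilon-productions
    and no unit-production cycles, so every symbol covers >= 1 token.
    """
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def derive(symbol, i, j):
        # Number of ways `symbol` derives tokens[i:j].
        if symbol not in grammar:                          # terminal
            return 1 if j == i + 1 and tokens[i] == symbol else 0
        return sum(split(tuple(rhs), 0, i, j) for rhs in grammar[symbol])

    @lru_cache(maxsize=None)
    def split(rhs, pos, i, j):
        # Number of ways rhs[pos:] derives tokens[i:j].
        if pos == len(rhs):
            return 1 if i == j else 0
        remaining = len(rhs) - pos - 1                     # symbols still to place
        return sum(derive(rhs[pos], i, k) * split(rhs, pos + 1, k, j)
                   for k in range(i + 1, j - remaining + 1))

    return derive(start, 0, len(tokens))


EXAMPLE_I = {"S": [["A", "B"], ["a", "a", "B"]], "A": [["a"], ["A", "a"]], "B": [["b"]]}
EXAMPLE_II = {"E": [["E", "+", "E"], ["E", "*", "E"], ["id"]]}
print(count_parse_trees(EXAMPLE_I, "S", ["a", "a", "b"]))                 # 2
print(count_parse_trees(EXAMPLE_II, "E", ["id", "+", "id", "*", "id"]))  # 2
```

A count greater than 1 for any string shows that the grammar is ambiguous; a count of 1 for one string, of course, proves nothing about the grammar as a whole.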
Scanning

Tokens, Patterns and Lexemes

• A token is a pair consisting of a token name and an optional token value.
• A pattern is a description of the form that the lexemes of a token may take.
• A lexeme is a sequence of characters in the source program that matches the pattern for a token.
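A scanner that produces such pairs can be sketched with Python's `re` module: each pattern becomes a named group, the group that matches supplies the token name, and the matched text is the lexeme. The token names and patterns below are hypothetical, chosen only for illustration. Note that this sketch picks the first pattern that matches at the current position, whereas LEX-style scanners prefer the longest match. Because `scan` is a generator, a parser can pull one token at a time with `next()`, mirroring the getNextToken interaction between scanner and parser.

```python
import re

# Hypothetical token specification: (token name, pattern).
# Order matters: the first alternative that matches at a position wins.
TOKEN_SPEC = [
    ("NUM", r"\d+"),
    ("ID", r"[A-Za-z][A-Za-z0-9]*"),
    ("OP", r"[+\-*/=]"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP", r"\s+"),          # whitespace: matched but not returned
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def scan(source):
    """Yield (token name, token value) pairs; here the value is the lexeme."""
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        if m.lastgroup != "SKIP":
            yield (m.lastgroup, m.group())
        pos = m.end()          # every pattern consumes >= 1 char, so this advances


print(list(scan("rate = rate + 60")))
```

Here the token value is simply the lexeme; a compiler might instead store a reference to the identifier's or constant's entry in the symbol table.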
