Unit 3

dsa

Uploaded by nam861836
Copyright © All Rights Reserved

Unit 3.

Generative Grammars

Basic concepts of language theory

• Parsing, or syntactic analysis, is the process of analysing a string of symbols, in either a natural language or a computer language, according to the rules of a formal grammar.

• Formal language theory treats a language as a mathematical object.

• A language is just a set of strings (sentences). To define a language formally, we must formally define which strings the language admits.

Alphabet
Symbol
A physical entity that we shall not formally define; we shall rely on
intuition.
Alphabet
A finite, non-empty set of symbols
• We often use the symbol Σ (sigma) to denote an alphabet
• Examples of alphabets:
  • Binary: Σ = {0, 1}
  • All lowercase letters: Σ = {a, b, c, ..., z}
  • Alphanumeric: Σ = {a–z, A–Z, 0–9}
  • DNA molecule letters: Σ = {a, c, g, t} (adenine, cytosine, guanine, thymine)
  • The C character set
  • The KPL token set
Example of an alphabet: C character set

Type                  Characters
Lowercase letters     a–z
Uppercase letters     A–Z
Digits                0–9
Special characters    ~ ! # $ % ^ & * ( ) _ + | \ ' - = { } [ ] : " ; < > ? , . /
White spaces          Tab, newline, space
Token set of KPL

• Identifiers, numbers, character constants
• Keywords
PROGRAM, CONST, TYPE, VAR, PROCEDURE, FUNCTION, BEGIN, END,
ARRAY, OF, INTEGER, CHAR, CALL, IF, ELSE, WHILE, DO, FOR, TO
• Operators
:= (assignment), + (addition), - (subtraction), * (multiplication), / (division),
= (equality comparison), != (inequality comparison), > (greater than), < (less than),
>= (greater than or equal to), <= (less than or equal to)
• Separators
:   ;   (   )   ,   (.   .)   .
String (sentence)

• A string is a finite sequence of symbols chosen from some alphabet.

• The empty string is denoted ε.

• Examples of strings:
  – 1000010101111
  – A C program is a string of tokens
  – A human DNA pattern

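As an informal illustration of alphabets, strings and ε, the Python sketch below enumerates every string of length at most n over a finite alphabet and tests whether a given string is over Σ. It assumes each symbol is a single character, so ε is simply the empty Python string `""`; the function names `strings_up_to` and `is_string_over` are hypothetical.

```python
from itertools import product

def strings_up_to(alphabet, n):
    """Return every string over `alphabet` of length 0..n, shortest first.
    The single string of length 0 is ε, written "" in Python."""
    result = []
    for length in range(n + 1):
        # product(..., repeat=length) yields every sequence of `length` symbols
        for symbols in product(sorted(alphabet), repeat=length):
            result.append("".join(symbols))
    return result

def is_string_over(s, alphabet):
    """A string is over Σ exactly when each of its symbols belongs to Σ."""
    return all(ch in alphabet for ch in s)

binary = {"0", "1"}
print(strings_up_to(binary, 2))                 # ['', '0', '1', '00', '01', '10', '11']
print(is_string_over("1000010101111", binary))  # True
```

Sorting the alphabet only makes the output order deterministic; a set has no inherent order.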

Languages

A language over an alphabet Σ is a set of strings over Σ.

Examples of languages:

• The set of all words over {a, b}

• The set { aⁿ | n is a prime number }

• The programming language C: the set of syntactically correct C programs
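The prime-length language above is infinite, yet deciding membership in it is easy. A minimal sketch, assuming the language is taken over the one-symbol alphabet {a} and using trial division for the primality test (the names `is_prime` and `in_prime_language` are hypothetical):

```python
def is_prime(n):
    """Trial division; adequate for the small lengths used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def in_prime_language(w):
    """Membership test for L = { a^n | n is a prime number }."""
    return set(w) <= {"a"} and is_prime(len(w))

print([n for n in range(12) if in_prime_language("a" * n)])  # [2, 3, 5, 7, 11]
```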
Chomsky's Hierarchy
• Type-0 languages (recursively enumerable): e.g. instances of a problem
• Type-1 languages (context-sensitive): e.g. natural languages, DNA languages
• Type-2 languages (context-free): e.g. programming languages, natural languages
• Type-3 languages (regular): e.g. tokens of programming languages

[Figure: Chomsky's Hierarchy]
A grammar to generate real numbers in BNF

<real number>    ::= <sign><natural number>
                   | <sign><natural number>'.'<digit sequence>
                   | <sign>'.'<digit><digit sequence>
                   | <sign><real number>'E'<natural number>
<sign>           ::= ε | '+' | '-'
<natural number> ::= '0' | <nonzero digit><digit sequence>
<nonzero digit>  ::= '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
<digit sequence> ::= ε | <digit><digit sequence>
<digit>          ::= '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'

Context-Free Grammars (CFG)

A context-free grammar G has:

• A set of terminal symbols, Σ
• A set of nonterminal symbols (variables), V
• A start symbol, S, which is a member of V
• A set R of production rules of the form A → w, where A is a nonterminal and w is a string of terminal and nonterminal symbols, or ε.

Formal definition of a context-free grammar

A context-free grammar is a 4-tuple (V, Σ, R, S), where

1) V is a finite set called the variables (or nonterminals),
2) Σ is a finite set, disjoint from V, called the terminals,
3) R is a finite set of rules, each consisting of a variable and a string of variables and terminals (a rule has the form A → α, where A ∈ V and α ∈ (V ∪ Σ)*),
4) S ∈ V is the start variable.

Conventions:
• Variables are represented by uppercase letters.
• Terminals are represented by lowercase letters, digits or signs.
• A → α and A → β can be combined as A → α | β.
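The 4-tuple can be mirrored directly in code. The sketch below is one possible encoding, not a standard one: symbols are single characters, a rule A → w is the pair `(A, w)`, ε is `""`, and the class `CFG` with its `check` method are hypothetical names. `check` tests the conditions of the formal definition and reports any violations.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CFG:
    """A context-free grammar (V, Σ, R, S) with single-character symbols."""
    variables: frozenset   # V
    terminals: frozenset   # Σ
    rules: tuple           # R: pairs (A, w), w a string over V ∪ Σ, "" for ε
    start: str             # S

    def check(self):
        """Return a list of violations of the formal definition (empty if valid)."""
        problems = []
        if self.variables & self.terminals:
            problems.append("V and Σ are not disjoint")
        if self.start not in self.variables:
            problems.append("start symbol is not in V")
        symbols = self.variables | self.terminals
        for lhs, rhs in self.rules:
            if lhs not in self.variables:
                problems.append(f"left-hand side {lhs!r} is not a variable")
            if any(ch not in symbols for ch in rhs):
                problems.append(f"right-hand side {rhs!r} is not in (V ∪ Σ)*")
        return problems

# Nested parentheses: V = {S}, Σ = {(, )}, R = {S → (S), S → SS, S → ε}
parens = CFG(frozenset("S"), frozenset("()"),
             (("S", "(S)"), ("S", "SS"), ("S", "")), "S")
print(parens.check())  # []
```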
Context-free grammar example
The grammar of decimal numbers
S → AB | AB.C | A.EC | ASeB
A → + | - | ε
B → 0 | DC
C → EC | ε
D → 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
E → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Here,
V = {S, A, B, C, D, E}, where
• S is <real number>
• A is <sign>
• B is <natural number>
• C is <digit sequence>
• D is <nonzero digit>
• E is <digit>
Σ = {+, -, ., e, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
S is the start variable
Context-Free Grammar Examples

• Grammar of nested parentheses

• G = (V, Σ, R, S), where

V = {S}
Σ = {(, )}
R = {S → (S), S → SS, S → ε}
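The language of this grammar is the set of balanced strings of parentheses, so membership can be decided with a depth counter rather than a search over derivations. A minimal sketch (the function name `in_nested_parens` is hypothetical):

```python
def in_nested_parens(w):
    """Decide whether w is generated by S → (S) | SS | ε,
    i.e. whether w is a balanced string of parentheses."""
    depth = 0
    for ch in w:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:      # a ')' with no matching '('
                return False
        else:
            return False       # a symbol outside Σ = {(, )}
    return depth == 0          # every '(' has been closed

for w in ["", "()", "(()())", "(()", ")("]:
    print(repr(w), in_nested_parens(w))
```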

How can a string of a context-free language be generated?
A context-free grammar can be used to generate strings in the corresponding language as follows:

let X = the start symbol S
while there is some nonterminal Y in X do
    apply any one production rule for Y, e.g. Y → w, replacing that occurrence of Y in X by w
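The generation loop translates almost line for line into Python. In this sketch the grammar is a dictionary from single-character nonterminals to lists of right-hand sides, and both the nonterminal occurrence and the rule are chosen at random. The step limit `max_steps`, after which the shortest right-hand side is always chosen, is an added assumption that guarantees termination for the parentheses grammar; it is not part of the loop in the notes, and `generate` and `PAREN_RULES` are hypothetical names.

```python
import random

# Nested parentheses: S → (S) | SS | ε, with ε written as "".
PAREN_RULES = {"S": ["(S)", "SS", ""]}

def generate(rules, start="S", max_steps=20, rng=None):
    """Run the generation loop and return the final string of terminals.
    After max_steps steps the shortest right-hand side is always used;
    for PAREN_RULES that is ε, so every later step removes one S."""
    rng = rng or random.Random()
    x = start
    steps = 0
    while True:
        positions = [i for i, ch in enumerate(x) if ch in rules]
        if not positions:
            return x                      # only terminals left: a sentence of L(G)
        i = rng.choice(positions)         # any nonterminal may be rewritten
        alternatives = rules[x[i]]
        if steps < max_steps:
            rhs = rng.choice(alternatives)
        else:
            rhs = min(alternatives, key=len)
        x = x[:i] + rhs + x[i + 1:]       # one derivation step
        steps += 1

print(generate(PAREN_RULES, rng=random.Random(1)))
```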

Derivations

• When X consists only of terminal symbols, it is a string of the language denoted by the grammar.

• Each iteration of the loop is a derivation step.

• If an iteration has several nonterminals to choose from, the rules of derivation allow any of them to be rewritten.

• Example: S ⇒ -A ⇒ -B.B ⇒ -B.C ⇒ -C.C ⇒ -1.C ⇒ -1.5

Leftmost and Rightmost Derivations

• In practice, parsing algorithms tend to always choose either the leftmost or the rightmost nonterminal, producing leftmost derivations or rightmost derivations respectively.
• Example:
Leftmost derivation:
S ⇒ -A ⇒ -B.B ⇒ -C.B ⇒ -1.B ⇒ -1.C ⇒ -1.5
Rightmost derivation:
S ⇒ -A ⇒ -B.B ⇒ -B.C ⇒ -B.5 ⇒ -C.5 ⇒ -1.5
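The two derivations differ only in which nonterminal each step rewrites. The sketch below replays a list of rules, applying each one to the leftmost or the rightmost nonterminal. The notes do not list the grammar behind this example; the rules S → -A, A → B.B, B → C, C → 0 | 1 | ... | 9 used here are a hypothetical reconstruction that is consistent with both derivations, and `derive` is a hypothetical name.

```python
def derive(start, steps, leftmost=True):
    """Apply rules (lhs, rhs) in order, each to the leftmost or rightmost
    nonterminal (uppercase letters). Returns every sentential form."""
    forms = [start]
    x = start
    for lhs, rhs in steps:
        positions = [i for i, ch in enumerate(x) if ch.isupper()]
        if not positions:
            raise ValueError(f"{x!r} has no nonterminal left")
        i = positions[0] if leftmost else positions[-1]
        if x[i] != lhs:
            raise ValueError(f"rule {lhs} -> {rhs} cannot rewrite {x[i]} in {x!r}")
        x = x[:i] + rhs + x[i + 1:]
        forms.append(x)
    return forms

# Hypothetical rules: S -> -A, A -> B.B, B -> C, C -> 0 | 1 | ... | 9
left = derive("S", [("S", "-A"), ("A", "B.B"), ("B", "C"),
                    ("C", "1"), ("B", "C"), ("C", "5")])
right = derive("S", [("S", "-A"), ("A", "B.B"), ("B", "C"),
                     ("C", "5"), ("B", "C"), ("C", "1")], leftmost=False)
print(" ⇒ ".join(left))   # S ⇒ -A ⇒ -B.B ⇒ -C.B ⇒ -1.B ⇒ -1.C ⇒ -1.5
print(" ⇒ ".join(right))  # S ⇒ -A ⇒ -B.B ⇒ -B.C ⇒ -B.5 ⇒ -C.5 ⇒ -1.5
```

Both derivations use the same rules and reach the same string; only the order in which the two digits are chosen differs.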

Derivation Tree (parse tree)

A derivation tree is constructed as follows:

1) Each tree vertex is a variable (nonterminal), a terminal, or ε.
2) The root vertex is S.
3) Interior vertices are from V; leaf vertices are from Σ or are ε.
4) An interior vertex A has children X1, X2, ..., Xk, in order from left to right, when there is a production in R of the form A → X1 X2 ... Xk.
5) A leaf can be ε only when there is a production A → ε, and then the leaf's parent A has this leaf as its only child.

[Figure: the parse tree of the string (()()) with the grammar S → (S), S → SS, S → ε]
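Conditions 3 to 5 can be checked mechanically, and reading the leaves from left to right (the yield of the tree) gives back the derived string. A sketch for the parentheses grammar; `Node`, `tree_yield` and `is_parse_tree` are hypothetical names, and ε leaves are labelled with the string "ε".

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str                                    # a variable, a terminal, or "ε"
    children: list = field(default_factory=list)  # empty for a leaf

VARIABLES = {"S"}
RULES = {("S", "(S)"), ("S", "SS"), ("S", "")}    # S → (S) | SS | ε

def tree_yield(node):
    """Read the leaves from left to right; ε leaves contribute nothing."""
    if not node.children:
        return "" if node.label == "ε" else node.label
    return "".join(tree_yield(child) for child in node.children)

def is_parse_tree(node):
    """Leaves are terminals or ε, each interior vertex matches a production,
    and an ε leaf is the only child of its parent."""
    if not node.children:
        return node.label not in VARIABLES
    labels = [child.label for child in node.children]
    if "ε" in labels and len(labels) > 1:
        return False
    rhs = "".join(label for label in labels if label != "ε")
    return (node.label, rhs) in RULES and all(is_parse_tree(c) for c in node.children)

def leaf(symbol):
    return Node(symbol)

def pair():
    """Subtree S → (S) whose inner S → ε, deriving "()"."""
    return Node("S", [leaf("("), Node("S", [leaf("ε")]), leaf(")")])

# Root S → (S); the inner S → SS; each of those S derives "()".
tree = Node("S", [leaf("("), Node("S", [pair(), pair()]), leaf(")")])
print(tree_yield(tree), is_parse_tree(tree))  # (()()) True
```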
A parse tree of a tiny grammar for English

S → NP VP
NP → D N
VP → V NP
D → the
N → chef
N → soup
V → cooks
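Because this grammar has no recursion, its language is finite and can be listed exhaustively. A small sketch that expands every alternative of every nonterminal, treating any symbol without rules as a terminal word (`GRAMMAR` and `sentences` are illustrative names):

```python
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["D", "N"]],
    "VP": [["V", "NP"]],
    "D": [["the"]],
    "N": [["chef"], ["soup"]],
    "V": [["cooks"]],
}

def sentences(symbol):
    """All word sequences derivable from `symbol` (terminates because
    the grammar is not recursive)."""
    if symbol not in GRAMMAR:            # a terminal word
        return [[symbol]]
    result = []
    for rhs in GRAMMAR[symbol]:
        partial = [[]]
        for part in rhs:                 # combine the expansions of each symbol
            partial = [p + s for p in partial for s in sentences(part)]
        result.extend(partial)
    return result

for words in sentences("S"):
    print(" ".join(words))
```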
Ambiguity

The grammar
E → E+E
E → E*E
E → (E)
E → ident

allows two different parse trees (equivalently, two different leftmost derivations) for strings such as

ident + ident * ident (e.g. x + y * z)

The grammar is therefore ambiguous.
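One way to see the ambiguity is to count parse trees. The sketch below counts, with memoisation, the ways a token list can be derived from E under the ambiguous grammar; the input is assumed to be already split into tokens, with every identifier written as `"ident"`, and `count_parses` is a hypothetical name. For ident + ident * ident it finds two trees, one grouping (x + y) * z and one grouping x + (y * z).

```python
from functools import lru_cache

def count_parses(tokens):
    """Count parse trees of `tokens` under E → E+E | E*E | (E) | ident."""
    toks = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        """Number of ways toks[i:j] can be derived from E."""
        n = 0
        if j - i == 1 and toks[i] == "ident":            # E → ident
            n += 1
        if j - i >= 3 and toks[i] == "(" and toks[j - 1] == ")":
            n += count(i + 1, j - 1)                     # E → (E)
        for k in range(i + 1, j - 1):                    # E → E op E, op at index k
            if toks[k] in ("+", "*"):
                n += count(i, k) * count(k + 1, j)
        return n

    return count(0, len(toks))

print(count_parses(["ident", "+", "ident", "*", "ident"]))  # 2
```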

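Disambiguation

E → E+T
E → T
T → T*F
T → F
F → (E)
F → ident

(obtained by adding the nonterminals T and F and their production rules, which force operator precedence: * binds more tightly than +, and the left-recursive rules make both operators left-associative)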
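Counting parse trees under the rewritten grammar illustrates the effect: with one counting function per nonterminal, ident + ident * ident now has exactly one parse tree, the one in which * is applied before +. A sketch, assuming the input is already split into tokens with each identifier written as `"ident"`; it checks particular inputs and is not a proof that the grammar is unambiguous, and `count_parses_unambiguous` is a hypothetical name.

```python
from functools import lru_cache

def count_parses_unambiguous(tokens):
    """Count parse trees under E → E+T | T, T → T*F | F, F → (E) | ident."""
    toks = tuple(tokens)

    @lru_cache(maxsize=None)
    def E(i, j):
        n = T(i, j)                                  # E → T
        for k in range(i + 1, j - 1):                # E → E + T
            if toks[k] == "+":
                n += E(i, k) * T(k + 1, j)
        return n

    @lru_cache(maxsize=None)
    def T(i, j):
        n = F(i, j)                                  # T → F
        for k in range(i + 1, j - 1):                # T → T * F
            if toks[k] == "*":
                n += T(i, k) * F(k + 1, j)
        return n

    @lru_cache(maxsize=None)
    def F(i, j):
        if j - i == 1 and toks[i] == "ident":        # F → ident
            return 1
        if j - i >= 3 and toks[i] == "(" and toks[j - 1] == ")":
            return E(i + 1, j - 1)                   # F → (E)
        return 0

    return E(0, len(toks))

print(count_parses_unambiguous(["ident", "+", "ident", "*", "ident"]))  # 1
```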

Recursion

• Direct recursion: X → ω1 X ω2
• A nonterminal X is recursive if X ⇒+ ω1 X ω2 (in one or more derivation steps).
• Recursion can be used to represent repetitions and nested structures.
• Left recursion: X → b | Xa
X ⇒ Xa ⇒ Xaa ⇒ Xaaa ⇒ ... ⇒ baaa...a
• Right recursion: X → b | aX
X ⇒ aX ⇒ aaX ⇒ aaaX ⇒ ... ⇒ aaa...ab
• Central recursion: X → b | (X)
X ⇒ (X) ⇒ ((X)) ⇒ (((X))) ⇒ ... ⇒ (((...(b)...)))
• Indirect recursion: X ⇒+ ω1 X ω2 through other nonterminals
Example:
X → b | Ya
Y → Xb
(X ⇒ Ya ⇒ Xba)
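Left recursion, direct or indirect, can be detected by following "can begin with" edges between nonterminals and looking for a cycle back to the starting nonterminal. A minimal sketch, assuming single-character symbols, the dictionary keys as the nonterminals, and no ε-productions (with ε-productions a nullable first symbol could expose a later one, which this sketch ignores); `left_recursive` is a hypothetical name.

```python
def left_recursive(rules):
    """Return the nonterminals X with X ⇒+ X w (direct or indirect left recursion).
    Assumes no ε-productions, so only the first symbol of a right-hand side matters."""
    # first[X]: nonterminals that can begin a right-hand side of X
    first = {x: {rhs[0] for rhs in alts if rhs and rhs[0] in rules}
             for x, alts in rules.items()}
    result = set()
    for x in rules:
        seen, stack = set(), list(first[x])
        while stack:                     # depth-first search for a path back to x
            y = stack.pop()
            if y == x:
                result.add(x)
                break
            if y not in seen:
                seen.add(y)
                stack.extend(first[y])
    return result

print(left_recursive({"X": ["b", "Xa"]}))                       # {'X'}
print(sorted(left_recursive({"X": ["b", "Ya"], "Y": ["Xb"]})))  # ['X', 'Y']
print(left_recursive({"X": ["b", "aX"]}))                       # set()
```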
Removing Left Recursion

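A standard textbook transformation removes direct left recursion: the rules A → Aα1 | ... | Aαm | β1 | ... | βn, where no βi begins with A, are replaced by A → β1A' | ... | βnA' and A' → α1A' | ... | αmA' | ε, using a new nonterminal A'. The sketch below applies this rule to a single nonterminal. It does not handle indirect left recursion, right-hand sides are lists of symbols with `[]` standing for ε, and `remove_direct_left_recursion` and `show` are hypothetical names.

```python
def remove_direct_left_recursion(nt, alternatives):
    """Rewrite A → A a1 | ... | A am | b1 | ... | bn as
       A → b1 A' | ... | bn A'   and   A' → a1 A' | ... | am A' | ε."""
    new_nt = nt + "'"
    alphas = [rhs[1:] for rhs in alternatives if rhs and rhs[0] == nt]
    betas = [rhs for rhs in alternatives if not rhs or rhs[0] != nt]
    if not alphas:
        return {nt: alternatives}                   # no direct left recursion
    return {
        nt: [beta + [new_nt] for beta in betas],
        new_nt: [alpha + [new_nt] for alpha in alphas] + [[]],   # [] is ε
    }

def show(rules):
    for lhs, alts in rules.items():
        print(lhs, "->", " | ".join(" ".join(rhs) if rhs else "ε" for rhs in alts))

show(remove_direct_left_recursion("E", [["E", "+", "T"], ["T"]]))
# E -> T E'
# E' -> + T E' | ε
```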
Example: Remove the left recursion

Original grammar:
E → E+T
E → T
T → T*F
T → F
F → (E)
F → ident

Add new symbol E':
E → TE'
E' → +TE' | ε
T → T*F
T → F
F → (E)
F → ident

Add new symbol T':
T → FT'
T' → *FT' | ε

Result:
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E)
F → ident
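Once left recursion is gone, each nonterminal can become one recursive function that inspects the next token to choose a rule (recursive-descent parsing). A minimal recognizer for the resulting grammar, assuming the input is already tokenised with identifiers written as `"ident"`; `parse_expression` and `ParseError` are hypothetical names.

```python
class ParseError(Exception):
    """Raised when the next token does not fit the grammar."""

def parse_expression(tokens):
    """Recursive-descent recognizer for
         E -> T E'      E' -> + T E' | ε
         T -> F T'      T' -> * F T' | ε
         F -> ( E ) | ident
    Returns True when the whole token list is a sentence E."""
    pos = 0  # index of the next unread token

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise ParseError(f"expected {tok!r} at position {pos}, found {peek()!r}")
        pos += 1

    def E():
        T()
        E_prime()

    def E_prime():
        if peek() == "+":      # E' -> + T E'
            expect("+")
            T()
            E_prime()
        # any other token: E' -> ε, consume nothing

    def T():
        F()
        T_prime()

    def T_prime():
        if peek() == "*":      # T' -> * F T'
            expect("*")
            F()
            T_prime()
        # any other token: T' -> ε

    def F():
        if peek() == "(":      # F -> ( E )
            expect("(")
            E()
            expect(")")
        else:                  # F -> ident
            expect("ident")

    try:
        E()
    except ParseError:
        return False
    return pos == len(tokens)  # reject trailing tokens

print(parse_expression(["ident", "+", "ident", "*", "ident"]))  # True
```

The original left-recursive rules could not be coded this way: a function for E → E+T would call itself before consuming any token and never terminate.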
