Samir CFG
OF
COMPUTATION
Introduction
- {a^n b^n : n = 0, 1, 2, …}
- {w : w is a palindrome}
Context Free Grammar
The grammar with rules A -> 0A1, A -> B and B -> # can generate the string 000#111
as follows:
• A => 0A1
• => 00A11
• => 000A111
• => 000B111
• => 000#111
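• Formal definition:
• A context free grammar is a 4-tuple (V, ∑, R, S)
where
• V is a finite set called the variables.
• ∑ is a finite set, disjoint from V, called the terminals.
• R is a finite set of rules, each with a variable on the left and a string of variables and terminals on the right.
• S ∈ V is the start variable.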
Some terminology:
• Derivation:
• Let G = (V, ∑, R, S) be a context free grammar. If w1, w2, …, wn
are strings over V ∪ ∑ such that w1 => w2 => … => wn,
then we say that w1 derives wn, written w1 =*> wn.
• The sequence of substitutions used to obtain wn from w1 is called a
derivation.
• Language of CFG (L(G)):
• If G = (V, ∑, R, S) is a CFG, then the language of G, denoted
by L(G), is the set of terminal strings that have derivations
from the start symbol.
• i.e. L(G) = { w ∈ ∑* : S =*> w }
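The definition of L(G) can be explored with a small sketch. It assumes the grammar A -> 0A1 | B, B -> # (the rules used in the derivation of 000#111), treats every key of RULES as a variable and every other character as a terminal, and assumes there are no ϵ-rules, so sentential forms never shrink and the search can be cut off by length. The names RULES, START and language are illustrative only.

```python
from collections import deque

# Assumed grammar: A -> 0A1 | B, B -> #
RULES = {"A": ["0A1", "B"], "B": ["#"]}
START = "A"

def language(rules, start, max_len):
    """Return every terminal string of length <= max_len derivable from start.

    Assumes the grammar has no epsilon-rules, so a sentential form longer
    than max_len can never lead to a string of length <= max_len.
    """
    found, seen, queue = set(), {start}, deque([start])
    while queue:
        form = queue.popleft()
        # Position of the leftmost variable, or None if only terminals remain.
        pos = next((i for i, s in enumerate(form) if s in rules), None)
        if pos is None:
            found.add(form)  # form is a terminal string, so it is in L(G)
            continue
        for rhs in rules[form[pos]]:
            new = form[:pos] + rhs + form[pos + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return found

print(sorted(language(RULES, START, 7), key=len))
```

Only leftmost derivations are expanded; that is enough here because every derivable terminal string also has a leftmost derivation.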
Context-Free Grammar:
Alternative Definition
Definition. A context-free grammar is a 4-tuple (∑, NT, R,
S), where:
Derivation Tree / Parse Tree:
• A parse tree is a tree representation of a string of terminals built from the
productions defined by the grammar.
• It shows pictorially how the start symbol of a grammar
derives a string in the language.
• It is an ordered tree that shows the essentials of a derivation.
• Formally, given a CFG G = (V, ∑, R, S), a parse tree has the
following properties:
• The root is labeled by the start symbol.
• Each interior node is labeled by a variable.
• Each leaf node is labeled by a terminal symbol or ϵ.
• If an interior node is labeled with variable A and its children are x1, x2, …, xn from
left to right, then there is a production
• A -> x1 x2 … xn, where each xi ∈ V ∪ ∑ (or the single child is ϵ, for a production A -> ϵ).
Derivation Tree / Parse Tree (example):
• e.g. consider a grammar where S -> aSa | a | b | ϵ
Now, for the string aabaa, we have
S => aSa
=> aaSaa
=> aabaa
Now, its parse tree is:
        S
      / | \
     a  S  a
      / | \
     a  S  a
        |
        b
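As a rough illustration, the parse tree for aabaa under S -> aSa | a | b | ϵ can be built by hand and checked against the parse tree properties. The Node class and the helper names tree_yield and respects are hypothetical, and ϵ is represented by the empty string.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str                                   # variable, terminal, or "" for epsilon
    children: list = field(default_factory=list)

def tree_yield(node):
    """Concatenate the leaf labels from left to right."""
    if not node.children:
        return node.label
    return "".join(tree_yield(child) for child in node.children)

def respects(node, rules):
    """Check that leaves are not variables and that every interior node
    together with its children matches one production of the grammar."""
    if not node.children:
        return node.label not in rules
    rhs = "".join(child.label for child in node.children)
    return rhs in rules.get(node.label, []) and all(
        respects(child, rules) for child in node.children
    )

RULES = {"S": ["aSa", "a", "b", ""]}

inner = Node("S", [Node("b")])
middle = Node("S", [Node("a"), inner, Node("a")])
root = Node("S", [Node("a"), middle, Node("a")])

print(tree_yield(root), respects(root, RULES))
```

Reading the leaves from left to right gives back the derived string, which is why the tree is said to show the essentials of the derivation.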
Derivation Trees
[Figure: several different derivation trees over the variables S, A, B and the terminals a, b; infinitely many others are possible.]
Leftmost and Rightmost Derivation:
• In a leftmost derivation, the leftmost variable is replaced
first at every step.
• In a rightmost derivation, the rightmost variable is replaced
first at every step.
• Consider a grammar:
S -> S + S
S -> S / S
S -> (S)
S -> S – S
S -> S x S
S -> a
Leftmost derivation of a + (a x a) / a – a:
S => S + S
=> a + S
=> a + S / S
=> a + (S) / S
=> a + (S x S) / S
=> a + (a x S) / S
=> a + (a x a) / S
=> a + (a x a) / S – S
=> a + (a x a) / a – S
=> a + (a x a) / a – a
Rightmost derivation of the same string:
S => S + S
=> S + S / S
=> S + S / S – S
=> S + S / S – a
=> S + S / a – a
=> S + (S) / a – a
=> S + (S x S) / a – a
=> S + (S x a) / a – a
=> S + (a x a) / a – a
=> a + (a x a) / a – a
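Parse tree
[Figure: parse tree for a + (a x a) / a – a. The root S has children S, + and S; the left child derives a, and the right child derives (a x a) / a – a.]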
Parse Tree
A parse tree of a derivation is a tree in which:
[Figure: a parse tree with interior nodes labeled S and leaves a, a, b, e.]
Parse Trees
S → A
S → B
S → AB
A → aA
B → bB
A → ε
B → ε
S → A | B    1) Delete the alternative B: B is useless because nothing is derivable from B.
S → ACD
A → Aa
A → a
A → aA       2) Delete either A → Aa or A → aA.
A → a        3) Delete one of the identical productions.
C → ε        4) Delete, and also replace S → ACD with S → AD.
D → dD
D → E        5) Replace with D → eAe.
E → eAe      6) Delete: E is useless after change #5.
F → ff       7) Delete: F is useless because it is not derivable from S.
Simplification of CFG:
• We must eliminate useless symbols.
• We must eliminate empty (ϵ) productions.
• We must eliminate unit productions.
• Eliminating useless symbols:
• Useless symbols are those variables or terminals that do not
appear in any derivation of a terminal string from the start symbol.
• For a grammar G = (V, ∑, R, S), a symbol y is useful if
there is some derivation of the form S =*> αyβ =*> w, where w is a terminal string.
• Eliminating useless symbols involves identifying
whether or not each symbol is 'generating' and
'reachable'.
• A symbol x is generating if x =*> w for some terminal string
w.
• A symbol x is reachable if S =*> αxβ for some strings α and β.
• Eliminating useless symbols:
• e.g. consider a grammar
S -> aB | bX
A -> Bad | bSX | a
B -> aSB | bBX
X -> SBd | aBX | ad
• Here, A and X can generate terminal strings directly, since A -> a and X -> ad, so A and X are generating symbols.
• S is also a generating symbol, since S -> bX and X generates a terminal string.
• But B cannot produce any terminal string, so it is non-generating. So, eliminate B and every production that contains it. Then we get
S -> bX
A -> bSX | a
X -> ad
• Again, A is unreachable, as there is no derivation of the form S =*> αAβ in
the grammar. So, we eliminate A; then we get
S -> bX
X -> ad
Hence, this is the simplified grammar.
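The two checks (generating, then reachable) can be sketched as fixed-point computations. This is a minimal illustration that assumes variables are single uppercase letters, terminals are lowercase letters, and a grammar is a dict from each variable to its list of right-hand sides; the function names are illustrative.

```python
def generating(rules):
    """Variables that derive at least one terminal string."""
    gen, changed = set(), True
    while changed:
        changed = False
        for var, bodies in rules.items():
            # A body works if every symbol is a terminal or already generating.
            if var not in gen and any(
                all(not s.isupper() or s in gen for s in body) for body in bodies
            ):
                gen.add(var)
                changed = True
    return gen

def reachable(rules, start):
    """Variables that occur in some sentential form derived from start."""
    seen, stack = {start}, [start]
    while stack:
        for body in rules.get(stack.pop(), []):
            for s in body:
                if s.isupper() and s not in seen:
                    seen.add(s)
                    stack.append(s)
    return seen

def remove_useless(rules, start):
    gen = generating(rules)
    # Step 1: drop non-generating variables and every production using them.
    step1 = {
        var: [b for b in bodies if all(not s.isupper() or s in gen for s in b)]
        for var, bodies in rules.items() if var in gen
    }
    # Step 2: drop variables that are unreachable in what remains.
    reach = reachable(step1, start)
    return {var: bodies for var, bodies in step1.items() if var in reach}

grammar = {
    "S": ["aB", "bX"],
    "A": ["Bad", "bSX", "a"],
    "B": ["aSB", "bBX"],
    "X": ["SBd", "aBX", "ad"],
}
print(remove_useless(grammar, "S"))
```

The order matters: removing non-generating symbols first can make further symbols unreachable, so the reachability pass is run on the already reduced grammar.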
• Eliminating empty (ϵ) productions:
• A grammar is said to have an empty production if there is a
production of the form X -> ϵ.
• Here, we have to discover the nullable variables.
• A variable X is nullable if X =*> ϵ.
• If X -> ϵ is a production to be eliminated, then we look for all
productions whose right side contains X and delete each
occurrence of X, in every combination, to obtain new non-empty productions.
• These resulting non-empty productions must be added to the
grammar to keep the generated language the same.
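The procedure above can be sketched as follows, assuming a grammar is a dict from single-letter variables to lists of right-hand sides and that the empty string "" stands for ϵ. The function names are illustrative. Note that if the start symbol itself is nullable, the resulting grammar no longer generates ϵ.

```python
from itertools import product

def nullable(rules):
    """Variables X with X =*> epsilon."""
    null, changed = set(), True
    while changed:
        changed = False
        for var, bodies in rules.items():
            # all() over an empty body is True, so X -> "" makes X nullable.
            if var not in null and any(all(s in null for s in b) for b in bodies):
                null.add(var)
                changed = True
    return null

def remove_epsilon(rules):
    null = nullable(rules)
    result = {}
    for var, bodies in rules.items():
        new_bodies = []
        for body in bodies:
            # Each nullable symbol may be kept or dropped; others are always kept.
            options = [(s, "") if s in null else (s,) for s in body]
            for combo in product(*options):
                candidate = "".join(combo)
                if candidate and candidate not in new_bodies:
                    new_bodies.append(candidate)
        result[var] = new_bodies
    return result

small = {"S": ["aA"], "A": ["b", ""]}
print(remove_epsilon(small))

larger = {"S": ["aAB", "AaB", "B"], "A": ["aA", ""], "B": ["ab", "bA"]}
print(remove_epsilon(larger))
```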
Eliminating empty (ϵ) productions:
• e.g.
S -> aA
A -> b | ϵ
• Here, A -> ϵ, so A is nullable.
• Then, find all productions whose right side contains A. We
get S -> aA.
• Now, replacing A with ϵ, we get S -> a.
• Then add this resulting non-empty production to the grammar; finally we get
S -> aA | a
A -> b
• e.g.
S -> ABAC
A -> aA | ϵ
B -> bB | ϵ
C -> …
Eliminating unit productions:
• A unit production is a production of the form X -> Y, where
X and Y are both variables.
• Algorithm:
while (there exists a unit production X -> Y)
{
1. select a unit production X -> Y such that
there exists a production Y -> z, where z is not a single variable.
2. for (every non-unit production Y -> z)
add production X -> z to the grammar.
3. remove X -> Y from the grammar.
}
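A possible implementation is sketched below. It is a closure-based variant rather than a literal transcription of the loop: for each variable it first collects every variable reachable through unit productions alone, then copies over their non-unit productions. It assumes single-letter variables that all appear as keys of the dict; the name remove_units is illustrative.

```python
def remove_units(rules):
    """Replace unit productions X -> Y by the non-unit productions
    reachable from X through chains of unit productions."""
    def is_unit(body):
        return len(body) == 1 and body in rules   # body is a single variable

    result = {}
    for var in rules:
        # Variables reachable from var using unit productions only.
        closure, stack = {var}, [var]
        while stack:
            for body in rules[stack.pop()]:
                if is_unit(body) and body not in closure:
                    closure.add(body)
                    stack.append(body)
        bodies = []
        for other in sorted(closure):
            for body in rules[other]:
                if not is_unit(body) and body not in bodies:
                    bodies.append(body)
        result[var] = bodies
    return result

first = {"A": ["B"], "B": ["a", "c"]}
print(remove_units(first))

chain = {"S": ["AB"], "A": ["a"], "B": ["C", "b"],
         "C": ["D"], "D": ["E"], "E": ["a"]}
print(remove_units(chain))
```

Working with the closure avoids having to pick unit productions in a particular order and also copes with cycles such as X -> Y, Y -> X.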
Eliminating unit productions:
• e.g.
A -> B
B -> a | c
• Solution
We have the unit production A -> B. There exist non-unit
productions B -> a and B -> c.
Now add them to the grammar as productions of A and remove A -> B.
Finally we get,
A -> a | c
B -> a | c
• e.g.
S -> AB
A -> a
B -> C | b
C -> D
D -> E
E -> a
Chomsky Normal Form (CNF):
• A CFG G = (V, ∑, R, S) is said to be in Chomsky
Normal Form (CNF) if every production in G is
in one of the two forms:
• X -> YZ
• X -> a
Here, X, Y and Z are variables and a is a terminal.
• So, a grammar in CNF is one which should
not have:
• Empty productions
• Unit productions
• Useless symbols
• Chomsky Normal Form (CNF):
• e.g. Convert the following CFG into CNF
G = (V, ∑, R, S) where
V = {S, A, B}
∑ = {a, b}
R = {
S -> aAB | AaB | B
A -> aA | ϵ
B -> ab | bA
}
• Solution:
• To convert a CFG into CNF, we first have to
simplify the CFG.
• In the given grammar there are not any useless symbols.
• Chomsky Normal Form (CNF):
• Now, eliminating the ϵ production:
S -> aAB | AaB | aB | B
A -> aA | a
B -> ab | bA | b
• Again, eliminating the unit production S -> B:
S -> aAB | AaB | aB | ab | bA | b
A -> aA | a
B -> ab | bA | b
• Now, converting this simplified form to get CNF, with C -> a and D -> b:
S -> CAB | ACB | CB | CD | DA | b
C -> a
D -> b
A -> CA | a
B -> CD | DA | b
Chomsky Normal Form (CNF):
• Here, again S -> CAB and S -> ACB are not in the required form;
so, we introduce E -> CA and F -> AC:
S -> EB | FB | CB | CD | DA | b
E -> CA
F -> AC
C -> a
D -> b
A -> CA | a
B -> CD | DA | b
• e.g.
S -> AAC
A -> aAb | ϵ
C -> aC | a
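Whether a grammar is in CNF can be checked mechanically. The sketch below assumes single-letter variables that all appear as keys of the dict, and tests each production against the two allowed forms X -> YZ and X -> a; the name is_cnf is illustrative.

```python
def is_cnf(rules):
    """True if every production is X -> YZ (two variables) or X -> a (one terminal)."""
    def ok(body):
        if len(body) == 2:
            return all(symbol in rules for symbol in body)
        return len(body) == 1 and body not in rules
    return all(ok(body) for bodies in rules.values() for body in bodies)

# Intermediate grammar: S -> CAB and S -> ACB still have three symbols.
before = {
    "S": ["CAB", "ACB", "CB", "CD", "DA", "b"],
    "A": ["CA", "a"],
    "B": ["CD", "DA", "b"],
    "C": ["a"],
    "D": ["b"],
}

# Final grammar after introducing E -> CA and F -> AC.
after = {
    "S": ["EB", "FB", "CB", "CD", "DA", "b"],
    "E": ["CA"],
    "F": ["AC"],
    "A": ["CA", "a"],
    "B": ["CD", "DA", "b"],
    "C": ["a"],
    "D": ["b"],
}

print(is_cnf(before), is_cnf(after))
```

The check only inspects the shape of the productions; it does not verify that the two grammars generate the same language.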
Linear grammar
Left linear grammar
A left linear grammar is a linear grammar in
which the non-terminal symbol, if present, always
occurs at the left end of the right-hand side.
Here is a left linear grammar:
S → Aa
A → ab
Right linear grammar
A right linear grammar is a linear grammar in
which the non-terminal symbol, if present, always occurs at
the right end of the right-hand side.
Here is a right linear grammar:
S → abaA
A → ε
Left linear grammars are evil
Consider this rule from a left linear grammar:
A → Babc
Can that rule be used to recognize this string?
abbabc
We need to check the rule for B:
B → Cb | D
Now we need to check the rules for C and D.
This is very complicated.
Left linear grammars require complex parsers.
Right linear grammars are good
Convert left linear to right linear
Now we will see an algorithm for converting any
left linear grammar to its equivalent right linear
grammar.
left linear        right linear
S → Aa             S → abaA
A → ab             A → ε
Both grammars generate this language: {aba}
May need to make a new start symbol
Symbols used by the algorithm
Let S denote the start symbol
Let A, B denote non-terminal symbols
Let p denote zero or more terminal symbols
Let ε denote the empty string
Algorithm
1) If the left linear grammar has a rule S → p, then
make that a rule in the right linear grammar:
S → p
2) If the left linear grammar has a rule A → p, then
add the following rule to the right linear grammar:
S → pA
3) If the left linear grammar has a rule B → Ap, then
add the following rule to the right linear grammar:
A → pB
4) If the left linear grammar has a rule S → Ap,
then add the following rule to the right linear grammar:
A → p
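The four rules translate almost directly into code. In this sketch a left linear rule B → Ap is the tuple (B, A, p), a rule B → p is (B, None, p), and the output tuple (A, p, B) means A → pB, with None when no variable follows. The encoding and the name left_to_right are assumptions made for illustration, and the start symbol is assumed not to occur on any right-hand side (introduce a new start symbol first if it does).

```python
def left_to_right(rules, start):
    """Convert left linear rules (head, variable_or_None, terminals)
    into right linear rules (head, terminals, variable_or_None)."""
    out = []
    for head, var, p in rules:
        if head == start and var is None:
            out.append((start, p, None))   # rule 1: S -> p   stays S -> p
        elif var is None:
            out.append((start, p, head))   # rule 2: A -> p   gives S -> pA
        elif head != start:
            out.append((var, p, head))     # rule 3: B -> Ap  gives A -> pB
        else:
            out.append((var, p, None))     # rule 4: S -> Ap  gives A -> p
    return out

# S -> Aa, A -> ab
first = left_to_right([("S", "A", "a"), ("A", None, "ab")], start="S")
print(first)

# S0 -> S, S -> Ab, S -> Sb, A -> Aa, A -> a   (S0 is the new start symbol)
second = left_to_right(
    [("S0", "S", ""), ("S", "A", "b"), ("S", "S", "b"),
     ("A", "A", "a"), ("A", None, "a")],
    start="S0",
)
print(second)
```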
Convert this left linear grammar
S → Aa
A → ab
Right hand side has terminals
Rule 2 applied to A → ab gives S → abA.
left linear        right linear
S → Aa             S → abA
A → ab
Right hand side of S has non-terminal
Rule 4 applied to S → Aa gives A → a.
left linear        right linear
S → Aa             S → abA
A → ab             A → a
Equivalent! Both grammars generate {aba}.
Convert this left linear grammar
original grammar
S → Ab
S → Sb
A → Aa
A → a
The start symbol S appears on a right hand side, so make a new start symbol S0:
left linear
S0 → S
S → Ab
S → Sb
A → Aa
A → a
Right hand side has terminals
Rule 2 applied to A → a gives S0 → aA.
Rule 3 (B → Ap gives A → pB) applied to S → Ab, S → Sb and A → Aa gives A → bS, S → bS and A → aA.
Right hand side of start symbol has non terminal
Rule 4 applied to S0 → S gives S → ε.
left linear        right linear
S0 → S             S0 → aA
S → Ab             A → bS
S → Sb             A → aA
A → Aa             S → bS
A → a              S → ε
Will the algorithm always work?
Regular Grammars Generate Regular Languages
Theorem
{Languages generated by regular grammars} = {Regular languages}
Theorem - Part 1
{Languages generated by regular grammars} ⊆ {Regular languages}
Any regular grammar generates a regular language.
Theorem - Part 2
{Languages generated by regular grammars} ⊇ {Regular languages}
Any regular language is generated by a regular grammar.
Proof – Part 1
{Languages generated by regular grammars} ⊆ {Regular languages}
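Grammar G is right-linear
[Figure: example right-linear grammar G]
Construct an NFA M such that
every state is a grammar variable, plus one
special final state.
Add edges for each production:
[Figure: the NFA built up one production at a time, shown beside the grammar G]
In General
A right-linear grammar G
has variables V0, V1, V2, …
and productions of the form
Vi → a1 a2 … am Vj
or
Vi → a1 a2 … am
We construct the NFA M such that:
each variable Vi corresponds to a state, and there is one special final state.
For each production Vi → a1 a2 … am Vj, add a path of edges labeled a1, a2, …, am from Vi to Vj.
For each production Vi → a1 a2 … am, add a path of edges labeled a1, a2, …, am from Vi to the special final state.
[Figure: the resulting NFA]
It holds that: L(G) = L(M)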
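A minimal sketch of this construction: every variable becomes a state, one extra state plays the role of the special final state, and each production contributes a path of edges. The rule encoding (head, terminals, variable_or_None), the state name "<final>" and the intermediate states ("mid", n) are assumptions made for illustration; a production with no terminals becomes an ε-move.

```python
from itertools import count

FINAL = "<final>"   # hypothetical name for the special final state

def grammar_to_nfa(rules):
    """Return transitions {(state, symbol): set_of_states}; symbol "" is an epsilon-move."""
    delta, fresh = {}, count()
    for head, word, var in rules:
        target = var if var is not None else FINAL
        state = head
        # Spell the terminals one edge at a time through fresh intermediate states.
        for i, ch in enumerate(word):
            nxt = target if i == len(word) - 1 else ("mid", next(fresh))
            delta.setdefault((state, ch), set()).add(nxt)
            state = nxt
        if not word:
            delta.setdefault((head, ""), set()).add(target)
    return delta

def accepts(delta, start, text):
    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            for nxt in delta.get((stack.pop(), ""), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen
    current = closure({start})
    for ch in text:
        current = closure({n for s in current for n in delta.get((s, ch), ())})
    return FINAL in current

# S0 -> aA, A -> bS | aA, S -> bS | ε
nfa = grammar_to_nfa([
    ("S0", "a", "A"), ("A", "b", "S"), ("A", "a", "A"),
    ("S", "b", "S"), ("S", "", None),
])
print([w for w in ["ab", "aabbb", "a", "ba", ""] if accepts(nfa, "S0", w)])
```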
The case of Left-Linear Grammars
Let G be a left-linear grammar.
Proof idea:
We will construct a right-linear
grammar G' with L(G) = L(G')^R.
Since G is a left-linear grammar,
the productions look like:
A → Bw
A → w
where A, B are variables and w is a string of terminals.
Construct right-linear grammar G':
In G: A → Bw
In G': A → w^R B
Construct right-linear grammar G':
In G: A → w
In G': A → w^R
It is easy to see that: L(G) = L(G')^R
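Any regular language L is generated
by some regular grammar G.
Proof idea:
Let M be the NFA with L = L(M).
[Figure: example NFA M]
Convert M to a right-linear grammar:
[Figure: productions added one transition at a time]
In General
For any transition from state q to state p on symbol a,
add the production q → a p.
For any final state qf,
add the production qf → λ.
Since G is a right-linear grammar,
G is also a regular grammar with L(G) = L(M) = L.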
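The conversion can be sketched in a few lines, under the assumption that each transition yields a production q → a p and each final state yields q → λ. Productions are encoded as (head, symbol, next_state_or_None) with "" for λ, the NFA is assumed to have no ε-moves, and the names nfa_to_grammar and derives are illustrative. The example automaton for a⁺b⁺ is invented for the demonstration.

```python
def nfa_to_grammar(delta, finals):
    """delta: {(state, symbol): set_of_states}. States double as grammar variables."""
    rules = [(q, a, p)
             for (q, a), targets in sorted(delta.items())
             for p in sorted(targets)]
    rules += [(q, "", None) for q in sorted(finals)]   # final state: q -> lambda
    return rules

def derives(rules, head, text):
    """Check whether the variable `head` derives `text` in the right-linear grammar."""
    for h, a, nxt in rules:
        if h != head:
            continue
        if nxt is None:
            if text == a:
                return True
        elif a and text.startswith(a) and derives(rules, nxt, text[len(a):]):
            return True
    return False

# Hypothetical NFA accepting one or more a's followed by one or more b's.
delta = {
    ("q0", "a"): {"q1"},
    ("q1", "a"): {"q1"},
    ("q1", "b"): {"q2"},
    ("q2", "b"): {"q2"},
}
grammar = nfa_to_grammar(delta, finals={"q2"})
print(grammar)
```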
Reverse of a Regular Language
Theorem:
The reverse L^R of a regular
language L is a regular language.
Proof idea:
Construct an NFA that accepts L^R:
invert the transitions of the NFA
that accepts L.
Proof
Since L is regular,
there is an NFA that
accepts L.
[Figure: example NFA accepting L]
Invert Transitions
[Figure: the same NFA with every transition inverted]
The resulting machine accepts L^R,
so L^R is regular.
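A small sketch of the inversion follows. Besides inverting every transition, it assumes the usual completion of the construction: the old start state becomes the only final state, and a new start state (hypothetically named "new_start") reaches the old final states by ε-moves. The example automaton for a⁺b⁺ is invented for the demonstration.

```python
def reverse_nfa(delta, start, finals):
    """Return (transitions, start, finals) of an NFA for the reversed language."""
    rev = {}
    for (q, a), targets in delta.items():
        for p in targets:
            rev.setdefault((p, a), set()).add(q)      # invert q --a--> p
    new_start = "new_start"                            # hypothetical state name
    rev[(new_start, "")] = set(finals)                 # epsilon-moves to old finals
    return rev, new_start, {start}

def accepts(delta, start, finals, text):
    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            for nxt in delta.get((stack.pop(), ""), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen
    current = closure({start})
    for ch in text:
        current = closure({n for s in current for n in delta.get((s, ch), ())})
    return bool(current & finals)

# Hypothetical NFA accepting one or more a's followed by one or more b's.
delta = {
    ("q0", "a"): {"q1"},
    ("q1", "a"): {"q1"},
    ("q1", "b"): {"q2"},
    ("q2", "b"): {"q2"},
}
rev, rev_start, rev_finals = reverse_nfa(delta, "q0", {"q2"})
print(accepts(delta, "q0", {"q2"}, "aab"), accepts(rev, rev_start, rev_finals, "baa"))
```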
• Left Recursion:
• A production of a grammar is said to have left recursion if
the leftmost variable of its RHS is the same as the variable of its LHS.
• e.g. S -> Sa | ϵ
• Elimination of left recursion
• Left recursion is eliminated by converting the grammar into a
right recursive grammar.
• If we have A -> Aα | β,
then we replace it with
A -> βA’
A’ -> αA’ | ϵ
• e.g. Consider the following grammar and eliminate left recursion:
i) A -> ABd | Aa | a
B -> Be | b
Solution:
A -> aA’
A’ -> BdA’ | aA’ | ϵ
B -> bB’
B’ -> eB’ | ϵ
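The rewriting rule A -> Aα | β to A -> βA', A' -> αA' | ϵ can be sketched for immediate left recursion only. Right-hand sides are lists of tokens so that the primed variable is a single symbol, the empty list stands for ϵ, and the name remove_left_recursion is illustrative.

```python
def remove_left_recursion(var, bodies):
    """Eliminate immediate left recursion from the productions of one variable."""
    new = var + "'"
    alphas = [b[1:] for b in bodies if b and b[0] == var]    # A -> A alpha
    betas = [b for b in bodies if not b or b[0] != var]      # A -> beta
    if not alphas:
        return {var: bodies}                                  # nothing to do
    return {
        var: [beta + [new] for beta in betas],
        new: [alpha + [new] for alpha in alphas] + [[]],      # [] stands for epsilon
    }

# A -> ABd | Aa | a
print(remove_left_recursion("A", [["A", "B", "d"], ["A", "a"], ["a"]]))
# B -> Be | b
print(remove_left_recursion("B", [["B", "e"], ["b"]]))
```

Indirect left recursion (for example A -> Bx, B -> Ay) is not handled by this sketch.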
• Greibach Normal Form (GNF):
• A CFG G = (V, ∑, R, S) is said to be in Greibach Normal Form
(GNF) if all the productions of the grammar are of the form
A -> xα, where x is a terminal and α is a string of zero or more
variables.
• e.g. convert the following CFG into GNF:
S -> AB | BC
A -> aB | bA | a
B -> bC | cC | b
C -> c
Solution:
Here the production S -> AB | BC is not in the
required form. Now, applying the substitution
rule to the leading variable:
S -> aBB | bAB | aB | bCC | cCC | bC
A -> aB | bA | a
B -> bC | cC | b
C -> c
• e.g.
S -> abSb | aa
Unrestricted Grammars
An unrestricted grammar has essentially no
restrictions on the form of its productions:
• Any variables and terminals on the left side, in any order
• Any variables and terminals on the right side, in any order
• The only restriction is that λ is not allowed as the left side
of a production
A sample unrestricted grammar has productions
S → S1B
S1 → aS1b
bB → bbbB
aS1b → aa
B→λ
Context-Sensitive Grammars
• In a context-sensitive grammar, the only
restriction is that, for any production, length
of the right side is at least as large as the
length of the left side
• Example introduces a sample context-sensitive
grammar with productions
S → abc | aAbc
Ab → bA
Ac → Bbcc
bB → Bb
aB → aa | aaA
Characteristics of Context-Sensitive Grammars
• An important characteristic of context-sensitive grammars is
that they are noncontracting, in the sense that in any
derivation, the length of successive sentential forms can
never decrease
• These grammars are called context-sensitive because it is
possible to specify that variables may only be replaced in
certain contexts
• For instance, in the grammar of Example ,
S → abc | aAbc
Ab → bA
Ac → Bbcc
bB → Bb
aB → aa | aaA
variable A can only be replaced if it is followed by either b or c
Context-Sensitive Languages
• A language L is context-sensitive if there is a context-sensitive
grammar G, such that either L = L(G) or L = L(G) ∪ { λ }
• The empty string is included, because by definition, a context-
sensitive grammar can never generate a language containing
the empty string
• As a result, it can be concluded that the family of context-free
languages is a subset of the family of context-sensitive
languages
• The language { a^n b^n c^n : n ≥ 1 } is context-sensitive, since it is
generated by the grammar in Example
S → abc | aAbc
Ab → bA
Ac → Bbcc
bB → Bb
aB → aa | aaA
Derivation of Strings Using a Context-Sensitive Grammar
Using the grammar in Example , we derive the string
aabbcc
S ⇒ aAbc
⇒ abAc
⇒ abBbcc
⇒ aBbbcc
⇒ aabbcc
The variables A and B are effectively used as
messengers:
• an A is created on the left and travels to the right to the first c,
where it creates another b and c, as well as the variable B
• the newly created B is sent to the left in order to create the
corresponding a.
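The derivation of aabbcc can be replayed as plain string rewriting. The sketch below lists the productions of the sample grammar as (left side, right side) pairs, applies each chosen production to the leftmost occurrence of its left side, and also checks the noncontracting property. The names RULES and step are illustrative.

```python
RULES = [
    ("S", "abc"), ("S", "aAbc"),
    ("Ab", "bA"),
    ("Ac", "Bbcc"),
    ("bB", "Bb"),
    ("aB", "aa"), ("aB", "aaA"),
]

def step(form, lhs, rhs):
    """Rewrite the leftmost occurrence of lhs in form with rhs."""
    if (lhs, rhs) not in RULES or lhs not in form:
        raise ValueError(f"cannot apply {lhs} -> {rhs} to {form}")
    return form.replace(lhs, rhs, 1)

derivation = ["S"]
for lhs, rhs in [("S", "aAbc"), ("Ab", "bA"), ("Ac", "Bbcc"),
                 ("bB", "Bb"), ("aB", "aa")]:
    derivation.append(step(derivation[-1], lhs, rhs))

print(" => ".join(derivation))

# Noncontracting: no right side is shorter than its left side.
print(all(len(lhs) <= len(rhs) for lhs, rhs in RULES))
```

Because no production shortens the sentential form, the lengths along the derivation never decrease, which is the property that makes the grammar context-sensitive.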
The Chomsky Hierarchy
The linguist Noam Chomsky
summarized the relationship
between language families by
classifying them into four
language types, type 0 to type 3
This classification, which
became known as the Chomsky
Hierarchy, is illustrated in
Figure .
Type-3: Regular Languages, Regular Expressions, DFAs, Regular Grammars
Type-2: CFLs, CFGs, PDAs
Type-1: Context-Sensitive Grammars/Languages, Linear Bounded Automata
Type-0: Recursively Enumerable Languages, Semi-Thue Systems, TMs