Lec05-CFG ContextFreeLanguages
• Context-Free Grammars
• Derivations
• Parse Trees
• Ambiguity
[Figure: nested sets. Regular Languages ⊂ Context-Free Languages ⊂ Languages]
S → ε
S → 0S1
Induction:
• Production S → 0S1 says that if w is in the language, then 0w1 is in the language.
• ε is in the language (by the production S → ε).
• Since ε is in the language, 01 is in the language.
• Since 01 is in the language, 0011 is in the language.
• …
• Thus, the language of this CFG is { 0^n 1^n | n ≥ 0 }.
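The induction above can be replayed mechanically. The following is a minimal sketch, assuming the grammar S → ε | 0S1; the helper names `derive` and `in_language` are hypothetical and only illustrate the two productions.

```python
def derive(n: int) -> str:
    """Apply S -> 0S1 exactly n times, then finish with S -> ε."""
    form = "S"
    for _ in range(n):
        form = form.replace("S", "0S1")  # inductive production S -> 0S1
    return form.replace("S", "")         # basis production S -> ε


def in_language(w: str) -> bool:
    """Membership test that mirrors the two productions."""
    if w == "":
        return True                      # S -> ε
    # S -> 0S1: w must look like 0 x 1 with x again in the language
    return len(w) >= 2 and w[0] == "0" and w[-1] == "1" and in_language(w[1:-1])


words = [derive(n) for n in range(4)]    # ε, 01, 0011, 000111
```

Each call `derive(n)` yields the string 0^n 1^n, and `in_language` rejects strings such as 0101 that cannot be peeled down to ε.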
• Sometimes, we use a shorthand for a list of productions with the same left side.
S → ε | 0 | 1 | 0S0 | 1S1
• One derivation step can replace any variable in the string with the right side (body) of
one of its productions.
Derivation Sequence (⇒*):
Basis:
– For any string α of terminals and variables, we say α ⇒* α.
– That is, any string derives itself.
Induction:
– If α ⇒* β and β ⇒ γ, then α ⇒* γ.
– That is, if α can become β by zero or more steps, and one more step takes β to γ,
then α can become γ by a derivation sequence.
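The one-step relation and its zero-or-more-steps closure can be sketched in code. This is only an illustration, assuming the grammar S → ε | 0S1 with uppercase letters as variables; `one_step` and `derives` are hypothetical names, and the search is bounded by `max_len` so that it always terminates.

```python
from collections import deque

# Assumed grammar S -> ε | 0S1; the empty string "" stands for ε.
GRAMMAR = {"S": ["", "0S1"]}


def one_step(form, grammar):
    """All strings obtained from `form` by replacing one variable with a body."""
    results = set()
    for i, symbol in enumerate(form):
        for body in grammar.get(symbol, []):
            results.add(form[:i] + body + form[i + 1:])
    return results


def derives(alpha, beta, grammar, max_len=10):
    """Bounded check of alpha =>* beta.

    Sentential forms longer than max_len are pruned. That is safe for this
    grammar (a form holds at most one S), but not for grammars in general.
    """
    seen, queue = {alpha}, deque([alpha])
    while queue:
        form = queue.popleft()
        if form == beta:                 # zero steps covers the basis case
            return True
        for nxt in one_step(form, grammar):
            if len(nxt) <= max_len and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False
```

For instance, `derives("S", "S", GRAMMAR)` holds by the basis, and `derives("0S1", "000S111", GRAMMAR)` holds by two applications of S → 0S1.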
BBM401 Automata Theory and Formal Languages 11
Derivation Sequence ⇒*
• In other words, α ⇒* β means that there is a sequence of strings γ1, γ2, …, γn for
some n ≥ 1 such that
1. α = γ1,
2. β = γn, and
3. γi ⇒ γi+1 for i = 1, 2, …, n−1.
• For example, with the grammar S → ε | 0S1, we have S ⇒* 000111 and also
– S ⇒* 000S111
– S ⇒* 00S11
– 0S1 ⇒* 000S111
– 00S11 ⇒* 000111
• We may select any non-terminal (variable) of the string for the replacement in each
derivation step.
• A leftmost derivation (written ⇒lm) always replaces the leftmost variable in the string
by one of its rule bodies.
S ⇒lm ASB ⇒lm aASB ⇒lm aSB ⇒lm acB ⇒lm acbB ⇒lm acb
• A rightmost derivation (written ⇒rm) always replaces the rightmost variable in the string
by one of its rule bodies.
S ⇒rm ASB ⇒rm ASbB ⇒rm ASb ⇒rm Acb ⇒rm aAcb ⇒rm acb
• If S ⇒*lm α, we say that α is a left-sentential form.
• If S ⇒*rm α, we say that α is a right-sentential form.
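The two derivations of acb can be reproduced step by step. The grammar itself is not shown here, so the sketch below assumes S → ASB | c, A → aA | ε, B → bB | ε, which is what the derivation steps suggest; `step` and `derivation` are hypothetical helper names.

```python
# Assumed grammar (inferred from the derivations of acb); "" stands for ε.
GRAMMAR = {"S": ["ASB", "c"], "A": ["aA", ""], "B": ["bB", ""]}


def step(form, body, leftmost=True):
    """Replace the leftmost (or rightmost) variable of `form` with `body`."""
    positions = [i for i, sym in enumerate(form) if sym in GRAMMAR]
    i = positions[0] if leftmost else positions[-1]
    if body not in GRAMMAR[form[i]]:
        raise ValueError(f"{form[i]} -> {body!r} is not a production")
    return form[:i] + body + form[i + 1:]


def derivation(bodies, leftmost=True):
    """List of sentential forms obtained by applying the bodies in order."""
    forms = ["S"]
    for body in bodies:
        forms.append(step(forms[-1], body, leftmost))
    return forms


lm = derivation(["ASB", "aA", "", "c", "bB", ""], leftmost=True)
rm = derivation(["ASB", "bB", "", "c", "aA", ""], leftmost=False)
```

Every string in `lm` is a left-sentential form and every string in `rm` is a right-sentential form; both lists end in acb.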
• The language of a CFG G, written L(G), is the set of strings of terminals (strings in T*) that are derivable from the start symbol S: L(G) = { w ∈ T* | S ⇒* w }.
• For each CFL, there is a CFG, and each CFG generates a CFL.
• Every regular language is a CFL.
– In fact, the regular languages are a proper subset of the context-free languages.
Proof:
In order to prove the equality L(Gpal) = Lpal, we prove two inclusions:
(Lpal ⊆ L(Gpal) direction): We have to prove that every member of Lpal is also a member of L(Gpal).
(L(Gpal) ⊆ Lpal direction): We have to prove that every member of L(Gpal) is also a member of Lpal.
Basis:
• |w| = 0 or |w| = 1.
• Then w is ε, 0, or 1.
• Since S → ε, S → 0 and S → 1 are productions of Gpal,
we can conclude that S ⇒* w in all base cases:
– S ⇒* ε
– S ⇒* 0
– S ⇒* 1
Induction:
• Suppose |w| = n+1 ≥ 2, and assume (IH) that S ⇒* x for every palindrome x shorter than w.
• Since w = w^R, we have w = 0x0 or w = 1x1, where x = x^R.
Case 1:
– If w = 0x0, by the IH we know that S ⇒* x.
– Then, by the structure of the grammar, S ⇒ 0S0 ⇒* 0x0, where 0x0 = w.
Case 2:
– If w = 1x1, by the IH we know that S ⇒* x.
– Then, by the structure of the grammar, S ⇒ 1S1 ⇒* 1x1, where 1x1 = w.
The Language of a CFG – A Proof Example
(L(Gpal) ⊆ Lpal direction)
Proof:
• We assume that w ∈ L(Gpal) and we must show that w = w^R.
• Since w ∈ L(Gpal), we have S ⇒* w.
• We prove this by induction on the length of the derivation sequence S ⇒* w.
Basis:
• The derivation S ⇒* w is done in one step.
• Then w must be ε, 0, or 1, which are all palindromes.
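A brute-force check is no substitute for the proof, but it can make the claimed equality concrete for short strings. The sketch below assumes Gpal is S → ε | 0 | 1 | 0S0 | 1S1 and Lpal is the set of palindromes over {0, 1}; the function names are hypothetical.

```python
from itertools import product


def generated(max_len):
    """Terminal strings of length <= max_len derivable from S in Gpal.

    Built bottom-up: ε, 0 and 1 come from the basis productions, and if x
    is generated then 0x0 and 1x1 are too (S -> 0S0 and S -> 1S1).
    """
    words = {w for w in ("", "0", "1") if len(w) <= max_len}
    frontier = set(words)
    while frontier:
        new = {c + x + c for x in frontier for c in "01"
               if len(x) + 2 <= max_len}
        frontier = new - words
        words |= frontier
    return words


def palindromes(max_len):
    """All strings w over {0, 1} with |w| <= max_len and w equal to its reverse."""
    return {"".join(p)
            for n in range(max_len + 1)
            for p in product("01", repeat=n)
            if p == p[::-1]}


same = generated(6) == palindromes(6)
```

Here `same` compares the two sets for all strings up to length 6, which exercises both inclusions on a finite sample.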
• If w ∈ L(G) for some CFG G, then w has a parse tree, which tells us the (syntactic)
structure of w.
– If G is unambiguous, w has exactly one parse tree.
– If G is ambiguous, w may have more than one parse tree.
– Ideally there should be only one parse tree for each string in the language. This means that
the grammar should be unambiguous.
• We can remove the ambiguity from some ambiguous grammars, obtaining unambiguous
grammars, by making certain assumptions (for example, about operator precedence and associativity).
• Unfortunately, some CFLs are inherently ambiguous: they can be defined only by
ambiguous grammars.
3. If an interior node is labeled by the variable A, and its children (from left to
right) are labeled X1, X2, …, Xk, then A → X1X2…Xk ∈ P.
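[Figure: parse tree of acb, built up step by step. The root S has children A, S, B; A has children a and A; the inner S has child c; B has children b and B. By the derivation of acb, the lower A and B each derive ε.]
• Yield Example: reading the leaves of this parse tree from left to right gives a c b = acb.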
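A parse tree and its yield can be represented directly. The sketch below assumes the grammar S → ASB | c, A → aA | ε, B → bB | ε (inferred from the acb example) and encodes a node as a pair (label, children); all names are hypothetical.

```python
# Assumed production set P; the body "ε" marks an ε-production.
P = {("S", "ASB"), ("S", "c"), ("A", "aA"), ("A", "ε"), ("B", "bB"), ("B", "ε")}


def leaf(label):
    """A leaf is a node with no children."""
    return (label, [])


# Parse tree of acb.
tree = ("S", [
    ("A", [leaf("a"), ("A", [leaf("ε")])]),
    ("S", [leaf("c")]),
    ("B", [leaf("b"), ("B", [leaf("ε")])]),
])


def tree_yield(node):
    """Concatenate the leaf labels from left to right (ε contributes nothing)."""
    label, children = node
    if not children:
        return "" if label == "ε" else label
    return "".join(tree_yield(child) for child in children)


def productions_used(node):
    """Collect (variable, body) pairs read off the interior nodes."""
    label, children = node
    if not children:
        return set()
    used = {(label, "".join(child[0] for child in children))}
    for child in children:
        used |= productions_used(child)
    return used


valid = productions_used(tree) <= P   # every interior node uses a production in P
```

`tree_yield(tree)` returns the yield of the tree, and `valid` checks the condition that each interior node with children X1…Xk corresponds to a production A → X1…Xk in P.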
• We will prove:
Part 1: If there is a parse tree with root labeled A and yield w, then A ⇒*lm w.
Part 2: If A ⇒*lm w, then there is a parse tree with root A and yield w.
IH: Part 1 holds for all trees of height < h.
• Thus, A ⇒lm X1…Xn ⇒*lm w1X2…Xn ⇒*lm w1w2X3…Xn ⇒*lm … ⇒*lm w1w2…wn,
where wi is the yield of the subtree rooted at Xi, so w1w2…wn = w.
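The construction in Part 1 can be sketched as a procedure that always expands the leftmost interior node of the tree. The example tree is the parse tree of acb under the assumed grammar S → ASB | c, A → aA | ε, B → bB | ε; `leftmost_derivation` and `render` are hypothetical names.

```python
def render(form):
    """Spell out a sentential form given as a list of subtrees."""
    return "".join(label for label, _ in form if label != "ε")


def leftmost_derivation(tree):
    """Read a leftmost derivation off a parse tree.

    A node is (label, children). Interior nodes (variables) have children,
    leaves (terminals or ε) have none.
    """
    form = [tree]                      # current sentential form as subtrees
    steps = [render(form)]
    while True:
        # index of the leftmost node that can still be expanded
        idx = next((i for i, (_, kids) in enumerate(form) if kids), None)
        if idx is None:
            return steps
        form[idx:idx + 1] = form[idx][1]   # replace the node by its children
        steps.append(render(form))


tree = ("S", [
    ("A", [("a", []), ("A", [("ε", [])])]),
    ("S", [("c", [])]),
    ("B", [("b", []), ("B", [("ε", [])])]),
])
steps = leftmost_derivation(tree)
```

The list `steps` starts at the root label and ends at the yield of the tree, with one entry per derivation step.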
Example:
S → SaS | b is an ambiguous grammar.
There are two parse trees for the string babab
Tree 1:
        S
      / | \
     S  a  S
   / | \   |
  S  a  S  b
  |     |
  b     b
Tree 2:
     S
   / | \
  S  a  S
  |   / | \
  b  S  a  S
     |     |
     b     b
Ambiguity, Leftmost and Rightmost Derivations
• If there are two different parse trees for a string in the language, they must produce
two different leftmost derivations for that string.
– Conversely, two different leftmost derivations of a string produce two different parse trees
for that string.
• Likewise for rightmost derivations.
• There are two leftmost derivation sequences for the string babab
1. S ⇒lm SaS ⇒lm SaSaS ⇒lm baSaS ⇒lm babaS ⇒lm babab
2. S ⇒lm SaS ⇒lm baS ⇒lm baSaS ⇒lm babaS ⇒lm babab
• There are two rightmost derivation sequences for the string babab
1. S ⇒rm SaS ⇒rm Sab ⇒rm SaSab ⇒rm Sabab ⇒rm babab
2. S ⇒rm SaS ⇒rm SaSaS ⇒rm SaSab ⇒rm Sabab ⇒rm babab
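Since parse trees and leftmost derivations correspond one to one, counting parse trees is a way to test a string for ambiguity. The following is a minimal sketch for the grammar S → SaS | b; `count_trees` is a hypothetical helper.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_trees(w: str) -> int:
    """Number of parse trees with root S and yield w, for S -> SaS | b."""
    total = 1 if w == "b" else 0            # production S -> b
    for i, ch in enumerate(w):              # production S -> SaS:
        if ch == "a":                       # try every 'a' as the middle symbol
            total += count_trees(w[:i]) * count_trees(w[i + 1:])
    return total


counts = {w: count_trees(w) for w in ("b", "bab", "babab", "bababab")}
```

A count above 1 means the string has more than one parse tree, and therefore more than one leftmost derivation, so the grammar is ambiguous.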
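Two parse trees for id+id*id, with their leftmost and rightmost derivations:
Tree 1:
        E
      / | \
     E  *  E
   / | \   |
  E  +  E  id
  |     |
  id    id
E ⇒lm E*E ⇒lm E+E*E ⇒lm id+E*E ⇒lm id+id*E ⇒lm id+id*id
E ⇒rm E*E ⇒rm E*id ⇒rm E+E*id ⇒rm E+id*id ⇒rm id+id*id
Tree 2:
     E
   / | \
  E  +  E
  |   / | \
  id E  *  E
     |     |
     id    id
E ⇒lm E+E ⇒lm id+E ⇒lm id+E*E ⇒lm id+id*E ⇒lm id+id*id
E ⇒rm E+E ⇒rm E+E*E ⇒rm E+E*id ⇒rm E+id*id ⇒rm id+id*id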
• Disambiguated grammar:
E → E+T | T
T → T*F | F
F → G^F | G
G → id | (E)
Parse tree for id+id*id:
        E
      / | \
     E  +  T
     |   / | \
     T  T  *  F
     |  |     |
     F  F     G
     |  |     |
     G  G     id
     |  |
     id id
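The effect of the disambiguated grammar can be seen with a small parser. The sketch below is a recursive-descent parser, which is a technique swapped in here for illustration: the left-recursive productions E → E+T and T → T*F are handled with loops, and the result is an abstract syntax tree (nested tuples), not the full parse tree. All function names are hypothetical.

```python
import re


def tokenize(text):
    """Split into the tokens id + * ^ ( ); other characters are skipped."""
    return re.findall(r"id|[+*^()]", text)


def parse(text):
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {peek()!r}")
        pos += 1

    def expr():                      # E -> E+T | T  (left-associative)
        node = term()
        while peek() == "+":
            eat("+")
            node = ("+", node, term())
        return node

    def term():                      # T -> T*F | F  (left-associative)
        node = factor()
        while peek() == "*":
            eat("*")
            node = ("*", node, factor())
        return node

    def factor():                    # F -> G^F | G  (right-associative)
        base = primary()
        if peek() == "^":
            eat("^")
            return ("^", base, factor())
        return base

    def primary():                   # G -> id | (E)
        if peek() == "(":
            eat("(")
            node = expr()
            eat(")")
            return node
        eat("id")
        return "id"

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected {peek()!r}")
    return tree
```

Each input has a single result: `parse("id+id*id")` groups the product first, and `parse("id^id^id")` groups to the right, matching the layering E, T, F, G of the grammar.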
A grammar for L = { a^n b^n c^m d^m | n, m ≥ 1 } ∪ { a^n b^m c^m d^n | n, m ≥ 1 } is
S → AB | C
A → aAb | ab
B → cBd | cd
C → aCd | aDd
D → bDc | bc
• { 0^n 1^m : n > m ≥ 0 }
S → 0S1 | 0A
A → ε | 0A
• The strings of 0’s and 1’s that contain an equal number of 0’s and 1’s.
S → 0S1S | 1S0S | ε
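The grammar for equal numbers of 0's and 1's can be tested exhaustively on short strings. The sketch below assumes the productions S → 0S1S | 1S0S | ε and inputs over {0, 1}; `generates` is a hypothetical helper.

```python
from functools import lru_cache
from itertools import product


@lru_cache(maxsize=None)
def generates(w: str) -> bool:
    """True if S =>* w for S -> 0S1S | 1S0S | ε (w over the alphabet {0, 1})."""
    if w == "":
        return True                              # S -> ε
    other = "1" if w[0] == "0" else "0"
    # S -> 0S1S or S -> 1S0S: split w as c x c' y with x and y both generated
    return any(w[i] == other and generates(w[1:i]) and generates(w[i + 1:])
               for i in range(1, len(w)))


# Strings up to length 8 on which the grammar and the counting rule disagree.
mismatches = [w
              for n in range(9)
              for w in map("".join, product("01", repeat=n))
              if generates(w) != (w.count("0") == w.count("1"))]
```

An empty `mismatches` list means that, for every string of length at most 8, the grammar generates it exactly when it has as many 0's as 1's.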
• The strings of 0’s and 1’s that contain an even number of 0’s and an even number of 1’s.
S0 → 0S2 | 1S1 | ε
S1 → 0S3 | 1S0
S2 → 0S0 | 1S3
S3 → 0S1 | 1S2
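The last grammar is right-linear, so each production Si → cSj can be read as a transition of a finite automaton. The sketch below assumes S0 is the start variable and checks, for short strings, the reading that the grammar generates the strings with an even number of 0's and an even number of 1's; the table `NEXT` and the function `generates` are hypothetical names.

```python
from itertools import product

# Productions Si -> c Sj read as transitions (Si, c) -> Sj.
# S0 -> ε makes S0 the only variable where a derivation can stop.
NEXT = {
    ("S0", "0"): "S2", ("S0", "1"): "S1",
    ("S1", "0"): "S3", ("S1", "1"): "S0",
    ("S2", "0"): "S0", ("S2", "1"): "S3",
    ("S3", "0"): "S1", ("S3", "1"): "S2",
}


def generates(w: str) -> bool:
    """Follow the unique derivation of w from S0 (w over {0, 1})."""
    var = "S0"
    for c in w:
        var = NEXT[(var, c)]       # apply the production var -> c NEXT
    return var == "S0"             # finish with S0 -> ε


# Strings up to length 8 that contradict the even/even reading.
mismatches = [w
              for n in range(9)
              for w in map("".join, product("01", repeat=n))
              if generates(w) != (w.count("0") % 2 == 0 and w.count("1") % 2 == 0)]
```

Because every variable has exactly one body per terminal, the derivation of a string is unique when it exists, which is why a simple loop suffices here.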