ToC Notes - Unit 2
A context-free grammar (CFG) is defined by the 4-tuple G = (V, T, P, S), where:
G is the grammar, which consists of a set of production rules. It is used to generate the strings
of a language.
T is the finite set of terminal symbols. Terminals are denoted by lowercase letters.
V is the finite set of non-terminal symbols. Non-terminals are denoted by capital letters.
P is the set of production rules, which are used for replacing non-terminal symbols (on the left side
of a production) in a string with other terminal or non-terminal symbols (on the right side of the
production).
S is the start symbol, which is used to derive strings. We derive a string by repeatedly replacing a
non-terminal by the right-hand side of one of its productions until all non-terminals have been
replaced by terminal symbols.
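To make the 4-tuple concrete, here is a minimal Python sketch (an illustration, not a standard API). It assumes P is stored as a dictionary from each non-terminal to its right-hand sides, treats every symbol that is not a key as a terminal, and uses the empty tuple for ε. The hypothetical function `generate` lists the short strings a grammar produces by repeatedly replacing the leftmost non-terminal; it also assumes the grammar cannot keep adding non-terminals forever without emitting terminals (for instance, it has no rule like S → SS).

```python
from collections import deque


def generate(productions, start, max_len):
    """Return the terminal strings of length <= max_len derivable from start.

    productions maps each non-terminal (V) to a list of right-hand sides (P);
    every symbol that is not a key is a terminal (T); () stands for ε.
    """
    results = set()
    queue = deque([(start,)])
    seen = {(start,)}
    while queue:
        form = queue.popleft()
        # Index of the leftmost non-terminal, or None if only terminals remain.
        idx = next((i for i, s in enumerate(form) if s in productions), None)
        if idx is None:
            results.add("".join(form))
            continue
        for body in productions[form[idx]]:
            new_form = form[:idx] + tuple(body) + form[idx + 1:]
            # Terminals are never rewritten, so a form with too many can be dropped.
            n_terminals = sum(1 for s in new_form if s not in productions)
            if n_terminals <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return sorted(results, key=lambda w: (len(w), w))


a_star = {"S": [("a", "S"), ()]}  # S → aS | ε
print(generate(a_star, "S", 3))  # ['', 'a', 'aa', 'aaa']
```

The same function can be pointed at any of the grammars in the examples that follow by writing their rules in this dictionary form.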
Example 1: Construct the CFG for the language having any number of a's over the set ∑ = {a}.
Solution: The regular expression for the above language is
r.e. = a*
The production rules for this language are:
S → aS rule 1
S → ε rule 2
Now if we want to derive the string "aaaaaa", we start with the start symbol.
1. S
2. aS rule 1
3. aaS rule 1
4. aaaS rule 1
5. aaaaS rule 1
6. aaaaaS rule 1
7. aaaaaaS rule 1
8. aaaaaaε rule 2
9. aaaaaa
The r.e. a* generates the set of strings {ε, a, aa, aaa, ...}. The grammar can also generate the null
string, because S is the start symbol and rule 2 gives S → ε.
Example 2: Construct a CFG for the regular expression (0+1)*.
Solution: The strings are combinations of 0's and 1's, so the rules for the start symbol add one 0 or
one 1 at a time. Since (0+1)* denotes {ε, 0, 1, 01, 10, 00, 11, ...} and ε belongs to this set, we also
need the rule S → ε. The production rules are:
S → 0S | 1S | ε
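Example 3: Construct a CFG for the language L = {wcw^R | w ∈ {a, b}*}, where w^R is the reverse of w.
Solution: The strings that can be generated for the given language are {aacaa, bcb, abcba, bacab,
abbcbba, ...}. The production rules are:
S → aSa rule 1
S → bSb rule 2
S → c rule 3
Now if we want to derive the string "abbcbba", we start with the start symbol.
S → aSa from rule 1
S → abSba from rule 2
S → abbSbba from rule 2
S → abbcbba from rule 3
Thus any string of this kind can be derived from the given production rules.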
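Example 4: Construct a CFG for the language L = {a^n b^2n | n ≥ 1}.
Solution: The strings that can be generated for the given language are {abb, aabbbb, aaabbbbbb, ...}.
The production rules are:
1. S → aSbb | abb
Now if we want to derive the string "aabbbb", we start with the start symbol.
1. S → aSbb
2. S → aabbbb (using S → abb)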
2.2 Derivation: A derivation is a sequence of production-rule applications that obtains the input
string from the start symbol. During parsing, we have to make two decisions. These are as follows:
• We have to decide which non-terminal is to be replaced.
• We have to decide the production rule by which the non-terminal will be replaced.
There are two common ways to decide which non-terminal to replace with a production rule.
2.2.1 Leftmost Derivation: In a leftmost derivation, the leftmost non-terminal of the current
sentential form is replaced at each step. So in a leftmost derivation, the input string is produced
from left to right.
Example:
Production rules:
1. E → E + E
2. E → E - E
3. E → a | b
Input: a - b + a
The leftmost derivation is:
1. E ⇒ E + E
2. ⇒ E - E + E
3. ⇒ a - E + E
4. ⇒ a - b + E
5. ⇒ a - b + a
2.2.2 Rightmost Derivation: In a rightmost derivation, the rightmost non-terminal of the current
sentential form is replaced at each step. So in a rightmost derivation, the input string is produced
from right to left.
Example
Production rules:
1. E → E + E
2. E → E - E
3. E → a | b
Input: a - b + a
The rightmost derivation is:
1. E ⇒ E - E
2. ⇒ E - E + E
3. ⇒ E - E + a
4. ⇒ E - b + a
5. ⇒ a - b + a
Whether we use a leftmost or a rightmost derivation, we may obtain the same string. The order in
which non-terminals are replaced does not affect which strings can be derived.
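The difference between the two orders can be checked mechanically. The Python sketch below uses a hypothetical function `derive` (not from any library): it applies a given list of productions, each time rewriting either the leftmost or the rightmost non-terminal, and returns the sentential forms. It raises an error if a production does not match the non-terminal that the chosen order says must be replaced; an empty body stands for ε.

```python
def derive(start, steps, nonterminals, leftmost=True):
    """Apply (head, body) productions in leftmost or rightmost order.

    Returns the list of sentential forms; an empty body () stands for ε.
    """
    form = [start]
    forms = [" ".join(form)]
    for head, body in steps:
        positions = [i for i, s in enumerate(form) if s in nonterminals]
        if not positions:
            raise ValueError("no non-terminal left to replace")
        i = positions[0] if leftmost else positions[-1]
        if form[i] != head:
            raise ValueError(f"the next non-terminal is {form[i]}, not {head}")
        form[i:i + 1] = list(body)  # replace the chosen non-terminal by the body
        forms.append(" ".join(form) if form else "ε")
    return forms


# The rightmost derivation of a - b + a from the example above.
rules = [("E", ("E", "-", "E")), ("E", ("E", "+", "E")),
         ("E", ("a",)), ("E", ("b",)), ("E", ("a",))]
for line in derive("E", rules, {"E"}, leftmost=False):
    print(line)
# Prints E, E - E, E - E + E, E - E + a, E - b + a, a - b + a (one per line).
```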
Examples of Derivation:
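Example 1: Derive the string "abb" by a leftmost derivation and a rightmost derivation using the CFG
given by,
1. S → AB | ε
2. A → aB
3. B → Sb
Solution:
Leftmost derivation:
1. S
2. AB S → AB
3. aBB A → aB
4. aSbB B → Sb
5. abB S → ε
6. abSb B → Sb
7. abb S → ε
Rightmost derivation:
1. S
2. AB S → AB
3. ASb B → Sb
4. Ab S → ε
5. aBb A → aB
6. aSbb B → Sb
7. abb S → ε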
Example 2: Derive the string "aabbabba" by a leftmost derivation and a rightmost derivation using
the CFG given by,
1. S → aB | bA
2. A → a | aS | bAA
3. B → b | bS | aBB
Solution:
Leftmost derivation:
1. S
2. aB S → aB
3. aaBB B → aBB
4. aabB B→b
5. aabbS B → bS
6. aabbaB S → aB
7. aabbabS B → bS
8. aabbabbA S → bA
9. aabbabba A → a
Rightmost derivation:
1. S
2. aB S → aB
3. aaBB B → aBB
4. aaBbS B → bS
5. aaBbbA S → bA
6. aaBbba A→a
7. aabSbba B → bS
8. aabbAbba S → bA
9. aabbabba A → a
2.3 Derivation Tree: A derivation tree is a graphical representation of a derivation that uses the
production rules of a given CFG. It is a simple way to show how a string is obtained from a given
set of production rules. The derivation tree is also called a parse tree.
Example 1:
Production rules:
1. E → E + E
2. E → E * E
3. E → a | b | c
Input: a * b + c
[Figure: derivation tree for a * b + c, built up in five steps (Step 1 to Step 5)]
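A derivation tree can be stored as nested nodes. The Python sketch below (the class `Node` and the function `leftmost_derivation` are hypothetical names) builds one parse tree for a * b + c under the rules above, assumed here to group * first (root E → E + E, left child E → E * E). It recovers the derived string by reading the leaves from left to right and lists the leftmost derivation that the tree corresponds to.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A parse-tree node: interior nodes are non-terminals, leaves are terminals."""
    symbol: str
    children: list = field(default_factory=list)

    def tree_yield(self):
        """Concatenate the leaves from left to right (the derived string)."""
        if not self.children:
            return self.symbol
        return "".join(child.tree_yield() for child in self.children)


def leftmost_derivation(root):
    """Read off the leftmost derivation that corresponds to a complete parse tree."""
    forms = []
    frontier = [root]
    while True:
        forms.append(" ".join(node.symbol for node in frontier))
        for i, node in enumerate(frontier):
            if node.children:  # leftmost node that has not been expanded yet
                frontier[i:i + 1] = node.children
                break
        else:
            return forms


tree = Node("E", [
    Node("E", [Node("E", [Node("a")]), Node("*"), Node("E", [Node("b")])]),
    Node("+"),
    Node("E", [Node("c")]),
])
print(tree.tree_yield())  # a*b+c
print(leftmost_derivation(tree))
# ['E', 'E + E', 'E * E + E', 'a * E + E', 'a * b + E', 'a * b + c']
```

Because a complete parse tree fixes every production used, expanding its leftmost unexpanded node at each step gives exactly one leftmost derivation.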
2.4 Sentential Form and Partial Derivation Tree: A partial derivation tree is a sub-tree of a
derivation tree (parse tree) such that, for each of its nodes, either all of its children are in the
sub-tree or none of them are.
Example
If in any CFG the productions are:
S → AB, A → aaA | ε, B → Bb | ε
the partial derivation tree can be the following:
[Figure: partial derivation tree for this grammar]
If a partial derivation tree contains the root S, its yield is called a sentential form. The yield of
the above sub-tree is also a sentential form.
2.5 Ambiguity in Grammar: A grammar is said to be ambiguous if there exists more than one leftmost
derivation, more than one rightmost derivation, or more than one parse tree for some input string.
If the grammar is not ambiguous, then it is called unambiguous.
An ambiguous grammar is not suitable for compiler construction. No general method can
automatically detect and remove ambiguity (deciding whether a CFG is ambiguous is undecidable),
but we can often remove ambiguity by rewriting the grammar.
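Example 1: Check whether the grammar G with the following production rules is ambiguous or not.
1. E → I
2. E → E + E
3. E → E * E
4. E → (E)
5. I → ε | 0 | 1 | 2 | ... | 9
Solution: For the string "3 * 2 + 5", the above grammar can generate two parse trees by leftmost
derivation:
Parse tree 1: the root uses E → E + E and its left E uses E → E * E, grouping the string as (3 * 2) + 5.
Parse tree 2: the root uses E → E * E and its right E uses E → E + E, grouping the string as 3 * (2 + 5).
Since there are two parse trees for a single string "3 * 2 + 5", the grammar G is ambiguous.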
Example 2: Check whether the grammar G with the following production rules is ambiguous or not.
1. E → E + E
2. E → E - E
3. E → id
Solution: From the above grammar, the string "id + id - id" can be derived in 2 ways:
First leftmost derivation:
1. E → E + E
2. → id + E
3. → id + E - E
4. → id + id - E
5. → id + id - id
Second leftmost derivation:
1. E → E - E
2. → E + E - E
3. → id + E - E
4. → id + id - E
5. → id + id - id
Since there are two leftmost derivations for a single string "id + id - id", the grammar G is
ambiguous.
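For one particular string, we can count its parse trees by brute force; a count greater than 1 shows that the grammar is ambiguous. The sketch below is a hypothetical helper, not a standard routine. It stores each non-terminal's right-hand sides as tuples of symbols and assumes the grammar has no ε-productions and no cycles of unit productions (such as E → E), so every symbol covers at least one token and the recursion terminates.

```python
from functools import lru_cache


def count_parse_trees(grammar, start, tokens):
    """Count the parse trees of `tokens` rooted at `start`.

    grammar maps each non-terminal to a list of right-hand sides (tuples of
    symbols); symbols that are not keys are terminals. Assumes no ε-productions
    and no unit-production cycles.
    """
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count_symbol(sym, i, j):
        # Number of ways a single symbol derives tokens[i:j].
        if sym not in grammar:
            return 1 if j - i == 1 and tokens[i] == sym else 0
        return sum(count_seq(rhs, i, j) for rhs in grammar[sym])

    @lru_cache(maxsize=None)
    def count_seq(seq, i, j):
        # Number of ways a sequence of symbols derives tokens[i:j].
        if not seq:
            return 1 if i == j else 0
        first, rest = seq[0], seq[1:]
        total = 0
        # `first` covers tokens[i:m]; each symbol of `rest` needs at least one token.
        for m in range(i + 1, j - len(rest) + 1):
            ways = count_symbol(first, i, m)
            if ways:
                total += ways * count_seq(rest, m, j)
        return total

    return count_symbol(start, 0, len(tokens))


ambiguous = {"E": [("E", "+", "E"), ("E", "-", "E"), ("id",)]}
print(count_parse_trees(ambiguous, "E", "id + id - id".split()))  # 2
```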
To convert an ambiguous grammar into an unambiguous grammar, we apply the following rules:
1. If left-associative operators (+, -, *, /) are used in the production rules, then apply left
recursion in those production rules. Left recursion means that the leftmost symbol on the right side
is the same as the non-terminal on the left side. For example,
X → Xa
2. If a right-associative operator (^) is used in the production rules, then apply right recursion in
those production rules. Right recursion means that the rightmost symbol on the right side is the same
as the non-terminal on the left side. For example,
X → aX
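The effect of these two rules can be seen in a small recursive-descent evaluator. The sketch below assumes a hypothetical grammar E → E + T | E - T | T, T → F ^ T | F, F → number, and the function names are illustrative. Recursive descent cannot follow a left-recursive rule directly, so the E-rule is written as the usual equivalent loop, which groups operands from the left; the right-recursive T-rule becomes a recursive call, which groups them from the right.

```python
import re


def evaluate(text):
    """Evaluate +, - (left-associative) and ^ (right-associative) on integers."""
    tokens = re.findall(r"\d+|[-+^]", text)
    pos = 0

    def power():
        # T -> F ^ T | F : right recursion, so a ^ b ^ c = a ^ (b ^ c)
        nonlocal pos
        base = int(tokens[pos])
        pos += 1
        if pos < len(tokens) and tokens[pos] == "^":
            pos += 1
            return base ** power()
        return base

    # E -> E + T | E - T | T : left recursion, written as a loop,
    # so a - b - c = (a - b) - c
    value = power()
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        op = tokens[pos]
        pos += 1
        rhs = power()
        value = value + rhs if op == "+" else value - rhs
    return value


print(evaluate("8 - 3 - 2"))  # 3, not 7
print(evaluate("2 ^ 3 ^ 2"))  # 512, not 64
```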
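Example 1: Consider the grammar G given as follows:
1. S → AB | aaB
2. A → a | Aa
3. B → b
Determine whether G is ambiguous. If it is, construct an equivalent unambiguous grammar.
Solution: The string "aab" has two parse trees: one uses S → aaB and B → b, the other uses S → AB
with A ⇒ Aa ⇒ aa and B → b. So G is ambiguous. Since A already generates every non-empty string of
a's, the strings produced by S → aaB are also produced by S → AB, and removing S → aaB gives the
unambiguous grammar:
1. S → AB
2. A → Aa | a
3. B → b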
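Example 2: Show that the given grammar is ambiguous. Also, find an equivalent unambiguous
grammar.
1. S → ABA
2. A → aA | ε
3. B → bB | ε
Solution: The given grammar is ambiguous because we can derive two different parse trees for the
string "aa": the first A can generate aa while the second A generates ε, or each A can generate one a.
The grammar generates the language a*b*a*. An unambiguous grammar must fix which block generates the
a's when no b is present; the grammar below assigns them all to the first block, and at every step
the next input symbol determines which production to use:
1. S → aS | bB | ε
2. B → bB | aC | ε
3. C → aC | ε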
2.6 Simplification of CFG: In a CFG, not all production rules and symbols may be needed for the
derivation of strings. Besides, there may be some null productions and unit productions. Eliminating
these productions and symbols is called simplification of CFGs. Simplification essentially comprises
the following steps:
• Reduction of CFG (removal of useless symbols)
• Removal of Unit Productions
• Removal of Null Productions
2.6.1 Removal of Useless Symbols:
A symbol is useless if it does not take part in the derivation of any string, either because it can
never be reached from the start symbol or because it can never derive a terminal string. Such a symbol
is known as a useless symbol. Similarly, a variable is useless if it does not take part in the
derivation of any string. Such a variable is known as a useless variable.
For Example:
In the above example, the variable 'C' will never occur in the derivation of any string, so the
production C → ad is useless. So we eliminate it, and the other productions are written in such a way
that variable C can never be reached from the start variable 'T'.
Production A → aA is also useless because there is no way to terminate it. If it never terminates,
then it can never produce a string. Hence this production can never take part in any derivation.
To remove this useless production A → aA, we first find all the variables that can never lead to a
terminal string, such as variable 'A'. Then we remove all the productions in which the variable 'A'
occurs.
Removal of Null Productions:
Productions of the form A → ε are called ε-productions (null productions). They can be removed
completely only from grammars that do not generate ε; if the language contains ε, the steps below
give a grammar for the same language without ε.
Step 1: First find all nullable non-terminals, i.e. the non-terminals that derive ε.
Step 2: For each production A → α, construct all productions A → x, where x is obtained from α by
removing one or more of the nullable non-terminals found in step 1.
Step 3: Now combine the result of step 2 with the original productions and remove the ε-productions.
Example: Remove the ε-productions from the following CFG while preserving its meaning.
1. S → XYX
2. X → 0X | ε
3. Y → 1Y | ε
Solution: While removing the ε-productions, we delete the rules X → ε and Y → ε. To preserve the
meaning of the CFG, we consider placing ε on the right-hand side wherever X and Y appear.
Let us take
S → XYX
If the first X on the right-hand side is ε, then
S → YX
Similarly, if the last X on the R.H.S. is ε, then
S → XY
If Y = ε, then
S → XX
If Y and one X are ε, then
S → X
If both X are replaced by ε, then
S → Y
Now, combining these with the original production,
S → XYX | XY | YX | XX | X | Y
Now let us consider
X → 0X
If we place ε at right-hand side for X then,
X→0
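So X → 0X | 0
Similarly, Y → 1Y | 1
Hence the grammar without ε-productions is:
1. S → XYX | XY | YX | XX | X | Y
2. X → 0X | 0
3. Y → 1Y | 1
Because S ⇒ XYX ⇒ ε in the original grammar, the new grammar generates the original language
without the empty string.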
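The three steps can be sketched in Python as follows. This is a minimal illustration with hypothetical function names; each non-terminal maps to a list of right-hand sides written as tuples of symbols, and the empty tuple stands for ε. As noted above, when the start symbol is nullable the result no longer generates ε.

```python
from itertools import product


def nullable_set(grammar):
    """Step 1: non-terminals that can derive ε (iterated to a fixed point)."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head not in nullable and any(all(s in nullable for s in b) for b in bodies):
                nullable.add(head)
                changed = True
    return nullable


def remove_null_productions(grammar):
    """Steps 2 and 3: add every variant with nullable symbols dropped, remove ε."""
    nullable = nullable_set(grammar)
    result = {}
    for head, bodies in grammar.items():
        new_bodies = []
        for body in bodies:
            # A nullable symbol may be kept or dropped; any other symbol is kept.
            options = [((s,), ()) if s in nullable else ((s,),) for s in body]
            for choice in product(*options):
                candidate = tuple(sym for part in choice for sym in part)
                if candidate and candidate not in new_bodies:
                    new_bodies.append(candidate)
        result[head] = new_bodies
    return result


g = {"S": [("X", "Y", "X")], "X": [("0", "X"), ()], "Y": [("1", "Y"), ()]}
for head, bodies in remove_null_productions(g).items():
    print(head, "→", " | ".join("".join(b) for b in bodies))
# S → XYX | XY | XX | X | YX | Y
# X → 0X | 0
# Y → 1Y | 1
```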
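Removal of Unit Productions:
Unit productions are productions in which a non-terminal gives exactly one other non-terminal
(A → B). Use the following steps to remove unit productions:
Step 1: To remove X → Y, add the production X → a to the grammar whenever Y → a occurs in the grammar.
Step 2: Now delete X → Y from the grammar.
Step 3: Repeat step 1 and step 2 until all unit productions are removed.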
Example:
1. S → 0A | 1B | C
2. A → 0S | 00
3. B→1|A
4. C → 01
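Solution:
S → C is a unit production. While removing S → C, we have to consider what C gives, so we add the
productions of C to S:
S → 0A | 1B | 01
Similarly, B → A is a unit production, so we replace it with the productions of A:
B → 1 | 0S | 00
Thus the grammar without unit productions is:
1. S → 0A | 1B | 01
2. A → 0S | 00
3. B → 1 | 0S | 00
4. C → 01
C can no longer be reached from S, so C → 01 is now useless and can also be removed.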
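The steps above can be sketched in Python. Rather than repeating steps 1 and 2 by hand, this hypothetical helper first collects, for each non-terminal, every non-terminal it reaches through unit productions alone, and then copies over their non-unit productions. This covers chains such as A → B → C in a single pass. Right-hand sides are tuples of symbols, and any symbol that is not a key is a terminal.

```python
def remove_unit_productions(grammar):
    """Replace every A -> B (B a non-terminal) by the non-unit bodies reachable from B."""

    def is_unit(body):
        return len(body) == 1 and body[0] in grammar

    result = {}
    for head in grammar:
        # Non-terminals reachable from head using unit productions only.
        reach, stack = {head}, [head]
        while stack:
            for body in grammar[stack.pop()]:
                if is_unit(body) and body[0] not in reach:
                    reach.add(body[0])
                    stack.append(body[0])
        # Copy head's own non-unit productions first, then those of the others.
        order = [head] + [nt for nt in grammar if nt in reach and nt != head]
        new_bodies = []
        for nt in order:
            for body in grammar[nt]:
                if not is_unit(body) and body not in new_bodies:
                    new_bodies.append(body)
        result[head] = new_bodies
    return result


g = {
    "S": [("0", "A"), ("1", "B"), ("C",)],
    "A": [("0", "S"), ("0", "0")],
    "B": [("1",), ("A",)],
    "C": [("0", "1")],
}
for head, bodies in remove_unit_productions(g).items():
    print(head, "→", " | ".join("".join(b) for b in bodies))
# S → 0A | 1B | 01
# A → 0S | 00
# B → 1 | 0S | 00
# C → 01
```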
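Chomsky's Normal Form (CNF): A CFG is in CNF if all of its production rules satisfy one of the
following conditions:
• The start symbol generates ε (S → ε).
• A non-terminal generates exactly two non-terminals (for example, S → AB).
• A non-terminal generates a single terminal (for example, S → a).
Example:
1. G1 = {S → AB, S → c, A → a, B → b}
2. G2 = {S → aA, A → a, B → c}
The production rules of grammar G1 satisfy the rules specified for CNF, so G1 is in CNF. However, the
production rule S → aA of grammar G2 does not satisfy the rules specified for CNF, because it contains
a terminal followed by a non-terminal. So G2 is not in CNF.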
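The three conditions can be checked mechanically. Below is a small hypothetical checker in Python: each non-terminal maps to a list of right-hand sides written as tuples of symbols, any symbol that is not a key counts as a terminal, and the empty tuple stands for ε. For simplicity it allows an ε-body only for the start symbol and does not check whether the start symbol also appears on a right-hand side.

```python
def is_cnf(grammar, start):
    """True if every body is ε (start only), a single terminal, or two non-terminals."""
    nonterminals = set(grammar)
    for head, bodies in grammar.items():
        for body in bodies:
            if len(body) == 0:
                ok = head == start
            elif len(body) == 1:
                ok = body[0] not in nonterminals
            elif len(body) == 2:
                ok = body[0] in nonterminals and body[1] in nonterminals
            else:
                ok = False
            if not ok:
                return False
    return True


G1 = {"S": [("A", "B"), ("c",)], "A": [("a",)], "B": [("b",)]}
G2 = {"S": [("a", "A")], "A": [("a",)], "B": [("c",)]}
print(is_cnf(G1, "S"), is_cnf(G2, "S"))  # True False
```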
Steps for converting a CFG into CNF
Step 1: Eliminate the start symbol from the RHS. If the start symbol S is on the right-hand side of
any production, create a new production:
S1 → S
Step 2: In the grammar, remove the null, unit and useless productions.
Step 3: Eliminate terminals from the RHS of a production if they occur together with other
non-terminals or terminals. For example, production S → aA can be decomposed as:
S → RA
R → a
Step 4: Eliminate RHS with more than two non-terminals. For example, production S → ASB can be
decomposed as:
S → RB
R → AS
Example: Convert the given CFG to CNF. Consider the given grammar G1:
1. S → a | aA | B
2. A → aBB | ε
3. B → Aa | b
Solution:
Step 1: We create a new start symbol S1 with the production S1 → S. (The start symbol S does not
actually appear on any RHS here, so this step could be skipped, but it does no harm.) The grammar
will be:
1. S1 → S
2. S → a | aA | B
3. A → aBB | ε
4. B → Aa | b
Step 2: As grammar G1 contains the null production A → ε, its removal from the grammar yields:
1. S1 → S
2. S → a | aA | B
3. A → aBB
4. B → Aa | b | a
Now, as grammar G1 contains the unit production S → B, its removal yields:
1. S1 → S
2. S → a | aA | Aa | b
3. A → aBB
4. B → Aa | b | a
Also remove the unit production S1 → S; its removal from the grammar yields:
1. S1 → a | aA | Aa | b
2. S → a | aA | Aa | b
3. A → aBB
4. B → Aa | b | a
Step 3: In the production rules S1 → aA | Aa, S → aA | Aa, A → aBB and B → Aa, terminal a exists on
the RHS together with non-terminals. So we replace terminal a with X:
1. S1 → a | XA | AX | b
2. S → a | XA | AX | b
3. A → XBB
4. B → AX | b | a
5. X → a
Step 4: In the production rule A → XBB, the RHS has more than two symbols, so we replace XB with a new
non-terminal R, which yields:
1. S1 → a | XA | AX | b
2. S → a | XA | AX | b
3. A → RB
4. B → AX | b | a
5. X → a
6. R → XB
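Steps 3 and 4 can be sketched in Python. This hypothetical helper assumes step 2 has already been done (no null, unit or useless productions), so every one-symbol body is a terminal. It gives each terminal that occurs in a longer body its own non-terminal, then splits long bodies two symbols at a time from the left, as done for A → XBB above. The generated names (T1, R2, ...) are arbitrary; on the grammar obtained after step 2 it produces the same productions as the hand conversion, with T1 in place of X and R2 in place of R.

```python
def term_and_bin(grammar):
    """CNF steps 3 and 4 for a grammar without null, unit or useless productions."""
    names = set(grammar)
    result = {head: [] for head in grammar}
    term_nt = {}  # terminal -> non-terminal that generates only that terminal
    counter = 0

    def fresh(prefix):
        nonlocal counter
        while True:
            counter += 1
            name = f"{prefix}{counter}"
            if name not in names:
                names.add(name)
                return name

    for head, bodies in grammar.items():
        for body in bodies:
            if len(body) == 1:  # A -> a is already in CNF
                result[head].append(body)
                continue
            # Step 3: replace each terminal by a non-terminal that generates it.
            symbols = []
            for s in body:
                if s in grammar:
                    symbols.append(s)
                else:
                    if s not in term_nt:
                        term_nt[s] = fresh("T")
                        result[term_nt[s]] = [(s,)]
                    symbols.append(term_nt[s])
            # Step 4: while the body is too long, pair up its first two symbols.
            while len(symbols) > 2:
                r = fresh("R")
                result[r] = [(symbols[0], symbols[1])]
                symbols = [r] + symbols[2:]
            result[head].append(tuple(symbols))
    return result


after_step2 = {
    "S1": [("a",), ("a", "A"), ("A", "a"), ("b",)],
    "S": [("a",), ("a", "A"), ("A", "a"), ("b",)],
    "A": [("a", "B", "B")],
    "B": [("A", "a"), ("b",), ("a",)],
}
for head, bodies in term_and_bin(after_step2).items():
    print(head, "→", " | ".join(" ".join(b) for b in bodies))
# S1 → a | T1 A | A T1 | b
# S → a | T1 A | A T1 | b
# A → R2 B
# B → A T1 | b | a
# T1 → a
# R2 → T1 B
```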
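Greibach Normal Form (GNF): A CFG (context-free grammar) is in GNF if all of its production rules
satisfy one of the following conditions:
• The start symbol generates ε (S → ε).
• A non-terminal generates a single terminal (for example, A → a).
• A non-terminal generates a terminal followed by any number of non-terminals (for example, S → aASB).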
Example:
1. G1 = {S → aAB | aB, A → aA| a, B → bB | b}
2. G2 = {S → aAB | aB, A → aA | ε, B → bB | ε}
The production rules of grammar G1 satisfy the rules specified for GNF, so G1 is in GNF. However, the
production rules of grammar G2 do not satisfy the rules specified for GNF, because A → ε and B → ε
generate ε (only the start symbol may generate ε). So G2 is not in GNF.
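A matching check for GNF can be sketched the same way. This hypothetical checker stores each non-terminal's right-hand sides as tuples of symbols (the empty tuple is ε, and any symbol that is not a key is a terminal). It accepts ε only for the start symbol and otherwise requires a body to start with a terminal followed only by non-terminals; like the definition above, it does not check whether the start symbol occurs on a right-hand side.

```python
def is_gnf(grammar, start):
    """True if every body is ε (start only), or a terminal followed by non-terminals."""
    nonterminals = set(grammar)
    for head, bodies in grammar.items():
        for body in bodies:
            if not body:  # ε is allowed only for the start symbol
                if head != start:
                    return False
            elif body[0] in nonterminals:  # must begin with a terminal
                return False
            elif any(s not in nonterminals for s in body[1:]):  # then only non-terminals
                return False
    return True


G1 = {"S": [("a", "A", "B"), ("a", "B")], "A": [("a", "A"), ("a",)], "B": [("b", "B"), ("b",)]}
G2 = {"S": [("a", "A", "B"), ("a", "B")], "A": [("a", "A"), ()], "B": [("b", "B"), ()]}
print(is_gnf(G1, "S"), is_gnf(G2, "S"))  # True False
```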
Steps for converting a CFG into GNF
Step 1: If the given grammar is not in CNF, convert it into CNF, as described in the section on
Chomsky normal form.
Step 2: If the context-free grammar contains left recursion, eliminate it.
Step 3: In the grammar, convert the given production rules into GNF form.
If any production rule in the grammar is not in GNF form, convert it.
Example:
1. S → XB | AA
2. A → a | SA
3. B→b
4. X→a
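Solution: As the given grammar G is already in CNF and has no immediate left recursion, we can skip
step 1 and step 2 and go directly to step 3.
The production rule A → SA is not in GNF, so we substitute S → XB | AA in the production rule A → SA:
1. S → XB | AA
2. A → a | XBA | AAA
3. B → b
4. X → a
The production rules S → XB and A → XBA are not in GNF, so we substitute X → a in them:
1. S → aB | AA
2. A → a | aBA | AAA
3. B → b
4. X → a
Now we remove the left recursion A → AAA:
1. S → aB | AA
2. A → aC | aBAC
3. C → AAC | ε
4. B → b
5. X → a
Now we remove the null production C → ε:
1. S → aB | AA
2. A → aC | aBAC | a | aBA
3. C → AAC | AA
4. B → b
5. X → a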
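The left-recursion removal used above follows a general pattern: A → Aα1 | ... | Aαm | β1 | ... | βn is replaced by A → βi | βiC and C → αj | αjC, where C is a new non-terminal. This ε-free form is what the example reaches after removing C → ε. A minimal Python sketch, with hypothetical names, assuming right-hand sides are tuples of symbols, the new name is unused, and there is no production of the form A → A:

```python
def remove_immediate_left_recursion(head, bodies, new_head):
    """Rewrite head -> head α | β without left recursion and without ε.

    bodies is a list of tuples of symbols; new_head must be an unused name.
    """
    alphas = [b[1:] for b in bodies if b and b[0] == head]  # the α parts
    betas = [b for b in bodies if not b or b[0] != head]    # the β parts
    if not alphas:
        return {head: list(bodies)}
    return {
        head: betas + [beta + (new_head,) for beta in betas],
        new_head: alphas + [alpha + (new_head,) for alpha in alphas],
    }


rules = remove_immediate_left_recursion("A", [("a",), ("a", "B", "A"), ("A", "A", "A")], "C")
for head, bodies in rules.items():
    print(head, "→", " | ".join("".join(b) for b in bodies))
# A → a | aBA | aC | aBAC
# C → AA | AAC
```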