Toc 3
Languages
Context-Free Languages
The class of context-free languages contains
all regular languages, as well as some non-
regular languages.
Context-free languages typically have a recursive (nested) structure, such as balanced parentheses.
Context-free grammars are used to define the syntax of programming languages and to build parsers for their compilers.
Nondeterministic pushdown automata have the same power as context-free grammars: they accept exactly the context-free languages.
Grammar
A grammar is a set of rules for putting things
together.
A grammar is a language generator that consists of terminal symbols, non-terminal symbols, a start symbol and rules.
The validity of a sentence is determined by the grammar of the language.
Each language generated by some grammar can be recognized by some automaton.
Grammar
Def: A grammar is a 4-tuple G = (V, Ʃ, P, S), where
1. V is a finite set, whose elements are called variables
or non-terminals,
2. Ʃ is a finite set, whose elements are called terminals,
3. V ∩ Ʃ = Ø ,
4. S is an element of V ; it is called the start variable,
5. P is a finite set, whose elements are called rules or
productions.
Each rule has the form A → w, where
A ∈ (V ∪ Σ)+ and w ∈ (V ∪ Σ)*.
Types of grammar
Chomsky Hierarchy of Grammars
Noam Chomsky, one of the founders of formal language theory, classified grammars according to the minimal automaton required to recognize them.
◦ Type 0 grammar (unrestricted grammar)
◦ Type 1 grammar (context-sensitive grammar)
◦ Type 2 grammar (context-free grammar)
◦ Type 3 grammar (regular grammar)
Type 0 grammar
A grammar G, is said to be type 0 grammar or
unrestricted grammar or phrase structured grammar if
all the productions are of the form α → β, where
α ∈ (V ∪ Σ)+ and β ∈ (V ∪ Σ)*.
In this type of grammar there are no restrictions on the lengths of α and β, but α cannot be ϵ.
This is the largest family of grammars in the hierarchy and is more powerful than all the others: every recursively enumerable language can be generated by a type 0 grammar.
The language generated by this grammar is called a recursively enumerable language.
Turing machines recognize exactly these languages.
Type 1 grammar
A grammar G is said to be type 1 grammar
or context sensitive grammar if all the
productions are of the form α → β, where
α ∈ (V ∪ Σ)+, β ∈ (V ∪ Σ)+ and |α| ≤ |β|.
It is also an ϵ-free grammar, since ϵ is not allowed on either the left-hand side or the right-hand side of a production.
The language generated by this grammar is called a context-sensitive language.
Linear bounded automata (LBA) can be used to recognize these languages.
Type 2 grammar
A grammar G is said to be type 2 grammar
or context free grammar if all the
productions are of the form A → α, where α ∈ (V ∪ Σ)* and A is a single non-terminal.
ϵ can appear on the right-hand side of a production.
The language generated by this grammar is called a context-free language.
Pushdown automata can be used to recognize these languages.
Type 3 grammar
A grammar G is said to be Type 3 grammar or regular
grammar if and only if the grammar is left linear or right
linear.
A grammar G is said to be left linear if all the productions
are of the form
A → Bw or A → w
where A, B are variables (non-terminals) and w is a string of terminals.
A grammar G is said to be right linear if all the productions
are of the form
A → wB or A → w
where A, B are variables (non-terminals) and w is a string of terminals.
The language generated by a regular grammar is called a regular language.
Regular languages are recognized by finite automata.
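The four production forms above can be checked mechanically. The Python sketch below assumes a hypothetical encoding in which each production is a pair of strings (left side, right side), uppercase letters are variables, every other character is a terminal, and the empty string stands for ϵ. It returns the most restrictive type whose form every production satisfies; the function names are illustrative only.

```python
def is_var(ch: str) -> bool:
    # Assumed convention: variables (non-terminals) are uppercase letters.
    return ch.isupper()

def is_right_linear(lhs: str, rhs: str) -> bool:
    # A → wB or A → w, where w is a (possibly empty) string of terminals
    if len(lhs) != 1 or not is_var(lhs):
        return False
    body = rhs[:-1] if rhs and is_var(rhs[-1]) else rhs
    return not any(is_var(c) for c in body)

def is_left_linear(lhs: str, rhs: str) -> bool:
    # A → Bw or A → w
    if len(lhs) != 1 or not is_var(lhs):
        return False
    body = rhs[1:] if rhs and is_var(rhs[0]) else rhs
    return not any(is_var(c) for c in body)

def chomsky_type(productions: list[tuple[str, str]]) -> int:
    """Most restrictive Chomsky type (3, 2, 1 or 0) of a grammar."""
    if all(is_right_linear(l, r) for l, r in productions) or all(
        is_left_linear(l, r) for l, r in productions
    ):
        return 3  # regular: entirely right linear or entirely left linear
    if all(len(l) == 1 and is_var(l) for l, _ in productions):
        return 2  # context free: a single variable on the left
    if all(0 < len(l) <= len(r) for l, r in productions):
        return 1  # context sensitive: |α| ≤ |β|, so β ≠ ϵ
    return 0      # unrestricted (α ≠ ϵ is assumed, as in the definition)

print(chomsky_type([("S", "aS"), ("S", "")]))    # 3
print(chomsky_type([("S", "aSb"), ("S", "")]))   # 2
print(chomsky_type([("S", "aSBc"), ("S", "abc"), ("cB", "Bc"), ("bB", "bb")]))  # 1
print(chomsky_type([("AB", "a")]))               # 0
```

Note that a grammar mixing right-linear and left-linear rules (for example S → aS | Sa | ϵ) is classified as type 2, matching the requirement that a regular grammar be entirely left linear or entirely right linear.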
A context-free grammar is a notation for
describing languages.
It is more powerful than finite automata or regular expressions (REs), but still cannot define all possible languages.
It is useful for nested structures, e.g., parentheses in programming languages.
Context-free grammars:
Definition: A context-free grammar is a 4-tuple
G = (V, Ʃ, P, S), where
1. V is a finite set, whose elements are called
variables,
2. Ʃ is a finite set, whose elements are called
terminals,
3. V ∩ Ʃ = Ø ,
4. S is an element of V ; it is called the start variable,
5. P is a finite set, whose elements are called rules
or productions. Each rule has the form A → w,
where A ∈ V and w ∈ (V ∪ Σ)*.
Formal CFG Example
Here is a formal CFG for {0^n 1^n | n ≥ 1}.
The CFG for above language is G= (V, Ʃ, P, S),
where
Terminals Ʃ = {0, 1}.
Variables V = {S}.
Start symbol S = S.
Productions P = S -> 01 , S -> 0S1
i.e P ={S -> 01 | 0S1}
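A short Python sketch can list the strings this grammar generates by exploring leftmost derivations breadth-first. The dictionary encoding of P (variable mapped to its list of right-hand sides) and the function name are assumptions for illustration.

```python
from collections import deque

# P = {S -> 01 | 0S1}, encoded (by assumption) as variable -> right-hand sides
PRODUCTIONS = {"S": ["01", "0S1"]}

def generate(productions, start, max_len):
    """Terminal strings of length <= max_len derivable from `start`,
    found by breadth-first search over leftmost derivations.
    Pruning long sentential forms is safe only because no production
    makes a sentential form shorter (there are no ϵ-productions)."""
    found, seen, queue = set(), {start}, deque([start])
    while queue:
        form = queue.popleft()
        # position of the leftmost variable, or None if form is all terminals
        k = next((i for i, ch in enumerate(form) if ch in productions), None)
        if k is None:
            found.add(form)
            continue
        for body in productions[form[k]]:
            new = form[:k] + body + form[k + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return sorted(found, key=len)

print(generate(PRODUCTIONS, "S", 8))  # ['01', '0011', '000111', '00001111']
```

Every string produced has the form 0^n 1^n with n ≥ 1, which is consistent with the language stated above.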
Design of CFG from FA
Let M = (Q, Σ, δ, q0, F) be a finite automaton accepting L.
Then a grammar G = (V, Ʃ, P, S) can be constructed,
where V = {q0, q1, …, qn}, i.e., the states of the DFA
are the variables of the grammar;
Ʃ = Σ, i.e., the alphabet of the DFA is the set of
terminals of the grammar;
S = q0, i.e., the start state of the DFA is the start
symbol of the grammar.
The productions are obtained as follows:
If δ(qi, a) = qj, then the production is qi → a qj.
If q ∈ F, then q → ϵ.
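For example, a DFA with a single state S that is both the start state and a final state, with δ(S, a) = S, gives P = {S → aS | ϵ}.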
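The construction can be written as a short Python function. The dictionary encoding of δ, the tuple encoding of right-hand sides, and the function names are assumptions for illustration; the sample transitions are the ones listed in the DFA example that follows (states S and A, with A the only final state).

```python
def dfa_to_grammar(delta, start, finals):
    """Right-linear grammar from a DFA.
    delta maps (state, symbol) -> state; each right-hand side is a tuple of
    symbols, and the empty tuple () stands for ϵ."""
    productions = {}
    for (q, a), p in delta.items():
        productions.setdefault(q, []).append((a, p))   # δ(q, a) = p gives q → a p
    for q in finals:
        productions.setdefault(q, []).append(())       # q ∈ F gives q → ϵ
    return start, productions

def show(productions):
    # Render each variable's alternatives on one line, ϵ for the empty body
    return [f"{var} → " + " | ".join("".join(b) if b else "ϵ" for b in bodies)
            for var, bodies in productions.items()]

# Transitions from the DFA example that follows: A is the only final state.
delta = {("S", "0"): "A", ("S", "1"): "S", ("A", "0"): "A", ("A", "1"): "A"}
start, prods = dfa_to_grammar(delta, "S", ["A"])
print("\n".join(show(prods)))
# S → 0A | 1S
# A → 0A | 1A | ϵ
```

The grammar has exactly one production per transition plus one ϵ-production per final state, so its size is linear in the size of the DFA.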
Obtain the grammar for the following DFA
Sol:
Transition        Production
δ(S, 0) = A       S → 0A
δ(S, 1) = S       S → 1S
δ(A, 0) = A       A → 0A
δ(A, 1) = A       A → 1A
Since A is a final state, A → ϵ.
Design of CFG from Languages
Obtain a CFG for the following language, L = {a^n b^n | n ≥ 0}
Sol:
When n = 0, the production is S → ϵ.
When n = 1, the production is S → ab, i.e., for every a, one b has to be generated.
Therefore the recursive production is S → aSb.
The language L contains the strings ϵ, ab, aabb, aaabbb, …
The derivations of the strings ab and aabb are as follows:
S => aSb => aϵb = ab
S => aSb => aaSbb => aaϵbb = aabb
The CFG is G = (V, Ʃ, P, S), where
V = {S}, Ʃ = {a, b}, S = S, and
P = {S → aSb | ϵ}
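Because each production mirrors the structure of the strings, the grammar translates directly into a recursive-descent recognizer. A minimal Python sketch follows; the function names are illustrative, not part of the original solution.

```python
def parse_S(s: str, i: int = 0) -> int:
    """Match S → aSb | ϵ starting at s[i]; return the index just past the
    substring derived from S."""
    if i < len(s) and s[i] == "a":           # try S → aSb
        j = parse_S(s, i + 1)                # the inner S
        if j < len(s) and s[j] == "b":
            return j + 1
    return i                                 # otherwise use S → ϵ

def accepts(s: str) -> bool:
    # s ∈ L exactly when S derives the whole string
    return parse_S(s) == len(s)

for w in ["", "ab", "aabb", "aab", "abab"]:
    print(repr(w), accepts(w))
```

The recursion depth equals the number of leading a's, so this sketch is meant for short strings only.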
Obtain a CFG for the set of all palindromes over Σ={a,b}
Sol:
ϵ is a palindrome, therefore S → ϵ.
a and b are palindromes, therefore S → a | b.
If w is a palindrome, then awa and bwb are palindromes, therefore
S → aSa | bSb.
The language L contains strings such as aba, bab, abba, ababa, babab, …
S => aSa => aba
S => bSb => baSab => babab
Therefore the CFG is G = (V, Ʃ, P, S),
where
V = {S}, Ʃ = {a, b}, S = S, and
P = {S → aSa | bSb | a | b | ϵ}
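Each production peels off one matching pair of outer symbols, so a recursive check that follows the productions decides membership. A minimal Python sketch (the function name is illustrative):

```python
def derives(w: str) -> bool:
    """True if S =>* w for S → aSa | bSb | a | b | ϵ over {a, b}."""
    if len(w) <= 1:
        return all(c in "ab" for c in w)      # S → ϵ, S → a, S → b
    # S → aSa or S → bSb: outer symbols must match, the inner part comes from S
    return w[0] in "ab" and w[0] == w[-1] and derives(w[1:-1])

for w in ["aba", "abba", "babab", "ab", "abca"]:
    print(w, derives(w))
```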
Obtain the CFG for the regular expression (0+1)*01
Sol:
The regular expression represents strings of 0’s and 1’s
ending with 01.
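The grammar itself is not reproduced on this slide. One possible answer, offered here as an assumption rather than as the original solution, is S → 0S | 1S | 01: the first two productions generate any prefix and the last one forces the ending 01. The Python sketch below checks this candidate against Python's `re` module for every binary string up to length 8.

```python
import itertools
import re

# Candidate grammar (an assumption): S → 0S | 1S | 01
def derivable(w: str) -> bool:
    """True if S =>* w."""
    if w == "01":
        return True                                   # S → 01
    # S → 0S or S → 1S: strip the first symbol and continue with S
    return len(w) > 2 and w[0] in "01" and derivable(w[1:])

pattern = re.compile(r"(0|1)*01")
for n in range(9):
    for symbols in itertools.product("01", repeat=n):
        w = "".join(symbols)
        assert derivable(w) == bool(pattern.fullmatch(w)), w
print("S → 0S | 1S | 01 matches (0+1)*01 on all strings of length ≤ 8")
```

An exhaustive check up to a fixed length is evidence, not a proof; the proof is a simple induction on the length of the string.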
Parsing or Derivation
Derivation (or parsing) is the processing of a string by a sequence of substitutions.
Each step replaces a non-terminal in the string by the right-hand side of a production whose left-hand side is that non-terminal.
Each step produces a new string from the given string, and derivation steps can be applied repeatedly.
Once the derived string contains only terminal symbols, no further derivation steps are possible.
There are two forms of derivations:
◦ Leftmost derivation: at each step, the leftmost non-terminal is replaced
◦ Rightmost derivation: at each step, the rightmost non-terminal is replaced
Example: Leftmost Derivations
Balanced-parentheses grammar:
S -> SS | (S) | ()
S =>lm SS =>lm (S)S =>lm (())S =>lm (())()
Thus, S =>*lm (())()
S => SS => S() => (S)() => (())() is a
derivation, but not a leftmost derivation.
Example: Rightmost Derivations
Balanced-parentheses grammar:
S -> SS | (S) | ()
S =>rm SS =>rm S() =>rm (S)() =>rm (())()
Thus, S =>*rm (())()
S => SS => SSS => S()S => ()()S => ()()() is
neither a rightmost nor a leftmost derivation.
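The difference between the two kinds of derivation is only which variable gets rewritten at each step. The Python sketch below replays a derivation from a list of right-hand sides, rewriting either the leftmost or the rightmost variable; the function name and the dictionary encoding of the grammar are assumptions for illustration.

```python
# Balanced-parentheses grammar S -> SS | (S) | ()
GRAMMAR = {"S": ["SS", "(S)", "()"]}

def derive(productions, start, bodies, leftmost=True):
    """Replay a derivation: at each step rewrite the leftmost (or rightmost)
    variable using the next right-hand side in `bodies`.
    Returns the list of sentential forms."""
    forms = [start]
    for body in bodies:
        form = forms[-1]
        positions = [k for k, ch in enumerate(form) if ch in productions]
        if not positions:
            raise ValueError(f"{form!r} has no variable left to rewrite")
        k = positions[0] if leftmost else positions[-1]
        if body not in productions[form[k]]:
            raise ValueError(f"{form[k]} -> {body} is not a production")
        forms.append(form[:k] + body + form[k + 1:])
    return forms

print(" =>lm ".join(derive(GRAMMAR, "S", ["SS", "(S)", "()", "()"])))
# S =>lm SS =>lm (S)S =>lm (())S =>lm (())()
print(" =>rm ".join(derive(GRAMMAR, "S", ["SS", "()", "(S)", "()"], leftmost=False)))
# S =>rm SS =>rm S() =>rm (S)() =>rm (())()
```

Both replays reproduce the leftmost and rightmost derivations of (())() shown above.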
Parse Trees
Parse trees are trees labeled by symbols of
a particular CFG.
Leaves: labeled by a terminal or ε.
Interior nodes: labeled by a variable.
◦ Children are labeled by the right side of a
production for the parent.
Root: must be labeled by the start symbol.
Example: Parse Tree
S -> SS | (S) | ()
S
├── S
│   ├── (
│   ├── S
│   │   ├── (
│   │   └── )
│   └── )
└── S
    ├── (
    └── )
Yield of a Parse Tree
The concatenation of the labels of the leaves
in left-to-right order is called the yield of the
parse tree.
Example: the yield of the parse tree in the previous example is (())().
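A parse tree and its yield map naturally onto a small recursive data structure. In the Python sketch below, the `Node` class, the function names, and the use of the label "ε" for an ε-leaf are assumptions for illustration. It computes the yield and also checks the labeling rules listed under Parse Trees.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str                                   # variable, terminal, or "ε"
    children: list["Node"] = field(default_factory=list)

def tree_yield(node: Node) -> str:
    """Concatenate leaf labels from left to right (an ε leaf adds nothing)."""
    if not node.children:
        return "" if node.label == "ε" else node.label
    return "".join(tree_yield(child) for child in node.children)

def is_parse_tree(node: Node, productions: dict[str, list[str]]) -> bool:
    """Every interior node must be a variable whose children spell one of its
    right-hand sides; every leaf must be a terminal or ε."""
    if not node.children:
        return node.label not in productions
    body = "".join("" if c.label == "ε" else c.label for c in node.children)
    return body in productions.get(node.label, []) and all(
        is_parse_tree(c, productions) for c in node.children
    )

def leaf(symbol: str) -> Node:
    return Node(symbol)

GRAMMAR = {"S": ["SS", "(S)", "()"]}
# The example tree: S -> SS, left S -> (S) with S -> (), right S -> ()
tree = Node("S", [
    Node("S", [leaf("("), Node("S", [leaf("("), leaf(")")]), leaf(")")]),
    Node("S", [leaf("("), leaf(")")]),
])
print(tree.label == "S" and is_parse_tree(tree, GRAMMAR))  # True
print(tree_yield(tree))                                   # (())()
```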
Consider the following five rules:
S → AB, A → a, A → aA, B → b, B → bB
Here, S, A, and B are variables, S is the start variable, and a and b are terminals.
We use these rules to derive strings consisting of terminals (i.e., elements of {a,
b}), in the following manner:
Initialize the current string to be the string consisting of the start variable S.
Take any variable in the current string and take any rule that has this variable on
the left-hand side. Then, in the current string, replace this variable by the right-
hand side of the rule.
Repeat until the current string only contains terminals.
For example, the string aaaabb can be derived in the following ways:
Leftmost derivation:
S => AB => aAB => aaAB => aaaAB => aaaaB => aaaabB => aaaabb
Rightmost derivation:
S => AB => AbB => Abb => aAbb => aaAbb => aaaAbb => aaaabb
Ambiguous Grammars
An ambiguous grammar is a CFG for which there exists a string that has more than one leftmost derivation.
Equivalently, a CFG is ambiguous if there is a string in the language that is the yield of two or more parse trees.
Example: S -> SS | (S) | ()
Example: Two parse trees for ()()()
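Using S[…] for a node labeled S with the listed children:
Tree 1 (root S -> SS, the left S derives ()()): S[ S[ S[()] S[()] ] S[()] ]
Tree 2 (root S -> SS, the right S derives ()()): S[ S[()] S[ S[()] S[()] ] ]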
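Ambiguity can be checked for a particular string by counting its parse trees. The Python sketch below does this for the grammar S -> SS | (S) | () with a memoised recursion over substrings (a CYK-style dynamic program); the function name is illustrative.

```python
from functools import lru_cache

def count_parse_trees(w: str) -> int:
    """Number of parse trees with root S and yield w
    for the grammar S -> SS | (S) | ()."""
    @lru_cache(maxsize=None)
    def count(i: int, j: int) -> int:       # parse trees for w[i:j]
        if j - i < 2:
            return 0                        # S never derives ϵ or one symbol
        total = 1 if w[i:j] == "()" else 0  # S -> ()
        if w[i] == "(" and w[j - 1] == ")":
            total += count(i + 1, j - 1)    # S -> (S)
        for k in range(i + 1, j):           # S -> SS, split w[i:j] at k
            total += count(i, k) * count(k, j)
        return total
    return count(0, len(w))

print(count_parse_trees("()()()"))  # 2, so the grammar is ambiguous
print(count_parse_trees("(())()"))  # 1
```

The two trees counted for ()()() are exactly the two groupings shown above.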
An ambiguous expression grammar can be converted into an unambiguous grammar based on operator precedence.
The order of precedence, from highest to lowest, is as follows:
◦ Identifiers (e.g., a, b, c, 0, 1, …)
◦ Expressions within parentheses “( )”
◦ *, / (whichever occurs first, from left to right)
◦ +, - (whichever occurs first, from left to right)
Assuming that all these operators are left associative, the new productions can be formed as follows:
I → a | b | c | 0 | 1
F → (E) | I
T → T * F | T / F | F
E → E + T | E - T | T
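Because the left-recursive productions E → E + T and T → T * F only repeat an operator and an operand, a recursive-descent parser can implement them as loops, which keeps the operators left associative. The Python sketch below (names are illustrative; tokens are assumed to be single characters) returns a fully parenthesised string so that the grouping chosen by the grammar is visible.

```python
def parse_expression(text: str) -> str:
    """Parse text with E → E+T | E-T | T, T → T*F | T/F | F, F → (E) | I,
    I → a|b|c|0|1, and return it fully parenthesised."""
    tokens = [ch for ch in text if not ch.isspace()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"unexpected {tok!r} at position {pos}")
        pos += 1
        return tok

    def E():                       # E → E + T | E - T | T, as a loop
        left = T()
        while peek() in ("+", "-"):
            op = eat()
            left = f"({left}{op}{T()})"
        return left

    def T():                       # T → T * F | T / F | F, as a loop
        left = F()
        while peek() in ("*", "/"):
            op = eat()
            left = f"({left}{op}{F()})"
        return left

    def F():                       # F → (E) | I
        if peek() == "(":
            eat("(")
            inner = E()
            eat(")")
            return inner
        tok = eat()
        if tok not in ("a", "b", "c", "0", "1"):   # I → a | b | c | 0 | 1
            raise SyntaxError(f"expected an identifier, got {tok!r}")
        return tok

    result = E()
    if peek() is not None:
        raise SyntaxError(f"unexpected {peek()!r} at position {pos}")
    return result

print(parse_expression("a+b*c"))    # (a+(b*c))
print(parse_expression("a-b-c"))    # ((a-b)-c)
print(parse_expression("(a+b)*c"))  # ((a+b)*c)
```

The output shows * binding tighter than + and repeated - grouping to the left, as the precedence list requires.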
Pushdown Automata
The PDA is an automaton equivalent to the
CFG in language-defining power.
Only the nondeterministic PDA defines all the CFLs.
The deterministic version, however, models parsers.
◦ The syntax of most programming languages can be recognized by deterministic PDAs.
A PDA is an ε-NFA with the additional power
that it can manipulate a stack.
Its moves are determined by:
1. The current state (of its “NFA”),
2. The current input symbol (or ε), and
3. The current symbol on top of its stack.
The presence of a stack means that, unlike a finite automaton, a pushdown automaton can remember an unbounded amount of information.
A pushdown automaton is a finite automaton with a stack data structure.
Definition: A pushdown automaton has 7 components, as follows:
A pushdown automaton M is a system (Q, Σ, Г, δ, q0, Z0, F), where
1) Q is a finite set of states;
2) Σ is an alphabet called the input alphabet;
3) Г is an alphabet, called the stack alphabet;
4) q0 ∈ Q is the initial state;
5) Z0 ∈ Г is a particular stack symbol called the start symbol;
6) F ⊆ Q is the set of final states;
7) δ is a mapping from Q × (Σ ∪ {ϵ}) × Г to finite subsets of Q × Г*.
The Transition Function
Actions of the PDA
If δ(q, a, Z) contains (p, γ) among its
actions, then one thing the PDA can do
in state q, with a at the front of the
input, and Z on top of the stack is:
1. Change the state to p.
2. Remove a from the front of the input
(but a may be ε).
3. Replace Z on the top of the stack by γ.
An Instantaneous Description (ID):
An ID is a triple (q, w, γ), where:
1. q is the current state.
2. w is the remaining input.
3. γ is the current contents of the stack, with the top at the left.
If δ(q, a, Z) contains (p, β), then for any w and α,
(q, aw, Zα) derives (p, w, βα) in one move; this is
represented by
(q, aw, Zα) ⊦ (p, w, βα)
Extend ⊦ to ⊦*, meaning “zero or more moves”.
Language Accepted by a PDA
There are two ways to define the language of a PDA: by entering an accepting state (final state acceptance) or by emptying its stack (empty stack acceptance).
These methods are equivalent: if a language is accepted by some PDA using one method, it is also accepted by some (possibly different) PDA using the other method.
◦ Final state acceptance: the language accepted is the set of all inputs for which some choice of moves reads the whole input and causes the PDA to enter a final state.
(q0, w, Z0) ⊦* (p, ϵ, γ), where p ∈ F and γ ∈ Г*
◦ Empty stack acceptance: the language accepted is the set of all inputs for which some choice of moves reads the whole input and causes the PDA to empty its stack.
(q0, w, Z0) ⊦* (p, ϵ, ϵ), where p ∈ Q
Construct a PDA to accept the following language
L = {0^n 1^n | n ≥ 1} by final state.
Sol: The machine should accept n 0’s followed by n 1’s.
◦ Push all 0’s onto the stack. For each 1 encountered, there should be a corresponding 0 on the stack, which is popped. When the input pointer reaches the end of the string, only the initial stack symbol Z0 should remain on the stack.
◦ Let q0 be the start state and Z0 be the initial symbol on the stack. In state q0, if the input symbol is 0, push it onto the stack.
The transitions are δ(q0, 0, Z0) = (q0, 0Z0) and
δ(q0, 0, 0) = (q0, 00)
In the state q0, if the next input symbol is 1, then
change the state to q1 and pop the top element
from the stack.
δ(q0, 1, 0) = (q1, ϵ)
In state q1, the rest of the symbols to be scanned
will be only 1’s and for each 1 there should be a
corresponding 0 on stack. If so, remain in state
q1 and pop the element from the stack
δ(q1, 1, 0) = (q1, ϵ)
If the input has been completely read (an ϵ-move) and
the top of the stack is Z0, change the state to q2,
which is an accepting state. The transition is
δ(q1, ϵ, Z0) = (q2, Z0)
Therefore PDA M which accepts the language
is M= (Q, Σ,Г, δ,q0,Z0,F) where
Q = {q0, q1, q2}, Σ = {0, 1}, Г = {0, Z0}, the start state is q0,
the initial stack symbol is Z0, F = {q2}, and
δ is as follows:
δ(q0, 0, Z0) = (q0, 0Z0)
δ(q0, 0, 0) = (q0, 00)
δ(q0, 1, 0) = (q1, ϵ)
δ(q1, 1, 0) = (q1, ϵ)
δ(q1, ϵ, Z0) = (q2, Z0)
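The transition table above can be simulated directly. The Python sketch below explores instantaneous descriptions (q, w, γ) breadth-first, so it would also handle a nondeterministic table. The encoding is an assumption: δ maps (state, input symbol or "" for ϵ, stack top) to a set of (state, pushed string) pairs, stack contents are strings with the top at the left, and Z0 is written as the single character Z. The search terminates here because no ϵ-move pushes symbols; a general simulator would need a bound on the stack height.

```python
from collections import deque

# δ of the PDA above; "" stands for ϵ and Z for Z0 (assumed encoding).
DELTA = {
    ("q0", "0", "Z"): {("q0", "0Z")},
    ("q0", "0", "0"): {("q0", "00")},
    ("q0", "1", "0"): {("q1", "")},
    ("q1", "1", "0"): {("q1", "")},
    ("q1", "", "Z"): {("q2", "Z")},
}

def moves(delta, q, w, stack):
    """IDs reachable in one move (⊦) from the ID (q, w, stack)."""
    if not stack:
        return []                                  # no move with an empty stack
    top, rest = stack[0], stack[1:]
    options = [("", w)]                            # ϵ-move: input unchanged
    if w:
        options.append((w[0], w[1:]))              # read one input symbol
    return [(p, remaining, push + rest)
            for a, remaining in options
            for p, push in delta.get((q, a, top), ())]

def accepts(delta, w, start="q0", finals=("q2",), by_empty_stack=False):
    """Search all move sequences from (start, w, Z) for an accepting ID."""
    queue, seen = deque([(start, w, "Z")]), set()
    while queue:
        q, rest, stack = queue.popleft()
        if rest == "" and (stack == "" if by_empty_stack else q in finals):
            return True
        for nxt in moves(delta, q, rest, stack):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

print(accepts(DELTA, "0011"), accepts(DELTA, "00111"))      # True False

# Acceptance by empty stack: replace δ(q1, ϵ, Z0) = (q2, Z0) by (q1, ϵ).
DELTA_EMPTY = {**DELTA, ("q1", "", "Z"): {("q1", "")}}
print(accepts(DELTA_EMPTY, "0011", by_empty_stack=True))    # True
```

The two results for 0011 and 00111 agree with the hand traces of these strings given below.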
PDA to accept strings of the language a^n b^n
If a language is accepted by final state, then it is also accepted by empty stack.
To make this PDA accept by empty stack, the only change is as follows:
◦ Replace the final transition
δ(q1, ϵ, Z0) = (q2, Z0) by
δ(q1, ϵ, Z0) = (q1, ϵ)
For acceptance by empty stack, the stack must be completely empty at the end, including the initial stack symbol Z0.
Check the acceptability of the string 0011.
Sol: The following is the sequence of moves made by the PDA to check whether the string is accepted.
• The Initial ID is
(q0, 0011, Z0) ⊦ (q0, 011 , 0 Z0)
⊦ (q0, 11 , 00 Z0)
⊦ (q1, 1 , 0 Z0)
⊦ (q1, ϵ , Z0)
⊦ (q2, ϵ , Z0) is the final configuration.
Since q2 is a final state and the remaining input is ϵ in the final configuration, the string 0011 is accepted by the PDA.
Check the acceptability of the string 00111.
Sol: The following is the sequence of moves made by the PDA to check whether the string is accepted.
• The Initial ID is
(q0, 00111, Z0) ⊦ (q0, 0111 , 0 Z0)
⊦ (q0, 111 , 00 Z0)
⊦ (q1, 11 , 0 Z0)
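⊦ (q1, 1 , Z0)
No move can read the last 1, because δ(q1, 1, Z0) is not defined. The only possible move is the ϵ-move to (q2, 1, Z0), from which no move is defined, so the input is never completely read and the string 00111 is rejected by the PDA.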
Obtain a PDA to accept the language L = {wcw^R | w ∈ (a+b)*},
where w^R is the reverse of w.
Sol: Given that L = {wcw^R}.
If w = aab, then the reverse of w is w^R = baa.
The language L contains the string ‘aabcbaa’, which is a palindrome.
Push all the scanned symbols onto the stack until the middle symbol c is found.
After the middle symbol, for each input symbol there should be a matching symbol on top of the stack; pop it.
Finally, if the input is exhausted and only Z0 remains on the stack, the given string is accepted.
The alphabet is Σ = {a, b, c}; let q0 be the start state of the PDA and Z0 the initial symbol on the stack.
For input symbol a or b:
In state q0, whatever symbol (a, b or Z0) is on top of the stack, push the input symbol onto the stack and remain in q0 until the middle symbol ‘c’ is encountered.
The transitions are as follows:
δ(q0, a, Z0) = (q0, aZ0)
δ(q0, b, Z0) = (q0, bZ0)
δ(q0, a, a) = (q0, aa)
δ(q0, a, b) = (q0, ab)
δ(q0, b, a) = (q0, ba)
δ(q0, b, b) = (q0, bb)
For the input symbol c, the top of the stack may be a, b or Z0; move to state q1 and do not change the top of the stack.
δ(q0, c, Z0) = (q1, Z0)
δ(q0, c, a) = (q1, a)
δ(q0, c, b) = (q1, b)
For the input symbols a or b of w^R:
In the state q1, if the next input symbol is
same as the top of the stack pop the element
and remain in state q1. The transitions are:
δ(q1, a, a) = (q1, ϵ)
δ(q1, b, b) = (q1, ϵ)
Therefore PDA M which accepts the language is M= (Q,
Σ,Г, δ,q0,Z0,F) where
Q = {q0, q1, q2}, Σ = {a, b, c}, Г = {a, b, Z0}, the start state is q0, the initial stack symbol is Z0, F = {q2}
and
δ is as follows:
δ(q0, a, Z0) = (q0, aZ0)
δ(q0, b, Z0) = (q0, bZ0)
δ(q0, a, a) = (q0, aa)
δ(q0, a, b) = (q0, ab)
δ(q0, b, a) = (q0, ba)
δ(q0, b, b) = (q0, bb)
δ(q0, c, Z0) = (q1, Z0)
δ(q0, c, a) = (q1, a)
δ(q0, c, b) = (q1, b)
δ(q1, a, a) = (q1, ϵ)
δ(q1, b, b) = (q1, ϵ)
δ(q1, ϵ, Z0) = (q2, Z0)
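Since at most one move applies in every ID, this PDA can be run with a simple loop instead of a search. The Python sketch below builds the transition table listed above and traces the IDs for the two strings of the following exercise. The encoding is an assumption: "" stands for ϵ, Z for Z0, and the stack is a string with its top at the left.

```python
# Transition table of the PDA above: (state, input or "" for ϵ, top) -> (state, push).
DELTA = {}
for x in "ab":
    DELTA[("q0", x, "Z")] = ("q0", x + "Z")        # push the first symbol of w
    for y in "ab":
        DELTA[("q0", x, y)] = ("q0", x + y)        # keep pushing w
    DELTA[("q1", x, x)] = ("q1", "")               # pop while matching w^R
for top in "abZ":
    DELTA[("q0", "c", top)] = ("q1", top)          # middle symbol c
DELTA[("q1", "", "Z")] = ("q2", "Z")               # move to the final state

def run(w, finals=("q2",)):
    """Run the PDA on w; return (accepted, list of IDs visited)."""
    q, stack = "q0", "Z"
    trace = [(q, w, stack)]
    while True:
        top = stack[:1]
        if w and (q, w[0], top) in DELTA:          # move that reads input
            q, push = DELTA[(q, w[0], top)]
            w = w[1:]
        elif (q, "", top) in DELTA:                # ϵ-move
            q, push = DELTA[(q, "", top)]
        else:
            break                                  # no move defined: halt
        stack = push + stack[1:]
        trace.append((q, w, stack))
    return w == "" and q in finals, trace

for w in ["abacaba", "babcbaa"]:
    accepted, trace = run(w)
    print(" ⊦ ".join(f"({q}, {rest or 'ϵ'}, {st or 'ϵ'})" for q, rest, st in trace))
    print(w, "accepted" if accepted else "rejected")
```

For babcbaa the machine halts in q1 with input a remaining and b on top of the stack, so the string is rejected, while abacaba ends in (q2, ϵ, Z0).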
Show the acceptability of the following strings: abacaba, babcbaa.
Deterministic PDA:
Let M = (Q, Σ, Г, δ, q0, Z0, F) be a PDA. The PDA is deterministic if the following two conditions are satisfied for every q ∈ Q and Z ∈ Г:
◦ δ(q, a, Z) has at most one element for every a ∈ Σ ∪ {ϵ};
◦ if δ(q, ϵ, Z) is not empty, then δ(q, a, Z) is empty for every a ∈ Σ.
Show that the language L = {wcw^R | w ∈ (a+b)*, where w^R is the reverse of w} is deterministic, i.e., accepted by a deterministic PDA.
Sol. The transitions defined for the machine are
δ(q0, a, Z0) = (q0, aZ0)
δ(q0, b, Z0) = (q0, bZ0)
δ(q0, a, a) = (q0, aa)
δ(q0, a, b) = (q0, ab)
δ(q0, b, a) = (q0, ba)
δ(q0, b, b) = (q0, bb)
δ(q0, c, Z0) = (q1, Z0)
δ(q0, c, a) = (q1, a)
δ(q0, c, b) = (q1, b)
δ(q1, a, a) = (q1, ϵ)
δ(q1, b, b) = (q1, ϵ)
δ(q1, ϵ, Z0) = (q2, Z0)
For each q ∈ Q, a ∈ Σ ∪ {ϵ} and Z ∈ Г, at most one move is defined, so all the transitions are unique.
The only ϵ-move is δ(q1, ϵ, Z0), and δ(q1, a, Z0), δ(q1, b, Z0) and δ(q1, c, Z0) are all undefined, so this ϵ-move never competes with a move that reads input.
Hence the PDA is deterministic.
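The two conditions can be checked mechanically on a transition table. In the Python sketch below, the table encoding ((state, input symbol or "" for ϵ, stack top) mapped to a set of (state, pushed string) moves, with Z for Z0) and the function name are assumptions. The extra ϵ-move in the second table is hypothetical, added only to show a table that fails condition 2.

```python
def is_deterministic(delta, input_alphabet):
    """Check the two DPDA conditions for a table mapping
    (state, input symbol or "" for ϵ, stack top) -> set of (state, push) moves."""
    for (q, a, z), moves in delta.items():
        if len(moves) > 1:
            return False           # condition 1: at most one move per entry
        if a == "" and moves and any(delta.get((q, b, z)) for b in input_alphabet):
            return False           # condition 2: an ϵ-move competes with an input move
    return True

# Table of the wcw^R PDA ("" = ϵ, Z = Z0; assumed encoding).
DELTA = {}
for x in "ab":
    DELTA[("q0", x, "Z")] = {("q0", x + "Z")}
    for y in "ab":
        DELTA[("q0", x, y)] = {("q0", x + y)}
    DELTA[("q1", x, x)] = {("q1", "")}
for top in "abZ":
    DELTA[("q0", "c", top)] = {("q1", top)}
DELTA[("q1", "", "Z")] = {("q2", "Z")}

print(is_deterministic(DELTA, "abc"))     # True

# Hypothetical extra ϵ-move that "guesses" the middle while a is on top:
GUESS = {**DELTA, ("q0", "", "a"): {("q1", "a")}}
print(is_deterministic(GUESS, "abc"))     # False
```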