Toc 3

The document discusses context-free grammars and languages, including defining context-free grammars, Chomsky hierarchy of grammars from type 0 to type 3, examples of context-free grammars, parsing and derivation of strings, and parse trees. Context-free grammars can generate context-free languages and are more powerful than regular expressions and finite automata but cannot define all possible languages.

Uploaded by

Punya V

Context-Free Grammars and Languages

1
Context-Free Languages
• The class of context-free languages contains all regular languages, as well as some non-regular languages.
• The class of context-free languages consists of languages that have some sort of recursive structure.
• Context-free grammars are used to define the syntax of programming languages and to guide their compilation.
• Nondeterministic pushdown automata have the same power as context-free grammars.
2
Grammar
• A grammar is a set of rules for putting things together.
• A grammar is a language generator that consists of terminal symbols, non-terminal symbols, a start symbol, and rules.
• The validity of a sentence is determined by the grammar of the language.
• Each language generated by some grammar can be recognized by some automaton.

3
Grammar
Def: A grammar is a 4-tuple G = (V, Σ, P, S), where
1. V is a finite set, whose elements are called variables or non-terminals,
2. Σ is a finite set, whose elements are called terminals,
3. V ∩ Σ = Ø,
4. S is an element of V; it is called the start variable,
5. P is a finite set, whose elements are called rules or productions.
Each rule has the form A → w, where A ∈ (V ∪ Σ)+ and w ∈ (V ∪ Σ)*.

4
Types of grammar
• Chomsky Hierarchy of Grammars
• Chomsky, a founder of formal language theory, classified grammars according to the minimal automaton required to recognize the languages they generate.
◦ Type 0 grammar (Unrestricted grammar)
◦ Type 1 grammar (Context-sensitive grammar)
◦ Type 2 grammar (Context-free grammar)
◦ Type 3 grammar (Regular grammar)

5
Type 0 grammar
• A grammar G is said to be a type 0 grammar (also called an unrestricted grammar or phrase-structure grammar) if all the productions are of the form α → β, where α ∈ (V ∪ Σ)+ and β ∈ (V ∪ Σ)*.
• In this type of grammar there are no restrictions on the sizes of α and β, except that α cannot be ϵ.
• This is the largest family of grammars: every language generated by a type 1, type 2, or type 3 grammar can also be generated by a type 0 grammar.
• The languages generated by these grammars are called recursively enumerable languages.
• Turing machines recognize these languages.

6
Type 1 grammar
• A grammar G is said to be a type 1 grammar or context-sensitive grammar if all the productions are of the form α → β, where α ∈ (V ∪ Σ)+, β ∈ (V ∪ Σ)+, and |α| ≤ |β|.
• It is also an ϵ-free grammar, since ϵ is not allowed on the left-hand side or the right-hand side of a production.
• Linear bounded automata can be used to recognize these languages.

7
Type 2 grammar
• A grammar G is said to be a type 2 grammar or context-free grammar if all the productions are of the form A → α, where α ∈ (V ∪ Σ)* and A is a single non-terminal.
• ϵ can appear on the right-hand side of a production.
• The language generated by this grammar is a context-free language.
• Pushdown automata can be used to recognize these languages.

8
Type 3 grammar
• A grammar G is said to be a type 3 grammar or regular grammar if and only if the grammar is left linear or right linear.
• A grammar G is said to be left linear if all the productions are of the form
A → Bw or A → w
where A, B are variables (non-terminals) and w is a string of terminals.
• A grammar G is said to be right linear if all the productions are of the form
A → wB or A → w
where A, B are variables (non-terminals) and w is a string of terminals.
• The language generated by a regular grammar is called a regular language.
• Regular languages are recognized by finite automata.
9
• A context-free grammar is a notation for describing languages.
• It is more powerful than finite automata or regular expressions, but still cannot define all possible languages.
• Useful for nested structures, e.g., parentheses in programming languages.

10
Context-free grammars:
Definition: A context-free grammar is a 4-tuple G = (V, Σ, P, S), where
1. V is a finite set, whose elements are called variables,
2. Σ is a finite set, whose elements are called terminals,
3. V ∩ Σ = Ø,
4. S is an element of V; it is called the start variable,
5. P is a finite set, whose elements are called rules or productions. Each rule has the form A → w, where A ∈ V and w ∈ (V ∪ Σ)*.

11
Formal CFG Example
Here is a formal CFG for { 0^n 1^n | n ≥ 1 }.
The CFG for the above language is G = (V, Σ, P, S), where
• Terminals Σ = {0, 1}.
• Variables V = {S}.
• Start symbol: S.
• Productions P: S -> 01, S -> 0S1
i.e., P = {S -> 01 | 0S1}
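
The two productions can be exercised with a short Python sketch. The function names `derive` and `in_language` are illustrative assumptions, not part of the slides; `derive` applies S -> 0S1 repeatedly and finishes with S -> 01, while `in_language` undoes the productions from the outside in.

```python
def derive(n: int) -> list[str]:
    """Return the sentential forms in the derivation of 0^n 1^n (n >= 1).

    The rule S -> 0S1 is applied n - 1 times, then S -> 01 once.
    """
    if n < 1:
        raise ValueError("the grammar generates 0^n 1^n only for n >= 1")
    forms = ["S"]
    for _ in range(n - 1):
        forms.append(forms[-1].replace("S", "0S1"))
    forms.append(forms[-1].replace("S", "01"))
    return forms


def in_language(s: str) -> bool:
    """Check membership by undoing the productions from the outside in."""
    while len(s) > 2 and s[0] == "0" and s[-1] == "1":
        s = s[1:-1]      # undo one application of S -> 0S1
    return s == "01"     # what remains must come from S -> 01


print(" => ".join(derive(3)))  # S => 0S1 => 00S11 => 000111
```

Because the only terminal-only production is S -> 01, the empty string is not generated, which is why the language is stated with n ≥ 1.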

12
Design of CFG from FA
Let M = (Q, Σ, δ, q0, F) be a finite automaton (DFA) accepting L. Then a grammar G = (V, Σ, P, S) can be constructed, where
V = {q0, q1, …, qn}, i.e., the states of the DFA are the variables of the grammar
Σ = Σ, i.e., the alphabet of the DFA gives the terminals of the grammar
S = q0, i.e., the start state of the DFA is the start symbol of the grammar
The productions can be obtained as follows:
If δ(qi, a) = qj then the production is qi → a qj
If q ∈ F then q → ϵ
For example, a DFA with a single accepting state S and the transition δ(S, a) = S gives P = {S → aS | ϵ}
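
The construction can be sketched in Python as follows. The function name `dfa_to_grammar`, the dictionary encoding of δ, and the two-state example DFA are assumptions chosen for illustration; state names are assumed to be single characters so that a right-hand side can be written as a plain string.

```python
def dfa_to_grammar(delta, finals):
    """Build right-linear productions from a DFA.

    delta  maps (state, symbol) -> next state
    finals is the set of accepting states
    Returns a dict: variable -> list of right-hand sides,
    where "" stands for the empty string.
    """
    productions = {}
    for (state, symbol), target in delta.items():
        # delta(qi, a) = qj gives the production qi -> a qj
        productions.setdefault(state, []).append(symbol + target)
    for state in finals:
        # every accepting state q gives the production q -> epsilon
        productions.setdefault(state, []).append("")
    return productions


# Illustrative two-state DFA (an assumption, not taken from a figure).
delta = {("S", "0"): "A", ("S", "1"): "S",
         ("A", "0"): "A", ("A", "1"): "A"}
grammar = dfa_to_grammar(delta, finals={"A"})
for var, rhss in grammar.items():
    print(var, "->", " | ".join(r or "ϵ" for r in rhss))
```

For this example the sketch prints `S -> 0A | 1S` and `A -> 0A | 1A | ϵ`.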

25
Obtain the grammar for the following DFA
Sol:
Transition              Production
δ(S, 0) = A             S → 0A
δ(S, 1) = S             S → 1S
δ(A, 0) = A             A → 0A
δ(A, 1) = A             A → 1A
A is a final state      A → ϵ

Therefore the grammar is G = (V, Σ, P, S), where
V = {S, A}, Σ = {0, 1}, the start symbol is S, and
P = {S → 0A | 1S, A → 0A | 1A | ϵ}

26
Design of CFG from Languages
Obtain a CFG for the following language: L = {a^n b^n | n ≥ 0}
Sol:
When n = 0, the string is ϵ, so the production is S → ϵ
When n = 1, the string is ab; in general, for every a one b has to be generated
Therefore the production is S → aSb
The language L contains the strings ϵ, ab, aabb, aaabbb, …
The derivations of the strings ab and aabb are as follows:
S ⇒ aSb ⇒ aϵb = ab
S ⇒ aSb ⇒ aaSbb ⇒ aaϵbb = aabb
The CFG is G = (V, Σ, P, S), where
V = {S}, Σ = {a, b}, the start symbol is S, and
P = {S → aSb | ϵ}

27
Obtain a CFG for the set of all palindromes over Σ = {a, b}
Sol:
ϵ is a palindrome, therefore S → ϵ
a and b are palindromes, therefore S → a | b
If w is a palindrome then awa and bwb are palindromes, therefore S → aSa | bSb
The language L contains strings such as aba, bab, abba, ababa, babab, …

Using the above productions we can derive palindromes as follows:

S ⇒ aSa ⇒ aba
S ⇒ bSb ⇒ baSab ⇒ babab

Therefore the CFG is G = (V, Σ, P, S), where
V = {S}, Σ = {a, b}, the start symbol is S, and
P = {S → aSa | bSb | a | b | ϵ}
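
A minimal Python sketch of a recognizer that mirrors these productions directly is shown below. The function name `derives_palindrome` is a hypothetical choice; each branch of the function corresponds to one group of productions.

```python
def derives_palindrome(w: str) -> bool:
    """Return True if S =>* w under S -> aSa | bSb | a | b | epsilon."""
    if any(ch not in "ab" for ch in w):
        return False                        # only the terminals a and b are allowed
    if len(w) <= 1:
        return True                         # S -> epsilon, S -> a, S -> b
    if w[0] == w[-1]:
        return derives_palindrome(w[1:-1])  # S -> aSa or S -> bSb
    return False


print(derives_palindrome("babab"))  # True
print(derives_palindrome("ab"))     # False
```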

28
Obtain the CFG for the regular expression (0+1)*01
Sol:
The regular expression represents strings of 0’s and 1’s ending with 01.

(0+1)* represents strings of 0’s and 1’s, including the null string; it is generated by A → 0A | 1A | ϵ.

Therefore the CFG is G = (V, Σ, P, S), where
V = {S, A}, Σ = {0, 1}, the start symbol is S, and
P = {S → A01, A → 0A | 1A | ϵ}

29
Parsing or Derivation
• A derivation processes a string by a sequence of substitutions; parsing a string means finding such a derivation for it.
• Each step replaces a non-terminal in the string by the right-hand side of a production whose left-hand side is the non-terminal to be replaced.
• Each derivation step produces a new string from a given string.
• Derivation steps can be applied repeatedly to obtain a new string from a given string.
• If the string obtained contains only terminal symbols, then no further derivation steps are possible.
• There are two forms of derivation:
◦ Leftmost derivation
◦ Rightmost derivation

30
Example: Leftmost Derivations

• Balanced-parentheses grammar:
S -> SS | (S) | ()
• S =>lm SS =>lm (S)S =>lm (())S =>lm (())()
• Thus, S =>*lm (())()
• S => SS => S() => (S)() => (())() is a derivation, but not a leftmost derivation.
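
The leftmost derivation above can be replayed mechanically. The sketch below is a hedged illustration, not part of the slides: `PRODUCTIONS` and `leftmost_derivation` are assumed names, and the caller supplies the right-hand side to use at each step, which is always applied to the leftmost variable.

```python
PRODUCTIONS = {"S": ["SS", "(S)", "()"]}


def leftmost_derivation(rule_choices, start="S"):
    """Apply each chosen right-hand side to the leftmost variable in turn."""
    form = start
    steps = [form]
    for rhs in rule_choices:
        # Position of the leftmost variable in the current sentential form.
        pos = next(i for i, ch in enumerate(form) if ch in PRODUCTIONS)
        if rhs not in PRODUCTIONS[form[pos]]:
            raise ValueError(f"{form[pos]} -> {rhs} is not a production")
        form = form[:pos] + rhs + form[pos + 1:]
        steps.append(form)
    return steps


steps = leftmost_derivation(["SS", "(S)", "()", "()"])
print(" =>lm ".join(steps))  # S =>lm SS =>lm (S)S =>lm (())S =>lm (())()
```

A rightmost derivation could be replayed the same way by searching for the last variable instead of the first.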

32
Example: Rightmost Derivations

• Balanced-parentheses grammar:
S -> SS | (S) | ()
• S =>rm SS =>rm S() =>rm (S)() =>rm (())()
• Thus, S =>*rm (())()
• S => SS => SSS => S()S => ()()S => ()()() is neither a rightmost nor a leftmost derivation.

34
Parse Trees
• Parse trees are trees labeled by symbols of a particular CFG.
• Leaves: labeled by a terminal or ε.
• Interior nodes: labeled by a variable.
◦ Children are labeled by the right side of a production for the parent.
• Root: must be labeled by the start symbol.

35
Example: Parse Tree

S -> SS | (S) | ()

          S
        /   \
       S     S
     / | \  / \
    (  S  ) ( )
      / \
     (   )

36
Yield of a Parse Tree
• The concatenation of the labels of the leaves in left-to-right order is called the yield of the parse tree.
• Example: the yield of the following parse tree is (())()

          S
        /   \
       S     S
     / | \  / \
    (  S  ) ( )
      / \
     (   )
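
The yield can be computed by a left-to-right traversal of the leaves. The following Python sketch assumes a simple encoding that is not from the slides: a terminal leaf is a string, and an interior node is a pair of a variable name and a list of children.

```python
# A node is either a terminal (a string) or a pair (variable, children).
tree = ("S", [
    ("S", ["(", ("S", ["(", ")"]), ")"]),   # left subtree:  S -> (S), inner S -> ()
    ("S", ["(", ")"]),                       # right subtree: S -> ()
])


def tree_yield(node) -> str:
    """Concatenate the leaf labels of the tree in left-to-right order."""
    if isinstance(node, str):
        return node
    _label, children = node
    return "".join(tree_yield(child) for child in children)


print(tree_yield(tree))  # (())()
```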
37
Consider the following five rules:
S → AB, A → a, A → aA, B → b, B → bB
• Here, S, A, and B are variables, S is the start variable, and a and b are terminals. We use these rules to derive strings consisting of terminals (i.e., elements of {a, b}), in the following manner:
• Initialize the current string to be the string consisting of the start variable S.
• Take any variable in the current string and take any rule that has this variable on the left-hand side. Then, in the current string, replace this variable by the right-hand side of the rule.
• Repeat until the current string only contains terminals.
For example, the string aaaabb can be derived in the following way:
Leftmost derivation      Rightmost derivation
S ⇒ AB                   S ⇒ AB
  ⇒ aAB                    ⇒ AbB
  ⇒ aaAB                   ⇒ Abb
  ⇒ aaaAB                  ⇒ aAbb
  ⇒ aaaaB                  ⇒ aaAbb
  ⇒ aaaabB                 ⇒ aaaAbb
  ⇒ aaaabb                 ⇒ aaaabb
38
Ambiguous Grammars
• An ambiguous grammar is a CFG for which there exists a string that can have more than one leftmost derivation.
• Equivalently, a CFG is ambiguous if there is a string in the language that is the yield of two or more parse trees.
• Example: S -> SS | (S) | ()

39
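Example: Two parse trees for ()()()

First tree (the left child of the root derives ()()):

          S
        /   \
       S     S
      / \   / \
     S   S  ( )
    / \ / \
    ( ) ( )

Second tree (the right child of the root derives ()()):

          S
        /   \
       S     S
      / \   / \
      ( )  S   S
          / \ / \
          ( ) ( )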
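
Ambiguity can be checked for a specific string by counting its parse trees. The sketch below is an illustrative assumption (the name `count_parse_trees` is hypothetical): it counts, for every substring, the trees rooted at S by trying each of the three productions, and caches the results.

```python
from functools import lru_cache


def count_parse_trees(w: str) -> int:
    """Count parse trees of w under S -> SS | (S) | ()."""

    @lru_cache(maxsize=None)
    def count(i: int, j: int) -> int:
        # Number of parse trees with root S whose yield is w[i:j].
        s = w[i:j]
        total = 0
        if s == "()":
            total += 1                             # S -> ()
        if len(s) > 2 and s[0] == "(" and s[-1] == ")":
            total += count(i + 1, j - 1)           # S -> (S)
        for k in range(i + 1, j):
            total += count(i, k) * count(k, j)     # S -> SS, split at k
        return total

    return count(0, len(w))


print(count_parse_trees("()()()"))  # 2
print(count_parse_trees("(())()"))  # 1
```

A count greater than 1 for any string, as for ()()() here, is enough to show that the grammar is ambiguous.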

40
41
• The grammar can be converted into an unambiguous grammar based on operator precedence.
• The order of precedence is as follows:
◦ Identifiers have the highest precedence (e.g., a, b, c, 0, 1, …)
◦ Expressions within “()”
◦ * and /, whichever occurs first from left to right
◦ + and -, whichever occurs first from left to right

42
• Assuming all these operators are left associative, the new productions can be formed as follows:
I → a | b | c | 0 | 1
F → (E) | I
T → T*F | T/F | F
E → E+T | E-T | T

So the final grammar, which is unambiguous, is shown below:
E → E+T | E-T | T
T → T*F | T/F | F
F → (E) | I
I → a | b | c | 0 | 1
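
How this grammar fixes precedence and associativity can be seen with a small parser. The following Python sketch is one possible reading, not the method of the slides: it is a recursive-descent parser in which the left-recursive rules for E and T are handled with loops (an assumed but standard technique), and `parse_expression` is a hypothetical name. It returns a fully parenthesized string showing the grouping.

```python
def parse_expression(text: str) -> str:
    """Parse text with E -> E+T | E-T | T, T -> T*F | T/F | F,
    F -> (E) | I, I -> a|b|c|0|1 and return a fully parenthesized form."""
    pos = 0

    def peek() -> str:
        return text[pos] if pos < len(text) else ""

    def factor() -> str:                 # F -> (E) | I
        nonlocal pos
        ch = peek()
        if ch == "(":
            pos += 1
            inner = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            pos += 1
            return inner
        if ch and ch in "abc01":         # I -> a | b | c | 0 | 1
            pos += 1
            return ch
        raise SyntaxError(f"unexpected {ch!r} at position {pos}")

    def term() -> str:                   # T -> T*F | T/F | F
        nonlocal pos
        left = factor()
        while peek() and peek() in "*/":
            op = peek()
            pos += 1
            right = factor()
            left = f"({left}{op}{right})"   # group to the left
        return left

    def expr() -> str:                   # E -> E+T | E-T | T
        nonlocal pos
        left = term()
        while peek() and peek() in "+-":
            op = peek()
            pos += 1
            right = term()
            left = f"({left}{op}{right})"   # group to the left
        return left

    result = expr()
    if pos != len(text):
        raise SyntaxError(f"unexpected {peek()!r} at position {pos}")
    return result


print(parse_expression("a+b*c"))  # (a+(b*c))
print(parse_expression("a-b-c"))  # ((a-b)-c)
```

The first output shows * binding tighter than +, and the second shows left associativity of -.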

43
Pushdown Automata
• The PDA is an automaton equivalent to the CFG in language-defining power.
• Only the nondeterministic PDA defines all the CFLs.
• But the deterministic version models parsers.
◦ Most programming languages have deterministic PDAs.

44
• A PDA is an ε-NFA with the additional power that it can manipulate a stack.
• Its moves are determined by:
1. The current state (of its “NFA”),
2. The current input symbol (or ε), and
3. The current symbol on top of its stack.
• The presence of a stack means that, unlike the finite automaton, the pushdown automaton can remember an unbounded amount of information, although only the top of the stack is accessible at any time.

45
A pushdown automaton is a finite automaton with a stack data structure.

46
Definition: A pushdown automaton has 7 components. The components are as follows:
A pushdown automaton M is a system (Q, Σ, Γ, δ, q0, Z0, F), where
1) Q is a finite set of states;
2) Σ is an alphabet called the input alphabet;
3) Γ is an alphabet, called the stack alphabet;
4) q0 in Q is the initial state;
5) Z0 in Γ is a particular stack symbol called the start symbol;
6) F ⊆ Q is the set of final states;
7) δ is a mapping from Q × (Σ ∪ {ϵ}) × Γ to finite subsets of Q × Γ*.

47
The Transition Function

• Takes three arguments:
1. A state, in Q.
2. An input, which is either a symbol in Σ or ε.
3. A stack symbol in Γ.
• δ(q, a, Z) is a set of zero or more actions of the form (p, α).
◦ p is a state; α is a string of stack symbols.

48
Actions of the PDA
• If δ(q, a, Z) contains (p, α) among its actions, then one thing the PDA can do in state q, with a at the front of the input, and Z on top of the stack is:
1. Change the state to p.
2. Remove a from the front of the input (but a may be ε).
3. Replace Z on the top of the stack by α.

49
An Instantaneous Description (ID):
An ID is a triple (q, w, α), where:
1. q is the current state.
2. w is the remaining input.
3. α is the current contents of the stack, with the top at the left.
• If δ(q, a, Z) contains (p, α), then for any w and β the PDA can go from (q, aw, Zβ) to (p, w, αβ), i.e.,
• (q, aw, Zβ) derives (p, w, αβ) in one move, which is represented by
(q, aw, Zβ) ⊦ (p, w, αβ)
• Extend ⊦ to ⊦*, meaning “zero or more moves”.

50
Language Accepted by a PDA
• There are two ways to define the language of a PDA. One is by entering an accepting state (final state acceptance) and the other is by emptying its stack (empty stack acceptance). These methods are equivalent, i.e., a language is accepted by some PDA using one method if and only if it is accepted by some (possibly different) PDA using the other method.
◦ Final state acceptance: The language accepted is the set of all inputs for which some choice of moves causes the PDA to consume the input and enter a final state.
  (q0, w, Z0) ⊦* (p, ϵ, α) where p ∈ F and α ∈ Γ*
◦ Empty stack acceptance: The language accepted is the set of all inputs for which some choice of moves causes the PDA to consume the input and empty the stack.
  (q0, w, Z0) ⊦* (p, ϵ, ϵ) where p ∈ Q

51
Construct a PDA to accept the following language L = {0^n 1^n | n ≥ 1} by final state.
• Sol: The machine should accept n 0’s followed by n 1’s.
◦ Push all 0’s onto the stack. For each 1 we encounter, there should be a corresponding 0 on the stack. When the input pointer reaches the end of the string, the stack should contain only the initial symbol Z0.
◦ Let q0 be the start state and Z0 be the initial symbol on the stack. In the state q0, if the input symbol is 0, push it onto the stack.
The transitions are δ(q0, 0, Z0) = (q0, 0Z0) and
δ(q0, 0, 0) = (q0, 00)

52
• In the state q0, if the next input symbol is 1, then change the state to q1 and pop the top element from the stack.
δ(q0, 1, 0) = (q1, ϵ)
• In state q1, the rest of the symbols to be scanned will be only 1’s, and for each 1 there should be a corresponding 0 on the stack. If so, remain in state q1 and pop the element from the stack.
δ(q1, 1, 0) = (q1, ϵ)
• If the input is exhausted (the next move is on ϵ) and the top of the stack is Z0, change the state to q2, which is an accepting state. The transition is
δ(q1, ϵ, Z0) = (q2, Z0)

53
• Therefore the PDA M which accepts the language is M = (Q, Σ, Γ, δ, q0, Z0, F), where
• Q = {q0, q1, q2}, Σ = {0, 1}, Γ = {0, Z0}, the start state is q0, the start stack symbol is Z0, F = {q2}, and
• δ is as follows:
  δ(q0, 0, Z0) = (q0, 0Z0)
  δ(q0, 0, 0) = (q0, 00)
  δ(q0, 1, 0) = (q1, ϵ)
  δ(q1, 1, 0) = (q1, ϵ)
  δ(q1, ϵ, Z0) = (q2, Z0)

54
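
The five transitions of this PDA can be simulated directly. The Python sketch below is an illustration under stated assumptions: the names `DELTA`, `FINAL`, and `accepts_by_final_state` are hypothetical, the symbol "Z" stands for Z0 so that every stack symbol is a single character, and the search over IDs is written for the general nondeterministic case even though this particular machine never has a choice. The search terminates here because this machine's ϵ-moves cannot grow the stack.

```python
# (state, input symbol or "" for epsilon, stack top) -> set of (state, pushed string).
# The pushed string replaces the stack top; its first character is the new top.
DELTA = {
    ("q0", "0", "Z"): {("q0", "0Z")},
    ("q0", "0", "0"): {("q0", "00")},
    ("q0", "1", "0"): {("q1", "")},
    ("q1", "1", "0"): {("q1", "")},
    ("q1", "", "Z"): {("q2", "Z")},
}
FINAL = {"q2"}


def accepts_by_final_state(w: str) -> bool:
    """Explore all reachable IDs (state, remaining input, stack)."""
    start = ("q0", w, "Z")
    seen, frontier = {start}, [start]
    while frontier:
        state, rest, stack = frontier.pop()
        if not rest and state in FINAL:
            return True                       # input consumed in a final state
        if not stack:
            continue                          # no move is possible on an empty stack
        top, below = stack[0], stack[1:]
        moves = [("", rest)]                  # an epsilon-move consumes no input
        if rest:
            moves.append((rest[0], rest[1:]))
        for symbol, remaining in moves:
            for new_state, push in DELTA.get((state, symbol, top), ()):
                nxt = (new_state, remaining, push + below)
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(nxt)
    return False


print(accepts_by_final_state("0011"))   # True
print(accepts_by_final_state("00111"))  # False
```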
[Figure: PDA to accept strings of the language a^n b^n]
55
• If a language is accepted by a PDA by final state, then it is also accepted by some PDA by empty stack.
• To make the PDA above accept by empty stack, the only change is as follows:
◦ Replace the final transition
  δ(q1, ϵ, Z0) = (q2, Z0)
  with
  δ(q1, ϵ, Z0) = (q1, ϵ)
When a language is accepted by empty stack, the stack must not contain anything at the end, not even the initial stack symbol Z0.

56
Check the acceptability of the string 0011.
Sol: The following is the sequence of moves made by the PDA to check whether the string is accepted or not.
• The initial ID is (q0, 0011, Z0):
  (q0, 0011, Z0) ⊦ (q0, 011, 0Z0)
                 ⊦ (q0, 11, 00Z0)
                 ⊦ (q1, 1, 0Z0)
                 ⊦ (q1, ϵ, Z0)
                 ⊦ (q2, ϵ, Z0) is the final configuration.
Since q2 is the final state and the remaining input is ϵ in the final configuration, the string 0011 is accepted by the PDA.

57
• Check the acceptability of the string 00111.
• Sol: The following is the sequence of moves made by the PDA to check whether the string is accepted or not.
• The initial ID is (q0, 00111, Z0):
  (q0, 00111, Z0) ⊦ (q0, 0111, 0Z0)
                  ⊦ (q0, 111, 00Z0)
                  ⊦ (q1, 11, 0Z0)
                  ⊦ (q1, 1, Z0) is the final configuration.
Since the transition δ(q1, 1, Z0) is not defined, the remaining 1 cannot be consumed (the ϵ-move to q2 would leave it unread), so the input string 00111 is rejected by the PDA.

58
Obtain a PDA to accept the language L = {w c w^R | w ∈ (a+b)*}, where w^R is the reverse of w
Sol: Given that L = {w c w^R}.
• If w = aab, then the reverse of w is w^R = baa
• The language L therefore contains the string ‘aabcbaa’
• The resultant string is a palindrome.
• Push all the scanned symbols onto the stack until we find the symbol c.
• After the middle symbol, if the string is a palindrome, then for each input symbol there should be a corresponding symbol on the stack; pop that symbol.
• Finally, if there is no input left and the stack contains only Z0, the given string is accepted.
• The alphabet is Σ = {a, b, c}; let q0 be the start state of the PDA and Z0 the initial symbol on the stack.

60
• For the input symbols a or b of w:
• In state q0, push the input symbols onto the stack and remain in q0 until we encounter the middle symbol ‘c’
• The transitions are as follows:
  δ(q0, a, Z0) = (q0, aZ0)
  δ(q0, b, Z0) = (q0, bZ0)
  δ(q0, a, a) = (q0, aa)
  δ(q0, a, b) = (q0, ab)
  δ(q0, b, a) = (q0, ba)
  δ(q0, b, b) = (q0, bb)
• For the input symbol c, the top of the stack may be a, b, or Z0; move to state q1 and do not change the top of the stack
  δ(q0, c, Z0) = (q1, Z0)
  δ(q0, c, a) = (q1, a)
  δ(q0, c, b) = (q1, b)

61
• For the input symbols a or b of w^R:
• In the state q1, if the next input symbol is the same as the top of the stack, pop the element and remain in state q1. The transitions are:
  δ(q1, a, a) = (q1, ϵ)
  δ(q1, b, b) = (q1, ϵ)
• In the state q1, if the input is exhausted (the move is on ϵ) and the top of the stack is Z0, then move to state q2, which is the final state. The transition is
  δ(q1, ϵ, Z0) = (q2, Z0)

62
• Therefore the PDA M which accepts the language is M = (Q, Σ, Γ, δ, q0, Z0, F), where
• Q = {q0, q1, q2}, Σ = {a, b, c}, Γ = {a, b, Z0}, the start state is q0, the start stack symbol is Z0, F = {q2}, and
• δ is as follows:
  δ(q0, a, Z0) = (q0, aZ0)
  δ(q0, b, Z0) = (q0, bZ0)
  δ(q0, a, a) = (q0, aa)
  δ(q0, a, b) = (q0, ab)
  δ(q0, b, a) = (q0, ba)
  δ(q0, b, b) = (q0, bb)
  δ(q0, c, Z0) = (q1, Z0)
  δ(q0, c, a) = (q1, a)
  δ(q0, c, b) = (q1, b)
  δ(q1, a, a) = (q1, ϵ)
  δ(q1, b, b) = (q1, ϵ)
  δ(q1, ϵ, Z0) = (q2, Z0)
Show the acceptability of the following strings:
abacaba, babcbaa
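
One way to work through this exercise is to run the transition table and record each ID. The Python sketch below is an assumption-laden illustration rather than the intended solution: the names `DELTA` and `run` are hypothetical, "Z" stands for Z0, and the loop relies on the machine being deterministic (at most one move applies to any ID).

```python
# (state, input symbol or "" for epsilon, stack top) -> (state, pushed string)
DELTA = {}
for x in "ab":
    for top in "abZ":
        DELTA[("q0", x, top)] = ("q0", x + top)   # push x while reading w
    DELTA[("q1", x, x)] = ("q1", "")              # pop on a match while reading w^R
for top in "abZ":
    DELTA[("q0", "c", top)] = ("q1", top)         # middle marker: switch to q1
DELTA[("q1", "", "Z")] = ("q2", "Z")              # epsilon-move to the final state


def run(w: str):
    """Return (accepted, list of IDs) for the PDA defined by DELTA."""
    state, rest, stack = "q0", w, "Z"
    ids = [(state, rest, stack)]
    while stack:
        top = stack[0]
        if rest and (state, rest[0], top) in DELTA:
            state, push = DELTA[(state, rest[0], top)]
            rest = rest[1:]                        # consume one input symbol
        elif (state, "", top) in DELTA:
            state, push = DELTA[(state, "", top)]  # epsilon-move
        else:
            break                                  # no move is defined: halt
        stack = push + stack[1:]
        ids.append((state, rest, stack))
    return state == "q2" and rest == "", ids


for word in ("abacaba", "babcbaa"):
    accepted, ids = run(word)
    print(word, "accepted" if accepted else "rejected")
    for state, rest, stack in ids:
        print("   ", (state, rest or "ϵ", stack))
```

Under these assumptions, abacaba ends in state q2 with the input consumed, while babcbaa halts in q1 because no transition is defined for input a with b on top of the stack.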

63
• Deterministic PDA:
• Let M = (Q, Σ, Γ, δ, q0, Z0, F) be a PDA. The PDA is deterministic if the following two conditions are satisfied for every q ∈ Q and Z ∈ Γ:
◦ δ(q, a, Z) contains at most one element for every a ∈ Σ ∪ {ϵ}
◦ If δ(q, ϵ, Z) is not empty, then δ(q, a, Z) is empty for every a ∈ Σ
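
The usual determinism conditions (at most one move per triple, and no ϵ-move competing with an input move on the same state and stack top) can be checked mechanically on a transition table. In the Python sketch below, the function name `is_deterministic` and the dictionary encoding are assumptions for illustration; the table reuses the transitions of the w c w^R machine with "Z" standing for Z0, and the `guessing` variant is a hypothetical modification added only to show a failing case.

```python
def is_deterministic(delta, input_alphabet) -> bool:
    """delta maps (state, symbol or "" for epsilon, stack top) -> set of moves."""
    for (state, symbol, top), moves in delta.items():
        if len(moves) > 1:
            return False      # more than one move for the same triple
        if symbol == "" and moves:
            # an epsilon-move must not compete with any input move
            if any(delta.get((state, a, top)) for a in input_alphabet):
                return False
    return True


# Transitions of the w c w^R machine.
delta = {}
for x in "ab":
    for top in "abZ":
        delta[("q0", x, top)] = {("q0", x + top)}
    delta[("q1", x, x)] = {("q1", "")}
for top in "abZ":
    delta[("q0", "c", top)] = {("q1", top)}
delta[("q1", "", "Z")] = {("q2", "Z")}

print(is_deterministic(delta, "abc"))     # True

# Hypothetical variant: guessing the middle with an epsilon-move
# competes with the input move on (q0, a, a).
guessing = dict(delta)
guessing[("q0", "", "a")] = {("q1", "a")}
print(is_deterministic(guessing, "abc"))  # False
```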

64
• Show that the PDA for the language L = {w c w^R | w ∈ (a+b)*}, where w^R is the reverse of w, is deterministic.
Sol. The transitions defined for the machine are
  δ(q0, a, Z0) = (q0, aZ0)
  δ(q0, b, Z0) = (q0, bZ0)
  δ(q0, a, a) = (q0, aa)
  δ(q0, a, b) = (q0, ab)
  δ(q0, b, a) = (q0, ba)
  δ(q0, b, b) = (q0, bb)
  δ(q0, c, Z0) = (q1, Z0)
  δ(q0, c, a) = (q1, a)
  δ(q0, c, b) = (q1, b)
  δ(q1, a, a) = (q1, ϵ)
  δ(q1, b, b) = (q1, ϵ)
  δ(q1, ϵ, Z0) = (q2, Z0)
• For each q ∈ Q, a ∈ Σ ∪ {ϵ}, Z ∈ Γ, at most one move is defined. Hence all the transitions are unique.
• The only ϵ-move is δ(q1, ϵ, Z0), and no transition δ(q1, a, Z0) with a ∈ Σ is defined, so the ϵ-move never competes with a move on an input symbol.
• Hence the PDA is deterministic.

65
