08 CFG

Uploaded by

subah8245
Copyright
© All Rights Reserved
Context-free Grammar (CFG)

Context-free languages (CFLs)


Precedence in arithmetic expressions
bash-3.2$ python
Python 2.6.5 (r265:79359, Mar 24 2010, 01:32:55)
>>> 2+3*5
17

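[Figure: two parse trees for 2 + 3 * 5. With * at the root the tree means (2 + 3) * 5 = 25; with + at the root it means 2 + (3 * 5) = 17.]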
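Grammars describe meaning

Rules for valid (simple) arithmetic expressions:
EXPR → EXPR + TERM
EXPR → TERM
TERM → TERM * NUM
TERM → NUM
NUM → 0-9

[Figure: parse tree of 2 + 3 * 5 under these rules. The root is EXPR → EXPR + TERM; the left EXPR derives 2 via TERM → NUM, and the right TERM → TERM * NUM derives 3 * 5.]

Rules always yield the correct meaning: 2 + 3 * 5 is read as 2 + (3 * 5).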
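To see that the rules, not the reading order, fix the meaning, here is a minimal Python sketch of a recursive-descent evaluator for this grammar. It assumes single-digit numbers and no parentheses, as on the slide; the left-recursive rules EXPR → EXPR + TERM and TERM → TERM * NUM are rewritten as loops (a standard trick that keeps the left-to-right grouping), and the function names are illustrative only.

```python
DIGITS = set("0123456789")


def evaluate(text):
    """Evaluate a single-digit expression built from + and *.

    Follows EXPR -> EXPR + TERM | TERM, TERM -> TERM * NUM | NUM, NUM -> 0-9,
    with the left recursion turned into loops.
    """
    tokens = [ch for ch in text if not ch.isspace()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def num():  # NUM -> 0-9
        nonlocal pos
        tok = peek()
        if tok not in DIGITS:
            raise SyntaxError(f"expected a digit at token {pos}, got {tok!r}")
        pos += 1
        return int(tok)

    def term():  # TERM -> TERM * NUM | NUM
        nonlocal pos
        value = num()
        while peek() == "*":
            pos += 1
            value *= num()
        return value

    def expr():  # EXPR -> EXPR + TERM | TERM
        nonlocal pos
        value = term()
        while peek() == "+":
            pos += 1
            value += term()
        return value

    result = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return result


print(evaluate("2 + 3 * 5"))  # 17, i.e. 2 + (3 * 5)
```

Because * is only introduced below TERM, a product can never become the left operand of a + that is applied later, which is why the result agrees with the Python session above.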


The grammar of English

SENTENCE → NOUN-PHRASE VERB-PHRASE
  e.g. "a girl likes the boy": "a girl" is the NOUN-PHRASE, "likes the boy" is the VERB-PHRASE

NOUN-PHRASE → A-NOUN
         or → A-NOUN PREP-PHRASE
  e.g. "a girl" is an A-NOUN; "a girl with a flower" is A-NOUN PREP-PHRASE

PREP-PHRASE → PREP NOUN-PHRASE   (recursive structure)
  e.g. "with a flower" is PREP NOUN-PHRASE
The grammar of (parts of) English

SENTENCE → NOUN-PHRASE VERB-PHRASE
NOUN-PHRASE → A-NOUN
NOUN-PHRASE → A-NOUN PREP-PHRASE
VERB-PHRASE → CMPLX-VERB
VERB-PHRASE → CMPLX-VERB PREP-PHRASE
PREP-PHRASE → PREP A-NOUN
A-NOUN → ARTICLE NOUN
CMPLX-VERB → VERB NOUN-PHRASE
CMPLX-VERB → VERB

ARTICLE → a
ARTICLE → the
NOUN → boy
NOUN → girl
NOUN → flower
VERB → likes
VERB → touches
VERB → sees
PREP → with
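The meaning of sentences

[Figure: parse tree of "a girl with a flower likes the boy", in bracket notation:
[SENTENCE [NOUN-PHRASE [A-NOUN [ARTICLE a] [NOUN girl]] [PREP-PHRASE [PREP with] [A-NOUN [ARTICLE a] [NOUN flower]]]] [VERB-PHRASE [CMPLX-VERB [VERB likes] [NOUN-PHRASE [A-NOUN [ARTICLE the] [NOUN boy]]]]]]]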
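A grammar like this can also be run. The sketch below is a small backtracking recognizer in Python for the grammar of (parts of) English; the dictionary encoding and the names `ends`, `ends_seq` and `count_parses` are hypothetical, chosen for this example. It counts how many parse trees the grammar gives a sentence. This naive top-down search terminates here only because the grammar has no left recursion and no ε-rules; it is not a general parser.

```python
# The grammar of (parts of) English, one list of bodies per variable.
GRAMMAR = {
    "SENTENCE": [["NOUN-PHRASE", "VERB-PHRASE"]],
    "NOUN-PHRASE": [["A-NOUN"], ["A-NOUN", "PREP-PHRASE"]],
    "VERB-PHRASE": [["CMPLX-VERB"], ["CMPLX-VERB", "PREP-PHRASE"]],
    "PREP-PHRASE": [["PREP", "A-NOUN"]],
    "A-NOUN": [["ARTICLE", "NOUN"]],
    "CMPLX-VERB": [["VERB", "NOUN-PHRASE"], ["VERB"]],
    "ARTICLE": [["a"], ["the"]],
    "NOUN": [["boy"], ["girl"], ["flower"]],
    "VERB": [["likes"], ["touches"], ["sees"]],
    "PREP": [["with"]],
}


def ends(symbol, words, i):
    """Yield j once for every way `symbol` can derive words[i:j]."""
    if symbol not in GRAMMAR:  # a terminal must match the next word
        if i < len(words) and words[i] == symbol:
            yield i + 1
        return
    for body in GRAMMAR[symbol]:
        yield from ends_seq(body, words, i)


def ends_seq(body, words, i):
    """Yield j once for every way the symbols of `body` can derive words[i:j]."""
    if not body:
        yield i
        return
    for j in ends(body[0], words, i):
        yield from ends_seq(body[1:], words, j)


def count_parses(sentence):
    """Number of parse trees with root SENTENCE that cover the whole sentence."""
    words = sentence.split()
    return sum(1 for j in ends("SENTENCE", words, 0) if j == len(words))


print(count_parses("a girl with a flower likes the boy"))  # 1
print(count_parses("a girl likes the boy with a flower"))  # 2
```

The second sentence gets two parse trees: "with a flower" can be the PREP-PHRASE of the VERB-PHRASE, or part of the NOUN-PHRASE "the boy with a flower".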
Context-free grammar

A → 0A1
A → B
B → #

• Productions: the three rules above
• Variables: A, B; the start variable is A
• Terminals: 0, 1, #

Derivation: A ⇒ 0A1 ⇒ 00A11 ⇒ 000A111 ⇒ 000B111 ⇒ 000#111
Context-free grammar
• A context-free grammar is given by (N, Σ, P, S) where
– N is a finite set of variables or non-terminals
– Σ is a finite set of terminals
– P is a set of productions or substitution rules of the form A → α, where A is a variable and α is a string of variables and terminals
– S is a variable called the start variable
Notation and conventions

E → E + E      A → 0A
E → (E)        A → 1A
E → A          A → 0
               A → 1

Non-terminals: E, A
Terminals: +, *, (, ), 0, 1
Start variable: E

Shorthand:
E → E + E | (E) | A
A → 0A | 1A | 0 | 1

Conventions:
• Variables in UPPERCASE
• Start variable comes first
Derivation
• A derivation is a sequential application of productions:

E → E + E | (E) | A
A → 0A | 1A | 0 | 1

E ⇒ E + E
  ⇒ (E) + E
  ⇒ (E) + A
  ⇒ (E) + 1
  ⇒ (E + E) + 1
  ⇒ (E + A) + 1
  ⇒ (A + A) + 1
  ⇒ (A + 1A) + 1
  ⇒ (A + 10) + 1
  ⇒ (1 + 10) + 1

• α ⇒ β: one production is applied
• E ⇒* (1 + 10) + 1, or in general α ⇒* β: a derivation
Context-free languages
• The language of a CFG is the set of all strings of terminals generated by the grammar:
L(G) = {w : w ∈ Σ* and S ⇒* w}

• Questions we will ask:
– I give you a CFG, what is the language?
– I give you a language, write a CFG for it
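The definition of L(G) suggests a brute-force way to list the short strings of a language: expand sentential forms breadth-first and keep those with no variables left. This is a minimal Python sketch, assuming single-character variables and no ε-productions (so a sentential form never gets shorter, and forms longer than the bound can be dropped). The name `language_up_to` is hypothetical; the grammar is A → 0A1 | B, B → # from the earlier slide.

```python
from collections import deque

GRAMMAR = {"A": ["0A1", "B"], "B": ["#"]}


def language_up_to(grammar, start, max_len):
    """All terminal strings of length <= max_len derivable from `start`.

    Assumes no epsilon-productions, so sentential forms never shrink.
    """
    found = set()
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # position of the leftmost variable, or None if form is all terminals
        i = next((k for k, sym in enumerate(form) if sym in grammar), None)
        if i is None:
            found.add(form)
            continue
        for body in grammar[form[i]]:
            new = form[:i] + body + form[i + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return sorted(found, key=lambda s: (len(s), s))


print(language_up_to(GRAMMAR, "A", 5))  # ['#', '0#1', '00#11']
```

Expanding only the leftmost variable loses nothing: every string in L(G) has a leftmost derivation.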


Analysis example

A → 0A1 | B
B → #
L(G) = {0^n#1^n : n ≥ 0}

• Can you derive:
– 00#11? Yes: A ⇒ 0A1 ⇒ 00A11 ⇒ 00B11 ⇒ 00#11
– #? Yes: A ⇒ B ⇒ #
– 00#111? No, the numbers of 0s and 1s are unequal
– 00##11? No, there are too many #

Analysis example

S → SS | (S) | ε

• Can you derive:
– ()? S ⇒ (S) ⇒ ()   (using S → (S), then S → ε)
– (()())? S ⇒ (S) ⇒ (SS) ⇒ ((S)S) ⇒ ((S)(S)) ⇒ (()(S)) ⇒ (()())
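Parse trees

S → SS | (S) | ε

• A parse tree gives a more compact representation:

S ⇒ (S) ⇒ (SS) ⇒ ((S)S) ⇒ ((S)(S)) ⇒ (()(S)) ⇒ (()())

[Figure: parse tree of (()()). The root S has children (, S, ); that S has children S S; each of these has children (, S, ), and each innermost S derives ε.]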
Parse trees

[Figure: the parse tree of (()()) from the previous slide.]

Four different derivations of (()()), all with this same parse tree:

S ⇒ (S) ⇒ (SS) ⇒ ((S)S) ⇒ ((S)(S)) ⇒ (()(S)) ⇒ (()())
S ⇒ (S) ⇒ (SS) ⇒ (S(S)) ⇒ ((S)(S)) ⇒ (()(S)) ⇒ (()())
S ⇒ (S) ⇒ (SS) ⇒ ((S)S) ⇒ (()S) ⇒ (()(S)) ⇒ (()())
S ⇒ (S) ⇒ (SS) ⇒ (S(S)) ⇒ (S()) ⇒ ((S)()) ⇒ (()())

One parse tree can represent many derivations
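The derivations differ only in the order in which the nodes of the tree are expanded. The Python sketch below replays a parse tree as a derivation; the tuple encoding of trees and the function names are assumptions for this example. Always expanding the leftmost unexpanded node gives the leftmost derivation, and always expanding the rightmost one gives the rightmost derivation.

```python
def render(form):
    """Show a sentential form: an unexpanded node shows its variable, a terminal itself."""
    return "".join(node[0] if isinstance(node, tuple) else node for node in form)


def derivation(tree, leftmost=True):
    """Replay the derivation encoded by a parse tree.

    A node is (variable, children); a terminal is a plain string; an empty
    children list stands for an epsilon-production.
    """
    form = [tree]
    steps = [render(form)]
    while True:
        unexpanded = [i for i, node in enumerate(form) if isinstance(node, tuple)]
        if not unexpanded:
            return steps
        i = unexpanded[0] if leftmost else unexpanded[-1]
        form[i:i + 1] = form[i][1]  # replace the node by its children
        steps.append(render(form))


EPS = ("S", [])                                # S -> ε
PAIR = ("S", ["(", EPS, ")"])                  # S -> (S), deriving ()
TREE = ("S", ["(", ("S", [PAIR, PAIR]), ")"])  # the parse tree of (()())

print(" ⇒ ".join(derivation(TREE)))
# S ⇒ (S) ⇒ (SS) ⇒ ((S)S) ⇒ (()S) ⇒ (()(S)) ⇒ (()())
print(" ⇒ ".join(derivation(TREE, leftmost=False)))
# S ⇒ (S) ⇒ (SS) ⇒ (S(S)) ⇒ (S()) ⇒ ((S)()) ⇒ (()())
```

Any other fixed order of expanding the nodes yields another derivation with the same tree.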


Analysis example

S → SS | (S) | ε

• Can you derive:
– (()()? No, because the numbers of ( and ) are unequal
– ())(()? No, because there is a prefix with an excess of )
Analysis example

S → SS | (S) | ε

L(G) = {w : w has the same number of ( and ), and no prefix of w has more ) than (}

Parsing rules:
• Divide w up into blocks with the same number of ( and )
• Each block is in L(G)
• Parse each block recursively

[Figure: parse tree of (()())(), built by splitting it into the blocks (()()) and ().]
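The characterization of L(G) and the parsing rules translate directly into code. In this minimal Python sketch (function names hypothetical), `in_language` checks the two conditions with a running counter, `blocks` splits a string of L(G) into its minimal blocks with equal numbers of ( and ), and `parse` recurses into the inside of each block, returning a nested list that mirrors the parse tree.

```python
def in_language(w):
    """w is in L(G) iff it has as many ( as ) and no prefix has more ) than (."""
    depth = 0
    for ch in w:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # this prefix has an excess of )
                return False
        else:
            return False   # not a parenthesis at all
    return depth == 0      # same number of ( and )


def blocks(w):
    """Split a string of L(G) into its minimal blocks with equal numbers of ( and )."""
    result, depth, begin = [], 0, 0
    for k, ch in enumerate(w):
        depth += 1 if ch == "(" else -1
        if depth == 0:     # counts balance again: one block ends here
            result.append(w[begin:k + 1])
            begin = k + 1
    return result


def parse(w):
    """Each minimal block is ( x ) with x in L(G); parse x recursively."""
    return [parse(block[1:-1]) for block in blocks(w)]


print(in_language("(()())"), in_language("(()()"), in_language("())(()"))  # True False False
print(blocks("(()())()"))  # ['(()())', '()']
print(parse("(()())()"))   # [[[], []], []]
```

Several blocks side by side correspond to S → SS, and stripping the outer pair of a block corresponds to S → (S).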
Design example

L = {0^n1^n | n ≥ 0}

These strings have recursive structure:

000000111111
0000011111
00001111
000111
0011
01

S → 0S1 | ε
Design example

L = numbers without leading zeros

allowed: 0, 109, 2, 23
not allowed: 01, 003

S → 0 | LB
B → DB | ε      (B: any number)
D → 0 | L
L → 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9   (L: leading digit)

Example: 1052870032 has leading digit L = 1, followed by B = 052870032.
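The grammar reads as "either the single digit 0, or a nonzero leading digit followed by any digits". Below is a Python sketch of a membership test that mirrors the productions; the function name is hypothetical, and the cross-check against the regular expression 0|[1-9][0-9]* (an equivalent description assumed for this example) uses Python's `re.fullmatch`.

```python
import re

NONZERO = set("123456789")  # L -> 1 | 2 | ... | 9  (leading digit)
DIGITS = NONZERO | {"0"}    # D -> 0 | L


def no_leading_zeros(w):
    """Membership test mirroring S -> 0 | LB, B -> DB | ε."""
    if w == "0":            # S -> 0
        return True
    # S -> LB: a leading digit 1-9, then B derives any string of digits
    return w[:1] in NONZERO and all(ch in DIGITS for ch in w[1:])


for w in ["0", "109", "2", "23", "1052870032", "01", "003"]:
    # cross-check against an equivalent regular expression
    assert no_leading_zeros(w) == bool(re.fullmatch(r"0|[1-9][0-9]*", w))
    print(w, no_leading_zeros(w))
```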
Design examples

L = {0^n1^n0^m1^m | n ≥ 0, m ≥ 0}
e.g. 010011, 00110011, 000111

These strings have two parts:
L = L1L2
L1 = {0^n1^n | n ≥ 0}
L2 = {0^m1^m | m ≥ 0}

Rules for L1: S1 → 0 S1 1 | ε
L2 is the same as L1, so:
S → S1 S1
S1 → 0 S1 1 | ε
Design examples

L = {0^n1^m0^m1^n | n ≥ 0, m ≥ 0}
e.g. 011001, 0011, 1100, 00110011

These strings have nested structure:
outer part: 0^n ... 1^n
inner part: 1^m0^m

S → 0S1 | A
A → 1A0 | ε
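The nesting can be checked by peeling the string from both ends, exactly as the productions build it. In this minimal Python sketch (function names hypothetical), `derives_S` strips an outer 0 ... 1 pair for S → 0S1 or hands the string over to A, and `derives_A` strips 1 ... 0 pairs for A → 1A0 until only ε remains.

```python
def derives_A(w):
    """A -> 1A0 | ε: strip matching 1 ... 0 pairs until nothing is left."""
    if w == "":
        return True
    return len(w) >= 2 and w[0] == "1" and w[-1] == "0" and derives_A(w[1:-1])


def derives_S(w):
    """S -> 0S1 | A: strip an outer 0 ... 1 pair, or hand the string over to A."""
    if len(w) >= 2 and w[0] == "0" and w[-1] == "1" and derives_S(w[1:-1]):
        return True
    return derives_A(w)


for w in ["011001", "0011", "1100", "00110011", "0110", "1001"]:
    print(w, derives_S(w))
```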
Context-Free Grammar
• A context-free grammar G = (N, Σ, P, S), where
– N: set of variables or non-terminals
– Σ: set of terminals
– P: set of productions, each of which is of the form
  A → α1 | α2 | …
  • where each αi is an arbitrary string of variables and terminals
– S: start variable

What is L(G)?
G: S → 0S0 | 1S1 | ε
Examples

What is L(G)?
– G: S → 0S0 | 1S1 | 0 | 1

• Give a CFG for the set of palindromes over the alphabet {0, 1}
Example

• A grammar for L = {0^m1^n | m ≥ n}

• CFG?
S => 0S1 | A
A => 0A | ε
Design examples

L = {x : x has two 0-blocks with the same number of 0s}

allowed: 01011, 001011001, 10010101001
not allowed: 01001000, 01111

Split x into an initial part A, a middle part B and a final part C, e.g. 10010011010010110:
• A: ε, or ends in 1
• C: ε, or begins with 1
• B has recursive structure, e.g. 00110100: the same number of 0s at both ends, at least one 0 on each side, and a middle part D that begins and ends in 1

S → ABC
A → ε | U1
U → 0U | 1U | ε   (U: any string)
C → ε | 1U
B → 0D0 | 0B0     (same number of 0s on both sides, at least one 0)
D → 1U1 | 1       (D: begins and ends in 1)

Context-free versus regular
• Write a CFG for the language (0 + 1)*111
S → U111
U → 0U | 1U | ε

• Can you do so for every regular language?

Every regular language is context-free

[Figure: regular expression, NFA and DFA, the three equivalent descriptions of regular languages]
From regular to context-free

regular expression      CFG
∅                       grammar with no rules
ε                       S → ε
a (alphabet symbol)     S → a
E1 + E2                 S → S1 | S2
E1E2                    S → S1S2
E1*                     S → SS1 | ε

(S becomes the new start symbol; S1 and S2 are the start symbols of the grammars for E1 and E2)
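The table is a recursive translation, one row per regular-expression operator. The Python sketch below follows it, with two assumptions of its own: regular expressions are given as nested tuples (a hypothetical encoding), and each sub-expression gets a fresh variable S0, S1, ... instead of reusing the names S, S1, S2. For E1* it emits S → S S1 | ε, as in the table.

```python
from itertools import count

# Hypothetical encoding: ("empty",) for ∅, ("eps",) for ε, ("sym", a),
# ("union", E1, E2), ("concat", E1, E2), ("star", E1).


def regex_to_cfg(regex):
    """Translate a regular expression into a CFG, one table row per construct.

    Returns (start, rules): rules maps each variable to a list of bodies; a body
    is a tuple of symbols (variables are named "S0", "S1", ...; anything else is
    a terminal), and the empty tuple is an ε-body.
    """
    rules = {}
    fresh = count()

    def build(node):
        var = f"S{next(fresh)}"
        rules[var] = []                   # ∅ keeps this empty: no rules at all
        kind = node[0]
        if kind == "eps":                 # S -> ε
            rules[var].append(())
        elif kind == "sym":               # S -> a
            rules[var].append((node[1],))
        elif kind == "union":             # S -> S1 | S2
            s1, s2 = build(node[1]), build(node[2])
            rules[var].extend([(s1,), (s2,)])
        elif kind == "concat":            # S -> S1 S2
            s1, s2 = build(node[1]), build(node[2])
            rules[var].append((s1, s2))
        elif kind == "star":              # S -> S S1 | ε
            s1 = build(node[1])
            rules[var].extend([(var, s1), ()])
        elif kind != "empty":
            raise ValueError(f"unknown regex node: {node!r}")
        return var

    return build(regex), rules


# (0 + 1)*111
EXAMPLE = ("concat",
           ("star", ("union", ("sym", "0"), ("sym", "1"))),
           ("concat", ("sym", "1"), ("concat", ("sym", "1"), ("sym", "1"))))

start, rules = regex_to_cfg(EXAMPLE)
for var, bodies in rules.items():
    shown = " | ".join(" ".join(body) if body else "ε" for body in bodies)
    print(var, "->", shown or "(no rules)")
```

The output is longer than the hand-written S → U111, U → 0U | 1U | ε because every sub-expression gets its own variable; the two grammars generate the same language.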


Context-free versus regular
• Is every context-free language regular?

S → 0S1 | ε     L = {0^n1^n : n ≥ 0}

is context-free but not regular

[Figure: the regular languages drawn as a proper subset of the context-free languages]
CFLs & Regular Languages
What kind of grammars result for regular languages?

• A CFG is said to be right-linear if all the productions are of one of the following two forms:
  A → wB (or) A → w
  where A and B are non-terminals and w is a string of terminals

• Theorem 1: Every right-linear CFG generates a regular language
• Theorem 2: Every regular language has a right-linear grammar
• Theorem 3: Left-linear CFGs also represent regular languages (RLs)
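Some Examples

[Figure: two finite automata over {0, 1} with states A, B, C; the transition diagrams are not recoverable from the text.]

Right-linear CFG? (for each automaton)

A => 0A
B => 1B | 1A | 0
C => 0B | 0C | 1C | 0 | 1

Finite automaton? (for the right-linear CFG below)

A => 01B | C
B => 11B | 0C | 1A
C => 1A | 0 | 1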
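One way to see Theorem 1 is to read a right-linear grammar as an automaton, with the variable currently being expanded playing the role of the state. The Python sketch below (the encoding and the name `accepts` are assumptions for this example) decides membership for the right-linear grammar A => 01B | C, B => 11B | 0C | 1A, C => 1A | 0 | 1 by exploring pairs (variable, position in the input). There are finitely many such pairs, so the search always terminates. It is a simulation, not the formal construction of a finite automaton.

```python
from collections import deque

# A => 01B | C,  B => 11B | 0C | 1A,  C => 1A | 0 | 1
# A production A -> wB is stored as (w, "B"); A -> w is stored as (w, None).
GRAMMAR = {
    "A": [("01", "B"), ("", "C")],
    "B": [("11", "B"), ("0", "C"), ("1", "A")],
    "C": [("1", "A"), ("0", None), ("1", None)],
}


def accepts(grammar, start, text):
    """Decide whether a right-linear grammar derives `text`.

    A state (variable, position) means "this variable must still derive
    text[position:]".
    """
    seen = {(start, 0)}
    queue = deque([(start, 0)])
    while queue:
        var, pos = queue.popleft()
        for w, nxt in grammar[var]:
            if not text.startswith(w, pos):
                continue
            end = pos + len(w)
            if nxt is None:               # A -> w must finish the string
                if end == len(text):
                    return True
            elif (nxt, end) not in seen:  # A -> wB: continue with B after w
                seen.add((nxt, end))
                queue.append((nxt, end))
    return False


for s in ["0", "11", "0100", "01", ""]:
    print(repr(s), accepts(GRAMMAR, "A", s))
```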
Exercises

L = {0^n1^{2n} | n ≥ 0}
L = {0^n10^n | n ≥ 0}
L = {0^n1^n1^m | n ≥ 0, m ≥ 0}
L = {0^n1^m0^n | n ≥ 0, m ≥ 0}
L = {0^m1^n | m ≠ n, m, n ≥ 0}
L = {0^i1^j2^k | i = j or j = k, where i, j, k ≥ 0}
L = {0^i1^j2^k | i = j or i = k, where i, j, k ≥ 1}
Binary strings of even length
Binary strings with equal numbers of 0's and 1's
Left-most & Right-most Derivation Styles

G:
E => E+E | E*E | (E) | F
F => aF | bF | 0F | 1F | ε

Derive the string a*(ab+10) from G: E ==>* a*(ab+10)

Left-most derivation (always substitute the leftmost variable):
E
==> E * E
==> F * E
==> aF * E
==> a * E
==> a * (E)
==> a * (E + E)
==> a * (F + E)
==> a * (aF + E)
==> a * (abF + E)
==> a * (ab + E)
==> a * (ab + F)
==> a * (ab + 1F)
==> a * (ab + 10F)
==> a * (ab + 10)

Right-most derivation (always substitute the rightmost variable):
E
==> E * E
==> E * (E)
==> E * (E + E)
==> E * (E + F)
==> E * (E + 1F)
==> E * (E + 10F)
==> E * (E + 10)
==> E * (F + 10)
==> E * (aF + 10)
==> E * (abF + 10)
==> E * (ab + 10)
==> F * (ab + 10)
==> aF * (ab + 10)
==> a * (ab + 10)

Leftmost vs. Rightmost derivations
Q1) For every leftmost derivation, there is a rightmost derivation, and vice versa. True or False?
True: we can use parse trees to prove this

Q2) Does every word generated by a CFG have a leftmost and a rightmost derivation?
Yes – easy to prove (reverse direction)

Q3) Could there be words which have more than one leftmost (or rightmost) derivation?
Yes – depending on the grammar
How to prove that your CFGs are correct?

(using induction)
CFG & CFL
• Theorem: A string w in (0+1)* is in L(G) iff w is a palindrome.
G:
A → 0A0 | 1A1 | 0 | 1 | ε
• Proof:
– Use induction
  • on string length for the IF part
  • on length of derivation for the ONLY IF part
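Before writing the induction proof, it can help to test the claim on short strings. The Python sketch below (function names hypothetical) generates everything G derives up to a length bound and compares it with all binary palindromes up to that length. Agreement for small lengths is evidence, not a proof.

```python
from itertools import product


def generated(max_len):
    """Strings of length <= max_len derivable from A -> 0A0 | 1A1 | 0 | 1 | ε."""
    if max_len < 0:
        return set()
    result = {s for s in ("", "0", "1") if len(s) <= max_len}  # A -> ε | 0 | 1
    for inner in generated(max_len - 2):                      # A -> 0A0 | 1A1
        result.add("0" + inner + "0")
        result.add("1" + inner + "1")
    return result


def palindromes(max_len):
    """All palindromes over {0, 1} of length <= max_len, by brute force."""
    return {
        "".join(bits)
        for n in range(max_len + 1)
        for bits in product("01", repeat=n)
        if bits == bits[::-1]
    }


for n in range(11):
    assert generated(n) == palindromes(n)
print("grammar and palindromes agree up to length 10")
```

The recursion in `generated` follows the structure of the ONLY IF direction (induction on the derivation), while `palindromes` enumerates by string length, as in the IF direction.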
Ambiguity in CFGs
• A CFG is said to be ambiguous if there exists a string which has more than one left-most derivation

Example:
S ==> AS | ε
A ==> A1 | 0A1 | 01

Input string: 00111
Can be derived in two ways:

LM derivation #1:
S => AS
=> 0A1S
=> 0A11S
=> 00111S
=> 00111

LM derivation #2:
S => AS
=> A1S
=> 0A11S
=> 00111S
=> 00111
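Whether a particular string is ambiguous can be checked by counting its leftmost derivations. The Python sketch below does a depth-first search over leftmost derivations, pruning any sentential form whose fixed terminal prefix disagrees with the input or whose shortest possible yield is already too long. The names are hypothetical, and the search is only guaranteed to stop for grammars like this one, where every rule either makes the shortest possible yield longer or removes a variable; it is not a general parser.

```python
# S -> AS | ε,  A -> A1 | 0A1 | 01  (single-letter variables, "" is ε)
GRAMMAR = {"S": ["AS", ""], "A": ["A1", "0A1", "01"]}


def shortest_yields(grammar):
    """Length of the shortest terminal string each variable derives (fixed point)."""
    best = {v: float("inf") for v in grammar}
    changed = True
    while changed:
        changed = False
        for var, bodies in grammar.items():
            for body in bodies:
                n = sum(best[s] if s in grammar else 1 for s in body)
                if n < best[var]:
                    best[var], changed = n, True
    return best


def count_leftmost(grammar, start, target):
    """Count the leftmost derivations of `target` from `start`."""
    shortest = shortest_yields(grammar)

    def count(form):
        i = next((k for k, s in enumerate(form) if s in grammar), None)
        if i is None:                        # no variables left
            return int(form == target)
        if not target.startswith(form[:i]):  # terminals left of the leftmost variable are final
            return 0
        if sum(shortest[s] if s in grammar else 1 for s in form) > len(target):
            return 0                         # even the shortest completion is too long
        return sum(count(form[:i] + body + form[i + 1:]) for body in grammar[form[i]])

    return count(start)


print(count_leftmost(GRAMMAR, "S", "00111"))  # 2: the grammar is ambiguous
```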
Why does ambiguity matter?
E ==> E + E | E * E | (E) | a | b | c | 0 | 1

string = a * b + c

• LM derivation #1:
E => E + E => E * E + E ==>* a * b + c
[Parse tree: E + E at the root; the left E is E * E over a and b, the right E is c. Value: (a*b)+c]

• LM derivation #2:
E => E * E => a * E => a * E + E ==>* a * b + c
[Parse tree: E * E at the root; the left E is a, the right E is E + E over b and c. Value: a*(b+c)]

Values are different !!!
The calculated value depends on which of the two parse trees is actually used.
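To make "values are different" concrete, the Python sketch below evaluates both parse trees bottom-up. The numbers assigned to a, b and c are hypothetical, since the slide leaves them open, and trees are written as nested tuples (op, left, right).

```python
# Hypothetical values for a, b and c; the slide leaves them unspecified.
VALUES = {"a": 2, "b": 3, "c": 4}

# Parse tree of LM derivation #1: E + E at the root, the left E is E * E.
TREE_1 = ("+", ("*", "a", "b"), "c")
# Parse tree of LM derivation #2: E * E at the root, the right E is E + E.
TREE_2 = ("*", "a", ("+", "b", "c"))


def evaluate(tree, values):
    """Evaluate a parse tree bottom-up: leaves are names, inner nodes (op, left, right)."""
    if isinstance(tree, str):
        return values[tree]
    op, left, right = tree
    x, y = evaluate(left, values), evaluate(right, values)
    return x + y if op == "+" else x * y


print(evaluate(TREE_1, VALUES), evaluate(TREE_2, VALUES))  # 10 14
```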
Removing Ambiguity in Expression Evaluations
• It MAY be possible to remove ambiguity for some CFLs
– E.g., in a CFG for expression evaluation, by imposing rules and restrictions such as precedence
– This would imply a rewrite of the grammar

• Precedence: (), *, +

Ambiguous version:
E ==> E + E | E * E | (E) | a | b | c | 0 | 1

Modified unambiguous version:
E => E + T | T
T => T * F | F
F => I | (E)
I => a | b | c | 0 | 1

How will this avoid ambiguity?
Inherently Ambiguous CFLs
• However, for some languages, it may not be possible to remove ambiguity

• A CFL is said to be inherently ambiguous if every CFG that describes it is ambiguous

Example:
– L = {a^n b^n c^m d^m | n, m ≥ 1} ∪ {a^n b^m c^m d^n | n, m ≥ 1}
– L is inherently ambiguous
– Why?

Input string: a^n b^n c^n d^n