Slides Parsing 04 CFG PCFG

This document discusses context-free grammars (CFGs) and probabilistic context-free grammars (PCFGs). A CFG defines a set of rules for rewriting symbols to generate sentences in a formal language. A PCFG adds probabilities to each rule in a CFG to define a probabilistic language model. PCFGs are commonly used in natural language processing tasks as they can assign probabilities to parse trees and sentences based on the rule probabilities.


CFGs and PCFGs

(Probabilistic) Context-Free Grammars

Christopher Manning

A phrase structure grammar

SN  people
NP VP
VP
N fish
V NP
VP
N tanks
V NP PP
NProds
N NP NP
NPpeople
V NP PP
NPfish
V N
NPtanks
V e
PPwith
P P NP

people fish tanks


people fish with rods
Christopher Manning

Phrase structure grammars
= context-free grammars (CFGs)

• G = (T, N, S, R)
• T is a set of terminal symbols
• N is a set of nonterminal symbols
• S is the start symbol (S ∈ N)
• R is a set of rules/productions of the form X → γ
• X ∈ N and γ ∈ (N ∪ T)*

• A grammar G generates a language L.
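To make the definition concrete, here is a minimal Python sketch of G = (T, N, S, R), using the people/fish grammar from these slides. The data layout, the rule ordering, and the `derive` helper are illustrative choices, not part of the original deck.

```python
# A CFG G = (T, N, S, R) as plain Python data (rule order is illustrative).
rules = {
    "S":  [("NP", "VP")],
    "VP": [("V", "NP"), ("V", "NP", "PP")],
    "NP": [("NP", "NP"), ("NP", "PP"), ("N",)],
    "PP": [("P", "NP")],
    "N":  [("people",), ("fish",), ("tanks",), ("rods",)],
    "V":  [("people",), ("fish",), ("tanks",)],
    "P":  [("with",)],
}
nonterminals = set(rules)   # symbols that can still be rewritten
start = "S"                 # the start symbol S

def derive(symbol, choices):
    """Leftmost derivation: repeatedly rewrite the leftmost nonterminal,
    using `choices` to pick which rule to apply at each step."""
    out = [symbol]
    picks = iter(choices)
    while any(s in nonterminals for s in out):
        i = next(k for k, s in enumerate(out) if s in nonterminals)
        rhs = rules[out[i]][next(picks)]
        out[i:i + 1] = list(rhs)
    return " ".join(out)

# S => NP VP => N VP => people VP => people V NP => ... => people fish tanks
print(derive(start, [0, 2, 0, 0, 1, 2, 2]))  # → people fish tanks
```

The sentence generated depends entirely on which rule is chosen at each step; a parser runs this process in reverse, recovering the choices from the string.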

Phrase structure grammars in NLP

• G = (T, C, N, S, L, R)
• T is a set of terminal symbols
• C is a set of preterminal symbols
• N is a set of nonterminal symbols
• S is the start symbol (S ∈ N)
• L is the lexicon, a set of items of the form X → x
• X ∈ C and x ∈ T
• R is the grammar, a set of items of the form X → γ
• X ∈ N and γ ∈ (N ∪ C)*
• By usual convention, S is the start symbol, but in statistical NLP
we usually have an extra node at the top (ROOT, TOP)
• We usually write e for an empty sequence, rather than nothing


Probabilistic – or stochastic – context-free grammars (PCFGs)

• G = (T, N, S, R, P)
• T is a set of terminal symbols
• N is a set of nonterminal symbols
• S is the start symbol (S ∈ N)
• R is a set of rules/productions of the form X → γ
• P is a probability function
• P: R → [0,1]
• For each nonterminal X, the probabilities of its rules sum to 1: Σγ P(X → γ) = 1

• A grammar G generates a language model L.
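As a sketch (the triple encoding is an illustrative choice, not the deck's notation), the PCFG on the next slide can be stored as (lhs, rhs, prob) triples, and the defining constraint – that the rules for each left-hand side form a probability distribution – checked directly:

```python
from collections import defaultdict

# The deck's PCFG as (lhs, rhs, probability) triples.
pcfg = [
    ("S",  ("NP", "VP"), 1.0),
    ("VP", ("V", "NP"), 0.6),
    ("VP", ("V", "NP", "PP"), 0.4),
    ("NP", ("NP", "NP"), 0.1),
    ("NP", ("NP", "PP"), 0.2),
    ("NP", ("N",), 0.7),
    ("PP", ("P", "NP"), 1.0),
    ("N", ("people",), 0.5),
    ("N", ("fish",), 0.2),
    ("N", ("tanks",), 0.2),
    ("N", ("rods",), 0.1),
    ("V", ("people",), 0.1),
    ("V", ("fish",), 0.6),
    ("V", ("tanks",), 0.3),
    ("P", ("with",), 1.0),
]

# P must be a proper probability function: for every left-hand side X,
# the probabilities of all rules X → γ sum to 1.
totals = defaultdict(float)
for lhs, rhs, p in pcfg:
    totals[lhs] += p
assert all(abs(t - 1.0) < 1e-9 for t in totals.values())
```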

A PCFG

Grammar:
S → NP VP     1.0
VP → V NP     0.6
VP → V NP PP  0.4
NP → NP NP    0.1
NP → NP PP    0.2
NP → N        0.7
PP → P NP     1.0

Lexicon:
N → people  0.5
N → fish    0.2
N → tanks   0.2
N → rods    0.1
V → people  0.1
V → fish    0.6
V → tanks   0.3
P → with    1.0

[With empty NP removed so less ambiguous]

The probability of trees and strings

• P(t) – The probability of a tree t is the product of the
probabilities of the rules used to generate it.
• P(s) – The probability of the string s is the sum of the
probabilities of the trees which have that string as their yield.

P(s) = Σt P(s, t) = Σt P(t), where t ranges over the parses of s
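The definition of P(t) can be sketched in a few lines of Python. The nested-tuple tree encoding and the `rule_prob` table (with probabilities taken from the PCFG in these slides) are illustrative choices, not the deck's notation.

```python
# PCFG rule probabilities from these slides (only the rules used below).
rule_prob = {
    ("S", ("NP", "VP")): 1.0,
    ("VP", ("V", "NP", "PP")): 0.4,
    ("NP", ("N",)): 0.7,
    ("PP", ("P", "NP")): 1.0,
    ("N", ("people",)): 0.5,
    ("N", ("tanks",)): 0.2,
    ("N", ("rods",)): 0.1,
    ("V", ("fish",)): 0.6,
    ("P", ("with",)): 1.0,
}

def tree_prob(tree):
    """P(t): the product of the probabilities of the rules used in t.
    A tree is (label, child, ...); leaf words are plain strings."""
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = rule_prob[(label, rhs)]
    for child in children:
        if not isinstance(child, str):
            p *= tree_prob(child)
    return p

# The verb-attachment parse of "people fish tanks with rods".
t1 = ("S",
      ("NP", ("N", "people")),
      ("VP", ("V", "fish"),
             ("NP", ("N", "tanks")),
             ("PP", ("P", "with"), ("NP", ("N", "rods")))))
print(round(tree_prob(t1), 10))  # → 0.0008232
```

P(s) then follows by summing `tree_prob` over every parse whose yield is s.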

N  people 0.5
N  fish 0.2
N  tanks 0.2
N  rods 0.1
V  people 0.1
V  fish 0.6
V  tanks 0.3
P  with 1.0

S  NP VP 1.0
VP  V NP 0.6
VP  V NP PP 0.4
NP  NP NP 0.1
NP  NP PP 0.2
NP  N 0.7
PP  P NP 1.0
[These rules accompanied two parse trees for "people fish tanks with rods": t1 with the PP "with rods" attached to the verb, and t2 with the PP attached to the noun.]

Tree and String Probabilities

• s = people fish tanks with rods

• P(t1) = 1.0 × 0.7 × 0.4 × 0.5 × 0.6 × 0.7
× 1.0 × 0.2 × 1.0 × 0.7 × 0.1
= 0.0008232   (verb attachment)
• P(t2) = 1.0 × 0.7 × 0.6 × 0.5 × 0.6 × 0.2
× 0.7 × 1.0 × 0.2 × 1.0 × 0.7 × 0.1
= 0.00024696   (noun attachment)
• P(s) = P(t1) + P(t2)
= 0.0008232 + 0.00024696
= 0.00107016
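As a quick sanity check on the slide's arithmetic, `math.prod` multiplies the rule probabilities exactly as listed above:

```python
import math

# Rule probabilities for the two parses of "people fish tanks with rods".
p_t1 = math.prod([1.0, 0.7, 0.4, 0.5, 0.6, 0.7,
                  1.0, 0.2, 1.0, 0.7, 0.1])        # verb attachment
p_t2 = math.prod([1.0, 0.7, 0.6, 0.5, 0.6, 0.2,
                  0.7, 1.0, 0.2, 1.0, 0.7, 0.1])   # noun attachment
p_s = p_t1 + p_t2                                  # sum over both parses

assert abs(p_t1 - 0.0008232) < 1e-12
assert abs(p_t2 - 0.00024696) < 1e-12
assert abs(p_s - 0.00107016) < 1e-12
```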