CT107-3-3-TXSA - Text Analytics and Sentiment Analysis Parse Tree
Lab 6: CFG & Parse Tree
References:
1. Natural Language Processing with Python, by Steven Bird, Ewan Klein and Edward Loper, 2014.
QUICK REVIEW
Context-free grammar (CFG) has been the most influential grammar formalism for describing language
syntax. This is not because CFG has been widely adopted as such for linguistic description, but
because most grammar formalisms are derived from, or can be related to, CFG. For this reason, CFG is
often used as the base formalism when parsing algorithms are described.
The standard way to represent the syntactic structure of a grammatical sentence is as a syntax tree,
or a parse tree, which is a representation of all the steps in the derivation of the sentence from the
root node. This means that each internal node in the tree represents an application of a grammar
rule.
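The link between internal nodes and grammar rules can be checked directly in NLTK. The following is a minimal sketch, assuming the bracketed tree string below as an illustrative input (it is typed in by hand here, not produced by a parser):

```python
import nltk

# Bracketed form of a parse tree for "I write a book" (illustrative input).
tree = nltk.Tree.fromstring("(S (NP I) (VP (V write) (NP (Det a) (N book))))")

# Each internal node corresponds to one application of a grammar rule.
for production in tree.productions():
    print(production)

# The leaves, read left to right, give back the sentence.
print(tree.leaves())
```

The first rule printed belongs to the root node; the remaining rules follow the tree from left to right.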
PRACTICES
Parse Tree 01
import nltk
text2 = nltk.CFG.fromstring("""
S -> NP VP
PP -> P NP
NP -> Det N | PP NP | Det N PP | 'I'
VP -> V NP | VP PP | V
Det -> 'a'
N -> 'book'
V -> 'write'
""")
text1 = nltk.tokenize.word_tokenize("I write a book")
print(text1)
parser = nltk.ChartParser(text2)
for tree in parser.parse(text1):
    print(tree)
    tree.draw()
Output
Level 3 Asia Pacific University (APU) Page 1 of 10
['I', 'write', 'a', 'book']
(S (NP I) (VP (V write) (NP (Det a) (N book))))
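The draw() call opens a graphical window, which may not be available on every machine. As a minimal sketch, assuming a trimmed-down copy of the grammar (the PP rules are left out here as a simplification, and a plain split() replaces word_tokenize so that no tokenizer data is needed), pretty_print() renders the same tree as text:

```python
import nltk

# Simplified grammar for illustration only; the PP rules are omitted.
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N | 'I'
VP -> V NP | V
Det -> 'a'
N -> 'book'
V -> 'write'
""")

tokens = "I write a book".split()  # plain split, no tokenizer data required
parser = nltk.ChartParser(grammar)
trees = list(parser.parse(tokens))

for tree in trees:
    print(tree)          # bracketed form
    tree.pretty_print()  # text rendering on standard output, no GUI needed
```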
Parse Tree 02
import nltk
groucho_grammar = nltk.CFG.fromstring("""
S -> NP VP
PP -> P NP
NP -> Det N | Det N PP | 'I'
VP -> V NP | VP PP
Det -> 'an' | 'my'
N -> 'elephant' | 'pajamas'
V -> 'shot'
P -> 'in'
""")
sent = ['I', 'shot', 'an', 'elephant', 'in', 'my', 'pajamas']
parser = nltk.ChartParser(groucho_grammar)
for tree in parser.parse(sent):
    tree.draw()
    print(tree)
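This sentence is structurally ambiguous: the phrase "in my pajamas" can attach to the noun phrase or to the verb phrase, so the parser returns more than one tree. The sketch below counts the parses and reports where the PP attaches in each; the variable names trees and attachments are introduced here for illustration only.

```python
import nltk

groucho_grammar = nltk.CFG.fromstring("""
S -> NP VP
PP -> P NP
NP -> Det N | Det N PP | 'I'
VP -> V NP | VP PP
Det -> 'an' | 'my'
N -> 'elephant' | 'pajamas'
V -> 'shot'
P -> 'in'
""")

sent = ['I', 'shot', 'an', 'elephant', 'in', 'my', 'pajamas']
parser = nltk.ChartParser(groucho_grammar)
trees = list(parser.parse(sent))
print(len(trees), "parse(s) found")

# For each tree, find the node that has the PP as a direct child.
attachments = []
for tree in trees:
    for node in tree.subtrees():
        if any(isinstance(child, nltk.Tree) and child.label() == "PP"
               for child in node):
            attachments.append(node.label())
            print("PP attaches under:", node.label())
```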
Parse Tree 03
import nltk
text2 = nltk.CFG.fromstring("""
S -> NP VP
PP -> P NP
NP -> Det N | PP NP | Det N PP
VP -> V NP | VP PP | V
N -> 'Alice' | 'Bob'
V -> 'loves'
Det ->
P ->
""")
text1 = nltk.tokenize.word_tokenize("Alice loves Bob")
print(text1)
print()
parser = nltk.ChartParser(text2)
for tree in parser.parse(text1):
    print(tree)
    tree.draw()
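Every word of the input must appear as a terminal in the grammar; otherwise the chart parser raises a ValueError instead of returning trees. A minimal sketch of checking this in advance with the grammar's check_coverage method follows. The small grammar and the helper name covered are assumptions made for illustration and are not part of the exercise.

```python
import nltk

# Illustrative grammar, simpler than the one in the exercise.
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> N
VP -> V NP
N -> 'Alice' | 'Bob'
V -> 'loves'
""")

def covered(tokens):
    """Return True if every token appears as a terminal in the grammar."""
    try:
        grammar.check_coverage(tokens)  # raises ValueError on unknown words
        return True
    except ValueError:
        return False

print(covered(["Alice", "loves", "Bob"]))
print(covered(["Alice", "loves", "Carol"]))  # 'Carol' is not in the grammar
```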
Parse Tree 04 – Adjective Phrase
The little bear saw the fine fat trout in the brook
Clue:
NP -> DT Nom
Nom -> Adj N | Adj Adj N
import nltk
text2 = nltk.CFG.fromstring("""
S -> NP VP
PP -> P NP
NP -> Det N | Det N PP | Det Nom | 'the'
VP -> V NP | VP PP
Nom -> Adj N | Adj Adj N
Det -> 'the'
N -> 'bear' | 'trout' | 'brook'
V -> 'saw'
P -> 'in'
Adj -> 'little' | 'fine' | 'fat'
""")
text1 = nltk.tokenize.word_tokenize("the little bear saw the fine fat trout in the brook")
print(text1)
print()
parser = nltk.ChartParser(text2)
for tree1 in parser.parse(text1):
    tree1.draw()
    print(tree1)
Parse Tree 05 – Adjective Phrase
import nltk
grammar2 = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det Nom | Det N | PropN
Nom -> Adj Nom | N
VP -> V Adj | V NP | V S | V NP PP
PP -> P NP
PropN -> 'Buster' | 'Chatterer' | 'Joe'
Det -> 'the' | 'a'
N -> 'bear' | 'squirrel' | 'tree' | 'fish' | 'log'
Adj -> 'angry' | 'frightened' | 'little' | 'tall'
V -> 'chased' | 'saw' | 'said' | 'thought' | 'was' | 'put'
P -> 'on'
""")
sent = ['the', 'angry', 'bear', 'chased', 'the', 'frightened',
'little', 'squirrel']
parser = nltk.ChartParser(grammar2)
for tree in parser.parse(sent):
    tree.draw()
    print(tree)
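The recursive rule Nom -> Adj Nom | N accepts any number of adjectives before the noun, whereas listing Adj N and Adj Adj N separately stops at two. The sketch below isolates that rule in a small noun-phrase grammar; the grammar np_grammar and the test phrases are assumptions made for illustration, with NP as the start symbol.

```python
import nltk

# Noun-phrase-only grammar; the first rule's left-hand side (NP) is the start symbol.
np_grammar = nltk.CFG.fromstring("""
NP -> Det Nom
Nom -> Adj Nom | N
Det -> 'the'
Adj -> 'frightened' | 'little'
N -> 'squirrel'
""")

parser = nltk.ChartParser(np_grammar)
phrases = [
    "the squirrel",
    "the little squirrel",
    "the frightened little squirrel",
]

counts = {}
for phrase in phrases:
    trees = list(parser.parse(phrase.split()))
    counts[phrase] = len(trees)
    print(phrase, "->", trees[0])
```

Each additional adjective adds one more level of Nom to the tree.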
Parse Tree 06 – Adverb Phrases (AdvP)
E.g.: Ken snores very loudly
import nltk
sentence = "Ken snores very loudly"
gram = nltk.CFG.fromstring("""
S -> NP VP
NP -> N
VP -> V ADV
N -> 'Ken'
V -> 'snores'
DEG -> 'very'
ADV -> DEG ADV | 'loudly'
""")
token = nltk.tokenize.word_tokenize(sentence)
print(token)
parser = nltk.ChartParser(gram)
for tree in parser.parse(token):
    print(tree)
    tree.draw()
import nltk
from nltk.tokenize import word_tokenize
sents = [
"unfortunately the cat killed the mouse",
"the cat unfortunately killed the mouse",
"the cat killed the mouse unfortunately"
]
grammar = nltk.CFG.fromstring("""
S -> ADV NP VP | NP VP
NP -> DT N
VP -> ADV VP | VP ADV | V NP
DT -> 'the'
N -> 'cat' | 'mouse'
V -> 'killed'
ADV -> 'unfortunately'
""")
parser = nltk.ChartParser(grammar)
for sent in sents:
    print(sent)
    for tree in parser.parse(word_tokenize(sent)):
        tree.draw()
        print(tree)
Unfortunately the cat killed the mouse
The cat unfortunately killed the mouse
The cat killed the mouse unfortunately
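The grammar accepts the adverb in three positions: before the subject, before the verb phrase, and after the verb phrase. A word order that no rule covers yields no tree at all. The sketch below counts the parses for each sentence; the fourth sentence is an extra test case added here for illustration, and split() stands in for word_tokenize so that no tokenizer data is needed.

```python
import nltk

grammar = nltk.CFG.fromstring("""
S -> ADV NP VP | NP VP
NP -> DT N
VP -> ADV VP | VP ADV | V NP
DT -> 'the'
N -> 'cat' | 'mouse'
V -> 'killed'
ADV -> 'unfortunately'
""")

sents = [
    "unfortunately the cat killed the mouse",
    "the cat unfortunately killed the mouse",
    "the cat killed the mouse unfortunately",
    "the cat killed unfortunately the mouse",  # extra case: adverb between V and NP
]

parser = nltk.ChartParser(grammar)
counts = []
for sent in sents:
    trees = list(parser.parse(sent.split()))
    counts.append(len(trees))
    print(len(trees), "parse(s):", sent)
```

A count of zero means the sentence is not generated by the grammar, which is different from the ValueError raised for words the grammar does not contain.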
Draw Parse Tree using COLAB
Run the following code to set up the Colab environment:
import nltk
nltk.download('punkt')
### CREATE VIRTUAL DISPLAY ###
!apt-get install -y xvfb # Install X Virtual Frame Buffer
import os
os.system('Xvfb :1 -screen 0 1600x1200x16 &') # create a virtual display with size 1600x1200 and 16-bit color; the color depth can be changed to 24 or 8
os.environ['DISPLAY']=':1.0' # tell X clients to use our virtual DISPLAY :1.0
%matplotlib inline
### INSTALL GHOSTSCRIPT (Required to display NLTK trees) ###
!apt install ghostscript python3-tk
Example Program
import nltk
from IPython.display import display
text2 = nltk.CFG.fromstring("""
S -> NP VP
PP -> P NP
NP -> Det N | PP NP | Det N PP | 'I'
VP -> V NP | VP PP | V
Det -> 'a'
N -> 'book'
V -> 'write'
""")
text1 = nltk.tokenize.word_tokenize("I write a book")
print(text1)
parser = nltk.ChartParser(text2)
for tree in parser.parse(text1):
    display(tree) # tree.draw()
    # print(tree)