Final Practice
Netid:
Instructions: You have 2 hours and 30 minutes to complete this exam. The exam is
closed-book.
1 Parsing with PCFGs (25 pts)
(a) (3 pts) A sentence can easily have more than one parse tree that is consistent with
a given CFG. How do PCFGs and non-probability-based CFGs differ in terms of
handling parsing ambiguity?
Answer
A non-probabilistic CFG treats every parse consistent with the grammar as equally
valid; a PCFG parser resolves the ambiguity by preferring constituents (and parse
trees) with the highest probability.
Consider the following PCFG for problems (b)-(e).
production rule probability
S → VP 1.0
VP → Verb NP 0.7
VP → Verb NP PP 0.3
NP → NP PP 0.3
NP → Det Noun 0.7
(b) (3 pts) Draw the top-ranked parse tree for the sentence below by applying the given
PCFG. Does the result seem reasonable to you? Why or why not?
Answer
The top-ranked sentence structure is shown in Figure 1. (The leaf nodes representing
words are omitted.) The probability of the resulting parse tree is 1.0 * 0.3 * 0.7 * 1.0 *
(0.1)^5, which is larger than 1.0 * 0.7 * 0.3 * 0.7 * 1.0 * (0.1)^5, the probability of the
alternative parse tree (with the [VP → Verb NP] rule expansion). Semantically, "with
scissors" should attach to the verb, hence the resulting parse tree is a reasonable
one.
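As a numerical check, the two candidate tree probabilities can be computed directly. This sketch assumes, as the answer does, that each lexical (word-level) rule has probability 0.1 and that the PP rule expands with probability 1.0.

```python
# Rule probabilities from the given PCFG (the PP rule's 1.0 is assumed,
# matching the factor of 1.0 used in the answer's computation).
rules = {
    ("S", ("VP",)): 1.0,
    ("VP", ("Verb", "NP")): 0.7,
    ("VP", ("Verb", "NP", "PP")): 0.3,
    ("NP", ("NP", "PP")): 0.3,
    ("NP", ("Det", "Noun")): 0.7,
    ("PP", ("Prep", "Noun")): 1.0,
}

def tree_prob(derivation, lexical_prob=0.1, n_words=5):
    """Tree probability = product of rule probabilities times the
    (assumed) probability of each of the 5 lexical rules."""
    p = lexical_prob ** n_words
    for rule in derivation:
        p *= rules[rule]
    return p

# Verb attachment: the PP hangs directly off the VP.
verb_attach = [("S", ("VP",)), ("VP", ("Verb", "NP", "PP")),
               ("NP", ("Det", "Noun")), ("PP", ("Prep", "Noun"))]
# Noun attachment: the PP is folded into the NP via NP -> NP PP.
noun_attach = [("S", ("VP",)), ("VP", ("Verb", "NP")),
               ("NP", ("NP", "PP")), ("NP", ("Det", "Noun")),
               ("PP", ("Prep", "Noun"))]

print(tree_prob(verb_attach))  # 1.0 * 0.3 * 0.7 * 1.0 * (0.1)^5
print(tree_prob(noun_attach))  # 1.0 * 0.7 * 0.3 * 0.7 * 1.0 * (0.1)^5
```

The verb-attachment tree wins because it avoids the extra 0.7 factor from a second NP expansion.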
(c) (3 pts) Draw the top-ranked parse tree for the sentence below by applying the given
PCFG. Does the result seem reasonable to you? Why or why not?
Answer
The top-ranked sentence structure is the same as in part (b). Semantically, "with
scissors" should attach to the noun phrase, hence the resulting parse tree is not a
reasonable one.
(d) (5 pts) Describe how you would lexicalize the given PCFG in order to address the
problem you hopefully noticed in (b) and/or (c). Then show specifically how the
production rules below should be modified according to your lexicalization scheme.
production rule probability
VP → Verb NP 0.7
VP → Verb NP PP 0.3
Answer
Lexicalization of production rules can capture lexically specific preferences for
certain rule expansions. In order to mitigate the sparse data problem, we will
lexicalize with respect to the head word of the left hand side of each production
rule, instead of all nonterminals in each production rule. In particular, the rules
expanding from VP should be modified as

VP(h) → Verb NP, with probability P( VP → Verb NP | VP, h )
VP(h) → Verb NP PP, with probability P( VP → Verb NP PP | VP, h )

where h is the head word of the VP (here, its verb).
Comment
Because we didn't restrict lexicalization to head words of the right hand side of
rules, it is acceptable to propose lexicalized PCFGs in many different ways; in
particular, you don't have to condition on the head word. You can condition on the
entire combination of words across all nonterminals, as long as you make it clear
what you are conditioning on, although doing so would be much less practical.
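One way to see what such a lexicalized rule probability looks like in practice: estimate P(rule | VP, head) by relative frequency over (head word, rule) observations. The counts below are invented purely for illustration.

```python
from collections import Counter

# Hypothetical treebank counts of (head word, rule) observations.
counts = Counter({
    ("Find", "VP -> Verb NP"): 40,
    ("Find", "VP -> Verb NP PP"): 60,
    ("Eat",  "VP -> Verb NP"): 90,
    ("Eat",  "VP -> Verb NP PP"): 10,
})

def rule_prob(head, rule):
    """Relative-frequency estimate of P(rule | VP, head)."""
    total = sum(c for (h, _), c in counts.items() if h == head)
    return counts[(head, rule)] / total

print(rule_prob("Find", "VP -> Verb NP PP"))  # 0.6
print(rule_prob("Eat", "VP -> Verb NP PP"))   # 0.1
```

With these (made-up) counts, "Find" prefers the PP-taking expansion while "Eat" does not, which is exactly the lexical preference the lexicalized grammar is meant to capture.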
(e) (5 pts) The following two sentences exhibit parsing ambiguities. How would your
lexicalized PCFG from (d) handle these ambiguities?
Answer
Notice that the head word of every node in the parse tree, except the node for
the last word, is identical in both sentences. Therefore, the conditional probability
of each node for a particular rule expansion is identical in both sentences except
at the node for the last word. However, the last word in each sentence does not
control which rule expansion is used at its ancestors' nodes. Hence the exact
same parse tree will be chosen by the PCFG, even though the prepositional phrase in
the first sentence should attach to the noun phrase, and the prepositional phrase in
the second sentence should attach to the verb phrase. (The opposite attachments are
not impossible, but they would sound less sensible.) Which attachment is chosen
will depend on the actual values of P( VP(Find) → Verb NP | VP, Find ) and
P( VP(Find) → Verb NP PP | VP, Find ). In summary, head word lexicalization
does not resolve all ambiguities, as these sentences show.
Comment
If your proposal in (d) didn't condition on head words of the right hand side of
rules, you might reach a different conclusion here, depending on exactly which set
of words you chose to condition on. However, unless you somehow invented a clever
way to condition on all of the words under the PP nonterminal, or unless you
changed the definition of head word, you probably end up encountering the same
problem as above.
Read for problems (f)-(g): One problem with a lexicalized PCFG is that some
(perfectly reasonable) words might never show up in the training data for certain
production rules. This results in rules with a probability of 0.
(f) (3 pts) Describe why production rules with zero probability are problematic.
Answer
If a production rule has zero probability, then any parse tree derived using that
rule must also have zero probability. However, a production rule may have zero
probability not because it is invalid, but because that particular rule was never
observed in the training data. In this case the PCFG will not be able to return the
correct parse tree if it involves an unseen rule.
(g) (3 pts) Describe one method to avoid zero probabilities for lexicalized PCFGs.
Answer
Smoothing techniques from language modeling can be applied here in the same way.
One simple method is to assign a minimum count of 1 to every possible lexicalized
rule (add-one smoothing). (In order to keep a proper probability distribution, the
counts collected from the training data must then be renormalized.)
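A minimal sketch of the add-one idea for lexicalized rules; the rule names and counts here are invented for illustration.

```python
from collections import Counter

def smoothed_probs(observed, all_rules):
    """Add-one (Laplace) smoothing: pretend every possible lexicalized
    rule was seen once more than it actually was, then renormalize."""
    counts = Counter(observed)
    total = sum(counts.values()) + len(all_rules)
    return {r: (counts[r] + 1) / total for r in all_rules}

# Hypothetical rule inventory for one (LHS, head) pair; the last rule
# never occurs in the "training data" below.
all_rules = ["VP(find) -> Verb NP",
             "VP(find) -> Verb NP PP",
             "VP(find) -> Verb NP NP"]
observed = ["VP(find) -> Verb NP"] * 3 + ["VP(find) -> Verb NP PP"] * 2

probs = smoothed_probs(observed, all_rules)
print(probs["VP(find) -> Verb NP NP"])  # unseen rule: (0 + 1) / (5 + 3) = 0.125
```

The unseen rule now gets a small nonzero probability instead of zeroing out every parse tree that uses it.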
2 Bottom-up Chart Parsing (10 pts)
Given the grammar and lexicon below, show the final chart for the following sentence
after applying the bottom-up chart parser. Remember that the final chart contains all
edges added during the parsing process. You may use either the notation from class (i.e.
nodes/links) or the notation from the book to depict the chart.
S → VP
VP → Verb NP
NP → NP PP
NP → Det Noun
PP → Prep Noun

Det → the
Verb → Find
Prep → in
Noun → men | suits

Find the men in suits.
Answer
[The final chart diagram is omitted in this text version. Its completed edges
include S → VP, VP → Verb NP PP, VP → Verb NP, and NP → NP PP, along with the
lexical edges and the active (incomplete) edges added during parsing.]
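As a check on which completed edges belong in the chart, they can be computed with a simple bottom-up recognizer over the grammar and lexicon as printed in the problem; the active (dotted) edges of the full chart algorithm are omitted in this sketch.

```python
# Grammar and lexicon as printed in the problem statement.
grammar = [
    ("S", ("VP",)),
    ("VP", ("Verb", "NP")),
    ("NP", ("NP", "PP")),
    ("NP", ("Det", "Noun")),
    ("PP", ("Prep", "Noun")),
]
lexicon = {"find": "Verb", "the": "Det", "men": "Noun",
           "in": "Prep", "suits": "Noun"}

def chart_parse(words):
    """Return the set of completed edges (label, start, end), computed
    bottom-up to a fixed point."""
    chart = {(lexicon[w.lower()], i, i + 1) for i, w in enumerate(words)}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            if len(rhs) == 1:  # unary rule
                for (lab, i, j) in list(chart):
                    if lab == rhs[0] and (lhs, i, j) not in chart:
                        chart.add((lhs, i, j))
                        changed = True
            else:  # binary rule: combine adjacent edges
                for (a, i, k) in list(chart):
                    for (b, k2, j) in list(chart):
                        if (a, b, k) == (rhs[0], rhs[1], k2) \
                                and (lhs, i, j) not in chart:
                            chart.add((lhs, i, j))
                            changed = True
    return chart

chart = chart_parse("Find the men in suits".split())
print(("S", 0, 5) in chart)  # True: the whole sentence parses as S
```

Among the completed edges are NP over "the men", PP over "in suits", NP over "the men in suits", VP edges over both "Find the men" and the whole verb phrase, and S edges over both VP spans.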
3 Partial Parsing / Question Answering (30 pts)
Consider the following article for problems (a) - (e).
I bought my wireless keyboard/mouse set several months ago, and, like a lot of
new products, it has some unanticipated issues. On the plus side, obviously, is
the styling. The design is fresh, clean, and interesting. The keyboard can tilt at
different angles, which was important because I had some difficulty typing with
it flat. The bluetooth receiver in the charger was functional, and I appreciated
having a bluetooth hub for my cellphone. The mouse and the keyboard have both
proved durable and reliable despite a number of mishaps.
In regards to the software, there are some real issues. When the mouse powers
down to save battery life there is a second or two of lag before it reconnects
with the receiver. I found this really annoying to deal with every time I stepped
away from my desk for ten or fifteen minutes. Also, during system startup when
the bluetooth software has yet to initialize, both the keyboard and the mouse are
useless. This made it impossible to do any kind of pre-windows-startup tasks such
as F8 for windows configuration. I suspect this is a result of how bluetooth in-
teracts with the OS and bios, but whatever the cause, it was, for me, a deal-breaker.
(a) (5pts) Mark or draw the output of a partial parser for the following sentence, stating
any necessary assumptions.
Answer
[The bluetooth receiver]np in [the charger]np was functional, and I appreciated
having [a bluetooth hub]np for [my cellphone]np .
Comment
There can be different correct answers depending on the definition of constituents.
(b) (5 pts) State two advantages of partial parsers over parsers that provide in-depth
syntactic information.
Answer
First, partial parsers can be more robust than regular parsers, because partial
parsing is an easier task. Second, for some NLP applications such as information
extraction, the information derived from partial parsers can be more relevant than
that from regular parsers.
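A partial parse like the one in (a) can be approximated with a small finite-state pattern over part-of-speech tags. This sketch matches the common DT? JJ* NN+ noun-phrase pattern; the tags below are hand-assigned for illustration.

```python
# A minimal rule-based NP chunker over part-of-speech tags, matching
# DT? JJ* NN+ (optional determiner, optional adjectives, one or more nouns).
tagged = [("The", "DT"), ("bluetooth", "NN"), ("receiver", "NN"),
          ("in", "IN"), ("the", "DT"), ("charger", "NN"),
          ("was", "VBD"), ("functional", "JJ")]

def chunk_nps(tagged):
    """Greedy left-to-right matching of the DT? JJ* NN+ pattern."""
    chunks, i = [], 0
    while i < len(tagged):
        j = i
        if tagged[j][1] == "DT":          # optional determiner
            j += 1
        while j < len(tagged) and tagged[j][1] == "JJ":  # adjectives
            j += 1
        k = j
        while k < len(tagged) and tagged[k][1] in ("NN", "NNS"):  # nouns
            k += 1
        if k > j:                         # at least one noun: emit chunk
            chunks.append(" ".join(word for word, _ in tagged[i:k]))
            i = k
        else:
            i += 1
    return chunks

print(chunk_nps(tagged))  # ['The bluetooth receiver', 'the charger']
```

Note that a pattern this simple skips predicate adjectives like "functional", which is consistent with the bracketing shown in the answer.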
(c) (5 pts) Consider a closed domain QA system for the domain of the above text, i.e.
product reviews of computer peripherals. Assume that the QA system uses a sim-
ple TFIDF-based information retrieval method to identify documents and sentences
that contain the answer to the input question. Assume also that the QA system only
has access to the above document, i.e. the above document is the only document in
the collection. (Yes, we know that this is not a reasonable assumption.)
Devise one reasonable wh-question (i.e. who, what, where, when, why) that has
an answer in the document but that the QA system would not be able to answer
sensibly. Explain why the question is difficult for the system.
Answer
Fall 2006 students: we did not cover TFIDF-based IR methods. They represent
each document and query as a vector indicating the presence or absence of each
word in the language (minus stopwords), and then compute similarity between
a document and a query by computing the cosine of the angle between the two
vectors. In addition, words that appear frequently across the entire corpus receive
small weights; words that appear frequently in a document receive high weights.
This isn’t the whole story, but is enough to let you think about answering the
question. No answer yet...
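The TFIDF-plus-cosine scheme described in the note can be sketched in a few lines. The three "documents" here are sentences adapted from the review, and the weighting is the simple tf × log(N/df) variant described above.

```python
import math
from collections import Counter

def tfidf(toks, df, n):
    """tf * log(N/df) weights; terms unseen in the corpus are dropped."""
    tf = Counter(toks)
    return {t: tf[t] * math.log(n / df[t]) for t in tf if df.get(t)}

def cosine(u, v):
    """Cosine of the angle between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

docs = ["the mouse lags after it powers down",
        "the keyboard can tilt at different angles",
        "the bluetooth receiver in the charger was functional"]
tokenized = [d.lower().split() for d in docs]
df = Counter()                      # document frequency of each term
for toks in tokenized:
    df.update(set(toks))
n = len(docs)

doc_vecs = [tfidf(toks, df, n) for toks in tokenized]
query = tfidf("keyboard angles".split(), df, n)
scores = [cosine(query, d) for d in doc_vecs]
print(scores.index(max(scores)))  # 1: the keyboard sentence scores highest
```

Notice that "the", which appears in every document, gets idf log(3/3) = 0 and so contributes nothing, which is the down-weighting of corpus-wide frequent words described in the answer.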
(d) (7 pts) Now suppose a closed domain QA system that has access to a large number
of product reviews for various computer peripherals. Assume the possible ques-
tions for the QA system are limited to the following two types of questions.
Since the types of questions are restricted, we can design predictive annotations to
assist the question answering system. Describe a set of useful predictive annotation
types for this restricted question answering task. Then annotate one sentence from
the article according to your annotation scheme.
Answer
Fall 2006 students: we did not cover predictive annotation. No answer yet.
(e) (8 pts) Suppose that you have convinced your friends to annotate 500 documents
per your definition of predictive annotations given in (d). Once the 500 documents
are annotated, one can use them to train a supervised machine learning algorithm
to automatically annotate many more documents (and thereby avoid losing one's
friends, who have become increasingly unwilling to help with the manual
annotations).
Select one of your predictive annotation types from (d). Explain step-by-step how
you would go about the task of training a learning algorithm to automate this type
of annotation. Be sure to define your learning task and to describe a reasonable set
of features.
Answer
Fall 2006 students: we did not cover predictive annotation. No answer yet.
4 Inference (10 pts)
Consider the following article for this problem.
[This is just the first paragraph from the previous question’s text.]
1. state whether the inference depends on the discourse context, knowledge about
actions, and/or general world knowledge; and
2. describe what natural language processing techniques, if any, might enable a system
to make the inference automatically.
5 Grab Bag (20 pts)
(a) (4 pts) (True or False. Explain your answer.) Information extraction is harder than
text categorization.
Answer
Fall 2006 students: we did not cover information extraction.
(b) (6 pts) Briefly describe the key differences between Autoslog-TS and Autoslog.
Answer
Fall 2006 students: we did not cover this. Autoslog-TS is largely unsupervised. It
does not require annotations; instead, it requires two sets of documents: relevant
and not relevant. After extracting every NP from the texts, it selects patterns by
relevance rate and frequency.
(c) (4 pts) (True or False. Explain your answer.) 4-grams are better than trigrams for
part-of-speech tagging.
Answer
False. There is not generally enough data for 4-grams to outperform trigrams.
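A back-of-envelope count makes the sparsity argument concrete; this sketch assumes the 45-tag Penn Treebank tagset.

```python
# Number of distinct tag histories an n-gram tagger must estimate,
# assuming a 45-tag tagset (the Penn Treebank size).
tags = 45
trigram_contexts = tags ** 3   # trigram model: histories of 2 tags + current
fourgram_contexts = tags ** 4  # 4-gram model: one more tag of history
print(trigram_contexts, fourgram_contexts)  # 91125 4100625
```

A 4-gram model has 45 times as many parameters to estimate from the same training corpus, so most 4-gram contexts are never observed and the extra history rarely pays off.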
(d) (6 pts) Noun phrase coreference resolution includes pronoun resolution, proper
noun resolution, and common noun resolution. Which of the three would you ex-
pect to be the most difficult to handle computationally? Explain why.
Answer
Common noun resolution is the hardest, because there is an extremely broad range
of common noun phrases that can corefer with the same entity. The variety of
proper noun and pronoun coreference patterns is much narrower.