Noun Phrase Extraction: A Description of Current Techniques

This document summarizes current techniques for noun phrase extraction, including rule-based and machine learning approaches. Early solutions used simple rule-based and finite-state-automaton methods built on part-of-speech tags and grammar rules. More recent approaches apply machine learning models such as transformation-based learning, memory-based learning, maximum entropy modeling, hidden Markov models, and conditional random fields, which are trained on large annotated corpora to learn extraction patterns.

Noun Phrase Extraction

A Description of Current Techniques
What is a noun phrase?
 A phrase whose head is a noun or pronoun, optionally accompanied by a set of modifiers
 Determiners:
• Articles: a, an, the
• Demonstratives: this, that, those
• Numerals: one, two, three
• Possessives: my, their, whose
• Quantifiers: some, many
 Adjectives: the red ball
 Relative clauses: the books that I bought yesterday
 Prepositional phrases: the man with the black hat
Is that really what we want?
 POS tagging already identifies pronouns and nouns by themselves
 The man whose red hat I borrowed yesterday in the street that is next to my house lives next door.
 [The man [whose red hat [I borrowed yesterday]RC ]RC [in the street]PP [that is next to my house]RC ]NP lives [next door]NP.
 Base Noun Phrases
 [The man]NP whose [red hat]NP I borrowed [yesterday]NP in [the street]NP that is next to [my house]NP lives [next door]NP.
How Prevalent is this Problem?

 Established by Steven Abney in 1991 as a core step in Natural Language Processing
 A well-explored problem
What were the successful early solutions?
 Simple Rule-based / Finite State Automata
 Both of these rely on the aptitude of the linguist formulating the rule set.
Simple Rule-based / Finite State Automata
 A list of grammar rules and relationships is established. For example:
 If an article precedes a noun, that article marks the beginning of a noun phrase.
 A new noun phrase cannot begin immediately after an article.
 The simplest method
FSA simple NPE example
[State diagram: a finite state automaton with states S0, S1, and an NP accepting state; transitions are labeled with part-of-speech categories such as determiner, adjective, noun/pronoun, relative clause, and prepositional phrase.]
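As a rough illustration of the finite-state idea, here is a minimal Python sketch; the tag set and transitions are simplifying assumptions, not the exact automaton from the diagram (relative clauses and prepositional phrases are ignored).

# A minimal sketch of a finite-state base-NP recognizer over POS tags:
# determiners and adjectives may open or extend a phrase, and a noun or
# pronoun must appear as the head before the phrase is emitted.

def chunk_base_nps(tagged):
    """tagged: list of (word, POS) pairs; returns a list of base-NP word lists."""
    phrases, current, has_head = [], [], False
    for word, pos in tagged:
        if pos in ("DT", "JJ", "CD", "PRP$"):      # modifier: enter or stay in the NP state
            current.append(word)
        elif pos in ("NN", "NNS", "NNP", "PRP"):   # head noun or pronoun
            current.append(word)
            has_head = True
        else:                                      # any other tag leaves the NP state
            if current and has_head:
                phrases.append(current)
            current, has_head = [], False
    if current and has_head:                       # flush a phrase left open at the end
        phrases.append(current)
    return phrases

sentence = [("The", "DT"), ("man", "NN"), ("lives", "VBZ"),
            ("next", "JJ"), ("door", "NN")]
print(chunk_base_nps(sentence))   # [['The', 'man'], ['next', 'door']]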
Simple rule NPE example
 “Contextualization” and “lexicalization”
 Ratio between the number of occurrences of a POS tag inside a chunk and the total number of occurrences of that POS tag in the training corpus
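The ratio itself is simple to compute; below is a small Python sketch with invented toy counts, where contextualization is a hypothetical helper name.

# For each POS tag: how often it occurs inside an NP chunk divided by how
# often it occurs in the training corpus overall.
from collections import Counter

def contextualization(chunk_tags, corpus_tags):
    """chunk_tags: POS tags observed inside NP chunks; corpus_tags: all POS tags."""
    in_chunk, overall = Counter(chunk_tags), Counter(corpus_tags)
    return {tag: in_chunk[tag] / overall[tag] for tag in overall}

corpus_tags = ["DT", "NN", "VBZ", "DT", "JJ", "NN", "IN", "NN"]
chunk_tags = ["DT", "NN", "DT", "JJ", "NN", "NN"]
print(contextualization(chunk_tags, corpus_tags))
# {'DT': 1.0, 'NN': 1.0, 'VBZ': 0.0, 'JJ': 1.0, 'IN': 0.0}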
Parsing FSAs, grammars, regular expressions: LR(k) Parsing
 The L means we do a Left-to-right scan of input tokens
 The R means we are guided by Rightmost derivations
 The k means we look at the next k tokens to help us make decisions about handles
 We shift input tokens onto a stack and then reduce that stack by replacing RHS handles with LHS non-terminals
An Expression Grammar

1. E -> E + T
2. E -> E - T
3. E -> T
4. T -> T * F
5. T -> T / F
6. T -> F
7. F -> (E)
8. F -> i
LR Table for Exp Grammar
An LR(1) NPE Example

Grammar:
1. S  -> NP VP
2. NP -> Det N
3. NP -> N
4. VP -> V NP

Stack        Input    Action
[]           N V N    SH N
[N]          V N      RE 3) NP -> N
[NP]         V N      SH V
[NP V]       N        SH N
[NP V N]              RE 3) NP -> N
[NP V NP]             RE 4) VP -> V NP
[NP VP]               RE 1) S -> NP VP
[S]                   Accept!

(Abney, 1991)
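The trace above can be reproduced with a small shift-reduce sketch in Python; rather than a full LR(1) table, the version below (an assumption for illustration) greedily reduces whenever the top of the stack matches a rule's right-hand side, which is enough for this toy grammar.

RULES = [                      # (left-hand side, right-hand side)
    ("S",  ["NP", "VP"]),      # 1. S  -> NP VP
    ("NP", ["Det", "N"]),      # 2. NP -> Det N
    ("NP", ["N"]),             # 3. NP -> N
    ("VP", ["V", "NP"]),       # 4. VP -> V NP
]

def shift_reduce(tokens):
    stack, buffer = [], list(tokens)
    while True:
        for lhs, rhs in RULES:                     # reduce: replace a RHS handle with its LHS
            if len(stack) >= len(rhs) and stack[-len(rhs):] == rhs:
                stack[len(stack) - len(rhs):] = [lhs]
                break
        else:
            if buffer:                             # no reduction applies: shift the next token
                stack.append(buffer.pop(0))
            else:
                return stack                       # accept iff this is ["S"]

print(shift_reduce(["N", "V", "N"]))               # ['S']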
Why isn’t this enough?
 Unanticipated constructions not covered by the rule set
 Difficulty finding non-recursive, base NPs
 Structural ambiguity
Structural Ambiguity
“I saw the man with the telescope.”

[Two parse trees for the sentence: in one, the prepositional phrase "with the telescope" attaches to the verb phrase (the telescope is the instrument of seeing); in the other, it attaches to the noun phrase "the man" (the man has the telescope).]
What are the more current solutions?
 Machine Learning
 Transformation-based Learning
 Memory-based Learning
 Maximum Entropy Model
 Hidden Markov Model
 Conditional Random Field
 Support Vector Machines
Machine Learning means TRAINING!
 Corpus: a large, structured set of texts
 Establish usage statistics
 Learn linguistic rules
 The Brown Corpus
 American English, roughly 1 million words
 Tagged with the parts of speech
 http://www.edict.com.hk/concordance/WWWConcappE.htm
Transformation-based Machine Learning
 An ‘error-driven’ approach for learning an ordered set of rules
 1. Generate all rules that correct at least one error.
 2. For each rule:
    (a) Apply to a copy of the most recent state of the training set.
    (b) Score the result using the objective function.
 3. Select the rule with the best score.
 4. Update the training set by applying the selected rule.
 5. Stop if the score is smaller than some pre-set threshold T; otherwise repeat from step 1.
Transformation-based NPE example
 Input:
 “WhitneyNN currentlyADV hasVB theDT rightADJ ideaNN.”
 Expected output:
 “[NP Whitney] [ADV currently] [VB has] [NP the right idea].”
 Rules generated (not all shown):

From   To   If
NN     NP   always
ADJ    NP   the previous word was ART
DT     NP   the next word is an ADJ
DT     NP   the previous word was VB
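A sketch of how such rules might be applied is given below; the rule encoding and the apply_tbl_rules helper are assumptions for illustration, not the authors' actual data format. Each token starts with its POS tag as its chunk tag, and the slide's ART tag is treated as DT to match the input.

def apply_tbl_rules(tokens, rules):
    """tokens: list of [word, pos, chunk]; each rule is (from_tag, to_tag, test)."""
    for from_tag, to_tag, test in rules:
        for i in range(len(tokens)):
            if tokens[i][2] == from_tag and test(tokens, i):
                tokens[i][2] = to_tag
    return tokens

def prev_pos(toks, i): return toks[i - 1][1] if i > 0 else None
def next_pos(toks, i): return toks[i + 1][1] if i + 1 < len(toks) else None

rules = [
    ("NN",  "NP", lambda t, i: True),                    # NN  -> NP always
    ("ADJ", "NP", lambda t, i: prev_pos(t, i) == "DT"),  # ADJ -> NP if the previous word was ART
    ("DT",  "NP", lambda t, i: next_pos(t, i) == "ADJ"), # DT  -> NP if the next word is an ADJ
    ("DT",  "NP", lambda t, i: prev_pos(t, i) == "VB"),  # DT  -> NP if the previous word was VB
]

sentence = [["Whitney", "NN", "NN"], ["currently", "ADV", "ADV"], ["has", "VB", "VB"],
            ["the", "DT", "DT"], ["right", "ADJ", "ADJ"], ["idea", "NN", "NN"]]
print([(w, chunk) for w, _, chunk in apply_tbl_rules(sentence, rules)])
# [('Whitney', 'NP'), ('currently', 'ADV'), ('has', 'VB'),
#  ('the', 'NP'), ('right', 'NP'), ('idea', 'NP')]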
Memory-based Machine Learning
 Classify data according to similarities to other data observed earlier
 “Nearest neighbor”
 Learning:
 Store all “rules” (training instances) in memory
 Classification:
 Given a new test instance X,
• Compare it to all memory instances
• Compute a distance between X and each memory instance Y
 Update the top k of closest instances (nearest neighbors)
 When done, take the majority class of the k nearest neighbors as the class of X

(Daelemans, 2005)
Memory-based Machine Learning Continued
 Distance…?
 The Overlap Function: count the number of mismatching features
 The Modified Value Difference Metric (MVDM): estimate a numeric distance between two “rules”
 The distance between two N-dimensional vectors A, B with discrete (for example symbolic) elements, in a K-class problem, is computed using conditional probabilities:
 d(A,B) = Σj=1..N Σi=1..K | P(Ci | Aj) − P(Ci | Bj) |
 where P(Ci | Aj) is estimated by calculating the number Ni(Aj) of times feature value Aj occurred in vectors belonging to class Ci, and dividing it by the number of times feature value Aj occurred for any class

(Dusch, 1998)
Memory-based NPE example
 Suppose we have the following candidate sequence:
 DT ADJ ADJ NN NN
• “The beautiful, intelligent summer intern”
 In our rule set we have:
 DT ADJ ADJ NN NNP
 DT ADJ NN NN
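A minimal sketch of the nearest-neighbor comparison with the overlap metric, using the sequences above; aligning unequal-length sequences position by position with padding is a simplification assumed here.

from itertools import zip_longest

def overlap_distance(a, b):
    """Count mismatching features (positions) between two tag sequences."""
    return sum(x != y for x, y in zip_longest(a, b))

memory = {
    ("DT", "ADJ", "ADJ", "NN", "NNP"): "NP",
    ("DT", "ADJ", "NN", "NN"): "NP",
}

candidate = ("DT", "ADJ", "ADJ", "NN", "NN")
nearest = min(memory, key=lambda stored: overlap_distance(candidate, stored))
print(nearest, "->", memory[nearest], "| distance", overlap_distance(candidate, nearest))
# ('DT', 'ADJ', 'ADJ', 'NN', 'NNP') -> NP | distance 1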
Maximum Entropy
 The least biased probability distribution consistent with the known information is the one that maximizes the information entropy, i.e., the measure of uncertainty associated with a random variable.
 Consider that we have m unique propositions
 The most informative distribution is one in which we know one of the propositions is true – information entropy is 0
 The least informative distribution is one in which there is no reason to favor any one proposition over another – information entropy is log m
Maximum Entropy applied to NPE
 Let’s consider several French translations of the English word “in”
 p(dans) + p(en) + p(à) + p(au cours de) + p(pendant) = 1
 Now suppose we find that either dans or en is chosen 30% of the time. We must add that constraint to the model and choose the most uniform distribution that satisfies it:
 p(dans) = 3/20
 p(en) = 3/20
 p(à) = 7/30
 p(au cours de) = 7/30
 p(pendant) = 7/30
 What if we now find that either dans or à is used half of the time?
 p(dans) + p(en) = .3
 p(dans) + p(à) = .5
 Now what is the most “uniform” distribution? (a numeric sketch follows below)

(Berger, 1996)
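One way to answer that closing question is numerically: the sketch below maximizes the entropy of the five probabilities subject to the two constraints. Using scipy's general-purpose SLSQP solver is an assumption for illustration; maximum entropy models are normally fit with iterative scaling or gradient-based methods.

import numpy as np
from scipy.optimize import minimize

words = ["dans", "en", "a", "au cours de", "pendant"]   # "a" stands for the accented preposition

def neg_entropy(p):
    p = np.clip(p, 1e-12, 1.0)                   # avoid log(0)
    return float(np.sum(p * np.log(p)))          # minimizing this maximizes entropy

constraints = [
    {"type": "eq", "fun": lambda p: np.sum(p) - 1.0},    # probabilities sum to 1
    {"type": "eq", "fun": lambda p: p[0] + p[1] - 0.3},  # p(dans) + p(en) = 0.3
    {"type": "eq", "fun": lambda p: p[0] + p[2] - 0.5},  # p(dans) + p(a)  = 0.5
]

result = minimize(neg_entropy, x0=np.full(5, 0.2), method="SLSQP",
                  bounds=[(0.0, 1.0)] * 5, constraints=constraints)
print(dict(zip(words, result.x.round(3))))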
Hidden Markov Model
 In a statistical model of a system possessing the Markov property…
 There are a discrete number of possible states
 The probability distribution of future states depends only on the present state and is independent of past states
 These states are not directly observable in a hidden Markov model.
 The goal is to determine the hidden properties from the observable ones.
Hidden Markov Model
[Diagram legend]
 a: transition probabilities
 x: hidden states
 y: observable states
 b: output probabilities
HMM Example
 states = ('Rainy', 'Sunny')
 observations = ('walk', 'shop', 'clean')
 start_probability = {'Rainy': 0.6, 'Sunny': 0.4}
 transition_probability = {
'Rainy' : {'Rainy': 0.7, 'Sunny': 0.3},
'Sunny' : {'Rainy': 0.4, 'Sunny': 0.6}, }
 emission_probability = {
'Rainy' : {'walk': 0.1, 'shop': 0.4, 'clean': 0.5},
'Sunny' : {'walk': 0.6, 'shop': 0.3, 'clean': 0.1}, }
In this case, the weather possesses the Markov property.
HMM as applied to NPE
 In the case of noun phrase extraction, the hidden property is the unknown grammar “rule”
 Our observations are formed by our training data
 Contextual probabilities represent the state transitions
 that is, given the previous two states, what is the likelihood of continuing, ending, or beginning a noun phrase: P(oj | oj-1, oj-2)
 Output probabilities
 given our current state, what is the likelihood of our current word being part of, beginning, or ending a noun phrase: P(ij | oj)
 The decoder seeks max over o1…oT of ∏ j=1…T P(oj | oj-1, oj-2) · P(ij | oj)
The Viterbi Algorithm
 Now that we’ve constructed this probabilistic representation, we need to traverse it
 Finds the most likely sequence of states
Viterbi Algorithm
Example sentence: “Whitney gave a painfully long presentation.”
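A compact Viterbi sketch over the Rainy/Sunny model from the earlier HMM example slide; it returns the most likely hidden state sequence for the observations ('walk', 'shop', 'clean'). In the NPE setting, the hidden states would be chunk tags rather than weather states.

def viterbi(obs, states, start_p, trans_p, emit_p):
    # best[s] = (probability of the best path ending in state s, that path)
    best = {s: (start_p[s] * emit_p[s][obs[0]], [s]) for s in states}
    for o in obs[1:]:
        prev_best, best = best, {}
        for s in states:
            prob, path = max((prev_best[p][0] * trans_p[p][s] * emit_p[s][o],
                              prev_best[p][1]) for p in states)
            best[s] = (prob, path + [s])
    return max(best.values())                      # (probability, state sequence)

states = ("Rainy", "Sunny")
observations = ("walk", "shop", "clean")
start_probability = {"Rainy": 0.6, "Sunny": 0.4}
transition_probability = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
                          "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emission_probability = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
                        "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}

print(viterbi(observations, states, start_probability,
              transition_probability, emission_probability))
# (0.01344, ['Sunny', 'Rainy', 'Rainy'])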
Conditional Random Fields
 An undirected graphical model in which each vertex represents a random variable whose distribution is to be inferred, and each edge represents a dependency between two random variables. In a CRF, the distribution of each discrete random variable Y in the graph is conditioned on an input sequence X
 Yi could be B, I, or O in the NPE case
[Diagram: a linear chain of output variables y1, y2, …, yn-1, yn, each conditioned on the input sequence x1, …, xn-1, xn.]
Conditional Random Fields
 The primary advantage of CRFs over hidden Markov models is their conditional nature, resulting in the relaxation of the independence assumptions required by HMMs
 The transition probabilities of the HMM have been transformed into feature functions that are conditional upon the input sequence
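The sketch below illustrates the kind of input-conditioned features a linear-chain CRF chunker can use; the choice of sklearn-crfsuite as the training library and the tiny training set are assumptions for illustration (any CRF toolkit that accepts one feature dict per token would do).

import sklearn_crfsuite

def token_features(sent, i):
    """sent: list of (word, POS) pairs. Features may inspect any part of the input."""
    word, pos = sent[i]
    feats = {"word.lower": word.lower(), "pos": pos,
             "capitalized": word[0].isupper(), "suffix3": word[-3:]}
    # Context features are fine: a CRF makes no independence assumption about the input.
    if i > 0:
        feats["prev_pos"] = sent[i - 1][1]
    if i + 1 < len(sent):
        feats["next_pos"] = sent[i + 1][1]
    return feats

train_sents = [[("The", "DT"), ("man", "NN"), ("lives", "VBZ"),
                ("next", "JJ"), ("door", "NN")]]
train_tags = [["B", "I", "O", "B", "I"]]

X = [[token_features(s, i) for i in range(len(s))] for s in train_sents]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, train_tags)
print(crf.predict(X))   # e.g. [['B', 'I', 'O', 'B', 'I']]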
Support Vector Machines
 We wish to graph a number of data points of dimension p and separate those points with a (p-1)-dimensional hyperplane that guarantees the maximum distance between the two classes of points – this ensures the best generalization
 These data points represent pattern samples whose dimension depends on the number of features used to describe them
 http://www.csie.ntu.edu.tw/~cjlin/libsvm/#GUI
What if our points are separated by a nonlinear barrier?
 The kernel function (Φ) maps points into a higher-dimensional space (e.g., from 2-D to 3-D) where a linear separator may exist
• The Radial Basis Function (RBF) kernel is currently the most widely used choice for this
SVMs applied to NPE
 Normally, SVMs are binary classifiers
 For NPE we generally want to know about (at least) three classes:
 B: a token is at the beginning of a chunk
 I: a token is inside a chunk
 O: a token is outside a chunk
 We can consider one class vs. all other classes (one-vs-rest)
 We could do a pairwise classification (one-vs-one)
 If we have k classes, we build k · (k-1)/2 classifiers (see the sketch below)
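A sketch of the pairwise setup using scikit-learn's SVC, which internally trains k·(k-1)/2 one-vs-one classifiers (three for B/I/O); the numeric token features below are toy assumptions standing in for real lexical and POS features.

from sklearn.svm import SVC

# toy feature vectors: [is_determiner, is_adjective, is_noun, is_verb]
X_train = [[1, 0, 0, 0],   # "the"   -> B (begins a chunk)
           [0, 1, 0, 0],   # "right" -> I (inside a chunk)
           [0, 0, 1, 0],   # "idea"  -> I
           [0, 0, 0, 1]]   # "has"   -> O (outside any chunk)
y_train = ["B", "I", "I", "O"]

clf = SVC(kernel="rbf", decision_function_shape="ovo")
clf.fit(X_train, y_train)

print(clf.predict([[1, 0, 0, 0], [0, 0, 0, 1]]))      # e.g. ['B' 'O']
# With 3 classes the one-vs-one decision function has 3 = 3*(3-1)/2 columns.
print(clf.decision_function([[0, 1, 0, 0]]).shape)    # (1, 3)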
Performance Metrics Used
 Precision = (number of correct responses) / (number of responses)
 Recall = (number of correct responses) / (number correct in the key)
 F-measure = ((β² + 1) · P · R) / (β² · P + R)
where β² represents the relative weight of recall to precision (typically 1)

(Bikel, 1998)
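A small sketch of these metrics (the example counts are invented); beta weights recall relative to precision, and beta = 1 gives the usual F1 score.

def precision(correct_responses, total_responses):
    return correct_responses / total_responses

def recall(correct_responses, correct_in_key):
    return correct_responses / correct_in_key

def f_measure(p, r, beta=1.0):
    return (beta ** 2 + 1) * p * r / (beta ** 2 * p + r)

# e.g. the system proposed 8 NPs, 6 of them correct, and the key holds 10 NPs
p, r = precision(6, 8), recall(6, 10)
print(round(p, 2), round(r, 2), round(f_measure(p, r), 3))   # 0.75 0.6 0.667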
Summary of primary work (method, implementation, evaluation data, F-measure, pros, cons):

Dejean
 Method: Simple rule-based
 Implementation: “ALLiS”; uses XML input; not available
 Evaluation data: CONLL 2000 task
 F-measure: 92.09
 Pros: Extremely simple and quick; doesn’t require a training corpus
 Cons: Not very robust; difficult to improve upon; extremely difficult to generate rules

Ramshaw, Marcus
 Method: Transformation-Based Learning
 Implementation: C++, Perl; available!
 Evaluation data: Penn Treebank
 F-measure: 92.03 - 93
 Pros: …
 Cons: Extremely dependent upon the training set and its “completeness” – how many different ways the NPs are formed; requires a fair amount of memory

Tjong Kim Sang
 Method: Memory-Based Learning
 Implementation: “TiMBL”, Python; available!
 Evaluation data: Penn Treebank, CONLL 2000 task
 F-measure: 93.34, 92.5
 Pros: Highly suited to the NLP task
 Cons: Has no ability to intelligently weight “important” features; also cannot identify feature dependency – both of these problems result in a loss of accuracy

Koeling
 Method: Maximum Entropy
 Implementation: Not available
 Evaluation data: CONLL 2000 task
 F-measure: 91.97
 Pros: First statistical approach; higher accuracy
 Cons: Always makes the best local decision without much regard at all for position

Molina, Pla
 Method: Hidden Markov Model
 Implementation: Not available
 Evaluation data: CONLL 2000 task
 F-measure: 92.19
 Pros: Takes position into account
 Cons: Makes conditional independence assumptions which ignore special input features such as capitalization, suffixes, surrounding words

Sha, Pereira
 Method: Conditional Random Fields
 Implementation: Java; is available… sort of. CRF++ in C++ by Kudo is also available!
 Evaluation data: Penn Treebank, CONLL 2000 task
 F-measure: 94.38 (“no significant difference”)
 Pros: Can handle millions of features; handles both position and dependencies
 Cons: “Overfitting”

Kudo, Matsumoto
 Method: Support Vector Machines
 Implementation: C++, Perl, Python; available!
 Evaluation data: Penn Treebank, CONLL 2000 task
 F-measure: 94.22, 93.91
 Pros: Minimizes error resulting in higher accuracy; handles tons of features
 Cons: Doesn’t really take position into account