
G22.2591 - Advanced Natural Language Processing - Spring 2004


Lecture 5
Name Recognition
Why name recognition?
Name recognition was introduced as a separate task in Message Understanding
Conference - 6 (see also the paper by Grishman and Sundheim). Through
earlier IE evaluations, system developers came to recognize that name
recognition and classification was an important part of text processing, even if
it was not recognized as basic in linguistic study. Making it a separate task
encouraged research to improve this technology, and emphasized its value for a
range of applications (document indexing, and later question answering).
For MUC-6, there were three name categories -- people, organizations, and
locations. Date, time, percentage, and currency expressions were also included
under name recognition. Some evaluations since then have added categories ...
artifact, facility, weapon, ... . In fact, some systems for open-domain question
answering have added a very large (100+) number of categories. However,
almost all studies have been done with the original set of three name
categories. Similar evaluations have been done for quite a few foreign
languages; CoNLL-2002 shared task did Dutch and Spanish; CoNLL-2003
shared task did English and German.
How to measure scores?
Name recognition is scored by recall, precision, and F-measure (combination of
recall and precision). The simplest rule is to require perfect match -- you only
get credit if you get the type, start, and end of a name correct (this is the metric
used, for example, in the Japanese IREX evaluation); the MUC evaluations
used a more generous scoring, with partial score for identifying a name of any
sort, for getting its type correct, and for getting its extent correct.
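To make the exact-match metric concrete, here is a minimal sketch (my own illustration, not the official MUC or IREX scorer) that represents each name as a (type, start, end) tuple:

```python
# A minimal exact-match scorer: a predicted name counts as correct only
# if its type, start, and end all match a key (gold) name.

def score_names(key, response):
    """Return (recall, precision, F-measure) for exact-match scoring.
    Names are (type, start, end) tuples."""
    key, response = set(key), set(response)
    correct = len(key & response)
    recall = correct / len(key) if key else 0.0
    precision = correct / len(response) if response else 0.0
    f = (2 * recall * precision / (recall + precision)
         if recall + precision else 0.0)
    return recall, precision, f

# One name matched exactly; the other has the wrong extent, so it gets
# no credit under exact match (MUC scoring would give partial credit).
key = [("PERSON", 0, 2), ("ORGANIZATION", 5, 7)]
response = [("PERSON", 0, 2), ("ORGANIZATION", 5, 8)]
print(score_names(key, response))  # (0.5, 0.5, 0.5)
```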
How well do people do?
In a small study for MUC-6 (17 articles), Sundheim reported <5%
interannotator (key-to-key) error. Agreement is probably enhanced in
languages where names are capitalized and for text where the annotator is
familiar with most of the names. Without capitalization, it can be hard to tell
unfamiliar organization names from common noun phrases.
Hand-coded rules
For a specific domain, it is possible to do very well with hand-coded rules and
dictionaries. On the MUC-6 evaluation (a very favorable situation, where the
source and general topic of the test data was known in advance), the SRA
system, based on hand-coded rules, got F=96.4. Writing rules by hand,
however, requires some skill and considerable time.
The hand-coded rules take advantage of
- known names (through lists of well-known places, organizations, and people)
- characteristic suffixes for organizations (Corp., Associates, ...) and locations (Island, Bay)
- first names for people
- titles for people
- other mentions of the same name in an article
Note that sometimes the type decision is based upon left context, and
sometimes upon right context, so it would be difficult for taggers which operate
deterministically from left to right or from right to left to perform optimally.
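The following sketch illustrates a few of these cues (the lists and rules are toy examples, not SRA's actual rule set); note that the title test uses left context while the suffix tests use the rightmost token of the name:

```python
# Toy illustration of hand-coded cues; these lists are examples only.
ORG_SUFFIXES = {"Corp.", "Inc.", "Associates"}
LOC_SUFFIXES = {"Island", "Bay"}
PERSON_TITLES = {"Mr.", "Ms.", "Dr."}
FIRST_NAMES = {"John", "Mary"}

def classify_name(tokens, left_context):
    """Guess the type of a candidate name from its own tokens and the
    token to its left; returns None if no rule fires."""
    if left_context in PERSON_TITLES:   # left-context cue: 'Mr. Smith'
        return "PERSON"
    if tokens[-1] in ORG_SUFFIXES:      # suffix cue: 'Acme Corp.'
        return "ORGANIZATION"
    if tokens[-1] in LOC_SUFFIXES:      # suffix cue: 'Coney Island'
        return "LOCATION"
    if tokens[0] in FIRST_NAMES:        # first-name cue: 'John ...'
        return "PERSON"
    return None

print(classify_name(["Acme", "Corp."], left_context="at"))  # ORGANIZATION
print(classify_name(["Smith"], left_context="Mr."))         # PERSON
```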
Supervised training
Like POS tagging and chunking, named entity recognition has been attempted
with a wide variety of machine learning methods. More than the syntactic tasks,
performance on NE recognition depends on the variety of resources which are
brought to bear. CoNLL evaluations are relatively 'pure' ... the systems
basically just learn from the provided training corpus. On the other hand, 'real'
systems make use of as many lists and as much training data as available. This
has a substantial effect on performance. In addition, performance is strongly
affected by the domain of the training and test data. These two effects can
make it difficult to compare results across different evaluations.


As with chunking, NE tagging can be recast as a token classification task. We
will have an "O" tag (token is not part of a named entity), and "B-X" and "I-X"
tags for each name type X.
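For example (with a small helper, my own sketch, for recovering spans from a tag sequence):

```python
# "Mr. John Smith of New York" under the BIO encoding:
tokens = ["Mr.", "John", "Smith", "of", "New", "York"]
tags   = ["O", "B-PER", "I-PER", "O", "B-LOC", "I-LOC"]

def bio_to_spans(tags):
    """Recover (type, start, end) spans from a BIO tag sequence
    (end is exclusive)."""
    spans, start, typ = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel flushes the last span
        if tag == "O" or tag.startswith("B-"):
            if typ is not None:
                spans.append((typ, start, i))
            start, typ = (i, tag[2:]) if tag.startswith("B-") else (None, None)
    return spans

print(bio_to_spans(tags))  # [('PER', 1, 3), ('LOC', 4, 6)]
```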
Markov Models for Name Recognition
The NYU Jet system uses a straightforward HMM for named entity tagging.
The simplest HMM has a single state for each name type, and a single state for
not-a-name (NaN). However, typically the first and last word of a name have
different distributions, and the words immediately before or after a word often
give a good indication of the name type (for example, 'Mr.' before a name is a
clear indication of a person, while 'near' before a name probably indicates a
location). Therefore, we were able to create a more accurate model by having
separate states for the words immediately before and after a name, and for the
first and last tokens of a name. This added about 2 points to recall (89 to 91)
and 4 points to precision (82 to 86).
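A sketch of what such an enriched state inventory might look like (the naming and layout here are illustrative guesses, not Jet's actual implementation):

```python
# Illustrative state inventory: for each name type, separate states for
# the first and last tokens of the name and for the tokens immediately
# before and after it, plus one not-a-name state.
NAME_TYPES = ["PERSON", "ORGANIZATION", "LOCATION"]

states = ["NaN"]                   # not-a-name
for t in NAME_TYPES:
    states += [
        "pre-" + t,                # token just before a name ('Mr.', 'near')
        "first-" + t,              # first token of the name
        "mid-" + t,                # interior tokens of the name
        "last-" + t,               # last token of the name
        "post-" + t,               # token just after the name
    ]
print(len(states))                 # 16 states instead of the basic 4
```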
BBN's Nymble name tagger (Daniel M. Bikel; Scott Miller; Richard Schwartz;
Ralph Weischedel. Nymble: a High-Performance Learning Name-finder. Proc.
ANLP 97.) is perhaps the best-known name tagger.
They used several techniques to enhance performance over a basic HMM.
Most notably, they used bigram probabilities: they differentiated between the
probability of generating the first word of a name and subsequent words of a
name. The probability of generating the first word was made dependent on the
prior state; the probability of generating subsequent words was made
dependent on the prior word. The probability of a state transition was made
dependent on the prior word. This had to be combined with smoothing to
handle the case of unseen bigrams.
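A hedged sketch of the bigram emission idea with simple interpolation smoothing (the structure and back-off weights are simplified; this is not BBN's exact model):

```python
# Simplified bigram emission probability with interpolation smoothing;
# `counts` is assumed to hold raw frequencies from the training corpus.

def p_next_word(word, prev_word, state, counts, lam=0.9):
    """P(next word of a name | previous word, current state), backed
    off to the unigram P(word | state) for unseen bigrams."""
    big = counts["bigram"].get((state, prev_word, word), 0)
    big_total = counts["bigram_ctx"].get((state, prev_word), 0)
    uni = counts["unigram"].get((state, word), 0)
    uni_total = counts["unigram_ctx"].get(state, 0)
    backoff = uni / uni_total if uni_total else 0.0
    if big_total == 0:
        return backoff             # pure back-off for an unseen context
    return lam * (big / big_total) + (1 - lam) * backoff

counts = {
    "bigram": {("PERSON", "John", "Smith"): 3},
    "bigram_ctx": {("PERSON", "John"): 4},
    "unigram": {("PERSON", "Smith"): 5},
    "unigram_ctx": {"PERSON": 100},
}
print(p_next_word("Smith", "John", "PERSON", counts))  # 0.9*0.75 + 0.1*0.05
```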
HMMs are generative models, and we noted before some difficulties with such
models. A generative model produces a joint probability over observation and
label sequences; typically we compute P(new state | prior state) and P(current
word | current state). It is difficult to represent long-range or multiple
interacting features in such a formalism. Instead, researchers have used
functions which compute the state probability given the input -- a formalism
which allows for a richer set of features.


Sekine et al. (Satoshi Sekine; Ralph Grishman; Hiroyuki Shinnou. A Decision
Tree Method for Finding and Classifying Names in Japanese Texts. Sixth
WVLC, 1998) used a decision tree method for Japanese named entity recognition.
The decision tree yielded information on the probability of the various tags. A
Viterbi algorithm then computed the most likely tagging of the entire sentence.
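A minimal Viterbi sketch over per-token tag probabilities, as a decision tree (or any local classifier) might supply them; the probabilities and transition table in the example are invented for illustration:

```python
import math

def viterbi(tag_probs, trans, tags):
    """tag_probs[i][t]: the local classifier's P(tag t at token i);
    trans[(t1, t2)]: P(tag t2 follows tag t1). Returns the best tag
    sequence by dynamic programming over log probabilities."""
    n = len(tag_probs)
    best = [{t: math.log(tag_probs[0].get(t, 1e-12)) for t in tags}]
    back = [{}]
    for i in range(1, n):
        best.append({})
        back.append({})
        for t in tags:
            score, prev = max(
                (best[i - 1][p] + math.log(trans.get((p, t), 1e-12))
                 + math.log(tag_probs[i].get(t, 1e-12)), p)
                for p in tags)
            best[i][t], back[i][t] = score, prev
    t = max(best[-1], key=best[-1].get)
    path = [t]
    for i in range(n - 1, 0, -1):
        t = back[i][t]
        path.append(t)
    return path[::-1]

tags = ["O", "B-PER"]
tag_probs = [{"O": 0.4, "B-PER": 0.6}, {"O": 0.9, "B-PER": 0.1}]
trans = {("O", "O"): 0.8, ("O", "B-PER"): 0.2,
         ("B-PER", "O"): 0.7, ("B-PER", "B-PER"): 0.3}
print(viterbi(tag_probs, trans, tags))  # ['B-PER', 'O']
```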
Borthwick et al. (Andrew Borthwick; John Sterling; Eugene Agichtein; Ralph
Grishman. Exploiting Diverse Knowledge Sources via Maximum Entropy in
Named Entity Recognition. Sixth WVLC, 1998) used a maximum entropy
method to compute the tags. Again, a Viterbi decoder was used to select the
best tagging. By itself the method did fairly well (92.2 F on dry-run). More
interestingly, it could be combined with the patterns of the NYU hand-coded
rule system, with each rule a separate feature. The rule-based system by itself
also got 92.2 F; the combined system got 95.6 F, roughly on a par with the best
commercial system.
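Schematically, the combination amounts to adding one binary feature per hand-coded rule firing alongside the ordinary lexical features (the feature names and rule inventory below are illustrative, not Borthwick's actual feature set):

```python
# Each firing of a hand-coded rule becomes a binary feature; the
# maximum entropy model then learns how much to trust each rule.

def features(token, left, right, rule_firings):
    feats = {
        "word=" + token.lower(): 1,
        "cap": int(token[:1].isupper()),
        "left=" + left.lower(): 1,
        "right=" + right.lower(): 1,
    }
    for rule in rule_firings:      # e.g. {"rule_title_person"}
        feats[rule] = 1
    return feats

print(features("Smith", "Mr.", "said", {"rule_title_person"}))
```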
McCallum (Maximum Entropy Markov Models for Information Extraction and
Segmentation. Andrew McCallum, Dayne Freitag and Fernando Pereira.
ICML-2000) describes general Maximum Entropy Markov Models (MEMMs)
as computing P(current state | input, prior state) using Maximum Entropy
methods. The Ratnaparkhi POS tagger is close to this model. McCallum notes
that the Borthwick model is somewhat weaker in that the current state
probability is conditioned only on the input, not on the prior state, and that
may be why it did not do quite as well as the Nymble HMM model.
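Schematically (the 'classifiers' dict below is a stand-in for trained per-prior-state maximum entropy models, not McCallum's code):

```python
# MEMM decomposition: one conditional model per prior state computes
# P(current state | input), and the sequence probability is the
# product of these local conditionals.

def sequence_prob(observations, states, classifiers, start="START"):
    """P(state sequence | observations) under an MEMM."""
    p, prior = 1.0, start
    for obs, state in zip(observations, states):
        # classifiers[prior](obs) returns a dict: state -> probability
        p *= classifiers[prior](obs)[state]
        prior = state
    return p

# Toy usage: constant classifiers standing in for trained models.
classifiers = {
    "START": lambda obs: {"O": 0.7, "B-PER": 0.3},
    "O":     lambda obs: {"O": 0.8, "B-PER": 0.2},
    "B-PER": lambda obs: {"O": 0.6, "B-PER": 0.4},
}
print(sequence_prob(["Mr.", "Smith"], ["O", "B-PER"], classifiers))  # 0.14
```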
Discriminative training for HMMs
Another concern we had with HMMs was that the parameters learned may not
be the optimal ones for the ultimate classification task. As an alternative, we
considered discriminative methods ... methods which were trained to make the
discrimination between classes directly. We considered one such approach,
SVMs, last week. Collins (Discriminative Training Methods for Hidden
Markov Models: Theory and Experiments with Perceptron Algorithms,
EMNLP 02; Collins and Duffy, ACL 2002) has described a somewhat different
approach. The basic idea was to use error-driven training. Collins reported a
15% reduction in error rate on a named entity tagging task by using this
approach.
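A minimal sketch of the error-driven (perceptron) training loop; the decode function (a Viterbi search over tag sequences) and the feature map phi are assumed, not shown:

```python
# Error-driven training: decode with the current weights, then move the
# weights toward the gold feature vector and away from the prediction.

def perceptron_train(corpus, decode, phi, epochs=5):
    """corpus: list of (words, gold_tags) pairs; decode(words, weights)
    returns the best tag sequence under the current weights; phi(words,
    tags) returns a sparse feature-count dict."""
    weights = {}
    for _ in range(epochs):
        for words, gold_tags in corpus:
            pred_tags = decode(words, weights)
            if pred_tags != gold_tags:
                for f, v in phi(words, gold_tags).items():
                    weights[f] = weights.get(f, 0.0) + v
                for f, v in phi(words, pred_tags).items():
                    weights[f] = weights.get(f, 0.0) - v
    return weights
```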
Looking ahead to next week ... unsupervised learning of names

Tomek Strzalkowski; Jin Wang. A Self-Learning Universal Concept Spotter. COLING 96.
Michael Collins; Yoram Singer. Unsupervised Models for Named Entity Classification. EMNLP 99.
Silviu Cucerzan; David Yarowsky. Language Independent Named Entity Recognition Combining Morphological and Contextual Evidence. EMNLP 99.
