NLP Unit-5

The document discusses various applications of Natural Language Processing (NLP), including speech recognition, text processing, dialogue systems, sentiment analysis, and named entity recognition. It highlights the challenges and successes associated with each application, such as speech understanding difficulties and the complexities of sentiment detection. Additionally, it emphasizes the importance of NLP in enhancing user interfaces and extracting meaningful information from text.

Applications of NLP
Applications
• What uses of the computer involve language?
• What language use is involved?
• What are the main problems?
• How successful are they?

Speech applications
• Speech recognition (speech-to-text)
  – Uses
    • As a general interface to any text-based application
    • Text dictation
• Speech understanding
  – Not the same: the computer must understand the intention, not necessarily the exact words
  – Uses
    • As a general interface to any application where meaning is important rather than the exact text
    • As part of speech translation
• Difficulties
  – Separating speech from background noise
  – Filtering out performance errors (disfluencies)
  – Recognizing individual sound distinctions (similar phonemes)
  – Variability in human speech
  – Ambiguity in language (homophones)
Speech applications
• Voice recognition
  – Not really a linguistic issue
  – But shares some of the techniques and problems
• Text-to-speech (speech synthesis)
  – Uses
    • The computer can speak to you
    • Useful where the user cannot look at (or see) the screen
  – Difficulties
    • Homograph disambiguation
    • Prosody determination (pitch, loudness, rhythm)
    • Naturalness (pauses, disfluencies?)
Word processing
• Check and correct spelling, grammar and style
• Types of spelling errors
  – Non-existent words
    • Easy to identify
    • But the suggested correction is not always appropriate
  – Accidental homographs
• Deliberate ‘errors’
  – Foreign words
  – Proper names, neologisms
  – Illustrations of spelling errors!
Better word processing
• Spell checking for homonyms
• Grammar checking
• Tuned to the user
– You can (already) add your own auto-corrections
– Non-native users (‘Interference checking’)
– Dyslexics and other special needs users
• Intelligent word processing
– Find/replace that knows about morphology, syntax

Text prediction
• Speeds up word processing
• Facilitates text dictation
• At the lexical level, already seen in SMS
• More sophisticated versions might be based on a corpus of previously seen texts (see the sketch below)
• Especially useful in repeated tasks
  – Translation memory
  – Authoring memory
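A minimal sketch of the corpus-based idea, assuming a simple bigram frequency model (the toy corpus and helper names are illustrative):

from collections import Counter, defaultdict

# Stand-in corpus; a real system would use the user's own documents
# (e.g. a translation or authoring memory).
corpus = "the cat sat on the mat and the cat slept on the sofa".split()

# next_words[w] counts the words that followed w in the corpus.
next_words = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    next_words[prev][nxt] += 1

def predict(prev_word, k=3):
    """Suggest up to k most frequent continuations of prev_word."""
    return [w for w, _ in next_words[prev_word].most_common(k)]

print(predict("the"))  # ['cat', 'mat', 'sofa'] for this corpus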
Dialogue systems
• Computer enters a dialogue with user
– Usually specific cooperative task-oriented dialogue
– Often over the phone
– Examples?
• Usually speech-driven, but text also appropriate
• A modern application is automatic transaction processing
• A limited domain may simplify the language aspect
• The domain ‘model’ will play a big part
• Simplest case: choose the closest match from a (hidden) menu of expected answers
• More realistic versions involve significant problems

Dialogue systems
• Apart from speech recognition and synthesis issues, NL components include …
• Topic tracking
• Anaphora resolution
– Use of pronouns, ellipsis
• Reply generation
– Cooperative responses
– Appropriate use of anaphora
Conversation machines (another name for dialogue systems)
• Another old AI goal (cf. Turing test)
• Also (amazingly) for amusement
• Mainly speech, but also text based
• Early famous approaches include ELIZA, which
showed what you could do by cheating
• Modern versions have a lot of NLP, especially
discourse modelling, and focus on the language
generation component

QA systems
• NL interface to knowledge database
• Handling queries in a natural way
• Must understand the domain
• Even if typed, dialogue must be natural
• Handling of anaphora
e.g. When is the next flight to Sydney? 6.50
And the one after? 7.50
What about Melbourne then? 7.20
OK I’ll take the last one.
IR systems
• Like QA systems, but the aim is to retrieve information from textual sources that contain the info, rather than from a structured database
• Two aspects
– Understanding the query
– Processing text to find the answer
• Named Entity Recognition

Named entity recognition
• Typical textual sources involve names (people, places, corporations), dates, amounts, etc.
• NER seeks to identify these strings and label them
• Clues are often linguistic
• Also involves recognizing synonyms and processing anaphora
Automatic summarization
• Renewed interest since the mid-1990s, probably due to the growth of the WWW
• Different types of summary
– indicative vs. informative
– abstract vs. extract
– generic vs. query-oriented
– background vs. just-the-news
– single-document vs. multi-document

Automatic summarization
• topic identification
• stereotypical text structure
• cue words
• high-frequency indicator phrases
• intratext connectivity
• discourse structure centrality
• topic fusion
• concept generalization
• semantic association
• summary generation
• sentence planning to achieve information compaction
Text mining
• Discovery by computer of new, previously unknown information, by automatically extracting information from different written resources (typically the Internet).
• Similar to data mining (e.g. using consumer purchasing patterns to predict which products to place close together on shelves), but based on textual information.
• A big application area is the biosciences.
Text mining
• preprocessing of document collections (text
categorization, term extraction)
• storage of the intermediate representations
• techniques to analyze these intermediate
representations (distribution analysis,
clustering, trend analysis, association rules,
etc.)
• visualization of the results.
Story understanding
• An old AI application
• Involves …
– Inference
– Ability to paraphrase (to demonstrate
understanding)
• Requires access to real-world knowledge
• Often coded in “scripts” and “frames”

Machine Translation
• Oldest non-numerical application of computers
• Involves processing of source-language as in other
applications, plus …
– Choice of target-language words and structures
– Generation of appropriate target-language strings
• The main difficulty is that source-language analysis and/or cross-lingual transfer implies varying levels of “understanding”, depending on similarities between the two languages
• MT ≠ tools for translators, but some overlap

Machine Translation
• First approaches perhaps most intuitive: look up
words and then do local rearrangement
• “Second generation” took linguistic approach:
grammars, rule systems, elements of AI
• Recent (since 1990) trend to use empirical
(statistical) approach based on large corpora of
parallel text
– Use existing translations to “learn” translation models,
either a priori (Statistical MT ≈ machine learning) or on
the fly (Example-based MT ≈ case-based reasoning)
– Convergence of empirical and rationalist (rule-based)
approaches: learn models based on treebanks or similar.
Language teaching
• CALL (computer-assisted language learning)
• Grammar checking but linked to models of
– The topic
– The learner
– The teaching strategy
• Grammars (etc) can be used to create
language-learning exercises and drills

Assistive computing
• Interfaces for disabled users
• Many devices involve language issues, e.g.
– Text simplification or summarization for users
with low literacy (partially sighted, dyslexic,
non-native speaker, illiterate, etc.)
– Text completion (predictive or retrospective)
• Works on basis of probabilities or previous
examples

Conclusion
• Many different applications
• But also many common elements
– Basic tools (lexicons, grammars)
– Ambiguity resolution
  – Need for (but impossibility of fully having) real-world knowledge
• Humans are really very good at language
– Can understand noisy or incomplete messages
– Good at guessing and inferring
What is SA & OM (Sentiment Analysis & Opinion Mining)?
• Identify the orientation of opinion in a piece of text
  • “The movie was fabulous!”
  • “The movie stars Mr. X”
  • “The movie was horrible!”
• Can be generalized to a wider set of emotions
Motivation
• Knowing sentiment is a very natural ability of a human being. Can a machine be trained to do it?
• SA aims at getting sentiment-related knowledge, especially from the huge amount of information on the internet
• Can be generally used to understand opinion in a set of documents
Tripod of Sentiment Analysis
[Figure: Sentiment Analysis rests on a tripod of three fields: Cognitive Science, Machine Learning, and Natural Language Processing.]
Why sentiment analysis?
• Movie: is this review positive or negative?
• Products: what do people think about the new iPhone?
• Public sentiment: how is consumer confidence? Is despair
increasing?
• Politics: what do people think about this candidate or issue?
• Prediction: predict election outcomes or market trends from
sentiment

• Emotion: brief organically synchronized … evaluation of a major event
– angry, sad, joyful, fearful, ashamed, proud, elated
• Mood: diffuse non-caused low-intensity long-duration change in subjective feeling
– cheerful, gloomy, irritable, listless, depressed, buoyant
• Interpersonal stances: affective stance toward another person in a specific interaction
– friendly, flirtatious, distant, cold, warm, supportive, contemptuous
• Attitudes: enduring, affectively colored beliefs, dispositions towards objects or persons
– liking, loving, hating, valuing, desiring
• Personality traits: stable personality dispositions and typical behavior tendencies
– nervous, anxious, reckless, morose, hostile, jealous
Sentiment Analysis
• Sentiment analysis is the detection of attitudes: “enduring, affectively colored beliefs, dispositions towards objects or persons”
  1. Holder (source) of attitude
  2. Target (aspect) of attitude
  3. Type of attitude
     • From a set of types: like, love, hate, value, desire, etc.
     • Or (more commonly) simple weighted polarity: positive, negative, neutral, together with strength
  4. Text containing the attitude
Sentiment Analysis
• Simplest task:
– Is the attitude of this text positive or negative?
• More complex:
– Rank the attitude of this text from 1 to 5
• Advanced:
  – Detect the target, source, or complex attitude types
Finding sentiment of a sentence
• Important for finding aspects or attributes
  – Target of sentiment
• “The food was great but the service was awful”
Finding aspect/attribute/target of sentiment
• Frequent phrases + rules
  – Find all highly frequent phrases across reviews (“fish tacos”)
  – Filter by rules like “occurs right after a sentiment word”
    • “…great fish tacos” means fish tacos is a likely aspect
• Typical aspects found per domain (a sketch of this heuristic follows the table):

  Casino: buffet, pool, casino, resort, beds
  Children’s Barber: haircut, job, experience, kids
  Greek Restaurant: food, wine, service, appetizer, lamb
  Department Store: selection, department, sales, shop, clothing
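A minimal sketch (not any published system's code) of this frequent-phrases-plus-rules heuristic; the tiny review list and sentiment lexicon are stand-ins:

from collections import Counter

reviews = [
    "great fish tacos and a great pool",
    "the fish tacos were amazing",
    "awful service but great fish tacos",
]
# Tiny stand-in sentiment lexicon; a real system would use a large one.
sentiment_words = {"great", "amazing", "awful", "horrible", "fabulous"}

phrase_counts = Counter()
after_sentiment = Counter()
for review in reviews:
    tokens = review.lower().split()
    for i in range(len(tokens) - 1):
        bigram = " ".join(tokens[i:i + 2])
        phrase_counts[bigram] += 1
        # Rule: the phrase occurs right after a sentiment word.
        if i > 0 and tokens[i - 1] in sentiment_words:
            after_sentiment[bigram] += 1

# Keep phrases that are frequent AND often follow a sentiment word.
aspects = [p for p, c in phrase_counts.items()
           if c >= 2 and after_sentiment[p] >= 1]
print(aspects)  # ['fish tacos']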
Finding aspect/attribute/target of sentiment
• The aspect name may not be in the sentence
• For restaurants/hotels, aspects are well-understood
• Supervised classification
  – Hand-label a small corpus of restaurant review sentences with their aspect
    • food, décor, service, value, NONE
  – Train a classifier to assign an aspect to a sentence
    • “Given this sentence, is the aspect food, décor, service, value, or NONE?”
Putting it all together: finding sentiment for aspects

Reviews → Text Extractor → (sentences & phrases) → Sentiment Classifier → (sentences & phrases) → Aspect Extractor → (sentences & phrases) → Aggregator → Final Summary
Summary on Sentiment
• Generally modeled as a classification or regression task (see the sketch below)
  – predict a binary or ordinal label
• Features:
  – Negation is important
  – Using all words (in Naive Bayes) works well for some tasks
  – Finding subsets of words may help in other tasks
    • Hand-built polarity lexicons
    • Use seeds and semi-supervised learning to induce lexicons
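A minimal sketch of such a classifier, assuming scikit-learn; the four training sentences and the NOT_-prefixing negation trick are illustrative:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

def mark_negation(text):
    """Prefix tokens after a negation word so 'not good' != 'good'."""
    out, negate = [], False
    for tok in text.lower().split():
        out.append("NOT_" + tok if negate else tok)
        if tok in {"not", "no", "never"}:
            negate = True
        elif tok.endswith((".", ",", "!", "?")):  # negation scope ends
            negate = False
    return " ".join(out)

train_texts = ["the movie was fabulous", "great acting and plot",
               "the movie was horrible", "not good at all"]
train_labels = ["pos", "pos", "neg", "neg"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit([mark_negation(t) for t in train_texts], train_labels)

print(model.predict([mark_negation("not a fabulous movie")]))  # ['neg']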
Computational work on other affective states

• Emotion:
– Detecting annoyed callers to dialogue system
– Detecting confused/frustrated versus confident students
• Mood:
– Finding traumatized or depressed writers
• Interpersonal stances:
– Detection of flirtation or friendliness in conversations
• Personality traits:
– Detection of extroverts
Detection of Friendliness
• Friendly speakers use a collaborative conversational style
  – Laughter
  – Less use of negative emotional words
  – More sympathy
    • “That’s too bad”, “I’m sorry to hear that”
  – More agreement
    • “I think so too”
Named Entity (NE) Recognition
• What is an NE and what is not an NE
• How to identify NEs
• Tagset and annotation guidelines
• Methods used in developing NER
Why do NER?
• Key part of an Information Extraction system
• Robust handling of proper names is essential for many applications such as summarization, IR, anaphora resolution, …
• Pre-processing for different classification levels
• Information filtering
• Information linking
What is NER ?
• NER involves identification of proper names
in texts, and classification into a set of
predefined categories of interest.
• Three universally accepted categories:
• Person, location and organisation
• Other common tasks: recognition of date/time
expressions, measures (percent, money, weight
etc), email addresses etc.
• Other domain-specific entities: names of
Drugs, Genes, medical conditions, names of
ships, bibliographic references etc.
NER Definition
• Named entity recognition (NER) (also known as entity identification (EI) and entity extraction) is the task of locating and classifying atomic elements in text into predefined categories such as the names of persons, organizations, locations, expressions of times, quantities, monetary values, percentages, etc.

  John sold 5 companies in 2002.

  <ENAMEX TYPE="PERSON">John</ENAMEX> sold
  <NUMEX TYPE="QUANTITY">5</NUMEX> companies in
  <TIMEX TYPE="DATE">2002</TIMEX>.
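For comparison, a minimal sketch of off-the-shelf NER, assuming spaCy with its small English model installed; the labels printed are spaCy's own (PERSON, DATE, CARDINAL, …), not the ENAMEX/NUMEX/TIMEX markup above:

import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the model is downloaded
doc = nlp("John sold 5 companies in 2002.")

for ent in doc.ents:
    print(ent.text, ent.label_)
# Expected (model-dependent):
#   John PERSON
#   5 CARDINAL
#   2002 DATE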
What is not NER?
• NER is not event recognition.
• NER does not create templates.
• NER does not perform co-reference or entity linking,
  – though these processes are often implemented alongside NER as part of a larger IE system.
• NER is not just matching text strings with pre-defined lists of names.
  – It recognises entities which are being used as entities in a given context.
• NER is not an easy task!
What is a Named Entity?
• Named Entities are
  – Noun phrases
  – Rigid designators: a rigid designator denotes the same thing in all possible worlds in which that thing exists, and does not designate anything else in those possible worlds in which that thing does not exist
EXAMPLES for Named Entity

• Hotel & Taj Hotel

• Flower & Rose Flower

• Beach & Kovalam Beach

• Airport & Indira Gandhi International airport

• The School & Good Shepherd School


Word Sense Disambiguation (WSD)
• Given
  • A word in context
  • A fixed inventory of potential word senses
• Decide which sense of the word this is
• Why? Machine translation, QA, speech synthesis
• What set of senses?
  • English-to-Spanish MT: the set of Spanish translations
  • Speech synthesis: homographs like bass and bow
  • In general: the senses in a thesaurus like WordNet
Two variants of WSD task
• Lexical sample task
  • Small pre-selected set of target words (line, plant)
  • An inventory of senses for each word
  • Supervised machine learning: train a classifier for each word
• All-words task
  • Every word in an entire text
  • A lexicon with senses for each word
  • Data sparseness: can’t train word-specific classifiers
WSD Methods
• Supervised Machine Learning
• Thesaurus/Dictionary Methods
• Semi-Supervised Learning
Supervised Machine Learning Approaches
• Supervised machine learning approach:
  • a training corpus of words tagged in context with their sense
  • used to train a classifier that can tag words in new text
• Summary of what we need:
  • the tag set (“sense inventory”)
  • the training corpus
  • a set of features extracted from the training corpus
  • a classifier
Supervised WSD 1: WSD Tags
• What’s a tag? A dictionary sense?
• For example, for WordNet an instance of “bass” in a text has 8 possible tags or labels (bass1 through bass8).

8 senses of “bass” in WordNet
1. bass – (the lowest part of the musical range)
2. bass, bass part – (the lowest part in polyphonic music)
3. bass, basso – (an adult male singer with the lowest voice)
4. sea bass, bass – (flesh of lean-fleshed saltwater fish of the family Serranidae)
5. freshwater bass, bass – (any of various North American lean-fleshed freshwater fishes especially of the genus Micropterus)
6. bass, bass voice, basso – (the lowest adult male singing voice)
7. bass – (the member with the lowest range of a family of musical instruments)
8. bass – (nontechnical name for any of numerous edible marine and freshwater spiny-finned fishes)
Inventory of sense tags for bass

WordNet Sense | Spanish Translation | Roget Category | Target Word in Context
bass4 | lubina | FISH/INSECT | …fish as Pacific salmon and striped bass and…
bass4 | lubina | FISH/INSECT | …produce filets of smoked bass or sturgeon…
bass7 | bajo | MUSIC | …exciting jazz bass player since Ray Brown…
bass7 | bajo | MUSIC | …play bass because he doesn’t have to solo…
Supervised WSD 2: Get a corpus
• Lexical sample task:
  • Line-hard-serve corpus – 4000 examples of each
  • Interest corpus – 2369 sense-tagged examples
• All words:
  • Semantic concordance: a corpus in which each open-class word is labeled with a sense from a specific dictionary/thesaurus.
  • SemCor: 234,000 words from the Brown Corpus, manually tagged with WordNet senses
  • SENSEVAL-3 competition corpora – 2081 tagged word tokens
Supervised WSD 3: Extract feature vectors

“If one examines the words in a book, one at a time as through an opaque mask with a hole in it one word wide, then it is obviously impossible to determine, one at a time, the meaning of the words…
But if one lengthens the slit in the opaque mask, until one can see not only the central word in question but also say N words on either side, then if N is large enough one can unambiguously decide the meaning of the central word…
The practical question is: ‘What minimum value of N will, at least in a tolerable fraction of cases, lead to the correct choice of meaning for the central word?’”
Feature vectors
• A simple representation for each observation (each instance of a target word)
  • Vectors of sets of feature/value pairs
  • Represented as an ordered list of values
  • These vectors represent, e.g., the window of words around the target
Two kinds of features in the vectors
• Collocational features and bag-of-words features
• Collocational
  • Features about words at specific positions near the target word
  • Often limited to just word identity and POS
• Bag-of-words
  • Features about words that occur anywhere in the window (regardless of position)
  • Typically limited to frequency counts
Examples
• Example text (WSJ):
  An electric guitar and bass player stand off to one side, not really part of the scene
• Assume a window of ±2 from the target
Collocational features
• Position-specific information about the words and collocations in the window
• Example: guitar and bass player stand, with target bass

  [wi−2, POSi−2, wi−1, POSi−1, wi+1, POSi+1, wi+2, POSi+2, plus the flanking bigrams]
  = [guitar, NN, and, CC, player, NN, stand, VB, and guitar, player stand]

• Word 1-, 2-, and 3-grams in a window of ±3 are common
Bag-of-words features
• “An unordered set of words” – position ignored
• Counts of words occurring within the window
  • First choose a vocabulary
  • Then count how often each of those terms occurs in a given window
  • Sometimes just a binary “indicator”, 1 or 0
Co-Occurrence Example
• Assume we’ve settled on a possible vocabulary of 12 words in “bass” sentences:

  [fishing, big, sound, player, fly, rod, pound, double, runs, playing, guitar, band]

• The vector for:
  guitar and bass player stand
  [0,0,0,1,0,0,0,0,0,0,1,0]
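A minimal sketch extracting both feature types for the “bass” example above (the helper names are illustrative):

def collocational_features(tokens, i, window=2):
    """Words at fixed positions around the target at index i."""
    feats = {}
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        feats[f"w[{offset:+d}]"] = tokens[j] if 0 <= j < len(tokens) else None
    return feats

def bag_of_words_vector(tokens, i, vocab, window=2):
    """Counts of vocabulary words anywhere in the window, position ignored."""
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    context = tokens[lo:i] + tokens[i + 1:hi]
    return [context.count(v) for v in vocab]

tokens = "an electric guitar and bass player stand off to one side".split()
i = tokens.index("bass")
vocab = ["fishing", "big", "sound", "player", "fly", "rod", "pound",
         "double", "runs", "playing", "guitar", "band"]

print(collocational_features(tokens, i))
# {'w[-2]': 'guitar', 'w[-1]': 'and', 'w[+1]': 'player', 'w[+2]': 'stand'}
print(bag_of_words_vector(tokens, i, vocab))
# [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0]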
Classification: definition
• Input:
  • a word w and some features f
  • a fixed set of classes C = {c1, c2, …, cJ}
• Output: a predicted class c ∈ C


Classification Methods: Supervised Machine Learning
• Input:
  • a word w in a text window d (which we’ll call a “document”)
  • a fixed set of classes C = {c1, c2, …, cJ}
  • a training set of m hand-labeled text windows, again called “documents”: (d1,c1), …, (dm,cm)
• Output:
  • a learned classifier γ: d → c
Classification Methods: Supervised Machine Learning
• Any kind of classifier
  • Naive Bayes
  • Logistic regression
  • Neural Networks
  • Support-vector machines
  • k-Nearest Neighbors
  • …
Applying Naive Bayes to WSD
• P(c) is the prior probability of that sense
  • Counting in a labeled training set
• P(w|c) is the conditional probability of a word given a particular sense
  • P(w|c) = count(w,c) / count(c)
• We get both of these from a tagged corpus like SemCor
• The classifier then chooses the sense ĉ = argmax_c P(c) ∏_w P(w|c)
• Can also generalize to look at other features besides words
  • Then it would be P(f|c): the conditional probability of a feature given a sense
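A minimal Naive Bayes WSD sketch with add-one smoothing; the four sense-tagged sentences stand in for a corpus like SemCor:

import math
from collections import Counter, defaultdict

# (context words, sense) pairs for the target word "bass"
tagged = [
    ("play the bass guitar in a band".split(), "music"),
    ("exciting jazz bass player".split(), "music"),
    ("caught a striped bass while fishing".split(), "fish"),
    ("smoked bass and sturgeon filets".split(), "fish"),
]

sense_counts = Counter()
word_counts = defaultdict(Counter)
for words, sense in tagged:
    sense_counts[sense] += 1
    word_counts[sense].update(words)
vocab = {w for words, _ in tagged for w in words}

def classify(context):
    best, best_lp = None, float("-inf")
    for sense in sense_counts:
        # log P(c) + sum of log P(w|c), with add-one smoothing
        lp = math.log(sense_counts[sense] / len(tagged))
        total = sum(word_counts[sense].values())
        for w in context:
            lp += math.log((word_counts[sense][w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = sense, lp
    return best

print(classify("he went fishing for bass".split()))  # 'fish'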
The Simplified Lesk algorithm
• Let’s disambiguate “bank” in this sentence:
  The bank can guarantee deposits will eventually cover future tuition costs because it invests in adjustable-rate mortgage securities.
• given the following two WordNet senses:

  bank1 Gloss: a financial institution that accepts deposits and channels the money into lending activities
        Examples: “he cashed a check at the bank”, “that bank holds the mortgage on my home”
  bank2 Gloss: sloping land (especially the slope beside a body of water)
        Examples: “they pulled the canoe up on the bank”, “he sat on the bank of the river and watched the currents”
The Simplified Lesk algorithm
Choose the sense with the most word overlap between gloss and context (not counting function words):
  The bank can guarantee deposits will eventually cover future tuition costs because it invests in adjustable-rate mortgage securities.

  bank1 Gloss: a financial institution that accepts deposits and channels the money into lending activities
        Examples: “he cashed a check at the bank”, “that bank holds the mortgage on my home”
  bank2 Gloss: sloping land (especially the slope beside a body of water)
        Examples: “they pulled the canoe up on the bank”, “he sat on the bank of the river and watched the currents”

Here bank1 wins: its gloss and examples share “deposits” and “mortgage” with the context.
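A minimal sketch of Simplified Lesk, with the two “bank” signatures (gloss + examples) hardcoded from the slide; the stopword set is an illustrative stand-in for a real function-word list:

STOPWORDS = {"a", "an", "the", "in", "on", "of", "at", "that", "he",
             "they", "it", "and", "will", "can", "up", "my", "into"}

signatures = {
    "bank1": "a financial institution that accepts deposits and channels "
             "the money into lending activities he cashed a check at the "
             "bank that bank holds the mortgage on my home",
    "bank2": "sloping land especially the slope beside a body of water "
             "they pulled the canoe up on the bank he sat on the bank of "
             "the river and watched the currents",
}

def content_words(text):
    return {w for w in text.lower().split() if w not in STOPWORDS}

def simplified_lesk(context):
    ctx = content_words(context)
    # Choose the sense whose signature overlaps most with the context.
    return max(signatures, key=lambda s: len(content_words(signatures[s]) & ctx))

sentence = ("The bank can guarantee deposits will eventually cover future "
            "tuition costs because it invests in adjustable-rate mortgage "
            "securities")
print(simplified_lesk(sentence))  # 'bank1' (overlap: bank, deposits, mortgage)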
The Corpus Lesk algorithm
• Assumes we have some sense-labeled data (like SemCor)
• Take all the sentences with the relevant word sense:
  “These short, ‘streamlined’ meetings usually are sponsored by local banks1, Chambers of Commerce, trade associations, or other civic organizations.”
• Now add these to the gloss + examples for each sense; call it the “signature” of a sense.
• Choose the sense with the most word overlap between context and signature.
Corpus Lesk: IDF weighting
• Instead of just removing function words
  • Weigh each word by its ‘promiscuity’ across documents
  • Down-weights words that occur in every ‘document’ (gloss, example, etc.)
  • These are generally function words, but this is a more fine-grained measure
• Weigh each overlapping word by inverse document frequency (see the sketch below)
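A minimal sketch of the IDF weighting itself, where each sense's gloss and examples count as one “document”; the three documents are illustrative:

import math

documents = [set(d.split()) for d in [
    "a financial institution that accepts deposits",
    "sloping land beside a body of water",
    "he sat on the bank of the river",
]]
N = len(documents)

def idf(word):
    df = sum(word in doc for doc in documents)  # document frequency
    return math.log(N / df) if df else 0.0

def weighted_overlap(signature, context):
    # Each overlapping word contributes idf(w) instead of 1.
    return sum(idf(w) for w in signature & context)

# "a" occurs in 2 of 3 documents, so it is down-weighted vs. "deposits".
print(round(idf("a"), 3), round(idf("deposits"), 3))  # 0.405 1.099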
Graph-based methods
• First, WordNet can be viewed as a graph
  • senses are nodes
  • relations (hypernymy, meronymy) are edges
  • also add an edge between a word and unambiguous gloss words

[Figure: a fragment of the WordNet sense graph around drinking, with nodes such as food, liquid, helping, beverage, milk, toast, drink, sip, sup, consume, drinker, drinking, consumer, potation and consumption.]
How to use the graph for WSD
• Insert the target word and the words in its sentential context into the graph, with directed edges to their senses
  • e.g. “She drank some milk”
• Now choose the most central sense
  • Add some probability to “drink” and “milk” and compute the node with the highest “PageRank” (a sketch follows)

[Figure: “drink” and “milk” linked to their candidate senses (drink v1–v5, milk n1–n4) in the sense graph; centrality favours the beverage-related senses.]
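A minimal sketch of the centrality step, assuming networkx and a hand-built toy sense graph; the sense names and edges are illustrative, not real WordNet data:

import networkx as nx

G = nx.DiGraph()
G.add_edges_from([
    # word nodes point to their candidate senses
    ("drink", "drink_v1"), ("drink", "drink_v2"),
    ("milk", "milk_n1"), ("milk", "milk_n2"),
    # the beverage cluster ties the two context words' senses together
    ("drink_v1", "beverage_n1"), ("beverage_n1", "drink_v1"),
    ("milk_n1", "beverage_n1"), ("beverage_n1", "milk_n1"),
    # competing senses connect elsewhere and get no reinforcement
    ("drink_v2", "booze_n1"),
    ("milk_n2", "river_n1"),
])

# Concentrate the teleport probability on the context words, then rank.
scores = nx.pagerank(G, personalization={"drink": 0.5, "milk": 0.5})

for word, prefix in [("drink", "drink_v"), ("milk", "milk_n")]:
    best = max((n for n in G if n.startswith(prefix)), key=scores.get)
    print(word, "->", best)  # drink -> drink_v1, milk -> milk_n1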
Semi-Supervised Learning
• Problem: supervised and dictionary-based approaches require large hand-built resources
  • What if you don’t have that much training data?
• Solution: bootstrapping
  • Generalize from a very small hand-labeled seed set
Bootstrapping
• For bass
  • Rely on the “one sense per collocation” rule
    • A word recurring in collocation with the same word will almost surely have the same sense
  • The word play occurs with the music sense of bass
  • The word fish occurs with the fish sense of bass
Sentences extracted using “fish” and “play”
• We need more good teachers – right now, there are only a half a dozen who can play the free bass with ease.
• An electric guitar and bass player stand off to one side, not really part of the scene, just as a sort of nod to gringo expectations perhaps.
• The researchers said the worms spend part of their life cycle in such fish as Pacific salmon and striped bass and Pacific rockfish or snapper.
• And it all started when fishermen decided the striped bass in Lake Mead were too skinny.
Summary: generating seeds
1) Hand labeling
2) “One sense per collocation”:
   • A word recurring in collocation with the same word will almost surely have the same sense.
3) “One sense per discourse”:
   • The sense of a word is highly consistent within a document – Yarowsky (1995)
   • (At least for non-function words, and especially topic-specific words)
Stages in the Yarowsky bootstrapping algorithm for the word “plant”

[Figure: panels (a) and (b) show the seed sets Λ0/V0 growing into Λ1/V1. Examples near the seed collocate LIFE are labeled sense A, and further collocates such as MICROSCOPIC and ANIMAL join that sense, while collocates such as EMPLOYEE, MANUFACTURING and EQUIPMENT attach to sense B; still-unlabeled examples are shown as “?”.]
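A minimal sketch of the bootstrapping loop for “plant”, using collocate matching in place of a trained classifier; sentences, seeds and the stopword set are illustrative:

seeds = {"life": "A", "manufacturing": "B"}  # A = living plant, B = factory
STOP = {"the", "in", "a", "at", "and", "of"}

unlabeled = [
    "plant life in the rain forest",            # round 1, via seed "life"
    "the manufacturing plant closed",           # round 1, via seed "manufacturing"
    "microscopic life near the plant roots",    # round 1, via seed "life"
    "the plant closed and workers left",        # round 2, via learned "closed"
    "workers at the plant assemble equipment",  # round 3, via learned "workers"
]

labeled = {}
for _ in range(3):  # a few bootstrapping rounds
    # 1) Label any sentence containing a known collocate.
    for sent in unlabeled:
        words = set(sent.split())
        for colloc, sense in list(seeds.items()):
            if colloc in words:
                labeled[sent] = sense
    # 2) Grow the collocate set from the labeled sentences
    #    ("one sense per collocation").
    for sent, sense in labeled.items():
        for w in sent.split():
            if w != "plant" and w not in STOP:
                seeds.setdefault(w, sense)

for sent, sense in labeled.items():
    print(sense, "|", sent)  # A, B, A, B, B for these sentences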
Summary
• Word Sense Disambiguation: choosing the correct sense in context
• Applications: MT, QA, etc.
• Three classes of methods
  • Supervised Machine Learning: Naive Bayes classifier
  • Thesaurus/Dictionary Methods
  • Semi-Supervised Learning
• Main intuition
  • There is lots of information in a word’s context
  • Simple algorithms based just on word counts can be surprisingly good
Text Classification
• Assigning subject categories, topics, or genres
• Spam detection
• Authorship identification
• Age/gender identification
• Language Identification
• Sentiment analysis
• …
Text Classification: definition
• Input:
  • a document d
  • a fixed set of classes C = {c1, c2, …, cJ}
• Output: a predicted class c ∈ C
Classification Methods: Hand-coded rules
• Rules based on combinations of words or other features
  • spam: black-list-address OR (“dollars” AND “have been selected”)
• Accuracy can be high
  • If rules are carefully refined by an expert
• But building and maintaining these rules is expensive
Classification Methods: Supervised Machine Learning
• Input:
  • a document d
  • a fixed set of classes C = {c1, c2, …, cJ}
  • a training set of m hand-labeled documents (d1,c1), …, (dm,cm)
• Output:
  • a learned classifier γ: d → c
Classification Methods: Supervised Machine Learning
• Any kind of classifier
  • Naïve Bayes
  • Logistic regression
  • Support-vector machines
  • k-Nearest Neighbors
