Natural Language Processing 1
Lecture 5: Introduction to semantics & lexical semantics
Katia Shutova
ILLC
University of Amsterdam
12 November 2018
Semantics
Compositional semantics:
- studies how the meanings of phrases are constructed out of the meanings of individual words
- principle of compositionality: the meaning of the whole phrase is derivable from the meanings of its parts
- sentence structure conveys some of the meaning, obtained from the syntactic representation

Lexical semantics:
- studies how the meanings of individual words can be represented and induced
Words and concepts
Prototype theory
Semantic relations
Hyponymy: IS-A
WordNet
Polysemy
- homonymy: unrelated word senses, e.g. bank (raised land) vs. bank (financial institution)
- related but distinct senses, e.g. bank (financial institution) vs. bank (in a casino)
- regular polysemy and sense extension
- zero-derivation, e.g. tango (N) vs. tango (V), or rabbit, turkey, halibut (meat / animal)
- metaphorical senses, e.g. swallow [food], swallow [information], swallow [anger]
- metonymy, e.g. he played Bach; he drank his glass
- vagueness: nurse, lecturer, driver
- cultural stereotypes: nurse, lecturer, driver

There are no clear-cut distinctions between these cases.
Word sense disambiguation
Initial state
[Figure: occurrences of an ambiguous word in a corpus, all still unlabelled (shown as ?)]
Seeds
[Figure: a few occurrences are labelled with the seed senses: A via the seed collocate "life", B via the seed collocate "manu(facturing)"; the rest remain unlabelled]
Iterating:
[Figure: after some iterations more occurrences are labelled A or B, and new collocates such as "animal" (sense A) and "company" (sense B) have been learned]
Final:
[Figure: final state, every occurrence labelled A or B]
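The four figures above illustrate an iterative, semi-supervised labelling process: a handful of seed collocates label a few occurrences, and further collocates are learned from the growing labelled set. A minimal sketch of such a bootstrapping loop is below; the toy contexts, the seed collocates and all names are illustrative assumptions, not the lecture's data or code.

```python
from collections import Counter

# Toy occurrences of an ambiguous word: each entry is the set of context
# words seen around one occurrence (illustrative data only).
contexts = [
    {"life", "grow", "leaf"},
    {"manufacturing", "worker"},
    {"animal", "life", "species"},
    {"company", "production", "manufacturing"},
    {"flower", "grow"},
    {"factory", "company"},
]

# Seed collocates defining the two senses, as in the figures above.
seeds = {"A": {"life"}, "B": {"manufacturing"}}

labels = [None] * len(contexts)
collocates = {sense: set(words) for sense, words in seeds.items()}

for _ in range(5):  # a fixed number of iterations for the sketch
    changed = False
    # Step 1: label any unlabelled occurrence whose context contains a
    # known collocate of some sense.
    for i, ctx in enumerate(contexts):
        if labels[i] is not None:
            continue
        for sense, words in collocates.items():
            if ctx & words:
                labels[i] = sense
                changed = True
                break
    # Step 2: harvest new collocates from the labelled occurrences
    # (a real system would rank candidates by reliability instead).
    for sense in collocates:
        seen = Counter(w for i, ctx in enumerate(contexts)
                       if labels[i] == sense for w in ctx)
        collocates[sense] |= set(seen)
    if not changed:
        break

print(labels)      # ['A', 'B', 'A', 'B', 'A', 'B'] on this toy data
print(collocates)
```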
Distributional hypothesis

Words that occur in similar contexts tend to have similar meanings.

Example: scrumpy (the meaning of an unfamiliar word can be inferred from the contexts in which it occurs).
Distributional semantics
1. Count-based models (a minimal sketch follows this list):
   - vector space models
   - dimensions correspond to elements in the context
   - words are represented as vectors, or higher-order tensors
2. Prediction models:
   - train a model to predict plausible contexts for a word
   - learn word representations in the process
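As a concrete illustration of option 1, the sketch below builds count-based word vectors from co-occurrences within a fixed window; the toy corpus and the window size are assumptions made for the example.

```python
from collections import Counter, defaultdict

# Toy tokenised corpus (illustrative only).
corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
    ["the", "cat", "chased", "the", "dog"],
]

window = 2  # symmetric context window size

# Count how often each context word occurs within the window of each target.
cooc = defaultdict(Counter)
for sent in corpus:
    for i, target in enumerate(sent):
        lo, hi = max(0, i - window), min(len(sent), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                cooc[target][sent[j]] += 1

# Each word is now a sparse vector over context words.
print(cooc["cat"])  # Counter({'the': 3, 'sat': 1, 'on': 1, 'chased': 1})
```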
Count-based models
Vectors
Feature matrix
[Figure: word-by-context feature matrix, with target words as rows and contexts as columns]
Dependency vectors
Verb contexts of the target word, when it occurs as the verb's subject (Subj) or direct object (Dobj):

word (Subj)      word (Dobj)
come_v           use_v
mean_v           say_v
go_v             hear_v
speak_v          take_v
make_v           speak_v
say_v            find_v
seem_v           get_v
follow_v         remember_v
give_v           read_v
describe_v       write_v
get_v            utter_v
appear_v         know_v
begin_v          understand_v
sound_v          believe_v
occur_v          choose_v
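Feature spaces like the dependency table above can be obtained with a dependency parser. Below is a minimal sketch using spaCy; the model name, example sentence and printed output are assumptions, and a real setup would parse a large corpus and accumulate counts over the resulting (relation, verb) contexts.

```python
import spacy

# Requires: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

doc = nlp("The speaker uttered a word and the word meant something.")

# Collect context features of the form relation_headverb for each token,
# mirroring the Subj / Dobj columns in the table above.
features = []
for tok in doc:
    if tok.dep_ in ("nsubj", "dobj") and tok.head.pos_ == "VERB":
        features.append((tok.lemma_, f"{tok.dep_}_{tok.head.lemma_}"))

print(features)
# e.g. [('speaker', 'nsubj_utter'), ('word', 'dobj_utter'),
#       ('word', 'nsubj_mean'), ('something', 'dobj_mean')]
```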
Context weighting
Characteristic model

- Weights given to the vector components express how characteristic a given context is for word w.
- Pointwise Mutual Information (PMI):

PMI(w, c) = \log \frac{P(w, c)}{P(w)\,P(c)} = \log \frac{P(w)\,P(c \mid w)}{P(w)\,P(c)} = \log \frac{P(c \mid w)}{P(c)}

with

P(c) = \frac{f(c)}{\sum_k f(c_k)}, \qquad P(c \mid w) = \frac{f(w, c)}{f(w)}

so that

PMI(w, c) = \log \frac{f(w, c)\,\sum_k f(c_k)}{f(w)\,f(c)}

f(w, c): frequency of word w in context c
f(w): frequency of word w in all contexts
f(c): frequency of context c
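A minimal sketch of PMI weighting applied to raw co-occurrence counts follows, computing PMI(w, c) = log( f(w, c) Σ_k f(c_k) / (f(w) f(c)) ) exactly as above; the toy counts are assumptions made for the example.

```python
import math
from collections import Counter

# Toy co-occurrence counts f(w, c) (illustrative only).
counts = {
    "cat": Counter({"purr": 10, "the": 50, "sit": 8}),
    "dog": Counter({"bark": 12, "the": 60, "sit": 9}),
}

# Marginals: f(w), f(c), and the grand total sum_k f(c_k).
f_w = {w: sum(ctxs.values()) for w, ctxs in counts.items()}
f_c = Counter()
for ctxs in counts.values():
    f_c.update(ctxs)
total = sum(f_c.values())

def pmi(w, c):
    """PMI(w, c) = log( f(w, c) * total / (f(w) * f(c)) )."""
    if counts[w][c] == 0:
        return float("-inf")  # log 0; often clipped to 0 (positive PMI)
    return math.log(counts[w][c] * total / (f_w[w] * f_c[c]))

print(round(pmi("cat", "purr"), 3))  # high: 'purr' is characteristic of 'cat'
print(round(pmi("cat", "the"), 3))   # near zero: 'the' occurs with everything
```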
Which contexts should be included as dimensions?

- Entire vocabulary.
  + All information included, even rare contexts.
  - Inefficient (100,000s of dimensions). Noisy (e.g. 002.png|thumb|right|200px|graph_n). Sparse.
- Top n words with the highest frequencies (see the sketch after this list).
  + More efficient (2,000-10,000 dimensions). Only 'real' words included.
  - May miss infrequent but relevant contexts.
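A sketch of the second option, keeping only the n most frequent context words as dimensions, is below; n and the counts are assumptions made for the example.

```python
from collections import Counter

# Total frequency of each candidate context word across the corpus
# (illustrative counts, including one noisy markup-derived context).
context_freq = Counter({
    "the": 1000, "of": 800, "say": 120, "speak": 90,
    "graph_n|thumb": 2, "utter": 15, "purr": 5,
})

n = 4  # keep only the n most frequent contexts as dimensions
dims = [c for c, _ in context_freq.most_common(n)]
print(dims)  # ['the', 'of', 'say', 'speak']: rare (and noisy) contexts dropped

def to_vector(word_counts, dims):
    """Project a word's full context counts onto the chosen dimensions."""
    return [word_counts.get(c, 0) for c in dims]

print(to_vector({"say": 3, "purr": 1, "the": 7}, dims))  # [7, 0, 3, 0]
```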
Frequency counts...