Lecture 8 - Semantic Similarity: Vector Semantics

The document discusses different types of relationships between word meanings (senses), including: polysemy, where a word has multiple related meanings; homonymy, where words have unrelated meanings but the same form; synonymy, where words have the same or similar meanings; antonymy, where words have opposite meanings; and hyponymy and hypernymy, where one word denotes a subclass of another, more general word. Understanding these relationships is important for natural language processing.

Word Meaning and Similarity
Word Senses and Word Relations
Dan  Jurafsky  

Reminder: lemma and wordform

• A lemma or citation form
• Same stem, part of speech, rough semantics
• A wordform
• The "inflected" word as it appears in text

Wordform   Lemma
banks      bank
sung       sing
duermes    dormir
Dan  Jurafsky  

Lemmas have senses

• One lemma "bank" can have many meanings:
Sense 1: "…a bank1 can hold the investments in a custodial account…"
Sense 2: "…as agriculture burgeons on the east bank2 the river will shrink even more"
• Sense (or word sense)
• A discrete representation of an aspect of a word's meaning.
• The lemma bank here has two senses
Dan  Jurafsky  

Homonymy

Homonyms: words that share a form but have unrelated, distinct meanings:
• bank1: financial institution;  bank2: sloping land
• bat1: club for hitting a ball;  bat2: nocturnal flying mammal
1. Homographs (bank/bank, bat/bat)
2. Homophones:
1. Write and right
2. Piece and peace
Dan  Jurafsky  

Homonymy causes problems for NLP applications

• Information retrieval
• "bat care"
• Machine Translation
• bat: murciélago (animal) or bate (for baseball)
• Text-to-Speech
• bass (stringed instrument) vs. bass (fish)
Dan  Jurafsky  

Polysemy

• 1. The bank was constructed in 1875 out of local red brick.
• 2. I withdrew the money from the bank.
• Are those the same sense?
• Sense 2: "A financial institution"
• Sense 1: "The building belonging to a financial institution"
• A polysemous word has related meanings
• Most non-rare words have multiple meanings
Dan  Jurafsky  

Metonymy or Systematic Polysemy:
a systematic relationship between senses

• Lots of types of polysemy are systematic
• School, university, hospital
• All can mean the institution or the building.
• A systematic relationship:
• Building ↔ Organization
• Other such kinds of systematic polysemy:
Author (Jane Austen wrote Emma) ↔ Works of Author (I love Jane Austen)
Tree (Plums have beautiful blossoms) ↔ Fruit (I ate a preserved plum)
Dan  Jurafsky  

How do we know when a word has more than one sense?

• The "zeugma" test: two senses of serve?
• Which flights serve breakfast?
• Does Lufthansa serve Philadelphia?
• ?Does Lufthansa serve breakfast and San Jose?
• Since this conjunction sounds weird,
• we say that these are two different senses of "serve"
Dan  Jurafsky  

Synonyms

• Words that have the same meaning in some or all contexts.
• filbert / hazelnut
• couch / sofa
• big / large
• automobile / car
• vomit / throw up
• water / H2O
• Two lexemes are synonyms
• if they can be substituted for each other in all situations
• If so they have the same propositional meaning
Dan  Jurafsky  

Synonyms

• But there are few (or no) examples of perfect synonymy.
• Even if many aspects of meaning are identical
• They still may not preserve acceptability, based on notions of politeness, slang, register, genre, etc.
• Examples:
• water / H2O
• big / large
• brave / courageous
Dan  Jurafsky  

Synonymy is a relation between senses rather than words
•  Consider  the  words  big  and  large  
•  Are  they  synonyms?  
•  How  big  is  that  plane?  
•  Would  I  be  flying  on  a  large  or  small  plane?  
•  How  about  here:  
•  Miss  Nelson  became  a  kind  of  big  sister  to  Benjamin.  
•  ?Miss  Nelson  became  a  kind  of  large  sister  to  Benjamin.  
•  Why?  
•  big  has  a  sense  that  means  being  older,  or  grown  up  
•  large  lacks  this  sense  
Dan  Jurafsky  

Antonyms

• Senses that are opposites with respect to one feature of meaning
• Otherwise, they are very similar!
dark/light   short/long   fast/slow   rise/fall
hot/cold     up/down      in/out
• More formally: antonyms can
• define a binary opposition, or be at opposite ends of a scale
• long/short, fast/slow
• Be reversives:
• rise/fall, up/down
Dan  Jurafsky  

Hyponymy and Hypernymy

• One sense is a hyponym of another if the first sense is more specific, denoting a subclass of the other
• car is a hyponym of vehicle
• mango is a hyponym of fruit
• Conversely hypernym/superordinate ("hyper is super")
• vehicle is a hypernym of car
• fruit is a hypernym of mango

Superordinate/hypernym   vehicle   fruit   furniture
Subordinate/hyponym      car       mango   chair
Dan  Jurafsky  

Hyponymy  more  formally  


•  Extensional:  
•  The  class  denoted  by  the  superordinate  extensionally  includes  the  class  
denoted  by  the  hyponym  
•  Entailment:  
•  A  sense  A  is  a  hyponym  of  sense  B  if  being  an  A  entails  being  a  B  
• Hyponymy is usually transitive
• (A hypo B and B hypo C entails A hypo C)
•  Another  name:  the  IS-­‐A  hierarchy  
•  A  IS-­‐A  B            (or  A  ISA  B)  
•  B  subsumes  A  
Dan  Jurafsky  

Hyponyms  and  Instances  


• WordNet has both classes and instances.
• An instance is an individual, a proper noun that is a unique entity
• San Francisco is an instance of city
• But city is a class
• city is a hyponym of municipality... location...

Word Meaning and Similarity
Word Senses and Word Relations

Word Meaning and Similarity
WordNet and other Online Thesauri
Dan  Jurafsky  

Applications of Thesauri and Ontologies

• Information Extraction
• Information Retrieval
• Question Answering
• Bioinformatics and Medical Informatics
• Machine Translation
Dan  Jurafsky  

WordNet 3.0

• A hierarchically organized lexical database
• Online thesaurus + aspects of a dictionary
• Some other languages available or under development
• (Arabic, Finnish, German, Portuguese…)

Category    Unique Strings
Noun        117,798
Verb        11,529
Adjective   22,479
Adverb      4,481
Dan  Jurafsky  

Senses of "bass" in WordNet


Dan  Jurafsky  

How is "sense" defined in WordNet?

• The synset (synonym set), the set of near-synonyms, instantiates a sense or concept, with a gloss
• Example: chump as a noun with the gloss:
"a person who is gullible and easy to take advantage of"
• This sense of "chump" is shared by 9 words:
chump1, fool2, gull1, mark9, patsy1, fall guy1, sucker1, soft touch1, mug2
• Each of these senses has this same gloss
• (Not every sense; sense 2 of gull is the aquatic bird)
Dan  Jurafsky  

WordNet  Hypernym  Hierarchy  for  “bass”  


Dan  Jurafsky  

WordNet Noun Relations


Dan  Jurafsky  

WordNet 3.0

• Where it is:
• http://wordnetweb.princeton.edu/perl/webwn
• Libraries
• Python: WordNet from NLTK (a usage sketch follows below)
• http://www.nltk.org/Home
• Java:
• JWNL, extJWNL on sourceforge
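
A minimal sketch of looking up senses through NLTK (assumes nltk is installed and the data has been fetched once with nltk.download('wordnet'); the sense label chump.n.01 is an assumption to verify against your WordNet version):

from nltk.corpus import wordnet as wn

# every synset (sense) of "bass", each with its gloss
for syn in wn.synsets('bass'):
    print(syn.name(), '-', syn.definition())

# a synset is a set of near-synonyms sharing one gloss
chump = wn.synset('chump.n.01')  # assumed id for the "gullible person" sense
print(chump.lemma_names())       # the words sharing this sense (fool, gull, patsy, ...)
print(chump.hypernyms())         # its IS-A parents in the hierarchy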
Dan  Jurafsky  

MeSH: Medical Subject Headings,
a thesaurus from the National Library of Medicine

• MeSH (Medical Subject Headings)
• 177,000 entry terms that correspond to 26,142 biomedical "headings"

• Example heading: Hemoglobins
Entry Terms (roughly, the synset): Eryhem, Ferrous Hemoglobin, Hemoglobin
Definition: The oxygen-carrying proteins of ERYTHROCYTES. They are found in all vertebrates and some invertebrates. The number of globin subunits in the hemoglobin quaternary structure differs between species. Structures range from monomeric to a variety of multimeric arrangements.
Dan  Jurafsky  

The  MeSH  Hierarchy  

[Figure: the MeSH hierarchy]
Dan  Jurafsky  

Uses  of  the  MeSH  Ontology  


•  Provide  synonyms  (“entry  terms”)  
•  E.g.,  glucose  and  dextrose  
•  Provide  hypernyms  (from  the  hierarchy)  
•  E.g.,  glucose  ISA  monosaccharide  
•  Indexing  in  MEDLINE/PubMED  database  
•  NLM’s  bibliographic  database:    
• 20 million journal articles
• Each article hand-assigned 10-20 MeSH terms
Word Meaning and Similarity
WordNet and other Online Thesauri

Word Meaning and Similarity
Word Similarity: Thesaurus Methods
Dan  Jurafsky  

Word Similarity

• Synonymy: a binary relation
• Two words are either synonymous or not
• Similarity (or distance): a looser metric
• Two words are more similar if they share more features of meaning
• Similarity is properly a relation between senses
• The word "bank" is not similar to the word "slope"
• Bank1 is similar to fund3
• Bank2 is similar to slope5
• But we'll compute similarity over both words and senses
Dan  Jurafsky  

Why word similarity

• Information retrieval
• Question answering
• Machine translation
• Natural language generation
• Language modeling
• Automatic essay grading
• Plagiarism detection
• Document clustering
Dan  Jurafsky  

Word similarity and word relatedness

• We often distinguish word similarity from word relatedness
• Similar words: near-synonyms
• Related words: can be related any way
• car, bicycle: similar
• car, gasoline: related, not similar
Dan  Jurafsky  

Two classes of similarity algorithms

• Thesaurus-based algorithms
• Are words "nearby" in the hypernym hierarchy?
• Do words have similar glosses (definitions)?
• Distributional algorithms
• Do words have similar distributional contexts?
Dan  Jurafsky  

Path-based similarity

• Two concepts (senses/synsets) are similar if they are near each other in the thesaurus hierarchy
• = have a short path between them
• concepts have path 1 to themselves
Dan  Jurafsky  

Refinements to path-based similarity

• pathlen(c1,c2) = 1 + number of edges in the shortest path in the hypernym graph between sense nodes c1 and c2

• simpath(c1,c2) = 1 / pathlen(c1,c2)
• ranges from 0 to 1 (1 = identity)

• wordsim(w1,w2) = max sim(c1,c2) over c1 ∈ senses(w1), c2 ∈ senses(w2)
Dan  Jurafsky  

Example:  path-­‐based  similarity  


simpath(c1,c2) = 1/pathlen(c1,c2)

simpath(nickel,coin)  =  1/2 = .5
simpath(fund,budget)  =  1/2 = .5
simpath(nickel,currency)  =  1/4 = .25
simpath(nickel,money)  =  1/6 = .17
simpath(coinage,Richter  scale)  =  1/6 = .17
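
NLTK's path_similarity implements this same 1/pathlen score (with pathlen counted as edges + 1), so numbers like those above can be reproduced roughly as follows; the synset ids (e.g. nickel.n.02 for the coin sense) are assumptions to check with wn.synsets:

from nltk.corpus import wordnet as wn

nickel = wn.synset('nickel.n.02')  # assumed: the coin sense, not the metal
coin = wn.synset('coin.n.01')
money = wn.synset('money.n.01')

print(nickel.path_similarity(coin))   # expect .5 if they are one edge apart
print(nickel.path_similarity(money))  # smaller, since the path is longer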
Dan  Jurafsky  

Problem  with  basic  path-­‐based  similarity  


•  Assumes  each  link  represents  a  uniform  distance  
•  But  nickel  to  money  seems  to  us  to  be  closer  than  nickel  to  
standard  
•  Nodes  high  in  the  hierarchy  are  very  abstract  
•  We  instead  want  a  metric  that  
•  Represents  the  cost  of  each  edge  independently  
• Words connected only through abstract nodes are less similar
Dan  Jurafsky  

Information content similarity metrics

Resnik 1995. Using information content to evaluate semantic similarity in a taxonomy. IJCAI

• Let's define P(c) as:
• The probability that a randomly selected word in a corpus is an instance of concept c
• Formally: there is a distinct random variable, ranging over words, associated with each concept in the hierarchy
• for a given concept, each observed noun is either
• a member of that concept with probability P(c)
• not a member of that concept with probability 1-P(c)
• All words are members of the root node (Entity)
• P(root) = 1
• The lower a node is in the hierarchy, the lower its probability
Dan  Jurafsky  

Information content similarity

• Train by counting in a corpus
• Each instance of hill counts toward the frequency of natural elevation, geological formation, entity, etc.
• Let words(c) be the set of all words that are children of node c
• words("geological-formation") = {hill, ridge, grotto, coast, cave, shore, natural elevation}
• words("natural elevation") = {hill, ridge}

[Figure: a fragment of the hierarchy: entity > … > geological-formation > natural elevation (> hill, ridge), cave (> grotto), shore (> coast)]

P(c) = Σ_{w ∈ words(c)} count(w) / N
Dan  Jurafsky  

Information content similarity

• WordNet hierarchy augmented with probabilities P(c)
D. Lin. 1998. An Information-Theoretic Definition of Similarity. ICML 1998
Dan  Jurafsky  

Information content: definitions

• Information content:
IC(c) = -log P(c)
• Most informative subsumer (lowest common subsumer):
LCS(c1,c2) = the most informative (lowest) node in the hierarchy subsuming both c1 and c2
Dan  Jurafsky  
Using information content for similarity:
the Resnik method

Philip Resnik. 1995. Using Information Content to Evaluate Semantic Similarity in a Taxonomy. IJCAI 1995.
Philip Resnik. 1999. Semantic Similarity in a Taxonomy: An Information-Based Measure and its Application to Problems of Ambiguity in Natural Language. JAIR 11, 95-130.

• The similarity between two words is related to their common information
• The more two words have in common, the more similar they are
• Resnik: measure common information as:
• The information content of the most informative (lowest) subsumer (MIS/LCS) of the two nodes
• simresnik(c1,c2) = -log P(LCS(c1,c2))
Dan  Jurafsky  

Dekang Lin method

Dekang Lin. 1998. An Information-Theoretic Definition of Similarity. ICML

• Intuition: Similarity between A and B is not just what they have in common
• The more differences between A and B, the less similar they are:
• Commonality: the more A and B have in common, the more similar they are
• Difference: the more differences between A and B, the less similar they are
• Commonality: IC(common(A,B))
• Difference: IC(description(A,B)) - IC(common(A,B))
Dan  Jurafsky  

Dekang Lin similarity theorem

• The similarity between A and B is measured by the ratio between the amount of information needed to state the commonality of A and B and the information needed to fully describe what A and B are

simLin(A,B) = IC(common(A,B)) / IC(description(A,B))

• Lin (altering Resnik) defines IC(common(A,B)) as 2 x the information of the LCS

simLin(c1,c2) = 2 log P(LCS(c1,c2)) / (log P(c1) + log P(c2))
Dan  Jurafsky  

Lin similarity function

simLin(c1,c2) = 2 log P(LCS(c1,c2)) / (log P(c1) + log P(c2))

simLin(hill, coast) = 2 log P(geological-formation) / (log P(hill) + log P(coast))
                    = 2 ln 0.00176 / (ln 0.0000189 + ln 0.0000216)
                    = .59
Dan  Jurafsky  

The (extended) Lesk Algorithm

• A thesaurus-based measure that looks at glosses
• Two concepts are similar if their glosses contain similar words
• Drawing paper: paper that is specially prepared for use in drafting
• Decal: the art of transferring designs from specially prepared paper to a wood or glass or metal surface
• For each n-word phrase that's in both glosses
• Add a score of n^2 (see the sketch after this slide)
• paper (1^2) and specially prepared (2^2) give 1 + 4 = 5
• Compute overlap also for other relations
• glosses of hypernyms and hyponyms
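
A minimal sketch of the gloss-overlap scoring just described (only the gloss-to-gloss n^2 part; a full extended Lesk also scores glosses of hypernyms, hyponyms, and other related synsets, and a real implementation would tokenize and filter stop words more carefully):

def gloss_overlap(gloss1, gloss2):
    """Sum n^2 over maximal shared n-word phrases, longest phrases first."""
    w1, w2 = gloss1.lower().split(), gloss2.lower().split()
    score = 0
    for n in range(min(len(w1), len(w2)), 0, -1):
        i = 0
        while i + n <= len(w1):
            matched = False
            for j in range(len(w2) - n + 1):
                if w1[i:i+n] == w2[j:j+n]:
                    score += n * n
                    del w1[i:i+n], w2[j:j+n]  # consume both phrases
                    matched = True
                    break
            if not matched:
                i += 1
    return score

drawing_paper = "paper that is specially prepared for use in drafting"
decal = "the art of transferring designs from specially prepared paper to a wood or glass or metal surface"
print(gloss_overlap(drawing_paper, decal))  # 5 = 2^2 ("specially prepared") + 1^2 ("paper")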
Dan  Jurafsky  

Summary: thesaurus-based similarity

simpath(c1,c2) = 1 / pathlen(c1,c2)

simresnik(c1,c2) = -log P(LCS(c1,c2))

simlin(c1,c2) = 2 log P(LCS(c1,c2)) / (log P(c1) + log P(c2))

simjiangconrath(c1,c2) = 1 / (log P(c1) + log P(c2) - 2 log P(LCS(c1,c2)))

simeLesk(c1,c2) = Σ_{r,q ∈ RELS} overlap(gloss(r(c1)), gloss(q(c2)))
Dan  Jurafsky  

Libraries for computing thesaurus-based similarity

• NLTK (usage sketch below)
• http://nltk.github.com/api/nltk.corpus.reader.html?highlight=similarity#nltk.corpus.reader.WordNetCorpusReader.res_similarity

• WordNet::Similarity
• http://wn-similarity.sourceforge.net/
• Web-based interface:
• http://marimba.d.umn.edu/cgi-bin/similarity/similarity.cgi
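
A short sketch of calling these measures from NLTK (assumes nltk.download('wordnet') and nltk.download('wordnet_ic') have been run; the synset choices are illustrative):

from nltk.corpus import wordnet as wn, wordnet_ic

brown_ic = wordnet_ic.ic('ic-brown.dat')  # P(c) counts trained on the Brown corpus

hill, coast = wn.synset('hill.n.01'), wn.synset('coast.n.01')
print(hill.path_similarity(coast))           # 1 / pathlen(c1,c2)
print(hill.res_similarity(coast, brown_ic))  # -log P(LCS(c1,c2))
print(hill.lin_similarity(coast, brown_ic))  # 2 log P(LCS) / (log P(c1) + log P(c2))
print(hill.jcn_similarity(coast, brown_ic))  # 1 / (IC(c1) + IC(c2) - 2 IC(LCS))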

Dan  Jurafsky  

Evaluating similarity

• Intrinsic evaluation:
• Correlation between algorithm and human word similarity ratings
• Extrinsic (task-based, end-to-end) evaluation:
• Malapropism (spelling error) detection
• WSD
• Essay grading
• Taking TOEFL multiple-choice vocabulary tests
Levied is closest in meaning to:
imposed, believed, requested, correlated
Word Meaning and Similarity
Word Similarity: Thesaurus Methods

Word Meaning and Similarity
Word Similarity: Distributional Similarity (I)
Dan  Jurafsky  

Problems with thesaurus-based meaning

• We don't have a thesaurus for every language
• Even if we do, they have problems with recall
• Many words are missing
• Most (if not all) phrases are missing
• Some connections between senses are missing
• Thesauri work less well for verbs and adjectives
• Adjectives and verbs have less structured hyponymy relations
Dan  Jurafsky  

Distributional models of meaning

• Also called vector-space models of meaning
• Offer much higher recall than hand-built thesauri
• Although they tend to have lower precision

• Zellig Harris (1954): "oculist and eye-doctor … occur in almost the same environments…. If A and B have almost identical environments we say that they are synonyms."

• Firth (1957): "You shall know a word by the company it keeps!"
Dan  Jurafsky  

Intuition of distributional word similarity

• Nida example:
A bottle of tesgüino is on the table
Everybody likes tesgüino
Tesgüino makes you drunk
We make tesgüino out of corn.
• From context words humans can guess tesgüino means
• an alcoholic beverage like beer
• Intuition for algorithm:
• Two words are similar if they have similar word contexts.
Dan  Jurafsky  

Reminder: Term-document matrix

• Each cell: count of term t in a document d: tf(t,d)
• Each document is a count vector in ℕ^V: a column below

          As You Like It   Twelfth Night   Julius Caesar   Henry V
battle          1                1               8            15
soldier         2                2              12            36
fool           37               58               1             5
clown           6              117               0             0
Dan  Jurafsky  

Reminder: Term-document matrix

• Two documents are similar if their vectors are similar

          As You Like It   Twelfth Night   Julius Caesar   Henry V
battle          1                1               8            15
soldier         2                2              12            36
fool           37               58               1             5
clown           6              117               0             0
Dan  Jurafsky  

The words in a term-document matrix

• Each word is a count vector in ℕ^D: a row below

          As You Like It   Twelfth Night   Julius Caesar   Henry V
battle          1                1               8            15
soldier         2                2              12            36
fool           37               58               1             5
clown           6              117               0             0
Dan  Jurafsky  

The words in a term-document matrix

• Two words are similar if their vectors are similar

          As You Like It   Twelfth Night   Julius Caesar   Henry V
battle          1                1               8            15
soldier         2                2              12            36
fool           37               58               1             5
clown           6              117               0             0
Dan  Jurafsky  

The Term-Context matrix

• Instead of using entire documents, use smaller contexts
• Paragraph
• Window of 10 words
• A word is now defined by a vector over counts of context words
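
A minimal sketch of collecting such context counts from tokenized text (the window size and whitespace tokenization are illustrative choices, not part of the lecture's definition):

from collections import Counter, defaultdict

def term_context_counts(tokens, window=4):
    """Map each word to a Counter over the words within +/- `window` positions."""
    counts = defaultdict(Counter)
    for i, w in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[w][tokens[j]] += 1
    return counts

toks = "we make tesguino out of corn everybody likes tesguino".split()
print(term_context_counts(toks)["tesguino"])  # corn, likes, etc. as context counts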

Dan  Jurafsky  

Sample contexts: 20 words (Brown corpus)

• equal amount of sugar, a sliced lemon, a tablespoonful of apricot preserve or jam, a pinch each of clove and nutmeg,
• on board for their enjoyment. Cautiously she sampled her first pineapple and another fruit whose taste she likened to that of
• of a recursive type well suited to programming on the digital computer. In finding the optimal R-stage policy from that of
• substantially affect commerce, for the purpose of gathering data and information necessary for the study authorized in the first section of this
Dan  Jurafsky  

Term-context matrix for word similarity

• Two words are similar in meaning if their context vectors are similar

              aardvark   computer   data   pinch   result   sugar   …
apricot           0          0        0      1       0        1
pineapple         0          0        0      1       0        1
digital           0          2        1      0       1        0
information       0          1        6      0       4        0
Dan  Jurafsky  

Should we use raw counts?

• For the term-document matrix
• We used tf-idf instead of raw term counts
• For the term-context matrix
• Positive Pointwise Mutual Information (PPMI) is common
Dan  Jurafsky  

Pointwise Mutual Information

• Pointwise mutual information:
• Do events x and y co-occur more than if they were independent?

PMI(x,y) = log2 ( P(x,y) / (P(x) P(y)) )

• PMI between two words: (Church & Hanks 1989)
• Do words x and y co-occur more than if they were independent?

PMI(word1, word2) = log2 ( P(word1, word2) / (P(word1) P(word2)) )

• Positive PMI between two words (Niwa & Nitta 1994)
• Replace all PMI values less than 0 with zero
Dan  Jurafsky  

Computing PPMI on a term-context matrix

• Matrix F with W rows (words) and C columns (contexts)
• fij is the number of times word wi occurs in context cj

pij = fij / Σ_{i=1..W} Σ_{j=1..C} fij

pi* = Σ_{j=1..C} fij / Σ_{i=1..W} Σ_{j=1..C} fij
p*j = Σ_{i=1..W} fij / Σ_{i=1..W} Σ_{j=1..C} fij

pmiij = log2 ( pij / (pi* p*j) )

ppmiij = pmiij if pmiij > 0, else 0
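
A sketch of these formulas in numpy (the matrix below is the apricot/pineapple/digital/information example used on the next slides; the k_smooth parameter reproduces the add-2 variant shown later):

import numpy as np

def ppmi(F, k_smooth=0):
    """PPMI from a word-by-context count matrix F (rows = words, cols = contexts)."""
    F = np.asarray(F, dtype=float) + k_smooth  # optional add-k smoothing
    P = F / F.sum()                            # joint p(w, c)
    pw = P.sum(axis=1, keepdims=True)          # row marginals p(w)
    pc = P.sum(axis=0, keepdims=True)          # column marginals p(c)
    with np.errstate(divide='ignore'):
        pmi = np.log2(P / (pw * pc))
    return np.maximum(pmi, 0)                  # clip negatives (and -inf) to zero

F = np.array([[0, 0, 1, 0, 1],    # apricot:     computer data pinch result sugar
              [0, 0, 1, 0, 1],    # pineapple
              [2, 1, 0, 1, 0],    # digital
              [1, 6, 0, 4, 0]])   # information
print(ppmi(F).round(2))              # pinch/sugar cells for apricot come out as 2.25
print(ppmi(F, k_smooth=2).round(2))  # the add-2 smoothed PPMI table below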
Dan  Jurafsky  

pij = fij / Σi Σj fij
p(wi) = Σj fij / N        p(cj) = Σi fij / N

p(w=information, c=data) = 6/19 = .32
p(w=information) = 11/19 = .58
p(c=data) = 7/19 = .37

p(w,context)                                            p(w)
              computer   data   pinch   result   sugar
apricot         0.00     0.00   0.05    0.00     0.05   0.11
pineapple       0.00     0.00   0.05    0.00     0.05   0.11
digital         0.11     0.05   0.00    0.05     0.00   0.21
information     0.05     0.32   0.00    0.21     0.00   0.58
p(context)      0.16     0.37   0.11    0.26     0.11
Dan  Jurafsky  
!"#$%&'()*(+ !"#+
!"#$%&'( )*&* $+,!- ('.%/& .%0*(
pij *$(+!"& 1211 1211 1213 1211 1213 1244
pmiij = log 2 $+,'*$$/' 1211 1211 1213 1211 1213 1244
pi* p* j
)+0+&*/ 1244 1213 1211 1213 1211 1254
+,6"(#*&+", 1213 1275 1211 1254 1211 1238
!"%&'()*(+ 1249 127: 1244 1259 1244

•  pmi(informa-on,data)  =  log2  (  .32  /   (.37*.58)  )   =  .58  


(.57  using  full  precision)  
!!"#$%&'()*+,*-
!"#$%&'( )*&* $+,!- ('.%/& .%0*(
*$(+!"& 1 1 2324 1 2324
$+,'*$$/' 1 1 2324 1 2324
)+0+&*/ 5366 7377 1 7377 1
66   +,8"(#*&+", 7377 7349 1 73:9 1
Dan  Jurafsky  

Weighting PMI

• PMI is biased toward infrequent events
• Various weighting schemes help alleviate this
• See Turney and Pantel (2010)
• Add-one smoothing can also help
Dan  Jurafsky  

!""#$%&'(()*+"%,(-.)/012(.)+3)4
!"#$%&'( )*&* $+,!- ('.%/& .%0*(
*$(+!"& 1 1 2 1 2
$+,'*$$/' 1 1 2 1 2
)+0+&*/ 3 2 1 2 1
+,4"(#*&+", 2 5 1 6 1

!"#$%&'()*(+,-.//012 !"#+
!"#$%&'( )*&* $+,!- ('.%/& .%0*(
*$(+!"& 1213 1213 1214 1213 1214 1251
$+,'*$$/' 1213 1213 1214 1213 1214 1251
)+0+&*/ 1216 1214 1213 1214 1213 1257
+,8"(#*&+", 1214 1297 1213 1291 1213 123:
!"%&'()*(+ 129; 1254 1296 1255 1296
68  
Dan  Jurafsky  

!!"#$%&'()*+,*-
!"#$%&'( )*&* $+,!- ('.%/& .%0*(
*$(+!"& 1 1 2324 1 2324
$+,'*$$/' 1 1 2324 1 2324
)+0+&*/ 5366 7377 1 7377 1
+,8"(#*&+", 7377 7349 1 73:9 1
!!"#$%&'()*+,*-./011234
!"#$%&'( )*&* $+,!- ('.%/& .%0*(
*$(+!"& 1211 1211 1234 1211 1234
$+,'*$$/' 1211 1211 1234 1211 1234
)+0+&*/ 1245 1211 1211 1211 1211
69   +,6"(#*&+", 1211 1237 1211 1289 1211
Word Meaning and Similarity
Word Similarity: Distributional Similarity (I)

Word Meaning and Similarity
Word Similarity: Distributional Similarity (II)
Dan  Jurafsky  

Using syntax to define a word's context

• Zellig Harris (1968)
• "The meaning of entities, and the meaning of grammatical relations among them, is related to the restriction of combinations of these entities relative to other entities"
• Two words are similar if they have similar parse contexts
• Duty and responsibility (Chris Callison-Burch's example)

Modified by adjectives:   additional, administrative, assumed, collective, congressional, constitutional …
Objects of verbs:         assert, assign, assume, attend to, avoid, become, breach …
Dan  Jurafsky  

Co-occurrence vectors based on syntactic dependencies

Dekang Lin, 1998. "Automatic Retrieval and Clustering of Similar Words"

• The contexts C are different dependency relations
• Subject-of "absorb"
• Prepositional-object of "inside"
• Counts for the word cell: [table of dependency counts omitted]
Dan  Jurafsky  

PMI applied to dependency relations

Hindle, Don. 1990. Noun Classification from Predicate-Argument Structure. ACL

Object of "drink"   Count   PMI
it                    3      1.3
anything              3      5.2
wine                  2      9.3
tea                   2     11.8
liquid                2     10.5

• "Drink it" is more common than "drink wine"
• But "wine" is a better "drinkable" thing than "it"
Dan  Jurafsky   Sec. 6.3

Reminder: cosine for computing similarity

cos(v,w) = (v · w) / (|v| |w|) = Σ_{i=1..N} vi wi / ( √(Σ_{i=1..N} vi²) · √(Σ_{i=1..N} wi²) )

(the dot product of v and w, normalized by the vector lengths; equivalently, the dot product of the unit vectors)

vi is the PPMI value for word v in context i
wi is the PPMI value for word w in context i

Cos(v,w) is the cosine similarity of v and w

Dan  Jurafsky  

Cosine as a similarity metric

• -1: vectors point in opposite directions
• +1: vectors point in the same direction
• 0: vectors are orthogonal

• Raw frequency or PPMI are non-negative, so cosine ranges 0 to 1
Dan  Jurafsky  

              large   data   computer
apricot         1       0       0
digital         0       1       2
information     1       6       1

Which pair of words is more similar?

cos(v,w) = Σ vi wi / ( √(Σ vi²) · √(Σ wi²) )

cosine(apricot, information) = (1·1 + 0·6 + 0·1) / ( √(1+0+0) · √(1+36+1) ) = 1 / √38 = .16
cosine(digital, information) = (0·1 + 1·6 + 2·1) / ( √(0+1+4) · √(1+36+1) ) = 8 / (√5 · √38) = .58
cosine(apricot, digital)     = (1·0 + 0·1 + 0·2) / ( √(1+0+0) · √(0+1+4) ) = 0
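
The same computation in a few lines of numpy (a sketch; vectors as in the table above):

import numpy as np

def cosine(v, w):
    # dot product divided by the product of the vector lengths
    return np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w))

apricot = np.array([1, 0, 0])      # counts for large, data, computer
digital = np.array([0, 1, 2])
information = np.array([1, 6, 1])

print(round(cosine(apricot, information), 2))  # 0.16
print(round(cosine(digital, information), 2))  # 0.58
print(round(cosine(apricot, digital), 2))      # 0.0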
Dan  Jurafsky  

Other  possible  similarity  measures  


Dan  Jurafsky  

Evaluating similarity
(the same as for thesaurus-based)

• Intrinsic evaluation:
• Correlation between algorithm and human word similarity ratings
• Extrinsic (task-based, end-to-end) evaluation:
• Spelling error detection, WSD, essay grading
• Taking TOEFL multiple-choice vocabulary tests

Levied is closest in meaning to which of these:
imposed, believed, requested, correlated
Word Meaning and Similarity
Word Similarity: Distributional Similarity (II)
