1. The document discusses statistical and neural-network-based models for representing word meaning, including LSI, PLSI, LDA, word2vec, and neural network language models.
2. These models represent words by their distributional properties, i.e. the contexts in which they occur, using matrix factorization, probabilistic modeling, or neural networks to learn vector representations.
3. Recent models such as word2vec learn word embeddings that capture linguistic regularities and can be used for tasks such as analogy solving and machine translation.
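The analogy arithmetic these embeddings support can be sketched with hand-made toy vectors (hypothetical stand-ins for learned word2vec embeddings, not real model output) and cosine similarity:

```python
import numpy as np

# Toy embedding table: hand-crafted so the gender/royalty offsets behave
# like the famous king - man + woman ~ queen example. Real word2vec
# vectors are learned from corpora; these are illustrative only.
vectors = {
    "man":   np.array([1.0, 0.0, 0.0]),
    "woman": np.array([0.0, 1.0, 0.0]),
    "king":  np.array([1.0, 0.0, 1.0]),
    "queen": np.array([0.0, 1.0, 1.0]),
    "apple": np.array([0.1, 0.1, 0.0]),
}

def analogy(a, b, c, vecs):
    """Return the word closest (by cosine) to vec(b) - vec(a) + vec(c),
    excluding the three query words, as word2vec-style tools do."""
    target = vecs[b] - vecs[a] + vecs[c]
    best, best_sim = None, -np.inf
    for word, v in vecs.items():
        if word in (a, b, c):
            continue
        sim = v @ target / (np.linalg.norm(v) * np.linalg.norm(target))
        if sim > best_sim:
            best, best_sim = word, sim
    return best

print(analogy("man", "king", "woman", vectors))  # queen
```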
15. (Statistical Semantics)
Statistical Semantics is the study of "how the statistical patterns of human word usage can be used to figure out what people mean, at least to a level sufficient for information access" (ACL wiki).
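The definition above can be made concrete with a small sketch: count, for each word, which other words occur within a fixed window of it in a toy corpus (the corpus and window size are illustrative choices), then compare words by the cosine of their co-occurrence rows. Words used in similar contexts get similar rows.

```python
from collections import defaultdict
import numpy as np

corpus = [
    "the cat drinks milk".split(),
    "the dog drinks water".split(),
    "the cat chases the dog".split(),
]

window = 2
counts = defaultdict(lambda: defaultdict(int))
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if i != j:
                counts[w][sent[j]] += 1  # co-occurrence within the window

vocab = sorted({w for s in corpus for w in s})
index = {w: k for k, w in enumerate(vocab)}
M = np.zeros((len(vocab), len(vocab)))  # word-by-context count matrix
for w, ctx in counts.items():
    for c, n in ctx.items():
        M[index[w], index[c]] = n

def cos(a, b):
    va, vb = M[index[a]], M[index[b]]
    return va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb))

# "cat" and "dog" share contexts (the, drinks, chases), so they end up
# more similar to each other than "cat" is to "milk".
print(cos("cat", "dog"), cos("cat", "milk"))
```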
55. References 1
[Bird+10] Steven Bird, Ewan Klein, Edward Loper.
Natural Language Processing with Python. O'Reilly, 2010.
[Evert10] Stefan Evert.
Distributional Semantic Models. NAACL 2010 Tutorial.
[Deerwester+90] Scott Deerwester, Susan T. Dumais, George W.
Furnas, Thomas K. Landauer, Richard Harshman.
Indexing by Latent Semantic Analysis. JASIS, 1990.
56. References 2
[Hofmann99] Thomas Hofmann.
Probabilistic Latent Semantic Indexing. SIGIR, 1999.
[Blei+03] David M. Blei, Andrew Y. Ng, Michael I. Jordan.
Latent Dirichlet Allocation. JMLR, 2003.
[Lee+99] Daniel D. Lee, H. Sebastian Seung.
Learning the parts of objects by non-negative matrix factorization.
Nature, vol 401, 1999.
[Ding+08] Chris Ding, Tao Li, Wei Peng.
On the equivalence between Non-negative Matrix Factorization and
Probabilistic Latent Semantic Indexing. Computational Statistics &
Data Analysis, 52(8), 2008.
[Cruys10] Tim Van de Cruys.
A Non-negative Tensor Factorization Model for Selectional Preference
Induction. Natural Language Engineering, 16(4), 2010.
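The multiplicative-update algorithm of [Lee+99] is short enough to sketch directly. The toy term-document matrix below is illustrative, not taken from any of the cited papers; it is built to have nonnegative rank 2 so a two-topic factorization can fit it closely.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy nonnegative term-document matrix (terms x documents), rank 2 by
# construction: two "topics", each spanning two terms and two documents.
V = np.array([
    [2., 1., 0., 0.],
    [4., 2., 0., 0.],
    [0., 0., 1., 2.],
    [0., 0., 2., 4.],
])

k = 2  # number of latent topics
W = rng.random((V.shape[0], k)) + 0.1  # term-topic weights, kept positive
H = rng.random((k, V.shape[1])) + 0.1  # topic-document weights

eps = 1e-9  # guards against division by zero
for _ in range(500):
    # Lee & Seung multiplicative updates for the Frobenius objective
    # ||V - WH||^2; positivity of W and H is preserved at every step.
    H *= (W.T @ V) / (W.T @ W @ H + eps)
    W *= (V @ H.T) / (W @ H @ H.T + eps)

print(np.linalg.norm(V - W @ H))  # reconstruction error, near zero here
```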
57. References 3 (NN 1)
[Bengio+03] Yoshua Bengio, Réjean Ducharme, Pascal Vincent,
Christian Jauvin.
A Neural Probabilistic Language Model. JMLR, 2003.
[Mikolov+10] Tomas Mikolov, Martin Karafiat, Lukas Burget, Jan
"Honza" Cernocky, Sanjeev Khudanpur.
Recurrent neural network based language model.
Interspeech, 2010.
[Mikolov+13a] Tomas Mikolov, Wen-tau Yih, Geoffrey Zweig.
Linguistic Regularities in Continuous Space Word
Representations. HLT-NAACL, 2013.
[Mikolov+13b] Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey
Dean.
Efficient Estimation of Word Representations in Vector Space.
CoRR, 2013.
58. References 4 (NN 2)
[Mikolov+13c] Tomas Mikolov, Ilya Sutskever, Kai Chen, Gregory
S. Corrado, Jeffrey Dean.
Distributed Representations of Words and Phrases and their
Compositionality. NIPS, 2013.
[Kim+13] Joo-Kyung Kim, Marie-Catherine de Marneffe.
Deriving adjectival scales from continuous space word
representations. EMNLP, 2013.
[Mikolov+13d] Tomas Mikolov, Quoc V. Le, Ilya Sutskever.
Exploiting Similarities among Languages for Machine
Translation. CoRR, 2013.
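The core idea of [Mikolov+13d], fitting a linear map from a source-language embedding space to a target-language one using a small bilingual dictionary, reduces to ordinary least squares. The vectors below are synthetic stand-ins for real embeddings, generated to satisfy the paper's modeling assumption (an approximately linear relation between the two spaces):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic source-language embeddings (rows) and the embeddings of their
# dictionary translations in the target space, related by a hidden linear
# map plus a little noise.
d_src, d_tgt, n = 4, 3, 50
X = rng.normal(size=(n, d_src))                       # source word vectors
true_W = rng.normal(size=(d_src, d_tgt))              # unknown "translation" map
Z = X @ true_W + 0.01 * rng.normal(size=(n, d_tgt))   # target word vectors

# Fit W minimizing ||X W - Z||_F by ordinary least squares.
W, *_ = np.linalg.lstsq(X, Z, rcond=None)

# A new source vector would be translated by mapping it through W and
# searching the target vocabulary for its nearest neighbor (omitted here).
print(np.linalg.norm(X @ W - Z))
```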