2014/02/06 PFI

Introduction to Statistical Semantics: from the distributional hypothesis to word2vec
Preferred Infrastructure
(@unnonouno)

Speaker: @unnonouno (previously at IBM, now at Preferred Infrastructure / PFI)
Background reading on semantics:
- [Bird+10], Chapter 10 (sections 10.1-10.8)
- [+96], Chapter 5 (sections 5.1-5.4)
- Wikipedia


Statistical Semantics (also called Distributional Semantics)

[Evert10]: Stefan Evert's NAACL 2010 tutorial on Distributional Semantic Models.

Example from [Evert10]: the meaning of an unknown word ("???") can be inferred from the contexts it shares with known words, placing it near words such as dog, cat, and pig and apart from words such as knife.
The Distributional Hypothesis:
"The Distributional Hypothesis is that words that occur in the same contexts tend to have similar meanings (Harris, 1954)." (ACL wiki)
Statistical Semantics:
"Statistical Semantics is the study of how the statistical patterns of human word usage can be used to figure out what people mean, at least to a level sufficient for information access." (ACL wiki)
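As a concrete illustration of the distributional hypothesis, word vectors can be built from plain co-occurrence counts and compared with cosine similarity. The mini-corpus below is an invented toy example; real models use counts from large corpora:

```python
import numpy as np

# Hypothetical toy corpus; in practice the counts come from a large corpus.
corpus = [
    "the cat chased the mouse",
    "the dog chased the cat",
    "the knife cut the bread",
    "the dog ate the bread",
]

# Word-by-word co-occurrence matrix: count pairs within each sentence.
vocab = sorted({w for s in corpus for w in s.split()})
idx = {w: i for i, w in enumerate(vocab)}
M = np.zeros((len(vocab), len(vocab)))
for s in corpus:
    words = s.split()
    for i, w in enumerate(words):
        for j, c in enumerate(words):
            if i != j:
                M[idx[w], idx[c]] += 1

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Words sharing contexts (cat/dog) score higher than words that do not (cat/knife).
print(cosine(M[idx["cat"]], M[idx["dog"]]))
print(cosine(M[idx["cat"]], M[idx["knife"]]))
```

Even on four sentences, "cat" ends up closer to "dog" than to "knife", purely from shared contexts.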
[ 13]
Latent Semantic Indexing (LSI) / Latent Semantic Analysis (LSA) [Deerwester+90]

Apply a singular value decomposition (SVD) to the term-document matrix X and keep only the k largest singular values (k = number of latent dimensions):

    X ≈ U_k Σ_k V_k^T

U_k holds the top-k left singular vectors, Σ_k the top-k singular values σ_i, and V_k the top-k right singular vectors; dropping the remaining σ_i gives a low-rank approximation of X.
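The truncated SVD behind LSI takes only a few lines of numpy. The term-document matrix below is random toy data and k is an arbitrary choice, just to show the mechanics:

```python
import numpy as np

# Hypothetical term-document count matrix X (terms x documents).
rng = np.random.default_rng(0)
X = rng.poisson(1.0, size=(50, 20)).astype(float)

k = 5  # number of latent dimensions to keep
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Rank-k approximation: X ≈ U_k Σ_k V_k^T
X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Terms are represented in U_k Σ_k, documents in Σ_k V_k^T.
term_vecs = U[:, :k] * s[:k]
doc_vecs = Vt[:k, :].T * s[:k]

print(np.linalg.norm(X - X_k))  # reconstruction error of the rank-k model
```

Similarity between terms (or between a query and documents) is then computed in the k-dimensional latent space instead of on raw counts.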
A map of the methods covered: matrix/tensor factorization approaches (LSI, NMF, PLSI, LDA, NTF) and neural-network (NN) approaches (NNLM, RNNLM, Skip-gram).
Probabilistic Latent Semantic Indexing (PLSI) [Hofmann99]

A probabilistic reformulation of LSI: each document is modeled as a mixture of latent topics, and each topic as a probability distribution over words.
Latent Dirichlet Allocation (LDA) [Blei+03]

Extends PLSI into a fully generative model by placing Dirichlet priors on the topic distributions; one of the most widely used topic models in NLP.
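[Blei+03] fits LDA with variational inference; a common alternative is collapsed Gibbs sampling. The sketch below implements the latter on an invented toy corpus of word ids (K, α, β and the corpus are arbitrary choices, not from the source):

```python
import numpy as np

rng = np.random.default_rng(0)
docs = [[0, 1, 2, 1], [0, 0, 1, 3], [4, 5, 6, 5], [4, 6, 6, 5]]  # word ids
V, K, alpha, beta = 7, 2, 0.1, 0.01

# Count tables: doc-topic, topic-word, topic totals; random initial topics.
ndk = np.zeros((len(docs), K))
nkw = np.zeros((K, V))
nk = np.zeros(K)
z = []
for d, doc in enumerate(docs):
    zs = []
    for w in doc:
        t = rng.integers(K)
        ndk[d, t] += 1; nkw[t, w] += 1; nk[t] += 1
        zs.append(t)
    z.append(zs)

for _ in range(200):  # Gibbs sweeps
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            t = z[d][i]
            ndk[d, t] -= 1; nkw[t, w] -= 1; nk[t] -= 1
            # p(z=k | rest) ∝ (n_dk + α) (n_kw + β) / (n_k + Vβ)
            p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
            t = rng.choice(K, p=p / p.sum())
            ndk[d, t] += 1; nkw[t, w] += 1; nk[t] += 1
            z[d][i] = t

theta = (ndk + alpha) / (ndk + alpha).sum(axis=1, keepdims=True)  # doc-topic
phi = (nkw + beta) / (nkw + beta).sum(axis=1, keepdims=True)      # topic-word
print(theta.round(2))
```

With two clearly separated vocabularies in the toy data, the sampler recovers one topic per document group.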
Probabilistic models: the parameters are probabilities (non-negative, summing to 1.0), which makes them easier to interpret than the unconstrained factors produced by LSI's SVD.
Non-negative Matrix Factorization (NMF) [Lee+99]

Like the SVD, factorizes a matrix into low-rank factors, but constrains both factors to be non-negative.

NMF = PLSI [Ding+08]: NMF with the KL-divergence objective and PLSI optimize equivalent objectives, so NMF can also be read as a probabilistic topic model.
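[Lee+99]'s multiplicative update rules for the squared-error objective ||X - WH||_F^2 fit in a few lines of numpy. The data matrix here is random toy input and k is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((30, 20))  # non-negative data (e.g. term-document counts)
k = 4
W = rng.random((30, k))
H = rng.random((k, 20))

eps = 1e-9  # guard against division by zero
for _ in range(300):
    # Multiplicative updates keep W and H non-negative by construction.
    H *= (W.T @ X) / (W.T @ W @ H + eps)
    W *= (X @ H.T) / (W @ H @ H.T + eps)

print(np.linalg.norm(X - W @ H))  # Frobenius reconstruction error
```

Because the updates only multiply by non-negative ratios, no projection step is needed to maintain the non-negativity constraint.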
Non-negative Tensor Factorization (NTF) [Cruys10]

Generalizes NMF from 2-way matrices to 3-way tensors, e.g. subject-verb-object co-occurrences for selectional preference induction; a plain SVD no longer applies in the 3-way setting.

word2vec
Neural Network Language Model (NNLM) [Bengio+03]

Predicts the N-th word from the previous N-1 words with a feed-forward neural network (NN), learning a continuous vector for each word as a by-product.
Recurrent Neural Network Language Model (RNNLM) [Mikolov+10]

Predicts the word at time t from the hidden state at time t-1, so unlike the NNLM it is not limited to a fixed N-word context.

Implementation: http://rnnlm.org
RNNLM vectors show linguistic regularities [Mikolov+13a]: analogy questions can be answered by simple vector arithmetic on the learned word representations.

Related application: stack-recurrent networks in a transition-based parser.
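The vector-offset method of [Mikolov+13a] answers "a is to b as c is to ?" by taking the word nearest to v_b - v_a + v_c. The demo below uses hand-made toy vectors, not trained embeddings; the values are chosen so the analogy works:

```python
import numpy as np

# Toy vectors (invented for illustration; real systems use learned embeddings).
vecs = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
    "apple": np.array([0.05, 0.5, 0.5]),
}

def analogy(a, b, c):
    """Answer 'a : b = c : ?' by nearest cosine to v_b - v_a + v_c."""
    target = vecs[b] - vecs[a] + vecs[c]
    def cos(u, v):
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return max((w for w in vecs if w not in (a, b, c)),
               key=lambda w: cos(target, vecs[w]))

print(analogy("man", "king", "woman"))  # → queen
```

The query words themselves are excluded from the candidates, as is standard in analogy evaluation.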
Skip-gram model (word2vec) [Mikolov+13b]

Proposed together with CBOW as a simple, efficient way to learn word vectors; evaluated on analogical reasoning tasks.

Training data: a sequence of words w_1, w_2, …, w_T. Maximize the average log-probability of the context words within a window of size c around each word:

    (1/T) Σ_{t=1}^{T} Σ_{-c ≤ j ≤ c, j ≠ 0} log p(w_{t+j} | w_t)

where p(w_O | w_I) is a softmax over inner products of the word vectors v_w.
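An objective of this form can be optimized cheaply with negative sampling instead of the full softmax. The sketch below trains a toy skip-gram model on a random word-id sequence; all hyperparameters (dim, window c, number of negatives, learning rate) are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
corpus = rng.integers(0, 20, size=2000)  # toy word-id sequence w_1 .. w_T
V, dim, c, neg, lr = 20, 10, 2, 5, 0.05

W_in = (rng.random((V, dim)) - 0.5) / dim   # input vectors v_w
W_out = np.zeros((V, dim))                  # output vectors v'_w

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for t, w in enumerate(corpus):
    lo, hi = max(0, t - c), min(len(corpus), t + c + 1)
    for j in range(lo, hi):
        if j == t:
            continue
        # One positive pair (w, context) plus `neg` uniformly sampled negatives.
        targets = [corpus[j]] + list(rng.integers(0, V, size=neg))
        labels = [1.0] + [0.0] * neg
        g_in = np.zeros(dim)
        for tgt, y in zip(targets, labels):
            p = sigmoid(W_in[w] @ W_out[tgt])
            g = lr * (y - p)          # gradient of the logistic loss
            g_in += g * W_out[tgt]
            W_out[tgt] += g * W_in[w]
        W_in[w] += g_in

print(W_in.shape)  # one learned vector per vocabulary id
```

The real word2vec tool additionally draws negatives from a unigram^(3/4) distribution and subsamples frequent words; both are omitted here for brevity.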
[Mikolov+13c]: extensions to the skip-gram model: negative sampling, subsampling of frequent words, and phrase vectors; released as the word2vec tool.

[Kim+13]: the learned vectors capture adjectival scales, e.g. placing "better" between "good" and "best".
[Mikolov+13d]: exploiting similarities between the vector spaces of different languages for machine translation; a linear mapping between the spaces is learned from a small seed dictionary.

Summary
- Statistical Semantics: meaning from statistical patterns of word usage (the distributional hypothesis)
- Matrix/tensor factorization methods: LSI, PLSI, LDA, NMF, NTF
- Neural-network methods: NNLM, RNNLM, Skip-gram (word2vec)

References
[Bird+10] Steven Bird, Ewan Klein, Edward Loper. Natural Language Processing with Python. O'Reilly, 2010.
[+96] Japanese textbook, 1996.
[Evert10] Stefan Evert. Distributional Semantic Models. NAACL 2010 Tutorial.
[ 13] Japanese book, 2013.
[Deerwester+90] Scott Deerwester, Susan T. Dumais, George W. Furnas, Thomas K. Landauer, Richard Harshman. Indexing by Latent Semantic Analysis. JASIS, 1990.

[Hofmann99] Thomas Hofmann.
Probabilistic Latent Semantic Indexing. SIGIR, 1999.
[Blei+03] David M. Blei, Andrew Y. Ng, Michael I. Jordan.
Latent Dirichlet Allocation. JMLR, 2003.
[Lee+99] Daniel D. Lee, H. Sebastian Seung.
Learning the parts of objects by non-negative matrix factorization.
Nature, vol 401, 1999.
[Ding+08] Chris Ding, Tao Li, Wei Peng.
On the equivalence between Non-negative Matrix Factorization and
Probabilistic Latent Semantic Indexing. Computational Statistics &
Data Analysis, 52(8), 2008.
[Cruys10] Tim Van de Cruys.
A Non-negative Tensor Factorization Model for Selectional Preference
Induction. Natural Language Engineering, 16(4), 2010.

[Bengio+03] Yoshua Bengio, Réjean Ducharme, Pascal Vincent,
Christian Jauvin.
A Neural Probabilistic Language Model. JMLR, 2003.
[Mikolov+10] Tomas Mikolov, Martin Karafiat, Lukas Burget, Jan
"Honza" Cernocky, Sanjeev Khudanpur.
Recurrent neural network based language model.
Interspeech, 2010.
[Mikolov+13a] Tomas Mikolov, Wen-tau Yih, Geoffrey Zweig.
Linguistic Regularities in Continuous Space Word
Representations. HLT-NAACL, 2013.
[Mikolov+13b] Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey
Dean.
Efficient Estimation of Word Representations in Vector Space.
CoRR, 2013.

[Mikolov+13c] Tomas Mikolov, Ilya Sutskever, Kai Chen, Gregory
S. Corrado, Jeffrey Dean.
Distributed Representations of Words and Phrases and their
Compositionality. NIPS, 2013.
[Kim+13] Joo-Kyung Kim, Marie-Catherine de Marneffe.
Deriving adjectival scales from continuous space word
representations. EMNLP 2013.
[Mikolov+13d] Tomas Mikolov, Quoc V. Le, Ilya Sutskever.
Exploiting Similarities among Languages for Machine
Translation. CoRR, 2013.
