04 - Word Representations

The document discusses different approaches to representing word meanings in computers, including knowledge-based representations using concepts like hypernyms, and corpus-based representations using techniques like one-hot encoding, co-occurrence matrices, and learning low-dimensional dense word vectors directly from text. Corpus-based representations allow modeling word similarity and capturing syntactic and semantic relationships.

Applied Deep Learning

Word Representations
March 17th, 2020  http://adl.miulab.tw
2 Meaning Representations
◉ Definition of “Meaning”
o the idea that is represented by a word, phrase, etc.
o the idea that a person wants to express by using words, signs, etc.
o the idea that is expressed in a work of writing, art, etc.
3 Meaning Representations in Computers

Knowledge-Based Representation Corpus-Based Representation


4 Meaning Representations in Computers

Knowledge-Based Representation Corpus-Based Representation


5 Knowledge-Based Representation
◉ Hypernym (“is-a”) relationships from WordNet

Issues:
▪ newly-invented words
▪ subjective
▪ annotation effort
▪ difficult to compute word similarity
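Hypernym chains can be read off WordNet programmatically. A minimal sketch using NLTK's WordNet interface (an illustration, not part of the slides; assumes the wordnet corpus has been downloaded via nltk.download):

    from nltk.corpus import wordnet as wn  # requires: nltk.download("wordnet")

    # First listed sense of "car" and its is-a (hypernym) structure.
    car = wn.synsets("car")[0]         # Synset('car.n.01')
    print(car.hypernyms())             # direct hypernyms, e.g. [Synset('motor_vehicle.n.01')]
    for path in car.hypernym_paths():  # each path runs from the root entity.n.01 down to car.n.01
        print(" -> ".join(s.name() for s in path))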
6 Meaning Representations in Computers

Knowledge-Based Representation Corpus-Based Representation


7 Corpus-Based Representation
◉ Atomic symbols: one-hot representation

car         [0 0 0 0 0 0 1 0 0 … 0]
motorcycle  [0 0 1 0 0 0 0 0 0 … 0]

Issues: difficult to compute word similarity (e.g. comparing “car” and “motorcycle”):
[0 0 0 0 0 0 1 0 0 … 0] AND [0 0 1 0 0 0 0 0 0 … 0] = 0
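A minimal numpy sketch of this failure (the indices 6 and 2 are just the hypothetical vocabulary positions drawn in the slide):

    import numpy as np

    def one_hot(index, vocab_size):
        """Return a vector of zeros with a single 1 at the given index."""
        v = np.zeros(vocab_size)
        v[index] = 1.0
        return v

    car = one_hot(6, vocab_size=10)         # hypothetical index of "car"
    motorcycle = one_hot(2, vocab_size=10)  # hypothetical index of "motorcycle"
    print(car @ motorcycle)                 # 0.0 — distinct one-hot vectors are always orthogonal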

Idea: words with similar meanings often have similar neighbors


8 Corpus-Based Representation
◉ Neighbor-based representation
o Co-occurrence matrix constructed via neighbors
o Neighbor definition: full document vs. windows

Full document: a word-document co-occurrence matrix gives general topics
→ “Latent Semantic Analysis”

Windows: a context window around each word
→ captures syntactic (e.g. POS) and semantic information
9 Window-Based Co-occurrence Matrix
◉ Example
o Window length = 1
o Left or right context
o Corpus: I love AI.
          I love deep learning.
          I enjoy learning.

Counts     I   love  enjoy  AI  deep  learning
I          0    2     1     0    0      0
love       2    0     0     1    1      0
enjoy      1    0     0     0    0      1
AI         0    1     0     0    0      0
deep       0    1     0     0    0      1
learning   0    0     1     0    1      0

Similar words now share neighbors, so their similarity is > 0 (e.g. “AI” and “deep” both co-occur with “love”).
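A minimal sketch that reconstructs this matrix (vocabulary order fixed to match the table; lower-cased for simplicity):

    import numpy as np

    sentences = [s.lower().split() for s in
                 ["I love AI", "I love deep learning", "I enjoy learning"]]
    vocab = ["i", "love", "enjoy", "ai", "deep", "learning"]  # same order as the table
    index = {w: i for i, w in enumerate(vocab)}

    window = 1
    X = np.zeros((len(vocab), len(vocab)), dtype=int)
    for sent in sentences:
        for i, w in enumerate(sent):
            # count every word within `window` positions to the left or right
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if j != i:
                    X[index[w], index[sent[j]]] += 1
    print(X)  # reproduces the counts in the table above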

Issues:
▪ matrix size increases with vocabulary
▪ high dimensional
▪ sparsity → poor robustness

Idea: low-dimensional word vectors
10 Low-Dimensional Dense Word Vector
◉ Method 1: dimension reduction on the matrix
◉ Singular Value Decomposition (SVD) of co-occurrence matrix X

X ≈ U_k Σ_k V_kᵀ  (approximate X by keeping only the top k singular values)
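A sketch of the reduction, reusing the co-occurrence matrix X built in the earlier example (scaling U_k by the singular values is one common choice for the word vectors):

    # continues from the co-occurrence example above
    U, S, Vt = np.linalg.svd(X.astype(float), full_matrices=False)
    k = 2                             # target dimensionality
    word_vectors = U[:, :k] * S[:k]   # one dense k-dimensional row per word
    print(word_vectors.shape)         # (6, 2)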
11 Low-Dimensional Dense Word Vector
◉ Method 1: dimension reduction on the matrix
◉ Singular Value Decomposition (SVD) of co-occurrence matrix X

Issues:
▪ computationally expensive: O(mn²) for an n×m matrix when n < m
▪ difficult to add new words

Idea: directly learn low-dimensional word vectors

[Figure: word vector spaces capturing semantic relations and syntactic relations (Rohde et al., 2005)]


Rohde et al., “An Improved Model of Semantic Similarity Based on Lexical Co-Occurrence,” 2005.
12 Low-Dimensional Dense Word Vector
◉ Method 2: directly learn low-dimensional word vectors
○ Learning representations by back-propagation. (Rumelhart et al., 1986)
○ A neural probabilistic language model (Bengio et al., 2003)
○ NLP (almost) from Scratch (Collobert & Weston, 2008)
○ Recent and most popular models: word2vec (Mikolov et al., 2013) and GloVe (Pennington et al., 2014)
• Also known as “word embeddings”
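A minimal sketch of direct learning with gensim's word2vec implementation (an illustration, not part of the slides; assumes gensim ≥ 4.0, where the dimensionality argument is named vector_size, and reuses the toy corpus from slide 9, which is far too small for useful embeddings):

    from gensim.models import Word2Vec

    sentences = [["i", "love", "ai"],
                 ["i", "love", "deep", "learning"],
                 ["i", "enjoy", "learning"]]
    # sg=1 selects the skip-gram variant; sg=0 would select CBOW
    model = Word2Vec(sentences, vector_size=50, window=1, min_count=1, sg=1)
    print(model.wv["ai"].shape)               # (50,) dense word embedding
    print(model.wv.similarity("ai", "deep"))  # cosine similarity between embeddings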
13 Summary
◉ Knowledge-based representation
◉ Corpus-based representation
✓ Atomic symbol
✓ Neighbors
o High-dimensional sparse word vector
o Low-dimensional dense word vector
▪ Method 1 – dimension reduction
▪ Method 2 – direct learning
