07-dlintro: Deep Learning for NLP
Mausam
Disclaimer: this is an outsider’s understanding. Some details may be inaccurate
[Figure: from feature engineering to feature learning — hand-crafted Features feed a Model (NB, SVM, CRF); representation models (MF, LSA, IR); a Neural Model instead learns its own features z1, z2, …]
Neural Model
• NN = (NB, SVM, CRF, …) + feature discovery
• Supervised training data → optimize a function (LL, squared error, margin, …)
• Learn feature weights + vectors
NLP with DL
Assumptions
- doc/query/word is a vector of numbers
- doc: bag/sequence/tree of words
- feature: neural (weights are shared)
- model: bag/sequence of features (non-linear); see the sketch below
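A minimal sketch of the first two assumptions, in PyTorch. The tiny vocabulary, the 4-dimensional embeddings, and the mean-pooled "bag" view are illustrative assumptions, not from the slides:

```python
import torch
import torch.nn as nn

# Hypothetical toy vocabulary and embedding size, for illustration only.
vocab = {"<unk>": 0, "deep": 1, "learning": 2, "for": 3, "nlp": 4}
embed = nn.Embedding(num_embeddings=len(vocab), embedding_dim=4)

doc = "deep learning for nlp".split()
ids = torch.tensor([vocab.get(w, 0) for w in doc])  # each word becomes an integer id

word_vectors = embed(ids)              # sequence view: one vector per word, shape (4, 4)
bag_vector = word_vectors.mean(dim=0)  # bag view: the whole doc as a single point

print(word_vectors.shape, bag_vector.shape)
```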
Meta-thoughts
Features
• Learned
  – in a task-specific, end-to-end way
  – not limited by human creativity
Everything is a “Point”
• Word embedding
• Phrase embedding
• Sentence embedding
• Word embedding in context of sentence
• Etc
[Figure: symbolic Input (word) → Encoder → neural Features z1, z2, … → Decoder → symbolic Output (class, sentence, …)]
Operations on vectors: + (addition), ; (concatenation), · (dot product)
• Uses
– question aligns with answer //QA
– sentence aligns with sentence //paraphrase
– word aligns with (~important for) sentence //attention (see the sketch below)
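A hedged sketch of the alignment idea: the dot product between two embeddings serves as the alignment/importance score, here for a toy word-vs-sentence attention. All vectors are random stand-ins rather than trained embeddings:

```python
import torch

torch.manual_seed(0)
d = 8
query = torch.randn(d)        # e.g. a question/word vector
sentence = torch.randn(5, d)  # 5 word vectors of a sentence

scores = sentence @ query                 # dot products = alignment scores
weights = torch.softmax(scores, dim=0)    # attention: importance of each word
context = weights @ sentence              # query-weighted summary of the sentence

print(weights, context.shape)
```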
g(Ax+b)
• 1-layer MLP
• Take x
– project it into a different space //relevant to task
– add a bias b (only shifts it up or down)
– apply the non-linearity g to convert it into the required output
• 2-layer MLP
– Common way to convert input to output (see the sketch below)
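A minimal sketch of g(Ax + b) and the 2-layer MLP described above; the dimensions and the tanh non-linearity are assumptions for illustration:

```python
import torch
import torch.nn as nn

d_in, d_hidden, d_out = 4, 8, 3

# 1-layer MLP: g(Ax + b): project x, shift by the bias, squash with g (tanh here).
one_layer = nn.Sequential(nn.Linear(d_in, d_out), nn.Tanh())

# 2-layer MLP: the common input -> output converter.
two_layer = nn.Sequential(
    nn.Linear(d_in, d_hidden), nn.Tanh(),  # g(A1 x + b1)
    nn.Linear(d_hidden, d_out),            # A2 h + b2 (output scores)
)

x = torch.randn(d_in)
print(one_layer(x).shape, two_layer(x).shape)
```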
Loss Functions
• Cross Entropy
• Binary Cross Entropy
• Max Margin
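A sketch of the first two losses using PyTorch's built-ins (the max-margin loss is written out after its formula further below); the logits and labels here are made-up examples:

```python
import torch
import torch.nn as nn

# Cross entropy: multi-class; compares predicted logits with the gold class index.
logits = torch.tensor([[1.2, -0.3, 0.8]])  # made-up scores over 3 classes
gold = torch.tensor([2])                   # gold class index
ce = nn.CrossEntropyLoss()(logits, gold)

# Binary cross entropy: one raw score against a 0/1 label.
score = torch.tensor([0.7])
label = torch.tensor([1.0])
bce = nn.BCEWithLogitsLoss()(score, label)

print(ce.item(), bce.item())
```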
Encoder-Decoder
[Figure: the encoder-decoder pipeline again — symbolic Input (word) → Encoder → neural Features → Decoder → symbolic Output — now with a LOSS computed between the predicted distribution P(y) and the gold output y*]
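A minimal end-to-end sketch of the figure: a symbolic input is encoded into neural features z, decoded into scores for P(y), and the loss is computed against the gold output y*. The sizes, word ids, and the bag-of-words encoder are assumptions, not the slides' model:

```python
import torch
import torch.nn as nn

vocab_size, d, num_classes = 100, 16, 3

encoder = nn.Embedding(vocab_size, d)   # symbolic input (word ids) -> features z
decoder = nn.Linear(d, num_classes)     # features z -> scores over symbolic outputs
loss_fn = nn.CrossEntropyLoss()

word_ids = torch.tensor([[4, 17, 42, 7]])  # one 4-word document (made-up ids)
y_star = torch.tensor([1])                 # gold output y*

z = encoder(word_ids).mean(dim=1)  # bag-of-words encoding: (1, d)
scores = decoder(z)                # unnormalised scores for P(y): (1, num_classes)
loss = loss_fn(scores, y_star)     # LOSS between P(y) and y*
loss.backward()                    # gradients flow end-to-end through decoder and encoder

print(loss.item())
```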
Common Loss Functions
• Max Margin
  Loss = max(0, 1 - (score(y*) - score(y_best))), where y_best is the highest-scoring competing output (see the sketch below)
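A sketch of this max-margin loss in code; the two scores are stand-ins for whatever scoring model produces score(y*) and score(y_best):

```python
import torch

def max_margin_loss(score_gold, score_best_other):
    # Loss = max(0, 1 - (score(y*) - score(y_best)))
    return torch.clamp(1.0 - (score_gold - score_best_other), min=0.0)

# The gold output outscores its closest competitor by only 0.3, so the
# margin of 1 is violated and the loss is 0.7.
print(max_margin_loss(torch.tensor(2.3), torch.tensor(2.0)))
```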
https://ruder.io/optimizing-gradient-descent/
Glorot/Xavier Initialization (tanh)
• Initializing a W matrix of dimensionality d_in × d_out
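A sketch of Glorot/Xavier initialization for a d_in × d_out weight matrix, both by the formula (uniform in ±sqrt(6 / (d_in + d_out))) and via PyTorch's built-in; the dimensions are illustrative:

```python
import math
import torch
import torch.nn as nn

d_in, d_out = 128, 64  # illustrative dimensions

# By the formula: W ~ Uniform(-a, a) with a = sqrt(6 / (d_in + d_out)).
a = math.sqrt(6.0 / (d_in + d_out))
W = torch.empty(d_in, d_out).uniform_(-a, a)

# The same thing via PyTorch's built-in; its default gain of 1.0 matches the formula above.
layer = nn.Linear(d_in, d_out)
nn.init.xavier_uniform_(layer.weight)

print(W.std().item(), layer.weight.std().item())
```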