Deep Learning For NLP
DEEP LEARNING
Overview of the Talk
So…
Please feel free to ask sensible questions during the talk if you need clarification.
And I have an accent, so let me know if you have trouble understanding the Queen's English.
Overview of the Talk
Target = y
Learn y = f(x)
For each neuron:
Activation ← sum the weighted inputs, add the bias, and apply a sigmoid-style activation function (tanh, logistic, etc.)
Activations propagate forward through the layers
Output layer: compute the error for each neuron:
Error = y − f(x)
Update the weights using the derivative of the error
Back-propagate the error derivatives through the hidden layers
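As a concrete illustration of these steps, here is a minimal NumPy sketch of one training example flowing through a single-hidden-layer network: forward pass, output error, and back-propagated weight updates. The layer sizes, learning rate, and squared-error loss are illustrative assumptions, not anything prescribed by the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative sizes: 3 inputs, 4 hidden neurons, 1 output (assumed for the sketch)
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)
lr = 0.1                          # learning rate (assumed)

x = np.array([0.5, -1.0, 2.0])    # one training example
y = np.array([1.0])               # its target

# Forward pass: sum the weighted inputs, add the bias, apply the sigmoid
h = sigmoid(W1 @ x + b1)          # hidden activations
f = sigmoid(W2 @ h + b2)          # network output f(x)

# Output layer: Error = y - f(x), with squared-error loss E = 0.5 * Error^2
error = y - f

# Back-propagate the error derivatives through the layers
delta_out = -error * f * (1 - f)               # dE/d(pre-activation) at the output
delta_hid = (W2.T @ delta_out) * h * (1 - h)   # dE/d(pre-activation) at the hidden layer

# Update the weights down the error gradient
W2 -= lr * np.outer(delta_out, h); b2 -= lr * delta_out
W1 -= lr * np.outer(delta_hid, x); b1 -= lr * delta_hid
```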
Backpropagation
Errors
Gradient Descent
Weights are updated using the partial derivative of the error w.r.t. each weight
The derivative pushes learning in the direction of steepest descent on the error surface
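The update rule itself can be sketched on a toy one-dimensional error curve; the quadratic E(w), the starting point, and the learning rate below are purely illustrative assumptions.

```python
# Gradient descent on a toy error curve E(w) = (w - 3)^2 (illustrative).
# Each update steps down the direction of steepest descent: w <- w - lr * dE/dw.
def dE_dw(w):
    return 2.0 * (w - 3.0)        # partial derivative of the error w.r.t. the weight

w, lr = 0.0, 0.1
for _ in range(50):
    w -= lr * dE_dw(w)            # move down the error gradient

print(w)                          # approaches the minimum at w = 3
```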
Gradient Descent
Drawbacks - Backpropagation
Greedy algorithm: can be viewed as constructing a binary parse tree with the lowest reconstruction error
The auto-encoder is trained with two objective functions:
1. Minimize the reconstruction error
2. Minimize the classification error in a softmax layer
The output at each level of the tree is fed into a
softmax neural network layer, trained on
labeled data
Semi-Supervised Training
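Below is a minimal NumPy sketch of this semi-supervised idea: greedily merge the adjacent pair with the lowest reconstruction error to build a binary tree, and at each new node combine the reconstruction error with the classification error of a softmax layer. The weight matrices, sizes, the alpha trade-off, and the random stand-in word vectors are assumptions for illustration, and the gradient steps needed to actually train the model are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_classes = 10, 2               # embedding size and label set are illustrative
We = rng.normal(scale=0.1, size=(d, 2 * d))      # encoder: [c1; c2] -> parent
Wd = rng.normal(scale=0.1, size=(2 * d, d))      # decoder: parent -> reconstruction
Ws = rng.normal(scale=0.1, size=(n_classes, d))  # softmax layer on each tree node
alpha = 0.2                        # trade-off between the two objectives (assumed)

def encode(c1, c2):
    return np.tanh(We @ np.concatenate([c1, c2]))

def reconstruction_error(c1, c2, parent):
    rec = np.tanh(Wd @ parent)     # try to reconstruct both children from the parent
    return np.sum((np.concatenate([c1, c2]) - rec) ** 2)

def classification_error(parent, label):
    scores = Ws @ parent
    probs = np.exp(scores - scores.max()); probs /= probs.sum()
    return -np.log(probs[label])   # cross-entropy of the softmax layer

def greedy_merge(leaves, label):
    """Greedily build a binary tree, always merging the adjacent pair
    with the lowest reconstruction error, accumulating the joint loss."""
    nodes, loss = list(leaves), 0.0
    while len(nodes) > 1:
        errs = [reconstruction_error(nodes[i], nodes[i + 1],
                                     encode(nodes[i], nodes[i + 1]))
                for i in range(len(nodes) - 1)]
        i = int(np.argmin(errs))
        parent = encode(nodes[i], nodes[i + 1])
        # Objective 1: reconstruction error; objective 2: softmax classification error
        loss += alpha * errs[i] + (1 - alpha) * classification_error(parent, label)
        nodes[i:i + 2] = [parent]
    return nodes[0], loss          # root vector and the loss to be minimized

sentence = [rng.normal(size=d) for _ in range(4)]   # stand-ins for word vectors
root, loss = greedy_merge(sentence, label=1)
```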