Week 9: Seq2seq
27 Jan 2016
Seq2seq (Sutskever et al., 2014)
(Figure: encoder RNN and decoder RNN. Source: http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns)
Seq2seq overview and applications
• Encoder-decoder
• Two RNNs (typically LSTMs or GRUs)
• Can be deterministic or variational
• Applications:
• Machine translation
• Question answering
• Dialogue models (conversational agents)
• Summarization
• Etc.
LSTM cell
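For reference, a standard formulation of the LSTM cell updates (the slide's diagram is not reproduced here; this is the common textbook form, not necessarily the exact variant shown in the figure):

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) &\text{(input gate)}\\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) &\text{(forget gate)}\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) &\text{(output gate)}\\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) &\text{(candidate cell state)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t &\text{(cell state)}\\
h_t &= o_t \odot \tanh(c_t) &\text{(hidden state)}
\end{aligned}
```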
Seq2Seq
• Source sequence x = (x1, x2, ..., x|x|), represented as word embedding vectors
• Target sequence y = (y1, y2, ..., y|y|)
• At the end of the encoding process, we have the encoder's final hidden and cell states
• Hidden state initialization: set the initial hidden and cell states of the decoder to the final hidden and cell states of the encoder (see the sketch below)
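A minimal sketch of this state hand-off in PyTorch (illustrative; the class and dimension names are assumptions, not code from the lecture):

```python
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Illustrative encoder-decoder: the decoder LSTM is initialized
    with the encoder's final hidden and cell states."""
    def __init__(self, src_vocab, tgt_vocab, emb_dim=256, hid_dim=512):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # Encode the source sequence x = (x1, ..., x|x|)
        _, (h_n, c_n) = self.encoder(self.src_emb(src_ids))
        # Decoder's initial states = encoder's final hidden/cell states
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), (h_n, c_n))
        # Unnormalized scores over the target vocabulary at each step j
        return self.out(dec_out)
```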
Seq2seq (cont.)
• At each step j of the decoder, compute the output vector and apply a softmax over the target vocabulary:
p(yjk) = exp(yjk) / Σk′ exp(yjk′)
• yjk is the value of the kth dimension of the output vector at time step j
Softmax example
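A small worked example of the softmax (NumPy; the numbers are arbitrary, not taken from the slide):

```python
import numpy as np

# Hypothetical decoder output vector y_j over a 4-word vocabulary at step j
y_j = np.array([2.0, 1.0, 0.1, -1.0])

# p(yjk) = exp(yjk) / sum_k' exp(yjk')
p = np.exp(y_j) / np.exp(y_j).sum()
print(p.round(3))   # [0.638 0.235 0.095 0.032]
print(p.sum())      # 1.0 -- a valid probability distribution
```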
• Beam search: at each time step, keep the k partial hypotheses with the highest cumulative probability, expanding each by its most probable next words (sketched below)
• k – beam width (typically 5-10)
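A compact sketch of beam search in plain Python. `step_probs` is a hypothetical callback that returns the model's next-token distribution for a given prefix; it stands in for one decoder step plus softmax:

```python
import math

def beam_search(step_probs, bos, eos, beam_width=5, max_len=20):
    """Keep the beam_width highest-scoring partial hypotheses at each step.
    step_probs(prefix) -> dict {token: p(token | prefix)}  (hypothetical)."""
    beams = [([bos], 0.0)]            # (token sequence, sum of log-probs)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == eos:        # finished hypotheses are carried over
                candidates.append((seq, score))
                continue
            for tok, p in step_probs(seq).items():
                candidates.append((seq + [tok], score + math.log(p)))
        # prune to the beam_width best candidates
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams[0][0]                # highest-scoring hypothesis
```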
Beam search
• Multiple possible replies can be generated in response to “Who does John like?”
Source: https://aws.amazon.com/blogs/machine-learning/train-neural-machine-translation-models-with-sockeye/
Visualizing Attention in Machine Translation (2)
Source: https://aws.amazon.com/blogs/machine-learning/train-neural-machine-translation-models-with-sockeye/
Variational Attention for Sequence-to-Sequence Models
Hareesh Bahuleyan, Lili Mou, Olga Vechtomova, Pascal Poupart
In Proc. COLING 2018
Deterministic Attention in Variational Encoder-Decoder (VED)
• The decoder LSTM has direct access to the source via the attention context vector cj
• This may cause the decoder to ignore z – the “bypassing” phenomenon (Bahuleyan et al., 2018)
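A schematic of one decoder step in a VED with deterministic attention, to make the bypassing risk concrete. This is an illustrative PyTorch sketch under assumed dimensions and dot-product attention, not the exact architecture of Bahuleyan et al. (2018):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VEDDecoderStep(nn.Module):
    """One decoder step of a VED with deterministic attention (schematic).
    The latent code z and the attention context cj both feed the LSTM cell."""
    def __init__(self, emb_dim, hid_dim, latent_dim, vocab_size):
        super().__init__()
        self.cell = nn.LSTMCell(emb_dim + hid_dim + latent_dim, hid_dim)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, y_emb, state, z, enc_outs):
        h, c = state
        # Deterministic attention: dot-product scores over encoder states,
        # context cj is their weighted average (one common formulation).
        scores = torch.bmm(enc_outs, h.unsqueeze(2)).squeeze(2)   # (B, |x|)
        alpha = F.softmax(scores, dim=-1)
        cj = torch.bmm(alpha.unsqueeze(1), enc_outs).squeeze(1)   # (B, hid_dim)
        # cj gives the decoder direct access to the source alongside z
        h, c = self.cell(torch.cat([y_emb, cj, z], dim=-1), (h, c))
        return self.out(h), (h, c)
```

Because cj already carries source information into every step, the model can learn to rely on the cj path and leave z largely unused, which is the bypassing phenomenon described above.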
Results on the question generation task