Lecture 1: ANN - Full
• Reference
- Stanford CS231n (http://cs231n.stanford.edu/)
- Stanford CS224d (http://cs224d.stanford.edu/)
- Deep Learning Book (http://www.deeplearningbook.org/)
AI is having real-world impact
▪ Public imagination
- Text assistants
- Image generation
▪ Economy
- 454 billion USD globally (https://www.precedenceresearch.com/artificial-intelligence-market)
▪ Politics
▪ Law (MarketWatch, 2023)
▪ Labor
▪ Sciences (Nature, 2022; Wired, 2022)
▪ Education (Forbes, 2023)
Deep learning hype on media
• New York Times (2012)
- Google Brain project identifying cats from YouTube videos without any labels
• MIT Technology Review
- One of the top 10 most promising breakthrough technologies
Recent impacts
• Real industry impacts!
[Figure: speech recognition error rate (%) over 1990–2014, and image recognition error (%) compared across Microsoft, Facebook, CUHK, Google, and human performance]
Recent impacts
• Even more…
- Image captioning: generate captions on images
- Visual QA system: answer a question about an image
[Silver et al., 2016]
http://deepart.io
Supervised learning
• Teach computers with many (input, output) pairs
[Figure: an example image labeled “Cat”]
Supervised learning
• Examples
– Speech recognition
- e.g., sigmoid f(x) = 1 / (1 + e^(−x)), ReLU f(x) = max(0, x)
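As a minimal sketch, the two activation functions above can be written directly in Python (function names are my own):

```python
import math

def sigmoid(x):
    # Logistic sigmoid: squashes any real input into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    # Rectified Linear Unit: passes positives, zeroes out negatives
    return max(0.0, x)

print(sigmoid(0.0))  # 0.5
print(relu(-3.0))    # 0.0
print(relu(2.5))     # 2.5
```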
Deep neural networks
• Multiple (e.g., 5~20) layers of multiple neurons
- “Weights” updated using stochastic gradient descent
[Figure: forward pass computes the prediction; backward pass back-propagates the error using the chain rule]
[Figure: Gaussian Mixture Models (GMM) < Deep Neural Networks (DNN)]
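The forward pass / backward pass / weight-update loop above can be sketched end to end on a toy problem. This is an illustrative example only (the XOR task, layer sizes, and learning rate are my choices, and full-batch gradient descent is used for brevity rather than true SGD):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: XOR, a classic problem a single linear layer cannot solve
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# One hidden layer of 8 sigmoid units and one sigmoid output unit
W1 = rng.normal(0, 1, (2, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 1, (8, 1)); b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 1.0  # learning rate (illustrative)
for step in range(5000):
    # Forward pass: compute the prediction layer by layer
    h = sigmoid(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)

    # Backward pass: error back-propagation via the chain rule
    # (squared-error loss; each delta includes the sigmoid derivative)
    d_p = (p - y) * p * (1 - p)
    d_W2, d_b2 = h.T @ d_p, d_p.sum(0)
    d_h = (d_p @ W2.T) * h * (1 - h)
    d_W1, d_b1 = X.T @ d_h, d_h.sum(0)

    # Gradient-descent weight update
    W2 -= lr * d_W2; b2 -= lr * d_b2
    W1 -= lr * d_W1; b1 -= lr * d_b1

mse = float(np.mean((p - y) ** 2))
print(np.round(p.ravel(), 2), mse)  # predictions for the four XOR inputs
```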
Two pillars of recent advances
• Convolutional Neural Networks (CNN)
→ Excellent for image data
• Recurrent Neural Networks (RNN)
→ Excellent for sequential data
[Graves and Schmidhuber, Framewise phoneme classification with bidirectional LSTM and other neural network architectures, 2005]
[Greff et al., LSTM: A search space odyssey, 2015]
A variant of RNN
: Long Short-Term Memory (LSTM)
• Instead of simple hidden nodes, LSTM has a memory block
[Figure: LSTM forward pass through the memory block: input gate, forget gate, cell, output gate, cell output]
[Figure: LSTM backward pass through the memory block in reverse: cell output, output gate, cells, forget gate, input gate]
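A single forward step of the memory block above can be sketched as follows. This is a simplified textbook LSTM (no peephole connections); all variable names and dimensions are illustrative:

```python
import numpy as np

def lstm_step(x, h_prev, c_prev, params):
    """One forward step of an LSTM memory block.

    x: input vector, h_prev: previous cell output (hidden state),
    c_prev: previous cell state. params holds one weight matrix pair and
    bias per gate: input (i), forget (f), output (o), cell candidate (g).
    """
    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    W, U, b = params
    i = sigmoid(W['i'] @ x + U['i'] @ h_prev + b['i'])  # input gate
    f = sigmoid(W['f'] @ x + U['f'] @ h_prev + b['f'])  # forget gate
    o = sigmoid(W['o'] @ x + U['o'] @ h_prev + b['o'])  # output gate
    g = np.tanh(W['g'] @ x + U['g'] @ h_prev + b['g'])  # cell candidate
    c = f * c_prev + i * g  # cell state: forget old memory, add new
    h = o * np.tanh(c)      # cell output
    return h, c

# Tiny usage example with random weights (dimensions are illustrative)
rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = {k: rng.normal(0, 0.1, (n_hid, n_in)) for k in 'ifog'}
U = {k: rng.normal(0, 0.1, (n_hid, n_hid)) for k in 'ifog'}
b = {k: np.zeros(n_hid) for k in 'ifog'}
h, c = np.zeros(n_hid), np.zeros(n_hid)
for x in rng.normal(0, 1, (5, n_in)):  # run over a length-5 sequence
    h, c = lstm_step(x, h, c, (W, U, b))
print(h.shape)  # (4,)
```

The gating is what gives the long-term memory: the forget gate f decides how much of the old cell state survives, so information can flow across many time steps without vanishing.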
A variant of RNN
: Long Short-Term Memory (LSTM)
• Deep, bidirectional LSTM
- Multiple layers of LSTMs
- LSTMs running in both directions
[Figure: Deep LSTM vs. Bidirectional LSTM (BLSTM)]
Applications of RNN (LSTM)
: Speech Recognition
• 3 components
- AM: estimate phoneme probability given the input waveform
- LM: estimate word probability given the past word sequence
- Decoder: combine AM + LM to estimate the best sentence
[Figure: speech recognition system: speech goes into an Acoustic Model (AM; e.g., GMM, DNN, …) trained on a corpus; a Language Model (LM; e.g., n-gram, DNN, …) scores word sequences; a Decoder (e.g., WFST-based) combines them to output text such as “I love you”]
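The way the decoder combines AM and LM scores can be illustrated with a toy rescoring example (the candidate sentences and log-probabilities below are made up for illustration):

```python
# Hypothetical scores for three candidate transcriptions of one utterance.
# The acoustic model (AM) scores how well each matches the waveform; the
# language model (LM) scores how plausible each word sequence is.
am_logprob = {"I love you": -12.0, "eye love ewe": -11.5, "I loaf you": -13.0}
lm_logprob = {"I love you": -3.0,  "eye love ewe": -9.0,  "I loaf you": -8.0}

def decode(am, lm, lm_weight=1.0):
    # Decoder: log-probabilities add, so pick the sentence maximizing AM + LM
    return max(am, key=lambda s: am[s] + lm_weight * lm[s])

print(decode(am_logprob, lm_logprob))  # "I love you"
```

Note how "eye love ewe" has the best acoustic score (homophones sound identical) but loses once the language model weighs in; this is exactly why the LM matters.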
Applications of RNN (LSTM)
: Speech Recognition (acoustic model)
• BLSTM takes the entire speech input into account for recognition at time t
- Long-term memory can improve the accuracy!
Applications of RNN (LSTM)
: Speech Recognition (acoustic model)
• TIMIT: standard benchmark for phoneme recognition
- 3.5 hours (small set)
[Figure: Phone Error Rate (PER, %) on TIMIT, 1990–2014: start of using DNN (20.7%), start of using DBLSTM (18.0%), SAIT DBLSTM + RNNDrop (16.3%)]
• Similar results on a much larger set (>2000 hours) with large vocabulary as well!
• LSTM-based LM also gives a significant performance boost!
[Graves et al., Speech recognition with deep recurrent neural networks, 2013]
[Hannun et al., Deep speech: Scaling up end-to-end speech recognition, 2014]
Applications of RNN (LSTM)
: Machine Translation
• Statistical Machine Translation (SMT)
- Statistically estimate the target sentence from the source sentence
- Challenges: word order differences, one-to-many mappings
→ Find a (stochastic) mapping between sentences
[Figure: SMT example for English → French]
Applications of RNN (LSTM)
: Machine Translation
• Neural Machine Translation: LSTM plays a central role
• Main idea: use the Encoder-Decoder framework
- ENC: find a representation of the source sentence
- DEC: generate a translation from the encoded representation
[Figure: Encoder LSTM reads the source sentence into an encoded representation v; Decoder LSTM generates the target sentence from v]
[Cho et al., Learning phrase representations using RNN encoder-decoder for statistical machine translation, 2014]
[Sutskever et al., Sequence to sequence learning with neural networks, 2014]
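The ENC/DEC data flow can be sketched with a toy untrained model. Everything here is an assumption for illustration: random weights, a plain tanh RNN instead of LSTM, and invented vocabularies, so the output is not a real translation; the point is only how the encoded vector v links encoder and decoder:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabularies and random (untrained) weights
src_vocab = ["I", "love", "you", "<eos>"]
tgt_vocab = ["je", "t'", "aime", "<eos>"]
d = 8  # hidden size (illustrative)

E_src = rng.normal(0, 0.1, (len(src_vocab), d))  # source word embeddings
E_tgt = rng.normal(0, 0.1, (len(tgt_vocab), d))  # target word embeddings
W_enc = rng.normal(0, 0.1, (d, d))
W_dec = rng.normal(0, 0.1, (d, d))
W_out = rng.normal(0, 0.1, (d, len(tgt_vocab)))  # hidden -> target scores

def encode(tokens):
    # ENC: fold the whole source sentence into one fixed-size vector v
    h = np.zeros(d)
    for t in tokens:
        h = np.tanh(E_src[src_vocab.index(t)] + W_enc @ h)
    return h

def decode(v, max_len=5):
    # DEC: generate target words greedily, starting from v and feeding
    # each emitted word back in as the next input
    h, prev, out = v, tgt_vocab.index("<eos>"), []
    for _ in range(max_len):
        h = np.tanh(E_tgt[prev] + W_dec @ h)
        prev = int(np.argmax(W_out.T @ h))
        if tgt_vocab[prev] == "<eos>":
            break
        out.append(tgt_vocab[prev])
    return out

v = encode(["I", "love", "you", "<eos>"])
print(decode(v))
```

In a trained system the same structure applies, but the weights are learned end to end so that v actually carries the meaning of the source sentence.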
Advanced applications
: Image captioning
• “Translate” image to text
- Same principle as machine translation
- Combine CNN (encoder) + RNN/LSTM (decoder)
[Karpathy and Fei-Fei, Deep visual-semantic alignments for generating image descriptions, 2015] @ Stanford
[Donahue et al., Long-term recurrent convolutional networks for visual recognition and description, 2015] @ UC Berkeley
[Vinyals et al., Show and tell: A neural image caption generator, 2015] @ Google
[Mao et al., Explain images with multimodal recurrent neural networks, 2015] @ Baidu & UCLA
[Kiros et al., Unifying visual-semantic embeddings with multimodal neural language models, 2015] @ U. Toronto
RNN summary
• Flexible for applications involving sequential data
Deep learning frameworks
• Theano
- Maintained by University of Montreal
- Strong Python integration
• Torch
- Maintained by NYU, Facebook
- Based on Lua
• TensorFlow
- Maintained by Google (most recent)