"Hello World" of Deep Learning
"Hello World" of Deep Learning
of deep learning
Keras is an interface for Theano. Theano itself is very flexible, but it takes some effort to learn; Keras is much easier to pick up.
If you want to learn Theano:
http://speech.ee.ntu.edu.tw/~tlkagk/courses/MLDS_2015_2/Lecture/Theano%20DNN.ecm.mp4/index.html
http://speech.ee.ntu.edu.tw/~tlkagk/courses/MLDS_2015_2/Lecture/RNN%20training%20(v6).ecm.mp4/index.html
Notes on using Keras
Example Application
• Handwriting digit recognition: the machine reads a 28 x 28 image of a digit and outputs which digit it is, e.g. "1"
• Input layer: 28 x 28 = 784 pixels
• Hidden layer 1: 500 neurons
• Hidden layer 2: 500 neurons
  (available activations include softplus, softsign, relu, tanh, hard_sigmoid, linear)
• Output layer: softmax over y1, y2, ..., y10, one per digit
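A minimal sketch of this network in Keras. It assumes a recent Keras API; the Theano-era Keras used in this lecture spelled some arguments differently (e.g. output_dim, nb_epoch). The activation is one of the choices listed above.

  from keras.models import Sequential
  from keras.layers import Dense

  model = Sequential()
  # Hidden layer 1: 784 pixels in, 500 neurons out
  model.add(Dense(500, input_dim=28 * 28, activation='relu'))
  # Hidden layer 2: 500 neurons
  model.add(Dense(500, activation='relu'))
  # Output layer: softmax over the 10 digit classes
  model.add(Dense(10, activation='softmax'))

  model.compile(loss='categorical_crossentropy',
                optimizer='adam',
                metrics=['accuracy'])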
Keras
• The same structure in code: 28 x 28 = 784 inputs, two hidden layers of 500 neurons, 10 softmax outputs
• Using the trained model:
  • case 1: evaluate the model on a labeled test set
  • case 2: predict the classes of new inputs
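A sketch of the two cases, after the model has been trained with model.fit (shown in the mini-batch section below). Here x_test and y_test are placeholders for the test images and labels:

  # case 1: evaluation -- returns [loss, accuracy] for the compiled metrics
  score = model.evaluate(x_test, y_test)
  print('Total loss on test set:', score[0])
  print('Accuracy on test set:', score[1])

  # case 2: prediction -- one row of 10 softmax scores per input image
  result = model.predict(x_test)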
Keras
• Using GPU to speed up training
• Way 1 (on the command line):
  THEANO_FLAGS=device=gpu0 python YourCode.py
• Way 2 (in your code, before theano is imported, since the flags are read at import time):
  import os
  os.environ["THEANO_FLAGS"] = "device=gpu0"
Live Demo
We do not really minimize total loss!
Mini-batch
• Randomly initialize the network parameters
• Pick the 1st batch (e.g. $x^1$, $x^{31}$, ...): each example $x^n$ passes through the NN to give $y^n$, which is compared with the target $\hat{y}^n$ to give the loss $C^n$; compute the batch loss
  $L' = C^1 + C^{31} + \cdots$
  and update the parameters once
• Pick the 2nd batch (e.g. $x^2$, $x^{16}$, ...): compute
  $L'' = C^2 + C^{16} + \cdots$
  and update the parameters once
• ...
• Until all mini-batches have been picked: that is one epoch
• Repeat the whole process (e.g. repeat 20 times)
• Here there are 100 examples in a mini-batch
• Batch size = 1 is stochastic gradient descent
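In Keras, the batch size and the number of epochs are set in the call to fit. A sketch with the values from this slide (argument names per the modern API; the Theano-era version used nb_epoch):

  # 100 examples per mini-batch, repeat for 20 epochs
  # (one epoch = one pass through all mini-batches)
  model.fit(x_train, y_train, batch_size=100, epochs=20)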
Speed
• Smaller batch size means more updates in one epoch
• E.g. 50000 examples (GTX 980 on MNIST):
  • batch size = 1: 50000 updates in one epoch, 166s per epoch
  • batch size = 10: 5000 updates in one epoch, 17s per epoch
• So with batch size 1 or 10, the parameters are updated about the same number of times in the same period: 1 epoch at batch size 1 takes about as long as 10 epochs at batch size 10
• Batch size = 10 is more stable
Speed - Matrix Operation
• The forward pass of the whole network, from input x to output y, is a chain of matrix operations:
  $y = \sigma\left(W^L \cdots \sigma\left(W^2 \, \sigma\left(W^1 x + b^1\right) + b^2\right) \cdots + b^L\right)$
Speed - Matrix Operation
• Why is mini-batch faster than stochastic gradient descent?
• Stochastic gradient descent computes the first layer one example at a time: $z^1 = W^1 x$, then $z^1 = W^1 x'$, ...
• Mini-batch stacks the examples into one matrix and computes them together: $\left[z^1 \; z^{1\prime}\right] = W^1 \left[x \; x'\right]$
• Practically, which one is faster? The single matrix product is, because GPUs parallelize large matrix operations
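An illustration of the point above (a sketch, not the lecture's code): computing the first layer for 100 examples as one matrix product instead of a per-example loop.

  import numpy as np

  W = np.random.randn(500, 784)        # first-layer weights
  batch = np.random.randn(784, 100)    # 100 examples, one per column

  # Stochastic style: one example at a time
  z_loop = [W @ batch[:, i] for i in range(batch.shape[1])]

  # Mini-batch style: a single matrix-matrix product
  z_batch = W @ batch                  # shape (500, 100)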
Performance
• Larger batch size yields more efficient computation
• However, it can yield worse performance
• Shuffle the training examples for each epoch, so the mini-batches differ from epoch to epoch (e.g. $x^{31}$ is grouped with different examples in epoch 1 and epoch 2)
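Keras does this shuffling for you: shuffle=True is the default in fit, shown explicitly here.

  # Re-shuffle the training data before each epoch (the Keras default)
  model.fit(x_train, y_train, batch_size=100, epochs=20, shuffle=True)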
Analysis
• For each neuron in the first layer, ask: which input x1, ..., xN gives the neuron its largest output?
• Arrange the weights according to the pixels they connect to (red: positive, blue: negative)
• The neurons in the first layer usually detect parts of the digits
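A sketch of this visualization: reshape each first-layer weight vector back to 28 x 28 and plot it. It assumes the model defined earlier; matplotlib is not part of the lecture's code.

  import matplotlib.pyplot as plt

  W1 = model.layers[0].get_weights()[0]           # kernel, shape (784, 500)
  for i in range(9):                              # first 9 neurons
      w = W1[:, i].reshape(28, 28)                # back to pixel layout
      m = abs(w).max()
      plt.subplot(3, 3, i + 1)
      plt.imshow(w, cmap='bwr', vmin=-m, vmax=m)  # red: positive, blue: negative
      plt.axis('off')
  plt.show()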
Try another task
• Document classification: features such as whether "stock" or "president" appears in a document let the machine sort it into categories like politics (政治), economy (經濟), sports (體育), or finance (財經)
http://top-breaking-news.com/
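A sketch of this setup with the same kind of network (all names and sizes here are illustrative, not from the lecture): binary bag-of-words features in, one softmax output per category.

  from keras.models import Sequential
  from keras.layers import Dense

  vocab_size = 10000   # assumed vocabulary size; input n is 1 if word n appears
  num_classes = 4      # e.g. politics, economy, sports, finance

  doc_model = Sequential()
  doc_model.add(Dense(500, input_dim=vocab_size, activation='relu'))
  doc_model.add(Dense(num_classes, activation='softmax'))
  doc_model.compile(loss='categorical_crossentropy', optimizer='adam')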
Live Demo