"Hello World" of Deep Learning

The document provides information about Keras, a deep learning library for Python. It explains that Keras was created by François Chollet and is easy to learn and use while still providing flexibility. It also gives examples of using Keras for handwritten digit recognition and covers concepts such as model saving/loading, using GPUs, mini-batch training, and shuffling data.

“Hello world” of deep learning

If you want to learn Theano:
http://speech.ee.ntu.edu.tw/~tlkagk/courses/MLDS_2015_2/Lecture/Theano%20DNN.ecm.mp4/index.html
http://speech.ee.ntu.edu.tw/~tlkagk/courses/MLDS_2015_2/Lecture/RNN%20training%20(v6).ecm.mp4/index.html

• TensorFlow or Theano: very flexible, but they need some effort to learn.
• Keras: an interface of TensorFlow or Theano. Easy to learn and use, while still having some flexibility; you can modify it if you can write TensorFlow or Theano.
Keras
• François Chollet is the author of Keras.
• He currently works for Google as a deep learning
engineer and researcher.
• “Keras” means “horn” in Greek.
• Documentation: http://keras.io/
• Examples: https://github.com/fchollet/keras/tree/master/examples
Thanks to 沈昇勳 for providing the figures.

Notes on using Keras (使用 Keras 心得)
Example Application
• Handwritten digit recognition

Input: a 28 x 28 image → Machine → output: “1”

MNIST data: http://yann.lecun.com/exdb/mnist/


“Hello world” for deep learning
Keras provides data set loading functions: http://keras.io/datasets/

The example network:
• Input: 28 x 28 = 784 pixels
• Hidden layer 1: 500 neurons
• Hidden layer 2: 500 neurons
• Output layer: softmax over y1, y2, …, y10

Other available activation functions: softplus, softsign, relu, tanh, hard_sigmoid, linear
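The structure above can be sketched in code. This is a minimal sketch using the modern tf.keras Sequential API (which differs slightly from the Keras 1.x calls of the original lecture); the layer sizes 784 → 500 → 500 → 10 follow the slide, while the variable names are illustrative.

```python
# Build the 784 -> 500 -> 500 -> 10 network from the slide.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(784,)),               # 28 x 28 image, flattened
    layers.Dense(500, activation="relu"),    # hidden layer 1
    layers.Dense(500, activation="relu"),    # hidden layer 2
    layers.Dense(10, activation="softmax"),  # one probability per digit 0-9
])
```

Any of the activation functions listed above (softplus, softsign, tanh, …) could be substituted for `relu` in the hidden layers.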
Keras

The loss function: several alternatives are listed at https://keras.io/objectives/


Keras

Step 3.1: Configuration

Available optimizers: SGD, RMSprop, Adagrad, Adadelta, Adam, Adamax, Nadam
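Step 3.1 in code: choose a loss and an optimizer via `model.compile`. A minimal sketch with the modern tf.keras API; the tiny one-layer model here is only a stand-in so the snippet is self-contained, and the specific optimizer choice is illustrative.

```python
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(784,)),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(loss="categorical_crossentropy",  # one of the keras.io/objectives/ losses
              optimizer="adam",                 # or SGD, RMSprop, Adagrad, ...
              metrics=["accuracy"])
```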

Step 3.2: Find the optimal network parameters

Training data: images. Labels: digits. Details in the following slides.
Keras
Step 3.2: Find the optimal network parameters

Both inputs and labels are given as numpy arrays:
• x: one row per training example, each with 28 x 28 = 784 values
• y: one row per training example, each a 10-dimensional vector

https://www.tensorflow.org/versions/r0.8/tutorials/mnist/beginners/index.html
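Shaping the data into these two arrays can be sketched in numpy. Random stand-in data is used here so the snippet runs on its own; real data would come from the Keras dataset loaders (keras.io/datasets/).

```python
import numpy as np

n = 100                                    # number of training examples
images = np.random.rand(n, 28, 28)         # stand-in for MNIST images
digits = np.random.randint(0, 10, size=n)  # stand-in labels 0..9

x_train = images.reshape(n, 784)           # each row: 28 x 28 = 784 pixel values
y_train = np.zeros((n, 10))                # each row: 10-dimensional one-hot vector
y_train[np.arange(n), digits] = 1.0        # set a 1 at the labeled digit
```

These arrays would then be passed to training, e.g. `model.fit(x_train, y_train, batch_size=100, epochs=20)` in the modern API.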
Keras

Save and load models


http://keras.io/getting-started/faq/#how-can-i-save-a-keras-model
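A sketch of saving and loading, using the modern Keras API rather than the calls in the FAQ linked above; the file name and the tiny model are illustrative.

```python
import os
import tempfile
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(784,)),
    keras.layers.Dense(10, activation="softmax"),
])

path = os.path.join(tempfile.mkdtemp(), "my_model.keras")
model.save(path)                           # architecture + weights in one file
restored = keras.models.load_model(path)   # ready to predict or keep training
```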

How to use the neural network (testing):

• Case 1: score the trained model on a labeled test set.
• Case 2: predict outputs for new, unlabeled inputs.
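The two testing cases can be sketched as `evaluate` (case 1) and `predict` (case 2) in the modern Keras API. The small untrained model and random stand-in data are illustrative, only there to make the snippet self-contained.

```python
import numpy as np
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(784,)),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(loss="categorical_crossentropy", optimizer="adam",
              metrics=["accuracy"])

x_test = np.random.rand(20, 784)                   # stand-in test images
y_test = np.eye(10)[np.random.randint(0, 10, 20)]  # stand-in one-hot labels

score = model.evaluate(x_test, y_test, verbose=0)  # case 1: [loss, accuracy]
probs = model.predict(x_test, verbose=0)           # case 2: per-digit probabilities
```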
Keras
• Using a GPU to speed up training
• Way 1 (on the command line):
  THEANO_FLAGS=device=gpu0 python YourCode.py
• Way 2 (in your code, before importing Theano):
  import os
  os.environ["THEANO_FLAGS"] = "device=gpu0"
Live Demo
We do not really minimize the total loss!

Mini-batch
 Randomly initialize the network parameters.
 Pick the 1st mini-batch, e.g. x¹, x³¹, … Each example xⁿ is fed through the NN to produce an output yⁿ, which is compared with its target ŷⁿ to give a cost Cⁿ. Compute L′ = C¹ + C³¹ + ⋯ and update the parameters once.
 Pick the 2nd mini-batch, e.g. x², x¹⁶, …, compute L″ = C² + C¹⁶ + ⋯, and update the parameters once.
 ⋯
 Continue until all mini-batches have been picked: that is one epoch.

Repeat the above process for multiple epochs.

Mini-batch
Batch size influences both speed and performance. You have to tune it.

The procedure is the same as above, e.g. with 100 examples in each mini-batch, repeated for 20 epochs. With batch size = 1, mini-batch training reduces to stochastic gradient descent.
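The procedure above can be sketched in plain numpy: shuffle, cut into mini-batches, update once per batch, and one pass over all batches is one epoch. The "model" here is a toy linear least-squares fit, purely illustrative; it is not the lecture's network.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 4))
y = x @ np.array([1.0, -2.0, 0.5, 3.0])  # targets from a known weight vector

w = np.zeros(4)                          # initialize the parameters
batch_size, lr = 10, 0.05
for epoch in range(20):                  # repeat the process for several epochs
    order = rng.permutation(len(x))      # shuffle the examples each epoch
    for start in range(0, len(x), batch_size):
        idx = order[start:start + batch_size]  # pick the next mini-batch
        err = x[idx] @ w - y[idx]              # per-example errors in the batch
        grad = x[idx].T @ err / batch_size     # gradient of the batch loss L'
        w -= lr * grad                         # update parameters once
```

After 20 epochs the toy parameters recover the known weight vector, illustrating that updating per mini-batch (rather than on the total loss) still converges.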
Speed
• A smaller batch size means more updates in one epoch.
• E.g. 50000 training examples (GTX 980 on MNIST):
  • batch size = 1: 50000 updates per epoch, 166 s per epoch
  • batch size = 10: 5000 updates per epoch, 17 s per epoch
• So with batch size 1 or 10, the parameters are updated roughly the same number of times in the same period (166 s buys 1 epoch at batch size 1, but about 10 epochs at batch size 10), and batch size = 10 is more stable.
Speed - Matrix Operation

Forward pass (the backward pass is similar): input x = (x1, …, xN), layer weights W^1, …, W^L, biases b^1, …, b^L, hidden activations a^1, a^2, …, output y = (y1, …, yM):

y = f(x) = σ(W^L … σ(W^2 σ(W^1 x + b^1) + b^2) … + b^L)
Speed - Matrix Operation
• Why is mini-batch faster than stochastic gradient descent?
• Stochastic gradient descent computes z^1 = W^1 x once per example, as separate matrix-vector products.
• Mini-batch stacks the examples x into one matrix and computes all the z^1 vectors in a single matrix-matrix product.
• Practically, the matrix version is faster, especially on a GPU.
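The equivalence can be checked in numpy: the per-example loop and the single matrix product give identical numbers, but the matrix product runs through one optimized BLAS/GPU kernel. The sizes (500 x 784 weights, 128 examples) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
W1 = rng.normal(size=(500, 784))     # first-layer weights
batch = rng.normal(size=(784, 128))  # 128 examples, one per column

# SGD style: one matrix-vector product per example
z_loop = np.stack([W1 @ batch[:, i] for i in range(128)], axis=1)

# Mini-batch style: one matrix-matrix product for the whole batch
z_matrix = W1 @ batch
```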
Performance
• A larger batch size yields more efficient computation.
• However, it can yield worse performance.
Shuffle the training examples for each epoch.

In epoch 1 a mini-batch may contain, say, x¹ and x³¹ (with costs l¹ and l³¹); after reshuffling, the corresponding mini-batch in epoch 2 may contain x¹ and x¹⁷ instead, so mini-batches are composed differently in every epoch. Don't worry: this is the default behavior of Keras.
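The per-epoch shuffle can be sketched in numpy: draw a fresh permutation of the example indices each epoch, then cut it into mini-batches. The sizes (16 examples, batch size 4) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, batch_size = 16, 4
epochs = []
for epoch in range(2):
    order = rng.permutation(n)               # a new random order each epoch
    batches = order.reshape(-1, batch_size)  # 4 mini-batches of 4 indices each
    epochs.append(batches)
```

Each epoch still visits every example exactly once, but the batch compositions differ between epochs.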
Analysis
For which input does a neuron in the first layer have the largest output? Arranging each neuron's weights according to the pixels they connect to (red: positive, blue: negative) shows that the neurons in the first layer usually detect parts of the digits.
Try another task
• Document classification: e.g. feed the counts of words such as “stock” and “president” in a document to the machine, which outputs a category such as 政治 (politics), 經濟 (economics), 體育 (sports), or 財經 (finance).
• Example news source: http://top-breaking-news.com/
Try another task
Live Demo
