Deep Learning Tutorial
李宏毅 (Hung-yi Lee)
Framework
Image Recognition: f(image) = "cat"
Step 1: a set of functions — a Model: f1, f2, ……
e.g. f1(cat image) = "cat"   f2(cat image) = "monkey"
     f1(dog image) = "dog"   f2(dog image) = "snake"
Framework
Image Recognition: f(image) = "cat"
Step 2: goodness of a function f — decide which function in the Model is better.
Framework — Supervised Learning
Training: use training data (images labeled "monkey", "cat", "dog") to pick the best function from the Model (Step 1).
Testing: apply the chosen function to a new image: f(image) = "cat".
Three Steps for Deep Learning
Neural Network — Neuron: a simple function

z = a1 w1 + … + ak wk + … + aK wK + b
a = σ(z)

w1 … wK: weights,  b: bias,  σ: activation function

Sigmoid Function: σ(z) = 1 / (1 + e^(−z))
Example: inputs (2, −1), weights (1, −1), bias 1:
z = 2·1 + (−1)·(−1) + 1 = 4,  a = σ(4) ≈ 0.98
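As a quick check, the same computation in numpy (a minimal sketch; numpy code is not part of the slides):

```python
import numpy as np

def sigmoid(z):
    # sigmoid activation: squashes any z into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# the neuron from the slide: inputs (2, -1), weights (1, -1), bias 1
a = np.array([2.0, -1.0])
w = np.array([1.0, -1.0])
b = 1.0
z = np.dot(w, a) + b      # 2*1 + (-1)*(-1) + 1 = 4
print(sigmoid(z))         # ~0.98
```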
Neural Network
Different connections lead to different network structures.
Each neuron computes z = Σ w a + b and outputs a = σ(z).
Input Layer (x1 …… xN) → Hidden Layers → Output Layer (y1 …… yM)
Deeper is better? ImageNet error rates (source: http://cs231n.stanford.edu/slides/winter1516_lecture8.pdf):

AlexNet (2012), 8 layers: 16.4%
VGG (2014), 19 layers: 7.3%
GoogleNet (2014): 6.7%
Residual Net (2015), special structure: 3.57%

(The slide compares the depth of Residual Net to Taipei 101.)
Example Application
Input: a 16 × 16 image as a 256-dim vector x1 … x256 (ink → 1, no ink → 0).
Output: a 10-dim vector; each dimension represents the confidence of a digit, e.g. y1 = 0.1 ("is 1"), y2 = 0.7 ("is 2"), …, y10 = 0.2 ("is 0").
Here y2 is the largest, so the image is "2".
Example Application
• Handwriting Digit Recognition
The neural network is a machine: input x1 … x256, outputs y1 ("is 1") … y10 ("is 0") → "2".
What is needed is a function with a 256-dim vector as input and a 10-dim vector as output.
Example Application
Handwriting Digit Recognition with a deep network:
Input Layer (x1 … x256) → Layer 1 → Layer 2 → …… → Layer L → Output Layer (y1 "is 1", y2 "is 2", …, y10 "is 0")
A network structure defines a function set containing the candidates for handwriting digit recognition.
The learning target
The output layer uses Softmax. Input: a 16 × 16 = 256-dim vector (ink → 1, no ink → 0).
For an image of "1", the target is that y1 has the maximum value:
y1 → 1, y2 → 0, …, y10 → 0
(given a set of parameters, the network maps x1 … x256 to y1 … y10)
Over the whole training set, each example x^r fed to the NN gives an output y^r that should match its target (r = 1, …, R).
(Toolkit example: libdnn, by NTU student Po-wei Chou. Ref: https://www.youtube.com/watch?v=ibJpTrp5mcE)
Three Steps for Deep Learning
• Step 1: define a set of functions
• Step 2: goodness of function
• Step 3: pick the best function
Deep Learning is so simple ……
Keras
TensorFlow and Theano are very flexible but need some effort to learn; Keras is an interface for them that is easy to learn and use.
Example network for handwriting digit recognition (28 × 28 input, parameters randomly initialized):
input 28 × 28 → Dense 500 → Dense 500 → Softmax → y1 y2 …… y10
Keras
Step 3.1: Configuration — set the loss and the optimizer (e.g. SGD with learning rate 0.1).
Step 3.2: Find the optimal network parameters.
Training data: images as 28 × 28 = 784-dim vectors, labels as 10-dim vectors.
After training — case 1: evaluate accuracy on a test set; case 2: predict classes for new images.
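Putting the three steps together, a minimal sketch in today's Keras API (the lecture used the older Keras 1 / Theano syntax; the layer sizes and the SGD learning rate follow the slides, the rest is a filled-in assumption):

```python
import numpy as np
from tensorflow import keras

# Step 1: define a set of functions -- the network structure
model = keras.Sequential([
    keras.layers.Dense(500, activation='sigmoid', input_shape=(784,)),
    keras.layers.Dense(500, activation='sigmoid'),
    keras.layers.Dense(10, activation='softmax'),
])

# Step 2 + 3.1: goodness of function (loss) and configuration (optimizer)
model.compile(loss='categorical_crossentropy',
              optimizer=keras.optimizers.SGD(learning_rate=0.1),
              metrics=['accuracy'])

# Step 3.2: find the optimal network parameters
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype('float32') / 255   # 784-dim vectors
x_test = x_test.reshape(-1, 784).astype('float32') / 255
y_train = keras.utils.to_categorical(y_train, 10)            # 10-dim labels
y_test = keras.utils.to_categorical(y_test, 10)
model.fit(x_train, y_train, batch_size=100, epochs=20)

score = model.evaluate(x_test, y_test)   # case 1: evaluation
pred = model.predict(x_test)             # case 2: prediction
```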
Using GPU to speed up training
• THEANO_FLAGS=device=gpu0 python YourCode.py
Live Demo
• You can find the code for demo today at the
following link:
• http://speech.ee.ntu.edu.tw/~tlkagk/DL_tutorial/DeepLecture_HelloWorld.py
Lesson we learned ...
(image source: http://ent.ltn.com.tw/news/breakingnews/1144545)
• Step 1: define a set of functions
• Step 2: goodness of function
• Step 3: pick the best function
→ Good results on training data? NO: go back to the three steps.
→ YES: Good results on testing data? NO: Overfitting!
Do not always blame Overfitting
It is overfitting only when results are good on training data but bad on testing data. Different approaches for different problems: e.g. "dropout" is for good results on testing data, not for bad results on training data.
Recipe of Deep Learning — good results on training data? Tips: choosing proper loss, mini-batch, new activation function, adaptive learning rate and momentum.
Choosing Proper Loss
For an image of "1", the target for the Softmax outputs y1 … y10 is ŷ1 = 1, ŷ2 = 0, …, ŷ10 = 0.
Square Error: Σᵢ (yᵢ − ŷᵢ)²  (= 0 when the output matches the target)
Cross Entropy: −Σᵢ ŷᵢ ln yᵢ  (= 0 when the output matches the target)
Which one is better?
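A small numpy illustration of the two losses on one example (the softmax output values here are made up for illustration):

```python
import numpy as np

y_hat = np.zeros(10); y_hat[0] = 1.0          # target for "1": (1, 0, ..., 0)
y = np.array([0.7, 0.1, 0.05, 0.05, 0.02,
              0.02, 0.02, 0.02, 0.01, 0.01])  # some softmax output (sums to 1)

square_error  = np.sum((y - y_hat) ** 2)      # = 0 only when y == y_hat
cross_entropy = -np.sum(y_hat * np.log(y))    # = 0 only when y == y_hat
print(square_error, cross_entropy)            # ~0.107, ~0.357
```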
Let's try it — testing accuracy:
Square Error: 0.11
Cross Entropy: 0.84
(Training figure: accuracy over training for cross entropy vs. square error.)
Choosing Proper Loss
When using a softmax output layer, choose cross entropy.
(Figure: total loss surface over parameters w1, w2 — cross entropy stays steep away from the minimum, while square error is flat there, so gradient descent makes progress with cross entropy. See http://jmlr.org/proceedings/papers/v9/glorot10a/)
Recipe of Deep Learning — next tip: Mini-batch.
Mini-batch: we do not really minimize total loss!
➢ Randomly initialize network parameters
➢ Pick the 1st mini-batch, compute its loss, update parameters
➢ Pick the 2nd mini-batch, compute its loss, update parameters
➢ ……
➢ Until all mini-batches have been picked: one epoch
Repeat the process above, e.g. 20 times (20 epochs).
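A sketch of one epoch of mini-batch training in numpy (the `update` gradient step is a hypothetical placeholder, not from the slides):

```python
import numpy as np

def one_epoch(x, y, params, update, batch_size=100):
    """Run one epoch: every mini-batch is picked exactly once."""
    order = np.random.permutation(len(x))        # shuffle each epoch
    for start in range(0, len(x), batch_size):
        idx = order[start:start + batch_size]    # pick the next mini-batch
        # update parameters using the loss of THIS mini-batch only
        params = update(params, x[idx], y[idx])
    return params

# repeat, e.g., 20 times:
# for _ in range(20):
#     params = one_epoch(x_train, y_train, params, update)
```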
We do not really minimize total loss — the loss of a single mini-batch is an unstable estimate of it!
Yet mini-batch training is much faster. Accuracy after 1 epoch:
Mini-batch: 0.84
No batch: 0.12
(Training figure: accuracy vs. epoch for mini-batch and no batch.)
Shuffle the training examples for each epoch: in Epoch 1 and Epoch 2 the same examples (x1, x31, x2, ……) are grouped into different mini-batches.
Don't worry — this is the default of Keras.
Recipe of Deep Learning — next tip: New activation function.
Hard to get the power of Deep …
With sigmoid activations, deeper does not always mean better: the gradient vanishes.
A large change near the input (x1 … xN) causes only a small change of the output (y1 … yM), because each sigmoid layer attenuates the signal; layers near the input therefore receive small gradients and learn very slowly.
Intuitive way to compute the derivatives: ∂l/∂w ≈ Δl/Δw — perturb a weight w near the input and observe how little the loss changes.
ReLU: a = z if z > 0, a = 0 if z ≤ 0.
Neurons with output 0 contribute nothing and can be removed, leaving a thinner linear network from input (x1, x2) to output (y1, y2).
The active part of the network is linear, so it does not have smaller gradients.
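ReLU itself is one line; in Keras you would just swap the activation string (a sketch, not the lecture's demo code):

```python
import numpy as np

def relu(z):
    # a = z for z > 0, a = 0 otherwise
    return np.maximum(0.0, z)

# the gradient is 1 wherever the neuron is active, so it is not attenuated
print(relu(np.array([-2.0, 0.5, 3.0])))   # [0.  0.5 3. ]

# in Keras: Dense(500, activation='relu') instead of activation='sigmoid'
```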
Let's try it — 9 layers, testing accuracy:
Sigmoid: 0.11
ReLU: 0.96
(Training figure: training accuracy curves for ReLU vs. sigmoid.)
Recipe of Deep Learning — next tip: Adaptive learning rate and momentum.
Momentum — in the physical world ……
A ball rolling on the error surface keeps moving even where 𝜕𝐿∕𝜕𝑤 = 0 (a plateau or local minimum): its real movement is the negative gradient plus the momentum of the last step.
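A minimal numpy sketch of the momentum update on a toy one-dimensional loss (the quadratic loss and the coefficients are made-up examples):

```python
def grad(w):
    # gradient of the toy loss L(w) = (w - 3)^2
    return 2.0 * (w - 3.0)

w, v = 0.0, 0.0        # parameter and accumulated "movement"
lr, lam = 0.1, 0.9     # learning rate, momentum coefficient
for _ in range(100):
    v = lam * v - lr * grad(w)   # movement = last movement + effect of gradient
    w += v                       # keeps moving even where grad(w) is ~0
print(w)                         # converges near 3.0
```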
Adam = adaptive learning rate + momentum
Let's try it — testing accuracy (ReLU, 3 layers):
Original: 0.96
Adam: 0.97
(Training figure: Adam converges faster.)
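Switching the model from the earlier sketch to Adam is a one-line change in the compile step ('adam' is a built-in Keras optimizer name):

```python
model.compile(loss='categorical_crossentropy',
              optimizer='adam',            # adaptive learning rate + momentum
              metrics=['accuracy'])
```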
Recipe of Deep Learning — good results on testing data? Tips: regularization, dropout, network structure.
Why Overfitting?
• Training data and testing data can be different.
Handwriting recognition example: created training data = original training data shifted by 15°.
Recipe of Deep Learning — next tip: Dropout.
Dropout
Training: before each parameter update, each neuron has some probability of being dropped out, so each update trains a thinner network!
(Image sources: http://big5.xinhuanet.com/gate/big5/news.xinhuanet.com/sports/2012-07/03/c_123363695.htm, https://www.youtube.com/watch?v=pn5dP9s9yiM; idea from Prof. Min Sun)
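In Keras, dropout is a layer inserted between the dense layers (a sketch; the 0.5 drop rate is a common default, not taken from the slides):

```python
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(500, activation='relu', input_shape=(784,)),
    keras.layers.Dropout(0.5),   # each neuron dropped with prob. 0.5 (training only)
    keras.layers.Dense(500, activation='relu'),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(10, activation='softmax'),
])
# Keras automatically disables dropout at testing time.
```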
Recipe of Deep Learning — next tip: Network structure.
CNN is a very good example! (next lecture)
Lecture III: Convolutional Neural Network (CNN)
Why CNN for Image?
• When processing an image, the first layer of a fully connected network would be very large.
Example: a 100 × 100 × 3 image flattened into a 30,000-dim vector, feeding a first hidden layer of 1000 neurons, already needs 3 × 10⁷ weights before the Softmax.
Can the fully connected network be simplified by considering the properties of image recognition?
Why CNN for Image
• Some patterns are much smaller than the whole image.
A neuron does not have to see the whole image to discover the pattern — e.g. a "beak" detector. Connecting to a small region needs fewer parameters.
Why CNN for Image
• The same patterns appear in different regions — an "upper-left beak" detector and a "middle beak" detector do almost the same job.
Why CNN for Image
• Subsampling the pixels will not change the object: a subsampled bird is still a bird.
The whole CNN
image → Convolution → Max Pooling → Convolution → Max Pooling (can repeat many times) → Flatten → Fully Connected Feedforward network → output ("cat", "dog", ……)

Property 1 (some patterns are much smaller than the whole image) and Property 2 (the same patterns appear in different regions) motivate Convolution; Property 3 (subsampling the pixels will not change the object) motivates Max Pooling.
CNN – Convolution
A convolution layer has a set of filters (Filter 1, Filter 2, …… — here 3 × 3 matrices); the values in the matrices are learned from training data.
Each filter detects a small 3 × 3 pattern in the 6 × 6 image (Property 1).
CNN – Convolution (Filter 1, stride = 1)

Filter 1:
 1 -1 -1
-1  1 -1
-1 -1  1

6 × 6 image:
1 0 0 0 0 1
0 1 0 0 1 0
0 0 1 1 0 0
1 0 0 0 1 0
0 1 0 0 1 0
0 0 1 0 1 0

Result (4 × 4):
 3 -1 -3 -1
-3  1  0 -3
-3 -3  0  1
 3 -2 -2 -1

The two 3s show the same diagonal pattern detected in two different regions (Property 2).
CNN – Convolution (Filter 2, stride = 1) — do the same process for every filter.

Filter 2:
-1  1 -1
-1  1 -1
-1  1 -1

Result (4 × 4):
-1 -1 -1 -1
-1 -1 -2  1
-1 -1 -2  1
-1  0 -4  3

Together, the 4 × 4 outputs of all filters form the Feature Map of the 6 × 6 image.
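The feature maps above can be reproduced with a few lines of numpy (a sketch of the stride-1 convolution as used on the slide, i.e. cross-correlation without filter flipping):

```python
import numpy as np

image = np.array([[1,0,0,0,0,1],
                  [0,1,0,0,1,0],
                  [0,0,1,1,0,0],
                  [1,0,0,0,1,0],
                  [0,1,0,0,1,0],
                  [0,0,1,0,1,0]])
filter1 = np.array([[ 1,-1,-1],
                    [-1, 1,-1],
                    [-1,-1, 1]])
filter2 = np.array([[-1, 1,-1],
                    [-1, 1,-1],
                    [-1, 1,-1]])

def conv(img, filt):
    # stride-1 "valid" convolution: slide the 3x3 filter over the 6x6 image
    n = img.shape[0] - filt.shape[0] + 1      # 6 - 3 + 1 = 4
    return np.array([[np.sum(img[i:i+3, j:j+3] * filt)
                      for j in range(n)] for i in range(n)])

print(conv(image, filter1))   # the 3s appear top-left and bottom-left
print(conv(image, filter2))
```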
CNN – Colorful image
A colorful image has 3 channels (RGB), so the 6 × 6 image becomes a 6 × 6 × 3 tensor, and each filter becomes a 3 × 3 × 3 cube (one 3 × 3 slice per channel — the slide shows Filter 1 and Filter 2 stacked three times); convolution sums over all channels.
Convolution v.s. Fully Connected
Convolving the 6 × 6 image with a filter can be seen as a fully-connected layer: flatten the image into inputs x1 …… x36; convolution is then a fully-connected layer with most connections removed.
Filter 1 as a neuron: flatten the 6 × 6 image into 36 pixels (pixel 1 = 1, pixel 2 = 0, pixel 3 = 0, …).
The neuron for the upper-left position connects only to 9 inputs — pixels 1, 2, 3, 7, 8, 9, 13, 14, 15 — with the filter values as weights, and outputs 3.
Fewer parameters! Only connected to 9 inputs, not fully connected.
The neuron for the next position connects to pixels 2, 3, 4, 8, 9, 10, 14, 15, 16 and outputs −1 — using the same 9 weights (shared weights).
Fewer parameters — and even fewer with weight sharing!
CNN – Max Pooling
Take the two 4 × 4 outputs of Filter 1 and Filter 2:
 3 -1 -3 -1     -1 -1 -1 -1
-3  1  0 -3     -1 -1 -2  1
-3 -3  0  1     -1 -1 -2  1
 3 -2 -2 -1     -1  0 -4  3
Group each into 2 × 2 regions and keep the maximum of each region.
CNN – Max Pooling
6 × 6 image → Conv → 4 × 4 maps → Max Pooling → a new image, but smaller (2 × 2):
Filter 1 channel:  3 0 / 3 1
Filter 2 channel: -1 1 / 0 3
Each filter is a channel.
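The pooled channels can be checked the same way (a sketch; `fmap1`/`fmap2` are the 4 × 4 filter outputs computed above):

```python
import numpy as np

fmap1 = np.array([[ 3,-1,-3,-1], [-3, 1, 0,-3], [-3,-3, 0, 1], [ 3,-2,-2,-1]])
fmap2 = np.array([[-1,-1,-1,-1], [-1,-1,-2, 1], [-1,-1,-2, 1], [-1, 0,-4, 3]])

def max_pool_2x2(m):
    # keep the maximum of each non-overlapping 2x2 region
    return np.array([[m[i:i+2, j:j+2].max() for j in range(0, 4, 2)]
                     for i in range(0, 4, 2)])

print(max_pool_2x2(fmap1))   # [[3 0] [3 1]]
print(max_pool_2x2(fmap2))   # [[-1 1] [0 3]]
```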
The whole CNN
Convolution + Max Pooling turn the image into a new, smaller image (here 2 × 2, with channels 3 0 / 3 1 and -1 1 / 0 3); the number of channels is the number of filters. This can repeat many times.
The whole CNN — Flatten
After the last Max Pooling, the new image (2 × 2 × 2 here) is flattened into a vector of its 8 values and fed into a fully connected feedforward network, which produces the final output ("cat", "dog", ……).
CNN in Keras
Only the network structure and the input format are modified (vector → 3-D tensor).
Input_shape = (1, 28, 28): 1 channel (black/white; 3 for RGB), 28 × 28 pixels.
First convolution: 25 filters of size 3 × 3 (e.g. 1 -1 -1 / -1 1 -1 / -1 -1 1, ……), then Max Pooling, then Convolution again, ……
CNN in Keras
input: 1 × 28 × 28
Convolution (25 filters) → 25 × 26 × 26 — how many parameters for each filter? 9 (= 3 × 3 × 1)
Max Pooling → 25 × 13 × 13
Convolution (50 filters) → 50 × 11 × 11 — how many parameters for each filter? 225 (= 3 × 3 × 25)
Max Pooling → 50 × 5 × 5
CNN in Keras
… → Max Pooling → 50 × 5 × 5 → Flatten → 1250 → Fully Connected Feedforward network → output
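The shapes above map onto a short Keras model (a sketch in today's channels-last API; the lecture's Keras 1 code used channels-first (1, 28, 28)):

```python
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Conv2D(25, (3, 3), input_shape=(28, 28, 1)),  # -> 26 x 26 x 25
    keras.layers.MaxPooling2D((2, 2)),                         # -> 13 x 13 x 25
    keras.layers.Conv2D(50, (3, 3)),                           # -> 11 x 11 x 50
    keras.layers.MaxPooling2D((2, 2)),                         # -> 5 x 5 x 50
    keras.layers.Flatten(),                                    # -> 1250
    keras.layers.Dense(10, activation='softmax'),
])
model.summary()   # check the 9- and 225-weight-per-filter counts (plus biases)
```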
Live Demo
What does CNN learn?
Consider the second convolution layer (input → Convolution with 25 3×3 filters → Max Pooling → Convolution with 50 3×3 filters → output 50 × 11 × 11): the output of the k-th filter is an 11 × 11 matrix with elements aᵏᵢⱼ (e.g. 3 -1 …… -1 / -3 1 …… -3 / …… / 3 -2 …… -1).
Degree of the activation of the k-th filter: aᵏ = Σᵢ₌₁¹¹ Σⱼ₌₁¹¹ aᵏᵢⱼ
Find the input x* = arg maxₓ aᵏ (by gradient ascent).
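Gradient ascent on the input can be sketched with TensorFlow (assumptions: `model` is the trained CNN from the sketch above, whose second conv layer is `model.layers[2]`; the step size and iteration count are arbitrary):

```python
import tensorflow as tf

k = 0                                                  # which filter to look at
sub = tf.keras.Model(model.input, model.layers[2].output)  # up to 2nd conv layer
x = tf.Variable(tf.random.uniform((1, 28, 28, 1)))     # start from noise
opt = tf.keras.optimizers.Adam(learning_rate=0.1)

for _ in range(100):
    with tf.GradientTape() as tape:
        a_k = tf.reduce_sum(sub(x)[0, :, :, k])   # a^k: sum of the 11x11 map
        loss = -a_k                               # ascent = descent on -a^k
    opt.apply_gradients([(tape.gradient(loss, x), x)])

x_star = x.numpy()   # the input that most activates filter k
```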
[Figure: the same gradient-ascent analysis applied after the flatten layer — the inputs x* that maximize the neurons corresponding to digits 0–8.]
CNN applications — Deep Dream
• Given a photo, the machine adds what it sees …… (http://deepdreamgenerator.com/)

Deep Style
• Given a photo, make its style like famous paintings (https://dreamscopeapp.com/)
Deep Style: one CNN captures the content of the photo, another CNN captures the style of the painting; the machine then finds an image with the content of the first and the style of the second.
Application: Playing Go
Network input: the board as a 19 × 19 matrix (like an image) — black: 1, white: −1, none: 0.
Network output: the next move, a 19 × 19 vector of positions.
A fully-connected feedforward network can be used, but CNN performs much better. (http://lgs.tw/qwwheue)
Training: collect the records of many previous plays; for each board position, the CNN learns to output the move that was played next.
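A sketch of the board encoding the slide describes (the string representation and helper names are made up for illustration):

```python
import numpy as np

def encode_board(rows):
    """rows: 19 strings of 'B' (black), 'W' (white), '.' (none)."""
    value = {'B': 1.0, 'W': -1.0, '.': 0.0}   # black: 1, white: -1, none: 0
    board = np.array([[value[c] for c in r] for r in rows], dtype=np.float32)
    return board.reshape(19, 19, 1)           # one-channel "image" for the CNN

def encode_move(row, col):
    """The target: the next move as a one-hot 19 x 19 = 361-dim vector."""
    t = np.zeros(361, dtype=np.float32)
    t[row * 19 + col] = 1.0
    return t
```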
Why CNN for Go playing?
• Some patterns are much smaller than the whole board: AlphaGo uses 5 × 5 filters for its first layer.