
Deep Learning

Hung-yi Lee
李宏毅
Deep learning attracts lots of attention.
• I believe you have seen lots of exciting results before.

Deep learning trends at Google. Source: SIGMOD 2016/Jeff Dean


Ups and downs of Deep Learning
• 1958: Perceptron (linear model)
• 1969: Perceptron has limitations
• 1980s: Multi-layer perceptron
  • Not significantly different from today's DNNs
• 1986: Backpropagation
  • Usually more than 3 hidden layers was not helpful
• 1989: 1 hidden layer is "good enough"; why deep?
• 2006: RBM initialization
• 2009: GPU
• 2011: Starts to become popular in speech recognition
• 2012: Wins the ILSVRC image competition
• 2015.2: Image recognition surpasses human-level performance
• 2016.3: AlphaGo beats Lee Sedol
• 2016.10: Speech recognition systems as good as humans
Three Steps for Deep Learning
Step 1: define a set of functions (Neural Network)
Step 2: goodness of function
Step 3: pick the best function

Deep Learning is so simple ……


Neural Network
[Figure: a network of "neurons"; each neuron applies an activation function σ(z) to the weighted sum z of its inputs plus a bias.]
Neural Network
Different connections lead to different network structures.
Network parameter θ: all the weights and biases in the "neurons"
Fully Connect Feedforward Network
[Figure: a two-input example with input (1, −1). The first neuron has weights (1, −2) and bias 1, so z = 1·1 + (−1)·(−2) + 1 = 4 and σ(4) = 0.98. The second neuron has weights (−1, 1) and bias 0, so z = −2 and σ(−2) = 0.12.]
Sigmoid Function: $\sigma(z) = \frac{1}{1 + e^{-z}}$
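A minimal NumPy sketch (not from the slides) reproducing the first neuron above:

```python
import numpy as np

def sigmoid(z):
    # Sigmoid activation: squashes any real z into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, -1.0])   # input vector from the slide
w = np.array([1.0, -2.0])   # weights of the first neuron
b = 1.0                     # its bias

z = w @ x + b               # z = 1*1 + (-1)*(-2) + 1 = 4
print(sigmoid(z))           # ≈ 0.98, matching the slide
```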
Fully Connect Feedforward Network
[Figure: the full forward pass for input (1, −1). Layer 1 outputs (0.98, 0.12), layer 2 outputs (0.86, 0.11), and the output layer gives (0.62, 0.83).]
Fully Connect Feedforward Network
[Figure: the same network applied to input (0, 0) gives layer outputs (0.73, 0.5), then (0.72, 0.12), and finally (0.51, 0.85).]
This is a function: input vector in, output vector out.
$f\!\left(\begin{bmatrix} 1 \\ -1 \end{bmatrix}\right) = \begin{bmatrix} 0.62 \\ 0.83 \end{bmatrix} \qquad f\!\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}\right) = \begin{bmatrix} 0.51 \\ 0.85 \end{bmatrix}$
Given a network structure, we define a function set.
Fully Connect Feedforward Network
[Figure: input layer x1 … xN, hidden layers 1 … L of neurons, output layer y1 … yM; every neuron in one layer connects to every neuron in the next.]
Deep = Many hidden layers
[Figure: ImageNet top-5 error vs. network depth. AlexNet (2012): 8 layers, 16.4%; VGG (2014): 19 layers, 7.3%; GoogleNet (2014): 22 layers, 6.7%.]
Source: http://cs231n.stanford.edu/slides/winter1516_lecture8.pdf

Deep = Many hidden layers
[Figure: Residual Net (2015), with a special structure, stacks 152 layers and reaches 3.57% error, vs. AlexNet (2012) 16.4%, VGG (2014) 7.3%, and GoogleNet (2014) 6.7%. For scale, Taipei 101 has 101 floors.]
Ref: https://www.youtube.com/watch?v=dxB6299gpvI
Matrix Operation
[Figure: the first layer of the earlier example, input (1, −1), outputs (0.98, 0.12), written as a matrix operation.]
$\sigma\!\left(\begin{bmatrix} 1 & -2 \\ -1 & 1 \end{bmatrix}\begin{bmatrix} 1 \\ -1 \end{bmatrix} + \begin{bmatrix} 1 \\ 0 \end{bmatrix}\right) = \sigma\!\left(\begin{bmatrix} 4 \\ -2 \end{bmatrix}\right) = \begin{bmatrix} 0.98 \\ 0.12 \end{bmatrix}$
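The same layer as a matrix operation, in a minimal NumPy sketch (numbers from the slide):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W = np.array([[1.0, -2.0],
              [-1.0,  1.0]])   # weight matrix of the layer
b = np.array([1.0, 0.0])       # bias vector
x = np.array([1.0, -1.0])      # input vector

print(sigmoid(W @ x + b))      # ≈ [0.98, 0.12]
```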
Neural Network
[Figure: input x passes through layers with parameters (W1, b1), (W2, b2), …, (WL, bL), producing activations a1, a2, … and finally the output y.]
$a^1 = \sigma(W^1 x + b^1)$
$a^2 = \sigma(W^2 a^1 + b^2)$
⋮
$y = \sigma(W^L a^{L-1} + b^L)$
Neural Network
[Figure: the same network viewed as one composed function.]
$y = f(x) = \sigma(W^L \cdots \sigma(W^2 \sigma(W^1 x + b^1) + b^2) \cdots + b^L)$
Using parallel computing techniques to speed up the matrix operations.
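A sketch of the whole forward pass as a chain of these matrix operations; the 2-4-2 layer sizes and random parameters below are hypothetical, not from the slides:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    # y = σ(W_L … σ(W_2 σ(W_1 x + b_1) + b_2) … + b_L)
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)
    return a

# Hypothetical 2-4-2 network with random parameters.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 2)), rng.normal(size=(2, 4))]
biases = [rng.normal(size=4), rng.normal(size=2)]
print(forward(np.array([1.0, -1.0]), weights, biases))
```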
Output Layer as Multi-Class Classifier
• The hidden layers act as a feature extractor, replacing manual feature engineering.
• The output layer applies softmax, turning the network into a multi-class classifier.
[Figure: input layer x1 … xK, hidden layers, softmax output layer y1 … yM.]
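A minimal sketch of softmax (standard definition; subtracting the max is the usual trick for numerical stability):

```python
import numpy as np

def softmax(z):
    # Shift by max(z) so exp never overflows; the result is unchanged.
    e = np.exp(z - np.max(z))
    return e / e.sum()          # positive entries summing to 1

print(softmax(np.array([3.0, 1.0, -3.0])))  # ≈ [0.88, 0.12, 0.00]
```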
Example Application
[Figure: a 16 × 16 image of a handwritten "2" is fed in as x1 … x256. The outputs read, e.g., y1 = 0.1 ("is 1"), y2 = 0.7 ("is 2"), …, y10 = 0.2 ("is 0"), so the image is classified as "2".]
16 × 16 = 256 pixels; ink → 1, no ink → 0.
Each output dimension represents the confidence of a digit.
Example Application
• Handwriting Digit Recognition
[Figure: a 256-dim image vector enters the neural network; the outputs y1 … y10 indicate "is 1" … "is 0", and the machine answers "2".]
What is needed is a function with
• Input: 256-dim vector
• Output: 10-dim vector
Example Application
[Figure: the fully connected network, input x1 … xN through hidden layers 1 … L to outputs y1 … y10 ("is 1" … "is 0"), as a function set containing the candidates for handwriting digit recognition.]
You need to decide the network structure so that your function set contains a good function.
FAQ
• Q: How many layers? How many neurons for each layer?
  • Trial and error + intuition
• Q: Can the structure be automatically determined?
  • E.g., Evolutionary Artificial Neural Networks
• Q: Can we design the network structure?
  • E.g., Convolutional Neural Network (CNN)
Three Steps for Deep Learning
Step 1: define a set of functions (Neural Network)
Step 2: goodness of function
Step 3: pick the best function

Deep Learning is so simple ……


Loss for an Example
[Figure: given a set of parameters, an image of "1" passes through the network (softmax output) to give y1 … y10; the target is ŷ1 = 1 and ŷ2 = … = ŷ10 = 0. The loss is the cross entropy between y and ŷ.]
$l(y, \hat{y}) = -\sum_{i=1}^{10} \hat{y}_i \ln y_i$
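A minimal sketch of this cross-entropy loss for one example (the target is digit "1", as on the slide; the network output below is hypothetical):

```python
import numpy as np

def cross_entropy(y, y_hat):
    # l(y, ŷ) = -Σ ŷ_i ln y_i, with ŷ the one-hot target.
    return -np.sum(y_hat * np.log(y))

y_hat = np.zeros(10); y_hat[0] = 1.0   # one-hot target for "1"
y = np.full(10, 0.01); y[0] = 0.91     # hypothetical softmax output
print(cross_entropy(y, y_hat))         # ≈ 0.094
```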
Total Loss
For all training data:
$L = \sum_{n=1}^{N} l^n$
[Figure: each training example x^n passes through the NN to produce y^n, which is compared with its target ŷ^n to give the per-example loss l^n.]
Find a function in the function set that minimizes the total loss L.
Equivalently: find the network parameters that minimize the total loss L.
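A sketch of the total loss over a training set; `predict` stands in for the network's forward pass (a hypothetical helper, not named in the slides):

```python
import numpy as np

def total_loss(examples, predict):
    # L = Σ_n l^n, summing per-example cross entropy.
    L = 0.0
    for x, y_hat in examples:
        y = predict(x)                  # softmax output of the network
        L += -np.sum(y_hat * np.log(y))
    return L
```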
Three Steps for Deep Learning
Step 1: define a set of functions (Neural Network)
Step 2: goodness of function
Step 3: pick the best function

Deep Learning is so simple ……


Gradient Descent
[Figure: starting from initial parameters θ, each parameter takes a step in the negative gradient direction, e.g. w1: 0.2 → 0.15 via −μ ∂L/∂w1, w2: −0.1 → 0.05 via −μ ∂L/∂w2, b1: 0.3 → 0.2 via −μ ∂L/∂b1.]
$\nabla L = \begin{bmatrix} \partial L / \partial w_1 \\ \partial L / \partial w_2 \\ \vdots \\ \partial L / \partial b_1 \\ \vdots \end{bmatrix}$ (the gradient)
Gradient Descent
[Figure: the update repeats, e.g. w1: 0.2 → 0.15 → 0.09 → …, w2: −0.1 → 0.05 → 0.15 → …, b1: 0.3 → 0.2 → 0.10 → …, recomputing −μ ∂L/∂w at each step.]
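A minimal sketch of the update rule w ← w − μ ∂L/∂w on a hypothetical one-parameter loss (numerical gradient, purely for illustration):

```python
def loss(w):
    # Hypothetical loss, not from the slides: minimum at w = 3.
    return (w - 3.0) ** 2

w = 0.2       # starting parameter
mu = 0.1      # learning rate μ
eps = 1e-6
for step in range(5):
    grad = (loss(w + eps) - loss(w - eps)) / (2 * eps)  # ∂L/∂w
    w -= mu * grad                                      # w ← w − μ ∂L/∂w
    print(step, round(w, 4))
```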
Gradient Descent
This is the "learning" of machines in deep learning ……
Even AlphaGo uses this approach.
What people imagine …… vs. what it actually is ……
I hope you are not too disappointed :p

Backpropagation
• Backpropagation: an efficient way to compute the gradients ∂L/∂w in a neural network
libdnn (developed by NTU student 周伯威)
Ref: http://speech.ee.ntu.edu.tw/~tlkagk/courses/MLDS_2015_2/Lecture/DNN%20backprop.ecm.mp4/index.html
Three Steps for Deep Learning
Step 1: define a set of functions (Neural Network)
Step 2: goodness of function
Step 3: pick the best function

Deep Learning is so simple ……


Acknowledgment
• Thanks to Victor Chen for spotting typos on the slides.
