Introduction to Neural Networks - CS 280 Tutorial One
Guoxing Sun
[email protected]
2020-09-16
• The history of neural networks
• Paul Werbos proposed using the back-propagation of error to train neural networks, which made multilayer networks possible.
• David Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams reported this approach and made it famous.
• George Cybenko proved the universal approximation property of neural networks.
• LeCun proposed the CNN.
• Jürgen Schmidhuber proposed the LSTM.
• Kernel SVMs worked better than neural networks on many tasks.
(Figure: a single neuron with inputs $x_1, \dots, x_d$, bias $b$, activation $a$, and activation function $f(\cdot)$.)
Basic neural network
Multi-Layer Perceptron (MLP)
$$z^{(l)} = \sum_{i=1}^{d} w_i^{(l)} a_i^{(l-1)} + b^{(l)}, \qquad a^{(l)} = f_l\!\left(z^{(l)}\right)$$

(Figure: an MLP mapping inputs $x_1, x_2, \dots, x_d$ to the output $y$.)
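As a minimal sketch of these equations in NumPy (the layer sizes and the choice of tanh are illustrative, not from the tutorial):

```python
import numpy as np

def mlp_forward(x, weights, biases, activations):
    """Forward pass of an MLP: a^(l) = f_l(W^(l) a^(l-1) + b^(l))."""
    a = x
    for W, b, f in zip(weights, biases, activations):
        z = W @ a + b   # z^(l) = sum_i w_i^(l) a_i^(l-1) + b^(l)
        a = f(z)        # a^(l) = f_l(z^(l))
    return a

# Illustrative 2-layer MLP: 3 inputs -> 4 hidden units -> 1 output
rng = np.random.default_rng(0)
weights = [rng.standard_normal((4, 3)), rng.standard_normal((1, 4))]
biases = [np.zeros(4), np.zeros(1)]
activations = [np.tanh, lambda z: z]   # tanh hidden layer, linear output

x = rng.standard_normal(3)
y = mlp_forward(x, weights, biases, activations)
```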
Basic neural network
Activation function
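For reference, the activation functions discussed later in this tutorial (sigmoid, tanh, ReLU) can be written in NumPy as follows; this is a standard sketch, not code from the slides:

```python
import numpy as np

def sigmoid(z):
    # Saturates towards 0 / 1 for large |z|
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # Saturates towards -1 / 1 for large |z|
    return np.tanh(z)

def relu(z):
    # Non-saturating for z > 0, exactly 0 for z <= 0
    return np.maximum(0.0, z)
```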
Optimization of neural network
Weight Initialization
Requirements for a good initialization:
(1) the activation values of the neurons in each layer should not saturate;
(2) the activation values of each layer should not all be 0.
2. Random value initialization
If the random values are large, the activations easily saturate; if they are small, the neurons are barely activated.
Reference: https://zhuanlan.zhihu.com/p/40175178
3. Naive initialization
Input: mean 0, var 1; Output: mean 0, var 1/3n
4. Xavier initialization
Xavier Glorot argued that a good initialization should keep the variance of the activations and of the gradients consistent across layers during propagation; this is known as the Glorot condition.
Xavier initialization is only suitable for saturating activation functions such as sigmoid and tanh, not for non-saturating activation functions such as ReLU.
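As a sketch of the usual Xavier/Glorot scheme (the exact variant used in the tutorial is not shown), the weights are drawn so that their variance depends on both the fan-in and the fan-out of the layer:

```python
import numpy as np

def xavier_uniform(fan_in, fan_out, rng=np.random.default_rng(0)):
    # Glorot & Bengio (2010): U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)),
    # which keeps activation/gradient variance roughly constant across layers
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_out, fan_in))

W1 = xavier_uniform(fan_in=784, fan_out=256)   # e.g. for a tanh layer
```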
Optimization of neural network
Weight Initialization
5. Kaiming initialization
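Kaiming (He) initialization adapts the same variance argument to ReLU networks; a minimal sketch of the common normal-distribution variant:

```python
import numpy as np

def kaiming_normal(fan_in, fan_out, rng=np.random.default_rng(0)):
    # He et al. (2015): std = sqrt(2 / fan_in), compensating for ReLU
    # zeroing out roughly half of the pre-activations
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=(fan_out, fan_in))

W1 = kaiming_normal(fan_in=784, fan_out=256)   # e.g. for a ReLU layer
```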
Optimization of neural network
Optimizer
• Gradient Descent: $w \leftarrow w - \eta \nabla_w J(w; X, y)$ (gradient over the full training set)
• Stochastic Gradient Descent: $w \leftarrow w - \eta \nabla_w J(w; x^{(i)}, y^{(i)})$ (gradient over a single sample)
• Mini-batch SGD: $w \leftarrow w - \eta \nabla_w J(w; X_{\text{batch}}, y_{\text{batch}})$ (gradient over a small batch of samples)
$J(\cdot)$: loss function; $\eta$: learning rate
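A minimal NumPy sketch of the update loop (the gradient function grad_J and the learning rate eta are placeholders); with batch_size equal to the full training set it reduces to gradient descent, and with batch_size 1 it reduces to plain SGD:

```python
import numpy as np

def minibatch_sgd(w, X, y, grad_J, eta=0.01, batch_size=32, epochs=10,
                  rng=np.random.default_rng(0)):
    """w <- w - eta * grad J(w; batch)."""
    n = len(X)
    for _ in range(epochs):
        idx = rng.permutation(n)                      # reshuffle each epoch
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            w = w - eta * grad_J(w, X[batch], y[batch])
    return w
```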
Optimization of neural network
Normalization
Batch Normalization (BN) is applied between the convolution (or the affine transformation in an MLP) and the activation function.
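As an illustration of that ordering (using PyTorch layers; the layer sizes are made up for the example), BN sits between the convolution or affine layer and the nonlinearity:

```python
import torch.nn as nn

# Convolutional block: convolution -> BatchNorm -> activation
conv_block = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.BatchNorm2d(16),
    nn.ReLU(),
)

# MLP block: affine transformation -> BatchNorm -> activation
mlp_block = nn.Sequential(
    nn.Linear(128, 64),
    nn.BatchNorm1d(64),
    nn.ReLU(),
)
```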
Optimization of neural network
Overfitting and Underfitting
$$\tilde{J}(w; X, y) = \frac{\alpha}{2} w^\top w + J(w; X, y)$$
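A minimal sketch of this L2-regularized objective and its gradient (the unregularized loss J and its gradient grad_J are placeholders):

```python
import numpy as np

def regularized_loss(w, X, y, J, alpha):
    # J_tilde(w; X, y) = (alpha / 2) * w^T w + J(w; X, y)
    return 0.5 * alpha * w @ w + J(w, X, y)

def regularized_grad(w, X, y, grad_J, alpha):
    # Gradient of the L2 penalty is alpha * w ("weight decay")
    return alpha * w + grad_J(w, X, y)
```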