9 Neural Networks Learning
Machine Learning
Dr. Muhammad Amjad Iqbal
Associate Professor
University of Central Punjab, Lahore.
[email protected]
https://sites.google.com/a/ucp.edu.pk/mai/iml/
Slides adapted from Prof. Dr. Andrew Ng, Stanford & Dr. Humayoun
Neural Networks: Learning
Cost function
• NNs are one of the most powerful learning algorithms we have
• We will study a learning algorithm for fitting the parameters of a
neural network given a training set
• First things first: the neural network cost function
• We focus on the application of NNs to classification problems
Neural Network (Classification)
L = total no. of layers in the network
s_l = no. of units (not counting the bias unit) in layer l
[Network diagram: Layer 1, Layer 2, Layer 3, Layer 4]  e.g. s_1 = 3, s_2 = 5, s_4 = s_L = 4
Binary classification: y ∈ {0, 1}; one output unit, s_L = 1, K = 1
Multi-class classification (K classes): y ∈ ℝ^K, e.g. [1 0 0 0]ᵀ, [0 1 0 0]ᵀ, [0 0 1 0]ᵀ, [0 0 0 1]ᵀ; K output units
Cost function: generalization of logistic regression

Logistic regression:
J(θ) = −(1/m) Σ_{i=1}^{m} [ y^(i) log h_θ(x^(i)) + (1 − y^(i)) log(1 − h_θ(x^(i))) ] + (λ/2m) Σ_{j=1}^{n} θ_j²

Neural network (h_Θ(x) ∈ ℝ^K, (h_Θ(x))_k = k-th output):
J(Θ) = −(1/m) Σ_{i=1}^{m} Σ_{k=1}^{K} [ y_k^(i) log (h_Θ(x^(i)))_k + (1 − y_k^(i)) log(1 − (h_Θ(x^(i)))_k) ] + (λ/2m) Σ_{l=1}^{L−1} Σ_{i=1}^{s_l} Σ_{j=1}^{s_{l+1}} (Θ_ji^(l))²
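A minimal NumPy sketch of this cost, assuming the network outputs are already available as an m × K matrix H (row i holds h_Θ(x^(i))), the labels as a one-hot m × K matrix Y, and the Θ^(l) matrices as a list thetas whose first column holds the bias weights (all names are illustrative, not a fixed API):

import numpy as np

def nn_cost(H, Y, thetas, lam):
    # cross-entropy summed over the m examples and the K output units
    m = Y.shape[0]
    cost = -np.sum(Y * np.log(H) + (1 - Y) * np.log(1 - H)) / m
    # regularization over every Theta^(l), excluding the bias column j = 0
    reg = sum(np.sum(T[:, 1:] ** 2) for T in thetas)
    return cost + lam / (2 * m) * reg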
Example: handwritten digit recognition (classes "one", "two", "three", "four", …), K = 10 classes.
Neural network: cost function as above, with the regularization sum running over layers l = 1, …, L − 1.
Gradient computation
Gradient computation: Backpropagation algorithm
• We have already studied forward propagation
• It takes the initial input to the neural network and pushes it
through the network, layer by layer
• This produces the output hypothesis h_Θ(x), which may be a single
real number (K = 1) but can also be a vector (K > 1)
All vectorized:
δ^(4) = a^(4) − y
δ^(3) = (Θ^(3))ᵀ δ^(4) .* g′(z^(3)), where g′(z^(3)) = a^(3) .* (1 − a^(3))
δ^(2) = (Θ^(2))ᵀ δ^(3) .* g′(z^(2)), where g′(z^(2)) = a^(2) .* (1 − a^(2))
g′ is the derivative of the activation function g (sigmoid); there is no δ^(1) term, since the inputs carry no error.
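As a small sketch, the sigmoid derivative used above can be computed either from z or directly from a = g(z), since g′(z) = g(z)(1 − g(z)) = a .* (1 − a) (illustrative names):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_gradient(z):
    # g'(z) = g(z) * (1 - g(z)), i.e. a .* (1 - a) with a = sigmoid(z)
    a = sigmoid(z)
    return a * (1 - a)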
Gradient computation: Backpropagation algorithm
Intuition: δ_j^(l) = "error" of node j in layer l.
Gradient computation: Backpropagation algorithm (updates)
δ^(2) = (Θ^(2))ᵀ δ^(3) .* (a^(2) .* (1 − a^(2)))
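A hedged sketch of these backpropagation updates for one training example in the 4-layer network above. It assumes the activations a1..a4 (with bias units prepended to a1, a2, a3) come from forward propagation, Theta2/Theta3 include the bias columns, and Delta1..Delta3 are the gradient accumulators; all names are illustrative:

import numpy as np

def backprop_one_example(a1, a2, a3, a4, y, Theta2, Theta3, Delta1, Delta2, Delta3):
    # "error" of the output layer
    delta4 = a4 - y
    # propagate the error backwards; drop the bias component after the multiply
    delta3 = (Theta3.T @ delta4)[1:] * (a3[1:] * (1 - a3[1:]))
    delta2 = (Theta2.T @ delta3)[1:] * (a2[1:] * (1 - a2[1:]))
    # no delta1 term: the inputs carry no error
    # accumulate Delta^(l) += delta^(l+1) (a^(l))^T for each layer
    Delta3 += np.outer(delta4, a3)
    Delta2 += np.outer(delta3, a2)
    Delta1 += np.outer(delta2, a1)
    return Delta1, Delta2, Delta3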
• We have now calculated the partial derivative for each parameter
• We can use this "gradient" in gradient descent or in one of the
advanced optimization algorithms
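For example, once the partial-derivative matrices D^(l) are available, a plain gradient descent step is just Θ^(l) := Θ^(l) − α · D^(l); a one-line sketch with illustrative names:

def gradient_descent_step(Thetas, Ds, alpha=0.1):
    # Theta^(l) := Theta^(l) - alpha * D^(l) for every layer l
    return [T - alpha * D for T, D in zip(Thetas, Ds)]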
Forward Propagation (worked example)

For a training example (x^(i), y^(i)):
z^(2) → a^(2),  z^(3) → a^(3),  z^(4) → a^(4)
unit by unit: z_1^(2) → a_1^(2),  z_2^(2) → a_2^(2),  z_1^(3) → a_1^(3),  z_2^(3) → a_2^(3), …

For example, the first unit of layer 3:
z_1^(3) = Θ_10^(2) × 1 + Θ_11^(2) × a_1^(2) + Θ_12^(2) × a_2^(2)
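A minimal vectorized sketch of this forward pass for the 4-layer network, assuming sigmoid activations and hypothetical weight matrices Theta1..Theta3 whose first column multiplies the bias unit:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagate(x, Theta1, Theta2, Theta3):
    a1 = np.concatenate(([1.0], x))            # a^(1): input plus bias unit
    z2 = Theta1 @ a1                            # z^(2) = Theta^(1) a^(1)
    a2 = np.concatenate(([1.0], sigmoid(z2)))
    z3 = Theta2 @ a2                            # e.g. z1^(3) = Theta10^(2)*1 + Theta11^(2)*a1^(2) + ...
    a3 = np.concatenate(([1.0], sigmoid(z3)))
    z4 = Theta3 @ a3
    a4 = sigmoid(z4)                            # h_Theta(x)
    return a1, a2, a3, a4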
What is backpropagation doing? (When K = 1)
Focusing on a single example (x^(i), y^(i)), one output unit, and ignoring regularization:
cost(i) = y^(i) log h_Θ(x^(i)) + (1 − y^(i)) log(1 − h_Θ(x^(i)))
(Think of cost(i) ≈ (h_Θ(x^(i)) − y^(i))², a squared-error-like quantity.)
I.e. how well is the network doing on example i?
Neural Networks Learning
Implementation note: Unrolling parameters
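The point of unrolling: advanced optimizers typically want the parameters and the gradient as single vectors, so the Θ^(l) (and D^(l)) matrices are flattened into one long vector and reshaped back inside the cost function. A NumPy sketch with illustrative sizes (e.g. Theta1 and Theta2 are 10×11, Theta3 is 1×11):

import numpy as np

Theta1 = np.ones((10, 11))
Theta2 = np.ones((10, 11))
Theta3 = np.ones((1, 11))

# unroll all parameters into one vector for the optimizer
thetaVec = np.concatenate([Theta1.ravel(), Theta2.ravel(), Theta3.ravel()])

# reshape back into matrices inside the cost function
Theta1 = thetaVec[0:110].reshape(10, 11)
Theta2 = thetaVec[110:220].reshape(10, 11)
Theta3 = thetaVec[220:231].reshape(1, 11)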
Neural Networks Learning
Gradient checking
Motivation
• Backpropagation has a lot of details
– Small bugs may creep in and ruin it
• It may look like J(Θ) is decreasing, but in reality it may not be
decreasing by as much as it should
• Gradient checking helps to make sure that an implementation is
working correctly
Numerical estimation of gradients
Two-sided difference: (d/dΘ) J(Θ) ≈ (J(Θ + ε) − J(Θ − ε)) / (2ε), for a small ε (e.g. ε = 10⁻⁴)
Implementation Note:
- Implement backprop to compute DVec (the unrolled D^(1), D^(2), D^(3)).
- Implement a numerical gradient check to compute gradApprox.
- Make sure they give similar values.
- Turn off gradient checking. Use the backprop code for learning.
Important:
- Be sure to disable your gradient checking code before training
your classifier. If you run numerical gradient computation on
every iteration of gradient descent (or in the inner loop of
costFunction(…)), your code will be very slow.
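A hedged sketch of the numerical check: perturb each entry of the unrolled parameter vector by ±ε, evaluate J, and compare the result against the DVec produced by backprop (costFunction here stands for the assumed unrolled cost routine, not a fixed API):

import numpy as np

def numerical_gradient(costFunction, theta, eps=1e-4):
    # gradApprox[i] ≈ (J(theta + eps*e_i) - J(theta - eps*e_i)) / (2*eps)
    gradApprox = np.zeros_like(theta)
    for i in range(theta.size):
        thetaPlus, thetaMinus = theta.copy(), theta.copy()
        thetaPlus[i] += eps
        thetaMinus[i] -= eps
        gradApprox[i] = (costFunction(thetaPlus) - costFunction(thetaMinus)) / (2 * eps)
    return gradApprox

# gradApprox and DVec should agree to several decimal places, e.g.:
# assert np.allclose(numerical_gradient(costFunction, thetaVec), DVec, atol=1e-6)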
Neural Networks Learning
Random initialization
Zero initialization: Θ_ij^(l) = 0 for all i, j, l

With all weights equal, a_1^(2) = a_2^(2) and δ_1^(2) = δ_2^(2), so
∂J(Θ)/∂Θ_10^(1) = ∂J(Θ)/∂Θ_20^(1), and also Θ_10^(1) = Θ_20^(1) after every update.
After each update, parameters corresponding to inputs going into each of
two hidden units are identical.
All hidden units end up computing the same thing: highly redundant features.
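Random initialization breaks this symmetry: each Θ_ij^(l) is initialized to a random value in [−ε_init, ε_init] instead of zero. A small sketch (ε_init = 0.12 is just an illustrative value):

import numpy as np

def random_initialize(num_in, num_out, epsilon_init=0.12):
    # weights for a layer with num_in inputs (plus one bias) and num_out units,
    # drawn uniformly from [-epsilon_init, epsilon_init] to break symmetry
    return np.random.rand(num_out, num_in + 1) * 2 * epsilon_init - epsilon_init

Theta1 = random_initialize(3, 5)   # e.g. s_1 = 3 inputs, s_2 = 5 hidden units
Theta2 = random_initialize(5, 4)   # s_2 = 5 hidden units, 4 units in the next layer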
Neural Networks Learning
Putting it together
Training a neural network
Pick a network architecture (connectivity pattern between neurons)
Training a neural network
1. Randomly initialize the weights
2. Implement forward propagation to get h_Θ(x^(i)) for any x^(i)
3. Implement code to compute the cost function J(Θ)
4. Implement backprop to compute the partial derivatives ∂J(Θ)/∂Θ_jk^(l)
   for i = 1:m
      Perform forward propagation and backpropagation using example (x^(i), y^(i))
      (Get activations a^(l) and delta terms δ^(l) for l = 2, …, L)
Training a neural network
5. Use gradient checking to compare the ∂J(Θ)/∂Θ_jk^(l) computed using
   backpropagation vs. the numerical estimate of the gradient of J(Θ).
   Then disable the gradient checking code.
6. Use gradient descent or an advanced optimization method with
   backpropagation to try to minimize J(Θ) as a function of the
   parameters Θ
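Putting the steps together, a schematic training loop under the same assumptions as the earlier sketches (forward_propagate, backprop_one_example, and the list-of-matrices layout are the illustrative helpers from above, not a fixed API):

import numpy as np

def train(X, Y, Thetas, alpha=0.1, lam=1.0, num_iters=400):
    # X: (m, n) inputs, Y: (m, K) one-hot labels, Thetas: [Theta1, Theta2, Theta3]
    m = X.shape[0]
    for _ in range(num_iters):
        # steps 2-4: forward propagation + backpropagation over all m examples
        Deltas = [np.zeros_like(T) for T in Thetas]
        for i in range(m):
            acts = forward_propagate(X[i], *Thetas)
            Deltas = backprop_one_example(*acts, Y[i], Thetas[1], Thetas[2], *Deltas)
        # D^(l) = (1/m) Delta^(l), plus (lambda/m) Theta^(l) on the non-bias columns
        Ds = []
        for T, Dl in zip(Thetas, Deltas):
            D = Dl / m
            D[:, 1:] += (lam / m) * T[:, 1:]
            Ds.append(D)
        # step 6: one gradient descent step
        Thetas = [T - alpha * D for T, D in zip(Thetas, Ds)]
    return Thetas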