
Introduction to

Machine Learning
Dr. Muhammad Amjad Iqbal
Associate Professor
University of Central Punjab, Lahore.
[email protected]

https://sites.google.com/a/ucp.edu.pk/mai/iml/
Slides adapted from Prof. Dr. Andrew Ng (Stanford) & Dr. Humayoun
Neural Networks: Learning
Cost function
• NNs are one of the most powerful learning algorithms we have
• We will study a learning algorithm for fitting the parameters of a
  neural network given a training set
• First things first: the neural network cost function
• We focus on the application of NNs to classification problems
Neural Network (Classification)
L = total no. of layers in network
s_l = no. of units (not counting bias unit) in layer l
Example (Layer 1, Layer 2, Layer 3, Layer 4): s_1 = 3, s_2 = 5, s_4 = s_L = 4

Binary classification:
y ∈ {0, 1}
1 output unit: s_L = 1, K = 1

Multi-class classification (K classes):
y ∈ R^K, e.g. [1 0 0 0]ᵀ, [0 1 0 0]ᵀ, [0 0 1 0]ᵀ, [0 0 0 1]ᵀ for
pedestrian, car, motorcycle, truck
K output units: s_L = K
Cost function: Generalization of Logistic regression
Logistic regression:
J(θ) = −(1/m) Σ_{i=1}^{m} [ y^(i) log h_θ(x^(i)) + (1 − y^(i)) log(1 − h_θ(x^(i))) ] + (λ/2m) Σ_{j=1}^{n} θ_j²

Neural network:
h_Θ(x) ∈ R^K,  (h_Θ(x))_k = k-th output
J(Θ) = −(1/m) Σ_{i=1}^{m} Σ_{k=1}^{K} [ y_k^(i) log (h_Θ(x^(i)))_k + (1 − y_k^(i)) log(1 − (h_Θ(x^(i)))_k) ] + (λ/2m) Σ_{l=1}^{L−1} Σ_{i=1}^{s_l} Σ_{j=1}^{s_{l+1}} (Θ_ji^(l))²
Example: multi-class classification of handwritten digits ("one", "two",
"three", "four", …), with K = 10 classes.
If x^(i) is an image of the digit 5, then the corresponding y^(i) (the one you
should use with the cost function) should be a 10-dimensional vector with
y_5 = 1 and the other elements equal to 0.
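As an illustration (not part of the original slides), a minimal NumPy sketch of
building such a label vector; the function name one_hot and the 1-indexed
label convention are assumptions for this example.

import numpy as np

def one_hot(label, num_classes=10):
    """Return a one-hot vector y for a 1-indexed class label."""
    y = np.zeros(num_classes)
    y[label - 1] = 1.0  # digit 5 -> index 4 -> y_5 = 1 in 1-based notation
    return y

print(one_hot(5))  # [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]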
Cost function: Generalization of Logistic regression
Note: as in regularized logistic regression, we don't include the bias units'
parameters (the i = 0 terms, i.e. Θ_j0^(l)) in the regularization sum.


Regularization term (for the 4-layer network: Layer 1, Layer 2, Layer 3, Layer 4):
(λ/2m) Σ_{l=1}^{L−1} Σ_{i=1}^{s_l} Σ_{j=1}^{s_{l+1}} (Θ_ji^(l))²
where s_l = no. of units (not counting bias unit) in layer l.
This is also called a weight decay term.
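A minimal NumPy sketch of this cost function, assuming a network with one
hidden layer and a one-hot label matrix Y; the names nn_cost and sigmoid are
mine, not from the slides.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nn_cost(Theta1, Theta2, X, Y, lam):
    """Regularized NN cost for one hidden layer.
    X: (m, n) inputs, Y: (m, K) one-hot labels,
    Theta1: (s2, n+1), Theta2: (K, s2+1)."""
    m = X.shape[0]
    A1 = np.hstack([np.ones((m, 1)), X])          # add bias column
    A2 = sigmoid(A1 @ Theta1.T)
    A2 = np.hstack([np.ones((m, 1)), A2])
    H = sigmoid(A2 @ Theta2.T)                    # (m, K) hypotheses
    # Cross-entropy summed over all examples and all K outputs
    J = -np.sum(Y * np.log(H) + (1 - Y) * np.log(1 - H)) / m
    # Weight decay term: skip the bias columns (first column of each Theta)
    reg = (lam / (2 * m)) * (np.sum(Theta1[:, 1:] ** 2) + np.sum(Theta2[:, 1:] ** 2))
    return J + reg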
Neural Networks: Learning
Backpropagation algorithm
For minimization of the cost function J(Θ)
Gradient computation
Need code to compute:
- J(Θ)
- ∂/∂Θ_ij^(l) J(Θ)
Gradient computation
Given one training example (x, y):
Forward propagation (Layer 1, Layer 2, Layer 3, Layer 4):
a^(1) = x
z^(2) = Θ^(1) a^(1),   a^(2) = g(z^(2))   (add a_0^(2))
z^(3) = Θ^(2) a^(2),   a^(3) = g(z^(3))   (add a_0^(3))
z^(4) = Θ^(3) a^(3),   a^(4) = h_Θ(x) = g(z^(4))
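A sketch of this forward pass in NumPy, assuming the sigmoid activation from
the slides; forward_propagate and the list-of-matrices argument are my naming
choices for illustration.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagate(x, Thetas):
    """Forward pass for a single example x (1-D array).
    Thetas = [Theta1, Theta2, Theta3] for the 4-layer network above.
    Returns the activations a^(1), ..., a^(L), with a bias unit prepended to
    every layer except the output."""
    a = np.concatenate([[1.0], x])         # a^(1) with bias
    activations = [a]
    for l, Theta in enumerate(Thetas):
        z = Theta @ a                      # z^(l+1) = Θ^(l) a^(l)
        a = sigmoid(z)                     # a^(l+1) = g(z^(l+1))
        if l < len(Thetas) - 1:
            a = np.concatenate([[1.0], a]) # add bias unit a_0^(l+1)
        activations.append(a)
    return activations                     # last element is h_Θ(x)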
Gradient computation: Backpropagation algorithm
• We have already studied forward propagation
• It takes the input to the neural network and pushes it forward through
  the network
• This produces the output hypothesis h_Θ(x), which may be a single real
  number (1 output unit) but can also be a K-dimensional vector (K output units)

Now: the backpropagation algorithm
Gradient computation: Backpropagation algorithm
Given one training example (x, y):
Intuition: δ_j^(l) = "error" of node j in layer l   (Layer 1, Layer 2, Layer 3, Layer 4)
• Backpropagation takes the output your network produced, a_j^(L)
• Compares it to the real value y_j and calculates how wrong the network was
  (i.e. how wrong the parameters were):
  δ_j^(L) = a_j^(L) − y_j   (for the output layer)
Gradient computation: Backpropagation algorithm
Intuition: δ_j^(l) = "error" of node j in layer l.
For each output unit (layer L = 4):
δ^(4) = a^(4) − y
δ^(3) = (Θ^(3))ᵀ δ^(4) .* g′(z^(3))
δ^(2) = (Θ^(2))ᵀ δ^(3) .* g′(z^(2))
(all vectors; there is no δ^(1) term)
where g′(z^(l)) = a^(l) .* (1 − a^(l)) is the derivative of the activation
function g (sigmoid).
Gradient computation: Backpropagation algorithm (updates)
δ^(2) = (Θ^(2))ᵀ δ^(3) .* (a^(2) .* (1 − a^(2)))
Ignoring regularization (λ = 0):
∂/∂Θ_ij^(l) J(Θ) = a_j^(l) δ_i^(l+1)
Mathematically, it is possible to prove that all these partial derivatives are
exactly given by the above formula: activations * deltas.
Backpropagation algorithm
Training set {(x^(1), y^(1)), …, (x^(m), y^(m))}
Set Δ_ij^(l) = 0 (for all l, i, j).   (Δ acts as an accumulator)
For i = 1 to m
  Set a^(1) = x^(i)
  Perform forward propagation to compute a^(l) for l = 2, 3, …, L
  Using y^(i), compute δ^(L) = a^(L) − y^(i)
  Compute δ^(L−1), δ^(L−2), …, δ^(2)
  Δ_ij^(l) := Δ_ij^(l) + a_j^(l) δ_i^(l+1)
D_ij^(l) := (1/m) Δ_ij^(l) + λ Θ_ij^(l)   if j ≠ 0
D_ij^(l) := (1/m) Δ_ij^(l)                if j = 0
Then ∂/∂Θ_ij^(l) J(Θ) = D_ij^(l)
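A sketch of this accumulator loop in NumPy for a 3-layer network (one hidden
layer); backprop_gradients is my name, and the example-by-example loop mirrors
the slide rather than a fully vectorized implementation.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_gradients(Theta1, Theta2, X, Y, lam):
    """Gradients D1, D2 of the regularized cost for a 3-layer network.
    X: (m, n) inputs, Y: (m, K) one-hot labels."""
    m = X.shape[0]
    Delta1 = np.zeros_like(Theta1)
    Delta2 = np.zeros_like(Theta2)
    for i in range(m):
        # Forward propagation for example i
        a1 = np.concatenate([[1.0], X[i]])         # a^(1) with bias
        z2 = Theta1 @ a1
        a2 = np.concatenate([[1.0], sigmoid(z2)])  # a^(2) with bias
        a3 = sigmoid(Theta2 @ a2)                  # a^(3) = h_Θ(x^(i))
        # Backward pass
        delta3 = a3 - Y[i]                                               # δ^(3)
        delta2 = (Theta2.T @ delta3)[1:] * sigmoid(z2) * (1 - sigmoid(z2))  # δ^(2), bias dropped
        # Accumulate: Δ^(l) += δ^(l+1) (a^(l))ᵀ
        Delta2 += np.outer(delta3, a2)
        Delta1 += np.outer(delta2, a1)
    D1, D2 = Delta1 / m, Delta2 / m
    D1[:, 1:] += (lam / m) * Theta1[:, 1:]  # regularize, skipping bias column
    D2[:, 1:] += (lam / m) * Theta2[:, 1:]
    return D1, D2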
• We have now calculated the partial derivative for every parameter
• We can use this "gradient" in gradient descent or in one of the advanced
  optimization algorithms

• Backpropagation involves a lot of details and is probably less
  mathematically clean than the algorithms we have seen so far
• If it is not yet clear, you will understand it by doing programming
  exercise 4.
Neural Networks: Learning
Backpropagation intuition
Forward Propagation
For a single example (x^(i), y^(i)):
z_1^(2) → a_1^(2),   z_1^(3) → a_1^(3),   z_1^(4) → a_1^(4)
z_2^(2) → a_2^(2),   z_2^(3) → a_2^(3)
For example, the weighted input to the first unit of layer 3 is
z_1^(3) = Θ_10^(2) × 1 + Θ_11^(2) × a_1^(2) + Θ_12^(2) × a_2^(2)
What is backpropagation doing? (When K = 1)
Focusing on a single example (x^(i), y^(i)), the case of one output unit,
and ignoring regularization (λ = 0):
cost(i) = y^(i) log h_Θ(x^(i)) + (1 − y^(i)) log(1 − h_Θ(x^(i)))
(Think of cost(i) ≈ (h_Θ(x^(i)) − y^(i))², i.e. how well is the network doing
on example i?)
Forward Propagation
δ_j^(l) = "error" of cost for a_j^(l) (unit j in layer l).
Formally, δ_j^(l) = ∂/∂z_j^(l) cost(i)   (for j ≥ 0), where
cost(i) = y^(i) log h_Θ(x^(i)) + (1 − y^(i)) log(1 − h_Θ(x^(i)))
Neural Networks: Learning
Implementation note: Unrolling parameters
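The detail slides for this section did not survive extraction. A minimal
sketch of the idea, assuming NumPy: advanced optimizers expect a single
parameter vector, so the weight matrices are flattened ("unrolled") before
optimization and reshaped back inside the cost function. The function names
and example shapes are illustrative assumptions.

import numpy as np

def unroll(Thetas):
    """Flatten a list of weight matrices into one long parameter vector."""
    return np.concatenate([Theta.ravel() for Theta in Thetas])

def reshape_params(theta_vec, shapes):
    """Recover the weight matrices from the unrolled vector.
    shapes, e.g. [(25, 401), (10, 26)], is an assumed architecture."""
    Thetas, start = [], 0
    for rows, cols in shapes:
        size = rows * cols
        Thetas.append(theta_vec[start:start + size].reshape(rows, cols))
        start += size
    return Thetas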
Neural Networks: Learning
Gradient checking
Motivation
• Backpropagation has a lot of details
  – Small bugs may get in and ruin it
• It may look like J(Θ) is decreasing, but in reality it may not be
  decreasing by as much as it should
• Gradient checking helps to make sure that an implementation is working
  correctly
Numerical estimation of gradients
Two-sided difference:  d/dθ J(θ) ≈ (J(θ + ε) − J(θ − ε)) / (2ε)
One-sided difference:  d/dθ J(θ) ≈ (J(θ + ε) − J(θ)) / ε   (less accurate)
Implement:
gradApprox = (J(theta + EPSILON) - J(theta - EPSILON)) / (2*EPSILON)
Parameter vector θ
(E.g. θ ∈ R^n is the "unrolled" version of Θ^(1), Θ^(2), Θ^(3))
∂/∂θ_1 J(θ) ≈ (J(θ_1 + ε, θ_2, …, θ_n) − J(θ_1 − ε, θ_2, …, θ_n)) / (2ε)
⋮
∂/∂θ_n J(θ) ≈ (J(θ_1, θ_2, …, θ_n + ε) − J(θ_1, θ_2, …, θ_n − ε)) / (2ε)
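A sketch of this component-wise check in NumPy; numerical_gradient and
cost_func are names I chose, and the tolerance in the usage comment is an
assumption rather than a prescribed value.

import numpy as np

def numerical_gradient(cost_func, theta, epsilon=1e-4):
    """Approximate the gradient of cost_func at theta with a two-sided
    difference, one component at a time."""
    grad_approx = np.zeros_like(theta)
    for i in range(theta.size):
        theta_plus = theta.copy();  theta_plus[i] += epsilon
        theta_minus = theta.copy(); theta_minus[i] -= epsilon
        grad_approx[i] = (cost_func(theta_plus) - cost_func(theta_minus)) / (2 * epsilon)
    return grad_approx

# Usage idea: compare against the unrolled backprop gradient DVec
# assert np.allclose(numerical_gradient(J, thetaVec), DVec, atol=1e-7)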
Implementation Note:
- Implement backprop to compute DVec (the unrolled D^(1), D^(2), D^(3)).
- Implement a numerical gradient check to compute gradApprox.
- Make sure they give similar values.
- Turn off gradient checking. Use the backprop code for learning.

Important:
- Be sure to disable your gradient checking code before training your
  classifier. If you run the numerical gradient computation on every
  iteration of gradient descent (or in the inner loop of costFunction(…)),
  your code will be very slow.
Neural Networks: Learning
Random initialization
Zero initialization → all hidden units compute the same thing (highly
redundant features)
If we set every Θ_ij^(l) = 0, then in this network:
a_1^(2) = a_2^(2)   and   δ_1^(2) = δ_2^(2)
so the partial derivatives match as well:
∂/∂Θ_10^(1) J(Θ) = ∂/∂Θ_20^(1) J(Θ),   and also Θ_10^(1) = Θ_20^(1)
After each update, the parameters corresponding to inputs going into each of
the two hidden units are identical.
To break this symmetry, initialize each Θ_ij^(l) to a small random value in
[−ε_init, ε_init].
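A small sketch of this symmetry-breaking initialization in NumPy; the function
name and the value ε_init = 0.12 are assumptions (a common heuristic), not
values fixed by the slides.

import numpy as np

def rand_initialize_weights(l_in, l_out, epsilon_init=0.12):
    """Random weights for a layer with l_in inputs and l_out outputs,
    drawn uniformly from [-epsilon_init, epsilon_init] to break symmetry.
    The +1 column accounts for the bias unit."""
    return np.random.rand(l_out, l_in + 1) * 2 * epsilon_init - epsilon_init

Theta1 = rand_initialize_weights(400, 25)  # e.g. 400 inputs -> 25 hidden units
Theta2 = rand_initialize_weights(25, 10)   # 25 hidden units -> 10 outputs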
Neural Networks: Learning
Putting it together
Training a neural network
Pick a network architecture (connectivity pattern between neurons)
• No. of input units: dimension of the features x^(i)
• No. of output units: number of classes
• Reasonable default: 1 hidden layer, or if >1 hidden layer, use the same
  number of hidden units in every layer (usually the more the better, but
  more units are computationally expensive)
• The number of units in a hidden layer should be comparable to the number
  of input features, e.g. 1.5× or 2× the number of input features
Training a neural network
1. Randomly initialize the weights
2. Implement forward propagation to get h_Θ(x^(i)) for any x^(i)
3. Implement code to compute the cost function J(Θ)
4. Implement backprop to compute the partial derivatives ∂/∂Θ_jk^(l) J(Θ)
   for i = 1:m
     Perform forward propagation and backpropagation using example (x^(i), y^(i))
     (Get activations a^(l) and delta terms δ^(l) for l = 2, …, L).
Training a neural network
5. Use gradient checking to compare ∂/∂Θ_jk^(l) J(Θ) computed using
   backpropagation vs. the numerical estimate of the gradient of J(Θ).
   Then disable the gradient checking code.
6. Use gradient descent or an advanced optimization method with
   backpropagation to try to minimize J(Θ) as a function of the parameters Θ.

• In theory, gradient descent can get stuck in local optima, since J(Θ) is non-convex
• In practice, this is not usually a huge problem
• Even if we don't find the global optimum, gradient descent will do a decent job
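A minimal sketch that ties the earlier sketches together with plain gradient
descent (one of the simpler options the slides mention); it reuses the
illustrative functions rand_initialize_weights, backprop_gradients and
nn_cost defined above, and the learning rate, iteration count and λ are
assumptions, not values from the slides.

# X, Y: training inputs (m, 400) and one-hot labels (m, 10), assumed loaded.
alpha, num_iters, lam = 0.5, 400, 1.0
Theta1 = rand_initialize_weights(400, 25)   # step 1: random initialization
Theta2 = rand_initialize_weights(25, 10)
for it in range(num_iters):
    D1, D2 = backprop_gradients(Theta1, Theta2, X, Y, lam)  # steps 2-4
    Theta1 -= alpha * D1                                    # gradient descent update
    Theta2 -= alpha * D2
    if it % 50 == 0:
        print(it, nn_cost(Theta1, Theta2, X, Y, lam))       # J(Θ) should decrease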
END
