
Introduction to Training a Network

Perceptron

ŷ = 1 / (1 + e^-(w0 + w1*x1 + w2*x2 + ...))

ŷ = f(x_i; W)
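As a quick illustration, here is a minimal Python sketch of this forward pass (the function names and the zero weights are my own choices for the example, not taken from the slides):

import math

def sigmoid(z):
    # Logistic activation: squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def perceptron_predict(x, w):
    # w[0] is the bias w0; w[1], w[2], ... pair up with the inputs x1, x2, ...
    z = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return sigmoid(z)

print(perceptron_predict([2.78, 2.55], [0.0, 0.0, 0.0]))    # prints 0.5 when all weights are zero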
Line Learning
Filter Learning
Filter Visualization
Probability

ŷ = f(x_i; W)   (Predicted Value: the model outputs a probability)

Class Labels: y = 1 or y = 0   (Actual Value)
Training a Perceptron

ŷ = f(x_i; W)   (Predicted Value)
Training Data

...
Random Initialization
Feed Forward

ŷ = f(x_i; W) = 0.8   (Predicted)
Loss Function

ŷ = 0.8 (Predicted),  y = 0 (Actual).  Error = ?

SSE = (1/2n) * Σ_{i=1}^{n} (ŷ_i - y_i)^2

J(W) = (1/2n) * Σ_{i=1}^{n} (f(x_i; W) - y_i)^2

J(W) = (1/2n) * Σ_{i=1}^{n} (1 / (1 + e^-(w0 + w1*x1 + w2*x2 + ...)) - y_i)^2
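A minimal Python sketch of this loss, assuming the perceptron_predict helper sketched earlier and a simple list-of-pairs data layout (both are assumptions for illustration):

def sse_loss(data, w):
    # data: list of (features, label) pairs; returns J(W) = (1/2n) * sum((y_hat - y)^2)
    n = len(data)
    total = 0.0
    for x, y in data:
        y_hat = perceptron_predict(x, w)    # predicted value f(x_i; W)
        total += (y_hat - y) ** 2           # squared error for this instance
    return total / (2.0 * n)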
Loss Optimization Function

ŷ = 0.01 (Predicted),  y = 0 (Actual)

SSE = (1/2n) * Σ_{i=1}^{n} (ŷ_i - y_i)^2

J(W) = (1/2n) * Σ_{i=1}^{n} (f(x_i; W) - y_i)^2

J(W) = (1/2n) * Σ_{i=1}^{n} (1 / (1 + e^-(w0 + w1*x1 + w2*x2 + ...)) - y_i)^2
Idea of Gradient Descent

[Figure: loss J(w) plotted against the weight w]
Stochastic Gradient Descent

• We can apply stochastic gradient descent to the problem of finding the coefficients for our
  model as follows:

  – For each training instance:
    – Calculate a prediction using the current values of the coefficients.
    – Calculate new coefficient values based on the error in the prediction.

• We continue to update the model, correcting errors on one training instance at a time, until
  the model is accurate enough or cannot be made any more accurate. It is often a good idea to
  randomize the order of the training instances shown to the model to mix up the corrections
  made.
Stochastic Gradient Descent
ŷ = 1 / (1 + e^-(w0 + w1*x1 + w2*x2))

J(w) = J(w0, w1, w2) = (1/2n) * Σ_{i=1}^{n} (ŷ_i - y_i)^2

w0 = w0 - lambda * dJ/d(w0)
w1 = w1 - lambda * dJ/d(w1)
w2 = w2 - lambda * dJ/d(w2)

w0 = w0 - lambda * (ŷ_i - y_i) * ŷ_i * (1 - ŷ_i)
w1 = w1 - lambda * (ŷ_i - y_i) * ŷ_i * (1 - ŷ_i) * x1
w2 = w2 - lambda * (ŷ_i - y_i) * ŷ_i * (1 - ŷ_i) * x2

https://towardsdatascience.com/derivative-of-the-sigmoid-function-536880cf918e
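A hedged Python sketch of one such update on a single training instance, following the three update rules above (the perceptron_predict helper and the data layout are assumptions carried over from the earlier sketches):

def sgd_update(x, y, w, lam=0.3):
    # One stochastic gradient descent step for the sigmoid perceptron.
    y_hat = perceptron_predict(x, w)            # prediction with the current weights
    grad = (y_hat - y) * y_hat * (1 - y_hat)    # common factor (y_hat - y) * y_hat * (1 - y_hat)
    w[0] = w[0] - lam * grad                    # w0 = w0 - lambda * grad
    for i, xi in enumerate(x, start=1):
        w[i] = w[i] - lam * grad * xi           # wi = wi - lambda * grad * xi
    return w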
Iteration 1

• Let's start with values of 0.0 for the coefficients and 0.3 for the learning rate.

w0 = 0.0, w1 = 0.0, w2 = 0.0, lambda = 0.3

• We can now calculate the predicted value ŷ using our starting-point coefficients for the
  first training instance:

ŷ = 1 / (1 + e^-(w0 + w1*x1 + w2*x2))
  = 1 / (1 + e^-(0 + 0*2.78 + 0*2.55))
  = 0.5

• We can now use this prediction in our gradient descent equations to update the weights.

w0 = w0 - lambda * (ŷ_i - y_i) * ŷ_i * (1 - ŷ_i)
   = 0.0 - 0.3 * (0.5 - 0) * 0.5 * (1 - 0.5)
   = -0.0375

w1 = w1 - lambda * (ŷ_i - y_i) * ŷ_i * (1 - ŷ_i) * x1
   = 0.0 - 0.3 * (0.5 - 0) * 0.5 * (1 - 0.5) * 2.78
   = -0.104290635

w2 = w2 - lambda * (ŷ_i - y_i) * ŷ_i * (1 - ŷ_i) * x2
   = 0.0 - 0.3 * (0.5 - 0) * 0.5 * (1 - 0.5) * 2.55
   = -0.09564513761
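The iteration-1 numbers can be reproduced with the sgd_update sketch from the previous slide (the x values here are the rounded 2.78 and 2.55 shown above, so the last digits differ slightly from the full-precision results):

w = [0.0, 0.0, 0.0]
w = sgd_update([2.78, 2.55], 0, w, lam=0.3)
print(w)    # approximately [-0.0375, -0.10425, -0.095625]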
Iteration 2

• We now repeat the same procedure, starting from the coefficients updated in iteration 1 and
  keeping the learning rate at 0.3.

w0 = -0.0375, w1 = -0.1042, w2 = -0.0956, lambda = 0.3

• We calculate a new predicted value ŷ from these coefficients for the next training instance:

ŷ = 1 / (1 + e^-(w0 + w1*x1 + w2*x2))

• We then use this prediction, and its error against the actual label, in the same gradient
  descent equations to update w0, w1 and w2 again.
Iteration 100

• You can repeat this process 100 times: that is 10 complete epochs of the training data being
  exposed to the model and updating the coefficients. The accompanying graph shows a plot of the
  model's accuracy over the 10 epochs.

• Here are the final values of the coefficients after the 100 iterations:

w0 = -0.4066054641

w1 = 0.8525733164

w2 = -1.104746259
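A hedged sketch of the whole loop (10 epochs over the training instances), reusing the sgd_update sketch; the dataset variable is a placeholder, since the slides do not list the training values in text form:

def train(dataset, lam=0.3, epochs=10):
    # dataset: list of (features, label) pairs; returns the trained weight list
    w = [0.0] * (len(dataset[0][0]) + 1)    # one bias plus one weight per input
    for _ in range(epochs):
        for x, y in dataset:                # one SGD update per training instance
            w = sgd_update(x, y, w, lam)
    return w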
Trained

w0 = -0.4066054641

w1 = 0.8525733164

w2 = -1.104746259

ŷ = 1 / (1 + e^-(w0 + w1*x1 + w2*x2))

ŷ = 1 / (1 + e^-(-0.4066054641 + 0.8525733164*x1 + -1.104746259*x2))
Testing a Perceptron

• Let's plug the final coefficients into our model and make a prediction for each point in our
  training dataset.

ŷ = 1 / (1 + e^-(w0 + w1*x1 + w2*x2))

ŷ = 1 / (1 + e^-(-0.4066054641 + 0.8525733164*x1 + -1.104746259*x2))

Crisp class prediction: IF (Predicted < 0.5) THEN 0 ELSE 1

Accuracy = Correct Predictions / Total Predictions

= (10 /10) * 100

= 100%
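A minimal Python sketch of this test step, reusing the earlier perceptron_predict sketch; dataset is again a placeholder for the training points, whose values are not listed in the text:

w = [-0.4066054641, 0.8525733164, -1.104746259]    # trained coefficients from the slide

def classify(x, w):
    # Threshold the sigmoid output at 0.5 to get a crisp 0/1 class label
    return 0 if perceptron_predict(x, w) < 0.5 else 1

def accuracy(dataset, w):
    # Percentage of instances whose crisp prediction matches the actual label
    correct = sum(1 for x, y in dataset if classify(x, w) == y)
    return correct / len(dataset) * 100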
Trained

w0 = -0.4066054641

w1 = 0.8525733164

w2 = -1.104746259, . . .

ŷ = 1 / (1 + e^-(w0 + w1*x1 + w2*x2 + . . . ))

ŷ = 1 / (1 + e^-(-0.4066054641 + 0.8525733164*x1 + -1.104746259*x2 + . . . ))
Testing a Perceptron

ŷ = f(x_i; W)   (Predicted Value)
Stochastic Gradient Descent
ŷ = 1 / (1 + e^-(w0 + w1*x1 + w2*x2))

J(w) = J(w0, w1, w2) = (1/2n) * Σ_{i=1}^{n} (ŷ_i - y_i)^2

w0 = w0 - lambda * dJ/d(w0)
w1 = w1 - lambda * dJ/d(w1)
w2 = w2 - lambda * dJ/d(w2)

w0 = w0 - lambda * (ŷ_i - y_i) * ŷ_i * (1 - ŷ_i)
w1 = w1 - lambda * (ŷ_i - y_i) * ŷ_i * (1 - ŷ_i) * x1
w2 = w2 - lambda * (ŷ_i - y_i) * ŷ_i * (1 - ŷ_i) * x2

https://towardsdatascience.com/derivative-of-the-sigmoid-function-536880cf918e
Composite Function

x → f1 → y1 → f2 → y2

f2(f1(x)) = (f2 ∘ f1)(x)

∂f2/∂x = ∂f2/∂f1 * ∂f1/∂x     (Apply Chain Rule)
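A small numerical check of the chain rule, using two simple functions chosen arbitrarily for this example:

def f1(x):
    return x * x            # y1 = x^2, so dy1/dx = 2x

def f2(y1):
    return 3.0 * y1 + 1.0   # y2 = 3*y1 + 1, so dy2/dy1 = 3

x = 2.0
analytic = 3.0 * (2.0 * x)                      # chain rule: dy2/dx = dy2/dy1 * dy1/dx = 3 * 2x
h = 1e-6
numeric = (f2(f1(x + h)) - f2(f1(x))) / h       # finite-difference estimate of dy2/dx
print(analytic, numeric)                        # both are close to 12.0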
Idea of Back Propagation

x → f1(w1) → y1 → J(w)

∂J(w)/∂w1 = ∂J(w)/∂y1 * ∂y1/∂w1
Idea of Back Propagation

x → f1(w1) → y1 → f2(w2) → y2 → J(w)

∂J(w)/∂w1 = ∂J(w)/∂y1 * ∂y1/∂w1          ∂J(w)/∂w2 = ∂J(w)/∂y2 * ∂y2/∂w2
Idea of Back Propagation

x → f1(w1) → y1 → f2(w2) → y2 → J(w)

∂J(w)/∂w1 = ∂J(w)/∂y2 * ∂y2/∂y1 * ∂y1/∂w1
            (the first two factors together are ∂J(w)/∂y1)
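A hedged Python sketch of this idea on a concrete two-stage chain; the choice of sigmoid stages and a squared-error loss is mine for illustration, the slide itself only fixes the chain-rule structure:

import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def backprop_two_stage(x, y, w1, w2):
    # Forward pass: x -> f1 -> y1 -> f2 -> y2 -> J(w)
    y1 = sigmoid(w1 * x)
    y2 = sigmoid(w2 * y1)
    J = 0.5 * (y2 - y) ** 2

    # Backward pass, reusing intermediate derivatives exactly as in the chain rule above
    dJ_dy2 = y2 - y
    dJ_dw2 = dJ_dy2 * y2 * (1 - y2) * y1    # dJ/dw2 = dJ/dy2 * dy2/dw2
    dJ_dy1 = dJ_dy2 * y2 * (1 - y2) * w2    # the grouped term dJ/dy1
    dJ_dw1 = dJ_dy1 * y1 * (1 - y1) * x     # dJ/dw1 = dJ/dy1 * dy1/dw1
    return J, dJ_dw1, dJ_dw2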
CNN
Training
Learning Rate
Training
Training
Summary
