CHAPTER 3.4.1 - Backpropagation - Updated

Chapter 4 discusses back propagation and optimization in artificial intelligence using Python. It covers concepts such as partial derivatives, gradients, the gradient descent algorithm, and the chain rule, emphasizing their roles in training neural networks. The chapter also provides examples of calculating gradients for various functions and the softmax activation function.


ARTIFICIAL INTELLIGENCE

in Python LANGUAGE

Chapter 4: Back Propagation & Optimization

Mechatronics – Robot & AI Department. For Internal Circulation only.

4.1. Back Propagation
• Partial derivative:
• The partial derivative measures how much impact a single input has on a function’s output.
• Euler’s notation:

Example:

Examples:
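As a small illustration of partial derivatives, the sketch below assumes a hypothetical function f(x, y) = 3x²y + 2y (not taken from the slides). It compares the hand-derived partial derivatives with a central finite-difference estimate. The helper name `numerical_partial` is also an assumption made for this example.

```python
def f(x, y):
    """Hypothetical example function: f(x, y) = 3*x**2*y + 2*y."""
    return 3 * x**2 * y + 2 * y


def df_dx(x, y):
    """Analytic partial derivative w.r.t. x (y is treated as a constant)."""
    return 6 * x * y


def df_dy(x, y):
    """Analytic partial derivative w.r.t. y (x is treated as a constant)."""
    return 3 * x**2 + 2


def numerical_partial(func, args, index, h=1e-6):
    """Central finite-difference estimate of d func / d args[index]."""
    up = list(args)
    down = list(args)
    up[index] += h
    down[index] -= h
    return (func(*up) - func(*down)) / (2 * h)


point = (2.0, -1.0)
print("df/dx:", df_dx(*point), "numerical:", numerical_partial(f, point, 0))
print("df/dy:", df_dy(*point), "numerical:", numerical_partial(f, point, 1))
```

Each partial derivative varies one input while holding the other fixed, which is exactly what the finite-difference helper does.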

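Partial derivative of the max function:

$$f(x, y) = \max(x, y) \;\Rightarrow\; \frac{\partial f(x, y)}{\partial x} = \frac{\partial \max(x, y)}{\partial x} = \begin{cases} 1 & \text{if } x \ge y \\ 0 & \text{if } x < y \end{cases}$$

$$f(x, 0) = \max(x, 0) \;\Rightarrow\; \frac{\partial f(x, 0)}{\partial x} = \begin{cases} 1 & \text{if } x \ge 0 \\ 0 & \text{if } x < 0 \end{cases}$$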

• Gradient:
The gradient is the vector composed of all the partial derivatives of a function with respect to its input variables. It is denoted by the nabla symbol 𝛻.

Example:
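A minimal sketch of a gradient, assuming a hypothetical function f(x, y) = x² + 3xy chosen only for illustration. The analytic gradient [2x + 3y, 3x] is compared against a finite-difference estimate; `numerical_gradient` is a helper name introduced here.

```python
import numpy as np


def f(v):
    """Hypothetical example function f(x, y) = x**2 + 3*x*y."""
    x, y = v
    return x**2 + 3 * x * y


def grad_f(v):
    """Analytic gradient: the vector of both partial derivatives."""
    x, y = v
    return np.array([2 * x + 3 * y, 3 * x])


def numerical_gradient(func, v, h=1e-6):
    """Estimate each partial derivative with a central difference."""
    v = np.asarray(v, dtype=float)
    grad = np.zeros_like(v)
    for i in range(v.size):
        step = np.zeros_like(v)
        step[i] = h
        grad[i] = (func(v + step) - func(v - step)) / (2 * h)
    return grad


point = np.array([1.0, 2.0])
analytic = grad_f(point)
numeric = numerical_gradient(f, point)
print("analytic gradient:", analytic)
print("numerical gradient:", numeric)
```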

• Gradient Descent Algorithm:
Gradient descent is an optimization algorithm that helps find a local minimum of a function (for a neural network, the loss function) by making the model's parameters converge to optimal values.
Starting from a point that may be close to the solution, an iterative procedure gradually approaches the desired point (a local minimum), i.e., a point where the derivative converges to 0.
To converge to the local minimum, one has to move in the direction opposite to the gradient vector. The update rule can be written as follows:
for each $x_i$ in $\mathbf{x}$: $x_i(t+1) = x_i(t) - LR \cdot \frac{\partial f}{\partial x_i}\big(\mathbf{x}(t)\big)$
where $LR$ is the learning rate.

In the case of a neural network, f is the loss function.
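The update rule can be sketched in a few lines of Python. The function being minimized, f(x) = (x₀ − 3)² + (x₁ + 1)², is a hypothetical convex example with its minimum at (3, −1). The learning rate of 0.1 and the 200 steps are likewise assumptions chosen so that this toy problem converges.

```python
def f(x):
    """Hypothetical function to minimize; its minimum is at (3, -1)."""
    return (x[0] - 3.0) ** 2 + (x[1] + 1.0) ** 2


def grad_f(x):
    """Analytic gradient of f."""
    return [2.0 * (x[0] - 3.0), 2.0 * (x[1] + 1.0)]


def gradient_descent(grad, x0, learning_rate=0.1, steps=200):
    """Repeatedly apply x_i <- x_i - LR * df/dx_i for every coordinate."""
    x = list(x0)
    for _ in range(steps):
        g = grad(x)
        x = [xi - learning_rate * gi for xi, gi in zip(x, g)]
    return x


x_min = gradient_descent(grad_f, [0.0, 0.0])
print("approximate minimum:", x_min, "f(x_min) =", f(x_min))
```

With a learning rate that is too large the iterates can overshoot and diverge; with one that is too small, convergence becomes slow.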

• The chain rule:
During a forward pass, the data is passed through successive layers of neurons. At each layer, the outputs go through the activation function before being passed to the next layer. The loss function is calculated at the last layer of neurons (the output layer). Its expression can be written as a function of all the parameters of the network. Example with the Categorical Cross-Entropy loss, for a network with two ReLU layers followed by a softmax output:

$$L = -\sum_{k=1}^{n_3} y_k \log \frac{e^{z_k}}{\sum_{l=1}^{n_3} e^{z_l}}$$

where $z_k$, the k-th input of the softmax, is

$$z_k = \sum_{i=1}^{n_2} \max\left(0, \sum_{j=1}^{n_1} \max\left(0, \sum_{m=1}^{n_0} x_m \omega_{1,m,j} + b_{1,j}\right) \omega_{2,j,i} + b_{2,i}\right) \omega_{3,i,k} + b_{3,k}$$

The loss is therefore a chain of functions.

The derivative of a chain (composition) of functions is the product of the derivatives of all the functions in the chain, each evaluated at the output of the function it wraps.

Examples:
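As one illustrative case, the sketch below assumes two hypothetical functions, g(x) = 2x² and f(u) = 3u⁵, and checks that the derivative of f(g(x)) equals f'(g(x)) · g'(x). These functions are chosen for this example only and are not taken from the slides.

```python
def g(x):
    """Inner function (hypothetical): g(x) = 2*x**2."""
    return 2 * x**2


def f(u):
    """Outer function (hypothetical): f(u) = 3*u**5."""
    return 3 * u**5


def dg_dx(x):
    return 4 * x


def df_du(u):
    return 15 * u**4


def dh_dx(x):
    """Chain rule: d f(g(x)) / dx = f'(g(x)) * g'(x)."""
    return df_du(g(x)) * dg_dx(x)


x0 = 1.0
h = 1e-6
# Central finite difference on the composed function f(g(x))
numeric = (f(g(x0 + h)) - f(g(x0 - h))) / (2 * h)
print("chain rule:", dh_dx(x0), "numerical:", numeric)
```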

• Back Propagation:
To back-propagate the gradients, we calculate the partial derivatives with respect to each of the parameters and inputs by applying the chain rule to the neural network. We start with a single neuron:

Calculate Gradient of the ReLU activation function:

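The ReLU function is equivalent to the function max(y, 0). Therefore, its derivative is:

$$\mathrm{ReLU}(y) = \max(y, 0) \;\Rightarrow\; \frac{\partial\, \mathrm{ReLU}}{\partial y} = \begin{cases} 1 & \text{if } y \ge 0 \\ 0 & \text{if } y < 0 \end{cases}$$

(At y = 0 the function is not differentiable; the value assigned there is a convention.)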
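A vectorized sketch of the ReLU and its backward pass with NumPy. The function names `relu` and `relu_backward` are assumptions made for this example. At exactly 0 the sketch returns a derivative of 0, matching the `z > 0` test used in the single-neuron code of this chapter; this is a convention, not the only valid choice.

```python
import numpy as np


def relu(z):
    """Element-wise max(z, 0)."""
    return np.maximum(z, 0.0)


def relu_backward(z, dvalues):
    """Chain rule through ReLU: keep the incoming gradient where z > 0,
    zero it elsewhere (derivative at z == 0 is taken as 0 by convention)."""
    dinputs = np.array(dvalues, dtype=float, copy=True)
    dinputs[np.asarray(z) <= 0] = 0.0
    return dinputs


z = np.array([-2.0, 0.0, 0.5, 6.0])
dvalues = np.ones_like(z)  # gradient received from the next layer
print("forward: ", relu(z))
print("backward:", relu_backward(z, dvalues))
```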

dReLU_dxw0: the partial derivative of the ReLU w.r.t. the first weighted input, w0x0
dReLU_dxw1: the partial derivative of the ReLU w.r.t. the second weighted input, w1x1
dReLU_dxw2: the partial derivative of the ReLU w.r.t. the third weighted input, w2x2
dReLU_db: the partial derivative of the ReLU w.r.t. the bias, b
Calculate Gradient of the sum function:
The partial derivative of the sum operation with respect to any of its inputs is always equal to 1:

Calculate Gradient of the multiplication function:
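For a sum, ∂(x + y)/∂x = 1 and ∂(x + y)/∂y = 1; for a product, ∂(x·y)/∂x = y and ∂(x·y)/∂y = x. The sketch below checks these rules numerically at an arbitrarily chosen point. The helper `numerical_partial` and the sample values are assumptions for illustration.

```python
def add(x, y):
    return x + y


def mul(x, y):
    return x * y


def numerical_partial(func, args, index, h=1e-6):
    """Central finite-difference estimate of d func / d args[index]."""
    up = list(args)
    down = list(args)
    up[index] += h
    down[index] -= h
    return (func(*up) - func(*down)) / (2 * h)


point = (3.0, -4.0)  # arbitrary (x, y)

dadd_dx = numerical_partial(add, point, 0)  # expected: 1
dadd_dy = numerical_partial(add, point, 1)  # expected: 1
dmul_dx = numerical_partial(mul, point, 0)  # expected: y = -4
dmul_dy = numerical_partial(mul, point, 1)  # expected: x = 3

print(dadd_dx, dadd_dy, dmul_dx, dmul_dy)
```

This is why, in a neuron, the gradient with respect to a weight is the corresponding input, and the gradient with respect to an input is the corresponding weight.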

Example of backpropagation on a single neuron:

x = [1, -2, 3]    # inputs
w = [-3, -1, 2]   # weights
b = 1             # bias

# Forward pass
xw0 = x[0] * w[0]
xw1 = x[1] * w[1]
xw2 = x[2] * w[2]
z = xw0 + xw1 + xw2 + b   # weighted sum (input of the activation)

# ReLU activation
output = max(z, 0)

# Backward pass
# The derivative received from the next layer is 1.0 here
dvalue = 1.0

# Derivative of the ReLU, combined with dvalue by the chain rule
dReLU_dz = dvalue * (1. if z > 0 else 0.)

# Partial derivatives of the sum, the chain rule
dsum_dxw0 = 1   # = dz_dxw0
dsum_dxw1 = 1   # = dz_dxw1
dsum_dxw2 = 1   # = dz_dxw2
dsum_db = 1
dReLU_dxw0 = dReLU_dz * dsum_dxw0
dReLU_dxw1 = dReLU_dz * dsum_dxw1
dReLU_dxw2 = dReLU_dz * dsum_dxw2
dReLU_db = dReLU_dz * dsum_db

# Partial derivatives of the multiplication, the chain rule
dmul_dx0 = w[0]
dmul_dx1 = w[1]
dmul_dx2 = w[2]
dmul_dw0 = x[0]
dmul_dw1 = x[1]
dmul_dw2 = x[2]
dReLU_dx0 = dReLU_dxw0 * dmul_dx0
dReLU_dw0 = dReLU_dxw0 * dmul_dw0
dReLU_dx1 = dReLU_dxw1 * dmul_dx1
dReLU_dw1 = dReLU_dxw1 * dmul_dw1
dReLU_dx2 = dReLU_dxw2 * dmul_dx2
dReLU_dw2 = dReLU_dxw2 * dmul_dw2

# Determine the gradient vectors
dx = [dReLU_dx0, dReLU_dx1, dReLU_dx2]   # gradient of inputs
dw = [dReLU_dw0, dReLU_dw1, dReLU_dw2]   # gradient of weights
db = dReLU_db                            # gradient of bias

# Update the weights and the bias (learning rate 0.001)
w[0] += -0.001 * dw[0]
w[1] += -0.001 * dw[1]
w[2] += -0.001 * dw[2]
b += -0.001 * db

# Now, forward pass again
xw0 = x[0] * w[0]
xw1 = x[1] * w[1]
xw2 = x[2] * w[2]
z = xw0 + xw1 + xw2 + b
output = max(z, 0)
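The same single-neuron step can be written more compactly with NumPy. This is a sketch under the same inputs, weights, bias and learning rate as the example above. The helper `forward` and the names `w_new`, `b_new` are assumptions introduced here. Minimizing the neuron's output stands in for minimizing a loss.

```python
import numpy as np

x = np.array([1.0, -2.0, 3.0])   # inputs
w = np.array([-3.0, -1.0, 2.0])  # weights
b = 1.0                          # bias
learning_rate = 0.001


def forward(x, w, b):
    """Weighted sum followed by ReLU; returns (z, output)."""
    z = np.dot(x, w) + b
    return z, max(z, 0.0)


z, output = forward(x, w, b)

# Backward pass: gradient from the next layer, then ReLU, sum and product rules
dvalue = 1.0
drelu_dz = dvalue * (1.0 if z > 0 else 0.0)
dx = drelu_dz * w      # d(output)/d(inputs): each input's partner weight
dw = drelu_dz * x      # d(output)/d(weights): each weight's partner input
db = drelu_dz * 1.0    # d(output)/d(bias)

# One gradient descent step, then a second forward pass
w_new = w - learning_rate * dw
b_new = b - learning_rate * db
z_new, output_new = forward(x, w_new, b_new)

print("output before:", output, "after:", output_new)
```

The output drops from 6.0 to about 5.985 after one step, which shows that moving against the gradient reduces the quantity being minimized.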

Calculate Gradient of the CCE loss:

# Backward pass of the CCE loss (method of the loss class; requires `import numpy as np`)
def backward(self, dvalues, y_true):
    # Determine the number of samples
    samples = len(dvalues)
    # Determine the number of labels in each sample
    # We use the first sample to count
    labels = len(dvalues[0])
    # If labels are sparse, turn them into one-hot vectors
    if len(y_true.shape) == 1:
        y_true = np.eye(labels)[y_true]
    # Calculate gradient
    self.dinputs = -y_true / dvalues
    # Normalize gradient
    self.dinputs = self.dinputs / samples
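To see the method in context, the sketch below wraps it in a small class and checks one gradient entry against a finite difference. The class name `CategoricalCrossEntropy`, the `forward` method (mean of −Σ y·log ŷ with clipping to avoid log(0)) and the sample predictions are all assumptions made for this example, not code from the slides.

```python
import numpy as np


class CategoricalCrossEntropy:  # hypothetical class name
    def forward(self, y_pred, y_true):
        """Mean CCE loss over the batch (assumed form of the forward pass)."""
        y_pred = np.clip(y_pred, 1e-7, 1 - 1e-7)  # avoid log(0)
        if y_true.ndim == 1:
            y_true = np.eye(y_pred.shape[1])[y_true]
        return np.mean(-np.sum(y_true * np.log(y_pred), axis=1))

    def backward(self, dvalues, y_true):
        """Gradient of the mean loss w.r.t. the predictions: -y / y_hat / N."""
        samples = len(dvalues)
        labels = len(dvalues[0])
        if y_true.ndim == 1:
            y_true = np.eye(labels)[y_true]
        self.dinputs = -y_true / dvalues / samples
        return self.dinputs


y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.5, 0.4]])
y_true = np.array([0, 1])  # sparse labels

loss = CategoricalCrossEntropy()
grad = loss.backward(y_pred, y_true)

# Finite-difference check on the entry y_pred[0, 0]
h = 1e-6
up, down = y_pred.copy(), y_pred.copy()
up[0, 0] += h
down[0, 0] -= h
numeric = (loss.forward(up, y_true) - loss.forward(down, y_true)) / (2 * h)
print("analytic:", grad[0, 0], "numerical:", numeric)
```

Only the entries that correspond to the true class are non-zero, because every other term of the sum is multiplied by a zero in the one-hot target.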

Calculate Gradient of the softmax activation function:

The Softmax Output is defined by:

This is a function with n inputs and n outputs. Thus, calculating all the gradients of S_j will result in a Jacobian matrix:


The calculation of each term of this Jacobian matrix gives:


Therefore:
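The following is a reconstruction of the standard softmax derivation, written under assumed notation: the softmax inputs are called $z_1, \dots, z_n$ and $\delta_{jk}$ is the Kronecker delta (1 if j = k, 0 otherwise). It agrees with the expression `np.diagflat(S) - np.dot(S, S.T)` used in the backward pass of this chapter.

```latex
S_j = \frac{e^{z_j}}{\sum_{l=1}^{n} e^{z_l}}, \qquad j = 1, \dots, n

\frac{\partial S_j}{\partial z_k} =
\begin{cases}
  S_j \,(1 - S_j) & \text{if } j = k \\
  -\,S_j \, S_k   & \text{if } j \neq k
\end{cases}

\frac{\partial S_j}{\partial z_k} = S_j \,(\delta_{jk} - S_k)
\quad\Longleftrightarrow\quad
J = \operatorname{diag}(S) - S\,S^{\mathsf{T}}
```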

# Backward pass of the softmax activation (method of the activation class; requires `import numpy as np`)
def backward(self, dvalues):
    # Create uninitialized array
    self.dinputs = np.empty_like(dvalues)
    # Enumerate outputs and gradients
    for index, (single_output, single_dvalues) in enumerate(zip(self.output, dvalues)):
        # Reshape the output into a column vector
        single_output = single_output.reshape(-1, 1)
        # Calculate the Jacobian matrix of the output
        jacobian_matrix = np.diagflat(single_output) - np.dot(single_output, single_output.T)
        # Calculate the sample-wise gradient
        # and add it to the array of sample gradients
        self.dinputs[index] = np.dot(jacobian_matrix, single_dvalues)
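The Jacobian expression used in this backward pass can be checked numerically. The sketch below is a standalone illustration for a single sample. The function names `softmax`, `softmax_jacobian` and `numerical_jacobian`, and the input vector, are assumptions made for this example.

```python
import numpy as np


def softmax(z):
    """Softmax of a 1-D vector (shifted by max(z) for numerical stability)."""
    e = np.exp(z - np.max(z))
    return e / np.sum(e)


def softmax_jacobian(s):
    """Jacobian diag(S) - S S^T, built the same way as in the backward pass."""
    s = s.reshape(-1, 1)
    return np.diagflat(s) - np.dot(s, s.T)


def numerical_jacobian(func, z, h=1e-6):
    """Column k holds the central-difference estimate of d func / d z_k."""
    n = z.size
    jac = np.zeros((n, n))
    for k in range(n):
        step = np.zeros(n)
        step[k] = h
        jac[:, k] = (func(z + step) - func(z - step)) / (2 * h)
    return jac


z = np.array([1.0, 2.0, 0.5])
s = softmax(z)
analytic = softmax_jacobian(s)
numeric = numerical_jacobian(softmax, z)
print(np.round(analytic, 4))
print("max abs difference:", np.max(np.abs(analytic - numeric)))
```

Each column of the Jacobian sums to zero, because the softmax outputs always sum to 1 and any change in one output must be offset by the others.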


END OF CHAPTER 4.1
