Neural Networks
For a single neuron with input vector $x \in \mathbb{R}^D$, weight vector $w$ and bias $b$ (the bias can be absorbed into $w$ by appending a constant input 1 to $x$):

$w=\begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_D \end{pmatrix},\quad x=\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_D \end{pmatrix},\quad z=w^{T}x,\quad y=f(z)=f(w^{T}x)$
Perceptron
Sample data consists of n observed pairs:
$(x_1, y_1), \dots, (x_i, y_i), \dots, (x_n, y_n)$, $i = 1, \dots, n$,
where $x_i$ is the input vector and $y_i$ is the label.
$L(w)=\frac{1}{n}\sum_{i=1}^{n}\varepsilon_i^{2}=\frac{1}{n}\sum_{i=1}^{n}\left[y_i-f\left(w^{T}x_i\right)\right]^{2}$
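To make the notation concrete, here is a minimal NumPy sketch that evaluates this loss on a small synthetic dataset (the data values, the sigmoid activation and the weight vector are assumptions chosen purely for illustration):

import numpy as np

def f(z):
    # sigmoid activation (an assumed choice; any activation could be used)
    return 1.0 / (1.0 + np.exp(-z))

# synthetic data: n = 4 samples, D = 3 features (illustrative values only)
X = np.array([[0.5, 1.0, -0.3],
              [1.2, -0.7, 0.8],
              [-0.4, 0.1, 0.9],
              [0.3, 0.3, -1.1]])
y = np.array([1.0, 0.0, 1.0, 0.0])
w = np.array([0.1, -0.2, 0.05])          # current weight vector

# L(w) = (1/n) * sum_i [y_i - f(w^T x_i)]^2
predictions = f(X @ w)
loss = np.mean((y - predictions) ** 2)
print('MSE loss:', loss)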
In a neural network, the weights are usually found using an optimization algorithm that
minimizes a chosen loss function. In the case of Mean Squared Error (MSE) loss, the
objective is to minimize the average squared difference between the predicted output and
the true output for a given set of input data.
Here's a general overview of how the weights are updated in a neural network using MSE
loss:
1. Initialize the weights of the neural network randomly.
2. Forward pass: Feed the input data through the neural network and obtain the
predicted output.
3. Compute the MSE loss between the predicted output and the true output.
4. Backward pass: Calculate the gradient of the loss with respect to each weight in the
network using backpropagation.
5. Use an optimization algorithm such as Stochastic Gradient Descent (SGD) or
Adam to update the weights in the direction that reduces the loss (opposite to the
gradient). The size of each weight update is controlled by the learning rate
hyperparameter.
6. Repeat steps 2-5 for multiple epochs (passes through the entire dataset) until the loss
converges to a minimum, stops improving significantly, or some other stopping
condition is satisfied.
During the training process, the weights are adjusted in a way that minimizes the MSE
loss between the predicted output and the true output. This is done by iteratively updating
the weights in the direction of steepest descent of the loss function until convergence.
Once the weights have converged to a minimum, the neural network can be used to make
predictions on new unseen data.
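As an illustration, here is a minimal PyTorch sketch of this training loop for a tiny fully connected network trained with an MSE loss and SGD (the architecture, the synthetic data and the hyperparameter values are assumptions made only for the example):

import torch
import torch.nn as nn

# synthetic regression data: 100 samples, 3 input features (illustrative only)
X = torch.randn(100, 3)
y = torch.randn(100, 1)

# 1. a small network; its weights are initialized randomly by PyTorch
model = nn.Sequential(nn.Linear(3, 4), nn.Sigmoid(), nn.Linear(4, 1))
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)   # learning rate hyperparameter

for epoch in range(100):            # 6. repeat for multiple epochs
    optimizer.zero_grad()
    y_pred = model(X)               # 2. forward pass
    loss = loss_fn(y_pred, y)       # 3. MSE loss
    loss.backward()                 # 4. backward pass (backpropagation)
    optimizer.step()                # 5. weight update opposite to the gradient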
Artificial Neural Networks (ANN)
$y=f\left(W^{(3)}\,f\left(W^{(2)}\,f\left(W^{(1)}x\right)\right)\right)$
The input is a [3x1] vector.
The weights W(1) form a [4x3] matrix holding the connections of the first hidden layer, and the
biases are in the vector b(1), of size [4x1].
A single neuron has its weights in one row of W(1).
A matrix-vector multiplication evaluates the activations of all neurons in that layer.
W(2) is a [4x4] matrix with the connections of the second hidden layer, and W(3) a [1x4] matrix for the last (output) layer.
The full forward pass is simply three matrix multiplications and applications of the
activation functions.
The size of the network is usually described by the number of parameters and the number of layers.
This network has 4 + 4 + 1 = 9 neurons, [3 x 4] + [4 x 4] + [4 x 1] = 12 + 16 + 4 = 32
weights and 4 + 4 + 1 = 9 biases, for a total of 41 learnable parameters.
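A minimal NumPy sketch of this forward pass, with randomly initialized weights and a sigmoid activation (both assumed only for illustration), could look like this:

import numpy as np

def f(z):
    # sigmoid activation (assumed choice)
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 1))           # input: [3x1] vector

W1, b1 = rng.standard_normal((4, 3)), rng.standard_normal((4, 1))   # first hidden layer
W2, b2 = rng.standard_normal((4, 4)), rng.standard_normal((4, 1))   # second hidden layer
W3, b3 = rng.standard_normal((1, 4)), rng.standard_normal((1, 1))   # output layer

# full forward pass: three matrix multiplications plus activations
h1 = f(W1 @ x + b1)                       # [4x1]
h2 = f(W2 @ h1 + b2)                      # [4x1]
y = f(W3 @ h2 + b3)                       # [1x1]

# parameter count: 12 + 16 + 4 = 32 weights and 4 + 4 + 1 = 9 biases -> 41
n_params = sum(W.size + b.size for W, b in [(W1, b1), (W2, b2), (W3, b3)])
print(y.item(), n_params)                 # n_params == 41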
Activation Functions
An activation function in a neural network defines how the weighted sum of the input is
transformed into an output from a node or nodes in a layer of the network.
It decides whether a neuron should be activated or not, i.e. whether the neuron's input is
important or not in the process of prediction.
Sometimes the activation function is called a “transfer function.” If the output range of
the activation function is limited, then it may be called a “squashing function.” Many
activation functions are nonlinear and may be referred to as the “nonlinearity” in the
layer or the network design.
The choice of activation function has a large impact on the capability and performance of
the neural network, and different activation functions may be used in different parts of
the model.
Technically, the activation function is used within or after the internal processing of each
node in the network, although networks are designed to use the same activation function
for all nodes in a layer.
A network may have three types of layers: input layers that take raw input from the
domain, hidden layers that take input from another layer and pass output to another layer,
and output layers that make a prediction.
All hidden layers typically use the same activation function. The output layer will
typically use a different activation function from the hidden layers and is dependent upon
the type of prediction required by the model.
Activation functions are also typically differentiable, meaning the first-order derivative
can be calculated for a given input value. This is required given that neural networks are
typically trained using the backpropagation of error algorithm that requires the derivative
of prediction error in order to update the weights of the model.
There are many different types of activation functions used in neural networks, although
perhaps only a small number of them are used in practice for hidden and output layers.
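For reference, here is a short NumPy sketch of three commonly used activation functions (these particular functions are standard choices, not ones prescribed by this text):

import numpy as np

def sigmoid(z):
    # squashes its input into (0, 1); often used for binary-classification outputs
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # squashes its input into (-1, 1)
    return np.tanh(z)

def relu(z):
    # rectified linear unit; a common default for hidden layers
    return np.maximum(0.0, z)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(sigmoid(z), tanh(z), relu(z), sep='\n')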
$\frac{\partial f}{\partial w_1}=\lim_{h\to 0}\frac{f(w_0,\,w_1+h)-f(w_0,\,w_1)}{h}\approx\frac{f(w_0,\,w_1+h)-f(w_0,\,w_1)}{h}$
$w_1^{(1)}=w_1^{(0)}-\eta\left.\frac{\partial L}{\partial w_1}\right|_{w_0^{(0)},\,w_1^{(0)}}$

$\eta$ – learning rate
$w^{(1)}=w^{(0)}-\eta\,\nabla L(w)\big|_{w^{(0)}}$ – first iteration

$w^{(i+1)}=w^{(i)}-\eta\,\nabla L(w)\big|_{w^{(i)}}$ – (i+1)-th iteration
Stop when L changes only slightly between iterations or when the maximum number of
iterations has been reached.
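A minimal sketch of this iteration on a simple quadratic loss (the loss function, starting point, learning rate and stopping tolerance are all assumptions chosen for illustration):

import numpy as np

def L(w):
    # an assumed toy loss with its minimum at w = (1, -2)
    return (w[0] - 1.0) ** 2 + (w[1] + 2.0) ** 2

def grad_L(w):
    # analytic gradient of the toy loss
    return np.array([2.0 * (w[0] - 1.0), 2.0 * (w[1] + 2.0)])

w = np.array([5.0, 5.0])    # w^(0): initial guess
eta = 0.1                   # learning rate
for i in range(1000):       # maximal number of iterations
    w_new = w - eta * grad_L(w)         # w^(i+1) = w^(i) - eta * grad L(w^(i))
    if abs(L(w_new) - L(w)) < 1e-10:    # stop when L changes only slightly
        w = w_new
        break
    w = w_new
print(w)                    # close to (1, -2)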
$\frac{\partial f}{\partial w_i}=\lim_{h\to 0}\frac{f(w_i+h)-f(w_i)}{h}\approx\frac{f(w_i+h)-f(w_i)}{h}$
The number of operations is $O(D^2)$, because we need to calculate D partial derivatives, and
each finite-difference approximation requires an extra evaluation of f, which itself costs $O(D)$.
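As a sketch, this numerical (finite-difference) gradient can be computed as below; note that it needs one extra evaluation of f per coordinate (the example function and the step size h are assumptions):

import numpy as np

def numerical_gradient(f, w, h=1e-5):
    # approximate each partial derivative with (f(w + h*e_i) - f(w)) / h
    grad = np.zeros_like(w)
    f_w = f(w)                        # one baseline evaluation
    for i in range(len(w)):           # D extra evaluations, one per coordinate
        w_step = w.copy()
        w_step[i] += h
        grad[i] = (f(w_step) - f_w) / h
    return grad

# toy example: f(w) = sum(w^2), whose exact gradient is 2*w
f = lambda w: np.sum(w ** 2)
w = np.array([1.0, -2.0, 3.0])
print(numerical_gradient(f, w))       # approximately [2, -4, 6]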
If you substitute all the individual node equations into the final one, you'll find that we are
solving the following equation. More specifically, we want to calculate its value and its
partial derivatives, so in this use case it is a purely mathematical task.

$r=z^{2}\left(x^{2}+y\right)^{2}$
Forward Pass
To make this concept more tangible let’s take some numbers for our calculation. For
example:
x=1
y=2
z=4
Here we simply substitute our inputs into the equations. The results of the individual node
steps are shown below. The final output is r=144.
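Breaking the computation into the individual node steps (using the same intermediate variables u, v, w that appear in the PyTorch code later on):

$u=x^{2}=1,\qquad v=u+y=3,\qquad w=z\cdot v=12,\qquad r=w^{2}=144$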
Backward Pass
Now it's time to perform backpropagation, also known under the fancier names "backward
propagation of errors" or "reverse-mode automatic differentiation".
To calculate the gradients with respect to each of the 3 variables, we have to calculate partial
derivatives at each node in the graph (local gradients). Below we show how to do it for
the last two nodes/steps.
$\frac{\partial r}{\partial w}=\frac{\partial w^{2}}{\partial w}=2w$

$\frac{\partial w}{\partial z}=\frac{\partial (zv)}{\partial z}=v$
After completing the calculation of the local gradients at every node, we can walk the
computation graph backwards.
Now, to calculate the final gradients we have to use the chain rule. In
practice this means we have to multiply all partial derivatives along the path from the
output to the variable of interest:
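Carrying this out for our example (the numerical values follow directly from the forward pass above and match what PyTorch reports below):

$\frac{\partial r}{\partial z}=\frac{\partial r}{\partial w}\cdot\frac{\partial w}{\partial z}=2w\cdot v=24\cdot 3=72$

$\frac{\partial r}{\partial y}=\frac{\partial r}{\partial w}\cdot\frac{\partial w}{\partial v}\cdot\frac{\partial v}{\partial y}=2w\cdot z\cdot 1=96$

$\frac{\partial r}{\partial x}=\frac{\partial r}{\partial w}\cdot\frac{\partial w}{\partial v}\cdot\frac{\partial v}{\partial u}\cdot\frac{\partial u}{\partial x}=2w\cdot z\cdot 1\cdot 2x=192$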
Now we can use these gradients for whatever we want, e.g. optimization with gradient
descent (SGD, Adam, etc.).
Implementation in PyTorch
There are numerous neural network frameworks in various languages in which you can
implement such computations and have the computer calculate the gradients for you.
Below, we'll demonstrate how to use the Python PyTorch library to solve our example
task.
import torch
x = torch.tensor(1.0, requires_grad=True)
y = torch.tensor(2.0, requires_grad=True)
z = torch.tensor(4.0, requires_grad=True)
# forward pass through the computation graph
u = x**2     # u = 1
v = u + y    # v = 3
w = z * v    # w = 12
r = w**2     # r = 144
print('r=', r.item())
# backward pass: fills in .grad for every tensor created with requires_grad=True
r.backward()
print('dr/dx = ', x.grad.item())   # 192.0
print('dr/dy = ', y.grad.item())   # 96.0
print('dr/dz = ', z.grad.item())   # 72.0
Some notes:
● In PyTorch everything is a tensor — even if it contains only a single value.