An Introduction To Mathematics Behind Neural Networks

December 23rd 2019

Today, with open source machine learning libraries such as TensorFlow, Keras or PyTorch, we can create a neural network, even one with high structural complexity, with just a few lines of code. Having said that, the math behind neural networks is still a mystery to some of us, and knowing it can help us understand what's happening inside a neural network. It is also helpful in architecture selection, fine-tuning of deep learning models, hyperparameter tuning and optimization.
Introduction

I ignored understanding the math behind neural networks and Deep Learning for a long time, as I didn't have a good knowledge of algebra or differential calculus. A few days ago, I decided to start from scratch and derive the methodology and math behind neural networks and Deep Learning, to know how and why they work. I also decided to write this article, which would be useful to people like me, who find it difficult to understand these concepts.
Perceptrons

Perceptrons — invented by Frank Rosenblatt in 1957 — are the simplest neural networks: they consist of n inputs, only one neuron and one output, where n is the number of features in our dataset. The process of passing data through the neural network is known as forward propagation, and the forward propagation carried out in a Perceptron is explained in the following three steps.
Step 1 : For each input, multiply the input value xᵢ by its weight wᵢ and sum all the multiplied values. Weights — represent the strength of the connection between neurons and decide how much influence the given input will have on the neuron's output. If the weight w₁ has a higher value than the weight w₂, then the input x₁ will have a higher influence on the output than x₂.

The row vectors of the inputs and weights are x = [x₁, x₂, … , xₙ] and w = [w₁, w₂, … , wₙ] respectively, and their dot product is given by

x · w = x₁w₁ + x₂w₂ + … + xₙwₙ

Hence, the summation is equal to the dot product of the vectors x and w.
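The summation step can be checked with a quick dot product (the values here are purely illustrative):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])    # inputs x1..x3
w = np.array([0.2, 0.3, -0.1])   # weights w1..w3

# Dot product: x1*w1 + x2*w2 + x3*w3
s = np.dot(x, w)  # 0.2 + 0.6 - 0.3 = 0.5
```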

Step 2: Add the bias b to the summation of the multiplied values, and let's call the result z:

z = x · w + b

Bias — also known as the offset — is necessary in most cases to move the entire activation function to the left or right, to generate the required output values.

Step 3 : Pass the value of z to a non-linear activation function. Activation functions — are used to introduce non-linearity into the output of the neurons; without them, the neural network would just be a linear function. Moreover, they have a significant impact on the learning speed of the neural network. Perceptrons use the binary step function as their activation function. However, we shall use the Sigmoid — also known as the logistic function — as our activation function:

σ(z) = 1 / (1 + e⁻ᶻ)

ŷ = σ(z) = σ(x · w + b)

where σ denotes the Sigmoid activation function, and the output we get after forward propagation is known as the predicted value ŷ.
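The three steps above can be sketched in a few lines of Python — a minimal illustration with variable names of my own choosing, not any library's API:

```python
import numpy as np

def sigmoid(z):
    # Sigmoid activation: squashes z into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, w, b):
    # Steps 1 and 2: weighted sum of the inputs, plus the bias
    z = np.dot(x, w) + b
    # Step 3: non-linear activation gives the predicted value y-hat
    return sigmoid(z)

x = np.array([1.0, 2.0])    # two input features
w = np.array([0.5, -0.25])  # one weight per input
b = 0.1                     # bias

y_hat = forward(x, w, b)    # a value between 0 and 1
```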
Learning Algorithm

The learning algorithm consists of two parts — Backpropagation and Optimization.

Backpropagation : Backpropagation, short for backward propagation of errors, refers to the algorithm for computing the gradient of the loss function with respect to the weights. However, the term is often used to refer to the entire learning algorithm. The backpropagation carried out in a Perceptron is explained in the following two steps.
Step 1 : To estimate how far we are from the desired solution, a loss function is used. Generally, Mean Squared Error is chosen as the loss function for regression problems and cross entropy for classification problems. Let's take a regression problem and let its loss function be Mean Squared Error, which squares the difference between the actual value (yᵢ) and the predicted value (ŷᵢ):

L = (yᵢ − ŷᵢ)²

The loss function is calculated over the entire training dataset, and its average is called the Cost function C:

C = (1/n) Σᵢ (yᵢ − ŷᵢ)²
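As a quick sketch, the cost can be computed in one line (array names are my own):

```python
import numpy as np

def cost(y, y_hat):
    # Mean Squared Error cost: the average squared difference
    # between the actual and predicted values over the dataset
    return np.mean((y - y_hat) ** 2)

y = np.array([1.0, 0.0, 1.0])      # actual values
y_hat = np.array([0.9, 0.2, 0.8])  # predicted values

c = cost(y, y_hat)  # (0.01 + 0.04 + 0.04) / 3 = 0.03
```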

Step 2 : In order to find the best weights and bias for our Perceptron, we need to know how the cost function changes in relation to the weights and bias. This is done with the help of gradients (rates of change) — how one quantity changes in relation to another. In our case, we need to find the gradient of the cost function with respect to the weights and bias.

Let's calculate the gradient of the cost function C with respect to the weight wᵢ using partial derivatives. Since the cost function is not directly related to the weight wᵢ, let's use the chain rule:

∂C/∂wᵢ = (∂C/∂ŷ) · (∂ŷ/∂z) · (∂z/∂wᵢ)

Now we need to find the following three gradients: ∂C/∂ŷ, ∂ŷ/∂z and ∂z/∂wᵢ.

Let's start with the gradient of the Cost function (C) with respect to the predicted value (ŷ). Let y = [y₁, y₂, … , yₙ] and ŷ = [ŷ₁, ŷ₂, … , ŷₙ] be the row vectors of the actual and predicted values. Since C = (1/n) Σᵢ (yᵢ − ŷᵢ)², the gradient simplifies to

∂C/∂ŷ = (2/n) (ŷ − y)

Now let’s find the the gradient of the predicted value with
respect to the z. This will be a bit lengthy.

The gradient of z with respect to the weight wᵢ is

Therefore we get,

What about the bias? — The bias is theoretically considered to have an input of constant value 1, so ∂z/∂b = 1. Hence,

∂C/∂b = (2/n) (ŷ − y) · ŷ(1 − ŷ)
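These gradients can be sketched in code for a dataset of n examples (a minimal illustration with my own variable names; the `@` operator is numpy matrix multiplication):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradients(X, y, w, b):
    # Forward pass: z = X.w + b, y_hat = sigmoid(z)
    y_hat = sigmoid(X @ w + b)
    n = len(y)
    # Chain rule, applied elementwise:
    # dC/dz = dC/dy_hat * dy_hat/dz = (2/n)(y_hat - y) * y_hat(1 - y_hat)
    dz = (2.0 / n) * (y_hat - y) * y_hat * (1.0 - y_hat)
    # dz/dw_i = x_i, so the weight gradient sums dz over the examples
    dw = X.T @ dz
    # The bias behaves like a weight on a constant input of 1
    db = np.sum(dz)
    return dw, db
```

A useful sanity check for any hand-derived gradient is to compare it against a finite-difference approximation of the cost.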
Optimization : Optimization is the selection of the best element from some set of available alternatives, which in our case is the selection of the best weights and bias of the Perceptron. Let's choose gradient descent as our optimization algorithm, which changes the weights and bias proportionally to the negative of the gradient of the Cost function with respect to the corresponding weight or bias. The learning rate (α) is a hyperparameter used to control how much the weights and bias are changed.

The weights and bias are updated as follows, and backpropagation and gradient descent are repeated until convergence:

wᵢ = wᵢ − α · ∂C/∂wᵢ
b = b − α · ∂C/∂b
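Putting forward propagation, backpropagation and gradient descent together, a complete training loop for the Perceptron might look like this — a sketch under my own naming and a toy OR dataset, not a reference implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, y, alpha=1.0, epochs=5000):
    # Start from zero weights and bias
    w = np.zeros(X.shape[1])
    b = 0.0
    n = len(y)
    for _ in range(epochs):
        # Forward propagation
        y_hat = sigmoid(X @ w + b)
        # Backpropagation: gradients of the MSE cost
        dz = (2.0 / n) * (y_hat - y) * y_hat * (1.0 - y_hat)
        dw, db = X.T @ dz, np.sum(dz)
        # Gradient descent update: step against the gradient
        w -= alpha * dw
        b -= alpha * db
    return w, b

# Toy dataset: logical OR of two binary inputs
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 1.0, 1.0, 1.0])
w, b = train(X, y)
```

After training, thresholding the predicted values at 0.5 recovers the OR truth table.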

Conclusion

I hope that you've found this article useful and understood the math behind Neural Networks and Deep Learning. I have explained the working of a single neuron in this article; however, these basic concepts are applicable to all kinds of Neural Networks with some modifications. If you have any questions or if you found a mistake, please let me know in the comments.
