Neural Network (Basics)
Neural Networks
• Neural Networks are networks of interconnected neurons, for example in human brains.
• In Artificial Neural Networks, neurons are highly connected to other neurons and perform computations by combining signals from other neurons.
• Outputs of these computations may be transmitted to one or more other neurons.
• The neurons are connected together in a specific way to perform a particular task.
Artificial Neural Networks (High-Level Overview)
• A neural network is a function.
• It basically consists of:
  a. Neurons: which pass input values through functions and output the result.
  b. Weights: which carry values (real numbers) between neurons.
• Neurons can be categorized into layers:
  a. Input Layer
  b. Hidden Layer
  c. Output Layer
Neurophysiology
How do ANNs work?
Input: x1, x2, …, xm
Processing: Σ = x1 + x2 + … + xm = y
Output: y
How do ANNs work?
Not all inputs are equal.
Input: x1, x2, …, xm
Weights: w1, w2, …, wm
Processing: Σ = x1w1 + x2w2 + … + xmwm = y
Output: y
How do ANNs work?
The signal is not passed down to the next neuron verbatim.
Input: x1, x2, …, xm
Weights: w1, w2, …, wm
Processing: vk = x1w1 + x2w2 + … + xmwm
Transfer function (activation function): y = f(vk)
Output: y
The output is a function of the input, shaped by the weights and the transfer function.
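Putting the pieces above together, here is a minimal Python sketch of a single neuron: it combines its inputs with their weights and passes the result through a transfer function (sigmoid is used purely as an example choice; the slides have not fixed one at this point).

```python
import math

def neuron_output(inputs, weights):
    """Weighted sum of the inputs passed through a sigmoid transfer function."""
    v = sum(x * w for x, w in zip(inputs, weights))  # v = x1*w1 + x2*w2 + ... + xm*wm
    return 1.0 / (1.0 + math.exp(-v))                # y = f(v), with sigmoid as the example f

# Example: three inputs with their associated weights
print(neuron_output([1.0, 0.5, -0.2], [0.4, 0.3, 0.9]))
```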
The Perceptron Model
● Motivated by the biological neuron.
● A perceptron is a computing element whose inputs x1, x2, … are associated with weights w1, w2, … and combined to produce an output y.
Neural network architectures
● Example: a perceptron with inputs x1, x2, weights w1, w2, and a summation unit Σ.
● Here y = Σ wi xi − θ, with w1 = 0.5, w2 = 0.5 and θ = 0.9.
● With a step activation (output 1 if Σ wi xi ≥ θ, else 0), these values implement the logical AND function:

  x1  x2 | output
   0   0 |   0
   1   0 |   0
   0   1 |   0
   1   1 |   1
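A small Python sketch of this AND perceptron, assuming the step-activation reading above; the weights 0.5, 0.5 and the threshold 0.9 are the values given on the slide.

```python
def perceptron(x1, x2, w1=0.5, w2=0.5, theta=0.9):
    """Fire (output 1) only if the weighted sum of the inputs reaches the threshold."""
    return 1 if (w1 * x1 + w2 * x2) >= theta else 0

# Enumerate the truth table: only (1, 1) reaches the 0.9 threshold, so this is AND.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, perceptron(x1, x2))
```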
The Perceptron Model
● Rewrite Σ wi xi as w·x.
● Replace the threshold by an extra input fixed at 1 whose weight is −b.
● b: Bias, a prior inclination towards some decision.
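One way to read the threshold-to-bias rewriting, assuming b denotes the original threshold (which would match a figure weight of −b on the constant input 1):

```latex
\sum_i w_i x_i \ge b
\quad\Longleftrightarrow\quad
\mathbf{w}\cdot\mathbf{x} + (-b)\cdot 1 \ge 0
```

so the threshold becomes just another weight, attached to a constant input of 1, and can be learned like the other weights.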
Activation Functions
● The activation function decides whether a neuron should be activated or not.
● It helps the network use the useful information and suppress the irrelevant information.
● Usually a nonlinear function.
  ○ What if we choose a linear one?
  ○ The network reduces to a linear classifier.
  ○ Limited capacity to solve complex problems.
Activation Functions (cont’d)
● Sigmoid (see the sketch below)
  ○ continuously differentiable
  ○ ranges from 0 to 1
  ○ not symmetric around the origin
● Tanh
  ○ a scaled version of the sigmoid
  ○ symmetric around the origin
  ○ suffers from vanishing gradients
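For reference, both functions written out in NumPy; the NumPy choice is mine, and the comments restate the bullets above.

```python
import numpy as np

def sigmoid(x):
    """Sigmoid: smooth and differentiable, output in (0, 1), not centered at 0."""
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    """Tanh: a scaled sigmoid, output in (-1, 1), symmetric around the origin."""
    return np.tanh(x)  # equivalently: 2 * sigmoid(2 * x) - 1

x = np.array([-3.0, 0.0, 3.0])
print(sigmoid(x))  # saturates toward 0 and 1 at the extremes, where gradients vanish
print(tanh(x))     # saturates toward -1 and 1 at the extremes
```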
Activation Functions (cont’d)
● ReLU
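ReLU (Rectified Linear Unit) is defined as

```latex
\mathrm{ReLU}(x) = \max(0, x)
```

Unlike sigmoid and tanh it does not saturate for positive inputs, which helps with the vanishing-gradient issue noted above.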
Representation Power
● A neural network with at least one hidden layer can approximate any function. [1]
● The representation power of the network increases with more hidden units and more hidden layers.
● But, “with great power comes great overfitting.”
[1] Cybenko, George. "Approximation by superpositions of a sigmoidal function." Mathematics of Control, Signals and Systems 2.4 (1989): 303-314.
Feed-forward Neural Network
(figure: inputs x1, x2, x3 feed hidden units a1, a2, a3, a4, which feed outputs y1, y2)
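A minimal forward-pass sketch matching the figure's layer sizes (3 inputs, 4 hidden units, 2 outputs); the random weights and the sigmoid activation are illustrative assumptions, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)  # input layer (3) -> hidden layer (4)
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)  # hidden layer (4) -> output layer (2)

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def forward(x):
    a = sigmoid(W1 @ x + b1)   # hidden activations a1..a4
    y = sigmoid(W2 @ a + b2)   # outputs y1, y2
    return y

print(forward(np.array([0.5, -1.0, 2.0])))
```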
Objective Function (cont’d)
● Mean Squared Error:
○ Mean Squared Error (MSE), or quadratic, loss function is
widely used in linear regression as the performance
measure.
○ It measures the average of the squares of the errors, that is, the average squared difference between the estimated values and the actual values.
○ It is always non-negative, and values closer to zero are better; the formula is given below.
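Written as a formula, with y_i the actual value, ŷ_i the estimated value, and n samples:

```latex
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2
```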
Objective Function (cont’d)
● Mean Absolute Error:
○ Mean Absolute Error (MAE) is a quantity used to
measure how close forecasts or predictions are to the
eventual outcomes.
○ Both MSE and MAE are used in predictive modeling.
○ MSE has nice mathematical properties which make it easier to compute the gradient.
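MAE in the same notation as the MSE formula above:

```latex
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|
```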
Objective Function (cont’d)
● Cross-entropy:
○ Cross-entropy comes from the field of information theory and has the unit of “bits.”
○ The cross-entropy between a “true” distribution p and an estimated distribution q is defined as:
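In its standard form (measured in bits when the logarithm is base 2):

```latex
H(p, q) = -\sum_{x} p(x)\,\log q(x)
```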
Objective Function (cont’d)
● Cross-entropy:
○ Assuming a ground truth probability distribution that
is 1 at the right class and 0 everywhere else p = [0,
…,0,1,0,…0] and our computed probability is q
○ Kullback-Leibler divergence can be written as:
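Reconstructed in its standard form; note that for a one-hot p, H(p) = 0, so minimizing the cross-entropy is equivalent to minimizing the KL divergence:

```latex
D_{\mathrm{KL}}(p \,\|\, q) = \sum_{x} p(x)\,\log\frac{p(x)}{q(x)} = H(p, q) - H(p)
```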
Optimization
● The goal of optimization is to find the parameters (weights) that minimize the loss function.
● How to find such weights?
  ○ Random Search
    ■ Very bad idea.
  ○ Random Local Search (sketched below)
    ■ Start with a random weight w and generate random perturbations Δw to it; if the loss at the perturbed w + Δw is lower, perform an update.
    ■ Computationally expensive.
  ○ Follow the Gradient
    ■ No need to search for a good direction.
    ■ We can compute the best direction along which to change our weight vector, which is mathematically guaranteed to be the direction of steepest descent.
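A toy Python sketch of the random local search idea on a one-dimensional quadratic loss; the loss function, perturbation scale, and iteration count are illustrative choices, not from the slides.

```python
import random

def loss(w):
    return (w - 3.0) ** 2  # toy loss with its minimum at w = 3

w = random.uniform(-10, 10)        # start from a random weight
for _ in range(1000):
    dw = random.gauss(0.0, 0.1)    # random perturbation Δw
    if loss(w + dw) < loss(w):     # keep the step only if it lowers the loss
        w += dw
print(w)  # ends up close to 3.0
```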
Optimization (cont’d)
Find the w which minimizes the chosen error function E(w).
● wA : a local minimum
● wB : the global minimum
● At a point wC the local gradient is given by the vector ∇E(w).
● It points in the direction of greatest rate of increase of E(w).
● The negative gradient therefore points in the direction of greatest rate of decrease of E(w).
Gradient and Hessian
● The first derivative of a scalar function E(w) with respect to a vector w = [w1, w2]T is a vector called the gradient of E(w).
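Written out for the two-dimensional w = [w1, w2]T used on the slide, together with the Hessian named in the title (the matrix of second derivatives):

```latex
\nabla E(\mathbf{w}) =
\begin{bmatrix}
\dfrac{\partial E}{\partial w_1} \\[4pt]
\dfrac{\partial E}{\partial w_2}
\end{bmatrix},
\qquad
\mathbf{H} = \nabla^{2} E(\mathbf{w}) =
\begin{bmatrix}
\dfrac{\partial^{2} E}{\partial w_1^{2}} & \dfrac{\partial^{2} E}{\partial w_1\,\partial w_2} \\[4pt]
\dfrac{\partial^{2} E}{\partial w_2\,\partial w_1} & \dfrac{\partial^{2} E}{\partial w_2^{2}}
\end{bmatrix}
```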
Gradient Descent Optimization
● Determine the weights w from a labeled set of training samples.
● Take a small step in the direction of the negative gradient.
Gradient Descent Variants
● Batch gradient descent:
  ○ Vanilla gradient descent, aka batch gradient descent, computes the gradient of the cost function w.r.t. the parameters w for the entire training dataset.
  ○ wnew = wold − η ∇E(wold)
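A compact Python sketch of the batch update wnew = wold − η ∇E(wold) on a small least-squares problem; the data, learning rate, and iteration count are illustrative choices.

```python
import numpy as np

# Toy dataset: inputs X and targets t generated from a known weight vector
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
t = X @ np.array([2.0, -1.0]) + 0.1 * rng.normal(size=100)

w = np.zeros(2)   # initial weights
eta = 0.1         # learning rate η
for _ in range(200):
    error = X @ w - t
    grad = X.T @ error / len(t)   # gradient of E(w) = (1/2n) * sum(error**2)
    w = w - eta * grad            # w_new = w_old - η ∇E(w_old)
print(w)  # approaches [2.0, -1.0]
```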
Backpropagation Algorithm
● The backpropagation algorithm is used to train artificial neural networks; it can update the weights very efficiently.
● It is a computationally efficient approach to computing the derivatives of a complex cost function.
● The goal is to use those derivatives to learn the weight coefficients that parameterize a multi-layer artificial neural network.
● It computes the gradient of a cost function with respect to all the weights in the network, and this gradient is fed to the gradient descent method, which in turn uses it to update the weights in order to minimize the cost function.
Backpropagation Algorithm (cont’d)
● Chain Rule:
  ○ Single path: x → y → z
  ○ Multiple paths: x → y1, …, yn → z
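Written out, these are the two standard forms of the chain rule; in the multi-path case the sum runs over the intermediate variables y_i:

```latex
\frac{\partial z}{\partial x} = \frac{\partial z}{\partial y}\,\frac{\partial y}{\partial x}
\quad\text{(single path)}
\qquad\qquad
\frac{\partial z}{\partial x} = \sum_{i}\frac{\partial z}{\partial y_i}\,\frac{\partial y_i}{\partial x}
\quad\text{(multiple paths)}
```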
Backpropagation Algorithm (cont’d)
● The total error in the network for a single input, with weights wij between the input and hidden layers and wjk between the hidden and output layers, is given by the equation below.
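A common choice for this error, consistent with the sigmoid-based updates on the following slides, is the sum-of-squares error over the output units k, with t_k the target and y_k the network output; this is an assumption on my part, since the slide's own equation is not reproduced in the text.

```latex
E = \frac{1}{2}\sum_{k}\left(t_k - y_k\right)^{2}
```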
Backpropagation Algorithm (cont’d)
● Backpropagation for the outermost layer, for a sigmoid activation function.
● Layer indices: input (i), hidden (j), output (k); wij connects input to hidden, wjk connects hidden to output.
● The resulting update rule is sketched below.
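For that sum-of-squares error and a sigmoid output unit, the standard textbook result (again reconstructing what the slide's figure presumably showed) is, with η the learning rate and a_j the activation of hidden unit j:

```latex
\delta_k = (t_k - y_k)\,y_k(1 - y_k),
\qquad
\Delta w_{jk} = \eta\,\delta_k\,a_j
```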
Backpropagation Algorithm (cont’d)
● Backpropagation for the hidden layer, with the same layer indices: input (i), hidden (j), output (k), and weights wij and wjk.
● The resulting update rule is sketched below.
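The corresponding standard result for a sigmoid hidden unit, where the hidden error term gathers the output-layer terms δ_k through the weights w_jk and x_i is the input feeding w_ij; as above, this reconstructs the usual derivation rather than quoting the slide:

```latex
\delta_j = a_j(1 - a_j)\sum_{k}\delta_k\,w_{jk},
\qquad
\Delta w_{ij} = \eta\,\delta_j\,x_i
```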
Thank You!