Artificial Neural Networks

Uploaded by

Raahil Rai

Artificial Neural Networks (ANN)

• Topics
• Perceptron Model to Neural Networks
• Activation Functions
• Cost Functions
• Feed Forward Networks
• BackPropagation
Perceptron Model
• To begin understanding deep learning, we build up the main ideas in order:
• Single Biological Neuron
• Perceptron
• Multi-layer Perceptron Model
• Deep Learning Neural Network
Perceptron model
• [Figure: stained neurons in the cerebral cortex]
• [Figure: illustration of biological neurons]
Perceptron model

• A perceptron was a form of neural network introduced in 1958 by Frank Rosenblatt.
• "...perceptron may eventually be able to learn, make
decisions, and translate languages."
Perceptron model

• In 1969, Marvin Minsky and Seymour Papert published the book Perceptrons.
• It suggested severe limitations to what perceptrons could do.
• This marked the beginning of what is known as the AI Winter.
Perceptron model
[Figure: a perceptron with inputs x1 and x2 feeding a function f(X), which produces the output y]
Perceptron model
• If f(X) is just a sum, then y = x1 + x2
• We want to adjust some parameter in order to “learn”
• So we add an adjustable weight to each input
• y = x1w1 + x2w2
• We can update the weights to affect y
• But what if an x is zero? Then w won’t change anything!
• So we add in a bias term b to the inputs
[Figure: inputs x1 and x2, scaled by weights w1 and w2, feeding f(X) to produce the output y]
Perceptron model
• y = (x1w1 + b) + (x2w2 + b)
[Figure: each input x1, x2 passes through *w + b before f(X) produces the output y]
• We can expand this to a generalization with n inputs x1, x2, ..., xn
Perceptron model
• We have modeled a biological neuron as a simple perceptron
• Mathematically, our generalization is: $y = \sum_{i=1}^{n} (x_i w_i + b)$
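The weighted sum described above can be sketched in a few lines of Python. This is a minimal illustration, assuming one shared bias b added per input as in the slides; the function name `perceptron_output` and the sample numbers are made up for the example.

```python
def perceptron_output(inputs, weights, bias):
    """Return sum(x_i * w_i + b) over all inputs (one shared bias, as in the slides)."""
    return sum(x * w + bias for x, w in zip(inputs, weights))


# Hypothetical values: two inputs, two weights, one bias.
y = perceptron_output([1.0, 2.0], [0.5, -1.0], 0.25)

# With all inputs at zero the weights have no effect, but the bias still contributes.
y_zero = perceptron_output([0.0, 0.0], [3.0, 4.0], 0.25)
```

Note how `y_zero` depends only on the bias, which is the reason the slides give for adding b.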
Neural Networks
• To build a network of perceptrons, we can connect layers of perceptrons, giving a multi-layer perceptron model
• The outputs of one perceptron are directly fed as inputs to another perceptron.
Neural Networks
• The first layer is the input layer
• The last layer is the output layer.
• Note: This last layer can be more than one neuron
• Layers in between the input and output layers are the hidden layers.
• Hidden layers are difficult to interpret
• Neural networks become “deep neural networks” if they contain 2 or more hidden layers.
Neural Networks

• In principle, the neural network framework can be used to approximate a very broad class of functions.
• Zhou Lu and later on Boris Hanin proved mathematically that neural networks can approximate any convex continuous function.
Activation Functions
Neural Networks

• Recall
• x*w + b
• w implies how much weight or strength to give the incoming input
• b is an offset value, so x*w has to reach a certain threshold before having an effect
Neural Networks
• For example, if b = -10
• x*w + b
• Then the effects of x*w won’t really start to overcome the bias until their product surpasses 10.
• After that, the effect is solely based on the value of w.
Neural Networks
• We also want to set boundaries for the output value:
• z = x*w + b
• Pass z through some activation function to limit its value.
Perceptron model
• Recall our simple perceptron has an f(X)
• If we had a binary classification problem, we would want an output of either 0 or 1.
• z = wx + b
• In this context, we’ll then refer to activation functions as f(z).
• These variables are often capitalized, as f(Z) or X, to denote a tensor input consisting of multiple values.
[Figure: inputs x1 and x2 with weights w1, w2 and bias b feeding f(X) to produce the output y]
Deep Learning
• The simplest networks rely on a basic step function that outputs 0 or 1.
• Regardless of the input value, this always outputs 0 or 1.
• Useful for classification (0 or 1 class).
• This is a very “strong” function: an immediate cut-off at z = 0 splits the output between 0 and 1.
[Plot: step function, output (0 to 1) against z = wx + b]
Deep Learning
• Sigmoid function: $f(z) = \frac{1}{1 + e^{-z}}$, with outputs between 0 and 1
[Plot: sigmoid function, output (0 to 1) against z = wx + b]
Deep Learning
• Hyperbolic Tangent: tanh(z)
• Outputs between -1 and 1 instead of 0 to 1
[Plot: tanh function, output (-1 to 1) against z = wx + b]
Deep Learning
• Rectified Linear Unit (ReLU): This is actually a relatively simple function: max(0,z)
• ReLU has been found to give good performance
[Plot: ReLU function, output against z = wx + b]
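The four activation functions above are short enough to write out directly. The following is a minimal sketch using only the standard library; treating z = 0 as output 1 in the step function is an assumption, since the slides do not say which side of the cut-off zero falls on.

```python
import math


def step(z):
    # Hard cut-off at zero (assumption: z == 0 maps to 1).
    return 1.0 if z >= 0 else 0.0


def sigmoid(z):
    # Squashes any real z into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-z))


def tanh(z):
    # Squashes any real z into the range (-1, 1).
    return math.tanh(z)


def relu(z):
    # max(0, z): zero for negative z, identity otherwise.
    return max(0.0, z)


# Evaluate each function at a few sample values of z = wx + b.
samples = [-2.0, 0.0, 2.0]
table = {f.__name__: [f(z) for z in samples] for f in (step, sigmoid, tanh, relu)}
```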
Multi-Class Activation Functions
Deep Learning

• There are 2 main types of multi-class situations

§ Non-Exclusive Classes
o A data point can have multiple classes/categories assigned to it
o Photos can have multiple tags (e.g. beach, family, vacation, etc…)
§ Mutually Exclusive Classes
o Only one class per data point.
o Photos can be categorized as being in grayscale (black and
white) or full color photos
• Organizing Multiple Classes
§ 1 output node per class.
Neural Networks
• A single output node could output a continuous regression value or binary classification (0 or 1).
Multiclass Classification
● Organizing for Multiple Classes
[Figure: hidden layers feeding an output layer with one node per class: Class One, Class Two, ..., Class N]
Deep Learning
• Organizing Multiple Classes
• We can’t just have categories like “red”, “blue”, “green”, etc...
• Instead we use one-hot encoding
Deep Learning
• Mutually Exclusive Classes

               Label   RED   GREEN   BLUE
Data Point 1   RED     1     0       0
Data Point 2   GREEN   0     1       0
Data Point 3   BLUE    0     0       1
...            ...     ...   ...     ...
Data Point N   RED     1     0       0
Deep Learning
• Non-Exclusive Classes

               Labels   A     B     C
Data Point 1   A,B      1     1     0
Data Point 2   A        1     0     0
Data Point 3   C,B      0     1     1
...            ...      ...   ...   ...
Data Point N   B        0     1     0
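Both encodings in the tables above can be produced with a short helper each. This is a minimal sketch; the function names `one_hot` and `multi_hot` are illustrative and do not come from any particular library.

```python
def one_hot(label, classes):
    """Mutually exclusive classes: exactly one position is set to 1."""
    return [1 if c == label else 0 for c in classes]


def multi_hot(labels, classes):
    """Non-exclusive classes: every assigned class is set to 1."""
    return [1 if c in labels else 0 for c in classes]


colors = ["RED", "GREEN", "BLUE"]
encoded_colors = [one_hot(label, colors) for label in ["RED", "GREEN", "BLUE"]]

tags = ["A", "B", "C"]
encoded_tags = [multi_hot(labels, tags) for labels in [{"A", "B"}, {"A"}, {"C", "B"}]]
```

The rows of `encoded_colors` and `encoded_tags` correspond to Data Points 1 to 3 in the two tables.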
Deep Learning

• Non-exclusive
• Sigmoid function
• Each neuron will output a value between 0 and 1,
indicating the probability of having that class
assigned to it.
Multiclass Classification
• Sigmoid Function for Non-Exclusive Classes
[Figure: hidden layers feeding one sigmoid output node per class, each giving an independent value between 0 and 1, e.g. Class One 0.8, Class Two 0.2, Class N 0.3]
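As a rough sketch of how independent sigmoid outputs turn into tags: each output node's raw score is squashed separately, and every class whose probability clears a cut-off is assigned. The 0.5 threshold, the function name `predict_tags`, and the sample scores are assumptions made for this example; they are not taken from the slides.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_tags(scores, classes, threshold=0.5):
    """Apply a sigmoid to each raw score and keep the classes above the threshold."""
    probabilities = [sigmoid(z) for z in scores]
    return [c for c, p in zip(classes, probabilities) if p >= threshold]


# Hypothetical raw outputs of the last layer for one photo.
classes = ["beach", "family", "vacation"]
assigned = predict_tags([2.0, -1.0, 0.5], classes)
```

Because each probability is computed on its own, the values do not need to sum to one and several classes can be assigned at once.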
Deep Learning

• Mutually Exclusive Classes

• But what do we do when each data point can only have a
single class assigned to it?
• softmax function
Deep Learning
• Mutually Exclusive Classes
• The softmax function calculates the probability distribution of an event over K different events: $\sigma(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}$ for $i = 1, \ldots, K$
• This function will calculate the probabilities of each target class over all possible target classes.
• Each probability will be in the range 0 to 1, and the sum of all the probabilities will be equal to one.
• The model returns the probabilities of each class, and the target class chosen will be the one with the highest probability.
Deep Learning

• Mutually Exclusive Classes

• If we use softmax for multi-class problems you get this
type of output:
• [Red , Green , Blue]
• [ 0.1 , 0.6 , 0.3 ]
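A minimal softmax sketch shows where an output like the one above comes from. Subtracting the maximum score before exponentiating is a common numerical-stability habit added here as an assumption; it does not change the result. The raw scores are made up for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that are each in (0, 1) and sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


classes = ["Red", "Green", "Blue"]
probabilities = softmax([1.0, 3.0, 2.0])  # hypothetical raw scores

# The predicted class is the one with the highest probability.
predicted = classes[probabilities.index(max(probabilities))]
```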
Cost Functions and Gradient Descent
Deep Learning
• The output ŷ is the model’s estimation of what it predicts the label to be.
• So after the network creates its prediction, how do we
evaluate it?
• And after the evaluation how can we update the network’s
weights and biases?
Deep Learning

• First question
• We need to take the estimated outputs of the network
and then compare them to the real values of the label.
• The cost function (often referred to as a loss function)
must be an average so it can output a single value.
Terminology

• We’ll use the following variables:

• y to represent the true value
• a to represent neuron’s prediction
• In terms of weights and bias:
• w*x + b = z
• Pass z into activation function σ(z) = a
Deep Learning
• One very common cost function is the quadratic cost function: $C = \frac{1}{2n} \sum_x \lVert y(x) - a(x) \rVert^2$
• We calculate the difference between the real values y(x) and our predicted values a(x)
• Squaring this does 2 useful things for us: it keeps everything positive and punishes large errors
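To make the quadratic cost concrete, here is a small sketch for scalar predictions. It assumes the common form C = 1/(2n) · Σ (y − a)², where n is the number of samples; the 1/2 factor is a convention that simplifies the derivative, and the sample values are made up.

```python
def quadratic_cost(y_true, y_pred):
    """Average squared error over n samples, with the conventional 1/2 factor."""
    n = len(y_true)
    return sum((y - a) ** 2 for y, a in zip(y_true, y_pred)) / (2 * n)


# Hypothetical true labels y and network predictions a.
cost = quadratic_cost([1.0, 0.0, 1.0], [0.9, 0.2, 0.4])

# A perfect prediction gives zero cost; one large error dominates several small ones.
perfect = quadratic_cost([1.0, 0.0], [1.0, 0.0])
```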
Deep Learning
• We can think of the cost function as a function of four things: $C(W, B, S^r, E^r)$
• W is our neural network's weights, B is our neural network's biases, $S^r$ is the input of a single training sample, and $E^r$ is the desired output of that training sample.
Deep Learning
• Notice how that information was encoded in the simplified notation.
• The a(x) holds information about weights and biases.
Deep Learning
• In a real case, this means we have some cost function C dependent on lots of weights!
• C(w1,w2,w3,....wn)
• How do we find out which weights lead us to the lowest cost?
Deep Learning
• For simplicity, let’s imagine we only had one weight in our cost function, w.
• We want to minimize our loss/cost (overall error).
• Which means we need to figure out what value of w results in the minimum of C(w)
Deep Learning
• A “simple” function C(w)
• What value of w minimizes the cost?
[Plot: cost C(w) against w, with the minimum at wmin]
• We could take the derivative and solve for where it equals 0
• The real cost function will be very complex!
• n-dimensional
• Use gradient descent to solve this problem
Gradient Descent
• Calculate the slope at a point
• Move in the downward direction of the slope.
• Repeat until the slope converges to zero, indicating a minimum.
• We could have changed our step size to find the next point!
• Smaller step sizes take longer to find the minimum.
• Larger steps are faster, but we risk overshooting the minimum!
• This step size is known as the learning rate.
[Plots: successive points stepping down the curve C(w) towards wmin, shown for different step sizes]
Deep Learning
• The learning rate shown in the illustrations was constant (each step size was equal)
• But we can adapt the step size: start with larger steps, then go smaller as we realize the slope gets closer to zero.
• This is known as adaptive gradient descent.
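The effect of the learning rate can be seen on a one-weight toy problem. The sketch below uses plain gradient descent with a constant step size; the cost C(w) = (w − 3)², the starting point, and the three learning rates are assumptions chosen purely for illustration.

```python
def gradient_descent(gradient, w, learning_rate, steps):
    """Repeatedly step against the slope: w <- w - learning_rate * dC/dw."""
    for _ in range(steps):
        w = w - learning_rate * gradient(w)
    return w


# Toy cost C(w) = (w - 3)^2 with its minimum at w = 3, so dC/dw = 2 * (w - 3).
def slope(w):
    return 2.0 * (w - 3.0)


small_steps = gradient_descent(slope, w=0.0, learning_rate=0.01, steps=50)
moderate_steps = gradient_descent(slope, w=0.0, learning_rate=0.1, steps=50)
large_steps = gradient_descent(slope, w=0.0, learning_rate=1.1, steps=50)
```

After 50 steps the small rate is still well short of the minimum, the moderate rate is very close to it, and the large rate overshoots further on every step and moves away from it.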
Deep Learning
• In 2015, Kingma and Ba published their paper: “Adam: A Method for Stochastic Optimization”.
• Adam is a much more efficient way of searching for these minima.
Deep Learning
• Realistically we’re calculating this descent in an n-dimensional space for all our weights.
• When dealing with these N-dimensional vectors (tensors), the notation changes from derivative to gradient.
• This means we calculate ∇C(w1,w2,...wn)
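Deep Learning
• For classification problems, we often use the cross entropy loss function.
• The assumption is that your model predicts a probability distribution p(y=i) for each class i=1,2,…,C.
• For a binary classification this results in: $-\left(y \log(p) + (1 - y) \log(1 - p)\right)$
• For M number of classes > 2: $-\sum_{c=1}^{M} y_{o,c} \log(p_{o,c})$, where $y_{o,c}$ is 1 if class c is the correct label for observation o (0 otherwise) and $p_{o,c}$ is the predicted probability that o belongs to class c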
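A short sketch of both cross entropy forms for a single data point. It assumes the standard definitions, −(y log p + (1 − y) log(1 − p)) for the binary case and −Σ y_c log p_c for one-hot labels over M classes, and it assumes predicted probabilities strictly between 0 and 1 so the logarithm is defined.

```python
import math


def binary_cross_entropy(y, p):
    """y is the true label (0 or 1); p is the predicted probability of class 1."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def categorical_cross_entropy(one_hot_label, probabilities):
    """Only the term for the true class (label 1) contributes to the sum."""
    return -sum(math.log(p) for y, p in zip(one_hot_label, probabilities) if y == 1)


# A confident correct prediction costs little; a confident wrong one costs a lot.
confident_right = binary_cross_entropy(1, 0.9)
confident_wrong = binary_cross_entropy(1, 0.1)

# True class is the second of three, predicted with probability 0.6.
multi_class_loss = categorical_cross_entropy([0, 1, 0], [0.1, 0.6, 0.3])
```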

Backpropagation
• Let’s begin with a very simple network, where each layer only has 1 neuron
• Each input will receive a weight and bias
[Figure: a chain of single-neuron layers with weights and biases w1 + b1, w2 + b2, w3 + b3]
Backpropagation
• This means we have:
• C(w1,b1,w2,b2,w3,b3)
• We’ve already seen how this process propagates forward.
• Let’s start at the end to see the backpropagation.
• Let’s say we have L layers, then our notation becomes:
[Figure: the same chain with layers labelled L-n, ..., L-2, L-1, L]

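Backpropagation
• Focusing on these last two layers, let’s define z = wx + b
• Then applying an activation function we’ll state: a = σ(z)
• This means we have:
• $z^L = w^L a^{L-1} + b^L$
• $a^L = \sigma(z^L)$
• $C_0(\ldots) = (a^L - y)^2$
• We want to understand how sensitive the cost function is to changes in w: $\frac{\partial C_0}{\partial w^L}$
• Using the relationships we already know along with the chain rule: $\frac{\partial C_0}{\partial w^L} = \frac{\partial z^L}{\partial w^L} \frac{\partial a^L}{\partial z^L} \frac{\partial C_0}{\partial a^L}$
• We can calculate the same for the bias terms: $\frac{\partial C_0}{\partial b^L} = \frac{\partial z^L}{\partial b^L} \frac{\partial a^L}{\partial z^L} \frac{\partial C_0}{\partial a^L}$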
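The chain rule for the last layer can be checked numerically on a single neuron. This sketch assumes a sigmoid activation and the cost C0 = (a − y)², so the three factors are ∂z/∂w = a_prev, ∂a/∂z = a(1 − a) and ∂C0/∂a = 2(a − y); all numeric values are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def cost(w, b, a_prev, y):
    """C0 = (a - y)^2 for one neuron with a = sigmoid(w * a_prev + b)."""
    return (sigmoid(w * a_prev + b) - y) ** 2


def cost_gradients(w, b, a_prev, y):
    """Chain rule: dC0/dw = dz/dw * da/dz * dC0/da, and likewise for b."""
    a = sigmoid(w * a_prev + b)
    dz_dw = a_prev          # z = w * a_prev + b
    dz_db = 1.0
    da_dz = a * (1.0 - a)   # derivative of the sigmoid
    dC_da = 2.0 * (a - y)   # derivative of (a - y)^2
    return dz_dw * da_dz * dC_da, dz_db * da_dz * dC_da


w, b, a_prev, y = 0.5, -0.2, 0.8, 1.0
dC_dw, dC_db = cost_gradients(w, b, a_prev, y)

# Compare against a central finite difference as a sanity check.
eps = 1e-6
numeric_dw = (cost(w + eps, b, a_prev, y) - cost(w - eps, b, a_prev, y)) / (2 * eps)
numeric_db = (cost(w, b + eps, a_prev, y) - cost(w, b - eps, a_prev, y)) / (2 * eps)
```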
Backpropagation
• Partial derivative $\frac{\partial C}{\partial w}$: how quickly the cost changes when we change the weights
• $w^l_{jk}$: weight for the connection from the $k^{th}$ neuron in layer $l-1$ to the $j^{th}$ neuron in layer $l$
Backpropagation
• The activation $a^l_j$ of the $j^{th}$ neuron in layer $l$ is related to the activations in layer $l-1$ by the following equation: $a^l_j = \sigma\left(\sum_k w^l_{jk} a^{l-1}_k + b^l_j\right)$
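Assumptions about the cost function
• First assumption: the cost function can be written as an average over the cost functions of the individual training examples
• Second assumption: the cost can be written as a function of the outputs from the neural network
Four Fundamental Equations
Equation 1: Error in the output layer: $\delta^L = \nabla_a C \odot \sigma'(z^L)$
Equation 2: Error in terms of error in the next layer: $\delta^l = ((w^{l+1})^T \delta^{l+1}) \odot \sigma'(z^l)$
Equation 3: Rate of change of the cost w.r.t. any bias in the network: $\frac{\partial C}{\partial b^l_j} = \delta^l_j$
Equation 4: Rate of change of the cost w.r.t. any weight in the network: $\frac{\partial C}{\partial w^l_{jk}} = a^{l-1}_k \delta^l_j$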
Learning Process
• Step 1: Using input x, set the activation a for the input layer.
• z = w x + b
• a = σ(z)
• This resulting a then feeds into the next layer (and so on).
• Step 2: For each layer, compute:
• $z^L = w^L a^{L-1} + b^L$
• $a^L = \sigma(z^L)$
Deep Learning
• Step 3: We compute our error vector:
• $\delta^L = \nabla_a C \odot \sigma'(z^L)$, where ⊙ is the Hadamard (element-wise) product
• $\nabla_a C = (a^L - y)$
• This expresses the rate of change of C with respect to the output activations
• Now let’s write out our error term for a layer in terms of the error of the next layer (since we’re moving backwards).
• Step 4: Backpropagate the error:
• For each layer: L−1, L−2, … we compute
• $\delta^l = (w^{l+1})^T \delta^{l+1} \odot \sigma'(z^l)$, the generalized error for any layer l
• $(w^{l+1})^T$ is the transpose of the weight matrix of layer l+1
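Steps 1 to 4 can be sketched end to end for a tiny fully connected network in plain Python. This is an illustration under stated assumptions: sigmoid activations everywhere, ∇aC = (a − y) (which corresponds to the cost ½‖a − y‖²), weights stored as lists of rows with one row per neuron, and made-up sample values. The function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)


def feedforward(weights, biases, x):
    """Steps 1-2: compute z = w a + b and a = sigmoid(z) layer by layer."""
    activations, zs = [x], []
    for W, b in zip(weights, biases):
        z = [sum(w * a for w, a in zip(row, activations[-1])) + b_j
             for row, b_j in zip(W, b)]
        zs.append(z)
        activations.append([sigmoid(v) for v in z])
    return zs, activations


def backprop(weights, biases, x, y):
    """Steps 3-4: output error, then propagate it backwards through the layers."""
    zs, activations = feedforward(weights, biases, x)
    # Step 3: delta^L = (a^L - y) ⊙ sigmoid'(z^L)
    delta = [(a - t) * sigmoid_prime(z)
             for a, t, z in zip(activations[-1], y, zs[-1])]
    grad_w = [None] * len(weights)
    grad_b = [None] * len(weights)
    for l in range(len(weights) - 1, -1, -1):
        grad_b[l] = delta                                    # dC/db = delta
        grad_w[l] = [[d * a for a in activations[l]] for d in delta]  # dC/dw = a_prev * delta
        if l > 0:
            # Step 4: delta^(l-1) = (W^T delta) ⊙ sigmoid'(z^(l-1))
            W = weights[l]
            delta = [sum(W[j][k] * delta[j] for j in range(len(delta)))
                     * sigmoid_prime(zs[l - 1][k])
                     for k in range(len(zs[l - 1]))]
    return grad_w, grad_b


# Hypothetical 2-2-1 network and one training sample.
weights = [[[0.1, -0.2], [0.4, 0.3]], [[0.5, -0.6]]]
biases = [[0.0, 0.1], [-0.1]]
x, y = [1.0, 0.5], [1.0]
grad_w, grad_b = backprop(weights, biases, x, y)
```

The returned `grad_w` and `grad_b` have the same shapes as `weights` and `biases`, so a gradient descent update can subtract a learning rate times each entry.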
