Mod 2.4,2.5,2.6 Architecture Design

Architecture Design

 One key design consideration for neural networks is determining the architecture.
 The word architecture refers to the overall structure of the network: how many units it should have and how these units should be connected to each other.
 Most neural networks are organized into groups of units called layers.
 Most neural network architectures arrange these layers in a chain structure, with each layer being a function of the layer that preceded it. In this structure, the first and second layers are given respectively as
h^(1) = g^(1)(W^(1)ᵀx + b^(1)) and h^(2) = g^(2)(W^(2)ᵀh^(1) + b^(2)).
 In these chain-based architectures, the main architectural
considerations are to choose the depth of the network and the
width of each layer.
 A network with even one hidden layer is sufficient to fit the
training set.
 Deeper networks are often able to use far fewer units per layer and far fewer parameters, and often generalize better to the test set, but are also often harder to optimize.
 The ideal network architecture for a task must be found via
experimentation guided by monitoring the validation set error.
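As a minimal sketch of this chain structure, the forward pass below wires two small layers together, each a function of the one before it. The layer sizes, the tanh activation, and the random weights are all illustrative assumptions, not prescribed by the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 3 input features, a width-4 hidden layer, 2 outputs.
W1, b1 = rng.standard_normal((4, 3)), np.zeros(4)
W2, b2 = rng.standard_normal((2, 4)), np.zeros(2)

def forward(x):
    h1 = np.tanh(W1 @ x + b1)  # first layer: a function of the input
    h2 = W2 @ h1 + b2          # second layer: a function of the first layer
    return h2

y = forward(np.array([1.0, -2.0, 0.5]))
```

Here the depth is 2 and the widths are 4 and 2; choosing those two numbers is exactly the architectural decision the slide describes.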
Universal Approximation Properties
and Depth
 A linear model, mapping from features to outputs via matrix multiplication, can
by definition represent only linear functions. It has the advantage of being easy to
train.
 But we often want to learn nonlinear functions.
 At first glance, we might presume that learning a nonlinear function requires
designing a specialized model family for the kind of nonlinearity we want to
learn.
 Fortunately, feedforward networks with hidden layers provide a universal
approximation framework.
 The universal approximation theorem means that regardless of
what function we are trying to learn, we know that a large MLP
will be able to represent this function.
 However, we are not guaranteed that the training algorithm will
be able to learn that function.
 Even if the MLP is able to represent the function, learning can fail
for two different reasons.
1. First, the optimization algorithm used for training may not be
able to find the value of the parameters that corresponds to the
desired function.
2. Second, the training algorithm might choose the wrong function
due to overfitting.
 The universal approximation theorem says that there exists
a network large enough to achieve any degree of accuracy
we desire, but the theorem does not say how large this
network will be.
 In summary, a feedforward network with a single hidden layer is sufficient to represent any function, but the layer may be infeasibly large and may fail to learn and generalize correctly.
 In many circumstances, using deeper models can reduce
the number of units required to represent the desired
function and can reduce the amount of generalization error.
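As a concrete illustration of representation (as opposed to learning), a single hidden layer of two ReLU units can represent the nonlinear XOR function exactly. The weights below are hand-chosen following the classic construction, not learned by any training algorithm:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

# Hand-chosen weights: one hidden layer of two ReLU units, linear output.
W = np.array([[1.0, 1.0],
              [1.0, 1.0]])
c = np.array([0.0, -1.0])
w = np.array([1.0, -2.0])

def xor_net(x):
    h = relu(W @ x + c)  # hidden layer
    return w @ h         # linear output layer

X = [np.array([0.0, 0.0]), np.array([0.0, 1.0]),
     np.array([1.0, 0.0]), np.array([1.0, 1.0])]
outputs = [float(xor_net(x)) for x in X]  # XOR of the two inputs
```

A linear model cannot represent XOR at all, which is why the single hidden layer of nonlinear units matters here.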
Other Architectural Considerations
 In practice, neural networks show considerably more diversity.
 Many neural network architectures have been developed for specific tasks.
1. Simulating Basic Machine Learning with Shallow Models
Most of the basic machine learning models like linear regression,
classification, support vector machines, logistic regression, singular value
decomposition, and matrix factorization can be simulated with shallow
neural networks containing no more than one or two layers.
2. Convolutional Neural Networks
3. Recurrent Neural Networks
4. Restricted Boltzmann Machines
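For instance, logistic regression corresponds to a single sigmoid unit: a linear combination of the inputs followed by a sigmoid nonlinearity. A minimal sketch (the weight values are arbitrary placeholders, not fitted coefficients):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Logistic regression as a one-unit, one-layer "network".
w = np.array([0.5, -0.25])  # illustrative weights
b = 0.1                     # illustrative bias

def predict_proba(x):
    return sigmoid(w @ x + b)  # probability of the positive class

p = predict_proba(np.array([1.0, 2.0]))
```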
Training a Neural Network with Backpropagation

 Backpropagation is one of the most important concepts in neural networks.
 Our task is to classify the data as well as possible.
 To do this, we have to update the weight and bias parameters. In the linear regression model, we use gradient descent to optimize the parameters.
 Similarly, here we also use the gradient descent algorithm, with the gradients computed by backpropagation.
 For a single training example, the backpropagation algorithm calculates the gradient of the error function.
Training a Neural Network with Backpropagation

The backpropagation algorithm contains two main phases, referred to as the forward and
backward phases, respectively.
 
1. Forward phase: In this phase, the inputs for a training instance are fed into the neural
network. This results in a forward cascade of computations across the layers, using the current
set of weights. The final predicted output can be compared to that of the training instance and
the derivative of the loss function with respect to the output is computed. The derivative of this
loss now needs to be computed with respect to the weights in all layers in the backwards phase.
2. Backward phase: The main goal of the backward phase is to learn the gradient of the loss
function with respect to the different weights by using the chain rule of differential calculus.
These gradients are used to update the weights. Since these gradients are learned in the
backward direction, starting from the output node, this learning process is referred to as the
backward phase.
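The two phases can be sketched for a tiny two-layer network with a squared-error loss. This is a hypothetical minimal example with scalar weights; real implementations vectorize over whole layers and batches:

```python
import numpy as np

# Tiny network: scalar input -> one tanh hidden unit -> linear output.
w1, w2 = 0.5, -0.3          # current weights
x, target = 1.0, 1.0        # one training instance

# Forward phase: cascade of computations using the current weights.
a = w1 * x                  # pre-activation of the hidden unit
h = np.tanh(a)              # hidden activation
y = w2 * h                  # predicted output
loss = 0.5 * (y - target) ** 2

# Backward phase: chain rule, starting from the output node.
dL_dy = y - target             # derivative of loss w.r.t. the output
dL_dw2 = dL_dy * h             # gradient for the output weight
dL_dh = dL_dy * w2             # propagate back through the output layer
dL_da = dL_dh * (1 - h ** 2)   # through tanh: d tanh(a)/da = 1 - tanh(a)^2
dL_dw1 = dL_da * x             # gradient for the first-layer weight
```

Note how each backward line multiplies a local derivative onto the gradient flowing from the layer above: that product of local gradients along the path from output to weight is the chain rule in action.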
Training a Neural Network
with Backpropagation
 In the single-layer neural network, the training process is relatively
straightforward because the error (or loss function) can be computed as a direct
function of the weights, which allows easy gradient computation.
 In the case of multi-layer networks, the problem is that the loss is a complicated
composition function of the weights in earlier layers. The gradient of a
composition function is computed using the backpropagation algorithm.

 The backpropagation algorithm leverages the chain rule of differential calculus, which computes the error gradients in terms of summations of local-gradient products over the various paths from a node to the output.
 Backpropagation algorithms are a set of methods used to efficiently train
artificial neural networks following a gradient descent approach which
exploits the chain rule.
Illustration of chain rule in
computational graphs
Example to understand how exactly backpropagation updates the weights.
Gradient Descent

 Let’s visualize how we might minimize the squared error over all of the training examples.
 Imagine a three-dimensional space where the horizontal dimensions correspond to
the weights w1 and w2, and the vertical dimension corresponds to the value of the
error function E.
 In this space, points in the horizontal plane correspond to different settings of the
weights, and the height at those points corresponds to the incurred error.
 If we consider the errors we make over all possible weights, we get a surface in
this three-dimensional space, in particular, a quadratic bowl.
Gradient Descent

The quadratic error surface for a linear neuron


 We can also conveniently visualize this surface as a set of elliptical contours, where the minimum error is at the center of the ellipses.
 Here we are working in a two-dimensional plane where the dimensions correspond to the two weights. Contours correspond to settings of w1 and w2 that evaluate to the same value of E.
 The closer the contours are to each other, the steeper the slope.
 In fact, it turns out that the direction of steepest descent is always perpendicular to the contours.
 This direction is expressed as a vector known as the gradient.
Visualizing the error surface as a set of contours
How to find the values of the weight that minimizes
the error function?
 Suppose we randomly initialize the weights of our network so we find
ourselves somewhere on the horizontal plane.
 By evaluating the gradient at our current position, we can find the direction
of steepest descent, and we can take a step in that direction.
 Then we’ll find ourselves at a new position that’s closer to the minimum
than we were before. We can reevaluate the direction of steepest descent by
taking the gradient at this new position and taking a step in this new
direction.
 Following this strategy will eventually get us to the point of minimum
error.
 This algorithm is known as gradient descent.
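The strategy above can be sketched on a simple quadratic bowl, E(w1, w2) = w1² + w2². This surface is a stand-in assumption; in practice the error surface comes from the data, but the step rule is the same:

```python
import numpy as np

def grad_E(w):
    # Gradient of E(w1, w2) = w1^2 + w2^2; it points perpendicular to the
    # elliptical contours, in the steepest-ascent direction.
    return 2 * w

w = np.array([3.0, -4.0])    # a "random" starting point on the plane
eta = 0.1                    # learning rate

for _ in range(100):
    w = w - eta * grad_E(w)  # step in the steepest-descent direction

# w is now very close to the minimum at (0, 0).
```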
Gradient Descent(GD)
 Gradient descent is an optimization algorithm which is commonly used to train machine learning models and neural networks. Until the gradient is close to or equal to zero, the model will continue to adjust its parameters to yield the smallest possible error.
 Gradient descent is simply used to find the values of a
function's parameters (coefficients) that minimize a cost
function as far as possible.
 Gradient descent is best used when the parameters cannot be
calculated analytically (e.g. using linear algebra) and must be
searched for by an optimization algorithm.
 To start finding the right values, we initialize w and b with some random numbers.
 Gradient descent then starts at that point and takes one step after another in the direction of steepest descent until it reaches the point where the cost function is as small as possible.
 How big the steps gradient descent takes in the direction of the local minimum is determined by the learning rate (another hyperparameter), which determines how fast or slow we move towards the optimal weights.
 Picking the learning rate is a hard problem. If we pick a learning rate that’s too small, we risk taking too long during the training process. But if we pick one that’s too big, we’ll most likely start diverging away from the minimum.
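This effect can be seen on the toy cost E(w) = w², whose gradient is 2w. With a small learning rate progress is slow, while a rate above 1.0 makes each step overshoot the minimum and diverge (the specific rates below are illustrative):

```python
def steps(w, eta, n=20):
    # n gradient-descent steps on E(w) = w^2 (gradient 2w).
    for _ in range(n):
        w = w - eta * 2 * w
    return w

slow = steps(10.0, 0.01)     # too small: still far from the minimum at 0
good = steps(10.0, 0.3)      # reasonable: very close to the minimum
diverged = steps(10.0, 1.1)  # too large: each step overshoots and grows
```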
Gradient Descent Algorithm

1. Randomly initialize the weights w.

2. Compute the gradient G as the derivative of the cost function J(w) with respect to the weights.

3. Weight update equation: w_new = w_old − ηG.

(Here, η is the learning rate, which should be neither so high that we skip over the minimum nor so low that we never converge to it. If we compute the gradient of the loss function with respect to our weights and take small steps in the opposite direction of the gradient, our loss will gradually decrease until it converges to some local minimum.)

4. Repeat steps 2 to 3 until convergence, meaning until w_new is approximately equal to w_old.
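The four steps might look like this for a one-parameter least-squares cost J(w) = Σ(wx − y)² over a tiny made-up dataset (all values here are illustrative assumptions):

```python
import numpy as np

# Tiny illustrative dataset where y = 2x exactly, so the optimum is w = 2.
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])

def gradient(w):
    # Derivative of J(w) = sum((w*x - y)^2) with respect to w.
    return np.sum(2 * (w * x - y) * x)

eta = 0.01                                       # learning rate
w = np.random.default_rng(0).standard_normal()   # step 1: random init

while True:
    w_new = w - eta * gradient(w)  # steps 2-3: gradient and weight update
    if abs(w_new - w) < 1e-9:      # step 4: stop when w_new ≈ w_old
        break
    w = w_new
```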
Gradient descent

It is an iterative process to find the parameters (or the weights) that converge to the optimum solution, which is where the loss function is minimized.
If gradient descent is working properly, the cost function should decrease after every iteration.
When gradient descent can’t decrease the cost function anymore and it remains more or less at the same level, it has converged.
