What is Gradient Based Learning in Deep Learning
Basically, deep feed-forward networks (I will use the abbreviation DFN for the
rest of the article) are neural networks that only feed the input forward through a
function, let’s say f*, and only forward. There is no feedback mechanism in a
DFN. There are indeed networks that do feed information back from the output;
those are called recurrent neural networks (I am also planning to write about
those later).
DFNs are really useful in many areas. One of the best-known applications of AI
is so-called object recognition. It uses convolutional neural networks, a
special kind of feed-forward network.
applications of a DFN
What is happening in the kitchen of this business anyways?
Let’s get back to our function f* and dive deeper. As I previously mentioned, we have
a function y = f*(x) that we would like to approximate with a function f(x) by doing some
calculations. Clearly our input is x, we feed it through our function f*, and we
get our result y. This is simple. But how does this help us implement such
functionality in the field? Think of y as the outputs, the categories we want to
assign to inputs x. A DFN defines a mapping y = f(x; θ), learns the values of the
parameters θ, and maps the input x to the categories of y.
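To make the mapping y = f(x; θ) concrete, here is a minimal NumPy sketch of a one-layer classifier; the weight matrix W and bias b stand in for θ, and the layer sizes and random values are made-up illustrations, not a real trained model.

import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def f(x, W, b):
    # y = f(x; theta) with theta = (W, b): a score per category, turned into probabilities.
    return softmax(x @ W + b)

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))   # 4 input features, 3 categories (arbitrary sizes)
b = np.zeros(3)
x = rng.normal(size=(1, 4))   # one example with 4 features

probs = f(x, W, b)
print(probs, probs.argmax())  # class probabilities and the predicted category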
As you can observe from the picture above, a DFN consists of many layers. You
can think of those layers as composite functions, for example
f(x) = f1(f2(f3(x)))
This function has a depth of 3. The name deep learning is derived
from this terminology. The output layer is the outermost function, which is f1 in
this case. These chain structures are the most commonly used structures. The more
composite the function gets, the more layers it has. Some of those layers are
called hidden layers. There is a reason they are called hidden layers:
the learning algorithm itself figures out how to use these layers in
order to achieve the best approximation of f*, and we don’t interfere with the
process. Namely, the training data doesn’t individually say what each layer should
do. In addition, the dimension of the hidden layers defines the width of the
network. These layers also consist of units, or nodes, whatever you want to call them.
“Each unit resembles a neuron in the sense that it receives input from many other
units and computes its own activation value.” — Goodfellow et al., Deep Learning
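As a small illustration of this chain structure, here is a sketch in Python in which three toy layer functions are composed into one network of depth 3, with f1 as the output layer; the operations inside each layer are made up for illustration.

import numpy as np

def f3(x):                        # innermost layer: sees the raw input first
    return np.tanh(x)

def f2(h):                        # hidden layer
    return np.maximum(0.0, h)     # a ReLU, just as an example

def f1(h):                        # output layer: the outermost function
    return h.sum()

def f(x):
    # The depth-3 chain from the text: f(x) = f1(f2(f3(x)))
    return f1(f2(f3(x)))

print(f(np.array([0.5, -1.0, 2.0])))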
Units of a Model
Each element of the vector is viewed as a neuron. It’s easier to think of them as
units operating in parallel, each receiving inputs from many other units and
computing its own activation value, rather than as a vector-to-vector function.
retrieved from https://towardsdatascience.com/an-introduction-to-deep-feedforward-neural-networks-1af281e306cd
Activation Function
The main idea behind activation functions is to define a threshold such that if
something is likely to happen we accept it, and otherwise we reject it. Namely,
we need functions that jump from 0 to 1 in a very tight manner. There are several
functions and techniques that do this job (a short code sketch of them follows the list):
1. Binary-Step Function
2. Linear Function
3. Sigmoid Function*
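Here is a minimal sketch of these three activations in NumPy; the threshold of 0 in the binary step and the slope of 1 in the linear function are just illustrative choices.

import numpy as np

def binary_step(x, threshold=0.0):
    # Jumps from 0 to 1 exactly at the threshold.
    return np.where(x >= threshold, 1.0, 0.0)

def linear(x, slope=1.0):
    # Passes the input straight through (scaled); no squashing at all.
    return slope * x

def sigmoid(x):
    # Smoothly squashes any real number into (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(binary_step(x), linear(x), sigmoid(x), sep="\n")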
Sigmoid Function
A plot of the sigmoid function is shown in fig 2.1. This function was used heavily
in the past. Nowadays, however, other activation functions are used more
frequently in the field, one of which is ReLU.
The derivative of ReLU is not defined at 0, since the function has a break
point there. So we simply take the derivative to be 0 or 1 at that point (fig 2.4).
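A quick sketch of ReLU and this derivative convention (here I arbitrarily return 0 at the break point; returning 1 is just as common):

import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def relu_derivative(x):
    # Undefined at exactly 0; by convention we return 0 there (1 is also used).
    return np.where(x > 0, 1.0, 0.0)

x = np.array([-1.5, 0.0, 2.0])
print(relu(x), relu_derivative(x))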
Some Architectural Considerations
I shall begin this part of the article by explaining what architecture means in a
deep feed-forward network. “It refers to the overall structure of the network: how many
units it should have and how these units should be connected to each other”
(Goodfellow et al., 2016). Most neural networks are organized into chains of layers, such as
h1 = g_1(weight_1 * x + bias_1)
h2 = g_2(weight_2 * h1 + bias_2)
The layering process can go on for as long as we want, but you get the idea. In chain-
based architectures, the main architectural consideration is to pick the optimal depth
and width of the layers. The ideal values for depth and width are found
through careful observation and experiment.
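To make the layer equations above concrete, here is a minimal NumPy sketch of such a two-layer network; the layer sizes, the choice of ReLU and sigmoid for g_1 and g_2, and the random weights are all illustrative assumptions.

import numpy as np

rng = np.random.default_rng(42)

# Width of each layer (arbitrary choices for illustration).
n_in, n_hidden, n_out = 3, 4, 1

weight_1, bias_1 = rng.normal(size=(n_hidden, n_in)), np.zeros(n_hidden)
weight_2, bias_2 = rng.normal(size=(n_out, n_hidden)), np.zeros(n_out)

g_1 = lambda z: np.maximum(0.0, z)          # ReLU for the hidden layer
g_2 = lambda z: 1.0 / (1.0 + np.exp(-z))    # sigmoid for the output layer

x = rng.normal(size=n_in)
h1 = g_1(weight_1 @ x + bias_1)   # first (hidden) layer
h2 = g_2(weight_2 @ h1 + bias_2)  # output layer
print(h2)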
Maximizing likelihood means finding the parameter values that make the observed data most probable. This is
often expressed as
θ* = argmax_θ Σ_i log p(y_i | x_i; θ)
For binary classification, the output unit is typically the sigmoid σ(z) = 1 / (1 + e^(−z)). This function maps any
input to a value between 0 and 1, ideal for binary classification.
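As a sketch of what that means for a binary classifier: maximizing the likelihood is the same as minimizing the negative log-likelihood (binary cross-entropy). The tiny dataset and single parameter below are made up purely for illustration.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def negative_log_likelihood(theta, x, y):
    # Probability the model assigns to class 1 for each example.
    p = sigmoid(theta * x)
    # Maximizing likelihood == minimizing this sum (binary cross-entropy).
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Toy data: positive x tend to be class 1, negative x class 0.
x = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([0, 0, 1, 1])

for theta in (0.1, 1.0, 5.0):
    print(theta, negative_log_likelihood(theta, x, y))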
Optimizers are algorithms designed to minimize the cost function. They play a critical role in Gradient-based
learning by updating the weights and biases of the network based on the calculated gradients.
Intuition Behind Optimizers with an Example
Consider a hiker trying to find the lowest point in a valley. They take steps proportional to the steepness of the
slope. In deep learning, the optimizer works similarly, taking steps in the parameter space proportional to the
gradient of the loss function.
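In code, basic gradient descent mirrors the hiker: each update moves the parameters downhill by a step proportional to the gradient. Here is a minimal sketch on a simple one-parameter quadratic loss (the loss function and learning rate are illustrative choices, not a prescription).

def loss(theta):
    return (theta - 3.0) ** 2          # a simple "valley" with its lowest point at theta = 3

def gradient(theta):
    return 2.0 * (theta - 3.0)         # slope of the valley at theta

theta = -5.0          # where the hiker starts
learning_rate = 0.1   # how big each step is relative to the steepness

for step in range(50):
    theta -= learning_rate * gradient(theta)   # step downhill, proportional to the slope

print(theta, loss(theta))   # theta ends up close to 3, the bottom of the valley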
The journey of mastering Gradient-based learning in deep learning has its challenges. Each optimizer in deep
learning faces unique hurdles that can impact the learning process:
Learning Rate Dilemmas: One of the foremost challenges in Gradient-based learning is selecting the optimal
learning rate. A rate too high can cause the model to oscillate or even diverge, missing the minimum.
Conversely, a rate too low leads to painfully slow convergence, increasing computational costs.
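As a small illustration of both failure modes, here is a sketch on the same quadratic loss used in the hiker example above; the learning rates are arbitrary but representative.

def gradient(theta):
    return 2.0 * (theta - 3.0)   # gradient of the quadratic loss (theta - 3)^2

def run(learning_rate, steps=20):
    theta = -5.0
    for _ in range(steps):
        theta -= learning_rate * gradient(theta)
    return theta

print(run(1.1))    # too high: each step overshoots and theta moves ever farther from 3
print(run(0.001))  # too low: after 20 steps theta has barely moved from -5 toward 3
print(run(0.1))    # a reasonable rate: theta ends up close to 3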
Local Minima and Saddle Points: These are areas in the cost function where the gradient is zero, but they
are not the global minimum. In high-dimensional spaces, common in deep learning, these points become more
prevalent and problematic. This issue is particularly challenging for certain types of optimizers in deep
learning, as some may get stuck in these points, hindering effective learning.
Vanishing and Exploding Gradients: A notorious problem in deeper networks. With vanishing gradients, as
the error is back-propagated to earlier layers, the gradient can become so small that it has virtually no effect,
stopping the network from learning further. Exploding gradients occur when large error gradients accumulate,
causing large updates to the network weights, leading to an unstable network. These issues are a significant
concern for Gradient-based learning.
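A rough sketch of the vanishing-gradient effect: back-propagating through many sigmoid layers multiplies the gradient by at most 0.25 per layer (the sigmoid's maximum derivative), so it shrinks roughly geometrically with depth. The depth of 20 below is an arbitrary illustration.

SIGMOID_DERIVATIVE_MAX = 0.25   # the sigmoid's derivative never exceeds 1/4

gradient = 1.0
for layer in range(20):          # back-propagating through 20 sigmoid layers
    gradient *= SIGMOID_DERIVATIVE_MAX

print(gradient)   # about 9e-13: the earliest layers receive essentially no learning signal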
Plateaus: A plateau is a flat region of the cost function. When using Gradient-based learning, the learning
process can slow down significantly on plateaus, making it difficult to reach the minimum.
Choosing the Right Optimizer: With various types of optimizers in deep learning, such as Batch Gradient
Descent, Stochastic Gradient Descent (SGD), and Mini-batch Gradient Descent (MB-GD), selecting the right
one for a specific problem can be challenging. Each optimizer has its strengths and weaknesses, and the choice
can significantly impact the efficiency and effectiveness of the learning process.
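The three variants named here differ mainly in how much data each update sees. The sketch below shows that difference only; compute_gradient, the dataset, and the learning rate are all hypothetical placeholders, not a real training setup.

import numpy as np

# Hypothetical placeholders: a gradient function, a dataset, and a parameter vector.
def compute_gradient(params, X_part, y_part):
    # Stand-in for the gradient of the loss over just this slice of the data.
    return 2 * (params - y_part.mean())

def update(params, X_part, y_part, lr=0.01):
    return params - lr * compute_gradient(params, X_part, y_part)

X, y = np.random.randn(100, 3), np.random.randn(100)
params = np.zeros(3)

# Batch Gradient Descent: one update per pass, using the entire dataset.
params = update(params, X, y)

# Stochastic Gradient Descent (SGD): one update per single example.
for i in range(len(X)):
    params = update(params, X[i:i + 1], y[i:i + 1])

# Mini-batch Gradient Descent (MB-GD): one update per small batch of, say, 10 examples.
for start in range(0, len(X), 10):
    params = update(params, X[start:start + 10], y[start:start + 10])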
Hyperparameter Tuning: In Gradient-based learning, hyperparameters like learning rate, batch size, and the
number of epochs need careful tuning. This process can be time-consuming and requires both experience and
experimentation.
Computational Constraints: Deep learning models can be computationally intensive, particularly those that
leverage complex Gradient-based learning techniques. This challenge becomes more pronounced when
dealing with large datasets or real-time data processing.
Adapting to New Data Types and Structures: As deep learning evolves, new data types and structures
emerge, requiring adaptability and innovation in Gradient-based learning methods.
These challenges highlight the complexity and dynamic nature of Gradient-based learning in deep learning.
Overcoming them requires a deep understanding of both theoretical concepts and practical implementations, often
covered in depth in our Certified Deep Learning Course.