UNIT-1 Foundations of Deep Learning


UNIT-I

Introduction to Deep Learning and Neural Networks: What is Deep Learning?, Design Issues, Applications and Types of DL Models. Artificial Neural Network – Basic Architecture, Non-linear model of a neuron, Network Architectures – Single Layer Feed Forward, Multi Layer Feed Forward and Recurrent Networks. Training a Neural Network – Activation Functions: Step, Linear, Sigmoid, Tanh, ReLU, Leaky ReLU and Softmax. Back Propagation, loss and cost functions – Mean Absolute Error, Mean Squared Error, Binary Cross Entropy and Categorical Cross Entropy. Optimizers – Gradient Descent, Stochastic Gradient Descent, Mini-Batch Gradient Descent, Stochastic Gradient Descent with Momentum, Adagrad, AdaDelta, Adam.
Introduction to Deep Learning
What is deep learning?
Deep learning is a way of classifying, clustering, and predicting things by using a neural network that has been trained on vast amounts of data.
Deep learning is a technique that mimics the human brain.
“Deep learning is a subset of machine learning that teaches computers to do what comes naturally to humans: learn by example.” i.e., it makes the machine learn the way a human learns.

 Deep learning has its roots in neural networks.
 Neural networks are sets of algorithms, modeled loosely after the human brain, that are designed to recognize patterns.
 Deep learning creates many layers of neurons, attempting to learn structured representations of big data, layer by layer.
 A key advantage of deep learning networks is that they often continue to improve as the size of your data increases.
Design Challenges or Design Issues of Deep Learning
 In machine learning, we manually choose features and a classifier to sort images. With deep learning, the feature extraction and modeling steps are automatic.
 A successful deep learning application requires a very large amount of data (thousands of images) to train the model, as well as GPUs (graphics processing units) to rapidly process the data. Having a high-performance GPU means the model will take less time to analyze all those images.
1. Data dependencies: When the data is small, deep learning algorithms do not perform well. This is why deep learning algorithms need a large amount of data to learn the problem properly.
2. Hardware dependencies: Deep learning generally depends on high-end machines; GPUs are an integral part of its working because deep learning models perform a large number of matrix multiplication operations.
3. Feature engineering: Domain knowledge is put into the creation of feature extractors to reduce the complexity of the data and make the patterns more visible to the learning algorithm. This is difficult, time consuming and requires expertise.
4. Execution time: Deep learning takes more time to train than machine learning, mainly because a deep learning algorithm has so many parameters.

Why Deep Learning:


1. Exponential growth of data from social media, YouTube, smartphones, etc. With this, complex use cases like recommendation systems, face detection and so on can be implemented using deep learning models. The accuracy is higher with deep learning models.
2. Technology upgrades in terms of software and hardware. Cloud data centers and NVIDIA provide huge GPU capacity for processing.
3. Feature extraction (feature engineering) is part of the deep learning model, unlike machine learning models.
4. Solves complex problem statements like image classification, object detection, NLP tasks, chatbots, etc.
Deep learning applications
 Self Driving Cars
 News Aggregation and Fraud News Detection
 Natural Language Processing
 Virtual Assistants
 Entertainment
 Visual Recognition
 Fraud Detection
 Healthcare
 Personalisations
 Automatic Machine Translation
 Automatic Handwriting Generation
 Automatic Game Playing
 Language Translations
 Pixel Restoration
 Photo Descriptions
 Demographic and Election Predictions
 Voice Controlled Assistance
 Automatic Image Caption Generation

Types of Deep Learning Models


1. ANN – Artificial Neural Network
o Works on tabular input data
o Every ML task can be solved using an ANN
o Applications solved using ANN include classification, pattern recognition, pattern completion, data mining (discovering hidden data), etc.
2. CNN - Convolution Neural Network
o Works on Images or Video as input data
o Video is a combination of multiple frames or images
with sound.
o Applications solved using CNN are
 Image classification
 Object Detection
 Object Segmentation
 Object Tracking
 Face recognition
3. RNN – Recurrent Neural Network
o Works with input data that is text, time series data (tracking daily, hourly, or weekly weather data, tracking changes in application performance, medical devices visualizing vitals in real time, tracking network logs, etc.) or other sequential data.
o Applications solved using RNN can be
 NLP tasks like
 Prediction problems
 Machine Translation
 Speech Recognition
 Generating Image Descriptions
 Video Tagging
 Text Summarization
 Call Center Analysis
 Text to Speech conversion
Artificial Neural Network
Basic Architecture of ANN:

Input Layer: This layer accepts the input features. It provides information from the outside world to the network; no computation is performed at this layer, its nodes just pass on the information (features) to the hidden layer.
Hidden Layer: Nodes of this layer are not exposed to the outer world; they are part of the abstraction provided by any neural network. The hidden layer performs all sorts of computations on the features entered through the input layer and transfers the result to the output layer.
Output Layer: This layer brings up the information learned by
the network to the outer world.
ANNs consist of artificial neurons, each artificial neuron has a
processing node (‘body’) represented by circles in the figure as
well as connections from (‘dendrites’) and connections to
(‘axons’) other neurons which are represented as arrows in
the figure.
In a commonly used ANN architecture, the multilayer
perceptron, the neurons are arranged in layers. An ordered
set (a vector) of predictor variables is presented to the input
layer.
Each neuron of the input layer distributes its value to all of
the neurons in the middle layer (hidden layer).
Along each connection between input and middle neurons
there is a connection weight so that the middle neuron
receives the product of the value from the input neuron and
the connection weight.
Each neuron in the middle layer takes the sum of its weighted inputs and then applies a non-linear (activation) function to the sum. The result of this function then becomes the output from that particular middle neuron.
Each middle neuron is connected to the output neuron. Along
each connection between a middle neuron and the output
neuron there is a connection weight.
In the final step, the output neuron takes the weighted sum of
its inputs and applies the non-linear function to the weighted
sum.
The result of this function becomes the output for the entire
ANN.
If the network performs multi-class classification, the output layer has more neurons (one per class).
Non Linear Model of Neuron:
A neuron is an information processing unit that is fundamental to
the operation of a neural network.
The following model of a neuron forms the basis for designing artificial neural networks.
The three basic elements of the neural model are
1. A set of synapses or connecting links, each of which is characterized by a weight or strength of its own.
2. An adder for summing the input signals, weighted by the respective synaptic weights of the neuron; the operations described here constitute a linear combiner.
3. An activation function for limiting the amplitude of the output of a neuron, also called a squashing function. The activation function is used to activate the neuron.
The normalized range of the output of a neuron is [0, 1] or [-1, 1].

 The neural model also includes an externally applied bias. The bias has the effect of increasing or lowering the net input of the activation function (depending on whether the bias is positive or negative).
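A minimal Python sketch of this neuron model (assuming NumPy and a sigmoid squashing function; the input, weight and bias values are illustrative only):

```python
import numpy as np

def sigmoid(v):
    # Squashing function: limits the output amplitude to the range (0, 1)
    return 1.0 / (1.0 + np.exp(-v))

def neuron_output(x, w, b):
    # 1. Synapses: each input x_i is scaled by its synaptic weight w_i
    # 2. Adder (linear combiner): the weighted inputs are summed and the bias is added
    v = np.dot(w, x) + b
    # 3. Activation function limits the amplitude of the output
    return sigmoid(v)

# Illustrative values for a neuron with three inputs
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.4, 0.3, -0.2])
b = 0.1
print(neuron_output(x, w, b))
```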
Neural Network Architectures
The way in which the neurons of a neural network are structured is linked with the learning algorithm used to train the network.
The various structures are

1. Single-Layer Feed forward Networks


 In a layered neural network, the neurons are organized in
the form of layers.
 In the simplest form of a layered network, we have an
input layer of source nodes that projects directly onto an
output layer of neurons (computation nodes).
 This network is strictly of a feedforward type.
 The case of four nodes in both the input and output layers, shown below, is called a single-layer network.
 The designation “single-layer” refers to the output layer of computation nodes (neurons).
 We do not count the input layer of source nodes because no computation is performed there.
2. Multilayer Feed forward Networks
 A multilayer feedforward network has one or more hidden layers, whose computation nodes are correspondingly called hidden neurons or hidden units.
 The function of hidden neurons is to intervene between the
external input and the network output in some useful
manner.
 By adding one or more hidden layers, the network is enabled
to extract higher-order statistics from its input.
 Due to the extra set of synaptic connections and the extra
dimension of neural interactions the learning process is more
accurate.
 The source nodes in the input layer of the network supply respective elements of the activation pattern (input vector), which constitute the input signals applied to the neurons (computation nodes) in the second layer (i.e., the first hidden layer). The output signals of the second layer are used as inputs to the third layer, and so on for the rest of the network.
 Typically, the neurons in each layer of the network have as
their inputs the output signals of the preceding layer only. The
set of output signals of the neurons in the output (final) layer
of the network constitutes the overall response of the
network to the activation pattern supplied by the source
nodes in the input (first) layer.
 The architectural graph of a multilayer feedforward neural
network for the case of a single hidden layer is shown below.
It is referred to as a 10–4–2 network because it has 10 source
nodes, 4 hidden neurons, and 2 output neurons.
 In general, a feedforward network with m source nodes, h1
neurons in the first hidden layer, h2 neurons in the second
hidden layer, and q neurons in the output layer is referred to
as an m–h1–h2–q network.
The neural network in the above figure is said to be fully connected in the sense that every node in each layer of the network is connected to every other node in the adjacent forward layer.
If, however, some of the communication links (synaptic connections) are missing from the network, we say that the network is partially connected.
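A minimal sketch of the forward pass through such a fully connected 10–4–2 feedforward network (assuming NumPy, sigmoid activations in both layers, and randomly initialized weights purely for illustration):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

rng = np.random.default_rng(0)

# 10 source nodes -> 4 hidden neurons -> 2 output neurons (a 10-4-2 network)
W1, b1 = rng.normal(size=(4, 10)), np.zeros(4)   # input -> hidden weights and biases
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)    # hidden -> output weights and biases

x = rng.normal(size=10)      # activation pattern (input vector) from the source nodes
h = sigmoid(W1 @ x + b1)     # output signals of the hidden layer
y = sigmoid(W2 @ h + b2)     # overall response of the network
print(y)
```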
3. Recurrent Networks
A recurrent neural network distinguishes itself from a feedforward neural network in that it has at least one feedback loop.
For example, a recurrent network may consist of a single layer of neurons with each neuron feeding its output signal back to the inputs of all the other neurons.
In the structure below there are no self-feedback loops in the network; self-feedback refers to a situation where the output of a neuron is fed back into its own input. The network also has no hidden neurons.
A recurrent network may also contain hidden neurons.
In that case the feedback connections originate from the hidden neurons as well as from the output neurons.
The presence of feedback loops has a profound impact on the learning capability of the network and on its performance.
Moreover, the feedback loops involve the use of particular branches composed of unit-time delay elements (denoted by z⁻¹), which result in a nonlinear dynamic behavior, assuming that the neural network contains nonlinear units.
Activation Functions
An artificial neuron calculates the ‘weighted sum’ of its inputs and
adds a bias, as shown in the figure below by the net input.

 Now the value of the net input can be anything from -inf to +inf.
 The neuron doesn’t really know how to bound the value and thus is not able to decide the firing pattern.
 The activation function is an important part of an artificial neural network. It basically decides whether a neuron should be activated or not. Thus it bounds the value of the net input.
 The activation function is a non-linear transformation that we
do over the input before sending it to the next layer of
neurons or finalizing it as output.
 Activation function is also called “Transfer Function”.
Different types of activation functions are used in Deep Learning.
1. Step Function:
Step Function is one of the simplest kinds of activation functions. In this, we consider a threshold value, and if the value of the net input, say y, is greater than the threshold then the neuron is activated.
Mathematically,
f(y) = 1 if y ≥ threshold, 0 otherwise
Graphically, the step function is a flat line at 0 that jumps to 1 at the threshold.
2. Linear Function:
 Linear function has the equation similar to as of a straight
line i.e. y = ax.
 No matter how many layers we have, if all are linear in
nature, the final activation function of last layer is nothing
but just a linear function of the input of first layer.
 Range of output is -inf to +inf
 Linear activation function is used at just one place i.e.
output layer.
 If we differentiate the linear function, the result no longer depends on the input “x”; the function becomes constant, so it won’t introduce any ground-breaking behavior to our algorithm.
 Y = cx; the derivative with respect to x is c. That means the gradient has no relationship with x; it is a constant gradient and the descent is going to be on a constant gradient.
 Eg: Calculating the price of a house is a regression problem. The house price may have any big/small value, so we can apply linear activation at the output layer.
3. Sigmoid Function:
 Sigmoid function is a widely used activation function.
 It is a function which is plotted as ‘S’ shaped graph.
 Equation: A = 1 / (1 + e^(-x))
 This is a smooth function and is continuously differentiable. When we have multiple neurons with sigmoid as their activation function, the output is non-linear as well.
 Non-linear activation function.
 The function ranges value from 0 to 1.
 Usually used in output layer of a binary classification,
where result is either 0 or 1, as value for sigmoid function
lies between 0 and 1 only so, result can be predicted easily
to be 1 if value is greater than 0.5 and 0 otherwise.
 If there are only two output categories, sigmoid is a suitable choice.
4. Tanh Function:
 Tanh function is also known as the Tangent Hyperbolic function.
 It’s actually a mathematically shifted version of the sigmoid function. Both are similar and can be derived from each other.
 The function is
tanh(x) = (e^x – e^(-x)) / (e^x + e^(-x))
(or)
 tanh(x) = 2 * sigmoid(2x) – 1
 The range of values is from -1 to +1.
 Non-linear activation function.
 Negative inputs will be mapped strongly negative and zero inputs will be mapped near zero in the tanh graph. The function is differentiable.
 The tanh function is mainly used for classification between two classes.
5. RELU Function:
 Stands for Rectified linear unit.
 It is the most widely used activation function.
 Equation: A(x) = max (0, x). It gives an output x if x is
positive and 0 otherwise.
If x is –ve => max(0, -ve) =0
If x is +ve => max(0,+ve)=+ve
 The value range is [0, inf).
 non-linear, which means we can easily backpropagate
the errors and have multiple layers of neurons being
activated by the ReLU function.
 ReLu is less computationally expensive than tanh and
sigmoid because it involves simpler mathematical
operations. At a time only a few neurons are activated
making the network sparse making it efficient and easy
for computation.
 Used in Hidden layers (middle).
 RELU learns much faster than sigmoid and Tanh
function.
6. Leaky ReLU:
 Leaky ReLU function is nothing but an improved version of the ReLU function. Instead of defining the ReLU function as 0 for x less than 0, we define it as a small linear component of x. It can be defined as:
 f(x) = ax, x < 0
f(x) = x, otherwise (where a is a small constant such as 0.01).

7. Softmax Function:
 The softmax function is also a type of sigmoid function
and used to handle classification problems.
 It is used if the output has more than two classes.
 Usually used when trying to handle multiple classes. The
softmax function would squeeze the outputs for each
class between 0 and 1 and would also divide by the sum
of the outputs.
 non-linear activation function
 The softmax function is ideally used in the output layer
of the classifier where we are actually trying to attain
the probabilities to define the class of each input.
 The output of the softmax function for each class is the ratio of the exponential of that class’s input to the sum of the exponentials of all the inputs.
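A minimal NumPy sketch of the activation functions described above (the threshold and the leaky slope a are illustrative values):

```python
import numpy as np

def step(x, threshold=0.0):        # 1 if the net input crosses the threshold, else 0
    return np.where(x >= threshold, 1.0, 0.0)

def linear(x, c=1.0):              # y = cx; constant gradient c
    return c * x

def sigmoid(x):                    # squashes the input to (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):                       # squashes the input to (-1, 1)
    return np.tanh(x)

def relu(x):                       # max(0, x)
    return np.maximum(0.0, x)

def leaky_relu(x, a=0.01):         # small linear slope a for negative inputs
    return np.where(x < 0, a * x, x)

def softmax(x):                    # exponentials normalized to sum to 1 (class probabilities)
    e = np.exp(x - np.max(x))      # subtracting the max keeps the exponentials numerically stable
    return e / e.sum()

z = np.array([-2.0, 0.0, 3.0])
print(relu(z), sigmoid(z), softmax(z))
```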
CHOOSING THE RIGHT ACTIVATION FUNCTION
 If we don’t know what activation function to use, then simply
use RELU as it is a general activation function and is used in
most cases these days.
 If the output is for binary classification then the sigmoid function is a very good choice for the output layer.
 If the output is for other than binary classification (i.e. multi-class) then the softmax function is the choice for the output layer.
 For all hidden layers, ReLU is the choice of activation function.

“The activation function does the non-linear transformation to the input, making it capable of learning and performing more complex tasks.”
Loss and Cost Functions
A cost function is a measure of “how good” a neural
network did with respect to its given training sample and
the expected output.
A cost function is a single value; it rates how good the neural network did as a whole.
A cost function depends on variables such as weights and
biases.
A cost function is the average loss over the entire training
dataset. It is the error representation. It shows how our
model is predicting compared to the actual values.
The optimization strategies aim at minimizing the cost
function.
The lower the cost function value, the higher the accuracy.
Loss Function - The loss function or error function is computed for a single training example.
Cost Function - The cost function is the average loss over the entire training dataset.
A cost function is selected based on the problem selected
for training.
The various Cost functions are
1. Mean Absolute Error:
It is a measure of the errors between paired observations.
Example: ai -> Predicted output
yi -> Actual or Observed Output
n -> number of observations / data inputs
MAE = (1/n) Σ |yi – ai|, i.e. the average of the absolute differences between predicted and observed values.
 Used for Regression or Linear Regression models, where the output can take any number. Eg: House Price Prediction.
2. Mean Squared Error: Measures the average of the squares of the errors, i.e. the average squared difference between the estimated and the actual values: MSE = (1/n) Σ (yi – ai)². Also called Mean Squared Deviation.
Used for Regression or Linear Regression models, where the output can take any number.
Eg: House Price Prediction.
3. Binary Cross Entropy:
Compares each of the predicted probabilities to the actual class output, which can be either 0 or 1. Also called log loss.
It is the negative average of the log of the corrected predicted probabilities:
BCE = -(1/n) Σ [yi·log(ai) + (1 – yi)·log(1 – ai)]
Used for binary classification.
If yi = 0 => error = -log(1 – ai):
when ai approaches 1 the error is high,
when ai is close to 0 the error is low.
If yi = 1 => error = -log(ai):
when ai approaches 1 the error is low,
when ai approaches 0 the error is high.
4. Categorical Cross Entropy:
It is used for multi-class classification, where the true class is one-hot encoded; the loss is the negative sum over the classes of yi·log(ai).
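A minimal NumPy sketch of the four cost functions above, written for a batch of predictions (the clipping constant eps is an assumption added to avoid log(0)):

```python
import numpy as np

def mae(y, a):
    # Mean Absolute Error: average |y_i - a_i| over all observations
    return np.mean(np.abs(y - a))

def mse(y, a):
    # Mean Squared Error: average (y_i - a_i)^2 over all observations
    return np.mean((y - a) ** 2)

def binary_cross_entropy(y, a, eps=1e-12):
    # y holds 0/1 labels, a holds predicted probabilities of class 1
    a = np.clip(a, eps, 1 - eps)   # avoid log(0)
    return -np.mean(y * np.log(a) + (1 - y) * np.log(1 - a))

def categorical_cross_entropy(y, a, eps=1e-12):
    # y is one-hot encoded (n_samples x n_classes), a holds predicted class probabilities
    a = np.clip(a, eps, 1 - eps)
    return -np.mean(np.sum(y * np.log(a), axis=1))

# Illustrative values only
y_true = np.array([1.0, 0.0, 1.0])
y_pred = np.array([0.9, 0.2, 0.7])
print(mae(y_true, y_pred), mse(y_true, y_pred), binary_cross_entropy(y_true, y_pred))
```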
OPTIMIZERS
Optimization is a technique which speeds up the training / learning of a model in deep learning.
Optimizers are algorithms or methods used to minimize an error function (loss function) or to maximize the efficiency of the model.
Optimizers are algorithms or methods used to change the
attributes of your neural network such as weights and learning rate
in order to reduce the losses.
Optimizers are mathematical functions which are dependent on
model’s learnable parameters i.e Weights & Biases.
The change of weights or learning rates of neural network to
reduce the losses is defined by the optimizers.
Optimization algorithms or strategies are responsible for reducing
the losses and to provide the most accurate results possible.
Optimizers commonly used are - Gradient Descent, Stochastic
Gradient Descent, Mini-Batch Gradient Descent, Adagrad,
AdaDelta, Adam.
The various Types of optimizers are
1. Gradient Descent:
 Gradient Descent is the most basic and mostly used
optimization algorithm.
 Gradient descent is an optimization algorithm, based on a convex function, that adjusts its parameters iteratively to minimize a given function to its local minimum.
 It is used heavily in linear regression and classification
algorithms.
 Backpropagation in neural networks also uses a gradient
descent algorithm.
 Gradient Descent iteratively reduces a loss function by moving
in the direction opposite to that of steepest ascent.
 Gradient descent is a first-order optimization algorithm which is dependent on the first-order derivative of the loss function. It calculates which way the weights should be altered so that the function can reach a minimum.
 It is dependent on the derivatives of the loss function for
finding minima.
 All the data points (records) are used at a time to compute the cost function.

The weight update rule is w = w – α · ∂J/∂w, where the symbol 'α' is the learning rate and J is the cost function.

 It uses the data of the entire training set to calculate the gradient of the cost function with respect to the parameters, which requires a large amount of memory and slows down the process.
 How big/small the steps are gradient descent takes into the
direction of the local minimum are determined by the
learning rate, which figures out how fast or slow we will move
towards the optimal weights.

Advantages:
 Easy to understand
 Easy to implement
 Easy for Computation
Disadvantages:
 Because this method calculates the gradient for the entire
data set in one update, the calculation is very slow. So, if the
dataset is too large then this may take years to converge to
the minima.
 It requires large memory and it is computationally expensive.
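A minimal sketch of batch gradient descent in NumPy for a linear regression cost (MSE); the learning rate, data and number of epochs are illustrative assumptions:

```python
import numpy as np

def batch_gradient_descent(X, y, lr=0.01, epochs=100):
    # One weight per feature plus a bias term; the whole dataset is used for every update
    w, b = np.zeros(X.shape[1]), 0.0
    n = len(y)
    for _ in range(epochs):
        y_pred = X @ w + b
        # Gradients of the MSE cost with respect to w and b over the entire training set
        grad_w = (2.0 / n) * X.T @ (y_pred - y)
        grad_b = (2.0 / n) * np.sum(y_pred - y)
        # Move opposite to the direction of steepest ascent
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])      # roughly y = 2x + 1
print(batch_gradient_descent(X, y, lr=0.05, epochs=500))
```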

2. Stochastic Gradient Descent (SGD)


 It is a variant of Gradient Descent. It updates the model
parameters one by one.
 In this, the model parameters are altered after computation of
loss on each training example.
 Here only one data point (record) is considered at a time for
weight updation.
 Linear regression uses SGD optimizer.
 If the dataset has 10K records, SGD will update the model parameters 10K times (once per record) instead of one time as in Gradient Descent.
 As the model parameters are frequently updated, parameters
have high variance and fluctuations in loss functions at
different intensities.

Advantages:
1. Frequent updates of model parameter
2. Requires less memory as no need to store values of loss
functions.
3. Allows the use of large datasets as it processes only one record at a time.
Disadvantages:
1. The frequent updates can also result in noisy gradients which may cause the error to increase instead of decreasing.
2. High Variance.
3. Frequent updates are computationally expensive.

3. Mini-Batch Gradient Descent


 It’s the best among all the variations of gradient descent algorithms. Most neural networks use this method.
 It is an improvement on both SGD and standard gradient
descent.
 Here K data points (records) are used at a time for weight
updating.
 It updates the model parameters after every batch. So, the
dataset is divided into various batches and after every batch,
the parameters are updated.
 It simply splits the training dataset into small batches and
performs an update of weights for each of those batches.
 This creates a balance between the robustness of stochastic
gradient descent and the efficiency of batch gradient descent.
 It can reduce the variance when the parameters are updated,
and the convergence is more stable.
 It typically splits the dataset into batches of between 50 and 256 records, chosen at random.
Advantages:
 It leads to more stable convergence.
 More efficient gradient calculations.
 Requires less amount of memory.
 Frequently updates the model parameters and also has less
variance.

Disadvantages:
 Mini-batch gradient descent does not guarantee good
convergence,
 If the learning rate is too small, the convergence rate will be
slow. If it is too large, the loss function will oscillate or even
deviate at the minimum value.
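A minimal sketch of mini-batch gradient descent built on the same linear-regression gradients as above (the batch size, learning rate and data are illustrative; with batch_size=1 this reduces to SGD):

```python
import numpy as np

def mini_batch_gradient_descent(X, y, lr=0.01, epochs=100, batch_size=2):
    w, b = np.zeros(X.shape[1]), 0.0
    n = len(y)
    rng = np.random.default_rng(0)
    for _ in range(epochs):
        order = rng.permutation(n)               # shuffle, then split into batches chosen at random
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            y_pred = Xb @ w + b
            # Gradients computed on the current batch only, then parameters updated
            grad_w = (2.0 / len(idx)) * Xb.T @ (y_pred - yb)
            grad_b = (2.0 / len(idx)) * np.sum(y_pred - yb)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])
print(mini_batch_gradient_descent(X, y, lr=0.05, epochs=300, batch_size=2))
```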

4. SGD with Momentum:


Momentum was invented for reducing high variance in SGD and
softens the convergence.
It accelerates the convergence towards the relevant direction
and reduces the fluctuation to the irrelevant direction.
Momentum simulates the inertia of an object when it is moving,
that is, the direction of the previous update is retained to a
certain extent during the update, while the current update
gradient is used to fine-tune the final update direction.
One more hyperparameter is used in this method, known as momentum and symbolized by ‘γ’.
The momentum term ‘γ’ is usually set to 0.9 or a similar value.
It is a stochastic optimization method that adds a momentum term to regular stochastic gradient descent.
In this way, we can increase the stability to a certain extent, so that we can learn faster.

Advantages:
 Momentum helps to reduce the noise. Reduces the oscillations
and high variance of the parameters.
 Converges faster than gradient descent.
 Exponential Weighted Average is used to smoothen the curve.
Disadvantages:
 Extra hyper parameter is added, which needs to be selected
manually and accurately.
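A minimal sketch of the momentum update (assuming the standard formulation v = γ·v + α·grad, w = w − v; the gradient function, learning rate and γ = 0.9 are illustrative):

```python
import numpy as np

def sgd_momentum_step(w, v, grad, lr=0.01, gamma=0.9):
    # The direction of the previous update (v) is retained to the extent gamma,
    # and the current gradient fine-tunes the final update direction.
    v = gamma * v + lr * grad
    w = w - v
    return w, v

# Illustrative usage with a quadratic loss J(w) = w^2, whose gradient is 2w
w, v = np.array([5.0]), np.zeros(1)
for _ in range(50):
    grad = 2.0 * w
    w, v = sgd_momentum_step(w, v, grad, lr=0.1, gamma=0.9)
print(w)   # approaches the minimum at 0
```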

All types of Gradient Descent have some challenges:


 Choosing an optimum value of the learning rate. If the learning rate is too small, then gradient descent may take ages to converge.
 Have a constant learning rate for all the parameters. There
may be some parameters which we may not want to change at
the same rate.
 May get trapped at local minima.

5. Adagrad: (Adaptive Gradient Descent)


 All the above optimizers (GD, SGD, Mini-Batch GD and SGD with Momentum) use a constant learning rate for all parameters and for every cycle.
 The Adagrad optimizer changes the learning rate. It adapts the learning rate ‘η’ for each parameter and at every time step ‘t’.
 It works on the derivative of an error function.
 The intuition behind AdaGrad is we can use different Learning
Rates for each and every neuron for each and every hidden
layer based on different iterations.
Advantages:
1. Learning rate changes for each training parameter.
2. Don’t need to manually tune the learning rate.
3. Able to train on sparse data, i.e. data with a high percentage of zeros rather than actual values.

Disadvantages:
1. Computationally expensive as a need to calculate the second
order derivative.
2. The learning rate is always decreasing, results in slow training.
3. If the neural network is deep the learning rate becomes very
small number, which will cause dead neuron problem.
4. Need to set the default learning rate first.
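A minimal sketch of the Adagrad update, assuming the standard formulation in which each parameter's learning rate is scaled by the square root of the sum of its squared past gradients (the learning rate and ε are illustrative defaults):

```python
import numpy as np

def adagrad_step(w, grad, cache, lr=0.01, eps=1e-8):
    # cache accumulates the squared gradients for every parameter separately
    cache = cache + grad ** 2
    # Each parameter gets its own effective learning rate lr / sqrt(cache),
    # which therefore keeps decreasing as training progresses.
    w = w - lr * grad / (np.sqrt(cache) + eps)
    return w, cache

# Illustrative usage with a quadratic loss J(w) = w^2
w, cache = np.array([5.0]), np.zeros(1)
for _ in range(200):
    grad = 2.0 * w
    w, cache = adagrad_step(w, grad, cache, lr=0.5)
print(w)
```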

6. AdaDelta:
 Adadelta is an extension of Adagrad. It tries to fix Adagrad’s aggressive, monotonically decreasing learning rate and removes the decaying learning rate problem.
 In Adadelta we do not need to set a default learning rate, as we take the ratio of the running average of the previous time steps to the current gradient.

Advantages:
1. Now the learning rate does not decay and the training does not
stop.
Disadvantages:
1. Computationally expensive.
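A minimal sketch of the Adadelta update, assuming the standard formulation that keeps exponentially decaying running averages of squared gradients and squared updates (the decay ρ and ε are illustrative defaults); note that no learning rate needs to be set:

```python
import numpy as np

def adadelta_step(w, grad, eg2, edw2, rho=0.95, eps=1e-6):
    # Running average of squared gradients (replaces Adagrad's ever-growing sum)
    eg2 = rho * eg2 + (1 - rho) * grad ** 2
    # Update is the ratio of the running average of past updates to the current gradient scale
    delta = -np.sqrt(edw2 + eps) / np.sqrt(eg2 + eps) * grad
    # Running average of squared updates
    edw2 = rho * edw2 + (1 - rho) * delta ** 2
    return w + delta, eg2, edw2

# Illustrative usage with a quadratic loss J(w) = w^2
w, eg2, edw2 = np.array([5.0]), np.zeros(1), np.zeros(1)
for _ in range(500):
    w, eg2, edw2 = adadelta_step(w, 2.0 * w, eg2, edw2)
print(w)
```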
7. Adam (Adaptive Moment Estimation)
 Adam optimizer is one of the most popular and famous
gradient descent optimization algorithms.
 It is a method that computes adaptive learning rates for each
parameter.
 Works with moments of first and second order, i.e. running averages of the gradient and of the squared gradient.
 The intuition behind Adam is that we don’t want to roll so fast that we jump over the minimum; we want to decrease the velocity a little bit for a careful search.
 Updates both biases and weights.
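A minimal sketch of the Adam update, assuming the standard formulation with bias-corrected first and second moment estimates (the defaults β1 = 0.9, β2 = 0.999 and ε = 1e-8 are the commonly used values; the loss and learning rate are illustrative):

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # First moment: exponentially decaying average of past gradients (momentum)
    m = beta1 * m + (1 - beta1) * grad
    # Second moment: exponentially decaying average of past squared gradients
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction compensates for the zero initialization of m and v
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Per-parameter adaptive update
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Illustrative usage with a quadratic loss J(w) = w^2
w, m, v = np.array([5.0]), np.zeros(1), np.zeros(1)
for t in range(1, 1001):
    w, m, v = adam_step(w, 2.0 * w, m, v, t, lr=0.05)
print(w)
```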
8. RMSprop Optimizer
The RMSprop optimizer is similar to the gradient descent algorithm
with momentum.
The RMSprop optimizer restricts the oscillations in the vertical
direction. Therefore, we can increase our learning rate and our
algorithm could take larger steps in the horizontal direction
converging faster.
RMSprop keeps an exponentially decaying average of the squared gradients and divides the current gradient by its square root, in a manner similar to gradient descent with momentum. The decay value is denoted by beta and is usually set to 0.9.
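A minimal sketch of the RMSprop update, assuming the standard formulation with a decaying average of squared gradients (beta = 0.9; the learning rate and ε are illustrative defaults):

```python
import numpy as np

def rmsprop_step(w, grad, eg2, lr=0.01, beta=0.9, eps=1e-8):
    # Exponentially decaying average of squared gradients
    eg2 = beta * eg2 + (1 - beta) * grad ** 2
    # Dividing by sqrt(eg2) damps the oscillations in steep (vertical) directions,
    # which allows a larger learning rate and faster progress in the horizontal direction.
    w = w - lr * grad / (np.sqrt(eg2) + eps)
    return w, eg2

# Illustrative usage with a quadratic loss J(w) = w^2
w, eg2 = np.array([5.0]), np.zeros(1)
for _ in range(300):
    w, eg2 = rmsprop_step(w, 2.0 * w, eg2, lr=0.05)
print(w)
```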
