Unit-3 Notes

Uploaded by

sipik50968

UNIT III

Introduction to Deep Learning: Historical Trends in Deep Learning, Deep Feedforward
Networks, Gradient-Based Learning, Hidden Units, Architecture Design, Back-Propagation
and Other Differentiation Algorithms

Introduction

Deep Learning is a subset of Machine Learning that uses mathematical
functions to map the input to the output. These functions can extract non-redundant
information or patterns from the data, which enables them to form a
relationship between the input and the output. This is known as learning, and
the process of learning is called training.

In traditional computer programming, input and a set of rules are combined
to get the desired output. In machine learning and deep learning, input
and output are used to learn the rules. These rules, when combined with new
input, yield the desired results.

Modern deep learning models use artificial neural networks, or simply neural
networks, to extract information.

These neural networks are made up of simple mathematical functions that can
be stacked on top of each other and arranged in the form of layers, giving them
a sense of depth, hence the term Deep Learning.

Deep learning can also be thought of as an approach to Artificial Intelligence: a smart combination of
hardware and software to solve tasks requiring human intelligence.

Importance of Deep Learning

Deep learning algorithms learn the relevant features themselves and can
handle large volumes of data, whether structured or
unstructured. However, deep learning can be overkill for some tasks,
because these algorithms need access to huge amounts of
data to function effectively. For example, ImageNet, a popular
image-recognition dataset, contains more than 14
million images. It is a highly comprehensive
resource that has set the benchmark for deep learning models that work with
images.

Deep learning algorithms learn about an
image by passing it through each layer of the neural network.
The early layers detect low-level features of the image
such as edges, and the later layers combine this information
to form holistic representations. For
example, a middle layer might learn to detect particular parts of
an object in the photograph, while deeper layers learn to
detect whole objects such as dogs, trees, utensils, etc.

However, for simple tasks that involve less complexity and limited
data, deep learning algorithms often fail to generalize well.
This is one of the main reasons deep learning is not considered as effective as
linear or boosted tree models for such tasks. Simple models are used to analyze customer data,
track fraudulent transactions and deal with less complex datasets with fewer
features. There are also cases, such as multiclass classification on smaller but more structured
datasets, where deep learning can be effective but is usually not preferred.

Why Deep Learning

Applications of Deep Learning:

In computer vision, deep learning models enable machines to identify and
understand visual data. Some of the main applications of deep learning in
computer vision include:

Object detection and recognition: Deep learning models can be used to identify
and locate objects within images and videos, making possible applications
such as self-driving cars, surveillance, and robotics.

Image classification: Deep learning models can be used to classify images into
categories such as animals, plants, and buildings. This is used in applications
such as medical imaging, quality control, and image retrieval.

Image segmentation: Deep learning models can be used to segment
images into different regions, making it possible to identify specific
features within images.

Natural language processing (NLP):

In NLP, deep learning models enable machines to understand and
generate human language. Some of the main applications of deep learning in
NLP include:

Automatic text generation: Deep learning models can learn from a corpus of
text, and new text such as summaries and essays can be automatically generated using
these trained models.

Language translation: Deep learning models can translate text from one
language to another, making it possible to communicate with people from
different linguistic backgrounds.

Sentiment analysis: Deep learning models can analyze the sentiment of a piece
of text, making it possible to determine whether the text is positive, negative, or
neutral. This is used in applications such as customer service, social media
monitoring, and political analysis.

Speech recognition: Deep learning models can recognize and transcribe spoken
words, making it possible to perform tasks such as speech-to-text conversion,
voice search, and voice-controlled devices.

Reinforcement learning:

In reinforcement learning, deep learning is used to train agents to take actions
in an environment to maximize a reward. Some of the main applications of deep
learning in reinforcement learning include:

Game playing: Deep reinforcement learning models have been able to beat
human experts at games such as Go, Chess, and Atari.

Robotics: Deep reinforcement learning models can be used to train robots to
perform complex tasks such as grasping objects, navigation, and manipulation.

Control systems: Deep reinforcement learning models can be used to control
complex systems such as power grids, traffic management, and supply chain
optimization.

Historical Trends in Deep Learning

Deep learning has seen three waves of development. The first wave started
with cybernetics in the 1940s-1960s, with the development of theories of
biological learning and implementations of the first models, such as the
perceptron, which allowed the training of a single neuron. The second wave started
with the connectionist approach of the 1980-1995 period, with back-propagation
used to train neural networks with one or two hidden layers. The current and third
wave, deep learning, started around 2006.

[Figure: Deep Learning History Timeline]

Deep Feedforward Networks

Introduction
• Deep feedforward neural networks are also known as multilayer perceptrons (MLPs).
• The goal is to approximate a function f∗(x) by learning a mapping y = f(x; θ), where θ are
the parameters to be learned by the model.
• The model composes together many different functions, and the composition can be represented by a directed acyclic graph (DAG).
• The final layer of the model is called the output layer, while the intermediate layers
are called hidden layers.

Learning XOR
The XOR function (“exclusive or”) is an operation on two binary values, x1 and x2. When
exactly one of these binary values is equal to 1, the XOR function returns 1. Otherwise, it
returns 0. The XOR function provides the target function
y = f∗(x) that we want to learn. Our model provides a function y = f(x; θ), and our learning
algorithm will adapt the parameters θ to make f as similar as possible to f∗.

We want our network to perform correctly on the four points X = {[0, 0], [0, 1], [1, 0], [1,
1]}. We will train the network on all four of these points. The only challenge is to fit the
training set.

We can treat this problem as a regression problem and use a mean squared error (MSE) loss
function. In practical applications, MSE is usually not an appropriate cost function for
modeling binary data; it is used here only to keep the example simple.

Evaluated on our whole training set, the MSE loss function is

J(θ) = (1/4) Σ_{x ∈ X} (f∗(x) − f(x; θ))^2

Suppose that we choose a linear model, with θ consisting of w and b. Our model is defined to
be

f(x; w, b) = x^T w + b

We can minimize J(θ) in closed form with respect to w and b using the normal equations.

After solving the normal equations, we obtain w = 0 and b = 1/2. The linear model simply outputs 0.5
everywhere. Why does this happen? A linear model is not able to represent the XOR function. One
way to solve this problem is to use a model that learns a different feature space in which a linear
model is able to represent the solution.
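
The closed-form result can be checked numerically. The following is a minimal sketch, assuming NumPy is available; it appends a column of ones so that the bias b is fitted as an extra weight, and uses a least-squares solver as a stand-in for solving the normal equations by hand. The variable names are illustrative.

```python
import numpy as np

# The four XOR inputs (one example per row) and their targets.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

# Append a column of ones so the bias b is learned as a third weight.
A = np.hstack([X, np.ones((4, 1))])

# Least-squares fit of A @ [w1, w2, b] = y; this minimizes the same MSE
# that the normal equations minimize.
theta, *_ = np.linalg.lstsq(A, y, rcond=None)
w, b = theta[:2], theta[2]

predictions = A @ theta
mse = np.mean((y - predictions) ** 2)
print("w =", w, "b =", b)
print("predictions =", predictions, "MSE =", mse)
```

The fitted weights are zero and the bias is 0.5, so the model predicts 0.5 for all four points and cannot separate the two classes.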

Specifically, we will introduce a very simple feedforward network with one hidden layer
containing two hidden units.

This feedforward network has a vector of hidden units h that are computed by a function
f^(1)(x; W, c). The values of these hidden units are then used as the input for a second layer.
The second layer is the output layer of the network. The output layer is still just a linear
regression model, but now it is applied to h rather than to x. The network now contains two
functions chained together: h = f^(1)(x; W, c) and y = f^(2)(h; w, b), with
the complete model being f(x; W, c, w, b) = f^(2)(f^(1)(x)).

What function should f^(1) compute? Linear models have served us well so far, and it may
be tempting to make f^(1) linear as well. Unfortunately, if f^(1) were linear, then the feedforward
network as a whole would remain a linear function of its input. We must therefore use a nonlinear function to
describe the features. Most neural networks do so using an affine transformation controlled by
learned parameters, followed by a fixed, nonlinear function called an activation function. We use
that strategy here, by defining h = g(W^T x + c),
where W provides the weights of a linear transformation and c the biases.

We describe an affine transformation from a vector x to a vector h, so an entire vector of bias
parameters is needed. The activation function g is typically chosen to be a function that is applied
element-wise, with h_i = g(x^T W_{:,i} + c_i), where W_{:,i} is column i of W. In modern neural networks, the default recommendation is
to use the rectified linear unit, or ReLU, defined by the activation function g(z) = max{0, z}.

We can now specify our complete network as

f(x; W, c, w, b) = w^T max{0, W^T x + c} + b

We can now specify a solution to the XOR problem. Let

W = [[1, 1], [1, 1]], c = [0, −1]^T, w = [1, −2]^T,

and b = 0.

We can now walk through the way that the model processes a batch of inputs. Let X be the design
matrix containing all four points in the binary input space, with one example per row:

X = [[0, 0], [0, 1], [1, 0], [1, 1]]

The first step in the neural network is to multiply the input matrix by the first layer’s weight matrix:

XW = [[0, 0], [1, 1], [1, 1], [2, 2]]

Next, we add the bias vector c, to obtain

XW + c = [[0, −1], [1, 0], [1, 0], [2, 1]]

In this space, all of the examples lie along a line with slope 1. As we move along this line, the output
needs to begin at 0, then rise to 1, then drop back down to 0. A linear model cannot implement such a
function. To finish computing the value of h for each example, we apply the rectified linear
transformation:

H = max{0, XW + c} = [[0, 0], [1, 0], [1, 0], [2, 1]]

This transformation has changed the relationship between the examples. They no longer lie on a
single line. They now lie in a space where a linear model can solve the problem. We finish by
multiplying by the weight vector w:

Hw = [0, 1, 1, 0]^T

The neural network has obtained the correct answer for every example in the batch.
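
The walkthrough can be reproduced in a few lines. This is a sketch assuming NumPy; the parameter values W, c, w and b set in the code are one known solution to XOR for this architecture and should be read as an assumption of the example, as should the helper names `relu` and `xor_net`.

```python
import numpy as np

def relu(z):
    """Rectified linear unit, applied element-wise."""
    return np.maximum(0.0, z)

# Assumed parameter values: one known solution for the two-unit XOR network.
W = np.array([[1.0, 1.0],
              [1.0, 1.0]])   # first-layer weights
c = np.array([0.0, -1.0])    # first-layer biases
w = np.array([1.0, -2.0])    # output-layer weights
b = 0.0                      # output-layer bias

# Design matrix: one example per row.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

def xor_net(X):
    """Compute f(x) = w^T max{0, W^T x + c} + b for every row of X."""
    H = relu(X @ W + c)      # hidden representation, shape (4, 2)
    return H @ w + b         # one output per example

hidden = relu(X @ W + c)
outputs = xor_net(X)
print("hidden units:\n", hidden)
print("outputs:", outputs)
```

With these parameters the outputs match the XOR targets 0, 1, 1, 0 exactly.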

In this example, we simply specified the solution, then showed that it obtained zero error. In a real
situation, there might be billions of model parameters and billions of training examples, so one cannot
simply guess the solution as we did here. Instead, a gradient-based optimization algorithm can find
parameters that produce very little error.

Gradient-Based Learning
As with other machine learning models, to apply gradient-based learning we must choose a
cost function, and we must choose how to represent the output of the model. The largest
difference between simple ML models and neural networks is that the nonlinearity of a neural
network causes most interesting loss functions to become non-convex. This means that neural
networks are usually trained by using iterative, gradient-based optimizers that merely drive
the cost function to a very low value, rather than the exact linear equation solvers used to train
linear regression models or the convex optimization algorithms used for logistic regression or
SVMs.
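
To make the contrast between an exact solver and an iterative optimizer concrete, the sketch below fits the same small linear regression both ways. It assumes NumPy, and the toy data, learning rate and step count are illustrative choices. A convex problem is used on purpose so that the two answers can be compared; for a neural network the loss is non-convex and only the iterative route is generally available.

```python
import numpy as np

# Toy data generated from y = 3x + 2 (illustrative, noise-free).
x = np.linspace(-1.0, 1.0, 20)
y = 3.0 * x + 2.0
A = np.column_stack([x, np.ones_like(x)])   # columns: input, bias

# Exact solution via a linear least-squares solver.
theta_exact, *_ = np.linalg.lstsq(A, y, rcond=None)

# Iterative solution via plain gradient descent on the mean squared error.
theta = np.zeros(2)
learning_rate = 0.1
for step in range(2000):
    residual = A @ theta - y
    grad = 2.0 * A.T @ residual / len(y)    # gradient of mean((A @ theta - y)**2)
    theta -= learning_rate * grad

print("exact solver:    ", theta_exact)
print("gradient descent:", theta)
```

Gradient descent never solves the equations exactly; it only drives the cost low enough that the two parameter vectors agree to within a small tolerance.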

Cost Functions
A cost function is a measure of how well a machine learning model
performs on a given dataset. It calculates the difference between the expected value and the
predicted value and represents it as a single real number.

Types of Cost Function

1. Regression Cost Functions

• Mean Error
• Mean Squared Error
• Mean Absolute Error

2. Binary Classification Cost Functions

3. Multi-class Classification Cost Functions
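
The cost functions listed above can be sketched as follows, assuming NumPy. The function names, the sign convention for the mean error (true minus predicted) and the use of cross-entropy for the two classification cases are choices made for this illustration; the `eps` clipping is only there to avoid taking log(0).

```python
import numpy as np

def mean_error(y_true, y_pred):
    """Average signed error; positive and negative errors can cancel."""
    return np.mean(y_true - y_pred)

def mean_squared_error(y_true, y_pred):
    """Average squared error; penalizes large errors more strongly."""
    return np.mean((y_true - y_pred) ** 2)

def mean_absolute_error(y_true, y_pred):
    """Average absolute error."""
    return np.mean(np.abs(y_true - y_pred))

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """y_true holds 0/1 labels, p_pred the predicted probability of class 1."""
    p = np.clip(p_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))

def categorical_cross_entropy(labels, probs, eps=1e-12):
    """labels holds integer class indices, probs one probability row per example."""
    picked = np.clip(probs[np.arange(len(labels)), labels], eps, 1.0)
    return -np.mean(np.log(picked))

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.5, 2.0, 2.0])
print(mean_error(y_true, y_pred),
      mean_squared_error(y_true, y_pred),
      mean_absolute_error(y_true, y_pred))
```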

In most cases, our parametric model defines a distribution p(y | x; θ) and we simply use the
principle of maximum likelihood. This means we use the cross-entropy between the training
data and the model’s predictions as the cost function.

Sometimes, rather than predicting a complete probability distribution over y, we merely
predict some statistic of y conditioned on x. Specialized loss functions allow us to train a
predictor of these estimates.

The total cost function used to train a neural network will often combine one of the primary
cost functions described here with a regularization term.

Learning Conditional Distributions with Maximum Likelihood

Most modern neural networks are trained using maximum likelihood. This means that the cost
function is simply the negative log-likelihood, equivalently described as the cross-entropy
between the training data and the model distribution. This cost function is given by:

J(θ) = −E_{x,y ∼ p̂_data} log p_model(y | x)

The specific form of the cost function changes from model to model, depending on the
specific form of log p_model.

An advantage of this approach of deriving the cost function from maximum likelihood is that
it removes the burden of designing cost functions for each model. Specifying a model p(y | x)
automatically determines a cost function −log p(y | x).
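
As a worked instance of how a model choice fixes the cost, suppose (an illustrative assumption, not a requirement of the method) that the model is a Gaussian whose mean is the network output f(x; θ) and whose covariance is the identity, with y of dimension d. Substituting into the negative log-likelihood recovers the mean squared error up to a scale factor and a constant that does not depend on θ:

```latex
\begin{aligned}
p_{\text{model}}(y \mid x) &= \mathcal{N}\bigl(y;\, f(x;\theta),\, I\bigr)
  = (2\pi)^{-d/2} \exp\!\Bigl(-\tfrac{1}{2}\,\lVert y - f(x;\theta)\rVert^{2}\Bigr) \\
-\log p_{\text{model}}(y \mid x) &= \tfrac{1}{2}\,\lVert y - f(x;\theta)\rVert^{2} + \tfrac{d}{2}\log(2\pi) \\
J(\theta) &= \tfrac{1}{2}\,\mathbb{E}_{x,y \sim \hat{p}_{\text{data}}}\,
  \lVert y - f(x;\theta)\rVert^{2} + \text{const}
\end{aligned}
```

Minimizing this J(θ) is therefore the same as minimizing the mean squared error between the targets and the network outputs.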

Hidden Units
We now consider how to choose the type of hidden unit to use in the hidden layers of the model. The design of
hidden units is an extremely active area of research and does not yet have many definitive
guiding theoretical principles. Rectified linear units are an excellent default choice of hidden
unit.

We discuss the motivations behind the choice of hidden unit. It is usually impossible to predict in
advance which will work best. The design process consists of trial and error: intuiting that a
kind of hidden unit may work well, then evaluating its performance on a validation set.

Some hidden units are not differentiable at all input points. For example, the rectified
linear function g(z) = max{0, z} is not differentiable at z = 0. This may seem like it
invalidates g for use with a gradient-based learning algorithm. In practice, gradient descent
still performs well enough for these models to be used for machine learning tasks.

Most hidden units can be described as accepting a vector of inputs x, computing an affine
transformation z = W^T x + b, and then applying an element-wise nonlinear function g(z).
Most hidden units are distinguished from each other only by the choice of the form of the
activation function g(z).

Rectified Linear Units and Their Generalizations (ReLU)

Rectified linear units use the activation function g(z) = max{0, z}.

Rectified linear units are easy to optimize because they are so similar to linear units.

The only difference from a linear unit is that a rectified linear unit outputs 0 across half its domain.
The derivative is 1 everywhere that the unit is active.
Thus the gradient direction is far more useful for learning than it would be with activation functions
that introduce second-order effects.
Rectified linear units are typically used on top of an affine transformation:
h = g(W^T x + b).
It is good practice to set all elements of b to a small positive value such as 0.1. This makes it likely that
the ReLU will be initially active for most training samples and allow derivatives to pass through.
ReLU vs other activations:

• Sigmoid and tanh activation functions are hard to use in networks with many layers due to
the vanishing gradient problem.
• ReLU mitigates the vanishing gradient problem, allowing models to learn faster and
perform better.
• ReLU is the default activation function for MLPs and CNNs.

One drawback of rectified linear units is that they cannot learn via gradient-based methods on
examples for which their activation is zero.
Three generalizations of rectified linear units are based on using a non-zero slope α_i when
z_i < 0: h_i = g(z, α)_i = max(0, z_i) + α_i min(0, z_i).
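
The generalizations with a non-zero slope for negative inputs can all be written as max(0, z) + α min(0, z). The sketch below assumes NumPy; the three variants shown (absolute value rectification with α = −1, leaky ReLU with a small fixed α, and parametric ReLU with a learnable α per unit) are the commonly cited ones, and the specific α values are illustrative.

```python
import numpy as np

def generalized_relu(z, alpha):
    """h_i = max(0, z_i) + alpha_i * min(0, z_i); alpha may be a scalar or per-unit array."""
    return np.maximum(0.0, z) + alpha * np.minimum(0.0, z)

z = np.array([-2.0, -0.5, 0.0, 1.5])

# Absolute value rectification: alpha fixed to -1, so g(z) = |z|.
absolute_value = generalized_relu(z, alpha=-1.0)

# Leaky ReLU: alpha fixed to a small value such as 0.01.
leaky = generalized_relu(z, alpha=0.01)

# Parametric ReLU: alpha is a parameter per unit; in training it would be
# learned, here it is simply set by hand for illustration.
prelu_alpha = np.array([0.1, 0.2, 0.3, 0.4])
prelu = generalized_relu(z, alpha=prelu_alpha)

print(absolute_value, leaky, prelu)
```

Because the slope is non-zero for negative inputs, these units still pass a gradient when their ReLU counterpart would be inactive.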

Logistic Sigmoid and Hyperbolic Tangent

Prior to the introduction of rectified linear units, most neural networks used the logistic sigmoid activation function
g(z) = σ(z)
or the hyperbolic tangent activation function
g(z) = tanh(z).
These activation functions are closely related because
tanh(z) = 2σ(2z) − 1.
Sigmoid units are also used as output units, to predict the probability
that a binary variable is 1.
Sigmoidal units saturate across most of their domain:
• The logistic sigmoid saturates to 1 when z is very positive and to 0 when z is very negative.
• It is strongly sensitive to its input only when z is near 0.
• This widespread saturation makes gradient-based learning difficult.
The hyperbolic tangent typically performs better than the logistic sigmoid. It resembles the
identity function more closely, in the sense that tanh(0) = 0 while σ(0) = 1/2. Because tanh is similar to the identity function near 0,
training a deep neural network ŷ = w^T tanh(U^T tanh(V^T x)) resembles training a
linear model ŷ = w^T U^T V^T x so long as the activations of the network can be kept
small.
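
The relation between the two functions and the effect of saturation can be checked numerically. This is a small sketch assuming NumPy; the helper names and the sample points are illustrative.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    """Derivative of the sigmoid: sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

# Check the identity tanh(z) = 2 * sigmoid(2z) - 1 on a grid of points.
z = np.linspace(-5.0, 5.0, 11)
identity_holds = np.allclose(np.tanh(z), 2.0 * sigmoid(2.0 * z) - 1.0)

# Saturation: the gradient is largest at z = 0 and almost vanishes for large |z|.
grad_at_0 = sigmoid_grad(0.0)
grad_at_10 = sigmoid_grad(10.0)

# Near zero, tanh behaves like the identity function.
small = np.array([-0.01, 0.0, 0.01])
tanh_near_identity = np.allclose(np.tanh(small), small, atol=1e-5)

print(identity_holds, grad_at_0, grad_at_10, tanh_near_identity)
```

The tiny gradient far from zero is what makes gradient-based learning slow once a sigmoidal unit saturates.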

Architecture Design
The word architecture refers to the overall structure of the network: how many units it should
have and how these units should be connected to each other.

Generic Neural Architectures
Most neural networks are organized into groups of units called layers. Most neural network
architectures arrange these layers in a chain structure, with each layer being a function of the
layer that preceded it. In this structure, the first layer is given by
h^(1) = g^(1)(W^(1)T x + b^(1)),
the second layer is given by
h^(2) = g^(2)(W^(2)T h^(1) + b^(2)),
and so on. In these chain-based architectures, the main architectural considerations are to choose the
depth of the network and the width of each layer.
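
The chain structure translates directly into a loop over layers. The sketch below assumes NumPy; the `forward` helper, the layer widths, the random initialization and the choice of a linear output layer are all illustrative assumptions. The depth is the number of entries in `layers` and the width of each layer is the number of columns of its weight matrix.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def identity(z):
    return z

def forward(x, layers):
    """Apply h^(k) = g^(k)(W^(k)T h^(k-1) + b^(k)) for each layer in order."""
    h = x
    for W, b, g in layers:
        h = g(W.T @ h + b)
    return h

rng = np.random.default_rng(0)
widths = [3, 5, 4, 1]                    # input size, two hidden widths, output size
activations = [relu, relu, identity]     # linear output layer

# Each weight matrix has shape (units in previous layer, units in this layer).
layers = [
    (rng.normal(size=(n_in, n_out)), np.zeros(n_out), g)
    for n_in, n_out, g in zip(widths[:-1], widths[1:], activations)
]

x = np.array([1.0, -2.0, 0.5])
y_hat = forward(x, layers)
print("depth =", len(layers), "output =", y_hat)
```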

Universal Approximation Properties and Depth

A feedforward network with a single hidden layer containing a finite number of neurons can
approximate continuous functions on compact subsets of ℝ^n, under mild assumptions on the
activation function.
• Simple neural networks can represent a wide variety of interesting functions when
given appropriate parameters.
• However, the theorem does not touch upon the algorithmic learnability of those parameters.

The universal approximation theorem means that regardless of what function we are trying to
learn, we know that a large MLP will be able to represent this function. However, we are not
guaranteed that the training algorithm will be able to learn that function. Even if the MLP is
able to represent the function, learning can fail for two different reasons:
• The optimization algorithm may not be able to find the value of the parameters that
corresponds to the desired function.
• The training algorithm might choose the wrong function due to overfitting.

The universal approximation theorem says that there exists a network large enough to
achieve any degree of accuracy we desire, but the theorem does not say how large this
network will be. Bounds exist on the size of a single-layer network needed to
approximate a broad class of functions. Unfortunately, in the worst case, an
exponential number of hidden units may be required. This is easiest to see in the binary
case: the number of possible binary functions on vectors v ∈ {0,1}^n is 2^(2^n), and
selecting one such function requires 2^n bits, which will in general require O(2^n)
degrees of freedom.

A feedforward network with a single hidden layer is sufficient to represent any function, but
the layer may be infeasibly large and may fail to learn and generalize correctly. In many circumstances, using deeper
models can reduce the number of units required and reduce the generalization error.
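
The counting argument for the binary case can be verified by brute force for small n. This sketch uses only the Python standard library; the function name is illustrative. A binary function on {0,1}^n is fully described by its truth table, which assigns one output bit to each of the 2^n inputs, so there are 2^(2^n) such tables.

```python
from itertools import product

def count_binary_functions(n):
    """Count functions from {0,1}^n to {0,1} by enumerating their truth tables."""
    inputs = list(product([0, 1], repeat=n))             # all 2**n input vectors
    truth_tables = product([0, 1], repeat=len(inputs))   # one output bit per input
    return sum(1 for _ in truth_tables)

for n in (1, 2, 3):
    print(n, count_binary_functions(n), 2 ** (2 ** n))

# XOR is one of the 16 functions for n = 2: its truth table over the inputs
# (0,0), (0,1), (1,0), (1,1) is (0, 1, 1, 0).
xor_table = (0, 1, 1, 0)
xor_is_listed = xor_table in set(product([0, 1], repeat=4))
```

Already at n = 5 there are 2^32 distinct functions, which is why picking out an arbitrary one can require exponentially many degrees of freedom.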
