Deep Learning 3rd Module

MODULE 3

Optimization for Training Deep Models: How Learning Differs from Pure Optimization,
Challenges in Neural Network Optimization, Basic Algorithms, Parameter Initialization
Strategies, Algorithms with Adaptive Learning Rates. Convolutional Networks: The Convolution
Operation, Motivation, Pooling, Convolution and Pooling as an Infinitely Strong Prior, Variants
of the Basic Convolution Function, Structured Outputs, Data Types, Efficient Convolution
Algorithms, Random or Unsupervised Features.

How Learning Differs from Pure Optimization

Optimization algorithms used for training of deep models differ from traditional optimization
algorithms in several ways. Machine learning usually acts indirectly. In most machine learning
scenarios, we care about some performance measure P that is defined with respect to the test set
and may also be intractable. We therefore optimize P only indirectly: we reduce a different cost
function J(θ) in the hope that doing so will improve P. This is in contrast to pure optimization,
where minimizing J is a goal in and of itself. Optimization algorithms for training deep models
also typically include some specialization on the specific structure of machine learning objective
functions. Typically, the cost function can be written as an average over the training set, such as

J(θ) = E_{(x,y)∼p̂_data} [ L(f(x; θ), y) ],        (8.1)

where L is the per-example loss function, f(x; θ) is the predicted output when the input is x, and
p̂_data is the empirical distribution. In the supervised learning case, y is the target output.
Throughout this module, we develop the unregularized supervised case, where the arguments to
L are f(x; θ) and y. However, it is trivial to extend this development, for example, to include θ or
x as arguments, or to exclude y as an argument, in order to develop various forms of regularization
or unsupervised learning. Equation 8.1 defines an objective function with respect to the training
set. We would usually prefer to minimize the corresponding objective function in which the
expectation is taken across the data-generating distribution p_data rather than just over the finite
training set:

J*(θ) = E_{(x,y)∼p_data} [ L(f(x; θ), y) ]
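The sketch below (an illustrative NumPy example, not from the original notes; the squared-error loss and linear model f are assumed choices) shows how J(θ) is computed in practice as the average per-example loss over the finite training set, while J*(θ) remains unavailable because p_data is unknown.

import numpy as np

def per_example_loss(prediction, target):
    # An illustrative choice of L(f(x; theta), y): squared error.
    return 0.5 * (prediction - target) ** 2

def f(x, theta):
    # A simple linear model standing in for f(x; theta).
    return x @ theta

def empirical_risk(theta, X, y):
    # J(theta) = (1/m) * sum_i L(f(x_i; theta), y_i),
    # i.e. the expectation under the empirical distribution p̂_data.
    return np.mean(per_example_loss(f(X, theta), y))

# J*(theta), the expectation under the true data-generating distribution
# p_data, cannot be evaluated because p_data is unknown; the empirical
# risk above is minimized as a surrogate.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                  # 100 training examples, 3 features
y_targets = X @ np.array([1.0, -2.0, 0.5])     # synthetic targets
theta = rng.normal(size=3)
print(empirical_risk(theta, X, y_targets))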
Example: Image Classification

Suppose we have a dataset of images of handwritten digits (the MNIST dataset) and we want to
build a deep learning model that classifies these digits into their respective categories (0-9).

The learning approach involves iteratively adjusting the model's parameters based on input data
and targets so as to improve performance on the specific task (image classification), which is
optimized only indirectly through the surrogate cost J(θ). In contrast, a pure optimization
approach would treat minimizing the training cost as a goal in and of itself, without regard to
performance on unseen data, which may lead to suboptimal generalization.
