Unit 2
Loss Function :
If the model design is iterated many times using a limited-size data set,
then some over-fitting to the validation data can occur, so it may be
necessary to keep aside a third test set on which the performance of
the selected model is finally evaluated.
Clearly, we need a better approach. Ideally, this should rely only on the
training data and should allow multiple hyperparameters and model
types to be compared in a single training run. We therefore need to
find a measure of performance which depends only on the training data
and which does not suffer from bias due to over-fitting.
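As a minimal sketch of this setup (using assumed data and assumed split sizes, not values from the text), the data set can be partitioned into training, validation, and test subsets so that the test set is touched only once, for the final evaluation of the selected model:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))        # hypothetical features
y = rng.normal(size=1000)              # hypothetical targets

idx = rng.permutation(len(X))          # shuffle example indices
n_train, n_val = 700, 150              # assumed split sizes

X_train, y_train = X[idx[:n_train]], y[idx[:n_train]]
X_val, y_val = X[idx[n_train:n_train + n_val]], y[idx[n_train:n_train + n_val]]
X_test, y_test = X[idx[n_train + n_val:]], y[idx[n_train + n_val:]]
# X_test / y_test are kept aside and used only for the final evaluation.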
Optimization :
Machine learning algorithms usually require a large amount of numerical
computation. This typically refers to algorithms that solve mathematical
problems by methods that update estimates of the solution via an
iterative process, rather than analytically deriving a formula providing a
symbolic expression for the correct solution. Common operations
include optimization (finding the value of an argument that minimizes or
maximizes a function) and solving systems of linear equations.
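To make the contrast concrete, the sketch below (an illustrative example, not taken from the text) solves the same small linear system analytically and by an iterative update rule; the step size and iteration count are assumed values:

import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

# Analytic route: a closed-form solve of A x = b.
x_exact = np.linalg.solve(A, b)        # [2., 3.]

# Iterative route: gradient descent on f(x) = 1/2 ||A x - b||^2,
# repeatedly refining an estimate of the solution.
x = np.zeros(2)
lr = 0.05                              # assumed step size
for _ in range(500):
    grad = A.T @ (A @ x - b)           # gradient of the squared error
    x = x - lr * grad

print(x_exact, x)                      # the iterative estimate converges to the analytic solution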
Gradient-Based Optimization :
Most deep learning algorithms involve optimization of some
sort. Optimization refers to the task of either minimizing or maximizing
some function f(x) by altering x. We usually phrase most optimization
problems in terms of minimizing f(x). Maximization may be accomplished
via a minimization algorithm by minimizing −f(x). The function we want
to minimize or maximize is called the objective function or criterion.
When we are minimizing it, we may also call it the cost function, loss
function, or error function.
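A minimal sketch of this idea, using an assumed toy objective f(x) = (x - 3)^2 and an assumed step size, is shown below; maximizing some function g would use the same loop applied to -g:

def f(x):
    return (x - 3.0) ** 2              # toy objective (cost) function

def grad_f(x):
    return 2.0 * (x - 3.0)             # its derivative

x = 0.0                                # initial guess
lr = 0.1                               # assumed learning rate (step size)
for _ in range(100):
    x = x - lr * grad_f(x)             # step against the gradient

print(x)                               # approaches the minimizer x = 3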
Difficulty of training deep neural networks
Challenges Motivating Deep Learning:
Simple machine learning algorithms have not succeeded in solving
the central problems in AI, such as recognizing speech or recognizing
objects. The development of deep learning was motivated in part by the
failure of traditional algorithms to generalize well on such AI tasks.
While the k-nearest neighbors algorithm copies the output from nearby
training examples, most kernel machines interpolate between training
set outputs associated with nearby training examples.
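As a small illustrative contrast (toy one-dimensional data with an assumed kernel width), a 1-nearest-neighbour predictor copies the output of the closest training point, while a kernel smoother interpolates between the outputs of nearby training points:

import numpy as np

X_train = np.array([0.0, 1.0, 2.0, 3.0])   # hypothetical 1-D inputs
y_train = np.array([0.0, 1.0, 4.0, 9.0])   # hypothetical targets
x_query = 1.4

# k-nearest neighbors with k = 1: copy the nearest training output.
nearest = np.argmin(np.abs(X_train - x_query))
y_knn = y_train[nearest]

# Kernel machine (RBF smoother): weight each training output by a kernel.
weights = np.exp(-(X_train - x_query) ** 2 / (2 * 0.5 ** 2))
y_kernel = np.sum(weights * y_train) / np.sum(weights)

print(y_knn, y_kernel)                 # copied value vs. interpolated value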