
GRADIENT DESCENT AND COST FUNCTION
RECAP OF BASIC CONCEPTS
SLOPE

Case of a linear (straight) line:

Delta y = 3 (9 - 6)  {change in y}
Delta x = 1 (3 - 2)  {change in x}
Slope = Delta y / Delta x = 3 / 1 = 3, and it is the same between any two points on the line.
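As a quick check, a minimal Python sketch of the same calculation, assuming the two points implied by the deltas above, (2, 6) and (3, 9) (the points themselves are not labelled on the slide):

```python
# Slope of a straight line from two points: rise over run.
x1, y1 = 2, 6
x2, y2 = 3, 9
delta_y = y2 - y1            # 3, change in y
delta_x = x2 - x1            # 1, change in x
print(delta_y / delta_x)     # 3.0 -- identical between any two points on the line
```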
RECAP OF BASIC CONCEPTS
SLOPE

Case of a non-linear curve:
The slope is not constant; it varies depending on the point under observation.
How do we find the slope at a given point?
RECAP OF BASIC CONCEPTS
SLOPE

Zoom in on the point, and the curve will look like a straight line.
We can then use Delta y and Delta x to calculate the slope at that point.
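A minimal sketch of this "zoom in" idea, assuming an example curve f(x) = x**2 (the slide does not name a specific function): take a tiny Delta x and measure the resulting Delta y.

```python
# Approximate the slope of a curve at a point by zooming in:
# over a tiny Delta x the curve is nearly straight, so rise/run works again.

def f(x):
    return x ** 2            # example curve, chosen for illustration

def slope_at(x, dx=1e-6):
    delta_y = f(x + dx) - f(x)
    delta_x = dx
    return delta_y / delta_x

print(slope_at(3.0))         # ~6.0, matching the exact derivative 2*x at x = 3
```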
RECAP OF BASIC CONCEPTS
SLOPE AND DERIVATIVE
(Figure slides: graphs relating the slope of a curve to its derivative at a point.)
RECAP OF BASIC CONCEPTS
SLOPE AND DERIVATIVE
Geometrically, the derivative of a function can be interpreted as the slope of the graph of the function or, more precisely, as the slope of the tangent line at a point.

Its calculation, in fact, derives from the slope formula for a straight line, except that a limiting process must be used for curves.
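The "limiting process" mentioned here is the standard limit definition of the derivative; stated for reference (this formula is not reproduced on the slide itself):

```latex
f'(x) = \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x}
```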
RECAP OF BASIC CONCEPTS
SLOPE AND DERIVATIVE; PARTIAL DERIVATIVE
(Figure slides: derivative and partial derivative illustrations.)

RECAP OF BASIC CONCEPTS
PARTIAL DERIVATIVE AND GRADIENT DESCENT

The partial derivative tells us how much a change in each weight affects the cost function.
E.g. how much the price changes with the number of bedrooms, while the other features are held fixed.
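A minimal numerical sketch of this idea, assuming an illustrative two-weight cost function (not the housing example referred to on the slide): nudge one weight at a time while holding the other fixed.

```python
# Estimate partial derivatives by changing one weight at a time.

def cost(m, b):
    # illustrative bowl-shaped cost; the lecture's cost is the MSE of a regression line
    return (m - 2.0) ** 2 + (b + 1.0) ** 2

def partial_wrt_m(m, b, h=1e-6):
    return (cost(m + h, b) - cost(m, b)) / h   # b held fixed

def partial_wrt_b(m, b, h=1e-6):
    return (cost(m, b + h) - cost(m, b)) / h   # m held fixed

print(partial_wrt_m(0.0, 0.0))   # ~ -4.0: effect on the cost of changing m alone
print(partial_wrt_b(0.0, 0.0))   # ~  2.0: effect on the cost of changing b alone
```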
RECAP OF BASIC CONCEPTS
POSSIBLE LINEAR REGRESSION LINES
(Figure slide: candidate regression lines through the same data.)

MEAN SQUARE ERROR
(Figure slides: the mean square error of a candidate line.)
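The mean square error slides above are figure-only. For reference, a minimal Python sketch of the standard MSE of a candidate line y = m*x + b (the names xs, ys, m, b and the data are illustrative, not taken from the slides):

```python
# Mean square error: average squared vertical distance between the data
# points and the candidate line y = m*x + b.

def mse(m, b, xs, ys):
    n = len(xs)
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys)) / n

# Example: the line y = 2x + 3 fits this data exactly, so its MSE is 0.
xs = [1, 2, 3, 4, 5]
ys = [5, 7, 9, 11, 13]
print(mse(2.0, 3.0, xs, ys))   # 0.0
print(mse(1.0, 0.0, xs, ys))   # larger: a worse line has a larger MSE
```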
GRADIENT DESCENT

There are many possible regression lines we can create with different values of slope and intercept.

But we cannot try every permutation and combination of slope and intercept.

The efficient approach is to select the optimal line in the minimum number of iterations.

Gradient descent is the algorithm that finds the best-fit line for a given training dataset in a minimum number of iterations.
GRADIENT DESCENT
Slope and intercept are plotted against MSE.

Different values of m and b create a surface of cost values (bowl-shaped for MSE).

We start from some initial values of m and b, usually 0.

From that starting point the cost is calculated, and it is reduced with every mini-step.

We keep taking small steps until we reach the minimum point.

At the minimum, the error is the lowest (see the sketch below).
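A minimal sketch of the loop just described, assuming MSE as the cost and a tiny made-up dataset (the gradient formulas in the comments are the standard partial derivatives of MSE with respect to m and b, not copied from the slides):

```python
# Gradient descent for a line y = m*x + b, minimizing MSE.
# dJ/dm = -(2/n) * sum(x_i * (y_i - (m*x_i + b)))
# dJ/db = -(2/n) * sum(y_i - (m*x_i + b))

def gradient_descent(xs, ys, learning_rate=0.05, iterations=5000):
    m, b = 0.0, 0.0                          # start from m = 0, b = 0
    n = len(xs)
    for _ in range(iterations):
        errors = [y - (m * x + b) for x, y in zip(xs, ys)]
        dm = -(2.0 / n) * sum(x * e for x, e in zip(xs, errors))
        db = -(2.0 / n) * sum(errors)
        m -= learning_rate * dm              # mini-step against the gradient
        b -= learning_rate * db
    return m, b

# Illustrative data generated from y = 2x + 3:
xs = [1, 2, 3, 4, 5]
ys = [5, 7, 9, 11, 13]
print(gradient_descent(xs, ys))              # approaches (2.0, 3.0)
```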
GRADIENT DESCENT
(Figure slide: two views of the cost surface, one along m and one along b.)
GRADIENT DESCENT
If we take fixed-size steps, we can step over and miss the global minimum.
In that case, gradient descent will never converge.
GRADIENT DESCENT
Varying step sizes help us reach the global minimum.
GRADIENT DESCENT
Varying step sizes can be achieved by calculating the slope at each point: as the curve flattens near the minimum, the steps naturally become smaller.
The partial derivative (slope) also tells us in which direction we need to go.
GRADIENT DESCENT
Learning rate decides the step size.
GRADIENT DESCENT

Update rule:  θ1 := θ1 − α · (d/dθ1) J(θ1)

Here α is the step size (learning rate) and (d/dθ1) J(θ1) is the derivative term.

For simplicity, assume we have to minimize a function of one variable only, J(θ1).
GRADIENT DESCENT

J(θ1); θ1 is a real number.

Initialize θ1. The derivative term is the slope of the line tangent to the function at the point θ1.

Here the tangent line has a positive slope, so the derivative is positive: θ1 minus a positive number moves θ1 towards the left, downhill.
GRADIENT DESCENT

J(θ1); θ1 is a real number.

Here the tangent line has a negative slope, so the derivative is negative: θ1 minus a negative number adds something to θ1 and moves it towards the right, again downhill.
GRADIENT DESCENT

J(θ1); θ1 is a real number.

If α (alpha) is too small, gradient descent takes tiny steps and therefore needs a long time to converge.
GRADIENT DESCENT

J(θ1); θ1 is a real number.

If α is too large, gradient descent can overshoot the minimum and may fail to converge, or even diverge.
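A minimal sketch of these two failure modes, assuming the simple one-variable cost J(θ1) = θ1² with derivative 2·θ1 (chosen for illustration, not taken from the slides):

```python
# Effect of the learning rate alpha on J(theta) = theta**2.

def run(alpha, steps=20, theta=1.0):
    for _ in range(steps):
        theta -= alpha * 2 * theta   # update: theta := theta - alpha * dJ/dtheta
    return theta

print(run(alpha=0.01))   # ~0.67 after 20 steps: converging, but slowly
print(run(alpha=0.45))   # ~1e-20: converges quickly
print(run(alpha=1.10))   # ~38 and growing with more steps: overshoots and diverges
```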
GRADIENT DESCENT

J(θ1); θ1 is a real number.

If θ1 is already at the minimum, the derivative term is zero, so the update leaves θ1 unchanged: the algorithm converges in the first iteration.
