
Cost Function
We can measure the accuracy of our hypothesis function by using a cost function. This takes an average
(actually a fancier version of an average) of all the results of the hypothesis with inputs from x's compared to the
actual output y's.

J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left(\hat{y}_i - y_i\right)^2 = \frac{1}{2m} \sum_{i=1}^{m} \left(h_\theta(x_i) - y_i\right)^2

To break it apart, it is (1/2) x̄, where x̄ is the mean of the squares of hθ(xi) − yi, or the difference between the
predicted value and the actual value.

This function is otherwise called the "Squared error function", or "Mean squared error". The mean is halved (1/(2m))
as a convenience for the computation of the gradient descent, as the derivative term of the square function will
cancel out the 1/2 term.
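
As a rough, minimal sketch (not part of the course materials), the cost above can be computed in a few lines of Python with NumPy; the function name compute_cost and the toy data are assumptions made for this illustration.

```python
import numpy as np

def compute_cost(x, y, theta0, theta1):
    """Halved mean squared error J(theta0, theta1) for h(x) = theta0 + theta1 * x."""
    m = len(y)                            # number of training examples
    predictions = theta0 + theta1 * x     # h_theta(x_i) for every example
    squared_errors = (predictions - y) ** 2
    return squared_errors.sum() / (2 * m)

# Toy data lying exactly on y = 1 + 2x, so the matching parameters give J = 0.
x = np.array([1.0, 2.0, 3.0])
y = np.array([3.0, 5.0, 7.0])
print(compute_cost(x, y, 1.0, 2.0))  # 0.0 -- the line passes through every point
print(compute_cost(x, y, 0.0, 0.0))  # a larger cost for a poor fit
```

With parameters that fit the data perfectly the cost is exactly 0, which matches the J(θ0, θ1) = 0 case described below.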

Now we are able to concretely measure the accuracy of our predictor function against the correct results we have,
so that we can predict new results we don't have.

If we try to think of it in visual terms, our training data set is scattered on the x-y plane. We are trying to make a
straight line (defined by hθ(x)) which passes through this scattered set of data. Our objective is to get the best
possible line. The best possible line will be such that the average squared vertical distance of the scattered
points from the line is the least. In the best case, the line should pass through all the points of our training
data set. In such a case the value of J(θ0, θ1) will be 0.

ML: Gradient Descent
So we have our hypothesis function and we have a way of measuring how well it fits the data. Now we need
to estimate the parameters in the hypothesis function. That's where gradient descent comes in.

Imagine that we graph our hypothesis function based on its fields θ0 and θ1 (actually we are graphing the cost
function as a function of the parameter estimates). This can be kind of confusing; we are moving up to a higher
level of abstraction. We are not graphing x and y themselves, but the parameter range of our hypothesis function and
the cost resulting from selecting a particular set of parameters.

We put θ0 on the x axis and θ1 on the y axis, with the cost function on the vertical z axis. The points on our graph
will be the result of the cost function using our hypothesis with those specific theta parameters.

We will know that we have succeeded when our cost function is at the very bottom of the pits in our graph, i.e.
when its value is the minimum.
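
A minimal sketch of that idea in Python follows (the exact update rule is derived later in the course; the learning rate alpha, the iteration count, and the toy data here are assumptions for illustration). Starting from an arbitrary (θ0, θ1), we repeatedly step in the direction that lowers the cost until we settle near the bottom of a pit.

```python
import numpy as np

def gradient_descent(x, y, alpha=0.01, iterations=5000):
    """Batch gradient descent on J(theta0, theta1) for h(x) = theta0 + theta1 * x."""
    m = len(y)
    theta0, theta1 = 0.0, 0.0            # start anywhere on the cost surface
    for _ in range(iterations):
        error = (theta0 + theta1 * x) - y
        grad0 = error.sum() / m          # dJ/dtheta0
        grad1 = (error * x).sum() / m    # dJ/dtheta1
        theta0 -= alpha * grad0          # step "downhill" on the cost surface
        theta1 -= alpha * grad1
    return theta0, theta1

# Data generated from y = 1 + 2x; the returned parameters should approach (1.0, 2.0).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])
print(gradient_descent(x, y))
```

Each iteration moves the parameters a small step toward lower cost, so the returned values land close to the bottom of the cost surface for this data.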
