
Lecture 4: Cost Function
Recall the housing price prediction example:
• Find a price for a house of 1250 sq. ft.
• Draw a straight line through the data; the line predicts a price of around 220, i.e. $220,000.

[Figure: "Housing Prices" scatter plot of Price against Size (feet²); size axis runs 500–3000, price axis 0–500,000]
Training Set:

Size in feet² (x)   Price ($) in 1000's (y)
2104                460
1416                232
1534                315
852                 178
…                   …

Hypothesis: hθ(x) = θ0 + θ1x

θ0, θ1: parameters of the model
How do we choose θ0 and θ1?
Model Representation
• More formally, in supervised learning we have a data set, and this data set is called a training set.
• So for the housing prices example, we have a training set of different house sizes and prices, and our job is to learn from this data how to predict the prices of houses.
Notation:
• m = number of training examples
• x's = "input" variable / features
• y's = "output" variable / "target" variable
• (x, y) = one training example
• (x(i), y(i)) = the ith training example

Using the training set above:
• If we have the sizes and prices of 47 houses, what is m? 47
• What is x(2)? 1416
• What is y(2)? 232
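A minimal sketch of this notation in Python (the variable names are illustrative, not part of the lecture). Note that code indexing is 0-based, so the 2nd training example x(2) lives at index 1:

    # Training set from the table above (truncated to four examples)
    x = [2104, 1416, 1534, 852]   # sizes in feet^2 (inputs)
    y = [460, 232, 315, 178]      # prices in $1000's (targets)

    m = len(x)                    # m = number of training examples
    print(m)                      # -> 4 for this truncated table
    print(x[1], y[1])             # -> 1416 232, i.e. x(2) and y(2)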
Training Set → Learning Algorithm → h
size of house (x) → h → estimated price

• The training set is, for example, our training set of housing prices.
• We feed it to our learning algorithm.
• It is the job of the learning algorithm to output a function h.
• h is a function that takes the size of a house x as input and tries to output the estimated value of y for the corresponding house.
• h is a function that maps from x's to y's.
The Hypothesis Function h

hθ(x) = θ0 + θ1x

We choose values for θ0 and θ1 so that hθ(x) produces our estimated output y.
In other words, we are trying to create a function h(x) that is able to reliably map our input data (the x's) to our output data (the y's).
This model is called linear regression with one variable, or univariate linear regression.
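As a quick illustration, the hypothesis is a one-line Python function; the θ values below are arbitrary placeholders, not fitted parameters:

    def h(x, theta0, theta1):
        """Univariate linear regression hypothesis: h_theta(x) = theta0 + theta1 * x."""
        return theta0 + theta1 * x

    # Placeholder parameters theta0 = 50, theta1 = 0.12 (prices in $1000's):
    print(h(1250, 50, 0.12))  # -> 200.0, i.e. a predicted price of $200,000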
[Figure: the training data with the straight line hθ(x) = θ0 + θ1x drawn through it; size axis runs 500–3000, price axis 0–500,000]
• We have to minimize the difference between h(x) and y:
  (h(x) − y), where h(x) is what we are predicting and y is the actual output.
• Minimize the squared difference between the output of the hypothesis and the actual price of the house:
  (hθ(x) − y)²

Idea: Choose θ0, θ1 so that hθ(x) is close to y for our training examples (x, y).
Cost Function
• We can measure the accuracy of our hypothesis function by using a cost function.
• It takes the average of the squared differences between the hypothesis's outputs for the x's and the actual outputs y.
• This function is otherwise called the "squared error function", or mean squared error.
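Written out, the squared error cost is (the ½ factor follows the common lecture convention of halving the average so the later gradient is cleaner; plain MSE simply uses 1/m):

    J(θ0, θ1) = (1 / 2m) · Σ from i=1 to m of (hθ(x(i)) − y(i))²

A minimal sketch in Python, reusing the hypothesis form from above:

    def cost(theta0, theta1, xs, ys):
        """Squared error cost J(theta0, theta1) over m training examples.
        Uses 1/(2m) per the lecture convention; drop the 2 for plain MSE."""
        m = len(xs)
        return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

    xs = [2104, 1416, 1534, 852]   # sizes in feet^2
    ys = [460, 232, 315, 178]      # prices in $1000's
    print(cost(50, 0.12, xs, ys))  # cost of the placeholder parameters above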
“Cost” Function
• Cost/loss in machine learning helps us understand the difference between the predicted value and the actual value.
• The function used to quantify this loss during the training phase, in the form of a single real number, is known as the loss function.
• Loss function: used when we refer to the error for a single training example.
• Cost function: used to refer to the average of the loss functions over the entire training dataset.
Why use a Cost Function?

• The cost function helps us reach the optimal solution.
• The cost function is a technique for evaluating the performance of our algorithm/model.
• It takes both the outputs predicted by the model and the actual outputs and calculates how wrong the model was in its predictions.
• The more our predictions differ from the actual values, the higher the number it outputs.
Types of Cost Functions
• Regression cost functions
• Binary classification cost functions
• Multi-class classification cost functions
Regression Cost Functions
• A cost function used in a regression problem is called a regression cost function.
• It is calculated from the distance-based error:
  Error = y − y′
  where y is the actual output and y′ is the predicted output.
Mean Error
• In this cost function, the error for each training example is calculated and then the mean of all these errors is taken.
• Calculating the mean of the errors is the simplest and most intuitive approach possible.
• However, the errors can be both negative and positive, so they can cancel each other out during summation, giving a zero mean error even for a poor model.
• Thus it is not a recommended cost function, but it lays the foundation for the other regression cost functions.
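A tiny sketch of why mean error is misleading (the numbers are made up for illustration):

    # Two predictions: one 100 too high, one 100 too low.
    actual    = [200, 300]
    predicted = [300, 200]

    errors = [a - p for a, p in zip(actual, predicted)]  # [-100, 100]
    print(sum(errors) / len(errors))  # -> 0.0, even though both predictions are badly off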
Mean Squared Error
• This improves on the drawback we encountered with mean error above. Here the square of the difference between the actual and predicted value is calculated, which removes any possibility of negative error.
• It is measured as the average of the squared differences between predictions and actual observations:

  MSE = (sum of squared errors) / n

• It is also known as L2 loss.
• In MSE, since each error is squared, even small deviations in prediction are penalized more heavily than in MAE. But if our dataset has outliers that contribute large prediction errors, squaring those errors magnifies them many times over and leads to a much higher MSE.
• Hence we can say that MSE is less robust to outliers.
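A minimal MSE sketch in Python (plain 1/n averaging, matching the formula above; the numbers are made up):

    def mse(actual, predicted):
        """Mean squared error (L2 loss): average of squared differences."""
        return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

    print(mse([200, 300], [190, 310]))   # -> 100.0
    print(mse([200, 300], [190, 1000]))  # -> 245050.0: one outlier dominates the score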
Mean Absolute Error
• This cost function addresses the shortcoming of mean error in a different way. Here the absolute difference between the actual and predicted value is calculated, which removes any possibility of negative error.
• MAE is measured as the average of the absolute differences between predictions and actual observations:

  MAE = (sum of absolute errors) / n

• It is also known as L1 loss.
• It is robust to outliers, so it gives better results even when our dataset has noise or outliers.
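The matching MAE sketch, reusing the same made-up numbers, shows it reacts far less to the outlier than MSE does because the error grows linearly rather than quadratically:

    def mae(actual, predicted):
        """Mean absolute error (L1 loss): average of absolute differences."""
        return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

    print(mae([200, 300], [190, 310]))   # -> 10.0
    print(mae([200, 300], [190, 1000]))  # -> 355.0, versus 245050.0 for MSE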
