Revised-L3-Linear Regression
Vinh Vo
Outline
• Introductory problems
• Linear regression: theoretical review
– Model representation
– Cost function
– Gradient descent
– Evaluation metrics
• Case study: House Price in Boston
• Discussion on linear regression
• Quizzes and Exercises
Many slides in this lecture are adapted from the Machine Learning course (Linear Regression) by Prof. Andrew Ng.
Outline
• Introductory problems
• Linear regression: theoretical review
– Model representation
– Cost function
– Gradient descent
– Evaluation metrics
• Case study: House Price in Boston
• Discussion on linear regression
• Quizzes and Exercises
Introductory Problems
Problem 1: Weight Prediction
Problem 2: House Price Prediction
[Scatter plot: house price (in $1000) vs. size (m²)]
Outline
• Introductory problems
• Linear regression
– Model representation
– Cost function
– Gradient descent
– Evaluation metrics
• Case study: House Price in Boston
• Discussion on linear regression
• Quizzes and Exercises
Model Representation
• Objective: quantitatively formulate the relationship between price and size
• Assume that house price is linearly proportional to house size
• Find the line that best fits the dataset. Mathematically:
– price = θ₀ + θ₁ · size
– h_θ(x) = θ₀ + θ₁x
– Goal: find θ₀ and θ₁ such that h_θ(x) "best fits" the dataset
• Terminology:
– h_θ(x): model or hypothesis
– θ₀ and θ₁: parameters of the model h_θ(x)
[Scatter plot: House Price (in $1000) vs. Size (m²); the ith training example is the point (x⁽ⁱ⁾, y⁽ⁱ⁾)]
Examples of the hypothesis for different parameter values:
– θ₀ = 1.5, θ₁ = 0: h_θ(x) = 1.5 + 0x
– θ₀ = 0, θ₁ = 0.5: h_θ(x) = 0 + 0.5x
– θ₀ = 1, θ₁ = 0.5: h_θ(x) = 1 + 0.5x
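The effect of the two parameters can be sketched in a few lines of Python (a minimal illustration; the function name `hypothesis` is our own, not from the slides):

```python
# Hypothesis of univariate linear regression: h_theta(x) = theta0 + theta1 * x
def hypothesis(theta0, theta1, x):
    return theta0 + theta1 * x

# The three parameter settings shown in the panels above, evaluated at x = 2
for theta0, theta1 in [(1.5, 0.0), (0.0, 0.5), (1.0, 0.5)]:
    print(f"theta0={theta0}, theta1={theta1} -> h(2) = {hypothesis(theta0, theta1, 2)}")
```

θ₀ shifts the line up and down (the intercept), while θ₁ controls its slope.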
Parameters: θ₀, θ₁
Cost function: J(θ₀, θ₁) = (1/2m) · Σᵢ₌₁..ₘ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾)²
Goal: minimize J(θ₀, θ₁) over θ₀, θ₁
[Left: scatter plot of house price (in $1000) vs. size (m²) with a candidate line. Right: surface plot of J(θ₀, θ₁), showing the minimum value of J(θ₀, θ₁) and a local minimum.]
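The squared-error cost and one gradient descent update can be sketched in plain Python (a minimal illustration with hypothetical function names; it uses the standard cost J(θ₀, θ₁) = (1/2m)·Σ(h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾)² and the simultaneous-update rule from the course this lecture is adapted from):

```python
# Squared-error cost: J(theta0, theta1) = 1/(2m) * sum((h(x_i) - y_i)^2)
def cost(theta0, theta1, xs, ys):
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

# One gradient descent update with learning rate alpha (simultaneous update of both parameters)
def gradient_step(theta0, theta1, xs, ys, alpha=0.1):
    m = len(xs)
    d0 = sum(theta0 + theta1 * x - y for x, y in zip(xs, ys)) / m
    d1 = sum((theta0 + theta1 * x - y) * x for x, y in zip(xs, ys)) / m
    return theta0 - alpha * d0, theta1 - alpha * d1

# Tiny example: points exactly on the line y = 1 + 0.5x
xs, ys = [0, 1, 2, 3], [1.0, 1.5, 2.0, 2.5]
print(cost(1.0, 0.5, xs, ys))  # 0.0 at the optimum

# Starting from (0, 0), repeated steps converge toward theta0 = 1, theta1 = 0.5
t0, t1 = 0.0, 0.0
for _ in range(1000):
    t0, t1 = gradient_step(t0, t1, xs, ys)
print(t0, t1)
```

Because J is convex for linear regression, gradient descent with a suitably small learning rate converges to the global minimum here.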
Outline
• Introductory problems
• Linear regression: theoretical review
– Model representation
– Cost function
– Gradient descent
– Evaluation metrics
• Case study: House Price in Boston
• Disadvantages of linear regression
• Quizzes and Exercises
Linear Regression: Evaluation Metrics
[Diagram: Training Set {(x⁽ⁱ⁾, y⁽ⁱ⁾)} → Learning Algorithm → hypothesis h_θ; size of house x⁽ⁱ⁾ → h_θ → estimated price h_θ(x⁽ⁱ⁾)]
Evaluation metrics: how good is h_θ(x)?
• RMSE (Root Mean Square Error): the square root of the average squared error between the predicted value h_θ(x⁽ⁱ⁾) and the true value y⁽ⁱ⁾. 0 ≤ RMSE < ∞; the closer to 0, the better.
• R²: how much of the total variance of the dependent variable y is explained by the (x, y) relationship. 0 ≤ R² ≤ 1; the closer to 1, the better.
RMSE and R²
RMSE = √( (1/m) · Σᵢ₌₁..ₘ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾)² )
R² = 1 − Σᵢ₌₁..ₘ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾)² / Σᵢ₌₁..ₘ (y⁽ⁱ⁾ − ȳ)²
– numerator: the sum of the squared errors/residuals (SSE)
– denominator: the total squared distance of each observation from the mean (TSS)
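Both metrics are straightforward to compute from their definitions (a plain-Python sketch; the function names are our own):

```python
import math

# RMSE: square root of the mean squared error
def rmse(y_true, y_pred):
    m = len(y_true)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / m)

# R^2 = 1 - SSE / TSS
def r_squared(y_true, y_pred):
    mean_y = sum(y_true) / len(y_true)
    sse = sum((p - t) ** 2 for p, t in zip(y_pred, y_true))  # sum of squared residuals
    tss = sum((t - mean_y) ** 2 for t in y_true)             # total squared distance from the mean
    return 1 - sse / tss

y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
print(rmse(y_true, y_pred))       # ~0.158
print(r_squared(y_true, y_pred))  # ~0.98
```

Note that RMSE is in the same units as y (here, $1000), while R² is unitless, which is why the two are usually reported together.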
[Scatter plot of the observed data points]
Linear Regression: Example (3/3)
R² = 1 − SSE/TSS = 1 − 0.975/47.369 ≈ 0.98
Outline
• Introductory problems
• Linear regression: theoretical review
– Model representation
– Cost function
– Gradient descent
– Evaluation metrics
• Case study: House Price in Boston
• Discussion on linear regression
• Quizzes and Exercises
Case study: The dataset of house price in Boston
# data instance: 506
# attributes per instance: 14
[Table of the dataset: input features and the predicted variable MV (or MEDV)]
Multivariable regression: h_θ(x) = θ₀ + θ₁x₁ + ⋯ + θₙxₙ
Explore the data
Data instances Statistics
Multivariable Linear Regression (1/4)
Model: h_θ(x) = θ₀ + θ₁x₁ + ⋯ + θₙxₙ
[Table: coefficients of the fitted model h_θ(x)]
Results:
– Training set (404 instances): RMSE = 19.33, R² ≅ 0.77
– Testing set (102 instances): RMSE = 33.45, R² ≅ 0.59
[Plot of predicted vs. true prices, with outliers marked]
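The case-study workflow can be sketched with scikit-learn (a hypothetical sketch: the Boston housing dataset has been removed from recent scikit-learn releases, so synthetic data with the same shape, 506 instances and 13 features, stands in for it; the 404/102 split mirrors the slides):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Boston data: 506 instances, 13 input features
rng = np.random.default_rng(0)
X = rng.normal(size=(506, 13))
true_theta = rng.normal(size=13)
y = X @ true_theta + 3.0 + rng.normal(scale=0.5, size=506)  # linear signal plus noise

# Same sizes as the slides: 404 training instances, 102 testing instances
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=102, random_state=0)

# Fit the multivariable model h_theta(x) = theta0 + theta1*x1 + ... + thetan*xn
model = LinearRegression().fit(X_train, y_train)

# Evaluate on the held-out test set
y_hat = model.predict(X_test)
rmse_test = mean_squared_error(y_test, y_hat) ** 0.5
r2_test = r2_score(y_test, y_hat)
print(f"test RMSE = {rmse_test:.2f}, test R^2 = {r2_test:.2f}")
```

As in the slides, the gap between training and testing scores is what reveals how well the model generalizes; evaluating only on the training set would overstate its quality.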
What we have learned so far
• Introductory problems
• Linear regression: a theoretical view
– Model representation
– Cost function
– Gradient descent
– Evaluation metrics
• Case study: House Price in Boston
• Disadvantages of linear regression:
– Sensitive to outliers
– Not good for non-linear relationships between 𝑥 and 𝑦
• Quizzes and Exercises
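The first disadvantage listed above can be seen numerically: a single outlier noticeably shifts a least-squares fit (a minimal sketch using the closed-form univariate solution; the function name is our own):

```python
# Closed-form least-squares fit of y = theta0 + theta1 * x
def fit_line(xs, ys):
    m = len(xs)
    mean_x, mean_y = sum(xs) / m, sum(ys) / m
    theta1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
             sum((x - mean_x) ** 2 for x in xs)
    theta0 = mean_y - theta1 * mean_x
    return theta0, theta1

xs = [0, 1, 2, 3, 4]
ys = [0, 1, 2, 3, 4]                 # perfectly linear data: y = x
print(fit_line(xs, ys))              # -> (0.0, 1.0)
print(fit_line(xs, ys[:-1] + [20]))  # one outlier drags the slope from 1.0 up to ~4.2
```

Because the cost squares each residual, a single large error dominates the sum, which is exactly why linear regression is sensitive to outliers.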
THE END