
Topic: Linear Regression

Date: 12 September 2024

Vinh Vo
Outline
• Introductory problems
• Linear regression: theoretical review
– Model representation
– Cost function
– Gradient descent
– Evaluation metrics
• Case study: House Price in Boston
• Discussion on linear regression
• Quizzes and Exercises

Many slides in this lecture are adapted from the Machine Learning course (Linear Regression) by Prof. Andrew Ng
Introductory Problems

Problem 1: Weight Prediction
✓ A healthcare center recorded a dataset of the height and weight of 500 people.
✓ Goal: predict the weight of a new person, given his/her height.

Problem 2: House Price Prediction
✓ There is a dataset of 506 houses in Boston. Each house has 14 attributes: price, size, teacher–pupil ratio, etc.
✓ Goal: predict the price of a new house using the given attributes in the dataset.
Introductory Problems - Data Visualization

[Figure: two scatter plots — Problem 1: weight vs. height; Problem 2: house price (in $1000) vs. size (m²), with sizes ranging from 0 to 3000 m².]
Outline
• Introductory problems
• Linear regression
– Model representation
– Cost function
– Gradient descent
– Evaluation metrics
• Case study: House Price in Boston
• Discussion on linear regression
• Quizzes and Exercises
Model Representation
✓ Objective: quantitatively formulate the relationship between price and size
✓ Assume that house price is linearly proportional to house size
✓ Find the line that approximately best fits the dataset. Mathematically:
  Ø $\text{price} = \theta_0 + \theta_1 \cdot \text{size}$
  Ø $h_\theta(x) = \theta_0 + \theta_1 x$
  Ø Goal: find $\theta_0$ and $\theta_1$ such that $h_\theta(x)$ "best fits" the dataset
✓ Terminology:
  Ø $h_\theta(x)$: the model or hypothesis
  Ø $\theta_0$ and $\theta_1$: the parameters of the model $h_\theta(x)$

[Figure: scatter plot of house price (in $1000) vs. size (m²); each point is one training example $(x^{(i)}, y^{(i)})$.]

Supervised Learning: the "right answer" (label) for each example in the dataset is given.
Regression Problem: the predicted value (output), here the house price, is a real number.

Training set of housing prices ($m$ training examples):

  Size in m² ($x$)   Price in $1000 ($y$)
  2104               460    ← $(x^{(1)}, y^{(1)})$
  1416               232
  …                  …
  852                178    ← $(x^{(i)}, y^{(i)})$
  …                  …
Notations and Terminologies:
  m = number of training examples
  x's = "input" variable / feature / independent variable
  y's = "output" variable / target / dependent variable
  (x, y): one training example / data point / data instance
  $(x^{(i)}, y^{(i)})$: the i-th training example
Problem Summary

  Training Set $(x^{(i)}, y^{(i)})$ → Learning Algorithm → Hypothesis $h_\theta$
  Size of house $x$ → $h_\theta(x)$ → Estimated price

How do we represent $h_\theta(x)$? As a straight line through the data:
  $h_\theta(x) = \theta_0 + \theta_1 x$ (shorthand: $h(x)$)

Goal: find $\theta_0$ and $\theta_1$ such that $h_\theta(x)$ "best fits" the dataset.

This is called Linear Regression with one variable (univariate): $h_\theta(x)$ maps from x's to y's.
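To make the hypothesis concrete, here is a minimal Python sketch (the function name `h` is ours, not from the lecture):

```python
def h(x, theta0, theta1):
    """Univariate linear hypothesis: h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

# One of the parameter settings plotted on the following slides: theta0 = 1, theta1 = 0.5
print(h(2.0, 1.0, 0.5))  # -> 2.0
```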
Outline
• Introductory problems
• Linear regression
– Model representation
– Cost function
– Gradient descent
– Evaluation metrics
• Case study: House Price in Boston
• Discussion on linear regression
• Quizzes and Exercises
Recall the training set of housing prices ($m$ training examples):

  Size in m² ($x$)   Price in $1000 ($y$)
  2104               460    ← $(x^{(1)}, y^{(1)})$
  1416               232
  …                  …
  852                178    ← $(x^{(i)}, y^{(i)})$
  …                  …

Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$, where the $\theta_i$'s are the parameters.

How do we choose the $\theta_i$'s? What does "best fit" mean?
Let's plot $h_\theta(x)$ with some values of $\theta_0$ and $\theta_1$ (next slide).
$h_\theta(x) = \theta_0 + \theta_1 x$

[Figure: three example lines plotted over $0 \le x \le 3$]
  Ø $\theta_0 = 1.5$, $\theta_1 = 0$:   $h_\theta(x) = 1.5 + 0x$ (a horizontal line)
  Ø $\theta_0 = 0$, $\theta_1 = 0.5$:   $h_\theta(x) = 0 + 0.5x$
  Ø $\theta_0 = 1$, $\theta_1 = 0.5$:   $h_\theta(x) = 1 + 0.5x$
How do we determine $h(x)$ in general?

Find $\theta_0$ and $\theta_1$ that minimize the squared errors $\left(h(x^{(i)}) - y^{(i)}\right)^2$, $1 \le i \le m$, where the error $h(x^{(i)}) - y^{(i)}$ is the difference between the predicted value and the true value:

$$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left(h(x^{(i)}) - y^{(i)}\right)^2 = \frac{1}{2m} \sum_{i=1}^{m} \left(\theta_0 + \theta_1 x^{(i)} - y^{(i)}\right)^2$$

$J(\theta_0, \theta_1)$ is the cost function (the mean squared error, scaled by 1/2).

Idea: choose $\theta_0$ and $\theta_1$ so that $h(x^{(i)})$ is close to $y^{(i)}$ for all training examples $(x^{(i)}, y^{(i)})$.

Find $\theta_0$, $\theta_1$ such that $J(\theta_0, \theta_1)$ is minimum: least-squares regression.
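A minimal NumPy sketch of this cost function (the helper name `cost` is ours):

```python
import numpy as np

def cost(theta0, theta1, x, y):
    """J(theta0, theta1) = (1 / 2m) * sum_i (theta0 + theta1 * x_i - y_i)^2."""
    m = len(x)
    errors = theta0 + theta1 * x - y  # predicted value minus true value, per example
    return np.sum(errors ** 2) / (2 * m)
```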
What does the cost function look like? A simplified form

  Hypothesis:    $h_\theta(x) = \theta_0 + \theta_1 x$
  Parameters:    $\theta_0$, $\theta_1$
  Cost function: $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$
  Goal:          minimize $J(\theta_0, \theta_1)$

To illustrate the graph of $J(\theta_0, \theta_1)$, consider its simplified form with $\theta_0 = 0$:

$$h_\theta(x) = \theta_1 x, \qquad J(\theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left(\theta_1 x^{(i)} - y^{(i)}\right)^2$$
Hypothesis: $h_\theta(x) = \theta_1 x$ (for a fixed $\theta_1$, a function of $x$).
Cost function: $J(\theta_1)$ (a function of the parameter $\theta_1$).

Example with $m = 3$ training points: $(1, 1)$, $(2, 2)$, $(3, 3)$.

  Ø $\theta_1 = 1$: $h(x) = x$ passes through all three points, so $J(1) = \frac{1}{2 \cdot 3}(0^2 + 0^2 + 0^2) = 0$
  Ø $\theta_1 = 0.5$: $h(x) = 0.5x$, so $J(0.5) = \frac{1}{6}\left((0.5 - 1)^2 + (1 - 2)^2 + (1.5 - 3)^2\right) = \frac{3.5}{6} \approx 0.58$
  Ø $\theta_1 = 0$: $h(x) = 0$, so $J(0) = \frac{1}{6}(1 + 4 + 9) \approx 2.33$
  Ø Similarly, $J(1.5) \approx 0.58$ and $J(2) \approx 2.33$

[Figure: left, the data points with the lines $h(x) = x$, $h(x) = 0.5x$, $h(x) = 0$; right, the resulting bowl-shaped curve of $J(\theta_1)$ over $-0.5 \le \theta_1 \le 2.5$.]

The minimum is $J(\theta_1) = 0$, attained at $\theta_1 = 1$.
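These values are easy to reproduce with the `cost` helper sketched earlier (a usage check, not part of the original slides):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])

for theta1 in [0.0, 0.5, 1.0, 1.5, 2.0]:
    J = np.sum((theta1 * x - y) ** 2) / (2 * len(x))
    print(f"J({theta1}) = {J:.2f}")  # 2.33, 0.58, 0.00, 0.58, 2.33
```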
What is the story so far?

  Hypothesis:    $h_\theta(x) = \theta_0 + \theta_1 x$
  Parameters:    $\theta_0$, $\theta_1$
  Cost function: $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$
  Goal:          minimize $J(\theta_0, \theta_1)$ over $\theta_0$, $\theta_1$

General form of $J(\theta_0, \theta_1)$: a surface in $\mathbb{R}^3$

[Figure: left, the hypothesis $h_\theta(x)$ for fixed $\theta_0$, $\theta_1$ plotted over the housing data (price vs. size); right, the cost function $J(\theta_0, \theta_1)$ as a 3D surface over the parameters, with its minimum value marked.]

The minimum point can be determined by an algorithm called gradient descent.


Outline
• Introductory problems
• Linear regression: theoretical review
– Model representation
– Cost function
– Gradient descent
– Evaluation metrics
• Case study: House Price in Boston
• Discussion on linear regression
• Quizzes and Exercises
General Form of Gradient Descent
✓ Have some function $J(\theta_0, \theta_1, \ldots, \theta_n)$
✓ Want $\theta_0, \theta_1, \ldots, \theta_n$ such that $J(\theta_0, \theta_1, \ldots, \theta_n)$ is minimum
✓ Idea (see the sketch after this list):
  – Start with some $\theta_0, \theta_1, \ldots, \theta_n$ (for example $\theta_0 = 0$, $\theta_1 = 0.62$, …, $\theta_n = 0.89$)
  – Keep changing $\theta_0, \theta_1, \ldots, \theta_n$ to reduce $J(\theta_0, \theta_1, \ldots, \theta_n)$ until we hopefully end up at a minimum
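The slide gives the idea only; the standard update that implements "keep changing θ to reduce J" is $\theta_j := \theta_j - \alpha \, \partial J / \partial \theta_j$ for a learning rate $\alpha$. A minimal sketch for the univariate cost above (the learning rate and iteration count are illustrative choices of ours):

```python
import numpy as np

def gradient_descent(x, y, alpha=0.1, n_iters=2000):
    """Minimize J(theta0, theta1) for h(x) = theta0 + theta1 * x."""
    m = len(x)
    theta0, theta1 = 0.0, 0.0             # an arbitrary starting point
    for _ in range(n_iters):
        errors = theta0 + theta1 * x - y  # h(x_i) - y_i for every example
        grad0 = np.sum(errors) / m        # dJ/dtheta0
        grad1 = np.sum(errors * x) / m    # dJ/dtheta1
        theta0 -= alpha * grad0           # simultaneous update of both parameters
        theta1 -= alpha * grad1
    return theta0, theta1

# On the toy data (1,1), (2,2), (3,3) this converges toward theta0 = 0, theta1 = 1.
x = np.array([1.0, 2.0, 3.0]); y = np.array([1.0, 2.0, 3.0])
print(gradient_descent(x, y))
```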
Gradient Descent in 3D: $J(\theta_0, \theta_1)$

[Figure: the surface of $J(\theta_0, \theta_1)$; starting from two different points, gradient descent follows two different downhill paths and can end up at two different local minima.]
Outline
• Introductory problems
• Linear regression: theoretical review
– Model representation
– Cost function
– Gradient descent
– Evaluation metrics
• Case study: House Price in Boston
• Disadvantages of linear regression
• Quizzes and Exercises
Linear Regression: Evaluation Metrics

How good is the learned hypothesis $h_\theta(x)$? Two common evaluation metrics:

✓ RMSE (Root Mean Square Error): the square root of the average squared error between the predicted value $h_\theta(x^{(i)})$ and the true value $y^{(i)}$. $0 \le RMSE < \infty$; the closer to 0, the better.

✓ $R^2$: how much of the total variance of the dependent variable $y$ is explained by the $(x, y)$ relationship. $0 \le R^2 \le 1$; the closer to 1, the better.
RMSE and $R^2$

$$RMSE = \sqrt{\frac{1}{m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right)^2}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right)^2}{\sum_{i=1}^{m} \left(y^{(i)} - \bar{y}\right)^2}$$

where the numerator of the $R^2$ fraction is the sum of squared errors (residuals), the denominator is the total squared distance of each observation from the mean, and $\bar{y}$ is the mean of the observations in the dataset.
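A small NumPy sketch of both metrics (the helper names are ours):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between predictions and true values."""
    return np.sqrt(np.mean((y_pred - y_true) ** 2))

def r2(y_true, y_pred):
    """R^2 = 1 - (sum of squared residuals) / (total sum of squares)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - ss_res / ss_tot
```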


Analytical Solution by Calculus

One way to find the minimum of $J$ is to set its partial derivatives equal to zero (a); solving that system yields the closed-form solution (b).

(a)  $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left(\theta_0 + \theta_1 x^{(i)} - y^{(i)}\right)^2, \qquad \frac{\partial J(\theta_0, \theta_1)}{\partial \theta_0} = 0, \qquad \frac{\partial J(\theta_0, \theta_1)}{\partial \theta_1} = 0$

(b)  $\theta_1 = \dfrac{m \sum xy - \sum x \sum y}{m \sum x^2 - \left(\sum x\right)^2}, \qquad \theta_0 = \dfrac{\sum y - \theta_1 \sum x}{m}$
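Solution (b) transcribes directly into Python (the helper name is ours):

```python
import numpy as np

def fit_closed_form(x, y):
    """Least-squares line via the closed-form solution (b)."""
    m = len(x)
    theta1 = (m * np.sum(x * y) - np.sum(x) * np.sum(y)) / \
             (m * np.sum(x ** 2) - np.sum(x) ** 2)
    theta0 = (np.sum(y) - theta1 * np.sum(x)) / m
    return theta0, theta1
```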
Linear Regression: Example (1/3)

  x       y       x²      xy
  1.20    4.00    1.44    4.80
  2.30    5.60    5.29    12.88
  3.10    7.90    9.61    24.49
  3.40    8.00    11.56   27.20
  4.00    10.10   16.00   40.40
  4.60    10.40   21.16   47.84
  5.50    12.00   30.25   66.00
  ─────   ─────   ─────   ──────
  24.10   58.00   95.31   223.61   (column sums, m = 7)

$$\theta_1 = \frac{7 \cdot 223.61 - 24.1 \cdot 58}{7 \cdot 95.31 - 24.1^2} = \frac{1565.27 - 1397.80}{667.17 - 580.81} = \frac{167.47}{86.36} \approx 1.94$$

$$\theta_0 = \frac{\sum y - \theta_1 \sum x}{m} = \frac{58 - 1.94 \cdot 24.1}{7} \approx 1.61$$

True relationship: $y = 2x + 1.5$.  Fitted line: $\hat{y} = 1.94x + 1.61$.
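Running `fit_closed_form` on this table reproduces the fit (a usage check, not from the slides):

```python
import numpy as np

x = np.array([1.20, 2.30, 3.10, 3.40, 4.00, 4.60, 5.50])
y = np.array([4.00, 5.60, 7.90, 8.00, 10.10, 10.40, 12.00])
print(fit_closed_form(x, y))  # approximately (1.61, 1.94)
```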


Linear Regression: Example (2/3)

[Figure: scatter plot of the observed (x, y) points from the table; x ranges from 0 to 6 and y from 0 to 14.]
Linear Regression: Example (3/3)

$$R^2 = 1 - \frac{SSE}{TSS} = 1 - \frac{0.975}{47.369} \approx 0.98$$
Outline
• Introductory problems
• Linear regression: theoretical review
– Model representation
– Cost function
– Gradient descent
– Evaluation metrics
• Case study: House Price in Boston
• Discussion on linear regression
• Quizzes and Exercises
Case Study: The Dataset of House Prices in Boston
  # data instances: 506
  # attributes per instance: 14

[Table of the dataset: 13 input features per instance plus the predicted variable MV (also called MEDV).]

Multivariable regression:
  $h_\theta(x) = \theta_0 + \theta_1 x_1 + \cdots + \theta_n x_n$

Explore the data: inspect the data instances and their summary statistics.
Multivariable Linear Regression (1/4)

Model: $h_\theta(x) = \theta_0 + \theta_1 x_1 + \cdots + \theta_n x_n$ (the fitted coefficients of $h_\theta(x)$ are omitted here).

Results:
  Training set (404 instances): RMSE = 19.33, $R^2 \approx 0.77$
  Testing set (102 instances):  RMSE = 33.45, $R^2 \approx 0.59$

This model does not seem good → try to re-select features for better performance → feature engineering.
Multivariable Linear Regression (2/4)

The correlation matrix between attributes (each entry lies in the range $[-1, 1]$). Observations:
  Ø MEDV and RM: strong positive correlation (0.7)
  Ø MEDV and LSTAT: strong negative correlation (-0.74)
  Ø Multicollinearity: the features RAD and TAX have a strong correlation of 0.91 → do not select both of these features together
  Ø RM and LSTAT should be chosen as features
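A typical way to produce such a matrix with pandas; the file name boston.csv is an assumption about how the dataset is stored locally:

```python
import pandas as pd

# Assumed local copy of the Boston housing data with columns RM, LSTAT, MEDV, ...
df = pd.read_csv("boston.csv")
corr = df.corr()                   # pairwise correlations, every entry in [-1, 1]
print(corr["MEDV"].sort_values())  # how strongly each attribute correlates with MEDV
```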


Multivariable Linear Regression (3/4)

[Figure: pairwise relationships between MEDV and the other attributes — MEDV vs. LSTAT shows the strong negative correlation (-0.74), MEDV vs. RM the strong positive correlation (0.7).]

→ Try RM and LSTAT as features.
Multivariable Linear Regression (4/4)

506 data instances: 404 for training, 102 for testing.

                                     RMSE                   R²
                               Training  Testing     Training  Testing
  All 13 attributes as input    19.33     33.45        0.77     0.59
  Only RM and LSTAT as input     5.64      5.14        0.63     0.66

For a particular problem we may choose appropriate features when building the model → there is no general recipe; this process is an art!
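A sketch of the two-feature experiment with scikit-learn, again assuming a local boston.csv with columns including RM, LSTAT, and MEDV; the exact RMSE and R² values depend on the train/test split:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

df = pd.read_csv("boston.csv")  # assumed local copy of the dataset
X = df[["RM", "LSTAT"]]         # the two selected features
y = df["MEDV"]                  # target: median house value

# Hold out 102 of the 506 instances for testing, as in the slides.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=102, random_state=0)

model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)
print("RMSE:", np.sqrt(mean_squared_error(y_test, pred)))
print("R^2 :", r2_score(y_test, pred))
```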
Linear Regression: An Overall Story

TRAINING PHASE:
  (1) Raw data → (2) Feature extraction / feature engineering (data exploration, data visualization, correlation matrix, …) → (3) Extracted features: training set $(x^{(i)}, y^{(i)})$ → (4) Learning algorithm: gradient descent, $\min_\theta J(\theta)$

TESTING PHASE:
  (5) Testing set $(x^{(t)}, y^{(t)})$ → model $h_\theta(x)$ → estimated value $h_\theta(x^{(t)})$

Model: $h_\theta(x) = \theta_0 + \theta_1 x_1 + \cdots + \theta_n x_n$, with $h_\theta(x) \in \mathbb{R}$.
Evaluation metrics: RMSE, $R^2$.
Outline
• Introductory problems
• Linear regression: theoretical review
– Model representation
– Cost function
– Gradient descent
– Evaluation metrics
• Case study: House Price in Boston
• Discussion on linear regression
• Quizzes and Exercises
Disadvantages of Linear Regression
  Ø Sensitive to outliers
  Ø Does not work well for non-linear relationships

[Figure: left, a fitted line pulled away from the main trend by a few outliers; right, a straight line failing to capture a non-linear relationship.]
What we have learned so far
• Introductory problems
• Linear regression: a theoretical view
– Model representation
– Cost function
– Gradient descent
– Evaluation metrics
• Case study: House Price in Boston
• Disadvantages of linear regression:
– Sensitive to outliers
– Not good for non-linear relationships between 𝑥 and 𝑦
• Quizzes and Exercises
THE END
