A07 Linear Regression v2 2up
Mehul Motani
Electrical & Computer Engineering
National University of Singapore
Email: [email protected]
B. Sikdar
11/17/10
Machine Learning Taxonomy
• Machine learning is function approximation
• Supervised learning – Access to labeled dataset
• Unsupervised learning – Dataset is not labeled
• Classification – The output is categorical
• Regression – The output is continuous
• Many different ML models are available
• Simplest model is Linear Regression
– Find the best fit line which goes through the data points
– Linear regression is supervised learning
© Mehul Motani Linear Regression 2
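The best-fit-line idea above can be sketched in a few lines. This is a minimal illustration on an assumed toy dataset (the x and y values below are invented for the example), using NumPy's least-squares polynomial fit of degree 1.

```python
# Minimal sketch of simple linear regression on an assumed toy dataset:
# fit the best-fit line y = b0 + b1*x by least squares.
import numpy as np

# Hypothetical supervised dataset: inputs x with continuous labels y.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Degree-1 polynomial fit returns (slope, intercept).
b1, b0 = np.polyfit(x, y, deg=1)
y_hat = b0 + b1 * x  # predictions of the fitted line on the inputs
print(b0, b1)
```

Because the labels y are given, this is supervised learning; because the output is continuous, it is regression rather than classification.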
Example – Tire tread vs Mileage
How are tire tread wear and mileage related?
[Figure: scatter plot of tire tread (dependent variable, output) versus mileage (independent variable, input)]
A. Seyedi
11/17/10
Linear Regression - Best Fit Line
• Linear Regression - Fit data with the “best”
hyperplane which goes through the data points
[Figure: degree-M polynomial fits illustrating a bad fit (underfitting), a good fit, and over-fitting, compared via the root-mean-square (RMS) error]
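The bad-fit / good-fit / over-fitting trade-off can be reproduced numerically. This sketch uses assumed synthetic data (a noisy sine curve, a common illustration) and compares the training RMS error of polynomial fits at a few illustrative degrees M.

```python
# Sketch: fit degree-M polynomials to assumed noisy data and compare
# training RMS error. Low M underfits; high M drives training error
# toward zero (over-fitting the noise).
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, size=x.size)

errors = {}
for M in (0, 1, 3, 9):
    coeffs = np.polyfit(x, y, deg=M)          # degree-M least-squares fit
    y_hat = np.polyval(coeffs, x)
    errors[M] = np.sqrt(np.mean((y - y_hat) ** 2))  # RMS error
    print(f"M={M}: training RMS = {errors[M]:.4f}")
```

Training RMS always shrinks as M grows, which is exactly why training error alone cannot detect over-fitting; held-out data is needed for that.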
Linear Regression
• Response/outcome/dependent variable: y
• Predictor/explanatory/independent variable: x
• Example 1: Estimate electricity demand for home cooling
(y) from the average daily temperature (x)
• Example 2: Relationship between the head size and body
size of a newborn
• Regression analysis: statistical methodology to estimate
the relationship between x and y
• Correlation analysis: statistical methodology used to
assess the strength of the relationship between x and y
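The contrast between the two analyses can be made concrete. This sketch uses assumed data loosely modeled on Example 1 (the numbers are invented): regression estimates the relationship y ≈ b0 + b1·x, while correlation measures its strength with Pearson's r in [−1, 1].

```python
# Sketch contrasting regression (estimate the relationship) with
# correlation (assess its strength) on assumed example data.
import numpy as np

x = np.array([20.0, 24.0, 28.0, 32.0, 36.0])  # e.g. average daily temperature
y = np.array([10.0, 13.0, 17.0, 20.0, 25.0])  # e.g. cooling electricity demand

b1, b0 = np.polyfit(x, y, deg=1)   # regression: slope and intercept
r = np.corrcoef(x, y)[0, 1]        # correlation: Pearson's r
print(b1, b0, r)
```

A strong correlation (|r| near 1) says the linear relationship is tight; the regression coefficients say what that relationship actually is.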
Linear Regression
• The error ε_i:
– Independent and identically distributed
– Variety of causes:
– Measurement errors
– Other variables affecting y_i not included in the
model
• Assumption E[ε_i] = 0: implies there is no systematic
bias
• Usual model for ε_i: ε_i ~ N(0, σ²)
– Justified by the Central Limit Theorem
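The error model can be illustrated by simulation. This is a sketch with assumed true parameters (b0, b1, σ are invented for the example): generate y_i = β₀ + β₁x_i + ε_i with i.i.d. Gaussian noise and check that the noise is centered at zero, i.e. no systematic bias.

```python
# Sketch of the assumed error model: y_i = b0 + b1*x_i + eps_i with
# i.i.d. eps_i ~ N(0, sigma^2). The sample mean of eps should be
# near 0 (E[eps] = 0, no systematic bias).
import numpy as np

rng = np.random.default_rng(42)
b0, b1, sigma = 2.0, 0.5, 1.0               # illustrative true parameters
x = np.linspace(0.0, 10.0, 10_000)
eps = rng.normal(0.0, sigma, size=x.size)   # i.i.d. N(0, sigma^2) errors
y = b0 + b1 * x + eps

print(eps.mean())  # close to 0
print(eps.std())   # close to sigma
```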
Least-squares error:
S = Σ_{i=1}^n (y_i − (β₀ + β₁ x_i))²

Setting the partial derivative with respect to β₁ to zero:
∂S/∂β₁ = −2 Σ_{i=1}^n x_i (y_i − β₀ − β₁ x_i) = 0
⇒ β₀ Σ_{i=1}^n x_i + β₁ Σ_{i=1}^n x_i² = Σ_{i=1}^n x_i y_i
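The least-squares derivation can be turned into code directly. This from-scratch sketch (on an assumed toy dataset) builds the sums appearing in the two normal equations, solves them for b0 and b1, and cross-checks against NumPy's built-in fit.

```python
# From-scratch sketch: solve the two normal equations of simple linear
# regression for b0 (intercept) and b1 (slope) on assumed example data.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

n = x.size
Sx, Sy = x.sum(), y.sum()
Sxx, Sxy = (x * x).sum(), (x * y).sum()

# Normal equations:
#   b0*n  + b1*Sx  = Sy
#   b0*Sx + b1*Sxx = Sxy
b1 = (n * Sxy - Sx * Sy) / (n * Sxx - Sx * Sx)
b0 = (Sy - b1 * Sx) / n
print(b0, b1)

# Cross-check against numpy's least-squares polynomial fit.
assert np.allclose(np.polyfit(x, y, 1), [b1, b0])
```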
Anscombe's Quartet
[Figure: four scatter plots, panels A–D, showing four datasets with nearly identical summary statistics but very different shapes]
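Anscombe's point can be verified numerically. The sketch below uses the quartet's published data values: all four datasets yield (to two decimals) the same least-squares line y = 3.00 + 0.50·x, even though plotting them reveals completely different structure.

```python
# Sketch: fit a least-squares line to each of Anscombe's four datasets
# and observe that all four give (approximately) the same line.
import numpy as np

x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "A": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "B": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "C": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "D": ([8] * 7 + [19] + [8] * 3,
          [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}
for name, (x, y) in quartet.items():
    b1, b0 = np.polyfit(x, y, deg=1)
    print(f"{name}: slope={b1:.2f} intercept={b0:.2f}")
```

The moral: summary statistics and fitted coefficients alone can hide wildly different data; always plot the data before trusting a regression.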
Multivariate Regression
• We have explored problems with one response variable y and one
explanatory variable x
• Sometimes a straight line (linear regression) is not adequate and a
quadratic or cubic model is needed (polynomial regression)
• Sometimes there is more than one predictor variable and their
simultaneous effect needs to be modeled
• n pairs of observations (y_i; x_{i1}, x_{i2}, ⋯, x_{ik}), i = 1, ⋯, n
• Multiple regression model: y = β₀ + β₁ x₁ + β₂ x₂ + ⋯ + β_k x_k + ε
• Polynomial regression: linear in β and not necessarily in x:
x₁ = x, x₂ = x², x₃ = x³
• Simple linear regression: y; x
• Multiple linear regression: y; x₁, x₂, ⋯, x_k
• Multivariate regression: y₁, y₂, ⋯, y_m; x₁, x₂, ⋯, x_k
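Polynomial regression being "linear in β" means a cubic fit is just multiple linear regression on derived features. This sketch (on assumed synthetic data with invented true coefficients) builds the design matrix with columns x, x², x³ and solves the linear least-squares problem.

```python
# Sketch: cubic polynomial regression as a *linear* model in beta,
# via multiple linear regression on the features x1=x, x2=x^2, x3=x^3.
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-2, 2, 50)
y = 1 + 2 * x - 0.5 * x**3 + rng.normal(0, 0.2, size=x.size)  # assumed data

# Design matrix with columns [1, x, x^2, x^3].
X = np.column_stack([np.ones_like(x), x, x**2, x**3])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # estimates of (beta0, beta1, beta2, beta3)
```

The solver is the same as for any multiple linear regression; only the features are nonlinear in x.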
Multiple Linear Regression
• Least squares fit:
S = Σ_{i=1}^n (y_i − (β₀ + β₁ x_{i1} + β₂ x_{i2} + ⋯ + β_k x_{ik}))²
• Normal equations (setting ∂S/∂β_j = 0 for each j):
β₀ Σ_{i=1}^n x_{ij} + β₁ Σ_{i=1}^n x_{i1} x_{ij} + ⋯ + β_k Σ_{i=1}^n x_{ik} x_{ij} = Σ_{i=1}^n y_i x_{ij}
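The k+1 normal equations stack into matrix form: with design matrix X (a leading column of ones for β₀), they read (XᵀX)β = Xᵀy. This sketch solves that system on an assumed noise-free example (the data is generated as y = 1 + 2·x₁ + 1·x₂ so the exact coefficients are recoverable).

```python
# Sketch: solve the normal equations (X^T X) beta = X^T y for multiple
# linear regression on assumed example data.
import numpy as np

# n = 5 observations of k = 2 predictors.
X_raw = np.array([[1.0, 2.0],
                  [2.0, 1.0],
                  [3.0, 4.0],
                  [4.0, 3.0],
                  [5.0, 5.0]])
# Generated exactly as y = 1 + 2*x1 + 1*x2 (no noise), an assumption
# made so the fit recovers the coefficients exactly.
y = np.array([5.0, 6.0, 11.0, 12.0, 16.0])

X = np.column_stack([np.ones(len(y)), X_raw])  # prepend intercept column
beta = np.linalg.solve(X.T @ X, X.T @ y)       # normal equations
print(beta)  # (beta0, beta1, beta2)
```

In practice `np.linalg.lstsq` (or a QR-based solver) is preferred over forming XᵀX explicitly, since XᵀX squares the condition number of the problem.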