
Linear Regression

Ref: An Introduction to Statistical Learning by James et al., Springer

Alankar Alankar

IIT Bombay, India

January 25, 2024


Introduction

1. It is a supervised learning technique, where both the input and the output are known.
2. A quantitative response can be predicted.
3. It is a good topic to learn before moving on to advanced topics.
4. Some basic error measures and ways of setting up a problem can be learned in this section.
5. We will first attempt the least squares method.

Simple linear regression

If X is the input and Y is the output, we approximate the output as

Y ≈ β0 + β1 X

It assumes that there is a linear relationship between X and Y. β0 and β1
are the intercept and the slope, respectively. Suppose the training data have
been used and the coefficients (or parameters) have been estimated; the
estimates are denoted β̂0 and β̂1. The predicted values are then given by:

Ŷ = β̂0 + β̂1 X

Note that ≈ is not used in the prediction equation: the prediction is made
from the parameters estimated on the training data, so Ŷ is not the exact
value of Y, but it is given precisely by the equation above. Ŷ is a prediction
of Y, and the hat symbol (ˆ) denotes an estimated value.
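
As a minimal sketch (not part of the slides), assuming some estimated coefficients β̂0 and β̂1 are already available, prediction is just the evaluation of this affine function. The numeric values below are hypothetical placeholders, not results from the lecture.

import numpy as np

# Hypothetical estimated coefficients (placeholder values, not from the lecture);
# in practice they come from the least squares fit described next.
beta0_hat, beta1_hat = 2.0, 0.5

def predict(x, b0=beta0_hat, b1=beta1_hat):
    """Return Y_hat = beta0_hat + beta1_hat * X for scalar or array input."""
    return b0 + b1 * np.asarray(x, dtype=float)

print(predict([1.0, 2.0, 3.0]))  # -> approximately [2.5, 3.0, 3.5]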

Estimating the coefficients

The residual for the ith data point is defined as

ei = yi − ŷi

where ŷi = β̂0 + β̂1 xi. The residual sum of squares (RSS) is defined as
RSS = e1² + e2² + ... + en² = Σ_{i=1}^{n} (yi − β̂0 − β̂1 xi)²

The least squares method chooses β̂0 and β̂1 so as to minimize the RSS.
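
For simple linear regression this minimization has a closed-form solution: β̂1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² and β̂0 = ȳ − β̂1 x̄. A minimal sketch of this computation (the small data set is made up purely for illustration):

import numpy as np

def least_squares_fit(x, y):
    """Closed-form least squares estimates for simple linear regression."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    # beta1_hat = sum((xi - x_bar)(yi - y_bar)) / sum((xi - x_bar)^2)
    beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    beta0_hat = y_bar - beta1_hat * x_bar
    return beta0_hat, beta1_hat

# Tiny illustrative data set (values are made up)
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]
b0, b1 = least_squares_fit(x, y)
residuals = np.asarray(y) - (b0 + b1 * np.asarray(x))
print("beta0_hat =", b0, "beta1_hat =", b1, "RSS =", np.sum(residuals ** 2))

As a check, the same estimates are returned by np.polyfit(x, y, 1), which reports the coefficients in decreasing order of degree.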

Analysis of the coefficients

Figure: Contour and three-dimensional plots of the RSS.

Assessing the accuracy of the coefficient estimates

The true relationship between X and Y takes the form

Y = f(X) + ϵ

This true relationship is also called the population regression relation. If f(X) is
linear, it can be written as β0 + β1 X. The error term ϵ is drawn from a normal
distribution with mean zero.
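
A short sketch of what this population model means: data can be simulated by fixing "true" values of β0 and β1 (the values below are hypothetical, chosen only for illustration) and adding mean-zero normal noise.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true population parameters (illustrative values only)
beta0_true, beta1_true = 2.0, 3.0
sigma = 1.0   # standard deviation of the error term epsilon

n = 100
x = rng.uniform(-2.0, 2.0, size=n)
epsilon = rng.normal(loc=0.0, scale=sigma, size=n)   # mean-zero normal error
y = beta0_true + beta1_true * x + epsilon            # Y = f(X) + epsilon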

Figure: The red line on the left is the true relationship for the data in this example,
i.e. f(X) = β0 + β1 X (the points are scattered about it by the noise ϵ). The blue line
is the line estimated by the least squares fit. In the right plot, the various blue lines
are least squares fits for different samples drawn from the population.
True vs Real Data

1. True data means the complete population, which is never available to us. We
usually collect only a limited number of observations. Our models are built from
these observations, so every time a model is fitted on a new sample it comes out
different (not identical). The true data, however, are always the same (see the
sketch below).
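
A minimal sketch of this point, reusing the hypothetical population parameters and the least_squares_fit helper from the earlier sketches (both are assumptions of this example, not part of the slides): each new sample drawn from the same population gives slightly different estimates, while the true coefficients never change.

import numpy as np

rng = np.random.default_rng(1)
beta0_true, beta1_true, sigma, n = 2.0, 3.0, 1.0, 100   # same hypothetical population

for sample in range(5):
    x = rng.uniform(-2.0, 2.0, size=n)
    y = beta0_true + beta1_true * x + rng.normal(0.0, sigma, size=n)
    b0, b1 = least_squares_fit(x, y)   # helper defined in the earlier sketch
    print(f"sample {sample}: beta0_hat = {b0:.3f}, beta1_hat = {b1:.3f}")
# The printed estimates fluctuate around (2.0, 3.0); the true line stays fixed.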

