
MULTIPLE LINEAR REGRESSION Part-3

LECTURE 24

DR. GAURAV DIXIT


DEPARTMENT OF MANAGEMENT STUDIES

MULTIPLE LINEAR REGRESSION

• Ordinary least squares (OLS) (see the R sketch after this list)

  y = β0 + β1x1 + β2x2 + … + βpxp + ε

  – Unbiased predictions (on average, closer to the actual values)
  – Smallest average squared error
  – Given the following assumptions hold true:
    • The noise term ε follows a normal distribution
    • A linear relationship holds between the outcome and the predictors
    • Observations are independent of one another
    • Homoskedasticity: the variability in the outcome variable is the same
      irrespective of the values of the predictors
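
The following is a minimal R sketch of this setup; the data frame df and the
variables y, x1, and x2 are hypothetical stand-ins, not objects from the lecture:

  # Fit an OLS regression of y on two predictors
  fit <- lm(y ~ x1 + x2, data = df)
  summary(fit)                        # estimates of beta0, beta1, beta2

  # Quick residual diagnostics for the assumptions above
  hist(residuals(fit))                # is the noise roughly normal?
  plot(fitted(fit), residuals(fit))   # roughly constant spread -> homoskedasticity
  abline(h = 0, lty = 2)              # reference line at zero residual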

MULTIPLE LINEAR REGRESSION

• Partitioning the data, as done in data mining modeling, allows the first
  assumption (normally distributed noise) to be relaxed, because predictive
  performance is assessed empirically on holdout records
• In statistical modeling, the same sample is used both to fit the model and
  to assess its reliability
  – Predictions of new records therefore lack a reliability estimate
  – The first assumption is required to derive confidence intervals for
    predictions
• Example: open RStudio (a partitioning sketch follows below)
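
In the spirit of the RStudio example, a minimal sketch of a training/validation
partition; df is a hypothetical data frame with an outcome column y:

  set.seed(42)                                          # reproducible split
  train_idx <- sample(nrow(df), floor(0.6 * nrow(df)))  # ~60% of records for training
  train <- df[train_idx, ]
  valid <- df[-train_idx, ]

  fit  <- lm(y ~ ., data = train)          # fit on the training partition only
  pred <- predict(fit, newdata = valid)    # score the held-out records
  sqrt(mean((valid$y - pred)^2))           # validation RMSE as an empirical reliability check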

MULTIPLE LINEAR REGRESSION

• Variable Selection
  – Typically, a large number of variables is available for selecting a set of
    predictors
  – The main idea is to select the most useful set of predictors for a given
    outcome variable of interest (see the selection sketch after this list)
  – Including all the available variables in the model is not recommended,
    because of
    • data collection issues in the future
    • measurement accuracy issues for some variables
    • missing values
    • parsimony (simpler models are easier to interpret and often generalize better)
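
One common way to automate this search is stepwise selection; a minimal sketch
using base R's step(), with df again a hypothetical data frame whose remaining
columns serve as candidate predictors:

  full    <- lm(y ~ ., data = df)                 # model with all candidate predictors
  reduced <- step(full, direction = "backward")   # drop predictors stepwise by AIC
  summary(reduced)                                # the more parsimonious model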

MULTIPLE LINEAR REGRESSION

• Variable Selection (continued)
  – Further reasons why selecting all the variables is not recommended
    (see the diagnostic sketch after this list):
    • Multicollinearity: two or more predictors sharing an approximately linear
      relationship with one another, which makes the coefficient estimates unstable
    • Sample size issues; rule of thumb:

        n > 5 * (p + 2)

      where n = no. of observations and p = no. of predictors
      (e.g., with p = 10 predictors, more than 60 observations are needed)
    • The variance of predictions might increase if predictors that are
      uncorrelated with the outcome variable are included
    • The average error of predictions might increase if predictors that are
      correlated with the outcome variable are excluded
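
A minimal diagnostic sketch for both issues; it assumes the car package is
installed for its vif() function, and df remains a hypothetical data frame:

  library(car)                 # provides vif()
  fit <- lm(y ~ ., data = df)
  vif(fit)                     # VIFs well above ~5-10 flag multicollinearity

  p <- length(coef(fit)) - 1   # number of predictors in the fitted model
  nrow(df) > 5 * (p + 2)       # TRUE if the rule-of-thumb sample size is met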

Key References

• Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing and
  Presenting Data by EMC Education Services (2015)
• Data Mining for Business Intelligence: Concepts, Techniques, and Applications
  in Microsoft Office Excel with XLMiner by Shmueli, G., Patel, N. R., &
  Bruce, P. C. (2010)

Thanks…
