Chapter 3 Part 2

Chapter 3 discusses additional considerations in regression modeling, including nonlinearity, correlation of error terms, non-constant variance, outliers, high-leverage points, and collinearity. It highlights the importance of checking for collinearity using pairwise correlation and variance inflation factor (VIF), and suggests remedies like dropping or combining variables. The chapter also compares linear regression with K-nearest neighbors (KNN) regression, noting that linear regression may outperform KNN in certain settings, especially with fewer observations per predictor.


Chapter 3

Additional topics
Other Considerations in the Regression Model
1. Nonlinearity of the response-predictor relationship
2. Correlation of error terms

An important assumption of linear regression is that the error terms are uncorrelated.
If this assumption is violated, the estimated standard errors will tend to underestimate the true standard errors, so confidence intervals will be narrower than they should be (a small simulation below illustrates this).
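As a rough illustration, here is a minimal simulation sketch, not from the text, in which the errors follow an AR(1) process; all names and parameter values are illustrative. The average standard error reported by ordinary least squares comes out noticeably smaller than the slope estimate's true sampling variability:

```python
import numpy as np

# Simulation sketch (illustrative values): generate data whose errors are
# autocorrelated, fit OLS many times, and compare the average reported
# standard error of the slope with the empirical spread of the estimates.
rng = np.random.default_rng(0)
n, reps, rho = 100, 2000, 0.9
x = np.linspace(0, 1, n)
X = np.column_stack([np.ones(n), x])

slopes, reported_se = [], []
for _ in range(reps):
    e = np.zeros(n)
    for t in range(1, n):                    # AR(1) errors: correlated over "time"
        e[t] = rho * e[t - 1] + rng.normal()
    y = 1.0 + 2.0 * x + e
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - 2)         # usual OLS residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)    # usual OLS covariance of coefficients
    slopes.append(beta[1])
    reported_se.append(np.sqrt(cov[1, 1]))

print(np.mean(reported_se))  # average SE reported by OLS
print(np.std(slopes))        # true variability of the slope (noticeably larger)
```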
3. Non-constant variance of error terms
4. Outliers
5. High-leverage points
" (%! & %)̅ "
Leverage statistics: ℎ! = + ∑& "
# #$%(%# &%̅ )
The leverage statistic is always between 1/n and 1, and its average over all n observations equals (p+1)/n.
If an observation has a leverage statistic that greatly exceeds (p+1)/n, then we
suspect that the corresponding point has high leverage (a computational sketch follows).
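As a check on these facts, a minimal sketch assuming simulated data (the names and cutoff are illustrative, not from the text): with multiple predictors, the leverage statistics are the diagonal entries of the hat matrix $H = X(X^\top X)^{-1}X^\top$.

```python
import numpy as np

# Leverage statistics as the diagonal of the hat matrix H = X (X'X)^{-1} X'.
rng = np.random.default_rng(0)
n, p = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # intercept + p predictors

H = X @ np.linalg.inv(X.T @ X) @ X.T
leverage = np.diag(H)

print(leverage.min() >= 1 / n, leverage.max() <= 1.0)  # always between 1/n and 1
print(np.isclose(leverage.mean(), (p + 1) / n))        # average leverage is (p+1)/n
flagged = np.where(leverage > 3 * (p + 1) / n)[0]      # illustrative high-leverage cutoff
print(flagged)
```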
6. Collinearity
Two or more predictors are closely related to one another.
It reduces the accuracy of the estimates of the regression coefficients: the standard error of $\hat{\beta}_j$ grows.
As a result, the t-statistic declines, and we may fail to reject $H_0\!: \beta_j = 0$ for a predictor that actually matters.
• For detecting collinearity (see the sketch after this list):
Check the pairwise correlations of the predictors.
Use the variance inflation factor (VIF):
$\mathrm{VIF}(\hat{\beta}_j) = \dfrac{1}{1 - R^2_{X_j \mid X_{-j}}}$
where $R^2_{X_j \mid X_{-j}}$ is the $R^2$ from a regression of $X_j$ onto all of the other predictors.
• What should we do when collinearity exists?
(1) Drop one of the problematic variables.
(2) Combine the collinear variables into a single predictor (e.g., principal components regression).
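A minimal VIF sketch, assuming simulated data in which one predictor is nearly a linear combination of the others (the variable names are illustrative); it uses statsmodels' variance_inflation_factor, which implements the formula above:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Simulated predictors: x3 is nearly x1 + x2, so its VIF should be large.
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2 + 0.05 * rng.normal(size=n)  # nearly collinear predictor
X = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3})

# VIF regresses each predictor on all the others; the design matrix passed to
# statsmodels should include the intercept column.
exog = sm.add_constant(X)
vifs = {col: variance_inflation_factor(exog.values, i)
        for i, col in enumerate(exog.columns) if col != "const"}
print(vifs)  # x1, x2, and x3 should all show inflated values
```

A VIF near 1 indicates no collinearity; values well above that (a common rule of thumb is 5 or 10) suggest a problematic amount.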
3.5 Comparison of Linear Regression with K-Nearest Neighbors

K-nearest neighbors (KNN) regression:
$\hat{f}(x_0) = \dfrac{1}{K} \sum_{x_i \in \mathcal{N}_0} y_i$
where $\mathcal{N}_0$ is the set of the K training observations closest to the prediction point $x_0$.
• The optimal value for K will depend on the bias-variance
tradeoff.
A small value of K provides the most flexible fit, which will have low bias but high variance.
Large values of K provide a smoother, less variable fit, but may introduce bias by averaging over observations far from $x_0$ (see the sketch below).
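A minimal sketch of this tradeoff, assuming a simulated one-dimensional example (the data and parameter values are illustrative); it fits scikit-learn's KNeighborsRegressor with a small and a large K:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Toy data: noisy sine curve in one dimension.
rng = np.random.default_rng(2)
x = np.sort(rng.uniform(-3, 3, size=100)).reshape(-1, 1)
y = np.sin(x).ravel() + 0.3 * rng.normal(size=100)

knn_flexible = KNeighborsRegressor(n_neighbors=1).fit(x, y)   # low bias, high variance
knn_smooth = KNeighborsRegressor(n_neighbors=25).fit(x, y)    # smoother, higher bias

grid = np.linspace(-3, 3, 200).reshape(-1, 1)
print(knn_flexible.predict(grid)[:5])  # jagged, tracks individual noisy points
print(knn_smooth.predict(grid)[:5])    # smooth, averages many neighbors
```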

• In what setting will least squares linear regression outperform KNN regression?
Curse of dimensionality: when the number of predictors is large, the K observations nearest to $x_0$ may in fact be far away, and KNN performs poorly.
Parametric approaches tend to outperform nonparametric approaches when there is a small number of observations per predictor.

• Even in problems in which the dimension is small, we may prefer linear regression to KNN from an interpretability standpoint.
