Assumption Checking On Linear Regression
Validation of Assumptions
May 12, 2019
Diagnosis 1: Nonlinearity
We know that in the Multiple Regression Model there is an assumption that E(𝜀) = 0, i.e., the independent variables X correctly explain the mean of the response variable Y.
Partial Residual Plots
To address the concern above, we use residual-based plots, particularly the partial residual plot, where the X-axis is Xj and the Y-axis is the partial residual e_i(j) = e_i + 𝛽hat_j X_ij, i.e., Y_i with the linear effects of the other variables removed.
Partial Regression (Added-Variable) Plot:
- Can reveal nonlinearity and suggest whether a relationship is monotone.
- Since its x-axis is not Xj, it is not always useful for locating a transformation.
- Can show the correct linear strength of the relationship between the response variable Y and Xj (i.e., correlation).
Partial Residual Plot:
- Cannot distinguish between monotone and nonmonotone nonlinearity.
- Useful for locating a transformation.
- Cannot show the linear strength.
Why use Partial Residual Plots against Partial Regression Plot?
Partial regression plots attempt to show the effect of adding an additional variable to the model (given that one or more independent variables are already in the model).
When performing a linear regression with a single independent variable, a scatter plot of the response variable against the independent variable provides a good indication of the nature of the relationship. If there is more than one independent variable, things become more complicated. Although it can still be useful to generate scatter plots of the response variable against each of the independent variables, this does not take into account the effect of the other independent variables in the model.
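Below is a minimal sketch of how both plot types can be produced in Python with statsmodels; the simulated data (x1, x2, and a quadratic effect) are purely illustrative and not part of the lecture.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Illustrative data: y is nonlinear in x2
rng = np.random.default_rng(0)
df = pd.DataFrame({"x1": rng.normal(size=200), "x2": rng.normal(size=200)})
df["y"] = 1 + 2 * df["x1"] + 0.5 * df["x2"] ** 2 + rng.normal(size=200)

fit = smf.ols("y ~ x1 + x2", data=df).fit()

# Partial residual (component-plus-residual) plot: the x-axis is x2 itself
sm.graphics.plot_ccpr(fit, "x2")
# Partial regression (added-variable) plots: the x-axis is the residual of each Xj
sm.graphics.plot_partregress_grid(fit)
plt.show()
```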
Remedy: Transform variables
Purpose:
To enhance linearity
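A rough sketch of this remedy, assuming (purely for illustration) that the true relationship is logarithmic, so that a transformed predictor can be compared against the untransformed fit:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Illustrative data where y depends on log(x)
rng = np.random.default_rng(1)
df = pd.DataFrame({"x": rng.uniform(1, 50, 300)})
df["y"] = 3 * np.log(df["x"]) + rng.normal(scale=0.5, size=300)

linear_fit = smf.ols("y ~ x", data=df).fit()        # misspecified straight-line fit
log_fit = smf.ols("y ~ np.log(x)", data=df).fit()   # transformed predictor enhances linearity
print(linear_fit.rsquared, log_fit.rsquared)
```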
Diagnosis 2: Non-normality of error terms
The assumption of normally
distributed errors is almost always
arbitrary, but the central-limit
theorem assures that inference based
on the least squares estimator is
approximately valid.
Implications of Non-normality
- Computation of t- and F-statistics
- Computation of prediction intervals and confidence intervals
Small departures from normality do not usually affect the model, but gross non-normality can affect these statistics.
NOTE:
Although the validity of least-squares estimation is robust, its efficiency is not: the least-squares estimator is maximally efficient among unbiased estimators when the errors are normal.
Types of Departures from Normality and their Effects
Distribution with thicker or heavier tails than normal: the least-squares fit may be sensitive to small sample sizes.
Probability Plot
A normal probability (Q-Q) plot of the residuals can reveal the thickness or thinness of the tails.
Goodness-of-fit Tests
1. Chi-square goodness-of-fit test
2. Kolmogorov-Smirnov one-sample test
3. Shapiro-Wilk test (most commonly used)
4. Anderson-Darling test (can detect small departures)
5. Cramér-von Mises criterion
6. Jarque-Bera test (uses kurtosis and skewness)
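A minimal sketch of the probability plot and a few of the tests above, using scipy and statsmodels; the simulated heavy-tailed "residuals" are illustrative, and in practice you would pass the residuals of a fitted model (e.g., results.resid).

```python
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
from scipy import stats

# Illustrative residuals with heavier tails than normal
rng = np.random.default_rng(2)
resid = rng.standard_t(df=3, size=300)

sm.qqplot(resid, line="s")                   # normal probability (Q-Q) plot
plt.show()

print(stats.shapiro(resid))                  # Shapiro-Wilk test
print(stats.anderson(resid, dist="norm"))    # Anderson-Darling test
print(stats.jarque_bera(resid))              # Jarque-Bera test (kurtosis and skewness)
```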
Diagnosis 3: Heteroskedasticity
(non-constancy of variance)
We know that in the Multiple Regression Model there is an assumption that 𝜀 ~ NID(0, 𝜎²I), i.e., the errors have constant variance.
Two-sample Test
The two-sample test fits separate regressions to each half of the observations, split by the level of X, and compares the error mean squares (MSE of the first half / MSE of the second half). Model each half (split according to the level of X) so that one contains the higher-variance and the other the lower-variance observations.
Ho: variances are constant
Ha: variances are not constant
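A minimal sketch of this split-sample comparison (essentially a Goldfeld-Quandt-type F test); the simulated data and the fifty-fifty split by the level of x are illustrative assumptions:

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

# Illustrative data whose error variance grows with x
rng = np.random.default_rng(3)
x = np.sort(rng.uniform(0, 10, 200))
y = 1 + 2 * x + rng.normal(scale=0.5 + 0.3 * x, size=200)

def half_fit(xh, yh):
    """Fit OLS to one half and return its error mean square and residual df."""
    res = sm.OLS(yh, sm.add_constant(xh)).fit()
    return res.mse_resid, int(res.df_resid)

mse_lo, df_lo = half_fit(x[:100], y[:100])   # low-x half
mse_hi, df_hi = half_fit(x[100:], y[100:])   # high-x half
F = mse_hi / mse_lo                          # larger-variance half in the numerator
p_value = 1 - stats.f.cdf(F, df_hi, df_lo)
print(F, p_value)                            # small p-value -> reject Ho of constant variance
```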
Breusch-Pagan Test
A more general test that does not specify the nature of the heteroskedasticity.
Ho: variances are constant
Ha: variances are not constant
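A minimal sketch of the Breusch-Pagan test using statsmodels' het_breuschpagan; the simulated heteroskedastic data are illustrative:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

# Illustrative data with error variance increasing in x
rng = np.random.default_rng(4)
x = rng.uniform(0, 10, 200)
X = sm.add_constant(x)
y = 1 + 2 * x + rng.normal(scale=0.5 + 0.3 * x, size=200)

res = sm.OLS(y, X).fit()
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(res.resid, res.model.exog)
print(lm_pvalue, f_pvalue)   # small p-values -> reject Ho of constant variance
```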
Remedies
Transformation of Variables
Only if the form of the heteroskedasticity is known.
Generalized Least Squares (GLS)
Use GLS rather than OLS. Idea: transform the observations in such a way that the transformed errors have variance equal to I or 𝜎²I. Since V is positive definite (i.e., x'Vx > 0 for all nonzero x), its inverse V⁻¹ is also positive definite, and there exists a nonsingular (i.e., invertible) matrix W such that W'W = V⁻¹.
Heteroskedasticity-Consistent Covariance (White)
Use White's heteroskedasticity-consistent (robust) covariance estimator for the standard errors of the OLS coefficients.
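A minimal sketch of both remedies in statsmodels, assuming (purely for illustration) that the error standard deviation grows linearly with x and is known for the GLS fit:

```python
import numpy as np
import statsmodels.api as sm

# Illustrative heteroskedastic data with a known variance pattern
rng = np.random.default_rng(5)
x = rng.uniform(0, 10, 200)
X = sm.add_constant(x)
sd = 0.5 + 0.3 * x                    # assumed known error standard deviations
y = 1 + 2 * x + rng.normal(scale=sd)

gls_res = sm.GLS(y, X, sigma=sd ** 2).fit()    # GLS with known (diagonal) variance structure
white_res = sm.OLS(y, X).fit(cov_type="HC0")   # OLS with White's robust covariance
print(gls_res.bse)
print(white_res.bse)
```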
Diagnosis 4: Multicollinearity
Recall: Linear Dependence
Consider n-dimensional vectors x1, x2, ..., xn. If there exist constants c1, c2, ..., cn, not all equal to zero, such that c1x1 + c2x2 + ... + cnxn = 0, then the set of vectors is LINEARLY DEPENDENT.
Recall: Linear Dependence in Regression
If the columns of X are linearly dependent, then rk(X'X) < p (where p is the number of parameters and the rank is the number of nonzero rows in row echelon form); consequently (X'X)⁻¹ does not exist.
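A tiny numpy illustration of this, using a hypothetical exact dependency x2 = 2·x1:

```python
import numpy as np

rng = np.random.default_rng(6)
x1 = rng.normal(size=50)
x2 = 2 * x1                                  # exact linear dependency
X = np.column_stack([np.ones(50), x1, x2])   # p = 3 columns

print(np.linalg.matrix_rank(X.T @ X))        # rank 2 < p = 3, so (X'X)^-1 does not exist
```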
NOTE:
Multicollinearity is not a statistical problem but a data problem; however, it greatly affects the efficiency of least-squares estimation.
Implications of Multicollinearity
- Larger variances and standard errors of the OLS estimators.
- Wider confidence intervals.
- Insignificant t ratios.
- A high R² but few significant t ratios.
- Wrong signs for regression coefficients.
- OLS estimators and their standard errors become very sensitive to small changes in the data; that is, they tend to be unstable (the OLS estimators become non-resistant).
- Difficulty in assessing the individual contributions of explanatory variables to the sum of squares or R².
ILL-CONDITIONING PROBLEM
- X plays a crucial role; the existence of the problem can result in unstable inverses of (X'X).
- Least-squares estimators provide poor estimates.
- Opposite signs of slopes.
- Contradicting results between the analysis of variance and the individual assessments of significance of the variables.
Variance Inflation Factors (VIF)
VIFj > 10 indicates severe variance inflation for the parameter estimator associated with Xj.
If X is centered and scaled, X'X = Rxx → (X'X)⁻¹ = Rxx⁻¹
⇒ VIFj = 1/(1 − Rj²), where Rj² is the coefficient of determination obtained when regressing Xj on the other k−1 independent variables.
⇒ Thus, if Rj² → 0, VIFj → 1; if Rj² → 1, VIFj → ∞.
⇒ The VIF measures the effect of the dependencies on the variance of the jth slope estimator.
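A minimal sketch computing VIFs with statsmodels, using hypothetical nearly collinear predictors:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Illustrative design: x2 is nearly a copy of x1
rng = np.random.default_rng(7)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.1, size=100)
x3 = rng.normal(size=100)
X = sm.add_constant(np.column_stack([x1, x2, x3]))

vifs = [variance_inflation_factor(X, j) for j in range(1, X.shape[1])]  # skip the constant
print(vifs)   # VIF >> 10 for x1 and x2 flags severe variance inflation
```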
Suppose X'X has eigenvalues λ1, λ2, ..., λk. If λmax = max{λi} and λmin = min{λi}, then the condition number is k = λmax/λmin. Note that when there is ill-conditioning, some eigenvalues of X'X are near 0.
Condition Indices
These clarify whether one or several near dependencies are present among the Xs.
Variance Decomposition Proportions
These show how strongly each coefficient's variance is implicated in the near dependency represented by the ith eigenvector. That is, a multicollinearity problem occurs when a component associated with a high condition index contributes strongly to the variance of two or more variables.
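A minimal numpy sketch of the eigenvalue-based diagnostics, following the definition of k used above (some references take the square root of these ratios); the near dependency is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(8)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.05, size=100)   # near dependency between x1 and x2
X = np.column_stack([np.ones(100), x1, x2])

eig = np.linalg.eigvalsh(X.T @ X)
print(eig)                    # eigenvalue(s) near 0 signal ill-conditioning
print(eig.max() / eig.min())  # condition number k = lambda_max / lambda_min
print(eig.max() / eig)        # condition indices, one per eigenvalue
```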
Remedies
Centering of
observations
Especially effective if complex functions of Xs
are present in the design matrix.
Deletion of Variables
Preferred to use backward selection.
1. Check the variable(s) associated with eigenvalues that are near 0 or practically 0.
2. If there is no problem with the lowest lambda, check the largest condition index and try removing the variable implicated by it.
Imposing Constraints
Suppose the eigensystem analysis implies 2X1 + X2 = 0. To fit Y = 𝛽0 + 𝛽1X1 + 𝛽2X2 + 𝛽3X3 + 𝜀, we can combine X1 and X2 by imposing the constraint 𝛽1 = 2𝛽2. This has the effect of regressing on the new variable 2X1 + X2 rather than on each of X1 and X2.
Shrinkage Estimation (Ridge Regression)
A biased estimator that is more stable.
Principal Component Regression
Determine linear combinations of the Xs that explain as much variability as the original Xs. These combinations are linearly independent and are called PRINCIPAL COMPONENTS.
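A minimal sketch of both shrinkage and principal component regression with scikit-learn; the nearly collinear data are illustrative:

```python
import numpy as np
from sklearn.linear_model import Ridge, LinearRegression
from sklearn.decomposition import PCA

# Illustrative nearly collinear predictors
rng = np.random.default_rng(9)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.05, size=100)
X = np.column_stack([x1, x2])
y = 1 + 2 * x1 + 3 * x2 + rng.normal(size=100)

ridge = Ridge(alpha=1.0).fit(X, y)          # shrinkage: biased but more stable coefficients
print(ridge.coef_)

pcs = PCA(n_components=1).fit_transform(X)  # linearly independent principal components
pcr = LinearRegression().fit(pcs, y)        # principal component regression
print(pcr.coef_)
```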
Diagnosis 5: Autocorrelation
The presence of association/relationship among adjacent observations (they tend to be similar).
How to detect autocorrelation?
Recall Y = X𝛽 + 𝜀 where 𝜀 ~ NID(0, 𝜎²I).
One possible autocorrelation model is 𝜀t = 𝜌𝜀t−1 + 𝜔t, where 𝜔t ~ NID(0, 𝜎𝜔²) and |𝜌| < 1.
Durbin-Watson Test
3. Using the Durbin-Watson test, obtain the value of the d test statistic.
4. If d is not near 2, autocorrelation possibly occurs.
5. If there is autocorrelation based on the Durbin-Watson test, the residuals should be redefined using the autocorrelation model; replace 𝜌 by 𝜌-hat, which can be found in the "1st order correlation".
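A minimal sketch of the Durbin-Watson statistic in statsmodels, using hypothetical AR(1) errors:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

# Illustrative data with AR(1) errors: e_t = 0.8 * e_{t-1} + w_t
rng = np.random.default_rng(10)
n = 200
x = np.linspace(0, 10, n)
e = np.zeros(n)
for t in range(1, n):
    e[t] = 0.8 * e[t - 1] + rng.normal(scale=0.5)
y = 1 + 2 * x + e

res = sm.OLS(y, sm.add_constant(x)).fit()
print(durbin_watson(res.resid))   # values far from 2 suggest autocorrelation
```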
Other tests:
1. Overall testing for white noise
2. Autocorrelation plot
Effects of Autocorrelation
- Least-squares estimators of the regression coefficients are unbiased but not efficient, in the sense that they no longer have minimum variance.
- Confidence intervals and the various tests of significance commonly employed would no longer be strictly valid.
Generalized Least Squares
Use GLS, as in heteroskedasticity.
2. Make a model for yt−1.
3. Multiply both sides of the model for yt−1 by 𝜌.
4. Subtract the equation in #3 from the model for yt.
5. Redefine 𝛽* and 𝜀* for the transformed model.
Cochrane-Orcutt Procedure
The process of estimating 𝜌 and then proceeding with GLS.
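A minimal sketch of a Cochrane-Orcutt-style fit via statsmodels' GLSAR, which alternates between estimating 𝜌 and refitting by GLS; the AR(1) data are illustrative:

```python
import numpy as np
import statsmodels.api as sm

# Illustrative data with AR(1) errors
rng = np.random.default_rng(11)
n = 200
x = np.linspace(0, 10, n)
e = np.zeros(n)
for t in range(1, n):
    e[t] = 0.8 * e[t - 1] + rng.normal(scale=0.5)
y = 1 + 2 * x + e

model = sm.GLSAR(y, sm.add_constant(x), rho=1)   # AR(1) error structure
res = model.iterative_fit(maxiter=10)            # alternate: estimate rho, then GLS
print(model.rho, res.params)
```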
Diagnosis 6: Outliers
Outliers shouldn't be automatically deleted from the data. There can be outliers, but its