
Stat 136: Diagnostic Checking and Validation of Assumptions
May 12, 2019
Diagnosis 1: Nonlinearity

Linearity
In the multiple regression model there is an assumption that E(ε) = 0, that is, the independent variables X fully account for the systematic variation in Y.

Nonlinearity occurs when the expected value of ε is not equal to zero. For example, a partial relationship may actually be quadratic, or two independent variables may not have additive partial effects on Y.
Implications of Nonlinearity
The regression surface is not precisely captured.
The model fails to capture the systematic pattern of the relationship between the Xs and Y.

If nonlinearity exists, the fitted model can still approximate the dependent variable Y; however, E(Y|X1, X2, ..., Xk) can be misleading.
How to detect nonlinearity?
Use partial residual plots.
Plotting each X individually against Y, although useful, can be misleading, because our primary concern is NOT the marginal relationship of each X to Y, but the PARTIAL relationship of Y to each X.

Partial Residual Plots
To address the concern above, we use residual-based plots, particularly the partial residual plot, where:
X-axis: Xj
Y-axis: ei(j) = ei + 𝛽hatj Xij, which is Yi with the linear effects of the other variables removed.

In a partial regression (added-variable) plot:
X-axis: residuals from the model with Xj as the dependent variable regressed on all other Xs.
Y-axis: residuals from the model of Y regressed on all Xs other than Xj.
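A minimal sketch (not from the slides) of both plot types using statsmodels; the data frame, the variable names x1, x2, x3, y, and the simulated quadratic relationship are illustrative assumptions.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({"x1": rng.normal(size=n),
                   "x2": rng.normal(size=n),
                   "x3": rng.normal(size=n)})
# The true relationship is quadratic in x1, so its partial residual plot should bend.
df["y"] = 1 + 2 * df["x1"] ** 2 + 0.5 * df["x2"] - df["x3"] + rng.normal(size=n)

results = smf.ols("y ~ x1 + x2 + x3", data=df).fit()

# Partial residual (component-plus-residual) plots: e_i + bhat_j * X_ij against X_j.
sm.graphics.plot_ccpr_grid(results)
# Partial regression (added-variable) plots: residuals of y|other Xs vs residuals of Xj|other Xs.
sm.graphics.plot_partregress_grid(results)
plt.show()
```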
Difference between the Partial Regression Plot and the Partial Residual Plot

Partial Regression Plot
Mostly used to identify leverage points and influential data points (e.g., outliers) that might not be leverage points.
Can reveal nonlinearity and suggest whether a relationship is monotone.
Since its x-axis is not Xj, it is not always useful for locating a transformation.
Can show the correct linear strength of the relationship between the response Y and Xj (i.e., the correlation).

Partial Residual Plot
Commonly used to identify the nature of the relationship between Y and Xj (given the effect of the other independent variables in the model).
Cannot distinguish between monotone and nonmonotone nonlinearity.
Useful for locating a transformation.
Cannot show the linear strength.
Why use Partial Residual and Partial Regression Plots?
Partial regression plots attempt to show the effect of adding an additional variable to the model (given that one or more independent variables are already in the model).

When performing a linear regression with a single independent variable, a scatter plot of the response variable against the independent variable gives a good indication of the nature of the relationship. With more than one independent variable, things become more complicated. Although it can still be useful to generate scatter plots of the response variable against each of the independent variables, these plots do not take into account the effect of the other independent variables in the model.
Remedy: Transform variables
Purpose:

To enhance linearity

To improve data configuration
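A hedged sketch (an assumption, not a prescription from the slides) of two common transformations used to restore linearity: a log-log specification and a quadratic term. The variable names and simulated data are illustrative.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
df = pd.DataFrame({"x1": rng.uniform(1, 5, 200), "x2": rng.normal(size=200)})
df["y"] = np.exp(0.8 * np.log(df["x1"]) + 0.3 * df["x2"] + rng.normal(scale=0.2, size=200))

# A curved partial residual plot for x1 might suggest a log-log specification.
fit_loglog = smf.ols("np.log(y) ~ np.log(x1) + x2", data=df).fit()
# Alternatively, a quadratic term in x1 keeps y on its original scale.
fit_quad = smf.ols("y ~ x1 + I(x1 ** 2) + x2", data=df).fit()
print(fit_loglog.params)
```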


Diagnosis 2: Non-normality of Error Terms

Non-normality of error terms
In the multiple regression model there is an assumption that ε ~ NID(0, σ²I). Failure to satisfy this assumption has serious repercussions in modelling.

The assumption of normally distributed errors is almost always arbitrary, but the central limit theorem assures that inference based on the least-squares estimator is approximately valid.
Implications of Non-normality
Computation of t- and F-statistics.
Computation of prediction intervals and confidence intervals.

Small departures from normality do not usually affect the model, but gross non-normality can affect these statistics.

NOTE:
Although the validity of least-squares estimation is robust, its efficiency is not: the least-squares estimator is maximally efficient among unbiased estimators only when the errors are normal.
Types of Departure from Normality and Their Effects
Distributions with thicker or heavier tails than the normal:
The least-squares fit may be sensitive when sample sizes are small.
They more often generate outliers that pull the least-squares fit too much in their direction.
For heavy-tailed errors, the efficiency of least-squares estimation decreases markedly.
How to detect non-normality?
Use a normal probability plot or a goodness-of-fit test.

Normal Probability Plot
Plot the ordered residuals e(i) against the corresponding quantiles of the theoretical cumulative distribution function (CDF).
If the plot exhibits a straight line, the errors are approximately normal.
Deviation from the straight line shows thickness or thinness of the tails.
Possible defect: the occurrence of one or two large residuals, which indicates outliers.
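A minimal sketch (an assumption, not from the slides) of a normal probability plot of OLS residuals using scipy; statsmodels' qqplot would work equally well. The simulated heavy-tailed errors are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
import statsmodels.api as sm

rng = np.random.default_rng(2)
X = sm.add_constant(rng.normal(size=(200, 2)))
y = X @ np.array([1.0, 2.0, -1.0]) + rng.standard_t(df=3, size=200)  # heavy-tailed errors

resid = sm.OLS(y, X).fit().resid
stats.probplot(resid, dist="norm", plot=plt)  # points far off the line flag non-normal tails
plt.show()
```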
Goodness-of-fit Tests
An empirical way of checking the normality of the error terms which makes use of hypothesis testing.
Ho: Error terms are normally distributed.
Ha: Error terms are not normally distributed.
Goal: DO NOT REJECT Ho.
Since the error terms are unobservable, we use the residuals.

Common tests (see the sketch after this list):
1. Chi-square goodness-of-fit test
2. Kolmogorov-Smirnov one-sample test
3. Shapiro-Wilk test (most commonly used)
4. Anderson-Darling test (can detect small departures)
5. Cramér-von Mises criterion
6. Jarque-Bera test (uses kurtosis and skewness)
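A hedged sketch (an assumption) applying a few of the listed tests to OLS residuals with scipy; the simulated data are illustrative.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(3)
X = sm.add_constant(rng.normal(size=(150, 2)))
y = X @ np.array([0.5, 1.0, -2.0]) + rng.normal(size=150)
resid = sm.OLS(y, X).fit().resid

w_stat, w_p = stats.shapiro(resid)          # Shapiro-Wilk
ad = stats.anderson(resid, dist="norm")     # Anderson-Darling
jb_stat, jb_p = stats.jarque_bera(resid)    # Jarque-Bera (skewness and kurtosis)
print(w_p, jb_p, ad.statistic)              # large p-values: do not reject normality
```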
Diagnosis 3: Heteroskedasticity (Non-constancy of Variance)

Heteroskedasticity (non-constancy of variance)
In the multiple regression model there is an assumption that ε ~ NID(0, σ²I). Failure to satisfy this assumption has serious repercussions in modelling.
Implications of Heteroskedasticity
Ordinary least squares estimators are still linear, unbiased, and consistent.
Regression coefficients will have larger standard errors than necessary.
Ordinary least squares estimators are not efficient: the variances of the estimated coefficients are not minimum, hence OLS is no longer BLUE, and it is not asymptotically (i.e., as n increases) efficient.
The variances of the OLS estimators are not given by the usual OLS formula, so t- and F-tests based on them can be highly misleading, resulting in incorrect conclusions.
How to detect heteroskedasticity?
Plot X (x-axis) against the residuals (y-axis).
Note: Because the residuals have unequal variances even when the variance of the errors is constant, it is preferable to plot studentized residuals against the fitted values.
Heteroskedasticity is suggested if the residual plot gives the impression that the variance increases or decreases in a systematic manner (a funnel-shaped plot) with a variable X or with Y.
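A minimal sketch (an assumption) of the suggested plot: internally studentized residuals against fitted values, with simulated data whose error spread grows with x.

```python
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

rng = np.random.default_rng(4)
x = rng.uniform(1, 10, 300)
y = 2 + 3 * x + rng.normal(scale=x, size=300)   # error spread grows with x
res = sm.OLS(y, sm.add_constant(x)).fit()

student = res.get_influence().resid_studentized_internal
plt.scatter(res.fittedvalues, student, s=10)
plt.axhline(0, linestyle="--")
plt.xlabel("Fitted values")
plt.ylabel("Studentized residuals")
plt.show()   # a widening funnel suggests heteroskedasticity
```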

Two-sample Test
Fit separate regressions to each half of the observations, split by the level of X, then compare the error mean squares (MSE of the first half / MSE of the second half).

Ho: σ1² = σ2² vs Ha: σ1² ≠ σ2²

In choosing the X on which to split, choose the variable with the worst-looking variance pattern in the residual plot.
Goldfeld-Quandt Test
The test involves the calculation of two least-squares regression lines, one using the data associated with LOW-variance errors and the other with HIGH-variance errors.
Fit a model to the first group of observations (ordered by the level of X), containing the low (or high) variance observations.
Fit a model to the second group of observations (ordered by the level of X), containing the high (or low) variance observations.
Then use the ratio of the two error sums of squares, SSE(one group) / SSE(other group), as the F-statistic. CR: F(df of the numerator SSE, df of the denominator SSE).

Ho: σ1² = σ2² vs Ha: σ1² ≠ σ2²
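A hedged sketch (an assumption) using statsmodels' Goldfeld-Quandt test, which sorts the data by a chosen column, optionally drops a middle fraction, and compares the two error sums of squares. The split and drop fractions below are arbitrary choices.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_goldfeldquandt

rng = np.random.default_rng(5)
x = np.sort(rng.uniform(1, 10, 200))
X = sm.add_constant(x)
y = 1 + 2 * x + rng.normal(scale=x, size=200)   # variance increases with x

f_stat, p_value, _ = het_goldfeldquandt(y, X, idx=1, split=0.4, drop=0.2)
print(f_stat, p_value)   # a small p-value: reject Ho of equal variances
```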


White's Heteroskedasticity Test
A more general test that does not specify the nature of the heteroskedasticity.
Ho: variances are constant
Ha: variances are not constant

Breusch-Pagan Test
A more general test that does not specify the nature of the heteroskedasticity.
Ho: variances are constant
Ha: variances are not constant
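A minimal sketch (an assumption) running both the Breusch-Pagan and White tests on an OLS fit with statsmodels.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan, het_white

rng = np.random.default_rng(6)
X = sm.add_constant(rng.uniform(1, 10, size=(300, 2)))
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=X[:, 1], size=300)
res = sm.OLS(y, X).fit()

bp_lm, bp_lm_p, bp_f, bp_f_p = het_breuschpagan(res.resid, res.model.exog)
w_lm, w_lm_p, w_f, w_f_p = het_white(res.resid, res.model.exog)
print(bp_lm_p, w_lm_p)   # small p-values: reject Ho of constant variance
```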
Remedies

Transformation of Variables
Only if the form of heteroskedasticity is known.

Generalized Least Squares (GLS)
Use GLS rather than OLS.
Idea: Transform the observations in such a way that the transformed errors have variance equal to I or σ²I.
Since V is positive definite (i.e., x'Vx > 0 for every nonzero vector x), its inverse V⁻¹ is also positive definite. There exists a nonsingular (i.e., invertible) matrix W such that W'W = V⁻¹.
Transform the model Y = Xβ + ε by multiplying both sides by W.

Weighted Least Squares (WLS)
When V is diagonal with known but unequal variances, W contains the reciprocals of the known standard deviations, so that W'W = V⁻¹; equivalently, each observation is weighted by the reciprocal of its error variance.
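A hedged sketch (an assumption) of weighted least squares in statsmodels when the error standard deviation is proportional to x, so the weights are 1/x².

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
x = rng.uniform(1, 10, 300)
X = sm.add_constant(x)
y = 1 + 2 * x + rng.normal(scale=x, size=300)   # standard deviation proportional to x

wls_res = sm.WLS(y, X, weights=1.0 / x ** 2).fit()
ols_res = sm.OLS(y, X).fit()
print(ols_res.bse, wls_res.bse)   # WLS gives the appropriate standard errors
```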
Heteroskedasticity-Consistent Covariance (White)
Used when the pattern of heteroskedasticity is unknown: keep the OLS coefficients but replace their covariance matrix with a heteroskedasticity-consistent (robust) estimate.
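A minimal sketch (an assumption) of White's heteroskedasticity-consistent standard errors via statsmodels' cov_type argument; HC0 is White's original estimator and HC1-HC3 are small-sample variants.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
x = rng.uniform(1, 10, 300)
X = sm.add_constant(x)
y = 1 + 2 * x + rng.normal(scale=x, size=300)

robust_res = sm.OLS(y, X).fit(cov_type="HC0")   # OLS coefficients, robust covariance
print(robust_res.bse)
```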
Diagnosis 4: Multicollinearity

Recall: Linear Dependence
Consider n-dimensional vectors x1, x2, ..., xn. If there exist constants c1, c2, ..., cn, not all equal to zero, such that c1x1 + c2x2 + ... + cnxn = 0 (equation A), then the set of vectors is LINEARLY DEPENDENT.
If the only set c1, c2, ..., cn that satisfies equation (A) is the one with all values equal to 0, then the set of vectors is LINEARLY INDEPENDENT.
Recall: Linear Dependence in Regression
If the independent variables Xs are linearly dependent, then rk(X'X) < p (where p is the number of parameters, and the rank is the number of nonzero rows in row echelon form); consequently (X'X)⁻¹ does not exist.
If the Xs are nearly linearly dependent, rk(X'X) is barely p and (X'X)⁻¹ becomes unstable.
Multicollinearity
The problem of multicollinearity exists when the joint association of the independent variables affects the modelling process.
Pairwise correlation of independent variables will NOT necessarily lead to multicollinearity.
Absence of pairwise correlation among the Xs will NOT necessarily indicate absence of multicollinearity.
Joint correlation of the independent variables is not a problem if it is too weak to affect modelling.
Primary Sources of Multicollinearity
The data collection method employed.
Constraints in the model or in the population.
Model specification.
An over-defined (overparameterized) model.

NOTE:
Multicollinearity is not a statistical problem but a data problem; nevertheless, it greatly affects the efficiency of least-squares estimation.
Implications of Multicollinearity
Larger variances and standard errors of the OLS estimators.
Wider confidence intervals.
Insignificant t-ratios.
A high R² but few significant t-ratios.
Wrong signs for regression coefficients.
OLS estimators and their standard errors become very sensitive to small changes in the data; that is, they tend to be unstable (the OLS estimators are not resistant).
Difficulty in assessing the individual contributions of explanatory variables to the regression sum of squares or R².
ILL-CONDITIONING PROBLEM
X plays a crucial role: when the problem exists, the inverse of (X'X) can be unstable.
Least-squares estimators provide poor estimates.
Unstable/inflated standard errors ⇒ most likely failure to reject Ho.
Opposite (wrong) signs of slopes.
Contradicting results between the analysis of variance and the individual assessments of significance of the variables.
How to detect multicollinearity?
Reversed signs of the coefficients.
Correlation matrix (limited, though, since it only captures pairwise association).

Variance Inflation Factors
VIFj is the jth diagonal element of (X'X)⁻¹ with X in correlation form; it indicates which term is most affected by multicollinearity.
VIFj > 10 indicates severe variance inflation for the parameter estimator associated with Xj.
If X is centered and scaled, X'X = Rxx, so (X'X)⁻¹ = Rxx⁻¹
⇒ VIFj = 1 / (1 - Rj²), where Rj² is the coefficient of determination obtained when regressing Xj on the other k-1 independent variables.
⇒ Thus, as Rj² → 0, VIFj → 1, and as Rj² → 1, VIFj → ∞.
⇒ VIF measures the effect of the dependencies on the variance of the jth slope.
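A minimal sketch (an assumption) of computing VIFs with statsmodels; the near-collinear variables x1 and x2 are simulated for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(9)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)      # nearly collinear with x1
x3 = rng.normal(size=200)
X = sm.add_constant(pd.DataFrame({"x1": x1, "x2": x2, "x3": x3}))

vifs = {col: variance_inflation_factor(X.values, i)
        for i, col in enumerate(X.columns) if col != "const"}
print(vifs)   # x1 and x2 should show VIF well above 10
```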
Condition Number
Suppose X'X has eigenvalues λ1, λ2, ..., λk. If λmax = max{λi} and λmin = min{λi}, then the condition number is k = λmax/λmin. Note that under ill-conditioning, some eigenvalues of X'X are near 0.
k < 100: no serious problem | k between 100 and 1000: moderate to strong multicollinearity | k > 1000: severe multicollinearity.
If λj is close to 0 and tj = (tj1, ..., tjk)' is the eigenvector associated with it, then Σi tji Xi ≈ 0 (i = 1, ..., k) gives the structure of the dependency among the Xs that leads to the problem. This gives a clear picture of how the Xs are related to each other.
Condition Indices
Compute the ratio of the square root of the maximum eigenvalue to the square root of each of the other eigenvalues.
This clarifies whether one or several dependencies are present among the Xs.
Indices greater than 30 could indicate dependencies.
This can help in formulating a possible simultaneous system of equations.
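A hedged sketch (an assumption) of the condition number and condition indices from the eigenvalues of the column-scaled X'X matrix, using plain numpy.

```python
import numpy as np

rng = np.random.default_rng(10)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)      # nearly collinear
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])
X = X / np.linalg.norm(X, axis=0)              # scale columns to unit length

eigvals = np.linalg.eigvalsh(X.T @ X)
condition_number = eigvals.max() / eigvals.min()
condition_indices = np.sqrt(eigvals.max() / eigvals)
print(condition_number, condition_indices)     # indices > 30 suggest a dependency
```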
Variance Proportions
Var(𝛽hat) = 𝜎²(X'X)⁻¹ and Var(𝛽hatj) = 𝜎² VIFj (with X in correlation form).
Variance proportion: 𝜋ij = (tij²/𝜆i) / VIFj.
If the value is large (e.g., greater than about 0.5), the jth regressor is implicated in the near dependency represented by the ith eigenvector. That is, a multicollinearity problem occurs when a component associated with a high condition index contributes strongly to the variance of two or more variables.
Remedies

Centering of Observations
Especially effective if complex functions of the Xs are present in the design matrix.

Deletion of Unimportant Variables
Backward selection is preferred. When reading the results from SAS:
1. First check whether the lowest eigenvalue (lambda) is near 0 or practically 0.
2. If there is no problem with the lowest eigenvalue, check the largest condition index and try removing the variable it implicates.
Imposing Constraints
Suppose the eigensystem analysis implies 2X1 + X2 ≈ 0. To fit Y = β0 + β1X1 + β2X2 + β3X3 + ε, we can combine X1 and X2 by imposing the constraint β1 = 2β2. This has the effect of regressing on the new variable 2X1 + X2 rather than on X1 and X2 separately.
Shrinkage Estimation (Ridge Regression)
Uses a biased estimator that is more stable than OLS in the presence of multicollinearity.
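A minimal sketch (an assumption) of the ridge estimator 𝛽hat(k) = (X'X + kI)⁻¹X'y computed directly with numpy; the penalty k = 0.5 is chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(11)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)      # nearly collinear
X = np.column_stack([np.ones(200), x1, x2])
y = 1 + 2 * x1 + 3 * x2 + rng.normal(size=200)

k = 0.5                                         # ridge penalty (illustrative)
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
beta_ridge = np.linalg.solve(X.T @ X + k * np.eye(X.shape[1]), X.T @ y)
print(beta_ols, beta_ridge)                     # ridge coefficients are shrunken but more stable
```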
Principal Component Regression
Determine linear combinations of the Xs that explain as much variability as the original Xs. These combinations are linearly independent and are called PRINCIPAL COMPONENTS. The principal components are then used as the regressors.
Downside: the principal components are difficult to interpret.
Diagnosis 5: Autocorrelation

Autocorrelation
Serial correlation (autocorrelation) is the presence of association among adjacent observations, which tend to be similar.
How to detect autocorrelation?

Durbin-Watson Test
Recall Y = Xβ + ε where ε ~ NID(0, σ²I).
One possible autocorrelation model is εt = ρεt-1 + ωt, where ωt ~ NID(0, σω²) and |ρ| < 1.
Hypotheses: Ho: ρ = 0 vs Ha: ρ ≠ 0
(If Ho is not rejected, the autocorrelation model is not needed.)
Test statistic: d = Σt (et - et-1)² / Σt et², where the numerator sums over t = 2, ..., n and the denominator over t = 1, ..., n.


Durbin-Watson Test: Procedure for Interpreting SAS Output
1. Set up the possible autocorrelation model εt = ρεt-1 + ωt.
2. We do not know ρ, so we estimate it with ρhat.
3. Using the Durbin-Watson test, obtain the value of the d test statistic.
4. If d is not near 2, autocorrelation is possibly present.
5. If there is autocorrelation based on the Durbin-Watson test, the residuals should be redefined using the autocorrelation model, replacing ρ with ρhat, which can be found under "1st order correlation" in the output.
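A minimal sketch (an assumption) of the Durbin-Watson statistic computed from OLS residuals with statsmodels; the AR(1) errors with ρ = 0.7 are simulated for illustration.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(12)
n = 200
x = rng.normal(size=n)
e = np.zeros(n)
for t in range(1, n):                      # AR(1) errors with rho = 0.7
    e[t] = 0.7 * e[t - 1] + rng.normal()
y = 1 + 2 * x + e

res = sm.OLS(y, sm.add_constant(x)).fit()
print(durbin_watson(res.resid))            # well below 2 here, suggesting positive autocorrelation
```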
Other Tests
1. Overall test for white noise
2. Autocorrelation plot
Effects of Autocorrelation
Least-squares estimators of the regression coefficients are unbiased but are not efficient, in the sense that they no longer have minimum variance.
Estimates of the variances and standard errors of the regression coefficients may be seriously understated, giving a spurious impression of accuracy.
Confidence intervals and the various tests of significance commonly employed would no longer be strictly valid.
Remedies

Re-specification
If the cause of the serial correlation is incorrect specification, a re-specification can turn an equation with serially correlated error terms into one whose error terms are no longer serially correlated.

Generalized Least Squares
Transforms the model, just as in the heteroskedasticity problem:
1. Start with the model for yt, with εt = ρεt-1 + ωt.
2. Write the model for yt-1.
3. Multiply both sides of the equation for yt-1 by ρ.
4. Subtract the equation in step 3 from the equation in step 1.
5. Redefine β* and ε* in the resulting (quasi-differenced) equation.
Cochrane-Orcutt Procedure
Estimate ρ from the residuals, then proceed with GLS on the transformed (quasi-differenced) data, iterating if necessary.
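A hedged sketch (an assumption) of feasible GLS with AR(1) errors via statsmodels' GLSAR, which alternates between estimating ρ and refitting, a Cochrane-Orcutt-style iteration. The simulated data are illustrative.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(13)
n = 200
x = rng.normal(size=n)
e = np.zeros(n)
for t in range(1, n):
    e[t] = 0.7 * e[t - 1] + rng.normal()   # AR(1) errors
y = 1 + 2 * x + e
X = sm.add_constant(x)

glsar_model = sm.GLSAR(y, X, rho=1)        # rho=1 requests one autoregressive lag
glsar_res = glsar_model.iterative_fit(maxiter=10)
print(glsar_model.rho, glsar_res.params)   # estimated rho and the GLS coefficients
```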
Diagnosis 6: Outliers

Outliers
Outliers should not be automatically deleted from the data. Outliers may be retained, but their influence on the estimation procedure should be reduced.
How to detect outliers?

Plots
Outliers cannot all be detected at once; it is necessary to plot each X against Y to look for oddities.
Use scatter plots or residual plots.
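A minimal sketch (an assumption) complementing the plots above with numerical diagnostics from statsmodels' influence measures; the planted outlier is illustrative.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(14)
x = rng.normal(size=100)
y = 1 + 2 * x + rng.normal(size=100)
x[0], y[0] = 5.0, -20.0                    # plant an influential outlier

res = sm.OLS(y, sm.add_constant(x)).fit()
infl = res.get_influence()
print(infl.resid_studentized_external[:3]) # externally studentized residuals
print(infl.cooks_distance[0][:3])          # Cook's distance for each observation
```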
