
Stat 136: Diagnostic Checking and Validation of Assumptions
May 12, 2019
Diagnosis 1: Nonlinearity

Linearity
In the multiple regression model there is an assumption that E(ε) = 0, that is, the independent variables X fully account for the systematic variation in Y.

Nonlinearity occurs when the expected value of ε is not equal to zero. For example, a partial relationship may actually be quadratic, or two independent variables may not have additive partial effects on Y.
Implications of Nonlinearity
The regression surface is not precisely captured.
The model fails to capture the systematic pattern of the relationship between the Xs and Y.

If nonlinearity exists, the fitted model can still approximate the dependent variable Y; however, E(Y|X1, X2, ..., Xk) can be misleading.
How to detect nonlinearity?
Use partial residual plots.
Plotting each X individually against Y, although useful, can be misleading, because our primary concern is NOT the marginal relationship of each X to Y, but the PARTIAL relationship of Y to each X.

Partial Residual Plots
To address the concern above, we use residual-based plots, particularly the partial residual plot, where:
X-axis: Xj
Y-axis: ei(j) = ei + 𝛽hatj Xij, which is Yi with the linear effects of the other variables removed.

In a partial regression (added-variable) plot:
X-axis: residuals from the model with Xj as the dependent variable regressed on all other Xs.
Y-axis: residuals from the model of Y regressed on all Xs other than Xj.
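A minimal sketch (not from the slides) of both plot types using statsmodels; the data frame, the variable names x1, x2, x3, y, and the simulated quadratic relationship are illustrative assumptions.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({"x1": rng.normal(size=n),
                   "x2": rng.normal(size=n),
                   "x3": rng.normal(size=n)})
# The true relationship is quadratic in x1, so its partial residual plot should bend.
df["y"] = 1 + 2 * df["x1"] ** 2 + 0.5 * df["x2"] - df["x3"] + rng.normal(size=n)

results = smf.ols("y ~ x1 + x2 + x3", data=df).fit()

# Partial residual (component-plus-residual) plots: e_i + bhat_j * X_ij against X_j.
sm.graphics.plot_ccpr_grid(results)
# Partial regression (added-variable) plots: residuals of y|other Xs vs residuals of Xj|other Xs.
sm.graphics.plot_partregress_grid(results)
plt.show()
```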
Difference between the Partial Regression Plot and the Partial Residual Plot

Partial Regression Plot
Mostly used to identify leverage points and influential data points (e.g., outliers) that might not be leverage points.
Can reveal nonlinearity and suggest whether a relationship is monotone.
Since its x-axis is not Xj, it is not always useful for locating a transformation.
Can show the correct linear strength of the relationship between the response Y and Xj (i.e., the correlation).

Partial Residual Plot
Commonly used to identify the nature of the relationship between Y and Xj (given the effect of the other independent variables in the model).
Cannot distinguish between monotone and nonmonotone nonlinearity.
Useful for locating a transformation.
Cannot show the linear strength.
Why use Partial Residual and Partial Regression Plots?
Partial regression plots attempt to show the effect of adding an additional variable to the model (given that one or more independent variables are already in the model).

When performing a linear regression with a single independent variable, a scatter plot of the response variable against the independent variable gives a good indication of the nature of the relationship. With more than one independent variable, things become more complicated. Although it can still be useful to generate scatter plots of the response variable against each of the independent variables, these plots do not take into account the effect of the other independent variables in the model.
Remedy: Transform variables
Purpose:

To enhance linearity

To improve data configuration
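A hedged sketch (an assumption, not a prescription from the slides) of two common transformations used to restore linearity: a log-log specification and a quadratic term. The variable names and simulated data are illustrative.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
df = pd.DataFrame({"x1": rng.uniform(1, 5, 200), "x2": rng.normal(size=200)})
df["y"] = np.exp(0.8 * np.log(df["x1"]) + 0.3 * df["x2"] + rng.normal(scale=0.2, size=200))

# A curved partial residual plot for x1 might suggest a log-log specification.
fit_loglog = smf.ols("np.log(y) ~ np.log(x1) + x2", data=df).fit()
# Alternatively, a quadratic term in x1 keeps y on its original scale.
fit_quad = smf.ols("y ~ x1 + I(x1 ** 2) + x2", data=df).fit()
print(fit_loglog.params)
```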


Diagnosis 2: Non-normality of Error Terms

Non-normality of error terms
In the multiple regression model there is an assumption that ε ~ NID(0, σ²I). Failure to satisfy this assumption has serious repercussions in modelling.

The assumption of normally distributed errors is almost always arbitrary, but the central limit theorem assures that inference based on the least-squares estimator is approximately valid.
Implications of Non-normality
Computation of t- and F-statistics.
Computation of prediction intervals and confidence intervals.

Small departures from normality do not usually affect the model, but gross non-normality can affect these statistics.

NOTE:
Although the validity of least-squares estimation is robust, its efficiency is not: the least-squares estimator is maximally efficient among unbiased estimators only when the errors are normal.
Types of Departure from Normality and Their Effects
Distributions with thicker or heavier tails than the normal:
The least-squares fit may be sensitive when sample sizes are small.
They more often generate outliers that pull the least-squares fit too much in their direction.
For heavy-tailed errors, the efficiency of least-squares estimation decreases markedly.
How to detect non-normality?
Use a normal probability plot or a goodness-of-fit test.

Normal Probability Plot
Plot the ordered residuals e(i) against the corresponding quantiles of the theoretical cumulative distribution function (CDF).
If the plot exhibits a straight line, the errors are approximately normal.
Deviation from the straight line shows thickness or thinness of the tails.
Possible defect: the occurrence of one or two large residuals, which indicates outliers.
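A minimal sketch (an assumption, not from the slides) of a normal probability plot of OLS residuals using scipy; statsmodels' qqplot would work equally well. The simulated heavy-tailed errors are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
import statsmodels.api as sm

rng = np.random.default_rng(2)
X = sm.add_constant(rng.normal(size=(200, 2)))
y = X @ np.array([1.0, 2.0, -1.0]) + rng.standard_t(df=3, size=200)  # heavy-tailed errors

resid = sm.OLS(y, X).fit().resid
stats.probplot(resid, dist="norm", plot=plt)  # points far off the line flag non-normal tails
plt.show()
```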
Goodness-of-fit Tests
An empirical way of checking the normality of the error terms which makes use of hypothesis testing.
Ho: Error terms are normally distributed.
Ha: Error terms are not normally distributed.
Goal: DO NOT REJECT Ho.
Since the error terms are unobservable, we use the residuals.

Common tests (see the sketch after this list):
1. Chi-square goodness-of-fit test
2. Kolmogorov-Smirnov one-sample test
3. Shapiro-Wilk test (most commonly used)
4. Anderson-Darling test (can detect small departures)
5. Cramér-von Mises criterion
6. Jarque-Bera test (uses kurtosis and skewness)
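A hedged sketch (an assumption) applying a few of the listed tests to OLS residuals with scipy; the simulated data are illustrative.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(3)
X = sm.add_constant(rng.normal(size=(150, 2)))
y = X @ np.array([0.5, 1.0, -2.0]) + rng.normal(size=150)
resid = sm.OLS(y, X).fit().resid

w_stat, w_p = stats.shapiro(resid)          # Shapiro-Wilk
ad = stats.anderson(resid, dist="norm")     # Anderson-Darling
jb_stat, jb_p = stats.jarque_bera(resid)    # Jarque-Bera (skewness and kurtosis)
print(w_p, jb_p, ad.statistic)              # large p-values: do not reject normality
```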
Diagnosis 3: Heteroskedasticity (Non-constancy of Variance)

Heteroskedasticity (non-constancy of variance)
In the multiple regression model there is an assumption that ε ~ NID(0, σ²I). Failure to satisfy this assumption has serious repercussions in modelling.
Implications of Heteroskedasticity
Ordinary least squares estimators are still linear, unbiased, and consistent.
Regression coefficients will have larger standard errors than necessary.
Ordinary least squares estimators are not efficient: the variances of the estimated coefficients are not minimum, hence OLS is no longer BLUE, and it is not asymptotically (i.e., as n increases) efficient.
The variances of the OLS estimators are not given by the usual OLS formula, so t- and F-tests based on them can be highly misleading, resulting in incorrect conclusions.
How to detect heteroskedasticity?
Plot X (x-axis) against the residuals (y-axis).
Note: Because the residuals have unequal variances even when the variance of the errors is constant, it is preferable to plot studentized residuals against the fitted values.
Heteroskedasticity is suggested if the residual plot gives the impression that the variance increases or decreases in a systematic manner (a funnel-shaped plot) with a variable X or with Y.
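A minimal sketch (an assumption) of the suggested plot: internally studentized residuals against fitted values, with simulated data whose error spread grows with x.

```python
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

rng = np.random.default_rng(4)
x = rng.uniform(1, 10, 300)
y = 2 + 3 * x + rng.normal(scale=x, size=300)   # error spread grows with x
res = sm.OLS(y, sm.add_constant(x)).fit()

student = res.get_influence().resid_studentized_internal
plt.scatter(res.fittedvalues, student, s=10)
plt.axhline(0, linestyle="--")
plt.xlabel("Fitted values")
plt.ylabel("Studentized residuals")
plt.show()   # a widening funnel suggests heteroskedasticity
```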

Two-sample Test
Fit separate regressions to each half of the observations, split by the level of X, then compare the error mean squares (MSE of the first half / MSE of the second half).

Ho: σ1² = σ2² vs Ha: σ1² ≠ σ2²

In choosing the X on which to split, choose the variable with the worst-looking variance pattern in the residual plot.
Goldfeld-Quandt Test
The test involves the calculation of two least-squares regression lines, one using the data associated with LOW-variance errors and the other with HIGH-variance errors.
Fit a model to the first group of observations (ordered by the level of X), containing the low (or high) variance observations.
Fit a model to the second group of observations (ordered by the level of X), containing the high (or low) variance observations.
Then use the ratio of the two error sums of squares, SSE(one group) / SSE(other group), as the F-statistic. CR: F(df of the numerator SSE, df of the denominator SSE).

Ho: σ1² = σ2² vs Ha: σ1² ≠ σ2²
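A hedged sketch (an assumption) using statsmodels' Goldfeld-Quandt test, which sorts the data by a chosen column, optionally drops a middle fraction, and compares the two error sums of squares. The split and drop fractions below are arbitrary choices.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_goldfeldquandt

rng = np.random.default_rng(5)
x = np.sort(rng.uniform(1, 10, 200))
X = sm.add_constant(x)
y = 1 + 2 * x + rng.normal(scale=x, size=200)   # variance increases with x

f_stat, p_value, _ = het_goldfeldquandt(y, X, idx=1, split=0.4, drop=0.2)
print(f_stat, p_value)   # a small p-value: reject Ho of equal variances
```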


White's Heteroskedasticity Test
A more general test that does not specify the nature of the heteroskedasticity.
Ho: variances are constant
Ha: variances are not constant

Breusch-Pagan Test
A more general test that does not specify the nature of the heteroskedasticity.
Ho: variances are constant
Ha: variances are not constant
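A minimal sketch (an assumption) running both the Breusch-Pagan and White tests on an OLS fit with statsmodels.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan, het_white

rng = np.random.default_rng(6)
X = sm.add_constant(rng.uniform(1, 10, size=(300, 2)))
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=X[:, 1], size=300)
res = sm.OLS(y, X).fit()

bp_lm, bp_lm_p, bp_f, bp_f_p = het_breuschpagan(res.resid, res.model.exog)
w_lm, w_lm_p, w_f, w_f_p = het_white(res.resid, res.model.exog)
print(bp_lm_p, w_lm_p)   # small p-values: reject Ho of constant variance
```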
Remedies

Transformation of Variables
Only if the form of heteroskedasticity is known.

Generalized Least Squares (GLS)
Use GLS rather than OLS.
Idea: Transform the observations in such a way that the transformed errors have variance equal to I or σ²I.
Since V is positive definite (i.e., x'Vx > 0 for every nonzero vector x), its inverse V⁻¹ is also positive definite. There exists a nonsingular (i.e., invertible) matrix W such that W'W = V⁻¹.
Transform the model Y = Xβ + ε by multiplying both sides by W.

Weighted Least Squares (WLS)
When V is diagonal with known but unequal variances, W contains the reciprocals of the known standard deviations, so that W'W = V⁻¹; equivalently, each observation is weighted by the reciprocal of its error variance.
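A hedged sketch (an assumption) of weighted least squares in statsmodels when the error standard deviation is proportional to x, so the weights are 1/x².

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
x = rng.uniform(1, 10, 300)
X = sm.add_constant(x)
y = 1 + 2 * x + rng.normal(scale=x, size=300)   # standard deviation proportional to x

wls_res = sm.WLS(y, X, weights=1.0 / x ** 2).fit()
ols_res = sm.OLS(y, X).fit()
print(ols_res.bse, wls_res.bse)   # WLS gives the appropriate standard errors
```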
Heteroskedasticity-Consistent Covariance (White)
Used when the pattern of heteroskedasticity is unknown: keep the OLS coefficients but replace their covariance matrix with a heteroskedasticity-consistent (robust) estimate.
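A minimal sketch (an assumption) of White's heteroskedasticity-consistent standard errors via statsmodels' cov_type argument; HC0 is White's original estimator and HC1-HC3 are small-sample variants.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
x = rng.uniform(1, 10, 300)
X = sm.add_constant(x)
y = 1 + 2 * x + rng.normal(scale=x, size=300)

robust_res = sm.OLS(y, X).fit(cov_type="HC0")   # OLS coefficients, robust covariance
print(robust_res.bse)
```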
Diagnosis 4: Multicollinearity

Recall: Linear Dependence
Consider n-dimensional vectors x1, x2, ..., xn. If there exist constants c1, c2, ..., cn, not all equal to zero, such that c1x1 + c2x2 + ... + cnxn = 0 (equation A), then the set of vectors is LINEARLY DEPENDENT.
If the only set c1, c2, ..., cn that satisfies equation (A) is the one with all values equal to 0, then the set of vectors is LINEARLY INDEPENDENT.
Recall: Linear Dependence in Regression
If the independent variables Xs are linearly dependent, then rk(X'X) < p (where p is the number of parameters, and the rank is the number of nonzero rows in row echelon form); consequently (X'X)⁻¹ does not exist.
If the Xs are nearly linearly dependent, rk(X'X) is barely p and (X'X)⁻¹ becomes unstable.
Multicollinearity
The problem of multicollinearity exists when the joint association of the independent variables affects the modelling process.
Pairwise correlation of independent variables will NOT necessarily lead to multicollinearity.
Absence of pairwise correlation among the Xs will NOT necessarily indicate absence of multicollinearity.
Joint correlation of the independent variables is not a problem if it is too weak to affect modelling.
Primary Sources of Multicollinearity
The data collection method employed.
Constraints in the model or in the population.
Model specification.
An over-defined (overparameterized) model.

NOTE:
Multicollinearity is not a statistical problem but a data problem; nevertheless, it greatly affects the efficiency of least-squares estimation.
Implications of Multicollinearity
Larger variances and standard errors of the OLS estimators.
Wider confidence intervals.
Insignificant t-ratios.
A high R² but few significant t-ratios.
Wrong signs for regression coefficients.
OLS estimators and their standard errors become very sensitive to small changes in the data; that is, they tend to be unstable (the OLS estimators are not resistant).
Difficulty in assessing the individual contributions of explanatory variables to the regression sum of squares or R².
ILL-CONDITIONING PROBLEM
X plays a crucial role: when the problem exists, the inverse of (X'X) can be unstable.
Least-squares estimators provide poor estimates.
Unstable/inflated standard errors ⇒ most likely failure to reject Ho.
Opposite (wrong) signs of slopes.
Contradicting results between the analysis of variance and the individual assessments of significance of the variables.
How to detect multicollinearity?
Reversed signs of the coefficients.
Correlation matrix (limited, though, since it only captures pairwise association).

Variance Inflation Factors
VIFj is the jth diagonal element of (X'X)⁻¹ with X in correlation form; it indicates which term is most affected by multicollinearity.
VIFj > 10 indicates severe variance inflation for the parameter estimator associated with Xj.
If X is centered and scaled, X'X = Rxx, so (X'X)⁻¹ = Rxx⁻¹
⇒ VIFj = 1 / (1 - Rj²), where Rj² is the coefficient of determination obtained when regressing Xj on the other k-1 independent variables.
⇒ Thus, as Rj² → 0, VIFj → 1, and as Rj² → 1, VIFj → ∞.
⇒ VIF measures the effect of the dependencies on the variance of the jth slope.
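A minimal sketch (an assumption) of computing VIFs with statsmodels; the near-collinear variables x1 and x2 are simulated for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(9)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)      # nearly collinear with x1
x3 = rng.normal(size=200)
X = sm.add_constant(pd.DataFrame({"x1": x1, "x2": x2, "x3": x3}))

vifs = {col: variance_inflation_factor(X.values, i)
        for i, col in enumerate(X.columns) if col != "const"}
print(vifs)   # x1 and x2 should show VIF well above 10
```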
Condition Number
Suppose X'X has eigenvalues λ1, λ2, ..., λk. If λmax = max{λi} and λmin = min{λi}, then the condition number is k = λmax/λmin. Note that under ill-conditioning, some eigenvalues of X'X are near 0.
k < 100: no serious problem | k between 100 and 1000: moderate to strong multicollinearity | k > 1000: severe multicollinearity.
If λj is close to 0 and tj = (tj1, ..., tjk)' is the eigenvector associated with it, then Σi tji Xi ≈ 0 (i = 1, ..., k) gives the structure of the dependency among the Xs that leads to the problem. This gives a clear picture of how the Xs are related to each other.
Condition Indices
Compute the ratio of the square root of the maximum eigenvalue to the square root of each of the other eigenvalues.
This clarifies whether one or several dependencies are present among the Xs.
Indices greater than 30 could indicate dependencies.
This can help in formulating a possible simultaneous system of equations.
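A hedged sketch (an assumption) of the condition number and condition indices from the eigenvalues of the column-scaled X'X matrix, using plain numpy.

```python
import numpy as np

rng = np.random.default_rng(10)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)      # nearly collinear
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])
X = X / np.linalg.norm(X, axis=0)              # scale columns to unit length

eigvals = np.linalg.eigvalsh(X.T @ X)
condition_number = eigvals.max() / eigvals.min()
condition_indices = np.sqrt(eigvals.max() / eigvals)
print(condition_number, condition_indices)     # indices > 30 suggest a dependency
```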
Variance Proportions
Var(𝛽hat) = 𝜎²(X'X)⁻¹ and Var(𝛽hatj) = 𝜎² VIFj (with X in correlation form).
Variance proportion: 𝜋ij = (tij²/𝜆i) / VIFj.
If the value is large (e.g., greater than about 0.5), the jth regressor is implicated in the near dependency represented by the ith eigenvector. That is, a multicollinearity problem occurs when a component associated with a high condition index contributes strongly to the variance of two or more variables.
Remedies

Centering of Observations
Especially effective if complex functions of the Xs are present in the design matrix.

Deletion of Unimportant Variables
Backward selection is preferred. When reading the results from SAS:
1. First check whether the lowest eigenvalue (lambda) is near 0 or practically 0.
2. If there is no problem with the lowest eigenvalue, check the largest condition index and try removing the variable it implicates.
Imposing Constraints
Suppose the eigensystem analysis implies 2X1 + X2 ≈ 0. To fit Y = β0 + β1X1 + β2X2 + β3X3 + ε, we can combine X1 and X2 by imposing the constraint β1 = 2β2. This has the effect of regressing on the new variable 2X1 + X2 rather than on X1 and X2 separately.
Shrinkage Estimation (Ridge Regression)
Uses a biased estimator that is more stable than OLS in the presence of multicollinearity.
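A minimal sketch (an assumption) of the ridge estimator 𝛽hat(k) = (X'X + kI)⁻¹X'y computed directly with numpy; the penalty k = 0.5 is chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(11)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)      # nearly collinear
X = np.column_stack([np.ones(200), x1, x2])
y = 1 + 2 * x1 + 3 * x2 + rng.normal(size=200)

k = 0.5                                         # ridge penalty (illustrative)
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
beta_ridge = np.linalg.solve(X.T @ X + k * np.eye(X.shape[1]), X.T @ y)
print(beta_ols, beta_ridge)                     # ridge coefficients are shrunken but more stable
```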
Principal Component Regression
Determine linear combinations of the Xs that explain as much variability as the original Xs. These combinations are linearly independent and are called PRINCIPAL COMPONENTS. The principal components are then used as the regressors.
Downside: the principal components are difficult to interpret.
Diagnosis 5: Autocorrelation

Autocorrelation
Serial correlation (autocorrelation) is the presence of association among adjacent observations, which tend to be similar.
How to detect autocorrelation?

Durbin-Watson Test
Recall Y = Xβ + ε where ε ~ NID(0, σ²I).
One possible autocorrelation model is εt = ρεt-1 + ωt, where ωt ~ NID(0, σω²) and |ρ| < 1.
Hypotheses: Ho: ρ = 0 vs Ha: ρ ≠ 0
(If Ho is not rejected, the autocorrelation model is not needed.)
Test statistic: d = Σt (et - et-1)² / Σt et², where the numerator sums over t = 2, ..., n and the denominator over t = 1, ..., n.


Durbin-Watson Test: Procedure for Interpreting SAS Output
1. Set up the possible autocorrelation model εt = ρεt-1 + ωt.
2. We do not know ρ, so we estimate it with ρhat.
3. Using the Durbin-Watson test, obtain the value of the d test statistic.
4. If d is not near 2, autocorrelation is possibly present.
5. If there is autocorrelation based on the Durbin-Watson test, the residuals should be redefined using the autocorrelation model, replacing ρ with ρhat, which can be found under "1st order correlation" in the output.
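A minimal sketch (an assumption) of the Durbin-Watson statistic computed from OLS residuals with statsmodels; the AR(1) errors with ρ = 0.7 are simulated for illustration.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(12)
n = 200
x = rng.normal(size=n)
e = np.zeros(n)
for t in range(1, n):                      # AR(1) errors with rho = 0.7
    e[t] = 0.7 * e[t - 1] + rng.normal()
y = 1 + 2 * x + e

res = sm.OLS(y, sm.add_constant(x)).fit()
print(durbin_watson(res.resid))            # well below 2 here, suggesting positive autocorrelation
```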
Other Tests
1. Overall test for white noise
2. Autocorrelation plot
Effects of Autocorrelation
Least-squares estimators of the regression coefficients are unbiased but are not efficient, in the sense that they no longer have minimum variance.
Estimates of the variances and standard errors of the regression coefficients may be seriously understated, giving a spurious impression of accuracy.
Confidence intervals and the various tests of significance commonly employed would no longer be strictly valid.
Remedies

Re-specification
If the cause of the serial correlation is incorrect specification, a re-specification can turn an equation with serially correlated error terms into one whose error terms are no longer serially correlated.

Generalized Least Squares
Transforms the model, just as in the heteroskedasticity problem:
1. Start with the model for yt, with εt = ρεt-1 + ωt.
2. Write the model for yt-1.
3. Multiply both sides of the equation for yt-1 by ρ.
4. Subtract the equation in step 3 from the equation in step 1.
5. Redefine β* and ε* in the resulting (quasi-differenced) equation.
Cochrane-Orcutt Procedure
Estimate ρ from the residuals, then proceed with GLS on the transformed (quasi-differenced) data, iterating if necessary.
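A hedged sketch (an assumption) of feasible GLS with AR(1) errors via statsmodels' GLSAR, which alternates between estimating ρ and refitting, a Cochrane-Orcutt-style iteration. The simulated data are illustrative.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(13)
n = 200
x = rng.normal(size=n)
e = np.zeros(n)
for t in range(1, n):
    e[t] = 0.7 * e[t - 1] + rng.normal()   # AR(1) errors
y = 1 + 2 * x + e
X = sm.add_constant(x)

glsar_model = sm.GLSAR(y, X, rho=1)        # rho=1 requests one autoregressive lag
glsar_res = glsar_model.iterative_fit(maxiter=10)
print(glsar_model.rho, glsar_res.params)   # estimated rho and the GLS coefficients
```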
Diagnosis 6: Outliers

Outliers
Outliers should not be automatically deleted from the data. Outliers may be retained, but their influence on the estimation procedure should be reduced.
How to detect outliers?

Plots
Outliers cannot all be detected at once; it is necessary to plot each X against Y to look for oddities.
Use scatter plots or residual plots.
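A minimal sketch (an assumption) complementing the plots above with numerical diagnostics from statsmodels' influence measures; the planted outlier is illustrative.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(14)
x = rng.normal(size=100)
y = 1 + 2 * x + rng.normal(size=100)
x[0], y[0] = 5.0, -20.0                    # plant an influential outlier

res = sm.OLS(y, sm.add_constant(x)).fit()
infl = res.get_influence()
print(infl.resid_studentized_external[:3]) # externally studentized residuals
print(infl.cooks_distance[0][:3])          # Cook's distance for each observation
```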
