0% found this document useful (0 votes)

26 views44 pages

MultivariableRegression 6

This document discusses assumptions and outliers in multiple regression models. It defines outliers and discusses their causes and effects. It also covers outlier detection methods like leverage scores and Cook's distance which are calculated using the hat matrix. Leverage scores measure how far a predictor variable is from its mean.

Uploaded by

Alada mana

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

26 views44 pages

MultivariableRegression 6

Uploaded by

Alada mana

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 44

PE I: Multivariable Regression

Outliers
(Chapter 4.9)

Andrius Buteikis, [email protected]

http://web.vu.lt/mif/a.buteikis/
Multiple Regression: Model Assumptions

Much like in the case of the univariate regression with one independent variable, the multiple
regression model has a number of required assumptions:
(MR.1): Linear Model The Data Generating Process (DGP), or in other words, the
population, is described by a linear (in terms of the coefficients) model:

Y = Xβ + ε (MR.1)

(MR.2): Strict Exogeneity Conditional expectation of ε, given all observations of the

explanatory variable matrix X, is zero:

E (ε|X) = 0 (MR.2)

This assumption also implies that E(ε) = E (E(ε|X)) = 0, E(εX) = 0

and Cov(ε, X) = 0. Furthermore, this property implies that: E (Y |X) = Xβ
(MR.3): Conditional Homoskedasticity The variance-covariance matrix of the error
term, conditional on X is constant:
 
Var(1 ) Cov(1 , 2 ) ... Cov(1 , N )
 Cov(2 , 1 ) Var(2 ) ... Cov(2 , N )
Var (ε|X) =   = σ2 I (MR.3)
 
.. .. .. ..
 . . . . 
Cov(N , 1 ) Cov(N , 2 ) ... Var(N )

(MR.4): Conditionally Uncorrelated Errors The covariance between different error

term pairs, conditional on X, is zero:

Cov (i , j |X) = 0, i 6= j (MR.4)

This assumption implies that all error pairs are uncorrelated. For cross-sectional data,
this assumption implies that there is no spatial correlation between errors.
(MR.5) There exists no exact linear relationship between the explanatory variables.
This means that:

c1 Xi1 + c2 Xi2 + ... + ck Xik = 0, ∀i = 1, ..., N ⇐⇒ c1 = c2 = ... = ck = 0 (MR.5)

This assumption is violated if there exists some cj 6= 0.

Alternatively, this requirement means that:

rank (X) = k + 1

or, alternatively, that:

det X > X 6= 0

This assumption is important, because a linear relationship between independent variables

means that we cannot separately estimate the effects of changes in each variable
separately.
(MR.6) (optional) The residuals are normally distributed:

ε|X ∼ N 0, σ2 I

(MR.6)
Outliers
An outlier is an observation which is significantly different from other values in a random sample
from a population.
If we collect all of the various problems that can arise - we can rank them in terms of
severity:

outliers > non − linearity > heteroscedasticity > non − normality

Outlier Causes
Outliers can be cause by:
I measurement errors;
I being from a different process, compared to the rest of the data;
I not having a representative sample (e.g. measuring a single observation from a different city,
when the remaining observations are all from one city);
Outlier Consequences
Outliers can lead to misleading results in parameter estimation and hypothesis testing. This
means that a single outlier can make it seem like:
I a non-linear model may be better suited to the data sample, as opposed to a linear model;
I the residuals are heteroskedastic, when in fact only a residual has a larger variance, which is
different from the rest;
I the distribution is skewed (i.e. non-normal), because of a single observation/residual, which
is significantly different form the test.
set.seed(123)
#
N <- 100
x <- rnorm(mean = 8, sd = 2, n = N)
y <- 4 + 5 * x + rnorm(mean = 0, sd = 0.5, n = N)
y[N] <- -max(y)
Outlier Detection
The broad definition of outliers means that the decision whether an observation should be
considered an outlier is left to the econometrician/statistician/data scientist.
Nevertheless, there are a number of different methods, which can be used to identify abnormal
observations.
Specifically, for regression models, outliers are also detected by comparing the true and fitted
values. Assume that our true model is the linear regression:

Y = Xβ + ε (1)

Then, assume that we estimate β

b via OLS. Consequently, we can write the fitted values as:

b = X X> X −1 X> Y = HY

Y
b = Xβ
−1 >
where H = X X> X X is called the hat matrix (or the projection matrix), which is the
orthogonal projection that maps the vector of the response values, Y, to the vector of
fitted/predicted values, Y.
b It describes the influence that each response value has on each fitted
value, which is why H is sometimes also referred to as the influence matrix.
To understand the projection matrix a bit better do not treat the fitted values as something that
is separate from the true values.
I Instead assume that you have two sets of values: Y and Y.b
I Ideally, we would want Y = Y.
b
I Assuming that the linear relationship, Y = Xβ + ε, holds, this will generally not be possible
because of the random shocks ε
However, the closest approximation would be the conditional expectation of Y, given a design
matrix X, since we know that the conditional expectation is the best predictor from the proof in
Ch. 3.7.
The Conditional Expectation is The Best Predictor (Ch. 3.7)
We begin by outlining the main properties of the conditional moments, which will be useful
(assume that X and Y are random variables):
I Law of total expectation: E [E (h(Y )|X )] = E [h(Y )];
I Conditional variance: Var(Y |X ) := E (Y − E [Y |X ])2 |X = E(Y 2 |X ) − (E [Y |X ])2 ;

I Variance of conditional expectation:

Var(E [Y |X ]) = E (E [Y |X ])2 − (E [E [Y |X ]])2 = E (E [Y |X ])2
− (E [Y ])2 ;
I Expectation of conditional variance: E [Var(Y |X )] = E (Y − E [Y |X ])2 =

E E Y 2 |X − E (E [Y |X ])2 = E Y 2 − E (E [Y |X ])2 ;
I Adding the third and fourth properties together gives us:

Var(Y ) = E Y 2 − (E [Y ])2 = Var(E [Y |X ]) + E [Var(Y |X )].

For simplicity, assume that we are interested in the prediction of Y via the conditional
expectation:
Y = E (Y|X)
We will show that, in general, the conditional expectation is the best predictor of Y.
Assume that the best predictor of Y (a single value), given X is some function g(·), which
minimizes the expected squared error:

argming(X) E (Y − g(X))2 .

Using the conditional moment properties, we can rewrite E (Y − g(X))2 as:

E (Y − g(X))2 = E (Y + E[Y |X] − E[Y |X] − g(X))2

= E (Y − E[Y |X])2 + 2(Y − E[Y |X])(E[Y |X] − g(X)) + (E[Y |X] − g(X))2

= E E (Y − E[Y |X])2 |X + E 2(E[Y |X] − g(X))E [Y − E[Y |X]|X] + E (E[Y |X] − g(X))2 |X

= E [Var(Y |X )] + E (E[Y |X] − g(X))2 .

Taking g(X) = E[Y |X] minimizes the above equality to the expectation of the conditional
variance of Y given X:
E (Y − E[Y |X])2 = E [Var(Y |X )] .

Thus, g(X) = E[Y |X] is the best predictor of Y .

Going back to out projection matrix . . .

Using the OLS definition of β,

b the best predictor (i.e. the conditional expectation) maps the
values of Y to the values of Y
b via the projection matrix H.

The projection matrix can be utilized when calculating leverage scores and Cook’s distance,
which are used to identify influential observations.
Leverage Score of Observations
Leverage measures how far away an observation of a predictor variable, X, is from the mean of
the predictor variable.
For the linear regression model, the leverage score for the i-th observation is defined as the i-th
−1 >
diagonal element of the projection matrix H = X X> X X , which is equivalent to taking a
partial derivative of Yi with respect to Yi :
b

∂Y
bi
hii = = (H)ii
∂Yi

Defining the leverage score via the partial derivative allows us to interpret the leverage score as
the observation self-influence, which describes how the actual value, Yi , influences the fitted
value, Y
bi .
The leverage score hii is bounded:

0 ≤ hii ≤ 1

Proof.
Noting that H is symmetric and the fact that it is an idempotent matrix:
−1 > −1 > −1 >
H2 = HH = X X> X X X X> X X = XI X> X X =H

we can examine the diagonal elements of the equality H2 = H to get the following bounds of Hii :
X
hii = hii2 + hij2 ≥ 0
i6=j

hii ≥ hii2 =⇒ hii ≤ 1

We can also relate the residuals to the leverage score:

ε=Y−Y
b b = (I − H) Y

Examining the variance-covariance matrix of the regression errors we see that:

> >
ε) = Var((I − H) Y) = (I − H) Var(Y) (I − H) = σ 2 (I − H) (I − H) = σ 2 (I − H) ,
Var(b

where we have used the fact that (I − H) is idempotent and Var(Y) = σ 2 I.

Since the diagonal elements of the variance-covariance matrix are the variances of each
observation, we have that Var(b i ) = (1 − hii )σ 2 .
Thus, we can see that a leverage score of hii ≈ 0 would indicate that the i-th observation has no
influence on the error variance, which would mean that its variance close to the true
(unobserved) variance σ 2 .
Observations with leverage score values larger than 2(k + 1)/N are considered to be
potentially highly influential.
Assume that we estimate the model via OLS:
mdl_1_fit <- lm(y ~ 1 + x)

Studentized Residuals
The studentized residuals are related to the standardized residuals, as they are defined as:

bi
ti = √
b 1 − hii
σ

The main distinction comes from the calculation of σ

b, which can be calculated in two ways:
I Standardized residuals calculate the internally studentized residual variance estimate:

N
1 X
b2 =
σ 2
N − (k + 1) j=1 j
b

I If we suspect that the i-th residual of being improbably large (i.e. it cannot be from the
same normal distribution as the remaining of the residuals) - we exclude it from variance
estimation by calculating the externally studentized residual variance estimate:
N
2 1 X
σ = 2
N − (k + 1) − 1 j=1 j
b(i) b
j6=i
If the residuals are independent and ε ∼ N (0, σ 2 I), then the distribution of the studentized
residuals depends on the calculation of the variance estimate:
I If the residuals are internally studentized - they have a tau distribution:
√
v · tv −1
ti ∼ q , where v = N − (k + 1)
2
tv −1 + v − 1

I If the residuals are externally studentized - they have a Student’s t-distribution (we will
also refer to them as ti(i) ):
ti = ti(i) ∼ t(N−(k−1)−1)
Observations with studentized residual values larger than 3 in *absolute* value could
be considered outliers.
We can plot the studentized and standardized residuals:
olsrr::ols_plot_resid_stud(mdl_1_fit)

Studentized Residuals Plot

Threshold: abs(3)
0
Deleted Studentized Residuals

−50

Observation
−100 normal
outlier

−150

−200 100

0 25 50 75 100
Observation
olsrr::ols_plot_resid_stand(mdl_1_fit)

Standardized Residuals Chart

2.5
Threshold: abs(2)

0.0
Standardized Residuals

−2.5

−5.0

−7.5

−10.0 100

0 25 50 75 100
Observation
We can examine the same plots on the model, with the outlier observation removed from the
data:
olsrr::ols_plot_resid_stud(lm(y[-N] ~ 1 + x[-N]))

Studentized Residuals Plot

Threshold: abs(3)

2
Deleted Studentized Residuals

0
Observation
normal

−2

−4

0 25 50 75 100
Observation
olsrr::ols_plot_resid_stand(lm(y[-N] ~ 1 + x[-N]))

Standardized Residuals Chart

64 Threshold: abs(2)

49 74
39 96
2
Standardized Residuals

−2

0 25 50 75 100
Observation
While the studentized residuals appear to have no outliers, the standardized residuals indicate
that a few observations may be influential. Since we have simulated the data, we know that our
data contained only one outlier. Consequently, we should not treat all observations outside the
threshold as definite outliers.
We may also be interested in plotting the studentized residuals against the leverage points:
olsrr::ols_plot_resid_lev(mdl_1_fit)

Outlier and Leverage Diagnostics for y

657 16 26 70 184497 72
Threshold: 0.04
0

−50

Observation
RStudent

normal
−100
leverage
outlier

−150

100
−200

0.02 0.04 0.06 0.08

Leverage
olsrr::ols_plot_resid_lev(lm(y[-N] ~ 1 + x[-N]))

Outlier and Leverage Diagnostics for y[−N]

Threshold: 0.04

5.0

49 74
2.5 39 96
Observation
RStudent

97 normal
57 16 70
6 72 leverage
0.0
26 18 outlier
44
35

−2.5

−5.0

0.02 0.04 0.06 0.08

Leverage
This plot combined the leverage score, which shows influential explanatory variable
observations, and the studentized residual plot, which shows outlier residuals of the difference
between the actual and fitted dependent variables.
Influential observations
Influential observations are defined as observations, which have a large effect on the results of
a regression.
DFBETAS
The DFBETAi vector measures how much an observation i has effected the estimate of a
regression coefficient vector . It measures the difference between the regression coefficients,
calculated for all of the data, and the regression coefficients, calculated with the observation i
deleted:
b −β
β b
(i)
DFBETAi = q
σ 2 diag ((X> X)−1 )
b(i)
√
Observations with a DFBETA value larger than 2/ N in absolute value should be
carefully inspected.
The recommended general cutoff (absolute) value is 2.
We can calculate the appropriate DFBETAS for the last 5 observations as follows:
dfbetas_manual <- NULL
for(i in (N-4):N){
mdl_2_fit <- lm(y[-i] ~ 1 + x[-i])
numerator <- mdl_1_fit$coef - mdl_2_fit$coef
denominator<- sqrt((summary(mdl_2_fit)$sigma^2) * diag(solve(t(cbind(1, x)) %*% cbind(1, x))))
dfbetas_manual <- rbind(dfbetas_manual, numerator / denominator)
}
print(dfbetas_manual)

## (Intercept) x
## [1,] 0.028743821 -0.022789554
## [2,] 0.030744687 -0.034844559
## [3,] 0.020403791 -0.024298429
## [4,] 0.006702931 -0.004242548
## [5,] -29.230784828 25.362876769
While these calculations are a bit more involved, we can use the built-in functions as well:
print(tail(dfbetas(mdl_1_fit), 5))

## (Intercept) x
## 96 0.028743821 -0.022789554
## 97 0.030744687 -0.034844559
## 98 0.020403791 -0.024298429
## 99 0.006702931 -0.004242548
## 100 -29.230784828 25.362876769
If we wanted, we could also plot these values:
olsrr::ols_plot_dfbetas(mdl_1_fit)

page 1 of 1
Influence Diagnostics for (Intercept)
0
Threshold: 0.2
DFBETAS

−10

−20

100
−30
0 25 50 75 100
Observation

Influence Diagnostics for x

100
Threshold: 0.2
20
DFBETAS

0
0 25 50 75 100
Observation
If we were to remove the last observation and examine the DFBETAS plot:
olsrr::ols_plot_dfbetas(lm(y[-N] ~ 1 + x[-N]))

page 1 of 1
Influence Diagnostics for (Intercept)
64
Threshold: 0.2
0.4
DFBETAS

44
74
0.2 25 96

0.0

−0.2
8 43

0 25 50 75 100
Observation

Influence Diagnostics for x[−N]

8
0.25 43
Threshold: 0.2 97
DFBETAS

0.00

74
−0.25
44

0 25 50 75 100
Observation
We see that there are some observations, which may be worth examining. In this case, we know
that there are no more outliers because we have simulated the data ourselves. So this is a good
example that you should not blindly trust the above charts, as the influential observations are
not necessarily outliers.
DFFITS
DFFITS measures how much an observation i has effected the fitted value of a regression. It is
defined as a Studentized difference between the fitted values from a regression, estimated on all
of the data, and the fitted values from a regression, estimated on the data with observation i
deleted: r
Ybi − Ybi(i) hii
DFFITSi = 2 √ = ti(i)
σb(i) hii 1 − hii
where ti(i) is the externally studentized residual.
tmp_val <- dffits(mdl_1_fit)
print(format(tail(cbind(tmp_val), 10), scientific = FALSE))
## tmp_val
## 91 " -0.0005235787"
## 92 " 0.0031760359"
## 93 " 0.0091761236"
## 94 " 0.0199891169"
## 95 " -0.0226748095"
## 96 " 0.0376495768"
## 97 " -0.0379725909"
## 98 " -0.0287153104"
## 99 " 0.0125545090"
## 100 "-32.6910440413" p
Observations with a DFFITS value larger than 2 (k + 1)/N in absolute value
olsrr::ols_plot_dffits(mdl_1_fit)

Influence Diagnostics for y

0 Threshold: 0.28

−10
DFFITS

−20

−30

100

0 25 50 75 100
Observation
olsrr::ols_plot_dffits(lm(y[-N] ~ 1 + x[-N]))

Influence Diagnostics for y[−N]

0.6 64 Threshold: 0.28

0.4

49 74

0.2
DFFITS

0.0

−0.2

43
8
44
−0.4
0 25 50 75 100
Observation
Similarly to what we have observed with DFBETAS - we should not blindly trust that each value
outside the cutoff region is an outlier. Instead, we should treat them as influential observations,
which need additional analysis to determine whether they are acceptable.
Cook’s distance
Cook’s D measures the aggregate impact of each observation on the group of regression
coefficients, as well as the group of fitted values. It can be used to:
I indicate influential data points (i.e. potential outliers);
I indicate regions, where more observations would be needed;
Cook’s distance for observation i is defined as:
PN b b 2
j=1 (Yj − Yj(i) ) 2i

b hii
Di = =
(k + 1)bσ2 (k + 1)b σ 2 (1 − hii )2

where:
I Y bj(i) is the fitted value of Yj , obtained by excluding the i-th observation and re-estimating
the same model via OLS.
ε> bε
I σb2 =
b
is the mean squared error of the error term.
N − (k + 1)
Note: in practical terms, it may be easier to use the leverage score expression of Di instead of
re-estimating the model for each observation case.
tmp_val <- cooks.distance(mdl_1_fit)
print(format(tail(cbind(tmp_val), 10), scientific = FALSE))

## tmp_val
## 91 "0.0000001384804"
## 92 "0.0000050955563"
## 93 "0.0000425310902"
## 94 "0.0002017917033"
## 95 "0.0002596785491"
## 96 "0.0007154000322"
## 97 "0.0007282312180"
## 98 "0.0004164378945"
## 99 "0.0000796089714"
## 100 "1.2596831219103"
Cook’s distance values, which are:
I larger than 4/N (the traditional cut-off);
1 PN
I larger than 3 × Di
N i=1
could be considered highly influential.
We can plot the Di points:
olsrr::ols_plot_cooksd_bar(mdl_1_fit)

Cook's D Bar Plot

Threshold: 0.04
100

1.0

Observation
Cook's D

normal
outlier
0.5

0.0

0 25 50 75 100
Observation
olsrr::ols_plot_cooksd_chart(mdl_1_fit)

Cook's D Chart
100
Threshold: 0.04

1.0
Cook's D

0.5

0.0

0 25 50 75 100
Observation
As well as plot the Di on the data without the outlier observation:
olsrr::ols_plot_cooksd_bar(lm(y[-N] ~ 1 + x[-N]))

Cook's D Bar Plot

64
Threshold: 0.04
0.15

0.10

Observation
Cook's D

normal
outlier
44

8
0.05 74
43
49

0.00

0 25 50 75 100
Observation
olsrr::ols_plot_cooksd_chart(lm(y[-N] ~ 1 + x[-N]))

Cook's D Chart
64
Threshold: 0.04
0.15

0.10
Cook's D

44
8
43 74
0.05 49

0.00

0 25 50 75 100
Observation

We again see a similar result, as with DFBETAS and DFFITS.

Also note that R has a lot of different plots for the default lm model output:
par(mfrow = c(3, 2), mar = c(2, 2, 2, 2))
for(i in 1:6){
plot(mdl_1_fit, which = i)
}

Residuals vs Fitted Normal Q−Q

Standardized residuals
72 64 64 72
−20

−2
−60

−6
−100

−10
100
100

20 30 40 50 60 −2 −1 0 1 2

Scale−Location
Fitted values Cook's distance
Theoretical Quantiles
100
3.0

100

1.2
Cook's distance
2.0

0.8
1.0

0.4
72 64
18 72
0.0

0.0
20 30 40 50 60 0 20 40 60 80 100

Residuals
Fittedvs Leverage
values
Cook's dist vsObs.
Leverage
number
hii (1 − hii)
10 100 8 6

1.2
18 72

Cook's distance
−2

0.8
0.5
1
4
−6

0.4 2
−10

Cook's distance 100

0.0
18 72 0

0.00 0.02 0.04 0.06 0.08 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08
par(mfrow = c(3, 2), mar = c(2, 2, 2, 2))
for(i in 1:6){
plot(lm(y[-N] ~ 1 + x[-N]), which = i)
}

Residuals vs Fitted Normal Q−Q

Standardized residuals

0 1 2 3 4
64 64

49
1.0

74 74 49
0.0
−1.0

−2
20 30 40 50 60 −2 −1 0 1 2

Scale−Location
Fitted values Cook's distance
Theoretical Quantiles
64 64
1.5

Cook's distance
74 49

0.10
1.0

44
8
0.5

0.00
0.0

20 30 40 50 60 0 20 40 60 80 100

Residuals
Fittedvs Leverage
values
Cook's dist vsObs.
Leverage
number
hii (1 − hii)

0.00 0.05 0.10 0.15

0 1 2 3 4

64 3.5 64 3 2.5 2
0.5

Cook's distance
1.5

44
8
1
44
Cook's distance
−2

8 0.5
0

0.00 0.02 0.04 0.06 0.08 0 0.02 0.04 0.06 0.08

Addressing Outliers
After determining that a specific observation is indeed an outlier, we want to address them in
some way.
Capping the Outliers
If we find that the explanatory variables X1,i , ..., Xk,i of an outlier variable Yi are similar to other
observations, with non-outlier values of Yi , we may cap the value of the outlier, to match the
values.

Replacing Outliers with Imputed Values

If we are certain that the outlier is due to some error in the data itself - we could try to impute
the observations by treating them as missing values and substituting them for some average
value of Y .
The Expectation-maximization (EM) algorithm could be utilized for missing data imputation.

Deleting Outliers
In some cases, if we are absolutely sure that the observation is an outlier, which is either
completely unlikely, or impossible to encounter again, we could drop it.

Robust Regression
In addition to the methods mentioned before, we could also run a Robust regression.
In our example, we know that the last observation was differently generated, and is thus an
outlier, which we can delete.
We can compare how would our model look like with the whole dataset, and if we were to drop
the outlier observation:
plot(x, y)
lines(x, mdl_1_fit$fitted.values, col = "red")
lines(x[-N], lm(y[-N] ~ 1 + x[-N])$fitted.values, col = "blue")
points(x[N], y[N], pch = 19, col = "red")
legend("topleft", lty = 1, col = c("red", "blue"), legend = c("with outlier", "deleted outlier"))
60

with outlier
deleted outlier
40
20
0
y

−20
−60

Advanced Econometrics PDF
No ratings yet
Advanced Econometrics PDF
58 pages
Econometrics
No ratings yet
Econometrics
13 pages
BA501 Week5 Linear Regression
No ratings yet
BA501 Week5 Linear Regression
45 pages
Lecture 22: Review For Exam 2 1 Basic Model Assumptions (Without Gaussian Noise)
No ratings yet
Lecture 22: Review For Exam 2 1 Basic Model Assumptions (Without Gaussian Noise)
7 pages
Matrix OLS NYU Notes
No ratings yet
Matrix OLS NYU Notes
14 pages
Multiple Linear Reegression
No ratings yet
Multiple Linear Reegression
21 pages
Sta 3
No ratings yet
Sta 3
9 pages
Stat 353 Study Guide
No ratings yet
Stat 353 Study Guide
44 pages
Oulier in R
No ratings yet
Oulier in R
8 pages
Data Science Interview Preparation
100% (1)
Data Science Interview Preparation
113 pages
Chapter 3
No ratings yet
Chapter 3
22 pages
Lecture 1a
No ratings yet
Lecture 1a
17 pages
Chapter 11
No ratings yet
Chapter 11
10 pages
Chapter 6: Regression
No ratings yet
Chapter 6: Regression
7 pages
7772 LectureNotes
No ratings yet
7772 LectureNotes
120 pages
Multiple Linear Regression
No ratings yet
Multiple Linear Regression
18 pages
Gary Chamberlain Econometric S
No ratings yet
Gary Chamberlain Econometric S
152 pages
330 Lect11
No ratings yet
330 Lect11
35 pages
330 Lecture11 2014
No ratings yet
330 Lecture11 2014
61 pages
Econometric Theory: Module - Ii
No ratings yet
Econometric Theory: Module - Ii
11 pages
EC501 Lecture 02
No ratings yet
EC501 Lecture 02
27 pages
Chapter 2 Regression Analysis Notes
No ratings yet
Chapter 2 Regression Analysis Notes
11 pages
Regression Basics in Matrix Terms: 1 The Normal Equations of Least Squares
No ratings yet
Regression Basics in Matrix Terms: 1 The Normal Equations of Least Squares
3 pages
Econometrics I Lecture 4 Wooldridge
No ratings yet
Econometrics I Lecture 4 Wooldridge
33 pages
Lesson 3 Overview Problems and Outliers
No ratings yet
Lesson 3 Overview Problems and Outliers
31 pages
Mungadze Linear
No ratings yet
Mungadze Linear
21 pages
Machine Learning Lecture Notes Undergrad
No ratings yet
Machine Learning Lecture Notes Undergrad
19 pages
4 - Multiple Linear Regressions
No ratings yet
4 - Multiple Linear Regressions
61 pages
Some Methods of Detection of Outliers in Linear Regression Model-Ranjit PDF
No ratings yet
Some Methods of Detection of Outliers in Linear Regression Model-Ranjit PDF
19 pages
Multiple Linear Regression Model by Jeevan Bista
No ratings yet
Multiple Linear Regression Model by Jeevan Bista
16 pages
Mult Hetero Notes Agd
No ratings yet
Mult Hetero Notes Agd
29 pages
Econometrics Module 2
No ratings yet
Econometrics Module 2
38 pages
3 Regression Diagnostics
100% (1)
3 Regression Diagnostics
53 pages
Stats101A - Chapter 3
No ratings yet
Stats101A - Chapter 3
54 pages
Unit - 1
No ratings yet
Unit - 1
8 pages
Chapter Three
No ratings yet
Chapter Three
35 pages
Linear Regression
No ratings yet
Linear Regression
108 pages
Session CLRM Review 1
No ratings yet
Session CLRM Review 1
47 pages
Chapter 1
No ratings yet
Chapter 1
17 pages
Wooldridge Notes
No ratings yet
Wooldridge Notes
15 pages
Multiple Regression
No ratings yet
Multiple Regression
22 pages
Linear Regression Assumptions and Limitations
No ratings yet
Linear Regression Assumptions and Limitations
10 pages
03 Assumptions and Gauss Markov
No ratings yet
03 Assumptions and Gauss Markov
5 pages
Econometrics Lecture4 MultipleRegression
No ratings yet
Econometrics Lecture4 MultipleRegression
40 pages
Bias-Variance Tradeoffs: 1 Single Sample MLE
No ratings yet
Bias-Variance Tradeoffs: 1 Single Sample MLE
7 pages
STAT22209 - Chapter 03-Multiple Regression - 2022
No ratings yet
STAT22209 - Chapter 03-Multiple Regression - 2022
41 pages
Multicollinearity and Endogeneity PDF
No ratings yet
Multicollinearity and Endogeneity PDF
37 pages
Classical LinearReg 000
No ratings yet
Classical LinearReg 000
41 pages
Chap 7
No ratings yet
Chap 7
7 pages
Linear Regression
No ratings yet
Linear Regression
31 pages
Lecture-4 2
No ratings yet
Lecture-4 2
50 pages
Multiple Linear Regression in Data Mining
100% (1)
Multiple Linear Regression in Data Mining
14 pages
LR Assumptions - 05
No ratings yet
LR Assumptions - 05
12 pages
CH 2
No ratings yet
CH 2
31 pages
Notes 2
No ratings yet
Notes 2
16 pages
3 The Basic Linear Model Finite Sample Results
No ratings yet
3 The Basic Linear Model Finite Sample Results
9 pages
7 OLS Assumptions
No ratings yet
7 OLS Assumptions
37 pages
Machine Learning and Linear Regression
100% (1)
Machine Learning and Linear Regression
55 pages
Worked Examples in Mathematics for Scientists and Engineers
From Everand
Worked Examples in Mathematics for Scientists and Engineers
G. Stephenson
No ratings yet
A-level Maths Revision: Cheeky Revision Shortcuts
From Everand
A-level Maths Revision: Cheeky Revision Shortcuts
Scool Revision
3.5/5 (8)
230127 (최종보고서) 변화하는+미디어환경의+대응을+위한+공영방송의+재구조화+방안+연구 송부용 인쇄본
No ratings yet
230127 (최종보고서) 변화하는+미디어환경의+대응을+위한+공영방송의+재구조화+방안+연구 송부용 인쇄본
158 pages
Uni Variate Regression
No ratings yet
Uni Variate Regression
61 pages
MultivariableRegression 1
No ratings yet
MultivariableRegression 1
30 pages
Bai Luyen Tap Cho Hoc Sinh Mon Tieng Anh Lop 11 Bai So 9 1
No ratings yet
Bai Luyen Tap Cho Hoc Sinh Mon Tieng Anh Lop 11 Bai So 9 1
3 pages
Not Many Young People Living in Rural Areas Have Access To A University Education
No ratings yet
Not Many Young People Living in Rural Areas Have Access To A University Education
1 page
(Dethihsg247.Com) Tuyen Tap 20 Nam de Thi Olympic 30 4 Tieng Anh 11
100% (1)
(Dethihsg247.Com) Tuyen Tap 20 Nam de Thi Olympic 30 4 Tieng Anh 11
424 pages
The Namesake Practice 3
No ratings yet
The Namesake Practice 3
2 pages
Class Lecture - Energy
No ratings yet
Class Lecture - Energy
42 pages
Answers To Ch-1 (Cash Budget)
No ratings yet
Answers To Ch-1 (Cash Budget)
5 pages
Grading 2014-2015
No ratings yet
Grading 2014-2015
21 pages
Shcola
No ratings yet
Shcola
16 pages
1817-18 Gas Lighting in London
No ratings yet
1817-18 Gas Lighting in London
1 page
Sociolinguistics - Politeness Strategy Used by Judges and Contestant in Indonesian Idol
No ratings yet
Sociolinguistics - Politeness Strategy Used by Judges and Contestant in Indonesian Idol
16 pages
An Examination of Audit Delay Further Evidence From New Zealand
No ratings yet
An Examination of Audit Delay Further Evidence From New Zealand
13 pages
3.BP Travel - Create Quotes - Functional Requirements Questionnaire (FRQ)
No ratings yet
3.BP Travel - Create Quotes - Functional Requirements Questionnaire (FRQ)
11 pages
About The Handout: Professional Statement
No ratings yet
About The Handout: Professional Statement
21 pages
Internship Report
No ratings yet
Internship Report
11 pages
First Video: How Your Brain Predictions Interfere With What You See - Georg Keller
No ratings yet
First Video: How Your Brain Predictions Interfere With What You See - Georg Keller
2 pages
GK For
No ratings yet
GK For
171 pages
Life Cycle of Angiosperms
No ratings yet
Life Cycle of Angiosperms
5 pages
Asme Section Ii A-2 Sa-815 Sa-815m
No ratings yet
Asme Section Ii A-2 Sa-815 Sa-815m
10 pages
Operation Manual WBL-100/101/200: 3, Hagavish St. Israel 58817 Tel: 972 3 5595252, Fax: 972 3 5594529
No ratings yet
Operation Manual WBL-100/101/200: 3, Hagavish St. Israel 58817 Tel: 972 3 5595252, Fax: 972 3 5594529
5 pages
Cclues
No ratings yet
Cclues
66 pages
Accessories: Section Pages Section Pages Section Pages
No ratings yet
Accessories: Section Pages Section Pages Section Pages
48 pages
Strengthening Community Pharmacies Role in Early Tuberculosis Case Detection and Referrals in Jinja Municipality, Uganda - End of Project Report
No ratings yet
Strengthening Community Pharmacies Role in Early Tuberculosis Case Detection and Referrals in Jinja Municipality, Uganda - End of Project Report
37 pages
1997 SCMR 876
No ratings yet
1997 SCMR 876
3 pages
D T Solar Ovens Lesson Plan
No ratings yet
D T Solar Ovens Lesson Plan
6 pages
Writing and Academic Essay
No ratings yet
Writing and Academic Essay
4 pages
Unsolved Case Files Who Whacked Jack 01
No ratings yet
Unsolved Case Files Who Whacked Jack 01
11 pages
Model Question Paper Grade 9
No ratings yet
Model Question Paper Grade 9
12 pages
Unix and Shell Programming
No ratings yet
Unix and Shell Programming
19 pages
As 1789
No ratings yet
As 1789
2 pages
Curriculum Development
67% (3)
Curriculum Development
65 pages
(Blackwell Ancient Religions) Rives, James B - Religion in The Roman Empire (2007, Blackwell Pub) - Libgen - Li
No ratings yet
(Blackwell Ancient Religions) Rives, James B - Religion in The Roman Empire (2007, Blackwell Pub) - Libgen - Li
319 pages
SFBT Course Handbook
100% (1)
SFBT Course Handbook
19 pages
The Myth of Scientific Miracles in The Koran
40% (5)
The Myth of Scientific Miracles in The Koran
13 pages

MultivariableRegression 6

Uploaded by

MultivariableRegression 6

Uploaded by

PE I: Multivariable Regression

Andrius Buteikis, [email protected]

(MR.2): Strict Exogeneity Conditional expectation of ε, given all observations of the

This assumption also implies that E(ε) = E (E(ε|X)) = 0, E(εX) = 0

(MR.4): Conditionally Uncorrelated Errors The covariance between different error

Cov (i , j |X) = 0, i 6= j (MR.4)

c1 Xi1 + c2 Xi2 + ... + ck Xik = 0, ∀i = 1, ..., N ⇐⇒ c1 = c2 = ... = ck = 0 (MR.5)

This assumption is violated if there exists some cj 6= 0.

or, alternatively, that:  

This assumption is important, because a linear relationship between independent variables

outliers > non − linearity > heteroscedasticity > non − normality

Then, assume that we estimate β

E (Y − g(X))2 = E (Y + E[Y |X] − E[Y |X] − g(X))2

= E [Var(Y |X )] + E (E[Y |X] − g(X))2 .

Thus, g(X) = E[Y |X] is the best predictor of Y .

Using the OLS definition of β,

hii ≥ hii2 =⇒ hii ≤ 1

Examining the variance-covariance matrix of the regression errors we see that:

where we have used the fact that (I − H) is idempotent and Var(Y) = σ 2 I.

The main distinction comes from the calculation of σ

Studentized Residuals Plot

Standardized Residuals Chart

Studentized Residuals Plot

Standardized Residuals Chart

Outlier and Leverage Diagnostics for y

0.02 0.04 0.06 0.08

Outlier and Leverage Diagnostics for y[−N]

0.02 0.04 0.06 0.08

Influence Diagnostics for x

Influence Diagnostics for x[−N]

Influence Diagnostics for y

Influence Diagnostics for y[−N]

Cook's D Bar Plot

Cook's D Bar Plot

We again see a similar result, as with DFBETAS and DFFITS.

Residuals vs Fitted Normal Q−Q

Cook's distance 100

Residuals vs Fitted Normal Q−Q

0.00 0.05 0.10 0.15

0.00 0.02 0.04 0.06 0.08 0 0.02 0.04 0.06 0.08

Replacing Outliers with Imputed Values

You might also like

Cov (i , j |X) = 0, i 6= j (MR.4)

or, alternatively, that: