Stat 473-573 Notes


Class Notes STAT 473-573

[email protected]
Office: 361 F
815-753-6829
1. The prerequisites for this course are STAT 350, STAT 301 and
MATH 211.
2. The prescribed text for this course is APPLIED LINEAR
STATISTICAL MODELS, 5th EDITION, by Kutner et al.
3. For this course we plan on covering chapters 1, 2, 3, 4, 6, 7,
9, 10 and 11. However, if time permits, we may start chapter 8
or 12.
4. The first midterm exam will be on Sep 28 and the second one
will be on Nov 18.
5. Reading Exercise: Please go through Sections 1.1, 1.2, 1.3,
1.4, 1.5 on your own.
A brief Introduction

Experiment

Sample Space (Set of all possible outcomes)

Events (Subsets of Sample Space)

Probability

Random Variable Univariate and Bivariate/Multivariate

Distribution of a Random Variable

Discrete and Continuous Random Variables

The Distribution of a Random Variable is characterized by a set of
Parameters

Parameters and Parametric models

Normal, Binomial, Uniform, Poisson, ...

What if the parameter is unknown?

Inference: We infer the parameter's value from the data at
hand.

Estimation and Hypothesis Testing.

Estimation: Two kinds, Point Estimation and Interval
Estimation.
1. The title of this course is STATISTICAL METHODS AND
MODELS.
2. We will use STATISTICAL METHODS to construct a
STATISTICAL MODEL. (How is it different from a
MATHEMATICAL MODEL?)
3. What is a MODEL: We intend to reconstruct a real-life
phenomenon using statistical/mathematical tools and
techniques.
4. MODEL is an approximation of the phenomenon we intend
to study.
5. For this course we shall keep ourselves confined only to
LINEAR MODELS. (How is it different from NONLINEAR?)
How does modeling a phenomenon help
1. Prediction.
2. Mechanism by which the data is produced.
We need DATA to model.
Refrigeration Equipment Data

An object is produced in lots of varying sizes

What is the optimum lot size for producing this part?

This question can be answered only if we can successfully study the
relationship between lot size and labor hours required to
produce the lot.

So the question is: how can we successfully study the
relationship between the lot size and the labor hours required
to produce the lot?

The first thing we need to build a statistical model is DATA.


Data Description

Number of variables. Usually denoted by p. Here p = 2.

Number of observations. Usually denoted by n. Here n = 25.

The observations are independent (VERY IMPORTANT). The
observations do not influence each other.

Variable names: Lot Size and Work Hours.

Problem: Study how the Lot Size explains the Work Hours.

That is, study the change in Work Hours when the Lot Size
changes.

Dependent Variable: Work Hours

Independent/Explanatory Variable/Covariate: Lot Size.

One dependent and one independent variable. (Can we have
more than one independent variable? Can we have more than
one dependent variable?)

Most people use statistics the way a drunk uses a lamp post.
More for support than illumination.

Let the DATA speak for itself.

Exploratory Data Analysis.

Is there a quick or heuristic way of determining if a relationship
exists between the two variables? And if yes, what is its nature?

Yes. Scatter plot. Dependent variable on the Y axis and the independent
variable on the X axis.

We can roughly expect 3 kinds of trends.

Linear trend

NonLinear trend

No trend
Linear Trend: [scatter plots of y against x showing a linear trend]

Nonlinear Trend: [scatter plots of y against x showing a nonlinear trend]

No Trend: [scatter plot of y against x showing no trend]

Now based on the scatter plot we need to model the data.

Since we shall be working only with linear models, we would
be interested in those data sets where the scatter plot shows a
linear trend.

If the scatter plot does not show a linear trend, does it mean that
we can never use a linear model there?
Scatter Plot: Refrigeration Equipment Data
How do we construct the model

It would be best if our model could pass through each
observation? (Would it really be best?)

But since we want to fit a straight line it won't be possible. So
what do we do?

We could either say

$Y = \beta_0 + \beta_1 X$, which would be a deterministic model, which
would not be able to explain the scattering, or

$Y = \beta_0 + \beta_1 X + \varepsilon$, where $\varepsilon$ is a random variable.
Why the epsilon?

As statisticians we choose to model these deviations as
observations from a random variable.

Our model is
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ for $i = 1, \ldots, n$

Here $Y_i$ is the $i$th observation of the response variable.

$\beta_0$ and $\beta_1$ are the model parameters.

The $X_i$'s are known constants, i.e., $X_i$ is the $i$th value of the predictor
variable.

The $\varepsilon_i$ are i.i.d. $N(0, \sigma^2)$. Is $\sigma^2$ known?
The fitted model

So we want to draw a straight line on the scatter plot. And we
want to do it in the best possible way.

What is best?

We want to draw the straight line which minimizes the sum of
squares of the distances between the observed and the fitted values.

The sum of squares of the distances between the observed and the
fitted values is:
$\sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2$
The fitted model

In the last slide we said: ... the straight line which minimizes the
sum of squares of the distances between the observed and the fitted values.

A straight line has two parameters: slope $\beta_1$ and intercept $\beta_0$.

How do we choose the slope $\beta_1$ and the intercept $\beta_0$?

As stated before, we choose the $(\beta_0, \beta_1)$ which minimizes
$\sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2$

Calculus: maxima/minima in two variables, page 17.


Least Squares Estimated Values

$b_1 = \dfrac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n}(X_i - \bar{X})^2}$

$b_0 = \bar{Y} - b_1 \bar{X}$
Now that we have the estimated values of the parameters, we use the
data to compute them and plot the fitted regression line on the scatter
plot. (See reg2.png)
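A minimal R sketch of this computation (the data frame name toluca and its columns lot and hours are placeholders for whatever the class data set is actually called):

    # least squares estimates computed from the formulas above
    x <- toluca$lot; y <- toluca$hours
    b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
    b0 <- mean(y) - b1 * mean(x)

    # the same fit via lm(), plus the scatter plot with the fitted line overlaid
    fit <- lm(hours ~ lot, data = toluca)
    plot(hours ~ lot, data = toluca)
    abline(fit)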
The Gauss-Markov Theorem

Do these LSEs have any special statistical properties? Yes:

Unbiased

Minimum variance among all unbiased linear estimators

BLUE (Best Linear Unbiased Estimator), not blue :(((((


Properties of the fitted model

What is the fitted regression equation here?

$\widehat{hr} = 62.366 + 3.5702\, lot$

There are six properties in all (see pages 23-24). We shall
check to see if these properties hold for our data set.
Interpretation of $\beta_0$ and $\beta_1$

Our model is as follows:
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ for all $i = 1, \ldots, n$.

When $X_i = 0$ we find $E(Y_i) = \beta_0$. So $\beta_0$ is the population mean
of the observed variable Y when X = 0.

$\beta_1$ is the change in the population mean value of the observed
variable when the independent variable undergoes a unit
change.
Recall that $(b_0, b_1)$ are the estimates of $(\beta_0, \beta_1)$. So $b_0 + b_1 X$ is
the estimate of the mean response of the variable Y when the
value of X is fixed at a certain level.

What are fitted values? The fitted value of the $i$th observation
is
$\hat{Y}_i = b_0 + b_1 X_i$

The residual for the $i$th observation is $e_i = Y_i - \hat{Y}_i$

An example: page 23.

What is $E(Y_i)$? What is $E(Y_i - \hat{Y}_i)$?

Error Sum of Squares or Residual Sum of Squares:
$SSE = \sum_{i=1}^{n} e_i^2$
Estimation of $\sigma^2$

Suppose I have a random sample of size n drawn from a
population whose mean is $\mu$ and variance is $\sigma^2$.

Suggest an unbiased estimator for $\sigma^2$.

What were the two main assumptions required to answer the
above question?

Now consider our $Y_i$'s for all $i = 1, \ldots, n$. What is the
distribution of, say, $Y_5$?

Does this random sample of size n satisfy the above two
assumptions? Which ones does it violate?

So what would be the estimator of $\sigma^2$ here?

$\hat{\sigma}^2 = MSE = \dfrac{SSE}{n-2}$

$E(MSE) = \sigma^2$

$\sqrt{MSE} = \sqrt{\dfrac{SSE}{n-2}}$
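A short continuation of the earlier hypothetical R sketch, checking these quantities numerically:

    e   <- residuals(fit)          # e_i = Y_i - Yhat_i
    SSE <- sum(e^2)                # error (residual) sum of squares
    MSE <- SSE / (length(e) - 2)   # estimate of sigma^2
    sqrt(MSE)                      # residual standard error reported by summary(fit)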
Chapter 1: Important Terms and Concepts:
1. explanatory variable/covariate/independent variable
2. response variable/observations/dependent variable
3. scatter plot
4. linear model
5. simple linear regression
6. slope $\beta_1$, its estimate $b_1$ and its interpretation
7. intercept $\beta_0$, its estimate and its interpretation
8. method of least squares
9. fitted model and its properties (pages 23-24). Fitted value.
10. Gauss-Markov Theorem
11. residual
12. error sum of squares or residual sum of squares
13. error mean square or residual mean square
14. estimate of $\sigma^2$
The section in Chapter 1 which will not be included in the syllabus
is 1.8.
Chapter 2
[Figure: scatter plot of y against x]

From the scatter plot above we know a linear model is
appropriate.

But we are not sure if $\beta_1$ should be non-zero.

What do we do now? That is, which model should we fit:
with or without $\beta_1$?

$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
OR
$Y_i = \beta_0 + \varepsilon_i$

By fitting a model we mean estimating its parameters. In our
case we estimate $(\beta_0, \beta_1)$ or $(\beta_0, \beta_1, \sigma^2)$ accordingly as $\sigma^2$ is
known or not.

Now if we cannot infer much from the scatter plot and we
want to use some analytical tools, then one way to see if a
linear relationship exists at all would be to test
$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$
Hypothesis Testing

We need an $H_0$ and an $H_1$.

T: Test statistic. (This should not contain any unknown
parameters.)

Distribution of T under $H_0$.

R: Critical Region or Rejection Region (usually in terms of
the test statistic T).

Now the computed value of T, also written as $T^*$.

Check to see if $T^*$ belongs to R.

Or compute the P-value and compare it with $\alpha$.

Our hypotheses are:
$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$

Recall the estimate for $\beta_1$ is $b_1$.

$b_1 = \dfrac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n}(X_i - \bar{X})^2} = \sum_{i=1}^{n} \dfrac{(X_i - \bar{X})}{\sum_{j=1}^{n}(X_j - \bar{X})^2}\, Y_i$

This $b_1$ shall be our test statistic.

Notice it is a linear combination of the $Y_i$'s.
More about $b_1$

$E(b_1) = ?$

$Var(b_1) = ?$

Sampling Distribution: What shall be the distribution of $b_1$?

What would be the distribution of
$TS = \dfrac{b_1 - \beta_1}{\sqrt{Var(b_1)}}$

Can TS be a test statistic? Does it contain any unknown
parameter?

Estimated variance of $b_1$, that is, the estimate of $Var(b_1)$: $s(b_1)^2$.

What is the distribution of
$\dfrac{Z}{\sqrt{\chi^2_n / n}}$
where $Z \sim N(0, 1)$ and $\chi^2_n$ are independent?

If we know $(n-2)\hat{\sigma}^2 / \sigma^2 \sim \chi^2_{n-2}$

And we also know the above random variable is independent
of $b_0$ and $b_1$

What would be the distribution of
$\dfrac{b_1 - \beta_1}{s(b_1)}$

Example 1 (what are the degrees of freedom of our test
statistic) and Example 2 (pages 46, 48).
Confidence interval for $\beta_1$

Is the t distribution symmetric?

$t(\alpha/2;\, n-2)$: denotes the $(\alpha/2) \cdot 100$ percentile of the t
distribution with $n-2$ degrees of freedom.

$t(\alpha/2;\, n-2) = -t(1-\alpha/2;\, n-2)$

Now
$P\left[\, t(\alpha/2;\, n-2) \le \dfrac{b_1 - \beta_1}{s(b_1)} \le t(1-\alpha/2;\, n-2) \,\right] = 1 - \alpha$

Thus the $100(1-\alpha)\%$ C.I. for $\beta_1$ is
$\left[\, b_1 - t(1-\alpha/2;\, n-2)\, s(b_1),\ b_1 + t(1-\alpha/2;\, n-2)\, s(b_1) \,\right]$
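In R, both the t statistic for $b_1$ and this confidence interval come straight from the fitted object; a quick sketch using the hypothetical fit from earlier:

    summary(fit)$coefficients    # estimates, standard errors s(b_k), t* and p-values
    confint(fit, level = 0.95)   # 95% C.I. for beta_0 and beta_1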
ANOVA: Analysis of Variance

Observe the data set $Y_1, \ldots, Y_n$ with n observations.

The sample variance of this data set is
$s^2 = \dfrac{1}{n-1} \sum_{i=1}^{n} (Y_i - \bar{Y})^2$

Is $s^2$ always 0? When will it be 0?

If the answer to the above question is no, then the question is:
why NOT??

Because there is variation in the data.

What causes this variation?

Recall, we had assumed that
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

What do we see? The variation is caused by:

A part explained by the $X_i$'s.

Random error $\varepsilon_i$'s.

$Y_i = \bar{Y} + Y_i - \bar{Y}$

What is $\bar{Y}$?

$Y_i - \bar{Y} = \hat{Y}_i - \bar{Y} + Y_i - \hat{Y}_i$
Thus
$(Y_i - \bar{Y})^2 = (\hat{Y}_i - \bar{Y} + Y_i - \hat{Y}_i)^2$

Which finally gives us (verify)
$\sum_{i=1}^{n}(Y_i - \bar{Y})^2 = \sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2$

So we have,
$\sum_{i=1}^{n}(Y_i - \bar{Y})^2 = \sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2$

SSR (Regression Sum of Squares) $= \sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2$

MSR = SSR/df(SSR)

SSE (Error Sum of Squares) $= \sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2$

MSE = SSE/df(SSE)

SSTO (Total Sum of Squares) $= \sum_{i=1}^{n}(Y_i - \bar{Y})^2$

SSTO = SSR + SSE


The expected values

$E(MSR) = \sigma^2 + \beta_1^2 \sum_{i=1}^{n}(X_i - \bar{X})^2$

$E(MSE) = \sigma^2$

$\dfrac{E(MSR)}{E(MSE)} = \dfrac{\sigma^2 + \beta_1^2 \sum_{i=1}^{n}(X_i - \bar{X})^2}{\sigma^2} = 1$ when $\beta_1 = 0$.

Now we know the ratio will be greater than 1 when $\beta_1$ is
not equal to 0. Thus we can use
$F = \dfrac{MSR}{MSE}$
as a test statistic for testing
$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$

What is the distribution of F under $H_0$?
F distribution

What is the distribution of $\sum_{i=1}^{n} Z_i^2$, where the $Z_i$ are i.i.d. $N(0, 1)$?

What is the distribution of
$\dfrac{Z}{\sqrt{\chi^2_n / n}}$
where Z and $\chi^2_n$ are independent?

$F = \dfrac{\chi^2_m / m}{\chi^2_n / n}$ where $\chi^2_m$ and $\chi^2_n$ are independent.

Our question was: $F \sim\, ?$

Now
$F = \dfrac{SSR / \sigma^2}{SSE / (\sigma^2 (n-2))}$

If we know that, under $H_0: \beta_1 = 0$, $\dfrac{SSR}{\sigma^2}$ and $\dfrac{SSE}{\sigma^2}$ are independent
and have $\chi^2_1$ and $\chi^2_{n-2}$ distributions, then

$F \sim F(1, n-2)$

Since F is always greater than 0, what shall be the critical
region?

Recall that
$\dfrac{E(MSR)}{E(MSE)} = \dfrac{\sigma^2 + \beta_1^2 \sum_{i=1}^{n}(X_i - \bar{X})^2}{\sigma^2} = 1$ when $\beta_1 = 0$.

Critical region: $\{F > c\}$

Although the alternative hypothesis is two-sided, the CR is one-sided.

See the example on page 71 of the textbook.

So we have established another method of testing
$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$

Are the two tests, using a T statistic and using an F statistic,
equivalent? YES

Notice that
$\{X^2 > 4\} = \{X > 2\} \cup \{X < -2\}$

Since $F = T^2$, therefore
$\{F > c\} = \{T > \sqrt{c}\} \cup \{T < -\sqrt{c}\}$

See page 71: $F(0.95;\, 1, 23) = 4.28 = (2.069)^2 = (t(0.975;\, 23))^2$
ANOVA Table

Source of variation   SS     df    MS                E(MS)    F*        P-value
Regression            SSR    1     MSR = SSR/1       E(MSR)   MSR/MSE   <0.0001*
Error                 SSE    n-2   MSE = SSE/(n-2)   E(MSE)
Total                 SSTO   n-1
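For the simple linear regression fit assumed in the earlier sketches, R produces this table and also confirms the F = T^2 relationship:

    anova(fit)                          # SS, df, MS, F* and p-value for the regression term
    summary(fit)$coefficients[2, 3]^2   # squared t statistic for b1; equals the F* above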
Inference on $\beta_0$

[Figure: scatter plot of y against x]

From the scatter plot above we know a linear model is
appropriate.

But we are not sure if $\beta_0$ should be non-zero.

What do we do now? That is, which model should we fit:
with or without $\beta_0$?

$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
OR
$Y_i = \beta_1 X_i + \varepsilon_i$

By fitting a model we mean estimating its parameters. In our
case we estimate $(\beta_0, \beta_1)$ or $(\beta_0, \beta_1, \sigma^2)$ accordingly as $\sigma^2$ is
known or not.

Now if we cannot infer much from the scatter plot and we
want to use some analytical tools, then one way to see if an
intercept is needed at all would be to test
$H_0: \beta_0 = 0$
$H_1: \beta_0 \neq 0$
Inference on $\beta_0$

Recall
$b_0 = \bar{Y} - b_1 \bar{X} = \dfrac{1}{n}\sum_{i=1}^{n} Y_i - \bar{X}\left[\sum_{i=1}^{n} k_i Y_i\right] = \sum_{i=1}^{n}\left(\dfrac{1}{n} - \bar{X} k_i\right) Y_i$
where $k_i = (X_i - \bar{X}) \big/ \sum_{j=1}^{n}(X_j - \bar{X})^2$ is the coefficient of $Y_i$ in $b_1$.

$E(b_0) = \beta_0$

$\sigma^2(b_0) = \sigma^2\left[\dfrac{1}{n} + \dfrac{\bar{X}^2}{\sum (X_i - \bar{X})^2}\right]$

Sampling Distribution: What shall be the distribution of $b_0$?
(Assuming that $\sigma^2$ is known.)

What would be the distribution of
$TS = \dfrac{b_0 - \beta_0}{\sqrt{Var(b_0)}}$

Can TS be a test statistic? Does it contain any unknown
parameter?

Estimated variance of $b_0$, that is, the estimate of $Var(b_0)$: $s(b_0)^2$.

$s(b_0)^2 = MSE\left[\dfrac{1}{n} + \dfrac{\bar{X}^2}{\sum (X_i - \bar{X})^2}\right]$

Recall that
$\dfrac{Z}{\sqrt{\chi^2_n / n}} \sim t_n$
where $Z \sim N(0, 1)$ and $\chi^2_n$ are independent.

If we know $(n-2)\hat{\sigma}^2 / \sigma^2 \sim \chi^2_{n-2}$

And we also know the above random variable is independent
of $b_0$ and $b_1$

What would be the distribution of
$\dfrac{b_0 - \beta_0}{s(b_0)}$
Confidence interval for $\beta_0$

Is the t distribution symmetric?

$t(\alpha/2;\, n-2)$: denotes the $(\alpha/2) \cdot 100$ percentile of the t
distribution with $n-2$ degrees of freedom.

$t(\alpha/2;\, n-2) = -t(1-\alpha/2;\, n-2)$

Now
$P\left[\, t(\alpha/2;\, n-2) \le \dfrac{b_0 - \beta_0}{s(b_0)} \le t(1-\alpha/2;\, n-2) \,\right] = 1 - \alpha$

Thus the $100(1-\alpha)\%$ C.I. for $\beta_0$ is
$\left[\, b_0 - t(1-\alpha/2;\, n-2)\, s(b_0),\ b_0 + t(1-\alpha/2;\, n-2)\, s(b_0) \,\right]$
Interval estimation of $E(Y_h)$

For a particular value of X, say $X = X_h$, we are interested in
the value of the corresponding $Y_h$, i.e. the dependent variable
corresponding to this value of the independent variable.

Now since it won't be possible to get the value of $Y_h$, the next
best thing would be $E(Y_h)$, and this is in a way better because
it actually gives us the mean value of all the possible
values of $Y_h$.

Recall $Y_h = \beta_0 + \beta_1 X_h + \varepsilon_h$. Thus
$E(Y_h) = \beta_0 + \beta_1 X_h$

What would be an estimator of $E(Y_h)$? Well, from what we
already have, we can say
$\hat{Y}_h = b_0 + b_1 X_h$ would be an estimator of $E(Y_h)$.

$E(\hat{Y}_h) = \beta_0 + \beta_1 X_h$

Variance of $\hat{Y}_h$:
$\sigma^2(\hat{Y}_h) = \sigma^2\left[\dfrac{1}{n} + \dfrac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2}\right]$

Estimate of $\sigma^2(\hat{Y}_h)$?
$MSE\left[\dfrac{1}{n} + \dfrac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2}\right]$

Sampling distribution of $\hat{Y}_h$

What would be the distribution of
$TS = \dfrac{\hat{Y}_h - (\beta_0 + \beta_1 X_h)}{\sqrt{Var(\hat{Y}_h)}}$

Can TS be a test statistic? Does it contain any unknown
parameter?

Estimated variance of $\hat{Y}_h$, that is, the estimate of $Var(\hat{Y}_h)$: $s(\hat{Y}_h)^2$.

$MSE\left[\dfrac{1}{n} + \dfrac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2}\right]$

Recall that
$\dfrac{Z}{\sqrt{\chi^2_n / n}} \sim t_n$
where $Z \sim N(0, 1)$ and $\chi^2_n$ are independent.

If we know $(n-2)\hat{\sigma}^2 / \sigma^2 \sim \chi^2_{n-2}$

And we also know the above random variable is independent
of $b_0$ and $b_1$

What would be the distribution of
$\dfrac{\hat{Y}_h - (\beta_0 + \beta_1 X_h)}{\sqrt{MSE\left[\dfrac{1}{n} + \dfrac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2}\right]}}$
Confidence interval for $E(Y_h)$

Is the t distribution symmetric?

$t(\alpha/2;\, n-2)$: denotes the $(\alpha/2) \cdot 100$ percentile of the t
distribution with $n-2$ degrees of freedom.

$t(\alpha/2;\, n-2) = -t(1-\alpha/2;\, n-2)$

Now
$P\left[\, t(\alpha/2;\, n-2) \le \dfrac{\hat{Y}_h - (\beta_0 + \beta_1 X_h)}{s(\hat{Y}_h)} \le t(1-\alpha/2;\, n-2) \,\right] = 1 - \alpha$

Thus the $100(1-\alpha)\%$ C.I. for $\beta_0 + \beta_1 X_h$ is
$\left[\, b_0 + b_1 X_h - t(1-\alpha/2;\, n-2)\, s(\hat{Y}_h),\ b_0 + b_1 X_h + t(1-\alpha/2;\, n-2)\, s(\hat{Y}_h) \,\right]$
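A one-line R version of this interval, using the hypothetical fit from before; the new level (lot = 65) is only a placeholder value:

    predict(fit, newdata = data.frame(lot = 65),
            interval = "confidence", level = 0.95)   # C.I. for E(Y_h) at X_h = 65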
Prediction of a new observation

Now suppose we have $X = X_{new}$ and want to find the
corresponding value of $Y_{new}$. Since
$Y_{new} = \beta_0 + \beta_1 X_{new} + \varepsilon_{new}$
and $\varepsilon_{new}$ is unobserved, we shall never be able to get the exact
value of $Y_{new}$. So here we propose an interval estimate of $Y_{new}$.

Recall $\hat{Y}_{new} = b_0 + b_1 X_{new}$.

Now consider $Y_{new} - \hat{Y}_{new}$.

$E(Y_{new} - \hat{Y}_{new}) = ?$

$Var(Y_{new} - \hat{Y}_{new}) = \sigma^2 + \sigma^2\left[\dfrac{1}{n} + \dfrac{(X_{new} - \bar{X})^2}{\sum (X_i - \bar{X})^2}\right]$

$s^2(Y_{new} - \hat{Y}_{new}) = MSE + MSE\left[\dfrac{1}{n} + \dfrac{(X_{new} - \bar{X})^2}{\sum (X_i - \bar{X})^2}\right]$

What is the distribution of
$\dfrac{Y_{new} - \hat{Y}_{new}}{\sqrt{\sigma^2 + \sigma^2\left[\dfrac{1}{n} + \dfrac{(X_{new} - \bar{X})^2}{\sum (X_i - \bar{X})^2}\right]}}$

If $\sigma^2$ is unknown we shall replace it by the MSE.

So now the question is: what is the distribution of
$\dfrac{Y_{new} - \hat{Y}_{new}}{\sqrt{MSE + MSE\left[\dfrac{1}{n} + \dfrac{(X_{new} - \bar{X})^2}{\sum (X_i - \bar{X})^2}\right]}} = \dfrac{Y_{new} - \hat{Y}_{new}}{\sqrt{MSE\left[1 + \dfrac{1}{n} + \dfrac{(X_{new} - \bar{X})^2}{\sum (X_i - \bar{X})^2}\right]}}$

Observe that MSE is independent of the numerator. Why?

Thus the $100(1-\alpha)\%$ prediction interval for $Y_{new}$ is
$\left[\, \hat{Y}_{new} - t(1-\alpha/2;\, n-2)\, s(Y_{new} - \hat{Y}_{new}),\ \hat{Y}_{new} + t(1-\alpha/2;\, n-2)\, s(Y_{new} - \hat{Y}_{new}) \,\right]$
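The corresponding prediction interval in R differs only in the interval argument (same hypothetical fit and placeholder X value as above):

    predict(fit, newdata = data.frame(lot = 65),
            interval = "prediction", level = 0.95)   # wider than the interval for E(Y_h)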
$R^2$ and r

$\sum_{i=1}^{n}(Y_i - \bar{Y})^2 = \sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2$

SSR (Regression Sum of Squares) $= \sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2$

SSE (Error Sum of Squares) $= \sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2$

SSTO (Total Sum of Squares) $= \sum_{i=1}^{n}(Y_i - \bar{Y})^2$

SSTO = SSR + SSE

Thus,
$1 = \dfrac{SSR}{SSTO} + \dfrac{SSE}{SSTO}$
$\dfrac{SSR}{SSTO} = 1 - \dfrac{SSE}{SSTO}$
$R^2 = \dfrac{SSR}{SSTO} = 1 - \dfrac{SSE}{SSTO}$

What happens if all the observations are on a straight line?
SSE = 0, i.e. $R^2 = 1$.

$R^2 = 0$: what does it imply? See the explanation on page 74.
A brief note on $R^2$ and r.

$R^2$ is defined as the proportionate reduction of the total variation
associated with the use of the predictor variable X.

$R^2 = \dfrac{SSR}{SSTO} = 1 - \dfrac{SSE}{SSTO} = \dfrac{SSTO - SSE}{SSTO}$

Thus the larger the value of SSR (or the smaller the value of SSE),
the closer $R^2$ is to 1.

If our model was
$Y_i = \beta_0 + \varepsilon_i$
we can show that, based on this model, the total variation
(a constant times the estimate of $\sigma^2$) will be
$\sum_{i=1}^{n}(y_i - \bar{y})^2$.

However, if we use the model which uses the predictor
variable X,
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
we can show that, based on this model, the total variation
(a constant times the estimate of $\sigma^2$) will be
$\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$.

Thus the absolute reduction in total variation will be
$\sum_{i=1}^{n}(y_i - \bar{y})^2 - \sum_{i=1}^{n}(y_i - \hat{y}_i)^2$

Now the proportionate reduction (or the percentage decrease)
in the total variation will be
$\dfrac{\sum_{i=1}^{n}(y_i - \bar{y})^2 - \sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2} = \dfrac{\sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$
Limitations of $R^2$

In the textbook, page 75, the authors speak of 3
misunderstandings. Here we go over them.

I: Recall that we had proved that $E(\hat{Y}_h) = \beta_0 + \beta_1 X_h$. This
implies that the fitted regression equation $\hat{Y}_h = b_0 + b_1 X_h$ is
an unbiased estimator of the mean value of Y when the level
of X is fixed at $X_h$. Now merely being unbiased does not
mean much. We would want to make sure that the variance
of this estimator is not too large. We have
$s.e.^2(\hat{Y}_h) = MSE\left[\dfrac{1}{n} + \dfrac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2}\right]$
So if $X_h = 100$, then $\hat{Y}_h = b_0 + b_1 \cdot 100$, which would be an
unbiased estimator of $\beta_0 + \beta_1 \cdot 100$, but the $s.e.^2$ would be
203.72, which we can see is quite large. (See page 55,
Example 2.)
Using a $100(1-\alpha)\%$ C.I. for testing a hypothesis at level $\alpha$

The critical region for rejecting $H_0: \beta_1 = \beta_{10}$ against
$H_a: \beta_1 \neq \beta_{10}$ is
$\left\{ b_1 > \beta_{10} + t(1-\alpha/2,\, n-2)\, se(b_1) \right\} \cup \left\{ b_1 < \beta_{10} - t(1-\alpha/2,\, n-2)\, se(b_1) \right\}$

The above is equivalent to
$\left\{ b_1 - t(1-\alpha/2,\, n-2)\, se(b_1) > \beta_{10} \right\} \cup \left\{ b_1 + t(1-\alpha/2,\, n-2)\, se(b_1) < \beta_{10} \right\}$

The null hypothesis will NOT be rejected if
$\left\{ b_1 - t(1-\alpha/2,\, n-2)\, se(b_1) < \beta_{10} < b_1 + t(1-\alpha/2,\, n-2)\, se(b_1) \right\}$

Thus under the null hypothesis, i.e., when $H_0: \beta_1 = \beta_{10}$ is
true, we have
$P\left[\, b_1 - t(1-\alpha/2,\, n-2)\, se(b_1) < \beta_{10} < b_1 + t(1-\alpha/2,\, n-2)\, se(b_1) \,\right] = 1 - \alpha$

Recall the $100(1-\alpha)\%$ C.I. for $\beta_1$ is
$\left[\, b_1 - t(1-\alpha/2,\, n-2)\, se(b_1),\ b_1 + t(1-\alpha/2,\, n-2)\, se(b_1) \,\right]$

That is
$P\left[\, b_1 - t(1-\alpha/2,\, n-2)\, se(b_1) \le \beta_1 \le b_1 + t(1-\alpha/2,\, n-2)\, se(b_1) \,\right] = 1 - \alpha$

So if we are given a $100(1-\alpha)\%$ confidence interval for, say,
$\beta_1$ and we want to use it to test the null hypothesis
$H_0: \beta_1 = \beta_{10}$ against a two-sided alternative, then what do we
do?

We check to see if $\beta_{10}$ lies outside the interval; if so, then we
reject the null.
Chapter 2

Sections not included for the midterm: 2.6, 2.8 and 2.11

Important terms and concepts

Inference (Testing + Interval Estimation) concerning $\beta_1$

T test

ANOVA table + F test

Inference (Testing + Interval Estimation) concerning $\beta_0$

Interval Estimation of $E(Y_h)$

Prediction interval for $Y_{new}$

$R^2$ and r
Chapter 3

We assumed that our observations were generated as follows:
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, for $i = 1, \ldots, n$,
where the $\varepsilon_i$ were assumed to be i.i.d. Normal(0, $\sigma^2$).

Essentially we had assumed that

The regression function is linear. (Model Assumption)

The error terms have a constant variance. (Error Assumption)

The error terms are independent. (Error Assumption)

The error terms are normally distributed. (Error Assumption)

The question is: how do we know, merely based on the data
set, that the above assumptions are valid?

This question can be answered by studying the residuals.

What are the residuals?

Recall that
$e_i = Y_i - \hat{Y}_i$ and $\hat{Y}_i = b_0 + b_1 X_i$

Also recall (from the 6 properties of the regression on pages
23 and 24)
$\sum_{i=1}^{n} e_i = 0$
$\sum_{i=1}^{n} X_i e_i = 0$

Question: Are the residuals independent? Why?

Now $E(e_i) = 0$ for all $i = 1, \ldots, n$.

$Var(e_i) = \sigma^2 (1 - h_{ii})$

What is $h_{ii}$? It is the $i$th diagonal element of the hat matrix
H. What is the hat matrix? Later.

Moral of the story: the $e_i$'s do not have a constant variance!

So it becomes difficult to use them together. If we want to
compare one with another we would want to make sure they
are standardised.

So consider
$e_i^* = \dfrac{e_i - \bar{e}}{\sqrt{Var(e_i)}} = \dfrac{e_i}{\sqrt{Var(e_i)}} = \dfrac{e_i}{\sqrt{\sigma^2 (1 - h_{ii})}}$
This is going to make its mean 0 and variance 1. But what if the
variance $\sigma^2$ is unknown?

Then we shall use
$e_i^* = \dfrac{e_i - \bar{e}}{\sqrt{\widehat{Var}(e_i)}} = \dfrac{e_i}{\sqrt{\widehat{Var}(e_i)}} = \dfrac{e_i}{\sqrt{MSE\,(1 - h_{ii})}}$
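A small R sketch of these standardised residuals via the hat-matrix diagonals; rstandard() returns the same quantity directly (again for the hypothetical fit used earlier):

    h   <- hatvalues(fit)                         # diagonal elements h_ii of the hat matrix
    MSE <- sum(residuals(fit)^2) / df.residual(fit)
    r   <- residuals(fit) / sqrt(MSE * (1 - h))   # e_i / sqrt(MSE (1 - h_ii))
    all.equal(r, rstandard(fit))                  # agrees with R's studentized residuals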
Nonlinearity of Regression Function

Is the regression function linear?

An initial answer to this question is provided by the scatter plot.

But this is not always as effective as the residual plot (we
shall soon see an example).

What is a residual plot? Plot residuals vs X, or residuals vs
fitted values.

Let us consider the following examples.


[Figures: scatter plot (dependent vs independent), semi-studentised residuals vs predictor, and semi-studentised residuals vs fitted values]

Here the data was generated from a linear model:
$Y = 2.4 + 1.4\, X + \text{Normal}(0, 2)$

From the scatter plot we observe a strict linear trend.

Now observe the semistudentized residual vs x plot. What
do we see there?

Notice that the plot is scattered around the y = 0 line and,
more importantly, we do NOT observe any pattern!!

If the functional form of the regression model is incorrect, the
residual plots constructed by using the model will often
display a pattern.

Later we shall see that this pattern can be used to determine
a more appropriate model.

More examples ......


[Figures: scatter plot, semi-studentised residuals vs predictor, and semi-studentised residuals vs fitted values]

Here the data was generated from a nonlinear model:
$Y = X^2 + \text{Normal}(0, 2)$

From the scatter plot we observe that it is not linear at all. So
we can guess that a linear fit won't work.

Now observe the semistudentized residual vs x plot. What
do we see there?

Notice that the plot shows a distinct pattern (in this case a
quadratic pattern), implying that the functional form of the
population regression equation is not linear.

More examples ......


[Figures: scatter plot, semi-studentised residuals vs predictor, and semi-studentised residuals vs fitted values]

Here the data was generated from a constant-mean model:
$Y = 5 + \text{Normal}(0, 2)$

From the scatter plot we observe that it is linear with no
slope. So we can guess that a slope term won't be needed.

Now observe the semistudentized residual vs x plot. What
do we see there?

More examples ......


Estimate t value Pr(>|t|)
b0 5.0134539 170.324 <2e-16 ***
b1 0.0001085 0.065 0.948
---
R-squared: 0.000153

In the past examples it was pretty obvious from the scatter plot
itself what to expect.

Let us take a look at the following example.


[Figures: scatter plot, semi-studentised residuals vs predictor, and semi-studentised residuals vs fitted values]
b0 -1.727 0.13500
b1 6.484 0.00064 ***
R-squared: 0.8751
[Figures: scatter plot, semi-studentised residuals vs predictor, and scatter plot with the fitted regression equation]

Here the data was generated from a quadratic model:
$Y = 5 + 3X + X^2 + \text{Normal}(0, 2)$

Now observe the semistudentized residual vs x plot and
compare it with the $R^2$. What can we conclude?
Nonconstancy of error variance

Scatter plot of the residuals against the $X_i$'s or the fitted values.

If, however, the number of observations is small, a scatter
plot of the absolute values of the residuals against the $X_i$'s or the
fitted values would be a good thing to do.

If the variances are systematically increasing: a funnel
opening outwards.

If the variances are systematically decreasing: a funnel
narrowing.

If the variances vary arbitrarily, we won't be able to make that
out from the residual plot.
Nonindependence of error terms

How do we verify the assumption that the random errors are
independent?

When we say non-independent we mean that the errors are
generated from an autoregressive time series model, that is:
$\varepsilon_i = \rho\, \varepsilon_{i-1} + u_i$
where the $u_i$ are i.i.d. $N(0, \sigma^2)$.

Plot the residuals against the time sequence instead of the
predicted values or the $X_i$'s themselves.

Observe the plot. If it does not display any pattern then the
errors are independent.

However, if there is a pattern then we can assume
non-independence.

What kind of pattern are we looking for?

We expect the residuals in the sequence plot to fluctuate in a
more or less random pattern around 0. Lack of randomness
can take the form of too much or too little alternation of points
around the zero line. Read the Comment on page 110.

Now merely based on a graph we can only draw conclusions
subjectively.

In order to be more precise we need a hypothesis test. This
brings us to the Durbin-Watson test.

$H_0$: The errors are not autocorrelated
$H_1$: The errors are autocorrelated

Test Statistic:
$d = \dfrac{\sum_{i=2}^{n}(e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2}$

We shall talk more about this shortly.
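A rough R sketch of this statistic computed directly from the residuals of the hypothetical fit used earlier; packages such as lmtest provide dwtest() for the full test with p-values:

    e <- residuals(fit)             # residuals in time order
    d <- sum(diff(e)^2) / sum(e^2)  # Durbin-Watson statistic
    d                               # values near 2 suggest little autocorrelation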


Nonnormality of error terms

Distribution plots: box plot, histogram, etc.

Comparison of frequencies (the 95% rule for the t distribution).

Q-Q Plot

Step 1: Arrange the standardised residuals from the smallest to
the largest, $e_{[i]}$. This shall be on the Y axis.

Step 2: On the X axis plot the points $z_{(i)}$, where $z_{(i)}$ is defined
as the point on the scale of the standard normal curve such that
the area under the curve to the left of this point $z_{(i)}$ is
$\dfrac{i - 0.375}{n + 0.25}$.

If the plot appears to be a straight line then we can assume
normality.

Why $z_{(i)}$? Because $E(e_{[i]})$ is very close to $\sqrt{MSE}\; z_{(i)}$.
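A minimal R sketch of this Q-Q construction for the hypothetical fit; base R's qqnorm() produces essentially the same plot:

    r <- sort(rstandard(fit))                # ordered standardised residuals e_[i]
    n <- length(r)
    z <- qnorm((1:n - 0.375) / (n + 0.25))   # normal quantiles z_(i) from the formula above
    plot(z, r); abline(0, 1)                 # roughly a straight line suggests normality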
Outliers

What are outliers?

Extreme observations.

How do we detect outliers?

Plot the semistudentized residuals against $X_i$ or $\hat{Y}_i$. This plot
should be centered around the line y = 0. Now look for the
points which lie outside the band between y = -4 and y = 4. This is a
rough and ready way to detect outliers.

What causes outliers to occur in the residual scatter plot?

Mostly errors in recording the observations. But they
may also be due to interaction with another predictor variable
which is not present in the model.

Why do we need to detect outliers?

LSEs are significantly modified due to the presence of outliers.
This causes a wrong fit.

If the number of observations is small, this in turn leads us
to erroneous conclusions, as it alters the residual plot as well.
Exploratory Residual Analysis

MODEL ASSUMPTIONS

The regression function is linear

Plot of residuals against predictor variables or fitted values

The error variances are constant

Plot of residuals against predictor variables or fitted values

The error terms are independent

Plot of residuals against time sequence

The error terms are normally distributed

Normal Probability Plot of residuals

OUTLIER DETECTION

Plot of residuals against predictor variables or fitted values

Tests
In the last few slides we saw a bunch of plots to verify the four
model assumptions. But plots can be interpreted subjectively, so in
order to check if the assumptions hold we need to carry out
hypothesis tests.

To check for the independence of the error terms: Durbin-Watson
test

To check for the constancy of variance of the error terms:
Brown-Forsythe or Breusch-Pagan test

To check for the normality of the error terms: correlation test for
the Q-Q plot, or the Kolmogorov-Smirnov test, and a bunch of
others.

To check for the linearity of the regression function: F test for
Lack of Fit.

To detect outliers: remove the point, refit the regression
equation, and construct a prediction interval.
Remedy

So far we have been discussing how to detect whether the
model assumptions are violated.

Now we shall talk about remedies.

There are two ways of fixing the problem:

Change the model (from linear to something more complex).

Transform the data set.

Why should we prefer one over the other? Complex models
may lead to a better understanding of how the data are
being generated. But estimating the model parameters might
turn out to be very challenging. Whereas, if we can transform
the data successfully, then we can get away with building a
relatively simpler model (perhaps with a smaller number of
parameters to estimate and using simpler techniques).
Model

Regression function is not linear: Consider a polynomial or a
nonlinear regression model. How do we determine its
functional form? Scatter plot, or techniques which shall be
discussed later. Instead of the usual linear form we could have
$E(Y) = \beta_0 + \beta_1 X + \beta_2 X^2$ OR $E(Y) = \beta_0\, \beta_1^{X}$.

Variances of the error terms are not constant: Use weighted
least squares instead of the usual method.

That is, divide each observation $Y_i$ by $\sigma_i$, thus making the
variances of the observations constant.

Non-independence of errors: Use a dependent structure
instead of the usual i.i.d. assumption, which will in turn alter
the theoretical properties of the estimates of the model
parameters.

Non-normality of the error term: Use the proper distribution
instead of the normal distribution.
Data Transformation

As we discussed earlier, using a complex model can be very
challenging. Whereas, if we can transform the data successfully,
then we can get away with building the simple regression model. So
we could

Transform X.

Transform Y. Power transformation or Box-Cox
transformation.

Transform both X and Y.

Remember: Why are we transforming? So that we don't have to
use a complex model, i.e., we can use a simple linear model. That
is, all 4 model assumptions are valid.
Transforming X

Suppose the scatter plot shows a nonlinear trend; here we could
either transform X or Y. If we have sufficient reason to
believe that the error terms are normally distributed and the
variance is constant, then we should transform X rather than Y.
Why so? Because if we transform Y, say to $\sqrt{Y}$, then the
distribution of the error shall change and the variances shall not
necessarily be constant any more.

Step 1: Draw the scatter plot.

Step 2: Does this plot look linear?

Step 3: Yes B-)) No? Make a good guess!

Step 4: Transform X.

Step 5: Draw the scatter plot again.

Step 6: Build the model.

Step 7: Residual analysis. What do the plots say?

Note: As we shall soon see in Step 4, this transformation does
not have to be unique; a couple of different ones may do the
job. We select any one which fits.

See the comment on page 132 - very important.

The transformation only on X is essentially used when the
problem lies in the linearity assumption of the regression
function.

Now if the assumptions of constant variance and normality are
violated, what do we do then?

These are issues related to the error term. So the only way to
fix them would be to transform the Y values. Usually both these
problems are addressed together by using only one
transformation.

How do we know that there is a problem? Use the residual
plot or observe the scatter plot carefully. If it seems like it is
spreading out, then that shows the variance is increasing. Take
a look at figure 3.15 in the textbook.
Transforming Y

Once we know that transforming Y is necessary, how do we do it?

We could just guess from the scatter plot like we did for X,
but there is another method which is less heuristic.

Box-Cox Transformation or the Power Transformation. How
does it work?

We would be happiest if we could fit the model
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

But that is not to be :(( So we need to transform the Y
variable, that is, come up with a $\lambda$ and use $Y^{\lambda}$ instead of Y;
that is, now we have to fit
$Y_i^{\lambda} = \beta_0 + \beta_1 X_i + \varepsilon_i$
Box-Cox Transformation

Step 1: Choose 20 (?) uniformly separated points from the
interval [-2, 2], e.g. [-2.0, -1.8, -1.6, ..., 1.6, 1.8, 2.0].

Step 2: For each value of $\lambda$ make the following
transformation:
If $\lambda \neq 0$:
$W_i = \dfrac{1}{\lambda\, K_2^{\lambda - 1}}\,(Y_i^{\lambda} - 1)$, where $K_2 = \left(\prod_{i=1}^{n} Y_i\right)^{1/n}$ is the geometric mean of the $Y_i$'s.
If $\lambda = 0$:
$W_i = K_2\, \log_e(Y_i)$

Step 3: So now from the original data set consisting of the $Y_i$'s we
have a new data set consisting of the $W_i$'s instead. Now we build
a regression model with these $(X_i, W_i)$. The number of
regression models shall be the same as the number of different
values of $\lambda$.

Step 4: Now calculate the SSE for each of the regression
models. Plot the SSE vs $\lambda$. Select the $\lambda$ where the SSE is
minimised.

Step 5: If a non-zero $\lambda$ is chosen then our transformation will
be $Y^{\lambda}$; otherwise, if $\lambda = 0$, the transformation will be
$\log_e(Y)$.
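A rough R sketch of this grid search, under the same assumptions as the earlier snippets (hypothetical predictor x and positive response y); MASS::boxcox() implements a likelihood-based version of the same idea:

    lambdas <- seq(-2, 2, by = 0.2)          # Step 1: grid of candidate lambdas
    K2 <- exp(mean(log(y)))                  # geometric mean of the Y_i
    sse <- sapply(lambdas, function(lam) {
      W <- if (abs(lam) < 1e-8) K2 * log(y) else (y^lam - 1) / (lam * K2^(lam - 1))
      sum(residuals(lm(W ~ x))^2)            # Steps 2-4: SSE of the fit on (X_i, W_i)
    })
    lambdas[which.min(sse)]                  # Step 5: lambda with the smallest SSE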
Transforming both X and Y

When unequal variances are present but the regression relation is
linear, a transformation on Y may not be sufficient.

Such a transformation on Y may stabilise the error variance, but
it might also change the linear relationship to a nonlinear one.

A transformation on X shall also be required.

So first we transform Y and then we transform X.

Post Transformation

Once we are done with the transformation, we fit the
linear regression to the transformed data and check the
residual plot.

If we are happy with the residual plot then the job is done.
Otherwise we try out a different transformation till we arrive
at the desired result.
Confidence Intervals

The $100(1-\alpha/2)\%$ C.I. for $\beta_0$ is
$b_0 \pm t(1-\alpha/4;\, n-2)\, s\{b_0\}$

The $100(1-\alpha/2)\%$ C.I. for $\beta_1$ is
$b_1 \pm t(1-\alpha/4;\, n-2)\, s\{b_1\}$

What will be the $100(1-\alpha)\%$ joint region for $(\beta_0, \beta_1)$?

Let A denote the complement of the region
$\left(b_0 - t(1-\alpha/4;\, n-2)\, s\{b_0\},\ b_0 + t(1-\alpha/4;\, n-2)\, s\{b_0\}\right)$
Thus $P(A) = \alpha/2$.

Let B denote the complement of the region
$\left(b_1 - t(1-\alpha/4;\, n-2)\, s\{b_1\},\ b_1 + t(1-\alpha/4;\, n-2)\, s\{b_1\}\right)$
Thus $P(B) = \alpha/2$.

$P(A \cup B) = P(A) + P(B) - P(A \cap B)$

$P(\bar{A} \cap \bar{B}) \ge 1 - P(A) - P(B)$

Now our required region, say R, is
$\left(b_0 \pm t(1-\alpha/4;\, n-2)\, s\{b_0\}\right) \times \left(b_1 \pm t(1-\alpha/4;\, n-2)\, s\{b_1\}\right)$

$P(R) = P(\bar{A} \cap \bar{B}) \ge 1 - P(A) - P(B)$

$P(R) = P(\bar{A} \cap \bar{B}) \ge 1 - \alpha$
Regression through the origin

If our model was
$Y_i = \beta_1 X_i + \varepsilon_i$

$E(Y_i) = \beta_1 X_i$

The LSE of $\beta_1$ is
$b_1 = \sum_{i=1}^{n} X_i Y_i \Big/ \sum_{i=1}^{n} X_i^2$

The unbiased estimator of $\sigma^2$ is
$s^2 = MSE = \sum_{i=1}^{n} e_i^2 / (n-1)$

Table 4.1, page 162


Multiple Regression

So far we had been studying Simple Linear Regression.

Now we move to Multiple Linear Regression.

Why was it simple? It had only one covariate, that is,
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ for all $i = 1$ to $n$

In a multiple regression setting we have
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \ldots + \beta_{p-1} X_{(p-1)i} + \varepsilon_i$
that is
$Y_i = \beta_0 + \sum_{j=1}^{p-1} \beta_j X_{ji} + \varepsilon_i$ for all $i = 1$ to $n$

In matrix form,
$Y_{n \times 1} = X_{n \times p}\, \beta_{p \times 1} + \varepsilon_{n \times 1}$
where
$Y_{n \times 1}$ = vector of n observations,
$Y = (Y_1, \ldots, Y_n)'$

$X_{n \times p}$ = the design matrix.
The $i$th row of the above matrix is
$X_i = (1, X_{1i}, X_{2i}, \ldots, X_{(p-1)i})$

$\varepsilon_{n \times 1}$ = vector of n errors,
that is
$\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)'$

and the $\varepsilon_i$ are i.i.d. $N(0, \sigma^2)$
Now that we can write
$Y = X\beta + \varepsilon$
we have
$E(Y_{n \times 1}) = X\beta$
Note we are taking the expectation of a vector.
$V(Y) = \sigma^2 I_{n \times n}$
We derived the variance of a vector. The Least Squares
Estimates, here, are
$\hat{\beta} = b = (X'X)^{-1} X' Y$
$E(\hat{\beta}) = \beta$

Fitted values
$\hat{Y} = X\hat{\beta}$
$\hat{Y} = X (X'X)^{-1} X' Y$
or
$\hat{Y} = HY$, where $H = X (X'X)^{-1} X'$
It is to be noted that the matrix H is idempotent and symmetric.

Residuals in multiple regression
$e = Y - \hat{Y} = Y - HY = (I - H)Y$
$E(e) = 0$
Why?
$V(e) = \sigma^2 (I - H)$
Why?
$s^2(e) = MSE\, (I - H)$
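A small R sketch of these matrix formulas with two hypothetical predictors x1, x2 and response y; in practice lm(y ~ x1 + x2) does all of this internally:

    X <- cbind(1, x1, x2)                  # design matrix with an intercept column
    b <- solve(t(X) %*% X, t(X) %*% y)     # b = (X'X)^{-1} X'Y
    H <- X %*% solve(t(X) %*% X) %*% t(X)  # hat matrix H = X(X'X)^{-1}X'
    yhat <- H %*% y                        # fitted values
    e    <- y - yhat                       # residuals (I - H) Y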
ANOVA in the multiple regression setting

Recall that
SSTO = SSE + SSR
where,
$SSTO = Y'Y - \dfrac{1}{n} Y'JY = Y'\left[I - \dfrac{1}{n}J\right]Y$
as usual it has $n-1$ degrees of freedom.
$SSE = e'e = (Y - Xb)'(Y - Xb) = Y'(I - H)Y$
it has $n-p$ degrees of freedom.
$SSR = b'X'Y - \dfrac{1}{n} Y'JY = Y'\left[H - \dfrac{1}{n}J\right]Y$
it has $p-1$ degrees of freedom, where J is an $n \times n$ matrix of 1s.
(For a numerical example see page 243; for the ANOVA table see page
225.)

Thus we have
$MSE = SSE/(n-p)$
$(n-p)\,MSE/\sigma^2 \sim \chi^2_{n-p}$
$MSR = SSR/(p-1)$
As before it can be shown that, if all the $\beta_j$'s ($j = 1, \ldots, p-1$) are zero, then
$E(MSR) = \sigma^2$
otherwise
$E(MSR) > \sigma^2$
Now that we have MSE and MSR we can construct an ANOVA table
as before. Now here,
$H_0: \beta_1 = \beta_2 = \ldots = \beta_{p-1} = 0$
$H_a$: At least one $\beta_j$ is non-zero
The test statistic is
$F^* = \dfrac{MSR}{MSE}$
The rejection region is
$F^* \ge F(1-\alpha;\ p-1,\ n-p)$
$R^2$: Coefficient of multiple determination

As before,
$R^2 = \dfrac{SSR}{SSTO} = 1 - \dfrac{SSE}{SSTO}$

It measures the proportionate reduction of the total variation
in Y associated with the use of the set of X variables $X_1, \ldots, X_{p-1}$.

$0 \le R^2 \le 1$. Now $R^2 = 0$ when all $b_k = 0$ for
$k = 1, \ldots, p-1$, and $R^2 = 1$ when $\hat{Y}_i = Y_i$ for all
$i = 1, \ldots, n$.

As we increase the number of covariates, i.e. the X variables (say from
p = 5 to p = 10), the SSE will decrease, thus $R^2$ will increase.
But this does not mean that the model has a better fit.

This is why we have the adjusted $R^2$:
$R^2_{adj} = 1 - \dfrac{SSE/(n-p)}{SSTO/(n-1)}$
Inference about the regression parameters in multiple
regression

$E(b) = \beta$
Why?
$V(b) = \sigma^2 (X'X)^{-1}$
Why?
$s^2(b) = MSE\, (X'X)^{-1}$
We know that
$b \sim MVN\left(\beta,\ \sigma^2 (X'X)^{-1}\right)$
So we know that each $b_k$, where $k = 0, 1, \ldots, (p-1)$, is normally
distributed. Hence, as before,
$\dfrac{b_k - \beta_k}{s\{b_k\}} \sim t(n-p)$
for all $k = 0, 1, 2, \ldots, p-1$
Hence the interval estimate of $\beta_k$ with $(1-\alpha)$ confidence
coefficient is
$\left[\, b_k - t(1-\alpha/2,\, n-p)\, s\{b_k\},\ b_k + t(1-\alpha/2,\, n-p)\, s\{b_k\} \,\right]$
Tests for $\beta_k$ where $k = 0, 1, 2, \ldots, p-1$: In order to test
$H_0: \beta_k = 0$
$H_a: \beta_k \neq 0$
We use
$t^* = \dfrac{b_k}{s\{b_k\}}$
as our test statistic and our critical region is
$|t^*| > t(1-\alpha/2;\, n-p)$


Joint confidence intervals. If we wish to find the joint confidence
intervals of g of the p parameters, the confidence limits with family
confidence coefficient $1-\alpha$ are:
$b_k \pm t(1-\alpha/2g;\, n-p)\, s\{b_k\}$
Interval Estimation of $E(Y_h)$. Recall
$E(Y) = X\beta$
Thus
$E(Y_h) = X_h' \beta$

General Linear Test Approach

Recall that the model under consideration is
$Y = X\beta + \varepsilon$
$\hat{\beta} = (X'X)^{-1} X' Y$
In order to test
$H_0: \beta_j = 0$
$H_a: \beta_j \neq 0$
we use the t-test, where the test statistic is
$\dfrac{\hat{\beta}_j - \beta_j}{\sqrt{MSE\; C_{jj}}} \sim t_{(n-p)}$
where $C_{jj}$ is the $j$th diagonal element of $(X'X)^{-1}$.
Also we had
SSTO = SSR + SSE
and as p increases SSE decreases (or SSR increases) and vice versa.
Here
$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$
and it does not depend on either p, the number of parameters in
the model, or the values of the covariates in the model (i.e., the
actual values of the $X_i$'s).
Now let's, for a moment, get back to the SLR model setting, that
is,
$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$ for $i = 1, \ldots, n$
Now in this setting the full model is
$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$
and the reduced model is
$y_i = \beta_0 + \varepsilon_i$
Under the full model the
$SSE = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 = \sum_{i=1}^{n}(y_i - b_0 - b_1 x_i)^2$
Under the reduced model
$SSE = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 = \sum_{i=1}^{n}(y_i - \bar{y})^2$
Observe that, under the reduced model, SSE = SST. Since
SSE decreases as p increases, whether adding any new variable had
any effect can be found out by comparing
SSE(F) with SSE(R)
We also know that
$SSE(F) \le SSE(R)$
So in order to test
$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$
we use the following test statistic
$F^* = \dfrac{(SSE(R) - SSE(F)) / (df_R - df_F)}{SSE(F)/df_F}$
If $H_0$ is true we know
$F^* = \dfrac{(SST - SSE) / ((n-1) - (n-2))}{SSE / (n-2)} = \dfrac{MSR}{MSE} \sim F(1,\ n-2)$
Here (that is, the case when p = 2) we find that the General Linear
Test is identical to the ANOVA test.
When p = 2, we have
$SST = SSR(X_1) + SSE(X_1)$
When p = 3, we have
$SST = SSR(X_1, X_2) + SSE(X_1, X_2)$
When p = 4, we have
$SST = SSR(X_1, X_2, X_3) + SSE(X_1, X_2, X_3)$

EXTRA SUM OF SQUARES:
$SSR(X_2 \mid X_1) = SSR(X_1, X_2) - SSR(X_1) = SSE(X_1) - SSE(X_1, X_2)$
The EXTRA SUM OF SQUARES is the measure of the marginal
effect of adding the new variable to the existing model. Similarly
we can define:
$SSR(X_3 \mid X_1, X_2) = SSE(X_1, X_2) - SSE(X_1, X_2, X_3)$
or
$SSR(X_3, X_2 \mid X_1) = SSE(X_1) - SSE(X_1, X_2, X_3)$

$SST = SSR(X_1) + SSE(X_1)$
$SSR(X_2 \mid X_1) = SSE(X_1) - SSE(X_1, X_2)$
$SST = SSR(X_1) + SSR(X_2 \mid X_1) + SSE(X_1, X_2)$
$SST = SSR(X_1, X_2) + SSE(X_1, X_2)$
Comparing the two we get
$SSR(X_1, X_2) = SSR(X_1) + SSR(X_2 \mid X_1)$
Since
$SSR(X_2 \mid X_1) = SSE(X_1) - SSE(X_1, X_2)$
$SSR(X_3 \mid X_1, X_2) = SSE(X_1, X_2) - SSE(X_1, X_2, X_3)$
We can write
$SST = SSR(X_1) + SSE(X_1)$
$= SSR(X_1) + SSR(X_2 \mid X_1) + SSE(X_1, X_2)$
$= SSR(X_1) + SSR(X_2 \mid X_1) + SSR(X_3 \mid X_1, X_2) + SSE(X_1, X_2, X_3)$
Comparing this with
$SST = SSR(X_1, X_2, X_3) + SSE(X_1, X_2, X_3)$
We can write
$SSR(X_1, X_2, X_3) = SSR(X_1) + SSR(X_2 \mid X_1) + SSR(X_3 \mid X_1, X_2)$
The df of SSR is p - 1, and the df of an extra sum of squares equals the
number of variables added, so the df of $SSR(X_1 \mid X_2, X_3)$ is 1, and that
of $SSR(X_1, X_2 \mid X_3)$ is 2. Now that we have these SSRs we can also define
the corresponding MSRs:
$MSR(X_2, X_3 \mid X_1) = SSR(X_2, X_3 \mid X_1)/2$
Thus we decompose the total SSR into smaller components. What
is the use of all this? Well, it gives us an idea as to how the
reduction in variation takes place and how each covariate is
responsible for bringing about this change; in other words, the
contribution of each covariate becomes more explicit.
Consider the following (full) model
$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \varepsilon_i$
$H_0: \beta_3 = 0$
$H_a: \beta_3 \neq 0$
$SSE(F) = SSE(X_1, X_2, X_3)$
$SSE(R) = SSE(X_1, X_2)$
$F^* = \dfrac{(SSE(R) - SSE(F)) / (df_R - df_F)}{SSE(F)/df_F}
= \dfrac{(SSE(X_1, X_2) - SSE(X_1, X_2, X_3)) / ((n-3) - (n-4))}{SSE(X_1, X_2, X_3)/(n-4)}
= \dfrac{SSR(X_3 \mid X_1, X_2) / 1}{SSE(X_1, X_2, X_3) / (n-4)}
= \dfrac{MSR(X_3 \mid X_1, X_2)}{MSE(X_1, X_2, X_3)}$
This is known as the partial F-test.

$H_0: \beta_2 = \beta_3 = 0$
$H_a$: $H_0$ is not true.
Full Model: Same as before, thus $SSE(F) = SSE(X_1, X_2, X_3)$
Reduced Model: $Y_i = \beta_0 + \beta_1 X_{i1} + \varepsilon_i$, thus $SSE(R) = SSE(X_1)$
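A brief R sketch of this partial F-test with hypothetical predictors x1, x2, x3 and response y in a data frame dat; anova() on the nested fits reproduces the F* statistic above:

    full    <- lm(y ~ x1 + x2 + x3, data = dat)
    reduced <- lm(y ~ x1, data = dat)   # drops X2 and X3
    anova(reduced, full)                # partial F-test of H0: beta_2 = beta_3 = 0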
