Business Stat & Emetrics - Inference in Regression


Inferences and hypothesis testing

In simple regression analysis, we assume that x and y are related linearly in the population.
Mathematically,

$$ y_i = \beta_0 + \beta_1 x_i + \varepsilon_i $$

But since we don’t know the population parameters, we use sample data to obtain a sample regression
model

$$ \hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i $$

Here the slope and intercept are simply estimates of the corresponding population parameters. But these estimates are just realizations of random variables: the estimators of the coefficients. Being random variables, the estimators can be characterized statistically.
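As a minimal sketch (assuming NumPy and a small made-up data set; the variable names and numbers are ours, not from the text), the sample estimates can be computed directly from the OLS formulas:

```python
import numpy as np

# Hypothetical sample data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 3.6, 4.4, 5.2, 5.8])

# OLS estimates of the slope and intercept
b1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0_hat = y.mean() - b1_hat * x.mean()

print(b0_hat, b1_hat)  # estimated intercept and slope
```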

Properties of the slope estimator

 It is a random variable
 The mean of the estimator equals the population value.
 The estimator is normally distributed

Sampling distribution of $\hat{\beta}_0$ and $\hat{\beta}_1$

Linearity of OLS estimators

 A linear estimator is an estimator that is a linear combination of the dependent variable. That is,
$$ \hat{\beta} = w_1 Y_1 + w_2 Y_2 + \dots + w_n Y_n $$

To see that the OLS estimator of the slope is linear, write


$$ \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})(X_i - \bar{X})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} $$

$$ \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X}) Y_i}{\sum_{i=1}^{n} (X_i - \bar{X})^2} $$

For notational simplicity, let $x_i = X_i - \bar{X}$. Then

$$ \hat{\beta}_1 = \frac{\sum_{i=1}^{n} x_i Y_i}{\sum_{i=1}^{n} x_i^2} = \sum_{i=1}^{n} \left( \frac{x_i}{\sum_{i=1}^{n} x_i^2} \right) Y_i = \sum_{i=1}^{n} w_i Y_i $$

Hence, the OLS estimator is a linear estimator.
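A small numerical sketch of this point (same hypothetical data as above): the slope estimate is literally a weighted sum of the Y values, with weights $w_i = x_i / \sum x_i^2$.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 3.6, 4.4, 5.2, 5.8])

xd = x - x.mean()              # deviations of X from its mean
w = xd / np.sum(xd ** 2)       # the weights w_i

b1_as_weighted_sum = np.sum(w * y)
b1_direct = np.sum(xd * (y - y.mean())) / np.sum(xd ** 2)

print(np.isclose(b1_as_weighted_sum, b1_direct))  # True: the two computations agree
```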

OLS estimator is unbiased

$$ E(\hat{\beta}_1) = E\!\left[ \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \right] $$

Since

$$ Y_i - \bar{Y} = \beta_0 + \beta_1 X_i + \varepsilon_i - (\beta_0 + \beta_1 \bar{X}) = \beta_1 (X_i - \bar{X}) + \varepsilon_i $$

we have

$$ E(\hat{\beta}_1) = \beta_1 + E\!\left[ \frac{\sum_{i=1}^{n} (X_i - \bar{X})\,\varepsilon_i}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \right] $$

Thus the estimator is biased only if the second term above is not zero. But consider two possibilities. First, if we assume X is constant because it is a treatment determined by the researcher, then since $E(\varepsilon_i) = 0$, the last term is zero. Second, if we assume X is random, then as long as the covariance of the error with X is zero (an assumption of our classical model), the last term is again zero. In either case, the estimator is unbiased.

Variance of the sampling distribution of $\hat{\beta}_1$

$$ \hat{\beta}_1 - E(\hat{\beta}_1) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})\,\varepsilon_i}{\sum_{i=1}^{n} (X_i - \bar{X})^2} $$

Which means that

$$ \mathrm{var}(\hat{\beta}_1) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2 \,\mathrm{var}(\varepsilon_i)}{\left[ \sum_{i=1}^{n} (X_i - \bar{X})^2 \right]^2} = \frac{\sigma_\varepsilon^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2} $$

 The variance of $\hat{\beta}_1$ is smaller if the variance of y is smaller.
 The variance of $\hat{\beta}_1$ is smaller if the sample size is larger.
 The variance of $\hat{\beta}_1$ is larger, the smaller the variance of the explanatory variable X (illustrated in the sketch below).
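A sketch of the variance formula and how it responds to the sample size and the spread of X (arbitrary numbers, assuming the error standard deviation is known):

```python
import numpy as np

def slope_variance(x, sigma_eps):
    """Var(beta1_hat) = sigma_eps^2 / sum((x - xbar)^2)."""
    return sigma_eps ** 2 / np.sum((x - np.mean(x)) ** 2)

sigma_eps = 1.0
x_base = np.linspace(0, 10, 20)      # n = 20
x_large_n = np.linspace(0, 10, 200)  # larger n     -> smaller variance
x_narrow = np.linspace(4, 6, 20)     # less spread  -> larger variance

print(slope_variance(x_base, sigma_eps))
print(slope_variance(x_large_n, sigma_eps))
print(slope_variance(x_narrow, sigma_eps))
```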

Normality of the OLS estimators

Since
$$ \hat{\beta}_1 = \sum_{i=1}^{n} w_i Y_i $$

It is a linear function of Y. Since we assumed that

$$ Y_i \sim N\!\left( \beta_0 + \beta_1 X_i ,\; \sigma_\varepsilon^2 \right) $$

A linear function of Y is also normally distributed. Hence,

$$ \hat{\beta}_1 \sim N\!\left( \beta_1 ,\; \frac{\sigma_y^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \right) $$

$$ \hat{\beta}_0 \sim N\!\left( \beta_0 ,\; \frac{\sigma_y^2 \sum_{i=1}^{n} X_i^2}{n \sum_{i=1}^{n} (X_i - \bar{X})^2} \right) $$

Confidence intervals

In deriving the 100(1−α)% confidence interval for the parameter $\beta_1$, first let's standardize $\hat{\beta}_1$:

$$ \frac{\hat{\beta}_1 - \beta_1}{\sigma_{\hat{\beta}_1}} $$

Then,
$$ P\!\left( -Z_{1-\alpha/2} \le \frac{\hat{\beta}_1 - \beta_1}{\sigma_{\hat{\beta}_1}} \le Z_{1-\alpha/2} \right) = 1 - \alpha $$

Solving for the population parameter,

$$ P\!\left( \hat{\beta}_1 - Z_{1-\alpha/2}\,\sigma_{\hat{\beta}_1} \le \beta_1 \le \hat{\beta}_1 + Z_{1-\alpha/2}\,\sigma_{\hat{\beta}_1} \right) = 1 - \alpha $$

But now we usually don't know the population variance $\sigma_y^2$. Hence, we also need to estimate the variance of the sampling distribution of the parameter estimator, because we don't know the variance of y, which determines the variance of the estimator, as can be seen here:

$$ \mathrm{var}(\hat{\beta}_1) = \frac{\sigma_y^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2} $$

Now, we have to estimate the population variance of Y by the sample variance of the residuals,

$$ s_y^2 = \frac{\sum_{i=1}^{n} e_i^2}{n-2} $$

After this substitution, we use the t-distribution for the confidence interval:

$$ P\!\left( \hat{\beta}_1 - t_{1-\alpha/2,\,n-2}\, s_{\hat{\beta}_1} \le \beta_1 \le \hat{\beta}_1 + t_{1-\alpha/2,\,n-2}\, s_{\hat{\beta}_1} \right) = 1 - \alpha $$

Where,

$$ \mathrm{var}(\hat{\beta}_1) = \frac{s_y^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2} $$
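Putting the pieces together, a minimal sketch of the t-based confidence interval for the slope (hypothetical data; SciPy is assumed to be available for the t critical value):

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 3.6, 4.4, 5.2, 5.8])
n = x.size
alpha = 0.05

xd = x - x.mean()
b1 = np.sum(xd * (y - y.mean())) / np.sum(xd ** 2)
b0 = y.mean() - b1 * x.mean()

resid = y - (b0 + b1 * x)
s2 = np.sum(resid ** 2) / (n - 2)        # estimate of the error variance
se_b1 = np.sqrt(s2 / np.sum(xd ** 2))    # estimated standard error of the slope

t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
print(ci)  # 95% confidence interval for beta1
```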
Hypothesis testing

Hypothesis testing is done in much the same fashion as testing hypotheses about a population mean or other population parameters: compute $t = (\hat{\beta}_1 - \beta_1^{0}) / s_{\hat{\beta}_1}$, where $\beta_1^{0}$ is the hypothesized value, and compare it with the critical value from the t-distribution with n−2 degrees of freedom.
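For example, a sketch of the usual two-sided test of H0: β1 = 0 (continuing with the same hypothetical data and standard error as above):

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 3.6, 4.4, 5.2, 5.8])
n = x.size

xd = x - x.mean()
b1 = np.sum(xd * (y - y.mean())) / np.sum(xd ** 2)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)
se_b1 = np.sqrt(np.sum(resid ** 2) / (n - 2) / np.sum(xd ** 2))

t_stat = (b1 - 0) / se_b1                        # test statistic under H0: beta1 = 0
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)  # two-sided p-value
print(t_stat, p_value)
```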

Goodness of fit test (R-squared)

This tells us the fraction of the variation in the dependent variable that is explained by the explanatory variable.

Consider the following decomposition.

For an individual observation, we have the actual (observed) value $Y_i$ and the estimated value $\hat{Y}_i$. Why does the value differ from the mean value $\bar{Y}$? There are two possible factors. First, the value of X may differ from the mean of X for that observation; hence the dependent variable also takes a value different from the mean of Y. This difference is

$$ \hat{Y}_i - \bar{Y} $$

This is accounted for by the explanatory variable: it is what the regression line tries to explain.

Next, the value of y may also be different from the estimated value. This difference is

$$ Y_i - \hat{Y}_i $$

This is due to random error. This is what the error term represents.

Now, the total deviation of the actual value of Y from its mean is the sum of the two deviations:

$$ Y_i - \bar{Y} = (\hat{Y}_i - \bar{Y}) + (Y_i - \hat{Y}_i) $$

It can be shown that we can square and sum this equation as

$$ \sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 $$

Total sum of squares = explained sum of squares + unexplained sum of squares

The R^2 is a measure of the proportion of explained variation:

$$ R^2 = \frac{\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2} = 1 - \frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2} $$

Since OLS minimizes the unexplained sum of squares, it also maximizes the explained sum of squares, and hence the R-squared.
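A final sketch of the decomposition and R^2 (same hypothetical data as in the earlier sketches):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 3.6, 4.4, 5.2, 5.8])

xd = x - x.mean()
b1 = np.sum(xd * (y - y.mean())) / np.sum(xd ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

tss = np.sum((y - y.mean()) ** 2)        # total sum of squares
ess = np.sum((y_hat - y.mean()) ** 2)    # explained sum of squares
rss = np.sum((y - y_hat) ** 2)           # unexplained (residual) sum of squares

print(np.isclose(tss, ess + rss))        # the decomposition holds
print(ess / tss, 1 - rss / tss)          # two equivalent ways to compute R^2
```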
