Business Stat & Emetrics - Inference in Regression
In simple regression analysis, we assume that x and y are related linearly in the population.
Mathematically,
$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$
But since we don’t know the population parameters, we use sample data to obtain a sample regression
model
$\hat y_i = \hat\beta_0 + \hat\beta_1 x_i$
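As a concrete illustration (a minimal sketch; the data values here are made up and are not from the notes), the sample regression line can be fitted to a small dataset in Python:

```python
import numpy as np

# Hypothetical sample data (made up for illustration only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])

# Least-squares fit of y = b0 + b1 * x; polyfit returns the highest degree first
b1_hat, b0_hat = np.polyfit(x, y, 1)
print(b0_hat, b1_hat)   # sample estimates of the population intercept and slope
```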
Here the slope and intercept are estimates of the corresponding population parameters. These estimates are realizations of random variables, namely the estimators of the coefficients, and because the estimators are random variables they can be characterized statistically. Under the classical assumptions, each estimator
- is a random variable,
- has a mean equal to the population value (it is unbiased), and
- is normally distributed.
Sampling distribution of $\hat\beta_0$ and $\hat\beta_1$
A linear estimator is an estimator that is a linear combination of the dependent variable. That is,
$\hat\beta = w_1 Y_1 + w_2 Y_2 + \dots + w_n Y_n$
$\hat\beta_1 = \frac{\sum_{i=1}^n (Y_i - \bar Y)(X_i - \bar X)}{\sum_{i=1}^n (X_i - \bar X)^2}$

$\hat\beta_1 = \frac{\sum_{i=1}^n (X_i - \bar X)\,Y_i}{\sum_{i=1}^n (X_i - \bar X)^2}$

Writing $x_i = X_i - \bar X$,

$\hat\beta_1 = \frac{\sum_{i=1}^n x_i Y_i}{\sum_{i=1}^n x_i^2} = \sum_{i=1}^n \left(\frac{x_i}{\sum_{j=1}^n x_j^2}\right) Y_i = \sum_{i=1}^n w_i Y_i$

so the OLS slope estimator is a linear estimator with weights $w_i = x_i / \sum_{j=1}^n x_j^2$.
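As a quick numerical check (a sketch reusing the hypothetical data from the example above), the weighted-sum form $\sum w_i Y_i$ reproduces the usual ratio formula for the slope:

```python
import numpy as np

# Hypothetical data, for illustration only
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
Y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])

x = X - X.mean()                 # deviations x_i = X_i - Xbar
w = x / np.sum(x ** 2)           # weights w_i = x_i / sum(x_j^2)

b1_ratio = np.sum(x * (Y - Y.mean())) / np.sum(x ** 2)   # ratio form of the slope
b1_linear = np.sum(w * Y)                                 # linear-in-Y form
print(b1_ratio, b1_linear)       # the two forms give the same estimate
```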
Substituting $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ into $\hat\beta_1 = \sum_i w_i Y_i$ gives

$E(\hat\beta_1) = E\left(\beta_1 + \frac{\sum_{i=1}^n (X_i - \bar X)\,\varepsilon_i}{\sum_{i=1}^n (X_i - \bar X)^2}\right) = \beta_1 + E\left(\frac{\sum_{i=1}^n (X_i - \bar X)\,\varepsilon_i}{\sum_{i=1}^n (X_i - \bar X)^2}\right)$
Thus the estimator is biased only if the second term above is not zero. Consider two possibilities. First, if we assume X is fixed because it is a treatment determined by the researcher, then since $E(\varepsilon_i) = 0$ the last term is zero. Second, if we assume X is random, then as long as the covariance of the error with X is zero (an assumption of the classical model), the last term is again zero. In either case, the estimator is unbiased.
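A small simulation sketch makes this argument concrete (the population values $\beta_0 = 1$, $\beta_1 = 2$, $\sigma = 1$ and the fixed X values are assumptions chosen only for illustration): averaging the slope estimate over many samples drawn from the model recovers the population slope.

```python
import numpy as np

rng = np.random.default_rng(0)
beta0, beta1, sigma = 1.0, 2.0, 1.0     # assumed population values (illustrative)
X = np.linspace(0, 10, 30)              # X held fixed across repeated samples
x = X - X.mean()

estimates = []
for _ in range(5000):
    eps = rng.normal(0.0, sigma, size=X.size)   # errors with E(eps) = 0
    Y = beta0 + beta1 * X + eps
    estimates.append(np.sum(x * Y) / np.sum(x ** 2))

print(np.mean(estimates))   # close to beta1 = 2, illustrating unbiasedness
```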
The deviation of the estimator from its mean is

$\hat\beta_1 - E(\hat\beta_1) = \frac{\sum_{i=1}^n \varepsilon_i (X_i - \bar X)}{\sum_{i=1}^n (X_i - \bar X)^2}$

so its variance is

$\text{var}(\hat\beta_1) = \frac{\sum_{i=1}^n (X_i - \bar X)^2\,\text{var}(\varepsilon_i)}{\left[\sum_{i=1}^n (X_i - \bar X)^2\right]^2} = \frac{\sigma_\varepsilon^2}{\sum_{i=1}^n (X_i - \bar X)^2}$
Since

$\hat\beta_1 = \sum_{i=1}^n w_i Y_i \quad \text{and} \quad Y_i \sim N(\beta_0 + \beta_1 X_i,\ \sigma_\varepsilon^2)$

(where $\sigma_\varepsilon^2 = \sigma_y^2$, the variance of y about the regression line), the slope estimator is itself normally distributed:

$\hat\beta_1 \sim N\left(\beta_1,\ \frac{\sigma_y^2}{\sum_{i=1}^n (X_i - \bar X)^2}\right)$

Similarly, for the intercept,

$\hat\beta_0 \sim N\left(\beta_0,\ \frac{\sigma_y^2 \sum_{i=1}^n X_i^2}{n \sum_{i=1}^n (X_i - \bar X)^2}\right)$
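Continuing the simulation sketch above (same assumed population values), the spread of the slope estimates across repeated samples can be compared with the variance formula just derived:

```python
import numpy as np

rng = np.random.default_rng(1)
beta0, beta1, sigma = 1.0, 2.0, 1.0     # assumed population values (illustrative)
X = np.linspace(0, 10, 30)
x = X - X.mean()

estimates = []
for _ in range(5000):
    Y = beta0 + beta1 * X + rng.normal(0.0, sigma, size=X.size)
    estimates.append(np.sum(x * Y) / np.sum(x ** 2))

theoretical_var = sigma ** 2 / np.sum(x ** 2)   # sigma^2 / sum((X_i - Xbar)^2)
print(np.var(estimates), theoretical_var)       # empirical vs formula variance
```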
Confidence intervals
Because the slope estimator is normally distributed, the standardized quantity

$\frac{\hat\beta_1 - \beta_1}{\sigma_{\hat\beta_1}} \sim N(0, 1)$

Then,

$P\left(-Z_{1-\alpha/2} \le \frac{\hat\beta_1 - \beta_1}{\sigma_{\hat\beta_1}} \le Z_{1-\alpha/2}\right) = 1 - \alpha$

$P\left(\hat\beta_1 - \sigma_{\hat\beta_1} Z_{1-\alpha/2} \le \beta_1 \le \hat\beta_1 + \sigma_{\hat\beta_1} Z_{1-\alpha/2}\right) = 1 - \alpha$

so $\hat\beta_1 \pm \sigma_{\hat\beta_1} Z_{1-\alpha/2}$ is a $(1-\alpha)$ confidence interval for $\beta_1$.
But in practice we rarely know the population variance $\sigma_y^2$. Hence, we must also estimate the variance of the sampling distribution of the parameter estimator, because it is the unknown variance of y that determines the variance of the estimator, as can be seen here:
$\text{var}(\hat\beta_1) = \frac{\sigma_y^2}{\sum_{i=1}^n (X_i - \bar X)^2}$
We estimate $\sigma_y^2$ with the residual variance

$s_y^2 = \frac{\sum_{i=1}^n e_i^2}{n - 2}$

where $e_i = Y_i - \hat Y_i$ are the residuals, so the estimated variance of the slope estimator is

$\widehat{\text{var}}(\hat\beta_1) = \frac{s_y^2}{\sum_{i=1}^n (X_i - \bar X)^2}$

Because the variance is estimated, the standardized slope follows a t distribution with $n - 2$ degrees of freedom, and the confidence interval uses $t_{1-\alpha/2,\,n-2}$ in place of $Z_{1-\alpha/2}$.
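As a sketch of the full calculation (the data are the illustrative values used earlier; the critical value comes from the t distribution with n − 2 degrees of freedom via scipy, which is an assumed tool rather than one named in the notes):

```python
import numpy as np
from scipy import stats

# Hypothetical data, for illustration only
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
Y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])
n = X.size

x = X - X.mean()
b1 = np.sum(x * Y) / np.sum(x ** 2)        # slope estimate
b0 = Y.mean() - b1 * X.mean()              # intercept estimate

resid = Y - (b0 + b1 * X)
s2 = np.sum(resid ** 2) / (n - 2)          # estimated variance of y about the line
se_b1 = np.sqrt(s2 / np.sum(x ** 2))       # estimated standard error of the slope

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
print(b1 - t_crit * se_b1, b1 + t_crit * se_b1)   # 95% confidence interval for beta1
```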
Hypothesis testing
Hypothesis testing is done in much the same way as testing a hypothesis about a mean: we compute a t statistic from the estimate, the hypothesized value, and the estimated standard error, and compare it with the critical value from the t distribution with $n - 2$ degrees of freedom.
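For instance, to test $H_0: \beta_1 = 0$ one compares $t = \hat\beta_1 / s_{\hat\beta_1}$ with the t critical value. A sketch using the statsmodels library (the library and the data are assumptions chosen for illustration; the notes do not prescribe any particular software) reproduces these tests directly:

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data, for illustration only
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
Y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])

model = sm.OLS(Y, sm.add_constant(X)).fit()   # regression with intercept and slope
print(model.params)       # estimated coefficients b0_hat, b1_hat
print(model.tvalues)      # t statistics for H0: coefficient = 0
print(model.pvalues)      # two-sided p-values based on the t distribution (n-2 df)
print(model.conf_int())   # 95% confidence intervals for the coefficients
```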
Goodness of fit: R-squared
The R-squared (coefficient of determination) tells us the fraction of the variation in the dependent variable that is explained by the explanatory variable.
For an individual observation, we have the actual or observed value $Y_i$ and the fitted value $\hat Y_i$. Why does the observed value differ from the mean value $\bar Y$? Two factors explain this. First, the value of X for that observation may differ from the mean of X, so the dependent variable also takes a value different from the mean of y. This difference is
$\hat Y_i - \bar Y$
This is accounted for by the explanatory variable: it is what the regression line tries to explain.
Next, the value of y may also be different from the estimated value. This difference is
$Y_i - \hat Y_i$
This is due to random error. This is what the error term represents.
Now, the total deviation of the actual value of y from its mean is the sum of the two deviations:

$Y_i - \bar Y = (\hat Y_i - \bar Y) + (Y_i - \hat Y_i)$
Squaring both sides and summing over all observations (the cross-product term sums to zero under OLS) decomposes the total variation:

$\sum_{i=1}^n (Y_i - \bar Y)^2 = \sum_{i=1}^n (\hat Y_i - \bar Y)^2 + \sum_{i=1}^n (Y_i - \hat Y_i)^2$

The R-squared is the share of the total sum of squares that is explained:

$R^2 = \frac{\sum_{i=1}^n (\hat Y_i - \bar Y)^2}{\sum_{i=1}^n (Y_i - \bar Y)^2} = 1 - \frac{\sum_{i=1}^n (Y_i - \hat Y_i)^2}{\sum_{i=1}^n (Y_i - \bar Y)^2}$
Since OLS minimizes the unexplained (residual) sum of squares, it also maximizes the explained sum of squares, and hence the R-squared.
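A minimal sketch (again with the illustrative data used earlier) confirms that the two expressions for R-squared agree:

```python
import numpy as np

# Hypothetical data, for illustration only
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
Y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])

x = X - X.mean()
b1 = np.sum(x * Y) / np.sum(x ** 2)
b0 = Y.mean() - b1 * X.mean()
Y_hat = b0 + b1 * X                       # fitted values

ess = np.sum((Y_hat - Y.mean()) ** 2)     # explained sum of squares
rss = np.sum((Y - Y_hat) ** 2)            # residual (unexplained) sum of squares
tss = np.sum((Y - Y.mean()) ** 2)         # total sum of squares

print(ess / tss, 1 - rss / tss)           # both equal the R-squared
```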