
Stat 206: Linear Models

Lecture 3

Oct. 2, 2019
ReCap: Properties of LS Estimators

• LS estimators are unbiased: For all values of β0 , β1 ,

E (β̂0 ) = β0 , E (β̂1 ) = β1 .
• Variances of β̂0 , β̂1 :

 2 
 1 X 
σ {β̂0 } = σ  + Pn
2 2 
n 2
i =1 (Xi − X)
σ 2
σ2 {β̂1 } = Pn .
i =1 (Xi − X )2
Standard errors (SE) of the LS estimators.
• Replace σ2 by MSE:
 2 
 1 X 
s 2 {β̂0 } = MSE  + P  ,
n n 2
i =1 ( Xi − X )
MSE
s 2 {β̂1 } = Pn .
i =1 (Xi − X )2

• s{β̂0} and s{β̂1} are the SEs of β̂0 and β̂1, respectively.


• SEs decrease with the increase of ∑_{i=1}^n (Xi − X̄)² = (n − 1)s²_X, which in turn increases with the increase of the sample size n and the sample variance s²_X of X.
• SEs tend to increase with the increase of the error variance.
What are the implications?
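As an illustrative sketch (not from the slides), the two effects above can be checked numerically. The helper name `se_slope` and the example X values are made up for illustration; the formula is the slide's s{β̂1} = √(MSE / ∑(Xi − X̄)²):

```python
import math

def se_slope(xs, mse):
    """Standard error of the LS slope: sqrt(MSE / sum((x - xbar)^2))."""
    xbar = sum(xs) / len(xs)
    sxx = sum((x - xbar) ** 2 for x in xs)
    return math.sqrt(mse / sxx)

mse = 1.0  # hold the error-variance estimate fixed for comparison

narrow = [1.0, 2.0, 3.0, 4.0, 5.0]   # baseline design
bigger = narrow * 2                   # same spread, doubled sample size
wide = [0.0, 2.0, 4.0, 6.0, 8.0]     # same n, more dispersed X

print(se_slope(narrow, mse))  # largest SE
print(se_slope(bigger, mse))  # smaller: more cases
print(se_slope(wide, mse))    # smaller: more dispersed X
```

Both a larger sample and a more dispersed design shrink the slope's standard error, which is the point of the figure below.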
Figure: Effects of the dispersion of X on the variability of the fitted line
[Two scatter plots of y versus x over x ∈ [0, 6], differing in the dispersion of the X values.]
A Simulation Study

Simulate 100 data sets.


• n = 5 cases with the X values

X1 = 1.86, X2 = 0.22, X3 = 3.55, X4 = 3.29, X5 = 1.25,

fixed throughout all data sets.


• For each data set, the response variable is generated by:
• First generate ε1 , · · · , ε5 i.i.d. from N (0, 1).
• Then set the response variable as:

Yi = 2 + Xi + εi , i = 1, · · · , 5.

• For each data set, derive the LS estimators β̂0 , β̂1 and MSE.
• Data set 1:
case X Y
1 1.86 3.08
2 0.22 2.27
3 3.55 4.38
4 3.29 5.12
5 1.25 1.38
β̂0 = 1.34, β̂1 = 0.94, MSE = 0.79.
• Data set 2:
case X Y
1 1.86 2.91
2 0.22 2.13
3 3.55 5.35
4 3.29 5.76
5 1.25 2.01
β̂0 = 1.19, β̂1 = 1.20, MSE = 0.52.
• ..., ...
• Data set 100:
case X Y
1 1.86 3.36
2 0.22 2.50
3 3.55 5.93
4 3.29 5.36
5 1.25 2.67
β̂0 = 1.75, β̂1 = 1.09, MSE = 0.24.

Note how the Xi s are kept fixed and how the LS estimators vary
across these data sets.
Figure: Sampling distributions of β̂0 , β̂1 , MSE. Sample means are
1.99, 1.02, 1.04 respectively. True parameters are 2, 1, 1, respectively.

[Three histograms: beta_0_hat, beta_1_hat, and MSE across the 100 data sets.]
Figure: True: red solid; LS lines: grey broken; mean LS line: blue broken
[The 100 fitted LS lines plotted over x ∈ [0, 5], scattered around the true line.]
We calculate the sample mean and sample standard deviation of
these 100 realizations of β̂0 , β̂1 , respectively. Then compare them
to the respective theoretical values.
• β̂0 : Theoretical mean and standard deviation:
v
u
t  2 
 1 X 
E (β̂0 ) = β0 = 2, σ{β̂0 } = σ2  + P  = 0.854.
n n 2
i = 1 ( Xi − X )

Sample mean and sample standard deviation: 1.99, 0.847.


• β̂1 : Theoretical mean and standard deviation:
E(β̂1) = β1 = 1, σ{β̂1} = √( σ² / ∑_{i=1}^n (Xi − X̄)² ) = 0.358.

Sample mean and sample standard deviation: 1.002, 0.36.
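The whole simulation can be reproduced with a short script. This is a sketch, not the instructor's original code: the least-squares helper `ls_fit` and the seed are my own choices, but the fixed X values and data set 1 are taken from the slides:

```python
import random

def ls_fit(xs, ys):
    """Least-squares fit of y = b0 + b1*x; returns (b0_hat, b1_hat, mse)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sse = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
    return b0, b1, sse / (n - 2)

# X values fixed across all simulated data sets, as on the slide
xs = [1.86, 0.22, 3.55, 3.29, 1.25]

# Sanity check against data set 1 from the slide
b0, b1, mse = ls_fit(xs, [3.08, 2.27, 4.38, 5.12, 1.38])
print(round(b0, 2), round(b1, 2), round(mse, 2))  # 1.34 0.94 0.79

# Simulate 100 data sets from Y_i = 2 + X_i + eps_i, eps_i ~ N(0, 1)
random.seed(206)  # arbitrary seed, my choice
fits = []
for _ in range(100):
    ys = [2 + x + random.gauss(0, 1) for x in xs]
    fits.append(ls_fit(xs, ys))

mean_b0 = sum(f[0] for f in fits) / 100
mean_b1 = sum(f[1] for f in fits) / 100
print(mean_b0, mean_b1)  # both should be close to the true values 2 and 1
```

Note that only the errors are redrawn each time: the X values stay fixed, exactly as in the sampling framework of the slides.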


Normal Error Model

Normal error model: Simple regression model + normality assumption on the errors.
• Model equation:

Yi = β0 + β1 Xi + εi , i = 1, . . . , n.

• Model assumptions: The error terms εi are i.i.d. N (0, σ²).


Sampling Distributions of LS Estimators

Under the Normal error model:


• β̂0 , β̂1 are normally distributed:

β̂0 ∼ N (β0 , σ²{β̂0}), β̂1 ∼ N (β1 , σ²{β̂1}).

Notes: Use the facts (i) linear combinations of independent normal


random variables are still normal random variables; (ii) β̂0 , β̂1 are
linear combinations of the Yi s.
• SSE/σ² follows the χ² distribution with n − 2 degrees of freedom: SSE/σ² ∼ χ²(n−2).
• Moreover, SSE is independent of both β̂0 and β̂1.
Inference of Regression Coefficients
All inferences are under the Normal error model.
• Studentized pivotal quantity:

(β̂1 − β1)/s{β̂1} ∼ t(n−2) ,
where t(n−2) denotes the t-distribution with n − 2 degrees of


freedom.
• The numerator is the difference between the estimator and the
parameter.
• The denominator is the standard error of the estimator.
• This quantity follows a known distribution, i.e., the t-distribution.
Notes: Use the fact that if Z ∼ N (0, 1), S² ∼ χ²(k) and Z, S² are independent, then Z/√(S²/k) ∼ t(k).
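A quick Monte Carlo check of this pivotal result (my own sketch, not from the slides): if the pivot really follows t(n−2) = t(3) when n = 5, then intervals built with the tabled quantile t(0.975; 3) = 3.182 should cover the true β1 about 95% of the time.

```python
import math
import random

random.seed(0)                                # arbitrary seed, my choice
xs = [1.86, 0.22, 3.55, 3.29, 1.25]           # fixed X values, n = 5
xbar = sum(xs) / len(xs)
sxx = sum((x - xbar) ** 2 for x in xs)
t_crit = 3.182                                # t(0.975; 3), from a t-table

covered = 0
reps = 2000
for _ in range(reps):
    ys = [2 + x + random.gauss(0, 1) for x in xs]    # true beta1 = 1
    ybar = sum(ys) / len(ys)
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    b0 = ybar - b1 * xbar
    sse = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
    se_b1 = math.sqrt(sse / 3 / sxx)                 # s{b1}, with n - 2 = 3
    if abs(b1 - 1) <= t_crit * se_b1:                # |pivot| <= t(0.975; 3)
        covered += 1

print(covered / reps)  # should be close to the nominal 0.95
```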
Confidence Interval

(1 − α)-Confidence interval of β1 :

β̂1 ± t (1 − α/2; n − 2) · s{β̂1},

where t (1 − α/2; n − 2) is the (1 − α/2)th percentile of t(n−2) .

How to construct confidence intervals for β0 ?


Interpretation of Confidence Intervals

Figure: A Simulation Study

[100 simulated 90% CIs for beta_1, plotted against the true value.]

In repeated sampling, approximately 90% of such intervals cover the true β1; any single realized interval either does or does not.
Heights

• Recall n = 928, X̄ = 68.316, ∑_{i=1}^n Xi² = 4334058, and
∑_{i=1}^n (Xi − X̄)² = ∑_{i=1}^n Xi² − n(X̄)² = 3038.761. Also

β̂0 = 24.54, β̂1 = 0.637, MSE = 5.031.

So
s{β̂1} = √( MSE / ∑_{i=1}^n (Xi − X̄)² ) = √( 5.031/3038.761 ) = 0.0407.
• 95%-confidence interval of β1 :

0.637 ± t (0.975; 926) × 0.0407 = (0.557, 0.717).

We are 95% confident that the regression slope is in
between 0.557 and 0.717.
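As a sketch, the standard error and interval can be reproduced from the summary statistics above; the t quantile t(0.975; 926) ≈ 1.963 is taken as given from a t-table (the degrees of freedom are large, so it is essentially the normal value 1.96):

```python
import math

# Summary statistics from the heights example
n = 928
sxx = 3038.761          # sum of (X_i - Xbar)^2
b1_hat = 0.637
mse = 5.031

se_b1 = math.sqrt(mse / sxx)
print(round(se_b1, 4))  # 0.0407

t_crit = 1.963          # t(0.975; 926), from a t-table
lo, hi = b1_hat - t_crit * se_b1, b1_hat + t_crit * se_b1
print(round(lo, 3), round(hi, 3))  # 0.557 0.717
```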
T-tests

• Null hypothesis: H0 : β1 = β1⁽⁰⁾, where β1⁽⁰⁾ is a given constant.
• T-statistic:

T∗ = (β̂1 − β1⁽⁰⁾)/s{β̂1}.

• Null distribution of the T-statistic: t(n−2) .

Can you derive the null distribution?


Decision rule at significance level α.
• Two-sided alternative Ha : β1 ≠ β1⁽⁰⁾: Reject H0 if and only if
|T∗| > t (1 − α/2; n − 2), or equivalently, reject H0 if and only if
pvalue := P (|t(n−2)| > |T∗|) < α.
• Left-sided alternative Ha : β1 < β1⁽⁰⁾: Reject H0 if and only if
T∗ < t (α; n − 2), or equivalently, reject H0 if and only if
pvalue := P (t(n−2) < T∗) < α.
• Right-sided alternative Ha : β1 > β1⁽⁰⁾: Reject H0 if and only
if T∗ > t (1 − α; n − 2), or equivalently, reject H0 if and only if
pvalue := P (t(n−2) > T∗) < α.
The decision rule depends on the form of the alternative hypothesis.

Why are the critical value approach and the pvalue approach
equivalent? How to conduct hypothesis testing with regard to β0 ?
Heights

Test whether there is a linear association between parent’s height


and child’s height. Use significance level α = 0.01.
• The hypotheses: H0 : β1 = 0 vs. Ha : β1 ≠ 0.
• T statistic: T∗ = β̂1/s{β̂1} = 0.637/0.0407 = 15.66.
• Critical value: t (0.995; 926) = 2.576.
• Since |T∗| = 15.66 > 2.576, reject H0 at level 0.01.
• Or the pvalue = P (|t(926)| > 15.66) ≈ 0. Since the pvalue is less than α = 0.01, reject H0.

• Conclude that there is a significant linear association between parent's height and child's height at level 0.01.
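The test above can be carried out from the summary statistics with a few lines of arithmetic (a sketch; the critical value 2.576 is the tabled t(0.995; 926), which for large degrees of freedom is essentially the normal quantile z_{0.995}):

```python
import math

# Heights example: test H0: beta1 = 0 vs Ha: beta1 != 0 at alpha = 0.01
b1_hat = 0.637
mse, sxx = 5.031, 3038.761
se_b1 = math.sqrt(mse / sxx)

t_star = b1_hat / se_b1
print(round(t_star, 2))  # about 15.66

t_crit = 2.576                    # t(0.995; 926), from a t-table
reject = abs(t_star) > t_crit
print(reject)  # True: reject H0, the linear association is significant
```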
