Lecture 3
Test of Hypothesis and Confidence Interval
Testing of hypotheses and confidence interval estimation for slope parameter:
Now we consider tests of hypotheses and confidence interval estimation for the
slope parameter $\beta_1$ of the model under two cases: when $\sigma^2$ is known and
when $\sigma^2$ is unknown.
Case 1: When $\sigma^2$ is known:
Consider the simple linear regression model $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, $i = 1, \dots, n$. It is assumed that
the $\varepsilon_i$'s are independent and identically distributed and follow $N(0, \sigma^2)$.
First, we develop a test for the null hypothesis about the slope parameter
$$H_0 : \beta_1 = \beta_{10}$$
where $\beta_{10}$ is some given constant.
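Assuming $\sigma^2$ to be known, we know that $E(\hat\beta_1) = \beta_1$, $\operatorname{var}(\hat\beta_1) = \sigma^2 / S_{xx}$, and $\hat\beta_1$ is a
linear combination of the normally distributed $y_i$'s. So
$$\hat\beta_1 \sim N\left(\beta_1, \frac{\sigma^2}{S_{xx}}\right)$$
and so the following statistic, evaluated at the hypothesized value $\beta_{10}$, can be constructed:
$$Z_1 = \frac{\hat\beta_1 - \beta_{10}}{\sqrt{\dfrac{\sigma^2}{S_{xx}}}}$$
which is distributed as $N(0,1)$ when $H_0$ is true.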
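A decision rule to test $H_0$ against the two-sided alternative $H_1 : \beta_1 \neq \beta_{10}$ can be framed as follows:
Reject $H_0$ if $|Z_1| > z_{\alpha/2}$
where $z_{\alpha/2}$ is the upper $100(\alpha/2)\%$ point of the standard normal distribution, i.e. $P(Z > z_{\alpha/2}) = \alpha/2$ for $Z \sim N(0,1)$.
The decision rule for a one-sided alternative hypothesis can be framed similarly.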
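The decision rule above, including the one-sided variants, can be sketched in Python as follows. This is a minimal illustration, not part of the original notes: the function name `slope_z_test`, its parameters, and the numeric inputs are all hypothetical, and the critical value comes from the standard library's `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def slope_z_test(b1_hat, b10, sigma2, sxx, alpha=0.05, alternative="two-sided"):
    """Z test of H0: beta1 = b10 when sigma^2 is known (illustrative helper)."""
    # Z1 = (beta1_hat - beta10) / sqrt(sigma^2 / Sxx)
    z1 = (b1_hat - b10) / sqrt(sigma2 / sxx)
    std_normal = NormalDist()  # N(0, 1)

    if alternative == "two-sided":        # H1: beta1 != b10
        crit = std_normal.inv_cdf(1 - alpha / 2)   # z_{alpha/2}
        reject = abs(z1) > crit
    elif alternative == "greater":        # H1: beta1 > b10
        crit = std_normal.inv_cdf(1 - alpha)       # z_{alpha}
        reject = z1 > crit
    elif alternative == "less":           # H1: beta1 < b10
        crit = std_normal.inv_cdf(1 - alpha)       # z_{alpha}
        reject = z1 < -crit
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return z1, crit, reject


# Hypothetical inputs, chosen only to illustrate the call.
z1, crit, reject = slope_z_test(b1_hat=0.25, b10=0.0, sigma2=4.0, sxx=1600.0)
print(round(z1, 3), round(crit, 3), reject)
```

With these made-up inputs the standard error is 0.05, so $Z_1$ is about 5, which exceeds $z_{0.025} \approx 1.96$ and leads to rejection of $H_0$.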
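The $100(1 - \alpha)\%$ confidence interval for $\beta_1$ is obtained from the same standardized quantity, now centered at the true $\beta_1$, which follows $N(0,1)$:
$$P\left[-z_{\alpha/2} \le \frac{\hat\beta_1 - \beta_1}{\sqrt{\dfrac{\sigma^2}{S_{xx}}}} \le z_{\alpha/2}\right] = 1 - \alpha$$
$$P\left[\hat\beta_1 - z_{\alpha/2}\sqrt{\frac{\sigma^2}{S_{xx}}} \le \beta_1 \le \hat\beta_1 + z_{\alpha/2}\sqrt{\frac{\sigma^2}{S_{xx}}}\right] = 1 - \alpha$$
So the $100(1 - \alpha)\%$ confidence interval for $\beta_1$ is
$$\left[\hat\beta_1 - z_{\alpha/2}\sqrt{\frac{\sigma^2}{S_{xx}}},\; \hat\beta_1 + z_{\alpha/2}\sqrt{\frac{\sigma^2}{S_{xx}}}\right]$$
where $z_{\alpha/2}$ is the upper $100(\alpha/2)\%$ point of the $N(0,1)$ distribution.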
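As a rough numerical illustration of this interval, the sketch below computes it in Python. The helper name `slope_z_interval` and the input values are hypothetical and serve only to show the arithmetic.

```python
from math import sqrt
from statistics import NormalDist


def slope_z_interval(b1_hat, sigma2, sxx, alpha=0.05):
    """100(1 - alpha)% interval for beta1 when sigma^2 is known (illustrative helper)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    half_width = z * sqrt(sigma2 / sxx)       # z_{alpha/2} * sqrt(sigma^2 / Sxx)
    return b1_hat - half_width, b1_hat + half_width


# Hypothetical inputs, chosen only to illustrate the call.
lower, upper = slope_z_interval(b1_hat=0.25, sigma2=4.0, sxx=1600.0)
print(round(lower, 4), round(upper, 4))
```

A hypothesized value $\beta_{10}$ is rejected by the two-sided test at level $\alpha$ exactly when it falls outside this interval. Here the interval is roughly (0.152, 0.348), so $\beta_{10} = 0$ would be rejected at the 5% level.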
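Case 2: When $\sigma^2$ is unknown:
When $\sigma^2$ is unknown, we replace it by its estimator $\hat\sigma^2 = SS_{res}/(n - 2)$.
The following statistic can be constructed:
$$t_0 = \frac{\hat\beta_1 - \beta_{10}}{\sqrt{\dfrac{\hat\sigma^2}{S_{xx}}}} = \frac{\hat\beta_1 - \beta_{10}}{\sqrt{\dfrac{SS_{res}}{(n - 2) S_{xx}}}}$$
which follows a t-distribution with $(n - 2)$ degrees of freedom, denoted as $t_{n-2}$,
when $H_0$ is true.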
A decision rule to test $H_0$ against $H_1 : \beta_1 \neq \beta_{10}$ can be framed as follows:
Reject $H_0$ if $|t_0| > t_{n-2,\alpha/2}$
where $t_{n-2,\alpha/2}$ is the upper $100(\alpha/2)\%$ point of the t-distribution with $(n - 2)$ degrees of
freedom. The decision rule for a one-sided alternative hypothesis can be framed
similarly.
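For reference, the one-sided rules mentioned above would, in the same notation, take the following form. This is a sketch of the standard construction, in which the whole level $\alpha$ is placed in one tail:

```latex
\begin{align*}
H_1 : \beta_1 > \beta_{10} &: \quad \text{reject } H_0 \text{ if } t_0 > t_{n-2,\alpha} \\
H_1 : \beta_1 < \beta_{10} &: \quad \text{reject } H_0 \text{ if } t_0 < -t_{n-2,\alpha}
\end{align*}
```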
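The $100(1 - \alpha)\%$ confidence interval for $\beta_1$ is obtained from the same studentized quantity, now centered at the true $\beta_1$, which follows $t_{n-2}$:
$$P\left[-t_{n-2,\alpha/2} \le \frac{\hat\beta_1 - \beta_1}{\sqrt{\dfrac{SS_{res}}{(n - 2) S_{xx}}}} \le t_{n-2,\alpha/2}\right] = 1 - \alpha$$
$$P\left[\hat\beta_1 - t_{n-2,\alpha/2}\sqrt{\frac{SS_{res}}{(n - 2) S_{xx}}} \le \beta_1 \le \hat\beta_1 + t_{n-2,\alpha/2}\sqrt{\frac{SS_{res}}{(n - 2) S_{xx}}}\right] = 1 - \alpha$$
So the $100(1 - \alpha)\%$ confidence interval for $\beta_1$ is
$$\left[\hat\beta_1 - t_{n-2,\alpha/2}\sqrt{\frac{SS_{res}}{(n - 2) S_{xx}}},\; \hat\beta_1 + t_{n-2,\alpha/2}\sqrt{\frac{SS_{res}}{(n - 2) S_{xx}}}\right]$$
where $t_{n-2,\alpha/2}$ is the upper $100(\alpha/2)\%$ point of the t-distribution with $n - 2$ degrees of
freedom.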
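The test statistic and interval for the unknown-variance case can be sketched as below. The function name `slope_t_inference` and its parameters are illustrative. Because the Python standard library has no t-distribution, the critical value $t_{n-2,\alpha/2}$ is assumed to be read from a t table and passed in. The inputs are the rounded summary values from the income and food expenditure example in this lecture.

```python
from math import sqrt


def slope_t_inference(b1_hat, ss_res, sxx, n, t_crit, b10=0.0):
    """t statistic for H0: beta1 = b10 and the matching confidence interval.

    t_crit is t_{n-2, alpha/2}, taken from a t table (an input, not computed here).
    """
    se = sqrt(ss_res / ((n - 2) * sxx))   # estimated standard error of beta1_hat
    t0 = (b1_hat - b10) / se
    half_width = t_crit * se
    return t0, se, (b1_hat - half_width, b1_hat + half_width)


# Rounded summary values from the seven-household example; t_{5, 0.025} = 2.571.
t0, se, ci = slope_t_inference(
    b1_hat=0.2525, ss_res=12.7025, sxx=1772.8571, n=7, t_crit=2.571
)
print(round(t0, 3), round(se, 6), tuple(round(v, 4) for v in ci))
```

Passing the critical value as an argument keeps the sketch dependency-free. With a statistics package available, it could instead be computed from the t quantile function.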
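The quantities used above are computed as
$$S_{xx} = \sum (x_i - \bar x)^2 = \sum x_i^2 - \frac{(\sum x_i)^2}{n}$$
$$S_{yy} = \sum (y_i - \bar y)^2 = \sum y_i^2 - \frac{(\sum y_i)^2}{n}$$
$$S_{xy} = \sum (x_i - \bar x)(y_i - \bar y) = \sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}$$
$$SS_{res} = S_{yy} - \hat\beta_1 S_{xy}$$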
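To check that the deviation form and the shortcut form of these sums agree, one might compute both, as in the sketch below. The function name `sums_of_squares` is hypothetical. The data are the income and food expenditure values from the example in this lecture.

```python
def sums_of_squares(x, y):
    """Return Sxx, Syy, Sxy (both forms), the slope estimate and SS_res."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Deviation form: sums of squared / cross deviations from the means.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    # Shortcut form: raw sums minus the correction term.
    sxx_short = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
    syy_short = sum(yi ** 2 for yi in y) - sum(y) ** 2 / n
    sxy_short = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n

    b1_hat = sxy / sxx               # least-squares slope
    ss_res = syy - b1_hat * sxy      # residual sum of squares
    return {
        "sxx": sxx, "syy": syy, "sxy": sxy,
        "sxx_short": sxx_short, "syy_short": syy_short, "sxy_short": sxy_short,
        "b1_hat": b1_hat, "ss_res": ss_res,
    }


income = [55, 83, 38, 61, 33, 49, 67]
food = [14, 24, 13, 16, 9, 15, 17]
result = sums_of_squares(income, food)
for name, value in result.items():
    print(name, round(value, 4))
```

Because the slope is not rounded here, `ss_res` comes out near 12.7214 rather than the 12.7025 obtained by hand with $\hat\beta_1$ rounded to 0.2525.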
Testing of hypotheses and confidence interval estimation for intercept term:
Now we consider tests of hypotheses and confidence interval estimation for the
intercept parameter $\beta_0$ of the model under two cases: when $\sigma^2$ is known and
when $\sigma^2$ is unknown.
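Case 1: When $\sigma^2$ is known:
Suppose the null hypothesis under consideration is
$$H_0 : \beta_0 = \beta_{00}$$
where $\beta_{00}$ is some given constant.
Assuming $\sigma^2$ to be known, we know that $E(\hat\beta_0) = \beta_0$, $\operatorname{var}(\hat\beta_0) = \sigma^2\left(\dfrac{1}{n} + \dfrac{\bar x^2}{S_{xx}}\right)$,
and $\hat\beta_0$ is a linear combination of the normally distributed $y_i$'s. So
$$\hat\beta_0 \sim N\left(\beta_0,\; \sigma^2\left(\frac{1}{n} + \frac{\bar x^2}{S_{xx}}\right)\right)$$
and so the following statistic can be constructed:
$$Z_0 = \frac{\hat\beta_0 - \beta_{00}}{\sqrt{\sigma^2\left(\dfrac{1}{n} + \dfrac{\bar x^2}{S_{xx}}\right)}}$$
which is distributed as $N(0,1)$ when $H_0$ is true.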
A decision rule to test $H_0$ against $H_1 : \beta_0 \neq \beta_{00}$ can be framed as follows:
Reject $H_0$ if $|Z_0| > z_{\alpha/2}$
where $z_{\alpha/2}$ is the upper $100(\alpha/2)\%$ point of the standard normal distribution.
The decision rule for a one-sided alternative hypothesis can be framed similarly.
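The $100(1 - \alpha)\%$ confidence interval for $\beta_0$ is obtained from the same standardized quantity, now centered at the true $\beta_0$, which follows $N(0,1)$:
$$P\left[-z_{\alpha/2} \le \frac{\hat\beta_0 - \beta_0}{\sqrt{\sigma^2\left(\dfrac{1}{n} + \dfrac{\bar x^2}{S_{xx}}\right)}} \le z_{\alpha/2}\right] = 1 - \alpha$$
$$P\left[\hat\beta_0 - z_{\alpha/2}\sqrt{\sigma^2\left(\frac{1}{n} + \frac{\bar x^2}{S_{xx}}\right)} \le \beta_0 \le \hat\beta_0 + z_{\alpha/2}\sqrt{\sigma^2\left(\frac{1}{n} + \frac{\bar x^2}{S_{xx}}\right)}\right] = 1 - \alpha$$
So the $100(1 - \alpha)\%$ confidence interval for $\beta_0$ is
$$\left[\hat\beta_0 - z_{\alpha/2}\sqrt{\sigma^2\left(\frac{1}{n} + \frac{\bar x^2}{S_{xx}}\right)},\; \hat\beta_0 + z_{\alpha/2}\sqrt{\sigma^2\left(\frac{1}{n} + \frac{\bar x^2}{S_{xx}}\right)}\right]$$
where $z_{\alpha/2}$ is the upper $100(\alpha/2)\%$ point of the $N(0,1)$ distribution.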
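The known-variance test and interval for the intercept can be sketched together as below. The helper name `intercept_z_inference` and all numeric inputs are hypothetical, chosen only so the arithmetic is easy to follow.

```python
from math import sqrt
from statistics import NormalDist


def intercept_z_inference(b0_hat, b00, sigma2, n, x_bar, sxx, alpha=0.05):
    """Z test of H0: beta0 = b00 and the interval for beta0, sigma^2 known."""
    # Standard error: sqrt(sigma^2 * (1/n + x_bar^2 / Sxx))
    se = sqrt(sigma2 * (1 / n + x_bar ** 2 / sxx))
    z0 = (b0_hat - b00) / se
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    reject = abs(z0) > z
    return z0, reject, (b0_hat - z * se, b0_hat + z * se)


# Hypothetical inputs: 1/n + x_bar^2/Sxx = 0.1 + 0.1 = 0.2, so se = sqrt(0.8).
z0, reject, ci = intercept_z_inference(
    b0_hat=1.5, b00=0.0, sigma2=4.0, n=10, x_bar=3.0, sxx=90.0
)
print(round(z0, 4), reject, tuple(round(v, 4) for v in ci))
```

In this made-up case $|Z_0|$ is about 1.68, below $z_{0.025} \approx 1.96$, so $H_0$ is not rejected and the interval contains 0.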
Case 2: When $\sigma^2$ is unknown:
When $\sigma^2$ is unknown, we replace it by its estimator $SS_{res}/(n - 2)$.
The following statistic can be constructed:
$$t_0 = \frac{\hat\beta_0 - \beta_{00}}{\sqrt{\dfrac{SS_{res}}{n - 2}\left(\dfrac{1}{n} + \dfrac{\bar x^2}{S_{xx}}\right)}}$$
which follows a t-distribution with $(n - 2)$ degrees of freedom, denoted as $t_{n-2}$,
when $H_0$ is true.
A decision rule to test $H_0$ against $H_1 : \beta_0 \neq \beta_{00}$ can be framed as follows:
Reject $H_0$ if $|t_0| > t_{n-2,\alpha/2}$
where $t_{n-2,\alpha/2}$ is the upper $100(\alpha/2)\%$ point of the t-distribution with $(n - 2)$ degrees of
freedom. The decision rule for a one-sided alternative hypothesis can be framed
similarly.
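The $100(1 - \alpha)\%$ confidence interval for $\beta_0$ is obtained from the same studentized quantity, now centered at the true $\beta_0$, which follows $t_{n-2}$:
$$P\left[-t_{n-2,\alpha/2} \le \frac{\hat\beta_0 - \beta_0}{\sqrt{\dfrac{SS_{res}}{n - 2}\left(\dfrac{1}{n} + \dfrac{\bar x^2}{S_{xx}}\right)}} \le t_{n-2,\alpha/2}\right] = 1 - \alpha$$
$$P\left[\hat\beta_0 - t_{n-2,\alpha/2}\sqrt{\frac{SS_{res}}{n - 2}\left(\frac{1}{n} + \frac{\bar x^2}{S_{xx}}\right)} \le \beta_0 \le \hat\beta_0 + t_{n-2,\alpha/2}\sqrt{\frac{SS_{res}}{n - 2}\left(\frac{1}{n} + \frac{\bar x^2}{S_{xx}}\right)}\right] = 1 - \alpha$$
So the $100(1 - \alpha)\%$ confidence interval for $\beta_0$ is
$$\left[\hat\beta_0 - t_{n-2,\alpha/2}\sqrt{\frac{SS_{res}}{n - 2}\left(\frac{1}{n} + \frac{\bar x^2}{S_{xx}}\right)},\; \hat\beta_0 + t_{n-2,\alpha/2}\sqrt{\frac{SS_{res}}{n - 2}\left(\frac{1}{n} + \frac{\bar x^2}{S_{xx}}\right)}\right]$$
where $t_{n-2,\alpha/2}$ is the upper $100(\alpha/2)\%$ point of the t-distribution with $n - 2$ degrees of
freedom.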
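A minimal sketch of the intercept calculation with unknown variance follows, working directly from raw data. The function name `intercept_t_inference` is hypothetical, and the critical value is assumed to come from a t table. The data are the seven households from the example in this lecture, so the output can be compared with the "(Constant)" row of the SPSS table given there.

```python
from math import sqrt


def intercept_t_inference(x, y, t_crit, b00=0.0):
    """t statistic for H0: beta0 = b00 and the interval for beta0, sigma^2 unknown.

    t_crit is t_{n-2, alpha/2}, taken from a t table (an input, not computed here).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    b1_hat = sxy / sxx
    b0_hat = y_bar - b1_hat * x_bar
    ss_res = syy - b1_hat * sxy

    # Standard error: sqrt( SS_res/(n-2) * (1/n + x_bar^2/Sxx) )
    se = sqrt(ss_res / (n - 2) * (1 / n + x_bar ** 2 / sxx))
    t0 = (b0_hat - b00) / se
    return b0_hat, se, t0, (b0_hat - t_crit * se, b0_hat + t_crit * se)


income = [55, 83, 38, 61, 33, 49, 67]
food = [14, 24, 13, 16, 9, 15, 17]
b0_hat, se, t0, ci = intercept_t_inference(income, food, t_crit=2.571)
print(round(b0_hat, 3), round(se, 3), round(t0, 3), tuple(round(v, 3) for v in ci))
```

Since $|t_0|$ is well below $t_{5,0.025} = 2.571$, the data give no evidence against $H_0 : \beta_0 = 0$ at the 5% level, and the interval contains 0.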
Example:
Test the hypotheses
$$H_0 : \beta_1 = 0 \quad \text{against} \quad H_1 : \beta_1 \neq 0$$
and
$$H_0 : \beta_0 = 0 \quad \text{against} \quad H_1 : \beta_0 \neq 0$$
and construct 95% confidence intervals for $\beta_1$ and $\beta_0$ for the regression line fitted to the data
on incomes and food expenditures of the seven households given in the following
table.
Income, x              55   83   38   61   33   49   67
Food expenditure, y    14   24   13   16    9   15   17
Solution: We first find the least-squares estimates $\hat\beta_0$ and $\hat\beta_1$ for the regression model. The table
below shows the calculations required for their computation.
Income, x    Food expenditure, y    xy      x^2      y^2
55           14                     770     3025     196
83           24                     1992    6889     576
38           13                     494     1444     169
61           16                     976     3721     256
33            9                     297     1089      81
49           15                     735     2401     225
67           17                     1139    4489     289
Totals: $\sum x = 386$, $\sum y = 108$, $\sum xy = 6403$, $\sum x^2 = 23{,}058$, $\sum y^2 = 1792$
The test statistic for the slope is
$$t_0 = \frac{\hat\beta_1 - \beta_{10}}{\sqrt{\dfrac{SS_{res}}{(n - 2) S_{xx}}}}$$
which follows a t-distribution with $(n - 2)$ degrees of freedom, denoted as $t_{n-2}$,
when $H_0$ is true.
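$$\hat\beta_1 = \frac{\sum xy - \dfrac{(\sum x)(\sum y)}{n}}{\sum x^2 - \dfrac{(\sum x)^2}{n}} = \frac{6403 - \dfrac{386 \times 108}{7}}{23058 - \dfrac{(386)^2}{7}} = \frac{447.5714}{1772.8571} = 0.2525$$
$$\bar x = \frac{386}{7} = 55.1429, \qquad \bar y = \frac{108}{7} = 15.4286$$
$$\hat\beta_0 = \bar y - \hat\beta_1 \bar x = 15.4286 - (0.2525)(55.1429) = 1.5050$$
(Carrying the unrounded slope 0.25246 through gives $\hat\beta_0 = 1.5073$, which is the value 1.507 reported by SPSS.)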
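$$S_{xx} = \sum x_i^2 - \frac{(\sum x_i)^2}{n} = 23058 - \frac{(386)^2}{7} = 1772.8571$$
$$S_{yy} = \sum y_i^2 - \frac{(\sum y_i)^2}{n} = 1792 - \frac{(108)^2}{7} = 125.7143$$
$$S_{xy} = \sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n} = 6403 - \frac{386 \times 108}{7} = 447.5714$$
$$SS_{res} = S_{yy} - \hat\beta_1 S_{xy} = 125.7143 - 0.2525 \times 447.5714 = 12.7025$$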
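$$t_0 = \frac{\hat\beta_1 - \beta_{10}}{\sqrt{\dfrac{SS_{res}}{(n - 2) S_{xx}}}} = \frac{0.2525 - 0}{\sqrt{\dfrac{12.7025}{(7 - 2)(1772.8571)}}} = \frac{0.2525}{\sqrt{\dfrac{12.7025}{8864.2855}}} = \frac{0.2525}{0.037855} = 6.670$$
d.f. = 7 − 2 = 5
From the t-distribution table, $t_{5,\,0.025} = 2.571$. Since $|t_0| = 6.670 > 2.571$, $H_0 : \beta_1 = 0$ is rejected at the 5% level of significance. The two-sided p-value is approximately .001.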
SPSS Result
Coefficients (a)
                  Unstandardized Coefficients   Standardized Coefficients                     95.0% Confidence Interval for B
Model             B         Std. Error          Beta                         t        Sig.    Lower Bound    Upper Bound
1  (Constant)     1.507     2.174                                            .693     .519    -4.082         7.096
   x              .252      .038                .948                         6.664    .001    .155           .350
a. Dependent Variable: y
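The "x" row of this output can be reproduced from the raw data without intermediate rounding, as in the sketch below. The function name `slope_row` is hypothetical, and the critical value $t_{5,0.025} = 2.571$ is taken from a t table. The standardized coefficient is computed as $\hat\beta_1 \sqrt{S_{xx}/S_{yy}}$, which in simple linear regression equals the sample correlation between x and y.

```python
from math import sqrt


def slope_row(x, y, t_crit):
    """Slope estimate, standard error, standardized beta, t and interval bounds."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    b1_hat = sxy / sxx                       # unrounded slope
    ss_res = syy - b1_hat * sxy              # unrounded residual sum of squares
    se = sqrt(ss_res / ((n - 2) * sxx))      # standard error of the slope
    beta_std = b1_hat * sqrt(sxx / syy)      # standardized coefficient
    t0 = b1_hat / se                         # test of H0: beta1 = 0
    return b1_hat, se, beta_std, t0, b1_hat - t_crit * se, b1_hat + t_crit * se


income = [55, 83, 38, 61, 33, 49, 67]
food = [14, 24, 13, 16, 9, 15, 17]
row = slope_row(income, food, t_crit=2.571)
print([round(v, 3) for v in row])
```

The unrounded calculation gives a t value of about 6.664, as in the SPSS output. The hand calculation differs slightly in the third decimal place because it uses $\hat\beta_1$ rounded to 0.2525.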
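So the $100(1 - \alpha)\%$ confidence interval for $\beta_1$ is
$$\left[\hat\beta_1 - t_{n-2,\alpha/2}\sqrt{\frac{SS_{res}}{(n - 2) S_{xx}}},\; \hat\beta_1 + t_{n-2,\alpha/2}\sqrt{\frac{SS_{res}}{(n - 2) S_{xx}}}\right]$$
With $\alpha = 0.05$ and $n - 2 = 5$ degrees of freedom, $t_{5,\,0.025} = 2.571$, so the 95% confidence interval is
$$\left[0.2525 - 2.571\sqrt{\frac{12.7025}{8864.2855}},\; 0.2525 + 2.571\sqrt{\frac{12.7025}{8864.2855}}\right]$$
$$= [0.2525 - 2.571 \times 0.037855,\; 0.2525 + 2.571 \times 0.037855]$$
$$= [0.2525 - 0.0973,\; 0.2525 + 0.0973]$$
$$= [0.1552,\; 0.3498]$$
This agrees with the SPSS bounds (.155, .350). The interval does not contain 0, which is consistent with rejecting $H_0 : \beta_1 = 0$.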
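Theorem:
The regression coefficients (slopes) are independent of a change of origin, but not of
a change of scale. Independence of origin means that subtracting any constant from the
values of X and Y has no effect on the regression coefficients. Rescaling X or Y, by
contrast, changes them. (The intercept $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$ does change when the origin is shifted.)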
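The theorem can be checked numerically on the income and food expenditure data from the example. In the sketch below, the function name `slope` and the shift and scale constants (50, 10 and a division by 10) are arbitrary illustrative choices.

```python
def slope(x, y):
    """Least-squares slope Sxy / Sxx."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


income = [55, 83, 38, 61, 33, 49, 67]
food = [14, 24, 13, 16, 9, 15, 17]

b_original = slope(income, food)

# Change of origin: subtract constants from X and Y; the slope is unchanged.
b_shifted = slope([xi - 50 for xi in income], [yi - 10 for yi in food])

# Change of scale: divide X by 10; the slope is multiplied by 10.
b_scaled = slope([xi / 10 for xi in income], food)

print(round(b_original, 4), round(b_shifted, 4), round(b_scaled, 4))
```

The first two values coincide, while the third is ten times larger, because dividing X by a constant c multiplies $S_{xy}$ by 1/c and $S_{xx}$ by 1/c², so the slope is multiplied by c.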
Assignment
Test the hypotheses
$$H_0 : \beta_0 = 0 \quad \text{against} \quad H_1 : \beta_0 \neq 0$$
and construct a 95% confidence interval for $\beta_0$ for the regression line fitted to the data on
incomes and food expenditures of the seven households given in the following
table.
Income, x              55   83   38   61   33   49   67
Food expenditure, y    14   24   13   16    9   15   17