Chapter 03 Inferences
Interval Estimation
Confidence Interval Estimate for the Intercept α

Let a be the unbiased estimate of α computed from the values of a small random sample of size n selected from a bivariate normal population, having mean α and standard deviation

σ_a = σ_{y.x} √[1/n + X̄²/Σ(X − X̄)²].

As σ_{y.x} is unknown, it is replaced by its estimate s_{y.x}, giving

s_a = s_{y.x} √[1/n + X̄²/Σ(X − X̄)²]

If the sample size is small, the sampling distribution of a follows the t-distribution with ν = n − 2 degrees of freedom:

t = (a − α)/s_a ~ t(ν)
Now to construct a 100(1 − α)% confidence interval for α, we choose the two values (−t_{α/2}, t_{α/2}), since they give the shortest confidence interval for α. Thus we make the following probability statement:

P[−t_{α/2} < t < t_{α/2}] = 1 − α

−t_{α/2} < t < t_{α/2}
−t_{α/2} < (a − α)/s_a < t_{α/2}
−t_{α/2} s_a < a − α < t_{α/2} s_a
−a − t_{α/2} s_a < −α < −a + t_{α/2} s_a
a + t_{α/2} s_a > α > a − t_{α/2} s_a
a − t_{α/2} s_a < α < a + t_{α/2} s_a

Or we can write it as:

a ± t_{α/2}(ν) s_a
Confidence Interval Estimate for the Slope β

If the sample size is small, the sampling distribution of b follows the t-distribution with ν = n − 2 degrees of freedom:

t = (b − β)/s_b ~ t(ν)

Now to construct a 100(1 − α)% confidence interval for β, we choose the two values (−t_{α/2}, t_{α/2}), since they give the shortest confidence interval for β. Thus we make the following probability statement:

P[−t_{α/2} < t < t_{α/2}] = 1 − α

−t_{α/2} < t < t_{α/2}
−t_{α/2} < (b − β)/s_b < t_{α/2}
−t_{α/2} s_b < b − β < t_{α/2} s_b
−b − t_{α/2} s_b < −β < −b + t_{α/2} s_b
b + t_{α/2} s_b > β > b − t_{α/2} s_b
b − t_{α/2} s_b < β < b + t_{α/2} s_b

Or we can write it as:

b ± t_{α/2}(ν) s_b
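These two interval formulas can be sketched numerically. The snippet below is a minimal illustration with a hypothetical five-point data set (not from the notes); scipy supplies the t quantile:

```python
import math
from scipy import stats

# Hypothetical sample (any small bivariate data set would do)
X = [1, 2, 3, 4, 5]
Y = [3, 4, 2, 6, 8]
n = len(X)
xbar = sum(X) / n
ybar = sum(Y) / n

Sxx = sum((x - xbar) ** 2 for x in X)              # sum (X - Xbar)^2
Sxy = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y))

b = Sxy / Sxx                                      # slope estimate
a = ybar - b * xbar                                # intercept estimate

SSE = sum((y - (a + b * x)) ** 2 for x, y in zip(X, Y))
s_yx = math.sqrt(SSE / (n - 2))                    # s_{y.x}

s_a = s_yx * math.sqrt(1 / n + xbar**2 / Sxx)      # standard error of a
s_b = s_yx / math.sqrt(Sxx)                        # standard error of b

t_crit = stats.t.ppf(0.975, df=n - 2)              # t_{alpha/2}(n - 2), alpha = 0.05
ci_a = (a - t_crit * s_a, a + t_crit * s_a)        # a +/- t_{alpha/2} s_a
ci_b = (b - t_crit * s_b, b + t_crit * s_b)        # b +/- t_{alpha/2} s_b
```

With ν = 3 the t quantile is 3.182, the value used in the worked example later in the chapter.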
Confidence Interval for the Mean Value of the Dependent Variable for a Specified X = X₀ (μ_{Y.X₀})

Let Ŷ₀ = a + bX₀ be the estimate of μ_{Y.X₀} computed from the values of a small random sample of size n selected from a bivariate normal population having mean μ_{Y.X₀} and standard deviation

σ_{Y.X} √[1/n + (X₀ − X̄)²/Σ(X − X̄)²].

With σ_{Y.X} replaced by its estimate s_{Y.X},

t = (Ŷ₀ − μ_{Y.X₀}) / ( s_{Y.X} √[1/n + (X₀ − X̄)²/Σ(X − X̄)²] ) ~ t(ν)

so the 100(1 − α)% confidence interval is

Ŷ₀ ± t_{α/2}(ν) s_{Y.X} √[1/n + (X₀ − X̄)²/Σ(X − X̄)²]

or, writing s_Ŷ for the standard error,

Ŷ₀ ± t_{α/2}(ν) s_Ŷ
i. Confidence Interval for α

s_a = s_{y.x} √[1/n + X̄²/Σ(X − X̄)²]
s_a = 100.16 √[1/5 + (450/5)²/9000] = 105.05

a ± t_{α/2}(ν) s_a
19.30 ± 3.182 × 105.05
19.30 ± 334.27
(−314.97, 353.57)
ii. Confidence Interval for β

s_b = s_{y.x}/√Σ(X − X̄)² = 100.16/√9000 = 1.056

b ± t_{α/2}(ν) s_b
1.03 ± 3.182 × 1.056
1.03 ± 3.36
(−2.33, 4.39)
iii. Estimate of Y when X = 60

Ŷ₀ = 19.30 + 1.03 × 60 = 81.10

s_Ŷ = s_{Y.X} √[1/n + (X₀ − X̄)²/Σ(X − X̄)²]
s_Ŷ = 100.16 √[1/5 + (60 − 90)²/9000] = 54.86

The 95% confidence interval for μ_{Y.X₀} is given below:

Ŷ₀ ± t_{α/2}(ν) s_Ŷ
81.10 ± 3.182 × 54.86
81.10 ± 174.56
(−93.46, 255.66)
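As a check, part iii can be reproduced directly from the summary statistics quoted above (n = 5, X̄ = 90, Σ(X − X̄)² = 9000, s_{y.x} = 100.16); this is a sketch, not part of the original solution:

```python
import math

# Summary statistics from the worked example
n = 5
xbar = 90           # = 450 / 5
Sxx = 9000          # sum (X - Xbar)^2
s_yx = 100.16       # standard error of estimate
a, b = 19.30, 1.03  # fitted coefficients
x0 = 60

y0_hat = a + b * x0                                    # point estimate of mu_{Y.X0}
s_yhat = s_yx * math.sqrt(1 / n + (x0 - xbar) ** 2 / Sxx)

t_crit = 3.182                                         # t_{0.025}(3)
half_width = t_crit * s_yhat
ci = (y0_hat - half_width, y0_hat + half_width)        # 95% CI for mu_{Y.X0}
```

The point estimate 81.10 and standard error 54.86 match the hand computation.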
iv. Confidence Interval for σ²_{Y.X}

(n − 2)s²_{Y.X}/χ²_{α/2}(n − 2) < σ²_{Y.X} < (n − 2)s²_{Y.X}/χ²_{1−α/2}(n − 2)

(3)(100.16)²/9.30 < σ²_{Y.X} < (3)(100.16)²/0.210

3236 < σ²_{Y.X} < 143314
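The same interval can be computed with scipy's χ² quantiles. Note that scipy gives 9.35 and 0.216 for the 3-df quantiles, slightly different from the table values 9.30 and 0.210 used above, so the limits shift a little:

```python
from scipy import stats

n = 5
s_yx = 100.16        # standard error of estimate from the example
alpha = 0.05
df = n - 2

chi2_hi = stats.chi2.ppf(1 - alpha / 2, df)   # chi^2_{alpha/2}(3), approx 9.35
chi2_lo = stats.chi2.ppf(alpha / 2, df)       # chi^2_{1-alpha/2}(3), approx 0.216

# (n-2) s^2 / chi^2 limits for sigma^2_{Y.X}
lower = df * s_yx**2 / chi2_hi
upper = df * s_yx**2 / chi2_lo
```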
Hypothesis Testing
Hypothesis Testing about the Intercept α

Let a be the unbiased estimate of α computed from the values of a small random sample of size n selected from a bivariate normal population, having mean α and standard deviation

σ_a = σ_{y.x} √[1/n + X̄²/Σ(X − X̄)²].

As σ_{y.x} is unknown, it is replaced by its estimate s_{y.x}:

s_a = s_{y.x} √[1/n + X̄²/Σ(X − X̄)²]

If the sample size is small, the sampling distribution of a follows the t-distribution with ν = n − 2 degrees of freedom:

t = (a − α)/s_a ~ t(ν)

Testing procedure:
i. H₀: α = α₀ vs. H₁: α ≠ α₀
ii. The significance level: α
iii. The test statistic:
t = (a − α₀)/s_a ~ t(ν)
iv. Critical region:
Reject H₀ when |t| ≥ t_{α/2}(ν)
v. Computation
vi. Remarks
Hypothesis Testing about the Slope β

Let b be the unbiased estimate of β computed from the values of a small random sample of size n selected from a bivariate normal population, having mean β and standard deviation

σ_b = σ_{y.x}/√Σ(X − X̄)².

As σ_{y.x} is unknown, it is replaced by its estimate s_{y.x}; the estimate of σ_b is given below:

s_b = s_{y.x}/√Σ(X − X̄)²

If the sample size is small, the sampling distribution of b follows the t-distribution with ν = n − 2 degrees of freedom:

t = (b − β)/s_b ~ t(ν)

The hypothesis H₀: β = 0 is the one most commonly tested; it is equivalent to H₀: there is no correlation between X and Y, i.e. the two variables are independent.

Testing procedure:
i. H₀: β = β₀ vs. H₁: β ≠ β₀ (generally β₀ = 0)
H₀: β ≤ β₀ vs. H₁: β > β₀
H₀: β ≥ β₀ vs. H₁: β < β₀
ii. The significance level: α
iii. The test statistic:
t = (b − β₀)/s_b ~ t(ν)
iv. Critical region:
Reject H₀ when |t| ≥ t_{α/2}(ν)  (H₁: β ≠ β₀)
Hypothesis Testing for the Mean Value of the Dependent Variable for a Specified X = X₀ (μ_{Y.X₀})

Let Ŷ₀ = a + bX₀ be the estimate of μ_{Y.X₀} computed from the values of a small random sample of size n selected from a bivariate normal population having mean μ_{Y.X₀} and standard deviation

σ_{Y.X} √[1/n + (X₀ − X̄)²/Σ(X − X̄)²].

Its estimate is

s_Ŷ = s_{Y.X} √[1/n + (X₀ − X̄)²/Σ(X − X̄)²]

Testing procedure:
i. H₀: μ_{Y.X₀} = μ₀ vs. H₁: μ_{Y.X₀} ≠ μ₀
ii. The significance level: α
iii. The test statistic:
t = (Ŷ₀ − μ₀)/s_Ŷ ~ t(ν)
Solution: (a)
i. H₀: α = 20 vs. H₁: α ≠ 20
ii. The significance level: α = 0.05
iii. The test statistic:
t = (a − α₀)/s_a ~ t(3)
iv. Critical region:
Reject H₀ when |t| ≥ t_{0.025}(3) = 3.182
v. Computation:
a = 19.30, s_a = 105.05
t = (19.30 − 20)/105.05 = −0.0067
vi. Remarks: The computed t value falls in the acceptance region, so the sample data do not provide sufficient evidence to reject H₀: α = 20 at the 5% significance level. Thus it is concluded that α = 20.
Solution: (b)
i. H₀: β = 0 vs. H₁: β ≠ 0
ii. The significance level: α = 0.05
iii. The test statistic:
t = (b − β₀)/s_b ~ t(3)
iv. Critical region:
Reject H₀ when |t| ≥ t_{0.025}(3) = 3.182
v. Computation:
b = 1.03, s_b = 1.056
t = 1.03/1.056 = 0.975
vi. Remarks: The computed t value falls in the acceptance region, so the sample data do not provide sufficient evidence to reject H₀: β = 0 at the 5% significance level. Thus it is concluded that the two variables are independent.
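Both computed statistics can be verified in a few lines, with two-sided p-values added via scipy (a sketch using the values from the solutions above):

```python
from scipy import stats

df = 3  # n - 2 = 5 - 2

# (a) H0: alpha = 20
a, s_a = 19.30, 105.05
t_a = (a - 20) / s_a
p_a = 2 * stats.t.sf(abs(t_a), df)   # two-sided p-value

# (b) H0: beta = 0
b, s_b = 1.03, 1.056
t_b = b / s_b
p_b = 2 * stats.t.sf(abs(t_b), df)   # two-sided p-value
```

Both p-values exceed 0.05, matching the "acceptance region" conclusions above.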
Solution: (c)
i. H₀: μ_{Y.X₀} = 70 vs. H₁: μ_{Y.X₀} ≠ 70
ii. The significance level: α = 0.05
iii. The test statistic:
t = (Ŷ₀ − μ₀)/s_Ŷ ~ t(3)
iv. Critical region:
Reject H₀ when |t| ≥ t_{0.025}(3) = 3.182
v. Computation:
Ŷ₀ = 81.10, s_Ŷ = 54.86
t = (81.10 − 70)/54.86 = 0.202
vi. Remarks: The calculated t value falls in the acceptance region, so we do not have sufficient evidence to reject the null hypothesis at the 5% level.
Prediction with the Simple Linear Regression Model

A major goal of regression is to predict an individual value Y₀ of the dependent variable for a specified value X₀ of the independent variable. The estimated equation Ŷ₀ = a + bX₀ is used for predicting the value Y₀ of Y when X = X₀.

Y₀ = a + bX₀ + ε₀
E(Y₀) = α + βX₀
Var(Y₀) = σ²_{Y.X} [1 + 1/n + (X₀ − X̄)²/Σ(X − X̄)²]

The 100(1 − α)% prediction interval is

Ŷ₀ ± t_{α/2}(ν) s_{Ŷ₀}

where

s_{Ŷ₀} = s_{Y.X} √[1 + 1/n + (X₀ − X̄)²/Σ(X − X̄)²]
SSTotal = Σ(Y − Ȳ)²

The total sum of squares in the dependent variable can be partitioned into two parts, an explained part and an unexplained part, as:

Σ(Y − Ȳ)² = Σ(Y − Ŷ + Ŷ − Ȳ)²
Σ(Y − Ȳ)² = Σ[(Ŷ − Ȳ) + (Y − Ŷ)]²
Σ(Y − Ȳ)² = Σ(Ŷ − Ȳ)² + Σ(Y − Ŷ)²

F = MSR/MSE ~ F(p, n − p − 1)

Reject H₀: all βs = 0 if F ≥ F_α(p, n − p − 1)
Further, we can write:

R² = Σ(Ŷ − Ȳ)²/Σ(Y − Ȳ)²

Or

R² = 1 − Σ(Y − Ŷ)²/Σ(Y − Ȳ)²

so that

Σ(Y − Ŷ)²/Σ(Y − Ȳ)² = 1 − R²

Then:

F = MSR/MSE
F = (SSR/p)/(SSE/(n − p − 1))
F = [Σ(Ŷ − Ȳ)²/p] / [Σ(Y − Ŷ)²/(n − p − 1)]
F = ((n − p − 1)/p) × [Σ(Ŷ − Ȳ)²/Σ(Y − Ŷ)²]
F = ((n − p − 1)/p) × [ (Σ(Ŷ − Ȳ)²/Σ(Y − Ȳ)²) / (Σ(Y − Ŷ)²/Σ(Y − Ȳ)²) ]
F = ((n − p − 1)/p) (R²/(1 − R²))

For simple linear regression p = 1:

F = ((n − 2)/1) (R²/(1 − R²))
F = (n − 2)R²/(1 − R²) ~ F(1, n − 2)
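The identity just derived is easy to confirm numerically on any simple-regression fit (hypothetical data; the two ways of computing F agree to machine precision):

```python
# Verify that F = MSR/MSE equals (n-2) R^2 / (1 - R^2) for simple regression (p = 1)
X = [1, 2, 3, 4, 5, 6]
Y = [2, 1, 4, 3, 7, 8]
n = len(X)

xbar, ybar = sum(X) / n, sum(Y) / n
b = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) / sum((x - xbar) ** 2 for x in X)
a = ybar - b * xbar

SST = sum((y - ybar) ** 2 for y in Y)                    # total SS
SSE = sum((y - (a + b * x)) ** 2 for x, y in zip(X, Y))  # residual SS
SSR = SST - SSE                                          # regression SS

F_anova = (SSR / 1) / (SSE / (n - 2))   # MSR / MSE
R2 = SSR / SST
F_r2 = (n - 2) * R2 / (1 - R2)          # same quantity via R^2
```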
Example: Consider the data on advertisement (X) and sales revenue (Y) for an athletic sportswear store for five months. The observations are as follows:
Month Y (000,000) X (000)
1 3 1
2 4 2
3 2 3
4 6 4
5 8 5
Find the regression equation of sale on advertisement. Assume normality, test the hypothesis
H 0 : β=0 at 5 % significance level.
Solution:

Month  Y   X   XY   X²   Y²
1      3   1   3    1    9
2      4   2   8    4    16
3      2   3   6    9    4
4      6   4   24   16   36
5      8   5   40   25   64
Total  23  15  81   55   129

b = [nΣXY − ΣX ΣY]/[nΣX² − (ΣX)²]
b = (5 × 81 − 15 × 23)/(5 × 55 − (15)²) = 1.20

X̄ = 15/5 = 3
Ȳ = 23/5 = 4.60

a = Ȳ − bX̄
a = 4.60 − 1.20 × 3 = 1.00

The estimated regression line of sales on advertisement is given by:

Ŷ = a + bX
Ŷ = 1.00 + 1.20X
i. H₀: β = 0 vs. H₁: β ≠ 0
ii. The significance level: α = 0.05
iii. The test statistic (using the ANOVA technique):
F = MSR/MSE ~ F(1, 3)
iv. Reject H₀: β = 0 if F ≥ F_{0.05}(1, 3) = 10.13
v. Computation:
SSTotal = Σ(Y − Ȳ)² = ΣY² − (ΣY)²/n = 129 − (23)²/5 = 23.20
SSE = Σ(Y − Ŷ)² = ΣY² − aΣY − bΣXY = 129 − 1.00 × 23 − 1.20 × 81 = 8.80
SSR = SSTotal − SSE = 23.20 − 8.80 = 14.40
F = (14.40/1)/(8.80/3) = 4.91
vi. Remarks:
The calculated F value falls in the acceptance region (4.91 < 10.13). Thus we conclude that the slope is not significant.
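The full ANOVA computation for this example can be sketched end-to-end; the SSE shortcut ΣY² − aΣY − bΣXY is the one used in the text:

```python
X = [1, 2, 3, 4, 5]
Y = [3, 4, 2, 6, 8]
n = len(X)

Sx, Sy = sum(X), sum(Y)
Sxy = sum(x * y for x, y in zip(X, Y))
Sxx = sum(x * x for x in X)
Syy = sum(y * y for y in Y)

b = (n * Sxy - Sx * Sy) / (n * Sxx - Sx**2)   # slope
a = Sy / n - b * Sx / n                        # intercept

SST = Syy - Sy**2 / n              # total SS
SSE = Syy - a * Sy - b * Sxy       # residual SS (shortcut formula)
SSR = SST - SSE                    # regression SS

F = (SSR / 1) / (SSE / (n - 2))    # MSR / MSE
```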
Example: Given the data
Y X1 X2
12 2 1
10 2 1
9 3 0
13 4 0
20 4 3
Obtain the least squares estimates of the parameters in the multiple linear regression model
Y = β0 + β 1 X 1 + β 2 X 2 + ϵ and test the overall significance of the regression coefficients.
Solution:

Y   X₁  X₂  X₁Y  X₂Y  X₁X₂  X₁²  X₂²  Y²
12  2   1   24   12   2     4    1    144
10  2   1   20   10   2     4    1    100
9   3   0   27   0    0     9    0    81
13  4   0   52   0    0     16   0    169
20  4   3   80   60   12    16   9    400
64  15  5   203  82   16    49   11   894

Σx₁² = ΣX₁² − (ΣX₁)²/n = 49 − (15)²/5 = 4
Σx₂² = ΣX₂² − (ΣX₂)²/n = 11 − (5)²/5 = 6
Σx₁x₂ = ΣX₁X₂ − (ΣX₁ ΣX₂)/n = 16 − (15 × 5)/5 = 1
Σx₁y = ΣX₁Y − (ΣX₁ ΣY)/n = 203 − (15 × 64)/5 = 11
Σx₂y = ΣX₂Y − (ΣX₂ ΣY)/n = 82 − (5 × 64)/5 = 18

The normal equations in deviation form are:

4b₁ + b₂ = 11   …(1)
b₁ + 6b₂ = 18   …(2)

Multiplying (1) by 6 and subtracting (2):

24b₁ + 6b₂ = 66
b₁ + 6b₂ = 18
23b₁ = 48 → b₁ = 2.087

Substituting in (1): b₂ = 11 − 4 × 2.087 = 2.65

ŷ = b₁x₁ + b₂x₂
ŷ = 2.087x₁ + 2.65x₂  (in deviation form)
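The pair of normal equations can also be solved with numpy, confirming the hand solution:

```python
import numpy as np

# Normal equations in deviation form: 4 b1 + b2 = 11, b1 + 6 b2 = 18
A = np.array([[4.0, 1.0],
              [1.0, 6.0]])
rhs = np.array([11.0, 18.0])

b1, b2 = np.linalg.solve(A, rhs)   # b1 = 48/23, b2 = 61/23
```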
Hypothesis Testing
i. H₀: β₁ = β₂ = 0 vs. H₁: at least one βᵢ ≠ 0
ii. The significance level: α = 0.05
iii. The test statistic:
F = MSR/MSE ~ F(2, 2)
iv. Reject H₀ when F ≥ F_{0.05}(2, 2) = 19.00
v. Computation:
SSTotal = ΣY² − (ΣY)²/n = 894 − (64)²/5 = 74.8
SSR = b₁Σx₁y + b₂Σx₂y = 2.087 × 11 + 2.65 × 18 = 70.66
SSE = SSTotal − SSR = 74.8 − 70.66 = 4.14
F = (70.66/2)/(4.14/2) = 17.07
vi. Remarks: Since F = 17.07 < 19.00, H₀ is not rejected; the regression is not significant at the 5% level.
Hypothesis Testing about the Variance σ²_{y.x}

Let s²_{y.x} be the estimate of σ²_{y.x} computed from the values of a sample from a bivariate normal population. If it is desired to test H₀: σ²_{y.x} = σ₀², the test statistic follows the chi-square distribution with (n − 2) degrees of freedom:

χ² = (n − 2)s²_{y.x}/σ₀² ~ χ²(n − 2)

Testing procedure:
i. H₀: σ²_{y.x} = σ₀² vs. H₁: σ²_{y.x} > σ₀²
H₀: σ²_{y.x} = σ₀² vs. H₁: σ²_{y.x} < σ₀²
ii. The significance level: α
iii. The test statistic:
χ² = (n − 2)s²_{y.x}/σ₀² ~ χ²(n − 2)
iv. Reject H₀ when χ² > χ²_α(n − 2)  (H₁: σ²_{y.x} > σ₀²)
Reject H₀ when χ² < χ²_{1−α}(n − 2)  (H₁: σ²_{y.x} < σ₀²)
v. Computation
vi. Conclusion
Comparing Two Regression Slopes

Let b₁ and b₂ be the unbiased estimates of β₁ and β₂ computed from the values of two random and independent samples of sizes n₁, n₂ selected from two normal populations. The sampling distribution of (b₁ − b₂) follows the t-distribution with mean (β₁ − β₂), n₁ + n₂ − 4 degrees of freedom, and pooled standard deviation s_{y.x.p}, where

s_{y.x.p} = √[ ((n₁ − 2)s²_{y1.x1} + (n₂ − 2)s²_{y2.x2}) / (n₁ + n₂ − 4) ]

Or

s_{y.x.p} = √[ (Σ(Y₁ − Ŷ₁)² + Σ(Y₂ − Ŷ₂)²) / (n₁ + n₂ − 4) ]

and

s_{b₁−b₂} = s_{y.x.p} √[ 1/Σ(X₁ − X̄₁)² + 1/Σ(X₂ − X̄₂)² ]

Testing procedure:
i. H₀: β₁ = β₂ vs. H₁: β₁ ≠ β₂
ii. The significance level: α
iii. The test statistic:
t = [(b₁ − b₂) − (β₁ − β₂)]/s_{b₁−b₂} ~ t(n₁ + n₂ − 4)
Example: Two independent sets, each of n = 4 pairs of observations, gave the following summary values:

Set  ΣX  ΣY  ΣXY  ΣX²  ΣY²
A    8   37  76   18   349
B    15  47  179  59   557

Find the estimates of β₁ and β₂, the regression coefficients of the two linear regression lines, and test H₀: β₁ = β₂.

b₁ = [n₁ΣX₁Y₁ − ΣX₁ΣY₁]/[n₁ΣX₁² − (ΣX₁)²] = (4 × 76 − 8 × 37)/(4 × 18 − (8)²) = 1

Ŷ₁ = a₁ + b₁X₁
Ŷ₁ = 7.25 + X₁

b₂ = [n₂ΣX₂Y₂ − ΣX₂ΣY₂]/[n₂ΣX₂² − (ΣX₂)²] = (4 × 179 − 15 × 47)/(4 × 59 − (15)²) = 1

Ŷ₂ = a₂ + b₂X₂
Ŷ₂ = 8.00 + X₂

i. H₀: β₁ = β₂ vs. H₁: β₁ ≠ β₂
ii. The significance level: α = 0.05
iii. The test statistic:
t = [(b₁ − b₂) − (β₁ − β₂)]/s_{b₁−b₂} ~ t(n₁ + n₂ − 4)
iv. Critical region: Reject H₀ when |t| ≥ t_{0.025}(4) = 2.776
v. Computation:
Σ(X₁ − X̄₁)² = ΣX₁² − (ΣX₁)²/n₁ = 18 − (8)²/4 = 2
Σ(X₂ − X̄₂)² = ΣX₂² − (ΣX₂)²/n₂ = 59 − (15)²/4 = 2.75
s_{y.x.p} = √[ (Σ(Y₁ − Ŷ₁)² + Σ(Y₂ − Ŷ₂)²) / (n₁ + n₂ − 4) ] = √[(4.75 + 2)/4] = 1.30
s_{b₁−b₂} = 1.30 √(1/2 + 1/2.75) = 1.21
t = (1 − 1)/1.21 = 0
vi. Remarks: Since |t| = 0 < 2.776, H₀ is not rejected; the two regression slopes may be regarded as equal.
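The pooled comparison can be reproduced from the summary values of sets A and B, computed without intermediate rounding (a sketch; the residual sums of squares use the shortcut ΣY² − aΣY − bΣXY):

```python
import math

# Set A and Set B summary values (n = 4 pairs each)
n1 = n2 = 4
SSx1 = 18 - 8**2 / n1               # sum (X1 - X1bar)^2 = 2
SSx2 = 59 - 15**2 / n2              # sum (X2 - X2bar)^2 = 2.75
SSE1 = 349 - 7.25 * 37 - 1 * 76     # residual SS, set A = 4.75
SSE2 = 557 - 8.00 * 47 - 1 * 179    # residual SS, set B = 2

s_p = math.sqrt((SSE1 + SSE2) / (n1 + n2 - 4))   # pooled s_{y.x.p}
s_diff = s_p * math.sqrt(1 / SSx1 + 1 / SSx2)    # s_{b1-b2}

b1 = b2 = 1.0
t = (b1 - b2) / s_diff                            # test statistic
```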
Testing the Linearity of Regression

To test the null hypothesis that the regression model is linear, that is μ_{y.x} = α + βX, let a random sample of n observations with k distinct values of X be selected, and for each value xᵢ of X let the Y observation be repeated nᵢ times. The observed data can be displayed as:

X    Y-values                      Sum
x₁   y₁₁ y₁₂ ⋯ y₁ⱼ ⋯ y₁ₙ₁          y₁₀
x₂   y₂₁ y₂₂ ⋯ y₂ⱼ ⋯ y₂ₙ₂          y₂₀
⋮    ⋮                             ⋮
xᵢ   yᵢ₁ yᵢ₂ ⋯ yᵢⱼ ⋯ yᵢₙᵢ          yᵢ₀
⋮    ⋮                             ⋮
xₖ   yₖ₁ yₖ₂ ⋯ yₖⱼ ⋯ yₖₙₖ          yₖ₀

χ₁² = Σ yᵢ₀²/nᵢ − (ΣΣ yᵢⱼ)²/n − b²Σ(x − x̄)²

χ₂² = ΣΣ yᵢⱼ² − Σ yᵢ₀²/nᵢ

F = [χ₁²/(k − 2)] / [χ₂²/(n − k)] ~ F(ν₁, ν₂)

where ν₁ = k − 2 and ν₂ = n − k.
Example: Use the following data to test the hypothesis that the regression is linear at the 0.05
level of significance:
X 2 2 2 3 3 4 5 5 6 6 6
Y 4 3 8 18 22 24 24 18 3 10 16
Solution: The estimated regression line:

X   Y    XY   X²
2   4    8    4
2   3    6    4
2   8    16   4
3   18   54   9
3   22   66   9
4   24   96   16
5   24   120  25
5   18   90   25
6   3    18   36
6   10   60   36
6   16   96   36
44  150  630  204

b = [nΣXY − ΣX ΣY]/[nΣX² − (ΣX)²] = (11 × 630 − 44 × 150)/(11 × 204 − (44)²) = 1.071

X̄ = 44/11 = 4
Ȳ = 150/11 = 13.64

a = Ȳ − bX̄ = 13.64 − 1.071 × 4 = 9.35

Estimated regression line:
Ŷ = a + bX
Ŷ = 9.35 + 1.071X

i. H₀: the regression of Y on X is linear vs. H₁: the regression is not linear
ii. The significance level: α = 0.05
iii. The test statistic:
F = [χ₁²/(k − 2)] / [χ₂²/(n − k)] ~ F(k − 2, n − k)
iv. Reject H₀ when F ≥ F_{0.05}(3, 6) = 4.76
v. Computation:

X   Y-values      yᵢ₀
2   4, 3, 8       15
3   18, 22        40
4   24            24
5   24, 18        42
6   3, 10, 16     29
Total (ΣΣ yᵢⱼ)    150

Σ(x − x̄)² = Σx² − (Σx)²/n = 204 − (44)²/11 = 28

χ₁² = Σ yᵢ₀²/nᵢ − (ΣΣ yᵢⱼ)²/n − b²Σ(x − x̄)²
χ₁² = (15)²/3 + (40)²/2 + (24)²/1 + (42)²/2 + (29)²/3 − (150)²/11 − (1.071)² × 28
χ₁² = 2613.33 − 2045.45 − 32.14 = 535.74

χ₂² = ΣΣ yᵢⱼ² − Σ yᵢ₀²/nᵢ
χ₂² = 2738 − 2613.33 = 124.67

F = (535.74/3)/(124.67/6) = 178.58/20.78 = 8.59

vi. Remarks: The computed F value falls in the rejection region (8.59 > 4.76), so H₀ is rejected; it is concluded that the regression is not linear.
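The whole lack-of-fit computation can be reproduced directly from the raw data (a sketch grouping the repeated Y observations by distinct X):

```python
from collections import defaultdict

X = [2, 2, 2, 3, 3, 4, 5, 5, 6, 6, 6]
Y = [4, 3, 8, 18, 22, 24, 24, 18, 3, 10, 16]
n = len(X)

# Fit the simple regression line
xbar, ybar = sum(X) / n, sum(Y) / n
Sxx = sum((x - xbar) ** 2 for x in X)
Sxy = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y))
b = Sxy / Sxx

# Group Y values by distinct X value
groups = defaultdict(list)
for x, y in zip(X, Y):
    groups[x].append(y)
k = len(groups)

group_term = sum(sum(ys) ** 2 / len(ys) for ys in groups.values())  # sum y_i0^2 / n_i
chi1 = group_term - sum(Y) ** 2 / n - b**2 * Sxx   # lack-of-fit sum of squares
chi2 = sum(y * y for y in Y) - group_term          # pure-error sum of squares

F = (chi1 / (k - 2)) / (chi2 / (n - k))            # F(k-2, n-k)
```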
Confidence Interval for the Population Correlation Coefficient ρ

Let r be the estimate of ρ computed from the values of a random sample of size n selected from a bivariate normal population having correlation coefficient ρ. The sampling distribution of r is approximately normal only for very large sample sizes:

z = (r − ρ)/[(1 − r²)/√n]

But this test is not recommended, because it requires a very large sample size. Facing this situation, R. A. Fisher introduced a transformation technique, known as Fisher's z-transformation, which transforms a non-normal variable into an approximately normal one. The transformation gives

z = (z_f − μ_z)/(1/√(n − 3))

where

z_f = 1.1513 log[(1 + r)/(1 − r)]
μ_z = 1.1513 log[(1 + ρ)/(1 − ρ)]
To construct a 100(1 − α)% confidence interval for μ_z, choose the two values −z_{α/2} and z_{α/2} and make the probability statement

P(−z_{α/2} ≤ z ≤ z_{α/2}) = 1 − α

−z_{α/2} ≤ z ≤ z_{α/2}
−z_{α/2} ≤ (z_f − μ_z)/(1/√(n − 3)) ≤ z_{α/2}
−z_f − z_{α/2}/√(n − 3) ≤ −μ_z ≤ −z_f + z_{α/2}/√(n − 3)
z_f + z_{α/2}/√(n − 3) ≥ μ_z ≥ z_f − z_{α/2}/√(n − 3)
z_f − z_{α/2}/√(n − 3) ≤ μ_z ≤ z_f + z_{α/2}/√(n − 3)

Or:

z_f ± z_{α/2}/√(n − 3)

Re-transform the lower and upper limits of μ_z using Fisher's z-table to obtain the confidence interval for ρ.
Fisher's z-transformation table (excerpt; the column gives the third decimal place of r):

r    .000   .001   .002   .003   .004   .005   .006   .007   .008   .009
.92  1.5890 1.5956 1.6022 1.6089 1.6157 1.6226 1.6296 1.6366 1.6438 1.6510
.93  1.6584 1.6658 1.6734 1.6811 1.6888 1.6967 1.7047 1.7129 1.7211 1.7295
.94  1.7380 1.7467 1.7555 1.7645 1.7736 1.7828 1.7923 1.8019 1.8117 1.8216
.95  1.8318 1.8421 1.8527 1.8635 1.8745 1.8857 1.8972 1.9090 1.9210 1.9333
.96  1.9459 1.9588 1.9721 1.9857 1.9996 2.0139 2.0287 2.0439 2.0595 2.0756
.97  2.0923 2.1095 2.1273 2.1457 2.1649 2.1847 2.2054 2.2269 2.2494 2.2729
.98  2.2976 2.3235 2.3507 2.3796 2.4101 2.4427 2.4774 2.5147 2.5550 2.5987
.99  2.6467 2.6996 2.7587 2.8257 2.9031 2.9945 3.1063 3.2504 3.4534 3.8002
Example: A random sample of size n = 23, taken from a bivariate normal population, showed a correlation coefficient of 0.59. Compute a 95% confidence interval for ρ.

Solution:
r = 0.59
z_f = 1.1513 log(1.59/0.41) = 0.677
z_{α/2} = z_{0.025} = 1.96

z_f ± z_{α/2}/√(n − 3)
0.677 ± 1.96/√20
0.677 ± 0.438
0.239 ≤ μ_z ≤ 1.115
0.23 ≤ ρ ≤ 0.81
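In code, math.atanh is exactly the z_f transformation (1.1513 log₁₀ x = ½ ln x), so the interval for ρ can be computed and back-transformed with tanh:

```python
import math

r, n = 0.59, 23
z_f = math.atanh(r)                  # = 1.1513 * log10((1 + r)/(1 - r))
half = 1.96 / math.sqrt(n - 3)       # z_{0.025} / sqrt(n - 3)

mu_lo, mu_hi = z_f - half, z_f + half            # limits for mu_z
rho_lo, rho_hi = math.tanh(mu_lo), math.tanh(mu_hi)  # back-transform to rho
```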
Hypothesis Testing about ρ (H₀: ρ = ρ₀)

As above, the sampling distribution of r is not normal except for very large samples, so Fisher's z-transformation is used:

z = (z_f − μ_z)/(1/√(n − 3))

where

z_f = 1.1513 log[(1 + r)/(1 − r)]
μ_z = 1.1513 log[(1 + ρ₀)/(1 − ρ₀)]

Testing procedure:
i. H₀: ρ = ρ₀ vs. H₁: ρ ≠ ρ₀
ii. The significance level: α
iii. The test statistic:
z = (z_f − μ_z)/(1/√(n − 3))
iv. Reject H₀ when |z| ≥ z_{α/2}
v. Computation
vi. Remarks
Example: A value of r of 0.6 is calculated from a random sample of 39 pairs of observations
from a bivariate normal population. Is this value of r consistent with the hypothesis that ρ=0.4?
Solution: With α = 0.05, H₀: ρ = 0.4 vs. H₁: ρ ≠ 0.4.

z_f = 1.1513 log(1.6/0.4) = 0.693
μ_z = 1.1513 log(1.4/0.6) = 0.424
z = (0.693 − 0.424)√(39 − 3) = 0.269 × 6 = 1.61

Since |z| = 1.61 < 1.96, H₀ is not rejected; the value r = 0.6 is consistent with the hypothesis that ρ = 0.4.
Hypothesis Testing about ρ (H₀: ρ = 0)

Let r be the estimate of ρ computed from the values of a random sample of size n selected from a bivariate normal population having correlation coefficient ρ. If ρ = 0, the following statistic follows the t-distribution with (n − 2) degrees of freedom:

t = r√(n − 2)/√(1 − r²) ~ t(n − 2)

Or

F = r²(n − 2)/(1 − r²) ~ F(1, n − 2)
Example: A sample of size 12 yielded r = 0.32. Test H₀: ρ = 0 against the alternative H₁: ρ ≠ 0. Let α = 0.01.

Solution:
i. H₀: ρ = 0 vs. H₁: ρ ≠ 0
ii. The significance level: α = 0.01
iii. The test statistic:
t = r√(n − 2)/√(1 − r²) ~ t(n − 2)
iv. Reject H₀ when |t| ≥ t_{0.005}(10) = 3.169
v. Computation:
t = 0.32√10/√(1 − (0.32)²) = 1.068
vi. Remarks:
Since |t| = 1.068 < 3.169, the null hypothesis is not rejected; thus we conclude that the two variables X and Y are independent.
Example: What will be the least value of r in a random sample of 38 pairs that is significant (i) at the 0.05 level, (ii) at the 0.01 level?

Solution: r is just significant when

|t| = t_{α/2}(n − 2)

(i) At the 0.05 level, with t_{0.025}(36) = 2.021:

r√36/√(1 − r²) = 2.021
6r = 2.021√(1 − r²)
36r² = 4.08(1 − r²)
40.08r² = 4.08
r = 0.32

(ii) At the 0.01 level, with t_{0.005}(36) = 2.704:

6r = 2.704√(1 − r²)
36r² = 7.31(1 − r²)
43.31r² = 7.31
r = 0.41
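Solving |t| = t_{α/2}(n − 2) for r gives the closed form r = t/√(n − 2 + t²), which reproduces both answers (the critical values 2.021 and 2.704 follow the table convention used above):

```python
import math

def least_significant_r(t_crit, n):
    # From r * sqrt(n - 2) / sqrt(1 - r^2) = t_crit, solve for r
    return t_crit / math.sqrt(n - 2 + t_crit**2)

r_05 = least_significant_r(2.021, 38)   # least significant r at the 5% level
r_01 = least_significant_r(2.704, 38)   # least significant r at the 1% level
```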
Comparing Two Correlation Coefficients

Let r₁ and r₂ be the sample correlation coefficients computed from the values of two random independent samples of n₁ and n₂ pairs selected from two bivariate normal populations having correlation coefficients ρ₁ and ρ₂. The sampling distribution of (r₁ − r₂) is not normal, so Fisher's z-transformation is used to test the null hypothesis that the population correlation coefficients are identical:

z = (z_{f₁} − z_{f₂})/√(1/(n₁ − 3) + 1/(n₂ − 3)) ~ N(0, 1)

Testing procedure:
i. H₀: ρ₁ = ρ₂ vs. H₁: ρ₁ ≠ ρ₂
ii. The significance level: α
iii. The test statistic:
z = (z_{f₁} − z_{f₂})/√(1/(n₁ − 3) + 1/(n₂ − 3)) ~ N(0, 1)
iv. Reject H₀ when |z| ≥ z_{α/2}
v. Computation
vi. Remarks
Example: Two independent samples have 28 and 19 pairs of observations with correlation
coefficients 0.55 and 0.75 respectively. Are these values of r consistent with the hypothesis that
the samples have been drawn from the same population?
Solution:
i. H₀: ρ₁ = ρ₂ vs. H₁: ρ₁ ≠ ρ₂
ii. The significance level: α = 0.05
iii. The test statistic:
z = (z_{f₁} − z_{f₂})/√(1/(n₁ − 3) + 1/(n₂ − 3)) ~ N(0, 1)
iv. Reject H₀ when |z| ≥ z_{0.025} = 1.96
v. Computation:
z_{f₁} = 1.1513 log(1.55/0.45) = 0.62
z_{f₂} = 1.1513 log(1.75/0.25) = 0.97
z = (0.62 − 0.97)/√(1/25 + 1/16) = −1.093
vi. Remarks:
Since |z| = 1.093 < 1.96, the null hypothesis is not rejected; it is concluded that both samples came from the same population.
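Computed without intermediate rounding (atanh gives z_f directly), the statistic comes out slightly larger in magnitude than the rounded hand value, but the conclusion is unchanged:

```python
import math

r1, n1 = 0.55, 28
r2, n2 = 0.75, 19

zf1, zf2 = math.atanh(r1), math.atanh(r2)               # Fisher z-transforms
z = (zf1 - zf2) / math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))  # approx N(0, 1)
```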
Testing the Homogeneity of Several Correlation Coefficients

Let rᵢ (i = 1, 2, …, k) be the sample correlation coefficients computed from k random samples of nᵢ (i = 1, 2, …, k) pairs of observations respectively, selected from k bivariate normal populations with corresponding population correlation coefficients ρᵢ (i = 1, 2, …, k). To test

H₀: ρ₁ = ρ₂ = … = ρₖ

using Fisher's z-transformation, the following test statistic is used:

u = Σᵢ₌₁ᵏ (nᵢ − 3)(z_{fᵢ} − z̄)² ~ χ²(k − 1)

where

z̄ = Σ(nᵢ − 3)z_{fᵢ}/Σ(nᵢ − 3)

Example: Random samples of 10, 15, and 20 pairs are drawn from bivariate normal populations, yielding r = 0.3, 0.4, 0.49 respectively. Form a combined estimate of ρ and test the hypothesis that the correlations are homogeneous.

Solution:
z̄ = Σ(nᵢ − 3)z_{fᵢ}/Σ(nᵢ − 3) = 16.22/36 = 0.45
u = Σ(nᵢ − 3)(z_{fᵢ} − z̄)² = 0.3537

The combined estimate of ρ is obtained by re-transforming z̄ = 0.45, giving ρ̂ ≈ 0.42. Since u = 0.3537 < χ²_{0.05}(2) = 5.99, the correlations may be regarded as homogeneous.
Testing a Partial Correlation Coefficient (H₀: ρ_{ij.123…k} = 0)

Let r_{ij.123…k} be the estimate of ρ_{ij.123…k} computed from the values of a random sample of n pairs of observations selected from a normally distributed population. If it is desired to test H₀: ρ_{ij.123…k} = 0, the test statistic to be used is

t = r_{ij.12…k} √(n − k − 2)/√(1 − r²_{ij.12…k}) ~ t(n − k − 2)

OR

F = r²_{ij.12…k}(n − k − 2)/(1 − r²_{ij.12…k}) ~ F(1, n − k − 2)

Example: Given n = 20 and r₁₂.₃₄ = 0.51, test by means of the t and F tests the hypothesis that ρ₁₂.₃₄ = 0.

Solution: Here k = 2 variables are held constant, so n − k − 2 = 16.

t = 0.51√16/√(1 − (0.51)²) = 2.37
F = t² = (0.51)² × 16/(1 − (0.51)²) = 5.62

Since |t| = 2.37 > t_{0.025}(16) = 2.12 (equivalently F = 5.62 > F_{0.05}(1, 16) = 4.49), H₀ is rejected; the partial correlation is significant at the 5% level.
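The two forms of the statistic agree exactly (F = t²), as a quick check of the example's numbers confirms:

```python
import math

n, k, r = 20, 2, 0.51    # two variables held constant, so k = 2
df = n - k - 2           # = 16

t = r * math.sqrt(df) / math.sqrt(1 - r**2)   # t form
F = r**2 * df / (1 - r**2)                    # F form; equals t squared
```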
Confidence Interval for a Partial Correlation Coefficient

Let r_{ij.123…k} be the estimate of ρ_{ij.123…k} computed from the values of a random sample of n pairs of observations selected from a normally distributed population. As r_{ij.123…k} does not follow the normal distribution, Fisher's z-transformation, which converts a non-normal variable into an approximately normal one, is used:

z = (z_{f,ij.12…k} − μ_{z,ij.12…k})/(1/√(n − k − 3))

where

z_{f,ij.12…k} = 1.1513 log[(1 + r_{ij.12…k})/(1 − r_{ij.12…k})]

From the probability statement

P(−z_{α/2} ≤ z ≤ z_{α/2}) = 1 − α
−z_{α/2} ≤ z ≤ z_{α/2}

the 100(1 − α)% confidence interval is

z_{f,ij.12…k} ± z_{α/2}/√(n − k − 3)
Testing the Hypothesis that a Multiple Correlation Coefficient is Zero

Let R_{y.12…p} be the estimate of ρ_{y.12…p} computed from the values of a random sample of size n selected from a normally distributed population. If it is desired to test H₀: ρ_{y.12…p} = 0, which is equivalent to testing H₀: β₁ = β₂ = … = β_p = 0, the test statistic to be used is

F = R²_{y.12…p}(n − p − 1)/[p(1 − R²_{y.12…p})] ~ F(p, n − p − 1)