

Chapter – 03

Inferences about Simple & Multiple Linear Regression Models

Interval Estimation
Confidence Interval Estimate for the Intercept α

Let a be the unbiased estimate of α computed from the values of a small random sample of size n selected from a bivariate normal population. Its sampling distribution has mean α and standard deviation

σ_a = σ_{y.x} √[ 1/n + X̄² / ∑(X − X̄)² ]

As σ_{y.x} is unknown, it is replaced by its estimate s_{y.x}; the estimate of σ_a is

s_a = s_{y.x} √[ 1/n + X̄² / ∑(X − X̄)² ]

If the sample size is small, the sampling distribution of a follows the t-distribution with v = n − 2 degrees of freedom:

t = (a − α)/s_a ~ t(v)

To construct a 100(1 − α)% confidence interval for α, we choose the two values (−t_{α/2}, t_{α/2}), which give the shortest (most precise) confidence interval for α. Thus we make the following probability statement:

P[ −t_{α/2} < t < t_{α/2} ] = 1 − α
−t_{α/2} < (a − α)/s_a < t_{α/2}
−t_{α/2} s_a < a − α < t_{α/2} s_a
a − t_{α/2} s_a < α < a + t_{α/2} s_a

or, more compactly,

a ± t_{α/2}(v) s_a

Confidence Interval Estimate for the Slope β

Let b be the unbiased estimate of β computed from the values of a small random sample of size n selected from a bivariate normal population. Its sampling distribution has mean β and standard deviation

σ_b = σ_{y.x} / √∑(X − X̄)²

As σ_{y.x} is unknown, it is replaced by its estimate s_{y.x}; the estimate of σ_b is

s_b = s_{y.x} / √∑(X − X̄)²

If the sample size is small, the sampling distribution of b follows the t-distribution with v = n − 2 degrees of freedom:

t = (b − β)/s_b ~ t(v)

To construct a 100(1 − α)% confidence interval for β, we choose the two values (−t_{α/2}, t_{α/2}), which give the shortest confidence interval for β. Thus we make the following probability statement:

P[ −t_{α/2} < t < t_{α/2} ] = 1 − α
−t_{α/2} < (b − β)/s_b < t_{α/2}
−t_{α/2} s_b < b − β < t_{α/2} s_b
b − t_{α/2} s_b < β < b + t_{α/2} s_b

or, more compactly,

b ± t_{α/2}(v) s_b
Confidence Interval for the Mean Value of the Dependent Variable at a Specified X = X₀ (μ_{Y.X₀})

Let Ŷ₀ = a + bX₀ be the estimate of μ_{Y.X₀} computed from the values of a small random sample of size n selected from a bivariate normal population. Its sampling distribution has mean μ_{Y.X₀} and standard deviation

σ_{Y.X} √[ 1/n + (X₀ − X̄)² / ∑(X − X̄)² ]

As σ_{Y.X} is unknown, it is replaced by its estimate s_{Y.X}. If n is small, the sampling distribution of Ŷ₀ = a + bX₀ follows the t-distribution with v = n − 2 degrees of freedom:

t = (Ŷ₀ − μ_{Y.X₀}) / ( s_{Y.X} √[ 1/n + (X₀ − X̄)² / ∑(X − X̄)² ] ) ~ t(v)

A 100(1 − α)% confidence interval for μ_{Y.X₀} is given by

Ŷ₀ ± t_{α/2}(v) · s_{Y.X} √[ 1/n + (X₀ − X̄)² / ∑(X − X̄)² ]

or, writing s_Ŷ for this standard error,

Ŷ₀ ± t_{α/2}(v) s_Ŷ

A 100(1 − α)% confidence interval estimate for σ²_{Y.X} is given below:

(n − 2) s²_{Y.X} / χ²_{α/2}(n − 2)  <  σ²_{Y.X}  <  (n − 2) s²_{Y.X} / χ²_{1−α/2}(n − 2)

Example: Using the data of Example 1, construct 95% confidence intervals for

i. the intercept of the regression line;
ii. the slope of the regression line;
iii. μ_{Y.X₀} when X₀ = 60;
iv. the population variance of the regression of Y on X.

Solution: The estimated model is given below:

Ŷ = 19.30 + 1.03X,  s_{y.x} = 100.16

We need to calculate

∑(X − X̄)² = ∑X² − (∑X)²/n = 49500 − (450)²/5 = 9000

t_{α/2}(v) = t_{0.025}(3) = 3.182

i. Confidence interval for α

s_a = s_{y.x} √[ 1/n + X̄² / ∑(X − X̄)² ] = 100.16 √[ 1/5 + (450/5)² / 9000 ] = 105.05

a ± t_{α/2}(v) s_a
19.30 ± 3.182 × 105.05
19.30 ± 334.27
(−314.97, 353.57)
ii. Confidence interval for β

s_b = s_{y.x} / √∑(X − X̄)² = 100.16 / √9000 = 1.056

b ± t_{α/2}(v) s_b
1.03 ± 3.182 × 1.056
1.03 ± 3.36
(−2.33, 4.39)
iii. Estimate of Y when X₀ = 60

Ŷ₀ = 19.30 + 1.03 × 60 = 81.10

s_Ŷ = s_{Y.X} √[ 1/n + (X₀ − X̄)² / ∑(X − X̄)² ] = 100.16 √[ 1/5 + (60 − 90)² / 9000 ] = 54.86

The 95% confidence interval for μ_{Y.X₀} is given below:

Ŷ₀ ± t_{α/2}(v) s_Ŷ
81.10 ± 3.182 × 54.86
81.10 ± 174.56
(−93.46, 255.66)
iv. Confidence interval for σ²_{Y.X}

χ²_{0.025}(3) = 9.30,  χ²_{0.975}(3) = 0.210

(n − 2) s²_{Y.X} / χ²_{α/2}(n − 2)  <  σ²_{Y.X}  <  (n − 2) s²_{Y.X} / χ²_{1−α/2}(n − 2)

(3)(100.16)² / 9.30  <  σ²_{Y.X}  <  (3)(100.16)² / 0.210

3236 < σ²_{Y.X} < 143314
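The four interval estimates above can be checked numerically. The sketch below assumes SciPy is available; note that SciPy's exact percentage points (e.g. χ²_{0.025}(3) = 9.348) differ slightly from the rounded table values used in the hand computation, so the variance limits come out marginally different.

```python
import math
from scipy import stats

# Summary statistics from Example 1, as given in the text
n, Sx, Sxx = 5, 450.0, 49500.0
a, b, s_yx = 19.30, 1.03, 100.16

Sxx_dev = Sxx - Sx**2 / n               # sum of (X - Xbar)^2 = 9000
xbar = Sx / n                           # 90
t_crit = stats.t.ppf(0.975, df=n - 2)   # t_{0.025}(3) ~ 3.182

# (i) intercept: s_a = s_yx * sqrt(1/n + xbar^2 / Sdev)
s_a = s_yx * math.sqrt(1 / n + xbar**2 / Sxx_dev)
ci_alpha = (a - t_crit * s_a, a + t_crit * s_a)

# (ii) slope: s_b = s_yx / sqrt(Sdev)
s_b = s_yx / math.sqrt(Sxx_dev)
ci_beta = (b - t_crit * s_b, b + t_crit * s_b)

# (iii) mean response at X0 = 60
x0 = 60.0
s_mu = s_yx * math.sqrt(1 / n + (x0 - xbar)**2 / Sxx_dev)
y0_hat = a + b * x0
ci_mu = (y0_hat - t_crit * s_mu, y0_hat + t_crit * s_mu)

# (iv) variance: (n-2)s^2/chi2_{a/2} < sigma^2 < (n-2)s^2/chi2_{1-a/2}
chi_hi = stats.chi2.ppf(0.975, df=n - 2)   # ~9.348
chi_lo = stats.chi2.ppf(0.025, df=n - 2)   # ~0.216
ci_var = ((n - 2) * s_yx**2 / chi_hi, (n - 2) * s_yx**2 / chi_lo)

print(s_a, s_b, s_mu)
print(ci_alpha, ci_beta, ci_mu, ci_var)
```

Running this reproduces s_a ≈ 105.05, s_b ≈ 1.056, and s_Ŷ ≈ 54.86 from the worked example.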

Hypothesis Testing

Hypothesis Testing about the Intercept α

Let a be the unbiased estimate of α computed from the values of a small random sample of size n selected from a bivariate normal population. Its sampling distribution has mean α and standard deviation

σ_a = σ_{y.x} √[ 1/n + X̄² / ∑(X − X̄)² ]

As σ_{y.x} is unknown, it is replaced by its estimate s_{y.x}; the estimate of σ_a is

s_a = s_{y.x} √[ 1/n + X̄² / ∑(X − X̄)² ]

If the sample size is small, the sampling distribution of a follows the t-distribution with v = n − 2 degrees of freedom.

Testing procedure:
i. H₀: α = α₀ vs. H₁: α ≠ α₀
ii. The significance level: α
iii. The test statistic:
   t = (a − α₀)/s_a ~ t(v)
iv. Critical region: reject H₀ when |t| ≥ t_{α/2}(v)
v. Computation
vi. Remarks.
Hypothesis Testing about the Slope β

Let b be the unbiased estimate of β computed from the values of a small random sample of size n selected from a bivariate normal population. Its sampling distribution has mean β and standard deviation

σ_b = σ_{y.x} / √∑(X − X̄)²

As σ_{y.x} is unknown, it is replaced by its estimate s_{y.x}; the estimate of σ_b is

s_b = s_{y.x} / √∑(X − X̄)²

If the sample size is small, the sampling distribution of b follows the t-distribution with v = n − 2 degrees of freedom:

t = (b − β)/s_b ~ t(v)

The hypothesis H₀: β = 0 is the one most commonly tested; it is equivalent to H₀: there is no linear relationship between X and Y, i.e. the two variables are independent.

Testing procedure:
i. H₀: β = β₀ vs. H₁: β ≠ β₀ (generally β₀ = 0)
   H₀: β ≤ β₀ vs. H₁: β > β₀
   H₀: β ≥ β₀ vs. H₁: β < β₀
ii. The significance level: α
iii. The test statistic:
   t = (b − β₀)/s_b ~ t(v)
iv. Critical region:
   Reject H₀ when |t| ≥ t_{α/2}(v)  (H₁: β ≠ β₀)
   Reject H₀ when t ≥ t_α(v)  (H₁: β > β₀)
   Reject H₀ when t ≤ −t_α(v)  (H₁: β < β₀)
v. Computation
vi. Remarks.

Hypothesis Testing for the Mean Value of the Dependent Variable at a Specified X = X₀ (μ_{Y.X₀})

Let Ŷ₀ = a + bX₀ be the estimate of μ_{Y.X₀} computed from the values of a small random sample of size n selected from a bivariate normal population. Its sampling distribution has mean μ_{Y.X₀} and standard deviation

σ_{Y.X} √[ 1/n + (X₀ − X̄)² / ∑(X − X̄)² ]

As σ_{Y.X} is unknown, it is replaced by its estimate s_{Y.X}. If n is small, the sampling distribution of Ŷ₀ = a + bX₀ follows the t-distribution with v = n − 2 degrees of freedom:

t = (Ŷ₀ − μ_{Y.X₀}) / s_Ŷ ~ t(v),  where  s_Ŷ = s_{Y.X} √[ 1/n + (X₀ − X̄)² / ∑(X − X̄)² ]

Testing procedure:
i. H₀: μ_{Y.X₀} = μ₀ vs. H₁: μ_{Y.X₀} ≠ μ₀
ii. The significance level: α
iii. The test statistic:
   t = (Ŷ₀ − μ₀)/s_Ŷ ~ t(v)
iv. Critical region: reject H₀ when |t| ≥ t_{α/2}(v)
v. Computation
vi. Remarks
Example: Using the data of Example 1, test the following hypotheses at the 5% level of significance.

a. H₀: α = 20 vs. H₁: α ≠ 20
b. H₀: β = 0 vs. H₁: β ≠ 0
c. H₀: μ_{Y.X₀} = 70 vs. H₁: μ_{Y.X₀} ≠ 70

Solution (a):
i. H₀: α = 20 vs. H₁: α ≠ 20
ii. The significance level: α = 0.05
iii. The test statistic:
   t = (a − α₀)/s_a ~ t(3)
iv. Critical region: reject H₀ when |t| ≥ t_{0.025}(3) = 3.182
v. Computation:
   a = 19.30, s_a = 105.05
   t = (19.30 − 20)/105.05 = −0.0067
vi. Remarks: The computed t value falls in the acceptance region, so the sample data do not provide sufficient evidence to reject H₀: α = 20 at the 5% significance level. Thus it is concluded that α = 20.
Solution (b):
i. H₀: β = 0 vs. H₁: β ≠ 0
ii. The significance level: α = 0.05
iii. The test statistic:
   t = (b − 0)/s_b ~ t(3)
iv. Critical region: reject H₀ when |t| ≥ t_{0.025}(3) = 3.182
v. Computation:
   b = 1.03, s_b = 1.056
   t = 1.03/1.056 = 0.975
vi. Remarks: The computed t value falls in the acceptance region, so the sample data do not provide sufficient evidence to reject H₀: β = 0 at the 5% significance level. Thus it is concluded that the two variables are independent.
Solution (c):
i. H₀: μ_{Y.X₀} = 70 vs. H₁: μ_{Y.X₀} ≠ 70
ii. The significance level: α = 0.05
iii. The test statistic:
   t = (Ŷ₀ − μ₀)/s_Ŷ ~ t(3)
iv. Critical region: reject H₀ when |t| ≥ t_{0.025}(3) = 3.182
v. Computation:
   Ŷ₀ = 81.10 and s_Ŷ = 54.86
   t = (81.10 − 70)/54.86 = 0.202
vi. Remarks: The calculated t value falls in the acceptance region, so we do not have sufficient evidence to reject the null hypothesis at the 5% level.
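The three tests above share the same critical value, so they can be run together in a few lines. This is a minimal sketch using the standard errors already computed in the example, and it assumes SciPy for the t percentage point.

```python
from scipy import stats

n = 5
t_crit = stats.t.ppf(0.975, df=n - 2)   # t_{0.025}(3) ~ 3.182

# (a) H0: alpha = 20,  (b) H0: beta = 0,  (c) H0: mu_{Y.60} = 70
t_a = (19.30 - 20) / 105.05
t_b = (1.03 - 0) / 1.056
t_c = (81.10 - 70) / 54.86

for label, stat in [("a", t_a), ("b", t_b), ("c", t_c)]:
    decision = "reject H0" if abs(stat) >= t_crit else "do not reject H0"
    print(label, round(stat, 4), decision)
```

All three statistics fall well inside (−3.182, 3.182), matching the remarks above.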
Prediction with the Simple Linear Regression Model

A major goal of regression is to predict an individual value Y₀ of the dependent variable for a specified value X₀ of the independent variable. The estimated equation Ŷ₀ = a + bX₀ is used for predicting the value Y₀ of Y when X = X₀.

The true value Y₀ of Y is given by

Y₀ = α + βX₀ + ε₀

satisfying the classical OLS assumptions. The mean of the true value Y₀ is

E(Y₀) = α + βX₀

The variance of the prediction error Ŷ₀ − Y₀ is obtained by writing Ŷ₀ = a + bX₀ = Ȳ + b(X₀ − X̄) and using the independence of Ȳ, b, and ε₀:

Var(Ŷ₀ − Y₀) = Var(Ȳ + b(X₀ − X̄) − ε₀)
Var(Ŷ₀ − Y₀) = Var(Ȳ) + (X₀ − X̄)² Var(b) + Var(ε₀)
Var(Ŷ₀ − Y₀) = σ²_{Y.X}/n + (X₀ − X̄)² σ²_{Y.X}/∑(X − X̄)² + σ²_{Y.X}
Var(Ŷ₀ − Y₀) = σ²_{Y.X} [ 1 + 1/n + (X₀ − X̄)² / ∑(X − X̄)² ]

Prediction Interval for Y₀ of Y

Ŷ₀ ± t_{α/2}(v) s_{Ŷ₀}

where

s_{Ŷ₀} = s_{Y.X} √[ 1 + 1/n + (X₀ − X̄)² / ∑(X − X̄)² ]
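The extra "1 +" inside the square root makes a prediction interval wider than the confidence interval for the mean response. The sketch below computes both at X₀ = 60, reusing Example 1's summary figures, and assumes SciPy for the t quantile.

```python
import math
from scipy import stats

# Example-1 summary figures, as given in the text
n, xbar, Sxx_dev, s_yx = 5, 90.0, 9000.0, 100.16
a, b, x0 = 19.30, 1.03, 60.0

y0_hat = a + b * x0                                       # point prediction

# standard error for predicting an individual Y0 (note the leading 1)
s_pred = s_yx * math.sqrt(1 + 1 / n + (x0 - xbar)**2 / Sxx_dev)
# standard error for estimating the mean response (no leading 1)
s_mean = s_yx * math.sqrt(1 / n + (x0 - xbar)**2 / Sxx_dev)

t_crit = stats.t.ppf(0.975, df=n - 2)
lo, hi = y0_hat - t_crit * s_pred, y0_hat + t_crit * s_pred
print(y0_hat, s_pred, s_mean, (lo, hi))
```

Here s_pred ≈ 114.2 versus s_mean ≈ 54.86, so the 95% prediction interval is roughly twice as wide as the confidence interval for μ_{Y.X₀}.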

Hypothesis Testing about the Slope of the Regression Line through the ANOVA Technique

The total sum of squares in the dependent variable can be computed as:

SSTotal = ∑(Y − Ȳ)²

It can be partitioned into two parts, an explained part and an unexplained part:

∑(Y − Ȳ)² = ∑(Y − Ŷ + Ŷ − Ȳ)²
∑(Y − Ȳ)² = ∑(Ŷ − Ȳ)² + ∑(Y − Ŷ)²   (the cross-product term vanishes)

SSTotal = SS explained variation (SSR) + SS unexplained variation (SSE)

Presentation in the ANOVA Table

Suppose we have p regressors.

S.O.V      | d.f.      | S.S.               | M.S.                  | F
Regression | p         | SSR = ∑(Ŷ − Ȳ)²   | MSR = SSR/p           | MSR/MSE
Residual   | n − p − 1 | SSE = ∑(Y − Ŷ)²   | MSE = SSE/(n − p − 1) |
Total      | n − 1     | SSTotal = ∑(Y − Ȳ)² |                     |

F = MSR/MSE ~ F(p, n − p − 1)

Reject H₀ that all slopes β = 0 if F ≥ F_α(p, n − p − 1).
Further, we can write:

R² = ∑(Ŷ − Ȳ)² / ∑(Y − Ȳ)²

or

R² = 1 − ∑(Y − Ŷ)² / ∑(Y − Ȳ)²,  so that  ∑(Y − Ŷ)² / ∑(Y − Ȳ)² = 1 − R²

F = MSR/MSE = [SSR/p] / [SSE/(n − p − 1)]

F = ((n − p − 1)/p) · [ ∑(Ŷ − Ȳ)² / ∑(Y − Ŷ)² ]

Dividing numerator and denominator by ∑(Y − Ȳ)²:

F = ((n − p − 1)/p) · [ ∑(Ŷ − Ȳ)²/∑(Y − Ȳ)² ] / [ ∑(Y − Ŷ)²/∑(Y − Ȳ)² ]

F = ((n − p − 1)/p) · R²/(1 − R²)

In simple linear regression, p = 1:

F = (n − 2) R² / (1 − R²) ~ F(1, n − 2)

It is concluded that testing H₀: β = 0 is equivalent to testing H₀: ρ²_{Y.X} = 0.
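The algebraic identity F = ((n − p − 1)/p)·R²/(1 − R²) can be confirmed numerically. The following minimal sketch fits a simple regression on arbitrary simulated data (not one of the chapter's examples) and compares the two ways of computing F.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 30, 1
X = rng.normal(size=n)
Y = 2 + 1.5 * X + rng.normal(size=n)

# least-squares slope and intercept
b = ((X - X.mean()) * (Y - Y.mean())).sum() / ((X - X.mean())**2).sum()
a = Y.mean() - b * X.mean()
Yhat = a + b * X

SSR = ((Yhat - Y.mean())**2).sum()    # explained
SSE = ((Y - Yhat)**2).sum()           # unexplained
SST = ((Y - Y.mean())**2).sum()       # total
R2 = SSR / SST

F_anova = (SSR / p) / (SSE / (n - p - 1))          # from the ANOVA table
F_r2 = ((n - p - 1) / p) * R2 / (1 - R2)           # from R^2
print(F_anova, F_r2)
```

The two values agree to floating-point precision, and the partition SSR + SSE = SSTotal also holds exactly.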

Example: Consider the data on advertisement (X) and sales revenue (Y) for an athletic sportswear store for five months. The observations are as follows:

Month | Y (000,000) | X (000)
1     | 3           | 1
2     | 4           | 2
3     | 2           | 3
4     | 6           | 4
5     | 8           | 5

Find the regression equation of sales on advertisement. Assuming normality, test the hypothesis H₀: β = 0 at the 5% significance level.

Solution: The necessary computations:

Month | Y  | X  | XY | X² | Y²
1     | 3  | 1  | 3  | 1  | 9
2     | 4  | 2  | 8  | 4  | 16
3     | 2  | 3  | 6  | 9  | 4
4     | 6  | 4  | 24 | 16 | 36
5     | 8  | 5  | 40 | 25 | 64
Total | 23 | 15 | 81 | 55 | 129

b = [n∑XY − ∑X∑Y] / [n∑X² − (∑X)²] = (5×81 − 15×23)/(5×55 − 15²) = 1.20

X̄ = 15/5 = 3,  Ȳ = 23/5 = 4.60

a = Ȳ − bX̄ = 4.60 − 1.20×3 = 1.00

The estimated regression line of sales on advertisement is given by:

Ŷ = a + bX
Ŷ = 1.00 + 1.20X
i. H₀: β = 0 vs. H₁: β ≠ 0
ii. The significance level: α = 0.05
iii. The test statistic (ANOVA technique):
   F = MSR/MSE ~ F(1, 3)
iv. Reject H₀: β = 0 if F ≥ F_{0.05}(1, 3) = 10.13
v. Computation:

SSTotal = ∑(Y − Ȳ)² = ∑Y² − (∑Y)²/n = 129 − (23)²/5 = 23.20
SSE = ∑(Y − Ŷ)² = ∑Y² − a∑Y − b∑XY = 129 − (1)(23) − 1.20×81 = 8.80
SSR = SSTotal − SSE = 23.20 − 8.80 = 14.40

Presentation in the ANOVA table:

S.O.V      | d.f. | S.S.  | M.S.  | F
Regression | 1    | 14.40 | 14.40 | 4.91
Residual   | 3    | 8.80  | 2.93  |
Total      | 4    | 23.20 |       |

vi. Remarks: The calculated F value falls in the acceptance region. Thus we conclude that the slope is non-significant.
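The entire ANOVA for the sales example can be reproduced from the raw observations. This is a self-contained sketch in plain Python:

```python
# Advertisement (X) and sales revenue (Y) for five months
Y = [3, 4, 2, 6, 8]
X = [1, 2, 3, 4, 5]
n = len(Y)

Sx, Sy = sum(X), sum(Y)
Sxy = sum(x * y for x, y in zip(X, Y))
Sxx = sum(x * x for x in X)
Syy = sum(y * y for y in Y)

b = (n * Sxy - Sx * Sy) / (n * Sxx - Sx**2)   # slope, 1.20
a = Sy / n - b * Sx / n                        # intercept, 1.00

SST = Syy - Sy**2 / n                          # 23.20
SSE = Syy - a * Sy - b * Sxy                   # 8.80
SSR = SST - SSE                                # 14.40
F = (SSR / 1) / (SSE / (n - 2))                # ~4.91
print(b, a, SST, SSE, SSR, F)
```

F ≈ 4.91, which stays below the F(1, 3) critical value, so the slope is declared non-significant as in the remarks above.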
Example: Given the data

Y  | X₁ | X₂
12 | 2  | 1
10 | 2  | 1
9  | 3  | 0
13 | 4  | 0
20 | 4  | 3

obtain the least-squares estimates of the parameters in the multiple linear regression model Y = β₀ + β₁X₁ + β₂X₂ + ε and test the overall significance of the regression coefficients.

Solution:

Y  | X₁ | X₂ | X₁Y | X₂Y | X₁X₂ | X₁² | X₂² | Y²
12 | 2  | 1  | 24  | 12  | 2    | 4   | 1   | 144
10 | 2  | 1  | 20  | 10  | 2    | 4   | 1   | 100
9  | 3  | 0  | 27  | 0   | 0    | 9   | 0   | 81
13 | 4  | 0  | 52  | 0   | 0    | 16  | 0   | 169
20 | 4  | 3  | 80  | 60  | 12   | 16  | 9   | 400
64 | 15 | 5  | 203 | 82  | 16   | 49  | 11  | 894

∑x₁² = ∑X₁² − (∑X₁)²/n = 49 − (15)²/5 = 4
∑x₂² = ∑X₂² − (∑X₂)²/n = 11 − (5)²/5 = 6
∑x₁x₂ = ∑X₁X₂ − ∑X₁∑X₂/n = 16 − (15×5)/5 = 1
∑x₁y = ∑X₁Y − ∑X₁∑Y/n = 203 − (15×64)/5 = 11
∑x₂y = ∑X₂Y − ∑X₂∑Y/n = 82 − (5×64)/5 = 18

The normal equations in deviation form are

b₁∑x₁² + b₂∑x₁x₂ = ∑x₁y
b₁∑x₁x₂ + b₂∑x₂² = ∑x₂y

that is,

4b₁ + b₂ = 11
b₁ + 6b₂ = 18

Multiplying the first equation by 6 and subtracting the second:

24b₁ + 6b₂ = 66
b₁ + 6b₂ = 18
23b₁ = 48 → b₁ = 2.087

Substituting b₁ into the first equation:

4 × 2.087 + b₂ = 11 → b₂ = 2.65

Estimated regression model in deviation form:

ŷ = b₁x₁ + b₂x₂ = 2.087x₁ + 2.65x₂

Re-transforming into the original model (with X̄₁ = 3, X̄₂ = 1, Ȳ = 12.8):

Ŷ − Ȳ = 2.087(X₁ − X̄₁) + 2.65(X₂ − X̄₂)
Ŷ − 12.8 = 2.087(X₁ − 3) + 2.65(X₂ − 1)
Ŷ = (12.8 − 6.261 − 2.65) + 2.087X₁ + 2.65X₂
Ŷ = 3.889 + 2.087X₁ + 2.65X₂

Hypothesis Testing

i. H₀: β₁ = β₂ = 0 vs. H₁: at least one of β₁, β₂ ≠ 0
ii. The significance level: α = 0.05
iii. The test statistic:
   F = MSR/MSE ~ F(2, 2)
iv. Reject H₀ when F ≥ F_{0.05}(2, 2) = 19.00
v. Computation:

SSTotal = ∑Y² − (∑Y)²/n = 894 − (64)²/5 = 74.8
SSE = ∑(Y − Ŷ)² = ∑Y² − b₀∑Y − b₁∑X₁Y − b₂∑X₂Y
SSE = 894 − 3.889×64 − 2.087×203 − 2.65×82 = 4.143
SSR = SSTotal − SSE = 74.8 − 4.143 = 70.657

S.O.V      | d.f. | S.S.   | M.S.   | F
Regression | 2    | 70.657 | 35.329 | 17.05
Residual   | 2    | 4.143  | 2.072  |
Total      | 4    | 74.8   |        |

vi. Remarks: The calculated statistic falls in the acceptance region; the sample data do not provide sufficient evidence to reject the null hypothesis. Thus it is concluded that the regression is not significant overall.
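The normal-equation solution above can be cross-checked by solving the least-squares problem directly with NumPy. Small differences from the hand computation (SSE ≈ 4.10 versus 4.14) come from the two-to-three-decimal rounding of b₀, b₁, b₂ in the text.

```python
import numpy as np

Y = np.array([12, 10, 9, 13, 20], dtype=float)
X1 = np.array([2, 2, 3, 4, 4], dtype=float)
X2 = np.array([1, 1, 0, 0, 3], dtype=float)

# Design matrix with an intercept column; solve by least squares
A = np.column_stack([np.ones_like(Y), X1, X2])
b0, b1, b2 = np.linalg.lstsq(A, Y, rcond=None)[0]

Yhat = A @ np.array([b0, b1, b2])
SSE = ((Y - Yhat)**2).sum()
SST = ((Y - Y.mean())**2).sum()
SSR = SST - SSE
F = (SSR / 2) / (SSE / (len(Y) - 3))
print(b0, b1, b2, SSE, F)
```

This gives b₀ ≈ 3.887, b₁ ≈ 2.087, b₂ ≈ 2.652, matching the hand solution, with F well below the F(2, 2) critical value of 19.00.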

Testing Hypotheses about the Population Variance (or Standard Deviation) of Regression

Let s²_{y.x} be the estimate of σ²_{y.x} computed from the values of a bivariate normal population. If it is desired to test H₀: σ²_{y.x} = σ₀², then the statistic

χ² = (n − 2) s²_{y.x} / σ₀² ~ χ²(n − 2)

follows the chi-square distribution with (n − 2) degrees of freedom.

Testing procedure:
i. H₀: σ²_{y.x} = σ₀² vs. H₁: σ²_{y.x} > σ₀²
   H₀: σ²_{y.x} = σ₀² vs. H₁: σ²_{y.x} < σ₀²
ii. The significance level: α
iii. The test statistic:
   χ² = (n − 2) s²_{y.x} / σ₀² ~ χ²(n − 2)
iv. Reject H₀ when χ² > χ²_α(n − 2)  (for H₁: σ²_{y.x} > σ₀²)
    Reject H₀ when χ² < χ²_{1−α}(n − 2)  (for H₁: σ²_{y.x} < σ₀²)
v. Computation
vi. Conclusion

Testing Hypotheses about the Equality of Slopes of Two Regression Lines

Let b₁ and b₂ be the unbiased estimates of β₁ and β₂ computed from the values of two random, independent samples of sizes n₁, n₂ selected from two normal populations. The sampling distribution of (b₁ − b₂) follows the t-distribution with mean (β₁ − β₂) and pooled standard deviation s_{y.x.p}, where

s²_{y.x.p} = [ (n₁ − 2)s²_{y1.x1} + (n₂ − 2)s²_{y2.x2} ] / (n₁ + n₂ − 4)

or

s_{y.x.p} = √[ ( ∑(Y₁ − Ŷ₁)² + ∑(Y₂ − Ŷ₂)² ) / (n₁ + n₂ − 4) ]

t = [ (b₁ − b₂) − (β₁ − β₂) ] / s_{b₁−b₂} ~ t(n₁ + n₂ − 4)

where

s_{b₁−b₂} = s_{y.x.p} √[ 1/∑(X₁ − X̄₁)² + 1/∑(X₂ − X̄₂)² ]

Testing procedure:
i. H₀: β₁ = β₂ vs. H₁: β₁ ≠ β₂
ii. The significance level: α
iii. The test statistic:
   t = [(b₁ − b₂) − (β₁ − β₂)] / s_{b₁−b₂} ~ t(n₁ + n₂ − 4)
iv. Reject H₀ when |t| ≥ t_{α/2}(n₁ + n₂ − 4)
v. Computation
vi. Conclusion
Example: The various sums for two sets of data, each of 4 observations, are as follows:

Set | ∑X | ∑Y | ∑XY | ∑X² | ∑Y²
A   | 8  | 37 | 76  | 18  | 349
B   | 15 | 47 | 179 | 59  | 557

Find the estimates b₁ and b₂ of the regression coefficients of the two linear regression lines.

Solution: The data of the two sets can be expressed as:

n₁ = 4, ∑X₁ = 8, ∑Y₁ = 37, ∑X₁Y₁ = 76, ∑X₁² = 18, ∑Y₁² = 349
n₂ = 4, ∑X₂ = 15, ∑Y₂ = 47, ∑X₂Y₂ = 179, ∑X₂² = 59, ∑Y₂² = 557

b₁ = [n₁∑X₁Y₁ − ∑X₁∑Y₁] / [n₁∑X₁² − (∑X₁)²] = (4×76 − 8×37)/(4×18 − 8²) = 1
a₁ = Ȳ₁ − b₁X̄₁ = 9.25 − 1×2 = 7.25

The estimated regression line is

Ŷ₁ = a₁ + b₁X₁ = 7.25 + X₁

b₂ = [n₂∑X₂Y₂ − ∑X₂∑Y₂] / [n₂∑X₂² − (∑X₂)²] = (4×179 − 15×47)/(4×59 − 15²) = 1
a₂ = Ȳ₂ − b₂X̄₂ = 11.75 − 3.75 = 8.00

The estimated regression line is

Ŷ₂ = a₂ + b₂X₂ = 8.00 + X₂

Testing the Hypothesis of Equality of the Two Regression Lines

i. H₀: β₁ = β₂ vs. H₁: β₁ ≠ β₂
ii. The significance level: α = 0.05
iii. The test statistic:
   t = [(b₁ − b₂) − (β₁ − β₂)] / s_{b₁−b₂} ~ t(n₁ + n₂ − 4)
iv. Reject H₀ when |t| ≥ t_{0.025}(4) = 2.776
v. Computation:

∑(X₁ − X̄₁)² = ∑X₁² − (∑X₁)²/n₁ = 18 − (8)²/4 = 2
∑(X₂ − X̄₂)² = ∑X₂² − (∑X₂)²/n₂ = 59 − (15)²/4 = 2.75

∑(Y₁ − Ŷ₁)² = ∑Y₁² − a₁∑Y₁ − b₁∑X₁Y₁ = 349 − 7.25×37 − 76 = 4.75
∑(Y₂ − Ŷ₂)² = ∑Y₂² − a₂∑Y₂ − b₂∑X₂Y₂ = 557 − 8×47 − 179 = 2

s_{y.x.p} = √[ ( ∑(Y₁ − Ŷ₁)² + ∑(Y₂ − Ŷ₂)² ) / (n₁ + n₂ − 4) ] = √[(4.75 + 2)/4] = 1.30

s_{b₁−b₂} = s_{y.x.p} √[ 1/∑(X₁ − X̄₁)² + 1/∑(X₂ − X̄₂)² ] = 1.30 √(1/2 + 1/2.75) = 1.21

t = [(b₁ − b₂) − 0] / s_{b₁−b₂} = 0/1.21 = 0

vi. Remarks: The calculated t value falls in the acceptance region, so we do not have sufficient evidence to reject H₀. Thus we conclude that the two regression lines are parallel (equal slopes).
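The pooled test can be reproduced from the two sets of summary sums. A minimal sketch, assuming SciPy for the t quantile:

```python
import math
from scipy import stats

# Summary sums for Set A and Set B (n, Sx, Sy, Sxy, Sxx, Syy)
n1, Sx1, Sy1, Sxy1, Sxx1, Syy1 = 4, 8, 37, 76, 18, 349
n2, Sx2, Sy2, Sxy2, Sxx2, Syy2 = 4, 15, 47, 179, 59, 557

def fit(n, Sx, Sy, Sxy, Sxx):
    """Least-squares intercept and slope from summary sums."""
    b = (n * Sxy - Sx * Sy) / (n * Sxx - Sx**2)
    return Sy / n - b * Sx / n, b

a1, b1 = fit(n1, Sx1, Sy1, Sxy1, Sxx1)          # 7.25, 1.0
a2, b2 = fit(n2, Sx2, Sy2, Sxy2, Sxx2)          # 8.00, 1.0

Sdev1 = Sxx1 - Sx1**2 / n1                      # 2.0
Sdev2 = Sxx2 - Sx2**2 / n2                      # 2.75
SSE1 = Syy1 - a1 * Sy1 - b1 * Sxy1              # 4.75
SSE2 = Syy2 - a2 * Sy2 - b2 * Sxy2              # 2.0

s_p = math.sqrt((SSE1 + SSE2) / (n1 + n2 - 4))  # pooled s, ~1.30
s_diff = s_p * math.sqrt(1 / Sdev1 + 1 / Sdev2) # ~1.21
t = (b1 - b2) / s_diff                          # 0
t_crit = stats.t.ppf(0.975, df=n1 + n2 - 4)     # 2.776
print(b1, b2, s_p, s_diff, t, t_crit)
```

Since both fitted slopes equal 1 exactly, t = 0 and H₀ is retained, as in the remarks above.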

Testing the Hypothesis of Linearity of Regression

To test the null hypothesis that the regression model is linear, that is μ_{y.x} = α + βX, let a random sample of n observations with k distinct values of X be selected, and for each value xᵢ of X let the Y observation be repeated nᵢ times (i = 1, 2, …, k). The observed data can be displayed as:

X  | Y-values                    | Sum
x₁ | y₁₁ y₁₂ ⋯ y₁ⱼ ⋯ y₁ₙ₁       | y₁₀
x₂ | y₂₁ y₂₂ ⋯ y₂ⱼ ⋯ y₂ₙ₂       | y₂₀
⋮  | ⋮                           | ⋮
xᵢ | yᵢ₁ yᵢ₂ ⋯ yᵢⱼ ⋯ yᵢₙᵢ       | yᵢ₀
⋮  | ⋮                           | ⋮
xₖ | yₖ₁ yₖ₂ ⋯ yₖⱼ ⋯ yₖₙₖ       | yₖ₀

where yᵢ₀ represents the sum of the Y values corresponding to xᵢ.

χ₁² = ∑ yᵢ₀²/nᵢ − (∑∑yᵢⱼ)²/n − b²∑(x − x̄)²   (lack-of-fit variation)
χ₂² = ∑∑yᵢⱼ² − ∑ yᵢ₀²/nᵢ   (pure-error variation)

F = [ χ₁²/(k − 2) ] / [ χ₂²/(n − k) ] ~ F(v₁, v₂)

where v₁ = k − 2 and v₂ = n − k.

Example: Use the following data to test the hypothesis that the regression is linear at the 0.05 level of significance:

X | 2 | 2 | 2 | 3  | 3  | 4  | 5  | 5  | 6 | 6  | 6
Y | 4 | 3 | 8 | 18 | 22 | 24 | 24 | 18 | 3 | 10 | 16

Solution: The estimated regression line. With n = 11, ∑X = 44, ∑Y = 150, ∑XY = 634, ∑X² = 204:

b = [n∑XY − ∑X∑Y] / [n∑X² − (∑X)²] = (11×634 − 44×150)/(11×204 − 44²) = 1.214

X̄ = 44/11 = 4,  Ȳ = 150/11 = 13.64

a = Ȳ − bX̄ = 13.64 − 1.214×4 = 8.78

Estimated regression line: Ŷ = 8.78 + 1.214X

i. H₀: the regression is linear vs. H₁: the regression is not linear
ii. The significance level: α = 0.05
iii. The test statistic:
   F = [χ₁²/(k − 2)] / [χ₂²/(n − k)] ~ F(v₁, v₂)
iv. Reject H₀ when F ≥ F_{0.05}(3, 6) = 4.76
v. Computation:

X | Y-values  | yᵢ₀
2 | 4, 3, 8   | 15
3 | 18, 22    | 40
4 | 24        | 24
5 | 24, 18    | 42
6 | 3, 10, 16 | 29
∑∑yᵢⱼ = 150

∑∑yᵢⱼ² = (4)² + (3)² + … + (16)² = 2738

∑(x − x̄)² = ∑x² − (∑x)²/n = 204 − (44)²/11 = 28

χ₁² = ∑yᵢ₀²/nᵢ − (∑∑yᵢⱼ)²/n − b²∑(x − x̄)²
χ₁² = (15)²/3 + (40)²/2 + (24)²/1 + (42)²/2 + (29)²/3 − (150)²/11 − (1.214)²×28
χ₁² = 2613.33 − 2045.45 − 41.26 = 526.62

χ₂² = ∑∑yᵢⱼ² − ∑yᵢ₀²/nᵢ = 2738 − 2613.33 = 124.67

F = [526.62/3] / [124.67/6] = 175.54/20.78 = 8.45

vi. Remarks: The computed F value exceeds F_{0.05}(3, 6) = 4.76 and falls in the critical region, so H₀ is rejected. It is concluded that the regression is not linear.

Inferences about Correlations

Confidence Interval Estimate for the Population Correlation Coefficient

Let r be the estimate of ρ computed from the values of a random sample of size n selected from a bivariate normal population having correlation coefficient ρ. The sampling distribution of r approaches normality only for very large sample sizes:

z = (r − ρ) / [ (1 − r²)/√n ]

But this test is not recommended because it requires a very large sample size.

Facing this situation, R. A. Fisher introduced a transformation, known as Fisher's z-transformation, which transforms the non-normal variable into an approximately normal one:

z = (z_f − μ_z) / (1/√(n − 3))

where

z_f = 1.1513 log[(1 + r)/(1 − r)]
μ_z = 1.1513 log[(1 + ρ)/(1 − ρ)]

To construct a 100(1 − α)% confidence interval for μ_z, choose the two values −z_{α/2} and z_{α/2} and make the following probability statement:

P(−z_{α/2} ≤ z ≤ z_{α/2}) = 1 − α
−z_{α/2} ≤ (z_f − μ_z)/(1/√(n − 3)) ≤ z_{α/2}
z_f − z_{α/2}/√(n − 3) ≤ μ_z ≤ z_f + z_{α/2}/√(n − 3)

or, compactly,

z_f ± z_{α/2}/√(n − 3)

Re-transform the lower and upper limits of μ_z using Fisher's z-table to construct the confidence interval for ρ.
Table 1: Fisher's z-transform (values of z_f for given r; row label plus column digit give r).
.00 .01 .02 .03 .04 .05 .06 .07 .08 .09
.0 .0000 .0100 .0200 .0300 .0400 .0500 .0601 .0701 .0802 .0902
.1 .1003 .1104 .1206 .1307 .1409 .1511 .1614 .1717 .1820 .1923
.2 .2027 .2132 .2237 .2342 .2448 .2554 .2661 .2769 .2877 .2986
.3 .3095 .3205 .3316 .3428 .3541 .3654 .3769 .3884 .4001 .4118
.4 .4236 .4356 .4477 .4599 .4722 .4847 .4973 .5101 .5230 .5361
.5 .5493 .5627 .5763 .5901 .6042 .6184 .6328 .6475 .6625 .6777
.6 .6931 .7089 .7250 .7414 .7582 .7753 .7928 .8107 .8291 .8480
.7 .8673 .8872 .9076 .9287 .9505 .9730 .9962 1.0203 1.0454 1.0714
.8 1.0986 1.1270 1.1568 1.1881 1.2212 1.2562 1.2933 1.3331 1.3758 1.4219
.000 .001 .002 .003 .004 .005 .006 .007 .008 .009
.90 1.4722 1.4775 1.4828 1.4882 1.4937 1.4992 1.5047 1.5103 1.5160 1.5217
.91 1.5275 1.5334 1.5393 1.5453 1.5513 1.5574 1.5636 1.5698 1.5762 1.5826

.92 1.5890 1.5956 1.6022 1.6089 1.6157 1.6226 1.6296 1.6366 1.6438 1.6510
.93 1.6584 1.6658 1.6734 1.6811 1.6888 1.6967 1.7047 1.7129 1.7211 1.7295
.94 1.7380 1.7467 1.7555 1.7645 1.7736 1.7828 1.7923 1.8019 1.8117 1.8216
.95 1.8318 1.8421 1.8527 1.8635 1.8745 1.8857 1.8972 1.9090 1.9210 1.9333
.96 1.9459 1.9588 1.9721 1.9857 1.9996 2.0139 2.0287 2.0439 2.0595 2.0756
.97 2.0923 2.1095 2.1273 2.1457 2.1649 2.1847 2.2054 2.2269 2.2494 2.2729
.98 2.2976 2.3235 2.3507 2.3796 2.4101 2.4427 2.4774 2.5147 2.5550 2.5987
.99 2.6467 2.6996 2.7587 2.8257 2.9031 2.9945 3.1063 3.2504 3.4534 3.8002

Example: A random sample of size n = 23, taken from a bivariate normal population, showed a correlation coefficient of 0.59. Compute a 95% confidence interval for ρ.

Solution:

r = 0.59

z_f = 1.1513 log[(1 + r)/(1 − r)] = 1.1513 log(1.59/0.41) = 0.677

z_{α/2} = z_{0.025} = 1.96

95% confidence interval for μ_z:

z_f ± z_{α/2}/√(n − 3)
0.677 ± 1.96/√20
0.677 ± 0.438

0.239 ≤ μ_z ≤ 1.115

Re-transforming via the Fisher z-table:

0.24 ≤ ρ ≤ 0.81
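Since 1.1513 log₁₀[(1 + r)/(1 − r)] = ½ ln[(1 + r)/(1 − r)] = arctanh r, the whole calculation reduces to the hyperbolic functions in Python's standard library. A minimal sketch:

```python
import math

n, r = 23, 0.59

zf = 0.5 * math.log((1 + r) / (1 - r))    # Fisher's z, same as math.atanh(r)
half = 1.96 / math.sqrt(n - 3)            # z_{0.025} / sqrt(n - 3)
lo, hi = zf - half, zf + half

# back-transform the z-scale limits to the r scale (inverse of Fisher's z)
r_lo, r_hi = math.tanh(lo), math.tanh(hi)
print(zf, (lo, hi), (r_lo, r_hi))
```

This reproduces z_f ≈ 0.678 and the interval roughly 0.24 ≤ ρ ≤ 0.81, without needing the printed table.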

Hypothesis Testing that the Correlation Coefficient Has a Specified Value (H₀: ρ = ρ₀)

Let r be the estimate of ρ computed from the values of a random sample of size n selected from a bivariate normal population having correlation coefficient ρ. The sampling distribution of r approaches normality only for very large sample sizes:

z = (r − ρ) / [ (1 − r²)/√n ]

But this test is not recommended because it requires a very large sample size. Using Fisher's z-transformation instead:

z = (z_f − μ_z) / (1/√(n − 3))

where

z_f = 1.1513 log[(1 + r)/(1 − r)],  μ_z = 1.1513 log[(1 + ρ₀)/(1 − ρ₀)]

Testing procedure:
i. H₀: ρ = ρ₀ vs. H₁: ρ ≠ ρ₀
ii. The significance level: α
iii. The test statistic:
   z = (z_f − μ_z)/(1/√(n − 3))
iv. Reject H₀ when |z| ≥ z_{α/2}
v. Computation
vi. Remarks
Example: A value of r of 0.6 is calculated from a random sample of 39 pairs of observations from a bivariate normal population. Is this value of r consistent with the hypothesis that ρ = 0.4?

Solution:
i. H₀: ρ = 0.4 vs. H₁: ρ ≠ 0.4
ii. The significance level: α = 0.05
iii. The test statistic:
   z = (z_f − μ_z)/(1/√(n − 3))
iv. Reject H₀ when |z| ≥ z_{0.025} = 1.96
v. Computation:
   z_f = 1.1513 log(1.6/0.4) = 0.693
   μ_z = 1.1513 log(1.4/0.6) = 0.423
   z = (0.693 − 0.423)/(1/√36) = 0.270 × 6 = 1.62
vi. Remarks: Since |z| = 1.62 < 1.96, the statistic falls in the acceptance region; the data are consistent with the hypothesis that ρ = 0.4.
Hypothesis Testing that the Correlation Coefficient Is Zero (H₀: ρ = 0)

Let r be the estimate of ρ computed from the values of a random sample of size n selected from a bivariate normal population having correlation coefficient ρ. If ρ = 0, the sampling distribution of r leads to the t-distribution with (n − 2) degrees of freedom.

If it is desired to test H₀: ρ = 0, the test statistic will be

t = r√(n − 2) / √(1 − r²) ~ t(n − 2)

or, equivalently,

F = r²(n − 2) / (1 − r²) ~ F(1, n − 2)
Example: A sample of size 12 yielded r = 0.32. Test H₀: ρ = 0 against the alternative H₁: ρ ≠ 0. Let α = 0.01.

Solution:
i. H₀: ρ = 0 vs. H₁: ρ ≠ 0
ii. The significance level: α = 0.01
iii. The test statistic:
   t = r√(n − 2)/√(1 − r²) ~ t(n − 2)
iv. Reject H₀ when |t| ≥ t_{0.005}(10) = 3.169
v. Computation:
   t = 0.32√(12 − 2) / √(1 − (0.32)²) = 1.07
vi. Remarks: The null hypothesis is not rejected; thus we conclude that the two variables X and Y are independent (uncorrelated).
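The test, its critical value, and the corresponding p-value can be checked in a few lines, assuming SciPy:

```python
import math
from scipy import stats

n, r = 12, 0.32
t = r * math.sqrt(n - 2) / math.sqrt(1 - r**2)   # ~1.07
t_crit = stats.t.ppf(0.995, df=n - 2)            # t_{0.005}(10) ~ 3.169
p_value = 2 * stats.t.sf(abs(t), df=n - 2)       # two-sided p-value
print(t, t_crit, p_value)
```

With |t| ≈ 1.07 far below 3.169 (two-sided p ≈ 0.31), H₀: ρ = 0 is retained.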

Example: What will be the least value of r in a random sample of 38 pairs that is significant (i) at the 0.05 level, (ii) at the 0.01 level?

Solution: r will be significant if the null hypothesis H₀: ρ = 0 is rejected, i.e. when

|t| ≥ t_{α/2}(n − 2)

The least significant value satisfies |t| = t_{α/2}(n − 2).

(i) At the 0.05 level, |t| = t_{0.025}(36) = 2.021:

r√(38 − 2) / √(1 − r²) = 2.021
6r = 2.021 √(1 − r²)
36r² = 4.08 (1 − r²)
40.08r² = 4.08
r² = 0.1018 → r = 0.32

(ii) At the 0.01 level, |t| = t_{0.005}(36) ≈ 2.72:

6r = 2.72 √(1 − r²)
36r² = 7.40 (1 − r²)
43.40r² = 7.40
r² = 0.1705 → r = 0.41

Hypothesis Testing about ρ₁ = ρ₂

Let r₁ and r₂ be the sample correlation coefficients computed from the values of two random, independent samples of n₁ and n₂ pairs selected from two bivariate normal populations having correlation coefficients ρ₁ and ρ₂. The sampling distribution of (r₁ − r₂) does not follow the normal distribution, so Fisher's z-transformation is used to test the null hypothesis that the population correlation coefficients are identical:

z = (z_{f₁} − z_{f₂}) / √[ 1/(n₁ − 3) + 1/(n₂ − 3) ] ~ N(0, 1)

Testing procedure:
i. H₀: ρ₁ = ρ₂ vs. H₁: ρ₁ ≠ ρ₂
ii. The significance level: α
iii. The test statistic:
   z = (z_{f₁} − z_{f₂}) / √[ 1/(n₁ − 3) + 1/(n₂ − 3) ] ~ N(0, 1)
iv. Reject H₀ when |z| ≥ z_{α/2}
v. Computation
vi. Remarks
vi. Remarks
Example: Two independent samples have 28 and 19 pairs of observations with correlation coefficients 0.55 and 0.75, respectively. Are these values of r consistent with the hypothesis that the samples have been drawn from the same population?

Solution:
i. H₀: ρ₁ = ρ₂ vs. H₁: ρ₁ ≠ ρ₂
ii. The significance level: α = 0.05
iii. The test statistic:
   z = (z_{f₁} − z_{f₂}) / √[ 1/(n₁ − 3) + 1/(n₂ − 3) ] ~ N(0, 1)
iv. Reject H₀ when |z| ≥ 1.96
v. Computation:
   z_{f₁} = 1.1513 log(1.55/0.45) = 0.62
   z_{f₂} = 1.1513 log(1.75/0.25) = 0.97
   z = (0.62 − 0.97) / √[ 1/(28 − 3) + 1/(19 − 3) ] = −1.093
vi. Remarks: The null hypothesis is not rejected; it is concluded that both samples came from the same population.
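A compact check of the two-sample comparison (slight differences from the hand value −1.09 come from rounding z_f to two decimals in the text):

```python
import math

def fisher_z(r):
    """Fisher's z-transformation, arctanh(r)."""
    return 0.5 * math.log((1 + r) / (1 - r))

n1, r1 = 28, 0.55
n2, r2 = 19, 0.75

z = (fisher_z(r1) - fisher_z(r2)) / math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
print(z)
```

With unrounded z_f values, z ≈ −1.11, still well inside (−1.96, 1.96), so H₀ is retained.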

Hypothesis Testing about the Equality of Several Correlation Coefficients

Let rᵢ (i = 1, 2, …, k) be the sample correlation coefficients computed from k random samples of nᵢ (i = 1, 2, …, k) pairs of observations, respectively, selected from k bivariate normal populations with corresponding population correlation coefficients ρᵢ (i = 1, 2, …, k).

If it is desired to test the null hypothesis

H₀: ρ₁ = ρ₂ = … = ρₖ

apply Fisher's z-transformation and use the following test statistic:

u = ∑ᵢ (nᵢ − 3)(z_{fᵢ} − z̄)² ~ χ²(k − 1)

where

z̄ = ∑(nᵢ − 3) z_{fᵢ} / ∑(nᵢ − 3)

Example: Random samples of 10, 15, and 20 pairs are drawn from bivariate normal populations, yielding r = 0.3, 0.4, 0.49 respectively. Form a combined estimate of ρ and test the hypothesis that the correlations are homogeneous.

Solution:
i. H₀: ρ₁ = ρ₂ = ρ₃ vs. H₁: not all ρᵢ are equal
ii. The significance level: α = 0.05
iii. The test statistic:
   u = ∑(nᵢ − 3)(z_{fᵢ} − z̄)² ~ χ²(k − 1)
iv. Reject H₀ when u ≥ χ²_{0.05}(2) = 5.99
v. Computation, with z_{fᵢ} = 1.1513 log[(1 + rᵢ)/(1 − rᵢ)]:

nᵢ | nᵢ − 3 | rᵢ   | z_{fᵢ} | (nᵢ − 3)z_{fᵢ} | (nᵢ − 3)(z_{fᵢ} − z̄)²
10 | 7      | 0.3  | 0.31   | 2.17            | 0.1372
15 | 12     | 0.4  | 0.42   | 5.04            | 0.0108
20 | 17     | 0.49 | 0.53   | 9.01            | 0.1088
   | 36     |      |        | 16.22           | 0.2568

z̄ = ∑(nᵢ − 3)z_{fᵢ} / ∑(nᵢ − 3) = 16.22/36 = 0.45

u = ∑(nᵢ − 3)(z_{fᵢ} − z̄)² = 0.26

The combined estimate of ρ, obtained by re-transforming z̄ = 0.45, is r ≈ 0.42.

vi. Remarks: Since u = 0.26 < 5.99, H₀ is not rejected; the correlation coefficients are homogeneous.
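The homogeneity statistic and the combined estimate can be computed directly; with unrounded z_f values u comes out near 0.27, comfortably below χ²_{0.05}(2) = 5.99 either way. A minimal sketch:

```python
import math

def fisher_z(r):
    """Fisher's z-transformation, arctanh(r)."""
    return 0.5 * math.log((1 + r) / (1 - r))

samples = [(10, 0.3), (15, 0.4), (20, 0.49)]   # (n_i, r_i)
w = [n - 3 for n, _ in samples]                # weights n_i - 3
zf = [fisher_z(r) for _, r in samples]

zbar = sum(wi * zi for wi, zi in zip(w, zf)) / sum(w)
u = sum(wi * (zi - zbar)**2 for wi, zi in zip(w, zf))

r_combined = math.tanh(zbar)                   # combined estimate of rho
print(zbar, u, r_combined)
```

This gives z̄ ≈ 0.455, u ≈ 0.27, and a combined estimate r ≈ 0.43.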


Hypothesis Testing about a Partial Correlation Coefficient of Order k

Let r_{ij.123…k} be the estimate of ρ_{ij.123…k} computed from the values of a random sample of n pairs of observations selected from a normally distributed population. If it is desired to test H₀: ρ_{ij.123…k} = 0, the test statistic to be used is

t = r_{ij.123…k} √(n − k − 2) / √(1 − r²_{ij.123…k}) ~ t(n − k − 2)

or, equivalently,

F = r²_{ij.123…k} (n − k − 2) / (1 − r²_{ij.123…k}) ~ F(1, n − k − 2)

Example: Given n = 20 and r₁₂.₃₄ = 0.51, test by means of the t and F tests the hypothesis that ρ₁₂.₃₄ = 0.

Solution:
i. H₀: ρ₁₂.₃₄ = 0 vs. H₁: ρ₁₂.₃₄ ≠ 0
ii. The significance level: α = 0.05
iii. The test statistic (with k = 2):
   t = r₁₂.₃₄ √(n − k − 2) / √(1 − r²₁₂.₃₄) ~ t(20 − 2 − 2)
iv. Reject H₀ when |t| ≥ t_{0.025}(16) = 2.120
v. Computation:
   t = 0.51√16 / √(1 − (0.51)²) = 2.37
   Equivalently, F = t² = 5.62 ≥ F_{0.05}(1, 16) = 4.49.
vi. Remarks: The null hypothesis is rejected; the partial correlation is significant.
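The t and F forms of the test are related by F = t² (and F_{α}(1, v) = t_{α/2}(v)²), which the following sketch verifies for this example, assuming SciPy:

```python
import math
from scipy import stats

n, k, r = 20, 2, 0.51        # r_{12.34}: second-order partial, so k = 2
df = n - k - 2               # 16 degrees of freedom

t = r * math.sqrt(df) / math.sqrt(1 - r**2)     # ~2.37
F = r**2 * df / (1 - r**2)                      # equals t^2

t_crit = stats.t.ppf(0.975, df=df)              # 2.120
F_crit = stats.f.ppf(0.95, 1, df)               # = t_crit^2
print(t, F, t_crit, F_crit)
```

Both forms reject H₀: |t| ≈ 2.37 > 2.120 and F ≈ 5.62 > 4.49.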

Confidence Interval for a Partial Correlation of Order k

Let r_{ij.123…k} be the estimate of ρ_{ij.123…k} computed from the values of a random sample of n pairs of observations selected from a normally distributed population. Since r_{ij.123…k} does not follow the normal distribution, Fisher's z-transformation is used to convert it into an approximately normal variable:

z = (z_{f ij.12…k} − μ_{z ij.12…k}) / (1/√(n − k − 3))

where

z_{f ij.12…k} = 1.1513 log[ (1 + r_{ij.123…k}) / (1 − r_{ij.123…k}) ]

Now, to construct a 100(1 − α)% confidence interval for μ_{z ij.12…k}:

P(−z_{α/2} ≤ z ≤ z_{α/2}) = 1 − α

which leads to

z_{f ij.12…k} ± z_{α/2}/√(n − k − 3)
Testing the Hypothesis that a Multiple Correlation Coefficient Is Zero

Let R_{y.12…p} be the estimate of ρ_{y.12…p} computed from the values of a random sample of size n selected from a normally distributed population. If it is desired to test H₀: ρ_{y.12…p} = 0, which is equivalent to testing H₀: β₁ = β₂ = … = βₚ = 0, the test statistic to be used is

F = R²_{y.12…p} (n − p − 1) / [ p(1 − R²_{y.12…p}) ] ~ F(p, n − p − 1)
