Unit4 Multivariate Analysis
where x is called the predictor or regressor variable and y is called the response variable. The quantity ε is called the error, the difference between the observed value and the value given by the line. Note that y = β0 + β1x is the equation of the least squares straight line connecting the variables x and y, where β0 is the intercept and β1 is the slope.
Suppose that we fix the value of the regressor variable x and observe the corresponding value of the response y. Then the conditional mean of y given x is
E(y|x) = μ_{y|x} = E(β0 + β1x + ε) = β0 + β1x
Similarly,
Var(y|x) = σ²_{y|x} = Var(β0 + β1x + ε) = σ²
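The two identities above can be checked by simulation. A minimal sketch (plain Python; the parameter values β0 = 10, β1 = 2, σ = 3 and x = 4 are illustrative, not from the text):

```python
import random

random.seed(0)
b0, b1, sigma = 10.0, 2.0, 3.0   # illustrative model parameters
x = 4.0                          # fix the regressor at one value

# Draw many responses y = b0 + b1*x + eps, with eps ~ N(0, sigma^2)
ys = [b0 + b1 * x + random.gauss(0.0, sigma) for _ in range(200_000)]

mean = sum(ys) / len(ys)
var = sum((y - mean) ** 2 for y in ys) / len(ys)

print(mean)  # close to b0 + b1*x = 18
print(var)   # close to sigma^2 = 9
```

With 200,000 draws the sample mean and variance land very close to β0 + β1x and σ², mirroring E(y|x) and Var(y|x) above.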
Regression models are used for several purposes. The simple linear regression model is
y = β0 + β1x + ε → (1)
where β0 is the intercept, β1 is the slope and ε is a random error component. The errors are normally distributed with mean zero and variance σ².
We will use the method of least squares to estimate the parameters β0 and β1 in (1). That is, we will estimate β0 and β1 so that the sum of squares of the differences between the observations yi, i = 1, 2, …, n, and the straight line,
S(β0, β1) = Σ (yi − β0 − β1xi)²   (all sums run over i = 1, …, n),
is a minimum.
Differentiating with respect to β0 and β1 and setting the derivatives to zero gives
∂S/∂β0 = −2 Σ (yi − β̂0 − β̂1xi) = 0
∂S/∂β1 = −2 Σ (yi − β̂0 − β̂1xi) xi = 0
which simplify to the normal equations
n β̂0 + β̂1 Σ xi = Σ yi
β̂0 Σ xi + β̂1 Σ xi² = Σ xi yi → (5)
β̂0 and β̂1 are called the least squares estimators of β0 and β1. Solving the normal equations (5) gives
β̂0 = ȳ − β̂1 x̄ → (6)
and
β̂1 = [Σ xi yi − (Σ xi)(Σ yi)/n] / [Σ xi² − (Σ xi)²/n] → (7)
Equation (6) is obtained from the first of the normal equations (5) after dividing by n, where ȳ = (1/n) Σ yi and x̄ = (1/n) Σ xi. The fitted regression line is
ŷ = β̂0 + β̂1 x → (8)
Equation (8) gives the point estimate of the mean of y for a particular x.
Let us denote
Sxx = Σ xi² − (Σ xi)²/n = Σ (xi − x̄)² → (9)
and
Sxy = Σ xi yi − (Σ xi)(Σ yi)/n = Σ yi (xi − x̄) → (10)
Now equation (7) can be written as
β̂1 = Sxy / Sxx → (11)
The difference between the observed value yi and the corresponding fitted value ŷi is called the residual. That is,
ei = yi − ŷi = yi − (β̂0 + β̂1 xi), i = 1, 2, …, n → (12)
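Equations (6) and (9)–(12) translate directly into code. A minimal sketch (plain Python; the three data points are illustrative, chosen to lie exactly on y = 1 + 2x so the answer is easy to verify):

```python
def least_squares_fit(x, y):
    """Fit y = b0 + b1*x by least squares using Sxx and Sxy."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    Sxx = sum((xi - xbar) ** 2 for xi in x)                      # equation (9)
    Sxy = sum(yi * (xi - xbar) for xi, yi in zip(x, y))          # equation (10)
    b1 = Sxy / Sxx                                               # equation (11)
    b0 = ybar - b1 * xbar                                        # equation (6)
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]    # equation (12)
    return b0, b1, residuals

# Points lying exactly on y = 1 + 2x give a perfect fit with zero residuals
b0, b1, e = least_squares_fit([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])
print(b0, b1, e)  # 1.0 2.0 [0.0, 0.0, 0.0]
```

Because the points are collinear, the fitted intercept and slope recover 1 and 2 exactly and every residual is zero.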
Problem: Fit a least squares regression line to the following data (n = 8):
i :     1        2        3        4        5        6        7        8
yi:  2158.70  1678.15  2316.00  2061.30  2207.50  1708.30  1784.70  2575.00
xi:    15.50    23.75     8.00    17.00     5.50    19.00    24.00     2.50
Solution: Here x̄ = 14.4063, ȳ = 2061.2063,
Σ xi² = 2130.8125, Σ xi yi = 220755.2625,
Sxx = 470.49218, Sxy = −16798.7578
Now β̂1 = Sxy/Sxx = −16798.7578/470.49218 = −35.70464
β̂0 = ȳ − β̂1 x̄ = 2061.2063 − (−35.70464)(14.4063) = 2575.576
The fitted line is ŷ = β̂0 + β̂1 x = 2575.576 − 35.70464 x
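The arithmetic above can be verified by recomputing Sxx, Sxy, β̂1 and β̂0 from the eight data points:

```python
# Data from the worked example: y = observed response, x = regressor
x = [15.50, 23.75, 8.00, 17.00, 5.50, 19.00, 24.00, 2.50]
y = [2158.70, 1678.15, 2316.00, 2061.30, 2207.50, 1708.30, 1784.70, 2575.00]
n = len(x)

Sxx = sum(xi * xi for xi in x) - sum(x) ** 2 / n                  # 470.492...
Sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n  # -16798.757...
b1 = Sxy / Sxx                                                    # slope
b0 = sum(y) / n - b1 * sum(x) / n                                 # intercept

print(round(b1, 5), round(b0, 3))  # -35.70465 2575.576
```

This reproduces the fitted line ŷ ≈ 2575.576 − 35.705x up to rounding of the hand calculation.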
Properties of the least squares estimators and the fitted regression model:
(1) β̂1 is a linear combination of the observations yi:
β̂1 = Sxy/Sxx
= Σ yi (xi − x̄) / Σ (xi − x̄)²
= Σ ci yi, where ci = (xi − x̄) / Σ (xi − x̄)² = (xi − x̄)/Sxx
(2) The least squares estimators β̂0 and β̂1 are unbiased estimators of the model parameters β0 and β1,
i.e., E(β̂0) = β0 and E(β̂1) = β1.
Their variances are
Var(β̂0) = σ² (1/n + x̄²/Sxx)
Var(β̂1) = σ²/Sxx
where β̂0 = ȳ − β̂1 x̄ and β̂1 = Sxy/Sxx.
(3) The sum of the residuals in any regression model that contains an intercept β0 is always zero,
i.e., Σ (yi − ŷi) = 0.
Consider
Σ (yi − ŷi) = Σ yi − Σ (β̂0 + β̂1 xi)
= Σ yi − n β̂0 − β̂1 Σ xi
= n ȳ − n β̂0 − n β̂1 x̄
= n ȳ − n (ȳ − β̂1 x̄) − n β̂1 x̄      [since β̂0 = ȳ − β̂1 x̄]
= n ȳ − n ȳ + n β̂1 x̄ − n β̂1 x̄
= 0
(4) The sum of the observed values yi equals the sum of the fitted values ŷi,
i.e., Σ yi = Σ ŷi.
(5) The least squares regression line always passes through the point (x̄, ȳ).
(6) The sum of the residuals weighted by the corresponding value of the regressor variable always equals zero,
i.e., Σ xi ei = 0.
(7) The sum of the residuals weighted by the corresponding fitted value always equals zero,
i.e., Σ ŷi ei = 0.
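Properties (3)–(7) are easy to confirm numerically. The sketch below fits a line to a small illustrative data set and checks each sum (exact zeros may come out as ~1e-15 in floating point, hence the tolerance):

```python
# Illustrative data; any data set works since the properties hold identically
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 5.0, 4.0, 9.0, 10.0]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

Sxx = sum((xi - xbar) ** 2 for xi in x)
Sxy = sum(yi * (xi - xbar) for xi, yi in zip(x, y))
b1 = Sxy / Sxx
b0 = ybar - b1 * xbar

yhat = [b0 + b1 * xi for xi in x]
e = [yi - yh for yi, yh in zip(y, yhat)]

print(abs(sum(e)) < 1e-9)                                   # (3) sum of residuals = 0
print(abs(sum(y) - sum(yhat)) < 1e-9)                       # (4) sum observed = sum fitted
print(abs((b0 + b1 * xbar) - ybar) < 1e-9)                  # (5) line passes through (xbar, ybar)
print(abs(sum(xi * ei for xi, ei in zip(x, e))) < 1e-9)     # (6) sum x_i * e_i = 0
print(abs(sum(yh * ei for yh, ei in zip(yhat, e))) < 1e-9)  # (7) sum yhat_i * e_i = 0
```

All five checks print True.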
Estimation of σ²:
An estimator of σ² is obtained from the residual sum of squares SSRes as follows. Since
E(SSRes/(n − 2)) = σ²,
an unbiased estimator of σ² is
σ̂² = SSRes/(n − 2) = MSRes
Now
SSRes = Σ (yi − ŷi)²
= Σ (yi − β̂0 − β̂1 xi)²
= Σ (yi − ȳ + β̂1 x̄ − β̂1 xi)²      [since β̂0 = ȳ − β̂1 x̄]
= Σ [(yi − ȳ) − β̂1 (xi − x̄)]²
= Σ (yi − ȳ)² + β̂1² Σ (xi − x̄)² − 2 β̂1 Σ (xi − x̄)(yi − ȳ)
= Syy + β̂1² Sxx − 2 β̂1 Sxy
= Syy + β̂1² Sxx − 2 β̂1 (β̂1 Sxx)      [since β̂1 = Sxy/Sxx, i.e., Sxy = β̂1 Sxx]
= Syy + β̂1² Sxx − 2 β̂1² Sxx
= Syy − β̂1² Sxx
= Syy − (Sxy²/Sxx²) Sxx
= Syy − Sxy²/Sxx
Equivalently, SSRes = Syy − β̂1 Sxy.
(Let us denote SST = Syy = Σ (yi − ȳ)².)
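The identity SSRes = Syy − β̂1 Sxy can be checked against a direct sum of squared residuals. A minimal sketch with illustrative data:

```python
# Illustrative data set (any data gives equal values for the two computations)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 5.0, 4.0, 9.0, 10.0]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

Sxx = sum((xi - xbar) ** 2 for xi in x)
Sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
Syy = sum((yi - ybar) ** 2 for yi in y)
b1 = Sxy / Sxx
b0 = ybar - b1 * xbar

# Residual sum of squares two ways: by definition, and via the identity
SSres_direct = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
SSres_identity = Syy - b1 * Sxy          # = Syy - Sxy**2/Sxx

MSres = SSres_identity / (n - 2)         # unbiased estimate of sigma^2
print(SSres_direct, SSres_identity, MSres)  # 6.0 6.0 2.0
```

For these numbers Sxx = 10, Sxy = 20, Syy = 46, so both routes give SSRes = 46 − 2·20 = 6 and MSRes = 6/3 = 2.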
Hypothesis testing on the slope (σ² known): Suppose that we wish to test the hypothesis that the slope equals a constant, say β10. The appropriate hypotheses are
H0: β1 = β10 against H1: β1 ≠ β10
Test statistic: Z0 = (β̂1 − β10) / √(σ²/Sxx)
The (1 − α)100% confidence limits for β1 are
β̂1 − Z_{α/2} √(σ²/Sxx) ≤ β1 ≤ β̂1 + Z_{α/2} √(σ²/Sxx)
Hypothesis testing on the slope (σ² unknown): Suppose that we wish to test the hypothesis that the slope equals a constant, say β10. The appropriate hypotheses are
H0: β1 = β10 against H1: β1 ≠ β10
Test statistic: t0 = (β̂1 − β10) / √(MSRes/Sxx) = (β̂1 − β10) / √(SSRes/((n − 2)Sxx))
The (1 − α)100% confidence limits for β1 are
β̂1 − t_{α/2}(n − 2) d.f. √(SSRes/((n − 2)Sxx)) ≤ β1 ≤ β̂1 + t_{α/2}(n − 2) d.f. √(SSRes/((n − 2)Sxx))
Writing
se(β̂1) = √(MSRes/Sxx)
the test statistic becomes
t0 = (β̂1 − β10) / se(β̂1)
Problem: The following are measurements of the air velocity and evaporation coefficient of burning fuel droplets in an impulse engine:
Air velocity (cm/sec):    20    60   100   140   180   220   260   300   340   380
Evap. coeff. (mm²/sec): 0.18  0.37  0.35  0.78  0.56  0.75  1.18  1.36  1.17  1.65
Solution: (i) Here n = 10 and
Σ xi = 2000, Σ xi² = 532000, Σ yi = 8.35, Σ xi yi = 2175.40, Σ yi² = 9.1097
Sxx = Σ xi² − (Σ xi)²/n = 532000 − (2000)²/10 = 132000
Sxy = Σ xi yi − (Σ xi)(Σ yi)/n = 2175.40 − (2000)(8.35)/10 = 505.40
Syy = Σ yi² − (Σ yi)²/n = 9.1097 − (8.35)²/10 = 2.13745
Now β̂1 = Sxy/Sxx = 505.40/132000 = 0.00383
β̂0 = ȳ − β̂1 x̄ = 8.35/10 − 0.00383 × 2000/10 = 0.069
The fitted line is ŷ = 0.069 + 0.00383x
(ii) Null hypothesis, H0: β1 = 0
Alternative hypothesis, H1: β1 ≠ 0
Test statistic: t0 = (β̂1 − β10) / √(MSRes/Sxx), where MSRes = SSRes/(n − 2)
SSRes = Syy − Sxy²/Sxx = 2.13745 − (505.40)²/132000 = 0.20238
MSRes = 0.20238/8 = 0.0252975
t0 = 0.00383 / √(0.0252975/132000) = 8.7488
Since |t0| = 8.7488 exceeds t0.025(8 d.f.) = 2.306, we reject the null hypothesis and accept the alternative H1: β1 ≠ 0.
(iii) The (1 − α)100% confidence limits for β1 are
β̂1 ± t_{α/2}(n − 2) d.f. √(SSRes/((n − 2)Sxx)) = β̂1 ± t_{α/2}(n − 2) d.f. √(MSRes/Sxx)
Here (1 − α)100 = 95, so α = 0.05 and α/2 = 0.025.
MSRes = 0.0252975, Sxx = 132000
√(MSRes/Sxx) = √(0.0252975/132000) = 0.0004378
0.00383 ± (2.306)(0.0004378)
0.00282 ≤ β1 ≤ 0.00484
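Parts (ii) and (iii) of this example can be reproduced end to end as follows (the critical value 2.306 = t0.025(8 d.f.) is taken from the text; keeping unrounded intermediate values, t0 comes out near 8.75 and the interval near (0.00282, 0.00484)):

```python
import math

# Air velocity (x) and evaporation coefficient (y) data from the problem
x = [20, 60, 100, 140, 180, 220, 260, 300, 340, 380]
y = [0.18, 0.37, 0.35, 0.78, 0.56, 0.75, 1.18, 1.36, 1.17, 1.65]
n = len(x)

Sxx = sum(xi * xi for xi in x) - sum(x) ** 2 / n                  # 132000
Sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n  # 505.40
Syy = sum(yi * yi for yi in y) - sum(y) ** 2 / n                  # 2.13745

b1 = Sxy / Sxx
SSres = Syy - Sxy ** 2 / Sxx           # about 0.20238
MSres = SSres / (n - 2)
se_b1 = math.sqrt(MSres / Sxx)         # standard error of the slope

t0 = b1 / se_b1                        # about 8.75 > 2.306, so reject H0
half = 2.306 * se_b1                   # half-width of the 95% interval
print(t0, b1 - half, b1 + half)
```

The small difference from the hand value 8.7488 comes from rounding β̂1 to 0.00383 in the text.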
Hypothesis testing on the intercept (σ² known):
Null hypothesis, H0: β0 = β00; Alternative hypothesis, H1: β0 ≠ β00
Sample size = n, level of significance = α
Test statistic: Z0 = (β̂0 − β00) / √(σ²(1/n + x̄²/Sxx))
The (1 − α)100% confidence limits for β0 are
β̂0 − Z_{α/2} √(σ²(1/n + x̄²/Sxx)) ≤ β0 ≤ β̂0 + Z_{α/2} √(σ²(1/n + x̄²/Sxx))
Hypothesis testing on the intercept (σ² unknown):
Sample size = n, level of significance = α
Test statistic: t0 = (β̂0 − β00) / √(MSRes(1/n + x̄²/Sxx))
Decision: Reject the null hypothesis if |t0| > t_{α/2}(n − 2)
The (1 − α)100% confidence limits for β0 are
β̂0 − t_{α/2}(n − 2) √(MSRes(1/n + x̄²/Sxx)) ≤ β0 ≤ β̂0 + t_{α/2}(n − 2) √(MSRes(1/n + x̄²/Sxx))
Problem: The following data pertain to the number of computer jobs per day (x) and the corresponding CPU time (y):
CPU time (y): 2 5 4 9 10
Solution: (i) Σ xi = 15, Σ yi = 30, Σ xi² = 55, Σ yi² = 226, Σ xi yi = 110, x̄ = 3 and ȳ = 6
(the x values, consistent with these sums, are 1, 2, 3, 4, 5)
Sxx = 55 − (15)²/5 = 10, Sxy = 110 − (15)(30)/5 = 20
β̂1 = Sxy/Sxx = 20/10 = 2
β̂0 = ȳ − β̂1 x̄ = 6 − (2)(3) = 0
The least squares line is ŷ = 2x
(ii) Null hypothesis, H0: β0 = 0.002
Alternative hypothesis, H1: β0 ≠ 0.002
n = 5, α = 0.05
SSRes = Syy − Sxy²/Sxx = 46 − (20)²/10 = 6
MSRes = SSRes/(n − 2) = 6/3 = 2
Test statistic: t0 = (β̂0 − β00) / √(MSRes(1/n + x̄²/Sxx))
= (0 − 0.002) / √(2(1/5 + 3²/10))
= −0.002 / √2.2
= −0.00135
t_{α/2}(n − 2) d.f. = t0.025(3 d.f.) = 3.182
Decision: Reject H0 if |t0| > t_{α/2}(n − 2). Here |t0| = 0.00135 < 3.182, so we do not reject H0.
Here (1 − α)100 = 95, so α = 0.05 and α/2 = 0.025. The 95% confidence limits for β0 are
β̂0 − t_{α/2}(n − 2) √(MSRes(1/n + x̄²/Sxx)) ≤ β0 ≤ β̂0 + t_{α/2}(n − 2) √(MSRes(1/n + x̄²/Sxx))
0 − 3.182 √(2(1/5 + 3²/10)) ≤ β0 ≤ 0 + 3.182 √(2(1/5 + 3²/10))
−4.7196 ≤ β0 ≤ 4.7196
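The intercept test and the confidence interval for this example can be reproduced as follows (3.182 = t0.025(3 d.f.) is taken from the text; the x values 1–5 are the ones consistent with the stated sums):

```python
import math

x = [1, 2, 3, 4, 5]
y = [2, 5, 4, 9, 10]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n       # 3, 6

Sxx = sum(xi * xi for xi in x) - sum(x) ** 2 / n                  # 10
Sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n  # 20
Syy = sum(yi * yi for yi in y) - sum(y) ** 2 / n                  # 46

b1 = Sxy / Sxx                  # 2.0
b0 = ybar - b1 * xbar           # 0.0
SSres = Syy - Sxy ** 2 / Sxx    # 6.0
MSres = SSres / (n - 2)         # 2.0

se_b0 = math.sqrt(MSres * (1 / n + xbar ** 2 / Sxx))  # sqrt(2.2) = 1.4832...
t0 = (b0 - 0.002) / se_b0       # about -0.00135; |t0| < 3.182, do not reject H0
half = 3.182 * se_b0            # about 4.7197, the interval half-width
print(t0, b0 - half, b0 + half)
```

The tiny |t0| and the wide interval around zero both reflect how little information five points carry about the intercept here.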
Testing significance of regression:
Null hypothesis, H0: β1 = 0
Alternative hypothesis, H1: β1 ≠ 0
Sample size = n, level of significance = α
Test statistic: t0 = β̂1 / se(β̂1), where se(β̂1) = √(MSRes/Sxx)
Decision: Reject the null hypothesis if |t0| > t_{α/2}(n − 2) d.f.
Problem: The following are measurements of the air velocity and evaporation coefficient of burning fuel droplets in an impulse engine (the data given earlier). Test the significance of regression.
Solution:
Null hypothesis, H0: β1 = 0
Alternative hypothesis, H1: β1 ≠ 0
Test statistic: t0 = β̂1 / se(β̂1), where se(β̂1) = √(MSRes/Sxx)
SSRes = Syy − Sxy²/Sxx = 0.20238, MSRes = SSRes/(n − 2) = 0.02529
t0 = 0.00383 / √(0.02529/132000) = 8.75
Since |t0| = 8.75 > t0.025(8) = 2.306, we reject H0: the regression is significant.