
Statistics for Business and Economics

Dr. TANG Yu
Department of Mathematics
Soochow University
May 28, 2007
Types of Correlation

- Positive correlation: slope $\beta_1$ is positive
- Negative correlation: slope $\beta_1$ is negative
- No correlation: slope $\beta_1$ is zero
Hypothesis Test
- For the simple linear regression model $y = \beta_0 + \beta_1 x + \varepsilon$
- If $x$ and $y$ are linearly related, we must have $\beta_1 \neq 0$
- We will use the sample data to test the following hypotheses about the parameter $\beta_1$:

$$H_0: \beta_1 = 0 \qquad H_a: \beta_1 \neq 0$$
Sampling Distribution
- Just as the sampling distribution of the sample mean $\bar X$ depends on the mean, standard deviation and shape of the $X$ population, the sampling distributions of the least squares estimators $\hat\beta_0$ and $\hat\beta_1$ depend on the properties of the $\{Y_j\}$ sub-populations ($j = 1, \ldots, n$):

$$y_j = \beta_0 + \beta_1 x_j + \varepsilon_j$$

- Given $x_j$, the properties of the $\{Y_j\}$ sub-population are determined by the error/random variable $\varepsilon_j$.
Model Assumption
As regards the probability distributions of $\varepsilon_j$ ($j = 1, \ldots, n$), it is assumed that:

i. Each $\varepsilon_j$ is normally distributed, so each $Y_j$ is also normal.
ii. Each $\varepsilon_j$ has zero mean, so $E(Y_j) = \beta_0 + \beta_1 x_j$.
iii. Each $\varepsilon_j$ has the same variance $\sigma_\varepsilon^2$, so $Var(Y_j) = \sigma_\varepsilon^2$ is also constant.
iv. The errors are independent of each other, so $Y_i$ and $Y_j$, $i \neq j$, are also independent.
v. The error does not depend on the independent variable(s), so the effects of $X$ and $\varepsilon$ on $Y$ can be separated from each other.
Graphical Illustration

[Figure: the population regression line $E(Y) = \beta_0 + \beta_1 X$, with normal sub-populations $Y_i : N(\beta_0 + \beta_1 x_i;\ \sigma_\varepsilon)$ and $Y_j : N(\beta_0 + \beta_1 x_j;\ \sigma_\varepsilon)$ centered on the line at $x_i$ and $x_j$.]

The $Y$ distributions have the same shape at each $x$ value.

Sum of Squares
Sum of squares due to error (SSE):

$$SSE = \hat\varepsilon_1^2 + \hat\varepsilon_2^2 + \cdots + \hat\varepsilon_n^2 = \sum_i \left( Y_i - \hat Y_i \right)^2$$

Sum of squares due to regression (SSR):

$$SSR = \sum_i \left( \hat Y_i - \bar Y \right)^2$$

Total sum of squares (SST):

$$SST = S_{YY} = \sum_i \left( Y_i - \bar Y \right)^2 = SSE + SSR$$
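As a concrete check of this decomposition, here is a minimal NumPy sketch; the five-point data set is invented purely for illustration:

```python
import numpy as np

# Hypothetical data, invented for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Least squares fit: np.polyfit returns (slope, intercept) for degree 1
b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x

SSE = np.sum((y - y_hat) ** 2)         # error sum of squares
SSR = np.sum((y_hat - y.mean()) ** 2)  # regression sum of squares
SST = np.sum((y - y.mean()) ** 2)      # total sum of squares

print(np.isclose(SST, SSE + SSR))      # True: SST = SSE + SSR
```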
ANOVA Table

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square         F
Regression            SSR              1                    MSR = SSR/1         MSR/MSE
Error                 SSE              n - 2                MSE = SSE/(n - 2)
Total                 SST              n - 1
Example
Score (y)   LSD Conc (x)   x - x̄     y - ȳ      (x - x̄)²     (x - x̄)(y - ȳ)   (y - ȳ)²
78.93       1.17           -3.163    28.843     10.004569    -91.230409       831.918649
58.20       2.97           -1.363    8.113      1.857769     -11.058019       65.820769
67.47       3.26           -1.073    17.383     1.151329     -18.651959       302.168689
37.47       4.69           0.357     -12.617    0.127449     -4.504269        159.188689
45.65       5.83           1.497     -4.437     2.241009     -6.642189        19.686969
32.92       6.00           1.667     -17.167    2.778889     -28.617389       294.705889
29.97       6.41           2.077     -20.117    4.313929     -41.783009       404.693689
Total       350.61         30.33     -0.001     0.001        22.474943        -202.487243   2078.183343

The totals of the last three columns give $S_{xx} = 22.474943$, $S_{xy} = -202.487243$, and $S_{yy} = 2078.183343$.

$$\bar y = \frac{350.61}{7} = 50.087 \qquad \bar x = \frac{30.33}{7} = 4.333$$

$$\hat\beta_1 = \frac{S_{xy}}{S_{xx}} = \frac{-202.4872}{22.4749} = -9.01 \qquad \hat\beta_0 = \bar y - \hat\beta_1 \bar x = 50.09 - (-9.01)(4.33) = 89.10$$

$$\hat y = 89.10 - 9.01x$$
SSE

$$\hat y = 89.10 - 9.01x$$

Y_i      Ŷ_i       Y_i - Ŷ_i   (Y_i - Ŷ_i)²
78.93    78.5583   0.3717      0.138161
58.20    62.3403   -4.1403     17.14208
67.47    59.7274   7.7426      59.94785
37.47    46.8431   -9.3731     87.855
45.65    36.5717   9.0783      82.41553
32.92    35.04     -2.12       4.4944
29.97    31.3459   -1.3759     1.893101
                   SSE =       253.886
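These hand calculations are straightforward to reproduce; a minimal NumPy sketch over the LSD example data above:

```python
import numpy as np

# LSD example data from the slides
y = np.array([78.93, 58.20, 67.47, 37.47, 45.65, 32.92, 29.97])  # score
x = np.array([1.17, 2.97, 3.26, 4.69, 5.83, 6.00, 6.41])         # LSD concentration

Sxx = np.sum((x - x.mean()) ** 2)              # ~22.475
Sxy = np.sum((x - x.mean()) * (y - y.mean()))  # ~-202.487

b1 = Sxy / Sxx                  # ~-9.01
b0 = y.mean() - b1 * x.mean()   # ~89.1 (the slides round intermediates to get 89.10)
y_hat = b0 + b1 * x
SSE = np.sum((y - y_hat) ** 2)  # ~253.9

print(round(b1, 2), round(b0, 2), round(SSE, 1))
```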
SST and SSR

S xx  22.475 S xy  202.487 S yy  2078.183


yˆ  89.10  9.01x
SST  SYY  2078.183 SSE  253.89
SSR  SST  SSE  1824.3
ANOVA Table
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square    F
Regression            1824.3           1                    MSR = 1824.3   35.93
Error                 253.9            5                    MSE = 50.78
Total                 2078.2           6

As F = 35.93 > 6.61, where 6.61 is the critical value of the F-distribution with 1 and 5 degrees of freedom (significance level .05), we reject $H_0$ and conclude that the relationship between $x$ and $y$ is significant.
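A quick way to verify this F test is with SciPy; a sketch using the numbers from the table above:

```python
from scipy import stats

MSR, MSE = 1824.3, 50.78
F = MSR / MSE                             # ~35.93
f_crit = stats.f.ppf(0.95, dfn=1, dfd=5)  # ~6.61, the .05 critical value
p_value = stats.f.sf(F, dfn=1, dfd=5)     # ~0.002
print(F > f_crit, p_value)                # True -> reject H0
```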
Hypothesis Test
- For the simple linear regression model $y = \beta_0 + \beta_1 x + \varepsilon$
- If $x$ and $y$ are linearly related, we must have $\beta_1 \neq 0$
- We will use the sample data to test the following hypotheses about the parameter $\beta_1$:

$$H_0: \beta_1 = 0 \qquad H_a: \beta_1 \neq 0$$
Standard Errors
Standard error of estimate: the sample standard deviation of $\varepsilon$:

$$s_\varepsilon = \sqrt{MSE} = \sqrt{\frac{SSE}{n-2}}$$

Replacing $\sigma_\varepsilon$ with its estimate $s_\varepsilon$, the estimated standard error of $\hat\beta_1$ is

$$s_{\hat\beta_1} = \frac{s_\varepsilon}{\sqrt{S_{xx}}} = \frac{s_\varepsilon}{\sqrt{\sum_i (x_i - \bar x)^2}}$$
t-test
- Hypothesis

$$H_0: \beta_1 = 0 \qquad H_a: \beta_1 \neq 0$$

- Test statistic

$$t = \frac{\hat\beta_1}{s_{\hat\beta_1}}$$

where $t$ follows a t-distribution with $n - 2$ degrees of freedom.
Reject Rule
- Hypothesis

$$H_0: \beta_1 = 0 \qquad H_a: \beta_1 \neq 0$$

- This is a two-tailed test.

p-value approach: reject $H_0$ if p-value $\leq \alpha$
Critical value approach: reject $H_0$ if $t \leq -t_{\alpha/2}$ or $t \geq t_{\alpha/2}$
Example
Recall the LSD data from the earlier example:

$$\bar x = 4.333 \qquad \bar y = 50.087 \qquad S_{xx} = 22.475 \qquad S_{xy} = -202.487 \qquad S_{yy} = 2078.183$$

$$\hat y = 89.10 - 9.01x \qquad SSE = \sum_i \left( Y_i - \hat Y_i \right)^2 = 253.886$$
Calculation
$$s_\varepsilon = \sqrt{MSE} = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{253.89}{7-2}} = 7.1258$$

$$s_{\hat\beta_1} = \frac{s_\varepsilon}{\sqrt{S_{xx}}} = \frac{7.1258}{\sqrt{22.475}} = 1.5031$$

$$t = \frac{\hat\beta_1}{s_{\hat\beta_1}} = \frac{-9.01}{1.5031} = -5.9943$$

Since $|t| = 5.9943 > 2.571$, where 2.571 is the critical value of the t-distribution with 5 degrees of freedom (upper-tail area .025, i.e. significance level .05 two-tailed), we reject $H_0$ and conclude that the relationship between $x$ and $y$ is significant.
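The same t test in a short SciPy sketch, using the summary values computed above:

```python
import numpy as np
from scipy import stats

n, Sxx, SSE, b1 = 7, 22.475, 253.89, -9.01

s_eps = np.sqrt(SSE / (n - 2))   # ~7.1258, standard error of estimate
se_b1 = s_eps / np.sqrt(Sxx)     # ~1.5031, standard error of b1
t_stat = b1 / se_b1              # ~-5.99

t_crit = stats.t.ppf(1 - 0.025, df=n - 2)        # ~2.571
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)  # ~0.002
print(abs(t_stat) > t_crit, p_value)             # True -> reject H0
```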
Confidence Interval
$\hat\beta_1$ is an estimator of $\beta_1$, and

$$t = \frac{\hat\beta_1 - \beta_1}{s_{\hat\beta_1}}$$

follows a t-distribution with $n - 2$ degrees of freedom. The estimated standard error of $\hat\beta_1$ is

$$s_{\hat\beta_1} = \frac{s_\varepsilon}{\sqrt{S_{xx}}} = \frac{s_\varepsilon}{\sqrt{\sum_i (x_i - \bar x)^2}}$$

So the C% confidence interval estimator of $\beta_1$ is

$$\hat\beta_1 \pm t_{\alpha/2,\, n-2}\, s_{\hat\beta_1}$$
Example

The 95% confidence interval estimate of $\beta_1$ in the previous example is

$$-9.01 \pm 2.571 \times 1.5031 = -9.01 \pm 3.86$$

i.e., from $-12.87$ to $-5.15$, which does not contain 0.
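The interval in a brief SciPy sketch, with the numbers taken from the example:

```python
from scipy import stats

b1, se_b1, df = -9.01, 1.5031, 5
half_width = stats.t.ppf(0.975, df) * se_b1  # ~2.571 * 1.5031 = ~3.86
print(b1 - half_width, b1 + half_width)      # ~(-12.87, -5.15); 0 is excluded
```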
Regression Equation
- It is believed that the longer one studies, the better one's grade is. The final mark ($Y$) regressed on study time ($X$) is supposed to follow the regression equation:

$$\hat y = \hat\beta_0 + \hat\beta_1 x = 21.590 + 1.877x$$

- If the fit of the sample regression equation is satisfactory, it can be used to estimate the mean value of the dependent variable or to predict an individual value.
Estimate and Predict
$$\hat y = \hat\beta_0 + \hat\beta_1 x = 21.590 + 1.877x$$

Estimate: for the expected value of a $Y$ sub-population. E.g.: what is the mean final mark of all students who spent 30 hours on studying? I.e., given $x = 30$, how large is $E(y)$?

Predict: for a particular element of a $Y$ sub-population. E.g.: what is the final mark of Tom, who spent 30 hours on studying? I.e., given $x = 30$, how large is $y$?
What Is the Same?
For a given $X$ value, the point forecast (prediction) of $Y$ and the point estimate of the mean of the $\{Y\}$ sub-population are the same:

$$\hat y = \hat\beta_0 + \hat\beta_1 x$$

Ex. 1: Estimate the mean final mark of students who spent 30 hours on study.
Ex. 2: Predict the final mark of Tom, when his study time is 30 hours.

Both give

$$\hat y = \hat\beta_0 + \hat\beta_1 x = 21.590 + 1.877 \times 30 = 77.9$$
What Is the Difference?
The interval prediction of $Y$ and the interval estimation of the mean of the $\{Y\}$ sub-population are different:

- The prediction interval:

$$\hat y \pm t_{\alpha/2}\, s_\varepsilon \sqrt{1 + \frac{1}{n} + \frac{(x_g - \bar x)^2}{\sum_i (x_i - \bar x)^2}}$$

- The estimation (confidence) interval:

$$\hat y \pm t_{\alpha/2}\, s_\varepsilon \sqrt{\frac{1}{n} + \frac{(x_g - \bar x)^2}{\sum_i (x_i - \bar x)^2}}$$

The prediction interval is wider than the confidence interval.
Example
Recall the LSD example data once more: $\hat y = 89.10 - 9.01x$, with $\bar x = 4.333$, $S_{xx} = 22.475$, and $SSE = 253.886$.
Estimation and Prediction
The point forecast (prediction) of $Y$ and the point estimate of the mean of the $\{Y\}$ sub-population are the same:

$$\hat y = 89.10 - 9.01x$$

For $x_g = 5.0$:

$$\hat y = 89.10 - 9.01 \times 5.0 = 44.05$$
Estimation and Prediction

But for the interval estimation and prediction, the results differ. Again take

$$\hat y = 89.10 - 9.01x \qquad x_g = 5.0$$
Data Needed
For $x_g = 5.0$:

$$s_\varepsilon = \sqrt{MSE} = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{253.89}{7-2}} = 7.1258$$

$$\sum_i (x_i - \bar x)^2 = S_{xx} = 22.475 \qquad t_{.025} = 2.571$$

- The prediction interval: $\hat y \pm t_{\alpha/2}\, s_\varepsilon \sqrt{1 + \frac{1}{n} + \frac{(x_g - \bar x)^2}{\sum_i (x_i - \bar x)^2}}$
- The estimation interval: $\hat y \pm t_{\alpha/2}\, s_\varepsilon \sqrt{\frac{1}{n} + \frac{(x_g - \bar x)^2}{\sum_i (x_i - \bar x)^2}}$
Calculation
Estimation:

$$\hat y \pm t_{\alpha/2}\, s_\varepsilon \sqrt{\frac{1}{n} + \frac{(x_g - \bar x)^2}{\sum_i (x_i - \bar x)^2}} = 44.05 \pm 2.571 \times 7.1258 \times \sqrt{\frac{1}{7} + \frac{(5.0 - 4.333)^2}{22.475}} = 44.05 \pm 7.3887$$

Prediction:

$$\hat y \pm t_{\alpha/2}\, s_\varepsilon \sqrt{1 + \frac{1}{n} + \frac{(x_g - \bar x)^2}{\sum_i (x_i - \bar x)^2}} = 44.05 \pm 2.571 \times 7.1258 \times \sqrt{1 + \frac{1}{7} + \frac{(5.0 - 4.333)^2}{22.475}} = 44.05 \pm 19.7543$$
Moving Rule
- As $x_g$ moves away from $\bar x$, the interval becomes longer. That is, the shortest interval is found at $\bar x$.

The confidence interval when $x_g = \bar x$:
$$\hat y \pm t_{\alpha/2}\, s_\varepsilon \sqrt{\frac{1}{n} + \frac{(x_g - \bar x)^2}{\sum_i (x_i - \bar x)^2}}$$

The confidence interval when $x_g = \bar x \pm 1$:
$$\hat y \pm t_{\alpha/2}\, s_\varepsilon \sqrt{\frac{1}{n} + \frac{1^2}{\sum_i (x_i - \bar x)^2}}$$

The confidence interval when $x_g = \bar x \pm 2$:
$$\hat y \pm t_{\alpha/2}\, s_\varepsilon \sqrt{\frac{1}{n} + \frac{2^2}{\sum_i (x_i - \bar x)^2}}$$
Moving Rule
- As $x_g$ moves away from $\bar x$, the prediction interval likewise becomes longer, with the shortest interval at $\bar x$.

The prediction interval when $x_g = \bar x$:
$$\hat y \pm t_{\alpha/2}\, s_\varepsilon \sqrt{1 + \frac{1}{n} + \frac{(x_g - \bar x)^2}{\sum_i (x_i - \bar x)^2}}$$

The prediction interval when $x_g = \bar x \pm 1$:
$$\hat y \pm t_{\alpha/2}\, s_\varepsilon \sqrt{1 + \frac{1}{n} + \frac{1^2}{\sum_i (x_i - \bar x)^2}}$$

The prediction interval when $x_g = \bar x \pm 2$:
$$\hat y \pm t_{\alpha/2}\, s_\varepsilon \sqrt{1 + \frac{1}{n} + \frac{2^2}{\sum_i (x_i - \bar x)^2}}$$
Interval Estimation

[Figure: prediction and estimation interval bands around the fitted line; both bands are narrowest at x̄ and widen as x moves away from x̄, with the prediction band lying outside the estimation band.]
Residual Analysis
Regression residual: the difference between an observed $y$ value and its corresponding predicted value:

$$r = y - \hat y$$

Properties of regression residuals:
- The mean of the residuals equals zero.
- The standard deviation of the residuals is equal to the standard deviation $s_\varepsilon$ of the fitted regression model.
Example

yˆ  89.10  9.01x
Score (y) LSD Conc (x) y-hat residual(r)
78.93 1.17 78.558 0.3717
58.20 2.97 62.34 -4.1403
67.47 3.26 59.727 7.7426
37.47 4.69 46.843 -9.3731
45.65 5.83 36.572 9.0783
32.92 6.00 35.04 -2.12
29.97 6.41 31.346 -1.3759
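A small NumPy sketch reproducing this residual table; the mean comes out near zero rather than exactly zero only because the fitted coefficients are rounded to two decimals:

```python
import numpy as np

y = np.array([78.93, 58.20, 67.47, 37.47, 45.65, 32.92, 29.97])
x = np.array([1.17, 2.97, 3.26, 4.69, 5.83, 6.00, 6.41])

r = y - (89.10 - 9.01 * x)  # residuals against the rounded fitted line
print(r.round(4))           # matches the residual column above
print(r.mean().round(3))    # ~0, illustrating the first property
```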
Residual Plot Against x
[Figure: plot of residuals r against x.]
Residual Plot Against y-hat
[Figure: plot of residuals r against ŷ.]
Three Situations

[Figure: three residual patterns against ŷ: a good pattern (residuals scattered randomly in a horizontal band), non-constant variance (the spread changes with ŷ), and an inadequate model form (a systematic pattern remains).]
Standardized Residual
- Standard deviation of the $i$th residual:

$$s_{y_i - \hat y_i} = s_\varepsilon \sqrt{1 - h_i}$$

where $s_{y_i - \hat y_i}$ is the standard deviation of residual $i$, $s_\varepsilon$ is the standard error of the estimate, and

$$h_i = \frac{1}{n} + \frac{(x_i - \bar x)^2}{\sum_j (x_j - \bar x)^2}$$

- Standardized residual for observation $i$:

$$z_i = \frac{y_i - \hat y_i}{s_{y_i - \hat y_i}}$$
Standardized Residual Plot
[Figure: plot of standardized residuals z against x.]
Standardized Residual
- The standardized residual plot can provide insight about the assumption that the error term has a normal distribution.
- If the assumption is satisfied, the distribution of the standardized residuals should appear to come from a standard normal probability distribution.
- We expect to see approximately 95% of the standardized residuals between -2 and +2.
Detecting Outliers

[Figure: scatter plot with one point lying far from the overall pattern, marked as an outlier.]

Influential Observation

[Figure: scatter plots contrasting an outlier with an influential observation, an extreme-x point that pulls the fitted line toward itself.]
High Leverage Points
- Leverage of observation $i$:

$$h_i = \frac{1}{n} + \frac{(x_i - \bar x)^2}{\sum_j (x_j - \bar x)^2}$$

- For example, for the $x$ values 10, 10, 15, 20, 20, 25, 70 we have $\bar x = 24.2857$, and for the observation at $x_i = 70$:

$$h_i = \frac{1}{7} + \frac{(70 - 24.2857)^2}{\sum_j (x_j - 24.2857)^2} = .94 > .86 = \frac{6}{n} = \frac{6}{7}$$

so this observation is a high leverage point.
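A short NumPy sketch of the leverage calculation for all seven observations:

```python
import numpy as np

x = np.array([10, 10, 15, 20, 20, 25, 70])
n = len(x)
Sxx = np.sum((x - x.mean()) ** 2)

h = 1 / n + (x - x.mean()) ** 2 / Sxx  # leverage of each observation
print(h.round(2))  # the point x = 70 has h ~0.94
print(h > 6 / n)   # flags leverage above 6/n (~0.86), as on the slide
```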
Contact Information

- Tang Yu (唐煜)
- [email protected]
- http://math.suda.edu.cn/homepage/tangy
