Correlation and regression
Correlation
A stereo and sound equipment store’s manager wants to
determine the relationship between the number of weekend
television commercials shown and the sales at the store
during the following week. The sample data is as shown:
Week   No. of commercials ‘x’   Sales volume ‘y’ (Rs. 1000s)
1      2                        50
2      5                        57
3      1                        41
4      3                        54
5      4                        54
6      1                        38
7      5                        63
8      3                        48
9      4                        59
10     2                        46
Covariance
Covariance is a descriptive measure of the linear relationship
between two variables.
The sample covariance is given by sxy = [∑(xi - xbar)(yi - ybar)] / (n - 1)
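As a quick check of the formula, the sketch below computes the sample covariance for the ten weeks of commercials data listed earlier. The function name `sample_covariance` and the variable names are illustrative choices, not part of the original slides.

```python
# Sample data from the commercials example (weeks 1-10).
commercials = [2, 5, 1, 3, 4, 1, 5, 3, 4, 2]        # x: number of commercials
sales = [50, 57, 41, 54, 54, 38, 63, 48, 59, 46]    # y: sales volume (Rs. 1000s)


def sample_covariance(x, y):
    """Return s_xy = sum((xi - xbar) * (yi - ybar)) / (n - 1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    cross_products = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return cross_products / (n - 1)


s_xy = sample_covariance(commercials, sales)
print(s_xy)  # 11.0
```

The positive value suggests that weeks with more commercials tend to be followed by higher sales.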
Scatter diagram method
[Figure: scatter diagram. Source: Wikipedia]
Simple linear regression
Regression model: y = β0 + β1x + ε
Regression equation: E(y) = β0 + β1x
Sample data: (x1, y1), (x2, y2), ..., (xn, yn)
Estimated regression equation: ŷ = b0 + b1x
Sample statistics: b0, b1
b0 and b1 provide estimates of β0 and β1.
Least squares method
Sample data from 10 pizza parlor restaurants located near
college campuses.
Restaurant   Student population (‘000s)   Sales (Rs. ‘000s)
1            2                            58
2            6                            105
3            8                            88
4            8                            118
5            12                           117
6            16                           137
7            20                           157
8            20                           169
9            22                           149
10           26                           202
Least squares method
Least squares criterion
• Min ∑(yi - ŷi)²
Estimated regression equation is
ŷ = b0 + b1x
where b1 = [∑(xi - xbar)(yi - ybar)] / ∑(xi - xbar)²
and b0 = ybar - b1 xbar
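The two formulas can be applied directly to the pizza parlor data. The following is a minimal sketch in plain Python; the helper name `least_squares_fit` is hypothetical and chosen only for illustration.

```python
# Pizza parlor sample data (10 restaurants).
population = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]           # x: students ('000s)
sales = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]    # y: sales (Rs. '000s)


def least_squares_fit(x, y):
    """Return (b0, b1) for the estimated regression line y_hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx             # slope
    b0 = y_bar - b1 * x_bar      # intercept
    return b0, b1


b0, b1 = least_squares_fit(population, sales)
print(f"y_hat = {b0:.0f} + {b1:.0f}x")  # y_hat = 60 + 5x
```

Here ∑(xi - xbar)(yi - ybar) = 2840 and ∑(xi - xbar)² = 568, so b1 = 5 and b0 = 130 - 5(14) = 60.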
[Figure: scatter chart with fitted line y = 5x + 60; x-axis: Student Population (‘000s)]
SSE
xi = student population (‘000s); yi = sales (Rs. ‘000s); ŷi = 5xi + 60 = predicted sales
Restaurant   xi   yi    ŷi    Error (yi - ŷi)   Squared error (yi - ŷi)²
1            2    58    70    -12               144
2            6    105   90    15                225
3            8    88    100   -12               144
4            8    118   100   18                324
5            12   117   120   -3                9
6            16   137   140   -3                9
7            20   157   160   -3                9
8            20   169   160   9                 81
9            22   149   170   -21               441
10           26   202   190   12                144
SSE = 1530
SST
xi = student population (‘000s); yi = sales (Rs. ‘000s); ȳ = 130
Restaurant   xi   yi    Error (yi - ȳ)   Squared error (yi - ȳ)²
1            2    58    -72              5184
2            6    105   -25              625
3            8    88    -42              1764
4            8    118   -12              144
5            12   117   -13              169
6            16   137   7                49
7            20   157   27               729
8            20   169   39               1521
9            22   149   19               361
10           26   202   72               5184
SST = 15730
Coefficient of determination
SSE = ∑(yi - ŷi)²
SST = ∑(yi - ȳ)²
SSR = ∑(ŷi - ȳ)² = SST - SSE
r² = SSR / SST
Correlation coefficient r = (sign of b1) √(r²)
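For the pizza parlor data these quantities can be computed in a few lines. The sketch below takes the fitted line ŷ = 60 + 5x as given; the variable names are illustrative.

```python
import math

population = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]           # x: students ('000s)
sales = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]    # y: sales (Rs. '000s)
b0, b1 = 60.0, 5.0                                           # fitted line y_hat = 60 + 5x

y_bar = sum(sales) / len(sales)
predicted = [b0 + b1 * x for x in population]

sse = sum((y - y_hat) ** 2 for y, y_hat in zip(sales, predicted))  # unexplained variation
sst = sum((y - y_bar) ** 2 for y in sales)                         # total variation
ssr = sst - sse                                                    # explained variation

r_squared = ssr / sst
r = math.copysign(math.sqrt(r_squared), b1)  # r takes the sign of the slope b1

print(sse, sst, ssr)                     # 1530.0 15730.0 14200.0
print(round(r_squared, 4), round(r, 4))  # 0.9027 0.9501
```

About 90% of the variation in sales is explained by the linear relationship with student population.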
Coefficient of determination
xi   1   2   3   4    5
yi   3   7   5   11   14
The estimated regression equation for these data is ŷ = 0.20 + 2.60x.
A) Compute SSE, SST and SSR.
B) Compute r².
C) Compute r.
Testing for significance
The regression equation is E(y) = β0 + β1x. If β1 = 0, then E(y) = β0, so the
mean value of y does not depend on x, and x and y are not linearly related.
To test the significance of the relationship, conduct a hypothesis test
to determine whether the value of β1 is zero.
MSE (mean square error) estimates σ², the variance of the error term ε.
s² is an unbiased estimator of σ²:
s² = MSE = SSE / (n - 2)
SSE has n - 2 degrees of freedom because two parameters (β0 and β1) must be
estimated to compute SSE.
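As an illustration, the mean square error for the pizza parlor data follows directly from SSE = 1530 and n = 10. This is a small sketch; the variable names are illustrative.

```python
import math

sse = 1530.0   # sum of squared errors from the SSE table
n = 10         # number of restaurants in the sample

mse = sse / (n - 2)   # s^2, the estimate of sigma^2
s = math.sqrt(mse)    # s, the standard error of the estimate

print(mse)            # 191.25
print(round(s, 3))    # 13.829
```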
Testing for significance
H0: β1 = 0
Ha: β1 ≠ 0
Sampling distribution of b1
• Expected value: E(b1) = β1
• Standard deviation: σb1 = σ / √∑(xi - xbar)²
• Estimated standard deviation: sb1 = s / √∑(xi - xbar)²
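A t-test of H0 uses the test statistic t = b1 / sb1 with n - 2 degrees of freedom. The sketch below applies it to the pizza parlor data (b1 = 5, SSE = 1530). The critical value 2.306 is the standard table value for a two-tailed test at alpha = .05 with 8 degrees of freedom and is hard-coded here as an assumption rather than computed.

```python
import math

population = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]   # x: students ('000s)
b1 = 5.0       # estimated slope for the pizza parlor data
sse = 1530.0   # sum of squared errors for the fitted line
n = len(population)

x_bar = sum(population) / n
s_xx = sum((x - x_bar) ** 2 for x in population)    # sum of (xi - xbar)^2 = 568

s = math.sqrt(sse / (n - 2))    # standard error of the estimate
s_b1 = s / math.sqrt(s_xx)      # estimated standard deviation of b1
t_stat = b1 / s_b1              # test statistic with n - 2 degrees of freedom

T_CRITICAL = 2.306              # assumed table value: t(0.025, 8 df)
reject_h0 = abs(t_stat) > T_CRITICAL

print(round(s_b1, 4), round(t_stat, 2), reject_h0)  # 0.5803 8.62 True
```

Because |t| exceeds the critical value, H0: β1 = 0 is rejected for these data, which indicates a significant linear relationship.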
Exercise
xi   2    3    5    1    8
yi   25   25   20   30   16
A) Compute the mean square error.
B) Use the t-test to test for significance at α = 0.05.
C) Use the F-test to test the hypothesis at the 0.05 level of
significance. Present the results in ANOVA table format.
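To show the layout of an ANOVA table without working the exercise itself, the sketch below runs the F-test on the pizza parlor data (SST = 15730, SSE = 1530). It assumes the usual simple-regression setup: one regression degree of freedom, n - 2 error degrees of freedom, and F = MSR / MSE. The critical value 5.32 is the standard table value for F at the .05 level with 1 and 8 degrees of freedom, hard-coded as an assumption.

```python
sst = 15730.0   # total sum of squares (pizza parlor data)
sse = 1530.0    # sum of squared errors
n = 10          # sample size

ssr = sst - sse            # regression sum of squares
df_regression = 1          # one independent variable
df_error = n - 2
msr = ssr / df_regression  # mean square regression
mse = sse / df_error       # mean square error
f_stat = msr / mse

F_CRITICAL = 5.32          # assumed table value: F(0.05; 1, 8)

rows = [
    ("Regression", ssr, df_regression, msr),
    ("Error", sse, df_error, mse),
    ("Total", sst, n - 1, None),
]
print(f"{'Source':<12}{'SS':>10}{'df':>5}{'MS':>12}")
for source, ss, df, ms in rows:
    ms_text = f"{ms:.2f}" if ms is not None else ""
    print(f"{source:<12}{ss:>10.0f}{df:>5d}{ms_text:>12}")
print(f"F = {f_stat:.2f}, reject H0: {f_stat > F_CRITICAL}")  # F = 74.25, reject H0: True
```

With a single independent variable the F-test and the t-test lead to the same conclusion, since F equals t² in that case.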