Chapter 8: Correlation and Simple Linear Regression Analysis
e.g.1 An insurance company manager is concerned with the health of female adults, since the
company is prepared to give a reduced premium rate to those who have a certain level of
fitness. In particular, he would like to investigate how their height is related to their
weight, with a view to possibly using these measurements as a fitness criterion.
To this end, he selects a random sample of 12 adult females and measures both their
height (in cm) and weight (in kg). The results are:-
[Figure: two scatter diagrams of y against x, one with positive correlation (0 < r < +1) and one with negative correlation (-1 < r < 0).]
- The intermediate case where the correlation coefficient is zero corresponds to cases
where the estimated line is parallel to the x-axis.
[Figure: two scatter diagrams of y against x, each with r = 0.]
- The correlation coefficient is a statistic that is specifically concerned with linear
relationships: zero correlation does not necessarily imply that X and Y are unrelated.
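\[
r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}
  = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}
\]
where
\[
S_{xy} = \sum xy - \frac{\sum x \sum y}{n}, \qquad
S_{xx} = \sum x^2 - \frac{\left(\sum x\right)^2}{n}, \qquad
S_{yy} = \sum y^2 - \frac{\left(\sum y\right)^2}{n}.
\]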
Important notes:
1. The value of r must always lie between –1 and +1 (both inclusive).
2. If r = +1, the two variables have perfect positive correlation. On a scatter diagram, this
means that the points all lie on a straight line that has a positive gradient.
3. If r = -1, the two variables have perfect negative correlation. On a scatter diagram, this
means that the points all lie on a straight line that has a negative slope.
4. If the two variables are positively correlated, but not perfectly so, the coefficient lies between
0 and +1.
5. If the two variables are negatively correlated, but not perfectly so, the coefficient lies between
–1 and 0.
6. If the two variables have no overall upward or downward trend whatsoever, the coefficient is
zero.
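The notes above can be checked numerically. The following is a minimal sketch, assuming the deviation form of S_xy, S_xx and S_yy; the function name pearson_r and the small data sets are illustrative and not taken from these notes. It evaluates r for points on a rising line, points on a falling line, and points on the curve y = x^2, where x and y are clearly related but not linearly.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample correlation coefficient r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# Illustrative data only (hypothetical, not from the notes).
xs = [1, 2, 3, 4]
r_up = pearson_r(xs, [3, 5, 7, 9])    # all points on y = 1 + 2x
r_down = pearson_r(xs, [9, 7, 5, 3])  # all points on y = 11 - 2x

# y = x^2 on x values symmetric about zero: related, but not linearly.
sym = [-3, -2, -1, 0, 1, 2, 3]
r_curve = pearson_r(sym, [x * x for x in sym])

print(r_up, r_down, r_curve)
```

The third case illustrates why a zero coefficient should be read as "no linear trend" rather than "no relationship".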
e.g.2 Calculate and comment on the correlation coefficient for the data below:
x y
167 71.8
168 72.0
165 69.3
165 70.0
160 64.2
156 58.1
169 74.0
166 70.0
162 59.3
158 59.0
168 67.1
168 64.0
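As a rough check on a hand calculation, the sketch below applies the computational formulas for S_xy, S_xx and S_yy to the 12 pairs listed above. The variable names are illustrative; only the data come from the notes.

```python
from math import sqrt

# The 12 (x, y) pairs from e.g.2.
x = [167, 168, 165, 165, 160, 156, 169, 166, 162, 158, 168, 168]
y = [71.8, 72.0, 69.3, 70.0, 64.2, 58.1, 74.0, 70.0, 59.3, 59.0, 67.1, 64.0]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi * xi for xi in x)
sum_y2 = sum(yi * yi for yi in y)

# Computational forms: each sum of products minus its correction term.
s_xy = sum_xy - sum_x * sum_y / n
s_xx = sum_x2 - sum_x ** 2 / n
s_yy = sum_y2 - sum_y ** 2 / n

r = s_xy / sqrt(s_xx * s_yy)
print(f"S_xy = {s_xy:.2f}, S_xx = {s_xx:.2f}, S_yy = {s_yy:.2f}, r = {r:.2f}")
```

The coefficient comes out at roughly 0.82, which suggests a fairly strong positive linear relationship between x and y.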
e.g.3 A men’s tie shop ran 10 sales promotions to determine the number of men’s neckties of a
certain type that customers would buy at various prices. Following are the sales results:
Prices (dollars) Number of ties sold
6.49 187
6.99 149
7.49 155
7.99 148
8.49 130
8.99 132
9.49 90
9.99 99
10.49 69
10.99 51
3. Test statistic:
\[
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}
\]
4. Rejection region:
One-tailed test: \( t > t_{\alpha;(n-2)} \)
Two-tailed test: \( t > t_{\alpha/2;(n-2)} \) or \( t < -t_{\alpha/2;(n-2)} \)
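To see the test statistic and rejection region in use, here is a minimal sketch applied to the data of e.g.2. It assumes a two-tailed test at the 5% level with the null hypothesis that the population correlation is zero; the earlier steps of the procedure are not shown in this section, so that setup is an assumption. The function name correlation_t is illustrative, and the critical value 2.228 is the t table entry for df = 10 in the .025 column.

```python
from math import sqrt

def correlation_t(r, n):
    """Test statistic t = r * sqrt(n - 2) / sqrt(1 - r^2), on n - 2 df."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)

# The 12 (x, y) pairs from e.g.2.
x = [167, 168, 165, 165, 160, 156, 169, 166, 162, 158, 168, 168]
y = [71.8, 72.0, 69.3, 70.0, 64.2, 58.1, 74.0, 70.0, 59.3, 59.0, 67.1, 64.0]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
r = s_xy / sqrt(s_xx * s_yy)

t = correlation_t(r, n)

# Assumption: two-tailed test at the 5% level, so alpha/2 = 0.025.
# 2.228 is the tabulated critical value for df = n - 2 = 10.
critical = 2.228
reject_h0 = t > critical or t < -critical
print(f"r = {r:.3f}, t = {t:.2f}, reject H0: {reject_h0}")
```

Under these assumptions t is about 4.5, well beyond 2.228, so the null hypothesis of zero correlation would be rejected.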
e.g.5 To examine the relationship between the store size (i.e. square footage) and its annual
sales, a sample of 14 stores was selected and the data is shown in the table below.
Linear Regression
- Of the various kinds of equations used to predict values of one variable (the dependent
variable), y, from associated values of another variable (the independent variable), x, the
simplest and most widely used is the linear equation in two unknowns, which has the form
y = a + bx
where a = the y-intercept
b = the slope of the line (the change in y which accompanies a change of one unit in x).
- Linear equations are useful and important not only because many relationships are
actually of this form, but also because they often provide close approximations to
relationships which would otherwise be difficult to describe in mathematical terms.
The term ‘linear equation’ arises from the fact that, when plotted on ordinary graph paper,
all pairs of values of x and y which satisfy an equation of the form y = a + bx fall on a
straight line.
[Figure: graph of the straight line y = a + bx, crossing the y-axis at the intercept a.]
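\[
SSE = \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2
\]
The least-squares estimates that minimise SSE are
\[
b = \frac{S_{xy}}{S_{xx}}, \qquad a = \bar{y} - b\bar{x}
\]
where
\[
S_{xx} = \sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2
       = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}
\]
\[
S_{xy} = \sum_{i=1}^{n} \left(x_i - \bar{x}\right)\left(y_i - \bar{y}\right)
       = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}
\]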
e.g.6 The data below relates the weekly maintenance cost ($) to the age (in months) of 10
machines of similar type in a manufacturing company. Find the linear equation of
maintenance cost on age and use this to predict the maintenance cost for a machine of this
type which is 40 months old.
Machines 1 2 3 4 5 6 7 8 9 10
Age 5 10 15 20 30 30 30 50 50 60
Cost ($) 190 240 250 300 310 335 300 300 350 395
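One way to carry out this calculation is sketched below, assuming the usual least-squares estimates b = S_xy / S_xx and a = y_bar - b * x_bar. The function name fit_line is illustrative; only the data come from the table above.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for the line y = a + bx."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b

# Data from e.g.6: age in months and weekly maintenance cost in dollars.
age = [5, 10, 15, 20, 30, 30, 30, 50, 50, 60]
cost = [190, 240, 250, 300, 310, 335, 300, 300, 350, 395]

a, b = fit_line(age, cost)
predicted = a + b * 40  # predicted weekly cost for a 40-month-old machine

print(f"cost = {a:.2f} + {b:.3f} * age")
print(f"predicted cost at 40 months: {predicted:.2f}")
```

With these data the fitted slope is about 2.80 dollars per month of age and the predicted weekly cost at 40 months is about $325. Since 40 months lies inside the observed age range (5 to 60 months), the prediction is an interpolation.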
e.g.7 The expenditure on child care facilities in the previous year by a random sample of 6
local councils, and the number of children under age 12 living in the electorates, are
shown below.
Council Expenditure ($’000) Number of children
1 125 1723
2 180 2510
3 154 1856
4 90 1525
5 102 1624
6 63 920
x 2 4 8 10 13 16
y 50 38 26 25 7 2
STA2204.chap8
t distribution
Find your degrees of freedom in the df column, then look along that row for the largest critical value that is smaller than your calculated t.
Read the tail probability at the top of that column. Since your t will usually be
a little larger than the value in the table, your P will be smaller than that probability, e.g. P < 0.01.
Remember that P < 0.05 is the conventional (and arbitrary) threshold that is generally accepted as significant.
(If the null hypothesis were true, a value of t at least this extreme would occur less than 5% of the time.)
(Values in the table are critical values of the t random variable for right-hand tails of the indicated areas, alpha.)
df .25 .20 .15 .10 .05 .025 .02 .01 .005 .0025 .001 .0005
1 1.000 1.376 1.963 3.078 6.314 12.71 15.89 31.82 63.66 127.3 318.3 636.6
2 .816 1.061 1.386 1.886 2.920 4.303 4.849 6.965 9.925 14.09 22.33 31.60
3 .765 .978 1.250 1.638 2.353 3.182 3.482 4.541 5.841 7.453 10.21 12.92
4 .741 .941 1.190 1.533 2.132 2.776 2.999 3.747 4.604 5.598 7.173 8.610
5 .727 .920 1.156 1.476 2.015 2.571 2.757 3.365 4.032 4.773 5.893 6.869
6 .718 .906 1.134 1.440 1.943 2.447 2.612 3.143 3.707 4.317 5.208 5.959
7 .711 .896 1.119 1.415 1.895 2.365 2.517 2.998 3.499 4.029 4.785 5.408
8 .706 .889 1.108 1.397 1.860 2.306 2.449 2.896 3.355 3.833 4.501 5.041
9 .703 .883 1.100 1.383 1.833 2.262 2.398 2.821 3.250 3.690 4.297 4.781
10 .700 .879 1.093 1.372 1.812 2.228 2.359 2.764 3.169 3.581 4.144 4.587
11 .697 .876 1.088 1.363 1.796 2.201 2.328 2.718 3.106 3.497 4.025 4.437
12 .695 .873 1.083 1.356 1.782 2.179 2.303 2.681 3.055 3.428 3.930 4.318
13 .694 .870 1.079 1.350 1.771 2.160 2.282 2.650 3.012 3.372 3.852 4.221
14 .692 .868 1.076 1.345 1.761 2.145 2.264 2.624 2.977 3.326 3.787 4.140
15 .691 .866 1.074 1.341 1.753 2.131 2.249 2.602 2.947 3.286 3.733 4.073
16 .690 .865 1.071 1.337 1.746 2.120 2.235 2.583 2.921 3.252 3.686 4.015
17 .689 .863 1.069 1.333 1.740 2.110 2.224 2.567 2.898 3.222 3.646 3.965
18 .688 .862 1.067 1.330 1.734 2.101 2.214 2.552 2.878 3.197 3.611 3.922
19 .688 .861 1.066 1.328 1.729 2.093 2.205 2.539 2.861 3.174 3.579 3.883
20 .687 .860 1.064 1.325 1.725 2.086 2.197 2.528 2.845 3.153 3.552 3.850
21 .686 .859 1.063 1.323 1.721 2.080 2.189 2.518 2.831 3.135 3.527 3.819
22 .686 .858 1.061 1.321 1.717 2.074 2.183 2.508 2.819 3.119 3.505 3.792
23 .685 .858 1.060 1.319 1.714 2.069 2.177 2.500 2.807 3.104 3.485 3.768
24 .685 .857 1.059 1.318 1.711 2.064 2.172 2.492 2.797 3.091 3.467 3.745
25 .684 .856 1.058 1.316 1.708 2.060 2.167 2.485 2.787 3.078 3.450 3.725
26 .684 .856 1.058 1.315 1.706 2.056 2.162 2.479 2.779 3.067 3.435 3.707
27 .684 .855 1.057 1.314 1.703 2.052 2.158 2.473 2.771 3.057 3.421 3.690
28 .683 .855 1.056 1.313 1.701 2.048 2.154 2.467 2.763 3.047 3.408 3.674
29 .683 .854 1.055 1.311 1.699 2.045 2.150 2.462 2.756 3.038 3.396 3.659
30 .683 .854 1.055 1.310 1.697 2.042 2.147 2.457 2.750 3.030 3.385 3.646
40 .681 .851 1.050 1.303 1.684 2.021 2.123 2.423 2.704 2.971 3.307 3.551
50 .679 .849 1.047 1.299 1.676 2.009 2.109 2.403 2.678 2.937 3.261 3.496
60 .679 .848 1.045 1.296 1.671 2.000 2.099 2.390 2.660 2.915 3.232 3.460
80 .678 .846 1.043 1.292 1.664 1.990 2.088 2.374 2.639 2.887 3.195 3.416
100 .677 .845 1.042 1.290 1.660 1.984 2.081 2.364 2.626 2.871 3.174 3.390
1000 .675 .842 1.037 1.282 1.646 1.962 2.056 2.330 2.581 2.813 3.098 3.300
inf. .674 .841 1.036 1.282 1.645 1.960 2.054 2.326 2.576 2.807 3.091 3.291
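The lookup procedure described above the table can be sketched in code. This is a minimal illustration that copies only two rows of the table (df = 10 and df = 12); the names ALPHAS, T_TABLE and upper_tail_bound are hypothetical helpers, not part of any library.

```python
# Column headings of the table: right-hand tail areas (alpha).
ALPHAS = [.25, .20, .15, .10, .05, .025, .02, .01, .005, .0025, .001, .0005]

# Two rows copied from the table above, keyed by degrees of freedom.
T_TABLE = {
    10: [.700, .879, 1.093, 1.372, 1.812, 2.228, 2.359, 2.764, 3.169, 3.581, 4.144, 4.587],
    12: [.695, .873, 1.083, 1.356, 1.782, 2.179, 2.303, 2.681, 3.055, 3.428, 3.930, 4.318],
}

def upper_tail_bound(t, df):
    """Return the smallest tabulated alpha whose critical value is below t.

    The one-tailed P is then smaller than the returned alpha. Returns None
    when t is not larger than any entry in the row (P is above 0.25).
    """
    bound = None
    for alpha, crit in zip(ALPHAS, T_TABLE[df]):
        if t > crit:
            bound = alpha  # critical values increase along the row
    return bound

print(upper_tail_bound(4.54, 10))  # between 4.144 and 4.587 in the df = 10 row
print(upper_tail_bound(2.0, 12))   # between 1.782 and 2.179 in the df = 12 row
```

For a two-tailed test the bound read from the table would be doubled, since the table gives right-hand tail areas only.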