Lesson 12 - Introduction To Regression and Correlation Analysis Regression Analysis
Regression Analysis
Regression model (2.1), $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, is said to be simple, linear in the parameters, and linear
in the independent variable. It is “simple” in that there is only one independent variable,
“linear in the parameters” because no parameter appears as an exponent or is multiplied
or divided by another parameter, and “linear in the independent variable” because this
variable appears only in the first power. A model that is linear in the parameters and the
independent variable is also called a first-order model.
The term $\beta_0$ is called the y-intercept. It refers to the expected level of Y when X
= 0 (no priors). This is the baseline amount because it is what Y is before we take the
level of X into account.
The term $\beta_1$ is called the slope (or the regression coefficient) for X. It
represents the amount that Y changes (increases or decreases) for each change of one
unit in X. Thus, for example, the expected difference in sentence length between a defendant with
X = 0 (no priors) and one with X = 1 (one prior) is $\beta_1$, and between a defendant with
X = 0 and one with X = 2 (two priors) it is $2\beta_1$.
Finally, $\varepsilon_i$ is called the error term or disturbance term. It represents the amount
of the sentence that cannot be accounted for by $\beta_0$ and $\beta_1 X$. In other words, $\varepsilon_i$
represents the departure of a given defendant’s sentence from that which would be
expected on the basis of his number of priors (X).
1. The variable X is a fixed, or predetermined, variable.
2. The values of Y are normally distributed and have the same variance at every level of X.
3. The errors ($\varepsilon_i$) associated with each value of Y are also normally distributed, with mean zero and
variance $\sigma^2$.
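The values of the parameters in the regression equation are usually unknown. The
common practice is to obtain sample paired observations (X, Y) and then
estimate the parameters using the following formulas:

$$b_1 = \frac{\sum X_i Y_i - \frac{\sum X_i \sum Y_i}{n}}{\sum X_i^2 - \frac{\left(\sum X_i\right)^2}{n}} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} \quad \text{(estimator of } \beta_1\text{)}$$

$$b_0 = \frac{1}{n}\left(\sum Y_i - b_1 \sum X_i\right) = \bar{Y} - b_1 \bar{X} \quad \text{(estimator of } \beta_0\text{)}$$

$$S^2_{y \cdot x} = MSE = \frac{SSE}{n-2} = \frac{\sum Y_i^2 - \frac{\left(\sum Y_i\right)^2}{n} - b_1\left(\sum X_i Y_i - \frac{\sum X_i \sum Y_i}{n}\right)}{n-2} \quad \text{(estimator of } \sigma^2\text{)}$$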
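The estimators above can be sketched in a few lines of Python. This is only a minimal illustration, assuming the paired observations are held in plain lists; the function name `fit_line` and the toy data are hypothetical and not part of the lesson.

```python
def fit_line(xs, ys):
    """Return (b0, b1, mse) for the least-squares line through (xs, ys)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    ssx = sum_xx - sum_x ** 2 / n       # sum of squares of X about its mean
    ssy = sum_yy - sum_y ** 2 / n       # sum of squares of Y about its mean
    sxy = sum_xy - sum_x * sum_y / n    # sum of cross-products about the means

    b1 = sxy / ssx                      # estimator of the slope
    b0 = sum_y / n - b1 * sum_x / n     # estimator of the intercept
    mse = (ssy - b1 * sxy) / (n - 2)    # estimator of the error variance
    return b0, b1, mse


# Hypothetical toy data lying exactly on the line Y = 1 + 2X
b0, b1, mse = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(b0, b1, mse)
```

Because the toy points fall exactly on a line, the estimated error variance is zero; real data would give a positive MSE.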
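To test hypotheses about the intercept and the slope (for example, $H_0: \beta_0 = 0$ and $H_0: \beta_1 = 0$), the following test statistics
may be computed, where $\beta_0$ and $\beta_1$ in the numerators are the hypothesized values:

$$t = \frac{b_0 - \beta_0}{SE(b_0)}$$

where:

$$SE(b_0) = \sqrt{MSE \cdot \frac{\sum X_i^2}{n \sum (X_i - \bar{X})^2}} = \sqrt{S^2_{y \cdot x} \cdot \frac{\sum X^2}{n \cdot SSX}}, \qquad SSX = \sum X^2 - \frac{\left(\sum X\right)^2}{n}$$

$$t = \frac{b_1 - \beta_1}{SE(b_1)}$$

where:

$$SE(b_1) = \sqrt{\frac{S^2_{y \cdot x}}{SSX}}$$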
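The slope test can be sketched as follows. This is a hedged illustration rather than a definitive implementation: the function name `slope_t_test`, the `beta1_null` parameter, and the five data points are all hypothetical.

```python
import math


def slope_t_test(xs, ys, beta1_null=0.0):
    """Return (b1, se_b1, t) for testing H0: beta1 = beta1_null."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ssx = sum((x - x_bar) ** 2 for x in xs)
    ssy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    b1 = sxy / ssx
    mse = (ssy - b1 * sxy) / (n - 2)    # S^2_{y.x}
    se_b1 = math.sqrt(mse / ssx)        # standard error of the slope
    t = (b1 - beta1_null) / se_b1
    return b1, se_b1, t


# Hypothetical data with a clear upward trend
b1, se_b1, t = slope_t_test([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
print(b1, se_b1, t)
```

The resulting t is compared with the tabulated value at n - 2 degrees of freedom; the table lookup is left out of the sketch.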
graph that shows the way scores on any two variables, X and Y, are scattered
throughout the range of possible score values.
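$$r = \frac{\sum XY - n\bar{X}\bar{Y}}{\sqrt{\left(\sum X^2 - n\bar{X}^2\right)\left(\sum Y^2 - n\bar{Y}^2\right)}} = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$$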
Requirements for the Use of Pearson’s Correlation Coefficient
1. A straight-line relationship. Pearson’s r is only useful for detecting a straight-
line correlation between X and Y.
2. Interval Data. Both X and Y variables must be measured at the interval level,
so that scores may be assigned to the respondents.
3. Random sampling. Sample members must have been drawn at random from
a specified population to apply a test of significance.
$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$
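Pearson's r and its t statistic can be computed directly from the raw sums. The sketch below assumes plain Python lists; the function names and the five data pairs are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation using the computational (raw-sum) formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    return numerator / denominator


def r_t_statistic(r, n):
    """t statistic for testing zero correlation, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Hypothetical data: numerator 5(53) - 15(15) = 40, denominator sqrt(50 * 50) = 50
r = pearson_r([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
t = r_t_statistic(r, 5)
print(r, t)
```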
SEATWORK NO.2
The following data give infant mortality rates (deaths within 1 year after birth, per
1000 live births) for a period of 11 years in a certain country.
Year (X)                    1     2     3     4     5     6     7     8     9     10    11
Infant Mortality Rate (Y)   68.5  65.8  65.2  65.5  64.1  59.3  62    67.9  64.7  58.7  53.3
a. Determine the slope of the line if the data are fitted to a straight line by the least squares
method. What does the slope indicate?
b. Compute the coefficient of determination.
c. Assuming that the factors affecting the mortality rate remain constant, estimate the infant
mortality rate of the country 20 years after the last recorded data.
COMPUTATIONS:
X Y X2 Y2 XY
1 68.50 1 4692.3 68.5
2 65.80 4 4329.6 131.6
3 65.20 9 4251 195.6
4 65.50 16 4290.3 262
5 64.10 25 4108.8 320.5
6 59.30 36 3516.5 355.8
7 62.00 49 3844 434
8 67.90 64 4610.4 543.2
9 64.70 81 4186.1 582.3
10 58.70 100 3445.7 587
11 53.30 121 2840.9 586.3
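$$b_1 = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2} = \frac{11(4,066.8) - (66)(695)}{11(506) - (66)^2} = \frac{-1,135.2}{1,210} = -0.9382$$

$$b_0 = \bar{Y} - b_1 \bar{X} = \frac{695}{11} - (-0.9382)\left(\frac{66}{11}\right) = 68.811$$

$\hat{Y} = b_0 + b_1 X$
$\hat{Y} = 68.811 - 0.9382X$
The negative slope indicates that the infant mortality rate fell by about 0.94 deaths per 1000 live births per year, on average.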
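b. Coefficient of determination, $r^2$

$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}} = -0.689$$

$r^2 = (-0.689)^2$
$r^2 = 0.474$
$r^2 = 47.4\%$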
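The seatwork figures can be cross-checked numerically. The sketch below uses the eleven observations from the problem; the variable names are illustrative, and the prediction for part (c) rests on two assumptions: that "20 years from the last recorded data" means X = 31, and that the linear trend continues that far beyond the observed years.

```python
import math

years = list(range(1, 12))
rates = [68.5, 65.8, 65.2, 65.5, 64.1, 59.3, 62.0, 67.9, 64.7, 58.7, 53.3]

n = len(years)
sum_x, sum_y = sum(years), sum(rates)
sum_xx = sum(x * x for x in years)
sum_yy = sum(y * y for y in rates)
sum_xy = sum(x * y for x, y in zip(years, rates))

# Slope and intercept by least squares
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
b0 = sum_y / n - b1 * sum_x / n

# Correlation coefficient and coefficient of determination
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2)
)
r_squared = r ** 2

# Part (c): extrapolate to year 31 (assumed to be 20 years after year 11)
predicted_rate = b0 + b1 * 31

print(round(b1, 4), round(b0, 3), round(r_squared, 3), round(predicted_rate, 2))
```

An extrapolation this far outside the observed range should be read with caution, since nothing in the data confirms that the trend stays linear.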
PRESS
MODE MODE 2 1
The calculator is ready to perform Linear Regression Function. Again, before we
start to input data PRESS
Shift Clr 1 = AC
to ensure that no other data is present in the calculator memory.
Data Input:
Data is inputted, one by one, by keying in the datum X first, followed by a comma
and the Y value, and pressing the DT (M+) button
PRESS
1 , 68.50 M+
2 , 65.80 M+
3 , 65.20 M+
4 , 65.50 M+
5 , 64.10 M+
6 , 59.30 M+
7 , 62.00 M+
8 , 67.90 M+
9 , 64.70 M+
10 , 58.70 M+
11 , 53.30 M+
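Shift 1 1 = 506.              ΣX²
Shift 1 2 = 66.               ΣX
Shift 1 3 = 11.               n
Shift 2 1 = 6.                X̄ (mean of X)
Shift 2 2 = 3.16227766        xσn (population standard deviation of X)
Shift 2 3 = 3.31662479        xσn-1 (sample standard deviation of X)
Shift 1 1 = 44,115.56         ΣY²
Shift 1 2 = 695.              ΣY
Shift 1 3 = 4,066.8           ΣXY
Shift 2 1 = 63.1818181818     Ȳ (mean of Y)
Shift 2 2 = 4.308515497       yσn (population standard deviation of Y)
Shift 2 3 = 4.518809175       yσn-1 (sample standard deviation of Y)
Shift 2 3 = -0.688587845      r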
WORKSHEET No. 2
INTRODUCTION TO REGRESSION AND CORRELATION ANALYSIS
1. The Statistics Consulting Center at Virginia Polytechnic Institute and State
University analyzed data on normal woodchucks for the Department of Veterinary
Medicine. The variables of interest were body weight in grams and heart weight in
grams. It was also of interest to develop a linear regression equation in order to
determine if there is a significant linear relationship between heart weight and total
body weight. Use heart weight as the independent variable and the body weight as
the dependent variable and fit a simple linear regression using the following data. In
addition, test the hypothesis $H_0: \beta_1 = 0$ against $H_a: \beta_1 \neq 0$.
Summary Table (Computations)
Body Weight Y (grams)   Heart Weight X (grams)   Y²   X²   XY
4050 11.2 16402500 125.44 45360.00
2645 12.4 6996025 153.76 32798.00
3120 10.5 9734400 110.25 32760.00
5700 13.2 32490000 174.24 75240.00
2595 9.80 6734025 96.04 25431.00
3640 11.0 13249600 121.00 40040.00
2050 10.8 4202500 116.64 22140.00
4235 10.4 17935225 108.16 44044.00
2935 12.2 8614225 148.84 35807.00
4975 11.2 24750625 125.44 55720.00
3690 10.8 13616100 116.64 39852.00
2800 14.2 7840000 201.64 39760.00
2775 12.2 7700625 148.84 33855.00
2170 10.0 4708900 100.00 21700.00
2370 12.3 5616900 151.29 29151.00
2055 12.5 4223025 156.25 25687.50
2025 11.8 4100625 139.24 23895.00
2645 16.0 6996025 256.00 42320.00
2675 13.8 7155625 190.44 36915.00
59150 226.30 203066950 2740.15 702475.5
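Summary:
n = 19, $\sum X = 226.3$, $\sum Y = 59,150$, $\sum X^2 = 2,740.15$, $\sum Y^2 = 203,066,950$, $\sum XY = 702,475.5$

$$\bar{X} = \frac{226.3}{19} = 11.911, \qquad \bar{Y} = \frac{59,150}{19} = 3,113.158$$

$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}} = \frac{19(702,475.5) - (226.3)(59,150)}{\sqrt{\left[19(2,740.15) - (226.3)^2\right]\left[19(203,066,950) - (59,150)^2\right]}} = \frac{-38,610.5}{\sqrt{306,034,194,978}} = -0.07$$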
PRESS
MODE MODE 2 1
The calculator is ready to perform Linear Regression Function. Again, before we
start to input data PRESS
Shift Clr 1 = AC
to ensure that no other data is present in the calculator memory.
Data Input:
Data is inputted, one by one, by keying in the datum X first, followed by a comma
and the Y value, and pressing the DT (M+) button
PRESS
11.2 , 4050 M+
12.4 , 2645 M+
10.5 , 3120 M+
13.2 , 5700 M+
9.80 , 2595 M+
11.0 , 3640 M+
10.8 , 2050 M+
10.4 , 4235 M+
12.2 , 2935 M+
11.2 , 4975 M+
10.8 , 3690 M+
14.2 , 2800 M+
12.2 , 2775 M+
10.0 , 2170 M+
12.3 , 2370 M+
12.5 , 2055 M+
11.8 , 2025 M+
16.0 , 2645 M+
13.8 , 2675 M+
Shift 1 1 = 2,740.15          ΣX²
Shift 1 2 = 226.3             ΣX
Shift 1 3 = 19                n
Shift 2 1 = 11.91052632       X̄ (mean of X)
Shift 1 1 = 203,066,950       ΣY²
Shift 1 2 = 59,150.           ΣY
Shift 1 3 = 702,475.5         ΣXY
Shift 2 1 = 3,113.157895      Ȳ (mean of Y)
Shift 2 1 = 3,653.445709      A (intercept)
Shift 2 2 = -45.36221157      B (slope)
Shift 2 3 = -0.069794379      r
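$\hat{Y} = A + BX = 3,653.45 - 45.36X$
where: $b_0$ = A and $b_1$ = B
b) Critical region and level of significance: at the 0.05 level of significance, the critical region is $|t| > t_{\alpha/2,\, n-2} = t_{0.025,\, 17} = 2.110$.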
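c) Computation:

$$S^2_{y \cdot x} = \frac{\sum Y^2 - \frac{\left(\sum Y\right)^2}{n} - b_1\left(\sum XY - \frac{\sum X \sum Y}{n}\right)}{n-2} = \frac{203,066,950 - \frac{(59,150)^2}{19} - (-45.362)\left(702,475.5 - \frac{(226.3)(59,150)}{19}\right)}{19-2} \approx \frac{18,831,479}{17} \approx 1,107,734$$

$$SSX = \sum X^2 - \frac{\left(\sum X\right)^2}{n} = 2,740.15 - \frac{(226.3)^2}{19} = 44.798$$

$$SE(b_1) = \sqrt{\frac{S^2_{y \cdot x}}{SSX}} = \sqrt{\frac{1,107,734}{44.798}} \approx \sqrt{24,727.3} \approx 157.25$$

$$t = \frac{b_1 - \beta_1}{SE(b_1)} = \frac{-45.362 - 0}{157.25} \approx -0.288$$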
d) Decision: The computed t-value lies between the critical values -2.110 and +2.110,
that is, outside the critical region. Thus, the null hypothesis is not rejected.
e) Conclusion: Therefore, the slope of the regression line is not significantly different from
zero. This means the regression equation cannot be used reliably to
estimate the body weight of an individual when the heart weight
is given.
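As a cross-check, the slope test for the woodchuck data can be recomputed from the summary totals alone. The sketch below takes the six sums from the summary table; the variable names are illustrative, and the critical value 2.110 is the tabulated t for 17 degrees of freedom quoted in the worksheet.

```python
import math

# Summary totals from the worksheet (X = heart weight, Y = body weight)
n = 19
sum_x, sum_y = 226.3, 59150.0
sum_xx, sum_yy, sum_xy = 2740.15, 203066950.0, 702475.5

ssx = sum_xx - sum_x ** 2 / n
ssy = sum_yy - sum_y ** 2 / n
sxy = sum_xy - sum_x * sum_y / n

b1 = sxy / ssx                       # slope
b0 = sum_y / n - b1 * sum_x / n      # intercept
mse = (ssy - b1 * sxy) / (n - 2)     # S^2_{y.x}
se_b1 = math.sqrt(mse / ssx)
t = b1 / se_b1                       # test of H0: beta1 = 0

critical = 2.110                     # tabulated t(0.025, 17), from the worksheet
reject = abs(t) > critical
print(round(b0, 2), round(b1, 2), round(t, 3), reject)
```

The same t also follows from the correlation coefficient via $t = r\sqrt{n-2}/\sqrt{1-r^2}$, which is a convenient way to confirm the arithmetic.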
X1 Y X1² Y² XY
50 48 2500 2304 2400
36 57 1296 3249 2052
40 66 1600 4356 2640
41 70 1681 4900 2870
28 89 784 7921 2492
49 36 2401 1296 1764
42 46 1764 2116 1932
45 54 2025 2916 2430
52 26 2704 676 1352
29 77 841 5929 2233
29 89 841 7921 2581
43 67 1849 4489 2881
38 47 1444 2209 1786
34 51 1156 2601 1734
53 57 2809 3249 3021
36 66 1296 4356 2376
33 79 1089 6241 2607
29 88 841 7744 2552
33 60 1089 3600 1980
55 49 3025 2401 2695
29 77 841 5929 2233
44 52 1936 2704 2288
43 60 1849 3600 2580
911 1411 37661 92707 53479
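$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}} = \frac{23(53,479) - (911)(1,411)}{\sqrt{\left[23(37,661) - (911)^2\right]\left[23(92,707) - (1,411)^2\right]}} = \frac{-55,404}{\sqrt{5,128,097,880}} = -0.774$$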
PRESS
MODE MODE 2 1
The calculator is ready to perform Linear Regression Function. Again, before we
start to input data PRESS
Shift Clr 1 = AC
to ensure that no other data is present in the calculator memory.
Data Input:
Data is inputted, one by one, by keying in the datum X first, followed by a comma
and the Y value, and pressing the DT (M+) button
PRESS
50 , 48 M+
36 , 57 M+
40 , 66 M+
41 , 70 M+
28 , 89 M+
49 , 36 M+
42 , 46 M+
45 , 54 M+
52 , 26 M+
29 , 77 M+
29 , 89 M+
43 , 67 M+
38 , 47 M+
34 , 51 M+
53 , 57 M+
36 , 66 M+
33 , 79 M+
29 , 88 M+
33 , 60 M+
55 , 49 M+
29 , 77 M+
44 , 52 M+
43 , 60 M+
Shift 1 1 = 37,661.           ΣX²
Shift 1 2 = 911.              ΣX
Shift 1 3 = 23                n
Shift 2 1 = 39.60869565       X̄ (mean of X)
Shift 1 1 = 92,707.           ΣY²
Shift 1 2 = 1,411.            ΣY
Shift 1 3 = 53,479.           ΣXY
Shift 2 3 = -0.773682845      r
The patient’s level of satisfaction is negatively correlated with the patient’s age.
The degree of association is moderate.
X2 Y X2² Y² XY
51 48 2601 2304 2448
46 57 2116 3249 2622
48 66 2304 4356 3168
44 70 1936 4900 3080
43 89 1849 7921 3827
54 36 2916 1296 1944
50 46 2500 2116 2300
48 54 2304 2916 2592
62 26 3844 676 1612
50 77 2500 5929 3850
48 89 2304 7921 4272
53 67 2809 4489 3551
55 47 3025 2209 2585
51 51 2601 2601 2601
54 57 2916 3249 3078
49 66 2401 4356 3234
56 79 3136 6241 4424
46 88 2116 7744 4048
49 60 2401 3600 2940
51 49 2601 2401 2499
52 77 2704 5929 4004
58 52 3364 2704 3016
50 60 2500 3600 3000
1168 1411 59748 92707 70695
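$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}} = \frac{23(70,695) - (1,168)(1,411)}{\sqrt{\left[23(59,748) - (1,168)^2\right]\left[23(92,707) - (1,411)^2\right]}} = \frac{-22,063}{37,557.59843} = -0.587$$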
PRESS
MODE MODE 2 1
The calculator is ready to perform Linear Regression Function. Again, before we
start to input data PRESS
Shift Clr 1 = AC
to ensure that no other data is present in the calculator memory.
Data Input:
Data is inputted, one by one, by keying in the datum X first, followed by a comma
and the Y value, and pressing the DT (M+) button
PRESS
51 , 48 M+
46 , 57 M+
48 , 66 M+
44 , 70 M+
43 , 89 M+
54 , 36 M+
50 , 46 M+
48 , 54 M+
62 , 26 M+
50 , 77 M+
48 , 89 M+
53 , 67 M+
55 , 47 M+
51 , 51 M+
54 , 57 M+
49 , 66 M+
56 , 79 M+
46 , 88 M+
49 , 60 M+
51 , 49 M+
52 , 77 M+
58 , 52 M+
50 , 60 M+
Shift 1 1 = 59,748.           ΣX²
Shift 1 2 = 1,168.            ΣX
Shift 1 3 = 23                n
Shift 2 1 = 50.7826087        X̄ (mean of X)
Shift 1 1 = 92,707.           ΣY²
Shift 1 2 = 1,411.            ΣY
Shift 1 3 = 70,695.           ΣXY
Shift 2 3 = -0.587444376      r
X3 Y X3² Y² XY
2.3 48 5.29 2304 110.4
2.3 57 5.29 3249 131.1
2.2 66 4.84 4356 145.2
1.8 70 3.24 4900 126
1.8 89 3.24 7921 160.2
2.9 36 8.41 1296 104.4
2.2 46 4.84 2116 101.2
2.4 54 5.76 2916 129.6
2.9 26 8.41 676 75.4
2.1 77 4.41 5929 161.7
2.4 89 5.76 7921 213.6
2.4 67 5.76 4489 160.8
2.2 47 4.84 2209 103.4
2.3 51 5.29 2601 117.3
2.2 57 4.84 3249 125.4
2 66 4 4356 132
2.5 79 6.25 6241 197.5
1.9 88 3.61 7744 167.2
2.1 60 4.41 3600 126
2.4 49 5.76 2401 117.6
2.3 77 5.29 5929 177.1
2.9 52 8.41 2704 150.8
2.3 60 5.29 3600 138
52.8 1411 123.24 92707 3171.9
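$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}} = \frac{23(3,171.9) - (52.8)(1,411)}{\sqrt{\left[23(123.24) - (52.8)^2\right]\left[23(92,707) - (1,411)^2\right]}} = \frac{-1,547.1}{\sqrt{6,597,751.2}} = -0.60$$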
PRESS
MODE MODE 2 1
The calculator is ready to perform Linear Regression Function. Again, before we
start to input data PRESS
Shift Clr 1 = AC
to ensure that no other data is present in the calculator memory.
Data Input:
Data is inputted, one by one, by keying in the datum X first, followed by a comma
and the Y value, and pressing the DT (M+) button
PRESS
2.3 , 48 M+
2.3 , 57 M+
2.2 , 66 M+
1.8 , 70 M+
1.8 , 89 M+
2.9 , 36 M+
2.2 , 46 M+
2.4 , 54 M+
2.9 , 26 M+
2.1 , 77 M+
2.4 , 89 M+
2.4 , 67 M+
2.2 , 47 M+
2.3 , 51 M+
2.2 , 57 M+
2 , 66 M+
2.5 , 79 M+
1.9 , 88 M+
2.1 , 60 M+
2.4 , 49 M+
2.3 , 77 M+
2.9 , 52 M+
2.3 , 60 M+
Shift 1 1 = 123.24            ΣX²
Shift 1 2 = 52.8              ΣX
Shift 1 3 = 23                n
Shift 2 1 = 2.295652174       X̄ (mean of X)
Shift 1 1 = 92,707.           ΣY²
Shift 1 2 = 1,411.            ΣY
Shift 1 3 = 3,171.9           ΣXY
Shift 2 3 = -0.602310478      r
from the new freshman class in a study to determine whether a student’s grade
point average (GPA) at the end of the freshman year (Y) can be predicted from
the entrance test score (X). The results of the study follow.
a) Compute the slope and the intercept of the regression line.
b) Test the significance of the slope (use $\alpha = .05$).
c) Calculate r.
d) Test the null hypothesis that $\rho = 0$ against the alternative that $\rho > 0$ at the 0.01
level of significance.
e) What percentage of the variation in the GPA is explained by difference in the
entrance test scores?
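n = 20, $\sum X = 103$, $\sum Y = 50.0$, $\sum X^2 = 543.92$, $\sum Y^2 = 134.84$, $\sum XY = 262.46$

$$\bar{X} = \frac{103}{20} = 5.15, \qquad \bar{Y} = \frac{50}{20} = 2.50$$

$$b_1 = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2} = \frac{20(262.46) - (103)(50.0)}{20(543.92) - (103)^2} = \frac{99.2}{269.4} = 0.368$$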
PRESS
MODE MODE 2 1
The calculator is ready to perform Linear Regression Function. Again, before we
start to input data PRESS
Shift Clr 1 = AC
to ensure that no other data is present in the calculator memory.
Data Input:
Data is inputted, one by one, by keying in the datum X first, followed by a comma
and the Y value, and pressing the DT (M+) button
PRESS
5.50 , 3.10 M+
4.80 , 2.30 M+
4.70 , 3.00 M+
3.90 , 1.90 M+
4.50 , 2.50 M+
6.20 , 3.70 M+
6.00 , 3.40 M+
5.20 , 2.60 M+
4.70 , 2.80 M+
7.30 , 1.60 M+
4.90 , 2.00 M+
5.40 , 5.40 M+
5.00 , 5.00 M+
6.30 , 6.30 M+
4.60 , 4.60 M+
4.30 , 4.30 M+
5.00 , 5.00 M+
5.90 , 5.90 M+
4.10 , 4.10 M+
4.70 , 4.70 M+
Shift 1 1 = 543.92            ΣX²
Shift 1 2 = 103.              ΣX
Shift 1 3 = 20.               n
Shift 2 1 = 5.15              X̄ (mean of X)
Shift 1 1 = 134.84            ΣY²
Shift 1 2 = 50.0              ΣY
Shift 1 3 = 262.46            ΣXY
Shift 2 1 = 0.603637713       A (intercept)
Shift 2 2 = 0.368225686       B (slope)
Shift 2 3 = 0.430824437       r
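c) Computation:

$$S^2_{y \cdot x} = \frac{\sum Y^2 - \frac{\left(\sum Y\right)^2}{n} - b_1\left(\sum XY - \frac{\sum X \sum Y}{n}\right)}{n-2} = \frac{134.84 - \frac{(50.0)^2}{20} - 0.368\left(262.46 - \frac{(103)(50.0)}{20}\right)}{20-2} = \frac{8.015}{18} = 0.445$$

$$SSX = \sum X^2 - \frac{\left(\sum X\right)^2}{n} = 543.92 - \frac{(103)^2}{20} = 13.47$$

$$SE(b_1) = \sqrt{\frac{S^2_{y \cdot x}}{SSX}} = \sqrt{\frac{0.445}{13.47}} = \sqrt{0.0331} = 0.182$$

$$t = \frac{b_1 - \beta_1}{SE(b_1)} = \frac{0.368 - 0}{0.1818} = 2.024$$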
d) Decision: Since the computed t-value (2.024) is less than the tabulated t-value (2.101, with 0.025 in each tail), there
is not sufficient evidence to reject the null hypothesis.
e) Conclusion: Therefore, the slope of the regression line is not significantly different from
zero. This means the regression equation cannot be used reliably to estimate the
grade point average when the entrance test score is given.
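$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}} = \frac{20(262.46) - (103)(50)}{\sqrt{\left[20(543.92) - (103)^2\right]\left[20(134.84) - (50.0)^2\right]}} = \frac{99.2}{\sqrt{53,017.92}} = 0.431$$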
a) Ho: There is no linear relationship between entrance exam score and grade
point average ($\rho = 0$).
Ha: There is a linear relationship between entrance exam score and grade
point average ($\rho \neq 0$).
b) Critical region and level of significance: at the 0.05 level of significance,
the critical region is $|t| > t_{\alpha/2,\, n-2} = t_{0.025,\, 18} = 2.101$.
c) Computation:

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{0.431\sqrt{20-2}}{\sqrt{1-(0.431)^2}} = 2.026$$
d) Decision: Since the computed t-value (2.026) is less than the tabulated t-value (2.101, with 0.025 in each tail),
there is not enough evidence to reject the null
hypothesis.
About 18.6% of the variation in grade point average is explained by differences in
entrance exam scores ($r^2 = 0.431^2 \approx 0.186$). The other 81.4% of the variation in
grade point average is explained by factors not included in the study.
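The share of variation explained can be checked from the summary totals of the GPA problem. This is a small sketch using the sums reported in the worksheet; the variable names are illustrative.

```python
import math

# Summary totals for the entrance score (X) and GPA (Y) data
n = 20
sum_x, sum_y = 103.0, 50.0
sum_xx, sum_yy, sum_xy = 543.92, 134.84, 262.46

r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2)
)
r_squared = r ** 2                  # coefficient of determination
explained_pct = 100 * r_squared     # percent of variation in Y explained by X
unexplained_pct = 100 - explained_pct

print(round(r, 3), round(explained_pct, 1), round(unexplained_pct, 1))
```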
76 65 5776 4225 4940
65 84 4225 7056 5460
45 116 2025 13456 5220
58 76 3364 5776 4408
45 97 2025 9409 4365
53 100 2809 10000 5300
49 105 2401 11025 5145
78 77 6084 5929 6006
967 1379 60409 121887 81331
Summary:
n = 16, $\sum X = 967$, $\sum Y = 1,379$, $\sum X^2 = 60,409$, $\sum Y^2 = 121,887$, $\sum XY = 81,331$

$$\bar{X} = \frac{967}{16} = 60.4375, \qquad \bar{Y} = \frac{1,379}{16} = 86.1875$$
PRESS
MODE MODE 2 1
The calculator is ready to perform Linear Regression Function. Again, before we
start to input data PRESS
Shift Clr 1 = AC
to ensure that no other data is present in the calculator memory.
Data Input:
Data is inputted, one by one, by keying in the datum X first, followed by a comma
and the Y value, and pressing the DT (M+) button
PRESS
71 , 82 M+
64 , 91 M+
43 , 100 M+
67 , 68 M+
56 , 87 M+
73 , 73 M+
68 , 78 M+
56 , 80 M+
76 , 65 M+
65 , 84 M+
45 , 116 M+
58 , 76 M+
45 , 97 M+
53 , 100 M+
49 , 105 M+
78 , 77 M+
Shift 1 1 = 60,409.           ΣX²
Shift 1 2 = 967.              ΣX
Shift 1 3 = 16.               n
Shift 2 1 = 60.4375           X̄ (mean of X)
Shift 1 1 = 121,887.          ΣY²
Shift 1 2 = 1,379.            ΣY
Shift 1 3 = 81,331.           ΣXY
Shift 2 1 = 148.0506756       A (intercept)
Shift 2 2 = -1.023589254      B (slope)
Shift 2 3 = -0.823894252      r
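c) Computation:

$$S^2_{y \cdot x} = \frac{\sum Y^2 - \frac{\left(\sum Y\right)^2}{n} - b_1\left(\sum XY - \frac{\sum X \sum Y}{n}\right)}{n-2} = \frac{121,887 - \frac{(1,379)^2}{16} - (-1.024)\left(81,331 - \frac{(967)(1,379)}{16}\right)}{16-2} = \frac{973.8295}{14} = 69.5593$$

$$SSX = \sum X^2 - \frac{\left(\sum X\right)^2}{n} = 60,409 - \frac{(967)^2}{16} = 1,965.9375$$

$$SE(b_1) = \sqrt{\frac{S^2_{y \cdot x}}{SSX}} = \sqrt{\frac{69.5593}{1,965.9375}} = \sqrt{0.0354} = 0.1881$$

$$t = \frac{b_1 - \beta_1}{SE(b_1)} = \frac{-1.024 - 0}{0.1881} = -5.4439$$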
d) Decision: Since the absolute value of the computed t-value (5.4439) is greater than the tabulated t-value (2.145),
there is enough evidence to reject the null hypothesis. The
computed t-value of -5.4439 lies in the critical region, below -2.145.
e) Conclusion: Therefore, the slope of the regression line is significantly different from zero.
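$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}} = \frac{16(81,331) - (967)(1,379)}{\sqrt{\left[16(60,409) - (967)^2\right]\left[16(121,887) - (1,379)^2\right]}} = \frac{-32,197}{\sqrt{1,527,171,705}} = -0.8239$$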
b) Critical region and level of significance: at the 0.05 level of significance,
the critical region is $|t| > t_{\alpha/2,\, n-2} = t_{0.025,\, 14} = 2.145$.
c) Computation:

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{-0.8239\sqrt{16-2}}{\sqrt{1-(-0.8239)^2}} = -5.4395$$

d) Decision: Since the absolute value of the computed t-value is greater than the tabulated t-value,
there is evidence to reject the null hypothesis.
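Summary:
n = 10, $\sum X = 43$, $\sum Y = 31$, $\sum X^2 = 257$, $\sum Y^2 = 145$, $\sum XY = 170$

$$\bar{X} = \frac{43}{10} = 4.3, \qquad \bar{Y} = \frac{31}{10} = 3.1$$

$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}} = \frac{10(170) - (43)(31)}{\sqrt{\left[10(257) - (43)^2\right]\left[10(145) - (31)^2\right]}} = \frac{367}{\sqrt{352,569}} = 0.618$$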
PRESS
MODE MODE 2 1
The calculator is ready to perform Linear Regression Function. Again, before we
start to input data PRESS
Shift Clr 1 = AC
to ensure that no other data is present in the calculator memory.
Data Input:
Data is inputted, one by one, by keying in the datum X first, followed by a comma
and the Y value, and pressing the DT (M+) button
PRESS
1 , 1 M+
5 , 4 M+
6 , 2 M+
1 , 3 M+
8 , 5 M+
2 , 1 M+
5 , 2 M+
9 , 6 M+
4 , 7 M+
2 , 0 M+
Shift 1 1 = 257.              ΣX²
Shift 1 2 = 43.               ΣX
Shift 1 3 = 10.               n
Shift 2 1 = 4.3               X̄ (mean of X)
Shift 1 1 = 145.              ΣY²
Shift 1 2 = 31.               ΣY
Shift 1 3 = 170.              ΣXY
Shift 2 3 = 0.61807902        r
a) Ho: There is no linear relationship between (X) how many years they have
lived in their neighborhood and (Y) how many of their neighbors they
regard as friends ($\rho = 0$).
Ha: There is a linear relationship between (X) how many years they have lived
in their neighborhood and (Y) how many of their neighbors they regard
as friends ($\rho \neq 0$).
b) Critical region and level of significance: at the 0.10 level of significance,
the critical region is $|t| > t_{\alpha/2,\, n-2} = t_{0.05,\, 8} = 1.86$.
c) Computation:

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{0.618\sqrt{10-2}}{\sqrt{1-(0.618)^2}} = 2.2234$$

d) Decision: Since the computed t-value is greater than the tabulated t-value,
there is evidence to reject the null hypothesis.
6. A criminologist studying the relationship between population density and robbery rate
in medium-sized US cities collected the following data for a random sample of 16
cities; X is the population density of the city (number of people per unit area), and Y
is the robbery rate last year (number of robberies per 100,000 people). Assume that
the first-order regression model is appropriate.
a) Obtain the estimated regression function. Plot the estimated regression function
and the data. Does the linear regression function appear to give a good fit here?
Discuss.
b) Obtain point estimates of the mean robbery rate last year in cities with population
density X = 60.
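Summary:
n = 16, $\sum X = 1,118$, $\sum Y = 3,220$, $\sum X^2 = 81,452$, $\sum Y^2 = 649,736$, $\sum XY = 225,869$

$$\bar{X} = \frac{1,118}{16} = 69.875, \qquad \bar{Y} = \frac{3,220}{16} = 201.25$$

Regression equation: $\hat{Y} = b_0 + b_1 X = 182.9707 + 0.2616X$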
PRESS
MODE MODE 2 1
The calculator is ready to perform Linear Regression Function. Again, before we
start to input data PRESS
Shift Clr 1 = AC
to ensure that no other data is present in the calculator memory.
Data Input:
Data is inputted, one by one, by keying in the datum X first, followed by a comma
and the Y value, and pressing the DT (M+) button
PRESS
59 , 209 M+
49 , 180 M+
75 , 195 M+
54 , 192 M+
78 , 215 M+
56 , 197 M+
60 , 208 M+
82 , 189 M+
69 , 213 M+
83 , 201 M+
88 , 214 M+
94 , 212 M+
47 , 205 M+
65 , 186 M+
89 , 200 M+
70 , 204 M+
Shift 1 1 = 81,452.           ΣX²
Shift 1 2 = 1,118.            ΣX
Shift 1 3 = 16.               n
Shift 2 1 = 69.875            X̄ (mean of X)
Shift 1 1 = 649,736.          ΣY²
Shift 1 2 = 3,220.            ΣY
Shift 1 3 = 225,869.          ΣXY
Shift 2 1 = 182.9724994       A (intercept)
Shift 2 2 = 0.261574247       B (slope)
Shift 2 3 = 0.365011194       r
[Figure: Scatter plot of robbery rate per 100,000 (vertical axis, 175 to 220) against population density (horizontal axis, 0 to 100).]
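$$S^2_{y \cdot x} = \frac{\sum Y^2 - \frac{\left(\sum Y\right)^2}{n} - b_1\left(\sum XY - \frac{\sum X \sum Y}{n}\right)}{n-2} = \frac{649,736 - \frac{(3,220)^2}{16} - 0.2616\left(225,869 - \frac{(1,118)(3,220)}{16}\right)}{16-2} = \frac{1,483.0156}{14} = 105.9296$$

$$SSX = \sum X^2 - \frac{\left(\sum X\right)^2}{n} = 81,452 - \frac{(1,118)^2}{16} = 3,331.75$$

$$SE(b_1) = \sqrt{\frac{S^2_{y \cdot x}}{SSX}} = \sqrt{\frac{105.9296}{3,331.75}} = \sqrt{0.03179} = 0.1783$$

$$t = \frac{b_1 - \beta_1}{SE(b_1)} = \frac{0.2616 - 0}{0.1783} = 1.4672$$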
d) Decision: Do not reject the null hypothesis.
e) Conclusion: Therefore, the slope of the regression line is not significantly
different from zero. Thus, the regression equation cannot be
used reliably to estimate the robbery rate when the population density is given.
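Part (b) of the robbery problem asks for a point estimate at X = 60. A sketch of that calculation from the summary totals follows; the variable names are illustrative, and the result is only as trustworthy as the fitted line, whose slope the test above finds not significant.

```python
# Summary totals for population density (X) and robbery rate (Y)
n = 16
sum_x, sum_y = 1118.0, 3220.0
sum_xx, sum_xy = 81452.0, 225869.0

# Least-squares slope and intercept
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
b0 = sum_y / n - b1 * sum_x / n

# Point estimate of the mean robbery rate at density X = 60
x_new = 60
y_hat = b0 + b1 * x_new

print(round(b0, 4), round(b1, 4), round(y_hat, 2))
```

The intercept computed here with the unrounded slope agrees with the calculator's A value (182.9725); the worksheet's 182.9707 comes from rounding the slope to 0.2616 first.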
2 6 4 36 12
3 7 9 49 21
4 1 16 1 4
65 48 477 250 201
Summary:
n = 12, $\sum X = 65$, $\sum Y = 48$, $\sum X^2 = 477$, $\sum Y^2 = 250$, $\sum XY = 201$

$$\bar{X} = \frac{65}{12} = 5.4167, \qquad \bar{Y} = \frac{48}{12} = 4.0$$
PRESS
MODE MODE 2 1
The calculator is ready to perform Linear Regression Function. Again, before we
start to input data PRESS
Shift Clr 1 = AC
to ensure that no other data is present in the calculator memory.
Data Input:
Data is inputted, one by one, by keying in the datum X first, followed by a comma
and the Y value, and pressing the DT (M+) button
PRESS
2 , 8 M+
7 , 3 M+
5 , 4 M+
12 , 2 M+
1 , 5 M+
10 , 2 M+
8 , 1 M+
6 , 5 M+
5 , 4 M+
2 , 6 M+
3 , 7 M+
4 , 1 M+
Shift 1 1 = 477.              ΣX²
Shift 1 2 = 65.               ΣX
Shift 1 3 = 12.               n
Shift 2 1 = 5.416666667       X̄ (mean of X)
Shift 1 1 = 250.              ΣY²
Shift 1 2 = 48.               ΣY
Shift 1 3 = 201.              ΣXY
Shift 2 3 = -0.693150947      r
$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{-0.6932\sqrt{12-2}}{\sqrt{1-(-0.6932)^2}} = -3.041$$

d) Decision: The absolute value of the computed t-value (3.041) is greater than the tabulated t-value (2.228), thus there
is evidence to reject the null hypothesis. The computed t-value lies in
the critical region.
e) Conclusion: Therefore, there is a linear relationship between length of
unemployment and job-seeking activity among white-collar
workers. The two variables are moderately and inversely
correlated.
8. In preparing for an examination, some students in a class studied more than others.
Each student’s grade on the 10-point exam and the number of hours studied are listed
as follows:
Summary:
n = 8, $\sum X = 36$, $\sum Y = 43$, $\sum X^2 = 204$, $\sum Y^2 = 285$, $\sum XY = 226$

$$\bar{X} = \frac{36}{8} = 4.5, \qquad \bar{Y} = \frac{43}{8} = 5.375$$
PRESS
MODE MODE 2 1
The calculator is ready to perform Linear Regression Function. Again, before we
start to input data PRESS
Shift Clr 1 = AC
to ensure that no other data is present in the calculator memory.
Data Input:
Data is inputted, one by one, by keying in the datum X first, followed by a comma
and the Y value, and pressing the DT (M+) button
PRESS
4 , 5 M+
1 , 2 M+
3 , 1 M+
5 , 5 M+
8 , 9 M+
2 , 7 M+
7 , 6 M+
6 , 8 M+
Shift 1 1 = 204.              ΣX²
Shift 1 2 = 36.               ΣX
Shift 1 3 = 8.                n
Shift 2 1 = 4.5               X̄ (mean of X)
Shift 1 1 = 285.              ΣY²
Shift 1 2 = 43.               ΣY
Shift 1 3 = 226.              ΣXY
Shift 2 3 = 0.683227084       r
a) Ho: There is no linear relationship between exam grade and number of hours
studied ($\rho = 0$).
Ha: There is a linear relationship between exam grade and number of hours
studied ($\rho \neq 0$).
b) Critical region and level of significance: at the 0.05 level of significance,
the critical region is $|t| > t_{\alpha/2,\, n-2} = t_{0.025,\, 6} = 2.447$.
c) Computation:

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{0.683\sqrt{8-2}}{\sqrt{1-(0.683)^2}} = 2.292$$

d) Decision: Since the computed t-value is less than the tabulated t-value,
there is not enough evidence to reject the null
hypothesis.
e) Conclusion: Therefore, there is not enough evidence to conclude that exam
grade is linearly correlated with the number of hours studied.
No significant linear relationship was found between exam
grade and number of hours studied.