ProbStat - Curvefitting - U5notes
(The method of least squares, curvilinear regression, multiple regressions, correlation (excluding
causation))
Curve fitting: Computing a curve corresponding to a given set of points is called a curve fitting
Regression: A relation between independent and dependent variables obtained from a given set of points is
called a regression
Simple regression: A relation between one dependent variable and one independent variable obtained
from a given set of points is called a simple regression
Multiple regression: A relation between one dependent variable and two or more independent variables
obtained from a given set of points is called a multiple regression
Regression curve of y on x1, x2: A relation of the form $y = a + bx_1 + cx_2$ is called a regression curve of
y on x1, x2
Least Squares Method: The method of computing a curve (or regression curve) from a given set of
points such that the sum of the squares of the deviations of the points from the curve, measured along the
y axis, is minimum is called the least squares method
1. To fit a straight line of the form $y = a + bx$, the normal equations are given by
$\sum y = na + b\sum x$
$\sum xy = a\sum x + b\sum x^2$
2. To fit a straight line of the form $x = a + by$, the normal equations are given by
$\sum x = na + b\sum y$
$\sum xy = a\sum y + b\sum y^2$
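6. To fit a parabola of 2nd degree (or quadratic curve) of the form $y = a + bx + cx^2$, the normal equations
are given by
$\sum y = na + b\sum x + c\sum x^2$
$\sum xy = a\sum x + b\sum x^2 + c\sum x^3$
$\sum x^2 y = a\sum x^2 + b\sum x^3 + c\sum x^4$
7. To fit a multiple regression curve of the form $z = a + bx + cy$, the normal equations are given by
$\sum z = na + b\sum x + c\sum y$
$\sum xz = a\sum x + b\sum x^2 + c\sum xy$
$\sum yz = a\sum y + b\sum xy + c\sum y^2$
8. To fit a multiple regression curve of the form $y = a + bx_1 + cx_2$, the normal equations are given by
$\sum y = na + b\sum x_1 + c\sum x_2$
$\sum x_1 y = a\sum x_1 + b\sum x_1^2 + c\sum x_1 x_2$
$\sum x_2 y = a\sum x_2 + b\sum x_1 x_2 + c\sum x_2^2$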
1. Fit a straight line $y = a + bx$ for the following data by least squares method
x 1 2 3 4 5
y 12 25 40 50 65
2. By the method of least squares, fit a straight line $y = a + bx$ for the following data
x 50 70 100 120
y 12 15 21 25
x 12 15 21 25
y 50 70 100 120
x 1 3 5 7 9
y 100 81 73 54 43
Consider,
x    y     Y = log y    x^2    xY
1    100   2            1      2
3    81    1.9085       9      5.7255
5    73    1.8633       25     9.3165
7    54    1.7324       49     12.1268
9    43    1.6335       81     14.7015
Here $\sum x = 25$, $\sum Y = 9.1377$, $\sum x^2 = 165$, $\sum xY = 43.8703$ and $n = 5$
The normal equations become $9.1377 = 5A + 25B$   (1)
and $43.8703 = 25A + 165B$   (2)
Solving (1) and (2), $A = 2.0548$ and $B = -0.0455$
Therefore, $a = 10^A = 113.4488$ and $b = \frac{B}{\log e} = -0.1048$
Hence the required curve is $y = 113.4488\, e^{-0.1048x}$
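The log-linearisation used in this solution can be sketched in Python. This is a minimal illustration, assuming the model is $y = a e^{bx}$ with $Y = \log_{10} y$, $A = \log_{10} a$ and $B = b \log_{10} e$; the function name `fit_exponential` is hypothetical and not part of the notes. Because the code keeps full precision in the logarithms, its results differ slightly from the hand computation, which rounds to four decimals.

```python
import math

def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) by least squares on Y = log10(y)."""
    n = len(xs)
    Ys = [math.log10(y) for y in ys]
    sx = sum(xs)
    sY = sum(Ys)
    sxx = sum(x * x for x in xs)
    sxY = sum(x * Y for x, Y in zip(xs, Ys))
    # Normal equations: sY = n*A + B*sx  and  sxY = A*sx + B*sxx
    det = n * sxx - sx * sx
    A = (sY * sxx - sx * sxY) / det
    B = (n * sxY - sx * sY) / det
    a = 10 ** A                    # A = log10(a)
    b = B / math.log10(math.e)     # B = b * log10(e)
    return a, b

a, b = fit_exponential([1, 3, 5, 7, 9], [100, 81, 73, 54, 43])
print(f"y = {a:.2f} * exp({b:.4f} * x)")
```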
x 1 2 3 4 5
y 130 152.2 177.3 190.2 244.7
Consider,
x    y       Y = log y    x^2    xY
1    130     2.1139       1      2.1139
2    152.2   2.1824       4      4.3648
3    177.3   2.2487       9      6.7461
4    190.2   2.2792       16     9.1168
5    244.7   2.3886       25     11.9432
Here $\sum x = 15$, $\sum Y = 11.2129$, $\sum x^2 = 55$, $\sum xY = 34.2849$ and $n = 5$
The normal equations become $11.2129 = 5A + 15B$   (1)
and $34.2849 = 15A + 55B$   (2)
Solving (1) and (2), $A = 2.0487$ and $B = 0.0646$
Therefore, $a = 10^A = 111.8716$ and $b = 10^B = 1.1604$
x 2 3 4 5 6
y 144 172.8 207.4 248.8 298.6
and $n = 5$
The normal equations become $11.5837 = 5A + 20B$ and $47.1266 = 20A + 90B$
On solving, $A = 2$ and $B = 0.0792$, or $a = 100$ and $b = 1.2$
Hence the exponential curve is $y = 100\,(1.2)^x$
7. Fit a power curve $y = ax^b$ for the following data
x 1 2 3 4 5 6
y 2.98 4.26 5.21 6.10 6.80 7.50
x 20 25 28 35 43
y 52 48 63 79 95
Consider,
x    y    x^2    xy
20   52   400    1040
25   48   625    1200
28   63   784    1764
35   79   1225   2765
43   95   1849   4085
Here $\sum x = 151$, $\sum y = 337$, $\sum x^2 = 4883$, $\sum xy = 10854$ and $n = 5$
The normal equations become $337 = 5a + 151b$   (1)
and $10854 = 151a + 4883b$   (2)
Solving (1) and (2), $a = 4.0998$ and $b = 2.096$
Hence the regression line of y on x is $y = 4.0998 + 2.096x$
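The two normal equations for a straight line can be solved in closed form. The following Python sketch repeats the computation above for the same data; the function name `fit_line` is an assumption made for illustration only.

```python
def fit_line(xs, ys):
    """Least squares line y = a + b*x from the two normal equations."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Normal equations: sy = n*a + b*sx  and  sxy = a*sx + b*sxx
    det = n * sxx - sx * sx
    a = (sy * sxx - sx * sxy) / det
    b = (n * sxy - sx * sy) / det
    return a, b

a, b = fit_line([20, 25, 28, 35, 43], [52, 48, 63, 79, 95])
print(f"y = {a:.4f} + {b:.4f} x")
```

Solving by determinants (Cramer's rule) keeps the sketch dependency-free; for larger systems a library solver would be the more usual choice.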
x 1 3 5 7 9
y 2 7 10 11 9
$\sum x^2 y = a\sum x^2 + b\sum x^3 + c\sum x^4$
Consider,
x    y    x^2    x^3    x^4    xy    x^2 y
1    2    1      1      1      2     2
3    7    9      27     81     21    63
5    10   25     125    625    50    250
7    11   49     343    2401   77    539
9    9    81     729    6561   81    729
Here $\sum x = 25$, $\sum y = 39$, $\sum x^2 = 165$, $\sum x^3 = 1225$, $\sum x^4 = 9669$, $\sum xy = 231$,
$\sum x^2 y = 1583$ and $n = 5$
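With these sums, the three normal equations for a parabola $y = a + bx + cx^2$ form a 3 by 3 linear system. The Python sketch below solves it by Cramer's rule for the data of this problem; the helper names `det3` and `fit_parabola` are hypothetical, and the coefficients it prints are computed here rather than taken from the notes.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def fit_parabola(xs, ys):
    """Least squares parabola y = a + b*x + c*x^2 via the normal equations."""
    n = len(xs)
    s1 = sum(xs)
    s2 = sum(x ** 2 for x in xs)
    s3 = sum(x ** 3 for x in xs)
    s4 = sum(x ** 4 for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2y = sum(x * x * y for x, y in zip(xs, ys))
    M = [[n, s1, s2], [s1, s2, s3], [s2, s3, s4]]
    rhs = [sy, sxy, sx2y]
    D = det3(M)
    coeffs = []
    for col in range(3):
        # Cramer's rule: replace one column by the right-hand side
        Mc = [row[:] for row in M]
        for r in range(3):
            Mc[r][col] = rhs[r]
        coeffs.append(det3(Mc) / D)
    return tuple(coeffs)

a, b, c = fit_parabola([1, 3, 5, 7, 9], [2, 7, 10, 11, 9])
print(f"y = {a:.4f} + {b:.4f} x + {c:.4f} x^2")
```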
$\sum x^2 y = a\sum x^2 + b\sum x^3 + c\sum x^4$
Consider,
x     y     x^2     x^3       x^4        xy      x^2 y
1.0   1.1   1       1         1          1.1     1.1
1.5   1.3   2.25    3.375     5.0625     1.95    2.925
2.0   1.6   4       8         16         3.2     6.4
2.5   2.0   6.25    15.625    39.0625    5       12.5
3.0   2.7   9       27        81         8.1     24.3
3.5   3.4   12.25   42.875    150.0625   11.9    41.65
4.0   4.1   16      64        256        16.4    65.6
Totals: $\sum x = 17.5$, $\sum y = 16.2$, $\sum x^2 = 50.75$, $\sum x^3 = 161.875$, $\sum x^4 = 548.1875$,
$\sum xy = 47.65$, $\sum x^2 y = 154.475$
Sum of the squares of the deviations is given by $S = \sum d_i^2 = \sum [y_i - f(x_i)]^2 = \sum [y_i - (a + bx_i)]^2$
y 3 5 6 8 12 14
x1 16 10 7 4 3 2
x2 90 72 54 42 30 12
$\sum x_2 y = a\sum x_2 + b\sum x_1 x_2 + c\sum x_2^2$
Consider,
y    x1   x2   x1^2   x2^2    x1 x2   x1 y   x2 y
3    16   90   256    8100    1440    48     270
5    10   72   100    5184    720     50     360
6    7    54   49     2916    378     42     324
8    4    42   16     1764    168     32     336
12   3    30   9      900     90      36     360
14   2    12   4      144     24      28     168
Totals: $\sum y = 48$, $\sum x_1 = 42$, $\sum x_2 = 300$, $\sum x_1^2 = 434$, $\sum x_2^2 = 19008$,
$\sum x_1 x_2 = 2820$, $\sum x_1 y = 236$, $\sum x_2 y = 1818$
14. Determine the least squares regression equation of the form $z = a + bx + cy$ for the following data
z 16 19 23 20 26 23 28
x 1 2 3 4 5 6 7
y 4 5 7 2 6 1 4
$\sum yz = a\sum y + b\sum xy + c\sum y^2$
Consider,
z    x    y    x^2   y^2   xy    xz    yz
16   1    4    1     16    4     16    64
19   2    5    4     25    10    38    95
23   3    7    9     49    21    69    161
20   4    2    16    4     8     80    40
26   5    6    25    36    30    130   156
23   6    1    36    1     6     138   23
28   7    4    49    16    28    196   112
Totals: $\sum z = 155$, $\sum x = 28$, $\sum y = 29$, $\sum x^2 = 140$, $\sum y^2 = 147$, $\sum xy = 107$,
$\sum xz = 667$, $\sum yz = 651$
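The three normal equations for $z = a + bx + cy$ can be assembled from these sums and solved numerically. The sketch below, in Python, uses a small Gaussian elimination routine; `solve_linear` and `fit_plane` are hypothetical names introduced for illustration, not part of the notes.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        # Bring the row with the largest pivot candidate into position
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_plane(xs, ys, zs):
    """Least squares plane z = a + b*x + c*y via the normal equations."""
    n = len(zs)
    sx, sy, sz = sum(xs), sum(ys), sum(zs)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxz = sum(x * z for x, z in zip(xs, zs))
    syz = sum(y * z for y, z in zip(ys, zs))
    A = [[n, sx, sy], [sx, sxx, sxy], [sy, sxy, syy]]
    return solve_linear(A, [sz, sxz, syz])

z = [16, 19, 23, 20, 26, 23, 28]
x = [1, 2, 3, 4, 5, 6, 7]
y = [4, 5, 7, 2, 6, 1, 4]
a, b, c = fit_plane(x, y, z)
print(f"z = {a:.4f} + {b:.4f} x + {c:.4f} y")
```

For this particular data every point satisfies $z = 10 + 2x + y$ exactly, so the fitted coefficients come out as 10, 2 and 1 up to rounding error.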
Exercise:
1. Fit a straight line $y = a + bx$ for the following data by least squares method
x 1 2 3 4 5 6
y 14 33 40 63 76 85
2. Fit a straight line $y = a + bx$ for the following data by least squares method
x 0 2 3 5 9
y -3 4 3 8 15
3. Fit a straight line $x = a + by$ for the following data by least squares method
4. The following shows the improvement of eight students in a speed-reading program, and the number
of weeks they have been in the program:
No.of weeks x 3 5 2 8 6 9 3 4
Speed gain (words/min.) y 86 118 49 193 164 232 73 109
x 1 2 3 4 5 6
y 2.98 4.26 5.21 6.10 6.80 7.50
x 50 60 70 90 100
y 65 51 40 26 08
9. Find Y when X1 = 10 and X2 = 6 from the least squares regression equation of Y on X1 and X2 for the
following data
Y 90 72 54 42 30 12
X1 3 5 6 8 12 14
X2 16 10 7 4 3 2
10. Fit a least-squares regression plane for the following data and also find y at $x_1 = 2.2$ and $x_2 = 90$.
y 5.3 7.8 7.4 9.8 10.8 9.1 8.1 7.2 6.5 12.6
x1 1.5 2.5 0.5 1.2 2.6 0.3 2.4 2 0.7 1.6
x2 66 87 69 141 93 105 111 78 66 123
Correlation: The relationship between two variables such that a change in one variable results in a
+ve or -ve change in the other, and a greater change in one variable results in a correspondingly greater
change in the other, is called a correlation. If a change in one variable is accompanied by a corresponding
change in the other variable, then the variables are called correlated.
Note:
(i) If the variables deviate in the same direction then the correlation is called direct or +ve correlation
(ii) If the variables deviate in opposite directions then the correlation is called inverse or -ve
correlation
Correlation Coefficient (or Karl Pearson coefficient of correlation): The numerical measurement of
linear relationship between the variables x and y is called the coefficient of correlation of x and y and it
is denoted by r ( x, y) or r
Note:
(i) The coefficient of correlation r always lies between $-1$ and $1$; that is, $-1 \le r \le 1$
(ii) If $r = 0$ then the variables are not correlated
(iii) If $r = 1$ then the variables are positively and perfectly correlated
(iv) If $r = -1$ then the variables are negatively and perfectly correlated
(v) If $0 < r < 1$ then the variables are positively and partially correlated
(vi) If $-1 < r < 0$ then the variables are negatively and partially correlated
Correlation formulas:
(i) Mean of x is given by $\bar{x} = \frac{\sum x}{n}$
(ii) Variance of x is given by $\sigma_x^2 = \frac{\sum (x - \bar{x})^2}{n}$ or $\sigma_x^2 = \frac{\sum x^2}{n} - (\bar{x})^2$
x 1 3 4 6 8 9 11 14
y 1 2 4 4 5 7 8 9
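Here $\bar{x} = \frac{\sum x}{n} = \frac{56}{8} = 7$ and $\bar{y} = \frac{\sum y}{n} = \frac{40}{8} = 5$
x    y    $x-\bar{x}$   $y-\bar{y}$   $(x-\bar{x})(y-\bar{y})$   $(x-\bar{x})^2$   $(y-\bar{y})^2$
1    1    -6    -4    24    36    16
3    2    -4    -3    12    16    9
4    4    -3    -1    3     9     1
6    4    -1    -1    1     1     1
8    5    1     0     0     1     0
9    7    2     2     4     4     4
11   8    4     3     12    16    9
14   9    7     4     28    49    16
Totals: $\sum x = 56$, $\sum y = 40$, $\sum (x-\bar{x})(y-\bar{y}) = 84$, $\sum (x-\bar{x})^2 = 132$, $\sum (y-\bar{y})^2 = 56$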
x 78 36 98 25 75 82 90 62 65 39
y 84 51 91 60 68 62 86 58 53 47
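Here $\bar{x} = \frac{\sum x}{n} = \frac{650}{10} = 65$ and $\bar{y} = \frac{\sum y}{n} = \frac{660}{10} = 66$
x    y    $x-\bar{x}$   $y-\bar{y}$   $(x-\bar{x})(y-\bar{y})$   $(x-\bar{x})^2$   $(y-\bar{y})^2$
78   84   13    18    234   169    324
36   51   -29   -15   435   841    225
98   91   33    25    825   1089   625
25   60   -40   -6    240   1600   36
75   68   10    2     20    100    4
82   62   17    -4    -68   289    16
90   86   25    20    500   625    400
62   58   -3    -8    24    9      64
65   53   0     -13   0     0      169
39   47   -26   -19   494   676    361
Totals: $\sum x = 650$, $\sum y = 660$, $\sum (x-\bar{x})(y-\bar{y}) = 2704$, $\sum (x-\bar{x})^2 = 5398$, $\sum (y-\bar{y})^2 = 2224$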
x 65 66 67 67 68 69 70 72
y 67 68 65 68 72 72 69 71
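x    y    $x-\bar{x}$   $y-\bar{y}$   $(x-\bar{x})(y-\bar{y})$   $(x-\bar{x})^2$   $(y-\bar{y})^2$
65   67   -3    -2    6     9     4
66   68   -2    -1    2     4     1
67   65   -1    -4    4     1     16
67   68   -1    -1    1     1     1
68   72   0     3     0     0     9
69   72   1     3     3     1     9
70   69   2     0     0     4     0
72   71   4     2     8     16    4
Totals: $\sum x = 544$, $\sum y = 552$, $\sum (x-\bar{x})(y-\bar{y}) = 24$, $\sum (x-\bar{x})^2 = 36$, $\sum (y-\bar{y})^2 = 44$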
x 65 66 67 67 68 69 70 72
y 67 68 65 68 72 72 69 71
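Determine (i) $\bar{x}$ and $\bar{y}$ (ii) $\sigma_x$ and $\sigma_y$ (iii) Cov(x, y) (iv) the correlation coefficient
between x and y
We use $\bar{x} = \frac{\sum x}{n}$, $\sigma_x^2 = \frac{\sum (x-\bar{x})^2}{n}$ and $\mathrm{Cov}(x, y) = \frac{\sum (x-\bar{x})(y-\bar{y})}{n}$
Consider,
x    y    $x-\bar{x}$   $y-\bar{y}$   $(x-\bar{x})(y-\bar{y})$   $(x-\bar{x})^2$   $(y-\bar{y})^2$
65   67   -3    -2    6     9     4
66   68   -2    -1    2     4     1
67   65   -1    -4    4     1     16
67   68   -1    -1    1     1     1
68   72   0     3     0     0     9
69   72   1     3     3     1     9
70   69   2     0     0     4     0
72   71   4     2     8     16    4
Totals: $\sum x = 544$, $\sum y = 552$, $\sum (x-\bar{x})(y-\bar{y}) = 24$, $\sum (x-\bar{x})^2 = 36$, $\sum (y-\bar{y})^2 = 44$
(i) $\bar{x} = \frac{\sum x}{n} = \frac{544}{8} = 68$ and $\bar{y} = \frac{\sum y}{n} = \frac{552}{8} = 69$
(ii) $\sigma_x^2 = \frac{\sum (x-\bar{x})^2}{n} = \frac{36}{8} = 4.5$, $\sigma_x = \sqrt{4.5} = 2.1213$
$\sigma_y^2 = \frac{\sum (y-\bar{y})^2}{n} = \frac{44}{8} = 5.5$, $\sigma_y = \sqrt{5.5} = 2.3452$
(iii) $\mathrm{Cov}(x, y) = \frac{\sum (x-\bar{x})(y-\bar{y})}{n} = \frac{24}{8} = 3$
(iv) $r = \frac{\mathrm{Cov}(x, y)}{\sigma_x \sigma_y} = \frac{3}{2.1213 \times 2.3452} = 0.6030$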
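The four quantities in this solution can be checked with a short Python sketch. It assumes the population (divide by n) definitions used throughout these notes; the function name `correlation_summary` is hypothetical.

```python
import math

def correlation_summary(xs, ys):
    """Means, population standard deviations, covariance and Pearson r."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    r = cov / (sx * sy)
    return mx, my, sx, sy, cov, r

xs = [65, 66, 67, 67, 68, 69, 70, 72]
ys = [67, 68, 65, 68, 72, 72, 69, 71]
mx, my, sx, sy, cov, r = correlation_summary(xs, ys)
print(mx, my, round(sx, 4), round(sy, 4), cov, round(r, 4))
```

Note that Python's `statistics.stdev` divides by n - 1; `statistics.pstdev` is the counterpart that matches the divide-by-n convention used here.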
5. Find the correlation coefficient between x and y from the following data
x 78 89 97 69 59 79 68 57
y 125 137 156 112 107 138 123 108
x 78 89 97 69 59 79 68 57
y 125 137 156 112 107 138 123 108
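Determine
(i) $\bar{x}$ and $\bar{y}$
(ii) $\sigma_x$ and $\sigma_y$
(iii) Cov(x, y)
(iv) the correlation coefficient between x and y
(v) two regression lines
x    y     $x-\bar{x}$   $y-\bar{y}$   $(x-\bar{x})(y-\bar{y})$   $(x-\bar{x})^2$   $(y-\bar{y})^2$
78   125   3.5     -0.75    -2.625    12.25    0.5625
89   137   14.5    11.25    163.125   210.25   126.5625
97   156   22.5    30.25    680.625   506.25   915.0625
69   112   -5.5    -13.75   75.625    30.25    189.0625
59   107   -15.5   -18.75   290.625   240.25   351.5625
79   138   4.5     12.25    55.125    20.25    150.0625
68   123   -6.5    -2.75    17.875    42.25    7.5625
57   108   -17.5   -17.75   310.625   306.25   315.0625
Totals: $\sum x = 596$, $\sum y = 1006$, $\sum (x-\bar{x})(y-\bar{y}) = 1591$, $\sum (x-\bar{x})^2 = 1368$, $\sum (y-\bar{y})^2 = 2055.5$
(i) $\bar{x} = \frac{\sum x}{n} = \frac{596}{8} = 74.5$ and $\bar{y} = \frac{\sum y}{n} = \frac{1006}{8} = 125.75$
(ii) $\sigma_x^2 = \frac{\sum (x-\bar{x})^2}{n} = \frac{1368}{8} = 171$, $\sigma_x = \sqrt{171} = 13.0767$
$\sigma_y^2 = \frac{\sum (y-\bar{y})^2}{n} = \frac{2055.5}{8} = 256.9375$, $\sigma_y = \sqrt{256.9375} = 16.0293$
(iii) $\mathrm{Cov}(x, y) = \frac{\sum (x-\bar{x})(y-\bar{y})}{n} = \frac{1591}{8} = 198.875$
(iv) $r = \frac{\mathrm{Cov}(x, y)}{\sigma_x \sigma_y} = \frac{198.875}{13.0767 \times 16.0293} = 0.9488$
(v) The regression line of y on x is given by $y - \bar{y} = r\frac{\sigma_y}{\sigma_x}(x - \bar{x})$
That is, $y - 125.75 = (0.9488)\frac{16.0293}{13.0767}(x - 74.5)$
$y - 125.75 = 1.1630\,(x - 74.5)$
$y = 125.75 + 1.1630x - 86.6435$
$y = 39.1065 + 1.163x$
And the regression line of x on y is given by $x - \bar{x} = r\frac{\sigma_x}{\sigma_y}(y - \bar{y})$
That is, $x - 74.5 = (0.9488)\frac{13.0767}{16.0293}(y - 125.75)$
$x - 74.5 = 0.7740\,(y - 125.75)$
$x = 74.5 + 0.774y - 97.3305$
$x = -22.8305 + 0.774y$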
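Both regression lines follow from the means, the standard deviations and r. The Python sketch below recomputes them for the data of this problem; `regression_lines` is a hypothetical helper name. Since it does not round intermediate values, the intercepts differ from the hand computation in the third decimal place.

```python
import math

def regression_lines(xs, ys):
    """Return (intercept, slope) for y on x and for x on y."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    r = cov / (sx * sy)
    b_yx = r * sy / sx   # slope of y on x
    b_xy = r * sx / sy   # slope of x on y
    return (my - b_yx * mx, b_yx), (mx - b_xy * my, b_xy)

xs = [78, 89, 97, 69, 59, 79, 68, 57]
ys = [125, 137, 156, 112, 107, 138, 123, 108]
y_on_x, x_on_y = regression_lines(xs, ys)
print("y = %.4f + %.4f x" % y_on_x)
print("x = %.4f + %.4f y" % x_on_y)
```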
7. The two regression equations of the variables x and y are y 0.399 x 6.934 0 and
x 1.212 y 2.461 0. Find (i) mean of x (ii) mean of y (iii) correlation coefficient between x and y
8. The two regression equations of the variables x and y are $x = 19.13 - 0.87y$ and $y = 11.64 - 0.50x$.
Find (i) mean of x (ii) mean of y (iii) correlation coefficient between x and y
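$\sigma_{x-y}^2 = \frac{1}{n}\sum [(x-y) - (\bar{x}-\bar{y})]^2 = \frac{1}{n}\sum [(x-\bar{x}) - (y-\bar{y})]^2$
$= \frac{1}{n}\sum (x-\bar{x})^2 + \frac{1}{n}\sum (y-\bar{y})^2 - \frac{2}{n}\sum (x-\bar{x})(y-\bar{y})$
$= \sigma_x^2 + \sigma_y^2 - 2r\sigma_x\sigma_y$
Therefore, $r = \frac{\sigma_x^2 + \sigma_y^2 - \sigma_{x-y}^2}{2\sigma_x\sigma_y}$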
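10. Establish the formula $r = \frac{\sigma_{x+y}^2 - \sigma_x^2 - \sigma_y^2}{2\sigma_x\sigma_y}$
Solution: Consider,
$\sigma_{x+y}^2 = \frac{1}{n}\sum [(x+y) - (\bar{x}+\bar{y})]^2 = \frac{1}{n}\sum [(x-\bar{x}) + (y-\bar{y})]^2$
$= \frac{1}{n}\sum (x-\bar{x})^2 + \frac{1}{n}\sum (y-\bar{y})^2 + \frac{2}{n}\sum (x-\bar{x})(y-\bar{y})$
$= \sigma_x^2 + \sigma_y^2 + 2r\sigma_x\sigma_y$
Therefore, $r = \frac{\sigma_{x+y}^2 - \sigma_x^2 - \sigma_y^2}{2\sigma_x\sigma_y}$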
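11. Use the formula $r = \frac{\sigma_x^2 + \sigma_y^2 - \sigma_{x-y}^2}{2\sigma_x\sigma_y}$ to compute the correlation coefficient for the following data
x 78 89 97 69 59 79 68 57
y 125 137 156 112 107 138 123 108
Solution: Consider,
$\sigma_x^2 = \frac{\sum (x-\bar{x})^2}{n} = \frac{1368}{8} = 171$, $\sigma_x = \sqrt{171} = 13.0767$
$\sigma_y^2 = \frac{\sum (y-\bar{y})^2}{n} = \frac{2055.5}{8} = 256.94$, $\sigma_y = \sqrt{256.94} = 16.0293$
With $z = x - y$, $\sigma_{x-y}^2 = \sigma_z^2 = \frac{\sum (z-\bar{z})^2}{n} = \frac{241.5}{8} = 30.1875$, $\sigma_{x-y} = \sqrt{30.1875} = 5.4943$
$r = \frac{\sigma_x^2 + \sigma_y^2 - \sigma_{x-y}^2}{2\sigma_x\sigma_y} = \frac{171 + 256.94 - 30.1875}{2 \times 13.0767 \times 16.0293} = 0.9488$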
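The variance-of-difference formula can be checked numerically against the direct covariance formula. The Python sketch below does this for the data of this problem; `pvar`, `r_from_difference` and `r_direct` are hypothetical names used only for illustration.

```python
import math

def pvar(values):
    """Population variance (divide by n)."""
    n = len(values)
    m = sum(values) / n
    return sum((v - m) ** 2 for v in values) / n

def r_from_difference(xs, ys):
    """r = (var_x + var_y - var_{x-y}) / (2 * sd_x * sd_y)."""
    zs = [x - y for x, y in zip(xs, ys)]
    vx, vy, vz = pvar(xs), pvar(ys), pvar(zs)
    return (vx + vy - vz) / (2 * math.sqrt(vx) * math.sqrt(vy))

def r_direct(xs, ys):
    """r = Cov(x, y) / (sd_x * sd_y), for comparison."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    return cov / math.sqrt(pvar(xs) * pvar(ys))

xs = [78, 89, 97, 69, 59, 79, 68, 57]
ys = [125, 137, 156, 112, 107, 138, 123, 108]
r1 = r_from_difference(xs, ys)
r2 = r_direct(xs, ys)
print(round(r1, 4), round(r2, 4))
```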
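12. If $\theta$ is the angle between the two regression lines, prove that $\tan\theta = \frac{1-r^2}{r}\cdot\frac{\sigma_x\sigma_y}{\sigma_x^2+\sigma_y^2}$
Solution: We know that the two regression lines are
$y - \bar{y} = r\frac{\sigma_y}{\sigma_x}(x - \bar{x})$   (1)
$x - \bar{x} = r\frac{\sigma_x}{\sigma_y}(y - \bar{y})$   (2)
From (2), $y - \bar{y} = \frac{1}{r}\frac{\sigma_y}{\sigma_x}(x - \bar{x})$
The slope of the line (1), $m_1 = r\frac{\sigma_y}{\sigma_x}$
The slope of the line (2), $m_2 = \frac{1}{r}\frac{\sigma_y}{\sigma_x}$
Therefore, $\tan\theta = \frac{m_1 - m_2}{1 + m_1 m_2} = \frac{r\frac{\sigma_y}{\sigma_x} - \frac{1}{r}\frac{\sigma_y}{\sigma_x}}{1 + r\frac{\sigma_y}{\sigma_x}\cdot\frac{1}{r}\frac{\sigma_y}{\sigma_x}} = \frac{\frac{r^2-1}{r}\cdot\frac{\sigma_y}{\sigma_x}}{\frac{\sigma_x^2+\sigma_y^2}{\sigma_x^2}} = \frac{r^2-1}{r}\cdot\frac{\sigma_x\sigma_y}{\sigma_x^2+\sigma_y^2}$
If $\theta$ is the acute angle then $\tan\theta$ is positive, and therefore $\tan\theta = \frac{1-r^2}{r}\cdot\frac{\sigma_x\sigma_y}{\sigma_x^2+\sigma_y^2}$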
Exercise:
x 62 56 36 66 25 75 82 78
y 58 44 51 58 60 68 62 84
3. Compute the correlation coefficient to the following data
x 8 1 5 4 7
y 3 4 0 2 1
x 50 60 70 90 100
y 65 51 40 26 08
Determine
(i) $\bar{x}$ and $\bar{y}$
(ii) $\sigma_x$ and $\sigma_y$
(iii) Cov( x, y)
(iv) the correlation coefficient between x and y
(v) two regression lines
5. The equations of two regression lines obtained in a correlation analysis are $4x - 5y + 33 = 0$ and
$20x - 9y = 107$. Compute (i) mean of x (ii) mean of y (iii) correlation coefficient between x and y
6. Psychological tests of intelligence and engineering ability were applied to 10 students. Here is a record
of ungrouped data showing intelligence ratio (IR) and engineering ratio (ER). Calculate the coefficient
of correlation
Student A B C D E F G H I J
IR 105 104 102 101 100 99 98 96 93 92
ER 101 103 100 98 95 96 104 92 97 94
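Correlation coefficient $r = \frac{\sum (x-\bar{x})(y-\bar{y})}{\sqrt{\sum (x-\bar{x})^2 \sum (y-\bar{y})^2}} = 0.59$
7. Use the formula $r = \frac{\sigma_x^2 + \sigma_y^2 - \sigma_{x-y}^2}{2\sigma_x\sigma_y}$ to compute the correlation coefficient for the following data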
X 62 56 36 66 25 75 82 78
Y 58 44 51 58 60 68 62 84
8. Given that $\bar{x} = 31.6$, $\bar{y} = 38$, $\sigma_x = 3.72$, $\sigma_y = 6.31$ and $r = 0.36$. Determine the two regression lines
Rank Correlation: The correlation between the ranks of the variables x and y is called the rank
correlation
Repeated Values: If an item of x or y is repeated m times, then we give the average rank for the repeated
items and add the factor $\frac{m(m^2-1)}{12}$ to $\sum d^2$ in the formula of $\rho$.
Problems:
1. Determine the rank correlation coefficient for the following data
x 68 64 75 50 64 80 75 40 55 64
y 62 58 68 45 81 60 68 48 50 70
Solution: The rank correlation of x, y is given by $\rho = 1 - \frac{6\sum d^2}{n(n^2-1)}$
The values of x in decreasing order: 80, 75, 75, 68, 64, 64, 64, 55, 50, 40
The values of y in decreasing order: 81, 70, 68, 68, 62, 60, 58, 50, 48, 45
Consider,
x    y    Rank x   Rank y   d = Rank x - Rank y   d^2
68   62   4        5        -1                    1
64   58   6        7        -1                    1
75   68   2.5      3.5      -1                    1
50   45   9        10       -1                    1
64   81   6        1        5                     25
80   60   1        6        -5                    25
75   68   2.5      3.5      -1                    1
40   48   10       9        1                     1
55   50   8        8        0                     0
64   70   6        2        4                     16
$\sum d^2 = 72$
Here, in the values of x, 75 is repeated 2 times and 64 is repeated 3 times
And in the values of y, 68 is repeated 2 times
Therefore the correction factor is given by
$\sum \frac{m(m^2-1)}{12} = \frac{2(2^2-1)}{12} + \frac{3(3^2-1)}{12} + \frac{2(2^2-1)}{12} = \frac{1}{2} + 2 + \frac{1}{2} = 3$
Now, $n = 10$, and $\sum d^2$ is replaced by $\sum d^2 + \sum \frac{m(m^2-1)}{12} = 72 + 3 = 75$
Hence the rank correlation coefficient,
$\rho = 1 - \frac{6\sum d^2}{n(n^2-1)} = 1 - \frac{6\,(75)}{10\,(10^2-1)} = 1 - \frac{450}{990} = 1 - 0.4545 = 0.5455$
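The ranking, tie correction and final formula used in this solution can be sketched in Python. The code assumes ranks are assigned in decreasing order with tied values sharing their average rank, as in the table above; the function names `average_ranks`, `tie_correction` and `rank_correlation` are hypothetical.

```python
from collections import Counter

def average_ranks(values):
    """Rank in decreasing order; tied values share the average of their positions."""
    ordered = sorted(values, reverse=True)
    positions = {}
    for pos, v in enumerate(ordered, start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def tie_correction(values):
    """Sum of m(m^2 - 1)/12 over every value repeated m > 1 times."""
    return sum(m * (m * m - 1) / 12 for m in Counter(values).values() if m > 1)

def rank_correlation(xs, ys):
    """Return (sum of d^2, correction factor, rho) using the tie-corrected formula."""
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    cf = tie_correction(xs) + tie_correction(ys)
    rho = 1 - 6 * (d2 + cf) / (n * (n * n - 1))
    return d2, cf, rho

xs = [68, 64, 75, 50, 64, 80, 75, 40, 55, 64]
ys = [62, 58, 68, 45, 81, 60, 68, 48, 50, 70]
d2, cf, rho = rank_correlation(xs, ys)
print(d2, cf, round(rho, 4))
```

Library routines for Spearman's coefficient typically compute Pearson's r on the ranks instead of applying this correction term, so their results can differ slightly when ties are present.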
2. Determine the rank correlation coefficient for the following data
x 10 15 12 17 13 16 24 14 22
y 30 42 45 46 33 34 40 35 39
Solution: The rank correlation of x, y is given by $\rho = 1 - \frac{6\sum d^2}{n(n^2-1)}$
Consider,
x 1 6 5 10 3 2 4 9 7 8
y 6 4 9 8 1 2 3 10 5 7
Calculate the rank correlation coefficient
Solution: The rank correlation of x, y is given by $\rho = 1 - \frac{6\sum d^2}{n(n^2-1)}$
Consider,
x    y    Rank x   Rank y   d = Rank x - Rank y   d^2
1    6    1        6        -5                    25
6    4    6        4        2                     4
5    9    5        9        -4                    16
10   8    10       8        2                     4
3    1    3        1        2                     4
2    2    2        2        0                     0
4    3    4        3        1                     1
9    10   9        10       -1                    1
7    5    7        5        2                     4
8    7    8        7        1                     1
$\sum d^2 = 60$
Here, there are no repetitions in the values of x and y
Now, $n = 10$, $\sum d^2 = 60$
Hence the rank correlation coefficient,
$\rho = 1 - \frac{6\sum d^2}{n(n^2-1)} = 1 - \frac{6\,(60)}{10\,(10^2-1)} = 1 - \frac{360}{990} = 1 - 0.3636 = 0.6364$
x 5 10 6 3 19 5 6 12 8 2 10 19
y 8 3 2 9 12 3 17 18 22 12 17 20
Solution: The rank correlation of x, y is given by $\rho = 1 - \frac{6\sum d^2}{n(n^2-1)}$
Consider,
x    y    Rank x   Rank y   d = Rank x - Rank y   d^2
5    8    9.5      9        0.5                   0.25
10   3    4.5      10.5     -6                    36
6    2    7.5      12       -4.5                  20.25
3    9    11       8        3                     9
19   12   1.5      6.5      -5                    25
5    3    9.5      10.5     -1                    1
6    17   7.5      4.5      3                     9
12   18   3        3        0                     0
8    22   6        1        5                     25
2    12   12       6.5      5.5                   30.25
10   17   4.5      4.5      0                     0
19   20   1.5      2        -0.5                  0.25
$\sum d^2 = 156$
Here, in the values of x, 19 is repeated 2 times, 10 is repeated 2 times, 6 is repeated 2 times, and 5 is
repeated 2 times
And in the values of y, 17 is repeated 2 times, 12 is repeated 2 times and 3 is repeated 2 times
Therefore the correction factor is given by
$\sum \frac{m(m^2-1)}{12} = 7 \times \frac{2(2^2-1)}{12} = \frac{7}{2} = 3.5$
Now, $n = 12$, and $\sum d^2$ is replaced by $\sum d^2 + \sum \frac{m(m^2-1)}{12} = 156 + 3.5 = 159.5$
Hence the rank correlation coefficient,
$\rho = 1 - \frac{6\sum d^2}{n(n^2-1)} = 1 - \frac{6\,(159.5)}{12\,(12^2-1)} = 1 - \frac{957}{1716} = 1 - 0.5577 = 0.4423$
Exercise:
x 8 3 9 2 7 10 4 6 1 5
y 9 5 10 1 8 7 3 4 2 6
x 78 56 36 66 25 75 82 62
y 84 44 57 58 60 68 62 58
Ans: $n = 8$, $\sum d^2 = 28.5$, $\sum \frac{m(m^2-1)}{12} = \frac{1}{2}$ and $\rho = 0.655$
3. Determine the rank correlation coefficient for the following data
x 65 63 67 64 68 62 70 66 68 67 69 71
y 68 66 68 65 69 66 68 65 71 67 68 70
Ans: $n = 12$, $\sum d^2 = 72.5$, $\sum \frac{m(m^2-1)}{12} = 7$ and $\rho = 0.722$