Regression Analysis-Statistics Notes
iii) Coefficient of skewness
$Sk_p = \dfrac{\text{mean} - \text{mode}}{\text{standard deviation}}$
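As a rough illustration of the formula above, the sketch below computes the coefficient for a small list of values. The data are hypothetical and chosen only to show the arithmetic; the function name `pearson_skewness` is an assumption, and the population standard deviation is used.

```python
import statistics


def pearson_skewness(values):
    """(mean - mode) / standard deviation, using the population standard deviation."""
    mean = statistics.mean(values)
    mode = statistics.mode(values)   # assumes the data have a single clear mode
    sd = statistics.pstdev(values)   # population standard deviation
    return (mean - mode) / sd


# Hypothetical marks, not taken from the notes.
marks = [2, 3, 3, 4, 4, 4, 5, 9]
sk = pearson_skewness(marks)
print(round(sk, 3))
```

A positive result means the mean lies above the mode (a longer right tail); a negative result means the opposite.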
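Kurtosis
The moment coefficient of kurtosis, $\beta_2$, is given by
$\beta_2 = \dfrac{m_4}{m_2^2}$
where $m_2$ and $m_4$ are the second and fourth moments about the mean.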
$M_1 = \dfrac{1}{n}\sum_{i=1}^{n}(x_i - 0)^1 = \dfrac{1}{n}\sum_{i=1}^{n} x_i = \bar{x}$
Marks (x)       0   1   2   3   4   5   6   7   8   9   10
Frequency (f)   1   2   3   5   6   7   4   3   2   1   0
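Solution
$\beta_2 = \dfrac{m_4}{m_2^2}$
Ungrouped data
$m_4 = \dfrac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^4$
$m_2 = \dfrac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$
Grouped data
$m_4 = \dfrac{1}{n}\sum_{i=1}^{n} f_i (x_i - \bar{x})^4$
$m_2 = \dfrac{1}{n}\sum_{i=1}^{n} f_i (x_i - \bar{x})^2$
where, for grouped data, $n = \sum f_i$.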
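$\bar{x} = \dfrac{\sum fx}{\sum f} = \dfrac{152}{34} = 4.47$

In the table below, $d = x - \bar{x}$.

Marks (x)   f    fx    d       d^2     d^4      f d^2    f d^4
0           1    0     -4.47   19.98   399.24   19.98    399.24
1           2    2     -3.47   12.04   144.98   24.08    289.97
2           3    6     -2.47   6.10    37.22    18.30    111.66
3           5    15    -1.47   2.16    4.67     10.80    23.35
4           6    24    -0.47   0.22    0.05     1.33     0.29
5           7    35    0.53    0.28    0.08     1.97     0.55
6           4    24    1.53    2.34    5.48     9.36     21.92
7           3    21    2.53    6.40    40.97    19.20    122.91
8           2    16    3.53    12.46   155.27   24.92    310.55
9           1    9     4.53    20.52   421.11   20.52    421.11
10          0    0     5.53    30.58   935.19   0.00     0.00
Total       34   152                            150.47   1701.55

(Totals are computed before rounding the individual entries.)

$m_4 = \dfrac{1}{n}\sum f_i (x_i - \bar{x})^4 = \dfrac{1}{34}(1701.55) = 50.05$
$m_2 = \dfrac{1}{n}\sum f_i (x_i - \bar{x})^2 = \dfrac{1}{34}(150.47) = 4.4256$
$\beta_2 = \dfrac{m_4}{m_2^2} = \dfrac{50.05}{(4.4256)^2} \approx 2.56$
Since $\beta_2 < 3$, the distribution is platykurtic (flatter than the normal curve, for which $\beta_2 = 3$).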
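The grouped-data formulas for $m_2$ and $m_4$ can be checked with a few lines of Python. This is a minimal sketch, assuming the marks run from 0 to 10 with frequencies 1, 2, 3, 5, 6, 7, 4, 3, 2, 1, 0 (the reading of the table that reproduces the totals f = 34 and fx = 152). The helper name `central_moment` is an assumption made for illustration.

```python
marks = list(range(11))                       # 0, 1, ..., 10
freqs = [1, 2, 3, 5, 6, 7, 4, 3, 2, 1, 0]     # assumed reading of the table


def central_moment(xs, fs, r):
    """r-th moment about the mean for a frequency table."""
    n = sum(fs)
    mean = sum(f * x for x, f in zip(xs, fs)) / n
    return sum(f * (x - mean) ** r for x, f in zip(xs, fs)) / n


m2 = central_moment(marks, freqs, 2)
m4 = central_moment(marks, freqs, 4)
beta2 = m4 / m2 ** 2                          # moment coefficient of kurtosis
print(f"m2 = {m2:.3f}, m4 = {m4:.3f}, beta2 = {beta2:.3f}")
```

The script works with the unrounded mean, so its figures may differ in the last decimal place from a hand calculation that rounds the mean to 4.47. A value of beta2 below 3 points to a platykurtic distribution.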
Q2. The first four central moments of a distribution are 0, 2.5, 0.7 and 18.75. Comment on the
skewness and kurtosis of the distribution.
Q3. The daily expenditure in $ of 100 families is given below.
Daily expenditure     0-20   20-40   40-60   60-80   80-100
Number of families    13     f2      27      f4      16
If the mode of the distribution is 44, find the missing frequencies and hence calculate Karl
Pearson's coefficient of skewness.
Self-practice questions.
Q1.
Q2.
Suppose 80 per cent of the material received from vendor A is of exceptional quality,
while only 50 per cent of the material received from vendor B is of exceptional
quality. However, the manufacturing capacity of vendor A is limited, and for this
reason only 40 per cent of the material purchased comes from vendor A. The other 60
per cent comes from vendor B. An incoming shipment of material is inspected and
found to be of exceptional quality. What is the probability that it came from vendor A?
REGRESSION ANALYSIS
Definition: Regression is the measure of the average relationship between two or more variables
in terms of the original units of the data.
Uses/importance of regression
i) It provides an estimate of the value of the dependent variable from the value of the independent variable.
ii) It is used to obtain a measure of the error involved in using the regression line for estimation.
iii) With the help of regression analysis we can obtain a measure of the degree of association or
correlation that exists between two variables.
Regression equation
4 NELSON KIPRONO BII Statistics Class notes
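i) Regression of y on x: $y = f(x)$
x is the independent variable and y is the dependent variable.
ii) Regression of x on y: $x = f(y)$
y is the independent variable and x is the dependent variable.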
Note: In regression analysis the independent variable is also known as the regressor, predictor
or explanatory variable, and the dependent variable is called the regressed or explained variable.
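Method of least squares
Regression of y on x:
$y = a + bx$
where a and b are constants. To find them we solve the following normal equations, in which N is the number of pairs of observations:
$\sum y = Na + b\sum x$
$\sum xy = a\sum x + b\sum x^2$
Regression of x on y:
$x = a + by$
$\sum x = Na + b\sum y$
$\sum xy = a\sum y + b\sum y^2$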
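The two normal equations can be solved directly for a and b. The sketch below is one way to do this in Python; the function name `fit_line` is an assumption, and the data are the five (X, Y) pairs used in Example 1. Swapping the arguments gives the regression of x on y.

```python
def fit_line(xs, ys):
    """Solve the normal equations for the line ys = a + b * xs."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Eliminating a from the two normal equations gives b, then a follows.
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a = (sy - b * sx) / n
    return a, b


X = [1, 2, 3, 4, 5]
Y = [2, 5, 3, 8, 7]

a_yx, b_yx = fit_line(X, Y)   # regression of y on x
a_xy, b_xy = fit_line(Y, X)   # regression of x on y
print(f"y = {a_yx:.2f} + {b_yx:.2f}x")
print(f"x = {a_xy:.2f} + {b_xy:.2f}y")
```

The closed form for b is the result of eliminating a between the two equations, which is the same elimination carried out by hand in the worked examples.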
Example 1
Determine the regression equation of y on x from the following data.
X   1   2   3   4   5
Y   2   5   3   8   7
Solution
X    Y    X^2   Y^2   XY
1    2    1     4     2
2    5    4     25    10
3    3    9     9     9
4    8    16    64    32
5    7    25    49    35
$\sum X = 15$, $\sum Y = 25$, $\sum X^2 = 55$, $\sum Y^2 = 151$, $\sum XY = 88$
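(i) Regression of X on Y
$x = a + by$
$\sum x = Na + b\sum y$
$\sum xy = a\sum y + b\sum y^2$
Substituting the totals:
$15 = 5a + 25b$
$88 = 25a + 151b$
Multiplying the first equation by 5 and subtracting it from the second:
$75 = 25a + 125b$
$88 = 25a + 151b$
$13 = 26b$, so $b = 0.5$
$15 = 5a + 12.5$, so $2.5 = 5a$ and $a = 0.5$
Hence $X = 0.5 + 0.5Y$.
(ii) Regression of Y on X
$y = a + bx$
$\sum y = Na + b\sum x$
$\sum xy = a\sum x + b\sum x^2$
Substituting the totals:
$25 = 5a + 15b$
$88 = 15a + 55b$
Multiplying the first equation by 3 and subtracting it from the second:
$75 = 15a + 45b$
$88 = 15a + 55b$
$13 = 10b$, so $b = 1.3$
$25 = 5a + 19.5$, so $a = 1.1$
Hence $Y = 1.1 + 1.3X$.
Example 2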
A company is introducing a job evaluation scheme in which all jobs are graded by points for
skill, responsibility and so on. Monthly pay scales (in thousand Kenya shillings) are then
drawn up according to the number of points allocated. To
date the company has applied this scheme to 9 jobs:
Job      A     B     C      D     E     F     G     H     I
Points   5     25    7      19    10    12    15    28    16
Pay      3.0   5.0   3.25   6.5   5.5   5.6   6.0   7.2   6.1
(a) Fit the least squares regression line for linking pay scales to points
(b) Estimate the monthly pay for a job graded by 20 points
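Solution.
Let $d_x = x - 15$ and $d_y = y - 5$ be the deviations of the grade points and pay scales from the assumed means 15 and 5.

Grade point (x)   d_x    d_x^2   Pay scale (y)   d_y     d_y^2   d_x d_y
5                 -10    100     3.0             -2.0    4.00    20.0
25                10     100     5.0             0       0       0
7                 -8     64      3.25            -1.75   3.06    14.0
19                4      16      6.5             1.5     2.25    6.0
10                -5     25      5.5             0.5     0.25    -2.5
12                -3     9       5.6             0.6     0.36    -1.8
15                0      0       6.0             1.0     1.00    0
28                13     169     7.2             2.2     4.84    28.6
16                1      1       6.1             1.1     1.21    1.1
Total 137         2      484     48.15           3.15    16.97   65.40

(a)
$\bar{x} = 137/9 = 15.222$
$\bar{y} = 48.15/9 = 5.35$
$b_{yx} = \dfrac{n\sum d_x d_y - \sum d_x \sum d_y}{n\sum d_x^2 - (\sum d_x)^2} = \dfrac{9(65.40) - 2(3.15)}{9(484) - (2)^2} = \dfrac{582.3}{4352} = 0.1338$
Substituting in
$y - \bar{y} = b_{yx}(x - \bar{x})$
we have $y - 5.35 = 0.1338(x - 15.222)$, that is,
$y = 3.313 + 0.1338x$
(b) For a job graded at 20 points, the estimated average monthly pay is
$3.313 + 0.1338(20) = 5.989$ thousand Kenya shillings
= Kshs 5,989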
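The step-deviation working for Example 2 can be checked with a short script. This is a sketch only: the variable names and the assumed means 15 and 5 follow the working above, and the slope is left unrounded, so the intercept may differ slightly from a hand calculation that rounds b early.

```python
points = [5, 25, 7, 19, 10, 12, 15, 28, 16]
pay = [3.0, 5.0, 3.25, 6.5, 5.5, 5.6, 6.0, 7.2, 6.1]   # thousand Kenya shillings

A, B = 15, 5                          # assumed means used for the deviations
dx = [x - A for x in points]
dy = [y - B for y in pay]
n = len(points)

# Slope from deviations about the assumed means.
num = n * sum(u * v for u, v in zip(dx, dy)) - sum(dx) * sum(dy)
den = n * sum(u * u for u in dx) - sum(dx) ** 2
b = num / den

x_bar = sum(points) / n
y_bar = sum(pay) / n
a = y_bar - b * x_bar                 # intercept of y = a + b x

estimate = a + b * 20                 # estimated pay for a job graded at 20 points
print(f"b = {b:.4f}, a = {a:.3f}, pay at 20 points = {estimate:.3f}")
```

Shifting both variables by a constant does not change the slope, which is why the deviations from 15 and 5 can replace the raw values in the formula for b.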
Salesman   Test score   Weekly sales
1          40           2.5
2          70           6.0
3          50           4.0
4          60           5.0
5          80           4.0
6          50           2.5
7          90           5.5
8          40           3.0
9          60           4.5
10         60           3.0
Calculate the regression line of sales on test scores and estimate the probable weekly sales
volume if a salesman makes a score of 100.
Activity: Refer to the reading texts and read on correlation analysis, then answer the following questions.
Q1. Define the term correlation.
Q2. Explain the different types of correlation.
Q3. Explain areas of application for correlation analysis in business.
Q4. Using the self-practice question above, find the correlation between test score and sales,
and comment on your results.
Q5. The following data give the ages and blood pressure of 10 women.
Age              56    42    36    47    49    42    60    72    63    55
Blood pressure   147   125   118   128   145   140   155   160   149   150
(a) Find the correlation coefficient between age and blood pressure. (Answer = 0.892)
(b) Determine the least squares regression equation of blood pressure on age. (Answer: Y = 83.758 + 1.11x)
(c) Estimate the blood pressure of a woman whose age is 45 years. (Answer = 134)
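The quoted answers to Q5 can be checked numerically. The sketch below assumes the table reads ages 56, 42, 36, 47, 49, 42, 60, 72, 63, 55 against blood pressures 147, 125, 118, 128, 145, 140, 155, 160, 149, 150. The variable names are illustrative, and the correlation coefficient is the usual product-moment formula.

```python
from math import sqrt

age = [56, 42, 36, 47, 49, 42, 60, 72, 63, 55]
bp = [147, 125, 118, 128, 145, 140, 155, 160, 149, 150]   # assumed reading of the table

n = len(age)
sx, sy = sum(age), sum(bp)
sxx = sum(x * x for x in age)
syy = sum(y * y for y in bp)
sxy = sum(x * y for x, y in zip(age, bp))

# (a) product-moment correlation coefficient
r = (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

# (b) regression of blood pressure (y) on age (x): y = a + b x
b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
a = (sy - b * sx) / n

# (c) estimated blood pressure at age 45
bp_45 = a + b * 45
print(f"r = {r:.3f}, y = {a:.3f} + {b:.3f}x, estimate at 45 = {bp_45:.0f}")
```

Under this reading of the data the script agrees with the stated answers to within rounding; a small difference in the third decimal of the intercept comes from when the slope is rounded.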