BT Stat Unit 2


UNIT-II

SKEWNESS, KURTOSIS, CORRELATION, REGRESSION

(i) Skewness: In a perfectly symmetrical distribution the mean, median and
mode coincide. Skewness measures the lack of symmetry of a statistical
distribution. If a distribution is not symmetrical, we say that it is skewed.

(ii) Kurtosis: Kurtosis is a measure of the flatness or peakedness of a
distribution.

(iii) Pearson's coefficient of skewness = $\dfrac{\text{Mean} - \text{Mode}}{\text{S.D.}}$

(iv) When the mode is not well defined,
Pearson's coefficient of skewness = $\dfrac{3(\text{Mean} - \text{Median})}{\text{S.D.}}$

(v) Bowley's formula for measuring skewness:
Bowley's coefficient of skewness = $\dfrac{Q_3 + Q_1 - 2\,\text{Median}}{Q_3 - Q_1}$
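
The three coefficients can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this note, and the formulas assumed are (Mean - Mode)/S.D., 3(Mean - Median)/S.D. and (Q3 + Q1 - 2 Median)/(Q3 - Q1).

```python
def pearson_skew_mode(mean, mode, sd):
    """Pearson's coefficient using the mode: (mean - mode) / sd."""
    return (mean - mode) / sd


def pearson_skew_median(mean, median, sd):
    """Pearson's coefficient when the mode is ill defined: 3(mean - median) / sd."""
    return 3 * (mean - median) / sd


def bowley_skew(q1, q3, median):
    """Bowley's quartile coefficient: (q3 + q1 - 2*median) / (q3 - q1)."""
    return (q3 + q1 - 2 * median) / (q3 - q1)


# Quartiles chosen so that Q3 + Q1 = 78.2 and Q3 - Q1 = 14.3, with median 35.7.
print(round(bowley_skew(31.95, 46.25, 35.7), 4))  # 0.4755
```

Bowley's coefficient needs only the quartiles and the median, so it can be used even when the mean and standard deviation are unavailable.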

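1. In a distribution, mean = 65, median = 70 and the coefficient of skewness
is -0.6. Find the coefficient of variation.
Solution:
Coefficient of skewness = $\dfrac{3(\text{Mean} - \text{Median})}{\sigma}$
$-0.6 = \dfrac{3(65 - 70)}{\sigma}$
$\sigma = \dfrac{-15}{-0.6} = 25$
Coefficient of variation = $\dfrac{\sigma}{\text{Mean}} \times 100 = \dfrac{25}{65} \times 100 = 38.46\%$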
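
The same two steps (solve the skewness formula for the standard deviation, then divide by the mean) can be checked with a short Python sketch. The helper names are hypothetical, and the median-based Pearson formula is assumed.

```python
def sd_from_skewness(mean, median, sk):
    """Solve sk = 3 * (mean - median) / sd for sd."""
    return 3 * (mean - median) / sk


def coefficient_of_variation(mean, sd):
    """Standard deviation expressed as a percentage of the mean."""
    return sd / mean * 100


sd = sd_from_skewness(65, 70, -0.6)
print(round(sd, 2), round(coefficient_of_variation(65, sd), 2))  # 25.0 38.46
```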
2. In a distribution the sum of the two quartiles is 78.2, their difference is
14.3 and the median is 35.7. Find the coefficient of skewness.
Solution:
Given $Q_3 + Q_1 = 78.2$
$Q_3 - Q_1 = 14.3$
Median M = 35.7
Coefficient of skewness = $\dfrac{Q_3 + Q_1 - 2M}{Q_3 - Q_1} = \dfrac{78.2 - 2(35.7)}{14.3} = \dfrac{6.8}{14.3} = 0.4755$
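3. Pearson's coefficient of skewness is -0.7, and the values of the median
and standard deviation are 12.8 and 6 respectively. Estimate the value of the
mean.
Solution:
Pearson's coefficient of skewness = -0.7, Median = 12.8, S.D. = 6
$S_k = \dfrac{3(\text{Mean} - \text{Median})}{\text{S.D.}}$
$-0.7 = \dfrac{3(\text{Mean} - \text{Median})}{6}$, so -1.4 = Mean - Median
-1.4 = Mean - 12.8, so Mean = 12.8 - 1.4
Mean = 11.4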
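4. In a frequency distribution, the coefficient of skewness based upon
quartiles is 0.6. If the sum of the upper and lower quartiles is 100 and the
median is 38, estimate the value of the upper quartile.
Solution:
$S_k = 0.6$, $Q_3 + Q_1 = 100$, M = 38
$S_k = \dfrac{Q_3 + Q_1 - 2M}{Q_3 - Q_1}$
$0.6 = \dfrac{100 - 2(38)}{Q_3 - Q_1} = \dfrac{24}{Q_3 - Q_1}$
$Q_3 - Q_1 = 40$ ---(1), $Q_3 + Q_1 = 100$ ---(2)
Adding (1) and (2): $2Q_3 = 140$, so $Q_3 = 70$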
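5. Find the coefficient of skewness if the difference between the two
quartiles is 8, the sum of the two quartiles is 22 and the median is 10.5.
Solution:
Given $Q_3 + Q_1 = 22$, $Q_3 - Q_1 = 8$, M = 10.5
$S_k = \dfrac{Q_3 + Q_1 - 2M}{Q_3 - Q_1} = \dfrac{22 - 2(10.5)}{8} = \dfrac{1}{8} = 0.125$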
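6. Calculate the coefficient of variation if Karl Pearson's coefficient of
skewness is 0.42, the mean is 86 and the median is 80.
Solution:
Given Pearson's coefficient of skewness = 0.42, Mean = 86, Median = 80.
$S_k = \dfrac{3(\text{Mean} - \text{Median})}{\sigma}$
$0.42 = \dfrac{3(86 - 80)}{\sigma}$ => $\sigma = \dfrac{18}{0.42} = 42.857$
Coefficient of variation = $\dfrac{\sigma}{\text{Mean}} \times 100 = \dfrac{42.857}{86} \times 100 = 49.83\%$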
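7. The first four central moments of a distribution are 0, 2.5, 0.7 and
18.75. Write the skewness and kurtosis of the distribution.
Solution:
The coefficient of skewness is given by
$\beta_1 = \dfrac{\mu_3^2}{\mu_2^3} = \dfrac{(0.7)^2}{(2.5)^3} = \dfrac{0.49}{15.625} = 0.0314$
Since $\mu_3$ is positive, the distribution is positively skewed.
The measure of kurtosis is given by
$\beta_2 = \dfrac{\mu_4}{\mu_2^2} = \dfrac{18.75}{(2.5)^2} = \dfrac{18.75}{6.25} = 3$
Since $\beta_2 = 3$, the distribution is mesokurtic (it has the same kurtosis
as the normal distribution).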
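
As a quick check, the moment-based coefficients can be sketched in Python. The function name is illustrative, and the example assumes a fourth central moment of 18.75, which is the value consistent with the result $\beta_2 = 3$ quoted in the solution.

```python
def beta_coefficients(mu2, mu3, mu4):
    """Return (beta1, beta2) from the 2nd, 3rd and 4th central moments."""
    beta1 = mu3 ** 2 / mu2 ** 3   # skewness measure
    beta2 = mu4 / mu2 ** 2        # kurtosis measure
    return beta1, beta2


b1, b2 = beta_coefficients(2.5, 0.7, 18.75)  # mu4 = 18.75 is an assumption
print(round(b1, 4), b2)  # 0.0314 3.0
```

Note that $\beta_1$ is never negative, so the direction of skewness is read from the sign of $\mu_3$.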
8. Karl Pearson's coefficient of skewness of a distribution is 0.32, its
standard deviation is 6.5 and the mean is 29.6. Calculate the mode and
the median. (L3)
Solution:
$S_k = 0.32$, $\sigma = 6.5$, Mean = 29.6
$S_k = \dfrac{3(\text{Mean} - \text{Median})}{\sigma}$ => $0.32 = \dfrac{3(29.6 - \text{Median})}{6.5}$
=> 0.32 x 6.5 = 88.8 - 3 Median
=> 3 Median = 88.8 - 2.08 = 86.72
Median = 86.72/3 = 28.91
Mean - Mode = 3(Mean - Median)
29.6 - Mode = 3(29.6 - 28.9067)
= 2.08
Mode = 29.6 - 2.08 = 27.52

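9. Compute the first four central moments for the following data: 8,
10, 11, 12, 14. (L3)
Solution:
$\bar{x} = \dfrac{\sum x}{n} = \dfrac{55}{5} = 11$

$x$      $x-\bar{x}$   $(x-\bar{x})^2$   $(x-\bar{x})^3$   $(x-\bar{x})^4$
8        -3            9                 -27               81
10       -1            1                 -1                1
11        0            0                  0                0
12        1            1                  1                1
14        3            9                  27               81
Total 55  0            20                 0                164

The four central moments are
$\mu_1 = \dfrac{\sum (x-\bar{x})}{n} = 0$, $\mu_2 = \dfrac{\sum (x-\bar{x})^2}{n} = \dfrac{20}{5} = 4$, $\mu_3 = \dfrac{\sum (x-\bar{x})^3}{n} = \dfrac{0}{5} = 0$
$\mu_4 = \dfrac{\sum (x-\bar{x})^4}{n} = \dfrac{164}{5} = 32.8$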
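
The definition $\mu_k = \sum (x-\bar{x})^k / n$ can be applied mechanically. The following Python sketch (the function name is an assumption made for illustration) computes the first four central moments of a raw data list.

```python
def central_moments(data, max_order=4):
    """Central moments mu_1 .. mu_max_order of ungrouped data."""
    n = len(data)
    mean = sum(data) / n
    return [sum((x - mean) ** k for x in data) / n
            for k in range(1, max_order + 1)]


print(central_moments([8, 10, 11, 12, 14]))
```

For data that are symmetric about the mean, as here, every odd-order central moment is zero because positive and negative deviations cancel.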
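10. The first three moments of a distribution about the value 3 are 2, 10
and -30. Find the values of $\mu_2$ and $\mu_3$. (L1)
Solution:
About the value x = 3: $\mu_1' = 2$, $\mu_2' = 10$, $\mu_3' = -30$
$\mu_2 = \mu_2' - (\mu_1')^2 = 10 - 4 = 6$
$\mu_3 = \mu_3' - 3\mu_2'\mu_1' + 2(\mu_1')^3$
$= -30 - 3(10)(2) + 2(2)^3$
$= -30 - 60 + 16 = -74$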
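
The conversion from moments about an arbitrary point to central moments follows the identities $\mu_2 = \mu_2' - (\mu_1')^2$ and $\mu_3 = \mu_3' - 3\mu_2'\mu_1' + 2(\mu_1')^3$. A minimal Python sketch, with a function name chosen only for illustration:

```python
def raw_to_central(m1, m2, m3):
    """Convert the first three moments about any point A into mu_2 and mu_3."""
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
    return mu2, mu3


print(raw_to_central(2, 10, -30))  # (6, -74)
```

The reference point A = 3 does not enter the conversion; it only shifts the mean, which is A + 2 = 5 here.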

Pearson's coefficient of skewness = $\dfrac{\text{Mean} - \text{Mode}}{\text{S.D.}}$

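1. Calculate Karl Pearson's coefficient of skewness. (L3)

Marks               0-10   10-20   20-30   30-40   40-50   50-60   60-70
No. of candidates   10     15      24      25      10      10      6

Solution:

Marks    Mid value   f     d     fd     $fd^2$
0-10     5           10    -3    -30    90
10-20    15          15    -2    -30    60
20-30    25          24    -1    -24    24
30-40    35          25     0     0     0
40-50    45          10     1     10    10
50-60    55          10     2     20    40
60-70    65          6      3     18    54
Total                100          -36   278

A = 35, c = 10, $d = \dfrac{x - A}{c}$
$\bar{x} = A + c\,\dfrac{\sum fd}{N} = 35 + 10 \times \dfrac{-36}{100} = 31.4$

The modal class is 30-40, with $f_1 = 25$, $f_0 = 24$, $f_2 = 10$.
Mode = $l + \dfrac{f_1 - f_0}{2f_1 - f_0 - f_2} \times c$
= $30 + \dfrac{25 - 24}{50 - 24 - 10} \times 10$ = 30.625

$\sigma = c\sqrt{\dfrac{\sum fd^2}{N} - \left(\dfrac{\sum fd}{N}\right)^2}$
= $10\sqrt{2.78 - (-0.36)^2} = 10\sqrt{2.6504}$ = 16.28
Coefficient of skewness = $\dfrac{\text{Mean} - \text{Mode}}{\sigma} = \dfrac{31.4 - 30.625}{16.28}$ = 0.0476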
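
The grouped-data procedure (mean and standard deviation from mid values, mode by interpolation inside the modal class) can be sketched in Python as follows. The function name and argument layout are assumptions for illustration; equal class widths and a single modal class are assumed, and the mode formula used is $l + \dfrac{f_1 - f_0}{2f_1 - f_0 - f_2} \times c$.

```python
from math import sqrt


def grouped_pearson_skewness(lower_bounds, width, freqs):
    """Return (mean, mode, sd, skewness) for equal-width classes."""
    n = sum(freqs)
    mids = [lo + width / 2 for lo in lower_bounds]
    mean = sum(f * x for f, x in zip(freqs, mids)) / n
    sd = sqrt(sum(f * (x - mean) ** 2 for f, x in zip(freqs, mids)) / n)
    i = freqs.index(max(freqs))                    # modal class
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0              # class before the modal class
    f2 = freqs[i + 1] if i + 1 < len(freqs) else 0 # class after the modal class
    mode = lower_bounds[i] + (f1 - f0) / (2 * f1 - f0 - f2) * width
    return mean, mode, sd, (mean - mode) / sd


mean, mode, sd, sk = grouped_pearson_skewness(
    [0, 10, 20, 30, 40, 50, 60], 10, [10, 15, 24, 25, 10, 10, 6])
print(round(mean, 3), round(mode, 3), round(sd, 2), round(sk, 4))
```

Computing the standard deviation directly from the mid values gives the same result as the step-deviation shortcut, since the shortcut is only a change of origin and scale.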

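2. Calculate Pearson's coefficient of skewness for the following data. (L3)
Class       10-19   20-29   30-39   40-49   50-59   60-69   70-79   80-89
Frequency   5       9       14      20      25      15      8       4

Solution:
Class       Mid value   f     d     fd     $fd^2$
9.5-19.5    14.5        5     -3    -15    45
19.5-29.5   24.5        9     -2    -18    36
29.5-39.5   34.5        14    -1    -14    14
39.5-49.5   44.5        20     0     0     0
49.5-59.5   54.5        25     1     25    25
59.5-69.5   64.5        15     2     30    60
69.5-79.5   74.5        8      3     24    72
79.5-89.5   84.5        4      4     16    64
Total                   100          48    316

Let A = 44.5, c = 10, $d = \dfrac{x - A}{c}$

Mean = $A + c\,\dfrac{\sum fd}{N} = 44.5 + 10 \times \dfrac{48}{100} = 49.3$

$\sigma = c\sqrt{\dfrac{\sum fd^2}{N} - \left(\dfrac{\sum fd}{N}\right)^2}$
= $10\sqrt{3.16 - (0.48)^2}$
= $10\sqrt{2.9296}$
= 17.12
The modal class is 49.5-59.5, with $f_1 = 25$, $f_0 = 20$, $f_2 = 15$.
Mode = $l + \dfrac{f_1 - f_0}{2f_1 - f_0 - f_2} \times c$
= $49.5 + \dfrac{25 - 20}{50 - 20 - 15} \times 10$
= 49.5 + 3.33 = 52.83
Pearson's coefficient of skewness = $\dfrac{\text{Mean} - \text{Mode}}{\sigma} = \dfrac{49.3 - 52.83}{17.12} = -0.206$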

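3. Calculate Pearson's coefficient of skewness for the following data. (L3)
Class       3-7   8-12   13-17   18-22   23-27   28-32   33-37   38-42
Frequency   2     108    580     175     80      32      18      5

Solution:

Class       Mid value   f      d     fd      $fd^2$
2.5-7.5     5           2      -3    -6      18
7.5-12.5    10          108    -2    -216    432
12.5-17.5   15          580    -1    -580    580
17.5-22.5   20          175     0     0      0
22.5-27.5   25          80      1     80     80
27.5-32.5   30          32      2     64     128
32.5-37.5   35          18      3     54     162
37.5-42.5   40          5       4     20     80
Total                   1000          -584   1480

A = 20, c = 5, $d = \dfrac{x - A}{c}$

Mean = $A + c\,\dfrac{\sum fd}{N} = 20 + 5 \times \dfrac{-584}{1000} = 17.08$

The modal class is 12.5-17.5, with $f_1 = 580$, $f_0 = 108$, $f_2 = 175$.
Mode = $l + \dfrac{f_1 - f_0}{2f_1 - f_0 - f_2} \times c$
= $12.5 + \dfrac{580 - 108}{1160 - 108 - 175} \times 5$
= 12.5 + 2.69 = 15.19

$\sigma = c\sqrt{\dfrac{\sum fd^2}{N} - \left(\dfrac{\sum fd}{N}\right)^2}$
= $5\sqrt{1.48 - (-0.584)^2} = 5\sqrt{1.1389}$ = 5.34
Pearson's coefficient of skewness = $\dfrac{\text{Mean} - \text{Mode}}{\sigma} = \dfrac{17.08 - 15.19}{5.34} = 0.354$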

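4. Calculate Pearson's coefficient of skewness for the following data. (L3)

Size        7    8    9    10   11   12   13   14
Frequency   2    11   36   64   39   30   22   2

Solution:
This is discrete data.
The maximum frequency corresponds to x = 10.
x     f     d     fd    $fd^2$
7     2     -3    -6    18
8     11    -2    -22   44
9     36    -1    -36   36
10    64     0     0    0
11    39     1     39   39
12    30     2     60   120
13    22     3     66   198
14    2      4     8    32
Total 206          109  487

Mode = 10. Let A = 10, d = x - 10.

Mean = $A + \dfrac{\sum fd}{N} = 10 + \dfrac{109}{206} = 10.529$

$\sigma = \sqrt{\dfrac{\sum fd^2}{N} - \left(\dfrac{\sum fd}{N}\right)^2} = \sqrt{2.3641 - (0.5291)^2} = \sqrt{2.0841} = 1.444$

Pearson's coefficient of skewness = $\dfrac{\text{Mean} - \text{Mode}}{\sigma} = \dfrac{10.529 - 10}{1.444} = 0.367$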

Bowley's coefficient of skewness = $\dfrac{Q_3 + Q_1 - 2\,\text{Median}}{Q_3 - Q_1}$

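5. Calculate Bowley's coefficient of skewness for the following data. (L4)

Weight (in kg), more than   40    50    60    70    80    90
No. of persons              185   167   132   82    38    12
Solution:
More than   No. of persons   Class          f     cf
40          185              40-50          18    18
50          167              50-60          35    53
60          132              60-70          50    103
70          82               70-80          44    147
80          38               80-90          26    173
90          12               90 and above   12    185

N = 185, so N/4 = 46.25, N/2 = 92.5 and 3N/4 = 138.75.
Median = $l + \dfrac{N/2 - cf}{f} \times c = 60 + \dfrac{92.5 - 53}{50} \times 10 = 67.9$

$Q_1 = 50 + \dfrac{46.25 - 18}{35} \times 10 = 58.07$

$Q_3 = 70 + \dfrac{138.75 - 103}{44} \times 10 = 78.125$

Bowley's coefficient of skewness = $\dfrac{Q_3 + Q_1 - 2\,\text{Median}}{Q_3 - Q_1} = \dfrac{78.125 + 58.07 - 2(67.9)}{78.125 - 58.07} = \dfrac{0.395}{20.055} = 0.0197$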
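
The quartiles of a grouped distribution are found by locating the class in which the cumulative frequency first reaches pN and interpolating inside it. A minimal Python sketch of that step, with a hypothetical function name, assuming equal class widths (the open class "90 and above" is treated as width 10, which does not affect the quartiles here):

```python
def grouped_quantile(lower_bounds, width, freqs, p):
    """Interpolated quantile: l + (p*N - cf_before) / f * width."""
    n = sum(freqs)
    target = p * n
    cum = 0  # cumulative frequency before the current class
    for lo, f in zip(lower_bounds, freqs):
        if f > 0 and cum + f >= target:
            return lo + (target - cum) / f * width
        cum += f
    raise ValueError("p must lie in (0, 1]")


lows = [40, 50, 60, 70, 80, 90]
freqs = [18, 35, 50, 44, 26, 12]
q1, med, q3 = (grouped_quantile(lows, 10, freqs, p) for p in (0.25, 0.5, 0.75))
bowley = (q3 + q1 - 2 * med) / (q3 - q1)
print(round(q1, 2), round(med, 2), round(q3, 3))  # 58.07 67.9 78.125
```

Because the numerator Q3 + Q1 - 2 Median cannot exceed Q3 - Q1 in absolute value, Bowley's coefficient always lies between -1 and 1.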

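MOMENTS
For a frequency distribution with $N = \sum f$, the first four central moments are
$\mu_1 = \dfrac{\sum f(x-\bar{x})}{N}$

$\mu_2 = \dfrac{\sum f(x-\bar{x})^2}{N}$

$\mu_3 = \dfrac{\sum f(x-\bar{x})^3}{N}$

$\mu_4 = \dfrac{\sum f(x-\bar{x})^4}{N}$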

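6. Calculate the first four central moments for the following frequency
distribution. (L3)
X   0   1   2    3    4    5    6    7   8
F   1   8   28   56   70   56   28   8   1
Solution:
x     f     $d = x-\bar{x}$   $fd$   $fd^2$   $fd^3$   $fd^4$
0     1     -4                -4     16       -64      256
1     8     -3                -24    72       -216     648
2     28    -2                -56    112      -224     448
3     56    -1                -56    56       -56      56
4     70     0                 0     0         0       0
5     56     1                 56    56        56      56
6     28     2                 56    112       224     448
7     8      3                 24    72        216     648
8     1      4                 4     16        64      256
Total 256                      0     512       0       2816
$\bar{x} = \dfrac{\sum fx}{N} = \dfrac{1024}{256} = 4$
$\mu_1 = \dfrac{\sum f(x-\bar{x})}{N} = \dfrac{0}{256} = 0$
$\mu_2 = \dfrac{\sum f(x-\bar{x})^2}{N} = \dfrac{512}{256} = 2$
$\mu_3 = \dfrac{\sum f(x-\bar{x})^3}{N} = \dfrac{0}{256} = 0$
$\mu_4 = \dfrac{\sum f(x-\bar{x})^4}{N} = \dfrac{2816}{256} = 11$
Here $\mu_3 = 0$ since the distribution is symmetrical.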
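
For a frequency distribution each deviation is weighted by its frequency, $\mu_k = \sum f(x-\bar{x})^k / N$. The Python sketch below (function name assumed for illustration) applies that definition directly.

```python
def freq_central_moments(values, freqs, max_order=4):
    """Return (mean, [mu_1, ..., mu_max_order]) for a frequency table."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    moments = [sum(f * (x - mean) ** k for x, f in zip(values, freqs)) / n
               for k in range(1, max_order + 1)]
    return mean, moments


mean, mu = freq_central_moments(range(9), [1, 8, 28, 56, 70, 56, 28, 8, 1])
print(mean, mu)
```

The frequencies in this problem are the binomial coefficients of order 8, which is why the table is symmetric about x = 4.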

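7. Calculate the first four central moments for the following cumulative
frequency distribution. (L4)
Marks less than        80    70   60   50   40   30   20   10
Cumulative frequency   100   90   80   60   32   20   13   5

Solution:

Marks    Mid value   f     d     $fd$   $fd^2$   $fd^3$   $fd^4$
0-10     5           5     -4    -20    80       -320     1280
10-20    15          8     -3    -24    72       -216     648
20-30    25          7     -2    -14    28       -56      112
30-40    35          12    -1    -12    12       -12      12
40-50    45          28     0     0     0         0       0
50-60    55          20     1     20    20        20      20
60-70    65          10     2     20    40        80      160
70-80    75          10     3     30    90        270     810
Total                100          0     342       -234    3042

Let $d = \dfrac{x - 45}{c}$, c = 10.
Since $\sum fd = 0$, the mean is $\bar{x} = 45$ and the moments about 45 are the central moments:
$\mu_1 = 0$
$\mu_2 = c^2\,\dfrac{\sum fd^2}{N} = 100 \times 3.42 = 342$
$\mu_3 = c^3\,\dfrac{\sum fd^3}{N} = 1000 \times (-2.34) = -2340$
$\mu_4 = c^4\,\dfrac{\sum fd^4}{N} = 10000 \times 30.42 = 304200$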


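8. Calculate the moment measure of kurtosis from the following data. (L4)

X   2   4    6    8    10   12   14
f   4   11   18   27   20   16   8
Solution:

x      f     d     $fd$   $fd^2$   $fd^3$   $fd^4$
2      4     -3    -12    36       -108     324
4      11    -2    -22    44       -88      176
6      18    -1    -18    18       -18      18
8      27     0     0     0         0       0
10     20     1     20    20        20      20
12     16     2     32    64        128     256
14     8      3     24    72        216     648
TOTAL  104          24    254       150     1442

Here $d = \dfrac{x - 8}{2}$, so c = 2. The moments about A = 8 are
$\mu_1' = c\,\dfrac{\sum fd}{N} = 0.46$, $\mu_2' = c^2\,\dfrac{\sum fd^2}{N} = 9.77$, $\mu_3' = c^3\,\dfrac{\sum fd^3}{N} = 11.53$, $\mu_4' = c^4\,\dfrac{\sum fd^4}{N} = 221.84$

$\mu_2 = \mu_2' - (\mu_1')^2 = 9.77 - (0.46)^2 = 9.5584$

$\mu_3 = \mu_3' - 3\mu_2'\mu_1' + 2(\mu_1')^3$ = 11.53 - 3(9.77)(0.46) + 2(0.46)^3
= 11.53 - 13.4826 + 0.1947 = -1.7579

$\mu_4 = \mu_4' - 4\mu_3'\mu_1' + 6\mu_2'(\mu_1')^2 - 3(\mu_1')^4$
= 221.84 - 21.2152 + 12.404 - 0.1343 = 212.89

Measure of kurtosis based on moments:
$\beta_2 = \dfrac{\mu_4}{\mu_2^2} = \dfrac{212.89}{(9.5584)^2}$ = 2.33
Since $\beta_2 < 3$, the distribution is platykurtic.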
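
The step-deviation route to $\beta_2$ can be sketched in Python. The function name and arguments are illustrative; the sketch assumes the frequency at x = 6 is 18, as used in the solution table, and relies on the identities $\mu_2 = \mu_2' - (\mu_1')^2$ and $\mu_4 = \mu_4' - 4\mu_3'\mu_1' + 6\mu_2'(\mu_1')^2 - 3(\mu_1')^4$.

```python
def moment_kurtosis(values, freqs, origin, step):
    """beta_2 = mu_4 / mu_2**2 computed from step deviations d = (x - origin) / step."""
    n = sum(freqs)
    d = [(x - origin) / step for x in values]
    # moments of d about the assumed origin
    m1, m2, m3, m4 = (sum(f * di ** k for f, di in zip(freqs, d)) / n
                      for k in (1, 2, 3, 4))
    mu2 = m2 - m1 ** 2
    mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4
    return mu4 / mu2 ** 2  # the step size cancels in this ratio


b2 = moment_kurtosis([2, 4, 6, 8, 10, 12, 14], [4, 11, 18, 27, 20, 16, 8], 8, 2)
print(round(b2, 2))  # 2.33
```

Since $\beta_2$ is a ratio of a fourth moment to a squared second moment, the factor $c^4$ appears in both and need not be restored.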
CORRELATION
Correlation: Let X and Y be two random variables. Correlation is a measure
of the co-variability of X and Y that takes the variances of X and Y into
account.
Correlation coefficient:
Let X and Y be two random variables. The correlation coefficient, denoted by
$r(X,Y)$, is defined by
$r(X,Y) = \dfrac{E(XY) - E(X)E(Y)}{\sqrt{\mathrm{Var}(X)}\,\sqrt{\mathrm{Var}(Y)}}$

Types of correlation:
(i) Positive and negative
(ii) Simple, partial and multiple
(iii) Linear and non-linear

Regression:
Regression is a mathematical measure of the average relationship between two
or more variables in terms of the original units of the data.
Lines of regression:
The line of regression of y on x is given by
$y - \bar{y} = r\,\dfrac{\sigma_y}{\sigma_x}(x - \bar{x})$
The line of regression of x on y is given by
$x - \bar{x} = r\,\dfrac{\sigma_x}{\sigma_y}(y - \bar{y})$
Regression coefficients:
The slopes $b_{yx} = r\,\dfrac{\sigma_y}{\sigma_x}$ and $b_{xy} = r\,\dfrac{\sigma_x}{\sigma_y}$ are the regression coefficients,
so that $r^2 = b_{yx}\,b_{xy}$ and r takes their common sign.
Covariance:
A measure of association between two random variables, obtained as the
expected value of the product of the deviations of the two random variables
from their means; that is,
Cov(X,Y) = E(XY) - E(X)E(Y)

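1. If the two regression coefficients are 0.8 and 0.6, find the coefficient
of correlation. (L1)
Solution:
Given $b_{yx} = 0.8$, $b_{xy} = 0.6$
$r^2 = b_{yx}\,b_{xy} = (0.8)(0.6) = 0.48$
$r = \sqrt{0.48} = 0.693$ (positive, since both regression coefficients are positive)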
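
The relation $r^2 = b_{yx}\,b_{xy}$, with r taking the common sign of the two regression coefficients, can be sketched in Python. The function name is an assumption made for illustration.

```python
from math import copysign, sqrt


def r_from_regression_coefficients(b_yx, b_xy):
    """Correlation coefficient as the signed geometric mean of b_yx and b_xy."""
    if b_yx * b_xy < 0:
        raise ValueError("regression coefficients must have the same sign")
    return copysign(sqrt(b_yx * b_xy), b_yx)


print(round(r_from_regression_coefficients(0.8, 0.6), 4))  # 0.6928
```

The sign check matters: both coefficients share the sign of the covariance, so a pair with opposite signs cannot come from one data set.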

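2. The two regression equations of the variables X and Y are


Find the correlation coefficient between X and Y. (L1)
Solution:
Given that the regression equations of X and Y are
X = 19.13 - 0.87Y
The regression coefficient of X on Y is $b_{XY} = -0.87$.
The regression equation of Y on X is

The regression coefficient of Y on X is


The correlation coefficient between X and Y is given by
$r = \pm\sqrt{b_{XY}\,b_{YX}}$, taking the common sign of the two regression
coefficients (negative here, since $b_{XY} = -0.87$).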
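3. Calculate the coefficient of correlation between x and y from the following
data. (L3)
x   1   3   5   8    9    10
y   3   4   8   10   12   11
Solution:

x     y     $x-\bar{x}$   $y-\bar{y}$   $(x-\bar{x})^2$   $(y-\bar{y})^2$   $(x-\bar{x})(y-\bar{y})$
1     3     -5            -5            25                25                25
3     4     -3            -4            9                 16                12
5     8     -1             0            1                 0                 0
8     10     2             2            4                 4                 4
9     12     3             4            9                 16                12
10    11     4             3            16                9                 12
36    48     0             0            64                70                65

$\bar{x} = \dfrac{\sum x}{n} = \dfrac{36}{6} = 6$, $\bar{y} = \dfrac{\sum y}{n} = \dfrac{48}{6} = 8$
$r = \dfrac{\sum (x-\bar{x})(y-\bar{y})}{\sqrt{\sum (x-\bar{x})^2}\,\sqrt{\sum (y-\bar{y})^2}} = \dfrac{65}{\sqrt{64}\,\sqrt{70}} = 0.971$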
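
The deviation-from-mean formula for r can be sketched in Python as below. The function name is illustrative and no external library is assumed.

```python
from math import sqrt


def pearson_r(xs, ys):
    """r = sum(dx*dy) / sqrt(sum(dx^2) * sum(dy^2)), deviations taken from the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


print(round(pearson_r([1, 3, 5, 8, 9, 10], [3, 4, 8, 10, 12, 11]), 4))  # 0.9711
```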
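4. Calculate the coefficient of correlation between x and y. (L3)
x   1    2    3    4    5    6    7    8    9
y   12   11   13   15   14   17   16   19   18
Solution:
x     y     $x-\bar{x}$   $y-\bar{y}$   $(x-\bar{x})^2$   $(y-\bar{y})^2$   $(x-\bar{x})(y-\bar{y})$
1     12    -4            -3            16                9                 12
2     11    -3            -4            9                 16                12
3     13    -2            -2            4                 4                 4
4     15    -1             0            1                 0                 0
5     14     0            -1            0                 1                 0
6     17     1             2            1                 4                 2
7     16     2             1            4                 1                 2
8     19     3             4            9                 16                12
9     18     4             3            16                9                 12
45    135    0             0            60                60                56

$\bar{x} = \dfrac{45}{9} = 5$, $\bar{y} = \dfrac{135}{9} = 15$
$r = \dfrac{\sum (x-\bar{x})(y-\bar{y})}{\sqrt{\sum (x-\bar{x})^2}\,\sqrt{\sum (y-\bar{y})^2}} = \dfrac{56}{\sqrt{60}\,\sqrt{60}} = \dfrac{56}{60} = 0.933$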

5. Ten competitors in a musical test were ranked by 3 judges X, Y, Z in the
following order. (L2)
            A   B   C   D    E   F    G   H    I   J
Rank by X   1   6   5   10   3   2    4   9    7   8
Rank by Y   3   5   8   4    7   10   2   1    6   9
Rank by Z   6   4   9   8    1   2    3   10   5   7
Using the rank correlation method, discuss which pair of judges has the
nearest approach to common likings in music.

Solution:
Let $d_1 = x - y$, $d_2 = y - z$ and $d_3 = x - z$.
x     y     z     $d_1$   $d_2$   $d_3$   $d_1^2$   $d_2^2$   $d_3^2$
1     3     6     -2      -3      -5      4         9         25
6     5     4      1       1       2      1         1         4
5     8     9     -3      -1      -4      9         1         16
10    4     8      6      -4       2      36        16        4
3     7     1     -4       6       2      16        36        4
2     10    2     -8       8       0      64        64        0
4     2     3      2      -1       1      4         1         1
9     1     10     8      -9      -1      64        81        1
7     6     5      1       1       2      1         1         4
8     9     7     -1       2       1      1         4         1
Total                                     200       214       60
The rank correlation between x and y is
$\rho(x,y) = 1 - \dfrac{6\sum d_1^2}{n(n^2-1)} = 1 - \dfrac{6(200)}{10(99)} = -0.2121$
The rank correlation between y and z is
$\rho(y,z) = 1 - \dfrac{6\sum d_2^2}{n(n^2-1)} = 1 - \dfrac{6(214)}{10(99)} = -0.2970$
The rank correlation between x and z is
$\rho(x,z) = 1 - \dfrac{6\sum d_3^2}{n(n^2-1)} = 1 - \dfrac{6(60)}{10(99)} = 0.6364$
Since $\rho(x,z)$ is the maximum and is also positive, we conclude that the
pair of judges x and z has the nearest approach to common likings in music.
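
Spearman's rank correlation, $\rho = 1 - \dfrac{6\sum d^2}{n(n^2-1)}$, can be sketched in Python for the three pairs of judges. The function name is illustrative, and the formula in this form assumes there are no tied ranks.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman's rho for two rankings without ties."""
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


x = [1, 6, 5, 10, 3, 2, 4, 9, 7, 8]
y = [3, 5, 8, 4, 7, 10, 2, 1, 6, 9]
z = [6, 4, 9, 8, 1, 2, 3, 10, 5, 7]
pairs = {"x,y": (x, y), "y,z": (y, z), "x,z": (x, z)}
rhos = {name: spearman_rho(a, b) for name, (a, b) in pairs.items()}
print(max(rhos, key=rhos.get))  # x,z
```

Picking the pair with the largest rho mirrors the conclusion of the worked solution.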

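6. From the following data, calculate (L3)
(i) The two regression equations.
(ii) The coefficient of correlation between the marks in Economics and
Statistics.
(iii) The most likely marks in Statistics when the marks in Economics are 30.
Marks in Economics    25   28   35   32   31   36   29   38   34   32
Marks in Statistics   43   46   49   41   36   32   31   30   33   39

Solution:
x     y     $x-\bar{x} = x-32$   $y-\bar{y} = y-38$   $(x-\bar{x})^2$   $(y-\bar{y})^2$   $(x-\bar{x})(y-\bar{y})$
25    43    -7                   5                    49                25                -35
28    46    -4                   8                    16                64                -32
35    49     3                   11                   9                 121               33
32    41     0                   3                    0                 9                 0
31    36    -1                   -2                   1                 4                 2
36    32     4                   -6                   16                36                -24
29    31    -3                   -7                   9                 49                21
38    30     6                   -8                   36                64                -48
34    33     2                   -5                   4                 25                -10
32    39     0                   1                    0                 1                 0
320   380    0                   0                    140               398               -93

Here $\bar{x} = \dfrac{\sum x}{n} = \dfrac{320}{10} = 32$ and $\bar{y} = \dfrac{\sum y}{n} = \dfrac{380}{10} = 38$

The coefficient of regression of y on x is
$b_{yx} = \dfrac{\sum (x-\bar{x})(y-\bar{y})}{\sum (x-\bar{x})^2} = \dfrac{-93}{140} = -0.6643$

The coefficient of regression of x on y is
$b_{xy} = \dfrac{\sum (x-\bar{x})(y-\bar{y})}{\sum (y-\bar{y})^2} = \dfrac{-93}{398} = -0.2337$

(i) The equation of the line of regression of x on y is
$x - \bar{x} = b_{xy}(y - \bar{y})$
(i.e.) x - 32 = -0.2337(y - 38)
= -0.2337 y + 0.2337 x 38
x = -0.2337 y + 40.8806
The equation of the line of regression of y on x is
$y - \bar{y} = b_{yx}(x - \bar{x})$
(i.e.) y - 38 = -0.6643(x - 32)
= -0.6643 x + 0.6643 x 32
y = -0.6643 x + 59.2576
(ii) Coefficient of correlation
$r^2 = b_{yx}\,b_{xy}$ = (-0.6643)(-0.2337) = 0.1552
$r = -\sqrt{0.1552} = -0.394$ (negative, since both regression coefficients are negative)
(iii) When x = 30, y = ?
y = -0.6643 x + 59.2576
y = -0.6643(30) + 59.2576
y = 39.33
So the most likely marks in Statistics are approximately 39.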
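
Both regression coefficients come from the same three sums of deviations, so they can be computed together. A minimal Python sketch follows; the function and variable names are assumptions for illustration.

```python
def regression_coefficients(xs, ys):
    """Return (mean_x, mean_y, b_yx, b_xy) from paired observations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return mx, my, sxy / sxx, sxy / syy


econ = [25, 28, 35, 32, 31, 36, 29, 38, 34, 32]
stats = [43, 46, 49, 41, 36, 32, 31, 30, 33, 39]
mx, my, b_yx, b_xy = regression_coefficients(econ, stats)
# Prediction uses the line of y on x: y = mean_y + b_yx * (x - mean_x)
predicted = my + b_yx * (30 - mx)
print(round(b_yx, 4), round(b_xy, 4), round(predicted, 2))
```

The line of y on x is the one to use when estimating y from a given x; the line of x on y answers the reverse question.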
7. Find the regression equation of capacity utilization on production from
the following data. (L2)
                                       Average   Standard deviation
Production (in lakh units)             35.6      10.5
Capacity utilization (in percentage)   84.8      8.5
r = 0.62. Estimate the production when the capacity utilization is 70 percent.

Solution:
Let production be denoted by the variable x and capacity utilization by y.
Then the regression equation of y on x is given by
$y - \bar{y} = b_{yx}(x - \bar{x})$ ----------------------(1)
where $b_{yx} = r\,\dfrac{\sigma_y}{\sigma_x} = 0.62 \times \dfrac{8.5}{10.5} = 0.5019$
and $\bar{x} = 35.6$, $\bar{y} = 84.8$
(1) => y - 84.8 = 0.5019(x - 35.6)
y = 66.9324 + 0.5019 x
which is the required regression of capacity utilization on production.
To estimate production we need the regression equation of x on y:
$x - \bar{x} = b_{xy}(y - \bar{y})$ -------------------------(2)
where $b_{xy} = r\,\dfrac{\sigma_x}{\sigma_y} = 0.62 \times \dfrac{10.5}{8.5} = 0.7659$
(2) => x - 35.6 = 0.7659(y - 84.8)
x = 35.6 + 0.7659 y - 64.9483
= 0.7659 y - 29.3483
When y = 70, x = 0.7659(70) - 29.3483
= 24.2647
Hence the estimated production is 24.2647 lakh units when the capacity
utilization is 70 percent.
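
When only the means, standard deviations and r are known, the regression line of x on y can still be written down from $b_{xy} = r\,\sigma_x/\sigma_y$. A short Python sketch of that estimate, with a hypothetical function name:

```python
def predict_x_from_y(mean_x, mean_y, sd_x, sd_y, r, y):
    """Estimate x from y using the regression line of x on y."""
    b_xy = r * sd_x / sd_y
    return mean_x + b_xy * (y - mean_y)


# Production (x, lakh units) estimated at 70 percent capacity utilization (y)
estimate = predict_x_from_y(35.6, 84.8, 10.5, 8.5, 0.62, 70)
print(round(estimate, 2))  # 24.26
```

The small difference from the hand calculation comes from rounding $b_{xy}$ to four decimals in the worked solution.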

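8. The two lines of regression are (L6)
8x - 10y + 66 = 0
40x - 18y - 214 = 0
The variance of x is 9.
Evaluate (i) The mean values of X and Y.
(ii) The correlation coefficient between X and Y.
Solution:
(i) Since both lines of regression pass through the mean values $(\bar{x}, \bar{y})$,
the point $(\bar{x}, \bar{y})$ must satisfy the two given regression lines:
(i.e.) $8\bar{x} - 10\bar{y} = -66$ -----------------(1)
$40\bar{x} - 18\bar{y} = 214$ -----------------(2)
(1) x 5: $40\bar{x} - 50\bar{y} = -330$
(2):     $40\bar{x} - 18\bar{y} = 214$
-----------------------
Subtracting: $-32\bar{y} = -544$, so $\bar{y} = 17$
From (1): $8\bar{x} - 10(17) = -66$, so $\bar{x} = 13$
(ii) From (1): 10y = 8x + 66
y = 0.8x + 6.6, so $b_{yx} = 0.8$
From (2): 40x = 18y + 214
x = 0.45y + 5.35, so $b_{xy} = 0.45$
(The opposite assignment would give $b_{yx}\,b_{xy} > 1$, which is impossible.)
$r^2 = b_{yx}\,b_{xy} = (0.8)(0.45) = 0.36$, so $r = \pm 0.6$
Since both the regression coefficients are positive, r must be positive.
r = 0.6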
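
The two steps of this solution (intersect the lines to get the means, then read off the regression coefficients) can be sketched in Python. The function name and the (a, b, c) encoding of a line a*x + b*y = c are assumptions for illustration; the role of each line is decided by requiring $b_{yx}\,b_{xy} \le 1$.

```python
from math import copysign, sqrt


def solve_regression_pair(line1, line2):
    """Each line is (a, b, c) meaning a*x + b*y = c. Returns (mean_x, mean_y, r)."""
    a1, b1, c1 = line1
    a2, b2, c2 = line2
    det = a1 * b2 - a2 * b1
    mean_x = (c1 * b2 - c2 * b1) / det     # Cramer's rule
    mean_y = (a1 * c2 - a2 * c1) / det
    b_yx, b_xy = -a1 / b1, -b2 / a2        # try line1 as y on x, line2 as x on y
    if b_yx * b_xy > 1:                    # impossible since r^2 <= 1, so swap roles
        b_yx, b_xy = -a2 / b2, -b1 / a1
    r = copysign(sqrt(b_yx * b_xy), b_yx)
    return mean_x, mean_y, r


mean_x, mean_y, r = solve_regression_pair((8, -10, -66), (40, -18, 214))
print(mean_x, mean_y, round(r, 4))
```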
