BS Lect 05

Uploaded by jasonnumahnalkel
Copyright © All Rights Reserved

Numerical Measures

Chapter 2 – Part 2

Measures of Dispersion
Measures of dispersion

A measure of dispersion is any numerical measure of the scatter of a distribution. Measures of variation give information on the spread, or variability, of the data values.
• Range
• Mean deviation
• Variance
• Standard deviation
• Coefficient of variation
• Measures of skewness

[Figure: two distributions with the same center but different variation]
Range
• Simplest measure of variation
• Difference between the largest and the smallest values in a set of data:

$$\text{Range} = X_{\text{largest}} - X_{\text{smallest}}$$

Example:

[Figure: dot plot of data values on a scale from 0 to 14]

Range = 14 - 1 = 13
Disadvantages of the Range
• Ignores the way in which data are distributed

[Figure: two dot plots on a scale from 7 to 12 with differently distributed data]
Range = 12 - 7 = 5 for both data sets

• Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4

1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119
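A minimal Python sketch (our own illustration, not part of the original slides) of the range calculation applied to the two 25-value data sets above. It shows that changing a single value from 5 to 120 moves the range from 4 to 119. The helper name `data_range` is hypothetical.

```python
def data_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)

# The two 25-value data sets from the slide differ only in their last value.
base = [1] * 11 + [2] * 8 + [3] * 4 + [4]
original = base + [5]
with_outlier = base + [120]

print(data_range(original))      # 4
print(data_range(with_outlier))  # 119
```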
Interquartile Range

• Can eliminate some outlier problems by using the interquartile range
• Eliminates the lowest 25% and the highest 25% of the observations and calculates the range of the remaining middle 50%
• Interquartile range = 3rd quartile – 1st quartile = Q3 – Q1
Interquartile Range

Example:

[Figure: box plot showing the five-number summary; each of the four intervals contains 25% of the data]
Minimum = 12, Q1 = 30, Median (Q2) = 45, Q3 = 57, Maximum = 70

Interquartile range = 57 – 30 = 27
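A sketch of computing the interquartile range from raw data with Python's standard `statistics.quantiles`, reusing the eight observations from Example 2.5.1 purely as illustration. The quartile rule is an assumption: `method="exclusive"` (the default) places Qk at position k(n + 1)/4 of the sorted data, and other conventions (for example `method="inclusive"`) can give slightly different quartiles and therefore a different IQR.

```python
import statistics

# Observations from Example 2.5.1, reused here only as an illustration.
data = [6, 10, 9, 12, 15, 11, 10, 5]

# With n=4, quantiles() returns the three cut points [Q1, Q2, Q3].
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
iqr = q3 - q1  # interquartile range = Q3 - Q1

print(q1, q2, q3, iqr)  # 6.75 10.0 11.75 5.0
```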
Mean deviation

• Average of the absolute deviations from the mean
• Indicates how far, on average, the observations lie from the center of the distribution.

Ungrouped data:
$$M.D = \frac{\sum_{i=1}^{n} \lvert x_i - \bar{x} \rvert}{n}$$

Grouped data:
$$M.D = \frac{\sum_{i=1}^{h} \lvert x_i - \bar{x} \rvert f_i}{\sum_{i=1}^{h} f_i}$$
Variance and standard deviation
(Ungrouped data)
Sample variance
Average (approximately) of squared deviations of values from the mean:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \frac{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}{n(n-1)}$$

Sample standard deviation

$$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}} = \sqrt{\frac{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}{n(n-1)}}$$
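The shortcut form of the variance follows from expanding the square. The short derivation below is our addition (not from the slides) and uses only $\sum_{i=1}^{n} x_i = n\bar{x}$:

$$\sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - 2\bar{x}\sum_{i=1}^{n} x_i + n\bar{x}^2 = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}$$

Dividing by $n - 1$:

$$s^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right] = \frac{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}{n(n-1)}$$

For the data of Example 2.5.1, $832 - 78^2/8 = 832 - 760.5 = 71.5$, which matches the total of the squared-deviation column.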
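Variance and standard deviation
(Grouped data)
Here $x_i$ is the class mid value, $f_i$ the class frequency, $h$ the number of classes and $n = \sum_{i=1}^{h} f_i$.

Sample variance

$$s^2 = \frac{\sum_{i=1}^{h} (x_i - \bar{x})^2 f_i}{\left(\sum_{i=1}^{h} f_i\right) - 1} = \frac{n\sum_{i=1}^{h} x_i^2 f_i - \left(\sum_{i=1}^{h} x_i f_i\right)^2}{n(n-1)}$$

Sample standard deviation

$$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{h} (x_i - \bar{x})^2 f_i}{n-1}} = \sqrt{\frac{n\sum_{i=1}^{h} x_i^2 f_i - \left(\sum_{i=1}^{h} x_i f_i\right)^2}{n(n-1)}}$$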
Standard deviation

• Most commonly used measure of variation
• Shows variation about the mean
• Is the square root of the variance
• Has the same units as the original data
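Setup Table 5.1

No. of observation | Observation ($x_i$) | $x_i^2$ | $\lvert x_i - \bar{x} \rvert$ | $(x_i - \bar{x})^2$
1 | $x_1$ | $x_1^2$ | $\lvert x_1 - \bar{x} \rvert$ | $(x_1 - \bar{x})^2$
2 | $x_2$ | $x_2^2$ | $\lvert x_2 - \bar{x} \rvert$ | $(x_2 - \bar{x})^2$
⋮ | ⋮ | ⋮ | ⋮ | ⋮
n | $x_n$ | $x_n^2$ | $\lvert x_n - \bar{x} \rvert$ | $(x_n - \bar{x})^2$
Total | $\sum_{i=1}^{n} x_i$ | $\sum_{i=1}^{n} x_i^2$ | $\sum_{i=1}^{n} \lvert x_i - \bar{x} \rvert$ | $\sum_{i=1}^{n} (x_i - \bar{x})^2$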
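Setup Table 5.2

No. of classes | $x_i$ | $f_i$ | $x_i f_i$ | $F_i$ | $x_i^2 f_i$ | $\lvert x_i - \bar{x} \rvert f_i$ | $(x_i - \bar{x})^2 f_i$
1 | $x_1$ | $f_1$ | $x_1 f_1$ | $F_1$ | $x_1^2 f_1$ | $\lvert x_1 - \bar{x} \rvert f_1$ | $(x_1 - \bar{x})^2 f_1$
2 | $x_2$ | $f_2$ | $x_2 f_2$ | $F_2$ | $x_2^2 f_2$ | $\lvert x_2 - \bar{x} \rvert f_2$ | $(x_2 - \bar{x})^2 f_2$
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮
h | $x_h$ | $f_h$ | $x_h f_h$ | $F_h$ | $x_h^2 f_h$ | $\lvert x_h - \bar{x} \rvert f_h$ | $(x_h - \bar{x})^2 f_h$
Total | | $\sum_{i=1}^{h} f_i = n$ | $\sum_{i=1}^{h} x_i f_i$ | | $\sum_{i=1}^{h} x_i^2 f_i$ | $\sum_{i=1}^{h} \lvert x_i - \bar{x} \rvert f_i$ | $\sum_{i=1}^{h} (x_i - \bar{x})^2 f_i$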
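Example 2.5.1
Compute the mean deviation, variance and standard deviation for the following random sample of 8 observations:

6, 10, 9, 12, 15, 11, 10, 5

The observations are sorted in the table below; the mean is $\bar{x} = 78/8 = 9.75$.

No. of obs. | Observation ($x_i$) | $x_i^2$ | $\lvert x_i - \bar{x} \rvert$ | $(x_i - \bar{x})^2$
1 | 5 | 25 | 4.75 | 22.5625
2 | 6 | 36 | 3.75 | 14.0625
3 | 9 | 81 | 0.75 | 0.5625
4 | 10 | 100 | 0.25 | 0.0625
5 | 10 | 100 | 0.25 | 0.0625
6 | 11 | 121 | 1.25 | 1.5625
7 | 12 | 144 | 2.25 | 5.0625
8 | 15 | 225 | 5.25 | 27.5625
Total | 78 | 832 | 18.5 | 71.5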
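Solution

$$\text{Mean deviation} = \frac{\sum_{i=1}^{n} \lvert x_i - \bar{x} \rvert}{n} = \frac{18.5}{8} \approx 2.31$$

$$\text{Variance, } s^2 = \frac{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}{n(n-1)} = \frac{8(832) - (78)^2}{(8)(7)} \approx 10.21$$

or

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \frac{71.5}{7} \approx 10.21$$

$$\text{Standard deviation, } s = \sqrt{s^2} = \sqrt{10.21} \approx 3.2$$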
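The hand calculation above can be checked with a short Python sketch using the standard `statistics` module (our own illustration). `statistics.variance` and `statistics.stdev` use the sample (n - 1) divisor, matching the slides; the mean deviation has no standard-library function, so it is computed directly.

```python
import math
import statistics

# Example 2.5.1: random sample of 8 observations
data = [6, 10, 9, 12, 15, 11, 10, 5]
n = len(data)

mean = statistics.mean(data)                        # 78 / 8 = 9.75
mean_dev = sum(abs(x - mean) for x in data) / n     # 18.5 / 8
variance = statistics.variance(data)                # sample variance, divisor n - 1
std_dev = statistics.stdev(data)                    # square root of the sample variance

# Shortcut (computational) formula from the slides, for comparison
shortcut = (n * sum(x * x for x in data) - sum(data) ** 2) / (n * (n - 1))

print(f"mean = {mean}, M.D = {mean_dev:.4f}")
print(f"s^2 = {variance:.4f} (shortcut: {shortcut:.4f}), s = {std_dev:.4f}")
```

Both variance formulas give the same value; the slides round it to 10.21 and the standard deviation to 3.2.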


Example 2.5.2

Compute the mean deviation, variance and standard deviation from the following frequency distribution:

Class boundary | Frequency ($f_i$) | Mid value ($x_i$) | $f_i x_i$ | $x_i^2 f_i$ | $\lvert x_i - \bar{x} \rvert f_i$ | $(x_i - \bar{x})^2 f_i$
9.5 - 12.5 | 6 | 11 | 66 | 726 | 45.16 | 339.96
12.5 - 15.5 | 5 | 14 | 70 | 980 | 22.64 | 102.48
15.5 - 18.5 | 14 | 17 | 238 | 4046 | 21.38 | 32.66
18.5 - 21.5 | 18 | 20 | 360 | 7200 | 26.51 | 39.04
21.5 - 24.5 | 9 | 23 | 207 | 4761 | 40.25 | 180.05
24.5 - 27.5 | 3 | 26 | 78 | 2028 | 22.42 | 167.52
Total | 55 | | 1019 | 19741 | 178.36 | 861.71

Mean = 18.53, Median = 18.92, Mode = 19.423.


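Solution

$$\text{Mean deviation} = \frac{\sum_{i=1}^{h} \lvert x_i - \bar{x} \rvert f_i}{\sum_{i=1}^{h} f_i} = \frac{178.36}{55} \approx 3.24$$

$$\text{Variance, } s^2 = \frac{\sum_{i=1}^{h} (x_i - \bar{x})^2 f_i}{\left(\sum_{i=1}^{h} f_i\right) - 1} = \frac{861.71}{55 - 1} \approx 15.96$$

or

$$s^2 = \frac{n\sum_{i=1}^{h} x_i^2 f_i - \left(\sum_{i=1}^{h} x_i f_i\right)^2}{n(n-1)} = \frac{(55)(19741) - 1019^2}{55(55 - 1)} \approx 15.96$$

$$\text{Standard deviation, } s = \sqrt{15.96} \approx 4$$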
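A Python sketch of the grouped-data calculations for Example 2.5.2 (our own illustration), using the class mid values $x_i$ and frequencies $f_i$ from the table. It applies the slide formulas directly; the mean is kept unrounded, so the last digits may differ slightly from the table, which uses $\bar{x} = 18.53$.

```python
import math

# Example 2.5.2: class mid values (x_i) and frequencies (f_i)
mid = [11, 14, 17, 20, 23, 26]
freq = [6, 5, 14, 18, 9, 3]

n = sum(freq)                                          # 55
sum_xf = sum(x * f for x, f in zip(mid, freq))         # 1019
sum_x2f = sum(x * x * f for x, f in zip(mid, freq))    # 19741
mean = sum_xf / n

# Mean deviation and sample variance for grouped data
mean_dev = sum(abs(x - mean) * f for x, f in zip(mid, freq)) / n
variance = sum((x - mean) ** 2 * f for x, f in zip(mid, freq)) / (n - 1)
shortcut = (n * sum_x2f - sum_xf ** 2) / (n * (n - 1))
std_dev = math.sqrt(variance)

print(f"mean = {mean:.2f}, M.D = {mean_dev:.2f}, s^2 = {variance:.2f}, s = {std_dev:.2f}")
```

The standard deviation comes out at about 3.99, which the slides round to 4.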


Measuring variation

[Figure: a distribution with a small standard deviation and a distribution with a large standard deviation]

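Comparing Standard Deviations

[Figure: dot plots of Data A, Data B and Data C on a scale from 11 to 21]

Data A: Mean = 15.5, S = 3.338
Data B: Mean = 15.5, S = 0.926
Data C: Mean = 15.5, S = 4.567

The three data sets have the same mean but different spread: Data B is the most tightly clustered around the mean and Data C is the most spread out.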
Advantages of Variance and Standard Deviation

• Each value in the data set is used in the calculation
• Values far from the mean are given extra weight (because deviations from the mean are squared)
Coefficient of Variation

• Measures relative variation
• Always expressed as a percentage (%)
• Shows variation relative to the mean
• Can be used to compare two or more sets of data measured in different units

$$CV = \left(\frac{S}{\bar{X}}\right) \times 100\%$$
Example 2.5.3

Refer to the frequency distribution in Example 2.5.2 to compute the coefficient of variation.

Solution

Standard deviation $s = 4$, mean $\bar{x} = 18.53$. Therefore

$$c.v = \frac{4}{18.53} \times 100 \approx 21.6\%$$
Another example: Comparing Coefficients of Variation

• Stock A:
  • Average price last year = K50
  • Standard deviation = K5

$$CV_A = \left(\frac{S}{\bar{X}}\right) \times 100\% = \frac{\text{K}5}{\text{K}50} \times 100\% = 10\%$$

• Stock B:
  • Average price last year = K100
  • Standard deviation = K5

$$CV_B = \left(\frac{S}{\bar{X}}\right) \times 100\% = \frac{\text{K}5}{\text{K}100} \times 100\% = 5\%$$

Both stocks have the same standard deviation, but stock B is less variable relative to its price.
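A minimal Python sketch of the coefficient-of-variation formula (our own illustration), applied to the two stocks and to Example 2.5.3. The helper name `coefficient_of_variation` is hypothetical.

```python
def coefficient_of_variation(std_dev, mean):
    """CV = (S / mean) x 100, expressed as a percentage."""
    return std_dev / mean * 100

# Stock example: same standard deviation (K5), different average prices
cv_a = coefficient_of_variation(5, 50)    # Stock A
cv_b = coefficient_of_variation(5, 100)   # Stock B

# Example 2.5.3: s = 4, mean = 18.53
cv_grouped = coefficient_of_variation(4, 18.53)

print(f"CV_A = {cv_a:.1f}%, CV_B = {cv_b:.1f}%, Example 2.5.3 CV = {cv_grouped:.1f}%")
```

Because CV divides out the mean, it is unit-free, which is why it can compare data sets measured in different units.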
Shape of a Distribution

• Describes how data are distributed
• Measures of shape
  • Symmetric or skewed

[Figure: left-skewed, symmetric and right-skewed distributions]
• Left-skewed: Mean < Median
• Symmetric: Mean = Median
• Right-skewed: Median < Mean
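Measure of Skewness
A measure of skewness indicates the shape of the distribution, that is, whether it is skewed or symmetric.

$$s.k = \frac{3(\bar{x} - m)}{s}$$

where $\bar{x}$ is the mean, $m$ the median and $s$ the standard deviation.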
Example 2.5.4
Refer to the frequency distribution in Example 2.5.2 to compute the measure of skewness and determine the type of the distribution.

Solution

Standard deviation $s = 4$, mean $\bar{x} = 18.53$, median $m = 18.92$. Therefore, the skewness is

$$s.k = \frac{3(18.53 - 18.92)}{4} \approx -0.3$$

This is a negatively skewed distribution.
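The skewness measure used here, $3(\bar{x} - m)/s$, is commonly known as Pearson's second (median) coefficient of skewness. A minimal Python sketch (our own illustration; the helper name `pearson_skewness` is hypothetical) reproduces Example 2.5.4 and classifies the shape by the sign of the result.

```python
def pearson_skewness(mean, median, std_dev):
    """Skewness measure from the slides: s.k = 3(mean - median) / s."""
    return 3 * (mean - median) / std_dev

# Example 2.5.4: mean = 18.53, median = 18.92, s = 4
sk = pearson_skewness(18.53, 18.92, 4)

# Negative -> negatively skewed, positive -> positively skewed, zero -> symmetric
shape = "negatively skewed" if sk < 0 else "positively skewed" if sk > 0 else "symmetric"
print(f"s.k = {sk:.2f} -> {shape}")
```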


Example 2.5.5

Compare the following data sets, which represent the test scores of two groups of students on a special IQ test.

Group A | Group B
1 | 4
2 | 4
3 | 4
4 | 4
4 | 4
4 | 4
4 | 6
10 | 2
Solution
Summary statistics of the test scores of the students

Student | Group A | Group B | Group A deviation from mean $(x_i - \bar{x})$ | Group A absolute deviation from mean $\lvert x_i - \bar{x} \rvert$ | Group A squared deviation from mean $(x_i - \bar{x})^2$
1 | 1 | 4 | -3 | 3 | 9
2 | 2 | 4 | -2 | 2 | 4
3 | 3 | 4 | -1 | 1 | 1
4 | 4 | 4 | 0 | 0 | 0
5 | 4 | 4 | 0 | 0 | 0
6 | 4 | 4 | 0 | 0 | 0
7 | 4 | 6 | 0 | 0 | 0
8 | 10 | 2 | 6 | 6 | 36
Total | 32 | 32 | 0 | 12 | 50

Measure | Group A | Group B
Mean | 4 | 4
Mode | 4 | 4
Median | 4 | 4
Range | 9 | 4
Variance | 7.14 | 1.14
Mean deviation | 1.5 | 0.5
Standard deviation | 2.67 | 1.07
c.v | 66.8% | 26.7%
s.k | 0 | 0
Comparison Comments
• The measures of central tendency are equal, so the two sets of data cannot be compared in terms of the measures of central tendency.
• The range for group A is greater than that for group B, which means the scores for group A are more scattered than those for group B. In other words, the scores for group B are more uniform, meaning they are much closer to each other.
• The distributions of the scores of both groups are symmetrical, because in each group the mean and the median coincide (s.k = 0).
• The coefficient of variation for A is greater than that for B because there is more variation in A.
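The summary table for Example 2.5.5 can be reproduced with a short Python sketch using the standard `statistics` module (our own illustration). Mean deviation, coefficient of variation and the skewness measure are computed directly from the slide formulas, since `statistics` has no functions for them.

```python
import statistics

groups = {
    "A": [1, 2, 3, 4, 4, 4, 4, 10],
    "B": [4, 4, 4, 4, 4, 4, 6, 2],
}

summary = {}
for name, scores in groups.items():
    mean = statistics.mean(scores)
    median = statistics.median(scores)
    s = statistics.stdev(scores)  # sample standard deviation (n - 1 divisor)
    summary[name] = {
        "mean": mean,
        "median": median,
        "mode": statistics.mode(scores),
        "range": max(scores) - min(scores),
        "variance": statistics.variance(scores),
        "mean_dev": sum(abs(x - mean) for x in scores) / len(scores),
        "std_dev": s,
        "cv_percent": s / mean * 100,
        "skewness": 3 * (mean - median) / s,
    }

for name, group_stats in summary.items():
    print(name, {k: round(v, 2) for k, v in group_stats.items()})
```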
Theorems regarding mean and variance
Theorem 2.5.1

The arithmetic mean is affected by a change of origin. For example, if a fixed number, say $a$, is added to or subtracted from each observation in a data set, then the mean of the new observations is (the original mean) + $a$, or (the original mean) − $a$, respectively.

Theorem 2.5.2

The arithmetic mean is affected by a change of scale. For example, if each observation in a data set is multiplied by a fixed number, say $b$, then the mean of the new observations is $b$ × (the original mean).

Theorem 2.5.3

Variance is not affected by a change of origin, but it is affected by a change of scale. For example, if a fixed number, say $a$, is added to or subtracted from all observations in a data set, then the variance of the new observations is the same as the original variance. But if all observations in a data set are multiplied or divided by a fixed number, say $b$, then the variance of the new observations is $b^2$ × (the original variance), or (the original variance) / $b^2$, respectively.
Example 2.5.6
a) Suppose each observation in Example 2.5.1 is multiplied by 3. What are the new mean and the new variance?
b) Suppose each observation in Example 2.5.1 is increased by 3. What are the new mean and the new variance?
c) Suppose each observation in Example 2.5.1 is decreased by 4. What is the new mean?

Solution

a) Original mean $\bar{x} = 9.75$, new mean $= 9.75 \times 3 = 29.25$.
   Original variance $s^2 = 10.21$, new variance $= 10.21 \times 3^2 = 91.89$.
b) Original mean $\bar{x} = 9.75$, new mean $= 9.75 + 3 = 12.75$.
   Original variance $s^2 = 10.21$, new variance $= 10.21$ (no change).
c) Original mean $\bar{x} = 9.75$, new mean $= 9.75 - 4 = 5.75$.
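A quick numerical check of Theorems 2.5.1 to 2.5.3 on the Example 2.5.1 data, written as a Python sketch (our own illustration) using the standard `statistics` module.

```python
import statistics

data = [6, 10, 9, 12, 15, 11, 10, 5]            # Example 2.5.1
mean, var = statistics.mean(data), statistics.variance(data)

scaled = [3 * x for x in data]                  # (a) multiply by 3: change of scale
shifted_up = [x + 3 for x in data]              # (b) add 3: change of origin
shifted_down = [x - 4 for x in data]            # (c) subtract 4: change of origin

print(statistics.mean(scaled), statistics.variance(scaled))          # 3 * mean, 9 * variance
print(statistics.mean(shifted_up), statistics.variance(shifted_up))  # mean + 3, same variance
print(statistics.mean(shifted_down))                                 # mean - 4
```

Computed from the unrounded variance (10.214...), the new variance in part (a) is about 91.93; the value 91.89 comes from rounding the original variance to 10.21 before multiplying by 9.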
Using Microsoft Excel

• Descriptive statistics can be obtained from Microsoft® Excel.
• Use the menu choice: Tools / Data Analysis / Descriptive Statistics.
• Enter the details in the dialog box.
• Check the box for Summary statistics.
• Click OK.

[Figure: screenshots of the Data Analysis menu and the Descriptive Statistics dialog box]
Excel output
Microsoft Excel descriptive statistics output, using the house price data:

House Prices:
$2,000,000
$500,000
$300,000
$100,000
$100,000

[Figure: Excel descriptive statistics output table]
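For readers without Excel, a rough Python counterpart of several of the summary measures in Excel's Descriptive Statistics output can be computed for the house price data (our own sketch). Excel's output also reports items such as standard error, kurtosis and skewness, which this sketch omits; the labels below only loosely follow Excel's.

```python
import statistics

# House price data from the slide (in dollars)
prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

summary = {
    "Mean": statistics.mean(prices),
    "Median": statistics.median(prices),
    "Mode": statistics.mode(prices),
    "Standard Deviation": statistics.stdev(prices),  # sample (n - 1) version
    "Sample Variance": statistics.variance(prices),
    "Range": max(prices) - min(prices),
    "Minimum": min(prices),
    "Maximum": max(prices),
    "Sum": sum(prices),
    "Count": len(prices),
}

for label, value in summary.items():
    print(f"{label:>20}: {value:,.0f}")
```

The single $2,000,000 house pulls the mean ($600,000) well above the median ($300,000), a right-skewed pattern.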
Chapter Summary

• A measure of dispersion is any measure indicating the amount of scatter about a central point.
• Range is the difference between the highest and the lowest observations.
• Mean deviation is the average of the absolute deviations of all the observations in a data set from the mean.
• Variance measures the amount of variation about the mean.
• Coefficient of variation compares the variability of two or more sets of data.
• The measure of skewness gives an indication of the type of distribution of the observations. If the skewness is negative, the distribution is negatively skewed.
• If the skewness is positive, the distribution is positively skewed.
