
Chapter 2B QS (PC)

This document discusses various measures of dispersion used to describe the spread or variability in a data set. It defines key terms like range, quartile deviation, variance and standard deviation. Range is the difference between the maximum and minimum values. Quartile deviation describes the spread between the first and third quartiles. Variance and standard deviation are more accurate measures that use all data points and their deviation from the mean. Standard deviation in particular provides a measure of average deviation from the mean and is widely used to understand variability in data sets. Examples are provided to demonstrate calculating each measure of dispersion.

Uploaded by

SEOW INN LEE
Copyright
© © All Rights Reserved

Chapter 2B Data Description (B)

Measures of Dispersion
➢ Measures of dispersion help us to understand the spread or
variability of a set of data. They give additional information for
judging the reliability of a measure of central tendency and help in
comparing the dispersion present in different samples.
➢ Two data sets can have the same mean, the same median and the
same mode and yet be very different in other respects.
➢ Example: consider the heights (cm) of five employees from each
of the sales and production departments as shown:
Sales department: 183 185 193 193 198
Production department: 170 183 193 193 213

The two groups have the same mean height (190.4 cm), the same median height
(193 cm) and the same modal height (193 cm).
Nonetheless, it is clear that the two data sets differ. To describe this difference
quantitatively, we use a measure of dispersion.
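The comparison above can be checked with a short Python sketch. It uses only the standard-library `statistics` module; the function and variable names are illustrative assumptions, not part of the original notes. The last entry of each summary is the gap between the largest and smallest height, included only to show that the spreads differ.

```python
import statistics

# Heights (cm) of five employees from each department, as given above.
sales = [183, 185, 193, 193, 198]
production = [170, 183, 193, 193, 213]

def summary(data):
    """Return (mean, median, mode, largest - smallest) for a list of numbers."""
    return (
        statistics.mean(data),
        statistics.median(data),
        statistics.mode(data),
        max(data) - min(data),
    )

sales_summary = summary(sales)
production_summary = summary(production)

# The first three entries agree for both groups; the last one does not.
print(sales_summary)
print(production_summary)
```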

➢ There are several commonly used measures of dispersion. They
are range, quartile deviation, variance, standard deviation and
coefficient of variation.
➢ The more spread out or dispersed the data, the larger is the range,
the quartile deviation, the variance and the standard deviation.
Range
➢ Range is the difference between the largest and the smallest
observations in a data set.
Range = largest value – smallest value

Eg 1: Find the range for the following data.
10 15 17 20 25 29 30 35 38 40 45
Solution: Range = 45 – 10 = 35
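As a minimal sketch, the range of raw data can be computed in Python as follows. The function name `value_range` is an assumption chosen for illustration.

```python
def value_range(data):
    """Range = largest value - smallest value."""
    return max(data) - min(data)

# Data from Eg 1.
data = [10, 15, 17, 20, 25, 29, 30, 35, 38, 40, 45]
result = value_range(data)
print(result)
```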

For grouped data (discrete):
Range = Upper class limit of the last class – Lower class limit of the first class

For grouped data (continuous):
Range = Upper class boundary of the last class – Lower class boundary of the first class

Eg 2A (Discrete): The following table shows the daily outputs of 80 workers in a
factory. Determine the range.

Daily outputs    No. of workers
10 – 19          6
20 – 29          10
30 – 39          30
40 – 49          20
50 – 59          10
60 – 69          4

Solution: Range = 69 – 10 = 59

Eg 2B (Continuous): Find the range of the following frequency distribution of
the time spent (in hours) by students on campus per week.

Time spent (hours)    No. of students
0 – < 6               2
6 – < 12              4
12 – < 18             10
18 – < 24             12
24 – < 30             8

Solution: Range = 30 – 0 = 30
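For grouped data, the same idea can be sketched in Python by storing each class as a (lower, upper) pair. This representation and the name `grouped_range` are assumptions for illustration. For a discrete distribution the pairs hold class limits; for a continuous one they hold class boundaries.

```python
def grouped_range(classes):
    """Range of grouped data.

    classes: list of (lower, upper) pairs in ascending order.
    Returns the upper value of the last class minus the lower value of the first.
    """
    first_lower = classes[0][0]
    last_upper = classes[-1][1]
    return last_upper - first_lower

# Eg 2A: class limits of the daily outputs (discrete).
daily_outputs = [(10, 19), (20, 29), (30, 39), (40, 49), (50, 59), (60, 69)]
# Eg 2B: class boundaries of the time spent per week (continuous).
time_spent = [(0, 6), (6, 12), (12, 18), (18, 24), (24, 30)]

print(grouped_range(daily_outputs))
print(grouped_range(time_spent))
```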
Advantage of range: It is easy to understand and simple to calculate.
Disadvantage of range: Since only the largest and the smallest
values are considered, it can be very much influenced by them,
especially if they are unrepresentative extreme values. (Remember
the influence of extreme values?)

Quartile Deviation (semi-interquartile range)
$$QD = \frac{Q_3 - Q_1}{2}$$
where $Q_1$ = lower quartile or first quartile
      $Q_3$ = upper quartile or third quartile

Interquartile range – the difference between the third and the first
quartiles.
Interquartile range, IQR = Q3 – Q1
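1) For raw data
✓ Arrange the data into an array in ascending order of magnitude.
✓ Locate the quartile items as:
$$Q_1 = \frac{(n+1)}{4}\text{th item}, \qquad Q_3 = \frac{3(n+1)}{4}\text{th item}$$
✓ If the position is not a whole number, interpolate linearly between
the two neighbouring items.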

Eg 3: The following are the scores of 12 students in a mathematics class.
75 80 68 53 99 58 76 73 85 88 91 79
a) Find the values of $Q_1$ and $Q_3$.
b) Find the interquartile range and quartile deviation.

Arranged in ascending order:
53 58 68 73 75 76 79 80 85 88 91 99

a) $Q_1 = \left(\frac{12+1}{4}\right)\text{th item} = 3.25\text{th item} = 68 + 0.25(73 - 68) = 69.25$
   $Q_3 = \left(\frac{3(12+1)}{4}\right)\text{th item} = 9.75\text{th item} = 85 + 0.75(88 - 85) = 87.25$

b) IQR = Q3 – Q1 = 87.25 – 69.25 = 18
   $QD = \frac{18}{2} = 9$
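The quartile rule for raw data can be sketched in Python as below. This is one possible implementation of the (n+1)/4 position rule with linear interpolation, as used in these notes; the function names are assumptions for illustration, and the sketch assumes at least three observations.

```python
def item_at(sorted_data, position):
    """Value at a 1-based, possibly fractional, position (linear interpolation)."""
    whole = int(position)          # the item just at or below the position
    fraction = position - whole    # how far to move towards the next item
    lower = sorted_data[whole - 1]
    if fraction == 0:
        return lower
    upper = sorted_data[whole]
    return lower + fraction * (upper - lower)

def quartiles(data):
    """Return (Q1, Q3) using the (n+1)/4 and 3(n+1)/4 positions."""
    ordered = sorted(data)
    n = len(ordered)
    if n < 3:
        raise ValueError("this sketch assumes at least three observations")
    q1 = item_at(ordered, (n + 1) / 4)
    q3 = item_at(ordered, 3 * (n + 1) / 4)
    return q1, q3

# Scores from Eg 3.
scores = [75, 80, 68, 53, 99, 58, 76, 73, 85, 88, 91, 79]
q1, q3 = quartiles(scores)
iqr = q3 - q1    # interquartile range
qd = iqr / 2     # quartile deviation
print(q1, q3, iqr, qd)
```

Other textbooks and software packages use slightly different position rules, so their quartiles may differ from the values obtained here.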

Eg 4: The following are the ages of nine employees of an insurance
company.
47 28 39 51 33 37 59 24 33
Find the quartile deviation.

Arranged in ascending order:
24 28 33 33 37 39 47 51 59

$Q_1 = \left(\frac{9+1}{4}\right)\text{th item} = 2.5\text{th item} = 28 + 0.5(33 - 28) = 30.5$
$Q_3 = \left(\frac{3(9+1)}{4}\right)\text{th item} = 7.5\text{th item} = 47 + 0.5(51 - 47) = 49$
$QD = \frac{49 - 30.5}{2} = 9.25$

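2) For grouped data

Two methods:
a) By ogive
[Figure: ogive, with cumulative frequency (cf) on the vertical axis and value on the horizontal axis]

b) By calculation (linear interpolation formula)
$$Q_k = L + \frac{c}{f}\left(\frac{kn}{4} - \sum f_{Q-1}\right), \qquad k = 1, 3$$
where L = lower class boundary of the quartile class
c = class size of the quartile class
f = frequency of the quartile class
n = total frequency
$\sum f_{Q-1}$ = cumulative frequency before the quartile class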

Eg 5: The following frequency distribution shows the daily production level.
Production (units)    No. of days
13 – 17 2
18 – 22 22
23 – 27 10
28 – 32 14
33 – 37 3
38 – 42 4
43 – 47 6
48 – 52 1
Find the quartile deviation using (a) the formula; and (b) an ogive.

Production (units)   f    Class boundaries   cf       Upper boundary
                                             0        12.5
13 – 17              2    12.5 – 17.5        2        17.5
18 – 22              22   17.5 – 22.5        24       22.5    (Q1 class)
23 – 27              10   22.5 – 27.5        34       27.5
28 – 32              14   27.5 – 32.5        48       32.5    (Q3 class)
33 – 37              3    32.5 – 37.5        51       37.5
38 – 42              4    37.5 – 42.5        55       42.5
43 – 47              6    42.5 – 47.5        61       47.5
48 – 52              1    47.5 – 52.5        62 = n   52.5

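(a) $Q_1$ = value of the $\frac{62}{4}$th item = value of the 15.5th item
Q1 class: 17.5 – 22.5, so $L_{Q_1} = 17.5$, $c_{Q_1} = 5$, $f_{Q_1} = 22$, $\sum f_{Q_1 - 1} = 2$
$$Q_1 = 17.5 + \frac{5}{22}\,[15.5 - 2] = 20.57$$

$Q_3$ = value of the $\frac{3(62)}{4}$th item = value of the 46.5th item
Q3 class: 27.5 – 32.5, so $L_{Q_3} = 27.5$, $c_{Q_3} = 5$, $f_{Q_3} = 14$, $\sum f_{Q_3 - 1} = 34$
$$Q_3 = 27.5 + \frac{5}{14}\,[46.5 - 34] = 31.96$$

$$\text{Quartile deviation} = \frac{31.96 - 20.57}{2} = 5.695$$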
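The calculation in part (a) can be sketched in Python as follows. The function name `grouped_quartile` and the list-of-pairs layout are assumptions for illustration; the logic follows the linear interpolation steps of the worked solution, taking the quartile position as kn/4.

```python
def grouped_quartile(boundaries, freqs, k):
    """k-th quartile (k = 1, 2 or 3) of grouped data by linear interpolation.

    boundaries: list of (lower, upper) class boundaries in ascending order
    freqs: class frequencies in the same order
    """
    n = sum(freqs)
    target = k * n / 4      # position of the quartile item
    cumulative = 0          # cumulative frequency before the current class
    for (lower, upper), f in zip(boundaries, freqs):
        if f > 0 and cumulative + f >= target:
            class_size = upper - lower
            return lower + class_size / f * (target - cumulative)
        cumulative += f
    raise ValueError("quartile position not found")

# Data from Eg 5.
boundaries = [(12.5, 17.5), (17.5, 22.5), (22.5, 27.5), (27.5, 32.5),
              (32.5, 37.5), (37.5, 42.5), (42.5, 47.5), (47.5, 52.5)]
freqs = [2, 22, 10, 14, 3, 4, 6, 1]

q1 = grouped_quartile(boundaries, freqs, 1)
q3 = grouped_quartile(boundaries, freqs, 3)
qd = (q3 - q1) / 2
print(round(q1, 2), round(q3, 2), round(qd, 3))
```

Because the code keeps full precision for the quartiles, the quartile deviation comes out at about 5.698 rather than the 5.695 obtained from the rounded values 20.57 and 31.96.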

(b) Reading from the ogive:

Q1 = 20.5
On 25% of the days, production is less than or equal to 20.5
units; on the other 75% of the days, it is more than or equal to 20.5
units.
Q3 = 32
On 75% of the days, production is less than or equal to 32
units; on the other 25% of the days, it is more than or equal to 32
units.
$$\text{Quartile deviation} = \frac{32 - 20.5}{2} = 5.75$$

Advantages of quartile deviation:


It can be computed even though the end values of the distribution are
not known, as with the open-ended classes. Also, it is not influenced
by the extreme values.
Disadvantage of quartile deviation:
It is not fully representative of a set of measurements as it is not
based on all information available.

Standard Deviation and Variance


✓ The standard deviation, s, is a very important and useful measure
of spread. It measures the deviations of the readings from
the mean, $\bar{x}$, and is calculated using all the values in the distribution.

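1) For raw data

$$\sigma^2 = \frac{\sum x^2}{N} - \left(\frac{\sum x}{N}\right)^2 \qquad\text{and}\qquad s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1}$$

where $\sigma^2$ = the population variance
      $s^2$ = the sample variance

Population standard deviation: $\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum x^2}{N} - \left(\frac{\sum x}{N}\right)^2}$

Sample standard deviation: $s = \sqrt{s^2} = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1}}$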

Eg 6: Find the variance and standard deviation for a sample of five
numbers 2, 3, 5, 6, 8.

$\sum x^2 = 138$, $\sum x = 24$, $n = 5$

$$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1} = \frac{138 - \frac{(24)^2}{5}}{5-1} = 5.7$$

$$s = \sqrt{5.7} = 2.39$$
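The sample formula can be checked with a short Python sketch. The helper name `sample_variance` is an assumption for illustration; `statistics.variance` from the standard library is used only as a cross-check, since it also divides by n − 1.

```python
import math
import statistics

def sample_variance(data):
    """Shortcut formula: (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(data)
    sum_x = sum(data)
    sum_x2 = sum(x * x for x in data)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

# Data from Eg 6.
data = [2, 3, 5, 6, 8]
s2 = sample_variance(data)
s = math.sqrt(s2)

print(round(s2, 2), round(s, 2))
# Cross-check against the standard library's sample variance.
print(statistics.variance(data))
```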

Eg 7: The following are the 1999 earnings (in thousands of dollars) before
taxes for all six employees of a small company.
29.50 16.20 35.45 21.35 49.70 24.60
Calculate the variance and standard deviation for these data.

$\sum x^2 = 5920.465$, $\sum x = 176.8$, $N = 6$

$$\sigma^2 = \frac{\sum x^2}{N} - \left(\frac{\sum x}{N}\right)^2 = \frac{5920.465}{6} - \left(\frac{176.8}{6}\right)^2 = 118.460$$

$$\sigma = \sqrt{118.460} = 10.88$$
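When the data cover a whole population, as in Eg 7, the divisor is N rather than n − 1. A minimal Python sketch follows; the helper name `population_variance` is an assumption for illustration, and `statistics.pvariance` serves only as a cross-check.

```python
import math
import statistics

def population_variance(data):
    """Shortcut formula: sum of x^2 / N - (sum of x / N)^2."""
    n = len(data)
    mean = sum(data) / n
    return sum(x * x for x in data) / n - mean ** 2

# Earnings (thousands of dollars) of all six employees, from Eg 7.
earnings = [29.50, 16.20, 35.45, 21.35, 49.70, 24.60]
sigma2 = population_variance(earnings)
sigma = math.sqrt(sigma2)

print(round(sigma2, 2), round(sigma, 2))
# Cross-check against the standard library's population variance.
print(statistics.pvariance(earnings))
```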
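2) For grouped data

$$\sigma^2 = \frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2 \qquad\text{and}\qquad s^2 = \frac{\sum fx^2 - \frac{(\sum fx)^2}{n}}{n-1}$$

where $\sigma^2$ = the population variance
      $s^2$ = the sample variance
      x = midpoint of a class
      $n = \sum f$

Population standard deviation: $\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2}$

Sample standard deviation: $s = \sqrt{s^2} = \sqrt{\frac{\sum fx^2 - \frac{(\sum fx)^2}{n}}{n-1}}$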

Eg 8: The following table shows the number of children per family for all
20 families in a population.

No. of children per family, x    1    2    3    4    5
Frequency, f                     3    4    8    2    3

Find the standard deviation.

Solution:
x     f     fx     fx²
1     3     3      3
2     4     8      16
3     8     24     72
4     2     8      32
5     3     15     75
$\sum f = 20$, $\sum fx = 58$, $\sum fx^2 = 198$

$$\text{St. dev} = \sqrt{\frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2} = \sqrt{\frac{198}{20} - \left(\frac{58}{20}\right)^2} = 1.22$$

Eg 9: The following data give the frequency distribution of daily
commuting times (in minutes) from home to work for all 25
employees of a company.

Daily commuting time (minutes)    Number of employees
0 – < 10                          4
10 – < 20                         9
20 – < 30                         6
30 – < 40                         4
40 – < 50                         2

Calculate the variance and standard deviation.

Solution:
x      f     fx      fx²
5      4     20      100
15     9     135     2025
25     6     150     3750
35     4     140     4900
45     2     90      4050
$\sum f = 25$, $\sum fx = 535$, $\sum fx^2 = 14825$

$$\text{Variance} = \frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2 = \frac{14825}{25} - \left(\frac{535}{25}\right)^2 = 135.04$$

$$\text{St. dev} = \sqrt{135.04} = 11.621$$
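The grouped population formula can be sketched in Python by taking each class midpoint as x. The helper names and the (lower, upper) class layout are assumptions for illustration.

```python
import math

def midpoints(classes):
    """Midpoint of each (lower, upper) class."""
    return [(lower + upper) / 2 for lower, upper in classes]

def grouped_population_variance(xs, freqs):
    """sum(f x^2) / sum(f) - (sum(f x) / sum(f))^2."""
    total_f = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(xs, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(xs, freqs))
    return sum_fx2 / total_f - (sum_fx / total_f) ** 2

# Eg 9: commuting times (minutes) for all 25 employees.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [4, 9, 6, 4, 2]

variance = grouped_population_variance(midpoints(classes), freqs)
std_dev = math.sqrt(variance)
print(round(variance, 2), round(std_dev, 3))
```

The same function applies to an ungrouped frequency table such as Eg 8 by passing the x values directly instead of midpoints.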

Eg 10: The following data give the frequency distribution of the
number of orders received during the past 50 days at the office
of a mail-order company.

No. of orders    f
10 – 12          4
13 – 15          12
16 – 18          20
19 – 21          14

Calculate the variance and standard deviation.

x      f      fx      fx²
11     4      44      484
14     12     168     2352
17     20     340     5780
20     14     280     5600
$\sum f = 50$, $\sum fx = 832$, $\sum fx^2 = 14216$

$$\text{Variance} = \frac{\sum fx^2 - \frac{(\sum fx)^2}{n}}{n-1} = \frac{14216 - \frac{(832)^2}{50}}{50-1} = 7.582$$

$$\text{St. dev} = \sqrt{7.582} = 2.75$$
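Eg 10 treats the 50 days as a sample, so the divisor is n − 1. A minimal Python sketch of the grouped sample formula follows; the function name and the data layout are assumptions for illustration.

```python
import math

def grouped_sample_variance(classes, freqs):
    """(sum(f x^2) - (sum(f x))^2 / n) / (n - 1), with x the class midpoint."""
    xs = [(lower + upper) / 2 for lower, upper in classes]
    n = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(xs, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(xs, freqs))
    return (sum_fx2 - sum_fx ** 2 / n) / (n - 1)

# Eg 10: number of orders received on each of the past 50 days.
order_classes = [(10, 12), (13, 15), (16, 18), (19, 21)]
order_freqs = [4, 12, 20, 14]

variance = grouped_sample_variance(order_classes, order_freqs)
std_dev = math.sqrt(variance)
print(round(variance, 3), round(std_dev, 2))
```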

Coefficient of Variation
➢ Useful for relative comparison.

$$CV = \frac{\text{standard deviation}}{\text{mean}} \times 100\%$$

Eg 11: Over a period of three months, the daily number of components
produced by two comparable machines was measured, giving
the following statistics.
Machine A: mean = 242.8, standard deviation = 20.5
Machine B: mean = 281.3, standard deviation = 23.0
Find the coefficient of variation of machines A and B. Comment
on the results.

$$\text{CV of Machine A} = \frac{20.5}{242.8} \times 100\% = 8.44\%$$

$$\text{CV of Machine B} = \frac{23.0}{281.3} \times 100\% = 8.18\%$$

Comment: Machine B is more stable than machine A because its CV is lower.
The higher the CV, the more variability.
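The comparison can be sketched in Python as follows. The function name `coefficient_of_variation` is an assumption for illustration. Machine B has the larger standard deviation but the smaller CV, which is why the relative measure is used for the comparison.

```python
def coefficient_of_variation(std_dev, mean):
    """CV as a percentage: standard deviation / mean * 100."""
    return std_dev / mean * 100

# Statistics from Eg 11.
cv_a = coefficient_of_variation(20.5, 242.8)
cv_b = coefficient_of_variation(23.0, 281.3)

print(round(cv_a, 2), round(cv_b, 2))
# The machine with the lower CV is relatively more stable.
more_stable = "A" if cv_a < cv_b else "B"
print(more_stable)
```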

Computer Application – Using Excel


Example
The table shows a sample of 50 final exam scores taken from last
semester's elementary statistics class.

To get some basic statistics from the data, use the following
procedure:
