NE 2207 Part 5

Uploaded by

sadiqulshiam07

Variation in the data

Consider the following two datasets:


1st dataset: 49, 50, 51
2nd dataset: 0, 50, 100
Both datasets have the same mean: 50, but the 2nd dataset has more variability. We
will discuss how to measure variability.
Measures
(a) Range
(b) Mean deviation from mean
(c) Variance and Standard deviation
Range
Range = largest value − smallest value.
Example
Data: 2, 4, 8, 5
Range = 8 – 2 = 6

• The range depends only on the two extreme values, so it gives only a rough idea of the variation.
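As a quick check of the definition, the range can be computed in a few lines of Python. This is only a minimal sketch; the function name `data_range` is an illustrative choice, not something taken from the notes.

```python
def data_range(values):
    """Range = largest value - smallest value."""
    return max(values) - min(values)

# Example from the notes
print(data_range([2, 4, 8, 5]))    # 6

# The two datasets from the start of the section: same mean, different spread
print(data_range([49, 50, 51]))    # 2
print(data_range([0, 50, 100]))    # 100
```

Only the largest and smallest values enter the calculation, which is why the range is a rough measure.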

Mean deviation from mean

$$\mathrm{MD} = \frac{1}{n} \sum_{i=1}^{n} |x_i - \bar{x}|$$

Note
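For any data set, the deviations from the mean sum to zero:

$$\sum_{i=1}^{n} (x_i - \bar{x}) = 0$$

This is why the deviations are taken in absolute value (or squared) before averaging.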

Example
Data: 2, 3, 5, 6
$\bar{x} = 4$
$$\mathrm{MD} = \frac{1}{4}\left(|2 - 4| + |3 - 4| + |5 - 4| + |6 - 4|\right) = \frac{6}{4} = 1.5$$
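The mean deviation example can be reproduced with a short Python sketch. The function name `mean_deviation` is an assumption made for illustration. The sketch also prints the plain sum of deviations, which is zero for this data set.

```python
def mean_deviation(values):
    """Average absolute deviation of the values from their mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n

data = [2, 3, 5, 6]
mean = sum(data) / len(data)          # 4.0

# Signed deviations cancel out: -2 - 1 + 1 + 2
print(sum(x - mean for x in data))    # 0.0

# Absolute deviations do not cancel: (2 + 1 + 1 + 2) / 4
print(mean_deviation(data))           # 1.5
```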

Variance
$$s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

Example
Data: 2, 3, 5, 6
$\bar{x} = 4$
$$s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{1}{4 - 1}\left((2 - 4)^2 + (3 - 4)^2 + (5 - 4)^2 + (6 - 4)^2\right) = \frac{10}{3} \approx 3.33$$
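The variance calculation can be checked in Python. The sketch below writes the formula out by hand (the helper name `sample_variance` is illustrative) and compares it with `statistics.variance` from the standard library, which also divides by n − 1.

```python
import statistics

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

data = [2, 3, 5, 6]
print(round(sample_variance(data), 2))       # 3.33
print(round(statistics.variance(data), 2))   # 3.33
```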

Note
We divide by $n - 1$ because only $n - 1$ of the deviations are free to vary (the degrees of freedom). The deviations $(x_i - \bar{x})$ always sum to zero, so if $n = 4$ and we know 3 of them, the 4th one can be calculated.
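The degrees-of-freedom argument can be illustrated with the same data set (2, 3, 5, 6). In this sketch the first three deviations are treated as known and the fourth is recovered from the fact that all deviations sum to zero. The variable names are illustrative.

```python
data = [2, 3, 5, 6]
mean = sum(data) / len(data)                # 4.0
deviations = [x - mean for x in data]       # [-2.0, -1.0, 1.0, 2.0]

# Pretend only the first three deviations are known.
known = deviations[:3]

# Because all deviations sum to zero, the fourth is forced.
fourth = -sum(known)

print(fourth)           # 2.0
print(deviations[3])    # 2.0, the same value
```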

Standard deviation (SD)


It is the positive square root of the variance and is denoted by $s$:
$$s = \sqrt{s^2}$$
Example

In the previous example, $s = \sqrt{10/3} \approx 1.83$ (using the unrounded variance $10/3 \approx 3.33$).
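For reference, the standard deviation of the same data can be obtained with `statistics.stdev`, which is the sample SD (divisor n − 1). The sketch also takes the square root of the variance by hand to show that the two agree.

```python
import math
import statistics

data = [2, 3, 5, 6]

s = statistics.stdev(data)                           # sample SD, divisor n - 1
s_by_hand = math.sqrt(statistics.variance(data))     # square root of the variance

print(round(s, 2))            # 1.83
print(round(s_by_hand, 2))    # 1.83
```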


Note

Range, mean deviation, variance and SD cannot be negative.

Empirical Rule
If a distribution (histogram) appears to be symmetric and bell-shaped, we expect
that approximately
• 68% of the data values will fall in the interval $(\bar{x} - s, \bar{x} + s)$
(within one standard deviation of the sample mean)
• 95% of the data values will fall in the interval $(\bar{x} - 2s, \bar{x} + 2s)$
(within two standard deviations of the sample mean)
• 99.7% of the data values will fall in the interval $(\bar{x} - 3s, \bar{x} + 3s)$
(within three standard deviations of the sample mean)
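The rule can be checked roughly by simulation. The sketch below is an illustration under assumptions not in the notes: it draws 10,000 values from a normal distribution with mean 60 and SD 10 (an arbitrary choice of bell-shaped data) using a fixed seed, then counts the share of values within 1, 2 and 3 sample standard deviations of the sample mean. The helper name `fraction_within` is hypothetical.

```python
import random
import statistics

def fraction_within(values, k):
    """Share of values lying within k sample SDs of the sample mean."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    inside = sum(1 for x in values if mean - k * s < x < mean + k * s)
    return inside / len(values)

random.seed(0)  # fixed seed so the run is repeatable
sample = [random.gauss(60, 10) for _ in range(10_000)]

for k in (1, 2, 3):
    # Expect values close to 0.68, 0.95 and 0.997 respectively
    print(k, round(fraction_within(sample, k), 3))
```

The printed shares will not match 68%, 95% and 99.7% exactly, since the rule is an approximation and the sample is random.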
Example
Let the mean and SD of the commuting time (in minutes) of workers be 60 and 10,
respectively, and let the histogram be roughly symmetric and bell-shaped. We
then have:
$\bar{x} - s = 60 - 10 = 50$
$\bar{x} + s = 60 + 10 = 70$
Approximately 68% of workers have a commuting time between 50 and 70 minutes.
$\bar{x} - 2s = 60 - 2 \times 10 = 40$
$\bar{x} + 2s = 60 + 2 \times 10 = 80$
Approximately 95% of workers have a commuting time between 40 and 80 minutes.
$\bar{x} - 3s = 60 - 3 \times 10 = 30$
$\bar{x} + 3s = 60 + 3 \times 10 = 90$
Approximately 99.7% of workers have a commuting time between 30 and 90 minutes.
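The three intervals in this example follow directly from the mean and SD, so they can be tabulated with a small helper. This is a sketch only: the function name `empirical_intervals` is an assumption, and the percentages are the approximate 68%, 95% and 99.7% figures of the empirical rule.

```python
def empirical_intervals(mean, sd):
    """Map k = 1, 2, 3 to (lower limit, upper limit, approximate % of data)."""
    coverage = {1: 68.0, 2: 95.0, 3: 99.7}  # empirical-rule percentages
    return {k: (mean - k * sd, mean + k * sd, pct) for k, pct in coverage.items()}

# Commuting-time example: mean 60 minutes, SD 10 minutes
for k, (low, high, pct) in empirical_intervals(60, 10).items():
    print(f"about {pct}% of workers: between {low} and {high} minutes")
```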

[Figure: commuting-time axis marked at 30, 40, 50, 60, 70, 80 and 90 minutes]

Coefficient of variation (CV)


Suppose there are two datasets, the first containing small values and the second
containing large values. Comparing their standard deviations directly may be misleading. To
compare their variability, we should use the CV, which expresses the SD as a percentage of the mean:
$$\mathrm{CV} = \frac{s}{\bar{x}} \times 100$$
Example
The SD of a particular type of 10-mg tablets is 1 mg, while the SD of a particular
type of 50-mg tablets is 2 mg. Which type of tablets has more variability?
Solution
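Taking the nominal tablet weight as the mean:
For 10-mg tablets
$$\mathrm{CV}(1) = \frac{s}{\bar{x}} \times 100 = \frac{1}{10} \times 100 = 10$$
For 50-mg tablets
$$\mathrm{CV}(2) = \frac{s}{\bar{x}} \times 100 = \frac{2}{50} \times 100 = 4$$
Therefore, the 10-mg tablets have more variability relative to their mean (a CV of 10% versus 4%), even though their SD is smaller.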
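The tablet comparison can be reproduced with a short Python sketch. The function name `coefficient_of_variation` is illustrative, and, as in the worked solution, the nominal tablet weight is assumed to be the mean.

```python
def coefficient_of_variation(sd, mean):
    """CV = (SD / mean) x 100, i.e. the SD as a percentage of the mean."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100 * sd / mean

cv_10mg = coefficient_of_variation(sd=1, mean=10)
cv_50mg = coefficient_of_variation(sd=2, mean=50)

print(cv_10mg)   # 10.0
print(cv_50mg)   # 4.0
print("10-mg tablets vary more" if cv_10mg > cv_50mg else "50-mg tablets vary more")
```

The 50-mg tablets have the larger SD (2 mg against 1 mg) but the smaller CV, which is the point of the comparison.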
