Lecture Notes 4

This document discusses measures of variation that are used in addition to measures of central tendency to more accurately describe data sets. It defines key measures of dispersion including range, variance, standard deviation, and coefficient of variation. Formulas are provided for calculating population and sample variance and standard deviation using both ungrouped and grouped data. Examples are included to demonstrate calculating these measures of variation.

Uploaded by

mi5180907
Copyright
© © All Rights Reserved

Lecture 4

3–2 Measures of Variation:

To describe a data set accurately, statisticians must know more than the measures of central tendency. Although location is generally considered the most important single characteristic of a distribution, the variability or dispersion of the values is also very important.
The mean or median alone gives an inadequate description of a distribution, since the values themselves may be spread over a wide range around it. A description is therefore best presented using a measure of central tendency together with a measure of dispersion.
For normally distributed data, the mean and standard deviation are the most appropriate pair; for non-normally distributed data, the median and interquartile range are more appropriate.
Measures of dispersion include:

- Range.
- Variance.
- Standard deviation.
- Coefficient of variation.

[Diagram: Descriptive statistics, for ungrouped and grouped data, comprise measures of central tendency, measures of variation, and measures of position.]

Range:

The range is the highest value minus the lowest value. The symbol R is used for the range.

R = highest value - lowest value

PHM111s - Probability and Statistics


Example 4–1: The salaries for the staff of the XYZ Manufacturing Co. are shown here. Find the range.

Staff                   Salary
Owner                   $100,000
Manager                   40,000
Sales representative      30,000
Workers                   25,000
                          15,000
                          18,000

Solution
The range is R = $100,000 - $15,000 = $85,000.

Population Variance and Standard Deviation

The variance is the average of the squares of the distance each value is from the mean.
The symbol for the population variance is $\sigma^2$.

The formula for the population variance is

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$

where
X = individual value
µ = population mean
N = population size
The standard deviation is the square root of the variance. The symbol for the population standard deviation is $\sigma$.
The corresponding formula for the population standard deviation is

$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (X - \mu)^2}{N}}$$

Example 4–2:

Find the variance and standard deviation for brand A paint.


10, 60, 50, 30, 40, 20

Solution
Step 1 Find the mean for the data.

$$\mu = \frac{\sum X}{N} = \frac{10 + 60 + 50 + 30 + 40 + 20}{6} = \frac{210}{6} = 35$$



Step 2 Subtract the mean from each data value.
10 - 35 = -25    50 - 35 = +15    40 - 35 = +5
60 - 35 = +25    30 - 35 = -5     20 - 35 = -15
Step 3 Square each result.
(-25)² = 625     (+15)² = 225     (+5)² = 25
(+25)² = 625     (-5)² = 25       (-15)² = 225
Step 4 Find the sum of the squares.
625 + 625 + 225 + 25 + 25 + 225 = 1750
Step 5 Divide the sum by N to get the variance.
Variance = 1750 ÷ 6 = 291.7
Step 6 Take the square root of the variance to get the standard deviation. Hence, the standard deviation equals $\sqrt{291.7}$, or 17.1. It is helpful to make a table.

A            B          C
Values X     X − µ      (X − µ)²
10           -25        625
60           +25        625
50           +15        225
30           -5         25
40           +5         25
20           -15        225
                        Sum = 1750

Column A contains the raw data X. Column B contains the differences X − µ obtained in step 2. Column C
contains the squares of the differences obtained in step 3.
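
The six steps of Example 4–2 can be checked with a short script. This is only a minimal sketch in Python; the function name `population_variance` is an illustrative choice and does not come from the notes.

```python
import math

def population_variance(values):
    """Average squared distance from the population mean (divides by N)."""
    n = len(values)
    mu = sum(values) / n                      # Step 1: mean
    squared = [(x - mu) ** 2 for x in values] # Steps 2-3: deviations, squared
    return sum(squared) / n                   # Steps 4-5: sum, divide by N

brand_a = [10, 60, 50, 30, 40, 20]            # data from Example 4-2
variance = population_variance(brand_a)
std_dev = math.sqrt(variance)                 # Step 6: square root

print(round(variance, 1), round(std_dev, 1))  # 291.7 17.1
```

Python's standard library offers `statistics.pvariance` and `statistics.pstdev` for the same population quantities.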

Sample Variance and Standard Deviation

Case 1: Ungrouped Data

The formula for the sample variance, denoted by $s^2$, is

$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$

The symbol for the sample standard deviation is s. The corresponding formula is

$$s = \sqrt{s^2} = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$

where
X = individual value
$\bar{X}$ = sample mean
n = sample size
The shortcut formulas for computing the variance and standard deviation for data obtained from samples are as
follows:
Variance:

$$s^2 = \frac{n(\sum X^2) - (\sum X)^2}{n(n - 1)}$$

Standard deviation:

$$s = \sqrt{\frac{n(\sum X^2) - (\sum X)^2}{n(n - 1)}}$$

Example 4–3: Find the sample variance and standard deviation for the amount of European auto sales for a sample
of 6 years shown. The data are in millions of dollars.
11.2, 11.9, 12.0, 12.8, 13.4, 14.3
Solution
Step 1 Find the sum of the values.
∑ X = 11.2 + 11.9 + 12.0 + 12.8 + 13.4 + 14.3 = 75.6
Step 2 Square each value and find the sum.
$$\sum X^2 = 11.2^2 + 11.9^2 + 12.0^2 + 12.8^2 + 13.4^2 + 14.3^2 = 958.94$$
Step 3 Substitute in the formulas and solve.
$$s^2 = \frac{n(\sum X^2) - (\sum X)^2}{n(n - 1)} = \frac{6(958.94) - 75.6^2}{6(6 - 1)} = \frac{38.28}{30} = 1.276$$
The variance is 1.28 rounded.
$$s = \sqrt{1.28} = 1.13$$
Hence, the sample standard deviation is 1.13.
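
As a cross-check of Example 4–3, the shortcut formula can be compared with the definition-based sample variance. The sketch below is illustrative only; `sample_variance_shortcut` is a hypothetical helper name, and `statistics.variance` from the Python standard library is used for comparison because it divides by n − 1.

```python
import math
import statistics

def sample_variance_shortcut(values):
    """Shortcut formula: [n * sum(X^2) - (sum X)^2] / [n * (n - 1)]."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (n * sum_x2 - sum_x ** 2) / (n * (n - 1))

sales = [11.2, 11.9, 12.0, 12.8, 13.4, 14.3]   # data from Example 4-3
s2 = sample_variance_shortcut(sales)
s = math.sqrt(s2)

# The definition-based formula gives the same value up to rounding error.
s2_definition = statistics.variance(sales)

print(round(s2, 3), round(s, 2))
```

Both routes give a variance of about 1.276 and a standard deviation of about 1.13, matching the hand calculation.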

Case 2: Grouped Data

Step 1 Make a table as shown, and find the midpoint of each class.

A        B              C               D          E
Class    Frequency f    Midpoint X_m    f · X_m    f · X_m²

Step 2 Multiply the frequency by the midpoint for each class, and place the products in column D.

Step 3 Multiply the frequency by the square of the midpoint, and place the products in column E.

Step 4 Find the sums of columns B, D, and E. (The sum of column B is n. The sum of column D is $\sum f \cdot X_m$. The sum of column E is $\sum f \cdot X_m^2$.)
Step 5 Substitute in the formula and solve to get the variance.

$$s^2 = \frac{n(\sum f \cdot X_m^2) - (\sum f \cdot X_m)^2}{n(n - 1)}$$

Step 6 Take the square root to get the standard deviation.



Example 4–4 Find the variance and the standard deviation for the frequency distribution of the following data:
(data represent the number of miles that 20 runners ran during one week).
Class Frequency Midpoint
5.5–10.5 1 8
10.5–15.5 2 13
15.5–20.5 3 18
20.5–25.5 5 23
25.5–30.5 4 28
30.5–35.5 3 33
35.5–40.5 2 38
Solution

Step 1 Make a table as shown, and find the midpoint of each class.
A            B              C               D          E
Class        Frequency f    Midpoint X_m    f · X_m    f · X_m²
5.5–10.5     1              8
10.5–15.5    2              13
15.5–20.5    3              18
20.5–25.5    5              23
25.5–30.5    4              28
30.5–35.5    3              33
35.5–40.5    2              38

Step 2 Multiply the frequency by the midpoint for each class, and place the products in column D.
(1)(8) = 8 (2)(13) = 26 . . . (2)(38) = 76

Step 3 Multiply the frequency by the square of the midpoint, and place the products in column E.
(1)(8)² = 64 (2)(13)² = 338 . . . (2)(38)² = 2888

Step 4 Find the sums of columns B, D, and E. The sum of column B is n, the sum of column D is $\sum f \cdot X_m$, and the sum of column E is $\sum f \cdot X_m^2$. The completed table is shown.

A            B              C               D          E
Class        Frequency f    Midpoint X_m    f · X_m    f · X_m²
5.5–10.5     1              8               8          64
10.5–15.5    2              13              26         338
15.5–20.5    3              18              54         972
20.5–25.5    5              23              115        2,645
25.5–30.5    4              28              112        3,136
30.5–35.5    3              33              99         3,267
35.5–40.5    2              38              76         2,888

n = 20, $\sum f \cdot X_m = 490$, $\sum f \cdot X_m^2 = 13{,}310$



Step 5 Substitute in the formula and solve for $s^2$ to get the variance.

$$s^2 = \frac{n(\sum f \cdot X_m^2) - (\sum f \cdot X_m)^2}{n(n - 1)} = \frac{20(13{,}310) - 490^2}{20(20 - 1)} = \frac{266{,}200 - 240{,}100}{20(19)} = \frac{26{,}100}{380} = 68.7$$

Step 6 Take the square root to get the standard deviation.

$$s = \sqrt{68.7} = 8.3$$
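
The grouped-data procedure of Example 4–4 can be sketched in Python as follows. The function name `grouped_sample_variance` is a hypothetical choice for illustration; the frequencies and midpoints are those of the example.

```python
import math

def grouped_sample_variance(frequencies, midpoints):
    """Shortcut formula for grouped data, using class midpoints X_m."""
    n = sum(frequencies)                                              # column B
    sum_fx = sum(f * x for f, x in zip(frequencies, midpoints))       # column D
    sum_fx2 = sum(f * x * x for f, x in zip(frequencies, midpoints))  # column E
    return (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))

frequencies = [1, 2, 3, 5, 4, 3, 2]
midpoints = [8, 13, 18, 23, 28, 33, 38]

s2_grouped = grouped_sample_variance(frequencies, midpoints)
s_grouped = math.sqrt(s2_grouped)

print(round(s2_grouped, 1), round(s_grouped, 1))  # 68.7 8.3
```

Because every value in a class is represented by its midpoint, the result is an approximation to the variance of the raw data.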

Coefficient of Variation

Whenever two samples have the same units of measure, the variance and standard deviation for each can be compared directly. A statistic that allows you to compare standard deviations when the units are different is called the coefficient of variation.
The coefficient of variation, denoted by CVar, is the standard deviation divided by the mean. The result is expressed as a percentage.

For samples,

$$\text{CVar} = \frac{s}{\bar{X}} \cdot 100\%$$

For populations,

$$\text{CVar} = \frac{\sigma}{\mu} \cdot 100\%$$

Example 4–5 The mean of the number of sales of cars over a 3-month period is 87, and the standard deviation is 5.
The mean of the commissions is $5225, and the standard deviation is $773. Compare the variations
of the two.

Solution

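The coefficients of variation are

$$\text{CVar} = \frac{s}{\bar{X}} \cdot 100\% = \frac{5}{87} \cdot 100\% = 5.7\% \quad \text{(sales)}$$

$$\text{CVar} = \frac{s}{\bar{X}} \cdot 100\% = \frac{773}{5225} \cdot 100\% = 14.8\% \quad \text{(commissions)}$$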
Since the coefficient of variation is larger for commissions, the commissions are more variable than the sales.
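
The comparison in Example 4–5 amounts to two divisions. A minimal sketch, with `coefficient_of_variation` as an illustrative function name:

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    return std_dev / mean * 100

cv_sales = coefficient_of_variation(5, 87)           # number of cars sold
cv_commissions = coefficient_of_variation(773, 5225) # dollars

print(round(cv_sales, 1), round(cv_commissions, 1))  # 5.7 14.8
```

The units cancel in each ratio, which is why a count of cars and a dollar amount can be compared on the same scale.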



3–3 Measures of Position

[Diagram: Descriptive statistics comprise measures of central tendency, measures of variation, and measures of position.]

Standard Scores

A z score or standard score for a value is obtained by subtracting the mean from the value and dividing the result
by the standard deviation. The symbol for a standard score is z. The formula is
$$z = \frac{\text{value} - \text{mean}}{\text{standard deviation}}$$

For samples, the formula is

$$z = \frac{X - \bar{X}}{s}$$

For populations, the formula is

$$z = \frac{X - \mu}{\sigma}$$
The z score represents the number of standard deviations that a data value falls above or below the mean.
Example 4–6: A student scored 65 on a calculus test that had a mean of 50 and a standard deviation of 10; she
scored 30 on a history test with a mean of 25 and a standard deviation of 5. Compare her relative
positions on the two tests.
Solution
First, find the z scores. For calculus the z score is

$$z = \frac{X - \bar{X}}{s} = \frac{65 - 50}{10} = 1.5$$

For history the z score is

$$z = \frac{30 - 25}{5} = 1.0$$
Since the z score for calculus is larger, her relative position in the calculus class is higher than her relative position
in the history class.

When all data for a variable are transformed into z scores, the resulting distribution will have a mean of 0 and a
standard deviation of 1. A z score, then, is actually the number of standard deviations each value is from the mean
for a specific distribution.
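
Both the comparison in Example 4–6 and the claim that standardized data have mean 0 and standard deviation 1 can be checked numerically. This is a sketch only: `z_score` is an illustrative name, and reusing the brand A paint data from Example 4–2 as a population is an assumption made for the demonstration.

```python
import statistics

def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies above or below the mean."""
    return (value - mean) / std_dev

z_calculus = z_score(65, 50, 10)
z_history = z_score(30, 25, 5)
print(z_calculus, z_history)  # 1.5 1.0

# Standardize a whole data set, treated here as a population.
data = [10, 60, 50, 30, 40, 20]
mu = statistics.fmean(data)
sigma = statistics.pstdev(data)
z_values = [z_score(x, mu, sigma) for x in data]

# Up to floating-point rounding, the z scores have mean 0 and std deviation 1.
z_mean = statistics.fmean(z_values)
z_std = statistics.pstdev(z_values)
```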
Percentiles
Percentiles divide the data set into 100 equal groups.
Percentile Formula
The percentile corresponding to a given value X is computed by using the following formula:

$$\text{Percentile} = \frac{(\text{number of values below } X) + 0.5}{\text{total number of values}} \cdot 100\%$$

The position c of the value corresponding to a given percentile, counting up from the lowest value in the ordered data, is

$$c = \frac{n \cdot p}{100}$$

where
n = total number of values
p = percentile
Example 4–7: A teacher gives a 20-point test to 10 students. The scores are shown here. Find the percentile rank
of a score of 12. Also find the value corresponding to the 25th percentile.
18, 15, 12, 6, 8, 2, 3, 5, 20, 10
Solution:
Arrange the data in order from lowest to highest.
2, 3, 5, 6, 8, 10, 12, 15, 18, 20
Then substitute into the formula.

$$\text{Percentile} = \frac{(\text{number of values below } X) + 0.5}{\text{total number of values}} \cdot 100\%$$

Since there are six values below a score of 12, the solution is

$$\text{Percentile} = \frac{6 + 0.5}{10} \cdot 100\% = 65\text{th percentile}$$

Thus, a student whose score was 12 did better than 65% of the class.
For the 25th percentile,

$$c = \frac{(10)(25)}{100} = 2.5 \Rightarrow c = 3$$

(Note: If c is not a whole number, round it up to the next whole number as in this example.)
Hence, counting up from the lowest value, the 3rd value, 5, corresponds to the 25th percentile.
Example 4–8: Using the data set in the previous Example, find the value that corresponds to the 60th percentile.
Solution
Arrange the data in order from smallest to largest.
2, 3, 5, 6, 8, 10, 12, 15, 18, 20
Substitute in the formula.
$$c = \frac{n \cdot p}{100} = \frac{(10)(60)}{100} = 6$$

If c is a whole number, use the value halfway between the cth and (c + 1)th values when counting up from the lowest value. In this case, these are the 6th and 7th values.
2, 3, 5, 6, 8, 10, 12, 15, 18, 20 (6th value = 10, 7th value = 12)

The value halfway between 10 and 12 is 11. Find it by adding the two values and dividing by 2.

$$\frac{10 + 12}{2} = 11$$
Hence, 11 corresponds to the 60th percentile. Anyone scoring 11 would have done better than 60% of the class.
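
The two percentile rules used in Examples 4–7 and 4–8 can be written as a pair of small functions. The names `percentile_rank` and `value_at_percentile` are hypothetical, and the sketch assumes 0 < p < 100 so that the (c + 1)th value exists.

```python
import math

def percentile_rank(values, x):
    """Percentile of x: (number of values below x + 0.5) / n * 100."""
    below = sum(1 for v in values if v < x)
    return (below + 0.5) * 100 / len(values)

def value_at_percentile(values, p):
    """Value at the p-th percentile using c = n * p / 100 (assumes 0 < p < 100)."""
    ordered = sorted(values)
    c = len(ordered) * p / 100
    if c == int(c):
        # Whole number: average the c-th and (c + 1)-th values (1-based).
        c = int(c)
        return (ordered[c - 1] + ordered[c]) / 2
    # Not a whole number: round up and take that value (1-based).
    return ordered[math.ceil(c) - 1]

scores = [18, 15, 12, 6, 8, 2, 3, 5, 20, 10]
rank_12 = percentile_rank(scores, 12)
p25 = value_at_percentile(scores, 25)
p60 = value_at_percentile(scores, 60)

print(rank_12, p25, p60)  # 65.0 5 11.0
```

Statistical software often uses other interpolation rules for percentiles, so results from a library may differ slightly from this textbook method.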
Quartiles and Deciles

Case 1: Ungrouped Data

Quartiles divide the distribution into four groups, separated by Q1, Q2, Q3.
Note that Q1 is the same as the 25th percentile; Q2 is the same as the 50th percentile,
or the median; Q3 corresponds to the 75th percentile, as shown:

Example 4–9: Find Q1, Q2, and Q3 for the data set 15, 13, 6, 5, 12, 50, 22, 18.

Solution

Step 1 Arrange the data in order.


5, 6, 12, 13, 15, 18, 22, 50
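Step 2 Find the median (Q2).
5, 6, 12, 13, 15, 18, 22, 50
The median (MD) lies between the 4th and 5th values, 13 and 15.

$$Q_2 = \text{MD} = \frac{13 + 15}{2} = 14$$

Step 3 Find the median of the data values less than 14.
5, 6, 12, 13
Q1 lies between 6 and 12.

$$Q_1 = \frac{6 + 12}{2} = 9$$

So Q1 is 9.
Step 4 Find the median of the data values greater than 14.
15, 18, 22, 50
Q3 lies between 18 and 22.

$$Q_3 = \frac{18 + 22}{2} = 20$$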
Here Q3 is 20. Hence, Q1 = 9, Q2 = 14, and Q3 = 20.

In addition to dividing the data set into four groups, quartiles can be used as a rough measurement of variability.
The interquartile range (IQR) is defined as the difference between Q3 and Q1 and is the range of the middle 50% of the data.
IQR = Q3 - Q1
The interquartile range is used to identify outliers, and it is also used as a measure of variability in exploratory data
analysis.
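
The median-of-halves procedure from Example 4–9 can be sketched as below. The function name `quartiles_by_halves` is illustrative, and the sketch assumes the data set is large enough that values exist on both sides of the median.

```python
import statistics

def quartiles_by_halves(values):
    """Q1, Q2, Q3 as in Example 4-9: the median, then the medians of the
    values below it and of the values above it."""
    ordered = sorted(values)
    q2 = statistics.median(ordered)
    lower = [v for v in ordered if v < q2]
    upper = [v for v in ordered if v > q2]
    return statistics.median(lower), q2, statistics.median(upper)

data = [15, 13, 6, 5, 12, 50, 22, 18]
q1, q2, q3 = quartiles_by_halves(data)
iqr = q3 - q1

print(q1, q2, q3, iqr)  # 9.0 14.0 20.0 11.0
```

Library routines such as `statistics.quantiles` use interpolation conventions that can give slightly different quartiles from this hand method.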



Midhinge: Quartiles are also useful in the development of a measure of location that is called the midhinge which
is the mean of the first and third quartiles in a set of data:

$$\text{Midhinge} = \frac{Q_1 + Q_3}{2}$$
Deciles divide the distribution into 10 groups, as shown. They are denoted by D1, D2, etc.

Note that D1 corresponds to P10; D2 corresponds to P20; etc. Deciles can be found by using the formulas given for
percentiles. Taken altogether then, these are the relationships among percentiles, deciles, and quartiles.
Deciles are denoted by D1, D2, D3, . . . , D9, and they correspond to P10, P20, P30, . . . , P90.
Quartiles are denoted by Q1, Q2, Q3 and they correspond to P25, P50, P75.
The median is the same as P50 or Q2 or D5.

Case 2: Grouped Data

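Using the same method of calculation as for the median, the equations for Q1 and Q3 are as follows:

$$Q_1 = L_{Q_1} + i \left( \frac{\frac{n}{4} - F}{f_{Q_1}} \right), \qquad Q_3 = L_{Q_3} + i \left( \frac{\frac{3n}{4} - F}{f_{Q_3}} \right)$$

where L is the lower boundary of the quartile class, i is the class width, F is the cumulative frequency of the classes before the quartile class, f is the frequency of the quartile class, and n is the total number of values.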

Example 4–10: Based on the grouped data below, find the interquartile range (IQR).
Time to travel to work f

1 – 10 8
11 – 20 14
21 – 30 12
31 – 40 9
41 – 50 7
Solution

Construct the cumulative frequency distribution

Time to travel to work    f     cf
1 – 10                    8     8
11 – 20                   14    22
21 – 30                   12    34
31 – 40                   9     43
41 – 50                   7     50



Class Q1: n/4 = 50/4 = 12.5 → class Q1 is the 2nd class

$$Q_1 = L_{Q_1} + i \left( \frac{\frac{n}{4} - F}{f_{Q_1}} \right) = 10.5 + 10 \left( \frac{12.5 - 8}{14} \right) = 13.7143$$

Class Q3: 3n/4 = 3(50)/4 = 37.5 → class Q3 is the 4th class

$$Q_3 = L_{Q_3} + i \left( \frac{\frac{3n}{4} - F}{f_{Q_3}} \right) = 30.5 + 10 \left( \frac{37.5 - 34}{9} \right) = 34.3889$$

IQR = Q3 - Q1 = 34.3889 - 13.7143 = 20.6746
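
The grouped-data calculation of Example 4–10 can be reproduced with the sketch below. The function name `grouped_quantile` is hypothetical. The lower class boundaries 0.5, 10.5, 20.5, 30.5, 40.5 are an assumption, extended from the boundaries 10.5 and 30.5 that the worked solution uses.

```python
def grouped_quantile(lower_boundaries, frequencies, width, fraction):
    """L + i * (fraction * n - F) / f for the class containing the quantile."""
    n = sum(frequencies)
    target = fraction * n          # n/4 for Q1, 3n/4 for Q3
    cumulative = 0                 # F: cumulative frequency before the class
    for lower, f in zip(lower_boundaries, frequencies):
        if cumulative + f >= target:
            return lower + width * (target - cumulative) / f
        cumulative += f
    raise ValueError("fraction must lie between 0 and 1")

# Assumed class boundaries for the classes 1-10, 11-20, ..., 41-50.
boundaries = [0.5, 10.5, 20.5, 30.5, 40.5]
frequencies = [8, 14, 12, 9, 7]

q1_grouped = grouped_quantile(boundaries, frequencies, 10, 0.25)
q3_grouped = grouped_quantile(boundaries, frequencies, 10, 0.75)
iqr_grouped = q3_grouped - q1_grouped

print(round(q1_grouped, 4), round(q3_grouped, 4), round(iqr_grouped, 4))
# 13.7143 34.3889 20.6746
```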

Outliers
An outlier is an extremely high or an extremely low data value when compared with the rest of the data values.

Example 4–11: Check the following data set for outliers.


5, 6, 12, 13, 15, 18, 22, 50

Solution

The data value 50 is extremely suspect. These are the steps in checking for an outlier.
Step 1 Find Q1 and Q3. From the previous example, Q1 is 9 and Q3 is 20.
Step 2 Find the interquartile range (IQR), which is Q3 - Q1.
IQR = Q3 - Q1 = 20 - 9 = 11

Step 3 Multiply this value by 1.5.


1.5(11) = 16.5

Step 4 Subtract the value obtained in step 3 from Q1, and add the value obtained in step 3 to Q3.
9 - 16.5 = -7.5 and 20 + 16.5 = 36.5

Step 5 Check the data set for any data values that fall outside the interval from -7.5 to 36.5. The value 50 is outside
this interval; hence, it can be considered an outlier.
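
Steps 2 to 5 of Example 4–11 can be expressed compactly in code. This is a minimal sketch; `iqr_outliers` is an illustrative name, and Q1 and Q3 are passed in as already computed in Example 4–9.

```python
def iqr_outliers(values, q1, q3):
    """Return the interval [Q1 - 1.5*IQR, Q3 + 1.5*IQR] and the values outside it."""
    iqr = q3 - q1
    low = q1 - 1.5 * iqr
    high = q3 + 1.5 * iqr
    outliers = [v for v in values if v < low or v > high]
    return low, high, outliers

data = [5, 6, 12, 13, 15, 18, 22, 50]
low, high, outliers = iqr_outliers(data, q1=9, q3=20)

print(low, high, outliers)  # -7.5 36.5 [50]
```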



3–4 Exploratory Data Analysis

A boxplot is a graph of a data set obtained by drawing a horizontal line from the minimum data value to Q1,
drawing a horizontal line from Q3 to the maximum data value, and drawing a box whose vertical sides pass
through Q1 and Q3 with a vertical line inside the box passing through the median or Q2.

Example 4–12: The number of meteorites found in 10 states of the United States is 89, 47, 164, 296, 30, 215, 138,
78, 48, 39. Construct a boxplot for the data.

Solution

Step 1 Arrange the data in order:


30, 39, 47, 48, 78, 89, 138, 164, 215, 296
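Step 2 Find the median.
30, 39, 47, 48, 78, 89, 138, 164, 215, 296
The median lies between the 5th and 6th values, 78 and 89.

$$Q_2 = \text{MD} = \frac{78 + 89}{2} = 83.5$$

Step 3 Find Q1, the median of the values below the median.
30, 39, 47, 48, 78
Q1 = 47
Step 4 Find Q3, the median of the values above the median.
89, 138, 164, 215, 296
Q3 = 164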
Step 5 Draw a scale for the data on the x axis.

Step 6 Locate the lowest value, Q1, the median, Q3, and the highest value on the scale.

Step 7 Draw a box around Q1 and Q3, draw a vertical line through the median, and connect the upper value and the lower value to the box as in the figure.
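
The five values a boxplot needs for the meteorite data can be computed as in the sketch below. The function name `five_number_summary` is illustrative, and the quartiles are found by the same median-of-halves method as in the worked steps; no plotting library is assumed.

```python
import statistics

def five_number_summary(values):
    """Lowest value, Q1, median, Q3, highest value."""
    ordered = sorted(values)
    median = statistics.median(ordered)
    lower_half = [v for v in ordered if v < median]
    upper_half = [v for v in ordered if v > median]
    return (ordered[0], statistics.median(lower_half), median,
            statistics.median(upper_half), ordered[-1])

meteorites = [89, 47, 164, 296, 30, 215, 138, 78, 48, 39]
summary = five_number_summary(meteorites)

print(summary)  # (30, 47, 83.5, 164, 296)
```

The box spans Q1 to Q3 (47 to 164), the line inside the box sits at the median (83.5), and the whiskers extend to 30 and 296.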
