Lecture 2 PDF
• Measures of Variability
Lecture 2 1
MEASURES OF CENTRAL TENDENCY (Measures of Location)
• the mean
• the median
• the mode.
We will be measuring these for samples drawn from populations, as well as for
grouped and ungrouped data.
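The arithmetic mean or average of a population is represented by μ (the Greek
letter mu); and for a sample, by $\bar{X}$ (read “X bar”).
For ungrouped data, $\bar{X} = \frac{\sum X}{n}$. For grouped data with class marks $X$ and
frequencies $f$, $\bar{X} = \frac{\sum fX}{\sum f}$.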
The median $\tilde{X}$ (read “X tilde”) for ungrouped data is the value of the middle item
when all the items are arranged in either ascending or descending order of value.
For grouped data, the median is found by interpolating within the class that contains it:
$$\tilde{X} = L + \frac{\frac{n}{2} - F}{f_m}\, c$$
where
$L$ = the lower class boundary of the class containing the median
$n$ = the total number of data
$F$ = the cumulative frequency of the classes before the median class
$f_m$ = the frequency of the median class
$c$ = class interval size
The mode $\hat{X}$ (read “X hat”) for ungrouped data is the value that occurs most
frequently in the data set.
For grouped data, the mode is found by interpolating within the modal class:
$$\hat{X} = L + \frac{d_1}{d_1 + d_2}\, c$$
where
$L$ = lower class boundary of the modal class
$d_1$ = frequency of the modal class minus the frequency of the previous class
$d_2$ = frequency of the modal class minus the frequency of the following class
$c$ = class interval size
EXAMPLE : Find the mean, median and mode of the given set of numbers:
5, 4, 6, 8, 7, 2, 9, 4, 12
The mean is the sum of the values divided by their number: $\bar{X} = 57 / 9 \approx 6.33$.
The median for ungrouped data is the value of the middle item when all the items
are arranged in order, so we arrange the data in ascending order:
2, 4, 4, 5, 6, 7, 8, 9, 12
With $n = 9$ items, the middle item is the 5th. Therefore $\tilde{X} = 6$.
The mode for ungrouped data is the value that occurs most frequently in the data
set. Then $\hat{X} = 4$.
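The example above can be checked with a minimal Python sketch using the standard `statistics` module. The variable names `x_bar`, `x_tilde` and `x_hat` are illustrative choices mirroring the "X bar", "X tilde" and "X hat" notation, not part of the lecture.

```python
from statistics import mean, median, mode

data = [5, 4, 6, 8, 7, 2, 9, 4, 12]

x_bar = mean(data)      # sum of the values divided by n, here 57 / 9
x_tilde = median(data)  # middle (5th) value of the sorted data
x_hat = mode(data)      # most frequently occurring value

print(sorted(data))
print(round(x_bar, 2), x_tilde, x_hat)
```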
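EXAMPLE : Find the mean, median and mode of the given set of numbers:
12, 5, 3, 6, 4, 9, 7, 6, 9, 11
The mean is the sum of the values divided by their number: $\bar{X} = 72 / 10 = 7.2$.
The median for ungrouped data is the value of the middle item when all the items
are arranged in order, so we arrange the data in ascending order:
3, 4, 5, 6, 6, 7, 9, 9, 11, 12
With $n = 10$ items there are two middle items, the 5th and 6th, and the median is their
average. Therefore $\tilde{X} = (6 + 7) / 2 = 6.5$.
The mode for ungrouped data is the value that occurs most frequently in the data
set. Both 6 and 9 occur twice, so the data set has two modes: $\hat{X} = 6$ and $\hat{X} = 9$.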
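This second data set has an even number of items and two equally frequent values. A short Python sketch, assuming Python 3.8 or later for `statistics.multimode`, shows how both cases can be handled; the variable names are illustrative.

```python
from statistics import mean, median, multimode

data = [12, 5, 3, 6, 4, 9, 7, 6, 9, 11]

x_bar = mean(data)       # 72 / 10
x_tilde = median(data)   # even n: average of the two middle values
modes = multimode(data)  # every value that shares the highest frequency

print(sorted(data))
print(x_bar, x_tilde, modes)
```

`multimode` returns a list, so a data set with a single mode gives a one-element list rather than a bare number.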
EXAMPLE: The following table shows the distribution of the marks of a course.
A) Determine the mean, mode and median of the marks.
B) Determine the number of students whose marks are
I) less than 57.
II) between 58 and 72.
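Marks      No. of students (f)   Class Boundary   Class Mark (X)   f·X
30 – 39     2                    29.5 – 39.5      34.5               69
40 – 49    10                    39.5 – 49.5      44.5              445
50 – 59     9                    49.5 – 59.5      54.5              490.5
60 – 69    22                    59.5 – 69.5      64.5             1419
70 – 79    14                    69.5 – 79.5      74.5             1043     (median class)
80 – 89    32                    79.5 – 89.5      84.5             2704     (modal class)
90 – 99     3                    89.5 – 99.5      94.5              283.5
Total      92                                                      6454
A) Mean: 6454 / 92 ≈ 70.15.
Median: n/2 = 46 and the cumulative frequency first reaches 46 in the class 70 – 79
(43 students lie below 69.5), so the median is 69.5 + (46 – 43) / 14 × 10 ≈ 71.64.
Mode: the class 80 – 89 has the highest frequency, so the mode is
79.5 + (32 – 14) / [(32 – 14) + (32 – 3)] × 10 ≈ 83.33.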
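As a cross-check on part A, the following minimal Python sketch computes the three measures from the class boundaries and frequencies in the table. It assumes the usual interpolation formulas for grouped data: the median is the lower boundary of the median class plus the fraction (n/2 − F)/f of the class width, and the mode is the lower boundary of the modal class plus d1/(d1 + d2) of the class width. All variable names are illustrative.

```python
lower_bounds = [29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5]
freqs = [2, 10, 9, 22, 14, 32, 3]
width = 10.0

n = sum(freqs)
class_marks = [lb + width / 2 for lb in lower_bounds]

# Grouped mean: sum of f * X divided by the total frequency
grouped_mean = sum(f * x for f, x in zip(freqs, class_marks)) / n

# Median class: first class whose cumulative frequency reaches n / 2
cum = 0  # cumulative frequency of the classes before the median class
for i, f in enumerate(freqs):
    if cum + f >= n / 2:
        break
    cum += f
grouped_median = lower_bounds[i] + (n / 2 - cum) / f * width

# Modal class: class with the highest frequency
m = freqs.index(max(freqs))
d1 = freqs[m] - (freqs[m - 1] if m > 0 else 0)
d2 = freqs[m] - (freqs[m + 1] if m < len(freqs) - 1 else 0)
grouped_mode = lower_bounds[m] + d1 / (d1 + d2) * width

print(round(grouped_mean, 2), round(grouped_median, 2), round(grouped_mode, 2))
```

The loop selects the class 69.5 – 79.5 as the median class and `freqs.index(max(freqs))` selects 79.5 – 89.5 as the modal class.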
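I) less than 57.
Class Boundary :  29.5   39.5   49.5   59.5   69.5   79.5   89.5   99.5
Frequency :           2     10      9     22     14     32      3
The value 57 lies in the class 49.5 – 59.5, which contains 9 students. All 2 + 10 = 12
students in the two lower classes have less than 57. Let X be the number of students in
the class 49.5 – 59.5 with less than 57, assuming the marks are spread evenly over the class:
59.5 – 49.5 = 10 marks correspond to 9 students
57 – 49.5 = 7.5 marks correspond to X students
so X = (7.5 / 10) × 9 = 6.75 and 2 + 10 + X = 18.75.
About 19 students have got less than 57.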
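II) between 58 and 72.
Class Boundary :  29.5   39.5   49.5   59.5   69.5   79.5   89.5   99.5
Frequency :           2     10      9     22     14     32      3
The value 58 lies in the class 49.5 – 59.5. Let X be the number of students in that class
with more than 58, assuming the marks are spread evenly over the class:
59.5 – 49.5 = 10 marks correspond to 9 students
59.5 – 58 = 1.5 marks correspond to X students
so X = (1.5 / 10) × 9 = 1.35. All 22 students in the class 59.5 – 69.5 lie in the range, and
by the same proportion the class 69.5 – 79.5 contributes (72 – 69.5) / 10 × 14 = 3.5 students.
In total 1.35 + 22 + 3.5 = 26.85, so about 27 students have marks between 58 and 72.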
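Both parts of B use the same proportional reasoning within a class, so they can be sketched with one helper. The function `count_below` is a hypothetical name introduced here for illustration; it assumes marks are spread evenly inside each class.

```python
def count_below(value, boundaries, freqs):
    """Estimate how many observations lie below `value` by linear
    interpolation inside the class that contains it."""
    total = 0.0
    for lo, hi, f in zip(boundaries[:-1], boundaries[1:], freqs):
        if value >= hi:
            total += f                              # whole class is below value
        elif value > lo:
            total += (value - lo) / (hi - lo) * f   # proportional share of the class
    return total

boundaries = [29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5, 99.5]
freqs = [2, 10, 9, 22, 14, 32, 3]

below_57 = count_below(57, boundaries, freqs)  # 2 + 10 + 6.75
between_58_72 = count_below(72, boundaries, freqs) - count_below(58, boundaries, freqs)

print(round(below_57), round(between_58_72))
```

Taking the difference of two cumulative counts avoids handling the partial classes at each end of the range separately.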
The 2nd quartile, the 5th decile and the 50th percentile correspond to the median.
The 25th and 75th percentiles correspond to the 1st and 3rd quartiles, respectively.
EXAMPLE : A sample of 25 workers in a plant receive the following hourly wages (in dollars):
3.65 3.78 3.85 3.95 4.00
4.10 4.25 3.50 3.85 3.96
3.60 3.90 4.28 3.75 3.95
4.05 4.08 4.15 3.80 4.05
3.88 3.95 4.06 4.18 4.05
a) Construct a frequency distribution table having $0.10 class interval size.
b) Find the 1st and 2nd quartiles, the 3rd decile and the 60th percentile for both ungrouped
and grouped data.
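Part a) can be sketched in Python as follows. The sketch assumes the first class starts at the lower boundary 3.495, half a cent below the smallest wage of 3.50, and that eight classes of width 0.10 cover the data; the variable names are illustrative.

```python
wages = [
    3.65, 3.78, 3.85, 3.95, 4.00,
    4.10, 4.25, 3.50, 3.85, 3.96,
    3.60, 3.90, 4.28, 3.75, 3.95,
    4.05, 4.08, 4.15, 3.80, 4.05,
    3.88, 3.95, 4.06, 4.18, 4.05,
]

start = 3.495   # lower boundary of the first class (assumed)
width = 0.10    # class interval size
n_classes = 8   # enough classes to reach the largest wage, 4.28

counts = [0] * n_classes
for w in wages:
    # Index of the class whose boundaries contain w
    counts[int((w - start) / width)] += 1

for k, f in enumerate(counts):
    lo = start + k * width
    print(f"{lo:.3f} - {lo + width:.3f}   {f}")
```

Because the wages have two decimals and the boundaries end in 5 at the third decimal, no wage can fall exactly on a boundary.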
For n observations arranged in ascending order:
n/4 of the observations lie to the left of the 1st quartile and 3n/4 to the right.
2n/4 of the observations lie to the left of the 2nd quartile and 2n/4 to the right.
3n/10 of the observations lie to the left of the 3rd decile and 7n/10 to the right.
60n/100 of the observations lie to the left of the 60th percentile and 40n/100 to the right.
For grouped data:
Class Boundary     Frequency
3.495 – 3.595      1
3.595 – 3.695      2
3.695 – 3.795      2
3.795 – 3.895      4
3.895 – 3.995      5
3.995 – 4.095      6
4.095 – 4.195      3
4.195 – 4.295      2
Total              25
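For the grouped data, quartiles, deciles and percentiles can be estimated with the same kind of interpolation used for the grouped median, replacing n/2 by the fraction p of n that should lie to the left of the required value. The function `grouped_quantile` below is a hypothetical helper written for illustration, not a formula quoted from the lecture.

```python
def grouped_quantile(p, lower_bounds, freqs, width):
    """Value with a fraction p of the data to its left, estimated by
    linear interpolation inside the class that contains it."""
    n = sum(freqs)
    target = p * n  # number of observations to the left of the quantile
    cum = 0         # cumulative frequency of the classes before the current one
    for lb, f in zip(lower_bounds, freqs):
        if f > 0 and cum + f >= target:
            return lb + (target - cum) / f * width
        cum += f
    raise ValueError("p must lie in the interval (0, 1]")

lower_bounds = [3.495, 3.595, 3.695, 3.795, 3.895, 3.995, 4.095, 4.195]
freqs = [1, 2, 2, 4, 5, 6, 3, 2]
width = 0.10

q1 = grouped_quantile(0.25, lower_bounds, freqs, width)   # 1st quartile
q2 = grouped_quantile(0.50, lower_bounds, freqs, width)   # 2nd quartile (median)
d3 = grouped_quantile(0.30, lower_bounds, freqs, width)   # 3rd decile
p60 = grouped_quantile(0.60, lower_bounds, freqs, width)  # 60th percentile

print(round(q1, 3), round(q2, 3), round(d3, 3), round(p60, 3))
```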
MEASURES OF DISPERSION (Measures of Variability)
Dispersion refers to the variability or spread in the data. The most important
measures of dispersion are
• the variance
• the standard deviation
• the coefficient of variation.
We will measure these for grouped and ungrouped data.
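Variance of a population is represented by $\sigma^2$ (the Greek letter sigma squared);
and for a sample, by $s^2$:
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}, \qquad s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$
where $N$ is the population size and $n$ is the sample size.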
NOTE: The quantity n − 1 is often called the degrees of freedom associated with
the variance estimate.
Standard deviation of a population is represented by $\sigma$; and for a sample, by $s$.
They are the positive square roots of their respective variances.
The coefficient of variation (CV) is defined as the ratio of the standard deviation
to the mean: $CV = \sigma / \mu$ for a population and $CV = s / \bar{X}$ for a sample.
The standard deviations of two variables, while both measure dispersion in their
respective variables, cannot be compared to each other in a meaningful way to
determine which variable has greater dispersion because they may vary greatly in
their units and the means about which they occur. The standard deviation and
mean of a variable are expressed in the same units, so taking the ratio of these two
allows the units to cancel. This ratio can then be compared to other such ratios in a
meaningful way: between two variables, the variable with the smaller CV is less
dispersed than the variable with the larger CV.
EXAMPLE: An engineer is interested in testing the “bias” in a pH meter. Data are
collected on the meter by measuring the pH of a neutral substance (pH = 7.0).
A sample of size 10 is taken, with results given by
7.07 7.00 7.10 6.97 7.00 7.03 7.01 7.01 6.98 7.08
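A minimal Python sketch for this sample, using the standard `statistics` module, is given below. Treating the difference between the sample mean and the true pH of 7.0 as an estimate of the bias is an interpretation made here for illustration; the variable names are likewise illustrative.

```python
from statistics import mean, stdev, variance

ph = [7.07, 7.00, 7.10, 6.97, 7.00, 7.03, 7.01, 7.01, 6.98, 7.08]

x_bar = mean(ph)            # sample mean
s2 = variance(ph)           # sample variance, divides by n - 1
s = stdev(ph)               # sample standard deviation, square root of s2
bias_estimate = x_bar - 7.0 # how far the average reading is from the true pH

print(round(x_bar, 4), round(s2, 6), round(s, 4), round(bias_estimate, 4))
```

`statistics.variance` and `statistics.stdev` use n − 1 in the denominator; `pvariance` and `pstdev` are the population versions that divide by N.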
EXAMPLE: Find an estimate of the variance and standard deviation of the
following data for the marks obtained in a test (out of 50) by 88 students.
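For grouped data, the variance can be estimated from the class marks X and frequencies f by weighting each squared deviation from the grouped mean by its frequency. The sketch below assumes the n − 1 denominator of a sample estimate. Since the table for the 88 students is not reproduced in this text, the function is illustrated on the marks distribution of the 92 students from the earlier example; `grouped_variance` is a hypothetical helper name.

```python
from math import sqrt

def grouped_variance(class_marks, freqs):
    """Sample variance estimated from class marks and frequencies,
    using n - 1 in the denominator (an assumption of this sketch)."""
    n = sum(freqs)
    grouped_mean = sum(f * x for f, x in zip(freqs, class_marks)) / n
    squared_dev = sum(f * (x - grouped_mean) ** 2 for f, x in zip(freqs, class_marks))
    return squared_dev / (n - 1)

# Marks distribution of the 92 students from the earlier example
class_marks = [34.5, 44.5, 54.5, 64.5, 74.5, 84.5, 94.5]
freqs = [2, 10, 9, 22, 14, 32, 3]

s2 = grouped_variance(class_marks, freqs)
s = sqrt(s2)  # standard deviation is the positive square root of the variance

print(round(s2, 2), round(s, 2))
```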
EXAMPLE: A company has two sections, A and B, with 40 and 65 employees respectively.
Their average weekly wages are $550 and $350, and the standard deviations are $10
and $9. Which section has the larger variability in wages?
Section A: CV = 10 / 550 ≈ 0.018
Section B: CV = 9 / 350 ≈ 0.026
Section B has the larger variability in wages, since the CV of section B is greater than
the CV of section A.
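The comparison can be reproduced with a few lines of Python. The function name `coefficient_of_variation` is an illustrative choice; it simply divides the standard deviation by the mean.

```python
def coefficient_of_variation(std_dev, mean):
    """Ratio of the standard deviation to the mean (unit-free)."""
    return std_dev / mean

cv_a = coefficient_of_variation(10, 550)  # section A
cv_b = coefficient_of_variation(9, 350)   # section B

more_variable = "B" if cv_b > cv_a else "A"
print(round(cv_a, 4), round(cv_b, 4), more_variable)
```

Section A has the larger standard deviation in dollars, yet section B has the larger CV because its mean wage is much lower.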