S3-Measures of Dispersion
Dr. Mahesh K C 1
Dispersion: Why it is important
• An average, or measure of central tendency, does not give a full picture
of the data. It is only a representative value.
• Two sets of observations may have the same average while one of the
sets is much more scattered than the other.
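A minimal sketch of this point, using two hypothetical data sets (the values are made up for illustration and do not come from the slides): both have the same mean, but their standard deviations differ widely.

```python
from statistics import mean, pstdev

# Hypothetical data sets chosen so that both have the same mean.
set_a = [48, 49, 50, 51, 52]
set_b = [10, 30, 50, 70, 90]

for name, data in (("A", set_a), ("B", set_b)):
    # pstdev divides by n (population standard deviation).
    print(f"Set {name}: mean = {mean(data)}, SD = {pstdev(data):.2f}")
```

The mean alone cannot distinguish the two sets; a measure of dispersion can.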
Variance and Standard Deviation (SD)
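For ungrouped data, Variance
$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$$
and SD $\sigma = \sqrt{\text{Variance}}$.
For grouped data, Variance
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{n} f_i (x_i - \bar{x})^2$$
and SD $\sigma = \sqrt{\text{Variance}}$, where $x_i$ is the midpoint of class $i$, $f_i$ is its frequency, and $N = \sum_{i=1}^{n} f_i$.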
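The two formulas can be sketched in a few lines of Python. The function names and the sample values below are hypothetical, chosen only for illustration; both functions divide by the number of observations, matching the formulas on this slide.

```python
from math import sqrt

def variance_ungrouped(values):
    """Mean of squared deviations from the mean (divide by n)."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / n

def variance_grouped(midpoints, freqs):
    """Frequency-weighted variance; N is the total frequency."""
    total = sum(freqs)
    x_bar = sum(f * x for f, x in zip(freqs, midpoints)) / total
    return sum(f * (x - x_bar) ** 2 for f, x in zip(freqs, midpoints)) / total

# Hypothetical ungrouped data.
values = [2, 4, 4, 4, 5, 5, 7, 9]
var_u = variance_ungrouped(values)
sd_u = sqrt(var_u)

# Hypothetical grouped data: class midpoints and their frequencies.
var_g = variance_grouped([1, 2, 3], [1, 2, 1])

print(var_u, sd_u, var_g)
```

In both cases the SD is the square root of the variance.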
Relative Measure of Dispersion
• The measures of dispersion that are considered so far are called
absolute measures. They are expressed in the same units in
which the observations are measured.
• A relative measure of dispersion, generally expressed as a
percentage, is useful in comparing two or more data sets.
• One such measure (unitless) is the coefficient of variation (CV),
defined as:
CV = (SD / Mean) * 100
• The data set with the smaller CV is said to be more consistent, more
stable, more uniform, and less variable.
• CV is generally used in the following situations:
a) comparing two or more data sets measured in different units.
b) comparing data sets that are measured in the same units but
their average values differ widely.
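As an illustration of situation (a), the sketch below compares two hypothetical data sets measured in different units. The function name and the data are assumptions made for this example, and the population SD (divide by n) is used.

```python
from statistics import mean, pstdev

def coefficient_of_variation(data):
    """CV = (SD / Mean) * 100, using the population SD."""
    return pstdev(data) / mean(data) * 100

# Hypothetical measurements in different units.
heights_cm = [160, 165, 170, 175, 180]
weights_kg = [50, 60, 70, 80, 90]

cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)

# The data set with the smaller CV is the more consistent one.
more_consistent = "heights" if cv_heights < cv_weights else "weights"
print(f"CV heights = {cv_heights:.2f}%, CV weights = {cv_weights:.2f}%")
print("More consistent:", more_consistent)
```

Because the CV is unitless, the two values can be compared directly even though the SDs are in centimetres and kilograms.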
Example 1
Example 1 Contd..
Example 2
• Automobiles traveling on a road with a posted speed limit
of 55 miles per hour are checked for speed by a state
police radar system. Following is a frequency distribution
of speeds. Calculate the SD and CV.
Speed (mph)    Frequency
45-50          10
50-55          40
55-60          150
60-65          175
65-70          75
70-75          15
75-80          10
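Here xi is the class midpoint, and xbar = 29062.5 / 475 ≈ 61.1842.

fi      xi      fi*xi      fi*(xi-xbar)^2
10      47.5    475        10*(47.5-61.1842)^2 ≈ 1872.6
40      52.5    2100       3016.6
150     57.5    8625       2036.0
175     62.5    10937.5    303.0
75      67.5    5062.5     2991.7
15      72.5    1087.5     1920.7
10      77.5    775        2662.0
475     Total   29062.5    14802.6

Variance = 14802.6 / 475 ≈ 31.16, so SD ≈ 5.58 mph.
CV = (5.58 / 61.18) * 100 ≈ 9.12%.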
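The arithmetic for Example 2 can be checked with a short script. This is a sketch, not part of the original slides: the variable names are illustrative, class midpoints stand in for the individual speeds, and the variance divides by the total frequency N.

```python
from math import sqrt

# Class limits (mph) and frequencies from Example 2.
classes = [(45, 50), (50, 55), (55, 60), (60, 65), (65, 70), (70, 75), (75, 80)]
freqs = [10, 40, 150, 175, 75, 15, 10]

# Each class is represented by its midpoint.
midpoints = [(lo + hi) / 2 for lo, hi in classes]

N = sum(freqs)
mean_speed = sum(f * x for f, x in zip(freqs, midpoints)) / N
sum_sq = sum(f * (x - mean_speed) ** 2 for f, x in zip(freqs, midpoints))

variance = sum_sq / N
sd = sqrt(variance)
cv = sd / mean_speed * 100

print(f"N = {N}, mean = {mean_speed:.4f}")
print(f"sum of f*(x - mean)^2 = {sum_sq:.2f}")
print(f"variance = {variance:.4f}, SD = {sd:.4f}, CV = {cv:.2f}%")
```

Using the unrounded mean inside the squared deviations avoids the small discrepancies that appear when the mean is rounded before the row-by-row calculation.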
Example 3 (Homework)
• Two service stations recorded the following frequency distribution for
the number of gallons of gasoline sold per car in a sample of 680 cars.
Identify which station is more consistent in terms of recording the
number of gallons sold per car. Justify your answer.
Exploratory data analysis (EDA): The Box & Whisker Plot
• Five number summary: The following five numbers are used to summarize the
data: 1) Smallest Value, 2) First quartile (Q1), 3) Median(Q2), 4) Third quartile
(Q3) and 5) Largest Value.
• Construction of Box & Whisker plot (identifying outliers)
1) A box is drawn with the ends of the box located at Q1 and Q3.
2) A vertical line is drawn in the box at the location of the median (Q2).
3) By using IQR = Q3 – Q1, limits are located. The lower limit is 1.5(IQR) below Q1
and upper limit is 1.5(IQR) above Q3.
4) Draw dotted lines, called whiskers, from the ends of the box to the smallest and
largest values inside the limits computed in step 3.
5) Mark outliers with the symbol *. Data values outside the limits computed in step 3
are called outliers.
• A box plot can also be used to check the skewness of the data. If the median lies
closer to Q1 and the right whisker is longer, the data are positively skewed; if the
median lies closer to Q3 and the left whisker is longer, the data are negatively skewed.
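The construction steps can be sketched as a small function. Two things here are assumptions rather than part of the slides: the function name `box_plot_summary`, and the quartile convention (Q1 and Q3 are taken as the medians of the lower and upper halves of the sorted data, which is one of several conventions in use). The sample data are hypothetical.

```python
from statistics import median

def box_plot_summary(values):
    """Five-number summary, limits, whisker ends, and outliers."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # For odd n the middle value is left out of both halves.
    upper = data[half:] if n % 2 == 0 else data[half + 1:]

    q1, q2, q3 = median(lower), median(data), median(upper)
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr

    inside = [x for x in data if lower_limit <= x <= upper_limit]
    outliers = [x for x in data if x < lower_limit or x > upper_limit]
    return {
        "min": data[0], "q1": q1, "median": q2, "q3": q3, "max": data[-1],
        "iqr": iqr, "lower_limit": lower_limit, "upper_limit": upper_limit,
        "whiskers": (inside[0], inside[-1]),  # ends of the whiskers
        "outliers": outliers,
    }

# Hypothetical data with one unusually large value.
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(summary)
```

The whiskers stop at the most extreme values that are still inside the limits, so an outlier is drawn beyond the whisker rather than at its end.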
A Box plot
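[Figure: box plot with the box spanning Q1 to Q3 (the IQR), the median Q2 marked inside the box, whiskers on both sides, and the limits Q1-1.5*IQR and Q3+1.5*IQR.]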
Box Plot Example
• A series of hourly temperatures were measured throughout the day
in degrees Fahrenheit. The recorded values are listed in order as
follows: 52, 57, 57, 57, 58, 63, 66, 66, 67, 67, 68, 69, 70, 70, 70, 70, 72,
73, 75, 75, 76, 76, 78, 79, 81, 89. Draw a Box-Whisker plot and check
whether any outliers exist.
• Answer:
Five-number summary: Min = 52, Q1 = 66, Q2 = 70, Q3 = 75, Max = 89.
IQR = Q3 - Q1 = 9, UL = Q3 + 1.5(IQR) = 88.5, LL = Q1 - 1.5(IQR) = 52.5.
Since 52 < LL and 89 > UL, both 52 and 89 are outliers.
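The answer can be checked with Python's standard library. One assumption to note: quartiles can be defined in several ways, and the slides do not say which is used; `statistics.quantiles` with `method="inclusive"` is chosen here because it reproduces the quartiles quoted in the answer for this data set.

```python
from statistics import quantiles

temps = [52, 57, 57, 57, 58, 63, 66, 66, 67, 67, 68, 69, 70,
         70, 70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 81, 89]

# Quartiles under the "inclusive" convention (an assumption, see above).
q1, q2, q3 = quantiles(temps, n=4, method="inclusive")
iqr = q3 - q1
ll = q1 - 1.5 * iqr   # lower limit
ul = q3 + 1.5 * iqr   # upper limit

# Values outside the limits are outliers.
outliers = [t for t in temps if t < ll or t > ul]

print(f"Min = {min(temps)}, Q1 = {q1}, Q2 = {q2}, Q3 = {q3}, Max = {max(temps)}")
print(f"IQR = {iqr}, LL = {ll}, UL = {ul}")
print("Outliers:", outliers)
```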
Box plot