S3-Measures of Dispersion

The document discusses measures of dispersion used to describe how spread out or clustered data values are around a central measure like the mean or median. It defines key measures like range, interquartile range, variance, standard deviation, and coefficient of variation. Examples are provided to demonstrate calculating and interpreting these dispersion statistics. The box and whisker plot is also introduced as a visual tool for exploring the distribution of data values and identifying outliers.

Uploaded by

Shriya Heda

Session 3: Measures of Dispersion

Dr. Mahesh K C 1
Dispersion: Why it is important
• An average, or measure of central tendency, does not give a full picture
of the data. It is only a representative value.
• Two sets of observations may have the same average even though one
set is much more scattered than the other.
• A measure of dispersion gives us additional information that enables us
to judge the reliability of our measure of central tendency. If data are
widely dispersed, the central location is less representative of the data
as a whole.
• We may wish to compare the dispersions of various samples. A wide
spread of values from the center is usually undesirable, so one generally
prefers the distribution with the lowest dispersion.
• A measure of dispersion gives an idea of the extent to which the
individual items differ, on the whole, from the average considered.
• Commonly used measures of dispersion: range, inter-quartile range,
variance, standard deviation and coefficient of variation.
Range and Inter-Quartile Range (IQR)

Variance and Standard Deviation (SD)

 x  x 
n 2

i
For ungrouped data, Variance  
2 i 1
and   Variance
n

 f x  x 
n 2

i i
For grouped data, Variance   2 i 1
and   Variance
N
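The two formulas can be sketched in Python as follows. This is a minimal illustration, not part of the original slides: the function names and the small data sets are hypothetical, and the divisor is n (or N), matching the formulas on this slide rather than the n - 1 sample version.

```python
import math


def variance_ungrouped(values):
    """Variance of raw data: mean squared deviation from the mean (divisor n)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def variance_grouped(midpoints, freqs):
    """Variance from a frequency table with class midpoints x_i and frequencies f_i."""
    total = sum(freqs)  # N, the total frequency
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / total
    return sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs)) / total


# Hypothetical data, for illustration only
values = [2, 4, 4, 4, 5, 5, 7, 9]
var_u = variance_ungrouped(values)
sd_u = math.sqrt(var_u)

midpoints = [1, 2, 3]
freqs = [1, 2, 1]
var_g = variance_grouped(midpoints, freqs)
sd_g = math.sqrt(var_g)

print(var_u, sd_u)
print(var_g, sd_g)
```

The grouped version treats every observation in a class as if it sat at the class midpoint, so its result is an approximation of the variance of the underlying raw data.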

Relative Measure of Dispersion
• The measures of dispersion considered so far are called absolute
measures. They are expressed in the units in which the observations
are measured (or, for the variance, in squared units).
• A relative measure of dispersion, generally expressed as a
percentage, is useful in comparing two or more data sets.
• One such unitless measure is the coefficient of variation (CV),
defined as:
CV = (SD / Mean) * 100
• The data set with the smaller CV is said to be more consistent, more
stable, more uniform and less variable.
• CV is generally used in the following situations:
a) comparing two or more data sets measured in different units;
b) comparing data sets that are measured in the same units but
whose average values differ widely.
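As a rough illustration of situation (a), the sketch below compares two data sets measured in different units. The function name `coefficient_of_variation` and both data sets are hypothetical; the SD uses divisor n (`statistics.pstdev`), which is an assumption about the intended definition.

```python
import statistics


def coefficient_of_variation(values):
    """CV in percent: SD (divisor n) divided by the mean, times 100."""
    return statistics.pstdev(values) / statistics.mean(values) * 100


# Hypothetical data in different units, for illustration only
heights_cm = [160, 165, 170, 175, 180]
weights_kg = [50, 60, 70, 80, 90]

cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)

print(f"CV of heights: {cv_heights:.2f}%")
print(f"CV of weights: {cv_weights:.2f}%")

# The data set with the smaller CV is the more consistent one
more_consistent = "heights" if cv_heights < cv_weights else "weights"
print("More consistent:", more_consistent)
```

Because the CV has no units, the two percentages can be compared directly even though centimetres and kilograms cannot.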

Example 1

Example 1 Contd..

Example 2
• Automobiles traveling on a road with a posted speed limit
of 55 miles per hour are checked for speed by a state
police radar system. The following is a frequency distribution
of speeds. Calculate the SD and CV.

Speed (miles/hr)   Frequency
45-50              10
50-55              40
55-60              150
60-65              175
65-70              75
70-75              15
75-80              10

Mean: xbar = 29062.5 / 475 = 61.18 (approx.)

fi     xi      fi*xi      fi*(xi-xbar)^2
10     47.5    475        10*(47.5-61.18)^2 = 1871.42
40     52.5    2100       3013.70
150    57.5    8625       2031.36
175    62.5    10937.5    304.92
75     67.5    5062.5     2995.68
15     72.5    1087.5     1922.14
10     77.5    775        2663.42
475    Total   29062.5    14802.64
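The calculation for Example 2 can be checked with a short script. This is a sketch under stated assumptions, not the slides' own working: the variable names are hypothetical, class midpoints stand in for the individual speeds, and the variance uses divisor N as in the grouped-data formula. Under those assumptions the script gives a mean of about 61.18, an SD of about 5.58 and a CV of about 9.12%.

```python
import math

# Class intervals (miles/hr) and frequencies from Example 2
classes = [(45, 50), (50, 55), (55, 60), (60, 65), (65, 70), (70, 75), (75, 80)]
freqs = [10, 40, 150, 175, 75, 15, 10]

# Each class is represented by its midpoint x_i
midpoints = [(low + high) / 2 for low, high in classes]

total = sum(freqs)  # N
mean = sum(f * x for f, x in zip(freqs, midpoints)) / total

# Sum of f_i * (x_i - xbar)^2, then variance with divisor N
sum_sq = sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints))
variance = sum_sq / total
sd = math.sqrt(variance)
cv = sd / mean * 100

print(f"N = {total}, mean = {mean:.2f}")
print(f"variance = {variance:.2f}, SD = {sd:.2f}, CV = {cv:.2f}%")
```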

Example 3 (Home work)
• Two service stations recorded the following frequency distributions for
the number of gallons of gasoline sold per car, each in a sample of 680
cars. Identify which station is more consistent in terms of the number of
gallons sold per car. Justify your answer.

Station 1                        Station 2
Gasoline (gallons)   Frequency   Gasoline (gallons)   Frequency
0-5                  74          0-4                  64
5-10                 192         4-9                  182
10-15                280         9-14                 250
15-20                105         14-19                130
20-25                23          19-24                29
25-30                6           24-29                25

Exploratory data analysis (EDA): The Box & Whisker Plot
• Five-number summary: the following five numbers are used to summarize the
data: 1) smallest value, 2) first quartile (Q1), 3) median (Q2), 4) third quartile
(Q3) and 5) largest value.
• Construction of a Box & Whisker plot (identifying outliers):
1) Draw a box with its ends located at Q1 and Q3.
2) Draw a vertical line in the box at the location of the median (Q2).
3) Using IQR = Q3 - Q1, locate the limits. The lower limit is 1.5(IQR) below Q1
and the upper limit is 1.5(IQR) above Q3.
4) Draw dotted lines, called whiskers, from the ends of the box to the smallest and
largest values inside the limits computed in step 3.
5) Mark outliers with the symbol *. Data values outside the limits computed in
step 3 are called outliers.

• A box plot can also be used to check the skewness of the data. If the observations
to the right of the median are more spread out (the median lies closer to Q1 and the
right whisker is longer), the data are positively skewed; if the observations to the
left of the median are more spread out, the data are negatively skewed.
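The construction steps above can be sketched in Python as follows. This is a minimal sketch rather than the slides' own method: the function name `box_plot_summary` and the sample data are hypothetical, and the `inclusive` quartile convention of `statistics.quantiles` is an assumption. Other quartile conventions can give slightly different values of Q1 and Q3.

```python
import statistics


def box_plot_summary(values):
    """Five-number summary, 1.5*IQR limits, whisker ends and outliers."""
    data = sorted(values)
    # Quartiles under the "inclusive" convention (an assumption; see lead-in)
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme values still inside the limits
    inside = [x for x in data if lower_limit <= x <= upper_limit]
    outliers = [x for x in data if x < lower_limit or x > upper_limit]
    return {
        "min": data[0], "q1": q1, "median": q2, "q3": q3, "max": data[-1],
        "iqr": iqr, "lower_limit": lower_limit, "upper_limit": upper_limit,
        "whisker_low": inside[0], "whisker_high": inside[-1],
        "outliers": outliers,
    }


# Hypothetical data, for illustration only
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
summary = box_plot_summary(sample)
for name, value in summary.items():
    print(f"{name}: {value}")
```

In this hypothetical sample only the value 30 lies beyond the upper limit, so the right whisker ends at 9 and 30 would be plotted with the symbol *.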

A Box plot

• A typical Box & Whisker plot is as follows:

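[Figure: Box & Whisker plot showing the box from Q1 to Q3 (width = IQR) with the median Q2 marked inside, a whisker on each side of the box, and the limits Q1-1.5*IQR and Q3+1.5*IQR.]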

Box Plot Example
• A series of hourly temperatures was measured throughout the day
in degrees Fahrenheit. The recorded values, listed in order, are:
52, 57, 57, 57, 58, 63, 66, 66, 67, 67, 68, 69, 70, 70, 70, 70, 72,
73, 75, 75, 76, 76, 78, 79, 81, 89. Draw a Box-Whisker plot and check
whether any outliers exist.
• Answer:
Five-number summary: Min = 52, Max = 89, Q1 = 66, Q2 = 70 and Q3 = 75.
IQR = 9, UL = Q3 + (1.5)IQR = 88.5, LL = Q1 - (1.5)IQR = 52.5

The whiskers are drawn from Q1 and Q3 to the smallest and largest data
values that fall between LL and UL, which here are 57 and 81. The data
points 52 and 89 lie outside (LL, UL) and are therefore outliers.
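The answer can be reproduced with a short script. It is a sketch only: the variable names are hypothetical, and the `inclusive` method of `statistics.quantiles` is an assumed quartile convention that happens to give the same Q1, Q2 and Q3 as the answer above for this data set.

```python
import statistics

# Hourly temperatures (degrees Fahrenheit) from the example, already sorted
temps = [52, 57, 57, 57, 58, 63, 66, 66, 67, 67, 68, 69, 70,
         70, 70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 81, 89]

# Quartiles under the assumed "inclusive" convention
q1, q2, q3 = statistics.quantiles(temps, n=4, method="inclusive")
iqr = q3 - q1
ll = q1 - 1.5 * iqr  # lower limit
ul = q3 + 1.5 * iqr  # upper limit

# Whiskers end at the extreme values that still lie within (LL, UL)
inside = [t for t in temps if ll <= t <= ul]
whisker_low, whisker_high = inside[0], inside[-1]
outliers = [t for t in temps if t < ll or t > ul]

print(f"Min = {temps[0]}, Q1 = {q1}, Q2 = {q2}, Q3 = {q3}, Max = {temps[-1]}")
print(f"IQR = {iqr}, LL = {ll}, UL = {ul}")
print(f"Whiskers end at {whisker_low} and {whisker_high}; outliers: {outliers}")
```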

Box plot
