Lecture 3
Chapter 5
Numerical descriptive measures
5.1 Measure of central location
5.2 Measure of variability
5.3 Measure of relative standing and box plots
5.4 Approximate descriptive measures for grouped data
Optional reading: 5.5 & 5.6
Mean = (Sum of measurements) / (Number of measurements)
Sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Population mean: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$
Example 5.1, page 131
The mean of the sample of 10 measurements (waiting times for a bus),
14, 8, 12, 13, 12, 6, 19, 7, 11, 8, is given by
$\bar{x} = \frac{\sum_{i=1}^{10} x_i}{10} = \frac{14 + 8 + 12 + \dots + 11 + 8}{10} = 11.0$
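The calculation in Example 5.1 can be checked with a few lines of Python. This is only a minimal sketch; the names `waiting_times` and `sample_mean` are illustrative and do not come from the textbook.

```python
# Waiting times from Example 5.1 (sample of 10 measurements).
waiting_times = [14, 8, 12, 13, 12, 6, 19, 7, 11, 8]


def sample_mean(values):
    """Return the sum of the measurements divided by their number."""
    return sum(values) / len(values)


x_bar = sample_mean(waiting_times)
print(x_bar)  # 11.0
```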
Example 4.1 (contd.), page 85
Suppose the telephone bills of Example 4.1 represent a population
of measurements. The population mean is
$\mu = \frac{\sum_{i=1}^{200} x_i}{200} = \frac{196.65 + 468.75 + \dots + 270.90 + 196.65}{200} = 238.015$
Median: another measure of central location
Median is the value that falls in the middle when the measurements are arranged in order of magnitude.

Example 5.2, page 132 (odd number of observations)
Seven employee salaries were recorded (in $1000): 49, 52, 47, 53, 51, 47, 50. Find the median salary.
First, sort the salaries. Then, locate the value in the middle.
47, 47, 49, 50, 51, 52, 53
The median is 50.

Example 5.3, page 132 (even number of observations)
Suppose the director’s salary of $130,000 was added to the group recorded before. Find the median salary.
First, sort the salaries. Then, locate the values in the middle. There are two middle values!
47, 47, 49, 50, 51, 52, 53, 130
The median is the average of the two middle values: (50 + 51) / 2 = 50.5.
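The sort-then-locate procedure for the median can be sketched in Python as follows. The function name `median` and the variable names are illustrative choices, not part of the lecture material.

```python
def median(values):
    """Sort the values, then return the middle one (odd count)
    or the average of the two middle ones (even count)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Salaries in $1000 from Example 5.2.
salaries = [49, 52, 47, 53, 51, 47, 50]
median_odd = median(salaries)            # seven observations
median_even = median(salaries + [130])   # director's salary added (Example 5.3)
print(median_odd, median_even)  # 50 50.5
```

Note that adding the very large salary of 130 moves the median only from 50 to 50.5.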
Example 4.1 (contd.), page 85
For large data sets, the modal class is much more relevant than a single-value mode. The modal class is the class with the highest frequency. There may be one modal class, or two or more modal classes.
In Example 4.1, the modal class is [200, 250], with the highest frequency, 60.
[Figure: histogram of the Example 4.1 data; horizontal axis: Bin (100 to 515, and More); vertical axis: Frequency (0 to 70).]
Excel Histogram for Example 5.6
[Figure: Excel histogram for Example 5.6.]
The histogram is skewed to the left.
Modal class is [80, 90), with the highest frequency, 28.
If the distribution is negatively skewed, then typically Mean < Median < Mode. Here: 73.98 < 81 < 84.
Example 5.8, page 149
Rates of return over the past 10 years for two unit trusts are shown below. Which one has a higher level of risk?
Trust A: 12.3, -2.2, 24.9, 1.3, 37.6, 46.9, 28.4, 9.2, 7.1, 34.5
Trust B: 15.1, 0.2, 9.4, 15.2, 30.8, 28.3, 21.2, 13.7, 1.7, 14.4
Let us use the Excel printout that is run from the ‘Descriptive statistics’ sub-menu.

                      Trust A        Trust B
Mean                  20             15
Standard Error        5.29471435     3.152353618
Median                18.6           14.75
Mode                  #N/A           #N/A
Standard Deviation    16.7433569     9.968617423
Sample Variance       280.34         99.37333333
Kurtosis              -1.3419311     -0.46393926
Skewness              0.21697141     0.106952106
Range                 49.1           30.6
Minimum               -2.2           0.2
Maximum               46.9           30.8
Sum                   200            150
Count                 10             10

Trust A should be considered riskier because its standard deviation is larger.
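The key figures of the Excel printout can be reproduced with Python's standard `statistics` module. This is a sketch under the assumption that Excel reports the sample variance and sample standard deviation (divisor n - 1), which is what `statistics.variance` and `statistics.stdev` compute; the helper name `summarize` is illustrative.

```python
import statistics

# Rates of return (%) over the past 10 years, from Example 5.8.
trust_a = [12.3, -2.2, 24.9, 1.3, 37.6, 46.9, 28.4, 9.2, 7.1, 34.5]
trust_b = [15.1, 0.2, 9.4, 15.2, 30.8, 28.3, 21.2, 13.7, 1.7, 14.4]


def summarize(returns):
    """Return the mean, sample variance and sample standard deviation."""
    return {
        "mean": statistics.mean(returns),
        "variance": statistics.variance(returns),  # divisor n - 1
        "stdev": statistics.stdev(returns),        # square root of the above
    }


summary_a = summarize(trust_a)
summary_b = summarize(trust_b)

# The trust with the larger standard deviation is treated as the riskier one.
riskier = "Trust A" if summary_a["stdev"] > summary_b["stdev"] else "Trust B"

for name, s in (("Trust A", summary_a), ("Trust B", summary_b)):
    print(name, round(s["mean"], 2), round(s["variance"], 2), round(s["stdev"], 2))
print("Riskier:", riskier)
```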
Example 5.9, page 151 (contd.): other conclusions
• By the empirical rule, approximately 95% of the area under a mound-shaped histogram lies between $\bar{x} - 2s$ and $\bar{x} + 2s$.
[Figure: mound-shaped histogram (bins 2 to 16, and More) with 95% of the area marked between $\bar{x} - 2s$ and $\bar{x} + 2s$.]
• About 95% of all the measurements fall within two standard deviations of the mean, [4%, 16%] (in fact, 96 out of 100 call durations fall in this range: 96%).
• The range = 16.72 - 3.41 = 13.31.
• $s \approx$ range / 4 = 13.31 / 4 ≈ 3.3 (in fact, s = 3).
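The two-standard-deviation interval and the range-based estimate of s can be checked with the short sketch below. The mean of 10 is an assumption: it is not stated on this slide and is inferred here as the midpoint of the quoted interval [4, 16] with s = 3. The function names are illustrative.

```python
def two_sd_interval(mean, sd):
    """Interval covering about 95% of a mound-shaped distribution."""
    return (mean - 2 * sd, mean + 2 * sd)


def sd_from_range(smallest, largest):
    """Rough estimate of s: the range divided by 4."""
    return (largest - smallest) / 4


# mean = 10 is an assumed value (midpoint of the quoted interval); s = 3 is from the slide.
low, high = two_sd_interval(10, 3)

# Smallest and largest call durations quoted in the example.
approx_s = sd_from_range(3.41, 16.72)

print(low, high)           # 4 16
print(round(approx_s, 2))  # 3.33
```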
Example 5.10, page 152: study yourself.
A measure of variability:
Coefficient of Variation
Sample coefficient of variation: $cv = \frac{s}{\bar{x}}$
Population coefficient of variation: $CV = \frac{\sigma}{\mu}$
This coefficient provides a proportionate measure
of variation. A standard deviation of 10 may be
perceived as large when the mean value is 100,
but only moderately large when the mean value
is 500.
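The comparison above can be made concrete with a small sketch: the same standard deviation of 10 gives a much smaller coefficient of variation when the mean is 500 than when it is 100. The function name is an illustrative choice.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation expressed as a proportion of the mean."""
    return sd / mean


cv_mean_100 = coefficient_of_variation(10, 100)
cv_mean_500 = coefficient_of_variation(10, 500)
print(cv_mean_100, cv_mean_500)  # 0.1 0.02
```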
Commonly Used Percentiles…
First (lower) decile = 10th percentile
First (lower) quartile, Q1 = 25th percentile
Second (middle) quartile, Q2 = 50th percentile
Third (upper) quartile, Q3 = 75th percentile
Ninth (upper) decile = 90th percentile
Location of Percentiles
Similarly, we can have: P50 = 13.5 and P75 = 21.
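The data set behind P50 = 13.5 and P75 = 21 is not reproduced here, so the sketch below uses the bus waiting times of Example 5.1 instead. It assumes the common textbook location rule $L_P = (n + 1)\frac{P}{100}$ with linear interpolation between neighbouring ordered values; this rule is an assumption and may differ from the one used on the original slide. The function name `percentile` is illustrative.

```python
def percentile(values, p):
    """P-th percentile using the assumed location rule L_P = (n + 1) * P / 100,
    interpolating linearly between the two neighbouring ordered values."""
    ordered = sorted(values)
    n = len(ordered)
    location = (n + 1) * p / 100   # 1-based position in the ordered data
    lower = int(location)          # whole part of the position
    fraction = location - lower    # fractional part of the position
    if lower < 1:
        return ordered[0]
    if lower >= n:
        return ordered[-1]
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])


# Waiting times from Example 5.1 (used here only for illustration).
waiting_times = [14, 8, 12, 13, 12, 6, 19, 7, 11, 8]
p25 = percentile(waiting_times, 25)
p50 = percentile(waiting_times, 50)
p75 = percentile(waiting_times, 75)
print(p25, p50, p75)  # 7.75 11.5 13.25
```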
Quartiles and Variability
[Figure: two histograms with Q1, Q2 and Q3 marked; one positively skewed, one negatively skewed.]
Interquartile Range…
The quartiles can be used to create another
measure of variability, the interquartile range,
which is defined as follows:
Interquartile Range = Q3 – Q1
Box Plots
A box plot is a pictorial display that graphs five main descriptive measures of the measurement set:
• L – The largest measurement
• Q3 – The upper quartile
• Q2 – The median
• Q1 – The lower quartile
• S – The smallest measurement
An adjustment to this general description of a box plot may be needed in the presence of outliers. See the next example.
[Figure: box plot with S, Q1, Q2, Q3 and L marked along the axis.]
Box Plots
The box plot is a technique that graphs five statistics:
• the minimum and maximum observations, and
• the first, second, and third quartiles.
Box Plots
[Figure: box plot with S = 0.9, Q1 = 4.3, Q2 = 5.3, Q3 = 9.0 and L = 25.5 marked; the values -2.75, 11.4 and 16.05 are also marked on the axis.]
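The values -2.75 and 16.05 in the figure are consistent with the common 1.5 × IQR rule for outlier fences. The sketch below assumes that rule (it is not stated in the surviving slide text) and applies it to the quartiles shown; the function name `fences` and the multiplier parameter `k` are illustrative.

```python
def fences(q1, q3, k=1.5):
    """Lower and upper fences, assuming the k * IQR rule (k = 1.5 by default)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles and extremes read off the box plot.
q1, q3 = 4.3, 9.0
smallest, largest = 0.9, 25.5

iqr = q3 - q1
lower_fence, upper_fence = fences(q1, q3)

# A measurement outside the fences is flagged as an outlier.
outliers = [x for x in (smallest, largest) if x < lower_fence or x > upper_fence]

print(round(iqr, 2), round(lower_fence, 2), round(upper_fence, 2))  # 4.7 -2.75 16.05
print(outliers)  # [25.5]
```

Under this assumption the largest measurement, 25.5, lies above the upper fence, while the smallest, 0.9, lies inside the fences.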
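Interpreting the box plot results
[Figure: the box plot again (S = 0.9, Q1 = 4.3, Q2 = 5.3, Q3 = 9.0, L = 25.5), annotated 25%, 50%, 25%.]
25% of the measurements lie between S and Q1, 50% between Q1 and Q3, and 25% between Q3 and L.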
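$s^2 \approx \frac{1}{n-1}\left[\sum_{i=1}^{k} f_i m_i^2 - \frac{\left(\sum_{i=1}^{k} f_i m_i\right)^2}{n}\right]$
where $f_i$ is the number of measurements in class $i$ and $m_i$ is the midpoint of class $i$, for $k$ classes.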
Example 5.15, page 169
Approximate the mean and standard deviation of the telephone call durations represented by the frequency distribution.

Class i   Class limits   Frequency f_i   Midpoint m_i   f_i m_i   f_i m_i^2
1         2–5            3               3.5            10.5      36.75
2         5–8            6               6.5            39.0      253.5
3         8–11           8               9.5            76.0      722.0
–         –              –               –              –         –
6         17–20          2               18.5           37.0      684.5
Total                    n = 30                         312.0     3,751.5

$\bar{x} \approx \frac{\sum_{i=1}^{6} f_i m_i}{30} = \frac{312.0}{30} = 10.4$
$s^2 \approx \frac{1}{n-1}\left[\sum_{i=1}^{k} f_i m_i^2 - \frac{\left(\sum_{i=1}^{k} f_i m_i\right)^2}{n}\right] = \frac{1}{29}\left[3{,}751.5 - \frac{312^2}{30}\right] = 17.47$
Real values: $\bar{x} = 10.26$ and $s^2 = 18.40$.
[Figure: histogram of the frequency distribution; frequency axis from 0 to 10.]
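The grouped-data approximation can be checked from the column totals alone. The sketch below works from the totals n = 30, Σ f_i m_i = 312.0 and Σ f_i m_i^2 = 3,751.5 rather than from the individual rows, because two rows of the table are not shown. The function name `grouped_mean_variance` is illustrative.

```python
import math


def grouped_mean_variance(n, sum_fm, sum_fm2):
    """Approximate mean and sample variance from grouped-data totals:
    mean ~ sum(f*m) / n
    s^2  ~ (sum(f*m^2) - sum(f*m)^2 / n) / (n - 1)
    """
    mean = sum_fm / n
    variance = (sum_fm2 - sum_fm ** 2 / n) / (n - 1)
    return mean, variance


# Totals taken from the table of Example 5.15.
mean, variance = grouped_mean_variance(n=30, sum_fm=312.0, sum_fm2=3751.5)
sd = math.sqrt(variance)  # the example asks for the standard deviation

print(round(mean, 2), round(variance, 2), round(sd, 2))  # 10.4 17.47 4.18
```

The approximate values (10.4 and 17.47) are close to, but not equal to, the real values quoted in the example, because every measurement in a class is replaced by the class midpoint.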
Home assignment: