3 Numerical Descriptive Measures
Summary Definitions
Mean (average)
The sum of all the data entries divided by the number of entries.
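Population mean: \( \mu = \frac{\sum x}{N} \)
Sample mean: \( \bar{x} = \frac{\sum x}{n} \)
Population mean µ, for a population of size N:
\[ \mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N} \]
For a sample of size n:
\[ \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n} \]
where \( X_i \) is the ith value and \( \bar{X} \) is pronounced "x-bar".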
[Figure: two dot plots on an axis from 11 to 20; left panel Mean = 13, right panel Mean = 14]
Measures of Central Tendency : Median
• The median is the value that divides the data into two parts: 50% of the observations
have values less than the median and 50% of the observations have values
greater than the median.
• The location of the median when the values are in numerical order
(smallest to largest):
\[ \text{Median position} = \frac{n + 1}{2} \text{ position in the ordered data} \]
[Figure: two dot plots on an axis from 11 to 20; Median = 13 in both panels]
A sample of 10 adults was asked to report the number of hours they spent on the internet the
previous month. The results are listed here. Calculate the sample mean and Median.
0 7 12 5 33 14 8 0 9 22
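The sample mean is the sum of the ten values divided by 10: 110/10 = 11.0.
In order, the values are 0, 0, 5, 7, 8, 9, 12, 14, 22, 33. The median is the average of
the fifth and sixth observations (the middle two), which are 8 and 9, respectively.
Thus, the median is 8.5.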
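The calculation above can be checked with a short sketch using Python's standard `statistics` module. The variable names are illustrative and not part of the original example.

```python
from statistics import mean, median

# Hours spent on the internet by the 10 adults in the example.
hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]

# Mean: sum of all entries divided by the number of entries.
sample_mean = mean(hours)

# Median: with n = 10 (even), the average of the 5th and 6th ordered values.
sample_median = median(hours)

print("mean:", sample_mean)
print("median:", sample_median)
```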
Measures of Central Tendency : The Mode
[Figure: two dot plots; left panel on an axis from 0 to 14, Mode = 9; right panel on an axis from 0 to 6, No Mode]
Copyright © 2017 Pearson Education, Ltd.
Who wins between Mean, Median, and Mode?
Out of the three measures to choose from, which one should we use?
• The mean is generally our first selection. However, there are several
circumstances when the median is better.
• The mode is seldom the best measure of central location.
• One advantage the median holds is that it is not as sensitive to
extreme values as the mean is.
In the internet-usage sample, every observation except 0 occurs once, and 0 occurs
twice. Thus, the mode is 0. This is a poor measure of central location here: it
is nowhere near the center of the data. Compare it with the mean (11.0) and the
median (8.5), both of which are better measures for these data.
Activity
The prices (in dollars) for a sample of roundtrip flights from Chicago, Illinois to
Cancun, Mexico are listed. What are the mean, median, and mode of the flight
prices?
1872 432 397 427 388 482 397 358 432
Mean = 5185/9 = 576.111
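As a way to check an answer to the activity, here is a minimal sketch using Python's standard `statistics` module. It also illustrates the earlier point about extreme values: the single 1872 fare pulls the mean well above most of the prices, while the median is unaffected. `multimode` is used because more than one price may tie for most frequent.

```python
from statistics import mean, median, multimode

# Flight prices (in dollars) from the activity.
prices = [1872, 432, 397, 427, 388, 482, 397, 358, 432]

price_mean = mean(prices)                 # 5185 / 9
price_median = median(prices)             # middle (5th) value of the 9 ordered prices
price_modes = sorted(multimode(prices))   # every price that occurs most often

print("mean:", round(price_mean, 3))
print("median:", price_median)
print("mode(s):", price_modes)
```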
Variation
Measures of variation give information on the spread or variability of
the data values, which measures of location fail to convey.
[Figure: distributions with the same centre but different variation]
Measures of Variation: The Range
Example:
[Figure: dot plot on an axis from 0 to 14]
Range = largest value - smallest value = 14 - 2 = 12
Potential problem with Range?
Once again let us think about the following example on grades.
Grades of course 1: {4, 4, 4, 4, 50}.
Grades of course 2: {4, 8, 15, 24, 39, 50}.
The range is 46 in both courses (50 - 4), but the two courses have very
different distributions.
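A minimal sketch of this comparison in Python. The helper name `value_range` is illustrative. The range looks only at the two extreme values, so it cannot distinguish the two courses.

```python
def value_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)

course_1 = [4, 4, 4, 4, 50]
course_2 = [4, 8, 15, 24, 39, 50]

range_1 = value_range(course_1)
range_2 = value_range(course_2)

# Both ranges are equal even though the grades are distributed very differently.
print(range_1, range_2)
```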
Deviation
The difference between a data entry, x, and the mean of the
data set: deviation of x = x - μ.
Population variance:
\[ \sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N} \]
Where
μ = population mean, N = population size
Xi = ith value of the variable X
Numerical Descriptive Measures for a Population: Standard
Deviation σ
Most commonly used measure of variation.
Shows average variation about the mean.
Is the square root of the population variance.
Has the same units as the original data.
Population standard deviation:
\[ \sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}} \]
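The population formulas can be sketched directly in Python. As an assumption made only for illustration, the five grades of course 1 from the range example are treated here as a complete population. The function names are hypothetical.

```python
from math import sqrt

def population_variance(values):
    """Average squared deviation from the population mean (divide by N)."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n

def population_std_dev(values):
    """Square root of the population variance; same units as the data."""
    return sqrt(population_variance(values))

# Assumption: these five grades are the whole population.
grades = [4, 4, 4, 4, 50]

sigma_squared = population_variance(grades)
sigma = population_std_dev(grades)

print("variance:", round(sigma_squared, 2))
print("standard deviation:", round(sigma, 2))
```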
Measures of Variation: Sample Variance
Average (approximately) of squared deviations of values from the
mean.
Sample variance:
\[ S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} \]
Where
\( \bar{X} \) = arithmetic mean
n = sample size
\( X_i \) = ith value of the variable X
Measures of Variation: Sample Standard Deviation
Most commonly used measure of variation.
Shows average variation about the mean.
Is the square root of the variance.
Has the same units as the original data.
Sample standard deviation:
\[ S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} \]
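To see the effect of dividing by n - 1 instead of N, the sketch below applies Python's standard `statistics` functions to the internet-usage sample from the median example. `variance` and `stdev` use the sample (n - 1) formulas; `pvariance` uses the population (N) formula and is shown only for contrast.

```python
from statistics import pvariance, stdev, variance

# Internet-usage sample (hours) from the earlier example; sample mean is 11.
hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]

s_squared = variance(hours)      # sum of squared deviations / (n - 1)
s = stdev(hours)                 # square root of the sample variance
divide_by_n = pvariance(hours)   # same sum of squares divided by n, for contrast

print("sample variance:", round(s_squared, 3))
print("sample standard deviation:", round(s, 3))
print("variance if divided by n:", round(divide_by_n, 3))
```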
Interpreting Standard Deviation
Standard deviation is a measure of the typical amount an entry
deviates from the mean.
The more the entries are spread out, the greater the
standard deviation.
Measures of Variation: Comparing Standard
Deviations
• What can you say about the distribution of grades if the histogram is bell-shaped?
• By the empirical rule, approximately 68% of the marks fell between 65 and 75,
approximately 95% of the marks fell between 60 and 80, and
approximately 99.7% of the marks fell between 55 and 85. These intervals are the
mean plus or minus 1, 2, and 3 standard deviations, which implies a mean of 70
and a standard deviation of 5.
• What can you say about the distribution of grades if the shape of the
histogram is not known?
• By Chebyshev's theorem, at least 1 - 1/k² of the marks lie within k standard
deviations of the mean. So at least 75% of the marks fell between 60 and 80 (k = 2),
and at least 88.9% of the marks fell between 55 and 85 (k = 3).
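The intervals and percentages above can be reproduced with a short sketch. The mean of 70 and standard deviation of 5 are assumptions inferred from the stated intervals (65 to 75 is one standard deviation either side of the mean). The bound 1 - 1/k² is Chebyshev's theorem, and the function names are illustrative.

```python
# Assumed values, inferred from the intervals quoted in the text.
mean_mark = 70
sd_mark = 5

def interval(k):
    """Marks within k standard deviations of the mean."""
    return (mean_mark - k * sd_mark, mean_mark + k * sd_mark)

def chebyshev_lower_bound(k):
    """Minimum proportion within k standard deviations, for any distribution shape."""
    return 1 - 1 / k ** 2

for k in (2, 3):
    low, high = interval(k)
    print(f"k={k}: at least {chebyshev_lower_bound(k):.1%} of marks in [{low}, {high}]")
```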
The Coefficient of Variation (CV)
Measures relative variation
Always in percentage (%)
Shows variation relative to mean
Is the standard deviation divided by the mean, multiplied by 100%
Comparing Coefficients of Variation
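Stock A:
Average price last year = $50
Standard deviation = $5
\[ CV_A = \left( \frac{S}{\bar{X}} \right) \cdot 100\% = \frac{5}{50} \cdot 100\% = 10\% \]
Stock B:
Average price last year = $100
Standard deviation = $5
\[ CV_B = \left( \frac{S}{\bar{X}} \right) \cdot 100\% = \frac{5}{100} \cdot 100\% = 5\% \]
Both stocks have the same standard deviation, but stock B varies less relative to its price.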
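A minimal sketch of the coefficient of variation as a Python function, applied to the two stocks. The function and parameter names are illustrative.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation relative to the mean, expressed as a percentage."""
    return 100 * std_dev / mean

cv_a = coefficient_of_variation(std_dev=5, mean=50)    # Stock A
cv_b = coefficient_of_variation(std_dev=5, mean=100)   # Stock B

print(f"CV_A = {cv_a:.0f}%, CV_B = {cv_b:.0f}%")
```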
Measures of Variation:
Summary Characteristics
The more the data are spread out, the greater the range, variance,
and standard deviation.
The more the data are concentrated, the smaller the range, variance,
and standard deviation.
If the values are all the same (no variation), all these measures will be
zero.
Measures of variability can be used for interval data. For ordinal data, use the
interquartile range (IQR).
Measure of Relative Standing
[Figure: number line marking Q1, Q2, and Q3]
Solution:
• Q2 divides the data set into two halves.
Ordered data (n = 15): 6 7 8 10 11 15 17 18 18 19 20 31 54 59 104
Lower half: 6 7 8 10 11 15 17
Q2: 18
Upper half: 18 19 20 31 54 59 104
• With n = 15, the quartile positions are multiples of (n + 1)/4 = 16/4 = 4.
The first quartile is at the 4th position (Q1 = 10), the second quartile is at the
(16 × 2)/4 = 8th position (Q2 = 18), and the third quartile is at the
(16 × 3)/4 = 12th position (Q3 = 31).
The interquartile range (IQR) measures the range of the middle 50% of the data,
showing how spread out the data are.
It is the difference between the third and first quartiles.
IQR = Q3 – Q1
Large values of this statistic mean that the 1st and 3rd quartiles are
far apart indicating a high level of variability.
Find the interquartile range of the data set. Recall Q1 = 10, Q2 = 18,
and Q3 = 31
Solution:
• IQR = Q3 – Q1 = 31 – 10 = 21
The number of power plants in the middle portion of the data set varies by at
most 21.
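The quartiles and IQR can be checked with Python's standard `statistics.quantiles`. Its `"exclusive"` method places the quartiles at multiples of (n + 1)/4, the same position rule used in the solution. The list name `plants` is illustrative.

```python
from statistics import quantiles

# The 15 ordered values from the quartile example.
plants = [6, 7, 8, 10, 11, 15, 17, 18, 18, 19, 20, 31, 54, 59, 104]

# n=4 asks for quartiles; "exclusive" uses positions k * (len + 1) / 4.
q1, q2, q3 = quantiles(plants, n=4, method="exclusive")

iqr = q3 - q1  # spread of the middle 50% of the data

print("Q1, Q2, Q3:", q1, q2, q3)
print("IQR:", iqr)
```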
Describing Relationship between Two
Variables
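Sample coefficient of correlation:
\[ r = \frac{\operatorname{cov}(X, Y)}{S_X S_Y} \]
where
\[ \operatorname{cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}, \quad S_X = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}, \quad S_Y = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}} \]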
Because we’ve already calculated the covariances we need to compute only the standard deviations of X
and Y.
For Set 1: Strong positive linear relationship
For Set 2: Strong negative linear relationship
For Set 3: Weak negative linear relationship
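The data for Sets 1 to 3 are not reproduced here, so the sketch below uses small hypothetical data sets to show how the coefficient of correlation, r = cov(X, Y) / (S_X · S_Y), is computed from the sample covariance and the two sample standard deviations. The function name and the data are assumptions made for illustration only.

```python
from math import sqrt

def sample_correlation(xs, ys):
    """r = cov(X, Y) / (S_X * S_Y), using n - 1 in every denominator."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return cov / (s_x * s_y)

# Hypothetical data, not taken from the slides.
xs = [1, 2, 3, 4, 5]
ys_rising = [2, 4, 6, 8, 10]    # y increases exactly linearly with x
ys_falling = [10, 8, 6, 4, 2]   # y decreases exactly linearly with x

r_rising = sample_correlation(xs, ys_rising)
r_falling = sample_correlation(xs, ys_falling)

print("rising:", round(r_rising, 3))
print("falling:", round(r_falling, 3))
```

Values of r near +1 indicate a strong positive linear relationship, values near -1 a strong negative one, and values near 0 a weak linear relationship.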