CH03 - Descriptive Statistics 2
CHAPTER 3
Descriptive Statistics
• Measurements:
Mean
Mode
Median
Mean
• Mean is the sum of the observations divided by
the number of observations.
• It is the most common measure of central
tendency.
Sample mean:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
Population mean:
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$
Example
• The mean is 15
• The mean is unique for every set of data.
• Meaningful for interval and ratio data.
• Can be affected by outliers – rare observations that
are radically different from the rest.
• Example: 3, 4, 6, 4, 7, 3, 6, 5, 1500
• Mean: 170.89
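The effect of the outlier can be checked with a short Python sketch. It uses the nine values from the example above; the variable names are illustrative only.

```python
from statistics import mean

# Data from the example: eight ordinary values plus one extreme outlier.
data = [3, 4, 6, 4, 7, 3, 6, 5, 1500]

mean_with_outlier = mean(data)
mean_without_outlier = mean(data[:-1])  # drop the last value, 1500

print(round(mean_with_outlier, 2))  # 170.89
print(mean_without_outlier)         # 4.75
```

A single extreme value moves the mean from 4.75 to about 170.89, far from every ordinary observation.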
Mean of Grouped Data
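$$\bar{x} = \frac{\sum_{i=1}^{h} f_i x_i}{n} = \frac{f_1 x_1 + f_2 x_2 + \dots + f_h x_h}{f_1 + f_2 + \dots + f_h}$$
where,
f_i : frequency in class i or frequency of an observed
value.
x_i : class midpoint or an observed value.
h : number of classes or number of distinct observed values.
n : total number of observations, n = f_1 + f_2 + ... + f_h.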
Example
Find the mean value for the following data:
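Number of children (x):   1    2    3    4    5    6    7
Frequency (f):            5   12    8    3    0    0    1
Solution:
∑ f x = 5(1) + 12(2) + 8(3) + 3(4) + 0(5) + 0(6) + 1(7) = 72
n = 5 + 12 + 8 + 3 + 0 + 0 + 1 = 29
$$\bar{x} = \frac{\sum_{i=1}^{h} f_i x_i}{n} = \frac{72}{29} \approx 2.48$$
That is, about 2.5 children.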
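The same calculation as a minimal Python sketch. The function name `grouped_mean` is made up for illustration; the values and frequencies are those of the number-of-children table.

```python
# Frequency table from the example: x = number of children, f = frequency.
values = [1, 2, 3, 4, 5, 6, 7]
freqs = [5, 12, 8, 3, 0, 0, 1]


def grouped_mean(xs, fs):
    """Return sum(f * x) / sum(f) for paired values and frequencies."""
    total_fx = sum(f * x for x, f in zip(xs, fs))
    n = sum(fs)  # total number of observations
    return total_fx / n


total_fx = sum(f * x for x, f in zip(values, freqs))
n = sum(freqs)

print(total_fx)                               # 72
print(n)                                      # 29
print(round(grouped_mean(values, freqs), 2))  # 2.48
```

For class intervals, the same function applies once each class is represented by its midpoint.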
Example
Find the mean value for the following data:
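Class interval   Frequency
41 - 50           7
51 - 60          10
61 - 70          15
71 - 80           2
81 - 90           6
Total            40
Solution:
Using the class midpoints 45.5, 55.5, 65.5, 75.5 and 85.5 as x,
∑ f x = 7(45.5) + 10(55.5) + 15(65.5) + 2(75.5) + 6(85.5) = 2520
$$\bar{x} = \frac{\sum_{i=1}^{h} f_i x_i}{n} = \frac{2520}{40} = 63$$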
Median
• The median is the middle value when the data are
arranged from smallest to largest.
• To find the median, your numbers have to be listed in
an order, so you may have to rewrite your list first.
• For an odd number of observations, the formula for
the place to find the median is
([the number of data points] + 1) ÷ 2
Example
• Arrange in order: 13, 13, 13, 13, 14, 14, 16, 18, 21
• With 9 observations, the median is at position
(9 + 1) ÷ 2 = 5, so the median is 14.
• For an even number of observations, the median is
the mean of the two middle numbers.
• Example:
2, 3, 3, 5, 6, 7, 8, 9
(5 + 6) ÷ 2 = 5.5 (median)
• The median is meaningful for ratio, interval, and
ordinal data.
• The median is not affected by outliers.
• Example: 3, 4, 6, 4, 7, 3, 6, 5, 1500
• Arranged in order: 3, 3, 4, 4, 5, 6, 6, 7, 1500
• Median: 5
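The odd and even cases can be sketched in a few lines of Python. The helper name `median` is illustrative (the standard library's `statistics.median` behaves the same way); the three data sets are the ones used in the median examples.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                     # position (n + 1) / 2, counting from 1
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([13, 13, 13, 13, 14, 14, 16, 18, 21]))  # 14
print(median([2, 3, 3, 5, 6, 7, 8, 9]))              # 5.5
print(median([3, 4, 6, 4, 7, 3, 6, 5, 1500]))        # 5
```

In the last call, the outlier 1500 only occupies the final position of the sorted list, so the middle value stays at 5.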
Median of Grouped Data
• The median for the grouped data is given by
$$\text{Median} = L + \left(\frac{\frac{N}{2} - cf_p}{f_{med}}\right) W$$
where,
(The median class is the first class whose cumulative frequency is at
least N/2.)
L : lower limit of median class,
N : total number of observations,
cfp : cumulative frequency of the class preceding the median class,
fmed : frequency of the median class,
W : median class size.
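A minimal Python sketch of the grouped-median calculation follows. Two things here are assumptions rather than part of the slides: the function name `grouped_median`, and the choice to take L as the lower class boundary (lower limit minus 0.5), which suits integer class limits. The data are the class-interval table from the earlier grouped-mean example, whose listed frequencies sum to 40.

```python
def grouped_median(classes, freqs):
    """Median of grouped data.

    classes: (lower_limit, upper_limit) pairs with integer limits, ascending.
    freqs:   matching class frequencies.
    Assumption: L is the lower class boundary, i.e. lower limit - 0.5.
    """
    n = sum(freqs)
    if n == 0:
        raise ValueError("frequency table is empty")
    cumulative = 0  # cumulative frequency of the classes before the current one
    for (lower, upper), f in zip(classes, freqs):
        if cumulative + f >= n / 2:         # first class reaching N/2
            width = upper - lower + 1       # class size W, e.g. 41-50 -> 10
            return (lower - 0.5) + (n / 2 - cumulative) / f * width
        cumulative += f


classes = [(41, 50), (51, 60), (61, 70), (71, 80), (81, 90)]
freqs = [7, 10, 15, 2, 6]

print(grouped_median(classes, freqs))  # 62.5
```

Here N/2 = 20, the cumulative frequencies are 7, 17, 32, so the median class is 61 - 70 and the median is 60.5 + (20 - 17)/15 × 10 = 62.5.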
Example
[Table not recovered. Columns: Class, Class boundary, Frequency, Cumulative frequency. Total = 40]
N ÷ 2 = 40 ÷ 2 = 20
∴ median class = 51 - 55
Median = 51.5
Mode
• The mode is the value that occurs most frequently in a
data set.
Example
Case study: On a cold winter day in January, the
temperature for 9 North American cities is recorded in
Fahrenheit as follows:
-8, 0, -3, 4, 12, 0, 5, -1, 0
What is the mode of these temperatures?
Solution:
Ordering the data from lowest to highest, we get:
-8, -3, -1, 0, 0, 0, 4, 5, 12
The mode of these temperatures is 0.
Example
Case study: A marathon race was completed by 5
participants. The time taken by each participant is
recorded as follows:
2.7 hr, 8.3 hr, 3.5 hr, 5.1 hr, 4.9 hr
What is the mode of these times given in hours?
Solution:
Ordering the data from least to greatest, we get:
2.7, 3.5, 4.9, 5.1, 8.3
Since each value occurs only once in the data set, there is
no mode for this set of data.
Example
Case study: In a crash test, 11 cars were tested to
determine what impact speed was required to obtain
minimal bumper damage. The collected data are shown
below:
24, 15, 18, 20, 18, 22, 24, 26, 18, 26, 24
Find the mode of the speeds given in miles per hour.
Solution:
Ordering the data from least to greatest, we get:
15, 18, 18, 18, 20, 22, 24, 24, 24, 26, 26
Since both 18 and 24 occur three times, the modes are 18
and 24 miles per hour. This data set is bimodal.
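The three case studies can be reproduced with a small Python sketch. The helper `modes` is a hypothetical name; it follows the convention used in these examples, where a data set in which every value occurs once has no mode.

```python
from collections import Counter


def modes(values):
    """Return all most-frequent values, or [] if every value occurs only once."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # no value repeats, so there is no mode
    return sorted(v for v, c in counts.items() if c == highest)


temperatures = [-8, 0, -3, 4, 12, 0, 5, -1, 0]
marathon_times = [2.7, 8.3, 3.5, 5.1, 4.9]
crash_speeds = [24, 15, 18, 20, 18, 22, 24, 26, 18, 26, 24]

print(modes(temperatures))    # [0]
print(modes(marathon_times))  # []
print(modes(crash_speeds))    # [18, 24]
```

Note that Python's built-in `statistics.multimode` uses a different convention: when no value repeats, it returns every value instead of an empty list.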
Mode of Grouped Data
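• The mode of grouped data is calculated using the
formula:
$$\text{Mode} = l + \left(\frac{f_1 - f_0}{2 f_1 - f_0 - f_2}\right) h$$
where,
l : lower boundary of the modal class,
h : class size,
f1 : frequency of the modal class,
f0 : frequency of the class just below the modal class,
f2 : frequency of the class just above the modal class.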
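A common textbook form of the grouped-mode formula is Mode = l + (f1 - f0) / (2 f1 - f0 - f2) × h, and the sketch below applies it in Python. The function name `grouped_mode`, the use of the lower class boundary (lower limit minus 0.5) for l, and the treatment of a missing neighbouring class as frequency 0 are assumptions made for illustration. The data are the class-interval table from the earlier grouped-mean example.

```python
def grouped_mode(classes, freqs):
    """Mode of grouped data.

    classes: (lower_limit, upper_limit) pairs with integer limits, ascending.
    freqs:   matching class frequencies.
    Assumptions: l = lower limit - 0.5; a missing neighbour counts as 0.
    """
    i = max(range(len(freqs)), key=lambda k: freqs[k])    # index of the modal class
    lower, upper = classes[i]
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0                     # class just below
    f2 = freqs[i + 1] if i + 1 < len(freqs) else 0        # class just above
    h = upper - lower + 1                                 # class size
    return (lower - 0.5) + (f1 - f0) / (2 * f1 - f0 - f2) * h


classes = [(41, 50), (51, 60), (61, 70), (71, 80), (81, 90)]
freqs = [7, 10, 15, 2, 6]

print(round(grouped_mode(classes, freqs), 2))  # 63.28
```

The modal class is 61 - 70 with f1 = 15, f0 = 10 and f2 = 2, giving 60.5 + (5/18) × 10 ≈ 63.28. The sketch does not guard against the case f1 = f0 = f2, where the denominator is zero.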
Example
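Compute the mode of the test scores below.
Score (Class)   Class boundary   Frequency
41 - 45                            1
36 - 40                            5
31 - 35                            7
26 - 30         25.5 - 30.5       16
21 - 25                            8
16 - 20                            2
Total                             39
∴ modal class = 26 - 30
l = 26 − 0.5 = 25.5
h = 5
f1 = 16 (modal class, 26 - 30)
f0 = 8 (class just below the modal class, 21 - 25)
f2 = 7 (class just above the modal class, 31 - 35)
The table lists the classes in descending order, so the class just below
the modal class is the row that follows it.
$$\text{Mode} = 25.5 + \left(\frac{16 - 8}{2(16) - 8 - 7}\right)(5) = 25.5 + \frac{40}{17} \approx 27.85$$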
Exercise #1
The owner of a shoe shop recorded the sizes of the feet of
all the customers who bought shoes in his shop in one
morning. These sizes are listed below:
Exercise #2
The table below gives the number of accidents each year
at a particular road junction:
Year:        2001  2002  2003  2004  2005  2006  2007  2008
Accidents:      4     5     4    52    10     5     3     5
(a) Calculate the mean, median and mode for the values above.
(b) A road safety group wants the council to carry out improvements
to make this junction safer. Which measure will they use to
argue for this?
(c) The council does not want to spend money on the road junction.
Which measure will they use to argue that safety work is not
necessary?
Exercise #3
You grew fifty baby carrots using special soil. You dig
them up and measure their lengths (to the nearest mm)
and group the results:
Length (mm) Class   Frequency
150 – 154            5
155 – 159            2
160 – 164            6
165 – 169            8
170 – 174            9
175 – 179           11
180 – 184            6
185 – 189            3
Find the following:
(a) mean
(b) median
(c) mode
Exercise #4
Find the mean, median and mode corresponding to the
frequency table of samples of students' cars and staff cars
obtained from a college.
Data Profiles
• Percentile
• Quartile
Percentile
In a population or a sample, the P-th percentile is a
value such that at least P percent of the values take on
this value or less and at least (100-P) percent of the
values take on this value or more.
• Sort the data set so measurements are in order from
lowest to highest,
Y[1], Y[2], …. , Y[N]
• Calculate,
$$i = \frac{P}{100} \times N$$
Example
Given a set of data: 12, 4, 6, 11, 9,15, 20, 18, 25, 30
• N = 10, P = 80; i = (80 × 10) ÷ 100 = 8
Example
Sorted data: 4 6 9 11 12 15 18 20 25 30
• N = 10, P = 68; i = (68 × 10) ÷ 100 = 6.8, rounded up to k = 7
• The 68th percentile is Y[7] = 18
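The locator calculation can be sketched in Python as below. The function name `percentile_value` is hypothetical. The rule for a non-integer i (round up to k) comes from the example above; the rule for an integer i (average Y[i] and Y[i+1]) is not shown in these slides and is an assumption taken from a common textbook convention.

```python
def percentile_value(values, p):
    """Value at the p-th percentile (integer p, 0 < p < 100) using i = (P/100) * N."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)
    n = len(ordered)
    whole, remainder = divmod(p * n, 100)    # i = whole + remainder / 100
    if remainder == 0:
        # ASSUMPTION: for an integer i, average Y[i] and Y[i+1] (1-based positions).
        return (ordered[whole - 1] + ordered[whole]) / 2
    k = whole + 1                            # round i up to the next integer
    return ordered[k - 1]                    # Y[k], converted to a 0-based index


data = [12, 4, 6, 11, 9, 15, 20, 18, 25, 30]

print(percentile_value(data, 68))  # 18   (i = 6.8, k = 7)
print(percentile_value(data, 80))  # 22.5 (i = 8, average of Y[8] = 20 and Y[9] = 25)
```

Integer arithmetic with `divmod` is used so that the test "is i a whole number?" does not depend on floating-point rounding.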
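• The process of finding the percentile that
corresponds to a particular value x is:
$$\text{percentile of value } x = \frac{\text{number of values less than } x}{\text{total number of values}} \times 100$$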
Example
Given a set of data as follows:
12 4 6 11 9 15 20 18 25 30
Find the percentile corresponding to the value,
Y[k] = 15
Solution:
Arrange the data in order,
4 6 9 11 12 15 18 20 25 30
$$\text{percentile of value } 15 = \frac{\text{number of values less than } 15}{\text{total number of values}} (100) = \frac{5}{10} (100) = 50$$
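The same count-and-divide step as a short Python sketch; `percentile_of_value` is an illustrative name, and the data are the ten values from this example.

```python
def percentile_of_value(values, x):
    """Percentage of the data values that are strictly less than x."""
    below = sum(1 for v in values if v < x)
    return below / len(values) * 100


data = [12, 4, 6, 11, 9, 15, 20, 18, 25, 30]

print(percentile_of_value(data, 15))  # 50.0
```

Sorting is not required here, because only the count of smaller values matters.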
The 1st, 2nd, and 3rd quartiles are the 25th, 50th, and
75th percentiles respectively.
Example
Q1 – 25th percentile
Q2 – 50th percentile (median)
Q3 – 75th percentile
Exercise #5
0.7901 0.8044 0.8062 0.8073 0.8079 0.8110
0.8126 0.8128 0.8143 0.8150 0.8150 0.8152
0.8152 0.8161 0.8161 0.8163 0.8165 0.8170
(a) Use the 18 sorted (left to right) weights of regular can drinks to
find the percentile corresponding to the given value.
i. 0.8143
ii. 0.8062
(b) Find the indicated percentile and quartile.
i. P80
ii. Q3
iii. P33
iv. Q1
Measures of Dispersion
Range
• The range is the largest number in a set minus the
smallest number.
Variance
• A measure of the dispersion of a set of data points
around their mean value.
• Sample:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
• Population:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
Standard Deviation
• A statistic used as a measure of the dispersion or
variation in a distribution, equal to the square root of
the arithmetic mean of the squares of the deviations
from the arithmetic mean.
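The range, variance and standard deviation can be computed with Python's standard `statistics` module, as in this sketch. It reuses the ten values from the percentile example; treating them once as a sample and once as a whole population is done purely to show both divisors.

```python
import statistics

data = [12, 4, 6, 11, 9, 15, 20, 18, 25, 30]

data_range = max(data) - min(data)         # largest minus smallest
sample_var = statistics.variance(data)     # divides by n - 1
pop_var = statistics.pvariance(data)       # divides by N
sample_sd = statistics.stdev(data)         # square root of the sample variance
pop_sd = statistics.pstdev(data)           # square root of the population variance

print(data_range)            # 26
print(round(sample_var, 2))  # 69.11
print(round(pop_var, 2))     # 62.2
print(round(sample_sd, 2))   # 8.31
print(round(pop_sd, 2))      # 7.89
```

The mean of these values is 15 and the squared deviations sum to 622, so the sample variance is 622/9 and the population variance is 622/10.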
Exercise #6
Adam has been playing golf on the weekends for the past
three years. Recently, he started keeping track of his
recorded scores. His scores for June and July at his favorite
9-hole (par 36) golf course are provided below:
45 49 42 56 41 36 34 38 41 45 40 42 41 39
38 40 39 36 41
Skewness
• Occurs when a distribution is not symmetrical about
its mean.
• In a symmetrical, single-peaked distribution the median,
mean, and mode are equal.
• In a positively skewed (skewed to the right) distribution
the mean typically exceeds the median.
• In a negatively skewed (skewed to the left) distribution
the mean is typically less than the median.
Measuring Skewness
$$\text{Skewness} = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^3}{(N - 1) s^3}$$
where \bar{x} = mean and s = standard deviation.
• For a normal distribution (a symmetric distribution):
skewness = 0.
• Any symmetric data should have:
A skewness value near zero.
Mean, median and mode that fall at
the same point.
Positive/Right Skewed
• Skewness > 0
The distribution is asymmetrical, with its longer tail pointing in the
positive direction.
Example: Test scores on a difficult examination where almost
everyone did poorly.
mode < median < mean
Negative/Left Skewed
• Skewness < 0
The distribution is asymmetrical, with its longer tail pointing in the
negative direction.
Example: Test scores on an easy examination where almost
everyone did well.
mode > median > mean
Kurtosis
$$\text{Kurtosis} = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^4}{(N - 1) s^4}$$
Excess Kurtosis
• Excess kurtosis is simply kurtosis − 3.
• A normal distribution has kurtosis exactly 3 (excess kurtosis
exactly 0). Any distribution with kurtosis ≈ 3 (excess ≈ 0) is
called mesokurtic.
• A distribution with kurtosis < 3 (excess kurtosis < 0) is called
platykurtic. Compared to a normal distribution, its tails are
shorter and thinner, and often its central peak is lower and
broader.
• A distribution with kurtosis > 3 (excess kurtosis > 0) is called
leptokurtic. Compared to a normal distribution, its tails are
longer and fatter, and often its central peak is higher and
sharper.
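The skewness and kurtosis formulas used in this chapter, Σ(x − x̄)³ / ((N − 1)s³) and Σ(x − x̄)⁴ / ((N − 1)s⁴), can be sketched in Python as follows. The function names are illustrative, s is taken to be the sample standard deviation (an assumption consistent with the N − 1 divisor), and the two small data sets are chosen only to show a symmetric case and a right-skewed case. Other software may use slightly different bias corrections, so its results need not match exactly.

```python
import statistics


def skewness(values):
    """Sum of cubed deviations from the mean, divided by (N - 1) * s**3."""
    n = len(values)
    m = statistics.mean(values)
    s = statistics.stdev(values)   # sample standard deviation (assumed meaning of s)
    return sum((x - m) ** 3 for x in values) / ((n - 1) * s ** 3)


def kurtosis(values):
    """Sum of fourth-power deviations from the mean, divided by (N - 1) * s**4."""
    n = len(values)
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return sum((x - m) ** 4 for x in values) / ((n - 1) * s ** 4)


symmetric = [1, 2, 3, 4, 5]
right_skewed = [3, 4, 6, 4, 7, 3, 6, 5, 1500]

print(skewness(symmetric))               # 0.0
print(round(kurtosis(symmetric), 2))     # 1.36
print(round(kurtosis(symmetric) - 3, 2)) # -1.64 (excess kurtosis < 0: platykurtic)
print(skewness(right_skewed) > 0)        # True
```

The symmetric set has skewness exactly 0 because the cubed deviations cancel, while the single large value 1500 gives the second set a positive skewness.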
Example
Example - Solution
$$\text{Skewness} = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^3}{(N - 1) s^3} = \frac{\sum (x_i - 21.52)^3}{(15 - 1) s^3} = \frac{-8.245}{(14)(3.18)^3} = -0.0183$$
Interpretation: The skewness here is -0.0183. Because the value is negative, the
distribution of the data is skewed to the left (negatively skewed). Because the value
is close to zero, the skew is only slight.
$$\text{Kurtosis} = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^4}{(N - 1) s^4} = \frac{\sum (x_i - 21.52)^4}{(15 - 1) s^4} = \frac{3086.1}{(14)(3.18)^4} \approx 2.15$$