Lecture Notes 3
3- Data Description
After the previous stages of collecting, organizing, and presenting data, the next stage is summarizing the data,
which can be done with various statistical methods.
Statisticians use samples taken from populations; however, when populations are small, it is not necessary
to use samples since the entire population can be used to gain information. Measures found by using all the data
values in the population are called parameters. Measures obtained by using the data values from samples are
called statistics.
A statistic is a characteristic or measure obtained by using the data values from a sample.
A parameter is a characteristic or measure obtained by using all the data values from a specific population.
(Roman letters are generally used to denote sample statistics, and Greek letters to denote population
parameters.)
[Diagram: Descriptive Statistics, applied to Ungrouped Data and Grouped Data]
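The Mean
Case 1: Ungrouped Data
a) For a sample with size n, the mean is given as:
$$\bar{X} = \frac{X_1 + X_2 + X_3 + \cdots + X_n}{n} = \frac{\sum_{i=1}^{n} X_i}{n}$$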
where n represents the total number of values in the sample.
PHM111s - Probability and Statistics
b) For a population with size N, the mean is given as:
$$\mu = \frac{X_1 + X_2 + X_3 + \cdots + X_N}{N} = \frac{\sum_{i=1}^{N} X_i}{N}$$
where N represents the total number of values in the population.
Example 3–1: The data represent the number of days off per year for a sample of individuals selected from nine
different countries. Find the mean.
20, 26, 40, 36, 23, 42, 35, 24, 30
Solution:
$$\bar{X} = \frac{\sum X}{n} = \frac{20 + 26 + 40 + 36 + 23 + 42 + 35 + 24 + 30}{9} = \frac{276}{9} \approx 30.7 \text{ days}$$
Hence, the mean of the number of days off is 30.7 days.
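The arithmetic in Example 3–1 can be checked with a short Python sketch. The function name `sample_mean` and the variable `days_off` are illustrative choices, not part of these notes.

```python
def sample_mean(values):
    """Return the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Data from Example 3-1: days off per year in nine countries.
days_off = [20, 26, 40, 36, 23, 42, 35, 24, 30]

total = sum(days_off)            # numerator of the formula
mean = sample_mean(days_off)     # total divided by n = 9
print(total, round(mean, 1))     # prints: 276 30.7
```

The same function applies to a whole population; only the symbol changes (µ instead of X̄, N instead of n).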
Case 2: Grouped Data
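$$\bar{X} = \frac{\sum_{i} f_i X_i}{\sum_{i} f_i} = \frac{1}{n} \sum_{i} f_i X_i$$
where the sums run over all classes, $f_i$ is the frequency of class i, $X_i$ is its midpoint, and $n = \sum_{i} f_i$ is the
total number of values.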
Example 3–2: The following table gives the frequency distribution of the number of orders received each day
during the past 50 days at the office of a mail-order company. Calculate the mean.
Number of orders    f
10 – 12             4
13 – 15             12
16 – 18             20
19 – 21             14
                    n = 50
Solution:
Number of orders    f         X     f · X
10 – 12             4         11    44
13 – 15             12        14    168
16 – 18             20        17    340
19 – 21             14        20    280
                    n = 50          ∑ f · X = 832
X is the midpoint of the class. It is obtained by adding the two class limits and dividing by 2; for example,
(10 + 12) / 2 = 11 for the first class.
$$\bar{X} = \frac{\sum f \cdot X}{n} = \frac{832}{50} = 16.64$$
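The grouped-data calculation in Example 3–2 can be sketched in Python as follows. The representation of each class as a `(lower_limit, upper_limit, frequency)` tuple and the name `grouped_mean` are assumptions made for illustration.

```python
def grouped_mean(classes):
    """Mean of grouped data.

    classes: list of (lower_limit, upper_limit, frequency) tuples.
    Each class is represented by its midpoint, (lower + upper) / 2.
    """
    n = sum(f for _, _, f in classes)                    # total frequency
    if n == 0:
        raise ValueError("total frequency must be positive")
    weighted_sum = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
    return weighted_sum / n


# Frequency distribution from Example 3-2.
orders = [(10, 12, 4), (13, 15, 12), (16, 18, 20), (19, 21, 14)]

midpoints = [(lo + hi) / 2 for lo, hi, _ in orders]      # 11, 14, 17, 20
print(midpoints)
print(grouped_mean(orders))                              # prints: 16.64
```

Because every value in a class is replaced by the class midpoint, the result is an approximation of the mean of the underlying raw data.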
The Median
Case 1: Ungrouped Data
The median is the middle value in an ordered sequence of data. The symbol for the median is MD.
The positioning-point formula $\frac{n + 1}{2}$ is used to find the place in the ordered array that corresponds to the median value.
1. If there is an odd number of observations in the data set, the median is the numerical value
corresponding to the positioning point, that is, the $\frac{n + 1}{2}$-th ordered observation.
2. If there is an even number of observations in the data set, the positioning point lies between
the two observations in the middle of the data set. The median is then the average of the numerical
values corresponding to these two middle observations.
Example 3–3: The number of rooms in the seven hotels in downtown Pittsburgh is 713, 300, 618, 595, 311, 401,
and 292. Find the median.
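Solution:
Arrange the data in order.
292, 300, 311, 401, 595, 618, 713
Since n = 7 is odd, the positioning point is (7 + 1)/2 = 4, so the median is the 4th ordered value.
Hence, the median is 401 rooms.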
Example 3–4: The number of cloudy days for the top 10 cloudiest cities is shown. Find the median.
209, 223, 211, 227, 213, 240, 240, 211, 229, 212
Solution:
Arrange the data in order.
209, 211, 211, 212, 213, 223, 227, 229, 240, 240
Since n = 10 is even, the median lies between the two middle values, 213 and 223.
$$MD = \frac{213 + 223}{2} = 218$$
Hence, the median is 218 days.
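The two rules for the median (odd and even n) can be sketched in Python. The function name `median` and the variable names are illustrative; the data are those of Examples 3–3 and 3–4.

```python
def median(values):
    """Median via the positioning point (n + 1) / 2 in the ordered data."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the (n + 1)/2-th ordered value (index mid when counting from 0).
        return ordered[mid]
    # Even n: average of the two middle ordered values.
    return (ordered[mid - 1] + ordered[mid]) / 2


rooms = [713, 300, 618, 595, 311, 401, 292]                      # Example 3-3
cloudy = [209, 223, 211, 227, 213, 240, 240, 211, 229, 212]      # Example 3-4

print(median(rooms))    # prints: 401
print(median(cloudy))   # prints: 218.0
```

Sorting first is essential: the positioning point refers to a place in the ordered array, not in the data as originally listed.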
Example 3–5: Based on the grouped data below, find the median.
Seconds    f
51 – 55    2
56 – 60    7
61 – 65    8
66 – 70    4
Solution:
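Construct the cumulative frequency distribution:
Seconds    f    Cumulative frequency
51 – 55    2    2
56 – 60    7    9
61 – 65    8    17
66 – 70    4    21
Here n = 21, so n/2 = 10.5. The median class is 61 – 65, the first class whose cumulative frequency (17) reaches 10.5.
$$MD = L + \frac{\frac{n}{2} - F_{m-1}}{f_m} \cdot i$$
where L = 60.5 is the lower class boundary of the median class, $F_{m-1}$ = 9 is the cumulative frequency of the class
before it, $f_m$ = 8 is the frequency of the median class, and i = 5 is the class width.
$$MD = 60.5 + \frac{10.5 - 9}{8} \cdot 5 = 60.5 + 0.9375 = 61.4375$$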
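The grouped-median calculation of Example 3–5 can be sketched in Python. This is only an illustration under stated assumptions: classes are given as `(lower_limit, upper_limit, frequency)` tuples in ascending order, and the class limits are integers one unit apart, so that the lower class boundary is the lower limit minus 0.5 and the class width is upper − lower + 1. The name `grouped_median` is hypothetical.

```python
def grouped_median(classes):
    """Median of grouped data: L + ((n/2 - F_prev) / f_m) * i.

    classes: ascending list of (lower_limit, upper_limit, frequency).
    Assumes integer limits with a gap of 1 between consecutive classes.
    """
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("total frequency must be positive")
    half = n / 2
    cumulative = 0                       # cumulative frequency before the current class
    for lo, hi, f in classes:
        if cumulative + f >= half:       # first class whose cumulative frequency reaches n/2
            lower_boundary = lo - 0.5    # assumption: boundaries lie 0.5 outside the limits
            width = hi - lo + 1
            return lower_boundary + (half - cumulative) / f * width
        cumulative += f
    raise ValueError("median class not found")


# Frequency distribution from Example 3-5.
times = [(51, 55, 2), (56, 60, 7), (61, 65, 8), (66, 70, 4)]
print(grouped_median(times))             # prints: 61.4375
```

The loop builds the cumulative frequencies 2, 9, 17, 21 on the fly, so no separate cumulative table is needed.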
The Mode
Case 1: Ungrouped Data
The value that occurs most often in a data set is called the mode.
• A data set that has only one value that occurs with the greatest frequency is said to be unimodal.
• If a data set has two values that occur with the same greatest frequency, both values are considered to be
the mode and the data set is said to be bimodal.
• If a data set has more than two values that occur with the same greatest frequency, each value is used as the
mode, and the data set is said to be multimodal.
• When no data value occurs more than once, the data set is said to have no mode.
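The four cases above can be sketched in Python with `collections.Counter`. The function names `modes` and `describe` are illustrative, and the data sets are reused from Examples 3–3 and 3–4.

```python
from collections import Counter


def modes(values):
    """Return the sorted list of modes, or [] when no value occurs more than once."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []                        # no value repeats: no mode
    return sorted(v for v, c in counts.items() if c == top)


def describe(values):
    """Classify a data set by the number of modes it has."""
    labels = {0: "no mode", 1: "unimodal", 2: "bimodal"}
    return labels.get(len(modes(values)), "multimodal")


cloudy = [209, 223, 211, 227, 213, 240, 240, 211, 229, 212]   # Example 3-4
rooms = [713, 300, 618, 595, 311, 401, 292]                   # Example 3-3

print(modes(cloudy), describe(cloudy))   # prints: [211, 240] bimodal
print(modes(rooms), describe(rooms))     # prints: [] no mode
```

In the cloudy-days data both 211 and 240 occur twice, so the data set is bimodal; in the hotel-rooms data every value is distinct, so there is no mode.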
The mode for grouped data lies in the modal class, the class with the largest frequency, and is estimated by:
$$\text{Mode} = L + \frac{f_m - f_{m-1}}{(f_m - f_{m-1}) + (f_m - f_{m+1})} \cdot i$$
where:
$f_{m-1}$ = the frequency of the class before the modal class
$f_{m+1}$ = the frequency of the class after the modal class
$f_m$ = the frequency of the modal class
i = the class width
L = the lower class boundary of the modal class
Example 3–7: Based on the grouped data below, find the mode.
Seconds    f
51 – 55    2
56 – 60    7
61 – 65    8
66 – 70    4
Solution:
The modal class is 61 – 65, so L = 60.5, $f_{m-1}$ = 7, $f_m$ = 8, $f_{m+1}$ = 4, i = 5.
$$\text{Mode} = 60.5 + \frac{8 - 7}{(8 - 7) + (8 - 4)} \cdot 5 = 60.5 + \frac{1}{5} \cdot 5 = 61.5 \text{ s}$$
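The grouped-mode formula used in Example 3–7 can be sketched in Python. As assumptions for this illustration, classes are `(lower_limit, upper_limit, frequency)` tuples with integer limits one unit apart (so the lower boundary is the lower limit minus 0.5), a missing neighbouring class is treated as having frequency 0, and ties for the largest frequency are resolved by taking the first such class. The name `grouped_mode` is hypothetical.

```python
def grouped_mode(classes):
    """Mode of grouped data: L + (d1 / (d1 + d2)) * i.

    d1 = f_m - f_(m-1), d2 = f_m - f_(m+1), for the modal class m.
    classes: ascending list of (lower_limit, upper_limit, frequency).
    """
    freqs = [f for _, _, f in classes]
    m = freqs.index(max(freqs))                          # first class with the largest frequency
    lo, hi, f_m = classes[m]
    f_prev = freqs[m - 1] if m > 0 else 0                # assumption: 0 if no class before
    f_next = freqs[m + 1] if m < len(freqs) - 1 else 0   # assumption: 0 if no class after
    d1 = f_m - f_prev
    d2 = f_m - f_next
    if d1 + d2 == 0:
        raise ValueError("formula undefined: neighbours have the same frequency as the modal class")
    lower_boundary = lo - 0.5                            # assumption: integer limits, gap of 1
    width = hi - lo + 1
    return lower_boundary + d1 / (d1 + d2) * width


# Frequency distribution from Example 3-7.
times = [(51, 55, 2), (56, 60, 7), (61, 65, 8), (66, 70, 4)]
print(grouped_mode(times))                               # prints: 61.5
```

The result is pulled toward the neighbouring class with the higher frequency: here the class before (7) is closer to the modal frequency (8) than the class after (4), so the mode sits near the lower boundary.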
The Midrange
The midrange is defined as the sum of the lowest and highest values in the data set, divided by 2. The symbol MR
is used for the midrange.
$$MR = \frac{\text{lowest value} + \text{highest value}}{2}$$
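The midrange formula can be sketched in Python. Since it needs only the two extreme values, the illustration below reuses the days-off data of Example 3–1; the function name `midrange` is an assumption.

```python
def midrange(values):
    """Return (lowest value + highest value) / 2."""
    if not values:
        raise ValueError("the midrange of an empty data set is undefined")
    return (min(values) + max(values)) / 2


# Data reused from Example 3-1 (lowest value 20, highest value 42).
days_off = [20, 26, 40, 36, 23, 42, 35, 24, 30]
print(midrange(days_off))    # prints: 31.0
```

Because only the extremes enter the formula, a single unusually large or small value shifts the midrange considerably.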
Example 3–8: Find the midrange of data:
The Weighted Mean
The weighted mean of a variable X is found by multiplying each value by its corresponding weight and dividing the
sum of the products by the sum of the weights.
$$\bar{X} = \frac{w_1 X_1 + w_2 X_2 + \cdots + w_n X_n}{w_1 + w_2 + \cdots + w_n} = \frac{\sum_{i=1}^{n} w_i X_i}{\sum_{i=1}^{n} w_i}$$
where w1, w2, . . . , wn are the weights and X1, X2, . . . , Xn are the values.
Solution
Course Credits (w) Grade (X)
$$\bar{X} = \frac{\sum_{i=1}^{n} w_i X_i}{\sum_{i=1}^{n} w_i} = \frac{(3)(4) + (3)(2) + (4)(3) + (2)(1)}{3 + 3 + 4 + 2} = \frac{32}{12} \approx 2.7$$
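The weighted-mean calculation above can be checked with a short Python sketch. The credits and grades are the numbers appearing in the solution; the names `weighted_mean`, `credits`, and `grades` are illustrative.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * X_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the sum of the weights must be nonzero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight


credit_hours = [3, 3, 4, 2]    # weights w
grades = [4, 2, 3, 1]          # values X

numerator = sum(w * x for w, x in zip(credit_hours, grades))
print(numerator, sum(credit_hours))                      # prints: 32 12
print(round(weighted_mean(grades, credit_hours), 1))     # prints: 2.7
```

When all weights are equal, the weighted mean reduces to the ordinary mean.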