Lecture Notes 3

This document provides an overview of descriptive statistics and measures of central tendency. It defines key terms like population, parameter, sample, and statistic. It then describes three common measures of central tendency: mean, median, and mode. For each of these, it provides formulas and examples for calculating the value from both ungrouped and grouped data sets.

Uploaded by mi5180907
Copyright © All Rights Reserved

Lecture 3

3- Data Description
After the previous stages of collecting, organizing, and presenting data, the next stage is summarizing the data, which can be done by various statistical methods.
Statisticians use samples taken from populations; however, when populations are small, it is not necessary
to use samples since the entire population can be used to gain information. Measures found by using all the data
values in the population are called parameters. Measures obtained by using the data values from samples are
called statistics.
A statistic is a characteristic or measure obtained by using the data values from a sample.
A parameter is a characteristic or measure obtained by using all the data values from a specific population.
(Roman letters are generally used to denote sample statistics, and Greek letters to denote population
parameters.)

Figure: Descriptive Statistics, for both ungrouped and grouped data, comprises Measures of Central Tendency, Measures of Variation, and Measures of Position.

3–1 Measures of Central Tendency


Arithmetic Mean
Case 1: Ungrouped Data
The arithmetic mean, also known as the arithmetic average, is found by adding the values of the data and dividing
by the total number of values.
a) The mean of a sample of size n is given by:
$$\bar{X} = \frac{X_1 + X_2 + X_3 + \cdots + X_n}{n} = \frac{\sum_{i=1}^{n} X_i}{n}$$
where n represents the total number of values in the sample.
PHM111s - Probability and Statistics
b) The mean of a population of size N is given by:
$$\mu = \frac{X_1 + X_2 + X_3 + \cdots + X_N}{N} = \frac{\sum_{i=1}^{N} X_i}{N}$$
where N represents the total number of values in the population.
Example 3–1: The data represent the number of days off per year for a sample of individuals selected from nine
different countries. Find the mean.
20, 26, 40, 36, 23, 42, 35, 24, 30
Solution:

$$\bar{X} = \frac{\sum X}{n} = \frac{20 + 26 + 40 + 36 + 23 + 42 + 35 + 24 + 30}{9} = \frac{276}{9} \approx 30.7 \text{ days}$$
Hence, the mean of the number of days off is 30.7 days.
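The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original notes, and the function name `arithmetic_mean` is hypothetical. The same function gives the population mean µ when the list holds every value in the population, since only the symbol changes.

```python
def arithmetic_mean(values):
    """Sum of the data values divided by the number of values."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Example 3-1: days off per year in nine countries
days_off = [20, 26, 40, 36, 23, 42, 35, 24, 30]
print(sum(days_off))                        # 276
print(round(arithmetic_mean(days_off), 1))  # 30.7
```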
Case 2: Grouped Data
$$\bar{X} = \frac{\sum_{i=1}^{k} f_i X_i}{\sum_{i=1}^{k} f_i} = \frac{1}{n}\sum_{i=1}^{k} f_i X_i$$

where $X_i$ is the midpoint of class i,
$f_i$ is the frequency of class i in the sample,
k is the number of classes, and
$n = \sum_{i=1}^{k} f_i$ is the sample size (the sum of all frequencies).

Example 3–2: The following table gives the frequency distribution of the number of orders received each day
during the past 50 days at the office of a mail-order company. Calculate the mean.
Number of orders    f
10 – 12             4
13 – 15            12
16 – 18            20
19 – 21            14
                n = 50

Solution

Number of orders    f     X     f·X
10 – 12             4    11      44
13 – 15            12    14     168
16 – 18            20    17     340
19 – 21            14    20     280
                n = 50        Σ f·X = 832
X is the class midpoint, obtained by adding the lower and upper class limits and dividing by 2 (for example, (10 + 12)/2 = 11).

$$\bar{X} = \frac{\sum f \cdot X}{n} = \frac{832}{50} = 16.64 \text{ orders}$$
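The grouped-data mean can be sketched in Python as follows. It assumes each class is passed as a hypothetical (lower limit, upper limit, frequency) tuple and, as in the notes, represents every value in a class by the class midpoint; `grouped_mean` is an illustrative name only.

```python
def grouped_mean(classes):
    """Approximate mean of grouped data: sum(f * X) / n.

    `classes` holds (lower_limit, upper_limit, frequency) tuples; each class
    is represented by its midpoint X = (lower_limit + upper_limit) / 2.
    """
    n = sum(f for _, _, f in classes)  # sample size = sum of frequencies
    if n == 0:
        raise ValueError("the frequencies sum to zero")
    sum_fx = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return sum_fx / n


# Example 3-2: orders received per day over 50 days
orders = [(10, 12, 4), (13, 15, 12), (16, 18, 20), (19, 21, 14)]
print(grouped_mean(orders))  # 16.64
```

Because the midpoint stands in for every value in its class, the result is an approximation of the mean of the raw data.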
The Median
Case 1: Ungrouped Data
The median is the middle value in an ordered sequence of data. The symbol for the median is MD.

The positioning-point formula $\frac{n + 1}{2}$ gives the place in the ordered array that corresponds to the median value.
1. If the data set has an odd number of observations, the median is the value of the ordered observation at the positioning point $\frac{n + 1}{2}$.
2. If the data set has an even number of observations, the positioning point lies between the two middle observations, and the median is the average of the values of these two observations.

Example 3–3: The number of rooms in the seven hotels in downtown Pittsburgh is 713, 300, 618, 595, 311, 401,
and 292. Find the median.
Solution:

Step 1: Arrange the data in order.
292, 300, 311, 401, 595, 618, 713
Step 2: Select the middle value. Since n = 7 is odd, the median is the observation at position (7 + 1)/2 = 4.
292, 300, 311, [401], 595, 618, 713
Hence, the median is 401 rooms.

Example 3–4: The number of cloudy days for the top 10 cloudiest cities is shown. Find the median.
209, 223, 211, 227, 213, 240, 240, 211, 229, 212
Solution:
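Arrange the data in order:
209, 211, 211, 212, 213, 223, 227, 229, 240, 240
Since n = 10 is even, the positioning point (10 + 1)/2 = 5.5 lies between the 5th and 6th values, 213 and 223:
$$MD = \frac{213 + 223}{2} = 218$$
Hence, the median is 218 days.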
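Both rules for ungrouped data can be sketched in Python. The function name `median` is only illustrative; Python's standard library also offers `statistics.median`, which follows the same odd/even convention. The sketch reproduces Examples 3–3 and 3–4.

```python
def median(values):
    """Median of ungrouped data, following the (n + 1) / 2 positioning point."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2  # 0-based index of the upper middle observation
    if n % 2 == 1:
        # Odd n: the observation at 1-based position (n + 1) / 2.
        return ordered[mid]
    # Even n: average the observations at 1-based positions n/2 and n/2 + 1.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([713, 300, 618, 595, 311, 401, 292]))                 # 401
print(median([209, 223, 211, 227, 213, 240, 240, 211, 229, 212]))  # 218.0
```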

Case 2: Grouped Data

Step 1: Construct the cumulative frequency distribution.


Step 2: Determine the class that contains the median.
The median class is the first class whose cumulative frequency is at least n/2.
Step 3: Find the median by using the following formula:
$$MD = L + i\left(\frac{\frac{n}{2} - F_{m-1}}{f_m}\right)$$
where:
n = the total frequency
$F_{m-1}$ = the cumulative frequency of the class preceding the median class
$f_m$ = the frequency of the median class
i = the width of the median class
L = the lower class boundary of the median class

Example 3–5: Based on the grouped data below, find the median.
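Seconds    f
51 – 55    2
56 – 60    7
61 – 65    8
66 – 70    4

Solution
Step 1: Construct the cumulative frequency distribution.

Seconds    f    cf
51 – 55    2     2
56 – 60    7     9
61 – 65    8    17
66 – 70    4    21
        n = 21

Step 2: $\frac{n}{2} = \frac{21}{2} = 10.5$. The first class whose cumulative frequency is at least 10.5 is the 3rd class (61 – 65, cf = 17), so it is the median class.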
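So, $F_{m-1} = 9$, $f_m = 8$, $i = 5$, and $L = 60.5$ (the lower class boundary of the 61 – 65 class).
Therefore,
$$MD = L + i\left(\frac{\frac{n}{2} - F_{m-1}}{f_m}\right) = 60.5 + 5\left(\frac{10.5 - 9}{8}\right) = 60.5 + 0.9375 = 61.4375 \text{ s}$$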
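The three steps for the grouped median can be sketched in Python. This is an illustration under stated assumptions: classes are hypothetical (lower limit, upper limit, frequency) tuples in ascending order, and the class limits are whole numbers, so boundaries lie half a unit outside the limits (the `unit` parameter is hypothetical).

```python
def grouped_median(classes, unit=1):
    """Median of grouped data: MD = L + i * (n/2 - F_{m-1}) / f_m.

    `classes` holds (lower_limit, upper_limit, frequency) tuples in ascending
    order. `unit` is the measurement unit of the class limits (1 for whole
    numbers), so class boundaries lie unit/2 outside the limits.
    """
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("the frequencies sum to zero")
    half = n / 2
    cumulative = 0  # F_{m-1}: cumulative frequency before the current class
    for lower, upper, f in classes:
        if cumulative + f >= half:        # first class whose cf is at least n/2
            L = lower - unit / 2          # lower class boundary
            i = (upper - lower) + unit    # class width (boundary to boundary)
            return L + i * (half - cumulative) / f
        cumulative += f


# Example 3-5
seconds = [(51, 55, 2), (56, 60, 7), (61, 65, 8), (66, 70, 4)]
print(grouped_median(seconds))  # 61.4375
```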

The Mode
Case 1: Ungrouped Data
The value that occurs most often in a data set is called the mode.
• A data set that has only one value that occurs with the greatest frequency is said to be unimodal.
• If a data set has two values that occur with the same greatest frequency, both values are considered to be
the mode and the data set is said to be bimodal.
• If a data set has more than two values that occur with the same greatest frequency, each value is used as the
mode, and the data set is said to be multimodal.
• When no data value occurs more than once, the data set is said to have no mode.



Example 3–6: The data show the number of licensed nuclear reactors in the United States for a recent 15-year
period. Find the mode.
104 104 104 104 104
107 109 109 109 110
109 111 112 111 109
Solution
Since the values 104 and 109 both occur 5 times, the modes are 104 and 109. The data set is said to be bimodal.
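The rules for unimodal, bimodal, multimodal, and no-mode data sets can be sketched in Python with `collections.Counter`. The function name `modes` is hypothetical. Note that the standard library's `statistics.multimode` returns every value when none repeats, whereas these notes say such a data set has no mode, so the sketch returns an empty list in that case.

```python
from collections import Counter


def modes(values):
    """All values that occur with the greatest frequency (ungrouped data).

    Returns an empty list when no value occurs more than once (no mode).
    """
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs once: the data set has no mode
    return sorted(value for value, count in counts.items() if count == top)


reactors = [104, 104, 104, 104, 104,
            107, 109, 109, 109, 110,
            109, 111, 112, 111, 109]
print(modes(reactors))  # [104, 109], so the data set is bimodal
```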

Case 2: Grouped Data

For grouped data, the mode lies in the modal class, which is the class with the largest frequency. Its value within the modal class is estimated by:
$$\text{Mode} = L + i\left(\frac{f_m - f_{m-1}}{(f_m - f_{m-1}) + (f_m - f_{m+1})}\right)$$
where:
$f_{m-1}$ = the frequency of the class before the modal class
$f_{m+1}$ = the frequency of the class after the modal class
$f_m$ = the frequency of the modal class
i = the class width
L = the lower class boundary of the modal class

Example 3–7: Based on the grouped data below, find the mode.
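Seconds    f
51 – 55    2
56 – 60    7
61 – 65    8
66 – 70    4

Solution
The modal class is 61 – 65, the class with the largest frequency (f = 8). So, $L = 60.5$, $f_{m-1} = 7$, $f_m = 8$, $f_{m+1} = 4$, and $i = 5$:
$$\text{Mode} = 60.5 + 5\left(\frac{8 - 7}{(8 - 7) + (8 - 4)}\right) = 60.5 + 5\left(\frac{1}{5}\right) = 61.5 \text{ s}$$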

The Midrange
The midrange is defined as the sum of the lowest and highest values in the data set, divided by 2. The symbol MR
is used for the midrange.
$$MR = \frac{\text{lowest value} + \text{highest value}}{2}$$
Example 3–8: Find the midrange of the following bonuses (in millions of dollars):

18.0, 14.0, 34.5, 10, 11.3, 10, 12.4, 10



Solution
The smallest bonus is $10 million and the largest bonus is $34.5 million.
$$MR = \frac{10 + 34.5}{2} = \frac{44.5}{2} = \$22.25 \text{ million}$$
Notice that this amount is larger than seven of the eight amounts and is not representative of a typical bonus. This is because one bonus, $34.5 million, is much higher than the rest.
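A minimal Python sketch of the midrange, applied to the Example 3–8 data. The function name `midrange` is hypothetical; the last line counts how many bonuses fall below the midrange, which checks the observation that it exceeds seven of the eight amounts.

```python
def midrange(values):
    """(lowest value + highest value) / 2."""
    if not values:
        raise ValueError("the midrange of an empty data set is undefined")
    return (min(values) + max(values)) / 2


bonuses = [18.0, 14.0, 34.5, 10, 11.3, 10, 12.4, 10]  # millions of dollars
mr = midrange(bonuses)
print(mr)                                 # 22.25
print(sum(1 for b in bonuses if b < mr))  # 7 of the 8 bonuses lie below it
```

Because it uses only the two extreme values, the midrange is sensitive to a single unusually large or small observation.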

The Weighted Mean

Find the weighted mean of a variable X by multiplying each value by its corresponding weight and dividing the
sum of the products by the sum of the weights.
$$\bar{X} = \frac{w_1 X_1 + w_2 X_2 + \cdots + w_n X_n}{w_1 + w_2 + \cdots + w_n} = \frac{\sum_{i=1}^{n} w_i X_i}{\sum_{i=1}^{n} w_i}$$

where w1, w2, . . . , wn are the weights and X1, X2, . . . , Xn are the values.

Example 3–9: A student received an A in English Composition I (3 credits), a C in Introduction to Psychology (3 credits),
a B in Biology I (4 credits), and a D in Physical Education (2 credits). Assuming A = 4 grade points, B = 3 grade points,
C = 2 grade points, D = 1 grade point, and F = 0 grade points, find the student’s grade point average.

Solution
Course                        Credits (w)   Grade (X)
English Composition I         3             A (4 points)
Introduction to Psychology    3             C (2 points)
Biology I                     4             B (3 points)
Physical Education            2             D (1 point)

$$\bar{X} = \frac{\sum_{i=1}^{n} w_i X_i}{\sum_{i=1}^{n} w_i} = \frac{(3)(4) + (3)(2) + (4)(3) + (2)(1)}{3 + 3 + 4 + 2} = \frac{32}{12} \approx 2.7$$

The grade point average is 2.7.
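The weighted mean can be sketched in Python using the credits as weights and the grade points as values. The names `weighted_mean`, `grade_points`, and `courses` are hypothetical; the data are those of Example 3–9.

```python
def weighted_mean(values, weights):
    """sum(w * X) / sum(w) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


grade_points = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}
courses = [  # (course, credits, grade) from Example 3-9
    ("English Composition I", 3, "A"),
    ("Introduction to Psychology", 3, "C"),
    ("Biology I", 4, "B"),
    ("Physical Education", 2, "D"),
]
gpa = weighted_mean([grade_points[g] for _, _, g in courses],
                    [credits for _, credits, _ in courses])
print(round(gpa, 1))  # 2.7
```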



Distribution Shapes
Positively skewed, symmetric, and negatively skewed:

In a positively skewed or right-skewed distribution, the majority of the data values fall to the left of the mean and cluster at the lower end of the distribution; the “tail” is to the right. Also, the mean is to the right of the median, and the mode is to the left of the median.

In a symmetric distribution, the data values are evenly distributed on both sides of the mean. In addition, when the distribution is unimodal, the mean, median, and mode are the same and are at the center of the distribution.

When the majority of the data values fall to the right of the mean and cluster at the upper end of the distribution, with the tail to the left, the distribution is said to be negatively skewed or left-skewed. Also, the mean is to the left of the median, and the mode is to the right of the median.
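The mean and median ordering described above can be turned into a rough check in Python, using the standard `statistics` module. This is only a rule-of-thumb sketch, not a formal measure of skewness, and it can mislead for some distributions; the function `skew_hint` and its tolerance `tol` are hypothetical. Applied to the Example 3–8 bonuses, the mean (about 15.0) lies to the right of the median (11.85), consistent with the one very high bonus stretching the right tail.

```python
from statistics import mean, median


def skew_hint(values, tol=1e-9):
    """Suggest a distribution shape from the mean-median ordering.

    Rule of thumb only: mean > median suggests right skew,
    mean < median suggests left skew.
    """
    m, md = mean(values), median(values)
    if m - md > tol:
        return "positively skewed (mean to the right of the median)"
    if md - m > tol:
        return "negatively skewed (mean to the left of the median)"
    return "roughly symmetric (mean close to the median)"


bonuses = [18.0, 14.0, 34.5, 10, 11.3, 10, 12.4, 10]  # Example 3-8 data
print(mean(bonuses), median(bonuses))
print(skew_hint(bonuses))
```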
