Lesson 3 Methods of Summarizing Data
❑ In this unit, we will be learning the different statistical methods that can further help us describe and
summarize data sets.
❑ The most common of these methods is finding the average. Let us consider the following:
▪ The average height of a Filipino man is 5 feet and 3 inches, while the average height of a Filipino
woman is just 5 feet.
▪ The average salary for a teacher is ₱20,695 per month in the Philippines.
▪ On average, 24 million people receive animal bites.
▪ The average American is sick in bed for seven days a year, missing five days of work.
❑ In the above examples, the word average is “ambiguous”.
❑ Loosely stated, the average means the “center of the distribution” or the “most typical case”.
Camarines Norte State College
❑ One can think of an average as one value that best represents an entire group of
scores. You can also think of the average as the “middle” space or a fulcrum on a
seesaw: it is the point where all the values in a set of values are balanced.
❑ Measures of average are also called “measures of central tendency” that include
mean, median, mode, and midrange.
❑ The succeeding sections will guide you through the procedures and processes on
how to compute these averages.
Mean
❑ The most common type of average; it is also referred to as the “arithmetic average.”
❑ It is the score located at the mathematical center of the distribution. Also, it is used to
summarize the interval and ratio variables when the distribution is symmetrical.
❑ Generally, it is the sum of all the values divided by the number of values in the given
data set.
A statistic is a characteristic or measure obtained by using data values from a sample.
A parameter is a characteristic or measure obtained by using all the data values from a specific population.

For the sample mean:
\[ \bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x}{n} \]
where \(n\) represents the total number of values in the sample

For the population mean:
\[ \mu = \frac{X_1 + X_2 + \cdots + X_N}{N} = \frac{\sum X}{N} \]
where \(N\) represents the total number of values in the population
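The formula can be sketched in a few lines of Python. This is only an illustration: the function name `mean` and the five GWA values are hypothetical and are not taken from the lesson's data set.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Hypothetical GWA values for five students (assumed for illustration only).
sample_gwa = [1.50, 2.00, 2.25, 2.50, 1.75]

x_bar = mean(sample_gwa)  # (1.50 + 2.00 + 2.25 + 2.50 + 1.75) / 5
print(x_bar)              # 2.0
```

The same function serves for a population mean; only the interpretation of the input (all values of the population instead of a sample) changes.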
Solution
Let us say that these 20 students represent a sample. Then
\[ \bar{x} = \frac{\sum x}{n} = \frac{2.80 + 2.30 + 2.30 + \cdots + 1.75 + 1.40}{20} = \frac{43.45}{20} = 2.1725 \approx 2.173 \]
Thus, the average GWA of the 20 students is 2.173.

General Rounding Rule:
In statistics, the general rounding rule is that when computations are done in a calculation, rounding
should not be done until the final answer is calculated. When rounding is done in intermediate steps, it
tends to increase the difference between the answer and the exact one.
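The effect of rounding too early can be checked with a short Python sketch. It uses the sum 43.45 and n = 20 from the solution; the early-rounded total 43.5 is a hypothetical intermediate value introduced only to show the drift. `Decimal` is used so that the half-up rounding behaves like hand calculation.

```python
from decimal import Decimal, ROUND_HALF_UP

total = Decimal("43.45")  # sum of the 20 GWA values, as given in the solution
n = 20

# Recommended: keep full precision and round only the final answer.
exact_mean = total / n                      # 2.1725
final_answer = exact_mean.quantize(Decimal("0.001"), rounding=ROUND_HALF_UP)

# Not recommended: rounding the sum first (hypothetical intermediate step).
early_rounded_total = Decimal("43.5")
early_answer = (early_rounded_total / n).quantize(
    Decimal("0.001"), rounding=ROUND_HALF_UP
)

print(final_answer)  # 2.173
print(early_answer)  # 2.175, which drifts away from the exact 2.1725
```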
Median
❑ The halfway point in a data set; the “middle value” in the data set; the score at the 50th percentile.
❑ Other references call this the “midpoint” of the data array (when the data set is ordered, it is
called an array).
❑ Used to summarize ordinal or highly skewed interval or ratio variables.
❑ To find the median of ungrouped data, follow these steps:
1. Arrange the values/quantities (ascending or descending).
2. Number the values/quantities consecutively from 1 to n.
3. Case 1. If n is odd, the median is the \(\left(\frac{n+1}{2}\right)\)th quantity.
Case 2. If n is even, the median is the average of the \(\left(\frac{n}{2}\right)\)th and \(\left(\frac{n}{2}+1\right)\)th quantities.
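The three steps translate almost directly into code. The sketch below is a minimal illustration; the function name `median` and the small test lists are assumptions, not part of the lesson's data.

```python
def median(values):
    """Median of ungrouped data, following the odd/even cases."""
    ordered = sorted(values)  # Step 1: arrange in ascending order
    n = len(ordered)          # Step 2: positions run from 1 to n
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    if n % 2 == 1:
        # Case 1: the ((n + 1) / 2)th value; subtract 1 for 0-based indexing.
        return ordered[(n + 1) // 2 - 1]
    # Case 2: average of the (n / 2)th and (n / 2 + 1)th values.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


odd_case = median([3, 1, 2, 5, 4])  # ordered: 1 2 3 4 5, 3rd value
even_case = median([4, 1, 3, 2])    # ordered: 1 2 3 4, average of 2 and 3
print(odd_case, even_case)          # 3 2.5
```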
Solution
Arrange these GWA values and number these values from 1 to 20 (n = 20).
The median is the average of the 10th and 11th values, that is
\[ MD = \frac{2.10 + 2.30}{2} = \frac{4.40}{2} = 2.20 \]
Thus, the middle score of the students’ GWA is 2.20.
Mode
❑ The score in the data set that “occurs most frequently”; the score value(s) with
the highest frequency.
❑ The mode can be used when the data are nominal or categorical, such as religious
preference, gender, political affiliation.
❑ The mode is not always unique.
❑ The set of data may be considered unimodal if it contains only one mode; bimodal if
it has two modes; trimodal if it contains three; or sometimes a data set has no mode.
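A small Python sketch can make the unimodal, bimodal, and no-mode cases concrete. The function name `modes` and the sample lists are hypothetical. The sketch also assumes one common convention: a data set in which every value occurs exactly once is reported as having no mode.

```python
from collections import Counter


def modes(values):
    """Return all values with the highest frequency (empty list if no mode)."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        # Assumed convention: every value occurs once, so there is no mode.
        return []
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([1, 2, 2, 3]))                # [2]      unimodal
print(modes([1, 1, 2, 2, 3]))             # [1, 2]   bimodal
print(modes([1, 2, 3]))                   # []       no mode
print(modes(["Yes", "No", "Yes", "No", "Yes"]))  # ['Yes'], nominal data
```

Because the function only counts occurrences, it works for nominal or categorical data as well as for numbers.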
Solution
By mere inspection, one can easily identify the mode of ungrouped data.
Since 2.30 appears 3 times, more often than any other value, the modal score of the students’ GWA is 2.30.
Midrange
❑ It is a (very) “rough estimate” of the middle; it can be easily affected by extremely
high or extremely low values.
❑ It is defined as the sum of the lowest and highest values in the data set, divided by
2. That is,
\[ MR = \frac{\text{lowest value} + \text{highest value}}{2} \]
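The sensitivity of the midrange to extreme values is easy to see in code. The following sketch uses a hypothetical function name `midrange` and made-up numbers chosen only to show how a single outlier moves the result.

```python
def midrange(values):
    """Midrange: (lowest value + highest value) / 2."""
    return (min(values) + max(values)) / 2


# Hypothetical data, assumed for illustration only.
without_outlier = midrange([2, 4, 9, 10])   # (2 + 10) / 2
with_outlier = midrange([2, 4, 9, 100])     # (2 + 100) / 2

print(without_outlier)  # 6.0
print(with_outlier)     # 51.0, pulled far from most of the data
```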
Solution
In the given data set, the lowest value is 1.10 and the highest value is 3.00.
Therefore,
\[ MR = \frac{\text{lowest value} + \text{highest value}}{2} = \frac{1.10 + 3.00}{2} = \frac{4.10}{2} = 2.05 \]
Thus, the midrange of the given GWA scores is 2.05.
Let us summarize the obtained values for the given data set:
▪ Mean: 2.173
▪ Median: 2.20
▪ Mode: 2.30
▪ Midrange: 2.05
Properties: Mean
1. The mean is found by using all the values of the data.
2. The mean varies less than the median or mode when samples are taken from the
same population and all three measures are computed for these samples.
3. The mean is used in computing other statistics, such as the variance.
4. The mean for the data set is unique and is not necessarily one of the data values.
5. The mean cannot be computed for the data in a frequency distribution that has an
open-ended class.
6. The mean is affected by extremely high or low values, called outliers, and may not
be the appropriate average to use in these situations.
Properties: Median
1. The median is used to find the center or middle value of a data set.
2. The median is used when it is necessary to find out whether the data values fall into the upper half
or lower half of the distribution.
3. The median is used for an open-ended distribution.
4. The median is affected less than the mean by extremely high or extremely low values.
Properties: Mode
1. The mode is used when the most typical case is desired.
2. The mode is the easiest average to compute/determine.
3. The mode can be used when the data are nominal or categorical, such as religious preference,
gender, political affiliation.
4. The mode is NOT always unique. A data set can have more than one mode, or no mode at all.
Properties: Midrange
1. The midrange is easy to compute.
2. The midrange gives the midpoint.
3. The midrange is affected by extremely high or low values in the data set.
❑ So far, we have learned how to compute and/or identify the mean, median, mode,
and midrange of ungrouped data. However, there are some cases where data are
expressed as a grouped frequency distribution (grouped data).
❑ Let us consider again this data set:
Temperatures °𝑭 in Provinces
These data represent the record high temperatures in degrees Fahrenheit for 50
provinces in the Philippines.
Coded Formula:
\[ \bar{x} = x_0 + \left( \frac{\sum f \cdot x'}{n} \right) i \]
where: \(x_0\) = assumed mean (class mark with code 0)
\(x'\) = coded value
\(f\) = frequency of each class
\(i\) = class interval/class width
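The coded formula can be sketched as follows. The frequency table in this example is assumed for illustration (seven classes of width 5 with n = 50); it is not copied from the lesson's table. The function name `coded_mean` and its parameters are likewise hypothetical. The sketch assumes equal class widths and assigns codes ..., -2, -1, 0, 1, 2, ... around the class chosen as the assumed mean.

```python
def coded_mean(class_marks, frequencies, assumed_index):
    """Mean of grouped data via the coded formula x0 + (sum(f * x') / n) * i."""
    x0 = class_marks[assumed_index]        # assumed mean (class with code 0)
    i = class_marks[1] - class_marks[0]    # class width (equal widths assumed)
    n = sum(frequencies)
    # Coded values x': 0 at the assumed mean, -1, -2, ... below, 1, 2, ... above.
    codes = [k - assumed_index for k in range(len(class_marks))]
    sum_f_code = sum(f * c for f, c in zip(frequencies, codes))
    return x0 + (sum_f_code / n) * i


# Assumed illustrative table: class marks and their frequencies (n = 50).
class_marks = [102, 107, 112, 117, 122, 127, 132]
frequencies = [2, 8, 18, 13, 7, 1, 1]

x_bar = coded_mean(class_marks, frequencies, assumed_index=2)

# Cross-check against the direct grouped mean, sum(f * x) / n.
direct = sum(f * x for f, x in zip(frequencies, class_marks)) / sum(frequencies)
print(round(x_bar, 2), round(direct, 2))  # 114.2 114.2
```

Any class may be chosen as the assumed mean; the coded and direct results agree, and the coding only keeps the hand arithmetic small.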
\[ MD = x_{LB} + \left( \frac{\frac{n}{2} - cf_p}{f_m} \right) i \]
where: \(x_{LB}\) = lower boundary of the median class
\(f_m\) = frequency of the median class
\(cf_p\) = cumulative frequency of the class preceding the median class
\(i\) = class interval/class width
Steps:
1. Add columns for “Class Boundaries” and “Cumulative Frequency”.
Note:
i. The formula requires/uses the lower class boundaries.
ii. The less-than cumulative frequency (<cf) is obtained by adding up the frequencies, starting from the lowest class.
2. Determine the “Median Class”.
Note: The median class is the class containing the middle score.
→ n/2 = 50/2 = 25, so the median class is the class that contains the 25th score.
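The two steps (building the less-than cumulative frequencies, then locating the median class) can be combined with the median formula in one short sketch. The frequency table is assumed for illustration (n = 50, class width 5) and is not copied from the lesson; the function name `grouped_median` is hypothetical.

```python
from itertools import accumulate


def grouped_median(lower_boundaries, frequencies, width):
    """Median of grouped data: x_LB + ((n/2 - cf_p) / f_m) * i."""
    n = sum(frequencies)
    cumulative = list(accumulate(frequencies))  # less-than cumulative frequency
    half = n / 2
    # Median class: first class whose cumulative frequency reaches n/2.
    m = next(k for k, cf in enumerate(cumulative) if cf >= half)
    cf_p = cumulative[m - 1] if m > 0 else 0    # cf of the preceding class
    f_m = frequencies[m]
    return lower_boundaries[m] + ((half - cf_p) / f_m) * width


# Assumed illustrative table: lower class boundaries and frequencies.
lower_boundaries = [99.5, 104.5, 109.5, 114.5, 119.5, 124.5, 129.5]
frequencies = [2, 8, 18, 13, 7, 1, 1]

md = grouped_median(lower_boundaries, frequencies, width=5)
print(round(md, 2))  # 113.67
```

With these assumed frequencies the cumulative column is 2, 10, 28, ..., so the 25th score falls in the third class, whose lower boundary is 109.5.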
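\[ \hat{x} = x_{LB} + \left( \frac{d_1}{d_1 + d_2} \right) i \]
where: \(\hat{x}\) = mode of the grouped data
\(x_{LB}\) = lower boundary of the modal class
\(d_1\) = frequency of the modal class minus the frequency of the class preceding it
\(d_2\) = frequency of the modal class minus the frequency of the class following it
\(i\) = class interval/class width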
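A minimal Python sketch of this formula is given below. It assumes the usual reading of the symbols: the modal class is the class with the highest frequency, d1 is its frequency minus that of the preceding class, and d2 is its frequency minus that of the following class. The frequency table and the function name `grouped_mode` are assumed for illustration and are not taken from the lesson.

```python
def grouped_mode(lower_boundaries, frequencies, width):
    """Mode of grouped data: x_LB + (d1 / (d1 + d2)) * i."""
    # Modal class: the class with the highest frequency (first one if tied).
    m = max(range(len(frequencies)), key=lambda k: frequencies[k])
    f_prev = frequencies[m - 1] if m > 0 else 0
    f_next = frequencies[m + 1] if m < len(frequencies) - 1 else 0
    d1 = frequencies[m] - f_prev   # difference with the preceding class
    d2 = frequencies[m] - f_next   # difference with the following class
    return lower_boundaries[m] + (d1 / (d1 + d2)) * width


# Assumed illustrative table: lower class boundaries and frequencies.
lower_boundaries = [99.5, 104.5, 109.5, 114.5, 119.5, 124.5, 129.5]
frequencies = [2, 8, 18, 13, 7, 1, 1]

mode_estimate = grouped_mode(lower_boundaries, frequencies, width=5)
print(round(mode_estimate, 2))  # 112.83
```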
❑ In statistics, to describe the data set accurately, statisticians must know more than the measures of
central tendency.
❑ The “measures of variability” are a group of analytical tools that describe the “spread” or variability of a
data set.
❑ They indicate how close to, or how spread out from, the average the scores are.
❑ Different measures of variability:
▪ Range
▪ Interquartile Range and Interquartile Deviation
▪ Mean Deviation
▪ Variance and Standard deviation
Range
❑ The range is the highest value minus the lowest value. The symbol R is used for the range.
❑ It tells us the “width” of our data set.
❑ Advantages:
▪ Easy to calculate
❑ Disadvantages:
▪ It does not consider every value in the data set, so it cannot show whether most of the scores lie near the extremes
▪ Easily affected by extreme values
❑ The formula:
\[ R = HV - LV \]
where HV = highest value and LV = lowest value
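The formula and its main weakness can be illustrated with a short sketch. The two data sets below are hypothetical: they are built to have the same range even though their values are distributed very differently. The function name `value_range` is also an assumption.

```python
def value_range(values):
    """Range: highest value minus lowest value (R = HV - LV)."""
    return max(values) - min(values)


# Hypothetical data sets with identical extremes but different shapes.
clustered_in_middle = [0, 50, 50, 50, 100]
clustered_at_extremes = [0, 0, 50, 100, 100]

range_a = value_range(clustered_in_middle)
range_b = value_range(clustered_at_extremes)
print(range_a, range_b)  # 100 100
```

Both ranges are equal because only the highest and lowest values enter the computation; the range alone cannot distinguish the two sets.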
Example:
Solution.
Identifying the highest value and lowest value in each
data set and plugging in,
Brand A: R = 60 – 10 = 50 months
Brand B: R = 45 – 25 = 20 months
Interpretation:
The range of Brand A shows that 50 months separate
the largest data value from the smallest data value. For Brand
B, 20 months separate the largest data value from the smallest
data value, which is less than one-half of Brand A’s
range.
Solution.
Using the formula,
\[ MAD = \frac{\sum |X - \mu|}{N} \]
Brand A: \( MAD_A = \frac{90}{6} = 15 \) → spread out about the mean
Brand B: \( MAD_B = \frac{30}{6} = 5 \) → relatively closer to the mean
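The computation can be sketched in Python. The six values per brand below are assumed for illustration: they were chosen so that they reproduce the sums 90 and 30 quoted in the solution, and they should not be read as the lesson's own table. The function name `mean_absolute_deviation` is hypothetical.

```python
def mean_absolute_deviation(values):
    """MAD: the average absolute distance of the values from their mean."""
    n = len(values)
    mu = sum(values) / n
    return sum(abs(x - mu) for x in values) / n


# Assumed values (in months), consistent with the sums in the solution.
brand_a = [10, 60, 50, 30, 40, 20]
brand_b = [35, 45, 30, 35, 40, 25]

mad_a = mean_absolute_deviation(brand_a)  # 90 / 6
mad_b = mean_absolute_deviation(brand_b)  # 30 / 6
print(mad_a, mad_b)  # 15.0 5.0
```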
Solution.
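Using the formula,
Brand A: \( \sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{1750}{6} \approx 291.67 \) → \( \sigma = \sqrt{\sigma^2} = \sqrt{291.67} \approx 17.08 \)
Brand B: \( \sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{250}{6} \approx 41.67 \) → \( \sigma = \sqrt{\sigma^2} = \sqrt{41.67} \approx 6.45 \)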
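A hand-written version of the population formulas is sketched below. The data values are assumed for illustration and were chosen to reproduce the sums of squares 1750 and 250 used in the solution; the function names are hypothetical.

```python
import math


def population_variance(values):
    """Population variance: sum of squared deviations from the mean, over N."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def population_std(values):
    """Population standard deviation: square root of the population variance."""
    return math.sqrt(population_variance(values))


# Assumed values (in months), consistent with the sums in the solution.
brand_a = [10, 60, 50, 30, 40, 20]
brand_b = [35, 45, 30, 35, 40, 25]

var_a, sd_a = population_variance(brand_a), population_std(brand_a)
var_b, sd_b = population_variance(brand_b), population_std(brand_b)
print(round(var_a, 2), round(sd_a, 2))  # 291.67 17.08
print(round(var_b, 2), round(sd_b, 2))  # 41.67 6.45
```

Following the general rounding rule, the square root is taken from the unrounded variance and only the printed results are rounded.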
Solution.
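Using the formula for sample variance and standard deviation,
Brand A: \( s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{1750}{6 - 1} = 350 \) → \( s = \sqrt{s^2} = \sqrt{350} \approx 18.71 \)
Brand B: \( s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{250}{6 - 1} = 50 \) → \( s = \sqrt{s^2} = \sqrt{50} \approx 7.07 \)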
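Python's standard `statistics` module provides both versions of these measures, which makes the effect of dividing by n − 1 instead of N easy to compare. The data values below are assumed for illustration, chosen to reproduce the sums of squares 1750 and 250 used in the solution.

```python
import statistics

# Assumed values (in months), consistent with the sums in the solution.
brand_a = [10, 60, 50, 30, 40, 20]
brand_b = [35, 45, 30, 35, 40, 25]

# Sample statistics divide by n - 1.
s2_a = statistics.variance(brand_a)   # 1750 / 5
s_a = statistics.stdev(brand_a)
s2_b = statistics.variance(brand_b)   # 250 / 5
s_b = statistics.stdev(brand_b)

# Population statistics divide by N, so they are always a little smaller.
sigma2_a = statistics.pvariance(brand_a)  # 1750 / 6

print(s2_a, round(s_a, 2))   # 350 18.71
print(s2_b, round(s_b, 2))   # 50 7.07
print(round(sigma2_a, 2))    # 291.67
```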