
Mean, Median, Mode

The document discusses measures of central tendency, including mean, median, and mode, explaining their calculations and differences, particularly in relation to outliers. It highlights the sensitivity of the mean to extreme values and suggests the median as a more robust alternative in such cases. Additionally, it covers how to estimate these measures from grouped data and introduces the geometric mean.

Uploaded by ayeshach2808
Copyright: © All Rights Reserved

[Figures: F and RF histogram; histogram]
Mean
• The mean, also called the arithmetic mean, is
the most frequently used measure of central
tendency.
• For ungrouped data, the mean is obtained by
dividing the sum of all values by the number
of values in the data set:
• The mean calculated for sample data is denoted
by x̅ (read as “x bar”), and the mean calculated for
population data is denoted by μ (Greek letter mu).
• The number of values in a data set is denoted by
n for a sample and by N for a population, and the
sum of all values of x is denoted by ∑x.
• Using these notations, we can write the
following formulas for the mean:
Mean for population data: μ = ∑x / N
Mean for sample data: x̅ = ∑x / n
Mean
• If we take a sample of three employees from
this company and calculate the mean age of
those three employees, this mean will be
denoted by x̅ (x bar). Suppose the three
values included in the sample are 32, 39, and
57. Then, the mean age for this sample is
x̅ = (32 + 39 + 57) / 3 = 128 / 3 ≈ 42.67 years.
• If we take a second sample of three employees of
this company, the value of x̅ will (most likely) be
different. Suppose the second sample includes
the values 53, 27, and 44. Then, the mean age for
this sample is
x̅ = (53 + 27 + 44) / 3 = 124 / 3 ≈ 41.33 years.
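The two sample means above can be checked with a short Python sketch. The function name `sample_mean` is an illustrative choice, not something from the slides; the data are the two samples of ages given in the text.

```python
def sample_mean(values):
    """Return the arithmetic mean: the sum of the values divided by their count."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


first_sample = [32, 39, 57]    # ages in the first sample
second_sample = [53, 27, 44]   # ages in the second sample

mean_first = sample_mean(first_sample)
mean_second = sample_mean(second_sample)

# The two sample means differ even though both samples come from the same population.
print(round(mean_first, 2), round(mean_second, 2))
```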
Population vs Sample
• We can state that the value of the population mean
is constant. However, the value of the sample mean
varies from sample to sample.
• The value of x̅ for a particular sample depends on
what values of the population are included in that
sample.
• Sometimes a data set may contain a few very small
or a few very large values. Such values are called
outliers or extreme values.
• A major shortcoming of the mean as a measure of
central tendency is that it is very sensitive to
outliers.
Cont.
• The preceding example should encourage us to be
cautious. We should remember that the mean is
not always the best measure of central tendency
because it is heavily influenced by outliers.

• Sometimes other measures of central tendency
give a more accurate impression of a data set. For
example, when a data set has outliers, the median
can be used as a measure of central tendency.
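As a rough illustration of this sensitivity, the sketch below compares the mean and the median before and after one extreme value is added. The numbers are hypothetical and chosen only for the demonstration; `mean` and `median` come from Python's standard `statistics` module.

```python
from statistics import mean, median

values = [10, 12, 13, 15, 16]    # hypothetical data with no outlier
with_outlier = values + [200]    # the same data plus one extreme value

mean_before, median_before = mean(values), median(values)
mean_after, median_after = mean(with_outlier), median(with_outlier)

# The mean moves a long way when the outlier is added; the median barely changes.
print(mean_before, median_before)
print(mean_after, median_after)
```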
Median
• Median: The median is the value of the middle term in a data set
that has been ranked in increasing order.
• As is obvious from the definition of the median, it divides a ranked
data set into two equal parts.
• The calculation of the median consists of the following two steps:
  1. Rank the data set in increasing order.
  2. Find the middle term. The value of this term is the median.
• Note that if the number of observations in a data set is odd, then
the median is given by the value of the middle term in the ranked
data. However, if the number of observations is even, then the
median is given by the average of the values of the two middle
terms.
• The advantage of using the median as a
measure of central tendency is that it is not
influenced by outliers. Consequently, the
median is preferred over the mean as a
measure of central tendency for data sets that
contain outliers.
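The two-step calculation of the median, including the odd and even cases, might be sketched in Python as follows. The function name `median_of` is an assumption made for illustration; the data are the three ages and the ten student ages that appear elsewhere in this document.

```python
def median_of(values):
    """Rank the data, then return the middle term (or the average of the two middle terms)."""
    if not values:
        raise ValueError("the median of an empty data set is undefined")
    ranked = sorted(values)          # step 1: rank in increasing order
    n = len(ranked)
    middle = n // 2
    if n % 2 == 1:                   # odd count: a single middle term
        return ranked[middle]
    # even count: average of the two middle terms
    return (ranked[middle - 1] + ranked[middle]) / 2


odd_result = median_of([32, 39, 57])
even_result = median_of([21, 19, 27, 22, 29, 19, 25, 21, 22, 30])
print(odd_result, even_result)
```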
Mode
• Mode: The mode is the value that occurs with the highest
frequency in a data set.
• A major shortcoming of the mode is that a data set may
have none or may have more than one mode, whereas it
will have only one mean and only one median.
• A data set with each value occurring only once has no
mode.
• A data set with only one value occurring with the highest
frequency has only one mode. The distribution is unimodal.
• A data set with two values that occur with the same
(highest) frequency has two modes. The distribution is
bimodal.
• If more than two values in a data set occur with the
same (highest) frequency, then the data set
contains more than two modes and it is said to be
multimodal.
• Example: Last year’s incomes of five randomly selected
families were $76,150, $95,750, $124,985, $87,490,
and $53,740. Find the mode.
• Solution: Because each value in this data set
occurs only once, this data set contains no
mode.
• Example: The ages of 10 randomly selected students from a
class are 21, 19, 27, 22, 29, 19, 25, 21, 22, and 30
years, respectively. Find the mode.
• Solution: This data set has three modes: 19,
21, and 22. Each of these three values occurs
with a (highest) frequency of 2.
• One advantage of the mode is that it can be
calculated for both kinds of data—quantitative
and qualitative—whereas the mean and median
can be calculated for only quantitative data.
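A minimal sketch of the mode rules above, assuming the convention stated in this document that a data set in which every value occurs only once has no mode. The helper name `modes` and the colour list are hypothetical; the incomes and ages are the two examples from the text.

```python
from collections import Counter


def modes(values):
    """Return all values tied for the highest frequency, or [] when there is no mode."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:                 # every value occurs once: no mode
        return []
    return sorted(v for v, c in counts.items() if c == highest)


incomes = [76150, 95750, 124985, 87490, 53740]
ages = [21, 19, 27, 22, 29, 19, 25, 21, 22, 30]
colours = ["red", "blue", "red", "green"]   # hypothetical qualitative data

print(modes(incomes))   # no mode
print(modes(ages))      # multimodal
print(modes(colours))   # the mode also works for qualitative data
```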
Relationships Among the Mean, Median, and Mode
• Two of the many shapes that a histogram or a
frequency distribution curve can assume are
symmetric and skewed. This section describes the
relationships among the mean, median, and
mode for three such histograms and frequency
distribution curves.
• Knowing the values of the mean, median, and
mode can give us some idea about the shape
of a frequency distribution curve.
• For a symmetric histogram and frequency
distribution curve with one peak (see Figure
3.2), the values of the mean, median, and
mode are identical, and they lie at the center
of the distribution.
• For a histogram and a frequency distribution curve
skewed to the right, the value of the mean is the
largest, that of the mode is the smallest, and the
value of the median lies between these two.
• The mode always occurs at the peak point.
• The value of the mean is the largest in this case
because it is sensitive to outliers that occur in the
right tail. These outliers pull the mean to the right.
• If a histogram and a frequency distribution
curve are skewed to the left, the value of the
mean is the smallest and that of the mode is
the largest, with the value of the median lying
between these two. In this case, the outliers in
the left tail pull the mean to the left.
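The orderings described for skewed distributions can be illustrated with a small Python check. Both data sets are hypothetical: the first has a long right tail, and the second is its mirror image, so it has a long left tail.

```python
from statistics import mean, median, mode

right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]   # hypothetical, long right tail
left_skewed = [16 - x for x in right_skewed]     # mirror image, long left tail

right_stats = (mode(right_skewed), median(right_skewed), mean(right_skewed))
left_stats = (mean(left_skewed), median(left_skewed), mode(left_skewed))

# Right skew: mode < median < mean. Left skew: mean < median < mode.
print(right_stats)
print(left_stats)
```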
Mean for Grouped Data
• First find the midpoint (m) of each class.
• Multiply the midpoints by the frequencies (f) of the
corresponding classes.
• The sum of these products, denoted by ∑mf, gives
an approximation for the sum of all values.
• Divide this sum by the total number of observations
(n or N) in the data set.
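The steps above can be sketched in Python. The function name `grouped_mean` and the tuple layout are assumptions for illustration; the classes are borrowed from the number-of-games table that appears later in this document, since the frequency tables for the travelling-time and mail-order examples are not available here.

```python
def grouped_mean(classes):
    """Approximate the mean from (lower_limit, upper_limit, frequency) classes."""
    total_mf = 0.0
    n = 0
    for lower, upper, f in classes:
        m = (lower + upper) / 2      # class midpoint
        total_mf += m * f            # accumulate the sum of m * f
        n += f                       # total number of observations
    if n == 0:
        raise ValueError("the frequency table has no observations")
    return total_mf / n


games = [(1, 5, 2), (6, 10, 7), (11, 15, 8), (16, 20, 3)]
approx_mean = grouped_mean(games)
print(approx_mean)
```

Because each class is represented by its midpoint, the result is an approximation of the mean rather than its exact value.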
Formulae
Mean for population data: μ = ∑mf / N
Mean for sample data: x̅ = ∑mf / n
Something is missing!
• If the data are given in the form of a frequency
table, we no longer know the values of
individual observations.

• We cannot obtain the sum of individual
values. Instead, we find an approximation for the sum
of these values using the class midpoints and
class frequencies.
Example: Consider the frequency distribution of the travelling times (in minutes)
from home to work for all 25 employees of a company.
Calculate the mean of the daily travelling times.


Check this out
• The result, 21.40 minutes, is an approximate value of the
mean, not the exact value. We can find the exact value of
the mean only if we know the exact travelling time
for each of the 25 employees of the company.
Example: Consider the frequency distribution of the number of orders received each
day during the past 50 days at the office of a mail-order company.
Calculate the mean.


Solution:
Estimating the Median from Grouped Data
• To estimate the Median, let's look at our data again:

Number of games    Frequency    Cumulative frequency (cf)
1 - 5              2            2
6 - 10             7            9
11 - 15            8            17
16 - 20            3            20
• The median is the mean of the middle two numbers (the 10th and 11th
values), and they are both in the 11 - 15 group.
• We can say "the median group is 11 - 15".
• But if we need to estimate a single Median value we can use
this formula:

Estimated Median = L + ((n/2) − cfb) / fm × w


where:
• L is the lower class boundary of the group containing the
median
• n is the total number of data values
• cfb is the cumulative frequency of the groups before the
median group
• fm is the frequency of the median group
• w is the group width
For our example:
• L = 11
• n = 20
• cfb = 2 + 7 = 9
• fm = 8
• w = 5

Estimated Median = 11 + ((20/2) − 9) / 8 × 5
= 11 + (1/8) × 5
= 11.625
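The median estimate can be reproduced with a short Python sketch that first locates the median group from the cumulative frequencies and then applies the formula. The function name `estimated_median` and the `(lower_boundary, frequency)` layout are assumptions for illustration; the value L = 11 follows the worked example in this document.

```python
def estimated_median(classes, width):
    """Estimate the median from (lower_boundary, frequency) classes in increasing order."""
    n = sum(f for _, f in classes)
    half = n / 2
    cf_before = 0                     # cumulative frequency before the current group
    for lower, fm in classes:
        if cf_before + fm >= half:    # this is the median group
            return lower + (half - cf_before) / fm * width
        cf_before += fm
    raise ValueError("the frequency table has no observations")


games = [(1, 2), (6, 7), (11, 8), (16, 3)]
median_estimate = estimated_median(games, width=5)
print(median_estimate)
```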
Estimating the Mode from Grouped Data
Again, looking at our data:
We can easily find the modal group (the group with
the highest frequency),
which is 11 - 15
We can say "the modal group is 11 - 15"

Number of games    Frequency
1 - 5              2
6 - 10             7
11 - 15            8
16 - 20            3
But the actual Mode may not even be in that group! Or there may
be more than one mode. Without the raw data we don't really know.
But we can estimate the Mode using the following formula:

Estimated Mode = L + (fm − fm-1) / ((fm − fm-1) + (fm − fm+1)) × w

where:
• L is the lower class boundary of the modal group
• fm-1 is the frequency of the group before the modal group
• fm is the frequency of the modal group
• fm+1 is the frequency of the group after the modal group
• w is the group width


In this example:
• L = 11
• fm-1 = 7
• fm = 8
• fm+1 = 3
• w = 5

Estimated Mode = 11 + (8 − 7) / ((8 − 7) + (8 − 3)) × 5
= 11 + (1/6) × 5
= 11.833...
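The same calculation as a minimal Python sketch. The function and parameter names are illustrative assumptions; the inputs are the values from the worked example above.

```python
def estimated_mode(lower, f_before, f_modal, f_after, width):
    """Estimate the mode from the modal group and its two neighbouring groups."""
    denominator = (f_modal - f_before) + (f_modal - f_after)
    if denominator == 0:
        raise ValueError("the neighbouring groups have the same frequency as the modal group")
    return lower + (f_modal - f_before) / denominator * width


mode_estimate = estimated_mode(lower=11, f_before=7, f_modal=8, f_after=3, width=5)
print(round(mode_estimate, 3))
```

The estimate sits closer to the lower end of the modal group because the preceding group (frequency 7) is much closer in frequency to the modal group than the following group (frequency 3).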
Geometric Mean
• Definition: The nth root of the product of n
numbers.
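The definition can be written as a short Python sketch. The function name and the input values are hypothetical, and the sketch assumes all values are positive so that the nth root is well defined.

```python
import math


def geometric_mean(values):
    """Return the nth root of the product of n positive numbers."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("the geometric mean here requires a non-empty list of positive numbers")
    return math.prod(values) ** (1 / len(values))


gm_two = geometric_mean([2, 8])        # square root of 16
gm_three = geometric_mean([1, 3, 9])   # cube root of 27
print(gm_two, gm_three)
```

Python's standard library also provides `statistics.geometric_mean` (Python 3.8 and later) for the same purpose.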
