Mean, Mode Median
Structure
16.1 Introduction
16.2 Objectives
16.3 Meaning of Statistics
16.3.1 Descriptive Statistics
16.3.2 Inferential Statistics
16.4 Need for Statistics
16.5 Meaning and Need of Central Tendency
16.5.1 Mean, Median and Mode
16.5.2 Using Mean, Median and Mode
16.6 Meaning of Variability
16.6.1 Measures of Variability
16.6.2 Mean Deviation and Standard Deviation
16.6.3 Derived Scores
16.7 Correlation: Meaning and Interpretation
16.8 Let Us Sum Up
16.9 Answers to Check Your Progress
16.10 Suggested Readings
16.1 INTRODUCTION
If you evaluate a large number of students, say 1000, with the help of an
achievement test, you will have 1000 numerical scores (known as raw scores).
From this long list of numbers alone, you cannot draw any conclusion about the
achievement of your students or about the progress of the class as a whole.
In order to make these 'raw scores' meaningful, you have to
arrange them in a systematic manner to enable yourself to say something about
the overall achievement of the students as a group. The process of organizing
data, in order to draw meaningful conclusions, is known as 'statistics'.
The measures of central tendency such as mean, median and mode are used to
determine the 'typical' or average score for a group, whereas the measures of
variability, such as standard deviation, indicate how the scores are spread about
the central or typical value. On the other hand, the measures of relative
position show how an individual performs in relation to other persons in the
group and the measures of association indicate how two sets of scores for the
same group of persons are related. These measures are used to study the scores
based on samples drawn from large populations and indicate the characteristics
of the samples.
Student Performance: Interpretation
You may use the ideas of statistical inference very frequently. For example, if
you want to make a statement about the mean IQ of the complete population of
students in a particular institution, you draw a representative sample,
administer an intelligence test to the students included in the sample and then
compute the mean IQ of the sample. Thereafter, you may use this sample mean
for estimating the mean IQ of the population. The procedures of statistical
inference involve the use of many concepts, such as 'standard error',
'sampling distribution' and 'levels of significance', which are beyond the scope
of the present discussion.
The procedure of statistical inference also involves establishing how
accurately sample statistics, such as the mean, estimate the population
parameters.
b) Compare your answer with that given at the end of the unit.
Interpreting Test Results
16.5 MEANING AND NEED OF CENTRAL TENDENCY
It is commonly observed that measurements based on large groups or samples
vary widely. If you measure the heights of plants of a certain type in a
garden, you will find that the heights range from very short plants to very
tall plants. However, 'very short' and 'very tall' plants are relatively few
in number, and the majority of heights are spread around the 'average' value,
some being closer to it and others farther from it. Similarly, if you consider
the heights of 15-year-old boys, you will observe that exceptionally short and
exceptionally tall boys are few in number. The heights of a large majority are
concentrated around the mean. This 'tendency' is common to most kinds of
measurement.
By the very nature of the way in which the mean is computed, it is based on each
and every score. If some of the scores in the set are very small or very large
compared to the rest, the mean is shifted markedly towards the extreme
scores. In general, however, the mean is the preferred measure of central
tendency. It is a more precise and stable index than the median or the mode: if a
number of samples of the same size are selected from a population, the means
of those samples will be closer to one another than the modes or
the medians.
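The stability of the mean can be illustrated with a small simulation. The sketch below is not part of the unit: it assumes a hypothetical, roughly normal population (mean 50, SD 10), draws 1,000 samples of 25 scores each, and compares how widely the sample means and the sample medians scatter. The population, the sample size and the seed are all illustrative assumptions.

```python
import random
from statistics import mean, median, pstdev

random.seed(0)  # fixed seed so that the sketch is repeatable

# Hypothetical population: 10,000 scores, roughly normal (mean 50, SD 10).
population = [random.gauss(50, 10) for _ in range(10_000)]

sample_means = []
sample_medians = []
for _ in range(1_000):
    sample = random.sample(population, 25)  # one sample of 25 scores
    sample_means.append(mean(sample))
    sample_medians.append(median(sample))

# The SD of each list shows how far the sample statistics scatter.
spread_of_means = pstdev(sample_means)
spread_of_medians = pstdev(sample_medians)
print(round(spread_of_means, 2), round(spread_of_medians, 2))
```

For a roughly normal population such as this one, the means should scatter less than the medians. For populations containing many extreme scores the comparison can come out differently.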
In the above example, if the mean is subtracted from each of the scores and the
resulting differences are added algebraically, the result will be zero. In the
following table column (1) shows raw scores (X), column (2) shows the
numbers obtained after subtracting the mean from the raw scores (x) and column (3)
shows the squares (x²) of these differences.
It can be observed that the sum of entries in column (2) is zero, and that of
those in column (3) is 120.
These results lead to certain important conclusions. Column (2) shows the
deviation of each raw score from the mean. These numbers are called
'deviation scores'. The deviation scores may be negative as well as positive
and indicate how much each raw score is above or below the mean. As shown
in the example, the sum of the deviation scores around the mean is zero. This
leads to a new definition of mean which states that mean is "that point about
which the sum of deviations is zero". Here, the sum of deviations on one side
of the mean equals the sum of deviations on the other side. The mean,
therefore, acts as a "fulcrum" of the distribution.
Column (3) shows each deviation squared, and the sum of these squares is 120. It
can be easily shown that the sum of squared deviations about any point other
than the mean will always be more than 120. This leads to still another
definition of mean as "that point in the distribution about which the sum of
squares of deviations is at a minimum". These properties of mean help in further
study of statistics.
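Both properties of the mean can be checked numerically. The following is a minimal sketch, assuming the ten raw scores of this unit's worked example (10, 12, 13, 15, 16, 16, 17, 20, 20, 21; mean 16). The helper name `sum_of_squares_about` and the alternative reference points 14, 15, 17 and 18 are chosen here only for illustration.

```python
# Raw scores of the unit's worked example (mean M = 16).
scores = [10, 12, 13, 15, 16, 16, 17, 20, 20, 21]

mean = sum(scores) / len(scores)

# Deviation scores x = X - M; about the mean they add up to zero.
deviations = [score - mean for score in scores]
sum_of_deviations = sum(deviations)


def sum_of_squares_about(point, values):
    """Sum of squared deviations of `values` about an arbitrary `point`."""
    return sum((value - point) ** 2 for value in values)


ss_about_mean = sum_of_squares_about(mean, scores)

# Any reference point other than the mean gives a larger sum of squares.
ss_about_others = {p: sum_of_squares_about(p, scores) for p in (14, 15, 17, 18)}

print(mean, sum_of_deviations, ss_about_mean)
print(ss_about_others)
```

Moving the reference point one unit away from the mean raises the sum of squares from 120 to 130, and two units away raises it to 160, which is the "minimum" property in action.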
It can be observed that 21 is such a point which divides the distribution into
two equal parts because 5 scores are below it and 5 scores are above it. Such a
point on the scale of measurement, above and below which an equal number of
cases or scores fall, is known as the 'median'. It should be noted that the median
is not necessarily one of the given scores. It is a point on the scale which
divides the group into two equal halves.
The concept of median may be compared with the median of a triangle which
divides the triangle into two equal parts by area. Here, area is comparable to
the size of the group. If the number of scores or students is odd, the median is
the middle score as shown in the above example. On the other hand, when the
number of scores is even, the median is the mean of the middle two scores. If
in the above example the last score 27 is deleted, the median (Md) may be
computed as follows:
If two or more scores in the distribution are repeated and the repetition takes
place near the median, the computation of median is made by the method of
interpolation. For example, in the following set of scores the score 15 occurs
thrice near the median.
The logic may be extended to the computation of median for grouped data.
Consider the following distribution:
In this case, the first column presents raw scores, second column frequencies
and the third column cumulative frequencies. The cumulative frequency of a
score means the total number of scores falling below its upper real limit. For
example, the cumulative frequency of the score 16 is 29. This means that 29
scores (out of 40) fall below 16.5.
In finding the median of the given distribution, we are interested in the point
below which 20 scores (half of 40) fall. We can see that 17 scores have been
covered up to the upper real limit (15.5) of the score 15. This means that our
median (Md) is more than 15.5, or we can say that
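Md = 15.5 + something
To reach 20 scores we need 20 - 17 = 3 more scores out of the 12 scores that
fall in the interval of the score 16 (15.5 to 16.5), which is 1 point wide.
Therefore,
$$\text{something} = \frac{40/2 - 17}{12} \times 1 = \frac{1 \times 3}{12} = 0.25 \text{ points}$$
so that Md = 15.5 + 0.25 = 15.75.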
Here, 40 is the size of the group (N), 17 is the cumulative frequency (F) of the
score next lower than the score which contains the median, 12 is the frequency (f)
of the score containing the median and 1 is the class interval (i) of the score
that contains the median. Therefore, the general equation for the median may be
written as:
$$Md = L + \frac{N/2 - F}{f} \times i$$
where L is the real lower limit of the score that contains the median.
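The interpolation rule can be sketched as a small function. This is an illustrative sketch only: the function name `grouped_median` and its parameter names are assumptions introduced here, while the numbers passed to it (L = 15.5, N = 40, F = 17, f = 12, i = 1) are the ones quoted in the text.

```python
def grouped_median(lower_real_limit, n, cum_freq_below, freq_in_class, interval=1):
    """Median by interpolation: Md = L + ((N/2 - F) / f) * i."""
    return lower_real_limit + ((n / 2 - cum_freq_below) / freq_in_class) * interval


# Values quoted in the text: L = 15.5, N = 40, F = 17, f = 12, i = 1.
md = grouped_median(15.5, 40, 17, 12, 1)
print(md)  # 15.75
```

The same function works for wider class intervals by passing the interval width as `interval`.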
It should be noted that the median is only the mid-point of the scores and its
computation does not take into account each and every score. Unlike the mean,
it is not affected by extremely high or extremely low scores. Two quite
different sets of scores may have the same median. If one or more of the scores
at the upper or lower ends of the distribution are changed, median is not
affected. It is an appropriate measure of central tendency when data represent
an ordinal scale, that is, when the data are available in the form of ranks,
ordered on some continuum in a series ranging from lowest to highest
according to the characteristic we wish to measure. For a distribution
representing an interval scale but having some extreme scores, the median may
also be a more appropriate measure of central tendency than the mean.
Mode: Mode is the third measure of central tendency, which is used when data
represent a nominal scale. It is defined as that value or score which occurs
most frequently. For example, in a set of scores in which the score 15 occurs
thrice, 16 occurs twice, and the other scores 13, 14 and 17 occur only once
each, 15 is by definition the mode. When data have been arranged into a
frequency distribution, the mode is the mid-point of the class interval having
the highest frequency.
Sometimes a distribution may have two modes. If in a set of scores two scores
have equal and highest frequency, then both of these scores are modes. Such a
distribution is known as a bimodal distribution. Similarly, there can be
multi-modal distributions also.
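Finding the mode, including the bimodal case, can be sketched as follows. Both score lists are hypothetical: the first merely matches the frequencies described in the text (15 thrice, 16 twice, the rest once), and the second adds one more 16 to produce two modes. The function name `modes` is an assumption made for this sketch.

```python
from collections import Counter


def modes(values):
    """Return every value that shares the highest frequency, in sorted order."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)


# Hypothetical set matching the frequencies described in the text.
unimodal = [13, 14, 15, 15, 15, 16, 16, 17]
# Hypothetical set in which 15 and 16 both occur three times.
bimodal = [13, 14, 15, 15, 15, 16, 16, 16, 17]

print(modes(unimodal))  # [15]
print(modes(bimodal))   # [15, 16]
```

Returning a list rather than a single value keeps bimodal and multi-modal distributions visible instead of silently picking one mode.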
16.5.2 Using Mean, Median and Mode
The three measures of central tendency are used in different situations. The
essential difference between mean and median is that while mean is based on
all the raw scores, the median is a point on the scale dividing the number of
persons or scores into two equal halves. The following data will clarify this
point:
In the above data, it can be observed that the mean is affected by a change of
scores at the extremes while the median is not. Therefore, in a distribution where
extreme scores exist, the median is the appropriate measure of central tendency
rather than the mean. In the distributions (2) and (3), median is the appropriate
measure of central tendency while in distribution (1) mean represents the central
tendency.
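The effect of extreme scores can be reproduced with a short sketch. The three distributions used here are hypothetical and are labelled (1) to (3) only to mirror the discussion; they are not the unit's own data. Distributions (2) and (3) differ from (1) in a single extreme score.

```python
from statistics import mean, median

# Hypothetical distributions: (2) and (3) differ from (1) in one extreme score.
distributions = {
    1: [10, 12, 14, 16, 18],
    2: [10, 12, 14, 16, 48],  # one very high score
    3: [0, 12, 14, 16, 18],   # one very low score
}

# For each distribution keep the pair (mean, median).
summary = {
    label: (mean(scores), median(scores))
    for label, scores in distributions.items()
}

for label, (m, md) in summary.items():
    print(label, m, md)
```

In this sketch the mean moves from 14 to 20 or to 12 when one extreme score changes, while the median stays at 14 in all three cases.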
Mode is seldom used. Its computation is easy, but it is highly unstable and
may change with a minor shift in the frequencies from one interval to another.
However, there are situations in which only the mode can be used. For example,
if a shoe company wants to know which size of shoe it should produce more of, it
would use the mode as a measure of central tendency. The most frequently sold
size of shoe is the mode.
b) Compare your answers with those given at the end of the unit.
12, 18, 15, 17, 16, 17, 12, 11, 18, 20, 19, 15, 16, 17, 14, 17,
13, 13, 11, 19, 18, 17, 14, 20, 15, 17, 16, 14, 19, 20.
Set A: 44 46 48 50 52 54 56
Set B: 20 30 40 50 60 70 80
It may be observed that the mean and median of both the sets are the same (50),
but the two distributions are obviously different from each other. In set A,
scores are very close to the mean while in set B, scores are spread apart in
relation to the mean. This shows that in addition to central tendency, it is also
important to know how scores are spread about the central value. The
tendency of the scores to spread about the central value is known as
'variability'. It can be seen that the variability of set B is more than that of
set A. Thus, you will agree that there is a need for having measures that indicate
how scores are spread out.
There are several statistics that serve this purpose. These are: range, mean
deviation, standard deviation and quartiles. Range is simply the difference
between the highest and the lowest scores in a distribution. In set A, the range
is 56 - 44 = 12, while in set B it is 80 - 20 = 60. Though the range can be quickly
calculated, it is a highly unstable measure, like the mode, because it depends
on only the two extreme scores. However, it gives a rough
estimate of variability in a given situation. Among the other measures, standard
deviation is the most useful. Since mean deviation and standard deviation are
closely related, we will discuss them in detail.
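The comparison of sets A and B can be sketched in a few lines. The scores are the ones given in the text; the helper name `score_range` is an assumption made here.

```python
from statistics import mean, median

set_a = [44, 46, 48, 50, 52, 54, 56]
set_b = [20, 30, 40, 50, 60, 70, 80]


def score_range(scores):
    """Range = highest score minus lowest score."""
    return max(scores) - min(scores)


# Same mean and median, but very different ranges.
for name, scores in (("A", set_a), ("B", set_b)):
    print(name, mean(scores), median(scores), score_range(scores))
```

Both sets report a mean and median of 50, while the ranges (12 and 60) expose the difference in spread.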
While discussing the mean in Section 16.5.1, we defined the deviation score x
obtained by subtracting the mean (M) from each of the raw scores (X). That is:
$$x = X - M$$
We also showed that the sum of deviation scores for given data is zero. If we
consider all the deviation scores as positive, their sum will not be zero. Let us
reproduce the earlier example in the form of the given table. It can be observed
that when deviation scores are added irrespective of their algebraic signs, their
sum is 28. We obtain a measure of variability by dividing 28 by 10, which gives
2.8. As this measure is the mean of absolute values of deviations from the
mean, it is known as the 'Mean Deviation' (MD).
Raw Score    Deviation Score    Absolute Value of x    Squared Deviation
    X               x                   |x|                    x²
   10              -6                    6                     36
   12              -4                    4                     16
   13              -3                    3                      9
   15              -1                    1                      1
   16               0                    0                      0
   16               0                    0                      0
   17              +1                    1                      1
   20              +4                    4                     16
   20              +4                    4                     16
   21              +5                    5                     25
ΣX = 160         Σx = 0             Σ|x| = 28              Σx² = 120
M = 16
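Symbolically,
$$MD = \frac{\sum |x|}{N}$$
In the same table, the sum of the squared deviations is 120. Dividing it by
N = 10 gives 12, and the square root of 12 is 3.46.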
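The number 3.46 is the Standard Deviation (SD) of the given set of scores. It is
computed by finding the square root of the mean of the squared deviations of
the raw scores from the mean. Symbolically,
$$SD = \sqrt{\frac{\sum x^2}{N}}$$
in which x = X - M.
The SD can also be computed directly from the raw scores:
$$SD = \sqrt{\frac{\sum X^2}{N} - \left(\frac{\sum X}{N}\right)^2}$$
where ΣX² is the sum of squared raw scores and ΣX is the sum of raw scores.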
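The mean deviation and the standard deviation of the worked example can be reproduced with a minimal sketch. It assumes the ten raw scores of the table (10, 12, 13, 15, 16, 16, 17, 20, 20, 21) and divides by N, as this unit does. The variable names are chosen here for illustration.

```python
from math import sqrt

# Raw scores of the unit's worked example.
scores = [10, 12, 13, 15, 16, 16, 17, 20, 20, 21]
n = len(scores)
mean = sum(scores) / n

# Deviation scores x = X - M.
deviations = [x - mean for x in scores]

# Mean deviation: mean of the absolute deviations (28 / 10).
mean_deviation = sum(abs(d) for d in deviations) / n

# SD from deviation scores: square root of the mean squared deviation.
sd_from_deviations = sqrt(sum(d ** 2 for d in deviations) / n)

# SD from raw scores: sqrt(sum(X^2)/N - (sum(X)/N)^2).
sd_from_raw = sqrt(sum(x ** 2 for x in scores) / n - (sum(scores) / n) ** 2)

print(mean_deviation, round(sd_from_deviations, 2), round(sd_from_raw, 2))
```

Both routes to the SD give about 3.46. Python's `statistics.pstdev` follows the same divide-by-N convention, whereas `statistics.stdev` divides by N - 1 and would give a slightly larger value.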
When the distribution of scores is approximately normal, the SD shows
interesting features. In this case, more than 99% of the raw scores fall between
the points 3 standard deviations below and 3 standard deviations above the
mean. In the above example, 99% of the raw scores lie within the limits
16 ± 3 × 3.46 or 16 ± 10.38
In the same way, about 95% and 68% of the cases fall within the limits
16 ± 2 × 3.46 and 16 ± 1 × 3.46 respectively. Symbolically, we can say that
M ± 1 SD includes about 68% of the scores,
M ± 2 SD includes about 95% of the scores, and
M ± 3 SD includes about 99% of the scores.
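The limits and the percentages can be checked with a short sketch. It assumes M = 16 and SD = 3.46 from the worked example and uses the standard normal curve from Python's `statistics.NormalDist` for the percentages. The ten example scores are of course too few to be truly normal, so the percentages describe an idealised normal distribution rather than that small set.

```python
from statistics import NormalDist

mean, sd = 16, 3.46  # values from the worked example

# Limits M - k*SD and M + k*SD for k = 1, 2, 3, rounded to two decimals.
limits = {k: (round(mean - k * sd, 2), round(mean + k * sd, 2)) for k in (1, 2, 3)}

# Share of a perfectly normal distribution lying within k SDs of its mean.
standard_normal = NormalDist()
coverage = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}

for k in (1, 2, 3):
    print(k, limits[k], round(coverage[k] * 100, 1))
```

The exact normal-curve figures are roughly 68.3%, 95.4% and 99.7%, which the text rounds to 68%, 95% and 99%.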
Z-Scores: We have seen earlier that deviation scores may be both negative and
positive. All the raw scores falling above the mean have positive deviation
scores and those falling below the mean have negative deviation scores. When
each of the deviation scores is divided by the standard deviation, we obtain a
different kind of derived score commonly known as the Z-score or standard
score. We define the Z-score as follows:
$$Z = \frac{X - M}{SD}$$
It can be observed that Z-scores are both positive and negative. If we find the
algebraic sum of Z-scores, we will get zero. An important characteristic of
Z-scores is that their mean is always zero and their SD is always 1. This enhances
the utility of Z-scores in comparing scores given on different scales. If two
different sets of scores are converted into Z-scores, they become comparable.
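The two characteristics of Z-scores (mean 0, SD 1) can be verified with a minimal sketch. It assumes the ten raw scores of the unit's worked example; the helper names `mean_and_sd` and `z_scores` are introduced here for illustration.

```python
from math import sqrt

# Raw scores of the unit's worked example (M = 16, SD about 3.46).
scores = [10, 12, 13, 15, 16, 16, 17, 20, 20, 21]


def mean_and_sd(values):
    """Return the mean and the SD (dividing by N, as in this unit)."""
    n = len(values)
    m = sum(values) / n
    return m, sqrt(sum((v - m) ** 2 for v in values) / n)


def z_scores(values):
    """Convert raw scores to Z-scores: Z = (X - M) / SD."""
    m, sd = mean_and_sd(values)
    return [(v - m) / sd for v in values]


z = z_scores(scores)
z_mean, z_sd = mean_and_sd(z)

print([round(value, 2) for value in z])
print(round(z_mean, 6), round(z_sd, 6))
```

Whatever the original scale, the converted scores come out with a mean of 0 and an SD of 1 (up to rounding error), which is what makes two different sets comparable.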
T-Scores: Z-scores are simple and meaningful. But they suffer from certain
limitations. You have to work with negative scores and decimal fractions while
using Z-scores, which can be inconvenient for the average teacher. In order to
simplify the use and interpretation of scores, you can
further convert Z-scores into a new type of standard score known as the T-score.
T-scores are standard scores having a mean of 50 and an SD of 10 points. For
converting a raw score to a T-score, it has to be converted to a Z-score first.
After this, each Z-score is multiplied by 10 and 50 is added to the product to
obtain a T-score. Symbolically,
$$T = 10Z + 50$$
These standard scores can be used to interpret and compare raw scores without
encountering negative signs and decimal fractions. This can be easily
understood not only by teachers, but also by students and parents.
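The conversion from raw score to T-score can be sketched as follows. The sketch assumes M = 16 and SD = √12 (about 3.46) from the unit's worked example. Rounding the T-score to a whole number is an assumption made here so that no decimal fractions remain; the function name `t_score` is also illustrative.

```python
from math import sqrt


def t_score(raw, mean, sd):
    """T = 10 * Z + 50, rounded to a whole number (rounding is an assumption)."""
    z = (raw - mean) / sd  # Z-score first
    return round(10 * z + 50)


mean, sd = 16, sqrt(12)  # M and SD of the unit's worked example (SD about 3.46)
t_scores = {raw: t_score(raw, mean, sd) for raw in (10, 16, 21)}
print(t_scores)  # {10: 33, 16: 50, 21: 64}
```

A raw score equal to the mean becomes 50, and the lowest and highest scores of the example become 33 and 64, with no negative signs involved.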
b) Compare your answers with those given at the end of the unit.
12, 18, 15, 17, 16, 17, 12, 11, 18, 20, 19, 15, 16, 17, 14, 17,
13, 13, 11, 19, 18, 17, 14, 20, 15, 17, 16, 14, 19, 20.
You might have observed that students scoring high in mathematics normally
tend to score high in science also. If you measure the heights and weights of
15-year-old children, you will observe that taller children tend to be heavier. In
the first case you can say that achievement scores in science and mathematics
vary together. Similarly, it can be said that heights and weights of children
vary together. There are many variables in nature that vary together. You can
also say that such variables covary. When two variables covary, they are said
to be correlated, and the underlying phenomenon is known as 'correlation'.
$$M_x = \frac{15}{5} = 3, \qquad \sigma_x = 1.095 \approx 1.1$$
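The formula,
$$r = \frac{\sum Z_x Z_y}{N}$$
may be simplified by substituting $Z_x = \frac{x}{\sigma_x}$ and
$Z_y = \frac{y}{\sigma_y}$, which gives
$$r = \frac{\sum xy}{N \sigma_x \sigma_y}$$
If we substitute $\sigma_x = \sqrt{\frac{\sum x^2}{N}}$ and
$\sigma_y = \sqrt{\frac{\sum y^2}{N}}$,
we have,
$$r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}$$
Writing the deviation scores in terms of raw scores ($x = X - M_x$ and
$y = Y - M_y$), this becomes
$$r = \frac{N \sum XY - \sum X \sum Y}{\sqrt{\left[N \sum X^2 - (\sum X)^2\right]\left[N \sum Y^2 - (\sum Y)^2\right]}}$$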
This equation can be used to calculate the correlation coefficient directly from
raw scores.
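The equivalence of the Z-score definition and the raw-score computation can be checked with a minimal sketch. The paired scores are hypothetical (five persons, not taken from the unit), and the raw-score expression used is the standard one, r = (NΣXY - ΣXΣY) / √([NΣX² - (ΣX)²][NΣY² - (ΣY)²]).

```python
from math import sqrt

# Hypothetical paired scores for five persons (not taken from the unit).
x_scores = [1, 3, 3, 4, 4]
y_scores = [2, 3, 5, 4, 6]
n = len(x_scores)


def mean_and_sd(values):
    """Return the mean and the SD (dividing by N)."""
    m = sum(values) / len(values)
    return m, sqrt(sum((v - m) ** 2 for v in values) / len(values))


# Method 1: mean of the products of paired Z-scores.
mx, sx = mean_and_sd(x_scores)
my, sy = mean_and_sd(y_scores)
r_from_z = sum(
    ((x - mx) / sx) * ((y - my) / sy) for x, y in zip(x_scores, y_scores)
) / n

# Method 2: raw-score formula, using only sums of X, Y, XY, X^2 and Y^2.
sum_x, sum_y = sum(x_scores), sum(y_scores)
sum_xy = sum(x * y for x, y in zip(x_scores, y_scores))
sum_x2 = sum(x * x for x in x_scores)
sum_y2 = sum(y * y for y in y_scores)
r_from_raw = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

print(round(r_from_z, 4), round(r_from_raw, 4))
```

Both methods give the same coefficient (about 0.77 for these hypothetical scores). The raw-score route avoids computing means and deviations first, which is convenient for hand calculation.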
The correlation coefficient may assume both positive and negative values
ranging from -1 to +1. When the value of the correlation coefficient is +1 or -1, it
Thus
Coefficient of Determination = r²
The value of r² gives the proportion of variance in one variable which can be
accounted for by variation in the other variable. In the above example,
$$r^2 = 0.0256$$
This means that only 2.56% of the variance in one variable is accounted for by
variance in the other variable. Similarly, an r = .70 indicates that the
coefficient of determination is (.70)² or .49, showing that 49% of the variance
is common to the variables being correlated.
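The arithmetic of the coefficient of determination can be sketched in a few lines. The function name `percent_shared_variance` is an assumption made here, and r = 0.16 is also assumed: the text quotes only the resulting 2.56%, not the sign or value of r.

```python
def percent_shared_variance(r):
    """Coefficient of determination r**2, expressed as a percentage."""
    return round(r ** 2 * 100, 2)


print(percent_shared_variance(0.70))  # 49.0
print(percent_shared_variance(0.16))  # 2.56 (r = 0.16 is an assumed value)
```

Because r is squared, a negative correlation of the same size gives the same percentage of common variance.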
b) Compare your answers with those given at the end of the unit.
2. Compute the correlation coefficient between the variables X
and Y in the following data and interpret the result.
X: 95, 90, 85, 80, 75, 70, 65, 60, 55
Y: 76, 78, 77, 71, 75, 79, 73, 72, 74
16.8 LET US SUM UP
2. For the given data Mean = 16, Median = 16.5, Mode = 17.
Answers to Check Your Progress 3
For the given data standard deviation = 2.633
1. Two variables are said to be correlated when they vary together in the same
or opposite directions. The underlying phenomenon is known as
correlation. The coefficient of correlation may vary from -1 to +1.
2. The correlation coefficient between the given sets of scores is +0.433
(Σxy = 130, Σx² = 1500, Σy² = 60). The coefficient of determination
r² = 0.1878, which means that 18.78% of the variance is common to both the
variables.