
Chapter Three

Uploaded by Habtamu Sntayhu
Copyright © All Rights Reserved

CHAPTER THREE

MEASURES OF CENTRAL TENDENCY AND DISPERSION


At the end of this chapter, students will be able to:
• Explain the use of summation notation.
• Discuss measures of central tendency.
• Discuss measures of dispersion.
3.1. Introduction
Chapter Two showed how you can gain useful information from raw data by organizing them into a frequency distribution and then presenting the data using various graphs.
This chapter is concerned with two numerical ways of describing data, namely, measures of central location and measures of dispersion. Measures of location are often referred to as averages; they are also called measures of central tendency and include the mean, median, mode, and midrange. The purpose of a measure of location is to pinpoint the center of a set of data: an average is a measure of location that shows the central value of the data.
In addition to measures of location, we should consider the dispersion, often called the variation or the spread, in the data. The measures that determine the spread of the data values are called measures of variation, or measures of dispersion. These measures include the range, variance, and standard deviation.

3.2. The Use of Summation Notation


The summation notation
Suppose x1, x2, x3, …, xn are numerical measurements of a variable X. The sum of all the xi, where i goes from 1 up to n, is symbolically given by
$$\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n$$
The summation operator Σ implies that the values that follow it are to be summed, or added together.
Properties of summation notation
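The definition of Σ, together with three standard properties of summation (the constant rule, the constant-multiple rule, and the sum rule), can be checked numerically. This is a minimal sketch: the list x reuses the values 3, 2, 6, 5, 4 from the mean example in Section 3.3, while y and c are hypothetical values chosen only for illustration.

```python
# Summation notation: sum_{i=1}^{n} x_i = x_1 + x_2 + ... + x_n
x = [3, 2, 6, 5, 4]   # x_1, ..., x_5 (the values used in the mean example)
y = [1, 4, 2, 7, 3]   # hypothetical second variable, used only for the sum rule
c = 2                 # hypothetical constant
n = len(x)

total_x = sum(x)      # x_1 + x_2 + ... + x_n

# Constant rule: adding a constant c to itself n times gives n * c
rule_constant = sum(c for _ in range(n)) == n * c
# Constant-multiple rule: a constant factor can be moved outside the sum
rule_multiple = sum(c * xi for xi in x) == c * sum(x)
# Sum rule: the sum of (x_i + y_i) equals the sum of x plus the sum of y
rule_sum = sum(xi + yi for xi, yi in zip(x, y)) == sum(x) + sum(y)

print(total_x, rule_constant, rule_multiple, rule_sum)  # 20 True True True
```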

3.3. Measures of Central Tendency
When populations are small, it is not necessary to use samples, since the entire population can be used to gain information.
• For example, suppose the manager of ABC business wanted to know the average weekly sales of all the company's representatives. If the company employed a large number of salespeople, say, nationwide, he would have to use a sample and make an inference to the entire sales force. But if the company had only a few salespeople, say, 87 agents, he would be able to use all representatives' sales for a randomly chosen week and thus use the entire population.
• Measures found by using all the data values in the population are called parameters.
• Measures obtained by using the data values from samples are called statistics.
A. Mean
The mean is a central value of a finite set of numbers. The mean, also known as the arithmetic average, is found by adding the values of the data and dividing by the total number of values.
For example, the mean of 3, 2, 6, 5, and 4 is found by adding 3 + 2 + 6 + 5 + 4 = 20 and dividing by 5; hence, the mean of the data is 20/5 = 4. The values of the data are represented by X's. In this data set, X1 = 3, X2 = 2, X3 = 6, X4 = 5, and X5 = 4. To show the sum of the X values, the symbol Σ (the capital Greek letter sigma) is used, and ΣX means to find the sum of the X values in the data set.
1. Sample mean
We often select a sample from a population to find something about a specific characteristic of the population.
For raw data, that is, ungrouped data, the mean is the sum of all the sampled values divided by the total number of sampled values. To find the mean for a sample:
Sample mean = (Sum of all the values in the sample) / (Number of values in the sample)
The mean of a sample and the mean of a population are computed in the same way, but the shorthand notation used is different. The formula for the mean of a sample is:
$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$
or
$$\bar{X} = \frac{\sum X}{n}$$
where:
X̄ (read "X bar") is the sample mean.
n is the number of values in the sample.
X represents any particular value.
Σ is the Greek capital letter sigma and indicates the operation of adding.
ΣX is the sum of the X values.
Example 1: Injibara General Hospital is studying the number of minutes used by clients in a particular laboratory room. A random sample of 12 clients showed the following number of minutes used last month.
90 77 94 89 119 112
91 110 92 100 113 83
What is the arithmetic mean number of minutes used?
Solution
$$\bar{X} = \frac{\sum X}{n} = \frac{90 + 77 + 94 + 89 + 119 + 112 + 91 + 110 + 92 + 100 + 113 + 83}{12} = \frac{1170}{12} = 97.5$$
The arithmetic mean of the number of minutes used last month by the sample of laboratory room users is 97.5 minutes.
Example 2
The data represent the number of days off per year for a sample of individuals
selected from nine different organizations. Find the mean.
20, 26, 40, 36, 23, 42, 35, 24, 30
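Solution
$$\bar{X} = \frac{\sum X}{n} = \frac{20 + 26 + 40 + 36 + 23 + 42 + 35 + 24 + 30}{9} = \frac{276}{9} \approx 30.7$$
Hence, the mean number of days off is 30.7 days.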

Example 3
The following data show the number of patients in a sample of Bure, Finoteselam, Injibara, Chagni, Jawi, and Dangila hospitals who acquired an infection while hospitalized.
110, 76, 29, 38, 105, 31
What is the mean number of hospital infections for the six hospitals?
Solution
$$\bar{X} = \frac{\sum X}{n} = \frac{110 + 76 + 29 + 38 + 105 + 31}{6} = \frac{389}{6} \approx 64.8$$
The mean number of hospital infections for the six hospitals is 64.8.

Rounding Rule for the Mean


The mean should be rounded to one more decimal place than occurs in the raw data.
• For example, if the raw data are given in whole numbers, the mean should be rounded to the nearest tenth. If the data are given in tenths, the mean should be rounded to the nearest hundredth, and so on.
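As a cross-check of Examples 1 to 3 and the rounding rule, the sample mean X̄ = ΣX/n can be computed in a few lines of Python. This is a minimal sketch; the helper name `sample_mean` is hypothetical, not something defined in these notes.

```python
def sample_mean(values):
    """Sample mean: X-bar = (sum of X) / n."""
    return sum(values) / len(values)

# Data from Examples 1-3 (whole numbers, so each mean is rounded to one decimal place)
minutes = [90, 77, 94, 89, 119, 112, 91, 110, 92, 100, 113, 83]
days_off = [20, 26, 40, 36, 23, 42, 35, 24, 30]
infections = [110, 76, 29, 38, 105, 31]

for name, data in [("minutes", minutes), ("days off", days_off), ("infections", infections)]:
    print(name, round(sample_mean(data), 1))
# minutes 97.5
# days off 30.7
# infections 64.8
```

Python's built-in `round()` works on binary floating-point values and rounds exact ties to the even digit, so at a boundary such as ...x5 it can occasionally differ from rounding by hand; none of these three means falls on such a tie.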
2. Population Mean

The population mean is the sum of all the values in the population divided by the number of values in the population. To find the population mean, we use the following formula:
Population mean = (Sum of all the values in the population) / (Number of values in the population)
Using mathematical symbols, the mean of a population is:
$$\mu = \frac{X_1 + X_2 + \cdots + X_N}{N}$$
or
$$\mu = \frac{\sum X}{N}$$
where:
μ represents the population mean. It is the Greek lowercase letter "mu."
N is the number of items in the population.
X represents any particular value.
Σ is the Greek capital letter sigma and indicates the operation of adding.
ΣX is the sum of the X values.

There are five groups of students in the Department of Management, Injibara University. Listed below is the number of points earned by each group in the first semester of 2013 E.C. in Statistics for Management I.
Group   Points
G-1     63
G-2     62
G-3     57
G-4     56
G-5     48

Is this a population? What is the arithmetic mean number of points earned?


Solution
This is a population if the researcher is considering only the groups in the Department of Management, Injibara University. Add the number of points for each of the five groups: the total number of points for the five groups is 286. To find the arithmetic mean, divide this total by 5:
$$\mu = \frac{\sum X}{N} = \frac{286}{5} = 57.2$$

How do we interpret the value of 57.2? The typical number of points earned by a group in the Department of Management, Injibara University, during the semester was 57.2.

Finding the Mean of Grouped Data

The procedure for finding the mean for grouped data uses the midpoints of the classes. It assumes that the mean of all the raw data values in each class is equal to the midpoint of the class. In reality, this is not true, since the average of the raw data values in each class usually will not be exactly equal to the midpoint. However, the procedure gives an acceptable approximation of the mean, since some values fall above the midpoint and other values fall below the midpoint for each class, and the midpoint represents an estimate of all values in the class.
The mean for grouped data is therefore computed as
$$\bar{X} = \frac{\sum f \cdot X_m}{n}$$
where f is the frequency of a class, Xm is the midpoint of that class, and n = Σf is the total frequency. The steps are illustrated in the following example.

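Example: The following frequency distribution shows the miles that 20 randomly selected runners ran during a given week. Find the mean.
Class boundaries   Frequency
5.5-10.5           1
10.5-15.5          2
15.5-20.5          3
20.5-25.5          5
25.5-30.5          4
30.5-35.5          3
35.5-40.5          2

Solution

Step 1 Make a table with the class boundaries in column A and the frequencies in column B.
Step 2 Find the midpoint of each class and enter it in column C.
$$X_m = \frac{5.5 + 10.5}{2} = 8,\quad \frac{10.5 + 15.5}{2} = 13,\quad \frac{15.5 + 20.5}{2} = 18,\quad \frac{20.5 + 25.5}{2} = 23,$$
$$\frac{25.5 + 30.5}{2} = 28,\quad \frac{30.5 + 35.5}{2} = 33,\quad \frac{35.5 + 40.5}{2} = 38$$
Step 3 For each class, multiply the frequency by the midpoint, and place the product in column D.
1 × 8 = 8
2 × 13 = 26
3 × 18 = 54
5 × 23 = 115
4 × 28 = 112
3 × 33 = 99
2 × 38 = 76
Step 4 Find the sum of column D.
Σf·Xm = 8 + 26 + 54 + 115 + 112 + 99 + 76 = 490
Step 5 Divide the sum by n to get the mean.
$$\bar{X} = \frac{\sum f \cdot X_m}{n} = \frac{490}{20} = 24.5 \text{ miles}$$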
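The grouped-data procedure (midpoints, then Σf·Xm divided by n) can be sketched as a small Python function. This is a minimal sketch under the chapter's midpoint assumption; the function name `grouped_mean` is hypothetical, and the class boundaries and frequencies are the runners' data implied by Steps 2 and 3 of the example above.

```python
def grouped_mean(boundaries, freqs):
    """Approximate mean of grouped data: sum(f * Xm) / n, where Xm is the class midpoint."""
    midpoints = [(lo + hi) / 2 for lo, hi in boundaries]   # column C
    n = sum(freqs)                                         # total of column B
    return sum(f * xm for f, xm in zip(freqs, midpoints)) / n  # column D total / n

# Runners' frequency distribution (class boundaries and frequencies)
boundaries = [(5.5, 10.5), (10.5, 15.5), (15.5, 20.5), (20.5, 25.5),
              (25.5, 30.5), (30.5, 35.5), (35.5, 40.5)]
freqs = [1, 2, 3, 5, 4, 3, 2]
print(grouped_mean(boundaries, freqs))  # 24.5
```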

The Properties of the Arithmetic Mean
The arithmetic mean is a widely used measure of central location or tendency. It has several important properties:
• Every set of interval-level or ratio-level data has a mean.
• All the values are included in computing the mean.
• The mean is unique. That is, there is only one mean in a set of data. Later in the chapter, we will discover a measure of location that might appear twice, or more than twice, in a set of data.
• The sum of the deviations of each value from the mean is zero. Expressed symbolically:
$$\sum (X - \bar{X}) = 0$$
Thus, we can consider the mean as a balance point for a set of data.
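The balance-point property Σ(X − X̄) = 0 can be verified exactly with Python's `fractions.Fraction`, which avoids floating-point rounding when the mean is not a whole number. This minimal sketch uses the days-off data from Example 2 (mean 276/9); the helper name `deviations_from_mean` is hypothetical, and the code assumes integer data.

```python
from fractions import Fraction

def deviations_from_mean(values):
    """Return each deviation X - X-bar, computed exactly (assumes integer data)."""
    mean = Fraction(sum(values), len(values))
    return [x - mean for x in values]

days_off = [20, 26, 40, 36, 23, 42, 35, 24, 30]   # Example 2 data, mean = 276/9
devs = deviations_from_mean(days_off)
print(sum(devs))  # 0
```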

B. Median

The median of a set of data is defined as the middle value when the data are arranged in order of magnitude. If there are no ties, half of the observations will be smaller than the median, and half of the observations will be larger than the median.
The median is the halfway point in a data set. Before you can find this point, the data must be arranged in order. When the data set is ordered, it is called a data array. The median either will be a specific value in the data set or will fall between two values.
For data sets containing one or two very large or very small values, the center is generally better described by the median.
The symbol for the median is MD.
Steps in computing the median of a data array
• Step 1 Arrange the data in order.
• Step 2 Select the middle point.
Example 1: The numbers of rooms in the Habesha, Belay Zeleke, Gozamin, ABC, Tilik, FM, and Menkorer hotels in Debremarkos are 713, 300, 618, 595, 311, 401, and 292. Find the median.

Solution
Step 1 Arrange the data in order.
292, 300, 311, 401, 595, 618, 713
Step 2 Select the middle value.
292, 300, 311, [401], 595, 618, 713
Therefore, the median is 401.
• Note: this procedure applies when the number of observations is odd. If the number of observations is even, the median is found by adding the two middle values and dividing the sum by two. In other words:
• For an odd number of observations, the median is the middle value. That is, if N is odd, the median is given by
$$\text{MD} = \left(\frac{N+1}{2}\right)^{\text{th}} \text{ value}$$
• For an even number of observations, the median is the average of the two middle values. If N is even, the median is given by
$$\text{MD} = \frac{\left(\frac{N}{2}\right)^{\text{th}} \text{ value} + \left(\frac{N}{2} + 1\right)^{\text{th}} \text{ value}}{2}$$
Example 2: Find the median of each of the following sets of numbers.
a. 12, 15, 22, 17, 20, 26, 22, 26, 12
b. 4, 7, 9, 10, 5, 1, 3, 4, 12, 10
Solution
a. Arranging the data in increasing order of magnitude, we obtain
12, 12, 15, 17, 20, 22, 22, 26, 26
Here, N (= 9) is odd, and so
$$\text{MD} = \left(\frac{9+1}{2}\right)^{\text{th}} \text{ value} = 5^{\text{th}} \text{ value} = 20$$
Notice that if a number is repeated, we still count it the number of times it appears when we calculate the median.
b. Arranging the data in increasing order of magnitude, we obtain
1, 3, 4, 4, 5, 7, 9, 10, 10, 12. Here, N (= 10) is even, and so
$$\text{MD} = \frac{5^{\text{th}} \text{ value} + 6^{\text{th}} \text{ value}}{2} = \frac{5 + 7}{2} = 6$$
Notice that, in each case, the median divides the distribution into two equal parts, with 50% of the observations greater than it and the other 50% less than it.
Example 3: The numbers of products sold by ABC Trading over an 8-year period follow.
684, 764, 656, 702, 856, 1133, 1132, 1303
Find the median.
Step 1 Arrange the data in order.
656, 684, 702, 764, 856, 1132, 1133, 1303
Step 2 Select the middle values.
Since N (= 8) is even, the two middle values are 764 and 856. Therefore, the median for these data is
$$\text{MD} = \frac{764 + 856}{2} = 810$$
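The two median rules (odd N: the middle value; even N: the average of the two middle values) can be sketched in Python and checked against Examples 1 to 3. This is a minimal sketch; the function name `median` is a hypothetical helper, not part of the notes.

```python
def median(values):
    """Median of ungrouped data: middle value (N odd) or mean of the two middle values (N even)."""
    data = sorted(values)        # Step 1: arrange the data in order
    n = len(data)
    mid = n // 2                 # 0-based index of the upper middle value
    if n % 2 == 1:
        return data[mid]         # the ((N + 1) / 2)th value
    return (data[mid - 1] + data[mid]) / 2   # average of the (N/2)th and (N/2 + 1)th values

print(median([713, 300, 618, 595, 311, 401, 292]))          # 401
print(median([12, 15, 22, 17, 20, 26, 22, 26, 12]))         # 20
print(median([4, 7, 9, 10, 5, 1, 3, 4, 12, 10]))            # 6.0
print(median([684, 764, 656, 702, 856, 1133, 1132, 1303]))  # 810.0
```

Python's standard `statistics.median` follows the same two rules and gives the same results for these lists.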
The major properties of the median
• It is not affected by extremely large or small values. Therefore, the median is a valuable measure of location when such values do occur.
• It can be computed for ordinal-level data or higher.

The Median of Grouped Data


Recall that the median is defined as the midpoint of the observations after they have been ordered from the lowest to the highest. If the data are grouped, some of the raw data values may not be available, and so we cannot necessarily determine the exact value of the median. But we can estimate the median by first finding the position of the median (which class it falls in) and then calculating an estimate of the median within this median class. The median class is the first class whose cumulative frequency is at least n/2, and
$$\text{MD} = L + \frac{\frac{n}{2} - m}{f} \times c$$
where
L = lower limit of the median class
n = total number of observations = Σf
m = cumulative frequency preceding the median class
f = frequency of the median class
c = class interval of the median class
Example: Find the median for the following continuous frequency distribution:
Class       0-1   1-2   2-3   3-4   4-5   5-6
Frequency    1     4     8     7     3     2

Solution
Step 1: Determine the cumulative frequency.

Class   Frequency   Cumulative frequency
0-1     1           1
1-2     4           5
2-3     8           13
3-4     7           20
4-5     3           23
5-6     2           25
Total   25

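Step 2: Locate the median class and substitute the relevant values in the formula.
Here n = 25, so n/2 = 12.5. The first cumulative frequency to reach 12.5 is 13, so the median class is 2-3, with L = 2, m = 5, f = 8, and c = 1:
$$\text{MD} = L + \frac{\frac{n}{2} - m}{f} \times c = 2 + \frac{12.5 - 5}{8} \times 1 = 2 + 0.9375 = 2.9375$$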
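The grouped-median estimate MD = L + ((n/2 − m)/f) × c can be sketched as a loop over the cumulative frequencies. This is a minimal sketch assuming equal class widths and classes listed in increasing order; the function name `grouped_median` is hypothetical.

```python
def grouped_median(lower_limits, freqs, width):
    """Estimate MD = L + ((n/2 - m) / f) * c for a grouped frequency distribution."""
    n = sum(freqs)
    half = n / 2
    cumulative = 0                    # m: cumulative frequency before the current class
    for lower, f in zip(lower_limits, freqs):
        if cumulative + f >= half:    # first class whose cumulative frequency reaches n/2
            return lower + (half - cumulative) / f * width
        cumulative += f
    raise ValueError("no class reaches n/2 (empty distribution?)")

# Distribution from the example: classes 0-1, 1-2, ..., 5-6 (width c = 1)
print(grouped_median([0, 1, 2, 3, 4, 5], [1, 4, 8, 7, 3, 2], 1))  # 2.9375
```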

C. Mode
The mode is the value that occurs most often in the data set. It is sometimes said to be the most typical case.
A data set that has only one value that occurs with the greatest frequency is said to be unimodal. If a data set has two values that occur with the same greatest frequency, both values are considered to be the mode, and the data set is said to be bimodal. If a data set has more than two values that occur with the same greatest frequency, each value is used as the mode, and the data set is said to be multimodal. When no data value occurs more than once, the data set is said to have no mode.
The mode is a very useful measure when, for example, a shop wants to keep in inventory the most popular shirt collar size during the festival season; the median and mean will not be helpful in this type of situation. Another example where the mode is the only answer is in determining the most typical shoe size to be kept in stock in a shop selling shoes.

A data set can have more than one mode or no mode at all. These situations will be shown in some of the examples that follow.
Example 1: Find the mode of the signing bonuses of your new business in 2013 E.C. The bonuses, in millions of dollars, are
18.0, 14.0, 34.5, 10, 11.3, 10, 12.4, 10

Solution
It is helpful to arrange the data in order, although it is not necessary.
10, 10, 10, 11.3, 12.4, 14.0, 18.0, 34.5
Since $10 million occurred 3 times, a frequency larger than that of any other number, the mode is $10 million.

Example 2: The lives, in hours, of 10 flashlight batteries are as follows. Find the mode.
340 350 340 340 320 340 330 330 340 350

340 occurs five times, more often than any other value. Hence, mode = 340.

• The mode of 1, 2, 2, 2, 3 is 2.
• The modes of 2, 3, 4, 4, 5, 5 are 4 and 5.
• The mode does not exist when every observation has the same frequency. For example, the following sets of data have no mode:
  3, 6, 8, 9;
  4, 4, 4, 7, 7, 7, 9, 9, 9.
It can be seen that the mode of a distribution may not exist, and even if it exists, it may not be unique.
Example 3: The blood groups of 20 patients selected at random were determined.

Blood group          A   B   AB   O
Number of patients   2   4   6    8

The blood group with the highest frequency is O. The mode of the data is therefore blood group O. We can say that more of the patients selected have blood group O than any other blood group.

• Notice that the mean and the median cannot be applied to the data in this example. This is because the variable "blood group" cannot take numerical values. The mode can be used to describe both quantitative and qualitative data.
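The mode rules above (one mode, several modes, or no mode when every value has the same frequency) can be sketched with `collections.Counter`, which also works for categorical data such as blood groups. This is a minimal sketch; the function name `modes` is hypothetical, and returning an empty list to mean "no mode" is a convention chosen here to match the chapter's definition.

```python
from collections import Counter

def modes(values):
    """Return the list of modes; an empty list means 'no mode' under this chapter's definition."""
    counts = Counter(values)
    if len(set(counts.values())) == 1:   # every value has the same frequency -> no mode
        return []
    highest = max(counts.values())
    return [v for v, c in counts.items() if c == highest]

print(modes([340, 350, 340, 340, 320, 340, 330, 330, 340, 350]))  # [340]
print(modes([2, 3, 4, 4, 5, 5]))                                    # [4, 5]
print(modes([4, 4, 4, 7, 7, 7, 9, 9, 9]))                           # []
print(modes(["A"] * 2 + ["B"] * 4 + ["AB"] * 6 + ["O"] * 8))        # ['O']
```

The standard library's `statistics.multimode` returns every value when all frequencies tie (for example, all four values of 3, 6, 8, 9), which differs from the "no mode" convention used here, so the sketch handles that case explicitly.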
Mode for Grouped Data

For a grouped frequency distribution, the class interval with the highest frequency is called the modal class. The mode is estimated by
$$\text{Mode} = L + \frac{d_1}{d_1 + d_2} \times C$$
where
L = lower limit of the modal class
d1 = f1 - f0
d2 = f1 - f2
f1 = frequency of the modal class
f0 = frequency of the class preceding the modal class
f2 = frequency of the class succeeding the modal class
C = class interval of the modal class

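Example: Find the mode for the following continuous frequency distribution:
Class       0-1   1-2   2-3   3-4   4-5   5-6
Frequency    1     4     8     7     3     2

Solution
The modal class is 2-3, since it has the highest frequency (8). Hence
L = 2
d1 = f1 - f0 = 8 - 4 = 4
d2 = f1 - f2 = 8 - 7 = 1
C = 1
$$\text{Mode} = 2 + \frac{4}{4 + 1} \times 1 = 2 + 0.8 = 2.8$$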
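The grouped-mode estimate can be sketched the same way as the grouped median. This is a minimal sketch: the function name `grouped_mode` is hypothetical; it assumes equal class widths, picks the first class if several share the highest frequency, and treats a missing neighbouring class as having frequency 0 (an assumption, since the notes do not cover that case).

```python
def grouped_mode(lower_limits, freqs, width):
    """Estimate Mode = L + d1 / (d1 + d2) * C, with d1 = f1 - f0 and d2 = f1 - f2."""
    k = max(range(len(freqs)), key=lambda i: freqs[i])    # index of the (first) modal class
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0                      # assumption: 0 if no preceding class
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0         # assumption: 0 if no succeeding class
    d1, d2 = f1 - f0, f1 - f2
    return lower_limits[k] + d1 / (d1 + d2) * width

# Distribution from the example: classes 0-1, 1-2, ..., 5-6 (width C = 1)
print(grouped_mode([0, 1, 2, 3, 4, 5], [1, 4, 8, 7, 3, 2], 1))  # 2.8
```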
D. The Midrange
The midrange is the average of the smallest and largest observations. It is a rough estimate of the middle, found by adding the lowest and highest values in the data set and dividing by 2. The symbol MR is used for the midrange:
$$\text{MR} = \frac{\text{lowest value} + \text{highest value}}{2}$$

Example 1: Find the midrange of 2, 3, 6, 8, 4, 1.
$$\text{MR} = \frac{1 + 8}{2} = \frac{9}{2} = 4.5$$
Hence, the midrange is 4.5.
If the data set contains one extremely large value or one extremely small value, a higher or lower midrange value will result and may not be a typical description of the middle.
Example 2: Find the midrange of the bonus data 18.0, 14.0, 34.5, 10, 11.3, 10, 12.4, 10.

The smallest bonus is $10 million and the largest bonus is $34.5 million, so
$$\text{MR} = \frac{10 + 34.5}{2} = 22.25$$
that is, $22.25 million. Notice that this amount is larger than seven of the eight amounts and is not typical of the average of the bonuses. The reason is that there is one very high bonus, namely, $34.5 million.
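The midrange and its sensitivity to a single extreme value can be checked with a short Python sketch. The function name `midrange` is a hypothetical helper; the second line counts how many bonuses fall below the midrange, confirming the "seven of the eight" observation above.

```python
def midrange(values):
    """MR = (lowest value + highest value) / 2."""
    return (min(values) + max(values)) / 2

print(midrange([2, 3, 6, 8, 4, 1]))                  # 4.5
bonuses = [18.0, 14.0, 34.5, 10, 11.3, 10, 12.4, 10]
mr = midrange(bonuses)
print(mr, sum(b < mr for b in bonuses))              # 22.25 7
```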

Properties and Uses of the Mean, the Median, and the Mode

We have looked at three different measures of central tendency, and we now consider them in the light of the information they give about sets of data. By examining the advantages and limitations of each of the three measures, we can see what information each one gives.
The Mean
1. The mean is found by using all the values of the data.
2. The mean varies less than the median or mode when samples are taken from the
same population and all three measures are computed for these samples.
3. The mean is used in computing other statistics, such as the variance.
4. The mean for the data set is unique and not necessarily one of the data values.
5. The mean cannot be computed for the data in a frequency distribution that has
an open-ended class.
6. The mean is affected by extremely high or low values, called outliers, and may
not be the appropriate average to use in these situations.
The Median
1. The median is used to find the center or middle value of a data set.
2. The median is used when it is necessary to find out whether the data values fall
into the upper half or lower half of the distribution.
3. The median is used for an open-ended distribution.
4. The median is affected less than the mean by extremely high or extremely low
values.
The Mode
1. The mode is used when the most typical case is desired.
2. The mode is the easiest average to compute.

3. The mode can be used when the data are nominal or categorical, such as
religious preference, gender, or political affiliation.
4. The mode is not always unique. A data set can have more than one mode, or the
mode may not exist for a data set.
The Midrange
1. The midrange is easy to compute.
2. The midrange gives the midpoint.
3. The midrange is affected by extremely high or low values in a data set.

3.4. Measures of Dispersion or Variation

In simple terms, measures of dispersion indicate how large the spread of the distribution is around the central tendency.
To describe a data set accurately, statisticians must know more than the measures of central tendency. A measure of central tendency only shows the middle, or average, of a data set; it does not show how far each observation is from the average. In some cases, distinct distributions may have the same average.

It is important to study central tendency along with dispersion to shed light on the shape of the curve, and to gauge whether there is distortion from the bell-shaped, symmetrical normal distribution curve upon which much of statistical inference is built.

Example 1
The CBE of Injibara University wishes to test two postgraduate programs (regular and extension) to see how long each will last before September 2014 under the "day one, class one" scenario. The programmers conducted a needs-assessment survey in 6 areas of each program. Since each program faces different obstacles and only six areas are addressed, these two programs constitute two small populations. The results (in months) are shown.

Find the mean of each group.

Regular   Extension
10        35
60        45
50        30
30        35
40        40
20        25
Solution
The mean for the regular program is
$$\mu = \frac{\sum X}{N} = \frac{10 + 60 + 50 + 30 + 40 + 20}{6} = \frac{210}{6} = 35 \text{ months}$$
The mean for the extension program is
$$\mu = \frac{\sum X}{N} = \frac{35 + 45 + 30 + 35 + 40 + 25}{6} = \frac{210}{6} = 35 \text{ months}$$
Since the means are equal in this example, you might conclude that both programs last equally well. However, when the data sets are examined graphically, a somewhat different conclusion might be drawn. Even though the means are the same for both programs, the spread, or variation, is quite different: the extension program performs more consistently; it is less variable. Therefore, measuring this variation is necessary to make rational decisions. Sometimes, the averages may be different, but the nature of the variation may be the same.
Measures of dispersion are studied with the following objectives:
• To study the representativeness and reliability of the average.
• To determine the reliability of the average: an average found from a homogeneous set of observations is considered to be representative and reliable. When the dispersion is small, greater uniformity can be assumed in the distribution, and the average is considered fairly representative and reliable. A large dispersion, however, indicates that the average is unreliable and not representative.
• To compare the variability among distributions: a high degree of dispersion shows a lack of uniformity, and a low degree of dispersion indicates more uniformity and consistency. If, for example, the prices of a commodity over a period of time are to be compared, smaller variations in the prices denote more uniformity and representativeness, and vice versa.
• To facilitate the use of other statistical measures: variation or dispersion is very useful in applying advanced statistical techniques such as correlation, regression, time-series analysis, and tests of significance.

To describe the spread or variability of a data set, the following measures are commonly used: range, mean deviation, variance, and standard deviation. The range, variance, and standard deviation are discussed in this section.
A. Range
The range is the simplest measure of dispersion. The range is the highest value minus the lowest value. The symbol R is used for the range.
R = highest value - lowest value
Example 1: The salaries for the staff of the ABC Manufacturing Co. are shown here. Find the range.

Staff                  Salary
Owner                  100,000
Manager                 40,000
Sales representative    30,000
Workers                 25,000
                        15,000
                        18,000
Solution
The range is R = 100,000 - 15,000 = 85,000.
Since the owner's salary is included in the data, the range is a large number. To have a more meaningful statistic to measure the variability, statisticians use measures called the variance and standard deviation.
Population Variance and Standard Deviation
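Before the variance and standard deviation are defined formally, the computational procedure will be shown, since the definition is derived from the procedure.

Example 1: Find the variance and standard deviation for the following data set (the regular program data):
10, 60, 50, 30, 40, 20
Solution
Step 1 Find the mean for the data.
$$\mu = \frac{\sum X}{N} = \frac{210}{6} = 35$$
Step 2 Subtract the mean from each data value.
10 - 35 = -25, 60 - 35 = 25, 50 - 35 = 15, 30 - 35 = -5, 40 - 35 = 5, 20 - 35 = -15
Step 3 Square each result.
(-25)² = 625, 25² = 625, 15² = 225, (-5)² = 25, 5² = 25, (-15)² = 225
Step 4 Find the sum of the squares.
625 + 625 + 225 + 25 + 25 + 225 = 1750
Step 5 Divide the sum by N to get the variance.
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{1750}{6} \approx 291.7$$
Step 6 Take the square root of the variance to get the standard deviation.
$$\sigma = \sqrt{291.7} \approx 17.1$$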

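Example 2: Find the variance and standard deviation for the following data set:
35, 45, 30, 35, 40, 25
Solution
Step 1 Find the mean.
$$\mu = \frac{\sum X}{N} = \frac{210}{6} = 35$$
Step 2 Subtract the mean from each value, and place the results in column B of the table.
Step 3 Square each result, and place the squares in column C of the table.

A: X   B: X - μ   C: (X - μ)²
35       0           0
45      10         100
30      -5          25
35       0           0
40       5          25
25     -10         100

Step 4 Find the sum of the squares in column C.
Σ(X - μ)² = 0 + 100 + 25 + 0 + 25 + 100 = 250
Step 5 Divide the sum by N to get the variance.
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{250}{6} \approx 41.7$$
Step 6 Take the square root to get the standard deviation.
$$\sigma = \sqrt{41.7} \approx 6.5$$
Hence, the standard deviation is 6.5.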


Since the standard deviation of the first data set is 17.1 and the standard deviation of the second data set is 6.5, the data are more variable for the first data set. In summary, when the means are equal, the larger the variance or standard deviation is, the more variable the data are.
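Steps 1 to 6 of the population variance procedure translate directly into Python. This minimal sketch reproduces both examples (the regular and extension program data); the function name `population_variance` is hypothetical. The standard library's `statistics.pvariance` and `statistics.pstdev` compute the same N-denominator quantities.

```python
import math

def population_variance(values):
    """sigma^2 = sum((X - mu)^2) / N, following Steps 1-5."""
    n = len(values)
    mu = sum(values) / n                             # Step 1: the mean
    squared_devs = [(x - mu) ** 2 for x in values]   # Steps 2-3: deviations, squared
    return sum(squared_devs) / n                     # Steps 4-5: sum, then divide by N

regular = [10, 60, 50, 30, 40, 20]
extension = [35, 45, 30, 35, 40, 25]
for name, data in [("regular", regular), ("extension", extension)]:
    var = population_variance(data)
    print(name, round(var, 1), round(math.sqrt(var), 1))  # Step 6: standard deviation
# regular 291.7 17.1
# extension 41.7 6.5
```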
Sample Variance and Standard Deviation
When computing the variance for a sample, one might expect the following expression to be used:
$$\frac{\sum (X - \bar{X})^2}{n}$$
where X̄ is the sample mean and n is the sample size.

This expression does not give the best estimate of the population variance, because when the population is large and the sample is small (usually fewer than 30), the variance computed by this formula usually underestimates the population variance. Therefore, instead of dividing by n, find the variance of the sample by dividing by n - 1, which gives a slightly larger value and an unbiased estimate of the population variance:
$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$

To find the standard deviation of a sample, take the square root of the sample variance found by using the preceding formula:
$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$

Shortcut formulas for computing the variance and standard deviation are presented next:
$$s^2 = \frac{n \sum X^2 - \left(\sum X\right)^2}{n(n - 1)}, \qquad s = \sqrt{\frac{n \sum X^2 - \left(\sum X\right)^2}{n(n - 1)}}$$

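Example: Find the sample variance and standard deviation for the amount of European auto sales for a sample of 6 years shown. The data are in millions of dollars.
11.2, 11.9, 12.0, 12.8, 13.4, 14.3
Solution
Step 1 Find the sum of the values.
ΣX = 11.2 + 11.9 + 12.0 + 12.8 + 13.4 + 14.3 = 75.6
Step 2 Square each value and find the sum.
ΣX² = 11.2² + 11.9² + 12.0² + 12.8² + 13.4² + 14.3² = 958.94
Step 3 Substitute in the formulas and solve.
$$s^2 = \frac{6(958.94) - 75.6^2}{6(6 - 1)} = \frac{5753.64 - 5715.36}{30} = \frac{38.28}{30} \approx 1.28$$
$$s = \sqrt{1.28} \approx 1.13$$
Hence, the sample standard deviation is 1.13.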
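The shortcut formula for the sample variance can be sketched in Python and compared with the standard library's `statistics.variance`, which also divides by n - 1. The function name `sample_variance_shortcut` is hypothetical. The shortcut formula subtracts two large, nearly equal numbers, so for data with large magnitudes it can lose precision; `statistics.variance` is computed more carefully.

```python
import math
import statistics

def sample_variance_shortcut(values):
    """Shortcut formula: s^2 = (n * sum(X^2) - (sum X)^2) / (n * (n - 1))."""
    n = len(values)
    sum_x = sum(values)                   # Step 1
    sum_x2 = sum(x * x for x in values)   # Step 2
    return (n * sum_x2 - sum_x ** 2) / (n * (n - 1))   # Step 3

sales = [11.2, 11.9, 12.0, 12.8, 13.4, 14.3]
s2 = sample_variance_shortcut(sales)
print(round(s2, 2), round(math.sqrt(s2), 2))   # 1.28 1.13
print(round(statistics.variance(sales), 2))    # 1.28
```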

Variance and Standard Deviation for Grouped Data


The procedure for finding the variance and standard deviation for grouped
data is similar to that for finding the mean for grouped data, and it uses the
midpoints of each class.

Example: Find the variance and the standard deviation for the frequency distribution of the following data, which represent the number of kilograms that 20 machines produced during one week.

Solution:

Step 1 Make a table as shown, and find the midpoint of each class.
Step 2 Multiply the frequency by the midpoint for each class, and place the products in column D.
Step 3 Multiply the frequency by the square of the midpoint, and place the products in column E.
Step 4 Find the sums of columns B, D, and E.
Step 5 Substitute in the formula and solve for s² to get the variance:
$$s^2 = \frac{n \sum \left(f \cdot X_m^2\right) - \left(\sum f \cdot X_m\right)^2}{n(n - 1)}$$
Step 6 Take the square root to get the standard deviation.
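The frequency table for the machine example is not reproduced in these notes, so the minimal sketch below applies Steps 1 to 6 to the runners' distribution from the grouped-mean example instead (class boundaries 5.5-10.5 through 35.5-40.5, frequencies 1, 2, 3, 5, 4, 3, 2). It uses the computational formula s² = [nΣ(f·Xm²) - (Σf·Xm)²] / [n(n - 1)]; the function name `grouped_sample_variance` is hypothetical.

```python
import math

def grouped_sample_variance(boundaries, freqs):
    """s^2 = (n * sum(f * Xm^2) - (sum(f * Xm))^2) / (n * (n - 1)), using class midpoints Xm."""
    midpoints = [(lo + hi) / 2 for lo, hi in boundaries]             # Step 1
    n = sum(freqs)                                                   # column B total
    sum_fx = sum(f * xm for f, xm in zip(freqs, midpoints))          # Step 2: column D total
    sum_fx2 = sum(f * xm ** 2 for f, xm in zip(freqs, midpoints))    # Step 3: column E total
    return (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))               # Step 5

boundaries = [(5.5, 10.5), (10.5, 15.5), (15.5, 20.5), (20.5, 25.5),
              (25.5, 30.5), (30.5, 35.5), (35.5, 40.5)]
freqs = [1, 2, 3, 5, 4, 3, 2]
grouped_s2 = grouped_sample_variance(boundaries, freqs)
print(round(grouped_s2, 1), round(math.sqrt(grouped_s2), 1))   # 68.7 8.3  (Step 6)
```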

Uses of Variance and Standard Deviation

• As previously stated, variances and standard deviations can be used to determine the spread of the data. If the variance or standard deviation is large, the data are more dispersed. This information is useful in comparing two (or more) data sets to determine which is more (most) variable.
• The measures of variance and standard deviation are used to determine the consistency of a variable. For example, in the manufacture of fittings, such as nuts and bolts, the variation in the diameters must be small, or the parts will not fit together.
• The variance and standard deviation are used to determine the number of data values that fall within a specified interval in a distribution.
• Finally, the variance and standard deviation are used quite often in inferential statistics.

THE END
