Chapter Three
$$\sum_{i=1}^{n} x_i = x_1 + x_2 + \dots + x_n$$
The summation operator Σ indicates that the values that follow it are to be summed, or
added together.
Properties of summation notation
3.3. Measures of Central Tendency
When populations are small, it is not necessary to use samples since the entire
population can be used to gain information.
For example, suppose the manager of ABC business wanted to know the
average weekly sales of all the company's representatives. If the company
employed a large number of salespeople nationwide, he would have to
use a sample and make an inference about the entire sales force. But if the
company had only a few salespeople, say 87 agents, he could use every
representative's sales for a randomly chosen week and thus use the
entire population.
Measures found by using all the data values in the population are called
parameters.
Measures obtained by using the data values from samples are called
statistics.
A. Mean
It is a central value of a finite set of numbers. The mean, also known as the
arithmetic average, is found by adding the values of the data and dividing by the
total number of values.
For example, the mean of 3, 2, 6, 5, and 4 is found by adding 3 + 2 + 6 + 5 + 4 = 20
and dividing by 5; hence, the mean of the data is 20/5 = 4. The values of the data are
represented by X's. In this data set, X1 = 3, X2 = 2, X3 = 6, X4 = 5, and X5 = 4. To
show the sum of the X values, the symbol Σ (the capital Greek letter sigma) is used,
and ΣX means to find the sum of the X values in the data set.
1. Sample mean
We often select a sample from a population to find something about a specific
characteristic of the population.
For raw data, that is, ungrouped data, the mean is the sum of all the sampled values
divided by the total number of sampled values. To find the mean for a sample:
$$\text{Sample mean} = \frac{\text{Sum of all the values in the sample}}{\text{Number of values in the sample}}$$
The mean of a sample and the mean of a population are computed in the same way,
but the shorthand notation used is different. The formula for the mean of a sample is:
$$\bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n}$$
Or
$$\bar{X} = \frac{\sum X}{n}$$
Where:
X̄ ("X bar") is the sample mean.
n is the number of values in the sample.
X represents any particular value.
Σ is the Greek capital letter "sigma" and indicates the operation of adding.
ΣX is the sum of the X values.
Example 1: Injibara General Hospital is studying the number of minutes
used by clients in a particular laboratory room. A random sample of 12
clients showed the following number of minutes used last month.
90 77 94 89 119 112
91 110 92 100 113 83
What is the arithmetic mean number of minutes used?
Solution
$$\bar{X} = \frac{\sum X}{n} = \frac{90 + 77 + \dots + 83}{12} = \frac{1170}{12} = 97.5$$
The arithmetic mean of the number of minutes used last month by the sample of
laboratory room users is 97.5 minutes.
Example 2
The data represent the number of days off per year for a sample of individuals
selected from nine different organizations. Find the mean.
20, 26, 40, 36, 23, 42, 35, 24, 30
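Solution
$$\bar{X} = \frac{\sum X}{n} = \frac{20 + 26 + 40 + 36 + 23 + 42 + 35 + 24 + 30}{9} = \frac{276}{9} \approx 30.7$$
The mean number of days off per year is about 30.7 days.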
Example 3
The following data shows the number of patients in a sample of Bure, Finoteselam,
Injibara, Chagni, Jawi, and Dangila hospitals who acquired an infection while
hospitalized. 110, 76, 29, 38, 105, 31
What will be the mean of the number of hospital infections for the six
hospitals?
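Solution
$$\bar{X} = \frac{\sum X}{n} = \frac{110 + 76 + 29 + 38 + 105 + 31}{6} = \frac{389}{6} \approx 64.8$$
The mean of the number of hospital infections for the six hospitals is
64.8.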
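The three sample-mean examples above can be checked with a short Python sketch. The function name `sample_mean` and the variable names are chosen here for illustration only; the data are the values listed in Examples 1 to 3.

```python
def sample_mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)

# Data copied from Examples 1, 2 and 3.
minutes = [90, 77, 94, 89, 119, 112, 91, 110, 92, 100, 113, 83]
days_off = [20, 26, 40, 36, 23, 42, 35, 24, 30]
infections = [110, 76, 29, 38, 105, 31]

print(sample_mean(minutes))                # 97.5
print(round(sample_mean(days_off), 1))     # 30.7
print(round(sample_mean(infections), 1))   # 64.8
```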
2. Population mean
The population mean is the sum of all the values in the population divided by the
number of values in the population. To find the population mean, we use the
following formula:
$$\text{Population mean} = \frac{\text{Sum of all the values in the population}}{\text{Number of values in the population}}$$
The mean of a population using mathematical symbols is:
$$\mu = \frac{X_1 + X_2 + \dots + X_N}{N}$$
Or
$$\mu = \frac{\sum X}{N}$$
Where:
μ represents the population mean. It is the Greek lowercase letter "mu."
N is the number of items in the population.
X represents any particular value.
Σ is the Greek capital letter "sigma" and indicates the operation of adding.
ΣX is the sum of the X values.
Team    Points
G-1     63
G-2     62
G-3     57
G-4     56
G-5     48
How do we interpret the value of 57.2? The typical number of points earned by a
team in the Department of Management at Injibara University during the semester
was 57.2.
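As a quick check of the value 57.2, the sketch below applies the population mean formula to the five point totals listed above. It assumes that these five teams make up the entire population; the dictionary name `points` is illustrative.

```python
# Points for the five teams, treated as the whole population.
points = {"G-1": 63, "G-2": 62, "G-3": 57, "G-4": 56, "G-5": 48}

N = len(points)                   # number of items in the population
mu = sum(points.values()) / N     # population mean: sum of X divided by N

print(N, round(mu, 1))            # 5 57.2
```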
The procedure for finding the mean for grouped data uses the midpoints of the
classes and assumes that the mean of all the raw data values in each class is equal
to the midpoint of the class. In reality, this is not true, since the average of the raw
data values in each class usually will not be exactly equal to the midpoint. However,
using this procedure will give an acceptable approximation of the mean, since some
values fall above the midpoint and other values fall below the midpoint for each
class, and the midpoint represents an estimate of all values in the class.
The steps for finding the mean for grouped data are summarized in the next
Procedure Table.
Example: The following data show the miles that 20 randomly selected runners
ran during a given week.
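Step 2 Find the midpoint of each class and enter it in column C.
$$\begin{aligned}
X_m &= \frac{5.5 + 10.5}{2} = 8 \\
    &= \frac{10.5 + 15.5}{2} = 13 \\
    &= \frac{15.5 + 20.5}{2} = 18 \\
    &= \frac{20.5 + 25.5}{2} = 23 \\
    &= \frac{25.5 + 30.5}{2} = 28 \\
    &= \frac{30.5 + 35.5}{2} = 33 \\
    &= \frac{35.5 + 40.5}{2} = 38
\end{aligned}$$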
Step 3 For each class, multiply the frequency by the midpoint, as shown, and
place the product in column D.
1 × 8 = 8
2 × 13 = 26
3 × 18 = 54
5 × 23 = 115
4 × 28 = 112
3 × 33 = 99
2 × 38 = 76
Step 4 Find the sum of column D
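The steps so far can be traced in a few lines of Python. This is a sketch only: the class boundaries and frequencies are read off Steps 2 and 3 above, and the last line assumes the usual final step of dividing the sum of column D by n, the sum of the frequencies.

```python
# Class boundaries from Step 2 and frequencies from Step 3.
boundaries = [(5.5, 10.5), (10.5, 15.5), (15.5, 20.5), (20.5, 25.5),
              (25.5, 30.5), (30.5, 35.5), (35.5, 40.5)]
frequencies = [1, 2, 3, 5, 4, 3, 2]

# Step 2: midpoint of each class (column C).
midpoints = [(lower + upper) / 2 for lower, upper in boundaries]

# Step 3: frequency times midpoint (column D).
products = [f * xm for f, xm in zip(frequencies, midpoints)]

# Step 4: sum of column D.
sum_f_xm = sum(products)

# Assumed final step: divide by n, the total frequency.
n = sum(frequencies)
grouped_mean = sum_f_xm / n

print(midpoints)                   # [8.0, 13.0, 18.0, 23.0, 28.0, 33.0, 38.0]
print(sum_f_xm, n, grouped_mean)   # 490.0 20 24.5
```

Under that assumption, the approximate mean distance is 24.5 miles.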
The Properties of the Arithmetic Mean
The arithmetic mean is a widely used measure of central location or tendency. It
has several important properties:
Thus, we can consider the mean as a balance point for a set of data.
B. Median
The median of a set of data is defined as the middle value when the data are
arranged in order of magnitude. If there are no ties, half of the observations will be
smaller than the median, and half of the observations will be larger than the
median.
The median is the halfway point in a data set. Before you can find this point, the
data must be arranged in order. When the data set is ordered, it is called a data array.
The median either will be a specific value in the data set or will fall between two
values.
Generally, the center of such data can be better described using a measure of
location called the median.
The symbol for the median is MD.
Steps in computing the median of a data array
Step 1 Arrange the data in order.
Step 2 Select the middle point.
Example 1: The numbers of rooms in the Habesha, Belay Zeleke, Gozamin, ABC,
Tilik, FM, and Menkorer hotels in Debremarkos are 713, 300, 618, 595, 311, 401,
and 292. Find the median.
Solution
Step 1 Arrange the data in order.
292, 300, 311, 401, 595, 618, 713
Step 2 Select the middle value.
292, 300, 311, 401, 595, 618, 713
The fourth of the seven ordered values is 401. Therefore the median is 401.
Note: this holds when the number of observations is odd. When the number of
observations is even, the median is found by adding the two middle values and
dividing the sum by two. In other words,
For an odd number of observations, the median is the middle value. That is, if N
is odd, then the median is given by
$$MD = \left(\frac{N + 1}{2}\right)^{\text{th}} \text{ value}$$
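For an even number of observations, the median is the average of the two
middle values. If N is even, then the median is given by
$$MD = \frac{\left(\frac{N}{2}\right)^{\text{th}} \text{ value} + \left(\frac{N}{2} + 1\right)^{\text{th}} \text{ value}}{2}$$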
Notice that, in each case, the median divides the distribution into two equal parts,
with 50% of the observations greater than it and the other 50% less than it.
Example 3: The number of products sold by ABC Trading over an 8-year
period follows.
684, 764, 656, 702, 856, 1133, 1132, 1303
Arranged in order, the data are 656, 684, 702, 764, 856, 1132, 1133, 1303. Since
there are eight values, the two middle values are 764 and 856.
Therefore the median for this data set is
$$MD = \frac{764 + 856}{2} = 810$$
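The odd and even cases can be handled by one small function. The sketch below is illustrative (the name `median` is chosen here); it sorts the data first, as Step 1 requires, and is applied to the hotel and sales data from the examples above.

```python
def median(values):
    """Middle value of the ordered data, or the average of the two middle values."""
    if not values:
        raise ValueError("the median of an empty data set is undefined")
    ordered = sorted(values)          # Step 1: arrange the data in order
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:                    # odd count: a single middle value
        return ordered[middle]
    # even count: average the two middle values
    return (ordered[middle - 1] + ordered[middle]) / 2

rooms = [713, 300, 618, 595, 311, 401, 292]
sales = [684, 764, 656, 702, 856, 1133, 1132, 1303]

print(median(rooms))   # 401
print(median(sales))   # 810.0
```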
The major properties of the median
It is not affected by extremely large or small values. Therefore, the
median is a valuable measure of location when such values do occur.
It can be computed for ordinal-level data or higher.
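Median for Grouped Data
For a grouped frequency distribution, the median is estimated from the median
class, the class that contains the (n/2)th observation:
$$MD = L + \frac{\frac{n}{2} - m}{f} \times c$$
Where
L = Lower limit of the median class
n = Total number of observations = Σf
m = Cumulative frequency preceding the median class
f = Frequency of the median class
c = Class interval of the median class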
Example: Find the median for the following continuous frequency distribution:
Class       0-1   1-2   2-3   3-4   4-5   5-6
Frequency   1     4     8     7     3     2
Solution
Step 1: determine the cumulative frequency
Class   Frequency   Cumulative frequency
0-1     1           1
1-2     4           5
2-3     8           13
3-4     7           20
4-5     3           23
5-6     2           25
Total   25
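Step 2: Substitute the relevant values in the formula.
Here n = 25, so n/2 = 12.5. The first class whose cumulative frequency reaches 12.5
is 2-3, so it is the median class, with L = 2, m = 5, f = 8, and c = 1.
$$MD = 2 + \frac{12.5 - 5}{8} \times 1 = 2.9375$$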
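The same calculation can be sketched in Python. The function name `grouped_median` is illustrative, and the sketch assumes equal class widths and that the median class is the first class whose cumulative frequency reaches n/2.

```python
def grouped_median(lower_limits, frequencies, width):
    """Estimate the median of grouped data as L + ((n/2 - m) / f) * c."""
    n = sum(frequencies)
    if n == 0:
        raise ValueError("the distribution has no observations")
    m = 0                                  # cumulative frequency before the class
    for L, f in zip(lower_limits, frequencies):
        if m + f >= n / 2:                 # this is the median class
            return L + (n / 2 - m) / f * width
        m += f

lower_limits = [0, 1, 2, 3, 4, 5]
frequencies = [1, 4, 8, 7, 3, 2]

print(grouped_median(lower_limits, frequencies, 1))   # 2.9375
```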
C. Mode
The mode is the value that occurs most often in the data set. It is sometimes said to
be the most typical case.
A data set that has only one value that occurs with the greatest frequency is
said to be unimodal. If a data set has two values that occur with the same greatest
frequency, both values are considered to be the mode and the data set is said to be
bimodal. If a data set has more than two values that occur with the same greatest
frequency, each value is used as the mode, and the data set is said to be multimodal.
When no data value occurs more than once, the data set is said to have no mode.
The mode is a very useful measure when, for example, you want to keep the most
popular shirt, in terms of collar size, in inventory during the festival season. The
median and mean will not be helpful in this type of situation. Another example
where the mode is the only suitable measure is in determining the most typical
shoe size to keep in stock in a shoe shop.
A data set can have more than one mode or no mode at all. These situations
will be shown in some of the examples that follow.
Example 1: Find the mode of the signing bonuses of your new business in
2013 E.C.
The bonuses, in millions of dollars, are
18.0, 14.0, 34.5, 10, 11.3, 10, 12.4, 10
Solution
It is helpful, although not necessary, to arrange the data in order:
10, 10, 10, 11.3, 12.4, 14.0, 18.0, 34.5
Since $10 million occurred 3 times, a frequency larger than that of any other
number, the mode is $10 million.
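Counting how often each value occurs can be sketched with Python's `collections.Counter`. The function name `modes` is illustrative; it returns a list so that bimodal and multimodal data sets, and data sets with no mode, are all covered by the definitions given above.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs with the greatest frequency.

    An empty list means no value occurs more than once (no mode).
    """
    if not values:
        raise ValueError("the mode of an empty data set is undefined")
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []
    return sorted(v for v, c in counts.items() if c == highest)

bonuses = [18.0, 14.0, 34.5, 10, 11.3, 10, 12.4, 10]

print(modes(bonuses))       # [10]
print(modes([1, 2, 3]))     # [] (made-up data with no repeated value)
```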
Example 2: The lives, in hours, of 10 flashlight batteries are as follows.
Find the mode.
340 350 340 340 320 340 330 330 340 350
Solution
The value 340 occurs five times, more often than any other value, so the mode is
340 hours.
Blood group          A   B   AB   O
Number of patients   2   4   6    8
The blood group with the highest frequency is O. The mode of the data is therefore
blood group O. We can say that O is the most common blood group among the
patients selected.
Notice that the mean and the median cannot be applied to the data in this
example. This is because the variable "blood group" cannot take numerical
values. The mode can be used to describe both quantitative and
qualitative data.
Mode for Grouped Data
For a grouped frequency distribution, the class interval with the highest frequency
is called the modal class.
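The mode is then estimated by
$$\text{Mode} = L + \frac{d_1}{d_1 + d_2} \times C, \qquad d_1 = f_1 - f_0, \quad d_2 = f_1 - f_2$$
Where
L = Lower limit of the modal class
f1 = Frequency of the modal class
f0 = Frequency preceding the modal class
f2 = Frequency succeeding the modal class
C = Class interval of the modal class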
Example: Find the mode for the following continuous frequency distribution:
Class       0-1   1-2   2-3   3-4   4-5   5-6
Frequency   1     4     8     7     3     2
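Solution
The modal class is 2-3, since it has the highest frequency (8).
L = 2
d1 = f1 − f0 = 8 − 4 = 4
d2 = f1 − f2 = 8 − 7 = 1
C = 1
Hence
$$\text{Mode} = 2 + \frac{4}{4 + 1} \times 1 = 2.8$$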
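The worked solution above can be reproduced with a short sketch. The function name `grouped_mode` is illustrative. Two choices here are assumptions rather than part of the notes: a missing neighbouring class is treated as having frequency 0, and the first class is used when several classes tie for the highest frequency.

```python
def grouped_mode(lower_limits, frequencies, width):
    """Estimate the mode of grouped data as L + d1 / (d1 + d2) * C."""
    i = frequencies.index(max(frequencies))      # position of the modal class
    f1 = frequencies[i]
    f0 = frequencies[i - 1] if i > 0 else 0      # frequency preceding it
    f2 = frequencies[i + 1] if i + 1 < len(frequencies) else 0  # succeeding
    d1, d2 = f1 - f0, f1 - f2
    if d1 + d2 == 0:
        raise ValueError("the modal class does not stand out from its neighbours")
    return lower_limits[i] + d1 / (d1 + d2) * width

lower_limits = [0, 1, 2, 3, 4, 5]
frequencies = [1, 4, 8, 7, 3, 2]

print(round(grouped_mode(lower_limits, frequencies, 1), 4))   # 2.8
```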
D. The Midrange
The midrange is a rough estimate of the middle. It is defined as the sum of the
lowest and highest values in the data set, divided by 2; that is, it is the average of the
smallest and largest observations. The symbol MR is used for the midrange.
$$MR = \frac{\text{lowest value} + \text{highest value}}{2}$$
For the signing bonuses in the mode example, the smallest bonus is $10 million and
the largest bonus is $34.5 million, so MR = (10 + 34.5)/2 = 22.25, or $22.25 million.
Notice that this amount is larger than seven of the eight amounts and
is not typical of the bonuses. The reason is that there is one very high
bonus, namely, $34.5 million.
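A small sketch makes the effect of the one very high bonus concrete. It uses the eight bonus amounts from the mode example; the function name `midrange` is illustrative.

```python
def midrange(values):
    """Average of the lowest and highest values in the data set."""
    return (min(values) + max(values)) / 2

# Signing bonuses in millions of dollars, from the mode example.
bonuses = [18.0, 14.0, 34.5, 10, 11.3, 10, 12.4, 10]

mr = midrange(bonuses)
below = sum(1 for b in bonuses if b < mr)   # how many bonuses fall below MR

print(mr)                     # 22.25
print(below, len(bonuses))    # 7 8
```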
3. The mode can be used when the data are nominal or categorical, such as
religious preference, gender, or political affiliation.
4. The mode is not always unique. A data set can have more than one mode, or the
mode may not exist for a data set.
The Midrange
1. The midrange is easy to compute.
2. The midrange gives the midpoint.
3. The midrange is affected by extremely high or low values in a data set.
In simple terms, measures of dispersion indicate how large the spread of the
distribution is around the central tendency.
To describe a data set accurately, statisticians must know more than the
measures of central tendency. A measure of central tendency shows only the
middle or the average of a data set, without showing how far each observation is
from the average. In some cases, the averages of distinct distributions are the same.
It is important to study central tendency along with dispersion in order to throw
light on the shape of the curve and to gauge whether it departs from the bell-shaped,
symmetrical normal distribution curve on which statistical inference is built.
Example 1
The CBE of Injibara University wishes to test two postgraduate programs
(extension and regular) to see how long each will last before September 2014
when run under the day-one, class-one scenario. The programmer conducts a
needs assessment survey in 6 areas for each program. Since each program
faces different obstacles and only six areas are addressed, these two programs
constitute two small populations. The results (in months) are shown.
Regular Extension
10 35
60 45
50 30
30 35
40 40
20 25
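Solution
The mean for the Regular program is
$$\mu = \frac{\sum X}{N} = \frac{10 + 60 + 50 + 30 + 40 + 20}{6} = \frac{210}{6} = 35 \text{ months}$$
The mean for the Extension program is
$$\mu = \frac{\sum X}{N} = \frac{35 + 45 + 30 + 35 + 40 + 25}{6} = \frac{210}{6} = 35 \text{ months}$$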
Since the means are equal in this example, you might conclude that both programs
last equally well. However, when the data sets are examined graphically, a
somewhat different conclusion might be drawn. Even though the means are the
same for both programs, the spread, or variation, is quite different. The Extension
program performs more consistently; it is less variable.
Therefore, measurement of these variations is necessary to make rational decisions.
Sometimes the averages may be different while the nature of the variation is the
same.
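The contrast between equal means and unequal spread can be seen numerically. The sketch below uses the two columns of results shown above and, as a simple gauge of spread, the highest value minus the lowest value; the function name `summary` is illustrative.

```python
def summary(data):
    """Return (mean, highest value minus lowest value) for a data set."""
    return sum(data) / len(data), max(data) - min(data)

# Results in months for the two programs.
regular = [10, 60, 50, 30, 40, 20]
extension = [35, 45, 30, 35, 40, 25]

print(summary(regular))     # (35.0, 50)
print(summary(extension))   # (35.0, 20)
```

Both programs average 35 months, but the Regular results span 50 months while the Extension results span only 20.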
Measures of dispersion are studied with the following objectives.
Measures of dispersion help:
To study the representativeness and reliability of the average.
To determine the reliability of the average: an average found from a homogeneous
set of observations is considered to be representative and reliable. When the
dispersion is small, the distribution is more uniform and the average is considered
to be fairly representative and reliable. A large value of dispersion indicates that
the average is unreliable and not representative.
To compare the variability among distributions: a high degree of
dispersion shows a lack of uniformity, and a low degree of dispersion indicates
more uniformity and consistency. If, for example, the prices of a commodity over a
period of time are to be compared, smaller variations in the prices denote more
uniformity and representativeness, and vice versa.
To describe the spread or variability of a data set, four measures are commonly
used: range, mean deviation, variance, and standard deviation. Each measure will be
discussed in this section.
A. Range
The range is the simplest measure of dispersion. The range is the highest value
minus the lowest value. The symbol R is used for the range.
R = highest value − lowest value
Example 1: The salaries for the staff of the ABC Manufacturing Co. are
shown here.
Example 1: Find the variance and standard deviation for the following data
set
10, 60, 50, 30, 40, 20
Solution
Step 1 Find the mean for the data.
Step 4 Find the sum of the squares.
Example 2: Find the variance and standard deviation for the following data
set
35, 45, 30, 35, 40, 25
Solution
Step 1 Find the mean
Step 2 Subtract the mean from each value, and place the result in a column of
the table.
Step 3 Square each result and place the squares in a column of the table.
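The steps for both examples can be followed in a short sketch. It assumes, as stated earlier for these two programs, that each data set is a small population, so the sum of the squared deviations is divided by N; the function name `population_variance` is illustrative.

```python
import math

def population_variance(values):
    """Mean of the squared deviations from the mean (divides by N)."""
    N = len(values)
    mu = sum(values) / N                          # Step 1: find the mean
    deviations = [x - mu for x in values]         # Step 2: subtract the mean
    squares = [d ** 2 for d in deviations]        # Step 3: square each result
    return sum(squares) / N                       # Step 4, then divide by N

example_1 = [10, 60, 50, 30, 40, 20]
example_2 = [35, 45, 30, 35, 40, 25]

for data in (example_1, example_2):
    variance = population_variance(data)
    print(round(variance, 1), round(math.sqrt(variance), 1))
# 291.7 17.1
# 41.7 6.5
```

The sums of squares are 1750 and 250 respectively, so the first data set is far more variable than the second even though both have a mean of 35.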
The expression
$$\frac{\sum (X - \bar{X})^2}{n}$$
does not give the best estimate of the population variance because, when
the population is large and the sample is small (usually fewer than 30), the variance
computed by this formula usually underestimates the population variance.
Therefore, instead of dividing by n, find the variance of the sample by dividing by
n − 1, giving a slightly larger value and an unbiased estimate of the population
variance:
$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$
To find the standard deviation of a sample, you must take the square root of
the sample variance, which was found by using the preceding formula.
Shortcut formulas for computing the variance and standard deviation are
presented next.
Example: Find the sample variance and standard deviation for the amount
of European auto sales for a sample of 6 years shown. The data are in
millions of dollars.
11.2, 11.9, 12.0, 12.8, 13.4, 14.3
Solution
Step 1 Find the sum of the values.
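The remaining steps can be sketched as follows. The notes' own shortcut formula is not reproduced here, so the sketch assumes the usual computational form, s² = [n ΣX² − (ΣX)²] / [n(n − 1)], which is algebraically the same as dividing the sum of squared deviations by n − 1. The function name is illustrative.

```python
import math

def sample_variance_shortcut(values):
    """Sample variance via s^2 = (n * sum(X^2) - (sum X)^2) / (n * (n - 1))."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are needed")
    sum_x = sum(values)                      # sum of the values
    sum_x2 = sum(x * x for x in values)      # sum of the squared values
    return (n * sum_x2 - sum_x ** 2) / (n * (n - 1))

# European auto sales in millions of dollars.
sales = [11.2, 11.9, 12.0, 12.8, 13.4, 14.3]

s2 = sample_variance_shortcut(sales)
s = math.sqrt(s2)
print(round(sum(sales), 1))        # 75.6
print(round(s2, 3), round(s, 2))   # 1.276 1.13
```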
Example: Find the variance and the standard deviation for the frequency
distribution of the following data, which represent the number of kilograms that 20
machines produce during one week.
Solution:
Step 1 Make a table as shown, and find the midpoint of each class.
Step 2 Multiply the frequency by the midpoint for each class, and place the
products in column D.
Step 3 Multiply the frequency by the square of the midpoint, and place the products
in column E.
Step 5 Substitute in the formula and solve for s² to get the variance.
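The column-by-column procedure can be sketched in Python. Because the frequency table for this example is not reproduced here, the sketch borrows the midpoints and frequencies from the runners example in the grouped-mean section purely as illustrative input. It also assumes the grouped form of the shortcut formula, s² = [n Σf·Xm² − (Σf·Xm)²] / [n(n − 1)].

```python
import math

# Illustrative input: midpoints and frequencies from the runners example.
midpoints = [8, 13, 18, 23, 28, 33, 38]
frequencies = [1, 2, 3, 5, 4, 3, 2]

n = sum(frequencies)
# Column D: frequency times midpoint.
sum_f_xm = sum(f * xm for f, xm in zip(frequencies, midpoints))
# Column E: frequency times the square of the midpoint.
sum_f_xm2 = sum(f * xm ** 2 for f, xm in zip(frequencies, midpoints))

# Assumed grouped shortcut formula for the sample variance.
s2 = (n * sum_f_xm2 - sum_f_xm ** 2) / (n * (n - 1))
s = math.sqrt(s2)

print(n, sum_f_xm, sum_f_xm2)      # 20 490 13310
print(round(s2, 1), round(s, 1))   # 68.7 8.3
```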
bolts, the
variation in the diameters must be small, or the parts will not fit together.
The variance and standard deviation are used to determine the number of
data values that fall within a specified interval in a distribution.
Finally, the variance and standard deviation are used quite often in
inferential statistics.