Chapter 2e
CHAPTER - 2
2. Summarization of Data
2.1 Measures of Central Tendency
The most important objective of a statistical analysis is to determine a single value for the entire
mass of data, which describes the overall level of the group of observations and can be called a
representative of the whole set of data. It tells us where the center of the distribution of data is
located. The most commonly used measures of central tendency are:
The Mean (arithmetic mean, weighted mean, geometric mean and harmonic mean)
The Mode
The Median
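Suppose that $x_1, x_2, \dots, x_n$ are n observed values in a sample of size n taken from a population
of size N. Then the arithmetic mean of the sample, denoted by $\bar{x}$, is given by
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x_i}{n} \quad \to \text{for raw data.}$$
In general,
$$\bar{x} = \begin{cases}
\dfrac{\sum x_i}{n} & \to \text{for raw data,} \\[2ex]
\dfrac{\sum f_i x_i}{\sum f_i} & \to \text{for an ungrouped frequency distribution, where } \sum f_i = n, \\[2ex]
\dfrac{\sum f_i m_i}{\sum f_i} & \to \text{for a grouped frequency distribution, where } \sum f_i = n,
\end{cases}$$
where $f_i$ is the frequency of the i-th value or class and $m_i$ is the class mark (midpoint) of the i-th class.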
Example 1: The net weights of five perfume bottles selected at random from the production
line are 85.4, 85.3, 84.9, 85.4 and 85. What is the arithmetic mean weight of the sample
observations?
Solution: $n = 5$, $x_1 = 85.4$, $x_2 = 85.3$, $x_3 = 84.9$, $x_4 = 85.4$ and $x_5 = 85$.
$$\bar{x} = \frac{\sum x_i}{n} = \frac{85.4 + 85.3 + 84.9 + 85.4 + 85}{5} = \frac{426}{5} = 85.2.$$
Marks (xi)        9   10   11   12   13   14   15   16   17   18
Frequency (fi)    1    2    3    6   10   11    7    3    2    1

xi        9   10   11   12   13    14    15   16   17   18   Total
fi        1    2    3    6   10    11     7    3    2    1     46
fi xi     9   20   33   72  130   154   105   48   34   18    623
So $\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{623}{46} = 13.54$.
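The frequency-table calculation above can be sketched in a few lines of Python. The function name `frequency_mean` is only an illustrative choice, not part of the notes.

```python
def frequency_mean(values, freqs):
    """Arithmetic mean of an ungrouped frequency distribution: sum(f*x) / sum(f)."""
    if len(values) != len(freqs):
        raise ValueError("values and freqs must have the same length")
    total_f = sum(freqs)
    if total_f <= 0:
        raise ValueError("total frequency must be positive")
    return sum(f * x for x, f in zip(values, freqs)) / total_f


# Marks and frequencies from the table above
marks = [9, 10, 11, 12, 13, 14, 15, 16, 17, 18]
freqs = [1, 2, 3, 6, 10, 11, 7, 3, 2, 1]
print(round(frequency_mean(marks, freqs), 2))  # 13.54
```

For grouped data the same function can be called with the class marks in place of the individual values.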
Example 3: The net income of a sample of large importers of Urea was organized into the
following table. What is the arithmetic mean of net income?
So $\bar{x} = \frac{\sum f_i m_i}{\sum f_i}$, where $m_i$ is the class mark of the i-th class.
Example 4: From the following data, calculate the missing frequency. The mean number of
tablets to cure fever was 29.18.
Number of tablets          19 − 21   22 − 24   25 − 27   28 − 30   31 − 33   34 − 36   37 − 39
Number of persons cured       6        13        19         f        18        12         9
Solution: Let f be the missing frequency of the class 28 − 30.
CI       19 − 21   22 − 24   25 − 27   28 − 30   31 − 33   34 − 36   37 − 39   Total
fi          6        13        19         f        18        12         9      77 + f
mi         20        23        26        29        32        35        38
fi mi     120       299       494       29f       576       420       342      2251 + 29f
The mean is given as 29.18, so
$$\bar{x} = \frac{\sum f_i m_i}{\sum f_i} \Rightarrow 29.18 = \frac{2251 + 29f}{77 + f} \Rightarrow 29.18\,(77 + f) = 2251 + 29f$$
$$\Rightarrow 29.18f - 29f = 2251 - 2246.86 \Rightarrow 0.18f = 4.14 \Rightarrow f = \frac{4.14}{0.18} = 23.$$
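As a cross-check of Example 4, the equation mean = (S + m f) / (N + f) can be solved for f directly. The sketch below assumes exactly one frequency is unknown; the helper name `missing_frequency` and the use of `None` as a placeholder are illustrative choices.

```python
def missing_frequency(target_mean, class_marks, freqs, missing_index):
    """Solve target_mean = (S + m*f) / (N + f) for the single unknown frequency f.

    The entry freqs[missing_index] is ignored (pass None there).
    """
    m = class_marks[missing_index]
    known = [(x, f) for i, (x, f) in enumerate(zip(class_marks, freqs))
             if i != missing_index]
    n_known = sum(f for _, f in known)      # N: total of the known frequencies
    s_known = sum(x * f for x, f in known)  # S: sum of f*m over the known classes
    if m == target_mean:
        raise ValueError("f cannot be determined when the class mark equals the mean")
    return (target_mean * n_known - s_known) / (m - target_mean)


class_marks = [20, 23, 26, 29, 32, 35, 38]
freqs = [6, 13, 19, None, 18, 12, 9]
f = missing_frequency(29.18, class_marks, freqs, missing_index=3)
print(round(f))  # 23
```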
Combined mean
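If we have arithmetic means $\bar{x}_1, \bar{x}_2, \dots, \bar{x}_k$ of k groups having the same unit of measurement
of a variable, with sizes $n_1, n_2, \dots, n_k$ observations respectively, we can compute the combined
mean of the variant values of the groups taken together from the individual means by
$$\bar{x}_c = \frac{n_1\bar{x}_1 + n_2\bar{x}_2 + \cdots + n_k\bar{x}_k}{n_1 + n_2 + \cdots + n_k} = \frac{\sum n_i \bar{x}_i}{\sum n_i}.$$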
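Example 1: Compute the combined mean for the following two sets.
A: 1, 4, 12, 2, 8 and 6; B: 3, 6, 2, 7 and 4.
Solution: $n_1 = 6$, $\bar{x}_1 = \frac{\sum x_i}{n_1} = \frac{33}{6} = 5.5$; $n_2 = 5$, $\bar{x}_2 = \frac{\sum x_i}{n_2} = \frac{22}{5} = 4.4$.
$$\bar{x}_c = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2} = \frac{6(5.5) + 5(4.4)}{6 + 5} = \frac{55}{11} = 5.$$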
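A minimal sketch of the combined-mean formula follows; it also confirms that combining the group means gives the same answer as averaging all eleven observations directly. The name `combined_mean` is illustrative.

```python
def combined_mean(sizes, means):
    """Combined mean of several groups: sum(n_i * mean_i) / sum(n_i)."""
    total = sum(sizes)
    if total <= 0:
        raise ValueError("total group size must be positive")
    return sum(n * m for n, m in zip(sizes, means)) / total


set_a = [1, 4, 12, 2, 8, 6]
set_b = [3, 6, 2, 7, 4]
mean_a = sum(set_a) / len(set_a)  # 5.5
mean_b = sum(set_b) / len(set_b)  # 4.4

via_groups = combined_mean([len(set_a), len(set_b)], [mean_a, mean_b])
direct = sum(set_a + set_b) / len(set_a + set_b)
print(round(via_groups, 6), round(direct, 6))  # 5.0 5.0
```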
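Example 2: The mean weight of 150 students in a certain class is 60 kg. The mean
weight of boys in the class is 70 kg and that of girls is 55 kg. Find the number of boys
and girls in the class.
Solution: Let $n_1$ be the number of boys and $n_2$ be the number of girls in the class.
Also let $\bar{x}_1$, $\bar{x}_2$ and $\bar{x}_c$ be the mean weights of boys, girls and all
students respectively. Then $\bar{x}_1 = 70$, $\bar{x}_2 = 55$, $\bar{x}_c = 60$ and $n_1 + n_2 = 150$.
$$\bar{x}_c = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2} \Rightarrow 60 = \frac{70n_1 + 55n_2}{150}$$
$$\Rightarrow 9000 = 70n_1 + 55n_2 \quad \dots (1)$$
$$150 = n_1 + n_2 \quad \dots (2)$$
Solving equations (1) and (2) simultaneously, we get $n_1 = 50$ and $n_2 = 100$.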
Disadvantages of the arithmetic mean
1. The mean is meaningless in the case of nominal or qualitative data.
2. In case of grouped data, if any class interval is open ended, arithmetic mean cannot be
calculated, since the class mark of this interval cannot be found.
2.2.1.2 Weighted mean
In the computation of the arithmetic mean, we gave equal importance to each observation.
Sometimes the individual values in the data do not have equal importance. When this is
the case, we assign to each value a weight which is proportional to its relative importance.
The weighted mean of a set of values $x_1, x_2, \dots, x_n$ with corresponding weights
$w_1, w_2, \dots, w_n$, denoted by $\bar{x}_w$, is computed by:
$$\bar{x}_w = \frac{w_1x_1 + w_2x_2 + \cdots + w_nx_n}{w_1 + w_2 + \cdots + w_n} = \frac{\sum w_i x_i}{\sum w_i}.$$
The calculation of cumulative grade point average (CGPA) in Colleges and Universities is a
good example of weighted mean.
Example: A student scores "A" in a 3 EtCTS course, "B" in a 6 EtCTS course, "B" in
another 5 EtCTS course and "D" in a 2 EtCTS course. Compute his/her GPA for the semester.
Solution: Here the numerical values of the letter grades are the values (i.e. A = 4, B = 3,
C = 2 and D = 1) and the corresponding EtCTS of the courses are their respective
weights, i.e.
Grade values (xi)   4   3   3   1
Weight (wi)         3   6   5   2
$$GPA = \bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} = \frac{3(4) + 6(3) + 5(3) + 2(1)}{3 + 6 + 5 + 2} = \frac{47}{16} = 2.9375.$$
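The GPA calculation above is a direct application of the weighted-mean formula. A minimal sketch, with the grade-point mapping A = 4, B = 3, C = 2, D = 1 taken from the solution and the function name `weighted_mean` chosen for illustration:

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w_i * x_i) / sum(w_i)."""
    total_w = sum(weights)
    if total_w <= 0:
        raise ValueError("the weights must sum to a positive number")
    return sum(w * x for x, w in zip(values, weights)) / total_w


grade_points = {"A": 4, "B": 3, "C": 2, "D": 1}
grades = ["A", "B", "B", "D"]
credits = [3, 6, 5, 2]  # EtCTS of each course, used as weights

gpa = weighted_mean([grade_points[g] for g in grades], credits)
print(gpa)  # 2.9375
```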
Values 2 4 6 8 10
Frequency 1 2 2 2 1
Note: The simple harmonic mean (SHM) is used for equal distances, equal costs and equal rates.
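Example 1: A motorist travels for three days, covering 480 km each day. On the first day he
travels 10 hours at a rate of 48 km/h, on the second day 12 hours at a rate of 40 km/h, and on the
third day 15 hours at a rate of 32 km/h. What is the average speed?
Solution: Since the distances covered by the motorist on each day are equal (i.e. $d_1 = 480$, $d_2 = 480$, $d_3 = 480$ km),
we use the SHM.
$$SHM = \frac{n}{\sum \frac{1}{x_i}} = \frac{3}{\frac{1}{48} + \frac{1}{40} + \frac{1}{32}} = \frac{3}{\frac{37}{480}} = \frac{1440}{37} = 38.92 \text{ km/h}.$$
We can check this by using the known formula for average speed in elementary physics.
Check: $$\text{Average speed} = \frac{\text{total distance covered}}{\text{total time taken}} = \frac{480 + 480 + 480}{10 + 12 + 15} = \frac{1440}{37} = 38.92 \text{ km/h}.$$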
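The motorist example can be checked numerically. The sketch below computes the simple harmonic mean of the three speeds and compares it with total distance divided by total time (1440 km over 37 hours); `harmonic_mean` is an illustrative helper written out by hand.

```python
def harmonic_mean(rates):
    """Simple harmonic mean: n / sum(1 / x_i). All rates must be positive."""
    if not rates or any(r <= 0 for r in rates):
        raise ValueError("rates must be a non-empty list of positive numbers")
    return len(rates) / sum(1 / r for r in rates)


speeds = [48, 40, 32]   # km/h on each of the three days
hours = [10, 12, 15]    # hours travelled on each day

shm = harmonic_mean(speeds)
# Check: total distance / total time, where each day's distance = speed * hours = 480 km
check = sum(s * h for s, h in zip(speeds, hours)) / sum(hours)
print(round(shm, 2), round(check, 2))  # 38.92 38.92
```

The two values agree because the distance covered at each speed is the same, which is exactly the situation in which the SHM is the appropriate average.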
Example 2: A businessman spent 20 Birr for milk at 40 cents per liter in Mizan-Aman town and
another 20 Birr at 60 cents per liter in Tepi town. What is the average price of milk in the two towns?
Solution: Since the amounts spent in the two towns are equal (20 Birr each), we use the SHM.
$$SHM = \frac{2}{\frac{1}{40} + \frac{1}{60}} = \frac{2}{\frac{5}{120}} = 48 \text{ cents/liter}.$$
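Example 3: A businessman spent 20 Birr for milk at a rate of 40 cents per liter in Mizan-Aman
town and 25 Birr at a price of 50 cents per liter in Tepi town. What is the average price?
Solution: Since the amounts spent in the two towns are different, we use the weighted harmonic mean (WHM), taking the costs as
weights ($w_i$).
$$WHM = \frac{\sum w_i}{\sum \frac{w_i}{x_i}} = \frac{20 + 25}{\frac{20}{40} + \frac{25}{50}} = \frac{45}{1} = 45 \text{ cents/liter},$$
where $w_i$ is the amount spent and $x_i$ is the price per liter in the i-th town.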
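A short sketch of the weighted harmonic mean used in the milk examples, with the amounts spent as weights. With equal weights it reduces to the simple harmonic mean. The function name is an illustrative choice.

```python
def weighted_harmonic_mean(values, weights):
    """Weighted harmonic mean: sum(w_i) / sum(w_i / x_i)."""
    if any(x <= 0 for x in values):
        raise ValueError("values must be positive")
    return sum(weights) / sum(w / x for x, w in zip(values, weights))


# 20 Birr at 40 cents/liter and 25 Birr at 50 cents/liter
print(weighted_harmonic_mean([40, 50], [20, 25]))            # 45.0
# Equal spending (20 Birr each) at 40 and 60 cents/liter gives the SHM
print(round(weighted_harmonic_mean([40, 60], [20, 20]), 2))  # 48.0
```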
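Example: The mean age of a group of 100 persons was found to be 32.02 years. Later on, it was
discovered that an age of 57 was misread as 27. Find the corrected mean.
Solution: $n = 100$, $\bar{x}_{wrong} = 32.02$, correct value = 57 and wrong value = 27.
$$\bar{x}_{corrected} = \bar{x}_{wrong} + \frac{\text{correct value} - \text{wrong value}}{n} = 32.02 + \frac{57 - 27}{100} = 32.02 + 0.3 = 32.32.$$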
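The correction in this example only needs the wrong mean, the sample size and the two values involved. A minimal sketch (the function and parameter names are illustrative):

```python
def corrected_mean(wrong_mean, n, wrong_value, correct_value):
    """Adjust a mean computed with one misread observation."""
    if n <= 0:
        raise ValueError("n must be positive")
    # Replacing the wrong value changes the total by (correct - wrong)
    return wrong_mean + (correct_value - wrong_value) / n


print(round(corrected_mean(32.02, 100, wrong_value=27, correct_value=57), 2))  # 32.32
```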
Median and mode
2.2.2 The Median
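Suppose we sort all the observations in numerical order, ranging from smallest to largest or vice
versa. Then the median is the middle value in the sorted list. We denote it by $\tilde{x}$.
Let $x_{(1)}, x_{(2)}, \dots, x_{(n)}$ be n ordered observations. Then the median is given by:
$$\tilde{x} = \begin{cases}
x_{\left(\frac{n+1}{2}\right)} & \text{if n is odd,} \\[2ex]
\dfrac{x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}}{2} & \text{if n is even.}
\end{cases}$$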
Example 1: Find the median for the following data.
23, 16, 31, 77, 21, 14, 32, 6, 155, 9, 36, 24, 5, 27, 19
Solution: First arrange the given data in increasing order. That is
5, 6, 9, 14, 16, 19, 21, 23, 24, 27, 31, 32, 36, 77, 155.
$n = 15$ is odd, so $\tilde{x} = x_{\left(\frac{n+1}{2}\right)} = x_{\left(\frac{15+1}{2}\right)} = x_{(8)} = 23$.
Then the average of the variant values corresponding to these lcf is the median.
Values (xi)      3   5   4   2   7   6
Frequency (fi)   2   1   3   2   1   1
Solution: First arrange the data in increasing order and construct the LCF table for this data.
Values (xi)      2   3   4   5   6   7
Frequency (fi)   2   2   3   1   1   1
LCF              2   4   7   8   9   10
$n = 10$ is even, so $\frac{n}{2} = \frac{10}{2} = 5$ and $\frac{n}{2} + 1 = 5 + 1 = 6$.
The smallest LCF which is ≥ 5 and ≥ 6 is 7, and the variant value corresponding to this LCF
is 4. Thus the median is $\tilde{x} = \frac{4 + 4}{2} = 4$.
Values (xi)      10    9   11   12   14   13   15   16   17   18
Frequency (fi)    2    1    3    6   10   11    7    3    2    1
Solution: First arrange the data in ascending order and construct the LCF table for this data.
Values (xi)       9   10   11   12   13   14   15   16   17   18
Frequency (fi)    1    2    3    6   11   10    7    3    2    1
LCF               1    3    6   12   23   33   40   43   45   46
$n = 46$ is even, so $\frac{n}{2} = \frac{46}{2} = 23$ and $\frac{n}{2} + 1 = 23 + 1 = 24$.
The smallest LCFs which are ≥ 23 and ≥ 24 are 23 and 33 respectively, and the variant values
corresponding to these LCFs are 13 and 14 respectively. Thus the median is $\tilde{x} = \frac{13 + 14}{2} = 13.5$.
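The LCF procedure used in the two examples above can be sketched as follows. The function sorts the values, builds the less-than cumulative frequencies, and looks up the value whose LCF is the smallest one at or above the required position. The function name is illustrative.

```python
from itertools import accumulate


def median_from_frequencies(values, freqs):
    """Median of an ungrouped frequency distribution using the LCF table."""
    pairs = sorted(zip(values, freqs))           # arrange values in increasing order
    xs = [x for x, _ in pairs]
    lcf = list(accumulate(f for _, f in pairs))  # less-than cumulative frequencies
    n = lcf[-1]

    def value_at(position):
        # value belonging to the smallest LCF that is >= position
        for x, c in zip(xs, lcf):
            if c >= position:
                return x
        raise ValueError("position exceeds the number of observations")

    if n % 2 == 1:
        return value_at((n + 1) // 2)
    return (value_at(n // 2) + value_at(n // 2 + 1)) / 2


print(median_from_frequencies([3, 5, 4, 2, 7, 6], [2, 1, 3, 2, 1, 1]))  # 4.0
print(median_from_frequencies([10, 9, 11, 12, 14, 13, 15, 16, 17, 18],
                              [2, 1, 3, 6, 10, 11, 7, 3, 2, 1]))        # 13.5
```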
Median for grouped data
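The formula for computing the median for grouped data is given by
$$\tilde{x} = lcb + \frac{\left(\frac{n}{2} - cf\right) w}{f}$$
where: lcb − the lower class boundary of the median class.
cf − the LCF of the class preceding the median class.
f − the frequency of the median class.
w − the width of the median class, i.e. w = ucb − lcb.
n − the total number of observations.
Note: The class corresponding to the smallest LCF which is ≥ $\frac{n}{2}$ is called the median
class, so the median lies in this class.
Steps to calculate the median for grouped data
1. First construct the LCF table.
2. Determine the median class. To determine the median class, find $\frac{n}{2}$ and search for the
smallest LCF which is ≥ $\frac{n}{2}$. Then the class corresponding to this LCF is the median
class.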
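Daily production   80 − 89   90 − 99   100 − 109   110 − 119   120 − 129   130 − 139
Frequency             5         9         20           8           6           2

Daily production (CI)   80 − 89   90 − 99   100 − 109   110 − 119   120 − 129   130 − 139
Frequency (fi)             5         9         20           8           6           2
LCF                        5        14         34          42          48          50
To obtain the median class, calculate $\frac{n}{2} = \frac{50}{2} = 25$. The smallest LCF which is ≥ 25 is 34, so
the median class is 100 − 109, with lcb = 99.5, w = 10, f = 20 and cf = 14 (the LCF of the preceding class).
$$\tilde{x} = lcb + \frac{\left(\frac{n}{2} - cf\right) w}{f} = 99.5 + \frac{(25 - 14)\,10}{20} = 105.$$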
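The grouped-median steps can be sketched in Python as below. The function expects class boundaries (not class limits), so the class 80 − 89 is entered as (79.5, 89.5); this input format and the function name are assumptions made for the illustration.

```python
def grouped_median(boundaries, freqs):
    """Median for grouped data: lcb + (n/2 - cf) * w / f.

    boundaries: list of (lcb, ucb) pairs in ascending order.
    freqs: class frequencies in the same order.
    """
    n = sum(freqs)
    if n <= 0:
        raise ValueError("total frequency must be positive")
    half = n / 2
    cf = 0  # LCF of the classes preceding the current one
    for (lcb, ucb), f in zip(boundaries, freqs):
        if cf + f >= half:  # smallest LCF >= n/2: this is the median class
            return lcb + (half - cf) * (ucb - lcb) / f
        cf += f
    raise ValueError("median class not found")


production_classes = [(79.5, 89.5), (89.5, 99.5), (99.5, 109.5),
                      (109.5, 119.5), (119.5, 129.5), (129.5, 139.5)]
production_freqs = [5, 9, 20, 8, 6, 2]
print(grouped_median(production_classes, production_freqs))  # 105.0
```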
Properties of the median
1. The median is unique.
2. It can be computed for an open ended frequency distribution if the median does not lie in
an open ended class.
3. It is not affected by extremely large or small values.
4. It is not well suited to algebraic manipulation.
5. It can be computed for ratio level, interval level and ordinal level data.
2.2.3 The mode
In everyday speech, something is “in the mode” if it is fashionable or popular. In statistics this
“popularity” refers to the frequency of observations.
Therefore, the mode is the most frequently observed value in a set of observations.
Example: Set A: 10, 10, 9, 8, 5, 4, 5, 12, 10. Mode = 10 → unimodal.
Set B: 10, 10, 9, 9, 8, 12, 15, 5. Mode = 9 and 10 → bimodal.
Set C: 4, 6, 7, 15, 12, 9. There is no mode.
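Remark: If all values in a set of observed values occur once or an equal number of times, there is no
mode. (See set C above).
For grouped data, the mode is computed from the modal class (the class with the highest frequency) by
$$\hat{x} = lcb + w\,\frac{\Delta_1}{\Delta_1 + \Delta_2}$$
where: $\hat{x}$ – the mode of the distribution.
lcb − the lower class boundary of the modal class.
w − the width of the modal class.
$\Delta_1$ − the difference between the frequency of the modal class and the frequency of the class preceding it.
$\Delta_2$ − the difference between the frequency of the modal class and the frequency of the class following it.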
Example 1: The ages of newly hired, unskilled employees are grouped into the following
distribution. Compute the modal age.
Ages     18 − 20   21 − 23   24 − 26   27 − 29   30 − 32
Number      4         8        11        20         7
Solution: First we determine the modal class. The modal class is 27 − 29, since it has the highest
frequency. Then lcb = 26.5, w = 3, $\Delta_1 = 20 - 11 = 9$ and $\Delta_2 = 20 - 7 = 13$.
$$\hat{x} = lcb + w\,\frac{\Delta_1}{\Delta_1 + \Delta_2} = 26.5 + 3\left(\frac{9}{9 + 13}\right) = 26.5 + \frac{27}{22} = 26.5 + 1.23 = 27.73.$$
Interpretation: The age of most of these newly hired employees is about 27.7 years (roughly 27 years and 9 months).
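A minimal sketch of the grouped-mode calculation in this example. It takes class boundaries as input and, as an assumption of this sketch only, treats a missing neighbouring class (when the modal class is the first or last one) as having frequency 0; the function name is illustrative.

```python
def grouped_mode(boundaries, freqs):
    """Mode for grouped data: lcb + w * d1 / (d1 + d2)."""
    k = max(range(len(freqs)), key=freqs.__getitem__)  # first class with the highest frequency
    lcb, ucb = boundaries[k]
    prev_f = freqs[k - 1] if k > 0 else 0               # assumption: no neighbour -> 0
    next_f = freqs[k + 1] if k < len(freqs) - 1 else 0  # assumption: no neighbour -> 0
    d1 = freqs[k] - prev_f
    d2 = freqs[k] - next_f
    if d1 + d2 == 0:
        raise ValueError("mode is not defined: all classes have equal frequency")
    return lcb + (ucb - lcb) * d1 / (d1 + d2)


age_classes = [(17.5, 20.5), (20.5, 23.5), (23.5, 26.5), (26.5, 29.5), (29.5, 32.5)]
age_freqs = [4, 8, 11, 20, 7]
print(round(grouped_mode(age_classes, age_freqs), 2))  # 27.73
```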
Example 2: The following table shows the distribution of a group of families according to their
expenditure per week. The median and the mode of the following distribution are known to be
25.50 Birr and 24.50 Birr respectively. Two frequency values are, however, missing from the
table. Find the missing frequencies.
Class interval   1 − 10   11 − 20   21 − 30   31 − 40   41 − 50
Frequency          14        f2        27        f4        15
Solution: The LCF table of the given distribution can be formed as follows.
Expenditure (CI)          1 − 10   11 − 20   21 − 30   31 − 40        41 − 50
Number of families (fi)     14        f2        27        f4             15
LCF                         14     14 + f2   41 + f2   41 + f2 + f4   56 + f2 + f4
Here $n = 56 + f_2 + f_4$. Since the median and the mode are Birr 25.5 and 24.5 respectively,
the class 21 − 30 is the median class as well as the modal class, with lcb = 20.5 and w = 10.
$$25.5 = 20.5 + \frac{10\left(\frac{56 + f_2 + f_4}{2} - (14 + f_2)\right)}{27}$$
$$24.5 = 20.5 + \frac{10\,(27 - f_2)}{(27 - f_2) + (27 - f_4)}$$
That is,
$$5 = \frac{10\left(\frac{56 + f_2 + f_4}{2} - (14 + f_2)\right)}{27} \quad \text{and} \quad 4 = \frac{10\,(27 - f_2)}{54 - f_2 - f_4}.$$
The first equation simplifies to $f_2 - f_4 = 1$ and the second to $3f_2 - 2f_4 = 27$. Solving these two
equations simultaneously gives $f_2 = 25$ and $f_4 = 24$.
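Example: Consider the age data given below and calculate Q1, Q2, and Q3.
19, 20, 22, 22, 17, 22, 20, 23, 17, 18
Solution: First arrange the data in ascending order, n = 10.
17, 17, 18, 19, 20, 20, 22, 22, 22, 23
Q1 = $\left(\frac{n+1}{4}\right)^{th}$ = $\left(\frac{11}{4}\right)^{th}$ = (2.75)th observation = 2nd observation + 0.75 (3rd − 2nd) = 17 + 0.75 (18 − 17) = 17.75
Q2 = $\left(\frac{2(n+1)}{4}\right)^{th}$ = (5.5)th observation = 5th observation + 0.5 (6th − 5th) = 20 + 0.5 (20 − 20) = 20
Q3 = $\left(\frac{3(n+1)}{4}\right)^{th}$ = (8.25)th observation = 8th observation + 0.25 (9th − 8th) = 22 + 0.25 (22 − 22) = 22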
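The position rule used in this example, i(n + 1)/4 with linear interpolation between neighbouring observations, can be sketched as follows. The same function handles deciles and percentiles by changing `parts`. The clamping at the ends of the data is an assumption of this sketch, as is the function name.

```python
def position_quantile(data, i, parts):
    """i-th quantile out of `parts` equal parts, at position i*(n+1)/parts (1-based)."""
    xs = sorted(data)
    n = len(xs)
    if n == 0:
        raise ValueError("data must not be empty")
    pos = i * (n + 1) / parts
    whole = int(pos)    # index of the lower neighbouring observation (1-based)
    frac = pos - whole  # fractional part used for interpolation
    if whole < 1:       # assumption: clamp to the smallest value
        return xs[0]
    if whole >= n:      # assumption: clamp to the largest value
        return xs[-1]
    return xs[whole - 1] + frac * (xs[whole] - xs[whole - 1])


ages = [19, 20, 22, 22, 17, 22, 20, 23, 17, 18]
q1, q2, q3 = (position_quantile(ages, i, 4) for i in (1, 2, 3))
print(q1, q2, q3)  # 17.75 20.0 22.0
```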
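The class corresponding to the smallest LCF which is ≥ $\frac{in}{4}$ is called the ith quartile class. This is the class where Qi lies.
The unique value of the ith quartile (Qi) is then calculated by the formula
$$Q_i = lcb + \frac{\left(\frac{in}{4} - cf\right) w}{f}, \quad i = 1, 2, 3.$$
where: lcb − the lower class boundary of the ith quartile class.
cf − the LCF of the class preceding the ith quartile class.
f − the frequency of the ith quartile class.
w − the width of the ith quartile class.
Note: $Q_2$ = median.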
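2. Percentiles (P)
Percentiles are the 99 points which divide a given ordered data set into 100 equal parts.
$$P_m = \left(\frac{m(n+1)}{100}\right)^{th} \text{ value}, \quad m = 1, 2, \dots, 99. \quad \to \text{for raw data.}$$
For grouped data, the class corresponding to the smallest LCF which is ≥ $\frac{mn}{100}$ is called the mth percentile class. This is the class where Pm
lies.
The unique value of the mth percentile (Pm) is then calculated by the formula
$$P_m = lcb + \frac{\left(\frac{mn}{100} - cf\right) w}{f}, \quad m = 1, 2, \dots, 99.$$
where: lcb, cf, f and w are the lower class boundary, the LCF of the preceding class, the frequency and the width, all taken for the mth percentile class.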
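3. Deciles (D)
Deciles are the nine points which divide the given ordered data into 10 equal parts.
$$D_k = \left(\frac{k(n+1)}{10}\right)^{th} \text{ value}, \quad k = 1, 2, \dots, 9. \quad \to \text{for raw data.}$$
For grouped data, the computations of the 9 deciles can be done as follows:
The class corresponding to the smallest LCF which is ≥ $\frac{kn}{10}$ is called the kth decile class. This is the class where Dk lies.
The unique value of the kth decile ($D_k$) is calculated by the formula
$$D_k = lcb + \frac{\left(\frac{kn}{10} - cf\right) w}{f}, \quad k = 1, 2, \dots, 9.$$
where: lcb, cf, f and w are the lower class boundary, the LCF of the preceding class, the frequency and the width, all taken for the kth decile class.
Note that: median = $Q_2 = D_5 = P_{50}$; $D_1, D_2, \dots, D_9$ correspond to $P_{10}, P_{20}, \dots, P_{90}$; and
$Q_1, Q_2, Q_3$ correspond to $P_{25}, P_{50}, P_{75}$.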
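Interval   21 − 22   23 − 24   25 − 26   27 − 28   29 − 30
F             10        22        20        14        14

Interval   21 − 22   23 − 24   25 − 26   27 − 28   29 − 30   Total
F             10        22        20        14        14       80
LCF           10        32        52        66        80
For Q1: $\frac{n}{4} = \frac{80}{4} = 20$ and the smallest LCF which is ≥ 20 is 32, so
the class 23 − 24 is the first quartile class. lcb = 22.5, w = 2, f = 22, cf = 10.
$$Q_1 = lcb + \frac{\left(\frac{n}{4} - cf\right) w}{f} = 22.5 + \frac{(20 - 10)\,2}{22} = 23.41.$$
For Q2: $\frac{2n}{4} = 40$ and the smallest LCF which is ≥ 40 is 52, so
the class 25 − 26 is the second quartile class. lcb = 24.5, w = 2, f = 20, cf = 32.
$$Q_2 = lcb + \frac{\left(\frac{2n}{4} - cf\right) w}{f} = 24.5 + \frac{(40 - 32)\,2}{20} = 25.3.$$
For Q3: $\frac{3n}{4} = 60$ and the smallest LCF which is ≥ 60 is 66, so
the class 27 − 28 is the third quartile class. lcb = 26.5, w = 2, f = 14, cf = 52.
$$Q_3 = lcb + \frac{\left(\frac{3n}{4} - cf\right) w}{f} = 26.5 + \frac{(60 - 52)\,2}{14} = 27.64.$$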
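Quartiles, deciles and percentiles for grouped data all share one formula and differ only in the target position (in/4, kn/10 or mn/100). The sketch below takes class boundaries as input, so the class 21 − 22 is entered as (20.5, 22.5); this input format and the function name are illustrative assumptions.

```python
def grouped_quantile(boundaries, freqs, i, parts):
    """i-th quantile out of `parts` for grouped data: lcb + (i*n/parts - cf) * w / f."""
    n = sum(freqs)
    if n <= 0:
        raise ValueError("total frequency must be positive")
    target = i * n / parts
    cf = 0  # LCF of the classes preceding the current one
    for (lcb, ucb), f in zip(boundaries, freqs):
        if cf + f >= target:  # smallest LCF >= target: the quantile class
            return lcb + (target - cf) * (ucb - lcb) / f
        cf += f
    raise ValueError("quantile class not found")


classes = [(20.5, 22.5), (22.5, 24.5), (24.5, 26.5), (26.5, 28.5), (28.5, 30.5)]
class_freqs = [10, 22, 20, 14, 14]
quartiles = [round(grouped_quantile(classes, class_freqs, i, 4), 2) for i in (1, 2, 3)]
print(quartiles)  # [23.41, 25.3, 27.64]
```

Calling the function with `parts=10` or `parts=100` gives the deciles and percentiles, and `grouped_quantile(classes, class_freqs, 50, 100)` returns the same value as the second quartile.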
$\bar{x} = 39.4$, $s^2 = 17.31$ and $s = \sqrt{17.31} = 4.16$.
Properties of Variance
1. The variance is always non-negative ($s^2 \ge 0$).
2. If every element of the data is multiplied by a constant "c", then the new variance
is $c^2$ times the old variance, i.e. $s^2_{new} = c^2 s^2_{old}$.
3. When a constant is added to all elements of the data, then the variance does not change.
4. The variance of a constant (c) measured n times is zero, i.e. var(c) = 0.
Exercise: Verify the above properties.
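One way to start the exercise is to check the four properties numerically before proving them. The sketch below uses a small hypothetical data set (it is not from the notes) and the sample variance with divisor n − 1 from Python's `statistics` module.

```python
import math
import statistics

data = [2, 4, 6, 8, 10]  # hypothetical sample, chosen only for illustration
c = 3

base = statistics.variance(data)                      # sample variance (divisor n - 1)
scaled = statistics.variance([c * x for x in data])   # every element multiplied by c
shifted = statistics.variance([x + c for x in data])  # constant added to every element
constant = statistics.variance([c] * len(data))       # the constant c measured n times

print(base >= 0)                           # property 1
print(math.isclose(scaled, c**2 * base))   # property 2
print(math.isclose(shifted, base))         # property 3
print(constant == 0)                       # property 4
```

A numerical check on one data set is not a proof; the algebraic verification still follows from the definition of the variance.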
Uses of the Variance and Standard Deviation
1. They can be used to determine the spread of the data. If the variance or S.D is large, then
the data are more dispersed.
2. They are used to measure the consistency of a variable.
3. They are used quite often in inferential statistics.
5. Coefficient of Variation (C.V)
Whenever two groups have the same units of measurement, the variance and S.D for each
can be compared directly. A statistic that allows one to compare two groups when the units of
measurement are different is called the coefficient of variation. It is computed by:
$$C.V = \frac{\sigma}{\mu} \times 100\% \quad \to \text{for population.}$$
$$C.V = \frac{s}{\bar{x}} \times 100\% \quad \to \text{for sample.}$$
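Example: The following data refer to the hemoglobin level for 5 male and 5 female students.
In which case does the hemoglobin level have higher variability (less consistency)?
Solution: $\bar{x}_M = \frac{\sum x_i}{n} = 14.8$, $\bar{x}_F = \frac{\sum x_i}{n} = 13.7$.
$$s_M^2 = \frac{\sum (x_i - \bar{x}_M)^2}{n - 1} = 2.44, \quad s_M = \sqrt{2.44} = 1.56205,$$
$$s_F^2 = \frac{\sum (x_i - \bar{x}_F)^2}{n - 1} = 2.19, \quad s_F = \sqrt{2.19} = 1.479865.$$
$$C.V_M = \frac{s_M}{\bar{x}_M} \times 100\% = \frac{1.56205}{14.8} \times 100\% = 10.55\%,$$
$$C.V_F = \frac{s_F}{\bar{x}_F} \times 100\% = \frac{1.479865}{13.7} \times 100\% = 10.80\%.$$
Therefore, the variability in hemoglobin level is higher for females than for males.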
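The comparison in this example can be reproduced from the means and standard deviations quoted in the solution (the raw hemoglobin readings are not reproduced here). The function name is an illustrative choice.

```python
def coefficient_of_variation(std_dev, mean):
    """C.V = (standard deviation / mean) * 100, expressed in percent."""
    if mean == 0:
        raise ValueError("C.V is undefined when the mean is zero")
    return std_dev / mean * 100


cv_male = coefficient_of_variation(1.56205, 14.8)
cv_female = coefficient_of_variation(1.479865, 13.7)
print(round(cv_male, 2), round(cv_female, 2))  # 10.55 10.8
print("females" if cv_female > cv_male else "males", "show higher relative variability")
```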
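$$Z = \frac{x - \mu}{\sigma} \quad \to \text{for population.}$$
$$Z = \frac{x - \bar{x}}{s} \quad \to \text{for sample.}$$
Z gives the number of standard deviations a particular observation lies above
or below the mean.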
A positive Z-score indicates that the observation is above the mean.
A negative Z-score indicates that the observation is below the mean.
Example: Two sections were given an examination on a certain course. For section 1, the
average mark (score) was 72 with a standard deviation of 6, and for section 2, the average mark
(score) was 85 with a standard deviation of 7. If student A from section 1 scored 84 and student B
from section 2 scored 90, then who performed better relative to the group?
Solution: $Z_A = \frac{x_A - \bar{x}_1}{s_1} = \frac{84 - 72}{6} = 2.$
$Z_B = \frac{x_B - \bar{x}_2}{s_2} = \frac{90 - 85}{7} = 0.71.$
Since $Z_A > Z_B$, i.e. 2 > 0.71, student A performed better relative to his group than student B:
the score of student A is two standard deviations above the mean score of section 1, while the
score of student B is only 0.71 standard deviations above the mean score of section 2.
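The comparison of the two students can be sketched as follows; `z_score` is an illustrative helper name.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / std_dev


z_a = z_score(84, mean=72, std_dev=6)  # student A, section 1
z_b = z_score(90, mean=85, std_dev=7)  # student B, section 2
print(z_a, round(z_b, 2))  # 2.0 0.71
print("A" if z_a > z_b else "B", "performed better relative to the group")
```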