
Module 2 - 4

This unit discusses different methods for presenting raw data in a meaningful way, including tabulation, diagrams, charts, and graphs. Tabulation is one of the most common methods and involves organizing data into a table for easy comparison. Different types of pictorial presentations are then discussed, including pie charts, bar charts, histograms, and frequency polygons. These visualization methods make it easier for readers to understand large amounts of data at a glance. Examples are provided for each method along with self-assessment exercises to test comprehension.

Uploaded by

codforphilsam

Module 2

Data Presentation
Unit 1: Data Presentation
Unit 1
Data Presentation
Content
1.0 Introduction
2.0 Learning Outcomes

3.0 Learning Content

3.1 Narration/Tabulation

3.2 Diagrams, Charts and Graphs

4.0 Conclusion

5.0 Summary

6.0 Tutor-Marked Assignment

7.0 References/Further Reading


1.0 Introduction
This unit is concerned with the ways in which raw data (usually a large set of
unorganized numerical values) are summarized and interpreted so that important
features and trends may be identified.
A set of data may be presented in tables or described by means of diagrams, charts
and graphs. Before discussing these methods, let us look at what you should learn in
this unit, as stated in the learning outcomes.

2.0 Learning Outcomes


By the end of this unit, you should be able to:
1. Tabulate and organize raw data.
2. Present data by means of tables, diagrams, charts and graphs.
3. Interpret and highlight important features of the data through the tables,
diagrams, charts and graphs.

3.0 Learning Content


3.1 Narration/Tabulation
After collecting the necessary data, the first task of a statistician or researcher is to
reduce and simplify the detail into such a form that the salient features are brought
out, which facilitates the interpretation of the assembled data. This procedure is
known as classifying and tabulating the data. Tabulation thus enhances the
condensation and easy comparison of data. Most published data come in tabulated
form, and tabulation is one of the most popular methods of making data more
comprehensible.

                   Lagos     Outside Lagos     Total

Passed            24,375        45,000         69,375
Referred           4,875        22,500         27,375
Absent             5,775         9,000         14,775
Result withheld    1,650        33,750         35,400
Failed               825         2,250          3,075
Total             37,500       112,500        150,000

Examples:
1. Guardian Newspapers has three titles in its stable: “The Guardian”, “African
Guardian” and “Guardian Express”. A study of staffing in three departments was
carried out and the following information was gathered. There are 200 staff in the
three departments, of which 65 are in the African Guardian. Of the staff in The
Guardian, 30 are in the editorial department and 21 in the advertisement
department. In the Guardian Express, 22 staff are in the production department
and 15 in the advertisement department, out of a total of 61. The total number of
staff in the advertisement department across the three titles is 55, and the total in
the production department is 65.
i. Tabulate the above information so as to give the highest possible information.
ii. How many staff are in the advertisement department in the African Guardian?

Solution:

i.
              The Guardian   African Guardian   Guardian Express   Total

Editorial          30               26                 24            80
Production         23               20                 22            65
Advert.            21               19                 15            55
Total              74               65                 61           200

ii. 19 staff are in the advertisement department in the African Guardian.
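
The missing cells in the table can be derived by subtraction from the row and column totals. The sketch below is one way to automate that reasoning in Python; the function name `fill_missing` and the dictionary layout are illustrative choices, not part of the course text.

```python
# Figures stated in the example; the remaining totals follow by subtraction.
TOTAL = 200
col_total = {"African Guardian": 65, "Guardian Express": 61}
col_total["The Guardian"] = TOTAL - sum(col_total.values())   # 74

row_total = {"Production": 65, "Advert": 55}
row_total["Editorial"] = TOTAL - sum(row_total.values())      # 80

# Cells given directly in the question.
table = {
    "The Guardian": {"Editorial": 30, "Advert": 21},
    "Guardian Express": {"Production": 22, "Advert": 15},
    "African Guardian": {},
}

def fill_missing(table, row_total, col_total):
    """Fill any row or column that has exactly one unknown cell, until done."""
    departments, titles = list(row_total), list(col_total)
    changed = True
    while changed:
        changed = False
        for title in titles:
            missing = [d for d in departments if d not in table[title]]
            if len(missing) == 1:
                table[title][missing[0]] = col_total[title] - sum(table[title].values())
                changed = True
        for dept in departments:
            missing = [t for t in titles if dept not in table[t]]
            if len(missing) == 1:
                known = sum(table[t][dept] for t in titles if dept in table[t])
                table[missing[0]][dept] = row_total[dept] - known
                changed = True
    return table

fill_missing(table, row_total, col_total)
print(table["African Guardian"])
```

Each pass fills a cell only when it is the single unknown in its row or column, which mirrors how the table is completed by hand.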


2. In 1998, 150,000 candidates entered for JAMB examinations. 25% of the
candidates came from Lagos while the rest came from outside Lagos. Of those
who came from Lagos, 65% passed the examinations and 13% were referred; of
the rest, 0.7 (70%) were absent, 0.2 (20%) never received their results, while the
others failed the examinations. Of those who came from outside Lagos, 0.4 (40%)
passed the examinations, 0.2 (20%) were referred and 0.3 (30%) had their results
withheld. Of the rest, 80% were absent while all others failed the examination.
i. Arrange the above information in a table.
ii. What percentage of the candidates failed the examinations?
iii. Mention two pictorial forms in which the information can be presented.
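Solution
i) The information is arranged in the table of results (Lagos, Outside Lagos, Total)
given at the start of this section.
ii) Percentage of those who failed = (3,075 × 100) / 150,000

= 2.05%
iii) Multiple bar chart and component bar chart.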
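
The entries of the results table can be reproduced step by step. The sketch below assumes the proportions implied by the table itself (for example, 70% of the remaining Lagos candidates absent) and uses exact fractions so that no rounding creeps in; the variable names are illustrative.

```python
from fractions import Fraction

TOTAL = 150_000
lagos_n = TOTAL * Fraction(25, 100)      # 37,500 candidates from Lagos
outside_n = TOTAL - lagos_n              # 112,500 candidates from outside Lagos

# Lagos: passed and referred are shares of all Lagos candidates,
# the other categories are shares of those who remain.
lagos = {"Passed": lagos_n * Fraction(65, 100),
         "Referred": lagos_n * Fraction(13, 100)}
rest = lagos_n - sum(lagos.values())
lagos["Absent"] = rest * Fraction(7, 10)
lagos["Result withheld"] = rest * Fraction(2, 10)
lagos["Failed"] = rest - lagos["Absent"] - lagos["Result withheld"]

# Outside Lagos: passed, referred and withheld are shares of all such candidates.
outside = {"Passed": outside_n * Fraction(4, 10),
           "Referred": outside_n * Fraction(2, 10),
           "Result withheld": outside_n * Fraction(3, 10)}
rest = outside_n - sum(outside.values())
outside["Absent"] = rest * Fraction(8, 10)
outside["Failed"] = rest - outside["Absent"]

failed = lagos["Failed"] + outside["Failed"]
percent_failed = float(failed * 100 / TOTAL)
print(int(failed), percent_failed)
```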

Self-Assessment Exercise
1. Tabulation thus enhances the condensation and easy comparison of data (Yes
or No)?
3.2 Pictorial Presentations (Diagrams, Charts and Graphs)

No matter how informative and well designed a statistical table is, as a medium for
conveying to the reader an immediate and clear impression of its content, it is inferior
to a good chart or graph. Many people are incapable of comprehending large masses
of information presented in tabular form; the figures merely confuse them.
Furthermore, many of such people are unwilling to make the effort to grasp the
meaning of such data. Graphs and charts come into their own as a means of conveying
information in easily comprehensible form. It is for such reasons that government and
multinationals always produce popular versions of important white papers in the form
of multi-coloured booklets full of simple diagrams and charts. Such diagrams and
charts are also often now seen on television both for viewers’ easy understanding and
for advertising.

Though such pictorial presentation reduces the amount of detail that can be put across
to the reader or viewer, often it is not the detail that matters, but rather the overall
picture. The most popular charts, diagrams and graphs are:

i. Pie-charts
ii. Bar diagrams (bar charts and histograms)
iii. Graphs (frequency polygons and Ogives)

3.2.1 Pie Charts


A pie-chart is simply a circle divided into sections. This circle represents the total of
the data being presented and each section is drawn proportional to its relative size.
The main advantage of a pie-chart is that it is easy to understand. It is most suited to
very simple comparisons where there are only few groups, say 2 to 4. The use of the
pie-chart where there are more than 4 sections to be labelled usually results in a loss
of the clear visual effect.
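
Because each section is proportional to its share of the total, the angle of a sector is (value / total) × 360°. The sketch below applies this to the department totals from the staff table in Section 3.1 (Editorial 80, Production 65, Advert 55); the function name `sector_angles` is an illustrative choice.

```python
def sector_angles(counts):
    """Return the pie-chart angle in degrees for each category."""
    total = sum(counts.values())
    return {label: 360 * value / total for label, value in counts.items()}

staff_by_department = {"Editorial": 80, "Production": 65, "Advert": 55}
angles = sector_angles(staff_by_department)
for label, angle in angles.items():
    print(f"{label}: {angle} degrees")
```

The angles always add up to 360°, which is a useful check before drawing the chart with a protractor.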

Self-Assessment Exercise 2
1. An investigation of the marital status of the staff of an institution reveals the
following:

Marital Status    No. of Staff

Singles                35
Married               130
Widowed                25
Divorced               10

i. Draw a pie chart using the above information.


3.2.2 Bar Charts

A simple bar chart comprises a number of equally spaced rectangles.

A multiple bar chart is usually used in the comparison of two or more attributes.

A component bar chart comprises bars which are subdivided into components.
Example:
Self-Assessment Exercise 3

1. Represent the data in the above SAE 1 in a Bar Chart.

Self-Assessment Exercise 4

The sex distribution of staff in five departments of a Television station is given below:

Department          Male   Female   Total

Admin (I)            25      15       40
Programmes (II)      65      30       95
Commercial (III)     45      40       85
News (IV)            35      15       50
Sports (V)           30      10       40
Total               200     110      310

Represent the above information on:


i. Multiple bar chart
ii. Component bar chart

3.2.3 Histograms

Histograms and bar charts look alike in presentation, but while the bars of a bar chart
are usually not joined, those of a histogram are usually joined. Further, while the bar
chart attaches importance only to the heights of its bars, the histogram attaches
importance to both the heights and the widths.

Self-Assessment Exercise 5.

1. Obtain the histogram of the data in the SAE 2 above.


3.2.4 Frequency polygon
A frequency polygon is obtained by joining the mid-points of the tops of the rectangles
of a histogram.
Example: Draw the frequency polygon for the example above.

[Figure: frequency polygon of the number of staff by marital status (Single, Married,
Widowed, Divorced)]

Self-Assessment Exercise(s) 6

1. Of 100 patients in an orthopaedic hospital who were asked for their room
preference, 50 wanted private rooms, 40 wanted semi-private rooms and 10 would
make do with any room. Present this data by means of a bar chart.
2. Of 400 nursing students at a teaching hospital, 152 planned to go into the
psychiatric speciality, 120 into paediatric, 80 into public health and 48 into
orthopaedic nursing. Represent the information on a pie chart.
3. In a study of the age distribution of patients in an orthopaedic hospital, the
following ages were recorded:
51 35 45 52 53 32 31 44 47 35 52 36 44 45 44 32
48 44 44 33 53 44 44 47 44 44 44 55 44 34 54 44
45 48 32 44 47 58 50 37 44 47 50 46 38 57 49 50
51 38
Draw the frequency polygon for the data above.

4.0 Conclusion

In this unit, you have studied several graphical representations of data. These
representations are used in interpreting features of data. They are descriptive in
nature.

5.0 Summary

You have learned the following concepts in this unit.

1. The raw data resulting from a survey or census are usually unorganised.
2. A collection of data must be organised and summarised so as to reveal the
significant features.
3. A collection of data may be described by frequency tables, pie charts, bar charts,
histograms and frequency polygons.

6.0 Tutor-Marked Assignment


1. Tabulation thus enhances the condensation and easy comparison of data (Yes or
No)?
2. Of 100 patients in an orthopaedic hospital who were asked for their room
preference, 50 wanted private rooms, 40 wanted semi-private rooms and 10 would
make do with any room. Present this data by means of a bar chart.
3. Of 400 nursing students at a teaching hospital, 152 planned to go into the
psychiatric speciality, 120 into paediatric, 80 into public health and 48 into
orthopaedic nursing. Represent the information on a pie chart.
4. In a study of the age distribution of patients in an orthopaedic hospital, the
following ages were recorded:

51 35 45 52 53 32 31 44 47 35 52 36 44 45 44 32 48
44 44 33 53 44 44 47 44 44 44 55 44 34 54 44 45 48
32 44 47 58 50 37 44 47 50 46 38 57 49 50 32 51 38

Draw the frequency polygon for the data above.

7.0 References/Further Reading

Hannagan, T.J. (1982) Mastering Statistics. The Macmillan Press Ltd

Indira Gandhi National Open University, (1999) Probability and Statistics, Sita Fine
Arts Pvt. Ltd., New Delhi-28

Module 3

Measures of Central Tendency
Unit 1: Measures of Location

Unit 2: Weighted Mean


Unit 1
Measures of Location
Content
1.0 Introduction

2.0 Learning Outcomes

3.0 Learning Content

3.1 Grouping and tabulation.

3.2 The Mean (Averages)

3.3 The Median

3.4 The Mode

4.0 Conclusion

5.0 Summary

6.0 Tutor-Marked Assignment

7.0 References/Further Reading


1.0 Introduction
In population as well as in sample data sets, there is the tendency for most of the
observations to lie centrally within the given set of data arranged according to
magnitude. An index to describe this concentration of values near the middle is
customarily referred to as a measure of central tendency, a “typical” value, or simply,
an average. It is called a measure of location because it indicates where, among the
possible values of a variable, the population or sample is located. Measures of location
are very useful parameters in that they describe a property of populations. Rather than
compare entire distributions of sets of data with each other, it is usually more efficient
to compare only certain characteristics of their parameters. We will discuss here,
characteristics of some of these parameters and their associated sample statistics.

2.0 Learning Outcomes


Upon completion of this unit, you should be able to:
1. Apply the summation operator in computation.
2. Define and compute the following measures of central tendency.
a. Mean
b. Median
c. Mode.
3. Summarise data by means of central tendency.

3.0 Learning Content


3.1 Grouping and Tabulation

Sometimes the figures in a data set are so widely spread that, unless the figures are
grouped, a neat and sensible frequency table may not be achieved. Tabulation done in
this way is called a grouped frequency distribution table. The figures are grouped into
distinct (non-overlapping) classes so that no value can be placed in two or more
classes. Although the stated class limits may appear to leave gaps (for example
between 52 and 53), the class boundaries meet (at 52.5), so in reality there are no
gaps between the classes.
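
The tally procedure can be sketched in Python as follows. To keep the illustration short, only the first ten weights listed in Self-Assessment Exercise 1 below are tallied; the function name `grouped_frequencies` and its parameters are illustrative choices.

```python
def grouped_frequencies(values, start, width, n_classes):
    """Tally values into classes start..start+width-1, and so on."""
    classes = [(start + i * width, start + (i + 1) * width - 1)
               for i in range(n_classes)]
    table = {c: 0 for c in classes}
    for v in values:
        for low, high in classes:
            if low <= v <= high:          # each value falls in exactly one class
                table[(low, high)] += 1
                break
    return table

sample = [59, 53, 66, 55, 57, 65, 48, 59, 51, 58]   # first ten weights only
table = grouped_frequencies(sample, start=48, width=5, n_classes=5)
for (low, high), f in table.items():
    print(f"{low} - {high}: {'|' * f} ({f})")
```

Running the same function on all 40 weights gives the full grouped frequency distribution asked for in the exercise.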

Self-Assessment Exercise 1
The weights in Kg of a collection of 40 workers in an organization are given below:
59, 53, 66, 55, 57, 65, 48, 59, 51, 58, 52, 68, 60, 70, 71, 56, 70, 64, 54, 67, 62, 53,
49, 56, 63, 48, 57, 61, 58, 55, 56, 55, 61, 52, 54, 65, 56, 50, 62, 60.

Using the tally method, prepare a grouped frequency distribution table using groups
48 – 52, 53 – 57, …
3.2 Mean (Averages)

Consider the statement:


“any average student should pass JAMB examination”.

The word ‘average’ is used here to denote the not-too-brilliant and not-too-dull
student. But in statistics the word has a special meaning. In the above context it
would be used, statistically, to describe that student who is representative, in some
way, of all the students that sat for the examination. Therefore, if we have a group of
figures, the average figure is that single figure that can represent all the other figures
in that distribution. Three types of averages are often used in statistics:

i. The mean
ii. The median, and
iii. The mode

3.2.1 The Mean

For n numbers x1, x2, ..., xn, the mean, denoted by 𝑥̅, is

𝑥̅ = (x1 + x2 + … + xn) / n
𝑥̅ = Σx / n

e.g. for the set 3, 5, 7, 10, 15:

𝑥̅ = Σx / n = (3 + 5 + 7 + 10 + 15) / 5
= 40/5
= 8

Note: When the numbers x1, x2, …, xn have attached frequencies f1, f2, ..., fn, then
the mean becomes

Mean = 𝑥̅ = Σfx / Σf

where Σf is the total number of observations.
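
Both forms of the mean can be written as short Python functions. The first call uses the set 3, 5, 7, 10, 15 from the text; the values and frequencies in the second call are hypothetical and chosen only to show that the frequency form agrees with writing every observation out in full.

```python
def mean(values):
    """Arithmetic mean: sum of the values divided by how many there are."""
    return sum(values) / len(values)

def frequency_mean(values, freqs):
    """Mean when each value x occurs f times: sum(f*x) / sum(f)."""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)

plain = mean([3, 5, 7, 10, 15])

# Hypothetical data: the value 3 occurs twice, 5 once and 7 once.
with_freqs = frequency_mean([3, 5, 7], [2, 1, 1])
written_out = mean([3, 3, 5, 7])

print(plain, with_freqs, written_out)
```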

Self-Assessment Exercise(s) 2

1. The figures are usually grouped into distinct classes to avoid confusion of
possible placement of data into two or more classes (Yes or No)?
2. Large, ungrouped data sets are cumbersome to study and interpret (Yes or
No)?

3.2.2 Mean of a Grouped Data

Three methods of calculation are:

i. The long method
ii. The assumed mean method
iii. The coding method.

Long Method

𝑥̅ = Σfx / Σf

where x is the class mid-point and f the class frequency.

Assumed Mean Method

𝑥̅ = A + Σfd / Σf

Where,

A is a guessed or assumed mean and d = x – A are the deviations from the assumed
mean.

Coding Method

𝑥̅ = A + (Σfu / Σf) × C

Where: A is an appropriately chosen x value,

C is the common class size and

u = (x – A)/C = …, -3, -2, -1, 0, 1, 2, 3, …

NOTE: The coding method is very short and should always be used for grouped data
when class intervals are equal.
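
The three methods always give the same answer, which the sketch below checks numerically. The grouped table used here is hypothetical (classes 10 – 19, 20 – 29, 30 – 39, 40 – 49 with made-up frequencies) and is not data from this unit; A and C are chosen as the third class mid-point and the class size.

```python
# Hypothetical grouped table, for illustration only.
midpoints = [14.5, 24.5, 34.5, 44.5]   # class mid-points x
freqs = [2, 5, 8, 5]                   # class frequencies f
A = 34.5                               # assumed mean (any value would do)
C = 10                                 # common class size

n = sum(freqs)

# Long method: sum(f*x) / sum(f)
long_mean = sum(f * x for f, x in zip(freqs, midpoints)) / n

# Assumed mean method: A + sum(f*d) / sum(f), with d = x - A
assumed_mean = A + sum(f * (x - A) for f, x in zip(freqs, midpoints)) / n

# Coding method: A + (sum(f*u) / sum(f)) * C, with u = (x - A) / C
coded_mean = A + sum(f * ((x - A) / C) for f, x in zip(freqs, midpoints)) / n * C

print(long_mean, assumed_mean, coded_mean)
```

Note that the coded deviations u are small whole numbers (-2, -1, 0, 1 here), which is what makes the coding method quick by hand.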

Self-Assessment Exercise 3

1. Referring to the table in the above SAE 1, calculate the mean using:
a. The long method
b. Assumed mean of 61
c. The coding method.
Advantages of Mean
It takes account of all the values of a distribution. It is therefore, more representative
than the other two and for this reason alone, it is used more than the other two
averages.

Disadvantages of Mean

1. It is often the most difficult to calculate.
2. It is not easily understood by non-statisticians.
3. While the mode and median often represent actual scores belonging to some
members of the population, the mean often does not.
4. When the mean is used with discrete variables (e.g. number of children), it often
yields unrealistic values such as 2.5 children.
5. While the median and mode can be obtained graphically, the mean cannot.

Self-Assessment Exercise(s) 4

1. What are the merits and demerits of the mean as a measure of location?

2. The distribution of the number of overtime hours per month worked by 60 staff of
NITEL is given below:

Overtime (hrs)    60 – 69   50 – 59   40 – 49   30 – 39   20 – 29   10 – 19
No. of workers       5        11        17        14         9         4

a. Calculate the mean overtime hours using:

i. The long method
ii. An assumed mean of 40.5
iii. The coding method

3.3 The Median

If a set of data is arranged in order of magnitude, the middle value, which divides the
set into two equal groups, is the median. Generally, for N data,

Median = ((N + 1)/2)th item

Example:
Find the median of the following sets of data
(a) 3, 6, 2, 4, 3
(b) 2, 5, 3, 4, 8, 3
Solution:
(a) Arrangement in order: 2, 3, 3, 4, 6
Here N = 5 and

Median = ((5 + 1)/2)th item
= (6/2)th item = 3rd item
That implies median = 3

(b) Arrangement in order: 2, 3, 3, 4, 5, 8

Here, N = 6

Thus, Median = ((6 + 1)/2)th item
= 3.5th item

This will be interpreted as (3rd item + 4th item)/2

Thus, median = (3 + 4)/2 = 3.5
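
The rule above translates directly into a short Python function: sort the data, take the middle item when N is odd, and average the two middle items when N is even. This is a minimal sketch; the function name `median` is an illustrative choice.

```python
def median(values):
    """Middle value of the sorted data (mean of the two middle values if N is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # the ((N + 1)/2)th item
    return (ordered[mid - 1] + ordered[mid]) / 2  # average of the two middle items

median_a = median([3, 6, 2, 4, 3])
median_b = median([2, 5, 3, 4, 8, 3])
print(median_a, median_b)
```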

3.3.1 Median of a grouped data

The median of grouped data can be obtained graphically from the cumulative frequency
curve (ogive) or by calculation, using the formula:

Median = L + ((N/2 – F) / f) × C

Where:

L = Value of the lower class boundary of the median class.
N = Total frequency.
F = Cumulative frequency of the class just before the one containing the median
(that is, the total frequency of all classes below the median class).
f = Frequency of the median class
C = Size of the median class interval
NOTE: Usually we first obtain the value of N/2, which enables us to locate the
position of the median in the cumulative frequency distribution.
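
The formula can be sketched in Python as shown below. The grouped table is hypothetical (classes 10 – 19, 20 – 29, 30 – 39, 40 – 49 with made-up frequencies), and the classes are assumed to be listed from the lowest to the highest; the function name `grouped_median` is an illustrative choice.

```python
def grouped_median(lower_boundaries, freqs, C):
    """Median = L + ((N/2 - F) / f) * C, with classes in ascending order."""
    n = sum(freqs)
    F = 0                                   # cumulative frequency below the current class
    for L, f in zip(lower_boundaries, freqs):
        if F + f >= n / 2:                  # N/2 falls in this class: the median class
            return L + (n / 2 - F) / f * C
        F += f

# Hypothetical table, for illustration only.
lower_boundaries = [9.5, 19.5, 29.5, 39.5]
freqs = [2, 5, 8, 5]
estimate = grouped_median(lower_boundaries, freqs, C=10)
print(estimate)
```

Here N/2 = 10 falls in the class 30 – 39, so L = 29.5, F = 7 and f = 8.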

Advantages of Median
1. It is easily understood.
2. It is relatively easy to calculate.
Disadvantages of Median
1. It takes no account of extreme values in the distribution. For instance, the median
of 2, 40, 43, 45, and 96 is 43 even though there are two extreme values, 2 and 96.
2. It does not use all the data available.

Self-Assessment Exercise(s) 5

1. What are the merits and demerits of the median as a measure of central location?

2. The distribution of the number of overtime hours per month worked by 60 staff of
NITEL is given below:

Overtime (hrs)    60 – 69   50 – 59   40 – 49   30 – 39   20 – 29   10 – 19
No. of workers       5        11        17        14         9         4

a. Construct the cumulative frequency curve and from it estimate the median.
b. Calculate the median and compare your results.

3.4 The Mode

This is the value or number that has the highest frequency in a distribution. The mode
may not exist and even when it does exist, it may not be unique.

Example:

(a) 5, 2, 4, 7, 5, 3 has mode = 5 (unimodal)


(b) 2, 6, 3, 4, 3, 2, 5 has two modes 2 and 3 (bimodal)
(c) 4, 7, 2, 1, 3 has no mode.
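
The three cases above can be reproduced with a small Python function that counts how often each value occurs. This is a sketch under one stated convention: when every value occurs only once, the data set is treated as having no mode. The function name `modes` is an illustrative choice.

```python
from collections import Counter

def modes(values):
    """Return the list of modes (empty if every value occurs only once)."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []                                   # no value repeats: no mode
    return sorted(v for v, c in counts.items() if c == top)

modes_a = modes([5, 2, 4, 7, 5, 3])       # unimodal
modes_b = modes([2, 6, 3, 4, 3, 2, 5])    # bimodal
modes_c = modes([4, 7, 2, 1, 3])          # no mode
print(modes_a, modes_b, modes_c)
```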

The mode can be obtained both graphically and by calculation. For grouped data,
we use the histogram to estimate the mode graphically, while by calculation we use
the formula:

Mode = L + [(fm – fa) / (2fm – fa – fb)] × C

Where:
L = lower class boundary of the modal class
fm = frequency of the modal class
fa = frequency of the class immediately before the modal class (the next lower class)
fb = frequency of the class immediately after the modal class (the next higher class)
C = size of the modal class interval.
Note: The modal class is the class that has the highest frequency. The mode itself is
a number within this class.

Example: Referring to the grouped frequency table of the workers' weights in
Self-Assessment Exercise 1 (classes 48 – 52, 53 – 57, …):

a. Construct the histogram and from it, estimate the mode of the distribution.
b. Calculate the mode and compare your answer with the estimated value in (a)
above.

Solution:

a. The construction should be done on a graph sheet, with frequencies on the
vertical axis and class boundaries on the horizontal axis.

[Figure: histogram of the weights, with frequency on the vertical axis and class
boundaries 47.5, 52.5, 57.5, 62.5, 67.5, 72.5 on the horizontal axis]

The mode is approximately 56.

b. Mode = L + [(fm – fa) / (2fm – fa – fb)] × C

Here, the modal class is 53 – 57.

Hence, L = 52.5, fm = 12, fa = 8, fb = 10 and C = 5

Mode = 52.5 + [(12 – 8) / (2(12) – 8 – 10)] × 5

= 52.5 + (4/6) × 5
= 52.5 + 3.33
= 55.83
Comparison: Graphical value = 56
Calculated value = 55.83
The values agree appreciably.
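
The calculation can be checked with a few lines of Python. The sketch below uses the values that appear in the arithmetic of the worked solution (L = 52.5, fm = 12, with 8 and 10 as the frequencies of the neighbouring lower and higher classes, and C = 5); the function name `grouped_mode` is an illustrative choice.

```python
def grouped_mode(L, fm, fa, fb, C):
    """Mode = L + (fm - fa) / (2*fm - fa - fb) * C for the modal class."""
    return L + (fm - fa) / (2 * fm - fa - fb) * C

mode = grouped_mode(L=52.5, fm=12, fa=8, fb=10, C=5)
print(round(mode, 2))
```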

Advantages of Mode

It is often the easiest to calculate of the three.

Disadvantages of Mode

1. It presents a misleading picture for a distribution that does not have a regular
shape.
2. It does not use all the data available.

3.4.1 Relation between The Mode, Median and Mean


For unimodal frequency curves which are moderately skewed, the following relation
between the mean, median and mode holds:

Mean – Mode = 3(Mean – Median)

The figure below shows the relative positions of the mean, median and mode for
frequency curves which are skewed to the right and left respectively. For symmetrical
curves the mean, mode and median all coincide.
Self-Assessment Exercise 6

1. The distribution of the number of overtime hours per month worked by 60 staff
of NITEL is given below:

Overtime (hrs)    60 – 69   50 – 59   40 – 49   30 – 39   20 – 29   10 – 19
No. of workers       5        11        17        14         9         4

a. Calculate the mode of the distribution.

4.0 Conclusion
In this unit, you have been exposed to the measures of central tendency, which are
benchmarks, typical scores or measures that give a precise and brief description of a
set of data. These are very important aspects of statistics that you cannot afford to
neglect. To describe your data precisely for interpretation, you will need to learn
these measures of location very well.

5.0 Summary
In this unit you have learnt that the measures of central tendency are a set of
benchmarks which give a precise and brief description of a set of scores. The
three basic measures of central tendency are the mean, the median and the mode.
The mean is the most widely used. It is equal to the sum of the scores divided by the
number of scores. The symbol is 𝑥̅ and the formula is ΣX/N or ΣfX/Σf or, using an
assumed mean A and class interval C, 𝑥̅ = A + C(Σfu/Σf).

6.0 Tutor Marked Assignment (TMA)


Use the data below to find:
a. Mean
b. Median and
c. Mode

S/N    CLASS INTERVAL    F
1          75-79          2
2          70-74          4
3          65-69          6
4          60-64         10
5          55-59         25
6          50-54         35
7          45-49         20
8          40-44         15
9          35-39         10
10         30-34          5

7.0 References/Further Reading


Hannagan, T.J. (1982) Mastering Statistics, New York: The Macmillan Press Ltd
Ary Donald and Jacobs, L.C. (1996) Introduction to Statistics,
Unit 2
Weighted Mean
Content
1.0 Introduction

2.0 Learning Outcomes

3.0 Learning Content

3.1 Weighted Arithmetic Mean

3.2 Geometric Mean and Harmonic mean

3.3 Relationship between Arithmetic, Geometric and Harmonic Means

4.0 Conclusion

5.0 Summary

6.0 Tutor-Marked Assignment

7.0 References/Further Reading


1.0 Introduction
Consider this situation: students are admitted to a B.Sc. course in statistics on the
basis of their performances in the Higher Secondary, or an equivalent, examination.
Should the scores in mathematics papers be considered more important than
those in physics papers? Similarly, should the scores in language papers not be the
least important? It is necessary in such a situation to take into account the relative
importance (or weight) of the different observations while evaluating the mean.

2.0 Learning Outcomes


At the end of this lesson, you should be able to:
1. Calculate the weighted mean (arithmetic, geometric and harmonic means)
2. Discuss the relationship between arithmetic, geometric and harmonic Means

3.0 Learning Content


3.1 Weighted Arithmetic Mean
Sometimes we associate with the numbers X1, X2, . . ., Xn certain weighting factors
or weights w1, w2, . . ., wn depending on the significance or importance attached to
the numbers. In this case

𝑥̅ = (w1X1 + w2X2 + ... + wnXn) / (w1 + w2 + ... + wn) = ∑wX / ∑w

is called the weighted arithmetic mean.

Example: If a final examination in a course is weighted three times as much as a
quiz and a student has a final examination grade of 85 and quiz grades of 70 and 90,
the mean grade is:

𝑥̅ = [(1)(70) + (1)(90) + (3)(85)] / (1 + 1 + 3)
= 415/5
= 83
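
The same calculation can be sketched in Python. The function name `weighted_mean` is an illustrative choice; the grades and weights are those of the example above.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w*x) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Two quizzes with weight 1 each, and the final examination with weight 3.
grade = weighted_mean([70, 90, 85], [1, 1, 3])
print(grade)
```

With all weights equal, the function reduces to the ordinary arithmetic mean.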
Properties of the arithmetic mean
1. The algebraic sum of the deviations of a set of numbers from their arithmetic
mean is zero.

Example: The deviations of the numbers 8, 3, 5, 12, 10 from their arithmetic mean
7.6 are 8 – 7.6, 3 – 7.6, 5 – 7.6, 12 – 7.6, 10 – 7.6
= 0.4, –4.6, –2.6, 4.4, 2.4 with algebraic sum
0.4 – 4.6 – 2.6 + 4.4 + 2.4 = 0.
2. The sum of the squares of the deviations of a set of numbers, X, from any
number a is a minimum if and only if a = 𝑥̅.

e.g.
i) Prove that w² + pw + q, where p and q are given constants, is a minimum if and
only if w = –½p.

ii) Using (i), prove that ∑(x – a)²/N is a minimum if and only if a = 𝑥̅.

Solution:

∑(x – a)²/N = ∑(x² – 2ax + a²)/N = (∑x² – 2a∑x + Na²)/N

= a² – 2a(∑x/N) + ∑x²/N

Comparing the last expression with w² + pw + q, we have:
w = a, p = –2∑x/N, q = ∑x²/N
Then the expression is a minimum when a = –½p = ∑x/N = 𝑥̅.
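
Both properties can be checked numerically for the numbers 8, 3, 5, 12, 10 used earlier. The sketch below uses exact fractions to avoid rounding, and compares the sum of squared deviations at the mean with its value at two nearby points; the step of 1/10 is an arbitrary illustrative choice, and a numerical check of this kind illustrates the result rather than proving it.

```python
from fractions import Fraction

numbers = [8, 3, 5, 12, 10]
mean = Fraction(sum(numbers), len(numbers))      # 38/5, i.e. 7.6

# Property 1: the deviations from the mean add up to zero.
deviation_sum = sum(x - mean for x in numbers)

# Property 2: the sum of squared deviations is smallest when a equals the mean.
def sum_of_squares(a):
    return sum((x - a) ** 2 for x in numbers)

step = Fraction(1, 10)
smaller_than_left = sum_of_squares(mean) < sum_of_squares(mean - step)
smaller_than_right = sum_of_squares(mean) < sum_of_squares(mean + step)

print(deviation_sum, smaller_than_left, smaller_than_right)
```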
3. If f1 numbers have mean m1, f2 numbers have mean m2, . . ., fk numbers have mean
mk, then the mean of all the numbers is

𝑥̅ = (f1m1 + f2m2 + ... + fkmk) / (f1 + f2 + ... + fk)

i.e. a weighted arithmetic mean of all the means.

E.g. In a company having 80 employees, 60 earn ₦3 per hour and 20 earn ₦2 per
hour. Determine:
(i) The mean earnings per hour.
(ii) Would the answer to (i) be the same if the 60 employees earn a mean hourly wage
of ₦3 per hour and the 20 employees earn a mean hourly wage of ₦2 per hour?
(iii) Do you believe the mean hourly wage to be typical?
Solution
(i) 𝑥̅ = ∑fx/∑f
= (60 × 3 + 20 × 2) / (60 + 20)
= 220/80
= ₦2.75
(ii) Yes, the result is the same.
(iii) Yes, it is typical.

4. If A is any guessed or assumed arithmetic mean (which may be any number) and
if d = X – A are the deviations of X from A, then

𝑥̅ = A + ∑d/N

If the data are grouped, then
𝑥̅ = A + ∑fd/∑f, or simply

𝑥̅ = A + ∑fd/N
where N = ∑f

Self-Assessment Exercise(s) 1

1. The algebraic sum of the deviations of a set of numbers from their arithmetic
mean is zero (Yes or No)?
2. If a final examination in a course is weighted three times as much as a quiz and
a student has a final examination grade of 85 and quiz grades of 70 and 90,
calculate the mean grade.

3.2 Geometric Mean and Harmonic Mean

3.2.1 Geometric Mean


The geometric mean, G, of a set of N numbers X1, X2, X3, . . ., XN is the Nth root of the
product of the numbers:
G = (X1 · X2 · X3 ⋯ XN)^(1/N)
E.g. The geometric mean of the numbers 2, 4, 8 is:
G = ∛((2)(4)(8)) = ∛64
= 4
In practice, G is computed by logarithms. For the geometric mean from grouped data,
use the class mid-points as your values.

3.2.2 The Harmonic Mean, H

The harmonic mean, H, of a set of N numbers X1, X2, X3, . . ., XN is the reciprocal of
the arithmetic mean of the reciprocals of the numbers:
H = N / ∑(1/X)
Example: The harmonic mean of the numbers 2, 4, 8 is

H = 3 / (½ + ¼ + ⅛)

= 3 / (7/8)

= 3.43

NOTE: It is often convenient to express the fractions in decimal form first.
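
Both means can be sketched in a few lines of Python. The geometric mean is computed through logarithms, as the text suggests for practical work; the function names are illustrative choices, and the numbers 2, 4, 8 are those of the examples above.

```python
import math

def geometric_mean(values):
    """Nth root of the product, computed as exp(mean of the logarithms)."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def harmonic_mean(values):
    """Reciprocal of the arithmetic mean of the reciprocals: N / sum(1/x)."""
    return len(values) / sum(1 / v for v in values)

G = geometric_mean([2, 4, 8])
H = harmonic_mean([2, 4, 8])
print(round(G, 2), round(H, 2))
```

Because logarithms are used, the geometric mean is only defined here for positive numbers.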

Self-Assessment Exercise(s) 2

1. The harmonic mean, H of a set of N numbers: X1, X2, X3, .. . , XN , is the reciprocal
of the arithmetic mean of the reciprocals of the numbers (Yes or No)?
2. Calculate the geometric mean of the numbers: 3, 5, 6, 7.
3. Calculate harmonic mean of the numbers: 5, 6, 7.

3.3 Relation between Arithmetic, Geometric, and Harmonic Means


The geometric mean of a set of positive numbers X1, X2, ..., XN is less than or equal
to their arithmetic mean but is greater than or equal to their harmonic mean. In
symbols,
H ≤ G ≤ 𝑥̅
The equality signs hold only if all the numbers X1, X2, . . ., XN are identical.
Example: The set 2, 4, 8 has arithmetic mean 4.67, geometric mean 4, and
harmonic mean 3.43.
Example: During one year the ratio of milk prices per quart to bread prices per loaf
was 2.50, whereas during the next year the ratio was 2.00.
i. Find the arithmetic mean of the ratios for the two-year period.
ii. Find the arithmetic mean of the ratios of bread prices to milk prices for the
two-year period.
iii. Discuss the advisability of using the arithmetic mean for averaging ratios.
iv. Discuss the suitability of the geometric mean for averaging ratios.

Solution:
i. Mean ratio of milk to bread prices = ½(2.50 + 2.00) = 2.25
ii. Since the ratio of milk to bread prices for the first year is 2.50, the ratio of bread to
milk prices is 1/2.50 = 0.40. Similarly, the ratio of bread to milk prices for the second
year is 1/2.00 = 0.50.
Then
Mean ratio of bread to milk prices = ½(0.40 + 0.50) = 0.45
iii. We would expect the mean ratio of milk to bread prices to be the reciprocal of the
mean ratio of bread to milk prices if the mean is an appropriate average.
However, 1/0.45 = 2.22 ≠ 2.25.
This shows that the arithmetic mean is a poor average to use for ratios.
iv. Geometric mean of ratios of milk to bread prices = √((2.50)(2.00)) = √5
Geometric mean of ratios of bread to milk prices = √((0.40)(0.50)) = √0.2 = 1/√5
Since these averages are reciprocals, our conclusion is that the geometric mean is more
suitable than the arithmetic mean for averaging ratios for this type of problem.
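
The argument can be verified numerically. The sketch below averages the two milk-to-bread ratios (2.50 and 2.00, as used in the solution) and their reciprocals, first arithmetically and then geometrically; the variable names are illustrative.

```python
import math

milk_to_bread = [2.50, 2.00]
bread_to_milk = [1 / r for r in milk_to_bread]       # 0.40 and 0.50

# Arithmetic means of the ratios and of their reciprocals.
am_milk = sum(milk_to_bread) / 2
am_bread = sum(bread_to_milk) / 2

# Geometric means of the ratios and of their reciprocals.
gm_milk = math.sqrt(milk_to_bread[0] * milk_to_bread[1])
gm_bread = math.sqrt(bread_to_milk[0] * bread_to_milk[1])

print(am_milk, round(1 / am_bread, 2))   # the two values differ
print(round(gm_milk * gm_bread, 10))     # product of reciprocal averages
```

The geometric means multiply to 1, so they are reciprocals of each other, while the arithmetic means are not.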
Self-Assessment Exercise 3

1. Explain the relation among arithmetic, geometric and harmonic means.

4.0 Conclusion
We have been exposed to the concept of the weighted mean, together with the
geometric and harmonic means. To calculate an appropriate mean, a weight has to
be attached to each individual value according to its importance.

5.0 Summary
In this unit, you have learnt that the concept of the weighted mean is particularly
useful in the construction of price index numbers, and in any situation where the
relative importance (or weight) of the different observations must be taken into
account while evaluating the mean.

6.0 Tutor Marked Assignment (TMAs)


1. The algebraic sum of the deviations of a set of numbers from their arithmetic mean
is zero (Yes or No)?
2. The price increases from 2010 to 2011 for five food items have been (in percentage
terms) as follows:
132.1 153.4 144.3 119.7 120.1
And the relative importance of these items in a typical citizen’s diet is:
34 19 24 12 11
Calculate the average price increase for these items.
3. The harmonic mean, H, of a set of N numbers X1, X2, X3, . . ., XN is the reciprocal of
the arithmetic mean of the reciprocals of the numbers (Yes or No)?
4. Calculate the geometric mean of the numbers: 2, 4, 6, 8.
5. Calculate harmonic mean of the numbers: 2, 4, 6, 8.
6. Explain the relation among arithmetic, geometric and harmonic means.

7.0 References/Further Reading


Harper W.M. (1982) Statistics, Fourth Edition. Macdonald and Evans Handbook
Series.
Hannagan, T.J. (1982) Mastering Statistics. The Macmillan Press Ltd
Module 4
Measures of Dispersion
Unit 1: Measures of Dispersion 1
Unit 2: Measures of Dispersion 2
Unit 1
Measures of Dispersion 1
Content
1.0 Introduction

2.0 Learning Outcomes

3.0 Learning Content

3.1 The Range

3.2 The Quartiles

3.3 The Deciles and Percentiles

4.0 Conclusion

5.0 Summary

6.0 Tutor-Marked Assignment

7.0 References/Further Reading


1.0 Introduction
The degree to which numerical data tends to spread about an average value is called
the variation or dispersion of the data. Consider the following distribution of wages in
Naira of 5 workers in each of two television stations A and B.

Station A 25,000 30,000 35,000 40,000 45,000

Station B 7,500 12,500 35,000 55,000 65,000

The mean and the median wage for each of the two distributions is ₦35,000. From
these results, one could wrongly conclude that the workers' conditions of service in
both stations are the same. A close observation of the figures clearly shows that the
wages of workers in station A are more fairly and evenly distributed than those in B.
One therefore needs a study of dispersion to detect the disparity in a distribution.
Various measures of dispersion are available; the measures which we shall discuss in
this unit are the range, the quartiles, the deciles and the percentiles.
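
The point can be checked quickly in Python with the standard `statistics` module: the two stations share the same mean and median even though their wages are spread very differently. The variable names are illustrative.

```python
from statistics import mean, median

station_a = [25_000, 30_000, 35_000, 40_000, 45_000]
station_b = [7_500, 12_500, 35_000, 55_000, 65_000]

for name, wages in (("A", station_a), ("B", station_b)):
    # Same centre for both stations, but very different lowest and highest wages.
    print(name, mean(wages), median(wages), min(wages), max(wages))
```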

2.0 Learning Outcomes


By the end of this unit you will be able to:
1. Define and calculate the range in a given set of scores.
2. Explain and locate the quartiles in a distribution of scores.
3. Explain and locate the deciles in a set of scores
4. Explain and calculate the percentiles in a given set of scores.

3.0 Learning Content


3.1 The Range
The range is the simplest, but also the crudest and least reliable, measure of variability. It is
defined as the difference between the highest and the lowest scores in a given
distribution. Because it depends only on these two extreme scores, a single unusually
high or low score can change it greatly. The greater
the range, the greater the dispersion or variability. It is found by using the
formula:
$R = X_h - X_l$,
where $X_h$ is the highest score and $X_l$ is the lowest score.
Example:
Find the range in the following set of scores.
53, 59, 72, 62, 57, 54, 66, 79, 14, 65, 64, 95, 59.
If you look at the scores closely, you will notice that the lowest score is $X_l = 14$ and the
highest score is $X_h = 95$. Therefore the range is:
$R = X_h - X_l = 95 - 14 = 81$.
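The calculation above can also be sketched in a few lines of Python. This is only an illustration: the function name `score_range` is made up for this sketch, and the wage figures are the ones quoted in the introduction to this unit.

```python
def score_range(scores):
    """Return the range R = highest score - lowest score."""
    return max(scores) - min(scores)


# Scores from the worked example above.
scores = [53, 59, 72, 62, 57, 54, 66, 79, 14, 65, 64, 95, 59]
print(score_range(scores))       # 95 - 14 = 81

# Wages (in naira) from the introduction: the two stations share the same
# mean and median, yet their ranges differ widely.
station_a = [25_000, 30_000, 35_000, 40_000, 45_000]
station_b = [7_500, 12_500, 35_000, 55_000, 65_000]
print(score_range(station_a))    # 20000
print(score_range(station_b))    # 57500
```

Note that removing the single score 14 from the first list would cut the range from 81 to 42, which shows how heavily the range leans on the extreme scores.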
Self-Assessment Exercise 1

1. Find the range in the set of scores below


53, 59, 60, 48, 64, 72, 56, 34, 75, 52, 36, 93

3.2 Quartiles

In the last unit, you learnt that the median is a positional score which occupies the middle
point on the score scale. In the same way, the quartiles are positional scores. The first
quartile Q1 is the score point that marks off the lowest quarter, or 25%, of the group. The
middle quartile Q2 is the median score point, and the third quartile Q3 is the
point below which 75% of the group falls. So, quartiles are points that divide a distribution of scores into four equal parts.
These points can be located in a distribution.

Quartile deviation (the semi-interquartile range):

$$\text{Quartile deviation} = \frac{Q_3 - Q_1}{2}$$

where $Q_1$ and $Q_3$ are the lower and upper quartiles respectively.
Example: Locate the Q1 and Q3 in the data given below.

Scores 15 18 21 23 25 27 28 29 32
Freq. 1 1 2 3 6 5 3 4 3

Solution:

i. Complete the table to include the cumulative frequency.

S/No Scores F CF
1 32 3 28
2 29 4 25
3 28 3 21
4 27 5 18
5 25 6 13
6 23 3 7
7 21 2 4
8 18 1 2
9 15 1 1
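ii. Find 25% (¼) of the number of scores: 25/100 × 28 = 7.
iii. Count upwards from the bottom of the cumulative frequency column until you reach 25% of
the cases. The cumulative frequency reaches exactly 7 at the score 23, so Q1 lies midway
between 23 and the next score, 25, i.e. Q1 = (23 + 25)/2 = 24.
iv. Find 75% (¾) of the number of scores: 75/100 × 28 = 21.
v. Count upwards from the bottom of the cumulative frequency column until you reach 75% of
the cases. The cumulative frequency reaches exactly 21 at the score 28, so Q3 lies midway
between 28 and the next score, 29, i.e. Q3 = (28 + 29)/2 = 28.5.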
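The counting procedure in steps i to v can be sketched in Python as follows. The function name `locate_point` is hypothetical. The worked example only covers the case where the cumulative frequency lands exactly on the required count; the rule used here for the other case (take the score at which the count is first passed) is an assumption made for this sketch, not something stated in the text.

```python
from fractions import Fraction


def locate_point(scores, freqs, fraction):
    """Locate the score point below which `fraction` of the cases fall.

    Frequencies are accumulated from the lowest score upwards. If the
    cumulative frequency lands exactly on the required count, the point is
    taken midway between that score and the next one (as in the worked
    example). Otherwise the score at which the count is first passed is
    returned (an assumption of this sketch).
    """
    pairs = sorted(zip(scores, freqs))           # lowest score first
    target = Fraction(fraction) * sum(freqs)     # e.g. 1/4 of 28 = 7
    cumulative = 0
    for index, (score, freq) in enumerate(pairs):
        cumulative += freq
        if cumulative == target and index + 1 < len(pairs):
            next_score = pairs[index + 1][0]
            return (score + next_score) / 2
        if cumulative > target:
            return float(score)
    return float(pairs[-1][0])


scores = [15, 18, 21, 23, 25, 27, 28, 29, 32]
freqs = [1, 1, 2, 3, 6, 5, 3, 4, 3]

q1 = locate_point(scores, freqs, Fraction(1, 4))
q3 = locate_point(scores, freqs, Fraction(3, 4))
print(q1, q3)            # 24.0 28.5
print((q3 - q1) / 2)     # semi-interquartile range: 2.25
```

Exact fractions are used for the target count so that the "lands exactly on" test is not upset by floating-point rounding.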
Self-Assessment Exercise 2

1. The table below shows a frequency distribution of the weekly wages, in naira, of 65
employees at the P and R Company. Locate Q1 and Q3 in the data.

Wages (₦)        50-59.99  60-69.99  70-79.99  80-89.99  90-99.99  100-109.99  110-119.99
No of Employees  8         10        16        14        10        5           2

3.3 The Deciles and the Percentiles


3.3.1 The Deciles

In this sub-unit, we move a step further and divide a distribution into ten equal parts to
locate the deciles. Decile points are used to mark off a distribution of scores into
tenths. There are 9 deciles, D1 to D9, which divide a distribution into ten equal parts.
D1 is the first decile; below D1 lies the bottom 10% of the group. In the same way, D2 is the point in the
distribution below which 20% of the cases fall. Like quartiles, deciles are points in a
distribution, not segments.

3.3.2 The Percentiles


Percentiles are ordinal measures. They are score points which divide the distribution
into 100 equal parts called percentages. In other words, they are points on the raw score
scale below which given percentages of the cases in the distribution fall. For instance, the
80th percentile is the point on the score scale that has exactly 80% of the cases below
it.
Percentiles are symbolised by Px, with x denoting the particular percentile.
Thus the 90th percentile is written P90. Percentiles are used for decision making when part of a
population is to be selected because of its position within the total.
Note that:
The median corresponds to the 50th percentile, P50, and to the 2nd quartile, Q2.
The 1st quartile corresponds to the 25th percentile, P25.
The 3rd quartile corresponds to the 75th percentile, P75.
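These correspondences follow from how many cases lie below each point. As a brief sketch (the symbol N for the total number of cases and the index k are notation introduced here for illustration, not taken from the text):

```latex
\begin{aligned}
\text{cases below } Q_k &= \tfrac{k}{4}\,N,   & k &= 1, 2, 3 \\
\text{cases below } D_k &= \tfrac{k}{10}\,N,  & k &= 1, \dots, 9 \\
\text{cases below } P_k &= \tfrac{k}{100}\,N, & k &= 1, \dots, 99
\end{aligned}
\qquad\Longrightarrow\qquad
Q_1 = P_{25},\quad Q_2 = D_5 = P_{50},\quad Q_3 = P_{75},\quad D_k = P_{10k}.
```

For example, in a distribution of N = 28 cases, D2 has 0.2 × 28 = 5.6 cases below it and P80 has 0.8 × 28 = 22.4 cases below it.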
Self-Assessment Exercise 3
1. Locate the following in the data given in Self-Assessment Exercise 2 above:
a. D2
b. D4
2. Explain what you understand by a percentile.

Scores 16 18 20 22 24 26 28 30 32

Freq. 1 1 2 3 6 5 3 4 3

4.0 Conclusion
In this unit, we have gone through some of the measures of variability or dispersion.
These are the measures used to establish the homogeneity or heterogeneity of a set
of data in a distribution.

5.0 Summary
In this unit you have been exposed to some measures of variability which are
measures that show the spread of the scores in a given distribution. The measures
you have seen so far are:
i. The range which simply shows the difference between the highest and the lowest
observations or numbers.
ii. The quartiles are the points which divide the distributions or scores into four
equal parts called quarters.
iii. The deciles are also points on the distribution that divide the distribution into ten
equal parts or tenths.
iv. The percentiles are points on the score scale that divide the distribution into 100 equal
parts called centiles or percentages.

6.0 Tutor-Marked Assignment


1. Find the range in the set of scores below
53, 59, 60, 48, 64, 72, 56, 34, 75, 52, 36, 93
2. i. Locate the Q1 and Q3 in the data given below.
ii. Find semi interquartile range.
3. Explain the following:
i. P50
ii. P95
iii. D2
iv. D4
7.0 References/Further Reading
Harper W.M. (1982) Statistics, Fourth Edition. Macdonald and Evans Handbook
Series.
Ary, Donald and Jacobs (1976) Introduction to Statistics: Purposes and Procedures,
New York.
Unit 2
Measures of Dispersion 2
Content
1.0 Introduction

2.0 Learning Outcomes

3.0 Learning Content

3.1 The Mean Deviation

3.2 Variance and Standard Deviation

4.0 Conclusion

5.0 Summary

6.0 Tutor-Marked Assignment


1.0 Introduction
In unit 4, you learnt the various measures of central tendency, and you were told that
these measures are a set of benchmarks which give a precise and brief
description of a set of data. However, in order to describe a distribution adequately,
we need both the measures of central tendency and the measures of variability. This is because
information concerning variability may be as important as, or more important than,
information concerning central tendency.

2.0 Learning Outcomes


At the end of this unit, you will be able to:
1. Calculate the mean deviation
2. Calculate the variance in a set of scores.
3. Calculate the standard deviation in a given distribution

3.0 Learning Content


3.1 Mean Deviation
Each of the aforementioned measures of dispersion uses only two
points in the distribution; they are therefore statistically unreliable. A measure which
makes use of all the available data is the mean deviation. For discrete (ungrouped) data:

$$\text{Mean deviation} = \frac{\sum |x - \bar{x}|}{n}$$

However, when frequencies are attached:

$$\text{Mean deviation} = \frac{\sum f\,|x - \bar{x}|}{\sum f}$$

where $n = \sum f$ and $\bar{x}$ is the mean.

Example: Find the mean deviation of the set of values: 2, 4, 7, 10, 12

Solution

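Since no frequencies are attached, we use

$$\text{Mean deviation} = \frac{\sum |x - \bar{x}|}{n}$$

Here,

$$\bar{x} = \frac{\sum x}{n} = \frac{2 + 4 + 7 + 10 + 12}{5} = \frac{35}{5} = 7$$

Thus:

x       x – x̄ = x – 7    |x – x̄| = |x – 7|
2       -5                5
4       -3                3
7       0                 0
10      3                 3
12      5                 5
Total                     16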

Hence:
Mean Deviation = 16/5 = 3.2
Note: For grouped data, the class marks are taken as the x values.
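As a check on the arithmetic, the mean deviation can be sketched in Python. The function name `mean_deviation` and its optional `freqs` argument are choices made for this illustration only; the formula is the one given above.

```python
def mean_deviation(values, freqs=None):
    """Mean deviation = sum(f * |x - mean|) / sum(f).

    With no frequencies given, every value counts once (ungrouped data).
    For grouped data, pass the class marks as `values`.
    """
    if freqs is None:
        freqs = [1] * len(values)
    total = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / total
    return sum(f * abs(x - mean) for x, f in zip(values, freqs)) / total


print(mean_deviation([2, 4, 7, 10, 12]))   # 16 / 5 = 3.2
```

Treating ungrouped data as the special case where every frequency is 1 lets one function cover both forms of the formula.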
Self-Assessment Exercise 1

1. Find the mean deviation of the set of values: 12, 6, 7, 3, 15, 10, 18, 5

3.2 Variance and Standard Deviation


The standard deviation is the most reliable and most widely used measure of dispersion. The
reasons for this are not far-fetched:
1. The standard deviation makes use of all the members of a distribution.
2. It lends itself to further statistical use, for example in computations under the normal
distribution.

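Calculations
1. Long Method:

$$S = \sqrt{\frac{\sum f x^2}{\sum f} - \left(\frac{\sum f x}{\sum f}\right)^2}$$

2. Assumed Mean Method:

$$S = \sqrt{\frac{\sum f d^2}{\sum f} - \left(\frac{\sum f d}{\sum f}\right)^2}$$

where d = x – A and A is the assumed mean.
3. Coding Method:

$$S = C \times \sqrt{\frac{\sum f u^2}{\sum f} - \left(\frac{\sum f u}{\sum f}\right)^2}$$

where u is as earlier defined under the mean.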
Note: In problems involving calculation of both mean and standard deviation, for
simplicity, the method used in computing the mean should be applied to find the
standard deviation.
The square of the standard deviation is called the variance. So if S denotes the
standard deviation, then S² is the variance.
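To see why working with deviations from an assumed mean gives the same standard deviation as working with the raw x values, the short derivation below may help. It is a sketch, writing N = Σf (notation introduced here) and assuming, as is usual for the coding method, that u = (x – A)/C with C the class width. The text only says that u is "as earlier defined under mean", so treat that definition as an assumption.

```latex
\begin{aligned}
\frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2
  &= \frac{\sum f (x - A)^2}{N} - (\bar{x} - A)^2 \\
  &= \left(\frac{\sum f x^2}{N} - 2A\bar{x} + A^2\right)
     - \left(\bar{x}^2 - 2A\bar{x} + A^2\right) \\
  &= \frac{\sum f x^2}{N} - \left(\frac{\sum f x}{N}\right)^2 = S^2 , \\[4pt]
\frac{\sum f u^2}{N} - \left(\frac{\sum f u}{N}\right)^2
  &= \frac{1}{C^2}\left[\frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2\right]
   = \frac{S^2}{C^2}
  \quad\text{when } u = \frac{d}{C}.
\end{aligned}
```

So subtracting A from every x leaves S unchanged, while dividing by C shrinks it by a factor of C, which is why the coding method multiplies the square root by C.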
Examples:
1. The marks scored by some 50 students in a statistics test are given below:

Marks 51 – 60 41 – 50 31 - 40 21 - 30 11 – 20 1 – 10
Frequency 5 10 11 12 6 6

a. Calculate the mean and the standard deviation using the long method.

Solution:

Marks     f    x      fx       x²        fx²        x – x̄ = x – 31.1   |x – 31.1|   f|x – 31.1|   CF
1 – 10    6    5.5    33.0     30.25     181.50     -25.6              25.6         153.6         6
11 – 20   6    15.5   93.0     240.25    1441.50    -15.6              15.6         93.6          12
21 – 30   12   25.5   306.0    650.25    7803.00    -5.6               5.6          67.2          24
31 – 40   11   35.5   390.5    1260.25   13862.75   4.4                4.4          48.4          35
41 – 50   10   45.5   455.0    2070.25   20702.50   14.4               14.4         144.0         45
51 – 60   5    55.5   277.5    3080.25   15401.25   24.4               24.4         122.0         50
Total     50          1555.0             59392.50                                   628.8

$$\bar{x} = \frac{\sum f x}{\sum f} = \frac{1555}{50} = 31.1$$

$$S = \sqrt{\frac{\sum f x^2}{\sum f} - \left(\frac{\sum f x}{\sum f}\right)^2} = \sqrt{\frac{59392.5}{50} - \left(\frac{1555}{50}\right)^2} = \sqrt{1187.85 - 967.21} = \sqrt{220.64} \approx 14.85$$
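The same computation can be sketched in Python as a check. The function names are hypothetical, the frequencies are those given in the question, and the assumed mean A = 35.5 used in the second call is an arbitrary choice made for this illustration (any value of A should give the same standard deviation).

```python
import math


def grouped_mean_and_sd(class_marks, freqs):
    """Long method: S = sqrt(sum(f*x^2)/sum(f) - (sum(f*x)/sum(f))^2)."""
    n = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(class_marks, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(class_marks, freqs))
    mean = sum_fx / n
    variance = sum_fx2 / n - mean ** 2
    return mean, math.sqrt(variance)


def sd_assumed_mean(class_marks, freqs, assumed_mean):
    """Assumed mean method: the same formula applied to d = x - A."""
    deviations = [x - assumed_mean for x in class_marks]
    return grouped_mean_and_sd(deviations, freqs)[1]


# Class marks for 1-10, 11-20, ..., 51-60 and their frequencies.
class_marks = [5.5, 15.5, 25.5, 35.5, 45.5, 55.5]
freqs = [6, 6, 12, 11, 10, 5]

mean, sd = grouped_mean_and_sd(class_marks, freqs)
print(round(mean, 1), round(sd, 2))                          # 31.1 14.85
print(round(sd_assumed_mean(class_marks, freqs, 35.5), 2))   # 14.85
```

The variance is the square of the standard deviation, here about 220.64.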
Self-Assessment Exercise 2

1. The distribution of the ages of 100 people in a village is given below:

Ages        60-62   63-65   66-68   69-71   72-74
Frequency   5       18      42      27      8
Using assumed mean 67, calculate the standard deviation and the variance of the
distribution.

4.0 Conclusion
In this unit you have learnt that, useful as the measures of central tendency are for
providing a concise index of the average value of a set of
scores, there is more to be studied about a set of scores. To describe a
distribution of scores adequately, we need both the measures of central
tendency and the measures of variability. This is because the two make up two
types of descriptive statistics which are indispensable in describing the distribution of
given data.

5.0 Summary
You have studied three measures in this unit; they are mean deviation, standard
deviation and variance. Thus,

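$$\text{Mean deviation} = \frac{\sum f\,|x - \bar{x}|}{\sum f}$$

Standard Deviation
i. Long Method:

$$S = \sqrt{\frac{\sum f x^2}{\sum f} - \left(\frac{\sum f x}{\sum f}\right)^2}$$

ii. Assumed Mean Method:

$$S = \sqrt{\frac{\sum f d^2}{\sum f} - \left(\frac{\sum f d}{\sum f}\right)^2}$$

where d = x – A
iii. Coding Method:

$$S = C \times \sqrt{\frac{\sum f u^2}{\sum f} - \left(\frac{\sum f u}{\sum f}\right)^2}$$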
6.0 Tutor-Marked Assignment
1. Find the mean deviation of the set of values: 2, 3, 5, 6, 8.
2. The distribution of the ages of 108 staff of a telecommunication outfit is given
below:

Ages 15-19 20-24 25-29 30-34 35-39 40-44 45-49

No of Staff 8 12 20 24 16 16 12

Using assumed mean 33, calculate the mean and the standard deviation of the
distribution.

7.0 References/Further Reading


Spiegel, Murray R. (1961) Outline of Theory and Problems of Statistics. Schaum
Publishing Company, U.S.A.

Harper W.M (1982) Statistics, Fourth Edition, Macdonald and Evans.
