
Statistics 1 2025

The document provides a comprehensive overview of statistics, detailing its definition, basic elements, branches, and types of data. It emphasizes the importance of proper data collection, organization, analysis, and interpretation for meaningful conclusions. Additionally, it distinguishes between descriptive and inferential statistics and explains various statistical terms and concepts.

Uploaded by

esthertalibita24

STATISTICS

Introduction

The field of statistics deals primarily with numerical data gathered from surveys or collected in
experiments. Its objective is to summarize such data so that the summary gives us a good indication
about some characteristics of a population or phenomenon that we wish to study. To ensure that
our conclusions are meaningful, it is necessary to subject our data to scientific analyses so that
rational decisions can be made. Hence the field of statistics is defined as the proper collection of
data, its organization into manageable and presentable form, and its analysis and interpretation into
conclusions for useful purposes.

The Basic Elements of Statistics


According to the definition above, the field of statistics incorporates the following five elements.
(I) Proper Collection of Data
Since statistical analyses and decisions are based upon the raw data collected from surveys, it is
important and necessary that such data are carefully and accurately collected, accumulated and
recorded. Faulty data and/or faulty data collection techniques result in wrong conclusions.
Sources of Data
Data may be available from existing published sources that may be already organized in some
presentable form. Such information is commonly referred to as secondary data.
On the other hand, the investigator may actually collect his data. This is usually done when
information about some area of inquiry has not been previously ascertained. In such cases, the
information is referred to as primary data.
(II) Organization and Classification of Data
For data collected through a survey, it is necessary to edit this data to correct any apparent
inconsistencies, ambiguities, recording errors and other mistakes that can enter into the actual
computations. But even before the data have been collected and edited, it is assumed that these can
be suitably classified according to some common characteristics of the population under study.
For example, we can collect data that purposefully describes the non-nationals living in Kampala
city with respect to the following characteristics, say age group, annual income, type of
employment or professional identification, level of education, family size and so on. Data
classified according to such characteristics can be presented more easily.
(III) Presentation of Data
The organized data can now be presented in the form of tables, diagrams, charts or graphs. This
presentation in an orderly form facilitates the understanding and analysis of the data.
(IV) Analysis of the Data
Data analysis makes data more useful for drawing conclusions. The analysis may take the form of
a critical observation of the data to draw some meaningful conclusions about it, or it may involve
highly complex and sophisticated statistical computer software such as Minitab, SAS, SPSS, Microfit,
Excel, R, STATA and EViews, among others. The commonly used simple statistical tools for
analyzing data include the calculation of averages (means), the dispersion of data around averages,
and percentages.
(V) Interpretation of Data
Data interpretation means drawing conclusions from the data that form the basis of decision
making. The correct interpretation requires a high degree of skill and experience and is necessary
in order to draw valid conclusions. For example, a simple statistical statement, that the average
mark in a statistics examination was 65% may not be the correct interpretation of the data if some
students scored very highly and others scored very low marks.
The Branches of Statistics
The field of statistics is primarily split up into two identifiable branches, which are descriptive
statistics and inferential statistics.
Descriptive Statistics
This is a branch of statistics that merely describes the data and consists of methods and techniques
used in collection, organization, presentation and analysis of data in order to describe the various
features and characteristics of such data. The data can be presented in form of charts, tables or
graphs in order to show certain trends, proportions, maximum and minimum values and so on. In
addition to collection, organization and presentation of data, descriptive statistics is also concerned
with data analysis so that the data can be easily understood. This involves computation of averages,
proportions and measures of spread of data around the average to gain clarity and compactness,
even though we may lose some details.
For example, the following statistics in their most summarized form describe in some way the
characteristics of the population from which they are drawn although not much can be inferred or
concluded from them nor can definite decisions be made.
I. 40% of the students taking the Mathematical Economics 2 class are married.
II. The ages of students in my class range from 20 to 40 years.
III. The average mark in a statistics examination last year was 68%.
Inferential Statistics
This is a branch of statistics that deals with methods of inferring or drawing conclusions about the
characteristics of the population based upon the results of the samples taken from the same
population. The measured characteristics of the sample are known as sample statistics and the
measured characteristics of the population are known as population parameters. Inferential
statistics requires making a decision on whether the conclusions drawn from the analysis of the
sample are exactly the same as the conclusions that would be drawn from the entire population
from which the representative sample was taken.
Statistical Terms and Concepts
There are several terms that are used extensively in the field of business statistics. Amongst them
are the following,
(i) Data
This is a term you have come across time and time again so far, but what does “data” mean? Data
is simply a scientific term for facts, figures, information and measurements. Examples of data are
incomes of families, marks scored in an examination, number of goals scored by each football
team in the super division in a given season, the profit after tax of a company for the past five years
and so on. Before the data obtained from statistical surveys or investigations have been worked
on, they are known as raw data. If the data is written down in ascending or descending order, then
it is called ordered data. If the ordered data is arranged in arrays of rows or columns, then it is
known to be presented in an ordered array.
(ii) Population
In statistical terms, this refers to the totality of things, objects, and persons under consideration.
For example, if we are interested in knowing the percentage of adult Ugandan travelers who go to
America annually, then all those adult Ugandans who travel abroad become our population.
(iii) Sample
A sample is a portion of the total population that is considered for study and analysis. It is selected
in such a way so as to be representative of the population. For instance, if we want to study the
income pattern of lecturers at Makerere University and there are 500 lecturers, then we may take
a random sample of only 50 lecturers out of the entire population of 500 for the purpose of our
study. Then this number of 50 lecturers constitutes a sample.
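The lecturer example can be sketched in Python. This is only an illustration: the ID numbers 1 to 500 are a hypothetical stand-in for the 500 lecturers, and the seed is an assumption used so that the draw can be repeated.

```python
import random

# Hypothetical population: ID numbers standing in for the 500 lecturers.
population = list(range(1, 501))

# A fixed seed (an assumption) makes the random draw repeatable.
rng = random.Random(2025)

# Simple random sample of 50 lecturers, drawn without replacement.
sample = rng.sample(population, 50)

print(len(sample), "lecturers sampled out of", len(population))
```

Because the sample is drawn without replacement, no lecturer can appear twice, and every lecturer has the same chance of being selected.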
Statistical Data
Data for statistical purposes is of two types, i.e. primary and secondary data, depending on the
source from which it comes.
Primary Data
These are data that are collected afresh and for the first time from the source. Primary data is
collected for a specific purpose or study or inquiry. Primary data are always original in character.
Primary data is either survey data, if it is obtained in an uncontrolled situation by asking
questions or making observations or it is experimental data, if it is obtained in a controlled
situation by making experiments.
Secondary Data
This is data that has already been collected and has already passed through the statistical
process. It can be sourced from newspapers, journals, magazines, reports, books and so on.
The advantage of using secondary data is that it is reasonably cheap to obtain and use, and
relatively faster to obtain than primary data.

Qualitative and Quantitative data


Various definitions exist for the distinction between these two types of data. Non-numerical
(nominal) data is always described as being qualitative or non-metric, as the data is
described by some quality but not measured. Quantitative or metric data, which describes some
measurement or quantity, is always numerical and measured on the interval or ratio scales. All
definitions agree that interval or ratio data are quantitative.
Discrete and Continuous data
Quantitative data may be discrete or continuous. If the values which can be taken by a variable
change in steps the data are discrete. These discrete values are often, but not always, whole
numbers. If the variable can take any value within a range, it is continuous. The number of people
shopping in a supermarket is discrete but the amount they spend is continuous. The number of
children in a family is discrete but a baby's birth weight is continuous.
Summarizing data using Frequency Distributions
When a large number of measurements of a particular variable are taken, for example, the number
of units produced per employee per week, you may find some values occurring more than once.
Such data can be organized into a frequency distribution which simply lists each value of the
variable and the frequency of its occurrence. This can also be done in a tabular form, and the table
is called a frequency table.
Frequency Distributions
A frequency distribution is defined as the list of all values in the given distribution and the
corresponding number of times each individual value occurs in the distribution. The main
objective of a frequency distribution is to summarize numeric data in a logical way that enables an
overall perspective of the data to be obtained quickly and easily. Frequency distributions are of
two kinds, simple (ungrouped) frequency distributions and the grouped frequency distributions.
A simple (ungrouped) frequency distribution
A simple frequency distribution is a list of data values, with their corresponding number of
occurrences as their frequency. Simple frequency distributions are suitable in situations where we
have a limited number of discrete data values that are repetitive. Discrete values are those values
obtained by counting and usually assume whole number values. A simple frequency distribution is not
normally suitable for continuous data (i.e. where values have been measured), since the likelihood
of repeated values is small.
Example: constructing a simple frequency distribution using a tally chart
The following data records the goals scored by 20 football clubs in the Champions League.
3 4 1 2 0 6 7 3 3 4
5 5 0 2 6 3 3 5 5 2
Construct a simple frequency distribution table for these data, using a tally chart.
Solution
The tally chart is constructed by examining each value and recording its occurrence with a stroke
and from this, the frequency distribution table can be constructed as shown below:

Tally chart
Data value   Tally marks   Total
0            ||            2
1            |             1
2            |||           3
3            |||||         5
4            ||            2
5            ||||          4
6            ||            2
7            |             1

Simple frequency distribution table

Number of goals scored by the club   Number of football clubs
0                                    2
1                                    1
2                                    3
3                                    5
4                                    2
5                                    4
6                                    2
7                                    1
Observations
a. Three goals was the most common score in the tournament (5 of the 20 clubs).
b. No club scored more than seven goals in the tournament.
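The tallying above can be reproduced with a short Python sketch. It uses `collections.Counter` from the standard library; the variable names are illustrative only.

```python
from collections import Counter

# Goals scored by the 20 football clubs, as listed in the example.
goals = [3, 4, 1, 2, 0, 6, 7, 3, 3, 4,
         5, 5, 0, 2, 6, 3, 3, 5, 5, 2]

# Counter maps each distinct data value to its frequency.
frequency = Counter(goals)

# Print the simple frequency distribution, one stroke per occurrence.
print("Goals  Tally   Clubs")
for value in sorted(frequency):
    count = frequency[value]
    print(f"{value:>5}  {'|' * count:<6}  {count:>5}")

# The data value with the highest frequency.
most_common_value, most_common_count = frequency.most_common(1)[0]
```

The frequencies add up to 20, the number of clubs, which is a quick check that no observation was missed while tallying.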
A Grouped Frequency Distribution
A grouped frequency distribution summarizes data into groups (classes) of values, with each class
showing the number of values that lie in it as its frequency. When the number of distinct values in
a set is large (20 or more, say) a simple frequency distribution is not appropriate, since there will
be too much information not easily assimilated. In this type of situation, a grouped frequency
distribution is used.
Forming Classes or Groups
There are basically two ways of forming classes or groups of the given values. These are as
follows:
Inclusive class intervals. Here the upper limit of the class interval is included in the given class
interval. These class intervals are used when the variable under consideration happens to be a
discrete one (i.e. takes on one figure at a time), e.g. the number of workers.
10-19 20-29 30-39 40-49 etc
Note: here the term whose value is 19 will be put in the class interval (10-19) and the one having a
value of 39 will be put in the class interval (30-39).
Exclusive class intervals. Here, all values equal to the upper-class limit are grouped into the
next class. These class intervals are usually used when the variable under consideration is a
continuous one (capable of being measured in fractions like height, age, weight and so on).
10-20 20-30 30-40 40-50 etc
Note: here, the term whose value is say 40 won’t be put in (30-40) but will instead fall under (40-50).
Definitions of terms associated with Grouped Frequency Distribution Classes.
Class limits
These are the lower and upper values of the classes as physically described in the distribution.
These class limits have been illustrated in the preceding section above.
Class boundaries
These are sometimes called mathematical or real limits. They are the lower and upper limits up to
which inclusive classes can be extended to cover the successive gaps that exist in between them.
They can also be referred to as the lower and upper values of classes that mark common points
between them. For exclusive classes, the class limits are also the class boundaries. However, for
inclusive classes, the class boundaries differ from class limits. Class boundaries are determined
as follows:
Upper class boundary = Upper class limit + ½ d
Lower class boundary = Lower class limit – ½ d
Where d is the size of the gap that exists between the successive classes.
Example 5 (Getting class - boundaries)
Consider the classes, 10 – 19, and 20 – 29 and so on. The size of the gap between 19 and 20 is 1
unit. The class boundaries for the class 10 – 19 are as follows:
Upper class boundary = 19 + ½ (1) = 19 + 0.5 = 19.5
Lower class boundary = 10 – ½ (1) = 10 – 0.5 = 9.5
The same applies for 20 – 29,
Upper class boundary = 29 + ½ (1) = 29 + 0.5 = 29.5
Lower class boundary = 20 – ½ (1) = 20 – 0.5 = 19.5
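The two boundary formulas can be checked with a few lines of Python. This is a minimal sketch; the function name `class_boundaries` is hypothetical and is not part of any library.

```python
def class_boundaries(lower_limit, upper_limit, gap):
    """Return (lower boundary, upper boundary) for an inclusive class.

    gap is d, the size of the gap between successive classes.
    """
    half_gap = gap / 2
    return lower_limit - half_gap, upper_limit + half_gap


# Inclusive classes from the example: 10 - 19 and 20 - 29.
classes = [(10, 19), (20, 29)]

# d = lower limit of the second class minus upper limit of the first.
gap = classes[1][0] - classes[0][1]

boundaries = [class_boundaries(lower, upper, gap) for lower, upper in classes]
print(boundaries)
```

The upper boundary of one class equals the lower boundary of the next (19.5 here), so the classes meet with no gap between them.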

Class mark (class mid-point)

This is the mid-point of the class interval. E.g. for an interval of (60-62), the class mark is
\[
\frac{60 + 62}{2} = 61
\]
Cumulative frequency distribution
This helps us to determine the number of units/observations that lie above a given lower class
limit or below a given upper class limit in a frequency distribution. When you are interested in
the number of units below a specified value, you use a less than cumulative
frequency distribution. However, when you are interested in the number of units that lie
above a specified value, you use a more than cumulative frequency distribution.
Example
Age of workers   Frequency (fi)   Cum. freq. (less than)   Cum. freq. (more than)
15-25            5                5                        7+3+5+7+3+5 = 30
25-35            3                5+3 = 8                  7+3+5+7+3 = 25
35-45            7                5+3+7 = 15               7+3+5+7 = 22
45-55            5                5+3+7+5 = 20             7+3+5 = 15
55-65            3                5+3+7+5+3 = 23           7+3 = 10
65-75            7                5+3+7+5+3+7 = 30         7
                 Σfi = 30
From the above table it can be noted that:

Case 1. Cumulative frequency less than

➢ There are 5 people aged less than 25 years, 8 people aged less than 35 years, 15 people aged
less than 45 years and so on.
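Both cumulative columns are running totals, one accumulated from the top of the table and the other from the bottom. A small Python sketch using `itertools.accumulate` (the variable names are illustrative):

```python
from itertools import accumulate

# Class intervals and frequencies from the age-of-workers table.
classes = ["15-25", "25-35", "35-45", "45-55", "55-65", "65-75"]
frequencies = [5, 3, 7, 5, 3, 7]

# "Less than" column: running total from the first class downwards.
less_than = list(accumulate(frequencies))

# "More than" column: running total from the last class upwards,
# reversed back so that it lines up with the class order.
more_than = list(accumulate(reversed(frequencies)))[::-1]

for interval, f, lt, mt in zip(classes, frequencies, less_than, more_than):
    print(f"{interval:>6} {f:>3} {lt:>4} {mt:>4}")
```

The last "less than" value and the first "more than" value both equal the total frequency, which is a useful check on the arithmetic.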

Part 4. The Graphical Presentation of Numerical Frequency Distributions


Having summarized data in the form of frequency distributions, we now present it graphically in order
to give it a more visual display. Data can be presented graphically using histograms, frequency
polygons and curves, and cumulative frequency curves (ogives).
A pie chart is a circle, which is divided by radial lines into sections or subsections of different
angles so that the area of a particular section is proportional to size of the figure represented.
Example. Suppose UTL’s work force is split as per the table below.
Job description   Number employed
Labourers         21
Mechanics         38
Cleaners          9
Clerks            12
Total             80
Required: Represent the above information on a pie chart.

[Figure: pie chart of UTL’s work force; sectors: Cleaners, Clerks, Labourers, Mechanics.]
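To draw the pie chart by hand, each category's share of the total is converted into an angle out of 360 degrees. The sketch below shows that arithmetic for the UTL figures; the variable names are illustrative.

```python
# Number employed in each job category at UTL.
workforce = {"Labourers": 21, "Mechanics": 38, "Cleaners": 9, "Clerks": 12}

total = sum(workforce.values())

# Sector angle = (category count / total) * 360 degrees.
angles = {job: count * 360 / total for job, count in workforce.items()}

for job, angle in angles.items():
    print(f"{job:<10} {workforce[job]:>3} employees  {angle:>6.1f} degrees")
```

The sector angles come to 94.5, 171, 40.5 and 54 degrees respectively, and together they make up the full circle of 360 degrees.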

Bar graph / bar chart.


There are 3 types of bar graphs, i.e. a simple bar graph, a multiple bar graph, and a component
bar graph.
A simple bar graph employs bars to indicate the frequency of occurrence of observations within
each category. A vertical bar is drawn for each category and its height represents the
number of members in that particular class.
In the component bar graph, bars are divided into different components, each representing a given
category of data. In the multiple bar graph, there are several bars within a given category.

Example. Suppose UTL’s work force is split as per the table below.
Job description   Number employed
Laborers          21
Mechanics         38
Cleaners          9
Clerks            12
Total             80
Required: Represent the above information on a simple bar chart.

[Figure: A simple bar chart for the above information; x-axis: job description (Laborers, Mechanics, Cleaners, Clerks); y-axis: number employed, 0 to 40.]

The component bar graph. Example 2


Value of some major export commodity groups (1980 – 84), in $ millions
Year   Agriculture   Mineral   Others   Total
1980   238.1         322.4     131.3    691.8
1981   174.8         300.6     90.5     565.9
1982   174.4         302.1     93.9     570.4
1983   231.5         373.1     82.8     687.4
1984   380.6         328.5     114.6    823.7
Source: Bank of Uganda, quarterly economic bulletin

[Figure: component bar graph of the export values above; bar components: Agriculture, Mineral, Others.]

The Multiple Bar Graphs. Example 3


Uganda export and import values ($ million)
Year   Export value   Import value
1976   451.6          351.1
1977   570.9          448.3
1978   550.4          478.4
Source: UBOS, quarterly bulletin

[Figure: multiple bar graph of the values above; bars per year: Export value, Import value.]
Graphical presentation. Graphical presentations of frequency distributions are divided into 3
types: histograms, frequency polygons and cumulative frequency curves.
The Histogram. It’s a plot of class frequencies against class boundaries. Its body comprises
bars of equal width whose heights represent the class frequencies. Unlike in a bar graph, the bars
in a histogram are attached to each other; the width of each rectangle corresponds to the class
width and its height to the class frequency.

Example
Required: Construct a histogram to display the information in the table.

Age of workers   Frequency (fi)   Mid point (xi)
15-25            5                20
25-35            3                30
35-45            7                40
45-55            5                50
55-65            3                60
65-75            7                70
                 Σfi = 30

[Figure: A histogram for the above information; x-axis: class intervals (ages), 15-25 to 65-75; y-axis: frequency (f), 0 to 8.]

The Frequency Polygon


It’s a plot of class frequencies against class marks or mid points. Successive points are joined by
straight lines, thereby forming a line graph. Two class intervals with zero frequency are added to
the distribution, one at the beginning and the other at the end, for purposes of closing the polygon.

[Figure: A frequency polygon from the previous table; x-axis: mid points of the classes, 10 to 80; y-axis: frequency, 0 to 8.]

The cumulative frequency curve. (OGIVE)


The ogive is of two types: the less than ogive and the greater than ogive.
Example. The cumulative frequency distribution of statistics results.
Age of workers   Frequency (fi)   Cum. freq. (less than)   Cum. freq. (more than)
15-25            1                1                        25
25-35            2                3                        24
35-45            3                6                        22
45-55            5                11                       19
55-65            7                18                       14
65-75            5                23                       7
75-85            2                25                       2
                 ∑f = 25
Required: - Draw the “less than” and “more than” cumulative frequency curves from this
distribution

Case 1.
The less than Ogive.
Here the less than cumulative frequencies are plotted against the upper class limits/boundaries
of each class.

[Figure: A less than ogive showing statistics results; x-axis: upper class limits/boundaries, 25 to 85; y-axis: less than cumulative frequency, 0 to 30.]

Case 2.
The more than Ogive.
Here the more than cumulative frequencies are plotted against the lower class limits/boundaries
of each class.

[Figure: A more than ogive showing statistics results; x-axis: lower class limits/boundaries, 15 to 75; y-axis: more than cumulative frequency, 0 to 30.]
STATISTICAL MEASURES
Chapters 1, 2 and 3 helped us to transform a mass of raw data into a meaningful form; we organized
it into a frequency distribution and portrayed it graphically on a histogram and a frequency
polygon. We also looked at other graphical techniques like line charts/graphs and pie charts.
Chapters 5 and 6 are concerned with the other two ways of describing data, i.e. measures of central
tendency and measures of dispersion. They deal with the basic analysis of univariate data (data
obtained from measuring just one attribute). Statistical measures is the name given to this type of
analysis, the measures themselves being split into measures of location or average (central
tendency), measures of dispersion and measures of skewness.

MEASURES OF CENTRAL TENDENCY / LOCATION


A measure of central tendency is a value around which the distribution of data is
centered, or a single value which can neatly characterize the whole group of data. The purpose of
a measure of central tendency (average) is to pinpoint the center of a set of observations, e.g., the
average mark in business statistics was 82%, or the average life expectancy of Ugandans is 45 years.
These values (measures of location) are very useful in presenting the overall picture of the entire
data and in making comparisons between two or more sets of data. The common measures of
central tendency are the mean, the mode and the median.
Methods of computing the measures for central tendency:

a) (i) The Arithmetic Mean (for ungrouped data)


A statistician usually takes a sample from a population in order to find out something about a
specific characteristic of the population. For example, a lecturer may wish to know the mean
height of all diploma students. It can be time consuming and expensive to measure the heights
of all the students, so he might select a sample of only 8 students and measure their heights. The
mean height of these 8 students is then used to estimate the mean height of all diploma students.
The arithmetic mean is the most widely used of all averages.
For ungrouped data, if observations are in a row form, the mean is computed by summing all the
observations and then dividing their sum by the total number of observations.
\[
\bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_N}{N} = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{\sum x}{N}
\]

Example1
The net weights of the contents of a sample of five Coca-Cola bottles selected at random from the
production line are (in grams): 85.4, 85.3, 84.9, 85.4, and 85.0. What is the arithmetic mean weight
of the sample of observations?
\[
\bar{x} = \frac{\sum x}{N} = \frac{85.4 + 85.3 + 84.9 + 85.4 + 85.0}{5} = \frac{426.0}{5} = 85.2
\]
Therefore, the contents of a Coca-Cola bottle weigh 85.2 grams on average.
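The same calculation can be written in Python as a direct translation of the formula (sum of the observations divided by their number). The variable names are illustrative.

```python
# Net weights (in grams) of the five sampled Coca-Cola bottles.
weights = [85.4, 85.3, 84.9, 85.4, 85.0]

# Arithmetic mean = sum of the observations / number of observations.
mean_weight = sum(weights) / len(weights)

print(f"Mean net weight: {mean_weight:.1f} grams")
```

Python's standard library also offers `statistics.mean`, which gives the same result for this list.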

Note: the mean of a sample, or any other measure based on sample data, is called a statistic.
Many studies involve all the population values, for example a study of the weekly
earnings of all MTN workers. In this case, the entire set of weekly earnings would be considered as
the population, and the mean of the weekly earnings of the workers would be:
\[
\mu = \frac{\sum x}{N}
\]
where µ stands for the population mean and N is the total number of observations.

As mentioned earlier, any measurable characteristic of a sample is called a statistic. Any
measurable characteristic of a population is called a parameter. A sample statistic is used to
estimate a population parameter.

Example 2
The following are estimates of the outstanding balances (in Shs ‘000) for a sample of 20 degree
students at Makerere. Determine the arithmetic mean.

137 136 135 136 138 137 136 137
135 135 137 138 136 136 138 135
136 137 136 136

Solution
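Balances (x), ‘000   Tally       Frequency (f)   f x
135                  ////        4               540
136                  ///// ///   8               1,088
137                  /////       5               685
138                  ///         3               414
                                 ∑f = 20         ∑(f x) = 2,727

\[
\bar{X} = \frac{\sum fx}{\sum f} = \frac{\sum fx}{N} = \frac{2{,}727}{20} = 136.35
\]

The mean outstanding balance is therefore 136.35 thousand, i.e. about 136,000 Uganda shillings.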
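The frequency-table method can be sketched in Python by storing each balance with its frequency and applying the formula mean = Σfx / Σf. The dictionary layout is an illustrative choice, not part of the original notes.

```python
# Outstanding balance (Shs '000) -> frequency, taken from the tally table.
balances = {135: 4, 136: 8, 137: 5, 138: 3}

total_f = sum(balances.values())                      # sum of f
total_fx = sum(x * f for x, f in balances.items())    # sum of f*x

mean_balance = total_fx / total_f

print(f"sum f = {total_f}, sum fx = {total_fx}, mean = {mean_balance:.2f} (Shs '000)")
```

Multiplying each value by its frequency gives the same total as adding up all 20 raw observations one by one, which is why the two approaches yield the same mean.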

(ii) Arithmetic mean for grouped data


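In grouped data, the arithmetic mean is given by:
\[
\bar{X} = \frac{\sum fx}{\sum f} = \frac{\sum fx}{N}
\]
where x is the class mark (mid-point) of each class and f is the class frequency.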

Example 1. Given the following table

Class     Class mark (xi)   Frequency (fi)   fi xi
60 – 62   61                5                305
63 – 65   64                18               1,152
66 – 68   67                42               2,814
69 – 71   70                27               1,890
72 – 74   73                8                584
                            ∑fi = 100        ∑fi xi = 6,745

\[
\bar{X} = \frac{\sum fx}{\sum f} = \frac{6{,}745}{100} = 67.45
\]
Example 2: For grouped data
The following are selling prices in (thousands of $) of 20 vehicles sold by Ramzan motors in the
past 7 months. What is the estimated mean selling price?

85 75 66 43 40
88 80 56 56 67
87 83 65 53 75
83 83 52 44 45

Solution
Choose a suitable class, e.g. (40 – 49)
Selling price in $‘000   x      Tally      Frequency (f)   f x
40 – 49                  44.5   ////       4               178
50 – 59                  54.5   ////       4               218
60 – 69                  64.5   ///        3               193.5
70 – 79                  74.5   //         2               149
80 – 89                  84.5   ///// //   7               591.5
                                           ∑f = 20         ∑fx = 1,330

\[
\bar{X} = \frac{\sum fx}{\sum f} = \frac{1{,}330}{20} = 66.5 \text{ thousands of \$}
\]
Interpretation: For the past 7 months, the average selling price of the 20 vehicles was approximately
$66.5 thousand.
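The grouping steps of this example (count the values in each class, multiply by the class mark, then divide Σfx by Σf) can be sketched in Python. The class limits follow the solution above; the variable names are illustrative.

```python
# Selling prices (thousands of $) of the 20 vehicles.
prices = [85, 75, 66, 43, 40,
          88, 80, 56, 56, 67,
          87, 83, 65, 53, 75,
          83, 83, 52, 44, 45]

# Inclusive classes chosen in the solution.
classes = [(40, 49), (50, 59), (60, 69), (70, 79), (80, 89)]

rows = []
for lower, upper in classes:
    class_mark = (lower + upper) / 2                  # x
    f = sum(lower <= p <= upper for p in prices)      # frequency of the class
    rows.append((lower, upper, class_mark, f, f * class_mark))

total_f = sum(row[3] for row in rows)
total_fx = sum(row[4] for row in rows)
grouped_mean = total_fx / total_f

# Mean of the raw, ungrouped prices, for comparison.
raw_mean = sum(prices) / len(prices)

print(f"Grouped mean: {grouped_mean}, raw mean: {raw_mean}")
```

The grouped mean (66.5) differs slightly from the mean of the raw prices (66.3) because grouping treats every value in a class as if it were equal to the class mark. This is why the question asks for an estimated mean.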

Properties of the Arithmetic Mean


1) It can be calculated on any data set and thus it always exists.
2) It’s very sensitive to extreme values, which can make the mean unrepresentative of the data.
3) A set of numerical data has only one mean and so the mean is unique.
4) It takes into account all observations in the data set.
5) It can be used for further statistical treatment, e.g. mean for several sets of data can be
combined into an overall mean for all data.
6) It is relatively reliable in the sense that means of several samples drawn from the same
population do not vary as widely as other statistics used to estimate the population mean.
7) The sum of the deviations of a set of numbers from their mean is zero, i.e.
\[
\sum (x - \bar{x}) = 0
\]

8) It cannot be calculated for open intervals.
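Property 7 follows directly from the definition of the mean. A short derivation, using the same symbols as the mean formula (N observations with mean \(\bar{x} = \sum x / N\), so that \(\sum x = N\bar{x}\)):

```latex
\[
\sum_{i=1}^{N} (x_i - \bar{x})
  = \sum_{i=1}^{N} x_i - \sum_{i=1}^{N} \bar{x}
  = N\bar{x} - N\bar{x}
  = 0
\]
```

For instance, for the five Coca-Cola weights with mean 85.2, the deviations are 0.2, 0.1, −0.3, 0.2 and −0.2, which add up to zero.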

Advantages of the arithmetic mean.


(a) It’s easy to calculate.
(b) It can easily be manipulated to calculate other useful statistical measures.
(c) It uses the values of all the observations.
Disadvantages of the arithmetic mean.
(a) A few extreme values can distort the mean making it unrepresentative of the data set.
(b) It can’t be read from the graph.
(c) When the data is discrete, it may produce a value that appears to be unrealistic.

THE MEDIAN
It has been noted that for data containing one or two very large or very small values, the
arithmetic mean might not be very representative. The center point for such problems can better
be described by a measure of central tendency called the “median”
It is a statistical measure that divides a data set into two equal parts.

Case 1
For ungrouped data, we first arrange the items in either ascending or descending order. The
median is then the observation that falls in the middle (odd number of
observations) or the average of the two middle observations (even number of observations).

Example 1
Ungrouped data
Given the following set of data: 1, 2, 8, 9, 4, 7, 6, find the median.

Solution
Organize the data in an ascending order, i.e.
1, 2, 4, 6, 7, 8, 9.
The median = 6 (number that lies in the middle of the data set).
Example 2
Given the following data set: 1, 2, 8, 9, 4, 7, 6, 10, find the median.
Solution
Organise in the ascending order:
1, 2, 4, 6, 7, 8, 9, 10
Since there are two terms (6 & 7) in the middle, we take the average, i.e.
\[
\text{Median} = \frac{6 + 7}{2} = 6.5
\]
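The rule for ungrouped data (sort, then take the middle value or the average of the two middle values) can be written as a small Python function. The function name `median` is illustrative; the standard library's `statistics.median` does the same job.

```python
def median(values):
    """Median of ungrouped data: middle value, or mean of the two middle values."""
    ordered = sorted(values)          # arrange in ascending order
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:                    # odd number of observations
        return ordered[middle]
    # even number of observations: average the two middle values
    return (ordered[middle - 1] + ordered[middle]) / 2


example_1 = median([1, 2, 8, 9, 4, 7, 6])        # seven observations
example_2 = median([1, 2, 8, 9, 4, 7, 6, 10])    # eight observations
print(example_1, example_2)
```

Sorting a copy with `sorted` leaves the original list unchanged, so the raw data can still be used for other calculations.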

Case 2
For Grouped Data
A. The interpolation formula or method.
B. Graphical interpolation method.

A. The interpolation formula or method.


Steps to be taken.
1) Compute the cumulative frequency (cf).
2) Divide the total frequency by 2 in order to ascertain the median class. The median class is the
first class whose cumulative frequency is equal to or greater than N/2.
3) Compute the median using the formula:
\[
\text{median} = L_m + \left( \frac{\frac{N}{2} - CF_b}{f_m} \right) \times c
\]
Where,
Lm = Lower class boundary of the median class (middle class)
N = Total frequency
c = Median class width
fm = Frequency of the median class
CFb = Cumulative frequency of the class before the median class
Example 1. Consider the given table and find the median.
Class     f     Cumulative frequency (cf)
60 – 62   5     5
63 – 65   18    5 + 18 = 23
66 – 68   42    5 + 18 + 42 = 65
69 – 71   27    5 + 18 + 42 + 27 = 92
72 – 74   8     5 + 18 + 42 + 27 + 8 = 100
          ∑f = 100

Here N/2 = 50, so the median class is 66 – 68, with Lm = 65.5, CFb = 23, fm = 42 and c = 3.
\[
\text{median} = L_m + \left( \frac{\frac{N}{2} - CF_b}{f_m} \right) \times c = 65.5 + \frac{50 - 23}{42} \times 3 = 67.43
\]

Example 2
Given the following information, obtain the median class.

Marks     Frequency (f)   Cumulative frequency (cf)
1 – 6     4               4
7 – 12    15              19
13 – 18   11              30
19 – 24   8               38
25 – 30   5               43
31 – 36   1               44

Note: the median group can be identified by locating the (N/2)th = (44/2)th = 22nd observation.
The 22nd observation falls in the class (13 – 18), which is therefore the median group.

Since ∑f = 44,
Lm = 12.5
N/2 = 44/2 = 22
CFb = 19
fm = 11
c = 6
\[
\text{median} = 12.5 + \frac{22 - 19}{11} \times 6 = 14.14
\]
Hence, the median is approximately 14.
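The interpolation method can be sketched as a Python function. This is a minimal illustration, not a library routine: the name `grouped_median` and its arguments are hypothetical, and it assumes equally spaced classes given in ascending order by their class limits.

```python
def grouped_median(classes, frequencies):
    """Estimate the median of a grouped frequency distribution.

    classes     -- list of (lower limit, upper limit) pairs, ascending
    frequencies -- class frequencies in the same order
    """
    total = sum(frequencies)
    if total == 0:
        raise ValueError("total frequency must be positive")
    # Gap d between successive classes (1 for 60-62, 63-65; 0 for exclusive classes).
    gap = classes[1][0] - classes[0][1]
    half_total = total / 2                       # N/2
    cum_before = 0                               # CFb
    for (lower, upper), f in zip(classes, frequencies):
        if cum_before + f >= half_total:         # first class whose cf reaches N/2
            lower_boundary = lower - gap / 2     # Lm
            width = (upper + gap / 2) - lower_boundary   # c
            return lower_boundary + (half_total - cum_before) / f * width
        cum_before += f


# Example 1: classes 60-62 ... 72-74.
median_1 = grouped_median([(60, 62), (63, 65), (66, 68), (69, 71), (72, 74)],
                          [5, 18, 42, 27, 8])

# Example 2: marks 1-6 ... 31-36.
median_2 = grouped_median([(1, 6), (7, 12), (13, 18), (19, 24), (25, 30), (31, 36)],
                          [4, 15, 11, 8, 5, 1])

print(round(median_1, 2), round(median_2, 2))
```

For Example 2 the function evaluates 12.5 + (3/11) × 6, which is approximately 14.1 and agrees with the rounded answer of 14.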

B. Graphical interpolation method


Step 1: Compute N/2 and locate it on the Y-axis, which has values of cumulative frequency.
Step 2: From this point, draw a horizontal line to meet the curve. At the point of intersection
with the curve, draw the perpendicular to the X-axis.
Step 3: Read off the value of the median from the X-axis.

NOTE: The median can also be the value of X that corresponds to the vertical line that divides a
histogram into two parts having equal areas.
But for now let’s consider the less than Ogive.
[Figure: A less than ogive from the above table; x-axis: upper class limits (UCL), 6 to 36; y-axis: cumulative frequencies, 0 to 50; N/2 = 22 marked on the y-axis.]

So, from the above graph it can be seen that the median is approximately 14

Note: The graphical estimation method is usually superior to the formula estimation method
as long as a smooth cumulative frequency curve is drawn. This is due to its non-linear
interpolation effect. The median is a positional average: it is influenced by the position of
the items in the series and not by the size of the items, which is why it’s not a suitable
representative of a data series in most cases.

Advantages of the Median


(a) It's an appropriate alternative to the mean when extreme values are present at one or both
ends of the distribution or data set. E.g. for (158, 138, 141, 148, 148, 146, 157, 252), the mean
was previously found to be 161, which is unrepresentative of the data set, but if the values are
arranged in ascending order the median is 148, which is a more representative measure for this
data set.
(b) It can be used when certain end values of the distribution are difficult, expensive, or
impossible to obtain; the basic example here is life data.
(c) It can be used where non-numeric data is available, unlike the mean, which can't be used in
this case. E.g., the median size of a shirt can be determined if all shirts are measured in terms
like medium, large, extra large, etc.
(d) It will often assume a value equal to one of the original items, which is considered an
advantage over the mean.
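The effect described in (a) can be illustrated with Python's standard `statistics` module. This is a small sketch using the data set quoted above; the variable names are illustrative only.

```python
from statistics import mean, median

values = [158, 138, 141, 148, 148, 146, 157, 252]

data_mean = mean(values)      # pulled upwards by the extreme value 252
data_median = median(values)  # middle of the sorted values, unaffected by 252

# Dropping the extreme value shows how strongly it distorts the mean.
trimmed_mean = mean([v for v in values if v != 252])

print(data_mean, data_median, trimmed_mean)
```

The mean is 161 while the median is 148, and once the extreme value 252 is left out the mean also falls to 148.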

Disadvantages of the Median


(a) The median is difficult to handle theoretically in more advanced statistical work so its use is
restricted to analysis at a basic level.
(b) In a grouped frequency distribution, the value of the median within the median class is only
an estimate whether it’s calculated or estimated from the graph.

THE MODE
If we want to find a measure of central location in terms of popularity (the most popular item), then
it's advisable to compute the mode. E.g., if we consider a shop selling television sets and the
manager asks himself "what price does the average television set sell at?", the best measure would
be the mode. Other cases where the mode is the best average are: shoe and clothing sizes, number
of defectives found on a production line, size of company by number of employees, etc.
So the mode is the observation/value that occurs most often (with the highest frequency) in a
given data set. For ungrouped data, the value that appears most frequently is the mode. A given
data set can have one mode (unimodal), two modes (bimodal) or more than two modes
(multimodal).

Example 1
For ungrouped data
Given the following distribution: 2, 3, 6, 3, 5, 7, 3, 2, 7, find the mode.

Solution
Arrange in ascending order:
2, 2, 3, 3, 3, 5, 6, 7, 7
By observation, 3 appears more often than any other number, so the mode is 3.
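Counting by observation can be mirrored with a frequency count. The sketch below is illustrative: the function name `modes` is hypothetical, and returning an empty list when no value repeats is a convention chosen here to match the idea that the mode may not exist.

```python
from collections import Counter


def modes(data):
    """Return all values that share the highest frequency, in ascending order."""
    counts = Counter(data)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once, so no mode exists
    return sorted(value for value, count in counts.items() if count == highest)


example_modes = modes([2, 3, 6, 3, 5, 7, 3, 2, 7])  # the data from Example 1
bimodal = modes([1, 1, 2, 2, 3])                    # made-up bimodal data
no_mode = modes([4, 5, 6])                          # made-up data, all distinct
print(example_modes, bimodal, no_mode)
```

For the data in Example 1 the function returns a single mode, 3; the two made-up lists show the bimodal and the no-mode cases.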

For grouped data


The mode can be estimated using either:
A. The interpolation formula (method).
B. The graphical method.

A. The interpolation method.

Mode = Lm + ((fm − fa) / ((fm − fa) + (fm − fb))) x C

Where Lm is the lower class boundary of the modal class, fm is the frequency of the modal class,
fa is the frequency of the class above the modal class, fb is the frequency of the class below
(down from) the modal class, and C is the modal class width.
Example 1
Given the following grouped data:

Class f
60 – 62 5
63 – 65 18
66 – 68 42
69 – 71 27
72 – 74 8
∑f = 100

Modal class = 66 – 68
Lm = 65.5
fm = 42
fa = 18
fb = 27
C = 3

So, mode = 65.5 + ((42 – 18) / ((42 – 18) + (42 – 27))) x 3
Mode = 67.346.

Note: If the modal class happens to be the first or last class in the distribution, then we
estimate the mode as:
Mode = 3 (median) – 2 (mean).
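The interpolation formula for the mode can be written as a short Python function. This is a sketch under the notes' own definitions; the function and argument names are illustrative, and the values are those of Example 1 (Lm = 65.5, fm = 42, fa = 18, fb = 27, C = 3).

```python
def grouped_mode(lower_boundary, f_modal, f_above, f_below, class_width):
    """Mode = Lm + ((fm - fa) / ((fm - fa) + (fm - fb))) x C."""
    excess_above = f_modal - f_above  # fm - fa
    excess_below = f_modal - f_below  # fm - fb
    return lower_boundary + excess_above / (excess_above + excess_below) * class_width


mode = grouped_mode(lower_boundary=65.5, f_modal=42, f_above=18,
                    f_below=27, class_width=3)
print(round(mode, 3))
```

The result rounds to 67.346, matching the value obtained by hand.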

B. Graphical method.
The mode can also be obtained graphically from a histogram by drawing diagonal lines from the upper
corners of the tallest bar to the upper corners of the adjacent bars. A perpendicular line is then
drawn from the point of intersection to the X-axis, and the mode is read off from the X-axis, i.e.

A histogram from the above table.

[Figure: histogram. Y-axis: Frequency (0 to 45); X-axis: Classes (60–62, 63–65, 66–68, 69–71, 72–74).]

So, from the above graph, the mode is approximately 67.


Example 2
The frequency distribution below represents time in seconds needed to serve a sample of
customers by cashiers at a bank in December 2004.

Time (sec) Frequency
20 – 29 6
30 – 39 16
40 – 49 21
50 – 59 29
60 – 69 25
70 – 79 22
80 – 89 11
90 – 99 7
100 – 109 4
110 – 119 0
120 – 129 2

Modal class = (50 – 59), i.e. the class with the highest frequency.
Lm = 49.5
fm = 29
fa = 21
fb = 25
C = 10

Mode = Lm + ((fm − fa) / ((fm − fa) + (fm − fb))) x C

Mode = 49.5 + ((29 – 21) / ((29 – 21) + (29 – 25))) x 10

Mode = 49.5 + (8 / 12) x 10

Mode = 56.2

Read about interpretation of the mode in business decision making.
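The whole procedure of Example 2, from locating the modal class to applying the formula, can be sketched in Python. The function name, the `gap` parameter (the distance of 1 between consecutive class limits, used to get the class boundary) and the error raised for a first or last modal class are assumptions made for this illustration.

```python
def mode_from_table(lower_limits, frequencies, class_width, gap=1):
    """Locate the modal class and apply the interpolation formula for the mode."""
    # Index of the class with the highest frequency (the first one if tied).
    i = max(range(len(frequencies)), key=frequencies.__getitem__)
    if i == 0 or i == len(frequencies) - 1:
        # The notes estimate the mode as 3(median) - 2(mean) in this case.
        raise ValueError("modal class is the first or last class")
    lower_boundary = lower_limits[i] - gap / 2           # e.g. 50 -> 49.5
    excess_above = frequencies[i] - frequencies[i - 1]   # fm - fa
    excess_below = frequencies[i] - frequencies[i + 1]   # fm - fb
    return lower_boundary + excess_above / (excess_above + excess_below) * class_width


lower_limits = [20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]
frequencies = [6, 16, 21, 29, 25, 22, 11, 7, 4, 0, 2]
service_time_mode = mode_from_table(lower_limits, frequencies, class_width=10)
print(round(service_time_mode, 1))
```

Rounded to one decimal place, this reproduces the mode of 56.2 seconds found above.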
Note: The mode is strongly affected by the most popular class when the distribution is
significantly skewed. Sometimes it might not exist, if all items have different values, or it might
not be unique, e.g. when two or more values share the highest frequency. The mode can't be used
for further statistical analysis since it has no natural measure of dispersion.
➢ It’s usually used as an alternative to the mean and median when the situation calls for the
most popular item in the data set.
➢ Easy to understand, not difficult to calculate and can be used when the distribution has open
ended classes.
➢ Although the mode usually ignores isolated extreme values, it's thought to be too much
affected by the most popular class when the distribution is significantly skewed.
➢ Sometimes it might not exist, when the items all have different values, or it might not be
unique, when two or more values share the highest frequency.
➢ Unlike the mean and the median, it has no natural measure of dispersion to go with it which
is a particular disadvantage in most cases where further analysis is required.
➢ Like the median, the mode is not used for further statistical work.

Revision questions
Question 1
a) Define an arithmetic mean and mention any 5 qualities of a good arithmetic mean.
b) Distinguish between the following:
(i) Mode and median
(ii) Median and range
(iii) Sample mean and population mean.
c) Nine light bulbs burned out after lasting for 867, 849, 840, 852, 666, 867, 756, 342 and
822 hours of continuous use. Find the mean, the mode, the range and the median, and also
determine what the mean would be if the second value was incorrectly recorded as 489
instead of 849.
d) When is it advisable for a business statistician to apply (i) the mode, (ii) the median,
(iii) the arithmetic mean as a measure of central tendency / location?
Question 2
a) Name two separate conditions under which the median rather than the mean would be
chosen as a measure of location and explain why?
b) What would be the advantages and disadvantages of using:-
• The arithmetic mean.
• The median and
• The mode?
c) What are the properties of any good average?

Question 3
The following are the daily numbers of newspapers sold by the Red Pepper in 90 business days.

Copies sold Number of days
20 – 24 3
25 – 29 10
30 – 34 21
35 – 39 28
40 – 44 14
45 – 49 9
50 – 54 5
Total 90

Compute the:
a) Average number of copies sold
b) Median number of copies
c) Mode
d) Draw a histogram of the distribution and estimate the mode
e) Draw a frequency polygon and estimate the median
f) Do you get the same answers in b) and e)?
