Statistics 1 2025

The document provides a comprehensive overview of statistics, detailing its definition, basic elements, branches, and types of data. It emphasizes the importance of proper data collection, organization, analysis, and interpretation for meaningful conclusions. Additionally, it distinguishes between descriptive and inferential statistics and explains various statistical terms and concepts.

Uploaded by

esthertalibita24

STATISTICS

Introduction

The field of statistics deals primarily with numerical data gathered from surveys or collected in
experiments. Its objective is to summarize such data so that the summary gives us a good indication
about some characteristics of a population or phenomenon that we wish to study. To ensure that
our conclusions are meaningful, it is necessary to subject our data to scientific analyses so that
rational decisions can be made. Hence the field of statistics is defined as the proper collection of
data, its organization into manageable and presentable form, its analysis and interpretation into
conclusions for useful purposes.

The Basic Elements of Statistics

According to the definition above, the field of statistics incorporates the following five elements.
(I) Proper Collection of Data
Since statistical analyses and decisions are based upon the raw data collected from surveys, it is
important and necessary that such data are carefully and accurately collected, accumulated and
recorded. Faulty data or faulty data collection techniques lead to wrong conclusions.
Sources of Data
Data may be available from existing published sources that may be already organized in some
presentable form. Such information is commonly referred to as secondary data.
On the other hand, the investigator may collect the data personally. This is usually done when
information about some area of inquiry has not been previously ascertained. In such cases, the
information is referred to as primary data.
(II) Organization and Classification of Data
For data collected through a survey, it is necessary to edit this data to correct any apparent
inconsistencies, ambiguities, recording errors and other mistakes that can enter into the actual
computations. But even before the data have been collected and edited, it is assumed that these can
be suitably classified according to some common characteristics of the population under study.
For example, we can collect data that purposefully describes the non-nationals living in Kampala
city with respect to the following characteristics, say age group, annual income, type of
employment or professional identification, level of education, family size and so on. Data
classified according to such characteristics can be presented more easily.
(III) Presentation of Data
The organized data can now be presented in the form of tables, diagrams, charts or graphs. This
presentation in an orderly form facilitates the understanding and analysis of the data.
(IV) Analysis of the Data
Data analysis makes the data more useful for drawing conclusions. The analysis may take the form of
a critical observation of the data to draw some meaningful conclusions about it, or it may involve
sophisticated computations carried out with statistical software such as Minitab, SAS, SPSS, Microfit,
Excel, R, Stata and EViews, among others. The commonly used simple statistical tools for
analyzing data include the calculation of averages (means), the dispersion of data around the
average, and percentages.
(V) Interpretation of Data
Data interpretation means drawing conclusions from the data that form the basis of decision
making. The correct interpretation requires a high degree of skill and experience and is necessary
in order to draw valid conclusions. For example, a simple statistical statement, that the average
mark in a statistics examination was 65% may not be the correct interpretation of the data if some
students scored very highly and others scored very low marks.
The Branches of Statistics
The field of statistics is primarily split up into two identifiable branches, which are descriptive
statistics and inferential statistics.
Descriptive Statistics
This is a branch of statistics that merely describes the data and consists of methods and techniques
used in collection, organization, presentation and analysis of data in order to describe the various
features and characteristics of such data. The data can be presented in form of charts, tables or
graphs in order to show certain trends, proportions, maximum and minimum values and so on. In
addition to collection, organization and presentation of data, descriptive statistics is also concerned
with data analysis so that the data can be easily understood. This involves computation of averages,
proportions and measures of spread of data around the average to gain clarity and compactness,
even though we may lose some details.
For example, the following statistics in their most summarized form describe in some way the
characteristics of the population from which they are drawn although not much can be inferred or
concluded from them nor can definite decisions be made.
I. 40% of the students in the Mathematical Economics 2 class are married.
II. The ages of students in my class range from 20 to 40 years.
III. The average mark in a statistics examination last year was 68%.
Inferential Statistics
This is a branch of statistics that deals with methods of inferring or drawing conclusions about the
characteristics of the population based upon the results of the samples taken from the same
population. The measured characteristics of the sample are known as sample statistics and the
measured characteristics of the population are known as population parameters. Inferential
statistics requires making a decision on whether the conclusions drawn from the analysis of the
sample are exactly the same as the conclusions that would be drawn from the entire population
from which the representative sample was taken.
Statistical Terms and Concepts
There are several terms that are used extensively in the field of business statistics. Among them
are the following:
(i) Data
This is a term you have come across time and time again so far, but what does “data” mean? Data
is simply a scientific term for facts, figures, information and measurements. Examples of data are
incomes of families, marks scored in an examination, number of goals scored by each football
team in the super division in a given season, the profit after tax of a company for the past five years
and so on. Before the data obtained from statistical surveys or investigations have been worked
on, they are known as raw data. If the data is written down in ascending or descending order, then
it is called ordered data. If the ordered data is arranged in arrays of rows or columns, then it is
known to be presented in an ordered array.
(ii) Population
In statistical terms, this refers to the totality of things, objects, and persons under consideration.
For example, if we are interested in knowing the percentage of adult Ugandan travelers who go to
America annually, then all those adult Ugandans who travel abroad become our population.
(iii) Sample
A sample is a portion of the total population that is considered for study and analysis. It is selected
in such a way so as to be representative of the population. For instance, if we want to study the
income pattern of lecturers at Makerere University and there are 500 lecturers, then we may take
a random sample of only 50 lecturers out of the entire population of 500 for the purpose of our
study. Then this number of 50 lecturers constitutes a sample.
Statistical Data
Data for statistical purposes is of two types, i.e. primary and secondary data, depending on the
source from which it comes.
Primary Data
These are data that are collected afresh and for the first time from the source. Primary data is
collected for a specific purpose or study or inquiry. Primary data are always original in character.
Primary data is either survey data, if it is obtained in an uncontrolled situation by asking
questions or making observations or it is experimental data, if it is obtained in a controlled
situation by making experiments.
Secondary Data
This is data that has already been collected and has already passed through the statistical
process. It can be sourced from newspapers, journals, magazines, reports, books and so on.
The advantage of using secondary data is that it is reasonably cheap and relatively faster to
obtain and use than primary data.

Qualitative and Quantitative data

Various definitions exist for the distinction between these two types of data. Non-numerical
(nominal) data is always described as being qualitative or non-metric, as the data is
described by some quality but not measured. Quantitative or metric data, which describes some
measurement or quantity, is always numerical and measured on the interval or ratio scales. All
definitions agree that interval or ratio data are quantitative.
Discrete and Continuous data
Quantitative data may be discrete or continuous. If the values which can be taken by a variable
change in steps the data are discrete. These discrete values are often, but not always, whole
numbers. If the variable can take any value within a range, it is continuous. The number of people
shopping in a supermarket is discrete but the amount they spend is continuous. The number of
children in a family is discrete but a baby's birth weight is continuous.
Summarizing data using Frequency Distributions
When a large number of measurements of a particular variable are taken, for example, the number
of units produced per employee per week, you may find some values occurring more than once.
Such data can be organized into a frequency distribution which simply lists each value of the
variable and the frequency of its occurrence. This can also be done in a tabular form, and the table
is called a frequency table.
Frequency Distributions
A frequency distribution is defined as the list of all values in the given distribution and the
corresponding number of times each individual value occurs in the distribution. The main
objective of a frequency distribution is to summarize numeric data in a logical way that enables an
overall perspective of the data to be obtained quickly and easily. Frequency distributions are of
two kinds, simple (ungrouped) frequency distributions and the grouped frequency distributions.
A simple (ungrouped) frequency distribution
A simple frequency distribution is a list of data values, with their corresponding number of
occurrences as their frequency. Simple frequency distributions are suitable in situations where we
have a limited number of discrete data values that are repetitive. Discrete values are those values
obtained by counting and usually assume whole number values. A simple frequency distribution is not
normally suitable for continuous data (i.e. where values have been measured), since the likelihood
of repeated values is small.
Example: constructing a simple frequency distribution using a tally chart
The following data record the goals scored by 20 football clubs in the Champions League.
3 4 1 2 0 6 7 3 3 4
5 5 0 2 6 3 3 5 5 2
Construct a simple frequency distribution table for these data, using a tally chart.
Solution
The tally chart is constructed by examining each value and recording its occurrence with a stroke
and from this, the frequency distribution table can be constructed as shown below:

Tally chart
Data value   Tally marks   Total
0            ||            2
1            |             1
2            |||           3
3            |||||         5
4            ||            2
5            ||||          4
6            ||            2
7            |             1
Total                      20

Simple frequency distribution table

Number of goals scored by the club   Number of football clubs
0                                    2
1                                    1
2                                    3
3                                    5
4                                    2
5                                    4
6                                    2
7                                    1
Total                                20
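
The tallying step can also be sketched in code. The following is a minimal illustration in Python, not part of the original notes; the variable names (goals, frequency) are assumptions chosen for this example, and the data are the 20 values listed in the example.

```python
from collections import Counter

# Goals scored by the 20 clubs, copied from the example.
goals = [3, 4, 1, 2, 0, 6, 7, 3, 3, 4,
         5, 5, 0, 2, 6, 3, 3, 5, 5, 2]

# Counter records how many times each distinct value occurs,
# which is exactly what the tally chart does by hand.
frequency = Counter(goals)

print("Goals  Clubs")
for value in sorted(frequency):
    print(f"{value:>5}  {frequency[value]:>5}")

# The frequencies must add up to the number of clubs.
assert sum(frequency.values()) == len(goals)
```

Sorting the keys before printing reproduces the row order of the frequency table.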
Observations
a. Three goals was the most common score: 5 of the 20 clubs scored three goals in the tournament.
b. No club scored more than seven goals in the tournament.
A Grouped Frequency Distribution
A grouped frequency distribution summarizes data into groups (classes) of values, with each class
showing the number of values that lie in it as its frequency. When the number of distinct values in
a set is large (20 or more, say) a simple frequency distribution is not appropriate, since there will
be too much information not easily assimilated. In this type of situation, a grouped frequency
distribution is used.
Forming Classes or Groups
There are basically two ways of forming classes or groups of the given values. These are as
follows:
Inclusive class intervals. Here the upper limit of the class interval is included in the given class
interval. These class intervals are used when the variable under consideration happens to be a
discrete one (i.e. takes on one figure at a time), e.g. the number of workers.
10-19 20-29 30-39 40-49 etc.
Note: here the term whose value is 19 will be put in the class interval (10-19) and the one having a
value of 39 will be put in the class interval (30-39).
Exclusive class intervals. Here, all values equal to the upper-class limit are grouped into the
next class. These class intervals are usually used when the variable under consideration is a
continuous one (capable of being measured in fractions like height, age, weight and so on).
10-20 20-30 30-40 40-50 etc
Note: here, the term whose value is, say, 40 won’t be put in (30-40) but will instead fall under
(40-50).
Definitions of terms associated with Grouped Frequency Distribution Classes.
Class limits
These are the lower and upper values of the classes as physically described in the distribution.
These class limits have been illustrated in the preceding section above.
Class boundaries
These are sometimes called mathematical or real limits. They are the lower and upper limits up to
which inclusive classes can be extended to cover the successive gaps that exist in between them.
They can also be referred to as the lower and upper values of classes that mark common points
between them. For exclusive classes, the class limits are also the class boundaries. However, for
inclusive classes, the class boundaries differ from class limits. Class boundaries are determined
as follows:
Upper class boundary = Upper class limit + ½ d
Lower class boundary = Lower class limit – ½ d
Where d is the size of the gap that exists between the successive classes.
Example 5 (Finding class boundaries)
Consider the classes, 10 – 19, and 20 – 29 and so on. The size of the gap between 19 and 20 is 1
unit. The class boundaries for the class 10 – 19 are as follows:
Upper class boundary = 19 + ½ (1) = 19 + 0.5 = 19.5
Lower class boundary = 10 – ½ (1) = 10 – 0.5 = 9.5
The same applies for 20 – 29,
Upper class boundary = 29 + ½ (1) = 29 + 0.5 = 29.5
Lower class boundary = 20 – ½ (1) = 20 – 0.5 = 19.5
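
The two boundary formulas can be sketched as a small Python function. This is an illustrative sketch only; the function name class_boundaries and its parameters are assumptions made for this example, not notation from the notes.

```python
def class_boundaries(lower_limit, upper_limit, gap):
    """Return (lower boundary, upper boundary) of an inclusive class.

    gap is d, the size of the gap between successive classes.
    """
    half_gap = gap / 2
    return lower_limit - half_gap, upper_limit + half_gap


classes = [(10, 19), (20, 29)]
# d is the distance from one class's upper limit to the next lower limit.
gap = classes[1][0] - classes[0][1]

for lower, upper in classes:
    print((lower, upper), "->", class_boundaries(lower, upper, gap))
```

Note that the upper boundary of 10 – 19 and the lower boundary of 20 – 29 coincide at 19.5, which is what closes the gap between the two classes.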

Class mark (class mid-point)

This is the mid-point of the class interval. E.g. for an interval of (60-62), the class mark is
(60 + 62) / 2 = 61
Cumulative frequency distribution
This helps us to determine the number of units/observations that lie above a given lower class
limit or below a given upper class limit in a frequency distribution. When you are interested in
the number of units below a specified value, you use a less than cumulative
frequency distribution. However, when you are interested in the number of units that lie
above a specified value, you use a more than cumulative frequency distribution.
Example
Age of workers   Frequency (fi)   Cum. freq. (less than)   Cum. freq. (more than)
15-25            5                5                        7+3+5+7+3+5 = 30
25-35            3                5+3 = 8                  7+3+5+7+3 = 25
35-45            7                5+3+7 = 15               7+3+5+7 = 22
45-55            5                5+3+7+5 = 20             7+3+5 = 15
55-65            3                5+3+7+5+3 = 23           7+3 = 10
65-75            7                5+3+7+5+3+7 = 30         7
Σfi = 30
From the above table it can be noted that:

Case 1. Cumulative frequency less than

➢ There are 5 people aged less than 25 years, 8 people aged less than 35 years, 15 people aged less
than 45 years and so on.
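
Both cumulative columns are running totals, so they can be reproduced with a short Python sketch. The list names below are assumptions for illustration; the frequencies are those of the age table above.

```python
from itertools import accumulate

classes = ["15-25", "25-35", "35-45", "45-55", "55-65", "65-75"]
frequencies = [5, 3, 7, 5, 3, 7]

# "Less than": running total from the first class downwards.
less_than = list(accumulate(frequencies))

# "More than": running total from the last class upwards,
# reversed back so that it lines up with the class order.
more_than = list(accumulate(reversed(frequencies)))[::-1]

for row in zip(classes, frequencies, less_than, more_than):
    print(*row, sep="\t")
```

The last "less than" value and the first "more than" value both equal the total frequency, which is a quick check on the table.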

Part 4. The Graphical Presentation of Numerical Frequency Distributions

Having summarized data in the form of frequency distributions, we now present it graphically in order
to give it a more visual display. Data can be presented graphically using histograms, frequency
polygons and curves, and cumulative frequency curves or ogives.
A pie chart is a circle which is divided by radial lines into sections of different
angles so that the area of a particular section is proportional to the size of the figure represented.
Example. Suppose UTL’s work force is split as per the table below.
Job description   Number employed
Labourers         21
Mechanics         38
Cleaners          9
Clerks            12
Total             80
Required: Represent the above information on a pie chart.

[Figure: pie chart of UTL's work force, with sectors for cleaners, clerks, labourers and mechanics.]
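
Before a pie chart is drawn, each figure has to be converted into a sector angle. Since a full circle is 360 degrees, a category's angle is its share of the total multiplied by 360. The sketch below, in Python, applies this to the UTL figures; the dictionary name workforce is an assumption for illustration.

```python
# Number employed in each job category, from the UTL table.
workforce = {"Labourers": 21, "Mechanics": 38, "Cleaners": 9, "Clerks": 12}

total = sum(workforce.values())

# Sector angle = (category count / total) x 360 degrees.
angles = {job: count * 360 / total for job, count in workforce.items()}

for job, angle in angles.items():
    print(f"{job:<10} {workforce[job]:>3}  {angle:6.1f} degrees")

print("Sum of angles:", sum(angles.values()))
```

The angles must add up to 360 degrees, which serves as a check before drawing.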

Bar graph / bar chart.

There are 3 types of bar graphs, i.e. a simple bar graph, a multiple bar graph, and a component
bar graph.
A simple bar graph employs bars to indicate the frequency of occurrence of observations within
each category. A vertical bar is drawn for each category and its height represents the
number of members in that particular category.
In the component bar graph, each bar is divided into components, each representing a given
category of data. In a multiple bar graph, there are several bars within a given category.

Example. Suppose UTL’s work force is split as per the table below.
Job description   Number employed
Laborers          21
Mechanics         38
Cleaners          9
Clerks            12
Total             80
Required: Represent the above information on a simple bar chart.
A simple bar chart for the above information.
[Figure: simple bar chart; x-axis: job description (Laborers, Mechanics, Cleaners, Clerks); y-axis: number employed, scale 0 to 40.]

The component bar graph. Example 2

Value of some major export commodity groups (1980 – 84), in $ millions
Year   Agriculture   Mineral   Others   Total
1980   238.1         322.4     131.3    691.8
1981   174.8         300.6     90.5     565.9
1982   174.4         302.1     93.9     570.4
1983   231.5         373.1     82.8     687.4
1984   380.6         328.5     114.6    823.7
Source: Bank of Uganda, quarterly economic bulletin
[Figure: component bar graph with the components Agriculture, Mineral and Others.]

The Multiple Bar Graphs. Example 3

Uganda export and import values ($ million)
Year   Export value   Import value
1976   451.6          351.1
1977   570.9          448.3
1978   550.4          478.4
Source: UBOS, quarterly bulletin
[Figure: multiple bar graph with bars for Export Value and Import Value.]
Graphical presentation
Graphical presentations of frequency distributions are of three kinds: histograms, frequency
polygons and cumulative frequency curves.
The Histogram
It is a plot of class frequencies against class boundaries. It consists of bars of equal width
whose heights represent the class frequencies. Unlike in a bar graph, the bars in a histogram are
attached to each other; the width of each rectangle corresponds to the class width and its height
to the class frequency.

Example
Age of workers   Frequency (fi)   Mid point (xi)
15-25            5                20
25-35            3                30
35-45            7                40
45-55            5                50
55-65            3                60
65-75            7                70
Σfi = 30
Required: Construct a histogram to display the information in the table.

A histogram for the above information
[Figure: histogram; x-axis: class intervals (ages), 15-25 to 65-75; y-axis: frequency (f).]

The Frequency Polygon

It is a plot of class frequencies against class marks or mid points. Successive points are joined by
straight lines, thereby forming a line graph. Two extra class intervals, each with zero frequency, are
added to the distribution, one at the beginning and the other at the end, for purposes of closing the polygon.

A Frequency Polygon from the previous table.
[Figure: frequency polygon; x-axis: mid points of the classes (10 to 80); y-axis: frequency (0 to 8).]

The cumulative frequency curve. (OGIVE)

The Ogive is of two types: the less than Ogive and the greater than (more than) Ogive.
Example. The cumulative frequency distribution of statistics results.
Age of workers   Frequency (fi)   Cum. freq. (less than)   Cum. freq. (more than)
15-25            1                1                        25
25-35            2                3                        24
35-45            3                6                        22
45-55            5                11                       19
55-65            7                18                       14
65-75            5                23                       7
75-85            2                25                       2
∑f = 25
Required: Draw the “less than” and “more than” cumulative frequency curves from this
distribution.

Case 1.
The less than Ogive.
Here the less than cumulative frequencies are plotted against the upper class limits/boundaries
of each class.
A less than Ogive showing statistics results.
[Figure: less than Ogive; x-axis: upper class limits/boundaries (25 to 85); y-axis: less than cumulative frequency.]

Case 2.
The more than Ogive.
Here the more than cumulative frequencies are plotted against the lower class limits/boundaries
of each class.
A more than Ogive showing statistics results.
[Figure: more than Ogive; x-axis: lower class limits/boundaries (15 to 75); y-axis: more than cumulative frequency (0 to 25).]
STATISTICAL MEASURES
Chapters 1, 2 and 3 helped us to transform a mass of raw data into a meaningful form; we organized
it into a frequency distribution and portrayed it graphically on a histogram and a frequency
polygon. We also looked at other graphical techniques like line charts/graphs and pie charts.
Chapters 5 and 6 are concerned with two other ways of describing data, i.e. measures of central
tendency and measures of dispersion. They deal with the basic analysis of univariate data (data obtained
from measuring just one attribute). Statistical measures is the name given to this type of
analysis, the measures themselves being split into measures of location/average (central
tendency), measures of dispersion and measures of skewness.

MEASURES OF CENTRAL TENDENCY / LOCATION

A measure of central tendency is a single value around which the distribution of data is
centered, or a single value which can neatly characterize the whole group of data. The purpose of
a measure of central tendency/average is to pinpoint the center of a set of observations, e.g. the
average mark in business statistics was 82%, or the average life expectancy of Ugandans is 45 years.
These values (measures of location) are very useful in presenting the overall picture of the entire
data and in making comparisons between two or more sets of data. The common measures of
central tendency are the mean, the mode and the median.
Methods of computing the measures for central tendency:

a) (i) The Arithmetic Mean (for ungrouped data)

A statistician usually takes a sample from a population in order to find out something about a
specific characteristic of the population. For example, a lecturer may wish to know the mean
height of all students in diploma. It can be time consuming and expensive to measure the heights
of all students. He might instead select a sample of only 8 students and get their heights. The mean
height of these 8 students is then used to estimate the mean height of all diploma students. The
arithmetic mean is the most widely used of all averages.
For ungrouped data, if observations are in a row form, the mean is computed by summing all the
observations and then dividing their sum by the total number of observations.
$$\bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_N}{N} = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{\sum x}{N}$$

Example 1
The net weights of the contents of a sample of five Coca-Cola bottles selected at random from the
production line are (in grams): 85.4, 85.3, 84.9, 85.4, and 85.0. What is the arithmetic mean weight
of the sample of observations?
$$\bar{x} = \frac{\sum x}{N} = \frac{85.4 + 85.3 + 84.9 + 85.4 + 85.0}{5} = \frac{426.0}{5} = 85.2$$
Therefore, the contents of a Coca-Cola bottle weigh 85.2 grams on average.
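
The same calculation can be checked with a few lines of Python. This is only a sketch; the variable names are illustrative, and the result is rounded to one decimal place because the weights themselves are given to one decimal place.

```python
# Net weights (grams) of the five sampled bottles.
weights = [85.4, 85.3, 84.9, 85.4, 85.0]

# Arithmetic mean = sum of the observations / number of observations.
mean_weight = sum(weights) / len(weights)

print(round(mean_weight, 1))
```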

Note: the mean of a sample, or any other measure based on sample data, is called a statistic.
Many studies involve all the population values. For example, if a study involved the weekly
earnings of all MTN workers, the entire set of weekly earnings would be considered as
the population, and the mean of the weekly earnings of the workers would be:

$$\mu = \frac{\sum x}{N}$$
Where µ stands for the population mean and N is the total number of observations.

As mentioned earlier, any measurable characteristic of a sample is called a statistic. Any
measurable characteristic of a population is called a parameter. A sample statistic is used to
estimate a population parameter.

Example 2
The following are estimates of the outstanding balances (in Shs ‘000) for a sample of 20 degree
students at Makerere. Determine the arithmetic mean.

137 136 135 136 138 137 136 137
135 135 137 138 136 136 138 135
136 137 136 136

Solution
Balances (x), ‘000   Tally        Frequency (f)   f x
135                  ////         4               540
136                  ///// ///    8               1,088
137                  /////        5               685
138                  ///          3               414
∑f = 20, ∑(f x) = 2,727

$$\bar{X} = \frac{\sum fx}{\sum f} = \frac{\sum fx}{N} = \frac{2{,}727}{20} = 136.35$$

The mean outstanding balance is 136.35 thousand, i.e. about 136,350 Uganda Shillings.
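
When values repeat, the mean can be computed from the frequency table instead of the raw list. The Python sketch below does this for the balances; the dictionary layout (value mapped to frequency) is an assumption made for illustration.

```python
# Balance (in Shs '000) mapped to its frequency, from the solution table.
balances = {135: 4, 136: 8, 137: 5, 138: 3}

total_f = sum(balances.values())                     # sum of f
total_fx = sum(x * f for x, f in balances.items())   # sum of f * x

mean_balance = total_fx / total_f

print(total_f, total_fx, mean_balance)
# The balances are in thousands, so convert back to shillings.
print("Mean balance in Shs:", round(mean_balance * 1000))
```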

(ii) Arithmetic mean for grouped data

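In grouped data, the arithmetic mean is given by:

$$\bar{X} = \frac{\sum fx}{\sum f} = \frac{\sum fx}{N}$$
where x is the class mark and f is the frequency of each class.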

Example 1. Given the following table

Class     Class mark (xi)   Frequency (fi)   fi xi
60 – 62   61                5                305
63 – 65   64                18               1,152
66 – 68   67                42               2,814
69 – 71   70                27               1,890
72 – 74   73                8                584
∑fi = 100, ∑fi xi = 6,745

$$\bar{X} = \frac{\sum fx}{\sum f} = \frac{6{,}745}{100} = 67.45$$
Example 2: For grouped data
The following are the selling prices (in thousands of $) of 20 vehicles sold by Ramzan motors in the
past 7 months. What is the estimated mean selling price?

85 75 66 43 40
88 80 56 56 67
87 83 65 53 75
83 83 52 44 45

Solution
Choose suitable classes, e.g. starting with (40 – 49)
Selling price in ‘000 $   Class mark (x)   Tally      Frequency (f)   f x
40 – 49                   44.5             ////       4               178
50 – 59                   54.5             ////       4               218
60 – 69                   64.5             ///        3               193.5
70 – 79                   74.5             //         2               149
80 – 89                   84.5             ///// //   7               591.5
∑f = 20, ∑fx = 1,330

$$\bar{X} = \frac{\sum fx}{\sum f} = \frac{1{,}330}{20} = 66.5$$
i.e. 66.5 thousands of $.
Interpretation: For the past 7 months, the average selling price of 20 vehicles was approximately
$ 66.5 thousand.
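
The grouping and the grouped mean can be reproduced with a short Python sketch. It is an illustration under the same class choice as the solution (inclusive classes 40 – 49 to 80 – 89); the names prices, classes and table are assumptions for this example. For comparison it also computes the mean of the 20 raw prices, which shows that the grouped mean is an estimate.

```python
# Selling prices in thousands of $, from the example.
prices = [85, 75, 66, 43, 40, 88, 80, 56, 56, 67,
          87, 83, 65, 53, 75, 83, 83, 52, 44, 45]

# Inclusive classes chosen in the solution.
classes = [(40, 49), (50, 59), (60, 69), (70, 79), (80, 89)]

table = []
for lower, upper in classes:
    mark = (lower + upper) / 2                       # class mark x
    f = sum(lower <= p <= upper for p in prices)     # class frequency f
    table.append((lower, upper, mark, f, mark * f))  # last entry is f * x

total_f = sum(row[3] for row in table)
total_fx = sum(row[4] for row in table)

grouped_mean = total_fx / total_f        # estimate from the grouped table
raw_mean = sum(prices) / len(prices)     # mean of the ungrouped values

print(grouped_mean, raw_mean)
```

The two results differ slightly because grouping replaces every value in a class by the class mark.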

Properties of the Arithmetic Mean

1) It can be calculated for any set of numerical data, and thus it always exists.
2) It is very sensitive to extreme values, which can make the mean unrepresentative of the data.
3) A set of numerical data has only one mean and so the mean is unique.
4) It takes into account all observations in the data set.
5) It can be used for further statistical treatment, e.g. mean for several sets of data can be
combined into an overall mean for all data.
6) It is relatively reliable in the sense that means of several samples drawn from the same
population do not vary as widely as other statistics used to estimate the population mean.
7) The sum of the deviations of a set of numbers from their mean is zero,
i.e. $$\sum (x - \bar{x}) = 0$$

8) It cannot be calculated for open-ended class intervals.
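
The property that deviations from the mean sum to zero can be checked numerically. The Python sketch below reuses the five Coca-Cola bottle weights as sample data; because decimal values are stored approximately in floating point, the sum is compared with zero using a small tolerance (the tolerance value is an assumption of this sketch).

```python
import math

# Net weights (grams) of the five sampled bottles.
weights = [85.4, 85.3, 84.9, 85.4, 85.0]

mean_weight = sum(weights) / len(weights)

# Deviation of each observation from the mean.
deviations = [x - mean_weight for x in weights]

total_deviation = sum(deviations)

# Floating-point rounding may leave a tiny non-zero remainder.
print(math.isclose(total_deviation, 0.0, abs_tol=1e-9))
```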

Advantages of the arithmetic mean.

(a) It’s easy to calculate.
(b) It can easily be manipulated to calculate other useful statistical measures.
(c) It uses the values of all the observations.
Disadvantages of the arithmetic mean.
(a) A few extreme values can distort the mean making it unrepresentative of the data set.
(b) It can’t be read from the graph.
(c) When the data is discrete, it may produce a value that appears to be unrealistic.

THE MEDIAN
It has been noted that for data containing one or two very large or very small values, the
arithmetic mean might not be very representative. The center point for such data can better
be described by a measure of central tendency called the “median”.
It is a statistical measure that divides a data set into two equal parts.

Case 1
For ungrouped data, we first arrange the items in either ascending or descending order. The
median is then the observation that falls in the middle (odd number of
observations) or the average of the two middle observations (even number of observations).

Example 1
Ungrouped data
Given the following set of data: 1, 2, 8, 9, 4, 7, 6, find the median.

Solution
Organize the data in an ascending order, i.e.
1, 2, 4, 6, 7, 8, 9.
The median = 6 (number that lies in the middle of the data set).
Example 2
Given the following data set: 1, 2, 8, 9, 4, 7, 6, 10, find the median.
Solution
Organise in the ascending order:
1, 2, 4, 6, 7, 8, 9, 10
Since there are two terms (6 & 7) in the middle, we take the average, i.e.
$$\text{Median} = \frac{6 + 7}{2} = 6.5$$
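
The rule for ungrouped data (sort, then take the middle value or the average of the two middle values) can be sketched in Python as follows. The function name median is chosen for this illustration; Python's standard library offers an equivalent statistics.median.

```python
def median(values):
    """Median of ungrouped data: middle value of the sorted list,
    or the average of the two middle values when the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([1, 2, 8, 9, 4, 7, 6]))       # odd number of observations
print(median([1, 2, 8, 9, 4, 7, 6, 10]))   # even number of observations
```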

Case 2
For grouped data, the median can be estimated using either:
A. The interpolation formula or method.
B. Graphical interpolation method.

A. The interpolation formula or method.

Steps to be taken.
1) Compute the cumulative frequency (cf).
2) Divide the total frequency N by 2 in order to identify the median class: it is the first class
whose cumulative frequency is greater than or equal to N/2.
3) Compute the median using the formula:
$$\text{Median} = L_m + \left(\frac{\frac{N}{2} - CF_b}{f_m}\right) \times C$$
Where,
Lm = Lower class boundary of the median class (middle class)
N = Total frequency (∑f)
C = Median class width
fm = Frequency of the median class
CFb = Cumulative frequency of the class before the median class

Example 1. Consider the given table and find the median.
Class     f     Cumulative frequency (cf)
60 – 62   5     5
63 – 65   18    5 + 18 = 23
66 – 68   42    5 + 18 + 42 = 65
69 – 71   27    5 + 18 + 42 + 27 = 92
72 – 74   8     5 + 18 + 42 + 27 + 8 = 100
∑f = 100

Here N/2 = 50, so the median class is 66 – 68, with Lm = 65.5, CFb = 23, fm = 42 and C = 3.
$$\text{Median} = 65.5 + \left(\frac{50 - 23}{42}\right) \times 3 = 67.43$$

Example 2
Given the following information, obtain the median class and the median.

Marks     Frequency (f)   Cumulative frequency (cf)
1 – 6     4               4
7 – 12    15              19
13 – 18   11              30
19 – 24   8               38
25 – 30   5               43
31 – 36   1               44

Note: the median class can be identified by locating the (N/2)th observation:
(N/2)th = (44/2)th = 22nd
The 22nd observation falls in the class 13 – 18, which is therefore the median class.

Since ∑f = 44,
Lm = 12.5
N/2 = 44/2 = 22
CFb = 19
fm = 11
C = 6
So, $$\text{Median} = 12.5 + \left(\frac{22 - 19}{11}\right) \times 6 = 14.14$$
Hence, the median is approximately 14.
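
The interpolation steps can be sketched as a Python function. This is a minimal sketch, not a definitive implementation: the function name grouped_median and its parameters are assumptions, all classes are assumed to have the same width, and the lower class boundaries are assumed to be supplied by the caller.

```python
def grouped_median(lower_boundaries, frequencies, class_width):
    """Estimate the median of grouped data by linear interpolation.

    The median class is the first class whose cumulative frequency
    reaches N/2; the formula is Lm + ((N/2 - CFb) / fm) * C.
    """
    n = sum(frequencies)
    if n == 0:
        raise ValueError("total frequency must be positive")
    half = n / 2
    cumulative_before = 0  # CFb for the class being examined
    for lower, f in zip(lower_boundaries, frequencies):
        if cumulative_before + f >= half:
            return lower + (half - cumulative_before) / f * class_width
        cumulative_before += f


# Example 1: classes 60-62 ... 72-74 (boundaries 59.5, 62.5, ...).
example_1 = grouped_median([59.5, 62.5, 65.5, 68.5, 71.5],
                           [5, 18, 42, 27, 8], 3)

# Example 2: classes 1-6 ... 31-36 (boundaries 0.5, 6.5, ...).
example_2 = grouped_median([0.5, 6.5, 12.5, 18.5, 24.5, 30.5],
                           [4, 15, 11, 8, 5, 1], 6)

print(round(example_1, 2), round(example_2, 2))
```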

B. Graphical interpolation method

Step 1: Compute N/2 and locate it on the Y-axis, which carries the cumulative frequencies.
Step 2: From this point, draw a horizontal line to meet the cumulative frequency curve and, at the
point of intersection, drop a perpendicular to the X-axis.
Step 3: Read off the value of the median from the X-axis.

NOTE: The median is also the value of X at which a vertical line divides a
histogram into two parts of equal area.
For now, let's consider the less than Ogive.
An OGIVE from the above table.
[Figure: less than Ogive; x-axis: upper class limits (UCL), 6 to 36; y-axis: cumulative frequencies, 0 to 50; N/2 = 22 is marked on the y-axis.]

So, from the above graph it can be seen that the median is approximately 14

Note: The graphical estimation method is usually superior to the formula estimation method
as long as a smooth cumulative frequency curve is drawn, because a smooth curve does not rely on
the straight-line (linear) interpolation within the median class that the formula uses. The median
is a positional average: it is influenced by the position of the items in the series and not by
their size, so it is not a suitable representative of a data series when the sizes of all the
items matter.

Advantages of the Median

(a) It is an appropriate alternative to the mean when extreme values are present at one or both
ends of the distribution or data set. E.g. for the data set (158, 138, 141, 148, 148, 146, 157, 252),
the mean is 161, which is unrepresentative of the data set, but if the values are arranged
in ascending order the median is 148, which is a more representative average.
(b) It can be used when certain end values of the distribution are difficult, expensive, or
impossible to obtain; the basic example here is lifetime data.
(c) It can be used where non-numeric data that can be ranked is available, unlike the mean, which
can't be used in this case. E.g., the median size of a shirt can be determined if all shirts are
measured in terms like medium, large, extra large etc.
(d) It will often assume a value equal to one of the original items, which is considered an
advantage over the mean.
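
The effect of an extreme value described in advantage (a) can be seen with Python's standard statistics module. The sketch below uses the eight values from that example; the list without the extreme value 252 is added here purely for comparison.

```python
from statistics import mean, median

values = [158, 138, 141, 148, 148, 146, 157, 252]

# Same data with the extreme value 252 left out, for comparison only.
without_extreme = [v for v in values if v != 252]

print("mean:", mean(values), "median:", median(values))
print("mean without 252:", mean(without_extreme),
      "median without 252:", median(without_extreme))
```

Dropping the single extreme value moves the mean from 161 to 148, while the median stays at 148.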

Disadvantages of the median

(a) The median is difficult to handle theoretically in more advanced statistical work, so its use
is restricted to analysis at a basic level.
(b) In a grouped frequency distribution, the value of the median within the median class is only
an estimate, whether it is calculated from the formula or read from the graph.

THE MODE
If we want a measure of central location in terms of popularity (the most popular item), then
it is advisable to compute the mode. E.g., if we consider a shop selling television sets and the
manager asks himself, "What price does the average television set sell at?", the best measure would
be the mode. Other cases where the mode is the best average are: shoe and clothing sizes, number
of defectives found on a production line, size of company by number of employees, etc.
The mode is the observation/value that occurs most often (with the highest frequency) in a
given data set. For ungrouped data, the value that appears most frequently is the mode. A given
data set can have one mode (unimodal), two modes (bimodal), or more than two modes
(multimodal).

Example 1
For ungrouped data
Given the following distribution: 2, 3, 6, 3, 5, 7, 3, 2, 7, find the mode.

Solution
Arrange in ascending order:
2, 2, 3, 3, 3, 5, 6, 7, 7
By observation, 3 appears more often than any other number, so the mode is 3.
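
The counting done by observation can also be sketched in Python. This is a minimal illustration using the data exactly as given in the question and the standard-library Counter class; the variable names are assumptions made for the example.

```python
from collections import Counter

# Distribution as given in the question.
data = [2, 3, 6, 3, 5, 7, 3, 2, 7]

counts = Counter(data)            # frequency of each distinct value
highest = max(counts.values())    # the largest frequency in the data set
modes = sorted(v for v, f in counts.items() if f == highest)

print("ordered data:", sorted(data))
print("mode(s):", modes, "with frequency", highest)
```

Collecting every value that reaches the highest frequency, rather than just the first one found, means the same sketch would also report two or more modes for a bimodal or multimodal data set.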

For grouped data

The mode can be estimated using either:
A. The interpolation formula or method.
B. The graphical method.

A. The interpolation method.

\[ \text{Mode} = L_m + \left[ \frac{f_m - f_a}{(f_m - f_a) + (f_m - f_b)} \right] \times C \]

Where Lm is the lower class boundary of the modal class, fm is the frequency of the modal class,
fa is the frequency of the class above the modal class in the table (the preceding class), fb is
the frequency of the class below the modal class (the following class), and C is the modal class
width.
Example 1
Given the following grouped data:

Class      f
60 – 62    5
63 – 65    18
66 – 68    42
69 – 71    27
72 – 74    8
∑f = 100

Modal class = 66 – 68
Lm = 65.5
fm = 42
fa = 18
fb = 27
C = 3

\[ \text{Mode} = 65.5 + \left[ \frac{42 - 18}{(42 - 18) + (42 - 27)} \right] \times 3 = 67.346 \]

Note: If the modal class happens to be the first or last class in the distribution, then we
estimate the mode as:
Mode = 3 (median) – 2 (mean).
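
The interpolation formula can be written as a small Python function and checked against Example 1. This is a sketch only: the function name grouped_mode and its parameter names are assumptions introduced for illustration.

```python
def grouped_mode(lower_boundary, f_modal, f_above, f_below, width):
    """Estimate the mode of grouped data by interpolating within the modal class."""
    d_above = f_modal - f_above   # excess over the class above (preceding class)
    d_below = f_modal - f_below   # excess over the class below (following class)
    return lower_boundary + d_above / (d_above + d_below) * width


# Example 1: Lm = 65.5, fm = 42, fa = 18, fb = 27, C = 3
mode_example_1 = grouped_mode(65.5, 42, 18, 27, 3)
print(round(mode_example_1, 3))
```

The function divides by (fm − fa) + (fm − fb), so it fails when the modal class and both of its neighbours have equal frequencies; in that situation the formula itself gives no estimate.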

B. Graphical method.
The mode can also be obtained graphically from a histogram by drawing two diagonal lines, one from
each upper corner of the tallest bar to the corresponding upper corner of the adjacent bar on the
opposite side, so that the lines cross inside the tallest bar. A perpendicular line is then drawn
from the point of intersection to the X-axis, and the mode is read off from the X-axis, i.e.

A histogram from the above table.

[Figure: Histogram of the grouped data. Y-axis: Frequency; X-axis: Classes (60–62, 63–65, 66–68, 69–71, 72–74).]

So, from the above graph, the mode is approximately 67.
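
The graphical reading can be cross-checked by computing where the two diagonals cross. The sketch below assumes the usual construction: one diagonal runs from the top-left corner of the tallest bar to the top-left corner of the following bar, the other from the top-right corner of the preceding bar to the top-right corner of the tallest bar. The function name and parameters are illustrative assumptions.

```python
def graphical_mode(lower, upper, f_modal, f_prev, f_next):
    """X-coordinate where the two diagonals on the tallest bar intersect.

    Diagonal 1 joins (lower, f_modal) to (upper, f_next).
    Diagonal 2 joins (lower, f_prev) to (upper, f_modal).
    """
    width = upper - lower
    slope1 = (f_next - f_modal) / width   # falls from left to right
    slope2 = (f_modal - f_prev) / width   # rises from left to right
    # Solve f_modal + slope1*(x - lower) == f_prev + slope2*(x - lower) for x.
    return lower + (f_modal - f_prev) / (slope2 - slope1)


# Tallest bar spans the class boundaries 65.5 to 68.5 with frequency 42;
# the neighbouring bars have frequencies 18 (before) and 27 (after).
mode_from_graph = graphical_mode(65.5, 68.5, 42, 18, 27)
print(round(mode_from_graph, 3))
```

Solving for the crossing point algebraically gives the same expression as the interpolation formula, which is why a carefully drawn histogram gives a reading close to the calculated value.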

Example 2
The frequency distribution below represents the time in seconds needed to serve a sample of
customers by cashiers at a bank in December 2004.

Time (sec)    Frequency
20 – 29       6
30 – 39       16
40 – 49       21
50 – 59       29
60 – 69       25
70 – 79       22
80 – 89       11
90 – 99       7
100 – 109     4
110 – 119     0
120 – 129     2

Modal class = (50 – 59), i.e. the class with the highest frequency.
Lm = 49.5
fm = 29
fa = 21
fb = 25
C = 10

\[ \text{Mode} = L_m + \left[ \frac{f_m - f_a}{(f_m - f_a) + (f_m - f_b)} \right] \times C \]

\[ \text{Mode} = 49.5 + \left[ \frac{29 - 21}{(29 - 21) + (29 - 25)} \right] \times 10 = 49.5 + \frac{8}{12} \times 10 \]

Mode = 56.2

Read about interpretation of the mode in business decision making.
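
The steps of Example 2 (locate the modal class, find its boundary and width, then interpolate) can be sketched in Python as follows. The list layout and variable names are assumptions made for illustration, and the sketch assumes the modal class is neither the first nor the last class in the table.

```python
# (lower limit, upper limit, frequency) for each class in Example 2
classes = [
    (20, 29, 6), (30, 39, 16), (40, 49, 21), (50, 59, 29), (60, 69, 25),
    (70, 79, 22), (80, 89, 11), (90, 99, 7), (100, 109, 4), (110, 119, 0),
    (120, 129, 2),
]

freqs = [f for _, _, f in classes]
i = freqs.index(max(freqs))               # position of the modal class
lower_limit, upper_limit, f_m = classes[i]

gap = classes[i + 1][0] - upper_limit     # gap between consecutive class limits
L_m = lower_limit - gap / 2               # lower class boundary of the modal class
C = classes[i + 1][0] - lower_limit       # modal class width
f_a = freqs[i - 1]                        # frequency of the class above (preceding)
f_b = freqs[i + 1]                        # frequency of the class below (following)

mode = L_m + (f_m - f_a) / ((f_m - f_a) + (f_m - f_b)) * C
print(L_m, C, round(mode, 1))
```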
Note: The mode is affected too much by the most popular class when the distribution is
significantly skewed. It may not exist if all items have different values, and it may not be
unique when two or more values share the highest frequency. The mode cannot be used
for further statistical analysis since it has no natural measure of dispersion.
➢ It is usually used as an alternative to the mean and median when the situation calls for the
most popular item in the data set.
➢ It is easy to understand, not difficult to calculate, and can be used when the distribution has
open-ended classes.
➢ Although the mode usually ignores isolated extreme values, it is thought to be too much
affected by the most popular class when the distribution is significantly skewed.
➢ Sometimes it might not exist, when the items all have different values, or it might not be
unique, when two or more values share the highest frequency.
➢ Unlike the mean and the median, it has no natural measure of dispersion to go with it, which
is a particular disadvantage in most cases where further analysis is required.
➢ Like the median, the mode is not used for further statistical work.
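
The cases where the mode is unique, not unique, or does not exist can be illustrated with a short Python sketch. The helper find_modes is a hypothetical function written for this illustration, and the second and third data sets are made-up values chosen only to show each case.

```python
from collections import Counter


def find_modes(data):
    """Return all values sharing the highest frequency, or [] if every value occurs once."""
    counts = Counter(data)
    highest = max(counts.values())
    if highest == 1 and len(counts) > 1:
        return []  # all items differ: treated here as "no mode"
    return sorted(v for v, f in counts.items() if f == highest)


unimodal = find_modes([2, 3, 6, 3, 5, 7, 3, 2, 7])  # data from Example 1
bimodal = find_modes([1, 1, 2, 4, 4, 5])            # hypothetical data, two modes
no_mode = find_modes([10, 20, 30, 40])              # hypothetical data, all different

print(unimodal, bimodal, no_mode)
```

Returning an empty list when all items differ is a design choice for this sketch; some software instead reports every value as a mode in that situation.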

Revision questions
Question 1
a) Define an arithmetic mean and mention any 5 qualities of a good arithmetic mean.
b) Distinguish between the following:
(i) Mode and median
(ii) Median and range
(iii) Sample mean and population mean.
c) Nine light bulbs burned out after lasting for 867, 849, 840, 852, 666, 867, 756, 342 and
822 hours of continuous use. Find the mean, the mode, the range and the median, and also
determine what the mean would be if the second value was incorrectly recorded as 489
instead of 849.
d) When is it advisable for a business statistician to apply a) the mode, b) the median, c)
the arithmetic mean as a measure of central tendency / location?
Question 2
a) Name two separate conditions under which the median rather than the mean would be
chosen as a measure of location, and explain why.
b) What would be the advantages and disadvantages of using:-
• The arithmetic mean.
• The median and
• The mode?
c) What are the properties of any good average?

Question 3
The following are the daily numbers of newspapers sold by the Red Pepper in 90 business days.

Copies sold    Number of days
20 – 24        3
25 – 29        10
30 – 34        21
35 – 39        28
40 – 44        14
45 – 49        9
50 – 54        5
Total          90

Compute the:
a) Average number of copies sold
b) Median number of copies
c) Mode
d) Draw a histogram of the distribution and estimate the mode
e) Draw a frequency polygon and estimate the median
f) Do you get the same answers in b) and e)?
