Statistics 1 2025
Introduction
The field of statistics deals primarily with numerical data gathered from surveys or collected in
experiments. Its objective is to summarize such data so that the summary gives a good indication
of some characteristics of a population or phenomenon that we wish to study. To ensure that
our conclusions are meaningful, it is necessary to subject our data to scientific analysis so that
rational decisions can be made. Hence the field of statistics is defined as the proper collection of
data, its organization into a manageable and presentable form, and its analysis and interpretation
into conclusions for useful purposes.
Tally chart
Data value   Tally marks   Total
0            ||            2
1            |             1
2            |||           3
3            ||||          5
4            ||            2
5            ||||          4
6            ||            2
7            |             1
The class mark is the midpoint of the class interval. E.g. for an interval of (60-62), the class mark is
(60 + 62)/2 = 61.
Cumulative frequency distribution
This helps us to determine the number of units/observations that lie above a given lower class
limit or below a given upper class limit in a frequency distribution. When you are interested in
the number of units below a specified value, you use a less than cumulative
frequency distribution. When you are interested in the number of units that lie
above a specified value, you use a more than cumulative frequency distribution.
Example
Age of workers   Frequency (fi)   Cum. freq. (less than)   Cum. freq. (more than)
15-25            5                5                        7+3+5+7+3+5=30
25-35            3                5+3=8                    7+3+5+7+3=25
35-45            7                5+3+7=15                 7+3+5+7=22
45-55            5                5+3+7+5=20               7+3+5=15
55-65            3                5+3+7+5+3=23             7+3=10
65-75            7                5+3+7+5+3+7=30           7
                 Σfi=30
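The two cumulative columns in the table can be reproduced with running totals. The following is a minimal Python sketch using the frequencies from the example; the variable names are illustrative and not part of the original notes.

```python
from itertools import accumulate

# Class intervals and frequencies from the age-of-workers example.
classes = ["15-25", "25-35", "35-45", "45-55", "55-65", "65-75"]
freqs = [5, 3, 7, 5, 3, 7]

# "Less than" cumulative frequency: running total from the first class downwards.
less_than = list(accumulate(freqs))

# "More than" cumulative frequency: running total from the last class upwards,
# then flipped back into the original class order.
more_than = list(accumulate(reversed(freqs)))[::-1]

for label, f, lt, mt in zip(classes, freqs, less_than, more_than):
    print(f"{label:>6} {f:>3} {lt:>4} {mt:>4}")
```

The last "less than" value and the first "more than" value both equal Σfi, which is a quick check that the columns were accumulated correctly.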
From the above table it can be noted that:-
Example. Suppose UTL’s work force is split as per the table below.
Job description   Number employed
Laborers          21
Mechanics         38
Cleaners          9
Clerks            12
Total             80
Required:- Represent the above information on a simple bar chart.
[Figure: A simple bar chart for the above information. x-axis: Job description (Laborers, Mechanics, Cleaners, Clerks); y-axis: Number employed (0 to 40).]
[Figure: chart with the labels Mineral, Agriculture, Export Value and Import Value.]
Graphical presentation
These are divided into three types: histograms, frequency polygons and cumulative frequency curves.
The Histogram
It is a plot of class frequencies against class boundaries. Its body consists of bars whose
widths correspond to the class widths and whose heights represent the class frequencies (bars of
equal width when the classes are of equal width). Unlike in a bar graph, the bars in a histogram
are attached to each other.
Example
Required:- Construct a histogram to display the information in the table.
Age of workers   Frequency (fi)   Mid point (xi)
15-25            5                20
25-35            3                30
35-45            7                40
45-55            5                50
55-65            3                60
65-75            7                70
                 Σfi=30
[Figure: histogram of the table above. x-axis: class intervals (ages), 15-25 to 65-75; y-axis: frequency (f).]
[Figure: frequency plotted against the mid points of the classes. x-axis: mid points of the class (10 to 80); y-axis: frequency (0 to 8).]
∑f=25
Required:- Draw the “less than” and “more than” cumulative frequency curves from this
distribution.
Case 1.
The less than Ogive.
Here the less than cumulative frequencies are plotted against the upper class limits/boundaries
of each class.
[Figure: A less than Ogive showing statistics results. x-axis: upper class limits (25 to 85); y-axis: less than cumulative frequency (0 to 30).]
Case 2.
The more than Ogive.
Here the more than cumulative frequencies are plotted against the lower class limits/boundaries
of each class as shown in the next graph.
[Figure: a more than Ogive. x-axis: lower class limits (15 to 75); y-axis: more than cumulative frequency (0 to 25).]
STATISTICAL MEASURES
Chapters 1, 2 and 3 helped us to transform a mass of raw data into a meaningful form; we organized
it into a frequency distribution and portrayed it graphically on a histogram and a frequency
polygon. We also looked at other graphical techniques like line charts/graphs and pie charts.
Chapters 5 and 6 are concerned with the other two ways of describing data, i.e. measures of central
tendency and measures of dispersion. They deal with the basic analysis of univariate data (data obtained
from measuring just one attribute). Statistical measures is the name given to this type of
analysis, the measures themselves being split into measures of location/average (central
tendency), measures of dispersion and measures of skewness.
\[ \bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_N}{N} = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{\sum x}{N} \]
Example 1
The net weights of the contents of a sample of five Coca-Cola bottles selected at random from the
production line are (in grams): 85.4, 85.3, 84.9, 85.4, and 85.0. What is the arithmetic mean weight
of the sample of observations?
\[ \bar{x} = \frac{\sum x}{N} = \frac{85.4 + 85.3 + 84.9 + 85.4 + 85.0}{5} = \frac{426.0}{5} = 85.2 \]
Therefore, the contents of a Coca-Cola bottle weigh 85.2 grams on average.
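The same calculation can be checked in a few lines of Python. This is a minimal sketch; the function name arithmetic_mean is an illustrative choice, not something defined in these notes.

```python
# Net weights (grams) of the five sampled bottles from Example 1.
weights = [85.4, 85.3, 84.9, 85.4, 85.0]


def arithmetic_mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)


mean_weight = arithmetic_mean(weights)
print(round(mean_weight, 1))
```

Rounding to one decimal place matches the precision of the recorded weights.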
Note: the mean of a sample, or any other measure based on sample data, is called a statistic.
Many studies involve all the population values. For example, if a study involved the weekly
earnings of all MTN workers, the entire set of weekly earnings would be considered the
population, and the mean of the weekly earnings of the workers would be:
\[ \mu = \frac{\sum x}{N} \]
where µ stands for the population mean and N is the total number of observations.
Example 2
The following are estimated outstanding balances (in thousands of Uganda shillings) for a sample
of 20 degree students at Makerere. Determine the arithmetic mean.
Solution
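Balances (x), ‘000   Tally      Frequency (f)   fx
135                  ////       4               540
136                  //// ///   8               1,088
137                  ////       5               685
138                  ///        3               414
                                ∑f = 20         ∑fx = 2,727
\[ \bar{X} = \frac{\sum fx}{\sum f} = \frac{\sum fx}{N} = \frac{2{,}727}{20} = 136.35 \]
The mean balance is therefore 136.35 thousand, i.e. approximately 136,000 Uganda shillings.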
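For a frequency table, the mean is the sum of fx divided by the sum of f. The sketch below applies that to the balances in this example; storing the table as a dictionary and the name frequency_mean are assumptions made for illustration.

```python
# Balance (in thousands of Uganda shillings) -> frequency, from the table above.
balances = {135: 4, 136: 8, 137: 5, 138: 3}


def frequency_mean(table):
    """Mean of a frequency table: sum(f * x) / sum(f)."""
    total_fx = sum(x * f for x, f in table.items())
    total_f = sum(table.values())
    return total_fx / total_f


mean_balance = frequency_mean(balances)
print(round(mean_balance, 2))
```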
\[ \bar{X} = \frac{\sum fx}{\sum f} = \frac{\sum fx}{N} \]
\[ \bar{X} = \frac{\sum fx}{\sum f} = \frac{6{,}745}{100} = 67.45 \]
Example 2: For grouped data
The following are selling prices in (thousands of $) of 20 vehicles sold by Ramzan motors in the
past 7 months. What is the estimated mean selling price?
85 75 66 43 40
88 80 56 56 67
87 83 65 53 75
83 83 52 44 45
Solution
Choose a suitable class, e.g. (40 – 49)
Selling price in ‘000 $   x      Tally     Frequency (f)   fx
40 – 49                   44.5   ////      4               178
50 – 59                   54.5   ////      4               218
60 – 69                   64.5   ///       3               193.5
70 – 79                   74.5   //        2               149
80 – 89                   84.5   //// //   7               591.5
                                           ∑f = 20         ∑fx = 1,330
\[ \bar{X} = \frac{\sum fx}{\sum f} = \frac{1{,}330}{20} = 66.5 \text{ thousands of \$} \]
Interpretation: For the past 7 months, the average selling price of 20 vehicles was approximately
$ 66.5 thousand.
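The grouping and the estimated mean can be reproduced from the raw prices. This is a minimal sketch assuming the same class limits as the solution above; grouped_mean and class_limits are illustrative names.

```python
# Selling prices (thousands of $) of the 20 vehicles, as listed in the example.
prices = [85, 75, 66, 43, 40,
          88, 80, 56, 56, 67,
          87, 83, 65, 53, 75,
          83, 83, 52, 44, 45]

# Inclusive class limits used in the worked solution.
class_limits = [(40, 49), (50, 59), (60, 69), (70, 79), (80, 89)]


def grouped_mean(values, limits):
    """Return (rows, mean) where rows holds (midpoint, frequency) per class."""
    rows = []
    for lower, upper in limits:
        midpoint = (lower + upper) / 2
        frequency = sum(lower <= v <= upper for v in values)
        rows.append((midpoint, frequency))
    total_f = sum(f for _, f in rows)
    total_fx = sum(m * f for m, f in rows)
    return rows, total_fx / total_f


rows, mean_price = grouped_mean(prices, class_limits)
print(rows)
print(mean_price)
```

Because every value in a class is replaced by the class midpoint, the grouped mean is an estimate: the mean of the 20 raw prices is 1,326/20 = 66.3, close to but not equal to 66.5.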
i.e. \( \sum (x - \bar{x}) = 0 \)
THE MEDIAN
It has been noted that for data containing one or two very large or very small values, the
arithmetic mean might not be very representative. The center point for such problems can better
be described by a measure of central tendency called the “median”.
The median is a statistical measure that divides a data set into two equal parts.
Case 1
For ungrouped data, we first arrange the items in either ascending or descending order. The
median is then the observation that falls in the middle (odd number of
observations) or the average of the two middle observations (even number of observations).
Example 1
Ungrouped data
Given the following set of data: 1, 2, 8, 9, 4, 7, 6, find the median.
Solution
Organize the data in an ascending order, i.e.
1, 2, 4, 6, 7, 8, 9.
The median = 6 (number that lies in the middle of the data set).
Example 2
Given the following data set: 1, 2, 8, 9, 4, 7, 6, 10, find the median.
Solution
Organise in the ascending order:
1, 2, 4, 6, 7, 8, 9, 10
Since there are two terms (6 & 7) in the middle, we take the average, i.e.
Median = (6 + 7)/2 = 6.5
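Both ungrouped cases (odd and even number of observations) follow the same rule, sketched below in Python. The function name median is an illustrative choice; Python's standard library also provides statistics.median.

```python
def median(values):
    """Middle value of the sorted data, or the average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of observations: a single middle item exists.
        return ordered[middle]
    # Even number of observations: average the two middle items.
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([1, 2, 8, 9, 4, 7, 6]))      # data from Example 1
print(median([1, 2, 8, 9, 4, 7, 6, 10]))  # data from Example 2
```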
Case 2
For Grouped Data
A. The interpolation formula or method.
B. Graphical interpolation method.
\[ \text{Median} = L_m + \frac{\frac{N}{2} - CF_b}{f_m} \times c \]
\[ \text{Median} = 65.5 + \frac{50 - 23}{42} \times 3 = 67.43 \]
Example 2
Given the following information, obtain the median.
Since ∑f = 44, N/2 = 44/2 = 22
Lm = 12.5
CFb = 19
fm = 11
C = 6
\[ \text{Median} = 12.5 + \frac{22 - 19}{11} \times 6 = 14.14 \]
Hence, the median is approximately 14.
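The interpolation formula can be written as a small function. In this sketch the parameter names are illustrative, and the symbols are read in the usual way: Lm is the lower boundary of the median class, CFb the cumulative frequency of the classes before it, fm the frequency of the median class, and C the class width.

```python
def grouped_median(lower_boundary, n, cum_freq_before, median_class_freq, class_width):
    """Median = Lm + ((N/2 - CFb) / fm) * C."""
    return lower_boundary + (n / 2 - cum_freq_before) / median_class_freq * class_width


# First worked example: Lm = 65.5, N = 100, CFb = 23, fm = 42, C = 3.
print(round(grouped_median(65.5, 100, 23, 42, 3), 2))

# Example 2: Lm = 12.5, N = 44, CFb = 19, fm = 11, C = 6.
print(round(grouped_median(12.5, 44, 19, 11, 6), 1))
```

The first call reproduces 67.43, and the second gives about 14.1, in line with the conclusion that the median is approximately 14.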
NOTE: The median can also be the value of X that corresponds to the vertical line that divides a
histogram into two parts having equal areas.
But for now let's consider the less than Ogive.
[Figure: An Ogive from the above table. x-axis: upper class limits (UCL), 6 to 36; y-axis: cumulative frequencies, 0 to 50; the level N/2 = 22 is marked.]
So, from the above graph it can be seen that the median is approximately 14.
Note:- The graphical estimation method is usually superior to the formula estimation method
as long as a smooth cumulative frequency curve is drawn. This is because it does not rely on
interpolation. The median is a positional average: it is influenced by the position of
the items in the series and not by the size of the items, which is why it is not a suitable
representative of a data series in most cases.
THE MODE
If we want to find a measure of central location in terms of popularity (most popular item), then
it is advisable to compute the mode. E.g., if the manager of a shop selling television sets
asks himself “what price does the average television set sell at?”, the best measure would
be the mode. Other cases where the mode is the best average are: shoe and clothing sizes, number
of defectives found on a production line, size of company by number of employees, etc.
So the mode is the observation/value that occurs most often (with the highest frequency) in a
given data set. For ungrouped data, the value that appears most frequently is the mode. A given
data set can have one mode (unimodal), two modes (bimodal) or more than two modes
(multimodal).
Example 1
For ungrouped data
Given the following distribution: 2, 3, 6, 3, 5, 7, 3, 2, 7, find the mode.
Solution
Arrange in ascending order:
2, 2, 3, 3, 3, 5, 6, 7, 7
By mere observation, 3 appears more often than any other number, so the mode is 3.
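Counting how often each value occurs makes the mode easy to find, and also shows when a data set is bimodal or multimodal. The following is a minimal sketch; the function name modes is illustrative.

```python
from collections import Counter


def modes(values):
    """Return every value that attains the highest frequency, in ascending order."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([2, 3, 6, 3, 5, 7, 3, 2, 7]))  # data from Example 1
print(modes([1, 1, 2, 2, 3]))              # a made-up bimodal data set
```

Returning a list rather than a single number keeps the unimodal, bimodal and multimodal cases in one function.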
Class      f
60 – 62    5
63 – 65    18
66 – 68    42
69 – 71    27
72 – 74    8
           ∑f = 100
Modal class = 66 – 68
Lm = 65.5
fm = 42
fa = 18
fb = 27
C = 3
\[ \text{Mode} = 65.5 + \frac{42 - 18}{(42 - 18) + (42 - 27)} \times 3 = 67.346 \]
Note: If the modal class happens to be the first or last class in the distribution, then we
estimate the mode as: Mode = 3(median) – 2(mean).
A. Graphical method.
The mode can also be obtained graphically from a histogram by drawing lines diagonally from the upper
corners of the tallest bar to the upper corners of the adjacent bars. A vertical line is then drawn from the
point of intersection to the X-axis and the mode is read off from the X-axis, i.e.
[Figure: histogram of the classes 60–62, 63–65, 66–68, 69–71 and 72–74. x-axis: classes; y-axis: frequency (0 to 40).]
Time (sec)    Frequency
20 – 29       6
30 – 39       16
40 – 49       21
50 – 59       29
60 – 69       25
70 – 79       22
80 – 89       11
90 – 99       7
100 – 109     4
110 – 119     0
120 – 129     2
Modal class = (50 – 59), i.e. class with highest frequency.
Lm = 49.5
fm = 29
fa = 21
fb = 25
C = 10
\[ \text{Mode} = L_m + \frac{f_m - f_a}{(f_m - f_a) + (f_m - f_b)} \times C \]
\[ \text{Mode} = 49.5 + \frac{29 - 21}{(29 - 21) + (29 - 25)} \times 10 \]
\[ \text{Mode} = 49.5 + \frac{8}{12} \times 10 \]
Mode = 56.2
Read about interpretation of the mode in business decision making.
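The grouped-data mode formula translates directly into code. In this sketch the parameter names are illustrative: f_before and f_after stand for fa and fb, the frequencies of the classes immediately before and after the modal class.

```python
def grouped_mode(lower_boundary, f_modal, f_before, f_after, class_width):
    """Mode = Lm + (fm - fa) / ((fm - fa) + (fm - fb)) * C."""
    d_before = f_modal - f_before
    d_after = f_modal - f_after
    return lower_boundary + d_before / (d_before + d_after) * class_width


# Time example: Lm = 49.5, fm = 29, fa = 21, fb = 25, C = 10.
print(round(grouped_mode(49.5, 29, 21, 25, 10), 1))

# Earlier class example: Lm = 65.5, fm = 42, fa = 18, fb = 27, C = 3.
print(round(grouped_mode(65.5, 42, 18, 27, 3), 3))
```

The function assumes the modal class has a neighbour on each side; when the modal class is the first or last class, the notes estimate the mode from the median and mean instead.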
Note:- The mode is usually affected by the most popular class when the distribution is
significantly skewed. Sometimes it might not exist, if all items have different values, or might
not be unique, e.g. when two or more values have the same highest frequency. The mode can’t be used
for further statistical analysis since it has no natural measure of dispersion.
➢ It’s usually used as an alternative to the mean and median when the situation calls for the
most popular item in the data set.
➢ Easy to understand, not difficult to calculate and can be used when the distribution has open
ended classes.
➢ Although the mode usually ignores isolated extreme values, it’s thought to be too much
affected by the most popular class when the distribution is significantly skewed.
➢ Sometimes it might not exist, when the set of items all have different values, or might not be
unique, when two or more values share the highest frequency.
➢ Unlike the mean and the median, it has no natural measure of dispersion to go with it which
is a particular disadvantage in most cases where further analysis is required.
➢ Like the median, the mode is not used for further statistical work.
Revision questions
Question 1
a) Define an arithmetic mean and mention any 5 qualities of a good arithmetic mean.
b) Distinguish between the following:
(i) Mode and median
(ii) Median and range
(iii) Sample mean and population mean.
c) Nine light bulbs burned out after lasting for 867, 849, 840, 852, 666, 867, 756, 342 and
822 hours of continuous use. Find the mean, the mode, the range and the median, and also
determine what the mean would be if the second value was incorrectly recorded as 489
instead of 849.
d) When is it advisable for a business statistician to apply a) the mode, b) the median, c)
the arithmetic mean as a measure of central tendency/location?
Question 2
a) Name two separate conditions under which the median rather than the mean would be
chosen as a measure of location and explain why?
b) What would be the advantages and disadvantages of using:-
• The arithmetic mean.
• The median and
• The mode?
c) What are the properties of any good average?
Question 3
The following are the daily numbers of newspapers sold by the Red Pepper in 90 business days.
Copies sold   Number of days
20 – 24       3
25 – 29       10
30 – 34       21
35 – 39       28
40 – 44       14
45 – 49       9
50 – 54       5
Total         90
Required:-
a) Compute the average number of copies sold.
b) Compute the median number of copies.
c) Compute the mode.
d) Draw a histogram of the distribution and estimate the mode.
e) Draw a frequency polygon and estimate the median.
f) Do you get the same answers in b) and e)?