Business Statistics Chapter 2
STATISTICS
CHAPTER 2
CLASSIFICATION OF DATA
Some key terms
• Raw data: data that have been collected but not yet organized numerically.
• Arrays: An array is the arrangement of raw numerical data in ascending or
descending order of magnitude.
• Variable: A variable is any quantity or attribute whose value varies from one unit
to another. A variable may be classified as:
• i. Continuous variable – a variable that can take any value within a given range.
• ii. Discrete variable – a variable whose values are countable.
• Frequency: the number of times a given value or group occurs.
• Frequency Distribution: A tabular arrangement of data by classes together with
the corresponding class frequencies is called a frequency distribution (table). It
may be:
i. Ungrouped
ii. Grouped or
iii. Categorical
Ungrouped Frequency Distribution
• Example: The following is a record of the number of
absentees per day at a factory over 21 days.
3 1 2 4 1 4 4
2 0 3 1 2 1 0
2 1 1 1 0 0 4
SOLUTION
NO. OF ABSENTEES   TALLY       NO. OF DAYS (FREQUENCY)
0                  ////        4
1                  ///// //    7
2                  ////        4
3                  //          2
4                  ////        4
TOTAL                          21
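The tallying above can be reproduced in a few lines of Python. This is a minimal sketch: the 21 values are typed in from the record, `collections.Counter` does the counting, and `tally` is a hypothetical helper that writes strokes in groups of five. Sorting the raw values first also gives the array defined earlier.

```python
from collections import Counter

# Number of absentees per day over 21 days (raw data from the example)
absentees = [3, 1, 2, 4, 1, 4, 4,
             2, 0, 3, 1, 2, 1, 0,
             2, 1, 1, 1, 0, 0, 4]

def tally(n):
    """Tally marks in groups of five, e.g. 7 -> '///// //'."""
    groups = ["/////"] * (n // 5)
    if n % 5:
        groups.append("/" * (n % 5))
    return " ".join(groups)

array = sorted(absentees)   # raw data arranged in ascending order (an array)
freq = Counter(array)       # number of days on which each value occurs

for value in sorted(freq):
    print(value, tally(freq[value]), freq[value])
print("TOTAL", sum(freq.values()))
```

The printed rows match the table: 4, 7, 4, 2 and 4 days, totalling 21.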
Grouped Frequency Distribution
A grouped frequency distribution is a set of class intervals for a (continuous) variable
together with the associated frequencies.
• Class interval: a subdivision of the total range of values that a
(continuous) variable may take. The groups into which the
values are put are called “classes”, e.g. 41 – 49 or 50 – 59.
• Class limits: These are the end points of a class interval. In the
classes above, 41 and 50 are the lower class limits and
49 and 59 are the upper class limits.
• Class Frequency: the number of observations that fall in a given
class interval.
• Class Boundary: the values that mark the common dividing points between
successive classes; each class has a lower and an upper class boundary.
To find the class boundaries, subtract the upper class limit of the first
class from the lower class limit of the second class (i.e. 50 – 49 = 1).
Then divide the result by 2 (i.e. 1/2 = 0.5).
Finally, subtract 0.5 from every lower class limit and add 0.5 to every
upper class limit to obtain the lower class boundaries and the upper
class boundaries respectively.
Put simply, a class boundary is the dividing line between two successive classes.
Thus the class boundaries for the class 41 – 49 are 40.5 – 49.5.
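The boundary procedure above can be sketched as a small Python function. The name `class_boundaries` is hypothetical, and the sketch assumes equal gaps between successive classes and at least two classes.

```python
def class_boundaries(limits):
    """Convert (lower limit, upper limit) pairs into class boundaries.

    Half the gap between the upper limit of one class and the lower limit
    of the next is subtracted from every lower limit and added to every
    upper limit. Assumes equal gaps and at least two classes.
    """
    gap = limits[1][0] - limits[0][1]   # e.g. 50 - 49 = 1
    half = gap / 2                      # e.g. 1 / 2 = 0.5
    return [(lower - half, upper + half) for lower, upper in limits]

print(class_boundaries([(41, 49), (50, 59)]))
# [(40.5, 49.5), (49.5, 59.5)]
```

Note that the upper boundary of one class equals the lower boundary of the next (49.5), which is exactly the "dividing line" described above.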
• Class Size: the difference between the upper and lower class
boundaries of a class interval.
Class Size = Upper class boundary - Lower class boundary
For example, the class size (width) of the class 41 – 49 is 49.5 – 40.5 = 9.
CLASS LIMITS   TALLY       FREQUENCY
32 – 40        /           1
41 – 49        ///         3
50 – 58        ///// //    7
59 – 67        ////        4
68 – 76        /////       5
TOTAL                      20
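As a check on the class size formula, the sketch below computes the boundaries and class size for each class in the table above. The list layout is an assumption made for illustration; the half gap is found from the first two classes as described for class boundaries.

```python
# Class limits and frequencies from the table above
table = [((32, 40), 1), ((41, 49), 3), ((50, 58), 7), ((59, 67), 4), ((68, 76), 5)]

half_gap = (table[1][0][0] - table[0][0][1]) / 2   # (41 - 40) / 2 = 0.5

rows = []
for (lower, upper), f in table:
    lcb, ucb = lower - half_gap, upper + half_gap  # lower and upper class boundaries
    rows.append((lower, upper, lcb, ucb, ucb - lcb, f))  # class size = UCB - LCB

for lower, upper, lcb, ucb, size, f in rows:
    print(f"{lower} - {upper}  boundaries {lcb} - {ucb}  size {size}  f = {f}")
print("Total", sum(r[5] for r in rows))
```

Every class has size 9 (e.g. 40.5 – 31.5), and the frequencies total 20.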
Grouped Frequency Distribution steps
• Find the highest and lowest values.
• Find the range (highest value – lowest value)
• Select the number of classes desired.
(Sturges’ rule: $k = 1 + 3.322\log_{10} n$,
where k is the number of classes and n is the number of data values)
• Find the width by dividing the range by the number of classes and rounding
up.
• Select a starting point (usually the lowest value) as the first lower limit; add the
width repeatedly to get the remaining lower limits.
• Find the upper class limits.
• Find the boundaries.
• Tally the data, find the frequencies, and find the cumulative frequency.
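The construction steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the data are whole numbers (so each upper limit is lower limit + width - 1 and the boundaries lie 0.5 beyond the limits), Sturges' k is rounded up, and the function names `sturges_k` and `build_classes` are hypothetical. Classes are generated from the lowest value until the highest value is covered, so fewer than k classes may result.

```python
import math

def sturges_k(n):
    # Sturges' rule k = 1 + 3.322 * log10(n), rounded up to a whole number (assumption)
    return math.ceil(1 + 3.322 * math.log10(n))

def build_classes(data):
    """Return (lower limit, upper limit, lower boundary, upper boundary) per class.

    Assumes whole-number data: upper limit = lower limit + width - 1,
    and the boundaries lie 0.5 below/above the limits.
    """
    lowest, highest = min(data), max(data)        # highest and lowest values
    value_range = highest - lowest                # range
    k = sturges_k(len(data))                      # number of classes (Sturges)
    width = max(1, math.ceil(value_range / k))    # width = range / k, rounded up
    classes = []
    lower = lowest                                # starting point: the lowest value
    while lower <= highest:                       # stop once the highest value is covered
        upper = lower + width - 1                 # upper class limit
        classes.append((lower, upper, lower - 0.5, upper + 0.5))  # add boundaries
        lower += width                            # next lower limit
    return classes
```

For the 50 values of Example 2.3, n = 50 gives k = 7, the range is 18 - 1 = 17, and 17/7 ≈ 2.43 rounds up to a width of 3. The loop then stops after six classes (1–3 through 16–18), which matches the solution table.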
EXAMPLE 2.3
Construct a grouped frequency distribution for the data below.
1 2 6 7 12
2 6 9 5 13
18 7 3 15 15
17 1 14 5 4
4 16 4 5 8
5 18 5 2 6
9 11 12 1 9
10 11 4 10 2
9 18 8 8 4
7 3 2 6 14
SOLUTION
CLASS LIMITS   TALLY              CLASS BOUNDARIES   FREQUENCY
1 – 3          ///// /////        0.5 – 3.5          10
4 – 6          ///// ///// ////   3.5 – 6.5          14
7 – 9          ///// /////        6.5 – 9.5          10
10 – 12        ///// /            9.5 – 12.5         6
13 – 15        /////              12.5 – 15.5        5
16 – 18        /////              15.5 – 18.5        5
TOTAL                                                50
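The tallying and the cumulative frequency (the last of the construction steps) can be checked with a short Python sketch. The class limits are taken from the solution table; `itertools.accumulate` supplies the running total.

```python
from itertools import accumulate

# The 50 values of Example 2.3
data = [1, 2, 6, 7, 12, 2, 6, 9, 5, 13, 18, 7, 3, 15, 15, 17, 1, 14, 5, 4,
        4, 16, 4, 5, 8, 5, 18, 5, 2, 6, 9, 11, 12, 1, 9, 10, 11, 4, 10, 2,
        9, 18, 8, 8, 4, 7, 3, 2, 6, 14]

# Class limits from the solution table
classes = [(1, 3), (4, 6), (7, 9), (10, 12), (13, 15), (16, 18)]

# Frequency: count the values lying within each pair of class limits
freqs = [sum(lower <= x <= upper for x in data) for lower, upper in classes]
# Cumulative frequency: running total of the frequencies
cum = list(accumulate(freqs))

for (lower, upper), f, cf in zip(classes, freqs, cum):
    print(f"{lower:>2}-{upper:<2}  {lower - 0.5:>4} - {upper + 0.5:<4}  f={f:<2}  cf={cf}")
```

The frequencies come out as 10, 14, 10, 6, 5 and 5, and the final cumulative frequency equals the total of 50.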
Categorical Frequency Distribution
The categorical frequency distribution is used for data that can be placed in specific
categories, such as nominal- or ordinal-level data. For example, data such as political
affiliation, religious affiliation or major field of study could be put in a categorical
frequency distribution.
Example
Twenty-five army inductees were given a blood test to determine their blood type. The data
set is as follows:
A B B AB O
O O B AB B
B B O A O
A O O O AB
AB A O B A
Construct a frequency distribution for the data.
Solution
CLASS   TALLY        FREQUENCY   PERCENTAGE (f/n × 100)
A       /////        5           20
B       ///// //     7           28
O       ///// ////   9           36
AB      ////         4           16
TOTAL                25          100
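A categorical frequency distribution is just a count per category plus a percentage. The sketch below reproduces the table with `collections.Counter`; the category order A, B, O, AB follows the table, and computing `100 * f / n` keeps the percentages exact.

```python
from collections import Counter

# Blood types of the 25 inductees, as listed in the example
blood = """A B B AB O
O O B AB B
B B O A O
A O O O AB
AB A O B A""".split()

n = len(blood)
freq = Counter(blood)
for group in ["A", "B", "O", "AB"]:
    f = freq[group]
    print(group, f, 100 * f / n)   # percentage = f/n x 100
print("TOTAL", n, 100 * sum(freq.values()) / n)
```

The same few lines would tabulate any other categorical data set, such as the areas of specialization in the next example.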
Example
The areas of specialization of some selected students from the KNUST School of
Business are as follows:
HTM ACF ACF HTM LSCM ACF
LSCM LSCM ACF MIB ACF LSCM
ACF ACF LSCM HTM LSCM HTM
HTM LSCM LSCM ACF MIB MIB
MIB HTM LSCM ACF HTM HTM
CLASS LIMITS   CLASS BOUNDARIES   FREQUENCY
10 – 19        9.5 – 19.5         2
20 – 29        19.5 – 29.5        4
30 – 39        29.5 – 39.5        6
40 – 49        39.5 – 49.5        2
50 – 59        49.5 – 59.5        1
Trial example
• The number of days absent for a group of employees in 2020 is
presented in the table below. Construct a histogram for these data.
DAYS ABSENT (mid-point)   1   2   3   4   5
FREQUENCY                 2   4   2   5   1
SOLUTION
CLASS MID-POINT   CLASS BOUNDARIES   FREQUENCY
1                 0.5 – 1.5          2
2                 1.5 – 2.5          4
3                 2.5 – 3.5          2
4                 3.5 – 4.5          5
5                 4.5 – 5.5          1
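When only mid-points are given, the class width is the gap between successive mid-points, and each boundary lies half a width either side of a mid-point. The sketch below assumes equal class widths and prints a rough text histogram as a stand-in for a drawn graph; an actual chart would normally be produced with a plotting tool.

```python
# Mid-points and frequencies from the trial example
midpoints = [1, 2, 3, 4, 5]
freqs = [2, 4, 2, 5, 1]

width = midpoints[1] - midpoints[0]   # class width = gap between successive mid-points
boundaries = [(m - width / 2, m + width / 2) for m in midpoints]

# Text histogram: one '#' per unit of frequency; adjacent bars share a boundary
for (lcb, ucb), f in zip(boundaries, freqs):
    print(f"{lcb} - {ucb} | {'#' * f}")
```

Because the histogram bars are drawn on class boundaries rather than class limits, adjacent bars touch with no gaps between them.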
Cumulative Frequency Curve
• The cumulative frequency of a class is the sum of the frequency of
that class and the frequencies of all the classes preceding it.
• A table showing cumulative frequencies is called a cumulative frequency
table.
• A graph of cumulative frequency against the corresponding class boundaries is
called a cumulative frequency curve or ogive.
• Unless otherwise stated, a question on ogives normally refers to the less-than
ogive.
• Less-than ogives are plotted against the upper class boundaries.
• More-than ogives are plotted against the lower class boundaries.
Example
• Draw the ogives for the following distribution of marks
obtained by 59 students:
MARKS   0 – 10   10 – 20   20 – 30   30 – 40   40 – 50   50 – 60   60 – 70
FREQ.   4        8         11        15        12        6         3
Solution
MARKS     FREQUENCY   LESS-THAN CUMULATIVE FREQUENCY   MORE-THAN CUMULATIVE FREQUENCY
0 – 10    4           4                                59
10 – 20   8           12                               55
20 – 30   11          23                               47
30 – 40   15          38                               36
40 – 50   12          50                               21
50 – 60   6           56                               9
60 – 70   3           59                               3
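Both cumulative columns, and the points each ogive is drawn through, can be computed with `itertools.accumulate`. This sketch pairs the less-than totals with upper class boundaries and the more-than totals with lower class boundaries, as the rules above state; here the class limits already serve as boundaries because the classes are continuous.

```python
from itertools import accumulate

# Class boundaries and frequencies for the marks of 59 students
bounds = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70)]
freqs = [4, 8, 11, 15, 12, 6, 3]

less_than = list(accumulate(freqs))                  # running total from the first class
more_than = list(accumulate(reversed(freqs)))[::-1]  # running total from the last class

# Less-than ogive: cumulative frequency against the upper class boundary
less_points = [(upper, cf) for (_, upper), cf in zip(bounds, less_than)]
# More-than ogive: cumulative frequency against the lower class boundary
more_points = [(lower, cf) for (lower, _), cf in zip(bounds, more_than)]

print(less_points)
print(more_points)
```

Both columns end (or begin) at the total of 59, which is a quick check that no class was missed.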
Exploratory Data Analysis – The Stem And Leaf Display