BADB1014 Quantitative Methods - Lesson 3
To describe situations, draw conclusions, or make inferences about events, one must organise the data in some
meaningful way. The most convenient method of organising data is to construct a frequency distribution. After
organising the data, the researcher must present them so they can be understood by those who will benefit from
reading the study. The most useful method of presenting the data is by constructing statistical charts and graphs.
There are many different types of charts and graphs, and each one has a specific purpose. This lesson also covers the
statistical measures used to summarise data: the mean (average), median, mode, range, variance and standard
deviation.
Introduction
Statistics
Statistics is the mathematical science that deals with the collection, analysis, and presentation of data, which
can then be used as a basis for inference and induction.
Data
Values assigned to observations or measurements
Information
Data that are transformed into useful facts that can be used for a specific purpose, such as making a decision
Branches of statistics
Descriptive statistics
• collecting, summarising, and displaying data
Inferential statistics
• making claims or conclusions about a population based on data from a sample
Population
• represents all possible subjects that are of interest in a particular study
Sample
• refers to a portion of the population that is representative of the
population from which it was selected
A frequency distribution shows the number of data observations that fall into specific intervals.
• Graphically summarise information not readily observable by merely looking at data in a table
• A class is a category (row) in a frequency distribution.
Continuous data are values that can take on any real number, including numbers that contain decimal points.
• Usually measured rather than counted
• Examples are weight, time, and distance.
Relative frequency distributions display the proportion of observations of each class relative to the total number
of observations.
• Shows the fraction of observations in each class
• Found by dividing each frequency by the total number of observations
• The fractions in a relative frequency distribution add up to 1.00.
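As a small sketch of the rule above, the Python function below divides each class frequency by the total number of observations. The frequencies 6, 12, 10 and 4 are borrowed from the page-view example at the end of this lesson; the function name `relative_frequencies` is our own.

```python
def relative_frequencies(frequencies):
    """Divide each class frequency by the total number of observations."""
    total = sum(frequencies)
    return [f / total for f in frequencies]


freqs = [6, 12, 10, 4]             # class frequencies, n = 32
rel = relative_frequencies(freqs)
print(rel)        # [0.1875, 0.375, 0.3125, 0.125]
print(sum(rel))   # 1.0
```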
A histogram is a graph showing the number of observations in each class of a frequency distribution.
Ideally, the number of classes in a frequency distribution should be between 4 and 20.
• Some data sets, particularly those with continuous data, require several values to be grouped together
in a single class.
• This grouping prevents having too many classes in the frequency distribution, which can make it difficult
to detect patterns.
Number of Classes
One method to determine the number of classes in a frequency distribution is the rule
$2^k \geq n$
where k = Number of classes
n = Number of data points
• Find the lowest value of k that satisfies the rule.
Suppose n = 50
$2^5 = 32 < 50$ (k = 5 is too small.)
$2^6 = 64 \geq 50$ (k = 6 is a good choice.)
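The search for the lowest k can be sketched as a simple loop that doubles until it reaches n. This is a minimal illustration; the function name `number_of_classes` is hypothetical, and starting at k = 1 (rather than 0) is our own assumption so that at least one class is always returned.

```python
def number_of_classes(n):
    """Return the lowest k (at least 1) with 2**k >= n."""
    k = 1
    while 2 ** k < n:
        k += 1
    return k


print(number_of_classes(50))  # 6, since 2**5 = 32 < 50 and 2**6 = 64 >= 50
```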
Class Width
Class Boundaries
Class boundaries represent the minimum and maximum values for each class.
• Choose class boundaries that are easy to read.
Easy to read (good)          Hard to read (poor)
3 to less than 6 minutes     3.21 to less than 6.21 minutes
6 to less than 9 minutes     6.21 to less than 9.21 minutes
9 to less than 12 minutes    9.21 to less than 12.21 minutes
Class Frequencies
Find class frequencies by counting and recording the number of observations in each class.
• This is easier when the data are sorted.
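A short sketch of counting class frequencies for classes of the form "a to less than b" is shown below. The waiting times are hypothetical, and the function `class_frequencies` with its `lower`, `width` and `k` parameters is our own illustration. Sorting the data first, as suggested above, lets each count be found with two binary searches.

```python
from bisect import bisect_left


def class_frequencies(data, lower, width, k):
    """Count observations in k classes of the form 'a to less than a + width'."""
    values = sorted(data)  # sorted data make each count a pair of binary searches
    counts = []
    for i in range(k):
        a = lower + i * width
        b = a + width
        # number of values v with a <= v < b
        counts.append(bisect_left(values, b) - bisect_left(values, a))
    return counts


# Hypothetical waiting times in minutes, for illustration only
times = [3.5, 4.2, 5.9, 6.0, 7.3, 8.8, 9.1, 11.4, 4.7, 10.0]
print(class_frequencies(times, lower=3, width=3, k=3))  # [4, 3, 3]
```

Values outside the range from `lower` to `lower + k * width` are not counted, so the boundaries must be chosen to cover every observation.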
The Ogive
The ogive is a line graph that plots the cumulative relative frequency distribution.
It shows the proportion of observations that are less than or equal to a given value.
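The points of an ogive can be computed as a running total of the class frequencies divided by the number of observations. This sketch reuses the page-view frequencies and upper class limits (5, 9, 13, 17) from the example at the end of this lesson; the function name is hypothetical.

```python
from itertools import accumulate


def cumulative_relative_frequencies(frequencies):
    """Running total of the class frequencies, divided by the number of observations."""
    total = sum(frequencies)
    return [c / total for c in accumulate(frequencies)]


upper_boundaries = [5, 9, 13, 17]
cum = cumulative_relative_frequencies([6, 12, 10, 4])
for b, c in zip(upper_boundaries, cum):
    print(f"below {b}: {c:.4f}")   # these (boundary, proportion) pairs are the ogive's points
```

The last value is always 1.0, because every observation lies below the final upper boundary.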
Bar Charts
Bar charts are a good tool for displaying qualitative data that have been organised in categories.
Pareto Charts
Pareto charts are bar charts that show the frequency of the categories that cause quality control problems.
Quality problem categories are shown in decreasing order of frequency.
• The most problematic categories are shown first
Pareto charts also plot the cumulative relative frequency as a line on the chart known as an ogive.
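The table behind a Pareto chart can be sketched as: sort the categories from most to least frequent, then keep a running cumulative relative frequency for the ogive line. The defect categories and counts below are hypothetical, and `pareto_table` is our own helper name.

```python
def pareto_table(counts):
    """Sort categories by frequency, largest first, and add the cumulative relative frequency."""
    total = sum(counts.values())
    rows, running = [], 0
    for category, count in sorted(counts.items(), key=lambda item: item[1], reverse=True):
        running += count
        rows.append((category, count, running / total))
    return rows


# Hypothetical defect counts, for illustration only
defects = {"Scratched": 10, "Wrong size": 25, "Dented": 5, "Missing part": 40, "Paint flaw": 20}
for category, count, cum in pareto_table(defects):
    print(f"{category:<13}{count:>4}{cum:>7.0%}")
```

Here the first two categories already account for 65% of all defects, which is the kind of pattern a Pareto chart is meant to expose.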
A stem and leaf display splits the data values into stems (the larger place values) and leaves (the smaller place
value).
7(5)|8 8 9 9 9
8(0)|0 0 0 0 1 1 2 3 3 4 4 4
8(5)|5 6 7 8
9(0)|0 2
9(5)|5
• The stem labelled 7(5) stores all the scores from 75 to 79.
• The stem labelled 8(0) stores all the scores from 80 to 84.
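The split-stem layout that the 7(5) and 8(0) labels describe (two lines per stem, leaves 0 to 4 on the first and 5 to 9 on the second) can be generated with the sketch below. It uses the scores from the display above; the function name is hypothetical, and it prints only non-empty rows.

```python
from collections import defaultdict


def split_stem_display(scores):
    """Split-stem display: leaves 0-4 go on stem(0), leaves 5-9 on stem(5)."""
    rows = defaultdict(list)
    for score in sorted(scores):
        stem, leaf = divmod(score, 10)
        half = 0 if leaf < 5 else 5
        rows[(stem, half)].append(str(leaf))
    # Only non-empty rows are returned; a full display would also list empty stems.
    return [f"{stem}({half})|{' '.join(leaves)}"
            for (stem, half), leaves in sorted(rows.items())]


scores = [78, 78, 79, 79, 79, 80, 80, 80, 80, 81, 81, 82, 83, 83, 84, 84, 84,
          85, 86, 87, 88, 90, 92, 95]
for row in split_stem_display(scores):
    print(row)
```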
The Mean
Advantages:
• Simple to calculate
• Summarises the data with a single value
Disadvantages:
• With only a summary value you lose information about the original data.
• Sample 1 with n = 3: 999, 1000, 1001 𝑥̅ = 1000
• Sample 2 with n = 3: 0, 1000, 2000 𝑥̅ = 1000
• Just knowing the mean does not tell you what the underlying data look like.
• The value of the mean is sensitive to outliers (values that are much higher or lower than most of
the data).
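Both disadvantages can be checked directly by computing the mean as the sum of the values divided by n. The sketch below uses Sample 1 and Sample 2 from above; the extra value 9000 is a hypothetical outlier added for illustration.

```python
def mean(values):
    """Sample mean: the sum of the values divided by n."""
    return sum(values) / len(values)


sample_1 = [999, 1000, 1001]
sample_2 = [0, 1000, 2000]
print(mean(sample_1), mean(sample_2))  # 1000.0 1000.0: same mean, very different data

# A single hypothetical outlier drags the mean far from most of the data
print(mean(sample_1 + [9000]))  # 3000.0
```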
The Median
The median is the value in the data set for which half the observations are higher and half the observations are
lower.
• First arrange the data in ascending order.
When there is an even number of data values, the median is halfway between the two middle values.
Example with sample of size n = 6:
145 157 170 182 204 209
Median = (170 + 182) / 2 = 176
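The two cases can be sketched in a few lines: sort the data, then take the middle value when n is odd, or the average of the two middle values when n is even. The odd case is standard practice rather than something stated on this slide, and the function name is our own.

```python
def median(values):
    """Middle value of the sorted data (average of the two middle values when n is even)."""
    ordered = sorted(values)       # first arrange the data in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]        # odd n: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # even n: halfway between the two middle values


print(median([145, 157, 170, 182, 204, 209]))  # 176.0
```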
The Mode
The mode is the value that appears most often in a data set.
• If no data value or category occurs more than once, then we say that the mode does not exist.
• More than one mode can exist if two or more values tie for the most frequent.
The mode is a particularly useful way to describe categorical data.
• The car that appears most often is Toyota (occurs 7 times), so the mode is the Toyota model.
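The rules above (no mode when nothing repeats, several modes when values tie) can be sketched with `collections.Counter`. The list of car makes is hypothetical, since the original data table is not reproduced here, and `modes` is our own helper; it returns an empty list when the mode does not exist.

```python
from collections import Counter


def modes(values):
    """Return every most-frequent value; an empty list means the mode does not exist."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # no value occurs more than once
    return [value for value, count in counts.items() if count == top]


# Hypothetical list of car makes, for illustration only
cars = ["Toyota", "Honda", "Toyota", "Proton", "Honda", "Toyota"]
print(modes(cars))              # ['Toyota']
print(modes([1, 2, 3]))         # []  (no mode)
print(modes([1, 1, 2, 2, 3]))   # [1, 2]  (two modes)
```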
Example:
Prices for 5 homes have been collected.
House Prices
$2,000,000
$500,000
$300,000
$100,000
$100,000
Sum: $3,000,000
© UNITAR International University 14 Prepared by: Zainora Hayat bin Hudi
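The house-price example illustrates all three measures of central tendency. A quick check with Python's standard `statistics` module (using `median` and `multimode`, the latter available from Python 3.8) is sketched below; the mean is computed directly as the sum divided by n.

```python
from statistics import median, multimode

prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]
mean_price = sum(prices) / len(prices)
print(sum(prices))        # 3000000
print(mean_price)         # 600000.0, pulled up by the $2,000,000 home
print(median(prices))     # 300000
print(multimode(prices))  # [100000]
```

The single expensive home lifts the mean to twice the median, which is why the median is preferred when outliers are present.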
Which Measure of Central Tendency Should You Use?
The mean is generally used as it is relatively easy to determine and most widely understood by people with little
statistical training.
If outliers are present, the median is often used, since the median is not sensitive to outliers.
• For example, median home prices may be reported for a region, because a few very expensive homes would pull
the mean upwards.
For categorical data, the mode is the only choice.
The Range
Advantages:
Example:
Used when the data set represents an entire population rather than a sample from a population
The standard deviation is a common measure of consistency in business applications, such as quality
control.
• The standard deviation measures the amount of variability around the mean.
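The sketch below computes the range, variance and standard deviation for Sample 1 and Sample 2 from the discussion of the mean. It assumes the usual definitions: range = largest value minus smallest value, and variance = sum of squared deviations from the mean divided by n − 1 for a sample or by N for an entire population. The function names are our own.

```python
import math


def variance(values, population=False):
    """Sum of squared deviations from the mean, divided by n - 1 (sample) or N (population)."""
    n = len(values)
    m = sum(values) / n
    ss = sum((x - m) ** 2 for x in values)
    return ss / n if population else ss / (n - 1)


def std_dev(values, population=False):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(variance(values, population))


for name, xs in [("Sample 1", [999, 1000, 1001]), ("Sample 2", [0, 1000, 2000])]:
    data_range = max(xs) - min(xs)  # range = largest value - smallest value
    print(f"{name}: range={data_range}, s={std_dev(xs):.2f}, "
          f"sigma={std_dev(xs, population=True):.2f}")
```

Both samples have a mean of 1000, but the sample standard deviation is 1 for Sample 1 and 1000 for Sample 2, which captures the difference the mean alone hides.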
The standard deviation is affected by the scale of the data.
• When sample means are very different, comparing standard deviations can be misleading.
The coefficient of variation, CV, expresses the standard deviation as a percentage of the mean.
• A high CV indicates high variability relative to the size of the mean.
• A low CV indicates low variability relative to the size of the mean.
A smaller coefficient of variation indicates more consistency within a set of data values.
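A minimal sketch of the CV, assuming the common definition CV = (s / x̄) × 100% with s the sample standard deviation and a positive mean. The two data sets are hypothetical: they share the same standard deviation (s = 2) but have very different means, which is exactly the case where comparing standard deviations alone is misleading.

```python
import math


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the sample mean (mean assumed positive)."""
    n = len(values)
    m = sum(values) / n
    s = math.sqrt(sum((x - m) ** 2 for x in values) / (n - 1))
    return s / m * 100


# Hypothetical data sets with the same standard deviation (s = 2) but different means
a = [10, 12, 14]
b = [100, 102, 104]
print(f"{coefficient_of_variation(a):.2f}%")  # 16.67%
print(f"{coefficient_of_variation(b):.2f}%")  # 1.96%
```

Data set b has the smaller CV, so it is the more consistent of the two relative to its mean.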
Number of pages    Frequency
1 to under 5           6
5 to under 9          12
9 to under 13         10
13 to under 17         4
The merchant would like to calculate the average number of viewed pages.
Number of pages    Midpoint (mi)    Frequency
1 to under 5             3              6
5 to under 9             7             12
9 to under 13           11             10
13 to under 17          15              4
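For grouped data, the mean is usually approximated by treating every observation in a class as if it sat at the class midpoint, (lower + upper) / 2, and then dividing Σ(mi × frequency) by n. The sketch below applies that approximation to the page-view table; the variable names are our own.

```python
# (lower limit, upper limit, frequency) for each "a to under b" class in the table above
classes = [(1, 5, 6), (5, 9, 12), (9, 13, 10), (13, 17, 4)]

midpoints = [(lower + upper) / 2 for lower, upper, _ in classes]  # 3.0, 7.0, 11.0, 15.0
n = sum(f for _, _, f in classes)                                 # 32 observations
total = sum(m * f for m, (_, _, f) in zip(midpoints, classes))    # 18 + 84 + 110 + 60 = 272
grouped_mean = total / n
print(grouped_mean)  # 8.5
```

Under this approximation, visitors viewed about 8.5 pages on average. The true mean of the raw data may differ slightly, because the individual values within each class are not known.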
- end of content –