SOB 1040 Lecture 2 - Data Organisation and Descriptive Statistics

This document provides an overview of descriptive statistics techniques for organizing and visualizing qualitative and quantitative data. Key points include: 1) Descriptive statistics describes important aspects of a data set using tables, graphs, and numerical summaries. Techniques for qualitative data include summary tables and contingency tables. Techniques for quantitative data include ordered arrays and frequency distributions. 2) Visualization of qualitative data uses bar charts and pie charts. Visualization of quantitative data uses histograms, stem-and-leaf displays, and polygons to depict distributions and compare values. 3) An example shows how to construct a histogram to analyze price data from restaurants, examining the distribution of prices and whether they fall within a normal range. The histogram

Uploaded by

Caroline Kapila

SOB 1040B: BUSINESS STATISTICS

LECTURE 2
DESCRIPTIVE STATISTICS

Business Statistics Graduate School of Business


Introduction
• There are two main branches of statistics, namely descriptive statistics
and inferential statistics.
• Descriptive statistics is the science of describing the important
aspects of a set of measurements.
• Statistical inference is the science of using a sample of measurements
to make generalizations about the important aspects of a population
of measurements. This we shall cover later.
• We now focus on descriptive statistics. The techniques of descriptive
statistics include tabular and graphical methods, and numerical
methods.
Organising Qualitative Data
• You can organise qualitative data using the following:
1. The summary table. It presents tallied responses as frequencies or
percentages for each category, usually for a single variable. Listing the
frequency, amount, or percentage for each category in a separate column
helps you see the differences among the categories.
2. The Contingency Table. A contingency table allows you to study
patterns that may exist between the responses of two or more
categorical variables. This type of table cross-tabulates, or tallies
jointly, the responses of the categorical variables.



• Contingency table for the type of health facility used by students of
BBA 3415 whenever they got sick and whether a fee is charged.

                                  Fee charged
Type of health facility           Yes     No     Total
Government hospital/clinic         34     53        87
Private hospital/clinic            20     77        97
Total                              54    130       184


Organising Quantitative Data
• You can organise quantitative data using the following:
1. The Ordered Array. This is the arranging of the values of a numerical
variable in rank order, from the smallest value to the largest value.
An ordered array helps you get a better sense of the range of values
in your data and is particularly useful when you have more than a
few values.
2. The Frequency Distribution. A frequency distribution summarizes
numerical values by tallying them into a set of numerically ordered
classes. Classes are groups that represent a range of values, called a
class interval. Each value can be in only one class and every value
must be contained in one of the classes.

• To create a useful frequency distribution, you must think about how
many classes are appropriate for your data and also determine a
suitable width for each class interval.
• In general, a frequency distribution should have at least 5 classes but
no more than 15 classes because having too few or too many classes
provides little new information.
• To determine the number of classes, we use the formula:

$$\text{Number of classes} = \sqrt{\text{Number of observations}}$$

• To determine the class width, we use the formula:

$$\text{Interval width} = \frac{\text{Highest value} - \text{Smallest value}}{\text{Number of classes}}$$


Visualizing Qualitative Data
• When data are qualitative, we use names to identify the different
categories (or classes).
• Often, we summarize qualitative data by using a frequency
distribution, bar charts, and pie charts.
1. Frequency distribution. As earlier discussed, this is a table that
summarizes the number (or frequency) of items in each of several
nonoverlapping classes.
• We can also summarize the proportion (or fraction) of items in each class by
using a relative frequency distribution.
• Relative frequency = (frequency of the class) / n, where n is the total number of items.


Examples of frequency distributions
• Frequency distribution and relative frequency distribution of Hungry Lion outlets

Hungry Lion outlets   Frequency   Relative frequency   Percent frequency
Kitwe                     8             0.16                 16%
Chipata                   2             0.04                  4%
Ndola                     9             0.18                 18%
Lusaka                   19             0.38                 38%
Livingstone               4             0.08                  8%
Kabwe                     8             0.16                 16%
Total                    50             1.00                100%
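The relative and percent frequencies in the Hungry Lion table can be reproduced with a few lines of Python. This is a minimal sketch: the counts are copied from the table, while the variable names are illustrative and not part of the lecture.

```python
# Outlet counts copied from the frequency distribution table.
frequencies = {
    "Kitwe": 8, "Chipata": 2, "Ndola": 9,
    "Lusaka": 19, "Livingstone": 4, "Kabwe": 8,
}

n = sum(frequencies.values())  # total number of outlets

# Relative frequency = frequency of the class / n
relative = {city: f / n for city, f in frequencies.items()}

for city, f in frequencies.items():
    # Columns: outlet, frequency, relative frequency, percent frequency
    print(f"{city:<12}{f:>4}{relative[city]:>8.2f}{relative[city]:>8.0%}")
```

The relative frequencies always sum to 1, which is a quick check that no class was missed.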


2. Bar chart. A bar chart is a graphic that depicts a frequency, relative
frequency, or percent frequency distribution.
• When the score categories (X values) are measurements from a
nominal or an ordinal scale, the graph should be a bar graph.

[Bar chart: percentage (%) by form of payment]
Form of payment        Percentage (%)
Cash                        15
Check                       54
Electronic/online           28
Other/don't know             3


3. Pie chart. A pie chart is another graphic that can be used to depict a
frequency distribution.
• When constructing a pie chart, we first draw a circle to represent the
entire data set.
• We then divide the circle into sectors or “pie slices” based on the
relative frequencies of the classes.

[Pie chart of the outlets by town: Kitwe, Chipata, Ndola, Lusaka, Livingstone, Kabwe]


Visualizing Quantitative Data

• Among the charts you use to visualize numerical data are the stem-
and-leaf display, the histogram, the percentage polygon, and the
cumulative percentage polygon (ogive).

• We begin by looking at the histogram.

1. Histogram. This is a bar chart for grouped numerical data in which
you use vertical bars to represent the frequencies or percentages in
each group. It is a bar chart such that each bar represents the
number of data values included in a given range of data (or class).
• In a histogram, there are no gaps between adjacent bars. You
display the variable of interest along the horizontal (X ) axis.
• The vertical (Y ) axis represents either the frequency or the
percentage of values per class interval.
• Let us begin by looking at an example.



Hypothetical data for the price of meals at various restaurants in Lusaka
(ID = restaurant ID; Price = price per plate in ZMW)

ID   Price     ID   Price     ID   Price     ID   Price
 1    134      11    130      21    117      31    120
 2    126      12    132      22    146      32    143
 3    142      13    136      23    155      33    142
 4    130      14    137      24    137      34     85
 5    128      15    124      25    104      35    121
 6    121      16    143      26    130      36    136
 7    130      17    136      27    130      37    138
 8    130      18    147      28    125      38    132
 9    132      19    144      29    142      39     80
10    128      20    152      30    157      40    120
• We can ask the following questions.
1. How are prices per plate for restaurants distributed?
2. Suppose restaurants normally charge between K120 and K145 per plate.
Are the prices reported in our sample typically within this normal
price range?

• To make sense of the data, we visualize it using the frequency histogram.


How to construct a histogram

• Step 1: Determine the number of bars (classes) for the histogram

$$N_{bars} = \sqrt{\text{Number of observations}}$$

• Step 2: Determine the class width (width of each bar)

$$\text{Interval width (class width)} = \frac{\text{Highest value} - \text{Smallest value}}{N_{bars}}$$


• Step 3: Generate a frequency distribution table (group the data into
intervals and record the number of data points in each class).
• What do you do with data points on the boundaries? Adopt one rule and
apply it consistently, so that each value falls in exactly one class.

• Step 4: Construct the histogram showing frequencies on the y-axis and
group intervals on the x-axis (ensure that there are no gaps between the
bars).
Example 1
• Using the price data for a sample of restaurants in Lusaka do the
following.
1. Construct a histogram.
2. Comment on the following:
a) Are the majority of restaurants’ prices above or below K119 per
plate?
b) Assuming the expected normal price for restaurants is between
K120 and K160 per plate, are the sampled restaurants charging
below the normal expected price range?



Example 1 Solution
• Number of bars for the histogram:

$$N_{bars} = \sqrt{40} = 6.32$$

• Round off to 6 bars.
• Class width:

$$\text{Width} = \frac{157 - 80}{6} = 12.8$$

• Round off to a class width of 13.

Class categories   Frequency
80 – 93                2
93 – 106               1
106 – 119              1
119 – 132             18
132 – 145             13
145 – 157              5
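The steps for Example 1 can be sketched in Python. Two details here are assumptions rather than rules stated in the lecture: the class width is rounded up, and a value that falls on a boundary is counted in the lower class (so 132 goes into 119 – 132). That boundary rule is the one that reproduces the frequencies in the solution table. The function name `class_index` is illustrative.

```python
import math

# Price per plate (ZMW) for the 40 restaurants, copied from the data table.
prices = [134, 126, 142, 130, 128, 121, 130, 130, 132, 128,
          130, 132, 136, 137, 124, 143, 136, 147, 144, 152,
          117, 146, 155, 137, 104, 130, 130, 125, 142, 157,
          120, 143, 142, 85, 121, 136, 138, 132, 80, 120]

# Step 1: number of classes = square root of the number of observations, rounded.
n_classes = round(math.sqrt(len(prices)))

# Step 2: class width = (highest - smallest) / number of classes, rounded up.
lowest, highest = min(prices), max(prices)
width = math.ceil((highest - lowest) / n_classes)

def class_index(value):
    """Assumed rule: a boundary value belongs to the lower class,
    and the smallest value belongs to the first class."""
    return max(0, (value - lowest + width - 1) // width - 1)

# Step 3: tally each price into its class.
frequencies = [0] * n_classes
for price in prices:
    frequencies[class_index(price)] += 1

for i, f in enumerate(frequencies):
    lower = lowest + i * width
    upper = min(lower + width, highest)
    print(f"{lower} - {upper}: {f}")
```

Changing the boundary rule (for example, counting a boundary value in the upper class) moves the three prices of 132 into the next class, so the rule should be stated whenever a frequency table is reported.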


Solution cont…
[Figure: frequency histogram. Horizontal axis: price per plate categories (80 – 93 to 145 – 157). Vertical axis: frequency.]


Solution cont…
• We can see that the majority of the prices are above 119.
• Assuming restaurants typically charge between K120 and K160 per
plate, our data show that there are a few prices that stray
relatively far from the normal price charged per plate.


Relative Frequency Histograms
• A relative frequency histogram is a bar chart such that each bar
represents the proportion of the sample falling within a particular
range of data (or class).
• Relative frequency is determined by dividing each class frequency
by the total number of observations in the sample.

Class categories   Frequency   Relative frequency
80 – 93                2            0.05
93 – 106               1            0.025
106 – 119              1            0.025
119 – 132             18            0.45
132 – 145             13            0.325
145 – 157              5            0.125


[Figure: histogram of the price per plate categories (80 – 93 to 145 – 157); vertical axis labelled Frequency.]


Relative Frequency Histograms: Example (continued)
• 32.5% of prices are between 132 and 145.
• Prices are skewed to the left.
• The modal class of prices is from 119 to 132.


In-class exercise: Histogram
• Nchelenge is a supervisor at one of the very large mining plants in
Solwezi. During the month of August 2023, he obtained the completion
times (in seconds) of a random sample of one hundred and ten employees
($n = 110$) on a particular task. The goal was to complete the task in
less than 4.5 minutes (270 seconds). The table below contains these
times (in seconds). Based on the data in the table, carry out an
analysis and do the following:
a) Develop the frequency distribution table.
b) Construct the histogram based on your frequency distribution table.
c) Comment on the shape of your histogram.
d) What do the data indicate?
e) Provide a recommendation to the supervisor based on what your data
indicate.


• Table showing completion times in seconds
271 236 294 252 254 263 266 222 262 278 288
262 237 247 282 224 263 267 254 271 278 263
262 288 247 252 264 263 247 225 281 279 238
252 242 248 263 255 294 268 255 272 271 291
263 242 288 252 226 263 269 227 273 281 267
263 244 249 252 256 263 252 261 245 252 294
288 245 251 269 256 264 252 232 275 284 252
263 274 252 252 256 254 269 234 285 275 263
263 246 294 252 231 265 269 235 275 288 294
263 247 252 269 261 266 269 236 276 248 299
2. Polygon. Here a dot is centered above each score so that the height
of the dot corresponds to the frequency. The dots are then
connected by straight lines. An additional line is drawn at each end
to bring the graph back to a zero frequency.



3. Stem-and-leaf display. This is a kind of graph that places the
measurements in order from smallest to largest and allows the
analyst to simultaneously see all of the measurements in the data
set and see the shape of the data set’s distribution.
• When constructing a stem-and-leaf display, there are no rules that
dictate the number of stem values (rows) that should be used.
• The stem-and-leaf display is advantageous over the histogram
because it allows us to actually see the measurements in the data
set in addition to the distribution’s shape



Steps in constructing the stem-and-leaf display
1. Decide what units will be used for the stems and the leaves. Each
leaf must be a single digit and the stem values will consist of
appropriate leading digits. As a general rule, there should be
between 5 and 20 stem values.
2. Place the stem values in a column to the left of a vertical line with
the smallest value at the top of the column and the largest value at
the bottom.
3. To the right of the vertical line, enter the leaf for each measurement
into the row corresponding to the proper stem value. Each leaf
should be a single digit—these can be rounded values that were
originally more than one digit if we are using an appropriately
defined leaf unit.

4. Rearrange the leaves so that they are in increasing order from left
to right.
In-class exercise
Construct a stem-and-leaf display for the following data.
30.8 30.8 32.1 32.3 32.7
31.7 30.4 31.4 32.7 31.4
30.1 32.5 30.8 31.2 31.8
31.6 30.3 32.8 30.7 31.9
32.1 31.3 31.9 31.7 33.0
33.3 32.1 31.4 31.4 31.5
31.3 32.5 32.4 32.2 31.6
31.0 31.8 31.0 31.5 30.6
32.0 30.5 29.8 31.7 32.3
32.4 30.5 31.1 30.7 31.4
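The four construction steps can be sketched in Python for this data set. The choice of units is an assumption: the whole-number part is used as the stem and the tenths digit as the leaf, which gives five stems (29 to 33). Other choices, such as splitting each stem in two, are equally valid.

```python
from collections import defaultdict

# The 50 measurements from the exercise table.
data = [30.8, 30.8, 32.1, 32.3, 32.7,
        31.7, 30.4, 31.4, 32.7, 31.4,
        30.1, 32.5, 30.8, 31.2, 31.8,
        31.6, 30.3, 32.8, 30.7, 31.9,
        32.1, 31.3, 31.9, 31.7, 33.0,
        33.3, 32.1, 31.4, 31.4, 31.5,
        31.3, 32.5, 32.4, 32.2, 31.6,
        31.0, 31.8, 31.0, 31.5, 30.6,
        32.0, 30.5, 29.8, 31.7, 32.3,
        32.4, 30.5, 31.1, 30.7, 31.4]

# Step 1: stem = whole-number part, leaf = tenths digit (assumed units).
stems = defaultdict(list)
for x in data:
    stem, leaf = divmod(round(x * 10), 10)
    stems[stem].append(leaf)

# Steps 2 to 4: list stems from smallest to largest, with leaves sorted.
lines = []
for stem in range(min(stems), max(stems) + 1):
    leaves = "".join(str(leaf) for leaf in sorted(stems[stem]))
    lines.append(f"{stem} | {leaves}")

print("\n".join(lines))
```

Because every leaf is kept, the original measurements can be read back from the display, which is the advantage over the histogram noted earlier.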

[Figures: stem-and-leaf displays of the data, Version A and Version B]


4. The Scatter Plot. Often, you have two numerical measurements
about the same item or individual. A scatter plot can explore the
possible relationship between those measurements by plotting the
data of one numerical variable on the horizontal, or X, axis and the
data of a second numerical variable on the vertical, or Y, axis.
• For example, a marketing analyst could study the effectiveness of
advertising by comparing advertising expenses and sales revenues
of 50 stores. Using a scatter plot, a point is plotted on the two-
dimensional graph for each store, using the X axis to represent
advertising expenses and the Y axis to represent sales revenues.



Misuses and Common Errors in Visualizing
Data
• Good graphical displays clearly and unambiguously reveal what the
data convey.
• Unfortunately, many graphs presented in the media (broadcast, print,
and online) are incorrect, misleading, or so unnecessarily complicated
that they should never be used.
• Let us look at a few examples.



• The following presents exports of wine from Australia to the United States.

• What problem can you notice here with this presentation?


• We are using a three-dimensional wineglass icon to represent the two
dimensions of exports and time. Although the wineglass presentation may
catch the eye, the data should instead be presented in a summary table or
a time-series plot.

• What problems can you notice here?
• There are several problems in this graph.
1. There is no zero point on the vertical axis.
2. The acreage of 135,326 for 1949–1950 is plotted above the
acreage of 150,300 for 1969–1970.
3. It is not obvious that the difference between 1979–1980 and
1997–1998 (71,569 acres) is approximately 3.3 times the
difference between 1979–1980 and 1969–1970 (21,775 acres).
4. There are no scale values on the horizontal axis. Years are
plotted next to the acreage totals, not on the horizontal axis.
5. The values for the time dimension are not properly spaced along
the horizontal axis. For example, the value for 1979–1980 is
much closer to 1989–1990 than it is to 1969–1970.



• The following are some guidelines for developing good graphs:
• A graph should not distort the data.
• A graph should not contain chartjunk, unnecessary adornments
that convey no useful information.
• Any two-dimensional graph should contain a scale for each axis.
• The scale on the vertical axis should begin at zero.
• All axes should be properly labelled.
• The graph should contain a title.
• The simplest possible graph should be used for a given set of data.



Numerical Techniques: (a) Measures of Central Tendency



What are measures of central tendency?
• These are numbers generated for a given data set to help us locate the
“center” of that data set. That is, the central tendency is the extent to
which the data values group around a typical or central value
• A measure of central tendency represents the center or middle of the
data. Sometimes we think of a measure of central tendency as a typical
value.
• There are three basic measures of central location that we will concern
ourselves with
• Mean
• Median
• Mode



Mean
• The mean is the most common way of finding a typical value for a list
of numbers. It is found by adding up all the values and then dividing by
the number of items. The mean is the only common measure in which all the
values play an equal role.
• One important measure of central tendency for a population of
measurements is the population mean.
• The population mean, denoted $\mu$ (the Greek letter mu, pronounced “mew”),
is the average of the population measurements.
• The sample mean is denoted $\bar{X}$, and for ungrouped data it is given below.


$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$

• Where $\bar{X}$ is the sample mean, $\sum_{i=1}^{n} X_i$ tells us that we are
adding up all the observations for our variable of interest from the first
to the $n$th, and $\frac{1}{n}$ divides the sum by the number of observations
in our sample.

$$\sum_{i=1}^{n} X_i = X_1 + X_2 + \cdots + X_n$$


Example
• Steady collected the times shown below for 10 consecutive days from
the time he got out of bed to when he left for class.

Day:              1   2   3   4   5   6   7   8   9   10
Time (minutes):  39  29  43  52  39  44  40  31  44   35

• Compute the average time and interpret it.


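Solution to example

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i = \frac{1}{10}\sum_{i=1}^{10} X_i = \frac{1}{10}\left(X_1 + X_2 + \cdots + X_{10}\right)$$

$$\bar{X} = \frac{1}{10}\left(39 + 29 + 43 + 52 + 39 + 44 + 40 + 31 + 44 + 35\right)$$

$$\bar{X} = \frac{1}{10}(396) = 39.6$$

• On average, Steady took 39.6 minutes from getting out of bed to leaving for class.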


• For grouped data, the sample and population means are given below:

$$\bar{X} = \frac{\sum_{i=1}^{n} f_i X_i}{\sum_{i=1}^{n} f_i} \quad \text{and} \quad \mu = \frac{\sum_{i=1}^{N} f_i X_i}{\sum_{i=1}^{N} f_i}$$

• Where
• $\sum_{i=1}^{n} f_i X_i$ is the sum of the frequency of each class times the class midpoint.
• $X_i$ = the midpoint of class $i$.
• $f_i$ = the frequency of class $i$.
• $\sum_{i=1}^{n} f_i$ = the sum of all frequencies.


In-class exercise
• Calculate the sample mean given the following.
Weight (grams) Frequency (f)
19.2 to 19.4 1
19.5 to 19.7 2
19.8 to 20.0 8
20.1 to 20.3 4
20.4 to 20.6 3
20.7 to 20.9 2
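
One way to check a hand calculation for this exercise is a short Python sketch of the grouped-mean formula. It assumes, as in the formula above, that each class is represented by its midpoint; the variable names are illustrative.

```python
# (lower limit, upper limit, frequency) for each weight class, from the table.
classes = [
    (19.2, 19.4, 1),
    (19.5, 19.7, 2),
    (19.8, 20.0, 8),
    (20.1, 20.3, 4),
    (20.4, 20.6, 3),
    (20.7, 20.9, 2),
]

total_frequency = sum(f for _, _, f in classes)

# Sum of frequency times class midpoint, where midpoint = (lower + upper) / 2.
weighted_sum = sum(f * (lower + upper) / 2 for lower, upper, f in classes)

grouped_mean = weighted_sum / total_frequency
print(f"Grouped sample mean: {grouped_mean:.2f} grams")
```

The result is an approximation of the true mean, because the individual weights inside each class are replaced by the class midpoint.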



Characteristics of the mean

• Only have one mean for a data set

• Simple to compute and interpret

• Best measure of central location for statistical inference

• Value of the mean is influenced by extreme measurements

• Only applicable to quantitative data



Weighted average
• The mean or ordinary average that we have demonstrated in the
previous slides assumes that data points for a particular variable are
equally important
• However, there are situations when you as an analyst would want to
compute the mean on the assumption that the data points in the data
set should not be treated equally.
• The concept of weighted average allows different weights to be
assigned to observations when computing the mean.


• With weights $w_i$ satisfying $0 \le w_i \le 1$ and $\sum_{i=1}^{n} w_i = 1$, the
weighted mean is given as:

Weighted average = sum of (weight × data item)

$$\bar{X}_w = w_1 X_1 + w_2 X_2 + \cdots + w_n X_n = \sum_{i=1}^{n} w_i X_i$$


• Alternatively, if the weights only satisfy $w_i \ge 0$ and do not
necessarily sum to 1, we obtain the same result by dividing by the sum of
the weights. In this case the weighted mean is given as:

$$\bar{X}_w = \frac{w_1 X_1 + w_2 X_2 + \cdots + w_n X_n}{w_1 + w_2 + \cdots + w_n} = \frac{\sum_{i=1}^{n} w_i X_i}{\sum_{i=1}^{n} w_i}$$


In class exercise
• Compute the ordinary and the weighted average incomes for the data
given in the table below. What do you observe?

Socio-economic status   Proportion of population (%)   Annual Income (ZMW)
Very poor                        30                          0.4
Poor                             40                          2
Middle                           20                          8
Relatively rich                   8                         40
Rich                              2                        200
Solution
• Ordinary average

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i = \frac{1}{5}\sum_{i=1}^{5} X_i = \frac{1}{5}(250.4) = 50.08$$

• Weighted average

$$\bar{X}_w = \sum_{i=1}^{5} w_i X_i = (0.3 \times 0.4) + (0.4 \times 2) + (0.2 \times 8) + (0.08 \times 40) + (0.02 \times 200) = 9.72$$

Solution cont…
• The ordinary average overstates the typical income of
individuals in the country. This is because computation of the
average is influenced by extreme values (outliers) that are on
the tail of the distribution. Since the distribution in this case
is skewed to the right, we expect the average to gravitate
towards the extreme values.
• The weighted average, on the other hand, takes into account
the proportions (weights) of the different socio-economic
groups, reflecting a more realistic picture.
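The comparison between the ordinary and the weighted average can be reproduced with a short Python sketch. The figures come from the exercise table, with the population percentages converted to proportions; the variable names are illustrative.

```python
# Annual income (ZMW) of each socio-economic group, from the table.
incomes = [0.4, 2, 8, 40, 200]

# Population shares of the same groups (30%, 40%, 20%, 8%, 2%) as proportions.
shares = [0.30, 0.40, 0.20, 0.08, 0.02]

# Ordinary average: every group counts equally.
ordinary_average = sum(incomes) / len(incomes)

# Weighted average: each income is weighted by its population share.
weighted_average = sum(w * x for w, x in zip(shares, incomes))

print(f"Ordinary average: {ordinary_average:.2f}")
print(f"Weighted average: {weighted_average:.2f}")
```

Because the shares already sum to 1, no division by the sum of the weights is needed here.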


The Geometric Mean
• When you want to measure the rate of change of a variable over
time, you need to use the geometric mean instead of the arithmetic
mean.
• The geometric mean is the $n$th root of the product of $n$ values. This is
given as:

$$\bar{X}_G = \sqrt[n]{X_1 \times X_2 \times \cdots \times X_n} = \left(X_1 \times X_2 \times \cdots \times X_n\right)^{1/n}$$


In-class exercise
• Compute the mean and geometric mean for the following
observations.
4, 6, 9, 12, 3



The Harmonic Mean
• This is the reciprocal of the arithmetic mean of the reciprocals of
observations.
• Given the observations $x_1, x_2, x_3, \ldots, x_n$, the harmonic mean is given by

$$HM = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$$

• Example
• Compute the harmonic mean for the following observations.
4, 6, 9, 12, 3


Exercise Solution

$$HM = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}} = \frac{5}{\frac{1}{4} + \frac{1}{6} + \frac{1}{9} + \frac{1}{12} + \frac{1}{3}} = \frac{5}{0.94444444} = 5.29$$
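The three means can be compared on the same observations with a short Python sketch that applies the definitions directly. It uses only the standard library; the variable names are illustrative.

```python
import math

values = [4, 6, 9, 12, 3]
n = len(values)

# Arithmetic mean: sum of the values divided by n.
arithmetic_mean = sum(values) / n

# Geometric mean: nth root of the product of the n values.
geometric_mean = math.prod(values) ** (1 / n)

# Harmonic mean: n divided by the sum of the reciprocals.
harmonic_mean = n / sum(1 / x for x in values)

print(f"Arithmetic mean: {arithmetic_mean:.2f}")
print(f"Geometric mean:  {geometric_mean:.2f}")
print(f"Harmonic mean:   {harmonic_mean:.2f}")
```

For positive values that are not all equal, the harmonic mean is smaller than the geometric mean, which in turn is smaller than the arithmetic mean.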


Median

• It is the central value; 50% of the measurements lie above the median
and 50% lie below the median

• There is only one median in a data set

• It is not influenced by extreme measures

• Applicable to both quantitative and ordinal data



How to find the median of ungrouped data
1. Put the data items in order from smallest to largest (or largest to
smallest)

2. Find the middle value


a) If $n$ is an odd number, the median is the middle data value, located at
position $(1 + n)/2$ in the ordered list.
b) If $n$ is an even number, there are two middle values instead of just one, and
$(1 + n)/2$ falls halfway between their positions, $n/2$ and $n/2 + 1$. The median
is the average of these two middle values.


In class exercise
• Compute the median income for the sample of SOB Students.
• Interpret the median computed in this case.

SOB Student   Annual Income (K)
 1               5
 2              15.25
 3              20
 4               7.5
 5               1.2
 6              12.5
 7              15
 8               7.5
 9               7.5
10              40
Solution
• Order the data set

SOB Student   Annual income (K)
 1               1.2
 2               5
 3               7.5
 4               7.5
 5               7.5
 6              12.5
 7              15
 8              15.25
 9              20
10              40

• $n = 10$, implying that the median is located at position

$$\frac{1 + 10}{2} = 5.5$$

• Since 5.5 is between 5 and 6, the median is the average of the data values
at these positions:

$$\frac{7.5 + 12.5}{2} = 10$$
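The two-step procedure for the median of ungrouped data can be sketched as a small Python function. The function name `median` is illustrative; the standard library's `statistics.median` applies the same rule.

```python
def median(values):
    """Median of ungrouped data: sort, then take the middle value
    (odd n) or the average of the two middle values (even n)."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2

# Annual incomes (K) of the 10 SOB students, in the original order.
incomes = [5, 15.25, 20, 7.5, 1.2, 12.5, 15, 7.5, 7.5, 40]

print(f"Median income: K{median(incomes)}")
```

Replacing the largest income of 40 with a much larger value leaves the result unchanged, which illustrates that the median is not influenced by extreme measurements.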
Median for grouped data
• The median for grouped data is given by the formula:

$$\text{Median} = L + \left(\frac{\frac{N}{2} - F}{f_m}\right) c$$

• Where: $L$ = lower limit of the median class, $N$ = the number of observations
in the data set, $F$ = sum of the frequencies up to but not including the
median class, $f_m$ = frequency of the median class and $c$ = width of the
class interval.


Mode
• It is the most frequent or probable measurement in a data
set

• There can be more than one Mode

• It is not influenced by extreme measures

• Good for both qualitative and quantitative data



In class exercise
• Find the mode income for the sample of SOB Students.

SOB Student   Annual Income (K)
 1               5
 2              15.25
 3              20
 4               7.5
 5               1.2
 6              12.5
 7              15
 8               7.5
 9               7.5
10              40
Solution

SOB Student   Annual Income (K)
 1               1.2
 2               5
 3               7.5
 4               7.5
 5               7.5
 6              12.5
 7              15
 8              15.25
 9              20
10              40

Income Level (K)   Frequency
 1.2                  1
 5                    1
 7.5                  3
12.5                  1
15                    1
15.25                 1
20                    1
40                    1

Most frequent income level is K7.5.
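The frequency count behind the mode can be sketched in Python with `collections.Counter`. The sketch returns a list because, as noted earlier, a data set can have more than one mode; the variable names are illustrative.

```python
from collections import Counter

# Annual incomes (K) of the 10 SOB students.
incomes = [5, 15.25, 20, 7.5, 1.2, 12.5, 15, 7.5, 7.5, 40]

# Frequency of each income level.
counts = Counter(incomes)

# Every income level that reaches the highest frequency is a mode.
highest_frequency = max(counts.values())
modes = sorted(value for value, count in counts.items()
               if count == highest_frequency)

print(f"Mode(s): {modes}, each occurring {highest_frequency} times")
```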


Mode for grouped data
• The mode for grouped data is given by the formula:

$$\text{Mode} = L + \left(\frac{d_1}{d_1 + d_2}\right) c$$

• Where: $L$ = lower limit of the modal class, $d_1$ = frequency of the modal
class minus the frequency of the previous class, $d_2$ = frequency of the
modal class minus the frequency of the following class and $c$ = width of
the class interval.
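As an illustration of the two grouped-data formulas, the following Python sketch applies them to the restaurant price frequency table from Example 1. Using that table here is a choice made for illustration only, and a common class width of 13 is assumed for every class, including the last one.

```python
# Restaurant price classes from Example 1: lower limits and frequencies.
lower_limits = [80, 93, 106, 119, 132, 145]
frequencies = [2, 1, 1, 18, 13, 5]
c = 13                      # class width (assumed equal for all classes)
N = sum(frequencies)        # number of observations

# Median class: first class whose cumulative frequency reaches N / 2.
F = 0                       # cumulative frequency before the median class
for i, f in enumerate(frequencies):
    if F + f >= N / 2:
        median_class = i
        break
    F += f
f_m = frequencies[median_class]
grouped_median = lower_limits[median_class] + (N / 2 - F) / f_m * c

# Modal class: the class with the highest frequency.
m = frequencies.index(max(frequencies))
previous_f = frequencies[m - 1] if m > 0 else 0
following_f = frequencies[m + 1] if m < len(frequencies) - 1 else 0
d1 = frequencies[m] - previous_f
d2 = frequencies[m] - following_f
grouped_mode = lower_limits[m] + d1 / (d1 + d2) * c

print(f"Grouped median: {grouped_median:.2f}")
print(f"Grouped mode:   {grouped_mode:.2f}")
```

Both results are estimates: the formulas spread the observations evenly within a class, so they will generally differ from the median and mode of the raw data.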


In-class exercise

• Use the table on slide (page) 43 to find the median and mode for that
dataset.



Symmetry and skewness

• Symmetry is the characteristic of lateral or side-to-side balance. If the
left side of a distribution is a mirror image of the right side, then the
distribution is symmetrical.

• If a distribution is clumped on one side of its range and has a long tail
on the other, then it is considered to be skewed in the direction of the
tail.


[Figure: histogram of ages of children, symmetric distribution]

• Symmetry implies that the mean and the median are both at the center: mean = median.
[Figure: histogram of ages of children, distribution skewed right]

[Figure: histogram of ages of children, distribution skewed left]
Skewness & Median vs. Mean
• If a distribution is skewed left, then there must be a clump to the
right. The median is “pulled” right. The mean is more influenced by
the extreme values in the long tail. They pull it in that direction. Thus
the mean lies to the left of the median.

[Figure: histogram of ages of children, distribution skewed left, with the mean marked to the left of the median]


Skewness & Median vs. Mean
• If a distribution is skewed right, then there must be a clump to the
left. The median is “pulled” left. The mean is more influenced by the
extreme values in the long tail. They pull it in that direction. Thus the
mean lies to the right of the median.

[Figure: histogram of ages of children, distribution skewed right, with the mean marked to the right of the median]


Why is skewness a problem and what can be done
about it when doing analysis?



Measures of Skewness

• Skewness is a measure of symmetry, or more precisely, the lack of
symmetry. A distribution, or data set, is symmetric if it looks the same
to the left and right of the center point.

• Skewness is measured by the following:

$$sk = \bar{x} - M_0 \quad \text{or} \quad sk = 3(\bar{x} - M)$$

• Where $\bar{x}$ = mean, $M_0$ = mode and $M$ = median.


Measures of Skewness

• The measures of skewness given above are absolute. We therefore sometimes
use a relative measure of skewness known as the coefficient of skewness.
The coefficient of skewness is given by:

$$\text{Coef } sk = \frac{\bar{x} - M_0}{\sigma}$$

or

$$\text{Coef } sk = \frac{3(\bar{x} - M)}{\sigma}$$
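As an illustration, both coefficients can be computed for the SOB student incomes used earlier. The sketch rests on two assumptions: $M$ is taken to be the median (the usual form of the second coefficient), and $\sigma$ is computed as the population standard deviation with `statistics.pstdev`. The variable names are illustrative.

```python
import statistics

# Annual incomes (K) of the 10 SOB students.
incomes = [1.2, 5, 7.5, 7.5, 7.5, 12.5, 15, 15.25, 20, 40]

mean = statistics.mean(incomes)
median = statistics.median(incomes)
mode = statistics.mode(incomes)
sigma = statistics.pstdev(incomes)  # assumption: population standard deviation

# Coefficient of skewness based on the mode.
coef_sk_mode = (mean - mode) / sigma

# Coefficient of skewness based on the median (assumed meaning of M).
coef_sk_median = 3 * (mean - median) / sigma

print(f"Mode-based coefficient:   {coef_sk_mode:.3f}")
print(f"Median-based coefficient: {coef_sk_median:.3f}")
```

Both coefficients are positive for this data set, which agrees with the mean lying to the right of the median in a distribution that is skewed right.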
Which measure of central tendency should I use?
• Mean requires quantitative data (continuous and discrete)
• Median works with quantitative or ordinal data
• Mode works with quantitative, ordinal, or nominal data

           Quantitative   Ordinal   Nominal
Mean           Yes           -         -
Median         Yes          Yes        -
Mode           Yes          Yes       Yes


Numerical Techniques: Measures of Variability



Variability: Introduction
• Defined as the extent to which the data values differ from each other
• Also known as dispersion, spread, uncertainty, diversity, risk
• Example data: 2, 2, 2, 2, 2, 2, 2
• Variability = 0
• Example data: 1, 3, 2, 2, 1, 2, 3
• How much variability?
• Look at how far each data value is from the average $\bar{X} = 2$:
• Deviations from average are −1, 1, 0, 0, −1, 0, 1
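The deviations listed above can be reproduced with a few lines of Python. This is only a sketch with illustrative variable names; it also shows that the raw deviations sum to zero, which is why measures of variability work with squared deviations instead.

```python
# Example data from the slide
data = [1, 3, 2, 2, 1, 2, 3]

mean = sum(data) / len(data)            # average of the data values
deviations = [x - mean for x in data]   # distance of each value from the average

print("mean:", mean)
print("deviations:", deviations)
print("sum of deviations:", sum(deviations))
```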



Measures of Variability: Range

• Range describes the spread of the entire data set and is computed as follows:
$Range = max - min$
where max is the largest value in the data series and min is the smallest

• Range is useful for two purposes:
• To describe the extent of the data
• To search for errors in the data



In-class Exercise
• Compute the range for the income data of SOB Students

SOB Student   Annual Income (K)
1             1.2
2             5
3             7.5
4             7.5
5             7.5
6             12.5
7             15
8             15.25
9             20
10            40

$Range = max - min$
$Range = 40 - 1.2$
$Range = 38.8$
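The same calculation can be sketched in Python using the built-in `max` and `min` functions; the variable names are illustrative.

```python
# SOB Students' annual incomes (K)
incomes = [1.2, 5, 7.5, 7.5, 7.5, 12.5, 15, 15.25, 20, 40]

largest = max(incomes)
smallest = min(incomes)
data_range = largest - smallest  # Range = max - min

print(f"Range = {largest} - {smallest} = {data_range:.1f}")
```

Printing the largest and smallest values alongside the range also supports the second use of the range: an implausible maximum or minimum is often the first sign of a data-entry error.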
Measures of Variability: Variance
• Variance measures the dispersion of the data around the mean

• Variance for a sample is measured as follows:

$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$



In-class exercise
• Compute the variance for the SOB Students’ incomes data set

SOB Student   Annual Income (K)
1             1.2
2             5
3             7.5
4             7.5
5             7.5
6             12.5
7             15
8             15.25
9             20
10            40
Solution
SOB Student   Annual Income (K)   $(X_i - \bar{X})$   $(X_i - \bar{X})^2$
1             1.2                 -11.945             142.683025
2             5                   -8.145              66.341025
3             7.5                 -5.645              31.866025
4             7.5                 -5.645              31.866025
5             7.5                 -5.645              31.866025
6             12.5                -0.645              0.416025
7             15                  1.855               3.441025
8             15.25               2.105               4.431025
9             20                  6.855               46.991025
10            40                  26.855              721.191025



Solution
$\bar{X} = 13.145$

$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$

$\sum_{i=1}^{n} (X_i - \bar{X})^2 = 1081.09225$

$s^2 = \frac{1}{10-1} \times 1081.09225 = 120.121361$

Difficult to interpret because its unit of measurement is the square of the original data



Measures of Variability: Standard deviation
• Standard Deviation: Measures the average distance each data point is from the mean
• Same units as the underlying data instead of the square of the underlying data
• Standard deviation for a sample is measured as follows:

$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2}$



Solution
Using the squared deviations computed in the variance solution for the SOB Students’ income data:

$\bar{X} = 13.145$

$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$

$s^2 = 120.121361$

$s = \sqrt{120.121361}$

$s = 10.9599890967$

Summarizes the typical distance from average for the individual data values
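The variance and standard deviation worked out above can be checked with a short Python sketch. It follows the sample formula step by step and then cross-checks the result against the standard `statistics` module; the variable names are illustrative.

```python
import statistics

# SOB Students' annual incomes (K)
incomes = [1.2, 5, 7.5, 7.5, 7.5, 12.5, 15, 15.25, 20, 40]

n = len(incomes)
mean = sum(incomes) / n
squared_deviations = [(x - mean) ** 2 for x in incomes]
sum_of_squares = sum(squared_deviations)

variance = sum_of_squares / (n - 1)  # sample variance, divides by n - 1
std_dev = variance ** 0.5            # standard deviation, same units as the data

# Cross-check against the library's sample variance and standard deviation
library_variance = statistics.variance(incomes)
library_std_dev = statistics.stdev(incomes)

print(f"mean = {mean:.3f}")
print(f"sum of squared deviations = {sum_of_squares:.5f}")
print(f"variance = {variance:.6f}, standard deviation = {std_dev:.4f}")
```

Note that `statistics.variance` and `statistics.stdev` use the $n-1$ divisor, matching the sample formulas in this lecture, whereas `statistics.pvariance` and `statistics.pstdev` divide by $n$.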



• Compute and interpret the standard deviation for the following values

271   236   294
262   237   247
262   288   247
252   242   248
263   247   252



Normal Distribution and Std. Dev.
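• For a normal distribution only:
• About 68% (roughly 2/3) of the data fall within one standard deviation of the average (either above or below)
• About 95% fall within 2 standard deviations
• About 99.7% fall within 3 standard deviations

[Figure: Normal curve with nested bands marking one standard deviation either side of the average (2/3 of the data), 95% of the data, and 99.7% of the data.]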
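These percentages can be verified numerically. The sketch below assumes Python 3.8 or later, where the standard library provides `statistics.NormalDist`, and computes the share of a standard normal distribution lying within 1, 2 and 3 standard deviations of the mean.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
standard_normal = NormalDist(mu=0, sigma=1)

shares = {}
for k in (1, 2, 3):
    # Probability of falling between -k and +k standard deviations
    shares[k] = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"within {k} standard deviation(s): {shares[k]:.4f}")
```

The first figure is slightly above 2/3, which is why the rule is often quoted as 68%, 95% and 99.7%.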



Measures of Variability: Coefficient of Variation
• The coefficient of variation (CV) measures the relative variability of a data set
• This statistic can be useful in comparing the variability between two data sets with different means.
• CV tells us the level of variation above or below the average
• CV has no measurement units and thus may be useful in comparing the variability of different situations on a size-adjusted basis
• CV is computed as follows:

$\text{Coefficient of variation} = \frac{s}{\bar{X}}$



Coefficient of variation
Tires produced per hour

Hour        Plant A   Plant B
1           986       765
2           823       923
3           862       856
4           794       913
5           897       645
6           896       964
7           912       756
8           765       739
Std. Dev.   71.3      110.6
Average     866.9     820.1
CV          0.0823    0.1348
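The summary rows of the table can be reproduced with the sketch below, which assumes the sample standard deviation from Python's `statistics` module; the function name `coefficient_of_variation` is illustrative.

```python
import statistics

# Tires produced per hour over eight hours
plant_a = [986, 823, 862, 794, 897, 896, 912, 765]
plant_b = [765, 923, 856, 913, 645, 964, 756, 739]

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (unit-free)."""
    return statistics.stdev(values) / statistics.mean(values)

cv_a = coefficient_of_variation(plant_a)
cv_b = coefficient_of_variation(plant_b)

for name, values, cv in (("Plant A", plant_a, cv_a), ("Plant B", plant_b, cv_b)):
    print(f"{name}: std dev = {statistics.stdev(values):.1f}, "
          f"average = {statistics.mean(values):.1f}, CV = {cv:.4f}")
```

Plant B has the larger CV, so its hourly output is more variable relative to its average than Plant A's.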
Measures of Relative Standing



What are measures of relative standing?
• These are numbers that are used to describe the relative location of
data points within a data set
• In this unit, we focus on one measure of relative standing
• Percentiles



Percentiles
• Percentiles are summary measures expressing ranks as percentages
from 0% to 100% rather than from 1 to n
• Percentiles are used in two ways:
• Indicate the data value at a given percentage e.g., 10th percentile of SOB
Students’ income is K3.1
• Indicate the percentage ranking of a given data value e.g., Student 2’s income
of K5 is within the 25th percentile
• Percentiles generally tell us the proportion of the data values that fall
below a given threshold and the proportion of data that fall above it



Steps for computing percentiles

• To calculate the $k^{th}$ percentile (where $k$ is any number between 0 and 100), do the following steps:
1. Order all the values in the data set from smallest to largest
2. Multiply $k$ percent by the total number of values, $n$. The resulting number is called the index.

$i = k\% \times n = \frac{k}{100} \times n$



Steps for computing percentiles cont…
3. Check whether the index obtained in Step 2 is a whole number.
4. If the index is not a whole number, round it up to the nearest whole number and count the values in your data set from the smallest to the largest value until you reach the rounded-up number. The corresponding value in your data set is the kth percentile.
5. If the index is a whole number, count the values in your data set from the smallest to the largest value until you reach the number indicated by the index. The kth percentile is the average of that corresponding value in your data set and the value that directly follows it.
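The index rule described above can be sketched as a small Python function. The function name `percentile` and the restriction to 0 < k < 100 are choices made for this illustration; statistical packages often use other interpolation rules and may return slightly different values.

```python
import math

def percentile(values, k):
    """k-th percentile using the lecture's index rule (0 < k < 100)."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < k < 100:
        raise ValueError("this sketch only handles 0 < k < 100")

    ordered = sorted(values)   # order from smallest to largest
    n = len(ordered)
    index = k * n / 100        # index = k% x n

    if index.is_integer():
        # Whole number: average the value at the index and the one after it
        i = int(index)
        return (ordered[i - 1] + ordered[i]) / 2
    # Not a whole number: round up and take the value at that position
    return ordered[math.ceil(index) - 1]

# SOB Students' annual incomes (K), in the unordered form given in class
incomes = [5, 15.25, 20, 7.5, 1.2, 12.5, 15, 7.5, 7.5, 40]

results = {k: percentile(incomes, k) for k in (10, 25, 50, 75, 90)}
for k, value in results.items():
    print(f"{k}th percentile: K{value:g}")
```

Positions are counted from 1 in the lecture but from 0 in Python lists, which is why the code subtracts 1 when looking up a value.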



In-class exercise
• Calculate the 10th, 25th, 50th, 75th, and 90th percentile for the SOB Students’ annual income
• Interpret each percentile

SOB Student   Annual Income (K)
1             5
2             15.25
3             20
4             7.5
5             1.2
6             12.5
7             15
8             7.5
9             7.5
10            40



Solution
• Order the data set
SOB Student Annual Income (K)
1 1.2
2 5
3 7.5
4 7.5
5 7.5
6 12.5
7 15
8 15.25
9 20
10 40
Solution

$k\%$   Index (= $k\% \times n$)   Is index a whole number?   Step to take   $k^{th}$ percentile
10      1                          Yes                        Step 5         K3.1
25      2.5                        No                         Step 4         K7.5
50      5                          Yes                        Step 5         K10
75      7.5                        No                         Step 4         K15.25
90      9                          Yes                        Step 5         K30


Solution

$k\%$   $k^{th}$ percentile   Interpretation
10      K3.1                  10% of SOB Students’ incomes are below K3.1 while 90% are above K3.1
25      K7.5                  25% of SOB Students’ incomes are below K7.5 while 75% are above K7.5
50      K10                   50% of SOB Students’ incomes are below K10 while 50% are above K10
75      K15.25                75% of SOB Students’ incomes are below K15.25 while 25% are above K15.25
90      K30                   90% of SOB Students’ incomes are below K30 while 10% are above K30
Box-and-whiskers displays (box plots)
• A more sophisticated modification of the graphical five-number
summary is called a box-and-whiskers display (sometimes called a box
plot).
Steps in constructing box plots
1. Draw a box that extends from the first quartile 𝑄1 to the third
quartile 𝑄3 .
2. Determine the values of the lower and upper limits.
3. Draw whiskers as dashed lines that extend below 𝑄1 and above 𝑄3 .
4. A measurement that is less than the lower limit or greater than the
upper limit is an outlier.



In-class exercise
Develop the Box-and-Whiskers plot for the following data.

7524    18211   135540   49312   57283   190250
72814   26817   41286    90416   11070   36551
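The construction steps can be sketched numerically for this data set. The lecture does not state how the lower and upper limits are found, so the sketch assumes the common convention of placing them 1.5 times the interquartile range beyond the quartiles, and assumes the whiskers end at the most extreme values inside those limits. Quartiles are taken as the 25th and 75th percentiles using the lecture's index rule; all names are illustrative.

```python
import math

def percentile(values, k):
    """k-th percentile using the lecture's index rule (0 < k < 100)."""
    ordered = sorted(values)
    index = k * len(ordered) / 100
    if index.is_integer():
        i = int(index)
        return (ordered[i - 1] + ordered[i]) / 2
    return ordered[math.ceil(index) - 1]

data = [7524, 18211, 135540, 49312, 57283, 190250,
        72814, 26817, 41286, 90416, 11070, 36551]

# Step 1: the box runs from Q1 to Q3
q1 = percentile(data, 25)
q3 = percentile(data, 75)
iqr = q3 - q1

# Step 2: limits (assumption: 1.5 x IQR beyond each quartile)
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Step 3: whiskers (assumption: extend to the extreme values within the limits)
inside = [x for x in data if lower_limit <= x <= upper_limit]
whisker_low, whisker_high = min(inside), max(inside)

# Step 4: anything beyond the limits is an outlier
outliers = [x for x in data if x < lower_limit or x > upper_limit]

print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
print(f"limits: {lower_limit} to {upper_limit}")
print(f"whiskers: {whisker_low} to {whisker_high}")
print(f"outliers: {outliers}")
```

Under these assumptions the largest value, 190250, falls above the upper limit and would be plotted as an outlier.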



End of Lecture 2
