Week 03

This document outlines the key concepts of numerical descriptive techniques in business analytics, focusing on measures of central location (mean, median, mode), variability (range, variance, standard deviation), and relative standing (percentiles, quartiles). It emphasizes the importance of understanding these measures for data analysis and decision-making. Additionally, it introduces the Central Limit Theorem and various probability distributions relevant to statistical analysis.

Uploaded by Mukhtar
Copyright © All Rights Reserved

MGT 2148 - Business Analytics and Decision Making
Week 3

Professor: Mohammad Raahemi
Spring 2025


Week 3 – Numerical Descriptive Techniques
❑ Measures of Central Location (Mean, Median, Mode)
❑ Measures of Variability (Range, Variance, Standard Deviation,
Coefficient of Variation)
❑ Measures of Relative Standing (Locating Percentiles, Box Plot)
❑ Probability Events
Introduction
After completing this week, you should be able to:

▪ Determine measures of central location, variability, and relative standing

▪ Understand Measures of Linear Relationship

▪ Compare Graphical and Numerical Techniques

▪ Define General Guidelines for Exploring Data

▪ Identify Methods of Collecting Data

▪ Develop Sampling Plans

▪ Understand Sampling and Non-sampling Errors


Measures of Central Location…
The arithmetic mean, also known as the average and usually shortened to the mean, is the most popular and useful measure of central location.

It is computed by adding up all the observations and dividing by the total number of observations:

$$\text{Mean} = \frac{\text{Sum of the observations}}{\text{Number of observations}}$$
Measures of Central Location…
❑ When referring to the number of observations in a population, we use the uppercase letter N.

❑ When referring to the number of observations in a sample, we use the lowercase letter n.

❑ The arithmetic mean for a population is denoted with the Greek letter "mu": 𝜇

❑ The arithmetic mean for a sample is denoted with an "x-bar": x̄
Measures of Central Location…

Sample Mean
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

Population Mean
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$
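A minimal Python sketch of the mean formula (sum of the observations divided by the number of observations), using only the standard library. The helper name `arithmetic_mean` is our own, hypothetical choice; `statistics.fmean` is the standard-library equivalent. Sample and population means are computed the same way; only the notation (n and x̄ versus N and 𝜇) differs.

```python
import statistics

def arithmetic_mean(observations):
    """Sum of the observations divided by the number of observations."""
    if len(observations) == 0:
        raise ValueError("the mean needs at least one observation")
    return sum(observations) / len(observations)

# The 10 observations used in the median example of this section.
data = [0, 7, 12, 5, 14, 8, 0, 9, 22, 33]

print(arithmetic_mean(data))   # 11.0
print(statistics.fmean(data))  # 11.0
```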
Measures of Central Location…
The median is calculated by placing all the observations in order; the observation that falls in
the middle is the median.

Data: {0, 7, 12, 5, 14, 8, 0, 9, 22} N=9 (odd)


Sort them from smallest to largest and find the middle (5th) observation:
0 0 5 7 8 9 12 14 22
median = 8

Data: {0, 7, 12, 5, 14, 8, 0, 9, 22, 33} N=10 (even)

Sort them from smallest to largest; the median is the simple average of the two middle observations, 8 and 9:
0 0 5 7 8 9 12 14 22 33
median = (8 + 9) ÷ 2 = 8.5

Sample and population medians are computed the same way.
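The sort-then-take-the-middle rule can be sketched in a few lines of Python. The helper `median` is hypothetical; `statistics.median` from the standard library follows the same rule and is shown as a cross-check.

```python
import statistics

def median(observations):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(observations)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median needs at least one observation")
    middle = n // 2
    if n % 2 == 1:                        # odd n: a single middle observation
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2  # even n: average the two

odd_data = [0, 7, 12, 5, 14, 8, 0, 9, 22]
even_data = odd_data + [33]

print(median(odd_data))              # 8
print(median(even_data))             # 8.5
print(statistics.median(even_data))  # 8.5
```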


Measures of Central Location…
❑ The mode of a set of observations is the value that occurs most frequently.

❑ A set of data may have one mode (or modal class), or two, or more modes.

❑ The mode is useful for all data types, though it is mainly used for nominal data.

❑ For large data sets the modal class is much more relevant than a single-value mode.

❑ Sample and population modes are computed the same way.
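Because a data set can have more than one mode, a sketch that returns every most-frequent value is more faithful than one that returns a single number. The helper `modes` is hypothetical, and the colour list is made-up nominal data for illustration; `statistics.multimode` (Python 3.8+) does the same job in the standard library.

```python
import statistics
from collections import Counter

def modes(observations):
    """Return every value that occurs with the highest frequency (there may be several)."""
    counts = Counter(observations)
    if not counts:
        return []
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]

print(modes([0, 7, 12, 5, 14, 8, 0, 9, 22, 33]))                      # [0]
print(modes(["red", "blue", "red", "blue", "green"]))                 # ['red', 'blue']
print(statistics.multimode(["red", "blue", "red", "blue", "green"]))  # ['red', 'blue']
```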
Mean, Median, Mode…
If a distribution is symmetrical, the mean, median and mode may coincide…

[Figure: symmetrical distribution with the median, mode, and mean marked at the same central point]
Mean, Median, Mode: Which Is Best?
❑ With three measures from which to choose, which one should we use?

❑ The mean is generally our first selection. However, there are several circumstances when the median is better.

❑ The mode is seldom the best measure of central location.

❑ One advantage the median holds is that it is not as sensitive to extreme values as the mean is.
Mean, Median, Mode: Which Is Best?
To illustrate, consider the data in Example 4.1: {0, 7, 12, 5, 14, 8, 0, 9, 22, 33}. The mean was 11.0 and the median was 8.5.

Now suppose that the respondent who reported 33 hours actually reported 133 hours
(obviously an Internet addict). The mean becomes:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{0 + 7 + 12 + 5 + 133 + 14 + 8 + 0 + 9 + 22}{10} = \frac{210}{10} = 21.0$$

This value is exceeded by only two of the ten observations in the sample, making this statistic a poor measure of central location.

The median stays the same. When there is a relatively small number of extreme observations,
the median usually produces a better measure of the center of the data.
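A quick check of this comparison in Python (standard library only): replacing 33 with 133 roughly doubles the mean but leaves the median unchanged. The variable names are our own.

```python
import statistics

original = [0, 7, 12, 5, 14, 8, 0, 9, 22, 33]
with_extreme = [0, 7, 12, 5, 14, 8, 0, 9, 22, 133]  # 33 reported as 133

for label, data in (("original", original), ("with 133", with_extreme)):
    mean = statistics.fmean(data)
    median = statistics.median(data)
    above_mean = sum(1 for x in data if x > mean)  # observations exceeding the mean
    print(f"{label}: mean={mean}, median={median}, above mean={above_mean}")

# original: mean=11.0, median=8.5, above mean=4
# with 133: mean=21.0, median=8.5, above mean=2
```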
Measures of Central Location Summary…
Compute the Mean to
Describe the central location of a single set of interval data

Compute the Median to
Describe the central location of a single set of interval or ordinal data

Compute the Mode to
Describe a single set of nominal data
Measures of Variability…
Measures of central location fail to tell the whole story about a distribution: they do not tell us how much the observations are spread out around the mean.

For example, consider two sets of class grades:

[Figure: grade distributions for a red class and a blue class]

The mean (= 50) is the same in each case, but the red class has greater variability than the blue class.
Range
The range is the simplest measure of variability, calculated as:

Range = Largest observation – Smallest observation

e.g.,
Data: {4, 4, 4, 4, 50} Range = 46
Data: {4, 8, 15, 24, 39, 50} Range = 46
The range is the same in both cases, but the data sets have very different distributions…
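The two data sets above can be checked with a one-line helper (the name `data_range` is hypothetical). Both ranges come out as 46 even though the values in between are spread very differently.

```python
def data_range(observations):
    """Largest observation minus smallest observation."""
    return max(observations) - min(observations)

set_a = [4, 4, 4, 4, 50]
set_b = [4, 8, 15, 24, 39, 50]
print(data_range(set_a), data_range(set_b))  # 46 46
```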

What is the range for the following dataset?

Data: {-10, 0, 16, 22, 22, 38}
Range
❑ Its major advantage is the ease with which it can be computed.

❑ Its major shortcoming is that it provides no information on how the observations are dispersed between the two end points.

❑ Hence, we need a measure of variability that incorporates all the data, not just two observations…
Variance
Variance and its related measure, standard deviation, are arguably the most important
statistics. Used to measure variability, they also play a vital role in almost all statistical inference
procedures.

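❑ Population variance is denoted by σ² (lowercase Greek letter "sigma" squared):
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

❑ Sample variance is denoted by s² (lowercase "s" squared):
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$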


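Example
The following sample consists of the number of jobs six students applied for: 17, 15, 23, 7, 9, 13. Find its mean and variance. What are we looking to calculate?

Sample Mean
$$\bar{x} = \frac{17 + 15 + 23 + 7 + 9 + 13}{6} = \frac{84}{6} = 14$$

Sample Variance
$$s^2 = \frac{(17-14)^2 + (15-14)^2 + (23-14)^2 + (7-14)^2 + (9-14)^2 + (13-14)^2}{6 - 1} = \frac{9 + 1 + 81 + 49 + 25 + 1}{5} = \frac{166}{5} = 33.2$$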
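The job-applications example can be checked in a few lines of Python. The `sample` flag in this hypothetical `variance` helper is our own design choice: it divides by n - 1 for a sample and by N for a population, mirroring the standard library's `statistics.variance` and `statistics.pvariance`.

```python
import statistics

def variance(observations, sample=True):
    """Sum of squared deviations from the mean, divided by n - 1 (sample) or N (population)."""
    count = len(observations)
    if count < 2:
        raise ValueError("need at least two observations")
    mean = sum(observations) / count
    squared_deviations = sum((x - mean) ** 2 for x in observations)
    return squared_deviations / (count - 1 if sample else count)

jobs = [17, 15, 23, 7, 9, 13]     # jobs applied for by six students
print(sum(jobs) / len(jobs))      # 14.0
print(variance(jobs))             # 33.2
print(statistics.variance(jobs))  # 33.2
```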
Standard Deviation
The standard deviation is simply the square root of the variance, thus:

Population standard deviation:
$$\sigma = \sqrt{\sigma^2}$$

Sample standard deviation:
$$s = \sqrt{s^2}$$
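Continuing the job-applications sample (17, 15, 23, 7, 9, 13), the sample standard deviation is the square root of the sample variance 33.2. A short sketch, cross-checked against `statistics.stdev`:

```python
import math
import statistics

jobs = [17, 15, 23, 7, 9, 13]
s_squared = statistics.variance(jobs)  # sample variance, 33.2
s = math.sqrt(s_squared)               # sample standard deviation

print(round(s, 3))                       # 5.762
print(round(statistics.stdev(jobs), 3))  # 5.762
```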


Example
Consider Example 4.8. A golf club manufacturer has designed a new club and wants to determine whether it is hit more consistently (i.e., with less variability) than the old club.

Using Data > Data Analysis > Descriptive Statistics in Excel, we produce the following tables for interpretation…

[Excel output: descriptive statistics for the distances hit with the old and new clubs]

You get more consistent distances with the new club.
Interpreting Standard Deviation
The standard deviation can be used to compare the variability of several distributions and make
a statement about the general shape of a distribution. If the histogram is bell shaped, we can
use the Empirical Rule, which states:

➢ Approximately 68% of all observations fall within one standard deviation of the mean.
➢ Approximately 95% of all observations fall within two standard deviations of the mean.
➢ Approximately 99.7% of all observations fall within three standard deviations of the
mean.
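The 68/95/99.7 figures are rounded values from the normal curve. This sketch assumes the data follow an exact normal distribution (a bell-shaped histogram only approximates one) and uses `statistics.NormalDist` (Python 3.8+) to compute the share within k standard deviations of the mean.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Share of a normal distribution within k standard deviations of the mean.
coverage = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}

for k, share in coverage.items():
    print(f"within {k} standard deviation(s): {share:.1%}")
# within 1 standard deviation(s): 68.3%
# within 2 standard deviation(s): 95.4%
# within 3 standard deviation(s): 99.7%
```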
Central Limit Theorem
The sampling distribution of the sample mean approaches a normal distribution as the sample size gets bigger, no matter what the shape of the population distribution is.

Assumptions

❑ Data must be randomly sampled.
❑ Sample values must be independent of each other.
❑ Samples should come from the same distribution.
❑ Sample size must be sufficiently large (a common rule of thumb is n ≥ 30).

Let’s see CLT in action by simulation - Link to external site
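A small offline simulation in the same spirit, as a sketch only: the population (a right-skewed exponential distribution with mean 1 and standard deviation 1), the sample size of 30, the 5,000 repetitions, and the seed are all arbitrary choices of ours. If the CLT holds, the sample means should centre on 1 with a spread close to 1/√30.

```python
import math
import random
import statistics

def simulate_sample_means(sample_size, repetitions, seed=2025):
    """Draw repeated samples from a skewed exponential population (mean 1, sd 1)
    and return the mean of each sample."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(repetitions)
    ]

means = simulate_sample_means(sample_size=30, repetitions=5000)
print(round(statistics.fmean(means), 2))  # close to the population mean, 1
print(round(statistics.stdev(means), 3))  # close to 1 / sqrt(30)
print(round(1 / math.sqrt(30), 3))        # 0.183
```

Plotting a histogram of `means` would show a roughly bell-shaped curve, even though individual exponential draws are strongly skewed.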


Distributions around us (commonly occurring)

Bernoulli: The outcome of tossing a fair coin

Binomial: The number of non-defective products in a production run

Uniform: The number of books sold weekly at a bookstore

Normal: IQ distribution of all the seven-year-old children in Ottawa


Binomial Distribution
The binomial distribution is the probability distribution of the number of successes in an experiment that is repeated a fixed number of times, where each repetition (trial) has only two possible outcomes.

Example: Suppose you have purchased 10 lottery tickets, and each ticket either wins or does not win. The binomial distribution lets you answer a question such as: what is the probability that exactly 6 of the tickets win?

The assumptions of Binomial distribution are as follows:


1. There are only two possible outcomes (success or failure) for each trial.

2. The number of trials is fixed.

3. The trials are independent. In other words, the outcome of one trial does not affect the probability of any other trial.

4. The probability of success is the same for each trial.
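Under these four assumptions, the probability of exactly k successes in n trials is C(n, k)·p^k·(1 − p)^(n − k). A minimal sketch using `math.comb` (Python 3.8+); the helper name is hypothetical, and p = 0.5 is an illustrative value (think of 6 heads in 10 fair-coin tosses), not an actual lottery win probability.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): C(n, k) * p**k * (1 - p)**(n - k)."""
    if not 0 <= k <= n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)

print(binomial_pmf(6, 10, 0.5))  # 0.205078125
total = sum(binomial_pmf(k, 10, 0.5) for k in range(11))  # probabilities sum to 1
```

Setting n = 1 gives the single-trial (Bernoulli) case: `binomial_pmf(1, 1, p)` is just p.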


Bernoulli distribution

A binomial distribution in which the experiment consists of only one trial (n = 1) is called the Bernoulli distribution.
Uniform Distribution

The Uniform Distribution is the probability distribution in which all outcomes are equally likely.

Discrete Uniform Distribution: Can take a finite number (m) of values, and each value has an equal probability (1/m) of selection.
Example: Rolling a single die.

Continuous Uniform Distribution: Can take any value within a given range, with all values in that range equally likely (constant probability density).
Example: Weight gained by a person over the next 2 months can be
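A sketch of both cases, with hypothetical helper names. The die follows the 1/m rule exactly; for the continuous case, the probability of an interval is its overlap with the range divided by the range's width, shown here for a made-up Uniform(0, 4) variable.

```python
from fractions import Fraction

def discrete_uniform_pmf(value, support):
    """Each of the m distinct values in `support` has probability 1/m."""
    values = set(support)
    return Fraction(1, len(values)) if value in values else Fraction(0)

def continuous_uniform_prob(lo, hi, a, b):
    """P(lo <= X <= hi) for X ~ Uniform(a, b): overlap length divided by (b - a)."""
    if not a < b:
        raise ValueError("need a < b")
    overlap = max(0, min(hi, b) - max(lo, a))
    return overlap / (b - a)

die = range(1, 7)
print(discrete_uniform_pmf(3, die))         # 1/6
print(continuous_uniform_prob(2, 3, 0, 4))  # 0.25
```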
Coefficient of Variation
The coefficient of variation of a set of observations is the standard deviation of the observations divided by their mean, that is:

Population coefficient of variation:
$$CV = \frac{\sigma}{\mu}$$

Sample coefficient of variation:
$$cv = \frac{s}{\bar{x}}$$

This coefficient provides a proportionate measure of variation. For example, a standard deviation of 10 may be perceived as large when the mean value is 100, but only moderately large when the mean value is 500.
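The example above in code, plus the sample cv of the job-applications data (17, 15, 23, 7, 9, 13) from the variance example. The helper name `coefficient_of_variation` is hypothetical.

```python
import statistics

def coefficient_of_variation(sd, mean):
    """Standard deviation divided by the mean (undefined when the mean is 0)."""
    if mean == 0:
        raise ValueError("the coefficient of variation is undefined for a zero mean")
    return sd / mean

print(coefficient_of_variation(10, 100))  # 0.1
print(coefficient_of_variation(10, 500))  # 0.02

jobs = [17, 15, 23, 7, 9, 13]
cv = statistics.stdev(jobs) / statistics.fmean(jobs)  # sample cv = s / x-bar
print(round(cv, 3))  # 0.412
```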
Measures of Relative Standing
Measures of relative standing are designed to provide information about the position of
particular values relative to the entire data set.

Percentile: the Pth percentile is the value for which P percent of the observations are less than that value and (100 - P)% are greater than that value.

Suppose you scored in the 60th percentile on the GMAT. That means 60% of the other scores were below yours, while 40% were above yours.
Quartiles
We have special names for the 25th, 50th, and 75th percentiles, namely quartiles.

❑ The first or lower quartile is labeled Q1 = 25th percentile.

❑ The second quartile, Q2 = 50th percentile (which is also the median).

❑ The third or upper quartile, Q3 = 75th percentile.

We can also convert percentiles into quintiles (fifths) and deciles (tenths).
Commonly Used Percentiles
First (lower) decile = 10th percentile
First (lower) quartile, Q1 = 25th percentile
Second (middle) quartile, Q2 = 50th percentile
Third (upper) quartile, Q3 = 75th percentile
Ninth (upper) decile = 90th percentile

Note: If your exam mark places you in the 80th percentile, that doesn't mean you scored 80% on the exam; it means that 80% of your peers scored lower than you. It is about your position relative to others.
Location of Percentiles
The following formula allows us to approximate the location of any percentile, where n is the number of observations:
$$L_P = (n + 1)\frac{P}{100}$$
Location of Percentiles

Recall the data from Example 4.1: 0 0 5 7 8 9 12 14 22 33. Where is the location of the
25th percentile? That is, at which point are 25% of the values lower and 75% of the values
higher?
L25 = (10+1)(25/100) = 2.75

The 25th percentile is three-quarters of the distance between the second (which is 0) and the third (which is 5) observations. Three-quarters of the distance is: (.75)(5 – 0) = 3.75. Because the second observation is 0, the 25th percentile is 0 + 3.75 = 3.75.

L75 = (10+1)(75/100) = 8.25


It is located one-quarter of the distance between the eighth and the ninth observations, which are 14 and 22, respectively. One-quarter of the distance is: (.25)(22 – 14) = 2, which means the 75th percentile is at: 14 + 2 = 16.
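The locate-then-interpolate procedure can be sketched as below. The helper `percentile` is hypothetical, and clamping locations below 1 or above n to the smallest or largest observation is our own assumption for edge cases. Python's `statistics.quantiles` uses the same (n + 1) location rule by default (`method='exclusive'`), so it serves as a cross-check.

```python
import statistics

def percentile(observations, p):
    """Locate the Pth percentile with L_P = (n + 1) * P / 100, then interpolate."""
    ordered = sorted(observations)
    n = len(ordered)
    location = (n + 1) * p / 100
    if location <= 1:          # assumption: clamp to the smallest observation
        return ordered[0]
    if location >= n:          # assumption: clamp to the largest observation
        return ordered[-1]
    whole = int(location)      # 1-based position of the lower neighbour
    fraction = location - whole
    lower, upper = ordered[whole - 1], ordered[whole]
    return lower + fraction * (upper - lower)

data = [0, 0, 5, 7, 8, 9, 12, 14, 22, 33]
print(percentile(data, 25))             # 3.75
print(percentile(data, 75))             # 16.0
print(statistics.quantiles(data, n=4))  # [3.75, 8.5, 16.0]
```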
Location of Percentiles

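Please remember…

0 0 | 5 7 8 9 12 14 | 22 33

The first bar sits at position 2.75 and marks the value 3.75 (the 25th percentile); the second bar sits at position 8.25 and marks the value 16 (the 75th percentile).

Lp determines the position in the data set where the percentile value lies, not the value of the percentile itself.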
Interquartile Range
The quartiles can be used to create another measure of variability, the interquartile range,
which is defined as follows:

Interquartile Range = Q3 – Q1

The interquartile range measures the spread of the middle 50% of the observations. Large values of this statistic mean that the 1st and 3rd quartiles are far apart, indicating a high level of variability.
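For the Example 4.1 data (0 0 5 7 8 9 12 14 22 33), a short sketch using `statistics.quantiles`, whose default `'exclusive'` method uses the same (n + 1)P/100 location rule as the worked example.

```python
import statistics

data = [0, 0, 5, 7, 8, 9, 12, 14, 22, 33]
q1, q2, q3 = statistics.quantiles(data, n=4)  # quartiles via the (n + 1) location rule
iqr = q3 - q1                                 # spread of the middle 50%
print(q1, q3, iqr)  # 3.75 16.0 12.25
```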
Box Plots
The box plot is a technique that graphs five statistics: the minimum and maximum observations, and the first, second, and third quartiles.

[Figure: box plot with whiskers labelled 1.5 × (Q3 – Q1)]
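A sketch of the numbers behind a box plot. It follows a common convention, which is our assumption here: each whisker stops at the most extreme observation within 1.5 × (Q3 – Q1) of the box, and anything beyond is flagged as an outlier. The helper name and dictionary keys are hypothetical.

```python
import statistics

def box_plot_summary(observations):
    """Five-number summary plus whisker ends and outliers (1.5 * IQR convention)."""
    ordered = sorted(observations)
    q1, median, q3 = statistics.quantiles(ordered, n=4)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [x for x in ordered if lower_fence <= x <= upper_fence]
    return {
        "min": ordered[0], "q1": q1, "median": median, "q3": q3, "max": ordered[-1],
        "whisker_low": inside[0], "whisker_high": inside[-1],
        "outliers": [x for x in ordered if x < lower_fence or x > upper_fence],
    }

print(box_plot_summary([0, 0, 5, 7, 8, 9, 12, 14, 22, 33])["outliers"])   # []
print(box_plot_summary([0, 0, 5, 7, 8, 9, 12, 14, 22, 133])["outliers"])  # [133]
```

With 133 in place of 33 the quartiles do not move, but the upper fence (16 + 1.5 × 12.25 = 34.375) now flags 133 as an outlier.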
Example
A large number of fast-food restaurants have drive-through windows, offering drivers and their passengers the advantage of quick service. To measure how good the service is, an organization called QSR planned a study in which the amount of time taken by a sample of drive-through customers at each of five restaurants was recorded. Compare the five sets of data using a box plot and interpret the results.
Example
Wendy’s service time is the shortest and least variable. Hardee’s has the greatest variability, while Jack-in-the-Box has the longest service times.
Thank you

Source of Content: G. Keller (2017) Statistics for Management


Source of Decorative Figures: https://www.freepik.com/
