Week 03
Decision Making
The mean is computed by adding up all the observations and dividing by the total number of
observations:
Sample Mean: x̄ = (x1 + x2 + … + xn) / n
Population Mean: μ = (x1 + x2 + … + xN) / N
Measures of Central Location…
The median is calculated by placing all the observations in order; the observation that falls in
the middle is the median.
[Figure: positions of the median, mode, and mean]
Mean, Median, Mode: Which Is Best?
❑ With three measures from which to choose,
which one should we use?
Now suppose that the respondent who reported 33 hours actually reported 133 hours
(obviously an Internet addict). The mean becomes:
(0 + 0 + 5 + 7 + 8 + 9 + 12 + 14 + 22 + 133) / 10 = 210 / 10 = 21.0
This value is exceeded by only two of the ten observations in the sample, making this
statistic a poor measure of central location.
The median stays the same. When there is a relatively small number of extreme observations,
the median usually produces a better measure of the center of the data.
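The effect of a single extreme observation can be checked with a short sketch. It assumes the ten observations are the Example 4.1 data (0 0 5 7 8 9 12 14 22 33) and uses Python's standard `statistics` module; the variable names are illustrative only.

```python
import statistics

# Hours of Internet use from Example 4.1 (assumed to be the sample discussed here).
hours = [0, 0, 5, 7, 8, 9, 12, 14, 22, 33]

# Same sample, but the respondent who reported 33 hours reports 133 instead.
hours_with_outlier = [133 if h == 33 else h for h in hours]

mean_original = statistics.mean(hours)
median_original = statistics.median(hours)
mean_outlier = statistics.mean(hours_with_outlier)
median_outlier = statistics.median(hours_with_outlier)

# Count how many observations lie above the inflated mean.
above_mean = sum(1 for h in hours_with_outlier if h > mean_outlier)

print("original:     mean =", mean_original, " median =", median_original)
print("with outlier: mean =", mean_outlier, " median =", median_outlier)
print("observations above the new mean:", above_mean)
```

The mean moves with the extreme value while the median does not, which is the behaviour described above.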
Measures of Central Location Summary…
Compute the mean to describe the central location of a single set of interval data.
The range is the difference between the largest and smallest observations, e.g.,
Data: {4, 4, 4, 4, 50} Range = 46
Data: {4, 8, 15, 24, 39, 50} Range = 46
The range is the same in both cases, but the data sets have very different distributions…
❑ The major shortcoming of the range is its failure to provide information on the dispersion of
the observations between the two end points.
❑ We therefore need a measure of variability that incorporates all the data, not just two
observations. Hence…
Variance
Variance and its related measure, standard deviation, are arguably the most important
statistics. Used to measure variability, they also play a vital role in almost all statistical inference
procedures.
Sample Mean: x̄ = (x1 + x2 + … + xn) / n
Sample Variance: s² = [(x1 – x̄)² + (x2 – x̄)² + … + (xn – x̄)²] / (n – 1)
Standard Deviation
The standard deviation is simply the square root of the variance, thus: s = √(s²)
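As a minimal sketch, the range, sample variance, and standard deviation can be computed for the two data sets used in the range example. The helper `sample_variance` is a hypothetical name; it assumes the usual sample formula with an n – 1 divisor and is cross-checked against `statistics.variance` from the standard library.

```python
import math
import statistics

# The two data sets from the range example: same range, different spread.
data_a = [4, 4, 4, 4, 50]
data_b = [4, 8, 15, 24, 39, 50]


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


for name, values in (("A", data_a), ("B", data_b)):
    data_range = max(values) - min(values)
    variance = sample_variance(values)
    std_dev = math.sqrt(variance)  # standard deviation = square root of variance
    # Cross-check against the standard library implementation.
    assert math.isclose(variance, statistics.variance(values))
    print(f"Data {name}: range={data_range}, variance={variance:.2f}, sd={std_dev:.2f}")
```

Both data sets share a range of 46, but their variances differ because the variance uses every observation rather than only the two end points.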
Using Data > Data Analysis > Descriptive Statistics in Excel, we produce the following tables
for interpretation…
The Empirical Rule
If the histogram of the data is bell shaped:
➢ Approximately 68% of all observations fall within one standard deviation of the mean.
➢ Approximately 95% of all observations fall within two standard deviations of the mean.
➢ Approximately 99.7% of all observations fall within three standard deviations of the
mean.
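The three percentages can be reproduced with a short sketch, under the assumption (not stated on the slide) that the data follow an exactly normal, bell-shaped distribution. It uses `statistics.NormalDist` from the standard library (Python 3.8+); `share_within` is an illustrative helper name.

```python
from statistics import NormalDist

# Assumption: a standard normal distribution (mean 0, standard deviation 1).
standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
```

For real samples the percentages are only approximate, and they can be far off when the histogram is not bell shaped.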
Assumptions
Samples should come from the same distribution.
Sample size must be sufficiently large (≥30).
Example: Suppose you have purchased 10 lottery tickets, and each ticket either wins or does not win. You can then use the
binomial distribution to answer a question such as: what is the probability that exactly 6 of the 10 tickets win?
3. The outcome of each trial is independent. In other words, the outcome of one trial does not affect the
probability of the outcome of any other trial.
In a binomial distribution, if the number of trials for a given experiment is equal to 1, then it is
called the Bernoulli distribution.
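The lottery question can be sketched with the binomial probability formula. The slide gives no win probability, so `P_WIN = 0.5` below is a purely hypothetical value chosen to keep the arithmetic easy to follow; `binomial_pmf` is an illustrative helper built on `math.comb`.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Hypothetical win probability per ticket (an assumption, not from the slides).
P_WIN = 0.5

# Probability that exactly 6 of the 10 tickets win.
prob_six_wins = binomial_pmf(6, 10, P_WIN)
print(f"P(exactly 6 wins out of 10) = {prob_six_wins:.4f}")

# With a single trial (n = 1) the binomial reduces to the Bernoulli distribution:
# P(success) = p and P(failure) = 1 - p.
print("Bernoulli:", binomial_pmf(1, 1, P_WIN), binomial_pmf(0, 1, P_WIN))
```

Replacing `P_WIN` with a realistic value changes only the numbers, not the method.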
Uniform Distribution
Percentile: the Pth percentile is the value for which P percent of the observations are less than
that value and (100 – P)% are greater than that value.
If you scored in the 60th percentile on the GMAT, 60% of the other scores were below yours
and 40% were above yours.
Quartiles
We have special names for the 25th, 50th, and 75th percentiles, namely quartiles.
We can also convert percentiles into quintiles (fifths) and deciles (tenths).
Commonly Used Percentiles
First (lower) decile = 10th percentile
First (lower) quartile, Q1 = 25th percentile
Second (middle) quartile, Q2 = 50th percentile
Third quartile, Q3 = 75th percentile
Ninth (upper) decile = 90th percentile
Note: If your exam mark places you in the 80th percentile, that doesn’t mean you scored 80%
on the exam. It means that 80% of your peers scored lower than you did; it is about
your position relative to others.
Location of Percentiles
The following formula allows us to approximate the location of any percentile:
Lp = (n + 1)(P/100)
where Lp is the location of the Pth percentile and n is the number of observations.
Recall the data from Example 4.1: 0 0 5 7 8 9 12 14 22 33. Where is the 25th percentile
located? That is, at which point are 25% of the values lower and 75% of the values
higher?
L25 = (10+1)(25/100) = 2.75
The 25th percentile is three-quarters of the distance between the second (which is 0) and the third (which is 5) observations.
Three-quarters of the distance is: (0.75)(5 – 0) = 3.75. Because the second observation is 0, the 25th percentile is 0 + 3.75 = 3.75.
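The worked example above can be sketched in code. The function names are illustrative, and the interpolation step assumes the same convention as the example: take the fractional part of the location and move that share of the distance between the two neighbouring observations.

```python
def percentile_location(n, p):
    """Location (1-based position) of the p-th percentile among n ordered observations."""
    return (n + 1) * p / 100


def percentile_value(data, p):
    """Interpolate between the observations on either side of the location."""
    ordered = sorted(data)
    n = len(ordered)
    location = percentile_location(n, p)
    lower_pos = int(location)        # 1-based position just below the location
    fraction = location - lower_pos  # share of the distance to the next observation
    if lower_pos < 1:                # location falls before the first observation
        return float(ordered[0])
    if lower_pos >= n:               # location falls at or after the last observation
        return float(ordered[-1])
    lower = ordered[lower_pos - 1]
    upper = ordered[lower_pos]
    return lower + fraction * (upper - lower)


hours = [0, 0, 5, 7, 8, 9, 12, 14, 22, 33]  # Example 4.1 data
print("L25 =", percentile_location(len(hours), 25))
print("25th percentile =", percentile_value(hours, 25))
print("L75 =", percentile_location(len(hours), 75))
print("75th percentile =", percentile_value(hours, 75))
```

Note that the location and the percentile value are separate quantities: the first is a position in the ordered data, the second is a value on the data's own scale.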
Please remember…
[Figure: the ordered data 0 0 | 5 7 8 9 12 14 | 22 33, marking position 2.75 (value 3.75) and position 8.25 (value 16)]
Lp determines the position in the data set where the percentile value lies, not the value of the
percentile itself.
Interquartile Range
The quartiles can be used to create another measure of variability, the interquartile range,
which is defined as follows:
Interquartile Range = Q3 – Q1
The interquartile range measures the spread of the middle 50% of the observations. Large
values of this statistic mean that the 1st and 3rd quartiles are far apart, indicating a high level of
variability.
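A minimal sketch of the interquartile range for the Example 4.1 data follows. It relies on `statistics.quantiles` with `method="exclusive"`, which, for this data set, places the quartiles using the same (n + 1)(P/100) location rule used in these slides; treating that as the intended convention is an assumption.

```python
import statistics

hours = [0, 0, 5, 7, 8, 9, 12, 14, 22, 33]  # Example 4.1 data

# n=4 splits the data into quarters and returns the three cut points Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(hours, n=4, method="exclusive")

interquartile_range = q3 - q1
print("Q1 =", q1, " Q2 =", q2, " Q3 =", q3)
print("Interquartile range =", interquartile_range)
```

Other software may use a different quartile convention and report slightly different values for the same data.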
Box Plots
The box plot is a technique that graphs five statistics:
the minimum and maximum observations, and
the first, second, and third quartiles.
[Figure: box plot with the whiskers labeled 1.5*(Q3–Q1)]
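The five statistics and the 1.5*(Q3–Q1) whisker limit can be sketched as follows. The function names are hypothetical, the quartiles assume the (n + 1)(P/100) location rule via `statistics.quantiles(..., method="exclusive")`, and treating points beyond the whisker limit as outliers is a common convention rather than something stated on the slide.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return min(values), q1, median, q3, max(values)


def beyond_whiskers(values):
    """Observations farther than 1.5 * (Q3 - Q1) below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(values)
    reach = 1.5 * (q3 - q1)  # maximum whisker length
    return [v for v in values if v < q1 - reach or v > q3 + reach]


hours = [0, 0, 5, 7, 8, 9, 12, 14, 22, 33]  # Example 4.1 data
hours_with_outlier = [133 if h == 33 else h for h in hours]

print("five-number summary:", five_number_summary(hours))
print("beyond whiskers (original):", beyond_whiskers(hours))
print("beyond whiskers (133 instead of 33):", beyond_whiskers(hours_with_outlier))
```

Plotting libraries draw the same five statistics; the sketch only shows where the numbers come from.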
Example
A large number of fast-food restaurants with drive-through windows offer drivers and their
passengers the advantage of quick service. To measure how good the service is, an
organization called QSR planned a study in which the amount of time taken by a sample of
drive-through customers at each of five restaurants was recorded. Compare the five sets of
data using box plots and interpret the results.
Example
Wendy’s service time is shortest and least variable.
Hardee’s has the greatest variability, while Jack-in-the-Box
has the longest service times.
Thank you