Hypothesis Testing

Uploaded by

MishikaKumar

Async 1a – Hypothesis Testing

Hypothesis Testing Intuition


Coin Flipping
Intuitive Hypothesis Testing
Is a coin fair (P(H) = 0.5)? Suppose we get:

Case 1: 55 heads in 100 tosses? Probably fair:
outcome not unlikely (prob = 0.3173)

Case 2: 85 heads in 100 tosses? Probably not fair:
outcome highly unlikely (prob = 0.0000000000026)
Rare events only happen rarely, and they don’t happen to me!

Case 3: 38 heads in 100 tosses? Not immediately clear
(prob = 0.0164)
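The three probabilities quoted for the coin cases can be reproduced, to the precision shown, with a two-sided normal approximation to the number of heads. That is an inference from the numbers, not something the slides state; the function name below is made up for this sketch.

```python
import math

def two_sided_normal_prob(heads, tosses, p=0.5):
    """Approximate probability of a result at least this far from the
    expected count, in either direction, under a normal approximation."""
    mean = tosses * p
    sd = math.sqrt(tosses * p * (1 - p))
    z = (heads - mean) / sd
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))

for heads in (55, 85, 38):
    print(heads, two_sided_normal_prob(heads, 100))
```

An exact binomial calculation would give slightly different values; the normal approximation is used here only because it matches the figures on the slide.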
Formalizing Hypothesis Testing
We establish two hypotheses
1) (Null) H0: coin is fair (p = P(H) = 0.5)
2) (Alternative) Ha: coin isn’t fair (p ≠ 0.5)

• Accept H0 unless you see strong contrary evidence.


• (i.e., evidence to convince you beyond a reasonable doubt that H0
isn’t true.)

• “Reasonable doubt” is quantified by α, the “significance level of the test.”


• Typical values of α are 1% and 5%.

• “Difference” vs. “significant difference”:


• 55 heads in 100 tosses is different from the expected value
(which is 50 heads).
• 85 heads in 100 tosses is significantly different.
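As a rough illustration of how α turns “different” into “significantly different”, the sketch below applies the rule “reject H0 when the probability is below α” to the three coin cases, using the probabilities quoted earlier. The function name is hypothetical.

```python
def is_significant(prob_value, alpha):
    """Return True when the observed result is rarer than the chosen alpha."""
    return prob_value < alpha

# Probabilities quoted for 55, 85 and 38 heads in 100 tosses.
cases = {55: 0.3173, 85: 0.0000000000026, 38: 0.0164}

for alpha in (0.05, 0.01):
    for heads, prob in cases.items():
        verdict = "significant" if is_significant(prob, alpha) else "not significant"
        print(f"alpha={alpha}: {heads} heads is {verdict}")
```

With these numbers, 38 heads is significant at α = 5% but not at α = 1%, which is one way to read “not immediately clear.”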
Hypothesis Testing
Motivating Examples
Hypothesis Testing Motivation
You introduced a new marketing campaign. Are sales
significantly higher since you started the campaign? Should
you continue the campaign?

You manage a fund benchmarked to the S&P 500.
Recently it’s done better than the S&P 500. Can you claim
your fund significantly outperforms the benchmark?

You received performance summaries of all your
employees. Did Bob commit significantly more errors than
the average? Should you fire Bob?
Kapow! Introduction
Hypothesis Testing for
Means Example: Kapow!
Kapow! Case
Quality control has recently become an issue at Kapow!,
with some customers complaining that many of the bottles
appear to be underfilled. Beatrix has become concerned
that her bottling facility may not be filling the quart-sized
jars reliably and faithfully. She recently took a course in
statistics, and she decided to put some of this knowledge to
work.
Kapow! Case
Beatrix decided to calibrate the measuring cup to measure
out on average μ = 32.20 ounces of beverage into each
bottle… Beatrix chose the value 32.20 to be on the safe
side. She thinks that it is reasonable to assume that the
volume of product in individual bottles follows a normal
distribution. Assuming a standard deviation of σ = 0.10
ounces, setting the average to μ = 32.20 ounces means
that approximately 98% of the individual bottles will contain
at least the advertised 32 ounces.
Kapow! Case
Beatrix decided to randomly sample 25 bottles of beverage
product every morning, calculate the sample mean of the
number of ounces per jar, and then decide whether the
measuring cup needed recalibration based on the value of X-bar.

Unfortunately, there was no guarantee that the process
average would stay at the desired level for more than one
day… Therefore she planned to evaluate the process using
a new sample each morning.
The Bottling Process
The Bottling Process
• Goal: μ = 32.20 ounces
• Why?
• Assume σ = 0.10 ounces
• Assume an approximately Normal population distribution
• So, if μ = 32.20 ounces, then approximately 98% of the individual
jars will contain at least 32 ounces of beverage product.
(P(X ≥ 32) = 1 – NORM.DIST(32, 32.2, 0.10, true) = 0.9772)
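The spreadsheet formula above can be checked outside a spreadsheet. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`, with the mean 32.20 and standard deviation 0.10 assumed on the slide; the variable names are illustrative.

```python
from statistics import NormalDist

# Fill volume of an individual bottle, assumed Normal(mean=32.20, sd=0.10).
fill = NormalDist(mu=32.20, sigma=0.10)

# P(X >= 32) = 1 - P(X <= 32), the analogue of 1 - NORM.DIST(32, 32.2, 0.10, true).
prob_at_least_32 = 1 - fill.cdf(32)
print(round(prob_at_least_32, 4))
```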
[Figure: Amounts in the individual bottles (horizontal axis: 31.8 to 32.6 ounces)]

Note: this is the last time we’ll talk about the amounts in the individual jars.
The Bottling Process

[Figure: The measuring cup, showing the mark made by the grease pencil
so that X-bar, on average, equals µ = 32.20]


Whole Foods Under Investigation for
Overcharging in NYC
Washington Post, June 25, 2015

“Its nickname is “Whole Paycheck”—but Whole Foods’ high prices were generally thought to be part of
its luxury mystique, not wrongdoing or mislabeling.

But now that New York City’s Department of Consumer Affairs (DCA) is investigating the grocery chain
for ‘systematic overcharging for pre-packaged foods,’ that may change.”

“The agency says it tested 80 different types of prepackaged foods at New York Whole Foods outlets
and found that all had mislabeled weights. The U.S. Department of Commerce says a package can
deviate from its stated weight by only so much, according to DCA; 89% of the packages DCA tested did
not meet this standard.”

Alleged overcharges:
• Vegetable platters: $20 per package, including $6.15 overcharge on average
• Chicken tenders: $9.99 per pound, including $4.13 overcharge on average per package
• Berries: $8.58 per package, including $1.15 overcharge on average
Base your conclusions on the following
sample that was taken this morning.

OBS  OUNCES     OBS  OUNCES
  1  32.03       14  32.25
  2  32.26       15  32.02
  3  32.16       16  32.04
  4  32.20       17  32.10
  5  32.12       18  32.22
  6  32.39       19  32.34
  7  32.15       20  32.08
  8  32.12       21  31.98
  9  32.24       22  32.16
 10  32.13       23  32.26
 11  31.93       24  32.20
 12  32.11       25  32.17
 13  32.31

Summary statistics for OUNCES
Mean                       32.1588
Standard Error             0.02244
Median                     32.16
Mode                       32.20
Standard Deviation         0.11219
Sample Variance            0.01259
Kurtosis                   -0.21531
Skewness                   0.01188
Range                      0.46
Minimum                    31.93
Maximum                    32.39
Sum                        803.97
Count                      25
Confidence Level (95.0%)   0.04631
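The main summary figures can be recomputed from the 25 observations. The sketch below uses only Python's standard library; the variable names are illustrative, and the standard error is taken to be the sample standard deviation divided by the square root of n.

```python
import math
import statistics

# The 25 observations (ounces) from this morning's sample.
ounces = [
    32.03, 32.26, 32.16, 32.20, 32.12, 32.39, 32.15, 32.12, 32.24,
    32.13, 31.93, 32.11, 32.31, 32.25, 32.02, 32.04, 32.10, 32.22,
    32.34, 32.08, 31.98, 32.16, 32.26, 32.20, 32.17,
]

n = len(ounces)
mean = statistics.mean(ounces)
median = statistics.median(ounces)
sd = statistics.stdev(ounces)        # sample standard deviation (n - 1 in the denominator)
std_error = sd / math.sqrt(n)        # estimated standard error of X-bar

print(n, round(mean, 4), median, round(sd, 5), round(std_error, 5))
```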
The Bottling Process
• Thomas just took a sample of n = 25 and got X-bar = 32.1588.

• This is smaller than the goal of μ = 32.20.


• Is this just normal variation?
• OR, is this cause for concern?

• If X-bar is “close” to 32.20, conclude that this is just normal
variation. If X-bar is “far” from 32.20, then shut down and
recalibrate.

• Assuming that the cup is calibrated properly, X-bar is a random
sample of one value from a probability distribution with
mean $\mu_{\bar{X}} = \mu$ and
standard error $\sigma_{\bar{X}} = \sigma / \sqrt{n}$.
• We will estimate σ with s and use the t distribution.
[Figure: Distribution of X-bar in ounces of beverage product, assuming µ = 32.20 (horizontal axis: X-bar, 32.12 to 32.28)]

$t_{OBS} = \dfrac{\bar{X} - \mu_{\bar{X}}}{s_{\bar{X}}} = \dfrac{\bar{X} - 32.20}{0.02244}$

[Figure: Distribution in standardized values (horizontal axis: t, -4.00 to 4.00)]
Give Me a Break!
Which Is the Null Hypothesis?
• The Alternative Hypothesis is the one you wish to detect.
• Words such as “significantly different from,” or
“significantly greater than,” or “significantly less than” are
keys in defining the Alternative Hypothesis.
• The Null Hypothesis is the one you would accept without
strong evidence that points to the Alternative Hypothesis.
• The Null Hypothesis gets the “benefit of the doubt.”
Evidence for the Alternative Hypothesis must be
convincing.
H0: μ = 32.20 (or μ ≥ 32.20)
Ha: μ < 32.20
Question a will be discussed in Class 5
Question b
Lower-Tailed Test Setup
b) Assume Beatrix wishes to detect whether the
process average is significantly less than 32.20
ounces.

Set up the appropriate hypothesis test if Thomas
wishes to test at a significance level of α = 0.05.

Calculate the test statistic and prob-value of the
test.

What conclusion would you reach?


First Hypothesis Testing Key
Assume the null
hypothesis is true
(ATNHIT).

Second Hypothesis
Testing Key
Draw a picture!
Question b:
Lower-Tailed Test Follow-up
b) H0: μ = 32.20 (or μ ≥ 32.20)
Ha: μ < 32.20

Hypothesis test based on:
$t_{OBS} = (\bar{X} - 32.20) / s_{\bar{X}} = -1.8362$

• ATNHIT (assume the null hypothesis is true)
prob-value = Pr{ t ≤ -1.8362}
= T.DIST(-1.8362,24,true)
= 0.0394

0.0394 < 0.05, so
reject H0.

We have compelling evidence
at the 5% significance level
that the true process mean
is less than 32.20.
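The lower-tailed test can also be checked without a spreadsheet. Python's standard library has no counterpart to T.DIST, so this sketch integrates the t density numerically with Simpson's rule; that choice, and the helper names `t_pdf` and `t_cdf`, are assumptions of this example rather than part of the original material. The inputs are the rounded values from the sample summary.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry about zero.
    `steps` must be even."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

x_bar, mu_0, std_error, n = 32.1588, 32.20, 0.02244, 25
alpha = 0.05

# Rounded inputs give about -1.8360; the slide's -1.8362 uses the unrounded standard error.
t_obs = (x_bar - mu_0) / std_error
p_value = t_cdf(t_obs, n - 1)        # lower tail: Pr{t <= t_obs}
reject_h0 = p_value < alpha

print(round(t_obs, 4), round(p_value, 4), reject_h0)
```

If SciPy is available, `scipy.stats.t.cdf(t_obs, n - 1)` would replace the hand-rolled integration.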
If H0 is true, how rare an event have I seen?

This is captured by the prob-value.


What Can We Really Conclude?
If we reject H0:
• Our observation is not reasonably consistent with the cup being
properly calibrated (μ ≥ 32.20).
• We are pretty sure that the cup needs calibrating (μ < 32.20).

If we don’t reject H0:


• Either (1) the cup is properly calibrated (μ ≥ 32.20),
• Or (2) we don’t have (good) enough data to prove that the cup is
miscalibrated (μ < 32.20).
Question d:
Upper-Tailed Test
d) Suppose that Beatrix wishes to rigorously support
her claim that Kapow! jars contain more than 32
ounces of product on average.

What should be the null and alternative hypotheses


for her test?

H0: μ = 32 (or μ ≤ 32)


Ha: μ > 32
