Hypothesis Testing: Proportions and Means
In this chapter, we will learn the concept of statistical testing. Let us start with a simulated
example to understand why the idea of “statistical testing” arises, and how we
formulate and carry out a test.
Game: Let’s play a betting game. My bag has 10 small balls (8 red, 2 blue) and I will bet
3 people a candy bar that a blue ball will come up. I know my chances aren’t very good,
but I will take them anyway ...
• Does it seem reasonable if the bag really contains 8 red balls and 2 blue balls that I
would win 3 times in 3 trials?
• The actual chance of me winning 3 times (if the bag really contains what I claim) is
.........
Moral of the story: An outcome that would rarely happen if a claim (hypothesis) were true
is good evidence that the claim is not true.
2 DS 1 Lecture Notes
Example: Football game. Every week, two teams (one made up of husbands, the other
made up of escaped convicts: the “In-laws” and the “Outlaws”) get together and play a
football game. They decide who receives the kickoff by a flip of a “fair” coin provided by
the In-laws (heads = Outlaws receive, tails = In-laws receive). Well, after losing the coin
toss for a long time, the Outlaws have become suspicious!
Being the well-read data analysts that they are, the Outlaws now decide to see if the “fair”
coin really is fair (they have a little trouble stealing it from the In-laws, but they get it
eventually!). They then flip the coin 100 times on the sidelines, and get 35 heads. The Outlaws
expected ...... heads but got 35 heads. They are furious! Two possible explanations:
• The coin is fair and this outcome just happened by chance (bad sample!)
• For a fair coin, this outcome is so unlikely (extreme) that we can conclude that the
coin is not fair.
Null Hypothesis ($H_0$): The chance of heads = 50%. (This is the statement of “status quo”:
it says the observed outcome is different from the expected one just by chance variation.)
Or, $p = 0.5\ (= p_0)$.
Alternative Hypothesis ($H_A$, also written $H_1$): The chance of heads < 50%. (This is the statement we’d like
to prove. It says chance variation is not enough to explain the outcome.)
To know more about null and alternative hypotheses, we go to the next section.
The Null Hypothesis is an existing/status-quo, usually strong or preferred belief. One usually
doesn’t want to change this belief unless there is compelling evidence to do so. When
testing the effect ($\mu$) of a new drug, the null hypothesis could be $H_0: \mu \le 0$, i.e. a conservative
belief that the drug has no effect or a negative effect. We don’t want to risk a patient’s health
with a new drug unless there is strong evidence to believe it will work. Based on the context,
the null hypothesis could also be $H_0: \mu = 0$, i.e. the expected or mean effect of the drug is 0. For
example, the drug may be intended to either decrease sugar level (when high) or increase
it (when low). In the Football game example, $H_0: p = 1/2$.
The Alternative Hypothesis depends on what we are trying to find evidence for. In the drug
example, we may want to find evidence for whether the expected effect of the drug is positive.
In that case, $H_1: \mu > 0$, which is called a one-sided alternative. If we are trying to find out
if there is any effect at all, i.e. the expected effect of the drug is non-zero, then $H_1: \mu \neq 0$. This is
a two-sided alternative. In the Football game example, $H_1: p < 1/2$.
Philosophy of Hypothesis Test: “Innocent until proven guilty”. Assume the null hypothesis
to be true and see if there is convincing evidence in the data to prove otherwise. We
decide between the null and alternative hypotheses based on two things:
• how different the observed outcome is from the expected outcome if the null
hypothesis is true (test statistic)
• what the chance of that type of difference occurring is if the null hypothesis is true.
If such a difference is very unlikely to occur if the null hypothesis is true (that is, the chance of
that type of difference occurring, if the null hypothesis is true, is negligible), then we reject the
null hypothesis.
Back to Football game example: As we are dealing with proportions, our test statistic
under $H_0$ is
$$Z = \frac{\hat{p} - p_0}{\widehat{SE}(\hat{p})} = -3.$$
So, the observed outcome is 3 SDs below the expected outcome. Is this an unlikely (extreme)
outcome?
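The arithmetic behind this value can be checked with a minimal Python sketch. It assumes the standard error is computed under the null as $\sqrt{p_0(1-p_0)/n}$; the variable names are illustrative and not part of the notes.

```python
import math

n = 100          # number of tosses
p0 = 0.5         # chance of heads under H0
p_hat = 35 / n   # observed proportion of heads

# Standard error of the sample proportion, computed under H0 (assumption)
se = math.sqrt(p0 * (1 - p0) / n)

# How many standard errors the observed proportion lies from the expected one
z = (p_hat - p0) / se
print(round(z, 2))
```

A negative value means the observed proportion lies below the expected one.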
Question: What is the chance of getting such an extreme (35% heads in 100 tosses) or
more extreme outcome (less than 35%) if $H_0$ is true? Ans. 0.00135.
P-value: The chance of getting a test statistic as extreme or more extreme than what we
observed if $H_0$ is true.
We note that “extreme” is in the direction of $H_1$. This p-value tells us: If $H_0$ is true (i.e.,
the coin is fair), then the chance of getting 35% or fewer heads in 100 tosses is 0.00135. This
tells us that if $H_0$ is true, the chance of getting a test statistic as extreme as -3 is very low.
As our p-value is so small, we conclude that we have enough evidence to reject the null.
That is, $H_0$ is not a reasonable explanation of the observed data.
For the p-value, how small is small enough to reject $H_0$? This cutoff is called the “significance level”
and is denoted by $\alpha$. Typical choices of $\alpha$ are 0.05 or 0.01.
• Use the p-value and the given significance level to draw the conclusion: If p-value
$\le \alpha$, reject $H_0$. If p-value $> \alpha$, do not reject $H_0$.
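The p-value and the decision rule for the football example can be sketched in a few lines of Python. This is only an illustration: it assumes the normal approximation for the test statistic and writes the standard normal CDF with `math.erf`; the helper name `normal_cdf` is ours.

```python
import math

def normal_cdf(x):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

z = -3.0        # test statistic from the football example
alpha = 0.05    # significance level (0.01 is the other typical choice)

# H1 says the chance of heads is below 50%, so "extreme" means the lower tail
p_value = normal_cdf(z)

decision = "reject H0" if p_value <= alpha else "do not reject H0"
print(round(p_value, 5), decision)
```

For a two-sided alternative, both tails would count as extreme, so the tail area would be doubled.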
Example: Suppose SingTM and Xilingo have just started their business in Laos and currently
have a limited customer base since the e-commerce market is still underdeveloped.
Only 30 individuals are using SingTM currently. If we want to test $H_0: p = 1/3$ vs.
$H_1: p \neq 1/3$ at a significance level of $\alpha = 0.5$, then we will reject $H_0$ whether the
difference between the sample proportion and the population proportion assumed under the null
hypothesis (i.e. $p = 1/3$) is large or small.
Truth (unknown)     Decision: Reject $H_0$             Decision: Do not reject $H_0$
$H_0$ is true       Wrong decision (Type I error)      Correct decision
$H_1$ is true       Correct decision                   Wrong decision (Type II error)
A suspect is brought to the court - “presume innocent until proven guilty.” So, $H_0$: Person
is innocent vs. $H_1$: Person is guilty. $H_0$ is rejected (i.e., the suspect is convicted) only if
there is strong evidence against his/her innocence. Otherwise, $H_0$ is not rejected (i.e., the
suspect is acquitted).
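Note, we cannot eliminate the possibility of errors because our decision is based on a sample,
and not the whole population. But we can have some control over the chances of making
errors. A type I error is rejecting $H_0$ when it is true (convicting an innocent person); a type II
error is not rejecting $H_0$ when $H_1$ is true (acquitting a guilty person). If we try to decrease
P(type I error) (by concluding that persons under trial are innocent
more often), what happens to P(type II error)?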
Note: Type I error is typically considered more severe and so its chance is controlled by
setting $\alpha$ to be small. After setting $\alpha$ to be small, studies are designed in such a way
that the chance of a type II error is minimized.
Interpretation of P-value: The P-value is NOT the probability that $H_0$ is true. $H_0$ is either true
or not true; it does not vary from sample to sample. The P-value tells how likely it is to get the
observed sample (or something more extreme) if $H_0$ is true. The smaller the P-value, the stronger
the evidence against $H_0$.
Fact: P(Type I error) = $\alpha$ (the significance level used).
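When the test is carried out with the normal approximation, this fact holds only approximately. The sketch below checks it for the coin example by summing exact Binomial(100, 0.5) probabilities over all head counts that lead to rejection. It assumes a lower-tail test at $\alpha = 0.05$ with the table value $z_{0.05} = 1.64$; the helper `binom_pmf` is ours.

```python
import math

n, p0 = 100, 0.5
z_alpha = 1.64   # table value for alpha = 0.05, one-sided (assumption)
se = math.sqrt(p0 * (1 - p0) / n)

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

# Add up the H0 probabilities of every head count k for which we would reject
type1 = sum(
    binom_pmf(k, n, p0)
    for k in range(n + 1)
    if (k / n - p0) / se < -z_alpha
)
print(round(type1, 3))
```

The result should come out close to, but not exactly equal to, 0.05 (about 0.044 here), because the head count is discrete while the cutoff comes from a continuous approximation.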
We reiterate that Type II Error is the error committed by rejecting the alternative $H_1$ when
it is actually true. The probability of Type II Error (denoted by $\beta$) depends on the particular
value of the population proportion (say $p_1$) in $H_1$. P(Type II Error), or the $\beta$ function, is defined
as
$$\beta(p_1) = P_{p_1}(\text{do not reject } H_0).$$
Power of the test is the probability of accepting $H_1$ when it is true. Power is directly
related to the probability of type II error, or the $\beta$ function. Thus, $\text{Power}(p_1) = 1 - \beta(p_1)$.
A higher power is desirable for a test.
Question: In the SingTM (in Laos) example, for the two-sided test $H_0: p = 1/3$ vs.
$H_1: p \neq 1/3$, at the 5% level, suppose the rejection rule was: Reject $H_0$ if the value of $\hat{p}$ is
outside the interval [0.16, 0.50]. Compute the power of the two-sided test for $p_1 = 2/3$.
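One way to approach this question is sketched below. It assumes a sample of size $n = 30$ (the number of SingTM users in the example) and the CLT-based normal approximation for $\hat{p}$ when $p = p_1$; both are assumptions of the sketch, and the helper `normal_cdf` is ours.

```python
import math

def normal_cdf(x):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

n = 30                      # assumed sample size
p1 = 2 / 3                  # alternative value at which power is wanted
lower, upper = 0.16, 0.50   # H0 is not rejected when p_hat lies in [lower, upper]

# Spread of p_hat when the true proportion is p1
se1 = math.sqrt(p1 * (1 - p1) / n)

# Power = P(p_hat < lower) + P(p_hat > upper), computed under p = p1
power = normal_cdf((lower - p1) / se1) + (1 - normal_cdf((upper - p1) / se1))
beta = 1 - power   # probability of a Type II error at p1
print(round(power, 3), round(beta, 3))
```

Under these assumptions the power comes out at roughly 0.97, almost all of it from the upper tail, since $p_1 = 2/3$ lies well above the interval.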
Power can be computed for different values of $p_1$. Figure 5.2 plots the power function for various
values of $p$. Power is higher for values further away from the null value $p_0 = 1/3$. This means
that for the given sample size, the test will be more powerful in differentiating the null ($p_0$) from
alternatives ($p_1$) that are further away from $p_0$. The different types of errors in statistical
testing can be visualized as in Table 5.1. Note that it is usually not possible to minimize
both errors simultaneously. The idea in testing is therefore to fix a small probability for Type
I error ($\alpha$) to protect the preferred or strong belief and then obtain a test that maximizes
power at the given level. The statistical theory of inference (Neyman-Pearson Lemma) enables
one to construct such tests. We just mention here that the tests we use in the course are
justified in theory.
So far, we used the CLT-based normal approximation for $\hat{p}$. When $n$ is small, the CLT
approximation may not hold. We could use the actual sampling distribution based on
Binomial($n$, $p_0$). Then we find the smallest $c$ so that $P(|\hat{p} - p_0| > c) \le \alpha$. The P-value and power
can also be obtained using the actual sampling distribution.
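The search for the smallest $c$ can be sketched as follows. The function name `smallest_cutoff` and the use of exact fractions for the distances $|k/n - p_0|$ are choices of this sketch, not part of the notes; the search looks only at distances that $\hat{p}$ can actually attain.

```python
import math
from fractions import Fraction

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def smallest_cutoff(n, p0, alpha):
    """Smallest c with P(|p_hat - p0| > c) <= alpha under Binomial(n, p0)."""
    # Exact distances |k/n - p0| avoid floating-point ties between the two tails
    dist = [abs(Fraction(k, n) - p0) for k in range(n + 1)]
    pmf = [binom_pmf(k, n, float(p0)) for k in range(n + 1)]
    # Try the attainable distances from smallest to largest
    for c in sorted(set(dist)):
        tail = sum(pr for d, pr in zip(dist, pmf) if d > c)
        if tail <= alpha:
            return c, tail

# Illustrative values: n = 30, p0 = 1/3, alpha = 0.05
c, tail = smallest_cutoff(30, Fraction(1, 3), 0.05)
print(c, round(tail, 3))
```

With these illustrative values the search returns $c = 1/6$, so $H_0$ is rejected when $\hat{p}$ falls outside roughly $[0.17, 0.50]$, and the exact Type I error probability is below 0.05 rather than equal to it.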
For a one-sided test, the formulation of the test and the computations are similar to those for a two-sided test,
with obvious modifications. A summary is given below.
• A one-sided right or upper-tail test is formulated as: $H_0: p \le p_0$ vs. $H_1: p > p_0$. This
is to be tested based on an i.i.d. sample of size $n$.
• We fix the significance level of the test at $\alpha$. Note that in a one-sided test, the significance
level is to be taken as the probability of Type I Error when $p = p_0$, i.e. the
boundary value.
• Since the alternative is one-sided and upper-tailed, the test would take the form:
Reject $H_0$ if $Z > z_\alpha$ or, equivalently, $\hat{p} - p_0 > c$.
• The power of the test for any $p_1 > p_0$ can be computed by $\text{Power}(p_1) = 1 - P_{p_1}(\hat{p} - p_0 \le c)$.
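The summary above can be turned into a short calculation. The sketch below assumes the normal approximation, takes the cutoff as $c = z_\alpha \sqrt{p_0(1-p_0)/n}$ (fixed at the boundary value $p = p_0$), and uses illustrative values $p_0 = 1/3$, $n = 30$, $\alpha = 0.05$; the function name `upper_tail_power` is ours.

```python
import math

def normal_cdf(x):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def upper_tail_power(p0, p1, n, z_alpha):
    """Power at p = p1 of the rule 'reject H0 if p_hat - p0 > c'."""
    c = z_alpha * math.sqrt(p0 * (1 - p0) / n)   # cutoff set at the boundary p = p0
    se1 = math.sqrt(p1 * (1 - p1) / n)           # spread of p_hat when p = p1
    # Power(p1) = 1 - P_{p1}(p_hat - p0 <= c)
    return 1 - normal_cdf((p0 + c - p1) / se1)

# Illustrative values only: p0 = 1/3, n = 30, z_0.05 = 1.64
for p1 in (0.4, 0.5, 0.6):
    print(p1, round(upper_tail_power(1 / 3, p1, 30, 1.64), 3))
```

Setting $p_1 = p_0$ in this function returns approximately $\alpha$, which matches the remark that the significance level is the Type I error probability at the boundary.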
Exercise: Answer the questions below for $H_0: p \ge 1/3$ vs. $H_1: p < 1/3$. For reference,
$z_{0.05} = 1.64$, $z_{0.01} = 2.33$, $z_{0.025} = 1.96$, $z_{0.005} = 2.58$.
(a) What is the P-value when the computed value in the sample is (i) $\hat{p} = 0.1$ (ii) $\hat{p} = 0.25$ (iii)
$\hat{p} = 0.5$?
(c) In (i), (ii) and (iii), what is the outcome of the testing at the 1% level of significance?
So far, we mainly looked at problems involving the population proportion ($p$). The ideas can
be easily extended to formulate and carry out a test for the population mean $\mu$. They
can be summarized as follows.
• Test Statistic: $\hat{\mu} = \bar{x}$, and the approximate sampling distribution can be obtained via
the CLT.
(i) When $\sigma$ is known, $Z = \dfrac{\bar{x} - \mu_0}{\sqrt{\sigma^2/n}} \sim N(0, 1)$ approximately for large $n$.
(ii) When $\sigma$ is unknown, we use $\hat{\sigma}^2 = s^2$ instead of $\sigma^2$. If $x_1, x_2, ..., x_n$ come from a
distribution that is approximately normal, then $T = \dfrac{\bar{x} - \mu_0}{\sqrt{s^2/n}} \sim t_{n-1}$.
$t_{n-1}$ denotes the t-distribution with $(n-1)$ degrees of freedom (d.f.). Here, $T$ is called the
“t-statistic”. So, depending on whether $\sigma^2$ is known or unknown, we use one of the above null
distributions.
Note: You need to be able to look up values in the t table and the Normal table in the textbook.
Exercise: Suppose we are interested in demonstrating that people pay less on eBay than
on regular online purchases. A particular product is chosen to demonstrate this. It is found
that Amazon charged a price of USD 46.99 for this product. A sample of 52 eBay auction
prices during the same period for the same product was recorded. Their average was USD
44.17 with a standard deviation of USD 4.15.
2. What is the p-value? Is there enough evidence for eBay’s claim at the 1% level of
significance?
3. Suppose the standard deviation of the population was known (i.e. $\sigma = 4.15$), what
would you do differently?
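As a starting point for this exercise, the test statistic can be computed as below. The sketch assumes the hypotheses are $H_0: \mu \ge 46.99$ vs. $H_1: \mu < 46.99$ (they are not written out above, so this is our reading). The same number serves as $T$ when $\sigma$ is unknown and as $Z$ when $\sigma = 4.15$ is treated as known; only the reference distribution changes. The normal p-value shown applies to the known-$\sigma$ case; for the unknown-$\sigma$ case one would use the t table with 51 d.f.

```python
import math

def normal_cdf(x):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

mu0 = 46.99    # Amazon price, used as the null value (assumed reading of H0)
x_bar = 44.17  # average of the eBay auction prices
s = 4.15       # sample standard deviation
n = 52

se = s / math.sqrt(n)        # same as sqrt(s^2 / n)
stat = (x_bar - mu0) / se    # T if sigma is unknown, Z if sigma = 4.15 is known

# Known-sigma case only: lower-tail p-value from the standard normal
p_value_z = normal_cdf(stat)
print(round(stat, 2), p_value_z < 0.01)
```

The statistic is far below $-z_{0.01} = -2.33$, so under these assumptions the known-$\sigma$ test rejects $H_0$ at the 1% level.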
Note that we could also have stated the test rule in the above question in terms of a 99%
confidence interval. So, there is some relation between confidence intervals and testing. We
would reject $H_0$ if
$$|T| = \frac{|\bar{x} - \mu_0|}{\sqrt{s^2/n}} > t_{0.005,51}.$$
Equivalently, we reject $H_0$ if $|\bar{x} - \mu_0| > t_{0.005,51}\sqrt{s^2/n}$.
Equivalently, we reject $H_0$ if $\mu_0$ does not belong to $\left(\bar{x} - t_{0.005,n-1}\sqrt{s^2/n},\ \bar{x} + t_{0.005,n-1}\sqrt{s^2/n}\right)$.
We note that $\bar{x} = 44.17$, $s = 4.15$, $n = 52$. So, we reject $H_0$ at the 1% level of significance if $\mu_0$ does not
belong to the interval [42.63, 45.71], i.e. if $\mu_0$ is not contained in the 99% confidence interval.
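The interval can be reproduced with a few lines of Python. The critical value $t_{0.005,51} \approx 2.676$ is hard-coded here as an assumed table look-up, since the standard library has no t-distribution; the null value 46.99 is the Amazon price from the exercise.

```python
import math

x_bar, s, n = 44.17, 4.15, 52
t_crit = 2.676   # assumed table value of t_{0.005, 51}

# Half-width of the 99% confidence interval: t * sqrt(s^2 / n)
half_width = t_crit * math.sqrt(s**2 / n)
lower, upper = x_bar - half_width, x_bar + half_width
print(round(lower, 2), round(upper, 2))

# Two-sided test at the 1% level: reject H0 when mu0 falls outside the interval
mu0 = 46.99
reject = not (lower <= mu0 <= upper)
print(reject)
```

Checking whether $\mu_0$ lies in the interval gives the same decision as comparing $|T|$ with $t_{0.005,51}$, which is the equivalence described above.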
A similar connection exists between one-sided tests and one-sided confidence intervals. A
one-sided interval of 95% confidence will be either of the form $[L, \infty)$ or $(-\infty, U]$, where $L$ or
$U$ will be chosen so that the probability is 0.95.