MIT18_05S14_class18slides

Null Hypothesis Significance Testing

p-values, significance level, power, t-tests


18.05 Spring 2014

January 1, 2017 1 /22


Understand this figure
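[Figure: pdf f(x|H0) of the null distribution plotted over the x-axis; the axis is divided into reject H0 (left tail), don’t reject H0 (middle), and reject H0 (right tail).]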

x = test statistic

f (x|H0 ) = pdf of null distribution = green curve

Rejection region is a portion of the x-axis.

Significance = probability over the rejection region = red area.


Simple and composite hypotheses

Simple hypothesis: the sampling distribution is fully specified. Usually the parameter of interest has a specific value.

Composite hypothesis: the sampling distribution is not fully specified. Usually the parameter of interest has a range of values.

Example. A coin has probability θ of heads. Toss it 30 times and let x be the number of heads.
(i) H: θ = 0.4 is simple. x ∼ binomial(30, 0.4).
(ii) H: θ > 0.4 is composite. x ∼ binomial(30, θ) depends on which value of θ is chosen.
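
The coin example can be sketched numerically. This is a minimal illustration in Python, not part of the original slides; the helper name binomial_pmf and the two sample values θ = 0.5 and θ = 0.7 are assumptions chosen only to show that a composite hypothesis covers many distributions.

```python
from math import comb

def binomial_pmf(k: int, n: int, theta: float) -> float:
    """P(X = k) for X ~ binomial(n, theta)."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)

# Simple hypothesis theta = 0.4: exactly one sampling distribution.
pmf_simple = [binomial_pmf(k, 30, 0.4) for k in range(31)]

# Composite hypothesis theta > 0.4: one distribution per value of theta.
# The two values below are illustrative members only.
pmf_members = {
    theta: [binomial_pmf(k, 30, theta) for k in range(31)]
    for theta in (0.5, 0.7)
}

# Probability of exactly 12 heads under the simple hypothesis.
print(round(pmf_simple[12], 4))
```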



Extreme data and p-values
Hypotheses: H0, HA.

Test statistic: value x, random variable X.

Null distribution: f(x|H0) (assumes the null hypothesis is true).

Sides: HA determines whether the rejection region is one-sided or two-sided.

Rejection region/Significance: P(x in rejection region | H0) = α.

The p-value is a computational tool to check if the test statistic is in the rejection region. It is also a measure of the evidence for rejecting H0.

p-value: P(data at least as extreme as x | H0)

Data at least as extreme: determined by the sidedness of the rejection region.

Extreme data and p-values
Example. Suppose we have the right-sided rejection region shown below. Also suppose we see data with test statistic x = 4.2. Should we reject H0?

[Figure: null pdf f(x|H0) over the x-axis; the critical value cα splits the axis into don’t reject H0 (left) and reject H0 (right), and x = 4.2 lies to the right of cα.]

answer: The test statistic is in the rejection region, so reject H0.

Alternatively: blue area < red area

Significance: α = P(x in rejection region | H0) = red area.
p-value: p = P(data at least as extreme as x | H0) = blue area.
Since p < α, we reject H0.

Extreme data and p-values
Example. Now suppose x = 2.1 as shown. Should we reject H0?

[Figure: null pdf f(x|H0) over the x-axis; the critical value cα splits the axis into don’t reject H0 (left) and reject H0 (right), and x = 2.1 lies to the left of cα.]

answer: The test statistic is not in the rejection region, so don’t reject H0.

Alternatively: blue area > red area

Significance: α = P(x in rejection region | H0) = red area.
p-value: p = P(data at least as extreme as x | H0) = blue area.
Since p > α, we don’t reject H0.
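
The two examples can be checked numerically once a null distribution is fixed. The slides do not say which null distribution the figures show, so the sketch below assumes a hypothetical null N(0, 2^2) and α = 0.05, chosen only so that cα falls between 2.1 and 4.2. It shows that "x is in the rejection region" and "p < α" give the same decision.

```python
from statistics import NormalDist

# ASSUMPTION: hypothetical null distribution and significance level,
# not taken from the slides.
null = NormalDist(mu=0, sigma=2)
alpha = 0.05

# Critical value for a right-sided test: probability alpha lies to its right.
c_alpha = null.inv_cdf(1 - alpha)

def right_sided_p_value(x: float) -> float:
    """P(X >= x | H0) under the assumed null distribution."""
    return 1 - null.cdf(x)

for x in (4.2, 2.1):
    p = right_sided_p_value(x)
    in_rejection_region = x > c_alpha
    # Both criteria lead to the same decision.
    print(x, round(p, 4), in_rejection_region, p < alpha)
```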



Critical values

Critical values:

The boundaries of the rejection region are called critical values.

Critical values are labeled by the probability to their right.

They are complementary to quantiles: c0.1 = q0.9

Example: for a standard normal c0.025 = 1.96 and c0.975 = −1.96.

In R, for a standard normal c0.025 = qnorm(0.975).
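
The same lookup can be sketched in Python, where statistics.NormalDist().inv_cdf plays the role of qnorm. The helper name critical_value is an assumption introduced here for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def critical_value(right_tail_prob: float) -> float:
    """c_p: the point with probability p to its right, i.e. the (1 - p) quantile."""
    return std_normal.inv_cdf(1 - right_tail_prob)

print(round(critical_value(0.025), 2))  # c_0.025
print(round(critical_value(0.975), 2))  # c_0.975
```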



Two-sided p-values
These are trickier: what does ‘at least as extreme’ mean in this case?
Remember the p-value is a trick for deciding if the test statistic is in the rejection region.
If the significance (rejection) probability is split evenly between the left and right tails then

p = 2 min(left tail prob. of x, right tail prob. of x)

[Figure: null pdf f(x|H0) over the x-axis with critical values c_{1−α/2} and c_{α/2}; the regions are reject H0 (left tail), don’t reject H0 (middle), and reject H0 (right tail), and the test statistic x lies between the two critical values.]

x is outside the rejection region, so p > α: do not reject H0
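
The formula p = 2 min(left tail prob. of x, right tail prob. of x) can be sketched as follows. The standard normal null and the test statistic values 1.0 and −2.5 are assumptions made only for illustration.

```python
from statistics import NormalDist

def two_sided_p_value(x: float, null: NormalDist) -> float:
    """Two-sided p-value when the rejection probability is split evenly between tails."""
    left = null.cdf(x)    # left tail probability of x
    right = 1 - left      # right tail probability of x
    return 2 * min(left, right)

# ASSUMPTION: standard normal null distribution and alpha = 0.05.
null = NormalDist()
alpha = 0.05

p_inside = two_sided_p_value(1.0, null)    # x between the critical values
p_outside = two_sided_p_value(-2.5, null)  # x in the left rejection tail

print(round(p_inside, 4), p_inside < alpha)
print(round(p_outside, 4), p_outside < alpha)
```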


Concept question
1. You collect data from an experiment and do a left-sided z-test with significance 0.1. You find the z-value is 1.8.
(i) Which of the following computes the critical value for the rejection region?
(a) pnorm(0.1, 0, 1) (b) pnorm(0.9, 0, 1)
(c) pnorm(0.95, 0, 1) (d) pnorm(1.8, 0, 1)
(e) 1 - pnorm(1.8, 0, 1) (f) qnorm(0.05, 0, 1)
(g) qnorm(0.1, 0, 1) (h) qnorm(0.9, 0, 1)
(i) qnorm(0.95, 0, 1)

(ii) Which of the above computes the p-value for this experiment?

(iii) Should you reject the null hypothesis?

(a) Yes (b) No



Error, significance level and power
                                  True state of nature
                                  H0                  HA
Our decision   Reject H0          Type I error        correct decision
               Don’t reject H0    correct decision    Type II error

Significance level = P(type I error)
                   = probability we incorrectly reject H0
                   = P(test statistic in rejection region | H0)
                   = P(false positive)
Power = probability we correctly reject H0
      = P(test statistic in rejection region | HA)
      = 1 − P(type II error)
      = P(true positive)
• HA determines the power of the test.
• Significance and power are both probabilities of the rejection region.
• Want significance level near 0 and power near 1.

Table question: significance level and power

The rejection region is boxed in red. The corresponding probabilities for different hypotheses are shaded below it.

x                   0     1      2     3     4     5     6     7     8     9     10
H0: p(x|θ = 0.5)    .001  .010   .044  .117  .205  .246  .205  .117  .044  .010  .001
HA: p(x|θ = 0.6)    .000  .002   .011  .042  .111  .201  .251  .215  .121  .040  .006
HA: p(x|θ = 0.7)    .000  .0001  .001  .009  .037  .103  .200  .267  .233  .121  .028

1. Find the significance level of the test.

2. Find the power of the test for each of the two alternative hypotheses.
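
The calculation amounts to summing table entries over the rejection region. The red box does not survive in this text version, so the sketch below assumes a hypothetical two-sided region {0, 1, 2, 8, 9, 10}; substitute the region from the original figure. The function name prob_of_region is also an assumption.

```python
# Probabilities copied from the table, indexed by x = 0..10.
pmf = {
    0.5: [.001, .010, .044, .117, .205, .246, .205, .117, .044, .010, .001],
    0.6: [.000, .002, .011, .042, .111, .201, .251, .215, .121, .040, .006],
    0.7: [.000, .0001, .001, .009, .037, .103, .200, .267, .233, .121, .028],
}

# ASSUMPTION: illustrative rejection region; the boxed region from the
# slide is not recoverable from the text.
rejection_region = {0, 1, 2, 8, 9, 10}

def prob_of_region(theta: float) -> float:
    """P(x in rejection region | theta), read off the table."""
    return sum(pmf[theta][x] for x in rejection_region)

significance = prob_of_region(0.5)                           # computed under H0
power = {theta: prob_of_region(theta) for theta in (0.6, 0.7)}  # computed under each HA

print(round(significance, 3), {k: round(v, 4) for k, v in power.items()})
```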



Concept question

1. The power of the test in the graph is given by the area of

[Figure: pdfs f(x|HA) and f(x|H0) over the x-axis, which is split into a reject H0 region (left) and a non-reject H0 region (right); the areas under the curves are labeled R1, R2, R3 and R4.]

(a) R1 (b) R2 (c) R1 + R2 (d) R1 + R2 + R3



Concept question
2. Which test has higher power?

[Figure: two graphs, top and bottom, each showing pdfs f(x|HA) and f(x|H0) over the x-axis, which is split into a reject H0 region (left) and a do not reject H0 region (right).]

(a) Top graph (b) Bottom graph

Discussion question

The null distribution for test statistic x is N(4, 8^2). The rejection region is {x ≥ 20}.

What are the significance level and power of this test?
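
A minimal sketch of the computation: the significance level needs only the null distribution, while the power needs a specific alternative, which the question does not give. The alternative N(20, 8^2) below is a hypothetical choice used only to show how power would be computed once HA is specified.

```python
from statistics import NormalDist

null = NormalDist(mu=4, sigma=8)

# Significance level: P(x >= 20 | H0).
significance = 1 - null.cdf(20)

def power(alternative: NormalDist) -> float:
    """P(x >= 20 | HA) for a fully specified alternative distribution."""
    return 1 - alternative.cdf(20)

# ASSUMPTION: hypothetical simple alternative, not given in the slides.
hypothetical_alternative = NormalDist(mu=20, sigma=8)

print(round(significance, 4))
print(round(power(hypothetical_alternative), 4))
```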



One-sample t-test
Data: we assume normal data with both µ and σ unknown:
x1, x2, ..., xn ∼ N(µ, σ^2).
Null hypothesis: µ = µ0 for some specific value µ0.
Test statistic:
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$
where
$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2.$$
Here t is the Studentized mean and s^2 is the sample variance.
Null distribution: f(t | H0) is the pdf of T ∼ t(n − 1), the t distribution with n − 1 degrees of freedom.
Two-sided p-value: p = P(|T| > |t|).
R command: pt(x,n-1) is the cdf of t(n − 1).
http://mathlets.org/mathlets/t-distribution/

Board question: z and one-sample t-test

For both problems use significance level α = 0.05.

Assume the data 2, 4, 4, 10 is drawn from a N(µ, σ^2) distribution.

Suppose H0: µ = 0; HA: µ ≠ 0.

1. Is the test one or two-sided? If one-sided, which side?

2. Assume σ^2 = 16 is known and test H0 against HA.

3. Now assume σ^2 is unknown and test H0 against HA.
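
One way to carry out both tests numerically is sketched below, treating the alternative as two-sided (µ different from 0). Python's standard library has no t distribution, so the helpers t_pdf and t_cdf are assumptions introduced here: they approximate the cdf that R's pt provides by integrating the t density with Simpson's rule.

```python
from math import gamma, pi, sqrt
from statistics import NormalDist, mean, variance

def t_pdf(t: float, df: int) -> float:
    """Density of the t distribution with df degrees of freedom."""
    norm_const = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return norm_const * (1 + t * t / df) ** (-(df + 1) / 2)

def t_cdf(t: float, df: int, steps: int = 2000) -> float:
    """Approximate cdf of t(df) via Simpson's rule on [0, |t|]; steps must be even."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # approximates P(0 <= T <= |t|)
    return 0.5 + area if t >= 0 else 0.5 - area

data = [2, 4, 4, 10]
mu0 = 0
n = len(data)
xbar = mean(data)

# z-test: sigma^2 = 16 is treated as known, so sigma = 4.
z = (xbar - mu0) / (4 / sqrt(n))
p_z = 2 * (1 - NormalDist().cdf(abs(z)))

# t-test: sigma is estimated by the sample standard deviation s.
s = sqrt(variance(data))  # variance() divides by n - 1
t_stat = (xbar - mu0) / (s / sqrt(n))
p_t = 2 * (1 - t_cdf(abs(t_stat), n - 1))

print(f"z = {z:.3f}, two-sided p = {p_z:.4f}")
print(f"t = {t_stat:.3f}, two-sided p = {p_t:.4f}")
```

Under these computations the z-test p-value falls below α = 0.05 while the t-test p-value does not, which reflects the heavier tails of t(3).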



Two-sample t-test: equal variances
Data: we assume normal data with µx, µy and (same) σ unknown:
x1, ..., xn ∼ N(µx, σ^2), y1, ..., ym ∼ N(µy, σ^2)

Null hypothesis H0: µx = µy.

Pooled variance:
$$s_p^2 = \frac{(n-1)s_x^2 + (m-1)s_y^2}{n+m-2} \left( \frac{1}{n} + \frac{1}{m} \right).$$

Test statistic:
$$t = \frac{\bar{x} - \bar{y}}{s_p}$$

Null distribution: f(t | H0) is the pdf of T ∼ t(n + m − 2).

In general (so we can compute power) we have
$$\frac{(\bar{x} - \bar{y}) - (\mu_x - \mu_y)}{s_p} \sim t(n + m - 2).$$

Note: there are more general formulas for unequal variances.

Board question: two-sample t-test

Real data from 1408 women admitted to a maternity hospital for (i) medical reasons or through (ii) unbooked emergency admission. The duration of pregnancy is measured in complete weeks from the beginning of the last menstrual period.
Medical: 775 obs. with x̄ = 39.08 and s^2 = 7.77.
Emergency: 633 obs. with x̄ = 39.60 and s^2 = 4.95.

1. Set up and run a two-sample t-test to investigate whether the duration differs for the two groups.
2. What assumptions did you make?
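
A sketch of the equal-variance two-sample computation with these summary statistics follows. The variable names are assumptions (the emergency group mean is called ybar here), and because Python's standard library has no t distribution, the p-value uses a normal approximation to t(1406); with this many degrees of freedom the two are very close, but it remains an approximation.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics from the slide.
n, xbar, sx2 = 775, 39.08, 7.77  # medical group
m, ybar, sy2 = 633, 39.60, 4.95  # emergency group

# Pooled variance as defined for the equal-variance two-sample t-test,
# including the (1/n + 1/m) factor.
pooled = ((n - 1) * sx2 + (m - 1) * sy2) / (n + m - 2) * (1 / n + 1 / m)

t_stat = (xbar - ybar) / sqrt(pooled)
df = n + m - 2

# ASSUMPTION: standard normal used in place of t(df) for the two-sided p-value.
p_approx = 2 * NormalDist().cdf(-abs(t_stat))

print(f"t = {t_stat:.4f}, df = {df}, approximate two-sided p = {p_approx:.5f}")
```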



Table discussion: Type I errors Q1

1. Suppose a journal will only publish results that are statistically significant at the 0.05 level. What percentage of the papers it publishes contain type I errors?

answer: With the information given we can’t know this. The percentage could be anywhere from 0 to 100! See the next two questions.



Table discussion: Type I errors Q2

2. Jerry desperately wants to cure diseases but he is terrible at designing effective treatments. He is, however, a careful scientist and statistician, so he randomly divides his patients into control and treatment groups. The control group gets a placebo and the treatment group gets the experimental treatment. His null hypothesis H0 is that the treatment is no better than the placebo. He uses a significance level of α = 0.05. If his p-value is less than α he publishes a paper claiming the treatment is significantly better than a placebo.
(a) Since his treatments are never, in fact, effective, what percentage of his experiments result in published papers?
(b) What percentage of his published papers contain type I errors, i.e. describe treatments that are no better than placebo?
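
Jerry's situation can be explored with a small simulation. The sketch assumes each experiment reduces to a right-sided z-test whose statistic is standard normal when H0 is true; that modeling choice, the seed, and the number of trials are all assumptions for illustration.

```python
import random
from statistics import NormalDist

rng = random.Random(0)  # fixed seed so the run is repeatable
std_normal = NormalDist()
alpha = 0.05
trials = 100_000

published = 0
for _ in range(trials):
    z = rng.gauss(0, 1)        # H0 is true, so z follows the null distribution
    p = 1 - std_normal.cdf(z)  # right-sided p-value
    if p < alpha:
        published += 1         # Jerry publishes whenever p < alpha

fraction_published = published / trials
print(round(fraction_published, 3))
```

Because H0 is true in every simulated experiment, every published result in this simulation is a type I error by construction.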



Table discussion: Type I errors Q3

3. Efrat is a genius at designing treatments, so all of her proposed treatments are effective. She’s also a careful scientist and statistician, so she too runs double-blind, placebo-controlled, randomized studies. Her null hypothesis is always that the new treatment is no better than the placebo. She also uses a significance level of α = 0.05 and publishes a paper if p < α.
(a) How could you determine what percentage of her experiments result in publications?
(b) What percentage of her published papers contain type I errors, i.e. describe treatments that are no better than placebo?



MIT OpenCourseWare
https://ocw.mit.edu

18.05 Introduction to Probability and Statistics


Spring 2014

For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms .
