1 - QBA Lecture 5 - Hypothesis Testing
Hypothesis testing
Dr Itamar Megiddo
Department of Management Science
[email protected]
My office Hours:
Objectives
Strathclyde Business School
[Figure: a SAMPLE is drawn from the POPULATION by collecting data; analysing the sample data allows estimating population parameters using sample statistics]
Tests covered
Null hypothesis ($H_0$): average time spent reading is 5.42 hours per
week.
That is, $\mu = 5.42$ hours. (This relates to the national average.)
$\bar{X} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$
Hypothesis testing framework
Use the CLT to calculate the Z-value and use the standard normal
table to obtain the probability of observing a sample mean of 6.02
hours or higher.
$z = \frac{6.02 - 5.42}{1.45/\sqrt{45}} = 2.776$
From z-tables: $P(Z > 2.776) = 0.0027$
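The calculation above can be checked numerically. The following is a minimal sketch using only Python's standard library; the variable names are illustrative and not part of the lecture. The tail probability may differ slightly from the table value because z-tables round z to two decimal places.

```python
from math import sqrt
from statistics import NormalDist

# Values taken from the example above
sample_mean = 6.02  # observed mean reading time (hours per week)
mu_0 = 5.42         # mean under the null hypothesis
s = 1.45            # standard deviation (hours)
n = 45              # sample size

# Standard error of the sample mean, then the z-value
standard_error = s / sqrt(n)
z = (sample_mean - mu_0) / standard_error

# Upper-tail probability P(Z > z) under the standard normal distribution
p_value = 1 - NormalDist().cdf(z)

print(f"z = {z:.3f}")
print(f"P(Z > z) = {p_value:.4f}")
```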
Hypothesis testing framework
1. Make an initial assumption about the data generating mechanism – a null hypothesis
and an alternative hypothesis
2. Collect a sample of data as evidence
3. Calculate the probability of observing such evidence, given the initial assumption
4. Depending on how likely or unlikely the evidence is, we reject or fail to reject the
initial assumption
As our p-value is less than our significance level, we reject the null
hypothesis ($H_0$).
Tests covered
Smaller samples
$t = \frac{6.02 - 5.42}{1.45/\sqrt{7}} = 1.0948$
• We are also working with a small sample size, and thus need to
assume that our sample mean follows a t-distribution.
• Once we have this critical value, if the value of our test statistic,
our t-value, exceeds it, we reject the null hypothesis.
Critical value (the value that cuts off an area of 0.05 in the upper
tail of the t-distribution with 7 − 1 = 6 degrees of freedom) = 1.943
Given our observed t-value is less than our critical t-value, we do not
reject the null hypothesis in this case.
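The one-tailed comparison above can be sketched in a few lines of Python. This is only an illustration: the critical value 1.943 is copied from the t-table figure quoted above rather than computed, and the variable names are assumptions.

```python
from math import sqrt

# Values taken from the small-sample example above
sample_mean = 6.02  # observed mean reading time (hours per week)
mu_0 = 5.42         # mean under the null hypothesis
s = 1.45            # sample standard deviation (hours)
n = 7               # sample size

# t statistic: distance of the sample mean from mu_0 in standard errors
t_stat = (sample_mean - mu_0) / (s / sqrt(n))

# Critical value quoted above (upper-tail area 0.05, 6 degrees of freedom)
t_critical = 1.943

# One-tailed (upper) decision rule: reject only if t exceeds the critical value
reject_null = t_stat > t_critical

print(f"t = {t_stat:.4f}, critical value = {t_critical}")
print("Reject H0" if reject_null else "Do not reject H0")
```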
Tests covered
The professor suspected that the true mean reading time was above the national
average.
(version 2)
1. Make an initial assumption about the data generating mechanism – a null hypothesis and
an alternative hypothesis
Null hypothesis ($H_0$): average time spent reading is 5.42 hours per
week.
That is, $\mu = 5.42$ hours
Rereading:
At a local university, a professor would like to investigate whether
this holds true for his students. He does not know whether he
will find his students read more or less than the national
average, but just wants to test this idea that students spend 5.42
hours per week reading.
Nothing here suggests that he thinks the reading time at his institution will
be higher, and nothing suggests he thinks it will be lower, so a one-tailed
test is inappropriate.
Hypothesis testing framework
Example: time spent reading
Important notes:
• Hypotheses should be set up before we observe the data, rather than after
• If there is nothing in the situation to suggest a one-tailed test, we adopt a two-tailed
test as default
• If we are interested in checking for an increase or decrease, but not both, from the
null, use a one-tailed test
• If you are interested in any difference from the null value, then the test should be
two-tailed
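To illustrate how the choice of tails changes the result, here is a minimal sketch. The function `p_value` and its `alternative` argument are hypothetical names introduced for illustration, and the standard normal distribution is used for simplicity (appropriate for the large-sample z-value of 2.776; a small sample would call for the t-distribution instead).

```python
from statistics import NormalDist


def p_value(z, alternative="two-sided"):
    """Return the p-value of a z statistic for the chosen alternative hypothesis."""
    std_normal = NormalDist()
    if alternative == "greater":    # H1: mu > mu_0 (upper tail only)
        return 1 - std_normal.cdf(z)
    if alternative == "less":       # H1: mu < mu_0 (lower tail only)
        return std_normal.cdf(z)
    if alternative == "two-sided":  # H1: mu != mu_0 (both tails)
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")


z = 2.776  # z-value from the large-sample example
for alt in ("greater", "less", "two-sided"):
    print(f"{alt:>9}: p = {p_value(z, alt):.4f}")
```

For the same statistic, the two-tailed p-value is twice the one-tailed value in the direction of the observed difference, which is why the hypotheses need to be fixed before looking at the data.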
1. Make an initial assumption about the data generating mechanism – a null hypothesis and an
alternative hypothesis
2. Collect a sample of data as evidence
$\bar{X} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$
1. Make an initial assumption about the data generating mechanism – a null hypothesis and
an alternative hypothesis
2. Collect a sample of data as evidence
3. Calculate the probability of observing such evidence, given the initial assumption
4. Depending on how likely or unlikely the evidence is, we reject or fail to reject the
initial assumption
Tests covered
A national student survey indicated that the average time spent reading per week was 5.42
hours.
At a local university, a professor would like to investigate whether this holds true for his
students. He does not know whether he will find his students read more or less than the
national average, but just wants to test this idea that students spend 5.42 hours per week
reading.
He conducts a study of 7 students and observes a mean reading time per week of 6.02
hours, and a standard deviation of 1.45 hours.
Conduct a hypothesis test (at a significance level of 5%) to examine how the reading time
at the professor's university compares to the national average.
Null hypothesis ($H_0$): average time spent reading is 5.42 hours per
week.
That is, $\mu = 5.42$ hours
$t = \frac{6.02 - 5.42}{1.45/\sqrt{7}} = 1.0948$
Adjust the p-value calculation or critical value calculation accordingly for the two-tailed test.
Halve 0.05 to give 0.025, and calculate the value of t that gives a region of
0.025 in the upper tail of the distribution, given a sample size of 7.
Degrees of freedom: $\nu = 7 - 1 = 6$
Given that our observed t-value (1.0948) is less than our critical value (2.447), we do not
reject the null hypothesis.
Note: the area cut off beyond 2.447 in the upper tail of the t-distribution (together with
the matching area below −2.447 in the lower tail, for a two-tailed test) is called the
critical region.
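The two-tailed test can also be reproduced without t-tables. The sketch below is one possible approach using only the standard library: it integrates the t-distribution density numerically (Simpson's rule) and finds the critical value by bisection. The helper names `t_pdf`, `t_upper_tail` and `t_critical` are hypothetical and introduced only for this illustration; in practice a statistics package would normally be used.

```python
from math import gamma, pi, sqrt


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coeff = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=1000):
    """P(T > t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # Half of the probability lies above 0; subtract the area between 0 and t
    return 0.5 - total * h / 3


def t_critical(upper_area, df, lo=0.0, hi=50.0):
    """Value of t that cuts off upper_area in the upper tail, found by bisection."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > upper_area:
            lo = mid  # tail area still too large: move right
        else:
            hi = mid  # tail area too small: move left
    return (lo + hi) / 2


# Values from the example above
t_stat = (6.02 - 5.42) / (1.45 / sqrt(7))
df = 7 - 1
alpha = 0.05

crit = t_critical(alpha / 2, df)                   # two-tailed: alpha/2 in each tail
p_two_tailed = 2 * t_upper_tail(abs(t_stat), df)   # probability in both tails

print(f"t = {t_stat:.4f}, critical value = {crit:.3f}")
print(f"two-tailed p-value = {p_two_tailed:.3f}")
print("Reject H0" if abs(t_stat) > crit else "Do not reject H0")
```

The computed critical value should agree with the table value of 2.447 to about three decimal places, and the two-tailed p-value is well above 0.05, consistent with not rejecting the null hypothesis.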
Tests covered
Chi-squared test
$\chi^2 = \sum_i \frac{(o_i - e_i)^2}{e_i} = \frac{(90 - 58.5)^2}{58.5} + \frac{(70 - 67.5)^2}{67.5} + \cdots + \frac{(100 - 66)^2}{66} = 69.63$
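Each term in the sum is the contribution of one cell of the table. As a small sketch, the contributions of the three cells written out above can be computed as follows; the remaining cells are elided in the slide, so they are not reproduced here, and the function name `chi_square_contribution` is an illustrative assumption.

```python
def chi_square_contribution(observed, expected):
    """Contribution of a single cell: (o - e)^2 / e."""
    return (observed - expected) ** 2 / expected


# (observed, expected) pairs for the three cells shown in the formula above
shown_cells = [(90, 58.5), (70, 67.5), (100, 66)]

contributions = [chi_square_contribution(o, e) for o, e in shown_cells]
partial_sum = sum(contributions)

for (o, e), c in zip(shown_cells, contributions):
    print(f"o = {o}, e = {e}: contribution = {c:.3f}")

# Only a partial sum: the full statistic (69.63) also includes the cells not shown
print(f"partial sum of shown cells = {partial_sum:.3f}")
```

Cells where the observed count is far from the expected count (such as 90 against 58.5) dominate the statistic, while cells close to expectation (70 against 67.5) add almost nothing.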
Degrees of freedom $= (r - 1)(c - 1) = (2 - 1)(3 - 1) = 1 \times 2 = 2$
Degrees of freedom = 2, and upper tail area = 0.05 (5% significance level)
[Figure: chi-squared distribution with the accept area to the left of the critical value and the reject area in the upper tail]
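The comparison against the reject area can be sketched numerically. This example relies on a special case: with exactly 2 degrees of freedom the chi-squared upper-tail probability has the closed form P(χ² > x) = exp(−x/2), so no tables are needed. That shortcut does not hold for other degrees of freedom, and the variable names are illustrative.

```python
from math import exp, log

# Table dimensions and test statistic from the example above
rows, cols = 2, 3
df = (rows - 1) * (cols - 1)

alpha = 0.05    # 5% significance level (upper tail area)
chi_sq = 69.63  # test statistic computed above

# Closed form valid ONLY for df == 2: P(chi2 > x) = exp(-x / 2)
assert df == 2, "closed form below only applies to 2 degrees of freedom"
critical_value = -2 * log(alpha)  # solves exp(-x / 2) = alpha
p_value = exp(-chi_sq / 2)

# Reject the null hypothesis if the statistic falls in the reject area
reject_null = chi_sq > critical_value

print(f"degrees of freedom = {df}")
print(f"critical value = {critical_value:.3f}")
print(f"p-value = {p_value:.3g}")
print("Reject H0" if reject_null else "Do not reject H0")
```

The critical value from this formula matches the usual chi-squared table entry for 2 degrees of freedom at the 5% level (about 5.991), and the statistic of 69.63 lies far inside the reject area.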
Objectives
• $H_0$: Average time spent reading is 5.42 hours per week ($\mu = 5.42$ hours)
• $H_1$: Average time spent reading is greater than 5.42 hours per week ($\mu > 5.42$ hours)