Unit 3
Table of contents
5. T-Distribution
8. Hypothesis Tests for the Means of Two Populations with Equal Standard Deviations, Using Independent Samples
Statistics
Hypothesis Testing
Types of Hypothesis
A hypothesis is an assertion about the value of a population parameter. Typically, a hypothesis test involves two hypotheses: the null hypothesis and the alternative hypothesis (or research hypothesis), which we define as follows.
Types of Hypothesis
The Logic of Hypothesis Testing
After we have chosen the null and alternative hypotheses, we must decide whether to reject the null hypothesis in favor of the alternative hypothesis. The procedure for deciding is roughly as follows.
Logic: Take a random sample from the population. If the sample data are consistent with the null hypothesis, do not reject the null hypothesis; if the sample data are inconsistent with the null hypothesis and supportive of the alternative hypothesis, reject the null hypothesis in favor of the alternative hypothesis.
Type I and Type II Errors
Any decision we make based on a hypothesis test may be incorrect because we have used partial information obtained from a sample to draw conclusions about the entire population. There are two types of incorrect decisions: Type I errors and Type II errors.
“False Alarm” vs “Under Reaction”
Type I and Type II Errors
Consider the sugar-packaging hypothesis test. The null and alternative hypotheses are:
Inferential Statistics
A Type I error occurs if a true null hypothesis is rejected. The probability of that happening, the Type I error probability, commonly called the significance level of the hypothesis test, is denoted α.
Significance Level: The probability of making a Type I error, that is, of rejecting a true null hypothesis, is called the significance level, α, of a hypothesis test.
A Type II error occurs if a false null hypothesis is not rejected. The probability of that happening, the Type II error probability, is denoted β.
Relation: For a fixed sample size, the smaller we specify the significance level, α, the larger will be the probability, β, of not rejecting a false null hypothesis.
Possible Conclusions for a Hypothesis Test
The significance level, α, is the probability of making a Type I error, that is, of rejecting a true null hypothesis. Therefore, if the hypothesis test is conducted at a small significance level (e.g., α = 0.05), the chance of rejecting a true null hypothesis will be small.
For Small α: Suppose that a hypothesis test is conducted at a small significance level.
1. If the null hypothesis is rejected, we conclude that the data provide sufficient evidence to support the alternative hypothesis.
2. If the null hypothesis is not rejected, we conclude that the data do not provide sufficient evidence to support the alternative hypothesis.
Critical-Value Approach
Critical-Value Approach to Hypothesis Testing
Problem: Critical-Value Approach to Hypothesis Testing
Problem: Jack tells Jean that his average drive of a golf ball is 275 yards. Jean is skeptical and asks for substantiation. To that end, Jack hits 25 drives. The (sample) mean of Jack’s 25 drives is only 264.4 yards. Jack still maintains that, on average, he drives a golf ball 275 yards and that his (relatively) poor performance can reasonably be attributed to chance.
At the 5% significance level, do the data provide sufficient evidence to conclude that Jack’s mean driving distance is less than 275 yards?
For our analysis, we assume that Jack’s driving distances are normally distributed (which can be shown to be reasonable) and that the population standard deviation of all such driving distances is 20 yards.
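As an added illustration (not part of the original slides), here is a minimal sketch of the critical-value calculation for this problem, treated as a left-tailed one-mean z-test with H0: µ = 275, Ha: µ < 275, n = 25, x̄ = 264.4, σ = 20, and α = 0.05; SciPy is assumed for the normal quantile.

```python
from math import sqrt
from scipy.stats import norm

# Quantities given in the problem statement
mu0, xbar, sigma, n, alpha = 275, 264.4, 20, 25, 0.05

# Test statistic for the one-mean z-test
z = (xbar - mu0) / (sigma / sqrt(n))   # (264.4 - 275) / 4 = -2.65

# Left-tailed test: rejection region is z <= -z_alpha
z_crit = norm.ppf(alpha)               # roughly -1.645

print(f"z = {z:.2f}, critical value = {z_crit:.3f}")
print("reject H0" if z <= z_crit else "do not reject H0")
```

Here the observed statistic falls in the rejection region, so at the 5% level the data support the alternative hypothesis that Jack’s mean driving distance is less than 275 yards.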
Terminology of the Critical-Value Approach
Definition:
• Rejection region: The set of values for the test statistic that leads to rejection of the null hypothesis.
• Nonrejection region: The set of values for the test statistic that leads to nonrejection of the null hypothesis.
• Critical value(s): The value or values of the test statistic that separate the rejection and nonrejection regions. A critical value is considered part of the rejection region.
Steps in the Critical-Value Approach to Hypothesis Testing
5. If the value of the test statistic falls in the rejection region, reject H0; otherwise, do not reject H0.
P-Value Approach
P-Value Approach to Hypothesis Testing
With the P-value approach to hypothesis testing, we first evaluate how likely it would be to observe the value obtained for the test statistic if the null hypothesis were true. The criterion for deciding whether to reject the null hypothesis involves comparing that likelihood with the specified significance level of the hypothesis test.
Terminology of the P-Value Approach I
The smaller (closer to 0) the P-value, the stronger is the evidence against the null hypothesis and, hence, in favor of the alternative hypothesis. Stated simply, an outcome that would rarely occur if the null hypothesis were true provides evidence against the null hypothesis and, hence, in favor of the alternative hypothesis.
Terminology of the P-Value Approach II
P-Value as the Observed Significance Level
The P-value of a hypothesis test equals the smallest significance level at which the null hypothesis can be rejected, that is, the smallest significance level for which the observed sample data result in rejection of H0.
If z₀ denotes the observed value of the test statistic z, we determine the P-value as follows:
• Two-tailed test: The P-value equals the probability of observing a value of the test statistic z that is at least as large in magnitude as the value actually observed, which is the area under the standard normal curve that lies outside the interval from −|z₀| to |z₀|.
Terminology of the P-Value Approach III
• Left-tailed test: The P-value equals the probability of observing a value of the test statistic z that is as small as or smaller than the value actually observed, which is the area under the standard normal curve that lies to the left of z₀.
• Right-tailed test: The P-value equals the probability of observing a value of the test statistic z that is as large as or larger than the value actually observed, which is the area under the standard normal curve that lies to the right of z₀.
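As a quick illustration added here (not from the slides), these three definitions translate directly into code; SciPy's standard normal distribution is assumed.

```python
from scipy.stats import norm

def p_value_one_mean_z(z0: float, tail: str) -> float:
    """P-value of a one-mean z-test with observed test statistic z0."""
    if tail == "two":
        return 2 * norm.cdf(-abs(z0))   # area outside the interval from -|z0| to |z0|
    if tail == "left":
        return norm.cdf(z0)             # area to the left of z0
    if tail == "right":
        return norm.sf(z0)              # area to the right of z0; sf(z) = 1 - cdf(z)
    raise ValueError("tail must be 'two', 'left', or 'right'")
```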
Steps in the P-Value Approach to Hypothesis Testing
Problem: P-Value Approach to Hypothesis Testing
Problem: Jack tells Jean that his average drive of a golf ball is 275 yards. Jean is skeptical and asks for substantiation. To that end, Jack hits 25 drives. The (sample) mean of Jack’s 25 drives is only 264.4 yards. Jack still maintains that, on average, he drives a golf ball 275 yards and that his (relatively) poor performance can reasonably be attributed to chance.
At the 5% significance level, do the data provide sufficient evidence to conclude that Jack’s mean driving distance is less than 275 yards?
For our analysis, we assume that Jack’s driving distances are normally distributed (which can be shown to be reasonable) and that the population standard deviation of all such driving distances is 20 yards.
Problem: P-Value Approach to Hypothesis Testing I
Problem: The value of the test statistic for a left-tailed one-mean z-test is z = −1.19.
• Determine the P-value.
• At the 5% significance level, do the data provide sufficient evidence to reject the null hypothesis in favor of the alternative hypothesis?
Problem: The value of the test statistic for a right-tailed one-mean z-test is z = 2.85.
• Determine the P-value.
• At the 1% significance level, do the data provide sufficient evidence to reject the null hypothesis in favor of the alternative hypothesis?
Problem: P-Value Approach to Hypothesis Testing II
Problem: The value of the test statistic for a two-tailed one-mean z-test is z = −1.71.
• Determine the P-value.
• At the 5% significance level, do the data provide sufficient evidence to reject the null hypothesis in favor of the alternative hypothesis?
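A short sketch (added for illustration) of how these three P-values could be computed and compared with the stated significance levels; the numerical answers come from the code, not from the original slides.

```python
from scipy.stats import norm

cases = [
    ("left",  -1.19, 0.05),   # first problem
    ("right",  2.85, 0.01),   # second problem
    ("two",   -1.71, 0.05),   # third problem
]

for tail, z0, alpha in cases:
    if tail == "two":
        p = 2 * norm.cdf(-abs(z0))
    elif tail == "left":
        p = norm.cdf(z0)
    else:
        p = norm.sf(z0)
    decision = "reject H0" if p <= alpha else "do not reject H0"
    print(f"{tail}-tailed, z0 = {z0}: P-value = {p:.4f} -> {decision}")
```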
Hypothesis Tests for One Population Mean When σ Is Known I
Hypothesis Tests for One Population Mean When σ Is Known II
T-Distribution I
If x is a normally distributed variable with mean µ and standard deviation σ, then, for samples of size n, the sample mean x̄ is also normally distributed and has mean µ and standard deviation σ/√n. Equivalently, the standardized version of x̄,

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}},$$

has the standard normal distribution. But what if the population standard deviation is unknown? Then we cannot base our confidence-interval procedure on the standardized version of x̄. The best we can do is estimate the population standard deviation σ by the sample standard deviation s; in other words, we replace σ by s in the equation above and base our confidence-interval procedure on the resulting variable

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}},$$

called the studentized version of x̄.
T-Distribution II
T-Distribution and T-Curves I
There is a different t-distribution for each sample size. We identify a particular t-distribution by its number of degrees of freedom (df). For the studentized version of x̄, the number of degrees of freedom is 1 less than the sample size, which we indicate symbolically by df = n − 1.
Studentized Version of the Sample Mean: Suppose that a variable x of a population is normally distributed with mean µ. Then, for samples of size n, the variable

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

has the t-distribution with n − 1 degrees of freedom.
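As a small added illustration of the df = n − 1 rule, the sketch below (with a made-up sample; NumPy and SciPy assumed) computes the studentized statistic with s in place of σ and compares a t critical value with the corresponding z critical value, showing the t-distribution's heavier tails.

```python
import numpy as np
from scipy.stats import t, norm

rng = np.random.default_rng(0)
x = rng.normal(loc=50, scale=10, size=16)   # hypothetical sample, n = 16

n, mu0 = len(x), 50                  # hypothesized mean (illustrative)
xbar, s = x.mean(), x.std(ddof=1)    # sample mean and sample standard deviation

t_stat = (xbar - mu0) / (s / np.sqrt(n))
df = n - 1                           # degrees of freedom for the studentized version

print(f"t statistic = {t_stat:.3f} with df = {df}")
# The t critical value exceeds the z critical value because t-curves have heavier tails
print(f"95th percentile: t = {t.ppf(0.95, df):.3f} vs z = {norm.ppf(0.95):.3f}")
```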
T-Distribution and T-Curves II
T-Distribution and T-Curves III
T-Distribution and T-Curves IV
T-Distribution and T-Curves V
Hypothesis Tests for One Population Mean When σ Is Unknown I
Hypothesis Tests for One Population Mean When σ Is Unknown II
Hypothesis Tests for One Population Mean When σ Is Unknown III
Example: The figure below shows the pH levels obtained by the researchers for 15 lakes. At the 5% significance level, do the data provide sufficient evidence to conclude that, on average, high mountain lakes in the Southern Alps are nonacidic?
Hypothesis Tests for One Population Mean When σ Is Unknown IV
Solution, Step 1 (the null and alternative hypotheses): Let µ denote the mean pH level of all high mountain lakes in the Southern Alps. Then the null and alternative hypotheses are, respectively,
• H0: µ = 6 (on average, the lakes are acidic)
• Ha: µ > 6 (on average, the lakes are nonacidic).
Note that the hypothesis test is right tailed.
Solution, Step 2 (the significance level α): We are to perform the test at the 5% significance level, so α = 0.05.
Hypothesis Tests for One Population Mean When σ Is Unknown V
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = 3.458$$
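A brief check, added for illustration, of what this value of t implies at the 5% level; the raw pH data appeared only in the original figure, so the reported statistic t = 3.458 and df = 15 − 1 = 14 are used directly (SciPy assumed).

```python
from scipy.stats import t

t_stat, df, alpha = 3.458, 14, 0.05    # values from the example (n = 15)

t_crit = t.ppf(1 - alpha, df)          # right-tailed critical value, about 1.761
p_value = t.sf(t_stat, df)             # area under the t-curve to the right of t_stat

print(f"critical value = {t_crit:.3f}, P-value = {p_value:.4f}")
print("reject H0" if t_stat >= t_crit else "do not reject H0")
```

Since the observed t falls well beyond the critical value, the data support the alternative hypothesis that, on average, the lakes are nonacidic.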
Hypothesis Tests for One Population Mean When σ Is Unknown VI
Type II Error Probabilities I
Type II error: The probability of making a Type II error (not rejecting the null hypothesis when it is in fact false) depends on the sample size, the significance level, and the true value of the parameter under consideration.
Type II Error Probabilities II
Type II Error Probabilities III
Type II Error Probabilities IV
Example: The manufacturer of a new model car, the Orion, claims that a typical car gets 26 miles per gallon (mpg). A consumer advocacy group is skeptical of this claim and thinks that the mean gas mileage, µ, of all Orions may be less than 26 mpg. The group plans to perform the hypothesis test
• H0: µ = 26 mpg (manufacturer’s claim)
• Ha: µ < 26 mpg (consumer group’s conjecture),
at the 5% significance level, using a sample of 30 Orions. Find the probability of making a Type II error if the true mean gas mileage of all Orions is 25.8 mpg and 25.0 mpg.
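A sketch of the standard β calculation for this example, added here for illustration; it assumes the population standard deviation of 1.4 mpg stated later in the power-curve version of the same example, and uses SciPy for the normal probabilities.

```python
from math import sqrt
from scipy.stats import norm

mu0, sigma, n, alpha = 26.0, 1.4, 30, 0.05   # sigma taken from the power-curve example
se = sigma / sqrt(n)

# Left-tailed test: H0 is rejected when xbar falls at or below this cutoff
xbar_cut = mu0 + norm.ppf(alpha) * se        # roughly 25.58 mpg

for mu_true in (25.8, 25.0):
    # Type II error: failing to reject (xbar above the cutoff) when the true mean is mu_true
    beta = norm.sf(xbar_cut, loc=mu_true, scale=se)
    print(f"true mean {mu_true} mpg: beta = {beta:.3f}, power = {1 - beta:.3f}")
```

The closer the true mean is to the hypothesized 26 mpg, the larger β becomes, which is why β is much bigger at 25.8 mpg than at 25.0 mpg.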
Type II Error Probabilities V
Type II Error Probabilities VI
Type II Error Probabilities VII
Type II Error Probabilities VIII
Power and Power Curves I
The power of a hypothesis test is between 0 and 1 and measures the ability of the hypothesis test to detect a false null hypothesis (power = 1 − β, the probability of rejecting a false null hypothesis). If the power is near 0, the hypothesis test is not very good at detecting a false null hypothesis; if the power is near 1, the hypothesis test is extremely good at detecting a false null hypothesis.
Power and Power Curves II
Power Curve: The visual display of the effectiveness of the hypothesis test, obtained by plotting points of power against various values of the parameter and then connecting the points with a smooth curve, is called a power curve. In general, the closer a power curve is to 1 (i.e., the horizontal line 1 unit above the horizontal axis), the better the hypothesis test is at detecting a false null hypothesis.
Power and Power Curves III
Power and Power Curves IV
Power and Power Curves V
Example: The manufacturer of a new model car, the Orion, claims that a typical car gets 26 miles per gallon (mpg). A consumer advocacy group is skeptical of this claim and thinks that the mean gas mileage, µ, of all Orions may be less than 26 mpg. The group plans to perform the hypothesis test
• H0: µ = 26 mpg (manufacturer’s claim)
• Ha: µ < 26 mpg (consumer group’s conjecture),
at the 5% significance level, using a sample of 30 Orions, where µ is the mean gas mileage of all Orions. Construct a power curve. Assume that gas mileages of Orions are normally distributed with a standard deviation of 1.4 mpg.
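As an added illustration, a short script that would generate such a power curve for this test; NumPy, SciPy, and matplotlib are assumed.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

mu0, sigma, n, alpha = 26.0, 1.4, 30, 0.05
se = sigma / np.sqrt(n)

# Cutoff for rejecting H0 in the left-tailed test
xbar_cut = mu0 + norm.ppf(alpha) * se

# Power = P(reject H0) = P(xbar <= cutoff) when the true mean is mu
mu = np.linspace(25.0, 26.0, 101)
power = norm.cdf(xbar_cut, loc=mu, scale=se)

plt.plot(mu, power)
plt.xlabel("True mean gas mileage µ (mpg)")
plt.ylabel("Power")
plt.title("Power curve for the Orion gas-mileage test")
plt.show()
```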
Power and Power Curves VI
Power and Power Curves VII
Power and Power Curves VIII
Power and Power Curves IX
Hypothesis Tests for the Means of Two Populations with Equal Standard Deviations, Using Independent Samples I
Hypothesis Tests for the Means of Two Populations with Equal Standard Deviations, Using Independent Samples II
Hypothesis Tests for the Means of Two Populations, Using Independent Samples I
Hypothesis Tests for the Means of Two Populations, Using Independent Samples II
Confusion Matrix I
A confusion matrix is a table that is often used to describe the performance of a classification model (or ‘classifier’) on a set of test data for which the true values are known. The confusion matrix itself is relatively simple to understand, but the related terminology can be confusing.
Confusion Matrix II
It is a table with two rows and two columns that reports the number of false
positives, false negatives, true positives, and true negatives.
Confusion Matrix III
Confusion Matrix IV
False Negative (FN), Type 2 error:
• The value was falsely predicted (the prediction is wrong).
• The actual value was positive, but the model predicted a negative value.
• Also known as the Type 2 error.
Example: Given a sample of 13 pictures, 8 of cats and 5 of dogs, where cats belong to class 1 and dogs belong to class 0:
Confusion Matrix: With these two labelled sets (actual values and predictions), we can create a confusion matrix that will summarize the results of testing the classifier:
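The classifier's predictions for this example appear only in the original slide figure, so the predicted labels below are purely hypothetical; the sketch uses scikit-learn's confusion_matrix to tabulate the four counts (cats = class 1, dogs = class 0).

```python
from sklearn.metrics import confusion_matrix

# Actual labels: 8 cats (class 1) and 5 dogs (class 0), as in the example
actual    = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
# Hypothetical predictions (the real ones were shown only in the slide figure)
predicted = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

# Rows are actual classes, columns are predicted classes; labels fixes the order [0, 1]
cm = confusion_matrix(actual, predicted, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()
print(cm)
print(f"TP={tp}, FP={fp}, FN={fn}, TN={tn}")
```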
Confusion Matrix VI
Confusion Matrix VII
Sensitivity and Specificity I
Sensitivity and Specificity II
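A hedged sketch, assuming the standard definitions (sensitivity = TP / (TP + FN), specificity = TN / (TN + FP)); the counts reuse the hypothetical cat/dog numbers from the confusion-matrix sketch above.

```python
def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float]:
    """Standard definitions: sensitivity (true positive rate) and specificity (true negative rate)."""
    sensitivity = tp / (tp + fn)   # proportion of actual positives correctly identified
    specificity = tn / (tn + fp)   # proportion of actual negatives correctly identified
    return sensitivity, specificity

# Hypothetical counts from the cat/dog sketch above (8 cats = positives, 5 dogs = negatives)
print(sensitivity_specificity(tp=6, fn=2, tn=4, fp=1))   # (0.75, 0.8)
```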
ROC Curve I
A receiver operating characteristic curve, or ROC curve, is a graphical plot that illustrates the diagnostic ability of a binary classifier system as its discrimination threshold is varied. The method was originally developed for operators of military radar receivers, which is why it is so named.
ROC Curve: This is a commonly used graph that summarizes the performance of a classifier over all possible thresholds. It is generated by plotting the True Positive Rate (y-axis) against the False Positive Rate (x-axis) as you vary the threshold for assigning observations to a given class.
• The ROC curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings.
• The true-positive rate is also known as sensitivity, recall, or probability of detection in machine learning.
ROC Curve II
• The false-positive rate is also known as the probability of false alarm and can be calculated as (1 − specificity).
• The ROC curve can also be thought of as a plot of the power as a function of the Type I error of the decision rule (when the performance is calculated from just a sample of the population, the plotted rates can be thought of as estimators of these quantities).
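An added sketch of how such a curve is traced out by sweeping the classification threshold over predicted scores; scikit-learn's roc_curve and matplotlib are assumed, and the labels and scores are made up.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve

# Hypothetical true labels and classifier scores (estimated probability of class 1)
y_true  = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.75, 0.7, 0.65, 0.6, 0.4, 0.3, 0.55, 0.35, 0.2, 0.15, 0.1]

# FPR (x-axis) and TPR (y-axis) at every threshold implied by the scores
fpr, tpr, thresholds = roc_curve(y_true, y_score)

plt.plot(fpr, tpr, marker="o")
plt.plot([0, 1], [0, 1], linestyle="--")   # chance-level diagonal for reference
plt.xlabel("False Positive Rate (1 - specificity)")
plt.ylabel("True Positive Rate (sensitivity)")
plt.title("ROC curve")
plt.show()
```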
AUC: Area Under the ROC Curve I
Definition: AUC stands for “Area under the ROC Curve.” That is, AUC measures the entire two-dimensional area underneath the entire ROC curve from (0,0) to (1,1).
AUC: Area Under the ROC Curve II
AUC: Area Under the ROC Curve III
AUC represents the probability that a randomly chosen positive example is ranked above (receives a higher score than) a randomly chosen negative example.
AUC ranges in value from 0 to 1. A model whose predictions are 100% wrong has an AUC
of 0.0; one whose predictions are 100% correct has an AUC of 1.0.
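A short check, added here, that ties these two statements together using the same hypothetical labels and scores as in the ROC sketch: scikit-learn's roc_auc_score gives the area, and a direct pairwise comparison estimates the probability that a randomly chosen positive outscores a randomly chosen negative.

```python
from itertools import product
from sklearn.metrics import roc_auc_score

y_true  = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.75, 0.7, 0.65, 0.6, 0.4, 0.3, 0.55, 0.35, 0.2, 0.15, 0.1]

auc = roc_auc_score(y_true, y_score)

# Probability interpretation: compare every (positive, negative) pair of scores
pos = [s for s, y in zip(y_score, y_true) if y == 1]
neg = [s for s, y in zip(y_score, y_true) if y == 0]
wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg))
pairwise = wins / (len(pos) * len(neg))

print(f"AUC = {auc:.3f}, pairwise estimate = {pairwise:.3f}")   # the two values agree
```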