
Unit III

HYPOTHESIS TESTS AND STATISTICAL TESTS

Sumit Kr. Choubey

1
Table of contents

1. Elements of Hypothesis Testing

2. Critical-Value Approach to Hypothesis Testing

3. P-Value Approach to Hypothesis Testing

4. Hypothesis Tests for One Population Mean When σ Is Known

5. T-Distribution

6. Hypothesis Tests for One Population Mean When σ Is Unknown

7. Type II Error Probabilities & Power

8. Hypothesis Tests for the Means of Two Populations with Equal Standard Deviations,
Using Independent Samples
2
Statistics

3
Hypothesis Testing

“ A statistical hypothesis is an assertion or conjecture concerning one or more


populations.
To prove that a hypothesis is true, or false, with absolute certainty, we
would need absolute knowledge. That is, we would have to examine the entire
population.
Instead, hypothesis testing concerns how to use a random sample
to judge whether it provides evidence that supports the hypothesis.

4
Types of Hypothesis

A hypothesis about a population parameter is an assertion about its value. Typically,
a hypothesis test involves two hypotheses: the null hypothesis and the alternative
hypothesis (or research hypothesis), which we define as follows.

The null hypothesis, denoted H0 , is the statement about the population


1 parameter that is assumed to be true unless there is convincing evidence
to the contrary.

The alternative hypothesis, denoted Ha or H1 , is a statement about the


2 population parameter that is contradictory to the null hypothesis, and is
accepted as true only if there is convincing evidence in favor of it.

5
Types of Hypothesis

Null hypothesis [H0 ]                Alternative hypothesis [Ha ]

Null means "nothing new"             Simply the other option
Assumed to be true                   Rejection of the assumption
Negative of the research question    Research question to be proven
Contains equality: =, ⩾, ⩽           Contains inequality: ≠, >, <

Null & alternative pairings:
• H0 : {=} vs. Ha : {≠}  Two-tailed test
• H0 : {⩾} vs. Ha : {<}  Left-tailed test
• H0 : {⩽} vs. Ha : {>}  Right-tailed test

6
The Logic of Hypothesis Testing

After we have chosen the null and alternative hypotheses, we must decide whether
to reject the null hypothesis in favor of the alternative hypothesis. The procedure for
deciding is roughly as follows.

Logic Take a random sample from the population. If the sample data are
consistent with the null hypothesis, do not reject the null hypothesis;
if the sample data are inconsistent with the null hypothesis and
supportive of the alternative hypothesis, reject the null hypothesis
in favor of the alternative hypothesis.

In practice, of course, we must have a precise criterion for deciding whether to


reject the null hypothesis.

7
Type I and Type II Errors
Any decision we make based on a hypothesis test may be incorrect because we have
used partial information obtained from a sample to draw conclusions about the entire
population. There are two types of incorrect decisions : Type I error and Type II error

Definition • Type I error: Rejecting the null hypothesis when it is in fact


true.
• Type II error: Not rejecting the null hypothesis when it is in
fact false.

8
“False Alarm” vs “Under Reaction”

9
Type I and Type II Errors

Consider the sugar-packaging hypothesis test. The null and alternative hypotheses are:

• H0 : µ = 454 g (the packaging machine is working properly)


• Ha : µ ≠ 454 g (the packaging machine is not working properly),
where µ is the mean net weight of all bags packaged.

Error • A Type I error occurs when a true null hypothesis is rejected.
In this case, a Type I error would occur if in fact µ = 454 g but
the results of the sampling lead to the conclusion that
µ ≠ 454 g.
• A Type II error occurs when a false null hypothesis is not
rejected. In this case, a Type II error would occur if in fact
µ ≠ 454 g but the results of the sampling fail to lead to that
conclusion.

10
Inferential Statistics

A Type I error occurs if a true null hypothesis is rejected. The probability of that
happening, the Type I error probability, commonly called the significance level of the
hypothesis test, is denoted α.

Significance The probability of making a Type I error, that is, of rejecting a true
Level null hypothesis, is called the significance level, α , of a hypothesis
test.

A Type II error occurs if a false null hypothesis is not rejected. The probability of that
happening, the Type II error probability, is denoted β

Relation For a fixed sample size, the smaller we specify the significance
level, α, the larger will be the probability, β, of not rejecting a false
null hypothesis.

11
Possible Conclusions for a Hypothesis Test

The significance level, α, is the probability of making a Type I error, that is, of rejecting
a true null hypothesis. Therefore, if the hypothesis test is conducted at a small
significance level (e.g., α = 0.05), the chance of rejecting a true null hypothesis will be
small.
For Small α Suppose that a hypothesis test is conducted at a small
significance level.
1. If the null hypothesis is rejected, we conclude that the data
provide sufficient evidence to support the alternative
hypothesis.
2. If the null hypothesis is not rejected, we conclude that the
data do not provide sufficient evidence to support the
alternative hypothesis.

12
Critical-Value Approach

13
Critical-Value Approach to Hypothesis Testing

“ With the critical-value approach to hypothesis testing, we choose a “cutoff


point” (or cutoff points) based on the significance level of the hypothesis test.
The criterion for deciding whether to reject the null hypothesis involves a
comparison of the value of the test statistic to the cutoff point(s).

14

Problem : Critical-Value Approach to Hypothesis Testing
Problem Jack tells Jean that his average drive of a golf ball is 275 yards.
Jean is skeptical and asks for substantiation. To that end, Jack hits
25 drives.
The (sample) mean of Jack’s 25 drives is only 264.4 yards. Jack still
maintains that, on average, he drives a golf ball 275 yards and that
his (relatively) poor performance can reasonably be attributed to
chance.
At the 5% significance level, do the data provide sufficient evi-
dence to conclude that Jack’s mean driving distance is less than
275 yards?

For our analysis, we assume that Jack’s driving distances are nor-
mally distributed (which can be shown to be reasonable) and that
the population standard deviation of all such driving distances is
20 yards.

15
Terminology of the Critical-Value Approach
Definition • Rejection region: The set of values for the test statistic that
leads to rejection of the null hypothesis.
• Nonrejection region: The set of values for the test statistic
that leads to nonrejection of the null hypothesis.
• Critical value(s): The value or values of the test statistic that
separate the rejection and nonrejection regions. A critical
value is considered part of the rejection region.

16
Steps in the Critical-Value Approach to Hypothesis Testing

1 State the null and alternative hypotheses.

2 Decide on the significance level, α.

3 Compute the value of the test statistic.

4 Determine the critical value(s).

If the value of the test statistic falls in the rejection region, reject H0 ; oth-
5 erwise, do not reject H0 .

6 Interpret the result of the hypothesis test.
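Not part of the original slides: the six steps above can be sketched for the golf-drive problem (H0 : µ = 275 vs. Ha : µ < 275, with σ = 20, n = 25, x̄ = 264.4) in a few lines of Python, using only the standard library.

```python
from math import sqrt
from statistics import NormalDist

# Golf-drive example: left-tailed one-mean z-test with sigma known.
mu0, xbar, sigma, n, alpha = 275, 264.4, 20, 25, 0.05

# Step 3: compute the test statistic.
z = (xbar - mu0) / (sigma / sqrt(n))      # (264.4 - 275) / 4 = -2.65

# Step 4: the critical value cuts off area alpha in the left tail.
z_crit = NormalDist().inv_cdf(alpha)      # about -1.645

# Step 5: reject H0 if z falls in the rejection region.
reject = z <= z_crit
print(f"z = {z:.2f}, critical value = {z_crit:.3f}, reject H0: {reject}")
```

Since −2.65 ⩽ −1.645, the test statistic falls in the rejection region, so at the 5% level the data provide sufficient evidence that Jack's mean driving distance is less than 275 yards.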

17
P-Value Approach

18
P-Value Approach to Hypothesis Testing

“ With the P-value approach to hypothesis testing, we first evaluate how likely
the observed value of the test statistic would be if the null hypothesis were
true. The criterion for deciding whether to reject the null hypothesis involves
a comparison of that likelihood with the specified significance level of
the hypothesis test.

19

Terminology of the P-Value Approach I

P-Value The P-value of a hypothesis test is the probability of getting sam-


ple data at least as inconsistent with the null hypothesis (and sup-
portive of the alternative hypothesis) as the sample data actually
obtained. We use the letter P to denote the P-value.

The smaller (closer to 0) the P-value, the stronger is the evidence against the null hy-
pothesis and, hence, in favor of the alternative hypothesis. Stated simply, an outcome
that would rarely occur if the null hypothesis were true provides evidence against the
null hypothesis and, hence, in favor of the alternative hypothesis.

Decision If the P-value is less than or equal to the specified significance


Criterion level, reject the null hypothesis; otherwise, do not reject the null
hypothesis. In other words, if P ⩽ α, reject H0 ; otherwise, do not
reject H0 .

20
Terminology of the P-Value Approach II
P-Value as the Observed Significance Level

The P-value of a hypothesis test equals the smallest significance level at which the null
hypothesis can be rejected, that is, the smallest significance level for which the
observed sample data result in rejection of H0 .

Determining To determine the P-value of a hypothesis test, we assume that the


P-Value null hypothesis is true and compute the probability of observing a
value of the test statistic as extreme as or more extreme than that
observed. By ‘extreme’ we mean ‘far from what we would expect
to observe if the null hypothesis is true’.

If z0 denotes the observed value of the test statistic z, we determine the P-value as follows.
• Two-tailed test: The P-value equals the probability of observing a value of the test
statistic z that is at least as large in magnitude as the value actually observed, which
is the area under the standard normal curve that lies outside the interval from −|z0 | to |z0 |.
21
Terminology of the P-Value Approach III
• Left-tailed test: The P-value equals the probability of observing a value of the test
statistic z that is as small as or smaller than the value actually observed, which is
the area under the standard normal curve that lies to the left of z0
• Right-tailed test: The P-value equals the probability of observing a value of the test
statistic z that is as large as or larger than the value actually observed, which is the
area under the standard normal curve that lies to the right of z0

22
Steps in the P-Value Approach to Hypothesis Testing

1 State the null and alternative hypotheses.

2 Decide on the significance level, α.

3 Compute the value of the test statistic.

4 Determine the P-value, P.

5 If P ⩽ α, reject H0 ; otherwise, do not reject H0 .

6 Interpret the result of the hypothesis test.

23
Problem : P-Value Approach to Hypothesis Testing
Problem Jack tells Jean that his average drive of a golf ball is 275 yards.
Jean is skeptical and asks for substantiation. To that end, Jack hits
25 drives.
The (sample) mean of Jack’s 25 drives is only 264.4 yards. Jack still
maintains that, on average, he drives a golf ball 275 yards and that
his (relatively) poor performance can reasonably be attributed to
chance.
At the 5% significance level, do the data provide sufficient evi-
dence to conclude that Jack’s mean driving distance is less than
275 yards?

For our analysis, we assume that Jack’s driving distances are nor-
mally distributed (which can be shown to be reasonable) and that
the population standard deviation of all such driving distances is
20 yards.
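Not shown on the slides: the same golf-drive data handled with the P-value approach, as a short standard-library sketch.

```python
from math import sqrt
from statistics import NormalDist

# Golf-drive example again: left-tailed one-mean z-test, sigma known.
mu0, xbar, sigma, n, alpha = 275, 264.4, 20, 25, 0.05
z = (xbar - mu0) / (sigma / sqrt(n))      # -2.65

# Left-tailed test: the P-value is the area under the standard
# normal curve to the left of the observed z.
p_value = NormalDist().cdf(z)             # about 0.004
reject = p_value <= alpha
```

Because P ≈ 0.004 ⩽ 0.05, H0 is rejected, matching the critical-value conclusion for the same data.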

24
Problem : P-Value Approach to Hypothesis Testing I

Problem The value of the test statistic for a left-tailed one-mean z-test is z =
−1.19.
• Determine the P-value.
• At the 5% significance level, do the data provide sufficient
evidence to reject the null hypothesis in favor of the
alternative hypothesis?

Problem The value of the test statistic for a right-tailed one-mean z-test is z
= 2.85.
• Determine the P-value.
• At the 1% significance level, do the data provide sufficient
evidence to reject the null hypothesis in favor of the
alternative hypothesis?

25
Problem : P-Value Approach to Hypothesis Testing II

Problem The value of the test statistic for a two-tailed one-mean z-test is z
= −1.71.
• Determine the P-value.
• At the 5% significance level, do the data provide sufficient
evidence to reject the null hypothesis in favor of the
alternative hypothesis?
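The three P-values asked for above can be checked with the standard library's NormalDist; this sketch simply applies the tail-area definitions given earlier.

```python
from statistics import NormalDist

std = NormalDist()  # standard normal distribution

# Left-tailed, z = -1.19: area to the left of z.
p_left = std.cdf(-1.19)            # about 0.117 -> do not reject at the 5% level

# Right-tailed, z = 2.85: area to the right of z.
p_right = 1 - std.cdf(2.85)        # about 0.0022 -> reject at the 1% level

# Two-tailed, z = -1.71: area outside the interval from -|z| to |z|.
p_two = 2 * std.cdf(-abs(-1.71))   # about 0.087 -> do not reject at the 5% level
```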

26
Hypothesis Tests for One Population Mean When σ Is Known I

27
Hypothesis Tests for One Population Mean When σ Is Known II

28
T-Distribution I
If x is a normally distributed variable with mean µ and standard deviation σ, then, for
samples of size n, the sample mean x̄ is also normally distributed and has mean µ and
standard deviation σ/√n. Equivalently, the standardized version of x̄,

x̄ − µ
z= √
σ/ n

has the standard normal distribution. But what if the population standard deviation is
unknown? Then we cannot base our procedure on the standardized version of x̄. The
best we can do is estimate the population standard deviation, σ, by the sample standard
deviation, s; in other words, we replace σ by s in the above equation and base our
procedure on the resulting variable t, called the studentized version of x̄:

x̄ − µ
t= √
s/ n

29
T-Distribution II

30
T-Distribution and T-Curves I

There is a different t-distribution for each sample size. We identify a particular t-distribution
by its number of degrees of freedom (df). For the studentized version of x̄, the number
of degrees of freedom is 1 less than the sample size, which we indicate symbolically by
df = n − 1.
Studentized Suppose that a variable x of a population is normally distributed
Version of the with mean µ. Then, for samples of size n, the variable
Sample Mean
x̄ − µ
t= √
s/ n
has the t-distribution with n − 1 degrees of freedom.

31
T-Distribution and T-Curves II

32
T-Distribution and T-Curves III

Basic Properties • Property 1: The total area under a t-curve equals 1.


of t-Curves • Property 2: A t-curve extends indefinitely in both directions,
approaching, but never touching, the horizontal axis as it
does so.
• Property 3: A t-curve is symmetric about 0.
• Property 4: As the number of degrees of freedom becomes
larger, t-curves look increasingly like the standard normal
curve.

33
T-Distribution and T-Curves IV

34
T-Distribution and T-Curves V

35
Hypothesis Tests for One Population Mean When σ Is Unknown I

36
Hypothesis Tests for One Population Mean When σ Is Unknown II

37
Hypothesis Tests for One Population Mean When σ Is Unknown III

Example Figure below shows the pH levels obtained by the researchers for
15 lakes. At the 5% significance level, do the data provide sufficient
evidence to conclude that, on average, high mountain lakes in the
Southern Alps are nonacidic?

38
Hypothesis Tests for One Population Mean When σ Is Unknown IV

Solution the null and alternative hypotheses : Let µ denote the mean pH
: STEP I level of all high mountain lakes in the Southern Alps. Then the null
and alternative hypotheses are, respectively,
• H0 : µ = 6 (on average, the lakes are acidic)
• Ha : µ > 6 (on average, the lakes are nonacidic).
Note that the hypothesis test is right tailed.

Solution significance level, [α] : We are to perform the test at the 5% sig-
: STEP II nificance level, so
α = 0.05.

39
Hypothesis Tests for One Population Mean When σ Is Unknown V

Solution value of the test statistic :


: STEP III
x̄ − µ0
t= √ .
s/ n

Here µ0 = 6, n = 15, x̄ = 6.6, and s = 0.672. So,

x̄ − µ0
t= √ = 3.458
s/ n
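A quick arithmetic check of Step III (not part of the slides), using the summary statistics for the 15 sampled lakes:

```python
from math import sqrt

# Summary statistics for the 15 sampled lakes.
mu0, n, xbar, s = 6, 15, 6.6, 0.672

# Studentized test statistic; its degrees of freedom are n - 1 = 14.
t = (xbar - mu0) / (s / sqrt(n))
df = n - 1
print(round(t, 3))   # 3.458
```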

40
Hypothesis Tests for One Population Mean When σ Is Unknown VI

41
Type II Error Probabilities I

Type II error The probability of making a Type II error (not rejecting the null
hypothesis when it is in fact false) depends on the sample size,
the significance level, and the true value of the parameter under
consideration.

42
Type II Error Probabilities II

43
Type II Error Probabilities III

44
Type II Error Probabilities IV

Example The manufacturer of a new model car, the Orion, claims that a
typical car gets 26 miles per gallon (mpg). A consumer advocacy
group is skeptical of this claim and thinks that the mean gas
mileage, µ, of all Orions may be less than 26 mpg. The group plans
to perform the hypothesis test
• H0 : µ = 26 mpg (manufacturer’s claim)
• Ha : µ < 26 mpg (consumer group’s conjecture),
at the 5% significance level, using a sample of 30 Orions. Find the
probability of making a Type II error if the true mean gas mileage
of all Orions is 25.8 mpg and 25.0 mpg.

Assume that gas mileages of Orions are normally distributed with


a standard deviation of 1.4 mpg.
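Not part of the slides: a standard-library sketch of the β computation for this example. The idea is to find the cutoff value of x̄ below which H0 is rejected, then ask how likely the sample mean is to land above that cutoff when the true mean is 25.8 (or 25.0) mpg.

```python
from math import sqrt
from statistics import NormalDist

std = NormalDist()  # standard normal distribution
mu0, sigma, n, alpha = 26, 1.4, 30, 0.05
se = sigma / sqrt(n)

# Left-tailed test: H0 is rejected when xbar falls at or below this cutoff.
x_crit = mu0 + std.inv_cdf(alpha) * se    # about 25.58 mpg

def beta(mu_true):
    # P(do not reject H0) = P(xbar > x_crit) when the true mean is mu_true.
    return 1 - std.cdf((x_crit - mu_true) / se)

b1 = beta(25.8)   # about 0.806
b2 = beta(25.0)   # about 0.012
```

The closer the true mean is to the hypothesized 26 mpg, the harder the false null hypothesis is to detect, so β is much larger at 25.8 mpg than at 25.0 mpg.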

45
Type II Error Probabilities V

46
Type II Error Probabilities VI

47
Type II Error Probabilities VII

48
Type II Error Probabilities VIII

49
Power and Power Curves I

Power The power of a hypothesis test is the probability of not making a


Type II error, that is, the probability of rejecting a false null hypoth-
esis. We have

Power = 1−P(Type II error) = 1−β

The power of a hypothesis test is between 0 and 1 and measures the ability of the hy-
pothesis test to detect a false null hypothesis. If the power is near 0, the hypothesis test
is not very good at detecting a false null hypothesis; if the power is near 1, the hypothesis
test is extremely good at detecting a false null hypothesis.

50
Power and Power Curves II

Power Curve The visual display of the effectiveness of the hypothesis test, ob-
tained by plotting points of power against various values of the
parameter and then connecting the points with a smooth curve
is called a power curve. In general, the closer a power curve is to 1
(i.e., the horizontal line 1 unit above the horizontal axis), the better
the hypothesis test is at detecting a false null hypothesis.

51
Power and Power Curves III

52
Power and Power Curves IV

53
Power and Power Curves V

Example The manufacturer of a new model car, the Orion, claims that a
typical car gets 26 miles per gallon (mpg). A consumer advocacy
group is skeptical of this claim and thinks that the mean gas
mileage, µ, of all Orions may be less than 26 mpg. The group plans
to perform the hypothesis test
• H0 : µ = 26 mpg (manufacturer’s claim)
• Ha : µ < 26 mpg (consumer group’s conjecture),
at the 5% significance level, using a sample of 30 Orions, where
µ is the mean gas mileage of all Orions. Construct a power curve.
Assume that gas mileages of Orions are normally distributed with
a standard deviation of 1.4 mpg.
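Not on the slides: since power(µ) = 1 − β(µ), the points of a power curve for this test can be generated directly. A standard-library sketch:

```python
from math import sqrt
from statistics import NormalDist

std = NormalDist()
mu0, sigma, n, alpha = 26, 1.4, 30, 0.05
se = sigma / sqrt(n)
x_crit = mu0 + std.inv_cdf(alpha) * se    # rejection cutoff for the left-tailed test

def power(mu_true):
    # Power = P(reject H0) = P(xbar <= x_crit) when the true mean is mu_true.
    return std.cdf((x_crit - mu_true) / se)

# Points to plot: power rises toward 1 as the true mean moves farther below 26 mpg.
grid = [25.0, 25.2, 25.4, 25.6, 25.8]
curve = [(mu, round(power(mu), 3)) for mu in grid]
```

Plotting `curve` and joining the points with a smooth line gives the power curve described above.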

54
Power and Power Curves VI

55
Power and Power Curves VII

56
Power and Power Curves VIII

57
Power and Power Curves IX

58
Hypothesis Tests for the Means of Two Populations with Equal
Standard Deviations, Using Independent Samples I

59
Hypothesis Tests for the Means of Two Populations with Equal
Standard Deviations, Using Independent Samples II
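The worked slides for this section are not reproduced in the text above; as a hedged sketch of the pooled t-statistic the title refers to (assuming equal population standard deviations), in standard-library Python:

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(x, y):
    # Two-sample t statistic assuming equal population standard deviations:
    # the two sample variances are pooled into one estimate of the common variance.
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / sqrt(sp2 * (1 / nx + 1 / ny))
    return t, nx + ny - 2   # statistic and its degrees of freedom

# Tiny illustrative samples (hypothetical data, not from the slides).
t, df = pooled_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
```

The resulting t is compared with the t-distribution with n1 + n2 − 2 degrees of freedom, exactly as in the one-sample case.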

60
Hypothesis Tests for the Means of Two Populations, Using
Independent Samples I

61
Hypothesis Tests for the Means of Two Populations, Using
Independent Samples II

62
Confusion Matrix I
A confusion matrix is a table that is often used to describe the performance of a classi-
fication model (or ‘classifier’) on a set of test data for which the true values are known.
The confusion matrix itself is relatively simple to understand, but the related terminol-
ogy can be confusing.

Definition In the field of machine learning and specifically the problem of


statistical classification, a confusion matrix, also known as an er-
ror matrix, is a specific table layout that allows visualization of the
performance of an algorithm, typically a supervised learning one
(in unsupervised learning it is usually called a matching matrix).
Each row of the matrix represents the instances in a predicted
class, while each column represents the instances in an actual
class (or vice versa). The name stems from the fact that it makes
it easy to see whether the system is confusing two classes (i.e.
commonly mislabeling one as another).

63
Confusion Matrix II
It is a table with two rows and two columns that reports the number of false
positives, false negatives, true positives, and true negatives.

64
Confusion Matrix III

True Pos- • The predicted value matches the actual value


itive (TP) • The actual value was positive and the model predicted a pos-
itive value

True Neg- • The predicted value matches the actual value


ative (TN) • The actual value was negative and the model predicted a
negative value

False Pos- • The predicted value does not match the actual value
itive (FP) – • The actual value was negative but the model predicted a pos-
Type 1 error itive value
• Also known as the Type 1 error

65
Confusion Matrix IV
False Neg- • The predicted value does not match the actual value
ative (FN) – • The actual value was positive but the model predicted a neg-
Type 2 error ative value
• Also known as the Type 2 error

Example :: Given a sample of 13 pictures, 8 of cats and 5 of dogs, where cats belong to
class 1 and dogs belong to class 0,

actual = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0],


prediction = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1]
Assume that a classifier that distinguishes between cats and dogs has been trained.
We run the 13 pictures through it, and the classifier makes 8 accurate predictions
and misses 5: 3 cats are wrongly predicted as dogs (the first 3 predictions) and
2 dogs are wrongly predicted as cats (the last 2 predictions).
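The four confusion-matrix counts can be tallied directly from the two labelled lists (a sketch, not from the original):

```python
# Labels from the cat/dog example (cat = 1 = positive, dog = 0 = negative).
actual     = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
prediction = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1]

pairs = list(zip(actual, prediction))
tp = sum(1 for a, p in pairs if a == 1 and p == 1)   # cats predicted as cats
tn = sum(1 for a, p in pairs if a == 0 and p == 0)   # dogs predicted as dogs
fp = sum(1 for a, p in pairs if a == 0 and p == 1)   # dogs predicted as cats
fn = sum(1 for a, p in pairs if a == 1 and p == 0)   # cats predicted as dogs
print(tp, tn, fp, fn)   # 5 3 2 3
```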
66
Confusion Matrix V

Confusion With these two labelled sets (actual and predictions) we can cre-
Matrix ate a confusion matrix that will summarize the results of testing
the classifier:

67
Confusion Matrix VI

68
Confusion Matrix VII

69
Sensitivity and Specificity I

Sensitivity and specificity are statistical measures of the performance of a classification


test that are widely used in medicine:

Sensitivity measures the proportion of positives that are correctly identified


(i.e. the proportion of those who have some condition (affected)
who are correctly identified as having the condition)

70
Sensitivity and Specificity II

Specificity measures the proportion of negatives that are correctly identified


(i.e. the proportion of those who do not have the condition (unaf-
fected) who are correctly identified as not having the condition).
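Applied to the counts from the cat/dog example above, with cat as the positive class (a sketch, not from the original slides):

```python
# Counts from the cat/dog example: TP = 5, TN = 3, FP = 2, FN = 3.
tp, tn, fp, fn = 5, 3, 2, 3

sensitivity = tp / (tp + fn)   # fraction of actual positives correctly identified
specificity = tn / (tn + fp)   # fraction of actual negatives correctly identified
print(sensitivity, specificity)   # 0.625 0.6
```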

71
ROC Curve I

A receiver operating characteristic curve, or ROC curve, is a graphical plot that illus-
trates the diagnostic ability of a binary classifier system as its discrimination threshold
is varied. The method was originally developed for operators of military radar receivers,
which is why it is so named.

ROC Curve This is a commonly used graph that summarizes the performance
of a classifier over all possible thresholds. It is generated by plot-
ting the True Positive Rate (y-axis) against the False Positive Rate
(x-axis) as you vary the threshold for assigning observations to a
given class.

• The ROC curve is created by plotting the true positive rate (TPR) against the false
positive rate (FPR) at various threshold settings.
• The true-positive rate is also known as sensitivity, recall or probability of detection
in machine learning.

72
ROC Curve II
• The false-positive rate is also known as probability of false alarm and can be
calculated as (1 − specificity).
• It can also be thought of as a plot of the power as a function of the Type I Error of
the decision rule (when the performance is calculated from just a sample of the
population, it can be thought of as estimators of these quantities).
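The plotting procedure described above can be sketched in a few lines; the scores below are hypothetical classifier outputs, not data from the text.

```python
# Hypothetical classifier scores (higher = more likely positive) and true labels.
labels = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1]

def roc_point(threshold):
    # Classify as positive when the score reaches the threshold,
    # then report (FPR, TPR) for that threshold.
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, a in zip(preds, labels) if p == 1 and a == 1)
    fp = sum(1 for p, a in zip(preds, labels) if p == 1 and a == 0)
    tpr = tp / sum(labels)                    # sensitivity (recall)
    fpr = fp / (len(labels) - sum(labels))    # 1 - specificity
    return fpr, tpr

# Sweeping the threshold from high to low traces the curve from (0,0) to (1,1).
points = [roc_point(t) for t in [1.0, 0.75, 0.5, 0.0]]
```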

73
AUC: Area Under the ROC Curve I

Definition AUC stands for “Area under the ROC Curve.” That is, AUC measures
the entire two-dimensional area underneath the entire ROC curve
from (0,0) to (1,1).

74
AUC: Area Under the ROC Curve II

AUC provides an aggregate measure of performance across all possible classification


thresholds. One way of interpreting AUC is as the probability that the model ranks a
random positive example more highly than a random negative example. For example,
given the following examples, which are arranged from left to right in ascending order
of logistic regression predictions:

75
AUC: Area Under the ROC Curve III
AUC represents the probability that a random positive (green) example is positioned to
the right of a random negative (red) example.

AUC ranges in value from 0 to 1. A model whose predictions are 100% wrong has an AUC
of 0.0; one whose predictions are 100% correct has an AUC of 1.0.
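The ranking interpretation of AUC translates directly into code; the scores and labels below are hypothetical, chosen only to illustrate the pairwise comparison.

```python
from itertools import product

# Hypothetical scores and labels (1 = positive, 0 = negative).
labels = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1]

pos = [s for s, y in zip(scores, labels) if y == 1]
neg = [s for s, y in zip(scores, labels) if y == 0]

# AUC = P(a random positive scores higher than a random negative); ties count half.
auc = sum((p > n) + 0.5 * (p == n) for p, n in product(pos, neg)) / (len(pos) * len(neg))
print(auc)   # 0.9375: only the 0.55 positive is out-ranked by the 0.6 negative
```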

AUC is desirable for the following two reasons:


• AUC is scale-invariant. It measures how well predictions are ranked, rather than
their absolute values.
• AUC is classification-threshold-invariant. It measures the quality of the model’s
predictions irrespective of what classification threshold is chosen.
However, both these reasons come with caveats, which may limit the usefulness of AUC
in certain use cases:
• Scale invariance is not always desirable. For example, sometimes we really do need
well calibrated probability outputs, and AUC won’t tell us about that.
76
AUC: Area Under the ROC Curve IV

• Classification-threshold invariance is not always desirable. In cases where there are


wide disparities in the cost of false negatives vs. false positives, it may be critical
to minimize one type of classification error. For example, when doing email spam
detection, you likely want to prioritize minimizing false positives (even if that results
in a significant increase of false negatives). AUC isn’t a useful metric for this type of
optimization.

77
