Chapter 10
Analysis
602-CDS-3
Master of Science in Data Science
King Khalid University
College of Computer Science
Information Systems Department
Welcome
Name: Dr. Ayman Qahmash
E-mail: [email protected]
Twitter: @aym_qh
Office: Vice Dean Academics,
College of Computer Science
2
Ch 10.
Hypothesis Testing Using
a Single Sample
Learning Objectives:
STUDENTS WILL UNDERSTAND:
1. that rejecting the null hypothesis implies strong
support for the alternative hypothesis.
2. why failing to reject the null hypothesis does
not imply strong support for the null hypothesis.
3. the reasoning used to reach a decision in a
hypothesis test.
4
Learning Objectives:
STUDENTS WILL BE ABLE TO:
1. translate a research question into null and
alternative hypotheses.
2. describe Type I and Type II errors in context.
3. carry out a large-sample z test for a population
proportion and interpret the results in context.
4. carry out a t test for a population mean and interpret
the results in context.
5. describe the effect of the significance level and the
sample size on the power of a test.
5
Introduction
● A hypothesis is a claim or statement about the value of a single
population characteristic or the values of several population
characteristics. The following are examples of legitimate hypotheses:
μ = 1000, where μ is the mean number of characters in an e-mail message
p = .01, where p is the proportion of e-mail messages that are
undeliverable
● In contrast, the statements x̄ = 1000 and p̂ = .01 are not hypotheses,
because x̄ and p̂ are sample characteristics.
6
Introduction
● A test of hypotheses is a method that uses sample data to decide
between two competing claims (hypotheses) about a population
characteristic.
● One hypothesis might be μ = 1000 and the other μ ≠ 1000, or one
hypothesis might be p = .01 and the other p ≠ .01.
● If it were possible to carry out a census of the entire population, we
would know which of the two hypotheses is correct, but usually we
must decide between them using information from a sample.
7
Introduction
● We initially assume that a particular hypothesis, called the null
hypothesis, is the correct one.
● We then consider the evidence (the sample data) and reject the null
hypothesis in favor of the competing hypothesis, called the alternative
hypothesis, only if there is convincing evidence against the null
hypothesis.
8
Errors in Hypothesis Testing
● Once hypotheses have been formulated, a test procedure uses
sample data to determine whether H0 should be rejected.
● Just as a jury may reach the wrong verdict in a trial, there is some
chance that using a test procedure with sample data may lead us to
the wrong conclusion about a population characteristic. One
erroneous conclusion in a criminal trial is for a jury to convict an
innocent person, and another is for a guilty person to be set free.
Similarly, there are two different types of errors that might be made
when making a decision in a hypothesis testing problem.
● One type of error involves rejecting H0 even though the null
hypothesis is true. The second type of error results from failing to
reject H0 when it is false. These errors are known as Type I and Type
II errors, respectively.
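One way to see what a Type I error means in practice is to simulate many samples from a population in which H0 is actually true and count how often a test rejects it anyway. The sketch below is illustrative only: it assumes a two-sided large-sample z test for a proportion at the .05 level (critical value 1.96), and the values n = 100, p0 = 0.5, the number of trials, and the seed are hypothetical choices, not taken from the slides.

```python
import random
from math import sqrt


def rejects_h0(successes, n, p0, z_crit=1.96):
    """Two-sided large-sample z test: reject H0: p = p0 when |z| > z_crit."""
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    return abs(z) > z_crit


def type_i_error_rate(n=100, p0=0.5, trials=2000, seed=1):
    """Estimate how often H0 is rejected when H0 is in fact true."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population where the true proportion is p0.
        successes = sum(rng.random() < p0 for _ in range(n))
        if rejects_h0(successes, n, p0):
            rejections += 1
    return rejections / trials


rate = type_i_error_rate()
print(f"Estimated Type I error rate: {rate:.3f}")
```

Every rejection counted here is a Type I error, because the samples are generated with H0 true. The estimated rate should land near the chosen significance level, though not exactly on it, since the number of successes is discrete.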
12
Ref: https://www.scribbr.com/statistics/type-i-and-type-ii-errors/
13
14
Case study:
Women with ovarian cancer usually are not diagnosed until the disease is in an advanced stage, when it is
most difficult to treat. The paper “Diagnostic Markers for Early Detection of Ovarian Cancer” (Clinical Cancer
Research [2008]: 1065–1072) describes a new approach to diagnosing ovarian cancer that is based on using
six different blood biomarkers (a blood biomarker is a biochemical characteristic that is measured in
laboratory testing). The authors report the following results using the six biomarkers:
● For 156 women known to have ovarian cancer, the biomarkers correctly identified 151 as having ovarian
cancer.
● For 362 women known not to have ovarian cancer, the biomarkers correctly identified 360 of them as being
ovarian cancer free.
We can think of using this blood test to choose between two hypotheses:
H0: woman has ovarian cancer
Ha: woman does not have ovarian cancer
Notice that although these are not “statistical hypotheses” (statements about a population characteristic), the
possible decision errors are analogous to Type I and Type II errors.
In this situation, believing that a woman with ovarian cancer is cancer free would be a Type I error—rejecting
the hypothesis of ovarian cancer when it is in fact true. Believing that a woman who is actually cancer free
does have ovarian cancer is a Type II error—not rejecting the null hypothesis when it is in fact false.
Based on the study results, we can estimate the error probabilities. The probability of a Type I error, α, is
approximately 5/156 = .032. The probability of a Type II error, β, is approximately 2/362 = .006.
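The two estimates can be reproduced from the reported counts. This is a minimal sketch; the variable names are illustrative and not part of the case study.

```python
# Counts reported in the case study.
cancer_total = 156      # women known to have ovarian cancer
cancer_detected = 151   # of these, correctly identified as having cancer
healthy_total = 362     # women known not to have ovarian cancer
healthy_cleared = 360   # of these, correctly identified as cancer free

# Type I error: H0 (woman has ovarian cancer) is true but is rejected.
alpha_hat = (cancer_total - cancer_detected) / cancer_total

# Type II error: H0 is false (woman is cancer free) but is not rejected.
beta_hat = (healthy_total - healthy_cleared) / healthy_total

print(f"estimated alpha = {alpha_hat:.3f}")
print(f"estimated beta  = {beta_hat:.3f}")
```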
Example:
● H0: Ali’s used car is safe to drive.
○ Which statement is a Type I error?
○ Which statement is a Type II error?
○ Which type of error has the greater consequences?
1. Ali thinks that his car may be safe when, in fact, the car is not
safe.
2. Ali thinks that his car may be safe when, in fact, the car is safe.
3. Ali thinks that his car may not be safe when, in fact, the car is
not safe.
4. Ali thinks that his car may not be safe when, in fact, the car is
safe.
16
Example:
● Medical personnel are required to report suspected cases of child abuse. Because some diseases
have symptoms that mimic those of child abuse, doctors who see a child with these symptoms must
decide between two competing hypotheses:
● H0: symptoms are due to child abuse
● Ha: symptoms are due to disease
(Although these are not hypotheses about a population characteristic, this exercise illustrates the
definitions of Type I and Type II errors.) The article “Blurred Line Between Illness, Abuse Creates Problem
for Authorities” (Macon Telegraph, February 28, 2000) included the following quote from a doctor
in Atlanta regarding the consequences of making an incorrect decision: “If it’s disease, the worst you have
is an angry family. If it is abuse, the other kids (in the family) are in deadly danger.”
a. For the given hypotheses, describe Type I and Type II errors.
b. Based on the quote regarding consequences of the two kinds of error, which type of error
does the doctor quoted consider more serious? Explain.
17
Example:
● Ann Landers, in her advice column of October 24, 1994 (San Luis Obispo Telegram-Tribune), described the
reliability of DNA paternity testing as follows: “To get a completely accurate result, you would have to be
tested, and so would (the man) and your mother. The test is 100% accurate if the man is not the father and
99.9% accurate if he is.”
a. Consider using the results of DNA paternity testing to decide between the following two
hypotheses:
H0: a particular man is the father
Ha: a particular man is not the father
In the context of this problem, describe Type I and Type II errors. (Although these are not hypotheses about a
population characteristic, this exercise illustrates the definitions of Type I and Type II errors.)
b. Based on the information given, what are the values of 𝛼, the probability of a Type I error, and 𝛽,
the probability of a Type II error?
c. Ann Landers also stated, “If the mother is not tested, there is a 0.8% chance of a false positive.”
For the hypotheses given in Part (a), what is the value of 𝛽 if the decision is based on DNA testing in
which the mother is not tested?
18
Large-Sample Hypothesis Tests for a
Population Proportion
● Now that the basic concepts of hypothesis testing have been introduced, we
are ready to consider how to use sample data to decide between a null and an
alternative hypothesis. In a hypothesis test, there are two possible
conclusions: We either reject H0 or we fail to reject H0.
● The fundamental idea behind hypothesis-testing procedures is this: We reject
the null hypothesis if the observed sample is very unlikely to have occurred
when H0 is true.
● In this section, we consider testing hypotheses about a population proportion
when the sample size n is large.
● As before, p denotes the proportion of individuals or objects in a specified
population that possess a certain property. A random sample of n individuals
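or objects is selected from the population. The sample proportion,
p̂ = (number in the sample that possess the property) / n,
is the statistic used to make inferences about p.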
19
• Critical region: If the value of the test statistic falls in this region, then
the null hypothesis is rejected.
• Z test: A test of the null hypothesis that the mean of a normal
population having a known variance is equal to a specified value.
• t test: A test of the null hypothesis that the mean of a normal
population having an unknown variance is equal to a specified value.
22
Computing a P-Value for a Large-Sample Test Concerning p
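As a rough sketch of how such a P-value can be computed, the Python below evaluates the large-sample z statistic for a proportion and the area under the standard normal curve that corresponds to each form of the alternative hypothesis. The function names and the `alternative` labels are assumptions made for illustration, and the numbers in the usage lines are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_statistic(p_hat, p0, n):
    """Large-sample test statistic for H0: p = p0."""
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)


def p_value(z, alternative):
    """Area under the standard normal curve that matches the alternative."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "greater":      # Ha: p > p0, upper-tail area
        return 1 - std_normal.cdf(z)
    if alternative == "less":         # Ha: p < p0, lower-tail area
        return std_normal.cdf(z)
    if alternative == "two-sided":    # Ha: p != p0, both tails
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")


# Hypothetical numbers: p_hat = 0.55 from n = 400, testing H0: p = 0.50.
z = z_statistic(0.55, 0.50, 400)
print(f"z = {z:.2f}")
print(f"upper-tail P-value = {p_value(z, 'greater'):.4f}")
print(f"two-sided P-value  = {p_value(z, 'two-sided'):.4f}")
```

H0 is rejected when the P-value is smaller than the chosen significance level. The normal approximation is only appropriate when n is large.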
26
Example: Time Stands Still (or So It Seems)
• A study conducted by researchers investigated whether time perception, an indication of a person’s ability
to concentrate, is impaired during nicotine withdrawal.
• After a 24-hour smoking abstinence, 20 smokers were asked to estimate how much time had passed during
a 45-second period.
• Suppose the resulting data on perceived elapsed time (in seconds) were as follows:
69 || 65 || 72 || 73 || 59 || 55 || 39 || 52 || 67 || 57 || 56 || 50 || 70 || 47 || 56 || 45 || 70 || 64 || 67 || 53
n = 20
x̄ = 59.30
s = 9.84
• The researchers wanted to determine whether smoking abstinence had a negative impact on time
perception, causing elapsed time to be overestimated. With μ representing the mean perceived
elapsed time for smokers who have abstained from smoking for 24 hours, we can answer this question
by testing
37
Example: Time Stands Still (or So It Seems)
H0: μ = 45 (no consistent tendency to overestimate the time elapsed)
Ha: μ > 45 (tendency for elapsed time to be overestimated)
To answer this question, we carry out a hypothesis test with a significance level of .05 using the nine-step
procedure:
5. Test statistic: t = (x̄ − 45) / (s / √n)
38
Example: Time Stands Still (or So It Seems)
6. Assumptions: This test requires a random sample and either a large sample size or a normal population
distribution.
7. Computations:
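The computations for this example can be sketched in Python from the 20 observations listed earlier. Two assumptions are made here: the alternative hypothesis is taken to be upper-tailed (μ > 45), since the question is whether elapsed time is overestimated, and the decision is made by comparing t with the tabled critical value 1.729 (upper-tail area .05, 19 degrees of freedom) rather than by computing a P-value.

```python
from math import sqrt
from statistics import mean, stdev

# Perceived elapsed time (seconds) for the 20 smokers in the example.
times = [69, 65, 72, 73, 59, 55, 39, 52, 67, 57,
         56, 50, 70, 47, 56, 45, 70, 64, 67, 53]

mu0 = 45                 # hypothesized mean under H0 (the true elapsed time)
n = len(times)
x_bar = mean(times)      # sample mean
s = stdev(times)         # sample standard deviation (divisor n - 1)

# One-sample t statistic with n - 1 = 19 degrees of freedom.
t_stat = (x_bar - mu0) / (s / sqrt(n))

# Assumed table value: upper-tail area .05 for a t distribution with 19 df.
T_CRITICAL = 1.729
reject_h0 = t_stat > T_CRITICAL

print(f"n = {n}, x_bar = {x_bar:.2f}, s = {s:.2f}")
print(f"t = {t_stat:.2f}, reject H0: {reject_h0}")
```

The computed t is far above the critical value, so at the .05 level the data give convincing evidence that the mean perceived elapsed time exceeds 45 seconds.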
39
Example of Hypothesis Tests for a Population Proportion:
40
41
Example of Hypothesis Tests for a Population Proportion:
A mobile phone company believes that the
percentage of residents who own a phone is 60% or
less. A survey of 250 residents found that 170
responded yes to owning a phone.
A. Form the null and alternative hypotheses.
B. At the 10% significance level, do we have enough
evidence to support the claim that the percentage of
residents who own a phone is 60% or less?
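One possible way to work this example is sketched below. It assumes the hypotheses H0: p = 0.60 (the boundary of the company's claim) versus Ha: p > 0.60, treats the 170 "yes" answers out of 250 as the sample on phone ownership, and uses the large-sample z test with an upper-tail P-value. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n = 250          # survey size
yes = 170        # respondents who answered yes
p0 = 0.60        # hypothesized proportion under H0
alpha = 0.10     # significance level given in the example

p_hat = yes / n  # sample proportion

# Large-sample check: both n*p0 and n*(1 - p0) should be at least 10.
large_sample_ok = n * p0 >= 10 and n * (1 - p0) >= 10

# Test statistic and upper-tail P-value (assumed Ha: p > 0.60).
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_val = 1 - NormalDist().cdf(z)

reject_h0 = p_val < alpha

print(f"p_hat = {p_hat:.2f}, z = {z:.2f}, P-value = {p_val:.4f}")
print(f"large-sample condition met: {large_sample_ok}, reject H0: {reject_h0}")
```

Under these assumptions the P-value is well below .10, so H0 is rejected: the sample gives convincing evidence that more than 60% of residents own a phone, which does not support the company's belief.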
42
Thanks!
Any questions?
You can find me at
● @aym_qh
● [email protected]
43