Hypothesis Testing (Lecture) PDF

The document discusses hypothesis testing, including the key concepts of the null hypothesis (Ho), alternative hypothesis (Ha), type I and type II errors, significance level, critical value, critical region, noncritical region, one-tailed and two-tailed tests, and common statistical tests like the z-test, t-test, and chi-square test. It provides examples of stating hypotheses, choosing appropriate tests, and making decisions to reject or fail to reject the null hypothesis based on computed test values and critical values.

Uploaded by

Sour Wolf
Copyright
© All Rights Reserved

HYPOTHESIS TESTING

HYPOTHESIS

- an intellectual guess

- a statement about concepts that refers to observable phenomena, which may be judged true or false and is subject to empirical testing
Two Types of Hypothesis

Null Hypothesis (Ho)

- the hypothesis we hope to reject
- always expresses the idea of no significant difference or relationship

Alternative Hypothesis (Ha or H1)

- the opposite of the null hypothesis; it specifies the existence of a difference or a relationship
Example:
Title: The Effect of Age on Childbearing

Ho: Age has no effect on childbearing.

H1: Age has an effect on childbearing.

Example:
Title: The Relationship Between Obesity and Diabetes

Ho: There is no relationship between obesity and diabetes.

H1: There is a relationship between obesity and diabetes.
Type I and Type II Errors

Type I Error
– the null hypothesis is rejected when it is true

Type II Error
– the null hypothesis is not rejected when it is false
Example:
The defendant is either guilty or innocent, and he or she will be convicted or acquitted.

Ho: The defendant is innocent.

H1: The defendant is not innocent.

                             Ho true (innocent)    Ho false (not innocent)
Reject Ho (convict)          Type I error          Correct decision
Do not reject Ho (acquit)    Correct decision      Type II error
The decision to reject or not reject the null hypothesis does not prove anything.

The only way to prove anything statistically is to use the entire population, which, in most cases, is not possible. The decision, then, is made on the basis of probabilities. That is, when there is a large difference between the mean obtained from the sample and the hypothesized mean, the null hypothesis is probably not true.

How large a difference is necessary to reject the null hypothesis?
Level of Significance (α)

- the probability of committing a Type I error.

* Statisticians generally agree on using three arbitrary significance levels: the 0.10, 0.05 and 0.01 levels.
What significance level should you use?
* The most commonly used level of significance is the 0.05 level.

* If we are going to commit to an action that is expensive, health-related or could have legal consequences, we want to be more certain that we are not falsely rejecting the null hypothesis, so we use the 0.01 level of significance.

* If we are doing a pilot study or just want some indication of an effect, we might let the significance level be 0.10. When in doubt, it is best to use the standard 0.05 level.
Critical Value (CV)
– taken from a table for the appropriate test.
– separates the critical region from the noncritical region.

Critical or Rejection Region
– the range of values of the test value that indicates that there is a significant difference and that the null hypothesis should be rejected.

Noncritical or Non-rejection Region
– the range of values of the test value that indicates that the difference was probably due to chance and that the null hypothesis should not be rejected.

Note: The location of the critical value depends on the inequality sign of the alternative hypothesis.
One-Tailed Test
– indicates that the null hypothesis should be rejected when the test value is in the critical region on one side of the mean.
H1: µ > 250 grams

[Figure: distribution curve with the noncritical region on the left and the critical region in the right tail]

Two-Tailed Test
– the null hypothesis should be rejected when the test value is in either of the two critical regions.
H1: µ ≠ 250 grams

[Figure: distribution curve with a critical region in each tail and the noncritical region in the middle]
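
As a rough illustration of how the significance level and the direction of H1 fix the critical value for a z-test, the sketch below computes it with Python's standard library instead of reading it from a printed table. The function name `critical_z` and its `tails` parameter are illustrative assumptions, not part of the lecture.

```python
from statistics import NormalDist


def critical_z(alpha, tails="two"):
    """Return the positive critical z value for a given significance level.

    tails="two" splits alpha across both tails (H1 uses "not equal");
    tails="one" puts all of alpha in a single tail (H1 uses > or <).
    """
    if tails == "two":
        tail_area = alpha / 2
    elif tails == "one":
        tail_area = alpha
    else:
        raise ValueError("tails must be 'one' or 'two'")
    # inv_cdf returns the z value with the requested area to its left
    return NormalDist().inv_cdf(1 - tail_area)


for alpha in (0.10, 0.05, 0.01):
    one = round(critical_z(alpha, "one"), 3)
    two = round(critical_z(alpha, "two"), 3)
    print(f"alpha = {alpha}: one-tailed {one}, two-tailed {two}")
```

For α = 0.05 this gives about 1.645 for a one-tailed test and about 1.96 for a two-tailed test. For a left-tailed test (H1 uses <), the critical value is the negative of the one-tailed value.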
Statistical Test
Z-test
- used to compare two means when n ≥ 30 and the population standard deviation is known.

One sample mean test
– a sample mean compared to a population mean.

\[ Z = \frac{(\bar{x} - \mu)\sqrt{n}}{\sigma} \]

Statistical Test

Two sample mean test
– a sample mean compared with another sample mean.

\[ Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \]
Statistical Test
T-test
- used when the sample is small (n < 30) and only the sample variance is known.

One sample mean test
– a sample mean compared to a population mean.

\[ t = \frac{(\bar{x} - \mu)\sqrt{n}}{s}, \qquad df = n - 1 \]
Statistical Test
Two sample mean test
– one sample mean compared to another sample mean

\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\left[\dfrac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}\right]\left[\dfrac{1}{n_1} + \dfrac{1}{n_2}\right]}} \]

\[ df = n_1 + n_2 - 2 \]
Statistical Test
t-test (dependent samples)

\[ t = \frac{\bar{D}\sqrt{n}}{\sqrt{\dfrac{n\sum D^2 - \left(\sum D\right)^2}{n(n - 1)}}} \]

df = n – 1
where:
D = difference between the paired scores
\(\bar{D}\) = mean of the differences
n = number of pairs
Statistical Test
t-test with correlation

\[ t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}} \]

df = n – 2
where:
r = Pearson r
n = number of pairs
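
To make the correlation formula concrete, here is a minimal sketch that turns a Pearson r and a number of pairs into a t value and its degrees of freedom. The function name `t_from_r` and the values r = 0.6 and n = 27 are hypothetical, chosen only so the arithmetic is easy to follow; they do not come from the lecture.

```python
import math


def t_from_r(r, n):
    """Return (t, df) for testing whether a Pearson r differs from zero."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    if n < 3:
        raise ValueError("need at least 3 pairs so that df = n - 2 is positive")
    # t = r * sqrt(n - 2) / sqrt(1 - r^2), with df = n - 2
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return t, n - 2


# Hypothetical values for illustration only (not from the lecture)
t, df = t_from_r(r=0.6, n=27)
print(f"t = {t:.2f}, df = {df}")
```

With these illustrative inputs, t is about 3.75 with df = 25, which would then be compared with the critical t value for the chosen significance level.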
Statistical Test
Chi-Square Test
- particularly useful in tests involving cases where persons, events or objects are grouped in two or more nominal categories such as yes or no, approve-undecided-disapprove, or class A, B, C, D.

\[ \chi^2 = \sum \frac{(O - E)^2}{E} \]

where O is the observed frequency and E is the expected frequency.
Steps for Testing Hypothesis
1. State the null hypothesis.
2. Select an appropriate alternative hypothesis.
3. Choose the appropriate statistical test.
4. Select the desired level of significance to be used.
5. Compute the calculated value and determine the critical value.
6. Make the decision. Reject the null hypothesis if the calculated value falls in the critical region (for a two-tailed test, if its absolute value is larger than the critical value); otherwise, do not reject the null hypothesis.
Example:
A company that makes chocolates claims that the mean weight of a bag of chocolates is 240 grams with a standard deviation of 20.5 grams. Using a 0.05 significance level, would you agree with the company if a random sample of 50 bags of chocolates was found to have a mean weight of 230 grams?
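
One possible way to work this example through the six steps is sketched below. It assumes that the 20.5 grams is the population standard deviation and that the test is two-tailed (H1: µ ≠ 240), since the question only asks whether to agree with the claim; neither assumption is stated in the slide.

```python
import math
from statistics import NormalDist

# Values from the example
mu0 = 240.0     # claimed mean weight (grams)
sigma = 20.5    # standard deviation, treated here as the population value
n = 50          # sample size
x_bar = 230.0   # sample mean (grams)
alpha = 0.05

# Steps 1-2: Ho: mu = 240, H1: mu != 240 (two-tailed; an assumption)
# Step 3: z-test, since n >= 30 and sigma is taken as known
z = (x_bar - mu0) * math.sqrt(n) / sigma

# Steps 4-5: critical value for a two-tailed test at the chosen alpha
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# Step 6: reject Ho when the test value falls in either critical region
reject = abs(z) > z_crit
print(f"z = {z:.3f}, critical value = ±{z_crit:.3f}, reject Ho: {reject}")
```

Under these assumptions the test value is about -3.45, which lies beyond -1.96, so the null hypothesis is rejected: the sample does not support the claimed mean of 240 grams.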
Example:
A random sample of 25 cartons of a
certain brand of powdered milk showed
a mean content of 237 grams with a
standard deviation of 8.56 grams, while a
sample of 20 cartons of another brand of
powdered milk showed a mean content
of 240 grams with a standard deviation
of 9.75 grams. Using a 0.05 level of
significance, is there a difference in the
mean content of the two brands of
powdered milk?
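
The sketch below applies the two-sample t formula (pooled variance) to the summary statistics of this example. The critical value is hard-coded from a standard t table for a two-tailed test at α = 0.05 with df = 43, which is an assumption of this sketch rather than something computed here.

```python
import math

# Summary statistics from the example
n1, mean1, s1 = 25, 237.0, 8.56   # first brand
n2, mean2, s2 = 20, 240.0, 9.75   # second brand

# Pooled variance: each sample variance weighted by its degrees of freedom
df = n1 + n2 - 2
pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df

# Under Ho the hypothesized difference mu1 - mu2 is 0
t = (mean1 - mean2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# Two-tailed critical value for alpha = 0.05, df = 43, read from a
# standard t table (hard-coded; an assumption of this sketch)
t_crit = 2.017

reject = abs(t) > t_crit
print(f"t = {t:.3f}, df = {df}, reject Ho: {reject}")
```

The test value is about -1.10, which is inside the noncritical region, so the null hypothesis of equal mean content is not rejected at the 0.05 level.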
Example:
Hoaglin, Mosteller and Tukey (1983) present data on blood levels of beta-endorphin as a function of stress. They measured beta-endorphin levels in 12 patients 12 hours before surgery and again 10 minutes before surgery. The data are presented here, in fmol/ml.
Is there a significant difference in the blood levels of beta-endorphin between 12 hours before and 10 minutes before surgery?

Subject    12 hours before    10 minutes before
1          10.0               6.5
2          6.5                14.0
3          8.0                13.5
4          12.0               18.0
5          5.0                14.5
6          11.5               9.0
7          5.0                18.0
8          8.5                12.0
9          7.5                7.5
10         5.8                6.0
11         4.7                25.0
12         8.0                12.0
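
Because each patient is measured twice, the dependent-samples t-test applies. The sketch below follows the lecture's formula using the sums of D and D². The example does not state a significance level, so α = 0.05 (two-tailed) is assumed, with the critical value for df = 11 taken from a standard t table.

```python
import math

# Beta-endorphin levels (fmol/ml) from the example data
before_12h = [10.0, 6.5, 8.0, 12.0, 5.0, 11.5, 5.0, 8.5, 7.5, 5.8, 4.7, 8.0]
before_10min = [6.5, 14.0, 13.5, 18.0, 14.5, 9.0, 18.0, 12.0, 7.5, 6.0, 25.0, 12.0]

# D = difference between the paired scores (later minus earlier; the sign
# convention is a choice of this sketch)
diffs = [late - early for early, late in zip(before_12h, before_10min)]
n = len(diffs)
sum_d = sum(diffs)
sum_d2 = sum(d ** 2 for d in diffs)
mean_d = sum_d / n

# t = D-bar * sqrt(n) / sqrt((n * sum(D^2) - (sum D)^2) / (n * (n - 1)))
t = mean_d * math.sqrt(n) / math.sqrt((n * sum_d2 - sum_d ** 2) / (n * (n - 1)))
df = n - 1

# Two-tailed critical value for alpha = 0.05, df = 11, from a standard
# t table; alpha is assumed because the example does not state it
t_crit = 2.201

print(f"t = {t:.3f}, df = {df}, reject Ho: {abs(t) > t_crit}")
```

Under the assumed 0.05 level, the test value of about 2.71 exceeds 2.201, so the null hypothesis of no difference between the two time points is rejected.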
Example:
Suppose you were interested in the effects of interracial contact on racial attitudes. You have a fairly reliable test of racial attitudes in which high scores indicate more positive attitudes. You administer the test one Monday morning to a biracial group of fourteen 12-year-old girls who do not know each other but who have signed up for a weeklong community day camp.
The campers then spend the next week taking nature walks, playing ball, eating lunch, swimming, making things, and doing the kinds of things that camp directors dream up to keep 12-year-olds busy. On Saturday morning, the girls are again given the racial attitude test. Thus, the data consist of 14 pairs of before-and-after scores. What conclusion can you draw after testing whether the difference between the before and after scores is significant?
Exercises:
1. A random sample of 8 cigarettes of a
certain brand has an average nicotine
content of 4.2 milligrams and a
standard deviation of 1.4 milligrams. Is
this in line with the manufacturer’s
claim that the average nicotine content
does not exceed 3.5 milligrams? Use a
0.05 level of significance and assume
the distribution of nicotine to be
normal.
Exercises:
2. A random sample of 100 recorded
deaths in the Philippines during the
past year showed an average life span
of 71.8 years, with a standard deviation
of 8.9 years. Does this seem to indicate
that the average lifespan today is
greater than 70 years? Use alpha =
0.10.
Exercises:
3. A manufacturer claims that the average
tensile strength of thread A exceeds the
average tensile strength of thread B by at
least 12 kilograms. To test this claim, 50
pieces of each type of thread are tested
under similar conditions. Type A thread
had an average tensile strength of 86.7
kilograms with a standard deviation of
6.28 kilograms, while type B thread had an
average strength of 77.8 kilograms with a
standard deviation of 5.61 kilograms. Test
the manufacturer’s claim using a 0.01 level
of significance.
Exercises:
4. Two machines fill grated cheese packages. The population is said to be normally distributed with population standard deviations 0.80 ounces for machine A and 0.60 ounces for machine B. Samples are selected from each machine. The sample data are as follows:

             Sample Size    Sample Mean
Machine A    22             8.2 ounces
Machine B    24             7.9 ounces

We are interested in determining whether the mean content of packages filled by machine A is more than the mean content of packages filled by machine B. Conduct an appropriate hypothesis test using the 0.05 level of significance.
SW:
1. A microbiologist claims that the life span of a newly discovered virus in one’s system is 12 days, with a standard deviation of 1.2 days, before it can be contagious. Using a random sample of 40 infected patients in a certain hospital, it was found that the average life span of the said virus is only 11.2 days. Is it safe now to conclude that the microbiologist’s claim is acceptable?
SW:
2. In a study of usage of instant coffee by a simple random sample of 14 rural families, the consumption of a certain coffee was found to average 30 ounces per family every month with a standard deviation of 5 ounces. In another similar study of a sample of 15 urban families, consumption was found to average 28 ounces with a standard deviation of 4 ounces. At the 0.01 level of significance, would you conclude that there was a statistically significant difference in the sample averages of consumption of instant coffee between the rural and the urban families?
CHI-SQUARE (χ²)

The chi-square test can be used for one or two variables, each with two or more categories. It reflects discrepancies between the observed and the expected (theoretical) frequencies of individuals, objects, or events falling in the various categories.
TYPES OF CHI-SQUARE (χ²)
1. Test of Goodness of Fit

• A chi-square goodness of fit test is performed to determine whether a set of observed data corresponds to some theoretical distribution.
• The test is applied when you have one categorical variable from a single population. It is used to determine whether sample data are consistent with a hypothesized distribution.
APPLICATIONS OF χ²
(TEST OF GOODNESS OF FIT)
A University conducted a survey of its recent graduates to collect
demographic and health information for future planning
purposes as well as to assess students' satisfaction with their
undergraduate experiences. The survey revealed that a
substantial proportion of students were not engaging in regular
exercise, many felt their nutrition was poor and a substantial
number were smoking. In response to a question on regular
exercise, 60% of all graduates reported getting no regular
exercise, 25% reported exercising sporadically and 15% reported
exercising regularly as undergraduates. The next year the
University launched a health promotion campaign on campus in
an attempt to increase health behaviors among undergraduates.
The program included modules on exercise, nutrition and
smoking cessation. To evaluate the impact of the program, the
University again surveyed graduates and asked the same
questions.
The survey was completed by 470 graduates and the following data were collected on the exercise question:

                      No Regular Exercise    Sporadic Exercise    Regular Exercise    Total
Number of Students    255                    125                  90                  470

Based on the data, is there evidence of a shift in the distribution of responses to the exercise question following the implementation of the health promotion campaign on campus? Run the test at a 5% level of significance.
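
A minimal sketch of the goodness-of-fit computation for this example follows. It takes the proportions from the earlier survey (60%, 25%, 15%) as the hypothesized distribution and hard-codes the critical value for df = 2 at α = 0.05 from a standard chi-square table; the variable names are illustrative.

```python
# Observed counts from the follow-up survey
observed = {
    "no regular exercise": 255,
    "sporadic exercise": 125,
    "regular exercise": 90,
}

# Proportions from the earlier survey, used as the hypothesized distribution
expected_share = {
    "no regular exercise": 0.60,
    "sporadic exercise": 0.25,
    "regular exercise": 0.15,
}

total = sum(observed.values())  # number of graduates who answered

# Expected count in each category if the distribution had not changed
expected = {k: total * share for k, share in expected_share.items()}

# Chi-square statistic: sum of (O - E)^2 / E over the categories
chi_sq = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)
df = len(observed) - 1

# Critical value for alpha = 0.05, df = 2, from a standard chi-square table
chi_crit = 5.991

print(f"chi-square = {chi_sq:.3f}, df = {df}, reject Ho: {chi_sq > chi_crit}")
```

The statistic comes out at about 8.46, which is larger than 5.991, so at the 5% level there is evidence of a shift in the distribution of responses.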
TYPES OF CHI-SQUARE (χ²)
2. Test of Homogeneity
(Two or more samples, one criterion variable)
• The test is applied to a single categorical variable from two or more different populations. It is used to determine whether frequency counts are distributed identically across different populations.
• The chi-square test is frequently used to determine whether two or more populations are homogeneous, that is, whether the data distributions are similar with respect to a particular variable.
APPLICATIONS OF χ²:
TEST OF HOMOGENEITY
A group of 266 healthy men and women were grouped according to their number of relationships. They were then exposed to a virus that caused colds. The data are summarized in the table below. Do the data provide sufficient evidence to indicate that susceptibility to colds is affected by the number of relationships you have?

                    Number of Relationships
Contracted cold?    3 or less    4 to 5    6 or more
Yes                 49           43        34
No                  31           47        62
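
For a two-way table like this one, each expected count is usually computed as (row total × column total) / grand total, and df = (rows − 1)(columns − 1). The sketch below wraps that in a small helper; the function name `chi_square_table` is hypothetical, and α = 0.05 is assumed because the example does not state a level.

```python
def chi_square_table(table):
    """Return (chi-square statistic, df) for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi_sq = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under Ho: row total x column total / grand total
            expected = row_totals[i] * col_totals[j] / grand_total
            chi_sq += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(col_totals) - 1)
    return chi_sq, df


# Rows: cold (yes, no); columns: 3 or less, 4 to 5, 6 or more relationships
colds = [[49, 43, 34],
         [31, 47, 62]]

chi_sq, df = chi_square_table(colds)

# Critical value for df = 2 at alpha = 0.05 from a standard chi-square
# table; alpha is assumed because the example does not state it
chi_crit = 5.991
print(f"chi-square = {chi_sq:.3f}, df = {df}, reject Ho: {chi_sq > chi_crit}")
```

Under the assumed 0.05 level the statistic (about 11.69) exceeds the critical value, which suggests that the proportion catching a cold differs across the three relationship groups.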
TYPES OF CHI-SQUARE (χ²)
3. Test of Independence
(One sample, two criterion variables)
• The test is applied when you have two categorical variables from a single population. It is used to determine whether there is a significant association between the two variables.
• The one-sample test of independence differs from the test of homogeneity in that for each sample member there are measures on two variables. The sample used in a test of independence consists of members randomly drawn from the same population. This test is used to see if measures taken on two criterion variables are either independent or associated with one another in a given population.
APPLICATIONS OF χ²:
TEST OF INDEPENDENCE
A researcher asked mothers of autistic and non-autistic children to say what time period they breastfed their children. The tabulated results (Schultz, Klonoff-Cohen, Wingard, Akshoomoff, Macera, Ji & Bacher, 2006) are shown below. Do the data provide enough evidence to show that breastfeeding and autism are independent? Test at the 1% level.

           Breastfeeding Timelines
Autism     None    Less than 2 months    2 to 6 months    More than 6 months
Yes        241     198                   164              215
No         20      25                    27               44
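
The same expected-count idea can be written out directly for this 2 × 4 table. This is a sketch, with the critical value for df = 3 at α = 0.01 hard-coded from a standard chi-square table.

```python
# Rows: autism (yes, no); columns: none, < 2 months, 2 to 6 months, > 6 months
observed = [[241, 198, 164, 215],
            [20, 25, 27, 44]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected counts if breastfeeding duration and autism were independent
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(col_totals) - 1)

# Critical value for alpha = 0.01, df = 3, from a standard chi-square table
chi_crit = 11.345

print(f"chi-square = {chi_sq:.3f}, df = {df}, reject Ho: {chi_sq > chi_crit}")
```

Here the statistic is about 11.22, just below the critical value of 11.345, so at the 1% level the null hypothesis of independence is not rejected. The result is close enough to the boundary that a 5% level would lead to the opposite decision.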
EXERCISES:
1. The National Center for Health Statistics (NCHS) provided data on the distribution of weight (in categories) among Americans in 2002. The distribution was based on specific values of body mass index (BMI) computed as weight in kilograms over height in meters squared. Underweight was defined as BMI < 18.5, normal weight as BMI between 18.5 and 24.9, overweight as BMI between 25 and 29.9 and obese as BMI of 30 or greater. Americans in 2002 were distributed as follows: 2% Underweight, 39% Normal Weight, 36% Overweight, and 23% Obese. Suppose we want to assess whether the distribution of BMI is different in the Framingham Offspring sample.

Using data from the 3,326 participants who attended the seventh examination of the Offspring in the Framingham Heart Study, we created the BMI categories as defined and observed the following:

                          Underweight    Normal Weight    Overweight       Obese       Total
                          BMI < 18.5     BMI 18.5-24.9    BMI 25.0-29.9    BMI ≥ 30
Number of Participants    20             932              1374             1000        3326
EXERCISES:

2. A researcher wanted to know if the attitude of children is dependent on their order of birth. Data are as follows:

[Data table missing from the source]

At the 5% level of significance, test if the attitude of the children is dependent on the order of birth.
EXERCISES:

3. One hundred individuals, aged 20 – 58, were given a test of psychomotor skill. Both age and score were classified as shown in the table:

Age        Score
           High    Average    Low
40 – 49    23      20         17
20 – 39    18      12         10

Test for the dependency of the scores obtained in the psychomotor test and the individual’s age at the 10% level of significance.
