
CHAPTER 7: HYPOTHESIS TESTING

Procedures in Hypothesis Testing

Statistical Power

Factors that Affect Power


Hypothesis Testing

A hypothesis is a prediction of the results of a subsequent observation, or a proposed explanation of those results.

It is derived from a theory and states what we expect to observe in the real world if our theory is correct.
Two Important Points from the Definition

1. Hypotheses are predictions that are stated before we make our observations. Sometimes we engage in post-hoc hypothesizing, but normally we state our hypotheses before we begin data collection and analysis.

2. Hypotheses are derived from theory. Theory is central to research, and the process of hypothesis testing allows us to test our theories.
Retaining/Rejecting H0

When we retain the null hypothesis, we are saying that we cannot rule out chance as an explanation for the results we observe.

If, on the other hand, we reject the null hypothesis, we conclude that it is unlikely that the results are due to chance, and we accept the alternative hypothesis, at least tentatively.
7.1. Procedures in Hypothesis Testing

Hypothesis testing involves four steps:

1. Specify the null and alternative hypotheses.
2. Calculate the appropriate test statistic.
3. Evaluate the statistic based on its sampling distribution.
4. State conclusions.
Hypothesis about the Mean of a Single Population

To test hypotheses about the mean of a single population, we can use the z test or the t test, depending on whether the parameters of the population are known or unknown.
Z test

Assumptions

The z test is appropriate when the parameters of the population are known (when μ and σ are known).

In addition, to use this test, the sampling distribution of the mean should be normally distributed. This holds when the population itself is normal or, by the central limit theorem, when the sample is reasonably large (N ≥ 30).
Example 1

A university president believes that over the past few years the average age of students attending his university has changed. To test this hypothesis, an experiment is conducted in which the age of 150 students randomly sampled from the student body is measured. The mean age is 23.5 years. A complete census taken at the university a few years prior to the experiment showed a mean age of 22.4 years, with a standard deviation of 7.6. Find out whether the sample mean is significantly different from the population mean.
Step 1. Specify H0 and H1

There are two types of hypotheses:

Null hypothesis, H0: μ = 22.4 (the sample with mean x̄ = 23.5 comes from a population whose mean is still 22.4).
Alternative hypothesis, H1: μ ≠ 22.4.

Note that
H0 and H1 are mutually exclusive.
H0 and H1 exhaust all possibilities.
In Other Words

The null hypothesis asserts that it is reasonable to consider the sample with mean 23.5 a random sample from a population with µ = 22.4.

In contrast, the non-directional alternative hypothesis asserts that over the past few years the average age of students at the university has changed; the sample with mean 23.5 is therefore a random sample from a population where µ ≠ 22.4.

H0: µ = 22.4
H1: µ ≠ 22.4
Step 2. Calculate the appropriate statistic

Calculate z_obt from the given data:

z_obt = (x̄ − µ)/(σ/√N) = (23.5 − 22.4)/(7.6/√150) = 1.1/0.62 ≈ 1.77
Step 3. Evaluate the statistic

Evaluate the statistic based on its sampling distribution.

The decision rule is as follows:

1. If |z_obtained| ≥ |z_critical|, reject H0.
2. If |z_obtained| < |z_critical|, retain H0 (do not reject H0).

For α = .05 (two-tailed test), from Table A, z_critical = ±1.96.
Step 4. Conclusion

Since |z_obtained| = 1.77 < 1.96, it does not fall within the critical region for rejection of H0. Therefore, we retain H0.

We cannot conclude that the average age of students attending the university has changed.
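As an added sketch (not part of the original slides), the four steps for Example 1 can be carried out in code; the critical value 1.96 is the table value for α = .05, two-tailed.

```python
import math

# Example 1: N = 150, sample mean 23.5, census mean 22.4, sigma 7.6
mu, sigma, n, xbar = 22.4, 7.6, 150, 23.5

# Step 2: z_obt = (xbar - mu) / (sigma / sqrt(N))
z_obt = (xbar - mu) / (sigma / math.sqrt(n))

# Step 3: compare |z_obt| with z_critical = 1.96 (alpha = .05, two-tailed)
z_crit = 1.96
decision = "reject H0" if abs(z_obt) >= z_crit else "retain H0"

print(round(z_obt, 2), decision)  # 1.77 retain H0
```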
The Central Limit Theorem

If repeated random samples of size N are drawn from any population (of whatever form) having a mean µ and a variance σ², then as N becomes large, the sampling distribution of sample means approaches normality, with mean µ and variance σ²/N.
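The theorem can be illustrated with a small simulation (an added sketch using a uniform parent population, which is decidedly non-normal): the distribution of sample means ends up with mean close to µ and variance close to σ²/N.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is reproducible

# Parent population: uniform on (0, 1), so mu = 0.5 and sigma^2 = 1/12
N = 30          # size of each sample
reps = 20_000   # number of repeated samples

means = [statistics.fmean(random.random() for _ in range(N)) for _ in range(reps)]

print(round(statistics.fmean(means), 3))     # close to mu = 0.5
print(round(statistics.variance(means), 5))  # close to sigma^2/N = (1/12)/30 ≈ 0.00278
```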
Example 2

A gasoline manufacturer believes a new additive will result in more miles per gallon. A large number of mileage measurements on the gasoline without the additive have been made by the company under rigorously controlled conditions. The results show a mean of 24.7 miles per gallon and a standard deviation of 4.8. Tests are conducted on a sample of 75 cars using the gasoline plus the additive. The sample mean equals 26.5 miles per gallon. Find out whether the difference between the two means is statistically significant.
Solution

Step 1.
Step 2.
Step 3.
Step 4. Conclusion
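As a check on the steps left blank above, here is an added sketch working Example 2 the same way as Example 1 (again using the tabled critical value 1.96 for α = .05, two-tailed):

```python
import math

# Example 2: N = 75, sample mean 26.5, population mean 24.7, sigma 4.8
mu, sigma, n, xbar = 24.7, 4.8, 75, 26.5

# Step 1: H0: mu = 24.7; H1: mu != 24.7 (non-directional)
z_obt = (xbar - mu) / (sigma / math.sqrt(n))  # Step 2
z_crit = 1.96                                 # Step 3: alpha = .05, two-tailed
decision = "reject H0" if abs(z_obt) >= z_crit else "retain H0"

print(round(z_obt, 2), decision)  # 3.25 reject H0
```

Since |z_obt| exceeds 1.96, the difference is statistically significant.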
t test

Earlier, we stated that the z test is appropriate in situations where both the mean and the standard deviation of the population are known.

However, these situations are relatively rare. It is more common to encounter situations in which the mean of the population can be specified but the standard deviation is unknown. In these cases, the z test cannot be used. Instead, another test, called Student's t test, is employed.
Comparison of the z and t tests

Z test:  z_obtained = (x̄ − µ)/σ_x̄,  where σ_x̄ = σ/√N
t test:  t_obtained = (x̄ − µ)/s_x̄,  where s_x̄ = s/√N
Example

Suppose you have a technique that you believe will decrease the age at which children begin speaking. In your locale, the average age of first word utterances is 13.0 months. The standard deviation is unknown. You apply your technique to a random sample of 15 children. The results show that the sample mean age of first word utterances is 11.0 months, with a standard deviation of 3.34.

1. What is the null hypothesis?
2. What is the non-directional alternative hypothesis?
3. Did the technique work? Use α = .05, two-tailed test.
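A worked sketch of this example (added; the critical value 2.145 is the tabled t for α = .05, two-tailed, df = 14):

```python
import math

# Speaking-age example: N = 15, sample mean 11.0, s = 3.34, H0 mean 13.0
mu, xbar, s, n = 13.0, 11.0, 3.34, 15

# t_obt = (xbar - mu) / (s / sqrt(N)), with df = N - 1 = 14
t_obt = (xbar - mu) / (s / math.sqrt(n))

# Critical value from a t table: t(.05, two-tailed, df = 14) = 2.145
t_crit = 2.145
decision = "reject H0" if abs(t_obt) >= t_crit else "retain H0"

print(round(t_obt, 2), decision)  # -2.32 reject H0
```

Since |t_obt| = 2.32 > 2.145, H0: µ = 13.0 is rejected; the technique appears to lower the age of first words.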
Alpha Level

One of the steps in hypothesis testing (identified above) is to evaluate the statistic based on its sampling distribution.

When we talk about statistical significance, we are trying to rule out chance as an explanation for our results. In order to do that, we must set some criterion or limit for determining when we should reject the null hypothesis.

The decision criterion is often referred to as the alpha level or significance level and is symbolized as α.

While there are no hard and fast rules for setting the alpha level, commonly accepted alpha levels have emerged over the years. In the social and behavioral sciences, alpha levels of .05 and .01 have traditionally been used.
What Does Alpha Mean?

When we are doing hypothesis testing, we are making probability statements about the likelihood of an event occurring by chance. By setting alpha at .05, we are saying that we are willing to be wrong in our evaluation of the null hypothesis about 5% of the time.

For this reason, we cannot say that we prove a hypothesis. Since we are making probability statements, even when the probability is small, we could still be wrong.
Alpha and Statistical Significance

When we set the alpha level at .05, we are saying that if chance alone is working, we would expect to obtain these results less than 5% of the time.

Since we are willing to accept that level of risk, we conclude it is unlikely that these results occurred by chance and that something else must account for them. When this happens, we say that the results are statistically significant.

Remember that we could still be wrong. We think we are correct; the "odds" are in our favor. But we could still be wrong.
Errors in Hypothesis Testing

Because we are only making a probability statement whenever we are doing hypothesis testing, we could be wrong in the evaluation of our results.

Whenever we do hypothesis testing, we run the risk of making two types of errors, commonly referred to as Type I error and Type II error.
Type I and Type II Errors

Type I error means that we rejected the null hypothesis when it is true. In other words, we identify a result as statistically significant when it is not.

Type II error, on the other hand, means we fail to reject the null hypothesis when it is false. In other words, we fail to identify a result as statistically significant when it is.

Type I and Type II errors represent two competing outcomes: a Type I error can occur only when the null hypothesis is actually true, while a Type II error can occur only when the alternative hypothesis is actually true.
Type I and Type II Errors (cont.)

Type I and Type II errors are summarized in the table below.

                 H0 is true            H0 is false
Reject H0        Type I error (α)      Correct decision
Retain H0        Correct decision      Type II error (β)

The probability of making a Type I error is equal to alpha, while the probability of making a Type II error is called beta.
Type I and Type II Errors (cont.)

Should we use directional or non-directional tests?

Some people argue that the evaluation of the null hypothesis should always be two-tailed (non-directional). That is because non-directional tests are more "conservative", making it more difficult to reject the null hypothesis and reducing the Type I error rate.

Also, if the results are extreme in the opposite direction of the prediction, the researcher must still retain the null hypothesis.
7.2. Statistical Power

Statistical power refers to the likelihood that a hypothesis test will correctly reject the null hypothesis when it is false.

Conceptually, statistical power is the sensitivity of the experiment to detect a real effect of the independent variable, if there is one.
What is Statistical Power?

Power is especially important in experimental studies, where the size of the sample is often small.

Since power is a probability, its values can range between 0.00 and 1.00. The higher the power, the more sensitive the study is to detect a real effect of the independent variable.

Experiments with power of .80 or higher are desirable but rarely seen in the social and behavioral sciences. Values of .40 to .60 are much more common.
Characteristics of Power

1. Power is defined mathematically as the probability that the experiment will result in rejecting the null hypothesis if the independent variable has a real effect.
2. Beta is the probability of making a Type II error.
3. Power + Beta = 1.00. Thus, as power increases, beta (the chance of making a Type II error) decreases.
Characteristics of Power (Cont'd)

4. Power varies with N. Increasing N also increases power.
5. Power varies directly with the magnitude of the real effect of the independent variable. The power of an experiment is greater for large effects than for small effects.
6. Power varies with the alpha level. If alpha is made more stringent, power decreases (because beta increases).
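These characteristics can be sketched numerically. The function below is an added illustration for a two-tailed z test, reusing the numbers from Example 1; it computes the probability of rejecting H0 when the true mean is mu1.

```python
import math
from statistics import NormalDist

def power_z_test(mu0, mu1, sigma, n, alpha=0.05):
    """Power of a two-tailed z test of H0: mu = mu0 when the true mean is mu1."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = (mu1 - mu0) / (sigma / math.sqrt(n))  # true effect in standard-error units
    # P(|Z| >= z_crit) when Z ~ Normal(shift, 1)
    return nd.cdf(-z_crit - shift) + (1 - nd.cdf(z_crit - shift))

# Power rises with N (characteristic 4)...
print(round(power_z_test(22.4, 23.5, 7.6, 150), 2))  # ~0.43
print(round(power_z_test(22.4, 23.5, 7.6, 600), 2))  # ~0.94
# ...and falls when alpha is made more stringent (characteristic 6)
print(round(power_z_test(22.4, 23.5, 7.6, 150, alpha=0.01), 2))
```

Note that beta for each case is simply 1 minus the printed power (characteristic 3).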
Type I Errors

We make a Type I error when we reject the null hypothesis even though it is true. The probability of making a Type I error is determined by the alpha or significance level and is symbolized by α.

If we set alpha at .05, a fairly standard level of significance, we have a .05 probability of making a Type I error.

Conversely, we have a .95 probability (1 − α) of making a correct decision (when the null hypothesis is true).
Type II Errors

Power is concerned with the probability of making a Type II error.

A Type II error occurs when we retain the null hypothesis even though it is false. If we are conducting a study, it means that our intervention really does have an effect, but the hypothesis test failed to detect it.
Power and Beta

When talking about power, it is necessary to reverse our perspective from hypothesis testing. Rather than examining the probability of making an error, we are concerned with the probability of reaching the correct decision. Of course, the two are closely related.

The probability of making a Type II error is called beta and is symbolized by the Greek letter β.

Power, therefore, is the probability of correctly rejecting the null hypothesis when it is false, or (1 − β).
Alpha, Beta, and Power
7.3. Factors that Affect Power

Power is a complex statistical concept, and many things can affect the statistical power of a study. However, there are three important factors that affect the power of a hypothesis test:

Alpha
Sample size
Effect size
Alpha and Power

Reducing the alpha level will reduce the power of the test.

As the alpha level becomes more stringent (for example, going from .05 to .01), the power of the test decreases. This is because the probability of a Type II error (beta) increases as the probability of a Type I error (alpha) decreases.

If beta increases, power, the value of (1 − β), will decrease. Thus, a small alpha means that we are less likely to reject the null hypothesis when it is true (Type I error), but at the same time we are more likely to retain a null hypothesis that is false (Type II error), thus decreasing the power of the test.
Sample Size and Power

As the sample size increases, the power of the test increases.

The larger the sample, the better it will represent the population from which it was drawn. In other words, if there is a real treatment effect in the population, we are more likely to find it with a large sample than with a small one.

Remember that the standard error is in part a function of sample size: when we compute the standard error, the denominator of the equation includes N. So the larger the sample size, the smaller the standard error, and a smaller standard error means it is easier to detect an effect if one is present.
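A quick numeric sketch of this point (added, reusing σ = 7.6 from Example 1):

```python
import math

sigma = 7.6  # population standard deviation (from Example 1)

# The standard error of the mean shrinks as N grows: sigma_xbar = sigma / sqrt(N)
for n in (25, 100, 400):
    print(n, round(sigma / math.sqrt(n), 2))
# 25 -> 1.52, 100 -> 0.76, 400 -> 0.38: quadrupling N halves the standard error
```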
Effect Size and Power

Other things being equal, the larger the effect produced by the independent variable on a given dependent variable in the population, the more likely it is that the effect will be statistically significant, and the greater the statistical power.

While alpha level and sample size are relatively straightforward, effect size is a problematic factor in power analysis.
Two Issues Regarding Effect Size and Power

1. In designing a study, it is usually difficult to know what effect size is reasonable to expect from the independent variable under investigation. This makes it difficult to plan research with sufficient power to detect an effect if it is present.

2. Effect size will vary depending on the relative values of the difference between the means and the variance (as measured by the standard deviation). If there is a relatively large amount of variance in the samples, the mean difference must be larger in order to obtain even a small effect size. The problem is that several factors can affect these terms, such as the homogeneity of the population, the sensitivity of the measurement instrument, etc.
Increasing Power

Altering any of these three factors (alpha, sample size, effect size) can increase power. However, not all factors produce the same magnitude of change in power.

All things being equal, relatively little gain is derived from increasing the alpha level. Greater gain is obtained by increasing the sample size and the effect size.
Different Effect Sizes

Different types of data require different statistical procedures, which in turn use different measures of effect size.

Three commonly used effect sizes are:

Cohen's d, which is similar to a z-score and is calculated as

d = (X̄_treatment − X̄_control) / s_control

r, the correlation coefficient, and

r², the percent of variance accounted for.


Cohen's Effect Size d

Literally interpreted, an effect size of d = .50 means that the treatment group mean is about one-half of a standard deviation above or below the comparison group mean.

The difference between the treatment group mean and the control group mean is standardized using the control group standard deviation as an estimate of the population standard deviation.

This provides a standard indicator of the strength of the effect across different studies.
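A minimal sketch of the d computation (added; the score lists are hypothetical, for illustration only):

```python
import statistics

def cohens_d(treatment, control):
    """d = (mean_treatment - mean_control) / s_control."""
    return (statistics.fmean(treatment) - statistics.fmean(control)) / statistics.stdev(control)

# Hypothetical scores for two groups
treatment = [14, 16, 15, 18, 17, 16]
control = [12, 14, 13, 15, 14, 13]

# Treatment mean 16.0, control mean 13.5, control s ~1.05
print(round(cohens_d(treatment, control), 2))  # 2.38
```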
Magnitude of Effect Size

Effect size is often interpreted as small, medium, or large.

Most power tables report power using one or more of these three effect indicators. According to Cohen (1977), the following effect sizes correspond to small, medium, and large effects:

             Small     Medium    Large
Cohen's d    .20       .50       .80
r            .10       .30       .50
r²           .01       .09       .25
Relationship among Effect Sizes

These three effect indicators are related to one another; we can convert from one to the other with a simple calculation.

If the power table we are using reports power for only one indicator and we have another, we can easily convert our effect size to the one in the table below.

Table. Converting one form to another

r = d / √(d² + 4)        d = 2r / √(1 − r²)        r² = r × r
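The standard d-to-r conversions (which assume two groups of equal size) can be sketched as follows; this code is an added illustration, not part of the original slides.

```python
import math

def d_to_r(d):
    """Convert Cohen's d to a correlation r (equal group sizes assumed)."""
    return d / math.sqrt(d * d + 4)

def r_to_d(r):
    """Convert a correlation r back to Cohen's d."""
    return 2 * r / math.sqrt(1 - r * r)

d = 0.8                     # a "large" effect in d terms
r = d_to_r(d)
print(round(r, 2))          # ~0.37
print(round(r * r, 2))      # r^2, the proportion of variance accounted for
print(round(r_to_d(r), 1))  # round-trips back to 0.8
```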
