
Hypothesis Testing

The hypothesis-testing process follows five steps:

1. Identifying a null and alternative hypothesis
2. Setting the level of significance, or alpha level, for rejecting the null hypothesis
3. Collecting data
4. Computing the sample statistic
5. Making a decision about rejecting or failing to reject the null hypothesis
1. Identifying a Null and Alternative Hypothesis

Null Hypothesis
The null hypothesis is a prediction about the population and is typically stated using the language of "no difference" (or "no relationship" or "no association").

e.g., there is no difference between smokers and nonsmokers on depression scores

Alternative Hypothesis
The alternative hypothesis, however, indicates a difference (or relationship or association), and the direction of this difference may be positive or negative (alternative directional hypotheses) or either positive or negative (alternative non-directional hypotheses).

e.g., there is a difference between smokers and nonsmokers on depression scores
2. Setting the level of significance, or alpha level, for rejecting the null hypothesis

In this figure, we see a normal curve illustrating the distribution of sample means of all possible outcomes if the null hypothesis is true. We would expect most of our sample means to fall in the center of the curve if the hypothesis is true, but a small number would fall at the extremes.

In other words, we would expect to find that, for any sample of smokers and nonsmokers, their depression scores are similar, but in a small percentage of cases, we might actually find them to be different. As you can see, there are shaded areas at each end of this curve. We would expect there to be an extremely low probability that a score would fall within these areas.
Critical Region
The area on the normal curve for low probability values if the null hypothesis is true
is called the critical region. If sample data (i.e., the difference between smokers and
nonsmokers on depression) falls into the critical region, the null hypothesis is rejected.
This means that instead of “there is no difference” as stated in the null hypothesis, we
find the alternative to probably be true: “there is a difference.”

One-tailed test of significance

When the critical region for rejection of the null hypothesis is located in a single area at one tail of the sampling distribution.

Two-tailed test of significance

When the critical region for rejection of the null hypothesis is divided into two areas at the tails of the sampling distribution.
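The shaded tail areas of a standard normal sampling distribution can be turned into p values directly. A minimal sketch in Python (the z value of 1.96 is illustrative, not from the text):

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal cumulative probability, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def p_value(z: float, tails: int = 2) -> float:
    """Probability of a result at least this extreme if the null is true."""
    tail_area = 1.0 - normal_cdf(abs(z))
    return tails * tail_area

z = 1.96
print(round(p_value(z, tails=2), 3))  # two-tailed: both shaded areas -> 0.05
print(round(p_value(z, tails=1), 3))  # one-tailed: a single shaded area -> 0.025
```

This is why a one-tailed test rejects more easily in the predicted direction: the entire critical region sits in one tail instead of being split between two.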
3. Collecting Data

We collect data by administering an instrument or recording behaviors on a check sheet for participants. Then, as discussed earlier in this chapter, we code the data and input it into a computer file for analysis.
4. Computing the Sample Statistic
Next, using a computer program, we compute the test statistic and its p value and determine whether it falls inside or outside of the critical region. A p value is the probability (p) that a result could have been produced by chance if the null hypothesis were true. After calculating the p value, we compare it with a value in a table located in the back of major statistics books.

The degrees of freedom (df) used in a statistical test are usually one less than the number of scores. For a sample of scores, df = n - 1.

The degrees of freedom establish the number of scores in a sample that are independent
and free to vary because the sample mean places a restriction on sample variability. In a
sample of scores, when the value of the mean is known, all scores but one can vary (i.e.,
be independent of each other and have any values), because one score is restricted by
the sample mean (Gravetter & Wallnau, 2007).
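The restriction the sample mean places on the scores can be seen in a short sketch (the scores are hypothetical):

```python
# With the sample mean fixed, only n - 1 scores are free to vary:
# the last score is forced by the mean.
scores = [4, 7, 5, 8]                    # hypothetical sample, n = 4
mean = sum(scores) / len(scores)         # mean = 6.0

free = scores[:-1]                       # any n - 1 scores can take any values...
forced = len(scores) * mean - sum(free)  # ...but the last one is then determined

df = len(scores) - 1
print(df, forced)  # df = 3; forced equals the original last score, 8
```

Given the mean (6.0) and the first three scores, the fourth score has no freedom left: it must be 4 × 6.0 − 16 = 8.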
Table 6.5 presents many of the common statistical tests used in educational research. We can use these questions to determine the statistical test:
• Do you plan to compare groups or relate
variables in your hypotheses or research
questions?
• How many independent variables do you have in
one research question or hypothesis?
• How many dependent variables do you have in
one research question or hypothesis?
• Will you be statistically controlling for covariates
in your analysis of the research question or
hypothesis?
• How will your independent variable(s) be
measured?
• How will your dependent variable(s) be
measured?
• Are the scores on your variables normally
distributed; that is, could you assume a normal
curve if the scores were plotted on a graph?
5. Making a Decision about Rejecting or Failing to
Reject the Null Hypothesis
In Table 6.6, we compare smokers and nonsmokers in terms of their
scores on depression.
The statistical test computed was a t-test analysis, and it indicated that the 26 nonsmokers have a mean of 69.77 on the depression scale, whereas the 24 smokers have a mean of 79.79, a difference of 10.02 points between the two groups.
The two-tailed significance test indicates t = -7.49 with 48 degrees of freedom, resulting in a two-tailed p value of .00 (p = .00, i.e., less than .001).
This p value is statistically significant because it is less than alpha = .05.
If the p value is less than alpha, we reject the null hypothesis; if it is greater than alpha, we fail to reject it. Our overall conclusion, then, is that there is a difference between nonsmokers and smokers in their depression scores, so we reject the null hypothesis (there is no difference) and accept the alternative (there is a difference).
In making this statement, we followed this procedure:

1. Look at the value of the statistical test and its associated p value.

2. Determine whether the observed p value is less than or greater than the critical value obtained from a distribution of scores for the statistic (with certain degrees of freedom and with either a one- or two-tailed test at a given significance level). You can determine this table value by hand by comparing the value of the test statistic with the value in a distribution table for the statistic. Alternatively, you can let the computer program identify the observed p value and interpret whether it is greater or less than your alpha value.

3. Decide to reject or fail to reject the null hypothesis. Next, we decide whether our p value is statistically significant in order to reject or fail to reject the null hypothesis. Statistical significance is achieved when the p value of the observed scores is less than the predetermined alpha level set by the researcher.
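The decision rule in step 3 reduces to a single comparison of the observed p value with alpha. A minimal sketch (the p values passed in below are illustrative):

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Reject the null hypothesis only when the observed p falls below alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

print(decide(0.001))  # well inside the critical region -> reject H0
print(decide(0.635))  # far outside the critical region -> fail to reject H0
```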
This test examines whether
nonsmokers and smokers are
different in terms of their peer group
affiliation. The top table shows cells
containing information about the
observed count in each cell and an
expected count. For example, for
athletes, we expected 6.2 individuals
to be nonsmokers, and instead we
found 8. The Pearson chi-square test = 1.71, with df = 3, resulted in a p value (or significance level) of .635. At alpha = .05, .635 is not statistically significant, and our conclusion is to fail to reject the null hypothesis. We conclude that there is no detectable difference between smokers and nonsmokers in peer group affiliation.
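The Pearson chi-square statistic sums the squared deviations of observed from expected counts across the cells. A sketch with mostly hypothetical counts (only the athlete cell, observed 8 versus expected 6.2, comes from the text; the actual test uses the full 2 × 4 table of smoking status by peer group, where df = (rows − 1)(columns − 1) = 3):

```python
# One hypothetical row of the contingency table; only the first
# (athlete) cell's counts are taken from the text.
observed = [8, 12, 9, 11]
expected = [6.2, 13.0, 10.4, 10.4]

# Pearson chi-square: sum of (observed - expected)^2 / expected.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi_square, 2))
```

The closer the observed counts sit to the expected counts, the smaller the statistic, and the less reason there is to reject the null hypothesis of no association.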
Potential Errors in Outcomes

The columns in this table represent the two actual states of affairs in the population: There
is no difference between smokers and nonsmokers on depression scores (said another
way, smokers and nonsmokers are equally depressed), or there really is a difference
between smokers and nonsmokers on depression scores. The information in the rows
shows the two decisions that researchers make based on actual data they receive: to reject
the null hypothesis or to fail to reject the null.
Four possible outcomes:

A. The researcher can reject the null hypothesis (i.e., conclude there is a difference) when the population values are truly such that there is no effect. This is an error (a Type I error).

B. The researcher can commit an error by failing to reject the null hypothesis when it should be rejected because an effect exists (a Type II error).

C. The researcher can reject the null hypothesis when it should be rejected because an effect exists. This is a correct decision and, therefore, no error is committed.

D. The researcher can fail to reject the null hypothesis when it should not be rejected because there was no effect. This is also a correct decision.
Estimating Using Confidence Intervals
Confidence intervals provide additional information about our hypothesis test. A confidence interval or interval estimate is the range of upper and lower statistical values that are consistent with observed data and are likely to contain the actual population mean. Because means are only estimates of population values, they can never be precise, and sample means indicate a point estimate of the population mean.

It is helpful, then, to consider a range of values around the sample mean that it could take given the multiple collection of samples. Researchers set a confidence interval around this mean value of the sample to illustrate the potential range of scores that are likely to occur. This occurrence is framed as a percent, such as 95% of the time (95 out of 100), the population value will be within the range of the interval. The interval can be identified by upper and lower limits, the values that define the range of the interval.
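The upper and lower limits come from adding and subtracting a margin of error around the sample mean. A sketch using the smokers' mean from Table 6.6 (the standard deviation is hypothetical, since the text does not report it, and 1.96 is the large-sample z multiplier; small samples would use a t critical value instead):

```python
import math

mean, sd, n = 79.79, 6.5, 24        # sd is a hypothetical value
standard_error = sd / math.sqrt(n)  # spread of the sampling distribution
margin = 1.96 * standard_error      # half-width of the 95% interval

lower, upper = mean - margin, mean + margin
print(round(lower, 2), round(upper, 2))  # the interval's lower and upper limits
```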
Determining Effect Size
Effect size identifies the strength of the conclusions about group differences or about the
relationship among variables in a quantitative study. We can calculate effect size between
groups in our high school smoking example. A researcher, for example, could examine the
means in Table 6.6 and see that the mean scores were 10.02 points apart, a sizable
difference on a 100-point scale. More precisely, we calculate effect sizes and report them in
terms of standard deviation units. For the t-test statistic, the effect size (ES) can be
calculated with the equation:

ES = (Mean smokers - Mean nonsmokers) / Standard deviation weighted

where the standard deviation weighted can be obtained by averaging the standard deviations
for the smokers and nonsmokers, taking into account the size of the groups.
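The calculation can be sketched as follows. The means and group sizes are from Table 6.6, but the standard deviations are hypothetical, since the text does not report them; the weighting shown is the usual pooled standard deviation, which averages the group variances in proportion to group size:

```python
import math

# Means and group sizes from Table 6.6; standard deviations are hypothetical.
mean_smokers, n_smokers, sd_smokers = 79.79, 24, 6.0
mean_nonsmokers, n_nonsmokers, sd_nonsmokers = 69.77, 26, 7.0

# Weighted (pooled) standard deviation: each group's variance contributes
# in proportion to its degrees of freedom.
pooled_var = (((n_smokers - 1) * sd_smokers ** 2 +
               (n_nonsmokers - 1) * sd_nonsmokers ** 2) /
              (n_smokers + n_nonsmokers - 2))
sd_weighted = math.sqrt(pooled_var)

es = (mean_smokers - mean_nonsmokers) / sd_weighted
print(round(es, 2))  # effect size in standard deviation units
```

With these assumed standard deviations the 10.02-point difference is about one and a half standard deviations, which would conventionally be read as a large effect.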
Reporting the Result
In reporting the result, the researcher also stays close to the statistical findings without drawing broader implications or meaning from them. A results section includes:

Tables: summarize statistical information
Figures: charts, pictures, or drawings that portray variables and their relationships

Detailed Explanation
1. Tables

A table is a summary of quantitative data organized into rows and columns. Typically, tables for
reporting results contain quantitative information, but they might contain text information such as
summaries of key studies found in the literature (and incorporated earlier in a study, before the
results). One advantage of using tables is that they can summarize a large amount of data in a
small amount of space.
Below are some guidelines for creating tables.

1. Although you can present multiple statistical tests in one table, a general guideline
is to present one table for each statistical test. Sometimes, however, you can combine
data from different statistical analyses into a single table. For example, all descriptive
data for the questions (M, SD, and range) can be combined into a single table. However,
you should present each inferential test in an individual table.
2. Readers should be able to grasp easily the meaning of a table. Tables should
organize data into rows and columns with simple and clear headings. Also, the title for
the table should accurately represent the information contained in the table and be as
complete a description as possible.
3. It is important to know the level of statistical detail for descriptive and inferential
statistics to report in tables. An examination of tables in scholarly journals typically
provides models to use for the level of detail required for each type of statistical test.
4. Authors typically report notes that qualify, explain, or provide additional information
in the tables, which can be helpful to readers. Often, these notes include information
about the size of the sample reported in the study, the probability values used in
hypothesis testing, and the actual significance levels of the statistical test.
