
Unit 7.2 Hypothesis Testing and Test of Differences


Hypothesis Testing

Alternative Way of Making Conclusions in Hypothesis Testing


• One may want to make a decision by obtaining the p-value.
• The p-value is the smallest value of α for which Ho will be rejected based on the sample information.
• Reporting the p-value allows the reader of the published research to evaluate the extent to which the data
disagree with Ho.
• In particular, it enables the researcher to choose his or her own value of α.
• If p-value < α, then Ho is rejected; otherwise, Ho is not rejected (a short sketch of this rule follows the list).
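
A minimal sketch of this decision rule in Python (the alpha and p_value numbers below are hypothetical, not taken from this module):

alpha = 0.05       # researcher's chosen level of significance
p_value = 0.013    # hypothetical p-value obtained from a test

if p_value < alpha:
    print("Reject Ho")        # the data disagree with Ho at this alpha
else:
    print("Do not reject Ho")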

Measuring Effect Size


A hypothesis test does not really evaluate the absolute size of a treatment effect. To correct this problem, it is
recommended that whenever researchers report a statistically significant effect, they also provide a report of the effect
size (see the guidelines presented by L. Wilkinson and the APA Task Force on Statistical Inference, 1999). Therefore,
as we present different hypothesis tests we also present different options for measuring and reporting effect size.

Definition. A measure of effect size is intended to provide a measurement of the absolute magnitude of a treatment
effect, independent of the size of the sample(s) being used.

One of the simplest and most direct methods for measuring effect size is Cohen's d. Cohen (1988)
recommended that effect size be standardized by measuring the mean difference in terms of the standard deviation.
The resulting measure of effect size is computed as

Cohen's d = mean difference / standard deviation
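
As a rough illustration (a sketch, not part of the original module), the same computation in Python, using the means and standard deviation reported in the independent-samples example later in this unit:

def cohens_d(mean1, mean2, sd):
    # Standardized mean difference: how many standard deviations apart the two means are.
    return (mean1 - mean2) / sd

print(cohens_d(90.23, 84.35, 4.63))   # ≈ 1.27, a large effect by Cohen's guidelines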

Interpreting Effect Size using Cohen's d

Cohen (1988) suggested the following guidelines for evaluating the magnitude of d:
Magnitude of d          Evaluation of Effect Size
d = 0.2                 Small effect
d = 0.5                 Medium effect
d = 0.8                 Large effect

Interpreting r². In addition to developing the Cohen's d measure of effect size, Cohen (1988) also proposed
criteria for evaluating the size of a treatment effect that is measured by r². The criteria were actually suggested for
evaluating the size of a correlation, r, but are easily extended to apply to r². Cohen's standards for interpreting r² are
shown below.
Percentage of Variance Explained, r²    Evaluation of Effect Size
r² = 0.01                               Small effect
r² = 0.09                               Medium effect
r² = 0.25                               Large effect

Confidence Interval

Definition. A confidence interval is an interval, or range of values, centered around a sample statistic. The logic
behind a confidence interval is that a sample statistic, such as a sample mean, should be relatively near to the
corresponding population parameter. Therefore, we can confidently estimate that the value of the parameter should be
located in the interval.
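
A minimal Python sketch of this logic, computing a 95% confidence interval for a population mean from a small hypothetical sample (the scores below are made up for illustration):

import numpy as np
from scipy import stats

scores = [84, 88, 79, 91, 85, 83, 90, 86]            # hypothetical sample
mean = np.mean(scores)
sem = stats.sem(scores)                               # estimated standard error of the mean
lower, upper = stats.t.interval(0.95, len(scores) - 1, loc=mean, scale=sem)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")          # interval centered on the sample mean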

Test of Differences Between Means

I. t Statistic (Student’s t test)

There are two general research designs that can be used to obtain the two sets of data to be compared:
• The two sets of data could come from two completely separate groups of participants. For example, the study
could involve a sample of men compared with a sample of women. Or the study could compare grades for one
group of freshmen who are given laptop computers with grades for a second group who are not given
computers.
• The two sets of data could come from the same group of participants. For example, the researcher could obtain
one set of scores by measuring depression for a sample of patients before they begin therapy and then obtain a
second set of data by measuring the same individuals after 6 weeks of therapy.

Definition. The t statistic is used to test hypotheses about an unknown population mean, µ, when the value of the
population standard deviation, σ, is unknown. The formula for the t statistic has the same structure as the z-score
formula, except that the t statistic uses the estimated standard error in the denominator:

t = (M − µ) / sM, where sM = s/√n is the estimated standard error of the mean.

Definition. Degrees of freedom describe the number of scores in a sample that are independent and free to vary.
Because the sample mean places a restriction on the value of one score in the sample, there are n – 1 degrees of
freedom for a sample with n scores.
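
To make the structure of the t statistic concrete, here is a small Python sketch (the sample scores and the hypothesized mean mu = 100 are assumed values); it computes t = (M − µ) / (s/√n) with df = n − 1 and checks the result against scipy:

import numpy as np
from scipy import stats

sample = np.array([102, 98, 110, 105, 95, 108, 101, 99])   # hypothetical scores
mu = 100                                                     # population mean stated in Ho

n = len(sample)
m = sample.mean()
s = sample.std(ddof=1)            # sample standard deviation (divides by n - 1)
estimated_se = s / np.sqrt(n)     # estimated standard error replaces the true standard error
t = (m - mu) / estimated_se       # same structure as a z-score
df = n - 1                        # degrees of freedom

print(t, df)
print(stats.ttest_1samp(sample, mu))   # should agree with the hand computation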

Two basic assumptions are necessary for hypothesis tests with the t statistic.
• The values in the sample must consist of independent observations. In everyday terms, two observations
are independent if there is no consistent, predictable relationship between the first observation and the second.
More precisely, two events (or observations) are independent if the occurrence of the first event has no effect
on the probability of the second event.
• The population that is sampled must be normal. This assumption is a necessary part of the mathematics
underlying the development of the t statistic and the t distribution table. However, violating this assumption
has little practical effect on the results obtained for a t statistic, especially when the sample size is relatively
large. With very small samples, a normal population distribution is important. With larger samples, this
assumption can be violated without affecting the validity of the hypothesis test. If you have reason to suspect
that the population distribution is not normal, use a large sample to be safe.

A. t test for Independent Samples


Definition. A research design that uses a separate group of participants for each treatment condition (or for each
population) is called an independent-measures research design or a between-subjects research design.

The goal of an independent-measures research study is to evaluate the mean difference between two populations
(or between two treatment conditions). Using subscripts to differentiate the two populations, the mean for the first
population is µ1, and the second population mean is µ2. The difference between means is simply µ1 − µ2. As always,
the null hypothesis states that there is no change, no effect, or, in this case, no difference. Thus, in symbols, the null
hypothesis for the independent-measures test is
H0: µ1 − µ2 = 0 or µ1 = µ2 (No difference between the population means)
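
Before turning to the SPSS walkthrough below, here is an equivalent Python sketch of an independent-samples t test (the two score lists are hypothetical groups, not the STAT 201 data file):

from scipy import stats

group1 = [84, 80, 86, 79, 88, 83, 85]    # hypothetical scores for one group
group2 = [90, 92, 88, 91, 87, 93, 89]    # hypothetical scores for a separate group

t, p = stats.ttest_ind(group1, group2, equal_var=True)   # "equal variances assumed"
print(t, p)   # reject Ho: mu1 = mu2 if p is less than the chosen alpha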

Example: Using the Data File for STAT 201, test whether the grades of male students in English significantly differ
from those of the female students, assuming that the data is approximately normally distributed and the students were
randomly selected. Use the steps in hypothesis testing.

Solution:
1. Formulate the null hypothesis.
Ho: There is no significant difference in the grades of students in English when the students were classified
as to sex.
Ha: There is a significant difference in the grades of students in English when the students were classified as
to sex.

2. Set the level of significance and tailedness of the test.


α = 0.05
Tailedness: two-tailed

3. Determine the test to be used.


Test statistic: t-test for Independent Samples

4. Compute the statistical test.
Using SPSS
Steps: (1) Click Analyze, then select (2) Compare Means, and (3) click Independent-Samples T
Test.

A dialog box will open, (4) Put Grades in English under the Test Variable (s) box, and (5) Sex under
Grouping Variable box.

Then (6) Click the Define Groups… box and (7) write 1 for Group 1 and 2 for Group 2. After which, (8)
Click Continue, and then (9) Click OK.

SPSS Output
Group Statistics (Grades in English)
Sex      N    Mean      Std. Deviation   Std. Error Mean
Male     23   84.3478   3.52406          .73482
Female   22   90.2273   3.66362          .78109
Std. Deviation (Entire Group) = 4.63

Independent Samples Test (Grades in English)
Levene's Test for Equality of Variances: F = 1.167, Sig. = .286

t-test for Equality of Means:
                              t       df      Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI Lower   95% CI Upper
Equal variances assumed       -5.487  43      .000              -5.87945          1.07146                 -8.04025       -3.71865
Equal variances not assumed   -5.482  42.697  .000              -5.87945          1.07240                 -8.04260       -3.71629

Effect Size:
Cohen's d = (Mean of Female − Mean of Male) / Std. Deviation = (90.23 − 84.35) / 4.63 = 5.88 / 4.63 = 1.27
Interpretation:
Results in the Independent Samples Test table show that we are 95% confident that the mean difference of
-5.879 falls between -8.040 and -3.719, with a standard error of 1.071. They further show that a significant difference
existed in the grades of students in English when the students were classified as to sex, since the p-value of 0.000 is
less than the 0.05 level of significance, with a t-value of -5.487 and 43 degrees of freedom.
This simply suggests that female students, with a mean grade of 90.23, performed better than male students,
with a mean grade of 84.35, as shown in the Group Statistics table, with a large effect size, d = 1.27.
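
As an optional cross-check (not part of the SPSS procedure), the same t value can be reproduced in Python from just the summary statistics in the Group Statistics table:

from scipy import stats

t, p = stats.ttest_ind_from_stats(mean1=84.3478, std1=3.52406, nobs1=23,
                                  mean2=90.2273, std2=3.66362, nobs2=22,
                                  equal_var=True)
print(t, p)   # ≈ -5.487, matching the "equal variances assumed" row above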

5. Compare the significance/ probability obtained to the level of significance. Make your decision.
Reject H0 if p≤α, otherwise do not reject.
Decision: Since the p-value of 0.000 is less than the level of significance which is 0.05, reject the null hypothesis
(Ho).

6. Make your conclusion.


Conclusion: There is a significant difference in the grades of students in English when the students were classified
as to sex, t(43) = −5.487, p = 0.000. This means that female students with a mean grade of 90.23 performed
better than male students with a mean grade of 84.35, with a large effect size, d = 1.27.

B. t test for Dependent (or Related) Samples


Definition. A repeated-measures design, or a within-subject design, is one in which the dependent variable is
measured two or more times for each individual in a single sample. The same group of subjects is used in all of the
treatment conditions.
The main advantage of a repeated-measures study is that it uses exactly the same individuals in all treatment
conditions. Thus, there is no risk that the participants in one treatment are substantially different from the participants
in another. With an independent-measures design, on the other hand, there is always a risk that the results are biased
because the individuals in one sample are systematically different (smarter, faster, more extroverted, and so on) than
the individuals in the other sample.
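
Before the SPSS example below, here is an equivalent Python sketch of a dependent (paired) samples t test, using hypothetical pre/post scores for the same individuals:

from scipy import stats

pretest  = [78, 82, 85, 80, 79, 84, 81, 83]   # hypothetical scores before treatment
posttest = [84, 85, 88, 83, 85, 87, 84, 88]   # the same individuals after treatment

t, p = stats.ttest_rel(pretest, posttest)     # pairs the scores by position
print(t, p)   # a small p indicates a reliable pre/post difference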
Example: Using the Data File for STAT 201, test if there is a significant difference between the pretest and the post test
scores of students when exposed to a certain intervention assuming that the data is approximately normally distributed.
Use the steps in hypothesis testing.

Solution:
1. Formulate the null hypothesis.
Ho: There is no significant difference between the pretest and the post test scores of students when exposed
to a certain intervention.
Ha: There is a significant difference between the pretest and the post test scores of students when exposed to
a certain intervention.

2. Set the level of significance and tailedness of the test.


α = 0.05
Tailedness: two-tailed

3. Determine the test to be used.


Test statistic: t-test for Dependent Samples

4. Compute the statistical test.


Using SPSS

Steps: (1) Click Analyze, then select (2) Compare Means, and (3) click Paired-Samples T Test.

A dialog box will open, (4) Click and Put Pretest and Posttest under the Paired Variables box, and (5)
Click OK.

SPSS Output

Paired Samples Statistics
                        Mean      N    Std. Deviation   Std. Error Mean
Pair 1  Pretest Score   81.5556   45   5.51673          .82239
        Posttest Score  85.6444   45   3.95518          .58960

Paired Samples Correlations
                                         N    Correlation   Sig.
Pair 1  Pretest Score & Posttest Score   45   .262          .082

Effect Size: r = 0.262, r² = 0.0686 ≈ 0.07

Paired Samples Test (Pair 1: Pretest Score - Posttest Score)
Paired Differences:
Mean       Std. Deviation   Std. Error Mean   95% CI Lower   95% CI Upper   t        df   Sig. (2-tailed)
-4.08889   5.88458          .87722            -5.85681       -2.32097       -4.661   44   .000
Interpretation:
Results in the Paired Samples Test table show that we are 95% confident that the mean difference of
-4.089 falls between -5.857 and -2.321, with a standard error of 0.877. They further show that there is a
significant difference between the pretest and the posttest scores of students when exposed to a certain intervention,
since the p-value of 0.000 is less than the 0.05 level of significance, with a t-value of -4.661 and 44 degrees of freedom.
This simply suggests that students performed better after the intervention, with a mean pretest
score of 81.56 and a mean posttest score of 85.64, as shown in the Paired Samples Statistics table, with a small effect
size of 7% (r² = 0.07).
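
As a quick, optional arithmetic check of the reported values (using only numbers from the SPSS output above):

mean_diff, se_diff, n, r = -4.08889, 0.87722, 45, 0.262

t = mean_diff / se_diff    # ≈ -4.661, matching the t value in the Paired Samples Test table
df = n - 1                 # 44
r_squared = r ** 2         # ≈ 0.07, the effect size reported above

print(round(t, 3), df, round(r_squared, 3))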

5. Compare the significance/ probability obtained to the level of significance. Make your decision.
Reject H0 if p≤α, otherwise do not reject.
Decision: Since the p-value of 0.000 is less than the level of significance which is 0.05, reject the null hypothesis
(Ho).

6. Make your conclusion.


Conclusion: There is a significant difference between the pretest and the posttest scores of students when exposed to a
certain intervention, t(44) = −4.661, p = 0.000. This simply suggests that students performed better after the
intervention, with a mean pretest score of 81.56 and a mean posttest score of 85.64, with a small effect size of 7% (r² =
0.07).

C. One-way Analysis of Variance (ANOVA)


In everyday language, the scores are different; in statistical terms, the scores are variable. Our goal is to measure
the amount of variability (the size of the differences) and to explain why the scores are different.
The first step is to determine the total variability for the entire set of data. To compute the total variability, we
combine all of the scores from all of the separate samples to obtain one general measure of variability for the complete
experiment. Once we have measured the total variability, we can begin to break it apart into separate components. The
word analysis means dividing into smaller parts. Because we are going to analyze variability, the process is called
analysis of variance. This analysis process divides the total variability into two basic components.
• Between-Treatments Variance. We calculate the variance between treatments to provide a measure of the
overall differences between treatment conditions. Notice that the variance between treatments is really
measuring the differences between sample means.
• Within-Treatment Variance. In addition to the general differences between treatment conditions, there is
variability within each sample. The within-treatments variance provides a measure of the variability inside
each treatment condition.

Analyzing the total variability into these two components is the heart of ANOVA.
Thus, the entire process of ANOVA requires nine calculations: three values for SS, three values for df, two
variances (between and within), and a final F-ratio. However, these nine calculations are all logically related and are all
directed toward finding the final F-ratio.
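
As an illustration of this partitioning (a sketch with small hypothetical groups, not the STAT 201 data), the following Python code computes the three SS values, the three df values, the two variances, and the F-ratio by hand:

import numpy as np

groups = [np.array([4, 5, 6, 5]),     # treatment 1 (hypothetical scores)
          np.array([7, 8, 6, 7]),     # treatment 2
          np.array([9, 10, 9, 8])]    # treatment 3

all_scores = np.concatenate(groups)
grand_mean = all_scores.mean()

ss_total   = ((all_scores - grand_mean) ** 2).sum()
ss_within  = sum(((g - g.mean()) ** 2).sum() for g in groups)            # variability inside each treatment
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)  # differences between sample means

df_total   = len(all_scores) - 1
df_between = len(groups) - 1
df_within  = len(all_scores) - len(groups)

ms_between = ss_between / df_between    # variance between treatments
ms_within  = ss_within / df_within      # variance within treatments
F = ms_between / ms_within

print(ss_total, ss_between + ss_within)   # the two components add up to the total SS
print(F)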

When an ANOVA produces a significant F-ratio, it indicates that at least one of the
sample mean differences is large enough to satisfy the criterion of statistical significance. As the name implies, post
hoc tests are done after an ANOVA. More specifically, these tests are done after an ANOVA when
1. You reject Ho and
2. There are three or more treatments (k ≥3).
Rejecting Ho indicates that at least one difference exists among the treatments. If there are only two treatments,
then there is no question about which means are different and, therefore, no need for posttests. However, with three or
more treatments (k ≥3), the problem is to determine exactly which means are significantly different.

Definition. Post hoc tests (or posttests) are additional hypothesis tests that are done after an ANOVA to determine
exactly which mean differences are significant and which are not.

The independent-measures ANOVA requires the same three assumptions that were necessary for the
independent-measures t hypothesis test:
1. The observations within each sample must be independent.
2. The populations from which the samples are selected must be normal.
3. The populations from which the samples are selected must have equal variances (homogeneity of variance).
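
The SPSS example that follows runs the ANOVA and a Scheffe post hoc test. For comparison, here is a rough Python sketch of the same workflow, using scipy's one-way ANOVA and, for the post hoc step, Tukey's HSD from statsmodels (a different post hoc procedure than Scheffe); the three group score lists are hypothetical:

import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

secondary = [78, 80, 75, 82, 79, 81]   # hypothetical scores per group
bachelors = [83, 85, 82, 86, 84, 80]
masters   = [90, 92, 88, 91, 93, 89]

F, p = stats.f_oneway(secondary, bachelors, masters)
print(F, p)   # a significant F says at least one group mean differs

# Post hoc comparisons only make sense after a significant F with k >= 3 groups.
scores = np.concatenate([secondary, bachelors, masters])
labels = (["Secondary"] * len(secondary) + ["Bachelors"] * len(bachelors)
          + ["Masters"] * len(masters))
print(pairwise_tukeyhsd(scores, labels, alpha=0.05))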

Example: Using the Data File for STAT 201, test if there is a significant difference in the SATT score of students when
classified as to the highest educational attainment of the father assuming that the data is approximately normally
distributed. Use the steps in hypothesis testing.

Solution:
1. Formulate the null hypothesis.
Ho: There is no significant difference in the SATT score of students when classified as to highest educational
attainment of the father.
Ha: There is a significant difference in the SATT score of students when classified as to highest educational
attainment of the father.

2. Set the level of significance and tailedness of the test.
α = 0.05
Tailedness: two-tailed

3. Determine the test to be used.


Test statistic: One-way Analysis of Variance (ANOVA)

4. Compute the statistical test.


Using SPSS
Steps: (1) Click Analyze, then select (2) Compare Means, and (3) click One-Way ANOVA.

A dialog box will open, (4) Click and Put SATT Score under the Dependent List box, and (5) Click Father
HEA in Factor box.

Click Options box, check Descriptive, and click Continue.

For Post Hoc Test, click Post Hoc box, check either Scheffe, LSD, or Bonferroni (depending on the data/
study) and click Continue.

SPSS Output
Descriptives (SATT Score)
                    N    Mean      Std. Deviation   Std. Error   95% CI for Mean (Lower)   95% CI for Mean (Upper)   Minimum   Maximum
Secondary           15   79.2667   5.21627          1.34684      76.3780                   82.1553                   70.00     87.00
Bachelor's Degree   15   83.4000   5.05399          1.30494      80.6012                   86.1988                   70.00     90.00
Master's Degree     15   90.7333   4.25049          1.09747      88.3795                   93.0872                   84.00     96.00
Total               45   84.4667   6.74739          1.00584      82.4395                   86.4938                   70.00     96.00

ANOVA (SATT Score)
                 Sum of Squares   df   Mean Square   F        Sig.
Between Groups   1011.733         2    505.867       21.429   .000
Within Groups    991.467          42   23.606
Total            2003.200         44
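
As an optional cross-check (not required by the SPSS procedure), the ANOVA table can be reproduced in Python from the group means and standard deviations reported in the Descriptives output above:

means = [79.2667, 83.4000, 90.7333]   # Secondary, Bachelor's, Master's
sds   = [5.21627, 5.05399, 4.25049]
ns    = [15, 15, 15]

grand_mean = sum(n * m for n, m in zip(ns, means)) / sum(ns)
ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
ss_within  = sum((n - 1) * sd ** 2 for n, sd in zip(ns, sds))

df_between, df_within = len(means) - 1, sum(ns) - len(means)
F = (ss_between / df_between) / (ss_within / df_within)

print(round(ss_between, 1), round(ss_within, 1), round(F, 2))   # ≈ 1011.7, 991.5, 21.43
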
Effect Size:
Between Masters and Secondary: Cohen's d = (Mean of Masters − Mean of Secondary) / Std. Deviation = (90.73 − 79.27) / 6.75 = 11.46 / 6.75 = 1.70
Between Masters and Bachelor's Degree: Cohen's d = (Mean of Masters − Mean of Bachelor's) / Std. Deviation = (90.73 − 83.40) / 6.75 = 7.33 / 6.75 = 1.09

Interpretation:
Results in the ANOVA table show that a significant difference existed in the SATT score of students when the
students were classified as to the highest educational attainment of the father, since the p-value of 0.000 is less than the
0.05 level of significance, with an F-value of 21.429 and degrees of freedom of 2 between groups and 42 within
groups.
Since a significant difference existed in the SATT score of students when the students were classified as to the highest
educational attainment of the father, a post hoc test is employed to determine where the significant differences
exist between the groups/categories of the highest educational attainment of the father.

Multiple Comparisons (Scheffe), Dependent Variable: SATT Score
(I) Father HEA      (J) Father HEA      Mean Difference (I-J)   Std. Error   Sig.   95% CI Lower   95% CI Upper
Secondary           Bachelor's Degree   -4.13333                 1.77412      .078   -8.6355        .3688
Secondary           Master's Degree     -11.46667*               1.77412      .000   -15.9688       -6.9645
Bachelor's Degree   Secondary           4.13333                  1.77412      .078   -.3688         8.6355
Bachelor's Degree   Master's Degree     -7.33333*                1.77412      .001   -11.8355       -2.8312
Master's Degree     Secondary           11.46667*                1.77412      .000   6.9645         15.9688
Master's Degree     Bachelor's Degree   7.33333*                 1.77412      .001   2.8312         11.8355
*. The mean difference is significant at the 0.05 level.
Interpretation:
Using Scheffe as a post hoc test, a significant difference in the SATT score existed between students whose
fathers are secondary graduates and students whose fathers are master's degree holders (Mean Diff. = -11.467,
p = 0.000). This simply means that students whose fathers are master's degree holders have better SATT scores than
students whose fathers are secondary graduates, with a large effect size, d = 1.70.
Also, a significant difference in the SATT score existed between students whose fathers are bachelor's
degree holders and students whose fathers are master's degree holders (Mean Diff. = -7.333, p = 0.001). This simply
means that students whose fathers are master's degree holders have better SATT scores than students whose fathers are
bachelor's degree holders, with a large effect size, d = 1.09.

5. Compare the significance/ probability obtained to the level of significance. Make your decision.
Reject H0 if p≤α, otherwise do not reject.
Decision: Since the p-value of 0.000 is less than the level of significance which is 0.05, reject the null hypothesis
(Ho).

6. Make your conclusion.


Conclusion: There is a significant difference in the SATT score of students when classified as to the highest
educational attainment of the father, F(2, 42) = 21.429, p = 0.000.
Using Scheffe as a post hoc test, a significant difference in the SATT score existed between students whose
fathers are secondary graduates and students whose fathers are master's degree holders (Mean Diff. = -11.467,
p = 0.000). This simply means that students whose fathers are master's degree holders have better SATT scores than
students whose fathers are secondary graduates, with a large effect size, d = 1.70.
Also, a significant difference in the SATT score existed between students whose fathers are bachelor's degree
holders and students whose fathers are master's degree holders (Mean Diff. = -7.333, p = 0.001). This simply
means that students whose fathers are master's degree holders have better SATT scores than students whose fathers are
bachelor's degree holders, with a large effect size, d = 1.09.

Problem Set:
A. Using the Data File for STAT 201, answer the following problems using the steps in hypothesis testing.
1. Is there a significant difference in the SATT score of students when classified as to sex?

Report
SATT Score
Sex Mean N Std. Deviation
Male 84.7391 23 5.30195
Female 84.1818 22 8.11017
Total 84.4667 45 6.74739

T-Test
Group Statistics

Sex N Mean Std. Deviation Std. Error Mean


SATT Score Male 23 84.7391 5.30195 1.10553
Female 22 84.1818 8.11017 1.72909

Independent Samples Test (SATT Score)
Levene's Test for Equality of Variances: F = 4.988, Sig. = .031

t-test for Equality of Means:
                              t      df       Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI Lower   95% CI Upper
Equal variances assumed       .274   43       .785              .55731            2.03367                 -3.54397       4.65859
Equal variances not assumed   .272   35.945   .788              .55731            2.05231                 -3.60518       4.71981

2. Is there a significant difference in the pretest scores of students when classified as to type of high school graduated
from?
Report
Pretest
HS Graduated Mean N Std. Deviation
Public 81.3214 28 5.93204
Private 81.9412 17 4.90498
Total 81.5556 45 5.51673

T-Test

Group Statistics
HS Graduated N Mean Std. Deviation Std. Error Mean
Pretest Public 28 81.3214 5.93204 1.12105
Private 17 81.9412 4.90498 1.18963

Independent Samples Test (Pretest)
Levene's Test for Equality of Variances: F = .401, Sig. = .530

t-test for Equality of Means:
                              t       df       Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI Lower   95% CI Upper
Equal variances assumed       -.362   43       .719              -.61975           1.71324                 -4.07482       2.83532
Equal variances not assumed   -.379   38.870   .707              -.61975           1.63462                 -3.92643       2.68694

B. Ninety students in a math class were exposed in an intervention where the teacher used geogebra in teaching
graphing trigonometric functions. Students were given a test to measure their performance on the said topic. Test
if there is a significant difference between the pretest and the post test scores of students when exposed to an
intervention of using geogebra in teaching graphing trigonometric functions assuming that the data is
approximately normally distributed. Use the steps in hypothesis testing.

T-Test
Paired Samples Statistics
                   Mean      N    Std. Deviation   Std. Error Mean
Pair 1  Pretest    83.5556   45   5.70309          .85017
        Posttest   87.2222   45   4.63136          .69040

Paired Samples Correlations
                             N    Correlation   Sig.
Pair 1  Pretest & Posttest   45   .453          .002

Paired Samples Test (Pair 1: Pretest - Posttest)
Paired Differences:
Mean       Std. Deviation   Std. Error Mean   95% CI Lower   95% CI Upper   t        df   Sig. (2-tailed)
-3.66667   5.48137          .81711            -5.31345       -2.01988       -4.487   44   .000

C. Using the Data File for STAT 201, answer the given problem using the steps in hypothesis testing.
1. Is there a significant difference in the Entrance Test score of students when classified as to the highest educational
attainment of the father?

Report
Entrance Test
HEA Mean N Std. Deviation
Secondary 318.6667 15 69.44439
Bachelors 356.1333 15 61.94568
Masters 418.0667 15 38.68825
Total 364.2889 45 70.35482

Oneway
ANOVA (Entrance Test)
                 Sum of Squares   df   Mean Square   F        Sig.
Between Groups   75599.244        2    37799.622     11.165   .000
Within Groups    142192.000       42   3385.524
Total            217791.244       44

Post Hoc Tests

Multiple Comparisons, Dependent Variable: Entrance Test
             (I) HEA     (J) HEA     Mean Difference (I-J)   Std. Error   Sig.   95% CI Lower   95% CI Upper
Scheffe      Secondary   Bachelors   -37.46667                21.24625     .223   -91.3831       16.4498
             Secondary   Masters     -99.40000*               21.24625     .000   -153.3165      -45.4835
             Bachelors   Secondary   37.46667                 21.24625     .223   -16.4498       91.3831
             Bachelors   Masters     -61.93333*               21.24625     .021   -115.8498      -8.0169
             Masters     Secondary   99.40000*                21.24625     .000   45.4835        153.3165
             Masters     Bachelors   61.93333*                21.24625     .021   8.0169         115.8498
Bonferroni   Secondary   Bachelors   -37.46667                21.24625     .255   -90.4477       15.5144
             Secondary   Masters     -99.40000*               21.24625     .000   -152.3811      -46.4189
             Bachelors   Secondary   37.46667                 21.24625     .255   -15.5144       90.4477
             Bachelors   Masters     -61.93333*               21.24625     .017   -114.9144      -8.9523
             Masters     Secondary   99.40000*                21.24625     .000   46.4189        152.3811
             Masters     Bachelors   61.93333*                21.24625     .017   8.9523         114.9144
*. The mean difference is significant at the 0.05 level.

