Reviewer For Advanced Statistics
Types of Exam:
Identification
Multiple Choice
Solving
Note: Bring your own scientific calculator to the exam. Sharing of calculators is not allowed.
Cellphones are strictly not allowed!
Hypothesis
a supposition or proposed explanation made on the basis of limited evidence as a starting
point for further investigation.
a proposition made as a basis for reasoning, without any assumption of its truth.
an educated guess about something in the world around you. It should be testable, either by
experiment or observation.
A hypothesis is a specific statement of prediction. It describes in concrete (rather than
theoretical) terms what you expect will happen in your study. Not all studies have hypotheses.
Sometimes a study is designed to be exploratory. There is no formal hypothesis, and perhaps
the purpose of the study is to explore some area more thoroughly in order to develop some
specific hypothesis or prediction that can be tested in future research. A single study may have
one or many hypotheses.
Examples of Hypothesis
1. A new medicine you think might work.
2. A way of teaching you think might be better.
3. A possible location of new species.
4. A fairer way to administer standardized tests.
Hypothesis Statement
If you are going to propose a hypothesis, it’s usual to write a statement.
Your statement will look like this:
“If I…(do this to an independent variable)…then (this will happen to the dependent
variable).”
For example:
If I (decrease the amount of water given to herbs) then (the herbs will increase in size).
If I (give patients counseling in addition to medication) then (their overall depression
scale will decrease).
If I (give exams at noon instead of 7) then (student test scores will improve).
If I (look in this certain location) then (I am more likely to find new species).
Hypothesis testing in statistics is a way for you to test the results of a survey or
experiment to see if you have meaningful results. You’re basically testing whether your results
are valid by figuring out the odds that your results have happened by chance. If your results
may have happened by chance, the experiment won’t be repeatable and so has little use.
Hypothesis testing can be one of the most confusing aspects for students, mostly
because before you can even perform a test, you have to know what your null hypothesis is.
Often, those tricky word problems that you are faced with can be difficult to decipher.
But it’s easier than you think; all you need to do is:
1. Figure out your null hypothesis,
2. State your null hypothesis,
3. Choose what kind of test you need to perform,
4. Either support or reject the null hypothesis.
The first step in the decision-making procedure is to state the null hypothesis. The
null hypothesis is a hypothesis of “no effect” and is usually formulated for the express
purpose of being rejected; that is, it is the negation of the point one is trying to make. If it is
rejected, the alternative hypothesis is supported. The alternative hypothesis is the
operational statement of the experimenter’s research hypothesis. The research hypothesis is the
prediction derived from the theory under test.
In advance of the data collection, we specify the set of all possible samples that could
occur when the null hypothesis is true. From these we specify a subset of possible samples
which are so inconsistent with the null hypothesis (or so extreme) that the probability is very
small, when the null hypothesis is true, that the sample we actually observe will be among
them. If we then observe a sample which is included in that subset, we reject the null hypothesis.
The maximum probability with which we would be willing to risk a Type I error
(rejecting a null hypothesis that is actually true) is called the level of significance of the
test. This probability, often denoted by α (alpha), is generally specified before any samples
are drawn, so that the results obtained will not influence our choice.
In practice, a level of significance of 0.05 or 0.01 is customary, although other values
are used. If for example a 0.05 or 5% level of significance is chosen in designing a test of
hypothesis, then there are about 5 chances in 100 that we would reject the hypothesis
when it should be accepted, i.e., we are about 95% confident that we have made the right
decision. In such a case we say that the hypothesis has been rejected at a 0.05 level of
significance, which means that we could be wrong with probability 0.05.
If the level of significance used is 0.10 or 10%, then we are about 90% confident
that we have made the right decision.
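The meaning of a 0.05 level of significance can be checked by simulation. The sketch below is only an illustration under stated assumptions: it uses a z-test with a known population standard deviation of 1 (not the t-test discussed later), the two-tailed critical value 1.96, and a hypothetical helper named type_i_error_rate. Every sample is drawn from a population where the null hypothesis is true, so each rejection is a Type I error.

```python
import random
import statistics


def type_i_error_rate(trials=10000, n=30, z_crit=1.96, seed=1):
    """Estimate how often a true null hypothesis (mean = 0) is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population where the null hypothesis is true.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # z statistic, assuming the population standard deviation is known to be 1.
        z = statistics.fmean(sample) * n ** 0.5
        if abs(z) >= z_crit:
            rejections += 1  # a Type I error: H0 is true but was rejected
    return rejections / trials


rate = type_i_error_rate()
print(f"Estimated Type I error rate: {rate:.3f}")
```

The estimated rate should land close to 0.05, that is, about 5 wrong rejections in every 100 tests.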
Dependent or Correlated Samples are data that consist of information obtained from pairs or repeated
measures. These include sets of twins, with one twin assigned to each experimental group; pairs of
siblings similarly used; and paired students with the same characteristics, such as the same
academic status.
The dependent samples t-test is used to compare the sample means from two related groups. This
means that the scores for both groups being compared come from the same people. The purpose of
this test is to determine if there is a change from one measurement (group) to the other.
Basic Hypotheses
Null: The mean difference between the two groups is not different from 0.
Alternative: The mean difference between the two groups is different from 0.
Real-World Examples
Pre-test/Post-test
Is there an improvement in reading scores after participating in the Read Like a Pro
course?
Do people recall more words after learning a memorization strategy?
When to Choose a Paired T-Test / Paired Samples T-Test / Dependent Samples T-Test
Choose the paired t-test if you have two measurements on the same item, person or thing.
You should also choose this test if you have two items that are being measured with a unique
condition.
For example, you might be measuring car safety performance in vehicle research and testing
and subject the cars to a series of crash tests. Although the manufacturers are different, you might be
subjecting them to the same conditions.
With a “regular” two sample t-test, you’re comparing the means for two different samples. For
example, you might test two different groups of customer service associates on a business-related
test or test students from two universities on their English skills. If you take a random sample from
each group separately and they have different conditions, your samples are independent and you should
run an independent samples t-test (also called between-samples and unpaired-samples).
The null hypothesis for the independent samples t-test is μ1 = μ2. In other words, it assumes
the means are equal. With the paired t-test, the null hypothesis is that the mean pairwise
difference between the two measurements is zero (H0: µd = 0). The difference between the two tests
is very subtle; which one you choose is based on your data collection method.
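To see why the choice of test matters, the following sketch runs both tests on the same numbers. Everything here is an assumption for illustration: the scores are invented, the function names are hypothetical, the paired statistic is written as the mean difference divided by its standard error, and the independent statistic uses the pooled-variance form, which this reviewer does not derive.

```python
import math
import statistics

# Hypothetical scores, invented for illustration only.
first = [10, 12, 9, 14, 11]
second = [12, 14, 10, 17, 13]


def paired_t(x, y):
    """t statistic for H0: mu_d = 0, where D = X - Y for each pair."""
    d = [a - b for a, b in zip(x, y)]
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))


def independent_t(x, y):
    """Pooled-variance t statistic for H0: mu1 = mu2 (assumed form)."""
    n1, n2 = len(x), len(y)
    pooled = ((n1 - 1) * statistics.variance(x)
              + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    return (statistics.mean(x) - statistics.mean(y)) / math.sqrt(pooled * (1 / n1 + 1 / n2))


print(f"paired t      = {paired_t(first, second):.3f}")
print(f"independent t = {independent_t(first, second):.3f}")
```

With these made-up scores the paired statistic is much larger in magnitude, because pairing removes the variation between individuals and leaves only the variation in the differences.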
$$t = \frac{\dfrac{\sum D}{N}}{\sqrt{\dfrac{\sum D^2 - \dfrac{(\sum D)^2}{N}}{(N-1)(N)}}}$$
where,
∑D – Summation of the differences (sum of X – Y)
∑D² – Summation of the squared differences
(∑D)² – Summation of the differences, squared
N – Number of samples (pairs of scores)
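The paired t-test formula can be sketched as a small function that works directly from ∑D, ∑D², and N. The function name paired_t_from_sums and the four pairs of scores are hypothetical, chosen only to show the calculation.

```python
import math


def paired_t_from_sums(x, y):
    """Paired t statistic from sum(D), sum(D^2) and N, with D = X - Y."""
    if len(x) != len(y):
        raise ValueError("X and Y must contain the same number of scores")
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    sum_d = sum(d)                   # summation of the differences
    sum_d2 = sum(v * v for v in d)   # summation of the squared differences
    numerator = sum_d / n
    denominator = math.sqrt((sum_d2 - sum_d ** 2 / n) / ((n - 1) * n))
    return numerator / denominator


# Hypothetical scores for illustration only.
t_value = paired_t_from_sums([5, 7, 6, 9], [3, 6, 6, 5])
print(f"t = {t_value:.4f}")
```

The sign of the result follows the order of subtraction: a positive t means the X scores are larger on average, a negative t means the Y scores are larger.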
Illustrative Example 1:
Calculate the paired t-test for the following data
Score 1 (X) Score 2 (Y)
3 20
3 13
3 13
12 20
15 29
16 32
17 23
19 20
23 25
24 15
32 30
Solution:
1. Subtract each Y score from each X score and get the sum of the difference of X and Y
Score 1 (X) Score 2 (Y) X–Y
3 20 -17
3 13 -10
3 13 -10
12 20 -8
15 29 -14
16 32 -16
17 23 -6
19 20 -1
23 25 -2
24 15 9
32 30 2
∑ = -73
2. Get the square of the difference of X and Y and get its sum
Score 1 (X) Score 2 (Y) X–Y (X – Y)²
3 20 -17 289
3 13 -10 100
3 13 -10 100
12 20 -8 64
15 29 -14 196
16 32 -16 256
17 23 -6 36
19 20 -1 1
23 25 -2 4
24 15 9 81
32 30 2 4
∑ (X – Y) = -73 ∑ (X – Y)² = 1131
3. Compute for t-value (Note that there are 11 samples in the given data)
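$$t = \frac{\dfrac{\sum D}{N}}{\sqrt{\dfrac{\sum D^2 - \dfrac{(\sum D)^2}{N}}{(N-1)(N)}}} = \frac{\dfrac{-73}{11}}{\sqrt{\dfrac{1131 - \dfrac{(-73)^2}{11}}{(11-1)(11)}}} = \frac{-6.6364}{\sqrt{\dfrac{1131 - 484.4545}{110}}} = \frac{-6.6364}{\sqrt{5.8777}} = -2.74$$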
4. Determine the degrees of freedom: df = N – 1 = 11 – 1 = 10.
5. Determine the critical t-value in the t-table (see appendix B, two-tailed since the hypothesis for
this data is assumed to be non-directional), using the degrees of freedom. If alpha (α) is not
specified in the problem, always use 0.05 (5%). For this given data, the t-table value is 2.228.
6. Compare your computed t-value with the t-table value. If the absolute value of the computed
t-value is greater than or equal to the t-table value at an alpha level of .05, then reject the
null hypothesis. Don’t be confused by the negative sign in the computed t-value: it only indicates
the direction of the difference, and the t-table value is the same for both directions.
In this case, the absolute value of the computed t-value (|-2.74| = 2.74) is greater than the
t-table value (2.228) at an alpha level of .05, so we reject the null hypothesis. This implies that
there is a significant difference between the mean scores of the given data.
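The whole of Illustrative Example 1 can be re-checked with a short script. This is a sketch, not part of the original solution: the variable names are arbitrary, and the critical value 2.228 is copied from the text rather than computed.

```python
import math

x = [3, 3, 3, 12, 15, 16, 17, 19, 23, 24, 32]      # Score 1 (X)
y = [20, 13, 13, 20, 29, 32, 23, 20, 25, 15, 30]   # Score 2 (Y)

d = [a - b for a, b in zip(x, y)]   # D = X - Y for each pair
n = len(d)
sum_d = sum(d)                      # sum of the differences
sum_d2 = sum(v ** 2 for v in d)     # sum of the squared differences
t = (sum_d / n) / math.sqrt((sum_d2 - sum_d ** 2 / n) / ((n - 1) * n))
df = n - 1
t_table = 2.228  # two-tailed t-table value quoted in the text for alpha = 0.05

print(f"sum D = {sum_d}, sum D^2 = {sum_d2}, N = {n}, df = {df}")
print(f"t = {t:.2f}")
# Two-tailed decision: compare the absolute value of t with the table value.
print("reject H0" if abs(t) >= t_table else "fail to reject H0")
```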
Illustrative Example 2:
In a study of effectiveness of physical exercise in weight reduction, a group of 10 persons engaged in
a prescribed program of physical exercise for one showed the following results:
Weight Before Weight After
(Pounds) (X) (Pounds) (Y)
200 196
178 171
169 170
212 207
180 177
165 162
201 199
179 173
243 231
144 140
Use the 0.05 level of significance to test if the weight after the prescribed program of physical
exercise is less than the weight before the prescribed physical exercise is given.
Compute for t-value (Note that there are 10 samples in the given data)
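$$t = \frac{\dfrac{\sum D}{N}}{\sqrt{\dfrac{\sum D^2 - \dfrac{(\sum D)^2}{N}}{(N-1)(N)}}} = \frac{\dfrac{45}{10}}{\sqrt{\dfrac{309 - \dfrac{(45)^2}{10}}{(10-1)(10)}}} = \frac{4.5}{\sqrt{\dfrac{309 - 202.5}{90}}} = \frac{4.5}{\sqrt{1.183}} = 4.137$$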
Determine the critical t-value in the t-table (see appendix A, one-tailed), using the degrees of
freedom (df = 10 – 1 = 9). Alpha (α) is specified in the problem as 0.05 (5%). For this given data,
the t-table value is 1.833.
6. Decision: Compare your computed t-value with the t-table value. If the computed t-value is
greater than or equal to the t-table value at an alpha level of .05, then reject the null
hypothesis.
In this case, the computed t-value (4.137) is greater than the t-table value (1.833) at an
alpha level of .05, so we reject the null hypothesis.
7. Conclusion: Based on the result, it can be concluded that the mean weight before the physical
exercise program is significantly larger than the mean weight after it.
The result further reveals that there exists a significant reduction of weight as indicated in the
test. Therefore, the prescribed program of physical exercise is effective in reducing weight.
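Illustrative Example 2 can be re-checked in the same way. As an alternative view, this sketch computes t as the mean difference divided by its standard error, which is algebraically the same as the ∑D formula because the sample variance of the differences equals (∑D² – (∑D)²/N)/(N – 1). The variable names are arbitrary, and the one-tailed table value 1.833 is copied from the text rather than computed.

```python
import math
import statistics

before = [200, 178, 169, 212, 180, 165, 201, 179, 243, 144]  # Weight Before (X)
after = [196, 171, 170, 207, 177, 162, 199, 173, 231, 140]   # Weight After (Y)

d = [b - a for b, a in zip(before, after)]   # D = X - Y, positive when weight dropped
n = len(d)
mean_d = statistics.mean(d)
se_d = statistics.stdev(d) / math.sqrt(n)    # standard error of the mean difference
t = mean_d / se_d
df = n - 1
t_table = 1.833  # one-tailed t-table value quoted in the text for alpha = 0.05

print(f"mean D = {mean_d}, df = {df}, t = {t:.3f}")
# One-tailed decision (weight after is less than weight before): no absolute value.
print("reject H0" if t >= t_table else "fail to reject H0")
```

Because the alternative hypothesis is directional, the sign of t matters here: only a large positive t, meaning the weights went down, supports rejecting the null hypothesis.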
Identification: Identify the word/phrase that best supplies the missing information in each
scenario below.
Scenario 1: To evaluate the effectiveness of a tutorial session, the teacher gave a pretest before
the tutorial, and a posttest after the tutorial. Out of 36 students, the average difference (posttest
- pretest) is 1.6, and the standard deviation of the differences is 4.6. Can we say that the tutorial
session significantly enhances test performance with a significance level of α = 0.10?
Supply the following data:
1. H0 (in symbol, two-tailed): µ1 = µ2
2. Ha: (in symbol, two-tailed): µ1 ≠ µ2
T-table value is 2.2433
Computed t-value is 0.0201
3. Tc-value < Tt-value
4. Based on this, we should/we FAIL TO REJECT the null hypothesis.
Scenario 2: To evaluate the effectiveness of a new strength training program, the physical
strength of the two groups of volunteers was measured for the 3-month training. Can we say that
the new strength training program significantly enhances the physical strength of volunteers
with a significance level of α = 0.05?
Supply the following data:
1. H0 (in symbol, two-tailed): µ1 = µ2
2. Ha: (in symbol, two-tailed): µ1 ≠ µ2
T-table value is 5.3658
Computed t-value is 5.3758
3. Tc-value > Tt-value
4. Based on this, we should/we REJECT the null hypothesis.
Scenario 3: A study was conducted to determine the effectiveness of a newly developed machine
to treat cancer. This was administered to a one-group sample. Supply the following data:
5. H0 (in symbol, one-tailed): µ1 = µ2
6. Ha: (in symbol, one-tailed): µ1 > µ2
T-table value is 1.5896
Computed t-value is 2.8975
7. Tc-value > Tt-value
8. Based on this, we should/we REJECT the null hypothesis.
df = 20 – 1 = 19