MODULE 20
INFERENTIAL
STATISTICS
(PARAMETRIC TEST)
STATISTICS AND PROBABILITY
WHAT IS INFERENTIAL STATISTICS?
In inferential statistics, the researcher is trying to arrive at a
conclusion. When we say inference, we mean an educated guess or
a meaningful prediction based on findings and conclusions.
t-TEST FOR INDEPENDENT SAMPLES
The t-test for independent samples compares the means of two separate groups. Its formula is

$$ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\left(\dfrac{SS_1 + SS_2}{n_1 + n_2 - 2}\right)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} $$

where:
t = the t-test for independent samples
x̄1 = the mean of the data of group 1
x̄2 = the mean of the data of group 2
SS1 = Σx1² - (Σx1)²/n1 = the sum of squares of group 1
SS2 = Σx2² - (Σx2)²/n2 = the sum of squares of group 2
n1 = the number of observations in group 1
n2 = the number of observations in group 2
Note that (1) the formula holds if the variances of the two populations are equal
and the populations are normal, and (2) the null hypothesis is "the means are
equal". A value of t close to zero supports the null hypothesis.
EXAMPLE:
The following are the scores of 10 male students and 10 female third year
Computer Science students of Prof. Alyssa Jandy Angulo in the preliminary examination
in Statistics and Probability.
Scores of Male and Female Third Year Computer Science students
in the Preliminary Examination in Statistics and Probability
Male Female
28 24
36 18
34 22
32 10
8 20
24 6
24 14
20 4
18 12
34 26
Hypotheses:
Ho: There is no significant difference between the performance of the male and
female third year Computer Science students in the preliminary examination in
Statistics and Probability.
Ha: There is a significant difference between the performance of the male and
female third year Computer Science students in the preliminary examination in
Statistics and Probability.
Level of Significance:
Based on the problem:
α = 0.05
df = n1 + n2 - 2 = 10 + 10 - 2 = 18
t(0.05) = 2.101 (see table of critical values of the t-test)
Computation for the t-test:

Male (x1)   (x1)²   Female (x2)   (x2)²
28          784     24            576
36          1296    18            324
34          1156    22            484
32          1024    10            100
8           64      20            400
24          576     6             36
24          576     14            196
20          400     4             16
18          324     12            144
34          1156    26            676

Σx1 = 258   Σx1² = 7356   Σx2 = 156   Σx2² = 2952

Plug into the formula:
x̄1 = 258/10 = 25.8 and x̄2 = 156/10 = 15.6
SS1 = 7356 - (258)²/10 = 699.6 and SS2 = 2952 - (156)²/10 = 518.4
t = (25.8 - 15.6) / √[((699.6 + 518.4)/18)(1/10 + 1/10)] = 10.2/3.68 = 2.77
Conclusion:
Since the t-computed value of 2.77 is greater than the t-
tabular/critical value of 2.101 at 0.05 level of significance with
18 degrees of freedom, the null hypothesis is rejected in favor
of the research hypothesis. This means that there is a
significant difference between the performance of the male
and female third year Computer Science students in the
preliminary examination in Statistics and Probability.
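As a quick machine check of this worked example, here is a minimal sketch in Python using SciPy (an illustrative assumption; the module itself computes by hand). SciPy's ttest_ind with equal_var=True implements the same pooled-variance t-test:

```python
# Minimal sketch: reproduce the pooled-variance t-test for the example above.
# Assumes SciPy is available; not part of the original module.
from scipy import stats

male = [28, 36, 34, 32, 8, 24, 24, 20, 18, 34]
female = [24, 18, 22, 10, 20, 6, 14, 4, 12, 26]

# equal_var=True selects the pooled-variance (independent samples) t-test
t_stat, p_value = stats.ttest_ind(male, female, equal_var=True)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")  # t = 2.77

# Equivalent decision: reject Ho when p < 0.05 (i.e., |t| > 2.101 with df = 18)
print("Reject Ho" if p_value < 0.05 else "Do not reject Ho")
```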
t-TEST FOR
CORRELATED SAMPLES
t-test for Correlated Samples
Another parametric test is the t-test for correlated samples. This
test is applied to a single group of samples and can be used in the
evaluation of a certain program or treatment. Since the t-test for
correlated samples is a parametric test, its conditions must be met: the
data should follow a normal distribution and be measured on an interval or ratio scale.
Related groups means that the subjects are measured at two different times, or
under two different conditions, or are matched on specific criteria.
2 different times: e.g., a pretest and posttest design, or before and after a
treatment/intervention
2 different conditions: e.g., testing the effect of fertilizers A and B on mango
yields
Matched on specific criteria: e.g., a case-control study of 75-year-old males with TB
Diagram:
[The pretest and posttest results of the same group are compared to see if there is a significant difference.]
$$ t = \frac{\bar{D}}{\sqrt{\dfrac{\sum D^2 - \dfrac{(\sum D)^2}{n}}{n(n - 1)}}} $$

where:
t = the t-test for correlated samples
D̄ = the mean difference between the pretest and the posttest
ΣD² = the sum of the squares of the differences between the pretest and the posttest
ΣD = the sum of the differences between the pretest and the posttest
n = the sample size
Example:
During the first day of class, a professor conducted a 50-item
pretest for his fifteen students in Statistics and Probability
before the formal lessons of the subject. After a semester, he
gave a posttest to the same fifteen students using the same set
of examination questions that he gave in the pretest. He wants to
determine if there is a significant difference between the pretest
and the posttest. The following are the results of the experiment.
The professor used the α = 0.01 level of significance.

Student   1  2  3  4  5  6  7  8  9  10 11 12 13 14 15
Pretest   15 12 20 10 8  27 29 13 19 22 25 14 28 18 16
Posttest  20 18 25 25 20 35 43 28 29 37 46 27 33 37 28
Solution:
Hypotheses:
Ho: There is no significant difference between the Pretest and Posttest of the
fifteen students in Statistics and Probability based on the teaching method used by
the Professor.
Ha: There is a significant difference between the Pretest and Posttest of the fifteen
students in Statistics and Probability based on the teaching method used by the
Professor.
Level of Significance:
Based on the problem:
α = 0.01
df = n - 1 = 15 - 1 = 14
t(0.01) = 2.977 (see table of critical values of the t-test)
Compute the t-value using the t-test for correlated samples:

Pretest   Posttest   D     D²
15        20         -5    25
12        18         -6    36
20        25         -5    25
10        25         -15   225
8         20         -12   144
27        35         -8    64
29        43         -14   196
13        28         -15   225
19        29         -10   100
22        37         -15   225
25        46         -21   441
14        27         -13   169
28        33         -5    25
18        37         -19   361
16        28         -12   144

ΣD = -175   ΣD² = 2405   D̄ = ΣD/n = -175/15 = -11.67

Plug into the formula:
t = -11.67 / √[(2405 - (-175)²/15) / (15 × 14)] = -11.67/1.32 = -8.87

Decision Rule:
If the t-computed value is beyond the critical value (greater in absolute value),
reject the Ho; otherwise, do not reject the Ho.

Conclusion:
The t-computed value of -8.87 is beyond the t-critical value of -2.977 at the 0.01 level of significance
with 14 degrees of freedom. The null hypothesis is therefore rejected in favor of the research
hypothesis. This means that the posttest result is significantly higher than the pretest result, which
implies that the professor's method of teaching is effective.
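The same result can be checked with SciPy's paired t-test. This is a minimal sketch under the assumption that SciPy is available; it is not part of the module's manual method:

```python
# Minimal sketch: the correlated-samples (paired) t-test for the example above.
from scipy import stats

pretest = [15, 12, 20, 10, 8, 27, 29, 13, 19, 22, 25, 14, 28, 18, 16]
posttest = [20, 18, 25, 25, 20, 35, 43, 28, 29, 37, 46, 27, 33, 37, 28]

# ttest_rel works on the paired differences, matching the formula above
t_stat, p_value = stats.ttest_rel(pretest, posttest)
print(f"t = {t_stat:.2f}, p = {p_value:.6f}")  # t = -8.87

# Two-tailed critical value at alpha = 0.01 with df = 14
crit = stats.t.ppf(1 - 0.01 / 2, df=14)
print(f"critical t = ±{crit:.3f}")  # ±2.977
```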
z-TEST
z-test
The z-test is another test under parametric statistics which requires
normality of the distribution. It is typically applied to large samples (n > 30)
when the population standard deviation (σ) is known. It is used to compare two
means: the sample mean and the perceived population mean.
It is also used to compare two sample means taken from the same
population. The z-test can be applied in two ways: (1) the One-Sample
Mean test and (2) the Two-Sample Mean test.
The tabular values of the z-test at the 0.01 and 0.05 levels of significance are
shown below:

              Level of Significance
Test          0.01      0.05
One-tailed    2.33      1.645
Two-tailed    2.575     1.960
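These tabular values are simply quantiles of the standard normal distribution. The sketch below (an illustrative aside, assuming SciPy is available; not part of the module) shows how they can be recovered:

```python
# Minimal sketch: derive the tabular z values from standard normal quantiles.
from scipy.stats import norm

for alpha in (0.01, 0.05):
    one_tailed = norm.ppf(1 - alpha)      # 2.326 -> 2.33, and 1.645
    two_tailed = norm.ppf(1 - alpha / 2)  # 2.576 (tabled as 2.575), and 1.960
    print(f"alpha = {alpha}: one-tailed {one_tailed:.3f}, two-tailed {two_tailed:.3f}")
```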
z-TEST FOR
ONE SAMPLE GROUP
z-test for One Sample Group
The z-test for one sample group is used to compare the perceived population
mean μ against the sample mean x̄. Using this test, we can determine whether the
mean of a group differs from a specified value.
This procedure is based on the normal distribution, so for small samples it
works best if the data were drawn from a normal distribution or one that
is close to normal. Usually, we can consider samples of size 30 or higher to be large
samples. If the population standard deviation is not known, the sample
standard deviation can be used as a substitute.
The null hypothesis used for this test is as follows:

Ho: μ = μ₀ (the population mean equals the hypothesized value)

and the test statistic is

$$ z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} $$

where:
z = the z-test for one sample group
x̄ = sample mean
μ = hypothesized value of the population mean
σ = population standard deviation
n = sample size
Example:
A sample of 40 students obtained a mean performance rating of 80, and the
population standard deviation is 76. At the 0.05 level of significance, test
whether the average performance of the students is 86%.

Hypotheses:
Ho: The average performance of the students is 86%.
Ha: The average performance of the students is not 86%.

Level of Significance:
α = 0.05
z-tabular = 1.645 (one-tailed test)

Statistics:
z-test for one sample group, where:
x̄ = 80
μ = 86
σ = 76
n = 40

Plug into the formula:
z = (x̄ - μ) / (σ/√n) = (80 - 86) / (76/√40) = -6/12.02 ≈ -0.50
Decision Rule:
If the z-computed value is greater than or beyond the z-tabular, reject the Ho.
Conclusion:
Since the z-computed value of -0.50 is not beyond the critical value of -1.645 at the 0.05
level of significance, the null hypothesis that the average performance of the students
is 86% is not rejected.
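Because this test is a direct formula, it is easy to verify by machine. Here is a minimal sketch (illustrative only; the variable names are my own) that computes z = (x̄ - μ)/(σ/√n):

```python
# Minimal sketch: one-sample z-test computed directly from the formula.
import math

x_bar, mu, sigma, n = 80, 86, 76, 40  # values taken from the example above

z = (x_bar - mu) / (sigma / math.sqrt(n))
print(f"z = {z:.2f}")  # z ≈ -0.50

# One-tailed critical value at alpha = 0.05 is -1.645
print("Reject Ho" if z < -1.645 else "Do not reject Ho")
```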
z-TEST FOR
TWO SAMPLE MEANS
z-test for Two Sample Means
The z-test for two sample means is another parametric test. It is used
to compare the means of two independent groups from which the samples were
drawn. The samples must be drawn from normally distributed populations.
The formula for the z-test for two sample means is:

$$ z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} $$

where:
x̄1 = the mean of sample 1
x̄2 = the mean of sample 2
s1² = the variance of sample 1
s2² = the variance of sample 2
n1 = size of sample 1
n2 = size of sample 2
Example:
An entrance examination was administered to incoming freshmen in the
department of Information Technology and the department of Computer
Science, with 100 students randomly selected from each department. The mean
scores of the samples were x̄1 = 89 and x̄2 = 83, and the variances of the test
scores were 45 and 40, respectively. Is there a significant difference between
the two groups? Use the 0.01 level of significance, two-tailed test.
Problem: Is there any significant difference between the two groups?
Hypotheses:
Ho: There is no significant difference between the two groups.
Ha: There is a significant difference between the two groups.
Level of Significance:
α = 0.01
z-tabular = 2.575 (two-tailed test)
Statistics:
z-test for a two-tailed test, where:
x̄1 = 89
x̄2 = 83
s1² = 45
s2² = 40
n1 = 100
n2 = 100

Plug into the formula:
z = (89 - 83) / √(45/100 + 40/100) = 6/√0.85 ≈ 6/0.92 ≈ 6.52

Decision Rule:
If the z-computed value is greater than or beyond the z-tabular value, reject the Ho.

Conclusion:
Since the z-computed value of 6.52 is beyond the z-tabular value of 2.575 at the
0.01 level of significance, we reject the null hypothesis and accept the
alternative hypothesis that there is a significant difference between the two groups.
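A minimal sketch of the same computation in plain Python (illustrative only; the variable names are my own):

```python
# Minimal sketch: z-test for two sample means, z = (x̄1 - x̄2)/√(s1²/n1 + s2²/n2).
import math

x1_bar, x2_bar = 89, 83  # sample means
v1, v2 = 45, 40          # sample variances
n1, n2 = 100, 100        # sample sizes

z = (x1_bar - x2_bar) / math.sqrt(v1 / n1 + v2 / n2)
print(f"z = {z:.2f}")  # 6.51 at full precision; the module rounds sqrt(0.85) to 0.92, giving 6.52

print("Reject Ho" if abs(z) > 2.575 else "Do not reject Ho")  # two-tailed, alpha = 0.01
```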
F-TEST
F-test
The F-test is another parametric test, also called Analysis of Variance (ANOVA).
Ronald A. Fisher developed the F-test. This test is used when the variances
of the two populations are equal, the distribution is normal, and the level
of measurement is interval or ratio.
The test can be two-tailed or one-tailed. The two-tailed test is used to test
whether the variances are unequal in either direction, while the one-tailed test is
used to test in one direction, i.e., whether the variance of the first population is
either greater than or less than the second population variance, but not both.
There are three kinds of analysis of variance: (a) one-way
analysis of variance, (b) two-way analysis of variance, and (c) three-way
analysis of variance.
As stated before, the one-way ANOVA is also called the F-test and is used
to test the null hypothesis that the means of the independent groups are
equal.
The F-test can be used for two or more independent groups, though
there is a misconception that it can only be used for more than two groups.
When performing a one-way ANOVA, we are testing the null hypothesis:

Ho: μ1 = μ2 = ... = μk (all the group means are equal)
The ANOVA table has five columns and these are the
sources of variation, degrees of freedom, sum of squares, mean
squares and the F-value, both the computed and the tabular
values.
The sources of variations are between the groups, within the group itself
and the total variation.
The degrees of freedom for the total is the total number of observations
minus 1. The degrees of freedom for the between groups is the total number
of groups minus 1. The degrees of freedom for the within groups is the total
degrees of freedom (df) minus the between-groups degrees of freedom (df).
Source of Variation   df      SS    MS    F-Computed   F-Tabular
Between Groups        K - 1   BSS   MSB   MSB/MSW      See the table at 0.05 or the
                                                       desired level of significance
                                                       with the df between and within
                                                       groups
Within Groups         N - K   WSS   MSW
Total                 N - 1   TSS
Based on the table:
1. K is the number of columns (groups).
2. N is the total number of observations.
3. BSS is the between sum of squares: BSS = Σ[(Σx_i)²/n_i] - CF, where CF = (Σx)²/N is the correction factor.
4. TSS is the total sum of squares: TSS = Σx² - CF.
5. WSS is the within sum of squares, i.e., the difference TSS - BSS.
6. MSB, the mean square between, is equal to BSS divided by its df.
7. MSW, the mean square within, is equal to WSS divided by its df.
8. The F-computed value is MSB/MSW.
The F-computed value must be compared with the
F-tabular value at a given level of significance with the
corresponding df’s of BSS and WSS.
Example:
The following are the daily sales of three brands of cellular phone (A, B, and C)
recorded over five days.

                    BRAND
Day    A (x1)   B (x2)   C (x3)   (x1)²   (x2)²   (x3)²
1      4        8        3        16      64      9
2      6        3        5        36      9       25
3      2        6        3        4       36      9
4      5        4        6        25      16      36
5      2        7        4        4       49      16
Total  19       28       21       85      174     95
       n1 = 5   n2 = 5   n3 = 5
Perform the analysis of variance and test the hypothesis at 0.05 level of
significance that the average sales of the three brands of cellular phone are
equal.
Problem: Is there any significant difference in the average sales of the three
brands of cellular phone?
Hypotheses:
Ho: There is no significant difference in the average sales of the three brands of
cellular phone.
Ha: There is a significant difference in the average sales of the three brands of cellular
phone.
Computation:

CF = (Σx)²/N = (68)²/15 = 308.27
TSS = Σx² - CF = 354 - 308.27 = 45.73
BSS = (19² + 28² + 21²)/5 - CF = 317.20 - 308.27 = 8.93
WSS = TSS - BSS = 45.73 - 8.93 = 36.80

Source of Variation   df   SS      MS      F-Computed   F-Tabular
Between Groups        2    8.93    4.465   1.45         3.89
Within Groups         12   36.80   3.07
Total                 14   45.73
Decision rule:
If the F-computed is greater than the F-tabular value, reject the null hypothesis.
Conclusion:
Since the F-computed value of 1.45 is less than the F-tabular value of 3.89 at the
0.05 level of significance with 2 and 12 degrees of freedom, we retain the null
hypothesis; that is, there is no significant difference in the average sales of the three
brands of cellular phone.
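The ANOVA can be verified with SciPy's one-way ANOVA routine. A minimal sketch, assuming SciPy is available (not part of the module's hand computation):

```python
# Minimal sketch: one-way ANOVA for the three brands of cellular phone.
from scipy import stats

brand_a = [4, 6, 2, 5, 2]
brand_b = [8, 3, 6, 4, 7]
brand_c = [3, 5, 3, 6, 4]

f_stat, p_value = stats.f_oneway(brand_a, brand_b, brand_c)
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")  # F ≈ 1.46 (1.45 in the module after rounding)

# F-tabular at alpha = 0.05 with df = (2, 12)
crit = stats.f.ppf(1 - 0.05, dfn=2, dfd=12)
print(f"critical F = {crit:.2f}")  # ≈ 3.89
```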
PEARSON PRODUCT
MOMENT COEFFICIENT
OF CORRELATION
What is the Pearson Product Moment Coefficient of Correlation?
The Pearson product-moment correlation, symbolized by r, is a
parametric measure of association for two variables, say X and Y.
It measures both the strength and the direction of a linear
relationship.
If one variable Y is an exact increasing linear function of another
variable X, the correlation is 1. This is also called a perfect
positive correlation.
[Scatter plots: a positive correlation, a negative correlation, and no correlation (r = 0).]
Note that we use the r to determine the index of relationship
between two variables, the independent and the dependent
variables.
What is the Degree of Correlation?
1. Correlation is measured on a scale of -1 to +1, where 0 indicates no
correlation and either -1 or +1 suggests a high correlation. Both -1 and +1
represent an equally high degree of correlation.
2. Moderate correlation is often suggested by a correlation coefficient of
about 0.7. There is no absolute numeric guide that tells when two variables
have a low or high degree of correlation; however, values of r close to -1 or
+1 suggest a high degree of correlation, values close to 0 suggest no or low
correlation, and values between 0.7 and 0.8 indicate moderate correlation.
How do we interpret the computed r?
The computed r is compared with the tabular r-value at the chosen level of
significance with df = n - 2 (see the example below). The value of r is
obtained from the formula

$$ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} $$

where:
r = the Pearson Product Moment Coefficient of Correlation
n = sample size
Σx² = the sum of the squares of x
Σy² = the sum of the squares of y
Σxy = the sum of the products of x and y
Example:
Consider the pre-test and the post-test in Statistics and Probability of the
ten students of CS 3201.
Student        A   B   C   D   E   F   G   H   I   J
Pre-test (x)   56  70  60  85  75  87  72  89  75  86
Post-test (y)  65  78  60  90  75  90  79  89  89  95
Level of Significance:
α = 0.05
df = n - 2 = 10 - 2 = 8
r(0.05) = 0.6319 (see table of critical values of r)
Computation (Statistics):

x    y    x²     y²     xy
56   65   3136   4225   3640
70   78   4900   6084   5460
60   60   3600   3600   3600
85   90   7225   8100   7650
75   75   5625   5625   5625
87   90   7569   8100   7830
72   79   5184   6241   5688
89   89   7921   7921   7921
75   89   5625   7921   6675
86   95   7396   9025   8170

Σx = 755   Σy = 810   Σx² = 58181   Σy² = 66842   Σxy = 62259
Apply the formula:

r = [10(62259) - (755)(810)] / √{[10(58181) - (755)²][10(66842) - (810)²]}
  = 11040 / √[(11785)(12320)]
  = 11040 / 12049.53
  = 0.92

Decision Rule:
If the computed r-value is greater than the r-tabular value, reject the null hypothesis.
Conclusion:
Since the computed r-value of 0.92 is higher than the r-tabular value of
0.632 at the 0.05 level of significance with 8 degrees of freedom, we reject
the null hypothesis. This means that there is a significant relationship
between the pre-test and post-test of the ten CS students in Statistics and
Probability. It implies that the higher the pre-test score, the higher the
post-test score; likewise, the lower the pre-test score, the lower the
post-test score.
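A minimal sketch verifying the correlation with SciPy (an illustrative assumption; the module computes r by hand):

```python
# Minimal sketch: Pearson r for the pre-test/post-test scores above.
from scipy import stats

pretest = [56, 70, 60, 85, 75, 87, 72, 89, 75, 86]
posttest = [65, 78, 60, 90, 75, 90, 79, 89, 89, 95]

r, p_value = stats.pearsonr(pretest, posttest)
print(f"r = {r:.2f}, p = {p_value:.4f}")  # r ≈ 0.92
```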
LINEAR
REGRESSION
What is a LINEAR REGRESSION?
Linear regression is the most basic and commonly used predictive analysis.
Regression estimates are used to describe data and to explain the relationship
between one dependent variable and one or more independent variables.
Linear regression analysis consists of more than just fitting a straight line
through a cloud of data points. It consists of 3 stages: (1) analyzing the
correlation and directionality of the data, (2) estimating the model, i.e., fitting the
line, and (3) evaluating the validity and usefulness of the model.
http://www.statisticssolutions.com/what-is-linear-regression/
How Does Linear Regression Work?
The regression line takes the form

$$ y = a + bx $$

where:
x is the independent or predictor variable
y is the dependent or predicted variable
b is the slope of the line
a is the constant value (the y-intercept)

To determine the linear regression equation, we first need to solve
for the values of a and b:

$$ b = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}, \qquad a = \frac{\sum y - b\sum x}{n} $$
Example:
Consider the table below:
x 1 2 3 4 5 6 7 8
y 3 1 4 2 5 3 7 5
a. Find the linear regression equation.
b. What is the value of y if x = 12?
c. Plot the points and sketch the trend line.
x    y    xy    x²
1    3    3     1
2    1    2     4
3    4    12    9
4    2    8     16
5    5    25    25
6    3    18    36
7    7    49    49
8    5    40    64

Σx = 36   Σy = 30   Σxy = 157   Σx² = 204
Solving for b:
b = [8(157) - (36)(30)] / [8(204) - (36)²] = 176/336 ≈ 0.52

Solving for a:
a = [30 - 0.52(36)] / 8 ≈ 1.39

a. The linear regression equation is y = 1.39 + 0.52x.
b. If x = 12, then y = 1.39 + 0.52(12) ≈ 7.68.
c. [Scatter plot of the data points with the fitted trend line.]
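The same line can be fitted with SciPy's least-squares routine. A minimal sketch, assuming SciPy is available (not part of the module's manual solution):

```python
# Minimal sketch: least-squares line for the example data, y = a + bx.
from scipy import stats

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [3, 1, 4, 2, 5, 3, 7, 5]

result = stats.linregress(x, y)
print(f"b (slope) = {result.slope:.2f}")          # ≈ 0.52
print(f"a (intercept) = {result.intercept:.2f}")  # ≈ 1.39

# Predicted y when x = 12
print(f"y(12) = {result.intercept + result.slope * 12:.2f}")  # ≈ 7.68
```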
NEXT TOPIC IS
NON-PARAMETRIC
STATISTICS