
STATISTICS AND PROBABILITY

MODULE 20
INFERENTIAL
STATISTICS
(PARAMETRIC TEST)
WHAT IS INFERENTIAL STATISTICS?
In inferential statistics, the researcher tries to arrive at a conclusion. An inference is an educated guess or a meaningful prediction based on findings and conclusions.

We use inferential statistics to infer from sample data what the population might look like. It is also used in hypothesis testing, determining relationships, and making predictions.

We use inferential statistics to draw conclusions about the probability that an observed difference between two groups is real. In other words, we use inferential statistics to generalize from our collected data to more general conditions.
t-TEST FOR TWO
INDEPENDENT
SAMPLES

What is the t-test for independent samples?

The t-test is a test of the difference between two independent groups. It compares two means, say x̄1 against x̄2, and also makes use of the standard deviations.

Here, the samples should be drawn from two normal populations, and the sizes should be large enough that the distribution of sample means is approximately normal.

Note that if the sample sizes are small and the data are markedly non-normal, or there are outliers, the nonparametric alternatives are the Wilcoxon test and the Mann-Whitney U test.

This test was introduced by William S. Gosset under the pen name “Student”; hence the t-test is also called the “Student t-test”. It is the test most commonly used by researchers.

According to Broto (2007), the t-test for independent samples is used when we compare the means of two independent groups and the distribution is normal, with Sk = 0 and Ku = 0.265. The test is used when the data are interval or ratio with a sample size of less than 30.

However, as n becomes larger, the t-distribution approaches the z-distribution. Thus the t-test can be used not only when n < 30 but also when n is large (n ≥ 30) and the population standard deviation is not known.

In our previous lesson, the divisor n – 1 in the formulas for the variance and the standard deviation is what we called the degrees of freedom. The degrees of freedom is the number of values that are free to vary.

When performing a t-test on two independent samples (groups), we are testing the null hypothesis

“Ho: The two group means are equal”

against the alternative hypothesis

“Ha: The two group means are not equal, or the mean of one group is higher (or lower) than the mean of the other group.”
Formula for t-test (Two Independent Samples)

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{SS_1 + SS_2}{n_1 + n_2 - 2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

Where:

t = t-test
x̄1 = Σx1/n1, the mean of group 1
x̄2 = Σx2/n2, the mean of group 2
SS1 = Σx1² − (Σx1)²/n1, the sum of squares of group 1
SS2 = Σx2² − (Σx2)²/n2, the sum of squares of group 2
n1 = the number of observations in group 1
n2 = the number of observations in group 2
n1 + n2 − 2 = degrees of freedom (df)
Another Formula for t-test (Two Independent Samples)

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}, \qquad \text{where the pooled variance is } s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

Note that (1) the formula holds if the variances in the populations are equal and the populations are normal, and (2) the null hypothesis is “the means are equal”. The null is supported when t is close to zero.
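To make the computation concrete, here is a minimal Python sketch of the SS-based formula above (the function name t_independent is chosen here for illustration; it is not part of the module):

```python
import math

def t_independent(x1, x2):
    """Pooled two-sample t statistic and df, following the SS-based
    formula above (assumes equal population variances)."""
    n1, n2 = len(x1), len(x2)
    mean1, mean2 = sum(x1) / n1, sum(x2) / n2
    ss1 = sum(v * v for v in x1) - sum(x1) ** 2 / n1   # SS1
    ss2 = sum(v * v for v in x2) - sum(x2) ** 2 / n2   # SS2
    pooled = (ss1 + ss2) / (n1 + n2 - 2)               # pooled variance
    t = (mean1 - mean2) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2                              # t and df
```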
EXAMPLE:
The following are the scores of 10 male and 10 female third-year Computer Science students of Prof. Alyssa Jandy Angulo in the preliminary examination in Statistics and Probability.
Scores of Male and Female Third Year Computer Science students
in the Preliminary Examination in Statistics and Probability

Male Female
28 24
36 18
34 22
32 10
8 20
24 6
24 14
20 4
18 12
34 26

Prof. Angulo wants to determine whether there is a significant difference between the performance of the male and female students in the preliminary examination. She uses the t-test for two independent groups at the 0.05 level of significance.

Problem: Is there a significant difference between the performance of the male and female third-year Computer Science students in the preliminary examination in Statistics and Probability?

Hypotheses:

Ho: There is no significant difference between the performance of the male and female third-year Computer Science students in the preliminary examination in Statistics and Probability.

Ha: There is a significant difference between the performance of the male and female third-year Computer Science students in the preliminary examination in Statistics and Probability.

Level of Significance:

Based on the problem:

α = 0.05
df = n1 + n2 − 2 = 10 + 10 − 2 = 18
t(0.05) = 2.101 (see the table of critical values of t)
Computation for t-test:
Plug into the formula:

Male (x1)   x1²    Female (x2)   x2²
28          784    24            576
36          1296   18            324
34          1156   22            484
32          1024   10            100
8           64     20            400
24          576    6             36
24          576    14            196
20          400    4             16
18          324    12            144
34          1156   26            676
Σx1 = 258   Σx1² = 7356   Σx2 = 156   Σx2² = 2952

Solving for SS1 and SS2:

SS1 = Σx1² − (Σx1)²/n1 = 7356 − (258)²/10 = 7356 − 6656.4 = 699.6
SS2 = Σx2² − (Σx2)²/n2 = 2952 − (156)²/10 = 2952 − 2433.6 = 518.4

With x̄1 = 258/10 = 25.8 and x̄2 = 156/10 = 15.6:

t = (25.8 − 15.6) / √[((699.6 + 518.4)/18)(1/10 + 1/10)] = 10.2/√13.53 = 2.77
Decision Rule:

Reject the Ho if |t-computed| ≥ |t-critical|; otherwise accept the Ho.

Conclusion:

Since the t-computed value of 2.77 is greater than the t-critical value of 2.101 at the 0.05 level of significance with 18 degrees of freedom, the null hypothesis is rejected in favor of the research hypothesis. This means that there is a significant difference between the performance of the male and female third-year Computer Science students in the preliminary examination in Statistics and Probability.
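The hand computation can be checked with a library routine; this sketch assumes scipy is installed (equal_var=True selects the pooled Student t-test used here):

```python
from scipy import stats

male   = [28, 36, 34, 32, 8, 24, 24, 20, 18, 34]
female = [24, 18, 22, 10, 20, 6, 14, 4, 12, 26]

t, p = stats.ttest_ind(male, female, equal_var=True)
print(round(t, 2), p < 0.05)   # 2.77 True: reject Ho at the 0.05 level
```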
t-TEST FOR
CORRELATED SAMPLES
t-test for Correlated Samples
Another parametric test is the t-test for correlated samples. This test is applied to only one group of samples. It can be used in the evaluation of a certain program or treatment. Since the t-test for correlated samples is another parametric test, conditions must be met: the distribution should be normal, and the data should be interval or ratio.

Related groups mean the subjects are measured at two different times, under two different conditions, or are matched on a specific criterion.

Two different times: e.g., a pretest–posttest design, or before and after a treatment/intervention

Two different conditions: e.g., testing the effect of fertilizers A and B on mango yields

Matched on a specific criterion: e.g., a case-control study of 75-year-old males with TB
Diagram:

Pre-test → Intervention → Post-test

(compare the pre-test and post-test means to see if there is a significant difference)
The t-test for correlated samples is used to find out if a difference exists between the before and after means. If there is a difference in favor of the posttest, then the intervention, treatment, or method is effective. However, if there is no significant difference, then the treatment or method is not effective.
Formula for t-test (Correlated Samples)

$$t = \frac{\bar{D}}{\sqrt{\dfrac{\sum D^2 - \dfrac{(\sum D)^2}{n}}{n(n-1)}}}$$

Where:

t = t-test for correlated samples
D̄ = ΣD/n, the mean difference between the pretest and the posttest
ΣD² = the sum of the squares of the differences between the pretest and the posttest
ΣD = the sum of the differences between the pretest and the posttest
n = the sample size
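A minimal Python sketch of this formula, taking differences as pretest minus posttest (t_correlated is a name chosen here for illustration):

```python
import math

def t_correlated(pre, post):
    """Paired t statistic and df from difference scores D = pre - post,
    following the formula above."""
    n = len(pre)
    d = [a - b for a, b in zip(pre, post)]
    sum_d, sum_d2 = sum(d), sum(v * v for v in d)
    d_bar = sum_d / n                                  # mean difference
    se = math.sqrt((sum_d2 - sum_d ** 2 / n) / (n * (n - 1)))
    return d_bar / se, n - 1                           # t and df
```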
Example:
During the first day of class, a professor gave a 50-item pretest to his fifteen students in Statistics and Probability before the formal lessons of the subject. After a semester, he gave a posttest to the same fifteen students using the same set of questions he had given in the pretest. He wants to determine if there is a significant difference between the pretest and the posttest. The following are the results of the experiment. The professor uses the α = 0.01 level of significance.

Student   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15
Pretest   15  12  20  10  8   27  29  13  19  22  25  14  28  18  16
Posttest  20  18  25  25  20  35  43  28  29  37  46  27  33  37  28
Solution:

Problem: Is there a significant difference between the pretest and posttest of the fifteen students in Statistics and Probability under the teaching method used by the professor?

Hypotheses:

Ho: There is no significant difference between the pretest and posttest of the fifteen students in Statistics and Probability based on the teaching method used by the professor.

Ha: There is a significant difference between the pretest and posttest of the fifteen students in Statistics and Probability based on the teaching method used by the professor.

Level of Significance:

Based on the problem:
α = 0.01
df = n − 1 = 15 − 1 = 14
t(0.01) = 2.977 (see the table of critical values of t)
Compute the t-value using the t-test for correlated samples:

Pretest  Posttest  D     D²
15       20        -5    25
12       18        -6    36
20       25        -5    25
10       25        -15   225
8        20        -12   144
27       35        -8    64
29       43        -14   196
13       28        -15   225
19       29        -10   100
22       37        -15   225
25       46        -21   441
14       27        -13   169
28       33        -5    25
18       37        -19   361
16       28        -12   144

ΣD = -175, ΣD² = 2405, D̄ = ΣD/n = -175/15 = -11.67

Plug into the formula:

t = -11.67 / √[(2405 − (−175)²/15) / (15 × 14)] = -11.67/√1.73 = -8.87

Decision Rule:

If the t-computed value is beyond the critical value in either direction, reject the Ho; otherwise accept the Ho.

Conclusion:

The t-computed value of -8.87 is beyond the t-critical value of -2.977 at the 0.01 level of significance with 14 degrees of freedom. The null hypothesis is therefore rejected in favor of the research hypothesis. This means that the posttest result is higher than the pretest result. It implies that the professor's method of teaching is effective.
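As a check, scipy's paired t-test (which also takes differences as the first sample minus the second) reproduces the result; this assumes scipy is available:

```python
from scipy import stats

pre  = [15, 12, 20, 10, 8, 27, 29, 13, 19, 22, 25, 14, 28, 18, 16]
post = [20, 18, 25, 25, 20, 35, 43, 28, 29, 37, 46, 27, 33, 37, 28]

t, p = stats.ttest_rel(pre, post)
print(round(t, 2), p < 0.01)   # -8.87 True: reject Ho at the 0.01 level
```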
z-TEST
z-test

The z-test is another test under parametric statistics which requires normality of the distribution. It is often applied to large samples (n > 30) when the population standard deviation (σ) is known. It is used to compare two means: the sample mean and the perceived population mean.

It is also used to compare two sample means taken from the same population. The z-test can be applied in two ways: (1) the One-Sample Mean test and (2) the Two-Sample Mean test.

The tabular values of the z-test at the 0.01 and 0.05 levels of significance are shown below:

Test        | α = 0.01 | α = 0.05
One-tailed  | 2.33     | 1.645
Two-tailed  | 2.575    | 1.960
z-TEST FOR
ONE SAMPLE GROUP
z-test for One Sample Group

The z-test for one sample group is used to compare the perceived population mean μ against the sample mean x̄. Using this test, we can determine whether the mean of a group differs from a specified value.

This procedure is based on the normal distribution, so for small samples it works best if the data were drawn from a normal distribution or one that is close to normal. Usually, we can consider samples of size 30 or higher to be large samples. If the population standard deviation is not known, the sample standard deviation can be used as a substitute.
The null hypothesis used for this test is as follows:

Ho: μ = c (The population mean equals the hypothesized mean)

The alternative hypotheses that we could choose from are as follows:

Ha: μ ≠ c (The population mean differs from the hypothesized mean)
Ha: μ > c (The population mean is greater than the hypothesized mean)
Ha: μ < c (The population mean is less than the hypothesized mean)
Formula for z-test (One Sample Group)

$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

where:

z = the z-test for one sample group
x̄ = sample mean
μ = hypothesized value of the population mean
σ = population standard deviation
n = sample size
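A minimal sketch of this formula in Python, applied to the example on the next slide (z_one_sample is a name chosen here for illustration):

```python
import math

def z_one_sample(x_bar, mu, sigma, n):
    """One-sample z statistic: (sample mean - hypothesized mean)
    divided by the standard error sigma / sqrt(n)."""
    return (x_bar - mu) / (sigma / math.sqrt(n))

# Example from the next slide: x_bar = 80, mu = 86, sigma = 76, n = 40
print(round(z_one_sample(80, 86, 76, 40), 2))   # -0.5
```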
Example:

A Mathematics professor claims that the average performance of his students is at least 86%. To verify his claim, he gave an examination to his 40 students. After the exam, he got a mean grade of 80%. With a standard deviation of 76%, is the professor's claim true? Use the z-test (one-tailed) at the 0.05 level of significance.
Solution:

Problem: Is the professor's claim true that the average performance of his students is at least 86%?

Hypotheses:

Ho: The average performance of the students is 86%.

Ha: The average performance of the students is less than 86%.

Level of Significance:

α = 0.05
z(0.05) = -1.645 (one-tailed)

Statistics: z-test for a one-tailed test

Here: x̄ = 80, μ = 86, σ = 76, n = 40

Plug into the formula:

z = (80 − 86) / (76/√40) = -6/12.02 = -0.50
Decision Rule:

If the z-computed value is beyond the z-critical value, reject the Ho.

Conclusion:

Since the z-computed value of -0.50 is not beyond the critical value of -1.645 at the 0.05 level of significance, the null hypothesis that the average performance of the students is 86% is accepted.
z-TEST FOR
TWO SAMPLE MEANS
z-test for Two Sample Means

The z-test for two sample means is another parametric test. It is used to compare the means of two independent groups from which the samples were drawn. The samples must be drawn from normally distributed populations.

When do we use the z-test for two sample means?

We use this z-test to find out if there is a significant difference between two populations by comparing only the sample means of the populations.

The null hypothesis used for this test is as follows:

Ho: μ1 = μ2 (The population mean 1 equals the population mean 2)

The alternative hypothesis is as follows:

Ha: μ1 ≠ μ2 (The population mean 1 is not equal to the population mean 2)
What is the similarity and difference between the t-test and the z-test?

Z-tests and t-tests are statistical methods involving data analysis that have applications in business, science, and many other disciplines. Let's explore some of their differences and similarities, as well as situations where one of these methods should be used over the other.

Z-tests are statistical calculations that can be used to compare population means to a sample's. The z-score tells you how far, in standard deviations, a data point is from the mean or average of a data set. A z-test compares a sample to a defined population and is typically used for dealing with problems relating to large samples (n > 30). Z-tests can also be helpful when we want to test a hypothesis. Generally, they are most useful when the standard deviation is known.

Like z-tests, t-tests are calculations used to test a hypothesis, but they are most useful when we need to determine if there is a statistically significant difference between two independent sample groups. In other words, a t-test asks whether a difference between the means of two groups is unlikely to have occurred because of random chance. Usually, t-tests are most appropriate when dealing with problems with a limited sample size (n < 30).
Source: http://study.com/academy/lesson/z-test-t-test-similarities-differences.html
Source: http://www.statisticshowto.com/when-to-use-a-t-score-vs-z-score/
FORMULA FOR z-test

The formula for the z-test for two sample means is:

$$z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

Where:

x̄1 = the mean of sample 1
x̄2 = the mean of sample 2
s1² = the variance of sample 1
s2² = the variance of sample 2
n1 = size of sample 1
n2 = size of sample 2
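A minimal Python sketch of this formula, using the entrance-exam example that follows (z_two_sample is a name chosen here for illustration):

```python
import math

def z_two_sample(x1_bar, x2_bar, var1, var2, n1, n2):
    """Two-sample z statistic from summary statistics, following the
    formula above."""
    return (x1_bar - x2_bar) / math.sqrt(var1 / n1 + var2 / n2)

# Means 89 and 83, variances 45 and 40, n = 100 per department
print(round(z_two_sample(89, 83, 45, 40, 100, 100), 2))   # 6.51
```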
As a reminder, the tabular values of the z-test at the 0.01 and 0.05 levels of significance are:

Test        | α = 0.01 | α = 0.05
One-tailed  | 2.33     | 1.645
Two-tailed  | 2.575    | 1.960
Example:
An entrance examination was administered to incoming freshmen in the department of Information Technology and the department of Computer Science, with 100 students randomly selected from each department. The mean scores of the given samples were x̄1 = 89 and x̄2 = 83, and the variances of the test scores were 45 and 40, respectively. Is there a significant difference between the two groups? Use the 0.01 level of significance, two-tailed test.

Problem: Is there any significant difference between the two groups?

Hypotheses:

Ho: There is no significant difference between the two groups.

Ha: There is a significant difference between the two groups.

Level of Significance:

α = 0.01
z(0.01) = ±2.575 (two-tailed)
Statistics: z-test for a two-tailed test

Here: x̄1 = 89, x̄2 = 83, s1² = 45, s2² = 40, n1 = 100, n2 = 100

Plug into the formula:

z = (89 − 83) / √(45/100 + 40/100) = 6/√0.85 = 6.51

Decision Rule:

If the z-computed value is beyond the z-critical value, reject the Ho.

Conclusion:

Since the z-computed value of 6.51 is beyond the z-critical value of 2.575 at the 0.01 level of significance, we reject the null hypothesis and accept the alternative hypothesis that there is a significant difference between the two groups.
 
F-TEST
F-test

The F-test is another parametric test, and it is also called the Analysis of Variance (ANOVA).

Ronald A. Fisher developed the F-test. The test is used when the variances of two populations are equal, the distribution is normal, and the level of measurement is interval or ratio.

The test can be two-tailed or one-tailed. The two-tailed test is used when testing whether the variances are not equal, while the one-tailed test is used in one direction, that is, whether the variance of the first population is either greater than or less than the second population variance, but not both.

There are three kinds of analysis of variance: (a) one-way analysis of variance, (b) two-way analysis of variance, and (c) three-way analysis of variance.

Here, we focus mainly on the ONE-WAY ANALYSIS OF VARIANCE.


ONE-WAY
ANALYSIS OF
VARIANCE
The one-way Analysis of Variance (ANOVA) can be used for the case of a quantitative outcome with a categorical explanatory variable that has two or more levels of treatment. The term one-way, also called one-factor, indicates that there is a single explanatory variable (“treatment”) with two or more levels, and only one level of treatment is applied at any time for a given subject.

The nonparametric equivalents of one-way ANOVA use ranks and medians instead of means and standard deviations. Three nonparametric alternatives are the Kruskal-Wallis test, the Jonckheere-Terpstra test, and the median test.

As stated before, the one-way ANOVA is also called the F-test and is used to test the null hypothesis that the means of the independent groups are equal.

The F-test can be used for two or more independent groups, though there is a misconception that it can only be used for more than two groups.
When performing a one-way ANOVA, we are testing the null hypothesis

Ho: All group means are equal

against the alternative hypothesis

Ha: Not all the group means are equal.

The ANOVA table has five columns: the source of variation, the degrees of freedom, the sum of squares, the mean squares, and the F-value (both computed and tabular).

The sources of variation are between the groups, within the groups, and the total variation.

The degrees of freedom for the total is the total number of observations minus 1. The degrees of freedom for the between-groups source is the number of groups minus 1. The degrees of freedom for the within-groups source is the total degrees of freedom (df) minus the between-groups degrees of freedom (df).
Source of Variation | df    | SS  | MS                | F-computed | F-tabular
Between Groups      | k − 1 | BSS | MSB = BSS/(k − 1) | MSB/MSW    | See the table at 0.05 (or the desired level of significance) with the between and within df
Within Groups       | N − k | WSS | MSW = WSS/(N − k) |            |
Total               | N − 1 | TSS |                   |            |
Based on the table:

1. k is the number of columns (groups).
2. N is the total number of observations.
3. BSS is the between sum of squares: BSS = Σ(Tj²/nj) − CF, where Tj is the total of group j and CF = (Σx)²/N is the correction factor.
4. TSS is the total sum of squares: TSS = Σx² − CF.
5. WSS is the within sum of squares, the difference TSS − BSS.
6. MSB, the mean square between, is equal to BSS/df(between).
7. MSW, the mean square within, is equal to WSS/df(within).
8. The F-computed value is MSB/MSW.

The F-computed value must be compared with the F-tabular value at a given level of significance with the corresponding df's of BSS and WSS.

If the F-computed value is greater than the F-tabular value, we reject the null hypothesis in favor of the research hypothesis; that is, when the F-computed value is beyond the F-tabular value, the alternative is accepted, which means that there is a significant difference between or among the means of the different groups.
Example:
A computer store sells three different brands of cellular phone. The manager of the store wants to determine if there is a significant difference in the average sales of the three brands of cellular phone over a five-day selling period. The following data are recorded:

DAY    A(x1)   B(x2)   C(x3)   (x1)²   (x2)²   (x3)²
1      4       8       3       16      64      9
2      6       3       5       36      9       25
3      2       6       3       4       36      9
4      5       4       6       25      16      36
5      2       7       4       4       49      16
Total  19      28      21      85      174     95
       n1 = 5  n2 = 5  n3 = 5

Perform the analysis of variance and test the hypothesis at the 0.05 level of significance that the average sales of the three brands of cellular phone are equal.
Problem: Is there any significant difference in the average sales of the three brands of cellular phone?

Hypotheses:
Ho: There is no significant difference in the average sales of the three brands of cellular phone.

Ha: There is a significant difference in the average sales of the three brands of cellular phone.

Level of significance: α = 0.05

Computation:

CF = (Σx)²/N = (68)²/15 = 308.27
TSS = Σx² − CF = 354 − 308.27 = 45.73
BSS = (19² + 28² + 21²)/5 − CF = 317.2 − 308.27 = 8.93
WSS = TSS − BSS = 45.73 − 8.93 = 36.8

Source of Variation | df | SS    | MS    | F-computed | F-tabular
Between Groups      | 2  | 8.93  | 4.465 | 1.45       | 3.89
Within Groups       | 12 | 36.8  | 3.07  |            |
Total               | 14 | 45.73 |       |            |
Decision rule:

If the F-computed value is greater than the F-tabular value, reject the null hypothesis.

Conclusion:

Since the F-computed value of 1.45 is less than the F-tabular value of 3.89 at the 0.05 level of significance with 2 and 12 degrees of freedom, retain the null hypothesis: there is no significant difference in the average sales of the three brands of cellular phone.
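The same analysis can be reproduced with scipy's one-way ANOVA routine (assuming scipy is available); the small difference from 1.45 comes from rounding the mean squares in the hand computation:

```python
from scipy import stats

brand_a = [4, 6, 2, 5, 2]
brand_b = [8, 3, 6, 4, 7]
brand_c = [3, 5, 3, 6, 4]

f, p = stats.f_oneway(brand_a, brand_b, brand_c)
print(round(f, 2), p > 0.05)   # 1.46 True: retain Ho
```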
 
PEARSON PRODUCT
MOMENT COEFFICIENT
OF CORRELATION
What is the Pearson Product Moment Coefficient of Correlation?

The Pearson product-moment correlation, symbolized by r, is a parametric measure of association for two variables, say X and Y. It measures both the strength and the direction of a linear relationship.

If one variable X is an exact increasing linear function of another variable Y, the correlation is 1. This is called a perfect positive correlation.

If the relationship is an exact decreasing linear one, the correlation is -1, which is known as a perfect negative correlation.

If there is no linear predictability between the two variables, the correlation is 0.

In other words, we consider the relationship between two variables X and Y rather than predicting a value of Y.
[Scatter plots: PERFECT POSITIVE CORRELATION, r = 1; PERFECT NEGATIVE CORRELATION, r = -1; POSITIVE CORRELATION; NEGATIVE CORRELATION; NO CORRELATION, r = 0]
Note that we use r to determine the index of relationship between two variables, the independent and the dependent variable.

What is the Degree of Correlation?

1. Correlation is measured on a scale of -1 to +1, where 0 indicates no correlation and either -1 or +1 suggests high correlation. Both -1 and +1 are equally high degrees of correlation.

2. There is no absolute numeric guide that tells when two variables have a low or high degree of correlation; however, values of r close to -1 or +1 suggest a high degree of correlation, values close to 0 suggest no or low correlation, and values around 0.7 to 0.8 suggest moderate correlation.
How do we interpret the computed r?

To interpret the computed correlation coefficient r, here is the range of correlation and its interpretation:

Between ±0.80 to ±0.99   High correlation
Between ±0.60 to ±0.79   Moderately high correlation
Between ±0.40 to ±0.59   Moderate correlation
Between ±0.20 to ±0.39   Low correlation
Between ±0.01 to ±0.19   Negligible correlation
Formula for the Pearson Product Moment Coefficient of Correlation r

The Pearson Product Moment Coefficient of Correlation, denoted by r, is determined using the formula:

$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$

Where:

r = the Pearson Product Moment Coefficient of Correlation
n = sample size
Σxy = the sum of the products of x and y
(Σx)(Σy) = the product of the sum of x and the sum of y
Σx² = sum of squares of x
Σy² = sum of squares of y
Example:
Consider the pre-test and the post-test in Statistics and Probability of the ten students of CS 3201.

Student        A   B   C   D   E   F   G   H   I   J
Pre-test (x)   56  70  60  85  75  87  72  89  75  86
Post-test (y)  65  78  60  90  75  90  79  89  89  95

Solve for and analyze the value of r. Determine if there is a significant relationship between the pre-test and post-test of the ten CS students in Statistics and Probability. Use the α = 0.05 level of significance.

Problem: Is there a significant relationship between the pre-test and post-test of the ten CS students in Statistics and Probability?

Hypotheses:

Ho: There is no significant relationship between the pre-test and post-test of the ten CS students in Statistics and Probability.

Ha: There is a significant relationship between the pre-test and post-test of the ten CS students in Statistics and Probability.

Level of Significance:

α = 0.05
df = n − 2 = 10 − 2 = 8
r(0.05) = 0.6319
Computation (Statistics)

Pearson Product Moment Coefficient of Correlation

x    y    x²     y²     xy
56   65   3136   4225   3640
70   78   4900   6084   5460
60   60   3600   3600   3600
85   90   7225   8100   7650
75   75   5625   5625   5625
87   90   7569   8100   7830
72   79   5184   6241   5688
89   89   7921   7921   7921
75   89   5625   7921   6675
86   95   7396   9025   8170
Σx = 755   Σy = 810   Σx² = 58181   Σy² = 66842   Σxy = 62259
Apply the formula:

r = [10(62259) − (755)(810)] / √{[10(58181) − (755)²][10(66842) − (810)²]}
r = (622590 − 611550) / √[(11785)(12320)]
r = 11040 / 12049.53
r = 0.92

Decision Rule:

If the computed r-value is greater than the r-tabular value, reject the null hypothesis.

Conclusion:

Since the computed r-value of 0.92 is higher than the r-tabular value of 0.6319 at the 0.05 level of significance with 8 degrees of freedom, we reject the null hypothesis. This means that there is a significant relationship between the pre-test and post-test of the ten CS students in Statistics and Probability. It implies that the higher the pre-test, the higher the post-test, and likewise, the lower the pre-test, the lower the post-test.
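As a check on the hand computation, here is a minimal Python sketch, assuming scipy is available; pearson_r is a helper name chosen here, while scipy.stats.pearsonr is the library routine:

```python
from scipy import stats

def pearson_r(x, y):
    """Raw-score formula for r, as used in the computation above."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sx2 = sum(a * a for a in x)
    sy2 = sum(b * b for b in y)
    return (n * sxy - sx * sy) / (
        ((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2)) ** 0.5)

pre  = [56, 70, 60, 85, 75, 87, 72, 89, 75, 86]
post = [65, 78, 60, 90, 75, 90, 79, 89, 89, 95]

print(round(pearson_r(pre, post), 2))   # 0.92
r, p = stats.pearsonr(pre, post)        # library check
print(round(r, 2), p < 0.05)            # 0.92 True: reject Ho
```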
LINEAR
REGRESSION
What is LINEAR REGRESSION?

Linear regression is the most basic and commonly used predictive analysis. Regression estimates are used to describe data and to explain the relationship between one dependent variable and one or more independent variables.

Linear regression analysis consists of more than just fitting a straight line through a cloud of data points. It consists of three stages: (1) analyzing the correlation and directionality of the data, (2) estimating the model, i.e., fitting the line, and (3) evaluating the validity and usefulness of the model.

The three major uses for regression analysis are:

(1) causal analysis,
(2) forecasting an effect, and
(3) trend forecasting.

http://www.statisticssolutions.com/what-is-linear-regression/
How Does Linear Regression Work?

In simple linear regression, we predict scores on one variable from the scores on a second variable. The variable we are predicting is called the criterion variable and is referred to as Y. The variable we are basing our predictions on is called the predictor variable and is referred to as X. When there is only one predictor variable, the prediction method is called simple regression.

Here, the predicted criterion Y, when plotted as a function of X, forms a straight line. Once we know the linear regression equation, we can determine or predict the value of Y in terms of X.
Importance of Linear Regression

Linear regression attempts to model the relationship between two variables by fitting a linear equation to observed data. Like other parametric tests, it must also meet some conditions. First, the data should be normally distributed, with a level of measurement expressed in interval or ratio data.

In regression analysis, we consider two variables. If two variables are correlated, that is, if the correlation coefficient (r) is significant, then it is possible to predict or estimate the value of one variable from knowledge of the other variable, as stated in the previous slide.
LINEAR REGRESSION EQUATION

ŷ = a + bx

Where:

x is the independent or predictor variable
y is the dependent or predicted variable
b is the slope of the line
a is the constant value (the y-intercept)

To determine the linear regression equation, we first need to solve for the values of a and b.
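As a preview, here is a minimal Python sketch of how a and b are obtained from the column sums used in the worked example below (fit_line is a name chosen here for illustration):

```python
def fit_line(xs, ys):
    """Least-squares slope b and intercept a, following the b and a
    formulas used in the worked example below."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b
```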
Example:
Consider the table below:
x  1  2  3  4  5  6  7  8
y  3  1  4  2  5  3  7  5

a. Find the linear regression equation.
b. What is the value of y if x = 12?
c. Plot the points and sketch the trend line.

x    y    xy    x²
1    3    3     1
2    1    2     4
3    4    12    9
4    2    8     16
5    5    25    25
6    3    18    36
7    7    49    49
8    5    40    64
Σx = 36   Σy = 30   Σxy = 157   Σx² = 204
Solving for b:

b = [nΣxy − (Σx)(Σy)] / [nΣx² − (Σx)²] = [8(157) − (36)(30)] / [8(204) − (36)²] = 176/336 = 0.5238 ≈ 0.52

Solving for a:

a = (Σy − bΣx)/n = [30 − 0.5238(36)]/8 = 11.143/8 = 1.3929 ≈ 1.39

So, the linear regression equation is:

ŷ = 1.39 + 0.52x

Next, we solve for y if x = 12:

ŷ = 1.3929 + 0.5238(12) = 7.68

[Graph: scatter plot of the eight data points with the fitted trend line ŷ = 1.39 + 0.52x]
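numpy can verify the fit; polyfit with degree 1 returns the slope and intercept (assuming numpy is available):

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 8])
y = np.array([3, 1, 4, 2, 5, 3, 7, 5])

b, a = np.polyfit(x, y, 1)       # slope first, then intercept
print(round(a, 2), round(b, 2))  # 1.39 0.52
print(round(a + b * 12, 2))      # 7.68, the predicted y at x = 12
```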
NEXT TOPIC IS
NON-PARAMETRIC
STATISTICS
