MATH 101-Week 7-8 - Lesson 4.1 Correlation & Regression Analysis

This document outlines the objectives and methodologies for using correlation and linear regression in statistics to analyze data and make predictions. It explains key concepts such as independent and dependent variables, correlation coefficients, and hypothesis testing, including the formulation of null and alternative hypotheses. The document also provides examples and procedures for conducting statistical analyses, emphasizing the importance of these methods in decision-making.

Uploaded by

Kasten Estolas
Copyright
© All Rights Reserved

OBJECTIVES:

At the end of this lesson, you must be able to:


1. Use the method of correlation and linear
regression to predict the value of a variable given
certain conditions.
2. Recognize the importance of correlation
analyses in making decisions.
INTRODUCTION
Statistics is a branch of Mathematics that examines and
investigates ways to process and analyze the data
gathered.
It provides procedures for data collection, presentation,
organization, and interpretation that yield meaningful
information useful to decision-makers.
INTRODUCTION
• Collection of data is the process of gathering
relevant information from the population.
• Organization of data is the systematic arrangement
of data into tables, graphs, or charts so that logical
and statistical conclusions can easily be derived
from the collected information.
INTRODUCTION
• Analysis of data refers to the process of deducing
relevant information from the given data so that
the numerical description can be formulated.
• Interpretation of data is all about deriving
conclusion from the data that have been analyzed.
It also involves making predictions and forecasts
about large groups based on gathered data from
small groups.
INTRODUCTION
Two Fields of Statistics
1. Descriptive Statistics consists of the collection,
organization, summarization, and presentation of data.
Here, the statistician tries to describe a given situation, that is,
to tell something about a particular group of observations.
2. Inferential Statistics is the logical process of moving from
sample analysis to a generalized conclusion.
Here, the statistician tries to make inferences from samples
to populations. This area also makes use of the concept of
probability.
IMPORTANT TERMS
Population (N) - consists of all the members of the
group about which conclusions are to be drawn.
Sample (n) - a portion, or part, of the population of
interest selected for analysis.
IMPORTANT TERMS
Parameter
Numerical index describing a characteristic of a
population.
Statistic
Numerical index describing a characteristic of a
sample.
IMPORTANT TERMS
Constant
Characteristic of objects, people, or events that does not
vary; it takes only one value.
Example: Boiling temperature in degrees centigrade
Variable
Characteristics of objects, people, or events that can
take on different values.
Example: Weight
Types of Variables
CORRELATION ANALYSIS
Independent Variable (x) - The variable used as the
basis of prediction; it usually goes on the x-axis.
Dependent Variable (y) -The dependent variable
(sometimes known as the responding variable) is what is
being studied and measured. The dependent variable
always goes on the y-axis.
Example : Hours Studied Vs. Score on Exam
Dependent Variable: Score (Effect)
Independent Variable: Hours Studied (Cause)
CORRELATION ANALYSIS
Correlation Analysis is a method of statistical
evaluation used to study the strength of a relationship
between two numerically measured, continuous
variables (e.g. height and weight).
If correlation is found between two variables it means
that when there is a systematic change in one variable,
there is also a systematic change in the other; the
variables alter together over a certain period of time.
CORRELATION ANALYSIS
If there is correlation found, depending upon the numerical
values measured, this can be either positive or negative.
Positive correlation exists if one variable
increases/decreases simultaneously with the other, i.e. the
high numerical values of one variable relate to the high
numerical values of the other.
Negative correlation exists if one variable decreases when
the other increases, i.e. the high numerical values of one
variable relate to the low numerical values of the other.
CORRELATION ANALYSIS
Two variables are positively correlated if the values of
the two variables both increase or both decrease.
Two variables are negatively correlated if the values of
one variable increase while the values of the other
decrease.
Two variables are not correlated, or they have zero
correlation, if one variable neither increases nor
decreases while the other increases.
SCATTER PLOT/SCATTER DIAGRAM
A scatter plot is drawn so we can analyze if the two
variables are related somehow. If there is correlation
found, depending upon the numerical values
measured, this can be either positive or negative.
A scatter plot is a graph of ordered pairs (x, y)
consisting of data from two data sets.
The Correlation Coefficient (r)
The correlation coefficient (r) is a number that describes
how strong the relationship between two data sets is.
Correlation coefficients range from -1 (perfect negative
correlation) to 1 (perfect positive correlation). A
correlation coefficient close to zero indicates that the data
sets are most likely not linearly correlated (See figure 1).
Pearson Product-Moment Correlation Formula (Pearson’s r)
$$ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} $$
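The formula for r can be checked numerically. The sketch below is a minimal Python version of the computational formula; the function name `pearson_r` and the five data pairs are made up for illustration and are not the Algebra and Geometry scores used in this lesson.

```python
import math

def pearson_r(x, y):
    """Pearson's r from the computational (raw-score) formula."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    sum_y2 = sum(yi * yi for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical data (not from the lesson): hours studied vs. exam score
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
print(round(pearson_r(hours, scores), 4))  # 0.7746
```

Here n = 5, Σx = 15, Σy = 20, Σxy = 66, Σx² = 55 and Σy² = 86, so r = 30 / √(50 × 30) ≈ 0.77.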
CORRELATION ANALYSIS

The table below gives the interpretation of the various
degrees of linear correlation (Blay, 2013).

Value of r            Interpretation
±0.80 to ±0.99        high correlation
±0.60 to ±0.79        moderately high correlation
±0.40 to ±0.59        moderate correlation
±0.20 to ±0.39        low correlation
±0.01 to ±0.19        negligible correlation
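The interpretation table can be written as a small lookup. This is only a sketch: the function name `interpret_r` is hypothetical, and the labels for r = ±1 and r = 0 are assumptions taken from the definitions of perfect and zero correlation, since the table itself stops at ±0.99 and ±0.01.

```python
def interpret_r(r):
    """Map a correlation coefficient to the verbal labels in the table."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    size = round(abs(r), 2)  # the table is stated to two decimal places
    if size == 1.0:
        return "perfect correlation"   # assumption: not a row of the table
    if size >= 0.80:
        return "high correlation"
    if size >= 0.60:
        return "moderately high correlation"
    if size >= 0.40:
        return "moderate correlation"
    if size >= 0.20:
        return "low correlation"
    if size >= 0.01:
        return "negligible correlation"
    return "no correlation"            # assumption: r rounds to 0.00

print(interpret_r(0.81))   # high correlation
print(interpret_r(-0.45))  # moderate correlation
```

The sign of r is ignored on purpose: it gives the direction of the relationship (positive or negative), while the table describes only its strength.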
Example 1
Is there a significant relationship between the two sets of
test scores in Algebra and Geometry of ten students?
Draw a scatter plot. Find the correlation coefficient (r) for
the data and discuss what you think it indicates.
Example 1: Solution

Interpretation: There is a positive correlation between the scores in
Algebra and the scores in Geometry; hence, when scores in Algebra
increase (or decrease), scores in Geometry tend to increase (or decrease) as well.
Linear Regression/Regression Analysis
Regression analysis is a statistical tool used to show how
two or more variables are related to each other. If two
variables are observed to be related, it is helpful if we
can produce an equation to model the relationship.

If this relationship follows a linear pattern, the model is
a linear equation, or in statistics, a linear regression
equation.
Linear Regression/Regression Analysis
Three (3) Major Uses of Regression analysis are:
1. determining the strength of predictors,
2. forecasting an effect, and
3. trend forecasting.
How to Find the Regression Equation
The simplest form of the regression equation, with one independent
variable and one dependent variable, is defined by the formula

y = a + bx

where: x – score on the independent variable (predictor)
y – estimated score on the dependent variable (criterion measure)
b – regression coefficient (slope)
a – constant (y-intercept)
Example:
Find the equation of the regression line for the data
in Example 1.
Solution:
We already calculated the values needed for each
formula when we found the correlation coefficient in
Example 1.
Substitute into the first formula to find the value of
the slope.
Regression Equation
Predicted value of y: y = a + bx

Slope (b): $$ b = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} $$
Regression Analysis
Substitute into the second formula, $a = \frac{\sum y - b\sum x}{n}$, to find
the value of a (y-intercept) when b = 0.80.

Substituting the values of a and b, the regression equation is

y = 3.64 + 0.80x
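As a rough illustration, the slope and intercept can be computed with the usual least-squares formulas, b = [nΣxy − (Σx)(Σy)] / [nΣx² − (Σx)²] and a = (Σy − bΣx) / n. These are assumed here to be the "first" and "second" formulas of this lesson. The function names and the small data set are hypothetical; the last line simply evaluates the lesson's fitted equation for an invented Algebra score of 10.

```python
def fit_line(x, y):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
    a = (sum_y - b * sum_x) / n                                   # y-intercept
    return a, b

def predict(a, b, x):
    """Predicted value of y for a given x."""
    return a + b * x

# Hypothetical data (not the lesson's scores)
a, b = fit_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(a, 2), round(b, 2))  # 2.2 0.6

# The lesson's equation y = 3.64 + 0.80x, for a made-up Algebra score of 10
print(round(predict(3.64, 0.80, 10), 2))  # 11.64
```

A prediction like this is only trustworthy for x values within the range of the data used to fit the line.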
Objectives:
At the end of this lesson, you must be able to:
1. formulate the null and alternative
hypotheses.
2. differentiate between the null hypothesis and
the alternative hypothesis.
3. perform the step-by-step procedure for
hypothesis testing.
Hypothesis Testing
Hypothesis testing is a statistical method used in making
statistical decisions from experimental data.

A hypothesis is basically an assumption that we make about a
population parameter.
Hypothesis Testing
There are two (2) types of statistical hypothesis:
a. Null Hypothesis, symbolized by H0, is a statistical
hypothesis that assumes that the
observation is due to a chance factor.

b. Alternative Hypothesis, symbolized by Ha,
states that there is a difference between two
population means (or parameters).
Two (2) types of hypothesis
A null hypothesis (Ho) is a hypothesis that says there
is no statistically significant difference or relationship
between the two variables. It is the one which the
researcher usually hopes to reject.

Example: There is no significant relationship between
the test scores in Algebra and Geometry.
Two (2) types of hypothesis
An alternative hypothesis (Ha) is one that states
there is a statistically significant relationship between
two variables. It challenges Ho and shows a
significant difference/relationship.

Example: There is a significant relationship between
the test scores in Algebra and Geometry.
Why do we need to test a hypothesis?
Hypothesis testing is an essential procedure in
statistics.

A hypothesis test evaluates two mutually exclusive
statements about a population to determine which
statement is best supported by the sample data. It is what
stands behind the claim that a finding is statistically significant.
Level of Significance
The level of significance is the threshold at which we
decide whether or not to reject the null
hypothesis.

The level of significance is the maximum probability of
committing a Type I error, that is, of rejecting a null
hypothesis that is actually true.

That is, P (Type I error) = α.


Level of Significance
The critical or rejection region is the range of
values of the test value that indicates that there is a
significant difference and that the null hypothesis
(H0) should be rejected.

The noncritical or nonrejection region is the range of
values of the test value that indicates that the
difference was probably due to chance and that the
null hypothesis (H0) should not be rejected.
One Tailed versus Two Tailed
In a one-tailed test, Ho should be rejected
when the test value is in the critical region on one side of
the mean.

In a two-tailed test, Ho should be rejected when
the test value is in either of the two critical regions.
Procedure in Testing a Hypothesis (t-test for correlation)
Step 1. Formulate the hypotheses. (Null and Alternative)
Ho: There is no significant relationship between the
scores in Algebra and Geometry.
Ha: There is a significant relationship between the
scores in Algebra and Geometry.

Step 2: Calculate the value of correlation coefficient, r.

Step 3. Set the level of significance (α = 0.05)


Procedure in Testing a Hypothesis (t-test for correlation)
Step 4. Calculate the value of t computed using the formula
below:
$$ t = r\sqrt{\frac{n-2}{1-r^2}} $$
Procedure in Testing a Hypothesis (t-test for correlation)
Step 5. Statistical decision (reject or do not reject)
Calculate the degrees of freedom to find the value of
t critical in the t-table of values: df = n − 2
The degrees of freedom (df) give the number of pieces of
independent information available for computing variability.
• df is calculated only from samples.
NOTE: If t computed ≤ t critical, do not reject H0.
If t computed > t critical, reject H0.

Step 6: Draw conclusions


Example: Testing a Hypothesis
Let us test the hypothesis for Example 1 in Lesson 4.1. Is
there a significant relationship between the two sets of test
scores in Algebra and Geometry of ten students? Find the
correlation coefficient for the data and discuss what you
think it indicates.

For this problem r = 0.81; use this coefficient in testing the
hypothesis.
Example: Testing a Hypothesis
Step 1. State the Null and alternative hypotheses.
Ho: There is no significant relationship between the
scores in Algebra and Geometry.
Ha: There is a significant relationship between the
scores in Algebra and Geometry.

Step 2. Calculate the correlation coefficient ( r ).

r = 0.81
Example: Testing a Hypothesis
Step 3. Level of significance, α = 0.05 (the value fixed for this test)

Step 4. Calculate the value of t computed. With r = 0.81 and n = 10:
$$ t = 0.81\sqrt{\frac{10-2}{1-0.81^2}} \approx 3.906 $$


Example: Testing a Hypothesis
Step 5. Statistical Decision
From the t-table of values, at the 0.05 level of significance
with df = 10 − 2 = 8, t critical = 2.306.
Since t computed = 3.906 > t critical = 2.306,
Decision: Reject Ho and accept Ha.

Step 6. Conclusion
We can conclude that there is a significant, high positive
correlation between Algebra and Geometry scores.
Hence, when the scores in Algebra increase (or decrease),
the scores in Geometry tend to increase (or decrease) as well.
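The computation in Steps 4 and 5 can be sketched in Python as a check. The sketch assumes the usual statistic t = r√((n − 2) / (1 − r²)) with df = n − 2, which gives about 3.907 for r = 0.81 and n = 10, in line with the lesson's 3.906. The function names are hypothetical, and the critical value is typed in by hand from a standard t-table (two-tailed, α = 0.05, df = 8) rather than computed.

```python
import math

def t_for_correlation(r, n):
    """t statistic for testing H0: no linear relationship (df = n - 2)."""
    if n < 3 or abs(r) >= 1:
        raise ValueError("need n >= 3 and -1 < r < 1")
    return r * math.sqrt((n - 2) / (1 - r ** 2))

def decide(t_computed, t_critical):
    """Two-tailed decision: compare |t computed| with the tabled value."""
    return "reject H0" if abs(t_computed) > t_critical else "do not reject H0"

# Values from the lesson's example: r = 0.81, n = 10
t = t_for_correlation(0.81, 10)
df = 10 - 2
t_critical = 2.306  # read from a t-table: two-tailed, alpha = 0.05, df = 8
print(round(t, 2), df, decide(t, t_critical))  # 3.91 8 reject H0
```

The absolute value in `decide` is a design choice for the two-tailed case, so that a strong negative correlation also leads to rejecting H0.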
t-critical values
References

Prepared by:

Gracia T. Canlas, LPT, MAED


Instructor – MATH 101
Thank you for listening!
