Correlation and Regression


CORRELATION AND REGRESSION
GROUP 1

INTRODUCTION
Two statistical concepts that are used to examine relationships between variables are correlation and regression.

Correlation analyzes the linear relationship between two variables, both in terms of direction and strength. It is expressed by the correlation coefficient, which ranges from -1 to 1. A positive value indicates a positive correlation, and a negative value a negative one.

In contrast, regression takes it a step further by predicting one variable's value based on the value of another. The relationship is represented by a regression equation. Simple regression involves two variables, while multiple regression deals with more than two.

To put it simply, correlation identifies whether a link exists, whereas regression assists in formulating predictions based on that relationship. Correlation assesses how two variables move together, and regression helps us understand and quantify that relationship. Both are crucial tools for understanding patterns and making predictions in various fields, such as finance, biology, and the social sciences.
OBJECTIVES
• Distinguish and explain the difference between dependent and independent variables
• Draw a scatter plot for a set of ordered pairs
• Calculate and interpret the coefficient of correlation using the Pearson Product-Moment Correlation Coefficient (PMCC)
• Evaluate the significance of the correlation coefficient
• Conduct a hypothesis test on the correlation coefficient
• Conduct a test for Spearman Rank Correlation and explain its application
• Calculate and interpret the linear regression equation
CORRELATION ANALYSIS
• A statistical method used to determine whether a relationship between variables exists
• A method of measuring the strength of the relationship between the two variables
• Determines how closely, or to what degree, the two variables are changing or moving together
• Measures the degree of association between the values of related variables given in the data set
Karl Pearson (1857-1936)
He discovered that relationships between variables could be quantified and expressed as a coefficient for continuous data or as a percentage for binary data. Karl Pearson first used the idea of correlation in 1890.
A RELATIONSHIP MAY INDICATE THE FOLLOWING:

• Degree of Association. English and Math grades of a group of students: a high value of r means that the students who get high grades in English are very likely those who got high grades in Math.
• Cause and Effect. Nutritional status of pupils and their academic performance in Math: it can be interpreted that the nutritional status of pupils may have affected their performance in school.
• Predictive Ability. Entrance test and grades of freshman students: a high degree of relationship could imply that the entrance test could predict the grades of freshman students.
• Reliability of the Test. Teacher-made test: it may be considered reliable if the students perform consistently in the test regardless of the time the test is taken.
SCATTER DIAGRAM
Also known as a scatter plot, X-Y graph, and correlation chart.
WHEN TO USE A SCATTER DIAGRAM
• Paired numerical data
• The dependent variable has multiple values for each value of the independent variable
• To determine whether the two variables are related, such as
⚬ to identify potential root causes of problems
⚬ to determine objectively whether a particular cause and effect are related
⚬ to determine whether two effects that appear to be related both occur with the same cause
⚬ to test for autocorrelation before constructing a control chart
SCATTER DIAGRAM EXAMPLE
Ice cream sales vs. noon temperature of the day
SCATTER DIAGRAM CONSIDERATIONS
• The more tightly the points cluster around a line, the stronger the relationship.
• Use statistics to determine whether the variables are related.
• The data might be stratified.
• The data may not cover a wide enough range.
• Think creatively: use scatter diagrams to discover a root cause.
• Drawing a scatter diagram is the first step in looking for a relationship between variables.
PEARSON PRODUCT-MOMENT CORRELATION COEFFICIENT
Used if:
• the objective of the study is to determine the extent to which the variables are related to each other (relationship)
• the data is parametric

r = (NΣXY - ΣXΣY) / √[(NΣX² - (ΣX)²)(NΣY² - (ΣY)²)]

Where: N = number of data pairs
ΣX = sum of the first data set
ΣY = sum of the second data set
ΣXY = sum of the products of the paired data
ΣX² = sum of the squared X data
ΣY² = sum of the squared Y data
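As a quick sketch, the raw-score formula above translates directly into Python. The data values here are made up purely for illustration:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson r from the raw-score formula:
    r = (N*ΣXY - ΣX*ΣY) / sqrt((N*ΣX² - (ΣX)²) * (N*ΣY² - (ΣY)²))"""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx**2) * (n * syy - sy**2))

# Made-up sanity check: a perfectly linear positive pair gives r = 1
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # → 1.0
```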
To interpret r:

Size of Correlation              Interpretation
0.90 to 1.00 (-0.90 to -1.00)    Very high positive (or negative) correlation
0.70 to 0.90 (-0.70 to -0.90)    High positive (or negative) correlation
0.50 to 0.70 (-0.50 to -0.70)    Moderate positive (or negative) correlation
0.30 to 0.50 (-0.30 to -0.50)    Low positive (or negative) correlation
0.00 to 0.30 (-0.00 to -0.30)    Very low positive (or negative) correlation / negligible correlation
- After identifying the Pearson coefficient r, get its corresponding p-value.
- Compare the p-value to the level of significance (α).

If p-value > α: accept the null hypothesis (H₀)
If p-value < α: reject the null hypothesis (H₀)
Example:
1. A researcher wants to determine the degree of relationship between the students' grades in Algebra and in Statistics. He interviewed 10 students and summarized their grades in the table below. Using α = 0.05, is there a significant relationship between the students' grades in Algebra and in Statistics?

Student   Grade in Algebra (X)   Grade in Statistics (Y)   X²       Y²       XY
1         85                     80                        7,225    6,400    6,800
2         90                     89                        8,100    7,921    8,010
3         87                     84                        7,569    7,056    7,308
4         79                     86                        6,241    7,396    6,794
5         75                     79                        5,625    6,241    5,925
6         80                     86                        6,400    7,396    6,880
7         88                     90                        7,744    8,100    7,920
8         85                     90                        7,225    8,100    7,650
9         86                     87                        7,396    7,569    7,482
10        80                     86                        6,400    7,396    6,880
Total     ∑X = 835               ∑Y = 857                  ∑X² = 69,925   ∑Y² = 73,575   ∑XY = 71,649

r = 0.551 (moderate positive correlation)
n = 10
p-value = 0.0988
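The column totals can be plugged straight into the PMCC formula. A minimal sketch in pure Python (no libraries) that reproduces r ≈ 0.551; the p-value itself comes from a t table or software, so only the t statistic is computed here:

```python
from math import sqrt

# Column totals from the table above (n = 10 students)
n, sx, sy = 10, 835, 857
sxx, syy, sxy = 69925, 73575, 71649

r = (n * sxy - sx * sy) / sqrt((n * sxx - sx**2) * (n * syy - sy**2))
print(round(r, 3))  # → 0.551 (moderate positive correlation)

# t statistic used to look up the two-tailed p-value (df = n - 2 = 8)
t = r * sqrt(n - 2) / sqrt(1 - r**2)
print(round(t, 2))
```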
STEP-BY-STEP GUIDE TO CALCULATING A P-VALUE IN EXCEL
To calculate a p-value in Excel using the T.TEST function, follow these steps:

• Select a cell where you want to display the p-value.
• Enter the formula: =T.TEST(array1, array2, tails, type)
• Replace array1 and array2 with the data sets you want to test.
• Set tails to 1 for a one-tailed test or 2 for a two-tailed test.
• Set type to 1 for a paired test, 2 for a two-sample test with equal variance, or 3 for a two-sample test with unequal variance.
• The formula will return the p-value for the test.
HYPOTHESIS TESTING FOR THE POPULATION CORRELATION COEFFICIENT
ρ = population correlation coefficient

A hypothesis test for the correlation coefficient tells us whether the correlation between two variables is significant. We calculate a standardized test statistic (t).

USUAL HYPOTHESIS TESTS FOR ρ
H₀: ρ = 0 (no significant correlation)
Hₐ: ρ ≠ 0 (significant correlation)

H₀: ρ ≤ 0 (no significant positive correlation)
Hₐ: ρ > 0 (significant positive correlation)

H₀: ρ ≥ 0 (no significant negative correlation)
Hₐ: ρ < 0 (significant negative correlation)
USE A T-TEST
Example:
n (ordered pairs) = 10
r (sample correlation coefficient) = 0.551
level of significance α = 0.05
Test the significance of this correlation coefficient.

Step 1: identify the hypotheses
null hypothesis H₀: ρ = 0 (no significant correlation)
alternative hypothesis Hₐ: ρ ≠ 0 (significant correlation)

Step 2: identify the level of significance
α = 0.05

Step 3: identify the degrees of freedom
d.f. = n - 2 = 10 - 2 = 8

Step 4: determine the test to use
A two-tailed t-test.

Step 5: determine the critical values
±2.306 (from the t table at α = 0.05 and d.f. = 8)

Step 6: identify the rejection regions
any value less than -2.306 or greater than +2.306

Step 7: use the formula to calculate the t value (standardized test statistic)
t = r√(n - 2) / √(1 - r²), where r = sample correlation coefficient and n = number of ordered pairs
t = 0.551√8 / √(1 - 0.551²) ≈ 1.867

Step 8: make a decision (reject or fail to reject the null hypothesis)
H₀: There is no significant relationship between the students' grades in Algebra and in Statistics.
t = 1.867 ≤ 2.306 (critical value)
Fail to reject the null hypothesis because the computed value is less than the tabulated value.

Step 9: interpret the decision
Therefore, there is no significant relationship between the students' grades in Algebra and in Statistics.
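The decision rule in Steps 7-9 can be sketched in a few lines. The critical value 2.306 is the two-tailed t value at α = 0.05 with 8 degrees of freedom, as given in the example:

```python
from math import sqrt

r, n = 0.551, 10   # sample correlation and number of pairs (from the example)
t_crit = 2.306     # two-tailed critical value, α = 0.05, d.f. = n - 2 = 8

t = r * sqrt(n - 2) / sqrt(1 - r**2)   # standardized test statistic
print(round(t, 3))                     # → 1.868 (the slide rounds to 1.867)

if abs(t) > t_crit:
    print("Reject H0: the correlation is significant")
else:
    print("Fail to reject H0: the correlation is not significant")
```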
SPEARMAN'S RANK CORRELATION COEFFICIENT
Used when the data are ranks (ordinal). With no tied ranks, it is computed as
r_s = 1 - 6Σd² / n(n² - 1)
where d is the difference between paired ranks and n is the number of pairs.

EXAMPLE: SPEARMAN'S RANK CORRELATION COEFFICIENT
Find the Spearman's correlation coefficient between x and y.
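The x and y data for this example did not survive extraction, so here is a minimal sketch with made-up ranks, using the rank-difference formula r_s = 1 - 6Σd²/(n(n² - 1)), which is valid when there are no tied ranks:

```python
# Hypothetical ranks assigned by two judges (made-up data, no ties)
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

n = len(x)
d2 = sum((a - b) ** 2 for a, b in zip(x, y))   # Σd², squared rank differences
rs = 1 - 6 * d2 / (n * (n**2 - 1))
print(rs)  # → 0.8
```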


SIMPLE LINEAR REGRESSION ANALYSIS

SIMPLE LINEAR REGRESSION
is used to estimate the relationship between two quantitative variables. You can use simple linear regression when you want to know:
• How strong the relationship is between two variables (e.g., the relationship between rainfall and soil erosion).
• The value of the dependent variable at a certain value of the independent variable (e.g., the amount of soil erosion at a certain level of rainfall).

Regression models describe the relationship between variables by fitting a line to the observed data. Linear regression models use a straight line, while logistic and nonlinear regression models use a curved line. Regression allows you to estimate how a dependent variable changes as the independent variable(s) change.
Assumptions of simple linear regression
Simple linear regression is a parametric test, meaning that it makes certain assumptions about the data. These assumptions are:

Homogeneity of variance (homoscedasticity): the size of the error in our prediction doesn’t change significantly across the values of the
independent variable.
Independence of observations: the observations in the dataset were collected using statistically valid sampling methods, and there are no
hidden relationships among observations.

Normality: The data follows a normal distribution.


Linear regression makes one additional assumption:

The relationship between the independent and dependent variable is linear: the line of best fit through the data points is a straight line (rather
than a curve or some sort of grouping factor)
How to perform a simple linear regression
Simple linear regression formula
The formula for a simple linear regression is:

y = β₀ + β₁X + ε

• y is the predicted value of the dependent variable (y) for any given value of the independent variable (x).
• β₀ is the intercept, the predicted value of y when x is 0.
• β₁ is the regression coefficient, i.e., how much we expect y to change as x increases.
• x is the independent variable (the variable we expect is influencing y).
• ε is the error of the estimate, or how much variation there is in our estimate of the regression coefficient.

Linear regression finds the line of best fit through your data by searching for the regression coefficient (β₁) that minimizes the total error (ε) of the model.
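For a single predictor, the least-squares coefficients have closed forms: β₁ = (NΣXY - ΣXΣY)/(NΣX² - (ΣX)²) and β₀ = ȳ - β₁x̄. As an illustrative sketch, here they are computed from the Algebra/Statistics grade totals used in the correlation example (reusing that data here is this sketch's assumption, not a pairing the slides make):

```python
# Totals from the Algebra (X) / Statistics (Y) grades example
n, sx, sy = 10, 835, 857
sxx, sxy = 69925, 71649

b1 = (n * sxy - sx * sy) / (n * sxx - sx**2)   # slope β₁
b0 = sy / n - b1 * sx / n                      # intercept β₀ = ȳ - β₁x̄
print(round(b1, 3), round(b0, 2))              # → 0.442 48.8
```

So the fitted line is roughly ŷ = 48.8 + 0.442x: each additional Algebra point predicts about 0.44 more Statistics points.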
ESTIMATING THE COEFFICIENTS
Sum of Squares for Error
The least squares method determines the coefficients that minimize the sum of the squared deviations between the points and the line defined by the coefficients; this quantity is called the sum of squares for error.
Measures of Variance in Simple Linear Regression Analysis

STANDARD ERROR OF THE ESTIMATE
Figure 1 shows two regression examples. You can see that in Graph A, the points are closer to the line than they are in Graph B. Therefore, the predictions in Graph A are more accurate than in Graph B.

The standard error of the estimate is a measure of the accuracy of predictions. Recall that the regression line is the line that minimizes the sum of squared deviations of prediction (also called the sum of squares error). The standard error of the estimate is closely related to this quantity and is defined as:

σest = √( Σ(Y - Y')² / N )

where σest is the standard error of the estimate, Y is an actual score, Y' is a predicted score, and N is the number of pairs of scores. The numerator is the sum of squared differences between the actual scores and the predicted scores.

Note the similarity of the formula for σest to the formula for σ. It turns out that σest is the standard deviation of the errors of prediction (each Y - Y' is an error of prediction).
EXAMPLE
Assume the data in Table 1 are the data from a population of five X, Y pairs. The last column shows that the sum of the squared errors of prediction is 2.791. Therefore, the standard error of the estimate is

σest = √(2.791 / 5) = 0.747

Similar formulas are used when the standard error of the estimate is computed from a sample rather than a population. The only difference is that the denominator is N - 2 rather than N. The reason N - 2 is used rather than N - 1 is that two parameters (the slope and the intercept) were estimated in order to estimate the sum of squares. Formulas for a sample comparable to the ones for a population are shown below.

There is a version of the formula for the standard error in terms of Pearson's correlation:

σest = √( (1 - ρ²) SSY / N )

where ρ is the population value of Pearson's correlation and SSY = Σ(Y - μY)². For the data in Table 1, μY = 2.06, SSY = 4.597 and ρ = 0.6268. Therefore,

σest = √( (1 - 0.6268²)(4.597) / 5 ) = 0.747

which is the same value computed previously.
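A quick check that the correlation form of the formula reproduces the same value, using the population quantities quoted above:

```python
from math import sqrt

# Population values quoted for Table 1
rho, ssy, n = 0.6268, 4.597, 5

sigma_est = sqrt((1 - rho**2) * ssy / n)   # σest = √((1 - ρ²)·SSY / N)
print(round(sigma_est, 3))                 # → 0.747
```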
COEFFICIENT OF DETERMINATION
The coefficient of determination (R² or r-squared) is a statistical measure in a regression model that determines the proportion of variance in the dependent variable that can be explained by the independent variable. It is a number between 0 and 1 that measures how well a statistical model predicts an outcome. The outcome is represented by the model's dependent variable.
INTERPRETING THE COEFFICIENT OF DETERMINATION

Coefficient of Determination (R²)   Interpretation
0                                   The model does not predict the outcome.
Between 0 and 1                     The model partially predicts the outcome.
1                                   The model perfectly predicts the outcome.
• Graphing your linear regression data usually gives you a good clue as to whether its R² is high or low. For example, the graphs below show two sets of simulated data.

• Calculating the coefficient of determination

• You can choose between two formulas to calculate the coefficient of determination (R²) of a simple linear regression. The first formula is specific to simple linear regressions, and the second formula can be used to calculate the R² of many types of statistical models.

Formula 1: Using the correlation coefficient
Formula 2: Using the regression outputs

Example: Calculating R² using the correlation coefficient

You are studying the relationship between heart rate and age in children, and you find that the two variables have a negative Pearson correlation:

r = -0.28

This value can be used to calculate the coefficient of determination (R²) using Formula 1:

R² = (r)²
R² = (-0.28)²
R² = 0.08
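Formula 1 is a one-liner; note that (-0.28)² is exactly 0.0784, which the example rounds to 0.08:

```python
r = -0.28                   # Pearson correlation from the heart-rate example
r_squared = r ** 2          # Formula 1: R² = r²
print(round(r_squared, 4))  # → 0.0784 (about 8% of the variance explained)
```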