
Correlation and Linear Regression

Roderick D. Balce
Learning Outcomes
• Describe the direction and strength of the linear
relationship between quantitative variables.
• Use paired data to find the value of the linear
correlation coefficient r.
• Perform a hypothesis test for correlation.
• Determine and interpret the R² value.
• Use the regression equation generated from
paired data to predict the value of the
dependent variable given the value of the
independent variable.
Basic Concepts
A correlation exists between two variables when the values
of one variable are associated with the values of the other.

The linear correlation coefficient r measures the strength of
the linear relationship between the paired quantitative
x- and y-values in a sample.
Linear Correlation
[Scatterplots: linear relationships vs. curvilinear relationships]
Linear Correlation
[Scatterplots: strong relationships vs. weak relationships]
Strength and Direction of Linear
Correlation
r value   Interpretation
  1       Perfect positive linear relationship
  0       No linear relationship
 -1       Perfect negative linear relationship
Scatter Plots of Data with Various
Correlation Coefficients
[Six scatterplots illustrating r = -1, r = -.625, r = 0, r = .351, and r = 1]
Properties of the
Linear Correlation Coefficient r
1. –1 ≤ r ≤ 1
2. If all values of either variable are converted
to a different scale, the value of r does not
change.
3. The value of r is not affected by the choice
of x and y. Interchange all x- and y-values
and the value of r will not change.
4. r measures the strength of a linear relationship.
5. r is very sensitive to outliers; a single outlier can
dramatically affect its value.
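
As a quick illustration (not part of the original slides), r can be computed directly from paired data. The sketch below uses NumPy, and the sample scores are hypothetical, chosen only to show the calculation.

# Minimal sketch: computing the linear correlation coefficient r
# from paired (x, y) data. The sample values are hypothetical.
import numpy as np

def pearson_r(x, y):
    """Sample linear correlation coefficient r for paired data."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_dev = x - x.mean()
    y_dev = y - y.mean()
    # sum of products of deviations divided by the square root of the
    # product of the sums of squared deviations
    return np.sum(x_dev * y_dev) / np.sqrt(np.sum(x_dev**2) * np.sum(y_dev**2))

x = [2, 4, 5, 7, 9]           # hypothetical x-values
y = [10, 14, 15, 19, 24]      # hypothetical y-values
print(round(pearson_r(x, y), 3))   # close to 1: strong positive linear relationship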
Hypothesis Test for Correlation
Requirements:
1. The sample of paired (x, y) data is a simple
random sample of quantitative data.
2. Visual examination of the scatterplot must
confirm that the points approximate a
straight-line pattern.
3. Any outliers must be removed if they are known to be
errors.
Hypothesis Test for Correlation

Notation:
n = number of pairs of sample data
r = linear correlation coefficient for a sample of paired data
ρ = linear correlation coefficient for a population of paired data
Hypothesis Test for Correlation
Hypotheses:
H₀: ρ = 0 (There is no linear correlation.)
H₁: ρ ≠ 0 (There is a linear correlation.)

For one-tailed tests, the alternative hypothesis becomes
H₁: ρ < 0 or H₁: ρ > 0.
Hypothesis Test for Correlation

Decision and Conclusion:

If |r| > critical value or the p-value ≤ α, reject H₀ and
conclude that there is enough evidence to support the claim
of a linear correlation.

If |r| ≤ critical value or the p-value > α, fail to reject H₀ and
conclude that there is not enough evidence to support the
claim of a linear correlation.
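
In practice the test statistic and p-value come from software (the slides use JASP output). Below is a minimal sketch of the same decision rule in Python, using SciPy's pearsonr with hypothetical data for illustration only.

# Minimal sketch: hypothesis test for correlation with scipy.stats.pearsonr,
# which returns r and the two-tailed p-value. Data are hypothetical.
from scipy.stats import pearsonr

x = [61, 72, 58, 90, 75, 66, 83, 70]          # hypothetical scores
y = [140, 128, 151, 102, 121, 133, 110, 125]

r, p_value = pearsonr(x, y)
alpha = 0.05

if p_value <= alpha:
    print(f"r = {r:.3f}, p = {p_value:.4f}: reject H0; enough evidence of a linear correlation.")
else:
    print(f"r = {r:.3f}, p = {p_value:.4f}: fail to reject H0; not enough evidence of a linear correlation.")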
Example: Job Satisfaction
A cross-sectional study was conducted to explore the
relationship between job satisfaction (JS) and burnout (BO)
among 200 medical technologists working in government
hospitals. At α = 0.05, is there a linear correlation between
job satisfaction and burnout?

Hypotheses:
H₀: ρ = 0
H₁: ρ ≠ 0
Scatterplot
[Scatterplot of burnout (BO) vs. job satisfaction (JS)]
Assumption Check and Correlation
Analysis
Shapiro-Wilk Test for Bivariate Normality
           Shapiro-Wilk     p
BO - JS    0.991            0.229

Pearson's Correlation
           Pearson's r      p
BO - JS    -0.650           < .001

Decision and Conclusion:


_______________________________________
Common Errors
Involving Correlation
1. Causation: It is wrong to conclude that
correlation implies causality.
2. Averages: Averages suppress individual
variation and may inflate the correlation
coefficient.
3. Linearity: There may be some relationship
between x and y even when there is no
linear correlation.
Regression Analysis
Used to predict the value of a dependent variable (ŷ) based
on the value of at least one independent variable (x).

Simple linear regression – only one independent variable
Multiple regression – two or more independent variables
Regression Analysis
 Regression Line
The graph of the regression equation is
called the regression line (or line of best
fit, or least squares line).

 Regression Equation
Given a collection of paired data, the regression equation

ŷ = b₀ + b₁x

algebraically describes the relationship between the two
variables.
Notation for Regression Equation

                                      Population      Sample
                                      Parameter       Statistic
y-intercept of regression equation    β₀              b₀
Slope of regression equation          β₁              b₁
Equation of the regression line       y = β₀ + β₁x    ŷ = b₀ + b₁x
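
To show how the sample statistics b₀ and b₁ arise from data, the sketch below (not part of the original slides) applies the usual least-squares formulas, b₁ = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b₀ = ȳ − b₁x̄, to hypothetical paired values.

# Minimal sketch: least-squares estimates of the slope b1 and intercept b0
# for the sample regression line y-hat = b0 + b1*x. Data are hypothetical.
import numpy as np

x = np.array([20.0, 35.0, 40.0, 55.0, 60.0, 75.0])        # independent variable
y = np.array([180.0, 160.0, 155.0, 130.0, 120.0, 100.0])  # dependent variable

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
print(f"y-hat = {b0:.3f} + ({b1:.3f})x")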
Coefficient of Determination, R²

 Measures the proportion of the variation in the dependent
variable (y) that is explained by the independent variable (x)
for a linear regression model.

 Mathematically related to the correlation coefficient: for
simple linear regression, R² = r².

 Values range from 0 to 1.


Example: Job Satisfaction
What proportion of the variation in job
satisfaction can be explained by the variation in
burnout score?
Model Summary - JS
Model   R       R²      Adjusted R²   RMSE
H₀      0.000   0.000   0.000         29.319
H₁      0.650   0.423   0.420         22.334

R² = ______
Explained variation: _______
Unexplained variation: _______
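
As a worked sketch (not part of the original slides), R² for a simple linear regression can be obtained by squaring the Pearson's r reported in the correlation output above; the percentages below are computed from r = -0.650 and differ slightly from the table only because of rounding.

# Minimal sketch: for simple linear regression, R² is the square of r.
r = -0.650                              # Pearson's r from the correlation output above
r_squared = r ** 2                      # proportion of variation in JS explained by BO
print(f"R² = {r_squared:.3f}")          # about 0.42
print(f"Explained variation: {r_squared:.1%}")
print(f"Unexplained variation: {1 - r_squared:.1%}")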
Predicting Values of ŷ
Coefficients
Model            Unstandardized    SE       Standardized    t         p
H₀   Intercept   132.165           2.073                    63.751    < .001
H₁   Intercept   235.459           8.724                    26.990    < .001
     BO          -2.112            0.175    -0.650          -12.039   < .001

Regression equation:

ŷ = ____________________
Example: Job Satisfaction
If additional employees are measured for burnout, what job
satisfaction score would you predict for each of the following
burnout scores?

Burnout score (x) Job satisfaction (y)


a. 25 ______
b. 50 ______
c. 70 ______
d. 85 ______
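
Below is a minimal sketch of how these predictions could be computed, assuming the regression equation uses the H₁ coefficients from the JASP output above (intercept = 235.459, slope for BO = -2.112).

# Minimal sketch: predicting job satisfaction from burnout with the fitted
# regression equation y-hat = b0 + b1*x (coefficients from the output above).
b0, b1 = 235.459, -2.112

def predict_js(burnout):
    """Predicted job satisfaction for a given burnout score."""
    return b0 + b1 * burnout

for bo in (25, 50, 70, 85):
    print(f"BO = {bo}: predicted JS = {predict_js(bo):.1f}")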
Practice Test
• Using the JASP outputs below, determine the
following:
– Correlation between job satisfaction (JS) and
turnover intention or intention to quit (TOI)
• r = _______
• P-value = ________
• Decision and conclusion: ________________________
_____________________________________________
Regression Analysis (x=JS; y=TOI)
• R² = _________ Unexplained variation (%) = ______
• ŷ = ________________________
• Predict the TOI for the following JS values:
62 130 55
