Correlation and Regression 25102024

The document provides an overview of correlation and linear regression analysis, focusing on their definitions, significance, and applications in biostatistics. It explains the correlation coefficient, its types (Pearson and Spearman), and how to interpret them, along with the assumptions and equations related to linear regression. Additionally, it covers the concept of the coefficient of determination (R-squared) and includes references for further reading.

Uploaded by

andri elzar

Correlation and Linear Regression Analysis
Biostatistics teaching team

Biostatistics, Epidemiology and Population Health (BEPH) beph.fkkmk.ugm.ac.id


Outline
Learning Objectives
Correlation
Linear Regression
References



Learning Objectives
1. Understanding the concept of correlation and its significance.
2. Understanding how to calculate and interpret the correlation coefficient.
3. Understanding the difference between correlation and causation.
4. Understanding the concept of linear regression and its applications.
5. Understanding the equation and assumptions of linear regression.
6. Understanding how to fit a linear regression and interpret the coefficients.
7. Understanding the meaning and interpretation of R-squared, p-value, and confidence intervals.



Correlation
"Correlation does not imply causation."



Correlation

• Sir Francis Galton (half-cousin of Charles Darwin)
• Development of behavioral statistics
• Father of Eugenics
• Science of fingerprints as unique
• Invented the pocket
• Invented Correlation and Regression



Correlation

• Correlation is the degree to which two variables are linearly related.
• In the broad sense, correlation is any statistical relationship, whether causal or not, between two random variables in bivariate data.



Correlation Coefficient

• A statistical measure of the strength of the relationship between the relative movements of two variables.
• The values range between -1.0 and 1.0.
• A correlation of -1.0 shows a perfect negative correlation.
• A correlation of 1.0 shows a perfect positive correlation.
• A correlation of 0.0 shows no linear relationship between the movement of the two variables.
• Two types of correlation coefficients: Pearson and Spearman



Correlation Coefficient
Correlation coefficient          Interpretation
.90 to 1.00 (−.90 to −1.00)      Very high positive (negative) correlation
.70 to .90 (−.70 to −.90)        High positive (negative) correlation
.50 to .70 (−.50 to −.70)        Moderate positive (negative) correlation
.30 to .50 (−.30 to −.50)        Low positive (negative) correlation
.00 to .30 (.00 to −.30)         Negligible correlation

Correlation coefficient interpretation (Hinkle et al., 2003)
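
The interpretation bands above can be sketched as a small lookup function. This is only an illustration: the function name is hypothetical, and because adjacent bands share their end points (for example .70), the sketch assumes a boundary value belongs to the higher band.

```python
def interpret_correlation(r: float) -> str:
    """Label a correlation coefficient using the Hinkle et al. (2003) bands."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and 1")

    size = abs(r)
    if size < 0.30:
        # The lowest band carries no direction in the table.
        return "negligible correlation"

    direction = "positive" if r > 0 else "negative"
    # Assumption: a value exactly on a boundary goes to the higher band.
    if size >= 0.90:
        strength = "very high"
    elif size >= 0.70:
        strength = "high"
    elif size >= 0.50:
        strength = "moderate"
    else:
        strength = "low"
    return f"{strength} {direction} correlation"


print(interpret_correlation(0.76))   # high positive correlation
print(interpret_correlation(-0.55))  # moderate negative correlation
```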



Correlation Coefficient (Pearson)
• A statistic that measures the linear correlation between two variables X and Y
• Important: the Pearson correlation is only applicable to a linear relationship between two continuous variables
• The Pearson correlation coefficient is denoted r
• The value of r depends on the strength and direction of the relationship
• Positive correlation (r > 0): as X increases, Y also increases at a constant rate (linear)
• Negative correlation (r < 0): as X increases, Y decreases at a constant rate (linear)



Correlation Coefficient (Pearson)

Direction
• Positive Linear
• Negative Linear
Strength
• Strong
• Moderate
• Weak
• No Linear Correlation



Correlation Coefficient (Spearman)
• Also called Spearman's rank correlation coefficient
• A nonparametric measure of correlation
• Applies when the association between the two variables is monotonic
• Similar to the Pearson correlation in terms of the direction of the relationship
• The rate of change between the two variables does not have to be constant



Correlation Coefficient
• Pearson vs Spearman correlation
• Pearson only works with a linear relationship, whereas Spearman works with linear and other monotonic relationships
• Pearson is computed directly from the raw data values, whereas Spearman uses the rank-ordered values of the data

Figure (two scatterplots): first panel Pearson = +1, Spearman = +1; second panel Pearson = +0.8, Spearman = +1
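
The contrast between the two coefficients can be sketched in plain Python. The data are made up for illustration, and the helper names (`pearson`, `ranks`, `spearman`) are hypothetical. Spearman is computed here as the Pearson correlation of the ranks, with tied values sharing their average rank.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation computed from the raw values."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-ordered values."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: one linear and one monotonic but curved relationship.
x = [1, 2, 3, 4, 5, 6]
y_linear = [2 * v + 1 for v in x]
y_curved = [v ** 3 for v in x]

print(pearson(x, y_linear), spearman(x, y_linear))  # both essentially 1
print(pearson(x, y_curved), spearman(x, y_curved))  # Pearson below 1, Spearman 1
```

For the curved data the ranks of X and Y still agree perfectly, so Spearman stays at 1 while Pearson drops below 1.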
Fat and calories of Starbucks foods
The scatterplot shows the relationship between fat (grams) and calories of Starbucks foods.
• Dependent variable? Calories
• Independent variable? Fat
• Relationship? Linear, positive, strong correlation (r = 0.76)
Calculate Correlation Coefficient (Hand Calculation)

• rxy = Cov(x, y) / (σx · σy)
• ρxy = rxy = correlation between the two variables X and Y
• Cov(x, y) = covariance of X and Y
• σx = standard deviation of X
• σy = standard deviation of Y
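
As a rough illustration of the hand calculation, the sketch below computes the covariance and the two standard deviations and then divides. The data and function names are hypothetical, and the sample (n - 1) versions of covariance and standard deviation are assumed. The divisor cancels in the ratio, so the population versions give the same r.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def sample_covariance(x, y):
    """Sample covariance of x and y (divides by n - 1)."""
    mean_x, mean_y = mean(x), mean(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (len(x) - 1)


def sample_sd(values):
    """Sample standard deviation: square root of the variance."""
    return sqrt(sample_covariance(values, values))


def correlation(x, y):
    """r = Cov(x, y) / (sd_x * sd_y)."""
    return sample_covariance(x, y) / (sample_sd(x) * sample_sd(y))


# Hypothetical data, small enough to verify by hand.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = correlation(x, y)
print(round(r, 3))  # 0.775
```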

You will use Jamovi in the laboratory session to do correlation analysis



Linear Regression



Linear Regression

• A linear approach for modelling the relationship between a scalar response and one or more explanatory variables
• Recall the equation of a straight line from high school: Y = mX + B
• m → slope
• B → intercept
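
The slide does not state how m and B are estimated. A common choice is ordinary least squares, assumed in the minimal sketch below. The function name `fit_line` and the data are hypothetical.

```python
def fit_line(x, y):
    """Least-squares slope (m) and intercept (B) for Y = mX + B."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx                    # m
    intercept = mean_y - slope * mean_x  # B
    return slope, intercept


# Hypothetical data lying exactly on Y = 2X + 1.
x = [0, 1, 2, 3, 4]
y = [1, 3, 5, 7, 9]

m, b = fit_line(x, y)
print(m, b)  # 2.0 1.0
```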



Linear Regression Equation
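
A common way to write the simple linear regression model is sketched below. The symbols are assumptions and may differ from the notation used in the lecture: \beta_0 plays the role of the intercept B, \beta_1 the slope m, and \varepsilon_i is the error term for observation i.

```latex
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, \dots, n
```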





Linear Regression Error Term
• Often referred to as the residual (strictly, the residual is the sample estimate of the error term)
• A residual is the difference between the observed value of the dependent variable (yi) and its predicted value (ŷi).
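
A minimal sketch of the residual calculation, observed minus predicted, is given below. The line y = 2 + 3x and the observations are hypothetical values chosen so the arithmetic is easy to follow.

```python
def residuals(x, y, intercept, slope):
    """Residual for each observation: observed y minus predicted y."""
    return [observed - (intercept + slope * xi) for xi, observed in zip(x, y)]


# Hypothetical observations and a hypothetical fitted line y = 2 + 3x.
x = [1, 2, 3, 4]
y = [5.5, 7.0, 11.5, 14.0]

res = residuals(x, y, intercept=2.0, slope=3.0)
print(res)  # [0.5, -1.0, 0.5, 0.0]
```

A positive residual means the observation lies above the line, and a negative residual means it lies below.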



Hypothesis testing

Source: M.Sc Epidemiology UMC Utrecht



Linear Regression Assumptions
1. Linearity: the dependent and independent variables have a linear relationship. Checked using a scatter plot.
2. Normality: the dependent and independent variables are approximately normally distributed. Checked using a histogram, Q-Q plot, skewness, kurtosis, or a normality test. Strictly, the model requires normality of the residuals (item 5) rather than of the variables themselves.
3. Homoscedasticity: the variance of the errors/residuals should be constant. Checked using a residual plot.
4. Independence of predictors: the independent variables should not be strongly associated with one another (no multicollinearity). Checked using a correlation test between the independent variables.
5. Normally distributed errors/residuals: the error terms/residuals follow a normal distribution. Checked using a Q-Q plot.



Linear Regression Coefficient
• The linear model for predicting calories from fat for Starbucks foods:
calories = 183.7 + 11.3 × fat
• Interpretation
• For each 1-gram increase in fat, the average calories of the foods increase by 11.3
• When a food has 0 grams of fat, the average calories are 183.7
• The linear model for predicting birth weight from the mother's weight prior to pregnancy:
birth weight = 2369.2 + 4.43 × mother weight
• Interpretation?
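
The two fitted models above can be evaluated directly, as in the sketch below. The coefficients come from the slide. The function names and the input values are hypothetical, and the units of birth weight and mother weight are not stated in the source, so none are assumed here.

```python
def predict_calories(fat_grams):
    """Predicted calories from the slide's model: 183.7 + 11.3 * fat."""
    return 183.7 + 11.3 * fat_grams


def predict_birth_weight(mother_weight):
    """Predicted birth weight from the slide's model: 2369.2 + 4.43 * mother weight."""
    return 2369.2 + 4.43 * mother_weight


# The intercept is the prediction at fat = 0.
print(predict_calories(0))

# The slope is the change in the prediction per 1-gram increase in fat.
print(predict_calories(11) - predict_calories(10))

# Hypothetical input for the second model.
print(predict_birth_weight(130))
```

The same reading applies to the second model: the slope is the change in predicted birth weight for a one-unit increase in mother weight.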
Linear Regression Coefficient of Determination

• Coefficient of determination, or R-squared
• The proportion of the variation in the dependent variable that is predictable from the independent variable(s).
• Equal to the square of the correlation coefficient in simple linear regression
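• Formula: R² = 1 − SS_res / SS_tot, where SS_res is the residual sum of squares and SS_tot is the total sum of squares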

• The value ranges from 0 to 1. A value close to 1 indicates that the regression predictions fit the data well, and a value of exactly 1 indicates a perfect fit.
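
The sketch below computes R-squared as 1 minus the ratio of the residual sum of squares to the total sum of squares, and checks that it matches the squared correlation coefficient for a simple linear regression. This form of the formula, the function names, and the data are assumptions made for illustration.

```python
from math import sqrt


def fit_line(x, y):
    """Least-squares slope and intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(x, y):
    """R^2 = 1 - SS_res / SS_tot for the least-squares line."""
    slope, intercept = fit_line(x, y)
    mean_y = sum(y) / len(y)
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1 - ss_res / ss_tot


def pearson(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r2 = r_squared(x, y)
r = pearson(x, y)
print(round(r2, 3), round(r ** 2, 3))  # 0.6 0.6
```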

References

• Diez, D. M., Barr, C. D., & Cetinkaya-Rundel, M. (2012). OpenIntro Statistics (pp. 174-175). Boston, MA, USA: OpenIntro.
• Hinkle, D. E., Wiersma, W., & Jurs, S. G. (2003). Applied Statistics for the Behavioral Sciences (Vol. 663). Houghton Mifflin College Division.
• Yadav, S. (2018). Correlation analysis in biological studies. Journal of the Practice of Cardiovascular Sciences, 4(2), 116.
