Biostatistics Lect 7b - 112025

The document covers the concepts of correlation and simple linear regression, focusing on the examination of linear relationships between two quantitative variables. It explains how to draw regression equations to predict values and emphasizes the importance of the correlation coefficient and the coefficient of determination (r²) in understanding the strength of these relationships. Additionally, it discusses the assumptions required for regression analysis and provides examples and exercises to illustrate these concepts.

BIOSTATISTICS

CORRELATION
AND
SIMPLE LINEAR REGRESSION
Objectives

• To examine the linear relationship between two quantitative variables using
  • CORRELATION (see Lec 7a)
  • REGRESSION
Comments on Graphs
• Always draw a graph
• Examples show that the correlation coefficient, as a summary statistic, is not sufficient on its own for a final decision about the data

• If the coefficient of linear correlation between (x, y) is significant, y can be expressed in terms of x by a linear equation.
• This equation can be used to predict the values of y given values of x.
• This equation is called the regression equation.

• The value r² is the proportion of the variation in y that is explained by the linear association between x and y.
SIMPLE LINEAR REGRESSION
Correlation and Regression

• Correlation describes the strength of a linear relationship between two variables
• Linear means “straight line”

• Regression tells us how to draw the straight line described by the correlation
• Calculates the “best-fit” line for a certain set of data
Simple Linear Regression Equation

Regression Equation
• Linear association between 2 quantitative variables
• $x$ (independent variable or predictor variable or explanatory variable)
• $y$ (dependent variable or response variable)

$\hat{y} = b_0 + b_1 x$

where $b_0$ is the intercept; estimate of the regression intercept

$b_1$ is the slope; estimate of the regression slope

• $b_0$ and $b_1$ are sample estimates of $\beta_0$ and $\beta_1$ (population parameters)

• $x_i$: value of $x$ for the $i$th observation
• $\hat{y}_i$: estimated Y value for a given observation
Simple Linear Regression Equation

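$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

[Figure: regression line of Y against X. Marked on the plot: the intercept $\beta_0$ at X = 0; the slope $\beta_1$ (change in Y divided by change in X); and, at a given $X_i$, the observed value of Y ($Y_i$) and the predicted value of Y.]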
Examples
Assumptions/Requirements
• For each fixed value of x, corresponding values of y have a bell-shaped
distribution.
• For different values of x, distribution of y-values all have the same
variance (homoscedasticity)

[Figure: two scatterplots.
Left panel: variance increases when x increases; variance is not the same for all values of x.
Right panel: variance remains constant when x increases; variance is approximately the same for all values of x.]
The Slope and Intercept

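For $\hat{y} = b_0 + b_1 x$

Slope: $b_1 = \dfrac{n \sum xy - (\sum x)(\sum y)}{n \sum x^2 - (\sum x)^2}$

Intercept: $b_0 = \bar{y} - b_1 \bar{x}$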
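As a rough illustration of how the slope and intercept could be computed by hand, the sketch below applies the usual computational formula for the slope, n Σxy − (Σx)(Σy) over n Σx² − (Σx)², and then sets the intercept to ȳ − b1·x̄. The function name `fit_line` and the five data points are made up for illustration; they are not from the lecture.

```python
def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1 * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")

    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are identical; the slope is undefined")

    b1 = (n * sum_xy - sum_x * sum_y) / denom   # slope
    b0 = sum_y / n - b1 * sum_x / n             # intercept: y-bar - b1 * x-bar
    return b0, b1


# Made-up illustrative data (an assumption, not lecture data).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = fit_line(xs, ys)
print(f"y-hat = {b0:.2f} + {b1:.2f} x")
```

For these five points the sums are Σx = 15, Σy = 20, Σxy = 66 and Σx² = 55, so the slope works out to 30/50 = 0.6 and the intercept to 4 − 0.6 × 3 = 2.2.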
Example: Regression Interpretation
• For every 1 unit increase in $x$, there is a 0.182 unit decrease in $y$
• The sign of the slope coefficient indicates the direction of the association
Example: Matched Pairs
• Examine the relationship between self-reported and measured
female heights (in.).

• Create a Scatterplot
• Recall that in the session on the paired t-test, we failed to reject the null hypothesis that the mean height difference is equal to zero.
• Note the different limits in the two plots

• Linear Correlation Coefficient: r = 0.856863

• Coefficient of Determination: r² = 0.7342
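The link between the two values reported above can be checked with one line of arithmetic, since the coefficient of determination is the square of the correlation coefficient:

```latex
r^2 = (0.856863)^2 \approx 0.7342
```

Read as a proportion, this suggests that roughly 73.4% of the variation in one height measurement is explained by its linear association with the other.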
Regression: Making Predictions
• Only predict within the relevant range of data
• Do not extrapolate beyond the range of observed X’s

[Figure: scatterplot of House Price ($1000s) against Square Feet with the fitted line; the span of observed X values is marked as the relevant range for interpolation.]
Exercise
• Data: height and age for 64 children aged 16 or less.
• RQ: Is children’s height related to age? If so, describe the association between them.
Scatter Diagram

r = 0.88
• Note:
Coefficient called “_cons” is the intercept ‘a’
The “age” coefficient shows the slope ‘b’
y = 62.2 + 7.2 x
Therefore: Height (cm) = 62.2 + 7.2 (Age (yrs))
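A minimal sketch of how the fitted equation could be used for prediction, assuming the rounded coefficients 62.2 and 7.2 quoted above. The function name and the range check are illustrative assumptions: the upper limit of 16 years comes from the data description, while the lower limit of 0 is only a placeholder because the youngest observed age is not stated.

```python
INTERCEPT_CM = 62.2        # "_cons" coefficient, rounded as in the notes
SLOPE_CM_PER_YEAR = 7.2    # "age" coefficient, rounded as in the notes
MIN_AGE_YEARS = 0          # assumption: the observed minimum age is not given
MAX_AGE_YEARS = 16         # the sample covers children aged 16 or less


def predicted_height_cm(age_years):
    """Predicted height (cm) from age (years), refusing to extrapolate."""
    if not (MIN_AGE_YEARS <= age_years <= MAX_AGE_YEARS):
        raise ValueError("age is outside the range covered by the data")
    return INTERCEPT_CM + SLOPE_CM_PER_YEAR * age_years


print(f"{predicted_height_cm(10):.1f} cm")  # 62.2 + 7.2 * 10
```

For a 10-year-old this gives 62.2 + 72 = 134.2 cm; asking for an age above 16 raises an error rather than extrapolating.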
Constructing the Regression Line
• Method of Least Squares
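As a brief reminder of what the method of least squares means here (standard background, written in the sample notation $b_0$, $b_1$ used elsewhere in these notes), the fitted line is the one whose intercept and slope minimise the sum of squared vertical distances between the observed and fitted values:

```latex
\min_{b_0,\, b_1} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
  = \min_{b_0,\, b_1} \sum_{i=1}^{n} \left( y_i - b_0 - b_1 x_i \right)^2
```

The quantity being minimised is the residual (error) sum of squares.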
An interpretation of the correlation coefficient, r

• r² measures how much of the variation in the y variable is accounted for by the linear relationship with the x variable.

• The total variation in y can be thought of as the sum of the squared distances from each y-point to the mean of y (Total SS).
• After fitting the regression line, there is considerably less variation remaining: the “residual variation” (Residual SS), also called “Error SS”.

• The difference between the total sum of squares and the residual sum of squares is the amount of variation explained by the regression model
Total SS – Residual SS = Model SS
Total SS – Error SS = Regression SS
Measures of Variation
• Total variation is made up of two parts:

$SST = SSR + SSE$

(Total Sum of Squares = Regression Sum of Squares + Error Sum of Squares)

$SST = \sum (Y_i - \bar{Y})^2$
$SSR = \sum (\hat{Y}_i - \bar{Y})^2$
$SSE = \sum (Y_i - \hat{Y}_i)^2$

where:
$\bar{Y}$ = Mean value of the dependent variable
$Y_i$ = Observed value of the dependent variable
$\hat{Y}_i$ = Predicted value of Y for the given $X_i$ value
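The decomposition above can be checked numerically. The sketch below fits a least-squares line to a small made-up data set (an assumption for illustration, not lecture data), computes the three sums of squares from their definitions, and confirms that SST equals SSR + SSE. The ratio SSR/SST is then the proportion of variation explained. The function name `variation_breakdown` is hypothetical.

```python
def variation_breakdown(xs, ys):
    """Return (SST, SSR, SSE) for the least-squares line through (xs, ys)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    # Least-squares slope and intercept.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar

    fitted = [b0 + b1 * x for x in xs]

    sst = sum((y - y_bar) ** 2 for y in ys)                # total variation
    ssr = sum((f - y_bar) ** 2 for f in fitted)            # explained variation
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))    # unexplained variation
    return sst, ssr, sse


# Made-up illustrative data (an assumption, not lecture data).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
sst, ssr, sse = variation_breakdown(xs, ys)
r_squared = ssr / sst
print(f"SST={sst:.2f}  SSR={ssr:.2f}  SSE={sse:.2f}  r^2={r_squared:.2f}")
```

For these points SST = 6, SSR = 3.6 and SSE = 2.4, so 60% of the variation in y is explained by the fitted line.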
Measures of Variation
• Total variation is made up of two parts:

$SST = SSR + SSE$

• SST = total sum of squares (Total Variation)

Measures the variation of the Yi values around their mean

• SSR = regression sum of squares (Explained Variation)

Variation attributable to the relationship between X and Y

• SSE = error sum of squares (Unexplained Variation)

Variation in Y attributable to factors other than X

• Note:
“Model” is the Model/Regression Sum of Squares
“Residual” is the Residual/Error Sum of Squares
“Total” is the Total Sum of Squares
Measures of Variation
Coefficient of Determination, r²
• The proportion of variation explained by the model
• $0 \le r^2 \le 1$
• R-squared is a goodness-of-fit measure
• Indicates the percentage of the variance in the dependent variable that the independent variable explains
• Measures the strength of the relationship between the model and the dependent variable

[Figure: plot of Y against X showing, at a given $X_i$, how the deviation of $Y_i$ from the mean $\bar{Y}$ is split by the fitted value $\hat{Y}_i$: SST = $\sum (Y_i - \bar{Y})^2$; SSE = $\sum (Y_i - \hat{Y}_i)^2$ (the unexplained deviation); SSR = $\sum (\hat{Y}_i - \bar{Y})^2$ (the explained deviation).]
The proportion of variation explained by the model
r² = 0.7657 implies that 76.6% of the variation is explained by the regression model:
76.6% of the variation in height is explained by age in this model.
A simple linear regression was calculated to predict participants’ height based on their age. A significant regression equation was found (p<0.001), with 76.6% of the variation in height explained by the model. Participants’ predicted height in centimetres is equal to 62.2 + 7.2 × age, with age measured in years. Participants’ average height increased by 7.2 cm for every year of age.
Sampling Error in The Regression Line

• Sample: $\hat{y} = b_0 + b_1 x$, correlation coefficient $r$

• Population: $y = \beta_0 + \beta_1 x$, correlation coefficient $\rho$

Null hypothesis: x and y are not linearly related

We can test either

i.e. $H_0: \rho = 0$ or $H_0: \beta_1 = 0$

Testing $\beta_1 = 0$ is another way to assess whether there is a significant linear relationship.

This is given by SPSS
• $b_1$ = 7.238393 and s.e.($b_1$) = 0.5085 (units?)
• t = (7.2384 - 0) / 0.5085 = 14.23 => p < 0.001
• Very strong evidence that the true slope is not equal to zero.
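The arithmetic for the slope test can be reproduced in a few lines, and the same numbers give a rough check that testing the slope and testing the correlation lead to the same statistic. The sketch assumes the values quoted in these notes for the height and age exercise (n = 64, r² = 0.7657). The formula t = r√(n − 2) / √(1 − r²) is the standard test statistic for a correlation in simple linear regression; it is brought in here as background and does not appear on the slides.

```python
import math

# Values quoted in the notes for the height-on-age regression.
b1 = 7.238393      # estimated slope
se_b1 = 0.5085     # standard error of the slope
n = 64             # number of children
r_squared = 0.7657

# t statistic for H0: slope = 0.
t_slope = (b1 - 0) / se_b1

# t statistic for H0: correlation = 0 (standard identity, an added assumption).
r = math.sqrt(r_squared)   # positive root, because the slope is positive
t_corr = r * math.sqrt(n - 2) / math.sqrt(1 - r_squared)

print(f"t from slope:       {t_slope:.2f}")
print(f"t from correlation: {t_corr:.2f}")
```

Both routes give a t statistic of about 14.23 on n − 2 = 62 degrees of freedom, which is why either hypothesis can be tested.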
Exercise 1
Looking at associations between a biomarker of allergy, and
environmental factors
Look at the examples of regression output, and for each one:
1. Do a quick sketch of the regression line
2. What is the correlation between the variables - and is this
correlation statistically significant?
3. Interpret the output in words, in terms of the relationship (if any)
between the variables. (Think about the slope, confidence
interval, p-value)
4. How much of the variation in the response variable is due to
variation in the explanatory variable?
There is a significant association between the biomarker and maxpm10, with an estimated increase in the biomarker of 0.07 (95% CI 0.06 to 0.07) units per 1 unit increase in maxpm10 (p<0.001).
Or, rescaling the slope by 100: an estimated increase of 6.92 (95% CI 6.48 to 7.37) units per 100 unit increase in maxpm10.
Exercise: 2
Looking at associations between a biomarker of allergy, and environmental factors
Look at the examples of regression output, and for each one:
1. Do a quick sketch of the regression line
2. What is the correlation between the variables - and is this correlation statistically significant?
3. Interpret the output in words, in terms of the relationship (if any) between the variables. (Think about the slope, confidence interval, p-value)
4. How much of the variation in the response variable is due to variation in the explanatory variable?
• For the 2nd one only: find the predicted value of the biomarker when mintemp = 20
[Figure: scatterplot of biom against mintemp with the fitted regression line]

biom = 16.46892 - 0.8152396*mintemp

Predicted value of biom when mintemp = 20?

Predicted value at 20: 16.46892 - 0.8152396*20 = 0.16 units
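The prediction can be reproduced directly from the fitted equation. The sketch below only restates the coefficients given above; the function name `predicted_biom` is an illustrative assumption. Working it through, 0.8152396 × 20 = 16.304792, so the predicted value is 16.46892 − 16.304792 = 0.164128, which rounds to 0.16 units.

```python
INTERCEPT = 16.46892     # intercept from the regression output
SLOPE = -0.8152396       # coefficient of mintemp from the regression output


def predicted_biom(mintemp):
    """Predicted biomarker value for a given minimum temperature."""
    return INTERCEPT + SLOPE * mintemp


print(round(predicted_biom(20), 2))
```

The negative slope means the predicted biomarker falls by about 0.82 units for each 1 unit increase in mintemp.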
Predictions Using Regression Eqs
• Predicting Y given X:
• If the linear correlation is NOT significant (i.e. fail to reject $H_0: \rho = 0$):
Do not use the regression line; the mean ($\bar{y}$) is the best predicted y-value
• If the linear correlation is significant (i.e. $H_0: \rho = 0$ is rejected):
Use the equation to find the best predicted y value; stay within the range of the available/observed data
• To determine if the correlation is significant:
• Calculate r and test $H_1: \rho \neq 0$
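The prediction rule above can be written as a small decision function. This is only a sketch: the function name, its parameters, and the example mean height of 120 cm are hypothetical, and the significance decision is assumed to have been made beforehand from the test on r.

```python
def best_predicted_y(x, *, significant, b0, b1, y_mean, x_min, x_max):
    """Best predicted y for a given x, following the rule in the notes.

    significant: True if the linear correlation was found to be significant.
    y_mean:      sample mean of y, used when the correlation is not significant.
    x_min/x_max: range of the observed x values.
    """
    if not significant:
        # No significant linear correlation: the mean is the best prediction.
        return y_mean
    if not (x_min <= x <= x_max):
        raise ValueError("x is outside the observed range; do not extrapolate")
    return b0 + b1 * x


# Example using the height-on-age coefficients quoted earlier in the notes.
# y_mean=120.0 and x_min=0 are made-up values for illustration only.
common = dict(b0=62.2, b1=7.2, y_mean=120.0, x_min=0, x_max=16)
print(best_predicted_y(10, significant=True, **common))   # uses the equation
print(best_predicted_y(10, significant=False, **common))  # falls back to the mean
```

Raising an error for x outside the observed range is one possible design choice; returning a flagged value instead would also be consistent with the rule.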
Reporting Regression: APA
Consider
• You want to know if height predicts weight
