Bio2 Module 1 - Simple Linear Regression and Correlation

This document provides an overview of simple linear regression and correlation. It defines linear regression as estimating the numerical relationship between variables using an equation of the form Ŷ= a + bX. Regression finds the line that minimizes the sum of squared errors between predicted and actual Y values. Correlation (measured by r) indicates the strength and direction of the linear relationship between two variables. The document provides examples of calculating the linear regression equation, correlation coefficient, and rank correlation coefficient from datasets. It also discusses limitations of correlation including that correlation does not necessarily indicate causation.

Uploaded by

tamirat hailu


University of Gondar

Gondar College of Medicine and Health Sciences

Biostatistics II _ module 1

Simple linear regression and correlation

Getu Degu

November 2008
Simple linear regression and correlation

Data are frequently given in pairs where one variable is dependent on the other.

E.g. 1. Weight and height
     2. House rent and income
     3. Yield and fertilizer

It is usually desirable to express their relationship by finding an appropriate mathematical equation. To form the equation, collect the data on these two variables. Let the observations be denoted by (X1, Y1), (X2, Y2), (X3, Y3), . . ., (Xn, Yn).

However, before trying to quantify this relationship, plot the data and get an idea of their nature. Plot these points on the XY plane and obtain the scatter diagram.

[Figure: scatter diagram, "Relationship between heights of fathers and their oldest sons". Horizontal axis: heights of fathers (inches), 62 to 74. Vertical axis: heights of oldest sons (inches), 62 to 73.]

NB: The actual figures of the above scatter diagram are given in the father and son height example later in this module.

A) Simple linear regression

The scatter diagram helps to choose the curve that best fits the data. The simplest type of curve is a straight line whose equation is given by Ŷ = a + bX. This equation is a point estimate of the population regression line Y = α + βX.

b = the sample regression coefficient of Y on X.

β = the population regression coefficient of Y on X.

"Y on X" means Y is the dependent variable and X is the independent one.

Regression is a method of estimating the numerical relationship between variables. For example, we may want to know the mean or expected weight of factory workers of a given height, and what increase in weight is associated with a unit increase in height.

The purpose of a regression equation is to use one variable to predict another.

How is the regression equation determined?

The method of least squares

The difference between the given score Y and the predicted score Ŷ is known as the error of estimation. The regression line, or the line which best fits the given pairs of scores, is the line for which the sum of the squares of these errors of estimation (Σei²) is minimized. That is, of all possible straight lines, the one with the minimum Σei² is the least squares regression line, which best fits the given data.

The least squares regression line for the set of observations (X1, Y1), (X2, Y2), (X3, Y3), . . ., (Xn, Yn) has the equation Ŷ = a + bX.
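The least squares criterion can be checked numerically. The sketch below is illustrative only: the five (X, Y) pairs and the helper names `sse` and `least_squares` are hypothetical and do not come from the slides. It fits the line and then confirms that nudging the intercept or slope in either direction increases the sum of squared errors.

```python
# Hypothetical (X, Y) pairs, used only to illustrate the least squares criterion.
xs = [1, 2, 3, 4, 5]
ys = [2, 3, 5, 4, 6]


def sse(a, b):
    """Sum of squared errors of estimation for the line Y-hat = a + b*X."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def least_squares(xs, ys):
    """Return (a, b) of the least squares line through the given pairs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b


a, b = least_squares(xs, ys)
best_sse = sse(a, b)

# Nudge the intercept and slope: every perturbed line has a larger SSE.
perturbed = [
    (a + da, b + db)
    for da in (-0.5, 0.0, 0.5)
    for db in (-0.2, 0.0, 0.2)
    if (da, db) != (0.0, 0.0)
]
worse = all(sse(a2, b2) > best_sse for a2, b2 in perturbed)
print(f"a = {a:.2f}, b = {b:.2f}, SSE = {best_sse:.2f}, all others worse: {worse}")
```

For these made-up points the fitted line is Ŷ = 1.3 + 0.9X with Σei² = 1.9, and each of the eight perturbed lines does worse.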

The values ‘a’ and ‘b’ in the equation are constants, i.e., their values are fixed. The constant ‘a’ indicates the value of Y when X = 0. It is also called the Y intercept. The value of ‘b’ shows the slope of the regression line and gives us a measure of the change in Y for a unit change in X.

This slope (b) is frequently called the regression coefficient of Y on X. If we know the values of ‘a’ and ‘b’, we can easily compute the value of Ŷ for any given value of X.

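The constants ‘a’ and ‘b’ are determined by solving simultaneously the following equations (the normal equations):

ΣY = an + bΣX
ΣXY = aΣX + bΣX²

$a = \bar{Y} - b\bar{X}$

$b = \dfrac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} = \dfrac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sum X^2 - \frac{(\sum X)^2}{n}}$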
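A sketch of how the normal equations are solved for a and b. This is standard algebra rather than something spelled out on the slides, with $\bar{X} = \sum X / n$ and $\bar{Y} = \sum Y / n$ denoting the sample means:

```latex
\begin{aligned}
\sum Y = an + b\sum X
  \;&\Rightarrow\; a = \bar{Y} - b\bar{X} \\
\sum XY = a\sum X + b\sum X^2
  \;&\Rightarrow\; \sum XY = (\bar{Y} - b\bar{X})\sum X + b\sum X^2 \\
  \;&\Rightarrow\; b\left(\sum X^2 - \frac{(\sum X)^2}{n}\right)
      = \sum XY - \frac{(\sum X)(\sum Y)}{n} \\
  \;&\Rightarrow\; b = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}
                          {\sum X^2 - \frac{(\sum X)^2}{n}}
\end{aligned}
```

The first line comes from dividing the first normal equation by n. The remaining lines substitute that expression for a into the second normal equation and collect the terms in b.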

Example: Heights of 10 fathers (X) together with their oldest sons (Y) are given below (in inches). Find the regression of Y on X.

           Father (X)   Oldest son (Y)   Product (XY)   X²
           63           65               4095           3969
           64           67               4288           4096
           70           69               4830           4900
           72           70               5040           5184
           65           64               4160           4225
           67           68               4556           4489
           68           71               4828           4624
           66           63               4158           4356
           70           70               4900           4900
           71           72               5112           5041

Total      676          679              45967          45784

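$b = \dfrac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sum X^2 - \frac{(\sum X)^2}{n}} = \dfrac{45967 - \frac{(676)(679)}{10}}{45784 - \frac{(676)^2}{10}} = \dfrac{45967 - 45900.4}{45784 - 45697.6} = \dfrac{66.6}{86.4} = 0.77$

$a = \bar{Y} - b\bar{X} = \dfrac{679}{10} - 0.77\left(\dfrac{676}{10}\right) = 67.9 - 52.05 = 15.85$

Therefore, Ŷ = 15.85 + 0.77X

The regression coefficient of Y on X (i.e., 0.77) tells us the change in Y associated with a unit change in X: on average, the oldest son's height increases by 0.77 inches for each additional inch of the father's height.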
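The arithmetic of this example can be reproduced with a short script. This is a minimal sketch using the ten father and son heights from the table; the variable names are illustrative. It also shows that the intercept of 15.85 depends on rounding b to 0.77 before computing a, and that carrying full precision gives roughly 15.79.

```python
# Heights (inches) of the 10 fathers (X) and their oldest sons (Y) from the example.
fathers = [63, 64, 70, 72, 65, 67, 68, 66, 70, 71]
sons = [65, 67, 69, 70, 64, 68, 71, 63, 70, 72]

n = len(fathers)
sum_x = sum(fathers)
sum_y = sum(sons)
sum_xy = sum(x * y for x, y in zip(fathers, sons))
sum_x2 = sum(x * x for x in fathers)

# Slope and intercept from the computational formulas.
b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
a = sum_y / n - b * sum_x / n

# Intercept obtained when b is first rounded to two decimals, as in the hand calculation.
a_hand = sum_y / n - round(b, 2) * sum_x / n

# Predicted height of the oldest son for a 70-inch father, using unrounded a and b.
predicted_70 = a + b * 70

print(f"b = {b:.4f}, a = {a:.4f}, a with rounded b = {a_hand:.2f}")
print(f"Predicted son's height for a 70-inch father: {predicted_70:.2f}")
```

The prediction for a 70-inch father is 69.75 inches with either set of coefficients, because the difference in the intercept offsets the difference in the slope near the mean of X.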

Estimate the height of the oldest son for a father’s height of 70 inches.

Ŷ = 15.85 + 0.77(70) = 69.75 inches.

NB: 1) n is the number of pairs of X and Y scores which are used in determining the regression line. In the above example, n = 10.

2) Be careful to distinguish between (ΣX)² and ΣX². In the above example, (ΣX)² = 676² = 456976, whereas ΣX² = 45784.

Explained, unexplained (error), total variations

If all the points on the scatter diagram fall on the regression line, we could say that the entire variation in Y is explained by variation in X.

Explained variation = Σ(Ŷ − Ȳ)²

The measure of the scatter of points away from the regression line gives an idea of the variation in Y that is not explained with the help of the regression equation.

Unexplained variation = Σ(Y − Ŷ)²

The variation of the Y’s about their mean can also be computed. The quantity Σ(Y − Ȳ)² is called the total variation.

Explained variation + unexplained variation = Total variation

The ratio of the explained variation to the total variation measures how well the linear regression line fits the given pairs of scores. It is called the coefficient of determination, and is denoted by r².

$r^2 = \dfrac{\text{explained variation}}{\text{total variation}} = \dfrac{\sum (\hat{Y} - \bar{Y})^2}{\sum (Y - \bar{Y})^2}$

The explained variation is never negative and is never larger than the total variation. Therefore, r² is always between 0 and 1. If the explained variation equals 0, r² = 0.

If r² is known, then $r = \pm\sqrt{r^2}$. The sign of r is the same as the sign of b from the regression equation.

Since r² is between 0 and 1, r is between −1 and +1.
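The decomposition of the total variation can be verified on the father and son heights from the earlier example. The sketch below uses illustrative variable names and unrounded a and b, since the identity "explained + unexplained = total" holds exactly only for the true least squares line, not for rounded coefficients.

```python
import math

# Heights (inches) of the 10 fathers (X) and their oldest sons (Y) from the example.
fathers = [63, 64, 70, 72, 65, 67, 68, 66, 70, 71]
sons = [65, 67, 69, 70, 64, 68, 71, 63, 70, 72]

n = len(fathers)
x_bar = sum(fathers) / n
y_bar = sum(sons) / n

# Least squares slope and intercept (kept unrounded).
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(fathers, sons))
s_xx = sum((x - x_bar) ** 2 for x in fathers)
b = s_xy / s_xx
a = y_bar - b * x_bar

fitted = [a + b * x for x in fathers]

explained = sum((f - y_bar) ** 2 for f in fitted)
unexplained = sum((y - f) ** 2 for y, f in zip(sons, fitted))
total = sum((y - y_bar) ** 2 for y in sons)

r_squared = explained / total
# r takes the sign of the slope b.
r = math.copysign(math.sqrt(r_squared), b)

print(f"explained = {explained:.2f}, unexplained = {unexplained:.2f}, total = {total:.2f}")
print(f"r squared = {r_squared:.4f}, r = {r:.4f}")
```

For these data about 60% of the variation in the sons' heights is explained by the linear regression on the fathers' heights.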

B) Linear correlation (Karl Pearson’s coefficient of linear correlation): measures the degree of linear correlation between two variables (e.g. X and Y). This correlation coefficient is a pure number, independent of the units in which the variables are expressed. It also tells us whether the slope of the regression line is positive or negative.

Its formula is:

$r = \dfrac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sqrt{\left(\sum X^2 - \frac{(\sum X)^2}{n}\right)\left(\sum Y^2 - \frac{(\sum Y)^2}{n}\right)}}$

Properties

1) −1 ≤ r ≤ 1
2) r is a pure number without any unit
3) If r is close to 1 → a strong positive relationship
4) If r is close to −1 → a strong negative relationship
5) If r = 0 → no linear correlation

Determine the value of ‘r’ for the scores in the above example.

r = 0.7776 ≈ 0.78
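The value of r for the father and son heights can be checked with a few lines of code. This is a minimal sketch: `pearson_r` is an illustrative helper written for this example, not a library function. The second call converts both variables from inches to centimetres to illustrate property 2, that r does not depend on the units of measurement.

```python
import math


def pearson_r(xs, ys):
    """Pearson's linear correlation coefficient from the raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = math.sqrt(
        (sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n)
    )
    return numerator / denominator


# Heights (inches) of the 10 fathers (X) and their oldest sons (Y) from the example.
fathers = [63, 64, 70, 72, 65, 67, 68, 66, 70, 71]
sons = [65, 67, 69, 70, 64, 68, 71, 63, 70, 72]

r = pearson_r(fathers, sons)

# Same data expressed in centimetres: r is unchanged.
r_cm = pearson_r([2.54 * x for x in fathers], [2.54 * y for y in sons])

print(f"r (inches) = {r:.4f}, r (cm) = {r_cm:.4f}")
```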

Rank correlation coefficient

Karl Pearson’s coefficient of correlation cannot be used in cases where the direct quantitative measurement of the phenomenon under study is not possible. In such situations we can use Spearman’s rank correlation coefficient.

Spearman’s rank correlation coefficient, denoted by rs, measures the correlation between two paired samples of ranked data. This correlation coefficient is applied to the ranks in two paired samples (not to the original scores). The formula for computing rank correlation by this method is:

$r_s = 1 - \dfrac{6\sum d_i^2}{n(n^2 - 1)}$

• List the n pairs of ranks; X, Y.
• Find the differences (di) between the ranks.
• Square these differences and add the squares (Σdi²).
• Compute rs.

Example

Six paintings were ranked by two judges. Calculate the rank correlation coefficient.

Painting   First judge (X)   Second judge (Y)   di    di²
A          2                 2                   0    0
B          1                 3                  -2    4
C          4                 4                   0    0
D          5                 6                  -1    1
E          6                 5                   1    1
F          3                 1                   2    4

Σdi² = 10, n = 6.

$r_s = 1 - \dfrac{6\sum d_i^2}{n(n^2 - 1)} = 1 - \dfrac{6(10)}{6(6^2 - 1)} = 1 - \dfrac{60}{210} = 1 - 0.29 = 0.71$
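The same steps (list the ranks, take the differences, square and sum them, apply the formula) can be sketched in code for the two judges. The helper name `spearman_rs` is illustrative, and this form of the formula assumes there are no tied ranks, which is the case here.

```python
def spearman_rs(ranks_x, ranks_y):
    """Spearman's rank correlation from two lists of ranks (no ties assumed)."""
    n = len(ranks_x)
    # Differences between paired ranks, squared and summed.
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Ranks given to paintings A to F by the two judges in the example.
first_judge = [2, 1, 4, 5, 6, 3]
second_judge = [2, 3, 4, 6, 5, 1]

sum_d2 = sum((x - y) ** 2 for x, y in zip(first_judge, second_judge))
rs = spearman_rs(first_judge, second_judge)

print(f"sum of squared differences = {sum_d2}, rs = {rs:.2f}")
```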

How do you interpret the above correlation coefficient?

Spurious correlation

As with regression analysis, the interpretation of a correlation coefficient is subject to limitations.

1. The correlation coefficient applies only to a linear relationship between X and Y.
2. Correlation does not mean causation.
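The first limitation can be illustrated with a small sketch. The data are hypothetical: Y is an exact function of X (Y = X²), yet because the relationship is not linear and the X values are symmetric about zero, Pearson's r comes out as zero. The helper `pearson_r` is written for this illustration only.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data: Y depends perfectly on X, but not linearly.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(f"r = {r:.2f}")
```

A value of r near zero therefore rules out a linear relationship only; it says nothing about other forms of dependence, which is one more reason to look at the scatter diagram first.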

Often one encounters what seem to be nonsense or spurious correlations between two variables that logically appear to be totally unrelated to one another. These often arise with correlations taken over time, usually over a period of several years.
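A sketch of how a shared time trend can produce such a correlation. Both yearly series below are invented for illustration and represent no real quantities. Each simply rises over eleven years, which is enough to give a high r. Correlating the year-to-year changes instead of the levels is one simple way, assumed here and not taken from the slides, to see that the series have little to do with each other beyond the trend.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Two hypothetical, unrelated yearly series that both happen to increase over time.
series_a = [50, 53, 55, 58, 62, 63, 67, 70, 72, 75, 79]
series_b = [10, 12, 11, 14, 15, 17, 16, 19, 21, 20, 23]

r_levels = pearson_r(series_a, series_b)

# Year-to-year changes remove the common upward trend.
changes_a = [later - earlier for earlier, later in zip(series_a, series_a[1:])]
changes_b = [later - earlier for earlier, later in zip(series_b, series_b[1:])]
r_changes = pearson_r(changes_a, changes_b)

print(f"r of levels = {r_levels:.2f}, r of yearly changes = {r_changes:.2f}")
```

The levels are strongly correlated while the yearly changes are almost uncorrelated, which suggests the high r reflects only the passage of time.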

What do you think about the correlation coefficient (r)
of 0.9 between the amount of rainfall in Canada and
the maize production in Ethiopia from 1990 to 2000?
Assume the yearly data of the amount of rainfall and
maize production for the years 1990 to 2000 are
available.

Exercise 5
Data on FEV1 (forced expiratory volume in one
second) (Y) and height (X) of 20 male medical
students are given below:

Height (cm) FEV1(litres)

164.0 3.54
167.0 3.54
170.4 3.19
171.2 2.85
171.2 3.42
171.3 3.20
172.0 3.60
172.0 3.78
174.0 4.32
176.0 3.75
177.0 3.09
177.0 4.05
177.0 5.43
177.4 3.60
178.0 2.98
180.7 4.80
181.0 3.96
183.1 4.78
183.6 4.56
183.7 4.68

A) Find the regression of Y on X.
B) What is the expected FEV1 for a male student whose height is 175 cm?
C) What is the expected FEV1 for a female student whose height is 166 cm?
D) What is the expected FEV1 for a male student whose height is 270 cm?
E) Determine Karl Pearson’s linear correlation coefficient.
F) Compute the coefficient of determination and give an explanation for it.
