Simple Linear Regression and Correlation - Class Example
Regression analysis: development of a statistical model that can be used to predict the values of a dependent
(response) variable from the values of at least one explanatory (independent) variable
Dependent variable
• The variable we wish to predict or explain
Independent variable
• The variable used to predict or explain the dependent variable
Scatter diagram
• Visualise the relationship between variables (independent variable on the horizontal X axis and the dependent
variable on the vertical Y axis)
• Helps suggest a starting point for regression analysis
Positive straight-line relationship
$$Y_c = a + bX$$
• a = Population Y intercept
• b = Slope of the line, ∆Y / ∆X
• ∆Y = Change in Y
• ∆X = Change in X
• X = Independent variable
Types of relationships
Negative Linear
No relationship
Correlation Analysis
The objective is not to use one variable to predict another, but rather to measure the
strength of the association or covariation that exists between two continuous
variables
For r → +1, there is a strong positive relationship between the variables, i.e., as x
increases, y also increases.
For r → -1, there is a strong negative relationship between the variables, i.e., as x
increases, y decreases, and vice versa.
Association
[Figure: six scatter diagrams of Y against X illustrating correlation coefficients r = 0.00, -0.72, 0.50, -0.96, 0.98 and -0.45]
Procedure for Calculation
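1. Collect the data for both the dependent (Y) and independent (X) variables.
2. Compute the coefficient of correlation r; the coefficient of determination is $r^2$.
$$r = \frac{n \sum xy - \sum x \sum y}{\sqrt{\left[n \sum x^2 - \left(\sum x\right)^2\right]\left[n \sum y^2 - \left(\sum y\right)^2\right]}}$$
3. Compute the regression coefficients b (slope) and a (intercept).
$$b = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - \left(\sum x\right)^2} \qquad a = \bar{Y} - b \cdot \bar{X}$$
$$\bar{Y} = \frac{\sum Y}{n} \qquad \bar{X} = \frac{\sum X}{n}$$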
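The calculation procedure can be sketched in Python as below. This is a minimal illustration, not part of the original slides: the function name `regression_summary` and the demo data (points lying exactly on Y = 1 + 2X) are hypothetical, chosen so the result is easy to verify by hand.

```python
from math import sqrt


def regression_summary(xs, ys):
    """Return (r, r_squared, b, a) using the raw-sum formulas."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")

    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    # Numerator shared by r and b: n*sum(xy) - sum(x)*sum(y)
    numerator = n * sum_xy - sum_x * sum_y
    x_term = n * sum_x2 - sum_x ** 2  # zero if all x values are equal
    y_term = n * sum_y2 - sum_y ** 2  # zero if all y values are equal

    r = numerator / sqrt(x_term * y_term)
    b = numerator / x_term
    a = sum_y / n - b * (sum_x / n)  # a = mean(Y) - b * mean(X)
    return r, r * r, b, a


# Hypothetical data lying exactly on Y = 1 + 2X
demo_x = [1.0, 2.0, 3.0, 4.0]
demo_y = [3.0, 5.0, 7.0, 9.0]
r, r_squared, b, a = regression_summary(demo_x, demo_y)
print(f"r = {r:.3f}, r^2 = {r_squared:.3f}, b = {b:.3f}, a = {a:.3f}")
```

For perfectly linear data such as this, the sketch should give r = 1, slope 2 and intercept 1. The function divides by zero if either variable is constant, since the formulas are undefined in that case.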
Example
An engineer wishes to examine the relationship between the length of steel bars (cm) and their
respective weights (kg). A random sample of 10 steel bars is selected.
Length (cm)   Weight (kg)
1.40          2.45
1.60          3.12
1.70          2.79
1.88          3.08
1.10          1.99
1.55          2.19
2.35          4.05
2.45          3.24
1.43          3.19
1.70          2.55
a) Compute the coefficients of correlation and determination and interpret your answers.
b) Determine the regression equation and estimate the weight of a steel bar 3 cm long.
c) Test the hypothesis that the coefficient of correlation in the population is zero. Use α = 0.05
Scatter Plot
[Figure: scatter plot of Weight (kg) on the vertical axis against Length (cm) on the horizontal axis for the 10 sampled bars]
Computing the values
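$$r = \frac{n \sum xy - \sum x \sum y}{\sqrt{\left[n \sum x^2 - \left(\sum x\right)^2\right]\left[n \sum y^2 - \left(\sum y\right)^2\right]}} = 0.764$$
There is a moderate positive relationship between the length of the steel bars and their weight.
$$r^2 = 0.764^2 = 0.584$$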
58% of the variation in the weight of steel bars can be explained by the variability in the length.
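The slides give only the final value of r. As a check, the intermediate sums can be computed from the ten data pairs of the example (X = length, Y = weight). The figures below were derived from the data table and are not taken from the original slides:

$$n = 10, \quad \sum x = 17.16, \quad \sum y = 28.65, \quad \sum xy = 50.8911, \quad \sum x^2 = 31.0168, \quad \sum y^2 = 85.3423$$

$$r = \frac{10(50.8911) - (17.16)(28.65)}{\sqrt{\left[10(31.0168) - 17.16^2\right]\left[10(85.3423) - 28.65^2\right]}} = \frac{17.277}{\sqrt{(15.7024)(32.6005)}} \approx 0.764$$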
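$$b = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - \left(\sum x\right)^2} = 1.10$$
$$\bar{Y} = \frac{\sum Y}{n} = \frac{28.65}{10} = 2.865$$
$$\bar{X} = \frac{\sum X}{n} = \frac{17.16}{10} = 1.716$$
$$a = \bar{Y} - b \cdot \bar{X} = 2.865 - 1.10 \times 1.716 = 0.977$$
[Figure: fitted regression line over the scatter plot, Length (cm) on the horizontal axis, with the intercept (a) = 0.977 marked]
$$Y = 0.977 + 1.1X$$
The predicted weight of a steel bar 3 cm long is 0.977 + 1.1(3) = 4.28 kg.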
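The regression coefficients and the prediction can be reproduced with a short Python sketch. The data pairs are those of the example, taking the first column as length and the second as weight. The helper name `predict_weight` is hypothetical, introduced only for illustration.

```python
# Steel-bar sample from the example: (length in cm, weight in kg)
data = [
    (1.40, 2.45), (1.60, 3.12), (1.70, 2.79), (1.88, 3.08), (1.10, 1.99),
    (1.55, 2.19), (2.35, 4.05), (2.45, 3.24), (1.43, 3.19), (1.70, 2.55),
]

n = len(data)
sum_x = sum(x for x, _ in data)
sum_y = sum(y for _, y in data)
sum_xy = sum(x * y for x, y in data)
sum_x2 = sum(x * x for x, _ in data)

# Slope and intercept from the least-squares formulas
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * sum_x / n


def predict_weight(length_cm):
    """Predicted weight (kg) from the fitted line Y = a + bX."""
    return a + b * length_cm


print(f"b = {b:.3f}, a = {a:.3f}")
print(f"predicted weight at 3 cm = {predict_weight(3.0):.2f} kg")
```

Note that 3 cm lies outside the sampled lengths (1.10 to 2.45 cm), so the estimate is an extrapolation of the fitted line.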
Hypothesis Testing (Correlation coefficient (r) for Linear Regression)
State the null hypothesis (H0)
• H0: ρ = 0
Critical value tc
• tc = t(α/2, degrees of freedom = n − 2)
Decision rule:
• Accept H0 if -tc < t < tc; otherwise reject H0
t test
$$t = \frac{r - \rho}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$$
• r – correlation value (sample)
• n – number of observations in the sample
• ρ – population correlation coefficient
State the decision
Example
Test the hypothesis that the coefficient of correlation in the population is zero. Use α = 0.05
Test statistic
$$t = \frac{r - \rho}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \frac{0.764 - 0}{\sqrt{\dfrac{1 - 0.584}{10 - 2}}} \approx 3.35$$
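The test can be sketched in Python as follows. The function name `correlation_t_statistic` is hypothetical. The critical value 2.306 is an assumption taken from a standard t table (two-tailed, α = 0.05, 8 degrees of freedom); it does not appear in the slides.

```python
from math import sqrt


def correlation_t_statistic(r, n, rho=0.0):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom.

    rho is kept as a parameter only to mirror the slide's formula.
    """
    return (r - rho) / sqrt((1 - r ** 2) / (n - 2))


r = 0.764           # sample correlation from the example
n = 10              # sample size
t_critical = 2.306  # assumed table value: two-tailed, alpha = 0.05, df = 8

t = correlation_t_statistic(r, n)
reject_h0 = abs(t) > t_critical  # decision rule: accept H0 if -tc < t < tc

print(f"t = {t:.2f}, critical value = {t_critical}, reject H0: {reject_h0}")
```

Under the assumed critical value, t falls outside the interval (-tc, tc), so the decision rule leads to rejecting H0 at α = 0.05.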