BIO203 Lecture 11 (Correlation) SHF 2024
[figure: ctr / drug A → protein extraction → sample A / sample B]
protein concentration: absorbance @ 595 nm
sample A: Abs 10
sample B: Abs 15
standard curve: x = protein concentration, y = absorbance @ 595 nm
sample    x     y
1         1     2
2         2     5
3         3     5
4         4     9
5         5    11
6         6    12
7         7    14
8         8    15
9         9    17
10       10    21
test 1   ???   10
test 2   ???   15
regression line
→ mathematical relationship between x (independent variable) and y (dependent variable)
line: y = a + b x
regression coefficients: a, b
y intercept (a): value of y at x = 0
slope (b): change in y per 1 unit of x
how to determine the regression coefficients from the measured data points ?
mean x = 5.5
mean y = 11.1
sample    x     y
1         1     2
2         2     5
3         3     5
4         4     9
5         5    11
6         6    12
7         7    14
8         8    15
9         9    17
10       10    21
Σ        55   111
$b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$
$a = \bar{y} - b\,\bar{x}$
sample    x     y    x − x̄    y − ȳ    product    (x − x̄)²    y'
1         1     2    -4.5     -9.1     40.95      20.25       2.3
2         2     5    -3.5     -6.1     21.35      12.25       4.3
3         3     5    -2.5     -6.1     15.25       6.25       6.2
4         4     9    -1.5     -2.1      3.15       2.25       8.2
5         5    11    -0.5     -0.1      0.05       0.25      10.1
6         6    12     0.5      0.9      0.45       0.25      12.1
7         7    14     1.5      2.9      4.35       2.25      14.0
8         8    15     2.5      3.9      9.75       6.25      16.0
9         9    17     3.5      5.9     20.65      12.25      17.9
10       10    21     4.5      9.9     44.55      20.25      19.9
$b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{160.5}{82.5} = 1.945$
$a = \bar{y} - b\,\bar{x} = 11.1 - 1.945 \cdot 5.5 = 0.4$
$y' = 1.945\,x + 0.4$
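The calculation in the table can be reproduced in a few lines of Python. This is a minimal sketch using only the standard library; the function name `fit_line` and the variable names are illustrative and not part of the lecture.

```python
# Standard-curve data from the table above:
# x = protein concentration, y = absorbance at 595 nm.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2, 5, 5, 9, 11, 12, 14, 15, 17, 21]


def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y' = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of the products of deviations, and sum of squared x deviations.
    sp_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(xs, ys))
    ss_x = sum((xi - mean_x) ** 2 for xi in xs)
    b = sp_xy / ss_x          # slope
    a = mean_y - b * mean_x   # y intercept
    return a, b


a, b = fit_line(x, y)
print(f"b = {b:.3f}, a = {a:.1f}")

# Fitted values y' for each standard, rounded as in the table.
predicted = [round(a + b * xi, 1) for xi in x]
print(predicted)
```

The printed slope and intercept should agree with the values on the slide (b = 1.945, a = 0.4), and the fitted values with the y' column.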
$y' = 1.945\,x + 0.4$
$x' = \frac{y - 0.4}{1.945}$
Abs: 15
$\mathrm{BSA} = \frac{15 - 0.4}{1.945} = 7.5$
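Reading an unknown off the standard curve amounts to inverting the fitted line. The sketch below applies that inversion to the two test absorbances; the constant and function names are illustrative, and the coefficients are the rounded values from the slide.

```python
# Coefficients of the fitted standard curve y' = a + b*x (rounded slide values).
A_INTERCEPT = 0.4
B_SLOPE = 1.945


def concentration_from_absorbance(absorbance, a=A_INTERCEPT, b=B_SLOPE):
    """Invert y = a + b*x to estimate x (protein concentration) from y."""
    return (absorbance - a) / b


for name, absorbance in [("test 1", 10), ("test 2", 15)]:
    conc = concentration_from_absorbance(absorbance)
    print(f"{name}: Abs {absorbance} -> concentration {conc:.1f}")
```

Both test absorbances lie inside the range covered by the standards (absorbance 2 to 21), so neither estimate is an extrapolation.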
[figure panels: regression on x, regression on y]
the regression line allows you to make predictions (but be careful !!!)
regression - vs - correlation
regression: independent variable = pre-determined, chosen set; dependent variable = measured in response
correlation: random sampling
goodness of fit of the regression line
→ correlation coefficient r
→ coefficient of determination r2
total sum of squares
sample    x     y    y − ȳ    (y − ȳ)²
1         1     2    -8       64
2         3     9    -1        1
3         5     9    -1        1
4         6    11     1        1
5         7    14     4       16
6         9    15     5       25
Σ        31    60     0      108
mean    5.2    10

residuals sum of squares
sample    x     y    y'    y − y'    (y − y')²
Σ        31    60    60    0         10.8
mean    5.2    10    10

r2 = 0.9
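The residuals table lists only its sums, so the following sketch recomputes both sums of squares from the six (x, y) pairs. It assumes the fitted values y' come from the ordinary least-squares line through these points; the variable names are illustrative.

```python
# Six-sample example from the tables above.
x = [1, 3, 5, 6, 7, 9]
y = [2, 9, 9, 11, 14, 15]
n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares slope and intercept.
sp_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
ss_x = sum((xi - mean_x) ** 2 for xi in x)
b = sp_xy / ss_x
a = mean_y - b * mean_x

fitted = [a + b * xi for xi in x]                               # y'
ss_total = sum((yi - mean_y) ** 2 for yi in y)                  # sum of (y - mean y)^2
ss_residual = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # sum of (y - y')^2
r_squared = 1 - ss_residual / ss_total

print(f"SS total = {ss_total:.1f}")
print(f"SS residuals = {ss_residual:.1f}")
print(f"r2 = {r_squared:.2f}")
```

The results should match the table sums (108 and 10.8) and the r2 of 0.9 shown on the slide.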
r2 = coefficient of determination
→ the fraction of the variation in the Y variable that is explained by the X variable
→ range: 0 - 1
$r^2 = 1 - \frac{SS_{residuals}}{SS_{total}} = \frac{SS_{regression}}{SS_{total}}$
$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}}$
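As a cross-check, r can be computed directly from the deviations and then squared. The sketch below does this for both data sets used in the lecture; `pearson_r` is an illustrative helper name, not something from the slides.

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient r from the sums of deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(xs, ys))
    ss_x = sum((xi - mean_x) ** 2 for xi in xs)
    ss_y = sum((yi - mean_y) ** 2 for yi in ys)
    return sp_xy / math.sqrt(ss_x * ss_y)


# Six-sample example.
r_small = pearson_r([1, 3, 5, 6, 7, 9], [2, 9, 9, 11, 14, 15])
# Protein standard curve (n = 10).
r_curve = pearson_r(list(range(1, 11)), [2, 5, 5, 9, 11, 12, 14, 15, 17, 21])

print(f"six samples:    r = {r_small:.3f}, r2 = {r_small ** 2:.3f}")
print(f"standard curve: r = {r_curve:.3f}, r2 = {r_curve ** 2:.3f}")
```

Squaring r gives the same r2 as the sums-of-squares route: 0.9 for the six-sample example and 0.979 for the standard curve.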
protein standard curve (n = 10): total and residuals sums of squares
$r^2 = 1 - \frac{SS_{residuals}}{SS_{total}} = 1 - \frac{6.65}{318.9} = 0.979$
$t = \sqrt{\frac{(n - 2) \cdot r^2}{1 - r^2}}$
$t = \sqrt{\frac{(10 - 2) \cdot 0.9791}{1 - 0.9791}} = 19.4$
$df = \#\ \text{of pairs} - 2$
$p = 5.23 \times 10^{-8}$
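The t statistic follows directly from r2 and the number of pairs. The sketch below reproduces only t and df; turning t into a p value needs the t distribution (a table or a statistics library), which is deliberately left out here. The function name is illustrative.

```python
import math


def t_from_r_squared(r_squared, n_pairs):
    """Return (t, df) for testing whether the correlation differs from zero."""
    df = n_pairs - 2
    t = math.sqrt(df * r_squared / (1 - r_squared))
    return t, df


t, df = t_from_r_squared(0.9791, 10)
print(f"t = {t:.1f}, df = {df}")
```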
$s^2_{yx} = \frac{\sum (y - y')^2}{n - 2} = MS_{Residual}$
$MS_{Regression} = \frac{\sum (y' - \bar{y})^2}{1}$
$s_{yx} = \sqrt{s^2_{yx}} = \sqrt{\frac{\sum (y - y')^2}{n - 2}}$
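For the protein standard curve, the residual variance and its square root can be worked out from the residual sum of squares quoted in the lecture (6.65, n = 10). This is a small arithmetic sketch with illustrative variable names.

```python
import math

ss_residual = 6.65   # sum of (y - y')^2 for the standard curve (slide value)
n = 10               # number of (x, y) pairs

ms_residual = ss_residual / (n - 2)   # s2_yx, the residual variance
s_yx = math.sqrt(ms_residual)         # s_yx, in the units of y

print(f"s2_yx = {ms_residual:.3f}")
print(f"s_yx  = {s_yx:.3f}")
```

Because s_yx is in the same units as y, it can be read as a typical distance of the measured absorbances from the fitted line.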
n      A     B     C
1     30    51    46
2     40    54    54
3     41    50    43
4     41    56    56
5     42    57    50
6     43    49    32
7     46    45    56
8     48    60    46
9     55    52    56
10    60    54    65

         A       B       C
mean    44.6    52.8    50.4
n       10      10      10
stdev    8.4     4.3     9.1
var     69.8    18.8    83.6

$MS_T = \frac{SS_T}{n_T - 1}$
$MS_B = \frac{n_1 (\bar{X}_1 - \bar{\bar{X}})^2 + n_2 (\bar{X}_2 - \bar{\bar{X}})^2 + n_3 (\bar{X}_3 - \bar{\bar{X}})^2}{\text{number of samples} - 1}$
$MS_W = \frac{(n_1 - 1)\, s_1^2 + (n_2 - 1)\, s_2^2 + (n_3 - 1)\, s_3^2}{n_T - \text{number of samples}}$
$F = \frac{\text{larger } s^2}{\text{smaller } s^2}$
$F = \frac{MS_B}{MS_W} = \frac{177.7}{57.4} = 3.10$
$MS_T = 65.7$
$MS_B = 177.7$
$MS_W = 57.4$
[figure: F distribution (density vs. value of F) with the p value marked]
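The three mean squares and the F ratio can be recomputed from the raw A, B, C columns. This is a minimal sketch using Python's `statistics` module; the dictionary layout and variable names are illustrative. The between-groups term weights each squared deviation of a group mean by the group size n.

```python
from statistics import mean, variance

groups = {
    "A": [30, 40, 41, 41, 42, 43, 46, 48, 55, 60],
    "B": [51, 54, 50, 56, 57, 49, 45, 60, 52, 54],
    "C": [46, 54, 43, 56, 50, 32, 56, 46, 56, 65],
}

all_values = [v for g in groups.values() for v in g]
grand_mean = mean(all_values)
k = len(groups)             # number of samples (groups)
n_total = len(all_values)   # n_T

# Between groups: group size times squared deviation of each group mean.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
# Within groups: (n - 1) times the sample variance of each group.
ss_within = sum((len(g) - 1) * variance(g) for g in groups.values())

msb = ss_between / (k - 1)
msw = ss_within / (n_total - k)
mst = (ss_between + ss_within) / (n_total - 1)
f_ratio = msb / msw

print(f"MSB = {msb:.1f}, MSW = {msw:.1f}, MST = {mst:.1f}, F = {f_ratio:.2f}")
```

The output should agree with the slide values MSB = 177.7, MSW = 57.4, MST = 65.7 and F = 3.10.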
regression variance
$MS_{Regression} = \frac{\sum (y' - \bar{y})^2}{1}$
residual variance
$MS_{Residual} = \frac{\sum (y - y')^2}{n - 2}$
$F = \frac{MS_{Regression}}{MS_{Residual}} = \frac{312.2}{6.65/8} = 375.5$
$df = 1, 8$
$p = 5.23 \times 10^{-8}$
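The F ratio for the regression is the same kind of variance ratio as in the ANOVA, with 1 and n − 2 degrees of freedom. The sketch below redoes the arithmetic from the rounded sums of squares quoted in the lecture, so the last digit can differ slightly depending on how the inputs are rounded. Variable names are illustrative.

```python
ss_regression = 312.2   # sum of (y' - mean y)^2, slide value
ss_residual = 6.65      # sum of (y - y')^2, slide value
n = 10                  # number of (x, y) pairs

df_regression = 1
df_residual = n - 2

ms_regression = ss_regression / df_regression   # regression variance
ms_residual = ss_residual / df_residual         # residual variance
f_ratio = ms_regression / ms_residual

print(f"F = {f_ratio:.1f}, df = {df_regression}, {df_residual}")
```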