Regression
2022/2023
Luís Paquete
University of Coimbra
Linear Regression
Regression model
$y_i = a + b x_i + e_i$
where $e_i$ is the residual for the $i$-th measurement, that is, the difference between the measured value $y_i$ and the value that would have been predicted by the model, $a + b x_i$.
Regression model
● Why minimize the sum of squared residuals rather than the sum of absolute differences? The absolute value function is not differentiable at 0, so the minimizers cannot easily be found by setting the derivatives to zero.
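Minimizing the sum of squared residuals $\sum_i e_i^2$ does give closed-form estimates: setting the partial derivatives with respect to $a$ and $b$ to zero yields $b = (\sum_i x_i y_i - n\bar{x}\bar{y}) / (\sum_i x_i^2 - n\bar{x}^2)$ and $a = \bar{y} - b\bar{x}$. The R sketch below evaluates these formulas on the binary-search data that appears later in these notes and cross-checks them against `lm`; the helper name `fit_simple` is hypothetical.

```r
# Closed-form least-squares estimates for y = a + b*x (simple linear regression).
# fit_simple is a hypothetical helper written for illustration.
fit_simple <- function(x, y) {
  n <- length(x)
  xbar <- mean(x)
  ybar <- mean(y)
  b <- (sum(x * y) - n * xbar * ybar) / (sum(x^2) - n * xbar^2)
  a <- ybar - b * xbar
  c(a = a, b = b)
}

# Binary-search data from these notes: list size vs. CPU time
size <- 1:9
time <- c(6.91, 7.60, 8.00, 8.29, 8.52, 8.70, 8.85, 8.99, 9.01)

est <- fit_simple(size, time)
print(est)

# Cross-check with R's built-in least-squares fit
print(coef(lm(time ~ size)))

# Residuals e_i = y_i - (a + b x_i) and their sum of squares
e <- time - (est[["a"]] + est[["b"]] * size)
print(sum(e^2))
```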
Example
Develop a regression model to relate the time required to perform a file-read operation to
the number of bytes read
Example in R
● Multiple linear regression extends linear regression to $k > 1$ independent input variables:
$y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_k x_k$
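● Each data point $(x_{1i}, x_{2i}, \dots, x_{ki}, y_i)$ can be expressed as
$y_i = b_0 + b_1 x_{1i} + b_2 x_{2i} + \dots + b_k x_{ki} + e_i$
or, for $n$ data points in matrix form, $\mathbf{y} = X\mathbf{b} + \mathbf{e}$, where row $i$ of the $n \times (k+1)$ matrix $X$ is $(1, x_{1i}, \dots, x_{ki})$.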
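For the matrix form $\mathbf{y} = X\mathbf{b} + \mathbf{e}$, the least-squares estimate solves the normal equations $X^\top X\,\mathbf{b} = X^\top \mathbf{y}$. The `regr5.in` data used in the next example is not reproduced in these notes, so this R sketch builds a hypothetical two-regressor design ($x_1$ = list size, $x_2$ = its square) from the binary-search data later in these notes, only to show that solving the normal equations matches `lm`. The helper `ols_normal` is hypothetical.

```r
# Least squares in matrix form: solve (X^T X) b = X^T y.
# ols_normal is a hypothetical helper written for illustration.
ols_normal <- function(X, y) {
  drop(solve(crossprod(X), crossprod(X, y)))
}

size <- 1:9
time <- c(6.91, 7.60, 8.00, 8.29, 8.52, 8.70, 8.85, 8.99, 9.01)

# Hypothetical choice of k = 2 regressors: x1 = size, x2 = size^2.
# The first column of ones corresponds to the intercept b0.
X <- cbind(1, size, size^2)
b <- ols_normal(X, time)
print(b)

# Same model fitted with lm(); I() keeps size^2 as arithmetic inside the formula
print(coef(lm(time ~ size + I(size^2))))
```

In practice `lm` solves the least-squares problem through a QR decomposition of $X$ instead of forming $X^\top X$ explicitly, which is numerically more stable.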
Example
Develop a regression model to relate the time required to perform a certain number of
input-output and memory operations
Example in R
> D <- read.table("regr5.in", header = TRUE)
> lr.out <- lm(time ~ IO + Mem, data = D)
> summary(lr.out)

Call:
lm(formula = time ~ IO + Mem, data = D)

Residuals:
      1       2       3       4       5       6
 2.9144 -1.7523  0.9941 -2.2725 -3.9086  4.0248

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.779630   2.947538  -0.604    0.589
IO           0.111336   0.003698  30.104 8.05e-05 ***
Mem          0.055185   0.036737   1.502    0.230
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.049 on 3 degrees of freedom
Multiple R-squared:  0.9967, Adjusted R-squared:  0.9945
F-statistic: 454.2 on 2 and 3 DF,  p-value: 0.0001888
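$y = -1.780 + 0.111 x_1 + 0.055 x_2$, where $x_1$ is the number of I/O operations and $x_2$ the number of memory operations. Only the I/O coefficient is significant at the 0.05 level (p = 8.05e-05); the memory coefficient (p = 0.230) and the intercept (p = 0.589) are not.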
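$\mathbf{Y} = \mathbf{B}_0 + \mathbf{B}_1 x_1 + \mathbf{B}_2 x_2 + \dots + \mathbf{B}_k x_k$, where $\mathbf{Y}$ and $\mathbf{B}_0, \dots, \mathbf{B}_k$ are vectors whose $j$-th component refers to the $j$-th output variable
● Each data point $(x_{1i}, x_{2i}, \dots, x_{ki}, y_{ij})$ can be expressed as
$y_{ij} = b_{0j} + b_{1j} x_{1i} + b_{2j} x_{2i} + \dots + b_{kj} x_{ki} + e_{ij}$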
Coefficient of determination
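● It determines how much of the total variation is "explained" by the linear model.
● SST is the total variation of the measured system output, $SST = \sum_i (y_i - \bar{y})^2$; the coefficient of determination is the fraction of it explained by the regression, $R^2 = SSR/SST = (SST - SSE)/SST$, where $SSE = \sum_i e_i^2$.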
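A sketch in R of computing $R^2$ from its definition, $R^2 = (SST - SSE)/SST$ with $SSE = \sum_i e_i^2$: it fits the simple linear model to the binary-search data from later in these notes, computes SST, SSE and SSR = SST - SSE by hand, and compares the result with the value reported by `summary`.

```r
size <- 1:9
time <- c(6.91, 7.60, 8.00, 8.29, 8.52, 8.70, 8.85, 8.99, 9.01)
fit <- lm(time ~ size)

SST <- sum((time - mean(time))^2)   # total variation of the output
SSE <- sum(residuals(fit)^2)        # variation left unexplained
SSR <- SST - SSE                    # variation explained by the regression
R2 <- SSR / SST
print(c(SST = SST, SSE = SSE, SSR = SSR, R2 = R2))

# Value reported by R
print(summary(fit)$r.squared)
```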
Coefficient of correlation
● It indicates whether the correlation between input and output is positive ($0 < r \le 1$) or negative ($-1 \le r < 0$), and its magnitude indicates the strength of the linear relation. For simple linear regression, $r^2 = R^2$.
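As a sketch in R, the CPU-time vs. processors data from later in these notes shows a negative correlation, since CPU time falls as processors are added. The snippet computes $r = S_{xy}/\sqrt{S_{xx} S_{yy}}$ by hand, with $S_{xy} = \sum_i (x_i - \bar{x})(y_i - \bar{y})$ and analogous $S_{xx}$, $S_{yy}$, compares it with `cor`, and checks that $r^2$ equals the $R^2$ of the simple linear fit.

```r
processors <- 1:9
cpu <- c(100, 54, 25, 18, 15, 12, 10, 12, 8)

# Deviations from the means
dx <- processors - mean(processors)
dy <- cpu - mean(cpu)
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
r <- cor(processors, cpu)
print(c(manual = r_manual, cor = r))

# For simple linear regression, r^2 equals the coefficient of determination
print(c(r_squared = r^2, R2 = summary(lm(cpu ~ processors))$r.squared))
```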
Example
Develop a regression model to relate the time required to perform a file-read operation to
the number of bytes read
Example in R
Example:
Transformations
Rule of Thumb 1: Transforming y may correct problems with the error terms.
Transformations
$y = a b^x$
by taking the logarithm of both sides,
$\ln y = \ln a + (\ln b)\,x$
the expression has a linear form: $y' = a' + b'x$, with $y' = \ln y$, $a' = \ln a$ and $b' = \ln b$.
Example
Develop a regression model for the number of transistors in the following years
Year Transistors
1 9500
2 16000
3 23000
4 38000
5 62000
6 105000
Example in R
Example
Develop a regression model for the number of transistors in the following years
Year ln(Transistors)
1 9.1590
2 9.6803
3 10.0432
4 10.5453
5 11.0349
6 11.5617
$b' = 0.474$
$a' = 8.679$
$y' = 8.679 + 0.474x$
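The fit can be reproduced in R. This sketch regresses $\ln y$ on the year, which should give approximately the $a'$ and $b'$ above (small differences arise because the logarithms in the table are rounded), and then back-transforms with $a = e^{a'}$ and $b = e^{b'}$ to obtain $y = a b^x$ on the original scale; with $a' = 8.679$ and $b' = 0.474$ this is roughly $y \approx 5878 \cdot 1.606^x$.

```r
year <- 1:6
transistors <- c(9500, 16000, 23000, 38000, 62000, 105000)

# Linearized model: ln(y) = a' + b' x  (log() is the natural logarithm)
fit <- lm(log(transistors) ~ year)
a_prime <- coef(fit)[[1]]
b_prime <- coef(fit)[[2]]

# Back-transform to y = a * b^x
a <- exp(a_prime)
b <- exp(b_prime)
print(c(a_prime = a_prime, b_prime = b_prime, a = a, b = b))

# Predicted transistor counts on the original scale
pred <- a * b^year
print(round(pred))
```

Least squares was applied to $\ln y$, not to $y$ itself, so the back-transformed curve is not the least-squares fit on the original scale; this is one reason transformed models need care in interpretation.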
Example in R
Example
Develop a regression model for the number of transistors in the following years
Year Transistors
1 9500
2 16000
3 23000
4 38000
5 62000
6 105000
Example
Develop a regression model for the relation between CPU-time and number of processors
Processors CPU-time
1 100
2 54
3 25
4 18
5 15
6 12
7 10
8 12
9 8
Example in R
Example
Reciprocal transformation:
Processors 1/CPU-time
1 0.01
2 0.02
3 0.04
4 0.06
5 0.07
6 0.08
7 0.10
8 0.08
9 0.13
Example in R
Example
Develop a regression model for the relation between CPU-time and number of processors
Processors CPU-time
1 100
2 54
3 25
4 18
5 15
6 12
7 10
8 12
9 8
$y = (-0.002 + 0.013x)^{-1}$
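A sketch of the reciprocal fit in R: regress $1/y$ on the number of processors, which should reproduce the coefficients $-0.002$ and $0.013$ to three decimals, then invert the fitted values to get predictions in CPU-time units. Writing `I(1 / cpu)` makes the formula treat `1 / cpu` as arithmetic rather than formula syntax.

```r
processors <- 1:9
cpu <- c(100, 54, 25, 18, 15, 12, 10, 12, 8)

# Linearized model: 1/y = a + b x
fit <- lm(I(1 / cpu) ~ processors)
print(round(coef(fit), 3))

# Back-transform: y = 1 / (a + b x)
pred <- 1 / fitted(fit)
print(round(pred, 1))
```

The back-transformed model diverges where $a + bx$ approaches zero, so predictions outside the measured range of processors should be treated with caution.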
Example
Develop a regression model for the CPU-time of binary search given a list size
Size CPU-time
1 6.91
2 7.60
3 8.00
4 8.29
5 8.52
6 8.70
7 8.85
8 8.99
9 9.01
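The fitted model for this example is not reproduced in these notes. As an assumption motivated by binary search performing $O(\log n)$ comparisons, this R sketch compares a straight-line fit with the logarithmic transformation $y = a + b \ln x$, using $R^2$ as the yardstick.

```r
size <- 1:9
time <- c(6.91, 7.60, 8.00, 8.29, 8.52, 8.70, 8.85, 8.99, 9.01)

fit_lin <- lm(time ~ size)        # y = a + b x
fit_log <- lm(time ~ log(size))   # assumed model: y = a + b ln(x)

r2 <- c(linear = summary(fit_lin)$r.squared,
        logarithmic = summary(fit_log)$r.squared)
print(r2)
print(coef(fit_log))
```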
Example in R
Example
Example in R
Example
Develop a regression model for the CPU-time of binary search given a list size
Size CPU-time
1 6.91
2 7.60
3 8.00
4 8.29
5 8.52
6 8.70
7 8.85
8 8.99
9 9.01
Example
Example in R
Example
Example in R
Example
● The linear regression model assumes a linear relationship between the input variable and the output variable.
● The multiple linear regression model deals with more than one input variable.
● The coefficient of determination is the fraction of the total variation that is explained by the linear model.
● The assumptions of linear regression need to be met to ensure that the model can be used for inference (e.g., prediction).
● Transformations can be applied in order to model polynomial, exponential or inverse
relationships, but some care must be taken in the interpretation of the resulting model.
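The usual assumptions are that the errors are independent, have zero mean and constant variance, and are normally distributed. A minimal R sketch of two quick checks, applied here to the reciprocal CPU-time model from these notes: least-squares residuals of a model with an intercept sum to zero, and a Shapiro-Wilk test gives a rough check of normality.

```r
processors <- 1:9
cpu <- c(100, 54, 25, 18, 15, 12, 10, 12, 8)
fit <- lm(I(1 / cpu) ~ processors)
res <- residuals(fit)

# With an intercept in the model, least-squares residuals sum to zero
print(sum(res))
print(round(res, 4))

# Shapiro-Wilk test of normality; with only 9 points it has little power
sw <- shapiro.test(res)
print(sw)
```

`plot(fit)` draws R's standard diagnostic plots (residuals vs. fitted values, normal Q-Q, scale-location, residuals vs. leverage), which usually reveal more than a single test.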