Introduction Linear Regression 2015
Linear regression has many practical uses. Most applications fall into one of the following two broad categories:
Prediction: linear regression can be used to fit a predictive model to an observed data set of y and X values.
After developing such a model, if an additional value of X is then given without its accompanying value of y,
the fitted model can be used to make a prediction of the value of y (see the example in the lecture: Salary as a
function of years of education).
Given a variable y and a number of variables X1, ..., Xp that may be related to y, linear regression analysis can
be applied to quantify the strength of the relationship between y and the Xj, to assess which Xj may have no
relationship with y at all, and to identify which subsets of the Xj contain redundant information about y
(this use is especially relevant for calibration).
Linear regression models are often fitted using the least squares approach, but they may also be fitted in other ways,
such as by minimizing the "lack of fit" in some other norm (as with least absolute deviations regression), or by
minimizing a penalized version of the least squares loss function as in ridge regression. Conversely, the least squares
approach can be used to fit models that are not linear models. Thus, although the terms "least squares" and "linear
model" are closely linked, they are not synonymous.
In linear regression, data are modeled using linear predictor functions, and unknown model parameters are estimated
from the data. Such models are called linear models. Most commonly, linear regression refers to a model in which the
conditional mean of y given the value of X is an affine function of X.
(text taken from Wikipedia)
Contents
1 Least Squares Methods
1.1 Linear Models
1.2 Weighted and non-weighted linear regression
2 Statistics
2.1 Error Propagation Y -> beta
2.2 Prediction for Y
2.3 Error Propagation y -> x
1 Least Squares Methods
The least squares optimality criterion minimizes the sum of squared residuals between the actual observed outputs and
the output values predicted by the numerical model from the input observations.
For measurement devices, a linear relationship between the concentration and the produced signal intensity can be
assumed most of the time. The range in which linearity is observed is the so-called 'dynamic range' of the
instrument.
1.1 Linear Models
The classic least squares problem is to fit a straight-line model with a single input x (concentration of the standard)
and a single output y (measured signal), using slope a and offset b as shown in Equation 1.
$y = a x + b$ (1)
Unfortunately, individual data observations $x_i$ and $y_i$ may not fit the model perfectly due to experimental measurement
error, sample processing variation, etc. The unknown, random error is denoted by $e_i$:
$y_i = a x_i + b + e_i$ (2)
Multiple observations (i = 1 .. n_obs) from a standard calibration line can be expressed as a function of the independent
variable x (concentration of the standard), the yet unknown slope a and offset b of the device, and the random errors
e:
$y_1 = a x_1 + b + e_1$
$y_2 = a x_2 + b + e_2$
$\vdots$
$y_{n_{obs}} = a x_{n_{obs}} + b + e_{n_{obs}}$ (3)
In case only two measurements have been performed, the system can be solved for a and b directly:
$a = \frac{y_2 - y_1}{x_2 - x_1}, \qquad b = y_1 - a\,x_1$ (4)
Please note that the noise e has a direct impact on both the slope and the offset of the
calibration line. The result might change significantly from run to run.
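As an illustration, the following MATLAB sketch repeats such a two-point calibration a few times; the concentrations, the 'true' line and the noise level are assumed example values:
% two-point calibration repeated with random noise (assumed example values)
x = [ 2; 10 ];                               % assumed standard concentrations
for run = 1:3
    y = 5*x + 1 + 0.5*randn( 2, 1 );         % assumed true line y = 5x + 1 plus noise
    a = ( y(2) - y(1) ) / ( x(2) - x(1) );   % slope from the two points
    b = y(1) - a*x(1);                       % offset
    fprintf( 'run %d: a = %.3f, b = %.3f\n', run, a, b );
end
Each run yields a noticeably different slope and offset, which motivates using more than two standards.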
Clearly, taking more measurements into account will improve the accuracy. In this case
more measurements than variables are present, and the solution cannot be calculated
explicitly; instead, a best estimate of the slope and offset is calculated. As the best
estimate for the slope and offset, we minimize the deviation between the observations and
the calibration line. The most common approach is to define the so-called sum of squared
errors (SSQ) based on the vertical deviation between the measurements and the regression line.
$\mathrm{SSQ} = \sum_{i=1}^{n_{obs}} \left( y_i - (a x_i + b) \right)^2$ (5)
The best estimate of a and b is obtained when the deviation between the line and the measurements is minimal.
Mathematically, this is a minimization problem, i.e. find a and b such that SSQ is minimal:
$\min_{a,b}\ \mathrm{SSQ}(a, b)$ (6)
In the case of linear systems, the solution can be found by classical calculus, i.e. the first derivatives of the SSQ
function are zero at the minimum:
$\frac{\partial\,\mathrm{SSQ}}{\partial a} = 0, \qquad \frac{\partial\,\mathrm{SSQ}}{\partial b} = 0$ (7)
After several calculus steps, a solution for the parameters a and b is obtained (for details please refer to the linear
algebra literature):
$a = \frac{n_{obs}\sum x_i y_i - \sum x_i \sum y_i}{n_{obs}\sum x_i^2 - \left(\sum x_i\right)^2}, \qquad b = \bar{y} - a\,\bar{x}$ (8)
These calculations are implemented in Excel in the functions SLOPE and INTERCEPT, and in the chart option
'Add Trendline'.
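As a minimal sketch, Equation 8 can be evaluated directly in MATLAB; the calibration data are the same values used in the example further below:
x = [ 0; 1; 2; 3; 5; 10; 20 ];               % standard concentrations
y = [ 0.07; 5.30; 10.09; 16.29; 27.65; 54.97; 96.46 ];   % measured signals
n = length( x );                             % number of calibration points
a = ( n*sum(x.*y) - sum(x)*sum(y) ) / ( n*sum(x.^2) - sum(x)^2 );   % slope (Equation 8)
b = mean( y ) - a*mean( x );                 % offset (Equation 8)
% the same values are returned by polyfit( x, y, 1 ) and by Excel's SLOPE/INTERCEPT
The matrix formulation introduced below gives the same result and generalizes more easily.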
In MATLAB the calculation is performed via linear regression. There are toolboxes with a graphical interface, but
in the end it is faster to program it yourself, and in doing so you learn an important engineering programming
language. MATLAB works with vectors and matrices: all data and mathematics required will be placed in such
'containers'. First, the equation system in (3) is formulated in matrix notation. The parameters a and b are the
unknowns, so they are placed in a vector:
$\boldsymbol{\beta} = \begin{pmatrix} a \\ b \end{pmatrix}$ (9)
With this definition, a matrix with the known values can be generated, such that A resembles the system in (3):
$\mathbf{A} = \begin{pmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_{n_{obs}} & 1 \end{pmatrix}, \qquad \mathbf{y} = \mathbf{A}\,\boldsymbol{\beta} + \mathbf{e}$ (10)
The residuals r are the differences between the fitted line $\hat{y} = a x + b$ and the measurements, written in matrix
form in Equation 11. The scalar sum of squares SSQ of the residual errors is shown in Equation 12. To minimize the
sum of squares, the partial derivatives of SSQ with respect to the model parameters are set equal to zero, as shown in
Equation 13. Rearranging these terms provides a linear matrix solution for the optimal model parameters, Equation 14.
$\mathbf{r} = \begin{pmatrix} a x_1 + b - y_1 \\ a x_2 + b - y_2 \\ \vdots \\ a x_{n_{obs}} + b - y_{n_{obs}} \end{pmatrix} = \mathbf{A}\,\boldsymbol{\beta} - \mathbf{y}$ (11)
$\mathrm{SSQ} = \mathbf{r}^T\mathbf{r} = \left(\mathbf{A}\boldsymbol{\beta} - \mathbf{y}\right)^T\left(\mathbf{A}\boldsymbol{\beta} - \mathbf{y}\right) = \mathbf{y}^T\mathbf{y} - 2\,\boldsymbol{\beta}^T\mathbf{A}^T\mathbf{y} + \boldsymbol{\beta}^T\mathbf{A}^T\mathbf{A}\,\boldsymbol{\beta}$ (12)
$\frac{\partial\,\mathrm{SSQ}}{\partial\boldsymbol{\beta}} = -2\,\mathbf{A}^T\mathbf{y} + 2\,\mathbf{A}^T\mathbf{A}\,\boldsymbol{\beta}$ (13)
Setting the derivative to zero and rearranging with respect to $\boldsymbol{\beta}$, we obtain the best estimate for $\boldsymbol{\beta}$:
$-2\,\mathbf{A}^T\mathbf{y} + 2\,\mathbf{A}^T\mathbf{A}\,\hat{\boldsymbol{\beta}} = \mathbf{0} \quad\Longrightarrow\quad \hat{\boldsymbol{\beta}} = \left(\mathbf{A}^T\mathbf{A}\right)^{-1}\mathbf{A}^T\mathbf{y}$ (14)
MATLAB:
x_std  = [ 0; 1; 2; 3; 5; 10; 20 ];                          % standard concentrations
y_meas = [ 0.07; 5.30; 10.09; 16.29; 27.65; 54.97; 96.46 ];  % measured signals
plot( x_std, y_meas, 'sb' ); hold on;
% generate matrix A (Equation 10)
A = [ x_std ones( size( x_std ) ) ];
% best estimate of slope and offset (Equation 14); A\y_meas gives the same result
ab = ( A'*A ) \ ( A'*y_meas );
plot( x_std, A*ab, 'r-' );
1.2 Weighted and non-weighted linear regression
Implicitly, with the approach taken so far, it was assumed that each measurement had the same absolute error. E.g.
for a calibration line from 1 to 10, the first and the last point had the same absolute error, 1 ± 1 and 10 ± 1, which
on a relative scale means that the low measurement had an error of 100% while the high standard had only 10%.
The assumption of a constant absolute error is not valid for most devices; usually the error has a constant as well as
a relative component. Using many repeated standard measurements, the error can be expressed as a function of the
concentration.
Weighting is introduced at the stage of the sum of squares calculation. Instead of the ‘simple’ square of the deviation
between prediction and measurement, the error is weighted with the standard deviation of the measurement:
$\mathrm{SSQ} = \sum_{i=1}^{n_{obs}} \frac{\left( y_i - (a x_i + b) \right)^2}{\sigma_i^2}$ (15)
In matrix notation, the standard deviations are represented in a covariance matrix Sy:
$\mathbf{S}_y = \begin{pmatrix} \sigma_1^2 & & & \\ & \sigma_2^2 & & \\ & & \ddots & \\ & & & \sigma_{n_{obs}}^2 \end{pmatrix}$ (16)
$\mathbf{r} = \mathbf{S}_y^{-0.5}\left(\mathbf{A}\boldsymbol{\beta} - \mathbf{y}\right)$ (17)
$\mathrm{SSQ} = \mathbf{r}^T\mathbf{r} = \left(\mathbf{S}_y^{-0.5}(\mathbf{A}\boldsymbol{\beta} - \mathbf{y})\right)^T\left(\mathbf{S}_y^{-0.5}(\mathbf{A}\boldsymbol{\beta} - \mathbf{y})\right) = \mathbf{y}^T\mathbf{S}_y^{-1}\mathbf{y} - 2\,\boldsymbol{\beta}^T\mathbf{A}^T\mathbf{S}_y^{-1}\mathbf{y} + \boldsymbol{\beta}^T\mathbf{A}^T\mathbf{S}_y^{-1}\mathbf{A}\,\boldsymbol{\beta}$ (18)
Following the same approach as previously, the best estimate of the parameters a and b is obtained by calculating the
derivative of SSQ with respect to the parameters and determining its zero-crossing:
$\hat{\boldsymbol{\beta}} = \left(\mathbf{A}^T\mathbf{S}_y^{-1}\mathbf{A}\right)^{-1}\mathbf{A}^T\mathbf{S}_y^{-1}\mathbf{y}$ (19)
Weighting can have a major impact on the result, especially in the lower range of the calibration line, since with
relative errors the accuracy of the lower points increases significantly compared to a constant absolute weight.
MATLAB:
x_std  = [ 0; 1; 2; 3; 5; 10; 20 ];
y_meas = [ 0.07; 5.30; 10.09; 16.29; 27.65; 54.97; 96.46 ];
% standard deviation of each measurement; the values below are an assumed
% error model (constant plus relative part) and must be replaced by the
% error model of the actual device
e_meas = 0.2 + 0.02*y_meas;
plot( x_std, y_meas, 'sb' ); hold on;
% generate matrix A
A = [ x_std ones( size( x_std ) ) ];
% covariance matrix Sy (Equation 16) and weighted solution (Equation 19)
Sy = diag( e_meas.^2 );
ab = inv( A'* inv(Sy) * A ) * A' * inv(Sy) * y_meas;
2 Statistics
As discussed above, each measurement is affected by noise. The noise influences the accuracy of the calibration line
and of the subsequent calculation of concentrations.
2.1 Error Propagation Y -> beta
The error in the slope and offset of the calibration line is calculated following the laws of error propagation. Without
going into detail, the error propagation can be calculated using the derivative of the estimated parameters with respect
to the measurements:
$\hat{\boldsymbol{\beta}} = \left(\mathbf{A}^T\mathbf{S}_y^{-1}\mathbf{A}\right)^{-1}\mathbf{A}^T\mathbf{S}_y^{-1}\mathbf{y}, \qquad \mathbf{S}_\beta = \frac{\partial\boldsymbol{\beta}}{\partial\mathbf{y}}\,\mathbf{S}_y\,\left(\frac{\partial\boldsymbol{\beta}}{\partial\mathbf{y}}\right)^T$ (20)
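Since $\partial\boldsymbol{\beta}/\partial\mathbf{y} = (\mathbf{A}^T\mathbf{S}_y^{-1}\mathbf{A})^{-1}\mathbf{A}^T\mathbf{S}_y^{-1}$, Equation 20 simplifies to $\mathbf{S}_\beta = (\mathbf{A}^T\mathbf{S}_y^{-1}\mathbf{A})^{-1}$. Continuing the weighted MATLAB example above (with the assumed error model e_meas), the 1-sigma uncertainties of slope and offset can be obtained as follows:
Sb   = inv( A' * inv(Sy) * A );     % covariance matrix of the parameters (Equation 20)
s_ab = sqrt( diag( Sb ) );          % 1-sigma errors of slope a and offset b
fprintf( 'a = %.3f +/- %.3f, b = %.3f +/- %.3f\n', ab(1), s_ab(1), ab(2), s_ab(2) );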
2.2 Prediction for Y
This information can be applied to calculate the accuracy of the prediction, i.e. the forward error propagation: given x
and the accuracy of the slope and offset, how accurate will the calculated y value be? Without further details, the law
of error propagation is applied:
$\hat{y} = a x + b, \qquad S_{\hat{y}} = \frac{\partial \hat{y}}{\partial\boldsymbol{\beta}}\,\mathbf{S}_\beta\,\left(\frac{\partial \hat{y}}{\partial\boldsymbol{\beta}}\right)^T$ (21)
For a plot of the prediction band (1 sigma, 68% accuracy), the following MATLAB code can be used:
x_std  = [ 0; 1; 2; 3; 5; 10; 20 ];
y_meas = [ 0.07; 5.30; 10.09; 16.29; 27.65; 54.97; 96.46 ];
e_meas = 0.2 + 0.02*y_meas;                  % assumed error model (see above)
plot( x_std, y_meas, 'sb' ); hold on;
% generate matrix A
A  = [ x_std ones( size( x_std ) ) ];
Sy = diag( e_meas.^2 );
ab = inv( A'* inv(Sy) * A ) * A' * inv(Sy) * y_meas;
Sb = inv( A'* inv(Sy) * A );                 % parameter covariance (Equation 20)
y_fit  = A*ab;                               % regression line
s_yfit = sqrt( diag( A*Sb*A' ) );            % 1-sigma error of the prediction (Equation 21)
plot( x_std, y_fit, 'r-' );
plot( x_std, [ y_fit-s_yfit y_fit+s_yfit ], 'r--' );
(Figure: measurements, regression line and 1-sigma prediction band versus standard concentration (mmol/L))
2.3 Error Propagation y -> x
In most cases we are more interested in the 'reverse' direction: we have a calibration line and a measurement from a
sample with unknown concentration. The concentration is determined from the regression:
$x = \frac{y - b}{a}$ (22)
The error propagation is calculated as in the forward direction, but now with the derivative $\partial x/\partial\boldsymbol{\beta}$:
$S_x = \frac{\partial x}{\partial\boldsymbol{\beta}}\,\mathbf{S}_\beta\,\left(\frac{\partial x}{\partial\boldsymbol{\beta}}\right)^T$ (23)
In addition to the inaccuracy of the regression line, the inaccuracy of the measurement itself ($y \pm s_y$) needs to be
taken into account. The linear error propagation is extended with the respective derivative ($\partial x/\partial y$):
$S_x = \frac{\partial x}{\partial\boldsymbol{\beta}}\,\mathbf{S}_\beta\,\left(\frac{\partial x}{\partial\boldsymbol{\beta}}\right)^T + \left(\frac{\partial x}{\partial y}\right)^2 s_y^2$ (24)
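As a minimal sketch of this reverse calculation in MATLAB, assuming the weighted regression results ab and Sb from above and an assumed sample signal y_sample with standard deviation s_y:
y_sample = 40;   s_y = 1.0;                       % assumed sample signal and its standard deviation
a = ab(1);  b = ab(2);                            % slope and offset from the weighted regression above
x_sample = ( y_sample - b ) / a;                  % concentration of the sample (Equation 22)
dxdb = [ -( y_sample - b )/a^2, -1/a ];           % derivative dx/dbeta = [dx/da, dx/db]
dxdy = 1/a;                                       % derivative dx/dy
s_x  = sqrt( dxdb*Sb*dxdb' + dxdy^2 * s_y^2 );    % combined 1-sigma error of x (Equation 24)
fprintf( 'x = %.2f +/- %.2f mmol/L\n', x_sample, s_x );
The first term reflects the uncertainty of the calibration line, the second the uncertainty of the sample measurement itself.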
(Figure: calibration line with measurement signal (au) versus standard concentration (mmol/L))