
Introduction to Linear Regression

Linear regression has many practical uses. Most applications fall into one of the following two broad categories:

• Prediction: linear regression can be used to fit a predictive model to an observed data set of y and X values. After developing such a model, if an additional value of X is given without its accompanying value of y, the fitted model can be used to predict the value of y (see the lecture example: salary as a function of years of education).
• Given a variable y and a number of variables X1, ..., Xp that may be related to y, linear regression analysis can be applied to quantify the strength of the relationship between y and the Xj, to assess which Xj may have no relationship with y at all, and to identify which subsets of the Xj contain redundant information about y (this is especially useful for calibration).

Linear regression models are often fitted using the least squares approach, but they may also be fitted in other ways,
such as by minimizing the "lack of fit" in some other norm (as with least absolute deviations regression), or by
minimizing a penalized version of the least squares loss function as in ridge regression. Conversely, the least squares
approach can be used to fit models that are not linear models. Thus, although the terms "least squares" and "linear
model" are closely linked, they are not synonymous.

In linear regression, data are modeled using linear predictor functions, and unknown model parameters are estimated
from the data. Such models are called linear models. Most commonly, linear regression refers to a model in which the
conditional mean of y given the value of X is an affine function of X.
(text taken from Wikipedia)

Contents
1 Least Squares Methods
1.1 Linear Models
1.2 Weighted and non-weighted linear regression
2 Statistics
2.1 Error Propagation Y -> beta
2.2 Prediction for Y
2.3 Error Propagation y->x

1 Least Squares Methods

The least squares optimality criterion minimizes the sum of squared residuals between the actually observed outputs and the output values predicted by the numerical model from the input observations.

1.1 Linear Models

For measurement devices, a linear relationship between the concentration and the produced signal intensity can usually be assumed. The range in which linearity is observed is the so-called ‘dynamic range’ of the instrument.

The classic least squares problem is to fit a straight line model with a single input x (concentration of the standard)
and a single output y (signal measured) using parameters b and a as shown in Equation 1.

y = a x + b    (1)

Unfortunately, individual data observations xi and yi may not fit the model perfectly due to experimental measurement
error, sample processing variation etc. The unknown, random error is denoted by ei :

y_i = a x_i + b + e_i    (2)

Multiple observations (i = 1..nobs) from a standard calibration line can be expressed as a function of the independent variable x (concentration of the standard), the yet unknown slope a and offset b of the device, and the random errors e:

y_1 = a x_1 + b + e_1
y_2 = a x_2 + b + e_2
...                                    (3)
y_{nobs} = a x_{nobs} + b + e_{nobs}

If only two measurements have been performed, the system can easily be solved for a and b:

a = ( (y_1 - e_1) - (y_2 - e_2) ) / ( x_1 - x_2 ),   b = ( x_1 (y_2 - e_2) - x_2 (y_1 - e_1) ) / ( x_1 - x_2 )    (4)

Please note that the noise e has a direct impact on both the slope and the offset of the calibration line. The result may change significantly from run to run.
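As a quick numerical sketch of Equation 4 in MATLAB (two hypothetical standards; the unknown errors e_1 and e_2 cannot be observed and are therefore set to zero here):

x1 = 1;   y1 = 5.3;                      % first standard (hypothetical)
x2 = 10;  y2 = 55.0;                     % second standard (hypothetical)

% Equation 4 with e1 = e2 = 0
a = ( y1 - y2 ) / ( x1 - x2 );           % slope
b = ( x1*y2 - x2*y1 ) / ( x1 - x2 );     % offset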

Clearly, taking more measurements into account will improve the accuracy. In that case there are more measurements than unknowns, and the solution can no longer be calculated explicitly; instead, a best estimate of the slope and offset is calculated. As best estimate we take the values of a and b that minimize the deviation between the observations and the calibration line. The most common approach is to define the so-called sum of squared errors (SSQ) based on the vertical deviation between the measurements and the regression line.

SSQ = \sum_{i=1}^{nobs} ( y_i - (a x_i + b) )^2    (5)

The best estimate of a and b is obtained when the deviation between the line and the measurements is minimal. Mathematically, this is a minimization problem, i.e. find a and b such that SSQ is minimal:

( \hat{a}, \hat{b} ) = \arg\min_{a,b} \sum_{i=1}^{nobs} ( y_i - (a x_i + b) )^2    (6)

For linear systems, the solution can be found by classical calculus, i.e. at the minimum the first derivatives of the SSQ function are zero:

SSQ SSQ
 0, 0 (7)
a b

After several calculus steps a solution for the parameters a and b is obtained (for details please refer to linear algebra
literature):

b = ( \sum y_i \sum x_i^2 - \sum x_i \sum x_i y_i ) / ( nobs \sum x_i^2 - (\sum x_i)^2 ),   a = ( nobs \sum x_i y_i - \sum x_i \sum y_i ) / ( nobs \sum x_i^2 - (\sum x_i)^2 )    (8)

(all sums run over i = 1..nobs)

These calculations are implemented in Excel in the functions ‘slope’ and ‘intercept’, and in the chart option ‘add trendline’.
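A direct MATLAB transcription of the sums in Equation 8 could look as follows (illustrative data; the variable names are chosen only for this sketch):

x = [ 0; 1; 2; 3; 5; 10; 20 ];                                    % standard concentrations
y = [ 0.07; 5.30; 10.09; 16.29; 27.65; 54.97; 96.46 ];            % measured signals
n = numel( x );                                                   % number of observations

sum_x  = sum( x );      sum_y  = sum( y );
sum_xx = sum( x.^2 );   sum_xy = sum( x.*y );

a = ( n*sum_xy - sum_x*sum_y )      / ( n*sum_xx - sum_x^2 );     % slope, Equation 8
b = ( sum_y*sum_xx - sum_x*sum_xy ) / ( n*sum_xx - sum_x^2 );     % offset, Equation 8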

In MATLAB the calculation is performed via linear regression. There are toolboxes with a graphical interface, but in the end it is faster to program it yourself, and you learn an important engineering programming language along the way. MATLAB works with vectors and matrices – all required data and mathematics are placed in such ‘containers’. The equation system in (3) is therefore formulated in matrix notation.

First, the parameters a and b are the unknowns, thus these are placed in a vector:

\beta = \begin{pmatrix} a \\ b \end{pmatrix}    (9)

With this definition, a matrix A with the known values can be generated, such that A\beta reproduces the system in (3):

y = \begin{pmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_{nobs} & 1 \end{pmatrix} \beta = A \beta    (10)

The parameters a and b (organized in beta) can be calculated using different approaches. Following the approach
described above, matrix notation is used to calculate the SSQ and respective derivatives.

The scalar sum of squared residual errors SSQ (Equation 5) can be written in matrix notation, Equation 12. To minimize the sum of squares, the partial derivative of SSQ with respect to the model parameters \beta is set equal to zero, as shown in Equation 13. Rearranging these terms provides a linear matrix solution for the optimal model parameters, Equation 14. The residuals r are the differences between the fitted values \hat{y} = a x + b and the measurements y:

r = A\beta - y,   with components r_i = a x_i + b - y_i    (11)

SSQ = r^T r = ( A\beta - y )^T ( A\beta - y ) = y^T y - 2 \beta^T A^T y + \beta^T A^T A \beta    (12)

The derivative with respect to \beta becomes:

\partial SSQ / \partial \beta = -2 A^T y + 2 A^T A \beta    (13)

Setting the derivative to zero and rearranging with respect to \beta, we obtain the best estimate for \beta:

\partial SSQ / \partial \beta = -2 A^T y + 2 A^T A \beta = 0,   \beta = ( A^T A )^{-1} A^T y    (14)

In MATLAB, this calculation can be performed in two ways:

Example data to fill x and y:

x_std = [ 0; 1; 2; 3; 5; 10; 20 ];
y_meas = [0.07; 5.30; 10.09;16.29;27.65; 54.97;96.46];
plot( x_std,y_meas, 'sb' ); hold on;

1. MATLAB Matrix calculus

% generate matrix A
A = [ x_std ones( size( x_std ) ) ];

ab = inv( A'*A ) * A' * y_meas;                                      % normal equations: beta = (A'A)^-1 * A'*y
plot( [ 0; x_std(end)], [ab(2) ab(1)*x_std(end)+ab(2)], 'r:' );      % regression line from x = 0 to the highest standard

2. MATLAB ‘\’ command

% calculate regression line


% generate matrix A
A = [ x_std ones( size( x_std ) ) ];
ab = A \ y_meas;                                                     % least-squares solution via the backslash operator
plot( [ 0; x_std(end)], [ab(2) ab(1)*x_std(end)+ab(2)], 'r--' );     % regression line

1.2 Weighted and non-weighted linear regression

Implicitly, the approach taken so far assumed that each measurement has the same absolute error. E.g. for a calibration line from 1 to 10, the first and the last point have the same error, 1 ± 1 and 10 ± 1, which on a relative scale means that the low measurement has an error of 100% while the high standard only 10%.

The assumption of a constant absolute error is not valid for most devices – usually the error has a constant as well as a relative component. Using many repeated standard measurements, the error can be expressed as a function of the concentration, as sketched below.
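One possible way to obtain such an error model, assuming replicate measurements of each standard are available (the replicate matrix and the simple ‘constant plus relative’ error model below are assumptions for this sketch):

% y_rep: hypothetical replicate measurements, one row per standard, one column per replicate
y_rep = [  0.05   0.09   0.08;
           5.10   5.40   5.30;
          10.00  10.20   9.90;
          54.20  55.60  54.90 ];

y_mean = mean( y_rep, 2 );                        % mean signal per standard
y_std  = std( y_rep, 0, 2 );                      % standard deviation per standard

% fit a simple error model sigma = c(1) + c(2)*signal (constant + relative component)
c = [ ones(size(y_mean)) y_mean ] \ y_std;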

Weighting is introduced at the stage of the sum of squares calculation. Instead of the ‘simple’ squared deviation between prediction and measurement, each deviation is weighted with the standard deviation of the corresponding measurement:

SSQ = \sum_{i=1}^{nobs} ( y_i - (a x_i + b) )^2 / \sigma_i^2    (15)

In matrix notation, the variances are collected in a (diagonal) covariance matrix S_y:

S_y = \mathrm{diag}( \sigma_1^2, \sigma_2^2, \ldots, \sigma_{nobs}^2 )    (16)

The SSQ is calculated taking into account the variances:

r = S_y^{-1/2} ( A\beta - y )    (17)

SSQ = \sum r_i^2 = r^T r = ( S_y^{-1/2} (A\beta - y) )^T S_y^{-1/2} (A\beta - y)
    = y^T S_y^{-1} y - 2 \beta^T A^T S_y^{-1} y + \beta^T A^T S_y^{-1} A \beta    (18)

Following the same approach as previously, the best estimate of the parameters a and b is obtained by calculating the
derivative of SSQ with respect to the parameters and determining the zero-crossing:

\partial SSQ / \partial \beta = 0
0 = -2 A^T S_y^{-1} y + 2 A^T S_y^{-1} A \beta,    (19)
\beta = ( A^T S_y^{-1} A )^{-1} A^T S_y^{-1} y

Weighting can have a major impact on the result, especially in the lower range of the calibration line: with relative errors the weight of the lower points increases significantly compared to a constant absolute error.

MATLAB:

x_std = [ 0; 1; 2; 3; 5; 10; 20 ];
y_meas = [0.07; 5.30; 10.09;16.29;27.65; 54.97;96.46];
plot( x_std,y_meas, 'sb' ); hold on;

% generate matrix A
A = [ x_std ones( size( x_std ) ) ];

% generate the covariance matrix Sy

% observation from experiments: approximately 5% relative error
e_meas = 0.05 * y_meas;

Sy = diag( e_meas.^2 );                                  % diagonal covariance matrix of the measurements
ab = inv( A'*inv(Sy)*A ) * A'*inv(Sy) * y_meas;          % weighted least squares, Equation 19

plot( [ 0; x_std(end)], [ab(2) ab(1)*x_std(end)+ab(2)], 'm' );

2 Statistics

Why statistics? To answer the question whether your measurements are actually meaningful ;)

As discussed above, each measurement is affected by noise. This noise influences the accuracy of the calibration line and of the subsequent calculation of concentrations.

2.1 Error Propagation Y -> beta

The error in the slope and offset of the calibration line is calculated following the laws of error propagation. Without going into detail, the error is propagated using the derivative of the estimated parameters with respect to the measurements y:

\beta = ( A^T S_y^{-1} A )^{-1} A^T S_y^{-1} y,   S_\beta = ( \partial\beta / \partial y ) \, S_y \, ( \partial\beta / \partial y )^T    (20)

2.2 Prediction for Y

This information can be applied to calculate the accuracy of the prediction – the forward error propagation: given x and the accuracy of the slope and offset, how accurate will the calculated y value be? Without further details, the law of error propagation is applied:

\hat{y} = a x + b,   S_{\hat{y}} = ( \partial y / \partial\beta ) \, S_\beta \, ( \partial y / \partial\beta )^T    (21)

For a plot of the prediction band (1 sigma, approximately 68%), the following MATLAB code can be used:

x_std = [ 0; 1; 2; 3; 5; 10; 20 ];
y_meas = [0.07; 5.30; 10.09;16.29;27.65; 54.97;96.46];
plot( x_std,y_meas, 'sb' ); hold on;

% generate matrix A
A = [ x_std ones( size( x_std ) ) ];

% generate the covariance matrix Sy

% observation from experiments: approximately 5% relative error
e_meas = 0.05 * y_meas;

Sy = diag( e_meas.^2 );                                  % diagonal covariance matrix of the measurements
ab = inv( A'*inv(Sy)*A ) * A'*inv(Sy) * y_meas;          % weighted least squares, Equation 19

plot( [ 0; x_std(end)], [ab(2) ab(1)*x_std(end)+ab(2)], 'k' );

% calculate errors in parameters
B  = inv( A'*inv(Sy)*A ) * A'*inv(Sy);  % derivative dbeta/dy of the weighted fit, Equation 20
Sb = B * Sy * B';                       % covariance of beta

Xx = linspace( 0, max(x_std), 50 )';    % 50 points between 0 and the highest standard
C  = [ Xx ones(size(Xx)) ];             % design matrix at the prediction points Xx

se = C * Sb * C';                       % covariance of the predictions, Equation 21
se = diag( se ).^0.5;                   % standard error, 68% prediction band

plot( Xx, C*ab + se, 'm:' );            % upper band
plot( Xx, C*ab - se, 'm:' );            % lower band

[Figure: measured standards, regression line and 1-sigma prediction band versus standard concentration (mmol/L); legend: measurements, regression, prediction band]

2.3 Error Propagation y->x

In most cases we are more interested in the ‘reverse’ direction. We have a calibration line and a measurement from a sample with unknown concentration. The concentration is determined from the regression:

x = ( y - b ) / a    (22)

The error propagation is calculated as in the forward direction, but now with the derivative dx/dbeta:

S_x = ( \partial x / \partial\beta ) \, S_\beta \, ( \partial x / \partial\beta )^T    (23)

In addition to the inaccuracy of the regression line, the inaccuracy of the measurement (y ± s_y) itself needs to be taken into account. The linear error propagation is extended with the respective derivative (\partial x / \partial y):

S_x = ( \partial x / \partial\beta ) \, S_\beta \, ( \partial x / \partial\beta )^T + ( \partial x / \partial y ) \, s_y^2 \, ( \partial x / \partial y )^T    (24)

[Figure: measurement signal (au) versus standard concentration (mmol/L)]
