Introduction Linear Regression 2015
Linear regression has many practical uses. Most applications fall into one of the following two broad categories:
Prediction: linear regression can be used to fit a predictive model to an observed data set of y and X values.
After developing such a model, if an additional value of X is then given without its accompanying value of y,
the fitted model can be used to make a prediction of the value of y (see the example in the lecture: Salary as a
function of years of education).
Given a variable y and a number of variables X1, ..., Xp that may be related to y, linear regression analysis can
be applied to quantify the strength of the relationship between y and the Xj, to assess which Xj may have no
relationship with y at all, and to identify which subsets of the Xj contain redundant information about y
(this use is especially relevant for calibration).
Linear regression models are often fitted using the least squares approach, but they may also be fitted in other ways,
such as by minimizing the "lack of fit" in some other norm (as with least absolute deviations regression), or by
minimizing a penalized version of the least squares loss function as in ridge regression. Conversely, the least squares
approach can be used to fit models that are not linear models. Thus, although the terms "least squares" and "linear
model" are closely linked, they are not synonymous.
In linear regression, data are modeled using linear predictor functions, and unknown model parameters are estimated
from the data. Such models are called linear models. Most commonly, linear regression refers to a model in which the
conditional mean of y given the value of X is an affine function of X.
(text taken from Wikipedia)
Contents
1 Least Squares Methods
1.1 Linear Models
1.2 Weighted and non-weighted linear regression
2 Statistics
2.1 Error Propagation Y -> beta
2.2 Prediction for Y
2.3 Error Propagation y -> x
1 Least Squares Methods
The least squares optimality criterion minimizes the sum of squared residuals between the actual observed outputs and
the output values predicted by the numerical model from the input observations.
For measurement devices, a linear relationship between the concentration and the produced signal intensity can be
assumed most of the time. The range in which linearity is observed is the so-called 'dynamic range' of the
instrument.
1.1 Linear Models
The classic least squares problem is to fit a straight-line model with a single input x (concentration of the standard)
and a single output y (measured signal), using slope a and offset b as shown in Equation 1.
$y = a x + b$ (1)
Unfortunately, individual data observations $x_i$ and $y_i$ may not fit the model perfectly due to experimental measurement
error, sample processing variation, etc. The unknown, random error is denoted by $e_i$:
$y_i = a x_i + b + e_i$ (2)
Multiple observations (i = 1 .. n_obs) from a standard calibration line can be expressed as a function of the independent
variable x (concentration of the standard), the yet unknown slope a and offset b of the device, and the random errors
e:
$y_1 = a x_1 + b + e_1$
$y_2 = a x_2 + b + e_2$
$\vdots$
$y_{n_{obs}} = a x_{n_{obs}} + b + e_{n_{obs}}$ (3)
In case only two measurements have been performed, the system can be solved for a and b directly:
$a = \frac{y_2 - y_1}{x_2 - x_1}, \qquad b = y_1 - a\,x_1$ (4)
Please note that the noise e has a direct impact on both the slope and the offset of the
calibration line. The result might change significantly from run to run.
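As an illustration, the following MATLAB sketch repeats such a two-point calibration a few times; the concentrations, the 'true' line and the noise level are assumed example values:
% two-point calibration repeated with random noise (assumed example values)
x = [ 2; 10 ];                               % assumed standard concentrations
for run = 1:3
    y = 5*x + 1 + 0.5*randn( 2, 1 );         % assumed true line y = 5x + 1 plus noise
    a = ( y(2) - y(1) ) / ( x(2) - x(1) );   % slope from the two points
    b = y(1) - a*x(1);                       % offset
    fprintf( 'run %d: a = %.3f, b = %.3f\n', run, a, b );
end
Each run yields a noticeably different slope and offset, which motivates using more than two standards.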
Clearly, taking more measurements into account will improve the accuracy. In this case
more measurements than variables are present, and the solution cannot be calculated
explicitly; instead, a best estimate of the slope and offset is calculated. As the best
estimate for the slope and offset, we minimize the deviation between the observations and
the calibration line. The most common approach is to define the so-called sum of squared
errors (SSQ) based on the vertical deviation between the measurements and the regression line.
$\mathrm{SSQ} = \sum_{i=1}^{n_{obs}} \left( y_i - (a x_i + b) \right)^2$ (5)
The best estimate of a and b is obtained when the deviation between the line and the measurements is minimal.
Mathematically, this is a minimization problem, i.e. find a and b such that SSQ is minimal:
$\min_{a,b}\ \mathrm{SSQ}(a, b)$ (6)
In the case of linear systems, the solution can be found by classical calculus, i.e. the first derivatives of the SSQ
function are zero at the minimum:
$\frac{\partial\,\mathrm{SSQ}}{\partial a} = 0, \qquad \frac{\partial\,\mathrm{SSQ}}{\partial b} = 0$ (7)
After several calculus steps, a solution for the parameters a and b is obtained (for details please refer to the linear
algebra literature):
$a = \frac{n_{obs}\sum x_i y_i - \sum x_i \sum y_i}{n_{obs}\sum x_i^2 - \left(\sum x_i\right)^2}, \qquad b = \bar{y} - a\,\bar{x}$ (8)
These calculations are implemented in Excel in the functions SLOPE and INTERCEPT, and in the chart option
'Add Trendline'.
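As a minimal sketch, Equation 8 can be evaluated directly in MATLAB; the calibration data are the same values used in the example further below:
x = [ 0; 1; 2; 3; 5; 10; 20 ];               % standard concentrations
y = [ 0.07; 5.30; 10.09; 16.29; 27.65; 54.97; 96.46 ];   % measured signals
n = length( x );                             % number of calibration points
a = ( n*sum(x.*y) - sum(x)*sum(y) ) / ( n*sum(x.^2) - sum(x)^2 );   % slope (Equation 8)
b = mean( y ) - a*mean( x );                 % offset (Equation 8)
% the same values are returned by polyfit( x, y, 1 ) and by Excel's SLOPE/INTERCEPT
The matrix formulation introduced below gives the same result and generalizes more easily.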
In MATLAB the calculation is performed via linear regression. There are toolboxes with a graphical interface, but
in the end it is faster to program it yourself, and in doing so you learn an important engineering programming
language. MATLAB works with vectors and matrices: all data and mathematics required will be placed in such
'containers'. First, the equation system in (3) is formulated in matrix notation. The parameters a and b are the
unknowns, so they are placed in a vector:
$\boldsymbol{\beta} = \begin{pmatrix} a \\ b \end{pmatrix}$ (9)
With this definition, a matrix with the known values can be generated, such that A resembles the system in (3):
$\mathbf{A} = \begin{pmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_{n_{obs}} & 1 \end{pmatrix}, \qquad \mathbf{y} = \mathbf{A}\,\boldsymbol{\beta} + \mathbf{e}$ (10)
The residuals r are the differences between the fitted line $\hat{y} = a x + b$ and the measurements, written in matrix
form in Equation 11. The scalar sum of squares SSQ of the residual errors is shown in Equation 12. To minimize the
sum of squares, the partial derivatives of SSQ with respect to the model parameters are set equal to zero, as shown in
Equation 13. Rearranging these terms provides a linear matrix solution for the optimal model parameters, Equation 14.
$\mathbf{r} = \begin{pmatrix} a x_1 + b - y_1 \\ a x_2 + b - y_2 \\ \vdots \\ a x_{n_{obs}} + b - y_{n_{obs}} \end{pmatrix} = \mathbf{A}\,\boldsymbol{\beta} - \mathbf{y}$ (11)
$\mathrm{SSQ} = \mathbf{r}^T\mathbf{r} = \left(\mathbf{A}\boldsymbol{\beta} - \mathbf{y}\right)^T\left(\mathbf{A}\boldsymbol{\beta} - \mathbf{y}\right) = \mathbf{y}^T\mathbf{y} - 2\,\boldsymbol{\beta}^T\mathbf{A}^T\mathbf{y} + \boldsymbol{\beta}^T\mathbf{A}^T\mathbf{A}\,\boldsymbol{\beta}$ (12)
$\frac{\partial\,\mathrm{SSQ}}{\partial\boldsymbol{\beta}} = -2\,\mathbf{A}^T\mathbf{y} + 2\,\mathbf{A}^T\mathbf{A}\,\boldsymbol{\beta}$ (13)
Setting the derivative to zero and rearranging with respect to $\boldsymbol{\beta}$, we obtain the best estimate for $\boldsymbol{\beta}$:
$-2\,\mathbf{A}^T\mathbf{y} + 2\,\mathbf{A}^T\mathbf{A}\,\hat{\boldsymbol{\beta}} = \mathbf{0} \quad\Longrightarrow\quad \hat{\boldsymbol{\beta}} = \left(\mathbf{A}^T\mathbf{A}\right)^{-1}\mathbf{A}^T\mathbf{y}$ (14)
MATLAB:
x_std  = [ 0; 1; 2; 3; 5; 10; 20 ];                          % standard concentrations
y_meas = [ 0.07; 5.30; 10.09; 16.29; 27.65; 54.97; 96.46 ];  % measured signals
plot( x_std, y_meas, 'sb' ); hold on;
% generate matrix A (Equation 10)
A = [ x_std ones( size( x_std ) ) ];
% best estimate of slope and offset (Equation 14); A\y_meas gives the same result
ab = ( A'*A ) \ ( A'*y_meas );
plot( x_std, A*ab, 'r-' );
1.2 Weighted and non-weighted linear regression
Implicitly, with the approach taken so far, it was assumed that each measurement had the same absolute error. E.g.
for a calibration line from 1 to 10, the first and the last point had the same absolute error, 1 ± 1 and 10 ± 1, which
on a relative scale means that the low measurement had an error of 100% while the high standard had only 10%.
The assumption of a constant absolute error is not valid for most devices; usually the error has a constant as well as
a relative component. Using many repeated standard measurements, the error can be expressed as a function of the
concentration.
Weighting is introduced at the stage of the sum of squares calculation. Instead of the ‘simple’ square of the deviation
between prediction and measurement, the error is weighted with the standard deviation of the measurement:
$\mathrm{SSQ} = \sum_{i=1}^{n_{obs}} \frac{\left( y_i - (a x_i + b) \right)^2}{\sigma_i^2}$ (15)
In matrix notation, the standard deviations are represented in a covariance matrix Sy:
$\mathbf{S}_y = \begin{pmatrix} \sigma_1^2 & & & \\ & \sigma_2^2 & & \\ & & \ddots & \\ & & & \sigma_{n_{obs}}^2 \end{pmatrix}$ (16)
$\mathbf{r} = \mathbf{S}_y^{-0.5}\left(\mathbf{A}\boldsymbol{\beta} - \mathbf{y}\right)$ (17)
$\mathrm{SSQ} = \mathbf{r}^T\mathbf{r} = \left(\mathbf{S}_y^{-0.5}(\mathbf{A}\boldsymbol{\beta} - \mathbf{y})\right)^T\left(\mathbf{S}_y^{-0.5}(\mathbf{A}\boldsymbol{\beta} - \mathbf{y})\right) = \mathbf{y}^T\mathbf{S}_y^{-1}\mathbf{y} - 2\,\boldsymbol{\beta}^T\mathbf{A}^T\mathbf{S}_y^{-1}\mathbf{y} + \boldsymbol{\beta}^T\mathbf{A}^T\mathbf{S}_y^{-1}\mathbf{A}\,\boldsymbol{\beta}$ (18)
Following the same approach as previously, the best estimate of the parameters a and b is obtained by calculating the
derivative of SSQ with respect to the parameters and determining its zero-crossing:
$\hat{\boldsymbol{\beta}} = \left(\mathbf{A}^T\mathbf{S}_y^{-1}\mathbf{A}\right)^{-1}\mathbf{A}^T\mathbf{S}_y^{-1}\mathbf{y}$ (19)
Weighting can have a major impact on the result, especially in the lower range of the calibration line, since with
relative errors the accuracy of the lower points increases significantly compared to a constant absolute weight.
MATLAB:
x_std  = [ 0; 1; 2; 3; 5; 10; 20 ];
y_meas = [ 0.07; 5.30; 10.09; 16.29; 27.65; 54.97; 96.46 ];
% standard deviation of each measurement; the values below are an assumed
% error model (constant plus relative part) and must be replaced by the
% error model of the actual device
e_meas = 0.2 + 0.02*y_meas;
plot( x_std, y_meas, 'sb' ); hold on;
% generate matrix A
A = [ x_std ones( size( x_std ) ) ];
% covariance matrix Sy (Equation 16) and weighted solution (Equation 19)
Sy = diag( e_meas.^2 );
ab = inv( A'* inv(Sy) * A ) * A' * inv(Sy) * y_meas;
2 Statistics
As discussed above, each measurement is affected by noise. The noise influences the accuracy of the calibration line
and of the subsequent calculation of concentrations.
2.1 Error Propagation Y -> beta
The error in the slope and offset of the calibration line is calculated following the laws of error propagation. Without
going into detail, the error propagation can be calculated using the derivative of the estimated parameters with respect
to the measurements:
$\hat{\boldsymbol{\beta}} = \left(\mathbf{A}^T\mathbf{S}_y^{-1}\mathbf{A}\right)^{-1}\mathbf{A}^T\mathbf{S}_y^{-1}\mathbf{y}, \qquad \mathbf{S}_\beta = \frac{\partial\boldsymbol{\beta}}{\partial\mathbf{y}}\,\mathbf{S}_y\,\left(\frac{\partial\boldsymbol{\beta}}{\partial\mathbf{y}}\right)^T$ (20)
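Since $\partial\boldsymbol{\beta}/\partial\mathbf{y} = (\mathbf{A}^T\mathbf{S}_y^{-1}\mathbf{A})^{-1}\mathbf{A}^T\mathbf{S}_y^{-1}$, Equation 20 simplifies to $\mathbf{S}_\beta = (\mathbf{A}^T\mathbf{S}_y^{-1}\mathbf{A})^{-1}$. Continuing the weighted MATLAB example above (with the assumed error model e_meas), the 1-sigma uncertainties of slope and offset can be obtained as follows:
Sb   = inv( A' * inv(Sy) * A );     % covariance matrix of the parameters (Equation 20)
s_ab = sqrt( diag( Sb ) );          % 1-sigma errors of slope a and offset b
fprintf( 'a = %.3f +/- %.3f, b = %.3f +/- %.3f\n', ab(1), s_ab(1), ab(2), s_ab(2) );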
2.2 Prediction for Y
This information can be applied to calculate the accuracy of the prediction, i.e. the forward error propagation: given x
and the accuracy of the slope and offset, how accurate will the calculated y value be? Without further details, the law
of error propagation is applied:
$\hat{y} = a x + b, \qquad S_{\hat{y}} = \frac{\partial \hat{y}}{\partial\boldsymbol{\beta}}\,\mathbf{S}_\beta\,\left(\frac{\partial \hat{y}}{\partial\boldsymbol{\beta}}\right)^T$ (21)
For a plot of the prediction band (1 sigma, 68% accuracy), the following MATLAB code can be used:
x_std  = [ 0; 1; 2; 3; 5; 10; 20 ];
y_meas = [ 0.07; 5.30; 10.09; 16.29; 27.65; 54.97; 96.46 ];
e_meas = 0.2 + 0.02*y_meas;                  % assumed error model (see above)
plot( x_std, y_meas, 'sb' ); hold on;
% generate matrix A
A  = [ x_std ones( size( x_std ) ) ];
Sy = diag( e_meas.^2 );
ab = inv( A'* inv(Sy) * A ) * A' * inv(Sy) * y_meas;
Sb = inv( A'* inv(Sy) * A );                 % parameter covariance (Equation 20)
y_fit  = A*ab;                               % regression line
s_yfit = sqrt( diag( A*Sb*A' ) );            % 1-sigma error of the prediction (Equation 21)
plot( x_std, y_fit, 'r-' );
plot( x_std, [ y_fit-s_yfit y_fit+s_yfit ], 'r--' );
(Figure: measurements, regression line and 1-sigma prediction band versus standard concentration (mmol/L))
2.3 Error Propagation y -> x
In most cases we are more interested in the 'reverse' direction: we have a calibration line and a measurement from a
sample with unknown concentration. The concentration is determined from the regression:
$x = \frac{y - b}{a}$ (22)
The error propagation is calculated as in the forward direction, but now with the derivative $\partial x/\partial\boldsymbol{\beta}$:
$S_x = \frac{\partial x}{\partial\boldsymbol{\beta}}\,\mathbf{S}_\beta\,\left(\frac{\partial x}{\partial\boldsymbol{\beta}}\right)^T$ (23)
In addition to the inaccuracy of the regression line, the inaccuracy of the measurement itself ($y \pm s_y$) needs to be
taken into account. The linear error propagation is extended with the respective derivative ($\partial x/\partial y$):
$S_x = \frac{\partial x}{\partial\boldsymbol{\beta}}\,\mathbf{S}_\beta\,\left(\frac{\partial x}{\partial\boldsymbol{\beta}}\right)^T + \left(\frac{\partial x}{\partial y}\right)^2 s_y^2$ (24)
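As a minimal sketch of this reverse calculation in MATLAB, assuming the weighted regression results ab and Sb from above and an assumed sample signal y_sample with standard deviation s_y:
y_sample = 40;   s_y = 1.0;                       % assumed sample signal and its standard deviation
a = ab(1);  b = ab(2);                            % slope and offset from the weighted regression above
x_sample = ( y_sample - b ) / a;                  % concentration of the sample (Equation 22)
dxdb = [ -( y_sample - b )/a^2, -1/a ];           % derivative dx/dbeta = [dx/da, dx/db]
dxdy = 1/a;                                       % derivative dx/dy
s_x  = sqrt( dxdb*Sb*dxdb' + dxdy^2 * s_y^2 );    % combined 1-sigma error of x (Equation 24)
fprintf( 'x = %.2f +/- %.2f mmol/L\n', x_sample, s_x );
The first term reflects the uncertainty of the calibration line, the second the uncertainty of the sample measurement itself.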
(Figure: calibration line with measurement signal (au) versus standard concentration (mmol/L))