
Copyright © (2020) Dr. Ashraf Suyyagh – All Rights Reserved

University of Jordan

School of Engineering and Technology

Department of Computer Engineering

Practical Numerical Analysis (CPE313)

Experiment 6 - Solving Linear Equations, Basics of Linear Regression and Curve Fitting, and Interpolation
Material prepared by Dr. Ashraf E. Suyyagh

Table of Contents
Experiment 6 - Solving Linear Equations, Basics of Linear Regression and Curve Fitting, and Interpolation
Solving Linear Equations in Matrix Form
    Representing Linear Equations in MATLAB
    System of Linear Equations
Linear Regression
    Least-Squares Fit of a Straight Line
    Quantifying the Goodness of Fit
    MATLAB Built-in Functions for Regression
Curve Fitting
Interpolation

Solving Linear Equations in Matrix Form


A linear equation can be represented as a vector made of the coefficients of its terms. A system of linear equations can be represented by stacking the vectors of each linear equation on top of each other, forming a matrix.

Representing Linear Equations in MATLAB


A polynomial equation such as $y = x^5 - 3x^4 + 2x^3 - x^2 + x + 2$ can be presented in vector form as the coefficients vector $[1, -3, 2, -1, 1, 2]$.

So, the function can be expressed in MATLAB as:

c1 = [1, -3, 2, -1, 1, 2];


while the function $y = x^4 + 3x^2 - 5$ can be expressed in MATLAB as:

c2 = [1, 0, 3, 0, -5];

Notice that we first reordered the terms to start from the highest power $x^4$, and the coefficients for the missing terms $x^3$ and $x$ are written as 0.

If we know the coefficients of each term, we can evaluate the function for any $x$ using the MATLAB command polyval. The polyval command takes as input the coefficients vector and the value of $x$ at which we want to evaluate $y$. Suppose you want to compute $y(2.5)$ for the first function. If you already have the coefficients vector, you can simply write:

y = polyval(c1, 2.5)

y = 9.9688
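
As a quick check (a small sketch reusing the c2 vector defined above), we can evaluate the second polynomial the same way; the zeros correctly stand in for the missing $x^3$ and $x$ terms:

y2 = polyval(c2, 2) % 2^4 + 3*2^2 - 5

y2 = 23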

System of Linear Equations


A system of $m$ linear equations with $n$ variables can be expressed as:

$$a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n = b_1$$
$$a_{21}x_1 + a_{22}x_2 + \dots + a_{2n}x_n = b_2$$
$$\vdots$$
$$a_{m1}x_1 + a_{m2}x_2 + \dots + a_{mn}x_n = b_m \qquad (1)$$

The above system can be expressed in generic matrix form $Ax = b$: a matrix $A$ holding the coefficients of the equations (the left-hand side), a vector $x$ of the unknowns $x_1, x_2, \dots, x_n$, and a vector $b$ that holds the right-hand side. Notice that we move all literals to the right-hand side, so only the unknowns remain on the left-hand side, before we transform the equations into matrix form.

In order to understand the notation better, let us write a numeric example. Suppose we have a system of three equations and three unknowns. Before expressing the system in matrix form correctly, we need to reorder the terms in all three equations so that the variables are aligned. That is, the order of the variables $x_1, x_2, x_3$ must be the same in all three equations. Also, all literals must be moved to the right-hand side. After reordering, the system is:

$$x_1 - 2x_2 + 3x_3 = 9$$
$$-x_1 + 3x_2 - x_3 = -6$$
$$2x_1 - 5x_2 + 5x_3 = 17$$

We can express it in matrix form as:

$$\begin{bmatrix} 1 & -2 & 3 \\ -1 & 3 & -1 \\ 2 & -5 & 5 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 9 \\ -6 \\ 17 \end{bmatrix}$$

Notice that we end up with two numeric matrices, one to the left holding the coefficients, and one to
the right holding the literals. In MATLAB, we write them as:

coef = [ 1 -2 3; ...
-1 3 -1; ...
2 -5 5];
literals = [9; -6; 17];

To find the values of the three unknowns and solve the system, there are two methods. The first is to use the matrix inverse, computing $x = A^{-1}b$ (the order of the multiplication matters for the matrix dimensions to agree):

b = inv(coef)*literals

b = 3×1
1
-1
2

The other method is simply to use left division (the backslash operator):

b = coef\literals

b = 3×1
1
-1
2
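
As a quick sanity check (a minimal sketch reusing the variables above), multiplying the coefficient matrix by the solution vector should reproduce the right-hand side:

coef*b % should return [9; -6; 17], i.e., the literals vector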

Linear Regression
Suppose that we have collected some measurements in the form of data pairs $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$. These measurements could come from an engineering application, like measuring the speed of a car every 10 seconds. By observing the scatter plot of the discrete measurements, we might notice that their shape can be approximated by a linear equation. We want to fit a straight line to this set of paired measurements.


figure
x = 10:10:60;
y = [60, 65, 55, 65, 63, 70];
scatter(x,y)
axis ([0 70 40 80])

We know that the equation of a straight line is:

$$y = a_1 x + a_0$$

where $a_1$ is the slope of the line and $a_0$ is the y-intercept. Yet, even if we find a straight line that fits the data, we know for sure that it will not pass through all the data points; some will fall below the line, others above it. Therefore, the straight-line equation will have some error.

We can also find multiple straight lines that fit and approximate the data; we could draw any of the four coloured lines and say it approximates the data. So, which one should we use?


Notice that for the data pairs $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$, we are going to create a line whose equation is $y = a_1 x + a_0$. So, for every data point $x_i$, we have its true value $y_i$ and its corresponding value $a_1 x_i + a_0$ on the straight line.

The difference between the true value $y_i$ and its corresponding value on the straight line is what we call the residual error, which we express as $e_i = y_i - (a_1 x_i + a_0)$ (notice the red lines in the above figure which illustrate this error). Ideally, we want to find a line whose values have the least residual error (difference) from all corresponding true values. That is, we need to minimize the sum of all absolute errors $\sum_{i=1}^{n} |e_i|$. At the same time, we don't want one or a few outlier points to skew the line. For example, we don't want the slightly distant point at (30, 55) to severely shift downwards a straight line that, we can arguably agree, passes beautifully through the other points.

Least-Squares Fit of a Straight Line


One major algorithm used to fit a straight line to measurements is called the least-squares fit or least-squares errors. The algorithm provides one unique line and, as its name suggests, the least squared error. The algorithm minimizes the Sum of the Squared Residual errors (SSR), also called the Sum of the Squared Estimates of errors (SSE):

$$SSR = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left( y_i - (a_1 x_i + a_0) \right)^2 \qquad (2)$$

Analysing Eq. 2, the measurements collected provide us with $x_i$ and $y_i$, yet we still need the coefficients $a_0$ and $a_1$ which define the line; these constitute the two unknowns we need to solve for. We need two equations to solve for $a_0$ and $a_1$. The derivation starts by differentiating Eq. 2 with respect to each unknown, once with respect to $a_0$ and once with respect to $a_1$:

$$\frac{\partial SSR}{\partial a_0} = -2 \sum_{i=1}^{n} \left( y_i - a_1 x_i - a_0 \right) \qquad (3)$$

$$\frac{\partial SSR}{\partial a_1} = -2 \sum_{i=1}^{n} x_i \left( y_i - a_1 x_i - a_0 \right) \qquad (4)$$

We already know that the minimum occurs when the derivative is zero, so we set both derivatives to zero, then we collect the terms. We end up with a system of two linear equations in two unknowns that we can easily solve:

$$a_0 = \frac{\sum_{i=1}^{n} y_i - a_1 \sum_{i=1}^{n} x_i}{n} \qquad (5)$$

$$a_1 = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2} \qquad (6)$$

If we take a closer look at Eq. 6, we observe that $a_1$ depends solely on the measurement points; all terms in the equation are related to $x_i$, $y_i$, and the number of observations $n$. Once we have the value of $a_1$, we can substitute it into Eq. 5 and solve for $a_0$, thus obtaining the coefficients of the straight line that best fits the data.

Let us apply these equations to our first example where we had measurements of the car speed every 10 seconds. The points we have are (10, 60), (20, 65), (30, 55), (40, 65), (50, 63), (60, 70), where $x$ denotes the sample time in seconds, and $y$ denotes the speed in km/h. The best approach to solve this by hand is to construct a table of these samples, as shown below, where we use it to compute all the terms in Eq. 6:

  i     x_i    y_i    x_i*y_i    x_i^2
  1      10     60        600      100
  2      20     65       1300      400
  3      30     55       1650      900
  4      40     65       2600     1600
  5      50     63       3150     2500
  6      60     70       4200     3600
 Sum    210    378      13500     9100

Solving for $a_1$ with $n = 6$:

$$a_1 = \frac{6 \times 13500 - 210 \times 378}{6 \times 9100 - 210^2} = \frac{1620}{10500} \approx 0.1543$$

Substituting into Eq. 5:

$$a_0 = \frac{378 - 0.1543 \times 210}{6} \approx 57.5995$$

Therefore, the line which best fits the data, the one with the least SSR (SSE) compared to any other line, is:

$$y = 0.1543x + 57.5995$$
We call the line resulting from this linear regression method the regression line.
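
As a small sketch, we can reproduce the hand calculation above in MATLAB directly from Eq. 5 and Eq. 6:

x = 10:10:60;
y = [60, 65, 55, 65, 63, 70];
n = length(x);
a1 = (n*sum(x.*y) - sum(x)*sum(y)) / (n*sum(x.^2) - sum(x)^2) % slope, 0.1543
a0 = (sum(y) - a1*sum(x)) / n % intercept, 57.6 when a1 is kept at full precision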

We shall now plot this line with the scattered measurements on one plot:

figure
x = 10:10:60;
y = [60, 65, 55, 65, 63, 70];
scatter(x,y)
axis ([0 70 0 90])
hold on
xs = 5:0.1:65;
ys = 0.1543.*xs+57.5995;
plot(xs,ys)


Notice that the slightly distant measurement at point (30, 55) did not pull the line towards it by much.

Quantifying the Goodness of Fit


What we have learnt thus far is that the least-squares method gives us the best fit it can, given the measurements observed. Yet, how can we determine whether the best-fit straight line the algorithm was able to provide is actually good enough? We need more criteria to tell us whether this fit is good or not!

In Eq. 2, we defined the sum of the squared residual errors (SSR) as the sum of the squared error between each measured point and each predicted point on the regression line. The problem with this measure is that the more points available, the more error terms there are, and SSR (SSE) keeps getting larger. However, if we divide this value by the number of points $n$, we obtain the mean squared error, or MSE:

$$MSE = \frac{SSR}{n} \qquad (7)$$

There is also another goodness-of-fit metric called the Root Mean Squared Error (RMSE), which is simply the square root of the MSE:

$$RMSE = \sqrt{MSE} \qquad (8)$$

For all three metrics, SSR (SSE), MSE, and RMSE, the closer the value to zero, the smaller the errors, and therefore the better the fit. A perfect fit will have all these values compute to zero.

We also have another way to quantify the goodness of the fit, called the correlation coefficient, or simply $r$. It can be computed using the following formula:

$$r = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{\sqrt{n \sum x_i^2 - \left(\sum x_i\right)^2} \sqrt{n \sum y_i^2 - \left(\sum y_i\right)^2}} \qquad (9)$$


From this we can compute the R-Square metric $r^2$. For a perfect fit, $r^2 = 1$, and the closer the value we obtain is to one, the better the fit.

To apply these metrics to the car example, we already found that the regression line equation is $y = 0.1543x + 57.5995$. If we substitute the values of $x_i$ that we collected into the regression line formula, we get:

p = [0.1543, 57.5995];
yn = polyval(p, 10:10:60)

yn = 1×6
59.1425 60.6855 62.2285 63.7715 65.3145 66.8575

Compare these values to the actual measured car speed values:

disp(y)

60 65 55 65 63 70

To calculate the SSR:

SSR = sum((yn - y).^2)

SSR = 88.3429
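
From the SSR, we can also compute the MSE and RMSE of Eq. 7 and Eq. 8 (a quick sketch continuing with the SSR variable above):

MSE = SSR/6 % 14.7238
RMSE = sqrt(MSE) % 3.8372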

We calculate the correlation coefficient r:

n = 6;
r = (n*sum(x.*y) - sum(x)*sum(y)) / (sqrt(n*sum(x.^2) - sum(x)^2)*sqrt(n*sum(y.^2) - sum(y)^2))

r = 0.5661

disp(r^2)

0.3204

It is worth noting that, to judge whether the fit has high quality or not, we should not depend on R-Square alone, or on the MSE alone, because sometimes R-Square can be close to one while the MSE is still far from zero. So, we should always consider both metrics.

MATLAB Built-in Functions for Regression


In this course, we only introduce linear regression, that is, how to best fit straight lines. There are many numerical methods which attempt to fit non-linear curves (e.g., quadratic, cubic, log, ln, etc.). MATLAB offers the command polyfit. This command can actually fit higher-degree polynomials, not only linear regression lines. It is internally built on the concept of least-squares errors.

To fit a linear regression line in our previous example, we can use:


x = 10:10:60;
y = [60, 65, 55, 65, 63, 70];
p = polyfit(x, y, 1) % The 1 here denotes we need to fit a line (linear regression)

p = 1×2
0.1543 57.6000

Notice how this almost matches our previous hand-calculated results.

We can plot the measurements and linear regression lines using polyfit and polyval as follows:

figure
x = 10:10:60;
y = [60, 65, 55, 65, 63, 70];
scatter(x,y)
axis ([0 70 0 90])
hold on
p = polyfit(x, y, 1)

p = 1×2
0.1543 57.6000

xs = 5:0.1:65;
ys = polyval(p, xs);
plot(xs, ys)

Curve Fitting
In the previous section, we learnt to use linear regression to find the equation of the straight line that fits the data with the least error possible. However, this technique only works when the data plot resembles a line. What if the points follow a quadratic equation? An exponential equation? That is, what if they are non-linear? Linear regression will surely fail. There are many numerical techniques for non-linear regression, but in this section, we will use MATLAB's Curve Fitting Toolbox to find the equation that best fits the data points we have, instead of applying numerical methods by hand.

In MATLAB, go to the Apps tab, and under Math, Statistics, and Optimization you will find a toolbox called Curve Fitting. Open this toolbox by clicking on it.

The tool is very simple to use. First, we need to select the data points for the x-axis and the y-axis (or the z-axis if the plot is 3D). Make sure that the lengths of the data points for all axes are equal.

Let us use the previous data set of the car example:

x = 10:10:60; % original measurements
y = [60, 65, 55, 65, 63, 70];

Then, from the curve fitter window, click on Select Data, and from the new window, select the variable x for the X Data, and the variable y for the Y Data, then close the window. You can also give your fit a name, say exampleFit1.


Notice that by default, the polynomial option was selected, and the polynomial degree was set to 1, that is, a straight line. So, the toolbox starts with linear regression.

Notice the results and the goodness-of-fit measures:


In this toolbox, p1 and p2 are the same as a1 and a0 that we calculated numerically. Also, notice the
values for SSE, RMSE, and R-square.

Now, let us try a set of non-linear points, for example the x and y pairs in the matrix Data:

Data = ...
[0.0000 5.8955
0.1000 3.5639
0.2000 2.5173
0.3000 1.9790
0.4000 1.8990
0.5000 1.3938
0.6000 1.1359
0.7000 1.0096
0.8000 1.0343
0.9000 0.8435
1.0000 0.6856
1.1000 0.6100
1.2000 0.5392
1.3000 0.3946
1.4000 0.3903


1.5000 0.5474
1.6000 0.3459
1.7000 0.1370
1.8000 0.2211
1.9000 0.1704
2.0000 0.2636];

x = Data(:,1);
y = Data(:,2);

Load the x and y values into the Curve Fitting Select Data window. It is clear that linear regression does not fit the data well. This is obvious given that the RMSE is much greater than zero and R-Square is not close to 1.


If you click on the Residual Plot button, it will plot the difference between each actual point $y_i$ and the corresponding point on the line, $e_i = y_i - \hat{y}_i$:

Notice how the residual errors are quite large for a bad fit.


Let us try a quadratic fit using a polynomial of degree 2. R-Square has increased from 0.6443 to 0.8637, and the RMSE has decreased from 0.8458 to 0.5379. Still, it is clear we can do better.


What if we try to use a cubic equation by using a polynomial of degree 3?


Let us change our model from polynomials to exponentials and try changing the number of exponential terms from 1 to 2.

With one exponential term, there is not much change in the goodness of fit compared to the cubic equation:


However, if we use two exponential terms, notice how beautifully the curve fits the data. Also notice that R-Square is 0.9961, very close to 1, while RMSE is 0.09322, much closer to zero than before.

Of course, we can tune the fit even further by enabling the advanced options and selecting more parameters, such as specific algorithms, among others. But this is out of the scope of this lab course.

Now, in the previous fit, notice that the results returned four parameters: a, b, c, and d. Also note that the equation is given as a*exp(b*x) + c*exp(d*x), so we can write this in MATLAB:

f = @(x) 3.007*exp(-10.59*x)+ 2.889*exp(-1.4*x)

f = function_handle with value:


@(x)3.007*exp(-10.59*x)+2.889*exp(-1.4*x)

and we can find any value on this curve by simply calling the function. For example, to find $f(1.75)$, write:

f(1.75)

ans = 0.2493
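
If you prefer to work from the command line, the Curve Fitting Toolbox also provides the fit function (a minimal sketch, assuming the toolbox is installed); the model name 'exp2' requests the same two-term exponential form a*exp(b*x) + c*exp(d*x):

f2 = fit(x, y, 'exp2') % x and y must be column vectors, as they are here
plot(f2, x, y) % overlays the fitted curve on the data points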


Interpolation
In engineering and scientific applications, we collect measurements from sensors or other experiments. These measurements are discrete in nature; that is, they are sampled at non-continuous points in time (e.g., every 10 ms, every second, every day, etc.). Sometimes, we might have an erroneous measurement (possibly due to high noise) or a missing measurement (e.g., sensor failure). As such, we want to predict what the original value was and replace the erroneous or missing value. At other times, we might be interested in predicting the value at a point in time that we did not take a measurement for.

Suppose we are measuring the speed of a car every ten seconds for the duration of one minute, similar to the car example we have already seen. What if we wanted to predict the speed of the car at the 55th second? Or the 43rd second? These are values that we did not take a measurement for.

You could use the regression techniques we just learnt to come up with the regression line (polynomial or otherwise), find a formula for the speed, then evaluate it at the desired point:

x = 10:10:60; % original measurements
y = [60, 65, 55, 65, 63, 70];
p = polyfit(x, y, 1); % finding the regression line (1st-degree polynomial)
ys = polyval(p, 43) % Evaluate the desired point using this polynomial

ys = 64.2343

In the above example, we applied linear regression because we noticed through the scatter plot that a straight line fits the data well.

But what if you had measurements like these:

figure
x2 = 1:1:10;
y2 = [2, 5, 8, 15, 19, 17, 14, 13, 10, 7];
scatter(x2,y2);

And you want to predict the value at $x = 3.5$? In this case, you might connect the points (3, 8) and (4, 15) with a straight line, compute the slope of this line segment, write the equation of the line, then substitute $x = 3.5$ into the line equation, as sketched after the plot below.

hold on
plot(x2,y2)
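
As a minimal sketch of this hand method (the variable names here are just for illustration), the line through (3, 8) and (4, 15) gives:

xa = 3; ya = 8; % left neighbouring point
xb = 4; yb = 15; % right neighbouring point
slope = (yb - ya)/(xb - xa); % slope of the segment, 7
yq = ya + slope*(3.5 - xa) % evaluate the segment at x = 3.5, gives 11.5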


Notice that we were not able to use linear regression on all the points to predict the value, because the scatter as a whole does not represent a linear function. Instead, we took two adjacent points, (3, 8) and (4, 15), connected them with a line, and used the line equation to find the value at x = 3.5.

In a similar fashion, MATLAB offers the function interp1, which interpolates the data at given query points. To predict the value at 3.5, we need only pass the entire set of original measurements and the data point we want to interpolate at:

interp1(x2, y2, 3.5)

ans = 11.5000

You can also interpolate at many points at once by passing a vector of points:

interp1(x2, y2, [3.5, 6.75, 8.25])

ans = 1×3
11.5000 14.7500 12.2500

Let us compare the output of the interp1 function for the car example with the output we got using
polyfit and polyval:

interp1(x,y, 43)

ans = 64.4000

Why are the two values different? That is, the interpolated result using the regression line was 64.2343, while using interp1 it is 64.4. Let us examine the plot to illustrate how they differ. The interp1 command connects each two successive points with a line and uses the equation of that line piece to find the interpolation. The regression line, in contrast, is a single line that approximates all points together and thus has a different equation.

figure
x = 10:10:60; % original measurements
y = [60, 65, 55, 65, 63, 70];
scatter(x,y)
hold on
p = polyfit(x, y, 1); % finding the regression line (1st-degree polynomial)
yn = polyval(p, 10:0.1:60);
plot(10:0.1:60, yn)
plot(x,y) % piece-wise linear connection of the measurements
axis([0 70 50 75])
legend('Car speed', 'Regression Line', 'Piece-wise interp1')

When we examine the previous figure, we can easily see that, in either case, connecting the points with straight lines or using the regression line does not best capture the underlying function and might not offer the best interpolated value. We could use a smoother fit which captures the actual shape more accurately. This would then yield better predictions and interpolations.

MATLAB provides the command spline, which performs cubic-spline interpolation instead of linear interpolation. It has the same syntax as interp1:


ys = spline(x2, y2, 3.5)

ys = 11.3279

You can also interpolate at many points at once by passing a vector of points:

spline(x2, y2, [3.5, 6.75, 8.25])

ans = 1×3
11.3279 14.5459 12.4709
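
Incidentally, interp1 also accepts an interpolation method as a fourth argument; passing 'spline' should reproduce the spline results above:

interp1(x2, y2, [3.5, 6.75, 8.25], 'spline') % same values as spline(x2, y2, ...)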

To visualize how spline works, we can plot the smoothed curve:

figure
x2 = 1:1:10;
y2 = [2, 5, 8, 15, 19, 17, 14, 13, 10, 7];
scatter(x2,y2);
hold on
xnew = 1:0.1:10;
ynew = spline(x2, y2, xnew);
plot(xnew, ynew)


Experiment version 1.1


Original Experiment December 17th, 2020
Last Updated April 8th, 2022
Dr. Ashraf Suyyagh - All Rights Reserved

Revision History

Ver. 1.1
- Corrected the notation of the linear equation and the linear system and made it easier to understand.
- Replaced the plots and figures of the linear regression section with new ones and simplified the discussion.
- Removed some of the previous goodness-of-fit metrics and introduced simpler ones. Removed the difficult interpretation of some of these statistical metrics.
- Added a new section on the Curve Fitting Toolbox to cover non-linear regression and other algorithms in a simple manner.
- Added a clarification on why interpolation using the regression line and using the interp1 function can give different results.
- Removed the interp2, interp3, and interpn commands.

