6-Model Building and Regression - P
MATLAB for Engineering Applications, Fourth Edition
Chapter 06
Function Discovery: A systematic way of finding an equation that best fits the data is
regression (also called the least-squares method).
Each of the following functions gives a straight line when plotted on a specific set of axes:
1. The linear function $y = mx + b$ gives a straight line when plotted on rectilinear axes. Its slope is $m$ and its intercept is $b$.
2. The power function $y = bx^m$ gives a straight line when plotted on log-log axes.
3. The exponential function $y = be^{mx}$ and its equivalent form $y = b(10)^{mx}$ give a straight line when plotted on a semilog plot whose y-axis is logarithmic.
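Taking base-10 logarithms of both sides shows why each function straightens out on its own axes. This is a short derivation; the constant $\log_{10} e \approx 0.4343$ only enters for the natural-base form of the exponential.

```latex
\begin{aligned}
y = b x^{m} &\;\Longrightarrow\; \log_{10} y = m \log_{10} x + \log_{10} b
  && \text{(line in } \log_{10} x \text{: log-log axes)}\\
y = b(10)^{mx} &\;\Longrightarrow\; \log_{10} y = m x + \log_{10} b
  && \text{(line in } x \text{: semilog axes)}\\
y = b e^{mx} &\;\Longrightarrow\; \log_{10} y = (m \log_{10} e)\, x + \log_{10} b
  && \text{(also a line on semilog axes)}
\end{aligned}
```

In every case the intercept of the straight line is $\log_{10} b$, and the slope is $m$ (or $m\log_{10} e$ for the base-$e$ form).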
Function Discovery
Figure: The power function $y = 2x^{-0.5}$ and the exponential function $y = 10^{1-x}$ plotted on linear, semilog, and log-log axes (panels: Linear, Semi-log, Log-log).
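2. Plot the data using rectilinear scales. If the data form a straight line, they can be represented by the linear function and you are finished. Otherwise, if you have data at $x = 0$, then
a. if $y(0) = 0$, try the power function (with $m > 0$);
b. if $y(0) \neq 0$, try the exponential function.
If data are not given at $x = 0$, proceed to step 3.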
3. If you suspect a power function, plot the data using log-log scales. Only a power
function will form a straight line on a log-log plot. If you suspect an exponential
function, plot the data using the semilog scales. Only an exponential function will
form a straight line on a semilog plot.
4. In function discovery applications, we use the log-log and semilog plots only to
identify the function type, but not to find the coefficients b and m. The reason is that
it is difficult to interpolate on log scales.
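Once the function type is known, b and m are usually found by fitting a straight line to the transformed data with polyfit rather than by reading them off the plot. The sketch below assumes exact samples of the two functions from the figure, $y = 2x^{-0.5}$ and $y = 10^{1-x}$, so the recovered coefficients can be checked; the sample points and variable names are illustrative only.

```matlab
% Hypothetical exact samples of the two functions in the figure (x > 0 so log10 is defined).
x = 1:0.5:5;
y_pow = 2*x.^(-0.5);          % power function y = b*x^m with b = 2, m = -0.5
y_exp = 10.^(1 - x);          % exponential y = b*10^(m*x) with b = 10, m = -1

% Power function: log10(y) = m*log10(x) + log10(b) is linear in log10(x).
p_pow = polyfit(log10(x), log10(y_pow), 1);
m_pow = p_pow(1);             % slope of the line is m
b_pow = 10^p_pow(2);          % intercept of the line is log10(b)

% Exponential function: log10(y) = m*x + log10(b) is linear in x.
p_exp = polyfit(x, log10(y_exp), 1);
m_exp = p_exp(1);
b_exp = 10^p_exp(2);

fprintf('power:       m = %.4f, b = %.4f\n', m_pow, b_pow);
fprintf('exponential: m = %.4f, b = %.4f\n', m_exp, b_exp);
```

With real (noisy) data the same two polyfit calls apply; only the recovered values will no longer be exact.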
Command Description
p = polyfit(x,y,n) Fits a polynomial of degree n to data described by
the vectors x and y, where x is the independent
variable. Returns a row vector p of length n + 1
that contains the polynomial coefficients in order of
descending powers.
$p(x) = p_1x^n + p_2x^{n-1} + \cdots + p_nx + p_{n+1}$
Syntax: p = polyfit(x,y,n)
where x and y contain the data, n is the order of the polynomial to be fitted, and p is
the vector of polynomial coefficients.
We can find the linear function that fits the data by typing p = polyfit(x,y,1). The
first element p1 of the vector p will be m, and the second element p2 will be b.
Sonar measurements of the range of an approaching underwater vehicle are given in the following table, where the distance is measured in nautical miles (nmi). Assuming the relative speed $v$ is constant, the range as a function of time is given by $r = -vt + r_0$, where $r_0$ is the initial range at $t = 0$. Estimate (a) the speed $v$ and (b) when the range will be zero.
Time, t (min) 0 2 4 6 8 10
Range, r (nmi) 3.8 3.5 2.7 2.1 1.2 0.7
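t = 0:2:10;                          % time (min)
r = [3.8,3.5,2.7,2.1,1.2,0.7];       % range (nmi)
% First-order curve fit: p(1) estimates -v, p(2) estimates r0.
p = polyfit(t,r,1)
% Evaluate the fitted line for plotting.
y = p(1)*t+p(2);
plot(t,r,'o',t,y),xlabel('t (min)'),ylabel('r (nmi)')
% Speed calculation: p(1) is in nmi/min, so multiply by 60 min/hr.
v = -p(1)*60 % speed in knots (nmi/hr)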
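The script estimates the speed for part (a) but does not compute part (b). Since the fitted range is $r = p_1 t + p_2$, it reaches zero at $t = -p_2/p_1$. A minimal sketch, repeating the data so it runs on its own:

```matlab
t = 0:2:10;                          % time (min)
r = [3.8,3.5,2.7,2.1,1.2,0.7];       % range (nmi)
p = polyfit(t,r,1);                  % r = p(1)*t + p(2)
v = -p(1)*60                         % speed in knots, about 19.7 for these data
t_zero = -p(2)/p(1)                  % time (min) at which the fitted range is zero
r_check = polyval(p,t_zero)          % zero up to round-off
```

For these data the fitted line reaches zero range a little after 12 minutes. This is an extrapolation beyond the last measurement at 10 minutes, so it relies on the constant-speed assumption still holding.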
The temperature of coffee cooling in a mug at room temperature (68°F) was measured at various
times. The data follow.
Develop a model of the coffee’s temperature as a function of time, and use the model to estimate
how long it took the temperature to reach 120°F.
1. Because T(0) is finite but nonzero, the power function cannot describe these data. So we do
not bother to plot the data on log-log axes.
2. Common sense tells us that the coffee will cool, and its temperature will eventually equal the
room temperature. So, we subtract the room temperature from the data and plot the relative
temperature, T − 68, versus time.
• If the relative temperature is a linear function of time, the model is $T - 68 = mt + b$.
• If the relative temperature is an exponential function of time, the model is $T - 68 = b(10)^{mt}$.
Numerical Methods and Modeling for Engineering Applications R. Rashidzadeh 15
FACULTY OF ENGINEERING - UNIVERSITY OF WINDSOR
$T = 68 + b(10)^{mt}$
Figure: Temperature of a cooling cup of coffee, plotted on various coordinates.
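The measured temperatures are not reproduced here, so the sketch below uses hypothetical readings generated from an assumed model $T = 68 + 80(10)^{-0.02t}$, purely to show the steps: fit a line to $\log_{10}(T - 68)$ versus $t$, recover $b$ and $m$, then solve $120 = 68 + b(10)^{mt}$ for $t$. The data and variable names are assumptions, not the course data.

```matlab
% Hypothetical readings (NOT the measured data), generated from an assumed model.
time = 0:5:40;                      % minutes
temp = 68 + 80*10.^(-0.02*time);    % deg F

% Fit log10(T - 68) = m*t + log10(b), i.e. T - 68 = b*10^(m*t).
p = polyfit(time, log10(temp - 68), 1);
m = p(1);
b = 10^p(2);

% Solve 120 = 68 + b*10^(m*t) for t.
t120 = (log10(120 - 68) - log10(b))/m
```

With the real measurements, only the two data vectors change; the fit and the solve for `t120` stay the same.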
Q1-The distance a spring stretches from its “free length” is a function of how much tension force
is applied to it. The following table gives the spring length y that the given applied force f
produced in a particular spring. The spring's free length is 4.7 in. Find a functional relation
between f and x, the extension from the free length, 𝑓 = 𝑘𝑥 + 𝑏. The spring constant k is:
1. 0.1
2. 0.2
3. 0.3
4. 0.4
Q2- Obtain a function that describes these data. How many years after 2012 will the population be double its 2012 size?
1. 5.9 years
2. 8.7 years
3. 12.3 years
4. 14.8 years
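If the population data turn out to follow an exponential function $P = b(10)^{mt}$ (an assumption here, since the data table is not reproduced), the doubling time follows directly from the fitted slope $m$. It does not depend on $b$ or on the starting year:

```latex
\frac{P(t_1 + \Delta t)}{P(t_1)} = \frac{b(10)^{m(t_1 + \Delta t)}}{b(10)^{m t_1}}
  = (10)^{m\,\Delta t} = 2
\quad\Longrightarrow\quad
\Delta t = \frac{\log_{10} 2}{m} \approx \frac{0.3010}{m}
```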
Q3-A 15-cup coffee pot was placed under a water faucet and filled to the 15-cup
line. With the outlet valve open, the faucet’s flow rate was adjusted until the
water level remained constant at 15 cups, and the time for one cup to flow out
of the pot was measured. This experiment was repeated with the pot filled to
the various levels shown in the following table:
(a) Use a power function, $\log_{10} V = m\log_{10}(1/t) + \log_{10} b$, to obtain a linear relation between the flow rate $1/t$ and the number of cups in the pot $V$. The values of $m$ and $b$ are:
1. m= 0.4331, b= 0.0499
2. m= 1.3254, b= 1.1545
3. m= 4.9873, b= 6.5343
4. m= 5.4754, b= 2.3451
The Least-Squares Criterion: used to fit a function $f(x)$ to $m$ data points. It minimizes the sum of the squares of the residuals, $J$, defined as
$$J = \sum_{i=1}^{m}\left[f(x_i) - y_i\right]^2$$
We can use this criterion to compare the quality of the curve fit for two or
more functions used to describe the same data. The function that gives the
smallest J value gives the best fit.
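x = 1:9;
y = [5,6,10,20,28,33,34,36,42];
J = zeros(1,4);                           % preallocate J for degrees 1 to 4
for k = 1:4
   coeff = polyfit(x,y,k);                % fit a polynomial of degree k
   subplot(2,2,k);
   plot(x,y,'o');
   hold on
   plot(x,polyval(coeff,x));
   title(['Degree ', num2str(k)]);
   J(k) = sum((polyval(coeff,x)-y).^2);   % least-squares criterion
end
J                                         % display J for all four fits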
Figure: first-, second-, third-, and fourth-degree polynomial fits to the data (panels 1 to 4).
$$J = \sum_{i=1}^{m}\left[f(x_i) - y_i\right]^2, \qquad S = \sum_{i=1}^{m}\left(y_i - \bar{y}\right)^2, \qquad r^2 = 1 - \frac{J}{S}$$
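The value of $S$ indicates how much the data are spread around the mean $\bar{y}$, and the value of $J$ indicates how much of the data spread is unaccounted for by the model.
Thus the ratio $J/S$ indicates the fractional variation unaccounted for by the model, and $r^2 = 1 - J/S$, the coefficient of determination, is the fraction of the variation that the model does account for.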
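A minimal sketch that applies these definitions to the nine-point data set used in the polynomial-comparison example, computing $J$, $S$, and $r^2$ for degrees 1 to 4 (variable names are illustrative):

```matlab
x = 1:9;
y = [5,6,10,20,28,33,34,36,42];
S = sum((y - mean(y)).^2);               % spread of the data about the mean
J = zeros(1,4);
r2 = zeros(1,4);
for k = 1:4
    coeff = polyfit(x,y,k);
    J(k) = sum((polyval(coeff,x) - y).^2);   % variation left unexplained
    r2(k) = 1 - J(k)/S;                      % fraction of variation explained
end
disp([(1:4)' J' r2'])                    % columns: degree, J, r^2
```

Because each higher-degree polynomial contains the lower-degree ones as special cases, $J$ cannot increase as the degree goes up. A rising $r^2$ alone therefore does not justify a higher degree; the residual checks and scaling issues discussed in this chapter still apply.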
Q5- Using $J = \sum_{i=1}^{m}\left[f(x_i) - y_i\right]^2$, the J value for the second-degree polynomial fitted to the data $x = 0, 1, \ldots, 5$ and $y = 0, 1, 44, 40, 41, 47$ is:
1. 1485.7
2. 1151.7
3. 73.161
4. 1333.7
Figure: first-degree, second-degree, third-degree, and fourth-degree polynomial fits to the data.
Figure panel: 8 decimal-place accuracy.
• High-degree polynomials can produce large errors if their coefficients are not represented with a large number of significant figures.
• The effect of computational errors in computing the coefficients can be lessened by properly scaling the x values.
Using Residuals
Figure: residual plots (panels: High Residuals, Low Residuals, No Clear Pattern).
In general, if you see a pattern in the plot of the residuals, it indicates that another function can
be found to describe the data better.
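A minimal sketch of a residual check, using the nine-point data set from the polynomial-comparison example: it computes and plots the residuals $y_i - f(x_i)$ of a straight-line fit and of a cubic fit so their patterns can be compared. The choice of degrees 1 and 3 is only for illustration.

```matlab
x = 1:9;
y = [5,6,10,20,28,33,34,36,42];
p1 = polyfit(x,y,1);
res1 = y - polyval(p1,x);             % residuals of the straight-line fit
p3 = polyfit(x,y,3);
res3 = y - polyval(p3,x);             % residuals of the cubic fit
subplot(2,1,1), plot(x,res1,'o',x,zeros(size(x))), title('Residuals, degree 1')
subplot(2,1,2), plot(x,res3,'o',x,zeros(size(x))), title('Residuals, degree 3')
```

The residuals of any least-squares polynomial fit that includes a constant term sum to zero. What matters is therefore not their average but whether they show a systematic pattern, such as a run of positive values followed by a run of negative ones.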
For a perfect fit, $J = 0$ and thus $r^2 = 1$. Therefore, the closer $r^2$ is to 1, the better the fit. The largest possible value of $r^2$ is 1.
It is possible for $J$ to be larger than $S$, and thus it is possible for $r^2$ to be negative. Such cases, however, indicate a very poor model that should not be used.
As a rule of thumb, a good fit accounts for at least 99 percent of the data variation. This value corresponds to $r^2 \ge 0.99$.
Two common ways to scale the $x$ values are:
1. Subtract the minimum $x$ value or the mean $x$ value from the $x$ data, if the range of $x$ is small, or
2. Divide the $x$ values by the maximum value or the mean value, if the range is large.
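One way to see why rule 1 helps is to compare the condition number of the cubic least-squares design matrix (columns $x^3$, $x^2$, $x$, 1) before and after shifting the years 2000 to 2009. The anonymous function `cubic` is a hypothetical helper for this sketch, not a MATLAB built-in.

```matlab
Year = 2000:2009;
% Hypothetical helper: design matrix of a cubic fit, columns x.^3, x.^2, x, 1.
cubic = @(x) [x(:).^3, x(:).^2, x(:), ones(numel(x),1)];

c_raw   = cond(cubic(Year));          % raw years: columns differ by ~10 orders of magnitude
c_shift = cond(cubic(Year - 2000));   % rule 1: subtract the minimum year
fprintf('cond(raw) = %.3g, cond(shifted) = %.3g\n', c_raw, c_shift);
```

A very large condition number means that small rounding errors in the data or in the arithmetic can produce large errors in the computed coefficients. This is the condition that polyfit's "badly conditioned" warning reports.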
Year 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009
Vehicle flow (millions) 2.1 3.4 4.5 5.3 6.2 6.6 6.8 7 7.4 7.8
Year = 2000:2009;
Veh_Flow = [2.1,3.4,4.5,5.3,6.2,6.6,6.8,7,7.4,7.8];
p = polyfit(Year,Veh_Flow,3);
Warning: Polynomial is badly conditioned. Add points with distinct X values,
reduce the degree of the polynomial, or try centering and scaling as described
in HELP POLYFIT.
> In polyfit (line 79)
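The problem is caused by the large values of the independent variable Year: a cubic fit combines Year.^3 (about 8 × 10^9) with the constant term 1, which makes the least-squares problem badly conditioned. Subtracting 2000 from the years removes the problem.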
Year = 2000:2009;
Veh_Flow = [2.1,3.4,4.5,5.3,6.2,6.6,6.8,7,7.4,7.8];
x = Year-2000; y = Veh_Flow;   % shift the years so that x = 0, 1, ..., 9
p = polyfit(x,y,3);            % cubic fit on the shifted data
J = sum((polyval(p,x)-y).^2);  % unexplained variation
S = sum((y-mean(y)).^2);       % total variation about the mean
r2 = 1 - J/S                   % coefficient of determination
plot(x,y,'o',x,polyval(p,x))
The result is r2 = 0.9972, and the fitted model is
$f = 0.0087(t - 2000)^3 - 0.1851(t - 2000)^2 + 1.5991(t - 2000) + 2.0362$
where $t$ is the year.
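The polyfit warning also suggests centering and scaling. A sketch of that alternative uses the three-output form `[p,S,mu] = polyfit(x,y,n)`, which fits in the variable $z = (x - \mu_1)/\mu_2$ with `mu(1) = mean(x)` and `mu(2) = std(x)`. The same `mu` must then be passed to `polyval`. This is the same centering that the Basic Fitting option "Center and scale x data" applies.

```matlab
Year = 2000:2009;
Veh_Flow = [2.1,3.4,4.5,5.3,6.2,6.6,6.8,7,7.4,7.8];

% Shifted fit, as in the example above.
p_shift = polyfit(Year - 2000, Veh_Flow, 3);
f_shift = polyval(p_shift, Year - 2000);

% Centered and scaled fit: polyfit works with z = (Year - mu(1))/mu(2).
[p_cs, S, mu] = polyfit(Year, Veh_Flow, 3);
f_cs = polyval(p_cs, Year, [], mu);   % pass mu so polyval applies the same z

max_diff = max(abs(f_cs - f_shift))   % both describe the same least-squares cubic
```

The coefficients in `p_cs` differ from those in `p_shift` because they refer to a different variable. The fitted curve itself is the same, so either form can be used for prediction as long as the matching transformation is applied.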
Q6- The U.S. census data from 1790 to 1990 are stored in the file census.dat, which is supplied with MATLAB. Type load census to load this file. The first column, cdate, contains the years, and the second column, pop, contains the population in millions. First try to fit a cubic polynomial to the data. If you get a warning message, scale the data by subtracting 1790 from the years, and fit a cubic. The coefficient of determination (r-squared value) is:
Q7- The U.S. census data from 1790 to 1990 are stored in the file census.dat, which is supplied with MATLAB. Type load census to load this file. The first column, cdate, contains the years, and the second column, pop, contains the population in millions. First try to fit a cubic polynomial to the data. If you get a warning message, scale the data by subtracting 1790 from the years, and fit a cubic. The estimated population in 1965 is:
a) 154.1 millions
b) 162.2 millions
c) 178.3 millions
d) 189.4 millions
Q8- Load the census data. Use the scaled data and try linear, quadratic, cubic, and exponential fits. Then plot the residuals. Which one gives the best fit?
a) Linear
b) Quadratic
c) Cubic
d) Exponential
x 1 2 3 4 5 6 7 8 9 10
y 10 14 16 18 19 20 21 22 23 23
a) 𝑎1 = 14.4353 𝑎2 = 2.4543
b) 𝑎1 = 7.34764 𝑎2 = 6.6895
c) 𝑎1 = 5.7518 𝑎2 = 9.9123
d) 𝑎1 = 3.4344 𝑎2 = 11.472
A regression model can have more than one independent variable, for example, $y = a_0 + a_1x_1 + a_2x_2$.
How do we find the coefficient values $a_0$, $a_1$, and $a_2$ that fit a set of data $(y, x_1, x_2)$ in the least-squares sense?
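$$\mathbf{X}\mathbf{a} = \mathbf{y}: \qquad
\begin{bmatrix} 1 & 0 & 5 \\ 1 & 1 & 7 \\ 1 & 2 & 8 \\ 1 & 3 & 11 \end{bmatrix}
\begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix} =
\begin{bmatrix} 7.1 \\ 19.2 \\ 31 \\ 45 \end{bmatrix}$$
Result: a = [0.8000; 10.2429; 1.2143], that is, $a_0 = 0.8000$, $a_1 = 10.2429$, $a_2 = 1.2143$.
Max_Percent_Error = 3.2193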
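A minimal sketch that should reproduce these numbers: build $\mathbf{X}$ with a column of ones for $a_0$ and one column per independent variable, solve $\mathbf{X}\mathbf{a} = \mathbf{y}$ in the least-squares sense with left division, and compute the largest percent error of the predictions. The definition of Max_Percent_Error as $100\max_i |(\hat{y}_i - y_i)/y_i|$ is an assumption that matches the value shown.

```matlab
x1 = [0;1;2;3];
x2 = [5;7;8;11];
y  = [7.1;19.2;31;45];
X = [ones(size(x1)), x1, x2];    % columns multiply a0, a1, a2
a = X\y                          % least-squares solution of X*a = y
yp = X*a;                        % model predictions
Max_Percent_Error = 100*max(abs((yp - y)./y))
```

Left division returns the least-squares solution here because $\mathbf{X}$ has more rows (data points) than columns (coefficients).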
Linear-in-Parameters Regression
The first-order model written for each of the n data points results in n equations, which can be
expressed as follows:
Xa = y′
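t = [0,0.3,0.8,1.1,1.6,2.3,3];
y = [0,0.6,1.28,1.5,1.7,1.75,1.8];
% The model y = a(1) + a(2)*exp(-t) is linear in the parameters a(1) and a(2).
X = [ones(size(t));exp(-t)]';        % one column per term of the model
a = X\y'                             % least-squares solution
plot(t,y,'o')
hold on
yp = a(1)+a(2).*exp(-t);             % model evaluated at the data times
plot(t,yp);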
a) y = 12.4734
b) y = 17.1179
c) y = 23.4754
d) y = 34.7321
Suppose the model is required to pass through a point not at the origin, say the point $(x_0, y_0)$, and that point is known to be an exact solution to the equation, so that
$y_0 = mx_0 + b$
Subtracting this from $y = mx + b$ gives $y - y_0 = m(x - x_0)$. So simply subtract $x_0$ from all the $x$ values and $y_0$ from all the $y$ values:
$u = x - x_0$ and $w = y - y_0$.
The resulting equation has the form $w = mu$, and the coefficient $m$ can be calculated using left division. In MATLAB we would write m = u'\w', where u and w are row vectors containing the transformed data (right division, m = w/u, gives the same result). The intercept then follows from $b = y_0 - mx_0$.
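A minimal sketch of the constrained fit, using hypothetical data and a hypothetical constraint point $(x_0, y_0) = (1, 3)$ chosen only for illustration. It also checks that left division gives the textbook least-squares slope $m = \sum u_i w_i / \sum u_i^2$.

```matlab
% Hypothetical data and constraint point, for illustration only.
x  = [1, 2, 3, 4, 5];
y  = [3.1, 4.9, 7.2, 8.8, 11.1];
x0 = 1;  y0 = 3;              % the line must pass through (x0, y0)

u = x - x0;                   % shift so the constraint point becomes the origin
w = y - y0;
m = u'\w';                    % least-squares slope of w = m*u (left division)
b = y0 - m*x0;                % intercept from y0 = m*x0 + b

yfit = m*x + b;               % fitted line, passes exactly through (x0, y0)
```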
load census       % loads cdate (years) and pop (population in millions)
x = cdate;
y = pop;
plot(x,y,'o')
Q11- A mass attached to a spring and damper is displaced a distance x0 (cm) while being given
an initial velocity v0 (cm/s). We know from physics and mathematics (see Chapter 8) that the
displacement x as a function of time is given by
$$x(t) = \left(\frac{5x_0}{3} + \frac{v_0}{3}\right)e^{-2t} - \frac{2x_0 + v_0}{3}\,e^{-5t}$$
The displacement is measured every 0.2 s. The measured displacement versus time is given by
a) x0 = 1.5268
b) x0 = 3.4378
c) x0 = 5.3298
d) x0 = 7.1983
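The displacement model is linear in the unknowns $x_0$ and $v_0$, because it can be regrouped as $x(t) = x_0\,g_1(t) + v_0\,g_2(t)$ with $g_1 = (5e^{-2t} - 2e^{-5t})/3$ and $g_2 = (e^{-2t} - e^{-5t})/3$. Linear-in-parameters regression therefore applies directly. The measured table is not reproduced here, so this sketch generates hypothetical samples every 0.2 s from assumed values $x_0 = 2$, $v_0 = -1$ and checks that they are recovered. The names `g1` and `g2` are illustrative.

```matlab
% Hypothetical samples (NOT the exercise data) from assumed x0 = 2, v0 = -1.
t = (0:0.2:2)';
x0_true = 2;  v0_true = -1;
g1 = (5*exp(-2*t) - 2*exp(-5*t))/3;   % term multiplying x0
g2 = (exp(-2*t) - exp(-5*t))/3;       % term multiplying v0
x = x0_true*g1 + v0_true*g2;

% Least-squares estimate of [x0; v0] from x = [g1 g2]*[x0; v0].
X = [g1, g2];
c = X\x;
x0_est = c(1)
v0_est = c(2)
```

With the measured displacements in place of the generated `x`, the same two lines (`X = [g1, g2]` and `c = X\x`) give the estimates.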
Q12- The U.S. census data from 1790 to 1990 are stored in the file census.dat, which is supplied with MATLAB. Type load census to load this file. The first column, cdate, contains the years, and the second column, pop, contains the population in millions. Use the Basic Fitting interface to solve this problem. First try to fit a cubic polynomial to the data. If you get a warning message, center and scale the data by checking Center and scale x data in the interface, and fit a fourth-degree polynomial $p_1z^4 + p_2z^3 + p_3z^2 + p_4z + p_5$, where $z$ is the centered and scaled year. The coefficient $p_1$ is:
1. 0.6
2. 0.7
3. 0.8
4. 0.9
Q13- Consider the following data. Find the best-fit line $y = mx + n$ that passes through the point $(x_0, y_0) = (10, 11)$. The values of $m$ and $n$ are:
x 0 5 10
y 2 6 11
1. m=1.1673, n=0.4234
2. m=1.2567, n=0.6575
3. m=1.0849, n=0.1509
4. m=1.3443, n=0.1172