Department of Metallurgical Engineering and Materials Science, IIT Bombay
In this tutorial we will learn how to perform data analysis using MATLAB. We start by fitting
data to polynomial curves, then learn how to fit data to a generic user-defined function, and
finally learn how to read/import data into MATLAB.
Let us create two vectors x and y1 containing the data and plot y1 against x using the symbol *:
>> plot(x,y1,'*','LineWidth',12);
>> hold on;
Now we will use the function polyfit to calculate the coefficients of a linear fit (using
least-squares regression) to the above data by calling polyfit(x,y1,1).
Here the parameter 1 indicates that the data is to be fit to a 1st-order polynomial. An nth-order
polynomial in MATLAB has the following form: $p(x) = p_1 x^n + p_2 x^{n-1} + \dots + p_n x + p_{n+1}$. We
therefore have 2 coefficients for a 1st-order polynomial. These coefficients are stored in
coeffs1.
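As a minimal sketch of the polyfit call, the following uses made-up data lying close to the line y = 2x + 1. The data values and the names slope and intercept are assumptions for illustration only; they are not the tutorial's data set.

```matlab
% Hypothetical data lying close to the line y = 2x + 1 (not the tutorial's data)
x  = [0 1 2 3 4 5];
y1 = [1.1 2.9 5.2 6.8 9.1 10.9];

% Least-squares fit to a 1st-order polynomial
coeffs1 = polyfit(x, y1, 1);

% Coefficients are ordered from the highest power down:
% coeffs1(1) multiplies x (slope), coeffs1(2) is the constant term (intercept)
slope     = coeffs1(1);
intercept = coeffs1(2);
fprintf('slope = %.3f, intercept = %.3f\n', slope, intercept);
```

The ordering matters when the coefficients are read by hand: the first element always belongs to the highest power of x.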
Now we will use the function polyval to evaluate the polynomial with coefficients coeffs1 at
each value of x, and store the result in another vector y2.
>> y2 = polyval(coeffs1,x);
>> plot(x,y2,'Color','r');
MM612: Tutorial 2 1
This superimposes the plot of y2 vs. x on the earlier plot. As can be seen, the linear
fit does not represent the above data very well. We now fit the same data to a 2nd-order
polynomial by calling polyfit with the parameter 2.
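A minimal sketch of the second-order fit, again with made-up data, might look as follows. The data values and the names coeffs2, y3, ssr1 and ssr2 are assumptions for illustration. The sum of squared residuals is used here as one simple way to compare the two fits numerically.

```matlab
% Hypothetical data with visible curvature (not the tutorial's data)
x  = [0 1 2 3 4 5];
y1 = [0.1 1.2 3.9 9.1 15.8 25.2];

% 1st- and 2nd-order least-squares fits
coeffs1 = polyfit(x, y1, 1);
coeffs2 = polyfit(x, y1, 2);

% Evaluate both polynomials at the data points
y2 = polyval(coeffs1, x);
y3 = polyval(coeffs2, x);

% Sum of squared residuals: a smaller value means a closer fit
ssr1 = sum((y1 - y2).^2);
ssr2 = sum((y1 - y3).^2);
fprintf('SSR linear = %.4f, SSR quadratic = %.4f\n', ssr1, ssr2);
```

The fitted curve can then be added to the existing figure with plot(x,y3), in the same way as the linear fit.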
We see that the 2nd-order polynomial fits the given data much better! Please
save this plot to the file plot1.png using:
>> print('plot1','-dpng');
It is likely that in many cases the generic polynomial expression in MATLAB may not yield a
satisfactory fit to the data. For example, the data might give a better fit to trigonometric or
exponential functions. We need to define our own functions in such situations.
We will create a script and use another data set. All commands in this lesson will be written in
the script example_fit.m:
function example_fit ()
% Name:
% Roll no.:
x = [0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0];
y = [0.02 0.15 0.3 0.5 0.75 1.1 1.5 1.9 2.4 2.9];
L2.1: Let us assume that this data represents a physical process which can be described by
the equation $y = a_1 x + a_2 x^2 + a_3 x e^{x}$. We need to find the parameters $a_1$, $a_2$, and $a_3$ by
fitting y to x. Because the model is linear in the parameters, this is a system of linear
equations of the form [data][a] = [y],
where data is a 10x3 matrix with the elements data(i,1) = x(i), data(i,2) =
x(i)^2, data(i,3) = x(i)*exp(x(i)), a is a 3x1 matrix of the coefficients of the above
equation, and y is the data declared above written as a 10x1 column. With 10 equations and
only 3 unknowns, the system is overdetermined and is solved in the least-squares sense.
data(:,1) = x(:);
data(:,2) = x(:).^2;
data(:,3) = x(:).*exp(x(:));
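Note the : operator: data(:,1) refers to all rows of the first column of data, and x(:)
returns x as a column vector. We can solve the set of linear equations simply by using:
a = data\y';
Here \ is the operator used to solve a system of linear equations. It is the same as mldivide. a
= data\y solves the system of linear equations data*a = y; for an overdetermined system such
as this one it returns the least-squares solution. The transpose y' is needed because y was
declared as a row vector.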
Now we will create another vector y1 which has the values of the above expression using the
fitting coefficients a. We plot the original data using symbols and the fitted curve using a line.
Note that this particular technique worked only because we could reduce our model to a system
of linear equations. This might not work if we have a more complicated model form!
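The linear fit described above can be sketched end to end as follows, using the data from example_fit.m. The variable ssr and the printed summary are additions for illustration and are not part of the original script.

```matlab
% Data from example_fit.m
x = [0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0];
y = [0.02 0.15 0.3 0.5 0.75 1.1 1.5 1.9 2.4 2.9];

% Design matrix: one column per term of y = a1*x + a2*x^2 + a3*x*exp(x)
data = [x(:), x(:).^2, x(:).*exp(x(:))];

% Least-squares solution of the overdetermined system data*a = y
a = data \ y(:);

% Fitted values at the data points and the sum of squared residuals
y1  = data * a;
ssr = sum((y(:) - y1).^2);
fprintf('a = [%.4f %.4f %.4f], SSR = %.4g\n', a, ssr);
```

Building the design matrix in one statement avoids growing data column by column, and y(:) gives a column vector regardless of how y was declared.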
L2.2: Now we will fit the same data to a non-linear equation of the form
$y = p_1 (x - p_2)^{p_3}$. We do a least-squares regression fit to find the parameters $p_1$, $p_2$, and $p_3$. We
first define the function $f = \sum_i \left( y_i - p_1 (x_i - p_2)^{p_3} \right)^2$, where the sum runs over all
data points, denoted by the index i. We minimize the value of this
function using the function fminsearch. First we
create an anonymous function f of the parameter vector p. We provide the initial guess values of $p_1$, $p_2$,
and $p_3$ in the vector pguess. The call to fminsearch also returns the value of f at the
minimum (the sum of squared residuals), which is stored in fminres. Once we obtain the values of p, we create another vector
y2 generated using the above equation and then plot it. We save the plot as plot2.png.
plot(x,y2,'Color','g');
legend('Data','Fit1','Fit2');
hold off;
print('plot2','-dpng');
Note that the regression fit is very sensitive to the initial guess values and they must be chosen
properly!
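The fminsearch step can be sketched as follows, using the data from example_fit.m and the initial guess that appears in example_fit2.m. The helper name model and the use of abs() in the objective are assumptions of this sketch: abs() keeps the objective real if x - p(2) becomes negative during the search and the power returns complex values.

```matlab
% Data from example_fit.m
x = [0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0];
y = [0.02 0.15 0.3 0.5 0.75 1.1 1.5 1.9 2.4 2.9];

% Model y = p1*(x - p2)^p3 with p = [p1 p2 p3]
model = @(p) p(1) * ((x - p(2)).^p(3));

% Objective: sum of squared residuals over all data points.
% abs() is a guard added in this sketch so that f stays real-valued.
f = @(p) sum(abs(y - model(p)).^2);

% Initial guess for [p1 p2 p3]
pguess = [0.1 0.1 2];

% Minimise f; fminres is the value of f at the returned p
[p, fminres] = fminsearch(f, pguess);

% Fitted curve at the data points
y2 = model(p);
fprintf('p = [%.3f %.3f %.3f], f(p) = %.4g\n', p, fminres);
```

Rerunning the sketch with a different pguess is a quick way to see how strongly the result depends on the starting point.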
We often deal with large amounts of data written in text files. We need to read these data files and
import the data into MATLAB in order to work with them. MATLAB provides functions to
easily read data files. Here we will only learn about one of them: importdata. We will repeat
the same exercise as above. Instead of assigning the values to x and y within the script, we will
read them in from the file testfile.txt. We create a second script file, example_fit2.m.
function example_fit2 ()
% Name:
% Roll no.:
A = importdata('testfile.txt');
x = A(:,1);
y = A(:,2);
data(:,1) = x(:);
data(:,2) = x(:).^2;
data(:,3) = x(:).*exp(x(:));
a = data\y;
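y1 = data*a;                 % fitted values from the linear model
plot(x,y,'*');               % original data as symbols
hold on;
plot(x,y1,'Color','r');      % linear-model fit as a line
% Sum of squared residuals for the model y = p1*(x - p2)^p3
f = @(p) sum((y - p(1)*((x - p(2)).^p(3))).^2);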
pguess = [0.1 0.1 2];
[p,fminres] = fminsearch(f,pguess);
y2 = p(1)*((x - p(2)).^p(3));
plot(x,y2,'Color','g');
legend('Data','Fit1','Fit2');
hold off;
print('plot2','-dpng');
The data is read in at the beginning of the script into the matrix A. We then assign x as the 1st
column of A and y as the 2nd column of A. Everything else remains the same as earlier.
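To try importdata without an existing data file, the following sketch first writes the data from example_fit.m to a temporary two-column text file and then reads it back. The file name testfile_sketch.txt and the names xr and yr are hypothetical; the layout of the actual testfile.txt is assumed to be two whitespace-separated numeric columns.

```matlab
% Data from example_fit.m
x = [0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0];
y = [0.02 0.15 0.3 0.5 0.75 1.1 1.5 1.9 2.4 2.9];

% Write one "x y" pair per line to a temporary file (hypothetical name)
fname = fullfile(tempdir, 'testfile_sketch.txt');
fid = fopen(fname, 'w');
fprintf(fid, '%g %g\n', [x; y]);   % fprintf walks the 2x10 matrix column by column
fclose(fid);

% Read the file back; a purely numeric file comes back as a numeric matrix
A  = importdata(fname);
xr = A(:,1);   % first column: x values
yr = A(:,2);   % second column: y values

% Remove the temporary file
delete(fname);
fprintf('Read %d rows and %d columns\n', size(A,1), size(A,2));
```

Because the columns of A are column vectors, y can be passed to data\y directly, without the transpose used in example_fit.m.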
Assignments
Assignment 1:
Create a MATLAB function called MM612A3 with the file name MM612A3.m.
You have been given the data for the experimentally measured lattice parameter (via neutron
diffraction) as a function of temperature for a certain material in the file
lattice_strain_data.txt. Import this data into MATLAB. The first column in this data
file gives the temperature T (in Kelvin), while the second column gives the lattice parameter aT
(in Angstroms).
(a) Calculate the strain corresponding to each temperature using the equation
$\varepsilon_T = (a_T - a_{293})/a_{293}$, where $a_{293}$ is the lattice parameter at 293 K, given in the
first row of the data.
(b) The coefficient of thermal expansion is generally given by $\mathrm{CTE} = d\varepsilon/dT$. From the above
data, calculate the CTE for this material. Plot this CTE data as a function of temperature using
the symbol *.
(c) Now fit this CTE to the polynomial expression $\mathrm{CTE} = p_1 T^2 + p_2 T + p_3$ (CTE in 1/K, T in K) using
the function polyfit. Plot this fitted CTE as a function of temperature using a solid red line.
(d) For this material, the theoretical value of instantaneous CTE is given by the expression
$\mathrm{CTE} = 9.472 \times 10^{-6} + 2.062 \times 10^{-8}\,T - 8.934 \times 10^{-12}\,T^2$ (CTE in 1/K, T in K). Plot the theoretical value
of CTE as a function of temperature on the same plot using a solid green line.
Please label your axes properly and also give the legend for the three plots. Save the plot as
MM612A3.png.
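As a hint for part (d), the theoretical expression can be evaluated with polyval once its coefficients are arranged from the highest power down. This is only a sketch: the name cte_theory and the temperature grid are assumptions, and the grid is not the temperature range of the data file.

```matlab
% Coefficients of the theoretical CTE, ordered from T^2 down to the constant term
cte_theory = [-8.934e-12, 2.062e-8, 9.472e-6];

% Hypothetical temperature grid in kelvin (not taken from the data file)
T = linspace(293, 1000, 50);

% Theoretical CTE (in 1/K) at each temperature
cte = polyval(cte_theory, T);
fprintf('CTE at %.0f K = %.4e /K\n', T(1), cte(1));
```

Note that the expression in part (d) lists the constant term first, whereas polyfit and polyval expect the highest power first.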