PLSR Using MATLAB
Samantha Clayton
April 14, 2019
1. Click on the Home tab in MATLAB. Press the “Import Data” button and select the dataset you would like to use.
2. The dataset opens in the Import Tool. Select the data you would like to use, then press the “Import Selection” button.
The output type options are table, column vectors, numeric matrix, cell array, or string array depending on what you
are importing. I suggest using a string array for this process to retain the column labels (if your dataset has them) and
to more easily manipulate them. If you select something other than a table, make sure that all the rows and columns
you wish to import are selected, as MATLAB may not select them by default.
a. You can also click on the arrow on the Import Selection button to get a drop-down menu. Choose Generate Script or
Generate Function to create code that repeats the import if you need to run it again.
b. Alternatively, save the workspace after importing all of the files you need, then load the workspace instead of
importing the data again when you rerun the analysis later.
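The save-and-reload option in step 2b can be sketched as follows. The variable names Xmatrix and Ymatrix match the sample code later in this guide; the file name plsr_inputs.mat and the toy contents are assumptions for illustration only.

```matlab
% Toy stand-ins for the imported string arrays (assumed contents).
Xmatrix = ["" "gene1" "gene2"; "s1" "0.5" "1.2"; "s2" "0.9" "0.3"];
Ymatrix = ["" "resp1"; "s1" "2.0"; "s2" "1.5"];

% Save only the imported variables to a MAT-file (hypothetical file name).
mat_file = fullfile(tempdir, 'plsr_inputs.mat');
save(mat_file, 'Xmatrix', 'Ymatrix');

% In a later session, reload the variables instead of re-importing.
clear Xmatrix Ymatrix
loaded = load(mat_file);
Xmatrix = loaded.Xmatrix;
Ymatrix = loaded.Ymatrix;
```

Saving named variables rather than the whole workspace keeps the file small and avoids restoring stale intermediate results.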
3. The dataset will be imported into MATLAB as the data output type you selected with the same name as the original
file. If the data is in two separate files, repeat the previous steps for the second file. The remaining steps and sample
code below assume that the data was imported as string arrays named Xmatrix and Ymatrix, with column labels in the first row and row labels in the first column. Separate the data and labels:
X = str2double(Xmatrix(2:end, 2:end));
Y = str2double(Ymatrix(2:end, 2:end));
X_col_labels = cellstr(Xmatrix(1,2:end));
X_row_labels = cellstr(Xmatrix(2:end,1));
Y_col_labels = cellstr(Ymatrix(1,2:end));
Y_row_labels = cellstr(Ymatrix(2:end,1));
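Before moving on, it may help to confirm that the separated arrays line up. The sketch below uses toy string arrays (the labels and values are made up) and checks that X and Y have matching rows and that every numeric cell parsed correctly.

```matlab
% Toy string arrays laid out like the imported files (contents are assumed):
% first row = column labels, first column = row labels.
Xmatrix = ["sample" "pERK" "pAKT"; "ctrl" "0.5" "1.2"; "drug" "0.9" "0.3"];
Ymatrix = ["sample" "viability"; "ctrl" "1.0"; "drug" "0.4"];

X = str2double(Xmatrix(2:end, 2:end));
Y = str2double(Ymatrix(2:end, 2:end));
X_row_labels = cellstr(Xmatrix(2:end, 1));
Y_row_labels = cellstr(Ymatrix(2:end, 1));

% plsregress needs one row of Y for every row of X, in the same order.
assert(size(X, 1) == size(Y, 1), 'X and Y must have the same number of rows.');
assert(isequal(X_row_labels, Y_row_labels), 'Row labels of X and Y do not match.');

% str2double returns NaN for text it cannot parse, so a NaN here usually
% means a label or an empty cell was included in the numeric selection.
n_bad = nnz(isnan(X)) + nnz(isnan(Y));
fprintf('Non-numeric entries found: %d\n', n_bad);
```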
4. If the x and y data are in the same table, split the array into separate x and y arrays:
% last_x_col and first_y_col are column indices into Data (after the label column is removed)
% and need to be defined based on your input
Data = str2double(data_matrix(2:end,2:end));
X = Data(:,1:last_x_col);
Y = Data(:,first_y_col:end);
X_col_labels = cellstr(data_matrix(1,2:last_x_col+1));
Y_col_labels = cellstr(data_matrix(1,first_y_col+1:end));
X_row_labels = cellstr(data_matrix(2:end,1)); % x and y share the same rows
Y_row_labels = X_row_labels;
5. The data needs to be normalized before using PLSR. The zscore function does this by centering each column to zero mean and scaling it to unit standard deviation.
z_x = zscore(X);
z_y = zscore(Y);
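The plotting steps below rely on ncomp, pctvar, and x_scores, which are produced by a call to plsregress (Statistics and Machine Learning Toolbox). A minimal sketch of such a call follows, assuming random placeholder data and an arbitrarily chosen ncomp of 3; the output names match those used in the cross-validation step.

```matlab
rng(0);                        % reproducible placeholder data (assumption)
X = randn(20, 6);              % 20 observations, 6 predictors
Y = X(:, 1:2) * [1; -0.5] + 0.1 * randn(20, 1);   % one response

z_x = zscore(X);               % each column: mean 0, standard deviation 1
z_y = zscore(Y);

ncomp = 3;                     % number of PLS components (assumed value)
[x_loadings, y_loadings, x_scores, y_scores, beta, pctvar] = ...
    plsregress(z_x, z_y, ncomp);

% pctvar is 2-by-ncomp: row 1 is the fraction of variance explained in X,
% row 2 the fraction explained in Y, one column per component.
```

The value of ncomp cannot exceed the number of predictors or the number of observations minus one, so choose it with the size of your data in mind.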
Displaying outputs:
7. Use the example code below to plot the cumulative percent variance in y explained by the components (R2Y):
figure;
plot(1:ncomp,cumsum(100*pctvar(2,:)),'-bo');
xlabel('Components')
ylabel('Percent Variance Explained in Y')
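The plotted curve is the running sum of the second row of pctvar. As a sketch of how to read it numerically, the following uses made-up pctvar values (not results from any real data set) and a hypothetical 90% threshold to find the smallest number of components that reaches it.

```matlab
% Made-up values for illustration only: row 1 = X, row 2 = Y.
pctvar = [0.40 0.25 0.15 0.05; ...
          0.60 0.25 0.08 0.02];

R2Y = cumsum(100 * pctvar(2, :));      % cumulative percent variance in Y
threshold = 90;                        % hypothetical cutoff, in percent
n_needed = find(R2Y >= threshold, 1);  % first component count reaching it

fprintf('Cumulative R2Y: %s\n', mat2str(R2Y, 4));
fprintf('Components needed for %d%%: %d\n', threshold, n_needed);
```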
Lazzara Lab
8. Use the example code below to create a scores plot (shown here for the first two components):
figure;
xlabel('Component 1')
ylabel('Component 2')
hold on
x = x_scores(:,1);
y = x_scores(:,2);
scatter(x,y);
xlim([-1 1]); % adjust these limits if your scores fall outside [-1, 1]
ylim([-1 1]);
box on
hax = gca;
line([0 0],get(hax,'YLim'),'Color','k','LineStyle','--') % dashed vertical line at x = 0
hline = refline([0 0]); hline.Color = 'k'; hline.LineStyle = '--'; % dashed horizontal line at y = 0
labels = X_row_labels;
dx = 0.02; dy = 0.02; % displacement so the text does not overlay the data points
text(x+dx, y+dy, labels, 'FontSize', 10, 'Interpreter', 'none'); % labeling points
This portion deals with duplicate values in the data before labeling. It is not necessary if there are none.
The following portion offsets the labels of points that share the same value so that the labels do not overlap. It is
also unnecessary if there are no duplicate values.
for i = 1:size(duplicate_value,1) - 1
    text(x_dup(i+1)+dx, y_dup(i+1)+dy-2.5*i*dy, X_col_labels(duplicate_ind(i+1)), 'FontSize', 10, 'Interpreter', ...
        'none','Color','k')
end
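The fixed axis limits of [-1 1] in the scores plot can clip points when scores are larger in magnitude. One possible alternative, sketched here with placeholder scores (assumed values, not from any data set), is to derive symmetric limits from the data so the origin stays centered.

```matlab
% Placeholder scores standing in for x_scores from plsregress (assumed values).
x_scores = [0.8 -0.2; -1.6 0.4; 0.3 1.1; 0.5 -1.3];

xy = x_scores(:, 1:2);
lim = 1.1 * max(abs(xy(:)));   % 10% margin around the largest absolute score

figure;
scatter(xy(:, 1), xy(:, 2));
xlim([-lim lim]);              % symmetric limits keep (0,0) in the center
ylim([-lim lim]);
```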
Cross-validation
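10. To perform a PLSR calculation with cross-validation, add the 'CV' and 'mcreps' options to the plsregress command, where ncv is the number of cross-validation folds and mcreps is the number of Monte Carlo repetitions:
[x_loadings, y_loadings, x_scores, y_scores, beta, pctvar, mse, stats] = plsregress(z_x, z_y, ncomp, 'CV', ...
    ncv, 'mcreps', mcreps);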
% Calculate Q2Y values for each model component using mean squared errors from cross-validation.
% For this particular data set, the Q2Y values will be negative, indicating that the PLSR model
% developed here is not a predictive one.
Q2Y = 1 - mse(2,2:end)/sum(sum((z_y-mean(z_y)).^2)./size(z_y,1));
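To make the Q2Y expression concrete, here is a sketch with made-up cross-validated errors (illustrative numbers, not results from any data set). It assumes the documented layout of mse from plsregress, in which row 2 holds the response errors and column j corresponds to j-1 components, so the first column is the zero-component baseline.

```matlab
% Toy response, z-scored by hand so the sketch needs no toolbox.
Y = [1; 2; 4; 7];
z_y = (Y - mean(Y)) ./ std(Y);

% Made-up cross-validated MSE: row 1 = X, row 2 = Y;
% columns correspond to 0, 1, and 2 components.
mse = [5.00 3.00 2.00; ...
       0.75 0.30 0.90];

% Total variance of the response (sum over columns of the mean squared deviation).
total_var = sum(sum((z_y - mean(z_y)).^2) ./ size(z_y, 1));

% One Q2Y value per component count (1 and 2 components here).
Q2Y = 1 - mse(2, 2:end) / total_var;
disp(Q2Y)
```

A component count whose cross-validated error exceeds the total variance of the response gives a negative Q2Y, meaning the model predicts held-out data worse than simply using the mean of the response.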