A Guided Tour To Machine Learning Using MATLAB
Introduction
• This document guides you through several tutorials, papers, and resources related to
Machine Learning (with emphasis on image and vision tasks) using MATLAB.
• It assumes no prior exposure to Machine Learning or MATLAB.
• It is structured as a step-by-step guide. It is best that you follow it in the intended
sequence.
2. Run the example file dtIris.m, paying attention to the following aspects:
a. How to load a dataset (in this case, it's already available in .mat format)
b. How to plot different views of the dataset (whenever feasible) in order to better
understand the data
c. How to create a decision tree, view it, and use it to make a prediction using unseen
data
d. How to compute resubstitution error of the resulting classification tree
e. How to compute cross-validation accuracy
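If you would like a rough idea of what those steps look like in code, here is a minimal sketch (not the dtIris.m file itself; it assumes the Statistics and Machine Learning Toolbox is installed):
load fisheriris                            % (a) load the built-in iris dataset (meas, species)
gscatter(meas(:,3), meas(:,4), species)    % (b) plot petal length vs. petal width, colored by class
xlabel('Petal length (cm)'); ylabel('Petal width (cm)');
tree = fitctree(meas, species);            % (c) train a classification tree
view(tree, 'mode', 'graph');               %     view the tree
predict(tree, [5.0 3.5 1.4 0.2])           %     predict the class of one unseen measurement
resubLoss(tree)                            % (d) resubstitution error of the tree
cvTree = crossval(tree, 'KFold', 5);       % (e) 5-fold cross-validation
1 - kfoldLoss(cvTree)                      %     cross-validation accuracy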
3. Run the examples in the 'Regression_Demos' subfolder. They are also available at:
http://www.mathworks.com/matlabcentral/fileexchange/35789-new-regression-
capabilities-in-r2012a
a. (OPTIONAL, but recommended) Watch the associated webinar / video:
https://www.mathworks.com/videos/regression-analysis-with-matlab-new-
statistics-toolbox-capabilities-in-r2012a-81869.html
b. Explore the examples following this sequence: StraightLine.m, CurvesSurfaces.m,
and NonLinear.m. (Skip the Housing.m, Model.m and the GLMs.m examples)
c. Don’t be intimidated or discouraged by the rich amount of information available
in some MATLAB objects, e.g., LinearModel.
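To give a flavor of what those demos cover, here is a minimal straight-line fit in the spirit of StraightLine.m (the data below is made up purely for illustration):
x = (1:10)';                    % hypothetical predictor values
y = 2*x + 1 + randn(10,1);      % hypothetical noisy response
mdl = fitlm(x, y)               % fit a linear model; mdl is a LinearModel object
mdl.Coefficients                % estimated intercept and slope, with statistics
plot(mdl)                       % data, fitted line, and confidence bounds
predict(mdl, 11)                % predicted response for an unseen x value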
1. Dataset:
In this example, we will use Fisher’s Iris dataset.
This is a sample dataset included in the MATLAB Statistics and Machine Learning Toolbox.
You can find all sample datasets at:
https://www.mathworks.com/help/stats/_bq9uxn4.html
You can view the datasets loaded into the workspace by double-clicking the variable name in the
Workspace window.
(Note that your window layout may differ from the screenshot below.)
In this example, meas is a 150-by-4 double matrix. There are 150 rows, each representing one
instance, and 4 columns storing attribute information (column 1: sepal length in cm;
column 2: sepal width in cm; column 3: petal length in cm; column 4: petal width in cm).
The class label for each instance is stored in a separate 150-by-1 cell array called “species”. In this case,
the first 50 instances belong to class Setosa, the following 50 to class Versicolor, and
the last 50 to class Virginica.
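You can verify these dimensions yourself at the command prompt (assuming the toolbox sample data is on your path):
load fisheriris                    % loads meas (150x4 double) and species (150x1 cell)
size(meas)                         % returns 150 4
unique(species)                    % 'setosa', 'versicolor', 'virginica'
sum(strcmp(species, 'setosa'))     % 50 instances per class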
At the MATLAB command prompt, read the iris data into a table for use with the Classification
Learner app:
fishertable = readtable('fisheriris.csv');
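If fisheriris.csv is not on your path, one possible alternative (an assumption, not part of the original instructions) is to build an equivalent table from the .mat version of the data:
load fisheriris
fishertable = array2table(meas, 'VariableNames', ...
    {'SepalLength','SepalWidth','PetalLength','PetalWidth'});   % variable names are illustrative
fishertable.Species = species;     % add the class labels as the response column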
Open the Classification Learner app in either of the following ways:
a) MATLAB Toolstrip: On the APPS tab, under Math, Statistics and Optimization, click the
Classification Learner app icon (see screenshot below).
b) MATLAB command prompt: type classificationLearner
5. On the Classification Learner tab, in the File section, click New Session. (see screenshot
below)
6. In the New Session dialog box, select the table fishertable from the workspace list.
Note: If you completed optional step 2, you may also see meas in the dialog; make sure
fishertable is selected.
Observe that the app has selected response and predictor variables based on their data
type. Petal and sepal length and width are predictors, and species is the response that you
want to classify. For this example, do not change the selections.
7. Accept the default validation option (5-fold cross-validation) and continue by clicking Start
Session. You will see a session similar to the following screenshot.
8. Choose a classification model. In this case, we shall use a simple decision tree.
To create a classification tree model, on the Classification Learner tab, in the Classifier
section, click the down arrow to expand the gallery and click Simple Tree. Then disable
the 'Use Parallel' button (if it's set to ON) and click Train.
9. Examine results
The Simple Tree model is now in the History list. The model validation score is in the
Accuracy box. This number may be slightly different in your case.
Examine the scatter plot. An X indicates misclassified points. The blue points (setosa
species) are all correctly classified, but some of the other two species are misclassified.
Under Plot, switch between the Data and Model Predictions options. Observe the color of
the incorrect (X) points. Alternatively, while plotting model predictions, to view only the
incorrect points, clear the Correct check box.
On the Classification Learner tab, in the Plots section, click Confusion Matrix or ROC Curve
to generate the corresponding plot. Each plot opens in a separate tab. See representative
screenshots on the next page.
Experiment with changing the settings in each Plot section to fully examine how the
currently selected classifier performed in each class.
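If you later want the same diagnostics outside the app, the toolbox offers command-line counterparts. Here is a minimal sketch (it trains its own cross-validated tree on meas and species purely for illustration):
load fisheriris
cvTree = fitctree(meas, species, 'CrossVal', 'on', 'KFold', 5);
[pred, scores] = kfoldPredict(cvTree);      % cross-validated predictions and class scores
confusionmat(species, pred)                 % rows = true class, columns = predicted class
[fpr, tpr, ~, auc] = perfcurve(species, scores(:,1), 'setosa');   % ROC treating setosa as the positive class
plot(fpr, tpr); xlabel('False positive rate'); ylabel('True positive rate');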
1 Technically, you don't know what a kNN classifier is, since we haven't covered it in class (yet).
But that's on purpose! My goal is to show that you can pick other classifiers, train them, and
'play' with their parameters rather easily, even if you don't quite know what is "inside the box".
After performing Feature Selection, a new model will appear on the left-hand side of the app.
You should then train it and compare the accuracy results (as well as confusion matrix, ROC
curve, AUC, etc.) against the previously trained models. MATLAB will indicate the best model so
far by highlighting the highest accuracy values.
See screenshot below (obtained after trying 5 variants of decision trees and 2 variants of kNN).
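The same kind of comparison can also be done programmatically. A minimal sketch (the choice of k = 5 neighbors is arbitrary, purely for illustration):
load fisheriris
treeAcc = 1 - kfoldLoss(fitctree(meas, species, 'CrossVal', 'on', 'KFold', 5));
knnAcc  = 1 - kfoldLoss(fitcknn(meas, species, 'NumNeighbors', 5, 'CrossVal', 'on', 'KFold', 5));
fprintf('Tree accuracy: %.3f   kNN accuracy: %.3f\n', treeAcc, knnAcc);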
If the exported model is a decision tree called trainedTreeClassifier, use the following line at the
MATLAB command prompt to view the resulting model (tree):
view(trainedTreeClassifier.ClassificationTree,'mode','graph');
You can use the exported classifier to make predictions on new data. For example, to make
predictions for the fishertable data in your workspace, enter:
yfit = trainedClassifier.predictFcn(fishertable)
The output yfit contains a class prediction for each data point. See the Command Window
screenshot below:
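Because fishertable also contains the true labels, you can compute a quick sanity-check accuracy from yfit. This assumes the class column is named Species, as in fisheriris.csv, and note that this is resubstitution accuracy, not the cross-validated figure reported by the app:
yfit = trainedClassifier.predictFcn(fishertable);
accuracy = mean(strcmp(yfit, fishertable.Species))    % fraction of rows classified correctly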
MathWorks provides many detailed examples; for more information, you can refer to these links:
https://www.mathworks.com/help/stats/train-decision-trees-in-classification-learner-app.html
http://www.mathworks.com/help/stats/train-logistic-regression-classifiers-in-classification-learner-app.html