Machine Learning in Matlab - Module 1
Data Types
This course serves as a review of a selection of different data types available in MATLAB. In
particular, the focus of the following lessons will be on the data types:
– Tables
– Categoricals
o structure
o cell
o logical
o datetime
o function handle
o categorical
Task 4
Use the whos command to see the variables in the workspace and compare their
properties.
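As a rough sketch of this task, the block below creates one variable of each data type listed above and then calls whos. The variable names and values are invented for illustration and are not part of the course data.

```matlab
% One hypothetical variable per data type from the list above.
s  = struct('name', 'Ann');       % structure
c  = {1, 'two'};                  % cell
tf = true;                        % logical
d  = datetime(2020, 1, 1);        % datetime
f  = @sin;                        % function handle
g  = categorical({'a'; 'b'});     % categorical

% whos lists the name, size, bytes and class of each workspace variable.
whos
```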
Specialized Data Types
Statistics and Machine Learning Toolbox provides specialized data types for storing
information about specific statistical and machine learning constructs such as linear regression
models, probability distributions, and classification trees. These data types provide a way to
combine the data (properties) and the functionality (methods) relevant to the specific
application. For example, the properties for a linear model variable include the coefficients
and residuals. The functionality includes predicting values from that model.
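The split between properties and methods might look like this for a linear model. This is a minimal sketch that assumes the fitlm function from Statistics and Machine Learning Toolbox; the data values are invented for illustration.

```matlab
% Hypothetical data: y depends roughly linearly on x.
x = (1:10)';
y = 2*x + 1 + [0.1 -0.2 0.05 0 -0.1 0.2 -0.05 0.1 0 -0.1]';

% fitlm returns a LinearModel object.
lm = fitlm(x, y);

% Properties hold the data: estimated coefficients and residuals.
coeffs = lm.Coefficients.Estimate;   % 2-by-1: intercept, then slope
res    = lm.Residuals.Raw;           % one raw residual per observation

% Methods provide the functionality: predict the response at new x values.
yNew = predict(lm, [11; 12]);
```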
Task 2
To see a property’s value, use dot notation.
>> PropValue = mdl.PropName
Set k equal to the value of the NumNeighbors property of mdl.
Task 3
You can specify property values when you create the classifier.
>> knnMdl = fitcknn(tableData,'Response','PropertyName',PropertyValue);
Create a k-nearest neighbor classifier named mdl where the NumNeighbors property is set to 3.
Further Practice
You can use the fitcknn function on the table groupData with a response named group. Set
the property name to 'NumNeighbors' and the property value to 3.
>> properties(mdl)
>> mdl.PropertyName
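The pieces above could be combined as in the following sketch. The small table here is a hypothetical stand-in for groupData (the real data comes from the course download), and fitcknn requires Statistics and Machine Learning Toolbox.

```matlab
% Hypothetical stand-in for groupData: two predictors and a response named group.
x = [1.0; 1.2; 0.9; 3.0; 3.1; 2.9];
y = [1.1; 0.8; 1.0; 3.2; 2.9; 3.0];
group = categorical({'a'; 'a'; 'a'; 'b'; 'b'; 'b'});
groupData = table(x, y, group);

% Set NumNeighbors at creation time with a name-value pair.
mdl = fitcknn(groupData, 'group', 'NumNeighbors', 3);

% Read the property back with dot notation.
k = mdl.NumNeighbors;

% List every property of the classifier.
properties(mdl)
```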
Task 2
Try to create a variable named first3 containing the first three rows of data.
Task 3
Replace the second row of data with the values 72.1 and 181.4.
Task 4
Create a variable named height from the first column of data.
Linear Indexing
Create the matrix:
data = [83; 80; 74; 81; 78; 90; 34; 25];
Tasks
Task 1
Replace the third element with 73.8.
Task 2
Overwrite the last three elements so that they all contain the value 80.
Task 3
Increase the first two values in height by 0.4.
Logical Indexing
Create the matrix:
data = [83; 80; 74; 81; 78; 90; 34; 25];
Tasks
Task 1
Replace all elements greater than 80 with 81.
Table Properties
Before you start with the tasks, download the dataset, then run the following code in
MATLAB:
playerInfo = readtable('bball.txt');
Tasks
Task 1
You can access the metadata of a table named t using the Properties property and dot
notation.
>> t.Properties
Try to create a variable named props containing the table properties of playerInfo.
Task 2
You can access a specific property of a table named t using the following syntax.
>> t.Properties.PropName;
Try to save the variable names in playerInfo to a variable named v.
Task 3
To reassign a value in a cell array, use curly braces to index into it.
>> x{2} = 'NewName'
In order to reassign a variable name, you will need to index into that property of the table.
>> t.Properties.VariableNames{1} = 'HurrNum';
Give the first variable in playerInfo the name 'playerID'.
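The three steps above can be tried on a small table. The table t below is a hypothetical stand-in for playerInfo, with invented values.

```matlab
% Hypothetical two-variable table standing in for playerInfo.
t = table([101; 102], [78; 85], 'VariableNames', {'id', 'height'});

props = t.Properties;              % all table metadata
v = t.Properties.VariableNames;    % 1-by-2 cell array: {'id','height'}

% VariableNames is a cell array, so curly braces reach a single name.
t.Properties.VariableNames{1} = 'playerID';
```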
Further Practice
You may now use the Command Window to practice MATLAB commands, or move on to the
next section.
Inputs
'bballPlayers.txt'   Name of the file you would like to import, entered as a string.
Outputs
playerInfo           Name of the variable you would like to store your data to in MATLAB, returned as a table array.
1. Which command is the correct way to import the following file, bballStars.xls, as a
table?
o playerInfo = readtable(bballStars.xls)
o readtable(playerInfo,'bballStars.xls')
o readtable('bballStars.xls') = playerInfo
o playerInfo = importtable('bballStars.xls')
o playerInfo = readtable('bballStars.xls')
The correct syntax has the output on the left side, followed by the equals sign, then
the readtable function which contains the name of the file you want to import in
single quotes.
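To try this syntax without the course files, the sketch below first writes a small temporary text file and then imports it. The file contents and the variable fname are invented for illustration.

```matlab
% Build a small hypothetical file so the example is self-contained.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'bioID,height\n');
fprintf(fid, 'A01,78\n');
fprintf(fid, 'B02,85\n');
fclose(fid);

% Output on the left, then the equals sign, then readtable with the file name.
playerInfo = readtable(fname);

delete(fname);   % remove the temporary file
```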
Tasks
Task 1
Read the file bballStats.txt into a table named stats.
Linear Indexing
Each element is referred to by a single value representing its location in the array as it is
stored linearly in memory. Arrays are stored columnwise in MATLAB, so the linear index
increments down the columns. Providing only one (numeric) index indicates that linear
indexing is being used.
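A short sketch of column-major linear indexing on a matrix invented for illustration:

```matlab
A = [1 4 7; 2 5 8; 3 6 9];   % 3-by-3 example matrix

% Linear indices count down the first column, then the second, and so on.
a4 = A(4);             % 4: first element of the second column, same as A(1,2)
firstFive = A(1:5);    % 1-by-5 row vector: [1 2 3 4 5]

% Assignment through a linear index changes single elements in place.
A(end) = 0;            % last element in column-major order, A(3,3)
```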
Logical Indexing
Elements are referred to by a logical condition. A logical array is used as an index, and the
elements of the array where the index is true are referenced. Logical indices can be used in
either a linear or a row-column manner.
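Both uses of a logical index can be sketched as follows. The vector data is the one from the Logical Indexing exercise; the matrix M and the threshold values are chosen only for illustration.

```matlab
data = [83; 80; 74; 81; 78; 90; 34; 25];

% A relational test yields a logical array of the same size as data.
isLow = data < 50;          % true for the last two elements

% Indexing with the logical array selects the elements where it is true.
lowVals = data(isLow);      % [34; 25]

% The same mask can be used on the left-hand side to assign.
data(isLow) = 50;

% Row-column use: a logical vector selects whole rows of a matrix.
M = magic(4);
keepRows = M(:,1) > 5;      % logical row mask
sub = M(keepRows, :);       % rows whose first entry exceeds 5
```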
1. Which of the following commands will create matrix B given matrix A?
(Figure: matrices A and B)
o B = A;
B(1:5) = 10;
o B = A;
B(:,1) = 10;
o B = A;
B(B>6) = 10;
o B = A(1:5);
o B = A(:,1);
o B = A(A>6);
Only two of the options will create B. When you use linear indexing, the result is
returned in the same orientation as the input array. Therefore, if the input is a row
vector, the result will be a row vector.
1. What is the size and type of array name after executing the following command?
>> name = playerInfo{:,1:2}
(Figure: the playerInfo table)
o heights = playerInfo(:,3)
o heights = playerInfo(3,:)
o heights = playerInfo{:,3}
o heights = playerInfo.height
o heights = playerInfo{:,'height'}
You can access data in a table with dot notation or curly brackets. When using curly
brackets, you can index via variable name or column number.
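The access styles can be compared side by side. The table below is a hypothetical miniature of playerInfo with invented values; the parentheses case is included only for contrast, since it returns a table rather than the underlying data.

```matlab
% Hypothetical miniature of playerInfo.
firstName = {'Ann'; 'Bo'; 'Cy'};
lastName  = {'Lee'; 'Ray'; 'Fox'};
height    = [86; 80; 85];
playerInfo = table(firstName, lastName, height);

sub   = playerInfo(:, 3);          % parentheses: 3-by-1 table
h1    = playerInfo{:, 3};          % curly braces by column number: numeric vector
h2    = playerInfo{:, 'height'};   % curly braces by variable name
h3    = playerInfo.height;         % dot notation
names = playerInfo{:, 1:2};        % text variables extract as a 3-by-2 cell array
```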
3. (Select all that apply) Which of the following commands will store the names of the
players who are over 84 inches tall in a 2-by-2 cell array called overSeven?
o overSeven = playerInfo([1,3],1:2)
o overSeven = playerInfo{[1,3],1:2}
o overSeven = playerInfo([1,3],{'firstName','lastName'})
o overSeven = playerInfo{[1,3],{'firstName','lastName'}}
o over7ft = playerInfo([1,3],:)
o over7ft = playerInfo{[1,3],:}
o over7ft = playerInfo{[1,3],{'firstName','lastName','height','weight'}}
o over7ft = playerInfo{playerInfo.height > 84,{'firstName','lastName','height','weight'}}
o over7ft = playerInfo([1,3],{'firstName','lastName','height','weight'})
Task 2
Create a table named playerHt from the data in playerInfo that keeps only the
variables bioID and height, in that order.
>> names = playerInfo(:,{'firstName','lastName'})
Task 3
Create a numeric array named height that contains the height variable from the playerInfo table.
You can use the categories function to extract a cell array of strings of the categories
associated with a categorical array.
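A minimal sketch of the categories function, using position labels invented for illustration:

```matlab
% Hypothetical position labels stored as a cell array of text.
pos = {'guard'; 'center'; 'guard'; 'forward'};

c = categorical(pos);     % convert to a categorical array
cats = categories(c);     % cell array of the category names, sorted alphabetically
```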
Representing Discrete Categories – Ordinal Arrays
Ordinal Arrays
If your categories have an ordering to them, you can specify this in the categorical function.
To do this, pass two additional inputs: a list of the categories in increasing order and
the 'Ordinal' flag.
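As a sketch with invented size labels: in the categorical function, the 'Ordinal' flag is passed as a name-value pair with the value true.

```matlab
sizes = {'medium'; 'small'; 'large'; 'small'};   % hypothetical data

% Second input: the categories in increasing order; then 'Ordinal', true.
s = categorical(sizes, {'small', 'medium', 'large'}, 'Ordinal', true);

% Ordinal arrays support relational comparisons.
isBig = s > 'small';      % [true; false; true; false]
```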
Grouped Operations
Statistics and Machine Learning Toolbox provides a number of functions that are designed
specifically to work with categorical data.
For example, grpstats calculates statistics – the mean by default – for given variables,
grouped according to a grouping variable.
>> statarray = grpstats(tbl,groupVar,whichstats)
Output
statarray   Table with the group values for the summary statistic types specified in whichstats.
Input
tbl         Input table. If any variables are not numeric or logical (other than those
            specified in groupVar), you must use a name-value pair argument to specify
            the variables on which to perform the operations.
o mean
o max
o min
o numel
o sum
See the documentation for grpstats to determine the default operation.
Task 2
You can perform multiple statistics operations with one call to grpstats by supplying a third input
argument, a cell array containing the names of, or handles to, the desired functions. Make sure to
update the output variables accordingly.
>> stats = grpstats(table,'var',...
{'mean',@sum,'numel'})
Create a table ms containing the mean values and standard deviations for x and y grouped
by group in tbl.
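The calling patterns can be sketched on a small table. The table tbl below is hypothetical, with invented values, and grpstats requires Statistics and Machine Learning Toolbox. The statistics chosen here (mean and max) are for illustration only.

```matlab
% Hypothetical table with a grouping variable and two numeric variables.
group = categorical({'a'; 'a'; 'b'; 'b'});
x = [1; 3; 10; 20];
y = [2; 4; 6; 8];
tbl = table(group, x, y);

% Default statistic: the mean of each numeric variable per group.
m = grpstats(tbl, 'group');

% Several statistics at once: names and function handles in one cell array.
s = grpstats(tbl, 'group', {'mean', @max});
```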
Merging Data – Introduction
Merging Data
You can join two tables in many ways. If you want to include every single observation (row)
from both tables, you can use the outerjoin function. However, if you just want to select the
observations that have key variables common to both tables, then use innerjoin.
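The difference between the two joins can be seen on two small tables. Both tables are hypothetical and share the key variable id; the 'MergeKeys' option used with outerjoin is an addition not mentioned in the text above.

```matlab
% Hypothetical tables sharing the key variable id.
A = table([1; 2; 3], [78; 80; 85], 'VariableNames', {'id', 'height'});
B = table([2; 3; 4], [210; 250; 190], 'VariableNames', {'id', 'weight'});

% innerjoin keeps only rows whose key appears in both tables (id 2 and 3).
inner = innerjoin(A, B);

% outerjoin keeps every row from both tables; MergeKeys combines the two
% key columns into one, and unmatched numeric entries are filled with NaN.
outer = outerjoin(A, B, 'MergeKeys', true);
```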
Task 2
In order to find the average value of a vector and omit NaN from the calculation, you can use
the 'omitnan' flag as an additional input to the mean function.
>> y = mean(y,'omitnan');
Try to create a variable named xAvg that contains the average value in x.
>> xAvg=mean(x,'omitnan')
Task 3
Try to create a variable named xMed that contains the median value of x.
Task 4
Try to create a variable named xMin that contains the smallest value in x.
Task 5
Try to create a copy of x named y.
Task 6
Use the isequal function to determine if x and y are equal. Assign the result to test.
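The effect of NaN on these functions can be sketched with a short vector. The values below are invented and are not the course data; isequaln is shown as a comparison, beyond what the tasks ask for.

```matlab
x = [2 NaN 4 6];                 % hypothetical vector with one missing value

xAvgAll = mean(x);               % NaN: a single NaN propagates through mean
xAvg    = mean(x, 'omitnan');    % 4: average of 2, 4 and 6
xMed    = median(x, 'omitnan');  % 4
xMin    = min(x);                % 2: min ignores NaN by default

y = x;                           % copy of x
test  = isequal(x, y);           % false, because NaN is never equal to NaN
testN = isequaln(x, y);          % true: isequaln treats NaN values as equal
```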
Tables
The ismissing function can be applied to an entire table. It returns a value of true wherever a
value is "missing," meaning:
– NaN for numeric arrays
– undefined for categorical arrays
– Empty for string (text) arrays
– NaT (not a time) for datetime arrays
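Each kind of missing value in the list above appears once in the sketch below. The table and its variable names are hypothetical.

```matlab
% Hypothetical table with missing entries of each kind.
num = [1; NaN; 3];                                     % NaN in row 2
grp = categorical({'a'; 'b'; ''});                     % '' becomes <undefined> in row 3
txt = {'x'; ''; 'z'};                                  % empty text in row 2
d   = [datetime(2020,1,1); NaT; datetime(2020,1,3)];   % NaT in row 2
t = table(num, grp, txt, d);

tf = ismissing(t);        % 3-by-4 logical array, true where a value is missing
nMissing = sum(tf);       % number of missing values in each variable
```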
Task 2
Use the nnz function to try to create a variable named nn that contains the number
of NaN values in x.
>> numberNans = nnz(isnan(y))
Task 3
Use the all function to create a logical vector named colNan which contains a value of true for
any columns of x that contain all NaNs.
Task 4
Remove the column of all NaNs from x.
>> x(:,cols) = [];
Further Practice
What is the result when you apply isfinite to x? Try replacing any non-finite values
in x with zero.
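The steps in this lesson can be sketched end to end on a small matrix. The matrix below is invented for illustration and is not the course data.

```matlab
x = [1 NaN 3; Inf NaN 6; NaN NaN 9];   % hypothetical matrix; column 2 is all NaN

nn = nnz(isnan(x));          % 4: count of NaN entries
colNan = all(isnan(x));      % [false true false]: columns that are entirely NaN
x(:, colNan) = [];           % delete those columns, leaving a 3-by-2 matrix

% isfinite is false for NaN, Inf and -Inf.
x(~isfinite(x)) = 0;         % replace every non-finite value with zero
```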
When you are finished practicing, please continue to the next Lesson.