
Machine Learning in Matlab - Module 1

This course serves as a review of a selection of different data types available in MATLAB. In particular, the focus of the following lessons will be on two data types: tables and categoricals.
Copyright © All Rights Reserved

Module 1 Importing and organizing data

Data Types

This course serves as a review of a selection of different data types available in MATLAB. In
particular, the focus of the following lessons will be on the data types:
– Tables
– Categoricals

For the call knnModel = fitcknn(X,y):

Outputs
knnModel – Variable containing information about the classification model.

Inputs
X – Predictor values, specified as a numeric matrix.
y – Classification (response) values, specified as a vector.
Data Types Quiz

1. Which of the following data types are available in MATLAB?



o structure

o cell

o logical

o datetime

o function handle

o categorical

o All of the above


All of the listed data types are available in MATLAB.

Data Types – Practice 1


Tasks:
Task 1
Create a variable named num that contains a double-precision number.
Hint:
By default, all numbers are double precision in MATLAB.
>> x = 42; %example
Task 2
This time, create a variable named name that contains a character array.
Hint:
To create a character array, surround the value with single quotes.
>> S = 'MATLAB4Engineers'; %example
Task 3
This time, create a variable named bool that contains a logical value.
Hint:
To create a logical value, you can simply set the value to true or false, or you can use the output of a
comparison. For example,
>> ex = true; %example

Task 4
Use the whos command to see the variables in the workspace and compare their properties.
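One possible solution to the four tasks is sketched below. The values 42 and 'MATLAB4Engineers' are arbitrary choices, and the comparison used for bool is just one way to produce a logical value.

```matlab
% Task 1: numeric literals are double precision by default
num = 42;

% Task 2: single quotes create a character array
name = 'MATLAB4Engineers';

% Task 3: a comparison returns a logical value (true here)
bool = num > 10;

% Task 4: list these workspace variables with their size, bytes, and class
whos num name bool
```

In the whos output, the Class column should read double, char, and logical, respectively.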
Specialized Data Types
Statistics and Machine Learning Toolbox provides specialized data types for storing
information about specific statistical and machine learning constructs such as linear regression
models, probability distributions, and classification trees. These data types provide a way to
combine the data (properties) and the functionality (methods) relevant to the specific
application. For example, the properties for a linear model variable include the coefficients
and residuals. The functionality includes predicting values from that model.
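As an illustration of properties and methods, the sketch below fits a linear regression model with fitlm (Statistics and Machine Learning Toolbox) to a small, made-up data set, reads the Coefficients and Residuals properties, and calls the predict method. The data values are hypothetical.

```matlab
% Hypothetical data: y is roughly 2*x + 1 with small made-up noise
x = (1:10)';
noise = [0.1 -0.2 0.05 0 -0.1 0.2 -0.05 0.1 0 -0.1]';
y = 2*x + 1 + noise;

mdl = fitlm(x, y);                  % LinearModel object

% Properties store data about the fitted model
coefs = mdl.Coefficients.Estimate;  % [intercept; slope]
res = mdl.Residuals.Raw;            % one residual per observation

% Methods provide functionality, e.g. predicting new responses
yNew = predict(mdl, [11; 12]);
```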

Specialized Data Types – Creating and Using a Classification Variable
Before you start with the tasks, download the dataset, then run the following code in MATLAB:

>> load groups


Tasks
Task 1
To construct a k-nearest neighbor classifier in MATLAB, use the fitcknn function with
the data table as input and specify the name of the response variable in single quotes.
>> knnMdl = fitcknn(tableData,'Response');
Create a k-nearest neighbor classifier named mdl on the data in groupData where the response
variable is named group.

Task 2
To see a property’s value, use dot notation.
>> PropValue = mdl.PropName
Set k equal to the value of the NumNeighbors property of mdl.

Task 3
You can specify property values when you create the classifier.
>> knnMdl = fitcknn(tableData,'Response','PropertyName',PropertyValue);
Create a k-nearest neighbor classifier named mdl where the NumNeighbors property is set to 3.

Further Practice
Use the fitcknn function on the table groupData with the response named group, setting
PropertyName to 'NumNeighbors' and PropertyValue to 3. To list the properties of mdl, or to view the value of one of them, use:
>> properties(mdl)
>> mdl.PropertyName
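A sketch of the three tasks follows. Because the contents of the groups file are not reproduced here, the code builds a small hypothetical stand-in for groupData with two numeric predictors, x and y, and a categorical response named group; with the real data, skip that first step.

```matlab
% Hypothetical stand-in for groupData: predictors x and y, response group
groupData = table([1; 1.2; 0.8; 5; 5.2; 4.8], [1; 0.9; 1.1; 5; 5.1; 4.9], ...
    categorical({'A'; 'A'; 'A'; 'B'; 'B'; 'B'}), ...
    'VariableNames', {'x', 'y', 'group'});

mdl = fitcknn(groupData, 'group');     % Task 1: response named in quotes
k = mdl.NumNeighbors;                  % Task 2: dot notation reads a property

% Task 3: set a property when creating the classifier
mdl = fitcknn(groupData, 'group', 'NumNeighbors', 3);
properties(mdl)                        % list every property of the model

% Classify a new point given as a table with the same predictor names
label = predict(mdl, table(4.9, 5, 'VariableNames', {'x', 'y'}));
```

Because fitcknn uses a single neighbor by default, k is 1 before NumNeighbors is set explicitly.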

What is a Table – Introduction


What is a Table?
A table is a data type that is well suited for column-oriented data, such as data stored as
columns in a text file or spreadsheet.
The data can be thought of as a series of observations (rows) over a set of
different variables (columns). With data organized this way, you typically want to refer
to columns by name and keep all observations together.
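To make the row/column picture concrete, here is a minimal sketch that builds a table from three hypothetical column vectors. The names and values are made up for illustration.

```matlab
% Hypothetical column vectors, one per variable
name = {'Ann'; 'Bo'; 'Cy'};
height = [80; 74; 85];
weight = [220; 175; 250];

% table uses the workspace names as variable (column) names
t = table(name, height, weight);

h = t.height;     % refer to a column by its name
row2 = t(2, :);   % one observation, with all of its variables kept together
```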

Referencing Elements of an Array


Row, Column Indexing
Create the matrix:
data = [83 220; 80 217; 74 175; 81 270; 78 215];
Tasks
Task 1
Replace the element in the first row and second column with the value 183.7.

Task 2
Try to create a variable named first3 containing the first three rows of data.

Task 3
Replace the second row of data with the values 72.1 and 181.4.

Task 4
Create a variable named height from the first column of data.
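One possible set of solutions for the four tasks, applied in order, is sketched below.

```matlab
data = [83 220; 80 217; 74 175; 81 270; 78 215];

data(1,2) = 183.7;          % Task 1: row 1, column 2
first3 = data(1:3,:);       % Task 2: first three rows, all columns
data(2,:) = [72.1 181.4];   % Task 3: replace the entire second row
height = data(:,1);         % Task 4: first column, all rows
```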

Linear Indexing
Create the matrix:
data = [83; 80; 74; 81; 78; 90; 34; 25];
Tasks
Task 1
Replace the third element with 73.8.

Task 2
Overwrite the last three elements so that they all contain the value 80.

Task 3
Increase the first two values in height by 0.4.
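A sketch of these tasks with linear indexing follows. Task 3 refers to a vector named height that this exercise does not define, so the code assumes, hypothetically, that height is a copy of data.

```matlab
data = [83; 80; 74; 81; 78; 90; 34; 25];

data(3) = 73.8;          % Task 1: a single index means linear indexing
data(end-2:end) = 80;    % Task 2: the scalar fills all three positions

% Task 3 (assumption: height is a copy of data)
height = data;
height(1:2) = height(1:2) + 0.4;
```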

Logical Indexing
Create the matrix:
data = [83; 80; 74; 81; 78; 90; 34; 25];
Tasks
Task 1
Replace all elements greater than 80 with 81.
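A minimal sketch of the logical-indexing task: the comparison produces a logical array, and assigning through it changes only the elements where it is true.

```matlab
data = [83; 80; 74; 81; 78; 90; 34; 25];

idx = data > 80;   % logical array, true where the condition holds
data(idx) = 81;    % assign only the selected elements
```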

Table Properties
Before you start with the tasks, download the dataset, then run the following code in
MATLAB:

>> playerInfo = readtable('bball.txt');
Tasks
Task 1
You can access the metadata of a table named t using the Properties property and dot
notation.
>> t.Properties
Try to create a variable named props containing the table properties of playerInfo.

Task 2
You can access a specific property of a table named t using the following syntax.
>> t.Properties.PropName;
Try to save the variable names in playerInfo to a variable named v.

Task 3
To reassign a value in a cell array, use curly braces to index into it.
>> x{2} = 'NewName'
To reassign a variable name, index into that property of the table.
>> t.Properties.VariableNames{1} = 'HurrNum';
Give the first variable in playerInfo the name 'playerID'.
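A sketch of the three tasks follows. Since bball.txt is not reproduced here, the code uses a small hypothetical stand-in for playerInfo whose first variable is named bioID.

```matlab
% Hypothetical stand-in for playerInfo
playerInfo = table({'p01'; 'p02'}, [86; 73], ...
    'VariableNames', {'bioID', 'height'});

props = playerInfo.Properties;             % Task 1: all of the table's metadata
v = playerInfo.Properties.VariableNames;   % Task 2: cell array of variable names

% Task 3: VariableNames is a cell array, so index it with curly braces
playerInfo.Properties.VariableNames{1} = 'playerID';
```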

Further Practice
You may now use the Command Window to practice MATLAB commands, or move on to the
next section.

Storing Data in a Table – Introduction



Storing Data in a Table


Suppose you have the following basketball player data saved in a text file or a spreadsheet
(.xls) file.
>> playerInfo = readtable('bballPlayers.txt')

Inputs
'bballPlayers.txt' – Name of the file you would like to import, entered as a string.

Outputs
playerInfo – Name of the variable in which to store your data in MATLAB, returned as a table array.
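Since bballPlayers.txt is not reproduced here, the sketch below writes a small hypothetical table to a temporary text file with writetable and then imports it with readtable, using the same output-on-the-left call pattern shown above.

```matlab
% Hypothetical data written to a temporary text file
fname = [tempname '.txt'];
t = table({'Ann'; 'Bo'}, [80; 74], 'VariableNames', {'firstName', 'height'});
writetable(t, fname);   % comma-delimited text with a header row of names

% Import it: output variable on the left, file name as the input
playerInfo = readtable(fname);
delete(fname);          % clean up the temporary file
```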

Storing Data in a Table – Introduction Quiz



1. Which command is the correct way to import the file bballStars.xls as a table?

o playerInfo = readtable(bballStars.xls)

o readtable(playerInfo,'bballStars.xls')

o readtable('bballStars.xls') = playerInfo

o playerInfo = importtable('bballStars.xls')

o playerInfo = readtable('bballStars.xls')
The correct syntax has the output on the left side, followed by the equals sign, then
the readtable function which contains the name of the file you want to import in
single quotes.

Import Table Data


Before you start with the tasks, download the dataset.

Tasks
Task 1
Read the file bballStats.txt into a table named stats.

Referencing Elements of an Array – Introduction

Referencing Elements of an Array


Row, Column (Subscripted) Indexing
Each element is referenced by its unique combination of row and column indices. Providing
two or more (numeric) indices, separated by commas, indicates that row, column indexing is
being used.

Linear Indexing
Each element is referred to by a single value representing its location in the array as stored
linearly in memory. Arrays are stored column-wise in MATLAB, so the linear index
increments down the columns. Providing only one (numeric) index indicates that linear
indexing is being used.
Logical Indexing
Elements are referred to by a logical condition. A logical array is used as an index. Those
elements in the array where the index is true are referenced. Logical indices can be used in
either linear or row, column manner.
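The three indexing styles can be compared on one small matrix. The sketch below uses magic(4) as an arbitrary example and sub2ind to convert a row, column pair into the equivalent linear index.

```matlab
A = magic(4);                 % [16 2 3 13; 5 11 10 8; 9 7 6 12; 4 14 15 1]

r = A(2, 4);                  % row, column indexing: row 2, column 4 -> 8
k = sub2ind(size(A), 2, 4);   % linear index of the same element -> 14
l = A(k);                     % linear indexing counts down columns -> 8
big = A(A > 12);              % logical indexing -> [16; 14; 15; 13]
```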

Referencing Elements of an Array – Introduction Quiz

1. Which of the following commands will create matrix B given matrix A?

o B = A;
B(1:5) = 10;

o B = A;
B(:,1) = 10;

o B = A;
B(B>6) = 10;

o All of the above


Test these different methods in MATLAB. The first method uses linear indexing – the first
column corresponds to the first five values. The second method uses row, column indexing –
using : indicates using all the elements (here, all rows). The third method uses logical
indexing – all values in the first column are greater than six, and all other values are less
than six.
2. (Select all that apply) Which of the following commands will create matrix B given
matrix A?

Correct answers: B = A(:,1); and B = A(A>6);

o B = A(1:5);

o B = A(:,1);

o B = A(A>6);
Only two of the options will create B. When you use linear indexing into a matrix with a
vector of indices, the result takes the orientation of the index vector. Because 1:5 is a row
vector, A(1:5) returns a row vector, whereas B is a column vector.
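The quiz matrices are not reproduced here, so the sketch below uses magic(5) as a stand-in 5-by-5 matrix to show the orientation of each result.

```matlab
A = magic(5);       % stand-in 5-by-5 matrix

b1 = A(1:5);        % linear index vector 1:5 is a row, so b1 is 1-by-5
b2 = A(:, 1);       % row, column indexing keeps the 5-by-1 column
b3 = A(A > 20);     % a logical index into a matrix returns a column
```

Here b1 holds the same values as b2, only transposed.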

Indexing with Tables – Introduction



Extracting Data from a Table


To index into a table, you can specify the desired variables by either column numbers or a
cell array containing the variable names.
>> heightAndWeight = data(:,4:5);
>> heightAndWeight = data(:,{'height','weight'});
You can index into a table using regular array indexing with parentheses. As with any array
indexing, the result is a subset of the original array and has the same type, so the result is a table.
If you would like to extract a variable from a table as its underlying variable type, you can use
dot notation.
>> h = data.height;
Create a numeric vector named h from the variable height in the table data.
If you would like to extract the contents of more than one variable, you can do so by indexing
into your table with curly braces. When extracting multiple variables from a table, the
variables must be of compatible type to be concatenated into a single variable. If not, indexing
into multiple variables will produce an error.
>> heightAndWeight = data{:,4:5};
>> heightAndWeight = data{:,{'height','weight'}};
Thus, indexing with parentheses returns a table, and indexing with curly braces returns the
data type of the underlying variable.
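The three access styles can be compared side by side. The sketch below uses a small hypothetical table named data with one text variable and two numeric variables.

```matlab
% Hypothetical table with one text variable and two numeric variables
data = table({'Ann'; 'Bo'; 'Cy'}, [80; 74; 85], [220; 175; 250], ...
    'VariableNames', {'name', 'height', 'weight'});

sub = data(:, {'height', 'weight'});   % parentheses: a 3-by-2 table
h = data.height;                       % dot notation: a 3-by-1 double vector
hw = data{:, {'height', 'weight'}};    % curly braces: a 3-by-2 double matrix
```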
Indexing with Tables – Introduction Quiz

1. What is the size and type of array name after executing the following command?
>> name = playerInfo{:,1:2}

o 6-by-2 character array

o 6-by-2 cell array

o 6-by-2 table array

o The command produces an error message


Since firstName and lastName contain strings of varying lengths, the underlying data type
for each variable is a cell array. Thus, the variable produced by indexing with curly braces
is a cell array.
2. (Select all that apply) Which of the following commands will store the information in
the height variable in a 6-by-1 numeric array called heights?

o heights = playerInfo(:,3)

o heights = playerInfo(3,:)

o heights = playerInfo{:,3}

o heights = playerInfo.height

o heights = playerInfo{:,'height'}
You can access data in a table with dot notation or curly brackets. When using curly
brackets, you can index via variable name or column number.
3. (Select all that apply) Which of the following commands will store the names of the
players who are over 84 inches tall in a 2-by-2 cell array called overSeven?

o overSeven = playerInfo([1,3],1:2)

o overSeven = playerInfo{[1,3],1:2}

o overSeven = [ playerInfo.firstName([1,3]) playerInfo.lastName([1,3]) ]

o overSeven = playerInfo([1,3],{'firstName','lastName'})

o overSeven = playerInfo{[1,3],{'firstName','lastName'}}

o overSeven = playerInfo{playerInfo.height > 84,{'firstName','lastName'}}


There are four correct responses. Using parentheses will create another table not a
cell array. To extract the contents of a table, you can access the data with dot
notation or curly brackets. You can also use logical indexing to get the proper values.
4. (Select all that apply) Which of the following commands will store all the
information from players who are over 84 inches tall in a table named over7ft?

o over7ft = playerInfo([1,3],:)

o over7ft = playerInfo{[1,3],:}

o over7ft = playerInfo{[1,3],{'firstName','lastName','height','weight'}}
o over7ft = playerInfo{playerInfo.height > 84,{'firstName','lastName','height','weight'}}

o over7ft = playerInfo([1,3],{'firstName','lastName','height','weight'})

o over7ft = playerInfo(playerInfo.height > 84,{'firstName','lastName','height','weight'})


There are three correct answers. To create a table from tabular data, index into the
table with parentheses. Note that because the underlying variable types across
columns are not the same, using curly braces will produce an error.

Indexing with Tables


Making a Subset of a Table
Before you start with the tasks, perform the following line:
>> playerInfo = readtable('bballPlayers.txt');
Tasks
Task 1
The table playerInfo has more information than is necessary.
Create a table named players that only keeps columns 1, 3, and 4 in that order.
>> cols = [1 3:4]
>> smTable = playerInfo(:,cols)

Task 2
Create a table from the data in playerInfo named playerHt that only keeps the
variables bioID and height in that order.
>> names = playerInfo(:,{'firstName','lastName'})

Task 3
Create a numeric array named height that contains the height variable from the playerInfo table.

Appending Data to a Table


Tasks
Task 1
Create a numeric array named bmi that contains the body mass index for each player. Use the
following formula:
703 *weight ./ (height .^ 2)
Note that height and weight are variables in the playerInfo table.
Set bmi equal to the formula, using dot notation with the table name playerInfo to access the
values of weight and height.
Task 2
Add a variable named BMI to playerInfo that contains the data in bmi.
Use dot notation to add a new variable.
>> table.newVar = data2add;
>> playerInfo.BMI = bmi;
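Both tasks are sketched below on a hypothetical two-player table. The 703 factor in the formula corresponds to weight in pounds and height in inches.

```matlab
% Hypothetical two-player table: height in inches, weight in pounds
playerInfo = table([80; 74], [220; 175], 'VariableNames', {'height', 'weight'});

% Task 1: element-wise ./ and .^ apply the formula to every row at once
bmi = 703 * playerInfo.weight ./ (playerInfo.height .^ 2);

% Task 2: dot notation with a new name appends a variable to the table
playerInfo.BMI = bmi;
```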

Extracting Numeric Data


If you want to access data from multiple variables, you can do that by indexing with curly
braces. Inside the curly braces, you can specify the desired variables by either a numeric array
containing the column numbers or as a cell array with the variable names.
The following code creates two equivalent arrays from a table.
>> yrs = playerInfo{:,[11 12]};
>> yrs = playerInfo{:,{'firstseason','lastseason'}};
Tasks
Task 1
Try to create a numeric array named body that contains height and weight (columns 3 and 4) in
the playerInfo table.

Representing Discrete Categories



Representing Discrete Categories


When text labels are intended to represent a finite set of possibilities, a cell array of strings is
unnecessary and uses more memory. Instead, you can use a categorical array.
You can use the categorical function to convert a cell array to a categorical array.

Representing Discrete Categories Quiz


1. (Select all that apply) In which of the following situations would you typically use a
categorical array?

o A list of strings where some values in the list are repeated

o A list of all unique values

o A fixed set of possibilities

o A distribution of numeric data


Generally, you want to use a categorical array when you have a finite set of strings.

Representing Discrete Categories: Using Categorical Arrays
Using Categorical Arrays
Categorical arrays allow the use of == for comparison, even though the labels are text.

You can use the categories function to extract a cell array of strings of the categories
associated with a categorical array.
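The sketch below converts a hypothetical cell array of color names to a categorical array, compares it with == against a text label, and extracts its categories.

```matlab
colors = {'red'; 'blue'; 'red'; 'green'};   % cell array of character vectors
c = categorical(colors);                   % convert to a categorical array

isRed = (c == 'red');   % == compares each element with a text label
cats = categories(c);   % cell array of categories: blue, green, red
```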
Representing Discrete Categories – Ordinal Arrays
Ordinal Arrays
If your categories have an ordering, you can specify it in the categorical function.
To do this, pass the list of categories in increasing order as the second input, followed by
the 'Ordinal' flag set to true ('Ordinal',true).
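A short sketch with hypothetical size labels: once the array is ordinal, relational operators and min follow the category order rather than alphabetical order.

```matlab
sizes = {'medium'; 'small'; 'large'; 'small'};

% Categories listed in increasing order, followed by the 'Ordinal' flag
s = categorical(sizes, {'small', 'medium', 'large'}, 'Ordinal', true);

bigger = s > 'small';   % relational operators follow the category order
smallest = min(s);      % min and max are defined for ordinal arrays
```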

Grouped Operations – Introduction



Grouped Operations
Statistics and Machine Learning Toolbox provides a number of functions that are designed
specifically to work with categorical data.
For example, grpstats calculates statistics – the mean by default – for given variables,
grouped according to a grouping variable.
>> statarray = grpstats(tbl,groupVar,whichstats)

Outputs
statarray – Table with the group values for the summary statistic types specified in whichstats.

Inputs
tbl – Input table. If any variables are not numeric or logical (other than those specified in
groupVar), then you must specify the variables on which to perform operations using a
name-value pair argument.
groupVar – Name of the grouping variable in tbl.
whichstats – Optional input; the default value is 'mean'. Statistical operation(s) to perform on
the grouped variables. For multiple operations, use a cell array of summary statistic types
specified as function handles or strings.
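The sketch below applies grpstats to a hypothetical table that contains a non-numeric variable, note. Following the input description above, the variables to summarize are named with the 'DataVars' name-value argument, and 'mean' is passed explicitly as whichstats.

```matlab
% Hypothetical table: categorical grouping variable, numeric x, text note
tbl = table(categorical({'a'; 'b'; 'a'}), [1; 5; 3], {'p'; 'q'; 'r'}, ...
    'VariableNames', {'group', 'x', 'note'});

% 'note' cannot be averaged, so restrict the statistic to x
m = grpstats(tbl, 'group', 'mean', 'DataVars', 'x');
```

The result has one row per group, with the variables group, GroupCount, and mean_x.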

Grouped Operations – Introduction Quiz



1. By default, which statistical operation is performed on each group in grpstats?

o mean

o max

o min

o numel

o sum
See the documentation for grpstats to determine the default operation.

Grouped Operations – Using grpstats


Download the data file before you start.

Load the data into MATLAB using the following command:


>> load groups
Tasks
Task 1
The grpstats function will group all categories in a variable together, and then find the
average value of those groups for the other variables in the table.
>> stats = grpstats(table,'variableName')
Create a table named m that contains the mean values for x and y grouped by group in tbl.
>> meanTbl = grpstats(table,'groupingVariableName')

Task 2
You can perform multiple statistical operations with one call to grpstats by supplying a third input
argument: a cell array containing the names of, or handles to, the desired functions. The output
table then contains one variable per statistic for each summarized variable.
>> stats = grpstats(table,'var',...
{'mean',@sum,'numel'})
Create a table ms containing the mean values and standard deviations for x and y grouped
by group in tbl.
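A sketch of Task 2 on a hypothetical stand-in for tbl with variables group, x, and y; the real data from the groups file is not reproduced here.

```matlab
% Hypothetical stand-in for tbl
tbl = table(categorical({'a'; 'b'; 'a'; 'b'}), [1; 5; 3; 7], [2; 4; 6; 8], ...
    'VariableNames', {'group', 'x', 'y'});

% One call computes both statistics for every non-grouping variable
ms = grpstats(tbl, 'group', {'mean', 'std'});

% Output variables are named statistic_variable, e.g. mean_x and std_y
isA = (ms.group == 'a');
```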
Merging Data – Introduction
Merging Data
You can join two tables in many ways. If you want to include every single observation (row)
from both tables, you can use the outerjoin function. However, if you just want to select the
observations that have key variables common to both tables, then use innerjoin.
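The difference between the two joins shows up on two hypothetical tables that share only some key values. With 'MergeKeys' set to true, outerjoin combines the two key columns into one, and values missing from either table are filled with NaN.

```matlab
t1 = table([1; 2; 3], [10; 20; 30], 'VariableNames', {'id', 'a'});
t2 = table([2; 3; 4], [200; 300; 400], 'VariableNames', {'id', 'b'});

inner = innerjoin(t1, t2);                     % only ids 2 and 3 appear in both
outer = outerjoin(t1, t2, 'MergeKeys', true);  % ids 1 through 4
```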

Working with Missing Data – Introduction


Working with Missing Data
Any calculation involving NaNs will return NaN. Consequently, the standard statistical
functions such as mean and std will, by default, return NaN if the input data contains any
NaNs. You can provide an optional flag to have these functions ignore NaNs.
Working with Missing Data – Calculations Involving NaNs
Execute the following code in MATLAB before you start:
>> x = [0.32 0.95 NaN 0.87 0.71 0.42]
Tasks
Task 1
Use the mean function to try to create a variable named xAvg that contains the average value
in x.
>> xAvg=mean(x)

Task 2
In order to find the average value of a vector and omit NaN from the calculation, you can use
the 'omitnan' flag as an additional input to the mean function.
>> y = mean(y,'omitnan');
Try to create a variable named xAvg that contains the average value in x.
>> xAvg=mean(x,'omitnan')

Task 3
Try to create a variable named xMed that contains the median value of x.

Task 4
Try to create a variable named xMin that contains the smallest value in x.

Task 5
Try to create a copy of x named y.

Task 6
Use the isequal function to determine if x and y are equal. Assign the result to test.
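A sketch of all six tasks follows. It also uses isequaln, which, unlike isequal, treats NaN values as equal.

```matlab
x = [0.32 0.95 NaN 0.87 0.71 0.42];

xAvgAll = mean(x);             % Task 1: NaN, because one NaN propagates
xAvg = mean(x, 'omitnan');     % Task 2: mean of the five non-NaN values
xMed = median(x, 'omitnan');   % Task 3: median of the five non-NaN values
xMin = min(x);                 % Task 4: min (and max) skip NaN by default

y = x;                         % Task 5: copy
test = isequal(x, y);          % Task 6: false, since NaN is never equal to NaN
testN = isequaln(x, y);        % true: isequaln treats NaNs as equal
```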

Locating Missing Data – Introduction


Locating Missing Data
Numeric Arrays
Trying to use == to test for equality on NaN values will not work because there is no numeric
sense of equality for NaNs. In other words, how can you compare a number to something that
is not a number? Is NaN greater than 3 or less than 3?
Because you cannot use == to find NaNs in your data, there are functions, such
as isnan, available that allow you to test if a value is NaN.

Tables
The ismissing function can be applied to an entire table. It returns a value of true wherever a
value is "missing," meaning:
– NaN for numeric arrays
– <undefined> for categorical arrays
– empty character vectors ('') in cell arrays of text
– NaT (not a time) for datetime arrays
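The sketch below contrasts == with isnan on a short vector, then applies ismissing to a hypothetical table with a numeric and a categorical variable. An empty character vector passed to categorical becomes an <undefined> element.

```matlab
v = [1 NaN 2];
eqTest = (v == NaN);   % all false: NaN does not compare equal to anything
nanTest = isnan(v);    % [false true false]

% ismissing checks every variable of a table at once
T = table([1; NaN; 3], categorical({'a'; 'b'; ''}), ...
    'VariableNames', {'x', 'c'});
M = ismissing(T);      % true for the NaN in x and the <undefined> in c
```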

Locating Missing Data – Numeric Data with NaNs

Execute the following command in MATLAB before you start:

>> x = [3.2 9.5 NaN 8.7; 7.1 4.2 NaN 8.1; NaN 6.3 NaN 0.9; 9.5 1.5 NaN 1.3; 2.8 5.5 NaN inf];
Tasks
Task 1
The function isnan will check to find any missing values and will return a logical array.
>> y = [1 NaN 2];
>> isnan(y)
ans =
   0   1   0
Try to create a variable named xI that is the same size as x with a value of true where the
elements of the input are NaNs, and false where they are not.

Task 2
Use the nnz function to try to create a variable named nn that contains the number
of NaN values in x.
>> numberNans = nnz(isnan(y))

Task 3
Use the all function to create a logical vector named colNan that contains a value of true for
any columns of x that contain all NaNs.

Task 4
Remove the column of all NaNs from x.
>> x(:,cols) = [];

Further Practice
What is the result when you apply isfinite to x? Try replacing any non-finite values
in x with zero.
When you are finished practicing, please continue to the next Lesson.
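A sketch of the four tasks and the further practice follows. It defines the matrix from the setup step under the name x, which is what the tasks use.

```matlab
x = [3.2 9.5 NaN 8.7; 7.1 4.2 NaN 8.1; NaN 6.3 NaN 0.9; 9.5 1.5 NaN 1.3; 2.8 5.5 NaN inf];

xI = isnan(x);          % Task 1: logical array, same size as x
nn = nnz(xI);           % Task 2: number of NaNs (6 here)
colNan = all(xI);       % Task 3: true for columns that are all NaN
x(:,colNan) = [];       % Task 4: delete those columns

fin = isfinite(x);      % Further practice: false for NaN and Inf
x(~fin) = 0;            % replace non-finite values with zero
```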
