Weka Exercise - Introduction To Algorithms
University of Sunderland
Aim: To show how to run several algorithms on datasets to get an idea of the types of data we
have. We shall use 3 algorithms that are commonly used to get a first feel for the data. We also
look at basic data cleaning.
1. ZeroR:
2. OneR (1R)
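To give a rough idea of what these two baseline classifiers do, here is a minimal Python sketch. It is not WEKA's implementation: the function names and the tiny toy dataset are made up for illustration, and it assumes nominal attributes only. ZeroR ignores the attributes and always predicts the most common class. OneR builds one rule per value of each attribute and keeps the attribute whose rules make the fewest errors on the training data.

```python
from collections import Counter, defaultdict

def zero_r(labels):
    """ZeroR: ignore all attributes and always predict the most common class."""
    return Counter(labels).most_common(1)[0][0]

def one_r(rows, labels):
    """OneR: pick the single attribute whose value -> majority-class rules
    make the fewest errors on the training data (nominal attributes only)."""
    best = None
    for col in range(len(rows[0])):
        # Count class frequencies for each value of this attribute.
        counts = defaultdict(Counter)
        for row, label in zip(rows, labels):
            counts[row[col]][label] += 1
        # One rule per attribute value: predict that value's majority class.
        rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(rules[row[col]] != label for row, label in zip(rows, labels))
        if best is None or errors < best[2]:
            best = (col, rules, errors)
    return best

# Hypothetical toy data: (outlook, windy) -> play
rows = [("sunny", "no"), ("sunny", "yes"), ("rainy", "no"),
        ("rainy", "yes"), ("overcast", "no"), ("overcast", "yes")]
labels = ["yes", "no", "yes", "no", "yes", "yes"]

print(zero_r(labels))        # majority class
print(one_r(rows, labels))   # (attribute index, rules, training errors)
```

Because ZeroR uses no attributes at all, its accuracy is a useful floor: any other classifier should do at least as well.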
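The confusion matrix shows that there are 3 types of iris flower, labelled a, b and c. Each row is
an actual class and each column is the class predicted by the algorithm, so the values off the
diagonal are errors (e.g. 3 Iris-versicolor flowers were classified as Iris-virginica).
  a  b  c   <-- classified as
 50  0  0 | a = Iris-setosa
  0 47  3 | b = Iris-versicolor
  0  3 47 | c = Iris-virginica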
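The overall accuracy can be worked out by hand from the confusion matrix above: it is the sum of the diagonal divided by the total number of instances. The short Python sketch below does this arithmetic; the function names are our own and only the matrix values come from the output above.

```python
# Confusion matrix from the output above: rows are actual classes,
# columns are predicted classes, in the order a, b, c.
matrix = [
    [50, 0, 0],   # a = Iris-setosa
    [0, 47, 3],   # b = Iris-versicolor
    [0, 3, 47],   # c = Iris-virginica
]

def accuracy(m):
    """Fraction of instances on the diagonal (correctly classified)."""
    correct = sum(m[i][i] for i in range(len(m)))
    total = sum(sum(row) for row in m)
    return correct / total

def recall_per_class(m):
    """For each actual class, the fraction of its instances classified correctly."""
    return [m[i][i] / sum(m[i]) for i in range(len(m))]

print(accuracy(matrix))          # 144 / 150 = 0.96
print(recall_per_class(matrix))  # 50/50, 47/50, 47/50
```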
WEKA implements pre-processing of data by means of the editor (as seen above) and Filters. We list
some of the main filters. Filters are selected in the Preprocess tab with the Choose button in the
Filter panel, and are run with the Apply button. Use the ionosphere dataset.
Filters can make very large datasets smaller so that they can be processed on less powerful systems.
They can also randomise the order of the data, or even add noise to it, which can help some machine
learning algorithms.
• To reduce the dataset size use: Filter > Supervised > Instance > Resample and set
sampleSizePercent to e.g. 50
• To merge data ranges e.g. income into low/medium/high: Filter > Supervised > Attribute >
Discretize
• To reorder the attributes (columns) for better processing: Filter > Unsupervised > Attribute >
Reorder. To shuffle the order of the instances (rows) use: Filter > Unsupervised > Instance >
Randomize
• To add noise to improve some algorithms: Filter > Unsupervised > Attribute > AddNoise
• To automatically reduce attributes: Filter > Unsupervised > Attribute > PrincipalComponents
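To make the first two filters in the list above more concrete, here is a small Python sketch of the ideas behind them. It is an illustration under stated assumptions, not what WEKA runs: the function names and income values are made up, the subsample is drawn without replacement (which may differ from the filter's default settings), and the binning uses simple equal-width ranges, whereas the supervised Discretize filter uses the class values to choose its cut points.

```python
import random

def subsample(rows, percent, seed=1):
    """Return a random subset containing `percent`% of the rows
    (sampling without replacement; a simplification)."""
    rng = random.Random(seed)
    k = round(len(rows) * percent / 100)
    return rng.sample(rows, k)

def discretize_equal_width(values, labels=("low", "medium", "high")):
    """Map numeric values to labels using equal-width bins over [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / len(labels) or 1.0  # avoid zero width if all values are equal
    result = []
    for v in values:
        # The maximum value would fall just past the last bin, so clamp it.
        index = min(int((v - lo) / width), len(labels) - 1)
        result.append(labels[index])
    return result

incomes = [12000, 18000, 25000, 40000, 41000, 60000, 75000, 90000]  # made-up values
print(subsample(incomes, 50))           # 4 of the 8 values, chosen at random
print(discretize_equal_width(incomes))  # one label per income
```

Fixing the seed makes the subsample repeatable, which is useful when comparing algorithms on the same reduced dataset.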
Filters can also pick out a subset of features to process to make processing more efficient.
• To remove attributes: Filter > Unsupervised > Attribute > Remove and give the column index
e.g. 1. Set invertSelection to keep only the listed columns instead of removing them
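The effect of Remove, with and without the inverse option, can be sketched in a few lines of Python. This is only an illustration: the function name and the data values are made up, and column indices are taken to start at 1, as they do in WEKA.

```python
def remove_columns(rows, indices, invert=False):
    """Drop the 1-based column `indices` from every row.
    With invert=True, keep only those columns instead."""
    selected = {i - 1 for i in indices}  # convert to 0-based positions
    return [
        [value for pos, value in enumerate(row) if (pos in selected) == invert]
        for row in rows
    ]

rows = [[0.1, 0.2, 0.3, "g"], [0.4, 0.5, 0.6, "b"]]  # made-up values
print(remove_columns(rows, [1]))               # column 1 removed
print(remove_columns(rows, [1], invert=True))  # only column 1 kept
```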
• With the ionosphere dataset freshly loaded, use Edit to select some values in a column and
delete them
• You will see the number of missing values reported for that attribute in the attribute window
• Now we will replace these values automatically with a filter: Filter > Unsupervised >
Attribute > ReplaceMissingValues
• Then go back to Edit and see what values have replaced the missing ones
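ReplaceMissingValues fills each gap with the mean of the attribute if it is numeric, or with the most common value (the mode) if it is nominal. The Python sketch below shows the same idea on a single column; the function name is our own, missing values are represented by None, and it assumes the column has at least one value present.

```python
from collections import Counter
from statistics import mean

def replace_missing(column):
    """Fill None entries with the mean (numeric column) or the mode
    (nominal column) of the values that are present.
    Assumes at least one value in the column is not missing."""
    present = [v for v in column if v is not None]
    if all(isinstance(v, (int, float)) for v in present):
        fill = mean(present)
    else:
        fill = Counter(present).most_common(1)[0][0]
    return [fill if v is None else v for v in column]

print(replace_missing([1.0, None, 3.0, None]))   # gaps filled with the mean
print(replace_missing(["g", "b", None, "g"]))    # gaps filled with the mode
```

This explains what you should see in the editor: every deleted value in a numeric column is replaced by the same number, the mean of the values that remain.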