Lab 4 Specification
Aims: This lab gives you the opportunity to apply five selected learning algorithms,
namely decision tree learning, random forest learning, support vector machine learning, linear
regression/logistic regression and k-nearest neighbours learning, to a simple
classification problem and a simple regression problem.
Tasks:
Task 0. Download the Lab 4 solution template package and unpack the files into your Lab 4
folder. Pay attention to the two datasets used:
Dataset 1: Dataset for a classification problem
For the classification problem, the dataset is from the Iris Plants Database at
https://gist.github.com/curran/a08a1080b88344b0c8a7
Note: Open the page at the given link, find iris.csv, and download the CSV file using the
“Raw” button.
Relevant information about this dataset:
--- This is perhaps the best-known database to be found in the pattern recognition
literature. The data set contains 3 classes of 50 instances each, where each class refers to a
type of iris plant. One class is linearly separable from the other 2; the latter are NOT
linearly separable from each other.
--- Predicted attribute: class of iris plant.
--- This is an exceedingly simple domain.
--- Number of Instances: 150 (50 in each of three classes)
--- Number of Attributes: 4 numeric, predictive attributes and the class
--- Attribute Information:
• sepal length in cm
• sepal width in cm
• petal length in cm
• petal width in cm
• class: Iris Setosa, Iris Versicolour and Iris Virginica
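The attribute list above can be checked against the downloaded file with a short script. The following is a minimal sketch using only the Python standard library. The column names (sepal_length, sepal_width, petal_length, petal_width, species) and the helper name load_iris are assumptions, not part of the lab template, so compare them with the header row of your own iris.csv. A three-row inline sample stands in for the real file so that the sketch runs on its own.

```python
import csv
import io
from collections import Counter

# Assumed column names; check them against the header row of your iris.csv.
FEATURES = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
LABEL = "species"


def load_iris(source):
    """Read iris rows from an open text source and return (X, y).

    X is a list of 4-element float lists (measurements in cm),
    y is the list of class labels as strings.
    """
    X, y = [], []
    for row in csv.DictReader(source):
        X.append([float(row[name]) for name in FEATURES])
        y.append(row[LABEL])
    return X, y


# Small inline sample in the assumed file format (one row per class).
SAMPLE = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
7.0,3.2,4.7,1.4,versicolor
6.3,3.3,6.0,2.5,virginica
"""

X, y = load_iris(io.StringIO(SAMPLE))
class_counts = Counter(y)

# Report the number of instances, attributes per instance, and instances per class.
print("instances:", len(X))
print("attributes per instance:", len(X[0]))
print("instances per class:", dict(class_counts))
```

To run the same check on the real data, replace the inline sample with something like `with open("iris.csv", newline="") as f: X, y = load_iris(f)`. Given the description above, the full file should report 150 instances, 4 attributes per instance, and 50 instances in each of the three classes.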