Lab 1 Assignment
Student Name:                ID:
Submit your .ipynb file to the Lab Assignments section, ensuring that your name and student ID are
included, by 11:59 PM on February 9.
Note: Please answer the following questions using Google Colab, the online Python platform at
https://colab.research.google.com, and click New Notebook to open a Colab notebook. For all
coding questions, the code must be provided; please also refer to the relevant lecture slides.
1. (10 points) In this assignment, we will use the “fruit_data_with_colors.txt” dataset to implement
the KNN algorithm for fruit classification. We will use the mass, width, height, and color score as
features and the fruit label as the target variable. Our objective is to develop a KNN-based predictor
that can classify a fruit based on its features.
(a) (2 points) Please import the dataset “fruit_data_with_colors.txt” and show the first 10 rows.
Use mass, width, height, and color score as the features and the fruit label as the target.
(b) (1 point) Please split the dataset into training and testing datasets with a 4:1 ratio.
(c) (3 points) Please rescale the features by using min-max scaling. Discuss the importance of
scaling. Explain two different methods of scaling. Put your explanation in a markdown cell.
(d) (2 points) Please construct a KNN-based fruit classifier by performing a Grid Search to find
the best value of K for the KNN classifier, using values of K ranging from 1 to 20.
(e) (1 point) Using the best K found from Grid Search, train a KNN model and compute its test
set accuracy.
(f) (1 point) Using the trained model, perform 5-fold cross-validation and compute the accuracy
for each fold. Report the individual fold accuracies and the average cross-validation accuracy.
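
For orientation, the workflow in parts (b) through (f) can be sketched with scikit-learn as shown below. This is a minimal illustration under stated assumptions, not a reference solution: it runs on synthetic data from make_blobs rather than the fruit dataset, the random_state values and the stratified split are arbitrary choices, and the scaler is placed inside a Pipeline (one possible design) so that min-max scaling, x' = (x - min) / (max - min), is fit only on the training portion of each fold.

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (NOT the fruit dataset): 200 samples, 4 features, 4 classes.
X, y = make_blobs(n_samples=200, n_features=4, centers=4, random_state=0)

# A 4:1 train/test ratio corresponds to holding out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Chain min-max scaling and KNN so the scaler is refit on each training fold.
pipe = Pipeline([
    ("scale", MinMaxScaler()),
    ("knn", KNeighborsClassifier()),
])

# Grid search over K = 1, ..., 20 using 5-fold cross-validation on the training set.
param_grid = {"knn__n_neighbors": list(range(1, 21))}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

best_k = search.best_params_["knn__n_neighbors"]

# GridSearchCV refits the best pipeline on the full training set by default,
# so score() evaluates that refit model on the held-out test set.
test_accuracy = search.score(X_test, y_test)

# Per-fold accuracies of the best pipeline under 5-fold cross-validation.
fold_scores = cross_val_score(
    search.best_estimator_, X_train, y_train, cv=5, scoring="accuracy"
)
mean_cv_accuracy = fold_scores.mean()

print("Best K:", best_k)
print("Test accuracy:", test_accuracy)
print("Fold accuracies:", fold_scores)
print("Mean CV accuracy:", mean_cv_accuracy)
```

With the real dataset, X and y would instead come from the imported feature columns and fruit label; the numbers printed here describe only the synthetic data.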