HW02 - KNN DT
This homework consists of 3 theoretical/intuitive problems related to kNN, 1 related to the Curse of Dimensionality, 2 related to Decision Trees, and one programming task implementing the kNN algorithm.
Homework
kNN Classification
Problem 1: Compute the distance between

    x = (1, 3, 5)^T    and    y = (−1, 0, 4)^T

using

a) ⟨x, y⟩ := x^T y

b) ⟨x, y⟩ := x^T A y, where

        [ 2   1   0 ]
    A = [ 1   3  −1 ]
        [ 0  −1   2 ]
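If you want to sanity-check your hand computation, the short NumPy sketch below evaluates both distances, assuming the standard construction d(x, y) = sqrt(⟨x − y, x − y⟩) for the metric induced by an inner product; it is a verification aid, not part of the required solution.

```python
import numpy as np

x = np.array([1.0, 3.0, 5.0])
y = np.array([-1.0, 0.0, 4.0])
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, -1.0],
              [0.0, -1.0, 2.0]])

d = x - y  # difference vector

# a) metric induced by the standard inner product <x, y> = x^T y
dist_a = np.sqrt(d @ d)

# b) metric induced by <x, y> = x^T A y (A is symmetric positive definite)
dist_b = np.sqrt(d @ A @ d)

print(dist_a, dist_b)
```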
Problem 2: We perform 1-NN classification with leave-one-out cross-validation on the data below.

Name   x1    x2    class
A      0.5   4.0   1
B      1.0   3.0   1
C      1.5   3.0   1
D      3.5   2.0   2
E      6.0   2.0   2
F      6.0   1.0   2

a) Compute the distance between each point and its nearest neighbor using the L1-norm as the distance measure.
b) Compute the distance between each point and its nearest neighbor using the L2-norm as the distance measure.
c) What can you say about the classification results when you compare the two distance measures?
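As a quick check of parts (a) and (b), a small NumPy sketch of leave-one-out 1-NN (each point is compared against every other point, never itself) could look like this; the coordinates are taken from the table above.

```python
import numpy as np

# Data from the table above: columns (x1, x2) and the class label
X = np.array([[0.5, 4.0], [1.0, 3.0], [1.5, 3.0],
              [3.5, 2.0], [6.0, 2.0], [6.0, 1.0]])
labels = np.array([1, 1, 1, 2, 2, 2])
names = ["A", "B", "C", "D", "E", "F"]

for p in (1, 2):  # p = 1: L1-norm, p = 2: L2-norm
    print(f"--- L{p}-norm ---")
    for i in range(len(X)):
        dists = np.linalg.norm(X - X[i], ord=p, axis=1)  # distances to all points
        dists[i] = np.inf                                # leave the point itself out
        j = np.argmin(dists)
        print(f"{names[i]}: nearest neighbor {names[j]}, "
              f"distance {dists[j]:.2f}, predicted class {labels[j]}")
```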
Problem 3: Consider a dataset with 3 classes C = {A, B, C} and the following class distribution: N_A = 16, N_B = 8, N_C = 8. We use an unweighted k-NN classifier and set k equal to the total number of data points, i.e. k = N_A + N_B + N_C =: N.
a) What can we say about the prediction for a new point x_new?
b) What changes if we use the distance-weighted version of k-Nearest Neighbors?
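To build intuition, here is a small, purely illustrative experiment using scikit-learn (assuming it is installed); the toy clusters and the query point are made up and only mirror the stated class counts N_A = 16, N_B = 8, N_C = 8.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
# Hypothetical toy clusters with 16 / 8 / 8 points for classes A / B / C
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(n, 2))
               for c, n in zip([0.0, 3.0, 6.0], [16, 8, 8])])
y = np.array(["A"] * 16 + ["B"] * 8 + ["C"] * 8)

x_new = np.array([[5.5, 5.5]])  # a query point close to the C cluster

for weights in ("uniform", "distance"):
    clf = KNeighborsClassifier(n_neighbors=len(X), weights=weights).fit(X, y)
    print(f"k = N, weights='{weights}': prediction = {clf.predict(x_new)[0]}")
```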
Curse of Dimensionality
Problem 4: When the dimensionality of the feature space, denoted as k, becomes large, the performance of kNN and other local prediction methods tends to decline. These approaches rely on observations near the test observation for making predictions. This decline in performance is referred to as the curse of dimensionality. We will now investigate this curse.
b) Now consider a situation where we have a set of observations, each with measurements on two distinct features (k = 2), denoted X1 and X2. We assume that (X1, X2) is uniformly distributed over [0, 1] × [0, 1]. The goal is to predict the response of a test observation using only observations within a 10% range of both the X1 and X2 values closest to that test observation. To illustrate, when predicting the response for a test observation with X1 = 0.2 and X2 = 0.75, we use observations in the range [0.15, 0.25] for X1 and in the range [0.7, 0.8] for X2. On average, what proportion of the available observations will be used to make this prediction?
c) Now, consider a scenario where we have a set of observations involving k = 100 features. Once again,
these observations are uniformly distributed on each feature, and each feature spans values from 0 to
1. The goal is to predict the response of a test observation by utilizing observations within the 10%
range of each feature’s values that are closest to that specific test observation. What proportion of
the available observations will be employed, on average, to make this prediction?
d) By referencing your responses to parts (a)-(c), argue that a drawback of kNN when the number of features k is large is the scarcity of training observations that are considered "close" to a given test observation.
e) Now suppose that our goal is to make a prediction for a test observation by building a k-dimensional hypercube centered around it which, on average, contains 10% of the training observations. For k = 1, 2 and 100, what is the length of each side of the hypercube? Comment on your answer.
Note: A hypercube is a generalization of a cube to an arbitrary number of dimensions. When k = 1, a hypercube is simply a line segment, when k = 2 it is a square, and when k = 100 it is a 100-dimensional cube. (A small empirical sanity check of parts (b) and (c) is sketched below.)
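Purely as an empirical sanity check for parts (b) and (c), not a substitute for the analytical argument, the sketch below draws uniform observations and counts how many fall within ±5% of a test point in every feature; the sample size and the placement of the test point away from the boundary are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000  # number of uniformly distributed training observations

for k in (1, 2, 100):
    X = rng.uniform(size=(n, k))
    x_test = np.full(k, 0.5)  # test point away from the boundary, so no clipping
    # An observation is "used" only if it lies within +/-5% in EVERY feature
    used = np.all(np.abs(X - x_test) <= 0.05, axis=1)
    print(f"k = {k:3d}: fraction of observations used ≈ {used.mean():.6f}")
```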
Problem 6: You are developing a model to classify games at which machine learning will beat the world
champion within five years. The following table contains the data you have collected.
b) Build the optimal decision tree of depth 1 using entropy as the impurity measure.
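A depth-1 tree simply chooses the single split with the highest information gain. The helper below (with placeholder labels, not the data from the table) shows one way to compute the entropy of a label vector and the gain of a candidate binary split.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a vector of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(labels, mask):
    """Entropy reduction when splitting `labels` by the boolean `mask`."""
    n, n_left = len(labels), int(mask.sum())
    n_right = n - n_left
    if n_left == 0 or n_right == 0:
        return 0.0  # degenerate split, no information gained
    return (entropy(labels)
            - n_left / n * entropy(labels[mask])
            - n_right / n * entropy(labels[~mask]))

# Placeholder example: class labels and one candidate binary feature
y = np.array([1, 1, 1, 0, 0, 1, 0, 0])
feature = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=bool)
print(information_gain(y, feature))
```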
Programming Task
Problem 7: Load the notebook hw_02_notebook.ipynb from Moodle. Fill in the missing code and run the notebook. Save the evaluated notebook and add it to your submission.
Note: We suggest that you use Anaconda for installing Python and Jupyter, as well as for managing packages. We recommend that you use Python 3 or higher.
For more information on Jupyter notebooks, consult the Jupyter documentation. Instructions for converting the Jupyter notebooks to PDF are provided within the notebook.
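The exact functions you need to implement are specified inside hw_02_notebook.ipynb; purely for orientation, a from-scratch kNN classifier in NumPy usually boils down to something like the sketch below (a hypothetical interface, not the one prescribed in the notebook).

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """Predict a label for each row of X_test by majority vote over the
    k nearest training points under the Euclidean distance."""
    preds = []
    for x in X_test:
        dists = np.linalg.norm(X_train - x, axis=1)  # distances to all training points
        nearest = np.argsort(dists)[:k]              # indices of the k closest points
        values, counts = np.unique(y_train[nearest], return_counts=True)
        preds.append(values[np.argmax(counts)])      # majority class among the neighbors
    return np.array(preds)

# Tiny usage example with made-up data
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([[0.2, 0.1], [1.0, 0.9]]), k=3))
```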
Upload a single PDF file with your homework solution to Moodle by 06.02.2024, 23:59. We recommend typesetting your solution (using LaTeX or Word), but handwritten solutions are also accepted (bring them to the next lecture or put them in my box). Collaboration is fine, but submitting the same or extremely similar solutions is not allowed. Homework rules are in the syllabus.