
Week 09 Assignment Solutions

IIT Madras Instructors

Question 1: Decision Trees and Bagging


Solution: (C) High variance
Bagging (Bootstrap Aggregating) addresses the high variance problem of decision trees. Individual
decision trees tend to overfit to their training data, but by combining predictions from multiple trees
trained on different bootstrap samples, bagging reduces variance and improves generalization.
Decision trees inherently suffer from high variance because of their tendency to grow deep structures
that perfectly fit the training data. Since trees make hard splitting decisions at each node, small changes
in the training data can result in completely different tree structures, leading to significantly different
predictions. This sensitivity to the specific training examples is the essence of their high variance nature.
Furthermore, decision trees have no built-in regularization mechanism to prevent overfitting, causing
them to capture noise in the training data rather than just the underlying pattern.
Bagging reduces variance by averaging predictions from multiple models trained on different subsets
of the data. Since each decision tree is trained on a different bootstrap sample, they make different
errors that tend to cancel out when averaged together. This ensemble approach effectively smooths out
the individual trees’ high variance, resulting in more stable and reliable predictions without significantly
increasing bias. The key insight is that while individual trees may overfit to their specific training
samples, their collective prediction is more robust and generalizable.
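As an illustrative aside, a minimal scikit-learn sketch comparing a single decision tree with a bagged ensemble of trees; the synthetic dataset and hyperparameters are assumptions made purely for demonstration:

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier

# Synthetic classification data, used only to illustrate the comparison
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# A single fully grown tree: low bias but high variance
single_tree = DecisionTreeClassifier(random_state=0)

# Bagging: many trees, each fit on a different bootstrap sample of the data
# (the default base estimator in scikit-learn is a decision tree)
bagged_trees = BaggingClassifier(n_estimators=100, random_state=0)

print("Single tree CV accuracy :", cross_val_score(single_tree, X, y, cv=5).mean())
print("Bagged trees CV accuracy:", cross_val_score(bagged_trees, X, y, cv=5).mean())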

Question 2: K-Nearest Neighbors Parameter


Solution: (B) The number of nearest neighbors to consider for classification
In KNN, the parameter K determines how many neighboring data points are considered when making
a prediction. The algorithm assigns the majority class (for classification) or average value (for regression)
from these K neighbors.
For a test point x, the KNN prediction is given by:
\hat{y}(x) = \frac{1}{K} \sum_{i \in N_K(x)} y_i    (1)

where N_K(x) represents the K nearest neighbors of x according to some distance metric (typically Euclidean distance: d(x, x_i) = \|x - x_i\|_2). For classification, this becomes a majority vote:

\hat{y}(x) = \arg\max_{c} \sum_{i \in N_K(x)} I(y_i = c)    (2)

The choice of K controls the bias-variance tradeoff: smaller K leads to lower bias but higher variance,
while larger K provides smoother decision boundaries but may increase bias.
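A minimal NumPy sketch of this prediction rule for classification (Euclidean distance; the function name and arguments are illustrative, not part of the assignment):

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, K):
    # Euclidean distance d(x, x_i) = ||x - x_i||_2 to every training point
    dists = np.linalg.norm(X_train - x, axis=1)
    # Indices of the K nearest neighbors, N_K(x)
    nearest = np.argsort(dists)[:K]
    # Majority vote over the neighbors' labels, as in equation (2)
    return Counter(y_train[nearest]).most_common(1)[0][0]

Here X_train and y_train are assumed to be NumPy arrays; for regression, the majority vote would simply be replaced by the mean of y_train[nearest], as in equation (1).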

Question 3: Gradient Boosting Fitting


Solution: (C) The gradient of the loss function with respect to predictions
In gradient boosting, each new model is trained to approximate the (negative) gradient of the loss function with respect to the current predictions. This allows the algorithm to iteratively reduce the error by moving in the direction of steepest descent.
Mathematically, for a loss function L(y, F (x)), where F (x) is the current ensemble prediction, the
next model hm (x) in the ensemble is trained to approximate:
 
h_m(x) \approx -\left[ \frac{\partial L(y, F(x))}{\partial F(x)} \right]_{F(x) = F_{m-1}(x)}    (3)

The ensemble is then updated as:

F_m(x) = F_{m-1}(x) + \alpha \cdot h_m(x)    (4)

where α is a learning rate. For example, with squared-error loss L(y, F(x)) = (1/2)(y − F(x))^2, the negative gradient is simply the residual (y − F(x)), meaning each new model is trained to predict the errors of the previous models.
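A minimal sketch of this procedure for squared-error loss, using shallow regression trees as the weak learners (scikit-learn; all names and hyperparameters are illustrative assumptions):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost_fit(X, y, n_rounds=100, lr=0.1):
    F0 = y.mean()                      # initial prediction F_0(x)
    F = np.full(len(y), F0)
    trees = []
    for _ in range(n_rounds):
        residuals = y - F              # negative gradient of (1/2)(y - F)^2
        tree = DecisionTreeRegressor(max_depth=3)
        tree.fit(X, residuals)         # h_m approximates the negative gradient
        F = F + lr * tree.predict(X)   # F_m = F_{m-1} + alpha * h_m
        trees.append(tree)
    return F0, trees

def gradient_boost_predict(F0, trees, X, lr=0.1):
    return F0 + lr * sum(tree.predict(X) for tree in trees)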

Question 4: Random Forest Feature Selection


Solution: (A) Square root of the total number of features

Random forests typically consider √p features at each split (where p is the total number of features). This randomness increases diversity among the trees in the ensemble, further helping to reduce variance.
At each node of a tree in a random forest, only a random subset of m_try features is considered for splitting. For classification problems the default is m_try = √p, while for regression problems it is often m_try = p/3. Mathematically, if we denote the set of all features as F with |F| = p, then at each node we select a subset F_sub ⊂ F such that |F_sub| = m_try = √p. The optimal split is then chosen only from this subset:

\text{Best Split} = \arg\max_{j \in F_{\text{sub}},\; s} \text{Information Gain}(j, s)    (5)

where j is a feature and s is a splitting threshold.
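For reference, a minimal scikit-learn sketch in which max_features="sqrt" makes each split consider only √p randomly chosen candidate features (synthetic data and hyperparameters are illustrative assumptions):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=25, random_state=0)

# max_features="sqrt": at every node, only sqrt(p) randomly chosen features
# are evaluated as split candidates (the usual classification default)
rf = RandomForestClassifier(n_estimators=200, max_features="sqrt", random_state=0)
rf.fit(X, y)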

Question 5: AdaBoost Misclassification


Solution: (D) They are given higher weights in the next iteration
AdaBoost increases the weights of misclassified examples in each iteration, forcing subsequent weak
learners to focus more on the difficult examples that previous models couldn’t classify correctly.
The weight update rule for AdaBoost is:
w_i^{(t+1)} = w_i^{(t)} \times \begin{cases} e^{-\alpha_t}, & \text{if instance } i \text{ is correctly classified} \\ e^{\alpha_t}, & \text{if instance } i \text{ is misclassified} \end{cases}    (6)

where α_t = ½ ln((1 − ε_t)/ε_t) and ε_t is the weighted error rate of the t-th weak classifier. After updating, the weights are normalized to form a probability distribution. This exponential weighting scheme ensures that misclassified examples receive exponentially higher weights, making them more influential in training subsequent classifiers.
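A minimal NumPy sketch of one round of this reweighting (variable names are illustrative assumptions):

import numpy as np

def adaboost_reweight(w, y_true, y_pred):
    miss = (y_true != y_pred)                      # misclassified instances
    eps = np.sum(w[miss]) / np.sum(w)              # weighted error rate epsilon_t
    alpha = 0.5 * np.log((1 - eps) / eps)          # alpha_t = 1/2 * ln((1-eps)/eps)
    # e^{+alpha} for misclassified points, e^{-alpha} for correct ones
    w_new = w * np.exp(alpha * np.where(miss, 1.0, -1.0))
    return w_new / w_new.sum(), alpha              # normalize to a distribution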

Question 6: AdaBoost Loss Function


Solution: (D) Exponential loss
AdaBoost minimizes the exponential loss function, which heavily penalizes misclassifications. This is
mathematically consistent with the algorithm’s weight update mechanism.
The exponential loss function for AdaBoost is defined as:

L(y, F(x)) = e^{-y F(x)}    (7)


where y ∈ {−1, +1} is the true label and F(x) = \sum_{t=1}^{T} \alpha_t h_t(x) is the ensemble prediction. This loss function can be shown to be equivalent to the weight update rule used in AdaBoost.
Taking the negative gradient of this loss with respect to F (x) gives:

-\frac{\partial L(y, F(x))}{\partial F(x)} = y \cdot e^{-y F(x)}    (8)

which is precisely what each weak learner approximates in each iteration, showing that AdaBoost can be viewed as a gradient boosting algorithm minimizing the exponential loss.
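A small NumPy sketch of the exponential loss and its negative gradient for labels in {−1, +1} (the values used are illustrative only):

import numpy as np

def exponential_loss(y, F):
    # L(y, F(x)) = exp(-y * F(x)), with y in {-1, +1}
    return np.exp(-y * F)

def negative_gradient(y, F):
    # -dL/dF = y * exp(-y * F(x)): the target each new weak learner fits
    return y * np.exp(-y * F)

y = np.array([1, -1, 1])                 # true labels (illustrative)
F = np.array([0.5, 0.2, -1.0])           # current ensemble scores (illustrative)
print(exponential_loss(y, F))
print(negative_gradient(y, F))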

Question 7: Decision Tree Split Metrics
Solution: (D) Mean squared error
For binary decision trees used in classification, the Gini index, entropy, and classification error are common splitting criteria. Mean squared error is typically used for regression trees, not for binary classification problems.
The Gini index measures impurity and is defined as:
\text{Gini}(t) = 1 - \sum_{i=1}^{c} p(i|t)^2    (9)

where p(i|t) is the proportion of samples belonging to class i at node t, and c is the number of classes.
Entropy is calculated as:
\text{Entropy}(t) = -\sum_{i=1}^{c} p(i|t) \log_2 p(i|t)    (10)

For regression trees, mean squared error is appropriate:


\text{MSE}(t) = \frac{1}{N_t} \sum_{i \in t} (y_i - \bar{y}_t)^2    (11)

where Nt is the number of samples at node t, yi is the target value of sample i, and ȳt is the mean target
value at node t.
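A minimal NumPy sketch of these three node criteria (the function names are illustrative):

import numpy as np

def gini(labels):
    # Gini(t) = 1 - sum_i p(i|t)^2
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    # Entropy(t) = -sum_i p(i|t) * log2 p(i|t)
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def mse(targets):
    # MSE(t) = (1/N_t) * sum_i (y_i - mean)^2, the regression criterion
    targets = np.asarray(targets, dtype=float)
    return np.mean((targets - targets.mean()) ** 2)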

Question 8: K-means Initialization


Solution: (D) Randomly within the range of the data
The standard K-means algorithm initializes cluster centers randomly within the data range. This
random initialization selects points from the feature space to serve as the starting positions for the cluster
centroids.
In standard K-means, if the data range for feature j is [a_j, b_j], then the initial centroids \mu_k^{(0)} have their j-th component initialized as:

\mu_{kj}^{(0)} \sim \text{Uniform}(a_j, b_j)    (12)

This uniform random selection across the data range ensures that the initial centroids cover the
feature space where data points exist.
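A minimal NumPy sketch of this initialization scheme (the rest of the K-means loop is omitted; names are illustrative):

import numpy as np

def init_centroids_uniform(X, K, seed=0):
    rng = np.random.default_rng(seed)
    a, b = X.min(axis=0), X.max(axis=0)           # per-feature range [a_j, b_j]
    # Each centroid component is drawn uniformly within the data range
    return rng.uniform(a, b, size=(K, X.shape[1]))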

Question 9: Dendrograms
Solution: (C) A tree-like diagram showing the hierarchy of clusters
A dendrogram is a tree-like visualization that shows how clusters are merged (agglomerative) or split
(divisive) at each step of hierarchical clustering, revealing the nested structure of the clustering.
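For illustration, a short SciPy sketch that builds and plots a dendrogram from Ward-linkage agglomerative clustering (the toy data and linkage choice are arbitrary assumptions):

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

X = np.random.default_rng(0).normal(size=(30, 2))   # toy data
Z = linkage(X, method="ward")    # agglomerative merges, recorded bottom-up
dendrogram(Z)                    # tree-like diagram of the merge hierarchy
plt.show()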

Question 10: Hierarchical Clustering Advantage


Solution: (A) It doesn’t require specifying the number of clusters beforehand
A key advantage of hierarchical clustering is that it produces a complete hierarchy of clusters. Users
can choose the appropriate number of clusters after examining the dendrogram, unlike K-means which
requires specifying K in advance.
Hierarchical clustering creates a nested tree of partitions, which can be cut at any level to yield a
specific number of clusters. This flexibility is particularly useful when the optimal number of clusters is
not known a priori. The hierarchy can be represented by a sequence of clusterings C = {C_0, C_1, ..., C_n}, where C_0 is the finest clustering (each point is its own cluster) and C_n is the coarsest clustering (all points in one cluster).

The appropriate number of clusters can be determined by examining the dendrogram and identifying
where the largest change in dissimilarity occurs, which is often visualized as a large vertical gap in the
dendrogram. This can be formally expressed as finding i that maximizes:

\Delta(i) = d(C_{i+1}) - d(C_i)    (13)

where d(Ci ) is the dissimilarity level at which clustering Ci is formed.
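A minimal SciPy sketch of this largest-gap heuristic: the merge heights d(C_i) recorded in the linkage matrix are scanned for the biggest jump, and the dendrogram is cut inside that gap (toy data; this is only one heuristic among several):

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.random.default_rng(0).normal(size=(40, 2))   # toy data
Z = linkage(X, method="ward")

heights = Z[:, 2]                        # merge dissimilarities d(C_i), non-decreasing
i = np.argmax(np.diff(heights))          # index of the largest gap Delta(i)
cut = (heights[i] + heights[i + 1]) / 2  # threshold inside that gap

labels = fcluster(Z, t=cut, criterion="distance")
print("Number of clusters chosen by the largest-gap cut:", labels.max())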
