Assignment 9 solution
where N_K(x) represents the K nearest neighbors of x according to some distance metric (typically
Euclidean distance: d(x, x_i) = ‖x − x_i‖_2). For classification, this becomes a majority vote:

ŷ(x) = arg max_c Σ_{i ∈ N_K(x)} I(y_i = c)    (2)
The choice of K controls the bias-variance tradeoff: smaller K leads to lower bias but higher variance,
while larger K provides smoother decision boundaries but may increase bias.
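As a sketch, the majority-vote rule in Equation (2) can be implemented directly with NumPy (the data points below are hypothetical, chosen only to illustrate the vote):

```python
import numpy as np

def knn_predict(X_train, y_train, x, k):
    """Classify x by majority vote among its k nearest training points
    (Euclidean distance), as in Equation (2)."""
    dists = np.linalg.norm(X_train - x, axis=1)   # d(x, x_i) = ||x - x_i||_2
    neighbors = y_train[np.argsort(dists)[:k]]    # labels of the k closest points
    classes, counts = np.unique(neighbors, return_counts=True)
    return classes[np.argmax(counts)]             # arg max_c of the vote counts

# Toy example: two well-separated clusters.
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([0.05, 0.1]), k=3))  # -> 0
```

With k=3 the query's three nearest neighbors have labels 0, 0, 1, so the vote returns class 0; raising k toward the dataset size would smooth the decision boundary as described above.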
The ensemble is then updated as:

F_{m+1}(x) = F_m(x) + α h_{m+1}(x)

where α is a learning rate and h_{m+1} is the weak learner fit at step m + 1. For example, with squared
error loss L(y, F(x)) = ½(y − F(x))^2, the negative gradient is simply the residual (y − F(x)), meaning
each new model is trained to predict the errors of previous models.
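A minimal sketch of this residual-fitting loop, using the mean residual as a stand-in "weak learner" (a deliberate simplification; real implementations fit trees to the residuals):

```python
import numpy as np

# Minimal gradient-boosting loop for squared error: each "weak learner"
# is just the mean residual (a constant), so the ensemble F converges
# to mean(y).
y = np.array([1.0, 2.0, 4.0])
F = np.zeros_like(y)            # initial ensemble F_0 = 0
alpha = 0.5                     # learning rate
for m in range(50):
    residual = y - F            # negative gradient of 1/2 (y - F)^2
    h = residual.mean()         # "weak learner" fit to the residuals
    F = F + alpha * h           # update F_{m+1} = F_m + alpha * h
print(F)                        # each entry approaches mean(y) = 2.333...
```

Because each step moves F along the negative gradient of the squared-error loss, the residuals shrink geometrically by the factor (1 − α) per iteration.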
−∂L(y, F(x))/∂F(x) = y · e^{−yF(x)}    (8)
which is precisely what each weak learner tries to approximate in each iteration, proving that AdaBoost
is indeed a gradient boosting algorithm minimizing exponential loss.
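The gradient in Equation (8) can be checked numerically with a central difference (the values of y and F below are arbitrary):

```python
import math

# Verify Equation (8): for exponential loss L(y, F) = exp(-y F),
# the negative gradient with respect to F is y * exp(-y F).
def exp_loss(y, F):
    return math.exp(-y * F)

y, F, eps = 1.0, 0.4, 1e-6
num_grad = (exp_loss(y, F + eps) - exp_loss(y, F - eps)) / (2 * eps)
analytic = y * math.exp(-y * F)
print(-num_grad, analytic)  # both approximately 0.6703
```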
Question 7: Decision Tree Split Metrics
Solution: (D) Mean squared error
For binary decision trees used in classification, Gini index, entropy, and classification error are common splitting criteria. Mean squared error is typically used for regression trees, not for binary classification problems.
The Gini index measures impurity and is defined as:
Gini(t) = 1 − Σ_{i=1}^{c} p(i|t)^2    (9)
where p(i|t) is the proportion of samples belonging to class i at node t, and c is the number of classes.
Entropy is calculated as:
Entropy(t) = − Σ_{i=1}^{c} p(i|t) log_2 p(i|t)    (10)
Mean squared error, the splitting criterion for regression trees, is:

MSE(t) = (1/N_t) Σ_{i ∈ t} (y_i − ȳ_t)^2    (11)

where N_t is the number of samples at node t, y_i is the target value of sample i, and ȳ_t is the mean
target value at node t.
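Equations (9) and (10) can be sketched directly in NumPy; for a perfectly mixed binary node, the Gini index is 0.5 and the entropy is 1 bit:

```python
import numpy as np

def gini(labels):
    """Gini index 1 - sum_i p(i|t)^2, Equation (9)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    """Entropy -sum_i p(i|t) log2 p(i|t), Equation (10)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

labels = np.array([0, 0, 1, 1])  # perfectly mixed binary node
print(gini(labels))              # -> 0.5
print(entropy(labels))           # -> 1.0
```

Both measures reach their maximum at a 50/50 split and drop to 0 for a pure node, which is why a split is chosen to maximize the impurity decrease.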
This uniform random selection across the data range ensures that the initial centroids cover the
feature space where data points exist.
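A minimal sketch of this initialization, drawing each centroid coordinate uniformly from the per-feature data range (the data and seed below are arbitrary):

```python
import numpy as np

# Uniform random centroid initialization over the data's bounding box:
# each centroid coordinate is drawn between the per-feature min and max,
# so every initial centroid lands inside the region occupied by the data.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                # hypothetical 2-D dataset
k = 3
low, high = X.min(axis=0), X.max(axis=0)     # per-feature data range
centroids = rng.uniform(low, high, size=(k, X.shape[1]))
print(centroids.shape)  # -> (3, 2)
```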
Question 9: Dendrograms
Solution: (C) A tree-like diagram showing the hierarchy of clusters
A dendrogram is a tree-like visualization that shows how clusters are merged (agglomerative) or split
(divisive) at each step of hierarchical clustering, revealing the nested structure of the clustering.
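Assuming SciPy is available, the merge structure can be sketched from the linkage matrix, whose third column holds the dendrogram heights (the 1-D data here is illustrative):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Agglomerative (single-linkage) clustering on toy 1-D data. Each row of
# the linkage matrix Z records one merge and its dissimilarity (the height
# at which the two clusters join in the dendrogram).
X = np.array([[0.0], [0.1], [5.0], [5.1]])
Z = linkage(X, method='single')
heights = Z[:, 2]          # merge heights, in increasing order
gaps = np.diff(heights)    # jumps in dissimilarity between merges
# Cutting the dendrogram at the largest gap: after i+1 merges of n points,
# n - (i + 1) clusters remain.
n_clusters = len(X) - (np.argmax(gaps) + 1)
print(n_clusters)          # -> 2
```

Here the first two merges happen at height 0.1 and the final merge at 4.9, so the largest gap cleanly separates {0.0, 0.1} from {5.0, 5.1}.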
The appropriate number of clusters can be determined by examining the dendrogram and identifying
where the largest change in dissimilarity occurs, which is often visualized as a large vertical gap in the
dendrogram. This can be formally expressed as finding i that maximizes: