Week 7 solution

The document contains multiple-choice questions (MCQs) related to decision trees, focusing on concepts such as attribute selection measures, binary vs. multiway splits, pruning techniques, and the use of Chi-Square and Gini index. It explains the correct answers and provides reasoning for each question, highlighting the characteristics and limitations of decision trees. Key points include that K-Nearest Neighbors is not an attribute selection measure, decision trees are prone to overfitting, and Gini index measures node impurity.

1. Which of the following is NOT an attribute selection measure used in decision trees?

A) Entropy
B) Information Gain
C) Chi-Square
D) K-Nearest Neighbors (KNN)
Answer: D) K-Nearest Neighbors (KNN)
Explanation:
Decision trees use measures like Entropy, Information Gain, and Chi-Square to determine the best
split at each node. KNN is a classification algorithm and is not used for attribute selection in decision
trees.
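
A rough illustration (not part of the original questions): the Python sketch below computes entropy and information gain for a toy binary split. The helper functions and the sample labels are invented for demonstration.

import numpy as np

def entropy(labels):
    # Shannon entropy of a 1-D array of class labels
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent_labels, child_label_groups):
    # Entropy(parent) minus the weighted average entropy of the children
    n = len(parent_labels)
    weighted_children = sum(len(c) / n * entropy(c) for c in child_label_groups)
    return entropy(parent_labels) - weighted_children

# Toy example: 8 samples split into two child nodes by some candidate attribute
parent = np.array([1, 1, 1, 1, 0, 0, 0, 0])
left, right = parent[:6], parent[6:]
print(information_gain(parent, [left, right]))   # ~0.31 for this split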

2. What is the key difference between a binary split and a multiway split in decision trees?
A) Binary splits divide the data into two groups, while multiway splits create multiple child nodes.
B) Multiway splits are used only for numerical attributes, whereas binary splits are for categorical
attributes.
C) Binary splits use entropy, while multiway splits use Gini index.
D) Multiway splits always result in better accuracy than binary splits.
Answer: A) Binary splits divide the data into two groups, while multiway splits create multiple child
nodes.
Explanation:
- Binary splits create two branches from a node, dividing the data into two groups.
- Multiway splits allow multiple branches, creating more than two child nodes.
- Both binary and multiway splits can be used for numerical or categorical attributes, depending on the decision tree implementation (a small sketch contrasting the two appears below).
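
A hypothetical illustration of the difference, using an invented categorical attribute "color": a binary split produces two child nodes, while a multiway split produces one child per distinct value. (Some implementations, such as scikit-learn's CART-based trees, only ever perform binary splits.)

rows = ["red", "green", "blue", "red", "blue", "green"]

# Binary split: two child nodes, e.g. {red} vs. {green, blue}
binary_children = {
    "red": [v for v in rows if v == "red"],
    "not_red": [v for v in rows if v != "red"],
}

# Multiway split: one child node per distinct attribute value
multiway_children = {v: [x for x in rows if x == v] for v in set(rows)}

print(len(binary_children), "children vs.", len(multiway_children), "children")  # 2 vs. 3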

3. In decision tree pruning, which technique removes unnecessary nodes AFTER the tree has
been fully grown?
A) Pre-Pruning
B) Post-Pruning
C) Overfitting Pruning
D) Random Forest
Answer: B) Post-Pruning
Explanation:
- Post-pruning (for example, Reduced Error Pruning) removes nodes after the full tree has been built.
- It evaluates subtrees and removes branches that do not significantly improve accuracy, helping to reduce overfitting.
- Pre-pruning, in contrast, stops tree growth early based on conditions like a minimum number of samples per split (a scikit-learn sketch of both approaches appears below).
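
The sketch below, assuming scikit-learn and its built-in iris dataset, contrasts pre-pruning (growth limits fixed before fitting) with cost-complexity post-pruning via ccp_alpha, which is the post-pruning mechanism scikit-learn provides (a different algorithm from Reduced Error Pruning); the parameter values are arbitrary.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Pre-pruning: constrain the tree before it is grown
pre = DecisionTreeClassifier(max_depth=3, min_samples_split=10, random_state=0)
pre.fit(X_tr, y_tr)

# Post-pruning: compute the cost-complexity pruning path of the fully grown tree,
# then refit with a chosen penalty (here, an arbitrary alpha from the path)
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_tr, y_tr)
post = DecisionTreeClassifier(ccp_alpha=path.ccp_alphas[-2], random_state=0)
post.fit(X_tr, y_tr)

print("pre-pruned test accuracy :", pre.score(X_te, y_te))
print("post-pruned test accuracy:", post.score(X_te, y_te))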
4. How is the Chi-Square test used for decision tree splitting?
A) It calculates entropy to determine the best split.
B) It measures the statistical significance of differences between parent and child nodes.
C) It ensures all splits are binary.
D) It helps reduce the number of categorical features.
Answer: B) It measures the statistical significance of differences between parent and child nodes.
Explanation:
The Chi-Square test measures whether a split significantly improves classification by checking
differences in observed vs. expected frequencies of target variables. A higher Chi-Square value
means a better split.
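
A minimal sketch of the idea, assuming scipy is available: the class counts in the child nodes of a candidate split form a contingency table, and chi2_contingency tests whether the class distribution differs significantly across the children. The counts below are invented.

from scipy.stats import chi2_contingency

# Rows = child nodes produced by a candidate split, columns = class counts
observed = [
    [30, 10],   # left child:  30 positive, 10 negative
    [5, 35],    # right child:  5 positive, 35 negative
]

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi-square = {chi2:.2f}, p-value = {p_value:.4f}")
# A large chi-square (small p-value) indicates the split separates the classes well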

5. What is the main disadvantage of decision trees compared to other machine learning
algorithms?
A) Decision trees are difficult to interpret.
B) They always underfit the data.
C) They are prone to overfitting, especially with deep trees.
D) They require extensive data cleaning.
Answer: C) They are prone to overfitting, especially with deep trees.
Explanation:
- Decision trees tend to overfit when they become too complex, learning noise in the training data.
- This issue can be addressed using pruning or ensemble methods like Random Forest to improve generalization (a small sketch of the resulting train/test gap appears below).
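
As a rough demonstration on synthetic data (all settings hypothetical), the sketch below compares a fully grown tree with a depth-limited one; the deep tree typically fits the training set almost perfectly while scoring noticeably lower on held-out data.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, flip_y=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

deep = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)                   # fully grown
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)   # depth-limited

print("deep tree    train/test:", deep.score(X_tr, y_tr), deep.score(X_te, y_te))
print("shallow tree train/test:", shallow.score(X_tr, y_tr), shallow.score(X_te, y_te))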
MCQs (Decision Tree)
1. Given entropy of parent = 1, weights = (3/4, 1/4) and entropies of children =
(0.9, 0). What is the information gain?
a) 0.675
b) 0.75
c) 0.325
d) 0.1
Ans: c)
Explanation: We know Information Gain = Entropy(Parent) − Σ (weight × Entropy(Child)).

Information Gain = 1 − (3/4 × 0.9 + 1/4 × 0)
                 = 1 − 0.675
                 = 0.325

2. Which of the following statements is not true about Information Gain?

a) It is used to determine which feature/attribute gives us the maximum information
about a class
b) It is based on the concept of entropy, which is the degree of impurity or disorder
c) It aims to reduce the level of entropy starting from the root node to the leaf nodes
d) It often increases the level of entropy from the root node to the leaf nodes
Ans: d)
Explanation: Information Gain is based on the concept of entropy; it never tries to increase but instead tries to reduce the level of entropy from the root node down to the leaf nodes.

3. If a dataset has three classes with probabilities 0.2, 0.3, and 0.5, what is the Gini
index?
a) 0.50
b) 0.62
c) 0.42
d) 0.38
Ans: b)
Explanation: Gini = 1 − ((0.2)² + (0.3)² + (0.5)²) = 1 − (0.04 + 0.09 + 0.25) = 1 − 0.38 = 0.62
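
A quick check of the calculation in plain Python:

probs = [0.2, 0.3, 0.5]
gini = 1 - sum(p ** 2 for p in probs)
print(gini)   # 0.62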

4. The Gini coefficient in a decision tree is used to measure:


a) The depth of the tree
b) The impurity of a node
c) The number of leaves in the tree
d) The accuracy of the model
Ans: b)
Explanation: The Gini coefficient in a decision tree measures the impurity of a node. The split whose child nodes have the lowest Gini value (i.e., the purest nodes) is preferred.
5. Which criterion is used by default in DecisionTreeClassifier() for classification?
a) Entropy
b) Gini
c) Mean Squared Error
d) Information Gain
Ans: b)
Explanation: The Gini index (criterion='gini') is the default splitting criterion in scikit-learn's DecisionTreeClassifier.
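
A quick check, assuming scikit-learn is installed:

from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier()
print(clf.criterion)   # 'gini' by default
# Entropy / information gain can be requested explicitly:
clf_entropy = DecisionTreeClassifier(criterion="entropy")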
