DM Chapter 4
Remember:
- Entropy is used to determine which feature to use for splitting, not where to split.
- The goal is to reduce entropy (i.e., increase homogeneity) and maximize information gain.
Step 3: The examples are then partitioned into groups according to the distinct values of this feature.
Next, among the group of movies with a larger number of celebrities, we can make another split
between:
- movies with a high budget
- movies without a high budget
As illustrated by the peak in entropy at x = 0.50, a 50-50 split results in the maximum entropy.
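For a quick numerical check (not from the notes), the short Python sketch below computes two-class entropy and shows that it peaks at 1 bit when the classes are split 50-50:

```python
# Minimal sketch: entropy (in bits) of a two-class split with positive-class proportion p.
import math

def entropy(p):
    if p in (0.0, 1.0):
        return 0.0  # a pure partition has zero entropy
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

for p in (0.0, 0.25, 0.50, 0.75, 1.0):
    print(f"p = {p:.2f} -> entropy = {entropy(p):.3f}")
# entropy peaks at 1.000 when p = 0.50 and falls to 0 for pure partitions
```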
Information gain:
- Information gain is used to calculate the change in entropy resulting from a split on each
possible feature.
- The algorithm checks different features and selects the one that provides the highest information
gain, meaning it best separates the classes.
- If the information gain = 0 → no reduction in entropy from splitting on this feature → the split does not improve class separation.
Information Gain (Feature) = Entropy Before Split − Weighted Entropy After Split
InfoGain(F) = Entropy(S1) − Entropy(S2), where S1 is the data before the split and S2 is the resulting set of partitions.
- Weighted Entropy After Split: entropy in the partitions resulting from the split.
Entropy(S2) = Σ (i = 1 to n) wi * Entropy(Pi)
Weighted Entropy after split = w1 * Entropy(P1) + … + wn * Entropy(Pn)
where wi is the proportion of data points in partition Pi:
wi = (number of data points in partition Pi) / (total number of data points)
This means that after splitting a dataset into multiple groups (partitions), the overall entropy of the new
dataset is calculated by considering:
1. The entropy of each individual partition (how mixed or pure it is).
2. The size (proportion) of each partition relative to the total dataset.
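As a worked illustration of points 1 and 2 above, the hedged Python sketch below computes the entropy of each partition, weights it by partition size, and subtracts the result from the pre-split entropy to obtain the information gain. The labels and the two partitions are made-up example data (loosely echoing the celebrities split), not figures from the notes.

```python
# Sketch of the information-gain calculation described above (illustrative data only).
from collections import Counter
import math

def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def info_gain(parent_labels, partitions):
    """InfoGain(F) = Entropy(S1) - weighted Entropy(S2) over the partitions."""
    total = len(parent_labels)
    weighted_after = sum((len(p) / total) * entropy(p) for p in partitions)  # Entropy(S2)
    return entropy(parent_labels) - weighted_after

# Hypothetical example: 10 movies labelled hit/flop, split on "many celebrities" (yes/no).
parent = ["hit"] * 5 + ["flop"] * 5
partitions = [["hit", "hit", "hit", "hit", "flop"],    # many celebrities
              ["hit", "flop", "flop", "flop", "flop"]]  # few celebrities
print(f"InfoGain = {info_gain(parent, partitions):.3f}")  # ~0.278 bits
```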
Handling Numeric Features
- The previous formulae assume nominal features, but decision trees use information gain for
splitting on numeric features as well.
- A common practice is testing various splits that divide the values into groups greater than or less
than a threshold; this reduces the numeric feature into a two-level categorical feature.
- The numeric threshold (e.g., "greater than 50" vs. "less than 50") yielding the largest information gain is chosen for the split; a sketch of this search follows below.
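The following is a minimal sketch of that threshold search; the best_threshold helper, the budget values, and the hit/flop labels are hypothetical illustration choices, not C5.0's actual implementation.

```python
# Sketch: score each candidate numeric threshold by information gain and keep the best one.
from collections import Counter
import math

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Pick the cut point whose <= / > split yields the largest information gain."""
    base = entropy(labels)
    best_t, best_gain = None, -1.0
    for t in sorted(set(values))[:-1]:                       # candidate cut points
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        if base - weighted > best_gain:
            best_t, best_gain = t, base - weighted
    return best_t, best_gain

budgets = [10, 20, 35, 50, 80, 120]                 # hypothetical budgets (millions)
outcomes = ["flop", "flop", "flop", "hit", "hit", "hit"]
print(best_threshold(budgets, outcomes))            # -> (35, 1.0): split on "budget > 35"
```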
Pruning the decision tree
Large tree → overly specific decisions → overfitted model
Pruning a decision tree involves reducing its size.
Post-pruning:
- Post-pruning involves:
1. growing a tree that is too large
2. using pruning criteria based on the error rates at the nodes to reduce the size of the tree to a
more appropriate level.
- This is often more effective than pre-pruning because it is difficult to determine the optimal depth of a decision tree without growing it first; the sketch below illustrates the grow-then-prune idea.
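As a rough illustration only, the sketch below uses scikit-learn's CART-style cost-complexity pruning (the ccp_alpha parameter), which is a different criterion from C5.0's error-based pruning; the dataset and the alpha value are arbitrary illustration choices.

```python
# Grow a large tree, then fit a post-pruned version and compare sizes (illustrative only;
# scikit-learn implements CART cost-complexity pruning, not C5.0's error-based pruning).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)                    # overly large tree
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.01).fit(X_train, y_train)  # post-pruned tree

for name, model in [("full", full), ("pruned", pruned)]:
    print(f"{name}: {model.get_n_leaves()} leaves, test accuracy = {model.score(X_test, y_test):.3f}")
```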
C5.0’s Approach to Pruning