Module 9 - CART
[Figure: machine-learning taxonomy. Supervised: Regression, Classification. Unsupervised: Clustering, Dimensionality Reduction.]
[Figure: a toy decision tree and the corresponding partition of the feature space. The tree splits on height and weight thresholds (e.g., Height > 180, Weight > 100); the resulting rectangles are labeled R1, R2, R3, and one leaf predicts Male.]

Decision tree terminology (a small fitted example follows the list):
✓ Root node
✓ Splitting
✓ Branch
✓ Decision node (internal node)
✓ Leaf node (terminal node)
✓ Sub-tree
✓ Depth (level)
✓ Pruning
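To make the terminology concrete, here is a minimal sketch of fitting and printing a small tree with scikit-learn; the height/weight data and the depth limit below are invented for illustration.

```python
# Minimal illustrative sketch (hypothetical data): fit a depth-2 tree and
# print its root node, internal nodes, and leaf nodes.
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[185, 90], [178, 95], [175, 70], [168, 60], [190, 105], [160, 55]]  # [height_cm, weight_kg]
y = ["Male", "Male", "Male", "Female", "Male", "Female"]

tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, y)
print(export_text(tree, feature_names=["height", "weight"]))
```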
Classification error rate, Gini index, and cross entropy all measure node impurity.
$\text{Gini} = 1 - \sum_{j} p_j^2$
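As a quick illustration of the formula, where $p_j$ is the proportion of observations of class $j$ in the node, here is a small hand-rolled sketch (the function name is ours):

```python
# Gini = 1 - sum_j p_j^2, computed for the class labels in one node.
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

print(gini(["M", "M", "F", "F"]))  # 0.5: maximally impure two-class node
print(gini(["M", "M", "M", "M"]))  # 0.0: pure node
```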
• For a regression tree, the regions are chosen to minimize $\mathrm{RSS} = \sum_{j=1}^{J} \sum_{i \in R_j} (y_i - \hat{y}_{R_j})^2$, where $\hat{y}_{R_j}$ is the mean target for the training observations within the $j$th rectangle.
• Splitting is greedy: the best split is made at that particular step, rather than looking ahead and picking a split that would lead to a better tree in some future step (a toy sketch of one such step follows).
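Below is a toy sketch of one greedy step, assuming a single numeric feature and RSS as the splitting criterion (all names and data here are illustrative):

```python
# One greedy step of recursive binary splitting for regression: try every
# candidate threshold on one feature and keep the split with the lowest RSS.
def best_split(x, y):
    def rss(vals):
        if not vals:
            return 0.0
        mean = sum(vals) / len(vals)  # the leaf prediction is the region mean
        return sum((v - mean) ** 2 for v in vals)

    best_t, best_rss = None, float("inf")
    for t in sorted(set(x)):
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        if rss(left) + rss(right) < best_rss:
            best_t, best_rss = t, rss(left) + rss(right)
    return best_t, best_rss

# The response jumps at x = 3, so the best threshold is found there.
print(best_split([1, 2, 3, 4, 5, 6], [1.0, 1.1, 0.9, 5.0, 5.2, 4.9]))
```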
[Figure: left, the output of recursive binary splitting on a two-dimensional example; center, a tree corresponding to the partition in the left panel; right, a perspective plot of the prediction surface corresponding to that tree.]
1. Classification error rate: $E = 1 - \max_{k}(\hat{p}_{mk})$
2. Gini index: $G = \sum_{k=1}^{K} \hat{p}_{mk}(1 - \hat{p}_{mk})$
3. Cross entropy: $D = -\sum_{k=1}^{K} \hat{p}_{mk} \log \hat{p}_{mk}$
• Where $\hat{p}_{mk}$ represents the proportion of training observations in the $m$th region that are from the $k$th class.
• The classification error rate is not sufficiently sensitive to node purity, so in practice either the Gini index or cross entropy is preferred (see the sketch below).
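The sketch below (our own toy example) computes all three measures for two nodes that share the same classification error but differ in purity, which is exactly the insensitivity described above:

```python
# Two nodes share classification error 0.25, yet Gini and cross entropy
# still distinguish one node's class mix from the other's.
import math
from collections import Counter

def impurities(labels):
    n = len(labels)
    p = [c / n for c in Counter(labels).values()]
    error = 1.0 - max(p)                             # classification error
    gini = sum(pk * (1.0 - pk) for pk in p)          # Gini index
    entropy = -sum(pk * math.log(pk) for pk in p)    # cross entropy
    return error, gini, entropy

print(impurities(["A"] * 6 + ["B"] * 2))              # (0.25, 0.375, 0.562)
print(impurities(["A"] * 6 + ["B"] * 1 + ["C"] * 1))  # (0.25, 0.406, 0.736)
```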
• In cost-complexity pruning, each value of $\alpha \ge 0$ corresponds to the subtree $T$ that minimizes $\sum_{m=1}^{|T|} \sum_{i:\, x_i \in R_m} (y_i - \hat{y}_{R_m})^2 + \alpha |T|$, where $|T|$ is the number of terminal nodes.
• $\alpha$ controls the bias-variance trade-off and is determined by cross-validation (sketched below).
• Lastly, we return to the full data set and obtain the subtree corresponding to the chosen $\alpha$.
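In scikit-learn, $\alpha$ corresponds to the `ccp_alpha` parameter. A minimal sketch of choosing it by cross-validation and then refitting on the full data set (synthetic data and parameter choices are ours):

```python
# Cost-complexity pruning: get candidate alphas from the pruning path,
# pick the best by 5-fold CV, and refit on the full data set.
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=4, noise=10.0, random_state=0)

path = DecisionTreeRegressor(random_state=0).cost_complexity_pruning_path(X, y)
alphas = [a for a in path.ccp_alphas if a >= 0]  # guard against tiny negative values

search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid={"ccp_alpha": alphas},
    cv=5,
)
search.fit(X, y)  # GridSearchCV refits the best subtree on the full data set
print(search.best_params_)
```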
Pros:
• Easy to interpret and visualize
• Can easily handle categorical data without the need to create dummy variables
• Can easily capture non-linear patterns
• Can handle data in its raw form (no preprocessing needed), since splits only compare feature values against thresholds, so scaling and normalization have no effect
• Makes no assumptions about the distribution of the data, because of the non-parametric nature of the algorithm
Cons:
• Poor predictive accuracy relative to other supervised learning approaches.
• Sensitive to noisy data: it can overfit, and small variations in the data can result in a very different decision tree.*
*This variance can be reduced by bagging and boosting algorithms (see the sketch below).
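A minimal sketch of the variance reduction from bagging, using scikit-learn on synthetic data (the model and parameter choices are ours):

```python
# Compare a single unpruned tree against 100 bagged trees by 5-fold CV.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

single = DecisionTreeClassifier(random_state=0)
bagged = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100, random_state=0)

print(cross_val_score(single, X, y, cv=5).mean())  # one tree: higher variance
print(cross_val_score(bagged, X, y, cv=5).mean())  # bagged trees: usually better
```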