Pruning is the process of removing branches or nodes from a decision tree to simplify it and reduce overfitting. Some key points about pruning:
- Pruning reduces the complexity of the decision tree to avoid overfitting to the training data.
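- It improves accuracy on new, unseen data by removing parts of the tree that fit noise in the training set or are unstable.
- Common techniques include pre-pruning (stopping growth early, e.g. with a maximum depth or a minimum number of samples per leaf) and post-pruning methods such as cost-complexity pruning and reduced error pruning.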
- The goal of pruning is to find a tree with optimal complexity that balances bias and variance for best generalization on new data.
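
As a rough illustration of reduced error pruning, here is a minimal sketch in plain Python. The `Node` class, the toy tree, and the validation points are all made up for this example; they are assumptions, not part of any library. The idea is to walk the tree bottom-up and replace a subtree with a leaf whenever that does not increase the error on a held-out validation set.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Sample = Tuple[List[float], int]  # (feature vector, class label)


@dataclass
class Node:
    """Hypothetical tree node; feature=None marks a leaf."""
    prediction: int                  # majority training class at this node
    feature: Optional[int] = None
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.feature is None


def predict(node: Node, x: List[float]) -> int:
    # Follow the splits until a leaf is reached.
    while not node.is_leaf():
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.prediction


def errors(node: Node, data: List[Sample]) -> int:
    # Number of misclassified samples in data.
    return sum(predict(node, x) != y for x, y in data)


def count_leaves(node: Node) -> int:
    if node.is_leaf():
        return 1
    return count_leaves(node.left) + count_leaves(node.right)


def prune(node: Node, val: List[Sample]) -> Node:
    """Bottom-up reduced error pruning using the validation samples
    that reach this node. Ties favor the simpler tree."""
    if node.is_leaf():
        return node
    left_val = [(x, y) for x, y in val if x[node.feature] <= node.threshold]
    right_val = [(x, y) for x, y in val if x[node.feature] > node.threshold]
    node.left = prune(node.left, left_val)
    node.right = prune(node.right, right_val)
    leaf = Node(prediction=node.prediction)
    if errors(leaf, val) <= errors(node, val):
        return leaf  # the subtree does not help on validation data
    return node


def build_demo_tree() -> Node:
    # Toy tree: the split on feature 1 is assumed to have fit noise.
    noisy = Node(prediction=1, feature=1, threshold=2.0,
                 left=Node(prediction=1), right=Node(prediction=0))
    return Node(prediction=0, feature=0, threshold=5.0,
                left=Node(prediction=0), right=noisy)


# Made-up validation set: class is 1 exactly when feature 0 is above 5.
VAL: List[Sample] = [
    ([1, 0], 0), ([2, 3], 0), ([3, 1], 0),
    ([7, 1], 1), ([8, 3], 1), ([9, 4], 1), ([6, 1], 1),
]

tree = build_demo_tree()
leaves_before = count_leaves(tree)
errors_before = errors(tree, VAL)
tree = prune(tree, VAL)
leaves_after = count_leaves(tree)
errors_after = errors(tree, VAL)
print(f"leaves: {leaves_before} -> {leaves_after}, "
      f"validation errors: {errors_before} -> {errors_after}")
```

On this toy data the noisy split is collapsed into a single leaf, while the root split is kept because removing it would increase validation error. Real libraries typically offer other variants; scikit-learn, for instance, exposes cost-complexity pruning through the `ccp_alpha` parameter of its tree estimators.
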
To answer your question, tree-based models and linear models each have their own advantages in different situations: