Decision Trees
Nested if-else: a decision tree can be viewed as a set of nested if-else conditions on the features (see the sketch after the node definitions below).
A decision tree starts from a node with a condition and splits into several child nodes; the resulting diagram is called a tree.
Root node: the starting (topmost) node of the tree.
Leaf node: a terminal node with no children; it assigns the final class label.
Internal nodes: nodes that are neither the root nor leaves.
Non-leaf nodes: nodes that test a condition and make a decision; these are called the decision nodes of the tree.
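A minimal sketch of this idea in Python (the feature names, thresholds, and class labels below are made up for illustration, not from a real model):

```python
# A hypothetical 2-feature decision tree written as nested if-else.
def predict(height_cm, weight_kg):
    if height_cm <= 170:            # root node: condition on feature 1
        if weight_kg <= 65:         # internal (decision) node: condition on feature 2
            return "class_A"        # leaf node
        else:
            return "class_B"        # leaf node
    else:
        return "class_B"            # leaf node

print(predict(160, 60))  # -> class_A
```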
Geometry:
In a decision tree, all of the separating hyperplanes are axis-parallel, because each split tests a single feature against a threshold.
Building a Decision Tree:
Entropy:
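For a discrete random variable Y taking k values with probabilities p_1, ..., p_k, the entropy is

H(Y) = -\sum_{i=1}^{k} p_i \log_2 p_i

(measured in bits when the logarithm is base 2).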
Calculation of the Entropy:
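A minimal sketch of this calculation in Python (the class counts are made-up example data):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy H(Y) = -sum(p_i * log2(p_i)) of a list of class labels."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Example: 9 positives and 5 negatives
y = ["+"] * 9 + ["-"] * 5
print(round(entropy(y), 3))  # ~0.940
```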
Properties of Entropy:
Various cases:
Entropy curve:
For a binary variable this curve is symmetric about p = 0.5, where the entropy is maximal (1 bit), and it falls to 0 at p = 0 and p = 1.
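A quick check of this symmetry, using H(p) = -p log2(p) - (1-p) log2(1-p) (the printed values are computed, not measured data):

```python
import math

def binary_entropy(p):
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(p, round(binary_entropy(p), 3))
# H(0.1) == H(0.9), H(0.3) == H(0.7), maximum of 1.0 at p = 0.5
```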
Conclusion:
Entropy for a real-valued feature:
Given a random variable, if all outcomes are equiprobable, the entropy is maximum.
The more uniform the data distribution, the higher the entropy.
If the distribution is peaked (e.g., Gaussian-like) rather than uniform, the entropy is lower.
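A rough illustration of this claim, assuming we discretize samples into equal-width bins and compute the entropy of the resulting histogram (the bin count and sample size are arbitrary choices for this sketch):

```python
import math
import random

def hist_entropy(samples, bins=20):
    """Entropy of an equal-width histogram of the samples."""
    lo, hi = min(samples), max(samples)
    width = (hi - lo) / bins or 1.0
    counts = [0] * bins
    for x in samples:
        counts[min(int((x - lo) / width), bins - 1)] += 1
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts if c)

random.seed(0)
uniform_samples = [random.uniform(-3, 3) for _ in range(10000)]
gaussian_samples = [random.gauss(0, 1) for _ in range(10000)]
print("uniform :", round(hist_entropy(uniform_samples), 3))   # close to log2(20) ~ 4.32
print("gaussian:", round(hist_entropy(gaussian_samples), 3))  # noticeably smaller
```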
Information gain:
We recursively split each node, using maximum information gain as the criterion, until we reach pure nodes (or some other stopping condition).
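Using the entropy defined above, the information gain of splitting a parent node D into children D_1, ..., D_m is

IG(D, \text{split}) = H(D) - \sum_{j=1}^{m} \frac{|D_j|}{|D|} H(D_j)

and at each node the split with the largest information gain is chosen.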
As the depth of the tree increases, the chance of overfitting the data increases.
If the depth is small, the decision tree tends to underfit.
The key hyperparameter of a decision tree is therefore its depth (height).
We use cross validation to choose the depth of the decision tree.
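A minimal sketch of this with scikit-learn (the dataset and the candidate depth range are arbitrary choices; any labelled dataset works the same way):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Try several depths; 5-fold cross-validation picks the best one.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [1, 2, 3, 4, 5, 6, 8, 10]},
    cv=5,
)
search.fit(X, y)
print(search.best_params_)        # e.g. {'max_depth': 3}
print(round(search.best_score_, 3))
```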
Building a decision tree: splitting numerical features (a sketch of the threshold search appears after this block):
If there are outliers or noise in the data, a deep decision tree tends to fit those points, causing the model to overfit.
A decision stump is a tree of very small depth (typically depth 1); with so little depth it tends to underfit the data.
The depth is chosen using cross-validation.
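To split on a numerical feature, one common approach is to sort the values, try midpoints between consecutive distinct values as candidate thresholds, and keep the threshold with the largest information gain. A minimal sketch (the toy data is made up; the entropy helper is the same as earlier):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_numeric_split(values, labels):
    """Return (threshold, information_gain) maximizing IG for one feature."""
    parent_h = entropy(labels)
    pairs = sorted(zip(values, labels))
    best = (None, 0.0)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold between equal values
        thr = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for v, lab in pairs if v <= thr]
        right = [lab for v, lab in pairs if v > thr]
        n = len(labels)
        gain = parent_h - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)
        if gain > best[1]:
            best = (thr, gain)
    return best

# Toy data: the feature separates the classes around 3.0
print(best_numeric_split([1.0, 2.0, 2.5, 3.5, 4.0, 5.0],
                         ["a", "a", "a", "b", "b", "b"]))  # (3.0, 1.0)
```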
Decision trees work well when the dataset is large, the dimensionality is small or reasonable, and low latency is needed at prediction time.
Regression using Decision Trees:
Instead of information gain, mean squared error (or mean absolute deviation) is used as the splitting criterion to build regression trees.
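A minimal sketch with scikit-learn's regression tree; in recent scikit-learn versions the default criterion is "squared_error", and "absolute_error" selects MAD-style splits (the 1-D data here is made up for illustration):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 5, size=(80, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(0, 0.1, size=80)

# Splits are chosen to maximize the reduction in squared error.
reg = DecisionTreeRegressor(max_depth=3, criterion="squared_error", random_state=0)
reg.fit(X, y)
print(reg.predict([[1.0], [4.0]]))  # piecewise-constant, axis-parallel predictions
```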
All decision boundaries remain axis-parallel in decision tree models, for both regression and classification.
Decision trees need the data given explicitly as feature values; they cannot operate on a similarity matrix alone.
Decision trees extend naturally to multi-class classification.
Decision surface:
The decision surfaces we obtain are non-linear: the tree partitions the feature space using axis-parallel planes / hyperplanes.
Decision trees capture feature interactions when deciding the class of a query point; these are logical interactions (conditions on several features combined along a path from the root to a leaf).
Outliers:
As noted above, deep trees can fit outliers and noise, which leads to overfitting; restricting the depth makes the tree more robust to them.
Interpretability:
Decision trees are highly interpretable: every prediction can be explained as the sequence of if-else conditions along the path from the root to a leaf.
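A quick way to see this with scikit-learn is to print a fitted tree as if-else style rules (the dataset here is an arbitrary example):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(data.data, data.target)

# export_text renders the tree as nested if-else style rules.
print(export_text(clf, feature_names=list(data.feature_names)))
```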
Feature importance:
For each feature, we can sum the reductions in entropy (information gain) it contributes across all the splits where it is used; this sum measures the feature's importance.
If a feature is used at many splits (with large entropy reductions), we can conclude that it is a more important feature.
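This is essentially what scikit-learn exposes as feature_importances_: the normalized total impurity reduction attributed to each feature across all splits. A minimal sketch (the dataset is an arbitrary example):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

data = load_iris()
clf = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
clf.fit(data.data, data.target)

# Higher value = larger total entropy reduction attributed to that feature.
for name, imp in zip(data.feature_names, clf.feature_importances_):
    print(f"{name}: {imp:.3f}")
```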
Exercise: