Decision Trees Notes
Decision Trees are divided into Classification and Regression Trees. Regression trees are
used when the response variable is numeric or continuous. Classification trees, as the
name implies, are used to separate the dataset into the classes of a categorical response variable.
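scikit-learn (linked at the end of these notes) exposes this distinction directly as two
estimators. A minimal sketch, using toy data made up purely for illustration:

from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Toy data: two numeric features per sample (illustrative values only).
X = [[0, 0], [1, 1], [2, 2], [3, 3]]

# Categorical response -> classification tree.
clf = DecisionTreeClassifier().fit(X, ["a", "a", "b", "b"])
print(clf.predict([[1.5, 1.5]]))  # predicts a class label

# Numeric/continuous response -> regression tree.
reg = DecisionTreeRegressor().fit(X, [0.0, 1.0, 2.0, 3.0])
print(reg.predict([[1.5, 1.5]]))  # predicts a numeric value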
Basic terminology used with Decision Trees (illustrated in the code sketch after this list):
Root Node: Represents the entire population or sample, which further gets divided into two or
more homogeneous sets.
Splitting: The process of dividing a node into two or more sub-nodes.
Decision Node: When a sub-node splits into further sub-nodes, it is called a decision node.
Leaf/Terminal Node: Nodes that do not split are called leaf or terminal nodes.
Pruning: Removing the sub-nodes of a decision node is called pruning; it is the opposite
of splitting.
Branch/Sub-Tree: A subsection of the entire tree is called a branch or sub-tree.
Parent and Child Node: A node that is divided into sub-nodes is called the parent node of those
sub-nodes, and the sub-nodes are its children.
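The sketch below maps this terminology onto a fitted scikit-learn tree. It assumes the
bundled iris dataset; the max_depth and ccp_alpha values are illustrative, not recommendations.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

# A shallow tree keeps the printout small (max_depth chosen only for display).
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# In the printout: the first split is the root node, further splits are
# decision nodes, and lines ending in "class: ..." are leaf/terminal nodes.
print(export_text(clf, feature_names=["sepal len", "sepal wid",
                                      "petal len", "petal wid"]))

# Pruning: cost-complexity pruning removes sub-nodes after growing the tree;
# a larger ccp_alpha prunes more aggressively (0.02 is an arbitrary example).
full = DecisionTreeClassifier(random_state=0).fit(X, y)
pruned = DecisionTreeClassifier(ccp_alpha=0.02, random_state=0).fit(X, y)
print("nodes before/after pruning:", full.tree_.node_count, pruned.tree_.node_count)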
Advantages:
Are simple to understand and interpret. People are able to understand decision tree
models after a brief explanation.
Have value even with little hard data. Important insights can be generated based on
experts describing a situation (its alternatives, probabilities, and costs) and their
preferences for outcomes.
Help determine worst, best and expected values for different scenarios.
Use a white box model. If a given result is produced by the model, the decision rules
that led to it can be read directly from the tree.
Can be combined with other decision techniques.
Disadvantages:
They are unstable, meaning that a small change in the data can lead to a large change
in the structure of the optimal decision tree.
They are often relatively inaccurate. Many other predictors perform better with similar
data. This can be remedied by replacing a single decision tree with a random forest of
decision trees (see the sketch after this list), but a random forest is not as easy to
interpret as a single decision tree.
For data including categorical variables with different numbers of levels, information gain
in decision trees is biased in favor of attributes with more levels.[7]
Calculations can get very complex, particularly if many values are uncertain and/or if
many outcomes are linked.
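A rough sketch of the random-forest remedy mentioned above, again assuming the iris
dataset; n_estimators=100 and cv=5 are illustrative choices, and the scores depend on the data:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Averaging many randomized trees usually reduces the variance (instability)
# of a single tree, at the cost of interpretability.
tree = DecisionTreeClassifier(random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0)

print("single tree  :", cross_val_score(tree, X, y, cv=5).mean())
print("random forest:", cross_val_score(forest, X, y, cv=5).mean())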
Decision tree classifier parameters:
https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html
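A few of the commonly tuned parameters from that page, with illustrative (not
recommended) values:

from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier(
    criterion="gini",       # split-quality measure ("gini" or "entropy")
    max_depth=5,            # cap on tree depth, limits overfitting
    min_samples_split=10,   # samples needed before a node may split
    min_samples_leaf=5,     # minimum samples required at a leaf node
    ccp_alpha=0.0,          # cost-complexity pruning strength (0 = no pruning)
    random_state=0,         # reproducible handling of ties between splits
)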
Decision tree regressor parameters:
https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeRegressor.html
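The regressor shares most of the classifier's parameters; the main difference is the
split criterion. The values below are again illustrative only:

from sklearn.tree import DecisionTreeRegressor

reg = DecisionTreeRegressor(
    criterion="squared_error",  # also "friedman_mse", "absolute_error", "poisson"
    max_depth=5,
    min_samples_leaf=5,
    random_state=0,
)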