ML For ME S17 Decision Trees
Decision Tree Algorithm
➢ Supervised learning algorithm
➢ Both for classification and regression problems
➢ Tree-like structure
➢ Each internal node tests an attribute
➢ Each branch corresponds to an attribute value
➢ Each leaf node represents the final decision or prediction
Decision Tree - example
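A minimal sketch of such an example in code (the Iris dataset, the entropy criterion, and the depth limit below are illustrative assumptions, not taken from the slides) fits a small classifier with scikit-learn and prints the learned tree:

# Sketch only: toy data and parameters are assumptions, not from the slides.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
X, y = data.data, data.target

# Each internal node tests an attribute, each branch is one outcome of that
# test, and each leaf holds the final prediction.
clf = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
clf.fit(X, y)
print(export_text(clf, feature_names=list(data.feature_names)))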
Terminologies
➢ Root Node: The topmost node of the tree; it represents the first
decision or feature from which all branches originate.
➢ Internal Nodes (Decision Nodes): Nodes that test the value of a
particular attribute; each has branches leading to further nodes.
➢ Leaf Nodes (Terminal Nodes): The end points of branches, where the
final decision or prediction is made; leaf nodes have no further
branches.
➢ Branches (Edges): Links between nodes that represent the possible
outcomes of the decision made at a node.
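As a rough illustration of these terms (the class name, fields, and the toy "outlook" example are assumptions for illustration, not from the slides), a tree node could be represented like this:

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    attribute: Optional[str] = None        # tested at internal (decision) nodes
    children: List["Node"] = field(default_factory=list)  # branches (edges) to child nodes
    prediction: Optional[str] = None       # set only at leaf (terminal) nodes

    def is_leaf(self) -> bool:
        return not self.children

# Root node: the topmost node, from which all branches originate.
root = Node(attribute="outlook",
            children=[Node(prediction="play"), Node(prediction="don't play")])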
Terminologies
➢ Splitting: The process of dividing a node into two or more sub-
nodes based on a decision criterion
➢ Parent Node: The original node from which a split originates
➢ Child Node: Nodes created as a result of a split from a parent
node
➢ Decision Criterion: The rule or condition used to determine
how the data should be split at a decision node
➢ Pruning: The process of removing branches or nodes from a
decision tree to improve its generalization and prevent
overfitting
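As a hedged sketch of pruning in practice (the dataset and the ccp_alpha value are illustrative assumptions, not from the slides), scikit-learn exposes cost-complexity pruning through the ccp_alpha parameter:

# Sketch only: data and ccp_alpha are illustrative assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.01).fit(X_train, y_train)

# A pruned tree has fewer nodes and often generalizes better (less overfitting).
print("nodes:", full.tree_.node_count, "->", pruned.tree_.node_count)
print("test accuracy:", full.score(X_test, y_test), "->", pruned.score(X_test, y_test))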
Why Decision Trees?
➢ Versatile: can model intricate decision-making processes
➢ Handle complex choice scenarios thanks to their hierarchical structure
➢ Provide comprehensible insights into the decision logic
➢ Work with both numerical and categorical data
➢ Adapt easily to a variety of datasets through built-in (autonomous)
feature selection
➢ Are simple to visualize, exposing the decision process underlying
the model
How is a Decision Tree formed?
➢ Recursively partitioning the data based on the values
of different attributes
➢ Attribute selection measure (ASM): the algorithm selects
the best attribute on which to split the data at each internal
node, based on criteria such as information gain (see the
sketch after this list)
➢ Splitting process continues until a stopping criterion
is met
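A minimal sketch of this recursive splitting loop (the helper names, the dict-based data layout, and the simple stopping rule are assumptions for illustration; production algorithms such as ID3 or CART add more machinery):

# Sketch only: recursive partitioning with information gain as the ASM.
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attribute):
    n, weighted = len(labels), 0.0
    for value in set(row[attribute] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attribute] == value]
        weighted += (len(subset) / n) * entropy(subset)
    return entropy(labels) - weighted

def build_tree(rows, labels, attributes):
    # Stopping criterion: the node is pure or no attributes remain.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]   # leaf: majority label
    best = max(attributes, key=lambda a: information_gain(rows, labels, a))
    tree = {best: {}}
    for value in set(row[best] for row in rows):
        keep = [i for i, row in enumerate(rows) if row[best] == value]
        tree[best][value] = build_tree([rows[i] for i in keep],
                                       [labels[i] for i in keep],
                                       [a for a in attributes if a != best])
    return tree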
How a Decision Tree measures split quality
Information Gain(S, A) = Entropy(S) - Σ_v (|S_v| / |S|) * Entropy(S_v)
i.e. the entropy of the parent set S minus the weighted average entropy of the subsets S_v produced by splitting on attribute A, where Entropy(S) = - Σ_i p_i * log2(p_i)
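As a hedged numeric check of this formula (the class counts below follow the classic "play tennis" textbook example and are an assumption, not data from the slides):

# S has 9 positive / 5 negative examples; splitting on "wind" gives
# weak = (6+, 2-) and strong = (3+, 3-).
import math

def entropy(pos, neg):
    total = pos + neg
    return -sum(p * math.log2(p) for p in (pos / total, neg / total) if p > 0)

parent = entropy(9, 5)                                          # ~0.940
weighted = (8 / 14) * entropy(6, 2) + (6 / 14) * entropy(3, 3)  # ~0.892
print("information gain:", round(parent - weighted, 3))         # ~0.048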