ML For ME S17 Decision Trees

The Decision Tree Classifier is a supervised learning algorithm used for both classification and regression, characterized by a tree-like structure with nodes representing decisions based on attributes. Key terminologies include root nodes, internal nodes, leaf nodes, and the processes of splitting and pruning to enhance model performance. While decision trees are easy to interpret and handle various data types, they can be prone to overfitting and sensitive to data changes.

Decision Tree Classifier Algorithm

Decision Tree
➢ Supervised learning algorithm
➢ Used for both classification and regression problems
➢ Tree-like structure
➢ Each internal node tests an attribute
➢ Each branch corresponds to an attribute value
➢ Each leaf node represents the final decision or prediction (a short example follows below)
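
To make this structure concrete, here is a minimal sketch assuming scikit-learn and its bundled iris dataset; the parameter choices (criterion, max_depth, test split) are illustrative, not prescribed by these slides.

```python
# A minimal sketch, assuming scikit-learn and its bundled iris dataset;
# parameter choices here are illustrative only.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.25, random_state=42)

# criterion="entropy" selects splits by information gain ("gini" is the default)
clf = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=42)
clf.fit(X_train, y_train)

print("Test accuracy:", clf.score(X_test, y_test))
# Text rendering of the tree: each internal node tests an attribute,
# each branch is an attribute-value outcome, each leaf is a prediction
print(export_text(clf, feature_names=data.feature_names))
```
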
Decision Tree - example: (illustrative tree figure from the original slides, not reproduced in this text version)
Terminologies
➢ Root Node: The topmost node of a decision tree, representing the first decision or feature from which the tree branches.
➢ Internal Nodes (Decision Nodes): Nodes that test the value of a particular attribute; each has branches leading on to further nodes.
➢ Leaf Nodes (Terminal Nodes): The ends of the branches, where the final decisions or predictions are made; leaf nodes have no further branches.
➢ Branches (Edges): Links between nodes that show which decision is taken under which condition.
Terminologies
➢ Splitting: The process of dividing a node into two or more sub-nodes based on a decision criterion
➢ Parent Node: The original node from which a split originates
➢ Child Node: A node created as a result of a split from a parent node
➢ Decision Criterion: The rule or condition used to determine how the data should be split at a decision node
➢ Pruning: The process of removing branches or nodes from a decision tree to improve its generalization and prevent overfitting (a sketch follows below)
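
As a concrete illustration of pruning, the sketch below uses scikit-learn's cost-complexity pruning via the ccp_alpha parameter; the alpha value and the dataset are illustrative assumptions, not tuned choices.

```python
# A sketch of post-pruning with scikit-learn's cost-complexity pruning;
# ccp_alpha=0.01 and the dataset are illustrative assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An unpruned tree grows until its leaves are (nearly) pure, which
# tends to overfit the training data
full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
# A larger ccp_alpha removes more branches, trading training accuracy
# for better generalization
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.01).fit(X_train, y_train)

print(f"full:   train={full.score(X_train, y_train):.3f}  test={full.score(X_test, y_test):.3f}")
print(f"pruned: train={pruned.score(X_train, y_train):.3f}  test={pruned.score(X_test, y_test):.3f}")
```
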
Why Decision Trees?
➢ Versatile: can model intricate decision-making processes
➢ Handle complex choice scenarios thanks to their hierarchical structure
➢ Provide comprehensible insight into the decision logic
➢ Work well with both numerical and categorical data
➢ Adapt easily to a variety of datasets through built-in feature selection
➢ Offer simple visualization of the decision process underlying a model
How is a Decision Tree formed?
➢ By recursively partitioning the data based on the values of different attributes
➢ Attribute selection measure (ASM): at each internal node, the algorithm selects the best attribute to split the data on, based on a criterion such as information gain
➢ The splitting process continues until a stopping criterion is met (a from-scratch sketch follows below)
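
Below is a from-scratch sketch of this recursive procedure, using information gain as the ASM. The data format (rows of categorical values plus a parallel label list) and every function name are assumptions made for illustration, not any particular library's API.

```python
# A from-scratch sketch of recursive partitioning with information gain
# as the attribute selection measure; all names and the data format are
# illustrative assumptions.
import math
from collections import Counter

def entropy(labels):
    # Entropy(S) = -sum_j p_j * log2(p_j)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(rows, labels):
    # Try every (feature, value) equality test; keep the highest gain
    base, best = entropy(labels), None
    for f in range(len(rows[0])):
        for v in set(r[f] for r in rows):
            yes = [l for r, l in zip(rows, labels) if r[f] == v]
            no = [l for r, l in zip(rows, labels) if r[f] != v]
            if not yes or not no:
                continue
            # Gain = parent entropy - weighted average of child entropies
            gain = base - (len(yes) * entropy(yes) + len(no) * entropy(no)) / len(labels)
            if best is None or gain > best[0]:
                best = (gain, f, v)
    return best

def build_tree(rows, labels):
    # Stopping criteria: the node is pure, or no split improves on it
    if len(set(labels)) == 1:
        return labels[0]
    split = best_split(rows, labels)
    if split is None or split[0] <= 0:
        return Counter(labels).most_common(1)[0][0]
    _, f, v = split
    yes = [(r, l) for r, l in zip(rows, labels) if r[f] == v]
    no = [(r, l) for r, l in zip(rows, labels) if r[f] != v]
    return {"test": (f, v),
            "yes": build_tree([r for r, _ in yes], [l for _, l in yes]),
            "no": build_tree([r for r, _ in no], [l for _, l in no])}

# Tiny toy run: each row is (outlook, windy)
rows = [("sunny", "no"), ("sunny", "yes"), ("rain", "no"), ("rain", "yes")]
labels = ["yes", "no", "yes", "no"]
print(build_tree(rows, labels))  # splits on the "windy" feature
```
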
How a Decision Tree measures splits
Information Gain = Entropy(S) - [weighted average * Entropy(each subset after the split)]

For a binary (yes/no) target:
Entropy(S) = -P(yes) log2 P(yes) - P(no) log2 P(no)

In general, over classes j:
Entropy(S) = -∑j Pj log2 Pj

Note: the logarithm is base 2 because entropy is a measure of the expected encoding length, measured in bits.

Gini Index = 1 - ∑j Pj²
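
As a quick numeric sanity check of these formulas, the snippet below computes the entropy and Gini index for a set with 9 positive and 5 negative examples (the counts are chosen purely for illustration):

```python
# A small numeric check of the formulas above, for an assumed set of
# 9 "yes" and 5 "no" examples (counts are illustrative only).
import math

def entropy(probs):
    # Entropy(S) = -sum_j p_j * log2(p_j); base-2 log => value in bits
    return -sum(p * math.log2(p) for p in probs if p > 0)

def gini(probs):
    # Gini Index = 1 - sum_j p_j^2
    return 1 - sum(p * p for p in probs)

p = [9 / 14, 5 / 14]
print("Entropy:", round(entropy(p), 4))  # ~0.9403 bits
print("Gini:   ", round(gini(p), 4))     # ~0.4592
```
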
Advantages
• Easy to understand and interpret, making them accessible to non-experts
• Handle both numerical and categorical data without requiring extensive preprocessing
• Provide insight into feature importance for decision-making (illustrated below)
• Fairly robust to errors; many implementations handle missing values and outliers without significant impact
• Applicable to both classification and regression tasks
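
The feature-importance point above can be illustrated with scikit-learn's feature_importances_ attribute; the dataset choice here is an assumption for demonstration.

```python
# Illustrating feature importance with scikit-learn's
# feature_importances_ attribute; the iris dataset is an
# illustrative choice.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

data = load_iris()
clf = DecisionTreeClassifier(random_state=0).fit(data.data, data.target)

# Each score is the feature's total (normalized) impurity reduction
# across all the splits that use it
for name, score in zip(data.feature_names, clf.feature_importances_):
    print(f"{name}: {score:.3f}")
```
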
Disadvantages
• Potential for overfitting
• Sensitivity to small changes in the data; limited generalization if the training data is not representative
• Potential bias in the presence of imbalanced data