
Decision Tree

By
Dr Ravi Prakash Verma
Professor
Department of CSAI
ABESIT
Decision Tree
1. Introduction to Decision Trees
• A Decision Tree is a supervised machine learning algorithm used for
both classification and regression tasks.
• It is structured like a tree, where:
• Each internal node represents a decision on an attribute.
• Each branch represents an outcome of the decision.
• Each leaf node represents a class label (for classification) or a numerical value
(for regression).
• Decision trees are easy to interpret and widely used in AI, finance,
medical diagnosis, and business analytics.
Decision Tree
2. Structure of a Decision Tree
• A decision tree consists of:
• Root Node: The starting point of the tree where the first split occurs.
• Internal Nodes: These nodes represent decisions based on a feature.
• Branches: These represent possible outcomes from an internal node.
• Leaf Nodes: The final classification/output.
• Example of a simple decision tree for a Loan Approval System:
                Credit Score > 700?
                 /             \
               Yes              No
                |                |
             Approve      Income > 50K?
                            /       \
                          Yes        No
                           |          |
                        Approve    Reject
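• A minimal code sketch of this structure (a Python illustration; the class
and field names are ours, not from the slides). Internal nodes hold a yes/no
question, leaf nodes hold the final decision, and classifying a case means
walking from the root to a leaf:

class Node:
    def __init__(self, question=None, yes_branch=None, no_branch=None, label=None):
        self.question = question      # test at an internal node, e.g. "Credit Score > 700?"
        self.yes_branch = yes_branch  # subtree followed when the answer is Yes
        self.no_branch = no_branch    # subtree followed when the answer is No
        self.label = label            # final decision at a leaf, e.g. "Approve"

    def is_leaf(self):
        return self.label is not None

# The loan-approval tree from the slide, built by hand:
loan_tree = Node(
    question="Credit Score > 700?",
    yes_branch=Node(label="Approve"),
    no_branch=Node(
        question="Income > 50K?",
        yes_branch=Node(label="Approve"),
        no_branch=Node(label="Reject"),
    ),
)

def decide(node, answers):
    # Walk from the root to a leaf using a dict of yes/no answers.
    while not node.is_leaf():
        node = node.yes_branch if answers[node.question] else node.no_branch
    return node.label

print(decide(loan_tree, {"Credit Score > 700?": False, "Income > 50K?": True}))  # Approve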
Decision Tree
3. Decision Tree Learning Process
• A decision tree is built using a recursive splitting approach, which
selects the best feature to split the data at each step.
Step 1: Selecting the Best Split (Feature Selection)
• The most important step in building a decision tree is selecting the best feature
to split the data.
• The goal is to choose the feature that maximizes information gain or reduces
impurity the most.
• There are three main criteria used:
1. Gini Impurity
2. Entropy (Information Gain)
3. Variance Reduction (for regression)
Decision Tree
4. Splitting Criteria (Formulas)
• Gini Impurity: Gini(S) = 1 − Σ pᵢ², where pᵢ is the fraction of samples in
node S that belong to class i. Lower is purer; 0 means a pure node.
• Entropy: Entropy(S) = −Σ pᵢ log₂(pᵢ). It is 0 for a pure node and 1 for a
50/50 binary split.
• Information Gain: IG(S, A) = Entropy(S) − Σ (|Sᵥ|/|S|) · Entropy(Sᵥ), the
drop in entropy when S is split on attribute A into subsets Sᵥ.
• Variance Reduction (regression): pick the split that most reduces the
variance of the target values in the child nodes.
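• As a quick sanity check, the classification criteria fit in a few lines of
Python (standard formulas; the helper names are ours). The example at the
bottom uses the Play Tennis labels from the next section:

from collections import Counter
from math import log2

def gini(labels):
    # Gini impurity: 1 - sum of squared class proportions.
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    # Entropy: -sum of p_i * log2(p_i) over the classes present.
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(parent_labels, child_label_groups):
    # Entropy of the parent minus the weighted entropy of the children.
    n = len(parent_labels)
    weighted = sum(len(g) / n * entropy(g) for g in child_label_groups)
    return entropy(parent_labels) - weighted

# Splitting the 5-row Play Tennis data on Outlook (Sunny/Overcast/Rain):
parent = ["No", "No", "Yes", "Yes", "Yes"]
children = [["No", "No"], ["Yes"], ["Yes", "Yes"]]
print(information_gain(parent, children))  # ~0.971: every child is pure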
5. Decision Tree Construction Example
• Let’s construct a decision tree using Entropy & Information Gain.
• Dataset (Weather & Play Tennis Decision)

Outlook    Temperature  Humidity  Wind    Play Tennis
Sunny      Hot          High      Weak    No
Sunny      Hot          High      Strong  No
Overcast   Hot          High      Weak    Yes
Rain       Mild         High      Weak    Yes
Rain       Cool         Normal    Weak    Yes
Decision Tree
6. Worked Calculation (Entropy & Information Gain)
• The full set S has 3 Yes and 2 No, so
Entropy(S) = −(3/5)log₂(3/5) − (2/5)log₂(2/5) ≈ 0.971.
• Splitting on Outlook gives pure subsets: Sunny → {No, No},
Overcast → {Yes}, Rain → {Yes, Yes}. Each has entropy 0, so
IG(Outlook) = 0.971 − 0 = 0.971.
• For the other attributes: IG(Temperature) ≈ 0.420, IG(Wind) ≈ 0.322,
IG(Humidity) ≈ 0.171.
• Outlook has the highest information gain, so it becomes the root node.
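• The example can also be run with scikit-learn (a sketch assuming
scikit-learn and pandas are installed; the one-hot encoding step is our
choice, since scikit-learn trees need numeric features, so its splits come
out binary rather than multiway):

import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

data = pd.DataFrame({
    "Outlook":     ["Sunny", "Sunny", "Overcast", "Rain", "Rain"],
    "Temperature": ["Hot", "Hot", "Hot", "Mild", "Cool"],
    "Humidity":    ["High", "High", "High", "High", "Normal"],
    "Wind":        ["Weak", "Strong", "Weak", "Weak", "Weak"],
    "PlayTennis":  ["No", "No", "Yes", "Yes", "Yes"],
})

# One-hot encode the categorical features, then fit with the entropy criterion.
X = pd.get_dummies(data.drop(columns="PlayTennis"))
y = data["PlayTennis"]
clf = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)

# Print the learned rules as text.
print(export_text(clf, feature_names=list(X.columns)))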
7. Advantages and Disadvantages of Decision Trees
• Advantages
• Easy to interpret
• Handles both numerical & categorical data
• Requires little data preprocessing
• Can model complex interactions between features
• Disadvantages
• Prone to overfitting (high variance)
• Biased toward dominant classes when the training data is imbalanced
• Unstable (small changes in the training data can produce a very different tree)
• Greedy splitting may not find the globally optimal tree
Decision Tree
8. Decision Tree Pruning
• Pruning helps prevent overfitting by reducing tree size.
• Pre-Pruning: Stops tree growth early (e.g., setting max depth).
• Post-Pruning: Removes unnecessary branches after tree construction.
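• Both styles map directly onto scikit-learn parameters; a sketch (the
dataset and parameter values are illustrative):

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Pre-pruning: stop growth early with limits such as max_depth.
pre = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5, random_state=0)
pre.fit(X_tr, y_tr)

# Post-pruning: grow fully, then cut branches back via cost-complexity
# pruning (larger ccp_alpha prunes more aggressively).
post = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0)
post.fit(X_tr, y_tr)

print("pre-pruned accuracy: ", pre.score(X_te, y_te))
print("post-pruned accuracy:", post.score(X_te, y_te))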
9. Random Forest: An Extension of Decision Trees
• A Random Forest is a collection of multiple decision trees that vote for the final
outcome.
• It reduces overfitting.
• It is typically more accurate than a single decision tree.
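• A quick comparison of the two under 5-fold cross-validation (a sketch;
the dataset and settings are illustrative):

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

tree = DecisionTreeClassifier(random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0)  # 100 trees vote

print("single tree:", cross_val_score(tree, X, y, cv=5).mean())
print("forest:     ", cross_val_score(forest, X, y, cv=5).mean())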
Decision Tree
11. Summary
• Decision trees are a powerful yet interpretable machine learning model.
• They use Gini impurity, entropy, and information gain to determine the best
splits.
• Overfitting can be reduced using pruning and ensemble methods (Random
Forest).
• They are widely used in classification, regression, finance, and medical
diagnosis.
