
DECISION TREE ALGORITHM

DECISION TREE

Decision trees are a type of supervised
learning in which the data is repeatedly split
according to certain parameters.

The tree has two kinds of nodes: decision
nodes and leaf nodes.

Decision nodes are where the data is split,
and leaves hold the final decisions or
outcomes.

Decision trees are used for both classification
and regression.

Two key measures guide the splits:

1. Entropy
2. Information gain

Entropy is a measure of impurity or uncertainty
in a dataset, and it is a key quantity in decision
trees: it tells the algorithm how mixed the data
at a node is, so splits can be chosen to produce
purer subsets.

Formula: Entropy(S) = -Σ_i p_i log2(p_i),
where p_i is the proportion of examples in S
that belong to class i.
Information gain measures how much splitting
on a feature reduces entropy; it is used to decide
which feature to split on at each internal node of
the decision tree.

Formula: Gain(S, A) = Entropy(S) - Σ_v (|S_v| / |S|) Entropy(S_v),
where S_v is the subset of S in which attribute A
takes the value v.
EXAMPLE
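As a worked example, here is a minimal Python sketch (not from the original slides; the toy "outlook" data and labels are hypothetical) that evaluates the two formulas above:

```python
# Minimal sketch: entropy and information gain on hypothetical toy data.
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy(S) = -sum(p_i * log2(p_i)) over the classes in `labels`."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Gain(S, A) = Entropy(S) - sum(|S_v|/|S| * Entropy(S_v)) over values v."""
    n = len(labels)
    gain = entropy(labels)
    for value in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        gain -= (len(subset) / n) * entropy(subset)
    return gain

# Hypothetical "play tennis"-style data: one attribute, binary labels.
rows = [{"outlook": "sunny"}, {"outlook": "sunny"},
        {"outlook": "rain"}, {"outlook": "rain"}]
labels = ["no", "no", "yes", "yes"]
print(entropy(labels))                            # 1.0 (perfectly mixed)
print(information_gain(rows, labels, "outlook"))  # 1.0 (a perfect split)
```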
Gini index:

The Gini index is a statistical measure of
inequality or impurity in a dataset.
Purpose: measures how "pure" or "impure" a
dataset is. Pure data means all elements belong
to one category; impure data means they are
spread across multiple categories.
Range: values run from 0 toward 1:
0: perfectly pure (all elements belong to one
class).
Near 1: maximum impurity (elements evenly
distributed across all classes; for k classes the
maximum is 1 - 1/k, e.g. 0.5 for two classes).

Formula: Gini(S) = 1 - Σ_i p_i²,
where p_i is the proportion of examples in S
that belong to class i.

A higher Gini index indicates more impurity;
a lower one indicates more purity.
Example:
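A minimal sketch (toy labels are hypothetical, not from the slides) that evaluates the formula above:

```python
# Minimal sketch: Gini(S) = 1 - sum(p_i^2) over the classes in `labels`.
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

print(gini(["A"] * 6))              # 0.0  -> perfectly pure
print(gini(["A"] * 4 + ["B"] * 2))  # 1 - (4/6)^2 - (2/6)^2 ≈ 0.444
print(gini(["A", "B"]))             # 0.5  -> maximum impurity for 2 classes
```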
ALGORITHMS:

1. CART (Classification and Regression Trees)
Use Case: both classification and regression.
Split Criterion:
 Gini index for classification.
 Mean squared error (MSE) for regression.
Output: binary tree (each node splits into
exactly two branches).
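For context, scikit-learn's decision trees are a CART-style implementation (binary splits, Gini or squared error). A minimal sketch, assuming scikit-learn is installed:

```python
# Minimal sketch of CART via scikit-learn (binary splits).
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X, y = load_iris(return_X_y=True)

# Classification with the Gini index criterion.
clf = DecisionTreeClassifier(criterion="gini", max_depth=3).fit(X, y)
print(clf.predict(X[:2]))

# Regression with the MSE criterion (named "squared_error" in
# recent scikit-learn versions).
reg = DecisionTreeRegressor(criterion="squared_error", max_depth=3).fit(X, y)
print(reg.predict(X[:2]))
```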
2. ID3 (Iterative Dichotomiser 3)
ID3 is one of the earliest decision tree
algorithms, developed by Ross Quinlan, and
is used for classification tasks.
Use Case: classification tasks.
Split Criterion: information gain (computed
from entropy).
Limitations:
 Does not handle continuous data directly.
 Prone to overfitting.
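ID3's core step, picking the attribute with the highest information gain and recursing, can be sketched as follows (toy data and attribute names are hypothetical, not from the slides):

```python
# Toy sketch of ID3's attribute selection by information gain.
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_attribute(rows, labels, attrs):
    def gain(attr):
        g = entropy(labels)
        for v in set(r[attr] for r in rows):
            sub = [lab for r, lab in zip(rows, labels) if r[attr] == v]
            g -= len(sub) / len(labels) * entropy(sub)
        return g
    # ID3 splits on this attribute, then recurses on each subset.
    return max(attrs, key=gain)

rows = [{"outlook": "sunny", "windy": "yes"},
        {"outlook": "sunny", "windy": "no"},
        {"outlook": "rain",  "windy": "yes"},
        {"outlook": "rain",  "windy": "no"}]
labels = ["no", "no", "yes", "yes"]
print(best_attribute(rows, labels, ["outlook", "windy"]))  # "outlook"
```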
3. C4.5
C4.5 is an advanced decision tree algorithm
developed by Ross Quinlan as an
improvement over ID3.
Use Case: classification tasks.
Split Criterion: gain ratio (information gain
normalized by split information).
Features:
 Handles continuous and missing data.
 Prunes trees to avoid overfitting.
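A hedged sketch of C4.5's gain ratio, which divides information gain by the attribute's split information to penalize attributes with many distinct values (toy data, hypothetical names):

```python
# Toy sketch: GainRatio(S, A) = Gain(S, A) / SplitInfo(A).
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain_ratio(rows, labels, attr):
    n = len(labels)
    values = [r[attr] for r in rows]
    gain = entropy(labels)
    for v in set(values):
        sub = [lab for r, lab in zip(rows, labels) if r[attr] == v]
        gain -= len(sub) / n * entropy(sub)
    # SplitInfo(A) = -sum(|S_v|/|S| * log2(|S_v|/|S|)), i.e. the entropy
    # of the attribute's value distribution.
    split_info = entropy(values)
    return gain / split_info if split_info else 0.0

rows = [{"outlook": "sunny"}, {"outlook": "sunny"},
        {"outlook": "rain"}, {"outlook": "rain"}]
labels = ["no", "no", "yes", "yes"]
print(gain_ratio(rows, labels, "outlook"))  # 1.0 / 1.0 = 1.0
```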
4. CHAID (Chi-squared Automatic
Interaction Detection)
Use Case: primarily categorical data.
Split Criterion: chi-square test of
independence.
Features:
 Produces multi-way splits (not restricted to
binary splits).
 Often used in marketing and survey data analysis.
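CHAID's split test can be sketched with SciPy's chi-square test of independence; the contingency counts below are hypothetical survey-style data:

```python
# Sketch of CHAID's split criterion: chi-square test of independence
# between a candidate attribute and the class label.
from scipy.stats import chi2_contingency

# Rows: attribute categories; columns: class counts (e.g. buy / no-buy).
table = [[30, 10],   # category A
         [12, 28],   # category B
         [20, 20]]   # category C
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2={chi2:.2f}, p={p:.4f}")  # a small p-value favors splitting here
```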
Evaluating the decision tree:

 Split Data
Divide the dataset into training and test sets to evaluate
performance on unseen data.
 Make Predictions
Use the decision tree to predict outcomes for the test set.
 Compare Predictions
Compare the tree’s predictions with the actual outcomes
from the test set.
 Calculate Metrics
Measure performance using metrics like accuracy, precision,
recall, or F1 score.
 Analyze and Improve
Check for overfitting, adjust parameters (such as
tree depth), and retrain if needed.
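These steps can be run end-to-end with scikit-learn; a minimal sketch, not prescribed by the slides:

```python
# Minimal sketch of the evaluation workflow above.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, f1_score

X, y = load_iris(return_X_y=True)

# 1. Split the data into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# 2. Fit the tree and predict outcomes for the unseen test set.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
y_pred = tree.predict(X_test)

# 3-4. Compare predictions with actual outcomes and compute metrics.
print("accuracy:", accuracy_score(y_test, y_pred))
print("f1 (macro):", f1_score(y_test, y_pred, average="macro"))

# 5. To check for overfitting, compare training vs. test accuracy
# and tune parameters such as max_depth.
print("train accuracy:", tree.score(X_train, y_train))
```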
THANK YOU
