Machine Learning Chapter 4

Decision trees are a flowchart-like structure used for decision-making and predictions in various fields, consisting of nodes for decisions, branches for outcomes, and leaf nodes for final predictions. The creation process involves selecting the best attribute based on metrics like Gini impurity and entropy, and recursively splitting the dataset until a stopping criterion is met. While they offer advantages such as simplicity and versatility, decision trees can suffer from overfitting and instability.


Decision Tree

Definition

 Decision trees are a popular and powerful tool used in various fields
such as machine learning, data mining, and statistics. They provide a
clear and intuitive way to make decisions based on data by modeling
the relationships between different variables.
What is a Decision Tree?

 A decision tree is a flowchart-like structure used to make decisions or
predictions. It consists of nodes representing decisions or tests on
attributes, branches representing the outcome of these decisions, and
leaf nodes representing final outcomes or predictions. Each internal
node corresponds to a test on an attribute, each branch corresponds to
the result of the test, and each leaf node corresponds to a class label or
a continuous value.
Structure of a Decision Tree

 Root Node: Represents the entire dataset and the initial decision to be
made.
 Internal Nodes: Represent decisions or tests on attributes. Each
internal node has two or more branches.
 Branches: Represent the outcome of a decision or test, leading to
another node.
 Leaf Nodes: Represent the final decision or prediction. No further splits
occur at these nodes. (A small example tree is sketched below.)
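For illustration, a small hypothetical tree for a "play outside?" decision (the classic play-tennis example, not taken from this chapter) might look like:

Outlook? (root node)
├─ Sunny → Humidity? (internal node)
│   ├─ High → No (leaf node)
│   └─ Normal → Yes (leaf node)
├─ Overcast → Yes (leaf node)
└─ Rainy → Wind? (internal node)
    ├─ Strong → No (leaf node)
    └─ Weak → Yes (leaf node)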
How Decision Trees Work

 The process of creating a decision tree involves:

 Selecting the Best Attribute: Using a metric like Gini impurity,
entropy, or information gain, the best attribute to split the data is
selected.
 Splitting the Dataset: The dataset is split into subsets based on the
selected attribute.
 Repeating the Process: The process is repeated recursively for each
subset, creating a new internal node or leaf node until a stopping
criterion is met (e.g., all instances in a node belong to the same class or
a predefined depth is reached). A minimal sketch of this procedure is
given below.
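As a rough illustration of these three steps, here is a minimal sketch in plain Python. The Node class and the helper names gini and best_split are hypothetical, written for this example rather than taken from any library:

from collections import Counter

class Node:
    # Either an internal test (feature, threshold) or a leaf (label).
    def __init__(self, feature=None, threshold=None, left=None, right=None, label=None):
        self.feature, self.threshold = feature, threshold
        self.left, self.right, self.label = left, right, label

def gini(labels):
    # Gini impurity: 1 minus the sum of squared class proportions.
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    # Step 1: select the best attribute/threshold by trying every pair
    # and keeping the split with the lowest weighted Gini impurity.
    best, best_score = None, float("inf")
    for f in range(len(X[0])):
        for t in {row[f] for row in X}:
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if score < best_score:
                best_score, best = score, (f, t, left, right)
    return best

def build_tree(X, y, depth=0, max_depth=3):
    # Step 3: recurse until a stopping criterion is met
    # (all instances share one class, or a predefined depth is reached).
    if len(set(y)) == 1 or depth >= max_depth:
        return Node(label=Counter(y).most_common(1)[0][0])
    split = best_split(X, y)
    if split is None:  # no split separates the data any further
        return Node(label=Counter(y).most_common(1)[0][0])
    f, t, left_idx, right_idx = split
    # Step 2: split the dataset into subsets on the selected attribute.
    return Node(
        feature=f, threshold=t,
        left=build_tree([X[i] for i in left_idx], [y[i] for i in left_idx], depth + 1, max_depth),
        right=build_tree([X[i] for i in right_idx], [y[i] for i in right_idx], depth + 1, max_depth),
    )

Calling build_tree(X, y) on a list of feature rows X and a label list y returns the root Node; a prediction then follows the tests from the root down to a leaf.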
Metrics for Splitting

 Gini Impurity: Measures the likelihood of an incorrect classification of a
new instance if it were randomly classified according to the distribution of
classes in the dataset.
 Entropy: Measures the amount of uncertainty or impurity in the
dataset.
 Information Gain: Measures the reduction in entropy or Gini impurity
after a dataset is split on an attribute.
 In a decision tree, the major challenge is identifying the attribute for the
root node at each level. This process is known as attribute selection. There are
two popular attribute selection measures:
 Information Gain: When we use a node in a decision tree to partition the
training instances into smaller subsets, the entropy changes. Information gain
is a measure of this change in entropy.
 Gini Index: A metric that measures how often a randomly chosen element
would be incorrectly identified. An attribute with a lower Gini index should
therefore be preferred.
 Entropy: The measure of uncertainty of a random variable; it characterizes
the impurity of an arbitrary collection of examples. The higher the entropy,
the higher the information content. The formulas and a small worked example
are given below.
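For concreteness, the standard formulas are: Gini = 1 − Σᵢ pᵢ², Entropy H = −Σᵢ pᵢ log₂ pᵢ, and Information Gain = H(parent) − Σₖ (nₖ/n)·H(childₖ), where pᵢ is the proportion of class i in a node and nₖ is the size of child subset k. A small self-contained sketch of how these would be computed in plain Python (the function names are our own, chosen for this example):

from collections import Counter
from math import log2

def entropy(labels):
    # H = -sum(p_i * log2(p_i)) over the class proportions p_i.
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    # Reduction in entropy after splitting `parent` into `children`.
    n = len(parent)
    return entropy(parent) - sum(len(ch) / n * entropy(ch) for ch in children)

# Example: splitting a perfectly mixed node into two purer subsets.
parent = ["yes"] * 5 + ["no"] * 5            # H = 1.0 bit
children = [["yes"] * 4 + ["no"],            # mostly "yes"
            ["no"] * 4 + ["yes"]]            # mostly "no"
print(round(entropy(parent), 3))                     # 1.0
print(round(information_gain(parent, children), 3))  # 0.278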
Advantages of Decision Trees

 Simplicity and Interpretability: Decision trees are easy to
understand and interpret. The visual representation closely mirrors
human decision-making processes.
 Versatility: Can be used for both classification and regression tasks.
 No Need for Feature Scaling: Decision trees do not require
normalization or scaling of the data (see the example after this list).
 Handles Non-linear Relationships: Capable of capturing non-linear
relationships between features and target variables.
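As a brief illustration of these points, a tree can be fit directly on raw, unscaled features. The sketch below assumes scikit-learn is installed; for a regression task, DecisionTreeRegressor would be used the same way:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
# Note: no normalization or scaling is applied to X before fitting.
clf = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
clf.fit(X, y)
print(clf.predict(X[:3]))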
Disadvantages of Decision Trees

 Overfitting: Decision trees can easily overfit the training data,
especially if they are deep with many nodes (see the sketch after this list).
 Instability: Small variations in the data can result in a completely
different tree being generated.
 Bias towards Features with More Levels: Features with more levels
can dominate the tree structure.
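As a small illustration of the overfitting point, the sketch below (again assuming scikit-learn) compares an unconstrained tree with a depth-limited one; the exact scores depend on the dataset and the split:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# An unconstrained tree grows until its leaves are pure ...
deep = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
# ... while max_depth acts as a simple form of pre-pruning.
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)

print(deep.score(X_tr, y_tr), deep.score(X_te, y_te))
print(shallow.score(X_tr, y_tr), shallow.score(X_te, y_te))

The deep tree typically fits the training set perfectly; on noisier data than this toy example, its test score usually trails the depth-limited tree by a wider margin.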
