Data II - Decision Trees and Rules
Split
Imagine you're a marketing manager trying to target ads for a new fitness tracker. You
have data on 100 past customers, including their age, income, and whether they bought
the tracker (1) or not (0). Your goal is to build a decision tree that predicts whether a
future customer will purchase the tracker based on these attributes.
1. Calculate the entropy of the parent node.
➔ Think of the parent node as the entire customer pool (100 people).
➔ We know 40 bought the tracker (class 1) and 60 didn't (class 0).
➔ Calculate the entropy using the formula:
Entropy(S) = - (0.4 * log2(0.4) + 0.6 * log2(0.6)) = 0.971
This value (0.971) tells us how mixed up the classes are in the parent node. Closer
to 1 means more uncertainty, with classes evenly distributed.
2. For each potential split feature, calculate the entropy of each child
node. A child node is a group of data points that result from splitting the
parent node on a particular feature value.
For an age split, the resulting child-node entropies indicate that the "young" group is
less certain (more mixed classes), while the "old" group is more certain (mostly didn't buy).
3. Calculate the information gain for each potential split. Information gain
is the difference between the entropy of the parent node and the
weighted average of the entropy of the child nodes. The split with the
highest information gain is the best split, because it results in the most
homogeneous child nodes.
➔ Recall that information gain measures how much a specific feature (age
in this case) helps make the data less mixed.
➔ Use the formula:
Gain(S, A) = Entropy(S) - Σ_v (|S_v| / |S|) * Entropy(S_v)
where S is the parent set, A is the candidate split feature, and S_v is the child subset of S for which A takes value v.
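To make these steps concrete, here is a minimal Python sketch. The parent counts (40 bought, 60 did not) come from the example above, but the child counts for the "young"/"old" age split are invented for illustration, since the notes do not list them:

```python
import math

def entropy(pos, neg):
    """Binary entropy of a node with `pos` positives and `neg` negatives."""
    total = pos + neg
    ent = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            ent -= p * math.log2(p)
    return ent

# Parent node: 40 bought, 60 did not  ->  entropy ~ 0.971
parent = entropy(40, 60)

# Hypothetical child counts for an age split (illustrative only):
# "young": 35 bought / 15 did not, "old": 5 bought / 45 did not
young, old = (35, 15), (5, 45)
n_young, n_old = sum(young), sum(old)
n = n_young + n_old

# Weighted average of the child entropies, then the information gain
weighted_children = (n_young / n) * entropy(*young) + (n_old / n) * entropy(*old)
info_gain = parent - weighted_children

print(f"parent entropy          = {parent:.3f}")              # 0.971
print(f"weighted child entropy  = {weighted_children:.3f}")   # ~0.675
print(f"information gain (age)  = {info_gain:.3f}")           # ~0.296
```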
Another splitting criterion:
The Gini index, or Gini impurity, is a measure used in decision trees to quantify
the impurity or disorder within a dataset.
Gini(S) = 1 - Σ (p_i)^2
Where:
● p_i is the proportion of data points in S that belong to class i (the sum runs over all classes).
A Gini of 0 means the node is pure (all one class); for two classes the maximum is 0.5, reached when the classes are split evenly.
Example: To illustrate the use of the Gini index in building a decision tree, we return to
the marketing scenario for the new fitness tracker and its dataset of 100 past
customers.
For simplicity, let's assume the dataset has been divided based on age into two
groups: "Under 30" and "30 and Over", and based on income into two groups:
"High" and "Low". We also have the purchase outcome for each group.
**Age Groups**:
- Under 30: 40 customers, 30 bought the tracker (1), 10 did not (0).
- 30 and Over: 60 customers, 20 bought the tracker (1), 40 did not (0).
**Income Groups**:
- High: 50 customers, 35 bought the tracker (1), 15 did not (0).
- Low: 50 customers, 15 bought the tracker (1), 35 did not (0).
**Under 30**: 30/40 bought → Gini = 1 - ((30/40)^2 + (10/40)^2) = 0.375
**30 and Over**: 20/60 bought → Gini = 1 - ((20/60)^2 + (40/60)^2) ≈ 0.444
**High Income**: 35/50 bought → Gini = 1 - ((35/50)^2 + (15/50)^2) = 0.420
**Low Income**: 15/50 bought → Gini = 1 - ((15/50)^2 + (35/50)^2) = 0.420
Weighted Gini for the age split: 0.4 * 0.375 + 0.6 * 0.444 ≈ 0.417. Weighted Gini for the income split: 0.5 * 0.420 + 0.5 * 0.420 = 0.420. The age split yields the (slightly) lower weighted impurity, so it is chosen first.
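As a quick check of this arithmetic, here is a minimal Python sketch using the group counts listed above; it computes each group's Gini impurity and the weighted impurity of the two candidate splits:

```python
def gini(counts):
    """Gini impurity of a node given its per-class counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

# (bought, did not buy) counts from the example above
groups = {
    "Under 30":    (30, 10),
    "30 and Over": (20, 40),
    "High income": (35, 15),
    "Low income":  (15, 35),
}
for name, counts in groups.items():
    print(f"{name:12s} Gini = {gini(counts):.3f}")

def weighted_gini(*children):
    """Weighted average impurity of the child nodes of a split."""
    n = sum(sum(c) for c in children)
    return sum(sum(c) / n * gini(c) for c in children)

print(f"Age split    : {weighted_gini((30, 10), (20, 40)):.3f}")   # ~0.417
print(f"Income split : {weighted_gini((35, 15), (15, 35)):.3f}")   # 0.420
```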
1. First Split on Age: Divide the dataset into "Under 30" and "30 and Over".
2. Further Splits: For each of these groups, we could further analyze the data (possibly considering
income or other attributes if available) to create additional splits, aiming to increase the purity of
the nodes.
3. Terminal Nodes (Leaves): Continue splitting until reaching a point where further splits do not
significantly increase purity or when a node has reached a minimum size.
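The sketch below is one way these steps might look in code, assuming scikit-learn and a synthetic dataset generated to roughly match the group proportions above (the actual customer data is not reproduced in these notes); criterion="gini" makes the tree use the Gini index from this section:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)

# Synthetic stand-in for the 100-customer dataset: age in years,
# income coded 0 = Low / 1 = High, and a purchase label that loosely
# follows the group proportions above (Under 30 buy more often).
age = rng.integers(18, 65, size=100)
income = rng.integers(0, 2, size=100)
buy_prob = np.where(age < 30, 0.75, 1 / 3)
purchased = (rng.random(100) < buy_prob).astype(int)

X = np.column_stack([age, income])
tree = DecisionTreeClassifier(criterion="gini", max_depth=2, random_state=0)
tree.fit(X, purchased)

# Inspect which splits the Gini criterion actually chose
print(export_text(tree, feature_names=["age", "income"]))
```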
Underfitting and overfitting:
Underfitting occurs when the model is too simple to capture the underlying
structure of the data. This usually happens when the tree is not deep enough,
leading to large bias and poor performance on both training and unseen data.
Characteristics of Underfitting:
● High bias: the tree is too shallow to capture real patterns in the data.
● Poor accuracy on both the training data and unseen data.
Overfitting is the opposite problem: the tree is grown so deep that it memorizes noise in the
training data rather than the underlying pattern.
Characteristics of Overfitting:
● Very high accuracy on the training data but noticeably worse accuracy on unseen data.
● High variance: small changes in the training data produce very different trees.
Common ways to control overfitting include:
● Pruning (removing parts of the tree that don't provide additional power)
● Setting a maximum depth for the tree
● Requiring a minimum number of samples to split a node.
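A minimal sketch of both failure modes, assuming scikit-learn and a synthetic dataset (a stand-in for whatever data the course uses): comparing trees of different depths typically shows underfitting at depth 1 and overfitting with an unrestricted tree.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic classification data standing in for the customer dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=4,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for depth in (1, 3, None):  # very shallow, moderate, unrestricted
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    print(f"max_depth={depth}: "
          f"train acc = {tree.score(X_train, y_train):.2f}, "
          f"test acc = {tree.score(X_test, y_test):.2f}")

# Typical pattern: depth 1 scores poorly on both sets (underfitting),
# the unrestricted tree scores ~1.0 on training but worse on test
# (overfitting), and a moderate depth sits in between.
```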
Problem: Decision trees can overfit the training data, leading to poor
performance on unseen data.
Solution: Pruning reduces the size and complexity of the tree, improving
generalization.
Techniques:
● Pre-pruning (Early stopping): Stop growing the tree early based on size
or data purity.
● Post-pruning: Grow a large tree, then remove suboptimal branches based
on error rates.
Benefits: a smaller, more interpretable tree and better generalization to unseen data.
Trade-offs: pre-pruning may stop too early and miss useful splits; post-pruning requires
growing (and then cutting back) a full tree, which costs more computation; pruning too
aggressively can lead to underfitting.
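As a rough sketch of how these techniques look in practice (assuming scikit-learn; the course may use a different tool), pre-pruning corresponds to constructor arguments such as max_depth and min_samples_split, while post-pruning can be done with cost-complexity pruning via ccp_alpha:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Pre-pruning (early stopping): limit depth and minimum samples per split
pre_pruned = DecisionTreeClassifier(max_depth=4, min_samples_split=20,
                                    random_state=1).fit(X_train, y_train)

# Post-pruning: grow a full tree, then prune with cost-complexity pruning.
# Candidate ccp_alpha values come from the pruning path of the full tree.
full = DecisionTreeClassifier(random_state=1).fit(X_train, y_train)
path = full.cost_complexity_pruning_path(X_train, y_train)

# For simplicity alpha is chosen on the test set here; in practice a
# separate validation set (or cross-validation) would be used.
best_alpha, best_acc = 0.0, 0.0
for alpha in path.ccp_alphas:
    pruned = DecisionTreeClassifier(ccp_alpha=alpha,
                                    random_state=1).fit(X_train, y_train)
    acc = pruned.score(X_test, y_test)
    if acc > best_acc:
        best_alpha, best_acc = alpha, acc

print(f"pre-pruned test acc : {pre_pruned.score(X_test, y_test):.2f}")
print(f"full tree test acc  : {full.score(X_test, y_test):.2f}")
print(f"best ccp_alpha      : {best_alpha:.4f} (test acc {best_acc:.2f})")
```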
SSR method:
Choosing the best feature (and threshold) for a split in a regression decision tree means
finding the split that minimizes the spread of the target variable (what you're trying to
predict) within the resulting groups. The spread is measured by the Sum of Squared
Residuals (SSR): for each child group, sum the squared differences between each target
value and that group's mean, then add the sums together:
SSR(split) = Σ_left (y_i - mean(y_left))^2 + Σ_right (y_i - mean(y_right))^2
Note : Sum of Squared Residuals (SSR): Measures how "spread out" the
target variable is within the resulting groups after a split. We want to
minimize SSR because the smaller the SSR, the more homogeneous
(similar) the target variable is within each group, leading to better
predictions.
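A minimal Python sketch of this idea, with a made-up numeric feature (age) and target (weekly exercise hours) purely for illustration: it scans candidate thresholds on one feature and returns the threshold whose split minimizes the total SSR of the two child groups.

```python
import numpy as np

def ssr(values):
    """Sum of squared residuals of a group around its own mean."""
    return float(np.sum((values - values.mean()) ** 2)) if len(values) else 0.0

def best_split(feature, target):
    """Return the threshold on one numeric feature whose split
    minimizes the total SSR of the two child groups."""
    order = np.argsort(feature)
    feature, target = feature[order], target[order]
    # Candidate thresholds: midpoints between consecutive sorted values
    candidates = np.unique((feature[:-1] + feature[1:]) / 2)
    best_t, best_ssr = None, np.inf
    for t in candidates:
        left, right = target[feature <= t], target[feature > t]
        total = ssr(left) + ssr(right)
        if total < best_ssr:
            best_t, best_ssr = t, total
    return best_t, best_ssr

# Hypothetical example: predict weekly exercise hours from age
age = np.array([22, 25, 31, 38, 45, 52, 60], dtype=float)
hours = np.array([6.0, 5.5, 4.0, 3.5, 2.0, 1.5, 1.0])
threshold, total_ssr = best_split(age, hours)
print(f"best threshold: age <= {threshold:.1f}, total SSR = {total_ssr:.2f}")
```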