
Chapter 4: Classification Using Decision Trees and Rules

Remember:
-​ Entropy is used to determine which feature to use for splitting, not where to split.
- The goal is to reduce entropy (i.e., increase homogeneity) and maximize information gain.

Understanding decision trees


- Form of the model: tree structure.
-​ The model itself comprises a series of logical decisions.
-​ Decision nodes: a decision to be made on an attribute.
-​ Branches: decision nodes split into branches that indicate the decision's choices.
- The tree is terminated by leaf nodes (terminal nodes) that denote the result of following a
combination of decisions (the small if/else sketch at the end of this section shows how this logic reads).
- Decision trees are appropriate for:
- applications in which the classification mechanism needs to be transparent for legal reasons.
- applications in which the results need to be shared to facilitate decision-making.
-​ Decision trees are:
-​ Most widely used machine learning technique
-​ Can be applied for modeling almost any type of data—often with unparalleled performance.
- In spite of their wide applicability, it is worth noting some scenarios where trees may not be an
ideal fit because they may result in a very large number of decisions and an overly complex tree:
-​ a task where the data has a large number of nominal features with many levels
-​ the data has a large number of numeric features.
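
The transparent, rule-like structure can be illustrated with a tiny sketch (not from the chapter; the loan scenario, feature names, and thresholds below are made up) showing how a decision tree's logic reads as nested if/else statements in R:

# Hypothetical tree: decision nodes test an attribute, branches are the
# possible answers, and leaf nodes return the final classification.
classify_applicant <- function(income, credit_history) {
  if (credit_history == "good") {        # decision node 1
    if (income > 40000) "approve"        # decision node 2 -> leaf
    else "review"                        # leaf
  } else {
    "reject"                             # leaf
  }
}

classify_applicant(income = 52000, credit_history = "good")   # "approve"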

Divide and conquer


-​ Decision trees are built using a heuristic called recursive partitioning, known as divide and
conquer.
-​ It is called divide and conquer because it uses the feature values to split the data into smaller
and smaller subsets of similar classes.
Steps:
1.​ The algorithm begins at the root node (the entire dataset)
2.​ It chooses a feature that is the most predictive of the target class.
3.​ The examples are then partitioned into groups of distinct values of this feature; this decision
forms the first set of tree branches.
4. The algorithm continues to divide and conquer the nodes, choosing the best predictive feature
each time, until a stopping criterion is reached (a minimal sketch of this loop follows the stopping conditions below).
Stopping condition:
This might occur at a node if:
➢​ All (or nearly all) of the examples at the node have the same class
➢​ There are no remaining features to distinguish among examples
➢​ The tree has grown to a predefined size limit
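
A minimal R sketch of this divide-and-conquer loop for nominal features (illustrative only, not the C5.0 implementation); it picks each split by information gain, which is defined later in this chapter, and stops when a node is pure, no features remain, or a depth limit is reached:

entropy <- function(classes) {
  p <- table(classes) / length(classes)
  p <- p[p > 0]
  -sum(p * log2(p))
}

info_gain <- function(data, feature, class_col) {
  parts <- split(data, data[[feature]], drop = TRUE)
  weighted <- sum(sapply(parts, function(p)
    nrow(p) / nrow(data) * entropy(p[[class_col]])))
  entropy(data[[class_col]]) - weighted
}

grow_tree <- function(data, features, class_col, depth = 0, max_depth = 5) {
  classes <- data[[class_col]]
  # stopping conditions: pure node, no remaining features, or size limit
  if (length(unique(classes)) == 1 || length(features) == 0 || depth >= max_depth) {
    return(names(which.max(table(classes))))        # leaf: the majority class
  }
  gains <- sapply(features, function(f) info_gain(data, f, class_col))
  best  <- features[which.max(gains)]               # most predictive feature
  # divide: one branch per distinct value of the chosen feature
  branches <- lapply(split(data, data[[best]], drop = TRUE), grow_tree,
                     features  = setdiff(features, best),
                     class_col = class_col,
                     depth     = depth + 1, max_depth = max_depth)
  list(split_on = best, branches = branches)
}

# e.g. grow_tree(movies, features = c("celebrities", "budget"), class_col = "category"),
# where `movies` is a hypothetical data frame of nominal features plus a class column.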
Example:
Imagine that you are working for a Hollywood film
studio, and your desk is piled high with screenplays.
Rather than read each one cover-to-cover, you decide
to develop a decision tree algorithm to predict
whether a potential movie would fall into one of three
categories: mainstream hit, critic's choice, or box office
bust. To gather data for your model, you turn to the
studio archives to examine the previous ten years of
movie releases. After reviewing the data for 30 different movie scripts, a pattern emerges. There seems to
be a relationship between the film's proposed shooting budget, the number of A-list celebrities lined up
for starring roles, and the categories of success. A scatter plot of this data might look something like the
following diagram:
Step 1: The algorithm begins at the root node (the entire dataset)

Step 2: Choose a feature (the film's proposed shooting budget or the number of A-list celebrities) that is
the most predictive of the target class (category of success).
Let's first split on the feature indicating the number of celebrities, partitioning the movies into two groups:
- movies with a low number of A-list stars
- movies without a low number of A-list stars

Step 3: The examples are then partitioned into groups of distinct values of this feature;
Next, among the group of movies with a larger number of celebrities, we can make another split
between:
-​ movies with a high budget
-​ movies without a high budget

At this point, we have partitioned the data into three groups:


1.​ top-left corner: critically-acclaimed films 🟩
a.​ a high number of celebrities
b.​ a relatively low budget.
2.​ top-right corner: box office hits 🔷:
a.​ a large number of celebrities
b. a high budget.
3. bottom half: flops 🔴:
a. a low number of celebrities (little star power)
b. a budget ranging from small to large
Step 4: Continue to divide and conquer the nodes:
We could continue to divide the data by splitting it based on increasingly specific ranges of budget and
celebrity counts until each of the incorrectly classified values resides in its own, perhaps tiny partition.
Stopping condition:
Since the data can continue to be split until there are
no distinguishing features within a partition, a
decision tree can be prone to overfitting the
training data with overly specific decisions. We'll
avoid this by stopping the algorithm here, since more
than 80 percent of the examples in each group are
from a single class.
Limitation of the decision tree: uses axis-parallel
splits
-​ You might have noticed that diagonal lines could
have split the data even more cleanly.
-​ The fact that each split considers one feature at a
time prevents the decision tree from forming
more complex decisions such as "if the number of
celebrities is greater than the estimated budget,
then it will be a critical success".

The C5.0 decision tree algorithm


-​ The most well-known implementation of decision trees.
-​ Was developed by computer scientist J. Ross Quinlan as an improved version of his prior
algorithm, C4.5, which itself is an improvement over his ID3 (Iterative Dichotomiser 3) algorithm.
-​ It does well for most types of problems directly out of the box.
-​ It generally performs nearly as well as other models but is much easier to understand and
deploy.
Strengths:
➔​ An all-purpose classifier that does well on most problems
➔ Highly automatic learning process that can handle numeric or nominal features, as well as missing data
➔​ Uses only the most important features
➔​ Can be used on data with relatively few training examples or a very large number
➔​ Results in a model that can be interpreted without a mathematical background (for relatively
small trees)
➔​ More efficient than other complex models
Weaknesses: relatively minor and can be largely avoided
➔​ Decision tree models are often biased toward splits on features having a large number of levels
➔​ It is easy to overfit or underfit the model
➔​ Can have trouble modeling some relationships due to reliance on axis parallel splits
➔​ Small changes in training data can result in large changes to decision logic
➔​ Large trees can be difficult to interpret and the decisions they make may seem counterintuitive
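
A minimal usage sketch in R (assuming the C50 package from CRAN; `train`, `test`, and the class column `outcome` are placeholder names, not from the chapter):

library(C50)

model <- C5.0(outcome ~ ., data = train)   # grow (and prune) a C5.0 tree
summary(model)                             # inspect the tree's decisions
pred <- predict(model, newdata = test)     # classify new examples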
Choosing the best feature for splitting data
Entropy:
-​ Pure segment: If it contains only a single class.
- C5.0 uses entropy to measure purity. Entropy quantifies the randomness, or disorder, within a set of
class values; the entropy of a sample of data indicates how mixed the class values are:
-​ If one class dominates completely (e.g., 100% "Yes"), entropy is zero, meaning the data is
completely homogeneous.
- If a dataset has equal numbers of different classes (e.g., 50% "Yes" and 50% "No"), it has the
maximum entropy (= 1 for a two-class problem).
- The definition of entropy is specified by:

Entropy(S) = − Σ (i = 1..C) Pi * log2(Pi)
In the entropy formula, for a given segment of data (S):


-​ C: the number of different class levels
-​ Pi: the proportion of values falling into class level i.
For example, suppose we have a partition of data with two classes:
-​ red (60%)
-​ white (40%)
We can calculate the entropy as: -0.60 * log2(0.60) - 0.40 * log2(0.40) = 0.9709506
How Decision Trees Use Entropy:
-​ A decision tree algorithm looks for the best way to split the data to reduce entropy.
-​ It evaluates each feature and calculates how much it decreases entropy when used for splitting.
For any two-class arrangement:
- if x is the proportion of examples in class 1,
- then 1 - x is the proportion of examples in class 2.
Using the curve() function, we can plot the entropy for all possible values of x:
curve(-x * log2(x) - (1 - x) * log2(1 - x), col = "red", xlab = "x", ylab = "Entropy", lwd = 4)

As illustrated by the peak in entropy at x = 0.50, a 50-50 split results in the maximum entropy.
Information gain:
-​ Information gain is used to calculate the change in entropy resulting from a split on each
possible feature.
-​ The algorithm checks different features and selects the one that provides the highest information
gain, meaning it best separates the classes.
-​ If the information gain = 0 → no reduction in entropy for splitting on this feature → The split does
not improve class separation.
Information Gain (Feature) = Entropy Before Split − Weighted Entropy After Split
InfoGain(F) = Entropy(S1) − Entropy(S2)

-​ Weighted Entropy After Split: entropy in the partitions resulting from the split.
Entropy(S2) = Σ (i = 1..n) wi * Entropy(Pi)
Weighted Entropy after split = w1 * Entropy(P1) + … + wn * Entropy(Pn)
where wi is the proportion of data points in partition Pi:
wi = (number of data points in partition Pi) / (total number of data points)
This means that after splitting a dataset into multiple groups (partitions), the overall entropy of the new
dataset is calculated by considering:
1.​ The entropy of each individual partition (how mixed or pure it is).
2.​ The size (proportion) of each partition relative to the total dataset.
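As a small worked example with made-up numbers (not from the chapter), suppose the 60% red / 40% white partition above contains 10 examples, and a hypothetical feature splits it into P1 = 5 red examples (pure) and P2 = 1 red and 4 white examples (mixed). In R:

entropy_before <- -0.6 * log2(0.6) - 0.4 * log2(0.4)         # 0.9709506
entropy_p1     <- 0                                           # pure partition
entropy_p2     <- -0.2 * log2(0.2) - 0.8 * log2(0.8)         # 0.7219281
weighted_after <- (5/10) * entropy_p1 + (5/10) * entropy_p2   # 0.3609641
info_gain      <- entropy_before - weighted_after             # about 0.61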
Handling Numeric Features
-​ The previous formulae assume nominal features, but decision trees use information gain for
splitting on numeric features as well.
-​ A common practice is testing various splits that divide the values into groups greater than or less
than a threshold; this reduces the numeric feature into a two-level categorical feature.
- The numeric threshold (e.g., "greater than 50" vs. "less than 50") yielding the largest information
gain is chosen for the split (a small sketch of this threshold search follows).
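
A hedged R sketch of this threshold search (illustrative only; the budget values and outcomes below are made up): binarize the numeric feature at each candidate cut point and keep the cut with the highest information gain.

entropy <- function(classes) {              # same helper as in the earlier sketch
  p <- table(classes) / length(classes)
  p <- p[p > 0]
  -sum(p * log2(p))
}

best_threshold <- function(x, classes) {
  base <- entropy(classes)
  v    <- sort(unique(x))
  cuts <- (head(v, -1) + tail(v, -1)) / 2   # midpoints between distinct values
  gains <- sapply(cuts, function(t) {
    grp <- x > t                            # reduces x to a two-level categorical feature
    w   <- mean(grp)
    base - (w * entropy(classes[grp]) + (1 - w) * entropy(classes[!grp]))
  })
  cuts[which.max(gains)]
}

budget  <- c(10, 20, 35, 60, 80, 95)        # shooting budgets in millions (made up)
outcome <- c("flop", "flop", "flop", "hit", "hit", "hit")
best_threshold(budget, outcome)             # 47.5, the cut between 35 and 60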
Pruning the decision tree
Large tree → overly specific decisions → overfitted model.
Pruning a decision tree involves reducing its size so that it generalizes better to unseen data.

Pre-pruning/ Early stopping:

-​ Pre-pruning/ Early stopping stops the tree from growing:


a.​ once it reaches a certain number of decisions
b.​ if the decision nodes contain only a small number of examples.
- However, one downside is that there is no way to know whether the tree will miss subtle but
important patterns that it would have learned had it grown to a larger size.

Post-pruning:

-​ Post-pruning involves:
1.​ growing a tree that is too large
2.​ using pruning criteria based on the error rates at the nodes to reduce the size of the tree to a
more appropriate level.
-​ This is often a more effective approach because it is difficult to determine the optimal depth of a
decision tree without growing it first.
C5.0’s Approach to Pruning

Post-pruning: C5.0 automatically prunes trees to improve accuracy.


-​ C5.0 first grows a large tree that overfits the training data.
-​ Then, it removes branches and nodes that don’t significantly improve classification accuracy.

Subtree Raising & Subtree Replacement


Subtree Raising:
-​ A branch deep in the tree is moved higher if it helps simplify the tree without losing accuracy.
Subtree Replacement:
-​ A complex set of conditions is replaced with a simpler decision that performs just as well.

Balancing overfitting and underfitting models


If model accuracy is vital, it may be worth investing some time experimenting with various pruning
options to see whether they improve performance on the test data.
One of the strengths of the C5.0 algorithm is that it is very easy to adjust the training options, as sketched below.
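
A hedged sketch of adjusting pruning-related options through C5.0Control() in the C50 package (treat the specific argument names and values as assumptions to verify against the package documentation; `train` and `outcome` are placeholder names):

library(C50)

model <- C5.0(outcome ~ ., data = train,
              control = C5.0Control(minCases = 10,             # pre-pruning: minimum examples per node
                                    CF = 0.10,                 # post-pruning confidence factor (smaller = more pruning)
                                    noGlobalPruning = FALSE))  # keep the final global pruning pass
summary(model)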
