2 - 4 CART
Gini(t) = 1 - Σ_j [ p(j|t) ]²
Attribute A will be chosen to split the node if Gini(A) < Gini(B).
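A minimal Python sketch of this computation (the function names and the class-count representation are my own, not from the slides):

```python
def gini(class_counts):
    """Gini index of a node: 1 - sum_j p(j|t)^2."""
    total = sum(class_counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in class_counts)

def gini_split(branches):
    """Weighted Gini of a split; `branches` is a list of class-count lists."""
    n = sum(sum(b) for b in branches)
    return sum(sum(b) / n * gini(b) for b in branches)
```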
Decision tree using the CART algorithm
Day Outlook Temp. Humidity Wind Play Tennis
D1 Sunny Hot High Weak No
D2 Sunny Hot High Strong No
D3 Overcast Hot High Weak Yes
D4 Rain Mild High Weak Yes
D5 Rain Cool Normal Weak Yes
D6 Rain Cool Normal Strong No
D7 Overcast Cool Normal Weak Yes
D8 Sunny Mild High Weak No
D9 Sunny Cool Normal Weak Yes
D10 Rain Mild Normal Strong Yes
D11 Sunny Mild Normal Strong Yes
D12 Overcast Mild High Strong Yes
D13 Overcast Hot Normal Weak Yes
D14 Rain Mild High Strong No
Selecting the Best Attribute for splitting using Gini
S = [9+, 5-]
Splitting S on Outlook: Sunny → [2+, 3-], Overcast → [4+, 0-], Rain → [3+, 2-]
Outlook will be placed at the root as it has the highest information gain (lowest Gini index).
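Using the counts above, the split can be checked numerically (a sketch reusing the gini/gini_split helpers defined earlier; the printed values are rounded):

```python
# Outlook branches: Sunny [2+,3-], Overcast [4+,0-], Rain [3+,2-]
branches = [[2, 3], [4, 0], [3, 2]]
print(gini([9, 5]))          # Gini of the root S = [9+,5-]: ~0.459
print(gini_split(branches))  # weighted Gini after splitting on Outlook: ~0.343
```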
Regression trees
• Regression trees are trees whose leaves predict a real number rather than a class.
• Example: CART
• CART stands for Classification and Regression Trees. An important
feature of CART is its ability to generate regression trees.
• CART looks for splits that minimize the squared prediction error (the
least-squared deviation). The prediction in each leaf is the weighted
mean of the node, as the sketch below illustrates.
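A quick numerical check that the mean minimizes the squared error of a leaf's prediction (the leaf values here are hypothetical, chosen only for illustration):

```python
import numpy as np

leaf = np.array([30.0, 35.0, 40.0, 55.0])      # hypothetical target values in a leaf
sse = lambda pred: ((leaf - pred) ** 2).sum()  # squared prediction error
print(sse(leaf.mean()), sse(np.median(leaf)))  # 350.0 < 375.0: the mean wins
```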
Algorithm
• Step 1: The standard deviation of the target is calculated.
• Step 2: The dataset is split on each candidate attribute, and the weighted standard
deviation across the resulting branches is calculated. Subtracting this from the
standard deviation before the split gives the standard deviation reduction (SDR).
• Step 3: The attribute with the largest standard deviation reduction is chosen for the
decision node.
• Step 4: The dataset is divided based on the values of the selected attribute. This process
is run recursively on the non-leaf branches, until all data is processed.
• Repeat these steps until the coefficient of variation (CV) for a branch becomes
smaller than a certain threshold (e.g., 10%) and/or too few instances remain in the
branch. A short code sketch of SDR and CV follows this list.
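A minimal sketch of the standard deviation reduction and the CV stopping criterion, assuming NumPy and population standard deviation (ddof = 0); the function and argument names are my own:

```python
import numpy as np

def sdr(target, attribute):
    """Standard deviation reduction of splitting `target` on `attribute`."""
    target = np.asarray(target, dtype=float)
    attribute = np.asarray(attribute)
    before = target.std()                      # SD before the split
    after = sum(                               # weighted SD across branches
        (attribute == v).mean() * target[attribute == v].std()
        for v in np.unique(attribute)
    )
    return before - after

def cv(values):
    """Coefficient of variation: SD / mean, used as a stopping criterion."""
    values = np.asarray(values, dtype=float)
    return values.std() / values.mean()
```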
CART Example
• Hours Played is the target (a continuous outcome), so this is a regression problem.
a) Standard deviation of the target: Std. dev (Hours) = 9.32.
b) Compute the weighted standard deviation after splitting on Outlook: Std. dev (Hours, Outlook) = 7.66.
The standard deviation reduction = Std. dev (Hours) - Std. dev (Hours, Outlook)
= 9.32 - 7.66 = 1.66
• In practice, we need a termination criterion. For example, stop when the coefficient of
variation (CV) for a branch becomes smaller than a certain threshold (e.g., 10%)
and/or when too few instances (n) remain in the branch (e.g., 3).
The "Overcast" subset does not need any further splitting because
its CV (8%) is less than the threshold (10%). The related leaf
node gets the average of the "Overcast" subset: (46 + 43 + 52 + 44) / 4 = 46.25 ≈ 46.3.
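These numbers can be verified directly (the four values come from the slide; cv is the helper sketched above):

```python
overcast = [46.0, 43.0, 52.0, 44.0]
print(sum(overcast) / len(overcast))  # leaf prediction: 46.25
print(cv(overcast))                   # ~0.08 (8%), below the 10% threshold -> stop splitting
```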
• The "Sunny" branch has an CV (28%) more than the threshold (10%) which needs
further splitting. We select "Temp" as the best best node after "Outlook" because
it has the largest SDR.
• Because the number of data points in both branches (FALSE and TRUE) is 3 or
fewer, we stop further branching and assign the average of each branch
to the related leaf node.
• Moreover, the "Rainy" branch has a CV (22%) above the threshold (10%), so it
also needs further splitting. We select "Temp" as the best attribute because it
has the largest SDR.
• Because the number of data points in all three branches (Cool, Hot, and Mild) is 3 or
fewer, we stop further branching and assign the average of each branch to the
related leaf node.