Decision Trees
We shall examine two of the many methods for measuring leaf node purity, which lead to the
two leading algorithms for constructing decision trees:
• CART algorithm
• C5.0 algorithm
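The two algorithms differ in the purity measure they optimize: CART uses the Gini index, while C5.0 uses entropy (information gain). As a small sketch (the function names here are ours, not from any library), the two measures for a vector of class proportions:

```python
import math

def gini(p):
    # Gini impurity: 1 minus the sum of squared class proportions
    return 1.0 - sum(pi ** 2 for pi in p)

def entropy(p):
    # Shannon entropy in bits; classes with proportion 0 contribute nothing
    return sum(-pi * math.log2(pi) for pi in p if pi > 0)

# A pure node scores 0 under both measures; a 50/50 node scores highest
print(gini([1.0]), gini([0.5, 0.5]))        # → 0.0 0.5
print(entropy([1.0]), entropy([0.5, 0.5]))  # → 0.0 1.0
```

Both measures are minimized (zero) when a node contains a single class, which is why either can serve as a splitting criterion.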
import pandas as pd
import numpy as np
import statsmodels.tools.tools as stattools
from sklearn.tree import DecisionTreeClassifier, export_graphviz
adult_tr = pd.read_csv("C:/.../adult_ch6_training")
For simplicity, we save the Income variable as y.
y = adult_tr[['Income']]
We have a categorical variable, Marital status, among our predictors. The CART model
implemented in the sklearn package needs categorical variables converted to a dummy variable
form. Thus, we will make a series of dummy variables for Marital status using the categorical()
command.
mar_np = np.array(adult_tr['Marital status'])
(mar_cat, mar_cat_dict) = stattools.categorical(mar_np, drop=True, dictnames = True)
We turn the variable Marital status into an array using array(), then use the categorical()
command from the stattools package to create a matrix of dummy variables for each value of
Marital status. We save the matrix and dictionary separately using (mar_cat, mar_cat_dict).
The matrix mar_cat contains five columns, one for each category in the original Marital status
variable. Each row represents a record in the adult_tr data set. Each row will have a 1 in the
column which matches the value that record had in the original Marital status variable. You can
tell which column represents which category by examining mar_cat_dict. In our case, the first
row of mar_cat has a 1 in the third column. By examining mar_cat_dict, we know the third
column represents the “Never married” category. Sure enough, the first record of adult_tr has
“Never married” as the Marital status variable value.
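Note that categorical() has been deprecated and removed in recent releases of statsmodels; if it is unavailable, pandas' get_dummies() builds the same kind of dummy matrix. A minimal sketch on hypothetical values standing in for the Marital status column:

```python
import pandas as pd

# Hypothetical stand-in for the Marital status column
marital = pd.Series(["Never married", "Divorced", "Married", "Never married"])

# One dummy column per category, in sorted order; each row has exactly one 1
mar_dummies = pd.get_dummies(marital)
print(list(mar_dummies.columns))  # → ['Divorced', 'Married', 'Never married']
```

Here the column names carry the category labels directly, so no separate dictionary like mar_cat_dict is needed.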
Now, we need to add the newly made dummy variables back into the X variables.
mar_cat_pd = pd.DataFrame(mar_cat)
X = pd.concat((adult_tr[['Cap_Gains_Losses']], mar_cat_pd), axis = 1)
We first make the mar_cat matrix a data frame using the DataFrame() command. We then use
the concat() command to attach the predictor variable Cap_Gains_Losses to the data frame of
dummy variables that represent marital status. We save the result as X.
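The column-wise concatenation can be checked on toy frames (the values here are invented for illustration):

```python
import pandas as pd

# Toy predictor column and toy dummy matrix
a = pd.DataFrame({"Cap_Gains_Losses": [0.0, 0.5]})
b = pd.DataFrame({0: [1, 0], 1: [0, 1]})

# axis = 1 stacks the frames side by side, matching rows by index
X_toy = pd.concat((a, b), axis=1)
print(X_toy.shape)  # → (2, 3)
```

With axis = 1, rows are aligned on the index, so both frames should share the same row index before concatenating.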
Before we run the CART algorithm, note that the columns of X are not labeled with the values of the Marital status variable. Examine mar_cat_dict to see that the first dummy column is for the value "Divorced," the second for "Married," and so on. Since the first column of X is Cap_Gains_Losses, we can specify the names of each column of X ourselves.
X_names = ["Cap_Gains_Losses", "Divorced", "Married", "Never-married", "Separated", "Widowed"]
It will help us when visualizing the CART model to know the levels of y as well.
y_names = ["<=50K", ">50K"]
Now, we are ready to run the CART algorithm!
cart01 = DecisionTreeClassifier(criterion = "gini", max_leaf_nodes=5).fit(X,y)
To run the CART algorithm, we use the DecisionTreeClassifier() command. The
DecisionTreeClassifier() command sets up the various parameters for the decision tree. For
example, the criterion = “gini” input specifies that we are using a CART model which utilizes
the Gini criterion, and the max_leaf_nodes input trims the CART tree to have at most the
specified number of leaf nodes. For this example, we have limited our tree to five leaf nodes. The
fit() command tells Python to fit the decision tree that was previously specified to the data. The
predictor variables are given first, followed by the target variable. Thus, the two inputs to fit()
are the X and y objects we created. We save the decision tree as cart01.
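The effect of max_leaf_nodes can be checked on synthetic data (the data and variable names below are invented for illustration):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic data: 100 records, two numeric predictors, binary target
rng = np.random.RandomState(0)
X_toy = rng.rand(100, 2)
y_toy = (X_toy[:, 0] + X_toy[:, 1] > 1.0).astype(int)

# Gini criterion with at most five leaf nodes, as in cart01
tree = DecisionTreeClassifier(criterion="gini", max_leaf_nodes=5).fit(X_toy, y_toy)
print(tree.get_n_leaves())
```

However complex the data, the fitted tree never exceeds the requested number of leaves, which is how the parameter keeps the tree readable.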
Finally, to obtain the tree structure, we use the export_graphviz() command.
export_graphviz(cart01, out_file = "C:/.../cart01.dot", feature_names = X_names, class_names = y_names)
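As an aside, export_graphviz() can also return the dot source directly by passing out_file = None, which is convenient for a quick check that the feature and class names landed in the output; a sketch using sklearn's built-in iris data:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz

# Small tree on the iris data, limited to three leaves for readability
iris = load_iris()
tree = DecisionTreeClassifier(max_leaf_nodes=3).fit(iris.data, iris.target)

# out_file=None returns the dot source as a string instead of writing a file
dot_src = export_graphviz(tree, out_file=None,
                          feature_names=iris.feature_names,
                          class_names=iris.target_names)
print(dot_src[:13])  # → digraph Tree
```

The returned string is ordinary Graphviz dot source, so it can be rendered with any Graphviz tool.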
The first input is the decision tree itself, which we saved as cart01. The out_file input will save
the tree structure to the specified location and name the file cart01.dot. Run the contents of the
file through the graphviz package to display the CART model. Specifying feature_names =
X_names and class_names = y_names adds the predictor variable names and the target variable
values to the cart01.dot file, greatly increasing its readability. To obtain the classifications of the
Income variable for every record in the training data set, use the predict() command.
predIncomeCART = cart01.predict(X)
Using the predict() command on cart01 says that we want to use our CART model to make the
classifications. Including the predictor variables X as input specifies that we want predictions for
those records in particular. The result is the classification, according to our CART model, for
every record in the training data set. We save the predictions as predIncomeCART.
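A natural next step (not shown in the source) is to cross-tabulate the predicted classes against the actual ones with pandas' crosstab(); a sketch on hypothetical labels:

```python
import pandas as pd

# Hypothetical actual and predicted Income labels
actual = pd.Series(["<=50K", "<=50K", ">50K", ">50K", "<=50K"], name="Actual")
predicted = pd.Series(["<=50K", ">50K", ">50K", ">50K", "<=50K"], name="Predicted")

# Contingency table of actual vs. predicted classifications
table = pd.crosstab(actual, predicted)
print(table)
```

The diagonal of the table counts correct classifications, and the off-diagonal cells count the two kinds of error.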