Decision Tree

Decision trees are a supervised learning technique that can be used for both classification and prediction problems. They work by recursively splitting a dataset into purer subsets based on the values of predictor variables, using a measure of impurity such as the Gini index or information gain. Random forests are an ensemble method that builds many decision trees and aggregates their results, which reduces overfitting and improves accuracy compared to a single decision tree. Key steps in building a decision tree include handling missing data, converting categorical variables into numeric format, fitting the tree to training data, and evaluating it on test data.

Correlation: finds the strength of the relationship between 2 variables only.

Regression: finds the causal-effect relationship of the independent variable(s) (IDV) on the dependent variable (DV).

Linear Regression: a prediction technique.


Simple Linear Regression: 1 DV and 1 IDV; both DV and IDV are continuous.
Multiple Linear Regression: 1 DV and more than 1 IDV.

Logistic Regression: the DV is binary categorical; the IDVs can be categorical and continuous.


-> Classification technique.
Single predictor model: one IDV.
Multiple predictor model: more than one IDV.
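
To make the correlation vs. regression distinction concrete, here is a minimal sketch with made-up data (the variable names and values are illustrative only):

import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([1, 2, 3, 4, 5])                 # IDV
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])       # DV

r = np.corrcoef(x, y)[0, 1]                   # correlation: strength of the relationship only
model = LinearRegression().fit(x.reshape(-1, 1), y)   # simple linear regression: effect of the IDV on the DV
print(r, model.coef_[0], model.intercept_)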

Decision Tree and Random Forest:


Classification Technique | Dependent Variable | Independent Variable        | Purpose of Algorithm
Decision Tree            | Categorical        | Categorical and continuous  | A classification technique used to classify the records in a pictorial (tree) format with the help of the Gini index.
Random Forest            | Categorical        | Categorical and continuous  | An ensemble of decision trees used to find the important variables for the decision tree.

Decision Tree: an algorithm useful for both classification and prediction.
It classifies the records in a pictorial (tree) format; the target variable is categorical and the input variables can be categorical and continuous.
A circle represents a node and a box represents a leaf.

The training data set is the past (historical) data.


Based on the Gini index we decide where to start the classification (which attribute to split on first).

Nominal -> distinct categories with no order; allows a multi-way split (more than 2 branches).

Ordinal -> categories that rank the objects. Ex: size (small, medium, large).

Continuous -> numeric variables defined on a range.

Impurity of the data:
Gini index ranges from 0 to 0.5; close to 0 -> homogeneity.
Entropy ranges from 0 to 1; 0 -> homogeneity and less impurity.
Misclassification error ranges from 0 to 0.5.
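
As a rough sketch (assuming a binary, 0/1-coded target, which is what gives the 0.5 and 1 upper bounds above), the three impurity measures can be computed like this:

import numpy as np

def impurity(labels):
    p = np.bincount(labels) / len(labels)   # class proportions at a node
    p = p[p > 0]
    gini = 1 - np.sum(p ** 2)               # 0 (pure) to 0.5 for two classes
    entropy = -np.sum(p * np.log2(p))       # 0 (pure) to 1 for two classes
    misclass = 1 - np.max(p)                # 0 (pure) to 0.5 for two classes
    return gini, entropy, misclass

print(impurity(np.array([0, 0, 1, 1])))     # maximum impurity: (0.5, 1.0, 0.5)
print(impurity(np.array([0, 0, 0, 0])))     # homogeneous node: all three measures are 0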

Binary split -> 2 branches -> ID3 (Iterative Dichotomiser)

Categorical with more than 2 values (Low, Medium, High) -> CART
Continuous (Income, Age) -> CART

Regression tree -> continuous target variable

CART -> Classification And Regression Tree
ID3 -> binary tree

Classification:
Overfitting: as the decision tree grows with more and more independent variables, it becomes difficult to classify new records -> pre-pruning: stop before the tree is fully generated.
The random forest algorithm is an ensemble of decision trees; it identifies the important IDVs.
Pruning (cutting): minimizing the number of attributes in the tree (controls the growing of the tree horizontally).

Pre-pruning (forward pruning): applied before the decision tree is generated -> we can use parameters such as max depth to stop the tree early.

Random forest: it is used to identify the important IDVs.

Post-pruning (backward pruning): applied once the tree is generated.


1. Subtree replacement -> an entire subtree is summarized and replaced by a leaf.
2. Subtree raising -> a subtree is raised and connected to the main node.
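
In scikit-learn terms, a minimal sketch of both ideas (the parameter values are illustrative, not from the notes; sklearn does not expose subtree replacement/raising directly, and its post-pruning knob is cost-complexity pruning):

from sklearn import tree

# pre-pruning: stop the tree before it fully grows
pre_pruned = tree.DecisionTreeClassifier(max_depth=8, min_samples_leaf=5)

# post-pruning: grow the tree fully, then cut it back via cost-complexity pruning
post_pruned = tree.DecisionTreeClassifier(ccp_alpha=0.01)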

Handling missing attributes:

If more than 50% of the values are missing -> we can delete the entire column.

If the IDV is continuous -> we can take the average (mean) and fill in the missing values.

If the IDV is categorical -> we take the mode and replace the missing values.
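
A minimal pandas sketch of these three rules (the file and column names assume the standard Titanic data set used later in these notes):

import pandas as pd

titanic_train = pd.read_csv("titanic_train.csv")   # assumed file name

# more than 50% missing -> delete the entire column
titanic_train = titanic_train.dropna(axis=1, thresh=len(titanic_train) // 2)

# continuous IDV -> fill with the mean
titanic_train["Age"] = titanic_train["Age"].fillna(titanic_train["Age"].mean())

# categorical IDV -> fill with the mode
titanic_train["Embarked"] = titanic_train["Embarked"].fillna(titanic_train["Embarked"].mode()[0])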

Advantages
1. Generates rules.
2. Performs classification.
3. The IDVs can be both continuous and categorical.
4. By visualization we can clearly classify the data.

Weaknesses
1. The predicted (target) variable is always categorical.
2. It is not helpful when the target variable is continuous (a regression tree is needed instead).
3. Its performance is poor when there are many categories.
4. It is not a good fit for a small amount of data.

Other topics: underfitting, missing values, costs of classification.

Python coding
1. Import the pandas library -> to load the data frame.
2. Import the NumPy package -> for storing the data in array and matrix form.
3. from sklearn import tree -> for the machine learning algorithm.
4. from sklearn import preprocessing -> used to convert text into numerical values and handle missing values.
5. Load the training data set.
6. If the missing value is continuous, replace it with the mean; if it is categorical, replace it with the mode.
7. np.where(condition, value, column) -> use value where the condition holds, otherwise keep the column's existing value.
8. Text into numerical: LabelEncoder (fits any number of categories), LabelBinarizer (2 values).
9. label_encoder = preprocessing.LabelEncoder()
10. encoded_sex = label_encoder.fit_transform(titanic_train["Sex"])
11. fit_transform -> converts the text into numerical values.
12. Initialize the decision tree model:
tree_model = tree.DecisionTreeClassifier() -> the output is categorical
tree.DecisionTreeRegressor() -> the output is continuous
13. tree_model.fit(X = pd.DataFrame(encoded_sex), y = titanic_train["Survived"])

14. The splits are chosen based on the Gini index value.


15. To visualize the tree there is a Graphviz interface:
with open("Dtree1.dot", 'w') as f:
    tree.export_graphviz(tree_model, feature_names=["Sex"], out_file=f)
Paste the generated .dot file into webgraphviz.com to render the tree.

In the rendered tree, lesser values go to the left side and greater values go to the right side.

16. predictors = pd.DataFrame([encoded_sex, titanic_train["Pclass"]]).T

17. tree_model.fit(X=predictors, y=titanic_train["Survived"])

with open("Dtree2.dot", 'w') as f:
    tree.export_graphviz(tree_model, feature_names=["Sex", "Pclass"], out_file=f)

For more than one IDV we use .T (transpose) so that each predictor becomes a column.

Predictors: gender, passenger class.
For more than one independent variable we need to create a data frame; a consolidated sketch of steps 5-17 follows below.
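
Putting steps 5-17 together (the file and column names assume the standard Titanic data set):

import pandas as pd
from sklearn import tree, preprocessing

titanic_train = pd.read_csv("titanic_train.csv")                  # step 5: load the training data

titanic_train["Age"] = titanic_train["Age"].fillna(titanic_train["Age"].mean())   # step 6

label_encoder = preprocessing.LabelEncoder()                      # steps 8-9
encoded_sex = label_encoder.fit_transform(titanic_train["Sex"])   # step 10: text -> numerical

predictors = pd.DataFrame([encoded_sex, titanic_train["Pclass"]]).T   # step 16: .T makes each IDV a column

tree_model = tree.DecisionTreeClassifier()                        # step 12
tree_model.fit(X=predictors, y=titanic_train["Survived"])         # step 17

with open("Dtree2.dot", "w") as f:                                # step 15: export for webgraphviz.com
    tree.export_graphviz(tree_model, feature_names=["Sex", "Pclass"], out_file=f)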

Max depth of 8: we need to control the tree.

There are 4 independent variables and 1 output variable, which is categorical (yes or no); therefore 4 * 2 = 8, so we can go for a depth of 8.
The tree will grow up to 8 levels.
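
Using scikit-learn, that cap can be applied with the max_depth parameter (reusing the predictors frame from the sketch above):

tree_model = tree.DecisionTreeClassifier(max_depth=8)       # control the tree: at most 8 levels
tree_model.fit(X=predictors, y=titanic_train["Survived"])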

Titanic_train dataset link: https://drive.google.com/file/d/1qxJEtjt_pHzb52-h_HXszQcNjt3zPEsF/view?usp=sharing

RandomForestClassifier(n_estimators = 1000, max_features =2, oob_score = True)


Based on the decision trees we can check the accuracy.

Random forest:

Survived is the dependent variable.
IDVs: gender, fare, age.

The model accuracy is higher than a single decision tree's.

Accuracy score? (see the sketch below)
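
A minimal sketch of checking that accuracy (here predictors is assumed to hold the encoded gender, fare, and age columns; the constructor call matches the one given above, and oob_score=True is what makes the out-of-bag estimate available):

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

rf_model = RandomForestClassifier(n_estimators=1000, max_features=2, oob_score=True)
rf_model.fit(X=predictors, y=titanic_train["Survived"])

print(rf_model.oob_score_)              # out-of-bag accuracy estimate
print(rf_model.feature_importances_)    # importance of each IDV
print(accuracy_score(titanic_train["Survived"], rf_model.predict(predictors)))   # training accuracy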
