Decision Trees
Contents
6. Decision trees
6.1 How to build a decision tree?
6.1.1 Entropy
6.1.2 Information Gain
6.1.3 Gini impurity
6.1.4 MSE
6.2 Parameters Related to the Size of the Tree
6.2.1 Minimum Split Size
6.3 Decision tree implementation in Python
6.3.1 In case of classification
6.3.2 In case of regression
6.4 When to stop growing?
6.5 Advantages of decision trees
6. Decision trees
A decision tree is a decision-support tool that uses a tree-like graph or model of decisions and their possible consequences. At each node a decision is taken on one feature, so a decision tree can also be seen as a set of nested if-else conditions.
A decision tree is a non-linear model built from several linear, axis-parallel splitting planes. In logistic regression we have a single separating plane, but in a decision tree we have multiple planes.
Suppose we want to classify whether a person defaults on paying tax or not on the basis of three features: employed, gender, and location. Then the decision tree is built as follows:
Depth 0 (root node): N = 1000, class 1 (no defaulter) = 600, class 0 (defaulter) = 400
  Split: Employed == Y
    Depth 1, Employed = Yes: N = 700, 1: 550, 0: 150
      Split: Gender == M
        Depth 2, Gender = M:  N = 300, 1: 230 (77%), 0: 70 (23%)   -> no defaulter
        Depth 2, Gender = F:  N = 400, 1: 10,        0: 390        -> defaulter
    Depth 1, Employed = No:  N = 300, 1: 50,  0: 250
      Split: Location == N
        Depth 2, Location = N:  N = 100, 1: 25 (25%), 0: 75 (75%)  -> defaulter
        Depth 2, Location != N: N = 200, 1: 150,      0: 50        -> no defaulter
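Viewed as the nested if-else conditions described above, the same tree can be sketched directly in Python. This is purely an illustration of the structure of the figure; the function name and argument encoding are our own.

def predict_defaulter(employed, gender, location):
    """Nested if-else version of the example tree (1 = no defaulter, 0 = defaulter)."""
    if employed == "Y":            # depth 1 split: Employed == Y
        if gender == "M":          # depth 2 split: Gender == M (77% class 1)
            return 1               # predict no defaulter
        return 0                   # Gender == F is mostly class 0: defaulter
    if location == "N":            # depth 2 split on the Employed == N branch
        return 0                   # 75% class 0: defaulter
    return 1                       # Location != N is mostly class 1: no defaulter

# An employed male is predicted to be a non-defaulter:
print(predict_defaulter("Y", "M", "S"))   # -> 1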
A sample of the data used to build the tree can be represented as:

Defaulter  Employed  Location  Gender
1          1         N         M
0          0         S         F
0          0         S         M
1          0         S         F
0          1         N         F
1          0         N         M
0          0         N         M
1          1         S         M
1          0         S         F
0          1         S         M
0          0         N         F
For a node with class probabilities p_i, entropy is defined as H = -Σ p_i · log2(p_i). Entropy measures the amount of randomness (impurity) in the data: it approaches 1 when both categories are equally probable and decreases as the class distribution becomes more unbalanced. The split is chosen so that the entropy of the resulting child nodes is as low as possible, i.e. so that the reduction in entropy is as large as possible.

6.1.2 Information Gain
Information gain (IG) is the reduction in entropy achieved by a split. As the name suggests, it tells us how much homogeneity is gained in the data if a feature is used as the splitting node. The higher the IG, the better the feature is for splitting the decision tree. For a feature A it is computed as

IG(A) = H(parent) − Σ_v (N_v / N) · H(child_v),

i.e. the entropy of the parent node minus the weighted average entropy of its child nodes. For the example split on Employed:
Root node: N = 1000, 1: 600, 0: 400
  Split: Employed == Y
    Employed = Yes: N = 700, 1: 550, 0: 150
    Employed = No:  N = 300, 1: 50,  0: 250
H(root) = -(600/1000)·log2(600/1000) - (400/1000)·log2(400/1000) ≈ 0.97
H(Employed = Y) = -(550/700)·log2(550/700) - (150/700)·log2(150/700) ≈ 0.75
H(Employed = N) = -(50/300)·log2(50/300) - (250/300)·log2(250/300) ≈ 0.65
Information gain(Employed) = 0.97 − (700/1000 · 0.75 + 300/1000 · 0.65) = 0.97 − 0.72 ≈ 0.25
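This calculation can be verified with a few lines of Python. The sketch below uses only the class counts from the figure; the helper functions entropy and information_gain are our own, not from any library.

import math

def entropy(counts):
    # Entropy (base 2) of a node from its class counts, e.g. [600, 400].
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

def information_gain(parent, children):
    # Entropy of the parent minus the weighted entropy of the child nodes.
    n = sum(parent)
    weighted = sum(sum(ch) / n * entropy(ch) for ch in children)
    return entropy(parent) - weighted

print(entropy([600, 400]))                                    # ~0.97 (root)
print(entropy([550, 150]), entropy([50, 250]))                # ~0.75, ~0.65
print(information_gain([600, 400], [[550, 150], [50, 250]]))  # ~0.25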
6.1.4 MSE
If instead the defaulter amount (the amount in Rs that a person has failed to pay) is given and we want to predict it, the target is continuous and we use the mean squared error (MSE) as the splitting criterion. The sample data now looks like:

Defaulter (Rs)  Employed  Location  Gender
15              1         N         M
25              0         S         F
65              0         S         M
85              0         S         F
45              1         N         F
54              0         N         M
12              0         N         M
94              1         S         M
100             0         S         F
25              1         S         M
20              0         N         F
First we compute the mean squared error of the defaulter amount at the root node using the formula

MSE = Σ (y_i − ȳ)² / n,

where ȳ is the mean of the target values at the node. Suppose the mean of the 1000 defaulter amounts is ȳ = 50. Then

MSE(root) = [ (15 − 50)² + (25 − 50)² + (65 − 50)² + … + (25 − 50)² + (20 − 50)² ] / 1000 = 196.258
Now, for the feature Employed we have 700 Yes and 300 No, so:

For Employed = Yes:
MSE(Yes) = [ (15 − ȳ_Yes)² + (45 − ȳ_Yes)² + (94 − ȳ_Yes)² + … + (25 − ȳ_Yes)² ] / 700 = 102.68

For Employed = No:
MSE(No) = [ (25 − ȳ_No)² + (65 − ȳ_No)² + (85 − ȳ_No)² + … + (20 − ȳ_No)² ] / 300 = 78.54

where ȳ_Yes and ȳ_No are the mean defaulter amounts within each branch. The reduction in MSE for the feature Employed is then

196.258 − (700/1000 · 102.68 + 300/1000 · 78.54) = 196.258 − 95.44 ≈ 100.82

A reduction of about 100.82 means that splitting on Employed lowers the MSE of the root node by that amount. Similarly we compute the reduction for the other features, and the feature that reduces the MSE of the root node the most is chosen for splitting.
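The same MSE-reduction rule can be written as a short Python sketch. The helper names are ours, and the lists below are only the 11-row sample shown earlier rather than the full 1000-row data, so the resulting numbers will not match the worked example.

import numpy as np

def mse(values):
    # Mean squared deviation of the node's target values from the node mean.
    values = np.asarray(values, dtype=float)
    return float(np.mean((values - values.mean()) ** 2))

def mse_reduction(parent, children):
    # MSE of the parent minus the weighted MSE of the child nodes.
    n = len(parent)
    return mse(parent) - sum(len(ch) / n * mse(ch) for ch in children)

# 11-row sample from the table above: defaulter amount and employed flag.
amounts  = [15, 25, 65, 85, 45, 54, 12, 94, 100, 25, 20]
employed = [1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0]

yes = [a for a, e in zip(amounts, employed) if e == 1]
no  = [a for a, e in zip(amounts, employed) if e == 0]
print(mse_reduction(amounts, [yes, no]))   # reduction in MSE for the 'employed' split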
Now we know how to build a decision tree. But we must take care while building it: as the depth increases, the chance of overfitting increases (the tree starts fitting noisy points in the data), while a very shallow tree (depth 1, say) underfits. So the appropriate depth should be found using cross-validation.
Figure: K-fold cross-validation — the complete data (1000 rows) is split into folds 1 to 5.
In cross-validation, for each hyperparameter setting we hold out one part (1/5 or 1/10) of the original data for testing, train on the remaining folds, repeat this for every fold, and compute the overall accuracy simply by taking the average across the folds.
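In scikit-learn this depth search amounts to something like the following sketch. The data here is a stand-in generated with make_classification; in practice X and y would come from the actual problem (for example the defaulter data).

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Stand-in data; replace with the real feature matrix X and target y.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# Score each candidate depth with 5-fold cross-validation and pick the best.
for depth in [1, 3, 5, 10, 20]:
    scores = cross_val_score(DecisionTreeClassifier(max_depth=depth, random_state=0),
                             X, y, cv=5)
    print(depth, round(np.mean(scores), 3))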
6.2 Parameters Related to the Size of the Tree:
Maximum Depth: the length of the longest path from the root node to a leaf node of the tree. If a maximum depth is set externally, the tree stops growing once it reaches that depth. Too small a value (like 2 or 3) leads to underfitting, while a large value (like 10 or 20) leads to overfitting; a sketch illustrating this trade-off follows below. A tree with a very large maximum depth is also hard to interpret.
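As a rough illustration of this trade-off (again on stand-in data, not the data sets used later in this module), the sketch below compares the train and test scores of a shallow and a deep tree; a large gap between the two scores is the usual symptom of overfitting.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in data; replace with the real features and target.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for depth in (2, 20):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    # The deep tree scores close to 1.0 on train but noticeably lower on test.
    print(depth, tree.score(X_tr, y_tr), tree.score(X_te, y_te))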
6.3 Decision tree implementation in Python
6.3.1 In case of classification
Outcome is the dependent variable: it indicates whether a person is diabetic or not. The remaining columns are independent variables.
Input: sns.heatmap(data_num.corr())
Output: (correlation heatmap)
From the heatmap it is clear that there is no or only low multicollinearity among the features.
Output:
1    343
0    343
Name: Outcome, dtype: int64

Output: 1.0
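The code that prepares the train/test split and fits clf is not shown above (the isolated "Output: 1.0" is presumably its perfect training score). A minimal sketch of what it might look like, reusing the variable names clf, train_X, test_X, train_Y, test_Y and data_num that appear in the surrounding snippets, is:

import pandas as pd
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: data_num is the preprocessed (and class-balanced) diabetes dataframe.
X = data_num.drop(columns=["Outcome"])
Y = data_num["Outcome"]
train_X, test_X, train_Y, test_Y = train_test_split(X, Y, test_size=0.3, random_state=42)

# Default decision tree, no tuning yet: it fits the training data perfectly.
clf = DecisionTreeClassifier()
clf.fit(train_X, train_Y)

train_auc = roc_auc_score(train_Y, pd.DataFrame(clf.predict_proba(train_X))[1])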
test_predicted_probabilities = pd.DataFrame(clf.predict_proba(test_X))[1]
roc_auc_score(test_Y, test_predicted_probabilities)
Output: The AUC for the model built on the Test Data is : 0.663152005508693
print("The AUC for the model built on the Train Data is : ", train_auc)
print("The AUC for the model built on the Test Data is : ", test_auc)
Output:
The AUC for the model built on the Train Data is : 1.0
The AUC for the model built on the Test Data is : 0.70
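The grid-search call that produces the output printed below is also not shown. A plausible sketch, assuming the parameter grid visible in that output (the scoring metric is not visible, so it is left at scikit-learn's default here), is:

from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Parameter grid copied from the printed GridSearchCV output below.
# Note: 'auto' for max_features has been removed in newer scikit-learn releases.
param_grid = {'criterion': ['gini', 'entropy'],
              'max_depth': [5, 6, 7],
              'max_features': ['auto', 'sqrt'],
              'min_samples_leaf': [4, 5, 6, 7],
              'min_samples_split': [2, 5, 7]}

grid_search = GridSearchCV(DecisionTreeClassifier(), param_grid=param_grid, cv=5)
grid_search.fit(train_X, train_Y)   # train_X, train_Y from the split above
print(grid_search)                  # produces the summary shown below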
Output:
GridSearchCV(cv=5, estimator=DecisionTreeClassifier(),
             param_grid={'criterion': ['gini', 'entropy'],
                         'max_depth': [5, 6, 7],
                         'max_features': ['auto', 'sqrt'],
                         'min_samples_leaf': [4, 5, 6, 7],
                         'min_samples_split': [2, 5, 7]})
Train output: 0.9377810266130608
Test output: 0.7364565529144444

After parameter tuning, the test accuracy has increased from about 60% to about 73%, while the train accuracy is still about 0.93. Since there is still a large gap between the train and test scores, the model has high variance.
6.3.2 In case of regression
The data set contains the following variables. The train data has 8,523 rows with the following columns:
Item_Identifier: product ID
Item_Outlet_Sales: sales of the product in the particular store. This is the outcome variable to be predicted.
After all preprocessing steps, we train a decision tree model without any hyperparameter tuning:

reg_decision_model = DecisionTreeRegressor()
reg_decision_model.fit(X_train, y_train)
reg_decision_model.score(X_train, y_train)
Output: 1.0

Model score on the test data:
reg_decision_model.score(X_test, y_test)
Output: 0.562696
On the test data we get only about a 56% score because we have not tuned any parameters; we just initialized the tree with its default values. The tree has therefore expanded to its full depth, using splits that do not generalize, which is why it gets a perfect score on the train data (i.e., a highly overfitted model). Now we tune the parameters to get rid of this overfitting.
A simple model without any hyperparameter tuning, but evaluated with K-fold cross-validation:

resultsDTC.mean()
Output: -3.176499710061905
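The code that produces resultsDTC (and the prediction variable used in the next snippet) is not shown; it is presumably a cross_val_score call with negative-MSE scoring, which is why the mean printed above is negative. A sketch under that assumption (the fold count is also an assumption):

from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Untuned regressor evaluated with K-fold cross-validation; scikit-learn
# reports the negated mean squared error, hence the negative mean.
resultsDTC = cross_val_score(DecisionTreeRegressor(), X_train, y_train,
                             cv=5, scoring='neg_mean_squared_error')
print(resultsDTC.mean())

# Test-set predictions of the untuned model, evaluated in the next snippet.
prediction = reg_decision_model.predict(X_test)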
print('MAE:', metrics.mean_absolute_error(y_test,prediction))
print('MSE:', metrics.mean_squared_error(y_test, prediction))
print('RMSE:', np.sqrt(metrics.mean_squared_error(y_test, prediction)))
MAE: 0.8283144309737974
MSE: 3.4923738756355105
RMSE: 1.868789414470103
parameters={"splitter":["best","random"],
"max_depth" : [1,3,5,7,9],
"min_samples_leaf":[4,5,6,7,8],
"min_weight_fraction_leaf":[0.1,0.2,0.3,0.4,0.5,0.6],
"max_features":["auto","log2","sqrt",None],
"max_leaf_nodes":[None,40,50,60] }
tuning_model=GridSearchCV(reg_decision_model,param_grid=parameters,scoring=
'neg_mean_squared_error',cv=3,verbose=3)
tuned_hyper_model=
DecisionTreeRegressor(max_depth=5,max_features='sqrt',max_leaf_nodes=60,min
_samples_leaf=4,min_weight_fraction_leaf=0.1,splitter='best')
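Between these two snippets the grid search still has to be fitted and the tuned model trained to produce tuned_pred. A sketch of those missing steps, under the assumption that the hard-coded values in tuned_hyper_model came from best_params_, is:

# Fit the grid search on the training data and inspect the best combination found.
tuning_model.fit(X_train, y_train)
print(tuning_model.best_params_)     # presumably the values hard-coded above

# Refit the tuned regressor and generate the predictions evaluated below.
tuned_hyper_model.fit(X_train, y_train)
tuned_pred = tuned_hyper_model.predict(X_test)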
Error after hyperparameter tuning:
print('MAE:', metrics.mean_absolute_error(y_test,tuned_pred))
print('MSE:', metrics.mean_squared_error(y_test, tuned_pred))
print('RMSE:', np.sqrt(metrics.mean_squared_error(y_test, tuned_pred)))
MAE: 0.9994744906564477
MSE: 2.344128598627275
RMSE: 1.53105473404032
It was observed that after tuning, the MSE on the test data decreased from about 3.49 to about 2.34 (and the RMSE from about 1.87 to about 1.53).
6.5 Advantages of decision trees
Decision trees provide high accuracy and require very little preprocessing of the data, such as outlier capping, missing-value treatment or variable transformation.
They work well for non-linear relationships.
Tree-based models can be visualized very easily, with clear-cut demarcations, allowing people with no background in statistics to understand the process easily.
Decision trees work well for data cleaning, data exploration, and variable selection and creation.
Decision trees can work with high-dimensional data having both continuous and categorical variables.
Feature interactions are built into decision trees.
Decision trees are highly interpretable.
In this module we have learnt about decision trees, but our train and test scores still show high variation. In the upcoming modules we will look at some more powerful algorithms that aggregate decision trees, i.e. bagging and boosting.