Decision Tree & Regression

A decision tree is a predictive model that uses a branching series of Boolean tests to classify or predict outcomes. It works by splitting the data into smaller groups based on attribute values, with each split aimed at increasing the homogeneity of the resulting groups. Decision trees allow for intuitive visualization and interpretation of classification or regression rules. Random forests build multiple decision trees and merge their results to improve predictive accuracy over a single tree, helping to avoid overfitting. They have become very popular due to their accuracy and ability to handle large datasets with many attributes.


What is a decision tree

• An inductive learning task
  – Use particular facts to make more generalized conclusions
• A predictive model based on a branching series of Boolean tests
  – These smaller Boolean tests are less complex than a one-stage classifier
• Let’s look at a sample decision tree…
• Statistical model objective – Predictive (industry), Explanatory (academic)
Sample
[Figure: a sample decision tree with nodes numbered 1–13. Node 1 is the root node, nodes without children are terminal/leaf nodes, and the remaining nodes are non-terminal.]
Decision Tree

A decision tree is a tree where each node represents a FEATURE (attribute), each branch (link) represents a decision (RULE), and each LEAF represents an outcome.
Algorithm to Find Root Node
• Algorithms –
  – CART
    - Gini Index – used in rpart, randomForest, scikit-learn
  – ID3
    - Entropy function
    - Information Gain
    - Misclassification
    - Chi-square
Entropy
[Slides: the entropy formula and a worked entropy calculation were shown as images; the formula and an example appear below.]
Example: Cross-sell respondents
[Figure: a decision tree for cross-sell respondents. The root node holds all records (20000, 1000). The first split is "Have a car?", giving decision nodes (13000, 540) and (7000, 460). These are split further on "PG or higher?" and "Married?", giving the leaf nodes (3000, 500), (10000, 40), (2000, 400) and (5000, 60).]
Tree Construction
• Building the tree
  – A tree where each node represents a FEATURE.
  – Each link represents a branch or decision.
  – Each leaf represents an outcome.
• Pruning – via cross validation
• How to do it in R – see the sketch below
• Binary appears in two places –
  – the response y, or
  – the splitting rule – a Boolean test question
• A decision tree can have multiple classes, but each split is still asked as a Boolean test.
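A minimal sketch of growing a classification tree in R with the rpart package, reusing the d data frame and CREDIT response that appear later in these notes (both assumed to be already loaded):

> library(rpart)
> # grow the tree; xval = 10 gives 10-fold cross validation, cp controls complexity
> rpart(as.factor(CREDIT) ~ ., data = d, method = "class",
+       control = rpart.control(cp = 0.001, xval = 10)) -> fit
> printcp(fit)          # cp table with the cross-validated error (xerror)
> plot(fit); text(fit)  # quick look at the grown tree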
Build the tree
• Choose the best split at every level – that is what the algorithm does at each node
• How do we know what is best?
  – The best split is the one that reduces the “impurity” of a node the most
• Methods to measure impurity (and therefore the best split)
  – Gini Index
  – Entropy
  – Misclassification
Gini
• Simply put – it is the probability that two randomly selected elements belong to the same class: Sum(p(i)^2) over all classes i.
• Assumption: a higher value means a purer node.
• Gini score example: p(age)^2 + p(salary)^2 + p(education)^2
• The corresponding entropy would be -[p(age) log2 p(age) + p(salary) log2 p(salary) + p(education) log2 p(education)]
Gini calculation
Class counts: C1 = 5, C2 = 6

Gini = (5/11)^2 + (6/11)^2 = 0.504

Entropy
• Information Gain measures the reduction in entropy
• Entropy is
  – -Sum[p(i) * log2 p(i)]
    where p(i) is the probability of being in class i
• Information Gain
  = Entropy(parent node) – weighted average Entropy(child nodes)
Entropy calculation
Class counts: C1 = 5, C2 = 6

Entropy = - 5/11 log2 (5/11) – 6/11 log2 (6/11) = 0.99

Misclassification

Proportion wrongly classified = 5/11
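A small R sketch that reproduces the three impurity numbers above for the class counts (C1 = 5, C2 = 6):

> counts <- c(C1 = 5, C2 = 6)
> p <- counts / sum(counts)
> sum(p^2)           # Gini (probability both random picks are the same class) = 0.504
> -sum(p * log2(p))  # Entropy = 0.99
> 1 - max(p)         # Misclassification = 5/11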


Pruning a tree
• Definition: Pruning a branch Tt from a tree T consists of deleting all descendants of node t, i.e. all of Tt except its root node t. T – Tt is the pruned tree.
[Figure: three diagrams – the full tree T (nodes 1–13), the branch T2 rooted at node 2 (nodes 2, 4, 5, 8, 9, 12, 13), and the pruned tree T – T2 (nodes 1, 2, 3, 6, 7, 10, 11), in which node 2 becomes a leaf.]


Why you need to prune
• Overfitting
• The error rate on test data shows a different pattern from the error rate on training data
• The largest tree is not the best
Pruning in practice
• 10 fold cross validation - The data is divided into 10 subsets
of equal size (at random) and then the tree is grown leaving
out one of the subsets and the performance assessed on the
subset left out from growing the tree. This is done for each
of the 10 sets. The average performance is then assessed.
• We can then use this information to decide how complex the tree needs to be (determined by the size of cp). The possible rules are to minimise the cross-validation relative error (xerror), or to use the “1-SE rule”, which takes the largest value of cp whose xerror is within one standard deviation of the minimum. Both rules are sketched below.
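A sketch of the two rules using the cp table of an rpart fit (continuing the hypothetical fit object from the earlier sketch):

> cp_tab <- fit$cptable
> # Rule 1: cp that minimises the cross-validated relative error (xerror)
> best <- which.min(cp_tab[, "xerror"])
> cp_min <- cp_tab[best, "CP"]
> # Rule 2: 1-SE rule – largest cp whose xerror is within one SE of the minimum
> threshold <- cp_tab[best, "xerror"] + cp_tab[best, "xstd"]
> cp_1se <- max(cp_tab[cp_tab[, "xerror"] <= threshold, "CP"])
> prune(fit, cp = cp_1se) -> pruned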
Optimization
• Optimize the tree branches (decision nodes).
• Optimize the depth of the tree to control complexity and over-fitting.
Regression Tree
• Y is a continuous variable
• Y = f(x1, x2, x3, …, xk)
• At each split we choose the variable for which the reduction in SS (sum of squared errors) is highest.

• Cook’s distance is a measure of how much influence each point has on the fitted line.
• Thumb rule for the cut-off:
  Cook’s distance > 1 / (n(rows) – k(features)), e.g. 1 / (40 – 1) = 1/39
• Cook’s distance is the change in the estimates with and without the influential rows.
• A higher value means the point has a significant impact.
• Fine tuning – Cook’s distance is fundamental for deciding which points to remove (see the sketch below).
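A quick R sketch of checking Cook's distance on a fitted regression (the model m_lm and the continuous response SALES are hypothetical; only the cut-off rule comes from the slide):

> lm(SALES ~ ., data = d) -> m_lm          # hypothetical continuous response
> cooks.distance(m_lm) -> cd
> cutoff <- 1 / (nrow(d) - (ncol(d) - 1))  # thumb rule: 1 / (rows - features)
> which(cd > cutoff)                       # candidate influential rows to review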
Ensemble
• A class of techniques called Ensemble (collection)
• It is extremely popular
• Because of its accuracy
Bagging
• Random Forest is based on randomizing two things –
  – Samples (rows) – it draws samples with replacement from the data.
  – Example – suppose I create a forest of 200 trees:
  – Step 1 – create 200 samples of the data by drawing rows with replacement, which means some rows are repeated and some are left out.
  – The left-out observations are called the OUT OF BAG (OOB) sample. This is the idea of bagging.
  – Step 2 – do not use all columns for every tree.
  – The thumb rule is to pick 1/3 or sqrt(number of columns) of the columns at random.
  – So out of 21 columns, 7 columns are chosen at random for the 1st tree.
  – This sampling technique is called the BOOTSTRAP.
  – So for 200 trees we create 200 bootstrap samples.
  – Basically two things: randomize the rows and randomize the columns via bootstrap sampling (a sketch of one draw follows this slide).
  – Step 3 – build the trees.
• We do not worry about overfitting of the individual trees, so we do not tune cp – the errors of deep trees tend to cancel out when the trees are aggregated.
• Once the forest is grown, a new observation is passed through every tree and each tree votes for a class.
• The class with the highest number of votes across all trees is the final decision.
• For a regression tree, the average of the y-hat values from all trees is the final result.
• The basic idea: some trees will be right and some will be wrong; when we aggregate over a large number of trees, particularly if they are diverse, they will more often be right than wrong.
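A tiny R sketch of one bootstrap draw and its out-of-bag rows, assuming the d data frame used elsewhere in these notes:

> n <- nrow(d)
> sample(n, size = n, replace = TRUE) -> in_bag   # rows for one tree; some repeat
> setdiff(seq_len(n), in_bag) -> oob              # left-out rows = out-of-bag sample
> floor(sqrt(ncol(d) - 1)) -> mtry                # columns tried per split (classification thumb rule)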
Class of Technique - Ensemble
• Example – if I want to predict the Karnataka election outcome, I will not rely on one person; I will ask many groups of people, i.e. rely on many independent opinions.
• That is the whole idea (a collection of predictions).
• So there is a class of techniques called Ensemble techniques.
• They are used not so much for their explanatory power but because of their accuracy.
• Two broad types of these techniques
  – Bagging techniques
  – Boosting techniques – the rocket science in ensembles is this
• An ensemble technique is either a bagging or a boosting technique.
• The most popular bagging technique is Random Forest.
• Boosting – gradient boosting, AdaBoost.
• These are basically collections of predictors used together to come up with an estimate.
• The popular base learner underlying both techniques is a TREE.
• The foundation of both is the decision tree.
• So if I say “talk to 100 people”, I mean “grow 100 trees”. That is what it means.
How does Random forest work?
Random Forest is built by aggregating trees
Can be used for classification and regression
Avoids overfitting
Can deal with a large number of features
Helps with feature selection based on importance
User friendly: only 2 free parameters
- Number of trees – ntree, default 500
- Number of variables randomly sampled as candidates at each split – mtry
  - default is sqrt(p) for classification and p/3 for regression
  - mtry drives the column selection, so that every tree is a randomized tree rather than a similar one.
> library(randomForest)
> randomForest(as.factor(CREDIT) ~ ., data = d, ntree = 200, mtry = 5) -> rf
> rf
> names(rf)
- OOB – this sample is not used for building a given tree, so every tree has an OOB sample left out; each OOB row is passed to its tree for testing.
  Some of them will be misclassified.
  The misclassifications across all OOB samples give the OOB error.
RF Steps
• 1. Draw ntree bootstrap samples.
• The OOB samples are used to estimate how the Random Forest model will perform on a new data set.
• See the error rate: > rf$err.rate
• This shows how the error rate falls as the number of trees increases.
• Plot the OOB error rate:
• > plot(rf) – from this plot we can check the optimum number of trees, i.e. the point after which adding trees no longer reduces the error significantly.
> randomForest(as.factor(CREDIT) ~ ., data = d, ntree = 170, mtry = 5) -> rf
> rf
Predict from this forest:
> predict(rf, test) -> pred
> table(test$CREDIT, pred)

Some interesting ways to look at the important variables found by random forest.
Random Forest
• > getTree(rf, 15) – returns the full structure of tree 15.
• Even though it is a black box, we can see which variables are important.
• > importance(rf) – shows the important variables.
• It gives some storytelling power.
• Random forest can be used to build a knowledge base in an easy way.
• It also makes it easy to find the important variables out of a large number.
• Using this information we can build a logistic regression, and if the error is not much different, give the explanation from the LR model.
• > varImpPlot(rf) – plots the importance of the variables.
• With random forest we do not need to worry about “multicollinearity”.
BOOSTING
• It starts by fitting a model on the data set, which leaves some error; subsequent models are fitted to reduce that error.
• pred(t) = pred(1) + pred(2) + … + pred(t-1)
• As long as the error keeps reducing, new models are created sequentially and added to the prediction.
• This method works only with numeric variables, not with categorical or factor variables,
• because it works with gradient descent, which requires derivatives.
Boosting
• Convert all factor variables into dummy variables.
• A shortcut method is to run a logistic regression model
• and convert the result into a model matrix of dummy variables.
  – > glm(CREDIT ~ ., data = train, family = binomial) -> m
  – > model.matrix(m) -> dv
  – > View(dv)
  – Run the xgboost model
  – > dim(dv)
  – > library(xgboost)
  – > xgboost(data = dv, label = train$CREDIT, nrounds = 3, max_depth = 2,
      objective = "binary:logistic") -> xg
  – > xgboost(data = dv, label = train$CREDIT, nrounds = 20, max_depth = 2,
      objective = "binary:logistic") -> xg
  – For the test dataset, see the sketch below.
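The notes stop at the test step; a hedged sketch of how the test set could be scored with the hypothetical xg model above (the same dummy-variable formula is applied to the test data):

> model.matrix(CREDIT ~ ., data = test) -> dv_test
> predict(xg, dv_test) -> prob            # predicted probabilities from binary:logistic
> table(test$CREDIT, prob > 0.5)          # confusion table at a 0.5 cut-off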
