Decision Trees

The document discusses Decision Trees as a supervised machine learning method used for classification and regression tasks, highlighting their structure and key algorithms like CART and C4.5. It outlines the advantages of Decision Trees, such as ease of understanding and less data cleaning, as well as disadvantages like overfitting and challenges with continuous variables. Additionally, it covers important concepts like entropy, Gini index, and various performance metrics used to evaluate the effectiveness of Decision Trees.


Decision Trees

Supervised Machine Learning with Decision Trees


Decision Trees

[Slide graphic: a decision node asking, "Does this human earn more than $128.12?"]

● Banks can't blindly trust the machine's answer.
● What if there's a system failure, a hacker attack, or a quick fix from a senior?
● To deal with this, we have Decision Trees.
● All the data is automatically divided into yes/no questions.
● The questions can sound a bit weird from a human perspective, e.g., does the creditor earn more than $128.12?
● Still, the machine comes up with such questions because they split the data best at each step.
Decision Trees

[Slide graphic: a comically convoluted question, "Who is the mother of the father of the niece of the brother of the nephew of the son of the insurer?"]

● That's how a tree is made.
● The higher the branch, the broader the question.
● Any analyst can take the tree and explain it afterwards. They may not understand it, but they can explain it easily! (typical analysts)
● Decision trees are widely used in high-responsibility fields:
○ Diagnostics
○ Medicine
○ Finance
Decision Trees

● The two most popular algorithms for forming the trees are CART and C4.5.
● Pure decision trees are rarely used today.
● However, they often set the basis for large systems, and their ensembles can even work better than neural networks.
● When you google something, it's precisely a bunch of these dumb trees searching for a range of answers for you.
● Search engines love them because they're fast.
Decision Trees

The CART Algorithm

[Slide diagram: a binary tree repeatedly splitting the Data into Train/Test subsets, level by level.]

The CART Algorithm

● Scikit-Learn uses the Classification And Regression Tree (CART) algorithm to train Decision Trees (also called “growing” trees).
● The idea is really quite simple: the algorithm first splits the training set into two subsets using a single feature k and a threshold t_k.
● Once it has successfully split the training set in two, it splits the subsets using the same logic, then the sub-subsets, and so on, recursively.
● CART is a greedy algorithm: it greedily searches for an optimum split at the top level, then repeats the process at each level.
● It does not check whether or not the split will lead to the lowest possible impurity several levels down, as the sketch below shows.
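A minimal sketch of growing a CART tree with Scikit-Learn's DecisionTreeClassifier; the iris dataset, the max_depth value, and the random seeds are illustrative assumptions, not part of the slides:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Illustrative stand-in data; any labelled dataset would do.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Scikit-Learn's DecisionTreeClassifier grows the tree with CART:
# at each node it greedily picks one feature k and one threshold t_k.
clf = DecisionTreeClassifier(max_depth=3, random_state=42)
clf.fit(X_train, y_train)

print(clf.score(X_test, y_test))  # accuracy on the held-out test set
print(export_text(clf))           # the learned yes/no questions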
Important Terminology related to Decision Trees

● Root Node: It represents the entire population or sample, and it further gets divided into two or more homogeneous sets.
● Splitting: The process of dividing a node into two or more sub-nodes.
● Decision Node: When a sub-node splits into further sub-nodes, it is called a decision node.
● Leaf/Terminal Node: Nodes that do not split are called leaf or terminal nodes.
● Pruning: Removing sub-nodes of a decision node is called pruning. You can say it is the opposite of splitting.
● Branch/Sub-Tree: A subsection of the entire tree is called a branch or sub-tree.
● Parent and Child Node: A node which is divided into sub-nodes is called the parent node, whereas the sub-nodes are its children (the sketch below shows these node types on a small fitted tree).
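To make the terminology concrete, this minimal sketch prints a small fitted tree: the first split is the root node, inner splits are decision nodes, and the "class: ..." lines are leaf/terminal nodes. Iris is again an illustrative stand-in:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# Root node on the first line, decision nodes indented below it,
# and "class: ..." lines as the leaves.
print(export_text(tree, feature_names=iris.feature_names))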
Decision Trees
Pruning

● The technique of setting constraints is a greedy approach.
● The algorithm checks for the best split instantaneously and moves forward until one of the specified stopping conditions is reached.
Pruning

● Consider the following case when you’re driving:


Pruning

● There are 2 lanes:


○ A lane with cars moving at 80 km/h

○ A lane with trucks moving at 30 km/h

● At this instant, you are the yellow car and you have 2 choices:
○ Take a left and overtake the other 2 cars quickly

○ Keep moving in the present lane


Pruning

● Let's analyze these choices.
● With the former choice, you'll immediately overtake the car ahead, reach behind the truck, and start moving at 30 km/h, looking for an opportunity to move back right.
● All the cars originally behind you move ahead in the meanwhile.
Pruning

● This would be the optimum choice if your objective is to maximize the distance covered in, say, the next 10 seconds.
● With the latter choice, you sail through at the same speed, cross the trucks, and then overtake, maybe, depending on the situation ahead. Greedy you!
● This is exactly the difference between a normal decision tree and pruning.
● A decision tree with constraints won't see the truck ahead and will adopt the greedy approach of taking the left.
● On the other hand, if we use pruning, we in effect look a few steps ahead and make a choice, as the sketch below illustrates.
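A minimal sketch of this "look a few steps ahead, then cut back" idea using Scikit-Learn's cost-complexity pruning; the dataset and seeds are illustrative assumptions:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Grow the full greedy tree first, then cut it back: a larger ccp_alpha
# prunes more aggressively, trading training fit for generalization.
path = DecisionTreeClassifier(random_state=42).cost_complexity_pruning_path(X_train, y_train)
for alpha in path.ccp_alphas:
    pruned = DecisionTreeClassifier(random_state=42, ccp_alpha=alpha).fit(X_train, y_train)
    print(f"alpha={alpha:.4f}  leaves={pruned.get_n_leaves()}  "
          f"test accuracy={pruned.score(X_test, y_test):.3f}")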
Advantages

Easy to Understand

● Decision tree output is very easy to understand, even for people from a non-analytical background.
● It does not require any statistical knowledge to read and interpret.
● Its graphical representation is very intuitive, and users can easily relate it to their own hypotheses.
Advantages

Useful in Data exploration

● A decision tree is one of the fastest ways to identify the most significant variables and the relations between two or more variables.
● With the help of decision trees, we can create new variables/features that have better power to predict the target variable.
● It can also be used in the data exploration stage. For example, when working on a problem where information is available in hundreds of variables, a decision tree will help to identify the most significant ones, as the sketch below shows.
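As a minimal sketch of that exploration step, a fitted tree can rank variables by the importance it assigns them; iris is an illustrative stand-in for a wide dataset:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
tree = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

# Rank variables by the importance the fitted tree assigns them.
ranked = sorted(zip(iris.feature_names, tree.feature_importances_),
                key=lambda pair: pair[1], reverse=True)
for name, importance in ranked:
    print(f"{name:20s} {importance:.3f}")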
Advantages

Less data cleaning required

● It requires less data cleaning compared to some other modeling techniques.


● It is fairly robust to outliers and missing values.
Advantages

Data type is not a constraint

● It can handle both numerical and categorical variables.

Non Parametric Method

● A decision tree is considered to be a non-parametric method.


● This means that decision trees have no assumptions about the space distribution and the classifier
structure.
Disadvantages

Overfitting

● Overfitting is one of the most practical difficulties for decision tree models.
● This problem is tackled by setting constraints on the model parameters and by pruning, as sketched below.
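A minimal sketch of such constraints; the parameter values are illustrative assumptions and would normally be tuned, e.g., by cross-validation:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Each constraint stops the tree from growing past a point.
constrained = DecisionTreeClassifier(
    max_depth=5,            # no more than 5 levels of questions
    min_samples_split=20,   # don't split nodes holding fewer than 20 samples
    min_samples_leaf=10,    # every leaf must keep at least 10 samples
).fit(X, y)

print(constrained.get_depth(), constrained.get_n_leaves())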
Disadvantages

Not fit for continuous variables

● While working with continuous numerical variables, the decision tree loses information when it categorizes the variables into different bins.
Entropy
● The concept of entropy originated in thermodynamics as a measure
of molecular disorder.

● Entropy approaches zero when molecules are still and well ordered.

● In Machine Learning, it is frequently used as an impurity measure.

● A set’s entropy is zero when it contains instances of only one class.

● The closer it is to zero, the better your algorithm.


[Slide graphic: well-ordered, still molecules labelled "Entropy ≈ 0", alongside the thermodynamic relation dS = dQ/T.]
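In Machine Learning, the entropy of a set with class proportions p_i is H = -sum_i p_i * log2(p_i), a formula the slides leave implicit. A minimal sketch (the helper function below is illustrative):

import numpy as np

def entropy(labels):
    """Shannon entropy H = -sum_i p_i * log2(p_i), in bits."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

print(entropy([0, 0, 0, 0]))  # 0.0 -> pure set, instances of only one class
print(entropy([0, 0, 1, 1]))  # 1.0 -> maximally mixed two-class set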
Gini Index

● We use the Gini Index as the cost function to evaluate splits in the dataset.
● A Gini score gives an idea of how good a split is by how mixed the classes are in the two groups created by the split.
● A perfect separation results in a Gini score of 0.
● The worst-case split for two classes (a 50/50 mix in each group) results in a score of 0.5; with many classes, the score approaches 1.
● CART (Classification and Regression Tree) uses the Gini method to create binary splits, as the sketch below shows.
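The Gini index of a group with class proportions p_i is G = 1 - sum_i p_i^2. A minimal sketch (the helper function is illustrative):

import numpy as np

def gini(labels):
    """Gini impurity G = 1 - sum_i p_i^2."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

print(gini([0, 0, 0, 0]))  # 0.0 -> perfect separation
print(gini([0, 0, 1, 1]))  # 0.5 -> worst case for two classes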
Performance Metrics

Metric | What does it say? | What it should be?
Accuracy | Proportion of correctly classified samples among all samples. | Closer to 1, the better
Precision & Recall | Precision: share of predicted positives that are truly positive. Recall: share of actual positives that are correctly found. | Closer to 1, the better
F1 Score | Harmonic mean of precision and recall. | Closer to 1, the better
Confusion Matrix | Identifies a class that's constantly mistaken for some other class. | If the classifier is perfect, non-zero values appear only on the main diagonal.
Receiver Operating Characteristic (ROC) Curve | Probability curve: true positive rate against false positive rate at every threshold. | Closer to the top-left corner, the better
Area Under the ROC Curve (AUC) | How capable the model is of distinguishing between classes. | Closer to 1, the better
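A minimal sketch of computing these metrics with Scikit-Learn; the breast-cancer dataset and tree parameters are illustrative stand-ins:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

# Illustrative binary-classification data; any labelled dataset would do.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_train, y_train)
y_pred = clf.predict(X_test)

print("accuracy :", accuracy_score(y_test, y_pred))
print("precision:", precision_score(y_test, y_pred))
print("recall   :", recall_score(y_test, y_pred))
print("F1       :", f1_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))  # off-diagonal entries are mistakes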
AUC and ROC Curve
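A minimal sketch of the ROC curve and AUC for the same kind of classifier; both are computed from the predicted probability of the positive class rather than hard labels, and the setup is again an illustrative stand-in:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_curve, roc_auc_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_train, y_train)

# Probability of the positive class drives every point on the ROC curve.
probs = clf.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, probs)
print("AUC:", roc_auc_score(y_test, probs))  # closer to 1, the better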
