
Machine Learning

19CSE305
L-T-P-C: 3-0-3-4
Lecture 8

Decision Trees
DECISION TREES

• Classic and natural model of learning


• Based on the fundamental computer science notion of “divide and conquer”
• It creates a model that predicts the value of a target variable based on several
input variables
• A tree can be learned by splitting the source set into subsets based on an
attribute value test using recursive partitioning.
• The recursion is completed when the subset at a node has all the same value
of the target variable, or when splitting no longer adds value to the
predictions.
• Although decision trees can be applied to many learning problems, we will
begin with the simplest case: binary classification
What is a Decision Tree?
● Decision Tree is a Supervised learning technique that can be used
for both classification and Regression problems, but mostly it is
preferred for solving Classification problems
● It is a tree-structured classifier, where
○ internal nodes represent the features of a dataset
○ branches represent the decision rules
○ each leaf node represents the outcome

Definition
The decision tree model of learning
The goal of the learner: figure out which questions to ask, in what order,
and what to predict once enough questions have been answered
What is a Decision Tree?

Two Main Types of Decision Trees

● Classification Trees
○ Here the decision variable is categorical/discrete
○ Binary recursive partitioning – a process of splitting the data into
partitions, and then splitting it up further on each of the branches.

Two Main Types of Decision Trees

● Regression Trees
○ Decision trees where the target variable can take continuous values
are called regression trees

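The two tree types above map directly onto two scikit-learn estimators. A minimal sketch, assuming scikit-learn is available; the tiny arrays are invented purely for illustration:

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Classification tree: categorical/discrete target (0 = "No", 1 = "Yes")
X_cls = [[25, 0], [40, 1], [35, 0], [50, 1]]   # e.g. [age, owns_house]
y_cls = [0, 1, 0, 1]
clf = DecisionTreeClassifier(criterion="entropy", max_depth=3)
clf.fit(X_cls, y_cls)
print(clf.predict([[30, 1]]))                  # -> a class label

# Regression tree: continuous target (e.g. a price)
X_reg = [[1.0], [2.0], [3.0], [4.0]]
y_reg = [1.1, 1.9, 3.2, 3.9]
reg = DecisionTreeRegressor(max_depth=2)
reg.fit(X_reg, y_reg)
print(reg.predict([[2.5]]))                    # -> a real-valued prediction
```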
Applications
The decision tree model of learning
Do people like movies by an Actor “x”?
The decision tree model of learning
Do people like movies by an Actor “x”?

Q: Is the genre of the movie “Action”?
‣ Yes
Q: Is the lead actor “x”?
‣ Yes
Q: Did people like the previous Action movies of “x”?
‣ Yes
People do like Action movies of “x”
The decision tree model of learning
Do people like movies by an Actor “x”?

Q: Is the genre of the movie “Comedy”?
‣ Yes
Q: Is the lead actor “x”?
‣ Yes
Q: Did people like the previous Comedy movies of “x”?
‣ Yes
People don’t like Comedy movies of “x”
The decision tree model of learning
Lead Actor Genre Hit(Y/N)
x Action Yes
x Fiction Yes
x Romance No
x Action Yes
y Action No
y Fiction No
y Romance Yes
The decision tree model of learning

Lead Actor   Genre     Hit (Y/N)
x            Action    Yes
x            Fiction   Yes
x            Romance   No
x            Action    Yes
y            Action    No
y            Fiction   No
y            Romance   Yes

‣ Questions are features
‣ Responses are feature values
‣ Decision is the label
Why is it a Decision Tree?

Lead Actor   Genre     Hit (Y/N)
x            Action    Yes
x            Fiction   Yes
x            Romance   No
x            Action    Yes
y            Action    No
y            Fiction   No
y            Romance   Yes

‣ We can represent the set of questions and guesses in a tree format
Decision Trees
Example 2

If we are classifying a bank loan application for a customer, the decision tree may
look like this:
Decision Trees
What is a Decision tree?

A decision tree is a tree where each node represents a feature (attribute), each
link (branch) represents a decision (rule), and each leaf represents an
outcome (a categorical or continuous value).
The tree is built over the entire data so that each leaf produces a single outcome
(or minimizes the error at every leaf).
Terms and Process
of building a
Decision Tree
Terminologies

3 types of nodes:
1. Root Node
2. Branch Node
3. Leaf Node

[Figure: an example tree with a root node A whose F/T branches lead to branch nodes B, which split again until leaf nodes are reached.]

Benefits
• It does not require any domain knowledge.
• It is easy to comprehend.
• The learning and classification steps of a decision tree are simple and fast.
Classification—A Two-Step Process [RECAP]

Model construction: describing a set of predetermined classes


◦ Each tuple/sample is assumed to belong to a predefined class, as determined by
the class label attribute
◦ The set of tuples used for model construction is the training set
◦ The model is represented as classification rules, decision trees, or mathematical
formulae
Model usage: for classifying future or unknown objects
◦ Estimate accuracy of the model
◦ The known label of each test sample is compared with the classified result from the model
◦ Accuracy rate is the percentage of test set samples that are correctly classified by the model
◦ Test set is independent of training set (otherwise overfitting)
◦ If the accuracy is acceptable, use the model to classify new data

Note: If the test set is used to select models, it is called validation (test) set
Steps in Decision Tree Construction

1. Begin the tree with the root node, say S, which contains the complete dataset.

2. Find the best attribute in the dataset using an Attribute Selection Measure (ASM).

3. Divide S into subsets that contain the possible values of the best attribute.

4. Generate the decision tree node that contains the best attribute.

5. Recursively make new decision trees using the subsets of the dataset created in step 3.
How does Decision Tree Classification Work?

Examples
Illustrating Classification Task

Training Set:
Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes

A learning algorithm performs induction on the training set to learn a model.

Test Set:
Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?

The model is then applied to the test set (deduction) to predict the unknown class labels.
Decision Tree Classification Task

Training Data (Refund and Marital Status are categorical, Taxable Income is continuous, Cheat is the class label):
Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

Model (decision tree) with splitting attributes:
- Refund = Yes -> Cheat = NO
- Refund = No  -> split on MarSt:
    - MarSt = Married            -> Cheat = NO
    - MarSt = Single or Divorced -> split on TaxInc:
        - TaxInc < 80K -> Cheat = NO
        - TaxInc > 80K -> Cheat = YES


Another Example of a Decision Tree

The same training data also fits this tree, which splits on MarSt first:
- MarSt = Married            -> Cheat = NO
- MarSt = Single or Divorced -> split on Refund:
    - Refund = Yes -> Cheat = NO
    - Refund = No  -> split on TaxInc:
        - TaxInc < 80K -> Cheat = NO
        - TaxInc > 80K -> Cheat = YES

There could be more than one tree that fits the same data!
Apply Model to Test Data

Test Data:
Refund  Marital Status  Taxable Income  Cheat
No      Married         80K             ?

Start from the root of the tree and follow the branches that match the record:
1. Refund = No -> take the “No” branch to MarSt.
2. MarSt = Married -> take the “Married” branch, which is a leaf labelled NO.

Assign Cheat to “No”.
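The traversal above can also be written as a tiny hand-coded function; this is only a sketch mirroring the tree on the slide, with made-up function and value names:

```python
def predict_cheat(refund, marital_status, taxable_income):
    """Hand-coded version of the decision tree shown above."""
    if refund == "Yes":
        return "No"
    # Refund == "No": split on marital status
    if marital_status in ("Single", "Divorced"):
        return "No" if taxable_income < 80_000 else "Yes"
    return "No"   # Married

# Test record: Refund = No, Marital Status = Married, Taxable Income = 80K
print(predict_cheat("No", "Married", 80_000))   # -> "No"
```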
Decision Tree Classification Task

The same workflow, instantiated with a tree induction algorithm as the learning algorithm: the training set (Tid 1-10 above) is passed to the tree induction algorithm, which learns the decision tree model (induction); the learned tree is then applied to the test set (Tid 11-15) to deduce the missing class labels (deduction).
ID3 Decision Tree Model

ID3 (Iterative Dichotomiser 3)
• Assumes that the attributes are categorical
• The attribute with the highest information gain is placed at the root
• Developed by J. Ross Quinlan in 1979
  (Quinlan was a computer science researcher in data mining and decision theory; he received his doctorate in computer science from the University of Washington in 1968.)
• Based on entropy
• Only works for discrete data
• Cannot work with defective (e.g. missing) data


Entropy
Entropy is a measure of the randomness in the information being
processed. The higher the entropy, the harder it is to draw any
conclusions from that information.

Flipping a coin (binary case): H(X) = −p·log₂(p) − (1−p)·log₂(1−p)
• The entropy H(X) is zero when the probability p is either 0 or 1.
• The entropy is maximum (1 bit) when the probability is 0.5.

A branch with an entropy of zero is a leaf node; a branch with entropy
greater than zero needs further splitting.
ENTROPY

Entropy(PlayGolf) = −P(yes)·log₂ P(yes) − P(no)·log₂ P(no)

For 9 “Yes” and 5 “No” examples out of 14:

−P(yes)·log₂ P(yes) = −(9/14)·log₂(9/14) = −0.64·log₂(0.64) ≈ 0.41
−P(no)·log₂ P(no)  = −(5/14)·log₂(5/14) = −0.36·log₂(0.36) ≈ 0.53

Entropy(PlayGolf) ≈ 0.94
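The 0.94 above can be checked with a few lines of Python; a minimal sketch, assuming only NumPy:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

# Play Golf column: 9 "Yes" and 5 "No" examples
play_golf = ["Yes"] * 9 + ["No"] * 5
print(round(entropy(play_golf), 2))   # -> 0.94
```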


Entropy of multiple attributes
Information Gain
Information gain (IG) is a statistical property that measures how well a given
attribute separates the training examples according to their target classification.

Constructing a decision tree is all about finding the attribute that returns the
highest information gain and the smallest entropy.
Information Gain
IG is the difference between the entropy before the split and the weighted entropy
after the split of the dataset on a given attribute:

Gain(S, A) = Entropy(S) − Σ_v (|S_v| / |S|) · Entropy(S_v)

where S_v is the subset of S for which attribute A has value v.
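A sketch of this computation, assuming pandas/NumPy; the column names in the usage comment (“Outlook”, “PlayGolf”) are placeholders for a play-golf style dataset:

```python
import numpy as np
import pandas as pd

def entropy(labels: pd.Series) -> float:
    p = labels.value_counts(normalize=True)
    return float(-np.sum(p * np.log2(p)))

def information_gain(df: pd.DataFrame, attribute: str, target: str) -> float:
    """Gain(S, A) = Entropy(S) - sum_v |S_v|/|S| * Entropy(S_v)."""
    total = entropy(df[target])
    weighted = sum((len(subset) / len(df)) * entropy(subset[target])
                   for _, subset in df.groupby(attribute))
    return total - weighted

# Usage (assumed column names): information_gain(data, "Outlook", "PlayGolf")
```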
Algorithm for Decision Tree Induction

Basic algorithm (a greedy algorithm)


◦ Tree is constructed in a top-down recursive divide-and-conquer manner

◦ At start, all the training examples are at the root

◦ Attributes are categorical (if continuous-valued, they are discretized in advance)

◦ Examples are partitioned recursively based on selected attributes

◦ Test attributes are selected on the basis of a heuristic or statistical measure (e.g.,
information gain)

Conditions for stopping partitioning


◦ All samples for a given node belong to the same class

◦ There are no remaining attributes for further partitioning – majority voting is employed
for classifying the leaf
◦ There are no samples left
ID3 Algorithm
1. Compute the entropy for the dataset.
2. For every attribute/feature:
   2.1 Calculate the entropy for all of its categorical values.
   2.2 Take the weighted average information entropy for the current attribute.
   2.3 Calculate the information gain for the current attribute.
3. Pick the attribute with the highest gain.
4. Repeat until we get the desired tree.
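A compact sketch of this recursion (categorical attributes only, no handling of missing values); pandas is assumed and the helper names are illustrative:

```python
import numpy as np
import pandas as pd

def entropy(labels):
    p = labels.value_counts(normalize=True)
    return float(-np.sum(p * np.log2(p)))

def information_gain(df, attribute, target):
    total = entropy(df[target])
    weighted = sum((len(s) / len(df)) * entropy(s[target])
                   for _, s in df.groupby(attribute))
    return total - weighted

def id3(df, target, attributes):
    # Stop: every example has the same class -> leaf with that class
    if df[target].nunique() == 1:
        return df[target].iloc[0]
    # Stop: no attributes left -> leaf with the majority class
    if not attributes:
        return df[target].mode()[0]
    # Pick the attribute with the highest information gain
    best = max(attributes, key=lambda a: information_gain(df, a, target))
    tree = {best: {}}
    for value, subset in df.groupby(best):
        remaining = [a for a in attributes if a != best]
        tree[best][value] = id3(subset, target, remaining)
    return tree
```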
Pros of the ID3 Algorithm
• Builds the decision tree in a small number of steps (greedy, top-down).
• Each level benefits from the choices made at the previous level.
• The whole dataset is scanned to create the tree.

Cons of the ID3 Algorithm
• The tree cannot be updated when new data is classified incorrectly; instead, a new tree must be generated.
• Only one attribute at a time is tested for making a decision.
• Cannot work with defective (e.g. missing) data.
• Cannot work with numerical (continuous) attributes.


Decision Tree Boundary
Decision Tree Boundary
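Boundary pictures like the ones on these slides can be reproduced with a short script; a sketch assuming scikit-learn and matplotlib, using synthetic blob data rather than the lecture's dataset:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.tree import DecisionTreeClassifier

X, y = make_blobs(n_samples=200, centers=2, random_state=0)
tree = DecisionTreeClassifier(max_depth=3).fit(X, y)

# Evaluate the tree on a grid to visualise the axis-aligned regions it carves out
xx, yy = np.meshgrid(np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 200),
                     np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 200))
zz = tree.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

plt.contourf(xx, yy, zz, alpha=0.3)
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolor="k")
plt.title("Decision tree boundary (axis-aligned splits)")
plt.show()
```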
Full Problem
using Decision
Tree
Decision Trees
Consider a dataset based on which we will determine whether to play
football or not.
Contd… (the worked example is developed step by step over the following slides)
Example 2
Example 3
Example 4
Example 5
Lead Actor         Genre     Hit (Y/N)
Amitabh Bacchan    Action    Yes
Amitabh Bacchan    Fiction   Yes
Amitabh Bacchan    Romance   No
Amitabh Bacchan    Action    Yes
Abhishek Bacchan   Action    No
Abhishek Bacchan   Fiction   No
Abhishek Bacchan   Romance   Yes


Decision trees
Advantages:
• Easy to learn
• Easy to implement

Disadvantages:
• Decision trees tend to overfit and may have high variance, so they often do not
perform well on test data.
Ensemble learning
Ensemble methods combine several decision trees (or other models) to
produce better predictive performance than a single decision tree.
The main principle behind the ensemble model is that a group of weak
learners come together to form a strong learner.
Ensemble learning refers to machine learning techniques that use the combined
output of two or more models/weak learners to solve a particular
computational intelligence problem. E.g., the Random Forest
algorithm is an ensemble of many decision trees combined.
Simple Ensemble techniques
1. Max voting
2. Averaging
3. Weighted Averaging
Max Voting
The max voting method is generally used for classification problems.
In this technique, multiple models are used to make predictions for each
data point.
The predictions by each model are considered as a ‘vote’.
The predictions from the majority of the models are used as the final
prediction.

Rating 1  Rating 2  Rating 3  Rating 4  Rating 5  Final value
5         4         5         4         4         4
Averaging
Multiple predictions are made for each data point in averaging.
In this method, we take an average of predictions from all the models
and use it to make the final prediction.
Averaging can be used for making predictions in regression problems or
while calculating probabilities for classification problems.

Rating 1  Rating 2  Rating 3  Rating 4  Rating 5  Final value
5         4         5         4         4         4.4
Weighted Averaging
All models are assigned different weights defining the importance of
each model for prediction.

          Rating 1  Rating 2  Rating 3  Rating 4  Rating 5  Final value
Weight    0.23      0.23      0.18      0.18      0.18
Rating    5         4         5         4         4         4.41

[(5×0.23) + (4×0.23) + (5×0.18) + (4×0.18) + (4×0.18)] = 4.41
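All three simple techniques fit in a few lines; a sketch using the rating numbers from the tables above (NumPy assumed):

```python
import numpy as np
from collections import Counter

ratings = [5, 4, 5, 4, 4]
weights = [0.23, 0.23, 0.18, 0.18, 0.18]

# 1. Max voting: the most common prediction wins
max_vote = Counter(ratings).most_common(1)[0][0]        # -> 4

# 2. Averaging: plain mean of the predictions
average = np.mean(ratings)                              # -> 4.4

# 3. Weighted averaging: each model's vote is scaled by its weight
weighted_average = np.dot(ratings, weights)             # -> 4.41

print(max_vote, average, round(weighted_average, 2))
```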


Advanced techniques
Stacking, Blending, Boosting and Bagging
Stacking
Stacking is an ensemble learning technique that uses the predictions from
multiple models (for example a decision tree, kNN and SVM) to build a new
model, which is then used for making predictions on the test set.
Individual learners are heterogeneous – different ML models.
Stacking uses a meta-model.
• Original data: the data is divided into n folds and used as training and test data.
• Base models: also referred to as level-0 models; they are trained on the training
data and produce the level-0 predictions as output.
• Level-0 predictions: each base model is trained on part of the training data and
produces its own predictions, known as level-0 predictions.
• Meta-model: the stacking architecture contains one meta-model, which learns to
best combine the predictions of the base models; it is also known as the level-1 model.
• Level-1 prediction: the meta-model is trained on the predictions made by the
individual base models, i.e., data not used to train the base models is fed to them,
their predictions are collected, and these predictions, together with the expected
outputs, provide the input/output pairs of the training set used to fit the meta-model.
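As a sketch, the same idea with scikit-learn's StackingClassifier; the particular base learners, meta-model and dataset are illustrative choices, not prescribed by the slides:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Level-0 (base) models: different algorithm families
base_models = [
    ("tree", DecisionTreeClassifier(max_depth=3)),
    ("knn", KNeighborsClassifier()),
    ("svm", SVC(probability=True)),
]
# Level-1 meta-model learns how to combine the base predictions
stack = StackingClassifier(estimators=base_models,
                           final_estimator=LogisticRegression(),
                           cv=5)
stack.fit(X_train, y_train)
print(stack.score(X_test, y_test))
```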
Boosting
Boosting is an ensemble method in which each predictor learns from the
preceding predictor's mistakes to make better predictions in the future.
The technique combines several weak base learners arranged sequentially,
so that each weak learner learns from the previous learner's errors,
producing a better predictive model.
In this way one strong learner is formed, significantly improving the
predictive power of the model. Examples: XGBoost, AdaBoost.
Boosting
Final model: not all models have an equal say in the final prediction.
Steps:
1. A subset is created from the original dataset.
2. Initially, all data points are given equal weights.
3. A base model is created on this subset.
4. This model is used to make predictions on the whole dataset.
5. Errors are calculated from the actual values and the predicted values.
6. The observations that are incorrectly predicted are given higher weights.
   (In the accompanying figure, the three misclassified blue plus points are given higher weights.)
7. Another model is created and predictions are made on the dataset.
   (This model tries to correct the errors of the previous model.)
8. Similarly, multiple models are created, each correcting the errors of the previous model.
9. The final model (strong learner) is the weighted mean of all the models (weak learners).
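A sketch of boosting with scikit-learn's AdaBoostClassifier, using depth-1 trees (decision stumps) as the weak learners; the dataset and parameter values are illustrative:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each round re-weights the misclassified points and fits the next weak learner
# (the keyword is named base_estimator in older scikit-learn releases)
boost = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=1),
                           n_estimators=100, learning_rate=0.5,
                           random_state=0)
boost.fit(X_train, y_train)
print(boost.score(X_test, y_test))
```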
Bagging
Bagging is mainly applied in supervised learning problems.
It involves two steps: bootstrapping and aggregation.
Bootstrapping is a random sampling method in which samples are drawn
from the data with replacement.
The first step in bagging is bootstrapping, where random data samples are
fed to each base learner; the base learning algorithm is then run on these samples.
In aggregation, the outputs from the base learners are combined.
The goal is to increase accuracy while reducing variance to a large extent,
e.g. Random Forest, where the predictions from the decision trees (base
learners) are obtained in parallel.
For regression problems these predictions are averaged to give the final
prediction; for classification problems the mode (majority class) is selected
as the predicted class.
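A sketch of bagging with scikit-learn's BaggingClassifier over decision trees; the synthetic dataset and parameter values are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Bootstrapped samples feed identical base learners; their votes are aggregated
# (the keyword is named base_estimator in older scikit-learn releases)
bag = BaggingClassifier(estimator=DecisionTreeClassifier(),
                        n_estimators=50,
                        bootstrap=True,      # sampling with replacement
                        random_state=0)
bag.fit(X_train, y_train)
print(bag.score(X_test, y_test))   # classification: majority vote of the trees
```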
Random Forest classifier
Random Forest is a classifier that builds a number of decision trees on
various subsets of the given dataset and combines them (by averaging or
majority vote) to improve the predictive accuracy on that dataset.

1. Random subsets are created from the original dataset (bootstrapping).
2. At each node in a decision tree, only a random subset of the features is
considered to decide the best split.
3. A decision tree model is fitted on each of the subsets.
4. The final prediction is calculated by averaging the predictions from
all decision trees (majority vote for classification).
Random forest
Features of Random Forest
1. Diversity- Not all attributes/variables/features are considered while
making an individual tree, each tree is different.
2. Immune to the curse of dimensionality- Since each tree does not
consider all the features, the feature space is reduced.
3. Parallelization - Each tree is created independently out of different
data and attributes. This means that we can make full use of the CPU to
build random forests.
4. Train-Test split – in a random forest we don’t have to segregate the
data into train and test sets, as roughly a third of the data (the
out-of-bag samples) is never seen by a given tree.
5. Stability- Stability arises because the result is based on majority
voting/ averaging.
Comparison

Decision trees:
1. Decision trees normally suffer from the problem of overfitting if they are
allowed to grow without any control.
2. A single decision tree is faster in computation.
3. When a dataset with features is taken as input, a decision tree formulates
a set of rules to do the prediction.

Random Forest:
1. Random forests are created from subsets of the data, and the final output is
based on averaging or majority ranking, so the problem of overfitting is taken
care of.
2. It is comparatively slower.
3. A random forest randomly selects observations, builds decision trees, and
takes the average result; it doesn’t use any single set of formulas.
Hyperparameters
The following hyperparameters increase the predictive power:
1. n_estimators – the number of trees the algorithm builds before averaging
the predictions.
2. max_features – the maximum number of features the random forest considers
when splitting a node.
3. min_samples_leaf – the minimum number of samples required to be at a leaf node.
The following hyperparameters increase the speed:
1. n_jobs – tells the engine how many processors it is allowed to use. If
the value is 1 it can use only one processor; if the value is -1 there is
no limit.
2. random_state – controls the randomness of the sampling. The model will
always produce the same results if it has a definite value of random_state
and is given the same hyperparameters and the same training data.
3. oob_score – OOB means “out of bag”. It is a random-forest cross-validation
method: roughly one-third of the samples are not used to train a given tree
and are instead used to evaluate its performance. These samples are called
out-of-bag samples.
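A sketch showing these hyperparameters on scikit-learn's RandomForestClassifier; the dataset and the particular values are illustrative:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

forest = RandomForestClassifier(
    n_estimators=200,       # number of trees built before averaging/voting
    max_features="sqrt",    # features considered when splitting a node
    min_samples_leaf=2,     # minimum samples required at a leaf node
    n_jobs=-1,              # use all available processors
    random_state=42,        # make the result reproducible
    oob_score=True,         # evaluate on the out-of-bag samples
)
forest.fit(X_train, y_train)
print("OOB score:", forest.oob_score_)
print("Test accuracy:", forest.score(X_test, y_test))
```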
