Lesson 10 Decision Trees
Data Science
DECISION TREES
Classification
In addition to analytical methods such as clustering, association rule learning,
and modeling techniques like regression, classification is another fundamental
learning method that appears in applications related to data mining.
*Text taken from Data Science and Big Data Analytics by EMC Education Services
Classification
Most classification methods are supervised: unlike clustering, they start with a
training set of prelabeled observations and learn how the attributes of those
observations contribute to the classification of future unlabeled observations.
For example, existing marketing, sales, and customer demographic data can
be used to develop a classifier to assign a “purchase” or “no purchase” label to
potential future customers.
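To make this concrete, here is a minimal sketch of that workflow in Python with scikit-learn; the customer attributes and labels are made up for illustration.

```python
# Minimal sketch of supervised classification (hypothetical data):
# fit a classifier on prelabeled customers, then label new ones.
from sklearn.tree import DecisionTreeClassifier

# Training set: [age, income] per customer, with predetermined labels.
X_train = [[25, 40000], [47, 95000], [35, 62000], [52, 30000]]
y_train = ["no purchase", "purchase", "purchase", "no purchase"]

clf = DecisionTreeClassifier().fit(X_train, y_train)

# Assign labels to future, unlabeled customers.
print(clf.predict([[30, 58000], [60, 28000]]))
```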
Classification
Classification is widely used for prediction purposes. For example, by building
a classifier on the transcripts of United States Congressional floor debates, it
can be determined whether the speeches represent support or opposition to
proposed legislation.
Decision Trees
A decision tree is one of two fundamental classification methods; the other
is naïve Bayes.
A decision tree (also called prediction tree) uses a tree structure to specify
sequences of decisions and consequences.
Given input X = {x1, x2, ..., xn}, the goal is to predict a response or output
variable Y. Each member xi of the set is called an input variable. The input
values of a decision tree can be categorical or continuous.
Decision Trees
The prediction can be achieved by constructing a decision tree with test points
and branches. At each test point, a decision is made to pick a specific branch
and traverse down the tree. Eventually, a final point is reached, and a
prediction can be made.
Each test point in a decision tree involves testing a particular input variable (or
attribute), and each branch represents the decision being made. Because of their
flexibility and easy visualization, decision trees are commonly deployed in data
mining applications for classification purposes.
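A minimal sketch of this traversal, assuming a hypothetical tree encoded as nested dictionaries: each test point examines one input variable, each branch is a decision, and a plain string is a final prediction.

```python
# Hypothetical decision tree: internal nodes test one attribute,
# branches are the decisions, plain strings are leaf predictions.
tree = {
    "attribute": "income",
    "test": lambda v: v >= 50000,
    "yes": "purchase",
    "no": {
        "attribute": "age",
        "test": lambda v: v >= 40,
        "yes": "purchase",
        "no": "no purchase",
    },
}

def predict(node, record):
    # At each test point, pick a branch and traverse down the tree
    # until a final point (a plain label) is reached.
    while isinstance(node, dict):
        branch = "yes" if node["test"](record[node["attribute"]]) else "no"
        node = node[branch]
    return node

print(predict(tree, {"income": 30000, "age": 45}))  # -> purchase
```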
Decision Trees
A decision tree employs a structure of test points (called nodes) and
branches, which represent the decision being made.
A node without further branches is called a leaf node. Leaf nodes return
class labels and, in some implementations, probability scores as well.
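For instance, in scikit-learn a fitted tree exposes both behaviors: predict returns the leaf's class label, while predict_proba returns the class proportions observed at that leaf. A small sketch on made-up data:

```python
from sklearn.tree import DecisionTreeClassifier

# Toy single-attribute data, made up for illustration.
X = [[0], [0], [1], [1], [1]]
y = ["no", "yes", "yes", "yes", "no"]

clf = DecisionTreeClassifier(max_depth=1).fit(X, y)

print(clf.predict([[1]]))        # class label returned by the leaf
print(clf.predict_proba([[1]]))  # probability score per class at the leaf
```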
Decision Trees
In the following example rule, income and mortgage_amount are input
variables, and the response is the output variable default with a probability
score.
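For example, such a rule might read (values are illustrative):

IF income < $50,000 AND mortgage_amount > $100K
THEN default = true WITH PROBABILITY 75%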
Decision Trees
Decision trees have two varieties: classification trees and regression trees.
Classification trees usually apply to output variables that are categorical—
often binary—in nature, such as yes or no, purchase or not purchase, and so
on.
Regression trees, on the other hand, can apply to output variables that are
numeric or continuous, such as the predicted price of a consumer good or the
likelihood a subscription will be purchased.
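A minimal sketch of the two varieties side by side in scikit-learn, on made-up data:

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X = [[18], [25], [40], [60]]                 # a single input variable (age)

# Classification tree: categorical output (purchase or not purchase).
y_label = ["no", "no", "yes", "yes"]
print(DecisionTreeClassifier().fit(X, y_label).predict([[35]]))

# Regression tree: numeric output (e.g., predicted price of a good).
y_price = [9.99, 12.50, 19.99, 24.00]
print(DecisionTreeRegressor().fit(X, y_price).predict([[35]]))
```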
Decision Trees
Decision trees can be applied to a variety of situations. They can be easily
represented in a visual way, and the corresponding decision rules are quite
straightforward.
Overview of Decision Trees
Figure 7-1 shows an example of
using a decision tree to predict
whether customers will buy a
product.
Overview of Decision Trees
If a decision is numerical, the
“greater than” branch is usually
placed on the right, and the “less
than” branch is placed on the left.
Overview of Decision Trees
Internal nodes are the decision or
test points.
Overview of Decision Trees
The decision tree in Figure 7-1 is a
binary tree in that each internal node
has no more than two branches.
Overview of Decision Trees
Sometimes decision trees may have more than two branches stemming from a
node. For example, if an input variable Weather is categorical and has three
choices—Sunny, Rainy, and Snowy—the corresponding node Weather in the
decision tree may have three branches labeled as Sunny, Rainy, and Snowy,
respectively.
The depth of a node is the minimum number of steps required to reach the
node from the root. In Figure 7-1 for example, nodes Income and Age have a
depth of one, and the four nodes on the bottom of the tree have a depth of two.
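A minimal sketch of such a non-binary node, using a hypothetical Weather tree encoded as nested dictionaries (the inner Wind attribute is also an assumption for illustration); the traversal reports the depth at which the leaf is reached:

```python
# Hypothetical tree whose root has three branches (Sunny/Rainy/Snowy).
tree = {
    "attribute": "Weather",
    "branches": {
        "Sunny": "play",                      # leaf at depth 1
        "Rainy": "stay home",                 # leaf at depth 1
        "Snowy": {                            # internal node at depth 1
            "attribute": "Wind",
            "branches": {"Strong": "stay home", "Weak": "play"},
        },
    },
}

def predict(node, record, depth=0):
    if not isinstance(node, dict):            # leaf: all decisions made
        return node, depth
    child = node["branches"][record[node["attribute"]]]
    return predict(child, record, depth + 1)

print(predict(tree, {"Weather": "Snowy", "Wind": "Weak"}))  # ('play', 2)
```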
Overview of Decision Trees
Leaf nodes are at the end of the last branches on the tree. They represent class
labels—the outcome of all the prior decisions. The path from the root to a leaf
node contains a series of decisions made at various internal nodes.
Overview of Decision Trees
Each internal node effectively acts
as the root of a subtree, and a best
test for each node is determined
independently of the other internal
nodes.
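A minimal sketch of that greedy recursion, assuming categorical attributes; best_attribute is a hypothetical scoring helper (for example, the information-gain function sketched later in this lesson), and each call chooses a test using only its own subset of the records:

```python
from collections import Counter

def build(rows, labels, attributes, best_attribute):
    # Pure subset: no further test needed, return a leaf label.
    if len(set(labels)) == 1:
        return labels[0]
    # No attributes left to test: fall back to the majority label.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Best test for THIS node, chosen independently of other nodes.
    attr = best_attribute(rows, labels, attributes)
    node = {"attribute": attr, "branches": {}}
    for value in set(row[attr] for row in rows):
        keep = [i for i, row in enumerate(rows) if row[attr] == value]
        node["branches"][value] = build(
            [rows[i] for i in keep],
            [labels[i] for i in keep],
            [a for a in attributes if a != attr],
            best_attribute,
        )
    return node
```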
Overview of Decision Trees
To illustrate how a decision tree works, consider the case of a bank that wants to
market its term deposit products (such as Certificates of Deposit) to the
appropriate customers.
The dataset used here is based on the original dataset collected from a
Portuguese bank on directed marketing campaigns as stated in the work by
Moro et al. [6].
Overview of Decision Trees
Figure 7-3 shows a subset of the
modified bank marketing dataset.
Overview of Decision Trees
To make the example simple, the subset only keeps the following categorical
variables: (1) job, (2) marital status, (3) education level, (4) if the credit is in
default, (5) if there is a housing loan, (6) if the customer currently has a personal
loan, (7) contact type, (8) result of the previous marketing campaign contact
(poutcome), and finally (9) if the client actually subscribed to the term deposit.
Attributes (1) through (8) are input variables, and (9) is considered the outcome.
The outcome subscribed is either yes (meaning the customer will subscribe to
the term deposit) or no (meaning the customer won’t subscribe).
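A minimal sketch of fitting a classification tree to a table like this one in Python; the file name "bank.csv" and the exact column names are assumptions for illustration, not the actual course files:

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("bank.csv")                   # hypothetical file name

inputs = ["job", "marital", "education", "default",
          "housing", "loan", "contact", "poutcome"]

X = pd.get_dummies(df[inputs])                 # one-hot encode categoricals
y = df["subscribed"]                           # yes / no outcome

clf = DecisionTreeClassifier(max_depth=3).fit(X, y)
print(clf.predict(X.head()))
```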
Overview of Decision Trees
Figure 7-4 shows a decision tree
built over the bank marketing
dataset.
Overview of Decision Trees
At each split, the decision tree
algorithm picks the most informative
attribute out of the remaining
attributes.
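One common way to score "most informative" is entropy-based information gain; a minimal sketch on made-up poutcome records:

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    # Entropy of the outcome, minus the weighted entropy after
    # splitting the records on the candidate attribute's values.
    gain, n = entropy(labels), len(labels)
    for v in set(values):
        subset = [lab for val, lab in zip(values, labels) if val == v]
        gain -= (len(subset) / n) * entropy(subset)
    return gain

# Made-up records: poutcome value paired with the subscribed outcome.
poutcome   = ["failure", "failure", "success", "success", "unknown"]
subscribed = ["no",      "no",      "yes",     "yes",     "no"]
print(information_gain(poutcome, subscribed))  # higher = more informative
```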
Overview of Decision Trees
At the first split, the decision tree
algorithm chooses the poutcome
attribute. There are two nodes at
depth=1.
Overview of Decision Trees
Of the two nodes at depth=1, the right node represents the portion of the
population for which the outcome of the previous marketing campaign
contact is a success.
Overview of Decision Trees
This node further splits into two
nodes based on the education level.