Lesson 10 Decision Trees
Data Science
DECISION TREES
Classification
In addition to analytical methods such as clustering, association rule learning,
and modeling techniques like regression, classification is another fundamental
learning method that appears in applications related to data mining.
*Text taken from Data Science and Big Data Analytics by EMC Education Services
Classification
Most classification methods are supervised: unlike clustering, they start with a
training set of prelabeled observations and learn how the attributes of those
observations contribute to the classification of future unlabeled observations.
For example, existing marketing, sales, and customer demographic data can
be used to develop a classifier to assign a “purchase” or “no purchase” label to
potential future customers.
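To make this concrete, here is a minimal sketch of that workflow in Python with scikit-learn; the customer attributes and labels are made up for illustration.

```python
# Minimal sketch of supervised classification (hypothetical data):
# fit a classifier on prelabeled customers, then label new ones.
from sklearn.tree import DecisionTreeClassifier

# Training set: [age, income] per customer, with predetermined labels.
X_train = [[25, 40000], [47, 95000], [35, 62000], [52, 30000]]
y_train = ["no purchase", "purchase", "purchase", "no purchase"]

clf = DecisionTreeClassifier().fit(X_train, y_train)

# Assign labels to future, unlabeled customers.
print(clf.predict([[30, 58000], [60, 28000]]))
```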
Classification
Classification is widely used for prediction purposes. For example, by building
a classifier on the transcripts of United States Congressional floor debates, it
can be determined whether the speeches represent support or opposition to
proposed legislation.
Decision Trees
A decision tree is one of two fundamental classification methods; the other
is naïve Bayes.
A decision tree (also called prediction tree) uses a tree structure to specify
sequences of decisions and consequences.
Given input X = {x1, x2, ..., xn}, the goal is to predict a response or output
variable Y. Each member xi of the set is called an input variable. The input
values of a decision tree can be categorical or continuous.
Decision Trees
The prediction can be achieved by constructing a decision tree with test points
and branches. At each test point, a decision is made to pick a specific branch
and traverse down the tree. Eventually, a final point is reached, and a
prediction can be made.
Each test point in a decision tree involves testing a particular input variable (or
attribute), and each branch represents the decision being made. Because of their
flexibility and easy visualization, decision trees are commonly deployed in data
mining applications for classification purposes.
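A minimal sketch of this traversal, assuming a hypothetical tree encoded as nested dictionaries: each test point examines one input variable, each branch is a decision, and a plain string is a final prediction.

```python
# Hypothetical decision tree: internal nodes test one attribute,
# branches are the decisions, plain strings are leaf predictions.
tree = {
    "attribute": "income",
    "test": lambda v: v >= 50000,
    "yes": "purchase",
    "no": {
        "attribute": "age",
        "test": lambda v: v >= 40,
        "yes": "purchase",
        "no": "no purchase",
    },
}

def predict(node, record):
    # At each test point, pick a branch and traverse down the tree
    # until a final point (a plain label) is reached.
    while isinstance(node, dict):
        branch = "yes" if node["test"](record[node["attribute"]]) else "no"
        node = node[branch]
    return node

print(predict(tree, {"income": 30000, "age": 45}))  # -> purchase
```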
Decision Trees
A decision tree employs a structure of test points (called nodes) and
branches, which represent the decision being made.
A node without further branches is called a leaf node. Leaf nodes return
class labels and, in some implementations, probability scores as well.
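For instance, in scikit-learn a fitted tree exposes both behaviors: predict returns the leaf's class label, while predict_proba returns the class proportions observed at that leaf. A small sketch on made-up data:

```python
from sklearn.tree import DecisionTreeClassifier

# Toy single-attribute data, made up for illustration.
X = [[0], [0], [1], [1], [1]]
y = ["no", "yes", "yes", "yes", "no"]

clf = DecisionTreeClassifier(max_depth=1).fit(X, y)

print(clf.predict([[1]]))        # class label returned by the leaf
print(clf.predict_proba([[1]]))  # probability score per class at the leaf
```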
Decision Trees
In the following example rule, income and mortgage_amount are input
variables, and the response is the output variable default with a probability
score.
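For example, such a rule might read (values are illustrative):

IF income < $50,000 AND mortgage_amount > $100K
THEN default = true WITH PROBABILITY 75%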
Decision Trees
Decision trees have two varieties: classification trees and regression trees.
Classification trees usually apply to output variables that are categorical—
often binary—in nature, such as yes or no, purchase or not purchase, and so
on.
Regression trees, on the other hand, can apply to output variables that are
numeric or continuous, such as the predicted price of a consumer good or the
likelihood a subscription will be purchased.
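A minimal sketch of the two varieties side by side in scikit-learn, on made-up data:

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X = [[18], [25], [40], [60]]                 # a single input variable (age)

# Classification tree: categorical output (purchase or not purchase).
y_label = ["no", "no", "yes", "yes"]
print(DecisionTreeClassifier().fit(X, y_label).predict([[35]]))

# Regression tree: numeric output (e.g., predicted price of a good).
y_price = [9.99, 12.50, 19.99, 24.00]
print(DecisionTreeRegressor().fit(X, y_price).predict([[35]]))
```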
Decision Trees
Decision trees can be applied to a variety of situations. They can be easily
represented in a visual way, and the corresponding decision rules are quite
straightforward.
Overview of Decision Trees
Figure 7-1 shows an example of
using a decision tree to predict
whether customers will buy a
product.
Overview of Decision Trees
If a decision is numerical, the
“greater than” branch is usually
placed on the right, and the “less
than” branch is placed on the left.
Overview of Decision Trees
Internal nodes are the decision or
test points.
Overview of Decision Trees
The decision tree in Figure 7-1 is a
binary tree in that each internal node
has no more than two branches.
Overview of Decision Trees
Sometimes decision trees may have more than two branches stemming from a
node. For example, if an input variable Weather is categorical and has three
choices—Sunny, Rainy, and Snowy—the corresponding node Weather in the
decision tree may have three branches labeled as Sunny, Rainy, and Snowy,
respectively.
The depth of a node is the minimum number of steps required to reach the
node from the root. In Figure 7-1 for example, nodes Income and Age have a
depth of one, and the four nodes on the bottom of the tree have a depth of two.
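A minimal sketch of such a non-binary node, using a hypothetical Weather tree encoded as nested dictionaries (the inner Wind attribute is also an assumption for illustration); the traversal reports the depth at which the leaf is reached:

```python
# Hypothetical tree whose root has three branches (Sunny/Rainy/Snowy).
tree = {
    "attribute": "Weather",
    "branches": {
        "Sunny": "play",                      # leaf at depth 1
        "Rainy": "stay home",                 # leaf at depth 1
        "Snowy": {                            # internal node at depth 1
            "attribute": "Wind",
            "branches": {"Strong": "stay home", "Weak": "play"},
        },
    },
}

def predict(node, record, depth=0):
    if not isinstance(node, dict):            # leaf: all decisions made
        return node, depth
    child = node["branches"][record[node["attribute"]]]
    return predict(child, record, depth + 1)

print(predict(tree, {"Weather": "Snowy", "Wind": "Weak"}))  # ('play', 2)
```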
Overview of Decision Trees
Leaf nodes are at the end of the last branches on the tree. They represent class
labels—the outcome of all the prior decisions. The path from the root to a leaf
node contains a series of decisions made at various internal nodes.
Overview of Decision Trees
Each internal node effectively acts
as the root of a subtree, and a best
test for each node is determined
independently of the other internal
nodes.
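A minimal sketch of that greedy recursion, assuming categorical attributes; best_attribute is a hypothetical scoring helper (for example, the information-gain function sketched later in this lesson), and each call chooses a test using only its own subset of the records:

```python
from collections import Counter

def build(rows, labels, attributes, best_attribute):
    # Pure subset: no further test needed, return a leaf label.
    if len(set(labels)) == 1:
        return labels[0]
    # No attributes left to test: fall back to the majority label.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Best test for THIS node, chosen independently of other nodes.
    attr = best_attribute(rows, labels, attributes)
    node = {"attribute": attr, "branches": {}}
    for value in set(row[attr] for row in rows):
        keep = [i for i, row in enumerate(rows) if row[attr] == value]
        node["branches"][value] = build(
            [rows[i] for i in keep],
            [labels[i] for i in keep],
            [a for a in attributes if a != attr],
            best_attribute,
        )
    return node
```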
Overview of Decision Trees
To illustrate how a decision tree works, consider the case of a bank that wants to
market its term deposit products (such as Certificates of Deposit) to the
appropriate customers.
The dataset used here is based on the original dataset collected from a
Portuguese bank on directed marketing campaigns as stated in the work by
Moro et al. [6].
Overview of Decision Trees
Figure 7-3 shows a subset of the
modified bank marketing dataset.
Overview of Decision Trees
To make the example simple, the subset only keeps the following categorical
variables: (1) job, (2) marital status, (3) education level, (4) if the credit is in
default, (5) if there is a housing loan, (6) if the customer currently has a personal
loan, (7) contact type, (8) result of the previous marketing campaign contact
(poutcome), and finally (9) if the client actually subscribed to the term deposit.
Attributes (1) through (8) are input variables, and (9) is considered the outcome.
The outcome subscribed is either yes (meaning the customer will subscribe to
the term deposit) or no (meaning the customer won’t subscribe).
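A minimal sketch of fitting a classification tree to a table like this one in Python; the file name "bank.csv" and the exact column names are assumptions for illustration, not the actual course files:

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("bank.csv")                   # hypothetical file name

inputs = ["job", "marital", "education", "default",
          "housing", "loan", "contact", "poutcome"]

X = pd.get_dummies(df[inputs])                 # one-hot encode categoricals
y = df["subscribed"]                           # yes / no outcome

clf = DecisionTreeClassifier(max_depth=3).fit(X, y)
print(clf.predict(X.head()))
```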
Overview of Decision Trees
Figure 7-4 shows a decision tree
built over the bank marketing
dataset.
Overview of Decision Trees
At each split, the decision tree
algorithm picks the most informative
attribute out of the remaining
attributes.
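One common way to score "most informative" is entropy-based information gain; a minimal sketch on made-up poutcome records:

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    # Entropy of the outcome, minus the weighted entropy after
    # splitting the records on the candidate attribute's values.
    gain, n = entropy(labels), len(labels)
    for v in set(values):
        subset = [lab for val, lab in zip(values, labels) if val == v]
        gain -= (len(subset) / n) * entropy(subset)
    return gain

# Made-up records: poutcome value paired with the subscribed outcome.
poutcome   = ["failure", "failure", "success", "success", "unknown"]
subscribed = ["no",      "no",      "yes",     "yes",     "no"]
print(information_gain(poutcome, subscribed))  # higher = more informative
```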
Overview of Decision Trees
At the first split, the decision tree
algorithm chooses the poutcome
attribute. There are two nodes at
depth=1.
Overview of Decision Trees
Of the two nodes at depth=1, the right node represents the portion of the
population for which the outcome of the previous marketing campaign
contact is a success.
Overview of Decision Trees
This node further splits into two
nodes based on the education level.