This document discusses classification and regression trees (CART) and how they are built and interpreted. It covers:
• What classification and regression trees are and how they differ
• How trees are built by recursively splitting nodes to maximize purity
• How trees can overfit and how pruning removes unnecessary splits
• How cross-validation selects the optimal tree size by averaging error rates across folds

AQM 2000

Predictive Business Analytics


Eric W. Chan, Ph.D.
Session 14: Classification and Regression Trees (CART)
AQM 2000, Math, Analytics, Science, and Tech (MAST) Division
What is it?

• Classification trees try to predict a categorical target.


• Will this person default on their loan?
• What type of book is this?

• Regression trees try to predict a continuous numerical target.


• What will the total rainfall next year be?
• How much will this house sell for?
Data

• Someone started a company that uses AI to help high school students improve their admissions chances at "reach" schools.
• How it works:
• Students take a detailed survey.
• An algorithm helps determine which schools are realistically their reach, possible, and safety schools.
• The company uses the survey and other attributes to determine where students need the most help (essays, SATs, grades, etc.) and provides that assistance for a fee.
• The goal is to improve a student's chances of admission to a "reach" school.
Building Intuition

• Each dot represents the results of a student: success (green triangles) or failure
(red circles).
Building Intuition

• Goal: predict whether a student with a given GPA and SAT score will result in a
successful admission into at least one “reach” school.
Building Intuition

• Method: draw a minimal number of horizontal and vertical lines that partitions
the predictor space into regions containing just one value of the target.
Building Intuition

• Condition: keep drawing until you hit the boundary or a line you've already drawn.
Building Intuition

• Here is one possible solution.


Building Intuition

• Would this new person (blue square) be predicted to gain entry or not into a
“reach” school?
Interpreting Trees

• This tree map can be represented as a sequence of really simple decisions in the
form of a decision tree.
Interpreting Trees

• Decision trees are composed of nodes and edges.


• Nodes represent subsets of our data; edges represent choices.
Interpreting Trees

Current Prediction
Percent Target Value 1
Percent of Total Data Set

• Each node has the same representation for a classification. This is one from a
classification tree.
• For a regression tree, there is no “Percent Target Value 1.” Why?
Interpreting Trees

• In the top node, we would predict 1 (Success), because 62% of people in this
group were successes; this node represents 100% of the data set.
Interpreting Trees

• Below, we have a rule that splits the node's records, e.g., GPA < 3.2.
• Left is always “yes,” and right is always “no.”
Interpreting Trees

• Graphically, this condition is the same as drawing a line at GPA=3.2


Interpreting Trees

• Our prediction at this stage (where the answer to GPA < 3.2 is "no") is Success; 83% of records in this
node are Successes.
• This node represents 75% of the total training set.
Interpreting Trees

• Our next splitting condition is GPA < 3.8.


• Records satisfying this condition go to the left.
Interpreting Trees

• This condition is the same as drawing a vertical line at GPA = 3.8.
• This line stops at SAT = 1285.
Interpreting Trees

• Let's predict our test record: SAT = 1580, GPA = 3.55.


• Earlier we decided that this record should be classified as a failed attempt.
Interpreting Trees

• In the tree context, this prediction is just following a path from the top to the
bottom, making the appropriate choices along the way.
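The path-following prediction described above can be sketched in a few lines of Python. The tree structure and leaf labels here are hypothetical (loosely based on the slides' GPA < 3.2 and GPA < 3.8 splits), not the actual fitted tree:

```python
def predict(node, record):
    """Follow the path from the root to a leaf, choosing edges as we go."""
    while "split" in node:                       # internal node: apply its rule
        var, threshold = node["split"]
        if record[var] < threshold:              # "yes" always goes left
            node = node["left"]
        else:                                    # "no" always goes right
            node = node["right"]
    return node["prediction"]                    # leaf: return its prediction

# Hypothetical tree; the leaf labels are illustrative only.
tree = {
    "split": ("GPA", 3.2),
    "left":  {"prediction": "Failure"},
    "right": {
        "split": ("GPA", 3.8),
        "left":  {"prediction": "Failure"},
        "right": {"prediction": "Success"},
    },
}

print(predict(tree, {"GPA": 3.55, "SAT": 1580}))  # -> Failure
```

Note that prediction never needs the whole tree, only the one root-to-leaf path the record falls down.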
Building Trees in R
Splitting

• The leaves here contain more than one value of the target!
• How do we know if we should split? How do we know where to split?
Splitting

• How do we know where to split?


• Why is the rest split at SAT = 1220?
Splitting

• Qualitatively, we want the split that maximizes node purity.


• For categorical targets, a perfectly pure node contains only one value of the
target.
Splitting

• For categorical targets, a perfectly pure node contains only one value of the target.
• The closer we can get to this goal the better.
Splitting

• Quantitatively, there are lots of ways to measure node purity.


• For a binary target, we might use Gini impurity:
impurity = p(1 - p)
where p is the proportion of True observations in one node after the split.

• If p = 0, the node has only False target values, and impurity = 0.
• If p = 1, the node has only True target values, and impurity = 0.
• If p = 0.5, the target is evenly split, and impurity is maximized.
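The impurity measure above is a one-liner; this small sketch just confirms the three cases listed:

```python
def impurity(labels):
    """Gini-style impurity p(1 - p), where p is the share of True labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return p * (1 - p)

print(impurity([False] * 10))               # p = 0   -> 0.0  (perfectly pure)
print(impurity([True] * 10))                # p = 1   -> 0.0  (perfectly pure)
print(impurity([True] * 5 + [False] * 5))   # p = 0.5 -> 0.25 (maximum)
```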
Splitting

• For each variable, the model considers every possible split.


• The best split is the one that results in the lowest impurity.
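The exhaustive search can be sketched for one numeric variable: try every candidate threshold and keep the one whose two children have the lowest size-weighted impurity. The GPA values and outcomes below are made up for illustration:

```python
def gini(labels):
    """Impurity p(1 - p) for a list of boolean labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return p * (1 - p)

def best_split(values, labels):
    """Return (threshold, weighted_impurity) for the best split x < t."""
    n = len(values)
    best = (None, float("inf"))
    for t in sorted(set(values)):           # every observed value is a candidate
        left  = [y for x, y in zip(values, labels) if x < t]
        right = [y for x, y in zip(values, labels) if x >= t]
        if not left or not right:           # a split must produce two nodes
            continue
        score = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical GPAs with success (True) / failure (False) outcomes.
gpas    = [2.5, 2.8, 3.0, 3.3, 3.5, 3.9]
success = [False, False, False, True, True, True]
print(best_split(gpas, success))   # splits cleanly at GPA < 3.3 -> impurity 0.0
```

In practice the model runs this search over every predictor and picks the single best (variable, threshold) pair at each node.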
Stopping Rules and Pruning

Even given a splitting rule, how do we know when to stop building the tree?
There are several such stopping rules:
• The node contains records that all have the same value of the target, i.e., the
node is perfectly pure.
• minsplit: The node is below a user-specified minimum size, typically something
like 2% of the total number of records.
• minbucket: One of the nodes that would result from the split is below a user-specified
minimum size, typically something like 1% of the total number of
records.
• cp: Performing a split wouldn't improve the predictive power of the model by
some user-specified amount.
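The four stopping rules above can be sketched as a single check run before each proposed split. This is a simplified illustration, not rpart's actual implementation: here minsplit and minbucket are expressed as fractions of the data set (matching the slides' "2% / 1% of total records"), whereas rpart's parameters are absolute counts.

```python
def should_stop(node_labels, left_size, right_size, improvement,
                n_total, minsplit=0.02, minbucket=0.01, cp=0.01):
    """Return True if any stopping rule fires before a proposed split."""
    if len(set(node_labels)) == 1:                        # node is perfectly pure
        return True
    if len(node_labels) < minsplit * n_total:             # node too small to split
        return True
    if min(left_size, right_size) < minbucket * n_total:  # a child would be too small
        return True
    if improvement < cp:                                  # split improves too little
        return True
    return False

# A pure node of 50 failures in a 1000-record training set: stop.
print(should_stop([False] * 50, 25, 25, improvement=0.1, n_total=1000))   # True
# A mixed 50-record node whose split helps and leaves big-enough children: keep going.
print(should_stop([True] * 30 + [False] * 20, 25, 25, 0.1, 1000))         # False
```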
Stopping Rules and Pruning

• As we add more complexity to our tree model, its performance on training gets
better and better -- with a big enough tree, every node is perfectly pure!
Stopping Rules and Pruning

• The performance on test gets better for a while, then starts to get worse.
• At some point, the splits in our tree are just modeling noise in training.
Stopping Rules and Pruning

• We want to find the tree size that minimizes error rate on test.
• We want to find the tree that is neither overfit nor underfit.
Stopping Rules and Pruning

• Here's an example of an underfit tree.


• There is likely much more structure in the data than what is represented here.
Stopping Rules and Pruning

• Here's an example of a likely overfit tree.


• We need to prune the tree to remove these unnecessary branches.
Stopping Rules and Pruning

• The pruned tree is just the original with all the unnecessary branches removed.
• Pruned trees often perform worse on training and better on test than unpruned trees.
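One minimal notion of an "unnecessary branch" can be sketched directly: a split whose two children are leaves with the same prediction cannot change any prediction, so it can be collapsed. (Real pruning, such as rpart's cost-complexity pruning, instead removes splits whose error improvement is too small; this sketch only handles the trivially redundant case.)

```python
def prune(node):
    """Recursively collapse splits whose children are identical leaves."""
    if "split" not in node:                  # already a leaf
        return node
    node["left"] = prune(node["left"])
    node["right"] = prune(node["right"])
    l, r = node["left"], node["right"]
    if ("split" not in l and "split" not in r
            and l["prediction"] == r["prediction"]):
        return {"prediction": l["prediction"]}   # the split was unnecessary
    return node

# Hypothetical tree whose inner SAT split sends both branches to "Failure".
tree = {
    "split": ("GPA", 3.2),
    "left": {
        "split": ("SAT", 1200),
        "left":  {"prediction": "Failure"},
        "right": {"prediction": "Failure"},
    },
    "right": {"prediction": "Success"},
}
pruned = prune(tree)
print(pruned)   # the redundant SAT split is gone
```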
Stopping Rules and Pruning

We cannot tell whether a tree is over- or underfit simply by looking at it; we need a more systematic and quantitative way to determine the best model complexity!
Cross Validation

• Earlier, we learned there's a problem if we're using test data to build our
model.
• The whole point of test was to hold it out until we had made our model!
• There's a clever way to get around this called k-fold cross-validation.
Cross Validation

• Let's see how 4-fold cross-validation would work.


• We begin by breaking the training data set into 4 equally sized pieces.
Cross Validation

• First, we build a model using data only from pieces 2, 3 and 4.


• We then evaluate the performance of this model on piece 1.
Cross Validation

• Next, we build a model using data only from pieces 1, 3 and 4.


• We then evaluate the performance of this model on piece 2.
Cross Validation

• We repeat using pieces 1, 2, and 4 as our "training" and piece 3 as our "test" . . .
Cross Validation

• . . . and finally using pieces 1, 2, and 3 as our "training" and piece 4 as our
"test."
Cross Validation

• Each fold grows a different tree -- notice that the splits might not be the same!
• After each split, the error rates across all k trees are averaged.
Cross Validation
So how can we use k-fold cross-validation to build a tree of the “best” size?
• We build k different trees, each one using one of the k pieces of the real
training set as its “test.”
• Each time we add a split in each of the k trees, we keep track of the
performance of this new tree on its “test” data.
• For a given tree size, we average the error rate across all k trees.
• We define the “best” tree size to be the one with the lowest average error rate.
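The averaging-and-selection step above is simple arithmetic; here is a sketch with made-up error rates for 4 folds and 4 candidate tree sizes:

```python
def best_tree_size(errors_by_fold):
    """errors_by_fold[fold][s] = error rate of the fold's tree at size s.
    Returns (best_size_index, its_average_error)."""
    k = len(errors_by_fold)
    n_sizes = len(errors_by_fold[0])
    avg = [sum(fold[s] for fold in errors_by_fold) / k for s in range(n_sizes)]
    best = min(range(n_sizes), key=lambda s: avg[s])
    return best, avg[best]

# Hypothetical 4-fold results for tree sizes 0..3 (e.g., number of splits).
errors = [
    [0.40, 0.25, 0.20, 0.30],   # fold 1
    [0.45, 0.30, 0.22, 0.28],   # fold 2
    [0.38, 0.28, 0.18, 0.35],   # fold 3
    [0.42, 0.27, 0.20, 0.31],   # fold 4
]
size, err = best_tree_size(errors)
print(size, round(err, 3))   # size index 2 has the lowest average error
```

Note the characteristic shape in the averages: error falls as the tree grows, bottoms out, then rises again as the larger trees start fitting noise.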
Cross Validation

• Cross-validation allows us to estimate the error on test without ever using the
test data.
Stopping Rules and Pruning in R
