2000 CART
• Each dot represents the results of a student: success (green triangles) or failure
(red circles).
Building Intuition
• Goal: predict whether a student with a given GPA and SAT score will be
successfully admitted into at least one “reach” school.
Building Intuition
• Method: draw a minimal number of horizontal and vertical lines that partition
the predictor space into regions containing just one value of the target.
Building Intuition
• Condition: keep drawing until you hit the boundary or a line you've already drawn.
Building Intuition
• Would this new person (blue square) be predicted to gain entry or not into a
“reach” school?
Interpreting Trees
• This partition of the predictor space can be represented as a sequence of
simple decisions in the form of a decision tree.
Interpreting Trees
[Figure: a tree node displaying Current Prediction, Percent Target Value 1, and
Percent of Total Data Set]
• Every node has the same representation; the one shown is from a
classification tree.
• For a regression tree, there is no “Percent Target Value 1.” Why?
Interpreting Trees
• In the top node, we would predict 1 (Success), because 62% of people in this
group were successes; this node represents 100% of the data set.
Interpreting Trees
• Below, we have a rule that splits the node's records, e.g., GPA < 3.2.
• Left is always “yes,” and right is always “no.”
Interpreting Trees
• Our prediction at this stage (the node where the answer to “GPA < 3.8” is
“no”) is Success; 83% of records in this node are Successes.
• This node represents 75% of the total training set.
Interpreting Trees
• This condition is the same as drawing a vertical line at GPA = 3.8.
• This line stops at SAT = 1285.
Interpreting Trees
• In the tree context, this prediction is just following a path from the top to the
bottom, making the appropriate choices along the way.
Building Trees in R
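As a minimal sketch, a classification tree like the one above can be fit with the rpart package; the data frame `admit` and its columns (GPA, SAT, Success) are hypothetical stand-ins for the admissions example.

```r
# install.packages("rpart")  # if not already installed
library(rpart)

# 'admit' is an assumed data frame with columns GPA, SAT, and a binary Success
fit <- rpart(Success ~ GPA + SAT,
             data = admit,
             method = "class")  # "class" requests a classification tree

print(fit)                      # text view of the splits
plot(fit); text(fit, use.n = TRUE)  # base-graphics view of the tree
```

For a regression tree (numeric target), `method = "anova"` would be used instead.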
Splitting
• The leaves here contain more than one value of the target!
• How do we know if we should split? How do we know where to split?
Splitting
• For categorical targets, a perfectly pure node contains only one value of the target.
• The closer we can get to this goal the better.
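One common way to measure how close a node is to purity is Gini impurity, which rpart uses by default for classification; this small function is our own illustration, not part of any package.

```r
# Gini impurity of a vector of class labels: 0 for a perfectly pure node
gini <- function(labels) {
  p <- table(labels) / length(labels)  # proportion of each class
  1 - sum(p^2)
}

gini(c(1, 1, 1, 1))  # pure node -> 0
gini(c(1, 1, 0, 0))  # maximally mixed binary node -> 0.5
```

A good split is one whose child nodes have much lower impurity than the parent.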
Splitting
Even given a splitting rule, how do we know when to stop building the tree?
There are several such stopping rules:
• The node contains records that all have the same value of the target, i.e., the
node is perfectly pure.
• minsplit: The node is below a user-specified minimum size, typically something
like 2% of the total number of records.
• minbucket: One of the nodes that would result from the split is below a
user-specified minimum size, typically something like 1% of the total number of
records.
• cp (complexity parameter): Performing a split wouldn't improve the predictive
power of the model by some user-specified amount.
Stopping Rules and Pruning
• As we add more complexity to our tree model, its performance on training gets
better and better -- with a big enough tree, every node is perfectly pure!
Stopping Rules and Pruning
• The performance on test gets better for a while, then starts to get worse.
• At some point, the splits in our tree are just modeling noise in training.
Stopping Rules and Pruning
• We want to find the tree size that minimizes error rate on test.
• We want to find the tree that is neither overfit nor underfit.
Stopping Rules and Pruning
• The pruned tree is just the original with all the unnecessary branches removed.
• Pruned trees often perform worse on training and better on test than unpruned trees.
Stopping Rules and Pruning
• Earlier, we learned there's a problem if we're using test data to build our
model.
• The whole point of test was to hold it out until we had made our model!
• There's a clever way to get around this called k-fold cross-validation.
Cross Validation
• We repeat using pieces 1, 2, and 4 as our “training” and piece 3 as our
“test” . . .
Cross Validation
• . . . and finally using pieces 1, 2, and 3 as our “training” and piece 4 as
our “test.”
Cross Validation
• Each fold grows a different tree -- notice that the splits might not be the same!
• After each split, the error rates across all k trees are averaged.
Cross Validation
So how can we use k-fold cross-validation to build a tree of the “best” size?
• We build k different trees, each one using one of the k pieces of the real
training set as its “test.”
• Each time we add a split in each of the k trees, we keep track of the
performance of this new tree on its “test” data.
• For a given tree size, we average the error rate across all k trees.
• We define the “best” tree size to be the one with the lowest average error rate.
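rpart runs this cross-validation automatically (the xval argument, 10 folds by default) and records the cross-validated error for each tree size in the fitted model's cp table; `fit` here is a tree fit earlier with rpart.

```r
# Cross-validated error ("xerror") at each tree size
printcp(fit)   # table of CP, nsplit, rel error, xerror, xstd
plotcp(fit)    # plot of xerror against tree size

# The "best" cp is the one with the lowest cross-validated error
best <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
```

A common variant is the one-standard-error rule: choose the smallest tree whose xerror is within one xstd of the minimum.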
Cross Validation
• Cross-validation allows us to estimate the error on test without ever using the
test data.
Stopping Rules and Pruning in R
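Putting the pieces together, pruning in R is a single call to prune() with the cp value chosen by cross-validation; `fit` is assumed to be a tree fit earlier with rpart.

```r
# Prune back to the tree size chosen by cross-validation
best   <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
pruned <- prune(fit, cp = best)

plot(pruned); text(pruned, use.n = TRUE)
```

The pruned tree is the original with the branches below the chosen cp removed, which typically hurts training error slightly while improving test error.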