
L4 SAS Viya - Decision Tree

Uploaded by

Shaiba Shoshi

Decision Trees

Copyright © SAS Institute Inc. All rights reserved.


Decision Trees in SAS Visual Statistics
• There is only one response variable. It can be either a category or a measure.
(Both decision trees and regression trees can be created.)

• There can be multiple predictors, and both category and measure predictors are accommodated

=> an extremely versatile technique, as it can be used with any data type

• Using Interactive mode, you can manually train and prune a decision tree

• You can derive a leaf ID. This ID can be used in other models that are featured in
the SAS Visual Statistics functionality
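
To make the leaf ID idea concrete, here is a minimal Python sketch, not SAS code. It assumes a small hand-written tree; the predictors `age` and `income`, the thresholds, and the dictionary layout are all hypothetical. Each observation is routed to a leaf, and the resulting ID is stored as a new category column.

```python
# Hypothetical hand-written tree: an internal node tests one predictor
# against a threshold; a leaf carries a "leaf_id".
# None of these names come from SAS Visual Statistics.
TREE = {
    "predictor": "age", "threshold": 40,
    "left": {"leaf_id": 1},
    "right": {
        "predictor": "income", "threshold": 50000,
        "left": {"leaf_id": 2},
        "right": {"leaf_id": 3},
    },
}

def leaf_id(node, row):
    """Walk from the root to a leaf and return that leaf's ID."""
    while "leaf_id" not in node:
        branch = "left" if row[node["predictor"]] < node["threshold"] else "right"
        node = node[branch]
    return node["leaf_id"]

rows = [
    {"age": 25, "income": 30000},
    {"age": 55, "income": 42000},
    {"age": 61, "income": 90000},
]

# The derived column could then be passed to another model as a category predictor.
leaf_ids = [leaf_id(TREE, r) for r in rows]
print(leaf_ids)
```

Because the leaf ID is just a category with one level per leaf, another model can treat it like any other category predictor.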

Decision Tree Roles
• Response – only one measure or categorical variable

• Predictors – assign any number of measure and category variables

• Partition ID – only one partition variable (optional)
> to build a decision tree for a cluster

Decision Tree Options
• Decision Tree
• Event level
• Autotune
• Missing assignment
• Minimum value
• Growth strategy
• Maximum branches
• Maximum levels
• Leaf size
• Bin response variable
• Predictor bins
• Bin method
• Rapid growth
• Prune with validation data
• Pruning
• Reuse predictors
• Number of bins
• Prediction cutoff
• Statistic percentile
• Tolerance
See lecture notes for more details.
Decision Tree Options
• Decision Tree
• Maximum branches
- Maximum number of branches created when a node is split
• Maximum levels
- Maximum depth of the tree
• Leaf size
- Minimum number of observations in a leaf node
• Pruning
- Specifies the aggressiveness of the tree pruning
algorithm. Larger values are more aggressive, and a
more aggressive algorithm creates a smaller
decision tree.
• Reuse predictors
- Allows more than one split in the same branch
based on a predictor.
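
The effect of Maximum levels and Leaf size can be illustrated with a small Python sketch. It is not the SAS algorithm: it assumes binary splits only (as if Maximum branches were 2), Gini impurity as the split criterion, no pruning, and a root counted as level 1. The function names `grow`, `best_split`, and `depth` are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a non-empty collection of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, leaf_size):
    """Return (feature index, threshold) of the best binary split, or None."""
    best, best_score = None, gini(labels)
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[j] < t]
            right = [y for r, y in zip(rows, labels) if r[j] >= t]
            # "Leaf size": both children must hold enough observations.
            if len(left) < leaf_size or len(right) < leaf_size:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (j, t), score
    return best

def grow(rows, labels, max_levels, leaf_size, level=1):
    """Grow a tree; a node at the last allowed level always becomes a leaf."""
    split = None if level >= max_levels else best_split(rows, labels, leaf_size)
    if split is None:
        return {"predict": Counter(labels).most_common(1)[0][0], "n": len(labels)}
    j, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[j] < t]
    right = [(r, y) for r, y in zip(rows, labels) if r[j] >= t]
    return {
        "feature": j, "threshold": t,
        "left": grow(*zip(*left), max_levels, leaf_size, level + 1),
        "right": grow(*zip(*right), max_levels, leaf_size, level + 1),
    }

def depth(node):
    """Number of levels in the tree, counting the root as level 1."""
    if "predict" in node:
        return 1
    return 1 + max(depth(node["left"]), depth(node["right"]))

# Toy data: one measure predictor, two response levels.
rows = [(x,) for x in range(1, 9)]
labels = ["a", "a", "b", "b", "a", "a", "b", "b"]

for max_levels, leaf_size in [(2, 1), (4, 1), (4, 3)]:
    tree = grow(rows, labels, max_levels, leaf_size)
    print(f"max_levels={max_levels} leaf_size={leaf_size} depth={depth(tree)}")
```

On this toy data, lowering the level limit or raising the leaf size both yield a shallower tree, which is the trade-off these two options control.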
Decision Tree Options
• Model Display
• Plot layout
• (General) Statistic to show
• (Decision Tree / Icicle Plot) Statistic to show
• Legend visibility
• Plot type
• Plot to show
• Confusion matrix legend visibility

Decision Tree Results
[Results window panes: Summary Bar, Decision Tree, Icicle, Variable Importance, Assessment]
Analyzing Decision Tree Results
Four panes appear under a summary bar. They can help you analyze
the results of the decision tree model.
• Tree with Treemap – displays an interactive and navigational decision tree
with node statistics and node rules
• Icicle Plot – displays a hierarchical breakdown of the tree data
• Variable Importance Plot – provides the variable importance information
for effects in the tree
• Assessment Plots – provide information on the performance of the tree
- Confusion matrix summarizes classifications.
- ROC (receiver operating characteristic) measures classification accuracy.
- Misclassification measures predictive accuracy.

Analyzing Decision Tree Results
Summary bar
• Target
• Baseline category for computing the different performance indicators
• Performance of the decision tree

Decision Tree: Assessment Statistics
The Assessment Statistics tab displays the value of any assessment statistics
that are computed for the model.

Decision Tree Results: Tree and Treemap

The color of the node in the treemap indicates the predicted level for that node.

[Figure: tree zoomed in, with a node selected]
Analyzing Decision Tree Results: Icicle
Icicle Plot: displays a hierarchical breakdown of the tree data

• Each tile = one node
• Size proportional to the number of observations
• Colour = prediction
Decision Tree Details Table: Node Statistics
The Node Statistics tab provides summary statistics for each node
in the decision tree.

Details Table: Node Rules
The Node Rules tab provides the sorting rule that is used for each node
in the decision tree. It is just another way to represent your decision tree.
• Node ID
• Parent ID
• Type (Class or Leaf)
• Column for each predictor and rule applied
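
As a rough illustration of why a table of node IDs and parent IDs is "another way to represent" the tree, the Python sketch below rebuilds the full rule path of any node by following parent links. The table contents, the single `rule` column, and the function names are hypothetical simplifications; the actual tab has one column per predictor.

```python
# Hypothetical node table: one row per node, each pointing to its parent.
NODES = [
    {"node_id": 0, "parent_id": None, "rule": None},  # root, no rule
    {"node_id": 1, "parent_id": 0, "rule": "age < 40"},
    {"node_id": 2, "parent_id": 0, "rule": "age >= 40"},
    {"node_id": 3, "parent_id": 2, "rule": "income < 50000"},
    {"node_id": 4, "parent_id": 2, "rule": "income >= 50000"},
]

def path_rules(nodes, node_id):
    """Collect the rules from the root down to the given node."""
    by_id = {n["node_id"]: n for n in nodes}
    rules = []
    node = by_id[node_id]
    while node["parent_id"] is not None:
        rules.append(node["rule"])
        node = by_id[node["parent_id"]]
    return list(reversed(rules))

def is_leaf(nodes, node_id):
    """A node is a leaf when no other node names it as parent."""
    return all(n["parent_id"] != node_id for n in nodes)

for n in NODES:
    if is_leaf(NODES, n["node_id"]):
        print(n["node_id"], " AND ".join(path_rules(NODES, n["node_id"])))
```

Each leaf then reads as a conjunction of rules, which is the same information the tree diagram shows graphically.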

Decision Tree Results: Variable Importance Plot
Provides the variable importance information for effects in the tree.

Details Table: Variable Importance
The Variable Importance tab provides variable importance information
for the variables that are used in the tree.
• Variable name
• Importance value
• Standard Deviation

Decision Tree Results: Leaf Statistics
Shows what the distribution inside a leaf looks like, as a count or as a percent.

Decision Tree Results: Assessment

[Assessment plots: Confusion Matrix, ROC, Misclassification]

Assessment: Confusion Matrix

Details Table: Confusion Matrix
The Confusion Matrix tab provides a summary of the correct and incorrect
classifications for the model that is used to generate the confusion matrix.
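
For a two-level response, the cells of a confusion matrix can be counted as in the Python sketch below. It is only an illustration: the data, the `event` argument (standing in for the event level), and the function name are hypothetical.

```python
from collections import Counter

def confusion_counts(actual, predicted, event):
    """Count TP, FP, FN, TN, treating `event` as the positive level."""
    cells = Counter()
    for a, p in zip(actual, predicted):
        if p == event:
            cells["TP" if a == event else "FP"] += 1
        else:
            cells["FN" if a == event else "TN"] += 1
    return cells

# Made-up observed and predicted response levels.
actual    = ["yes", "yes", "no", "no",  "yes", "no", "no", "yes"]
predicted = ["yes", "no",  "no", "yes", "yes", "no", "no", "yes"]

cells = confusion_counts(actual, predicted, event="yes")
print(dict(cells))
```

The correct classifications are the TP and TN cells; the incorrect ones are FP and FN.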

Assessment: Misclassification Plot

Misclassification measures predictive accuracy:

A Misclassification plot displays how many observations were correctly and
incorrectly classified for each value of the response variable (TP/FP/FN/TN)

> We want to minimize the orange part
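
The counts behind such a plot can be sketched in Python as follows. The data and function name are hypothetical, and the overall misclassification rate at the end is an extra summary rather than something the plot itself is stated to show.

```python
from collections import defaultdict

def misclassification_by_level(actual, predicted):
    """For each observed response level, count correct and incorrect predictions."""
    counts = defaultdict(lambda: {"correct": 0, "incorrect": 0})
    for a, p in zip(actual, predicted):
        counts[a]["correct" if a == p else "incorrect"] += 1
    return dict(counts)

# Made-up three-level response.
actual    = ["a", "a", "a", "b", "b", "c"]
predicted = ["a", "b", "a", "b", "b", "a"]

by_level = misclassification_by_level(actual, predicted)
errors = sum(c["incorrect"] for c in by_level.values())
rate = errors / len(actual)

for level, c in sorted(by_level.items()):
    print(level, c)
print("misclassification rate:", rate)
```

One bar per response level, split into correct and incorrect counts, corresponds to one entry of `by_level`.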

Decision Tree: Misclassification
The Misclassification tab displays a summary of correct and incorrect
classifications for the model

Assessment: ROC Chart
ROC (receiver operating characteristic)
measures classification accuracy:
- The ROC chart displays the ability of a
model to avoid false positive and false
negative classifications.
- The classification accuracy of a model is
demonstrated by the degree that the
ROC curve pushes upward and to the
left.
- This degree can be quantified by the
area under the curve.
- Expressed as a percentage, the area will typically range
from 50, for a worthless model (no better than random
guessing), to 100, for a perfect classifier.
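
The two ends of that range can be checked with a short Python sketch. It assumes the curve is built by sweeping a cutoff over the predicted event probabilities and that the area is computed with the trapezoidal rule on a 0 to 1 scale (multiply by 100 for the scale used above). The function names and data are hypothetical.

```python
def roc_points(scores, labels):
    """(FPR, TPR) pairs obtained by sweeping the cutoff over every distinct score.

    `labels` holds 1 for the event and 0 otherwise.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Trapezoidal area under the ROC curve, on a 0-1 scale."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# A classifier that ranks every event above every non-event.
perfect = auc(roc_points([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))

# A classifier that gives every observation the same score.
worthless = auc(roc_points([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0]))

print(perfect * 100, worthless * 100)
```

The perfect ranking hugs the upper-left corner and reaches the maximum area, while the constant score produces the diagonal line and half that area.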
Details Table: ROC
The ROC tab displays the results that are used to generate the ROC plot
