Decision Trees
Data pre-processing
• What to do with missing info
• Transform variables
• Get it ready for analysis
Predictive Models
• Linear regression (covered in STAT202)
• Logistic Regression
• Decision Trees
Evaluate Model
• Training and test sets
• Performance metrics
Classification
How can I estimate the category a target variable falls into?
◦ Similar to estimation, but NON-NUMERIC target
◦ Help to answer yes/no questions or place observations in mutually exclusive subsets
◦ Ex: classifying a credit card transaction as legitimate or fraudulent, diagnosing a patient with a particular disease, determining income brackets based on personal characteristics, etc.
Prediction
How can I predict the result of a FUTURE outcome?
◦ Similar to classification and estimation; however, the "outcome" is not yet known.
◦ Methods of classification and estimation can be used here if they appropriately fit the context
◦ Ex: what will the price of a stock be 3 months from now? What will the increase in car-related accidents be if I increase the speed limit?
Classification Tasks
Goal:
◦ To correctly classify an observation as one of multiple discrete categories
Predictor variables:
◦ Discrete numerical
◦ Categorical
◦ Continuous numerical (used by creating “categories” or break points)
Classification Tasks
Examine a data set containing:
◦ Predictor variables
◦ Already classified target variable outcomes
The model “learns” about links between predictors and a classification level of the target variable
This is the “training” portion
Consider new data:
◦ Predictor variables
◦ Target variable unclassified
The model then classifies the target variable (this is the “testing” portion)
We can evaluate error during this portion before applying the model to data whose target classifications are truly unknown
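A rough illustration of the training/testing idea (a minimal sketch, not from these notes, using R's built-in iris data and a classification tree from the rpart package, which is discussed later in these notes; the credit-risk data used later would work the same way):

# Hold out a test set, train on the rest, then estimate error on the held-out records
set.seed(1)
train_rows <- sample(nrow(iris), size = 0.7 * nrow(iris))
train <- iris[train_rows, ]
test  <- iris[-train_rows, ]

# "Training" portion: the model learns links between predictors and the target
fit  <- rpart::rpart(Species ~ ., data = train, method = "class")

# "Testing" portion: classify records whose target was withheld and check the error
pred <- predict(fit, newdata = test, type = "class")
mean(pred != test$Species)   # misclassification rate on the held-out records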
Expected Value Decision Trees
A decision-making tool:
◦ Lay out the options in logical order
◦ Calculate the expected value of each option
◦ Select the "path" with the highest expected value
Model pieces:
◦ Root nodes: beginning of the decision tree
◦ Decision nodes: a node where observations are broken down by discrete categories
◦ Branches: represents one categorical level associated with a parent decision node
◦ Leaf nodes: a termination point on a tree that seeks to have as little variation in classifications of the
target variable as possible
◦ Pure leaf nodes: all observations have the same classification
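As a small illustration of the expected-value procedure above (the numbers are invented for illustration only): suppose option A pays 100 with probability 0.6 and loses 20 otherwise, while option B pays 40 for certain. Then

$$EV(A) = 0.6(100) + 0.4(-20) = 52, \qquad EV(B) = 40,$$

so the tree selects the path for option A.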
Decision Tree: Example
Predicting “good” versus “bad” credit risk
You have information regarding:
◦ Savings level
◦ Grouped into discrete categories (low, medium and high)
◦ Assets
◦ Grouped into discrete categories (low and high)
◦ Income
◦ Grouped into discrete categories via a break point (less than $30,000, and greater than or equal to $30,000)
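A minimal sketch of creating that income break point in R (the vector name income and the specific values are assumptions, not from the notes):

# Bin a numeric income variable at the $30,000 break point
income <- c(22000, 35000, 29000, 41000)
income_cat <- cut(income,
                  breaks = c(-Inf, 30000, Inf),
                  labels = c("< $30,000", ">= $30,000"),
                  right = FALSE)   # right = FALSE puts exactly $30,000 in the upper category
income_cat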
Decision Tree: Example
Root Node
Savings = Low, Med, High?
[Tree diagram: below the root node, the remaining decision nodes branch on yes/no splits down to the leaf nodes.]
Diverse leaf node: a terminal point on the decision tree where we have a mix of target variable outcomes but no further splits can be made
Decision Tree: Example
Diverse Leaf node:
Cust Savings Assets Income Credit Risk
004 High Low <=$30,000 Good
009 High Low <=$30,000 Good
027 High Low <=$30,000 Bad
031 High Low <=$30,000 Bad
104 High Low <=$30,000 Bad
How to build a decision tree
In our example:
◦ Why did we split on savings for the root node?
◦ Why not assets or income level?
Specifically:
◦ Classification and regression trees (CART algorithm)
◦ C4.5 algorithm (information gain algorithm)
Classification and Regression Trees (CART)
CART basics:
◦ Strictly binary (every decision node produces exactly two branches)
◦ Continuous variables and categorical/discrete variables are OK here
◦ The algorithm will search for the proper split
◦ Conducts an exhaustive search over all options of splits and selects an optimal split at each decision
node by maximizing the “goodness” over all potential splits
Let Φ(s|t) be a measure of the "goodness" of a candidate split s at node t, where

$$\Phi(s \mid t) = 2\, P_L\, P_R \sum_{j=1}^{\#\text{classes}} \left| P(j \mid t_L) - P(j \mid t_R) \right|$$

and where P_L and P_R are the proportions of records at node t sent to the left child t_L and right child t_R, and P(j | t_L), P(j | t_R) are the proportions of class-j records within each child node.
We will first use split #4: assets = low versus assets = medium and high
What do we notice about the optimality measure?
[Tree diagram fragment: Assets = High → Good Risk (Records 6); Assets = Medium → Bad Risk (Records 3); the full CART tree is shown after the C4.5 example.]
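As a worked check of the goodness measure (a sketch computed by hand, assuming the same eight-record credit-risk data set used in the C4.5 example below): for split #4, the left child (assets = low) holds records 2 and 7, both bad risks, and the right child holds the other six records, five good and one bad, so

$$P_L = \tfrac{2}{8},\quad P_R = \tfrac{6}{8},\quad \Phi(s \mid t) = 2 \cdot \tfrac{2}{8} \cdot \tfrac{6}{8}\left( \left|0 - \tfrac{5}{6}\right| + \left|1 - \tfrac{1}{6}\right| \right) = 0.375 \times \tfrac{10}{6} = 0.625.$$

A large value reflects child nodes whose class proportions differ sharply, which is exactly what the CART search is rewarding.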
C4.5 Algorithm (Information Gain)
C4.5 Basics:
NOT restricted to binary splits from decision nodes
◦ Leads to a tree of more variable shape (can have a wider tree than with CART)
By default, creates one branch for EACH level of a categorical variable if it selects that predictor
to branch on
◦ This may not be ideal if multiple levels of a category have similar relationships with the target
Conducts an exhaustive search over potential splits and selects an optimal split based on
information gain (or entropy reduction)
◦ Before we get into how this is calculated, we will first discuss what entropy is
What is Entropy Reduction?
Entropy is a measure of impurity or disorder in the dataset
◦ For a two-class target, entropy ranges between 0 and 1
◦ 0 means the node is perfectly pure (all records share one class)
◦ 1 means the classes are evenly distributed
Entropy
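The formula referenced below is the standard (Shannon) entropy, stated here because it is what the later calculations use: for a node T whose records fall into classes j with proportions p_j,

$$H(T) = -\sum_{j} p_j \log_2 p_j$$

and for a candidate split S that sends a fraction n_i / n of the records to child node T_i, the entropy remaining after the split is the weighted average

$$H_S(T) = \sum_{i} \frac{n_i}{n}\, H(T_i).$$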
What is Entropy Reduction?
Let’s apply the entropy formula to our situation
C4.5 Algorithm seeks to maximize the entropy reduction (or information gain) produced by a
potential split
◦ Reduction = H(T) - H_S(T)
C4.5 with an example
Cust Savings Assets Income (1000s) Credit Risk
1 Medium High 75 Good
2 Low Low 50 Bad
3 High Medium 25 Bad
4 Medium Medium 50 Good
5 Low Medium 100 Good
6 High High 25 Good
7 Low Low 25 Bad
8 Medium Medium 75 Good
C4.5 Using an Example
Potential Split Resulting Child Nodes
1 Savings=low Savings=medium Savings=high
2 Assets=low Assets=medium Assets=high
3 Income<=$25,000 Income>$25,000
4 Income<=$50,000 Income>$50,000
5 Income<=$75,000 Income>$75,000
Baseline entropy (5 good and 3 bad credit risks among the 8 records):
◦ H(T) = -(5/8) log2(5/8) - (3/8) log2(3/8) = 0.9544
C4.5 Using an Example
Consider split 1 (splitting by savings)
Low savings
◦ 3 observations; 1 is good and 2 are bad credit risk
Medium savings
◦ 3 observations; 3 are good and 0 are bad credit risk
High savings
◦ 2 observations, 1 is good and 1 is bad credit risk
Entropy after the split (weighted average of the child-node entropies):
◦ H_savings(T) = (3/8)(0.9183) + (3/8)(0) + (2/8)(1) = 0.5944
Entropy reduction = 0.9544 - 0.5944 = 0.36
C4.5 Using an Example
Consider split 3: Income<=$25,000
Income<=$25,000
◦ 3 observations; 1 good credit risk, 2 bad credit risk
Income>$25,000
◦ 5 observations; 4 good credit risk, 1 bad credit risk
Entropy after the split (weighted average of the child-node entropies):
◦ H_income(T) = (3/8)(0.9183) + (5/8)(0.7219) = 0.7956
Entropy reduction = 0.9544 - 0.7956 = 0.1588
C4.5 Using an Example
Potential split Baseline Entropy Split Entropy Entropy Reduction
1 0.9544 0.5944 0.36
2 0.9544 0.4057 0.5487
3 0.9544 0.7956 0.1588
4 0.9544 0.6069 0.3475
5 0.9544 0.8621 0.0923
We will first use a split along the assets dimension because it has the
highest entropy reduction (or information gain).
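A minimal sketch (not from the notes; the data-frame and column names are assumptions) that reproduces the entropy-reduction table above in R:

# Entropy of a vector of class labels
entropy <- function(y) {
  p <- table(y) / length(y)
  p <- p[p > 0]
  -sum(p * log2(p))
}

# The eight-record credit-risk example (income in $1000s)
credit <- data.frame(
  Savings = c("Medium", "Low", "High", "Medium", "Low", "High", "Low", "Medium"),
  Assets  = c("High", "Low", "Medium", "Medium", "Medium", "High", "Low", "Medium"),
  Income  = c(75, 50, 25, 50, 100, 25, 25, 75),
  Risk    = c("Good", "Bad", "Bad", "Good", "Good", "Good", "Bad", "Good")
)

# Weighted average entropy of the child nodes produced by a candidate split
split_entropy <- function(groups, y) {
  sum(sapply(split(y, groups), function(g) length(g) / length(y) * entropy(g)))
}

baseline <- entropy(credit$Risk)          # 0.9544
candidates <- list(
  savings   = credit$Savings,
  assets    = credit$Assets,
  income_25 = credit$Income <= 25,
  income_50 = credit$Income <= 50,
  income_75 = credit$Income <= 75
)
# Entropy reduction (information gain) for each candidate split
sapply(candidates, function(s) baseline - split_entropy(s, credit$Risk))

The assets split returns the largest reduction, matching the table above.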
C4.5 Using an Example
Root Node (All Records)
Assets = Low, Medium, High?
◦ Assets = Low: Bad Risk (Records 2, 7)
◦ Assets = High: Good Risk (Records 1, 6)
◦ Assets = Medium: Decision Node A (Records 3, 4, 5, 8), which then splits on savings:
◦ Savings = Low: Good Risk (Record 5)
◦ Savings = Medium: Good Risk (Records 4, 8)
◦ Savings = High: Bad Risk (Record 3)
Compare this tree to the one found using the CART method:
Root Node (All Records)
Assets = Low versus Assets = Medium or High?
◦ Assets = Low: Bad Risk (Records 2, 7)
◦ Assets = Medium or High: Decision Node A (Records 1, 3, 4, 5, 6, 8), which then splits on savings:
◦ Savings = Low or Medium: Good Risk (Records 1, 4, 5, 8)
◦ Savings = High: Decision Node B (Records 3, 6), which then splits on assets:
◦ Assets = High: Good Risk (Record 6)
◦ Assets = Medium: Bad Risk (Record 3)
Comparison of CART and C4.5
CART can also build regression trees for a numeric target; each terminal node then predicts the average value of the target (for example, average age) among the records within that node
Comparison of CART and C4.5
Pre-pruning: impose stopping rules like maximum depth or minimum leaf size while the tree grows
Post-pruning: let the tree grow to "full depth" and then go back and eliminate branches
◦ Use cost-complexity pruning (weakest-link pruning), which gives candidate trees a score that combines error and a penalty for tree size
◦ The penalty for tree size is determined using cross-validation
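A minimal sketch of cost-complexity pruning in R, using rpart (which implements this pruning scheme) and its bundled kyphosis data set; the xerror column reported by printcp comes from rpart's built-in cross-validation:

library(rpart)
# Grow a deliberately deep tree, then prune back to the subtree with the
# lowest cross-validated error
big_tree <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                  method = "class",
                  control = rpart.control(minsplit = 2, cp = 0, xval = 10))
printcp(big_tree)   # table of complexity parameter (cp) values and cross-validated error
best_cp <- big_tree$cptable[which.min(big_tree$cptable[, "xerror"]), "CP"]
pruned  <- prune(big_tree, cp = best_cp)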
Comparison of CART and C4.5
Both can be run easily in RStudio (a short sketch follows this list)
Either one can perform better on a given data set
Both can be adjusted to include weights
◦ For example, weights that favor correctly classifying a particular outcome
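A minimal sketch (not from the notes) of fitting both kinds of tree on the eight-record credit-risk example in R. rpart implements CART; J48 from the RWeka package (which requires Java) is an open-source implementation of C4.5. The control settings are relaxed only because the toy data set is tiny.

library(rpart)
library(RWeka)

credit <- data.frame(
  Savings = factor(c("Medium", "Low", "High", "Medium", "Low", "High", "Low", "Medium")),
  Assets  = factor(c("High", "Low", "Medium", "Medium", "Medium", "High", "Low", "Medium")),
  Income  = c(75, 50, 25, 50, 100, 25, 25, 75),
  Risk    = factor(c("Good", "Bad", "Bad", "Good", "Good", "Good", "Bad", "Good"))
)

# CART: strictly binary splits
cart_fit <- rpart(Risk ~ Savings + Assets + Income, data = credit,
                  method = "class",
                  control = rpart.control(minsplit = 2, cp = 0))
print(cart_fit)

# C4.5 (J48): multiway splits on categorical predictors
c45_fit <- J48(Risk ~ Savings + Assets + Income, data = credit)
print(c45_fit)

# Weights favoring one outcome: a loss matrix that makes calling a "Bad"
# risk "Good" five times as costly as the reverse (rows = true class,
# columns = predicted class, ordered Bad then Good)
weighted_fit <- rpart(Risk ~ ., data = credit, method = "class",
                      parms = list(loss = matrix(c(0, 5, 1, 0), nrow = 2, byrow = TRUE)),
                      control = rpart.control(minsplit = 2, cp = 0))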