
Statistics 202: Data Mining

Classification & Decision Trees

Jonathan Taylor

Based in part on slides from the textbook and slides of Susan Holmes

October 19, 2012

Classification

Problem description

We are given a data matrix X with either continuous or discrete variables, such that each row Xi ∈ F, together with labels Y with each Yi ∈ L.
For a k-class problem, #L = k and we can think of L = {1, . . . , k}.
Our goal is to find a classifier

    f : F → L

that allows us to predict the label of a new observation given its features.

Classification

A supervised problem

Classification is a supervised problem.
Usually, we use a subset of the data, the training set, to learn or estimate the classifier, yielding f̂ = f̂_training.
The performance of f̂ is measured by applying it to each case in the test set and computing

    Σ_{j ∈ test} L(f̂_training(X_j), Y_j)

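As a concrete illustration (not from the slides), here is a minimal R sketch of this train/test recipe, using the iris data and the rpart library that appear later in the deck; the 0-1 loss plays the role of L.

    ## Fit a classifier on a training set, evaluate 0-1 loss on a held-out test set.
    library(rpart)

    set.seed(1)
    train <- sample(nrow(iris), 100)           # 100 training cases, 50 test cases
    fit <- rpart(Species ~ ., data = iris[train, ], method = "class")

    pred <- predict(fit, newdata = iris[-train, ], type = "class")
    test_loss <- sum(pred != iris$Species[-train])   # sum of 0-1 losses over the test set
    test_loss / (nrow(iris) - length(train))         # test misclassification rate
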
Classification

Examples of classification tasks

Predicting whether a tumor is benign or malignant.
Classifying credit card transactions as fraudulent or legitimate.
Predicting the type of a given tumor among several types.
Categorizing a document or news story as one of {finance, weather, sports, etc.}

Classification

Common techniques

Decision Tree based Methods
Rule-based Methods
Discriminant Analysis
Memory based reasoning
Neural Networks
Naïve Bayes
Support Vector Machines

Classification trees

[Figures]

Applying a decision tree rule

[Sequence of figures applying the tree rule to a test record]

Decision boundary for tree

[Figure]

Decision tree for iris data using all features

[Figure]

Decision tree for iris data using petal.length, petal.width

[Figure]

Regions in petal.length, petal.width plane

[Figure: decision regions; x-axis Petal length (0–8), y-axis Petal width (0.0–3.0)]

Decision boundary for tree

Figure: Trees have trouble capturing structure not parallel to the axes.

Learning the tree

Hunt's algorithm (generic structure)

Let Dt be the set of training records that reach a node t.
If Dt contains records that all belong to the same class yt, then t is a leaf node labeled as yt.
If Dt = ∅, then t is a leaf node labeled by the default class, yd.
If Dt contains records that belong to more than one class, use an attribute test to split the data into smaller subsets, and recursively apply the procedure to each subset.
This splitting procedure is what varies across different tree-learning algorithms . . .

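A minimal R sketch of this recursion, not from the slides: the split rule used here ("take the first numeric attribute, threshold at its median") is only a placeholder for whatever criterion a real algorithm would use.

    ## Sketch of Hunt's algorithm (generic structure) with a placeholder split rule.
    hunt <- function(D, y, default = "none") {
      if (length(y) == 0)                        # D_t empty: leaf labeled by default class
        return(list(leaf = TRUE, label = default))
      if (length(unique(y)) == 1)                # all records in the same class: leaf
        return(list(leaf = TRUE, label = y[1]))
      majority <- names(which.max(table(y)))
      x <- D[[1]]                                # placeholder attribute test:
      s <- median(x)                             # first column, split at its median
      left <- x <= s
      if (all(left) || all(!left))               # no useful split available: leaf
        return(list(leaf = TRUE, label = majority))
      list(leaf = FALSE, var = names(D)[1], split = s,
           left  = hunt(D[left,  , drop = FALSE], y[left],  majority),
           right = hunt(D[!left, , drop = FALSE], y[!left], majority))
    }

    tree <- hunt(iris[, 1:4], as.character(iris$Species))
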
Learning the tree

Issues

Greedy strategy: split the records based on an attribute test that optimizes a certain criterion.
What is the best split: what criterion do we use? The previous example chose first to split on Refund . . .
How to split the records: binary or multi-way? The previous example split Taxable Income at ≥ 80K . . .
When do we stop: should we continue until each node is pure, if possible? The previous example stopped with all leaf nodes completely homogeneous . . .

Different splits: ordinal / nominal

Figure: Binary or multi-way?

Different splits: continuous

Figure: Binary or multi-way?

Choosing a variable to split on

Figure: Which variable should we start splitting on?

Learning the tree

Choosing the best split

Need some numerical criterion to choose among possible splits.
Criterion should favor homogeneous or pure nodes.
Common cost functions:
  Gini Index
  Entropy / Deviance / Information
  Misclassification Error

Choosing a variable to split on

[Figure]

Learning the tree

GINI Index

Suppose we have k classes and node t has frequencies pt = (p1,t, . . . , pk,t).
Criterion:

    GINI(t) = Σ_{j ≠ j′} pj,t pj′,t = 1 − Σ_{j=1}^{k} pj,t²

Maximized when pj,t = 1/k for all j, with value 1 − 1/k.
Minimized when all records belong to a single class.
Minimizing GINI will favour pure nodes . . .

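A small R helper mirroring the formula above; it assumes the node is described by its class counts (or proportions).

    ## Gini index of a node, given the class counts (or proportions) at that node.
    gini <- function(counts) {
      p <- counts / sum(counts)
      1 - sum(p^2)
    }

    gini(c(5, 5))    # two balanced classes: 1 - 1/k = 0.5 (maximal for k = 2)
    gini(c(10, 0))   # pure node: 0 (minimal)
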
Learning the tree

Gain in GINI Index for a potential split

Suppose t is to be split into j new child nodes (tl), 1 ≤ l ≤ j.
Each child node has a count nl and a vector of frequencies (p1,tl, . . . , pk,tl), hence its own GINI index, GINI(tl).
The gain in GINI Index for this split is

    Gain(GINI, t → (tl)) = GINI(t) − ( Σ_{l=1}^{j} nl GINI(tl) ) / ( Σ_{l=1}^{j} nl )

The greedy algorithm chooses the biggest gain in GINI index among a list of possible splits.

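A sketch of this gain computation in R, written generically so the same function also works for the entropy and misclassification criteria later; `children` is assumed to be a list of class-count vectors, one per child node.

    ## Gain of a split under an impurity criterion (defaults to the Gini index above).
    ## parent: class counts at node t; children: list of class-count vectors for t_1..t_j.
    split_gain <- function(parent, children,
                           criterion = function(cnt) { p <- cnt / sum(cnt); 1 - sum(p^2) }) {
      n <- sapply(children, sum)
      weighted <- sum(n * sapply(children, criterion)) / sum(n)
      criterion(parent) - weighted
    }

    ## Example: split a {7, 3} parent into {3, 0} and {4, 3}.
    split_gain(c(7, 3), list(c(3, 0), c(4, 3)))   # Gini gain, about 0.077
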
Decision tree for iris data using all features with GINI

[Figure]

Learning the tree

Entropy / Deviance / Information

Suppose we have k classes and node t has frequencies pt = (p1,t, . . . , pk,t).
Criterion:

    H(t) = − Σ_{j=1}^{k} pj,t log pj,t

Maximized when pj,t = 1/k for all j, with value log k.
Minimized (value 0) when all records belong to a single class.
Minimizing entropy will favour pure nodes . . .

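The entropy criterion as a small R helper, with the usual 0 · log 0 = 0 convention; the natural log is assumed, matching the maximum value log k quoted above.

    ## Entropy of a node, given the class counts (or proportions) at that node.
    ## Uses the convention 0 * log(0) = 0 by dropping empty classes.
    entropy <- function(counts) {
      p <- counts / sum(counts)
      p <- p[p > 0]
      -sum(p * log(p))
    }

    entropy(c(5, 5))    # maximal for k = 2: log(2), about 0.693
    entropy(c(10, 0))   # pure node: 0
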
Decision tree for iris data using all features with Entropy

[Figure]

Learning the tree

Gain in entropy for a potential split

Suppose t is to be split into j new child nodes (tl), 1 ≤ l ≤ j.
Each child node has a count nl and a vector of frequencies (p1,tl, . . . , pk,tl), hence its own entropy H(tl).
The gain in entropy for this split is

    Gain(H, t → (tl)) = H(t) − ( Σ_{l=1}^{j} nl H(tl) ) / ( Σ_{l=1}^{j} nl )

The greedy algorithm chooses the biggest gain in H among a list of possible splits.

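Since the split_gain() sketch above takes the impurity function as an argument, the same call covers the entropy gain; the value assumes the entropy() helper defined earlier.

    ## Entropy gain for the same {7, 3} -> {3, 0}, {4, 3} split, reusing the
    ## split_gain() and entropy() helpers sketched above.
    split_gain(c(7, 3), list(c(3, 0), c(4, 3)), criterion = entropy)   # about 0.13
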
Learning the tree

Misclassification Error

Suppose we have k classes and node t has frequencies pt = (p1,t, . . . , pk,t).
The modal class is

    k̂(t) = argmax_j pj,t.

Criterion:

    Misclassification Error(t) = 1 − pk̂(t),t

Not smooth in pt, unlike GINI and H, so it can be more difficult to optimize numerically.

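The same helper style for the misclassification criterion, again assuming class counts as input.

    ## Misclassification error of a node: one minus the proportion of the modal class.
    misclass <- function(counts) {
      p <- counts / sum(counts)
      1 - max(p)
    }

    misclass(c(7, 3))   # 0.3: labeling the node with the majority class errs on 3 of 10
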
Different criteria: GINI, H, MC

[Figure]

Learning the tree

Misclassification Error

Example: suppose the parent node has 10 cases, {7D, 3R}.
A candidate split produces two nodes: {3D, 0R} and {4D, 3R}.
The gain in MC is 0, but the gain in GINI is 0.42 − 0.343 > 0.
Similarly, entropy will also show an improvement . . .

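A quick check of these numbers with the helpers sketched earlier (gini, entropy, misclass, split_gain):

    ## Verify the example: the split improves Gini (and entropy) but not
    ## misclassification error.
    parent   <- c(7, 3)                      # {7D, 3R}
    children <- list(c(3, 0), c(4, 3))       # {3D, 0R} and {4D, 3R}

    split_gain(parent, children, criterion = misclass)   # 0
    split_gain(parent, children, criterion = gini)       # 0.42 - 0.343, about 0.077
    split_gain(parent, children, criterion = entropy)    # about 0.13
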
Choosing the split for a continuous variable

[Figure]

Learning the tree

Stopping training

As trees get deeper, or if splits are multi-way, the number of data points per leaf node drops very quickly.
Trees that are too deep tend to overfit the data.
A common strategy is to “prune” the tree by removing some internal nodes.

Learning the tree

Figure: Underfitting corresponds to the left-hand side, overfitting to the right.

Learning the tree

Cost-complexity pruning (tree library)

Given a criterion Q like H or GINI, we define the cost-complexity of a tree T with terminal nodes (tj), 1 ≤ j ≤ m:

    Cα(T) = Σ_{j=1}^{m} nj Q(tj) + α m

Given a large tree TL we might compute Cα(T) for any subtree T of TL.
The optimal tree is defined as

    T̂α = argmin_{T ≤ TL} Cα(T).

It can be found by “weakest-link” pruning. See The Elements of Statistical Learning for more . . .

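In R, the rpart library exposes this idea through its complexity parameter cp, which plays the role of α (up to rescaling); a minimal sketch, assuming the iris data as before:

    ## Grow a deliberately overgrown tree, inspect the cost-complexity sequence,
    ## then prune back.
    library(rpart)

    big <- rpart(Species ~ ., data = iris, method = "class",
                 control = rpart.control(cp = 0, minsplit = 2))
    printcp(big)                      # table of cp values and cross-validated error
    pruned <- prune(big, cp = 0.05)   # drop splits not worth a cp of 0.05
    pruned
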
Learning the tree

Pre-pruning (rpart library)

These methods stop the algorithm before it grows a fully-grown tree.
Examples:
  Stop if all instances belong to the same class (kind of obvious).
  Stop if the number of instances is less than some user-specified threshold. Both tree and rpart have rules like this.
  Stop if the class distribution of instances is independent of the available features (e.g., using a χ² test).
  Stop if expanding the current node does not improve the impurity measure (e.g., Gini or information gain). This relates to cp in rpart.

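A sketch of how these stopping rules are exposed as rpart.control() arguments; the particular thresholds below are arbitrary illustrations, not recommendations.

    ## Pre-pruning via rpart's control parameters: minimum node size and a
    ## minimum improvement (cp) required before a split is attempted.
    library(rpart)

    ctrl <- rpart.control(minsplit = 20,   # don't try to split nodes with < 20 cases
                          minbucket = 7,   # every leaf must contain at least 7 cases
                          cp = 0.01)       # a split must improve the fit by at least cp
    fit <- rpart(Species ~ ., data = iris, method = "class", control = ctrl)
    fit
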
Training and test error as a function of cp

[Figure]

Evaluating a classifier

Metrics for performance evaluation

                         PREDICTED CLASS
                         Class=Yes   Class=No
    ACTUAL   Class=Yes    a (TP)      b (FN)
    CLASS    Class=No     c (FP)      d (TN)

Most widely-used metric: accuracy.

Evaluating a classifier

Measures of performance

Simplest is accuracy:

    Accuracy = (TP + TN) / (TP + TN + FP + FN)
             = SMC(Actual, Predicted)
             = 1 − Misclassification Rate

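Computed in R from a confusion matrix, using the same train/test split sketched earlier:

    ## Accuracy from a confusion matrix on the held-out test set.
    library(rpart)

    set.seed(1)
    train <- sample(nrow(iris), 100)
    fit   <- rpart(Species ~ ., data = iris[train, ], method = "class")
    pred  <- predict(fit, newdata = iris[-train, ], type = "class")

    conf <- table(Actual = iris$Species[-train], Predicted = pred)
    conf
    sum(diag(conf)) / sum(conf)   # accuracy = 1 - misclassification rate
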
Evaluating a classifier

Accuracy isn’t everything

Consider an unbalanced 2-class problem with #1’s = 10 and #0’s = 9990.
Simply labelling everything 0 yields 99.9% accuracy.
But this classifier misses every case in class 1.

Evaluating a classifier

Cost Matrix

                         PREDICTED CLASS
    C(i|j)               Class=Yes    Class=No
    ACTUAL   Class=Yes   C(Yes|Yes)   C(No|Yes)
    CLASS    Class=No    C(Yes|No)    C(No|No)

C(i|j): cost of misclassifying a class j example as class i.

© Tan, Steinbach & Kumar, Introduction to Data Mining, 4/18/2004.



Learning the tree

Measures of performance

With a cost matrix, the classification rule changes to

    Label(p, C) = argmin_i Σ_j C(i|j) pj

Minimizing cost is equivalent to maximizing accuracy if C(Y|Y) = C(N|N) = c1 and C(Y|N) = C(N|Y) = c2 (with c2 > c1).

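A sketch of this rule in R: given a matrix of predicted class probabilities (rows = cases, columns = actual classes) and a cost matrix C with C[i, j] the cost of predicting class i when the truth is class j, pick the label with the smallest expected cost. The matrices below are toy illustrations.

    ## Cost-sensitive labeling: for each case choose argmin_i sum_j C(i|j) p_j.
    cost_label <- function(prob, C) {
      expected <- prob %*% t(C)                    # expected cost of each predicted label
      rownames(C)[apply(expected, 1, which.min)]
    }

    ## Toy example: false negatives are 10x as costly as false positives.
    C <- matrix(c(0, 1,
                  10, 0), nrow = 2, byrow = TRUE,
                dimnames = list(predicted = c("Yes", "No"), actual = c("Yes", "No")))
    prob <- rbind(c(Yes = 0.2, No = 0.8), c(Yes = 0.05, No = 0.95))
    cost_label(prob, C)   # first case gets "Yes" even though P(Yes) is only 0.2
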
Evaluating a classifier

Computing Cost of Classification

Cost matrix:

                    PREDICTED CLASS
    C(i|j)            +       -
    ACTUAL    +      -1     100
    CLASS     -       1       0

    Model M1        PREDICTED CLASS
                      +       -
    ACTUAL    +     150      40
    CLASS     -      60     250
    Accuracy = 80%,  Cost = 3910

    Model M2        PREDICTED CLASS
                      +       -
    ACTUAL    +     250      45
    CLASS     -       5     200
    Accuracy = 90%,  Cost = 4255

© Tan, Steinbach & Kumar, Introduction to Data Mining, 4/18/2004.

Evaluating a classifier

Measures of performance

Other common ones:

    Precision = TP / (TP + FP)

    Specificity = TN / (TN + FP) = TNR

    Sensitivity = Recall = TP / (TP + FN) = TPR

    F = 2 · Recall · Precision / (Recall + Precision)
      = 2 · TP / (2 · TP + FN + FP)

Evaluating a classifier

Measures of performance

Precision emphasizes P(p = Y, a = Y) and P(p = Y, a = N).
Recall emphasizes P(p = Y, a = Y) and P(p = N, a = Y).
FPR = 1 − TNR
FNR = 1 − TPR

Evaluating a classifier

Measure of performance

We have done some simple training / test splits to see how well our classifier is doing.
More accurately, this procedure measures how well our algorithm for learning the classifier is doing.
How well this works may depend on:
  Model: are we using the right type of classifier model?
  Cost: is our algorithm sensitive to the cost of misclassification?
  Data size: do we have enough data to learn a model?

Evaluating a classifier

Learning Curve

A learning curve shows how accuracy changes with varying sample size.
Requires a sampling schedule for creating the learning curve:
  Arithmetic sampling (Langley, et al.)
  Geometric sampling (Provost et al.)
Effect of small sample size:
  Bias in the estimate
  Variance of the estimate

Figure: As the amount of data increases, our estimate of accuracy improves, as does the variability of our estimate . . .

© Tan, Steinbach & Kumar, Introduction to Data Mining, 4/18/2004.

Evaluating a classifier

Estimating performance

Holdout: split into test and training sets (e.g. 1/3 test, 2/3 training).
Random subsampling: repeated replicates of holdout, averaging the results.
Cross validation: partition the data into K disjoint subsets. For each subset Si, train on all but Si, then test on Si.
Stratified sampling: it may be helpful to sample so that the Y/N classes are roughly balanced in the training data.
0.632 Bootstrap: combine training error and bootstrap error . . .

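A minimal sketch of K-fold cross-validation for the tree classifier, assuming the iris data; each fold's misclassification rate is averaged at the end.

    ## K-fold cross-validation of a classification tree (K = 5 here).
    library(rpart)

    set.seed(1)
    K <- 5
    folds <- sample(rep(1:K, length.out = nrow(iris)))   # assign each row to a fold

    cv_error <- sapply(1:K, function(k) {
      fit  <- rpart(Species ~ ., data = iris[folds != k, ], method = "class")
      pred <- predict(fit, newdata = iris[folds == k, ], type = "class")
      mean(pred != iris$Species[folds == k])             # fold-k misclassification rate
    })
    mean(cv_error)   # cross-validated estimate of the misclassification rate
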