Attribute Selection Measure

This document discusses different techniques for building decision trees from data: 1. Information gain and gain ratio select the attribute that best splits the data based on reduction of entropy. 2. Gini index measures impurity in data partitions, and splits on the attribute giving the largest reduction in impurity. 3. Overfitting can be avoided through prepruning by not splitting nodes below a threshold, or postpruning by removing branches from a fully grown tree.


Select the attribute with the highest information gain.

Let p_i be the probability that an arbitrary tuple in D belongs to class C_i, estimated by |C_i,D| / |D|.

Expected information (entropy) needed to classify a tuple in D:

    Info(D) = - sum_{i=1..m} p_i log2(p_i)

Information needed (after using A to split D into v partitions) to classify D:

    Info_A(D) = sum_{j=1..v} (|D_j| / |D|) × Info(D_j)

Information gained by branching on attribute A:

    Gain(A) = Info(D) - Info_A(D)

Example: class P: buys_computer = "yes" (9 tuples); class N: buys_computer = "no" (5 tuples).

    Info(D) = I(9,5) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940

    Info_age(D) = (5/14) I(2,3) + (4/14) I(4,0) + (5/14) I(3,2) = 0.694

Here (5/14) I(2,3) means that "age <= 30" has 5 out of 14 samples, with 2 yes'es and 3 no's. Hence

    Gain(age) = Info(D) - Info_age(D) = 0.246

Similarly, the gain is computed for each of the remaining attributes, and the attribute with the highest information gain is chosen as the splitting attribute.
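To make the arithmetic above concrete, here is a minimal Python sketch (my own illustration, not part of the original material) that reproduces Info(D), Info_age(D), and Gain(age) for the 14-tuple buys_computer example; the helper names entropy and info_after_split are assumptions of this sketch.

```python
from math import log2

def entropy(counts):
    """Info(D) = -sum(p_i * log2(p_i)) for a list of class counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def info_after_split(partitions):
    """Info_A(D): entropy of each partition, weighted by partition size."""
    total = sum(sum(p) for p in partitions)
    return sum((sum(p) / total) * entropy(p) for p in partitions)

# D: 9 tuples with buys_computer = "yes", 5 with "no"
info_d = entropy([9, 5])                                  # I(9,5), about 0.940

# Splitting on age gives three partitions with (yes, no) counts (2,3), (4,0), (3,2)
info_age = info_after_split([[2, 3], [4, 0], [3, 2]])     # about 0.694

print(round(info_d, 3), round(info_age, 3), round(info_d - info_age, 3))  # Gain(age) about 0.246
```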
Gini index (CART, IBM IntelligentMiner)

If a data set D contains examples from n classes, the gini index gini(D) is defined as

    gini(D) = 1 - sum_{j=1..n} p_j^2

where p_j is the relative frequency of class j in D.

If a data set D is split on A into two subsets D1 and D2, the gini index gini_A(D) is defined as

    gini_A(D) = (|D1| / |D|) gini(D1) + (|D2| / |D|) gini(D2)

Reduction in impurity:

    Δgini(A) = gini(D) - gini_A(D)

The attribute that provides the smallest gini_split(D) (or the largest reduction in impurity) is chosen to split the node (we need to enumerate all the possible splitting points for each attribute).

Ex. D has 9 tuples in buys_computer = "yes" and 5 in "no":

    gini(D) = 1 - (9/14)^2 - (5/14)^2 = 0.459

Suppose the attribute income partitions D into 10 tuples in D1: {low, medium} and 4 in D2: {high}:

    gini_{income in {low,medium}}(D) = (10/14) gini(D1) + (4/14) gini(D2)

Similarly, gini_{income in {low,high}}(D) is 0.458 and gini_{income in {medium,high}}(D) is 0.450. Thus, split on {low, medium} (and {high}), since that grouping has the lowest gini index.

• All attributes are assumed continuous-valued
• May need other tools, e.g., clustering, to get the possible split values
• Can be modified for categorical attributes
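As a companion sketch (again my own illustration, not CART or IntelligentMiner code), the functions below compute gini(D) and the weighted gini index of a binary split. Only the overall 9/5 class counts come from the example above; the per-partition (yes, no) counts for the income split are assumed for illustration.

```python
def gini(counts):
    """gini(D) = 1 - sum(p_j^2) for a list of class counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def gini_split(partitions):
    """gini_A(D): gini of each partition, weighted by partition size."""
    total = sum(sum(p) for p in partitions)
    return sum((sum(p) / total) * gini(p) for p in partitions)

# D: 9 tuples with buys_computer = "yes", 5 with "no"
print(round(gini([9, 5]), 3))          # 0.459

# Binary split on income: D1 = {low, medium} with 10 tuples, D2 = {high} with 4 tuples.
# The (yes, no) counts per partition below are assumed for illustration only.
d1, d2 = [7, 3], [2, 2]
print(round(gini_split([d1, d2]), 3))  # weighted gini index for this assumed split
```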
The three measures, in general, return good results, but:
• Information gain: biased towards multivalued attributes
• Gain ratio: tends to prefer unbalanced splits in which one partition is much smaller than the others
• Gini index: biased to multivalued attributes; has difficulty when the number of classes is large; tends to favor tests that result in equal-sized partitions and purity in both partitions

Other attribute selection measures:
• CHAID: a popular decision tree algorithm; its measure is based on the χ2 test for independence
• C-SEP: performs better than information gain and gini index in certain cases
• G-statistic: has a close approximation to the χ2 distribution
• MDL (Minimal Description Length) principle: the simplest solution is preferred
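Since CHAID's splitting measure is based on a χ2 test for independence between an attribute and the class, the sketch below shows how such a test can be computed with SciPy, using the age-vs-buys_computer counts from the information-gain example (rows are the three age groups, columns are yes/no). This only illustrates the test itself, not CHAID's full split-and-merge procedure.

```python
from scipy.stats import chi2_contingency

# Contingency table of age group vs. buys_computer: (yes, no) counts per group
table = [
    [2, 3],   # first age group (e.g., age <= 30)
    [4, 0],   # second age group
    [3, 2],   # third age group
]

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}, dof = {dof}")
# A small p-value suggests the attribute and the class are not independent,
# i.e., the attribute is informative for splitting.
```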
Overfitting: An induced tree may overfit the training data
• Too many branches, some may reflect anomalies due to noise or outliers
• Poor accuracy for unseen samples
Two approaches to avoid overfitting:
• Prepruning: Halt tree construction early; do not split a node if this would result in the goodness measure falling below a threshold
  - Difficult to choose an appropriate threshold
• Postpruning: Remove branches from a "fully grown" tree to get a sequence of progressively pruned trees
  - Use a set of data different from the training data to decide which is the "best pruned tree"
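The notes describe postpruning abstractly (grow a full tree, generate progressively pruned versions, and pick the best one on held-out data) without naming an algorithm. The sketch below shows one concrete way to do this with scikit-learn's cost-complexity pruning; the synthetic dataset and the choice of cost-complexity pruning are this sketch's assumptions, not something stated in the original text.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data; the validation split plays the role of the
# "set of data different from the training data".
X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# The pruning path yields the alphas that generate a sequence of
# progressively pruned trees, from the fully grown tree down to the root.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

best_alpha, best_score = 0.0, -1.0
for alpha in path.ccp_alphas:
    alpha = max(alpha, 0.0)  # guard against tiny negative values from floating-point error
    tree = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    score = tree.score(X_val, y_val)   # accuracy on the held-out validation set
    if score > best_score:
        best_alpha, best_score = alpha, score

print(f"best pruned tree: ccp_alpha={best_alpha:.5f}, validation accuracy={best_score:.3f}")
```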
Improvements
• Allow for continuous-valued attributes
  - Dynamically define new discrete-valued attributes that partition the continuous attribute value into a discrete set of intervals (see the split-point sketch after this list)
• Handle missing attribute values
  - Assign the most common value of the attribute
  - Assign a probability to each of the possible values
• Attribute construction
  - Create new attributes based on existing ones that are sparsely represented
  - This reduces fragmentation, repetition, and replication
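The following is a small, self-contained Python sketch of one common way to discretize a continuous attribute for a binary split: sort the values, evaluate candidate thresholds at midpoints between adjacent distinct values, and keep the threshold with the highest information gain. The helper names (entropy, best_split_point) and the sample data are my own, not taken from the original notes.

```python
from math import log2

def entropy(labels):
    """Info(D) for a list of class labels."""
    total = len(labels)
    probs = [labels.count(c) / total for c in set(labels)]
    return -sum(p * log2(p) for p in probs if p > 0)

def best_split_point(values, labels):
    """Return (threshold, gain) for the best binary split 'value <= threshold'."""
    base = entropy(labels)
    pairs = sorted(zip(values, labels))
    best = (None, 0.0)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                          # only split between distinct values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lbl for v, lbl in pairs if v <= threshold]
        right = [lbl for v, lbl in pairs if v > threshold]
        info_a = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        gain = base - info_a
        if gain > best[1]:
            best = (threshold, gain)
    return best

# Illustrative ages and buys_computer labels (made-up sample, not the slide data)
ages   = [23, 25, 30, 35, 40, 46, 52, 60]
labels = ["no", "no", "yes", "yes", "yes", "yes", "no", "no"]
print(best_split_point(ages, labels))
```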
