Example_Classification

[Figure: example decision tree for the classification task; leaf labels: no, yes, yes]
Attribute Selection: Information Gain

• Class P: buys_computer = “yes” (9 tuples)
• Class N: buys_computer = “no” (5 tuples)

Info(D) = I(9, 5) = −(9/14) log₂(9/14) − (5/14) log₂(5/14) = 0.940
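To sanity-check this arithmetic, here is a minimal Python sketch (not from the slides; the helper name info is illustrative):

```python
from math import log2

def info(counts):
    """Expected information (entropy) Info(D) for the class counts in D."""
    n = sum(counts)
    return -sum(c / n * log2(c / n) for c in counts if c > 0)

print(info([9, 5]))  # Info(D) = I(9, 5) ≈ 0.940
```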
Binary Attributes: Computing Gini Index

• Splits into two partitions.
• Gini index of a data set D: gini(D) = 1 − Σ_{j=1}^{n} p_j², where p_j is the relative frequency of class j in D.
• Effect of weighting partitions: larger and purer partitions are preferred.

Example: the parent node holds C1 = 6 and C2 = 6 tuples, so Gini(Parent) = 1 − (6/12)² − (6/12)² = 0.500. A binary split on attribute B sends 7 tuples to node N1 (C1 = 5, C2 = 2) and 5 tuples to node N2 (C1 = 1, C2 = 4):

Gini(N1) = 1 − (5/7)² − (2/7)² = 0.408
Gini(N2) = 1 − (1/5)² − (4/5)² = 0.320
Gini(Children) = (7/12) × 0.408 + (5/12) × 0.320 = 0.371
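A small sketch that reproduces these numbers, assuming per-node class counts as input (the helper names gini and gini_split are illustrative, not from the slides):

```python
def gini(counts):
    """Gini index for a node with the given class counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def gini_split(partitions):
    """Weighted Gini of a candidate split, given class counts per partition."""
    n = sum(sum(p) for p in partitions)
    return sum(sum(p) / n * gini(p) for p in partitions)

print(round(gini([6, 6]), 3))                  # parent node: 0.5
print(round(gini_split([[5, 2], [1, 4]]), 3))  # children of split on B: 0.371
```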
Categorical Attributes: Computing Gini Index
¨ For each distinct value, gather counts for each class in the dataset
¨ Use the count matrix to make decisions
Multi-way split Two-way split
(find best partition of values)
¤ Use Binary Decisions based on one splitting value 1 Yes Single 125K No
2 No Married 100K No
¤ Number of possible splitting values = Number of distinct values 4 Yes Married 120K No
-1
¤ Typically, the midpoint between each pair of adjacent values is 5 No Divorced 95K Yes
considered as a
possible split point 6 No Married 60K No
7 Yes Divorced 220K No
n (ai+ai+1)/2 is the midpoint between the values of ai and ai+1 8 No Single 85K Yes
¨ Each splitting value has a count matrix associated with it 9 No Married 75K No
10 No Single 90K Yes
26
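As an illustration of the count-matrix idea for a categorical attribute, the following sketch tallies class counts per distinct Marital Status value from the table above and scores a multi-way split (the resulting Gini of 0.3 is a computed illustration, not a number from the slides):

```python
from collections import Counter, defaultdict

# (Marital Status, Cheat) pairs from rows 1-10 of the table above.
rows = [("Single", "No"), ("Married", "No"), ("Single", "No"), ("Married", "No"),
        ("Divorced", "Yes"), ("Married", "No"), ("Divorced", "No"),
        ("Single", "Yes"), ("Married", "No"), ("Single", "Yes")]

# Count matrix: class counts for each distinct attribute value.
matrix = defaultdict(Counter)
for value, label in rows:
    matrix[value][label] += 1

def gini(counts):
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

# Multi-way split: one partition per distinct value, weighted by size.
n = len(rows)
g = sum(sum(c.values()) / n * gini(list(c.values())) for c in matrix.values())
print(dict(matrix), round(g, 3))  # Gini of the multi-way split: 0.3
```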
Continuous Attributes: Computing Gini Index or Expected Information Requirement

First decide the splitting value to discretize the attribute. For efficient computation, for each attribute:
Step 1: Sort the attribute on its values.
Step 2: Linearly scan these values, each time updating the count matrix.

Sorted values of Taxable Income: 60, 70, 75, 85, 90, 95, 100, 120, 125, 220
Candidate splitting values (midpoints between adjacent values, plus the two endpoints):

          55      65      72      80      87      92      97      110     122     172     230
        <=   >  <=   >  <=   >  <=   >  <=   >  <=   >  <=   >  <=   >  <=   >  <=   >  <=   >
Yes      0   3   0   3   0   3   0   3   1   2   2   1   3   0   3   0   3   0   3   0   3   0
No       0   7   1   6   2   5   3   4   3   4   3   4   3   4   4   3   5   2   6   1   7   0
Gini    0.420   0.400   0.375   0.343   0.417   0.400   0.300   0.343   0.375   0.400   0.420

For each splitting value v, its count matrix records how many data tuples have: (a) Taxable Income <= v with class label “Yes”, (b) Taxable Income <= v with class label “No”, (c) Taxable Income > v with class label “Yes”, (d) Taxable Income > v with class label “No”. For example, the matrix for v = 65 is the second column above; as the scan advances to the next candidate (72, 80, …, 230), the matrix is updated incrementally.
Step 3: Compute the Gini index at each candidate split and choose the split position with the least Gini index. For each splitting value v (e.g., 65), compute

gini_{Taxable Income}(D) = (|D1| / |D|) × gini(D1) + (|D2| / |D|) × gini(D2)

where D1 and D2 are the two partitions based on v: D1 contains the tuples with Taxable Income <= v and D2 those with Taxable Income > v.
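Plugging in the count matrix for the split v = 97 (6 tuples with Taxable Income <= 97: 3 Yes, 3 No; 4 tuples above: 0 Yes, 4 No):

gini(D1) = 1 − (3/6)² − (3/6)² = 0.500
gini(D2) = 1 − (0/4)² − (4/4)² = 0
gini_{Taxable Income}(D) = (6/10) × 0.500 + (4/10) × 0 = 0.300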
Choose the splitting value with the least Gini index (= 97, Gini = 0.300) to discretize Taxable Income.
If information gain is used for attribute selection, then similarly to the Gini index calculation, compute for each splitting value the expected information requirement

Info_{Taxable Income}(D) = Σ_{j=1}^{2} (|Dj| / |D|) × Info(Dj)

and choose the split position with the least value.
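For instance, at the same split v = 97 (a computation derived from the count matrix above, not shown on the slides):

Info(D1) = I(3, 3) = 1 bit, Info(D2) = I(0, 4) = 0, so
Info_{Taxable Income}(D) = (6/10) × 1 + (4/10) × 0 = 0.6 bits.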
At each level of the decision tree, attribute selection thus proceeds in two steps: (1) first, discretize a continuous attribute by deciding its splitting value; (2) then, compare the discretized attribute with the other attributes in terms of Gini index reduction or information gain.
Note that for each attribute, the data tuples are scanned only once.
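A minimal sketch of the whole sort-and-scan procedure, assuming the ten (Taxable Income, Cheat) pairs and the candidate split list from the table above (function and variable names are illustrative):

```python
def gini(yes, no):
    """Gini index of a partition holding `yes` + `no` tuples."""
    n = yes + no
    if n == 0:
        return 0.0
    return 1.0 - (yes / n) ** 2 - (no / n) ** 2

data = [(125, "No"), (100, "No"), (70, "No"), (120, "No"), (95, "Yes"),
        (60, "No"), (220, "No"), (85, "Yes"), (75, "No"), (90, "Yes")]

data.sort()                                   # Step 1: sort on attribute values
total_yes = sum(1 for _, c in data if c == "Yes")
total_no = len(data) - total_yes

candidates = [55, 65, 72, 80, 87, 92, 97, 110, 122, 172, 230]
best_v, best_gini = None, float("inf")
yes_le = no_le = 0                            # count matrix: class counts with value <= v
i = 0
for v in candidates:                          # Step 2: one linear scan, updating counts
    while i < len(data) and data[i][0] <= v:
        if data[i][1] == "Yes":
            yes_le += 1
        else:
            no_le += 1
        i += 1
    n_le = yes_le + no_le
    n_gt = len(data) - n_le
    g = (n_le * gini(yes_le, no_le)
         + n_gt * gini(total_yes - yes_le, total_no - no_le)) / len(data)
    if g < best_gini:                         # Step 3: keep the split with least Gini
        best_v, best_gini = v, g

print(best_v, best_gini)                      # expect v = 97, Gini = 0.300
```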
Naïve Bayes Classifier: Training Dataset

Classes:
C1: buys_computer = ‘yes’
C2: buys_computer = ‘no’

Data to be classified: X = (age <= 30, income = medium, student = yes, credit_rating = fair)

age     income   student  credit_rating  buys_computer
<=30    high     no       fair           no
<=30    high     no       excellent      no
31…40   high     no       fair           yes
>40     medium   no       fair           yes
>40     low      yes      fair           yes
>40     low      yes      excellent      no
31…40   low      yes      excellent      yes
<=30    medium   no       fair           no
<=30    low      yes      fair           yes
>40     medium   yes      fair           yes
<=30    medium   yes      excellent      yes
31…40   medium   no       excellent      yes
31…40   high     yes      fair           yes
>40     medium   no       excellent      no
Naïve Bayes Classifier: An Example

• Prior probability P(Ci):
P(buys_computer = “yes”) = 9/14 = 0.643
P(buys_computer = “no”) = 5/14 = 0.357

• Compute P(X|Ci) for each class, where X = (age <= 30, income = medium, student = yes, credit_rating = fair).
• Compute the conditional probabilities P(Xi|Ci):
P(age = “<=30” | buys_computer = “yes”) = 2/9 = 0.222
P(age = “<=30” | buys_computer = “no”) = 3/5 = 0.600
P(income = “medium” | buys_computer = “yes”) = 4/9 = 0.444
P(income = “medium” | buys_computer = “no”) = 2/5 = 0.400
P(student = “yes” | buys_computer = “yes”) = 6/9 = 0.667
P(student = “yes” | buys_computer = “no”) = 1/5 = 0.200
P(credit_rating = “fair” | buys_computer = “yes”) = 6/9 = 0.667
P(credit_rating = “fair” | buys_computer = “no”) = 2/5 = 0.400

• Multiply to get P(X|Ci):
P(X | buys_computer = “yes”) = 0.222 × 0.444 × 0.667 × 0.667 = 0.044
P(X | buys_computer = “no”) = 0.600 × 0.400 × 0.200 × 0.400 = 0.019

• Take the prior probabilities into account, P(X|Ci) × P(Ci):
P(X | buys_computer = “yes”) × P(buys_computer = “yes”) = 0.044 × 0.643 = 0.028
P(X | buys_computer = “no”) × P(buys_computer = “no”) = 0.019 × 0.357 = 0.007
• Prediction: since 0.028 > 0.007, the classifier predicts buys_computer = “yes” for X = (age <= 30, income = medium, student = yes, credit_rating = fair).
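A compact sketch of the whole computation, assuming the 14-tuple table above (the tuple layout and variable names are illustrative):

```python
# Naïve Bayes over the buys_computer table
# (attribute order: age, income, student, credit_rating; class label last).
D = [("<=30","high","no","fair","no"), ("<=30","high","no","excellent","no"),
     ("31…40","high","no","fair","yes"), (">40","medium","no","fair","yes"),
     (">40","low","yes","fair","yes"), (">40","low","yes","excellent","no"),
     ("31…40","low","yes","excellent","yes"), ("<=30","medium","no","fair","no"),
     ("<=30","low","yes","fair","yes"), (">40","medium","yes","fair","yes"),
     ("<=30","medium","yes","excellent","yes"), ("31…40","medium","no","excellent","yes"),
     ("31…40","high","yes","fair","yes"), (">40","medium","no","excellent","no")]

X = ("<=30", "medium", "yes", "fair")

scores = {}
for c in ("yes", "no"):
    rows = [t for t in D if t[-1] == c]
    prior = len(rows) / len(D)                 # P(Ci)
    likelihood = 1.0
    for k, v in enumerate(X):                  # product of P(Xi|Ci)
        likelihood *= sum(1 for t in rows if t[k] == v) / len(rows)
    scores[c] = prior * likelihood             # P(X|Ci) * P(Ci)

print(scores)                        # yes ≈ 0.028, no ≈ 0.007
print(max(scores, key=scores.get))   # -> "yes"
```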
ROC Calculation

Test instances, ranked by decreasing classifier score (“Prediction”), with their actual classes:

Instance  Prediction  Actual
x1        0.85        Yes
x2        0.75        No
x3        0.65        Yes
x4        0.4         No
x5        0.3         No
Sweep a threshold t from above the highest score down past the lowest; at each step, instances with score >= t are classified as Yes. At each threshold, compute the true positive rate TPR = TP / (TP + FN) and the false positive rate FPR = FP / (FP + TN), and plot the point (FPR, TPR). For example, at t = 0.7 the predicted positives are x1 (a true positive) and x2 (a false positive), giving TPR = 1/2 = 0.5 and FPR = 1/3 ≈ 0.334. Once the threshold drops below the lowest score (0.3), every instance is classified as Yes, so TPR = FPR = 1.0. Connecting the points obtained at successive thresholds traces the ROC curve.
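A short sketch of this sweep over the five instances above (assuming the scores are already sorted in decreasing order; names are illustrative):

```python
# Compute the (FPR, TPR) points of the ROC curve by lowering the threshold
# one instance at a time.
scored = [(0.85, "Yes"), (0.75, "No"), (0.65, "Yes"), (0.4, "No"), (0.3, "No")]
P = sum(1 for _, y in scored if y == "Yes")   # number of actual positives
N = len(scored) - P                           # number of actual negatives

points = [(0.0, 0.0)]                         # threshold above every score
tp = fp = 0
for score, label in scored:                   # sorted by decreasing score
    if label == "Yes":
        tp += 1
    else:
        fp += 1
    points.append((fp / N, tp / P))           # threshold just below `score`

print(points)  # includes (0.333…, 0.5) at threshold ≈ 0.7; ends at (1.0, 1.0)
```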