
Data Mining

Classification: Alternative Techniques

Lecture Notes for Chapter 4

Rule-Based

Introduction to Data Mining, 2nd Edition


by
Tan, Steinbach, Karpatne, Kumar
Rule-Based Classifier

Classify records by using a collection of "if…then…" rules

Rule: (Condition) → y
– where
◆ Condition is a conjunction of tests on attributes
◆ y is the class label
– Examples of classification rules:
◆ (Blood Type=Warm) ∧ (Lay Eggs=Yes) → Birds
◆ (Taxable Income < 50K) ∧ (Refund=Yes) → Evade=No

9/30/2020 Introduction to Data Mining, 2nd Edition 2


Rule-based Classifier (Example)
Name Blood Type Give Birth Can Fly Live in Water Class
human warm yes no no mammals
python cold no no no reptiles
salmon cold no no yes fishes
whale warm yes no yes mammals
frog cold no no sometimes amphibians
komodo cold no no no reptiles
bat warm yes yes no mammals
pigeon warm no yes no birds
cat warm yes no no mammals
leopard shark cold yes no yes fishes
turtle cold no no sometimes reptiles
penguin warm no no sometimes birds
porcupine warm yes no no mammals
eel cold no no yes fishes
salamander cold no no sometimes amphibians
gila monster cold no no no reptiles
platypus warm no no no mammals
owl warm no yes no birds
dolphin warm yes no yes mammals
eagle warm no yes no birds

R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians
9/30/2020 Introduction to Data Mining, 2nd Edition 3
Application of Rule-Based Classifier

A rule r covers an instance x if the attributes of the instance satisfy the condition of the rule

R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians

Name Blood Type Give Birth Can Fly Live in Water Class
hawk warm no yes no ?
grizzly bear warm yes no no ?

The rule R1 covers a hawk => Bird


The rule R3 covers the grizzly bear => Mammal

9/30/2020 Introduction to Data Mining, 2nd Edition 4


Rule Coverage and Accuracy

Coverage of a rule:
– Fraction of records that satisfy the antecedent of a rule

Accuracy of a rule:
– Fraction of records that satisfy the antecedent that also satisfy the consequent of a rule

Tid Refund Marital Status Taxable Income Class
1 Yes Single 125K No
2 No Married 100K No
3 No Single 70K No
4 Yes Married 120K No
5 No Divorced 95K Yes
6 No Married 60K No
7 Yes Divorced 220K No
8 No Single 85K Yes
9 No Married 75K No
10 No Single 90K Yes

(Status=Single) → No
Coverage = 40%, Accuracy = 50%

9/30/2020 Introduction to Data Mining, 2nd Edition 5
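A minimal sketch, not from the slides, that computes the coverage and accuracy of the rule (Status=Single) → No on the ten training records above (plain Python, with the table hard-coded for illustration):

records = [
    {"Tid": 1,  "Refund": "Yes", "Status": "Single",   "Income": 125, "Class": "No"},
    {"Tid": 2,  "Refund": "No",  "Status": "Married",  "Income": 100, "Class": "No"},
    {"Tid": 3,  "Refund": "No",  "Status": "Single",   "Income": 70,  "Class": "No"},
    {"Tid": 4,  "Refund": "Yes", "Status": "Married",  "Income": 120, "Class": "No"},
    {"Tid": 5,  "Refund": "No",  "Status": "Divorced", "Income": 95,  "Class": "Yes"},
    {"Tid": 6,  "Refund": "No",  "Status": "Married",  "Income": 60,  "Class": "No"},
    {"Tid": 7,  "Refund": "Yes", "Status": "Divorced", "Income": 220, "Class": "No"},
    {"Tid": 8,  "Refund": "No",  "Status": "Single",   "Income": 85,  "Class": "Yes"},
    {"Tid": 9,  "Refund": "No",  "Status": "Married",  "Income": 75,  "Class": "No"},
    {"Tid": 10, "Refund": "No",  "Status": "Single",   "Income": 90,  "Class": "Yes"},
]

def antecedent(r):                    # condition part of the rule
    return r["Status"] == "Single"

consequent = "No"                     # class label part of the rule

covered = [r for r in records if antecedent(r)]
coverage = len(covered) / len(records)                                     # 4/10 = 40%
accuracy = sum(r["Class"] == consequent for r in covered) / len(covered)   # 2/4 = 50%
print(coverage, accuracy)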


How does Rule-based Classifier Work?

R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians

Name Blood Type Give Birth Can Fly Live in Water Class
lemur warm yes no no ?
turtle cold no no sometimes ?
dogfish shark cold yes no yes ?

A lemur triggers rule R3, so it is classified as a mammal


A turtle triggers both R4 and R5
A dogfish shark triggers none of the rules

9/30/2020 Introduction to Data Mining, 2nd Edition 6


Characteristics of Rule Sets: Strategy 1

Mutually exclusive rules


– Classifier contains mutually exclusive rules if
the rules are independent of each other
– Every record is covered by at most one rule

Exhaustive rules
– Classifier has exhaustive coverage if it
accounts for every possible combination of
attribute values
– Each record is covered by at least one rule
9/30/2020 Introduction to Data Mining, 2nd Edition 7
Characteristics of Rule Sets: Strategy 2

Rules are not mutually exclusive


– A record may trigger more than one rule
– Solution?
◆ Ordered rule set
◆ Unordered rule set – use voting schemes

Rules are not exhaustive


– A record may not trigger any rules
– Solution?
◆ Use a default class
9/30/2020 Introduction to Data Mining, 2nd Edition 8
Ordered Rule Set

Rules are rank ordered according to their priority


– An ordered rule set is known as a decision list
When a test record is presented to the classifier
– It is assigned to the class label of the highest ranked rule it has
triggered
– If none of the rules fired, it is assigned to the default class

R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians

Name Blood Type Give Birth Can Fly Live in Water Class
turtle cold no no sometimes ?
9/30/2020 Introduction to Data Mining, 2nd Edition 9
Rule Ordering Schemes

Rule-based ordering
– Individual rules are ranked based on their quality
Class-based ordering
– Rules that belong to the same class appear together

Rule-based Ordering:
(Refund=Yes) ==> No
(Refund=No, Marital Status={Single,Divorced}, Taxable Income<80K) ==> No
(Refund=No, Marital Status={Single,Divorced}, Taxable Income>80K) ==> Yes
(Refund=No, Marital Status={Married}) ==> No

Class-based Ordering:
(Refund=Yes) ==> No
(Refund=No, Marital Status={Single,Divorced}, Taxable Income<80K) ==> No
(Refund=No, Marital Status={Married}) ==> No
(Refund=No, Marital Status={Single,Divorced}, Taxable Income>80K) ==> Yes

9/30/2020 Introduction to Data Mining, 2nd Edition 10


Building Classification Rules

Direct Method:
◆ Extract rules directly from data
◆ Examples: RIPPER, CN2, Holte’s 1R

Indirect Method:
◆ Extract rules from other classification models (e.g.
decision trees, neural networks, etc).
◆ Examples: C4.5rules

9/30/2020 Introduction to Data Mining, 2nd Edition 11


Direct Method: Sequential Covering

1. Start from an empty rule


2. Grow a rule using the Learn-One-Rule function
3. Remove training records covered by the rule
4. Repeat Steps (2) and (3) until the stopping criterion
is met

9/30/2020 Introduction to Data Mining, 2nd Edition 12
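A minimal sketch, not from the slides, of the sequential covering loop; learn_one_rule and rule.covers(record) are hypothetical stand-ins for the Learn-One-Rule function and a rule-matching test:

def sequential_covering(records, target_class, learn_one_rule, min_covered=1):
    rules = []
    remaining = list(records)                           # step 1: start with all records
    while remaining:
        rule = learn_one_rule(remaining, target_class)  # step 2: grow one rule
        covered = [r for r in remaining if rule.covers(r)]
        if len(covered) < min_covered:                  # stopping criterion
            break
        rules.append(rule)
        # step 3: remove training records covered by the rule
        remaining = [r for r in remaining if not rule.covers(r)]
    return rules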


Example of Sequential Covering

[Figure: (i) Original Data; (ii) Step 1]

9/30/2020 Introduction to Data Mining, 2nd Edition 13


Example of Sequential Covering…

[Figure: (iii) Step 2 – rule R1 covers part of the data; (iv) Step 3 – rule R2 covers more of the remaining data]

9/30/2020 Introduction to Data Mining, 2nd Edition 14


Rule Growing

Two common strategies

[Figure:
(a) General-to-specific: start from the empty rule {} (Yes: 3, No: 4) and consider candidate conjuncts such as Refund=No (Yes: 3, No: 4), Status=Single (Yes: 2, No: 1), Status=Divorced (Yes: 1, No: 0), Status=Married (Yes: 0, No: 3), Income>80K (Yes: 3, No: 1), keeping the best one, e.g., growing toward (Refund=No, Status=Single) → Class=Yes.
(b) Specific-to-general: start from a specific rule such as (Refund=No, Status=Single, Income=85K) → Class=Yes or (Refund=No, Status=Single, Income=90K) → Class=Yes and generalize by removing conjuncts.]

9/30/2020 Introduction to Data Mining, 2nd Edition 15


Rule Evaluation

FOIL (First Order Inductive Learner) – an early rule-based learning algorithm

Foil's Information Gain
– R0: {} => class (initial rule)
– R1: {A} => class (rule after adding conjunct)

– Gain(R0, R1) = p1 × [ log2( p1 / (p1 + n1) ) − log2( p0 / (p0 + n0) ) ]

– p0: number of positive instances covered by R0
  n0: number of negative instances covered by R0
  p1: number of positive instances covered by R1
  n1: number of negative instances covered by R1

9/30/2020 Introduction to Data Mining, 2nd Edition 16
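A minimal sketch, not from the slides, of FOIL's information gain as defined above; the counts in the example call are made up for illustration:

import math

def foil_gain(p0, n0, p1, n1):
    # Gain(R0, R1) where R0 covers p0 positive / n0 negative instances
    # and R1 (R0 plus one conjunct) covers p1 positive / n1 negative instances.
    return p1 * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

# Example: R0 covers 10 positives and 10 negatives; the candidate conjunct A
# leaves 6 positives and 1 negative covered.
print(foil_gain(10, 10, 6, 1))   # ~4.67 bits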


Direct Method: RIPPER

For 2-class problem, choose one of the classes as


positive class, and the other as negative class
– Learn rules for positive class
– Negative class will be default class
For multi-class problem
– Order the classes according to increasing class
prevalence (fraction of instances that belong to a
particular class)
– Learn the rule set for smallest class first, treat the rest
as negative class
– Repeat with next smallest class as positive class

9/30/2020 Introduction to Data Mining, 2nd Edition 17


Direct Method: RIPPER

Growing a rule:
– Start from empty rule
– Add conjuncts as long as they improve FOIL’s
information gain
– Stop when rule no longer covers negative examples
– Prune the rule immediately using incremental reduced
error pruning
– Measure for pruning: v = (p-n)/(p+n)
◆ p: number of positive examples covered by the rule in
the validation set
◆ n: number of negative examples covered by the rule in
the validation set
– Pruning method: delete any final sequence of
conditions that maximizes v
9/30/2020 Introduction to Data Mining, 2nd Edition 18
Direct Method: RIPPER

Building a Rule Set:


– Use sequential covering algorithm
◆ Finds the best rule that covers the current set of
positive examples
◆ Eliminate both positive and negative examples
covered by the rule
– Each time a rule is added to the rule set,
compute the new description length
◆ Stop adding new rules when the new description
length is d bits longer than the smallest description
length obtained so far

9/30/2020 Introduction to Data Mining, 2nd Edition 19


Direct Method: RIPPER

Optimize the rule set:


– For each rule r in the rule set R
◆ Consider 2 alternative rules:
– Replacement rule (r*): grow a new rule from scratch
– Revised rule (r′): add conjuncts to extend the rule r
◆ Compare the rule set for r against the rule sets for r* and r′
◆ Choose the rule set that minimizes the description length (MDL principle)

– Repeat rule generation and rule optimization


for the remaining positive examples

9/30/2020 Introduction to Data Mining, 2nd Edition 20


Indirect Methods

[Figure: decision tree with root P; the P=No branch splits on Q, the P=Yes branch splits on R, and the R=Yes branch splits again on Q]

Rule Set:
r1: (P=No, Q=No) ==> -
r2: (P=No, Q=Yes) ==> +
r3: (P=Yes, R=No) ==> +
r4: (P=Yes, R=Yes, Q=No) ==> -
r5: (P=Yes, R=Yes, Q=Yes) ==> +

9/30/2020 Introduction to Data Mining, 2nd Edition 21


Indirect Method: C4.5rules

Extract rules from an unpruned decision tree


For each rule r: A → y,
– consider an alternative rule r′: A′ → y where A′ is obtained by removing one of the conjuncts in A
– Compare the pessimistic error rate for r against all alternatives r′
– Prune if one of the alternative rules has a lower pessimistic error rate
– Repeat until we can no longer improve the generalization error

9/30/2020 Introduction to Data Mining, 2nd Edition 22


Indirect Method: C4.5rules

Instead of ordering the rules, order subsets of


rules (class ordering)
– Each subset is a collection of rules with the
same rule consequent (class)
– Compute description length of each subset
◆ Description length = L(error) + g L(model)
◆ g is a parameter that takes into account the
presence of redundant attributes in a rule set
(default value = 0.5)

9/30/2020 Introduction to Data Mining, 2nd Edition 23


Example
Name Give Birth Lay Eggs Can Fly Live in Water Have Legs Class
human yes no no no yes mammals
python no yes no no no reptiles
salmon no yes no yes no fishes
whale yes no no yes no mammals
frog no yes no sometimes yes amphibians
komodo no yes no no yes reptiles
bat yes no yes no yes mammals
pigeon no yes yes no yes birds
cat yes no no no yes mammals
leopard shark yes no no yes no fishes
turtle no yes no sometimes yes reptiles
penguin no yes no sometimes yes birds
porcupine yes no no no yes mammals
eel no yes no yes no fishes
salamander no yes no sometimes yes amphibians
gila monster no yes no no yes reptiles
platypus no yes no no yes mammals
owl no yes yes no yes birds
dolphin yes no no yes no mammals
eagle no yes yes no yes birds

9/30/2020 Introduction to Data Mining, 2nd Edition 24


C4.5 versus C4.5rules versus RIPPER

C4.5rules:
(Give Birth=No, Can Fly=Yes) → Birds
(Give Birth=No, Live in Water=Yes) → Fishes
(Give Birth=Yes) → Mammals
(Give Birth=No, Can Fly=No, Live in Water=No) → Reptiles
( ) → Amphibians

RIPPER:
(Live in Water=Yes) → Fishes
(Have Legs=No) → Reptiles
(Give Birth=No, Can Fly=No, Live In Water=No) → Reptiles
(Can Fly=Yes, Give Birth=No) → Birds
() → Mammals

[Figure: the corresponding C4.5 decision tree – root "Give Birth?": Yes → Mammals; No → "Live In Water?": Yes → Fishes, Sometimes → Amphibians, No → "Can Fly?": Yes → Birds, No → Reptiles]

9/30/2020 Introduction to Data Mining, 2nd Edition 25


C4.5 versus C4.5rules versus RIPPER

C4.5 and C4.5rules:


PREDICTED CLASS
Amphibians Fishes Reptiles Birds Mammals
ACTUAL Amphibians 2 0 0 0 0
CLASS Fishes 0 2 0 0 1
Reptiles 1 0 3 0 0
Birds 1 0 0 3 0
Mammals 0 0 1 0 6
RIPPER:
PREDICTED CLASS
Amphibians Fishes Reptiles Birds Mammals
ACTUAL Amphibians 0 0 0 0 2
CLASS Fishes 0 3 0 0 0
Reptiles 0 0 3 0 1
Birds 0 0 1 2 1
Mammals 0 2 1 0 4

9/30/2020 Introduction to Data Mining, 2nd Edition 26


Advantages of Rule-Based Classifiers

Has characteristics quite similar to decision trees
– As highly expressive as decision trees
– Easy to interpret (if rules are ordered by class)
– Performance comparable to decision trees
◆ Can handle redundant and irrelevant attributes
◆ Variable interaction can cause issues (e.g., X-OR problem)

Better suited for handling imbalanced classes

Harder to handle missing values in the test set

9/30/2020 Introduction to Data Mining, 2nd Edition 27


Data Mining
Classification: Alternative Techniques

Lecture Notes for Chapter 4

Instance-Based Learning

Introduction to Data Mining, 2nd Edition


by
Tan, Steinbach, Karpatne, Kumar
Nearest Neighbor Classifiers

Basic idea:
– If it walks like a duck, quacks like a duck, then
it’s probably a duck

[Figure: compute the distance from the test record to the training records, then choose the k "nearest" records]

2/10/2021 Introduction to Data Mining, 2nd Edition 2


Nearest-Neighbor Classifiers
Requires the following:
– A set of labeled records
– A proximity metric to compute the distance/similarity between a pair of records (e.g., Euclidean distance)
– The value of k, the number of nearest neighbors to retrieve
– A method for using the class labels of the k nearest neighbors to determine the class label of the unknown record (e.g., by taking a majority vote)

2/10/2021 Introduction to Data Mining, 2nd Edition 3


How to Determine the class label of a Test Sample?

Take the majority vote of class labels among the k nearest neighbors

Weight the vote according to distance
– weight factor, w = 1/d²

2/10/2021 Introduction to Data Mining, 2nd Edition 4
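A minimal sketch, not from the slides, of k-nearest-neighbor classification with distance-weighted voting (weight w = 1/d²); the tiny training set is made up for illustration:

import math
from collections import defaultdict

def knn_predict(train, test_point, k=3):
    # train: list of (feature_vector, class_label) pairs
    nearest = sorted((math.dist(x, test_point), y) for x, y in train)[:k]
    votes = defaultdict(float)
    for d, y in nearest:
        votes[y] += 1.0 / (d * d + 1e-12)   # small constant guards against d = 0
    return max(votes, key=votes.get)

train = [((1.0, 1.0), "+"), ((1.2, 0.8), "+"), ((3.0, 3.2), "-"), ((2.9, 3.1), "-")]
print(knn_predict(train, (1.1, 1.0), k=3))   # "+"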


Choice of proximity measure matters

For documents, cosine is better than correlation or Euclidean distance

Pair 1: 111111111110 vs 011111111111
Pair 2: 000000000001 vs 100000000000

The Euclidean distance is 1.4142 for both pairs, but the cosine similarity measure has different values for these pairs (about 0.91 for the first pair and 0 for the second).

2/10/2021 Introduction to Data Mining, 2nd Edition 5
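A minimal sketch, not from the slides, that reproduces the comparison above: the two pairs have the same Euclidean distance but very different cosine similarity:

import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

pair1 = ([1] * 11 + [0], [0] + [1] * 11)   # 111111111110 vs 011111111111
pair2 = ([0] * 11 + [1], [1] + [0] * 11)   # 000000000001 vs 100000000000

print(euclidean(*pair1), cosine(*pair1))   # 1.4142..., 0.909...
print(euclidean(*pair2), cosine(*pair2))   # 1.4142..., 0.0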


Nearest Neighbor Classification…

Data preprocessing is often required
– Attributes may have to be scaled to prevent distance measures from being dominated by one of the attributes
◆ Example:
  – height of a person may vary from 1.5m to 1.8m
  – weight of a person may vary from 90lb to 300lb
  – income of a person may vary from $10K to $1M
– Time series are often standardized to have zero mean and a standard deviation of 1

2/10/2021 Introduction to Data Mining, 2nd Edition 6


Nearest Neighbor Classification…

Choosing the value of k:


– If k is too small, sensitive to noise points
– If k is too large, neighborhood may include points from
other classes

2/10/2021 Introduction to Data Mining, 2nd Edition 7


Nearest-neighbor classifiers

Nearest neighbor classifiers are local classifiers

They can produce decision boundaries of arbitrary shapes.

The 1-NN decision boundary is a Voronoi diagram.

2/10/2021 Introduction to Data Mining, 2nd Edition 8


Nearest Neighbor Classification…

How to handle missing values in training and


test sets?
– Proximity computations normally require the
presence of all attributes
– Some approaches use the subset of attributes
present in two instances
◆ This may not produce good results since it
effectively uses different proximity measures for
each pair of instances
◆ Thus, proximities are not comparable

2/10/2021 Introduction to Data Mining, 2nd Edition 9


K-NN Classifiers…
Handling Irrelevant and Redundant Attributes

– Irrelevant attributes add noise to the proximity measure


– Redundant attributes bias the proximity measure towards certain
attributes

2/10/2021 Introduction to Data Mining, 2nd Edition 10


K-NN Classifiers: Handling attributes that are interacting

2/10/2021 Introduction to Data Mining, 2nd Edition 11


Handling attributes that are interacting

2/10/2021 Introduction to Data Mining, 2nd Edition 12


Improving KNN Efficiency

Avoid having to compute distance to all objects in


the training set
– Multi-dimensional access methods (k-d trees)
– Fast approximate similarity search
– Locality Sensitive Hashing (LSH)
Condensing
– Determine a smaller set of objects that give
the same performance
Editing
– Remove objects to improve efficiency
2/10/2021 Introduction to Data Mining, 2nd Edition 13
Data Mining
Classification: Alternative Techniques

Bayesian Classifiers

Introduction to Data Mining, 2nd Edition


1

by
Tan, Steinbach, Karpatne, Kumar
Bayes Classifier

• A probabilistic framework for solving classification


problems
• Conditional Probability:
  P(Y | X) = P(X, Y) / P(X)
  P(X | Y) = P(X, Y) / P(Y)

• Bayes theorem:
  P(Y | X) = P(X | Y) P(Y) / P(X)

2/08/2021 Introduction to Data Mining, 2nd Edition 2


Using Bayes Theorem for Classification

• Consider each attribute and class label as random variables

• Given a record with attributes (X1, X2, …, Xd), the goal is to predict class Y
  – Specifically, we want to find the value of Y that maximizes P(Y | X1, X2, …, Xd)

• Can we estimate P(Y | X1, X2, …, Xd) directly from data?

Tid Refund Marital Status Taxable Income Evade
1 Yes Single 125K No
2 No Married 100K No
3 No Single 70K No
4 Yes Married 120K No
5 No Divorced 95K Yes
6 No Married 60K No
7 Yes Divorced 220K No
8 No Single 85K Yes
9 No Married 75K No
10 No Single 90K Yes

2/08/2021 Introduction to Data Mining, 2nd Edition 3


Using Bayes Theorem for Classification

• Approach:
  – compute posterior probability P(Y | X1, X2, …, Xd) using the Bayes theorem

    P(Y | X1 X2 … Xd) = P(X1 X2 … Xd | Y) P(Y) / P(X1 X2 … Xd)

– Maximum a-posteriori: Choose Y that maximizes


P(Y | X1, X2, …, Xd)

– Equivalent to choosing value of Y that maximizes


P(X1, X2, …, Xd|Y) P(Y)

• How to estimate P(X1, X2, …, Xd | Y )?


2/08/2021 Introduction to Data Mining, 2nd Edition 4
Example Data

Given a Test Record:
X = (Refund = No, Divorced, Income = 120K)

• We need to estimate P(Evade = Yes | X) and P(Evade = No | X)

• In the following we will replace Evade = Yes by Yes, and Evade = No by No

Tid Refund Marital Status Taxable Income Evade
1 Yes Single 125K No
2 No Married 100K No
3 No Single 70K No
4 Yes Married 120K No
5 No Divorced 95K Yes
6 No Married 60K No
7 Yes Divorced 220K No
8 No Single 85K Yes
9 No Married 75K No
10 No Single 90K Yes

2/08/2021 Introduction to Data Mining, 2nd Edition 5




Conditional Independence

• X and Y are conditionally independent given Z if


P(X|YZ) = P(X|Z)

• Example: Arm length and reading skills


– Young child has shorter arm length and
limited reading skills, compared to adults
– If age is fixed, no apparent relationship
between arm length and reading skills
– Arm length and reading skills are conditionally
independent given age

2/08/2021 Introduction to Data Mining, 2nd Edition 7


Naïve Bayes Classifier

• Assume independence among attributes Xi when class is given:
  – P(X1, X2, …, Xd | Yj) = P(X1 | Yj) P(X2 | Yj) … P(Xd | Yj)

  – Now we can estimate P(Xi | Yj) for all Xi and Yj combinations from the training data

  – New point is classified to Yj if P(Yj) Π P(Xi | Yj) is maximal.

2/08/2021 Introduction to Data Mining, 2nd Edition 8


Naïve Bayes on Example Data

Given a Test Record:
X = (Refund = No, Divorced, Income = 120K)

P(X | Yes) = P(Refund = No | Yes) x P(Divorced | Yes) x P(Income = 120K | Yes)

P(X | No) = P(Refund = No | No) x P(Divorced | No) x P(Income = 120K | No)

(Estimated from the training data table shown earlier.)

2/08/2021 Introduction to Data Mining, 2nd Edition 9


Estimate Probabilities from Data

• P(y) = fraction of instances of class y
  – e.g., P(No) = 7/10, P(Yes) = 3/10

• For categorical attributes:
  P(Xi = c | y) = nc / n
  – where nc is the number of instances having attribute value Xi = c and belonging to class y, and n is the number of instances of class y
  – Examples (from the training data table shown earlier):
    P(Status=Married | No) = 4/7
    P(Refund=Yes | Yes) = 0

2/08/2021 Introduction to Data Mining, 2nd Edition 10


Estimate Probabilities from Data

• For continuous attributes:


– Discretization: Partition the range into bins:
◆ Replace continuous value with bin value
– Attribute changed from continuous to ordinal

– Probability density estimation:


◆ Assume attribute follows a normal distribution
◆ Use data to estimate parameters of distribution
(e.g., mean and standard deviation)
◆ Once probability distribution is known, use it to
estimate the conditional probability P(Xi|Y)

2/08/2021 Introduction to Data Mining, 2nd Edition 11


Estimate Probabilities from Data

• Normal distribution:

  P(Xi | Yj) = 1 / sqrt(2 π σij²) × exp( −(Xi − μij)² / (2 σij²) )

  – One for each (Xi, Yj) pair

• For (Income, Class=No) in the training data shown earlier:
  – If Class=No
    ◆ sample mean = 110
    ◆ sample variance = 2975

  P(Income = 120 | No) = 1 / (sqrt(2 π) × 54.54) × exp( −(120 − 110)² / (2 × 2975) ) = 0.0072
2/08/2021 Introduction to Data Mining, 2nd Edition 12
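A minimal sketch, not from the slides, that checks the normal-density estimate P(Income = 120 | No) = 0.0072 using the sample mean and variance above:

import math

def normal_pdf(x, mean, variance):
    return math.exp(-(x - mean) ** 2 / (2 * variance)) / math.sqrt(2 * math.pi * variance)

print(normal_pdf(120, 110, 2975))   # ~0.0072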
Example of Naïve Bayes Classifier

Given a Test Record:
X = (Refund = No, Divorced, Income = 120K)

Naïve Bayes Classifier:
P(Refund = Yes | No) = 3/7
P(Refund = No | No) = 4/7
P(Refund = Yes | Yes) = 0
P(Refund = No | Yes) = 1
P(Marital Status = Single | No) = 2/7
P(Marital Status = Divorced | No) = 1/7
P(Marital Status = Married | No) = 4/7
P(Marital Status = Single | Yes) = 2/3
P(Marital Status = Divorced | Yes) = 1/3
P(Marital Status = Married | Yes) = 0

For Taxable Income:
If class = No: sample mean = 110, sample variance = 2975
If class = Yes: sample mean = 90, sample variance = 25

• P(X | No) = P(Refund=No | No) x P(Divorced | No) x P(Income=120K | No)
            = 4/7 x 1/7 x 0.0072 = 0.0006

• P(X | Yes) = P(Refund=No | Yes) x P(Divorced | Yes) x P(Income=120K | Yes)
             = 1 x 1/3 x 1.2 x 10^-9 = 4 x 10^-10

Since P(X|No)P(No) > P(X|Yes)P(Yes), therefore P(No|X) > P(Yes|X)
=> Class = No

2/08/2021 Introduction to Data Mining, 2nd Edition 13
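A minimal sketch, not from the slides, that plugs the estimated conditional probabilities above into the final naive Bayes comparison for X = (Refund = No, Divorced, Income = 120K):

p_no, p_yes = 7 / 10, 3 / 10                  # class priors
px_no = (4 / 7) * (1 / 7) * 0.0072            # P(X | No)  ~ 0.0006
px_yes = 1 * (1 / 3) * 1.2e-9                 # P(X | Yes) ~ 4e-10

print(px_no * p_no, px_yes * p_yes)           # ~0.0004 vs ~1.2e-10
print("No" if px_no * p_no > px_yes * p_yes else "Yes")   # predicted class: No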


Naïve Bayes Classifier can make decisions with partial information about attributes in the test record

Even in absence of information about any attributes, we can use the a priori probabilities of the class variable:
P(Yes) = 3/10
P(No) = 7/10

If we only know that marital status is Divorced, then:
P(Yes | Divorced) = 1/3 x 3/10 / P(Divorced)
P(No | Divorced) = 1/7 x 7/10 / P(Divorced)

If we also know that Refund = No, then:
P(Yes | Refund = No, Divorced) = 1 x 1/3 x 3/10 / P(Divorced, Refund = No)
P(No | Refund = No, Divorced) = 4/7 x 1/7 x 7/10 / P(Divorced, Refund = No)

If we also know that Taxable Income = 120, then:
P(Yes | Refund = No, Divorced, Income = 120) = 1.2 x 10^-9 x 1 x 1/3 x 3/10 / P(Divorced, Refund = No, Income = 120)
P(No | Refund = No, Divorced, Income = 120) = 0.0072 x 4/7 x 1/7 x 7/10 / P(Divorced, Refund = No, Income = 120)

(Conditional probability estimates as on the previous slide.)
2/08/2021 Introduction to Data Mining, 2nd Edition 14
Issues with Naïve Bayes Classifier

Given a Test Record:
X = (Married)

P(Yes) = 3/10
P(No) = 7/10

P(Yes | Married) = 0 x 3/10 / P(Married)
P(No | Married) = 4/7 x 7/10 / P(Married)

(Uses P(Marital Status = Married | Yes) = 0 and P(Marital Status = Married | No) = 4/7 from the estimates on the previous slides.)

2/08/2021 Introduction to Data Mining, 2nd Edition 15


Issues with Naïve Bayes Classifier

Consider the table with Tid = 7 deleted

Naïve Bayes Classifier:
P(Refund = Yes | No) = 2/6
P(Refund = No | No) = 4/6
P(Refund = Yes | Yes) = 0
P(Refund = No | Yes) = 1
P(Marital Status = Single | No) = 2/6
P(Marital Status = Divorced | No) = 0
P(Marital Status = Married | No) = 4/6
P(Marital Status = Single | Yes) = 2/3
P(Marital Status = Divorced | Yes) = 1/3
P(Marital Status = Married | Yes) = 0/3

For Taxable Income:
If class = No: sample mean = 91, sample variance = 685
If class = Yes: sample mean = 90, sample variance = 25

Given X = (Refund = Yes, Divorced, 120K)

P(X | No) = 2/6 x 0 x 0.0083 = 0
P(X | Yes) = 0 x 1/3 x 1.2 x 10^-9 = 0

Naïve Bayes will not be able to classify X as Yes or No!

2/08/2021 Introduction to Data Mining, 2nd Edition 16


Issues with Naïve Bayes Classifier

• If one of the conditional probabilities is zero, then the entire expression becomes zero
• Need to use other estimates of conditional probabilities than simple fractions

• Probability estimation:

  original:          P(Xi = c | y) = nc / n
  Laplace Estimate:  P(Xi = c | y) = (nc + 1) / (n + v)
  m-estimate:        P(Xi = c | y) = (nc + m p) / (n + m)

  n: number of training instances belonging to class y
  nc: number of instances with Xi = c and Y = y
  v: total number of attribute values that Xi can take
  p: initial estimate of P(Xi = c | y), known a priori
  m: hyper-parameter for our confidence in p

2/08/2021 Introduction to Data Mining, 2nd Edition 17
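A minimal sketch, not from the slides, of the Laplace and m-estimate corrections above, applied to P(Marital Status = Divorced | No) from the Tid-7-deleted example (nc = 0, n = 6, v = 3; the values of p and m in the call are chosen only for illustration):

def laplace_estimate(nc, n, v):
    # nc: count of class-y instances with Xi = c; n: class size; v: number of values of Xi
    return (nc + 1) / (n + v)

def m_estimate(nc, n, p, m):
    # p: prior estimate of P(Xi = c | y); m: confidence placed in that prior
    return (nc + m * p) / (n + m)

print(laplace_estimate(0, 6, 3))        # 1/9 instead of 0
print(m_estimate(0, 6, p=1/3, m=3))     # 1/9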


Example of Naïve Bayes Classifier

A: attributes, M: mammals, N: non-mammals

Name Give Birth Can Fly Live in Water Have Legs Class
human yes no no yes mammals
python no no no no non-mammals
salmon no no yes no non-mammals
whale yes no yes no mammals
frog no no sometimes yes non-mammals
komodo no no no yes non-mammals
bat yes yes no yes mammals
pigeon no yes no yes non-mammals
cat yes no no yes mammals
leopard shark yes no yes no non-mammals
turtle no no sometimes yes non-mammals
penguin no no sometimes yes non-mammals
porcupine yes no no yes mammals
eel no no yes no non-mammals
salamander no no sometimes yes non-mammals
gila monster no no no yes non-mammals
platypus no no no yes mammals
owl no yes no yes non-mammals
dolphin yes no yes no mammals
eagle no yes no yes non-mammals

Test record:
Give Birth Can Fly Live in Water Have Legs Class
yes no yes no ?

P(A | M) = 6/7 x 6/7 x 2/7 x 2/7 = 0.06
P(A | N) = 1/13 x 10/13 x 3/13 x 4/13 = 0.0042

P(A | M) P(M) = 0.06 x 7/20 = 0.021
P(A | N) P(N) = 0.004 x 13/20 = 0.0027

P(A|M)P(M) > P(A|N)P(N) => Mammals

2/08/2021 Introduction to Data Mining, 2nd Edition 18


Naïve Bayes (Summary)

• Robust to isolated noise points

• Handle missing values by ignoring the instance


during probability estimate calculations

• Robust to irrelevant attributes

• Redundant and correlated attributes will violate


class conditional assumption
– Use other techniques such as Bayesian Belief Networks (BBN)

2/08/2021 Introduction to Data Mining, 2nd Edition 19


Naïve Bayes

• How does Naïve Bayes perform on the following dataset?

Conditional independence of attributes is violated

2/08/2021 Introduction to Data Mining, 2nd Edition 20


Bayesian Belief Networks

• Provides graphical representation of probabilistic


relationships among a set of random variables
• Consists of:
  – A directed acyclic graph (DAG)
    ◆ Node corresponds to a variable
    ◆ Arc corresponds to a dependence relationship between a pair of variables
  – A probability table associating each node with its immediate parents

[Figure: a small DAG over nodes A, B and C]

2/08/2021 Introduction to Data Mining, 2nd Edition 21


Conditional Independence

[Figure: a DAG in which D is the parent of C, and A and B are children of C]

D is parent of C
A is child of C
B is descendant of D
D is ancestor of A

• A node in a Bayesian network is conditionally


independent of all of its nondescendants, if its
parents are known
2/08/2021 Introduction to Data Mining, 2nd Edition 22
Conditional Independence

• Naïve Bayes assumption:

[Figure: the class node Y is the parent of every attribute node X1, X2, X3, X4, …, Xd]

2/08/2021 Introduction to Data Mining, 2nd Edition 23


Probability Tables

• If X does not have any parents, its table contains the prior probability P(X)

• If X has only one parent (Y), its table contains the conditional probability P(X|Y)

• If X has multiple parents (Y1, Y2, …, Yk), its table contains the conditional probability P(X|Y1, Y2, …, Yk)

2/08/2021 Introduction to Data Mining, 2nd Edition 24


Example of Bayesian Belief Network

[Figure: BBN with Exercise and Diet as parents of Heart Disease, and Heart Disease as parent of Chest Pain and Blood Pressure]

P(Exercise=Yes) = 0.7    P(Diet=Healthy) = 0.25
P(Exercise=No) = 0.3     P(Diet=Unhealthy) = 0.75

Heart Disease:
          D=Healthy  D=Healthy  D=Unhealthy  D=Unhealthy
          E=Yes      E=No       E=Yes        E=No
HD=Yes    0.25       0.45       0.55         0.75
HD=No     0.75       0.55       0.45         0.25

Chest Pain:                 Blood Pressure:
          HD=Yes  HD=No                HD=Yes  HD=No
CP=Yes    0.8     0.01       BP=High   0.85    0.2
CP=No     0.2     0.99       BP=Low    0.15    0.8

2/08/2021 Introduction to Data Mining, 2nd Edition 25


Example of Inferencing using BBN

• Given: X = (E=No, D=Yes, CP=Yes, BP=High)
  – Compute P(HD | E, D, CP, BP)?

• P(HD=Yes | E=No, D=Yes) = 0.55
  P(CP=Yes | HD=Yes) = 0.8
  P(BP=High | HD=Yes) = 0.85
  – P(HD=Yes | E=No, D=Yes, CP=Yes, BP=High)
    ∝ 0.55 × 0.8 × 0.85 = 0.374

• P(HD=No | E=No, D=Yes) = 0.45
  P(CP=Yes | HD=No) = 0.01
  P(BP=High | HD=No) = 0.2
  – P(HD=No | E=No, D=Yes, CP=Yes, BP=High)
    ∝ 0.45 × 0.01 × 0.2 = 0.0009

=> Classify X as Yes

2/08/2021 Introduction to Data Mining, 2nd Edition 26
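A minimal sketch, not from the slides, that reproduces the unnormalized comparison above using the conditional probabilities quoted on this slide:

p_hd_given_e_d = {"Yes": 0.55, "No": 0.45}       # P(HD | E=No, D=Yes)
p_cp_yes_given_hd = {"Yes": 0.8, "No": 0.01}     # P(CP=Yes | HD)
p_bp_high_given_hd = {"Yes": 0.85, "No": 0.2}    # P(BP=High | HD)

scores = {hd: p_hd_given_e_d[hd] * p_cp_yes_given_hd[hd] * p_bp_high_given_hd[hd]
          for hd in ("Yes", "No")}
print(scores)                        # {'Yes': 0.374, 'No': 0.0009}
print(max(scores, key=scores.get))   # classify X as HD = Yes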


Data Mining

Support Vector Machines

Introduction to Data Mining, 2nd Edition


by
Tan, Steinbach, Karpatne, Kumar

10/11/2021 Introduction to Data Mining, 2nd Edition 1


Support Vector Machines

• Find a linear hyperplane (decision boundary) that will separate the data
10/11/2021 Introduction to Data Mining, 2nd Edition 2
Support Vector Machines

B1

• One Possible Solution


10/11/2021 Introduction to Data Mining, 2nd Edition 3
Support Vector Machines

B2

• Another possible solution


10/11/2021 Introduction to Data Mining, 2nd Edition 4
Support Vector Machines

B2

• Other possible solutions


10/11/2021 Introduction to Data Mining, 2nd Edition 5
Support Vector Machines

B1

B2

• Which one is better? B1 or B2?


• How do you define better?
10/11/2021 Introduction to Data Mining, 2nd Edition 6
Support Vector Machines

[Figure: two separating hyperplanes B1 and B2, with margin boundaries b11, b12 and b21, b22; B1 has the larger margin]

• Find the hyperplane that maximizes the margin => B1 is better than B2


10/11/2021 Introduction to Data Mining, 2nd Edition 7
Support Vector Machines

[Figure: hyperplane B1, given by w · x + b = 0, with margin boundaries w · x + b = +1 (b11) and w · x + b = −1 (b12)]

f(x) =  1 if w · x + b ≥ 1
       −1 if w · x + b ≤ −1

Margin = 2 / ||w||
10/11/2021 Introduction to Data Mining, 2nd Edition 8
Linear SVM

• Linear model:

    f(x) =  1 if w · x + b ≥ 1
           −1 if w · x + b ≤ −1

• Learning the model is equivalent to determining the values of w and b

  – How to find w and b from training data?

10/11/2021 Introduction to Data Mining, 2nd Edition 9


Learning Linear SVM
• Objective is to maximize:  Margin = 2 / ||w||

  – Which is equivalent to minimizing:  L(w) = ||w||² / 2

  – Subject to the following constraints:
      w · xi + b ≥ 1   if yi = 1
      w · xi + b ≤ −1  if yi = −1
    or
      yi (w · xi + b) ≥ 1,  i = 1, 2, …, N

◆ This is a constrained optimization problem


– Solve it using Lagrange multiplier method

10/11/2021 Introduction to Data Mining, 2nd Edition 10


Example of Linear SVM

Support vectors

x1 x2 y λ (Lagrange multiplier)
0.3858 0.4687 1 65.5261
0.4871 0.611 -1 65.5261
0.9218 0.4103 -1 0
0.7382 0.8936 -1 0
0.1763 0.0579 1 0
0.4057 0.3529 1 0
0.9355 0.8132 -1 0
0.2146 0.0099 1 0

10/11/2021 Introduction to Data Mining, 2nd Edition 11


Learning Linear SVM

• Decision boundary depends only on support


vectors
– If you have a data set with the same support vectors, the decision boundary will not change

– How to classify using SVM once w and b are found? Given a test record xi:

    f(xi) =  1 if w · xi + b ≥ 1
            −1 if w · xi + b ≤ −1

10/11/2021 Introduction to Data Mining, 2nd Edition 12


Support Vector Machines

• What if the problem is not linearly separable?

10/11/2021 Introduction to Data Mining, 2nd Edition 13


Support Vector Machines

• What if the problem is not linearly separable?


– Introduce slack variables ξi
  ◆ Need to minimize:
      L(w) = ||w||² / 2 + C ( Σ i=1..N ξi^k )
  ◆ Subject to:
      w · xi + b ≥ 1 − ξi   if yi = 1
      w · xi + b ≤ −1 + ξi  if yi = −1
  ◆ If k is 1 or 2, this leads to a similar objective function as linear SVM but with different constraints (see textbook)

10/11/2021 Introduction to Data Mining, 2nd Edition 14


Support Vector Machines

[Figure: hyperplanes B1 and B2 with margin boundaries b11, b12, b21, b22 and their margins]

• Find the hyperplane that optimizes both factors


10/11/2021 Introduction to Data Mining, 2nd Edition 15
Nonlinear Support Vector Machines

• What if decision boundary is not linear?

10/11/2021 Introduction to Data Mining, 2nd Edition 16


Nonlinear Support Vector Machines

• Transform data into higher dimensional space

Decision boundary:
w · Φ(x) + b = 0
10/11/2021 Introduction to Data Mining, 2nd Edition 17
Learning Nonlinear SVM

• Optimization problem:

• Which leads to the same set of equations (but


involving Φ(x) instead of x)

10/11/2021 Introduction to Data Mining, 2nd Edition 18


Learning NonLinear SVM

• Issues:
– What type of mapping function Φ should be
used?
– How to do the computation in high
dimensional space?
◆ Most computations involve the dot product Φ(xi) · Φ(xj)
◆ Curse of dimensionality?

10/11/2021 Introduction to Data Mining, 2nd Edition 19


Learning Nonlinear SVM

• Kernel Trick:
  – Φ(xi) · Φ(xj) = K(xi, xj)
  – K(xi, xj) is a kernel function (expressed in terms of the coordinates in the original space)
    ◆ Examples: polynomial kernel K(x, y) = (x · y + 1)^p, Gaussian (RBF) kernel K(x, y) = exp(−||x − y||² / (2σ²))

10/11/2021 Introduction to Data Mining, 2nd Edition 20
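A minimal sketch, not from the slides, illustrating the kernel trick for the degree-2 polynomial kernel K(x, y) = (x · y)²; the explicit mapping phi used here is one standard choice for 2-D inputs, and the input vectors are made up for illustration:

import math

def phi(x):
    # explicit degree-2 feature map for 2-D input: (x1^2, x2^2, sqrt(2)*x1*x2)
    x1, x2 = x
    return (x1 * x1, x2 * x2, math.sqrt(2) * x1 * x2)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

x, y = (1.0, 2.0), (3.0, -1.0)
print(dot(phi(x), phi(y)))   # 1.0  (dot product in the transformed space)
print(dot(x, y) ** 2)        # 1.0  (same value, computed in the original space)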


Example of Nonlinear SVM

SVM with polynomial


degree 2 kernel

10/11/2021 Introduction to Data Mining, 2nd Edition 21


Learning Nonlinear SVM

• Advantages of using kernel:


– Don’t have to know the mapping function 
– Computing dot product (xi)• (xj) in the
original space avoids curse of dimensionality

• Not all functions can be kernels


– Must make sure there is a corresponding Φ in
some high-dimensional space
– Mercer’s theorem (see textbook)

10/11/2021 Introduction to Data Mining, 2nd Edition 22


Characteristics of SVM

• The learning problem is formulated as a convex optimization problem


– Efficient algorithms are available to find the global minima
– Many of the other methods use greedy approaches and find locally
optimal solutions
– High computational complexity for building the model

• Robust to noise
• Overfitting is handled by maximizing the margin of the decision boundary
• SVM can handle irrelevant and redundant attributes better than many
other techniques
• The user needs to provide the type of kernel function and cost function
• Difficult to handle missing values

• What about categorical variables?

10/11/2021 Introduction to Data Mining, 2nd Edition 23


Data Mining

Ensemble Techniques

Introduction to Data Mining, 2nd Edition


by
Tan, Steinbach, Karpatne, Kumar

10/11/2021 Introduction to Data Mining, 2nd Edition 1


Ensemble Methods

Construct a set of base classifiers learned from


the training data

Predict class label of test records by combining


the predictions made by multiple classifiers (e.g.,
by taking majority vote)

10/11/2021 Introduction to Data Mining, 2nd Edition 2


Example: Why Do Ensemble Methods Work?

10/11/2021 Introduction to Data Mining, 2nd Edition 3


Necessary Conditions for Ensemble Methods

Ensemble Methods work better than a single base classifier if:


1. All base classifiers are independent of each other
2. All base classifiers perform better than random guessing
(error rate < 0.5 for binary classification)

Classification error for an


ensemble of 25 base classifiers,
assuming their errors are
uncorrelated.

10/11/2021 Introduction to Data Mining, 2nd Edition 4
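A minimal sketch, not from the slides, that computes the majority-vote error of an ensemble of 25 independent base classifiers as a function of their common error rate, matching the setting described above:

from math import comb

def ensemble_error(eps, n=25):
    # the ensemble errs when a majority (>= 13 of 25) of the base classifiers err
    return sum(comb(n, i) * eps**i * (1 - eps)**(n - i) for i in range(n // 2 + 1, n + 1))

print(ensemble_error(0.35))   # ~0.06, far below the base error rate of 0.35
print(ensemble_error(0.5))    # 0.5, no gain once base classifiers are random guessers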


Rationale for Ensemble Learning

Ensemble Methods work best with unstable


base classifiers
– Classifiers that are sensitive to minor perturbations in
training set, due to high model complexity
– Examples: Unpruned decision trees, ANNs, …

10/11/2021 Introduction to Data Mining, 2nd Edition 5


Bias-Variance Decomposition

Analogous problem of reaching a target y by firing


projectiles from x (regression problem)

For classification, the generalization error of model 𝑚 can


be given by:

𝑔𝑒𝑛. 𝑒𝑟𝑟𝑜𝑟 𝑚 = 𝑐1 + 𝑏𝑖𝑎𝑠 𝑚 + 𝑐2 × 𝑣𝑎𝑟𝑖𝑎𝑛𝑐𝑒(𝑚)


10/11/2021 Introduction to Data Mining, 2nd Edition 6
Bias-Variance Trade-off and Overfitting

Overfitting

Underfitting

Ensemble methods try to reduce the variance of complex


models (with low bias) by aggregating responses of
multiple base classifiers
10/11/2021 Introduction to Data Mining, 2nd Edition 7
General Approach of Ensemble Learning

Using majority vote or


weighted majority vote
(weighted according to their
accuracy or relevance)

10/11/2021 Introduction to Data Mining, 2nd Edition 8


Constructing Ensemble Classifiers

By manipulating training set


– Example: bagging, boosting, random forests

By manipulating input features


– Example: random forests

By manipulating class labels


– Example: error-correcting output coding

By manipulating learning algorithm


– Example: injecting randomness in the initial weights of ANN

10/11/2021 Introduction to Data Mining, 2nd Edition 9


Bagging (Bootstrap AGGregatING)

Bootstrap sampling: sampling with replacement

Original Data 1 2 3 4 5 6 7 8 9 10
Bagging (Round 1) 7 8 10 8 2 5 10 10 5 9
Bagging (Round 2) 1 4 9 1 2 3 2 7 3 2
Bagging (Round 3) 1 8 5 10 5 5 9 6 3 7

Build classifier on each bootstrap sample

Probability of a training instance being selected in


a bootstrap sample is:
➢ 1 − (1 − 1/n)^n (n: number of training instances)
➢ ~0.632 when n is large
10/11/2021 Introduction to Data Mining, 2nd Edition 10
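A minimal sketch, not from the slides, evaluating the bootstrap selection probability above for several training-set sizes n:

for n in (10, 100, 1000, 100000):
    # probability that a given instance appears at least once in a bootstrap sample of size n
    print(n, 1 - (1 - 1 / n) ** n)   # approaches 1 - 1/e ~ 0.632 as n grows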
Bagging Algorithm

10/11/2021 Introduction to Data Mining, 2nd Edition 11


Bagging Example

Consider 1-dimensional data set:


Original Data:
x 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
y 1 1 1 -1 -1 -1 -1 1 1 1

Classifier is a decision stump (decision tree of size 1)


– Decision rule: x ≤ k versus x > k
– Split point k is chosen based on entropy

        x ≤ k
   True      False
  y_left     y_right
10/11/2021 Introduction to Data Mining, 2nd Edition 12
Bagging Example

Bagging Round 1:
x 0.1 0.2 0.2 0.3 0.4 0.4 0.5 0.6 0.9 0.9     x <= 0.35 → y = 1
y 1 1 1 1 -1 -1 -1 -1 1 1                     x > 0.35 → y = -1

Bagging Round 2:
x 0.1 0.2 0.3 0.4 0.5 0.5 0.9 1 1 1
y 1 1 1 -1 -1 -1 1 1 1 1

Bagging Round 3:
x 0.1 0.2 0.3 0.4 0.4 0.5 0.7 0.7 0.8 0.9
y 1 1 1 -1 -1 -1 -1 -1 1 1

Bagging Round 4:
x 0.1 0.1 0.2 0.4 0.4 0.5 0.5 0.7 0.8 0.9
y 1 1 1 -1 -1 -1 -1 -1 1 1

Bagging Round 5:
x 0.1 0.1 0.2 0.5 0.6 0.6 0.6 1 1 1
y 1 1 1 -1 -1 -1 -1 1 1 1

10/11/2021 Introduction to Data Mining, 2nd Edition 13


Bagging Example

Bagging Round 1:
x 0.1 0.2 0.2 0.3 0.4 0.4 0.5 0.6 0.9 0.9     x <= 0.35 → y = 1
y 1 1 1 1 -1 -1 -1 -1 1 1                     x > 0.35 → y = -1

Bagging Round 2:
x 0.1 0.2 0.3 0.4 0.5 0.5 0.9 1 1 1           x <= 0.7 → y = 1
y 1 1 1 -1 -1 -1 1 1 1 1                      x > 0.7 → y = 1

Bagging Round 3:
x 0.1 0.2 0.3 0.4 0.4 0.5 0.7 0.7 0.8 0.9     x <= 0.35 → y = 1
y 1 1 1 -1 -1 -1 -1 -1 1 1                    x > 0.35 → y = -1

Bagging Round 4:
x 0.1 0.1 0.2 0.4 0.4 0.5 0.5 0.7 0.8 0.9     x <= 0.3 → y = 1
y 1 1 1 -1 -1 -1 -1 -1 1 1                    x > 0.3 → y = -1

Bagging Round 5:
x 0.1 0.1 0.2 0.5 0.6 0.6 0.6 1 1 1           x <= 0.35 → y = 1
y 1 1 1 -1 -1 -1 -1 1 1 1                     x > 0.35 → y = -1

10/11/2021 Introduction to Data Mining, 2nd Edition 14


Bagging Example

Bagging Round 6:
x 0.2 0.4 0.5 0.6 0.7 0.7 0.7 0.8 0.9 1       x <= 0.75 → y = -1
y 1 -1 -1 -1 -1 -1 -1 1 1 1                   x > 0.75 → y = 1

Bagging Round 7:
x 0.1 0.4 0.4 0.6 0.7 0.8 0.9 0.9 0.9 1       x <= 0.75 → y = -1
y 1 -1 -1 -1 -1 1 1 1 1 1                     x > 0.75 → y = 1

Bagging Round 8:
x 0.1 0.2 0.5 0.5 0.5 0.7 0.7 0.8 0.9 1       x <= 0.75 → y = -1
y 1 1 -1 -1 -1 -1 -1 1 1 1                    x > 0.75 → y = 1

Bagging Round 9:
x 0.1 0.3 0.4 0.4 0.6 0.7 0.7 0.8 1 1         x <= 0.75 → y = -1
y 1 1 -1 -1 -1 -1 -1 1 1 1                    x > 0.75 → y = 1

Bagging Round 10:
x 0.1 0.1 0.1 0.1 0.3 0.3 0.8 0.8 0.9 0.9     x <= 0.05 → y = 1
y 1 1 1 1 1 1 1 1 1 1                         x > 0.05 → y = 1

10/11/2021 Introduction to Data Mining, 2nd Edition 15


Bagging Example

Summary of Trained Decision Stumps:

Round Split Point Left Class Right Class


1 0.35 1 -1
2 0.7 1 1
3 0.35 1 -1
4 0.3 1 -1
5 0.35 1 -1
6 0.75 -1 1
7 0.75 -1 1
8 0.75 -1 1
9 0.75 -1 1
10 0.05 1 1

10/11/2021 Introduction to Data Mining, 2nd Edition 16


Bagging Example
Use majority vote (sign of sum of predictions) to
determine class of ensemble classifier
Round x=0.1 x=0.2 x=0.3 x=0.4 x=0.5 x=0.6 x=0.7 x=0.8 x=0.9 x=1.0
1 1 1 1 -1 -1 -1 -1 -1 -1 -1
2 1 1 1 1 1 1 1 1 1 1
3 1 1 1 -1 -1 -1 -1 -1 -1 -1
4 1 1 1 -1 -1 -1 -1 -1 -1 -1
5 1 1 1 -1 -1 -1 -1 -1 -1 -1
6 -1 -1 -1 -1 -1 -1 -1 1 1 1
7 -1 -1 -1 -1 -1 -1 -1 1 1 1
8 -1 -1 -1 -1 -1 -1 -1 1 1 1
9 -1 -1 -1 -1 -1 -1 -1 1 1 1
10 1 1 1 1 1 1 1 1 1 1
Sum 2 2 2 -6 -6 -6 -6 2 2 2
Predicted Class (Sign) 1 1 1 -1 -1 -1 -1 1 1 1

Bagging can also increase the complexity (representation


capacity) of simple classifiers such as decision stumps
10/11/2021 Introduction to Data Mining, 2nd Edition 17
Boosting

An iterative procedure to adaptively change


distribution of training data by focusing more on
previously misclassified records
– Initially, all N records are assigned equal
weights (for being selected for training)
– Unlike bagging, weights may change at the
end of each boosting round

10/11/2021 Introduction to Data Mining, 2nd Edition 18


Boosting

Records that are wrongly classified will have their


weights increased in the next round
Records that are classified correctly will have
their weights decreased in the next round

Original Data 1 2 3 4 5 6 7 8 9 10
Boosting (Round 1) 7 3 2 8 7 9 4 10 6 3
Boosting (Round 2) 5 4 9 4 2 5 1 7 4 2
Boosting (Round 3) 4 4 8 10 4 5 4 6 3 4

• Example 4 is hard to classify


• Its weight is increased, therefore it is more
likely to be chosen again in subsequent rounds

10/11/2021 Introduction to Data Mining, 2nd Edition 19


AdaBoost

Base classifiers: C1, C2, …, CT

Error rate of a base classifier: the weighted fraction of training records that it misclassifies

Importance of a classifier:

αi = (1/2) × ln( (1 − εi) / εi )

10/11/2021 Introduction to Data Mining, 2nd Edition 20


AdaBoost Algorithm

Weight update:

If any intermediate rounds produce error rate


higher than 50%, the weights are reverted back
to 1/n and the resampling procedure is repeated
Classification:

10/11/2021 Introduction to Data Mining, 2nd Edition 21
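A hedged sketch, not from the slides, of the standard AdaBoost weight update referenced above: records misclassified in round i are up-weighted by exp(αi), correctly classified records are down-weighted by exp(−αi), and the weights are then renormalized. The function interface is assumed for illustration:

import math

def adaboost_weight_update(weights, correct, eps):
    # weights: current record weights; correct: per-record booleans for round i;
    # eps: weighted error rate of the round-i base classifier
    alpha = 0.5 * math.log((1 - eps) / eps)
    new_w = [w * math.exp(-alpha if c else alpha) for w, c in zip(weights, correct)]
    z = sum(new_w)                     # normalization constant
    return [w / z for w in new_w], alpha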


AdaBoost Algorithm

10/11/2021 Introduction to Data Mining, 2nd Edition 22


AdaBoost Example

Consider 1-dimensional data set:


Original Data:
x 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
y 1 1 1 -1 -1 -1 -1 1 1 1

Classifier is a decision stump


– Decision rule: x ≤ k versus x > k
– Split point k is chosen based on entropy

        x ≤ k
   True      False
  y_left     y_right
10/11/2021 Introduction to Data Mining, 2nd Edition 23
AdaBoost Example

Training sets for the first 3 boosting rounds:


Boosting Round 1:
x 0.1 0.4 0.5 0.6 0.6 0.7 0.7 0.7 0.8 1
y 1 -1 -1 -1 -1 -1 -1 -1 1 1

Boosting Round 2:
x 0.1 0.1 0.2 0.2 0.2 0.2 0.3 0.3 0.3 0.3
y 1 1 1 1 1 1 1 1 1 1

Boosting Round 3:
x 0.2 0.2 0.4 0.4 0.4 0.4 0.5 0.6 0.6 0.7
y 1 1 -1 -1 -1 -1 -1 -1 -1 -1

Summary:
Round Split Point Left Class Right Class alpha
1 0.75 -1 1 1.738
2 0.05 1 1 2.7784
3 0.3 1 -1 4.1195
10/11/2021 Introduction to Data Mining, 2nd Edition 24
AdaBoost Example

Weights
Round x=0.1 x=0.2 x=0.3 x=0.4 x=0.5 x=0.6 x=0.7 x=0.8 x=0.9 x=1.0
1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1
2 0.311 0.311 0.311 0.01 0.01 0.01 0.01 0.01 0.01 0.01
3 0.029 0.029 0.029 0.228 0.228 0.228 0.228 0.009 0.009 0.009

Classification
Round x=0.1 x=0.2 x=0.3 x=0.4 x=0.5 x=0.6 x=0.7 x=0.8 x=0.9 x=1.0
1 -1 -1 -1 -1 -1 -1 -1 1 1 1
2 1 1 1 1 1 1 1 1 1 1
3 1 1 1 -1 -1 -1 -1 -1 -1 -1
Sum 5.16 5.16 5.16 -3.08 -3.08 -3.08 -3.08 0.397 0.397 0.397
Predicted Class (Sign) 1 1 1 -1 -1 -1 -1 1 1 1

10/11/2021 Introduction to Data Mining, 2nd Edition 25


Random Forest Algorithm

Construct an ensemble of decision trees by


manipulating training set as well as features

– Use bootstrap sample to train every decision


tree (similar to Bagging)
– Use the following tree induction algorithm:
◆ At every internal node of decision tree, randomly
sample p attributes for selecting split criterion
◆ Repeat this procedure until all leaves are pure
(unpruned tree)

10/11/2021 Introduction to Data Mining, 2nd Edition 26


Characteristics of Random Forest

10/11/2021 Introduction to Data Mining, 2nd Edition 27


Gradient Boosting

Constructs a series of models


– Models can be any predictive model that has
a differentiable loss function
– Commonly, trees are the chosen model
◆ XGBoost (extreme gradient boosting) is a popular
package because of its impressive performance
Boosting can be viewed as optimizing the loss
function by iterative functional gradient descent.
Implementations of various boosted algorithms
are available in Python, R, Matlab, and more.

10/11/2021 Introduction to Data Mining, 2nd Edition 28


Data Mining
Classification: Alternative Techniques

Imbalanced Class Problem

Introduction to Data Mining, 2nd Edition


by
Tan, Steinbach, Karpatne, Kumar
Class Imbalance Problem

Lots of classification problems where the classes


are skewed (more records from one class than
another)
– Credit card fraud
– Intrusion detection
– Defective products in manufacturing assembly line
– COVID-19 test results on a random sample

Key Challenge:
– Evaluation measures such as accuracy are not well-
suited for imbalanced class

2/15/2021 Introduction to Data Mining, 2nd Edition 2


Confusion Matrix

Confusion Matrix:

PREDICTED CLASS

Class=Yes Class=No

Class=Yes a b
ACTUAL
CLASS Class=No c d

a: TP (true positive)
b: FN (false negative)
c: FP (false positive)
d: TN (true negative)

2/15/2021 Introduction to Data Mining, 2nd Edition 3


Accuracy

PREDICTED CLASS

Class=Yes Class=No

Class=Yes a b
ACTUAL (TP) (FN)
CLASS
Class=No c d
(FP) (TN)

Most widely-used metric:

a+d TP + TN
Accuracy = =
a + b + c + d TP + TN + FP + FN
2/15/2021 Introduction to Data Mining, 2nd Edition 4
Problem with Accuracy
Consider a 2-class problem
– Number of Class NO examples = 990
– Number of Class YES examples = 10
If a model predicts everything to be class NO, accuracy is
990/1000 = 99 %
– This is misleading because this trivial model does not detect any class
YES example
– Detecting the rare class is usually more interesting (e.g., frauds,
intrusions, defects, etc)

PREDICTED CLASS
Class=Yes Class=No

Class=Yes 0 10
ACTUAL
CLASS Class=No 0 990
2/15/2021 Introduction to Data Mining, 2nd Edition 5
Which model is better?

PREDICTED
Class=Yes Class=No
A ACTUAL Class=Yes 0 10
Class=No 0 990

Accuracy: 99%

PREDICTED
B Class=Yes Class=No
ACTUAL Class=Yes 10 0
Class=No 500 490

Accuracy: 50%
2/15/2021 Introduction to Data Mining, 2nd Edition 6
Which model is better?

PREDICTED
A Class=Yes Class=No
ACTUAL Class=Yes 5 5
Class=No 0 990

PREDICTED
B Class=Yes Class=No
ACTUAL Class=Yes 10 0
Class=No 500 490

2/15/2021 Introduction to Data Mining, 2nd Edition 7


Alternative Measures

PREDICTED CLASS
Class=Yes Class=No

Class=Yes a b
ACTUAL
CLASS Class=No c d

a
Precision (p) =
a+c
a
Recall (r) =
a+b
2rp 2a
F - measure (F) = =
r + p 2a + b + c
2/15/2021 Introduction to Data Mining, 2nd Edition 8
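A minimal sketch, not from the slides, computing precision, recall, F-measure and accuracy from the confusion-matrix counts a (TP), b (FN), c (FP), d (TN) defined above; the call uses the counts from the next slide's example:

def metrics(a, b, c, d):
    precision = a / (a + c)
    recall = a / (a + b)
    f_measure = 2 * a / (2 * a + b + c)
    accuracy = (a + d) / (a + b + c + d)
    return precision, recall, f_measure, accuracy

print(metrics(10, 0, 10, 980))   # (0.5, 1.0, 0.667, 0.99)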
Alternative Measures

PREDICTED CLASS
                    Class=Yes  Class=No
ACTUAL  Class=Yes      10         0
CLASS   Class=No       10        980

Precision (p) = 10 / (10 + 10) = 0.5
Recall (r) = 10 / (10 + 0) = 1
F-measure (F) = (2 × 1 × 0.5) / (1 + 0.5) = 0.67
Accuracy = 990 / 1000 = 0.99

2/15/2021 Introduction to Data Mining, 2nd Edition 9


Alternative Measures

PREDICTED CLASS
                    Class=Yes  Class=No
ACTUAL  Class=Yes      10         0
CLASS   Class=No       10        980

Precision (p) = 10 / (10 + 10) = 0.5
Recall (r) = 10 / (10 + 0) = 1
F-measure (F) = (2 × 1 × 0.5) / (1 + 0.5) = 0.67
Accuracy = 990 / 1000 = 0.99

PREDICTED CLASS
                    Class=Yes  Class=No
ACTUAL  Class=Yes       1          9
CLASS   Class=No        0         990

Precision (p) = 1 / (1 + 0) = 1
Recall (r) = 1 / (1 + 9) = 0.1
F-measure (F) = (2 × 0.1 × 1) / (1 + 0.1) = 0.18
Accuracy = 991 / 1000 = 0.991
2/15/2021 Introduction to Data Mining, 2nd Edition 10
Which of these classifiers is better?

PREDICTED CLASS
Precision (p) = 0.8
Class=Yes Class=No
Recall (r) = 0.8
A Class=Yes 40 10 F - measure (F) = 0.8
ACTUAL
CLASS Class=No 10 40 Accuracy = 0.8

PREDICTED CLASS
B Class=Yes Class=No Precision (p) =~ 0.04
Class=Yes 40 10 Recall (r) = 0.8
ACTUAL F - measure (F) =~ 0.08
CLASS Class=No 1000 4000
Accuracy =~ 0.8

2/15/2021 Introduction to Data Mining, 2nd Edition 11


Measures of Classification Performance

PREDICTED CLASS
Yes No
ACTUAL
Yes TP FN
CLASS
No FP TN

α is the probability that we reject the null hypothesis when it is true. This is a Type I error or a false positive (FP).

β is the probability that we accept the null hypothesis when it is false. This is a Type II error or a false negative (FN).

2/15/2021 Introduction to Data Mining, 2nd Edition 12


Alternative Measures

A PREDICTED CLASS Precision (p) = 0.8


TPR = Recall (r) = 0.8
Class=Yes Class=No FPR = 0.2
F−measure (F) = 0.8
Class=Yes 40 10 Accuracy = 0.8
ACTUAL
CLASS Class=No 10 40
TPR
=4
FPR

B PREDICTED CLASS Precision (p) = 0.038


TPR = Recall (r) = 0.8
Class=Yes Class=No
FPR = 0.2
Class=Yes 40 10 F−measure (F) = 0.07
ACTUAL Accuracy = 0.8
CLASS Class=No 1000 4000
TPR
=4
FPR

2/15/2021 Introduction to Data Mining, 2nd Edition 13


Which of these classifiers is better?

A PREDICTED CLASS
Class=Yes Class=No
Precision (p) = 0.5
Class=Yes 10 40
TPR = Recall (r) = 0.2
ACTUAL
Class=No 10 40
FPR = 0.2
CLASS F − measure = 0.28

B PREDICTED CLASS
Precision (p) = 0.5
Class=Yes Class=No
TPR = Recall (r) = 0.5
Class=Yes 25 25
ACTUAL Class=No 25 25
FPR = 0.5
CLASS F − measure = 0.5

C PREDICTED CLASS Precision (p) = 0.5


Class=Yes Class=No
TPR = Recall (r) = 0.8
Class=Yes 40 10
ACTUAL FPR = 0.8
Class=No 40 10
CLASS
F − measure = 0.61
2/15/2021 Introduction to Data Mining, 2nd Edition 14
ROC (Receiver Operating Characteristic)

A graphical approach for displaying trade-off


between detection rate and false alarm rate
Developed in 1950s for signal detection theory to
analyze noisy signals
ROC curve plots TPR against FPR
– Performance of a model represented as a point in an
ROC curve

2/15/2021 Introduction to Data Mining, 2nd Edition 15


ROC Curve

(TPR,FPR):
(0,0): declare everything
to be negative class
(1,1): declare everything
to be positive class
(1,0): ideal

Diagonal line:
– Random guessing
– Below diagonal line:
◆ prediction is opposite
of the true class

2/15/2021 Introduction to Data Mining, 2nd Edition 16


ROC (Receiver Operating Characteristic)

To draw ROC curve, classifier must produce


continuous-valued output
– Outputs are used to rank test records, from the most likely
positive class record to the least likely positive class record
– By using different thresholds on this value, we can create
different variations of the classifier with TPR/FPR tradeoffs
Many classifiers produce only discrete outputs (i.e.,
predicted class)
– How to get continuous-valued outputs?
◆ Decision trees, rule-based classifiers, neural networks,
Bayesian classifiers, k-nearest neighbors, SVM

2/15/2021 Introduction to Data Mining, 2nd Edition 17


Example: Decision Trees
Decision Tree

[Figure: decision tree on attributes x1 and x2, with splits such as x2 < 12.63, x1 < 13.29, x2 < 17.35, x1 < 6.56, x1 < 2.15, x1 < 7.24, x2 < 8.64, x1 < 12.11, x2 < 1.38, x1 < 18.88]

Continuous-valued outputs

[Figure: the same tree with each leaf annotated with a continuous-valued score (0.059, 0.220, 0.071, 0.107, 0.727, 0.164, 0.143, 0.669, 0.271, 0.654, 0), e.g., the fraction of positive training records at the leaf]

2/15/2021 Introduction to Data Mining, 2nd Edition 18


ROC Curve Example

[Figure: the decision tree from the previous slide together with the ROC curve obtained from its leaf scores (0.059, 0.220, 0.071, 0.107, 0.727, 0.164, 0.143, 0.669, 0.271, 0.654, 0)]

2/15/2021 Introduction to Data Mining, 2nd Edition 19


ROC Curve Example
- 1-dimensional data set containing 2 classes (positive and negative)
- Any point located at x > t is classified as positive

At threshold t:
TPR=0.5, FNR=0.5, FPR=0.12, TNR=0.88
2/15/2021 Introduction to Data Mining, 2nd Edition 20
How to Construct an ROC curve

• Use a classifier that produces a continuous-valued score for each instance
• The more likely it is for the instance to be in the + class, the higher the score
• Sort the instances in decreasing order according to the score
• Apply a threshold at each unique value of the score
• Count the number of TP, FP, TN, FN at each threshold
  • TPR = TP / (TP + FN)
  • FPR = FP / (FP + TN)

Instance  Score  True Class
1         0.95   +
2         0.93   +
3         0.87   -
4         0.85   -
5         0.85   -
6         0.85   +
7         0.76   -
8         0.53   +
9         0.43   -
10        0.25   +

2/15/2021 Introduction to Data Mining, 2nd Edition 21


How to construct an ROC curve
Class + - + - - - + - + +
P
Threshold >= 0.25 0.43 0.53 0.76 0.85 0.85 0.85 0.87 0.93 0.95 1.00

TP 5 4 4 3 3 3 3 2 2 1 0

FP 5 5 4 4 3 2 1 1 0 0 0

TN 0 0 1 1 2 3 4 4 5 5 5

FN 0 1 1 2 2 2 2 3 3 4 5

TPR 1 0.8 0.8 0.6 0.6 0.6 0.6 0.4 0.4 0.2 0

FPR 1 1 0.8 0.8 0.6 0.4 0.2 0.2 0 0 0

ROC Curve:

2/15/2021 Introduction to Data Mining, 2nd Edition 22
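A minimal sketch, not from the slides, that sweeps a threshold over the ten scored instances above and reproduces the TPR/FPR values in this table:

scores = [0.95, 0.93, 0.87, 0.85, 0.85, 0.85, 0.76, 0.53, 0.43, 0.25]
labels = ["+", "+", "-", "-", "-", "+", "-", "+", "-", "+"]
P, N = labels.count("+"), labels.count("-")

for t in sorted(set(scores)) + [1.00]:
    tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == "+")
    fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == "-")
    print(f"threshold >= {t:.2f}: TPR = {tp / P:.1f}, FPR = {fp / N:.1f}")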


Using ROC for Model Comparison

No model consistently
outperforms the other
M1 is better for
small FPR
M2 is better for
large FPR

Area Under the ROC


curve (AUC)
Ideal:
▪ Area =1
Random guess:
▪ Area = 0.5

2/15/2021 Introduction to Data Mining, 2nd Edition 23


Dealing with Imbalanced Classes - Summary

Many measures exists, but none of them may be ideal in


all situations
– Random classifiers can have high value for many of these measures
– TPR/FPR provides important information but may not be sufficient by
itself in many practical scenarios
– Given two classifiers, sometimes you can tell that one of them is
strictly better than the other
◆C1 is strictly better than C2 if C1 has strictly better TPR and FPR relative to C2 (or same
TPR and better FPR, and vice versa)
– Even if C1 is strictly better than C2, C1’s F-value can be worse than
C2’s if they are evaluated on data sets with different imbalances
– Classifier C1 can be better or worse than C2 depending on the scenario
at hand (class imbalance, importance of TP vs FP, cost/time tradeoffs)

2/15/2021 Introduction to Data Mining, 2nd Edition 24


Which Classifier is better?
Precision (p) = 0.98
T1 PREDICTED CLASS TPR = Recall (r) = 0.5
Class=Yes Class=No
FPR = 0.01
Class=Yes 50 50 TPR/FPR = 50
ACTUAL
CLASS Class=No 1 99
F − measure = 0.66

Precision (p) = 0.9


T2 PREDICTED CLASS
TPR = Recall (r) = 0.99
Class=Yes Class=No
FPR = 0.1
ACTUAL
Class=Yes 99 1
TPR/FPR = 9.9
Class=No 10 90
CLASS
F − measure = 0.94

T3 PREDICTED CLASS Precision (p) = 0.99


Class=Yes Class=No TPR = Recall (r) = 0.99
Class=Yes 99 1 FPR = 0.01
ACTUAL
CLASS Class=No 1 99 TPR/FPR = 99

F − measure = 0.99

2/15/2021 Introduction to Data Mining, 2nd Edition 25
Which Classifier is better? Medium Skew case
Precision (p) = 0.83
T1 PREDICTED CLASS TPR = Recall (r) = 0.5
Class=Yes Class=No
FPR = 0.01
Class=Yes 50 50 TPR/FPR = 50
ACTUAL
CLASS Class=No 10 990
F − measure = 0.62

Precision (p) = 0.5


T2 PREDICTED CLASS
TPR = Recall (r) = 0.99
Class=Yes Class=No
FPR = 0.1
ACTUAL
Class=Yes 99 1
TPR/FPR = 9.9
Class=No 100 900
CLASS
F − measure = 0.66

T3 PREDICTED CLASS Precision (p) = 0.9


Class=Yes Class=No TPR = Recall (r) = 0.99
Class=Yes 99 1 FPR = 0.01
ACTUAL
CLASS Class=No 10 990 TPR/FPR = 99

F − measure = 0.94

2/15/2021 Introduction to Data Mining, 2nd Edition 26
Which Classifier is better? High Skew case
Precision (p) = 0.3
T1 PREDICTED CLASS TPR = Recall (r) = 0.5
Class=Yes Class=No
FPR = 0.01
Class=Yes 50 50 TPR/FPR = 50
ACTUAL
CLASS Class=No 100 9900
F − measure = 0.375

Precision (p) = 0.09


T2 PREDICTED CLASS
TPR = Recall (r) = 0.99
Class=Yes Class=No
FPR = 0.1
ACTUAL
Class=Yes 99 1
TPR/FPR = 9.9
Class=No 1000 9000
CLASS
F − measure = 0.165

T3 PREDICTED CLASS Precision (p) = 0.5


Class=Yes Class=No TPR = Recall (r) = 0.99
Class=Yes 99 1 FPR = 0.01
ACTUAL
CLASS Class=No 100 9900 TPR/FPR = 99

F − measure = 0.66

2/15/2021 Introduction to Data Mining, 2nd Edition 27
Building Classifiers with Imbalanced Training Set

Modify the distribution of training data so that rare


class is well-represented in training set
– Undersample the majority class
– Oversample the rare class

2/15/2021 Introduction to Data Mining, 2nd Edition 28
