

Classification
Part 2

Dr. Sanjay Ranka
Professor
Computer and Information Science and Engineering
University of Florida, Gainesville

Data Mining, Spring 2011

Overview
• Rule-based classifiers
• Nearest-neighbor classifiers


Rule Based Classifiers


• Classify instances by using a collection of
“if … then …” rules
• Rules are presented in Disjunctive Normal
Form, R = (r1 ∨ r2 ∨ … ∨ rk)
• R is called the rule set
• The ri's are called classification rules
• Each classification rule is of the form
– ri : (Conditioni) → y
• Conditioni is a conjunction of attribute tests
• y is the class label

Rule Based Classifiers


• ri : (Conditioni) → y
– The LHS of the rule is called the rule antecedent or precondition
– The RHS is called the rule consequent
• If the attributes of an instance satisfy the
precondition of a rule, then the instance is assigned
to the class designated by the rule consequent
• Example
– (Blood Type=Warm) ∧ (Lay Eggs=Yes) → Birds
– (Taxable Income < 50K) ∧ (Refund=Yes)
→ Cheat=No


Classifying Instances with Rules


• A rule r covers an instance x if the
attributes of the instance satisfy the
condition of the rule
• Rule:
r : (Age < 35) ∧ (Status = Married) → Cheat=No
• Instances:
x1 : (Age=29, Status=Married, Refund=No)
x2 : (Age=28, Status=Single, Refund=Yes)
x3 : (Age=38, Status=Divorced, Refund=No)
• Only x1 is covered by the rule r
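As a concrete illustration (not from the lecture), a minimal Python sketch of
this coverage test might represent a rule as a list of attribute tests:

def covers(rule_conditions, instance):
    """Return True if every attribute test in the rule is satisfied."""
    return all(test(instance[attr]) for attr, test in rule_conditions)

# Rule r: (Age < 35) AND (Status = Married) -> Cheat=No
r = [("Age", lambda v: v < 35), ("Status", lambda v: v == "Married")]

x1 = {"Age": 29, "Status": "Married", "Refund": "No"}
x2 = {"Age": 28, "Status": "Single", "Refund": "Yes"}
x3 = {"Age": 38, "Status": "Divorced", "Refund": "No"}

for name, x in [("x1", x1), ("x2", x2), ("x3", x3)]:
    print(name, "covered" if covers(r, x) else "not covered")
# Only x1 is covered, so r assigns it the class Cheat=No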

Rule Based Classifiers


• Rules may not be mutually exclusive
– More than one rule may cover the same instance
• Strategies:
– Strict enforcement of mutual exclusiveness
• Avoid generating rules that have overlapping coverage with
previously selected rules
– Ordered rules
• Rules are rank ordered according to their priority
– Voting
• Allow an instance to trigger multiple rules, and consider the
consequent of each triggered rule as a vote for that particular
class

Rule Based Classifiers


• Rules may not be exhaustive
• Strategy:
– A default rule rd : ( ) → yd can be added
– The default rule has an empty antecedent and
is applicable when all other rules have failed
– yd is known as the default class and is often
set to the majority class


Example of Rule Based Classifier

• r1: (Refund=No) ∧ (Marital Status=Single) ∧ (Taxable Income>80K) → Yes
• r2: (Refund=No) ∧ (Marital Status=Divorced) ∧ (Taxable Income>80K) → Yes
• default: ( ) → No

Advantages of Rule Based Classifiers


• As highly expressive as decision trees
• Easy to interpret
• Easy to generate
• Can classify new instances rapidly
• Performance comparable to decision trees


Basic Definitions
• Coverage of a rule:
– Fraction of all instances that satisfy the
antecedent of the rule
• Accuracy of a rule:
– Fraction of the instances covered by the rule (i.e.
those satisfying the antecedent) that also satisfy
the consequent
• Example (from the accompanying data table, not shown here):
(Marital Status=Married) → No
Coverage = 40%, Accuracy = 100%
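A small Python sketch (with a made-up ten-instance data set chosen to
reproduce the 40% / 100% numbers above) of how these two quantities are
computed:

def coverage_and_accuracy(instances, antecedent, consequent):
    """antecedent(x) and consequent(x) are boolean tests on an instance x."""
    covered = [x for x in instances if antecedent(x)]
    coverage = len(covered) / len(instances)
    accuracy = sum(1 for x in covered if consequent(x)) / len(covered)
    return coverage, accuracy

data = ([{"Status": "Married", "Cheat": "No"}] * 4 +
        [{"Status": "Single", "Cheat": "Yes"}] * 6)
cov, acc = coverage_and_accuracy(
    data,
    antecedent=lambda x: x["Status"] == "Married",
    consequent=lambda x: x["Cheat"] == "No",
)
print(cov, acc)  # 0.4 and 1.0, i.e. coverage 40%, accuracy 100%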

How to Build a Rule Based Classifier


• Generate an initial set of rules
– Direct Method:
• Extract rules directly from data
• Examples: RIPPER, CN2, Holte’s 1R
– Indirect Method:
• Extract rules from other classification methods (e.g. decision
trees)
• Example: C4.5 rules
• Rules are pruned and simplified
• Rules can be ordered to obtain a rule set R
• Rule set R can be further optimized

Indirect Method:
From Decision Trees to Rules

• Rules are mutually exclusive and exhaustive
• The rule set contains as much information as the tree
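To make the conversion concrete, here is a minimal sketch (using a
hypothetical nested-tuple encoding of a tree, not the lecture's notation)
that turns each root-to-leaf path into one rule:

# A node is either ("leaf", class_label) or ("split", attribute, {value: subtree}).
tree = ("split", "Refund",
        {"Yes": ("leaf", "No"),
         "No": ("split", "Status",
                {"Married": ("leaf", "No"),
                 "Single": ("leaf", "Yes"),
                 "Divorced": ("leaf", "Yes")})})

def tree_to_rules(node, conditions=()):
    if node[0] == "leaf":
        return [(list(conditions), node[1])]
    _, attribute, branches = node
    rules = []
    for value, subtree in branches.items():
        rules.extend(tree_to_rules(subtree, conditions + ((attribute, value),)))
    return rules

for antecedent, label in tree_to_rules(tree):
    print(" AND ".join(f"{a}={v}" for a, v in antecedent), "->", label)
# One rule per leaf: the resulting rules are mutually exclusive and exhaustive.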

Rules can be Simplified

• Initial rule: (Refund=No) ∧ (Status=Married) → No
• Simplified rule: (Status=Married) → No

Indirect Method: C4.5 rules


• Creating an initial set of rules
– Extract rules from an un-pruned decision tree
– For each rule r: A → y
• Consider an alternative rule r': A' → y, where A' is
obtained by removing one of the conjuncts in A
• Compare the pessimistic error rate of r against that
of each r'
• Prune r if one of the r' has a lower pessimistic error
rate
• Repeat until we can no longer improve the
generalization error

Indirect Method: C4.5 rules


• Ordering the rules
– Instead of ordering the rules, order subsets of
rules
• Each subset is a collection of rules with the same
consequent (class)
• The description length of each subset is computed, and
the subsets are then ordered in increasing order of
description length
– Description length = L(exceptions) + g·L(model)
– g is a parameter that takes into account the presence of
redundant attributes in a rule set; its default value is 0.5


Direct Method: Sequential Covering


• Sequential Covering Algorithm (E:
training examples, A: set of attributes)
1. Let R = { } be the initial rule set
2. While the stopping criterion is not met
1. r ← Learn-One-Rule(E, A)
2. Remove the instances from E that are covered by r
3. Add r to the rule set: R = R ∨ r
• Example of a stopping criterion: stop when
all instances belong to the same class or all
attributes have the same value
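A minimal Python sketch of this loop (the Learn-One-Rule step is left
abstract; it is assumed to return a rule together with a covers(x) test):

def sequential_covering(E, A, learn_one_rule, stopping_criterion_met):
    """E: list of training instances, A: set of attributes."""
    R = []                                        # 1. initial rule set is empty
    while not stopping_criterion_met(E, R):       # 2. stopping criterion
        rule, covers = learn_one_rule(E, A)       # 2.1 extract one rule
        E = [x for x in E if not covers(x)]       # 2.2 remove covered instances
        R.append(rule)                            # 2.3 add the rule to R
    return R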

Example of Sequential Covering

Learn One Rule


• The objective of this function is to extract
the best rule that covers the current set of
training instances
– What strategy is used for growing the rule?
– What evaluation criterion is used during rule
growing?
– What is the stopping criterion for rule growing?
– What pruning criterion is used to generalize the
rule?


Learn One Rule


• Rule Growing Strategy
– General-to-specific approach
• It is initially assumed that the best rule is the empty
rule, r: { } → y, where y is the majority class of the
instances
• Iteratively add new conjuncts to the LHS of the rule
until the stopping criterion is met
– Specific-to-general approach
• A positive instance is chosen as the initial seed for a
rule
• The function keeps refining this rule by generalizing
the conjuncts until the stopping criterion is met
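A rough sketch of the general-to-specific strategy (assumptions: instances
are dicts with a "class" key, and rule accuracy is used as the evaluation
metric), greedily adding one attribute=value conjunct at a time:

def rule_accuracy(covered, target_class):
    if not covered:
        return 0.0
    return sum(1 for x in covered if x["class"] == target_class) / len(covered)

def grow_rule(instances, attributes, target_class):
    conditions = {}                                   # empty rule { } -> target_class
    while True:
        covered = [x for x in instances
                   if all(x[a] == v for a, v in conditions.items())]
        best, best_acc = None, rule_accuracy(covered, target_class)
        for a in attributes:
            if a in conditions:
                continue
            for v in {x[a] for x in covered}:         # candidate conjunct a = v
                subset = [x for x in covered if x[a] == v]
                acc = rule_accuracy(subset, target_class)
                if subset and acc > best_acc:
                    best, best_acc = (a, v), acc
        if best is None:                              # stop: no conjunct improves accuracy
            return conditions
        conditions[best[0]] = best[1]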

Learn One Rule


• Rule Evaluation and Stopping Criteria
– Evaluate rules using rule evaluation metric
• Accuracy
• Coverage
• Entropy
• Laplace
• M-estimate
– A typical condition for terminating the rule
growing process is to compare the evaluation
metric of the newly grown rule with that of the
previous candidate rule, and stop when there is
no improvement

Learn One Rule


• Rule Pruning
– Each extracted rule can be pruned to improve its
ability to generalize beyond the training instances
– Pruning can be done by removing one of the conjuncts
of the rule and then testing it against a validation set
• Instance Elimination
– Instance elimination prevents the same rule from being
generated again
– Positive instances must be removed after each rule is
extracted
– Some rule-based classifiers keep the negative instances,
while others remove them before generating the next rule

Direct Method: Sequential Covering


• Use a general-to-specific search strategy
• Greedy approach
• Unlike decision trees (which use simultaneous
covering), it does not explore all possible paths
– Search only the current best path
– Beam search: maintain k of the best paths
• At each step,
– Decision tree chooses among several alternative
attributes for splitting
– Sequential covering chooses among alternative
attribute-value pairs


Direct Method: RIPPER


• For a 2-class problem, choose one of the classes as the
positive class and the other as the negative class
– Learn rules for the positive class
– The negative class will be the default class
• For a multi-class problem
– Order the classes according to increasing class
prevalence (the fraction of instances that belong to a
particular class)
– Learn the rule set for the smallest class first, treating the
rest as the negative class
– Repeat with the next smallest class as the positive class

FOIL's Information Gain


• Compares the performance of a rule before and
after adding a new conjunct
• Suppose the rule r: A → y covers p0 positive and
n0 negative instances
• After adding a new conjunct B, the rule r': A ∧ B → y
covers p1 positive and n1 negative instances
• FOIL's information gain is defined as
Gain(r, r') = t · [ log2(p1 / (p1 + n1)) − log2(p0 / (p0 + n0)) ]
where t is the number of positive instances covered
by both r and r'
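A small Python sketch of this formula (note: because r' is a specialization
of r, every positive instance covered by r' is also covered by r, so t = p1):

import math

def foil_gain(p0, n0, p1, n1):
    t = p1  # positives covered by both r and r'
    return t * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

# e.g. a rule covering 100 positives / 400 negatives, specialized to 30 / 10:
print(foil_gain(p0=100, n0=400, p1=30, n1=10))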


Direct Method: RIPPER


• Growing a rule:
– Start from an empty rule
– Add conjuncts as long as they improve FOIL's
information gain
– Stop when the rule no longer covers negative examples
– Prune the rule immediately using incremental
reduced error pruning
– Measure for pruning: v = (p - n) / (p + n)
• p: number of positive examples covered by the rule in the
validation set
• n: number of negative examples covered by the rule in the
validation set
– Pruning method: delete any final sequence of
conditions that maximizes v
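A sketch (with assumed helper names) of this incremental reduced error
pruning step, evaluating v = (p - n) / (p + n) on a validation set:

def prune_rule(conjuncts, validation_set, covers, is_positive):
    """conjuncts: list of attribute tests; covers(conjuncts, x) -> bool."""
    def v(rule):
        covered = [x for x in validation_set if covers(rule, x)]
        if not covered:
            return float("-inf")
        p = sum(1 for x in covered if is_positive(x))
        n = len(covered) - p
        return (p - n) / (p + n)

    best = list(conjuncts)
    for k in range(1, len(conjuncts)):        # try deleting the last k conditions
        candidate = conjuncts[:-k]
        if v(candidate) >= v(best):
            best = candidate
    return best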

Direct Method: RIPPER


• Building a Rule Set:
– Use the sequential covering algorithm
• Find the best rule that covers the current set of
positive examples
• Eliminate both positive and negative examples
covered by the rule
– Each time a rule is added to the rule set,
compute the description length
• Stop adding new rules when the new description
length is d bits longer than the smallest description
length obtained so far. d is often chosen as 64 bits

Direct Method: RIPPER


• Optimize the rule set:
– For each rule r in the rule set R
• Consider 2 alternative rules:
– Replacement rule (r*): grow new rule from scratch
– Revised rule (r’): add conjuncts to extend the rule r
• Compare the rule set containing r against the rule
sets containing r* and r'
• Choose the rule set that minimizes the description
length (MDL principle)
– Repeat rule generation and rule optimization
for the remaining positive examples


C4.5 rules versus RIPPER


C4.5 rules:
(Give Birth=No, Can Fly=Yes) → Birds
(Give Birth=No, Live in Water=Yes) → Fishes
(Give Birth=Yes) → Mammals
(Give Birth=No, Can Fly=No, Live in Water=No) → Reptiles
( ) → Amphibians
RIPPER:
(Live in Water=Yes) → Fishes
(Have Legs=No) → Reptiles
(Give Birth=No, Can Fly=No, Live In Water=No)
→ Reptiles
(Can Fly=Yes,Give Birth=No) → Birds
( ) → Mammals


Eager Learners
• So far we have learnt that classification involves
– An inductive step for constructing classification models
from data
– A deductive step for applying the derived model to
previously unseen instances
• For decision tree induction and rule based
classifiers, the models are constructed
immediately after the training set is provided
• Such techniques are known as eager learners
because they learn the model as soon as the
training data becomes available

Lazy Learners
• An opposite strategy would be to delay the
process of generalizing the training data until it
is needed to classify the unseen instances
• Techniques that employ such strategy are known
as lazy learners
• An example of a lazy learner is the rote classifier,
which memorizes the entire training data and
performs classification only if the attributes of a
test instance exactly match those of one of the
training instances


Nearest-neighbor Classifiers
• One way to make the “Rote Classifier”
approach more flexible is to find all
training instances that are relatively similar
to the test instance. They are called nearest
neighbors of the test instance
• The test instance can then be classified
according to the class labels of its neighbors
• “If it walks like a duck, quacks like a duck, and
looks like a duck, then it’s probably a duck”

Nearest Neighbor Classifiers


• For data sets with
continuous attributes
• Requires
– A set of stored training instances
– A distance metric to compute the
distance between instances
– The value of k, the number of
nearest neighbors to retrieve
• For classification
– Retrieve the k nearest
neighbors
– Use class labels of the
nearest neighbors to
determine the class label of
the unseen instance (e.g. by
taking majority vote)
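A minimal Python sketch (not from the lecture) of this procedure with a
plain majority vote over the k nearest training instances:

import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_classify(train, test_point, k=3):
    """train: list of (feature_vector, class_label) pairs."""
    neighbors = sorted(train, key=lambda xy: euclidean(xy[0], test_point))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((5.0, 5.0), "B"), ((5.2, 4.8), "B")]
print(knn_classify(train, (1.1, 0.9), k=3))   # -> "A"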

Definition of Nearest Neighbor

• The k-nearest neighbors of an instance x are
defined as the data points having the k smallest
distances to x

1-nearest Neighbor

• If k = 1, we can illustrate the decision boundary


of each class by using a Voronoi diagram

Distance Metric
• A distance metric is required to compute the
distance between two instances
• A nearest neighbor classifier represents each
instance as a data point embedded in a d-
dimensional space, where d is the number of
continuous attributes
• Euclidean distance
– d(x, y) = sqrt( Σi (xi − yi)² )
• Weighted distance
– Weight factor: w = 1 / d²
– Weight each neighbor's vote according to its distance
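A short sketch (assumed interface) of distance-weighted voting with
w = 1 / d², given the k nearest neighbors of a test point:

import math
from collections import defaultdict

def weighted_vote(neighbors, test_point):
    """neighbors: list of (feature_vector, class_label) for the k nearest points."""
    scores = defaultdict(float)
    for point, label in neighbors:
        d = math.dist(point, test_point)
        scores[label] += 1.0 / (d ** 2 + 1e-12)   # epsilon guards against d = 0
    return max(scores, key=scores.get)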

Choosing the value of k


• If k is too small, the
classifier is sensitive
to noise points
• If k is too large
– Computationally
intensive
– Neighborhood may
include points from
other classes


Nearest Neighbor Classifiers


• Problems with Euclidean distance
– High-dimensional data
• Curse of dimensionality
– Can produce counter-intuitive results (e.g. in text
document classification)
111111111110 vs. 011111111111   (Euclidean distance = 1.4142)
100000000000 vs. 000000000001   (Euclidean distance = 1.4142)
– Both pairs of un-normalized binary vectors are the same
Euclidean distance apart, although the first pair shares ten
attributes with value 1 and the second pair shares none
• Solution: normalize the vectors


Nearest Neighbor Classifiers


• Other issues
– Nearest neighbor classifiers are lazy learners
• They do not build models explicitly
• Classifying instances is relatively expensive
– Scaling of attributes
• Person:
– (Height in meters, Weight in pounds, Class)
– Height may vary from 1.5 m to 1.85 m
– Weight may vary from 90 lbs to 250 lbs
– The distance measure could be dominated by
differences in weight
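A minimal sketch (not from the lecture) of min-max scaling so that height
and weight contribute comparably to the Euclidean distance:

def min_max_scale(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

heights = [1.50, 1.60, 1.75, 1.85]       # metres
weights = [90.0, 140.0, 200.0, 250.0]    # pounds
scaled = list(zip(min_max_scale(heights), min_max_scale(weights)))
print(scaled)   # both attributes now lie in [0, 1]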
