Lecture Notes for Chapter 4: Instance-Based Learning, Introduction to Data Mining, 2nd Edition

The document discusses instance-based learning classification techniques, specifically nearest neighbor classifiers. It explains that nearest neighbor classifiers determine the class of an unknown record based on the classes of the k nearest training records, as measured by a distance metric like Euclidean distance. The document covers key aspects of nearest neighbor classifiers like choosing k, handling scaling issues, and techniques for improving efficiency like using proximity graphs.


Data Mining

Classification: Alternative Techniques

Lecture Notes for Chapter 4

Instance-Based Learning

Introduction to Data Mining, 2nd Edition


by
Tan, Steinbach, Karpatne, Kumar

02/14/2018 Introduction to Data Mining, 2nd Edition 1


Instance Based Classifiers

 Examples:
– Rote-learner
 Memorizes the entire training data and performs
classification only if the attributes of a record exactly
match one of the training examples

– Nearest neighbor
 Uses k “closest” points (nearest neighbors) for
performing classification



Nearest Neighbor Classifiers

 Basic idea:
– If it walks like a duck, quacks like a duck, then
it’s probably a duck

[Diagram: compute the distance from the test record to the training
records, then choose the k "nearest" records]



Nearest-Neighbor Classifiers
[Figure: an unknown record plotted among the labeled training records]

 Requires three things
– The set of labeled records
– A distance metric to compute the
distance between records
– The value of k, the number of
nearest neighbors to retrieve

 To classify an unknown record:
– Compute its distance to the other
training records
– Identify the k nearest neighbors
– Use the class labels of the nearest
neighbors to determine the
class label of the unknown record
(e.g., by taking a majority vote)

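The three steps above can be sketched in a few lines of Python; the function name `knn_classify` and the toy training set are illustrative, not part of the original notes:

```python
import math
from collections import Counter

def knn_classify(train, test_point, k=3):
    """Classify test_point by majority vote among its k nearest
    training records, using Euclidean distance.
    train is a list of (attribute_vector, class_label) pairs."""
    # 1. Compute the distance from the test point to every training record
    dists = [(math.dist(x, test_point), label) for x, label in train]
    # 2. Identify the k nearest neighbors
    dists.sort(key=lambda pair: pair[0])
    neighbors = dists[:k]
    # 3. Majority vote over the neighbors' class labels
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

train = [((1.0, 1.0), 'A'), ((1.2, 0.8), 'A'),
         ((5.0, 5.0), 'B'), ((5.2, 4.9), 'B'), ((4.8, 5.1), 'B')]
print(knn_classify(train, (1.1, 0.9), k=3))  # 'A'
```

With k = 3 the two nearby 'A' records outvote the nearest 'B' record.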


Definition of Nearest Neighbor

[Figure: (a) 1-nearest neighbor (b) 2-nearest neighbor (c) 3-nearest neighbor]

 The k-nearest neighbors of a record x are the data points
that have the k smallest distances to x



1-Nearest Neighbor

[Figure: Voronoi diagram induced by the training records; each cell is
the region assigned the class of its training record]



Nearest Neighbor Classification

 Compute distance between two points:


– Euclidean distance

d(p, q) = sqrt( sum_i (p_i - q_i)^2 )

 Determine the class from nearest neighbor list


– Take the majority vote of class labels among
the k-nearest neighbors
– Weigh the vote according to distance
 weight factor, w = 1/d^2
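A minimal sketch of the distance-weighted variant (w = 1/d^2), with hypothetical names; a small epsilon guards against division by zero when a test point coincides with a training record:

```python
import math
from collections import defaultdict

def weighted_knn_classify(train, test_point, k=3):
    """k-NN with distance-weighted voting: each of the k nearest
    neighbors votes with weight w = 1 / d^2."""
    dists = sorted((math.dist(x, test_point), label) for x, label in train)
    scores = defaultdict(float)
    for d, label in dists[:k]:
        scores[label] += 1.0 / (d * d + 1e-12)  # epsilon: avoid division by zero
    return max(scores, key=scores.get)

# One very close 'B' outweighs two more distant 'A' neighbors,
# although a plain majority vote over the same k=3 would say 'A'
train = [((0.0, 0.0), 'B'), ((3.0, 0.0), 'A'), ((0.0, 3.0), 'A')]
print(weighted_knn_classify(train, (0.5, 0.5), k=3))  # 'B'
```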
Nearest Neighbor Classification…

 Choosing the value of k:


– If k is too small, sensitive to noise points
– If k is too large, neighborhood may include points from
other classes

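One common way to pick k, sketched below with hypothetical names and a made-up dataset, is leave-one-out error on the training set: a mislabeled (noise) point hurts k = 1, while an overly large k lets the other class's points into the neighborhood:

```python
import math
from collections import Counter

def knn_predict(train, point, k):
    dists = sorted((math.dist(x, point), label) for x, label in train)
    return Counter(l for _, l in dists[:k]).most_common(1)[0][0]

def choose_k(train, candidates=(1, 3, 5, 7)):
    """Pick k by leave-one-out error: classify each training record
    using all the others and count the mistakes."""
    best_k, best_err = None, float('inf')
    for k in candidates:
        errors = sum(knn_predict(train[:i] + train[i + 1:], x, k) != label
                     for i, (x, label) in enumerate(train))
        if errors < best_err:
            best_k, best_err = k, errors
    return best_k

train = [((0, 0), 'A'), ((0, 1), 'A'), ((1, 0), 'A'), ((1, 1), 'A'),
         ((0.5, 0.5), 'B'),   # noise point inside the 'A' cluster
         ((5, 5), 'B'), ((5, 6), 'B'), ((6, 5), 'B'), ((6, 6), 'B')]
print(choose_k(train))  # 3: k=1 overfits the noise point, k=7 is too coarse
```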


Nearest Neighbor Classification…

 Scaling issues
– Attributes may have to be scaled to prevent
distance measures from being dominated by
one of the attributes
– Example:
 height of a person may vary from 1.5m to 1.8m
 weight of a person may vary from 90lb to 300lb
 income of a person may vary from $10K to $1M

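A common fix is to rescale every attribute to a comparable range before computing distances. A minimal min-max scaling sketch (the helper name is illustrative; it assumes every attribute has a nonzero range):

```python
def min_max_scale(records):
    """Rescale each attribute to [0, 1] so that wide-range attributes
    (e.g., income in dollars) do not dominate the Euclidean distance.
    Assumes each attribute has a nonzero range."""
    columns = list(zip(*records))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [tuple((v - lo) / (hi - lo)
                  for v, lo, hi in zip(rec, lows, highs))
            for rec in records]

# (height in m, weight in lb, income in $): raw distances are driven
# almost entirely by the income attribute
people = [(1.5, 90, 10_000), (1.8, 300, 1_000_000), (1.6, 150, 50_000)]
print(min_max_scale(people)[0])  # (0.0, 0.0, 0.0)
```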


Nearest Neighbor Classification…

 Selection of the right similarity measure is critical:

 Example: two pairs of binary vectors

111111111110 vs 011111111111
000000000001 vs 100000000000

Euclidean distance = 1.4142 (sqrt(2)) for both pairs, even though
the first pair has ten 1s in common and the second pair has none

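The effect can be checked directly. Both pairs are two bit-flips apart, so Euclidean distance cannot tell them apart, while cosine similarity separates the pair that shares many 1s from the pair that shares none (a sketch; the vectors are those from the example above):

```python
import math

def euclidean(p, q):
    return math.dist(p, q)

def cosine(p, q):
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm  # assumes neither vector is all zeros

pair1 = ([1] * 11 + [0], [0] + [1] * 11)  # 111111111110 vs 011111111111
pair2 = ([0] * 11 + [1], [1] + [0] * 11)  # 000000000001 vs 100000000000

print(euclidean(*pair1), euclidean(*pair2))  # 1.4142... for both
print(cosine(*pair1), cosine(*pair2))        # 0.909... vs 0.0
```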


Nearest neighbor Classification…

 k-NN classifiers are lazy learners since they do not build


models explicitly
– They do not learn a discriminative function from the data
– They memorize the training dataset
– In contrast, logistic regression learns its model
weights during the training phase
 Classifying unknown (prediction) records is relatively
expensive
– Each time, k-NN has to search for the nearest neighbors in
the entire training set
 Data structures such as Ball Trees and KD-trees can help speed
up the search
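As a sketch of such a speed-up, assuming SciPy is available (scikit-learn's BallTree is analogous): build the tree once, then each query avoids scanning the entire training set:

```python
import numpy as np
from scipy.spatial import KDTree

rng = np.random.default_rng(0)
train = rng.random((10_000, 3))      # 10,000 3-d training points
tree = KDTree(train)                 # built once up front

query = rng.random((1, 3))
dist, idx = tree.query(query, k=5)   # 5 nearest neighbors per query point
print(idx.shape)                     # (1, 5)
```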
Nearest neighbor Classification…

 Can produce arbitrarily shaped decision boundaries


 Selection of the right proximity measure and the value of k is
essential
 Superfluous or redundant attributes can create problems
– Only the distinguishing attributes make the difference
 Missing attributes are hard to handle



Improving KNN Efficiency

 Avoid computing distance to all objects in the


training set
– Multi-dimensional access methods (k-d trees)
– Fast approximate similarity search
– Locality Sensitive Hashing (LSH)
 Condensing

– Determine a smaller set of objects that give


the same performance
 Editing

– Remove objects to improve efficiency


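A condensing step in the spirit of Hart's condensed nearest neighbor rule can be sketched as follows (names and dataset are illustrative): keep only the records that 1-NN actually needs to classify the rest of the training set correctly:

```python
import math
from collections import Counter

def knn_predict(store, point, k=1):
    dists = sorted((math.dist(x, point), label) for x, label in store)
    return Counter(l for _, l in dists[:k]).most_common(1)[0][0]

def condense(train):
    """Keep adding misclassified records to the store until 1-NN on the
    store classifies every training record correctly."""
    store = [train[0]]
    changed = True
    while changed:
        changed = False
        for record in train:
            if knn_predict(store, record[0]) != record[1]:
                store.append(record)  # needed: 1-NN got it wrong without it
                changed = True
    return store

train = [((0, 0), 'A'), ((0, 1), 'A'), ((1, 0), 'A'), ((1, 1), 'A'),
         ((10, 10), 'B'), ((10, 11), 'B'), ((11, 10), 'B'), ((11, 11), 'B')]
print(len(condense(train)))  # 2: one record per well-separated cluster
```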
KNN and Proximity Graphs

 Proximity graphs
– a graph in which two vertices are connected
by an edge if and only if the vertices satisfy
particular geometric requirements
– nearest neighbor graphs
– minimum spanning trees
– Delaunay triangulations
– relative neighborhood graphs
– Gabriel graphs



Measure Proximity in a Graph

 Two different approaches for measuring the
proximity between vertices of a graph
– Shared Nearest Neighbor (SNN)
 Defines proximity or similarity between two vertices in
terms of the number of neighbors (i.e., directly
connected vertices) they have in common

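A minimal SNN sketch over an adjacency-set graph (the graph itself is a made-up example):

```python
def snn_similarity(graph, u, v):
    """Shared Nearest Neighbor similarity: the number of directly
    connected vertices that u and v have in common.
    graph maps each vertex to the set of its neighbors."""
    return len(graph[u] & graph[v])

graph = {
    'a': {'b', 'c', 'd'},
    'b': {'a', 'c', 'd'},
    'c': {'a', 'b'},
    'd': {'a', 'b'},
    'e': {'a'},
}
print(snn_similarity(graph, 'a', 'b'))  # 2 (shared neighbors: c and d)
print(snn_similarity(graph, 'a', 'e'))  # 0
```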


Measure Proximity in a Graph

 The Neumann Kernel

 Models the relatedness of vertices in a graph based
on both immediate and more distant connections
between vertices
 Uses a tunable parameter to control how much weight
is given to more distant connections
 Can produce results based entirely on immediate
neighbors, or results that take the full structure of the
graph into consideration

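One common formulation of the Neumann kernel is the walk sum K = A + gamma*A^2 + gamma^2*A^3 + ..., where A is the adjacency matrix and gamma is the tunable parameter: gamma = 0 keeps only immediate neighbors, while larger gamma weights longer walks. A truncated pure-Python sketch (the series converges only for small gamma, and the truncation depth is an assumption):

```python
def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def neumann_kernel(A, gamma, terms=20):
    """Truncated series K = A + gamma*A^2 + gamma^2*A^3 + ...
    gamma = 0 reduces to the adjacency matrix itself."""
    K = [row[:] for row in A]
    P = [row[:] for row in A]      # current power of A
    weight = 1.0
    for _ in range(terms):
        P = mat_mul(P, A)          # next power of A
        weight *= gamma            # next power of gamma
        K = [[K[i][j] + weight * P[i][j] for j in range(len(A))]
             for i in range(len(A))]
    return K

A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]  # path graph 0 - 1 - 2
print(neumann_kernel(A, 0.0)[0][2])  # 0.0: only immediate neighbors count
print(neumann_kernel(A, 0.1)[0][2])  # > 0: the two-step walk 0-1-2 contributes
```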


KNN and Proximity Graphs

 See recent papers by Toussaint


– G. T. Toussaint. Proximity graphs for nearest neighbor
decision rules: recent progress. In Interface-2002, 34th
Symposium on Computing and Statistics, Montreal, Canada,
April 17–20 2002.
– G. T. Toussaint. Open problems in geometric methods for
instance based learning. In Discrete and Computational
Geometry, volume 2866 of Lecture Notes in Computer
Science, pages 273–283, December 6-9, 2003.
– G. T. Toussaint. Geometric proximity graphs for improving
nearest neighbor methods in instance-based learning and
data mining. Int. J. Comput. Geometry Appl., 15(2):101–
150, 2005.

