
Data Mining

Lazy Learners (Instance-Based Learners)
Outline

• Introduction
• k-Nearest-Neighbor Classifiers

Introduction

• Lazy vs. eager learning
  – Eager learning
    ◆ e.g., decision tree induction, Bayesian classification, rule-based classification
    ◆ Given a training set, constructs a classification model before receiving new (e.g., test) data to classify
  – Lazy learning
    ◆ e.g., k-nearest-neighbor classifiers, case-based reasoning classifiers
    ◆ Simply stores the training data (or does only minor processing) and waits until it is given a new instance
• Lazy: less time in training but more time in predicting

• Lazy learners store training examples and delay the processing (“lazy evaluation”) until a new instance must be classified
• Accuracy
  – Lazy methods effectively use a richer hypothesis space, since they use many local linear functions to form an implicit global approximation to the target function
  – Eager methods must commit to a single hypothesis that covers the entire instance space

Example Problem: Face Recognition
• We have a database of (say) 1 million face images
• We are given a new image and want to find the most similar images in the database
• Represent faces by (relatively) invariant values, e.g., ratio of nose width to eye width
• Each image is represented by a large number of numerical features
• Problem: given the features of a new face, find those in the DB that are close in at least ¾ (say) of the features
Introduction

• Typical approaches
  – k-nearest neighbor approach
    ◆ Instances represented as points in a Euclidean space
  – Case-based reasoning
    ◆ Uses symbolic representations and knowledge-based inference

k-Nearest-Neighbor Classifiers

• All instances correspond to points in an n-dimensional space
• The training tuples are described by n attributes; each tuple represents a point in an n-dimensional space
• A k-nearest-neighbor classifier searches the pattern space for the k training tuples that are closest to the unknown tuple


• Example:
  – We are interested in classifying the type of drug a patient should be prescribed
  – Based on the age of the patient and the patient’s sodium/potassium ratio (Na/K)
  – The dataset includes 200 patients

Scatter plot

On the scatter plot, light gray points indicate drug Y; medium gray points indicate drug A or X; dark gray points indicate drug B or C.

Close-up of neighbors to new patient 2

• k = 1 => drugs B and C (dark gray)
• k = 2 => ? (a tie: one neighbor of each class)
• k = 3 => drugs A and X (medium gray)

• Main questions:
  – How many neighbors should we consider? That is, what is k?
  – How do we measure distance?
  – Should all points be weighted equally, or should some points have more influence than others?
k-Nearest-Neighbor Classifiers

• The nearest neighbors are defined in terms of Euclidean distance, dist(X1, X2)
• The Euclidean distance between two points or tuples, say, X1 = (x11, x12, …, x1n) and X2 = (x21, x22, …, x2n), is:

  dist(X1, X2) = √( (x11 − x21)² + (x12 − x22)² + … + (x1n − x2n)² )

  – Nominal attributes: distance either 0 or 1

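A minimal sketch of these distance computations in Python (the function names and the use of plain tuples are illustrative assumptions, not from the slides):

```python
import math

def euclidean_distance(x1, x2):
    """Euclidean distance between two numeric tuples of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))

def nominal_difference(a, b):
    """Distance contribution of a nominal attribute: 0 if identical, else 1."""
    return 0.0 if a == b else 1.0
```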

• Typically, we normalize the values of each attribute in advance.
• This helps prevent attributes with initially large ranges (such as income) from outweighing attributes with initially smaller ranges (such as binary attributes).
• Min-max normalization:

  v′ = (v − min_A) / (max_A − min_A)

  – all attribute values then lie between 0 and 1

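A sketch of min-max normalization in Python, assuming one list of values per attribute (the helper name is hypothetical):

```python
def min_max_normalize(values):
    """Rescale numeric attribute values into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # constant attribute: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# A large-range attribute such as income ends up on the same [0, 1]
# scale as a binary attribute:
print(min_max_normalize([20000, 50000, 110000]))  # [0.0, 0.333..., 1.0]
```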

• Common policy for missing values: assumed to be maximally distant (given normalized attributes)
• Another popular metric: the Manhattan (city-block) metric

  dist(X1, X2) = |x11 − x21| + |x12 − x22| + … + |x1n − x2n|

  – Takes the absolute values of the differences without squaring them

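The Manhattan metric only changes the per-attribute term; a sketch in the same style as the Euclidean function above:

```python
def manhattan_distance(x1, x2):
    """City-block distance: sum of absolute differences, without squaring."""
    return sum(abs(a - b) for a, b in zip(x1, x2))
```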

• For k-nearest-neighbor classification, the unknown tuple is assigned the most common class among its k nearest neighbors.
• When k = 1, the unknown tuple is assigned the class of the training tuple that is closest to it in pattern space.
• Nearest-neighbor classifiers can also be used for prediction, that is, to return a real-valued prediction for a given unknown tuple.
  – In this case, the classifier returns the average value of the real-valued labels associated with the k nearest neighbors of the unknown tuple.

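A minimal linear-scan sketch of both uses in Python, assuming the training data is a list of (tuple, label) pairs and a distance function such as the Euclidean sketch above (all names are illustrative):

```python
from collections import Counter

def k_nearest(train, query, k, distance):
    """Return the k (tuple, label) pairs closest to the query tuple."""
    return sorted(train, key=lambda pair: distance(pair[0], query))[:k]

def knn_classify(train, query, k, distance):
    """Classification: majority vote among the k nearest neighbors."""
    votes = Counter(label for _, label in k_nearest(train, query, k, distance))
    return votes.most_common(1)[0][0]

def knn_predict(train, query, k, distance):
    """Prediction: average of the neighbors' real-valued labels."""
    neighbors = k_nearest(train, query, k, distance)
    return sum(label for _, label in neighbors) / k
```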
Categorical Attributes

• A simple method is to compare the corresponding value of the attribute in tuple X1 with that in tuple X2.
• If the two are identical (e.g., tuples X1 and X2 both have the color blue), then the difference between the two is taken as 0; otherwise it is 1.
• Other methods may incorporate more sophisticated schemes for differential grading (e.g., where a larger difference score is assigned, say, for blue and white than for blue and black).

Missing Values

• In general, if the value of a given attribute A is missing in tuple X1 and/or in tuple X2, we assume the maximum possible difference.
• For categorical attributes, we take the difference value to be 1 if either one or both of the corresponding values of A are missing.
• If A is numeric and missing from both tuples X1 and X2, then the difference is also taken to be 1.
  – If only one value is missing and the other (which we’ll call v′) is present and normalized, then we can take the difference to be either |1 − v′| or |0 − v′|, whichever is greater.

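A sketch of this per-attribute difference policy in Python, assuming numeric values are already normalized to [0, 1] and missing values are represented as None:

```python
def attribute_difference(a, b, numeric=True):
    """Difference between two attribute values under the
    maximal-distance policy for missing values."""
    if a is None and b is None:
        return 1.0                         # both missing: maximum difference
    if a is None or b is None:
        if not numeric:
            return 1.0                     # categorical with any value missing
        v = a if b is None else b          # the one value that is present
        return max(abs(1.0 - v), abs(0.0 - v))
    if numeric:
        return abs(a - b)
    return 0.0 if a == b else 1.0          # categorical: identical or not
```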
Determining a good value for k
• k can be determined experimentally.
• Starting with k = 1, we use a test set to estimate the error rate of the classifier.
• This process can be repeated each time by incrementing k to allow for one more neighbor.
• The k value that gives the minimum error rate may be selected.
• In general, the larger the number of training tuples, the larger the value of k will be.

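A sketch of this search in Python, reusing the knn_classify helper from the earlier sketch (the test set is assumed to be a list of (tuple, label) pairs):

```python
def choose_k(train, test, max_k, distance):
    """Try k = 1, 2, ..., max_k and keep the k with the lowest test error."""
    best_k, best_error = 1, float("inf")
    for k in range(1, max_k + 1):
        errors = sum(knn_classify(train, x, k, distance) != label
                     for x, label in test)
        error_rate = errors / len(test)
        if error_rate < best_error:
            best_k, best_error = k, error_rate
    return best_k
```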
Finding nearest neighbors efficiently

• Simplest way of finding a nearest neighbor: a linear scan of the data
  – Classification takes time proportional to the product of the number of instances in the training and test sets
• Nearest-neighbor search can be done more efficiently using appropriate data structures
• There are two methods that represent the training data in a tree structure:
  – kD-trees (k-dimensional trees)
  – Ball trees

kD-trees

• A kD-tree is a binary tree that divides the input space with a hyperplane and then splits each partition again, recursively.
• The data structure is called a kD-tree because it stores a set of points in k-dimensional space, k being the number of attributes.

kD-tree example

Using kD-trees: example
• The target, which is not one of the instances in the tree, is marked by a star.
• The leaf node of the region containing the target is colored black.
• To determine whether a closer neighbor exists, first check whether it is possible for a closer neighbor to lie within the node’s sibling region.
• Then back up to the parent node and check its sibling.

More on kD-trees

• Complexity depends on the depth of the tree
• Amount of backtracking required depends on the quality of the tree
• How to build a good tree? Need to find a good split point and split direction
  – Split direction: direction with the greatest variance
  – Split point: median value, or value closest to the mean, along that direction
• Can apply this recursively (see the sketch below)

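A sketch of this recursive construction in Python, splitting on the attribute with the greatest variance at the median point (the class and function names are illustrative):

```python
import statistics

class KDNode:
    def __init__(self, point, axis, left=None, right=None):
        self.point, self.axis = point, axis
        self.left, self.right = left, right

def build_kd_tree(points):
    """Recursively split the point set with axis-aligned hyperplanes."""
    if not points:
        return None
    # split direction: the attribute with the greatest variance
    axis = max(range(len(points[0])),
               key=lambda a: statistics.pvariance([p[a] for p in points]))
    # split point: the median value along that direction
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return KDNode(points[mid], axis,
                  build_kd_tree(points[:mid]),
                  build_kd_tree(points[mid + 1:]))
```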
Building trees incrementally

• Big advantage of instance-based learning: the classifier can be updated incrementally
  – Just add the new training instance!
• We can do the same with kD-trees
• Heuristic strategy (see the sketch below):
  – Find the leaf node containing the new instance
  – Place the instance into the leaf if the leaf is empty
  – Otherwise, split the leaf
  – The tree should be rebuilt occasionally

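A simplified sketch of the heuristic in Python, building on the KDNode class above; for brevity it cycles the split axis at new leaves rather than recomputing the direction of greatest variance, and it leaves the occasional rebuild to the caller:

```python
def kd_insert(node, point, axis=0):
    """Descend to the region containing the new point and attach it there."""
    if node is None:                          # empty spot: place a new leaf
        return KDNode(point, axis)
    next_axis = (node.axis + 1) % len(point)  # cycle through the attributes
    if point[node.axis] < node.point[node.axis]:
        node.left = kd_insert(node.left, point, next_axis)
    else:
        node.right = kd_insert(node.right, point, next_axis)
    return node
```

After many insertions the tree degrades, so rebuilding it from scratch with build_kd_tree restores balance.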