5c. Nearest Neighbour Classifier

Nearest-Neighbor Classifier
(Instance Based Learning)

Ref & Acknowledgments
1. Dr B S Panda, IIT Delhi
2. R. Zemel, R. Urtasun, S. Fidler, University of Toronto
3. Dr Sudeshna Sarkar, IIT Kharagpur

Lazy vs. Eager Learning

• Lazy learning (e.g., instance-based learning): simply stores the training data (or does only minor processing) and waits until it is given a test tuple.
• Eager learning: given a set of training tuples, constructs a classification model before receiving new (e.g., test) data to classify, e.g. Naïve Bayes, decision trees, SVM.
• Lazy: less time in training but more time in predicting.
• Accuracy: a lazy method effectively uses a richer hypothesis space, since it uses many local linear functions to form an implicit global approximation to the target function; an eager method must commit to a single hypothesis that covers the entire instance space.

Nearest Neighbors: Decision Boundaries (1-nearest neighbor)

• The nearest-neighbor algorithm does not explicitly compute decision boundaries, but these can be inferred (see the sketch below).
• Decision boundaries: a Voronoi diagram visualization shows how the input space is divided into classes; each line segment is equidistant between two points of opposite classes.
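A hedged illustration of how 1-NN decision boundaries can be inferred without ever being computed explicitly (plain NumPy; the toy points are made up for illustration): label every cell of a dense grid with the class of its nearest training point, and the resulting regions approximate the Voronoi cells described above.

    import numpy as np

    # Toy 2D training set (hypothetical points, two classes 0 and 1)
    X_train = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 3.0],   # class 0
                        [4.0, 4.0], [4.5, 3.0], [5.0, 4.5]])  # class 1
    y_train = np.array([0, 0, 0, 1, 1, 1])

    # Dense grid of query points over the input space
    xs, ys = np.meshgrid(np.linspace(0, 6, 200), np.linspace(0, 6, 200))
    grid = np.c_[xs.ravel(), ys.ravel()]

    # Label each grid point with the class of its nearest training point
    dists = np.linalg.norm(grid[:, None, :] - X_train[None, :, :], axis=2)
    labels = y_train[dists.argmin(axis=1)].reshape(xs.shape)

    # 'labels' now partitions the plane into 1-NN regions; the border between
    # the two label values approximates the Voronoi-style decision boundary.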

Nearest Neighbors: Decision Boundaries

[Figure: example of a 2D decision boundary]

Classification: Parametric vs Non-parametric

• Linear regression relates two variables with a straight line; nonlinear regression relates the variables using a curve.
• When such line/curve characteristics (parameters) have to be estimated, the classification is parametric.
• The alternative: non-parametric.
• Non-parametric methods are typically simple methods for approximating discrete-valued or real-valued target functions (they work for classification or regression problems).

Instance-Based Learning

• One way of solving tasks of approximating discrete- or real-valued target functions.
• Have training examples: (x_n, f(x_n)), n = 1..N.
• Key idea:
  – just store the training examples
  – when a test example is given, find the closest matches (a minimal sketch follows below)
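A minimal sketch of that key idea for a real-valued target (hypothetical data, plain NumPy): store the training pairs (x_n, f(x_n)) and, at query time, simply return the stored value of the closest x_n.

    import numpy as np

    # Stored training examples (x_n, f(x_n)) -- hypothetical values
    X_train = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    f_train = np.array([0.1, 0.9, 4.2, 8.8, 16.3])   # roughly f(x) = x**2

    def predict_1nn(x_query):
        """Return f(x_n) of the single closest stored example (1-NN prediction)."""
        nearest = np.abs(X_train - x_query).argmin()
        return f_train[nearest]

    print(predict_1nn(2.4))   # -> 4.2, the value stored for the closest x_n (2.0)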

Nearest Neighbors: Decision Boundaries

[Figure: example of a 3D decision boundary]

Instance-Based Classifiers

• Store the training records.
• Use the training records to predict the class label of unseen cases.

[Figure: set of stored cases with attributes Atr1, …, AtrN and class labels A/B/C, plus an unseen case (Atr1, …, AtrN) whose class is to be predicted]

Inductive Assumption

• Similar inputs map to similar outputs
  – If not true => learning is impossible
  – If true => learning reduces to defining "similar"
• Not all similarities are created equal
  – predicting a person's weight may depend on different attributes than predicting their IQ

Nearest Neighbors: Multi-modal Data

• Nearest-neighbor approaches can work with multi-modal data.
• Multi-modal data: data that spans different types and contexts (e.g., imaging, text, or genetics).

Instance Based Classifiers

• Examples:
  – Rote-learner
    • Memorizes the entire training data and performs classification only if the attributes of a record match one of the training examples exactly
  – Nearest neighbor
    • Uses the k "closest" points (nearest neighbors) for performing classification

Nearest-Neighbor Classifiers

Requires three things:
  – The set of stored records
  – A distance metric to compute the distance between records
  – The value of k, the number of nearest neighbors to retrieve

To classify an unknown record (a sketch follows below):
  – Compute its distance to the other training records
  – Identify the k nearest neighbors
  – Use the class labels of the nearest neighbors to determine the class label of the unknown record (e.g., by taking a majority vote)
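A minimal, hedged sketch of that procedure in plain NumPy (the records and labels are made up for illustration): compute the distance to every stored record, take the k nearest, and return the majority class.

    import numpy as np
    from collections import Counter

    # Stored records (hypothetical numeric attributes) and their class labels
    X_train = np.array([[5.0, 1.0], [5.5, 1.2], [1.0, 4.0], [1.2, 4.5], [0.8, 3.8]])
    y_train = np.array(["A", "A", "B", "B", "B"])

    def knn_classify(x_unknown, k=3):
        """Classify an unknown record by majority vote among its k nearest neighbors."""
        dists = np.linalg.norm(X_train - x_unknown, axis=1)   # distance to every record
        nearest = np.argsort(dists)[:k]                       # indices of the k closest
        votes = Counter(y_train[nearest])                     # count the class labels
        return votes.most_common(1)[0][0]                     # majority class

    print(knn_classify(np.array([1.1, 4.2]), k=3))   # -> "B"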

Nearest Neighbor Classifiers

• Basic idea:
  – If it walks like a duck, quacks like a duck, then it's probably a duck

[Figure: compute the distance between the test record and the training records, then choose the k "nearest" records]

Nearest Neighbors: Definition of Nearest Neighbor [Pic by Olga Veksler]

[Figure: (a) 1-nearest neighbor, (b) 2-nearest neighbor, (c) 3-nearest neighbor]

• The k-nearest neighbors of a record x are the data points that have the k smallest distances to x.
• Nearest neighbors are sensitive to mis-labeled data ("class noise"). Solution?
k-Nearest Neighbors: Choosing the value of k [Pic by Olga Veksler]

• If k is too small, the classifier is sensitive to noise points
• If k is too large, the neighborhood may include points from other classes
• We can use cross-validation to find k (see the sketch below)
• Rule of thumb: k < sqrt(n), where n is the number of training examples

K-NN: Issues (Complexity) & Remedies

• Expensive at test time: to find one nearest neighbor of a query point x, we must compute the distance to all N training examples; the complexity is O(kdN) for kNN. Remedies:
  – Use a subset of the dimensions
  – Pre-sort the training examples into fast data structures (e.g., kd-trees)
  – Compute only an approximate distance (e.g., LSH)
  – Remove redundant data (e.g., condensing)
• Storage requirements: must store all the training data
  – Remove redundant data (e.g., condensing)
• High-dimensional data: "curse of dimensionality"
  – The required amount of training data increases exponentially with the dimension
  – The computational cost also increases
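A hedged sketch of picking k by cross-validation, assuming scikit-learn is available (the dataset here is just a stand-in; any (X, y) arrays would do). It tries every odd k up to sqrt(n), in line with the rule of thumb above, and keeps the k with the best cross-validated accuracy.

    import numpy as np
    from sklearn.datasets import load_iris              # stand-in dataset for illustration
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)
    n = len(X)

    best_k, best_score = None, -np.inf
    for k in range(1, int(np.sqrt(n)) + 1, 2):           # odd k only, k < sqrt(n)
        clf = KNeighborsClassifier(n_neighbors=k)
        score = cross_val_score(clf, X, y, cv=5).mean()  # 5-fold CV accuracy
        if score > best_score:
            best_k, best_score = k, score

    print(best_k, round(best_score, 3))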

k-Nearest Neighbors: Remedies — Remove Redundancy

• If all Voronoi neighbors of a sample have the same class, the sample is useless; remove it.

Nearest Neighbor Classification

• Compute the distance between two points (sketched below), for example:
  – Euclidean distance: $d(p, q) = \sqrt{\sum_i (p_i - q_i)^2}$
  – Manhattan distance: $d(p, q) = \sum_i |p_i - q_i|$
  – q-norm (Minkowski) distance: $d(p, q) = \left( \sum_i |p_i - q_i|^q \right)^{1/q}$

Nearest Neighbor Classification: Issues

• Scaling issues
  – If some attributes (coordinates of x) have larger ranges, they are treated as more important
  – Example:
    • height of a person may vary from 1.5 m to 1.8 m
    • weight of a person may vary from 60 kg to 100 kg
    • income of a person may vary from Rs 10K to Rs 2 Lakh
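A small sketch of those three distance measures in plain NumPy (the two points are arbitrary illustrations); the Minkowski form reduces to Manhattan for order 1 and Euclidean for order 2.

    import numpy as np

    p = np.array([1.0, 2.0, 3.0])
    q = np.array([4.0, 0.0, 3.5])

    def euclidean(p, q):
        return np.sqrt(np.sum((p - q) ** 2))

    def manhattan(p, q):
        return np.sum(np.abs(p - q))

    def minkowski(p, q, r=2):
        """General q-norm distance of order r; r=1 is Manhattan, r=2 is Euclidean."""
        return np.sum(np.abs(p - q) ** r) ** (1.0 / r)

    print(euclidean(p, q), manhattan(p, q), minkowski(p, q, r=3))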

Nearest neighbor Classification…

• k-NN classifiers are lazy learners
  – They do not build models explicitly
  – Unlike eager learners such as decision tree induction and rule-based systems
• They naturally form complex decision boundaries and adapt to the data density; if we have lots of samples, kNN typically works well
• Problems:
  – Sensitive to class noise
  – Sensitive to the scales of the attributes
  – Distances are less meaningful in high dimensions
  – Classifying unknown records is relatively expensive

Nearest Neighbor Classification

• Determine the class from the nearest neighbor list
  – Take the majority vote of the class labels among the k nearest neighbors:
    $y' = \arg\max_v \sum_{(x_i, y_i) \in D_z} I(v = y_i)$
    where $D_z$ is the set of the k closest training examples to z and $I(\cdot)$ is the indicator function.
  – Or weigh the vote according to distance:
    $y' = \arg\max_v \sum_{(x_i, y_i) \in D_z} w_i \cdot I(v = y_i)$, with weight factor $w_i = 1 / d(x', x_i)^2$

Nearest Neighbor Classification: Scaling Issue

• Scaling issues
  – Attributes may have to be scaled to prevent the distance measure from being dominated by one of the attributes
• Normalize the scale (a sketch of both options follows below)
  – Simple option: linearly scale the range of each feature to lie in, e.g., [0, 1]
  – Or linearly scale each dimension to have mean 0 and variance 1 (compute the mean $\mu$ and variance $\sigma^2$ of an attribute $x_j$ and scale: $(x_j - \mu)/\sigma$)
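A hedged sketch (plain NumPy, made-up height/weight data) of the two scaling options and of the distance-weighted vote above; each of the k nearest neighbors votes with weight 1/d².

    import numpy as np

    X = np.array([[1.62, 65.0], [1.80, 95.0], [1.55, 60.0], [1.75, 82.0]])  # height (m), weight (kg)
    y = np.array([0, 1, 0, 1])

    # Option 1: min-max scaling of each feature to [0, 1]
    X_minmax = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

    # Option 2: z-score scaling of each feature to mean 0, variance 1
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)

    def weighted_knn(x_query, X_train, y_train, k=3):
        """Distance-weighted vote: each of the k nearest neighbors votes with w = 1/d^2."""
        d = np.linalg.norm(X_train - x_query, axis=1)
        nearest = np.argsort(d)[:k]
        w = 1.0 / (d[nearest] ** 2 + 1e-12)       # avoid division by zero on exact matches
        votes = {}
        for idx, weight in zip(nearest, w):
            votes[y_train[idx]] = votes.get(y_train[idx], 0.0) + weight
        return max(votes, key=votes.get)

    # Classify a query point scaled with the training statistics (z-score option)
    x_new = (np.array([1.70, 78.0]) - X.mean(axis=0)) / X.std(axis=0)
    print(weighted_knn(x_new, X_std, y, k=3))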

The kNN Classification Algorithm

Let k be the number of nearest neighbors and D be the set of training examples.

1. for each test example z = (x', y') do
2.   Compute d(x', x), the distance between z and every example (x, y) ∈ D
3.   Select $D_z \subseteq D$, the set of the k closest training examples to z
4.   $y' = \arg\max_v \sum_{(x_i, y_i) \in D_z} I(v = y_i)$
5. end for

Nearest Neighbor Classification: Issues

• Irrelevant or correlated attributes add noise to the distance measure
  – eliminate some attributes
  – or vary and possibly adapt the weights of the attributes
• Non-metric attributes (symbols)
  – Hamming distance (a sketch follows below)
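For non-metric, symbolic attributes, a hedged sketch of the Hamming distance (the count of positions at which two equal-length symbol sequences disagree); the attribute values here are invented.

    def hamming(a, b):
        """Number of positions at which two equal-length symbol sequences differ."""
        assert len(a) == len(b), "Hamming distance needs sequences of equal length"
        return sum(x != y for x, y in zip(a, b))

    # Hypothetical categorical records: (colour, shape, size)
    print(hamming(("red", "round", "small"), ("red", "square", "small")))   # -> 1
    print(hamming("karolin", "kathrin"))                                    # -> 3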

KNN Classification

[Figure: example of kNN classification — loan amount ($0 to $2,50,000) vs. age (0 to 70), with Default and Non-Default classes]

NN Classification: Issue with Distance Measure

• Problem with the Euclidean measure:
  – High-dimensional data
    • curse of dimensionality: all vectors are almost equidistant to the query vector
  – It can produce undesirable results. Example (binary vectors):
      111111111110 vs 011111111111   d = 1.4142
      100000000000 vs 000000000001   d = 1.4142
    Both pairs are the same Euclidean distance apart, even though the first two vectors share ten of their eleven 1s while the second two share none.
• Solution: normalize the vectors to unit length (see the sketch below)
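A quick check of that remedy in plain NumPy: after scaling each vector to unit length, the distance between the two pairs is no longer the same, which separates the similar pair from the dissimilar one.

    import numpy as np

    def bits(s):
        return np.array([int(c) for c in s], dtype=float)

    a, b = bits("111111111110"), bits("011111111111")   # mostly-overlapping pair
    c, d = bits("100000000000"), bits("000000000001")   # non-overlapping pair

    print(np.linalg.norm(a - b), np.linalg.norm(c - d))   # both ~1.4142

    def unit(v):
        return v / np.linalg.norm(v)

    # After normalizing to unit length, the pairs are no longer equidistant
    print(np.linalg.norm(unit(a) - unit(b)))   # ~0.4264  (similar vectors stay close)
    print(np.linalg.norm(unit(c) - unit(d)))   # ~1.4142  (dissimilar vectors stay far)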

Thank You