CSE445 NSU Week - 5
● Basic idea:
– If it walks like a duck, quacks like a duck, then it’s probably a duck
[Figure: kNN classification. Compute the distance from the test record to the training records, then assign a class by unweighted voting among the k nearest neighbors.]
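As a concrete illustration of that pipeline (compute distances from the test record, pick the k nearest, take an unweighted vote), here is a minimal Python sketch; the toy data, labels, and function name are illustrative assumptions, not from the slides.

import numpy as np

def knn_classify(X_train, y_train, x_query, k=3):
    # Distance from the query to every training record (Euclidean here)
    dists = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k closest training records
    nearest = np.argsort(dists)[:k]
    # Unweighted vote: the most common label among the k neighbors wins
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Hypothetical toy data: two features, two classes
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.5, 4.8]])
y_train = np.array(["duck", "duck", "goose", "goose"])
print(knn_classify(X_train, y_train, np.array([1.1, 0.9]), k=3))  # -> duck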
Query Types
• Exact match query: Asks for the object(s) whose key matches query key exactly.
• Range query: Asks for the objects whose key lies in a specified query range (interval).
Exact Match Query
• Suppose that we store employee records in a database.
• An exact match query asks for the object(s) whose key matches the query key exactly.
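A minimal sketch of an exact match query over such records; the employee data and field names below are illustrative assumptions, not from the slides.

# Hypothetical employee records; the fields are assumptions, not from the slides
employees = [
    {"name": "Alice", "age": 25, "salary": 48000, "children": 2},
    {"name": "Bob",   "age": 42, "salary": 55000, "children": 1},
    {"name": "Carol", "age": 38, "salary": 50000, "children": 3},
]

def exact_match(records, key, value):
    # Keep only the records whose key equals the query value exactly
    return [r for r in records if r[key] == value]

print(exact_match(employees, "age", 42))  # -> [Bob's record]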
Range Query
• Example:
• key=Age: retrieve all records satisfying 20 < Age < 50
• key=#Children: retrieve all records satisfying 1 < #Children < 4
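Continuing the illustrative employee example, a range query can be sketched as a simple filter; strict inequalities match the slide's conditions.

def range_query(records, key, low, high):
    # Strict inequalities, matching the slide's 20 < Age < 50 style
    return [r for r in records if low < r[key] < high]

employees = [
    {"age": 25, "salary": 48000, "children": 2},
    {"age": 42, "salary": 55000, "children": 1},
    {"age": 38, "salary": 50000, "children": 3},
]
print(range_query(employees, "age", 20, 50))     # 20 < Age < 50
print(range_query(employees, "children", 1, 4))  # 1 < #Children < 4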
Nearest-Neighbor(s) (NN) Query
• Example:
• key=Salary: retrieve the employee whose salary is closest to $50,000 (i.e., 1-NN).
• key=Age: retrieve the 5 employees whose age is closest to 40 (i.e., k-NN, k=5).
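Both queries reduce to sorting by distance from the target value; here is a hedged sketch over the same hypothetical employee records.

def knn_query(records, key, target, k=1):
    # Sort by absolute difference from the target, keep the k closest
    return sorted(records, key=lambda r: abs(r[key] - target))[:k]

employees = [
    {"age": 25, "salary": 48000},
    {"age": 42, "salary": 55000},
    {"age": 38, "salary": 50000},
]
print(knn_query(employees, "salary", 50000, k=1))  # 1-NN on Salary
print(knn_query(employees, "age", 40, k=2))        # 2-NN on Age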
Nearest Neighbor(s) Query
• What is the closest restaurant to my hotel?
Nearest Neighbor(s) Query (cont’d)
• Find the 4 closest restaurants to my hotel
Nearest Neighbor Query in High Dimensions
• Very important and practical problem!
• Image retrieval: represent each image by a feature vector (f1, f2, ..., fk) and find the N closest matches (i.e., the N nearest neighbors) to the query image.
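In code, such a search is a nearest-neighbor query over feature vectors; the database size, dimensionality, and random data below are assumptions for illustration.

import numpy as np

def n_nearest(features, query_vec, n=5):
    # Euclidean distance from the query vector to every stored vector
    dists = np.linalg.norm(features - query_vec, axis=1)
    return np.argsort(dists)[:n]  # indices of the n closest matches

rng = np.random.default_rng(0)
features = rng.random((1000, 64))  # hypothetical: 1000 images, k = 64 features each
query = rng.random(64)             # feature vector of the query image
print(n_nearest(features, query, n=5))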
Nearest Neighbor Query in High Dimensions
• Face recognition
Interpreting Queries Geometrically
• Multi-dimensional keys can be thought of as “points” in high-dimensional spaces.
Example 1 – Range Search in 2D
Example 2 – Range Search in 3D
Example 3 – Nearest Neighbors Search
[Figure: nearest-neighbor search around a query point.]
Classification…
● Data preprocessing is often required
– Attributes must be scaled to prevent distance measures from being dominated by one of the attributes
◆Example:
– height of a person may vary from 1.5m to 1.8m
– weight of a person may vary from 90lb to 300lb
– income of a person may vary from $10K to $1M
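To see why, here is a hedged numeric sketch using made-up values within the slide's ranges: with raw units, the income attribute swamps the others.

import numpy as np

# Two made-up people: [height (m), weight (lb), income ($)]
a = np.array([1.5,  90, 10000])
b = np.array([1.8, 300, 12000])

# Without scaling, the Euclidean distance (~2011) comes almost entirely
# from the income difference; height and weight contribute next to nothing.
print(np.linalg.norm(a - b))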
K-Nearest Neighbor (kNN)
● kNN is a lazy learner algorithm: it does not learn from the training set immediately; instead, it stores the dataset and performs its work at classification time.
● During the training phase, the kNN algorithm simply stores the entire training dataset as a reference.
● Manhattan distance: the sum of absolute differences between two points across all the dimensions (see the sketch after this list).
● Next, the algorithm identifies the K nearest neighbors to the input data point based on their distances.
● In the case of classification, the algorithm assigns the most common class label among the K neighbors as the predicted label for the input data point.
● For regression, it calculates the average or weighted average of the target values of the K neighbors to predict the value for the input data point.
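A minimal sketch of the Manhattan distance and of kNN regression by unweighted averaging; the training data and function names are assumptions for illustration.

import numpy as np

def manhattan(p, q):
    # Manhattan distance: sum of absolute coordinate differences
    return np.sum(np.abs(p - q))

def knn_regress(X_train, y_train, x_query, k=3):
    # Predict by averaging the targets of the k nearest neighbors
    dists = np.array([manhattan(x, x_query) for x in X_train])
    nearest = np.argsort(dists)[:k]
    return y_train[nearest].mean()  # unweighted average

# Hypothetical training data (assumed, not from the slides)
X_train = np.array([[1.0, 2.0], [2.0, 3.0], [8.0, 9.0], [9.0, 8.0]])
y_train = np.array([10.0, 12.0, 40.0, 42.0])
print(manhattan(np.array([1.0, 2.0]), np.array([2.0, 3.0])))      # -> 2.0
print(knn_regress(X_train, y_train, np.array([1.5, 2.5]), k=2))   # -> 11.0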
Need for Standardizing Attributes: Feature Scaling
Need for Standardizing Attributes: Min-Max Scaling
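The slide body is not included here; as a hedged sketch, min-max scaling conventionally maps each attribute to [0, 1] via x' = (x - min) / (max - min), which puts the earlier height/weight/income values on a comparable scale.

import numpy as np

def min_max_scale(X):
    # x' = (x - min) / (max - min), applied column-wise
    mins, maxs = X.min(axis=0), X.max(axis=0)
    return (X - mins) / (maxs - mins)

# Made-up values in the slide's ranges: [height (m), weight (lb), income ($)]
X = np.array([[1.5,  90, 10000],
              [1.8, 300, 12000],
              [1.6, 150, 1000000]])
print(min_max_scale(X))  # every attribute now lies in [0, 1]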