Notes 02
EE514 – CS535
Zubair Khalid
https://www.zubairkhalid.org/ee514_2023.html
Outline
Classification:
- Bayesian Methods (Frequency Table)
- Decision Trees
- Logistic Regression
- Similarity Function: k-Nearest Neighbor
- Neural Network
- Support Vector Machine
- Others
k-Nearest Neighbor (kNN) Algorithm
Idea: Assign a test point the label of its nearest (most similar) training point.
Generalization: Determine the labels of the k nearest neighbors and
assign the most frequent label.
(Illustration: the same test point classified with k=3 and with k=7.)
k-Nearest Neighbor (kNN) Algorithm
Formal Definition:
Interpretation:
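The formal definition shown here did not survive extraction; a restatement following the Cornell CS4780 notes referenced at the end (test point x, training set D, distance metric dist): the neighbor set S_x and the prediction h(x) are

```latex
\[
S_{\mathbf{x}} \subseteq D, \qquad |S_{\mathbf{x}}| = k, \qquad
\forall (\mathbf{x}', y') \in D \setminus S_{\mathbf{x}}:\;
\operatorname{dist}(\mathbf{x}, \mathbf{x}') \;\ge\;
\max_{(\mathbf{x}'', y'') \in S_{\mathbf{x}}} \operatorname{dist}(\mathbf{x}, \mathbf{x}'')
\]
\[
h(\mathbf{x}) = \operatorname{mode}\bigl(\{\, y'' : (\mathbf{x}'', y'') \in S_{\mathbf{x}} \,\}\bigr)
\]
```

Interpretation: S_x contains the k training points closest to x, and the predicted label is the most frequent label among them.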
k-Nearest Neighbor (kNN) Algorithm
Formal Definition:
https://demonstrations.wolfram.com/KNearestNeighborKNNClassifier/
k-Nearest Neighbor (kNN) Algorithm
Characteristics of kNN:
- Hyper-Parameters
- k (number of neighbors)
- Distance metric (to quantify similarity)
k-Nearest Neighbor (kNN) Algorithm
Characteristics of kNN:
- Complexity (both time and storage) of prediction increases with the size
of training data.
- In case of a tie:
- Use prior information
- Use a 1-NN classifier or reduce to a (k-1)-NN classifier to break the tie
k-Nearest Neighbor (kNN) Algorithm
Distance Metric:
Properties of Norm:
k-Nearest Neighbor (kNN) Algorithm
Distance Metric:
Properties of Distance Metrics:
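The properties listed on this slide are not recoverable from the extraction; the usual axioms a distance metric d(.,.) must satisfy are:

```latex
\[
\begin{aligned}
& d(\mathbf{x}, \mathbf{y}) \ge 0 && \text{(non-negativity)} \\
& d(\mathbf{x}, \mathbf{y}) = 0 \iff \mathbf{x} = \mathbf{y} && \text{(identity of indiscernibles)} \\
& d(\mathbf{x}, \mathbf{y}) = d(\mathbf{y}, \mathbf{x}) && \text{(symmetry)} \\
& d(\mathbf{x}, \mathbf{z}) \le d(\mathbf{x}, \mathbf{y}) + d(\mathbf{y}, \mathbf{z}) && \text{(triangle inequality)}
\end{aligned}
\]
```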
k-Nearest Neighbor (kNN) Algorithm
Distance Metric:
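The specific metric shown on this slide did not survive extraction; the standard choice in kNN is the Minkowski family, which contains the Manhattan and Euclidean distances as special cases:

```latex
% p = 1: Manhattan distance, p = 2: Euclidean distance,
% p -> infinity: maximum (Chebyshev) distance.
\[
\operatorname{dist}_p(\mathbf{x}, \mathbf{z}) =
\left( \sum_{i=1}^{d} |x_i - z_i|^p \right)^{1/p}, \qquad p \ge 1
\]
```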
k-Nearest Neighbor (kNN) Algorithm
Cosine Distance
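The cosine-distance formula on this slide is not recoverable; the standard definition, based on the angle between the two feature vectors, is:

```latex
\[
\cos\theta = \frac{\mathbf{x}^{\top}\mathbf{z}}{\|\mathbf{x}\|_2\,\|\mathbf{z}\|_2},
\qquad
d_{\cos}(\mathbf{x}, \mathbf{z}) = 1 - \cos\theta
\]
```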
k-Nearest Neighbor (kNN) Algorithm
Choice of k:
- k = n: the prediction is simply the most frequent label in the entire training set, regardless of the test point.
- Decreasing k enables capturing finer structure of the space, at the cost of greater sensitivity to noise.
- Idea: Pick k not too large, but not too small (depends on the data).
- How?
- Learn the best hyper-parameter k from the data.
- Start from k = 1 and keep iterating over larger values of k, carrying out cross-validation
(5-fold or 10-fold, for example): fit on the training folds and compute the loss on the
validation fold.
Error Rate:
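A minimal sketch of this selection procedure, computing the cross-validated error rate for each candidate k. The dataset (load_iris) and the scikit-learn estimators are stand-ins for illustration, not part of the original slides:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)          # stand-in dataset for illustration

candidate_ks = range(1, 26)
cv_errors = []
for k in candidate_ks:
    clf = KNeighborsClassifier(n_neighbors=k)
    # 5-fold cross-validation; error rate = 1 - mean accuracy over the folds
    scores = cross_val_score(clf, X, y, cv=5)
    cv_errors.append(1.0 - scores.mean())

best_k = candidate_ks[int(np.argmin(cv_errors))]
print("best k:", best_k)
```

Plotting cv_errors against k gives the validation error-rate curve referred to above.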
k-Nearest Neighbor (kNN) Algorithm
Error Rate:
Error Convergence:
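The equations on this slide did not survive extraction; the convergence result usually presented here is the classical Cover-Hart bound: as the number of training points n grows, the error of the 1-NN classifier is at most twice the error of the Bayes-optimal classifier:

```latex
\[
\lim_{n \to \infty} \; \epsilon_{1\text{-NN}} \;\le\; 2\,\epsilon_{\text{Bayes}}
\]
```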
k-Nearest Neighbor (kNN) Algorithm
Algorithm Steps:
1. Find the distance between the given test point and the feature vector of every point in D.
2. Find the k points in D closest to the given test point to form a set Sx.
3. Find the most frequent label in the set Sx and assign it to the test point.
Computational Complexity: O(nd) distance computations per test point for n training points in d dimensions, plus the cost of selecting the k smallest distances.
Space Complexity: O(nd), since the entire training set must be stored.
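A minimal brute-force implementation of the three steps above (Euclidean distance assumed; the function name and structure are illustrative only):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_test, k=3):
    """Brute-force kNN prediction for a single test point (sketch).

    X_train: (n, d) array of training feature vectors (the set D)
    y_train: (n,) array of labels
    x_test:  (d,) test feature vector
    """
    # Step 1: distance from the test point to every training point, O(nd).
    dists = np.linalg.norm(X_train - x_test, axis=1)

    # Step 2: indices of the k closest points (the set Sx).
    nearest = np.argpartition(dists, k)[:k]

    # Step 3: most frequent label among the k neighbours.
    return Counter(y_train[nearest]).most_common(1)[0][0]
```

The whole training set is kept in memory and scanned for every prediction, which matches the complexity remarks above.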
k-Nearest Neighbor (kNN) Algorithm
K-D Tree:
- Pick the middle (median) value of the feature along the selected splitting dimension after sorting along
that dimension.
- Use this value as the root node, construct a binary tree, and keep going recursively
(a construction sketch follows the example below).
Example:
k-Nearest Neighbor (kNN) Algorithm
K-D Tree:
Example:
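A short construction sketch of the procedure described above (the Node structure and the sample points are illustrative, not from the slides):

```python
from collections import namedtuple

Node = namedtuple("Node", ["point", "axis", "left", "right"])

def build_kdtree(points, depth=0):
    """Recursively build a k-d tree (construction only, no search)."""
    if not points:
        return None
    axis = depth % len(points[0])                 # splitting dimension cycles through coordinates
    points = sorted(points, key=lambda p: p[axis])
    median = len(points) // 2                     # middle value after sorting along that dimension
    return Node(point=points[median],
                axis=axis,
                left=build_kdtree(points[:median], depth + 1),
                right=build_kdtree(points[median + 1:], depth + 1))

tree = build_kdtree([(7, 2), (5, 4), (9, 6), (4, 7), (8, 1), (2, 3)])
print(tree.point, tree.axis)                      # root point and its splitting dimension
```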
k-Nearest Neighbor (kNN) Algorithm
K-D Tree:
Connection with kNN:
Finding nearest neighbor
- Trade-offs:
- Computational overhead to construct the tree: O(n log n).
- Space complexity: O(n).
- May miss neighbors.
- Performance degrades as the dimension of the feature space increases
(Curse of Dimensionality).
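In practice an off-the-shelf k-d tree can be used for the nearest-neighbor query; a minimal usage sketch with scipy.spatial.KDTree (an exact implementation, used here purely for illustration):

```python
import numpy as np
from scipy.spatial import KDTree

rng = np.random.default_rng(0)
X_train = rng.random((1000, 3))          # 1000 training points in 3-D (toy data)

tree = KDTree(X_train)                   # O(n log n) construction, O(n) storage
x_test = np.array([0.5, 0.5, 0.5])

# distances and indices of the k = 5 nearest training points
dist, idx = tree.query(x_test, k=5)
print(idx, dist)
```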
k-Nearest Neighbor (kNN) Algorithm
The Curse of Dimensionality:
- For high-dimensional datasets, the size of the data space is huge.
- To ensure the points stay close to each other, the size (n) of the
data set must also grow exponentially with the dimension. That is, we need a
very large dataset to maintain the density of points in the high-dimensional space.
Ref: CB
k-Nearest Neighbor (kNN) Algorithm
The Curse of Dimensionality:
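The figures on these slides are not recoverable; a standard illustration from Bishop (consistent with the "Ref: CB" above) is that the fraction of the volume of a D-dimensional sphere of radius r lying in a thin shell between r - epsilon and r tends to 1 as D grows:

```latex
\[
\frac{V_D(r) - V_D(r - \epsilon)}{V_D(r)}
\;=\; 1 - \left(1 - \frac{\epsilon}{r}\right)^{D}
\;\longrightarrow\; 1 \quad \text{as } D \to \infty
\]
```

In high dimensions almost all of the volume lies near the surface, so the "nearest" neighbors are not much nearer than everything else.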
k-Nearest Neighbor (kNN) Algorithm
The Curse of Dimensionality (Another viewpoint):
- The kNN algorithm makes predictions about a test point assuming the training data
contains points near the test point that are similar to it.
- In high dimensions this assumption holds in practice when the data effectively lies on a
lower-dimensional manifold. For example,
- Data along a line or a plane in a higher-dimensional space.
- Detection of the orientation of an object in an image; the data lies on an effectively
one-dimensional manifold in a roughly one-million-dimensional pixel space.
- Face recognition in an image (50 or 71 features).
- Spam filter
k-Nearest Neighbor (kNN) Algorithm
Reference:
Overall:
• https://www.cs.cornell.edu/courses/cs4780/2018fa/
Zubair Khalid
https://www.zubairkhalid.org/ee514_2023.html
Outline
- Dimensionality Reduction
- Feature Selection
- Feature Extraction - PCA
Dimensionality Reduction
Why?
- Increasing the number of inputs or features does not
always improve the classification accuracy.
Benefits:
- Improve the classification performance.
Optimal subset:
{x1, x2} or {x1, x3}
- Calculate a score for each feature against the label using metrics such as:
- Pearson correlation coefficient
- Mutual Information
- F-score
- Chi-square
- Signal-to-noise ratio (SNR), etc.
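A minimal filter-style selection sketch, assuming a feature matrix X and labels y are available; scikit-learn's SelectKBest with an F-score (or mutual-information) criterion stands in for the metrics listed above, and load_iris is a stand-in dataset:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif, mutual_info_classif

X, y = load_iris(return_X_y=True)                     # stand-in dataset for illustration

# Score every feature against the label and keep the 2 best-scoring ones.
selector = SelectKBest(score_func=f_classif, k=2)     # or score_func=mutual_info_classif
X_reduced = selector.fit_transform(X, y)

print("feature scores:", selector.scores_)
print("selected feature indices:", selector.get_support(indices=True))
```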
- Dimensionality Reduction
- Feature Selection
- Feature Extraction - PCA
Dimensionality Reduction
Feature Extraction:
Transform existing features to obtain a set of new features using some mapping function.
- Project into a lower-dimensional space using a linear transformation (restated below).
- For example (can you tell the size of the matrix W for the following cases?),
- find best planar approximation to 4D data
- find best planar approximation to 100D data
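The transformation equation on this slide did not survive extraction; the linear mapping assumed here is the standard one:

```latex
\[
\mathbf{z} = \mathbf{W}^{\top}\mathbf{x},
\qquad \mathbf{x} \in \mathbb{R}^{d}, \quad
\mathbf{W} \in \mathbb{R}^{d \times k}, \quad
\mathbf{z} \in \mathbb{R}^{k}, \quad k < d
\]
% A planar approximation means k = 2, so W is 4 x 2 for the 4-D case
% and 100 x 2 for the 100-D case.
```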
- We want to find this mapping while preserving as much information as possible, ensuring
- Objective 1: the features after mapping are uncorrelated, so they cannot be reduced further.
- We use the covariance matrix of the data to define the mapping and take the eigenvectors with the
largest eigenvalues, that is, the directions capturing most of the variation in the data.
- PCA projects the data onto the directions along which the data varies
the most.
Dimensionality Reduction
Feature Extraction - Principal Component Analysis:
How do we choose k?
- It depends on the amount of information, that is, variance, we want to preserve in the
mapping process.
- The covariance matrix of the reduced features is diagonal because the projection is along orthogonal
components (directions); the new features are therefore uncorrelated with each other. In other words,
PCA decorrelates the features.
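A minimal PCA sketch via eigen-decomposition of the covariance matrix, choosing k by the fraction of variance to preserve (the function, threshold, and toy data are illustrative assumptions, not from the slides):

```python
import numpy as np

def pca(X, variance_to_keep=0.95):
    """PCA sketch: X is an (n, d) data matrix; returns projected data and chosen k."""
    Xc = X - X.mean(axis=0)                       # centre the data
    C = np.cov(Xc, rowvar=False)                  # d x d covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)          # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]             # sort descending by variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Choose the smallest k whose eigenvalues capture the desired fraction of variance.
    ratio = np.cumsum(eigvals) / np.sum(eigvals)
    k = int(np.searchsorted(ratio, variance_to_keep)) + 1

    W = eigvecs[:, :k]                            # d x k mapping matrix
    Z = Xc @ W                                    # projected (decorrelated) features
    return Z, k

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))   # correlated toy data
Z, k = pca(X)
print("k =", k)
print("covariance of Z (approximately diagonal):")
print(np.round(np.cov(Z, rowvar=False), 3))
```

The near-diagonal covariance of Z illustrates the decorrelation property stated above.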
- Limitation:
- PCA does not consider the separation of the data with respect to the class labels;
there is no guarantee that mapping the data along the directions of maximum variance
yields new features that are good for class discrimination.
- Solution: Linear Discriminant Analysis (LDA) finds the mapping directions along which
the classes are best separated.