
Machine Learning

CSE343/CSE543/ECE363/ECE563
Lecture 14 | Take your own notes during lectures
Vinayak Abrol <[email protected]>
Eager and Lazy Learning
Eager: a learning method in which we learn a general model/mapping, i.e., an
input-independent target function, during training of the system.
- Examples: SVM, LR, DT
- The target function is approximated globally during training
- Post-training queries to the system have no effect on the learned model
- Much less space is required

Lazy: generalization of the training data is, in theory, delayed until a query is made
to the system.
- Used when the data set is continuously updated, e.g., a top-10 songs list
- There is in principle no training phase
- The target function is approximated locally, around each query
- Large space requirements, slow inference, and sensitivity to noise
- Examples include k-NN, Local Regression, CBR
Instance-Based Learning: KNN
Instance-based learning methods simply store the training examples (or a
reasonably sized subset) instead of learning an explicit description of the target
function.
- When a new instance is encountered, its relationship to the stored examples is
  examined in order to assign a target function value to the new instance.

k-Nearest Neighbor

- The nearest neighbors of an instance are defined in terms of a distance.
- For a given query instance, the output is computed from the function values of
  the k nearest neighbors of the query.
- k-NN is neither a truly supervised nor a truly unsupervised method, though the
  former view is more common.
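A minimal sketch of the idea in NumPy, assuming Euclidean distance and majority voting for a discrete target (names and data are illustrative, not from the lecture):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    # Store-only "training": all work happens at query time (lazy learning).
    # Compute Euclidean distances from the query to every stored example.
    dists = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k nearest stored examples.
    nearest = np.argsort(dists)[:k]
    # Discrete target: majority vote among the k neighbours.
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Example usage with toy 2-D data.
X = np.array([[1.0, 1.0], [1.2, 0.8], [8.0, 8.0], [8.5, 7.9]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([1.1, 0.9]), k=3))  # -> 0
```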
KNN: Normalization and Feature weighting
- If the target function is discrete: take a vote among the neighbors
- If the target function is continuous: take the average value
- Pick k with the lowest error rate on the validation set
- The distance can be dominated by attributes with large numeric ranges,
  e.g., features age and income: (30, 70K) → normalized (0.35, 0.38)
- Differences in irrelevant features can also dominate:
  - k-NN is easily misled in high-dimensional spaces
  - Reweight dimension i by a weight w_i
  - Setting w_i to zero eliminates that dimension; weights are typically chosen
    using cross-validation
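A short sketch of min-max normalization and per-feature weighting of the distance, continuing the age/income example above (the weight values are illustrative assumptions):

```python
import numpy as np

def min_max_normalize(X):
    # Rescale each feature to [0, 1] so large-valued features (e.g. income)
    # do not dominate the distance.
    mins, maxs = X.min(axis=0), X.max(axis=0)
    return (X - mins) / (maxs - mins)

def weighted_distance(a, b, w):
    # Weighted Euclidean distance; setting w[i] = 0 eliminates feature i.
    return np.sqrt(np.sum(w * (a - b) ** 2))

X = np.array([[30.0, 70_000.0], [45.0, 52_000.0], [22.0, 31_000.0]])
Xn = min_max_normalize(X)
w = np.array([1.0, 0.5])   # down-weight income (illustrative weights)
print(weighted_distance(Xn[0], Xn[1], w))
```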
KNN: Distance Metrics
Minkowski Distance: for real-valued normed vector spaces;
p = 1 and p = 2 correspond to the Manhattan and Euclidean distances.

Weighted Euclidean distance: sample weighting.

Cosine Distance: based on the cosine of the angle between two vectors; it
measures whether the two vectors point in the same direction.

We might need a different distance depending on the type of data we are
dealing with, e.g., Hamming distance for binary strings.
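The Minkowski distance is d_p(x, y) = (Σ_i |x_i − y_i|^p)^(1/p). A sketch of the metrics above in NumPy (function names are illustrative):

```python
import numpy as np

def minkowski(x, y, p=2):
    # p = 1 -> Manhattan distance, p = 2 -> Euclidean distance.
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

def weighted_euclidean(x, y, w):
    # Per-feature weights w_i rescale each dimension's contribution.
    return np.sqrt(np.sum(w * (x - y) ** 2))

def cosine_distance(x, y):
    # 1 - cosine similarity: 0 when the vectors point the same way.
    return 1.0 - np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

def hamming(x, y):
    # For binary strings / categorical codes: count of differing positions.
    return np.sum(x != y)
```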
K-Means Clustering
K-means minimizes the intra-cluster variance, i.e., the sum of squared distances
between each cluster center and the data samples assigned to it.
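Written out in standard notation (not taken verbatim from the slide), with μ_k the center of cluster C_k, the objective is:

```latex
J(\{C_k\}, \{\mu_k\}) = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2
```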

Homework: K-Means vs K-Median vs K-medoids ?


K-Means Clustering
Each data point is assigned to the center at minimum distance from it.
K-means converges irrespective of initialization.

Convergence does not mean a better result, only a lower cost!


K-Means [Lloyd-EM Style]: Convergence
There are at most k^N ways to partition N data points into k clusters;
each such partition can be called a "clustering".

This is a large but finite number.


For each iteration of the algorithm, we produce a new clustering based only on the old clustering.

Notice that
- If the old clustering is the same as the new one, then every later clustering will again be the same
- If the new clustering is different from the old one, then the new one has a lower cost
Since the cost decreases every time the clustering changes and there are only finitely many clusterings, the algorithm must terminate.

Assignment step: assign each data point to its nearest cluster center.

Update step: move each center to the mean of the points currently assigned to it.
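A minimal Lloyd-style sketch of these two steps, assuming Euclidean distance (a toy implementation, not the lecture's reference code):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Initialize centers by picking k distinct data points at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: each point joins its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center moves to the mean of its assigned points.
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        # Cost never increases, so stop once the centers are stable.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```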
K-Means
● k-means assumes the variance of the distribution of each attribute (variable) is spherical;

● all variables have the same variance;

● the prior probability for all k clusters is the same, i.e., each cluster has
  roughly an equal number of observations.

Understanding the assumptions underlying a method is essential: it doesn't just
tell you when a method has drawbacks, it tells you how to fix them.
Kernel K-Means
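Kernel k-means is one way to relax these assumptions: it clusters in a feature space induced by a kernel, so clusters need not be spherical in the original input space. Below is a hedged sketch of one standard formulation that computes feature-space distances entirely from the kernel matrix; the RBF kernel and all names are illustrative assumptions, not the lecture's reference code.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    # K[i, j] = exp(-gamma * ||x_i - x_j||^2)
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    return np.exp(-gamma * sq)

def kernel_kmeans(K, k, n_iters=50, seed=0):
    n = K.shape[0]
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=n)          # random initial assignment
    for _ in range(n_iters):
        dist = np.zeros((n, k))
        for c in range(k):
            mask = labels == c
            m = mask.sum()
            if m == 0:
                dist[:, c] = np.inf
                continue
            # ||phi(x_i) - mu_c||^2 from kernel entries only:
            # K_ii - (2/|C|) sum_j K_ij + (1/|C|^2) sum_{j,l} K_jl
            dist[:, c] = (np.diag(K)
                          - 2.0 * K[:, mask].sum(axis=1) / m
                          + K[np.ix_(mask, mask)].sum() / m**2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels
```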
Density Based Clustering
This method is based on the idea that a cluster/group in a data space is a contiguous region of high point
density, separated from other clusters by sparse regions.

The data points in the separating, sparse regions are typically considered noise/outliers.
● Defined distance (Density-Based Spatial Clustering of Applications with Noise, DBSCAN) uses a
  single distance threshold to separate dense clusters from sparser noise. DBSCAN is the fastest of
  these methods, but it works well only when all significant clusters have comparable densities.

● Self-adjusting (HDBSCAN, or tunable DBSCAN) is data-driven and uses a range of distances to
  distinguish clusters of varying density from sparser noise.

● Multi-scale (Ordering Points To Identify the Clustering Structure, OPTICS) works by creating an
  ordered list of points with a reachability distance, a measure of how easy it is to reach a point
  from other points in the dataset. Points with similar reachability distances are likely to be in
  the same cluster. Essentially, it produces a visualization of reachability distances.

● Kernel-density based (mean-shift clustering) methods estimate the underlying density from the
  samples and shift a kernel window toward the local mean/center of mass to identify clusters.
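A hedged usage sketch with scikit-learn's implementations of these ideas; the parameter values are illustrative, and HDBSCAN ships only with recent scikit-learn versions (otherwise via the separate hdbscan package):

```python
import numpy as np
from sklearn.cluster import DBSCAN, OPTICS, MeanShift
from sklearn.datasets import make_moons

# Two crescent-shaped clusters plus a little noise.
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# Defined distance: one eps for all clusters; label -1 marks noise points.
db_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

# Multi-scale: orders points by reachability distance, then extracts clusters.
optics_labels = OPTICS(min_samples=5).fit_predict(X)

# Kernel-density based: shifts windows toward local modes of the density.
ms_labels = MeanShift(bandwidth=0.5).fit_predict(X)

print(np.unique(db_labels), np.unique(optics_labels), np.unique(ms_labels))
```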
Density Based Clustering

https://www.youtube.com/watch?app=desktop&v=RDZUdRSDOok
Segmentation via Mean-Shift Clustering (example figures)
Thanks
