
Machine Learning

CSE343/CSE543/ECE363/ECE563
Lecture 14 | Take your own notes during lectures
Vinayak Abrol <[email protected]>
Eager and Lazy Learning
Eager: a learning method in which we learn a general model/mapping, i.e., an
input-independent target function, during training of the system.
- Examples: SVM, LR, DT
- The target function is approximated globally during training
- Post-training queries to the system have no effect on the learned model
- Much less space is required

Lazy: generalization of the training data is, in theory, delayed until a query is made
to the system.
- Used when the data set is continuously updated, e.g., a top-10 songs list
- There is in principle no training phase
- The target function is approximated locally, around each query
- Large space requirements, slow inference, and sensitivity to noise
- Examples include k-NN, Local Regression, CBR
Instance-Based Learning: KNN
Instance-based learning methods simply store the training examples (or a
reasonably sized subset) instead of learning an explicit description of the target
function.
- When a new instance is encountered, its relationship to the stored examples is
  examined in order to assign a target function value to the new instance.

k-Nearest Neighbor

- The nearest neighbors of an instance are defined in terms of a distance.
- For a given query instance, the output is computed from the function values of
  the k nearest neighbors of the query.
- k-NN is neither a truly supervised nor a truly unsupervised method, though the
  former view is more common.
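A minimal sketch of the idea in NumPy, assuming Euclidean distance and majority voting for a discrete target (names and data are illustrative, not from the lecture):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    # Store-only "training": all work happens at query time (lazy learning).
    # Compute Euclidean distances from the query to every stored example.
    dists = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k nearest stored examples.
    nearest = np.argsort(dists)[:k]
    # Discrete target: majority vote among the k neighbours.
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Example usage with toy 2-D data.
X = np.array([[1.0, 1.0], [1.2, 0.8], [8.0, 8.0], [8.5, 7.9]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([1.1, 0.9]), k=3))  # -> 0
```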
KNN: Normalization and Feature weighting
- If the target function is discrete: take a vote among the neighbors
- If the target function is continuous: take the average value
- Pick k with the lowest error rate on the validation set
- The distance can be dominated by attributes with large numeric ranges,
  e.g., features age and income: (30, 70K) → normalized (0.35, 0.38)
- Differences in irrelevant features can also dominate:
  - k-NN is easily misled in high-dimensional spaces
  - Reweight dimension i by a weight w_i
  - Setting w_i to zero eliminates that dimension; weights are typically chosen
    using cross-validation
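A short sketch of min-max normalization and per-feature weighting of the distance, continuing the age/income example above (the weight values are illustrative assumptions):

```python
import numpy as np

def min_max_normalize(X):
    # Rescale each feature to [0, 1] so large-valued features (e.g. income)
    # do not dominate the distance.
    mins, maxs = X.min(axis=0), X.max(axis=0)
    return (X - mins) / (maxs - mins)

def weighted_distance(a, b, w):
    # Weighted Euclidean distance; setting w[i] = 0 eliminates feature i.
    return np.sqrt(np.sum(w * (a - b) ** 2))

X = np.array([[30.0, 70_000.0], [45.0, 52_000.0], [22.0, 31_000.0]])
Xn = min_max_normalize(X)
w = np.array([1.0, 0.5])   # down-weight income (illustrative weights)
print(weighted_distance(Xn[0], Xn[1], w))
```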
KNN: Distance Metrics
Minkowski Distance: for real-valued normed vector spaces;
p = 1 and p = 2 correspond to the Manhattan and Euclidean distances.

Weighted Euclidean distance: sample weighting.

Cosine Distance: based on the cosine of the angle between two vectors; it
measures whether the two vectors point in the same direction.

We might need a different distance depending on the type of data we are
dealing with, e.g., Hamming distance for binary strings.
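The Minkowski distance is d_p(x, y) = (Σ_i |x_i − y_i|^p)^(1/p). A sketch of the metrics above in NumPy (function names are illustrative):

```python
import numpy as np

def minkowski(x, y, p=2):
    # p = 1 -> Manhattan distance, p = 2 -> Euclidean distance.
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

def weighted_euclidean(x, y, w):
    # Per-feature weights w_i rescale each dimension's contribution.
    return np.sqrt(np.sum(w * (x - y) ** 2))

def cosine_distance(x, y):
    # 1 - cosine similarity: 0 when the vectors point the same way.
    return 1.0 - np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

def hamming(x, y):
    # For binary strings / categorical codes: count of differing positions.
    return np.sum(x != y)
```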
K-Means Clustering
K-means minimizes the intra-cluster variance, i.e., the sum of squared distances
between each cluster center and the data samples assigned to it.
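Written out in standard notation (not taken verbatim from the slide), with μ_k the center of cluster C_k, the objective is:

```latex
J(\{C_k\}, \{\mu_k\}) = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2
```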

Homework: K-Means vs K-Median vs K-medoids ?


K-Means Clustering
Each data point is assigned to the center at minimum distance from it.
K-means converges irrespective of initialization.

Convergence does not mean a better result, only a lower cost!


K-Means [Lloyd-EM Style]: Convergence
There are at most k^N ways to partition N data points into k clusters;
each such partition can be called a "clustering".

This is a large but finite number.


For each iteration of the algorithm, we produce a new clustering based only on the old clustering.

Notice that
- If the old clustering is the same as the new one, then every later clustering will again be the same
- If the new clustering is different from the old one, then the new one has a lower cost
Since the cost decreases every time the clustering changes and there are only finitely many clusterings, the algorithm must terminate.

Assignment step: assign each data point to its nearest cluster center.

Update step: move each center to the mean of the points currently assigned to it.
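A minimal Lloyd-style sketch of these two steps, assuming Euclidean distance (a toy implementation, not the lecture's reference code):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Initialize centers by picking k distinct data points at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: each point joins its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center moves to the mean of its assigned points.
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        # Cost never increases, so stop once the centers are stable.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```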
K-Means
● k-means assumes the variance of the distribution of each attribute (variable) is spherical;

● all variables have the same variance;

● the prior probability for all k clusters is the same, i.e., each cluster has
  roughly an equal number of observations.

Understanding the assumptions underlying a method is essential: it doesn't just
tell you when a method has drawbacks, it tells you how to fix them.
Kernel K-Means
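Kernel k-means is one way to relax these assumptions: it clusters in a feature space induced by a kernel, so clusters need not be spherical in the original input space. Below is a hedged sketch of one standard formulation that computes feature-space distances entirely from the kernel matrix; the RBF kernel and all names are illustrative assumptions, not the lecture's reference code.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    # K[i, j] = exp(-gamma * ||x_i - x_j||^2)
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    return np.exp(-gamma * sq)

def kernel_kmeans(K, k, n_iters=50, seed=0):
    n = K.shape[0]
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=n)          # random initial assignment
    for _ in range(n_iters):
        dist = np.zeros((n, k))
        for c in range(k):
            mask = labels == c
            m = mask.sum()
            if m == 0:
                dist[:, c] = np.inf
                continue
            # ||phi(x_i) - mu_c||^2 from kernel entries only:
            # K_ii - (2/|C|) sum_j K_ij + (1/|C|^2) sum_{j,l} K_jl
            dist[:, c] = (np.diag(K)
                          - 2.0 * K[:, mask].sum(axis=1) / m
                          + K[np.ix_(mask, mask)].sum() / m**2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels
```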
Density Based Clustering
This method is based on the idea that a cluster/group in a data space is a contiguous region of high point
density, separated from other clusters by sparse regions.

The data points in the separating, sparse regions are typically considered noise/outliers.
● Defined distance (Density-Based Spatial Clustering of Applications with Noise, DBSCAN) uses a
  single distance threshold to separate dense clusters from sparser noise. DBSCAN is the fastest of
  these methods, but it works well only when all significant clusters have comparable densities.

● Self-adjusting (HDBSCAN, or tunable DBSCAN) is data-driven and uses a range of distances to
  distinguish clusters of varying density from sparser noise.

● Multi-scale (Ordering Points To Identify the Clustering Structure, OPTICS) works by creating an
  ordered list of points with a reachability distance, a measure of how easy it is to reach a point
  from other points in the dataset. Points with similar reachability distances are likely to be in
  the same cluster. Essentially, it produces a visualization of reachability distances.

● Kernel-density based (mean-shift clustering) methods estimate the underlying density from the
  samples and shift a kernel window toward the local mean/center of mass to identify clusters.
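A hedged usage sketch with scikit-learn's implementations of these ideas; the parameter values are illustrative, and HDBSCAN ships only with recent scikit-learn versions (otherwise via the separate hdbscan package):

```python
import numpy as np
from sklearn.cluster import DBSCAN, OPTICS, MeanShift
from sklearn.datasets import make_moons

# Two crescent-shaped clusters plus a little noise.
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# Defined distance: one eps for all clusters; label -1 marks noise points.
db_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

# Multi-scale: orders points by reachability distance, then extracts clusters.
optics_labels = OPTICS(min_samples=5).fit_predict(X)

# Kernel-density based: shifts windows toward local modes of the density.
ms_labels = MeanShift(bandwidth=0.5).fit_predict(X)

print(np.unique(db_labels), np.unique(optics_labels), np.unique(ms_labels))
```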
Density Based Clustering

https://www.youtube.com/watch?app=desktop&v=RDZUdRSDOok
Segmentation via Mean-Shift Clustering (example figures)
Thanks
