ClusteringSlides Stanford

This document discusses machine learning clustering techniques. Clustering involves partitioning a set of data points into groups (clusters) so that items within each cluster are more similar to each other than items in other clusters, based on a distance function. K-means clustering is introduced, which aims to partition data into k clusters by minimizing the sum of squared distances between data points and their assigned cluster centers. Examples are provided clustering European cities based on geographic distance and temperature. Uses of clustering include classification by assigning labels to clusters, identifying similar items, and detecting anomalies.

Uploaded by

Sujit Kumar Mohanty

Machine Learning - Clustering

CS102
Spring 2020

Clustering CS102
Data Tools and Techniques
§ Basic Data Manipulation and Analysis
Performing well-defined computations or asking
well-defined questions (“queries”)
§ Data Mining
Looking for patterns in data
§ Machine Learning
Using data to build models and make predictions
§ Data Visualization
Graphical depiction of data
§ Data Collection and Preparation

Machine Learning
Using data to build models and make predictions
Supervised machine learning
• Set of labeled examples to learn from: training data
• Develop model from training data
• Use model to make predictions about new data
Unsupervised machine learning
• Unlabeled data, look for patterns or structure
(similar to data mining)

Clustering
Like classification, data items consist of values
for a set of features (numeric or categorical)
§ Medical patients
Feature values: age, gender, symptom1-severity,
symptom2-severity, test-result1, test-result2

§ Web pages
Feature values: URL domain, length, #images, heading_1,
heading_2, …, heading_n

§ Products
Feature values: category, name, size, weight, price

Unlike classification, there is no label

Clustering
Like K-nearest neighbors, for any pair of data items
i1 and i2, can compute a distance from their feature
values using a distance function: distance(i1,i2)
Example:
Features - gender, profession, age, income, postal-code
person1 = (male, teacher, 47, $25K, 94305)
person2 = (female, teacher, 43, $28K, 94309)
distance(person1, person2)

distance() can be defined as inverse of similarity()
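
The slides leave distance() unspecified, so the following is only a minimal sketch of one possible definition for the person example. The scaling constants, the 0/1 mismatch rule for categorical features, and the `similarity` formula are all assumptions made for illustration, not part of the course material.

```python
from math import sqrt

# Hypothetical ranges used to scale numeric differences into roughly [0, 1].
AGE_RANGE = 100.0      # assumed span of ages, in years
INCOME_RANGE = 200.0   # assumed span of incomes, in $K

def distance(p1, p2):
    """Distance between two (gender, profession, age, income_k, postal_code) tuples."""
    gender1, prof1, age1, income1, zip1 = p1
    gender2, prof2, age2, income2, zip2 = p2
    parts = [
        0.0 if gender1 == gender2 else 1.0,      # categorical: match or mismatch
        0.0 if prof1 == prof2 else 1.0,
        abs(age1 - age2) / AGE_RANGE,            # numeric: scaled difference
        abs(income1 - income2) / INCOME_RANGE,
        0.0 if zip1 == zip2 else 1.0,            # postal code treated as categorical
    ]
    return sqrt(sum(d * d for d in parts))

def similarity(p1, p2):
    """One way to invert distance: 1 for identical items, approaching 0 as distance grows."""
    return 1.0 / (1.0 + distance(p1, p2))

person1 = ("male", "teacher", 47, 25, "94305")
person2 = ("female", "teacher", 43, 28, "94309")
d = distance(person1, person2)
```

Scaling matters here: without it, a $3K income gap would dominate a 4-year age gap simply because of the units chosen.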

Clustering
GOAL: Given a set of data items, partition
them into groups (= clusters) so that items
within groups are close to each other based
on distance function
➢ Sometimes number of clusters is pre-specified
➢ Typically clusters need not be same size

Some Uses for Clustering
§ Classification!
• Assign labels to clusters
• Now have labeled training data
for future classification
§ Identify similar items
• For substitutes or recommendations
• For de-duplication
§ Anomaly (outlier) detection
• Items that are far from any cluster
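
As a rough illustration of the anomaly-detection use, the sketch below flags items whose nearest cluster mean is farther away than a cutoff. It assumes numeric features, Euclidean distance, and cluster means that are already known; the function name `find_outliers`, the sample points, and the threshold are hypothetical.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)

def find_outliers(items, means, threshold):
    """Return the items whose nearest cluster mean is farther than threshold."""
    return [p for p in items if min(dist(p, m) for m in means) > threshold]

# Hypothetical cluster means and items, for illustration only.
means = [(0.0, 0.0), (10.0, 10.0)]
items = [(0.5, -0.5), (9.0, 10.5), (5.0, 30.0)]
outliers = find_outliers(items, means, threshold=3.0)
```

Here only the third item is far from both means, so it is the one reported as an outlier.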

K-Means Clustering
Reminder: for any pair of data items i1 and i2
have distance(i1,i2)
For a group of items, the mean value (centroid)
of the group is the item i (in the group or not)
that minimizes the sum of squared distance(i,i’)
for all i’ in the group

§ Error for each item: distance d from the mean
for its group; squared error is d²
§ Error for the entire clustering:
sum of squared errors (SSE)

Remind you of anything?
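
A small sketch of the mean and SSE computations, assuming numeric features and Euclidean distance (in that case the point minimizing the sum of squared distances is the coordinate-wise average). The helper names `centroid` and `sse` and the sample clusters are illustrative only.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)

def centroid(group):
    """Coordinate-wise mean of a non-empty list of equal-length numeric tuples."""
    n = len(group)
    return tuple(sum(coords) / n for coords in zip(*group))

def sse(clusters):
    """Sum of squared distances from every item to the mean of its own cluster."""
    total = 0.0
    for group in clusters:
        c = centroid(group)
        total += sum(dist(p, c) ** 2 for p in group)
    return total

# Two hypothetical clusters of 2-D points.
clusters = [[(0.0, 0.0), (2.0, 0.0)], [(10.0, 10.0), (10.0, 14.0)]]
error = sse(clusters)  # means are (1, 0) and (10, 12); squared errors 1 + 1 + 4 + 4
```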

K-Means Clustering
Given set of data items and desired
number of clusters k, K-means groups the
items into k clusters minimizing the SSE

§ Extremely difficult to compute efficiently

➢ In fact, NP-hard: no efficient exact algorithm is known
§ Most algorithms compute
an approximate solution
(might not be absolute
lowest SSE)
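
The slides do not name a particular approximate algorithm. One widely used heuristic is Lloyd's algorithm, sketched below under the assumption of numeric features and Euclidean distance. It alternates between assigning each item to its nearest mean and recomputing the means. The function names, the random initialization, and the sample points are choices made for this sketch.

```python
import random
from math import dist  # Euclidean distance between two points (Python 3.8+)

def mean(group):
    """Coordinate-wise mean of a non-empty list of equal-length numeric tuples."""
    n = len(group)
    return tuple(sum(c) / n for c in zip(*group))

def kmeans(items, k, iterations=100, seed=0):
    """Lloyd's heuristic; assumes iterations >= 1 and len(items) >= k."""
    rng = random.Random(seed)
    means = rng.sample(items, k)  # start from k randomly chosen items
    for _ in range(iterations):
        # Assignment step: each item joins the cluster of its nearest mean.
        clusters = [[] for _ in range(k)]
        for p in items:
            nearest = min(range(k), key=lambda j: dist(p, means[j]))
            clusters[nearest].append(p)
        # Update step: recompute each mean; an empty cluster keeps its old mean.
        new_means = [mean(g) if g else means[j] for j, g in enumerate(clusters)]
        if new_means == means:  # nothing moved, so assignments are stable
            break
        means = new_means
    return means, clusters

# Hypothetical data: two well-separated groups of 2-D points.
items = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0),
         (10.0, 10.0), (11.0, 10.0), (10.0, 11.0)]
means, clusters = kmeans(items, k=2)
```

Each step never increases the SSE, but the result depends on the starting means and may be a local rather than global minimum, which is why it is common to run it from several random starts and keep the lowest-SSE result.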

Clustering European Cities
By geographic distance, then by temperature

[Figures omitted: one clustering of the cities per setting below]
§ Distance = actual distance
• k = 5
• k = 8, with cluster means
• k = 2, with cluster means
• k = 30
§ Distance = temperature
• k = 5
• k = 8, with means
• k = 2
• k = 3
• k = 30
