Lecture 23 - Clustering

Clustering involves grouping data into clusters so that objects within a cluster are similar to each other but dissimilar to objects in other clusters. It is an unsupervised learning technique with no predefined classes. Clustering is used for data visualization, as a preprocessing step for other algorithms, and in applications like image processing, web mining, and bioinformatics. Common clustering algorithms include k-means, which assigns objects to the closest of k randomly selected centroids, iteratively updating the centroids until clusters stabilize. The quality of clustering depends on achieving high intra-cluster similarity and low inter-cluster similarity.


Clustering

What Is Clustering?
• Group data into clusters
• Similar to one another within the same cluster
• Dissimilar to the objects in other clusters
• Unsupervised learning: no predefined classes

Outliers
• Outliers are objects that do not belong to any cluster or form clusters of very small cardinality

[Figure: a scatter plot showing Cluster 1, Cluster 2, and a few outlying points outside both clusters]

• In some applications we are interested in discovering outliers, not clusters (outlier analysis)
Why do we cluster?
• Clustering: given a collection of data objects, group them so that objects are
• Similar to one another within the same cluster
• Dissimilar to the objects in other clusters

• Clustering results are used:
• As a stand-alone tool to get insight into data distribution
• Visualization of clusters may unveil important information
• As a preprocessing step for other algorithms
• Efficient indexing or compression often relies on clustering
Applications of clustering?
• Image Processing
• cluster images based on their visual content
• Web
• Cluster groups of users based on their access patterns on
webpages
• Cluster webpages based on their content
• Bioinformatics
• Cluster similar proteins together (similarity with respect to chemical
structure and/or functionality, etc.)
• Many more…
Observations to cluster
• Real-value attributes/variables
• e.g., salary, height

• Binary attributes
• e.g., gender (M/F), has_cancer(T/F)

• Nominal (categorical) attributes
• e.g., religion (Christian, Muslim, Buddhist, Hindu, etc.)

• Ordinal/Ranked attributes
• e.g., military rank (soldier, sergeant, lieutenant, captain, etc.)

• Variables of mixed types
• multiple attributes with various types
What Is A Good Clustering?
• High intra-class similarity and low inter-class similarity
• Depending on the similarity measure
• The ability to discover some or all of the hidden patterns
How Good Is A Clustering?
• Dissimilarity/similarity depends on distance function
• Different applications have different functions
• Judgment of clustering quality is typically highly subjective
Similarity and Dissimilarity Between Objects
• Distances are normally used as measures
• Minkowski distance: a generalization

  d(i, j) = (|x_i1 - x_j1|^q + |x_i2 - x_j2|^q + … + |x_ip - x_jp|^q)^(1/q),  q > 0

• If q = 2, d is the Euclidean distance
• If q = 1, d is the Manhattan distance
• Weighted distance: each term |x_ik - x_jk|^q can be multiplied by a weight w_k
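As a quick illustration, the Minkowski distance above can be computed in a few lines of Python (a sketch; the function name `minkowski` is ours, not from any library):

```python
# Minkowski distance between two points x and y.
# q = 2 gives Euclidean distance, q = 1 gives Manhattan distance.
def minkowski(x, y, q=2):
    return sum(abs(a - b) ** q for a, b in zip(x, y)) ** (1 / q)

x, y = (1, 2), (4, 6)
print(minkowski(x, y, q=2))  # Euclidean: 5.0
print(minkowski(x, y, q=1))  # Manhattan: 7.0
```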
K-means algorithm
• Step-1: Select the value of K, to decide the number of clusters to be formed.
• Step-2: Select K random points which will act as centroids.
• Step-3: Assign each data point to the nearest/closest centroid, based on its distance from the randomly selected points (centroids); this forms the initial clusters.
• Step-4: Compute a new centroid (the mean) of each cluster.
• Step-5: Repeat Step-3, reassigning each data point to the new closest centroid.
• Step-6: If any reassignment occurred, go to Step-4; else go to Step-7.
• Step-7: FINISH
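The steps above can be sketched in plain Python (a minimal illustration under the assumption of squared-Euclidean distance, not an optimized implementation; the function name `kmeans` and the sample points are ours):

```python
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Sketch of the steps above: random centroids, assign, recompute, repeat."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # Step 2: pick K random points as centroids
    for _ in range(max_iter):
        # Steps 3/5: assign each point to its closest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda i: sum((a - b) ** 2
                                            for a, b in zip(p, centroids[i])))
            clusters[nearest].append(p)
        # Step 4: recompute each centroid as the mean of its cluster
        new = [tuple(sum(vals) / len(c) for vals in zip(*c)) if c else centroids[i]
               for i, c in enumerate(clusters)]
        if new == centroids:  # Step 6: no centroid moved, so we are done
            break
        centroids = new
    return centroids, clusters

points = [(1, 1), (1.5, 2), (8, 8), (9, 9)]
centers, groups = kmeans(points, k=2)
print(sorted(centers))  # [(1.25, 1.5), (8.5, 8.5)]
```

On this toy data the algorithm converges to the same two centroids regardless of which points are drawn as the initial centers.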
K-Means: Example
[Figure: K = 2. Arbitrarily choose K objects as the initial cluster centers; assign each object to the most similar center; update the cluster means; reassign objects; repeat updating and reassigning until the clusters stabilize.]
[Figure: K-means clustering with K = 4]
Implementation in Python
• from sklearn.cluster import KMeans
• kmeans = KMeans(n_clusters=2)
• kmeans.fit(X)
• https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html
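A minimal end-to-end run might look like this (the sample data `X` is made up for illustration; `random_state` and `n_init` are optional parameters used here for reproducibility):

```python
import numpy as np
from sklearn.cluster import KMeans

# Small made-up dataset: two obvious groups of 2-D points
X = np.array([[1, 1], [1.5, 2], [8, 8], [9, 9]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
kmeans.fit(X)

print(kmeans.labels_)           # cluster index assigned to each point
print(kmeans.cluster_centers_)  # one centroid (mean) per cluster
```

After fitting, `kmeans.predict` can assign new points to the learned clusters.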
