Clustering
Understanding Clustering
Uploaded by
Tinotenda Sandra
Copyright
© All Rights Reserved
13: Clustering

Unsupervised learning - introduction
+ Talk about clustering
  o Learning from unlabeled data
+ Unsupervised learning
  o Useful to contrast with supervised learning
+ Compare and contrast
  o Supervised learning
    - Given a set of labels, fit a hypothesis to it
  o Unsupervised learning
    - Try to determine structure in the data
    - A clustering algorithm groups data together based on data features
+ What is clustering good for?
  o Market segmentation - group customers into different market segments
  o Social network analysis - Facebook "smartlists"
  o Organizing computer clusters and data centers for network layout and location
  o Astronomical data analysis - understanding galaxy formation

K-means algorithm
+ Want an algorithm to automatically group the data into coherent clusters
+ K-means is by far the most widely used clustering algorithm

Overview
+ Take unlabeled data and group it into two clusters
+ Algorithm overview
  o 1) Randomly allocate two points as the cluster centroids
    - Have as many cluster centroids as clusters you want (K cluster centroids, in fact)
    - In our example we just have two clusters
  o 2) Cluster assignment step
    - Go through each example and, depending on whether it is closer to the red or the blue centroid, assign it to one of the two clusters
    - To demonstrate this, we go through the data and "colour" each point red or blue
  o 3) Move centroid step
    - Take each centroid and move it to the average of the data points assigned to it
    - Repeat 2) and 3) until convergence
+ More formal definition
  o Input:
    - K (number of clusters in the data)
    - Training set $\{x^{(1)}, x^{(2)}, x^{(3)}, \dots, x^{(m)}\}$
  o Algorithm:
    - Randomly initialize K cluster centroids as $\{\mu_1, \mu_2, \mu_3, \dots, \mu_K\}$

      Repeat {
        for i = 1 to m
          $c^{(i)}$ := index (from 1 to K) of cluster centroid closest to $x^{(i)}$
        for k = 1 to K
          $\mu_k$ := average (mean) of points assigned to cluster k
      }

    - Loop 1
      - This inner loop repeatedly sets the variable $c^{(i)}$ to the index of the cluster centroid closest to $x^{(i)}$
      - i.e. take the i-th example, measure the squared distance to each cluster centroid, and assign $c^{(i)}$ to the closest cluster
    - Loop 2
      - Loops over each centroid and calculates the mean of all the points associated with that centroid through $c^{(i)}$
    - What if there's a centroid with no data?
      - Remove that centroid, so you end up with K-1 clusters
      - Or, randomly reinitialize it
        - Not sure when to prefer this, though

K-means for non-separated clusters
+ So far we have looked at K-means where we have well defined clusters
+ But often K-means is applied to datasets where there aren't well defined clusters
  o e.g. T-shirt sizing

  [Figure: T-shirt sizing, examples plotted against height]

+ Not obvious discrete groups
+ Say you want to have three sizes (S, M, L): how big do you make these?
  o One way would be to run K-means on this data
  o May do the following

  [Figure: T-shirt sizing, the same data grouped into three clusters]

  o So it creates three clusters, even though they aren't really there
  o Look at the first population of people
    - Try to design a small T-shirt which fits the 1st population
    - And so on for the other two
  o This is an example of market segmentation
    - Build products which suit the needs of your subpopulations

K-means optimization objective
+ Supervised learning algorithms have an optimization objective (cost function)
  o K-means does too
+ K-means has an optimization objective like the supervised learning functions we've seen
  o Why is this good?
  o Knowing this is useful because it helps with debugging
  o It helps find better clusters
+ While K-means is running we keep track of two sets of variables
  o $c^{(i)}$ is the index of the cluster $\{1, 2, \dots, K\}$ to which $x^{(i)}$ is currently assigned
    - i.e. there are m values of $c^{(i)}$, as each example has a $c^{(i)}$ value, and that value is one of the clusters (i.e. it can only take one of K different values)
  o $\mu_k$ is the cluster centroid k
    - Location of cluster centroid k
    - So there are K of them
    - So these are the centroids, which exist in the training data space
  o $\mu_{c^{(i)}}$ is the cluster centroid of the cluster to which example $x^{(i)}$ has been assigned
    - This is more for convenience than anything else
      - You could look up that example i is indexed to cluster j (using the c vector), where j is between 1 and K
      - Then look up the value associated with cluster j in the $\mu$ vector (i.e. what are the features associated with $\mu_j$)
      - But instead, for easy description, we have this variable which gets exactly the same value
    - Let's say $x^{(i)}$ has been assigned to cluster 5
      - Means that
        - $c^{(i)} = 5$
        - $\mu_{c^{(i)}} = \mu_5$
+ Using this notation we can write the optimization objective:

  $$J(c^{(1)}, \dots, c^{(m)}, \mu_1, \dots, \mu_K) = \frac{1}{m} \sum_{i=1}^{m} \lVert x^{(i)} - \mu_{c^{(i)}} \rVert^2$$

  o i.e. the average squared distance between each training example $x^{(i)}$ and the cluster centroid to which $x^{(i)}$ has been assigned
    - This is just what we've been doing, as the visual description shows

    [Figure: examples and their assigned cluster centroids]

    - The red line shows the distance between the example $x^{(i)}$ and the centroid of the cluster to which that example has been assigned
      - When the example is very close to the centroid, this value is small
      - When the centroid is very far away from the example, the value is large
  o This is sometimes called the distortion (or distortion cost function)
  o So we are finding the values which minimize this function:

  $$\min_{c^{(1)}, \dots, c^{(m)},\ \mu_1, \dots, \mu_K} J(c^{(1)}, \dots, c^{(m)}, \mu_1, \dots, \mu_K)$$

+ If we consider the K-means algorithm
  o The cluster assignment step minimizes J(...) with respect to $c^{(1)}, c^{(2)}, \dots, c^{(m)}$
    - i.e. find the centroid closest to each example
    - Doesn't change the centroids themselves
  o The move centroid step
    - We can show this step chooses the values of $\mu$ which minimize J(...) with respect to $\mu$
  o So, we're partitioning the algorithm into two parts
    - The first part minimizes J over the c variables
    - The second part minimizes J over the $\mu$ variables
+ We can use this knowledge to help debug our K-means algorithm: the distortion should never increase from one iteration to the next

Random initialization
+ How we initialize K-means
  o And how to avoid local optima
+ Consider the clustering algorithm
  o We never spoke about how we initialize the centroids
    - There are a few ways - one method is most recommended
+ Have the number of centroids set to less than the number of examples (K < m) (if K > m we have a problem)
  o Randomly pick K training examples
  o Set $\mu_1$ up to $\mu_K$ to these examples' values
+ K-means can converge to different solutions depending on the initialization
  o Risk of a local optimum

  [Figure: global optimum and local optima]

  o The local optima are valid points of convergence, but they are not the global optimum
+ If this is a concern
  o We can do multiple random initializations
    - See if we get the same result - many identical results are likely to indicate a global optimum
+ Algorithmically we can do this as follows:

      For i = 1 to 100 {
        Randomly initialize K-means
        Run K-means; get $c^{(1)}, \dots, c^{(m)}, \mu_1, \dots, \mu_K$
        Compute cost function (distortion) $J(c^{(1)}, \dots, c^{(m)}, \mu_1, \dots, \mu_K)$
      }

  o A typical number of times to initialize K-means is 50-1000
  o Randomly initialize K-means
    - For each of the 100 random initializations, run K-means
    - Then compute the distortion on the set of cluster assignments and centroids at convergence
    - End with 100 ways of clustering the data
    - Pick the clustering which gave the lowest distortion
+ If you're running K-means with 2-10 clusters, multiple random initializations can help find a better optimum
  o If K is larger than 10, then multiple random initializations are less likely to be necessary
  o The first solution is probably good enough (better granularity of clustering)

How do we choose the number of clusters?
+ Choosing K?
  o There is no great way to do this automatically
  o Normally use visualizations to do it manually
+ What are the intuitions regarding the data?
+ Why is this hard?
  o Sometimes very ambiguous
    - e.g. two clusters or four clusters
    - Not necessarily a correct answer
  o This is why doing it automatically is hard

Elbow method
+ Vary K and compute the cost function at a range of K values
+ As K increases, the minimum value of J(...) should decrease (i.e. the clustering becomes finer-grained, so the centroids can fit the data more closely)
  o Plot this (K vs J(...))
+ Look for the "elbow" on the graph

  [Figure: elbow plot, cost function J against K (no. of clusters), K = 1 to 8]

+ Choose the "elbow" number of clusters
+ If you get a nice plot this is a reasonable way of choosing K
+ Risks
  o Normally you don't get a nice line -> no clear elbow on the curve
  o In that case the plot is not really that helpful

Another method for choosing K
+ Using K-means for market segmentation
+ Running K-means for a later/downstream purpose
  o See how well different numbers of clusters serve your later needs
+ e.g.
  o T-shirt size example
    - If you have three sizes (S, M, L)
    - Or five sizes (XS, S, M, L, XL)
    - Run K-means where K = 3 and K = 5
  o How does this look?

  [Figure: two T-shirt sizing panels, weight against height]

  o This gives a way to choose the number of clusters
    - Could consider the cost of making extra sizes vs. how well distributed the products are
    - How important are those sizes though? (e.g. more sizes might make the customers happier)
    - So the applied problem may help guide the number of clusters
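
The steps described in these notes (cluster assignment, move centroid, the distortion J, multiple random initializations, and the sweep over K used by the elbow method) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function names `sq_dist`, `distortion`, `run_kmeans` and `best_of`, the toy data `X`, the choice of 20 restarts, and the decision to reinitialize an empty cluster from a random training example are all assumptions made for the example.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def distortion(X, c, mu):
    """J = (1/m) * sum over i of ||x_i - mu_{c_i}||^2."""
    return sum(sq_dist(x, mu[ci]) for x, ci in zip(X, c)) / len(X)

def run_kmeans(X, K, rng, max_iters=100):
    """One K-means run, initialized from K distinct training examples."""
    mu = [list(x) for x in rng.sample(X, K)]
    c = None
    for _ in range(max_iters):
        # Cluster assignment step: index of the closest centroid.
        new_c = [min(range(K), key=lambda k: sq_dist(x, mu[k])) for x in X]
        if new_c == c:
            break  # assignments unchanged, so the run has converged
        c = new_c
        # Move centroid step: mean of the points assigned to each centroid.
        for k in range(K):
            members = [x for x, ci in zip(X, c) if ci == k]
            if members:
                mu[k] = [sum(col) / len(members) for col in zip(*members)]
            else:
                # Assumption: an empty cluster is randomly reinitialized.
                mu[k] = list(rng.choice(X))
    return c, mu

def best_of(X, K, n_init=20, seed=0):
    """Run K-means n_init times and keep the lowest-distortion result."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_init):
        c, mu = run_kmeans(X, K, rng)
        J = distortion(X, c, mu)
        if best is None or J < best[0]:
            best = (J, c, mu)
    return best

# Hypothetical toy data: two well separated groups of 2-D points.
X = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]

J, c, mu = best_of(X, K=2)

# Elbow-style sweep: distortion of the best run for K = 1..4.
elbow = [best_of(X, K)[0] for K in range(1, 5)]
```

Plotting `elbow` against K gives the curve used by the elbow method. The restart loop keeps only the run with the lowest distortion, which mirrors the "pick the clustering which gave the lowest distortion" step above.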