Hierarchical Clustering
Hierarchical clustering
Figure 19.1: Examples of hierarchical clustering. Left panel: hierarchical clustering of living organisms, indicating evolutionary relations (image source: Wikipedia). Right panel: hierarchical clustering of gene expression data (Mulvey and Gingold, Online Computational Biology Textbook).
A hierarchical clustering can be represented as a dendrogram (a tree) by first joining the most similar examples and then gradually joining the most similar clusters until all links are formed, as shown in Figure 19.2. This means that we need to define not only how to measure the similarity, or dissimilarity, between examples in our dataset, but also how to measure similarity between clusters of examples, because we need to decide how to join clusters into larger clusters.
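For illustration, a dendrogram like the one in Figure 19.2 can be built and plotted with SciPy's hierarchy module. This is only a minimal sketch on toy data; the data matrix X, the random seed and the choice of average linkage are assumptions for the example.

# Minimal sketch: build and plot a dendrogram from a small data matrix.
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))              # toy data: 20 examples, 2 features

Z = linkage(X, method='average', metric='euclidean')   # successive merges, most similar first
dendrogram(Z)                             # draw the tree of merges
plt.xlabel('example index')
plt.ylabel('merge distance')
plt.show()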
There are several ways of thinking about this problem. We can think of proximity between examples as a generic term for “likeness”, without any precise definition. Similarity is better defined, generally as a number between 0 and 1 indicating how alike two examples are. Dissimilarity is also a quantitative measure, in this case of the difference between examples, and distance is a special case of a dissimilarity measure that respects the algebraic properties of a metric. Namely, a distance is non-negative, symmetric and respects the triangle inequality:
$d(x, y) \geq 0, \qquad d(x, y) = d(y, x), \qquad d(x, z) \leq d(x, y) + d(y, z)$
There are many possible distance measures. Some of the most used are the Euclidean, squared Euclidean, Manhattan and Mahalanobis distances.
• Euclidean: $\|x - y\|_2 = \sqrt{\sum_d (x_d - y_d)^2}$ (the squared Euclidean distance simply omits the square root)
• Manhattan: $\|x - y\|_1 = \sum_d |x_d - y_d|$
• Mahalanobis (normalized by the covariance of the data): $\sqrt{(x - y)^T \mathrm{Cov}^{-1} (x - y)}$
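As a hedged illustration (the vectors x and y and the toy data used to estimate the covariance are assumptions for the example), these distances can be computed directly with NumPy and SciPy:

# Sketch: computing the distance measures above for two feature vectors.
import numpy as np
from scipy.spatial.distance import euclidean, cityblock, mahalanobis

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 0.0, 3.5])

d_euclidean = np.sqrt(np.sum((x - y) ** 2))   # Euclidean distance
d_manhattan = np.sum(np.abs(x - y))           # Manhattan distance
assert np.isclose(d_euclidean, euclidean(x, y))
assert np.isclose(d_manhattan, cityblock(x, y))

# The Mahalanobis distance needs the inverse covariance of the data the vectors come from.
data = np.random.default_rng(0).normal(size=(100, 3))
inv_cov = np.linalg.inv(np.cov(data, rowvar=False))
d_mahalanobis = mahalanobis(x, y, inv_cov)    # sqrt((x - y)^T Cov^-1 (x - y))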
For strings and sequences in general, some useful measures are the Hamming distance, which is the
count of differences between the strings, or the Levenshtein distance, or edit distance, counting the
number of single-character edits (insertions, deletions or substitutions) needed to transform one string
into the other.
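As a sketch, the Hamming distance is a direct count of mismatched positions, and the Levenshtein distance can be computed with the usual dynamic programming recurrence (the example strings are arbitrary):

# Sketch: Hamming and Levenshtein (edit) distances between two strings.
def hamming(s, t):
    # Number of positions where two equal-length strings differ.
    assert len(s) == len(t)
    return sum(a != b for a, b in zip(s, t))

def levenshtein(s, t):
    # Minimum number of single-character insertions, deletions or
    # substitutions needed to transform s into t.
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        curr = [i]
        for j, b in enumerate(t, start=1):
            cost = 0 if a == b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

print(hamming("karolin", "kathrin"))      # 3
print(levenshtein("kitten", "sitting"))   # 3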
Apart from a way to measure similarity or distance between examples, we must also measure the distance between clusters. The method for evaluating the distance between clusters is called the linkage, and there are several ways of defining it.
• Single linkage: distance between clusters is the distance between the closest points.
• Average linkage: average distance between all pairs of points from the different clusters.
• Median linkage: median distance between all pairs of points from the different clusters.
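These linkage criteria differ only in how the pairwise distances between the points of the two clusters are aggregated, as in the following sketch (the two clusters A and B are toy examples; complete linkage, used later in Figure 19.6, takes the farthest pair):

# Sketch: distance between two clusters under different linkage criteria.
import numpy as np
from scipy.spatial.distance import cdist

A = np.array([[0.0, 0.0], [0.0, 1.0]])   # points in cluster A
B = np.array([[3.0, 0.0], [4.0, 1.0]])   # points in cluster B

pairwise = cdist(A, B)                   # all distances between points of A and points of B

single_link = pairwise.min()             # closest pair of points
complete_link = pairwise.max()           # farthest pair of points
average_link = pairwise.mean()           # average of all pairwise distances
median_link = np.median(pairwise)        # median of all pairwise distances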
The obvious advantages of hierarchical clustering are that it avoids the need to specify the number of clusters beforehand, since the tree can be cut at any level after clustering, and that it can reveal hierarchical structure in the data. The disadvantages are that hierarchical clustering must generally be done in a single pass, with a greedy algorithm, which may introduce errors, and that, if the hierarchical structure assumed by this type of clustering does not exist in the data, the result may be confusing or misleading.
Agglomerative clustering is a bottom-up approach that begins with singleton clusters and repeatedly
joins the best two clusters, according to the linkage method used, into a higher level cluster until all
elements are joined. The time complexity of agglomerative clustering is generally $O(n^3)$, but can be
improved with linkage constraints.
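As a minimal sketch of this bottom-up procedure, the following naive implementation with single linkage stops when k clusters remain; it is meant only to illustrate the $O(n^3)$ loop, not to replace the optimized library routines (the data matrix and the value of k are assumptions for the example).

# Sketch: naive agglomerative clustering with single linkage, stopping at k clusters.
import numpy as np
from scipy.spatial.distance import cdist

def agglomerative_single_linkage(X, k):
    clusters = [[i] for i in range(len(X))]          # start with singleton clusters
    while len(clusters) > k:
        best = (None, None, np.inf)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # single linkage: distance between the closest points of the two clusters
                d = cdist(X[clusters[i]], X[clusters[j]]).min()
                if d < best[2]:
                    best = (i, j, d)
        i, j, _ = best
        clusters[i] = clusters[i] + clusters[j]      # merge the two closest clusters
        del clusters[j]
    return clusters

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
print(agglomerative_single_linkage(X, k=3))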
Divisive clustering is a top-down approach that begins with a single cluster containing all examples
and iteratively picks a cluster to split and separates it into smaller clusters until some number of clusters
is reached. The theoretical time complexity for divisive clustering is $O(2^n)$ for an exhaustive search
and this approach needs an additional clustering algorithm for splitting each cluster. However, the time
complexity in practice can be lower, depending on the clustering algorithm used, and it may be better
than agglomerative clustering if we only want a few levels of hierarchical clustering.
Figure 19.4: Partitioning a hierarchical clustering by cutting the tree at the desired level (shown for two clusters and for five clusters).
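As a sketch of this partitioning step (the toy data and the use of Ward linkage are assumptions for the example), SciPy can cut a linkage tree into any desired number of flat clusters:

# Sketch: cutting a dendrogram into flat partitions, as in Figure 19.4.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.random.default_rng(0).normal(size=(50, 2))   # toy data
Z = linkage(X, method='ward')

two_clusters = fcluster(Z, t=2, criterion='maxclust')    # labels for a cut into 2 clusters
five_clusters = fcluster(Z, t=5, criterion='maxclust')   # labels for a cut into 5 clusters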
Figure 19.5: Agglomerative clustering with Ward linkage, without connectivity constraints (left panel) and with connectivity constraints connecting only the 10 nearest neighbours of each example (right panel).
from sklearn.cluster import AgglomerativeClustering
from sklearn.neighbors import kneighbors_graph

# X is the data matrix of examples (rows) by features (columns)
connectivity = kneighbors_graph(X, n_neighbors=10, include_self=False)
ward = AgglomerativeClustering(n_clusters=6, connectivity=connectivity,
                               linkage='ward').fit(X)
Figure 19.6: Agglomerative clustering of the same data set with (left to right) complete, average and
Ward linkage.
In outline, divisive clustering repeats the following steps until the desired number of clusters is reached:
1. Start with a single cluster containing all examples.
2. Choose the best cluster for splitting (e.g. the largest or the one with the lowest score).
3. Split the chosen cluster with another clustering algorithm, such as k-means with k = 2.
Although the time complexity for an exhaustive search in divisive clustering is $O(2^n)$, using the
k-means algorithm reduces the complexity (although at the cost of having a more greedy divisive
clustering) and the possibility of stopping at the desired level may make this algorithm preferable to
agglomerative clustering in some cases, since agglomerative clustering must run until the complete
tree is generated.
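A hedged sketch of this idea follows, bisecting a cluster with k-means at each step; always splitting the largest cluster and the toy data are assumptions for the example, not a prescribed strategy.

# Sketch: divisive clustering by repeatedly bisecting the largest cluster with k-means.
import numpy as np
from sklearn.cluster import KMeans

def divisive_bisecting_kmeans(X, n_clusters):
    clusters = [np.arange(len(X))]                   # start with one cluster of all examples
    while len(clusters) < n_clusters:
        # choose the best cluster to split: here, simply the largest one
        largest = max(range(len(clusters)), key=lambda c: len(clusters[c]))
        idx = clusters.pop(largest)
        labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X[idx])
        clusters.append(idx[labels == 0])
        clusters.append(idx[labels == 1])
    return clusters

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))
print([len(c) for c in divisive_bisecting_kmeans(X, n_clusters=4)])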
Lines 5-7 are for loading the data and converting the image matrices into a matrix of examples (rows) and features (columns). Line 8 is for creating the connectivity matrix with the neighbours of each pixel in the 8 × 8 image grid (a 64 × 64 matrix, with one entry per pair of pixel features). Lines 9 and 10 create the agglomerative clusterer and fit the data, and the last two lines compute the reduced dataset, with only 16 features, and a reconstructed 64-feature dataset with the feature values aggregated by averaging the features in the same cluster. Figure 19.8 shows the result. Although the digits in the reduced dataset are no longer recognizable as digits, it is easy to see that the patterns differ from digit to digit, so this process reduced the number of features without losing much information.
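A minimal sketch of these steps, assuming scikit-learn's 8 × 8 digits dataset and its grid_to_graph and FeatureAgglomeration utilities, could look like this:

# Sketch: clustering the 64 pixel features of the digits images into 16 clusters.
import numpy as np
from sklearn import datasets
from sklearn.cluster import FeatureAgglomeration
from sklearn.feature_extraction.image import grid_to_graph

digits = datasets.load_digits()
images = digits.images                          # shape (n_samples, 8, 8)
X = images.reshape(len(images), -1)             # examples (rows) by 64 pixel features (columns)

connectivity = grid_to_graph(*images[0].shape)  # 64 x 64 adjacency of neighbouring pixels
agglo = FeatureAgglomeration(connectivity=connectivity, n_clusters=16).fit(X)

X_reduced = agglo.transform(X)                  # 16 features per example
X_approx = agglo.inverse_transform(X_reduced)   # back to 64 features, averaged within each cluster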
Figure 19.8: Feature clustering. The original handwritten digits features were clustered as shown in the
left panel. Using only these 16 clusters as 16 features, the reduced data set is illustrated on the right
panel.