
Unit 4 – Clustering

Predictive Analytics in Healthcare

Miguel Rodrigo
Department of Electronic Engineering, School of Engineering
Universitat de València, Avgda. Universitat s/n
46100 Burjassot (Valencia)
[email protected]

1
Clustering: definition

Unsupervised learning technique that groups samples (patients) based on the similarity of their characteristics (features). The label is never used in this analysis.
Data are represented in an N-dimensional space, where N is the number of features, and clusters (groups) of patients are identified as contiguous regions of the data space with a relatively high density of points, separated from other dense regions by areas where the density of points is relatively low.
How many clusters/groups do we have? Two? Maybe four?

2
Proximity measures

Similarity measures (correlation):
• Inner product, cosine measure, Tanimoto's measure

Dissimilarity measures (distances):
• Euclidean, Mahalanobis, Bhattacharyya

Caution: before using clustering algorithms, features should be normalized. Otherwise, distances are dominated by the features with the widest ranges (e.g. age vs. number of siblings).
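As an illustration (not from the slides), a minimal Python sketch of z-score normalization before computing Euclidean distances; the toy feature names (age, siblings) and values are assumptions:

```python
# Minimal sketch: z-score normalization before computing Euclidean distances,
# so wide-range features (age) do not dominate narrow-range ones (siblings).
import numpy as np
from scipy.spatial.distance import pdist, squareform

X = np.array([[72.0, 1.0], [35.0, 3.0], [68.0, 0.0]])  # toy data: [age, siblings]
Xz = (X - X.mean(axis=0)) / X.std(axis=0)               # z-score per feature

D_raw = squareform(pdist(X, metric="euclidean"))    # dominated by the age feature
D_norm = squareform(pdist(Xz, metric="euclidean"))  # both features contribute comparably
print(D_raw)
print(D_norm)
```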

3
K-means

The algorithm tries to minimize the distance between each pattern and the cluster it belongs to. The first step is therefore to find the cluster closest to the i-th pattern (the j-th cluster), and the second step then makes the intra-cluster distances as low as possible. It is an iterative process (randomly initialized in most cases), based on the minimization of the following cost function:
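The cost function on the original slide is an image that did not survive extraction; the standard K-means objective it refers to is:

J = \sum_{j=1}^{K} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2

where K is the number of clusters, C_j is the j-th cluster and \mu_j its centroid.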

4
Fuzzy C-Means
Based on fuzzy logic, in which a given measurement may have a degree of membership in several categories. Applied to clustering, any pattern may belong to different clusters with a fuzzy membership, thus introducing overlapping, which is a common phenomenon, in a natural way and with a robust mathematical background.
Similar to K-means, but instead of using only the distance from a given pattern to its closest cluster, all distances are taken into account, each weighted by a fuzzy membership (with fuzzifier m > 1).
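For reference (the formula on the slide is an image), the standard Fuzzy C-means objective is:

J_m = \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij}^{m} \lVert x_i - c_j \rVert^2, \qquad \sum_{j=1}^{C} u_{ij} = 1, \quad m > 1

where u_{ij} is the fuzzy membership of pattern x_i in cluster j and c_j is the cluster centre.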
Figure: K-means vs. Fuzzy C-means cluster assignments.

5
K-means and Fuzzy C-Means

Despite K-means being the most widely used clustering algorithm, it still has some drawbacks:
• The number of clusters must be known in advance (partially solved by ISODATA).
• Clusters tend to have hyper-spherical shapes, thus mixing or breaking up natural clusters with non-spherical shapes.
• All clusters are formed by a similar number of patterns, again mixing or breaking up natural clusters.

6
Spectral clustering

These methods divide a set of graphs, or the nodes of a graph, into different clusters. They are based on the eigenvalues (spectrum) of the Laplacian of the proximity (dissimilarity) matrix, which contains the distance values between each pair of points.

The stages of the algorithm are:
1. Construct a nearest-neighbours graph or radius-based graph.
2. Embed the data points in a low-dimensional space (spectral embedding), in which the clusters are more obvious, using the eigenvectors of the graph Laplacian.
3. Cluster the points in this embedding, using the eigenvectors associated with the smallest eigenvalues.
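A minimal scikit-learn sketch of these three stages (the two-moons data set and all parameter values are illustrative assumptions):

```python
# Spectral clustering on a nearest-neighbours affinity graph.
from sklearn.datasets import make_moons
from sklearn.cluster import SpectralClustering

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)  # non-convex clusters
model = SpectralClustering(
    n_clusters=2,
    affinity="nearest_neighbors",  # stage 1: build the neighbours graph
    n_neighbors=10,
    assign_labels="kmeans",        # stage 3: cluster in the spectral embedding
    random_state=0,
)
labels = model.fit_predict(X)      # stage 2 (the embedding) happens inside fit_predict
```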

7
Spectral clustering

It can deal with clusters that are not compact or do not lie within convex boundaries, where most classical clustering algorithms fail.
(ISODATA, mentioned earlier, is a variant of K-means rather than of spectral clustering.)

Figure: K-means vs. spectral clustering on the same data.

8
Hierarchical clustering

Clustering method based on an iterative process, which may be agglomerative or divisive, in which the different clusters are created using a hierarchy based on proximity/dissimilarity measures. Depending on the level of the hierarchy that is selected, a different number of clusters turns up. Its main advantage and its main drawback are, though it seems paradoxical, the same: selecting the correct hierarchy level is a challenge, but the hierarchical visualization (dendrogram) is useful in itself.

Figure: dendrogram, from a single large cluster at the top down to the smallest, separated clusters.

9
Hierarchical clustering: Agglomerative approach

Figure: example points A–G merged step by step into larger clusters (agglomerative approach).
10
Hierarchical clustering: Agglomerative approach

Initial clustering: each pattern is a cluster.

For each iteration:
1. Calculate the distances d(Cr, Cs) between all pairs of clusters produced in the last iteration. The pair of clusters with the shortest distance, (Ci, Cj), is selected to be joined into a new cluster.
2. The new clustering is the same as in the previous iteration, except that clusters Ci and Cj are replaced by a single new cluster Cq, which appears in the new clustering.

The algorithm stops when there is only one cluster for the whole data set. Sometimes the whole hierarchy is not obtained, but just a selection that seems reasonable for the problem at hand.
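A minimal sketch of this loop using SciPy (an assumption, not part of the slides); `linkage` performs the iterative merging and `fcluster` cuts the hierarchy at a chosen number of clusters:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(5, 1, (20, 2))])  # two toy groups

Z = linkage(X, method="single")                   # merge the closest pair at each iteration
labels = fcluster(Z, t=2, criterion="maxclust")   # keep the 2-cluster level of the hierarchy
dendrogram(Z)                                     # full hierarchy (needs matplotlib to draw)
```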

11
Hierarchical clustering: Agglomerative approach

Not all distances must be recalculated at each iteration, only those between the new cluster Cq and the remaining clusters; algorithms based on the Lance and Williams formula (given below) are normally used:
• Single-link algorithm
• Complete-link algorithm
• Unweighted average algorithm
• Weighted average algorithm
• Unweighted centroid algorithm
• Weighted centroid algorithm
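The update formulas for these variants were shown as images on the slide; the general Lance and Williams recurrence they instantiate is, when C_i and C_j are merged into C_q:

d(C_q, C_s) = \alpha_i\, d(C_i, C_s) + \alpha_j\, d(C_j, C_s) + \beta\, d(C_i, C_j) + \gamma\, \lvert d(C_i, C_s) - d(C_j, C_s) \rvert

For example, \alpha_i = \alpha_j = 1/2, \beta = 0, \gamma = -1/2 gives the single-link rule d(C_q, C_s) = \min\{d(C_i, C_s), d(C_j, C_s)\}, and \gamma = +1/2 gives the complete-link (maximum) rule.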

12
Cluster validity
What is a suitable number of clusters for a given distribution of samples (patients)? Usually there is no unique and definite answer. Some indices that can help find the correct number of clusters are based on compactness and isolation:
• Dunn index, where M is the number of clusters (formula below).
• Silhouette coefficient: score = (b - a) / max(a, b), where a is the average intra-cluster distance of a sample and b is its average distance to the samples of the nearest neighbouring cluster.
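The Dunn index formula on the slide is an image; its standard definition is:

D = \frac{\min_{1 \le i < j \le M} d(C_i, C_j)}{\max_{1 \le k \le M} \operatorname{diam}(C_k)}

i.e. the smallest inter-cluster distance divided by the largest cluster diameter, so higher values indicate compact, well-isolated clusters.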

13
Cluster validity

• SSE (Sum of Squared Errors) or inertia plot: the average SSE between samples and their cluster centres, computed for an increasing number of clusters. The elbow point is the point after which the SSE or inertia starts decreasing in a roughly linear fashion (see the sketch after this list).
• Hopkins statistic: used to assess the clustering tendency of a data set by measuring the probability that the data set was generated by a uniform distribution. In other words, it tests the spatial randomness of the data.
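A minimal scikit-learn sketch of the inertia/elbow plot (the data set and parameter values are illustrative assumptions):

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=4, random_state=0)
ks = range(1, 11)
sse = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_ for k in ks]

plt.plot(ks, sse, marker="o")          # the bend ("elbow") suggests a suitable K
plt.xlabel("number of clusters K")
plt.ylabel("SSE / inertia")
plt.show()
```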

14
Self Organizing Maps (SOM)

Unsupervised machine learning technique used to produce a low-dimensional (typically two-dimensional) representation of a higher-dimensional data set while preserving the topological structure of the data.
SOMs are built using shallow neural networks trained with competitive learning, in which several nodes compete for the right to respond to a subset of the input data.

Figure: the neurons, shown as the points of the black net, are trained to cover the data distribution (blue).
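A minimal sketch using the third-party MiniSom package (an assumption; any SOM implementation would do), training a 10x10 map on standardized toy data:

```python
import numpy as np
from minisom import MiniSom  # assumed third-party dependency

rng = np.random.default_rng(0)
data = rng.normal(size=(200, 5))                       # 200 samples, 5 features
data = (data - data.mean(axis=0)) / data.std(axis=0)   # normalize the features

som = MiniSom(10, 10, input_len=5, sigma=1.0, learning_rate=0.5, random_seed=0)
som.random_weights_init(data)
som.train_random(data, num_iteration=1000)             # competitive learning

row, col = som.winner(data[0])                         # 2D map position of the first sample
```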

15
Once trained, SOM maps display the information on a uniform 2D map. These maps show the same 2D layout for all the features used in the clustering, and each sample (patient) always occupies the same position on the 2D map.

Figure: average feature value per neuron (in colours) and number of samples (patients) per neuron.

16
Other clustering methods: Density-based

Data points lying in the low-density regions that separate two clusters are considered noise. The surroundings of a given object within a radius ε are known as the ε-neighbourhood of the object. If the ε-neighbourhood of the object contains at least a minimum number of objects, it is called a core object.

• DBSCAN (Density-Based Spatial Clustering of Applications with Noise): relies on this density-based notion of cluster and identifies clusters of arbitrary shape in a spatial database containing outliers.
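A minimal scikit-learn sketch (the eps and min_samples values are illustrative assumptions); samples labelled -1 are treated as noise:

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.08, random_state=0)
labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)        # eps-neighbourhood, core objects
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)    # -1 marks noise points
```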

17
Other clustering methods: Density-based

• OPTICS (Ordering Points To Identify the Clustering Structure): addresses the problem of varying data density, which DBSCAN cannot handle.
• DENCLUE (DENsity-based CLUstEring): uses grid cells, but only keeps information about the grid cells that actually contain data points and manages these cells in a tree-based access structure. It is good for data sets with a large amount of noise.

Figure: reachability plot (a special kind of dendrogram).

18
Other clustering methods: Manifold learning

Class of unsupervised estimators that seeks to describe data sets as low-dimensional manifolds embedded in high-dimensional spaces (e.g. a curled-up piece of paper).

• Multidimensional scaling (MDS): a set of related ordination techniques used in information visualization, in particular to display the information contained in a distance matrix. It is a form of non-linear dimensionality reduction.
• Locally linear embedding (LLE): reduces the number of dimensions while trying to preserve the geometric features of the original non-linear feature structure.
• Isometric mapping (Isomap): combines MDS with geodesic distances to reduce the dimensionality of data sampled from a smooth manifold.
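A minimal scikit-learn sketch of the three methods listed above, each reducing a toy 3D manifold to 2 dimensions (the data set and parameter values are illustrative assumptions):

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import MDS, LocallyLinearEmbedding, Isomap

X, _ = make_swiss_roll(n_samples=500, random_state=0)          # 3D "rolled-up sheet"
X_mds = MDS(n_components=2, random_state=0).fit_transform(X)
X_lle = LocallyLinearEmbedding(n_components=2, n_neighbors=10).fit_transform(X)
X_iso = Isomap(n_components=2, n_neighbors=10).fit_transform(X)
```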

19
Clustering with categorical features

In many clinical problems, many of the input features for clustering are categorical variables. This limits the clustering performance, as points (patients) are not well distributed spatially.

Tips:
• Remember to apply some kind of data normalization when comparing categorical (Boolean) with continuous features (e.g. age vs. sex).
• Consider running the clustering only on the continuous features if possible.
• Some clustering methods / metrics deal better with binary features: hierarchical clustering, DBSCAN, etc.

20
Clustering methods for supervised learning

Unsupervised learning does not allow training models for classification, regression or prediction, as clustering algorithms are not designed to 'fit' a desired response.
However, clustering techniques are very useful in supervised learning problems, as they can be used for several tasks:
• Finding relevant subpopulations in our data: inherent clinical groups that are relevant for our problem, subpopulations with different responses to the label, etc. Training different classification/regression models for each subpopulation can be beneficial.
• Helping in feature selection: the features that best allow patient clustering may be better for the problem. This can be checked through clustering metrics (silhouette, etc.) or by directly comparing the cluster distribution with the label (contingency tables, etc.; see the sketch below).
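A minimal sketch of the contingency-table check (the breast-cancer data set and 2-cluster K-means are illustrative assumptions):

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(pd.crosstab(clusters, y, rownames=["cluster"], colnames=["label"]))  # cluster ID vs. label
```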

21
