Clustering
Georg Gerber
Lecture #6, 2/6/02
Lecture Overview
Motivation – why do clustering? Examples
from research papers
Choosing (dis)similarity measures – a critical
step in clustering
Euclidean distance
Pearson Linear Correlation
Clustering algorithms
Hierarchical agglomerative clustering
K-means clustering and quality measures
Self-organizing maps (if time)
What is clustering?
A way of grouping together data samples that
are similar in some way - according to some
criteria that you pick
A form of unsupervised learning – you
generally don’t have examples demonstrating
how the data should be grouped together
So, it’s a method of data exploration – a
way of looking for patterns or structure in the
data that are of interest
Why cluster?
Cluster genes = rows
Measure expression at multiple time-points,
different conditions, etc.
Similar expression patterns may suggest similar
functions of genes (is this always true?)
Cluster samples = columns
e.g., expression levels of thousands of genes for
each tumor sample
Similar expression patterns may suggest biological
relationship among samples
Example 1: clustering genes
P. Tamayo et al., Interpreting patterns of
gene expression with self-organizing maps:
methods and application to hematopoietic
differentiation, PNAS 96: 2907-12, 1999.
Treatment of HL-60 cells (myeloid leukemia cell
line) with PMA leads to differentiation into
macrophages
Measured expression of genes at 0, 0.5, 4 and 24
hours after PMA treatment
Used SOM technique; shown are cluster averages
Clusters contain a number of known related genes involved in
macrophage differentiation
e.g., late induction cytokines, cell-cycle genes (down-regulated
since PMA induces terminal differentiation), etc.
Example 2: clustering genes
E. Furlong et al., Patterns of Gene Expression During
Drosophila Development, Science 293: 1629-33, 2001.
Use clustering to look for patterns of gene expression
change in wild-type vs. mutants
Collect data on gene expression in Drosophila wild-type
and mutants (twist and Toll) at three stages of
development
twist is critical in mesoderm and subsequent muscle
development; mutants have no mesoderm
Toll mutants over-express twist
Take ratio of mutant over wt expression levels at
corresponding stages
Find general trends in the data – e.g., a group of genes with high
expression in twist mutants and not elevated in Toll mutants
contains many known neuroectodermal genes (presumably
over-expression of twist suppresses ectoderm)
Example 3: clustering samples
A. Alizadeh et al., Distinct types of diffuse large B-cell
lymphoma identified by gene expression profiling,
Nature 403: 503-11, 2000.
Response to treatment of patients w/ diffuse large B-
cell lymphoma (DLBCL) is heterogeneous
Try to use expression data to discover finer
distinctions among tumor types
Collected gene expression data for 42 DLBCL tumor
samples + normal B-cells in various stages of
differentiation + various controls
Found some tumor samples have expression more similar to
germinal center B-cells and others to peripheral blood activated
B-cells
Patients with “germinal center type” DLBCL generally had higher
five-year survival rates
Lecture Overview
Motivation – why do clustering? Examples
from research papers
Choosing (dis)similarity measures – a
critical step in clustering
Euclidean distance
Pearson Linear Correlation
Clustering algorithms
Hierarchical agglomerative clustering
K-means clustering and quality measures
Self-Organizing Maps (if time)
How do we define “similarity”?
Recall that the goal is to group together
“similar” data – but what does this mean?
No single answer – it depends on what we
want to find or emphasize in the data; this is
one reason why clustering is an “art”
The similarity measure is often more
important than the clustering algorithm used
– don’t overlook this choice!
(Dis)similarity measures
Instead of talking about similarity measures,
we often equivalently refer to dissimilarity
measures (I’ll give an example of how to
convert between them in a few slides…)
Jagota defines a dissimilarity measure as a
function f(x,y) such that f(x,y) > f(w,z) if and
only if x is less similar to y than w is to z
This is always a pair-wise measure
Think of x, y, w, and z as gene expression
profiles (rows or columns)
Euclidean distance

d_euc(x, y) = √( Σ_{i=1}^{n} (x_i − y_i)² )

The Pearson linear correlation ρ (discussed on the next slides) is

ρ(x, y) = [ Σ_{i=1}^{n} (x_i − x̄)(y_i − ȳ) ] / √( Σ_{i=1}^{n} (x_i − x̄)² · Σ_{i=1}^{n} (y_i − ȳ)² )

where x̄ = (1/n) Σ_{i=1}^{n} x_i and ȳ = (1/n) Σ_{i=1}^{n} y_i

For ρ, we’re shifting the expression profiles down (subtracting the
means) and scaling by the standard deviations (i.e., making the
data have mean = 0 and std = 1)
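As a concrete illustration, the Euclidean distance between two expression profiles can be computed directly from its definition (a minimal NumPy sketch; the two four-time-point profiles are made-up values, not data from the lecture):

```python
import numpy as np

def euclidean_distance(x, y):
    """d_euc(x, y) = sqrt( sum_i (x_i - y_i)^2 )."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.sqrt(np.sum((x - y) ** 2))

# Two hypothetical expression profiles measured at 4 time points
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 2.0, 4.0, 4.0]
print(euclidean_distance(x, y))  # sqrt(1 + 0 + 1 + 0) = sqrt(2) ~ 1.414
```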
Pearson Linear Correlation
Pearson linear correlation (PLC) is a measure that is
invariant to scaling and shifting (vertically) of the
expression values
Always between –1 and +1 (perfectly anti-correlated
and perfectly correlated)
This is a similarity measure, but we can easily make
it into a dissimilarity measure:
d_p(x, y) = (1 − ρ(x, y)) / 2
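This conversion, and the invariance properties claimed above, can be checked numerically (a sketch; `pearson` is a helper implementing ρ directly from its definition, and the profile values are made up):

```python
import numpy as np

def pearson(x, y):
    """Pearson linear correlation rho(x, y)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return np.sum(xc * yc) / np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2))

def d_p(x, y):
    """Dissimilarity derived from PLC: (1 - rho) / 2, always in [0, 1]."""
    return (1.0 - pearson(x, y)) / 2.0

x = np.array([1.0, 2.0, 3.0, 4.0])
print(d_p(x, x))            # perfectly correlated -> 0.0
print(d_p(x, -x))           # perfectly anti-correlated -> 1.0
print(d_p(x, 3.0 * x + 5))  # invariant to scaling and shifting -> 0.0
```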
PLC (cont.)
PLC only measures the degree of a linear relationship
between two expression profiles!
If you want to measure other relationships, there are
many other possible measures (see Jagota book and
project #3 for more examples)
ρ = 0.0249, so d_p = 0.4876
The green curve is the
square of the blue curve –
this relationship is not
captured with PLC
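The same effect is easy to reproduce numerically: a profile and its square are deterministically related, yet on a symmetric range their PLC is essentially zero (a sketch with made-up values; `pearson` is a helper implementing ρ from its definition):

```python
import numpy as np

def pearson(x, y):
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return np.sum(xc * yc) / np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2))

t = np.linspace(-1.0, 1.0, 101)  # symmetric "time" axis
blue = t                          # a linear profile
green = t ** 2                    # its square: perfectly related, but nonlinearly
print(pearson(blue, green))       # ~0: PLC misses the quadratic relationship
```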
More correlation examples
K-means Clustering
Pick k initial cluster centers, e.g., k randomly chosen data
points
Or could randomly assign points to clusters and take
means of clusters
For each data point, compute the cluster center it is
closest to (using some distance measure) and assign the
data point to this cluster
Re-compute cluster centers (mean of data points in
cluster)
Stop when there are no new re-assignments
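The steps above can be sketched directly (a minimal NumPy implementation under the stated algorithm, using Euclidean distance and the random-data-points initialization; the two-blob test data are made up):

```python
import numpy as np

def kmeans(data, k, max_iter=100, seed=0):
    """Basic K-means: data has shape (n_points, n_features)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    # Initialize centers with k randomly chosen data points
    centers = data[rng.choice(len(data), size=k, replace=False)]
    assignments = None
    for _ in range(max_iter):
        # Assign each point to its closest center (Euclidean distance)
        dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        new = dists.argmin(axis=1)
        if assignments is not None and np.array_equal(new, assignments):
            break  # no re-assignments -> stop
        assignments = new
        # Re-compute each center as the mean of its assigned points
        for j in range(k):
            if np.any(assignments == j):
                centers[j] = data[assignments == j].mean(axis=0)
    return centers, assignments

# Two well-separated blobs of 5 points each
data = np.vstack([np.zeros((5, 2)), np.full((5, 2), 10.0)])
centers, labels = kmeans(data, k=2)
```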
K-means Clustering (cont.)
The algorithm tries to minimize the sum of squared distances of points
to their cluster centers:

E = Σ_{j=1}^{k} Σ_{x ∈ C_j} d(x, μ_j)²   (μ_j = center of cluster j)
Cluster Quality (cont.)
The Q measure given in Jagota takes into account
homogeneity within clusters, but not separation
between clusters
Other measures try to combine these two
characteristics (e.g., the Davies-Bouldin measure)
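To make the homogeneity-vs-separation distinction concrete, both quantities can be computed for a given clustering (a sketch using generic definitions — mean within-cluster distance to the center for homogeneity, minimum between-center distance for separation — not necessarily the exact formulas in Jagota; the data are made up):

```python
import numpy as np

def homogeneity(data, labels, centers):
    """Mean distance from each point to its own cluster center (lower = tighter)."""
    data, centers = np.asarray(data, float), np.asarray(centers, float)
    return float(np.mean(np.linalg.norm(data - centers[labels], axis=1)))

def separation(centers):
    """Smallest distance between two distinct centers (higher = better separated)."""
    centers = np.asarray(centers, float)
    d = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    return float(d[np.triu_indices(len(centers), k=1)].min())

# Hypothetical 2-cluster result: two tight pairs of points
data = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
labels = np.array([0, 0, 1, 1])
centers = np.array([[0.0, 0.5], [10.0, 10.5]])
print(homogeneity(data, labels, centers))  # 0.5
print(separation(centers))                 # sqrt(200) ~ 14.14
```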
An alternate approach is to look at cluster stability:
Add random noise to the data many times and