K-Means and Kohonen Maps Unsupervised Clustering Techniques: Steve Hookway 4/8/04

K-means clustering is an unsupervised machine learning algorithm that groups unlabeled data points into a specified number of clusters (k) based on their similarity. It works by randomly selecting k data points as initial cluster centers, then assigning each remaining point to the nearest center, and recalculating the centers as the means of each cluster's points. This process repeats until the centers stabilize or a maximum number of iterations is reached. Choosing the right number of clusters k can be difficult, and the results are affected by the random initial center selection, so multiple runs may be needed. Cluster quality is assessed based on intra-cluster similarity versus inter-cluster distance.

K-means and Kohonen Maps

Unsupervised Clustering Techniques

Steve Hookway
4/8/04
What is a DNA Microarray?
 An experiment on the order of 10k
elements
 A way to explore the function of a gene
 A snapshot of the expression level of an
entire phenotype under given test
conditions
Some Microarray Terminology
• Probe: ssDNA printed on the solid substrate (nylon or glass). These are the genes we are going to be testing
• Target: cDNA which has been labeled and is to be washed over the probe
Microarray Fabrication
• Deposition of DNA fragments
  • Deposition of PCR-amplified cDNA clones
  • Printing of already synthesized oligonucleotides
• In Situ synthesis
  • Photolithography
  • Ink Jet Printing
  • Electrochemical Synthesis

From “Data Analysis Tools for DNA Microarrays” by Sorin Draghici

cDNA Microarrays and Oligonucleotide Probes

cDNA Arrays               Oligonucleotide Arrays
Long sequences            Short sequences
Spot unknown sequences    Spot known sequences
More variability          More reliable data

From “Data Analysis Tools for DNA Microarrays” by Sorin Draghici

In Situ Synthesis
 Photochemically synthesized on the chip
 Reduces noise caused by PCR, cloning,
and Spotting
 As previously mentioned, three kinds of
In Situ Synthesis
 Photolithography
 Ink Jet Printing
 Electrochemical Synthesis

From “Data Analysis Tools for DNA Microarrays” by Sorin Draghici

Photolithography
• Similar to the process used to build VLSI circuits
• Photolithographic masks are used to add each base
• If a base is present, there will be a hole in the corresponding mask
• Can create high-density arrays, but sequence length is limited
[Figure: photodeprotection through a mask]

From “Data Analysis Tools for DNA Microarrays” by Sorin Draghici

Ink Jet Printing
• Four cartridges are loaded with the four nucleotides: A, G, C, T
• As the printer head moves across the array, the nucleotides are deposited where they are needed

From “Data Analysis Tools for DNA Microarrays” by Sorin Draghici

Electrochemical Synthesis
 Electrodes are embedded in the substrate to
manage individual reaction sites
 Electrodes are activated in necessary
positions in a predetermined sequence that
allows the sequences to be constructed base
by base
 Solutions containing specific bases are
washed over the substrate while the
electrodes are activated
From “Data Analysis Tools for DNA Microarrays” by Sorin Draghici
http://www.bio.davidson.edu/courses/genomics/chip/chip.html
Application of Microarrays
• We only know the function of about 20% of the 30,000 genes in the human genome
• Gene exploration
• Faster and better
• Can be used for DNA computing
http://www.gene-chips.com/sample1.html
From “Data Analysis Tools for DNA Microarrays” by Sorin Draghici
A Data Mining Problem
 On a given Microarray we test on the
order of 10k elements at a time
 Data is obtained faster than it can be
processed
 We need some ways to work through
this large data set and make sense of
the data
Grouping and Reduction
 Grouping: discovers patterns in the data
from a microarray
 Reduction: reduces the complexity of
data by removing redundant probes
(genes) that will be used in subsequent
assays
Unsupervised Grouping: Clustering
• Pattern discovery via grouping similarly expressed genes together
• Three techniques most often used
  • k-Means Clustering
  • Hierarchical Clustering
  • Kohonen Self Organizing Feature Maps
Clustering Limitations
• Any data can be clustered, therefore we must be careful what conclusions we draw from our results
• Clustering that starts from a random initialization (as k-means and SOFM do) is non-deterministic and can produce different results on different runs
K-means Clustering
• Given a set of n data points in d-dimensional space and an integer k
• We want to find the set of k points (centers) in d-dimensional space that minimizes the mean squared distance from each data point to its nearest center
• No exact polynomial-time algorithms are known for this problem
“A Local Search Approximation Algorithm for k-Means Clustering” by Kanungo et al.
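The quantity being minimized can be written out directly. The following is a minimal sketch, assuming points and centers are plain tuples of numbers; the function name and the sample data are illustrative and do not come from the slides.

```python
def mean_squared_distance(points, centers):
    """Mean, over all points, of the squared Euclidean distance to the nearest center."""
    total = 0.0
    for p in points:
        # Squared distance from p to each candidate center; keep the smallest.
        total += min(sum((pi - ci) ** 2 for pi, ci in zip(p, c)) for c in centers)
    return total / len(points)


# Hypothetical data: two tight pairs of points, ten units apart.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good = mean_squared_distance(points, [(0.0, 1.0), (10.0, 1.0)])  # one center per pair
poor = mean_squared_distance(points, [(5.0, 0.0), (5.0, 2.0)])   # centers between the pairs
print(good, poor)  # 1.0 25.0
```

A lower value means the centers describe the data better; k-means searches for centers that make this value small.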
K-means Algorithm (Lloyd’s Algorithm)
• Has been shown to converge to a locally optimal solution
• But can converge to a solution arbitrarily bad compared to the optimal solution
[Figure: data points with optimal centers and heuristic centers, K=3]

• “K-means-type algorithms: A generalized convergence theorem and characterization of local optimality” by Selim and Ismail
• “A Local Search Approximation Algorithm for k-Means Clustering” by Kanungo et al.
Euclidean Distance

$$d_E(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

Now to find the distance between two points, say the origin and the point (3,4):

$$d_E(O, A) = \sqrt{3^2 + 4^2} = 5$$

Simple and fast! Remember this when we consider the complexity!
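As a small illustration of the formula, here is a sketch in Python; the function name is an assumption made for this example.

```python
import math


def euclidean_distance(x, y):
    """Euclidean distance between two points with the same number of coordinates."""
    if len(x) != len(y):
        raise ValueError("points must have the same dimension")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


print(euclidean_distance((0, 0), (3, 4)))  # 5.0
```

The cost is one subtraction, one multiplication and one addition per coordinate, which is why the distance computation itself is cheap.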
Finding a Centroid
We use the following equation to find the n-dimensional centroid point amid k n-dimensional points, where x_{i,j} is the j-th coordinate of point x_i:

$$CP(x_1, x_2, \ldots, x_k) = \left( \frac{\sum_{i=1}^{k} x_{i,1}}{k}, \frac{\sum_{i=1}^{k} x_{i,2}}{k}, \ldots, \frac{\sum_{i=1}^{k} x_{i,n}}{k} \right)$$

Let’s find the centroid of three 2D points, say (2,4), (5,2) and (8,9):

$$CP = \left( \frac{2+5+8}{3}, \frac{4+2+9}{3} \right) = (5, 5)$$
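The centroid is a coordinate-wise average, which can be sketched as follows; the function name is chosen for this example only.

```python
def centroid(points):
    """Coordinate-wise mean of k points that all have n coordinates."""
    k = len(points)
    n = len(points[0])
    return tuple(sum(p[j] for p in points) / k for j in range(n))


print(centroid([(2, 4), (5, 2), (8, 9)]))  # (5.0, 5.0)
```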
K-means Algorithm
1. Choose k initial center points randomly
2. Cluster data using Euclidean distance (or other distance metric)
3. Calculate new center points for each cluster using only points within the cluster
4. Re-cluster all data using the new center points
   • This step could cause data points to be placed in a different cluster
5. Repeat steps 3 & 4 until the center points have moved such that in step 4 no data points are moved from one cluster to another, or some other convergence criterion is met

From “Data Analysis Tools for DNA Microarrays” by Sorin Draghici
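The five steps can be sketched in plain Python. This is a minimal illustration rather than a reference implementation. It assumes the initial centers are drawn from the data points themselves, uses Euclidean distance, and leaves a center unchanged if its cluster becomes empty. The function name, the `max_iter` and `seed` parameters and the sample data are all assumptions made for the example.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd-style k-means; returns (centers, assignment) with one cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # step 1 (assumption: centers picked among the data)
    assignment = None
    for _ in range(max_iter):
        # Steps 2 and 4: assign every point to its nearest center.
        new_assignment = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        if new_assignment == assignment:  # step 5: no point changed cluster
            break
        assignment = new_assignment
        # Step 3: move each center to the mean of its current members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # an empty cluster keeps its old center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, assignment


# Hypothetical data: two well-separated groups of three points each.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centers, labels = kmeans(points, k=2)
print(centers)
print(labels)
```

Which group receives index 0 depends on the random initialization, so only the grouping itself is meaningful, not the label values.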

An example with k=2
1. We pick k=2 centers at random
2. We cluster our data around these center points
Figure reproduced from “Data Analysis Tools for DNA Microarrays” by Sorin Draghici
K-means example with k=2
3. We recalculate centers based on our current clusters
Figure reproduced from “Data Analysis Tools for DNA Microarrays” by Sorin Draghici
K-means example with k=2
4. We re-cluster our data around our new center points
Figure reproduced from “Data Analysis Tools for DNA Microarrays” by Sorin Draghici
K-means example with k=2
5. We repeat the last two steps until no more data points are moved into a different cluster
Figure reproduced from “Data Analysis Tools for DNA Microarrays” by Sorin Draghici
Choosing k
 Use another clustering method
 Run algorithm on data with several
different values of k
 Use advance knowledge about the
characteristics of your test
 Cancerous vs Non-Cancerous
Cluster Quality
• Since any data can be clustered, how do we know our clusters are meaningful?
  • The size (diameter) of the cluster vs. the inter-cluster distance
  • Distance between the members of a cluster and the cluster’s center
  • Diameter of the smallest sphere that contains all members of the cluster

From “Data Analysis Tools for DNA Microarrays” by Sorin Draghici

Cluster Quality Continued

Quality of a cluster is assessed by the ratio of the distance to the nearest cluster and the cluster diameter
[Figure: clusters labeled size=5 with distance=5, and size=5 with distance=20]
Figure reproduced from “Data Analysis Tools for DNA Microarrays” by Sorin Draghici
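The ratio can be computed from the points themselves. The sketch below makes two assumptions that the slides do not spell out: the diameter is taken as the largest distance between two members of the cluster, and the distance to another cluster is taken as the smallest distance between a member of one and a member of the other. The function names and coordinates are made up to reproduce the numbers in the figure (size 5, distances 5 and 20).

```python
import math
from itertools import combinations


def diameter(cluster):
    """Largest pairwise distance inside a cluster (needs at least two points)."""
    return max(math.dist(p, q) for p, q in combinations(cluster, 2))


def gap(cluster, other):
    """Smallest distance between a point of one cluster and a point of the other."""
    return min(math.dist(p, q) for p in cluster for q in other)


def quality_ratio(cluster, nearest):
    """Distance to the nearest cluster divided by the cluster's own diameter."""
    return gap(cluster, nearest) / diameter(cluster)


a = [(0, 0), (3, 4)]        # diameter 5
near = [(8, 4), (11, 8)]    # 5 away from a
far = [(23, 4), (26, 8)]    # 20 away from a
print(quality_ratio(a, near))  # 1.0
print(quality_ratio(a, far))   # 4.0
```

With the same diameter, the cluster whose nearest neighbor is farther away gets the higher ratio, matching the comparison drawn in the figure.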
Cluster Quality Continued
Quality can be assessed simply by looking at the diameter of a cluster

A cluster can be formed even when there is no similarity between clustered patterns. This occurs because the algorithm forces k clusters to be created.
From “Data Analysis Tools for DNA Microarrays” by Sorin Draghici
Characteristics of k-means Clustering
• The random selection of initial center points creates the following properties
  • Non-determinism
  • May produce clusters without patterns
    • One solution is to choose the centers randomly from existing patterns

From “Data Analysis Tools for DNA Microarrays” by Sorin Draghici

Algorithm Complexity
 Linear in the number of data points, N
 Can be shown to have time of cN
 c does not depend on N, but rather the
number of clusters, k
 Low computational complexity
 High speed

From “Data Analysis Tools for DNA Microarrays” by Sorin Draghici

The Need for a New Algorithm
• Each data point is assigned to the correct cluster
• Data points that seem to be far away from each other in the heuristic are in reality very closely related to each other
Figure reproduced from “Data Analysis Tools for DNA Microarrays” by Sorin Draghici
The Need for a New Algorithm

Eisen et al., 1998

Kohonen Self Organizing Feature Maps (SOFM)
 Creates a map in which similar patterns are
plotted next to each other
 Data visualization technique that reduces n
dimensions and displays similarities
 More complex than k-means or hierarchical
clustering, but more meaningful
 Neural Network Technique
 Inspired by the brain
From “Data Analysis Tools for DNA Microarrays” by Sorin Draghici
SOFM Description
• Each unit of the SOFM has a weighted connection to all inputs
• As the algorithm progresses, neighboring units are grouped by similarity
[Figure: input layer connected to the output layer]
From “Data Analysis Tools for DNA Microarrays” by Sorin Draghici
SOFM Algorithm
Initialize Map
For t from 0 to 1                 // t tracks progress; the learning factor shrinks as t grows
    Randomly select a sample
    Get best matching unit
    Scale neighbors
    Increase t a small amount     // decrease learning factor
End for

From: http://davis.wpi.edu/~matt/courses/soms/
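The loop above can be sketched in Python. This is an illustration under several assumptions, not the implementation from the cited tutorial. The map is a rows × cols grid of weight vectors initialized with random values, the learning factor is taken as 1 − t, and the neighborhood is a Gaussian over grid distance whose width also shrinks with t. The names `train_som`, `steps`, `seed`, `rate` and `sigma` are all made up for this example.

```python
import math
import random


def train_som(samples, rows, cols, steps=1000, seed=0):
    """Train a small self-organizing map; returns weights[row][col] as lists of floats."""
    rng = random.Random(seed)
    dim = len(samples[0])
    # Initialize map: every unit gets a random weight vector.
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    for step in range(steps):
        t = step / steps                            # progress, from 0 toward 1
        rate = 1.0 - t                              # assumed learning factor
        sigma = 1.0 + (max(rows, cols) / 2.0) * (1.0 - t)  # assumed neighborhood width
        sample = rng.choice(samples)                # randomly select a sample
        # Get best matching unit: the unit whose weights are closest to the sample.
        br, bc = min(((r, c) for r in range(rows) for c in range(cols)),
                     key=lambda rc: math.dist(weights[rc[0]][rc[1]], sample))
        # Scale neighbors: units near the best match move most toward the sample.
        for r in range(rows):
            for c in range(cols):
                d2 = (r - br) ** 2 + (c - bc) ** 2
                influence = rate * math.exp(-d2 / (2.0 * sigma ** 2))
                weights[r][c] = [w * (1.0 - influence) + s * influence
                                 for w, s in zip(weights[r][c], sample)]
    return weights


# Hypothetical input: three colors as (red, green, blue) triples in [0, 1].
colors = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
som = train_som(colors, rows=4, cols=4, steps=200, seed=1)
print(len(som), len(som[0]), len(som[0][0]))  # 4 4 3
```

Each update is a weighted blend of the old weight and the sample, so weights that start in the range of the data stay in that range.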
An Example Using Color
Three-dimensional data: red, blue, green

Will be converted into a 2D image map, with Dark Blue and Greys clustered together and Yellow close to both the Red and the Green

From http://davis.wpi.edu/~matt/courses/soms/
An Example Using Color

Each color in the map is associated with a weight

From http://davis.wpi.edu/~matt/courses/soms/
An Example Using Color
1. Initialize the weights

[Figure: three initializations, labeled Random Values, Colors in the Corners, and Equidistant]

From http://davis.wpi.edu/~matt/courses/soms/
An Example Using Color Continued
2. Get best matching unit

After randomly selecting a sample, go through all weight vectors and calculate the best match (in this case using Euclidean distance)
Think of colors as 3D points, with each component (red, green, blue) on an axis

From http://davis.wpi.edu/~matt/courses/soms/
An Example Using Color Continued
2. Getting the best matching unit continued…
For example, let’s say we chose green as the sample. Then it can be shown that light green is closer to green than red is:
Green: (0,6,0)  Light Green: (3,6,3)  Red: (6,0,0)

$$\text{LightGreen} = \sqrt{3^2 + 0^2 + 3^2} = 4.24$$

$$\text{Red} = \sqrt{6^2 + (-6)^2 + 0^2} = 8.49$$

This step is repeated for the entire map, and the weight with the shortest distance is chosen as the best match

From http://davis.wpi.edu/~matt/courses/soms/
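The two distances in this example can be checked with a few lines of Python. The dictionary of named weights and the function name are assumptions made for illustration; a real map would hold one weight vector per grid unit.

```python
import math


def best_matching_unit(weights, sample):
    """Return the name of the weight vector closest to the sample."""
    return min(weights, key=lambda name: math.dist(weights[name], sample))


weights = {"light green": (3, 6, 3), "red": (6, 0, 0)}
green = (0, 6, 0)

light_green_distance = round(math.dist(weights["light green"], green), 2)
red_distance = round(math.dist(weights["red"], green), 2)
print(light_green_distance, red_distance)   # 4.24 8.49
print(best_matching_unit(weights, green))   # light green
```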
An Example Using Color Continued
3. Scale neighbors
   1. Determine which weights are considered neighbors
   2. Determine how much each weight can become more like the sample vector

1. Determine which weights are considered neighbors

In the example, a Gaussian function is used, where every point with a value above 0 is considered a neighbor:

$$f(x, y) = e^{-6.66666667\,(x^2 + y^2)}$$

From http://davis.wpi.edu/~matt/courses/soms/
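A short sketch of this neighborhood function follows. It assumes x and y are offsets from the best matching unit. Since a Gaussian is never exactly 0 in exact arithmetic, the sketch also assumes a small cutoff below which a unit is no longer treated as a neighbor. The names `neighborhood`, `is_neighbor`, `falloff` and `threshold`, and the threshold value, are illustrative and not taken from the source.

```python
import math


def neighborhood(x, y, falloff=6.66666667):
    """Gaussian influence of the best matching unit at offset (x, y) from it."""
    return math.exp(-falloff * (x * x + y * y))


def is_neighbor(x, y, threshold=1e-3):
    """Treat a unit as a neighbor when its influence exceeds an assumed cutoff."""
    return neighborhood(x, y) > threshold


print(neighborhood(0.0, 0.0))   # 1.0
print(is_neighbor(0.1, 0.1))    # True
print(is_neighbor(2.0, 2.0))    # False
```

The influence is 1 at the best matching unit itself and falls off quickly with distance, so only nearby units are changed noticeably.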
An Example Using Color Continued
2. How much each weight can become more like the sample

When the weight with the smallest distance is chosen and the neighbors are determined, it and its neighbors ‘learn’ by changing to become more like the sample. The farther away a neighbor is, the less it learns

From http://davis.wpi.edu/~matt/courses/soms/
An Example Using Color Continued

NewColorValue = CurrentColor*(1-t)+sampleVector*t
For the first iteration t=1, since t can range from 0 to 1. For the following iterations the value of t used in this formula decreases, because there are fewer values left in the range (as t increases in the for loop)

From http://davis.wpi.edu/~matt/courses/soms/
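The update formula is a linear blend between the current weight and the sample. A minimal sketch, with the function name `blend` chosen for this example and the colors taken from the earlier slides:

```python
def blend(current, sample, t):
    """NewColorValue = CurrentColor*(1-t) + sampleVector*t, applied per component."""
    return tuple(c * (1.0 - t) + s * t for c, s in zip(current, sample))


red = (6.0, 0.0, 0.0)
green = (0.0, 6.0, 0.0)
print(blend(red, green, 1.0))  # (0.0, 6.0, 0.0): the weight becomes the sample
print(blend(red, green, 0.5))  # (3.0, 3.0, 0.0): halfway between the two
print(blend(red, green, 0.0))  # (6.0, 0.0, 0.0): the weight is unchanged
```

A value of t near 1 lets a weight jump almost all the way to the sample, while a value near 0 leaves it almost untouched.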
Conclusion of Example

Samples continue to be chosen at random until t becomes 1 (learning stops)
At the conclusion of the algorithm, we have a nicely clustered data set. Also note that we have achieved our goal: similar colors are grouped closely together

From http://davis.wpi.edu/~matt/courses/soms/
SOFM Applied to Genetics
• Consider clustering 10,000 genes
• Each gene was measured in 4 experiments
  • Input vectors are 4-dimensional
  • Initial set of 10,000 patterns, each described by a 4D vector
• Each of the 10,000 genes is chosen one at a time to train the SOM

From “Data Analysis Tools for DNA Microarrays” by Sorin Draghici

SOFM Applied to Genetics
 The pattern found to be closest to the current
gene (determined by weight vectors) is
selected as the winner
 The weight is then modified to become more
similar to the current gene based on the
learning rate (t in the previous example)
 The winner then pulls its neighbors closer to
the current gene by causing a lesser change
in weight
From “Data Analysis Tools for DNA Microarrays” by Sorin Draghici
SOFM Applied to Genetics
 This process continues for all 10,000
genes
 Process is repeated until over time the
learning rate is reduced to zero

From “Data Analysis Tools for DNA Microarrays” by Sorin Draghici

Our Favorite Example With Yeast
• Reduce data set to 828 genes
• Clustered data into 30 clusters using a SOFM
• Each pattern is represented by its average (centroid) pattern
• Clustered data has same behavior
• Neighbors exhibit similar behavior
“Interpreting patterns of gene expression with self-organizing maps: Methods and application to hematopoietic differentiation” by Tamayo et al.
A SOFM Example With Yeast

“Interpreting patterns of gene expression with self-organizing maps: Methods and application to hematopoietic differentiation” by Tamayo et al.
Benefits of SOFM
• SOFM contains the set of features extracted from the input patterns (reduces dimensions)
• SOFM yields a set of clusters
• A gene will always be more similar to a gene in its immediate neighborhood than to a gene further away

From “Data Analysis Tools for DNA Microarrays” by Sorin Draghici

Conclusion
• K-means is a simple yet effective algorithm for clustering data
• Self-organizing feature maps are slightly more computationally expensive, but they solve the problem of spatial relationship

“Interpreting patterns of gene expression with self-organizing maps: Methods and application to hematopoietic differentiation” by Tamayo et al.
References
• “Basic microarray analysis: grouping and feature reduction” by Soumya Raychaudhuri, Patrick D. Sutphin, Jeffery T. Chang and Russ B. Altman; Trends in Biotechnology Vol. 19 No. 5, May 2001
• “Self Organizing Maps” by Tom Germano; http://davis.wpi.edu/~matt/courses/soms
• “Data Analysis Tools for DNA Microarrays” by Sorin Draghici; Chapman & Hall/CRC 2003
• “Self-Organizing-Feature-Maps versus Statistical Clustering Methods: A Benchmark” by A. Ultsch and C. Vetter; FG Neuroinformatik & Künstliche Intelligenz Research Report 0994
• “Interpreting patterns of gene expression with self-organizing maps: Methods and application to hematopoietic differentiation” by Tamayo et al.
• “A Local Search Approximation Algorithm for k-Means Clustering” by Kanungo et al.
• “K-means-type algorithms: A generalized convergence theorem and characterization of local optimality” by Selim and Ismail
