<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>2.4. Biclustering — scikit-learn 0.16.1 documentation</title>
<!-- htmltitle is before nature.css - we use this hack to load bootstrap first -->
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<link rel="stylesheet" href="../_static/css/bootstrap.min.css" media="screen" />
<link rel="stylesheet" href="../_static/css/bootstrap-responsive.css"/>
<link rel="stylesheet" href="../_static/nature.css" type="text/css" />
<link rel="stylesheet" href="../_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="../_static/gallery.css" type="text/css" />
<script type="text/javascript">
var DOCUMENTATION_OPTIONS = {
URL_ROOT: '../',
VERSION: '0.16.1',
COLLAPSE_INDEX: false,
FILE_SUFFIX: '.html',
HAS_SOURCE: true
};
</script>
<script type="text/javascript" src="../_static/jquery.js"></script>
<script type="text/javascript" src="../_static/underscore.js"></script>
<script type="text/javascript" src="../_static/doctools.js"></script>
<script type="text/javascript" src="../_static/js/copybutton.js"></script>
<link rel="shortcut icon" href="../_static/favicon.ico"/>
<link rel="author" title="About these documents" href="../about.html" />
<link rel="top" title="scikit-learn 0.16.1 documentation" href="../index.html" />
<link rel="up" title="2. Unsupervised learning" href="../unsupervised_learning.html" />
<link rel="next" title="2.5. Decomposing signals in components (matrix factorization problems)" href="decomposition.html" />
<link rel="prev" title="2.3. Clustering" href="clustering.html" />
<script type="text/javascript" src="../_static/sidebar.js"></script>
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<script src="../_static/js/bootstrap.min.js" type="text/javascript"></script>
<link rel="canonical" href="https://scikit-learn.org/stable/modules/biclustering.html" />
<script type="text/javascript">
$("div.buttonNext, div.buttonPrevious").hover(
function () {
$(this).css('background-color', '#FF9C34');
},
function () {
$(this).css('background-color', '#A7D6E2');
}
);
var bodywrapper = $('.bodywrapper');
var sidebarbutton = $('#sidebarbutton');
sidebarbutton.css({'height': '900px'});
</script>
</head>
<body>
<div class="header-wrapper">
<div class="header">
<p class="logo"><a href="../index.html">
<img src="../_static/scikit-learn-logo-small.png" alt="Logo"/>
</a>
</p><div class="navbar">
<ul>
<li><a href="../../stable/index.html">Home</a></li>
<li><a href="../../stable/install.html">Installation</a></li>
<li class="btn-li"><div class="btn-group">
<a href="../documentation.html">Documentation</a>
<a class="btn dropdown-toggle" data-toggle="dropdown">
<span class="caret"></span>
</a>
<ul class="dropdown-menu">
<li class="link-title">Scikit-learn 0.16 (Stable)</li>
<li><a href="../tutorial/index.html">Tutorials</a></li>
<li><a href="../user_guide.html">User guide</a></li>
<li><a href="classes.html">API</a></li>
<li><a href="../faq.html">FAQ</a></li>
<li class="divider"></li>
<li><a href="http://scikit-learn.org/dev/documentation.html">Development</a></li>
<li><a href="http://scikit-learn.org/0.15/">Scikit-learn 0.15</a></li>
</ul>
</div>
</li>
<li><a href="../auto_examples/index.html">Examples</a></li>
</ul>
<div class="search_form">
<div id="cse" style="width: 100%;"></div>
</div>
</div> <!-- end navbar --></div>
</div>
<!-- Github "fork me" ribbon -->
<a href="https://github.com/scikit-learn/scikit-learn">
<img class="fork-me"
style="position: absolute; top: 0; right: 0; border: 0;"
src="../_static/img/forkme.png"
alt="Fork me on GitHub" />
</a>
<div class="content-wrapper">
<div class="sphinxsidebar">
<div class="sphinxsidebarwrapper">
<div class="rel">
<!-- rellinks[1:] is an ugly hack to avoid link to module
index -->
<div class="rellink">
<a href="clustering.html"
accesskey="P">Previous
<br/>
<span class="smallrellink">
2.3. Clustering
</span>
<span class="hiddenrellink">
2.3. Clustering
</span>
</a>
</div>
<div class="spacer">
</div>
<div class="rellink">
<a href="decomposition.html"
accesskey="N">Next
<br/>
<span class="smallrellink">
2.5. Decomposing...
</span>
<span class="hiddenrellink">
2.5. Decomposing signals in components (matrix factorization problems)
</span>
</a>
</div>
<!-- Add a link to the 'up' page -->
<div class="spacer">
</div>
<div class="rellink">
<a href="../unsupervised_learning.html">
Up
<br/>
<span class="smallrellink">
2. Unsupervised ...
</span>
<span class="hiddenrellink">
2. Unsupervised learning
</span>
</a>
</div>
</div>
<p class="doc-version">This documentation is for scikit-learn <strong>version 0.16.1</strong> — <a href="http://scikit-learn.org/stable/support.html#documentation-resources">Other versions</a></p>
<p class="citing">If you use the software, please consider <a href="../about.html#citing-scikit-learn">citing scikit-learn</a>.</p>
<ul>
<li><a class="reference internal" href="#">2.4. Biclustering</a><ul>
<li><a class="reference internal" href="#spectral-co-clustering">2.4.1. Spectral Co-Clustering</a><ul>
<li><a class="reference internal" href="#mathematical-formulation">2.4.1.1. Mathematical formulation</a></li>
</ul>
</li>
<li><a class="reference internal" href="#spectral-biclustering">2.4.2. Spectral Biclustering</a><ul>
<li><a class="reference internal" href="#id3">2.4.2.1. Mathematical formulation</a></li>
</ul>
</li>
<li><a class="reference internal" href="#biclustering-evaluation">2.4.3. Biclustering evaluation</a></li>
</ul>
</li>
</ul>
</div>
</div>
<div class="content">
<div class="documentwrapper">
<div class="bodywrapper">
<div class="body">
<div class="section" id="biclustering">
<span id="id1"></span><h1>2.4. Biclustering<a class="headerlink" href="#biclustering" title="Permalink to this headline">¶</a></h1>
<p>Biclustering can be performed with the module
<a class="reference internal" href="classes.html#module-sklearn.cluster.bicluster" title="sklearn.cluster.bicluster"><tt class="xref py py-mod docutils literal"><span class="pre">sklearn.cluster.bicluster</span></tt></a>. Biclustering algorithms simultaneously
cluster rows and columns of a data matrix. These clusters of rows and
columns are known as biclusters. Each determines a submatrix of the
original data matrix with some desired properties.</p>
<p>For instance, given a matrix of shape <tt class="docutils literal"><span class="pre">(10,</span> <span class="pre">10)</span></tt>, one possible bicluster
with three rows and two columns induces a submatrix of shape <tt class="docutils literal"><span class="pre">(3,</span> <span class="pre">2)</span></tt>:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">numpy</span> <span class="kn">as</span> <span class="nn">np</span>
<span class="gp">>>> </span><span class="n">data</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">arange</span><span class="p">(</span><span class="mi">100</span><span class="p">)</span><span class="o">.</span><span class="n">reshape</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">10</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">rows</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">([</span><span class="mi">0</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">])[:,</span> <span class="n">np</span><span class="o">.</span><span class="n">newaxis</span><span class="p">]</span>
<span class="gp">>>> </span><span class="n">columns</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">])</span>
<span class="gp">>>> </span><span class="n">data</span><span class="p">[</span><span class="n">rows</span><span class="p">,</span> <span class="n">columns</span><span class="p">]</span>
<span class="go">array([[ 1, 2],</span>
<span class="go"> [21, 22],</span>
<span class="go"> [31, 32]])</span>
</pre></div>
</div>
<p>For visualization purposes, given a bicluster, the rows and columns of
the data matrix may be rearranged to make the bicluster contiguous.</p>
<p>Algorithms differ in how they define biclusters. Some of the
common types include:</p>
<ul class="simple">
<li>constant values, constant rows, or constant columns</li>
<li>unusually high or low values</li>
<li>submatrices with low variance</li>
<li>correlated rows or columns</li>
</ul>
<p>Algorithms also differ in how rows and columns may be assigned to
biclusters, which leads to different bicluster structures. Block
diagonal or checkerboard structures occur when rows and columns are
divided into partitions.</p>
<p>If each row and each column belongs to exactly one bicluster, then
rearranging the rows and columns of the data matrix reveals the
biclusters on the diagonal. Here is an example of this structure
where biclusters have higher average values than the other rows and
columns:</p>
<div class="figure align-center">
<a class="reference external image-reference" href="../auto_examples/bicluster/images/plot_spectral_coclustering_003.png"><img alt="../_images/plot_spectral_coclustering_003.png" src="../_images/plot_spectral_coclustering_003.png" style="width: 300.0px; height: 300.0px;" /></a>
<p class="caption">An example of biclusters formed by partitioning rows and columns.</p>
</div>
<p>In the checkerboard case, each row belongs to all column clusters, and
each column belongs to all row clusters. Here is an example of this
structure where the variance of the values within each bicluster is
small:</p>
<div class="figure align-center">
<a class="reference external image-reference" href="../auto_examples/bicluster/images/plot_spectral_biclustering_003.png"><img alt="../_images/plot_spectral_biclustering_003.png" src="../_images/plot_spectral_biclustering_003.png" style="width: 300.0px; height: 300.0px;" /></a>
<p class="caption">An example of checkerboard biclusters.</p>
</div>
<p>After fitting a model, row and column cluster membership can be found
in the <tt class="docutils literal"><span class="pre">rows_</span></tt> and <tt class="docutils literal"><span class="pre">columns_</span></tt> attributes. <tt class="docutils literal"><span class="pre">rows_[i]</span></tt> is a binary vector
with nonzero entries corresponding to rows that belong to bicluster
<tt class="docutils literal"><span class="pre">i</span></tt>. Similarly, <tt class="docutils literal"><span class="pre">columns_[i]</span></tt> indicates which columns belong to
bicluster <tt class="docutils literal"><span class="pre">i</span></tt>.</p>
<p>Some models also have <tt class="docutils literal"><span class="pre">row_labels_</span></tt> and <tt class="docutils literal"><span class="pre">column_labels_</span></tt> attributes.
These models partition the rows and columns, such as in the block
diagonal and checkerboard bicluster structures.</p>
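<p>As a concrete sketch of these attributes (the import paths shown are
those of recent scikit-learn releases, where the estimators live in
<tt class="docutils literal"><span class="pre">sklearn.cluster</span></tt>; in 0.16 they are found in
<tt class="docutils literal"><span class="pre">sklearn.cluster.bicluster</span></tt>):</p>

```python
from sklearn.cluster import SpectralCoclustering
from sklearn.datasets import make_biclusters

# Generate a 30 x 30 matrix containing three block-diagonal biclusters.
data, true_rows, true_columns = make_biclusters(
    shape=(30, 30), n_clusters=3, noise=0.5, random_state=0)

model = SpectralCoclustering(n_clusters=3, random_state=0)
model.fit(data)

# rows_[i] is a boolean vector marking the data rows in bicluster i;
# columns_[i] marks its columns.
print(model.rows_.shape)     # (3, 30)
print(model.columns_.shape)  # (3, 30)

# Because this model partitions the rows and columns, it also exposes
# flat label vectors.
print(model.row_labels_.shape)  # (30,)
```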
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p class="last">Biclustering has many other names in different fields including
co-clustering, two-mode clustering, two-way clustering, block
clustering, coupled two-way clustering, etc. The names of some
algorithms, such as the Spectral Co-Clustering algorithm, reflect
these alternate names.</p>
</div>
<div class="section" id="spectral-co-clustering">
<span id="spectral-coclustering"></span><h2>2.4.1. Spectral Co-Clustering<a class="headerlink" href="#spectral-co-clustering" title="Permalink to this headline">¶</a></h2>
<p>The <a class="reference internal" href="generated/sklearn.cluster.bicluster.SpectralCoclustering.html#sklearn.cluster.bicluster.SpectralCoclustering" title="sklearn.cluster.bicluster.SpectralCoclustering"><tt class="xref py py-class docutils literal"><span class="pre">SpectralCoclustering</span></tt></a> algorithm finds biclusters with
values higher than those in the corresponding other rows and columns.
Each row and each column belongs to exactly one bicluster, so
rearranging the rows and columns to make partitions contiguous reveals
these high values along the diagonal:</p>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p class="last">The algorithm treats the input data matrix as a bipartite graph: the
rows and columns of the matrix correspond to the two sets of vertices,
and each entry corresponds to an edge between a row and a column. The
algorithm approximates the normalized cut of this graph to find heavy
subgraphs.</p>
</div>
<div class="section" id="mathematical-formulation">
<h3>2.4.1.1. Mathematical formulation<a class="headerlink" href="#mathematical-formulation" title="Permalink to this headline">¶</a></h3>
<p>An approximate solution to the optimal normalized cut may be found via
the generalized eigenvalue decomposition of the Laplacian of the
graph. Usually this would mean working directly with the Laplacian
matrix. If the original data matrix <img class="math" src="../_images/math/0acafa529182e79b4f56165ec677554fba7fcf98.png" alt="A"/> has shape <img class="math" src="../_images/math/8ac687edcd29efcd2489036b036bfe5f4f003527.png" alt="m
\times n"/>, the Laplacian matrix for the corresponding bipartite graph
has shape <img class="math" src="../_images/math/b7804eb491eb62b008f5f5d8889c82d7a383d3d8.png" alt="(m + n) \times (m + n)"/>. However, in this case it is
possible to work directly with <img class="math" src="../_images/math/0acafa529182e79b4f56165ec677554fba7fcf98.png" alt="A"/>, which is smaller and more
efficient.</p>
<p>The input matrix <img class="math" src="../_images/math/0acafa529182e79b4f56165ec677554fba7fcf98.png" alt="A"/> is preprocessed as follows:</p>
<div class="math">
<p><img src="../_images/math/691f9bcf2834afe76dac2bd0c1170cd70ca13c54.png" alt="A_n = R^{-1/2} A C^{-1/2}"/></p>
</div><p>Where <img class="math" src="../_images/math/9d86170e7de539c0ff999de09621ee0c7b6c8ed0.png" alt="R"/> is the diagonal matrix with entry <img class="math" src="../_images/math/a581f053bbfa5115f42c13094857cdd12a37ec49.png" alt="i"/> equal to
<img class="math" src="../_images/math/7c732f51f0f10cd327f9dcaa7d3432173d8964f0.png" alt="\sum_{j} A_{ij}"/> and <img class="math" src="../_images/math/2bcc65482aa8e15cd4c9e9f2542451fb4e971a91.png" alt="C"/> is the diagonal matrix with
entry <img class="math" src="../_images/math/d32c78b759903e3f4bd4fd2ce0b86358f7500c5d.png" alt="j"/> equal to <img class="math" src="../_images/math/aa3d3ca3b6fb90c3c1e09992631c35e7f17c92e8.png" alt="\sum_{i} A_{ij}"/>.</p>
<p>The singular value decomposition, <img class="math" src="../_images/math/89cdb11ac552712c9ab3dd1bcc28619b9ae30c77.png" alt="A_n = U \Sigma V^\top"/>,
provides the partitions of the rows and columns of <img class="math" src="../_images/math/0acafa529182e79b4f56165ec677554fba7fcf98.png" alt="A"/>. A subset
of the left singular vectors gives the row partitions, and a subset
of the right singular vectors gives the column partitions.</p>
<p>The <img class="math" src="../_images/math/3a5c29c227c4edf0fb9e66e1f3515649e95bc6c9.png" alt="\ell = \lceil \log_2 k \rceil"/> singular vectors, starting
from the second, provide the desired partitioning information. They
are used to form the matrix <img class="math" src="../_images/math/c4563e7ecec2336a3934447a6c10ef8519b0b452.png" alt="Z"/>:</p>
<div class="math">
<p><img src="../_images/math/5f5a8c7bf3426ebee66a481436596ef99a070177.png" alt="Z = \begin{bmatrix} R^{-1/2} U \\\\
C^{-1/2} V
\end{bmatrix}"/></p>
</div><p>where the columns of <img class="math" src="../_images/math/73f5e249c88b2b3068263480f576b051cb5c4f6e.png" alt="U"/> are <img class="math" src="../_images/math/f92ab7ae092a16eac5c7423d7c5446623447d724.png" alt="u_2, \dots, u_{\ell + 1}"/>, and similarly for <img class="math" src="../_images/math/c99df7a209495334da442b1ec998abaabfa320d8.png" alt="V"/>.</p>
<p>Then the rows of <img class="math" src="../_images/math/c4563e7ecec2336a3934447a6c10ef8519b0b452.png" alt="Z"/> are clustered using <a class="reference internal" href="clustering.html#k-means"><em>k-means</em></a>. The first <tt class="docutils literal"><span class="pre">n_rows</span></tt> labels provide the row partitioning,
and the remaining <tt class="docutils literal"><span class="pre">n_columns</span></tt> labels provide the column partitioning.</p>
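<p>The procedure can be sketched in a few lines of NumPy (a simplified
illustration of the steps above, not scikit-learn's implementation; it
assumes a nonnegative input matrix with no all-zero rows or columns):</p>

```python
import numpy as np
from scipy.linalg import svd
from sklearn.cluster import KMeans

def cocluster(A, k, random_state=0):
    """Toy spectral co-clustering of a nonnegative matrix A into k biclusters."""
    # Normalize: A_n = R^{-1/2} A C^{-1/2}, with R and C the diagonal
    # matrices of row sums and column sums.
    r = np.sqrt(A.sum(axis=1))
    c = np.sqrt(A.sum(axis=0))
    A_n = A / np.outer(r, c)

    # l = ceil(log2(k)) singular vectors, starting from the second.
    n_sv = int(np.ceil(np.log2(k)))
    U, _, Vt = svd(A_n)
    u = U[:, 1:n_sv + 1]
    v = Vt[1:n_sv + 1].T

    # Stack Z = [R^{-1/2} u; C^{-1/2} v] and cluster its rows with k-means.
    Z = np.vstack([u / r[:, None], v / c[:, None]])
    labels = KMeans(n_clusters=k, n_init=10,
                    random_state=random_state).fit_predict(Z)
    n_rows = A.shape[0]
    return labels[:n_rows], labels[n_rows:]
```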
<div class="topic">
<p class="topic-title first">Examples:</p>
<ul class="simple">
<li><a class="reference internal" href="../auto_examples/bicluster/plot_spectral_coclustering.html#example-bicluster-plot-spectral-coclustering-py"><em>A demo of the Spectral Co-Clustering algorithm</em></a>: A simple example
showing how to generate a data matrix with biclusters and apply
this method to it.</li>
<li><a class="reference internal" href="../auto_examples/bicluster/bicluster_newsgroups.html#example-bicluster-bicluster-newsgroups-py"><em>Biclustering documents with the Spectral Co-clustering algorithm</em></a>: An example of finding
biclusters in the twenty newsgroup dataset.</li>
</ul>
</div>
<div class="topic">
<p class="topic-title first">References:</p>
<ul class="simple">
<li>Dhillon, Inderjit S, 2001. <a class="reference external" href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.140.3011">Co-clustering documents and words using
bipartite spectral graph partitioning</a>.</li>
</ul>
</div>
</div>
</div>
<div class="section" id="spectral-biclustering">
<span id="id2"></span><h2>2.4.2. Spectral Biclustering<a class="headerlink" href="#spectral-biclustering" title="Permalink to this headline">¶</a></h2>
<p>The <a class="reference internal" href="generated/sklearn.cluster.bicluster.SpectralBiclustering.html#sklearn.cluster.bicluster.SpectralBiclustering" title="sklearn.cluster.bicluster.SpectralBiclustering"><tt class="xref py py-class docutils literal"><span class="pre">SpectralBiclustering</span></tt></a> algorithm assumes that the input
data matrix has a hidden checkerboard structure. The rows and columns
of a matrix with this structure may be partitioned so that the entries
of any bicluster in the Cartesian product of row clusters and column
clusters are approximately constant. For instance, if there are two
row partitions and three column partitions, each row will belong to
three biclusters, and each column will belong to two biclusters.</p>
<p>The algorithm partitions the rows and columns of a matrix so that a
corresponding blockwise-constant checkerboard matrix provides a good
approximation to the original matrix.</p>
<div class="section" id="id3">
<h3>2.4.2.1. Mathematical formulation<a class="headerlink" href="#id3" title="Permalink to this headline">¶</a></h3>
<p>The input matrix <img class="math" src="../_images/math/0acafa529182e79b4f56165ec677554fba7fcf98.png" alt="A"/> is first normalized to make the
checkerboard pattern more obvious. There are three possible methods:</p>
<ol class="arabic simple">
<li><em>Independent row and column normalization</em>, as in Spectral
Co-Clustering. This method makes the rows sum to a constant and the
columns sum to a different constant.</li>
<li><strong>Bistochastization</strong>: repeated row and column normalization until
convergence. This method makes both rows and columns sum to the
same constant.</li>
<li><strong>Log normalization</strong>: the log of the data matrix is computed: <img class="math" src="../_images/math/4b09797b90bcd5a2e9c4778f03073d8d9fc8ffeb.png" alt="L =
\log A"/>. Then the row mean <img class="math" src="../_images/math/5213b21bf8b813f616076d232e710fe56e137abe.png" alt="\overline{L_{i \cdot}}"/>, column mean
<img class="math" src="../_images/math/f6c463bfc8a99bc816f44ceaf875e0ec258397da.png" alt="\overline{L_{\cdot j}}"/>, and overall mean <img class="math" src="../_images/math/111ee5b0009a7d22900d39da34dd082f6b4433a4.png" alt="\overline{L_{\cdot
\cdot}}"/> of <img class="math" src="../_images/math/0a5711c7a37994043b2bc3bb374adca232491762.png" alt="L"/> are computed. The final matrix is computed
according to the formula</li>
</ol>
<div class="math">
<p><img src="../_images/math/c570ab13e7a6725e1772f64ad76d96a957adc07d.png" alt="K_{ij} = L_{ij} - \overline{L_{i \cdot}} - \overline{L_{\cdot
j}} + \overline{L_{\cdot \cdot}}"/></p>
</div><p>After normalizing, the first few singular vectors are computed, just
as in the Spectral Co-Clustering algorithm.</p>
<p>If log normalization was used, all the singular vectors are
meaningful. However, if independent normalization or bistochastization
were used, the first singular vectors, <img class="math" src="../_images/math/6bd6b69e7846992625db6cd37ca4dcec179fdbc3.png" alt="u_1"/> and <img class="math" src="../_images/math/26bed463602b7b5764b5bdb149c6f77b40c998c3.png" alt="v_1"/>,
are discarded. From now on, the “first” singular vectors refer to
<img class="math" src="../_images/math/fabb5cd5dc86de2c0afb132798404b7646fa72eb.png" alt="u_2 \dots u_{p+1}"/> and <img class="math" src="../_images/math/e2b7e8b7dc650f4c44705855380b100eb43e5c26.png" alt="v_2 \dots v_{p+1}"/> except in the
case of log normalization.</p>
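<p>Log normalization, for instance, amounts to double-centering the
entrywise logarithm of the matrix (a minimal sketch, assuming strictly
positive entries):</p>

```python
import numpy as np

def log_normalize(A):
    """Double-center log(A): K_ij = L_ij - rowmean_i - colmean_j + mean."""
    L = np.log(A)
    row_means = L.mean(axis=1, keepdims=True)  # one mean per row
    col_means = L.mean(axis=0, keepdims=True)  # one mean per column
    return L - row_means - col_means + L.mean()
```

<p>Every row and every column of the result has mean zero.</p>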
<p>These singular vectors are then ranked according to how well each can
be approximated by a piecewise-constant vector. The approximation for
each vector is found using one-dimensional k-means and scored using
the Euclidean distance. A subset of the best left and right singular
vectors is selected. Next, the data is projected onto this best subset
of singular vectors and clustered.</p>
<p>For instance, if <img class="math" src="../_images/math/3eca8557203e86160952e1c0f735f7417f3285b1.png" alt="p"/> singular vectors were calculated, the
<img class="math" src="../_images/math/23f1b45408e5b4130c0f940fcbfcec54492cbdcd.png" alt="q"/> best are found as described, where <img class="math" src="../_images/math/19fd361492333ed7742f1adca8f3750a8c5f843b.png" alt="q<p"/>. Let
<img class="math" src="../_images/math/73f5e249c88b2b3068263480f576b051cb5c4f6e.png" alt="U"/> be the matrix whose columns are the <img class="math" src="../_images/math/23f1b45408e5b4130c0f940fcbfcec54492cbdcd.png" alt="q"/> best left singular
vectors, and similarly <img class="math" src="../_images/math/c99df7a209495334da442b1ec998abaabfa320d8.png" alt="V"/> for the right. To partition the rows,
the rows of <img class="math" src="../_images/math/0acafa529182e79b4f56165ec677554fba7fcf98.png" alt="A"/> are projected to a <img class="math" src="../_images/math/23f1b45408e5b4130c0f940fcbfcec54492cbdcd.png" alt="q"/> dimensional space:
<img class="math" src="../_images/math/92e92368f40b47a70de9e8c1990ed3a9bb3c8cd9.png" alt="A * V"/>. Treating the <img class="math" src="../_images/math/c4bb40dd65eae6c11b325989b14e0b8d35e4e3ef.png" alt="m"/> rows of this <img class="math" src="../_images/math/be320c6c188b9e67dff4244e742aa9f5d56be255.png" alt="m \times q"/>
matrix as samples and clustering using k-means yields the row labels.
Similarly, projecting the columns to <img class="math" src="../_images/math/971ec4565ce3998936bc77cf2f2900547f2f2d98.png" alt="A^{\top} * U"/> and
clustering this <img class="math" src="../_images/math/90669be6326baab71b695e16bd5df6e770317d81.png" alt="n \times q"/> matrix yields the column labels.</p>
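<p>A minimal end-to-end sketch (import paths are those of recent
scikit-learn releases; in 0.16 the estimator lives in
<tt class="docutils literal"><span class="pre">sklearn.cluster.bicluster</span></tt>):</p>

```python
from sklearn.cluster import SpectralBiclustering
from sklearn.datasets import make_checkerboard

# A 30 x 30 matrix with a hidden 3 x 3 checkerboard structure.
data, true_rows, true_columns = make_checkerboard(
    shape=(30, 30), n_clusters=(3, 3), noise=0.5, random_state=0)

model = SpectralBiclustering(n_clusters=(3, 3), random_state=0)
model.fit(data)

# Each of the 3 * 3 = 9 biclusters pairs one row cluster with one
# column cluster, so every row belongs to three biclusters.
print(model.rows_.shape)        # (9, 30)
print(model.row_labels_.shape)  # (30,)
```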
<div class="topic">
<p class="topic-title first">Examples:</p>
<ul class="simple">
<li><a class="reference internal" href="../auto_examples/bicluster/plot_spectral_biclustering.html#example-bicluster-plot-spectral-biclustering-py"><em>A demo of the Spectral Biclustering algorithm</em></a>: a simple example
showing how to generate a checkerboard matrix and bicluster it.</li>
</ul>
</div>
<div class="topic">
<p class="topic-title first">References:</p>
<ul class="simple">
<li>Kluger, Yuval, et al., 2003. <a class="reference external" href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.135.1608">Spectral biclustering of microarray
data: coclustering genes and conditions</a>.</li>
</ul>
</div>
</div>
</div>
<div class="section" id="biclustering-evaluation">
<span id="id4"></span><h2>2.4.3. Biclustering evaluation<a class="headerlink" href="#biclustering-evaluation" title="Permalink to this headline">¶</a></h2>
<p>There are two ways of evaluating a biclustering result: internal and
external. Internal measures, such as cluster stability, rely only on
the data and the result themselves. Currently there are no internal
bicluster measures in scikit-learn. External measures refer to an
external source of information, such as the true solution. When
working with real data the true solution is usually unknown, but
biclustering artificial data may be useful for evaluating algorithms
precisely because the true solution is known.</p>
<p>To compare a set of found biclusters to the set of true biclusters,
two similarity measures are needed: a similarity measure for
individual biclusters, and a way to combine these individual
similarities into an overall score.</p>
<p>To compare individual biclusters, several measures have been used. For
now, only the Jaccard index is implemented:</p>
<div class="math">
<p><img src="../_images/math/a30d171210b238748decce5260c2d9d980b14003.png" alt="J(A, B) = \frac{|A \cap B|}{|A| + |B| - |A \cap B|}"/></p>
</div><p>where <img class="math" src="../_images/math/0acafa529182e79b4f56165ec677554fba7fcf98.png" alt="A"/> and <img class="math" src="../_images/math/83956e92fcc80dee17fce864543216939a3c9da7.png" alt="B"/> are biclusters and <img class="math" src="../_images/math/8bd057dc66d00020cd35b77da7f8c9d7809c6eeb.png" alt="|A \cap B|"/> is
the number of elements in their intersection. The Jaccard index
achieves its minimum of 0 when the biclusters do not overlap at all
and its maximum of 1 when they are identical.</p>
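<p>With each bicluster represented as a pair of index sets, the Jaccard
index can be computed directly (an illustrative helper, not a
scikit-learn function):</p>

```python
def jaccard(bicluster_a, bicluster_b):
    """Jaccard index of two biclusters, each given as (rows, cols) sets."""
    rows_a, cols_a = bicluster_a
    rows_b, cols_b = bicluster_b
    # |A| is the number of matrix elements in the bicluster: |rows| * |cols|.
    size_a = len(rows_a) * len(cols_a)
    size_b = len(rows_b) * len(cols_b)
    # The intersection is the submatrix indexed by the common rows
    # and common columns.
    intersection = len(rows_a & rows_b) * len(cols_a & cols_b)
    return intersection / (size_a + size_b - intersection)
```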
<p>Several methods have been developed to compare two sets of biclusters.
For now, only <a class="reference internal" href="generated/sklearn.metrics.consensus_score.html#sklearn.metrics.consensus_score" title="sklearn.metrics.consensus_score"><tt class="xref py py-func docutils literal"><span class="pre">consensus_score</span></tt></a> (Hochreiter et al., 2010) is
available:</p>
<ol class="arabic simple">
<li>Compute bicluster similarities for pairs of biclusters, one in each
set, using the Jaccard index or a similar measure.</li>
<li>Assign biclusters from one set to another in a one-to-one fashion
to maximize the sum of their similarities. This step is performed
using the Hungarian algorithm.</li>
<li>The final sum of similarities is divided by the size of the larger
set.</li>
</ol>
<p>The minimum consensus score, 0, occurs when all pairs of biclusters
are totally dissimilar. The maximum score, 1, occurs when both sets
are identical.</p>
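<p>Putting the pieces together (import paths from recent scikit-learn
releases; on this easy synthetic problem the score should be close
to 1):</p>

```python
from sklearn.cluster import SpectralCoclustering
from sklearn.datasets import make_biclusters
from sklearn.metrics import consensus_score

# Biclusters recovered from low-noise synthetic data should match
# the generating ones almost exactly.
data, rows, columns = make_biclusters(
    shape=(30, 30), n_clusters=3, noise=0.5, random_state=0)

model = SpectralCoclustering(n_clusters=3, random_state=0)
model.fit(data)

# Each argument is a (rows, columns) pair describing a set of biclusters.
score = consensus_score(model.biclusters_, (rows, columns))
print(score)  # expected to be near 1.0
```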
<div class="topic">
<p class="topic-title first">References:</p>
<ul class="simple">
<li>Hochreiter, Bodenhofer, et al., 2010. <a class="reference external" href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2881408/">FABIA: factor analysis
for bicluster acquisition</a>.</li>
</ul>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="clearer"></div>
</div>
</div>
<div class="footer">
© 2010 - 2014, scikit-learn developers (BSD License).
<a href="../_sources/modules/biclustering.txt" rel="nofollow">Show this page source</a>
</div>
<div class="rel">
<div class="buttonPrevious">
<a href="clustering.html">Previous
</a>
</div>
<div class="buttonNext">
<a href="decomposition.html">Next
</a>
</div>
</div>
<script type="text/javascript">
var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-22606712-2']);
_gaq.push(['_trackPageview']);
(function() {
var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
})();
</script>
<script src="http://www.google.com/jsapi" type="text/javascript"></script>
<script type="text/javascript"> google.load('search', '1',
{language : 'en'}); google.setOnLoadCallback(function() {
var customSearchControl = new
google.search.CustomSearchControl('016639176250731907682:tjtqbvtvij0');
customSearchControl.setResultSetSize(google.search.Search.FILTERED_CSE_RESULTSET);
var options = new google.search.DrawOptions();
options.setAutoComplete(true);
customSearchControl.draw('cse', options); }, true);
</script>
<script src="https://scikit-learn.org/versionwarning.js"></script>
</body>
</html>