Lecture 9

This document discusses the use of heatmaps and clustering in bioinformatics for visualizing omic data, emphasizing the importance of verifying statistical significance through expression data. It explains how clustering can reveal hidden patterns in complex datasets and provides R code for creating heatmaps and rugs to display additional clinical information. The document highlights the power of clustering in identifying groups of genes and samples with similar expression profiles.

Uploaded by

9djbwrn8cw

Bioinformatics for Wet-Lab Biologists
Omic Data Analysis & Visualisation Using R

Lecture 9. Heatmaps and Clustering

Part 1: What are heatmaps?
Heatmap of Significantly Different Genes

So far we have shown that there are significantly different genes, BUT NOT HOW
BELIEVABLE THEY ARE when viewed all at once. Statistics can be wrong. The first rule of
research is to be RIGHT. The first rule of statistics is that once you have p-values you
MUST check by eye whether they are believable, by looking at the original data. In this
example we need to look at the significant genes at the EXPRESSION LEVEL. If you
read an RNA-seq paper that shows p-values but NO EXPRESSION DATA, don't believe
it.
Heatmap of Significantly Different Genes

Question: Are our significantly differential genes consistent between replicates but
different between groups?

A heatmap is simply a table where the numbers are replaced by a colour. The colour
intensity represents the number. One great thing about heatmaps is that they let you see
your entire expression matrix (both genes and samples) at once. NO OTHER plot lets you
do this.
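
As a rough illustration of "numbers replaced by a colour", the sketch below converts a tiny made-up expression table into a table of colour codes. The gene and sample names, the values and the blue/white/red palette are all assumptions chosen for the example, not taken from the course data.

```r
# Toy expression table: 2 genes x 3 samples (values are made up)
em <- matrix(c(-2, 1, 0, -1, 2, 0), nrow = 2,
             dimnames = list(c("gene1", "gene2"), c("s1", "s2", "s3")))

# A 100-step colour ramp from low (blue) to high (red)
pal <- colorRampPalette(c("blue", "white", "red"))(100)

# Rescale every value to 0..1, then to a palette index 1..100
scaled <- (em - min(em)) / (max(em) - min(em))
idx <- 1 + round(scaled * 99)

# Same shape as the original table, but each number is now a colour
colour_table <- matrix(pal[idx], nrow = nrow(em), dimnames = dimnames(em))
print(colour_table)
```

The lowest value in the table gets the first palette colour and the highest gets the last; a heatmap simply draws each cell filled with its colour.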

What does this plot show?

Clustering

The second great thing about heatmaps is that they can be clustered. Clustering
is an EXTREMELY powerful transformation for finding groups.

If we plot the same heatmap without clustering we get this. Remember the data it shows is
identical:

It looks very messy. That's because the genes are simply in the order that they appear in
the differential expression file, which is effectively arbitrary. What we need is some way
to order the genes so that it looks neat. One way is to sort them by fold change. A better
way is to cluster them.
Clustering

In clustering we order the genes by how similar their pattern of expression is
(across the samples). This is achieved iteratively:

(1) We take our expression matrix of significant genes (in this case 3 genes).

(2) We decide which two genes have the most similar expression pattern, based on
Spearman correlations between all combinations of genes.

(3) We average the expression values for the two genes identified in (2).

(4) We repeat stages 2+3 until there is only one Spearman value.

Obviously with many 100s of genes this will be computationally quite heavy.
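
One pass of stages (2) and (3) can be sketched in base R as follows. This is a minimal illustration on a made-up 3-gene matrix, not the course's own code; the gene names and values are assumptions.

```r
# Toy expression matrix: 3 genes (rows) x 4 samples (columns); values are made up
em <- rbind(
  geneA = c(1, 2, 3, 4),
  geneB = c(2, 3, 5, 9),
  geneC = c(9, 1, 7, 2)
)

# Stage 2: Spearman correlation between all pairs of genes
# (cor() compares columns, so transpose to put genes in columns)
rho <- cor(t(em), method = "spearman")
diag(rho) <- NA                      # ignore each gene's correlation with itself

# The most similar pair is the one with the highest correlation
best <- which(rho == max(rho, na.rm = TRUE), arr.ind = TRUE)[1, ]
pair <- rownames(em)[sort(best)]

# Stage 3: replace the pair with their averaged expression profile
merged <- colMeans(em[pair, ])
em_next <- rbind(em[setdiff(rownames(em), pair), , drop = FALSE], merged)
rownames(em_next)[nrow(em_next)] <- paste(pair, collapse = "+")

print(pair)
print(em_next)
```

Stage (4) would repeat the same two steps on `em_next` until a single row remains.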
Clustering

As we perform the previous step, what we are actually doing is building a dendrogram.

[Figure: the dendrogram being built, join by join, over the genes TNF P53 CCL2 CSF1 ADRB2 NR1A NR1B ACTN TTN]
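
The sequence of joins that makes up a dendrogram can be inspected directly in R. The sketch below uses random numbers in place of real expression values and takes 1 minus the Spearman correlation as the distance; both are assumptions made for illustration.

```r
set.seed(42)
genes <- c("TNF", "P53", "CCL2", "CSF1", "ADRB2", "NR1A", "NR1B", "ACTN", "TTN")

# Random stand-in for an expression matrix: 9 genes x 6 samples
em <- matrix(rnorm(length(genes) * 6), nrow = length(genes),
             dimnames = list(genes, NULL))

# Distance between genes: 1 - Spearman correlation (0 = identical pattern)
gene_dist <- as.dist(1 - cor(t(em), method = "spearman"))

# Agglomerative clustering with average linkage
hc <- hclust(gene_dist, method = "average")

# One row per join. Negative numbers are single genes (by row index),
# positive numbers refer to a cluster formed at an earlier join.
print(hc$merge)

# Height at which each join happened (earliest joins are the most similar)
print(hc$height)
```

With 9 genes there are always 8 joins, and the last one connects everything into a single tree.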

Clustering

In a dendrogram the "arms" can rotate freely. E.g. this order:

TNF P53 CCL2 CSF1 ADRB2 NR1A NR1B ACTN TTN

is the same as this one:

TTN ACTN TNF P53 CCL2 CSF1 ADRB2 NR1A NR1B
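
A quick way to convince yourself that rotating the arms changes nothing is to flip a dendrogram and compare the tree distances between genes before and after. The data below are random numbers and the Euclidean distance is used purely for convenience; both are assumptions for this sketch.

```r
set.seed(1)
genes <- c("TNF", "P53", "CCL2", "CSF1", "ADRB2", "NR1A", "NR1B", "ACTN", "TTN")
em <- matrix(rnorm(length(genes) * 4), nrow = length(genes),
             dimnames = list(genes, NULL))

dd <- as.dendrogram(hclust(dist(em), method = "average"))

# rev() swaps the arms at every join, so the leaf order is mirrored
dd_flipped <- rev(dd)
print(labels(dd))
print(labels(dd_flipped))

# Cophenetic distance = height of the join that first connects two genes.
# It depends only on the tree, not on how the arms are drawn.
c1 <- as.matrix(cophenetic(dd))
c2 <- as.matrix(cophenetic(dd_flipped))
print(all.equal(c1[genes, genes], c2[genes, genes]))
```

The leaf order differs but the tree itself is identical, which is why the order can be chosen freely to make the heatmap look tidy.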

Clustering

The second part of a clustering algorithm is the reordering based on how far up or
down the joining loop a join occurred. In the previous example the starting order of
the genes was close to the re-ordered order. Let's go again with a more complex
example.

TNF P53 CCL2 CSF1 ADRB2 NR1A NR1B ACTN TTN

Clustering

We can see that the dendrogram is very tangled up. If we start at the first join and spin
the dendrogram until the two joined branches are side by side, then take the next join and
repeat, and so on, we get the final order.

[Figure: the dendrogram untangled one join at a time. The gene order after each step:]

TNF P53 CCL2 CSF1 ADRB2 NR1A NR1B ACTN TTN
TNF P53 CCL2 CSF1 ADRB2 TTN NR1B ACTN NR1A
TNF P53 CCL2 ADRB2 TTN CSF1 NR1B ACTN NR1A
TNF P53 NR1A CCL2 ADRB2 TTN CSF1 NR1B ACTN

Ta daa! This is the order that we place the genes in the heatmap.

Clustering

There are MANY different algorithms for clustering. This is the simplest but also
one of the most powerful and widely used. Clustering can be achieved in R.

The three components that can easily be modified in the algorithm are:

1) The method for getting the distance. Here we used Spearman correlations, but we
could use Pearson, Euclidean, Geometric, etc.

2) The method we used to agglomerate was mean. i.e. we averaged the expression
values when we joined. There are others such as median.

3) The method we used to reorder the dendrogram was distance. There are others.

A final point regarding dendrograms is that you DO NOT need to show them on the plot.
Clustering is NOT scientific. The dendrogram is meaningless once the order is decided.
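
To see how the first two choices change the result, the sketch below clusters the same toy matrix with three different combinations of distance and agglomeration method, using only base R. The data are random numbers, and using 1 minus the correlation as a distance is an assumption made for this example.

```r
set.seed(1)
em <- matrix(rnorm(9 * 6), nrow = 9,
             dimnames = list(paste0("gene", 1:9), NULL))

# 1) Different ways of getting the distance between genes
d_euclid   <- dist(em, method = "euclidean")
d_pearson  <- as.dist(1 - cor(t(em), method = "pearson"))
d_spearman <- as.dist(1 - cor(t(em), method = "spearman"))

# 2) Different agglomeration methods, passed to hclust()
orders <- list(
  spearman_average = hclust(d_spearman, method = "average")$order,
  pearson_average  = hclust(d_pearson,  method = "average")$order,
  euclid_median    = hclust(d_euclid,   method = "median")$order
)

# Gene order produced by each combination (one column per combination)
print(sapply(orders, function(o) rownames(em)[o]))
```

Each combination returns a valid ordering of the same nine genes, but the orderings usually differ, so it is worth stating which settings were used.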
The power of clustering large datasets

Previously we have used simple datasets. i.e. two groups of samples with few
replicates. However with more complex experiments (and primary tissue)
clustering is MORE powerful. It can be used to identify hidden patterns.

[Figure: two heatmaps of the same data, 2,000 genes by 87 patients. Left: un-clustered data. Right: the same data but clustered. Look what was hiding!]

You can identify previously unknown groups of genes and samples with concordant
expression profiles this way.
Part 2: Making heatmaps in R
library(amap)      # Dist()
library(reshape2)  # melt()
library(ggplot2)   # ggplot()

# makes a matrix (em_sig_scaled is assumed to have been created earlier in the course)
hm.matrix = as.matrix(em_sig_scaled)

# gets the distances
y.dist = Dist(hm.matrix, method="spearman")

# clusters
y.cluster = hclust(y.dist, method="average")

# this pulls out the dendrogram
y.dd = as.dendrogram(y.cluster)

# this untangles the dendrogram
# (the argument is agglo.FUN; reorder() for dendrograms has no FUN argument)
y.dd.reorder = reorder(y.dd, 0, agglo.FUN=mean)

# this gets the untangled gene order from the dendrogram
y.order = order.dendrogram(y.dd.reorder)

# this reorders the original matrix in the new order
hm.matrix_clustered = hm.matrix[y.order,]

# makes the colour palette
colours = c("blue","pink","red")
palette = colorRampPalette(colours)(100)

# melt and plot
hm.matrix_clustered = melt(hm.matrix_clustered)
ggp = ggplot(hm.matrix_clustered, aes(x=Var2, y=Var1, fill=value)) +
  geom_tile() +
  scale_fill_gradientn(colours = palette)
ggp
Part 3: Rugs
Rugs

Often the clusters, groups or clinical information on a heatmap have their own
colour bar. This is called a rug. It is actually a tiny heatmap that accompanies
the main one.

[Figure: a heatmap with two rugs, labelled "group" and "age"]
Rugs

We simply take the values we want to be the rug (e.g. age, BMI, tissue, etc.). We
then melt them, don't cluster, and plot a heatmap using these melted values.

# load sample sheet
sample_sheet = read.table("/data_d2-d9/sample_sheet.csv", header=TRUE, row.names=1, sep='\t')

# colours
rug_colours = c("red", "cyan", "purple")

# rug for discrete variable
rug_data = as.matrix(as.numeric(factor(sample_sheet$SAMPLE_GROUP)))
rug_data = melt(rug_data)

# plot, then trim off the headers and legends, grid, everything etc..
# (the + must end a line, never start one, or R stops reading the plot early)
ggp = ggplot(rug_data, aes(x = Var1, y = Var2, fill = value)) +
  geom_tile() +
  scale_fill_gradientn(colours = rug_colours) +
  theme(plot.margin=unit(c(0,1,1,1), "cm"),
        axis.line=element_blank(), axis.ticks=element_blank(),
        axis.text.x=element_blank(), axis.title.x=element_blank(),
        axis.text.y=element_blank(), axis.title.y=element_blank(),
        legend.position="none",
        panel.background=element_blank(), panel.border=element_blank(),
        panel.grid.major=element_blank(), panel.grid.minor=element_blank(),
        plot.background=element_blank())
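
The encoding step for a discrete rug can be checked on a toy vector before touching real data. The group names below are made up, and the data frame is built by hand in base R to mirror the long format (Var1, Var2, value) that melting a one-column matrix is expected to give.

```r
# Hypothetical sample groups, one per sample, in sample-sheet order
groups <- c("control", "treated", "control", "treated", "knockout")

# factor() assigns each distinct group a level (alphabetical by default);
# as.numeric() turns the levels into the codes 1, 2, 3, ...
codes <- as.numeric(factor(groups))
print(levels(factor(groups)))
print(codes)

# Long format: one row per sample, a single rug row (Var2 = 1)
rug_long <- data.frame(Var1 = seq_along(codes), Var2 = 1, value = codes)
print(rug_long)
```

Because the codes follow alphabetical level order, the first rug colour goes to the alphabetically first group, which is worth checking before reading a rug.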
Rugs

Amazing!
