1/18/24, 9:36 PM    Creating Heatmaps with Hierarchical Clustering - GeeksforGeeks
Creating Heatmaps with Hierarchical Clustering
Last Updated : 27 Sep, 2023
Before diving into our actual topic, let's have an understanding of Heatmaps
and Hierarchical Clustering.
Heatmaps
Heatmaps are a powerful data visualization tool that can reveal patterns,
relationships, and similarities within large datasets. When combined with
hierarchical clustering, they become even more insightful. In this brief article,
we'll explore how to create captivating heatmaps with hierarchical
clustering in R programming.
Understanding Hierarchical Clustering
Hierarchical Clustering is a powerful data analysis technique used to
uncover patterns, relationships, and structures within a dataset. It belongs to
the family of unsupervised machine learning algorithms and is particularly
useful in exploratory data analysis and data visualization. Hierarchical
Clustering is often combined with heatmap visualizations, as demonstrated
in this article, to provide a comprehensive understanding of complex
datasets.
What is Hierarchical Clustering?
Hierarchical Clustering, as the name suggests, creates a hierarchical or tree-
like structure of clusters within the data. It groups similar data points
together, gradually forming larger clusters as it moves up the hierarchy. This
hierarchical representation is often visualized as a dendrogram, which is a
tree diagram that illustrates the arrangement of data points into clusters.
How Does Hierarchical Clustering Work?
Hierarchical Clustering can be performed using two main approaches:
Agglomerative (bottom-up) and Divisive (top-down).
* Agglomerative Hierarchical Clustering: This is the more commonly used
approach. It starts with each data point as its own cluster and then
iteratively merges the closest clusters until all data points belong to a
single cluster. The choice of a distance metric to measure similarity
between clusters and of a linkage method to determine how to merge
clusters is critical in this process.
* Divisive Hierarchical Clustering: This approach takes the opposite route.
It begins with all data points in one cluster and recursively divides them
into smaller clusters based on dissimilarity. While it provides insights into
the finest details of data, it is less commonly used in practice due to its
computational complexity.
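To make the bottom-up process concrete, here is a minimal sketch using base R's hclust() on five made-up one-dimensional points. The values and the names a to e are illustrative assumptions, not part of the article's dataset. The merge matrix records which clusters were joined at each step, and height records the dissimilarity at which each merge happened.

```r
# Five illustrative 1-D points (assumed values, chosen so the merges are easy to follow)
points <- c(a = 1, b = 2, c = 6, d = 7, e = 15)

# Pairwise Euclidean distances, then agglomerative clustering with complete linkage
hc <- hclust(dist(points), method = "complete")

# Each row of 'merge' is one agglomeration step:
# negative numbers are single observations, positive numbers refer to earlier steps
print(hc$merge)

# Dissimilarity at which each merge happened (non-decreasing for complete linkage)
print(hc$height)
```

With n observations there are always n - 1 merges. Here the two tight pairs are joined first, the pairs are then merged with each other, and the distant point e joins last. Calling plot(hc) draws the corresponding dendrogram.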
Why Use Hierarchical Clustering?
1. Hierarchy of Clusters: Hierarchical Clustering provides a hierarchical
view of how data points are related. This can be particularly useful when
there are natural hierarchies or levels of similarity in the data.
2. Interpretability: The dendrogram generated by Hierarchical Clustering
allows for easy interpretation. You can visually identify the number of
clusters at different levels of the hierarchy.
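3. No Need for Predefined Number of Clusters: Unlike some other
clustering techniques that require you to specify the number of clusters in
advance, hierarchical clustering lets you choose the number of clusters
after the tree is built, for example by cutting the dendrogram at a chosen
height.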
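The point about not fixing the number of clusters up front can be sketched with base R's cutree(), which cuts an existing tree either into k groups or at a height h. The data below are made-up values used only for illustration.

```r
# Illustrative 1-D points (assumed values): two tight pairs and one distant point
points <- c(a = 1, b = 2, c = 6, d = 7, e = 15)
hc <- hclust(dist(points), method = "complete")

# Build the tree once, then choose the number of clusters afterwards
two_groups <- cutree(hc, k = 2)
three_groups <- cutree(hc, k = 3)

# Alternatively, cut at a dissimilarity threshold instead of a cluster count
by_height <- cutree(hc, h = 5)

print(two_groups)
print(three_groups)
print(by_height)
```

The same tree yields different partitions without re-running the clustering, which is the practical difference from methods that need the cluster count as an input.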
Getting Started
Before diving into the code, ensure you have the necessary packages
installed. We'll use the 'pheatmap' package for heatmap visualization and
'dendextend' for dendrogram customization. If you haven't already installed
them, run the following commands:
R
install.packages("pheatmap")
install.packages("dendextend")
Load the required packages:
R
library(pheatmap)
library(dendextend)
Preparing Your Data
For our demonstration, let's consider a hypothetical gene expression dataset.
It's crucial to have data with clear patterns or relationships to create
meaningful heatmaps. Replace this example data with your own dataset as
needed.
R
# Example gene expression data
gene_data <- data.frame(
  Gene = c("Gene1", "Gene2", "Gene3", "Gene4", "Gene5"),
  Sample1 = c(2.3, 1.8, 3.2, 0.9, 2.5),
  Sample2 = c(2.1, 1.7, 3.0, 1.0, 2.4),
  Sample3 = c(2.2, 1.9, 3.1, 0.8, 2.6),
  Sample4 = c(2.4, 1.6, 3.3, 0.7, 2.3),
  Sample5 = c(2.0, 1.5, 3.4, 0.6, 2.7)
)
print(gene_data)
Output:
   Gene Sample1 Sample2 Sample3 Sample4 Sample5
1 Gene1     2.3     2.1     2.2     2.4     2.0
2 Gene2     1.8     1.7     1.9     1.6     1.5
3 Gene3     3.2     3.0     3.1     3.3     3.4
4 Gene4     0.9     1.0     0.8     0.7     0.6
5 Gene5     2.5     2.4     2.6     2.3     2.7
Removing Non-Numeric Labels
R
# Remove the non-numeric column (Gene names) temporarily
gene_names <- gene_data$Gene
gene_data <- gene_data[, -1]
print(gene_data)
Output:
  Sample1 Sample2 Sample3 Sample4 Sample5
1     2.3     2.1     2.2     2.4     2.0
2     1.8     1.7     1.9     1.6     1.5
3     3.2     3.0     3.1     3.3     3.4
4     0.9     1.0     0.8     0.7     0.6
5     2.5     2.4     2.6     2.3     2.7
* In many datasets, the first column often contains labels or identifiers,
such as gene names or sample IDs. These non-numeric columns are
essential for understanding the data but can interfere with mathematical
operations like distance calculations.
* To perform distance calculations correctly, we need to exclude these non-
numeric columns. In this code, we store the gene names in the variable
'gene_names' for later use and remove the non-numeric column from the
'gene_data' data frame.
* Removing the non-numeric column temporarily allows us to calculate
distances without interference from these labels.
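One way to keep the labels available after dropping the column is to attach them as row names, so that distance matrices and dendrograms are labelled by gene. The sketch below uses a three-gene, two-sample subset of the example values; the name numeric_data is an assumption introduced here for illustration and does not appear in the article's code.

```r
# Small stand-in for the example data (first three genes, first two samples)
gene_data <- data.frame(
  Gene = c("Gene1", "Gene2", "Gene3"),
  Sample1 = c(2.3, 1.8, 3.2),
  Sample2 = c(2.1, 1.7, 3.0)
)

# Keep the labels, then drop the non-numeric column
gene_names <- gene_data$Gene
numeric_data <- gene_data[, -1]  # 'numeric_data' is a hypothetical name

# Assumption: we want the labels back as row names so results stay readable
rownames(numeric_data) <- gene_names

# Every remaining column is numeric, so dist() operates on values only
stopifnot(all(sapply(numeric_data, is.numeric)))
d <- dist(numeric_data)
print(d)
```

The printed distance matrix is indexed by gene name rather than by row number, and the same labels carry through to hclust() and to heatmap row labels.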
Calculating Distances and Performing Hierarchical Clustering
To create meaningful heatmaps, we first calculate distances between data
points using various methods. In this case, we'll use Euclidean, Manhattan,
and Pearson correlation distances.
R
# Calculate distances between rows (genes) with different methods
euclidean_dist_rows <- dist(gene_data, method = "euclidean")
manhattan_dist_rows <- dist(gene_data, method = "manhattan")
# cor() correlates columns, so transpose to correlate genes (rows)
correlation_dist_rows <- as.dist(1 - cor(t(gene_data), method = "pearson"))

# Perform hierarchical clustering for rows
complete_clusters_euclidean_rows <- hclust(euclidean_dist_rows, method = "complete")
complete_clusters_manhattan_rows <- hclust(manhattan_dist_rows, method = "complete")
complete_clusters_correlation_rows <- hclust(correlation_dist_rows, method = "complete")

# Calculate distances between columns (samples)
euclidean_dist_cols <- dist(t(gene_data), method = "euclidean")
manhattan_dist_cols <- dist(t(gene_data), method = "manhattan")
correlation_dist_cols <- as.dist(1 - cor(gene_data, method = "pearson"))

# Perform hierarchical clustering for columns
complete_clusters_euclidean_cols <- hclust(euclidean_dist_cols, method = "complete")
complete_clusters_manhattan_cols <- hclust(manhattan_dist_cols, method = "complete")
complete_clusters_correlation_cols <- hclust(correlation_dist_cols, method = "complete")
In the data analysis process, calculating distances between data points and
performing hierarchical clustering are fundamental steps to uncover patterns
and relationships within a dataset.
In this code snippet:
We calculate distances using three different methods: Euclidean, Manhattan,
and Pearson correlation. Each of these methods offers a unique way to
quantify the dissimilarity between data points.
* Euclidean distance: The straight-line ("as-the-crow-flies") distance
between two points in multidimensional space. It's suitable for datasets
where the variables have the same scale.
* Manhattan distance: The sum of the absolute differences between the
coordinates of two points. Because the differences are not squared, a single
large deviation influences it less than it influences Euclidean distance.
* Pearson correlation distance: Defined here as 1 minus the Pearson
correlation coefficient, it quantifies the dissimilarity between data points
in terms of their linear relationships. It compares the shape of two
profiles rather than their magnitude and is often used with gene expression
data.
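The three definitions can be checked by hand on a single pair of genes. The sketch below uses the Gene1 and Gene2 profiles from the example data and compares the manual formulas with the results of dist(); the variable names are chosen here for illustration.

```r
# Expression profiles of Gene1 and Gene2 from the example data
gene1 <- c(2.3, 2.1, 2.2, 2.4, 2.0)
gene2 <- c(1.8, 1.7, 1.9, 1.6, 1.5)

# Euclidean: square root of the sum of squared differences
euclidean <- sqrt(sum((gene1 - gene2)^2))

# Manhattan: sum of absolute differences
manhattan <- sum(abs(gene1 - gene2))

# Pearson correlation distance: 1 minus the correlation coefficient
correlation <- 1 - cor(gene1, gene2, method = "pearson")

# Cross-check the first two against dist()
pair <- rbind(gene1, gene2)
stopifnot(
  isTRUE(all.equal(euclidean, as.numeric(dist(pair, method = "euclidean")))),
  isTRUE(all.equal(manhattan, as.numeric(dist(pair, method = "manhattan"))))
)

print(c(euclidean = euclidean, manhattan = manhattan, correlation = correlation))
```

Note that dist() has no built-in correlation method, which is why the correlation distance is built from cor() and converted with as.dist() in the article's code.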
For each distance metric, we perform hierarchical clustering for the rows and
columns separately.
We use the “complete” linkage method for hierarchical clustering, which
determines the distance between clusters based on the maximum pairwise
dissimilarity between their data points.
Other linkage methods:
* Single Linkage (method = "single"): It uses the minimum distance
between the elements of two clusters. It tends to create elongated clusters.
* Average Linkage (method = "average"): It uses the average
distance between the elements of two clusters. It often results in balanced
clusters.
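The effect of the linkage rule can be seen by clustering the same distances three times. This is a minimal sketch on made-up one-dimensional points (the values and the names p1 to p5 are assumptions for illustration): four points close together and one far away.

```r
# Illustrative points (assumed values): a run of close points plus one distant point
x <- matrix(c(0, 1, 2, 3, 10), ncol = 1,
            dimnames = list(c("p1", "p2", "p3", "p4", "p5"), NULL))
d <- dist(x)

# Same distances, three linkage rules
linkages <- c("single", "average", "complete")
trees <- lapply(linkages, function(m) hclust(d, method = m))
names(trees) <- linkages

# Height of the final merge (the four close points joined with p5) under each rule
final_heights <- sapply(trees, function(hc) max(hc$height))
print(final_heights)
```

In the last merge, single linkage reports the distance from p5 to its nearest neighbour (7), average linkage the mean distance to all four points (8.5), and complete linkage the distance to the farthest point (10).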
By calculating distances and performing hierarchical clustering for both rows
and columns, we gain insights into the structure and relationships within the
dataset. These clusterings are essential for creating informative
heatmaps, as they determine how the rows and columns should be
organized to reveal patterns and similarities effectively.
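How a clustering determines the layout can be sketched without drawing anything: each hclust object stores the leaf order of its dendrogram in the order component, and a clustered heatmap arranges rows and columns following that order. The example below rebuilds the example data as a matrix with gene names as row names (an assumption made here for readable output).

```r
# Example data as a numeric matrix, labelled by gene and sample
m <- rbind(
  Gene1 = c(2.3, 2.1, 2.2, 2.4, 2.0),
  Gene2 = c(1.8, 1.7, 1.9, 1.6, 1.5),
  Gene3 = c(3.2, 3.0, 3.1, 3.3, 3.4),
  Gene4 = c(0.9, 1.0, 0.8, 0.7, 0.6),
  Gene5 = c(2.5, 2.4, 2.6, 2.3, 2.7)
)
colnames(m) <- paste0("Sample", 1:5)

# One tree for the rows (genes) and one for the columns (samples)
row_tree <- hclust(dist(m), method = "complete")
col_tree <- hclust(dist(t(m)), method = "complete")

# 'order' is the leaf order of each dendrogram;
# reordering the matrix this way places similar profiles next to each other
reordered <- m[row_tree$order, col_tree$order]
print(reordered)
```

The values are unchanged; only the arrangement differs, which is why the choice of distance and linkage changes how a heatmap looks even though the underlying data stay the same.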
Generating Distinct Heatmaps:
To gain comprehensive insights into your dataset, we generate three distinct
heatmaps, each based on a different distance metric. These heatmaps
visually represent the relationships and patterns within your data.
To analyze these heatmaps, keep the following six points in mind:
* Understanding the Color Scale: Heatmaps use color gradients to
represent data values. Warmer colors (e.g., red) typically signify higher
values, while cooler colors (e.g., blue) represent lower values. This color
scale helps interpret the intensity or magnitude of data.
* Identifying Clusters: Look for groups of similar elements within rows and
columns, often indicated by dendrogram branches.
* Interpreting Dendrograms: Examine dendrograms to understand
hierarchical relationships and dissimilarity levels between clusters.
* Spotting Patterns: Identify consistent color patterns, revealing similarities
or differences in data behavior.
* Comparing Heatmaps: If using multiple distance metrics, compare
heatmaps to gain insights into data characteristics.
* Applying Domain Knowledge: Utilize domain-specific expertise to
decipher biological or contextual significance, especially in fields like gene
expression analysis.
Euclidean Distance Heatmap:
R
pheatmap(as.matrix(gene_data),
         cluster_rows = complete_clusters_euclidean_rows,
         cluster_cols = complete_clusters_euclidean_cols,
         main = "Euclidean Distance Heatmap")
Output:
Euclidean Distance Heatmap
The Euclidean distance heatmap orders genes and samples using dendrograms
built from Euclidean distances, while the cell colors show the expression
values themselves. Genes and samples with similar expression profiles cluster
together, allowing you to identify groups of genes and samples that share
common characteristics or responses. This heatmap is particularly useful for
discovering clusters based on the straight-line ("as-the-crow-flies") distance
between data points.
Manhattan Distance Heatmap:
R
pheatmap(as.matrix(gene_data),
cluster_rows = complete_clusters_manhattan_rows,
cluster_cols = complete_clusters_manhattan_cols,
main = "Manhattan Distance Heatmap")
Output:
Manhattan Distance Heatmap
The Manhattan distance heatmap, on the other hand, employs the
Manhattan distance metric. It reveals patterns and clusters within your data
based on the sum of absolute differences between coordinates. Compared with
the Euclidean version, the clustering is less sensitive to outliers because
differences are not squared, which can make it more suitable when a few
extreme values would otherwise dominate the distances.
Pearson Correlation Distance Heatmap:
R
pheatmap(as.matrix(gene_data),
cluster_rows = complete_clusters_correlation_rows,
cluster_cols = complete_clusters_correlation_cols,
main = "Pearson Correlation Distance Heatmap")
Output:
Pearson Correlation Distance Heatmap
The Pearson correlation distance heatmap explores linear relationships
between genes and samples. It quantifies how genes move together or
apart, uncovering co-expression patterns or anti-correlations. This heatmap
is valuable for identifying genes that are co-regulated under specific
conditions, making it a powerful tool for gene expression analysis.
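Why correlation distance picks up co-expression can be seen in a minimal sketch with three made-up profiles. The names low, high, and opposite and their values are hypothetical, not part of the article's dataset: high follows the same pattern as low at ten times the magnitude, and opposite mirrors it.

```r
# Three hypothetical expression profiles across five samples
low <- c(1, 2, 3, 2, 1)
high <- low * 10          # same shape, ten times the magnitude
opposite <- 4 - low       # mirrored shape: c(3, 2, 1, 2, 3)

profiles <- rbind(low = low, high = high, opposite = opposite)

# Euclidean distance compares magnitudes
euclid <- as.matrix(dist(profiles, method = "euclidean"))

# Correlation distance compares shapes; transpose because cor() works on columns
corr <- as.matrix(as.dist(1 - cor(t(profiles), method = "pearson")))

print(round(euclid, 2))
print(round(corr, 2))
```

Under Euclidean distance, low is much closer to opposite than to high, because the magnitudes are similar. Under correlation distance the picture reverses: low and high are at distance 0 (perfectly correlated), while low and opposite are at the maximum distance of 2 (perfectly anti-correlated).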
Conclusion:
Creating heatmaps with hierarchical clustering in R is a valuable technique
for visualizing complex datasets, such as gene expression or any data with
similar patterns. This visualization not only aids in identifying clusters but
also provides a clear overview of your data’s structure.
Experiment with different datasets, clustering methods, and color palettes to
unlock hidden insights within your data. Heatmaps with hierarchical
clustering are your ticket to revealing intricate relationships in a visually
stunning way. Explore, analyze, and discover the stories within your data.