Document (26) - Copy 2

The document presents a major project on the analysis of mitochondrial data using R Studio, focusing on Principal Component Analysis (PCA) to reduce dimensionality and visualize biological data. It outlines the methodology for conducting PCA, interpreting results, and the significance of the analysis in population genetics, functional genomics, and clinical applications. The project emphasizes the importance of identifying disease-gene associations and prioritizing research targets through data visualization techniques.

Uploaded by

chefroyale.23

MAJOR PROJECT - ANALYSIS OF MITOCHONDRIAL DATA USING R STUDIO

SUBMITTED BY: BHUVAN NAKRA, UEM211071, BE BIOTECHNOLOGY, 8TH SEM
SUBMITTED TO: DR. MINAKSHI GARG

ACKNOWLEDGEMENT

I would like to express my heartfelt gratitude to my esteemed guide, Dr. Minakshi Garg, for her
unwavering support, guidance, and encouragement throughout the course of my project,
ANALYSIS OF MITOCHONDRIAL DATA USING R STUDIO. Her vast knowledge and expertise
have been invaluable in shaping the direction and scope of this research. She has provided me
with insightful suggestions, critical feedback, and constructive advice at every stage of the
project, ensuring its successful completion. Her dedication and meticulous attention to detail
inspired me to approach the project with the same rigor and commitment.

I am especially thankful for the time and effort Dr. Minakshi Garg devoted to mentoring me,
despite her busy schedule. Her ability to explain complex concepts in a simplified manner and
her enthusiasm for teaching have been a source of immense motivation. She not only guided me
technically but also instilled in me the importance of discipline, perseverance, and critical
thinking, which have significantly contributed to my growth as a student and a learner.

I also want to acknowledge her constant encouragement, which played a pivotal role in
overcoming challenges during the project. Her guidance extended beyond academics, providing
a supportive and collaborative environment that encouraged creativity and innovation. This
project would not have reached its current level of success without her continuous mentorship. I
feel privileged to have had the opportunity to work under her guidance, and I will always be
grateful for her invaluable contribution to this endeavour.

PRINCIPAL COMPONENT ANALYSIS (PCA) BIPLOT IN R STUDIO

PRINCIPAL COMPONENT ANALYSIS (PCA) IS A DIMENSIONALITY REDUCTION METHOD
THAT IS OFTEN USED ON LARGE DATASETS. IT TRANSFORMS A LARGE SET OF
VARIABLES INTO A SMALLER ONE THAT STILL CONTAINS MOST OF THE INFORMATION
IN THE ORIGINAL DATASET.

USE IN ANALYSIS OF BIOLOGICAL DATA:

• BIOLOGICAL DATA SUCH AS GENE EXPRESSION, METABOLOMICS, SNPs OR
  PROTEOMICS OFTEN INVOLVE HUNDREDS OF VARIABLES
• PCA REVEALS PATTERNS SUCH AS SAMPLE CLUSTERING AND TRENDS
• PCA HELPS VISUALIZE VARIATION ACROSS SAMPLES IN A TWO-DIMENSIONAL OR
  THREE-DIMENSIONAL SPACE

HOW PCA CONSTRUCTS THE PRINCIPAL COMPONENTS

THERE ARE AS MANY PRINCIPAL COMPONENTS AS THERE ARE VARIABLES IN THE
DATA. THEY ARE CONSTRUCTED SO THAT THE FIRST PRINCIPAL COMPONENT
ACCOUNTS FOR THE LARGEST POSSIBLE VARIANCE IN THE DATASET, AND EACH
SUBSEQUENT COMPONENT ACCOUNTS FOR THE LARGEST REMAINING VARIANCE WHILE
BEING UNCORRELATED WITH THE PREVIOUS ONES.
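
The ordering of components by variance can be checked on a small example. The sketch below uses R's built-in USArrests data purely as a stand-in (it is not the project dataset), and the variable names are illustrative.

```r
# Toy illustration on R's built-in USArrests data (not the project dataset).
pca_demo <- prcomp(USArrests, scale. = TRUE)

# The variance carried by each component is its squared standard deviation.
component_variance <- pca_demo$sdev^2

# One component per variable, ordered from largest to smallest variance.
n_components <- length(component_variance)
is_decreasing <- all(diff(component_variance) <= 0)

# Scores on different components are uncorrelated with each other.
score_correlation <- cor(pca_demo$x)

print(n_components)
print(round(component_variance / sum(component_variance), 3))
```

Because the variables are scaled to unit variance, the component variances sum to the number of variables.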

COMPONENTS OF A PCA BIPLOT:

A PCA BIPLOT CONSISTS OF TWO MAIN ELEMENTS:

SAMPLES (POINTS): show how individual observations (for example, tissue
samples, individuals or species) group in PC space
VARIABLES (ARROWS): show how the original variables (for example,
nucleotide variability, heteroplasmy frequency and conservation scores)
contribute to the principal components
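
In a `prcomp` result, the two biplot elements correspond to the `x` matrix (sample scores) and the `rotation` matrix (variable loadings). A minimal sketch, assuming the built-in iris measurements as stand-in data rather than the mitochondrial dataset:

```r
# Toy illustration on the built-in iris measurements (not the project data).
pca_demo <- prcomp(iris[, 1:4], scale. = TRUE)

# Points in the biplot: one row of scores per sample.
sample_scores <- pca_demo$x[, 1:2]

# Arrows in the biplot: one row of loadings per original variable.
variable_loadings <- pca_demo$rotation[, 1:2]

print(head(sample_scores, 3))
print(round(variable_loadings, 2))
```

A variable whose loading on a component is large in absolute value pulls its arrow strongly along that axis.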

STEPS IN PCA:

1. LOADING THE ESSENTIAL LIBRARIES

library(readxl)   # read_excel() for .xlsx files
library(ggplot2)

2. IMPORTING DATA

my_data <- read_excel("C:/Users/bhuva/Downloads/1-da.xlsx")

3. EXTRACTING NUMERIC DATA

# keep only the numeric columns; prcomp() cannot handle text columns
numeric_data <- my_data[, sapply(my_data, is.numeric)]

4. DATA INSPECTION AND CLEANING

str(numeric_data)
summary(numeric_data)
numeric_data <- na.omit(numeric_data)   # drop rows with missing values

5. PERFORMING PCA

# scale. = TRUE standardizes each variable to unit variance
pca_result <- prcomp(numeric_data, scale. = TRUE)
summary(pca_result)

6. GENERATING THE BIPLOT

biplot(pca_result, col = c("magenta", "blue"))

RESULT

Importance of components:
                          PC1    PC2    PC3     PC4
Standard deviation     1.3848 1.1884 0.6624 0.48087
Proportion of Variance 0.4794 0.3531 0.1097 0.05781
Cumulative Proportion  0.4794 0.8325 0.9422 1.00000

INTERPRETATION

PC1 AND PC2 TOGETHER EXPLAIN ABOUT 83.3 % OF THE TOTAL VARIANCE
(CUMULATIVE PROPORTION 0.8325), WHICH IS QUITE GOOD
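
The share of variance explained by the first two components can be recomputed from the standard deviations in the RESULT table. The sketch below only redoes that arithmetic; the variable names are illustrative.

```r
# Standard deviations of PC1 to PC4, copied from the RESULT table.
sdev <- c(PC1 = 1.3848, PC2 = 1.1884, PC3 = 0.6624, PC4 = 0.48087)

# Each component's variance is its squared standard deviation.
variance <- sdev^2

# Share of total variance per component, and the running total.
proportion <- variance / sum(variance)
cumulative <- cumsum(proportion)

print(round(proportion, 4))
print(round(cumulative, 4))
```

The running total at PC2 matches the Cumulative Proportion row of the table (0.8325).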

PC1 - STRONGLY INFLUENCED BY HF (VARIANT ALLELE FREQUENCY & NUCLEOTIDE
VARIABILITY)
PC2 - INFLUENCED BY "PHASTCONS20WAY" AND "PHYLOP20WAY", i.e.
EVOLUTIONARY CONSERVATION

VARIANTS WITH HIGH CONSERVATION SCORES CLUSTER ON THE POSITIVE Y-AXIS
(PC2)

RESULTS
• HIGH CONSERVATION + LOW VARIABILITY: DISEASE-CAUSING VARIANTS
• LOW CONSERVATION + HIGH VARIABILITY: BENIGN OR TOLERATED POLYMORPHISM
• HIGH CONSERVATION + HIGH VARIABILITY: HOTSPOTS OR POPULATION-SPECIFIC
  FUNCTIONAL VARIANTS
• LOW CONSERVATION + LOW VARIABILITY: RARE NEUTRAL VARIANTS
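
The four combinations above amount to a two-way split on conservation and variability. A rough sketch of such a split follows. The column names, the toy values and the median cut-offs are all assumptions made for illustration and are not taken from the project data.

```r
# Hypothetical toy variants; column names and values are invented.
variants <- data.frame(
  id           = c("v1", "v2", "v3", "v4"),
  conservation = c(0.95, 0.10, 0.90, 0.05),
  variability  = c(0.01, 0.80, 0.70, 0.02)
)

# Assumed cut-offs: a value above the median counts as "high".
high_cons <- variants$conservation > median(variants$conservation)
high_var  <- variants$variability  > median(variants$variability)

# Map each combination to the category named in RESULTS.
variants$category <- ifelse(
  high_cons & !high_var, "disease-causing candidate",
  ifelse(!high_cons & high_var, "benign or tolerated polymorphism",
  ifelse(high_cons & high_var, "hotspot or population-specific functional variant",
         "rare neutral variant")))

print(variants[, c("id", "category")])
```

Real cut-offs would need to be chosen from the score distributions in the dataset; the median is only a placeholder.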

INTRODUCTION:

Mitochondrial DNA (mtDNA) is a maternally inherited, circular molecule
of about 16.6 kb (16,569 bp) and, unlike the nuclear genome, has no
introns.
Mitochondrial DNA contains 37 genes, all of which are essential for
normal mitochondrial function.
- 13 of these genes provide instructions for making enzymes involved in
  oxidative phosphorylation
- The remaining 24 genes provide instructions for making transfer RNA
  (tRNA) and ribosomal RNA (rRNA), which are chemical cousins of DNA
Although it codes for only a small number of genes, mutations in mtDNA
are common.

SIGNIFICANCE OF PCA ANALYSIS :

1. POPULATION GENETICS OR PHYLOGENY
WE COMPILE VARIANTS FROM DIFFERENT INDIVIDUALS AND USE VARIANT
FREQUENCIES OR CONSERVATION SCORES TO COMPARE GENETIC PATTERNS ACROSS
SAMPLES

2. FUNCTIONAL GENOMICS
PCA CAN HELP INTERPRET FUNCTIONAL IMPACT SCORES (CONSERVATION SCORES)
ACROSS VARIANTS

3. ENVIRONMENTAL MICROBIOLOGY
PCA CAN CLUSTER SOIL / OIL SAMPLES BASED ON MICROBIAL ABUNDANCE AND
ENVIRONMENTAL FACTORS (pH, TEMPERATURE)

4. DIFFERENTIATION OF VARIANT ALLELES
DIFFERENTIATING HIGH-RISK VARIANTS FROM BENIGN ONES

5. QUALITY CONTROL
PCA CAN DETECT BATCH EFFECTS, OUTLIERS OR TECHNICAL ARTIFACTS IN
HIGH-THROUGHPUT DATA

TOP DISEASES COUNT PER LOCUS :

STEPS:

1. LOAD THE REQUIRED LIBRARIES

library(readxl)
library(dplyr)
library(ggplot2)

2. READ THE EXCEL FILE

data <- read_excel("C:/Users/bhuva/Downloads/1-da.xlsx")

3. FILTER ROWS WITH NON-MISSING VALUES

# keep only variants that have both a ClinVar label and a locus
filtered_data <- data %>%
  filter(!is.na(ClinVar), !is.na(Locus))

4. FIND TOP 20 DISEASES

# count variants per disease and keep the 20 most frequent labels
# (slice_max() supersedes top_n() and, like it, keeps ties)
top_diseases <- filtered_data %>%
  count(ClinVar, sort = TRUE) %>%
  slice_max(n, n = 20) %>%
  pull(ClinVar)

5. FILTER DATASET TO INCLUDE ONLY TOP DISEASES

top_filtered <- filtered_data %>%
  filter(ClinVar %in% top_diseases)

6. GROUPING OF DATA

# one row per locus and disease, with the number of variants
grouped_data <- top_filtered %>%
  group_by(Locus, ClinVar) %>%
  summarise(Count = n(), .groups = "drop")

7. CREATE THE BAR PLOT

# stacked bars: one bar per locus, coloured by disease
ggplot(grouped_data, aes(x = Locus, y = Count, fill = ClinVar)) +
  geom_col() +
  theme_minimal() +
  labs(title = "Top 20 Diseases per Locus",
       x = "Locus", y = "Count",
       fill = "Disease (ClinVar)") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
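
The counting logic of steps 3 to 6 can be tried without the Excel file. The sketch below uses only base R and an invented toy table; the locus and disease labels are placeholders, and the top 2 diseases are kept instead of 20 so that the filter has a visible effect.

```r
# Hypothetical toy table; locus and disease labels are placeholders.
toy <- data.frame(
  Locus   = c("A", "A", "B", "C", "C", "C", "D", NA),
  ClinVar = c("D1", "D2", "D2", "D1", "D1", "D3", NA, "D2")
)

# Step 3: drop rows where either field is missing.
toy <- toy[!is.na(toy$ClinVar) & !is.na(toy$Locus), ]

# Step 4: rank diseases by frequency and keep the top 2 (20 in the project).
disease_counts <- sort(table(toy$ClinVar), decreasing = TRUE)
top_diseases <- names(disease_counts)[1:2]

# Steps 5 and 6: keep those diseases, then count rows per locus and disease.
top_rows <- toy[toy$ClinVar %in% top_diseases, ]
grouped <- as.data.frame(
  table(Locus = top_rows$Locus, ClinVar = top_rows$ClinVar),
  responseName = "Count"
)
grouped <- grouped[grouped$Count > 0, ]

print(grouped)
```

Each remaining row of `grouped` corresponds to one coloured segment of a stacked bar in the plot.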

SIGNIFICANCE OF THIS ANALYSIS –

1. Disease-Gene Association Mapping

Helps establish clear relationships between specific genomic loci and
diseases

2. Prioritization of Research Targets

Identifies "hotspot" loci associated with multiple diseases
Helps prioritize genes/loci for functional studies
Reveals pleiotropic effects where single loci influence multiple
phenotypes

3. Clinical Applications

Improves genetic testing panels by identifying the most clinically
relevant loci

4. Biological Pathway Analysis

When combined with pathway databases, reveals disease networks showing
how different diseases may share common biological pathways

TOP VARIANTS BY PATHOGENICITY

STEPS:

1. LOAD THE REQUIRED LIBRARIES

library(readxl)
library(dplyr)
library(ggplot2)

2. READ THE DATA

data <- read_excel("C:/Users/bhuva/Downloads/1-da.xlsx")

3. COUNT VARIANT ALLELE OCCURRENCES

variant_counts <- data %>%
  count(`Variant Allele`, sort = TRUE)

4. GET TOP 10 VARIANT ALLELES

# slice_max() supersedes top_n() and, like it, keeps ties
top_10 <- variant_counts %>%
  slice_max(n, n = 10)

5. FILTER ORIGINAL DATA FOR ONLY THESE TOP VARIANTS

filtered_data <- data %>%
  filter(`Variant Allele` %in% top_10$`Variant Allele`)

6. RECOUNT WITH PATHOGENICITY FOR PLOTTING

# one row per variant allele and pathogenicity class
plot_data <- filtered_data %>%
  count(`Variant Allele`, Pathogenicity)

7. PLOT THE BAR GRAPH

# FUN = sum orders the bars by total count across pathogenicity classes
# (the default would order them by the mean count per class)
ggplot(plot_data, aes(x = reorder(`Variant Allele`, n, FUN = sum),
                      y = n, fill = Pathogenicity)) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Top 10 Variants by Pathogenicity",
    x = "Variant Allele",
    y = "Count"
  ) +
  theme_minimal() +
  theme(axis.text.y = element_text(size = 10))
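
The recount in step 6 is a cross-tabulation of variant allele against pathogenicity class. A small base R sketch of that table, using invented placeholder labels rather than the project data:

```r
# Hypothetical toy data; allele and pathogenicity labels are placeholders.
toy <- data.frame(
  variant       = c("A", "A", "A", "G", "G", "T"),
  pathogenicity = c("pathogenic", "benign", "pathogenic",
                    "benign", "benign", "pathogenic")
)

# Rows: variant alleles; columns: pathogenicity classes; cells: counts.
counts <- table(toy$variant, toy$pathogenicity)

# Order alleles by total occurrences, most frequent first.
counts <- counts[order(rowSums(counts), decreasing = TRUE), ]

print(counts)
```

Each row of this table is one stacked bar in the plot, and each cell is one coloured segment.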

SIGNIFICANCE OF THIS ANALYSIS

1. Prioritizing Clinically Relevant Variants

Researchers can focus on variants most likely to cause the disease
Streamlines genetic testing and diagnosis workflows

2. Understanding Disease Mechanisms

Reveals common mutation patterns in specific diseases
Contributes to knowledge of genotype-phenotype relationships

3. Supporting Personalized Medicine

Tailored treatment strategies for patients (for example, in cancer
diagnosis)
Better predictions of disease risk or drug response

4. Our analysis can contribute to public databases such as ClinVar,
gnomAD or COSMIC
