ANALYSIS OF MITOCHONDRIAL DATA USING R STUDIO
SUBMITTED BY: BHUVAN NAKRA
UEM211071
BE BIOTECHNOLOGY, 8TH SEM
SUBMITTED TO: DR. MINAKSHI GARG
ACKNOWLEDGEMENT
I would like to express my heartfelt gratitude to my esteemed guide, Dr. Minakshi Garg, for her
unwavering support, guidance, and encouragement throughout the course of my project,
ANALYSIS OF MITOCHONDRIAL DATA USING R STUDIO. Her vast knowledge and expertise
have been invaluable in shaping the direction and scope of this research. She has provided me
with insightful suggestions, critical feedback, and constructive advice at every stage of the
project, ensuring its successful completion. Her dedication and meticulous attention to detail
inspired me to approach the project with the same rigor and commitment.
I am especially thankful for the time and effort Dr. Minakshi Garg devoted to mentoring me,
despite her busy schedule. Her ability to explain complex concepts in a simplified manner and
her enthusiasm for teaching have been a source of immense motivation. She not only guided me
technically but also instilled in me the importance of discipline, perseverance, and critical
thinking, which have significantly contributed to my growth as a student and a learner.
I also want to acknowledge her constant encouragement, which played a pivotal role in
overcoming challenges during the project. Her guidance extended beyond academics, providing
a supportive and collaborative environment that encouraged creativity and innovation. This
project would not have reached its current level of success without her continuous mentorship. I
feel privileged to have had the opportunity to work under her guidance, and I will always be
grateful for her invaluable contribution to this endeavour.
PRINCIPAL COMPONENT ANALYSIS (PCA) BIPLOT IN R STUDIO
HOW PCA CONSTRUCTS THE PRINCIPAL COMPONENTS
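One common way to describe the construction is: standardise the variables, compute their correlation matrix, and take its eigenvectors as the component directions and its eigenvalues as the variance each component captures. The sketch below follows those steps by hand and compares the outcome with R's built-in prcomp(). It uses the built-in iris measurements purely as stand-in data, which is an assumption for illustration and not the mitochondrial dataset of this project.

```r
# Stand-in data: four numeric columns from the built-in iris dataset
# (an assumption for illustration; not the mitochondrial dataset).
X <- as.matrix(iris[, 1:4])

# 1. Standardise each variable to mean 0 and standard deviation 1.
Z <- scale(X, center = TRUE, scale = TRUE)

# 2. Correlation matrix of the original variables
#    (equal to the covariance matrix of the standardised data).
R <- cov(Z)

# 3. Eigen-decomposition: eigenvectors give the component directions,
#    eigenvalues give the variance captured by each component.
eig <- eigen(R)
loadings  <- eig$vectors
variances <- eig$values

# 4. Scores: project the standardised data onto the eigenvectors.
scores <- Z %*% loadings

# Cross-check against prcomp(), which reaches the same result via SVD.
pca <- prcomp(X, center = TRUE, scale. = TRUE)
print(sqrt(variances))  # standard deviations of the components
print(pca$sdev)         # should match the line above
```

The signs of individual components may differ between the two approaches, because an eigenvector and its negative describe the same direction.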
STEPS IN PCA:
2. IMPORTING DATA
library(readxl)  # provides read_excel()
my_data <- read_excel("C:/Users/bhuva/Downloads/1-da.xlsx")
5. PERFORMING PCA
# pca_result: fitted PCA object (e.g. from prcomp()), created in a step not shown here
biplot(pca_result, col = c("magenta", "blue"))
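Only steps 2 and 5 appear in this report, so the way pca_result was created is not shown. The following is a minimal sketch of a typical workflow, assuming the numeric columns are selected and passed to prcomp() with scaling enabled. The data frame and its column names are hypothetical stand-ins for the imported spreadsheet.

```r
# Hypothetical stand-in for the imported spreadsheet; the column names
# and values below are invented for illustration only.
set.seed(1)
my_data <- data.frame(
  Locus        = sample(c("MT-ND1", "MT-CO1", "MT-ATP6"), 30, replace = TRUE),
  Conservation = runif(30, 0, 100),
  Frequency    = runif(30, 0, 1),
  Score_A      = rnorm(30),
  Score_B      = rnorm(30)
)

# Keep only numeric columns and complete rows; PCA cannot use text columns.
numeric_data <- my_data[, sapply(my_data, is.numeric), drop = FALSE]
numeric_data <- na.omit(numeric_data)

# Run PCA on centred, scaled variables so no column dominates by its units.
pca_result <- prcomp(numeric_data, center = TRUE, scale. = TRUE)

# Variance explained by each component.
print(summary(pca_result))

# Biplot: points are observations, arrows are variable loadings.
biplot(pca_result, col = c("magenta", "blue"))
```

Scaling matters when the columns are measured in different units; without it, the variable with the largest numeric range tends to dominate the first component.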
RESULT
Importance of components:
                          PC1    PC2    PC3     PC4
Standard deviation     1.3848 1.1884 0.6624 0.48087
Proportion of Variance 0.4794 0.3531 0.1097 0.05781
Cumulative Proportion  0.4794 0.8325 0.9422 1.00000
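The three rows of the table are linked by simple arithmetic: each variance is the square of the standard deviation, each proportion is that variance divided by the total, and the cumulative row is a running sum. The short check below reproduces the table from its first row. The variances sum to approximately 4, which would be consistent with four standardised input variables, although the report does not state this explicitly.

```r
# Standard deviations copied from the summary table above.
sdev <- c(PC1 = 1.3848, PC2 = 1.1884, PC3 = 0.6624, PC4 = 0.48087)

variance   <- sdev^2                    # variance captured by each component
proportion <- variance / sum(variance)  # share of the total variance
cumulative <- cumsum(proportion)        # running total across components

print(round(rbind(variance, proportion, cumulative), 4))
```

Read this way, PC1 and PC2 together account for about 83% of the total variance, which is why a two-dimensional biplot is a reasonable summary of this dataset.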
INTERPRETATION OF RESULTS
HIGH CONSERVATION + LOW VARIABILITY – DISEASE-CAUSING VARIANTS
LOW CONSERVATION + HIGH VARIABILITY – BENIGN OR TOLERATED POLYMORPHISM
HIGH CONSERVATION + HIGH VARIABILITY – HOTSPOTS OR POPULATION-SPECIFIC FUNCTIONAL VARIANTS
LOW CONSERVATION + LOW VARIABILITY – RARE NEUTRAL VARIANTS
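The four combinations above can be expressed as a simple two-threshold rule. The sketch below is only an illustration of that rule: the variant identifiers, the scores, and the 0.5 cut-offs are all invented, since the report does not state which thresholds were used.

```r
# Hypothetical variants with invented scores, for illustration only.
variants <- data.frame(
  id           = c("v1", "v2", "v3", "v4"),
  conservation = c(0.95, 0.20, 0.90, 0.15),
  variability  = c(0.05, 0.80, 0.70, 0.10)
)

# Assumed cut-offs; the report does not state the thresholds it used.
cons_cut <- 0.5
var_cut  <- 0.5

high_cons <- variants$conservation >= cons_cut
high_var  <- variants$variability  >= var_cut

# Map each high/low combination to the label used in the interpretation.
variants$category <- ifelse(
  high_cons & !high_var, "Disease-causing variant",
  ifelse(!high_cons & high_var, "Benign or tolerated polymorphism",
  ifelse(high_cons & high_var, "Hotspot or population-specific functional variant",
         "Rare neutral variant")))

print(variants)
```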
INTRODUCTION:
SIGNIFICANCE OF PCA:
1. POPULATION GENETICS OR PHYLOGENY
WE COMPILE VARIANTS FROM DIFFERENT INDIVIDUALS AND USE VARIANT FREQUENCIES OR CONSERVATION SCORES TO COMPARE GENETIC PATTERNS ACROSS SAMPLES.
2. FUNCTIONAL GENOMICS
PCA CAN HELP INTERPRET FUNCTIONAL IMPACT SCORES (CONSERVATION SCORES) ACROSS VARIANTS.
3. ENVIRONMENTAL MICROBIOLOGY
PCA CAN CLUSTER SOIL / OIL SAMPLES BASED ON MICROBIAL ABUNDANCE AND ENVIRONMENTAL FACTORS (pH, TEMPERATURE).
6. GROUPING OF DATA
# Count the variants in each Locus / ClinVar combination (requires dplyr)
grouped_data <- top_filtered %>%
  group_by(Locus, ClinVar) %>%
  summarise(Count = n(), .groups = "drop")
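To show what this grouping step returns, here is a small runnable sketch. The contents of top_filtered are invented, as the report does not show how that table was built; only the column names Locus and ClinVar are taken from the code above.

```r
library(dplyr)

# Hypothetical stand-in for top_filtered; the values are invented.
top_filtered <- tibble(
  Locus   = c("MT-ND1", "MT-ND1", "MT-ND1", "MT-CO1", "MT-CO1"),
  ClinVar = c("Benign", "Benign", "Pathogenic", "Benign", "Pathogenic")
)

# One output row per Locus / ClinVar pair, with the number of variants in it.
grouped_data <- top_filtered %>%
  group_by(Locus, ClinVar) %>%
  summarise(Count = n(), .groups = "drop")

print(grouped_data)

# count() is a shorthand that produces the same table.
same_result <- top_filtered %>% count(Locus, ClinVar, name = "Count")
```

The argument .groups = "drop" returns an ungrouped table, so later steps are not silently applied per group.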
3. Clinical Applications
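STEPS:
1. LOAD THE REQUIRED LIBRARIES
# Install once if needed: install.packages(c("readxl", "dplyr", "ggplot2"))
library(readxl)   # read_excel()
library(dplyr)    # count(), filter(), %>%
library(ggplot2)  # plotting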
2. READ THE DATA
data <- read_excel("C:/Users/bhuva/Downloads/1-da.xlsx")
3. COUNT VARIANT ALLELE OCCURRENCES
variant_counts <- data %>%
  count(`Variant Allele`, sort = TRUE)
4. GET TOP 10 VARIANT ALLELES
# slice_max() replaces the superseded top_n(); ties at rank 10 are kept
top_10 <- variant_counts %>%
  slice_max(n, n = 10)
5. FILTER ORIGINAL DATA FOR ONLY THESE TOP VARIANTS
filtered_data <- data %>%
  filter(`Variant Allele` %in% top_10$`Variant Allele`)
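Steps 3 to 5 can be tried end to end on a small invented table. The sketch below keeps the top 2 alleles instead of the top 10 so the result is easy to verify by eye, and it uses slice_max(), which dplyr documents as the replacement for the superseded top_n(). The plotting code in this report uses a table called plot_data that is never defined; the last step shows one plausible way to build it, which is an assumption and not confirmed by the source.

```r
library(dplyr)

# Hypothetical stand-in for the spreadsheet; values are invented.
data <- tibble(
  `Variant Allele` = c("A", "A", "A", "G", "G", "T", "C"),
  Pathogenicity    = c("Benign", "Benign", "Pathogenic",
                       "Benign", "Pathogenic", "Benign", "Benign")
)

# Count how often each variant allele occurs, most frequent first.
variant_counts <- data %>% count(`Variant Allele`, sort = TRUE)

# Keep the top 2 here (the report keeps the top 10); ties are retained.
top_k <- variant_counts %>% slice_max(n, n = 2)

# Restrict the original rows to those top alleles.
filtered_data <- data %>%
  filter(`Variant Allele` %in% top_k$`Variant Allele`)

# Assumed construction of plot_data: one row per allele/pathogenicity pair.
plot_data <- filtered_data %>% count(`Variant Allele`, Pathogenicity)

print(plot_data)
```

Because ties are retained, the "top 10" table can contain more than 10 alleles when several share the same count at the cut-off.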
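# plot_data is expected to hold one row per Variant Allele and Pathogenicity, with the count in column n
ggplot(plot_data, aes(x = reorder(`Variant Allele`, n), y = n, fill = Pathogenicity)) +
  geom_col() +
  coord_flip() +  # horizontal bars keep the allele labels readable
  labs(
    title = "Top 10 Variants by Pathogenicity",
    x = "Variant Allele",
    y = "Count"
  ) +
  theme_minimal() +
  theme(axis.text.y = element_text(size = 10))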