Project O: Breast Cancer Gene Analysis Using R: Sheena Scroggins, Susan Mcgowan, John Caras

The document describes analysis of breast cancer gene expression data from 3 datasets using R. The analysis included quality assessment, normalization, pairwise comparisons to identify differentially expressed genes, hierarchical clustering, and meta-analysis combining p-values across datasets. While some genes were identified in individual datasets, meta-analysis found no significant genes, possibly due to inconsistencies across the larger but less relevant second dataset.

Project O: Breast Cancer Gene Analysis Using R

Sheena Scroggins, Susan McGowan, John Caras

Introduction

• We chose to perform analysis on Breast Cancer Cells.

• We retrieved CEL files generated on the Affymetrix HGU133a
oligonucleotide array through a combination of the sites:
http://www.nextbio.com/b/nextbio.nb
and http://www.ncbi.nlm.nih.gov/geo/
• We performed our analysis on 3 datasets containing breast
cancer cells and normal epithelial tissue cells.
• The data sets and colors associated with each set are:
– data1 = GSE6883-GDS2617 (22)-orange
– data2 = GSE9574-GDS3139 (49)-green
– data3 = GSE5874-GDS3097 (28)-blue
Brief Background on Datasets

• The first group of cells chosen compares highly tumorigenic
cells vs. non-tumorigenic cells vs. normal cells.
The tumorigenic cells express CD44 and low or
undetectable levels of CD24. This data set contains 2
disease states. We labeled the non-tumorigenic disease
cells as Cancer = Y in the treat.txt file.
• The second group of cells we chose contained normal
epithelial cells of breast cancer patients vs. normal
epithelial cells of non-breast cancer patients.
• The third group of cells chosen had 2 disease states:
inflammatory breast cancer cells vs. invasive, non-
inflammatory breast cancer cells.
Quality Assessment & NUSE Plotting
• QA/QC - perform quality assessment and quality control to
remove, as early as possible, any ‘bad arrays’ identified.
• The code used to generate QC plots:
 library(simpleaffy)
 library(affyQCReport)
 library(affyPLM)
 dataset1 <- read.affy("treat1.txt")
 saqc1 <- qc(dataset1)
 plot(saqc1)

• The code used to generate NUSE plots:

 dataPLM1 <- fitPLM(dataset1)
 boxplot(dataPLM1, main="NUSE Data Set 1", ylim=c(0.95, 1.10),
   outline=FALSE, col="orange", las=3, whisklty=0, staplelty=0)
QC Plots datasets 1,2,3
NUSE Plots Data sets 1, 2, 3
Pre-Processing of Data: RMA Normalization & Expression Set Creation
• RMA performs three tasks:
– background adjustment
– quantile normalization
– summarization
• Our code consists of RMA in conjunction with
creation of the expression sets.
 data1.rma <- call.exprs(dataset1,"rma")
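To make the quantile normalization step more concrete, here is a minimal base-R sketch on a made-up 4 x 3 intensity matrix. The function name quantile_normalize and the toy values are illustrative assumptions and are not part of the project code; the real RMA procedure also performs background adjustment and summarization, and its handling of ties may differ.

```r
# Illustrative sketch only: quantile normalization on a toy matrix
# (rows = probes, columns = arrays). Not the affy/RMA implementation.
quantile_normalize <- function(x) {
  ranks  <- apply(x, 2, rank, ties.method = "first")  # rank within each array
  sorted <- apply(x, 2, sort)                         # sort each array
  target <- rowMeans(sorted)                          # mean of each quantile
  out <- apply(ranks, 2, function(r) target[r])       # map ranks to targets
  dimnames(out) <- dimnames(x)
  out
}

toy <- matrix(c(5, 2, 3, 4,
                4, 1, 4, 2,
                3, 4, 6, 8),
              nrow = 4,
              dimnames = list(paste0("probe", 1:4), paste0("array", 1:3)))
toy_qn <- quantile_normalize(toy)
print(toy_qn)
```

After normalization every array (column) holds the same set of values, only in a different order, which is what makes the arrays comparable.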
Pairwise Comparison
• We used the function pairwise.comparison to compute the mean
expression of each group and the log2 fold change between
the two groups: Cancer = Y and Cancer = N.
• Then we filtered the results to determine significantly
changing genes.
• Code used for pairwise analysis:
 pairwise_results1 <- pairwise.comparison(data1.rma,
"Cancer", c("Y", "N"), dataset1)
 significant1 <- pairwise.filter(pairwise_results1,
fc=log2(1.5), min.present.no=10, tt= 0.01,
present.by.group=FALSE)
• Next we sorted the significant results by t-test p-value
(lowest p-value = highest significance), then
displayed the first 25 genes.
 sort(abs(tt(significant1)),decreasing=FALSE)[1:25]
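The idea behind the comparison and filter can be sketched in base R on simulated data. Everything here (the matrix expr, the labels in cancer, the ten spiked-in genes) is made up for illustration; the simpleaffy functions above also apply present-call filtering, which this sketch omits.

```r
# Illustrative sketch only: log2 fold change and per-gene t-test on simulated
# log2 expression values (rows = genes, columns = samples).
set.seed(1)
n_genes <- 200
expr <- matrix(rnorm(n_genes * 10, mean = 8), nrow = n_genes,
               dimnames = list(paste0("gene", 1:n_genes), NULL))
cancer <- rep(c("Y", "N"), each = 5)          # hypothetical group labels
expr[1:10, cancer == "Y"] <- expr[1:10, cancer == "Y"] + 3  # spiked-in genes

# On the log2 scale, the fold change is a difference of group means
log2fc <- rowMeans(expr[, cancer == "Y"]) - rowMeans(expr[, cancer == "N"])
pvals  <- apply(expr, 1, function(g)
  t.test(g[cancer == "Y"], g[cancer == "N"])$p.value)

# Keep genes changing at least 1.5-fold with p < 0.01, sorted by p-value
keep <- abs(log2fc) >= log2(1.5) & pvals < 0.01
significant <- sort(pvals[keep])
head(significant, 25)
```

Because the data are already on the log2 scale after RMA, a 1.5-fold cutoff corresponds to an absolute difference of log2(1.5), which is why the filter above uses fc=log2(1.5).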
Volcano Plots

• Create volcano plots to display the fold change vs. the
t-test p-value (as -log10 p); the code below highlights the 25
genes with the largest absolute fold change
 plot( all_foldchange_probesets1 , lod1 , pch = "." ,
xlab = "fold change" , ylab = expression(-log[10]~p))
o1 <- order(abs(all_foldchange_probesets1),
decreasing = TRUE) [1:25]
points( all_foldchange_probesets1[o1], lod1[o1], pch
= 18 , col = "orange")
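The slide does not show how the two plotted vectors were built. A minimal sketch of the general recipe, using simulated stand-ins, might look like this; fold_change, p_value and lod are hypothetical names and values, not the authors' objects.

```r
# Illustrative sketch only: building volcano-plot coordinates from simulated
# values. fold_change and p_value stand in for the per-probe-set fold changes
# and t-test p-values of a real comparison.
set.seed(2)
fold_change <- rnorm(1000, sd = 0.8)          # simulated log2 fold changes
p_value     <- runif(1000)                    # simulated p-values
lod         <- -log10(p_value)                # y-axis: larger = more significant

# Indices of the 25 largest absolute fold changes
top <- order(abs(fold_change), decreasing = TRUE)[1:25]

plot(fold_change, lod, pch = ".",
     xlab = "log2 fold change", ylab = expression(-log[10] ~ p))
points(fold_change[top], lod[top], pch = 18, col = "orange")
```

Ordering by lod instead of abs(fold_change) would highlight the 25 smallest p-values rather than the 25 largest fold changes.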
Volcano Plots 1, 2, 3
Filter Expression Sets

• Filter expression sets with criteria IQR >0.5

• Code – first load libraries needed
 library(affy)
 library(genefilter)
 library(multtest)
 library(annaffy)   # provides aafTableAnn() and saveHTML(), used further on
 library(RColorBrewer)
 library(pvclust)
 library(hopach)
 library(cluster)

• Next build expression set by RMA

 data1 <- read.AnnotatedDataFrame( "treat1.txt", header=TRUE,
row.names=1, sep="\t" )
 pData(data1)
 rma_eset1 <- justRMA( filenames = rownames (pData(data1)) )
 rma_data1 <- exprs(rma_eset1)
Filter code continued
 # keep probe sets whose IQR across arrays exceeds 0.5
 IQRfil <- function( x ) ( IQR(x) > 0.5 )
 Filter <- filterfun( IQRfil )
 rma_filtered1 <- genefilter( rma_data1, Filter)
 rma_selected1 <- rma_data1 [ rma_filtered1, ]
 # permutation-adjusted t-tests between Cancer = Y and Cancer = N
 cl1 <- as.numeric(data1$Cancer == "Y" )
 resT1 <- mt.maxT(rma_selected1, classlabel=cl1, B=2000 )
 top_multtest1 <- rownames(resT1) [1:50]
 # annotate the top 10 probe sets and save them as an HTML table
 library( "hgu133a.db" )
 probe_sets1 <- rownames(resT1) [1:10]
 gene_symbols1 <- unlist( mget( probe_sets1, hgu133aSYMBOL ) )
 tabulated_probes1 <- aafTableAnn( probe_sets1, "hgu133a.db",
   aaf.handler( ) )
 saveHTML( tabulated_probes1 , file="MULTTEST-GeneList1.html" )
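The IQR > 0.5 criterion itself needs nothing beyond base R. The sketch below applies it to simulated data, with half the rows nearly flat and half variable; the object names and values are illustrative assumptions only.

```r
# Illustrative sketch only: IQR-based filtering in base R on simulated data.
set.seed(3)
flat     <- matrix(rnorm(50 * 8, mean = 7, sd = 0.1), nrow = 50)  # low spread
variable <- matrix(rnorm(50 * 8, mean = 7, sd = 2),   nrow = 50)  # high spread
expr <- rbind(flat, variable)
rownames(expr) <- paste0("probe", seq_len(nrow(expr)))

iqr_per_probe <- apply(expr, 1, IQR)     # spread of each probe set across arrays
selected <- expr[iqr_per_probe > 0.5, , drop = FALSE]
dim(selected)
```

Probe sets that barely vary across arrays carry little information about group differences, so removing them first reduces the number of tests that the multiple-testing step has to correct for.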
Distance Computation & Visualization of Gene Expression Data
 es1 <- rma_eset1[top_multtest1,]
 iqrs1 <- esApply(es1,1,IQR)
 gvals1 <- scale(t(exprs(es1)),rowMedians(es1),
iqrs1[featureNames(es1)])
Computing Distances – 3 methods
 manDist1 <- dist(gvals1, method="manhattan")
 hr1 <- hclust(as.dist(1-cor(t(gvals1),method=
"pearson")),method="complete")
 hc1 <- hclust(as.dist(1-cor(gvals1, method="spearman")),method="complete")
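The three distance measures can be tried on a small simulated matrix laid out like gvals1 (samples in rows, genes in columns). The names and data below are hypothetical. Note that cor() correlates columns, so cor(t(gvals)) compares samples while cor(gvals) would compare genes.

```r
# Illustrative sketch only: three ways to compare samples, on simulated data
# (rows = samples, columns = genes).
set.seed(4)
gvals <- matrix(rnorm(6 * 20), nrow = 6,
                dimnames = list(paste0("sample", 1:6), paste0("gene", 1:20)))

man_dist      <- dist(gvals, method = "manhattan")   # sum of |differences|
pearson_dist  <- as.dist(1 - cor(t(gvals), method = "pearson"))
spearman_dist <- as.dist(1 - cor(t(gvals), method = "spearman"))

# Complete-linkage hierarchical clustering of samples
hc_pearson  <- hclust(pearson_dist,  method = "complete")
hc_spearman <- hclust(spearman_dist, method = "complete")
plot(hc_pearson, main = "Samples, 1 - Pearson correlation")
```

Using 1 - correlation as the distance groups samples with similar expression patterns even when their absolute levels differ, whereas the Manhattan distance is sensitive to the levels themselves.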
Creating Heatmaps
 hmcol <- colorRampPalette(brewer.pal(10, "RdBu"))(256)
 hmcol <- rev(hmcol)
 heatmap(as.matrix(manDist1),sym=TRUE,col=hmcol,
distfun=function(x)as.dist(x))
Cluster Analysis (Pearson): Data Sets 1, 2, 3
Cluster Analysis (Spearman): Data Sets 1, 2, 3
Heat Map Data Set #1
The samples are in rows and the features are in columns.
Heat Map Data Set #2
Heat Map Data Set #3
JUST FOR FUN

• Affymetrix front end or GUI tool – Expression Console

Meta-Analysis of Microarray Data

• Method used - combining p-values

Each study yields two measures of the significance of a
change in gene expression:
1. the value of the test statistic
2. the p-value
This method combines the p-values from all three studies
and reports a single combined p-value per gene.
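One common way to combine p-values across studies is the inverse normal (Stouffer) method, sketched below for a single gene. Two things are assumptions here: that the numbers in parentheses in the Introduction (22, 49, 28) are sample counts, and that this weighting resembles what the MAMA function used in this project does internally. The helper name combine_inverse_normal and the example p-values are made up.

```r
# Illustrative sketch only: inverse normal (Stouffer) combination of one-sided
# p-values for a single gene measured in three studies.
combine_inverse_normal <- function(p, n) {
  w <- sqrt(n) / sqrt(sum(n))      # weights; squared weights sum to 1
  z <- sum(w * qnorm(1 - p))       # weighted sum of z-scores
  1 - pnorm(z)                     # combined one-sided p-value
}

n_samples <- c(22, 49, 28)         # assumed sample counts per data set

# Consistent moderate evidence in all three studies strengthens when combined
p_consistent <- combine_inverse_normal(c(0.04, 0.03, 0.05), n_samples)

# Strong evidence in one study is diluted by null results in the others
p_diluted <- combine_inverse_normal(c(0.001, 0.60, 0.45), n_samples)

c(consistent = p_consistent, diluted = p_diluted)
```

Because larger studies receive larger weights, a large data set with no signal can pull the combined p-value of a gene above the significance threshold even when a smaller study supports it.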
Make expression sets, apply filters and merge:
 library(MAMA)
 eset1 = call.exprs(phenodata1, "rma")
 eset1 = nsFilter(eset1, require.entrez = TRUE, require.GOBP = TRUE,
   remove.dupEntrez = TRUE, var.func = IQR, var.cutoff = 0.5,
   feature.exclude = "^AFFX")
 eset1.data = eset1$eset
 # eset2.data and eset3.data are assumed to be built the same way from data sets 2 and 3
 esets = list(exprs(eset1.data), exprs(eset2.data), exprs(eset3.data))
 classes = list(pData(phenodata1)[,2], pData(phenodata2)[,2],
   pData(phenodata3)[,2])
Detecting differentially expressed genes
 pvalt = pvalcombination(esets, classes, moderated = "t", BHth = 0.01)
  DE IDD Loss IDR IRR
   0   0   92 NaN 100

RESULTS:
DE – the number of genes significant in the Meta-Analysis.
Our chosen group of genes shows none.
IDD – genes which are significant in Meta-Analysis but not in
individual studies. None are expected since no DE genes were
found.
Loss – genes significant in individual data sets but not in
Meta-Analysis. All 92 genes found in the individual studies fall
into this category.
IDR & IRR are the percentages of Integration Driven Discoveries
and Integration Driven Revisions in identified differentially
expressed genes. IDR is NaN here because DE is 0.
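The reported row can be reproduced from set sizes alone, assuming the usual definitions (IDR = IDD / DE and IRR = Loss / number of genes found in at least one individual study, both as percentages). The index values below are placeholders; only the set sizes (76, 0, 16 and 0) come from the results.

```r
# Illustrative sketch only: deriving the summary counts from index sets.
# The index values are placeholders; only the set sizes mirror the results.
study1 <- 1:76                     # 76 genes significant in data set 1
study2 <- integer(0)               # none in data set 2
study3 <- 77:92                    # 16 genes in data set 3
all_ind <- union(union(study1, study2), study3)
meta    <- integer(0)              # none significant in the meta-analysis

DE   <- length(meta)
IDD  <- length(setdiff(meta, all_ind))     # found only by integration
Loss <- length(setdiff(all_ind, meta))     # found individually, lost in meta
IDR  <- 100 * IDD / DE                     # 0 / 0 gives NaN in R
IRR  <- 100 * Loss / length(all_ind)

c(DE = DE, IDD = IDD, Loss = Loss, IDR = IDR, IRR = IRR)
```

Since 76 + 16 = 92 matches the size of the union, the genes found in data sets 1 and 3 do not overlap, and an IRR of 100 means every one of them was lost in the meta-analysis.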
DEG analysis – Summary of the combination of p-values
 summary(pvalt)
              Length Class  Mode
study1            76 -none- numeric
study2             0 -none- numeric
study3            16 -none- numeric
AllIndStudies     92 -none- numeric
Meta               0 -none- numeric
TestStatistic   5228 -none- numeric
• study1, study2 & study3 – indices of differentially
expressed genes in data sets 1 through 3.
• AllIndStudies – indices of genes that are DE in at least one data set.
• Meta – indices of DE genes found by meta-analysis.
• TestStatistic – the test statistics from the meta-analysis, one per gene (5228 genes here).
Conclusion

• All three data sets used the HGU133a array. All three data
sets came from breast cancer tissue or individuals who had
breast cancer, with the exception of the second set which
compared normal epithelial cells in breast cancer vs.
normal epithelial cells in non-breast cancer samples.
• Analyzing microarray data by meta-analysis can be
problematic. In some cases, such as this study, it can lead
to null results.
• Whether analyzing our data sets individually or by Meta-
Analysis, we come to the conclusion that the second
data set, which happened to be the largest, did not
yield any differentially expressed genes.
References
• Clark AG, Glanowski S, Nielsen R, Thomas PD, Kejariwal A, Todd
MA, Tanenbaum DM, Civello D, Lu F, Murphy B, Ferriera S, Wang G,
Zheng X, White TJ, Sninsky JJ, Adams MD, Cargill M. (2003).
Inferring nonneutral evolution from human-chimp-mouse orthologous
gene trios. Science, 302(5652):1960-3.
• Hahne F, Huber W, Gentleman R, Falcon S. (2008). Bioconductor
Case Studies. New York, NY: Springer Science and Business Media,
LLC.
• Ihnatova I. (2010). MAMA: a 9 in 1 R package for Meta-Analysis of
MicroArray, October 1, 2010.
• http://cran.r-project.org/
• http://en.wikipedia.org/wiki/BRCA1
• http://en.wikipedia.org/wiki/BRCA2
• http://www.cancer.gov/cancertopics/factsheet/Risk/BRCA
• http://www.ncbi.nlm.nih.gov/geo/
• http://www.nextbio.com/b/nextbio.nb
