Methods of Microarray Data Analysis III: Papers from CAMDA ’02, 1st Edition

The document is a compilation of papers from the third CAMDA conference focusing on microarray data analysis, emphasizing data quality assurance and various analytical methodologies. It includes tutorials on gene expression biology, quality monitoring, and outlier detection, as well as presentations on specific research findings related to gene expression differences and normalization methods. The volume serves as a resource for researchers to enhance their understanding and improve practices in microarray data analysis.

Methods of Microarray Data Analysis III: Papers from CAMDA ’02, 1st Edition


eBook ISBN: 0-306-48354-8
Print ISBN: 1-4020-7582-0

©2004 Springer Science + Business Media, Inc.

Print ©2003 Kluwer Academic Publishers


Boston

All rights reserved

No part of this eBook may be reproduced or transmitted in any form or by any means, electronic,
mechanical, recording, or otherwise, without written consent from the Publisher

Created in the United States of America

Visit Springer's eBookstore at: http://www.ebooks.kluweronline.com
and the Springer Global Website Online at: http://www.springeronline.com
Contents

Contributing Authors
Preface
Introduction

SECTION I TUTORIALS

THE BIOLOGY BEHIND GENE EXPRESSION: A BASIC TUTORIAL
MICHAEL F. OCHS AND ERICA A. GOLEMIS

MONITORING THE QUALITY OF MICROARRAY EXPERIMENTS
KEVIN R. COOMBES, JING WANG, LYNNE V. ABRUZZO

OUTLIERS IN MICROARRAY DATA ANALYSIS
RONALD K. PEARSON, GREGORY E. GONYE, AND JAMES S. SCHWABER

SECTION II BEST PRESENTATION AWARD

ORGAN-SPECIFIC DIFFERENCES IN GENE EXPRESSION AND UNIGENE ANNOTATIONS DESCRIBING SOURCE MATERIAL
DAVID N. STIVERS, JING WANG, GARY L. ROSNER, AND KEVIN R. COOMBES

SECTION III ANALYZING IMAGES

CHARACTERIZATION, MODELING, AND SIMULATION OF MOUSE MICROARRAY DATA
DAVID S. LALUSH

TOPOLOGICAL ADJUSTMENTS TO GENECHIP EXPRESSION VALUES
ANDREY PTITSYN

SECTION IV NORMALIZING RAW DATA

COMPARISON OF NORMALIZATION METHODS FOR CDNA MICROARRAYS
LILING WARREN, BEN LIU

SECTION V CHARACTERIZING TECHNICAL AND BIOLOGICAL VARIANCE

SIMULTANEOUS ASSESSMENT OF TRANSCRIPTOMIC VARIABILITY AND TISSUE EFFECTS IN THE NORMAL MOUSE
SHIBING DENG, TZU-MING CHU, AND RUSS WOLFINGER

HOW MANY MICE AND HOW MANY ARRAYS? REPLICATION IN MOUSE CDNA MICROARRAY EXPERIMENTS
XIANGQIN CUI AND GARY A. CHURCHILL

BAYESIAN CHARACTERIZATION OF NATURAL VARIATION IN GENE EXPRESSION
MADHUCHHANDA BHATTACHARJEE, COLIN PRITCHARD, MIKKO J. SILLANPÄÄ AND ELJA ARJAS

SECTION VI INVESTIGATING CROSS HYBRIDIZATION ON OLIGONUCLEOTIDE MICROARRAYS

QUANTIFICATION OF CROSS HYBRIDIZATION ON OLIGONUCLEOTIDE MICROARRAYS
LI ZHANG, KEVIN R. COOMBES, LIANCHUN XIAO

ASSESSING THE POTENTIAL EFFECT OF CROSS-HYBRIDIZATION ON OLIGONUCLEOTIDE MICROARRAYS
SEMAN KACHALO, ZAREMA ARBIEVA AND JIE LIANG

WHO ARE THOSE STRANGERS IN THE LATIN SQUARE?
WEN-PING HSIEH, TZU-MING CHU, AND RUSS WOLFINGER

SECTION VII FINDING PATTERNS AND SEEKING BIOLOGICAL EXPLANATIONS

BAYESIAN DECOMPOSITION CLASSIFICATION OF THE PROJECT NORMAL DATA SET
T. D. MOLOSHOK, D. DATTA, A. V. KOSSENKOV, M. F. OCHS

THE USE OF GO TERMS TO UNDERSTAND THE BIOLOGICAL SIGNIFICANCE OF MICROARRAY DIFFERENTIAL GENE EXPRESSION DATA
RAMÓN DÍAZ-URIARTE, FÁTIMA AL-SHAHROUR, AND JOAQUÍN DOPAZO

Acknowledgments
Index
Contributing Authors

Abruzzo, Lynne V., University of Texas M.D. Anderson Cancer Center,
Houston, TX
Al-Shahrour, Fátima, Centro Nacional de Investigaciones Oncológicas,
(CNIO), (Spanish National Cancer Centre), Madrid, Spain
Arbieva, Zarema, University of Illinois at Chicago, Chicago, IL
Arjas, Elja, Rolf Nevanlinna Institute, University of Helsinki, Finland
Bhattacharjee, Madhuchhanda, Rolf Nevanlinna Institute, University of
Helsinki, Finland
Chu, Tzu-Ming, SAS Institute, Cary, NC
Churchill, Gary A., The Jackson Laboratory, Bar Harbor, Maine
Coombes, Kevin R., University of Texas M.D. Anderson Cancer Center,
Houston, TX
Cui, Xiangqin, The Jackson Laboratory, Bar Harbor, Maine
Datta, D., Fox Chase Cancer Center, Philadelphia, PA
Deng, Shibing, SAS Institute, Cary, NC
Díaz-Uriarte, Ramón, Centro Nacional de Investigaciones Oncológicas,
(CNIO), (Spanish National Cancer Centre), Madrid, Spain
Dopazo, Joaquín, Centro Nacional de Investigaciones Oncológicas, (CNIO),
(Spanish National Cancer Centre), Madrid, Spain
Golemis, Erica A., Fox Chase Cancer Center, Philadelphia, PA
Gonye, Gregory E., Thomas Jefferson University, Philadelphia, PA
Hsieh, Wen-Ping, North Carolina State University, Raleigh, NC
Kachalo, Seman, University of Illinois at Chicago, Chicago, IL
Kossenkov, A. V., Fox Chase Cancer Center, Philadelphia, PA and Moscow
Physical Engineering Institute, Moscow, Russian Federation
Liang, Jie, University of Illinois at Chicago, Chicago, IL
Liu, Ben, Bio-informatics Group Inc., Cary, NC
Moloshok, T. D., Fox Chase Cancer Center, Philadelphia, PA
Ochs, Michael F., Fox Chase Cancer Center, Philadelphia, PA
Pearson, Ronald K., Thomas Jefferson University, Philadelphia, PA
Pritchard, Colin, Fred Hutchinson Cancer Research Centre, Seattle, WA
Ptitsyn, Andrey, Pennington Biomedical Research Center
Rosner, Gary L., University of Texas M.D. Anderson Cancer Center,
Houston, TX
Schwaber, James S., Thomas Jefferson University, Philadelphia, PA
Sillanpää, Mikko J., Rolf Nevanlinna Institute, University of Helsinki,
Finland
Stivers, David N., University of Texas M.D. Anderson Cancer Center,
Houston, TX
Wang, Jing, University of Texas M.D. Anderson Cancer Center, Houston,
TX
Warren, Liling, Bio-informatics Group Inc., Cary, NC
Wolfinger, Russ, SAS Institute, Cary, NC
Xiao, Lianchun, The University of Texas MD Anderson Cancer Center,
Houston, TX
Zhang, Li, The University of Texas MD Anderson Cancer Center, Houston,
TX
Preface

As microarray technology has matured, data analysis methods have
advanced as well. However, microarray results can vary widely from lab to
lab as well as from chip to chip, with many opportunities for errors along the
path from sample to data. The third CAMDA conference, held in November
2002, pointed out the increasing need for data quality assurance
mechanisms through real-world problems with the CAMDA datasets. Thus,
the third volume of Methods of Microarray Data Analysis emphasizes many
aspects of data quality assurance.
We highlight three tutorial papers to assist with a basic understanding of
underlying principles in microarray data analysis, and add twelve papers
presented at the conference. As editors, we have not comprehensively edited
these papers, but have provided comments to the authors to encourage clarity
and expansion of ideas. Each paper was peer-reviewed and returned to the
author for further revision.
We do not propose these methods as the de facto standard for microarray
analysis. Rather, we present them as starting points for discussion to
further the science of microarray data analysis. The CAMDA conference
continues to bring to light problems, solutions, and new ideas in this arena
and offers a forum for continued advancement of the art and science of
microarray data analysis.

Kimberly Johnson

Simon Lin
Introduction

A comparative study of analytical methodologies using a standard data
set has proven fruitful in microarray analysis. To provide a forum for these
comparisons, the third Critical Assessment of Microarray Data Analysis
(CAMDA) conference was held in November 2002. Over 170 researchers
from eleven countries heard twelve presentations on topics such as data
quality analysis, image analysis, data normalization, expression variance,
cross hybridization and pattern searching. The conference has evolved in its
third year, just as the science of microarrays has developed. While initial
microarray data analysis techniques focused on classification exercises
(CAMDA ’00), and later on pattern extraction (CAMDA ’01), this year’s
conference, by necessity, focused on data quality issues. This shift in focus
follows the maturation of microarray technology as the detection of data
quality problems has become a prerequisite for data analysis. Problems such
as background noise determination, faulty fabrication processes, and, in our
case, errors in data handling, were highlighted at the conference.
The CAMDA ’02 conference provided a real-world lesson on data
quality control and saw significant development of the cross-hybridization
models. In this volume, we present three tutorial chapters and twelve paper
presentations. First, Michael Ochs and Erica Golemis present a tutorial
called “The Biology Behind Gene Expression.” This discussion is for non-
biologists who want to know more about an intelligent machine called the
cell. This machinery is extremely complex; a glossary in this tutorial
provides the novice with an overview of important terms related to
microarrays, while the rest of the paper details the biological processes that
impact microarray analysis. Next is a tutorial on methods of data quality
control by Kevin Coombes. We invited Dr. Coombes to submit this tutorial,
titled “Monitoring the Quality of Microarray Experiments,” as an expansion
of his presentation at the conference, which is also prominently featured.
The last tutorial, by Ronald Pearson, is titled “Outliers in Microarray Data
Analysis.” This tutorial addresses the issue of quality control by identifying
outliers and suggesting methods to deal with technical and biological
variations in microarray data.
As always, we are happy to highlight the paper voted by attendees as the
Best Presentation. This year, the award went to:

David N. Stivers, Jing Wang, Gary L. Rosner, and Kevin R. Coombes
University of Texas M.D. Anderson Cancer Center, Houston, TX
“Organ-specific Differences in Gene Expression and UniGene
Annotations Describing Source Material”

for their rigorous scrutiny of data quality before starting data analysis.
Presented by Kevin Coombes, their paper not only revealed the existence of
errors in the Project Normal data set, but also specified the exact nature of
the problems and included the methods used to detect these problems. See
below for more details on these data set errors.

CAMDA 2002 Data Sets

The scientific committee chose two data sets for CAMDA ’02. The first,
called Project Normal, came from the Fred Hutchinson Cancer Center and
showed the variation of baseline gene expression levels in the liver, kidney,
and testis of six normal mice. Using a 5406-clone spotted cDNA
microarray, Pritchard et al. concluded that replication is necessary in
microarray experiments. The second data set came from the Latin Square
Study at Affymetrix Inc. This benchmark data set was created to develop
statistical algorithms for microarrays. Sets of fourteen genes with known
concentrations were spiked into a complex background solution and
hybridized on Affymetrix chips. Data were obtained with replicates, and both
human and E. coli chips were studied.
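The spike-in layout described above can be pictured with a small sketch. It builds a cyclic Latin square in which each of fourteen spiked gene groups receives each concentration exactly once across fourteen experiments. The function name and the concentration ladder are illustrative assumptions, not the actual Affymetrix protocol.

```python
def cyclic_latin_square(levels):
    """Build an n x n cyclic Latin square from a list of n distinct levels.

    Row i is the list of levels rotated left by i positions, so every level
    appears exactly once in each row and once in each column.
    """
    n = len(levels)
    return [[levels[(row + col) % n] for col in range(n)] for row in range(n)]


# Hypothetical concentration ladder (14 levels, arbitrary units); these
# values are illustrative and not taken from the Affymetrix study.
concentrations = [0, 0.25, 0.5, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]

# design[experiment][gene_group] = spiked concentration
design = cyclic_latin_square(concentrations)

for experiment, row in enumerate(design[:3], start=1):
    print(f"experiment {experiment}: {row}")
```

Because every gene group passes through every concentration, an analysis method can be scored against known truth without confounding a particular gene with a particular concentration.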
As mentioned above, there were errors in the Project Normal dataset that
were undetected until CAMDA abstracts were submitted. Once we received
the Stivers et al. abstract, we asked the original Project Normal authors to
confirm their findings. The errors in the data set were verified and after
much discussion among the Scientific Committee members, a decision was
made to keep the contest going to allow the Stivers group to report and
discuss their finding of data abnormalities at the conference. Actually, many
groups revealed various aspects of the data abnormalities, but the Stivers
group not only realized that a problem existed, but also identified the
specific problem. Colin Pritchard, representing Project Normal, confirmed
that the problems in the dataset were indeed a result of incorrectly merging
the data with the annotations, resulting in mismatched row/column
combinations. In addition, several slides had a small number of misaligned
grids. These problems affected about one-third of the genes (though different
sets) in the testis and liver data. Pritchard also noted that a re-analysis of the data
with the corrected data sets showed that the results were not notably
different from the original conclusions. For the record, both the original and
the corrected data sets are available at the CAMDA conference website for
researchers who might be interested in “data forensics”. We extend our
thanks to Colin Pritchard, Li Hsu, and Peter Nelson at the Fred Hutchinson
Cancer Center for their assistance and professionalism in handling this
discovery and allowing the conference to proceed as planned. They were
most gracious in their contributions to the conference.
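The kind of merge error described above (measurements attached to the wrong annotation rows) can be guarded against by joining on an explicit grid position rather than on file order. The following is a minimal sketch under that assumption; the function name, the (row, column) key, and the sample values are hypothetical, and this is not the procedure used by Stivers et al. or by the Project Normal authors.

```python
def merge_on_grid_position(measurements, annotations):
    """Join spot intensities to clone annotations on an explicit (row, col) key.

    Returns the merged records plus the keys found on only one side, so that
    a mismatch is reported instead of being silently absorbed.
    """
    merged = {}
    missing_annotation = []
    for key, intensity in measurements.items():
        if key in annotations:
            merged[key] = (annotations[key], intensity)
        else:
            missing_annotation.append(key)
    missing_measurement = [key for key in annotations if key not in measurements]
    return merged, missing_annotation, missing_measurement


# Hypothetical toy data: (grid row, grid column) -> value
measurements = {(1, 1): 250.0, (1, 2): 310.5, (2, 1): 98.2}
annotations = {(1, 1): "clone_A", (1, 2): "clone_B", (2, 2): "clone_D"}

merged, no_annotation, no_measurement = merge_on_grid_position(
    measurements, annotations
)
print("merged:", merged)
print("spots without annotation:", no_annotation)
print("annotations without spot:", no_measurement)
```

A positional merge (line 1 with line 1, line 2 with line 2) would have paired every spot with some annotation and raised no warning, which is why keyed joins with an explicit report of unmatched entries are a useful habit.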

Organization of this Volume

After the three tutorial papers, the first conference
paper is naturally the one voted Best Presentation. We then divide the book into
subject areas covering image analysis, data normalization, variance
characterization, cross hybridization, and finally pattern searching. At the
end of this introduction, you will also find a link to the web companion to
this volume.

Analyzing Images
Raw microarray data first exists as a scanned image file. Differences in
spot size, non-uniformity of spots, heterogeneous backgrounds, dust and
scratches all contribute to variations at the image level. In Chapter 5, David
Lalush characterizes such parameters and discusses ways to simulate
additional microarray images for use in developing image analysis
algorithms.
On the Affymetrix platform, hybridization operators have observed that
the images tend to form some kind of mysterious pattern. In Chapter 6,
Andrey Ptitsyn argues that there indeed is a background pattern. He further
postulates that the pattern might be caused by the fluid dynamics in the
hybridization chamber.

Normalizing Raw Data

Normalization has been recognized as a crucial step in data
pre-processing. Do some mathematical operations truly allow us to remove the
systematic variation that might skew our analysis, or are we distorting the
data to create illusions? The paper by Liling Warren and Ben Liu
investigates seven different ways to normalize microarray data. Results
show that normalization has a greater impact than expected on detecting
differential expression: the same downstream detection method can identify
anywhere from 23 to 451 genes, depending on the pre-processing of the data.
Suggestions to guide researchers in the normalization process are provided.
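As a concrete picture of what a normalization step does, the sketch below applies global median centering to two-channel log ratios. This is one common and very simple approach, chosen here only for illustration; it is not necessarily among the seven methods compared by Warren and Liu, and the function name and intensity values are assumptions.

```python
import math
from statistics import median


def median_center_log_ratios(red, green):
    """Return log2(red/green) ratios shifted so that their median is zero.

    Assumes most genes are not differentially expressed, so a nonzero median
    log ratio reflects a systematic channel bias rather than biology.
    """
    log_ratios = [math.log2(r / g) for r, g in zip(red, green)]
    shift = median(log_ratios)
    return [value - shift for value in log_ratios]


# Hypothetical background-corrected intensities for five spots
red = [200.0, 400.0, 800.0, 1600.0, 100.0]
green = [100.0, 100.0, 100.0, 100.0, 100.0]

normalized = median_center_log_ratios(red, green)
print(normalized)
```

Any gene list obtained by thresholding these values depends on the shift that was removed, which is one way to see why the choice of pre-processing can change the number of genes called differentially expressed.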

Characterizing Technical and Biological Variance

The Project Normal paper [PNAS 98:13266-77, 2001] showed us that
even for animals under ‘normal’ conditions, gene expression levels
fluctuate from one animal to another. This biological variation complicates
the interpretation of the expression differences we find on the microarray.
Microarray data can also include technical variation produced during the
measurement process. Deng et al. describe two linear mixed models to assess
variability and significance in Chapter 8. Using a similar mixed model
approach, Cui et al. calculate the number of replicates needed to detect
certain changes. This is of great interest to experimental biologists.
Resources are usually limited, either in the total number of microarrays we
can afford or in the number of cells we can obtain. The optimal resource
allocation formula by Cui et al. lets us answer questions such as: Should we
use more mice or more arrays? Should we pool mice? Chapter 9 provides
some answers to these questions.
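The trade-off between mice and arrays can be illustrated with a simplified nested model in which each measurement is the sum of a mouse effect and an independent array (technical) effect. Under that assumption the variance of a group mean is var_mouse / m + var_array / (m * n) for m mice with n arrays each. This is a textbook sketch, not the allocation formula derived by Cui and Churchill, and the variance values below are hypothetical.

```python
def variance_of_group_mean(var_mouse, var_array, n_mice, arrays_per_mouse):
    """Variance of the mean expression for one group in a nested design.

    Biological (mouse) variance is averaged over mice only; technical (array)
    variance is averaged over every array in the group.
    """
    total_arrays = n_mice * arrays_per_mouse
    return var_mouse / n_mice + var_array / total_arrays


# Hypothetical variance components on the log scale
VAR_MOUSE = 0.04
VAR_ARRAY = 0.01

# Three ways to spend the same budget of 12 arrays
for n_mice, arrays_per_mouse in [(12, 1), (6, 2), (3, 4)]:
    variance = variance_of_group_mean(
        VAR_MOUSE, VAR_ARRAY, n_mice, arrays_per_mouse
    )
    print(f"{n_mice} mice x {arrays_per_mouse} arrays: variance = {variance:.5f}")
```

With the array budget fixed, the technical term is the same in all three designs, so only the number of mice changes the result; in this simplified model, adding mice always helps more than adding arrays per mouse whenever the biological variance is nonzero.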
Estimating location and scale from experimental measurements has been
one of the major themes in statistics. Most of the previous work on
microarrays focused on the classification of the expression changes under
different conditions. Bhattacharjee et al. investigated the classification of
intrinsic biological variance of gene expression. By using a Bayesian
framework, the authors support the hypothesis that some genes by nature
exhibit highly varied expression. This work is featured in Chapter 10.

Investigating Cross Hybridization on Oligonucleotide Microarrays

Quantitative binding of genes on the chip surface is a fundamental issue
of microarrays [Nature Biotechnology 17:788-792, 1999]. Characterizing
