MicroRNA Profiling in Cancer: A Bioinformatics Perspective, 1st Edition
Web: www.panstanford.com
For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center,
Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from
the publisher.
ISBN-13 978-981-4267-01-4
ISBN-10 981-4267-01-5
Printed in Singapore.
To the Memory of my Mentor
Dr. Andrei Yakovlev
PREFACE
There is a new kid on the block taking the field of cancer research by storm: a
new type of short, noncoding, single-stranded RNA molecule, microRNA, has
recently emerged as one of the most important new classes of cellular regulators.
First discovered in 1993 by Victor Ambros in C. elegans, these small RNAs
were termed microRNAs in 2001. Just seven years later, the importance of the
microRNA discovery was highlighted by the Lasker Award given to Victor
Ambros, Gary Ruvkun, and David Baulcombe in 2008.
It is already evident that the discovery of microRNA has created a paradigm
shift in post-genomics biology, not only for scientists accustomed to the traditional
central dogma of molecular biology but also for researchers studying human
diseases and accustomed to the traditional genetics approach of studying one gene at
a time. The ability of microRNAs to control large groups of genes and impose
global post-transcriptional regulation of many (if not all) important cellular
processes in development, cell proliferation and differentiation has opened up a
new dimension and uncovered an astonishing complexity of intracellular
regulatory circuitry.
With the advancement of high-throughput technologies over the past 3 years, it is
now possible to profile and sequence the entire human microRNome. The latest
advances in microRNA research have provided strong evidence that aberrant
expression of specific miRNAs is associated with a broad spectrum of human
diseases, such as cancer, diabetes, cardiovascular diseases, psychological
disorders, and others. The first evidence that microRNA could be involved in cancer
came in 2002. The number of publications devoted to microRNA in cancer has
been growing exponentially ever since. In 2008 alone there were nearly 600
original research papers published on the role of miRNA in cancer with over 160
papers related to miRNA profiling in cancer.
Global microRNome profiling has shown drastic changes in the expression of
multiple miRNAs in all human cancers tested to date. MicroRNA expression
signatures have often provided a more accurate method of classifying cancer
subtypes than transcriptome profiling of all known protein-coding genes.
MicroRNA profiling also allows classification of different stages in tumor
progression and in some cases can predict the outcome of a disease. As researchers
continue to study microRNA expression in cancer and focus on the most relevant
microRNAs, the further advancement of microRNA detection technologies and
computational bioinformatics methodologies is critical for successful
identification of the most informative diagnostic markers and drug targets among
the tissue-specific microRNAs.
This book presents a collection of invited chapters written by experts in the
emerging field of miRNomics from the unique perspective of bioinformatics,
computational and systems biology. The field is still in flux, and our knowledge
and understanding of microRNAs are continuously and rapidly evolving. The idea
behind this project was to assemble a snapshot of the current state of the art in
microRNA research in the form of critical reviews and summaries of recently
published results on microRNA profiling in cancer.
This volume covers a wide range of topics related to microRNA profiling,
grouped into four overlapping categories:
i. Chapters 1 through 4 provide detailed technical guides as well as tips and
tricks for using the three major technology platforms that are currently available
for microRNA profiling: detection of microRNA with Agilent microarrays
(chapter 1) and Exiqon LNA-enhanced microarrays (chapter 2);
high-throughput real-time quantitative PCR (chapter 3); and next-generation
sequencing (chapter 4).
ii. Chapters 4, 5, 7 and 9 provide overviews and critical analyses of microRNA
expression in several human cancers, such as hematological malignancies
(chapters 4 and 7), prostate cancer (chapter 5), and colon, lung, breast, pancreatic, and
liver cancers (chapter 9).
iii. Chapters 6 and 7 present in-depth reviews of microRNome architecture,
discussing the function of intronic microRNAs and the intriguing hypothesis of
“a kiss of microRNA” (chapter 6) as well as a genome-wide analysis of
microRNA gene locations showing a striking correlation with genomically
unstable loci and retroviral integration sites (chapter 7).
iv. The final section of the book is devoted to advanced systems biology topics:
analysis of microRNA regulatory networks and their interconnection
with transcription factor-mediated gene networks (chapter 8),
functional profiling of co-expressed microRNAs in cancer and pathway
analysis of microRNA targets (chapter 9), and important progress in
mathematical modeling and computational analysis of the dynamics of
post-transcriptional gene regulation by microRNAs (chapter 10).
To summarize, this first-of-its-kind book provides a comprehensive review of
well over a hundred recent publications in the area of microRNA profiling in
Yuriy Gusev
Oklahoma City, Oklahoma, USA
CONTENTS
Preface
Glossary
Index
CHAPTER 1
Measurement and Interpretation of miRNA Expression Profiles
B. Curry and R. Ach
1. Background
Many studies are revealing the complexity of the roles miRNAs play in normal
and abnormal animal cell development, differentiation, and regulation (for
examples of recent reviews, see [1-5]). Many of the recent findings involve
measurements of miRNA expression levels. It is important for such miRNA
measurements to be reproducible and quantitatively comparable from sample to
sample. In a typical experimental design, cells from blood, tissue, or cell culture
are isolated, their RNA is extracted and quantitated, and the amount of each
targeted miRNA in the RNA sample is measured. The measured amount of each
miRNA is then normalized to some measure of the original sample amount, often
to a proxy for the cell count of the sample [6]. Statistical techniques are then applied
to the data, often in conjunction with clinical or other expression data, to answer
the research questions addressed in the experiment. Each of these steps adds
statistical noise and potential bias to the results, and it is important to understand
these sources of error in order to choose appropriate statistics for evaluating
observed correlations and variations.
In this chapter, we discuss some of the important considerations for making
accurate, reproducible miRNA measurements. We compare microarray and
qPCR measurements, and discuss the importance of sample preparation methods and
RNA quality. We then discuss normalization methods. Appropriate normalization
of raw microarray data can reduce bias in comparisons among different samples.
Once measurement noise has been carefully characterized and RNA quality is
well controlled, most residual variation in miRNA profiles is due either to
biological noise or to the biological phenomenon that is of interest in the study.
profiling of mature miRNAs directly from total RNA without the use of size
fractionation or RNA amplification [16, 26, 27]. The labeling reaction involves ligation
of one labeled cytosine to the 3’ end of each RNA sequence in the sample. The
labeling is performed under denaturing conditions, ensuring a high labeling yield,
minimal sequence bias [16], and consistently reproducible efficiency for every
miRNA sequence [26, 27].
Agilent microarray probe design features base-pairing with the additional
nucleotide incorporated at the 3’ end of the miRNA during labeling [16]. Probes are
designed using empirical melting-point determination, making the platform
capable of single-nucleotide discrimination in the miRNA sequences [16]. Hairpin
structures incorporated at the 5’ end of the probes allow the binding of the mature
miRNAs while discouraging the binding of longer RNAs in the total RNA
sample [16, 26, 27]. The labeled sample is hybridized to the microarray under
conditions that approach equilibrium, with a substantial and reproducible fraction
of the labeled targets hybridized to the array [16].
Agilent arrays comprise up to four different sequences probing each target
miRNA, each replicated four or more times. They also include a large number of
negative controls, which are designed not to hybridize to any labeled sequences,
and are used for background subtraction and estimation of background noise.
After hybridization, washing, and scanning, the image is processed as follows [28].
First the array features are located, and the mean signal for each feature is
computed as the robust average of the counts per pixel reported for central pixels
of the feature. Background fluorescence is estimated by a robust RMS fit of a
surface through the negative control features, along with other features reporting
weak signals not significantly higher than the average of the negative controls.
The height of this fitted surface at each feature location is used to estimate the
background for that feature, and the standard deviation of the residuals from the
surface fit is used to estimate the noise in the background. This background is
then subtracted from the mean signal of each feature, producing the background-
subtracted signal (BGSubSignal). Since a large fraction of the miRNA genes are
not expressed in a typical sample, the distribution of background-subtracted
signals is expected to appear as a roughly Gaussian distribution centered near
zero, and tailing towards the high end, where genes expressed at low levels
contribute to the distribution (Figure 1).
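The background-subtraction step can be illustrated with a deliberately simplified sketch. It assumes a flat background, so the fitted surface collapses to a single robust level (the median of the negative controls), and it uses the scaled median absolute deviation as the robust noise estimate. Both choices, along with the function name and feature labels, are assumptions made for illustration and are not the algorithm of the Agilent software.

```python
from statistics import median


def subtract_background(mean_signals, negative_control_ids):
    """Return (background-subtracted signals, background noise estimate).

    mean_signals: dict mapping feature id -> mean signal (counts per pixel).
    negative_control_ids: ids of the negative-control features.
    Assumption: the background is flat, so the fitted surface reduces to
    one robust level (the median of the negative controls).
    """
    controls = [mean_signals[fid] for fid in negative_control_ids]
    background = median(controls)
    # Robust spread: median absolute deviation, scaled to match a Gaussian sigma.
    mad = median(abs(value - background) for value in controls)
    noise = 1.4826 * mad
    bg_sub = {fid: value - background for fid, value in mean_signals.items()}
    return bg_sub, noise


# Hypothetical feature signals; "neg5" is an outlying control.
signals = {"neg1": 48.0, "neg2": 50.0, "neg3": 52.0, "neg4": 50.0,
           "neg5": 95.0, "miR-a": 250.0, "miR-b": 51.0}
bg_sub, noise = subtract_background(
    signals, ["neg1", "neg2", "neg3", "neg4", "neg5"])
```

Because the median and the median absolute deviation are robust statistics, the outlying control barely shifts the background level, and a weakly expressed feature such as the hypothetical miR-b ends up close to zero, inside the background distribution.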
Figure 1. Distribution of weak signals after background subtraction of a typical array measurement.
The width of this distribution is a measure of the background noise on the array, which determines
the detection limit.
The BGSubSignals of replicated features with the same probe sequence are
then summed, after applying a statistical test to reject outliers (at p < 0.05). The
sum of BGSubSignals for each probe sequence is multiplied by the number of
pixels in each feature and by a scaling constant, and the product is reported as the
TotalProbeSignal for that probe. Next, the TotalProbeSignals for all the different
probes targeting a given miRNA gene are summed to produce the
TotalGeneSignal for that gene, which is reported in the GeneView summary file.
The TotalGeneSignal is proportional to the total number of labeled sequences
hybridized to the probes targeting each miRNA. Details of the algorithms used to
accomplish these operations are available [28].
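The two-level summation described above can be sketched as follows. The function name, the tuple layout, and the values of the pixel count and scaling constant are hypothetical, and the outlier test is assumed to have been applied already.

```python
from collections import defaultdict


def total_gene_signals(features, pixels_per_feature, scale=1.0):
    """Sum replicate features into probe signals, then probes into gene signals.

    features: iterable of (gene, probe_sequence, bg_sub_signal) tuples that
    have already passed outlier rejection (not reproduced here).
    pixels_per_feature and scale are placeholders for the array constants.
    """
    probe_sums = defaultdict(float)
    for gene, probe, bg_sub_signal in features:
        probe_sums[(gene, probe)] += bg_sub_signal

    gene_totals = defaultdict(float)
    for (gene, _probe), summed in probe_sums.items():
        total_probe_signal = summed * pixels_per_feature * scale
        gene_totals[gene] += total_probe_signal
    return dict(gene_totals)


# Hypothetical features: two replicates of one probe and a second probe for miR-x.
features = [
    ("miR-x", "probeA", 10.0),
    ("miR-x", "probeA", 12.0),
    ("miR-x", "probeB", 5.0),
    ("miR-y", "probeC", 1.0),
]
totals = total_gene_signals(features, pixels_per_feature=2, scale=0.5)
```

Summing rather than averaging means that no relative weighting of probes with different sequences has to be chosen.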
The sum of signals of all features targeting a miRNA gene is reported, rather
than their average, for two main reasons. First, it is not obvious how to
appropriately average the signals from probes of varying sequences, which are
expected to have significantly different sensitivities to the target miRNA.
Second, for a particular array design, the sum of signals is proportional to their
unweighted average; however, array designs can change on a regular basis,
tracking the evolving consensus about which miRNA sequences are important [29].
When array designs change, the TotalGeneSignal is more stable than an
average. For example, 369 genes on the Agilent Human V2 array design (P/N
G4470B) are targeted by the same probe sequences as those on the older Human
V1 array design (P/N G4470A), but there are only 16 array features targeting
each gene on the newer design (V2), rather than 20 as on the older design (V1).
When 100 ng of total RNA from brain or placenta was hybridized to the V1 and
V2 array designs, the 75th percentile of the mean BGSubSignals reported for
these 369 genes increased from 89 to 104 in brain, and from 164 to 198 in
placenta. These values are consistent with the hypothesis that nearly the same
amount of labeled target is hybridizing to 80% of the number of features, thereby
proportionately increasing the average. By contrast, the 75th percentile of the
TotalGeneSignal reported for these genes remained nearly unchanged, decreasing
slightly from 116 to 109 for brain and 211 to 207 for placenta (unpublished data
courtesy of Petula D’Andrade and Stephanie Fulmer-Smentek, Agilent
Technologies).
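The arithmetic behind this comparison can be checked directly. If the same amount of labeled target is spread over 16 features instead of 20, the mean signal per feature should rise by a factor of 20/16 = 1.25, while the summed signal should stay constant. The short sketch below uses only the figures quoted above; the variable and function names are illustrative.

```python
V1_FEATURES, V2_FEATURES = 20, 16
# Expected rise in the per-feature mean if the total hybridized target is unchanged.
expected_mean_ratio = V1_FEATURES / V2_FEATURES

# 75th percentiles quoted in the text: (V1 mean, V2 mean, V1 total, V2 total).
reported = {
    "brain": (89, 104, 116, 109),
    "placenta": (164, 198, 211, 207),
}


def design_change_ratios(v1_mean, v2_mean, v1_total, v2_total):
    """Return the V2/V1 ratios of the mean signal and of the TotalGeneSignal."""
    return v2_mean / v1_mean, v2_total / v1_total


for tissue, values in reported.items():
    mean_ratio, total_ratio = design_change_ratios(*values)
    print(f"{tissue}: mean x{mean_ratio:.2f}, total x{total_ratio:.2f}")
```

The mean signal rises by factors of about 1.17 (brain) and 1.21 (placenta), approaching the expected 1.25, whereas the TotalGeneSignal changes by factors of only about 0.94 and 0.98. Since these are ratios of percentiles rather than of individual genes, only approximate agreement is to be expected.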
The linear dynamic range and detection limit of an assay can be assessed from
the response to a series of standard analytes of known concentrations. Figure 2
shows Agilent miRNA microarray results for 67 synthetic miRNA sequences,
serially diluted and hybridized (data courtesy of Hui Wang, Agilent
Laboratories). Nearly all sequences show a linear dynamic range of five logs.
The linear dynamic range is limited at the low end by background noise on the
array, and at the high end by partial saturation of the binding sites on the array.
Such titration curves can be used to measure the absolute sensitivity of the assay
to each miRNA target [30]. None of the miRNAs measured in this dataset exhibit
high-end saturation with 1 fmol input levels, and most are detected above
background at 10-100 zmol input.
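One simple way to check linearity across such a dilution series is to fit a line to log10(signal) against log10(input amount). A slope close to 1 indicates a proportional response. The sketch below uses an ordinary least-squares fit and synthetic, perfectly linear data. The function name and the numbers are illustrative and are not taken from Figure 2.

```python
import math


def loglog_slope(input_amounts, signals):
    """Least-squares slope of log10(signal) against log10(input amount)."""
    xs = [math.log10(amount) for amount in input_amounts]
    ys = [math.log10(signal) for signal in signals]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / sxx


# Synthetic dilution series from 10 zmol to 1 fmol (1 fmol = 1e6 zmol), five logs.
amounts_zmol = [1e1, 1e2, 1e3, 1e4, 1e5, 1e6]
synthetic_signals = [0.5 * amount for amount in amounts_zmol]
slope = loglog_slope(amounts_zmol, synthetic_signals)
```

With real data, points near the detection limit or near saturation would pull the slope below 1, so the fit is best restricted to the range where the response is visibly linear.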
As with other linear measurements, proper background subtraction is essential
to maintaining response linearity at low signal levels [31]. Under-subtracting
background causes low-end curvature in the signal response plot. Since there are
often many miRNAs not expressed in a given sample, many miRNAs report
signals less than zero after background subtraction, which can cause difficulties if
the data are log-transformed. In Figure 2, miRNAs not detected above
background are omitted from the plotted data. If missing data are problematic for
the downstream analysis algorithms, genes expressed below background can be
surrogated by setting their TotalGeneSignal equal to the error in the
TotalGeneSignal estimated from the error model.
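A minimal sketch of surrogation before a log transform is given below. The text does not state the exact criterion for "below background", so the sketch assumes that a gene is surrogated whenever its signal is smaller than its estimated error. The function name and gene labels are also hypothetical.

```python
import math


def log2_with_surrogates(total_gene_signal, signal_error):
    """Return log2 signals, surrogating genes that are below background.

    Assumption: a gene counts as below background when its signal is
    smaller than its estimated error; the text does not give the criterion.
    """
    logged = {}
    for gene, signal in total_gene_signal.items():
        error = signal_error[gene]
        value = signal if signal >= error else error
        logged[gene] = math.log2(value)
    return logged


# Hypothetical values: miR-y has a negative background-subtracted signal.
signals = {"miR-x": 64.0, "miR-y": -3.0}
errors = {"miR-x": 8.0, "miR-y": 4.0}
logged = log2_with_surrogates(signals, errors)
```

Surrogation keeps every gene in the log-transformed dataset, at the cost of replacing undetected values with a floor that reflects the measurement noise rather than the true expression level.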
In order to estimate the significance of measured expression levels and fold
changes, it is necessary to associate error bounds with the reported signals. The
error in the BGSubSignal of each feature is estimated using a variation on the
Rosetta error model [32, 33], which includes an additive error term and a
multiplicative error term [31]. The additive error is calculated from the robust
standard deviation of the background (i.e., the width of the distribution in
Figure 1). Since the multiplicative error term models inter-array variance, it
cannot be robustly estimated for a single array [32]. Agilent recommends using 0.08
(8% error), estimated from multiple replicate experiments conducted in different