Microarrays Technology
VIDAN FATHI GHONEIM
DNA Microarray - A technology that is reshaping molecular biology
Genes function in a network
Thousands of genes and their products (i.e., proteins) in a given
living organism function in a complicated and orchestrated way
that creates the mystery of life.
Advantages
-Very specific; can identify a large number of SNPs
-Tens of thousands of oligos can be analyzed per chip
-Automation
-High sensitivity
Disadvantages
-Only known genes can be analyzed
Perfect-match (PM) and mismatch (MM) probes
-PM probe = 25 bp probe perfectly complementary to a specific region of a gene
-MM probe = 25 bp probe agreeing with a PM probe apart from the middle base
-The middle base of the MM probe is replaced by its complement (A ⇔ T, C ⇔ G)
-Each probe set has 16 perfect match oligos and 16 mismatch oligos
-When a sample is taken from a sick person, binding after hybridisation with the patient sample means that the gene is active in the abnormal case
-The mismatch probe is used to make sure the binding is true binding
(Figure: PM and MM probe pairs; Aharoni and Vorst, in press)
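The role of the mismatch probe can be illustrated with a small calculation. The sketch below is not taken from the slides: it assumes the signal of a gene is summarised as the average PM minus MM difference over its probe pairs, which is only one of several possible summaries. The function name `probe_set_signal` and all intensities are hypothetical.

```python
def probe_set_signal(pm, mm):
    """Average PM - MM difference over the probe pairs of one gene.

    Assumption: a plain average difference is used as the summary;
    other summaries exist and the slides do not name one.
    """
    if len(pm) != len(mm) or not pm:
        raise ValueError("pm and mm must be non-empty and of equal length")
    # Subtracting MM removes the part of the signal that also binds
    # to the deliberately mismatched probe (non-specific binding).
    diffs = [p - m for p, m in zip(pm, mm)]
    return sum(diffs) / len(diffs)


# Hypothetical intensities for 4 probe pairs of one gene.
pm = [1200.0, 950.0, 1100.0, 1010.0]
mm = [300.0, 280.0, 310.0, 290.0]
signal = probe_set_signal(pm, mm)
print(signal)
```

A signal close to zero would suggest that most of the PM intensity is non-specific binding rather than true binding.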
Synthesis by photolithography - Affymetrix
Deposition of pre-synthesized DNA
Glass microarrays
-cDNA printed onto microscope slide
-Two-colour fluorescence detection (an oligo array uses one colour)
Advantages
-Automation
-High sensitivity
-Reduced sample requirements
Disadvantages
-Not reusable
Spotted Two-channel cDNA Microarrays
Possible outcomes for each spot:
-The gene binds with the control (normal) cell only
-The gene binds with the abnormal cell only
-The gene binds with both normal and abnormal cells
-The gene doesn't bind to either the normal or the abnormal cell
Micro-array Manufacture
Probe                   Surface
<100 nucleotides        Aldehyde, Epoxy or AML Slide
200-2,000 base pairs    Silane or Polylysine Slide
Proteins                Nylon Membrane or Hydrogel
Attachment: a primary amine (NH2) reacts with an aldehyde (HC=O) to form a Schiff base (HC=N); UV cross-linking or bake
Applications: SNP Analysis, Expression Profiling
Microarrays Process
-DNA Spotting
-Hybridisation
-Scanning
-Analysis
Pins
-Solid Pins (slow)
-Quill (Split) Pins
-Capillary Pins
(Figure: laser scanner optics, showing the emission path, galvo mirror, proprietary lens (F-theta & hi-resolution) and the microarray sample)
Signals Interpretation
1. Slide printing: probes for Gene 1 to Gene 5 are printed onto the slide
2. Labelled target from RNA: the two samples are labelled with Cy5 (Cyanine5) and Cy3 (Cyanine3)
3. Hybridise: cDNA derived from normal tissue is hybridized to the target DNA
4. Slide scanned
5. Spot intensities analysed
1. Biological Question or Hypothesis
2. Experimental Design
3. Lab work
4. Scanning & Image Analysis
5. Preprocessing
6. Statistics and ranking
7. Biological verification and interpretation
Scanning & Image Analysis
Transfers the information on the slide into images using a laser scanner.
Addressing
Find the areas in an image that belong to spots. The combined area of
spot and its background is called target area
Segmentation
Partition the target area into foreground and background
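As an illustration of segmentation, here is a minimal sketch that assumes the simplest possible rule: every pixel inside a fixed circle around the centre of the target area is foreground, everything else is background. The slides do not specify a segmentation method; the function name `segment_fixed_circle`, the radius and the pixel values are hypothetical.

```python
def segment_fixed_circle(target, radius):
    """Split a square target area into foreground and background pixel values.

    Assumption: the spot is centred in the target area and has a known,
    fixed radius (in pixels).
    """
    n = len(target)
    c = (n - 1) / 2.0  # centre coordinate of the target area
    fg, bg = [], []
    for i, row in enumerate(target):
        for j, value in enumerate(row):
            # Pixels within `radius` of the centre belong to the spot.
            if (i - c) ** 2 + (j - c) ** 2 <= radius ** 2:
                fg.append(value)
            else:
                bg.append(value)
    return fg, bg


# Hypothetical 5x5 target area: bright spot in the middle, dim background.
target = [
    [10, 10, 10, 10, 10],
    [10, 90, 95, 90, 10],
    [10, 95, 99, 95, 10],
    [10, 90, 95, 90, 10],
    [10, 10, 10, 10, 10],
]
fg, bg = segment_fixed_circle(target, radius=1.5)
print(len(fg), len(bg))
```

A fixed circle is easy to compute but can fit irregular spots poorly, which is why adaptive approaches also exist.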
Image Analysis
Reduction
Extract two scalar values R and G for red and green intensities and assign one value R/G for relative abundance
-R/G of 1 indicates no change
-R/G below 1 indicates greater intensity in G
-A black spot (intensity = 0) indicates no binding
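The reduction step can be sketched as follows. This is an illustration under stated assumptions, not the method of any particular image-analysis package: R and G are taken to be the median foreground intensity minus the median background intensity of each channel, and the function name `reduce_spot` and all pixel values are hypothetical.

```python
import math
from statistics import median


def reduce_spot(red_fg, red_bg, green_fg, green_bg):
    """Return (R, G, R/G, log2(R/G)) for one spot.

    Assumption: background correction is median foreground minus
    median background, floored at zero.
    """
    r = max(median(red_fg) - median(red_bg), 0.0)
    g = max(median(green_fg) - median(green_bg), 0.0)
    if r == 0 or g == 0:
        # No signal in at least one channel: the ratio is undefined.
        return r, g, None, None
    ratio = r / g
    return r, g, ratio, math.log2(ratio)


# Hypothetical pixel intensities for one spot.
r, g, ratio, log_ratio = reduce_spot(
    red_fg=[500, 520, 480], red_bg=[100, 100, 100],
    green_fg=[220, 200, 180], green_bg=[100, 100, 100],
)
print(r, g, ratio, log_ratio)
```

On the log2 scale a ratio of 1 maps to 0, so "no change" sits at zero and increases and decreases in R relative to G are symmetric around it.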
Data Representation
-Histogram/Density plots
From Hybridized Micro-array Image to Raw Data
-Gene activity is expressed on a log2 scale
Image Analysis – Addressing
Image Analysis – Segmentation
-QuantArray, ImaGene
-ScanAlyze
-GenePix, Spot
MICROARRAYS & DATA PREPROCESSING
Problem Definition
[...] misleading.
-Background
-Missing data
-Pin effects
Problematic Spots
Steps of Preprocessing
[...] possible.
Gene Filtering
Gene profiling experiments have genes that exhibit little variation in the profile and are generally not of interest in the experiment.
-Constant profile = no variation => not a significant gene
-Entropy is a measure of disturbance: low entropy = no variation
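Both filtering ideas, variance and entropy, can be sketched in a few lines. The code below is illustrative only: the variance threshold, the number of bins, the use of bin edges shared across the whole data set and the names `profile_entropy` and `filter_genes` are assumptions, and the expression values are hypothetical.

```python
import math
from statistics import pvariance


def profile_entropy(profile, lo, hi, bins=8):
    """Shannon entropy (in bits) of a profile binned into equal-width bins.

    Assumption: lo and hi are the minimum and maximum of the whole data set,
    so a nearly constant profile falls into a single bin (entropy 0).
    """
    if hi == lo:
        return 0.0
    counts = [0] * bins
    for v in profile:
        idx = min(int((v - lo) / (hi - lo) * bins), bins - 1)
        counts[idx] += 1
    n = len(profile)
    return sum(-(c / n) * math.log2(c / n) for c in counts if c)


def filter_genes(data, min_var=0.05):
    """Keep genes whose variance across arrays exceeds min_var."""
    return {gene: p for gene, p in data.items() if pvariance(p) > min_var}


# Hypothetical expression profiles (one value per array).
data = {
    "gene_flat": [1.00, 1.01, 0.99, 1.00],
    "gene_up": [0.2, 1.5, 2.9, 4.0],
}
lo = min(min(p) for p in data.values())
hi = max(max(p) for p in data.values())
kept = filter_genes(data)
entropies = {gene: profile_entropy(p, lo, hi) for gene, p in data.items()}
print(list(kept), entropies)
```

In this sketch the flat profile has both low variance and zero entropy, so either criterion would remove it.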
Gene Ranking
[...] mRNA level between the two sources.
◼ More formally, we want to test
H0: gene g is not regulated (μ1 = μ2)
HA: gene g is regulated (μ1 ≠ μ2)
If H0 is true, the gene is not significant.
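One common way to rank genes against H0 is a two-sample t statistic computed gene by gene. The sketch below assumes Welch's t statistic and ranks by its absolute value; the slides do not name a specific test, p-values are left out, and the names `welch_t`, `rank_genes` and all values are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for H0: the two group means are equal.

    Requires at least two values per group and a non-zero pooled
    standard error.
    """
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def rank_genes(group1, group2):
    """Return gene names sorted by decreasing |t| (strongest evidence first)."""
    t = {gene: welch_t(group1[gene], group2[gene]) for gene in group1}
    return sorted(t, key=lambda gene: abs(t[gene]), reverse=True)


# Hypothetical log expression values, three replicates per source.
normal = {"gene_a": [0.1, -0.1, 0.0], "gene_b": [0.2, 0.1, 0.3]}
disease = {"gene_a": [2.1, 1.9, 2.0], "gene_b": [0.3, 0.1, 0.2]}
ranking = rank_genes(normal, disease)
print(ranking)
```

Genes at the top of the ranking are the ones for which H0 is least plausible; a full analysis would also convert the statistics to p-values and correct for testing many genes at once.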
Genes (rows) by Arrays (columns)
          Array 1   Array 2   Array 3   Array 4   …
Gene 1       0.34      0.23     -0.30      0.78   …
Gene 2       2.30      1.71      3.44      0.65   …
Gene 3      -0.45     -0.19      0.11     -0.58   …
Gene 4       0.45      0.12      0.78      0.12   …
Gene 5      -3.41     -2.17     -4.21     -1.67   …
…               …         …         …         …   …
Possible solutions
missing values.
4. Imputation (estimation) of missing values.
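As a minimal illustration of imputation, the sketch below replaces each missing value with the mean of the observed values of the same gene. Row-mean imputation is chosen here only for brevity; the slides do not say which estimator is meant, and neighbour-based methods are a common alternative. The name `impute_row_mean` and the data are hypothetical.

```python
def impute_row_mean(matrix):
    """Replace None entries with the mean of the observed values in the same row.

    Each row is one gene, each column one array.
    """
    filled = []
    for row in matrix:
        observed = [v for v in row if v is not None]
        if not observed:
            raise ValueError("a gene with no observed values cannot be imputed")
        row_mean = sum(observed) / len(observed)
        filled.append([row_mean if v is None else v for v in row])
    return filled


# Hypothetical genes-by-arrays matrix with missing spots marked as None.
matrix = [
    [1.0, None, 3.0],
    [0.5, 0.5, None],
]
filled = impute_row_mean(matrix)
print(filled)
```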
Why do we need Normalization?
Replicates are essential in microarray studies: they not only make the mean expression values more accurate (reducing random noise), but also provide information about the variability of a particular expression value in the natural population (essential for hypothesis testing, to come later).
Normalization
Box plot showing median absolute deviation (panels: Before Normalization, After Normalization)
Density plot to view density function (x-axis: intensity)
Case Study:
GPR-GenePix Results format (*.gpr)
Normalization: Loess
-Normalization within the array (region)
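Within-array loess normalization can be sketched as fitting a smooth trend of M against A and subtracting it. The code below is a simplified illustration, not a full loess implementation: it assumes the usual convention M = log2(R/G) and A = mean log2 intensity, uses a single pass of local linear fits with tricube weights, and omits robustness iterations. The names `ma_values`, `loess_fit`, `loess_normalize` and the `frac` parameter are hypothetical.

```python
import math


def ma_values(r, g):
    """M = log2(R/G) and A = average log2 intensity for paired R and G values."""
    m = [math.log2(ri / gi) for ri, gi in zip(r, g)]
    a = [0.5 * math.log2(ri * gi) for ri, gi in zip(r, g)]
    return m, a


def loess_fit(x, y, frac=0.5):
    """Fitted value at each x from a local, tricube-weighted linear regression."""
    n = len(x)
    k = max(2, math.ceil(frac * n))  # number of neighbours per local fit
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12  # bandwidth: distance to the k-th nearest point
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


def loess_normalize(m, a, frac=0.5):
    """Subtract the intensity-dependent trend of M on A from each M value."""
    trend = loess_fit(a, m, frac)
    return [mi - ti for mi, ti in zip(m, trend)]


# Hypothetical array with a purely intensity-dependent dye bias.
a = [6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
m = [0.5 + 0.1 * ai for ai in a]
m_norm = loess_normalize(m, a)
print(max(abs(v) for v in m_norm))
```

After normalization the M values in this example are centred on zero at every intensity, which is the behaviour expected when most genes are not differentially expressed.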
Case Study: Normalization Within Array
Loess
(Figure: M versus A plots of gene activity, before and after Loess normalization)
Normalization: Quantile
-Normalization between the arrays
It is based on the rationale behind QQ-plots, where the quantiles (i.e. the sorted measurements or values) of a data set X are plotted against the quantiles of another data set Y.
If X and Y both come from the same distribution, then their QQ-plot approximately shows a line along the diagonal.
In the case of probe-level intensities, the quantiles of two arrays usually do not lie on the diagonal even though their true underlying expression values are (or at least [...]). This is why normalization is needed.
One could argue that the respective distribution functions were transformed during the microarray experiment for technical reasons.
In order to regain a common distribution, one could simply project the quantiles onto the diagonal of the QQ-plot.
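Projecting the quantiles onto the diagonal amounts to replacing the k-th smallest value of every array with the mean of the k-th smallest values across arrays. The sketch below illustrates this under simplifying assumptions: all arrays have the same length, there are no missing values, and ties are broken by position. The name `quantile_normalize` and the intensities are hypothetical.

```python
def quantile_normalize(arrays):
    """Give every array the same distribution (the mean of the sorted arrays).

    arrays: list of equal-length lists, one list of intensities per array.
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same number of values")
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean of the k-th smallest value across arrays.
    ref = [sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # indices by increasing value
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = ref[rank]  # the value with this rank gets the reference quantile
        normalized.append(out)
    return normalized


# Hypothetical intensities for the same three probes on two arrays.
x = [5.0, 2.0, 3.0]
y = [4.0, 1.0, 6.0]
x_norm, y_norm = quantile_normalize([x, y])
print(x_norm, y_norm)
```

After this step the two arrays contain exactly the same set of values, so their QQ-plot lies on the diagonal, while the rank order of the probes within each array is preserved.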
Another method for between-array normalization is the simple scaling of the M-values from a series of arrays so that each array has the same Median Absolute Deviation (MAD).
It does not consider any region- or intensity-dependent effects.
Quantile normalization is similar in idea to scale normalization but more radical, as all of the quantiles are adjusted and not only the 50% quantile (median).
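Scale normalization can be sketched in the same style. The code below assumes that each array's M-values are multiplied by a constant so that all arrays end up with the geometric mean of the original MADs; the choice of target, the names `mad` and `scale_normalize`, and the M-values are hypothetical.

```python
import math
from statistics import median


def mad(values):
    """Median absolute deviation from the median."""
    med = median(values)
    return median([abs(v - med) for v in values])


def scale_normalize(m_arrays):
    """Rescale each array of M-values so that all arrays share the same MAD.

    Assumption: the common target is the geometric mean of the per-array
    MADs; every array must have a non-zero MAD.
    """
    mads = [mad(m) for m in m_arrays]
    target = math.exp(sum(math.log(d) for d in mads) / len(mads))
    return [[v * target / d for v in m] for m, d in zip(m_arrays, mads)]


# Hypothetical M-values: the second array has four times the spread.
m1 = [-1.0, 0.0, 1.0, 2.0, -2.0]
m2 = [-4.0, 0.0, 4.0, 8.0, -8.0]
s1, s2 = scale_normalize([m1, m2])
print(mad(s1), mad(s2))
```

Only the spread of each array changes here; the shape of each distribution is left as it was, which is the sense in which scaling is less radical than quantile normalization.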
Case Study: Normalization Between Arrays
(Figure: two "RG densities" panels; x-axis: Intensity, y-axis: Density)