Microarrays Technology

The document provides an overview of DNA microarray technology, highlighting its ability to analyze gene expression on a large scale, which is crucial for understanding complex biological systems. It discusses various applications in cancer research, including tumor classification and biomarker identification, as well as the different approaches to microarray design and analysis. Additionally, it covers the importance of data preprocessing and analysis techniques to ensure accurate interpretation of gene expression data.

Uploaded by

Rana Alnamlah

Introduction to Micro-arrays
VIDAN FATHI GHONEIM
DNA Microarray - A technology that is reshaping molecular biology
 Genes function in a network
Thousands of genes and their products (i.e., proteins) in a given living organism function in a complicated and orchestrated way that creates the mystery of life.
 Gene expression pathways react to biochemical and other types of stimulation.
 Traditional methods in molecular biology generally work on a "one gene in one experiment" basis, which means that the throughput is very limited and the "whole picture" of gene function is hard to obtain.
 DNA microarrays promise to monitor the whole genome on a single chip, so that researchers can have a better picture of the interactions among thousands of genes simultaneously.
Microarray Applications in Cancer

 Identification of Single Nucleotide Polymorphisms (SNPs)
 Classification of tumors
 Identification of target genes of tumor suppressors
 Identification of cancer biomarkers
 Identification of genes associated with chemoresistance and drug discovery
Two Basic Approaches to Micro-arrays

 In situ synthesis of oligo arrays


 Deposition of pre-synthesized DNA
In situ synthesis of oligo arrays
 Oligonucleotide chips (Affymetrix)
-Oligo probes synthesized onto the chip (short, usually 25 bp (base pairs) long)
-One-colour fluorescence detection
-Gene expression and mutational analysis
 Advantages
-Very specific, can identify a large number of SNPs
-Tens of thousands of oligos can be analyzed per chip
-Automation
-High sensitivity
 Disadvantages
-Only known genes can be analyzed
-Many spots make a gene
-May be too specific for gene expression analysis
-Not reusable
Affymetrix Array
PM probe = 25 bp probe perfectly complementary to a specific region of a gene.
MM probe = 25 bp probe agreeing with a PM probe apart from the middle base. The middle base is a transition (A ⇔ G, C ⇔ T) of that base.
Figure: a gene sequence represented by 16 perfect-match (PM) oligos and 16 mismatch (MM) oligos, shown after binding with the patient sample (Aharoni and Vorst, in press).
Note: a gene covers 25 dots and each probe takes 5 dots, which means we only have 5 probes.
Note: when we take a sample from a sick person and the binding happens, this binding means that the gene is active in the abnormal case.
Note: the mismatch probes are there to make sure that the binding is true binding.
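The PM/MM relationship can be illustrated with a minimal sketch. It assumes that the middle-base substitution is a transition in the usual sense (A ⇔ G, C ⇔ T); the function name `mismatch_probe` and the example sequence are hypothetical and are not taken from any Affymetrix software.

```python
# Transition partners: purine <-> purine, pyrimidine <-> pyrimidine (assumed rule).
TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}


def mismatch_probe(pm_probe):
    """Return the MM probe for a PM probe by substituting its middle base."""
    if len(pm_probe) % 2 == 0:
        raise ValueError("probe length must be odd so that a middle base exists")
    mid = len(pm_probe) // 2  # index 12 (the 13th base) for a 25-mer
    return pm_probe[:mid] + TRANSITION[pm_probe[mid]] + pm_probe[mid + 1:]


pm = "ACGTACGTACGTACGTACGTACGTA"  # hypothetical 25 bp PM probe
mm = mismatch_probe(pm)
print(pm)
print(mm)
```

The two printed probes differ only at the 13th base, which is what lets the MM signal serve as a check that the PM signal is true binding.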
Synthesis by photolithography - Affymetrix
Deposition of pre-synthesized DNA
 Glass microarrays
-cDNA printed onto a microscope slide
-Two-colour fluorescence detection (an oligo array uses one colour)
-Gene expression and novel gene discovery
-The length of the probe depends on the type we use (usually long)
 Advantages
-Measurement of competitive fluorescence hybridization allows quantitative analysis of complete tissue sections.
-Thousands of individual cDNAs per slide
-Unknown genes can be analyzed
-Automation
-High sensitivity
-Reduced sample requirements
 Disadvantages
-Not reusable
Spotted Two-channel cDNA
Microarrays

Up to 200,000 spots on a single glass slide.
RNA from two sources on each array; each one represents gene activity.
Labeled with different dyes, red (Cy5) and green (Cy3).
Figure: cDNA microarray with the print-tip area marked; spots appear green, black, red or yellow.
Green: the gene binds with the control (reference, normal) cell.
Red: the gene binds with the abnormal cell.
Yellow: the gene binds with both the normal and the abnormal cells.
Black: the gene doesn't bind with the normal or the abnormal cell.
Micro-array Manufacture

It is a slide made of glass or some other material.
DNA molecules are attached at fixed locations (spots).
The spots are printed by a robot, synthesized by photolithography, or deposited by ink-jet printing.
Its typical dimension is about 1 inch or less; the spot diameter is of the order of 0.1 mm and can be even smaller.
Slide types
Oligonucleotide probe (<100 nucleotides): Aldehyde, Epoxy or AML slide
PCR or cDNA probe (200-2,000 base pairs): Silane or Polylysine slide
Proteins: Nylon membrane or hydrogel
Attachment: a primary amine (NH2) reacts with an aldehyde (HC=O) to form a Schiff base (HC=N); UV cross-linking or bake.
Applications: SNP analysis, expression profiling
Microarrays Process

DNA spotting → hybridisation → scanning → analysis
Pins

Solid Pins
-Low density arrays
-Slow
-Pins very durable
Quill (Split) Pins
-High density arrays
-Fast
-Pins expensive, and more delicate
Capillary Pins
-Medium/High density arrays
-Fast
-Only four pins
Scanning Optics

Figure: scanner optics, with the emission path, galvo mirror, proprietary lens (F-theta & hi-resolution) and microarray sample.
Signals Interpretation
Figure: slide printing. Labelled target is prepared from the RNA of a CONTROL (normal) cell and a TEST (altered) cell using the Cyanine3 (Cy3) and Cyanine5 (Cy5) dyes, then hybridised to the probes printed onto the slide (Gene 1 to Gene 5).
GREEN represents Control DNA, where either DNA or cDNA derived from normal tissue is hybridized to the target DNA.
RED represents Sample DNA, where either DNA or cDNA derived from diseased tissue is hybridized to the target DNA.
YELLOW represents a combination of Control and Sample DNA, where both hybridized equally to the target DNA.
BLACK represents areas where neither the Control nor the Sample DNA hybridized to the target DNA.
Figure: labelled target (Cy3 and Cy5) from the CONTROL (normal) cell and the TEST (altered) cell is hybridised to the probes printed onto the slide; the slide is scanned and the spot intensities are analysed.
Gene 1: no difference
Gene 2: higher expression in test
Gene 3: no difference
Gene 4: higher expression in control
Gene 5: no difference
A Micro-array Experiment

Biological question or hypothesis → Experimental design → Lab work → Scanning & image analysis → Preprocessing → Statistics and ranking → Biological verification and interpretation
Scanning & Image Analysis

 Spotted two-channel micro-arrays use two different lasers, one with wavelength 532 nm (green) and one with wavelength 635 nm (red).
 A laser scanner transfers the information on the slide into images.
 Image analysis software is used to find and extract information from each spot.
Image Analysis

 Addressing
Find the areas in an image that belong to spots. The combined area of a spot and its background is called the target area.
 Segmentation
Partition the target area into foreground and background.
Image Analysis

 Reduction
Extract two scalar values, R and G, for the red and green intensities, and assign one value, R/G, for the relative abundance.
- R/G of 1 indicates no change
- R/G < 1 indicates down-regulation (greater intensity in G)
- R/G > 1 indicates up-regulation (greater intensity in R)
- R = 0 gives a black spot: no binding
Data Representation

◼ To improve visualization, the following transformation is applied:
$$M = \log_2 R - \log_2 G = \log_2 (R / G)$$
$$A = (\log_2 R + \log_2 G) / 2$$
◼ M is the log fold change (log ratio).
◼ A is the average log intensity of the spot; M is plotted against A for visualization.
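The transformation can be sketched for a handful of spots as follows. The spot names and intensity values are hypothetical, chosen as powers of two so that the logarithms are easy to check by hand.

```python
import math


def ma_values(red, green):
    """Return (M, A) for one spot from its red and green intensities."""
    if red <= 0 or green <= 0:
        raise ValueError("intensities must be positive to take logarithms")
    log_r, log_g = math.log2(red), math.log2(green)
    return log_r - log_g, (log_r + log_g) / 2


# Hypothetical (R, G) intensities for three spots.
spots = {
    "spot_a": (1024.0, 256.0),   # brighter in red
    "spot_b": (512.0, 512.0),    # equal in both channels
    "spot_c": (128.0, 2048.0),   # brighter in green
}
ma = {name: ma_values(r, g) for name, (r, g) in spots.items()}
for name, (m, a) in ma.items():
    print(f"{name}: M = {m:+.1f}, A = {a:.1f}")
```

All three spots share the same A (9.0) but have M of +2, 0 and -4, so on the log scale up- and down-regulation are symmetric around zero, which raw R/G ratios are not.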
Figure: histogram/density plots of the raw intensities and of the log2 intensities.
From Hybridized Micro-array Image to Raw Data
The spot intensities of the TIFF images, processed using image analysis programs, are exported from the program into a series of text files and spreadsheets, representing the raw data.
Microarray measurements in these files represent the target quantity (the gene abundance, i.e. mRNA) indirectly, by measuring the intensity of the fluorescence of the spots for each fluorescent dye (each optical wavelength).
Gene Expression Data (Raw Data)

 The spot intensities are then exported from the image analysis program into a series of text files.
 The raw data consist of three parts:
o Gene expression data matrix
o Gene annotation
o Sample annotation
 A full and detailed description of each feature (spot) on the array should be provided.
 How exactly the gene expression matrix was obtained should be supplied.
Image Analysis Programs
 ArrayVision,
 Dapple,
 ScanAlyze
 ImaGene,
 GenePix,
 QuantArray,
 or SPOT.
Note: each row of the data matrix represents a spot; the values are log2 gene activity.
Image Analysis – Addressing

Adaptive shape [e.g. Spot]
Adaptive circle [e.g. GenePix, Dapple]
Fixed circle [e.g. ScanAlyze, GenePix, QuantArray]
Problematic Spots

Image Analysis – Segmentation

 The local background is estimated by taking the median of the values inside the indicated areas.
Figure legend (one outlined background area per group of programs):
---- QuantArray, ImaGene
---- ScanAlyze
---- GenePix, Spot
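The median-based background estimate can be sketched as follows. This is a simplified illustration: it assumes the background region is simply every pixel of the target area outside the spot mask, whereas the programs listed above each use their own region, and it also summarises the foreground by its median. The function name `spot_signal` and the pixel values are hypothetical.

```python
from statistics import median


def spot_signal(pixels, foreground_mask):
    """Return (foreground median, local background median) for one target area.

    pixels and foreground_mask are equally shaped lists of rows; a mask value
    of True marks a pixel that segmentation assigned to the spot.
    """
    fg, bg = [], []
    for row, mask_row in zip(pixels, foreground_mask):
        for value, is_spot in zip(row, mask_row):
            (fg if is_spot else bg).append(value)
    return median(fg), median(bg)


# Hypothetical 4x4 target area: a bright 2x2 spot on a dim background.
pixels = [
    [10, 12, 11, 10],
    [11, 200, 210, 12],
    [10, 190, 205, 11],
    [12, 10, 11, 90],  # one bright artefact pixel in the background
]
mask = [
    [False, False, False, False],
    [False, True, True, False],
    [False, True, True, False],
    [False, False, False, False],
]
foreground, background = spot_signal(pixels, mask)
print(foreground, background)
```

Using the median rather than the mean keeps the single artefact pixel (90) from inflating the background estimate.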
MICROARRAYS & DATA PREPROCESSING
Problem Definition

 The gene activity estimation has an impact on the subsequent data analysis and biological interpretation.
 If the gene's measured activity is not due to the activity itself, subsequent analysis using this erroneous estimate will, of course, be misleading.
 A complete gene expression matrix is needed, as most DNA microarray data analysis methods, such as hierarchical gene clustering, biomarker identification, sample class prediction, and genetic and regulatory network prediction, require the expression values to be complete and as accurate as possible.
Note: a negative value (background above the spot signal) is treated as missing data, which is one reason data preprocessing is needed.
Why Preprocessing of Micro-arrays?
Measurements on micro-arrays may be biased by:
 Efficiency of RNA extraction, reverse transcription and label incorporation in the experimental work.
 Exposure, scanning, spot detection, etc.
 Systematic effects due to characteristics of the array, such as
o effects of different probes (i.e. cDNAs or oligos),
o spotting effects,
o region effects (e.g. a scratch or sand),
o pin effects.
Problematic Spots
Steps of Preprocessing

 Background Correction: The spot intensities are corrected for the background intensities.
 Normalization: The purpose is to promote uniformity within arrays and reproducibility between arrays.
 Gene Filtering: The non-informative spots should be excluded from the analysis.
 Dealing with missing values (MV): Impute the missing values as accurately as possible.
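The background correction step can be sketched as a simple subtraction. The rule used here, that a non-positive corrected intensity is flagged as a missing value (NaN) because its logarithm cannot be taken, is an assumption for illustration and only one of several possible conventions; the function name and the numbers are hypothetical.

```python
import math


def background_correct(foreground, background):
    """Subtract background from foreground; non-positive results become NaN."""
    corrected = []
    for fg, bg in zip(foreground, background):
        value = fg - bg
        # A spot no brighter than its background carries no usable signal.
        corrected.append(value if value > 0 else math.nan)
    return corrected


fg = [850.0, 120.0, 95.0, 400.0]   # hypothetical foreground intensities
bg = [100.0, 110.0, 130.0, 400.0]  # hypothetical local backgrounds
corrected = background_correct(fg, bg)
print(corrected)
```

The NaN entries produced here are the missing values that the last preprocessing step then has to exclude or impute.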
Gene Filtering

 Gene profiling experiments have genes that exhibit little variation in the profile (the behavior of the gene) and are generally not of interest in the experiment. A constant profile means no variation, so the gene is not significant (not informative).
 Gene expression profile experiments have data where the absolute values are very low (poor spot hybridization); such genes are also not significant.
 Another filter removes genes whose profiles have low entropy, i.e. entropy less than a given percentile of the experimental data. Entropy is a measure of disturbance in the system, so low entropy means no variation.
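The first two filters can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `filter_genes`, the thresholds and the profiles are hypothetical, variance stands in as the measure of variation, and the entropy filter is left out (it would apply the same percentile cut-off to an entropy score instead of the variance).

```python
from statistics import mean, pvariance


def filter_genes(profiles, min_mean, var_percentile):
    """Keep genes whose mean level and variance pass both thresholds."""
    variances = {gene: pvariance(values) for gene, values in profiles.items()}
    ranked = sorted(variances.values())
    # Variance below which roughly var_percentile % of the genes fall.
    index = min(int(len(ranked) * var_percentile / 100), len(ranked) - 1)
    cutoff = ranked[index]
    return [
        gene
        for gene, values in profiles.items()
        if mean(values) >= min_mean and variances[gene] >= cutoff
    ]


# Hypothetical log2 intensity profiles across four samples.
profiles = {
    "gene_a": [8.0, 8.1, 7.9, 8.0],    # almost constant: not informative
    "gene_b": [6.0, 9.0, 7.0, 10.0],   # varies, well expressed
    "gene_c": [2.0, 4.5, 1.5, 4.0],    # varies, but very low absolute values
    "gene_d": [9.0, 11.0, 8.5, 11.5],  # varies, well expressed
}
kept = filter_genes(profiles, min_mean=5.0, var_percentile=25)
print(kept)
```

Here gene_a is dropped for lack of variation and gene_c for its low absolute values, leaving gene_b and gene_d for the analysis.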
Gene Ranking

◼ We want to identify genes with different mRNA levels between the two sources.
◼ More formally, we want to test
H0: gene g is not regulated
HA: gene g is regulated.
Note: the more mRNA is available, the more binding occurs and the higher the spot intensity.
With M1 = intensity of the gene in group 1 and M2 = intensity of the gene in group 2:
H0: M1 = M2 (if this is true, the gene is not significant)
HA: M1 ≠ M2 (if this is true, the gene is significant)
Combining the Data
Genes \ Arrays   Array 1   Array 2   Array 3   Array 4   …
Gene 1             0.34      0.23     -0.30      0.78    …
Gene 2             2.30      1.71      3.44      0.65    …
Gene 3            -0.45     -0.19      0.11     -0.58    …
Gene 4             0.45      0.12      0.78      0.12    …
Gene 5            -3.41     -2.17     -4.21     -1.67    …
…                   …         …         …         …      …

◼ How do we know which genes have a change in mRNA level?
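One common way to answer this question is to compute a test statistic per gene and rank the genes by it. The sketch below assumes that the table entries are log ratios (M-values) and uses a one-sample t-statistic for the null hypothesis that the mean log ratio of a gene is zero; the slides do not name a specific test, so this choice is an assumption. Only the four array columns shown in the table are used.

```python
from math import sqrt
from statistics import mean, stdev

# M-values for the five genes and the four arrays shown in the table.
m_values = {
    "Gene 1": [0.34, 0.23, -0.30, 0.78],
    "Gene 2": [2.30, 1.71, 3.44, 0.65],
    "Gene 3": [-0.45, -0.19, 0.11, -0.58],
    "Gene 4": [0.45, 0.12, 0.78, 0.12],
    "Gene 5": [-3.41, -2.17, -4.21, -1.67],
}


def t_statistic(values):
    """One-sample t-statistic for H0: the mean log ratio is zero."""
    return mean(values) / (stdev(values) / sqrt(len(values)))


# Rank genes from strongest to weakest evidence of regulation.
ranked = sorted(
    ((gene, t_statistic(values)) for gene, values in m_values.items()),
    key=lambda item: abs(item[1]),
    reverse=True,
)
for gene, t in ranked:
    print(f"{gene}: t = {t:+.2f}")
```

Genes 5 and 2 come out on top because their log ratios are consistently far from zero, while Gene 1 ranks last because its values scatter around zero. With only four arrays the statistic is unstable, which is one reason replicates matter.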
Dealing with MV
Possible solutions for missing values (MV):
1. Exclude missing values from subsequent analysis.
2. Repeat the experiment (not efficient: it consumes time and money).
3. Modify clustering methods so that they can deal with missing values.
4. Imputation of missing values (estimation of the missing values).
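Option 4 can be sketched with the simplest possible estimator, the mean of the observed values of the same gene. Row-mean imputation is used here only as an illustration and is an assumption, since the slides do not name a specific imputation method; the function name and the matrix are hypothetical.

```python
import math
from statistics import mean


def impute_row_mean(matrix):
    """Replace NaN entries by the mean of the observed values in the same row."""
    imputed = []
    for row in matrix:
        observed = [x for x in row if not math.isnan(x)]
        if not observed:
            raise ValueError("cannot impute a gene with no observed values")
        fill = mean(observed)
        imputed.append([fill if math.isnan(x) else x for x in row])
    return imputed


nan = math.nan
# Hypothetical expression matrix: rows are genes, columns are arrays.
matrix = [
    [1.0, nan, 3.0, 2.0],
    [0.5, 0.5, nan, nan],
]
filled = impute_row_mean(matrix)
print(filled)
```

The result is a complete matrix that clustering and classification methods can accept, at the cost of slightly understating the variability of the imputed genes.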
Why do we need Normalization ?

•Expression values contain the variations that we are interested in, for example those caused by particular sample treatments or disease states (signal), together with unwanted variations.
•Normalization is a term used to describe a collection of methods which try to eliminate the unwanted variations between sample expression values while retaining the variation that we are interested in.
•These problems can be demonstrated by microarraying the same sample twice and comparing the expression values, which should give a linear relationship.
Self-against-Self Experiment

Figure: self-self experiment. Ideally, all data points should lie on the red straight line. The scatter around the line shows random noise, and the departure from it shows systematic bias between arrays, so the data need to be normalized.
Sample Replicates

These are biological replicates, in that they provide information about the variation between biological samples.
Replicates are essential in microarray studies: they not only make the mean expression values more accurate (reducing random noise), but also provide information about the variability of a particular expression value in the natural population (essential for hypothesis testing, to come later).
Normalization

◼ Normalization can be carried out both within a single array and between many arrays.
Within the array: normalization between genes (spots).
Between many arrays: normalization between subjects.
◼ Common methods for within-array normalization are global loess and print-tip loess.
◼ Common methods for between-array normalization are scale and quantile.
Graphical Representation of slide data:
Figure: scatter plot and MA-plot; ideally the MA-plot is centred on zero.
Figure: box plot showing the median absolute deviation for different slides, before and after normalization.
Figure: density plot to view the density function of the intensities.
Case Study:
GPR-GenePix Results format (*.gpr)
Normalization: Loess
Normalization within the array.
Global loess: a regression curve is fitted for the entire slide (all genes).
Local loess: a regression curve is fitted for a part (region) of the slide.
Case Study: Normalization Within Array

Loess
Figure: MA-plots (M against A) of gene activity before and after loess normalization.
Normalization: Quantile
Normalization between the arrays.
It is based on the rationale behind QQ-plots, where the quantiles (i.e. the sorted measurements or values) of a data set X are plotted against the quantiles of another data set Y.
 If X and Y both come from the same distribution, then their QQ-plot approximately shows a line along the diagonal.
 In the case of probe-level intensities, the quantiles of two arrays usually do not lie on the diagonal, even though their true underlying expression values are (or at least should be) identically distributed in replicate samples. That means we always need the normalization.
 One could argue that the respective distribution functions were transformed during the microarray experiment for technical reasons.
 In order to regain a common distribution, one could simply project the quantiles onto the diagonal of the QQ-plot.
 Another method for between-array normalization is the simple scaling of the M-values from a series of arrays so that each array has the same Median Absolute Deviation (MAD).
 It does not consider any region- or intensity-dependent effects.
 Quantile normalization is similar in idea to scale normalization but more radical, as all of the various quantiles are adjusted and not only the 50% quantile (median).
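Both between-array methods can be sketched in a few lines. This is a simplified illustration: the quantile version assumes arrays of equal length without ties or missing values, and the scale version uses the arithmetic mean of the per-array MADs as the common target, which is an assumption. The function names and the data are hypothetical.

```python
from statistics import mean, median


def quantile_normalize(arrays):
    """Give every array the same distribution: the mean of the sorted arrays."""
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference quantiles: average of the k-th smallest value over all arrays.
    reference = [mean([s[k] for s in sorted_arrays]) for k in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # indices, smallest value first
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]  # each value is replaced by its quantile
        normalized.append(out)
    return normalized


def mad(values):
    """Median absolute deviation from the median."""
    med = median(values)
    return median([abs(v - med) for v in values])


def scale_normalize(m_arrays):
    """Rescale the M-values of each array so that all arrays share one MAD."""
    mads = [mad(a) for a in m_arrays]
    target = mean(mads)  # assumed common target
    return [[v * target / m for v in a] for a, m in zip(m_arrays, mads)]


intensities = [[5.0, 2.0, 3.0, 4.0], [4.0, 1.0, 6.0, 2.0], [3.0, 4.0, 8.0, 6.0]]
qn = quantile_normalize(intensities)

m_arrays = [[-1.0, 0.0, 1.0, 2.0, -2.0], [-2.0, 0.0, 2.0, 4.0, -4.0]]
scaled = scale_normalize(m_arrays)
print(qn)
print(scaled)
```

After quantile normalization every array contains exactly the same set of values, only in a different order, whereas scale normalization changes just the spread of each array and leaves the shape of its distribution alone.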
Case Study: Normalization Between Arrays
Figure: RG densities (density against intensity) before and after between-array normalization (quantile).
