Micro Arrays II - Image Analysis and Data Pre-Processing
MX
A7-421
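[Figure: array layouts. Spotted two-dye array: the target DNA (cDNA, PCR products, etc.) is printed in sectors (print-tip groups, usually about 3 across), each with i x j spots (18 x 20), including empty spots and "landing lights". Oligonucleotide array: n x m probesets (~100 x ~100), each probeset made of oligos (~20, 40 nt), usually in 1 sector.]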
10,000 genes * 2 dyes * 3 copies/gene * ~40 pixels/gene = 2,400,000 values
Image analysis and pre-processing reduce these to only 10,000 values.
Image Analysis
- Addressing: estimate the location of spot centers.
- Segmentation: classify pixels as foreground or background.
- Extraction: for each spot on the array and each dye, obtain foreground intensities, background intensities, and quality measures.
For Affymetrix arrays, these steps are done by the GeneChip software.
Segmentation
- Circular feature
- Background reduction
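A minimal sketch of circular-feature segmentation, assuming the spot center and radius are already known from the addressing step. The function name `segment_circle` and the 5 x 5 pixel window are hypothetical and only illustrate the idea; real software may estimate the circle per spot.

```python
def segment_circle(image, center_row, center_col, radius):
    """Split the pixels of a spot window into foreground and background.

    A pixel is foreground when it lies inside the circle of the given
    radius around the (assumed known) spot center; every other pixel
    in the window is treated as background.
    """
    foreground, background = [], []
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            inside = (r - center_row) ** 2 + (c - center_col) ** 2 <= radius ** 2
            (foreground if inside else background).append(value)
    return foreground, background


# Hypothetical 5 x 5 window with a bright spot in the middle.
window = [
    [10, 11, 12, 11, 10],
    [11, 90, 95, 92, 12],
    [12, 94, 99, 96, 11],
    [10, 91, 97, 93, 12],
    [11, 12, 10, 11, 10],
]
fg, bg = segment_circle(window, center_row=2, center_col=2, radius=1.5)
print(len(fg), "foreground pixels,", len(bg), "background pixels")
```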
Image Analysis
Segmentation (spot detection), then background estimation, then value:
Value = Spot Intensity - Spot Background
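The subtraction can be sketched as follows. The choice of the mean for the spot and the median for the background is an assumption here (other summaries are possible), and the pixel values are made up for illustration.

```python
from statistics import mean, median


def spot_value(foreground, background):
    """Background-corrected value: mean foreground minus median background."""
    return mean(foreground) - median(background)


# Hypothetical pixel intensities for one spot and its surroundings.
fg_pixels = [90, 95, 92, 94, 99, 96, 91, 97, 93]
bg_pixels = [10, 11, 12, 11, 10, 11, 12, 12, 11]
value = spot_value(fg_pixels, bg_pixels)
print(round(value, 2))
```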
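[Figure: raw intensities of the two channels (G = Sample1, R = Sample1; example values 98, 4209, 2, ..., 9711, ..., 28) are log2-transformed. A scatter plot of log2(R) against log2(G) shows a deviation that depends on intensity, which leads to the MA-plot. To get 1 value per spot, R and G are converted to M and A.]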
Normalization 2 dyes
"Within" (2-color technologies)
(assumption: the majority of genes do not change)
[Figure: MA-plot. Vertical axis: M = log2(R) - log2(G). Horizontal axis: A = (log2(G) + log2(R)) / 2.]
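As a small worked example of the two axes, the sketch below computes M and A for a few spots. The function name `ma_values` and the intensities are hypothetical.

```python
from math import log2


def ma_values(red, green):
    """Return (M, A) for one spot from its red and green intensities."""
    m = log2(red) - log2(green)          # log ratio
    a = (log2(red) + log2(green)) / 2    # average log intensity
    return m, a


# Hypothetical background-corrected intensities (R, G) for three spots.
spots = [(4096, 1024), (512, 512), (256, 1024)]
for red, green in spots:
    print(red, green, ma_values(red, green))
```

A spot with equal red and green intensity gets M = 0; a fourfold difference gives M = 2 or M = -2.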
[Figure: MA-plot before and after within-array normalization.]
Normalization 2 dyes
"Within", spatial (2-color technologies)
[Figure: spatial maps of the log2 ratio in three panels: before normalization, after global loess normalization, and after loess normalization by sector (print-tip).]
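The difference between a global correction and a per-sector correction can be sketched with a much simpler stand-in for loess: subtracting the median M, either over the whole array or within each sector. This is not loess (it removes a constant shift, not an intensity-dependent trend), and the function names and M values are hypothetical. It relies on the same assumption that most genes do not change, so M should be centered at 0.

```python
from collections import defaultdict
from statistics import median


def center_global(m_values):
    """Subtract the array-wide median of M from every spot."""
    shift = median(m_values)
    return [m - shift for m in m_values]


def center_by_sector(m_values, sectors):
    """Subtract from each spot the median M of its own sector (print-tip)."""
    by_sector = defaultdict(list)
    for m, sector in zip(m_values, sectors):
        by_sector[sector].append(m)
    shifts = {sector: median(values) for sector, values in by_sector.items()}
    return [m - shifts[sector] for m, sector in zip(m_values, sectors)]


# Hypothetical M values: sector 1 is biased upward, sector 2 downward.
m = [0.5, 0.7, 0.6, -0.2, -0.4, -0.3]
sectors = [1, 1, 1, 2, 2, 2]
global_centered = center_global(m)
sector_centered = center_by_sector(m, sectors)
print(global_centered)
print(sector_centered)
```

In this toy data the global correction leaves each sector with its own bias, while the per-sector correction centers both sectors at 0.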
Normalization 1 or 2 dyes
Between-slides
[Figure: density of log intensity per array. First panel: density(x = log2(t[, 15] + 200), adjust = 0.475), N = 3840, bandwidth = 0.1051. Remaining panels: densities of log intensity for all arrays before normalization and after normalization.]
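The slides do not name the between-slide method, so the sketch below uses quantile normalization as one common option, purely as an assumption. It forces every array to share the same distribution of log intensities. The function name and the values are hypothetical, and ties are broken by position, which is a simplification.

```python
def quantile_normalize(arrays):
    """Give every array the same distribution: the mean of the sorted values.

    `arrays` is a list of equally long lists of log intensities, one
    list per array (slide).
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean of the k-th smallest values across arrays.
    reference = [sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # indices from low to high
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        normalized.append(out)
    return normalized


# Hypothetical log2 intensities for the same four genes on two slides.
arrays = [
    [10.0, 12.0, 14.0, 11.0],
    [11.0, 13.5, 15.0, 12.5],
]
result = quantile_normalize(arrays)
print(result)
```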
Summarization Affymetrix
Oligonucleotide-dependent technologies
PM (perfect match) and MM (mismatch) probes
The "summarization" equivalent in two-dye technologies is the average of the gene replicates within the slide.
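For two-dye arrays, averaging replicates within a slide can be sketched as below. The gene names, values, and the function name `summarize_replicates` are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def summarize_replicates(spots):
    """Average the values of all spots that share a gene identifier."""
    by_gene = defaultdict(list)
    for gene, value in spots:
        by_gene[gene].append(value)
    return {gene: mean(values) for gene, values in by_gene.items()}


# Hypothetical (gene, M value) pairs; geneA is spotted three times.
spots = [
    ("geneA", 1.0),
    ("geneB", -0.5),
    ("geneA", 1.5),
    ("geneA", 0.5),
    ("geneB", -1.5),
]
summary = summarize_replicates(spots)
print(summary)
```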
- Use replicated spots as averages
- Remove unrecoverable genes
- Remove spots that are problematic in all arrays
- Infer values using computational methods (warning)
More than 10,000 genes: too much data increases computation time and analysis complexity.
Remove:
- Genes that do not change significantly
- Undefined genes
- Low expression
Keep:
- Large signal-to-noise ratio
- Large statistical significance
- Large variability
- Large expression
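A filter of this kind can be sketched with two cutoffs, one on mean expression and one on variability. The cutoff values, gene names, and the function name `filter_genes` are assumptions for illustration; suitable thresholds depend on the data set.

```python
from statistics import mean, stdev


def filter_genes(expression, min_mean, min_sd):
    """Keep genes whose mean expression and variability pass both cutoffs."""
    kept = {}
    for gene, values in expression.items():
        if mean(values) >= min_mean and stdev(values) >= min_sd:
            kept[gene] = values
    return kept


# Hypothetical log2 expression values across four arrays.
expression = {
    "geneA": [12.0, 14.0, 10.0, 13.0],  # high expression, variable
    "geneB": [12.0, 12.1, 11.9, 12.0],  # high expression, nearly constant
    "geneC": [6.0, 8.0, 5.0, 7.5],      # variable but low expression
}
kept = filter_genes(expression, min_mean=8.0, min_sd=0.5)
print(list(kept))
```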
Transformation
[Figure, panels c) and d): two-dye microarray workflow: image scanning, spot detection, intensity value, transformation to M = log2(R/G) and A = log2(R*G)/2, then normalization (within and between).]
Dr. Hugo A. Barrera Saldaña. Paper in Mol. Med. 2007: DNA Microarrays - A Powerful Genomic Tool for Biomedical Research (Trevino - Barrera - Mol Med 2007)
[Figure: experimental workflow. Microarray hybridization (by duplicates, with controls); scanning and data processing (image analysis, within normalization per array, between normalization over all arrays); detection of differentially expressed genes; validation and analysis.]
6b
http://bioinformatica.mty.itesm.mx/?q=node/68
Read Images
Read BOTH Images together using SpotFinder
- Mark file 1 as "Cy3" = Green
- Mark file 2 as "Cy5" = Red
Create Grid
- Metarows = 12, Metacolumns = 4
- Rows = 24, Columns = 24
- Pixels = 450 (of the 24 x 24 spots)
- Spacing = 18 (between metacolumns and metarows)
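A quick check of what these grid settings imply, assuming that "Pixels = 450" is the extent in pixels of one 24 x 24 block. That reading is an assumption, and the variable names are hypothetical.

```python
metarows, metacolumns = 12, 4   # blocks on the array
rows, columns = 24, 24          # spots per block
block_pixels = 450              # assumed pixel extent of one 24 x 24 block

spots_per_block = rows * columns
total_spots = metarows * metacolumns * spots_per_block
# Approximate center-to-center distance between neighboring spots.
pixels_per_spot = block_pixels / columns

print(spots_per_block, total_spots, pixels_per_spot)
```

Under this reading the grid holds 27,648 spots, each occupying a cell a little under 19 pixels wide.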
Adjust Grid
The created grids are not aligned to the image.
- Use Visible All (right click in a blank area).
- Use Move All to adjust the overall position.
- Use Visible All again to restore the grid.
Save Grid
Image Analysis
Steps:
- Adjust (save the grid first; on Mac, Adjust does not work well)
- Process
- Export to a .mev file
- Open the .mev file in Excel
- Remove comment lines
- Compute signal:
  Signal A = Cy3 (green) = MNA - MedBkgA = mean of spot A - median of background A
  Signal B = Cy5 (red) = MNB - MedBkgB = mean of spot B - median of background B
Copy images into a Word file:
- 1 from the grid adjustment
- 1 from the RI plot
- 1 from the data (figure)
- 2 from the QC view (A and B). What do they represent?
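The Excel steps can also be sketched in code. This assumes the export is tab-delimited, that comment lines start with "#", and that the header uses the column names MNA, MNB, MedBkgA, and MedBkgB; check these against your own file. The sample text and the function name `read_signals` are hypothetical.

```python
import csv
import io


def read_signals(mev_text):
    """Return (signal_a, signal_b) for each data row of a .mev-style table."""
    # Drop comment lines, mirroring the "remove comment lines" step.
    lines = [
        line for line in mev_text.splitlines()
        if line.strip() and not line.startswith("#")
    ]
    reader = csv.DictReader(io.StringIO("\n".join(lines)), delimiter="\t")
    signals = []
    for row in reader:
        signal_a = float(row["MNA"]) - float(row["MedBkgA"])  # Cy3, green
        signal_b = float(row["MNB"]) - float(row["MedBkgB"])  # Cy5, red
        signals.append((signal_a, signal_b))
    return signals


# Hypothetical excerpt; a real export has many more columns and rows.
sample = (
    "# example comment line\n"
    "UID\tMNA\tMNB\tMedBkgA\tMedBkgB\n"
    "1\t1500\t900\t100\t80\n"
    "2\t300\t1200\t110\t90\n"
)
signals = read_signals(sample)
print(signals)
```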
Execute Process
- Select Gridding Tab
- Use Histogram Segmentation
- Spot Size = 10
- Process All!
Inspect MA-PLOT
- Select the RI-PLOT tab.
- Observe the MA-plot.
- You can switch specific grids on and off.
- A tendency can be observed (it has to be corrected to 0; see the MIDAS exercise).
View 2 shows whether each spot had M > 1 (yellow; the cutoff is 0.5 in this image) or M < -1.
View 1 gives the count of all M values per color (yellow, gray, blue, and green).
From image to data
We read 2 images, green = Cy3 and red = Cy5, to generate a background-reduced intensity value for each color:
- We generated a grid with the number of spots and the spatial layout specified for the microarray.
- We adjusted the positions visually by moving the grids.
- We calculated the signal value and the background noise for each color.
- We obtained a file with data.