


3/8/2018 https://onlinecourses.science.psu.edu/stat555/print/book/export/html/8

Published on STAT 555 (https://onlinecourses.science.psu.edu/stat555)


Lesson 6: Statistics for Differential Expression in Microarray Studies

Key Learning Goals for this Lesson:

Developing a design matrix

Understanding the moderated t-test
Understanding why increasing power improves both the FDR and FNR

After preprocessing and normalization, the data are in an expression matrix usually with genes in the
rows and samples in the columns. The basic analysis is a test for each gene. We need to account for
the experimental design in the analysis - for example we may have paired samples or technical
replicates.

We will demonstrate these ideas with the colonCA data.

6.1 - Two Condition Studies

The most basic differential expression study has two conditions. These could be:

treatment vs control
mutant vs wild type
exposure B versus exposure C

We begin with a two-channel microarray. Because the two channels on the same microarray are more
similar than samples on different microarrays, we will use a paired analysis. Usually each array will
have both conditions, using a dye-swap design. The dye effect will average out over the arrays, as long
as the same number of samples for each treatment are labelled with each dye, so we do not need to
create a special statistical model with a dye effect. However, M=log2(R)-log2(G), so for some
microarrays M represents B-C and for others C-B, and we do need to keep track of that.


For the colonCA data, let's assume that the 22 paired samples were paired on the arrays (i.e. the two
tissues from the same subject were hybridized to the two channels on the same array) and that red is
always reported before green. (When we have data from a lab or a data repository, it should be clear
which samples are labeled with each dye.) Each normal tissue sample has a positive patient number,
and the tumor tissue sample from the same patient has the same number with a negative sign. This
makes it easy for us to look at the data.

For example, we can see below that the first two columns are microarray 12 with the tumor in the red
and the normal in the green.

For microarray 27 (columns 3 and 4), the normal is in the red and the tumor is in the green. This is
exactly what we expect in a dye-swap design.

Typically, the data will be arranged in two matrices, one for the red channel on each array, and one for
the green, as shown below. After normalization, we obtain M=log2(R)-log2(G). For the paired
comparison, we reduce the data to d=log2(normal)-log2(tumor) for each patient (microarray). As
shown below, on array 12, d=-M and on array 27 d=M.

To account for this, we make a "design matrix" which is a matrix with one column and 22 rows (one for
each microarray). The entry is +1 if d=M and -1 if d=-M, so that the element-wise product of the design
matrix and the M values is d. This simple operation allows us to compute the d values associated with
every gene in a single matrix operation, giving us a matrix of the log2 expression difference between
normal and tumor for every gene on every microarray as a matrix with genes in the rows and
microarrays in the columns.
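The sign flip described above can be sketched in a few lines. This is only a minimal Python illustration of the arithmetic, not LIMMA code (LIMMA is an R package); the toy M values and the names `M`, `design` and `d` are invented for the example.

```python
# Toy M = log2(R) - log2(G) values: 3 genes (rows) x 4 arrays (columns).
# These numbers are invented for illustration only.
M = [
    [0.8, -0.6, 0.9, -0.7],
    [-0.2, 0.1, -0.3, 0.2],
    [1.5, -1.4, 1.2, -1.6],
]

# One design entry per array: +1 if d = M (normal in red),
# -1 if d = -M (tumor in red).
design = [-1, +1, -1, +1]

# Multiply every column of M by the sign for that array to get
# d = log2(normal) - log2(tumor), genes in rows and arrays in columns.
d = [[sign * m for sign, m in zip(design, row)] for row in M]

for row in d:
    print(row)
```

After the flip, every entry in a row has the same orientation (normal minus tumor), so the row can be averaged directly.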

We can then compute the t-statistic for each row, giving us a paired t-test for every gene. Some of the
t-statistics are shown below, along with the histogram of all of the 2000 t-statistics. If there is no
differential expression, these t-statistics all come from the null distribution. Assuming that the difference
in log2 expression values is approximately normal, the null distribution should be t with 21 d.f. for each
of the genes.

Gene           t-statistic
Hsa. 3004      -2.057516
Hsa.13491      -1.169676
Hsa.13491.1    -1.590321
Hsa.37254       1.137496
Hsa. 541       -1.750747
Hsa. 20836     -0.851902

The results look fairly bell-shaped, but there might be some genes with t-statistics farther out in the tails
than expected. To be sure, we need to compute the p-value for each test, based on the t on 21 d.f. as
the reference null distribution.
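For reference, the gene-by-gene loop looks roughly like the sketch below. It is a plain Python illustration with invented differences, not the course's R code; the function name `paired_t` is hypothetical. With 22 arrays the returned d.f. would be 21, as in the text.

```python
import math
import statistics


def paired_t(d_row):
    """Paired t-statistic for one gene: mean(d) / (sd(d) / sqrt(n))."""
    n = len(d_row)
    mean_d = statistics.mean(d_row)
    sd_d = statistics.stdev(d_row)  # sample SD, n - 1 in the denominator
    return mean_d / (sd_d / math.sqrt(n)), n - 1  # statistic and its d.f.


# Invented differences d = log2(normal) - log2(tumor) for 2 genes on 5 arrays.
d = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [-0.5, 0.5, -0.5, 0.5, 0.0],
}

for gene, row in d.items():
    t, df = paired_t(row)
    print(gene, round(t, 4), df)
```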

Instead of computing a t-statistic for every gene using a loop, we will use the LIMMA package, which
computes the mean difference and the s.e. of the mean difference efficiently using matrix
operations. We will use the LIMMA package throughout this course.

Here are the results that we get using LIMMA.

Gene           t.test       LIMMA (unmoderated)
Hsa. 3004      -2.057516    -2.057516
Hsa.13491      -1.169676    -1.169676
Hsa.13491.1    -1.590321    -1.590321
Hsa.37254       1.137496     1.137496
Hsa. 541       -1.750747    -1.750747
Hsa. 20836     -0.851902    -0.851902

One reason to use LIMMA is computational efficiency. colonCA has only 2000 genes, so computing
the t-statistic using a loop does not take very long, but with 30 - 60 thousand genes a loop is quite
slow. LIMMA has two other advantages. First, the idea of using a design matrix allows for very flexible
statistical analysis. By changing the design matrix, we can use LIMMA for complicated experimental
designs like 2-way ANOVA and randomized complete block designs. We can have a mix of paired and
unpaired samples. We can do analysis of covariance (ANCOVA) or regression. Second, the default
method in LIMMA uses an empirical Bayes estimate to "moderate" the standard deviation in the t-test
denominator using the distribution of all the standard deviations. This can substantially improve the
power, especially with small samples. Recall that although power is the probability of correctly
rejecting the null hypothesis when it is false, if we obtain more power at the same p-value threshold
then we also improve the false discovery rate.

Below are the p-values from the 2000 paired t-tests using the ordinary t-statistic and the moderated t-
statistic. Because our sample size (22) is relatively large, there is not a huge difference between the
two test statistics, although if you look carefully you can see that the first bin is slightly higher for the
moderated test (which is what you expect if there are some truly differentially expressing genes).

The table below shows π0, the estimated proportion of genes that are NOT differentially expressed,
along with the number of tests with p < 0.05.

              Pounds and Cheng π0   Storey π0   tests with p < 0.05
ordinary t    0.758                 0.664       122
moderated t   0.761                 0.679       178

Pounds and Cheng's method estimates that about 25% of the genes are differentially expressed and about
75% are not. Storey's method estimates that roughly 33% are differentially expressed. Both methods
estimate a slightly larger percentage of NON-differentially expressed genes using the moderated test.
And yet, at p<0.05, there are almost 50% more "detections" using the moderated test (178 versus 122).
Why?

The moderated t has more power, so there are fewer non-detections. Even the most conservative of the
4 estimates of π0 implies that about 25% of the genes, or roughly 500, are differentially expressed.
So, even if we reject at p<0.05 and ALL the discoveries are true, the 178 detections from the moderated
test leave roughly 300 or more differentially expressed genes undetected, which is a substantial number
of false nondiscoveries. The moderated method improves power and so reduces the false nondiscovery
rate by using information from the population of genes to improve the estimate of the SD of expression
for each gene.
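The arithmetic behind this argument can be checked directly. The short Python sketch below uses only the π0 estimates and detection counts reported in the table above; the variable names are ours. The largest π0 (0.761) gives 478 genes, which the text rounds to about 500.

```python
m = 2000  # genes tested in colonCA

# pi0 estimates and detections at p < 0.05, copied from the table in the text.
results = {
    "ordinary t": {"pounds_cheng": 0.758, "storey": 0.664, "detected": 122},
    "moderated t": {"pounds_cheng": 0.761, "storey": 0.679, "detected": 178},
}

# The most conservative estimate is the largest pi0
# (it implies the fewest differentially expressed genes).
pi0_max = max(max(r["pounds_cheng"], r["storey"]) for r in results.values())
n_de_min = round((1 - pi0_max) * m)

for name, r in results.items():
    # Even if every detection were a true discovery, at least this many
    # differentially expressed genes would be missed.
    missed = n_de_min - r["detected"]
    print(name, "estimated DE genes >=", n_de_min, "missed >=", missed)
```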

What is a Moderated T-test?

The idea of moderation comes from a deep and surprising statistical result called Stein's paradox [1].
In its simplest form it states that when you are estimating 4 or more population means, the best
estimator is not the 4 or more sample means, but a weighted average of the sample means and the
"mean of the means". One way to find an appropriate weighting is to assume that the population
means themselves come from a population (which is a Bayesian idea). This population is then the
prior, and the best estimator of each population mean is the Bayes predictor, which is a weighted
average of the prior mean and the sample mean. LIMMA uses an Empirical Bayes method which
estimates the prior from the set of all features (since we have 2000 of them - one for each gene).

LIMMA uses this Empirical Bayes method to moderate the sample variances, which are mean squared
deviations (and so are a type of sample mean). Below is a plot of the variances of the genes and beside
it a Gamma distribution whose mean and variance match the mean and variance of the sample
variances. (Got that? We computed 2000 variances and then used them as a sample to compute a
mean and variance.)

The shapes of these two plots are very similar (although the peak of the Gamma distribution is slightly
lower). LIMMA assumes that the variances come from a Gamma distribution and uses the variances
computed for each gene to obtain the empirical estimates of the mean and variance of the Gamma.
This Gamma distribution is the empirical prior distribution for the variances. (Strictly, LIMMA's model
places the Gamma-type prior on the inverse of the variances, but the idea is the same.)

Now consider the plot below of the same Gamma distribution. The mean of the distribution is denoted
by the vertical black line. The other 4 vertical lines are the TRUE (but unknown) variances of 4 genes.
The curves of the same color are the sampling distributions of the sample variances for sample size 22
for a gene with this variance. Notice that the genes with high variance have a sampling distribution that
is very spread out and that there are a lot of sample variances in the left tail. So, if we observe a gene
with a large sample variance, its true variance is likely to be smaller. Conversely (and harder to see on
this plot), for genes with true variance smaller than the mean variance, many of the sample variances
fall below the true value, so if we observe a gene with a small sample variance, its true variance is
likely to be larger. The moderated variance is a weighted average of the black line and the
observed variance for each gene. This pulls the large variances down towards the mean variance
(making the t-tests for those genes more powerful) and the small variances up towards the mean
variance (making the t-tests for those genes less powerful). Moderation is somewhat like having a
larger sample size for estimating the variance because the sample variance (on average) is more likely
to be far from the population variance, while moderated variances are (on average) closer. The
moderated variance also has an associated d.f. that is higher.

This empirical Bayes moderation turns out to be equivalent to having a larger sample size. So, as well
as shrinking the sample variances towards the mean variance, the associated degrees of freedom
increase. These are the d.f. of the null t-distribution for the moderated t-test. Recall that as the d.f. of a
t-test increase, the critical values decrease. Using the moderated variance to estimate the SD has two
effects on the t-test - the SD in the denominator of the test changes and the d.f. increase. The result is
that for genes with large sample variance, power is increased in two ways - the moderated sample
variance is smaller and the d.f. are larger. For genes with small sample variance, power is decreased
by having a larger moderated sample variance and increased by having more d.f.

If you had a sample size of 100 instead of 22, all of the sampling distributions would be narrower and
there would be less shrinkage. In this way, the amount of moderation adjusts automatically to the
amount of information available for each gene.
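To make the weighted average concrete, here is a minimal Python sketch. It assumes the weights are the degrees of freedom, so that the moderated variance is (d0 * s0^2 + d * s^2) / (d0 + d) with d0 + d degrees of freedom, which is the form used in Smyth's model. The prior values `S2_PRIOR` and `DF_PRIOR` are set by hand here purely for illustration; LIMMA estimates them from all of the genes.

```python
import math


def moderate(s2, df, s2_prior, df_prior):
    """Weighted average of a gene's sample variance and the prior variance.

    Returns the moderated variance and its degrees of freedom.
    """
    s2_mod = (df_prior * s2_prior + df * s2) / (df_prior + df)
    return s2_mod, df_prior + df


def moderated_paired_t(mean_d, s2, n, s2_prior, df_prior):
    """Paired t-statistic with the moderated variance in the denominator."""
    s2_mod, df_total = moderate(s2, n - 1, s2_prior, df_prior)
    return mean_d / math.sqrt(s2_mod / n), df_total


# Assumed prior values for illustration; LIMMA estimates these from the data.
S2_PRIOR, DF_PRIOR = 1.0, 4.0

for n in (22, 100):
    big, _ = moderate(4.0, n - 1, S2_PRIOR, DF_PRIOR)      # large sample variance
    small, df = moderate(0.25, n - 1, S2_PRIOR, DF_PRIOR)  # small sample variance
    print(n, round(big, 3), round(small, 3), df)
```

The large variance is pulled down and the small one is pulled up, and both move less when n is 100 than when n is 22, which is the automatic adjustment described above.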

In summary, the moderated t-test is a t-test using the square root of the moderated variance as the SD
instead of the square root of the sample variance. Having a better estimate of the standard deviation
turns out to be reflected in having more power for the tests. You would have the most power for that
sample size if you actually knew the standard deviation, in which case you would use a normal
distribution for the null distribution.

There are many different models for doing variance moderation: Cyber T and SAM are two such
methods available in R, which use moderated variances with a somewhat different justification. I prefer
LIMMA and the associated statistical model because the clever empirical prior used by LIMMA leaves
us with t-tests. Using other software you have to simulate to get the null distribution. With LIMMA,
the t-test using the moderated standard deviation continues to have a t-distribution when the
null is true, but there is a change in the d.f. associated with the moderated variance, which is computed
along with the moderated standard deviation. Simulations done by Smyth and others confirmed that the
moderated t-test gains power. This means that for any given false discovery rate, the false
non-discovery rate will be smaller.

Doing the moderated tests with LIMMA

To perform the moderated tests with LIMMA we start with the normalized data. For two-channel
microarrays, we usually use the normalized M as the data, which will give us a moderated paired t-test.
This is appropriate, because hybridizing the two samples on the same microarray induces a pairing.
As well, for the colonCA data, we have assumed that the two samples on the same array are also from
the same patient.

We require a "design matrix" to determine what needs to be averaged for the test. When there are
dye-swaps, some M values are B-C and others are C-B, so our design matrix is a single column of +/-1
with +1 for Normal-Cancer and -1 for Cancer-Normal. Denoting the design matrix as D and the column
of M values for a gene as M, $D^{T}M$ is the sum of the Normal-Cancer values, and dividing by the
number of arrays n gives $D^{T}M/n$, the average Normal-Cancer value (n=22 here).

Given the data and the design matrix, LIMMA computes the difference in means and the appropriate SD
in one step, and then computes the moderated SD and the moderated t-test in the "eBayes" step. I
then usually use Storey's method to compute q-values and come up with a final gene list.
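The q-value step can be sketched as follows. This is a simplified Python illustration of Storey's idea, not the implementation in any particular package: it takes π0 as a given input (here an assumed value), whereas in practice π0 is estimated from the p-values. The function name and the p-values are invented for the example. With π0 set to 1 the same calculation gives the Benjamini-Hochberg adjusted p-values.

```python
def storey_qvalues(pvalues, pi0):
    """q-values for a list of p-values, given an estimate pi0 of the null proportion."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that each q-value is the smallest
    # estimated FDR over all thresholds at or above that p-value.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        fdr = pi0 * m * pvalues[i] / rank
        running_min = min(running_min, fdr)
        qvalues[i] = running_min
    return qvalues


p = [0.001, 0.02, 0.03, 0.5, 0.9]  # invented p-values
q = storey_qvalues(p, pi0=0.8)     # pi0 assumed for illustration
print([round(x, 4) for x in q])
```

A gene list is then obtained by keeping the genes whose q-value is below the chosen false discovery rate.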

6.2 - Two-Condition Studies on a 1 Channel Microarray

In this section, we work with independent samples. We assume that the measurement platform is a
1-channel microarray such as an Affymetrix array, or a two-channel 'reference design' in which one
channel is devoted to a reference sample, so that M=sample-reference. We start with normalized data
on the log2 scale, in a data matrix with genes in the rows and samples in the columns.

We will use the colonCA data again, but this time we use only the normal samples from the 22 patients
with two samples, and only the cancer samples from the 18 patients that had only cancer samples. This
gives us independent samples.

Our typical situation is that each sample is on a separate array and the analysis will be a two-sample
t-test. Again, we can do this by doing t-tests for every row. But we prefer to do the empirical Bayes or
'moderated' t-test to get a little more power.

In the paired case, we had an M value for each patient which represented the difference in expression,
so our design matrix required only one column which gave the sign of M. In the unpaired case, some
patients provided cancer samples and others provided normal samples. So we need a slightly different
design matrix. I prefer to use what is called the treatment means model. The columns of the design
matrix are indicator variables for each treatment group. So, in this case, we have 2 columns. One
column has a 1 for each normal sample and a 0 for each tumor sample, while the other column has a 1
for each tumor sample and a 0 for each normal sample. Letting D be the design matrix and E the
log2(Expression) matrix (after normalization), ED is a 2000 x 2 matrix of group sums, and dividing
each column by its group size gives $M = ED(D^{T}D)^{-1}$, a 2000 x 2 matrix of group means. The first
column is the sample mean for the normal samples for each gene and the second column is the sample
mean for the tumor samples for each gene. The ordinary (unmoderated) variances are the pooled
within-treatment variances.

To obtain the difference in means, we use a contrast matrix. If C=[1,-1], a 1x2 matrix, then $MC^{T}$
is a 2000x1 matrix with the difference in means for each gene.

In the final step we moderate the pooled within variances exactly as we did in the paired case, and
compute the moderated two-sample t-statistics (using the pooled variances).
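The unmoderated version of this calculation can be written out for a single gene. The Python sketch below is an illustration under our own naming (`two_sample_t` is hypothetical, and the expression values are invented); it follows the treatment means model with a 0/1 design matrix and a contrast, and stops before the moderation step.

```python
import math


def two_sample_t(expr_row, design, contrast):
    """Pooled-variance t-statistic for one gene from a treatment-means design.

    design: one row per sample, one 0/1 indicator column per group.
    contrast: one weight per group, e.g. [1, -1] for normal minus tumor.
    """
    n_groups = len(contrast)
    sizes = [sum(row[g] for row in design) for g in range(n_groups)]
    # Group sums correspond to E times D; dividing by group size gives the means.
    sums = [sum(y * row[g] for y, row in zip(expr_row, design))
            for g in range(n_groups)]
    means = [s / n for s, n in zip(sums, sizes)]
    # Pooled within-group variance: residual sum of squares / (samples - groups).
    rss = sum((y - sum(row[g] * means[g] for g in range(n_groups))) ** 2
              for y, row in zip(expr_row, design))
    df = len(expr_row) - n_groups
    pooled_var = rss / df
    estimate = sum(c * mu for c, mu in zip(contrast, means))
    se = math.sqrt(pooled_var * sum(c * c / n for c, n in zip(contrast, sizes)))
    return estimate / se, df


# Invented log2 expression for one gene: 3 normal samples, then 2 tumor samples.
expr_row = [5.0, 6.0, 7.0, 2.0, 4.0]
design = [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1]]
t, df = two_sample_t(expr_row, design, contrast=[1, -1])
print(round(t, 4), df)
```

With 22 normal and 18 tumor samples the same calculation has 38 d.f. before moderation.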

Below you can see the 2000 p-values from the usual two-sample t-tests and a moderated two-sample
t-test. As we saw with the paired t-test, the moderation does not have a large effect, because the
sample sizes (18 and 22) are relatively large. But we do expect some additional power due to the
moderation.

As we did before, we also estimated π0, the proportion of genes that are not differentially expressed.
We used the Pounds and Cheng method, which is based on the average p-value (the estimate is twice
the average p-value, capped at 1). We also used Storey's method, which uses the area under the flat
part of the p-value histogram. They are somewhat different, but both show that there is quite a bit of
differential expression. At most 65% of the genes are not differentially expressed, so at least 35%
are. As before, our nondetection rate is pretty high, since we expect about 35% of 2000 = 700 genes to
be differentially expressed. And, as before, we detect more "significant" genes with the moderated than
with the unmoderated test, even though the estimates of π0 are quite similar.

              Pounds and Cheng π0   Storey π0   #q < 0.05
ordinary t    0.648                 0.582       326
moderated t   0.648                 0.588       340
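Both π0 estimators are simple functions of the p-values. The Python sketch below shows one common form of each: Pounds and Cheng's estimate as twice the mean p-value (capped at 1), and Storey's estimate from the fraction of p-values above a cutoff. The cutoff `lam=0.5` and the p-values are assumptions made for the example; software implementations of Storey's method typically choose or smooth over the cutoff rather than fixing it.

```python
def pi0_pounds_cheng(pvalues):
    """Twice the average p-value, capped at 1."""
    return min(1.0, 2.0 * sum(pvalues) / len(pvalues))


def pi0_storey(pvalues, lam=0.5):
    """Fraction of p-values above lam, rescaled by the width of (lam, 1]."""
    m = len(pvalues)
    above = sum(1 for p in pvalues if p > lam)
    return min(1.0, above / (m * (1.0 - lam)))


# Invented p-values: a few small ones plus a roughly flat remainder.
p = [0.001, 0.004, 0.01, 0.03, 0.2, 0.45, 0.55, 0.7, 0.8, 0.95]

print(round(pi0_pounds_cheng(p), 3), round(pi0_storey(p), 3))
```

Multiplying 1 - π0 by the number of genes gives the expected number of differentially expressed genes, as in the 35% of 2000 = 700 calculation above.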

So far we have done two different analyses (paired and unpaired), each using part of the data. The
power of LIMMA and other "linear models" methods is that we can fit a model that uses all the data.
We will look at that in Chapter 7.

Source URL: https://onlinecourses.science.psu.edu/stat555/node/8
