Completely Randomized Designs
Gary W. Oehlert
School of Statistics
University of Minnesota
N units
g different treatments
g known treatment group sizes $n_1, n_2, \ldots, n_g$ with $\sum_i n_i = N$
Completely random assignment of treatments to units
Completely random assignment means that every possible grouping of units into g
groups with the given sample sizes is equally likely.
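As a concrete illustration, here is a minimal R sketch of such an assignment; the sizes match the board example below (N = 26 units split into two groups of 13), and the variable names are made up:

```r
sizes <- c(13, 13)                   # n_1, ..., n_g, summing to N = 26
trt <- rep(seq_along(sizes), sizes)  # treatment labels: 13 ones, 13 twos
assignment <- sample(trt)            # a random permutation: every grouping
assignment                           # with these sizes is equally likely
```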
This is the basic experimental design; everything else is a modification.¹
The CRD is
Easiest to do.
Easiest to analyze.
Most resilient when things go wrong.
Often sufficient.
¹ “God invented the integers, the rest is the work of man.” (Leopold Kronecker)
Examples
1. Does a wood board .625 inches thick have the same strength as a .75-inch-thick
wood board with a notch cut down to .625-inch thickness? Twenty-six 2.5-inch by
.75-inch by 3-foot boards. Half are chosen at random to be notched in the center.
The response is load at failure in horizontal bending.
2. Do the efflux transporters P-gp and/or BCRP affect the ability of a certain
chemotherapy drug to cross the blood-brain barrier? We will make 30 in-vitro
measurements of chemo accumulation in cells. Ten will be done with wild type cells,
10 with cells that over-express P-gp, and 10 with cells that over-express BCRP. The
efflux transporters (or not) are randomly assigned to the trials.
3. Do xanthan gum and/or cinnamon affect the sensory quality of gluten-free cookies?
Eight batches of cookies will be made, with two of the eight batches assigned to each
of the four combinations of low/high gum and low/high cinnamon. The response is a
sensory score.
4. How do sling length and size of counterweight affect the throw distance of a
trebuchet? Randomly assign 27 throws to the nine combinations of three lengths and
three weights, with three throws per combination. The response is the distance of the
projectile.
[Slide images: “Experiment like this:” / “Build like this?”]
Inference
“Treatment means vary linearly with temperature” is simpler than “Each treatment
has its own mean” or even “Treatment means vary quadratically with temperature.”
An explanatory model (especially a simple one) helps us understand the data.
All models are wrong; some models are useful. — George Box
We might not believe that the simple model can be completely true in some infinitely
precise sense, but if the data are consistent with it, we use it.
Comparing models
The total sum of squares in the data SST is the sum of the model or explained sum of
squares SSM plus the error or residual sum of squares SSE . For a fixed set of data, if
you change the model making one SS bigger, then the other must get smaller.
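In symbols:

$$SS_T = SS_M + SS_E$$

so whatever one model gains in explained SS over another, it gives back in residual SS.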
We say that the special case model is included in the more complicated model, or
perhaps that it is a restriction of (a restricted version of) the more complicated model.
We sometimes say that the special case model is nested in the more complicated
model, but we will also use the descriptor “nested” in a different way later, so beware.
When we have model A included in model B, then:
1 Model B (fit by LS) always fits at least as well as model A (fit by LS), and usually
fits better.
2 The error sum of squares from model B cannot be larger than the error sum of
squares from model A, and is usually smaller.
3 Equivalently, the model SS for model B is always at least as large and usually
larger than the model SS for model A.
4 The reduction in error SS going from A to B is the same as the increase in model
SS going from A to B.
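A small R illustration of these four facts, using simulated data and invented names:

```r
set.seed(1)
trt <- factor(rep(1:3, each = 8))         # g = 3 groups of 8 units
y   <- rnorm(24, mean = c(1, 2, 3)[trt])  # simulated responses
modelA <- lm(y ~ 1)                       # restricted: one common mean
modelB <- lm(y ~ trt)                     # larger: separate means
deviance(modelA)                          # error SS for model A
deviance(modelB)                          # error SS for B, never larger
anova(modelA, modelB)                     # the increment in SS, with its df
```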
ANOVA
The special case model never fits as well as the larger model, but how do we decide
that it is good enough, that is, that it is consistent with the data?
Significance testing
Information Criteria
Significance testing
We will make an ANOVA table that has a row for the restricted model, a row for the
increment from the restricted model to the larger model, and a row for all of the
residual bits.
Each row in the table has a label, a sum of squares, a “degrees of freedom,” and a
“Mean square.”
Degrees of freedom count free parameters. If there are r1 parameters in the mean
structure of the included model, and r2 parameters in the mean structure of the larger
model, then there are r2 − r1 parameters in the improvement from the small model to
the large model, and N − r2 parameters for residuals (error).
An MS is SS divided by DF.
The generic table looks like this (SS1 is the model SS for the restricted model, SS2
is the model SS for the large model, and SST is the total SS):

Source                                 SS          DF        MS
Model 1                                SS1         r1        SS1/r1
Improvement from Model 1 to Model 2    SS2 − SS1   r2 − r1   (SS2 − SS1)/(r2 − r1)
Error                                  SST − SS2   N − r2    (SST − SS2)/(N − r2)
There are simple formulae for elements of the ANOVA table for many designed
experiments.
Let
$$\bar{y}_{i\bullet} = \frac{\sum_{j=1}^{n_i} y_{ij}}{n_i}$$
be the mean response in the ith treatment, and let
$$\bar{y}_{\bullet\bullet} = \frac{\sum_{i=1}^{g} \sum_{j=1}^{n_i} y_{ij}}{N}$$
be the grand mean response.
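In R, with a response vector y and a treatment factor trt (hypothetical names), these are:

```r
tapply(y, trt, mean)  # the treatment means, ybar_i.
mean(y)               # the grand mean, ybar_..
```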
Suppose that the restricted model is the model that all treatments have the same
mean, and the larger model is the model that each treatment has its own mean. Then:
r1 = 1
r2 = g
$SS_1 = N\bar{y}_{\bullet\bullet}^2$
$SS_2 = \sum_{i=1}^{g} n_i \bar{y}_{i\bullet}^2$
Source               SS                                                                     DF      MS
Overall mean         $N\bar{y}_{\bullet\bullet}^2$                                          1
Between treatments   $\sum_{i=1}^{g} n_i(\bar{y}_{i\bullet} - \bar{y}_{\bullet\bullet})^2$  g − 1   SSTrt/(g − 1)
Error                $\sum_{i=1}^{g}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{i\bullet})^2$        N − g   SSE/(N − g)
and the MS may be denoted MSE and MSTrt .
In fact, the line for the overall mean is so boring that it is usually left off.
Digression on Pythagorean Theorem
Note that
$$y_{ij} = \bar{y}_{\bullet\bullet} + (\bar{y}_{i\bullet} - \bar{y}_{\bullet\bullet}) + (y_{ij} - \bar{y}_{i\bullet})$$
Square both sides and add over all i and j and we get
$$\sum_{i=1}^{g}\sum_{j=1}^{n_i} y_{ij}^2 = N\bar{y}_{\bullet\bullet}^2 + \sum_{i=1}^{g} n_i(\bar{y}_{i\bullet} - \bar{y}_{\bullet\bullet})^2 + \sum_{i=1}^{g}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{i\bullet})^2$$
plus a lot of cross-product sums. All of those cross-product sums add to zero: the
three components of $y_{ij}$ are perpendicular in N-dimensional space, so their sums
of squares add.
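A quick numeric check of the decomposition in R (simulated data, made-up names):

```r
set.seed(2)
trt <- factor(rep(1:3, each = 5))
y   <- rnorm(15, mean = c(10, 12, 11)[trt])
grand <- mean(y)
grp   <- ave(y, trt)          # each observation's group mean
sum(y^2)                      # total (uncorrected) SS...
length(y) * grand^2 +         # ...equals overall mean SS
  sum((grp - grand)^2) +      # plus between-treatment SS
  sum((y - grp)^2)            # plus error SS
```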
Probability model
The ANOVA is just algebra, albeit algebra with statistical intent. We need a
probability model; the standard one is $y_{ij} = \mu_i + \epsilon_{ij}$ with the
$\epsilon_{ij}$ independent $N(0, \sigma^2)$. Under it:
$E(MS_E) = \sigma^2$ (always)
$E(MS_{Trt}) = \sigma^2$ (when the restricted, single mean model is true)
If the restricted model is not good enough, then $E(MS_{Trt})$ is larger than
$\sigma^2$. This means that
$$F = MS_{Trt}/MS_E$$
is a test statistic for comparing the restricted model to the full model; we reject the
null if F is too big.
When the null is true and the normal distribution assumptions are correct, the F
statistic follows an F-distribution with g − 1 and N − g df (the df from the numerator
and denominator MS). Reject the null that the single mean model is true when the
p-value for the F test is too small.
We did the algebra for the single mean model and individual mean model, but the F
test is appropriate for general restricted models versus a containing model. It’s just
that the computations are not always so clean.
Resin example in R.
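A hedged sketch of what that demo might look like; it assumes a data frame resin with a numeric temperature temp and a response y (the actual names in the demo may differ):

```r
resin$tempfac <- factor(resin$temp)    # treatment as a factor
fit1 <- lm(y ~ 1, data = resin)        # restricted: single mean
fitg <- lm(y ~ tempfac, data = resin)  # full: separate means
anova(fit1, fitg)                      # F test of restricted vs. full
anova(fitg)                            # the usual one-way ANOVA table
# the p-value is pf(Fobs, g - 1, N - g, lower.tail = FALSE)
```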
Information criteria
Information criteria combine a measure of how well the model fits the data (smaller
being better) with a penalty for using additional parameters.
We'll say a lot more later, but for now suffice it to say that big L (the maximized
likelihood of the model) is good. With k the number of estimated parameters:
AIC = −2 ln(L) + 2k
BIC = −2 ln(L) + ln(N) k
Choose a model with smaller AIC (or BIC).
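In R, continuing the hypothetical resin fits from above:

```r
AIC(fit1, fitg)  # data frame of df and AIC; prefer the smaller
BIC(fit1, fitg)  # likewise for BIC
```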
In general, AIC tends to choose models with more parameters than we get from
significance testing, i.e., some things in the selected model might be “insignificant.”
The reverse tends to be true for BIC, especially for big data sets.
Except for very small data sets, BIC penalizes additional parameters more than AIC.
BIC thus tends to choose smaller models than AIC.
AIC tends to work better when all candidate models are approximate; BIC tends to
work better in large samples when one of the candidate models is really the right
model.
An analogy: one can give four completely separate ways to identify the same place,
and walking directions are not even unique!
Mean parameters suffer the same issue: there are many ways to describe/parameterize
the same set of means. Sometimes one is better than another in a particular context.
Sometimes one is more understandable than another.
They can all be different yet still correct, but you need to know which ones you’re
working with.
Consider the resin example.
If we have a single mean model, the only parameter is the overall mean µ. Our
estimate would be $\hat{\mu} = \bar{y}_{\bullet\bullet} = 1.465$.
In the separate means model, the parameters are the group means, and the estimates
would be $\hat{\mu}_1 = \bar{y}_{1\bullet} = 1.933$ and so on.
Sometimes we want to write
$$\mu_i = \mu + \alpha_i$$
where µ is some kind of “central value” and $\alpha_i$ is a treatment effect.
Like the walking instructions, there are many, many ways, but there are three
semi-standard ways.
Define µ                           Equivalent constraint
$\mu = \mu_1$                      $\alpha_1 = 0$
$\mu = \sum_i \mu_i / g$           $\sum_i \alpha_i = 0$
$\mu = \sum_i n_i \mu_i / N$       $\sum_i n_i \alpha_i = 0$
The first is the default in R; I find the second more interpretable, and the third is
useful in hand calculations.
The important things (µi − µj = αi − αj ) are the same in all versions.
Care about µ in the single mean model; care about µi and αi − αj in the separate
means model.
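A sketch of the first two parameterizations in R, again using the hypothetical resin fit (the third, weighted constraint has no standard built-in contrast function):

```r
# First: R's default treatment contrasts, so alpha_1 = 0.
coef(lm(y ~ tempfac, data = resin,
        contrasts = list(tempfac = contr.treatment)))
# Second: sum-to-zero effects, so sum(alpha_i) = 0.
coef(lm(y ~ tempfac, data = resin,
        contrasts = list(tempfac = contr.sum)))
# The differences alpha_i - alpha_j agree between the two fits.
```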
What about polynomial models? Let zi be the temperature treatment for group i.
Here are some models
$\mu_i = \beta_0$
$\mu_i = \beta_0 + \beta_1 z_i$
$\mu_i = \beta_0 + \beta_1 z_i + \beta_2 z_i^2$
$\mu_i = \beta_0 + \beta_1 z_i + \beta_2 z_i^2 + \beta_3 z_i^3$
$\mu_i = \beta_0 + \beta_1 z_i + \beta_2 z_i^2 + \beta_3 z_i^3 + \beta_4 z_i^4$
The first is the same as the single mean model, the last fits the same means as the
separate means model, and the others are intermediate.
Note that equivalently written parameters have different meanings (and different
values) in different models.
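In R, these raw polynomial models can be fit with poly(..., raw = TRUE); comparing fits of different degree shows the coefficients shifting (hypothetical resin names again):

```r
fit2 <- lm(y ~ poly(temp, 2, raw = TRUE), data = resin)  # quadratic
fit3 <- lm(y ~ poly(temp, 3, raw = TRUE), data = resin)  # cubic
coef(fit2)  # beta_0, beta_1, beta_2 for the quadratic model
coef(fit3)  # the low-order betas take new values in the cubic model
```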
$$\mu_i = \beta_0 + \beta_1[z_i - 210.0811] + \beta_2[z_i^2 - 422.9 z_i + 44043.5] + \beta_3[z_i^3 - 636.4 z_i^2 + 133812.3 z_i - 9294576.3]$$
This is equivalent to the cubic model on the last slide, but here the βi retain values
and meanings as we change linear to quadratic to cubic (and you can go higher).
These are orthogonal polynomials.
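R's poly() without raw = TRUE uses orthogonal polynomials (scaled differently from those on the slide, but with the same stability property): lower-order coefficients do not change as higher-order terms are added.

```r
coef(lm(y ~ poly(temp, 1), data = resin))  # linear
coef(lm(y ~ poly(temp, 3), data = resin))  # cubic: shared terms unchanged
```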
The moral of the story is that
Parameters are tricksy and can often be defined in many ways within a single
mean structure.
We usually only use parameters as a means to an end.
Most parameters are arbitrary, so inference on parameters (as opposed to model
comparison or comparison of means) is also somewhat arbitrary.
R will compute the estimates as well as standard errors for various parameterizations,
polynomials, orthogonal polynomials, trigonometric series, and so on. They are done
correctly, but they retain the arbitrariness of their definition.
Back to resin.