Completely Randomized Designs
Gary W. Oehlert
School of Statistics
University of Minnesota
N units
g different treatments
g known treatment group sizes $n_1, n_2, \ldots, n_g$ with $\sum_i n_i = N$
Completely random assignment of treatments to units
Completely random assignment means that every possible grouping of units into g
groups with the given sample sizes is equally likely.
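As a concrete illustration, here is a minimal R sketch of such an assignment; the sizes match the board example below (N = 26 units split into two groups of 13), and the variable names are made up:

```r
sizes <- c(13, 13)                   # n_1, ..., n_g, summing to N = 26
trt <- rep(seq_along(sizes), sizes)  # treatment labels: 13 ones, 13 twos
assignment <- sample(trt)            # a random permutation: every grouping
assignment                           # with these sizes is equally likely
```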
This is the basic experimental design; everything else is a modification.¹
The CRD is
Easiest to do.
Easiest to analyze.
Most resilient when things go wrong.
Often sufficient.
¹ “God invented the integers, the rest is the work of man.” (Leopold Kronecker)
Examples
1. Does a wood board .625 inches thick have the same strength as a .75-inch-thick
wood board with a notch cut down to .625-inch thickness? Twenty-six 2.5-inch by
.75-inch by 3-foot boards. Half are chosen at random to be notched in the center.
The response is load at failure in horizontal bending.
2. Do the efflux transporters P-gp and/or BCRP affect the ability of a certain
chemotherapy drug to cross the blood-brain barrier? We will make 30 in-vitro
measurements of chemo accumulation in cells. Ten will be done with wild type cells,
10 with cells that over-express P-gp, and 10 with cells that over-express BCRP. The
efflux transporters (or not) are randomly assigned to the trials.
3. Do xanthan gum and/or cinnamon affect the sensory quality of gluten-free cookies?
Eight batches of cookies will be made, with two of the eight batches assigned to each
of the four combinations of low/high gum and low/high cinnamon. The response is a
sensory score.
4. How do sling length and size of counterweight affect the throw distance of a
trebuchet? Randomly assign 27 throws to the nine combinations of three lengths and
three weights, with three throws per combination. The response is the distance of the
projectile.
[Slide images: “Experiment like this:” / “Build like this?”]
Inference
“Treatment means vary linearly with temperature” is simpler than “Each treatment
has its own mean” or even “Treatment means vary quadratically with temperature.”
An explanatory model (especially a simple one) helps us understand the data.
All models are wrong; some models are useful. — George Box
We might not believe that the simple model can be completely true in some infinitely
precise sense, but if the data are consistent with it, we use it.
Comparing models
The total sum of squares in the data SST is the sum of the model or explained sum of
squares SSM plus the error or residual sum of squares SSE . For a fixed set of data, if
you change the model making one SS bigger, then the other must get smaller.
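In symbols:

$$SS_T = SS_M + SS_E$$

so whatever one model gains in explained SS over another, it gives back in residual SS.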
We say that the special case model is included in the more complicated model, or
perhaps that it is a restriction of (a restricted version of) the more complicated model.
We sometimes say that the special case model is nested in the more complicated
model, but we will also use the descriptor “nested” in a different way later, so beware.
When we have model A included in model B, then:
1 Model B (fit by LS) always fits at least as well as model A (fit by LS), and usually
fits better.
2 The error sum of squares from model B cannot be larger than the error sum of
squares from model A, and is usually smaller.
3 Equivalently, the model SS for model B is always at least as large and usually
larger than the model SS for model A.
4 The reduction in error SS going from A to B is the same as the increase in model
SS going from A to B.
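A small R illustration of these four facts, using simulated data and invented names:

```r
set.seed(1)
trt <- factor(rep(1:3, each = 8))         # g = 3 groups of 8 units
y   <- rnorm(24, mean = c(1, 2, 3)[trt])  # simulated responses
modelA <- lm(y ~ 1)                       # restricted: one common mean
modelB <- lm(y ~ trt)                     # larger: separate means
deviance(modelA)                          # error SS for model A
deviance(modelB)                          # error SS for B, never larger
anova(modelA, modelB)                     # the increment in SS, with its df
```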
ANOVA
The special case model never fits as well as the larger model, but how do we decide
that it is good enough, that is, that it is consistent with the data?
Significance testing
Information Criteria
Significance testing
We will make an ANOVA table that has a row for the restricted model, a row for the
increment from the restricted model to the larger model, and a row for all of the
residual bits.
Each row in the table has a label, a sum of squares, a “degrees of freedom,” and a
“Mean square.”
Degrees of freedom count free parameters. If there are r1 parameters in the mean
structure of the included model, and r2 parameters in the mean structure of the larger
model, then there are r2 − r1 parameters in the improvement from the small model to
the large model, and N − r2 parameters for residuals (error).
An MS is SS divided by DF.
The generic table looks like this (SS1 is the model SS for the restricted model, SS2
is the model SS for the large model, and SST is the total SS):

Source                                 SS          DF        MS
Model 1                                SS1         r1        SS1/r1
Improvement from Model 1 to Model 2    SS2 − SS1   r2 − r1   (SS2 − SS1)/(r2 − r1)
Error                                  SST − SS2   N − r2    (SST − SS2)/(N − r2)
There are simple formulae for elements of the ANOVA table for many designed
experiments.
Let
$$\bar{y}_{i\bullet} = \frac{\sum_{j=1}^{n_i} y_{ij}}{n_i}$$
be the mean response in the ith treatment, and let
$$\bar{y}_{\bullet\bullet} = \frac{\sum_{i=1}^{g} \sum_{j=1}^{n_i} y_{ij}}{N}$$
be the grand mean response.
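In R, with a response vector y and a treatment factor trt (hypothetical names), these are:

```r
tapply(y, trt, mean)  # the treatment means, ybar_i.
mean(y)               # the grand mean, ybar_..
```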
Suppose that the restricted model is the model that all treatments have the same
mean, and the larger model is the model that each treatment has its own mean. Then:
r1 = 1
r2 = g
$SS_1 = N\bar{y}_{\bullet\bullet}^2$
$SS_2 = \sum_{i=1}^{g} n_i \bar{y}_{i\bullet}^2$
Source               SS                                                                     DF      MS
Overall mean         $N\bar{y}_{\bullet\bullet}^2$                                          1
Between treatments   $\sum_{i=1}^{g} n_i(\bar{y}_{i\bullet} - \bar{y}_{\bullet\bullet})^2$  g − 1   SSTrt/(g − 1)
Error                $\sum_{i=1}^{g}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{i\bullet})^2$        N − g   SSE/(N − g)
and the MS may be denoted MSE and MSTrt .
In fact, the line for the overall mean is so boring that it is usually left off.
Digression on Pythagorean Theorem
Note that
$$y_{ij} = \bar{y}_{\bullet\bullet} + (\bar{y}_{i\bullet} - \bar{y}_{\bullet\bullet}) + (y_{ij} - \bar{y}_{i\bullet})$$
Square both sides and add over all i and j and we get
$$\sum_{i=1}^{g}\sum_{j=1}^{n_i} y_{ij}^2 = N\bar{y}_{\bullet\bullet}^2 + \sum_{i=1}^{g} n_i(\bar{y}_{i\bullet} - \bar{y}_{\bullet\bullet})^2 + \sum_{i=1}^{g}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{i\bullet})^2$$
plus a lot of cross-product sums. All of those cross-product sums add to zero: the
three components of $y_{ij}$ are perpendicular in N-dimensional space, so their sums
of squares add.
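A quick numeric check of the decomposition in R (simulated data, made-up names):

```r
set.seed(2)
trt <- factor(rep(1:3, each = 5))
y   <- rnorm(15, mean = c(10, 12, 11)[trt])
grand <- mean(y)
grp   <- ave(y, trt)          # each observation's group mean
sum(y^2)                      # total (uncorrected) SS...
length(y) * grand^2 +         # ...equals overall mean SS
  sum((grp - grand)^2) +      # plus between-treatment SS
  sum((y - grp)^2)            # plus error SS
```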
Probability model
The ANOVA is just algebra, albeit algebra with statistical intent. We need a
probability model; the standard one is $y_{ij} = \mu_i + \epsilon_{ij}$ with the
$\epsilon_{ij}$ independent $N(0, \sigma^2)$. Under it:
$E(MS_E) = \sigma^2$ (always)
$E(MS_{Trt}) = \sigma^2$ (when the restricted, single mean model is true)
If the restricted model is not good enough, then $E(MS_{Trt})$ is larger than
$\sigma^2$. This means that
$$F = MS_{Trt}/MS_E$$
is a test statistic for comparing the restricted model to the full model; we reject the
null if F is too big.
When the null is true and the normal distribution assumptions are correct, the F
statistic follows an F-distribution with g − 1 and N − g df (the df from the numerator
and denominator MS). Reject the null that the single mean model is true when the
p-value for the F test is too small.
We did the algebra for the single mean model and individual mean model, but the F
test is appropriate for general restricted models versus a containing model. It’s just
that the computations are not always so clean.
Resin example in R.
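A hedged sketch of what that demo might look like; it assumes a data frame resin with a numeric temperature temp and a response y (the actual names in the demo may differ):

```r
resin$tempfac <- factor(resin$temp)    # treatment as a factor
fit1 <- lm(y ~ 1, data = resin)        # restricted: single mean
fitg <- lm(y ~ tempfac, data = resin)  # full: separate means
anova(fit1, fitg)                      # F test of restricted vs. full
anova(fitg)                            # the usual one-way ANOVA table
# the p-value is pf(Fobs, g - 1, N - g, lower.tail = FALSE)
```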
Information criteria
Information criteria combine a measure of how well the model fits the data (smaller
being better) with a penalty for using additional parameters.
We'll say a lot more later, but for now suffice it to say that big L (the maximized
likelihood of the model) is good. With k the number of estimated parameters:
AIC = −2 ln(L) + 2k
BIC = −2 ln(L) + ln(N) k
Choose a model with smaller AIC (or BIC).
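In R, continuing the hypothetical resin fits from above:

```r
AIC(fit1, fitg)  # data frame of df and AIC; prefer the smaller
BIC(fit1, fitg)  # likewise for BIC
```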
In general, AIC tends to choose models with more parameters than we get from
significance testing, i.e., some things in the selected model might be “insignificant.”
The reverse tends to be true for BIC, especially for big data sets.
Except for very small data sets, BIC penalizes additional parameters more than AIC.
BIC thus tends to choose smaller models than AIC.
AIC tends to work better when all candidate models are approximate; BIC tends to
work better in large samples when one of the candidate models is really the right
model.
An analogy: one can give four completely separate ways to identify the same place,
and walking directions are not even unique!
Mean parameters suffer the same issue: there are many ways to describe/parameterize
the same set of means. Sometimes one is better than another in a particular context.
Sometimes one is more understandable than another.
They can all be different yet still correct, but you need to know which ones you’re
working with.
Consider the resin example.
If we have a single mean model, the only parameter is the overall mean µ. Our
estimate would be $\hat{\mu} = \bar{y}_{\bullet\bullet} = 1.465$.
In the separate means model, the parameters are the group means, and the estimates
would be $\hat{\mu}_1 = \bar{y}_{1\bullet} = 1.933$ and so on.
Sometimes we want to write
$$\mu_i = \mu + \alpha_i$$
where µ is some kind of “central value” and $\alpha_i$ is a treatment effect.
Like the walking instructions, there are many, many ways, but there are three
semi-standard ways.
Define µ                           Equivalent constraint
$\mu = \mu_1$                      $\alpha_1 = 0$
$\mu = \sum_i \mu_i / g$           $\sum_i \alpha_i = 0$
$\mu = \sum_i n_i \mu_i / N$       $\sum_i n_i \alpha_i = 0$
The first is the default in R; I find the second more interpretable, and the third is
useful in hand calculations.
The important things (µi − µj = αi − αj ) are the same in all versions.
Care about µ in the single mean model; care about µi and αi − αj in the separate
means model.
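A sketch of the first two parameterizations in R, again using the hypothetical resin fit (the third, weighted constraint has no standard built-in contrast function):

```r
# First: R's default treatment contrasts, so alpha_1 = 0.
coef(lm(y ~ tempfac, data = resin,
        contrasts = list(tempfac = contr.treatment)))
# Second: sum-to-zero effects, so sum(alpha_i) = 0.
coef(lm(y ~ tempfac, data = resin,
        contrasts = list(tempfac = contr.sum)))
# The differences alpha_i - alpha_j agree between the two fits.
```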
What about polynomial models? Let zi be the temperature treatment for group i.
Here are some models
$\mu_i = \beta_0$
$\mu_i = \beta_0 + \beta_1 z_i$
$\mu_i = \beta_0 + \beta_1 z_i + \beta_2 z_i^2$
$\mu_i = \beta_0 + \beta_1 z_i + \beta_2 z_i^2 + \beta_3 z_i^3$
$\mu_i = \beta_0 + \beta_1 z_i + \beta_2 z_i^2 + \beta_3 z_i^3 + \beta_4 z_i^4$
The first is the same as the single mean model, the last fits the same means as the
separate means model, and the others are intermediate.
Note that equivalently written parameters have different meanings (and different
values) in different models.
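In R, these raw polynomial models can be fit with poly(..., raw = TRUE); comparing fits of different degree shows the coefficients shifting (hypothetical resin names again):

```r
fit2 <- lm(y ~ poly(temp, 2, raw = TRUE), data = resin)  # quadratic
fit3 <- lm(y ~ poly(temp, 3, raw = TRUE), data = resin)  # cubic
coef(fit2)  # beta_0, beta_1, beta_2 for the quadratic model
coef(fit3)  # the low-order betas take new values in the cubic model
```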
$$\mu_i = \beta_0 + \beta_1[z_i - 210.0811] + \beta_2[z_i^2 - 422.9 z_i + 44043.5] + \beta_3[z_i^3 - 636.4 z_i^2 + 133812.3 z_i - 9294576.3]$$
This is equivalent to the cubic model on the last slide, but here the βi retain values
and meanings as we change linear to quadratic to cubic (and you can go higher).
These are orthogonal polynomials.
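R's poly() without raw = TRUE uses orthogonal polynomials (scaled differently from those on the slide, but with the same stability property): lower-order coefficients do not change as higher-order terms are added.

```r
coef(lm(y ~ poly(temp, 1), data = resin))  # linear
coef(lm(y ~ poly(temp, 3), data = resin))  # cubic: shared terms unchanged
```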
The moral of the story is that
Parameters are tricksy and can often be defined in many ways within a single
mean structure.
We usually only use parameters as a means to an end.
Most parameters are arbitrary, so inference on parameters (as opposed to model
comparison or comparison of means) is also somewhat arbitrary.
R will compute the estimates as well as standard errors for various parameterizations,
polynomials, orthogonal polynomials, trigonometric series, and so on. They are done
correctly, but they retain the arbitrariness of their definition.
Back to resin.