Completely Randomized Designs (CRD)
Completely Randomized Designs
• The completely randomized design is a simple experimental design, in terms of both data analysis and convenience. With this design, participants are randomly assigned to treatments (a minimal R sketch of this step follows).
• The main goal is to compare the treatments.
• We will assume that the population from which we select the sample is large compared to the sample size, so we can ignore any finite population correction factors.
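As a small illustration of the randomization step, here is a minimal R sketch (the participant IDs and group sizes are made up for illustration):
# randomly assign 20 hypothetical participants to two treatments of 10 each
set.seed(1)  # for a reproducible assignment
participants <- 1:20
treatment <- sample(rep(c("A", "B"), each = 10))  # random permutation of labels
data.frame(participants, treatment)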
Completely Randomized Design with Two Treatments
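For reference, the pooled two-sample t statistic used with this design (standard notation; 𝑦̄1, 𝑦̄2 are the treatment sample means and 𝑠p² is the pooled sample variance):

𝑡 = (𝑦̄1 − 𝑦̄2) / ( 𝑠p √(1/𝑛1 + 1/𝑛2) ),  𝑠p² = [ (𝑛1 − 1)𝑠1² + (𝑛2 − 1)𝑠2² ] / (𝑛1 + 𝑛2 − 2)

Under 𝐻0: 𝜇1 = 𝜇2, 𝑡 has a 𝑡 distribution with 𝑛1 + 𝑛2 − 2 degrees of freedom.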
Example (Completely Randomized Design with Two Treatments)
Example (Completely Randomized Design with Two Treatments), Using R
grades <- read.table("grades2ttest.txt", header = TRUE)  # read the data file
head(grades)
Example (Completely Randomized Design with Two Treatments), Using R
t.test(Grade ~ Method, var.equal = TRUE, data = grades)  # pooled two-sample t-test
Example (Completely Randomized Design with Two Treatments), Testing 𝜎₁² = 𝜎₂² (against 𝜎₁² ≠ 𝜎₂²)
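One way to carry out this test in R is the F test for equality of two variances, available as var.test; a minimal sketch using the grades data read above:
# F test of H0: sigma1^2 = sigma2^2 against a two-sided alternative
var.test(Grade ~ Method, data = grades)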
Systematic Differences Between the Populations
Effect of B on Type I Error
# R code for calculating the actual Type I error rate when a bias B exists;
# f indexes the size of the bias: a shift of f*z in the test statistic moves
# the acceptance region from (-z, z) to (-(1+f)z, (1-f)z)
alpha <- c(0.01, 0.05, 0.10)            # nominal significance levels
z <- qnorm(1 - alpha/2)                 # two-sided critical values
f <- NA                                 # bias fractions f = 0.1, ..., 1.0
l <- matrix(0, ncol = 3, nrow = 10)     # lower limits of the acceptance region
u <- matrix(0, ncol = 3, nrow = 10)     # upper limits of the acceptance region
prob <- matrix(0, ncol = 3, nrow = 10)  # P(do not reject H0)
typeIerror <- matrix(0, ncol = 3, nrow = 10)
dimnames(typeIerror) <- list(c((1:10)/10), c("0.01", "0.05", "0.10"))
for (i in (1:10)) {
  f[i] <- i/10
  for (j in (1:3)) {
    l[i, j] <- -(1 + f[i]) * z[j]
    u[i, j] <- (1 - f[i]) * z[j]
    prob[i, j] <- pnorm(u[i, j]) - pnorm(l[i, j])
    typeIerror[i, j] <- 1 - prob[i, j]
  }
}
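To inspect the resulting error rates (rows index the bias fraction f, columns the nominal level), one can print the matrix:
round(typeIerror, 4)  # actual Type I error rate for each f and nominal alpha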
Comparing 𝑎 (> 2) treatments: one-way ANOVA
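For reference, the standard one-way ANOVA model for a single-factor CRD with 𝑎 treatments (standard notation, consistent with the residual notation used later in these slides):

𝑦𝑖𝑗 = 𝜇 + 𝜏𝑖 + 𝜖𝑖𝑗 ,  𝑖 = 1, …, 𝑎,  𝑗 = 1, …, 𝑛𝑖 ,  with 𝜖𝑖𝑗 ∼ 𝑁(0, 𝜎²) independent,

where 𝜇 is the overall mean and 𝜏𝑖 is the effect of treatment 𝑖, so that 𝜇𝑖 = 𝜇 + 𝜏𝑖. The hypothesis of no treatment differences is 𝐻0: 𝜇1 = ⋯ = 𝜇𝑎.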
The Analysis of Variance for a single factor CRD
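For reference, the standard sum-of-squares decomposition behind the ANOVA table (standard notation; 𝑁 = 𝑛1 + ⋯ + 𝑛𝑎 is the total sample size):

SST = ΣᵢΣⱼ (𝑦𝑖𝑗 − 𝑦̄··)² = SSTreatments + SSE,
SSTreatments = Σᵢ 𝑛𝑖 (𝑦̄𝑖· − 𝑦̄··)²  (𝑎 − 1 df),  SSE = ΣᵢΣⱼ (𝑦𝑖𝑗 − 𝑦̄𝑖·)²  (𝑁 − 𝑎 df).

The test statistic 𝐹 = MSTreatments/MSE = [SSTreatments/(𝑎 − 1)] / [SSE/(𝑁 − 𝑎)] has, under 𝐻0, an 𝐹 distribution with 𝑎 − 1 and 𝑁 − 𝑎 degrees of freedom.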
Example
Solution
Example (Using R)
# R code for one-way ANOVA
grades <- read.table("1wayanova.txt", header = TRUE)  # read the data file
head(grades)
Example (Using R)
# group means, standard deviations, and sample sizes by teaching method
grp.means <- with(grades, tapply(Grade, Method, mean))
grp.means
1        2        3
65.12000 59.93333 66.84000
grp.StdDev <- with(grades, tapply(Grade, Method, sd))
grp.StdDev
1         2         3
13.264363 11.614300  9.762513
grp.n <- with(grades, tapply(Grade, Method, length))
grp.n
1  2  3
25 30 25
Example (Using R)
grades$Method <- as.factor(grades$Method)  # treat Method as a factor, not a number
fit <- aov(Grade ~ Method, data = grades)
anova(fit)
Example (Using R)
• Printing the aov fit alone also gives similar information (without the familiar-looking ANOVA table):
• fit <- aov(Grade ~ Method, data = grades)
• fit
Model Checking
• To check if the model assumptions are at least approximately satisfied by the data, we can look at the residuals, given by 𝑒𝑖𝑗 = 𝑦𝑖𝑗 − 𝑦̄𝑖· .
• Ignoring the factor 1 − 1/𝑛𝑖 (which appears in the variance of 𝑒𝑖𝑗), some authors (e.g. Montgomery) define the standardized residuals (semi-standardized residuals) as 𝑑𝑖𝑗 = 𝑒𝑖𝑗 / √MSE .
Model Checking
• If the errors 𝜖𝑖𝑗 ∼ 𝑁(0, 𝜎²), then the standardized residuals should be approximately 𝑁(0, 1).
• To check whether the distribution of the error term is Normal, we look at a plot of the residuals against their Normal quantiles.
• If the points on the Normal quantile plot deviate substantially from a straight line, the distribution of the residuals is not Normal.
Example
• qqnorm(residuals(fit))
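A reference line is often added to make departures from Normality easier to see; a minimal sketch, continuing from the call above:
• qqline(residuals(fit))  # line through the first and third quartiles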
Model Checking
• If the model is correct and the assumptions are satisfied, then the residuals should not be related to any variable.
• In particular, they should not be related to the predicted (fitted) values.
• As a check of model adequacy, we often look at a plot of the residuals / standardized residuals against the predicted values (a sketch in R follows).
• Any pattern or unusual structure in this plot indicates inadequacy of the model or a departure from the assumptions.
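A minimal sketch of this diagnostic plot in R, using the fit object from the example above:
• plot(fitted(fit), residuals(fit), xlab = "Fitted values", ylab = "Residuals")
• abline(h = 0, lty = 2)  # horizontal reference line at zero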
Multiple comparisons
• The significance level used for each individual hypothesis (e.g. 0.05) is called the individual error rate.
• There are many procedures designed to control the family error rate when making multiple comparisons.
• The simplest approach is to lower the individual error rate so that the family error rate is less than 𝛼.
Multiple comparisons (Bonferroni’s method)
• For example, if we have 𝑔 null hypotheses to be tested and we want the family error rate to be no more than 𝛼, then we can test each hypothesis with an individual error rate of 𝛼/𝑔.
• In this case we can show that the family error rate is less than or equal to 𝛼, the desired level. For example, with 𝑔 = 3 pairwise comparisons and 𝛼 = 0.05, each test is carried out at level 0.05/3 ≈ 0.0167.
Multiple comparisons (Bonferroni’s method)
• This approach can be thought of as "𝛼-splitting". If 𝑔 inferences (tests or confidence intervals) are each made at some level 𝛼∗, the maximum possible "overall error rate" is 𝑔𝛼∗.
• We choose 𝛼∗ = 𝛼/𝑔, so that the overall error rate is less than or equal to 𝛼.
• This family error rate can be smaller than the desired rate 𝛼, so Bonferroni’s method is conservative.
Tukey’s method
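For reference, Tukey’s procedure in its Tukey–Kramer form (standard notation) declares means 𝑖 and 𝑗 significantly different when

|𝑦̄𝑖· − 𝑦̄𝑗·| > 𝑞𝛼(𝑎, 𝑁 − 𝑎) √( (MSE/2)(1/𝑛𝑖 + 1/𝑛𝑗) ),

where 𝑞𝛼(𝑎, 𝑁 − 𝑎) is the upper-𝛼 point of the studentized range distribution. The R call below computes this critical value.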
a <- 5                    # number of treatment means
N <- 25                   # total number of observations
alpha <- 0.05
qtukey(1 - alpha, nmeans = a, df = N - a)  # critical value q_alpha(a, N - a)
[1] 4.231857
Inferences about Individual Means
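For reference, the standard 100(1 − 𝛼)% confidence interval for an individual treatment mean 𝜇𝑖, using MSE as the pooled estimate of 𝜎²:

𝑦̄𝑖· ± 𝑡𝛼/2, 𝑁−𝑎 √(MSE/𝑛𝑖)

A minimal sketch in R (using the objects ybar, n, N, a, and mse computed in the example that follows):
ybar[1] + c(-1, 1) * qt(0.975, df = N - a) * sqrt(mse/n[1])  # 95% CI for mu_1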
Example (revisiting)
Side-by-side Boxplots for Comparing treatments
• boxplot(Grade ~ Method, data = grades)
Example (cont.)
• pairwise.t.test(grades$Grade, grades$Method, p.adj = "none")
Example (cont.)
• pairwise.t.test(grades$Grade, grades$Method, p.adj = "bonf")
Example (cont.)
• TukeyHSD(fit, conf.level=0.95)
Example (cont.), calculating p adj
grades <- read.table("1wayanova.txt", header = TRUE)  # read the data file
ybar <- with(grades, tapply(Grade, Method, mean))     # treatment means
ybar
1        2        3
65.12000 59.93333 66.84000
n <- with(grades, tapply(Grade, Method, length))      # treatment sample sizes
n
1  2  3
25 30 25
N <- sum(n)
a <- 3
Example (cont.), calculating p adj
df <- N-a # Degrees of freedom error
q12 <- abs(ybar[1]ybar[2])/sqrt((mse/2)*(1/n[1]+1/n[2]))
# Compute p- adj = P(Q > q12)
p_value12 <- 1 - ptukey(q12, nmeans = a, df = N-a)
p_value12
0.2326663
abs(ybar[1]-ybar[2])
5.186667
58
Example (cont.)
• par(mfrow=c(2,1))
• plot(TukeyHSD(fit, conf.level=0.90))
• plot(TukeyHSD(fit, conf.level=0.95))
Example (cont.) Interpretations
• The ANOVA table in the R output shows that the p-value is 0.07572 < 0.10, so the effect of the teaching method is significant at 𝛼 = 0.10.
• The multiple comparison procedure (Bonferroni’s method) shows that methods 2 and 3 are significantly different at the 10 percent level of significance.
• For methods 2 and 3, Tukey’s plot of the 90% confidence interval does not include 0.
Contrasts
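For reference, the standard definition: a contrast of the treatment means is a linear combination

Γ = 𝑐1𝜇1 + 𝑐2𝜇2 + ⋯ + 𝑐𝑎𝜇𝑎  with  𝑐1 + 𝑐2 + ⋯ + 𝑐𝑎 = 0.

It is estimated by 𝐶 = Σ 𝑐𝑖 𝑦̄𝑖· , which has variance 𝜎² Σ(𝑐𝑖²/𝑛𝑖); hypotheses about Γ can be tested with a t test or with the equivalent F test shown later in these slides.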
Example
Test 𝐻0: 𝜇1 + 𝜇2 − 2𝜇3 = 0 against 𝐻1: 𝜇1 + 𝜇2 − 2𝜇3 ≠ 0
grades <- read.table("1wayanova.txt", header = TRUE)  # read the data file
ybar <- with(grades, tapply(Grade, Method, mean))
ybar
1        2        3
65.12000 59.93333 66.84000
n <- with(grades, tapply(Grade, Method, length))
n
1  2  3
25 30 25
N <- sum(n)
a <- 3
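The code that follows uses a contrast vector c and an estimated contrast ctr; a minimal reconstruction consistent with the hypothesis above and the ssc value computed later:
c <- c(1, 1, -2)      # contrast coefficients for mu1 + mu2 - 2*mu3
ctr <- sum(c * ybar)  # estimated contrast
ctr
[1] -8.626667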
Example
grades$Method <- as.factor(grades$Method)
fit <- aov(Grade ~ Method, data = grades)
anova(fit)
Example
library(lsmeans)
lsm <- lsmeans(fit, ~ Method)  # least-squares (estimated marginal) means
lsm
contrast(lsm, list(c))         # test the contrast defined above
Contrasts (F-test)
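For reference, the standard F test of 𝐻0: Σ 𝑐𝑖𝜇𝑖 = 0 (standard notation):

SSC = (Σ 𝑐𝑖 𝑦̄𝑖·)² / Σ(𝑐𝑖²/𝑛𝑖),  𝐹 = SSC / MSE,

and under 𝐻0 the statistic 𝐹 has an 𝐹 distribution with 1 and 𝑁 − 𝑎 degrees of freedom. The R code below implements this test.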
Contrasts (F-test) Using R
# F test of the contrast
ssc <- ctr^2/sum(c^2/n)  # contrast sum of squares
ssc
[1] 318.9402
F <- ssc/mse             # F statistic with 1 and N - a degrees of freedom
F
[1] 2.35643
pval_F <- 1 - pf(F, 1, N - a)
pval_F
[1] 0.1288664