Design of Experiment (G2)
Bharat Agarwal 2019B4A70725P (Leader)
Sriram Ranga 2017A7PS0047P
Anmol Kaushik 2018B4AB0888P
Rituj Mittal 2018B4A80915P
Sonam Kataria 2018B4TS1165P
Neha Saini 2018B4TS1168P
Atmesh Mahapatra 2019B4A30560P
Pranay Bansal 2019B4A10471P
1. Design of Experiments
a. Randomisation
b. Replication
c. Local Control
d. Optimal Design
f. Quasi-Experimental Design
6. References
DESIGN OF EXPERIMENTS
'Design for the experiment, don't experiment for the design.'
In the literature on the design and analysis of experiments far more emphasis has been
placed on analysis than on design. In fact, the experimental design determines the manner
in which the experiment is carried out, which is reflected in the statistical model, which in
turn determines the appropriate techniques of analysis.
The main question that now arises is how to obtain the data such that the assumptions are
met and the data are readily available for the application of the tools of statistical
inference. The first step is to obtain sufficient experimental units; the next is to use
an appropriate randomisation procedure for allocating the experimental units to the
treatments in a random fashion.
The Design of Experiment provides a method by which the treatments are placed at
random on the experimental units in such a way that the responses are estimated with
the utmost precision possible.
Each of the three basic principles (Randomisation, Replication and Local Control) is analogous
to one of the three rectangular faces of a triangular prism: each plays an essential role, and
together they are exhaustive for the Design of the Experiment.
Randomisation
Randomisation forms the basis of a valid experiment, as it involves the allocation of
treatments to the experimental units at random, which ensures that every possible allotment
of treatments has the same probability.
Replication
Replication serves as the validator of the experiment: by repeating each treatment on
more than one experimental unit, we obtain a more reliable estimate of the
experimental error.
Local Control
Local Control involves grouping homogeneous experimental units together into
groups / blocks, so that the variation within each block is kept small, with the aim of
reducing the experimental error.
In a Completely Randomised Design (CRD), each experimental unit is randomly assigned to a
group, and each group receives a different treatment. Hence, every unit in the same group
receives the same treatment, and we can finally compare the results for each treatment.
Advantages of CRD
● This is the most basic setup among the different experimental designs and hence is
relatively easy to implement and to analyse statistically.
● It requires certain strong a priori assumptions to carry out further analysis.
● There is no restriction on the number of replications for the different treatments.
● Example: Suppose there are 3 treatments and 15 experimental units:
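The example above can be sketched in Python as follows. This is only a minimal illustration, assuming equal replication (5 units per treatment); the function name `completely_randomised_design`, the treatment labels T1 to T3 and the unit numbers 1 to 15 are hypothetical choices made for the sketch.

```python
import random


def completely_randomised_design(units, treatments, seed=None):
    """Shuffle the units and deal them out to the treatments in equal groups."""
    units = list(units)
    if len(units) % len(treatments) != 0:
        raise ValueError("this sketch assumes equal replication")
    rng = random.Random(seed)
    rng.shuffle(units)  # every ordering of the units is equally likely
    per_group = len(units) // len(treatments)
    return {
        treatment: sorted(units[i * per_group:(i + 1) * per_group])
        for i, treatment in enumerate(treatments)
    }


# 15 experimental units allocated at random to 3 treatments
layout = completely_randomised_design(range(1, 16), ["T1", "T2", "T3"], seed=1)
for treatment, assigned in layout.items():
    print(treatment, assigned)
```

Unequal replication could be handled by passing explicit group sizes instead of dividing the units evenly.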
Disadvantages of CRD
● It can be used only when the entire experimental material is homogeneous, i.e. every
experimental unit has identical characteristics. Hence, it is often inefficient, because it
is not always possible to gather a sufficient number of homogeneous units for an
experiment.
● The results may be skewed if we force the assumption that our experimental material is
homogeneous, because nuisance / extraneous factors may be present that were left
unaccounted for as they were not of primary interest.
● The variability among the experimental units, i.e. the variation in the response
variable, is absorbed entirely into the experimental error, a quantity which we are
expected to minimise.
● Not suitable for a larger number of treatments, as this would necessitate more
experimental material, which would increase the variation.
In a Randomised Block Design (RBD), the concept of Blocking is used to remove the effects of a
few of the most important extraneous / nuisance variables. The basic idea is to create
homogeneous blocks in which these extraneous factors are held constant and the factor of
interest is allowed to vary.
● The experimental units are first sorted into homogeneous groups / blocks, and then all the
treatments are randomly assigned to the units within each block.
● Every treatment appears in every block, so the number of units in any block is equal to the
number of treatments. The treatments are allocated at random to the experimental units in
each block.
● Each treatment is replicated once in every block. Hence, the number of replications is the
same as the number of blocks.
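The within-block randomisation described above can be sketched as follows. This is an illustrative sketch rather than a prescribed procedure: the function name `randomised_block_design`, the block labels B1 to B3 and the unit labels are all hypothetical.

```python
import random


def randomised_block_design(blocks, treatments, seed=None):
    """Assign every treatment once, in random order, within each block."""
    rng = random.Random(seed)
    layout = {}
    for block, units in blocks.items():
        if len(units) != len(treatments):
            raise ValueError("each block needs exactly one unit per treatment")
        order = list(treatments)
        rng.shuffle(order)  # a fresh, independent randomisation for each block
        layout[block] = dict(zip(units, order))
    return layout


# Three hypothetical blocks, each holding one unit per treatment
blocks = {
    "B1": ["u1", "u2", "u3"],
    "B2": ["u4", "u5", "u6"],
    "B3": ["u7", "u8", "u9"],
}
layout = randomised_block_design(blocks, ["T1", "T2", "T3"], seed=7)
for block, assignment in layout.items():
    print(block, assignment)
```

Because each block contains every treatment exactly once, the number of replications of a treatment equals the number of blocks (3 in this sketch).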
Advantages of RBD
● Effectively handles non-homogeneous experimental material.
● It has flexibility to accommodate any number of treatments, blocks, and replications.
● The different treatments need not have equal sample sizes.
● Smaller error variance: the Local Control principle ensures this, because the blocks are
homogeneous and because the variation due to differences among blocks is separated out of
the error variance. Thus, this design is preferable to the Completely Randomised
Design, which has high experimental error when the variability among experimental units
is high.
● Relatively easy statistical analysis, even with missing data.
● If an entire treatment or a block needs to be dropped from the analysis for some reason,
such as spoiled results, the analysis is not complicated thereby.
Disadvantages of RBD
● Not suitable for a large number of treatments, because the block size becomes too large.
The basic idea of the Randomised Block Design is to reduce the variability within
blocks, but as the block size increases it becomes harder to keep the units within a
block homogeneous, and we deviate from this basic setup.
● It requires stronger assumptions than a Completely Randomised Design, such as no
interaction between treatments and blocks and constant variance from block to block.
Any interaction between block and treatment effects increases the error.
When the experimental units represent physical entities, smaller groups or blocks of
experimental units usually result in greater homogeneity. Hence, we don't favour using a
block design with more than the minimum of x experimental units per block, where x represents
the number of levels of the treatment factor. But in cases where the experimental runs
represent trials rather than physical entities, and the runs can be made quickly, larger
block sizes do not necessarily increase the variability of the experimental units within a
block. This is the setting for the Generalised Randomised Block Design (GRBD), in which the
treatments are replicated within each block.
Advantages of GRBD
● Helpful in cases where there is uncertainty over the block-treatment interaction.
● Helpful in cases where the experimental units represent trials rather than physical
entities.
Disadvantages of GRBD
● Higher cost of experimentation because of the additional replications.
● In some experimental settings, using one factor to block the experimental units
doesn't lead to satisfactory precision; hence two blocking factors other than the
treatment factor may be required to yield higher precision.
Optimal Design
When the factor levels, i.e. treatments, are continuous rather than discrete, designs
such as the Randomised Block Design can't study all the factor levels to determine the
value of the factor that optimises our response variable. They assume a simple setup
for the experiment, which is inappropriate for this goal under the given constraints,
so this is accomplished with the help of Optimal Design.
In Bayesian Experimental Design, η denotes the experimental design, y the data observed
under that design, θ the unknown parameters, and d the decision taken for data y. The aim
of this approach is to maximise the expected utility function: the value of d that does so
is the best decision, and the design η paired with that decision is the Bayesian
Experimental Design.
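The utility expression that these symbols belong to is not written out in the text. A commonly cited general form of the expected utility in the Bayesian experimental design literature is sketched below as an assumption, not as the exact expression intended by the authors; the utility function U and the densities p are generic placeholders.

```latex
\[
U(\eta) = \int_{Y} \max_{d} \int_{\Theta}
          U(d, \eta, y, \theta)\, p(\theta \mid y, \eta)\, p(y \mid \eta)\,
          \mathrm{d}\theta \, \mathrm{d}y ,
\qquad
\eta^{*} = \arg\max_{\eta} U(\eta)
\]
```

Read from the inside out: for each possible data set y the best decision d is chosen, and the design η is then chosen to maximise the utility averaged over the data that the design could produce.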
Quasi-Experimental Design
A Quasi-Experimental Design aims at establishing a cause-and-effect relationship
between the dependent and independent variables, where assignment is done on some
specific, non-random criterion, thus allowing the experimenter to control the
assignment to the treatment condition. It is usually used in cases where randomisation
is impractical and / or unethical.
EXAMPLES
Completely Randomised Design
Suppose the BITS Admin wants to test which academic structure is less stressful for the
students:
1) T1 T2 T3 or 2) Midsem-Compre.
Assumptions:
● The population corresponding to both treatments is normally distributed.
● Both treatments have the same population variance.
● All experimental units are independent of each other in their working.
There will be 2 groups, one for each academic structure.
For a particular batch of, let's say, 1000 students, 500 students chosen at random will be
put in each of the two groups.
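A possible sketch of this split, together with the comparison that the three assumptions support (a two-sample t statistic with pooled variance), is given below. The stress scores are simulated placeholders used only so that the code runs; the source does not say how stress would be measured, and the names `groups` and `pooled_t_statistic` are hypothetical.

```python
import math
import random
import statistics

rng = random.Random(42)

# Randomly split a batch of 1000 student IDs into two groups of 500
student_ids = list(range(1, 1001))
rng.shuffle(student_ids)
groups = {"T1 T2 T3": student_ids[:500], "Midsem-Compre": student_ids[500:]}


def pooled_t_statistic(a, b):
    """Two-sample t statistic assuming equal population variances."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        pooled_var * (1 / na + 1 / nb))


# Placeholder stress scores (simulated, NOT real data) for each group
stress = {name: [rng.gauss(50, 10) for _ in ids] for name, ids in groups.items()}
t = pooled_t_statistic(stress["T1 T2 T3"], stress["Midsem-Compre"])
print(f"t = {t:.3f} on {len(student_ids) - 2} degrees of freedom")
```

The pooled variance is only appropriate because equal population variances are assumed; without that assumption a different statistic would be needed.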
Optimal Design
Suppose a research student in Robotics from Massachusetts Institute of Technology wants to
study the response of his model to 4 different sound patterns: A with 5 levels, B with 2
levels, C with 3 levels and D with 7 levels. One complete replication of this experiment
would require 5*2*3*7 = 210 experimental units. But the student can afford only 37 units. The
question now arises as to which 37 of the 210 candidate units the student should choose. The
D-Optimal Design algorithm provides a reasonable choice here.
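To make the selection problem concrete, the sketch below enumerates the 210 candidate runs and looks for 37 of them with a large det(X'X), which is the D-optimality criterion. It rests on several assumptions that are not in the example: a main-effects-only model with dummy coding, and a naive random exchange search standing in for a proper D-optimal algorithm. All function names are hypothetical.

```python
import itertools
import math
import random

LEVELS = {"A": 5, "B": 2, "C": 3, "D": 7}  # factor levels from the example


def candidate_runs():
    """All 5 * 2 * 3 * 7 = 210 level combinations."""
    return list(itertools.product(*(range(n) for n in LEVELS.values())))


def model_row(run):
    """Intercept plus dummy-coded main effects (first level is the baseline)."""
    row = [1.0]
    for level, n_levels in zip(run, LEVELS.values()):
        row.extend(1.0 if level == k else 0.0 for k in range(1, n_levels))
    return row


def log_det_information(runs):
    """log det(X'X) by Gaussian elimination; -inf when X'X is singular."""
    rows = [model_row(run) for run in runs]
    p = len(rows[0])
    m = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    total = 0.0
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(m[r][c]))
        if abs(m[pivot][c]) < 1e-9:
            return float("-inf")
        m[c], m[pivot] = m[pivot], m[c]
        total += math.log(abs(m[c][c]))
        for r in range(c + 1, p):
            factor = m[r][c] / m[c][c]
            for k in range(c, p):
                m[r][k] -= factor * m[c][k]
    return total


def random_exchange_search(n_runs=37, iterations=300, seed=0):
    """Swap single runs at random, keeping a swap only if det(X'X) increases."""
    rng = random.Random(seed)
    pool = candidate_runs()
    design = rng.sample(pool, n_runs)
    start = best = log_det_information(design)
    for _ in range(iterations):
        position = rng.randrange(n_runs)
        replacement = rng.choice(pool)
        if replacement in design:
            continue  # keep the 37 runs distinct
        trial = design[:position] + [replacement] + design[position + 1:]
        score = log_det_information(trial)
        if score > best:
            design, best = trial, score
    return design, start, best


design, start, best = random_exchange_search()
print(f"log det(X'X): start {start:.2f}, after search {best:.2f}")
```

Dedicated exchange algorithms search far more systematically than this random swap loop, so the result here should be read as an illustration of the criterion rather than as an optimal design.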
Quasi-Experimental Design
A fuel company claims that its fuel produces fewer pollutants in vehicle exhaust than
fuel meeting the existing standards. To verify the claim, the Delhi govt proposes an odd-even
scheme for all the cars in Delhi, in which the odd numbered cars will run on the existing fuel
standardised by the pollution control board and the even numbered cars will run on the new
fuel. After a month of implementation, the level of pollutants emitted by the cars will be
measured and a conclusion will be drawn accordingly.
Here the odd numbered cars are the control group and the even numbered cars are the treatment
group. Odd-even grouping is easier to keep track of, whereas random grouping would make it
impractical to measure the results.
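The assignment rule in this example is deterministic rather than random, which is what makes the design quasi-experimental. A minimal sketch follows, assuming that "odd" and "even" refer to the last digit of the registration number; the function name `assign_fuel` and the sample registration numbers are hypothetical.

```python
def assign_fuel(registration_number):
    """Rule-based (non-random) assignment by the parity of the last digit."""
    digits = [ch for ch in registration_number if ch.isdigit()]
    if not digits:
        raise ValueError("registration number contains no digits")
    if int(digits[-1]) % 2 == 0:
        return "treatment: new fuel"
    return "control: existing fuel"


# Made-up registration numbers, for illustration only
for plate in ["DL 3C AB 1234", "DL 8C X 4571"]:
    print(plate, "->", assign_fuel(plate))
```

Because the group of a car is fully determined by its number, any systematic difference between odd and even numbered cars cannot be averaged out the way it would be under randomisation.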
A college primarily uses 3 proctored examination platforms for conducting exams. The college
administration is under the impression that all 3 platforms are equally effective, and wants to
check whether there is a difference in the effectiveness of the platforms.
Assumptions:
● The population corresponding to the three treatments is normally distributed.
● All the treatments have the same population variance.
● All experimental units are independent of each other in their working.
Step 1:
Stating the null and alternative hypothesis based on the treatments:
H0 : The mean marks on all the platforms are equal.
Ha : At least one of the platforms has a different mean.
Step 2:
We have to decide on an extraneous factor that could be affecting the average marks in an exam.
One such factor is the scoring ability of the students. So far, the students have been tested
for 90 marks. So, we divide the class into blocks of students with similar post mid sem scores.
We can consider the following 6 groups:
Students with scores: (i) 0-15 (ii) 16-30 (iii) 31-45 (iv) 46-60 (v) 61-75 (vi) 76-90.
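The banding into six blocks can be expressed as a small helper. This is a sketch assuming integer scores out of 90; the function name `score_block` is hypothetical.

```python
import math


def score_block(score, width=15, max_score=90):
    """Return the block number (1 to 6) for an integer score out of 90."""
    if not 0 <= score <= max_score:
        raise ValueError("score must lie between 0 and 90")
    # 0-15 -> 1, 16-30 -> 2, ..., 76-90 -> 6
    return max(1, math.ceil(score / width))


for s in (0, 15, 16, 45, 46, 90):
    print(s, "-> block", score_block(s))
```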
Step 3:
Performing the experiment: (Conducting another exam of 90 marks)
Block   Platform 1   Platform 2   Platform 3
1       5            6            8
2       24           25           28
3       35           37           34
4       54           56           50
5       70           65           65
6       81           82           81
Step 4:
Determining suitable significance level. We shall take α = 0.05.
Step 5:
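Calculating treatment mean, block mean and overall mean:
Treatment means (mean for each platform):
\( \bar{x}_{\cdot 1} = 44.8333, \quad \bar{x}_{\cdot 2} = 45.1667, \quad \bar{x}_{\cdot 3} = 44.3333 \)
Block means (mean for each group of students):
\( \bar{x}_{1 \cdot} = 6.3333, \ \bar{x}_{2 \cdot} = 25.6667, \ \bar{x}_{3 \cdot} = 35.3333, \ \bar{x}_{4 \cdot} = 53.3333, \ \bar{x}_{5 \cdot} = 66.6667, \ \bar{x}_{6 \cdot} = 81.3333 \)
Overall mean:
\( \bar{x} = 44.7778 \)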
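These means can be checked with a few lines of Python. The marks are those from Step 3, with rows as blocks and columns as platforms; the variable names are illustrative.

```python
from statistics import mean

# Rows are blocks 1 to 6, columns are platforms 1 to 3 (marks from Step 3)
marks = [
    [5, 6, 8],
    [24, 25, 28],
    [35, 37, 34],
    [54, 56, 50],
    [70, 65, 65],
    [81, 82, 81],
]

treatment_means = [mean(column) for column in zip(*marks)]
block_means = [mean(row) for row in marks]
overall_mean = mean(x for row in marks for x in row)

print("Treatment means:", [round(m, 4) for m in treatment_means])
print("Block means:", [round(m, 4) for m in block_means])
print("Overall mean:", round(overall_mean, 4))
```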
Step 6:
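(i) Calculating SST, SSTR, SSBL, SSE:
\[ SST = \sum_{i} \sum_{j} (x_{ij} - \bar{x})^2 = 11517.111 \]
\[ SSTR = b \sum_{j} (\bar{x}_{\cdot j} - \bar{x})^2 = 2.111 \]
\[ SSBL = k \sum_{i} (\bar{x}_{i \cdot} - \bar{x})^2 = 11463.111 \]
\[ SSE = SST - SSTR - SSBL = 51.889 \]
Here, i ranges from 1 to 6 (blocks), j ranges from 1 to 3 (treatments), b = 6, k = 3,
\( \bar{x}_{\cdot j} \) is the mean for treatment j, \( \bar{x}_{i \cdot} \) is the mean for block i and
\( \bar{x} \) is the overall mean.
(ii) Calculating MSTR, MSBL, MSE:
\[ MSTR = \frac{SSTR}{k-1} = 1.056 \]
\[ MSBL = \frac{SSBL}{b-1} = 2292.622 \]
\[ MSE = \frac{SSE}{(k-1)(b-1)} = 5.189 \]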
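The sums of squares and mean squares can be reproduced from the Step 3 marks as follows. The sketch obtains SSE by subtraction, SSE = SST - SSTR - SSBL; the variable names are illustrative.

```python
# Rows are blocks 1 to 6, columns are platforms 1 to 3 (marks from Step 3)
marks = [
    [5, 6, 8],
    [24, 25, 28],
    [35, 37, 34],
    [54, 56, 50],
    [70, 65, 65],
    [81, 82, 81],
]
b, k = len(marks), len(marks[0])  # b = 6 blocks, k = 3 treatments

overall = sum(map(sum, marks)) / (b * k)
treatment_means = [sum(column) / b for column in zip(*marks)]
block_means = [sum(row) / k for row in marks]

sst = sum((x - overall) ** 2 for row in marks for x in row)
sstr = b * sum((m - overall) ** 2 for m in treatment_means)
ssbl = k * sum((m - overall) ** 2 for m in block_means)
sse = sst - sstr - ssbl  # error sum of squares, by subtraction

mstr = sstr / (k - 1)
msbl = ssbl / (b - 1)
mse = sse / ((k - 1) * (b - 1))

print(f"SST={sst:.3f} SSTR={sstr:.3f} SSBL={ssbl:.3f} SSE={sse:.3f}")
print(f"MSTR={mstr:.3f} MSBL={msbl:.3f} MSE={mse:.3f}")
```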
Step 7:
Calculating the test statistic:
\[ F_{Observed} = \frac{MSTR}{MSE} = 0.203 \]
Step 8:
Performing the hypothesis test:
For F = 0.203 with (k - 1, (k - 1)(b - 1)) = (2, 10) degrees of freedom, the corresponding
p value is 0.819 > α = 0.05.
Thus, we fail to reject H0.
Thus, there is no significant difference between the effectiveness of the 3 platforms.
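The test statistic and p value can be checked without a statistics package. The sketch below relies on the fact that, when the numerator has exactly 2 degrees of freedom, the upper tail of the F distribution has the closed form (1 + 2F/df2)^(-df2/2); for other numerator degrees of freedom a library routine would be needed. The function names are hypothetical.

```python
def rbd_f_test(marks):
    """Return (F, df1, df2) for the treatment effect in a randomised block design."""
    b, k = len(marks), len(marks[0])
    overall = sum(map(sum, marks)) / (b * k)
    sst = sum((x - overall) ** 2 for row in marks for x in row)
    sstr = b * sum((sum(col) / b - overall) ** 2 for col in zip(*marks))
    ssbl = k * sum((sum(row) / k - overall) ** 2 for row in marks)
    df1, df2 = k - 1, (k - 1) * (b - 1)
    mstr = sstr / df1
    mse = (sst - sstr - ssbl) / df2
    return mstr / mse, df1, df2


def f_upper_tail_df1_2(f_value, df2):
    """P(F > f_value) for an F(2, df2) distribution; valid ONLY when df1 = 2."""
    return (1 + 2 * f_value / df2) ** (-df2 / 2)


marks = [
    [5, 6, 8],
    [24, 25, 28],
    [35, 37, 34],
    [54, 56, 50],
    [70, 65, 65],
    [81, 82, 81],
]
f_observed, df1, df2 = rbd_f_test(marks)
p_value = f_upper_tail_df1_2(f_observed, df2)
alpha = 0.05
print(f"F = {f_observed:.3f}, df = ({df1}, {df2}), p = {p_value:.3f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```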
REFERENCES