Design of Experiments:: 1) Experiment
1) Experiment: An operation which can produce some well-defined results is known as an
experiment.
Through experimentation, we study the effect of changes in one variable (such as
application of fertilizer) on another variable (such as grain yield of a crop). The variable
whose change we wish to study is termed the dependent variable or response
variable (yield). The variable whose effect on the response variable we wish to study is termed an
independent variable or a factor. Thus crop yield, mortality of pests, etc. are known as
responses, and fertilizer, spacing, irrigation schedule, pesticide, etc. are known as
factors.
2) Design of Experiments: The choice of treatments, the method of assigning treatments to
experimental units, and the arrangement of experimental units in different patterns are known as
the design of an experiment.
3) Treatment: The objects of comparison in an experiment are defined as treatments. Alternatively,
any specific experimental conditions/materials applied to the experimental units are
termed treatments.
Ex: Different varieties tried in a trial, different chemicals, dates of sowing, and
concentrations of insecticides.
A treatment is usually a combination of specific values of the factors, called levels.
4) Experimental material: The objects, or group of individuals or animals, on which
the experiment is conducted are called the experimental material. Ex: Land, animals, lab
cultures, machines, etc.
5) Experimental unit: The ultimate basic object to which treatments are applied or on
which the experiment is conducted is known as an experimental unit.
Ex: A piece of land, an animal, plots, etc.
6) Experimental error is the random variation present in all experimental results.
Responses from different experimental units may differ under the same treatment even under
similar conditions, and applying the same treatment over and over again
to the same unit will often result in different responses in different trials. Experimental error does
not refer to conducting the wrong experiment. These variations in response may be due to
extraneous factors such as heterogeneity of soil, climatic factors and genetic
differences. The unknown
variations in response caused by extraneous factors are known as experimental error.
For proper interpretation of experimental results, we should have an accurate
estimate of the experimental error. If the experimental error is small, we get
more information from the experiment, and we say that the precision of the experiment is
higher.
Our aim in designing an experiment is to minimize this experimental error.
7)Layout: The placement of the treatments on the experimental units along with the
arrangement of experimental units is known as the layout of an experiment.
Analysis of Variance (ANOVA):
Definition of ANOVA:
The analysis of variance is the systematic algebraic procedure of decomposing (i.e.
partitioning) the overall variation (i.e. total variation) in the responses observed in an
experiment into different components of variation, such as treatment variation and error
variation. Each component is attributed to an identifiable cause or source of variation.
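As a sketch of what this partition looks like for the simplest (one-way) case, the identity below writes the total sum of squares as the sum of a treatment part and an error part. The notation is an assumption introduced here for illustration: y_ij is the j-th observation on treatment i, n_i is the number of observations on treatment i, k is the number of treatments, and bars denote treatment and grand means.

```latex
\underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(y_{ij}-\bar{y}_{..}\right)^2}_{\text{total variation (TSS)}}
=
\underbrace{\sum_{i=1}^{k} n_i\left(\bar{y}_{i.}-\bar{y}_{..}\right)^2}_{\text{treatment variation (SSTr)}}
+
\underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(y_{ij}-\bar{y}_{i.}\right)^2}_{\text{error variation (SSE)}}
```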
Assumptions of ANOVA:
For the validity of the F-test in ANOVA, the following assumptions are made.
1. The effects of different factors (treatments and environmental effects) are additive in
nature.
2. The observations and experimental errors are independent.
3. Experimental errors are distributed independently and normally with mean zero and
constant variance, i.e. errors ~ N(0, σ²).
4. Observations of the character under study follow a normal distribution.
CHRIST
Deemed to be University
Essential Reading:
Montgomery D.C., Design and Analysis of Experiments, John Wiley and Sons Inc.,
New York, 2014.
Gupta S.C. and Kapoor V.K., Fundamentals of Applied Statistics, 4th edition (Reprint),
Sultan Chand and Sons, India, 2019.
Gomez K.A. and Gomez A.A., Statistical Procedures for Agricultural Research,
Second Edition, John Wiley and Sons.
Recommended Reading:
Definition:
Design of Experiment
Over the past two decades, DOE has been a very useful tool, traditionally used
for improving product quality and reliability.
The use of DOE has expanded across many industries as part of the
decision-making process in new product development,
manufacturing processes and improvement. It is not used only in
engineering; it has also been applied in administration, marketing,
hospitals, pharmaceuticals, the food industry, energy, architecture, and
chromatography.
Basic Terminologies:
Experiment: An operation which can produce some well-defined results
is known as an experiment. Such an inquiry can aid in an
administrative decision, such as whether or not to make a
recommendation.
Types of Experiments
1. Preliminary Experiment
2. Critical Experiment
3. Demonstrational Experiment
Case Study:
A researcher wants to study the durability (tensile strength)
of fabric under repeated washing, using water at different
temperatures (hot, warm and cold) and different cleansing agents
(regular, low phosphate, liquid).
● Response variable?
● Treatment or factor?
● Experimental unit?
● Level?
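One way to make the idea of levels and treatment combinations concrete for this case study is to enumerate them. The sketch below is only illustrative: it assumes that both factors are fully crossed, and the variable names are introduced here rather than taken from the notes.

```python
from itertools import product

# Factor levels as listed in the case study.
temperatures = ["Hot", "Warm", "Cold"]
cleansing_agents = ["Regular", "Low phosphate", "Liquid"]

# Assumption: every temperature is combined with every cleansing agent,
# so each (temperature, agent) pair is one treatment combination.
treatments = list(product(temperatures, cleansing_agents))

for number, (temp, agent) in enumerate(treatments, start=1):
    print(f"Treatment {number}: {temp} water + {agent} agent")
```

Under that assumption there are 3 x 3 = 9 treatment combinations, each of which would be applied to its own fabric specimens.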
Historical perspective:
One Factor At a Time (OFAT) was a very popular scientific method that dominated until
the early twentieth century. In this method, one variable/factor is tested at a time.
In the 1920s and 1930s, Ronald A. Fisher conducted research in agriculture with
the aim of increasing crop yield in the UK. He developed the design of experiments
and was the first to use DOE formally.
Credit for the Response Surface Method (RSM) belongs to George Box, who was also
from the UK. He was concerned with experimental design procedures for process
optimization.
1. Comparison ‒ multiple comparisons to select the best option, using the t‒test,
Z‒test, or F‒test.
2. Variable screening ‒ intended to select the important factors (variables) among the many
that affect performance.
3. Transfer function identification ‒ once identified, the relationship between the input
variables and the output variable can be used for further performance exploration via a
transfer function.
4. System optimization ‒ optimization by moving the experiment to the optimum setting
of the variables.
5. Perform experiment
6. Data analysis
Data layout for one-way classification:
Class    Observations    Total    Mean
1        ……..
2        ……..
3        ……..
.        ……..
k        ……..
Grand Total (GT)
Test Procedure: The steps involved in carrying out the analysis are:
                 Patients
Drugs     1    2    3    4    5    Total   Mean
A         7    5    6    3    4     25      5
B        10   12    9    8    6     45      9
C        14   16   13   12   10     65     13
GT = 135, grand mean = 9
Steps
e) Find the sum of squares within the classes, i.e. the sum of squares due to
error (SSE):
SSE = TSS - SSTr = 50
The table value is obtained from the F-table for (k-1, n-k) df at the α % level of significance and is denoted F tab.
F(2,12) at α = 0.01 is 6.93.
5. Decision criteria:
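The steps of the one-way analysis can be traced numerically on the drug data. The following is a minimal sketch using only the Python standard library; the variable names are chosen here for illustration, and the table value 6.93 is typed in from the F-table rather than computed.

```python
# Responses of five patients to each of three drugs (from the table above).
data = {
    "A": [7, 5, 6, 3, 4],
    "B": [10, 12, 9, 8, 6],
    "C": [14, 16, 13, 12, 10],
}

k = len(data)                                   # number of treatments
n = sum(len(obs) for obs in data.values())      # total number of observations
grand_total = sum(sum(obs) for obs in data.values())

cf = grand_total ** 2 / n                       # correction factor
tss = sum(y ** 2 for obs in data.values() for y in obs) - cf
sstr = sum(sum(obs) ** 2 / len(obs) for obs in data.values()) - cf
sse = tss - sstr                                # error SS by subtraction

mst = sstr / (k - 1)                            # mean square, treatments
mse = sse / (n - k)                             # mean square, error
f_cal = mst / mse

f_tab = 6.93  # F(2, 12) at alpha = 0.01, read from the F-table
print(f"TSS = {tss:.1f}, SSTr = {sstr:.1f}, SSE = {sse:.1f}, F = {f_cal:.2f}")
print("Reject H0" if f_cal > f_tab else "Do not reject H0")
```

The sketch reproduces SSE = 50 and gives a calculated F of 19.2, which exceeds 6.93, so the usual rule (reject the null hypothesis when the calculated F exceeds the table value) points to a significant difference between the drugs at the 1% level.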
Fixed Effects:
Null hypothesis (H0): σ² = 0
Alternative hypothesis (H1): σ² ≠ 0
Let us consider the case when there are two factors which may affect the
values of the variate yij under study.
Ex: The yield of cow milk may be affected by rations (feeds) as well as
by the varieties (breeds) of the cows.
Let us now suppose that the n cows are divided into ‘h’ different groups
or classes according to their breed, each group containing ‘k’ cows, and
then let us consider the effect of k treatments (rations) given at random
to the cows in each group on the yield of milk.
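Data layout for two-way classification: treatments (rations) i = 1, 2, …, k in rows and
varieties (breeds) j = 1, 2, 3, …, h in columns, with row and column totals, means,
and the Grand Total (GT).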
The total variation in the observation yij can be split into the following three
components:
(i) The variation between the treatments (rations)
(ii) The variation between the varieties (breeds)
(iii) The inherent variation within the observations of treatments and
varieties.
The first two types of variation are due to assignable causes, which can be
detected and controlled by human endeavour, and the third type of variation is
due to chance causes, which are beyond human control.
Test procedure for two -way analysis: The steps involved in carrying out
the analysis are:
1(b).
H0: μ.1 = μ.2 = … =μ.h (for comparison of varieties/ breed) i.e. there is no
significant difference between varieties ( breeds).
H1: There is a significant difference between varieties ( breeds)
ANOVA TABLE
Source of variation           DF     SS      MS                    F ratio
Between Treatments (Feeds)    k-1    SSTr    MST = SSTr/(k-1)      FT = MST/MSE
(i) For comparison between treatments, obtain the F-table value for [k-1, (k-1)(h-1)]
df at the α level of significance and denote it as Ftab.
(ii) For comparison between varieties, obtain the F-table value for [h-1, (k-1)(h-1)]
df at the α level of significance and denote it as Ftab.
5) Decision criteria.
Example:
                        Methods
Analyst            M1     M2     M3     Row Total, Ti.   Ti.²
1                  7.5    7.0    7.1    21.6             466.56
2                  7.4    7.2    6.7    21.3             453.69
3                  7.3    7.0    6.9    21.2             449.44
4                  7.6    7.2    6.8    21.6             466.56
5                  7.4    7.1    6.9    21.4             457.96
Column Total, T.j  37.2   35.5   34.4   107.1 (GT)
Steps:
Error sum of squares (SSE) = 0.137
ANOVA TABLE
Decision criteria:
Since the calculated FB value (23.27) is greater than the F-table value
(4.458), we reject the null hypothesis and conclude that the methods
differ significantly at the 5 % level of significance.
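The two-way computation behind this decision can be sketched as follows, using the analyst-by-method data of the example and the standard library only. The variable names are illustrative, and analysts are treated as the second classification, which is an assumption about how the slide set up the analysis.

```python
# Rows are analysts 1-5, columns are methods M1-M3 (data from the example).
y = [
    [7.5, 7.0, 7.1],
    [7.4, 7.2, 6.7],
    [7.3, 7.0, 6.9],
    [7.6, 7.2, 6.8],
    [7.4, 7.1, 6.9],
]
rows, cols = len(y), len(y[0])

row_totals = [sum(r) for r in y]                              # analyst totals
col_totals = [sum(r[j] for r in y) for j in range(cols)]      # method totals
gt = sum(row_totals)                                          # grand total

cf = gt ** 2 / (rows * cols)                                  # correction factor
tss = sum(v ** 2 for r in y for v in r) - cf
ss_analysts = sum(t ** 2 for t in row_totals) / cols - cf
ss_methods = sum(t ** 2 for t in col_totals) / rows - cf
sse = tss - ss_analysts - ss_methods                          # error SS

df_error = (rows - 1) * (cols - 1)
mse = sse / df_error
f_methods = (ss_methods / (cols - 1)) / mse
f_analysts = (ss_analysts / (rows - 1)) / mse

print(f"SSE = {sse:.3f}, F(methods) = {f_methods:.2f}, F(analysts) = {f_analysts:.2f}")
```

Run on these data, the sketch gives SSE of about 0.137 and an F ratio for methods of about 23.2. The small gap from the quoted 23.27 comes from rounding of intermediate values, and either figure is well above the table value 4.458.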
Comparison of Means:
Pairwise comparison:
CD=0.1914
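A critical difference of this size can be reproduced, up to rounding, from the error sum of squares of the example. The sketch below assumes the usual formula CD = t × sqrt(2·MSE/r), with the t value 2.306 (two-sided 5%, 8 df) typed in from a t-table; the method totals are the column totals of the data table.

```python
from math import sqrt
from itertools import combinations

sse = 0.137        # error sum of squares from the example
df_error = 8       # (5 - 1) * (3 - 1)
r = 5              # observations behind each method mean (one per analyst)
t_tab = 2.306      # two-sided 5% t value for 8 df, read from the t-table

mse = sse / df_error
cd = t_tab * sqrt(2 * mse / r)      # assumed CD formula, see lead-in

# Method means from the column totals of the data table.
method_means = {"M1": 37.2 / r, "M2": 35.5 / r, "M3": 34.4 / r}

print(f"CD = {cd:.4f}")
for a, b in combinations(method_means, 2):
    diff = abs(method_means[a] - method_means[b])
    verdict = "significant" if diff > cd else "not significant"
    print(f"{a} vs {b}: |difference| = {diff:.2f} -> {verdict}")
```

With these inputs the computed CD is close to 0.191, and every pair of method means differs by more than that, so all three methods would be declared different from one another.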
Statement of Hypothesis:
●
ANOVA TABLE
Sources of Variation   DF          Sum of squares (S.S.)   M.S.S   F ratio   F table
Factor B               q-1         SSB =                           FB =
Error                  pq(m-1)
Statement of Hypothesis:
●
= 1.714
ANOVA TABLE
Sources of DF Sum of M.S.S F ratio F table
Variation squares
(S.S.)
Order of gravida   5-1 = 4   10.342   F0.05(4,50) = 2.55 (at 5%); 3.71 (at 1%)
Conclusion:
Comparison of Means:
Pairwise comparison:
Order of gravida    Means
1
2
3
4
5
The completely randomized design (CRD) is the basic single-factor design. In this design, the treatments are
assigned completely at random so that each experimental unit has the
same chance of receiving any one treatment.
However, CRD is appropriate only when the experimental material is
homogeneous.
CRD is not preferred in field experiments.
In laboratory experiments, pot culture experiments and greenhouse
studies, it is easy to achieve homogeneity of experimental material, and
therefore CRD is most useful in such experiments.
Definition:
It is defined as the design in which first the field is divided into a
number of experimental units (small plots) depending upon the number
of treatments and number of replications for each treatment, and then
treatments are assigned completely at random so that each experimental
unit has the same chance of receiving any one treatment.
(It is also known as non-restrictional design)
Layout of CRD:
Completely randomized design is the one in which all the experimental units are
taken in a single group which are homogeneous as far as possible. The
randomization procedure for allotting the treatments to various units will be as
follows.
1. Determine the total number of experimental units.
2. Assign a plot number to each of the experimental units starting from left to right
for all rows.
3. Assign the treatments to the experimental units by using random numbers.
Plot numbers for n = 20 experimental units:
 1   2   3   4
 8   7   6   5
 9  10  11  12
16  15  14  13
17  18  19  20
• Then ‘n’ distinct three-digit random numbers are selected
from the random number table and ranked from smallest to largest.
• These ranks correspond to the plot numbers: the first set of ‘r’
units is allocated to treatment t1, the next ‘r’ units are
allocated to treatment t2, and so on. This procedure is
continued until all treatments have been applied.
Random Number   Rank (plot number)   Treatment to be applied
807             18                   t1
186              4                   t1
410             10                   t1
345              9                   t1
626             14                   t1
340              7                   t2
883             19                   t2
569             13                   t2
341              8                   t2
094              2                   t2
322              6                   t3
252              5                   t3
047              1                   t3
469             12                   t3
632             15                   t3
183              3                   t4
417             11                   t4
782             17                   t4
969             20                   t4
697             16                   t4
(each treatment is applied r = 5 times)
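The allocation in this table can be reproduced mechanically. The sketch below is one possible way to do it, reusing the random numbers listed above so the result can be checked against the table; the variable names are illustrative.

```python
# Random numbers in the order they were drawn (094 and 047 written as 94 and 47).
random_numbers = [807, 186, 410, 345, 626, 340, 883, 569, 341, 94,
                  322, 252, 47, 469, 632, 183, 417, 782, 969, 697]
treatments = ["t1", "t2", "t3", "t4"]
r = 5  # replications per treatment

# Rank of each random number (1 = smallest); the numbers are distinct.
order = sorted(random_numbers)
ranks = [order.index(x) + 1 for x in random_numbers]

# The rank gives the plot number; the first r draws get t1, the next r get t2, ...
plot_to_treatment = {rank: treatments[i // r] for i, rank in enumerate(ranks)}

for plot in sorted(plot_to_treatment):
    print(f"Plot {plot:2d}: {plot_to_treatment[plot]}")
```

In practice the random numbers could equally be drawn in software, for example with `random.sample(range(1000), 20)`, instead of being read from a printed table.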
Note: Only replication and randomization principles are adopted in this
design. But local control is not adopted (because experimental material
is homogeneous).
Test Procedure: The steps involved in carrying out the analysis are:
ANOVA Table
Advantages of CRD:
2. There is complete flexibility in this design i.e. any number of treatments and replications for each
treatment can be tried.
6. Even if some values are missing, the analysis remains simple.
Disadvantages of CRD :
1. It is difficult to find homogeneous experimental units in all respects and hence CRD is seldom
suitable for field experiments as compared to other experimental designs.
Uses of CRD:
CRD is more useful under the following circumstances.
1) When the experimental material is homogeneous, e.g. in laboratory, greenhouse, polyhouse or pot
culture experiments.
2) When the quantity or amount of experimental material of any one or more of the treatment is
limited or small.
3) When there is a possibility of any one or more observations or experimental unit being destroyed.
Definition:
Layout of RCBD
● Let r = 4, t = 3
         Block I   Block II   Block III   Block IV
           t2        t1         t3          t1
Field      t1        t2         t1          t2
           t3        t3         t2          t3

Randomization within blocks:
Rand. No   Rank   Treatment
053        1      T1
684        2      T2
749        3      T3

Rand. No   Rank   Treatment
178        1      T1
310        2      T2
976        3      T3

Resulting layout:
B1   B2   B3   B4
T2   T1   T3   T1
T1   T2   T1   T2
T3   T3   T2   T3
                     Blocks (replications)
Treatment    1      2      …    j    …    r      Total
1            y11    y12    ……………….        y1r    T1
2            y21    y22    ……………….        y2r    T2
.            .      .                      .      .
i            .      .           yij        .      Ti
.            .      .                      .      .
t            yt1    yt2    ……………….        ytr    Tt
Total        R1     R2     ……………….        Rr     GT
ANOVA Table
Test Procedure:
1. Null hypothesis:
2. Level of significance (α ):
3. Test Statistic:
4. Calculated F statistic and F-table values
5. Decision criteria
A hardness testing machine operates by pressing a tip into a metal test “coupon.” The hardness of the
coupon can be determined from the depth of the resulting depression. Four tip types are being tested
to see if they produce significantly different readings. The coupons might differ slightly in their
hardness (for example, if they are taken from ingots produced in different heats). Thus the coupon can
be treated as a blocking factor.
                     Test Coupon
Type of Tip    1      2      3      4
1              9.3    9.4    9.6    10.0
2              9.4    9.3    9.8    9.9
3              9.2    9.4    9.5    9.7
Efficiency of Blocking
Yates (1937) considered a method of estimating the missing values, inserting the estimates and
analysing the data. This technique gives results identical to those obtained by the correct
procedure. The theory has been developed under the assumption that all the treatment contrasts are
estimable in the presence of the missing value.
                     Replication
Treatment    I       II      III     IV      Total
1            22.9    25.9    39.1    33.9    121.8
2            29.5    30.4    X       29.6    89.5
3            28.8    24.4    32.1    28.6    113.9
4            47.0    40.9    12.8    32.1    132.8
5            28.9    20.4    21.1    31.8    102.2
Total        157.1   142.0   105.1   156.0   560.2
Estimated missing value: X = 25.64
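The estimate X = 25.64 is consistent with the standard single-missing-value formula for a randomized block design, X = (rB + tT - G) / ((r - 1)(t - 1)). The formula itself is not written out in these notes, so treat the sketch below as an assumption that happens to reproduce the quoted value; B, T and G are the block, treatment and grand totals of the observed values.

```python
def rcbd_missing_value(r, t, block_total, treatment_total, grand_total):
    """Estimate one missing observation in a randomized block design.

    r: number of replications (blocks), t: number of treatments.
    All totals are taken over the observed values only.
    """
    return (r * block_total + t * treatment_total - grand_total) / ((r - 1) * (t - 1))


# Totals from the table above: the missing cell is treatment 2, replication III.
x = rcbd_missing_value(r=4, t=5,
                       block_total=105.1,       # replication III total
                       treatment_total=89.5,    # treatment 2 total
                       grand_total=560.2)
print(f"X = {x:.2f}")
```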
Grain yield per plant (grams) of nine maize varieties in a randomized block
design was as tabulated below.
Varieties   Rep I    Rep II      Rep III   Total
V1          21       20          19.5      60.5
V2          19       18          18.5      55.5
V3          18.5     18          18.9      55.4
V4          27.5     X           27        54.5 + X
V5          31       32.5        32.6      96.1
V6          31.5     30.5        32        94.0
V7          25.3     25.5        26.6      77.4
V8          39       40          38.5      117.5
V9          39       38.5        40        117.5
Total       251.8    223.0 + X   253.6     728.4 + X
Estimated missing value: X = 26.9
Bias: B = 0.77
CF = 21100.85
Total SS = 1585.63
Replication SS = 0.99
Treatment SS = 1576.80
Corrected Treatment SS = 1576.03
Error SS = 8.61
Total 18 1585.63
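The missing value and the bias for the maize data can be recomputed from the table. The sketch below assumes the single-missing-value formula X = (rB + tT - G) / ((r - 1)(t - 1)) and the commonly quoted bias expression [B - (t - 1)X]² / (t(t - 1)); neither is spelled out in these notes, but together they reproduce the quoted X = 26.9 and bias 0.77.

```python
# Yields of nine varieties in three replications; None marks the missing plot.
yields = [
    [21.0, 20.0, 19.5],
    [19.0, 18.0, 18.5],
    [18.5, 18.0, 18.9],
    [27.5, None, 27.0],
    [31.0, 32.5, 32.6],
    [31.5, 30.5, 32.0],
    [25.3, 25.5, 26.6],
    [39.0, 40.0, 38.5],
    [39.0, 38.5, 40.0],
]
t, r = len(yields), len(yields[0])

# Locate the missing cell (variety index mi, replication index mj).
mi, mj = next((i, j) for i, row in enumerate(yields)
              for j, v in enumerate(row) if v is None)

T = sum(v for v in yields[mi] if v is not None)            # variety total
B = sum(row[mj] for row in yields if row[mj] is not None)  # replication total
G = sum(v for row in yields for v in row if v is not None) # grand total

x = (r * B + t * T - G) / ((r - 1) * (t - 1))              # assumed formula
bias = (B - (t - 1) * x) ** 2 / (t * (t - 1))              # assumed formula

print(f"X = {x:.2f}, bias = {bias:.2f}")
```

Subtracting the bias from the treatment sum of squares gives the corrected value listed above (1576.80 - 0.77 = 1576.03).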
Treatment   R1      R2      R3
1           14.5    14      14
2           16.5    16.9    16.7
3           X       16.7    17.4
4           17.6    16.9    17.5
5           18.5    17.9    17.6
6           19.3    18.3    18.8
7           19.5    19      X

Treatment   R1      R2      R3             Total
1           14.5    14      14             42.5
2           16.5    16.9    16.7           50.1
3           X31     16.7    17.4           34.1   (row mean 34.1/2 = 17.05)
4           17.6    16.9    17.5           52
5           18.5    17.9    17.6           54
6           19.3    18.3    18.8           56.4
7           19.5    19      X73 (19.21)    38.5
Total       105.9   119.7   102            327.6 + 19.21 = 346.8
            (column mean 105.9/6 = 17.65)
Treatment R1 R2 R3 Total
1 14.5 14 14 42.5
2 16.5 16.9 16.7 50.1
3 17.5 16.7 17.4 51.6
4 17.6 16.9 17.5 52
5 18.5 17.9 17.6 54
6 19.3 18.3 18.8 56.4
7 19.5 19 19.2 57.7
Total 123.4 119.7 121.2 364.3
Bias(X31) = 0.019
Bias(X73) = 4.140
CF=
Tss=
TrSS=
● Comparison of T3 and T7:
● T3: 0 + 1 + 0.5 = 1.5; T7: 0.5 + 1 + 0 = 1.5
2. Enter all initial values assigned in step 1 in the table of observed values and estimate the
one remaining missing observation by using the appropriate missing data formula
(x73 = 19.21).
3. Enter the estimate of the missing data obtained in step 2 in the table consisting of all
observed values and the initial value (or values) assigned in step 1.
4. Remove one initial value. Treat the removed value as the missing data, and estimate it
following the same missing data formula technique used in step 2.
Repeat the foregoing procedure for the third missing observation, then for the fourth missing
observation, and so on, until all missing data have been estimated once through the missing data
formula.
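The iterative procedure can be sketched for the two-missing-value example above. Two things in the sketch are assumptions rather than statements from the notes: the starting value for x31 is taken as the average of its row mean (17.05) and column mean (17.65), and each update uses the single-missing-value formula X = (rB + tT - G) / ((r - 1)(t - 1)). With those choices the first pass gives x73 = 19.21, as quoted in step 2.

```python
from statistics import mean

data = [
    [14.5, 14.0, 14.0],
    [16.5, 16.9, 16.7],
    [None, 16.7, 17.4],   # x31 missing
    [17.6, 16.9, 17.5],
    [18.5, 17.9, 17.6],
    [19.3, 18.3, 18.8],
    [19.5, 19.0, None],   # x73 missing
]
t, r = len(data), len(data[0])
first, second = (2, 0), (6, 2)   # (row, column) indices of x31 and x73


def estimate(table, i, j):
    """Single-missing-value formula, treating cell (i, j) as the unknown."""
    T = sum(v for c, v in enumerate(table[i]) if c != j)
    B = sum(row[j] for k, row in enumerate(table) if k != i)
    G = sum(v for k, row in enumerate(table)
            for c, v in enumerate(row) if (k, c) != (i, j))
    return (r * B + t * T - G) / ((r - 1) * (t - 1))


# Step 1 (assumed): start x31 at the average of its row mean and column mean.
table = [row[:] for row in data]
i0, j0 = first
row_obs = [v for v in data[i0] if v is not None]
col_obs = [row[j0] for row in data if row[j0] is not None]
table[i0][j0] = (mean(row_obs) + mean(col_obs)) / 2

# Step 2: estimate the remaining missing value, x73.
first_pass_x73 = estimate(table, *second)
table[second[0]][second[1]] = first_pass_x73

# Steps 3-4, repeated: re-estimate each value in turn until they settle.
for _ in range(10):
    table[first[0]][first[1]] = estimate(table, *first)
    table[second[0]][second[1]] = estimate(table, *second)

x31, x73 = table[first[0]][first[1]], table[second[0]][second[1]]
print(f"first-pass x73 = {first_pass_x73:.2f}, x31 = {x31:.1f}, x73 = {x73:.1f}")
```

The values settle at about 17.5 and 19.2, which are the entries used in the completed table.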
▪ Violation of the assumptions of ANOVA may result in unreliable statistical tests and
unacceptable conclusions.
▪ A solution is often to 'transform' the data to conform to a normal probability distribution. For this,
we take the original data, apply a formula, and carry out ANOVA on the transformed data.
● The logarithmic transformation is the most popular of the transformations used to bring skewed data
approximately into line with normality. If the original data follow a log-normal
distribution, then the log-transformed data follow a normal or near-normal distribution.
● When the original observation Y is converted to log Y, the conversion is known as log
transformation.
● The logarithmic transformation is most appropriate for data where the standard deviation is
proportional to the mean.
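A small numerical sketch shows why the log transformation suits data whose standard deviation is proportional to the mean. The three groups below are hypothetical values made up for illustration, not data from these notes.

```python
from math import log10
from statistics import mean, stdev

# Hypothetical groups: each is ten times the previous one, so the
# standard deviation grows in proportion to the mean.
groups = {
    "low": [1, 2, 4],
    "medium": [10, 20, 40],
    "high": [100, 200, 400],
}

raw_sd, log_sd = [], []
for name, values in groups.items():
    logged = [log10(v) for v in values]
    raw_sd.append(stdev(values))
    log_sd.append(stdev(logged))
    print(f"{name:6s} mean = {mean(values):7.2f}  "
          f"sd(raw) = {raw_sd[-1]:7.2f}  sd(log10) = {log_sd[-1]:.4f}")
```

On the raw scale the standard deviations differ by factors of ten, while on the log scale they coincide, which is the constant-variance condition ANOVA assumes.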
Advantages of RBD
Since all three principles of design of experiments are used, the conclusions drawn from RBD are
more valid and reliable.
If data from individual units are missing, the analysis can still be done by estimating them.
RCBD has been shown to be more efficient or accurate than CRD. The elimination of the block sum
of squares from the error sum of squares usually results in a decrease in the error sum of squares.
Disadvantages of RBD
In field experiments, it is usually observed that as the number of treatments increases, the block
size increases and so one has less control over error.
It cannot control two-way variation of the experimental material simultaneously. That is why it is not
recommended when the experimental material contains considerable variability.
● When the experimental material is divided into rows and columns and the
treatments are allocated such that each treatment occurs only once in a row
and once in a column, the design is known as Latin Square Design (LSD).
● In LSD the number of rows and columns are equal. Hence the arrangement
will form a square. It is identified as 5X5 Latin squares, 6X6 Latin squares,
etc.
The major feature of this design is its capacity to simultaneously handle two
known sources of variation among experimental units.
Example
● An animal scientist wishes to study weight gain in piglets but knows that
both litter membership and initial weight significantly affect the response.
3x3
A B C
B C A
C A B
4x4
A B C D    A B C D    A B C D    A B C D
B C D A    B A D C    B C D A    B D A C
C D A B    C D B A    C D A B    C A D B
D A B C    D C A B    D A B C    D C B A
5x5
A B C D E
B A E C D
C D A E B
D E B A C
E C D B A
6x6
A B C D E F
B F D C A E
C D E F B A
D A F E C B
E C A B F D
F E B A D C
The process of randomization and layout for an LS design is shown below for an experiment with
five treatments A, B, C, D, and E.
Step 1: Select a sample 5x5 Latin square plan:
A B C D E
B A E C D
C D A E B
D E B A C
E C D B A
Step 2: Randomize the row arrangement of the plan selected in step 1, following one of the
randomization schemes.
Select five three-digit random numbers from a random number table; for example, 628, 846, 475, 902,
and 452.
Rank the selected random numbers from lowest to highest:
Random number   Sequence   Rank
628             1          3
846             2          4
475             3          2
902             4          5
452             5          1
Step 3 : Use the rank to represent the existing row number of the selected plan and the sequence to
represent the row number of the new plan. For our example, the third row of the selected plan
(rank = 3) becomes the first row (sequence =1) of the new plan; the fourth row of the selected plan
becomes the second row of the new plan; and so on. The new plan, after the row randomization is:
C D A E B
D E B A C
B A E C D
E C D B A
A B C D E
Step 4: Randomize the column arrangement, using the same procedure used for the row arrangement in
step 2. For our example, the five random numbers selected and their ranks are:
Random number   Sequence   Rank
792             1          4
032             2          1
947             3          5
293             4          3
196             5          2
The rank will now be used to represent the column number of the plan obtained in step 3 (i.e., with
rearranged rows) and the sequence will be used to represent the column number of the final plan. For
our example, the fourth column of the plan obtained in step 3 becomes the first column of the final
plan, the first column of the plan of step 3 becomes the second column of the final plan, and so on. The
final plan, which becomes the layout of the experiment, is:
E C B A D
A D C B E
C B D E A
B E A D C
D A E C B
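The row and column randomization can be replayed in a few lines, which also makes it easy to confirm that the final plan is still a Latin square. This is a minimal sketch using the random numbers quoted in the example; the helper names `ranks` and `is_latin_square` are introduced here for illustration.

```python
# Selected 5x5 plan, one string per row.
plan = ["ABCDE", "BAECD", "CDAEB", "DEBAC", "ECDBA"]


def ranks(numbers):
    """Rank of each number in drawing order (1 = smallest, numbers distinct)."""
    order = sorted(numbers)
    return [order.index(x) + 1 for x in numbers]


def is_latin_square(square):
    """True if every symbol occurs exactly once in each row and each column."""
    n = len(square)
    symbols = set(square[0])
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = all(set(col) == symbols for col in zip(*square))
    return len(symbols) == n and rows_ok and cols_ok


# Row randomization: the rank is the old row number, the sequence the new one.
row_ranks = ranks([628, 846, 475, 902, 452])
after_rows = [plan[k - 1] for k in row_ranks]

# Column randomization: the same rule applied to columns (032 written as 32).
col_ranks = ranks([792, 32, 947, 293, 196])
final = ["".join(row[k - 1] for k in col_ranks) for row in after_rows]

for row in final:
    print(" ".join(row))
print("Latin square:", is_latin_square(final))
```

Permuting whole rows and whole columns cannot break the Latin property, which is why the check on the final plan succeeds.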
Grain Yield of four promising maize hybrids (A, B, C and D) from an experiment
with Latin Square Design
ANOVA
Total 15 1.413
● CD= ?
Efficiency of LSD
In estimating the efficiency of LSD over RCBD, we have to consider the type of blocks. If the
LSD had instead been an RCBD with columns as blocks, this is termed column blocking;
similarly, if the LSD had been an RCBD with rows as blocks, it is termed row blocking.
When the error df is less than 20, the relative efficiency has to be adjusted by multiplying by the
precision factor.
Advantages of LSD
Since the total variation is divided into parts due to rows, columns and
treatments, the error variance is reduced considerably. This happens
because rows and columns, being perpendicular to each other,
eliminate the two-way heterogeneity to the maximum extent.
LSD is more efficient than RBD or CRD.
When missing values are present, the missing plot technique can be used and the data
analysed.
Disadvantages of LSD
This design is not as flexible as RBD or CRD, as the number of treatments is
limited to the number of rows and columns.
LSD is seldom used when the number of treatments is more than 12.
LSD is not suitable for fewer than five treatments. Because of the limitations
on the number of treatments, LSD is not widely used in agricultural
experiments.
Note: The number of sources of variation is two for CRD, three for RBD
and four for LSD.
Factorial Experiment
When two or more factors are investigated simultaneously in a single
experiment, such experiments are called factorial experiments.
An experiment in which the effects of a number of levels of a factor are
assessed in combination with the levels of other factor(s) simultaneously is called a
factorial experiment.
The name factorial experiment was given by F. Yates in 1926.
Terminologies
● Information obtained from factorial experiment is much more than that obtained from
series of single factor experiments
❑ When several treatment combination are involved, the execution of experiment and
statistical analysis become more complex.
Complete Confounding:
If the same effect is confounded in all the replications, then it is known as
complete confounding.
Ex:
The confounded effect is generally of little or no value to the experimenter
Partial Confounding:
The effect which is confounded in one replicate is being estimated and
tested on the basis of the replicates in which it is not confounded. This
system of confounding is known as partial confounding.
Ex:
AC confounded in Replication I
BC confounded in Replication II
etc
Ex:
AB, AC and BC are confounded in I, II and III replication respectively and
there are three such sets of replications.
EX:
AC confounded 2 times
BC confounded 3 times
AB confounded 1 time