Chapter 1
THE CONCEPT OF EXPERIMENT
Some Definitions:-
An Experiment:- An experiment is a planned inquiry to discover new facts, or to confirm or deny
the results of previous investigations.
Treatment:- A treatment is a set of operations (more or less well defined) which potentially effects
a change in the experimental unit.
Experimental Units:- An experimental unit is the smallest piece of experimental material to which
one trial of a single treatment is applied.
Sampling Unit:- A sampling unit is a fraction (possibly all) of the experimental unit. It is the smallest
part of the experimental material from which we can make a single measurement.
Design:- A design is a plan and a set of rules of association between the experimental units and the
treatments, such that we can measure Yield = true value + error.
Experimental Error:- Experimental error describes the failure of two identically treated
experimental units to yield identical results; that is, it is the measure of variation among the yields of
(entire) experimental units treated alike. (Its estimate is obtained through replication.)
Yield:- The yield is the quantity which is measured on the experimental material.
Block:- A group of homogeneous experimental units is called a block.
Replication:- When a treatment appears more than once in an experiment, the treatment is said to
be replicated.
Random Assignment:- If treatments are assigned to a set of units in such a way that every unit is
equally likely to receive any treatment, the assignment is said to be random.
THE DESIGN
The next and most important phase of a research project is the design phase. The design phase
is mainly concerned with the number of observations to be taken and with deciding on the size of
the sample for a given experiment. Without this information, the best alternative is to take as large
a sample as possible, although in practice this is usually impracticable. After the number of
observations and the number of experimental units are decided, the order in which the experiments
are run is of prime importance. Once a decision has been made to hold certain variables at specific
levels, there are always a number of variables that cannot be controlled. Randomization of the
order of experimentation will tend to average out the effect of these uncontrolled variables.
2) Randomization:- By randomization we mean that treatments are assigned to the units in such
a way that any unit is equally likely to receive any treatment; that is, randomization means the
allocation of the treatments to the experimental units, and performing the individual runs or trials of the
experiment in such a way that every possible allocation or order has the same probability. The
object of randomization is to avoid any personal bias, which may be conscious or unconscious.
Statistical methods require that the errors be random variables; randomization makes this assumption
valid. The way in which randomization is performed in an experiment depends on the type of
design being used. A small sketch of such a random assignment is given after the list of reasons below.
Reasons for Randomization
• To minimize bias (in means and variances).
• To obtain uncorrelated errors.
• To obtain a valid estimate of the experimental error.
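As an illustration, the following minimal sketch (with hypothetical treatment labels and unit numbers, chosen only for this example) carries out a completely random assignment by shuffling the treatment labels over the units:

```python
import random

treatments = ["A", "B", "C"]   # hypothetical treatment labels
replicates = 4                 # each treatment appears 4 times
units = list(range(1, 13))     # 12 experimental units

# Build the full list of treatment labels and shuffle it, so that
# every possible allocation of treatments to units is equally likely.
labels = [trt for trt in treatments for _ in range(replicates)]
random.shuffle(labels)

for unit, trt in zip(units, labels):
    print(f"unit {unit:2d} -> treatment {trt}")
```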
3) Reduction of Error (Local Control):- Reduction of error refers to the amount of balancing,
grouping and blocking of the experimental units. This makes the design more efficient.
Grouping:- Grouping means placing a set of homogeneous experimental units into groups in
order that the different groups may be subjected to different treatments.
Balancing:- Balancing means obtaining the experimental units, the grouping, the blocking
and the assignment of the treatments to the experimental units in such a way that a balanced
configuration results.
Blocking:- A block is a set of homogeneous experimental units which are less variable within the set
than all of the experimental units in total. Blocking is especially useful when the number of
treatments is large and it may not be possible to get a large homogeneous set of units for
the experiment. Since the experimental error arises only from the variation among the
units within a replicate, the variation in all the units can be controlled by grouping the units so
that units in the same replicate are similar, thereby reducing the experimental error. Variation
from one replicate to another does not contribute to the error; it is therefore important to keep the
technique uniform within a replicate, and changes should be made only when moving from one
replicate to another. To take full advantage of the opportunities for increased precision
by grouping the units, the best criterion for grouping is to minimize the variation within a
group and maximize the variation among different groups.
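For blocked material the same randomization idea is applied separately within each block. A minimal sketch, assuming (for illustration only) three treatments and four blocks, with every treatment appearing once in each block:

```python
import random

treatments = ["A", "B", "C"]                      # hypothetical treatments
blocks = ["block1", "block2", "block3", "block4"]

# Each block receives every treatment once, in an independently
# randomized order, so treatments are compared on homogeneous units.
for block in blocks:
    order = treatments[:]
    random.shuffle(order)
    print(block, order)
```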
Blocks may or may not be of direct interest; if not, they are just a source of variation to be isolated.
If they are, they still act as "error reducers" and also provide the block-by-treatment (B×T) interaction,
that is, whether the effect of a treatment differs from block to block.
1) The treatment effects sum to zero, i.e.,

$$\sum_{i=1}^{t}\alpha_i=\sum_{i=1}^{t}\left(\mu_i-\mu\right)=0\qquad(1)$$

where μi is the mean of the ith treatment and μ is the overall mean.
2) The εij are a random sample from a population which is normally distributed, with mean
zero and a common variance σ², i.e., εij ~ NID(0, σ²).
The model is

$$y_{ij}=\mu+\alpha_i+\varepsilon_{ij}$$

where yij is the jth observation under the ith treatment, μ is the overall mean, αi denotes the effect of the ith
treatment, and εij is the random error, which is assumed to be a normally and independently distributed
random variable with mean 0 and constant variance σ² (i.e., the same for all treatments), i.e., εij ~
NID(0, σ²).
For the fixed effect model,

$$\sum_{i=1}^{t}\alpha_i=\sum_{i=1}^{t}\left(\mu_i-\mu\right)=0$$

where μi is the mean of the ith treatment and μ is the overall mean.
When the t treatments are a random sample from a larger population of
possible levels, the model is called a random effects model or components of variance model. In such a
situation, the conclusions are extended to all possible levels of the factor, whether they are involved
in the experiment or not. In such a situation τi is considered to be a random variable, independent
of the error εij, and it is assumed that τi ~ NID(0, σ²τ).
The hypothesis tested is that no variability exists between treatments, i.e., H0: σ²τ = 0.
The model to be used is determined by the experimenter's view of the experiment. Either the results
are pertinent to only the treatments present (Model I) or inferences are to be made to a larger
population of treatments (Model II). This completes the specification of the model.
Consider the overall sample variance

$$S^2=\frac{\sum_{i=1}^{t}\sum_{j=1}^{n}\left(y_{ij}-\bar{y}_{..}\right)^2}{nt-1}$$

The numerator of the above quantity is called the total sum of squares (SSTot), which measures the
total variability of the data. Now SSTot can be written as:
$$\sum_{i=1}^{t}\sum_{j=1}^{n}\left(y_{ij}-\bar{y}_{..}\right)^2=\sum_{i=1}^{t}\sum_{j=1}^{n}\left[\left(y_{ij}-\bar{y}_{i.}\right)+\left(\bar{y}_{i.}-\bar{y}_{..}\right)\right]^2$$

$$=\sum_{i=1}^{t}\sum_{j=1}^{n}\left[\left(y_{ij}-\bar{y}_{i.}\right)^2+\left(\bar{y}_{i.}-\bar{y}_{..}\right)^2+\text{cross-product terms}\right]$$

The cross-product terms vanish, and hence

$$\sum_{i=1}^{t}\sum_{j=1}^{n}\left(y_{ij}-\bar{y}_{..}\right)^2=n\sum_{i=1}^{t}\left(\bar{y}_{i.}-\bar{y}_{..}\right)^2+\sum_{i=1}^{t}\sum_{j=1}^{n}\left(y_{ij}-\bar{y}_{i.}\right)^2$$

that is,

SSTot = SST + SSE

The quantity SST is the sum of squares of the differences between the treatment averages and the grand
average, and it measures the differences between treatment means. The quantity SSE is the sum of squares
of the differences of the observations within treatments from the treatment averages, and it measures the
random error.
In the same way we can partition the total (nt - 1) degrees of freedom: SST has (t -
1) and SSE has t(n - 1) degrees of freedom.
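The partition can be verified numerically. The sketch below uses made-up data (t = 3 treatments, n = 4 observations each, chosen only for illustration) and computes SSTot, SST and SSE directly from the definitions; the two sides of SSTot = SST + SSE agree:

```python
# Hypothetical data: t = 3 treatments, n = 4 observations each.
data = [
    [12.0, 14.0, 13.0, 15.0],   # treatment 1
    [18.0, 17.0, 19.0, 16.0],   # treatment 2
    [11.0, 10.0, 12.0, 13.0],   # treatment 3
]

t, n = len(data), len(data[0])
grand_mean = sum(sum(row) for row in data) / (t * n)
trt_means = [sum(row) / n for row in data]

ss_tot = sum((y - grand_mean) ** 2 for row in data for y in row)
ss_t = n * sum((m - grand_mean) ** 2 for m in trt_means)
ss_e = sum((y - m) ** 2 for row, m in zip(data, trt_means) for y in row)

print(ss_tot, ss_t + ss_e)            # the two agree: SSTot = SST + SSE
print("df:", t * n - 1, t - 1, t * (n - 1))
```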
To obtain the expected values of these sums of squares, consider the correction factor (writing r for the number of replications per treatment):

$$C.F.=\frac{1}{rt}\left(\sum_i\sum_j y_{ij}\right)^2=\frac{1}{rt}\left(\sum_i\sum_j\left(\mu+\alpha_i+\varepsilon_{ij}\right)\right)^2$$

$$=\frac{1}{rt}\left(rt\mu+r\sum_i\alpha_i+\sum_i\sum_j\varepsilon_{ij}\right)^2$$

$$=\frac{1}{rt}\left(r^2t^2\mu^2+\left(\sum_i\sum_j\varepsilon_{ij}\right)^2+2rt\mu\sum_i\sum_j\varepsilon_{ij}\right)\qquad\left(\text{since }\sum_i\alpha_i=0\right)$$

$$=\frac{1}{rt}\left(r^2t^2\mu^2+\sum_i\sum_j\varepsilon_{ij}^2+\mathop{\sum\sum}_{(i,j)\neq(g,h)}\varepsilon_{ij}\varepsilon_{gh}+2rt\mu\sum_i\sum_j\varepsilon_{ij}\right)$$

$$=rt\mu^2+\frac{1}{rt}\sum_i\sum_j\varepsilon_{ij}^2+\frac{1}{rt}\mathop{\sum\sum}_{(i,j)\neq(g,h)}\varepsilon_{ij}\varepsilon_{gh}+2\mu\sum_i\sum_j\varepsilon_{ij}$$

Taking expectations,

$$E[C.F.]=rt\mu^2+\frac{1}{rt}\,E\sum_i\sum_j\varepsilon_{ij}^2+\frac{1}{rt}\,E\mathop{\sum\sum}_{(i,j)\neq(g,h)}\varepsilon_{ij}\varepsilon_{gh}+2\mu\,E\sum_i\sum_j\varepsilon_{ij}$$

$$=rt\mu^2+\sigma^2$$

since E[ε²ij] = σ², E[εij εgh] = 0 and E[εij] = 0.
The total sum of squares is SSTot = Σi Σj y²ij − C.F., and

$$\sum_i\sum_j y_{ij}^2=\sum_i\sum_j\left(\mu+\alpha_i+\varepsilon_{ij}\right)^2=\sum_i\sum_j\left(\mu^2+\alpha_i^2+\varepsilon_{ij}^2+2\mu\alpha_i+2\mu\varepsilon_{ij}+2\alpha_i\varepsilon_{ij}\right)$$

Similarly, for the treatment totals,

$$y_{i.}=\sum_j y_{ij}=\sum_j\left(\mu+\alpha_i+\varepsilon_{ij}\right)=r\mu+r\alpha_i+\sum_j\varepsilon_{ij}$$

so that

$$\left(y_{i.}\right)^2=r^2\mu^2+r^2\alpha_i^2+\sum_j\varepsilon_{ij}^2+\mathop{\sum\sum}_{j\neq h}\varepsilon_{ij}\varepsilon_{ih}+2r^2\mu\alpha_i+2r\mu\sum_j\varepsilon_{ij}+2r\alpha_i\sum_j\varepsilon_{ij}$$

Summing over i and using Σi αi = 0,

$$\frac{1}{r}\sum_i\left(y_{i.}\right)^2=rt\mu^2+r\sum_i\alpha_i^2+\frac{1}{r}\sum_i\sum_j\varepsilon_{ij}^2+\frac{1}{r}\sum_i\mathop{\sum\sum}_{j\neq h}\varepsilon_{ij}\varepsilon_{ih}+\text{terms with zero expectation}$$

Applying expectation, and subtracting E[C.F.], we get

$$E[SST]=E\left[\frac{1}{r}\sum_i y_{i.}^2-C.F.\right]=r\sum_i\alpha_i^2+(t-1)\sigma^2$$
By definition,

$$SSE=\sum_{i=1}^{t}\sum_{j=1}^{n}\left(y_{ij}-\bar{y}_{i.}\right)^2$$

Now

$$\bar{y}_{i.}=\frac{1}{n}\sum_{j=1}^{n}y_{ij}=\frac{1}{n}\sum_{j=1}^{n}\left(\mu+\alpha_i+\varepsilon_{ij}\right)=\mu+\alpha_i+\bar{\varepsilon}_{i.}$$

Substituting this value of ȳi. into the expression for SSE, we get
$$E[SSE]=E\sum_{i=1}^{t}\sum_{j=1}^{n}\left(\varepsilon_{ij}-\bar{\varepsilon}_{i.}\right)^2=\sum_{i=1}^{t}(n-1)\sigma^2=t(n-1)\sigma^2$$

or

$$E[MSE]=\sigma^2$$

Thus MSE is an unbiased estimate of σ².
Similarly,

$$SST=n\sum_{i=1}^{t}\left(\bar{y}_{i.}-\bar{y}_{..}\right)^2$$

and

$$\bar{y}_{..}=\frac{1}{tn}\sum_{i=1}^{t}\sum_{j=1}^{n}y_{ij}=\frac{1}{tn}\sum_{i=1}^{t}\sum_{j=1}^{n}\left(\mu+\alpha_i+\varepsilon_{ij}\right)=\mu+\frac{1}{t}\sum_{i=1}^{t}\alpha_i+\bar{\varepsilon}_{..}$$

Now substituting the values of ȳi. and ȳ.. into SST, we get
$$SST=n\sum_{i=1}^{t}\left(\bar{y}_{i.}-\bar{y}_{..}\right)^2=n\sum_{i=1}^{t}\left[\left(\mu+\alpha_i+\bar{\varepsilon}_{i.}\right)-\left(\mu+\frac{1}{t}\sum_{i=1}^{t}\alpha_i+\bar{\varepsilon}_{..}\right)\right]^2$$

$$=n\sum_{i=1}^{t}\left[\left(\alpha_i-\bar{\alpha}\right)+\left(\bar{\varepsilon}_{i.}-\bar{\varepsilon}_{..}\right)\right]^2$$

$$=n\sum_{i=1}^{t}\left(\alpha_i-\bar{\alpha}\right)^2+n\sum_{i=1}^{t}\left(\bar{\varepsilon}_{i.}-\bar{\varepsilon}_{..}\right)^2+\text{cross-product terms}$$

Taking expectation of both sides,

$$E[SST]=n\,E\sum_{i=1}^{t}\left(\alpha_i-\bar{\alpha}\right)^2+n\,E\sum_{i=1}^{t}\left(\bar{\varepsilon}_{i.}-\bar{\varepsilon}_{..}\right)^2$$

since the cross-product terms vanish after taking expectation. Now if the treatment levels are fixed,
$$E[SST]=n\sum_{i=1}^{t}\alpha_i^2+(t-1)\sigma^2$$

(here ᾱ = 0, since Σi αi = 0 for the fixed effect model), and hence

$$E[MST]=\frac{E[SST]}{t-1}=\sigma^2+\frac{n\sum_{i=1}^{t}\alpha_i^2}{t-1}$$

If the null hypothesis H0: αi = 0 is true, then Σ αi² = 0 and hence E[MST] = σ². Thus MST is also an unbiased estimate of σ² if the null
hypothesis of no treatment effect is true.
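These expected values can be checked by simulation. Below is a minimal sketch, assuming an illustrative fixed-effects configuration with Σαi = 0; the averages of MSE and MST over many simulated experiments should approach σ² and σ² + nΣαi²/(t − 1) respectively:

```python
import random

mu, sigma = 10.0, 2.0
alpha = [-1.0, 0.0, 1.0]        # illustrative fixed effects, summing to zero
t, n, reps = len(alpha), 5, 20000

sum_mse = sum_mst = 0.0
for _ in range(reps):
    # Simulate one experiment under the model y_ij = mu + alpha_i + eps_ij.
    data = [[mu + a + random.gauss(0, sigma) for _ in range(n)] for a in alpha]
    gm = sum(map(sum, data)) / (t * n)
    means = [sum(row) / n for row in data]
    mst = n * sum((m - gm) ** 2 for m in means) / (t - 1)
    mse = sum((y - m) ** 2 for row, m in zip(data, means) for y in row) / (t * (n - 1))
    sum_mse += mse
    sum_mst += mst

print(sum_mse / reps)   # close to sigma^2 = 4
print(sum_mst / reps)   # close to sigma^2 + n*sum(alpha_i^2)/(t-1) = 9
```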
Now as we have assumed that εij ~ NID(0, σ²), we also have yij ~ NID(μi, σ²), i.e., yij ~ NID(μ + αi, σ²).
So SSE/σ² ~ χ²(t(n - 1)) and SST/σ² ~ χ²(t - 1) (under the null hypothesis).
Since the degrees of freedom of the two χ² variables add up to (tn - 1), the total number of
degrees of freedom, Cochran's theorem implies that the two χ² variables are
independent. Thus under H0: αi = 0,

MST/MSE ~ F[(t - 1), t(n - 1)]
We see that MSE is an unbiased estimate of σ² and, under the null hypothesis, MST is also an
unbiased estimate of σ². However, if H0 is not true, then E[MST] will be greater than σ², thereby
giving a larger value of the F statistic. Thus a large value of F implies a false H0. Hence the critical
region for the analysis of variance will be:
Reject H0 if Fcal > Fα, [(t - 1), t(n - 1)].
NOTE: By Cochran's theorem, if Xi ~ NID(0, 1), i = 1, 2, . . ., n, and Σ Xi² = Q1 + Q2 + . . . +
Qs, where s ≤ n and each Qi is a χ² variable with ni degrees of freedom, then these variables
will be independent if Σ ni = n, i.e., if n1 + n2 + . . . + ns = n.
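In practice the critical value Fα, [(t − 1), t(n − 1)] is read from an F table or computed numerically. A minimal sketch, assuming (for illustration) t = 3, n = 8 and α = 0.05:

```python
from scipy.stats import f

t, n, alpha = 3, 8, 0.05
df1, df2 = t - 1, t * (n - 1)

f_crit = f.ppf(1 - alpha, df1, df2)   # upper alpha point of F(df1, df2)
print(f"Reject H0 if Fcal > {f_crit:.3f}")
```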
The least squares estimates of the parameters are obtained by minimizing

$$L=\sum_{i=1}^{t}\sum_{j=1}^{n}\left(y_{ij}-\mu-\alpha_i\right)^2$$

Differentiating with respect to μ and each αi and equating to zero gives the normal equations:

$$\frac{dL}{d\mu}=-2\sum_{i=1}^{t}\sum_{j=1}^{n}\left(y_{ij}-\mu-\alpha_i\right)=0$$

$$\vdots$$

$$\frac{dL}{d\alpha_t}=-2\sum_{j=1}^{n}\left(y_{tj}-\mu-\alpha_t\right)=0$$

Now using the assumption Σ αi = 0 (of the fixed effect model), the first equation gives

$$nt\hat{\mu}=\sum_{i=1}^{t}\sum_{j=1}^{n}y_{ij}\;\Rightarrow\;\hat{\mu}=\bar{y}_{..}$$

and from the equation for α1,

$$n\hat{\mu}+n\hat{\alpha}_1=\sum_{j=1}^{n}y_{1j}\;\Rightarrow\;\hat{\alpha}_1=\bar{y}_{1.}-\bar{y}_{..}$$

Similarly we can get

$$\hat{\alpha}_2=\bar{y}_{2.}-\bar{y}_{..},\quad\hat{\alpha}_3=\bar{y}_{3.}-\bar{y}_{..},\quad\ldots,\quad\hat{\alpha}_t=\bar{y}_{t.}-\bar{y}_{..}$$
Confidence Interval for the ith Treatment Mean
As the mean of the ith treatment is given by μi = μ + αi, the point estimator of μi would be

$$\hat{\mu}_i=\hat{\mu}+\hat{\alpha}_i=\bar{y}_{..}+\bar{y}_{i.}-\bar{y}_{..}=\bar{y}_{i.}$$

Now as yij ~ NID(μi, σ²), therefore ȳi. ~ NID(μi, σ²/n).
Since MSE is an unbiased estimate of σ², the variable

$$\frac{\bar{y}_{i.}-\mu_i}{\sqrt{MSE/n}}\sim t_{t(n-1)}$$

Thus the 100(1 - α)% confidence interval for μi can be constructed as follows:

$$P\left(-t_{\alpha/2}<t<t_{\alpha/2}\right)=1-\alpha$$

$$\Rightarrow P\left(-t_{\alpha/2;\,t(n-1)}<\frac{\bar{y}_{i.}-\mu_i}{\sqrt{MSE/n}}<t_{\alpha/2;\,t(n-1)}\right)=1-\alpha$$

$$\Rightarrow P\left(-t_{\alpha/2;\,t(n-1)}\sqrt{\frac{MSE}{n}}<\bar{y}_{i.}-\mu_i<t_{\alpha/2;\,t(n-1)}\sqrt{\frac{MSE}{n}}\right)=1-\alpha$$

$$\Rightarrow P\left(\bar{y}_{i.}-t_{\alpha/2;\,t(n-1)}\sqrt{\frac{MSE}{n}}<\mu_i<\bar{y}_{i.}+t_{\alpha/2;\,t(n-1)}\sqrt{\frac{MSE}{n}}\right)=1-\alpha$$

which is the 100(1 - α)% confidence interval for the ith treatment mean μi.
Example
An anthropologist was interested in studying physical differences, if any, among the various races
of people inhabiting Hawaii. As a part of her study she obtained a random sample of eight
5-year-old girls from each of three races: Caucasian, Japanese and Chinese. She made a number
of anthropometric measurements on each girl. She wanted to determine whether the Oriental races
differ from the Caucasian, and whether the Oriental races differ from each other. The results of
the head width measurements are given in Table II.

Table II
Caucasian   Japanese   Chinese
14.20       12.85      14.15
14.30       13.65      13.90
15.00       13.40      13.65
14.60       14.20      13.60
14.55       12.75      13.20
15.15       13.35      13.20
14.60       12.50      14.05
14.55       12.80      13.80

The anthropologist is interested in answers to the following questions:
1. Do head width means differ among the races?
2. Is there a difference between the Caucasian race and the Oriental races?
3. Do the Oriental races differ in head width?
4. Find a 95% confidence interval on the mean of Chinese head width.
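A minimal sketch of the computations for questions 1 and 4, using the Table II data (scipy.stats.f_oneway carries out the one-way ANOVA; the confidence interval uses the formula derived above):

```python
from scipy import stats

caucasian = [14.20, 14.30, 15.00, 14.60, 14.55, 15.15, 14.60, 14.55]
japanese  = [12.85, 13.65, 13.40, 14.20, 12.75, 13.35, 12.50, 12.80]
chinese   = [14.15, 13.90, 13.65, 13.60, 13.20, 13.20, 14.05, 13.80]

# Question 1: one-way ANOVA for differences among the three races.
F, p = stats.f_oneway(caucasian, japanese, chinese)
print(f"F = {F:.2f}, p = {p:.4f}")

# Question 4: 95% CI for the Chinese mean, using MSE with t(n-1) df.
groups, t_, n = [caucasian, japanese, chinese], 3, 8
means = [sum(g) / n for g in groups]
mse = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g) / (t_ * (n - 1))
tcrit = stats.t.ppf(0.975, t_ * (n - 1))
half = tcrit * (mse / n) ** 0.5
print(f"{means[2] - half:.3f} < mu_Chinese < {means[2] + half:.3f}")
```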
For the random effects model,

$$E[SST]=E\left[n\sum_{i=1}^{t}\left(\bar{y}_{i.}-\bar{y}_{..}\right)^2\right]=E\left[n\sum_{i=1}^{t}\left(\mu+\tau_i+\bar{\varepsilon}_{i.}-\mu-\frac{1}{t}\sum_{i=1}^{t}\tau_i-\bar{\varepsilon}_{..}\right)^2\right]$$

$$=E\left[n\sum_{i=1}^{t}\left\{\left(\tau_i-\bar{\tau}\right)+\left(\bar{\varepsilon}_{i.}-\bar{\varepsilon}_{..}\right)\right\}^2\right]$$

Therefore

$$E[SST]=n\,E\sum_{i=1}^{t}\left(\tau_i-\bar{\tau}\right)^2+n\,E\sum_{i=1}^{t}\left(\bar{\varepsilon}_{i.}-\bar{\varepsilon}_{..}\right)^2+E[\text{cross-product terms}]$$

$$=n(t-1)\sigma_{\tau}^2+(t-1)\sigma^2$$

so that

$$E[MST]=n\sigma_{\tau}^2+\sigma^2$$

Now as

$$\frac{SSE}{\sigma^2}\sim\chi^2_{t(n-1)}$$

and under H0 (σ²τ = 0), we have

$$\frac{SST}{n\sigma_{\tau}^2+\sigma^2}=\frac{SST}{\sigma^2}\sim\chi^2_{(t-1)}$$

Thus under H0, we have

$$F=\frac{SST/\sigma^2(t-1)}{SSE/\sigma^2 t(n-1)}=\frac{SST/(t-1)}{SSE/t(n-1)}=\frac{MST}{MSE}\sim F[t-1;\,t(n-1)]$$

Now under H0, both MST and MSE are unbiased estimates of σ². But if H0 is false, then
E[MST] = nσ²τ + σ², i.e., under H1 the expected value of the numerator is greater than the expected
value of the denominator. Hence the critical region of the test will be:
Reject H0 if Fcal > Fα; [t - 1, t(n - 1)]
When the numbers of observations under the different treatments are unequal, n is replaced by

$$n_0=\frac{1}{t-1}\left[\sum_{i=1}^{t}n_i-\frac{\sum_{i=1}^{t}n_i^2}{\sum_{i=1}^{t}n_i}\right]$$
To construct a confidence interval for the variance components, note that the quantity

$$F=\frac{MST}{MSE}\cdot\frac{\sigma^2}{n\sigma_{\tau}^2+\sigma^2}$$

follows an F distribution, so that

$$P\left[F_{1-\alpha/2}(d.f.)\le F\le F_{\alpha/2}(d.f.)\right]=1-\alpha$$

$$\Rightarrow P\left[F_{1-\alpha/2}(d.f.)\le\frac{MST}{MSE}\cdot\frac{\sigma^2}{n\sigma_{\tau}^2+\sigma^2}\le F_{\alpha/2}(d.f.)\right]=1-\alpha$$

$$\Rightarrow P\left[\frac{1}{F_{\alpha/2}(d.f.)}\cdot\frac{MST}{MSE}\le\frac{n\sigma_{\tau}^2+\sigma^2}{\sigma^2}\le\frac{1}{F_{1-\alpha/2}(d.f.)}\cdot\frac{MST}{MSE}\right]=1-\alpha$$

$$\Rightarrow P\left[\frac{1}{n}\left(\frac{MST}{MSE}\cdot\frac{1}{F_{\alpha/2}(d.f.)}-1\right)\le\frac{\sigma_{\tau}^2}{\sigma^2}\le\frac{1}{n}\left(\frac{MST}{MSE}\cdot\frac{1}{F_{1-\alpha/2}(d.f.)}-1\right)\right]=1-\alpha$$

Now the confidence interval for σ²τ / (σ²τ + σ²) can be found using the above interval.
Writing it as L ≤ σ²τ/σ² ≤ U, and noting that we can write

$$\frac{\sigma_{\tau}^2}{\sigma_{\tau}^2+\sigma^2}=\frac{\sigma_{\tau}^2/\sigma^2}{1+\sigma_{\tau}^2/\sigma^2}$$

the required confidence limits will be:

$$\frac{L}{1+L}\le\frac{\sigma_{\tau}^2}{\sigma_{\tau}^2+\sigma^2}\le\frac{U}{1+U}$$

where

$$L=\frac{1}{n}\left(\frac{MST}{MSE}\cdot\frac{1}{F_{\alpha/2}(d.f.)}-1\right),\qquad U=\frac{1}{n}\left(\frac{MST}{MSE}\cdot\frac{1}{F_{1-\alpha/2}(d.f.)}-1\right)$$
Example
A textile company weaves a fabric on a large number of looms. They would like the looms to be
homogeneous so that they obtain a fabric of uniform strength. To investigate the variation in
strength between looms, four (4) looms are selected at random and the strength of the fabric
manufactured on each loom is determined. The data obtained are given in Table IV.

Table IV
Observation   Loom 1   Loom 2   Loom 3   Loom 4
1             98       91       96       95
2             97       90       95       96
3             99       93       97       99
4             96       92       95       98
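A sketch of the variance-component computations for the loom data, using the usual point estimates σ̂² = MSE and σ̂²τ = (MST − MSE)/n together with the confidence limits L/(1 + L) and U/(1 + U) derived above:

```python
from scipy.stats import f

# Table IV data, arranged by loom (columns of the table).
looms = [
    [98, 97, 99, 96],   # loom 1
    [91, 90, 93, 92],   # loom 2
    [96, 95, 97, 95],   # loom 3
    [95, 96, 99, 98],   # loom 4
]
t, n, alpha = len(looms), len(looms[0]), 0.05

gm = sum(map(sum, looms)) / (t * n)
means = [sum(row) / n for row in looms]
mst = n * sum((m - gm) ** 2 for m in means) / (t - 1)
mse = sum((y - m) ** 2 for row, m in zip(looms, means) for y in row) / (t * (n - 1))

var_tau = (mst - mse) / n          # point estimate of sigma^2_tau
ratio = mst / mse
df1, df2 = t - 1, t * (n - 1)
L = (ratio / f.ppf(1 - alpha / 2, df1, df2) - 1) / n
U = (ratio / f.ppf(alpha / 2, df1, df2) - 1) / n
print(var_tau, mse)
print(L / (1 + L), U / (1 + U))    # CI for sigma^2_tau / (sigma^2_tau + sigma^2)
```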