BDU Biometrics
Instructor:
Getachew Alemayehu Damot (PhD, Associate Professor)
Part I: Some Important Statistical Concepts Useful for Experimental Designs and Analysis
Statistics is the science, pure and applied, of creating, developing and applying techniques
such that the uncertainty of inductive inferences may be evaluated.
Statistics is used in almost all fields of human activities and used by government bodies,
private business firms and research agencies as an indispensable tool.
Limitations of statistics
1. It cannot deal with a single observation or value.
2. Statistical methods are not applicable to studies of qualitative characters that cannot be
coded as numerical values.
3. A statistical study does not take account of changes occurring to individuals.
4. Statistical statements or conclusions are generally not true of, or applicable to,
individuals; they are applicable to the majority of a class.
5. Misuse of statistics arises from deliberate motivation, from lack of knowledge, or from
application of inappropriate methodology.
6. Complete accuracy in statistics is often impossible.
In summary, statistics is a highly developed science with a deep-rooted mathematical base. It is
applicable to a large number of economic, social and business phenomena handled by
different bodies of government and the private sector. It is a backbone of industrial research.
E(x) = ∫_(−∞)^(+∞) x ƒ(x) dx, if x is a continuous random variable.

Mean (μ) – is a measure of central tendency: the value that tells us where the observations
aggregate, toward the center.
Variance (σ²) – is a measure of the dispersion of the observations about the
mean. It is an indicator of how close to, or how far from, one another they are.
The variance is the expected squared deviation of each value from the mean.
B) Properties of variance
1) v(c) = 0, where c is a constant
2) v(x) = E(x−μ)² = σ²
3) v(cx) = E[c(x−μ)]² = c²E(x−μ)² = c²σ²
4) v(x1+x2) = v(x1) + v(x2) + 2cov(x1, x2) = σ1² + σ2² + 2σ12, where cov means the
covariance of x1 & x2
5) v(x1−x2) = v(x1) + v(x2) − 2cov(x1, x2) = σ1² + σ2² − 2σ12
Note: cov(x1, x2) = 0 if x1 & x2 are independent.
If x1 & x2 are dependent, cov(x1, x2) ≠ 0; it takes some other value.
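These properties are easy to confirm numerically. Below is a minimal Python sketch (added for illustration; it assumes NumPy is available, and the distributions and constant c are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(1)
    x1 = rng.normal(10, 2, 200_000)               # var(x1) is about 4
    x2 = 0.5 * x1 + rng.normal(0, 1, 200_000)     # deliberately dependent on x1

    c = 3.0
    v = lambda a: np.var(a, ddof=1)
    cov12 = np.cov(x1, x2)[0, 1]
    print(v(c * x1), c**2 * v(x1))                # property 3: v(cx) = c^2 v(x)
    print(v(x1 + x2), v(x1) + v(x2) + 2 * cov12)  # property 4
    print(v(x1 - x2), v(x1) + v(x2) - 2 * cov12)  # property 5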
2. Sampling and Sampling Distribution
a) Observation – is an elementary unit, or a measure of some attribute, of a population.
Attributes are the characters of interest. Examples: grain yield per
spike, percent silt in some soils, amount of fat in the milk of Zebu cows, etc.
b) Population – is the universe, aggregate or totality of elementary units or
measurements. All the measurements in which we are interested are statistically called the
population. Examples: all the grain yields of a given wheat variety, the soils of
Debre Zeit, all the milk of Zebu cows, etc. Some populations are finite and other
populations are infinite.
c) Sample – is a set of n observations or elementary units drawn from a population.
Normally, n is designated as sample size, whereas N is designated as population size.
The objective of statistical analysis is to draw inferences about a population based on the
results of a sample: to give a conclusion about a population, a sample should be taken and
analyzed statistically; otherwise it is difficult to give a conclusion about the population.
Most of the time it will be assumed that a random sample is taken, rather than a systematic
sample. Assume that the population consists of N individuals and a sample of n individuals
is taken. The total number of possible samples is calculated as

C(N, n) = N!/[n!(N − n)!].

If all possible samples are given an equal opportunity of being selected, the sample is said
to be a random sample.

Example: a population of letters includes A, B, C, D & E (N = 5). If 2 letters are taken as a
sample, how many possible samples can be formed randomly?

C(5, 2) = 5!/[2!(5 − 2)!] = (5 × 4 × 3 × 2 × 1)/[(2 × 1)(3 × 2 × 1)] = 10

possible samples can be formed from this population.
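A short Python check of this count (illustrative only):

    from itertools import combinations
    from math import comb

    population = ["A", "B", "C", "D", "E"]        # N = 5
    samples = list(combinations(population, 2))   # every possible sample of n = 2
    print(len(samples), comb(5, 2))               # both print 10
    print(samples)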
d) Parameter – is some value of a population which is usually not known; it is the value
we would like to estimate. Parameters are expressed (designated) in
Greek letters such as μ (mu), σ² (sigma squared), ρ (rho), τ (tau), etc. Parameters are
constants, i.e., they are not random variables.
e) Statistic – is a sample estimate of a parameter. Statistics are estimates computed from a
sample and are designated by Latin letters such as ȳ, s², etc. Suppose y1, y2, y3, …, yn is a
random sample; then

ȳ = Σyi/n  — the sample mean,
s² = Σ(yi − ȳ)²/(n − 1)  — the sample variance, where n − 1 is the degrees of freedom,
s = √s²  — the standard deviation.

All of ȳ, s² and s are statistics. What is the meaning of statistics? Statistics are estimates
(values) computed from a sample. ȳ estimates μ, where ȳ is the minimum-sum-of-squares
estimate of μ; s² estimates σ²; and s estimates σ.
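In Python these three statistics take a few lines (a sketch with made-up sample values, not data from the text):

    import numpy as np

    y = np.array([4.0, 6.0, 3.0, 7.0, 5.0])    # hypothetical sample
    n = y.size
    ybar = y.sum() / n                          # sample mean, estimates mu
    s2 = ((y - ybar) ** 2).sum() / (n - 1)      # sample variance (n - 1 df)
    s = s2 ** 0.5                               # standard deviation
    print(ybar, s2, s)
    print(np.mean(y), np.var(y, ddof=1), np.std(y, ddof=1))   # library equivalents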
The normal curve is the theoretical model that is used to analyze data in real life.
Density function – The frequency or density fxn of the normal law is given as follows:

ƒ(x) = [1/√(2πσ²)] e^(−(x−μ)²/(2σ²)), where −∞ < x < +∞,
−∞ < μ < +∞,
σ² > 0,
e ≈ 2.718…
Frequency curve – (Figure: the bell-shaped frequency curve of the normal distribution.)

In this case, we say x has a normal distribution with mean μ and variance σ².
Symbolically, x ~ N(μ, σ²), where ~ means "is distributed as".
If x is distributed normally with mean μ and variance σ², i.e., x ~ N(μ, σ²), then
define Ƶ as:

Ƶ = (x − μ)/σ.

This Ƶ is distributed normally with mean 0 and variance 1; in other words, Ƶ ~ N(0, 1).

The density fxn of Ƶ is given as follows: ƒ(ƶ) = [1/√(2π)] e^(−ƶ²/2), −∞ < ƶ < +∞.
The reason the standard normal distribution is used in practice is that, whatever the
population's μ and σ² are, any normal random variable can be standardized to Ƶ.
The probability distribution of the standard normal variate has been completely
tabulated.
A disadvantage of this is that many statistical techniques assume that the random variable
under consideration is normally distributed and use this distribution to calculate various
probabilities. Consequently, many people use the standard normal distribution whether or
not their data actually follow a normal distribution. There are indeed cases which do not
fit the normal distribution — for example, the number of females born in the country
annually, or the number of telephone calls — and these force us to use another
distribution, known as the Poisson distribution.
Central Limit Theorem
The central limit theorem is closely related to (though distinct from) the law of large numbers.
If y1, y2, y3, …, yn is a sequence of n independent variables with E(yi) = μi and
v(yi) = σi², and x = y1 + y2 + y3 + … + yn, then

Ƶn = [x − Σμi]/√(Σσi²) ~ N(0, 1), approximately, for large n.

Therefore, the central limit theorem gives us the justification for using the normal
distribution. Even if the data are not normal — Poisson, binomial, or some other
distribution — their sums and means can be treated as (approximately) normal, because the
standardized sum obeys the central limit theorem.
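A quick simulation illustrates the theorem (a sketch assuming NumPy; Poisson(4) is an arbitrary non-normal choice, for which μi = σi² = 4):

    import numpy as np

    rng = np.random.default_rng(0)
    n, reps = 50, 10_000
    sums = rng.poisson(4, size=(reps, n)).sum(axis=1)   # x = y1 + ... + yn
    z = (sums - n * 4) / np.sqrt(n * 4)                 # (x - sum of mu_i) / sqrt(sum of sigma_i^2)
    print(z.mean(), z.std())                            # close to 0 and 1
    print(np.mean(np.abs(z) < 1.96))                    # close to 0.95, as for N(0, 1)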
Define u = x1² + x2² + x3² + … + xn², where the xi are independent standard normal
variables; then u has the chi-square distribution with the following frequency function:

ƒ(u) = [1/((n/2 − 1)! · 2^(n/2))] · u^(n/2 − 1) · e^(−u/2), u ≥ 0.

(Figure: chi-square density curves from n = 1 upward, over 0 < u < +∞.)

From the curve, we note that as the sample size increases, the chi-square curve approaches
the normal curve. So when the sample size is high, it is convenient to use the normal
distribution, since it is easier to handle than the chi-square.
Goodness of fit
Define X² = Σ (Observedᵢ − Expectedᵢ)²/Expectedᵢ = Σ (Oi − Ei)²/Ei, summed over the k classes.
Here X2 is used as a measure of discrepancy between theoretical and observed
frequencies. Such type of chi-square is used in plant and animal breeding.
Example, crossing two heterozygote plants of sorghum for height, Aa x Aa ⇒ 1AA, 2Aa,
1aa. If you cross 100 plants, as a breeder you may expect 25 of AA, 50 of Aa and 25 of
aa.
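This 1:2:1 test takes only a few lines of Python (a sketch assuming SciPy; the observed counts are hypothetical):

    from scipy import stats

    observed = [18, 55, 27]     # hypothetical counts of AA, Aa, aa among 100 plants
    expected = [25, 50, 25]     # 1 : 2 : 1 Mendelian expectation
    chi2, p = stats.chisquare(observed, f_exp=expected)
    print(chi2, p)              # compare chi2 with the table value at k - 1 = 2 df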
If y1, y2, y3, …, yn is a random sample from a normal distribution, that is, y ~ N(μ, σ²),
and the sum of squares is defined as SS = Σ(yi − ȳ)², then

SS/σ² ~ χ²(n − 1),

where n − 1 is the degrees of freedom.

If the random variables in the sample are independently and identically distributed, then,
since s² = SS/(n − 1),

s² ~ [σ²/(n − 1)] χ²(n − 1), and equivalently (n − 1)s²/σ² ~ χ²(n − 1), because (n − 1)s² = SS.
(Figure: the density ƒ(t) of the t-distribution, symmetric about t = 0.)
Properties of the t-distribution
1. As the sample size increases, the curve approaches the normal curve.
2. E(t) = 0
3. v(t) = k/(k − 2), where k > 2 (k = degrees of freedom)
4. The curve is symmetric
If y1, y2, y3, …, yn is a random sample from the normal distribution, that is, y ~ N(μ, σ²),
then define t as

t = (ȳ − μ)/(s/√n) ~ t(n − 1),

where s/√n is the standard error.
Let x1, x2, x3, …, xm be a random sample from a normal distribution with mean μ1 and
variance σ1², and let y1, y2, y3, …, yn be another random sample with mean μ2 and variance
σ2². Then define

Ƶ = [(x̄ − ȳ) − (μ1 − μ2)]/√(σ1²/m + σ2²/n) ~ N(0, 1),

where m and n are the sample sizes for the first and second samples, respectively. This
is true if the two variances σ1² and σ2² are known. If the two population variances are not
known, then they have to be estimated from the samples:

s1² = Σ(xi − x̄)²/(m − 1)
s2² = Σ(yi − ȳ)²/(n − 1)

Compute the pooled variance as

sp² = (SSx + SSy)/[(m − 1) + (n − 1)] = [(m − 1)s1² + (n − 1)s2²]/(m + n − 2).

Here, we have assumed that the populations have the same variance, that is, σ1² = σ2² = σ².

The t-statistic is defined as

t = [(x̄ − ȳ) − (μ1 − μ2)]/[sp√(1/m + 1/n)] ~ t(m + n − 2).

This is known as a two-sample t-test with independent samples, and it is valid for σ1 = σ2.
If σ1² ≠ σ2², there is no sp; instead the standard error is √(s1²/m + s2²/n).
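The two cases (pooled sp versus unequal variances) can be written as one small Python function (a sketch assuming NumPy; the unequal-variance branch uses the Welch-style standard error just described):

    import numpy as np

    def two_sample_t(x, y, pooled=True):
        """t for H0: mu1 = mu2 with independent samples."""
        m, n = len(x), len(y)
        s1, s2 = np.var(x, ddof=1), np.var(y, ddof=1)
        if pooled:                                       # assumes sigma1^2 = sigma2^2
            sp2 = ((m - 1) * s1 + (n - 1) * s2) / (m + n - 2)
            se = np.sqrt(sp2 * (1 / m + 1 / n))          # df = m + n - 2
        else:                                            # sigma1^2 != sigma2^2: no sp
            se = np.sqrt(s1 / m + s2 / n)
        return (np.mean(x) - np.mean(y)) / se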
The density of the F-distribution with u (numerator) and v (denominator) degrees of freedom is

h(f) = {Γ[(u + v)/2] (u/v)^(u/2) f^(u/2 − 1)} / {Γ(u/2) Γ(v/2) [1 + (u/v)f]^((u+v)/2)}, f ≥ 0.
(Figure: the density h(f) of the F-distribution against f, skewed to the right.)

Application
Assume that you have two samples x1, x2, x3, …, xm and y1, y2, y3, …, yn from populations
with the same variance; then define F = s1²/s2² ~ F(m − 1, n − 1). Here, we have
assumed that s1² > s2²: the larger variance is usually placed in the numerator and the
smaller in the denominator.
3. Estimation
In statistics we are often interested in estimating some property of a population, or in testing a
hypothesis concerning that property. A sample is taken from the population, the
value or values concerning the property or properties are calculated, and these values are
taken as estimates of the population properties.
Assume that we are interested in determining the mean of a normal distribution; then we
take a sample and use the sample mean ȳ as the estimating
statistic. In this case, no other estimator of the population mean μ, such as the median or
the mode, gives additional information.
x̄ = Σxi/n = (x1 + x2 + x3 + … + xn)/n
4. Efficiency – An estimator is said to be efficient if it has the minimum (smallest)
variance. Let θ̂1, θ̂2, θ̂3 be estimators of a parameter θ with their respective
variances. The hat mark (ˆ) over a symbol means "estimator": θ̂ is an estimator
of θ, and μ̂ is an estimator of μ. For instance, if the variances of θ̂1, θ̂2, θ̂3 are 2.00, 2.32
and 1.81, respectively, then θ̂3 is the most efficient.
Methods of estimation
There are 3 important methods of estimation in mathematics. These are:
1. The least square method
2. Maximum likelihood method
3. Minimum χ 2-method, ( χ 2 = Chi-Square)
Why do we need these methods? The answer: to estimate the population parameters.
1. Maximum likelihood (M.L.)
The M.L. method gives estimators with desirable properties such as consistency,
efficiency and sufficiency, but not necessarily unbiasedness. Assume that we have the
following parameters to be estimated: θ1, θ2, θ3, …, θk. Let x be the random variable (r.v.)
of interest, and take a random sample x1, x2, x3, …, xn. Then the M.L. criterion is

L = ∏ ƒ(xi; θ1, θ2, θ3, …, θk),

where ∏ denotes a product over the sample; this L is known as the likelihood. The values
of θ1, θ2, θ3, …, θk are chosen in such a way that L is maximized.

Example: for the normal density ƒ(x) = [1/√(2πσ²)] e^(−(x−μ)²/(2σ²)), the likelihood of the
sample is the product of this density over the n observations, and maximizing it gives the
estimates of μ & σ².
2. Least square method (L.S.)
It is the most widely used technique of estimation. For the least squares estimate, take the
values of the estimators which result in the smallest deviation of the actual values
from the expected values. Let x1, x2, x3, …, xn be a sample from a given population; then the
least squares criterion is

L = Σ[xi − E(xi)]² = Σ(xi − μ)².
3. Minimum χ² method
This technique is used with frequency data. Say there are γ attributes (classes), and
take a sample of size n. Let ni of these values belong to the ith attribute, and let pi
be the true population proportion of the ith attribute; then the minimum χ² criterion is

χ* = Σᵢ₌₁..γ (ni − n·pi)²/(n·pi).

Find the values of pi which minimize χ*. χ* is asymptotically efficient: when the sample
size is large it is very efficient, because for large samples χ* is distributed as χ². In this
course, we will be mainly interested in the least squares method, because it gives unbiased
estimators and is simpler to compute than the other methods.
4. Hypotheses Testing
Definition – Hypothesis is a statement about some attribute(s) of a population. Examples: the
proportion of silt in the soil of Debre Zeit is 50%; there is no difference in yield of fertilized
and unfertilized crops; the butter fat percentage of Zebu cows is 4; etc.
Hypothesis testing – is a procedure in which an experimenter decides which of the two sets
of dichotomous hypotheses is accepted or rejected with a specified level of risk of making an
error. There are two hypotheses, namely:
(a) Exact hypothesis = a two-tailed (two-sided) test, using Ƶα/2 or tα/2. Examples:
(i) H0 (H-null) hypothesis, such as μ = 4; and
(ii) Ha (H-alternate) hypothesis, μ ≠ 4.
(b) Inexact hypothesis = a one-tailed (one-sided) test, using Ƶα or tα. Examples:
(i) H0 (H-null) hypothesis, μ ≥ 4; and
(ii) Ha (H-alternate) hypothesis, μ < 4.
Steps in hypothesis testing
1. State the null hypothesis,
2. Determine the test statistics and sample statistics (Ƶ, t, F, χ 2, etc.),
3. Determine the significance level,
4. Decide on sample size,
5. Take a sample and compute sample statistics, and
6. Make a decision by comparing sample statistic with test statistic. Reject your hypothesis
if computed value is larger than table value of your statistic.
For example, for testing means we can use the Ƶ- or t-statistics:

Ƶ = (x̄ − μ)/(σ/√n),   t = (x̄ − μ)/(s/√n).
t-test is used – (1) when σ is not known, or
(2) when the sample size is <30.
Statistical inference includes (a) estimation, and (b) hypothesis testing.
Example: let our hypothesis be H0: μ = 100. Assume that the random variable under
study is normally distributed with mean μ and variance σ² = 25. Let the significance level
be 0.05. Take a sample of size 100 and calculate x̄ = 102. Then Ƶ is computed as:

Ƶ = (x̄ − μ)/(σ/√n) = (102 − 100)/(5/√100) = 2/0.5 = 4.

Whereas Ƶ0.05 from the standard normal distribution table = 1.645.
Since 4 > 1.645, the decision is: reject H0.
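The same computation in Python (a sketch assuming SciPy for the table value):

    from math import sqrt
    from scipy import stats

    xbar, mu0, sigma, n = 102, 100, 5, 100     # sigma = sqrt(25)
    z = (xbar - mu0) / (sigma / sqrt(n))       # = 4.0
    z_crit = stats.norm.ppf(0.95)              # 1.645, one-tailed alpha = 0.05
    print(z, z_crit, "reject H0" if z > z_crit else "accept H0")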
(Figure: one-tailed tests for inexact hypotheses — the rejection region lies in the left tail
for a "less than" alternative and in the right tail for a "greater than" alternative.)

(Figure: two-tailed test for an exact hypothesis — rejection regions of area α/2 in each
tail, with the acceptance region around μ.)
The significance level α (alpha) is the probability which determines your rejection region. In
industry it is called the tolerance level, while in economics or business it is called the risk level.
If the work must be very precise, the α level is set very small; where less precision is required,
the level of risk may be larger. Usually the 0.01 & 0.05 levels of error are used for
most agricultural and biological studies.
(Figure: the sampling distributions under H0 and under Ha, with areas 1 − α and 1 − β marked.)

Decision category:

                         True situation
Decision          μ = 100                μ = 105
μ = 100           1 − α (correct)        β (Type II error)
μ = 105           α (Type I error)       1 − β (correct)
Power of a test
The power of a test is the probability of rejecting a hypothesis when it is false. This probability is
designated by 1 − β. An experimenter should select the experimental design and set of decision
rules which give the highest power.
Methods of increasing power of test
1. Increase sample size,
2. Taking samples randomly,
3. Select an appropriate experimental design which precisely measures the treatment
effects and results in a small error variance.
S1² = Σᵢ₌₁..m (x1i − x̄1)²/(m − 1),

S2² = Σᵢ₌₁..n (x2i − x̄2)²/(n − 1)
Pairing removes extraneous unit-to-unit variation and thereby reduces the error.
Sp² = [(m − 1)S1² + (n − 1)S2²]/[(m − 1) + (n − 1)]. Here, we have
assumed that the population variances are equal. The test using Sp is known as the independent t-test.
Assignment: Is Bartlett’s test the best? Read Bartlett’s test for homogeneity variances.
Let x1, x2, x3, …, xm be a random sample and y1, y2, y3, …, yn be another random sample;
then x ~ N(μx, σx²), as well as y ~ N(μy, σy²).
Assume that the two variances σx² and σy² are equal; in other words, the two population
variances are equal. Then we can test the hypothesis H0: μx = μy, or H0: μx − μy = 0.
Procedure
1. Decide on significant level
2. Decide on sample size
3. Take the samples
4. Compute the two variances, then compute pooled variance
5. Compute two sample t
6. Reject the hypothesis if t-calculated > tα/2(m+n−2). This is the two-tailed test; if the
hypothesis is one-sided (inexact), use tα, not tα/2.
Computation of estimates
Note that the hat sign (ˆ) denotes a sample estimate of a parameter.

1. μ̂x = x̄ = Σᵢ₌₁..m xi/m
2. μ̂y = ȳ = Σᵢ₌₁..n yi/n
3. σ̂x² = sx² = Σᵢ₌₁..m (xi − x̄)²/(m − 1)
4. σ̂y² = sy² = Σᵢ₌₁..n (yi − ȳ)²/(n − 1)
5. σ̂² = sp² = (SSx + SSy)/(m + n − 2)

Example: let x = 2, 1, 3, 2 and y = 5, 3, 4, 8 (m = n = 4), so x̄ = 2 and ȳ = 5.

sx² = [(2 − 2)² + (1 − 2)² + (3 − 2)² + (2 − 2)²]/3 = 2/3 = 0.667
sx² = Σᵢ₌₁..m (xi − x̄)²/(m − 1)   — deviation formula

sx² = [Σxi² − (Σxi)²/m]/(m − 1)   — machine formula

Σxi² = 2² + 1² + 3² + 2² = 18; (Σxi)²/m = 8²/4 = 16; so sx² = (18 − 16)/3 = 0.667.

According to the machine formula, sy² = [Σyi² − (Σyi)²/n]/(n − 1):
Σyi² = 25 + 9 + 16 + 64 = 114
(Σyi)²/n = 20²/4 = 400/4 = 100
sy² = (114 − 100)/(4 − 1) = 14/3 = 4.667

The pooled sample variance is estimated as: sp² = (SSx + SSy)/(m + n − 2) = (2 + 14)/[(4 + 4) − 2] = 16/6 = 2.667
sp = √sp² = √2.667 = 1.633
t = [(x̄ − ȳ) − (μx − μy)]/[sp√(1/m + 1/n)]
t = [(2 − 5) − 0]/[1.633√(1/4 + 1/4)] = −3/1.155 = −2.60

Tabulated t-value: tα/2(m + n − 2) = t0.025(6) = 2.447

Decision: Reject H0, since |t-calculated| > t-tabulated, indicating that the population means of x &
y are not equal.
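SciPy reproduces this worked example directly (a check, using the x and y values reconstructed above):

    from scipy import stats

    x = [2, 1, 3, 2]    # mean 2, SSx = 2
    y = [5, 3, 4, 8]    # mean 5, SSy = 14
    t, p = stats.ttest_ind(x, y, equal_var=True)   # pooled two-sample t, df = 6
    print(t, p)         # t is about -2.60, p about 0.04; |t| > t_0.025(6) = 2.447, so reject H0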
If we don’t wish to assume that the two variances are equal, then we can test the following
hypothesis: H0: σx2 = σy2 vs. Ha: σx2 ≠ σy2
Procedure:
1. Choose α-level (error-level)
2. Take the two samples (xi& yi)
3. Compute F as: F = (sx²/σx²)/(sy²/σy²); under H0, σx²/σy² = 1, since the variances are hypothesized to be equal.
If H0 is true, then F = sx2/sy2 Fα(m-1, n-1)
4. Reject H0, if F>Fα(m-1, n-1)
Example, test the variances of x & y given above: sx2= 0.667,and sy2 =4.667
F = sy2/sx2 =4.667/0.667 = 7.0
F-tabulated at 0.05 error: F0.05(3, 3) = 9.28
Decision: Accept H0. Since the F-distribution is skewed to the right, the test is carried out as a
one-tailed test (with the larger variance in the numerator), not a two-tailed test.
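In Python (a sketch assuming SciPy):

    from scipy import stats

    sx2, sy2 = 0.667, 4.667
    F = sy2 / sx2                              # larger variance in the numerator
    F_crit = stats.f.ppf(0.95, dfn=3, dfd=3)   # F_0.05(3, 3) = 9.28
    print(round(F, 2), round(F_crit, 2),
          "reject H0" if F > F_crit else "accept H0")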
Suppose we have k treatments (groups), each observed n times, and let xij be the jth
observation on the ith treatment. The group means are:

x̄1. = Σⱼ x1j/n
x̄2. = Σⱼ x2j/n
x̄3. = Σⱼ x3j/n
.
.
x̄k. = Σⱼ xkj/n

and the group variances are:

s1² = Σⱼ (x1j − x̄1.)²/(n − 1)
s2² = Σⱼ (x2j − x̄2.)²/(n − 1)
s3² = Σⱼ (x3j − x̄3.)²/(n − 1)
.
.
sk² = Σⱼ (xkj − x̄k.)²/(n − 1)

The pooled variance is

σ̂² = sp² = [(n − 1)s1² + (n − 1)s2² + (n − 1)s3² + … + (n − 1)sk²]/[k(n − 1)]
         = Σᵢ Σⱼ (xij − x̄i.)²/[k(n − 1)];

the error variance is the pooled variance.

To test the hypothesis H0: μ1 = μ2 = μ3 = … = μk, we note that under the null hypothesis
x̄1., x̄2., x̄3., …, x̄k. is a random sample from N(μ, σ²/n), so we can estimate μ as

μ̂ = x̄.. = Σᵢ Σⱼ xij/N, where N = kn.
5. Analysis of Variance
4. The numerator and denominator of the F-ratio are independent.
Fat                 1        2      3      4
(last data row)     99       77     76     68
Total yi.           436      510    456    372
Mean ȳi.            72.67    85     76     62
What would be the statement of null hypothesis?
H0: There is no difference among the absorption of fats by doughnuts.
The F-test is usually used for the analysis of variance. If the F-test shows that there is a
difference among treatment means, we would like to know which of the means differ,
and the process is known as multiple comparison or mean separation (separation of
means).
If the overall F-test declares a significant difference between treatments (groups), the
experimenter faces the question: which of the means under study are different? The F-test
only indicates to the experimenter that something has, or has not, happened between
treatments. If the null hypothesis were true, the observed differences between treatments
would have had only a small probability of arising as a matter of chance.
Examples of pairwise contrasts among four means: μ1 − μ3, μ2 − μ4, μ1 − μ4, μ3 − μ4.

If Ψ (psi) is a contrast for the population, its sample estimate is Ψ̂. For instance,
Ψ̂1 = ȳ1 − ȳ2, Ψ̂2 = ȳ1 − ȳ3, Ψ̂3 = ȳ1 + ȳ2 + ȳ3 − 3ȳ4, etc.
Each method of means comparison has advantages and disadvantages, but the 1st and the 4th
methods are commonly used. Indeed, the 5th method is very appropriate if the sample size (n)
of the treatments is not the same.
tα(k(n−1)) is the t-value at the error degrees of freedom, MSE is the error mean square, and
n is the per-treatment sample size. If a calculated difference between two means exceeds d,
reject the null hypothesis; if not, accept it. For instance, for the doughnut study above,
put the means in descending and ascending orders along the two margins of a table and
calculate the differences between the means.
             ȳ2=85    ȳ3=76    ȳ1=72.67    ȳ4=62
ȳ4=62        23       14       10.67       –
ȳ1=72.67     12.33    3.33     –           –
ȳ3=76        9        –        –           –
ȳ2=85        –        –        –           –

d = tα(k(n−1)) · √(2MSE/n) = t0.05(20) · √(2(110.4)/6) = 1.7 × 6.07 ≈ 10.3
Then the difference between two means is significant if it exceeds 10.3. That means there is
no difference between ȳ2 & ȳ3 or between ȳ3 & ȳ1, whereas the differences between ȳ2 & ȳ4,
ȳ2 & ȳ1, ȳ3 & ȳ4 and ȳ1 & ȳ4 are significant. The letter display is:

ȳ2 = 85 a
ȳ3 = 76 ab
ȳ1 = 72.67 b
ȳ4 = 62 c

In some cases, however, the LSD can lead to anomalous (unusual) results: the F-test
may be significant while no difference between means is significant.
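The whole LSD comparison can be scripted (a sketch assuming SciPy; note the text uses tα, the one-tailed table value, rounded to 1.7):

    from itertools import combinations
    from math import sqrt
    from scipy import stats

    means = {"y2": 85, "y3": 76, "y1": 72.67, "y4": 62}
    MSE, n, err_df = 110.4, 6, 20                       # from the doughnut ANOVA
    d = stats.t.ppf(0.95, err_df) * sqrt(2 * MSE / n)   # about 1.72 x 6.07 = 10.5
    for a, b in combinations(means, 2):
        diff = abs(means[a] - means[b])
        print(a, b, round(diff, 2), "sig" if diff > d else "ns")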
freedom and r, the distance in ranks between the two means, at the 0.01 or 0.05 level of error.
Procedure
1. Rank the means
2. Obtain rp & v
3. Compute wr = rp√(MSE/n)
4. Compare the differences with wr

Example: take the results of the doughnut study. First rank the means from the
highest to the smallest: ȳ2, ȳ3, ȳ1, ȳ4. Then compute as follows:
w2 = 2.95 √(110.4/6) = 12.65
w3 = 3.10 √(110.4/6) = 13.30
w4 = 3.18 √(110.4/6) = 13.64
Only the differences between ȳ2 & ȳ4 and between ȳ3 & ȳ4 are significant. When we compare
the results of the LSD and DNMRT methods, the LSD appears more sensitive than the DNMRT.

        LSD     DNMRT
ȳ1      b       ab
ȳ2      a       a
ȳ3      ab      a
ȳ4      c       b
3. Scheffé's S-Method
This is known as a simultaneous confidence bound.

S = √[(k − 1) Fα(v1, v2)] · √[MSE Σ(ci²/n)], where the ci are the contrast coefficients.

Example:
H0: μ1 + μ2 − μ3 − μ4 = 0
Ψ̂ = ȳ1 + ȳ2 − ȳ3 − ȳ4 = 72.67 + 85 − 76 − 62 = 19.67

S = √[3(4.90)] · √{110.4 [(1)²/6 + (1)²/6 + (−1)²/6 + (−1)²/6]}

S = 30.45, and the decision is to accept the hypothesis, since the
estimate |Ψ̂| is less than S.
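A small Python rendering of the S computation (a sketch; 4.90 is the F value quoted in the text — with these inputs the exact arithmetic gives about 32.9 against the text's 30.45, but the decision is the same):

    from math import sqrt

    means = {"y1": 72.67, "y2": 85, "y3": 76, "y4": 62}
    c = {"y1": 1, "y2": 1, "y3": -1, "y4": -1}     # contrast mu1 + mu2 - mu3 - mu4
    MSE, n, k, F = 110.4, 6, 4, 4.90
    psi_hat = sum(c[t] * means[t] for t in means)  # = 19.67
    S = sqrt((k - 1) * F) * sqrt(MSE * sum(ci ** 2 / n for ci in c.values()))
    print(psi_hat, round(S, 2), "accept H0" if abs(psi_hat) < S else "reject H0")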
Example: given σ² = 50, α = 0.05 and L = 2, n is calculated as
n = (4Ƶα/2² σ²)/L² = [4(1.96)² × 50]/2² ≈ 192, i.e., about 200 observations. To use this formula
for determining the sample size (n), we need to know (an estimate of) the variance.

When we sum all observations and divide by the number of observations, we get μ. But any
individual observation may be smaller or larger than μ. This deviation (difference), regardless
of sign (increase or decrease), is called experimental error.
Experimental error is used to determine the accuracy of an estimate or accuracy of the
experiment. Suppose we have two populations with mean μ1&μ2, and we want to test the
following hypothesis: H0: μ1 = μ2. There are two possible cases in relation to error.
1. In the first case (case I), the experimental error is large. In this case, the difference
between the two means must also be large in order to be detected. A large experimental
error harms our estimation.
2. In the second case (case II), the experimental error is small. In this case, small differences
can be detected.
t = [(x̄1 − x̄2) − (μ1 − μ2)]/[sp√(1/m + 1/n)]
When experimental error is reduced, the accuracy of an experiment increases.
Relative efficiency – is a measure used to compare two designs in terms of which has the
minimum experimental error. It depends upon the error variances of the two designs. Suppose
we wish to compare design one (I) against design two (II). If we have large samples,
the relative efficiency can be computed as follows: R.E. = (s2²/s1²) × 100.
If we have small samples, on the other hand, the relative efficiency can be calculated as
follows: R.E. = [(n1+1)(n2+3)s2²]/[(n1+3)(n2+1)s1²] × 100, where s² is the error
variance. The latter takes into account the degrees of freedom in comparing the
efficiency of the designs.
A design is said to be more efficient than another if the R.E. is greater than 100%. A
design which reduces the error variance but also sacrifices degrees of freedom may
not be the best; the better design is one which reduces the error variance while keeping the
degrees of freedom up, e.g. by a larger sample size. Example: let the sample variance obtained
for design I be 127 and that of design II be 152, with respective sample sizes 6 and
14. Symbolically, s1² = 127; s2² = 152
n1 = 6 n2 = 14
(i) The relative efficiency of design I over design II without considering the df is:
R.E. = (152/127) x 100 = 119.7%
(ii) If we take into account the sample sizes, then
R.E. = [(n1+1) (n2+3) s22]/[(n1+3) (n2+1) s12] x 100
R.E. = [(6+1) (14+3) 152]/[(6+3) (14+1) 127] x 100 = 105.5%
Since the R.E. value is greater than 100, we say that design I is more efficient than design
II.
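Both R.E. formulas as one Python helper (a sketch; n1 and n2 are the error degrees of freedom):

    def relative_efficiency(s1_sq, s2_sq, n1=None, n2=None):
        """R.E. (%) of design I relative to design II."""
        if n1 is None or n2 is None:               # large-sample form
            return 100 * s2_sq / s1_sq
        return 100 * ((n1 + 1) * (n2 + 3) * s2_sq) / ((n1 + 3) * (n2 + 1) * s1_sq)

    print(relative_efficiency(127, 152))           # 119.7 %
    print(relative_efficiency(127, 152, 6, 14))    # 105.5 %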
Both randomization and replication are necessary to obtain a valid estimate of
experimental error. Increasing number of observations is important to reduce errors
caused by man, but replication is important to increase accuracy and precision.
d) Treatment – the experimental material or condition in which an experimenter is interested
and whose effect he wishes to measure. Examples: (i) effect of fertilizers on root
development, (ii) effect of quantity of food consumed on weight gain, (iii) effect of
quantity of enzyme on chemical reaction, etc.
e) Factors and levels – kinds of treatments having main constituents and sub-constituents. For
instance, nitrogen and phosphorous fertilizers on the one hand, and their different rates
on the other hand.
Nitrogen: 0, 25, 50, 60, 70
P2O5:     0, 50, 100, 150, 200
The items nitrogen and phosphorous are the factors, while their rates are the levels.
f) Treatment combinations – The combination of factors and levels is known as a
treatment combination. Assume that we have t factors at r levels each; then we
will have r^t treatment combinations. But if the levels differ, the number of treatment
combinations is r1 × r2 × r3 × … × rt. In the example above, nitrogen
and phosphorous each at 5 levels give 5² = 25 treatment combinations.
g) Blocking – This is a very useful term in statistics or experimental design. It is a
technique in which experimental units are classified into homogenous group. This is
done to reduce variability arising from differences due to experimental units.
Assume that we have t-treatments and we wish to observe each treatment r-times
(replications), then we have rt experimental units to work with. Suppose r = 4 and t =
4, then rt = 4x4 = 16. Suppose also these experimental units are hypothetically
arranged at random over the field; with the application of
blocking, they will be arranged into four homogeneous groups:

(Diagram: Groups I–IV, each consisting of four homogeneous experimental units.)
This technique is similar to blocking because homogenous groups are assigned to one
block. Note that the units within each block are homogenous, and this is the idea of
blocking. If land is an experimental unit, its condition may vary along the slope or
across East-West, South-North or both. In other words, it may vary one-way or two-
ways.
Each treatment has to be allocated in each block, which is considered to be homogeneous.
The treatments found in the same homogeneous block are therefore comparable. But any
two treatments found only in two different blocks are not comparable, since their
difference may not be due entirely to treatment effect; it may partly be due to the
differences between the experimental units. In each block, each treatment has to be assigned
randomly.
Suppose that we have t-treatments and wish to observe each treatment r-times
(replications). Then, we will have 3 different cases to placing treatments on an
experimental unit where the treatments are comparable.
1. Case I – Suppose that the experimental units are homogenous. In other words, no
restriction on experimental unit. Indeed, if there is no variation, no need to make
blocking and the source of variation will be due to treatments and experimental error.
Under this condition, the experimental design is known as Completely Randomized
Design (CRD) and the AOV (ANOVA) is known as One-Way ANOVA, while the
main source of variation is only one that is the treatment.
2. Case II – Suppose not all experimental units are homogenous, and there is one
restriction on the experimental unit. Within blocks/groups, the experimental units are
homogenous, while between blocks/groups, they are heterogeneous. Each treatment is
randomly assigned to one experimental unit within a group, and that group is
known as a block. Under this case, each treatment occurs within each block and
therefore the blocks are said to be complete. Each treatment occurs once in each
block. If we need to observe each treatment more than once in each block, say k
times, then we need k homogenous experimental units to form a complete block.
For instance, when land as experimental unit varies only one-way, either along the
slope, East-West or South-North, it is necessary to apply one-way blocking and each
block will be considered as a replication for a set of basic treatments. Under this
condition, the design is known as Randomized Complete Block Design (RCBD) and the
ANOVA is known as Two-Way ANOVA, while one-way variation of experimental
unit (block) and the treatments are the two main sources of variation.
3. Case III - In case of two-way variation (restriction) on the experimental unit, two-
way blocking is applied and the design is known as Latin Square Design (LSD) and
the ANOVA is known as Three-Way ANOVA where row and column blocks and the
treatments are the three sources of variation.
Two-way blocking necessitates, however, equal number of row and column blocks as
the number of treatments. If the number of treatments is 4, then the number of row
and column blocks has to be 4.
Blocks (columns), each containing a pair of treatments:

b1: T1, T2   b2: T1, T3   b3: T1, T4   b4: T2, T3   b5: T2, T4   b6: T3, T4

In this case, all the treatments occur an equal number of times.
Note that we can compare treatments which are found on the same block, not on different
blocks. The function of local control is to make the experimental design more efficient.
T1 T1 T2 T3 T4 T2
T2 T2 T3 T2 T1 T4
T3 T4 T4 T1 T3 T1
T5 T6 T7 T8 T9 T10
1. General Model
First it is necessary to transcribe the parameters.
yijk = μij + Ɛijk, where i = 1,2,3,…,b
j = 1,2,3,…,t
k = 1,2,3,…,n
In other words, yijk is the kth observation of the jth treatment of the ith block. By
transforming the above equation, we will have the following:
yijk = μ + βi + ζj + Ɛijk, where yijk = a typical observation
μ = grand mean
βi = ith block effect
ζj = jth treatment effect
Ɛijk = random error of the kth observation on the jth treatment in the ith block

If we leave the model as yijk = μij + Ɛijk, it is not possible to separate the effects of
treatments and blocks. The next step will be the estimation of parameters.
2. Estimation of parameters
The second step, after transcribing the parameters, is estimating them. To
estimate our parameters, we must impose the constraints Σβi = 0 and Σζj = 0; that means
the effects sum to zero, because some means are < μ and some others are > μ.

μ = Σᵢ Σⱼ μij/(bt),   μi. = Σⱼ μij/t,   μ.j = Σᵢ μij/b,   βi = μi. − μ,   ζj = μ.j − μ

Estimates

μ̂ = Σᵢ Σⱼ Σₖ yijk/(btn) = ȳ…
μ̂i. = Σⱼ Σₖ yijk/(nt) = ȳi..
μ̂.j = Σᵢ Σₖ yijk/(nb) = ȳ.j.
β̂i = ȳi.. − ȳ…, where i = 1, 2, 3, …, b
ζ̂j = ȳ.j. − ȳ…, where j = 1, 2, 3, …, t
Additive Model
The model discussed here above (yijk = μ + βi + ζ j +Ɛijk) is known as additive model.
A model is said to be additive if the difference between two treatments is the same for
all blocks. In other words, in additive model, the difference between any two
treatments doesn’t vary from block to block. For example, if we have 4 treatments,
the treatment differences would be: ζ1-ζ2, ζ1-ζ3, ζ1-ζ4, ζ2-ζ3, ζ2-ζ4, ζ3-ζ4, and each of
these differences are the same in all blocks.
(Figure: treatment effects ζ1, ζ2, ζ3, ζ4 plotted against block effects β1–β5; under the
additive model the four lines are parallel.)
The additive model never includes interaction effects. It is applicable where the difference
between any two treatments is the same in every block, i.e., ζ1 − ζ3 = c (a constant) in all blocks.
In the non-additive case the interaction is estimated as

δ̂ij = ȳij. − ȳi.. − ȳ.j. + ȳ…

To obtain the estimates of δij, we need to impose the constraint Σᵢ Σⱼ δij = 0.
Test hypotheses
1. Hypothesis on treatments
H0: There is no difference among treatments.
H0: ζ1 = ζ2 = ζ3 = … = ζt, or
H0: μ.1 = μ.2 = μ.3 = … = μ.t

If yijk ~ N(μij, σ²), then ȳ.j. ~ N(μ.j, σ²/bn). If the null hypothesis is true,
ȳ.1., ȳ.2., ȳ.3., …, ȳ.t. is a random sample from a normal distribution with mean μ and
variance σ²/bn. The sample variance of the ȳ.j. is [Σⱼ(ȳ.j. − ȳ…)²]/(t − 1).
It estimates σ²/bn, which reflects the variability of treatments.
Hence the estimate of σ² from treatments is Q1 = bn[Σⱼ(ȳ.j. − ȳ…)²]/(t − 1).

To test the above hypothesis, compute F = Q1/σ̂² ~ Fα[t − 1, bt(n − 1)], where

σ̂² = [Σᵢ Σⱼ Σₖ (yijk − ȳij.)²]/[bt(n − 1)].

2. Hypothesis on blocks
H0: There is no difference among blocks.
H0: β1 = β2 = β3 = … = βb, or
H0: μ1. = μ2. = μ3. = … = μb. If H0 is true, ȳ1.., ȳ2.., ȳ3.., …, ȳb.. is a random sample
distributed normal with mean μ and variance σ²/tn. Then ȳi.. ~ N(μ, σ²/tn).
The sample variance of the ȳi.. is [Σᵢ(ȳi.. − ȳ…)²]/(b − 1). This estimates σ²/tn, and the estimate
of σ² from blocks is Q2 = tn[Σᵢ(ȳi.. − ȳ…)²]/(b − 1).

3. Hypothesis on interactions
H0: There is no interaction between blocks and treatments.
H0: δ11 = δ12 = δ13 = … = δbt = 0.

The sample variance of the estimates ȳij. − ȳi.. − ȳ.j. + ȳ… is

[Σᵢ Σⱼ (ȳij. − ȳi.. − ȳ.j. + ȳ…)²]/[(b − 1)(t − 1)].

This estimates σ²/n, so the estimate of σ² from interaction is
Q3 = n[Σᵢ Σⱼ (ȳij. − ȳi.. − ȳ.j. + ȳ…)²]/[(b − 1)(t − 1)]. Then F is
computed as F = Q3/σ̂² ~ Fα[(b − 1)(t − 1), bt(n − 1)].
Note that whether the model is additive or non-additive is determined after conducting the
experiment, by examining whether the observations are affected by the interaction of
blocks and treatments.
Under the additive model SStotal = SSblock + SStreatment + SSerror, whereas under the non-additive
model SStotal = SSblock + SStreatment + SSinteraction + SSerror.
Part II: Design and Analysis of Experiments
Definition – By statistical design and analysis of experiments, we mean the process of planning
an experiment so that appropriate data are collected which can be analyzed to yield valid
and objective conclusions.
A) Types of experimental design from randomization point of view: There are two major
types of experimental design:
1. Systematic experimental designs
2. Randomized experimental designs
Diagonal Square

A B C     A B C
C A B     B C A
B C A     C A B
B) Types of experimental design from completeness point of view: There are two major
types of experimental design:
1. Complete designs
2. Incomplete designs
The chief advantage of the randomized designs is that they are amenable to statistical
analysis.
Definition – The simplest experimental design where the treatments are assigned at
random to homogenous experimental units is known as the completely randomized
design (CRD). It is selected when the overall variation of experimental units is
relatively small or insignificant.
1. Appropriate only where the number of treatments is small and where the
experimental units (material) are homogeneous, such as in laboratories and
greenhouses.
2. Not appropriate for field experiment, because it is nearly impossible to get
homogenous experimental units in the field.
3. It does not provide a method for estimating the interaction of treatments with blocks.
Note that the term layout refers to the placement of experimental treatments on the
experimental site whether it be over space, time, or type of material.
AOV skeleton for the CRD:

Source      df         SS
Treatment   k − 1      Σᵢ(yi.)²/n − (Σᵢ Σⱼ yij)²/(kn)
Error       k(n − 1)   Σᵢ Σⱼ yij² − Σᵢ(yi.)²/n
Total       kn − 1     Σᵢ Σⱼ yij² − (Σᵢ Σⱼ yij)²/(kn)
Example, assume that we have four treatments and 32 homogenous experimental units.
The data from this experiment are as follows:
Treatments
t1 t2 t3 t4
3 4 7 7
6 5 8 8
3 4 7 9
3 3 6 8
1 2 5 10
2 3 6 10
2 4 5 9
2 3 6 11
∑yij = 22 28 50 72
k n k n
1. Total sum of squares (TSS) = Σᵢ Σⱼ yij² − (Σᵢ Σⱼ yij)²/(kn)
= (3² + 6² + 3² + … + 11²) − (3 + 6 + 3 + … + 11)²/(4 × 8)
= 1160.0 − 924.5
= 235.5

2. Treatment sum of squares (TreatSS) = Σᵢ(yi.)²/n − c.f.
= (22² + 28² + 50² + 72²)/8 − 924.5
= 1119 − 924.5
= 194.5

3. Error sum of squares = TSS − TreatSS = 235.5 − 194.5 = 41.0
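These sums of squares are easy to verify in Python (a sketch assuming NumPy, with the data keyed in from the table above):

    import numpy as np

    data = np.array([[3, 6, 3, 3, 1, 2, 2, 2],        # t1
                     [4, 5, 4, 3, 2, 3, 4, 3],        # t2
                     [7, 8, 7, 6, 5, 6, 5, 6],        # t3
                     [7, 8, 9, 8, 10, 10, 9, 11]])    # t4
    k, n = data.shape
    cf = data.sum() ** 2 / (k * n)                    # 924.5
    tss = (data ** 2).sum() - cf                      # 235.5
    treat_ss = (data.sum(axis=1) ** 2 / n).sum() - cf # 194.5
    print(tss, treat_ss, tss - treat_ss)              # error SS = 41.0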
Assignment: please carry out multiple comparisons for the above experiment using LSD
and DMRT!
In this case, the sum of squares due to error is partitioned into block and error sums of
squares.
tn = number of observations per block
btn = total number of observations (b blocks × t treatments × n observations per
treatment per block)
Illustration of the layout in the following table, where the total number of entries = btn:
Treatments
t1 t2 t3 .. tt
b1 t11 t21 t31 … tt1
t12 t22 t32 … tt2
t13 t23 t33 … tt3
. . . … .
. . . … .
. . . … .
t1n t2n t3n … ttn
b2 t11 t21 t31 … tt1
t12 t22 t32 … tt2
t13 t23 t33 … tt3
. . . … .
. . . … .
. . . … .
t1n t2n t3n … ttn
Blocks b3 t11 t21 t31 … tt1
t12 t22 t32 … tt2
t13 t23 t33 … tt3
. . . … .
. . . … .
. . . … .
t1n t2n t3n … ttn
.. … … … … …
.. … … … … …
.. … … … … …
bb t11 t21 t31 … tt1
t12 t22 t32 … tt2
t13 t23 t33 … tt3
. . . … .
. . . … .
. . . … .
t1n t2n t3n … ttn
Statistical model for RCBD
There are two different statistical models for RCBD depending upon
absence or presence of interaction effects between treatments and blocks.
Case 1:In additive model
yijk = μ + βi + ζ j +Ɛijk
Case 2:In non-additive model
yijk = μ + βi + ζ j +δij+Ɛijk
Estimates
         t1     t2     t3     ȳi.. (= ȳi.)
b1       2      3.5    3      3
b2       4      3      5      4
b3       3      4      2      3
b4       3      5      4      4
ȳ.j. (= ȳ.j)   3      4      3.5    ȳ… = 3.5
Computation of treatment and block effects
a) Block effects: βi = μi. − μ..; β̂i = ȳi. − ȳ..
β̂1 = ȳ1. − ȳ.. = 3 − 3.5 = −0.5
β̂2 = ȳ2. − ȳ.. = 4 − 3.5 = 0.5
β̂3 = ȳ3. − ȳ.. = 3 − 3.5 = −0.5
β̂4 = ȳ4. − ȳ.. = 4 − 3.5 = 0.5

Σβ̂i = 0: the estimated block effects sum to zero, since some block means lie below ȳ.. and
others above it.

b) Treatment effects: ζ̂j = ȳ.j − ȳ..
ζ̂1 = ȳ.1 − ȳ.. = 3 − 3.5 = −0.5
ζ̂2 = ȳ.2 − ȳ.. = 4 − 3.5 = 0.5
ζ̂3 = ȳ.3 − ȳ.. = 3.5 − 3.5 = 0

Σζ̂j = 0: the estimated treatment effects likewise sum to zero.

c) Interaction estimates: δ̂ij = yij − ȳi. − ȳ.j + ȳ..
δ̂11 = 2 − 3 − 3 + 3.5 = −0.5
δ̂12 = 3.5 − 3 − 4 + 3.5 = 0.0
δ̂13 = 3 − 3 − 3.5 + 3.5 = 0.0
δ̂21 = 4 − 4 − 3 + 3.5 = 0.5
δ̂22 = 3 − 4 − 4 + 3.5 = −1.5
δ̂23 = 5 − 4 − 3.5 + 3.5 = 1.0
δ̂31 = 3 − 3 − 3 + 3.5 = 0.5
δ̂32 = 4 − 3 − 4 + 3.5 = 0.5
δ̂33 = 2 − 3 − 3.5 + 3.5 = −1.0
δ̂41 = 3 − 4 − 3 + 3.5 = −0.5
δ̂42 = 5 − 4 − 4 + 3.5 = 0.5
δ̂43 = 4 − 4 − 3.5 + 3.5 = 0.0
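The same effects in Python (a sketch assuming NumPy; note the text rounds ȳ1. and ȳ.2 to whole numbers, so the exact values printed here differ slightly from the hand computation):

    import numpy as np

    y = np.array([[2, 3.5, 3],     # cell values y_ij, blocks b1..b4 by treatments t1..t3
                  [4, 3,   5],
                  [3, 4,   2],
                  [3, 5,   4]])
    grand = y.mean()
    beta = y.mean(axis=1) - grand                  # block effects, sum to 0
    tau = y.mean(axis=0) - grand                   # treatment effects, sum to 0
    delta = y - y.mean(axis=1, keepdims=True) - y.mean(axis=0) + grand
    print(grand, beta, tau)
    print(delta, round(beta.sum(), 10), round(tau.sum(), 10))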
Test of hypothesis
a) On treatments: H0: ζ1 = ζ2 = ζ3, or H0: μ.1 = μ.2 = μ.3
The sample estimate ȳ.j. ~ N(μ, σ²/bn) if the null hypothesis is true (μ.j = μ).

Q1 = bn Σ(ȳ.j. − ȳ…)²/(t − 1) = (4 × 4)[(3 − 3.5)² + (4 − 3.5)² + (3.5 − 3.5)²]/2 = 4

σ̂² = [Σᵢ Σⱼ Σₖ (yijk − ȳij.)²]/[bt(n − 1)] = 7.44

F = Q1/σ̂² = 4/7.44 = 0.537
F0.05(2, 36) = 3.2
Conclusion: Accept H0.
Analysis of Variance

SV           df            SS                                  MS
Block        b − 1         tnΣ(ȳi.. − ȳ…)²                     tnΣ(ȳi.. − ȳ…)²/(b − 1)
Treatment    t − 1         bnΣ(ȳ.j. − ȳ…)²                     bnΣ(ȳ.j. − ȳ…)²/(t − 1)
Interaction  (b−1)(t−1)    nΣΣ(ȳij. − ȳi.. − ȳ.j. + ȳ…)²       nΣΣ(ȳij. − ȳi.. − ȳ.j. + ȳ…)²/[(b−1)(t−1)]
Error        bt(n − 1)     ΣΣΣ(yijk − ȳij.)²                   σ̂²
Total        btn − 1
AOV (ANOVA) table of the above results

SV           df    SS       MS     F
Block        3     12       4      0.537
Treatment    2     8        4      0.537
Interaction  6     24       4      0.537
Error        36    267.84   7.44   –
Total        47
3. Model III (random model)
Both blocks and treatments are found in random model
AOV Table
SV df Expected MS
Treatments t-1 σ2+ nσBT2+bnσT2
Error bt(n-1) σ2
Total btn-1
Advantages of RCBD
1. More accurate than CRD for most types of experiments
2. Analysis is straight forward
3. No restriction on number of treatments or replications (high flexibility)
4. Possible for estimating missing observation
Disadvantages of RCBD
The chief disadvantage of the RCBD is that it is not appropriate for experiments which have
a very large number of treatments, or where the blocks have considerable internal variability.
Its flexibility and ease of application have made it the most popular design in use, the
Latin square design being its closest rival.
Comparison of RCBD with CRD
Relative efficiency – R.E of RCBD as compare to CRD is:
R.E = [(n1+1) (n2+3)s22]/[(n1+3) (n2+1)s12] x 100%
Where n1 = df of RCBD
s12 = MSE of RCBD
n2 = df of CRD
s22 = MSE of CRD
A missing observation is estimated as x = [rB + tT − G]/[(r − 1)(t − 1)], where r is the number
of blocks (replications), t the number of treatments, B the total of the block containing the
missing value, T the total of the treatment containing it, and G the grand total of the
observed values.
Example, compare the previous RCB’s AOV results with that of CRD’s AOV.
AOV Table (RCBD)                         AOV Table (CRD)

SV          df   SS    MS                SV          df   SS    MS
Blocks      3    60    20                Treatments  2    50    25
Treatments  2    50    25                Error       9    180   20
Error       6    120   20                Total       11   230
Total       11   230
In a condition where the experimental units have two-way restrictions, the experimental
unit is divided into rows and columns so that each treatment occurs once in each row and
once in each column. This type of design is known as the Latin square design. By eliminating
row and column effects from the error variance, the mean square due to error is reduced
below that of the RCBD. The Latin square design is used in biology, agriculture, industry,
economics and many other fields.
Advantages of LSD
1. Experimental units with two way-restrictions, the LSD controls variability better than
the RCBD. In other words, the error mean square would be much smaller than that of
RCBD.
2. Analysis is simple as compared to that of other designs, although it is more
complicated than that of CRD & RCBD.
3. Analysis remains simple with missing data.
Disadvantages of LSD
1. The number of treatments is restricted by the number of rows or columns. As a rule of
thumb, the LSD is generally not used when the treatments exceed 10.
2. For fewer than 5 treatments, the degrees of freedom for error are disproportionately
small, so the error is poorly estimated and there is a greater tendency to
accept the null hypothesis. It is then necessary to use a repeated Latin square or another
appropriate design.
3. If the assumptions listed below (no interactions) do not hold, the analysis is not valid.
Example: consider a 4×4 L.S. How do you independently randomize rows and columns? For
randomization, follow the following steps:
1. Draw 3 sets of 4 random digits (1, 2, 3, 4) from table of random digits.
Assume the 3 sets of random digits are: 2, 1, 3, 4
3, 1, 2, 4
4, 3, 1, 2
2. Select the square according to the first number of the first set. Since the first number
is two (2), we pick the 2nd set of 4x4 L.S.
That is, the b set given above. A B C D
B C D A
C D A B
D A B C
3. Arrange rows of step 2 according to the 2nd set of numbers (3, 1, 2, 4)
C D A B
A B C D
B C D A
D A B C
4. Randomize columns of step 3 according to the 3rd set (4, 3, 1, 2), and work with this
4. Randomize the columns of step 3 according to the 3rd set (4, 3, 1, 2), and work with this L.S.:

B A C D
D C A B
A D B C
C B D A
Construction of L.S with a one step cyclic permutation: the simplest and commonest
way of constructing L.S is using a one step cyclic permutation of the letters. Example,
consider a chemical experiment in which there are 6 chemicals under investigation, with
6 methods of mixing and 6 technicians to do the job. The layout of this experiment on the
basis of a one step cyclic permutation is as the following:
A B C D E F
F A B C D E
E F A B C D
D E F A B C
C D E F A B
B C D E F A
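The one-step cyclic construction is a couple of lines of Python (an illustrative sketch):

    def cyclic_latin_square(letters):
        """Each row is the previous row shifted one step to the right."""
        t = len(letters)
        return [[letters[(j - i) % t] for j in range(t)] for i in range(t)]

    for row in cyclic_latin_square("ABCDEF"):
        print(" ".join(row))       # reproduces the 6 x 6 square above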
Assumptions: before we analyze a Latin square experiment, we need to make the
following assumptions:
1. No row by column interaction,
2. No column by treatment interaction,
3. No row by treatment interaction,
4. No row by column by treatment interaction.
If we cannot make these assumptions, we need a large number of observations. For
example, for an experiment in a 6×6 Latin square we would need 6³ = 216
observations.
1 2 3
I A B C
II C A B
III B C A
μi.. = Σ μijk/t, where the sum is over the t cells (i, j, k) ∈ Q of the square.
Sample estimates
Estimate of a treatment mean = ȳ..k
Estimate of a row mean = ȳi..
Estimate of a column mean = ȳ.j.

For the above example, the mean of treatment A is:
ȳ..A = ⅓(y11A + y22A + y33A)

ρ̂i = ȳi.. − ȳ…,   δ̂j = ȳ.j. − ȳ…,   ζ̂k = ȳ..k − ȳ…
Test: 1) H0: ζ1 = ζ2 = ζ3 = … = ζt (treatments)
      2) H0: ρ1 = ρ2 = ρ3 = … = ρt (rows)
      3) H0: δ1 = δ2 = δ3 = … = δt (columns)

Then compute:
Q1 = tΣ(ȳ..k − ȳ…)²/(t − 1)
Q2 = tΣ(ȳi.. − ȳ…)²/(t − 1)
Q3 = tΣ(ȳ.j. − ȳ…)²/(t − 1)

To test each hypothesis, compute F = Q/σ̂² ~ Fα[(t − 1), (t − 1)(t − 2)].
Example of Latin square experiment on feeding trial on sheep having different age and
breed.
Row blocks – age differences
Column blocks – breed differences
Treatments:
A. Grazing only (control)
B. Grazing and maize supplement
C. Grazing, maize and protein supplement (P1)
D. Grazing, maize and protein supplement (P2)
E. Grazing, maize and protein supplement (P3)
Compute relative efficiency
1. If we consider rows as blocks, then:
SB²(rows) = [4(40.66) + 16(17.96)]/20 = 22.50
R.E. = (13 × 19 × 22.50)/(15 × 17 × 17.96) × 100% = 121%
Conclusion: row blocking is efficient.
Relative efficiency of LS as compare to RCBD consisting of row blocks.
R.E = [(t-1)(t-2)+1][(t-1)(t-1)+3]S2RCBD/[(t-1)(t-2)+3][(t-1)(t-1)+1]S2LSD
2. Columns as blocks
SB2(columns) = 4(207.76)+12(17.96)/16 = 65.41
R.E = (13x19x65.41)/(15x17x17.96) x100% = 357%
3. R.E. of LS as compared with CRD
R.E. = [(13 × 23)/(21 × 15)] × (60.45/17.96) × 100% = 319.5%
Conclusion: the LSD is more efficient than the CRD.
Case II: Rows are the same, but columns are different at different locations.
Model: yijkl = μ+ρi+δjl+ζk+Ϩl+εijkl, where δjl = jth column at l location.
AOV Table
Source of variation          df
Rows                         t − 1
Treatments                   t − 1
Locations                    s − 1
Columns within location      s(t − 1)
Error                        (st² − 1) − [2(t − 1) + (s − 1) + s(t − 1)]
Total                        st² − 1
Case III: Both rows and columns are different at different locations.
Model: yijkl = μ+ρil+δjl+ζk+Ϩl+εijkl, where ρil = ith row at l location
δjl = jth column at l location
AOV Table
Source of variation    df
Rows/location          s(t − 1)
Columns/location       s(t − 1)
Treatments             t − 1
Locations              s − 1
Error                  (st² − 1) − [2s(t − 1) + (t − 1) + (s − 1)]
Total                  st² − 1
AOV Table as cross-over design
Source of var. df
Columns 5 (r-1)
Rows 1 (t-1)
Treatments 1 (t-1)
Error 4 (t-1)(r-2)
Total tr-1
If the experiment is conducted as a 2x2 repeated LS, the layout would be as follow:
SQ1 SQ2 SQ3
Columns 1 2 1 2 1 2
1 A B A B B A
Rows 2 B A B A A B
The repeated LS is like case III where rows and columns are different at different
locations. Thus, the AOV Table of this LSD is that of case III.
AOV Table as LSD
Source of variation df
Squares 2 = (s-1)
Periods/square 3 = s(t-1)
Columns/square 3 = s(t-1)
Treatments 1 = t-1
Error 2 = (st² − 1) − [2s(t − 1) + (t − 1) + (s − 1)]
Total 11 = st2-1
Note: The change-over design gives more degrees of freedom than the repeated LS, so
using the change-over design is better than using the repeated LSD.
The cross-over design may be used for any number of treatments with restriction
that the number of replications be a multiple of number of treatments. But, it is not
advisable to use repeated LSD instead of the cross-over design if the number of
treatments exceeds 4.
Many characteristics under study change with time. For instance, milk yield varies
with the stage of lactation. If all the animals were given the same sequence of
treatments, say A, B, C, over three periods in that order, treatment C would be
underestimated, because it would be given to the cows at the declining stage of their
lactation. To overcome this problem, it is necessary to apply the treatments in
different sequences.
Sequences
I II III
P1 A B C
Periods P2 B C A
P3 C A B
Carry-over effect
In the use of a switch-over design there is another problem, which we call the carry-
over effect. What we call the treatment effect in a switch-over trial for a given period
may not be the effect due solely to the treatment in that period; it may also be
due to a carry-over of the treatment given in the period before.
If one is interested only in the direct effect of a particular period, it is possible to
allow a sufficiently long rest period — a carry-over period — for the effect of the
previous treatment to disappear. During this period, the animals are treated equally,
or can be switched to the next treatment with the yield discarded. The rest period
should be sufficiently long that the residual effect of the previous period disappears.
Make sure that the carry-over effect and the direct effect are not confused (confounded).
In many cases, such as lactation or growth, the periods themselves are of limited
duration. On the other hand, the treatments should be given sufficient time to
express their effect. In other words, it may not be possible to have a sufficiently long
rest period to get rid of, or at least appreciably reduce, the effect of the previous
treatment. Moreover, one may not know this in advance, so the experiment should be
planned in such a way that both the direct and the carry-over (residual) effects can be
estimated. Note also that switch-back or switch-over designs do not work well for many
treatments, because they would require trial periods so long that the effects become
confounded with time.
The analysis of the cross-over and the direct effects becomes much simpler if each
treatment is preceded by every other treatment an equal number of times. This
makes the design balanced for residual effects.
c) Graeco Latin Square
Assume that we have a LS of a given dimension and superimpose Greek letters on the
Latin letters such that every Greek letter occurs once and only once with each Latin letter.
Example: consider a 4×4 LS:
Aα Bβ Cδ Dζ
Dβ Cα Bζ Aδ
Bδ Aζ Dα Cβ
Cζ Dδ Aβ Bα
For instance, 4 poultry feeds and 4 feed additives are tested on 4 different age groups
and 4 different breeds of chickens. Latin letters can represent main poultry feeds,
while Greek letters represent feed additives.
6.2 Incomplete Designs (please refer Gomez and Gomez from page 39 to page 83)
A) Lattice Design
1. Balanced Lattice Square Design
2. Partially Balanced Lattice Square Design
Simple Lattice Square Design
Triple Lattice Square Design
Quadruple Lattice Square Design
B) Group Balanced Square Design
7. Factorial Experiments
This is the condition of experimentation in which there are several factors with many levels
and all treatment combinations are observed. A group of treatments which contain two or
more levels of two or more factors or substances in all combinations is known as factorial
experiment. Example, a fertilizer study on N, P, K, S & Zn, each with 5 levels: N1, N2,
N3, N4, N5
P1, P2, P3, P4, P5
K1, K2, K3, K4, K5
S1, S2, S3, S4, S5
Zn1, Zn2, Zn3, Zn4, Zn5
Then we will have 5⁵ = 5 × 5 × 5 × 5 × 5 = 3125 treatment combinations. Here, we say that
we have 5 factors, each factor at 5 levels.
If the range of the levels is known and if we are interested in the nature of the response curve, we
should take as many levels as possible. The levels may be equally spaced or not. To ease the
computation of linear regression curve, it is advisable to use levels which are equally spaced. If
the range in which the factor is effective is not known, but the lower or upper limit is known, use
the formula c±nk to determine the levels, where c is the lower or upper value, n is the levels (0,
1, 2, 3, etc) and k is the interval which is usually constant.
= μ2. − μ1.; in each simple effect the inner subscript is the same, but the outer is different.
B = [(μ12 - μ11) + (μ22 - μ21)]/2
= [(μ12 + μ22) - (μ11 + μ21)]/2
= μ.2 - μ.1
3. Interaction effect – interaction effect is defined as the failure of one factor, in the above
case a, to retain the same order and magnitude of performance at all levels of the other
factor, in the above case b. In other words, if the difference between the two levels of
factor a varies at each level of b, then we say the factors a & b interact. The interaction of
factors a & b is designated as AB. Symbolically, interaction of A & B is defined as the
average difference between the simple effects of A or B.
For example, take the above 2x2 factorial experiment. The interaction (AB) is estimated
as follow:
AB = [(μ21 - μ11) - (μ22 - μ12)]/2
= [μ21 + μ12- μ11 - μ22]/2
Note: the interaction effect calculated in line with A is the same as in line with B
AB = [(μ12 - μ11) - (μ22 - μ21)]/2
= [μ12 + μ21- μ11 - μ22]/2
Estimates of means
a0 a1
b0 5 3
b1 6 6
5. Interaction of AB:
In line with A: [(μ21 − μ11) − (μ22 − μ12)]/2 = (−2 − 0)/2 = −1
In line with B: [(μ12 − μ11) − (μ22 − μ21)]/2 = (1 − 3)/2 = −1
Test of hypotheses in FE: several hypotheses can be tested in factorial experiments. For
instance, in a 2x2 FE, the hypotheses of interest would be the followings:
1. H0: A = 0 or H0: μ2. - μ1.= 0
2. H0: B = 0 or H0: μ.2 – μ.1= 0
3. H0: AB = 0, or no interaction between A & B
1. SStotal = Σyijl² − (y…)²/(ijl), where y… = ΣΣΣyijl and (y…)²/(ijl) is known as the c.f.
= (4² + 6² + 2² + 4² + 4² + 8² + 7² + 5²) − 40²/(2 × 2 × 2)
= 226 − 1600/8
= 226 − 200 (c.f.)
= 26

2. SSA = Σ(yi..)²/(jl) − c.f.                    MSA = SSA/dfA
= (22² + 18²)/4 − 200 = 202 − 200 = 2.0          = 2.0/1 = 2.0

3. SSB = Σ(y.j.)²/(il) − c.f.                    MSB = SSB/dfB
= (16² + 24²)/4 − 200 = 208 − 200 = 8.0          = 8.0/1 = 8.0

4. SSAB = Σ(yij.)²/l − c.f. − SSA − SSB          MSAB = SSAB/dfAB
= (10² + 12² + 6² + 12²)/2 − 200 − 2.0 − 8.0     = 2.0/1 = 2.0
= 2.0

5. SSerror = SStotal − SSA − SSB − SSAB          MSerror = SSerror/dferror
= 26 − 2.0 − 8.0 − 2.0 = 14                      = 14/[ij(l − 1)] = 14/4 = 3.5
Computation of F using variances
1. To test H0: A = 0, compute F = MSA/MSerror ~ Fα(1, ij(l − 1)) = Fα(1, 4)
2. To test H0: B = 0, compute F = MSB/MSerror ~ Fα(1, 4)
3. To test H0: AB = 0, compute F = MSAB/MSerror ~ Fα(1, 4)
AOV Table under CRD using variances
Source df SS MS Fcal F0.05 SD
A 1 2.0 2.0 0.57 7.71 NS
B 1 8.0 8.0 2.28 7.71 NS
AB 1 2.0 2.0 0.57 7.71 NS
Error 4 14 3.5
Total 7 26
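The whole 2×2 factorial decomposition under CRD, verified in Python (a sketch assuming NumPy; the cell assignment is inferred from the factor totals used above, i.e. one factor level totals 22 and 18, the other 16 and 24):

    import numpy as np

    cells = {(0, 0): [4, 6], (0, 1): [4, 8],     # a0: total 22
             (1, 0): [2, 4], (1, 1): [7, 5]}     # a1: total 18
    obs = np.array([v for pair in cells.values() for v in pair])
    cf = obs.sum() ** 2 / obs.size                                           # 200.0
    tot = {ab: sum(v) for ab, v in cells.items()}
    ss_total = (obs ** 2).sum() - cf                                         # 26.0
    ss_a = sum(sum(tot[a, b] for b in (0, 1)) ** 2 for a in (0, 1)) / 4 - cf # 2.0
    ss_b = sum(sum(tot[a, b] for a in (0, 1)) ** 2 for b in (0, 1)) / 4 - cf # 8.0
    ss_ab = sum(t ** 2 for t in tot.values()) / 2 - cf - ss_a - ss_b         # 2.0
    ss_err = ss_total - ss_a - ss_b - ss_ab                                  # 14.0
    print(ss_total, ss_a, ss_b, ss_ab, ss_err, ss_err / 4)                   # MSE = 3.5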
Under RCBD, the data must be first arranged as the following:
A B BI BII
a0 b0 4 6
b1 2 4
a1 b0 4 7
b1 8 5
Block total 18 23
The SS of block is computed as = (17² + 23²)/4 − c.f. = 204.5 − 200 = 4.5
AOV Table of 2×2 FE under RCBD using variances

Source   df   SS    MS     Fcal   F0.05   SD
Block    1    4.5   4.5    1.42   10.13   NS
A        1    2.0   2.0    0.63   10.13   NS
B        1    8.0   8.0    2.52   10.13   NS
AB       1    2.0   2.0    0.63   10.13   NS
Error    3    9.5   3.17
Total    7    26
Suppose that we use Latin square design, then the layout of 2x2 FE would be as the following:
Row total
a0b0 (1) a1b0 (2) a0b1 (3) a1b1(4) R1
a1b1(5) a0b0 (6) a1b0 (7) a0b1(8) R2
a0b1 (9) a1b1(10) a0b0 (11) a1b0 (12) R3
a1b0 (13) a0b1(14) a1b1(15) a0b0 (16) R4
Column total: C1 C2 C3 C4 G (grand total)
Treatment total: T1 T2 T3 T4 = interactions
Factor A & B total:
a0 a1 B-total
b0 T1 T3 b0total
b1 T2 T4 b1total
A-total a0total a1total G
Note: In factorial experiments under RCBD or LS design, it is not necessary to look at the
interaction between treatments or factors and blocks.
Means with plus-minus technique in FE
Any factor at 0 level is the lowest one and designated by 1, while the next level is designated by
the factor itself. With this conversion, a0b0 = (1), a0b1 = b, a1b0 = a, and a1b1 = ab
With this conversion, the plus-minus table for a 2×2 factorial experiment is presented as follows:
(1) a b ab Divisor (coefficient)
Mean + + + + 4
A – + – + 2
B – – + + 2
AB + – – + 2
Where: a0b0 = (1), a0b1 = b, a1b0 = a, and a1b1 = ab
Mean table

         a0               a1
FL       c0      c1       c0      c1
b0       μ111    μ112     μ211    μ212
b1       μ121    μ122     μ221    μ222
1. Simple effect of A: i) μ211 -μ111, ii) μ221 -μ121, iii) μ212-μ112, iv) μ222 - μ122
2. Simple effect of B: i) μ121 -μ111, ii) μ122 -μ112, iii) μ221-μ211, iv) μ222 - μ212
3. Simple effect of C: i) μ112 -μ111, ii) μ122 -μ121, iii) μ212-μ211, iv) μ222 - μ221
4. Main effect of A: = the average of the simple effects of A
= 1/4[(μ211 -μ111)+(μ221 -μ121)+(μ212-μ112)+(μ222 - μ122)]
= 1/4(μ211 + μ221 + μ212+ μ222 -μ111-μ121-μ112- μ122)
= μ2.. - μ1..
5. Similarly, main effect of B: = μ.2. – μ.1.
6. Main effect of C: = μ..2 – μ..1
7. Simple effect of AB: = the interaction between A & B at each level of C
AB1 = 1/2[(μ211 -μ111) - (μ221 -μ121)]
= 1/2(μ211+μ121-μ111-μ221)
AB2 = 1/2[(μ212-μ112) - (μ222 - μ122)]
= 1/2(μ212+μ122-μ112-μ222)
8. Interaction of AB: = the average of the simple effects of AB
= 1/2(AB1+AB2)
9. Simple effect of AC: = the interaction between A & C at each level of B
AC1 = 1/2[(μ112 − μ111) − (μ212 − μ211)]
AC2 = 1/2[(μ122 − μ121) − (μ222 − μ221)]
10. The AC interaction: = 1/2(AC1 + AC2)
11. Simple effect of BC: = the interaction between B & C at each level of A
BC1 = 1/2[(μ121 -μ111) – (μ122 -μ112)]
BC2 = 1/2[(μ221 -μ211) – (μ222 - μ212)]
12. The BC interaction: = 1/2(BC1+BC2)
13. The ABC interaction is the average difference between the simple effects of AB or AC or
BC: ABC = 1/4[(μ211+μ121+μ112+μ222) – (μ111+μ221+μ212+μ122)]
         a0               a1
         c0      c1       c0      c1
b0       μ111    μ112     μ211    μ212
b1       μ121    μ122     μ221    μ222

Observed cell means (treatment totals over 3 replications in parentheses):

         a0                a1
         c0       c1       c0       c1
b0       4 (12)   3 (9)    5 (15)   2 (6)
b1       2 (6)    4 (12)   7 (21)   6 (18)
Computation of effects
Simple effects: A = μ211–μ111, μ221–μ121, μ212–μ112, μ222–μ122
= 5–4, 7–2 2–3 6–4
=1 5 -1 2
B = μ121–μ111, μ122–μ112, μ221–μ211, μ222–μ212
= 2–4 4–3 7–5 6–2
= -2 1 2 4
C = μ112–μ111, μ122–μ121, μ212–μ211, μ222–μ221
= 3–4 4–1 2–5 6–7
= -1 2 -3 -1
Main effects:
A = 1/4[(μ211−μ111)+(μ221−μ121)+(μ212−μ112)+(μ222−μ122)] = 1/4[(5−4)+(7−2)+(2−3)+(6−4)] = 1/4(1+5−1+2) = 1.75
B = 1/4[(μ121−μ111)+(μ122−μ112)+(μ221−μ211)+(μ222−μ212)] = 1/4[(2−4)+(4−3)+(7−5)+(6−2)] = 1.25
C = 1/4[(μ112−μ111)+(μ122−μ121)+(μ212−μ211)+(μ222−μ221)] = 1/4[(3−4)+(4−2)+(2−5)+(6−7)] = −0.75
Interaction effects:
AB = 1/4[(μ111+μ221+μ112+μ222)−(μ211+μ121+μ212+μ122)] = 1/4[(4+7+3+6)−(5+2+2+4)] = 1/4(20−13) = 1.75
AC = 1/4[(μ111+μ121+μ212+μ222)−(μ211+μ221+μ112+μ122)] = 1/4[(4+2+2+6)−(5+7+3+4)] = 1/4(14−19) = −1.25
BC = 1/4[(μ111+μ211+μ122+μ222)−(μ121+μ221+μ112+μ212)] = 1/4[(4+5+4+6)−(2+7+3+2)] = 1/4(19−14) = 1.25
ABC = 1/4[(μ211+μ121+μ112+μ222)−(μ111+μ221+μ212+μ122)] = 1/4[(5+2+3+6)−(4+7+2+4)] = 1/4(16−17) = −0.25
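All of these effects can be generated at once by noting that each effect is a signed sum of the cell means — sign −1 at the low level and +1 at the high level of every factor in the effect — divided by 2^(n−1) = 4. The following sketch reproduces the values just computed:

```python
# Effects of the 2x2x2 example from products of (-1, +1) signs.
# mean[(i, j, k)] is the cell mean at levels (a_i, b_j, c_k) above.
mean = {(0,0,0): 4, (0,0,1): 3, (0,1,0): 2, (0,1,1): 4,
        (1,0,0): 5, (1,0,1): 2, (1,1,0): 7, (1,1,1): 6}

def effect(factors):
    """Factorial effect named by a subset of factor indices {0:A, 1:B, 2:C}."""
    total = 0
    for cell, m in mean.items():
        sign = 1
        for f in factors:                  # -1 at the low level, +1 at the high
            sign *= (1 if cell[f] == 1 else -1)
        total += sign * m
    return total / 4                       # divisor 2**(n-1) = 4 for n = 3

for name, fs in [("A", (0,)), ("B", (1,)), ("C", (2,)),
                 ("AB", (0,1)), ("AC", (0,2)), ("BC", (1,2)), ("ABC", (0,1,2))]:
    print(name, effect(fs))   # 1.75, 1.25, -0.75, 1.75, -1.25, 1.25, -0.25
```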
Sum of squares using variances
SSA = (39² + 60²)/12 − 99²/24 = 426.75 − 408.375 = 18.375
SSB = (42² + 57²)/12 − 99²/24 = 417.75 − 408.375 = 9.375
SSC = (54² + 45²)/12 − 99²/24 = 411.75 − 408.375 = 3.375
SSAB = (21² + 21² + 18² + 39²)/6 − 99²/24 − SSA − SSB = 454.5 − 408.375 − 18.375 − 9.375 = 18.375
SSAC = (18² + 21² + 36² + 24²)/6 − 99²/24 − SSA − SSC = 439.5 − 408.375 − 18.375 − 3.375 = 9.375
SSBC = (27² + 15² + 27² + 30²)/6 − 99²/24 − SSB − SSC = 430.5 − 408.375 − 9.375 − 3.375 = 9.375
SSABC = (12² + 9² + 15² + 6² + 6² + 12² + 21² + 18²)/3 − 99²/24 − SSA − SSB − SSC − SSAB − SSAC − SSBC
      = 477 − 408.375 − 18.375 − 9.375 − 3.375 − 18.375 − 9.375 − 9.375 = 0.375
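These sums of squares can be checked against the effects: for a 2^n factorial with r replications, SS(effect) = (contrast on the treatment totals)²/(r·2^n). A minimal check using the cell totals of the example:

```python
# SS(effect) = contrast^2 / (r * 2^n), with the contrast taken on the
# cell totals (over r = 3) of the 2x2x2 example above.
total = {(0,0,0): 12, (0,0,1): 9, (0,1,0): 6, (0,1,1): 12,
         (1,0,0): 15, (1,0,1): 6, (1,1,0): 21, (1,1,1): 18}
r, n = 3, 3

def ss(factors):
    contrast = 0
    for cell, t in total.items():
        sign = 1
        for f in factors:
            sign *= (1 if cell[f] == 1 else -1)
        contrast += sign * t
    return contrast ** 2 / (r * 2 ** n)

for name, fs in [("A", (0,)), ("B", (1,)), ("C", (2,)), ("AB", (0,1)),
                 ("AC", (0,2)), ("BC", (1,2)), ("ABC", (0,1,2))]:
    print(name, ss(fs))   # 18.375, 9.375, 3.375, 18.375, 9.375, 9.375, 0.375
```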
2⁴ factorial experiments
This refers to experiments having 4 factors at 2 levels each.
Determination of factorial effects using algebraic technique
In order to use the algebraic technique, we set the first level of a factor as 1 and the 2nd level of the factor as the factor's own small letter; for instance, a0 = 1 and a1 = a. The factors are a, b, c & d. The divisor for each effect is 2^(n−1), where n is the number of factors.
a0 a1
c0 c1 c0 c1
d0 d1 d0 d1 d0 d1 d0 d1
b0 (1) d c cd a ad ac acd
b1 b bd bc bcd ab abd abc abcd
Effects  (1)  d  c  cd  a  ad  ac  acd  b  bd  bc  bcd  ab  abd  abc  abcd
A - - - - + + + + - - - - + + + +
B - - - - - - - - + + + + + + + +
C - - + + - - + + - - + + - - + +
D - + - + - + - + - + - + - + - +
AB + + + + - - - - - - - - + + + +
AC + + - - - - + + + + - - - - + +
AD + - + - - + - + + - + - - + - +
BC + + - - + + - - - - + + - - + +
BD + - + - + - + - - + - + - + - +
CD + - - + + - - + + - - + + - - +
Effects  (1)  d  c  cd  a  ad  ac  acd  b  bd  bc  bcd  ab  abd  abc  abcd
ABC - - + + + + - - + + - - - - + +
ABD - + - + + - + - + - + - - + - +
ACD - + + - + - - + - + + - + - - +
BCD - + + - - + + - + - - + + - - +
ABCD + - - + - + + - - + + - + - - +
Assignment: Use the algebraic technique and estimate various effects of a 23 experiment.
2. Two-factor interactions = n!/(2!(n−2)!)
3. Three-factor interactions = n!/(3!(n−3)!)
...
n. n-factor interactions = n!/(n!(n−n)!) = 1
Example: by using this formula, compute the main effects and interactions of a 2⁴ experiment!
1. Main effects = n!/(1!(n−1)!) = 4!/(1!(4−1)!) = 4!/(1!3!) = 4
2. Two-factor interactions = n!/(2!(n−2)!) = 4!/(2!(4−2)!) = 4!/(2!2!) = 6
3. Three-factor interactions = n!/(3!(n−3)!) = 4!/(3!(4−3)!) = 4!/(3!1!) = 4
4. Four-factor interactions = n!/(4!(n−4)!) = 4!/(4!(4−4)!) = 4!/4! = 1
In total there are 4 + 6 + 4 + 1 = 15 factorial effects.
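These counts are just binomial coefficients, so they can be verified directly:

```python
# Counts of factorial effects in a 2^n experiment are binomial coefficients.
from math import comb

n = 4                                      # a 2^4 experiment
counts = [comb(n, k) for k in range(1, n + 1)]
print(counts, sum(counts))                 # [4, 6, 4, 1] 15
```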
A 3² factorial experiment
This means an experiment consisting of 2 factors, each at 3 levels. Let's assume that we have two factors A and B. Their 3 levels are designated as follows: Factor A (a0, a1, a2) and Factor B (b0, b1, b2). The treatment combinations of this FE are as follows:
Treatment combinations              Means
     a0     a1     a2                    a0    a1    a2
b0   a0b0   a1b0   a2b0             b0   μ00   μ10   μ20
b1   a0b1   a1b1   a2b1             b1   μ01   μ11   μ21
b2   a0b2   a1b2   a2b2             b2   μ02   μ12   μ22
Before we attempt to measure factorial effects, we will assume that the levels are equally spaced, and we will partition A into linear and quadratic contrasts, and similarly also partition B into linear and quadratic contrasts.

                      Factor A           Factor B
                      a0   a1   a2       b0   b1   b2
Linear contrast       −1    0   +1       −1    0   +1
Quadratic contrast    +1   −2   +1       +1   −2   +1
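The sum of squares carried by each contrast is (Σ ciTi)²/(r·Σ ci²), where the Ti are the level totals and r is the number of observations per total; the linear and quadratic pieces add up to the 2-df SS of the factor. A minimal sketch with hypothetical totals:

```python
# SS of a contrast: (sum c_i * T_i)^2 / (r * sum c_i^2).
linear, quadratic = (-1, 0, 1), (1, -2, 1)
T = (30, 42, 48)      # hypothetical level totals for a0, a1, a2
r = 4                 # hypothetical number of observations per total

def contrast_ss(c, totals, r):
    num = sum(ci * ti for ci, ti in zip(c, totals)) ** 2
    return num / (r * sum(ci * ci for ci in c))

print(contrast_ss(linear, T, r))     # 40.5
print(contrast_ss(quadratic, T, r))  # 1.5
# 40.5 + 1.5 = 42, which equals the 2-df SS of the factor for these totals.
```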
The AOV Table
Source df
A 1
B 2
AB 2
C 3
AC 3
BC 6
ABC 6
Error (depends upon the design used)
Definition: Confounding involves tying up or mixing up some treatment effects with block differences. The segregation of blocks within a replicate results in a decrease in the degrees of freedom associated with the error or treatment sum of squares, or both. This means that information concerning some treatment effects will be mixed up with block differences.
For illustration, consider the following. We have three factors A, B & C, each at 2 levels, and the treatment combinations are: a0b0c0, a1b0c0, a0b1c0, a1b1c0, a0b0c1, a1b0c1, a0b1c1, a1b1c1. Assume that it is not possible for us to find 8 homogeneous experimental units to form a complete block. Rather, we can find 2 blocks with 4 homogeneous experimental units within each block, but heterogeneous between blocks. The question is how to allocate the 8 treatments between the 2 incomplete blocks. Suppose the following choice was made:
Block B1 = a1b0c0, a0b1c0, a0b0c1, a1b1c1
Block B2 = a0b1c1, a1b0c1, a1b1c0, a0b0c0
What does the difference between the two blocks measure?
B1 − B2 = (a1b0c0 + a0b1c0 + a0b0c1 + a1b1c1) − (a0b1c1 + a1b0c1 + a1b1c0 + a0b0c0)
A little algebraic manipulation shows that this is equivalent to (a1−a0)(b1−b0)(c1−c0), which stands for the ABC effect. That is, this measure of the interaction ABC is identical with the block difference.
Blocks are formed in such a way that the variation between blocks is maximized; consequently, the effect ABC is estimated with relatively poor precision from this replication, because it is subject not only to within-block variability but is also affected by the variation between the two blocks. This means that ABC is confounded with the blocks B1 & B2. Hence, it is customary to ignore the ABC interaction, which is already confounded with the replication.
On the other hand, the other effects are free from block differences and thereby they are not
confounded. That is, A, B, C, AB, AC & BC are obtained by taking within-block comparisons
(intra-block differences).
The implication is that we estimate the ABC interaction as an inter-block difference, whereas
the others are estimated as intra-block difference.
The unconfounded effects are estimated with an error variance of σ²/4, corresponding to relatively homogeneous blocks of 4 experimental units. Had a complete block of 8 units been used, the error variance of the estimated effects would have been σ²/8. Definitely, confounding decreases the degrees of freedom of both the treatment and the error, and therefore the treatment effects must be relatively conspicuous to turn out statistically significant.
Advantages of confounding
1. Confounding is of utility if the gain in efficiency through reduction of the error variance is materialized
2. The loss of information concerning the confounded effect can in some cases be tolerated;
otherwise it can be used to increase the efficiency of the experiment and to precisely
estimate other treatment combinations.
Disadvantages of confounding
1. Confounding effects are replicated fewer times than the unconfounded effects
2. The calculation procedures are usually more difficult
3. Considerable difficulties are encountered if the treatments interact with incomplete blocks
Types of confounding
There are two general types of confounding used by experimenters. These are:
a) Complete confounding, and
b) Partial confounding
a) Complete confounding
If we find that an effect is of no use or of negligible importance, we confound that effect with incomplete blocks in all replications; this is known as complete confounding. Before using complete confounding, the experimenter must make sure that it is desirable to do so. Sometimes, in an experiment where effects are completely confounded, it may be necessary to adjust means.
We note that the effect ABC is completely confounded with blocks in all replications, because
its expected value includes the block difference.
Exercise: show that the remaining effects are not confounded by using plus-minus technique!
Signs of each effect for the treatments in B1 = {abc, a, b, c} and B2 = {(1), ab, ac, bc}:

Block   Trt    A   B   C   AB   AC   BC   ABC
B1      abc    +   +   +   +    +    +    +
B1      a      +   −   −   −    −    +    +
B1      b      −   +   −   −    +    −    +
B1      c      −   −   +   +    −    −    +
B2      (1)    −   −   −   +    +    +    −
B2      ab     +   +   −   +    −    −    −
B2      ac     +   −   +   −    +    −    −
B2      bc     −   +   +   −    −    +    −

ABC carries all "+" signs in B1 and all "−" signs in B2, so it is confounded with blocks; every other effect shows both signs within each block and is therefore not confounded.
AOV of a 23 experiment in which the effect is completely confounded
The analysis of an experiment in which an effect is completely confounded is straightforward. The breakdown of the sources of variation and the calculation of the sums of squares are similar to those of a simple factorial experiment in RCBD, the only exceptions being the following:
1. The completely confounded effect cannot be estimated, and
2. The divisor of the block sum of squares will be 4 instead of 8.
The AOV Table
Source df
Block 5
A 1
B 1
AB 1
C 1
AC 1
BC 1
Error 12
Total 23
b) Partial confounding
If the experimenter doesn't want to lose all information on the effect of ABC, he/she should not confound it completely in all replications.
He/she may, for instance, confound ABC in Rep I, BC in Rep II, etc. The information on ABC
effect can be obtained from the replication in which it is not confounded.
Since complete confounding results in total loss of information concerning the confounded
effect, we may wish not to completely confound it in all replication. Instead we may wish to
confound some effects in some replications and others in other replications. We can then get
information about the effect from the replication in which it is not confounded and this type of
confounding is known as partial confounding.
Verify that ABC, AC & BC are confounded in Rep I, II and III, respectively! For ABC:

            Rep I         Rep II        Rep III
            B1    B2      B1    B2      B1    B2
ABC signs   +     −       +     +       +     −
            +     −       −     +       −     −
            +     −       −     −       +     +
            +     −       +     −       −     +
Result      confounded    not confounded    not confounded
Note that when an effect is confounded, all "+" signs are found in one block and all "−" signs in the other block. Conversely, if an effect is not confounded, both positive and negative signs are found in both blocks.
Relative information: This is the amount of information which could be obtained about the
factorial effect. If the effect is not confounded, its information is 1 (100%). If it is confounded,
its information is less than 1. Therefore, information on ABC, AC & BC is 2/3, while all the rest
have 100% information.
The first task in analyzing a factorial experiment is to find out which effect is confounded.
Computation of sum of squares
1. SStotal = (12² + 12² + … + 15²) − 317²/24 = 76.0
2. SSblock = (49² + 51² + … + 60²)/4 − c.f. = 30.8
3. SSmain effects (A, B & C):

        b0     b1     Total
a0      74     80     154
a1      74     89     163
Total   148    169    317

SSA = (154² + 163²)/12 − c.f. = 3.4
SSB = (148² + 169²)/12 − c.f. = 18.4
SSC = (157² + 160²)/12 − c.f. = 0.375
If AB had not been confounded at all within replications, it would have been obtained as:
SSAB = (74² + … + 89²)/6 − c.f. − SSA − SSB
But, the problem is AB is confounded in Rep III. Therefore, use Rep I & Rep II to
calculate SSAB. Then, we must form a new A by B table using data from Rep I & II.
b0 b1 Total
a0 46 49 95
a1 45 59 104
Total 91 108 199
SSAB actually = (46² + 49² + 45² + 59²)/4 − 199²/16 − [(95² + 104²)/8 − 199²/16] − [(91² + 108²)/8 − 199²/16]
              = 2505.75 − 2475.0625 − 5.0625 − 18.0625 = 7.5625
c0 c1 Total
a0 52 54 106
a1 56 55 111
Total 108 109 217
SSAC = (52² + 54² + 56² + 55²)/4 − 2943.0625 − 1.5625 − 0.0625 = 0.5625
c0 c1 Total
b0 50 52 102
b1 60 56 116
Total 110 108 218
SSBC = (50² + 52² + 60² + 56²)/4 − 218²/16 − [(102² + 116²)/8 − 2970.25] − [(110² + 108²)/8 − 2970.25]
     = 14.75 − 12.25 − 0.25 = 2.25
SSABC = (35² + … + 43²)/3 − 317²/24 − SSA − SSB − SSAB − SSC − SSAC − SSBC
The procedure outlined is straightforward, but it requires some additional work. An easier way to find effects in the 2x2x… series is Yates' method of sums and differences.
The last column gives the effects M, A, B, AB, C, AC, BC & ABC, in that order. If the effects were not confounded, then to obtain the respective sums of squares all we need to do is square this column and divide by the total number of observations.
On the other hand, if an effect is confounded, we have to adjust for confounding. For example, the effect AB is confounded between blocks 5 & 6 in Rep III. Therefore, we adjust this effect by removing the block difference, that is, B5 − B6 = 58 − 60 = −2. To get the effect of AB we subtract this number from 9: 9 − (−2) = 11.
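Yates' method itself is easy to mechanize: with the treatments in standard order, n passes of pairwise sums followed by pairwise differences yield the contrast column. The sketch below applies it to the cell totals of the 2x2x2 example worked earlier (where no effect was confounded):

```python
# Yates' method of sums and differences for a 2^n factorial.
# Input in standard order: (1), a, b, ab, c, ac, bc, abc.
def yates(y):
    n = len(y).bit_length() - 1            # number of factors (len(y) = 2^n)
    for _ in range(n):
        sums  = [y[i] + y[i + 1] for i in range(0, len(y), 2)]
        diffs = [y[i + 1] - y[i] for i in range(0, len(y), 2)]
        y = sums + diffs
    return y                               # contrasts for M, A, B, AB, C, AC, BC, ABC

totals = [12, 15, 6, 21, 9, 6, 12, 18]     # cell totals of the earlier example, r = 3
col = yates(totals)
print(col)                                 # [99, 21, 15, 21, -9, -15, 15, -3]

r = 3
# For an unconfounded effect, SS = contrast^2 / (r * 2^n) = contrast^2 / 24:
print([c * c / (r * len(totals)) for c in col[1:]])
# [18.375, 9.375, 18.375, 3.375, 9.375, 9.375, 0.375] for A, B, AB, C, AC, BC, ABC
```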
AOV Table
Source df Sum of squares
Blocks 5 30.8
A 1 3.4
B 1 18.4
AB 1 7.6
C 1 0.4
AC 1 0.6
BC 1 2.25
ABC 1 0.04
Error 11 12.6
Total 23 76.0
Example: let AB be the effect to be confounded. According to the even-odd technique, the blocks look like the following:
Block I: (1), ab, c, abc
Block II: a, b, bc, ac
3. Permutation method: random assignment of treatments to incomplete blocks. Here we do not purposely choose which effect is confounded.
In another approach, we look at the values x1, x2, x3 & x4, which denote the levels of the 1st, 2nd, 3rd & 4th factors, respectively. Each xi has the value 0 or 1 according to the level of factor i. Let Hi have the value 1 or 0 depending upon whether factor i does or does not appear in the effect to be confounded.
Example: suppose we have a 2³ experiment and desire to confound AB. Then we will have HA = 1, HB = 1, and HC = 0. Based on modular arithmetic, these are combined into the defining equation L = HAXA + HBXB + HCXC (mod 2).
Procedure
i. Determine the contrast to be confounded
ii. Compute HAXA+HBXB+HCXC for all treatment combinations
iii. Assign to one block those treatment combinations for which HAXA + HBXB + HCXC = 0, and to the other block those for which HAXA + HBXB + HCXC = 1
Taking the above example to confound AB, the modular arithmetic is done as follows:

Treatment combination   XA   XB   XC   HAXA+HBXB   Mod 2
(1)                     0    0    0    0           0 (mod 2) = 0
a                       1    0    0    1           1 (mod 2) = 1
b                       0    1    0    1           1 (mod 2) = 1
ab                      1    1    0    2           2 (mod 2) = 0
c                       0    0    1    0           0 (mod 2) = 0
ac                      1    0    1    1           1 (mod 2) = 1
bc                      0    1    1    1           1 (mod 2) = 1
abc                     1    1    1    2           2 (mod 2) = 0

Block I (value 0): (1), ab, c, abc;  Block II (value 1): a, b, ac, bc

Since HC = 0, only HAXA + HBXB contributes, and the arithmetic is mod 2 because each factor has 2 levels.
Note: Hi takes the value 1 for every factor that appears in the confounded effect (whether a main effect or an interaction), and 0 otherwise.
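The whole assignment rule can be written compactly; the sketch below reproduces the blocks of the table for confounding AB:

```python
# Modular-arithmetic assignment of 2^3 treatments to two incomplete blocks:
# H_i = 1 for each factor in the confounded effect, then split by
# L = sum(H_i * X_i) mod 2.
from itertools import product

H = {"A": 1, "B": 1, "C": 0}              # confound AB

def label(x):
    """Name a combination, e.g. (1, 1, 0) -> 'ab' and (0, 0, 0) -> '(1)'."""
    name = "".join(f for f, lvl in zip("abc", x) if lvl == 1)
    return name or "(1)"

block = {0: [], 1: []}
for x in product((0, 1), repeat=3):       # (X_A, X_B, X_C)
    L = (H["A"] * x[0] + H["B"] * x[1] + H["C"] * x[2]) % 2
    block[L].append(label(x))

print("Block I: ", block[0])   # ['(1)', 'c', 'ab', 'abc']
print("Block II:", block[1])   # ['b', 'bc', 'a', 'ac']
```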
Since in a split-plot experiment the variation among sub-units is expected to be less than that among the whole units:
1. The factors which require smaller amount of experimental material, and/or
2. The factors which are of major importance, and/or
3. Factors which are expected to exhibit smaller differences, and/or
4. Factors for which greater precision is required
are assigned to the sub plots.
The underlying principle is that whole plots, to which one or more factors are applied, are divided into sub-units, to which one or more levels of the other factors are applied. The whole-plot unit becomes the main plot, whereas the sub-unit becomes the sub-plot.
The split-plot design is an incomplete block design in which a main effect, that of the whole-plot factor, is confounded. The design has a complete block with regard to the sub-plot treatments.
Examples:
1. Study to determine the effect of 4 levels of meat on 3 breeds of cows. The main plot here would be the breeds.
2. Study of 3 types of nozzles of irrigation equipment and fertilizer application on yield of
onion
3. Effect of spraying 3 chemicals using airplane and spacing on yield of corn
Advantages of split-plots
1. Experimental material which by necessity requires large units for some factors can still be utilized to compare subsidiary treatments (sub-units)
2. Precision is increased over the complete block design with regard to the sub-unit treatments and the interaction of sub-plot and main plot
3. The overall precision of this design relative to RCBD can be increased if the sub-unit treatments are laid out in a Latin square or in an incomplete Latin square design
Disadvantages of split-plots
1. The whole-plot factor is less precisely estimated as compared to RCBD
2. When missing data exist, the analysis becomes too complicated compared to RCBD
Randomization to be followed depends upon the design used (whole-plots). Example, assume the
one factor in our study is fertilizer designated by a, and the other factor is variety and designated
by b. Assume that a has p levels and b has q levels. Assume also that a is the main plot factor
and b is a sub-plot factor. Then the design looks like the following:
a1 a2 … ap
b1 b1 … b1
b2 b2 … b2
. . … .
bq   bq   …   bq
Here we want to know the effect of the varieties (b) precisely. Factor a is confounded: the sub-units form complete blocks with respect to b, but the blocks are incomplete with respect to the main-plot factor, and the main-plot factor is confounded.
From this design, we note that we have an incomplete block design as regards the main effect a
and a complete block design as regard the sub unit treatment b. We also note that the main plot
treatment is confounded with incomplete blocks. Therefore, that is why we say a split-plot design
is an incomplete block design in which main effect is confounded. Note that RCBD requires pq
experimental units to form a complete block.
Suppose we have two factors, each at 3 levels, replicated 3 times, and let the following be the experimental layout:

     Rep I            Rep II           Rep III
     a1   a3   a2     a2   a3   a1     a3   a1   a2
     b1   b3   b2     b1   b1   b3     b3   b1   b3
     b2   b1   b3     b2   b2   b1     b1   b3   b2
     b3   b2   b1     b3   b3   b2     b2   b2   b1
If we assume that there is no interaction between whole plot treatments and replications, as well
as, between sub-plot treatments and blocks, we will have two types of comparisons:
1. Between whole plots, and
2. Between sub-plots within whole plots
Within whole plot comparison
To make the within whole plot comparison, we will divide data into 3 groups. In the first group,
we put the treatment combination having the first level of a. In second group, second level of a,
and so on.
Group I Group II Group III
Rep I a1b1 a2b1 a3b1
a1b2 a2b2 a3b2
a1b3 a2b3 a3b3
Rep II a1b1 a2b1 a3b1
a1b2 a2b2 a3b2
a1b3 a2b3 a3b3
Rep III a1b1 a2b1 a3b1
a1b2 a2b2 a3b2
a1b3 a2b3 a3b3
c) Complete analysis
Source df Sum of squares
Main-plot 8
Rep 2 (r-1) SSR
A 2 (a-1) SSA
Error (a) 4 [(r-1)(a-1)] SS(a)
Sub-plot 18
B 2 (b-1) SSB
AB 4 [(b-1)(a-1)] SSAB
Error (b) 12 [a(b-1)(r-1)] SS(b)
Total 26 (abr-1) SStotal
An example of a split-plot analysis
Yields of three varieties of alfalfa (ton/acre) with four dates of cutting.
                     Blocks
Variety    Date      I      II     III    IV     V      VI     Date total
Ladak      A         2.17   1.88   1.62   2.34   1.58   1.66   11.25
           B         1.58   1.26   1.22   1.59   1.25   0.94    7.84
           C         2.29   1.60   1.67   1.91   1.39   1.12    9.98
           D         2.23   2.01   1.82   2.10   1.66   1.10   10.92
           Total     8.29   6.75   6.33   7.94   5.88   4.82   39.99
Cossack    A         2.33   2.01   1.70   1.78   1.42   1.35   10.59
           B         1.38   1.30   1.85   1.09   1.13   1.06    7.81
           C         1.86   1.70   1.81   1.54   1.67   0.88    9.46
           D         2.27   1.81   2.01   1.40   1.31   1.06    9.86
           Total     7.84   6.82   7.37   5.81   5.53   4.35   37.72
Ranger     A         1.75   1.95   2.13   1.78   1.31   1.30   10.22
           B         1.52   1.47   1.80   1.37   1.01   1.31    8.48
           C         1.55   1.61   1.82   1.56   1.23   1.13    8.90
           D         1.56   1.72   1.99   1.55   1.51   1.33    9.66
           Total     6.38   6.75   7.74   6.26   5.06   5.07   37.26
Block total          22.49  20.32  21.44  20.01  16.47  14.24  114.97
AOV table
Source df SS MS
Main plots 17 5.6902
Blocks 5 4.1496 0.8300
Varieties 2 0.1778 0.089
Error(a) 10 1.3622 0.1362
Sub-plots 54 2.3511
Dates 3 1.9625 0.6542
DxV 6 0.2108 0.0351
Error(b) 45 1.2586 0.0280
Total 71 9.1218
There are also fewer degrees of freedom for comparing the main-plot effects in the split-plot design. Since the average error of a difference is the same for both designs, there is no overall gain by using the split-plot design.
B) Split-split-plot design
In the split-plot design it is possible to have more than two factors. Assume that we have three factors (A, B & C), each with several levels, and that all possible combinations are of interest to the experimenter. The first factor, A, is less important than B & C, while B is relatively more important than A but less important than C. In other words, C is much more important than either A or B.
The levels of factor a can be laid out as the main plots, the levels of factor b laid out under each level of a, and the levels of factor c laid out under each level of factor b. Such a design is known as the split-split-plot design.
It is possible to have more than 3 factors under such a design. One can continue the same procedure as long as the factors under consideration are hierarchical in terms of importance.
AOV table

Source            df                 Sum of squares
Main plots        ra−1               Σ(yij..)²/bc − c.f.
Replication (R)   r−1                Σ(yi...)²/abc − c.f.
A                 a−1                Σ(y.j..)²/rbc − c.f.
RA (Error a)      (r−1)(a−1)         Σ(yij..)²/bc − c.f. − SSR − SSA
Sub-plots         ra(b−1)            Σ(yijk.)²/c − c.f. − SS(main plots)
B                 b−1                Σ(y..k.)²/rac − c.f.
AB                (a−1)(b−1)         Σ(y.jk.)²/rc − c.f. − SSA − SSB
Error (b)         a(r−1)(b−1)        Σ(yijk.)²/c − c.f. − SSR − SSA − SS(a) − SSB − SSAB
Sub-sub-plots     rab(c−1)           Σ(yijkl)² − Σ(yijk.)²/c
C                 c−1                Σ(y...l)²/rab − c.f.
AC                (a−1)(c−1)         Σ(y.j.l)²/rb − c.f. − SSA − SSC
BC                (b−1)(c−1)         Σ(y..kl)²/ra − c.f. − SSB − SSC
ABC               (a−1)(b−1)(c−1)    Σ(y.jkl)²/r − c.f. − SSA − SSB − SSC − SSAB − SSAC − SSBC
Error (c)         ab(r−1)(c−1)       SStotal − SSR − SSA − SS(a) − SSB − SSAB − SS(b) − SSC − SSAC − SSBC − SSABC
Total             abcr−1             Σ(yijkl)² − c.f.
The degrees of freedom, sums of squares and sources of variation for higher-order designs of this kind are calculated in the same way.
On the experimental errors
The whole-plot error, conventionally called Ea, is larger than the sub-plot error, called Eb. This is because the sub-units within a whole unit tend to be positively correlated and therefore behave more alike than sub-units in different main units.
Error a cannot be less than error b, unless this occurs by chance. If this happens, we assume that both error a and error b estimate the population error variance σ², so we pool the two errors and use the pooled error variance to test hypotheses.
C) Strip-plot/split-block design
Sometimes the factors a and b may not be as important as the interaction between the two factors. Experimental conditions may be such that both factor a and factor b need large plots, while the interaction is the effect of chief interest. Examples:
1) Fertilizer with broadcast method of application and tillage experiment with an ordinary
implement
2) Study on using sprinkler irrigation and spraying of some chemicals
To cope with situations like these, a design variously named the split-block, strip-cropping, or two-way whole-plot design is employed. In this design, the levels of factor a are laid out in a randomized complete block design, a Latin square design or any other design. Then the levels of factor b are laid out across the levels of factor a. Such a design effectively has the levels of both a and b as whole-plot levels; the only sub-plot information is that of the AB interaction.
Advantages of the split-block design
1. The sub-plot may be kept relatively small even though the whole plots for both factors
may be large
2. The AB interaction would be estimated precisely
Here we have factors a and b in randomized complete blocks. Note that both factors are randomized in each block: each b level is imposed across all the a levels, and each a level is imposed across all the b levels.
In this design the two factors are laid out in a randomized complete block design. Comparison between any two levels of one factor is done using the three replications.
a) To compare two levels of a, use: S.E. = √(2MSE(a)/rb)
b) To compare any two levels of b, use: S.E. = √(2MSE(b)/ra)
The name strip-cropping implies that it requires large plots/areas. If a and b are equally important but c is less important, c should be laid out in the main plots, and then a & b within each c level as a split-block. The experimental layout looks like the following:

     c2               c1               c3
     a1   a3   a2     a2   a1   a3     a3   a2   a1
b2                b2               b1
b3                b1               b2
b1                b3               b3
D) Strip-split-plot design
Please refer to Gomez and Gomez, pages 154 to 167
E) Fractional Factorial Design
Please refer to Gomez and Gomez, pages 167 to 186
Example 1: consider an industrial experiment in which the performance of 5 machines is tested at 4 different stations, with the following layout:
Machines
M1 M2 M3 M4 M5
Stations 1 5 9 13 17
2 6 10 14 18
3 7 11 15 19
4 8 12 16 20
Example 2: assume that a physician wants to study the effectiveness of 2 drugs with respect to some criterion. Suppose that the design calls for the administration of drug 1 to n patients in each of three hospitals. Similarly, drug 2 is administered to n patients in each of the remaining three hospitals.
The effects of the drugs are masked, or mixed up, with the effects of the hospitals. The effect we measure for drug 1 is not uniquely due to the drug, but also includes the effects of hospitals 1, 2 & 3.
Note: It is impossible to compare drug one with drug two free of hospital effects, but it is possible to compare hospitals within each drug.
Test of hypotheses
I. On machines: H0: μ1. = μ2. = μ3. = μ4. = μ5.
II. On stations:
1. H0: μ11 = μ12 =μ13 = μ14
2. H0: μ21 = μ22 =μ23 = μ24
3. H0: μ31 = μ32 =μ33 = μ34
4. H0: μ41 = μ42 =μ43 = μ44
5. SSmachine 5 = Σ(y5j)² − (Σy5.)²/r
Analysis of variance table
Source df Sum of squares
Between machines                           4   (m−1)      Σ(yi..)²/sr − (Σy...)²/msr
Between stations within machines          15  [m(s−1)]    Σ(yij.)²/r − Σ(yi..)²/sr
Between periods within stations/machines      ms(r−1)     Σ(yijk)² − Σ(yij.)²/r
Total                                         msr−1       Σ(yijk)² − (Σy...)²/msr
Model: yijklm = μ + si + aj(i) + wk(ij) + rl(ijk) + hm(ijkl) + εijklm
Note: In a nested design we cannot study interaction effects, because the sub-unit factors differ from unit to unit.
There are different types of nested design; for example, a two-way nested design, as in the split-block design.
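The nested sums of squares in the table above reduce to totals at each level of the hierarchy. A minimal sketch for stations nested within machines, using hypothetical data:

```python
# Nested (hierarchical) sums of squares: stations within machines.
# y[i][j] holds the r repeated measurements on station j of machine i;
# the numbers are hypothetical.
m, s, r = 2, 2, 3
y = [[[5.0, 6.0, 5.5], [7.0, 6.5, 7.5]],     # machine 1, stations 1-2
     [[4.0, 4.5, 5.0], [6.0, 5.5, 6.5]]]     # machine 2, stations 1-2

flat = [obs for mach in y for st in mach for obs in st]
G = sum(flat)
cf = G ** 2 / (m * s * r)                    # correction factor

machine_tot = [sum(sum(st) for st in mach) for mach in y]
station_tot = [sum(st) for mach in y for st in mach]

ss_machines = sum(t * t for t in machine_tot) / (s * r) - cf
ss_stations_in_machines = (sum(t * t for t in station_tot) / r
                           - sum(t * t for t in machine_tot) / (s * r))
ss_within = sum(o * o for o in flat) - sum(t * t for t in station_tot) / r
ss_total = sum(o * o for o in flat) - cf
print(ss_machines, ss_stations_in_machines, ss_within, ss_total)
```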
Part II: Correlation and Regression
8. Correlation
Correlation is one of the most widely used concepts in statistical methodology. Originally it was used mainly by biologists, but now it has come to be used in many fields such as agriculture, industry, chemistry, psychology, etc.
Correlation coefficient
Correlation coefficient measures the amount of co-variability between the two variables.
1. Population correlation coefficient: Let us assume that we have two variables x and y, such that E(x) = μx, v(x) = σx², and E(y) = μy, v(y) = σy².
Then the covariance between x and y is given by cov(x,y) = E[(x − μx)(y − μy)].
The population correlation coefficient is given as

ρ = E[(x − μx)(y − μy)] / √[E(x − μx)²·E(y − μy)²] = cov(x,y)/√[v(x)·v(y)] = σxy/(σxσy)
ρ (rho) measures the strength of the linear relationship between x and y. ρ² is known as the population coefficient of determination, which measures the amount of variability in one variable that is accounted for by the other variable. ρ assumes values between negative one and positive one; that is, −1 ≤ ρ ≤ 1.
[Figure: scatter diagrams of x1 against x2 illustrating three types of pattern for which r = 0.]
A positive value of r indicates that as the values of one variable increase, the values of the other variable also increase, while a negative value of r shows that as the values of one variable increase, the values of the other variable decrease.
Test of hypothesis concerning the correlation coefficient
To perform a test on the correlation coefficient, we need to know the sampling distribution of r. If r comes from a bivariate normal distribution, then we will have two cases:
1. ρ = 0
2. ρ ≠ 0
Case 1: if ρ = 0, we can easily find the sampling distribution of r: r is distributed normally with mean 0 and variance σr² = (1 − r²)/(n − 2). Note: E(r) = 0.
[Figure: sampling distribution of r when ρ = 0, over the range −1 to +1, narrower for n = 50 than for n = 10.]
The sampling distribution of r depends upon ρ and n. In this case, since we have assumed that ρ = 0, the distribution depends upon n only.
To test H0: ρ = 0 vs. Ha: ρ ≠ 0, we use the t-test:
t = (r − ρ)/Sr = r/Sr = r/√[(1 − r²)/(n − 2)]
Example: Suppose a sample of n = 10 pairs of brothers and sisters is selected and the sample correlation coefficient is found to be r = 0.7. Test the significance of the value 0.7 against ρ = 0.
t = 0.7/√[(1 − 0.49)/8] = 0.7/√(0.51/8) = 0.7/0.2525 = 2.77;  t0.05(8) = 2.306
Conclusion: we reject the null hypothesis, since the t-calculated value is greater than the t-tabulated value.
2. In the case where ρ ≠ 0, say ρ = c, the frequency distribution of r is skewed.
[Figure: sampling distributions of r for ρ = 0.5 and ρ = 0.8, piling up toward +1.]
The same is true, mirrored, for negative values of ρ. If ρ ≠ 0, we cannot use the t-distribution to test a hypothesis such as H0: ρ = 0.6. To test such a hypothesis, we need to make the following transformation:
Ƶr = ½ln[(1 + r)/(1 − r)] = ½·2.3026·log10[(1 + r)/(1 − r)]
Then Ƶr will be approximately normally distributed with mean
E(Ƶr) = ½ln[(1 + ρ)/(1 − ρ)],  var(Ƶr) = 1/(n − 3)
We can then use normal distribution theory to test hypotheses regarding the population correlation coefficient. Note that Ƶr = r + r³/3 + r⁵/5 + r⁷/7 + …, which is the inverse hyperbolic tangent of r (tanh⁻¹ r). For values of r with 0 ≤ r < 1, we have 0 ≤ Ƶr < ∞.
To test the hypothesis, we compute the normal deviate Ƶ = (Ƶr − Ƶρ)/σr, where σr² = 1/(n − 3).
Example: Suppose 12 pairs of observations are collected, with r = 0.866. Test the hypothesis ρ = 0.75.
Ƶr = ½ln[(1 + 0.866)/(1 − 0.866)] = ½ln(1.866/0.134) = 1.3169
Ƶρ = ½ln[(1 + 0.75)/(1 − 0.75)] = ½ln(1.75/0.25) = 0.973
σr² = 1/(n − 3) = 1/(12 − 3) = 1/9 = 0.111, so σr = 0.333
Ƶ = (Ƶr − Ƶρ)/σr = (1.3169 − 0.973)/0.333 = 1.032
Since 1.032 < 1.96, the calculated Ƶ is not significant and H0: ρ = 0.75 is not rejected.
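The same computation, together with the confidence-interval back-transformation used in the next example, can be sketched using the identity Ƶr = tanh⁻¹(r):

```python
# Z-test on a correlation via Fisher's transformation, reproducing the
# example above (n = 12, r = 0.866, H0: rho = 0.75), plus the 95%
# confidence interval back-transformed with tanh.
import math

n, r, rho0 = 12, 0.866, 0.75
z_r = math.atanh(r)                  # 0.5 * ln((1 + r) / (1 - r)) = 1.3169
z_rho = math.atanh(rho0)             # = 0.973
se = 1 / math.sqrt(n - 3)            # = 0.333
Z = (z_r - z_rho) / se
print(round(Z, 3))                   # 1.032 < 1.96, so H0 is not rejected

lo, hi = z_r - 1.96 * se, z_r + 1.96 * se
print(round(math.tanh(lo), 3), round(math.tanh(hi), 3))   # 0.581 0.962
```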
Example: Let us use the previous example and find the 95% confidence interval.
r = 0.866, Ƶr = 1.3169, σr = 0.333
Ƶr ± Ƶ0.05(2)·σr = 1.3169 ± (1.96)(0.333), so 0.664 ≤ Ƶρ ≤ 1.970
Then the 95% confidence interval for ρ is 0.581 ≤ ρ ≤ 0.9618.
This is obtained by referring to the transformation table or by back-transforming Ƶ to ρ (ρ = tanh Ƶ).
Since 0.454 is less than Ƶ0.05(2-sided) = 1.96, we accept H0. Under this condition, we may calculate the average, or weighted, correlation coefficient:
Ƶw = [(n1 − 3)Ƶ1 + (n2 − 3)Ƶ2] / [(n1 − 3) + (n2 − 3)] = [(12)(1.2315) + (15)(1.0557)]/(12 + 15) = 1.1338
Rank correlation
There are cases where it is known that a relationship exists between two variables, but the distributions of these variables are not known. Under this circumstance it is not possible to compute r using our previous procedure. To investigate the association of x and y when their distributions are unknown, two statistics have been developed by the famous statisticians Spearman and Kendall, called, respectively, Spearman's rho (ρs) and Kendall's tau (τ). Statistics which do not depend upon a specific distribution are known as non-parametric (distribution-free) statistics.
To compute Spearman's rho, we use the formula rs = 1 − 6Σdi²/(n³ − n), where rs is the estimate of ρs, n is the sample size, and di is the difference in ranks between x & y. Tied observations receive the average of their ranks; for example, if the two smallest values are equal, each is ranked (1 + 2)/2 = 1.5.
Procedure
1. Rank the data in both x & y separately
2. Find the differences in ranks between x & y
3. Compute sum of squares of the differences in ranks
4. Then compute rs as given before
5. For n exceeding 20, rs is approximately normally distributed, which means the normal distribution can be used to test the hypothesis. If n is less than 20, a special table of critical values of rs for non-parametric statistics should be used.
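The procedure can be sketched directly, including the average-rank handling of ties (the data below are hypothetical):

```python
# Spearman's rank correlation, following the procedure above.
def ranks(values):
    """Rank observations, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    rk = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1                       # extend the run of tied values
        avg = (i + j) / 2 + 1            # ranks are 1-based
        for k in range(i, j + 1):
            rk[order[k]] = avg
        i = j + 1
    return rk

x = [12, 15, 11, 18, 15, 21]             # hypothetical data
y = [33, 40, 30, 45, 39, 50]

rx, ry = ranks(x), ranks(y)
n = len(x)
d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
rs = 1 - 6 * d2 / (n ** 3 - n)
print(rs)                                # about 0.986 for this data
```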
Example: Test H0: ρ = 0 using the following data:
Intra-class correlation
Sometimes it is possible to have data which cannot be designated as x and y, or x1 and x2. For
instance, information on some characteristics of twins falls in this category.
5 74.1 72.9
6 74.1 69.5
7 72.3 74.2
The intra-class correlation is defined as:
rI = (Group MS − Error MS) / (Group MS + Error MS)
Using our data, we compute the various sums of squares, and our analysis of variance table looks like the following.
AOV table
Source df SS MS F
Between groups 6 198.31 33.05 10.6
Within groups/Error 7 21.86 3.12
Total 13 220.17
rI = (33.05 − 3.12)/(33.05 + 3.12) = 0.827
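A one-line check of this computation (k = 2 observations per group, so the denominator is Group MS + Error MS):

```python
# Intra-class correlation from the AOV table above.
ms_group, ms_error = 33.05, 3.12
r_I = (ms_group - ms_error) / (ms_group + ms_error)
print(round(r_I, 3))   # 0.827
```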
9. Regression
One of the most frequently used techniques in research to find the relationship between variables
which are causally related is known as regression.
What is regression?
We observe that the taller a person is, the heavier he is. Then, what is the relationship between weight and height? If, for instance, we are given the height of a person, we can estimate his weight (using a regression model); or, if weight and height are related, we may also wish to know the closeness of the relationship.
Other examples:
1. Relationship between consumption and family income. Does the expenditure on
consumption depend on the income of a family?
2. What is the relationship between milk yield and consumption of ration by cows?
Simple regression
A regression is said to be simple if there is one independent variable.
Examples:
1. Relationship between temperature and armyworm population.
2. Fruit size of orange and the level of phosphorous in the soil, etc.
Multiple regression
In this type of regression, we have more than one independent variable defining the dependent variable. Example: weight gain of an animal as a function of feed, temperature, breed, disease and age.
In this course we are interested in, and will concentrate on, simple linear regression.
Linear regression
This means that the relationship between the dependent and independent variables can be
expressed as a straight line.
Example, suppose we have height (inches) and weight (lb) of a group of children as follows:
Height (x)   Weight (y): sub-population   E(y)
50           40 41 42 43 44               42
51           41 43 44 46 46               44
52           41 44 45 48 52               46
53           43 46 47 49 55               48
54           44 46 49 51 60               50
Note that we have 25 pairs of numbers such as (50, 40), (50, 41), etc. These 25 pairs comprise the population.
[Figure: scatter plot of weight against height, with the fitted line ŷ = b0 + b1x and the sub-population means E(y|x) marked.]
As we see in the graph, we have a sub-population of y values for each x value. The x values are always fixed.
1. In the first case, the distribution of y is specified such as normal
2. In the second case, the distribution is not specified
The average of the y values is known as the expected value of the sub-population.
Example: E(y|x = 54) = (44 + 46 + 49 + 51 + 60)/5 = 50
Slope: the slope is represented by β1, which measures the amount of change in y produced by a unit change in x. That is, if x increases by one unit, what happens to y is measured by β1.
Both β0 & β1 are known as regression coefficients. The equation is known as the regression curve or the regression function.
Least squares technique: In the regression equation, the coefficients β0 & β1 have to be estimated using the best-fitting line, which is given by the equation ŷ = b0 + b1x. This line is drawn by finding b0 & b1 in such a way that the sum of the squared deviations from the line, S = Σ(y − ŷ)², is minimized. This is done using the technique of least squares.
Find the values of b0 & b1 such that S is minimized. Using differential calculus, the values of b0 & b1 that minimize S are given by the following formulas:
b1 = [Σxy − (Σx)(Σy)/n] / [Σx² − (Σx)²/n]
b0 = ȳ − b1x̄
Example: Let us consider the following data, in which age (x) and wing length (y) of sparrows are given.

Age x (days)   Wing length y (cm)
3.0            1.4
4.0            1.5
5.0            2.2
6.0            2.4
8.0            3.1
9.0            3.2
10.0           3.2
11.0           3.9
12.0           4.1
14.0           4.7
15.0           4.5
16.0           5.2
17.0           5.0

n = 13, Σx = 130, Σy = 44.4, x̄ = 10.0, ȳ = 3.42, Σx² = 1562.0, Σy² = 171.3, Σxy = 514.8
SSx = Σx² − (Σx)²/n = 1562 − 130²/13 = 262
SSy = Σy² − (Σy)²/n = 171.3 − 44.4²/13 = 19.66
SSxy = Σxy − (Σx)(Σy)/n = 514.8 − (130)(44.4)/13 = 70.80
b1 = SSxy/SSx = 70.8/262 = 0.27 cm/day
b0 = ȳ − b1x̄ = 3.42 − (0.27)(10) = 0.72
Standard error of the estimates
Now we have found the equation of the best-fitting line. What remains is to find out how well the line fits the set of observations. For example, given the age of a sparrow, how accurately can we estimate its wing length?
Consider the following two graphs:
[Figure: two scatter diagrams about a fitted line — in (a) the points are widely scattered, in (b) they are concentrated near the line.]
Note that the values in (a) are scattered, whereas they are concentrated in (b). This means that line (b) fits the observations better than line (a); the standard deviation about the line in (a) is greater than in (b). The variance of y is given by sy² = Σ(y − ȳ)²/(n − 1).
Using the same approach, we define the conditional variance of y given x. The deviations of interest are now of the form yi − ŷi.
The reason for dividing by n − 2 is that the two parameters β0 & β1 have to be estimated in the regression equation. Defined in this way, s²y·x is an unbiased estimate of the population conditional variance σ²y·x. For a large sample size, the deviation-based formula is too cumbersome; a simpler computational form is:
s²y·x = [SSy − (SSxy)²/SSx] / (n − 2)
The two formulas are algebraically identical. Using this, the variances of the regression coefficients are:
V(b1) = s²b1 = s²y·x / [Σx² − (Σx)²/n] = s²y·x/SSx
V(b0) = s²b0 = s²y·x·Σx² / (n[Σx² − (Σx)²/n]) = s²y·x·Σx²/(n·SSx)
Following our previous example, we can compute the standard errors of the estimates as follows:
SSy = Σy² − (Σy)²/n = 19.656
SSxy = Σxy − (Σx)(Σy)/n = 70.80
SSx = Σx² − (Σx)²/n = 262
Then s²y·x = [SSy − (SSxy)²/SSx]/(n − 2) = (19.656 − 70.8²/262)/11 = 0.524/11 = 0.0477
s²b1 = s²y·x/SSx = 0.0477/262 = 0.000182,  sb1 = √0.000182 = 0.0135
s²b0 = s²y·x·Σx²/(n·SSx) = (0.0477)(1562)/[(13)(262)] = 0.0219,  sb0 = √0.0219 = 0.1479
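The whole computation, from the raw data to b0, b1 and their standard errors, can be verified with a short sketch:

```python
# Least squares estimates and standard errors for the sparrow data above.
import math

x = [3, 4, 5, 6, 8, 9, 10, 11, 12, 14, 15, 16, 17]
y = [1.4, 1.5, 2.2, 2.4, 3.1, 3.2, 3.2, 3.9, 4.1, 4.7, 4.5, 5.2, 5.0]
n = len(x)

sum_x, sum_y = sum(x), sum(y)                      # 130, 44.4
ss_x = sum(v * v for v in x) - sum_x ** 2 / n      # 262
ss_y = sum(v * v for v in y) - sum_y ** 2 / n      # 19.656
ss_xy = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n   # 70.8

b1 = ss_xy / ss_x                                  # 0.270 cm/day
b0 = sum_y / n - b1 * sum_x / n                    # 0.713 (0.72 with rounded means)
s2_yx = (ss_y - ss_xy ** 2 / ss_x) / (n - 2)       # 0.0477
sb1 = math.sqrt(s2_yx / ss_x)                      # 0.0135
sb0 = math.sqrt(s2_yx * sum(v * v for v in x) / (n * ss_x))    # 0.148

print(b1, b0, sb1, sb0)
print(b1 / sb1)                                    # t = about 20
print(ss_xy ** 2 / (ss_x * ss_y))                  # R^2 = about 0.97
```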
The Y-intercept
Given the value of b1, we can have different lines depending upon what value b0 assumes. If b1 remains the same but b0 changes, we will have several lines which are parallel to one another.
[Figure: four parallel lines ŷ = b0(k) + b1x, k = 1,…,4, with different intercepts.]
But, on the other hand, if the intercept b0 remains the same and b1 changes, we obtain lines which start from one point and diverge apart. This phenomenon is known as concurrence; the lines are said to concur at the point b0.
[Figure: lines ŷ = b0 + b1(k)x sharing the common intercept b0 and concurring at that point.]
Extrapolation beyond the data
We are generally advised not to extrapolate beyond the values of the set of data we have. The reason is that we do not know the nature of the curve beyond our data; we do not know whether or not it will remain linear.
Assumptions underlying for the use of regression
1. For any value of x, there is normally distributed sub-population of y values
2. The variances of the population are equal
3. The errors in the dependent variable are additive
4. The independent variable is measured without error, since it is fixed.
Note that the sample estimate of β1 is b1. If we take several samples, we will have several values of b1. This means that b1 has a sampling distribution with mean β1 and some variance. That is, the mean of b1 = E(b1) = β1, and
V(b1) = σ²b1 = σ²y·x/SSx
The sampling distribution of b1 is normal. There is a theorem in statistics which says that any linear function of a normal random variable is itself normal. Here y has a normal distribution, since y = β0 + β1x + ε, and b1 is a linear function of the y values.
For a large sample, we can use the following formula to set a confidence interval for β1:
b1 − Ƶα/2·sb1 ≤ β1 ≤ b1 + Ƶα/2·sb1
Otherwise, for a small sample size, use the t-distribution in estimating the confidence interval of β1:
b1 − tα/2·sb1 ≤ β1 ≤ b1 + tα/2·sb1
We often wish to test whether the dependent variable can be expressed without the independent variable. Either the Ƶ-test or the t-test can be used; each will be illustrated separately.
Let the hypothesis be of the form H0: β1 = 0. Then t takes the following form:
t = (b1 − β1)/sb1 = (b1 − 0)/sb1
For our example above, b1 = 0.27 and sb1 = 0.0135, so t = 0.27/0.0135 = 20.
Conclusion: this value is significant (t0.05(11) = 2.201).
The F-test on β1
To develop this test, we must first define the various sums of squares.
1. Total sum of squares: SStotal = Σy² − (Σy)²/n
2. Regression sum of squares: SSregression = [Σxy − (Σx)(Σy)/n]² / [Σx² − (Σx)²/n] = (SSxy)²/SSx = b1·SSxy
3. Residual sum of squares: SSresidual = SStotal − SSregression
The coefficient of determination is R² = SSregression/SStotal; the remaining variability, 1 − R², is the part that is not related to x.
For our previous example, R² = SSregression/SStotal = 19.13/19.656 = 0.97
The high value of R² indicates that most of the variation in y is accounted for by the regression of y on x.