Probability & Statistical Methods - Unit 1 To 4 Material
Sample Space Confusions
Conditional Probability
P(A|B), read 'the probability of A given B', is defined by
P(A|B) = P(A ∩ B) / P(B),   provided P(B) ≠ 0.
[Venn diagram: the sample space with events A and B overlapping in the region A ∩ B.]
1. What is P(A|B)?
(a) 1/16 (b) 1/8 (c) 1/4 (d) 1/5
2. What is P(B|A)?
(a) 1/16 (b) 1/8 (c) 1/4 (d) 1/5
Table Question
Multiplication Rule, Law of Total Probability
Multiplication rule: P(A ∩ B) = P(A|B) · P(B).
[Diagram: the sample space Ω partitioned into B1, B2, B3; the event A is split into the pieces A ∩ B1, A ∩ B2, A ∩ B3.]
Law of total probability: P(A) = P(A|B1)P(B1) + P(A|B2)P(B2) + P(A|B3)P(B3).
Trees
Organize computations
Compute total probability
Compute Bayes’ formula
Example: Game: 5 red and 2 green balls in an urn. A random ball
is selected and replaced by a ball of the other color; then a second
ball is drawn.
1. What is the probability the second ball is red?
2. What is the probability the first ball was red given the second ball
was red?
Tree for the two draws:
First draw: R1 with probability 5/7, G1 with probability 2/7.
Second draw: given R1, P(R2) = 4/7 and P(G2) = 3/7; given G1, P(R2) = 6/7 and P(G2) = 1/7.
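Reading the probabilities off the tree, both questions follow from the law of total probability and Bayes' formula; a short exact-arithmetic check (a sketch):

```python
from fractions import Fraction as F

p_r1, p_g1 = F(5, 7), F(2, 7)            # first draw
p_r2_given_r1 = F(4, 7)                  # a red ball was replaced by a green one
p_r2_given_g1 = F(6, 7)                  # a green ball was replaced by a red one

p_r2 = p_r1 * p_r2_given_r1 + p_g1 * p_r2_given_g1    # total probability: 32/49
p_r1_given_r2 = p_r1 * p_r2_given_r1 / p_r2           # Bayes' formula: 5/8
print(p_r2, p_r1_given_r2)
```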
Concept Question: Trees 1
[Probability tree: the root branches to A1 and A2; each branches to B1 and B2; each of those branches to C1 and C2. The labels x, y, z mark the branch probabilities of A1, of B1 given A1, and of C1 given A1 ∩ B1.]
(a) P(A1 )
(b) P(A1 |B2 )
(c) P(B2 |A1 )
(d) P(C1 |B2 ∩ A1 ).
Concept Question: Trees 2
[Same probability tree as in the previous question.]
(a) P(B2 )
(b) P(A1 |B2 )
(c) P(B2 |A1 )
(d) P(C1 |B2 ∩ A1 ).
Concept Question: Trees 3
[Same probability tree as in the previous question.]
(a) P(C1 )
(b) P(B2 |C1 )
(c) P(C1 |B2 )
(d) P(C1 |B2 ∩ A1 ).
Concept Question: Trees 4
[Same probability tree as in the previous question.]
(a) C1
(b) B2 ∩ C1
(c) A1 ∩ B2 ∩ C1
(d) C1 |B2 ∩ A1 .
Let’s Make a Deal with Monty Hall
One door hides a car, two hide goats.
The contestant chooses any door.
Monty always opens a different door with a goat. (He
can do this because he knows where the car is.)
The contestant is then allowed to switch doors if she
wants.
What is the best strategy for winning a car?
(a) Switch (b) Don’t switch (c) It doesn’t matter
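The answer can be checked by simulation (a sketch; the number of trials is an arbitrary choice):

```python
import random

def play(switch: bool) -> bool:
    """Simulate one game; return True if the contestant wins the car."""
    doors = [0, 1, 2]
    car, pick = random.choice(doors), random.choice(doors)
    # Monty opens a door that is neither the pick nor the car.
    monty = random.choice([d for d in doors if d != pick and d != car])
    if switch:
        pick = next(d for d in doors if d != pick and d != monty)
    return pick == car

n = 100_000
for strategy in (True, False):
    wins = sum(play(strategy) for _ in range(n))
    print("switch" if strategy else "stay", wins / n)   # about 2/3 vs 1/3
```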
Board question: Monty Hall
Independence
Events A and B are independent if the probability that
one occurred is not affected by knowledge that the other
occurred.
Formally, A and B are independent ⇔ P(A ∩ B) = P(A) P(B).
Table/Concept Question: Independence
(Work with your tablemates, then everyone click in the answer.)
Bayes’ Theorem
P(A|B) = P(B|A) · P(A) / P(B)
Often compute the denominator P(B) using the law of
total probability.
If A1, A2, A3, ..., An are mutually exclusive and exhaustive events such that P(Ai) > 0, i = 1, 2, ..., n, and B is any event in which P(B) > 0, then
P(Ai / B) = P(Ai) P(B / Ai) / [ P(A1) P(B / A1) + P(A2) P(B / A2) + ⋯ + P(An) P(B / An) ].
Proof
Example 12.26
A factory has two machines I and II. Machine I produces 40% of items of the output and Machine II produces 60% of the items.
Further 4% of items produced by Machine I are defective and 5% produced by Machine II are defective. An item is drawn at random. If
the drawn item is defective, find the probability that it was produced by Machine II. (See the previous example, compare the
questions).
Solution
Let A1 be the event that the items are produced by Machine-I, A2 be the event that items are produced by Machine-II. Let B be the
event of drawing a defective item. Now we are asked to find the conditional probability P (A2 / B). Since A1 , A2 are mutually exclusive
and exhaustive events, by Bayes’ theorem,
We have,
P ( A1 ) =0.40 , P ( B / A1 ) = 0.04
P ( A2 ) = 0.60, P (B / A2 ) = 0.05
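Hence
P(A2 / B) = P(A2) P(B / A2) / [ P(A1) P(B / A1) + P(A2) P(B / A2) ] = 0.030/0.046 = 15/23 ≈ 0.652.

The same computation can be checked with a short script (a sketch; the helper function is an illustration, not from the original solution):

```python
def posterior(priors, likelihoods):
    """Bayes' theorem: posterior probability of each hypothesis given the observed event."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)              # law of total probability, P(B)
    return [j / total for j in joint]

# Example 12.26: machines I and II, probability of a defective item from each.
post = posterior([0.40, 0.60], [0.04, 0.05])
print(post[1])                      # P(A2 | B) = 0.030/0.046 ≈ 0.652
```

The same function can be reused for the remaining examples in this section.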
Example 12.27
A construction company employs 2 executive engineers. Engineer-1 does the work for 60% of jobs of the company. Engineer-2 does
the work for 40% of jobs of the company. It is known from the past experience that the probability of an error when engineer-1 does the
work is 0.03, whereas the probability of an error in the work of engineer-2 is 0.04. Suppose a serious error occurs in the work, which
engineer would you guess did the work?
Solution
Let A1and A2 be the events of job done by engineer-1 and engineer-2 of the company respectively. Let B be the event that the error
occurs in the work.
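By Bayes' theorem,
P(A1 / B) = (0.60)(0.03) / [ (0.60)(0.03) + (0.40)(0.04) ] = 0.018/0.034 = 9/17 ≈ 0.53,
P(A2 / B) = (0.40)(0.04) / 0.034 = 0.016/0.034 = 8/17 ≈ 0.47.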
Since P(A1 / B) > P(A2 / B), the chance that the error was made by engineer-1 is greater than the chance that it was made by engineer-2. Therefore one may guess that the serious error was done by engineer-1.
Example 12.28
The chances of X, Y and Z becoming managers of a certain company are 4 : 2 : 3. The probabilities that bonus scheme will be
introduced if X, Y and Z become managers are 0.3, 0.5 and 0.4 respectively. If the bonus scheme has been introduced, what is the
probability that Z was appointed as the manager?
Solution
Let A1, A2 and A3 be the events of X, Y and Z becoming managers of the company respectively. Let B be the event that the bonus scheme
will be introduced.
Since A1 , A2 and A3 are mutually exclusive and exhaustive events, applying Bayes’ theorem
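with P(A1) = 4/9, P(A2) = 2/9, P(A3) = 3/9 and P(B / A1) = 0.3, P(B / A2) = 0.5, P(B / A3) = 0.4:
P(A3 / B) = (3/9)(0.4) / [ (4/9)(0.3) + (2/9)(0.5) + (3/9)(0.4) ] = 1.2/3.4 = 6/17 ≈ 0.35.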
Example 12.29
A consulting firm rents car from three agencies such that 50% from agency L, 30% from agency M and 20% from agency N. If 90% of
the cars from L, 70% of cars from M and 60% of the cars from N are in good conditions (i) what is the probability that the firm will get
a car in good condition? (ii) if a car is in good condition, what is probability that it has come from agency N?
Solution
Let A1, A2 and A3 be the events that the cars are rented from the agencies L, M and N respectively.
We have to find (i) P(G) and (ii) P(A3 / G). From the data,
P(A1) = 0.50, P(G / A1) = 0.90
P(A2) = 0.30, P(G / A2) = 0.70
P(A3) = 0.20, P(G / A3) = 0.60
(i) Since A1, A2 and A3 are mutually exclusive and exhaustive events and G is an event in S, by the law of total probability
P(G) = P(A1)P(G / A1) + P(A2)P(G / A2) + P(A3)P(G / A3) = 0.45 + 0.21 + 0.12 = 0.78.
By Bayes’ theorem,
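P(A3 / G) = P(A3) P(G / A3) / P(G) = (0.20)(0.60) / 0.78 = 0.12/0.78 = 2/13 ≈ 0.154.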
EXERCISE 12.4
(1) A factory has two Machines-I and II. Machine-I produces 60% of items and Machine-II produces 40% of the items of the total
output. Further 2% of the items produced by Machine-I are defective whereas 4% produced by Machine-II are defective. If an item is
drawn at random what is the probability that it is defective?
(2) There are two identical urns containing respectively 6 black and 4 red balls, 2 black and 2 red balls. An urn is chosen at random and
a ball is drawn from it. (i) find the probability that the ball is black (ii) if the ball is black, what is the probability that it is from the first
urn?
(3) A firm manufactures PVC pipes in three plants viz, X, Y and Z. The daily production volumes from the three firms X, Y and Z are
respectively 2000 units, 3000 units and 5000 units. It is known from the past experience that 3% of the output from plant X, 4% from
plant Y and 2% from plant Z are defective. A pipe is selected at random from a day’s total production,
Estimation theory
• Parametric estimation;
• Bayesian estimation.
- F_y^θ(y), f_y^θ(y): the cumulative distribution function and the probability density function, respectively, of the observation vector y, which depend on the unknown vector θ.
T : Y → Θ.
The value θ̂ = T(y), returned by the estimator when applied to the observation y of y, is called the estimate of θ.
Unbiasedness
A first desirable property is that the expected value of the estimate θ̂ = T (y)
be equal to the actual value of the parameter θ.
In the above definition we used the notation Eθ [·], which stresses the
dependency on θ of the expected value of T (y), due to the fact that the pdf
of y is parameterized by θ itself.
The unbiasedness condition (1.1) guarantees that the estimator T (·) does
not introduce systematic errors, i.e., errors that are not averaged out even
when considering an infinite amount of observations of y. In other words,
T(·) neither overestimates nor underestimates θ, on average (see Fig. 1.1).
[Fig. 1.1: distribution of the estimate for an unbiased estimator (centred on θ) and for a biased one.]
one has
E[ȳ] = E[ (1/n) Σ_{i=1}^n y_i ] = (1/n) Σ_{i=1}^n E[y_i] = (1/n) Σ_{i=1}^n m = m.
However,
E[ ( n(y_i − m) − Σ_{j=1}^n (y_j − m) )² ]
  = n² E[(y_i − m)²] − 2n E[ (y_i − m) Σ_{j=1}^n (y_j − m) ] + E[ ( Σ_{j=1}^n (y_j − m) )² ]
  = n²σ² − 2nσ² + nσ²
  = n(n − 1)σ²,
because, by the independence assumption, E[(y_i − m)(y_j − m)] = 0 for i ≠ j. Therefore,
E[σ̂_y²] = (1/n) Σ_{i=1}^n (1/n²) n(n − 1)σ² = ((n − 1)/n) σ² ≠ σ².
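A quick Monte Carlo check of this bias (a sketch; the sample size n = 5, the variance σ² = 4 and the number of trials are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma2, trials = 5, 4.0, 200_000

# trials independent samples of size n, with mean 1.0 and variance sigma2
samples = rng.normal(loc=1.0, scale=np.sqrt(sigma2), size=(trials, n))
# sigma_hat^2 = (1/n) * sum (y_i - ybar)^2 for each sample
sigma_hat2 = ((samples - samples.mean(axis=1, keepdims=True)) ** 2).mean(axis=1)

print(sigma_hat2.mean())        # close to (n-1)/n * sigma^2 = 3.2, not 4.0
print((n - 1) / n * sigma2)     # 3.2
```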
Consistency
[Figure: sampling distributions of θ̂_n concentrating around θ as n grows (n = 20, 50, 100, 500).]
If
lim_{n→∞} E[(θ̂_n − θ)²] = 0,
the estimator θ̂_n is said to be consistent in mean square (which implies consistency in probability).
The result in Example 1.4 is a special case of the following more general
celebrated result.
where m_T(y) = E[T(y)]. The above expression shows that the MSE of a biased estimator is the sum of the variance of the estimator and the square of the deterministic quantity m_T(y) − θ, which is called the bias error:
MSE = E_θ[(T(y) − θ)²] = E_θ[(T(y) − m_T(y))²] + (m_T(y) − θ)².
As we will see, the trade-off between the variance of the estimator and the bias error is a fundamental limitation in many practical estimation problems.
The MSE can be used to decide which estimator is better within a family
of estimators.
Definition 1.5. Let T1(·) and T2(·) be two estimators of the parameter θ. Then, T1(·) is uniformly preferable to T2(·) if
E_θ[(T1(y) − θ)²] ≤ E_θ[(T2(y) − θ)²]   for every θ ∈ Θ.
An estimator is a uniformly minimum variance unbiased estimator (UMVUE) if it satisfies the following requirements:
• be unbiased;
• have minimum variance among all unbiased estimators;
• the previous condition must hold for every admissible value of the parameter θ.
Unfortunately, there are many problems for which there does not exist any
UMVUE estimator. For this reason, we often restrict the class of estimators,
in order to find the best one within the considered class. A popular choice
is that of linear estimators, i.e., taking the form
T(y) = Σ_{i=1}^n a_i y_i,   (1.4)
with a_i ∈ R.
Now, among all the estimators of form (1.4), with the coefficients a_i satisfying (1.5), we need to find the minimum variance one. Since the observations y_i are independent, the variance of T(y) is given by
E_θ[(T(y) − m)²] = E_θ[ ( Σ_{i=1}^n a_i y_i − m )² ] = Σ_{i=1}^n a_i² σ_i².
The problem is therefore to minimize Σ_{i=1}^n a_i² σ_i² subject to
Σ_{i=1}^n a_i = 1.   (1.5)
Consider the Lagrangian
L(a_1, . . . , a_n, λ) = Σ_{i=1}^n a_i² σ_i² + λ ( Σ_{i=1}^n a_i − 1 )
and impose
∂L(a_1, . . . , a_n, λ)/∂a_i = 0,   i = 1, . . . , n   (1.6)
∂L(a_1, . . . , a_n, λ)/∂λ = 0.   (1.7)
From (1.7) we obtain the constraint (1.5), while (1.6) implies that
2ai σi2 + λ = 0, i = 1, . . . , n
from which
λ = −1 / ( Σ_{i=1}^n 1/(2σ_i²) )   (1.8)
and hence
a_i = (1/σ_i²) / ( Σ_{j=1}^n 1/σ_j² ),   i = 1, . . . , n.   (1.9)
Notice that if all the measurements have the same variance σi2 = σ 2 , the
estimator m̂BLU E boils down to the sample mean y. This means that the
BLUE estimator can be seen as a generalization of the sample mean, in
the case when the measurements y i have different accuracy (i.e., different
variance σi2 ). In fact, the BLUE estimator is a weighted average of the ob-
servations, in which the weights are inversely proportional to the variance of
the measurements or, seen another way, directly proportional to the precision
of each observation. Let us assume that for a certain i, σ_i² → ∞. This means that the measurement y_i is completely unreliable. Then, the weight 1/σ_i² of y_i within m̂_BLUE will tend to zero. On the other hand, for an infinitely precise measurement y_j (σ_j² → 0), the corresponding weight 1/σ_j² will be predominant over all the other weights and the BLUE estimate will approach that measurement, i.e., m̂_BLUE ≃ y_j. △
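A small numerical sketch of the BLUE as a precision-weighted average (the measurement values and variances below are illustrative assumptions):

```python
import numpy as np

def blue_mean(y, variances):
    """Weighted mean with weights inversely proportional to the measurement variances."""
    w = 1.0 / np.asarray(variances, dtype=float)
    return float(np.sum(w * np.asarray(y, dtype=float)) / np.sum(w))

# Three measurements of the same quantity with different accuracies.
y = [10.2, 9.8, 10.7]
var = [1.0, 1.0, 25.0]        # the third measurement is much less reliable
print(blue_mean(y, var))      # ~10.01: dominated by the two accurate measurements
```

With equal variances the function returns the ordinary sample mean, as noted above.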
where
I_n(θ) = E_θ[ ( ∂ ln f_y^θ(y) / ∂θ )² ]   (1.12)
is the Fisher information. In the case of i.i.d. observations, I_n(θ) = n I_1(θ).
where the inequality must be intended in the matrix sense and the matrix I_n(θ) ∈ R^{p×p} is the so-called Fisher information matrix
I_n(θ) = E_θ[ ( ∂ ln f_y^θ(y)/∂θ ) ( ∂ ln f_y^θ(y)/∂θ )^T ].
Notice that the matrix E_θ[ (T(y) − θ)(T(y) − θ)^T ] is the covariance matrix of the unbiased estimator T(·).
Theorem 1.3 states that there does not exist any estimator with variance
smaller than [In (θ)]−1 . Notice that In (θ) depends, in general, on the actual
value of the parameter θ (because the partial derivatives must be evaluated
in θ) which is unknown. For this reason, an approximation of the lower
bound is usually computed in practice, by replacing θ with an estimate θ̂.
Nevertheless, the Cramér-Rao bound is also important because it allows one to define the key concept of efficiency of an estimator.
An efficient estimator has the least possible variance among all unbiased
estimators (therefore, it is also a UMVUE).
In the special case of i.i.d. observations y_i, Theorem 1.3 states that I_n(θ) = n I_1(θ), where I_1(θ) is the Fisher information of a single observation. Therefore, for a fixed θ, the Cramér-Rao bound decreases as 1/n as the number of observations n grows.
E_θ[(ȳ − m_y)²] = σ_y²/n ≥ [I_n(θ)]^{-1} = [I_1(θ)]^{-1}/n.
Let us now assume that the y_i are distributed according to the Gaussian pdf
f_{y_i}(y_i) = ( 1 / (√(2π) σ_y) ) exp( −(y_i − m_y)² / (2σ_y²) ).
Let us compute the Fisher information of a single measurement
I_1(θ) = E_θ[ ( ∂ ln f_{y_1}^θ(y_1) / ∂θ )² ].
and hence,
I_1(θ) = E_θ[ (y_1 − m_y)² / σ_y⁴ ] = 1/σ_y².
The Cramér-Rao bound therefore takes on the value [I_n(θ)]^{-1} = σ_y²/n.
Definition 1.10. Let y be a vector of observations with pdf fyθ (y), depend-
ing on the unknown parameter θ ∈ Θ. The likelihood function is defined
as
L(θ|y) = fyθ (y) .
One often works equivalently with the log-likelihood ln L(θ|y).
Remark 1.1. Assuming that the pdf fyθ (y) be a differentiable function of
θ = (θ1 , . . . , θp ) ∈ Θ ⊆ Rp , with Θ an open set, if θ̂ is a maximum for L(θ|y),
it has to be a solution of the equations
∂L(θ|y)/∂θ_i |_{θ=θ̂} = 0,   i = 1, . . . , p   (1.13)
or equivalently of
∂ ln L(θ|y)/∂θ_i |_{θ=θ̂} = 0,   i = 1, . . . , p.   (1.14)
from which
Σ_{i=1}^n (y_i − m̂_ML) / σ_y² = 0,
and hence
m̂_ML = (1/n) Σ_{i=1}^n y_i.
Therefore, in this case the ML estimator coincides with the sample mean.
Since the observations are i.i.d. Gaussian variables, this estimator is also ef-
ficient (see Example 1.6). △
The result in Example 1.7 is not restricted to the specific setting or pdf
considered. The following general theorem illustrates the importance of max-
imum likelihood estimators, in the context of parametric estimation.
Theorem 1.4. Under the same assumptions for which the Cramér-Rao bound
holds, if there exists an efficient estimator T ∗ (·), then T ∗ (·) is a maximum
likelihood estimator.
from which
m̂_ML = (1/n) Σ_{i=1}^n y_i,
σ̂²_ML = (1/n) Σ_{i=1}^n (y_i − m̂_ML)².
Under mild regularity conditions, the maximum likelihood estimator is:
• asymptotically unbiased;
• consistent;
• asymptotically efficient;
• asymptotically normal.
h : Θ ⊆ Rp → Rn
y = Uθ + ε. (1.16)
In the following, we will assume that rank(U) = p, which means that the
number of linearly independent measurements is not smaller than the number
of parameters to be estimated (otherwise, the problem is ill posed).
We now introduce two popular estimators that can be used to estimate
θ in the setting (1.16). We will discuss their properties, depending on the
assumptions we make on the measurement noise ε. Let us start with the
Least Squares estimator.
The name of this estimator comes from the fact that it minimizes the
sum of the squared differences between the data realization y and the model
Uθ, i.e.
θ̂_LS = arg min_θ ‖y − Uθ‖².
Indeed,
∂‖y − Uθ‖²/∂θ |_{θ=θ̂_LS} = 2 θ̂_LS^T U^T U − 2 y^T U = 0,
where the properties ∂(x^T A x)/∂x = 2x^T A and ∂(Ax)/∂x = A have been exploited. By solving with respect to θ̂_LS^T, one gets
θ̂_LS^T = y^T U (U^T U)^{-1}.
Finally, by transposing the above expression and taking into account that
the matrix (U T U) is symmetric, one obtains the equation (1.17).
It is worth stressing that the LS estimator does not require any a priori
information about the noise ε to be computed. As we will see in the sequel,
however, the properties of ε will influence those of the LS estimator.
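A minimal numerical sketch of (1.17) on synthetic data (the model matrix, true parameter and noise level below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 2
U = np.column_stack([np.ones(n), np.linspace(0.0, 1.0, n)])   # rank(U) = p
theta_true = np.array([1.0, -2.0])
y = U @ theta_true + 0.1 * rng.standard_normal(n)

# theta_LS = (U^T U)^{-1} U^T y, computed by solving the normal equations
theta_ls = np.linalg.solve(U.T @ U, U.T @ y)
print(theta_ls)    # close to [1.0, -2.0]
```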
Similarly to what has been shown for the LS estimator, it is easy to verify
that the GM estimator minimizes the weighted sum of squared errors between
y and Uθ, i.e.
θ̂_GM = arg min_θ (y − Uθ)^T Σ_ε^{-1} (y − Uθ).
Notice that the Gauss-Markov estimator requires the knowledge of the co-
variance matrix Σε of the measurement noise. By using this information, the
measurements are weighted with a matrix weight that is inversely proportional to their uncertainty.
Under the assumption that the noise has zero mean, E[ε] = 0, it is easy to show that both the LS and the GM estimators are unbiased. For the LS estimator one has
E_θ[θ̂_LS] = E_θ[(U^T U)^{-1} U^T y] = E_θ[(U^T U)^{-1} U^T (Uθ + ε)]
          = E_θ[θ + (U^T U)^{-1} U^T ε] = θ.
Similarly, for the GM estimator,
E_θ[θ̂_GM] = E_θ[θ + (U^T Σ_ε^{-1} U)^{-1} U^T Σ_ε^{-1} ε] = θ.
If the noise vector ε has non-zero mean, mε = E [ε], but the mean mε
is known, the LS and GM estimators can be easily amended to remove the
bias. In fact, if we define the new vector of random variables ε̃ = ε − mε ,
the equation (1.16) can be rewritten as
y − mε = Uθ + ε̃, (1.19)
and since clearly E[ε̃] = 0 and E[ε̃ε̃ᵀ] = Σ_ε, the whole treatment can be repeated
by replacing y with y − mε . Therefore, the expressions of the LS and GM
estimators remain those in (1.17) and (1.18), with y replaced by y − mε .
The case in which the mean of ε is unknown is more intriguing. In some
cases, one may try to estimate it from the data, along with the parameter θ.
Assume for example that E [εi ] = m̄ε , ∀i. This means that E [ε] = m̄ε · 1,
where 1 = [1 1 ... 1]T . Now, one can define the extended parameter
vector θ̄ = [θ′ m̄ε ]T ∈ Rp+1 , and use the same decomposition as in (1.19) to
obtain
y = [U 1]θ̄ + ε̃
Then, one can apply the LS or GM estimator, by replacing U with [U 1], to
obtain a simultaneous estimate of the p parameters θ and of the scalar mean
m̄ε .
In the special case Σε = σε2 In (with In identity matrix of dimension n), i.e.,
when the variables ε are uncorrelated and have the same variance σε2 , the
BLUE estimator is the Least Squares estimator (1.17).
Proof
Since we consider the class of linear unbiased estimators, we have T (y) = Ay,
and E [Ay] = AE [y] = AUθ. Therefore, one must impose the constraint
AU = Ip to guarantee that the estimator is unbiased.
In order to find the minimum variance estimator, it is necessary to minimize (in the matrix sense) the covariance of the estimation error
E[(Ay − θ)(Ay − θ)^T] = E[Aεε^T A^T] = A Σ_ε A^T.
Write a generic unbiased linear estimator as
A = (U^T Σ_ε^{-1} U)^{-1} U^T Σ_ε^{-1} + M,   (1.22)
where, by the constraint AU = I_p, the matrix M must satisfy MU = 0. Then
A Σ_ε A^T = (U^T Σ_ε^{-1} U)^{-1} U^T Σ_ε^{-1} Σ_ε Σ_ε^{-1} U (U^T Σ_ε^{-1} U)^{-1}
          + (U^T Σ_ε^{-1} U)^{-1} U^T Σ_ε^{-1} Σ_ε M^T
          + M Σ_ε Σ_ε^{-1} U (U^T Σ_ε^{-1} U)^{-1}
          + M Σ_ε M^T
          = (U^T Σ_ε^{-1} U)^{-1} + M Σ_ε M^T
          ≥ (U^T Σ_ε^{-1} U)^{-1},
with equality if and only if M = 0, which corresponds to the Gauss-Markov estimator (1.18).
As it has been noticed after Definition 1.13, the solution of (1.23) is actu-
ally the Gauss-Markov estimator. Therefore, we can state that: in the case
of linear observations corrupted by additive Gaussian noise, the Maximum
Likelihood estimator coincides with the Gauss-Markov estimator. Moreover,
it is possible to show that in this setting
E_θ[ ( ∂ ln f_y^θ(y)/∂θ ) ( ∂ ln f_y^θ(y)/∂θ )^T ] = U^T Σ_ε^{-1} U
If, in addition, ε ∼ N(0, σ_ε² I_n), then the GM estimator boils down to the LS one. Therefore: in the case of linear observations corrupted by independent and identically distributed Gaussian noise, the Maximum Likelihood estimator coincides with the Least Squares estimator.
yi = θ + vi , i = 1, . . . , n
E [T (y)] = E [x] .
where d(x, T (y)) denotes the distance between x and its estimate T (y),
according to a suitable metric.
Since the distance d(x, T (y)) is a random variable, the aim is to minimize
its expected value, i.e. to find
x̂_MSE = E[x|y].
The previous result states that the estimator minimizing the MSE is the
a posteriori expected value of x, given the observation of y, i.e.
x̂_MSE = ∫_{−∞}^{+∞} x f_{x|y}(x|y) dx.   (1.25)
Since E[E[x|y]] = E[x], one can conclude that the minimum MSE estimator is always unbiased.
The minimum MSE estimator has other attractive properties. In particular, if we consider the matrix Q(x, T(y)) = E[(x − T(y))(x − T(y))^T], it can be shown that:
• x̂_MSE is the estimator minimizing (in the matrix sense) Q(x, T(y)), i.e., Q(x, x̂_MSE) ≤ Q(x, T(y)) for any estimator T(·).
Example 1.10. Consider two random variables x and y, whose joint pdf is
given by
f_{x,y}(x, y) = −(3/2) x² + 2xy,   if 0 ≤ x ≤ 1, 1 ≤ y ≤ 2,
             = 0,                  elsewhere.
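A symbolic sketch of how the conditional mean x̂_MSE = E[x|y] can be obtained for this pdf (using sympy):

```python
import sympy as sp

x, y = sp.symbols('x y', positive=True)
f = -sp.Rational(3, 2) * x**2 + 2 * x * y        # joint pdf on 0 <= x <= 1, 1 <= y <= 2

f_y = sp.integrate(f, (x, 0, 1))                 # marginal of y: y - 1/2
x_mse = sp.integrate(x * f, (x, 0, 1)) / f_y     # E[x|y] = (2y/3 - 3/8) / (y - 1/2)
print(sp.simplify(f_y), sp.simplify(x_mse))
```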
T (y) = Ay + b (1.26)
in which the matrix A ∈ Rm×n and the vector b ∈ Rm are the coefficients of
the estimator to be determined. Among all estimators of the form (1.26), we
aim at finding the one minimizing the MSE.
Definition 1.17. The Linear Mean Square Error (LMSE) estimator is defined as x̂_LMSE = A* y + b*, where
A* = R_xy R_y^{-1},
b* = m_x − R_xy R_y^{-1} m_y.   (1.28)
Observe that the last two terms of the previous expression are positive
semidefinite matrices. Hence, the solution of problem (1.28) is obtained by
choosing A∗ , b∗ such that the last two terms are equal to zero, i.e.
A* = R_xy R_y^{-1};
b* = m_x − A* m_y = m_x − R_xy R_y^{-1} m_y.
The LMSE estimator is unbiased because the expected value of the esti-
mation error is equal to zero. In fact,
E[x̂_LMSE − x] = m_x − m_x + R_xy R_y^{-1} E[y − m_y] = 0.
In the case in which the random variables x, y are jointly Gaussian, with mean and covariance matrix defined as in Theorem 1.8, we recall that the conditional expected value of x given the observation of y is given by
E[x|y] = m_x + R_xy R_y^{-1} (y − m_y),
which coincides with the LMSE estimator: for jointly Gaussian variables the LMSE and the minimum MSE estimators are the same.
y 1 = x + ε1 ,
y 2 = x + ε2 .
Let ε1 , ε2 be two independent random variables, with zero mean and variance
σ12 , σ22 , respectively. Under the assumption that x and εi , i = 1, 2, are
independent, we aim at computing the LMSE estimator of x.
y = 1 x + ε,
where 1 = (1 1)T .
First, let us compute the mean of y:
E[y] = E[1 x + ε] = 1 m_x,
and its covariance matrix
R_y = E[(y − 1 m_x)(y − 1 m_x)^T] = 1 σ_x² 1^T + R_ε,
where
R_ε = [ σ_1²  0 ;  0  σ_2² ].
Carrying out the computation of x̂_LMSE = m_x + R_xy R_y^{-1} (y − 1 m_x), with R_xy = σ_x² 1^T, and simplifying, one obtains
x̂_LMSE = ( m_x/σ_x² + y_1/σ_1² + y_2/σ_2² ) / ( 1/σ_x² + 1/σ_1² + 1/σ_2² ),
i.e., a weighted average of the prior mean m_x and of the two measurements, with weights equal to the respective precisions (inverse variances).
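A small sketch of this estimator as a function (the numerical values in the call are illustrative assumptions):

```python
import numpy as np

def lmse_two_measurements(y1, y2, m_x, var_x, var1, var2):
    """LMSE estimate of x from y1 = x + e1, y2 = x + e2: a precision-weighted
    average of the prior mean and the two measurements."""
    w = np.array([1.0 / var_x, 1.0 / var1, 1.0 / var2])
    vals = np.array([m_x, y1, y2])
    return float(np.dot(w, vals) / w.sum())

print(lmse_two_measurements(1.2, 0.8, m_x=0.0, var_x=4.0, var1=1.0, var2=0.5))  # ≈ 0.86
```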
1.7 Exercises
1.1. Verify that in the problem of Example 1.9, the LS and GM estimators
of θ coincide respectively with y in (1.2) and m̂BLU E in (1.10).
e) Find the variance of the estimation error for the estimator T (·) defined
in item d), in the case n = 1. Compute the Fisher information I1 (θ)
and show that the inequality (1.11) holds.
1.7. Let a and b be two unknown quantities, for which we have three different
measurements:
y1 = a + v1
y2 = b + v2
y3 = a + b + v3
where v_i, i = 1, 2, 3, are independent random variables with zero mean. Let E[v_1²] = E[v_3²] = 1 and E[v_2²] = 1/2. Find:
Compare the obtained estimates with those one would have if the observation
y 3 were not available. How does the variance of the estimation error change?
f_{x,y}(x, y) = −(3/2) x² + 2xy,   if 0 ≤ x ≤ 1, 1 ≤ y ≤ 2,
             = 0,                  elsewhere.
a) Find the estimators x̂M SE and x̂LM SE of x, and plot them as functions
of the observation y.
UNIT-III
UNIT-IV
Cochran's theorem. Let X₁, X₂, ..., Xₙ be independent normal variates with mean 0 and variance σ², and suppose
Σ_{i=1}^n X_i² = Q₁ + Q₂ + ⋯ + Q_k,
where Q_j is a quadratic form in X₁, X₂, ..., Xₙ with rank (degrees of freedom) r_j, j = 1, 2, ..., k. Then the random variables Q₁, Q₂, ..., Q_k are mutually independent and Q_j/σ² is a χ²-variate with r_j degrees of freedom if and only if
Σ_{j=1}^k r_j = n.
For example
Let there be five treatments each to be replicated four times. There are,
therefore, 20 plots. Let these plots be numbered from 1 to 20 conveniently.
Layout of CRD
Plots 1–4:   A C A D
Plots 5–8:   B D B D
Plots 9–12:  C B C A
Plots 13–16: B D A C
Advantages of CRD
iii) This makes the design less efficient and results in less sensitivity in
detecting significant effects.
Applications
The model is
y_ij = μ + t_i + e_ij,
where y_ij is the response of the j-th unit receiving the i-th treatment, μ is the general mean effect, t_i is the effect of the i-th treatment and e_ij is the random error. The constants μ and t_i are estimated by minimizing the error sum of squares
E = Σ_ij e_ij² = Σ_ij (y_ij − μ − t_i)².
Setting ∂E/∂μ = 0:
−2 Σ_ij (y_ij − μ − t_i) = 0
⇒ Σ_ij y_ij − nμ − Σ_i n_i t_i = 0,
where Σ_ij y_ij = G (the grand total) and Σ_i n_i = n. Hence
G = nμ + Σ_i n_i t_i.   … (1)
Setting ∂E/∂t_i = 0:
−2 Σ_j (y_ij − μ − t_i) = 0
⇒ Σ_j y_ij − n_i μ − n_i t_i = 0,
where Σ_j y_ij = T_i (the i-th treatment total). Hence
T_i = n_i μ + n_i t_i.   … (2)
Using the side condition Σ_i n_i t_i = 0 and Σ_i n_i = n, equation (1) gives
G = n μ̂  ⇒  μ̂ = G/n,
and from (2),
T_i = n_i μ̂ + n_i t̂_i  ⇒  t̂_i = T_i/n_i − G/n.
Error Sum of Squares
E = Σ_ij ê_ij² = Σ_ij (y_ij − μ̂ − t̂_i)²
  = Σ_ij y_ij (y_ij − μ̂ − t̂_i)        (the other terms vanish)
  = Σ_ij y_ij² − μ̂ Σ_ij y_ij − Σ_i t̂_i Σ_j y_ij
  = Σ_ij y_ij² − G·G/n − Σ_i (T_i/n_i − G/n) T_i
  = ( Σ_ij y_ij² − G²/n ) − ( Σ_i T_i²/n_i − G²/n ),
where Σ_j y_ij = T_i and G²/n is the correction factor.
ANOVA table for CRD (k treatments, n observations):

Source       d.f.   Sum of squares                           Mean square            F
Treatments   k−1    Tr.S.S = Σ_i T_i²/n_i − G²/n             MSST = Tr.S.S/(k−1)    F = MSST/MSSE
Error        n−k    E.S.S = T.S.S − Tr.S.S (by subtraction)  MSSE = E.S.S/(n−k)
Total        n−1    T.S.S = Σ_ij y_ij² − G²/n
Problem 2.1:
Weight gains of baby chicks fed on different feeding materials composed of tropical feed stuffs are given in the table below.

Feed   Weight gains                   Total
A       55   49   42   21   52        219
B       61  112   30   89   63        355
C       42   97   81   95   92        407
D      169  137  169   85  154        714

Figures in bold type in the table are not given in the original data; they are part of the calculations for the analysis.
Solution:
Null hypothesis, Hₒ: tA = tB = tC = tD
i.e., the treatment effects are same. In other words, all the treatments (A, B, C,
D) are alike as regards their effect on increase in weight.
Alternative hypothesis, H₁: At least two of tᵢ‟s are different.
Raw S.S. (R.S.S.) = Σ_ij y_ij² = 55² + 49² + …… + 85² + 154² = 1,81,445
Test statistic: F_T ~ F(3, 16); tabulated F₀.₀₅(3, 16) = 3.06. Hence F_T is highly significant and we reject H₀ at the 5% level of significance and conclude that the treatments A, B, C and D differ significantly.
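The remaining sums of squares can be verified with a short script (a sketch of the standard CRD computations, using the data above):

```python
import numpy as np

data = {
    'A': [55, 49, 42, 21, 52],
    'B': [61, 112, 30, 89, 63],
    'C': [42, 97, 81, 95, 92],
    'D': [169, 137, 169, 85, 154],
}
all_obs = np.concatenate([np.array(v, dtype=float) for v in data.values()])
G, n, k = all_obs.sum(), all_obs.size, len(data)

cf = G**2 / n                                                  # correction factor
tss = (all_obs**2).sum() - cf                                  # total S.S.
trss = sum(sum(v)**2 / len(v) for v in data.values()) - cf     # treatment S.S.
ess = tss - trss                                               # error S.S.

F = (trss / (k - 1)) / (ess / (n - k))
print(round(trss, 2), round(ess, 2), round(F, 2))   # 26234.95 11558.8 12.1  (> 3.06)
```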
Advantages of RBD
(i) Accuracy:
(ii) Flexibility:
In R.B.D no restriction are placed on the number of treatments
or the number of replicates. In general, at least two replicates are required to
carry out the test of significance (factorial design is an exception). In addition,
control (check) or some other treatments may be included more than once
without complications in the analysis.
Disadvantages of RBD
Layout of RBD: -
Block I:   A B C D E
Block II:  B C D E A
Block III: C D E A B
Block IV:  D E A B C
Lay out:
The observations can be arranged in a two-way table with treatments as rows and blocks (replicates) as columns: y_ij denotes the yield of the i-th treatment in the j-th block, T_i. = Σ_j y_ij is the i-th treatment total, T_.j (= B_j) is the j-th block total, ȳ_i. and ȳ_.j are the corresponding means, and G is the grand total.
Let us assume that y_ij is the response (yield) of the experimental unit receiving the i-th treatment in the j-th block.
The model is
y_ij = μ + t_i + b_j + e_ij;   i = 1, 2, …, t;  j = 1, 2, …, r,
Where yij is the response or the yield of the experimental unit receiving the ith
treatment in the jth block;
where μ, t_i and b_j are constants such that Σ_{i=1}^t t_i = 0 and Σ_{j=1}^r b_j = 0.
If we write Σ_ij y_ij = G (grand total), Σ_j y_ij = T_i (total for the i-th treatment), and Σ_i y_ij = B_j (total for the j-th block).
μ, t_i and b_j are estimated by the method of least squares, minimizing
E = Σ_ij e_ij² = Σ_ij (y_ij − μ − t_i − b_j)².   … (1)
Setting ∂E/∂μ = 0:
−2 Σ_ij (y_ij − μ − t_i − b_j) = 0
⇒ Σ_ij y_ij − trμ − r Σ_i t_i − t Σ_j b_j = 0,
where Σ_ij y_ij = G. Hence
G = trμ + r Σ_i t_i + t Σ_j b_j.   … (2)
Setting ∂E/∂t_i = 0:
−2 Σ_j (y_ij − μ − t_i − b_j) = 0
⇒ Σ_j y_ij − rμ − r t_i − Σ_j b_j = 0,
where Σ_j y_ij = T_i. Hence
T_i = rμ + r t_i + Σ_j b_j.   … (3)
Setting ∂E/∂b_j = 0:
−2 Σ_i (y_ij − μ − t_i − b_j) = 0
⇒ Σ_i y_ij − tμ − Σ_i t_i − t b_j = 0,
where Σ_i y_ij = B_j. Hence
B_j = tμ + Σ_i t_i + t b_j.   … (4)
Using the side conditions Σ_{i=1}^t t_i = 0 and Σ_{j=1}^r b_j = 0:
From (2): G = trμ̂  ⇒  μ̂ = G/(tr).
From (3): T_i = rμ̂ + r t̂_i  ⇒  t̂_i = T_i/r − G/(tr).
From (4): B_j = tμ̂ + t b̂_j  ⇒  b̂_j = B_j/t − G/(tr).
Error Sum of Squares:
E = Σ_ij (y_ij − μ̂ − t̂_i − b̂_j)²
  = Σ_ij y_ij (y_ij − μ̂ − t̂_i − b̂_j)      (the other terms vanish)
  = Σ_ij y_ij² − G·G/(tr) − Σ_i T_i (T_i/r − G/(tr)) − Σ_j B_j (B_j/t − G/(tr))
  = ( Σ_ij y_ij² − G²/(tr) ) − ( Σ_i T_i²/r − G²/(tr) ) − ( Σ_j B_j²/t − G²/(tr) ),
where Σ_ij y_ij = G, Σ_j y_ij = T_i, Σ_i y_ij = B_j, and the correction factor is G²/(tr).
Total Sum of Squares = Σ_ij y_ij² − G²/(tr)
Treatment Sum of Squares S_T² = Σ_i T_i²/r − G²/(tr)
Block Sum of Squares S_B² = Σ_j B_j²/t − G²/(tr)
ANOVA table for RBD:

Source        d.f.          S.S.    M.S.S.                      F
Treatments    t−1           S_T²    s_T² = S_T²/(t−1)           F_T = s_T²/s_E²
Blocks        r−1           S_B²    s_B² = S_B²/(r−1)           F_B = s_B²/s_E²
Error         (t−1)(r−1)    S_E²    s_E² = S_E²/((t−1)(r−1))
Total         rt−1

Under the null hypothesis H₀t : t₁ = t₂ = … = t_t, against the alternative that not all t_i's are equal, the test statistic is
F_T = s_T² / s_E² ~ F( t−1, (t−1)(r−1) ).
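The RBD sums of squares above can be computed mechanically from a two-way table; a small sketch (the example layout at the end uses made-up numbers):

```python
import numpy as np

def rbd_anova(y):
    """Sums of squares and F statistic for an RBD laid out as a (t x r) array
    (rows = treatments, columns = blocks)."""
    y = np.asarray(y, dtype=float)
    t, r = y.shape
    G = y.sum()
    cf = G**2 / (t * r)                               # correction factor
    tss = (y**2).sum() - cf                           # total S.S.
    ss_treat = (y.sum(axis=1)**2).sum() / r - cf      # from treatment totals T_i
    ss_block = (y.sum(axis=0)**2).sum() / t - cf      # from block totals B_j
    ss_error = tss - ss_treat - ss_block
    f_treat = (ss_treat / (t - 1)) / (ss_error / ((t - 1) * (r - 1)))
    return ss_treat, ss_block, ss_error, f_treat

print(rbd_anova([[12, 14, 13], [15, 18, 16], [20, 22, 25]]))
```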
Problem 3.3
Table 2.3
Solution:
Null hypothesis: H₀t : all treatment effects are equal; H₀b : all block effects are equal.
Alternative hypothesis: H₁t : at least two tᵢ's are different; H₁b : at least two bⱼ's are different.
For finding the various S.S., we rearrange the above table as follows:
Table 2.4 (partial)
Tᵢ²:       14,161.00   14,376.01   14,908.41   4,212.01   4,395.69   9,273.69
Average:       29.75       30.0        30.5       16.2       16.6       24.1
ANOVA (partial):
Source        d.f.   S.S.       M.S.            F
Treatments      5      —          —             F_t
Blocks          3    219.43    s²_B = 73.14     F_b = 73.14/15.31 = 4.7
Error          15      —       s²_E = 15.31
Total          23   1,350.25
Tabulated F₃,₁₅ (0.05) = 5.42 and F₅,₁₅ (0.05) = 4.5. Since under H₀t, F_t ~ F(5, 15) and under H₀b, F_b ~ F(3, 15), we see that F_t is significant while F_b is not significant at the 5% level of significance. Hence, H₀t is rejected at the 5% level and we conclude that the treatment effects are not all alike. On the other hand, H₀b may be retained at the 5% level, and we may conclude that the blocks are homogeneous.
Let the observation yij = x (say) in the jth block and receiving the ith treatment be
missing, as given in table 3.7 Table 3.7
[Table 3.7: the t × r two-way layout of treatments against blocks, with the single missing entry x in the cell for treatment i, block j.]
where
y_i. is the total of the known observations receiving the i-th treatment,
y_.j is the total of the known observations in the j-th block, and
y_.. (= G) is the total of all the known observations.
Correction factor C.F. = (G + x)²/(tr).

Total S.S. = Σ_ij y_ij² + x² − (G + x)²/(tr)   (+ terms independent of x)

Treatment S.S. = (y_i. + x)²/r − C.F.   (+ terms independent of x)

Block S.S. = (y_.j + x)²/t − C.F.   (+ terms independent of x)

Hence
S.S.E = T.S.S − S.S.Tr − S.S.B
      = x² − (y_i. + x)²/r − (y_.j + x)²/t + (G + x)²/(tr) + terms independent of x.

Choosing x so as to minimize S.S.E, set ∂(S.S.E)/∂x = 0:
2x − 2(y_i. + x)/r − 2(y_.j + x)/t + 2(G + x)/(tr) = 0
⇒ trx − t(y_i. + x) − r(y_.j + x) + (G + x) = 0
⇒ x(tr − t − r + 1) = t y_i. + r y_.j − G
⇒ x(t − 1)(r − 1) = t y_i. + r y_.j − G
⇒ x = ( t y_i. + r y_.j − G ) / ( (r − 1)(t − 1) ).
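The estimate can be wrapped in a one-line helper (a sketch; the totals in the example call are hypothetical, not taken from the problem below):

```python
def missing_value_rbd(t, r, T_i, B_j, G):
    """x = (t*T_i + r*B_j - G) / ((t-1)*(r-1)), where T_i, B_j and G are the
    treatment, block and grand totals of the known observations."""
    return (t * T_i + r * B_j - G) / ((t - 1) * (r - 1))

print(missing_value_rbd(t=5, r=4, T_i=89.5, B_j=60.0, G=400.0))   # hypothetical totals
```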
Problem 3.3
Suppose that the value for treatment 2 is missing in replication III. The data
will then be as presented in the table below.
Treatment      Replication                     Total
                I       II      III     IV
2             29.5     30.4      X     29.6    89.5
Estimate of the missing value:
x = (t y_i. + r y_.j − G) / ((t − 1)(r − 1)) = 397.7/12 = 33.1.

Correction for the upward bias in the treatment sum of squares:
B = [y_.j − (t − 1)x]² / (t(t − 1)) = [135.1 − 4(33.1)]² / ((5)(4)) = 7.29/20 = 0.3645.

Treatment S.S. (using the estimated value) = 19,946.9725 − 19,425.1445 = 521.8280.
Adjusted treatment S.S. = 521.8280 − 0.3645 = 521.4635.
ANOVA (partial):
Source of variation    df    SS         MS    F
Total                  18    938.9610
Suppose now that two observations, x and y, are missing, in different blocks and for different treatments. Let T_i and B_j denote the treatment and block totals of the known observations corresponding to x, let T_m and B_i denote those corresponding to y, and let G be the grand total of the known observations.
Correction factor C.F. = (G + x + y)²/(tr).

Total S.S. = Σ_ij y_ij² + x² + y² − C.F.   (+ terms independent of x and y)

Treatment S.S. = (T_i + x)²/r + (T_m + y)²/r − C.F.   (+ terms independent of x and y)

Block S.S. = (B_j + x)²/t + (B_i + y)²/t − C.F.   (+ terms independent of x and y)

S.S.E = T.S.S − S.S.Tr − S.S.B
      = x² + y² − (T_i + x)²/r − (T_m + y)²/r − (B_j + x)²/t − (B_i + y)²/t + (G + x + y)²/(tr) + terms independent of x and y.   …(1)

Setting ∂(S.S.E)/∂x = 0:
2x − 2(T_i + x)/r − 2(B_j + x)/t + 2(G + x + y)/(tr) = 0
⇒ trx − t(T_i + x) − r(B_j + x) + (G + x + y) = 0
⇒ x(tr − t − r + 1) = tT_i + rB_j − G − y
⇒ x = ( tT_i + rB_j − G − y ) / ( (t − 1)(r − 1) ).

Setting ∂(S.S.E)/∂y = 0:
2y − 2(T_m + y)/r − 2(B_i + y)/t + 2(G + x + y)/(tr) = 0
⇒ try − t(T_m + y) − r(B_i + y) + (G + x + y) = 0
⇒ y(tr − t − r + 1) = tT_m + rB_i − G − x
⇒ y = ( tT_m + rB_i − G − x ) / ( (t − 1)(r − 1) ).

The two equations are solved iteratively (or simultaneously) for x and y.
Problem 3.4
The layout and yields of a 5×5 Latin square with two missing values (X) are given below.

Row     Column 1   2      3      4      5      Row total
1        E 26    C 42   D 39   B 37   A 24       168
2        A 24    D 33   E 21   C (X)  B 38       116
3        D 47    B 45   A 31   E 29   C 31       183
4        B 38    A 24   C 36   D 41   E 34       173
5        C 41    E 24   B (X)  A 26   D 30       121
The means for second row and fourth column in which C is missing are
116/4 = 29.0 and 133/4 = 33.25, respectively. Hence the first estimate for C is
= 2030/12 – 1584.24/12
= 169.17 – 132.02
= 37.15
G₂ = 5(116 + 133 + 150) – 2(798.15)/12
= 1995/12 – 1596.3/12
= 166.25 – 133.03
= 33.22
G‟ = 761 + 33.22
= 794.22
B² = 169.17 – 2(794.3)/12
= 36.8
It can be seen that the estimated values for B are the same and those for C are very close. Hence we stop the iteration process at the third cycle. The final estimates for B and C for the missing plots are 36.8 and 33.3, respectively.
The column total, row total, etc., with respect to the missing plots are
modified by adding the estimated values. Thus we have,
CF = (831.1)²/25 = 27,629.0884
Treatment SS = 28,448.586 − CF = 819.4976
Error SS = 278.0288
Now ignoring the treatment classification the missing values are estimated
as in the case of RBD. The estimate of the second row, fourth column missing
value is 28.5; and that of fifth row, third column is 28.2. After substituting the
estimated values and analyzing the data as RBD, we get the error sum of
squares as 1031.5856. The adjusted treatment sum of squares is then
1031.5856 − 278.0288 = 753.5568.
ANOVA (partial):
Source of variation    df    SS           MS    F
Total                  22    1273.0416
The Latin Square Design (LSD) is defined so as to eliminate the variation due to two factors, called rows and columns. The number of treatments is equal to the number of replications.
Layout of design
2x2 layouts
A B
B A
3x3 layouts
A B C
B C A
C A B
4x4 layouts
A B C D
B C D A
C D A B
D A B C
5×5 layout
A B C D E
B C D E A
C D E A B
D E A B C
E A B C D
Standard Latin square: A Latin square in which the treatments, say A, B, C, etc., occur in the first row and first column in alphabetical order is called a standard Latin square.
Example:
A B
B A
Advantages of LSD
1. With two way grouping LSD controls more of the variation than
CRD or RBD.
2. The two way elimination of variation as a result of cross grouping
often results in small error mean sum of squares.
3. LSD is an incomplete 3-way layout. Its advantage over the complete 3-way layout is that instead of m³ experimental units only m² units are needed. Thus, a 4×4 LSD results in a saving of m³ − m² = 4³ − 4² = 64 − 16 = 48 observations over a complete 3-way layout.
4. The statistical analysis is simple, though slightly more complicated than for RBD. Even with 1 or 2 missing observations the analysis remains relatively simple.
5. More than one factor can be investigated simultaneously.
Disadvantages of LSD
1. LSD is suitable when the number of treatments is between 5 and 10; for more than 10 to 12 treatments the design is seldom used, since in that case the square becomes too large and does not remain homogeneous.
2. In case of missing plots the statistical analysis becomes quite
complex.
Let y_ijk (i, j, k = 1, 2, …, m) denote the response from the unit in the i-th row, j-th column and receiving the k-th treatment.
The model is
y_ijk = μ + r_i + c_j + t_k + e_ijk;   i, j, k = 1, 2, …, m,
where μ is the constant mean effect; r_i, c_j and t_k are the effects due to the i-th row, j-th column and k-th treatment respectively; and e_ijk is the random error, assumed to be normally distributed with mean zero and variance σ_e², i.e., e_ijk ~ N(0, σ_e²).
If we write
E = Σ_ijk e_ijk² = Σ_ijk (y_ijk − μ − r_i − c_j − t_k)²,   … (1)
the least-squares normal equations are
∂E/∂μ = 0,   ∂E/∂r_i = 0,   ∂E/∂c_j = 0,   ∂E/∂t_k = 0.
Differentiating equation (1) with respect to μ:
−2 Σ_ijk (y_ijk − μ − r_i − c_j − t_k) = 0
⇒ Σ_ijk y_ijk − m²μ − m Σ_i r_i − m Σ_j c_j − m Σ_k t_k = 0,
where Σ_ijk y_ijk = G (there are m² cells, and each row, column and treatment appears m times). Hence
G = m²μ + m Σ_i r_i + m Σ_j c_j + m Σ_k t_k.   … (2)
Differentiating with respect to r_i:
−2 Σ_jk (y_ijk − μ − r_i − c_j − t_k) = 0
⇒ R_i − mμ − m r_i − Σ_j c_j − Σ_k t_k = 0,
where Σ_jk y_ijk = R_i (in the i-th row each column and each treatment occurs exactly once). Hence
R_i = mμ + m r_i + Σ_j c_j + Σ_k t_k.   … (3)
Differentiating with respect to c_j:
−2 Σ_ik (y_ijk − μ − r_i − c_j − t_k) = 0
⇒ C_j − mμ − Σ_i r_i − m c_j − Σ_k t_k = 0,
where Σ_ik y_ijk = C_j. Hence
C_j = mμ + Σ_i r_i + m c_j + Σ_k t_k.   … (4)
Differentiating with respect to t_k:
−2 Σ_ij (y_ijk − μ − r_i − c_j − t_k) = 0
⇒ T_k − mμ − Σ_i r_i − Σ_j c_j − m t_k = 0,
where Σ_ij y_ijk = T_k. Hence
T_k = mμ + Σ_i r_i + Σ_j c_j + m t_k.   … (5)
We assume that Σ_i r_i = 0, Σ_j c_j = 0 and Σ_k t_k = 0. Then:
From (2): G = m²μ̂  ⇒  μ̂ = G/m².
From (3): R_i = mμ̂ + m r̂_i  ⇒  r̂_i = R_i/m − G/m².
From (4): C_j = mμ̂ + m ĉ_j  ⇒  ĉ_j = C_j/m − G/m².
From (5): T_k = mμ̂ + m t̂_k  ⇒  t̂_k = T_k/m − G/m².
Error Sum of Squares:
E = Σ_ijk e_ijk² = Σ_ijk (y_ijk − μ̂ − r̂_i − ĉ_j − t̂_k)²
  = Σ_ijk y_ijk (y_ijk − μ̂ − r̂_i − ĉ_j − t̂_k)      (the other terms vanish)
  = Σ_ijk y_ijk² − G·G/m² − Σ_i R_i (R_i/m − G/m²) − Σ_j C_j (C_j/m − G/m²) − Σ_k T_k (T_k/m − G/m²)
  = ( Σ_ijk y_ijk² − G²/m² ) − ( Σ_i R_i²/m − G²/m² ) − ( Σ_j C_j²/m − G²/m² ) − ( Σ_k T_k²/m − G²/m² ).
Total Sum of Squares = Σ_ijk y_ijk² − G²/m²
Row Sum of Squares S_R² = Σ_i R_i²/m − G²/m²
Column Sum of Squares S_C² = Σ_j C_j²/m − G²/m²
Treatment Sum of Squares S_T² = Σ_k T_k²/m − G²/m²
ANOVA table for LSD:

Source       d.f.          S.S.   M.S.S.                        F
Rows         m−1           S_R²   s_R² = S_R²/(m−1)             F_R = s_R²/s_E²
Columns      m−1           S_C²   s_C² = S_C²/(m−1)             F_C = s_C²/s_E²
Treatments   m−1           S_T²   s_T² = S_T²/(m−1)             F_T = s_T²/s_E²
Error        (m−1)(m−2)    S_E²   s_E² = S_E²/((m−1)(m−2))
Total        m²−1
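The LSD sums of squares can likewise be computed from the square of yields and the square of treatment labels; a small sketch (the 3×3 example data are made up):

```python
import numpy as np

def lsd_anova(yields, treatments):
    """Row, column, treatment and error sums of squares for an m x m Latin square."""
    y = np.asarray(yields, dtype=float)
    t = np.asarray(treatments)
    m = y.shape[0]
    G = y.sum()
    cf = G**2 / m**2                                  # correction factor
    tss = (y**2).sum() - cf
    ss_rows = (y.sum(axis=1)**2).sum() / m - cf
    ss_cols = (y.sum(axis=0)**2).sum() / m - cf
    ss_treat = sum(y[t == lab].sum()**2 for lab in np.unique(t)) / m - cf
    ss_error = tss - ss_rows - ss_cols - ss_treat
    return ss_rows, ss_cols, ss_treat, ss_error

yields = [[8, 9, 7], [10, 12, 9], [7, 8, 11]]
treats = [['A', 'B', 'C'], ['B', 'C', 'A'], ['C', 'A', 'B']]
print(lsd_anova(yields, treats))
```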
Null and alternative hypotheses: H₀r : r₁ = r₂ = … = r_m against H₁r : not all r_i are equal; under H₀r the statistic F_R = s_R²/s_E² ~ F(m−1, (m−1)(m−2)). Similarly, we can test for H₀c and H₀t.
Problem 3
An experiment was carried out to determine the effect of claying the ground on the yield of barley grains; the amounts of clay used were as follows:
A: No clay
The yields were in plots of 8 meters by 8 meters and are given in table.
Row    Treatments (columns 1–4)    Row total
I       D  B  C  A                   83.1
II      C  A  D  B                   66.9
III     A  D  B  C                  105.2
IV      B  C  A  D                  105.0
Perform the ANOVA and calculate the critical difference for the treatment
mean yields.
Solution:
The four treatment totals are:
A: 30.8, B:86.9, C:124.5, D:118.0
Grand total G = 360.2, N = 16.
C.F. = (360.2)²/16 = 8109.0025
Raw S.S. = (29.1)² + (18.9)² +……..+ (9.5)² + (28.9)² = 10,052.08
Total S.S. = 10,052.08 – 8,109.0025 = 1,943.0775
S.S.R. = ¼ [(83.1)² + (66.9)² + (105.2)² + (105.0)²] – 8,109.0025
= 33,473.26/4 – 8,109.0025 = 259.3125
S.S.C. = ¼ [(75.8)² + (109.6)² + (84.1)² + (90.7)²] – 8,109.0025
= 33057.10/4 – 8109.0025 = 155.2725
S.S.T. = ¼ [(30.8)² + (86.9)² + (124.5)² + (118.0)²] – 8,109.0025
= 37924.50/4 – 8109.0025 = 1372.1225
Error S.S. = T.S.S. – S.S.R. – S.S.C. – S.S.T. = 156.3700
ANOVA TABLE FOR L.S.D.

Source of variation   d.f.   S.S.         M.S.S.     Variance ratio
Rows                   3       259.3125    86.4375   F_R = 86.4375/26.0616 = 3.32 < 4.76
Columns                3       155.2725    51.7575   F_C = 51.7575/26.0616 = 1.98 < 4.76
Treatments             3     1,372.1225   457.3742   F_T = 457.3742/26.0616 = 17.55 > 4.76
Error                  6       156.3700    26.0616
Total                 15     1,943.0775
Hence we conclude that the variation due to rows and columns is not
significant but the treatments, i.e., different levels of clay, have significant
effect on the yield.
2.5.2 One Missing observation in LSD
Let us suppose that in an m×m Latin square the observation occurring in the i-th row, j-th column and receiving the k-th treatment is missing. Let us assume that its value is x, i.e., y_ijk = x, and let G denote the grand total of the known observations.
Correction factor = (G + x)²/m².

Total S.S. = Σ y_ijk² + x² − (G + x)²/m²   (+ terms independent of x)
Row S.S. = (R_i + x)²/m − (G + x)²/m²   (+ terms independent of x)
Column S.S. = (C_j + x)²/m − (G + x)²/m²   (+ terms independent of x)
Treatment S.S. = (T_k + x)²/m − (G + x)²/m²   (+ terms independent of x)

S.S.E = T.S.S − Row S.S − Column S.S − Treatment S.S
      = x² − (R_i + x)²/m − (C_j + x)²/m − (T_k + x)²/m + 2(G + x)²/m² + terms independent of x,

where R_i, C_j and T_k denote the totals of the known observations in the row, column and treatment containing the missing value.

Differentiating with respect to x and equating to zero:
2x − 2(R_i + x)/m − 2(C_j + x)/m − 2(T_k + x)/m + 4(G + x)/m² = 0
⇒ m²x − m(R_i + x) − m(C_j + x) − m(T_k + x) + 2(G + x) = 0
⇒ x(m² − 3m + 2) = m(R_i + C_j + T_k) − 2G
⇒ x = [ m(R_i + C_j + T_k) − 2G ] / ( (m − 1)(m − 2) ).